ESAENARX and DE-RELM: Novel schemes for big data predictive analytics of electricity load and price

Accurate forecasting of the electricity price and load is an essential and challenging task in smart grids. Since electricity load and price are strongly correlated, forecast accuracy degrades when the bidirectional relation between price and load is not considered. Therefore, this paper considers the price and load relationship and proposes two Multiple Inputs Multiple Outputs (MIMO) Deep Recurrent Neural Network (DRNN) models for price and load forecasting. The first proposed model, Efficient Sparse Autoencoder Nonlinear Autoregressive Network with eXogenous inputs (ESAENARX), comprises feature engineering and forecasting. For feature engineering, we propose ESAE, and forecasting is performed using the existing NARX method. The second proposed model, Differential Evolution Recurrent Extreme Learning Machine (DE-RELM), is based on the RELM model and the meta-heuristic DE optimization technique. The descriptive and predictive analyses are performed on the big data of two well-known electricity markets, i.e., ISO NE and PJM. The proposed models outperform their sub-models and a benchmark model. The refined and informative features extracted by ESAE improve the forecasting accuracy of ESAENARX, and optimization improves the accuracy of DE-RELM. Compared to the cascade Elman network, ESAENARX reduces MAPE by up to 16% for load forecasting and 7% for price forecasting. DE-RELM reduces MAPE by 1% for both load and price forecasting.
Sana Mujeeb^a, Nadeem Javaid^a
^a COMSATS University Islamabad, Islamabad 44000, Pakistan
ARTICLE INFO
Keywords:
Coordination
Dynamic programming
Knapsack
Multi-objective optimization
Pareto front
Meta-heuristic
Nature-inspired
Bird swarm and Cuckoo search algorithm
Hybrid technique
Demand side management
Demand response
Smart grid.
1. Introduction
The smart grid is a modern power supply network that uses communication technology. It consists of automation, control and technology that respond quickly to changes in consumption. The smart grid provides energy in an efficient, secure, reliable, economical and environment-friendly manner. Renewable Energy Sources (RESs) of power generation are integrated to reduce carbon emissions. The smart grid allows two-way communication between the consumers and the utility. With the emergence of smart metering infrastructure, consumers are informed about the price per unit in advance. Consumers can adjust their demand load economically, according to the price signals. They can reduce consumption cost by shifting load to low-price hours. Smart grids create a price-responsive environment where the price varies with a change in demand and vice versa.
In unidirectional grids, there is a one-way interaction from the generation side to the consumers. The consumers are not able to respond to the price signal because they are not informed of the price dynamically. In unidirectional grids, demand has shown very little or no elasticity to price variations. However, with the advent of the smart metering system, consumers are well aware of the price and control their power consumption accordingly. Therefore, price and demand are highly correlated and interdependent. The market participants need reliable techniques to maximize their profit, which depends on accurate load and price forecasting. Price and demand forecasting also play an important role in energy systems planning, market design, security of supply and operation planning for future power consumption. An accurate forecast is therefore very important. A 1% reduction in the Mean Absolute Percentage Error (MAPE) of the load forecast reduces the generation cost by 0.1% to 0.3% [1]. In a large-scale smart grid, 0.1% of the generation cost is approximately $1 million annually. Due to the importance of an accurate forecast of electricity price and load, researchers are still competing to improve the forecast accuracy. Using big data for predictive analytics improves the forecasting accuracy [2]. Electricity data is big data, as smart meters record data at small time intervals [3]. In a large smart grid, approximately 220 million smart meter measurements are recorded daily. Analytics of this energy big data helps the power utilities to gain deeper insights into consumer behavior [4]. As the volume of input data increases, training classical forecasting methods becomes difficult. Processing big data with classifier-based models is very difficult because of their high space and time complexity. On the other hand, Deep Neural Networks (DNNs) perform very well on big data [5]. DNNs have an excellent ability of self-learning and nonlinear approximation. They optimize space usage by dividing the training data into mini-batches; the network is then trained on the whole data set batch by batch.
Footnote: Corresponding author: N. Javaid (nadeemjavaidqau@gmail.com, www.njavaid.com), ORCID: 0000-0003-3777-8249.
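As an illustration of the MAPE metric referred to above, the following minimal sketch computes it for a short series. The function name `mape` and the load values are hypothetical and chosen only for illustration; they are not taken from the data sets used in this paper.

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent.

    Assumes that `actual` contains no zeros (otherwise the ratio is undefined).
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)

# Hypothetical hourly loads in MW (illustrative values only).
actual_load = [100.0, 200.0, 400.0]
forecast_load = [110.0, 190.0, 400.0]

# Absolute percentage errors are 10%, 5% and 0%, so the mean is 5%.
load_mape = mape(actual_load, forecast_load)
print(f"MAPE = {load_mape:.2f}%")
```

Because the error is expressed relative to the actual value, MAPE allows load and price forecasts, which have different units, to be compared on the same scale.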
The rest of the paper is organized as follows: Section 2 presents the related work, the problem statement is given in Section 3, the methods used are described in Section 4, the proposed models are described in Section 12 and, for DE-RELM, Section 13, Section 15 presents the simulations and results, and Section 22 concludes this article.
Sana et al.: Preprint submitted to Elsevier Page 1 of 16
Big Data Predictive Analytics of Electricity Load and Price
2. Related Work
With the advent of the smart metering system, energy-related data is collected in very large volumes, at high velocity and from a variety of sources. This data is referred to as energy big data. To make decisions regarding energy market operation, predictive analytics is performed on this load and price data. An accurate prediction of load is essential for maintaining the balance between demand and supply, whereas price forecasting plays an important role in the bidding process and energy trading. Accurate forecasts of electricity load and price are essential to ensure the reliability, stability and security of the smart grid. Electricity load and price have a bidirectional relation; therefore, simultaneous prediction of load and price yields greater accuracy.
The authors of [6,7] predict price and load simultaneously. The authors of [6] propose a hybrid model for simultaneous forecasting of electricity load and price. The proposed model consists of three stages, i.e., denoising, feature engineering and forecasting. For denoising, the authors propose a new Wavelet Packet Transform (WPT) based method, Flexible WPT (FWPT). The features are selected by adjacent features and Conditional Mutual Information (CMI). In the forecasting step, Autoregressive Integrated Moving Average (ARIMA) and Nonlinear Least Square Support Vector Machine (NLSSVM) are employed for linear and nonlinear modeling, respectively. The NLSSVM is optimized using an enhanced optimization technique, Time Varying Artificial Bee Colony (TV-ABC). This hybrid model results in reasonable forecasting accuracy; however, the model is highly complex. Moreover, the optimization of the forecasting model leads to over-fitting. In [7], the authors predict load and price using a multi-stage forecasting approach. The complex forecasting approach proposed in this work comprises feature selection and a multi-stage forecast engine. Features are selected through a modified Maximum Relevancy Minimum Redundancy (MRMR) method. Electricity load and price are forecasted using a multi-block Artificial Neural Network (ANN) known as the Elman Neural Network (ENN). The forecasting model is optimized by a shark smell optimization method. This method results in reasonable forecasting accuracy. However, it is computationally very expensive. The feature engineering process and the optimization of the ENN increase complexity. Moreover, big data is not considered in this method. The authors of [8] conduct a predictive analysis of electricity price forecasting that takes advantage of big data. The relevant features for training the prediction model are selected through an extensive feature engineering process. This process has three steps. Firstly, correlated features are selected using Gray Correlation Analysis (GCA). Secondly, a hybrid of two feature selection methods, ReliefF and Random Forest (RF), is used for further feature selection. Lastly, Kernel Principal Component Analysis (KPCA) is applied for dimension reduction. Price is predicted by SVM, and the hyper-parameters of the SVM are optimized through modified Differential Evolution (DE). In [9], the authors forecast energy consumption on big data. An analysis of frequent patterns is performed using a supervised clustering method. Energy consumption is forecasted by a Bayesian network.
The authors of [10] utilize the computational power of deep learning for Electricity Price Forecasting (EPF). Stacked Denoising Autoencoder (SDA) and RANSAC-SDA (RS-SDA) models are implemented for online and day-ahead hourly EPF. Three years of data (i.e., January 2012 to November 2014) are utilized in this paper. The data is collected from the Texas, Arkansas, Nebraska, Indiana and Louisiana ISO hubs in the USA. Comprehensive analyses of the capabilities of the RS-SDA and SDA models in EPF are performed. The effectiveness of the proposed models is validated through comparative analyses with classical ANN, Support Vector Machine (SVM) and Multivariate Adaptive Regression Splines (MARS). Both the SDA and RS-SDA models are able to predict the electricity price accurately, with a considerably lower MAPE than the aforementioned models.
A deep learning model for Short-term Load Forecasting (STLF) is proposed by Tong et al. [11]. The features are extracted using SDA from the historical electricity load and the corresponding temperature data. A Support Vector Regressor (SVR) model is trained for day-ahead STLF. The SDA effectively extracts abstract features from the data. The SVR model trained on these extracted features forecasts electricity load with low errors. The proposed model outperforms simple SVR and ANN in terms of forecasting accuracy, which validates its performance.
The Shallow ANN (SANN) is utilized for electricity load forecasting in [12] and [13]. SANNs suffer from the problem of overfitting. To avoid overfitting, hyperparameter optimization is required, which increases the complexity of the forecasting model.
A hybrid deep learning method is applied to forecast price in [14]. Two deep learning methods are combined in this research work. Features are extracted by a Convolutional Neural Network (CNN). The short-term energy price is predicted using Long Short Term Memory (LSTM). Half-hourly price data of PJM for 2017 is used for prediction. The price of the previous 24 hours is used to predict the electricity price of the next hour. The hybrid DNN structure has 10 hidden layers: 2 convolution layers, 2 max-pooling layers, 3 Rectified Linear Unit (ReLU) layers, 1 batch normalization layer, 1 LSTM layer for prediction and, as the last hidden layer, a fully connected layer. The CNN feature extractor has 7 hidden layers and the LSTM predictor has 3 hidden layers. The output of the 7th hidden layer of the CNN feature extractor becomes the input of the LSTM predictor. The proposed method outperforms simple CNN, LSTM and various machine learning methods.

Table 1: Related work of load and price forecasting.

| Task | Forecast Horizon | Platform / Testbed | Dataset | Algorithms |
|---|---|---|---|---|
| Load and price forecasting [6] | Short-term | Hourly data of 6 states of USA | NYISO, 2015 | MRMR, Multi-block Elman ANN, Enhanced shark smell optimization |
| Price forecasting [8] | Short-term | Hourly electricity price of 6 states of USA | ISO NE, 2010-2015 | GCA, Random forest (RF), ReliefF, SVM, DE |
| Consumption forecasting [9] | Short and long-term | 6 second resolution consumption of 5 homes with 109 domestic appliances | UK-Dale, 2012-2015 | Association rule mining, Incremental k-means clustering, Bayesian network |
| Price forecasting [10] | Short-term | Hourly price of 5 hubs of MISO | USA, 2012-2014 | Stacked Denoising Autoencoders (SDA) |
| Consumption forecasting [11] | Short-term | Aggregated hourly load of four regions | Los Angeles, California, Florida, New York City, USA, August 2015-2016 | SDA, SVR |
| Consumption forecasting [12] | Short-term | Electricity market data of 3 grids: FE, DAYTOWN, and EKPC | PJM, USA, 2015 | Mutual Information (MI), ANN |
| Consumption forecasting [13] | Short-term | Electricity market data of 2 grids: DAYTOWN, and EKPC | PJM, USA, 2015 | Modified MI + ANN |
| Price forecasting [14] | Short-term | Half hourly price of PJM | Intercontinental Exchange (ICE), USA | Long Short Term Memory (LSTM), Convolutional Neural Network (CNN) |
| Price forecasting [15] | Short-term | Turkish day-ahead market electricity prices | Turkey, 2013-2016 | Recurrent Neural Network (RNN) |
| Cooling load forecasting [16] | Short-term | HVAC cooling load of an educational building | Hong Kong, 2015 | Elastic Net (ELN), SAE, RF, MLR, Gradient Boosting Machines (GBM), Extreme GB tree, SVR |
| Consumption forecasting [17] | Short-term | Hourly load of Korea Electric Power Corporation (KEPCO) | South Korea, 2012-2014 | Restricted Boltzmann Machine (RBM) |
| Consumption forecasting [18] | Short-term | Individual house consumption of 7km of Paris | Individual household electric power consumption, France, 2006-2010 | Conditional RBM (CRBM), Factored CRBM |
| Load forecasting [19] | Short-term | 15 minute resolution of one retail building | Fremont, CA | SAE, ELM |
| Load forecasting [20] | Short-term | 15 minutes cooling consumption of a commercial building in Shenzhen city | Guangdong province, South China, 2015 | Empirical Mode Decomposition (EMD), Deep Belief Networks (DBN) |
| Load forecasting [21] | Short-term | Hourly consumption from Macedonian Transmission Network Operator (MEPSO) | Republic of Macedonia, 2008-2014 | DBN |
| Load forecasting [22] | Short-term | Hourly consumption from Australia | AEMO, 2013 | EMD, DBN |
| Load forecasting [23] | Medium to long-term | Hourly consumption of a public safety building, Salt Lake City, Utah. Aggregated hourly consumption of residential buildings, Austin, Texas | USA, 2015, 2016 | LSTM |
| Load forecasting [24] | Medium-term | Half hourly metropolitan electricity consumption | France, 2008-2016 | LSTM, GA |
| Load forecasting [25] | Short-term | Hourly aggregated consumption of 6 states of USA | ISO NE, 2003-2016 | Xgboost weighted k-means, EMD-LSTM |
| Load forecasting [26] | Short-term | Ireland consumption | Smart meter database of load profile, Ireland | Pooling deep RNN |
| Load forecasting [27] | Short-term | Daily electricity consumption data | 3 Chinese cities, 2014 | Feed Forward DNN (FFDNN), Probability Density Estimation |
| Load and photovoltaic power forecasting [28] | Short-term | Hourly residential power load data | Dataport dataset, 2018 | Deep Recurrent Neural Network (DRNN) with LSTM units |
| Load forecasting [29] | Short-term | Hourly electricity market data | ISO NE, 2007-2012 | Deep RNN |
| Load forecasting [30] | Short-term | Hourly aggregated consumption of 6 states | ISO NE, USA | DRNN, FFDNN |
The authors of [15] utilize Gated Recurrent Units (GRU) in an RNN for EPF.
Recently, deep learning forecasting methods have shown good performance in electricity price [14-16] and load forecasting [17-30]. However, the interdependency of load and price is not considered in these DNN forecasting models.
In [31], the author discusses the importance of big data applications and analytics in the development of Smart Sustainable Cities (SSCs). An IoT-based framework is proposed to improve the functionalities of SSCs. The importance of accurate load and price forecasting for the stability of the smart grid is discussed. Stability of the grid improves the sustainability of SSCs. An SSC uses Information and Communication Technology (ICT) to improve quality of life, services and urban operations. It ensures that present and future environmental, social, cultural and economic requirements are fulfilled.
The authors of [32] conduct an extensive literature review on future SSCs. Besides other aspects of future SSCs, energy efficiency is also discussed in this review. The authors describe the SSC as an energy-efficient, eco-friendly and real-time city. Load demand forecasting plays a key role in energy management and efficiency.
The future trends, architecture and challenges of SSCs are reviewed in [33]. The major aspects of a smart city are illustrated in this study. The smart grid is discussed as an important component of a smart city. The role of load demand forecasting in an energy-efficient city is emphasized. Six dimensions of SSCs are explained in [34]. The authors present a road map towards SSCs. The concept of the SSC is elaborated with the help of these six dimensions, one of which is energy efficiency.
The authors of [35] discuss the present services of smart cities, such as load demand forecasting, that help to achieve a sustainable city. The short-term load of Girona University, Spain, is studied. The forecasting model consists of outlier rejection, feature selection using autocorrelation and prediction using autoregression. First, outliers are removed based on k nearest neighbors and Euclidean distance. Secondly, features that are highly correlated with the target class are selected, and features that have a high correlation with other features and a low correlation with the target class are eliminated. Finally, a classical data-driven prediction model, autoregression, is implemented for STLF. The services embedded in the studied layered architecture are described in detail, with the aim of making it part of a sustainable city.
3. Problem Statement and Contributions
The authors of [8] and [9] have used big data for predictive analytics. However, the extensive feature engineering process increases the computational complexity. The feature engineering involves denoising of inputs, feature selection and dimension reduction. After the feature engineering step, another important step is the optimization of the prediction method's hyperparameters. This optimization is crucial to achieving accurate forecast results. The feature engineering and model optimization steps make forecasting complex. To avoid the extensive feature engineering process, deep learning methods are proposed for electricity price [10] and load [11] forecasting. However, these deep learning based forecasting models forecast electricity load and price separately.
The electricity load and price signals are highly correlated. Incorporating the inherent bidirectional relation of electricity load and price in the inputs of a prediction model results in high prediction accuracy. The correlation of electricity load and price is not taken into consideration in [10] and [11]. A forecasting method is needed that accurately forecasts the electricity load and price simultaneously. In this article, a forecasting model based on deep learning is proposed. The proposed method accurately forecasts electricity load and price simultaneously, taking advantage of big data. The major contributions of this study are listed below:
• The proposed models take advantage of big data. Big data analyses of electricity load and price are presented in this study. Data and forecasting models are analyzed statistically and graphically.
• A new feature extraction scheme based on the Sparse Autoencoder (SAE) is introduced in the first proposed model. The performance of the SAE is improved by using wavelet packet denoising as a decoding function, which significantly improves the quality of the extracted features. The extracted features serve as refined information and smooth training input for the forecasting model, the Nonlinear Autoregressive Network with Exogenous variables (NARX).
• The second proposed model is an optimized Recurrent Extreme Learning Machine (RELM). The parameters of the RELM are optimized using a meta-heuristic optimization technique, differential evolution. The proposed models outperform ELM, RELM, NARX, DE-ELM and Cascade Elman ANN (CEANN) [7].
4. Proposed Model
Before the proposed forecasting models are described, this section briefly introduces the methods used in them.
5. Artificial Neural Network for Forecasting
ANNs are inspired by the learning process of biological neural networks. ANNs have the capability to model the complex patterns hidden in the data. The Multilayer Perceptron (MLP) is the simplest and most fundamental architecture of ANN [36]. The MLP comprises neurons, biases and weights. The ANN creates a mapping between the inputs $x_i$ and their respective targets $t_i$. The weights $W_i$ are updated while creating this mapping. The network learns as the weights are updated.

$$y(t) = f(W_1 x_1 + W_2 x_2 + \dots + W_n x_n) \tag{1}$$

where $W_i$ are the weights and $f$ is the activation function. The most common algorithm used for updating the weights is gradient descent. It reduces the squared error $E$ using the delta rule:

$$E = \lVert y(t) - t(t) \rVert^2 \tag{2}$$

where $t(t)$ is the target vector corresponding to the training vector $x(t)$.

$$w_{ij}^{(\ell)}(t+1) = w_{ij}^{(\ell)}(t) - \alpha \frac{\partial E}{\partial w_{ij}^{(\ell)}(t)} \tag{3}$$

$$b_{j}^{(\ell)}(t+1) = b_{j}^{(\ell)}(t) - \alpha \frac{\partial E}{\partial b_{j}^{(\ell)}(t)} \tag{4}$$

where $w_{ij}^{(\ell)}(t+1)$ is the updated weight, $w_{ij}^{(\ell)}(t)$ is the weight to be changed, $b_{j}^{(\ell)}(t)$ is the bias and $\alpha > 0$ is the learning rate.
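The gradient descent updates of equations (3) and (4) can be sketched for a single sigmoid neuron as follows. This is a minimal illustration under stated assumptions, not the training routine used in this paper: the toy data (logical OR), the function name `train_neuron`, the learning rate and the number of epochs are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_neuron(X, T, alpha=0.5, epochs=5000, seed=0):
    """Train one sigmoid neuron by gradient descent on the squared error."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])   # small random initial weights
    b = 0.0                                      # initial bias
    for _ in range(epochs):
        y = sigmoid(X @ w + b)                   # forward pass, Eq. (1)
        err = y - T                              # y(t) - t(t), as in Eq. (2)
        # Chain rule: dE/dz = 2 * (y - t) * sigmoid'(z), with sigmoid' = y * (1 - y)
        grad_z = 2.0 * err * y * (1.0 - y)
        w -= alpha * (X.T @ grad_z) / len(T)     # weight update, Eq. (3)
        b -= alpha * grad_z.mean()               # bias update, Eq. (4)
    return w, b

# Toy data (assumption): learn the logical OR of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 1], dtype=float)

w, b = train_neuron(X, T)
pred = sigmoid(X @ w + b)
print(np.round(pred, 2))
```

The gradient is averaged over the samples here, which only rescales the effective learning rate; a multilayer network applies the same update to every layer through back propagation.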
A Deep Neural Network (DNN) is an ANN with a deeper architecture, i.e., several hidden layers. A DNN is computationally more powerful than a Shallow ANN (SANN). The proposed forecasting engines are based on Deep Recurrent Neural Networks (DRNNs), i.e., NARX and LSTM.
6. Sparse Autoencoder
The SAE neural network is an unsupervised learning algorithm that applies the back propagation method with the target values set equal to the inputs, i.e., $y_i = x_i$. The SAE attempts to learn a function $h_{W,b}(x) \approx x$. In other words, the SAE tries to learn an approximation function such that the output $\hat{x}$ is similar to the input $x$. The network must reconstruct the input data. By placing constraints on the network, i.e., limiting the number of hidden units and adding sparsity, an interesting structure of the data is discovered. The network is forced to learn a compressed representation of the input, i.e., to reconstruct the input given only the vector of hidden unit activations. Generally, the sigmoid is the activation function of the autoencoder, which is designed to obtain a better representation of the input data: $h(X, W, b) = \sigma(WX + b)$. A sparse penalty term is added to the sparse autoencoder cost function to limit the average activation value of the hidden-layer neurons. Normally, a neuron is active when its output value is 1 and inactive when its output value is 0. The purpose of enforcing sparsity is to limit undesired activation. $a_j(x)$ is set as the $j$th activation value. In the process of feature learning, the activation value of the hidden-layer neurons is usually expressed as $a = \sigma(WX + b)$, where $W$ is the weight matrix and $b$ is the bias matrix. The mean activation value of the $j$th neuron in the hidden layer is defined as:

$$\rho_j = \frac{1}{n} \sum_{i=1}^{n} \left[ a_j(x_i) \right] \tag{5}$$

Figure 1: Proposed System model. Blocks: Historic Data (smart grid), ESAE Feature Extractor, MIMO Forecaster (input layer, time delay layer, hidden layers 1 and 2, output layer); outputs: load forecast and price forecast.
The average activation of the hidden layer is kept at a low value. The sparsity parameter is defined as $\rho$, and the penalty term is used to prevent $\rho_j$ from deviating from $\rho$. The Kullback-Leibler (KL) divergence [37] is used in this study as this penalty. The mathematical expression of the KL divergence is as follows:

$$KL(\rho \,\|\, \rho_j) = \rho \ln \frac{\rho}{\rho_j} + (1 - \rho) \ln \frac{1 - \rho}{1 - \rho_j} \tag{6}$$

When $\rho_j$ does not deviate from $\rho$, the KL divergence is 0; otherwise, the KL divergence gradually increases with the deviation. The cost function of the neural network is denoted by $C(W, b)$. Then, the cost function with the sparse penalty term added is:

$$C_{Sparse} = C(W, b) + \beta \sum_{j=1}^{S_2} KL(\rho \,\|\, \rho_j) \tag{7}$$

where $S_2$ is the number of neurons in the hidden layer and $\beta$ is the weight of the sparse penalty term. The essence of training a neural network is to find the appropriate weight and threshold parameters $(W, b)$. After the sparse penalty term is defined, the sparse representation can be obtained by minimizing the sparse cost function.
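The sparse penalty term of equations (5) to (7) can be sketched as follows. This is a minimal illustration: the function name `kl_sparsity_penalty` is hypothetical, the defaults $\rho = 0.05$ and $\beta = 4$ are assumed here to correspond to the sparsity proportion and sparsity regularization values reported for the ESAE, and the clipping constant is an implementation detail added only to avoid taking the logarithm of zero.

```python
import numpy as np

def kl_sparsity_penalty(activations, rho=0.05, beta=4.0):
    """Sparse penalty term: beta * sum_j KL(rho || rho_j).

    activations: array of shape (n_samples, n_hidden) holding sigmoid
    outputs a_j(x_i) in (0, 1).
    """
    rho_j = activations.mean(axis=0)              # mean activation, Eq. (5)
    rho_j = np.clip(rho_j, 1e-8, 1.0 - 1e-8)      # guard against log(0)
    kl = (rho * np.log(rho / rho_j)
          + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_j)))   # Eq. (6)
    return float(beta * kl.sum())                 # penalty term of Eq. (7)

# Hidden units whose mean activation equals rho incur (almost) no penalty,
# while units that are active half of the time are penalized.
sparse_hidden = np.full((10, 3), 0.05)
dense_hidden = np.full((10, 3), 0.5)
penalty_sparse = kl_sparsity_penalty(sparse_hidden)
penalty_dense = kl_sparsity_penalty(dense_hidden)
print(penalty_sparse, penalty_dense)
```

During training, this value would be added to the reconstruction cost $C(W, b)$, so that minimizing the total cost pushes the mean activations towards $\rho$.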
An SAE can be transformed into a Sparse Denoising Autoencoder (SDA). The data is corrupted in a stochastic manner by introducing some noise into it. The network then attempts to reconstruct the original data from the corrupted data.
The SAE is capable of discovering the correlation among the features. A refined and highly relevant feature representation is achieved using the SAE.
7. Efficient SAE (ESAE)
The Efficient SAE (ESAE) is proposed to create a better representation of electricity data that is useful for an accurate forecast of price and load. In this section, the proposed feature extractor, ESAE, is discussed in detail.
8. Pre-training of ESAE
Unsupervised pre-training is applied to initialize the weights and biases, where the input of a hidden layer is the output of the previous layer. In the pre-training step, the initial biases and weights of the autoencoder are learned.
In the proposed method, the input data $X_t$ is corrupted by introducing white noise [38]. The white noise is added to 30% of the data points, selected at random. A random process $y(t)$ is known as white noise when its power spectral density $S_y(f)$ is constant at all frequencies $f$:

$$S_y(f) = \frac{N_0}{2}, \quad \forall f \tag{8}$$

White noise describes random disturbances with small correlation periods. The generalized correlation function of white noise is defined by:

$$B(t) = \delta(t)\,\sigma^2 \tag{9}$$

where $\delta(t)$ is the delta function and $\sigma$ is a positive constant.
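The corruption step can be sketched as follows. This is a minimal illustration under stated assumptions: the noise is taken to be zero-mean Gaussian white noise, the standard deviation `sigma` is a hypothetical value (the noise level is not specified above), and the function name `corrupt_with_white_noise` and the example series are introduced only for this sketch.

```python
import numpy as np

def corrupt_with_white_noise(X, fraction=0.3, sigma=0.1, seed=0):
    """Add Gaussian white noise to a randomly selected fraction of the points.

    Returns the corrupted copy and the flat indices of the corrupted points.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_corrupt = int(round(fraction * X.size))
    # Choose distinct positions so that exactly `n_corrupt` points are affected.
    idx = rng.choice(X.size, size=n_corrupt, replace=False)
    noisy = X.copy().ravel()
    noisy[idx] += rng.normal(loc=0.0, scale=sigma, size=n_corrupt)
    return noisy.reshape(X.shape), idx

# Hypothetical normalized load series with 100 samples.
load = np.linspace(0.0, 1.0, 100)
noisy_load, corrupted_idx = corrupt_with_white_noise(load, fraction=0.3)
print(len(corrupted_idx))
```

The autoencoder is then trained to reconstruct the clean series from the corrupted one, which is what makes the learned features robust to noise.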
9. Fine-tuning of ESAE
The pre-training step is followed by the fine-tuning step. In fine-tuning, wavelet denoising is proposed as the encoding transfer function of the first hidden layer of the ESAE. The activation function of the second layer is the sigmoid. The wavelet denoising has two steps: (i) wavelet packet decomposition and (ii) the denoising and reconstruction operation. Firstly, the input time series is decomposed into different frequency bands by passing it through high-pass and low-pass filters. Then, the frequency band of the noise is set to zero. The signal is then reconstructed using the wavelet reconstruction function, which is the inverse of the wavelet decomposition function [39].
Figure 2: Step by step flow of proposed model ESAENARX. Stage 1 (feature extraction): min-max normalization of data; pre-training (corrupting input features with white noise, encoding with SAE); fine-tuning with efficient SAE. Stage 2 (prediction): extracted features, forecasting by NARX, de-normalization, price and load forecasts.
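The wavelet decomposition operation can be expressed by:

$$c_{j,k} = \sum_{n} c_{j-1,n}\, h_{n-2k}$$

$$d_{j,k} = \sum_{n} d_{j-1,n}\, g_{n-2k}, \quad k = 1, 2, \dots, N-1$$

where $c_{j,k}$ is the scale coefficient, $d_{j,k}$ is the wavelet coefficient, $h$ and $g$ are the quadrature mirror filter banks, $j$ is the level of decomposition and $N$ is the number of sampling points. The wavelet reconstruction function, which is the inverse of the wavelet decomposition, is expressed as:

$$c_{j-1,n} = \sum_{k} c_{j,k}\, h_{n-2k} + \sum_{k} d_{j,k}\, g_{n-2k} \tag{10}$$

The denoising operation is given by:

$$\hat{\omega}_{j,k} = \begin{cases} \mathrm{sign}(\omega_{j,k})\,(|\omega_{j,k}| - \lambda), & |\omega_{j,k}| \ge \lambda, \\ 0, & |\omega_{j,k}| < \lambda, \end{cases}$$

where $\hat{\omega}_{j,k}$ is the denoised signal, $\omega_{j,k}$ is the wavelet-transformed signal and $\lambda$ is the threshold.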
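The decomposition, thresholding and reconstruction steps can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single-level Haar transform rather than a full wavelet packet decomposition, the thresholding is assumed to take the standard soft-threshold form $\mathrm{sign}(\omega)(|\omega| - \lambda)$, and the function names, the example signal and the threshold value are hypothetical.

```python
import numpy as np

def haar_decompose(x):
    """Single-level Haar transform of an even-length signal."""
    x = np.asarray(x, dtype=float)
    c = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass output: scale coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass output: wavelet coefficients
    return c, d

def haar_reconstruct(c, d):
    """Inverse of haar_decompose."""
    x = np.empty(2 * len(c))
    x[0::2] = (c + d) / np.sqrt(2.0)
    x[1::2] = (c - d) / np.sqrt(2.0)
    return x

def soft_threshold(omega, lam):
    """Shrink coefficients towards zero; those below the threshold become 0."""
    omega = np.asarray(omega, dtype=float)
    shrunk = np.sign(omega) * (np.abs(omega) - lam)
    return np.where(np.abs(omega) >= lam, shrunk, 0.0)

def denoise(x, lam):
    c, d = haar_decompose(x)
    return haar_reconstruct(c, soft_threshold(d, lam))

# Hypothetical signal: each pair of samples differs by a small fluctuation.
signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 3.0, 5.0])
denoised = denoise(signal, lam=2.0)
print(denoised)
```

With this threshold all wavelet coefficients of the example fall below $\lambda$ and are set to zero, so each pair of samples is replaced by its mean; a smaller threshold would preserve more of the high-frequency detail.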
In the ESAE feature extractor, the numbers of units in hidden layers one and two are 400 and 300, respectively. The coefficient that controls the layer 2 weight regularization is set to 0.001. The sparsity regularization is 4 and the sparsity proportion is 0.05. The maximum number of epochs is 100. The algorithm used for learning the weights is scaled conjugate gradient descent.
10. Non-linear Autoregressive Network with
Exogenous Variables
NARX is an autoregressive RNN. Its feedback connections enclose several hidden layers of the network, but not the input layer. NARX has a memory that is utilized to create a nonlinear mapping between inputs and outputs. The network learns from the recurrence on the past values of the time series and the past predicted values of the network [40]. For predicting a value $y(t)$, the inputs of the NARX are $y(t-1), y(t-2), \dots, y(t-d)$. NARX can be described by the following equation:

$$\hat{y}(t+1) = f\big(y(t), y(t-1), \dots, y(t-d), x(t+1), x(t), \dots, x(t-d)\big) + \varepsilon(t) \tag{11}$$

where $\hat{y}(t+1)$ is the network's output for time $t+1$, $f(\cdot)$ is the nonlinear mapping function, $y(t), y(t-1), \dots, y(t-d)$ are the past observed values, $x(t+1), x(t), \dots, x(t-d)$ are the network's inputs, $d$ is the number of delays and $\varepsilon(t)$ is the error term. In the proposed NARX, for simultaneous forecasting of price and load, the number of delays is 2. The number of hidden layers of the network is 10. The training function is Levenberg-Marquardt.
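The input structure of equation (11) can be made concrete by building the lagged regressor matrix on which the mapping $f$ is trained. The sketch below is only an illustration under stated assumptions: the function name `narx_regressors` and the toy series are hypothetical, a single target and a single exogenous series are used for brevity, and the nonlinear network $f$ itself is not implemented.

```python
import numpy as np

def narx_regressors(y, x, d=2):
    """Build NARX training pairs for the structure of Eq. (11).

    Each row is [y(t), ..., y(t-d), x(t+1), x(t), ..., x(t-d)] and the
    corresponding target is y(t+1). `y` and `x` must have equal length.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    rows, targets = [], []
    for t in range(d, len(y) - 1):
        past_y = y[t - d:t + 1][::-1]     # y(t), y(t-1), ..., y(t-d)
        exo_x = x[t - d:t + 2][::-1]      # x(t+1), x(t), ..., x(t-d)
        rows.append(np.concatenate([past_y, exo_x]))
        targets.append(y[t + 1])
    return np.array(rows), np.array(targets)

# Toy series (assumption): y is the forecast target, x an exogenous input.
y = np.arange(10, dtype=float)
x = 10.0 * np.arange(10, dtype=float)

features, targets = narx_regressors(y, x, d=2)
print(features.shape, features[0], targets[0])
```

With $d = 2$, each row holds three past target values and four exogenous values. For the MIMO setting described above, the target would be a vector containing both load and price, and the rows would be built in the same way for each series.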
11. Long Short-term Memory
LSTM is a well-known sub-category of the RNN and is widely used for modeling sequential data. Unlike feed-forward ANNs, an LSTM uses its internal state to process an input sequence, which allows it to learn dynamic temporal behavior and to remember longer dependencies in the data. An LSTM contains three gates: the input gate, the forget gate and the output gate. It has a memory cell that keeps relevant information of the data as a memory. The purpose of the forget gate is to flush out irrelevant data. LSTM can be described by the following equations:
Suppose an input time series $x = x_1, x_2, \ldots, x_n$. The LSTM models the input time series using recurrence (as shown in equation 12):
$$h_t = f(x_t, h_{t-1}) \tag{12}$$
Where $h_t$ is the hidden state at time $t$, $x_t$ is the input at time $t$ and $h_{t-1}$ is the previous hidden state, i.e., at time $t-1$. The
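recurrence function $f(\cdot)$ contains gated operations, as shown in equations 13 to 18:
$$i_t = \sigma(w_i[x_t, h_{t-1}] + b_i) \tag{13}$$
$$f_t = \sigma(w_f[x_t, h_{t-1}] + b_f) \tag{14}$$
$$o_t = \sigma(w_o[x_t, h_{t-1}] + b_o) \tag{15}$$
$$\tilde{C}_t = \tanh(w_c[x_t, h_{t-1}] + b_C) \tag{16}$$
$$C_t = i_t \tilde{C}_t + f_t C_{t-1} \tag{17}$$
$$h_t = \tanh(C_t)\, o_t \tag{18}$$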
Where $i_t$, $f_t$ and $o_t$ are the input, forget and output gates, respectively; $w_i$, $w_f$ and $w_o$ are their respective weights and $b_i$, $b_f$ and $b_o$ are their respective biases. $C_t$ is the current state of the memory cell and $\tilde{C}_t$ is the new candidate value for the memory cell. The sigmoid function $\sigma(\cdot)$ maps the gates' values into the range 0 to 1. The gates' decisions depend on the current input $x_t$ and the previous output $h_{t-1}$. A signal is blocked if the gate's value is 0. The forget gate decides how much of the previous cell state $C_{t-1}$ is passed on. The input gate defines the amount of new input to be added to the cell state. Based on the cell state, the output gate determines which information is output. In this manner, both short-term and long-term sequence information is learned by the LSTM.
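A single LSTM time step following equations 13 to 18 can be written out directly. The sketch below is a scalar illustration (one input, one hidden unit) with arbitrary, untrained weights; the function name `lstm_step` and the dictionary layout of the parameters are assumptions made for readability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w, b):
    """One scalar LSTM step; w[g] = (weight on x_t, weight on h_prev) for gate g."""
    def pre(g):
        return w[g][0] * x_t + w[g][1] * h_prev + b[g]
    i_t = sigmoid(pre("i"))                 # input gate, eq. (13)
    f_t = sigmoid(pre("f"))                 # forget gate, eq. (14)
    o_t = sigmoid(pre("o"))                 # output gate, eq. (15)
    c_tilde = math.tanh(pre("c"))           # candidate cell value, eq. (16)
    c_t = i_t * c_tilde + f_t * c_prev      # new cell state, eq. (17)
    h_t = math.tanh(c_t) * o_t              # new hidden state, eq. (18)
    return h_t, c_t

# Arbitrary illustrative parameters (not trained values).
weights = {"i": (0.5, 0.1), "f": (0.4, 0.2), "o": (0.3, 0.1), "c": (0.6, 0.2)}
biases = {"i": 0.0, "f": 1.0, "o": 0.0, "c": 0.0}
h, c = 0.0, 0.0
for x_t in [0.2, 0.5, 0.9]:                 # a short normalized input sequence
    h, c = lstm_step(x_t, h, c, weights, biases)
```

With a strongly negative forget-gate bias, the previous cell state is almost completely discarded, which corresponds to the gate "flushing out" old information.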
LSTM is superior to ANN because it can handle the problem of vanishing or exploding gradients. This problem arises while updating the weights. The weights are updated by the delta rule, in which the gradient of the error is taken with respect to the weights (as shown in equation 3). If the gradient becomes too small, the weight updates also become small, resulting in no improvement in learning. If the gradient becomes too big, the updated weights change too much, resulting in non-convergence and instability of the network. LSTM overcomes this problem by using the memory cell $C_t$, which is able to preserve the state over a long period of time. The amount of information to be retained or discarded is controlled by the values of the forget gate $f_t$ and the input gate $i_t$. The dependency on individual inputs is also controlled. This increased regulation helps in overcoming the vanishing and exploding gradient problems.
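One way to see why the memory cell helps is to differentiate equation 17 along the cell-state path. The sketch below holds the gate activations fixed, which is a simplifying assumption, so it covers only the direct path from $C_{t-k}$ to $C_t$:

```latex
\frac{\partial C_t}{\partial C_{t-1}} = f_t
\quad\Longrightarrow\quad
\frac{\partial C_t}{\partial C_{t-k}} = \prod_{j=t-k+1}^{t} f_j
```

When the forget gates stay close to 1, this product stays close to 1 over many steps, so the error signal is preserved along this path. For example, $0.99^{50} \approx 0.605$, whereas a factor of 0.5 per step gives $0.5^{50} \approx 8.9 \times 10^{-16}$.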
12. ESAENARX Forecast Model
Deep learning is well known for its high-precision feature extraction. A sparse autoencoder deep neural network with dropout is proposed to extract useful features. This deep neural network can significantly reduce the adverse effect of overfitting, making the learned features more conducive to identification and forecasting. NARX is proposed for load and price forecasting.
A Multi Input Multi Output (MIMO) forecast model is proposed to predict the price and load simultaneously. Features are extracted using ESAE. Then the NARX network is trained for simultaneous forecasting of price and load. The system model is shown in Figure 1. The input features are: hour, temperature forecast, wind speed forecast, lagged load and lagged price. There are two targets, electricity load and price. The prediction process has the following five steps:
1. Inputs and targets are normalized using min-max normalization. Suppose an input vector $X = x_1, x_2, x_3, \ldots, x_n$, where $n$ is the number of instances in the vector. The min-max normalized value is obtained by:
$$X_{nor} = \frac{x_i - X_{min}}{X_{max} - X_{min}} \tag{19}$$
Where $i = 1, 2, \ldots, n$.
2. The normalized inputs are fed to train the ESAE feature extractor. After the ESAE is trained, the input features are encoded using this trained ESAE. The output of ESAE is the encoded features.
3. The encoded features are given as input to train the NARX network. 80% of the data is used for training, 15% for validation and 5% for testing.
4. The price and load are predicted for 168 hours, that is, one week.
5. The predicted values of load and price are de-normalized to obtain actual values. The NARX accurately predicts the price and load simultaneously.
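Steps 1, 3 and 5 can be sketched in a few lines of Python. The helper names are hypothetical, the load values are made up, and the chronological (rather than random) split is an assumption, since the text states only the 80/15/5 proportions.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] as in equation (19); also return the min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def denormalize(normalized, lo, hi):
    """Inverse min-max mapping back to the original scale."""
    return [v * (hi - lo) + lo for v in normalized]

def chronological_split(rows, train_pct=80, val_pct=15):
    """Split rows in time order into training, validation and test parts."""
    n = len(rows)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

load = [12000.0, 15000.0, 18000.0, 13500.0]      # made-up hourly load values
load_nor, load_min, load_max = min_max_normalize(load)
restored = denormalize(load_nor, load_min, load_max)
train, val, test = chronological_split(list(range(100)))
```

The minimum and maximum found during normalization are kept so that the forecasts can later be mapped back to megawatts or dollars per megawatt-hour.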
The ESAE feature extractor uses wavelet packet denoising as its decoder function, which denoises the input features while extracting them. As a result, ESAE extracts a refined and rich representation of the features. Generally, an SAE has sigmoid decoder functions. Using wavelet packet denoising instead enhances the extracted features and consequently improves the forecasting accuracy significantly. ESAENARX thus achieves good forecasting accuracy with the help of efficient feature extraction.
13. DE-RELM Forecast Model
The second proposed model is also a MIMO model, like ESAENARX. DE-RELM is an efficient method for electricity load and price forecasting. DE-RELM has three stages. In the first stage, the parameters of ELM are optimized by applying the DE algorithm. In the second stage, ELM is trained. The inputs and outputs of ELM are both the input features of load and price. With identical inputs and outputs, ELM acts like an encoder. Once the optimized ELM is trained, the learned weights are set as the initial weights of the RNN that is used for forecasting. The learned weights of ELM are the best representation of the input data. Setting these initial weights helps the RNN converge faster and forecast accurately. This is the third and final stage of DE-RELM. The number of neurons in the hidden layer of ELM and RNN is kept the same, because the dimensions of the weight vectors have to match in order to use the learned weights of ELM for the RNN.
Figure 3: Flowchart of DE-RELM. (Stage 1: ELM optimization; Stage 2: Training ELM; Stage 3: Prediction with DE-RELM.)
For the prediction of load and price, DE-RELM follows the steps shown in the flowchart in Figure 3.
1. The inputs and targets are normalized using min-max normalization (as shown in equation 19).
2. The normalized inputs are given to the ELM network as both inputs and outputs, and the network is trained.
3. The forecasting error is calculated by equation 22.
4. The DE algorithm is used to optimize the weights and biases of ELM. The objective function of DE is the minimization of the prediction error:
$$Obj = \text{minimize}\; \frac{1}{n}\sum_{i=1}^{n}\left|\frac{X_i^{act} - y_i^{for}}{X_i^{act}}\right| \times 100 \tag{20}$$
Where $X_i^{act}$ is the actual value, $y_i^{for}$ is the forecasted value and $n$ is the number of values.
5. When the forecasting error is reduced to the desired value, the optimized ELM network is trained.
6. The weights of ELM are set as the initial weights of the RNN network.
7. The RNN network predicts the price and load simultaneously.
8. The predicted values are de-normalized by the inverse min-max function:
$$X = [x^{for} \times (X_{max} - X_{min})] + X_{min} \tag{21}$$
Where $x^{for}$ is the forecasted value, $X_{max}$ is the maximum value of the actual target and $X_{min}$ is the minimum value of the actual target.
In DE-RELM, the number of neurons in the hidden layer of
ELM and RNN is 100. ELM has 1 hidden layer. The acti-
vation function of ELM is sigmoid. DE has 100 iterations,
population size is 50, mutation factor is 0.5 and the crossover
rate is 1. The RNN network has 1 hidden layer. The transfer
function is logistic sigmoid.
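The DE settings listed above (100 iterations, population size 50, mutation factor 0.5, crossover rate 1) can be illustrated with a small, self-contained sketch. Several points are assumptions: the DE/rand/1/bin variant, the greedy in-place selection, and a two-parameter linear forecaster that stands in for the ELM weights and biases so that the example stays short. The objective is the MAPE of equation 20, and the initial population is drawn from a normal distribution.

```python
import random

def mape(actual, forecast):
    """Mean absolute percentage error, the objective of equation (20)."""
    return 100.0 / len(actual) * sum(abs((a - f) / a) for a, f in zip(actual, forecast))

def differential_evolution(objective, dim, pop_size=50, iterations=100,
                           F=0.5, CR=1.0, seed=0):
    """DE/rand/1/bin (assumed variant) with greedy selection."""
    rng = random.Random(seed)
    # Normally distributed initial population.
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    cost = [objective(p) for p in pop]
    for _ in range(iterations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            # Mutation: base vector plus scaled difference of two other members.
            mutant = [pop[a][k] + F * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover; with CR = 1 the trial equals the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:           # keep the better candidate
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=lambda i: cost[i])
    return pop[best], cost[best]

# Toy data (made up): targets generated by 0.5 * x + 0.2.
inputs = [1.0, 2.0, 3.0, 4.0]
actual = [0.7, 1.2, 1.7, 2.2]

def objective(params):
    w, b = params                               # stand-in for ELM weights and biases
    return mape(actual, [w * x + b for x in inputs])

best_params, best_cost = differential_evolution(objective, dim=2)
```

In DE-RELM the parameter vector would hold the ELM weights and biases instead of `w` and `b`, and the search would stop once the forecasting error reaches the desired value.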
The proposed models have multiple inputs and outputs. The inputs are hour, temperature, wind speed, lagged price and lagged load, and the outputs are price and load. The forecast engines create a mapping between the inputs and the target price and load, so the relation between price and load is captured while creating this mapping. Price and load are affected by past price and load; therefore, lagged values are good features for prediction. The load is also affected by temperature. The temperature and lagged values are the most relevant inputs for price and load prediction.
Moreover, in ESAENARX the inputs are passed through the ESAE feature extractor, which further enhances the input to the NARX forecaster. A good mapping of relevant and informative inputs to targets results in improved forecast accuracy.
Both proposed models comprise neural network based encoders (ESAE and the ELM encoder) and deep RNN forecasters (NARX and LSTM). In the first model, ESAENARX, features are extracted by an efficient sparse autoencoder. In the second model, DE-RELM, an extreme learning machine is used as an encoder to learn the initial weights for the forecast engine.
This study is aimed at helping electricity market experts and traders. Several market operations benefit from load forecasting, such as the formulation of demand-response programs, generation scheduling and the planning of new generation sources. Traders, on the other hand, use price forecasts to build bidding strategies, and market experts design modified pricing schemes to control consumption behavior. No specific sector (i.e., residential, industrial, commercial, etc.) is targeted in this study; instead, the aggregated load and average regulation price of two power utilities are studied.
14. Applications of Proposed Models
The proposed models forecast electricity load and price.
Both price and load forecasts are useful in the case of smart
grids and micro grids. They help utility experts understand load and price correlation and dynamics. They have the following applications:
1. Minimize the risk of demand and supply imbalance. If
the generation of electricity is less than the demand,
the grids will not be able to fulfill the demands of con-
sumers. If generation is more than demand, the energy
will be wasted.
2. Enable the power utility companies to plan better since
they understand the future load demand.
3. Help to determine the required resources; such as, fu-
els required to operate the generating plants.
4. Maximize utilization of power generating plants. The
load forecasting prevents under generation and over
generation.
Several Independent System Operators (ISOs) take advantage of load forecasting. These ISOs publish day-ahead or month-ahead load forecasts on their websites, e.g., NYISO [41] and PJM [42]. The proposed forecasting models are applicable in the aforementioned real-world scenarios.
15. Simulations and Results
All the simulations are performed using MATLAB R2018a on a computer with a Core i3 processor and 8 GB RAM. This section presents the description of the datasets, the big data analysis and a discussion of the results.
16. Data Description
The data used for forecasting is taken from the well-
known electricity utilities: ISO NE (Independent System
Operator New England) [43] and PJM [44], USA. Both
datasets are publicly available.
17. ISO NE Electricity Market
ISO NE is an independent system operator that provides power to the six states of the USA known as New England: Maine, Connecticut, Massachusetts, Rhode Island, Vermont and New Hampshire. Every year, transactions of approximately $10 million are made by 400 electricity market participants in ISO NE. It has almost 7 million consumers, both business and household. Hourly electricity market data of almost 8 years, from January 2011 to June 2018, is used for prediction. The total number of measurements is 65,616. The data used in this paper is the aggregated load and the regulation capacity clearing price of the ISO NE control area.
18. PJM Electricity Market
PJM Interconnection is a Regional Transmission Organi-
zation (RTO) in the USA. It is an electric transmission sys-
tem that is part of the Eastern Interconnection grid. It sup-
plies power to 14 regions, i.e., Illinois, Delaware, Kentucky,
Indiana, Maryland, New Jersey, Michigan, Ohio, North Car-
olina, Pennsylvania, Virginia, West Virginia, District of
Columbia and Tennessee. The data taken from PJM is the hourly consumption and price of thirteen years, i.e., January 2006 to October 2018. The data comprises 112,300 measurements each of load and price.
19. Performance Evaluation
To evaluate the performance of ESAENARX, three performance measures are used: MAPE, Root Mean Square Error (RMSE) and Normalized RMSE (NRMSE). A lower error value means better forecasting accuracy. MAPE is the average absolute percentage error between the forecasted and observed values and is defined by the following equation:
$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{X_i^{act} - y_i^{for}}{X_i^{act}}\right| \times 100 \tag{22}$$
RMSE is the root mean square error of the forecasted and observed values, and NRMSE is the RMSE normalized by the range of the observed values. They are defined by:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i^{act} - y_i^{for}\right)^2} \tag{23}$$
$$NRMSE = \frac{RMSE}{\max(X_i^{act}) - \min(X_i^{act})} \tag{24}$$
Where $X_i^{act}$ is the observed value, $y_i^{for}$ is the forecasted value and $n$ is the number of values.
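The three error measures translate directly into code. The following sketch uses hypothetical function names and made-up observed and forecasted values; it is meant only to make equations 22 to 24 concrete.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, equation (22)."""
    return 100.0 / len(actual) * sum(abs((a - f) / a) for a, f in zip(actual, forecast))

def rmse(actual, forecast):
    """Root mean square error, equation (23)."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def nrmse(actual, forecast):
    """RMSE divided by the range of the observed values, equation (24)."""
    return rmse(actual, forecast) / (max(actual) - min(actual))

observed = [100.0, 200.0, 400.0]     # made-up observed values
predicted = [110.0, 190.0, 400.0]    # made-up forecasts
errors = (mape(observed, predicted), rmse(observed, predicted), nrmse(observed, predicted))
```

Because MAPE divides by the observed value, it is undefined when an observation is exactly zero, which is worth keeping in mind for price series.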
20. Big Data Analytics of Electricity Price and
Demand
In this research study, the big data of load and price are analyzed in depth. Both visual and statistical analyses are performed. The visual analyses are presented in graphs. The visual analyses of ISO NE load are shown in Figures 4, 8, 11 and 13, and those of PJM load in Figures 14, 18, 21 and 23. The price analyses of ISO NE are presented in Figures 5, 9, 10 and 12, and those of PJM in Figures 15, 19, 20 and 22. The price-demand relation of ISO NE is shown in Figures 6 and 7, and that of PJM in Figures 16 and 17. The statistical analysis of the forecast error is shown in Table 2.
ISO NE price and load have daily and weekly seasonality. Price and load are strongly related in the ISO NE market. The load of 8 years is shown in Figure 4 and the price in Figure 5. The scatter plot in Figure 6 shows the directly proportional relation of price and demand, together with the correlation coefficient. The normalized load and price of one week are shown in Figure 7 for better visualization of their bidirectional relation. The price elasticity of demand is a factor that describes changes in demand with respect to changes in price. Usually, demand decreases if the price increases; however, the price elasticity of power demand is low. According to the analysis presented in [45], the price elasticity of demand is −0.1 or lower within a year in the USA. The season affects energy consumption and price. In the USA there are four seasons in a year: spring from March to May, summer from June to August, autumn (fall) from September to November and winter from December to February. The summer season has the highest electricity consumption of the year, as shown in Figure 8. The peak consumption hours of summer are from 1:00 pm to 5:00 pm on weekdays. In winter (December to January), the peak consumption hours are from 5:00 pm to 7:00 pm on weekdays. In ISO NE there are two peak load points in a day. The first peak is around 11:00 am and the second peak is between 4:00 pm and 5:00 pm (as shown in Figure 8). The consumption on 1st January, 1st April, 1st July and 1st October is shown in Figures 8 and 18. These four days are from the four different seasons of the year.
Prices of the same four days are shown in Figures 9 and 19. Both consumption and price are highest in the summer season. Building cooling is required in the hot summer weather, and air conditioners consume a lot of power, which is the major reason for the increase in energy consumption. Electricity prices are relatively high in winter too. The electricity price and load are lower in the spring season than in the rest of the year, because in moderate weather building heating or cooling is not required, which reduces consumption and ultimately price. The electricity consumption pattern is fixed by the seasons and the time of use. Consumption is higher in working hours and lower in non-working hours. The load trend has fewer variations than the price trend. Mostly, price and load increase and decrease at the same time. However, there are a few points in time where the energy price increases sharply and unexpectedly even though the load does not increase accordingly (as shown in Figure 7, between hours 75 and 82, and Figure 17, between hours 30 and 35). Such unexpected changes in price are due to external factors other than consumption. The factors that influence energy price are: available Renewable Energy Resources (RES), fuel prices, economic conditions, excessive use penalties and transmission contingencies. The load shows little or no variation in response to most of these factors; energy consumption is mainly affected by weather conditions. Electricity consumption and price have continued to increase over the last 8 years, as is clear from Figures 4 and 5. The visual representation of past years' consumption enables utility experts to visualize the increasing demand, which helps in planning new generation plants to satisfy future power demand.
PJM load and price of 13 years (2006-2018) are shown in Figure 14 and Figure 15, respectively. The scatter plot in Figure 6 illustrates the relation of price and load in ISO NE, and Figure 16 shows the price-demand relation in the PJM electricity market. The direct proportionality of the load and price signals can be seen in these two figures. In Figures 7 and 17, the normalized load and price of the 1st week of January 2018 are plotted. The correlation of the price and load signals is demonstrated in these two figures.
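The correlation coefficients reported in the scatter plots can be computed as follows. The sketch assumes the Pearson correlation coefficient, which the text does not name explicitly, and uses made-up load and price values rather than the ISO NE or PJM data.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

load = [11000.0, 12500.0, 14000.0, 16000.0, 15000.0]   # made-up hourly load (MW)
price = [22.0, 30.0, 41.0, 75.0, 48.0]                 # made-up hourly price ($/MWh)
r = pearson(load, price)
```

A value close to 1 indicates that price rises with load, which is the direct proportionality visible in the scatter plots.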
The proposed models ESAENARX and DE-RELM are used for short-term load and price forecasting. The forecast period is one week, that is, 168 hours. The results of the ISO NE price and load forecasts for the 1st week of June 2018 are shown in Figure 10 and Figure 11. The PJM price and load forecasts for the 1st week of September 2018 are shown in Figures 20 and 21, respectively. The actual and forecasted values are plotted, and the forecasted values follow the trend of the actual values. The forecasted load follows the actual load more closely than the forecasted price follows the actual price, i.e., the price forecast is slightly less accurate than the load forecast. This is because the load has a repetitive pattern, whereas the price pattern is volatile.
Price data exhibit certain characteristics: volatility and sudden, sharp spikes and changes. The nature of price makes its forecasting difficult, and learning the price pattern requires great effort. Only refined features combined with a good prediction method can produce an accurate price forecast. It is clear from the results of the experiments that ESAENARX forecasts price and load very well.
21. Comparison and Discussion
The proposed methods are compared with four ANN forecasting methods: NARX, ELM, DE-ELM and RELM. These methods are widely used in electricity load and price forecasting. The ESAENARX, ELM, enhanced ELM, NARX and RELM results for the ISO NE price and load forecasts are shown in Figure 12 and Figure 13, respectively. ESAENARX is able to follow the price and load trends better than the compared methods. The reason behind the better forecast accuracy is the representative features extracted by the proposed feature extractor, ESAE. The NARX forecaster is trained with the extracted features and performs very well. The proposed method takes advantage of the strengths of both SAE and NARX. The SAE is further made efficient for better performance. A detailed comparison of all the compared methods is presented in this section. The results and reasoning are elaborated with the comparative analysis. Moreover, the strengths and limitations of the compared methods are highlighted.
Figure 4: Load of January 2011 to March 2018, ISO NE. (Axes: Hours; Load (MW).)
Figure 5: Price of January 2011 to March 2018, ISO NE. (Axes: Hours; Price ($/MWh).)
Figure 6: Price-demand signals relation of January 2018 to March 2018, ISO NE. (Axes: Load (MW); Price ($/MWh). Correlation coefficient = 0.62.)
Figure 7: Normalized load and price of first week of June 2018, ISO NE. (Axes: Hours; Load (MW) and Price ($/MWh).)
The effect of the proposed feature engineering is clear from the numerical results. The forecast accuracy of ESAENARX with extracted features is much better than that of simple NARX. The extracted features are informative; therefore, the forecaster is able to model the data better and forecast with greater accuracy.
The proposed methods are compared with three types of ELMs: ELM, DE-ELM and RELM. The comparative analysis of these methods is given below.
Figure 8: One day consumption of all four seasons, ISO NE. (Axes: Hours; Load (MW).)
Figure 9: One day energy price of all four seasons, ISO NE. (Axes: Hours; Price ($/MWh).)
Figure 10: Forecasted and observed price of first week of June 2018, ISO NE. (Axes: Hours; Price ($/MWh).)
Figure 11: Forecasted and observed load of first week of June 2018, ISO NE. (Axes: Hours; Load (MW).)
Sana et al.: Preprint submitted to Elsevier Page 11 of 16
Big Data Predictive Analytics of Electricity Load and Price
0 20 40 60 80 100 120 140 160
Hours
0
50
100
150
200
250
Price ($/MWh)
Observed
ESAENARX
ELM
NARX
DE-ELM
RELM
DE-RELM
CEANN
Figure 12: Comparison of ESAENARX and DE-RELM price
prediction with NARX, ELM and DE-ELM, ISO NE.
0 60 120 170
Hours
0
1
2
3
4
Load (MW)
104
Observed
ESAENARX
ELM
NARX
RELM
DE-ELM
DE-RELM
CEANN
Figure 13: Comparison of ESAENARX and DE-RELM load
prediction with NARX, ELM and DE-ELM, ISO NE.
0 1 2 3 4 5 6 7
Hours 104
0.5
1
1.5
Load (MW)
105
Figure 14: Load of PJM from January 2010 to March 2018.
0 1 2 3 4 5 6 7
Hours 104
0
500
1000
Price ($/MWh)
Figure 15: Price of PJM from January 2010 to March 2018.
The ELM is optimized using a meta-heuristic optimization algorithm named differential evolution. The initial weights and biases of ELM's hidden and output layers are optimized using DE. DE is an optimization method that iteratively improves the performance of an algorithm with respect to the optimization function. In the case of ELM, the performance is improved when the forecast accuracy improves. The objective function is to reduce the forecast error on the validation data of electricity load and price. First, a population of weights and biases is generated. The population follows the normal distribution. For every selected weight combination, the NRMSE and MAE are calculated. The crossover and mutation operations are performed to generate new combinations of weights and biases. The optimized combination of weights and biases is obtained after multiple iterations of DE. The optimized weights and biases are used in ELM for price and load forecasting on the test data. DE-ELM has a lower error than simple ELM, because its initial weights and biases are optimized according to the data. The accuracy of DE-ELM is better than that of ELM and slightly worse than that of RELM in load forecasting. However, for price forecasting, the performance of DE-ELM degrades. The price data has high nonlinearity and dependency on exogenous variables. Therefore, the relevant features of price need to be extracted carefully. The proposed feature extractor ESAE is capable of extracting the fine details of the relevant data. Therefore, the proposed method ESAENARX shows good accuracy for both price and load forecasting.
Figure 16: Price-demand signals relation of PJM from January 2018 to March 2018. (Axes: Load (MW); Price ($/MWh). Correlation coefficient = 0.87.)
Figure 17: Normalized load and price of PJM first week of January 2018. (Axes: Hours; Load (MW) and Price ($/MWh).)
Figure 18: One day consumption of all four seasons, PJM. (Axes: Hours; Load (MW).)
Figure 19: One day energy price of all four seasons, PJM. (Axes: Hours; Price ($/MWh).)
Figure 20: Actual and predicted price of PJM. (Axes: Hours; Price ($/MWh).)
Figure 21: Actual and predicted load of PJM. (Axes: Hours; Load (MW).)
Figure 22: Comparison of ESAENARX and DE-RELM price prediction with NARX, ELM and DE-ELM, PJM. (Axes: Hours; Price ($/MWh). Series: Observed, ESAENARX, NARX, ELM, DE-ELM, RELM, DE-RELM, CEANN.)
Figure 23: Comparison of ESAENARX and DE-RELM load prediction with NARX, ELM and DE-ELM, PJM. (Axes: Hours; Load (MW). Series: Observed, ESAENARX, ELM, NARX, DE-ELM, RELM, DE-RELM, CEANN.)
RELM is a variant of the recurrent neural network. It is a combination of two methods, ELM and RNN. ELM acts as an encoder, where the inputs and outputs of the network are the same, i.e., the input features. The learned weights of the ELM network are set as the initial weights of the RNN. Because the inputs and outputs of the ELM network are the same, the learned weights are a good representation of the input features. The number of neurons in the hidden layer of ELM and RNN is kept the same. Two ELM encoders are trained, one for the hidden layer's weights of the RNN and a second for the output layer's weights of the RNN. The learned weights make the RNN converge faster and to a better solution. The results of RELM are slightly better than those of DE-ELM and comparable to those of NARX. Both RELM and NARX belong to the same category of neural network, known as the recurrent neural network.
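The encoder role of ELM can be illustrated with a small sketch in which the inputs also serve as targets. It assumes the standard ELM training scheme (a random hidden layer followed by a least-squares solution for the output weights); the small ridge term, the helper names and the toy data are assumptions added for numerical stability and brevity, and the transfer of the weights to the RNN is not shown.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(x, W, b):
    """Sigmoid activations of the hidden layer for one input vector x."""
    return [sigmoid(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b_j)
            for w, b_j in zip(W, b)]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + list(B[r]) for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v_r - factor * v_c for v_r, v_c in zip(M[r], M[col])]
    return [row[n:] for row in M]

def train_elm(X, T, n_hidden, ridge=1e-6, seed=0):
    """Random hidden layer, then output weights from regularized least squares."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = [hidden_layer(x, W, b) for x in X]
    # Normal equations: (H^T H + ridge * I) beta = H^T T
    HtH = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    HtT = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(len(T[0]))]
           for i in range(n_hidden)]
    return W, b, solve(HtH, HtT)

def predict(x, W, b, beta):
    h = hidden_layer(x, W, b)
    return [sum(h[i] * beta[i][k] for i in range(len(h)))
            for k in range(len(beta[0]))]

# Toy normalized features (made-up values); the inputs double as targets.
X = [[0.1, 0.9], [0.4, 0.6], [0.5, 0.5], [0.8, 0.2], [0.9, 0.3]]
W, b, beta = train_elm(X, X, n_hidden=4)
reconstruction = [predict(x, W, b, beta) for x in X]
```

Only the output weights `beta` are fitted, in a single linear solve, while the random hidden-layer parameters stay fixed; this is why the quality of the initial weights matters so much for ELM.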
The second proposed method, DE-RELM, performs reasonably well on load forecasting. Its load forecasting results are much better than those of the other techniques and comparable to ESAENARX. However, no significant improvement is seen in the price forecast, whereas ESAENARX performs equally well for both load and price. DE-RELM trains the forecaster starting from learned weights, which yields only a minor improvement that is not comparable to ESAENARX. For the price forecast, only properly extracted features can improve accuracy. ESAE extracts the relevant and most informative features, which improves the forecast accuracy.
ELM has the worst forecast results among the six compared methods, because ELM is a feed-forward network whose weights are learned once in a forward pass and never updated. Therefore, to achieve acceptable forecast results, the initial weights of the ELM have to be well optimized. NARX performs better than ELM. However, its forecast results are not as accurate as those of the proposed methods ESAENARX and DE-RELM. The MAPE and NRMSE errors are shown in Table 2. The forecast accuracy of all six methods is in the sequence: ESAENARX > DE-RELM > NARX > DE-ELM > RELM > ELM.
The lower error compared with the other methods verifies the good performance of the ESAENARX forecast model. The PJM results in Figure 22 and Figure 23 show the better accuracy of ESAENARX and DE-RELM as compared to ELM, DE-ELM, RELM and NARX. The MAPE and NRMSE of ESAENARX, DE-RELM, ELM, DE-ELM, RELM, NARX and CEANN [7] are listed in Table 2. The efficiency of ESAENARX and DE-RELM is confirmed by their lower MAPE and RMSE compared to the mentioned methods.
The computational time of both proposed models is presented in Table 3. The computational time of ESAENARX is higher than that of DE-RELM because the feature extractor ESAE involves pre-training and fine-tuning steps. Both models take more time for training on the PJM data, because the PJM dataset is larger than the ISO NE dataset.

Table 2
Comparison of forecasting errors.

| Market | Forecast | Method | MAPE | RMSE | NRMSE |
|---|---|---|---|---|---|
| ISO NE | Load | ELM | 74.59 | 7.82 | 1.53 |
| ISO NE | Load | NARX | 1.35 | 4.35 | 0.37 |
| ISO NE | Load | DE-ELM | 21.73 | 5.23 | 0.41 |
| ISO NE | Load | RELM | 18.78 | 4.62 | 0.37 |
| ISO NE | Load | CEANN [7] | 8.62 | 3.75 | 0.57 |
| ISO NE | Load | DE-RELM | 7.78 | 3.14 | 0.32 |
| ISO NE | Load | ESAENARX | 1.13 | 2.27 | 0.03 |
| ISO NE | Price | ELM | 89.95 | 9.78 | 1.91 |
| ISO NE | Price | NARX | 8.29 | 5.24 | 0.89 |
| ISO NE | Price | DE-ELM | 28.06 | 6.92 | 0.32 |
| ISO NE | Price | RELM | 21.06 | 5.62 | 0.28 |
| ISO NE | Price | CEANN [7] | 19.96 | 4.45 | 0.96 |
| ISO NE | Price | DE-RELM | 18.62 | 3.75 | 0.34 |
| ISO NE | Price | ESAENARX | 3.32 | 2.85 | 0.08 |
| PJM | Load | ELM | 72.32 | 21.2 | 1.92 |
| PJM | Load | NARX | 32 | 9.26 | 1.8 |
| PJM | Load | DE-ELM | 6.52 | 9.18 | 0.08 |
| PJM | Load | RELM | 1.14 | 9.04 | 0.032 |
| PJM | Load | CEANN [7] | 3.87 | 8.96 | 0.64 |
| PJM | Load | DE-RELM | 1.09 | 5.24 | 0.028 |
| PJM | Load | ESAENARX | 1.08 | 3.86 | 0.03 |
| PJM | Price | ELM | 99 | 21.6 | 2.19 |
| PJM | Price | NARX | 8.78 | 18.72 | 0.16 |
| PJM | Price | DE-ELM | 18.49 | 21.76 | 0.35 |
| PJM | Price | RELM | 11.09 | 18.96 | 0.52 |
| PJM | Price | CEANN [7] | 10.74 | 8.76 | 0.2604 |
| PJM | Price | DE-RELM | 10.56 | 7.24 | 0.18 |
| PJM | Price | ESAENARX | 4.32 | 4.67 | 0.12 |

Table 3
Computational time of proposed algorithms.

| Model | Dataset | Training Time (s) | Testing Time (s) |
|---|---|---|---|
| ESAENARX | ISO NE | 162 | 37 |
| ESAENARX | PJM | 187 | 53 |
| DE-RELM | ISO NE | 104 | 28 |
| DE-RELM | PJM | 123 | 29 |
22. Conclusion
In this paper, electricity load and price forecasting is considered for the ISO NE and PJM markets, which regulate price and demand in the power systems of the USA. The modeling of electricity load and price is addressed by two new deep learning based models: ESAENARX and DE-RELM. Descriptive and predictive analytics of electricity big data are performed. The proposed methods consider the bidirectional impacts of demand and prices on each other and capture the load and price interdependencies in the past market data. The following conclusions are drawn from this study:
1. The big data analytics unveils insightful information about consumer behavior and increasing demand. This information helps in the formulation of new demand-response programs and in long-term decisions, such as upscaling the grid to satisfy future demand. Consequently, grid stability is significantly improved.
2. The proposed feature extractor, ESAE, significantly improves the quality of the extracted features, resulting in accurate forecasting. The functionality of ESAE is improved by the proposed combination of decoder functions.
3. The proposed models efficiently capture price-demand trends in energy big data. Numerical results show that the proposed forecasting models have lower MAPE and RMSE than the compared methods.
4. The feasibility and practicality of the proposed models are confirmed by their accuracy on well-known real electricity market data.
In future work, the SAE feature extractor will be enhanced using multiple combinations of encoder and decoder functions. The effect of each combination on the performance of the feature extractor will be examined. A comparative analysis will be performed on the enhanced feature extractor in order to propose a generalized SAE that performs well on multiple scenarios and datasets. The proposed models can be implemented in real-world smart grid or micro grid scenarios in order to improve power system operations.
References
[1] Liu Y, Wang W, Ghadimi N. Electricity load forecasting
by an improved forecast engine for building level con-
sumers. Energy. 2017 Nov 15;139:18-30.
[2] Akhavan-Hejazi H, Mohsenian-Rad H. Power systems
big data analytics: An assessment of paradigm shift bar-
riers and prospects. Energy Reports. 2018 Nov 30;4:91-
100.
[3] Jiang H, Wang K, Wang Y, Gao M, Zhang Y. Energy big
data: A survey. IEEE Access. 2016; 4:3844-61.
[4] Zhou K, Fu C, Yang S. Big data driven smart energy
management: From big data to big insights. Renewable
and Sustainable Energy Reviews. 2016 Apr 1;56:215-
25.
[5] Zhang Q, Yang LT, Chen Z, Li P. A survey on deep
learning for big data. Information Fusion. 2018 Jul 31;
42:146-57.
[6] Ghasemi A, Shayeghi H, Moradzadeh M, Nooshyar M.
A novel hybrid algorithm for electricity price and load
forecasting in smart grids with demand-side management.
Applied Energy. 2016 Sep 1;177:40-59.
[7] Gao W, Darvishan A, Toghani M, Mohammadi M, Abe-
dinia O, Ghadimi N. Different states of multi-block
based forecast engine for price and load prediction. In-
ternational Journal of Electrical Power & Energy Sys-
tems. 2019 Jan 1;104:423-35.
[8] Wang K, Xu C, Zhang Y, Guo S, Zomaya A. Robust
big data analytics for electricity price forecasting in the
smart grid. IEEE Transactions on Big Data. 2017 Jul 5,
DOI: 10.1109/TBDATA.2017.2723563.
[9] Singh S, Yassine A. Big data mining of energy time
series for behavioral analytics and energy consumption
forecasting. Energies. 2018 Feb 20;11(2):452.
[10] Wang L, Zhang Z, Chen J. Short-term electricity price
forecasting with stacked denoising autoencoders. IEEE
Transactions on Power Systems. 2017 Jul;32(4):2673-
81.
[11] Tong C, Li J, Lang C, Kong F, Niu J, Rodrigues JJ.
An efficient deep model for day-ahead electricity load
forecasting with stacked denoising autoencoders. Jour-
nal of Parallel and Distributed Computing. 2018 Jul
1;117:267-73.
[12] Ahmad A, Javaid N, Guizani M, Alrajeh N, Khan ZA.
An accurate and fast converging short-term load fore-
casting model for industrial applications in a smart grid.
IEEE Transactions on Industrial Informatics. 2017 Oct
1;13(5):2587-96.
[13] Ahmad A, Javaid N, Alrajeh N, Khan ZA, Qasim U,
Khan A. A modified feature selection and artificial neu-
ral network-based day-ahead load forecasting model for
a smart grid. Applied Sciences. 2015 Dec 11;5(4):1756-
72.
[14] Kuo PH, Huang CJ. An Electricity Price Forecasting
Model by Hybrid Structured Deep Neural Networks.
Sustainability. 2018 Apr 21;10(4):1280.
[15] Ugurlu U, Oksuz I, Tas O. Electricity Price Forecasting
Using Recurrent Neural Networks. Energies. 2018 Apr
23;11(5):1-23.
[16] Fan C, Xiao F, Zhao Y. A short-term building cooling
load prediction method using deep learning algorithms.
Applied Energy. 2017 Jun 1;195:222-33.
[17] Ryu S, Noh J, Kim H. Deep neural network based de-
mand side short term load forecasting. Energies. 2016
Dec 22;10(1):3.
[18] Mocanu E, Nguyen PH, Gibescu M, Kling WL. Deep
learning for estimating building energy consumption.
Sustainable Energy, Grids and Networks. 2016 Jun
1;6:91-9.
[19] Li C, Ding Z, Zhao D, Yi J, Zhang G. Building energy
consumption prediction: An extreme deep learning ap-
proach. Energies. 2017 Oct 7;10(10):1525.
[20] Fu G. Deep belief network based ensemble approach
for cooling load forecasting of air-conditioning system.
Energy. 2018 Apr 1;148:269-82.
[21] Dedinec A, Filiposka S, Dedinec A, Kocarev L.
Deep belief network based electricity load forecasting:
An analysis of Macedonian case. Energy. 2016 Nov
15;115:1688-700.
[22] Qiu X, Ren Y, Suganthan PN, Amaratunga GA. Empir-
ical mode decomposition based ensemble deep learning
for load demand time series forecasting. Applied Soft
Computing. 2017 May 1;54:246-55.
[23] Rahman A, Srikumar V, Smith AD. Predicting electric-
ity consumption for commercial and residential build-
ings using deep recurrent neural networks. Applied En-
ergy. 2018 Feb 15;212:372-85.
[24] Bouktif S, Fiaz A, Ouni A, Serhani M. Optimal deep
learning LSTM model for electric load forecasting using
feature selection and genetic algorithm: Comparison
with machine learning approaches. Energies. 2018 Jun
22;11(7):1636.
[25] Zheng H, Yuan J, Chen L. Short-term load forecast-
ing using EMD-LSTM neural networks with a Xgboost
algorithm for feature importance evaluation. Energies.
2017 Aug 8;10(8):1168.
[26] Shi H, Xu M, Li R. Deep learning for household load
forecasting-A novel pooling deep RNN. IEEE Transac-
tions on Smart Grid. 2018 Sep;9(5):5271-80.
[27] Guo Z, Zhou K, Zhang X, Yang S. A deep learning
model for short-term power load and probability density
forecasting. Energy. 2018 Oct 1;160:1186-200.
[28] Wen L, Zhou K, Yang S, Lu X. Optimal load dispatch
of community microgrid with deep learning based solar
power and load forecasting. Energy. 2019 Jan 16.
[29] Torres JF, Fernandez AM, Troncoso A, Martinez-
Alvarez F. Deep learning-based approach for time series
forecasting with application to electricity load. In In-
ternational Work-Conference on the Interplay Between
Natural and Artificial Computation 2017 Jun 19 (pp.
203-212). Springer, Cham.
[30] Din GM, Marnerides AK. Short term power load fore-
casting using deep neural networks. In 2017 Interna-
tional Conference on Computing, Networking and Com-
munications (ICNC) 2017 Jan 26 (pp. 594-598). IEEE.
[31] Bibri SE. The IoT for smart sustainable cities of the
future: An analytical framework for sensor-based big
data applications for environmental sustainability.
Sustainable Cities and Society. 2018 Apr 1;38:230-253.
[32] Bibri SE, Krogstie J. Smart sustainable cities of the
future: An extensive interdisciplinary literature review.
Sustainable Cities and Society. 2017 May 1;31:183-212.
[33] Silva BN, Khan M, Han K. Towards sustainable smart
cities: A review of trends, architectures, components,
and open challenges in smart cities. Sustainable Cities
and Society. 2018 Apr 1;38:697-713.
[34] Ibrahim M, El-Zaart A, Adams C. Smart sustainable
cities roadmap: Readiness for transformation towards
urban sustainability. Sustainable Cities and Society. 2018
Feb 1;37:530-540.
[35] Massana J, Pous C, Burgas L, Melendez J, Colomer J.
Identifying services for short-term load forecasting using
data driven models in a Smart City platform. Sustainable
Cities and Society. 2017 Jan 1;28:108-17.
[36] White, B.W. Principles of neurodynamics: Perceptrons
and the theory of brain mechanisms. Spartan Books,
Washington DC. 1963.
[37] Youssef A, Delpha C, Diallo D. An optimal fault
detection threshold for early detection using
Kullback–Leibler divergence for unknown distribution data.
Signal Processing. 2016 Mar 1;120:266-79.
[38] Hida T, Kuo HH, Potthoff J, Streit L. White noise: an
infinite dimensional calculus. Springer Science & Busi-
ness Media; 2013 Jun 29.
[39] Chen S, Billings SA, Grant PM. Non-linear system
identification using neural networks. International
Journal of Control.
[40] Chen X, Li S, Wang W. New de-noising method for
speech signal based on wavelet entropy and adaptive
threshold. Journal of Information & Computational Sci-
ence. 2015;12(3):1257-65.
[41] NYISO Market Operation Data,
https://www.nyiso.com/load-data (Last visited on 16th March 2019)
[42] PJM Market Operation Data, https://www.pjm.com
(Last visited on 16th March 2019)
[43] ISO NE Market Operations Data,
https://www.iso-ne.com/isoexpress/web/reports/pricing/-/tree/zone-info
(Last visited on 10th November 2018)
[44] PJM Market Operations Data, https://dataminer2.pjm.com
(Last visited on 10th November 2018)
[45] Burke PJ, Abayasekara A. The price elasticity of
electricity demand in the United States: A
three-dimensional analysis. Energy J. 2017;39(2):123-145.
Sana et al.: Preprint submitted to Elsevier Page 16 of 16
... So, accurate short term forecasting is necessary as it is beneficial both for consumer and seller. In [23][24][25][26][27][28][29], authors highlight the issues in price and load forecasting ( Table 2). ...
Conference Paper
Full-text available
Conventional grid moves towards Smart Grid (SG). In conventional grids, electricity is wasted in generation-transmissions-distribution, and communication is in one direction only. SG is introduced to solve prior issues. In SG, there are no restrictions, and communication is bi-directional. Electricity forecasting plays a significant role in SG to enhance operational cost and efficient management. Load and price forecasting gives future trends. In literature many data-driven methods have been discussed for price and load forecasting. The objective of this paper is to focus on literature related to price and load forecasting in last four years. The author classifies each paper in terms of its problems and solutions. Additionally, the comparison of each proposed technique regarding performance are presented in this paper. Lastly, papers limitations and future challenges are discussed.
... In most cases, the available data has missing values, outliers and is unscaled. As a result, the accuracy of the classification models is reduced [11]. ...
Article
Full-text available
The problem of electricity theft is exponentially increasing around the globe, which is harmful to the power sectors and consumers. The recent development in the advanced metering infrastructure brings opportunities for experts to identify the electricity thieves in the smart grid community. Many advancements are made in the area of the smart grid for Electricity Theft Detection (ETD), where the data collected from the smart meters is utilized. However, the problems of imbalanced distribution of data and inaccurate classification are not efficiently addressed. Therefore, to overcome the problems, machine learning and deep learning models are proposed for ETD. Initially, to refine the smart meters' data, pre-processing methods are used. Then, the class imbalance problem is solved through Synthetic Minority Oversampling Tomek Links (ST-Links). It solves the classifier's biasness problem, which occurs due to imbalanced data. It achieves the benefits of both data oversampling and undersampling. Afterwards, an AlexNet and peephole long short-term memory network based feature extractor with an attention layer is developed to extract the relevant features from electricity consumption profiles that are most suitable to classify honest and theft consumers. After the extraction of suitable features, the classification of consumers is performed by an echo state neural network. Moreover, an evolutionary grey wolf optimization technique is utilized to tune the hyper-parameters of the proposed model. A paired t-test is also applied on the final classification results for a reliable assessment of the proposed model. The simulations are conducted on a realistic smart meters' dataset of China to check the performance of the proposed model. In addition, different benchmark models are implemented to perform a comparative analysis. 
Different meaningful performance metrics are considered for the fair evaluation of the proposed model: Matthews Correlation Coefficient (MCC), F1-score, Area Under Curve (AUC), precision and recall. The simulation results depict that the proposed model obtains accuracy, recall, F1-score, AUC, PR-AUC, precision and MCC score of 96.3%, 92.1%, 92.0%, 96.4%, 97.3%, 90.0% and 84.0%, respectively. It is worth mentioning that the application of the proposed solution is quite general. Therefore, it can be used by the power companies to overcome the power losses in the energy sector. INDEX TERMS AlexNet, Data pre-processing, electricity theft detection, echo state neural network, imbalanced data, peephole LSTM, supervised learning, smart meter.
... In [58,[79][80][81][82][83][84][85][86][87][88], despite of extensive uses of ML techniques, no one focuses on the selection of optimal features. In [46,[89][90][91][92][93][94]123], the authors give possibilities of implementing ML classifiers for detection of NTLs and describe the advantage of selecting optimal features and their impacts on classifier performance. One of main challenges [95] that limited the classification ability of existing methods are high dimensionality of data. ...
Research Proposal
Full-text available
In this synopsis, the first solution introduces a hybrid deep learning model, which tackles the class imbalance problem and curse of dimensionality and low detection rate of existing models. The proposed model integrates benefits of both GoogLeNet and gated recurrent unit. The one dimensional EC data is fed into GRU to remember periodic patterns. Whereas, GoogLeNet model is leveraged to extract latent features from the two dimensional weekly stacked EC data. Furthermore , the time least square generative adversarial network is proposed to solve the class imbalance problem. The second solution presents a framework, which is employed to solve the curse of dimensionality issue. In literature, the existing studies are mostly concerned with tuning the hyperparameters of ML/ DL methods for efficient detection of NTL. Some of them focus on the selection of prominent features from data to improve the performance of electricity theft detection. However, the curse of dimensionality affects the generalization ability of ML/ DL classifiers and leads to computational, storage and overfitting problems. Therefore, to deal with above-mentioned issues, this study proposes a system based on metaheuristic techniques (artificial bee colony and genetic algorithm) and denoising autoencoder for electricity theft detection using big data in electric power systems. The third solution introduces a hybrid deep learning model for prediction of upwards and downwards trends in financial market data. The financial market exhibits complex and volatile behavior that is difficult to predict using conventional machine learning (ML) and statistical methods, as well as shallow neural networks. Its behavior depends on many factors such as political upheavals , investor sentiment, interest rates, government policies, natural disasters, etc. However, it is possible to predict upward and downward trends in financial market behavior using complex DL models. 
In this synopsis, we have proposed three solutions to solve different issues in smart grids and financial market. The validations of proposed solutions will be done in thesis work using real-world datasets.
... In [58,79,80,81,82,83,84,85,86,87,88], despite of extensive uses of ML techniques, no one focuses on the selection of optimal features. In [46,123,89,90,91,92,93,94], the authors give possibilities of implementing ML classifiers for detection of NTLs and describe the advantage of selecting optimal features and their impacts on classifier performance. One of main challenges [95] that limited the classification ability of existing methods are high dimensionality of data. ...
Thesis
Full-text available
Data science is an emerging field, which has applications in multiple disciplines; like healthcare, advanced image recognition, airline route planning, augmented reality, targeted advertising, etc. In this thesis, we have exploited its applications in smart grids and financial markets with three major contributions. In the first two contributions, machine learning (ML) and deep learning (DL) models are utilized to detect anomalies in electricity consumption (EC) data, while in third contribution, upwards and downwards trends in the financial markets are predicted to give benefits to the potential investors. Non-technical losses (NTLs) are one of the major causes of revenue losses for electric utilities. In the literature, various ML and DL approaches are employed to detect NTLs. The first solution introduces a hybrid DL model, which tackles the class imbalance problem and curse of dimensionality and low detection rate of existing models. The proposed model integrates benefits of both GoogLeNet and gated recurrent unit (GRU). The one dimensional EC data is fed into GRU to remember periodic patterns. Whereas, GoogLeNet model is leveraged to extract latent features from the two dimensional weekly stacked EC data. Furthermore, the time least square generative adversarial network (TLSGAN) is proposed to solve the class imbalance problem. The TLSGAN uses unsupervised and supervised loss functions to generate fake theft samples, which have high resemblance with real world theft samples. The standard generative adversarial network only updates the weights of those points that are available at the wrong side of the decision boundary. Whereas, TLSGAN even modifies the weights of those points that are available at the correct side of decision boundary, which prevent the model from vanishing gradient problem. Moreover, dropout and batch normalization layers are utilized to enhance model’s convergence speed and generalization ability. 
The proposed model is compared with different state-of-the-art classifiers including multilayer perceptron (MLP), support vector machine, naive bayes, logistic regression, MLP-long short term memory network and wide and deep convolutional neural network. The second solution presents a framework, which is employed to solve the curse of dimensionality issue. In literature, the existing studies are mostly concerned with tuning the hyperparameters of ML/ DL methods for efficient detection of NTL, i.e., electricity theft detection. Some of them focus on the selection of prominent features from data to improve the performance of electricity theft detection. However, the curse of dimensionality affects the generalization ability of ML/ DL classifiers and leads to computational, storage and overfitting problems. Therefore, to deal with above-mentioned issues, this study proposes a system based on metaheuristic techniques (artificial bee colony and genetic algorithm) and denoising autoencoder for electricity theft detecton using big data in electric power systems. The former (metaheuristics) are used to select prominent features. While the latter are utilized to extract high variance features from electricity consumption data. First, new features are synthesized from statistical and electrical parameters from the user’s consumption history. Then, the synthesized features are used as input to metaheuristic techniques to find a subset of optimal features. Finally, the optimal features are fed as input to the denoising autoencoder to extract features with high variance. The ability of both techniques to select and extract features is measured using a support vector machine. The proposed system reduces the overfitting, storage and computational overhead of ML classifiers. Moreover, we perform several experiments to verify the effectiveness of our proposed system and results reveal that the proposed system has higher performance our counterparts. 
The third solution introduces a hybrid DL model for prediction of upwards and downwards trends in financial market data. The financial market exhibits complex and volatile behavior that is difficult to predict using conventional ML and statistical methods, as well as shallow neural networks. Its behavior depends on many factors such as political upheavals, investor sentiment, interest rates, government policies, natural disasters, etc. However, it is possible to predict upward and downward trends in financial market behavior using complex DL models. This paper therefore addresses the following limitations that adversely affect the performance of existing ML and DL models, i.e., the curse of dimensionality, the low accuracy of the standalone models, and the inability to learn complex patterns from high-frequency time series data. The denoising autoencoder is used to reduce the high dimensionality of the data, overcoming the problem of overfitting and reducing the training time of the ML and DL models. Moreover, a hybrid DL model HRG is proposed based on a ResNet module and gated recurrent units. The former is used to extract latent or abstract patterns that are not visible to the human eye, while the latter retrieves temporal patterns from the financial market dataset. Thus, HRG integrates the advantages of both models. It is evaluated on real-world financial market datasets obtained from IBM, APPL, BA and WMT . Also, various performance indicators such as f1-score, accuracy, precision, recall, receiver operating characteristic-area under the curve (ROC-AUC) are used to check the performance of the proposed and benchmark models. The RG 2 achieves 0.95, 0.90, 0.82 and 0.80 ROC-AUC values on APPL, IBM, BA and WMT datasets respectively, which are higher than the ROC-AUC values of all implemented ML and DL models.
... Competitive electricity markets benefit from load and price forecast [28]- [31]. Several important operating decisions are based on load forecasts, such as, power generation scheduling [32]- [35], demand supply management [36]- [39], maintenance planning [40]- [42] and reliability analysis [43]. Price forecast is crucial to energy market participants for bidding strategies formulation, assets allocation, risk assessment and facility investment planning. ...
Thesis
Full-text available
The revolution of power grids from traditional grids to Smart Grids (SGs) requires effective Demand Side Management (DSM) and reliable Renewable Energy Sources (RESs) incorporation in order to maintain demand, supply balance and optimize energy in an environment friendly manner. Data analytics provide solutions to the emerging challenges of power systems, such as DSM, environmental pollution (due to carbon emission), fossil fuel dependency mitigation, RESs incorporation, cost curtailment, grid’s stability and security. To efficiently manage electricity and maximize the profit of power utilities several tasks are focused in this thesis, i.e., prediction of electricity load to avoid demand and generation mismatch, wind power forecasting to satisfy energy demand effectively, electricity price forecasting for regulating market operations, carbon emissions forecasting for reducing payment of carbon tax, Electricity Theft Detection (ETD) for recovering power utilities’ revenue loss caused by electricity theft. In addition to that, a wind power forecast based DSM scheme is proposed. Furthermore, impact of RESs integration level on carbon emissions, electricity price and consumption cost is quantified. Both forecasting and classification techniques are utilized for efficient energy management. Forecasting of electricity load, price, wind power and carbon emissions is performed, whereas, classification of fair and fraudulent electricity consumers is performed. To balance electricity demand and supply, electricity load forecasting is required. Three models are proposed for this purpose, i.e., Deep Long Short-Term Memory (DLSTM), Efficient Sparse Autoencoder Nonlinear Autoregressive eXogenous network (ESAENARX) and Differential Evolution Recurrent Extreme Learning Machine (DE-RELM). DLSTM utilizes univariate data and gives single result, whereas, ESAENARX and DE-RELM model multivariate data and predict electricity load and price simultaneously. 
Due to adaptive and automatic feature learning mechanism, DLSTM achieves accurate results for separate forecasting of electricity load and price. ESAENARX and DE-RELM models are enhanced by newly proposed efficient feature extractor and model’s parameter tuning, respectively. Real-world datasets of ISO-NE, PJM, NYISO are used for load and price forecasting. The purpose of regulating the electricity market operations is achieved by forecasting of electricity load, price, wind power and carbon emissions. Wind power generation is predicted by an efficient model named Efficient Deep Convolution Neural Network (EDCNN). Moreover, a DSM strategy is also proposed based on predicted wind power generation. Power utilities have to pay carbon emissions tax imposed by government. To pay less carbon emissions tax, carbon emissions prediction is required, which helps in encouraging electricity consumers to shift their consumption load to low carbon price time periods of the day. For accomplishing the carbon emissions forecasting task, an efficient model named as Improved Particle Swarm Optimization based Deep Neural Network (IPSO DNN) is proposed. This model is improved by tunning the parameters of DNN by newly proposed improved optimization technique named as IPSO. ISO-NE dataset is used for wind power and carbon emissions forecasting. To reduce the financial loss of power utilities ETD is very important. For this purpose four models are proposed, named as, Differential Evolution Random Under Sampling Boosting (DE-RUSBoost), Jaya-RUSBoost, RUS Ensemble CNN (RUSE-CNN) and anomaly detection based ETD. In DE-RUSBoost and Jaya-RUSBoost, the parameters of RUSBoost classifier are tunned by DE and Jaya optimization techniques, respectively. In RUSE-CNN, RUS data balancing technique is applied along with ensemble CNN to improve ETD performance. DE-RUSBoost, Jaya-RUSBoost and RUSE-CNN are supervised model that work on labeled electricity theft data. 
Whereas, anomaly detection based ETD model is capable of identifying electricity theft from unlabeled electricity consumption data. Real-world datasets of SGCC, UMass, PRECON, CER, EnerNOC and LCL are used for ETD. Simulation results show that all the proposed models perform significantly better on real-world dataset as compared to their state-of-the-art counterpart models. The improved feature engineering and model hyper-parameter tuning enhance the performance of the proposed models in terms of prediction and classification results.
Article
Short-term electrical energy load forecasting is one of the most significant problems associated with energy management for smart grids, which aims to optimize the operational strategies of buildings. Electricity forecasting models are considered a key aspect of the provision of better electricity management and reductions in energy consumption. This motivates the researchers to develop efficient electricity load forecasting (ELF) models, based on historical nonlinear and high volatile data, which require appropriate forecasting strategies. Therefore, in this article, we present an innovative two-phase framework for short-term ELF. The first phase is dedicated to data cleansing, in which pre-processing strategies are applied to raw data. In the second phase, a deep residual Convolutional Neural Network (CNN) is designed to extract the important features from the refined data. To the best of our knowledge, this is the first work to introduce a deep CNN architecture for the extraction of spatial features from electricity data. The output of the residual CNN network is forwarded to a stacked Long Short-Term Memory (LSTM) network to learn the temporal information of the electricity data. The proposed model is then evaluated using the Individual-Household-Electric-Power-Consumption (IHEPC) and Pennsylvania–New Jersey–Maryland (PJM) datasets. The results reveal a significant reduction in the error rate over the IHEPC dataset in terms of Mean-Absolute-Error (MAE) (15.65%), Mean-Square-Error (MSE) (8.77%), and Root-Mean-Square-Error (RMSE) (14.85%) and over the PJM dataset our method reduced RMSE up to 3.4% as compared to baseline models i.e., linear regression, LSTM, and Gated Recurrent Unit (GRU). Furthermore, we performed several experiments with CNN, LSTM, and GRU models and evaluated it with additional Coefficient of Variation of the RMSE (CV-RMSE) metrics, which proves the effectiveness of our model for short-term load forecasting.
Article
Full-text available
Smart meters are key elements of a smart grid. These data from Smart Meters can help us analyze energy consumption behaviour. The machine learning and deep learning approaches can be used for mining the hidden theft detection information in the smart meter data. However, it needs effective data extraction. This research presents a theft detection dataset (TDD2022) and a machine learning-based solution for automated theft identification in a smart grid environment. An effective theft generator is modelled and used for obtaining a multi-class theft detection dataset from publicly available consumer energy consumption data, owned by the “Open Energy Data Initiative” (OEDI) platform. This is an important and interesting phase to explore in the smart grid field. The proposed dataset can be used for benchmarking and comparative studies. We evaluated the proposed dataset using five different machine learning techniques: k-nearest neighbours (KNN), decision trees (DT), random forest (RF), bagging ensemble (BE), and artificial neural networks (ANN) with different evaluation alternatives (mechanisms). Overall, our best empirical results have been recorded to the theft detection-based RF model scoring an improvement in the performance metrics by 10% or more over the other developed models.
Article
Real-time electricity market data is highly volatile and very noisy. The properties of such data make forecasting models difficult to develop, with traditional statistical models in particular affected by the “curse of dimensionality” for such data. However, autoencoders, or neural networks specifically designed to reduce the noise and dimensions of input data, may prove useful to advance the accuracy of real-time price forecasting models. This paper studies the optimal design of such an autoencoder, developing a quadruple branch, CNN-based autoencoder (QCAE) which is pre-trained and then directly linked to a forecasting model. The QCAE compresses the input data in both time and feature directions. Ablation analyses verify the architecture of the QCAE, and its integration with the forecasting model is tested and validated on fifty generators in the New York Independent System Operator (NYISO) power grid. The QCAE forecasting framework outperforms benchmark and state-of-the-art models with an average improvement of 6.3% in sMAPE and 3.10% in MAE.
Article
Electricity theft has significant impact on the power grids in terms of generating non-technical losses, which eventually degrading the power quality and minimizing the outfitted profit. In this paper, we proposed a hybrid approach based on deep learning and support vector machine for the detection of energy theft to facilitate and assess energy supplier companies to eliminate the issue of insufficient power, irregular power expenditure and ineffective electricity monitoring. A deep convolutional neural network is proposed for the feature learning using smart meters data in different times, varying from hours to days. Extracted features were further used to train support vector machine, which classify the features in two categories as theft and non-theft. Furthermore, a dropout layer is introduced in convolutional neural network model to avoid over fitting issues. Several careful experiments were carried out on real time customers smart meter data and the results validate the effectiveness of the proposed method in terms of accuracy and less detection error.
Article
Full-text available
Electric power systems are taking drastic advances in deployment of information and communication technologies; numerous new measurement devices are installed in forms of advanced metering infrastructure , distributed energy resources (DER) monitoring systems, high frequency synchronized wide-area awareness systems that with great speed are generating immense volume of energy data. However, it is still questioned that whether the today's power system data, the structures and the tools being developed are indeed aligned with the pillars of the big data science. Further, several requirements and especial features of power systems and energy big data call for customized methods and platforms. This paper provides an assessment of the distinguished aspects in big data analytics developments in the domain of power systems. We perform several taxonomy of the existing and the missing elements in the structures and methods associated with big data analytics in power systems. We also provide a holistic outline, classifications, and concise discussions on the technical approaches, research opportunities, and application areas for energy big data analytics.
Background: With the development of smart grids, accurate electric load forecasting has become increasingly important, as it can help power companies schedule load better and reduce excessive electricity production. However, developing and selecting accurate time series models is a challenging task: it requires training several different models to select the best among them, substantial feature engineering to derive informative features, and finding optimal time lags, a commonly used input feature for time series models. Methods: Our approach uses machine learning and a long short-term memory (LSTM)-based neural network with various configurations to construct forecasting models for short- to medium-term aggregate load forecasting. The research addresses the above-mentioned problems by training several linear and non-linear machine learning algorithms and picking the best as a baseline, choosing the best features using wrapper and embedded feature selection methods, and finally using a genetic algorithm (GA) to find the optimal time lags and number of layers for optimizing the predictive performance of the LSTM model. Results: Using metropolitan France's electricity consumption data as a case study, the results show that the LSTM-based model achieved higher accuracy than the machine learning model optimized with hyperparameter tuning. Using the best features, optimal lags and layers, and training various LSTM configurations, further improved forecasting accuracy. Conclusions: An LSTM model using only optimally selected time-lagged features captured all the characteristics of the complex time series and showed decreased Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) for medium- to long-range forecasting for a wider metropolitan area.
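To make the idea of time lags as input features concrete, the following is a minimal sketch, not the cited authors' implementation. The function name `make_lagged_samples` and the toy series are hypothetical, and the lag set is supplied by hand rather than found by a genetic algorithm.

```python
def make_lagged_samples(series, lags):
    """Build (features, target) pairs from a univariate series.

    For each time step t, the features are the values observed
    `lag` steps before t, for every lag in `lags`; the target is
    the value at t. Hypothetical helper for illustration only.
    """
    if not lags or any(lag <= 0 for lag in lags):
        raise ValueError("lags must be a non-empty list of positive integers")
    max_lag = max(lags)
    samples = []
    # Start at max_lag so that every requested lag is available.
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]
        samples.append((features, series[t]))
    return samples


# Toy hourly load values (made up for illustration).
load = [1, 2, 3, 4, 5]
samples = make_lagged_samples(load, lags=[1, 2])
```

In a lag search such as the one the abstract describes, each candidate lag set would be passed to a helper like this, a model trained on the resulting samples, and the validation error used as the fitness value.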
Accurate electricity price forecasting has become a substantial requirement since the liberalization of the electricity markets. Due to the challenging nature of electricity prices, which includes high volatility, sharp price spikes and seasonality, various types of electricity price forecasting models still compete and none consistently outperforms the others. Neural networks have been successfully used in machine learning problems, and Recurrent Neural Networks (RNNs) have been proposed to address time-dependent learning problems. In particular, Long Short Term Memory (LSTM) and Gated Recurrent Units (GRU) are tailor-made for time series price estimation. In this paper, we propose to use multi-layer Gated Recurrent Units as a new technique for electricity price forecasting. We trained a variety of algorithms with a three-year rolling window and compared the results with the RNNs. In our experiments, three-layered GRUs outperformed all other neural network structures and state-of-the-art statistical techniques in a statistically significant manner in the Turkish day-ahead market.
Electricity price is a key factor in the electricity market, since the trades of each market participant are based on it. An electricity price that adjusts to changes in the supply and demand relationship can reflect the real value of electricity in the transaction process. For the power-generating party, the bidding strategy determines the level of profit, and accurate prediction of the electricity price makes it possible to determine a more accurate bidding price. This can not only reduce transaction risk but also help seize opportunities in the electricity market. In order to estimate the electricity price effectively, this paper proposes an electricity price forecasting system based on the combination of two deep neural networks, the Convolutional Neural Network (CNN) and the Long Short Term Memory (LSTM). To compare the overall performance of each algorithm, the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) evaluation measures were applied in the experiments. The experimental results show that, compared with other traditional machine learning methods, the proposed model achieves the best prediction performance. By combining the CNN and LSTM models, the paper also confirms the feasibility and practicality of electricity price prediction.
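For reference, the two evaluation measures named above can be computed as in the following minimal sketch. It uses the standard definitions of MAE and RMSE; the function names and the sample values are illustrative assumptions and are not taken from the cited paper.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Made-up prices and forecasts, purely for illustration.
actual_prices = [10.0, 20.0, 30.0]
forecast_prices = [12.0, 18.0, 33.0]
mae_value = mae(actual_prices, forecast_prices)
rmse_value = rmse(actual_prices, forecast_prices)
```

Because RMSE squares each error before averaging, it penalizes large deviations such as price spikes more heavily than MAE, which is one reason the two measures are often reported together.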
Responsible, efficient and environmentally aware energy consumption behavior is becoming a necessity for the reliable modern electricity grid. In this paper, we present an intelligent data mining model to analyze, forecast and visualize energy time series to uncover various temporal energy consumption patterns. These patterns define appliance usage in terms of its association with time, such as hour of the day, period of the day, weekday, week, month and season of the year, as well as appliance-appliance associations in a household. These are key factors for inferring and analyzing the impact of consumers' energy consumption behavior and the energy forecasting trend. This is challenging, since it is not trivial to determine the multiple relationships among the usage of different appliances from concurrent streams of data. It is also difficult to derive accurate relationships between interval-based events where multiple appliance usages persist for some duration. To overcome these challenges, we propose unsupervised data clustering and frequent pattern mining analysis on energy time series, and Bayesian network prediction for energy usage forecasting. We perform extensive experiments using real-world context-rich smart meter datasets. In identifying appliance usage patterns, the proposed model outperformed Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) at each stage, attaining a combined accuracy of 81.82%, 85.90% and 89.58% for 25%, 50% and 75% of the training data size, respectively. Moreover, we achieved energy consumption forecast accuracies of 81.89% for the short term (hourly) and 75.88%, 79.23%, 74.74% and 72.81% for the long term, i.e., day, week, month and season, respectively.
A deep recurrent neural network with long short-term memory units (DRNN-LSTM) model is developed to forecast the aggregated power load and the photovoltaic (PV) power output in a community microgrid. Meanwhile, an optimal load dispatch model for a grid-connected community microgrid, which includes residential power load, PV arrays, electric vehicles (EVs) and an energy storage system (ESS), is established under three different scheduling scenarios. To promote the supply-demand balance, the uncertainties of both residential power load and PV power output are considered in the model by integrating the forecasting results. Two real-world data sets are used to test the proposed forecasting model, and the results show that the DRNN-LSTM model performs better than a multi-layer perceptron (MLP) network and a support vector machine (SVM). Finally, the particle swarm optimization (PSO) algorithm is used to optimize the load dispatch of the grid-connected community microgrid. The results show that the ESS and the coordinated charging mode of EVs can promote peak load shifting and reduce daily costs by 8.97%. This study contributes to the optimal load dispatch of community microgrids with load and renewable energy forecasting. Optimal load dispatch with deep learning based solar power and load forecasting reduces total costs and improves system reliability.
Accurate load forecasting is critical for power system planning and operational decision making. In this study, we are the first to utilize a deep feedforward network for short-term electricity load forecasting. Our results are compared to those of popular machine learning models such as random forest and gradient boosting machine models. Then, electricity consumption patterns are explored based on monthly, weekly and temperature-based patterns in terms of feature importance. Also, a probability density forecasting method based on deep learning, quantile regression and kernel density estimation is proposed. To verify the efficiency of the proposed methods, three case studies based on daily electricity consumption data for three Chinese cities for 2014 are conducted. The empirical results demonstrate that (1) the proposed deep learning-based approach exhibits better forecasting accuracy in terms of measuring electricity consumption relative to the random forest and gradient boosting model; (2) monthly, weekly and weather-related variables are key factors that have a great influence on household electricity consumption; and (3) the proposed probability density forecasting method is capable of forecasting high-quality prediction intervals via probability density forecasting.
Deep learning, currently one of the most remarkable machine learning techniques, has achieved great success in many applications such as image analysis, speech recognition and text understanding. It uses supervised and unsupervised strategies to learn multi-level representations and features in hierarchical architectures for the tasks of classification and pattern recognition. Recent developments in sensor networks and communication technologies have enabled the collection of big data. Although big data provides great opportunities for a broad range of areas, including e-commerce, industrial control and smart healthcare, it poses many challenging issues for data mining and information processing due to its characteristics of large volume, large variety, large velocity and large veracity. In the past few years, deep learning has played an important role in big data analytics solutions. In this paper, we review emerging research on deep learning models for big data feature learning. Furthermore, we point out the remaining challenges of big data deep learning and discuss future topics.