Big Data Analytics for Price and Load
Forecasting in Smart Grids
Sana Mujeeb1, Nadeem Javaid1,∗, Rabiya Khalid1, Orooj Nazeer2,
Isra Shafi2, and Mahnoor Khan1
1COMSATS University, Islamabad 44000, Pakistan
2Abasyn University Department of Computing and Technology,
Islamabad, 44000, Pakistan
∗Correspondence: email@example.com, www.njavaid.com
Abstract. This paper focuses on the analytics of an extremely large dataset
of smart grid electricity price and load, referred to as big data, which is
difficult to process with conventional computational models. Processing and
analyzing power big data reveals deeper insights that help experts improve
smart grid operations. However, processing the data and extracting the most
meaningful information from it is challenging. Electricity load and price are
the most influential factors in the electricity market. To improve the
reliability, control and management of electricity market operations, an
accurate estimate of the day-ahead load is an essential requirement. Energy
market trade is based on price. An accurate price forecast enables energy
market participants to devise effective and profitable bidding strategies.
This paper proposes a deep learning-based model for forecasting price and
demand on big data using a deep Long Short-Term Memory (LSTM) network. Owing
to the adaptive and automatic feature learning of Deep Neural Networks (DNNs),
processing big data is easier with LSTM than with purely data-driven methods.
The proposed model is evaluated using data from a well-known real electricity
market.
Key words: Smart grid, Big data, Electricity load, Price forecasting,
Long Short-Term Memory (LSTM)
1 Introduction
The smart grid is a modern, intelligent power grid that efficiently manages
the generation, distribution and consumption of electricity by introducing
communication, sensing and control technologies into power grids. The smart
grid serves customers in an economical, reliable, sustainable and secure manner.
Customers can manage their energy demand economically through Demand Side
Management (DSM), a program that allows customers to adjust their load demand
according to price variations. DSM offers energy customers load shifting and
energy conservation in order to reduce the cost of power consumption. The smart
grid establishes an interactive environment between energy consumers and the
utility: customers take part in smart grid operations and, through load shifting
and energy conservation, pay a reduced price.
Competitive electricity markets benefit from load and price forecasts. Several
important operating decisions are based on load forecasts, such as power
generation scheduling, demand-supply management, maintenance planning and
reliability analysis. Price forecasts, in turn, are crucial to energy market
participants for formulating bidding strategies, allocating assets, assessing
risk and planning facility investment. Effective bidding strategies help market
participants maximize profit. Utility maximization is the ultimate goal of both
power producers and customers. With a robust and accurate price estimate, power
producers can maximize profit and consumers can minimize the cost of the
electricity they purchase. Efficient generation and consumption is another
crucial issue in the energy sector. As most generated electricity cannot be
stored, an equilibrium must be maintained between generated and consumed
electricity. Therefore, accurate forecasting of both electricity load and price
is of great importance in managing market operations.
ISO New England (ISO NE) is a Regional Transmission Organization (RTO)
coordinated by an Independent System Operator (ISO). It is responsible for
managing wholesale energy market operations and power trade auctions. ISO NE
provides energy to the six states of New England: Connecticut, Maine,
Massachusetts, New Hampshire, Rhode Island and Vermont. In this paper, analytics
are performed on a large ISO NE dataset. ISO NE price and load exhibit certain
characteristics. Electricity load and price are, in general, directly
proportional. However, some unexpected variations are observed in the ISO NE
price data. There are various reasons for these unexpected changes in the price
pattern. In reality, price is not only affected by a change in load; several
other parameters influence the energy price, such as fuel price, the
availability of inexpensive generation sources (e.g., hydro and wind) and
weather conditions. In this paper, analysis is performed on a large amount of
electricity data, referred to as big data. Big data is defined as datasets of
such volume and complexity that they cannot be processed with traditional data
mining techniques. Big data analytics enables the identification of hidden
patterns, consumer preferences, market trends and other valuable information
that helps utility companies make strategic business decisions. The amount of
real-world historical smart grid data is very large . Authors survey smart grid
big data in great detail in . This large volume of data enables energy utilities
to perform novel analyses leading to major improvements in the planning and
management of market operations. Utilities can gain a better understanding of
customer behavior, demand and consumption, power failures and downtime.
Various techniques are used for load and price forecasting. As the size of the
input data increases, conventional forecasting methods become difficult to train
and use. Big data is difficult for classifier-based models to handle because of
their high computational cost and memory usage. Deep learning methods, on the
other hand, scale well to big data because they divide the training data into
mini-batches and train on the whole dataset batch by batch. Artificial Neural
Networks (ANNs) have an excellent capacity for nonlinear approximation and
self-learning, which makes them well suited to electricity price and load
forecasting. Deep neural networks can automatically extract complex data
representations with greater accuracy, as they have higher computational power
than shallow ANNs.
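The batch-by-batch idea mentioned above can be illustrated with a minimal Python sketch. The function name `iter_minibatches`, the batch size and the toy series are illustrative assumptions and are not taken from the paper; a real training loop would pass each batch to the network's update step.

```python
from typing import Iterator, List, Sequence


def iter_minibatches(samples: Sequence[float], batch_size: int) -> Iterator[List[float]]:
    """Yield consecutive slices of `samples`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])


# Toy stand-in for an hourly load series (hypothetical values).
hourly_load = [float(h) for h in range(10)]

# Only one batch has to be held in working memory at a time.
batches = list(iter_minibatches(hourly_load, batch_size=4))
```

Because each step touches only one batch, memory use depends on the batch size rather than on the size of the full dataset.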
The main objective of this paper is to propose an accurate forecast model
that can take advantage of large amounts of data.
2 Related Work
Several forecasting methods are available in the literature, from traditional
time series analysis to modern machine learning methods. Generally, forecasting
models can be categorized into three major groups: classical, artificially
intelligent and data-driven. Classical methods are statistical and mathematical
methods, such as ARIMA, SARIMA, Naive Bayes and Random Forest. Artificially
intelligent methods include Artificial Neural Networks and Particle Swarm
Optimization (PSO). Classifier-based approaches are also widely used for
forecasting, such as Sperm Whale Algorithm + LSSVM , FWPT + TV-ABC + LSSVM ,
DE + SVM . Although the aforementioned methods show reasonable results in load
or price forecasting, they do not consider the simultaneous forecasting of load
and price. Moreover, optimized classifier-based forecasting methods suffer from
the over-fitting problem.
The existing forecasting methods mostly forecast only load or only price. A
forecasting method that can accurately forecast both load and price together is
much needed. Conventional forecasting methods in the literature have to
hand-craft useful features with great effort [3-5] before forecasting. Neural
networks have the advantage over other methods that they automatically extract
features from data and learn complex and meaningful patterns efficiently.
However, Shallow Neural Networks (SNNs) [3, 6] tend to over-fit and need to be
optimized to improve forecast accuracy. Recently, Deep Neural Networks (DNNs)
have shown promising results in forecasting load [7,8] and price [10, 11].
In  the authors used a Restricted Boltzmann Machine (RBM) with pre-training and
a Rectified Linear Unit (ReLU) to forecast day-ahead and week-ahead load. The
RBM yields a more accurate forecast than ReLU. Deep Auto Encoders (DAE) are
implemented in paper  to predict the cooling load of a building. DAE is an
unsupervised learning method; it learns the pattern of the data very well and
predicts with greater accuracy. Paper  implements Gated Recurrent Units (GRU),
a type of Recurrent Neural Network (RNN), for price forecasting. GRU outperforms
Long Short-Term Memory (LSTM) and several statistical time series forecasting
models in accuracy. In paper  the authors propose a hybrid model for price
forecasting that combines two deep learning methods: a Convolutional Neural
Network (CNN) extracts useful features, and an LSTM forecasting model is trained
on the features extracted by the CNN. This hybrid model performs better than
CNN and LSTM used separately, and it outperforms several state-of-the-art
forecasting models. The good performance of the aforementioned DNN models
demonstrates the effectiveness of deep learning in forecasting.
Smart grid big data analysis helps in finding trends in electricity consumption
and cost. This in turn enables the utility to design predictive demand-supply
maintenance programs, which are a basic requirement for demand-supply balance.
Smart grid big data has been studied for power system anomaly detection ,
optimal placement of computing units for communicating data to the smart grid ,
and price forecasting . In  the authors propose a hierarchical framework to
detect anomalies in the power system in advance. The main aim of this work is
to avoid power outages and system failures. A hierarchy of anomalies is built,
and anomalies are detected through online monitoring and processing of smart
meter readings. An expectation-maximization model is used to check the system's
health status. A hybrid framework is proposed in  to forecast price using big
data analytics. Correlated features are selected using Grey Correlation
Analysis (GCA). The most relevant features are then selected through a hybrid
feature selector that combines Random Forest and ReliefF. Dimensionality
reduction of the selected features is performed using Kernel PCA. After feature
extraction, a forecasting model is trained using Kernel SVM. The SVM is
optimized by a modified Differential Evolution (DE) algorithm, in which the
mutation operation is modified by dynamically adjusting the mutation scaling
factor on every iteration. The modified DE accelerates the optimization
process. Although the proposed framework achieves acceptable accuracy in price
forecasting, price and load are not forecast simultaneously, and the
bidirectional relationship between price and load is not analysed on big data.
Deep learning is an effective technique for big data analytics . With high
computational power and the ability to model huge data, DNNs give deeper
insights into data. In  the authors present a comprehensive and detailed survey
of the importance of deep learning techniques in the area of big data
analytics. For analytics of smart grid big data, DNNs can be a very effective
technique.
After reviewing the existing forecasting methods in the literature, the
motivation of this work is as follows:
– Big data is not taken into consideration by learning-based electricity load and
price forecasting methods. Performance is evaluated only on price data that is
not very large, which reduces forecasting accuracy.
– Intelligent data-driven models such as fuzzy inference, Artificial Neural Networks
(ANN) and Wavelet Transform (WT) + Support Vector Machine (SVM) have
limited generalization capability; therefore, these methods have an over-fitting
problem.
– The nonlinear and protean pattern of electricity price is very difficult to
forecast with traditional data. The use of big data makes it possible to
generalize complex patterns in the data and forecast accurately.
– The automatic feature extraction process of deep learning can efficiently
extract useful and rich hidden patterns in data.
4 Proposed Scheme
The proposed method comprises four main parts: preprocessing the data, training
the LSTM network on the training data, tuning the network with the validation
data, and forecasting load and price on the test data.
Fig. 1: Proposed System Model (block labels: Historical Load and Price; Forecasted Load and Price)
4.1 Data Preprocessing
Hourly data of the regulation market capacity clearing price and system load are
acquired from the New England Control Area Independent System Operator (ISO NE
CA). Eight years of data, i.e., January 2011 to March 2018, are used in the
proposed method. The data are divided month-wise, and the data of the same
calendar month across years are combined to form the training data for
forecasting that month's future price and load. For example, the data of January
2011, January 2012, January 2013 and so on, up to the first three weeks of
January 2018, are the training data for forecasting the price and load of the
last week of January 2018. The data are prepared in the same manner for all
twelve months. In total, eight months of data are used in forecasting each of
January, February and March 2018, and seven months of data are used in
forecasting each of the remaining nine months, April to December 2017. The data
are normalized by their maximum values and partitioned into three parts:
training, validation and test data.
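The month-wise preparation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`group_by_month`, `normalize_by_max`, `split_series`), the synthetic hourly series and the one-week (168-hour) validation window are all hypothetical, since the paper does not specify how the validation partition is chosen.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_by_month(records):
    """Map calendar month (1-12) to its values from all years, in time order."""
    groups = defaultdict(list)
    for timestamp, value in sorted(records, key=lambda r: r[0]):
        groups[timestamp.month].append(value)
    return dict(groups)


def normalize_by_max(values):
    """Scale a series by its maximum so that the largest value becomes 1."""
    peak = max(values)
    if peak == 0:
        raise ValueError("cannot normalize a series whose maximum is zero")
    return [v / peak for v in values]


def split_series(values, n_val, n_test):
    """Chronological split: the last n_test points are test data and the
    n_val points before them are validation data."""
    if n_val + n_test >= len(values):
        raise ValueError("series too short for the requested split")
    train_end = len(values) - n_val - n_test
    val_end = train_end + n_val
    return values[:train_end], values[train_end:val_end], values[val_end:]


# Synthetic hourly series starting in January 2011 (placeholder for market data).
start = datetime(2011, 1, 1)
records = [(start + timedelta(hours=i), float(i % 24 + 1)) for i in range(2 * 365 * 24)]

# All January hours from every year, normalized, then split chronologically.
january = normalize_by_max(group_by_month(records)[1])
train, val, test = split_series(january, n_val=168, n_test=168)
```

The split is chronological rather than random, so the test week always lies after the data used for training and tuning.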
4.2 Proposed Deep LSTM
Training, validation and test data are obtained by preprocessing the data. The
price and load data are fed to the LSTM network for training. The network has
four layers, i.e., an input layer, one LSTM layer, a fully connected layer and
a regression output layer. The number of hidden units in the LSTM layer is 200.
The architecture of the network is shown in Fig. 2. The final number of hidden
units was decided by experimenting with different numbers of hidden units and
keeping the number that gave the best forecast accuracy. During training, the
network predicts the one-step-ahead value at every time step. The LSTM learns
the patterns of the data at every time step and updates the network trained up
to the previous time step.
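To make the per-time-step computation concrete, the following is a minimal sketch of one forward step of a standard LSTM cell in plain Python. It uses the textbook gate formulation, which the paper does not spell out, so it should be read as an assumption rather than the authors' implementation. The helper names, the random initialization and the hidden size of 4 (the paper uses 200) are illustrative only, and training (backpropagation) is omitted.

```python
import math
import random

GATES = ("input", "forget", "output", "candidate")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(hidden_size, seed=0):
    """Small random weights for the four gates (illustrative initialization)."""
    rng = random.Random(seed)

    def vector():
        return [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]

    return {
        gate: {"w_x": vector(),
               "w_h": [vector() for _ in range(hidden_size)],
               "b": vector()}
        for gate in GATES
    }


def gate_preactivation(p, j, x, h_prev):
    """Weighted sum feeding unit j of one gate: w_x * x + w_h . h_prev + b."""
    recurrent = sum(w * h for w, h in zip(p["w_h"][j], h_prev))
    return p["w_x"][j] * x + recurrent + p["b"][j]


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell with a scalar input x."""
    h_new, c_new = [], []
    for j in range(len(h_prev)):
        i = sigmoid(gate_preactivation(params["input"], j, x, h_prev))
        f = sigmoid(gate_preactivation(params["forget"], j, x, h_prev))
        o = sigmoid(gate_preactivation(params["output"], j, x, h_prev))
        g = math.tanh(gate_preactivation(params["candidate"], j, x, h_prev))
        c = f * c_prev[j] + i * g        # keep part of the old cell state, add new content
        c_new.append(c)
        h_new.append(o * math.tanh(c))   # hidden state is a gated view of the cell state
    return h_new, c_new


hidden_size = 4                          # illustrative; the paper uses 200 hidden units
params = init_params(hidden_size)
h = [0.0] * hidden_size
c = [0.0] * hidden_size
for x in [0.2, 0.4, 0.6, 0.8]:           # toy normalized inputs, one per time step
    h, c = lstm_step(x, h, c, params)
```

In a full network, the final hidden state `h` would be passed to the fully connected layer to produce the one-step-ahead forecast.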
The LSTM network is trained on price and load data separately. The network
trained on the training data is the initial network. The initial network is
tested on the validation data, on which it forecasts one-step-ahead values.
After taking the forecast results from the initial network, the root mean square
error (RMSE) is calculated. The initial network then re-learns and is tuned on
the actual values of the validation data until the RMSE is reduced to a minimum.
The final, tuned network is then used to forecast price and load.
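The tune-on-validation procedure can be sketched as a simple loop: compute the RMSE of one-step-ahead forecasts on the validation data, update the model on the actual validation values, and stop once the RMSE no longer improves. In the sketch below a tiny first-order autoregressive model (`OneStepAR`) is deliberately swapped in for the LSTM so that the example stays short and runnable. The class, the learning rate, the stopping tolerance and the synthetic series are assumptions and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


class OneStepAR:
    """Stand-in forecaster (NOT an LSTM): predicts y[t+1] = w * y[t] + b."""

    def __init__(self, w=0.0, b=0.0, lr=0.05):
        self.w, self.b, self.lr = w, b, lr

    def predict(self, y):
        return self.w * y + self.b

    def update(self, y, target):
        # One stochastic gradient step on the squared one-step-ahead error.
        err = self.predict(y) - target
        self.w -= self.lr * err * y
        self.b -= self.lr * err


def one_step_forecasts(model, series):
    return [model.predict(y) for y in series[:-1]]


def tune_on_validation(model, val, max_epochs=200, tol=1e-6):
    """Update the model on actual validation values until RMSE stops improving."""
    best = rmse(val[1:], one_step_forecasts(model, val))
    history = [best]
    for _ in range(max_epochs):
        for y, target in zip(val[:-1], val[1:]):
            model.update(y, target)
        current = rmse(val[1:], one_step_forecasts(model, val))
        history.append(current)
        if best - current < tol:
            break
        best = current
    return history


# Synthetic normalized series with a daily cycle (placeholder for validation data).
val = [0.5 + 0.4 * math.sin(2 * math.pi * t / 24) for t in range(72)]
model = OneStepAR()
history = tune_on_validation(model, val)
```

`history` records the validation RMSE before tuning and after each pass, so the improvement from tuning can be inspected directly.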
Fig. 2: Architecture of LSTM (layers shown: Sequence Input Layer, LSTM Layer 1, LSTM Layer 2, Fully Connected Layer)
The steps of the proposed model are as follows.
–Step 1 : The historical price and load vectors, p and l respectively, are
normalized as (p-mean(p))/std(p). Price data is split month-wise. The data is
divided into three partitions: training, validation and test.
–Step 2 : The network is trained on the training data and tested on the
validation data. The root mean square error is calculated on the validation data.
–Step 3 : The network is tuned and updated on the actual values of the
validation data.
–Step 4 : The updated network is tested on the test data, and day-ahead
and week-ahead price and load are forecast. The forecaster's performance is
evaluated by calculating the root mean square error.
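The normalization in Step 1, (p-mean(p))/std(p), and its inverse can be sketched as below. The function names and the toy price values are hypothetical. The use of the sample standard deviation (`statistics.stdev`, which divides by N-1) is an assumption, because the paper does not say which definition of std is used.

```python
import statistics


def zscore_fit(values):
    """Return the mean and standard deviation used for normalization."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)   # sample std (N-1); an assumption, see lead-in
    if std == 0:
        raise ValueError("cannot normalize a constant series")
    return mean, std


def zscore_apply(values, mean, std):
    """Normalize as (p - mean(p)) / std(p)."""
    return [(v - mean) / std for v in values]


def zscore_invert(values, mean, std):
    """Map normalized values (e.g. forecasts) back to the original units."""
    return [v * std + mean for v in values]


price = [31.5, 28.0, 45.2, 120.0, 39.9, 33.1]   # toy prices in $/MWh
mean, std = zscore_fit(price)
z = zscore_apply(price, mean, std)
restored = zscore_invert(z, mean, std)
```

Keeping `mean` and `std` allows forecasts made on the normalized scale to be converted back to $/MWh or MW before the error is reported.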
Fig. 3: Proposed System Model Flow Chart (boxes: historical load data; historical price data; normalize data; prepare month-wise price data; split into training Xt, validation Xv and testing Xs data; tune and update network on Xv; forecast load and price on test data Xs; print forecasted price and load)
The flow of the proposed forecaster is shown step by step in Fig. 3.
5 Data Description
The historical electricity price and load data used in the simulations are taken
from ISO NE (Independent System Operator New England) . ISO NE operates the
generation and transmission system of New England, where on average 30,000
MW of electric energy is produced and transmitted daily. In ISO NE, transactions
worth 10 million dollars are completed annually by 400 electricity market
participants. The data comprise the ISO NE Control Area's hourly system load and regulation
capacity clearing price of eight years; January 2011 to March 2018. Data contains
The ISO NE CA market electricity prices and load are significantly affected
by seasonality. In the proposed work, inter-season price and load variations are
also handled. The data is split month-wise, which improves the forecast
accuracy; this seasonal splitting helps in efficiently capturing the highly
varying price trend. The electricity load exhibits a repetitive pattern over the
years, whereas the price pattern changes drastically and stochastically. Both
load and price increase over the years. It is clear from Fig. 4 that load
increases at a constant rate and the trend of the load profile remains the same,
whereas price increases without any regular pattern like that observed in the
load profile. Price has a wide range and exhibits sudden spikes. The extremely
volatile nature of the energy price makes price forecasting very difficult; the
price trend is highly random and hard for forecasting algorithms to handle. The
price pattern is shown in Fig. 5. The repetitive pattern of load is caused by
consistent consumption times: the consumption hours remain the same, with more
consumption in working hours and less in off hours and late at night. There are
several reasons behind the varying price pattern. Firstly, the amount of
generation, which is inversely proportional to the electricity price. Secondly,
the source of electricity generation, which increases the price if fuel is used
for generation and reduces it if renewable resources are used. Thirdly, the
price of the fuel used for power generation. Fourthly, government increases in
price or taxes. Fifthly, penalties for excessive use of electricity.
Electricity price and load are directly proportional. The relationship between
electricity load and price is shown in Fig. 6. It is clear from the figure that
price mostly increases with load, with only a few exceptions.
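One way to quantify how closely price follows load is the Pearson correlation coefficient. The paper does not report this statistic, so the sketch below is only an illustration; the function name and the synthetic load and price values are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Synthetic normalized load, and a price that rises with load plus a small ripple.
load = [i / 10 for i in range(11)]
price = [0.8 * l + (0.05 if i % 2 == 0 else -0.05) for i, l in enumerate(load)]
r = pearson(load, price)
```

A value of `r` close to 1 indicates that price rises almost linearly with load, while price spikes unrelated to load would pull the coefficient down.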
6 Case Studies
6.1 Case Study 1: Load Forecast
First, the load data is used to train the forecast model. After normalization,
the load profile shows a regular, repetitive pattern, as shown in Fig. 4. The
normalized load data is then given to the network for training. The network is
trained on 9,000 weeks, validated on 75 weeks and tested on 1 week. Validation
is used to improve the forecast model for accurate forecasting. Without
validation, the LSTM network produces a linear, flat forecast. The network is
tuned, and its previous state updated, on the real values of the validation
data. After tuning, the LSTM network is able to forecast short-term load up to
1 week, or 168 hours, ahead.
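A one-step-ahead model can be extended to a 24-hour or 168-hour horizon by feeding each forecast back in as the next input. Whether the authors generate their week-ahead forecast in exactly this recursive way is not stated, so the sketch below is an assumption. The function `forecast_recursive` and the stand-in `toy_model` (which is not an LSTM) are hypothetical.

```python
def forecast_recursive(predict_next, last_observed, horizon):
    """Roll a one-step-ahead model forward `horizon` steps, feeding each
    forecast back in as the next input."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    forecasts = []
    current = last_observed
    for _ in range(horizon):
        current = predict_next(current)
        forecasts.append(current)
    return forecasts


def toy_model(y):
    """Stand-in one-step model (NOT an LSTM): decays toward 0.5."""
    return 0.9 * y + 0.05


# Week-ahead (168-hour) forecast; the first 24 values form the day-ahead forecast.
week_ahead = forecast_recursive(toy_model, last_observed=0.8, horizon=168)
day_ahead = week_ahead[:24]
```

With this toy model the recursive forecast settles at a constant value, which shows how a model that is fed only its own predictions can flatten out over a long horizon.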
The hourly system load from 01/01/2011 to 31/03/2018 is shown in Fig. 4.
Load shows a similar pattern over the years. The hourly price from 01/01/2011
to 31/03/2018 is shown in Fig. 5. Fig. 5 shows that the electricity price has
a stochastic nature with sharp price spikes and that it increases continuously.
Figure 6 shows the relation between load and price signals from 01/02/2018 to
Fig. 4: Normalized load of ISO NE CA January 2011 to March 2018 (axis label: Normalized Load (MW))
Fig. 5: Normalized price of ISO NE CA January 2011 to March 2018 (axis label: Normalized Price ($/MWh))
Fig. 6: Relation between load and price signals of ISO NE CA January 2011 to March (axis labels: Normalized Load (MW), Normalized Price ($/MWh); tick range 0 to 1)
6.2 Case Study 2: Price Forecast
In Fig. 7, (a) shows the price signals of January 2017, (b) the price signals of
January 2018, (c) the price signals of March 2017 and (d) the price signals of
March 2018. The price signals of the same month, i.e., March 2017 and March
2018, show a similar pattern. It is clear from Fig. 7 that the price patterns of
different months, i.e., January 2018 and March 2018, are very different.
Therefore, the forecast engine is trained on January 2011, January 2012 and so
on, up to the first two weeks of January 2018, and tested on the last two weeks
of January 2018. Models for all twelve months are trained in the same fashion.
Fig. 7: a: Normalized price of January 2017, b: Normalized price of January 2018, c:
Normalized price of March 2017, d: Normalized price of March 2018 (each panel: Price ($/MWh), horizontal axis ticks from 0 to 700)
The same month in different years exhibits almost the same price pattern.
7 Conclusion
In this paper, big data is studied for the load and price forecasting problem.
Deep LSTM is proposed as the forecast model for simultaneous short-term load and
price forecasting. The proposed framework comprises data preprocessing, training
of the improved LSTM model, and forecasting of 24-hour and 168-hour load and
price patterns. The data is studied in great depth, and analytics are performed
to explore data behaviors and trends. Problems in training the LSTM model are
also investigated. The main contributions of this work are a framework that can
accurately forecast energy load and price, and a big data analysis that gives
deeper insights into energy consumption behavior and price trends. This analysis
can serve energy utilities in future decision making and in improving market
operation planning.
References
1. Wang, K., J. Yu, Y. Yu, Y. Qian, D. Zeng, S. Guo, Y. Xiang, and J. Wu. "A survey on
energy internet: architecture, approach and emerging technologies." IEEE Systems
2. Jiang, Hui, Kun Wang, Yihui Wang, Min Gao, and Yan Zhang. "Energy big data:
A survey." IEEE Access 4 (2016): 3844-3861.
3. Liu, Jin-peng, and Chang-ling Li. "The short-term power load forecasting based
on sperm whale algorithm and wavelet least square support vector machine with
DWT-IR for feature selection." Sustainability 9, no. 7 (2017): 1188.
4. Ghasemi, A., H. Shayeghi, Mohammad Moradzadeh, and M. Nooshyar. "A novel hybrid
algorithm for electricity price and load forecasting in smart grids with
demand-side management." Applied Energy 177 (2016): 40-59.
5. Wang, Kun, Chenhan Xu, Yan Zhang, Song Guo, and Albert Zomaya. "Robust big
data analytics for electricity price forecasting in the smart grid." IEEE Transactions
on Big Data (2017).
6. Fan, Cheng, Fu Xiao, and Yang Zhao. "A short-term building cooling load prediction
method using deep learning algorithms." Applied Energy 195 (2017): 222-233.
7. Ryu, Seunghyoung, Jaekoo Noh, and Hongseok Kim. "Deep neural network based
demand side short term load forecasting." Energies 10, no. 1 (2016): 3.
8. Fan, Cheng, Fu Xiao, and Yang Zhao. "A short-term building cooling load prediction
method using deep learning algorithms." Applied Energy 195 (2017): 222-233.
9. Ugurlu, Umut, Ilkay Oksuz, and Oktay Tas. "Electricity Price Forecasting Using
Recurrent Neural Networks." Energies 11, no. 5 (2018): 1-23.
10. Kuo, Ping-Huan, and Chiou-Jye Huang. "An Electricity Price Forecasting Model
by Hybrid Structured Deep Neural Networks." Sustainability 10, no. 4 (2018): 1280.
11. Moghaddass, Ramin, and Jianhui Wang. "A hierarchical framework for smart grid
anomaly detection using large-scale smart meter data." IEEE Transactions on Smart
12. Hou, Weigang, Zhaolong Ning, Lei Guo, and Xu Zhang. "Temporal, Functional
and Spatial Big Data Computing Framework for Large-Scale Smart Grid." IEEE
Transactions on Emerging Topics in Computing (2017).
13. Zhang, Qingchen, Laurence T. Yang, Zhikui Chen, and Peng Li. "A survey on deep
learning for big data." Information Fusion 42 (2018): 146-157.