DCNN and LDA-RF-RFE based Short-Term
Electricity Load and Price Forecasting
Hammad-Ur-Rehman, Sana Mujeeb, Nadeem Javaid*
Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
*Correspondence: firstname.lastname@example.org, www.njavaid.com
Abstract—In this paper, a Deep Convolutional Neural Network (DCNN) is proposed for short-term electricity load and price forecasting. Extracting useful information from data and then using that information for prediction is a challenging task. This paper presents a model consisting of two stages: feature engineering and prediction. Feature engineering comprises Feature Extraction (FE) and Feature Selection (FS). For FS, this paper proposes a technique that combines Random Forest (RF) and Recursive Feature Elimination (RFE). The proposed technique is used for feature redundancy removal and dimensionality reduction. After the useful features are found, DCNN is used for electricity price and load forecasting. DCNN performance is compared with Convolutional Neural Network (CNN) and Support Vector Classifier (SVC) models. Using the forecasting models, day-ahead and week-ahead forecasting is performed for electricity price and load. To evaluate the CNN, SVC and DCNN models, real electricity market data of the New York Independent System Operator (NYISO) is used. Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are used to evaluate the performance of the models. DCNN outperforms the compared models by yielding lower errors.
Index Terms—Convolutional Neural Network (CNN), Data
Analytics, Electricity price and load forecasting, Smart Grid.
I. INTRODUCTION
The Internet of Things (IoT) has gained much popularity in recent years, and the Smart Grid (SG) comes under the IoT umbrella. In the past, traditional power grids were used. A traditional power grid consists of transformers, substations, transmission lines and distribution lines that are located far from each other. Traditional power grids suffer huge power losses because communication is only one-way (from grid to consumers).
Power theft is also easy in the traditional grid. Due to these problems, electricity cost increases. SG solves the aforementioned problems; therefore, power systems and consumers are now more inclined towards SG. With SG, electricity generation, demand and consumption can be managed efficiently. SG provides communication between the end user and the utility. It enables customers to make decisions after evaluating their usage, and utilities can manage generation with the demand in mind. Demand Side Management (DSM) allows energy consumers to shift their load to off-peak hours, which reduces consumption cost. Because load shifts from on-peak to off-peak hours, utilities can manage electricity generation efficiently, which eventually results in lower prices. SG can also reduce electricity theft. SG's past data can be used to forecast the future load and price. Price and load forecasting are very important for decision making, such as energy production, distribution, planning and price bidding in electricity wholesale markets.
Accurate prediction is very important because electricity is very difficult and expensive to store. Generation and consumption must be balanced in order to avoid energy wastage from over-generation.
In the past, many techniques have been presented and implemented to accurately forecast price and load. Forecasting techniques can be divided into three categories: classical, data driven and artificial intelligence. Auto Regressive Integrated Moving Average (ARIMA), Random Forest (RF) and Naive Bayes constitute classical techniques. Techniques like Deep Neural Network (DNN), Artificial Neural Network (ANN), Convolutional Neural Network (CNN), Support Vector Classifier (SVC), Sperm Whale Algorithm (SWA), etc. belong to the category of artificial intelligence techniques.
A. Motivation and Contributions
Many load and price forecasting models already exist, which can be used for short, medium and long term forecasting. In the past, classical techniques like ARIMA and RF were generally used for price and load forecasting. In one existing work, the authors used CNN with Long Short Term Memory (LSTM) to forecast price. Following that methodology, in this paper DCNN is proposed for both price and load prediction. The proposed model is compared with SVC and one base CNN model, and it outperforms both base models. In another existing work, the authors performed Feature Extraction (FE) using CNN; however, Feature Selection (FS) is not performed, so redundancy remains in the extracted features. After reviewing existing prediction models, the following points are deduced:
1. FS and FE are not used in combination to reduce feature redundancy and dimensionality.
2. Evaluation is done on either price or load. In this paper, evaluation is done on both price and load.
3. Data driven and classical prediction models have over-ﬁtting
4. Price prediction is very important due to its random pattern.
DCNN is used to ﬁnd those patterns for better prediction
Fig. 1: Proposed system model (NYISO data; data preprocessing and feature engineering with feature extraction and RF + RFE; train/test split; deep convolutional neural network; load and price forecast)
In this paper, SVC and CNN schemes are implemented for electricity load and price forecasting so that their performance can be compared with that of the proposed model. The objective of this work is to accurately forecast day-ahead and week-ahead load and price. The main contributions of this paper are:
1. Enhanced DCNN based electricity load and price forecasting.
2. Combining RF and Recursive Feature Elimination (RFE) for redundancy removal and dimensionality reduction.
3. Finding appropriate training parameters for prediction techniques.
In this paper, the terms end user, consumer and customer are used interchangeably. The terms load, demand and consumption are also related in the context of this paper, and forecasting and prediction are used in the same sense.
The rest of the paper is arranged as follows. Section II presents the proposed system model. Simulation results and an accuracy comparison with SVC and the base CNN are presented in Section III. Section IV concludes the paper.
II. PROPOSED SYSTEM MODEL
The proposed system model consists of two stages: feature engineering and prediction. Feature engineering is a combination of FE and FS. For FS, this paper proposes a technique that combines RF and RFE. The proposed technique is used to remove feature redundancy and to reduce dimensionality. After the useful features are found, DCNN is used for electricity price and load forecasting. In the proposed system, the NYISO dataset for the year 2016 is used for electricity price and load forecasting. Data is sampled every hour; therefore, each day has a total of twenty-four points. System inputs are Weighted Integrated (TWI) and day-ahead: zonal congestion, zonal losses, day-ahead Locational Marginal Price (LMP). Day-ahead and week-ahead price and load are forecasted using data from January 2016. The proposed system model is shown in Figure 1.
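Because the data is hourly, a day-ahead forecast covers 24 points and a week-ahead forecast covers 168 points. The following is a minimal sketch of how such input/target windows could be cut from an hourly series; the function name `make_windows`, the choice of history length and the synthetic series are assumptions made for illustration and are not taken from the paper.

```python
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


def make_windows(series, history, horizon):
    """Split an hourly series into (input, target) pairs.

    Each input holds `history` consecutive hourly values and each target
    holds the `horizon` values that immediately follow it. Windows start
    at day boundaries (every 24 samples).
    """
    pairs = []
    last_start = len(series) - history - horizon
    for start in range(0, last_start + 1, HOURS_PER_DAY):
        past = series[start:start + history]
        future = series[start + history:start + history + horizon]
        pairs.append((past, future))
    return pairs


# Synthetic stand-in for hourly load: 14 days of a repeating daily profile.
hourly_load = [100 + (h % HOURS_PER_DAY) for h in range(14 * HOURS_PER_DAY)]

day_ahead = make_windows(hourly_load, history=HOURS_PER_DAY, horizon=HOURS_PER_DAY)
week_ahead = make_windows(hourly_load, history=HOURS_PER_WEEK, horizon=HOURS_PER_WEEK)
```

With two weeks of synthetic data, this yields thirteen day-ahead pairs and a single week-ahead pair, which illustrates how quickly the number of training examples shrinks as the horizon grows.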
A. Dataset Description
Historical electricity price and load data for the simulations is taken from the NYISO CAPITL region. Price and load data from January 2016 to January 2017 is used. From Figure 2, it can be observed that load exhibits the same pattern over the year and that load values increase in the summer season. Electricity consumption is lower from January to March; it then increases and stays nearly the same from April to September. From Figure 3, it can be observed that price changes in a volatile manner over time and follows a random pattern. There are many reasons for the varying nature of price; for example, electricity price is directly proportional to the price of fuel.
Fig. 2: Load of NYISO CAPITL region
Fig. 3: Price of NYISO CAPITL region
B. Feature Engineering
In feature engineering, features are extracted using LDA, and the extracted features are further selected using RF and RFE.
Linear Discriminant Analysis (LDA): LDA was proposed by R. A. Fisher in 1936. LDA searches for the linear combination of input variables that best separates two classes. LDA is one of the most commonly used techniques for dimensionality reduction. In this paper, LDA is used for FE. LDA can be summarized in the five steps given below:
1. first, the mean vectors of the features are computed,
2. in the second step, the scatter matrices are computed,
3. in the third step, the eigenvalues and eigenvectors are calculated,
4. in the fourth step, the eigenvectors are arranged in descending order based on their eigenvalues, and
5. in the final step, the features are transformed into a new sub-space using the eigenvector matrix.
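The five steps above can be sketched for the smallest non-trivial case: two features and two classes, where the eigen-decomposition has a closed form. This is only an illustration under stated assumptions: the paper does not say how class labels are derived for load or price, so the two classes and the sample points below are hypothetical, and the paper's own LDA operates on eight features rather than two.

```python
from math import sqrt


def mean_vector(rows):
    """Column-wise mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def outer_sum(vectors, weights):
    """Weighted sum of outer products v v^T, returned as a 2x2 matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v, w in zip(vectors, weights):
        for i in range(2):
            for j in range(2):
                s[i][j] += w * v[i] * v[j]
    return s


def lda_direction(class_a, class_b):
    # Step 1: mean vector of each class and of all samples.
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    mean_all = mean_vector(class_a + class_b)

    # Step 2: within-class (s_w) and between-class (s_b) scatter matrices.
    centered = [[r[0] - mean_a[0], r[1] - mean_a[1]] for r in class_a]
    centered += [[r[0] - mean_b[0], r[1] - mean_b[1]] for r in class_b]
    s_w = outer_sum(centered, [1.0] * len(centered))
    offsets = [[m[0] - mean_all[0], m[1] - mean_all[1]] for m in (mean_a, mean_b)]
    s_b = outer_sum(offsets, [len(class_a), len(class_b)])

    # Step 3: eigenvalues of inv(s_w) @ s_b, in closed form for a 2x2 matrix.
    det_w = s_w[0][0] * s_w[1][1] - s_w[0][1] * s_w[1][0]
    inv_w = [[s_w[1][1] / det_w, -s_w[0][1] / det_w],
             [-s_w[1][0] / det_w, s_w[0][0] / det_w]]
    m = [[sum(inv_w[i][k] * s_b[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = sqrt(max(trace * trace - 4.0 * det, 0.0))
    eigenvalues = [(trace + root) / 2.0, (trace - root) / 2.0]

    # Step 4: the list is already in descending order; take the largest
    # eigenvalue and solve (m - lam * I) v = 0 for its eigenvector.
    lam = eigenvalues[0]
    if abs(m[0][1]) > 1e-12:
        v = [m[0][1], lam - m[0][0]]
    else:
        v = [lam - m[1][1], m[1][0]]
    norm = sqrt(v[0] ** 2 + v[1] ** 2)
    if norm == 0.0:  # diagonal matrix whose largest eigenvalue is m[1][1]
        v, norm = [0.0, 1.0], 1.0
    return [v[0] / norm, v[1] / norm], eigenvalues


def project(rows, direction):
    # Step 5: transform the samples into the new one-dimensional sub-space.
    return [r[0] * direction[0] + r[1] * direction[1] for r in rows]


# Hypothetical two-class sample (e.g. "low" and "high" demand hours).
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 7.0)]

direction, eigenvalues = lda_direction(class_a, class_b)
proj_a = project(class_a, direction)
proj_b = project(class_b, direction)
```

On this sample the two classes do not overlap after projection, which is the property LDA is designed to maximize. For more than two features, a numerical eigen-solver would replace the closed-form step.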
The NYISO dataset has a total of eight features, mentioned in the previous section. These feature vectors are given as input to LDA, and the resultant features are taken as output. This step is performed for both load and price; the outcome is shown in the simulation results section. From the total of eight input feature vectors, only four vectors remain after LDA, because the LDA function is configured to output the five most suitable vectors of all the input vectors. These vectors from LDA are then used as input to the feature selector.
Machine learning systems usually have several features, and there is a strong need to select the important ones. Selecting important features has many benefits for the system. First and foremost, it can reduce the complexity and computational cost of the system. The process of identifying important and required features is called “Feature Selection”. The feature selector used in this paper is a hybrid of two techniques, i.e., RF and RFE. Both techniques have their own uses: RF is used to create a random forest and then find the feature importance, and RFE is used to remove feature redundancy. Both techniques are discussed in detail below.
Random Forest (RF): RF is a simple, robust and popular ensemble learning technique that can be used for finding the importance of features. First, a classifier is trained on the features, which gives the importance of each feature. Then thresholding is applied to find the needed features. In this paper, different thresholds are used for load and price: 0.253 for load and 0.250 for price. Steps for implementing RF are:
1. From all features, randomly select some features. The number of randomly selected features should be less than the total number of features,
2. From the selected features, calculate the node point,
3. Divide the node into sub-nodes,
4. Repeat the above three steps until the desired number of nodes is achieved,
5. Finally, build the forest by repeating the above steps.
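Once the forest has produced an importance score per feature, the thresholding step is a simple filter. The sketch below uses the two thresholds quoted in the text; the importance scores themselves and the helper name `select_by_importance` are hypothetical, chosen only to show the mechanics.

```python
def select_by_importance(importances, threshold):
    """Return the names of features whose importance exceeds `threshold`."""
    return [name for name, score in importances.items() if score > threshold]


# Hypothetical importance scores for four extracted components.
load_importances = {"PC-0": 0.26, "PC-1": 0.25, "PC-2": 0.25, "PC-3": 0.24}
price_importances = {"PC-0": 0.70, "PC-1": 0.12, "PC-2": 0.10, "PC-3": 0.08}

LOAD_THRESHOLD = 0.253   # threshold quoted in the text for load
PRICE_THRESHOLD = 0.250  # threshold quoted in the text for price

load_selected = select_by_importance(load_importances, LOAD_THRESHOLD)
price_selected = select_by_importance(price_importances, PRICE_THRESHOLD)
```

When the scores are nearly equal, as in the hypothetical load case, the outcome is very sensitive to the third decimal place of the threshold, so the threshold is best treated as a tuned parameter.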
Recursive Feature Elimination (RFE): RFE is used to remove redundant features. RFE is a method in which a model is trained repeatedly and the weakest feature is removed each time. The “feature importance” parameter is used to rank the features. A small number of features is eliminated in each iteration. As input, RFE requires the number of features “nF” to be retained. In this paper, the value one is used for “nF”.
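The elimination loop can be sketched as follows. As an assumption for illustration, the importance score here is the absolute Pearson correlation with the target rather than an importance obtained from a trained model; because that stand-in is univariate, re-scoring does not change the ranking between iterations, whereas a model-based importance generally would. The feature names and values are hypothetical.

```python
from math import sqrt


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / sqrt(var_x * var_y)


def recursive_feature_elimination(features, target, n_keep, score=abs_correlation):
    """Drop the lowest-scoring feature one at a time until n_keep remain."""
    remaining = dict(features)
    eliminated = []
    while len(remaining) > n_keep:
        # Re-score the surviving features on every iteration.
        scores = {name: score(values, target) for name, values in remaining.items()}
        weakest = min(scores, key=scores.get)
        eliminated.append(weakest)
        del remaining[weakest]
    return list(remaining), eliminated


target = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
features = {
    "PC-0": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],     # tracks the target exactly
    "PC-1": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],     # tracks the target with noise
    "PC-2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # mostly unrelated
}

# nF = 1, as in the text: keep a single feature.
kept, dropped = recursive_feature_elimination(features, target, n_keep=1)
```

Here `dropped` records the elimination order, which is a by-product that can be useful for inspecting how close the discarded features were to being retained.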
C. Deep CNN
The base CNN model is used for predicting electricity price and load. The output of the base CNN is used for comparison with the DCNN model. In general, a CNN consists of one or more convolution layers followed by a fully connected layer. In this paper, one convolution layer, one max-pooling layer, one flatten layer and one dense layer are used to construct the base CNN model.
After RFE, the features and target are divided into training and testing data. For price and load prediction, the DCNN model is proposed; it is given in Algorithm 1. The output of DCNN is compared with the base CNN and SVC. In general, a CNN consists of one or more convolution layers followed by a fully connected layer. There can be other layers as well, such as Dense and Flatten. A CNN comprises a number of convolutional and max pooling (or sub-sampling) layers, optionally followed by fully connected layers, which are used in this paper. The input of the DCNN convolution layer is an “r” by “c” matrix, where “r” is the number of rows and “c” is the number of columns, with one channel. The convolution layer has “k” kernels, each with a size smaller than the input matrix dimension and with one channel. Each filter map is then sub-sampled, typically with max or average pooling. An activation function is applied either before or after the pooling layer; in this paper, the “relu” function is used. Two Dense layers are also used in building the DCNN model. After the layers are added, the model is first compiled, then trained on the training data; after that, prediction is done using the test features, and finally the predicted data is compared with the test data. 70% of the data is used for training the network, and testing is performed on the remaining 30%.
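To make the role of each layer concrete, the following is a minimal pure-Python sketch of one forward pass through a convolution layer, the relu activation, max pooling, a flatten step and a dense layer. It is not the paper's implementation: only a single convolution layer is shown, the kernels and dense weights are fixed hypothetical values rather than learned ones, and the function names are illustrative.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation, as in most deep learning libraries)."""
    width = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(width))
            for i in range(len(signal) - width + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def max_pool1d(values, pool_size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, pool_size)]


def dense(inputs, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def forward(signal, kernels, dense_weights, dense_bias, pool_size=2):
    # One feature map per kernel: convolution -> relu -> max pooling.
    feature_maps = [max_pool1d(relu(conv1d(signal, k)), pool_size) for k in kernels]
    # Flatten layer: concatenate all feature maps into one vector.
    flat = [v for fmap in feature_maps for v in fmap]
    return dense(flat, dense_weights, dense_bias)


signal = [1.0, 2.0, 4.0, 3.0, 1.0, 0.0]
kernels = [[1.0, -1.0], [0.5, 0.5]]          # kernel size 2
dense_weights = [[1.0, 1.0, 1.0, 1.0]]       # one output unit
dense_bias = [0.5]

prediction = forward(signal, kernels, dense_weights, dense_bias)
```

Training would adjust the kernel and dense weights to reduce the forecasting error; that step is left out here because it depends on the optimizer and loss function, which the text does not specify.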
Algorithm 1: DCNN algorithm
Input: filter1Size, filter2Size, epochs, activation
Create a sequential model using Sequential()
Add the first convolution layer to the model using Conv1D(filterSize, kernelSize, activation, input)
Add the second convolution layer to the model using Conv1D(filterSize, kernelSize, activation)
Add a max pooling layer to the model using MaxPooling1D(poolingSize)
Add a flatten layer to the model using Flatten()
Add the first dense layer to the model using Dense(units)
Add the second dense layer to the model using Dense(units)
Compile the model using Compile(optimizer, lossFunction)
Train the model for the given number of epochs:
for i ← 0 to epochs by 1 do
    Train the model using fit(featuresTrain, targetTrain)
end
Predict the data using predict(featuresTest)
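Algorithm 1 assumes that featuresTrain, targetTrain and featuresTest already exist. A minimal sketch of the 70/30 split that could produce them is given below. It assumes the split is chronological (no shuffling), which is a common choice for time series but is not stated in the paper; the helper name and the synthetic data are illustrative.

```python
def chronological_split(features, target, train_percent=70):
    """Split aligned feature/target sequences without shuffling.

    The first `train_percent` percent of the samples form the training
    set and the remainder form the test set.
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    cut = len(features) * train_percent // 100
    return features[:cut], features[cut:], target[:cut], target[cut:]


features = [[float(h)] for h in range(10)]   # synthetic single-feature rows
target = [2.0 * h for h in range(10)]

featuresTrain, featuresTest, targetTrain, targetTest = chronological_split(features, target)
```

Keeping the test set at the end of the series means the model is always evaluated on hours that come after everything it was trained on.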
III. SIMULATION RESULTS
All simulations are done in Python. Data preprocessing and forecasting results for price and load are discussed in detail in this section.
Fig. 4: Load features after LDA
Fig. 5: Price features after LDA
After LDA, the new feature vectors for load are shown in Figure 4 and the new feature vectors for price are shown in Figure 5. These features are given as input to RF. RF assigns an importance to each feature, and after thresholding (keeping features whose importance value is more than 50% of the total range of feature importance values), the important features are selected. In Figure 6, the feature importance scores of the price features are displayed as a bar graph. It shows that Principal Component (PC) PC-0 has the highest importance and that the others have very low importance compared to PC-0. So, after thresholding, only PC-0 is left. In Figure 7, the feature importance scores of the load features are displayed as a bar graph. It shows that almost all the features have the same importance; however, PC-0 has the highest score, so after thresholding only PC-0 is left. The output of RF is used as input to the next technique, RFE. The error metrics are shown in Table I.
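For reference, the two error metrics can be computed as in the short sketch below. The formulas are the standard definitions of MAE and RMSE; the sample values are made up for illustration and are not results from the paper.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average of the absolute forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared forecast error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [30.0, 32.0, 35.0, 31.0]      # illustrative hourly prices
predicted = [28.0, 33.0, 35.0, 35.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_square_error(actual, predicted)
```

RMSE is never smaller than MAE on the same data, and the gap between the two grows when a few hours have large errors, since squaring weights those hours more heavily.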
Day-ahead predicted and target price results from the base CNN and SVC are shown in Figure 8 and Figure 9, respectively. The price prediction results are not accurate because of the small number of epochs, filters and layers. CNN works well for prediction when there is more data and an appropriate number of convolutional layers and, above all, when it is run for a higher number of epochs. At each epoch, CNN learns the kernel weights and tries to optimize them. Improved CNN models are therefore built that beat both the CNN and SVC base models. In all models, the kernel size used in the convolution layers and the pooling size used in the max pooling layer are kept constant. A flatten layer, which converts the data matrix into one column, is also used in all models. Except for these two parameters and this one layer, all parameters are varied to find the accuracy of the CNN model.
It is clear that accuracy increases significantly as the number of epochs is increased. The number of filters is also increased along with the number of epochs. In CNN 1, with 32 filters in the first convolution layer and 128 filters in the second layer, MAE and RMSE decrease and accuracy increases. In CNN 2, the epochs are kept the same as in CNN 1; however, the filters in the first layer are changed to 64, due to which MAE and RMSE reduce to approximately 6.250 and 7.100. By changing the epochs to 9,000 and the filters to 32 and 128, the CNN 3 and CNN 4 models are produced, and the results improve again. Note that a dropout layer is also added in CNN 4. Simulations are also run for 10,000 epochs, and accuracy increases further. The best results are produced with 20,000 epochs and with 32 and 128 filters for the first and second layer, respectively.
Fig. 6: Price features importance score
Fig. 7: Load features importance score
Fig. 8: Comparison of actual and predicted day-ahead price using the base CNN model
Fig. 9: Comparison of actual and predicted day-ahead price
Fig. 10: Comparison of actual and predicted day-ahead price using DCNN (CNN 5)
Fig. 11: Comparison of actual and predicted day-ahead load
Fig. 12: Comparison of actual and predicted day-ahead load using the base CNN model
Fig. 13: Comparison of actual and predicted day-ahead load using DCNN (CNN 4)
TABLE I: Enhanced CNN model error metrics
CNN ID   Filters    Kernel Size   Pool Size   Flatten   Dense Size   Epochs   MAE      RMSE
CNN 1    128, 512   2             2           Yes       200, 24      500      5.106    26.732
CNN 2    32, 128    2             2           Yes       200, 24      500      5.566    28.302
CNN 3    32, 128    2             2           Yes       200, 24      1000     8.847    42.915
CNN 4    64, 512    2             2           Yes       200, 24      10000    4.6875   24.955
CNN 5    64, 512    2             2           Yes       200, 24      10000    13.089   17.233
The enhanced CNN (CNN 5) model results for day-ahead price are shown in Figure 10. These results beat the base SVC (as shown in Figure 11) and base CNN (as shown in Figure 12) models. Still, there are some fluctuations because the data may have some missing fields. Using the base CNN model, load prediction is not that good, for multiple reasons. Firstly, the base CNN model contains fewer filters and secondly the number
of epochs is low. CNN learns the weights on each iteration; if there are few epochs, less learning takes place and the output is poor. CNN kernels require a reasonable number of iterations to converge. CNN works well for prediction when there is more data and an appropriate number of convolutional layers and, above all, when it is run for a higher number of epochs. The number of epochs should be optimal: if it is very high, the loss value starts saturating after some time, and beyond that point there is no need to run further. This epoch value can be found by trial and error. At each epoch, CNN learns the kernel weights and tries to optimize them. Improved CNN models are therefore built that beat both the CNN and SVC base models. In all models, the kernel size used in the convolution layers and the pooling size used in the max pooling layer are kept constant. A flatten layer, which converts the data matrix into one column, is also used in all models. Except for these two parameters and this one layer, all parameters are varied to increase the accuracy of the CNN model. It is clear that accuracy increases significantly as the number of epochs is increased; however, when the epochs are increased from 10,000 to 20,000, accuracy decreases instead of increasing. So, it can be concluded that 10,000 is the required number of epochs. The number of filters is also increased along with the number of epochs. In CNN 1, with 128 filters in the first convolution layer and 512 filters in the second layer, MAE and RMSE decrease, so accuracy increases. In CNN 2, the epochs are the same as in CNN 1, whereas the filters are changed to 32 in the first layer and 128 in the second layer, which reduces the forecast error. The best results are produced with 10,000 epochs and with 64 and 512 filters for the first and second layer, respectively. The day-ahead load forecasting results of the enhanced CNN (CNN 4) model are shown in Figure 13; they beat the CNN and SVC base models. Still, there are some fluctuations because the data is not up to date; in future, more data can be used to further increase the accuracy. Using the enhanced model, week-ahead load is also predicted, which shows a large improvement over the predictions of the base CNN and SVC models.
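The observation that the loss saturates beyond some number of epochs suggests a simple alternative to pure trial and error: record the loss per epoch and look for the point after which it stops improving. The sketch below is one possible way to do this; the `tolerance` and `patience` parameters and the synthetic loss curve are assumptions for illustration and do not come from the paper.

```python
def saturation_epoch(losses, tolerance=1e-3, patience=3):
    """Return the 1-based epoch after which the loss stops improving.

    The loss is treated as saturated once it has failed to improve on the
    best value seen so far by more than `tolerance` for `patience` epochs
    in a row. If that never happens, the epoch of the best loss is returned.
    """
    best = float("inf")
    best_epoch = 0
    stale = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss > tolerance:
            best, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch
    return best_epoch


# Synthetic loss curve: fast improvement, then a plateau.
loss_history = [10.0, 6.0, 4.0, 3.0, 2.5, 2.4995, 2.4993, 2.4992, 2.4992]
stop_at = saturation_epoch(loss_history)
```

Applied to a validation loss rather than the training loss, the same rule also guards against the case described above, where running more epochs makes accuracy worse instead of better.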
IV. CONCLUSION
In this paper, a DCNN model is proposed for load and price forecasting. The proposed model comprises three stages, i.e., feature extraction through LDA, feature selection through a new feature selector that combines RF and RFE, and forecasting through an enhanced DCNN. The hyperparameters of the enhanced DCNN are tuned to achieve high prediction accuracy. The proposed model is tested using real-world electricity market data of NYISO to validate it. Simulation results show that the proposed model has lower error rates than CNN and SVC, which demonstrates its applicability and effectiveness for electricity load and price forecasting.
REFERENCES
 B. Neupane, W. L. Woon and Z. Aung, Ensemble Prediction Model with
Expert Selection for Electricity Price Forecasting, Energies, Vol. 10, pp.
Shi, Heng and Xu, Minghao and Li, Ran, Deep learning for household load forecasting - A novel pooling deep RNN, IEEE Transactions on Smart Grid, Vol. 9 (5), pp.5271-5280. 2018.
Chitsaz, Hamed and Zamani-Dehkordi, Payam and Zareipour, Hamidreza and Parikh, Palak, Electricity price forecasting for operational scheduling of behind-the-meter storage systems, IEEE Transactions on Smart Grid,
 Claessens, Bert J and Vrancx, Peter and Ruelens, Frederik, Convolutional
neural networks for automatic state-time feature extraction in reinforce-
ment learning applied to residential load control, IEEE Transactions on
Power Systems, Vol. 9 (4), pp.3259–3269. 2018.
 Ahmad. Ashfaq, Javaid Nadeem, Guizani M., Alrajeh, N. and Khan,
Zahoor.A., An accurate and fast converging short-term load forecasting
model for industrial applications in a smart grid, IEEE Transactions on
Industrial Informatics, Vol. 13(5), pp.2587-2596, 2017.
Lago, J., De Ridder, F. and De Schutter, B., Forecasting spot electricity prices: Deep learning approaches and empirical comparison of traditional algorithms, Applied Energy, Vol. 221, pp.386-405. 2018.
 Raﬁei, Mehdi and Niknam, Taher and Khooban, Mohammad-Hassan,
Probabilistic forecasting of hourly electricity price by generalization of
ELM for usage in improved wavelet neural network, IEEE Transactions
on Industrial Informatics, Vol. 13 (1), pp.71-79. 2019.
 Ugurlu, Umut and Oksuz, Ilkay and Tas, Oktay, Electricity Price Fore-
casting Using Recurrent Neural Networks, Energies, Vol. 11 (5), pp.1-23.
 Bassamzadeh, Nastaran and Ghanem, Roger, Multiscale stochastic pre-
diction of electricity demand in smart grids using Bayesian networks,
Applied energy, Vol. 193, pp.369-380. 2017.
González, José Portela and San Roque, Antonio Muñoz and Pérez, Estrella Alonso, Forecasting functional time series with a new Hilbertian ARMAX model: Application to electricity price forecasting, IEEE Transactions on Power Systems, Vol. 33 (1), pp.545–556. 2018.
 Ziel, Florian and Weron, Rafał, Day-ahead electricity price forecasting
with high-dimensional structures: Univariate vs. multivariate modeling
frameworks, Energy Economics, Vol. 70, pp.396–420. 2018.
 Luo, Jian and Hong, Tao and Fang, Shu-Cherng, Benchmarking robust-
ness of load forecasting models under data integrity attacks, International
Journal of Forecasting, Vol. 34 (1), pp.89–104. 2018.
 Muralitharan, K and Sakthivel, Rathinasamy and Vishnuvarthan, R, Neu-
ral network based optimization approach for energy demand prediction
in smart grid, Neurocomputing, Vol. 273, pp.199–208. 2018.
Kuo, Ping-Huan and Huang, Chiou-Jye, An Electricity Price Forecasting Model by Hybrid Structured Deep Neural Networks, Sustainability, Vol. 10, pp.10. 2018.