All content in this area was uploaded by Nadeem Javaid on Oct 14, 2019

DCNN and LDA-RF-RFE based Short-Term

Electricity Load and Price Forecasting

Hammad-Ur-Rehman, Sana Mujeeb, Nadeem Javaid*

Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan

*Correspondence: nadeemjavaidqau@gmail.com, www.njavaid.com

Abstract—In this paper, a Deep Convolutional Neural Network (DCNN) is proposed for short-term electricity load and price forecasting. Extracting useful information from data and then using that information for prediction is a challenging task. This paper presents a model consisting of two stages: feature engineering and prediction. Feature engineering comprises Feature Extraction (FE) and Feature Selection (FS). For FS, this paper proposes a technique that combines Random Forest (RF) and Recursive Feature Elimination (RFE). The proposed technique is used for feature redundancy removal and dimensionality reduction. After the useful features are found, DCNN is used for electricity price and load forecasting. DCNN performance is compared with Convolutional Neural Network (CNN) and Support Vector Classifier (SVC) models. Using the forecasting models, day-ahead and week-ahead forecasting is performed for electricity price and load. To evaluate the CNN, SVC and DCNN models, real electricity market data of the New York Independent System Operator (NYISO) is used. Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are used to evaluate the performance of the models. DCNN outperforms the compared models by yielding lower errors.

Index Terms—Convolutional Neural Network (CNN), Data

Analytics, Electricity price and load forecasting, Smart Grid.

I. INTRODUCTION

Internet of Things (IoT) has gained much popularity these

days. Smart Grid (SG) comes under the IoT umbrella. In

the past, traditional power grids were used. A traditional power grid consists of transformers, substations, transmission lines and distribution lines that are located far from each other. In traditional power grids, there are huge power losses because communication is only one-way (from grid to consumers). Power theft is also easy in the traditional grid. Due to these problems, electricity cost increases. SG solves the aforementioned problems; therefore, power systems and consumers are now more inclined towards SG. With SG, electricity generation, demand and consumption can be managed efficiently. SG has

the functionality of communication between end user and the

utility. It enables customers to take decisions after evaluating

their usage. Utilities can also manage the generation keeping in

mind the demand. Demand Side Management (DSM) allows energy consumers to shift their load to off-peak hours, which reduces consumption cost [1]. Because load is shifted from on-peak to off-peak hours, utilities can manage electricity generation efficiently, which eventually results in lower prices. SG can also reduce electricity theft. SG’s past data can be used to forecast the future load

and price [2]. Price and load forecasting are very important

for decision making like production of energy, distribution,

planning and price bidding in electricity wholesale markets.

Accurate prediction is very important because it is very

difﬁcult and expensive to store electricity [3]. There must be a

balance between generation and consumption in order to avoid

energy wastage by over-generation.

In the past, many techniques have been presented and implemented to accurately forecast price and load [4]–[9]. Forecasting techniques can be divided into three categories: classical, data-driven and artificial intelligence. Auto Regressive Integrated Moving Average (ARIMA) [10], Random Forest (RF) [11], and Naive Bayes constitute classical techniques. Techniques like Deep Neural Network (DNN), Artificial Neural Network (ANN), Convolutional Neural Network (CNN), Support Vector Classifier (SVC), Sperm Whale Algorithm (SWA), etc. belong to the category of artificial intelligence techniques.

A. Motivation and Contributions

There are many load and price forecasting models already available, which can be used for short-, medium- and long-term forecasting. In the past, classical techniques [12] such as ARIMA and RF were generally used for price and load forecasting. In [13], the authors used CNN with Long Short Term Memory (LSTM) to forecast price. Following their methodology, DCNN is proposed in this paper for both price and load prediction. The proposed model is also compared with SVC and a base CNN model, and it outperforms both base models. In [14], the authors performed Feature Extraction (FE) using CNN; however, Feature Selection (FS) was not performed, so redundancy remains in the extracted features. After reviewing existing prediction models, the following points are deduced:

1. FS and FE are not used in combination to reduce feature redundancy and dimensionality.

2. Evaluation is done on either price or load. In this paper, evaluation is done on both price and load.

3. Data-driven and classical prediction models have over-fitting problems.

4. Price prediction is very important due to the random pattern of price. DCNN is used to find those patterns for better prediction accuracy.

In this paper, SVC and CNN schemes are implemented for electricity load and price forecasting in order to compare the performance of the proposed model.

Fig. 1: Proposed system model (blocks: NYISO Data; Data Preprocessing and Feature Engineering, consisting of Feature Extraction (LDA) and Feature Selection (RF + RFE); Split Data Train / Test; Deep Convolution Neural Network Forecasting; Load and Price Forecast)

The objective of this work is to accurately forecast day-ahead and week-ahead load and price. The main contributions of this paper are:

1. An enhanced DCNN-based electricity load and price forecasting system.

2. A combination of RF and Recursive Feature Elimination (RFE) for redundancy removal and dimensionality reduction.

3. Appropriate training parameters for the prediction technique.

In this paper, the terms end user, consumer and customer are used interchangeably. The terms load, demand and consumption are also related in the context of this paper. Forecasting and prediction are likewise used in the same context.

The rest of the paper is arranged as follows. Section II presents the proposed system model. Simulation results and an accuracy comparison with SVC and the base CNN are presented in Section III. Section IV concludes the paper.

II. PROPOSED SYSTEM MODEL

The proposed system model consists of two stages: feature engineering and prediction. Feature engineering is a combination of FE and FS. For FS, this paper proposes a technique that combines RF and RFE. The proposed technique is used to remove feature redundancy and to reduce dimensionality. After the useful features are found, DCNN is used for electricity price and load forecasting. In the proposed system, the NYISO dataset for the year 2016 is used for electricity price and load forecasting. Data is sampled every hour; therefore, each day has a total of twenty-four points. System inputs are Weighted Integrated (TWI) and day-ahead: zonal congestion, zonal losses, day-ahead Locational Marginal Price (LMP). Price and load for the day ahead and the next week are forecasted using data from January 2016. The proposed system model is shown in Figure 1.
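As an illustration of the hourly sampling described above, the following minimal sketch reshapes an hourly series into one row of twenty-four points per day and pairs each day with the following day. The helper names `to_daily_matrix` and `day_ahead_pairs`, the synthetic series, and the choice of "previous day in, next day out" pairing are assumptions made for this example; the paper does not specify how its input samples are framed.

```python
import numpy as np

def to_daily_matrix(hourly, points_per_day=24):
    """Reshape a 1-D hourly series into one row per complete day."""
    hourly = np.asarray(hourly, dtype=float)
    n_days = hourly.size // points_per_day          # drop any trailing partial day
    return hourly[: n_days * points_per_day].reshape(n_days, points_per_day)

def day_ahead_pairs(daily):
    """Pair each day (input) with the following day (target). Assumed framing."""
    return daily[:-1], daily[1:]

# Synthetic stand-in for an hourly load series (not NYISO data).
hours = np.arange(24 * 7 + 5)                       # one week plus 5 stray hours
series = 1000 + 200 * np.sin(2 * np.pi * hours / 24)

daily = to_daily_matrix(series)
X, y = day_ahead_pairs(daily)
print(daily.shape, X.shape, y.shape)                # (7, 24) (6, 24) (6, 24)
```

Dropping the trailing partial day keeps every row aligned to the same hours, which is one simple way to handle a series whose length is not a multiple of twenty-four.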

A. Dataset Description

Historical electricity price and load data for the simulations is taken from the NYISO CAPITL region. Price and load data from January 2016 to January 2017 is used. From Figure 2, it can be observed that load exhibits the same pattern over the year, with load values increasing in the summer season. Electricity consumption is lower from January to March; it then increases and stays nearly the same from April to September. From Figure 3, it can be observed that price changes are volatile over time and follow a random pattern. There are many reasons for the varying nature of price. The price of fuel is directly proportional to the electricity price.

Fig. 2: Load of NYISO CAPITL region

Fig. 3: Price of NYISO CAPITL region

B. Feature Engineering

In feature engineering, features are extracted using LDA, and the extracted features are further selected using RF and RFE.

Linear Discriminant Analysis (LDA): LDA was proposed by R. A. Fisher in 1936. LDA searches for the linear combination of input variables that best separates two classes. LDA is a commonly used technique for dimensionality reduction. In this paper, LDA has been used for FE. LDA can be summarized in the five steps given below:

1. first, the mean feature vector of each class is computed,

2. in the second step, the scatter matrices are computed,

3. in the third step, the eigenvalues and eigenvectors are calculated,

4. in the fourth step, the eigenvectors are arranged in descending order of their eigenvalues, and

5. in the final step, the features are transformed into a new sub-space using the eigenvector matrix.
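The five steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: LDA needs class labels, and the paper does not say how labels are obtained for continuous load or price values, so the example bins a synthetic target into five quantile classes. The function name `lda_transform` and all data are hypothetical.

```python
import numpy as np

def lda_transform(X, labels, n_components):
    """Project X onto the leading discriminant directions (steps 1 to 5)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)

    # Steps 1-2: per-class means, then within-class (S_w) and
    # between-class (S_b) scatter matrices.
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for c in np.unique(labels):
        X_c = X[labels == c]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += X_c.shape[0] * (diff @ diff.T)

    # Step 3: eigenvalues and eigenvectors of pinv(S_w) @ S_b.
    eig_vals, eig_vecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    eig_vals, eig_vecs = eig_vals.real, eig_vecs.real

    # Step 4: order eigenvectors by descending eigenvalue.
    order = np.argsort(eig_vals)[::-1]
    W = eig_vecs[:, order[:n_components]]

    # Step 5: transform the features into the new sub-space.
    return X @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                        # eight synthetic input features
target = X @ rng.normal(size=8) + 0.1 * rng.normal(size=300)
# Assumption: class labels come from binning the target into 5 quantile classes.
labels = np.digitize(target, np.quantile(target, [0.2, 0.4, 0.6, 0.8]))

Z = lda_transform(X, labels, n_components=4)
print(Z.shape)                                       # (300, 4)
```

The pseudo-inverse is used instead of a plain inverse so the sketch still runs when the within-class scatter matrix is close to singular.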

The NYISO dataset has a total of eight features, mentioned in the previous section. These feature vectors are given as input to LDA, and the resultant features are taken as output. This step has been done for both load and price. Simulation results are shown in the simulation results section. From the total of eight input feature vectors, only four vectors remain after LDA, because in the LDA function it is specified that the output should be the five most suitable vectors from all the vectors at the input. These vectors from LDA are then used as input to the feature selector.

In most machine learning systems there are several features, and there is a need to select the important ones. Selecting important features has many benefits for the system. First and foremost, it can reduce the complexity and computational cost of the system. The process of identifying important and required features is called “Feature Selection”. The feature selector used in this paper is a hybrid of two techniques, i.e., RF and RFE. Both techniques have their own uses. RF is used to create a random forest and then find the feature importance. RFE is used to remove feature redundancy. Both techniques are now discussed in detail.

Random Forest (RF): RF is a simple, robust and popular ensemble learning technique that can be used for finding the importance of features. First, a classifier is trained on the features, which gives the importance of each feature. A threshold can then be applied to the importance values to find the needed features. In this paper, different thresholds are used for load and price: 0.253 for load and 0.250 for price. The steps for implementing RF are:

1. From all features, randomly select some features. The number of randomly selected features should be less than the total number of features,

2. From the selected features, calculate the node point,

3. Divide the node into sub-nodes,

4. Repeat the above three steps until the desired number of nodes is achieved, and

5. Finally, build the forest by repeating the above steps.
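The importance-plus-threshold selection described above could look roughly like the sketch below, which uses scikit-learn. Several choices here are assumptions rather than details from the paper: `RandomForestRegressor` is used because the synthetic target is continuous (the text speaks of a classifier), the helper `select_by_importance` is hypothetical, and the data are random numbers. Only the threshold value 0.25 mirrors the price threshold quoted in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def select_by_importance(X, y, threshold, seed=0):
    """Fit a random forest and keep features whose importance exceeds threshold."""
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_        # normalized to sum to 1
    keep = np.flatnonzero(importances > threshold)   # indices of retained features
    return keep, importances

# Synthetic data: only column 0 drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=400)

keep, importances = select_by_importance(X, y, threshold=0.25)
print(importances.round(3))
print(keep)
```

Because the importances sum to one, a fixed threshold such as 0.25 can retain at most three features out of any number of inputs, which is worth keeping in mind when choosing the cut-off.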

Recursive Feature Elimination: RFE is used to remove redundant features. RFE is a method in which a model is trained repeatedly and the weakest feature is removed. The “feature importance” parameter is used to rank the features. A small number of features are eliminated in each iteration. RFE requires as input the number of features “nF” to be retained. In this paper, the value one is used for “nF”.
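A minimal sketch of this elimination loop, using scikit-learn's `RFE` class with a random forest as the ranking model, is given below. The estimator choice, the number of trees and the synthetic data are assumptions for illustration; `n_features_to_select=1` corresponds to the value of “nF” stated above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# Synthetic data: only column 2 carries signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 2] + 0.1 * rng.normal(size=300)

selector = RFE(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    n_features_to_select=1,   # nF = 1, as stated in the text
    step=1,                   # drop one feature per iteration
)
selector.fit(X, y)

print(selector.support_)      # boolean mask of retained features
print(selector.ranking_)      # 1 marks the retained feature
X_selected = selector.transform(X)
print(X_selected.shape)
```

With `step=1` the model is refitted after every single removal, which is slower than removing several features at once but gives the finest-grained ranking.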

C. Deep CNN

A base CNN model is used for predicting electricity price and load, and its output is used for comparison with the DCNN model. In general, a CNN consists of one or more convolution layers followed by a fully connected layer. In this paper, one convolution layer, one max-pooling layer, one flatten layer and one dense layer are used to construct the base CNN model.

After RFE, the features and the target are divided into training and testing data. For price and load prediction, the DCNN model is proposed. The DCNN model is given in Algorithm 1, and its output is compared with the base CNN and SVC. A CNN comprises a number of convolutional and max-pooling or sub-sampling layers, and there can be other layers as well, such as Dense and Flatten. Fully connected layers, which are optional in general, are used in this paper. The input to the DCNN convolution layer is an “r” by “c” matrix, where “r” is the number of rows and “c” is the number of columns, with one channel. The convolution layer has “k” kernels; each kernel is square, with a side length smaller than the input matrix dimensions, and has one channel. Each filter map is then sub-sampled, typically with max or average pooling. An activation function is applied either before or after the pooling layer; in this paper, the “relu” function is used. Two Dense layers are also used in building the DCNN model. After the layers are added, the model is first compiled and then trained on the training data. Prediction is then done using the test features, and finally the predicted data is compared with the test data. 70% of the data is used for training the network, and testing is performed on the remaining 30%.
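The 70/30 division mentioned above can be sketched as follows. The paper does not state whether the split is random or ordered in time; this example assumes an ordered (chronological) split, which is a common choice for time series, and the helper name `chronological_split` is hypothetical.

```python
import numpy as np

def chronological_split(features, target, train_fraction=0.7):
    """Split without shuffling: the first part trains, the remainder tests."""
    features = np.asarray(features)
    target = np.asarray(target)
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    cut = int(round(len(features) * train_fraction))
    return features[:cut], features[cut:], target[:cut], target[cut:]

# Toy data: 10 samples with 2 features each.
features = np.arange(20, dtype=float).reshape(10, 2)
target = np.arange(10, dtype=float)

f_train, f_test, t_train, t_test = chronological_split(features, target)
print(len(f_train), len(f_test))   # 7 3
```

Keeping the test samples strictly after the training samples avoids evaluating the model on periods that precede data it has already seen.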

Algorithm 1: DCNN algorithm

Input: filter1Size, filter2Size, epochs, activation, loss, poolingSize, input, kernelSize

Output: dcnnModel, predictedData

1: Make a sequential model using the function Sequential()

2: Add the first convolution layer to the model using Conv1D(filter1Size, kernelSize, activation, input)

3: Add the second convolution layer to the model using Conv1D(filter2Size, kernelSize, activation)

4: Add the max-pooling layer to the model using MaxPooling1D(poolingSize)

5: Add the flatten layer to the model using Flatten()

6: Add the first dense layer to the model using Dense(denseSize, activation)

7: Add the second dense layer to the model using Dense(denseSize)

8: Compile the model using Compile(optimizer, lossFunction)

9: Train the model for the given number of epochs

10: for i ← 0 to epochs by 1 do

11:     Train the model using fit(featuresTrain, targetTrain)

12: Predict the data using predict(featuresTest)
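To make the layer stack of Algorithm 1 concrete, the sketch below traces a single forward pass through two 1-D convolutions, max pooling, flattening and two dense layers, written in plain NumPy so that it has no deep learning dependency. It is not the authors' model: the weights are random and untrained, no compile or fit step is shown, and the function names are hypothetical. The sizes (kernel size 2, pool size 2, dense sizes 200 and 24, filters 32 and 128) are taken from Table I; the 24-point single-channel input is an assumption.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1d(x, kernels, bias):
    """'Valid' 1-D convolution. x: (steps, in_ch); kernels: (k, in_ch, out_ch)."""
    k = kernels.shape[0]
    steps_out = x.shape[0] - k + 1
    out = np.empty((steps_out, kernels.shape[2]))
    for t in range(steps_out):
        # Contract the (k, in_ch) window with every output filter.
        out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
    return out

def max_pool1d(x, pool_size):
    """Non-overlapping max pooling along the time axis."""
    steps_out = x.shape[0] // pool_size
    trimmed = x[: steps_out * pool_size]
    return trimmed.reshape(steps_out, pool_size, x.shape[1]).max(axis=1)

def dense(x, weights, bias):
    return x @ weights + bias

def init_params(steps, filters1, filters2, dense_size, out_size,
                kernel_size=2, pool_size=2, seed=0):
    """Random, untrained weights with shapes matching the layer stack."""
    rng = np.random.default_rng(seed)
    steps_after = (steps - 2 * (kernel_size - 1)) // pool_size
    flat = steps_after * filters2

    def w(*shape):
        return rng.normal(scale=0.1, size=shape)

    return {
        "conv1": (w(kernel_size, 1, filters1), np.zeros(filters1)),
        "conv2": (w(kernel_size, filters1, filters2), np.zeros(filters2)),
        "dense1": (w(flat, dense_size), np.zeros(dense_size)),
        "dense2": (w(dense_size, out_size), np.zeros(out_size)),
    }

def dcnn_forward(x, params, pool_size=2):
    h = relu(conv1d(x, *params["conv1"]))       # first convolution layer
    h = relu(conv1d(h, *params["conv2"]))       # second convolution layer
    h = max_pool1d(h, pool_size)                # max-pooling layer
    h = h.reshape(-1)                           # flatten layer
    h = relu(dense(h, *params["dense1"]))       # first dense layer
    return dense(h, *params["dense2"])          # second dense layer, linear output

x = np.random.default_rng(1).normal(size=(24, 1))   # 24 hourly points, one channel
params = init_params(steps=24, filters1=32, filters2=128,
                     dense_size=200, out_size=24)
prediction = dcnn_forward(x, params)
print(prediction.shape)                              # (24,)
```

With these sizes the time axis shrinks from 24 to 23 to 22 through the two convolutions and to 11 after pooling, so the flatten layer feeds 11 × 128 = 1408 values into the first dense layer.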

III. SIMULATION RESULTS

All simulations are done in Python. Data preprocessing and the price and load forecasting results are discussed in detail in this section. After LDA, the new feature vectors for load are shown in Figure 4, and those for price are shown in Figure 5. These features are given as input to RF. RF assigns an importance to each feature, and the important features are selected after thresholding (using a threshold of importance value more than 50% of the total range of feature importance values). In Figure 6, the feature importance scores of the price features are displayed as a bar graph. It shows that Principal Component (PC) PC-0 has the highest importance and the others have much lower importance than PC-0, so only PC-0 is left after thresholding. In Figure 7, the feature importance scores of the load features are displayed as a bar graph. It shows that almost all the features have similar importance; however, PC-0 has the highest score, so only PC-0 is left after thresholding. The output of RF is used as input to the next technique, RFE. The error metrics are shown in Table I.

Fig. 4: Load features after LDA

Fig. 5: Price features after LDA

Day-ahead predicted and target price results from CNN and SVC are shown in Figure 8 and Figure 9, respectively. The price prediction results have low accuracy because of the small number of epochs, the small number of filters and the few layers. CNN works well for prediction when there is more data and an appropriate number of convolutional layers and, above all, when it is run for a higher number of epochs. At each epoch, CNN learns the kernel weights and tries to optimize them. Improved CNN models are therefore built that beat both the CNN and SVC base models. In all models, the kernel size used in the convolution layers and the pooling size used in the max-pooling layer are kept constant. A flatten layer, which converts the data matrix into one column, is also used in all models. Except for these two parameters and this one layer, all the parameters are changed to find the accuracy of the CNN model.

Fig. 6: Price features importance score

It is clear that accuracy increases significantly as the number of epochs is increased. The number of filters is also increased along with the number of epochs. In CNN 1, with 32 filters in the first convolution layer and 128 filters in the second layer, MAE and RMSE decrease and accuracy increases. In CNN 2, the epochs are kept the same as in CNN 1; however, the filters in the first layer are changed to 64, due to which MAE and RMSE reduce to approximately 6.250 and 7.100. By changing the epochs to 9,000 and the filters to 32 and 128, the CNN 3 and CNN 4 models are produced, and the results improve again. Note that a dropout layer was also added in CNN 4. Simulations were also run for 10,000 epochs, and accuracy increased further. The best results were produced with 20,000 epochs and with 32 and 128 filters in the first and second layers, respectively.

Fig. 7: Load features importance score

Fig. 8: Comparison of actual and predicted day-ahead price using the base CNN model

The enhanced CNN (CNN 5) model results for day-ahead price are shown in Figure 10. These results beat the SVC (as shown in Figure 11) and CNN (as shown in Figure 12) base models. Still, there are some fluctuations because the data may have some missing fields. Using the CNN model, load prediction is not that good due to multiple reasons. Firstly,

Fig. 9: Comparison of actual and predicted day-ahead price

using SVC

Fig. 10: Comparison of actual and predicted day-ahead price

using DCNN (CNN 5)

Fig. 11: Comparison of actual and predicted day-ahead load using SVC

Fig. 12: Comparison of actual and predicted day-ahead load

using the base CNN model

Fig. 13: Comparison of actual and predicted day-ahead load

using DCNN (CNN 4)

TABLE I: Enhanced CNN model error metrics

Model   Conv. Filters   Conv. Kernel Size   Max Pooling Pool Size   Flatten   Dense Size   Epochs   MAE      RMSE
CNN 1   128, 512        2                   2                       Yes       200, 24      500      5.106    26.732
CNN 2   32, 128         2                   2                       Yes       200, 24      500      5.566    28.302
CNN 3   32, 128         2                   2                       Yes       200, 24      1000     8.847    42.915
CNN 4   64, 512         2                   2                       Yes       200, 24      10000    4.6875   24.955
CNN 5   64, 512         2                   2                       Yes       200, 24      10000    13.089   17.233

the base CNN model contains fewer filters, and secondly, the number of epochs is small. CNN learns the weights on each iteration. If there are too few epochs, the weights are not learned sufficiently and the output is poor, because CNN kernels require a reasonable number of iterations to converge. The number of epochs should nevertheless be chosen carefully: if it is very high, the loss value starts saturating after some time, and beyond that point there is no need to run further. This epoch value can be found by trial and error. As for price, improved CNN models are built that beat both the CNN and SVC base models, with the kernel size, the pooling size and the flatten layer kept constant and all other parameters changed to increase the accuracy of the CNN model. Accuracy increases significantly as the number of epochs is increased; however, when the epochs are increased from 10,000 to 20,000, accuracy decreases instead of increasing. So, it can be concluded that 10,000 is the required number. The number of filters is also increased along with the number of epochs. In CNN 1, with 128 filters in the first convolution layer and 512 filters in the second layer, MAE and RMSE decrease, so accuracy increases. In CNN 2, the epochs are the same as in CNN 1, whereas the filters are changed to 32 in the first layer and 128 in the second, which reduces the forecast error. The best results were produced with 10,000 epochs and with 64 and 512 filters in the first and second layers, respectively. The day-ahead load forecasting results of the enhanced CNN (CNN 4) model, shown in Figure 13, beat the CNN and SVC base models. Still, there are some fluctuations because the data is not up to date; in the future, more data can be used to further increase the accuracy. Using the enhanced model, week-ahead load is also predicted, which shows considerable improvement compared to the predictions of the base CNN and SVC models.
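For reference, the two error measures used throughout this section, MAE and RMSE, can be computed as in the minimal sketch below. The function names and the four-point example series are illustrative only and are not values from the paper.

```python
import numpy as np

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the forecast errors."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    """Root Mean Square Error: penalizes large errors more than MAE does."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Illustrative values only.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 19.0]

print(mae(actual, predicted))    # 1.25
print(rmse(actual, predicted))   # sqrt(2.75), about 1.658
```

Because RMSE squares each error before averaging, a single large miss (the last point here) raises RMSE well above MAE, which is why the two measures are usually reported together.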

IV. CONCLUSION

In this paper, a DCNN model is proposed for load and price forecasting. The proposed model comprises three stages, i.e., feature extraction through LDA, feature selection through a new feature selector that combines RF and RFE, and forecasting through an enhanced DCNN. The hyperparameters of the enhanced DCNN are tuned to achieve high prediction accuracy. The proposed model is tested using real-world electricity market data of NYISO to validate it. Simulation results show that the proposed model has lower error than CNN and SVC, which demonstrates its applicability and effectiveness for electricity load and price forecasting.

REFERENCES

[1] B. Neupane, W. L. Woon and Z. Aung, Ensemble Prediction Model with

Expert Selection for Electricity Price Forecasting, Energies, Vol. 10, pp.

77, 2017.

[2] Shi, Heng and Xu, Minghao and Li, Ran, Deep learning for household load forecasting - A novel pooling deep RNN, IEEE Transactions on Smart Grid, Vol. 9 (5), pp.5271-5280. 2018.

[3] Chitsaz, Hamed and Zamani-Dehkordi, Payam and Zareipour, Hamidreza and Parikh, Palak, Electricity price forecasting for operational scheduling of behind-the-meter storage systems, IEEE Transactions on Smart Grid, 2017.

[4] Claessens, Bert J and Vrancx, Peter and Ruelens, Frederik, Convolutional

neural networks for automatic state-time feature extraction in reinforce-

ment learning applied to residential load control, IEEE Transactions on

Power Systems, Vol. 9 (4), pp.3259–3269. 2018.

[5] Ahmad. Ashfaq, Javaid Nadeem, Guizani M., Alrajeh, N. and Khan,

Zahoor.A., An accurate and fast converging short-term load forecasting

model for industrial applications in a smart grid, IEEE Transactions on

Industrial Informatics, Vol. 13(5), pp.2587-2596, 2017.

[6] Lago, J., De Ridder, F. and De Schutter, B., Forecasting spot electricity prices: Deep learning approaches and empirical comparison of traditional algorithms, Applied Energy, Vol. 221, pp.386-405. 2018.

[7] Raﬁei, Mehdi and Niknam, Taher and Khooban, Mohammad-Hassan,

Probabilistic forecasting of hourly electricity price by generalization of

ELM for usage in improved wavelet neural network, IEEE Transactions

on Industrial Informatics, Vol. 13 (1), pp.71-79. 2019.

[8] Ugurlu, Umut and Oksuz, Ilkay and Tas, Oktay, Electricity Price Fore-

casting Using Recurrent Neural Networks, Energies, Vol. 11 (5), pp.1-23.

2018.

[9] Bassamzadeh, Nastaran and Ghanem, Roger, Multiscale stochastic pre-

diction of electricity demand in smart grids using Bayesian networks,

Applied energy, Vol. 193, pp.369-380. 2017.

[10] González, José Portela and San Roque, Antonio Muñoz and Pérez, Estrella Alonso, Forecasting functional time series with a new Hilbertian ARMAX model: Application to electricity price forecasting, IEEE Transactions on Power Systems, Vol. 33 (1), pp.545–556. 2018.

[11] Ziel, Florian and Weron, Rafał, Day-ahead electricity price forecasting

with high-dimensional structures: Univariate vs. multivariate modeling

frameworks, Energy Economics, Vol. 70, pp.396–420. 2018.

[12] Luo, Jian and Hong, Tao and Fang, Shu-Cherng, Benchmarking robust-

ness of load forecasting models under data integrity attacks, International

Journal of Forecasting, Vol. 34 (1), pp.89–104. 2018.

[13] Muralitharan, K and Sakthivel, Rathinasamy and Vishnuvarthan, R, Neu-

ral network based optimization approach for energy demand prediction

in smart grid, Neurocomputing, Vol. 273, pp.199–208. 2018.

[14] Kuo, Ping-Huan and Huang, Chiou-Jy, An Electricity Price Forecasting Model by Hybrid Structured Deep Neural Networks, Sustainability, Vol. 10, pp.10. 2018.