Data Analytics for Electricity Load
and Price Forecasting in the Smart Grid
Syeda Aimal1, Nadeem Javaid1(B), Amjad Rehman2, Nasir Ayub1,
Tanzeela Sultana1, and Aroosa Tahir3
1COMSATS University, Islamabad 44000, Pakistan
2Al Yamamah University, Riyadh, Saudi Arabia
3Sardar Bahadur Khan Women University, Quetta 87300, Pakistan
Abstract. Existing strategies for price and load prediction may struggle
to handle large volumes of load and price data. To resolve this problem,
three modules are incorporated in the proposed model. Firstly, a fusion of
Decision Tree (DT) and Random Forest (RF) is used for feature selection
and to remove redundancy among features. Secondly, Recursive Feature
Elimination (RFE) is used for feature extraction; it extracts the principal
components and also serves for dimensionality reduction. Finally, to
forecast load and price, Support Vector Machine (SVM) and Logistic
Regression (LR) are used as classifiers, through which we achieve good
accuracy in load and price forecasting.
1 Introduction
Demand-side Management (DSM) is the main means of motivating users to
consume less energy during peak hours and to shift on-peak load to off-peak
hours. The main benefit of shifting load from on-peak to off-peak hours is
that it makes the smart grid more cost-effective.
A smart grid is an electric grid that consists of a number of energy and
operational measures, such as smart appliances, smart meters, and renewable
energy resources. The main objective of the smart grid is to reduce
electricity peak load and to balance power supply and demand. Because
customers can take part in the operations of the smart grid, the cost of
energy can be minimized by conserving electricity and shifting load. In this
context, dynamic pricing is a key indicator. Price forecasting is driven by
the requirements of the financial system and of enterprises. Customers, for
their part, want to know whether the price of electricity exceeds certain
consumer-defined thresholds, which they use to decide whether to turn a load
on or off. In these circumstances, customers require electricity price
classification. Hence, specific thresholds based on point price forecasting
algorithms are used to classify the price of electricity. Function approximation
Springer Nature Switzerland AG 2019
L. Barolli et al. (Eds.): WAINA 2019, AISC 927, pp. 582–591, 2019.
Electricity Load and Price Forecasting in Smart Grid 583
strategies are the foundation of these forecasting algorithms, in which the
underlying process of price formation is imitated by a price model. Moreover,
price classification requires lower accuracy. Thus, classification becomes a
key priority in price forecasting. The electricity price is influenced by
different factors, such as power demand and renewable power supply, and it
varies hourly.
1.1 Problem Statement
In [4,5], the authors claimed to improve price forecasting. However, load is
not considered. The authors also worked on reducing the peak load through
different factors. However, the accuracy of load and price forecasting is not
considered. The authors also built a hybrid model of non-linear regression
and SVM. However, load is not considered in this model. Because electricity
prices change frequently, there is a need for a model that can overcome these
limitations.
1.2 Paper Organization
Section 2 reviews the related literature on feature engineering and on
electricity price and load forecasting. There are two different approaches to
electricity load and price forecasting, i.e., time-series models and
machine-learning models. Section 3 presents the proposed model, whose primary
goal is to achieve efficiency and accuracy. For this purpose, we first collect
real data, preprocess the raw data, extract the selected features, and then
tune the parameters carefully. Section 4 presents the simulation and results;
we simulated the proposed model in a Python environment. The simulations were
performed on a system with an Intel Core i5 processor, 8 GB RAM, and a 1 TB
hard disk. Section 5 concludes the paper.
2 Related Work
In the operation of a modern electric power system, load can be shifted and
the cost of electrical energy can thereby be reduced. Dynamic electricity
pricing helps the user to shift the load.
There are two main approaches to load and price forecasting, i.e.,
time-series models and machine-learning models. For electricity load
forecasting, a model has been developed for predicting load and temperature
in the day-ahead electricity market using singular spectrum analysis and a
neural network structure.
Users benefit from knowing whether electricity prices exceed given
thresholds, on the basis of which they can decide whether to turn a load off
or on. In the current situation, it is necessary to apply price data
analytics to classify the electricity price as well as the load. There are
various ways to apply classification to load and price forecasting, e.g., by
584 S. Aimal et al.
categorizing the price on the basis of applied threshold values. Compared with
load, it is difficult to achieve accuracy in price forecasting because of the
factors influencing the price.
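As a minimal sketch of threshold-based price classification, the snippet below
maps each hourly price to a class according to consumer-defined thresholds. The
function name `classify_prices`, the sample prices, and the 40 $/MWh threshold
are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence


def classify_prices(prices: Sequence[float],
                    thresholds: Sequence[float]) -> List[int]:
    """Map each price to the index of the band it falls into.

    With sorted thresholds [t0, t1, ...], class 0 means price <= t0,
    class 1 means t0 < price <= t1, and so on.
    """
    ordered = sorted(thresholds)
    labels = []
    for price in prices:
        # The class index is the number of thresholds the price exceeds.
        labels.append(sum(price > t for t in ordered))
    return labels


# Hypothetical hourly prices ($/MWh) and one consumer-defined threshold.
hourly_prices = [28.5, 41.0, 55.2, 39.9, 62.7]
labels = classify_prices(hourly_prices, thresholds=[40.0])
print(labels)  # [0, 1, 1, 0, 1]
```

With a single threshold the task is binary (keep the load on or turn it off),
which is the setting in which a classifier such as LR or SVM applies directly.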
To address these issues, authors have applied various techniques. Traditional
classifiers such as Support Vector Machine (SVM), Naive Bayes, Neural Network,
Decision Tree, and Random Forest are the most well known. According to a
survey, SVM achieves better accuracy but has high complexity. Like SVM,
Logistic Regression is a classifier; it is used to predict a binary outcome.
Time-series analysis is widely used in price and load forecasting problems.
An Auto-Regressive Integrated Moving Average (ARIMA) based algorithm has been
proposed for forecasting in the electricity market of Turkey. However, the raw
data coming from the market contains different kinds of outliers, so it is
difficult to build an ARIMA model with stable forecasting accuracy. An
Auto-Regressive Moving Average Hilbertian (ARMAH) model has also been used to
estimate the average values of a time series, and it can be applied to
electricity price forecasting.
Feature engineering is the bedrock for applying classifiers. It consists of
two operations, i.e., feature selection and feature extraction. In load and
price forecasting, different methods and techniques are used for feature
engineering. For example, time-series data has been processed using feature
extraction with an algorithm based on Symbolic Aggregate approXimation (SAX).
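For readers unfamiliar with SAX, the following is a rough sketch of its three
usual steps: z-normalization, Piecewise Aggregate Approximation (PAA), and
mapping segment means to symbols. It is not the implementation used in the
cited work. The alphabet size of 4, the function name `sax`, and the sample
load profile are assumptions made for illustration, and the series length is
assumed to be divisible by the number of segments.

```python
import statistics
from typing import Sequence

# Breakpoints that split a standard normal distribution into four
# equiprobable regions (alphabet size 4, an assumption of this sketch).
BREAKPOINTS = (-0.6745, 0.0, 0.6745)
ALPHABET = "abcd"


def sax(series: Sequence[float], n_segments: int) -> str:
    """Return the SAX word of `series` using `n_segments` symbols."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # z-normalize; a constant series maps to all zeros.
    z = [(x - mean) / std if std > 0 else 0.0 for x in series]

    # Piecewise Aggregate Approximation: average each equal-width segment.
    seg_len = len(z) // n_segments
    paa = [
        statistics.fmean(z[i * seg_len:(i + 1) * seg_len])
        for i in range(n_segments)
    ]

    # Map each segment mean to the symbol of the region it falls into.
    return "".join(ALPHABET[sum(v > b for b in BREAKPOINTS)] for v in paa)


# Hypothetical 8-hour load profile, compressed to a 4-symbol word.
load = [10, 12, 20, 22, 30, 32, 18, 16]
word = sax(load, n_segments=4)
print(word)  # acdb
```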
For feature selection, a multi-variable mutual information technique has been
used, and the relevancy of features has also been measured. Moreover, the C4.5
algorithm has been used for feature selection; it outperforms Iterative
Dichotomiser 3 (ID3) in building a decision tree model. Previous studies thus
cover algorithms for feature selection or extraction, classifiers for
modeling, and parameter tuning. Among traditional classifiers, Artificial
Neural Network (ANN) and DT are the most widely used [19, 20]. However, DT
often suffers from overfitting: it performs well in training but poorly in
forecasting. ANN, on the other hand, offers little control over convergence,
and its generalization capability is limited.
We investigate the problem of electricity load and price forecasting. Our aim
is to predict the electricity load and price using data from the grid and to
achieve good accuracy. To this end, LR and SVM are used as classifiers to
predict the load as well as the price. LR is a type of regression that is used
to predict the probability of occurrence of an event by fitting the data to a
logistic function. It may use several predictor variables, which can be either
categorical or numerical. Historical data is used for training and testing
(Fig. 1). In the proposed approach, we first took the real dataset
from the electricity market (NYISO), covering three years from January 2011 to
2013. The data is divided month-wise, and data from the same month across
years is used for training and testing. For example, the first three weeks of
January from 2011 to 2013 are selected for training, and the last week of
January from 2011 to 2013 is selected for testing. Preprocessing consists of
the following steps:
Fig. 1. System model
• First of all, the data must be converted into a suitable structured format.
• Incomplete and redundant values and variables are removed.
• The data must be sampled to reduce the running time.
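The month-wise train/test split described above can be sketched as follows.
The paper does not state exactly how the weeks are delimited, so this sketch
assumes that days 1 to 21 of the month form the training set and the remaining
days form the test set. The helper `split_month` and the synthetic records are
hypothetical stand-ins for the NYISO data.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Record = Tuple[datetime, float]  # (timestamp, load or price value)


def split_month(records: List[Record], month: int,
                train_days: int = 21) -> Tuple[List[Record], List[Record]]:
    """Split one calendar month, pooled over all years, into train and test.

    Days 1..train_days of the month go to training; later days go to testing.
    """
    in_month = [r for r in records if r[0].month == month]
    train = [r for r in in_month if r[0].day <= train_days]
    test = [r for r in in_month if r[0].day > train_days]
    return train, test


# Synthetic hourly records for January of 2011, 2012 and 2013.
records: List[Record] = []
for year in (2011, 2012, 2013):
    start = datetime(year, 1, 1)
    for h in range(31 * 24):
        records.append((start + timedelta(hours=h), 1000.0 + h % 24))

train, test = split_month(records, month=1)
print(len(train), len(test))  # 1512 720
```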
The prepared data is used for training, validation, and testing. In this
model, feature selection, feature extraction, and classification are
performed. A fusion of DT and RF, followed by RFE, is applied to the dataset
to select features and extract the principal components. RFE is used to
eliminate further redundancy in the selected features. The model therefore
consists of three modules: feature selection, feature extraction, and
classification.
The primary goal of the model is to achieve efficiency and accuracy while
predicting the electricity load and price. For this purpose, we first took
real data from the grid, extracted the selected features, and then tuned the
parameters.
The following criteria are the most essential for assessing the performance of
the model:
• The model applied to predict load and price should work efficiently and run
quickly.
• The accuracy of classification is directly influenced by the performance of
feature selection and extraction.
• The accuracy of classification is highly dependent on the goal of the model.
The main issue in predicting price and load is accuracy, because of the
factors that influence them; these factors also make training difficult. To
enhance prediction accuracy, we embedded a feature engineering process
(selection and extraction) in our model.
3.1 Feature Selector
In feature selection, we combine DT and RF to select the important features in
the data. A threshold value (i.e., 0.8) is applied, and the features that meet
it are selected as important.
Feature selection is performed for the following reasons:
• The accuracy of a model can be improved.
• The complexity of a model can be reduced.
• It allows the machine learning algorithm to train faster.
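The paper does not specify how the DT and RF scores are fused or which
quantity the 0.8 threshold is applied to. As one possible reading, the sketch
below averages the two importance vectors, rescales them so that the top
feature scores 1.0, and keeps every feature at or above the threshold. The
function `fuse_and_select`, the feature names, and the scores are hypothetical.
In scikit-learn, such scores would typically come from the
`feature_importances_` attribute of fitted tree-based estimators.

```python
from typing import List, Sequence


def fuse_and_select(names: Sequence[str],
                    dt_importance: Sequence[float],
                    rf_importance: Sequence[float],
                    threshold: float = 0.8) -> List[str]:
    """Average DT and RF importances, rescale to [0, 1], keep the top ones."""
    fused = [(d + r) / 2.0 for d, r in zip(dt_importance, rf_importance)]
    top = max(fused)
    if top == 0:
        return []  # no feature carries any importance
    # Rescale so that the most important feature scores exactly 1.0.
    scaled = [f / top for f in fused]
    return [n for n, s in zip(names, scaled) if s >= threshold]


# Hypothetical feature names and importance scores, for illustration only.
names = ["feature_a", "feature_b", "feature_c", "feature_d"]
dt_scores = [0.40, 0.35, 0.20, 0.05]
rf_scores = [0.38, 0.36, 0.16, 0.10]
selected = fuse_and_select(names, dt_scores, rf_scores)
print(selected)  # ['feature_a', 'feature_b']
```

Rescaling is an assumption made here because raw tree importances sum to 1, so
a cut-off of 0.8 on the raw scores could keep at most one feature.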
3.2 Feature Extractor
After feature selection, feature extraction is performed, which eliminates the
redundancy among features using RFE.
• It improves the quality, speed, and effectiveness of supervised learning.
• It is a reduction process.
• It is used for the transformation of attributes.
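A minimal sketch of RFE with scikit-learn is given below. The synthetic data,
the choice of Logistic Regression as the ranking estimator, and the target of
three retained features are assumptions made for illustration; the paper does
not report these settings. Note that scikit-learn's `RFE` retains a subset of
the original columns rather than constructing new components.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the selected features (not the NYISO data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=2, random_state=0)

# Repeatedly fit the estimator and drop the weakest feature until 3 remain.
selector = RFE(estimator=LogisticRegression(max_iter=1000),
               n_features_to_select=3, step=1)
X_reduced = selector.fit_transform(X, y)

print(selector.support_)   # boolean mask of the retained columns
print(selector.ranking_)   # 1 = retained; larger values were dropped earlier
print(X_reduced.shape)     # (300, 3)
```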
Classification is the task of determining the class to which a new observation
belongs, on the basis of a training dataset of observations whose classes are
known. For classification, LR and SVM classifiers are used and tuned
accordingly.
After irrelevant and duplicated features have been eliminated, the classifier
is applied to predict load and price.
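To make the role of the logistic function in LR concrete, the sketch below
computes the predicted probability p = 1 / (1 + exp(-(b + w·x))) for a single
observation. The weights, bias, and inputs are hypothetical values chosen for
illustration; in practice they are learned from the training data.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Numerically stable logistic (sigmoid) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x: Sequence[float], weights: Sequence[float],
                  bias: float) -> float:
    """Probability of the positive class for one observation."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return logistic(z)


# Hypothetical coefficients and feature values.
p = predict_proba([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
label = 1 if p >= 0.5 else 0
print(p, label)  # 0.5 1
```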
3.3.1 Steps for Logistic Regression
• from sklearn.preprocessing import StandardScaler
• a = StandardScaler()
• X_train = a.fit_transform(X_train)
• X_test = a.transform(X_test)
Fitting Logistic Regression to the dataset
• from sklearn.linear_model import LogisticRegression
• classifier = LogisticRegression()
• classifier.fit(X_train, y_train)
Predicting the test set result
• y_pred = classifier.predict(X_test)
Making the confusion matrix
• from sklearn.metrics import confusion_matrix
• cm = confusion_matrix(y_test, y_pred)
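The steps above can be assembled into a runnable sketch as follows. Synthetic
binary data stands in for the engineered load and price features, and an SVM
with scikit-learn's default settings is fitted alongside LR for comparison.
The data, the 75/25 split, and the default hyperparameters are assumptions;
the results will not reproduce the figures reported in this paper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary data standing in for the engineered features.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

results = {}
for name, model in (("LR", LogisticRegression()), ("SVM", SVC())):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = (accuracy_score(y_test, y_pred),
                     confusion_matrix(y_test, y_pred))

for name, (acc, cm) in results.items():
    print(name, round(acc, 3))
    print(cm)
```

Fitting the scaler on the training split alone avoids leaking test-set
statistics into training.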
4 Simulation and Reasoning
In this section, we are going to explain the following:
Each of the points mentioned above is described in detail below.
4.1 Simulation Setup
To simulate our proposed model, we performed the simulation in a Python
environment on a system with an Intel Core i5 processor, 8 GB RAM, and a 1 TB
hard disk. Three years of hourly electricity load data are used to evaluate
the proposed scheme. The simulation results are organized as follows:
• Features are selected with RF and DT.
• Redundant features are then dropped with the RFE technique.
• Principal components are selected.
• Logistic Regression performance is assessed with four performance
evaluators, i.e., Mean Absolute Error (MAE), Mean Square Error (MSE), Root
Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE).
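For reference, the four evaluators can be computed from their standard
definitions as sketched below. The function name `forecast_errors` and the
sample values are hypothetical. The four measures are on different scales
(MSE is in squared units and MAPE is a percentage), so their magnitudes are
not directly comparable with one another.

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float],
                    predicted: Sequence[float]) -> Dict[str, float]:
    """Return MAE, MSE, RMSE and MAPE (in percent) for equal-length series."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when an actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Hypothetical actual and forecast loads (MW).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
metrics = forecast_errors(actual, predicted)
print(metrics)
```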
4.2 Simulation Results
See Table 1.
Table 1. Attributes of data
Selected features Rejected features
Fig. 2. Feature importance
Fig. 3. Feature importance
Fig. 4. Load forecast for one month
4.2.1 Feature Selection Performance of DT and RF
DT and RF are applied to select the features that are important with respect
to the target feature from three years of electricity load and price data, and
RFE is applied to reduce the dimensions of the data. Features that have little
impact on load and price are dropped from the data (shown in Figs. 2 and 3).
In Figs. 4 and 5, we see that LR as a classifier achieves better forecast
accuracy than SVM for electric load as well as price. The accuracy achieved by
LR is 90%. In the one-month load and price prediction, LR is closer to the
actual load and price than SVM. The normalized load of three years is shown in
Fig. 7.
Fig. 5. Price forecast for one month
4.3 Performance Evaluation
For performance evaluation, four measures are used, i.e., MAE, MSE, RMSE, and
MAPE. MAPE has the lowest error value, i.e., 9.834, and MSE has the highest
error value, i.e., 189.6475. To investigate the robustness and accuracy of the
techniques, the comparison is shown in Fig. 6.
Fig. 6. Error value comparison
Fig. 7. Normalized load
5 Conclusion
In this paper, we investigated the problem of predicting electricity load and
price in the smart grid. To solve the problem, a forecasting model is used
that is easy to implement on distributed and parallelized systems. The
proposed model is based on feature engineering and classification. A fusion of
two techniques is applied for feature selection. To further remove redundancy
among features, RFE is used to extract the principal features. The resulting
features improve the accuracy of the classifiers. LR as a classifier achieves
better forecast accuracy than SVM for electric load as well as price; the
accuracy achieved by LR is 90%. The performance is evaluated with four
measures, i.e., MAPE, RMSE, MAE, and MSE. For now, we have predicted load and
price for one month; in future, we will attempt long-term prediction, i.e.,
for nine months as well as 12 months.
Acknowledgments. This research is supported by Al Yamamah University, Riyadh
References
1. Zhang, D., Li, S., Sun, M., O’Neill, Z.: An optimal and learning-based demand
response and home energy management system. IEEE Trans. Smart Grid 7(4),
2. Javaid, N., Hafeez, G., Iqbal, S., Alrajeh, N., Alabed, M.S., Guizani, M.: Energy
eﬃcient integration of renewable energy sources in the smart grid for demand side
management. IEEE Access 6, 77077 (2018)
3. Jindal, A., Singh, M., Kumar, N.: Consumption-aware data analytical demand
response scheme for peak load reduction in smart grid. IEEE Trans. Ind. Electron.
4. Wang, K., Xu, C., Guo, S.: Big data analytics for price forecasting in smart grids.
In: 2016 IEEE Global Communications Conference (GLOBECOM), pp. 1–6. IEEE
5. Wang, K., Xu, C., Zhang, Y., Guo, S., Zomaya, A.: Robust big data analytics for
electricity price forecasting in the smart grid. IEEE Trans. Big Data (2017)
6. Moon, J., Kim, K.H., Kim, Y., Hwang, E.: A short-term electric load forecasting
scheme using 2-stage predictive analytics. In: 2018 IEEE International Conference
on Big Data and Smart Computing (BigComp), pp. 219–226. IEEE (2018)
7. Keles, D., Scelle, J., Paraschiv, F., Fichtner, W.: Extended forecast methods for
day-ahead electricity spot prices applying artiﬁcial neural networks. Appl. Energy
162, 218–230 (2016)
8. Chen, P.C., Kezunovic, M.: Fuzzy logic approach to predictive risk analysis in
distribution outage management. IEEE Trans. Smart Grid 7(6), 2827–2836 (2016)
9. Mujeeb, S., Javaid, N., Akbar, M., Khalid, R., Nazeer, O., Khan, M.: Big data
analytics for price and load forecasting in smart grids. In: International Conference
on Broadband and Wireless Computing, Communication and Applications, pp. 77–
87. Springer, Cham (2018)
10. Javaid, N., Javaid, S., Abdul, W., Ahmed, I., Almogren, A., Alamri, A., Niaz,
I.A.: A hybrid genetic wind driven heuristic optimization algorithm for demand
side management in smart grid. Energies 10(3), 319 (2017)
11. Mahmood, D., Javaid, N., Ahmed, I., Alrajeh, N., Niaz, I.A., Khan, Z.A.: Multi-
agent-based sharing power economy for a smart community. Int. J. Energy Res.
41, 2074 (2017)
12. Zhao, Z., Lee, W.C., Shin, Y., Song, K.B.: An optimal power scheduling method for
demand response in home energy management system. IEEE Trans. Smart Grid
4(3), 1391–1400 (2013)
13. Logenthiran, T., Srinivasan, D., Shun, T.Z.: Demand side management in smart
grid using heuristic optimization. IEEE Trans. Smart Grid 3(3), 1244–1252 (2012)
14. Ahmad, A., Javaid, N., Alrajeh, N., Khan, Z.A., Qasim, U., Khan, A.: A modiﬁed
feature selection and artiﬁcial neural network-based day-ahead load forecasting
model for a smart grid. Appl. Sci. 5(4), 1756–1772 (2015)
15. Ahmed, M.S., Mohamed, A., Khatib, T., Shareef, H., Homod, R.Z., Ali, J.A.: Real
time optimal schedule controller for home energy management system using new
binary backtracking search algorithm. Energy Build. 138, 215–227 (2017)
16. Ahmad, A., Javaid, N., Guizani, M., Alrajeh, N., Khan, Z.A.: An accurate and
fast converging short-term load forecasting model for industrial applications in a
smart grid. IEEE Trans. Ind. Inform. 13(5), 2587–2596 (2017)
17. Wu, M., Wang, Y.: A feature selection algorithm of music genre classiﬁcation based
on ReliefF and SFS. In: IEEE International Conference on Computer and Informa-
tion Science (ICIS), pp. 539–544 (2009). Processing and Communications Appli-
cations, 2009, pp. 61–64
18. Wu, M., Wang, Y.: A feature selection algorithm of music genre classiﬁcation based
on ReliefF and SFS. In: IEEE International Conference on Computer and Infor-
mation Science (ICIS), pp. 539–544 (2015)
19. Fleury, A., Vacher, M., Noury, N.: SVM-based multimodal classiﬁcation of activ-
ities of daily living in health smart homes: sensors, algorithms, and ﬁrst experi-
mental results. IEEE Trans. Inf. Technol. Biomed. 14(2), 274–283 (2010)
20. Huang, D., Zareipour, H., Rosehart, W.D., Amjady, N.: Data mining for electricity
price classiﬁcation and the application to demand-side management. IEEE Trans.
Smart Grid 3(2), 808–817 (2012)