All content in this area was uploaded by Salih Tutun on Mar 14, 2018
Long Term Forecasting using Machine Learning Methods
Hossein Sangrody¹, Student Member, IEEE, Ning Zhou¹, Senior Member, IEEE, Salih Tutun², Benyamin Khorramdel³, Student Member, IEEE, Mahdi Motalleb⁴, Student Member, IEEE, Morteza Sarailoo¹
¹ Electrical and Computer Engineering Department, State University of New York at Binghamton, NY, USA, {habdoll1, ningzhou, msarail1}@binghamton.edu
² Industrial Engineering Department, State University of New York at Binghamton, NY, USA, stutun1@binghamton.edu
³ Department of Electrical and Computer Engineering, University of Saskatchewan, Canada, bek067@mail.usask.ca
⁴ Department of Mechanical Engineering, Massachusetts Institute of Technology (MIT), MA, USA, motalleb@mit.edu
Abstract— A robust model for power system load forecasting
that covers time horizons from short term to long term is
indispensable for better management of the system. However, as the
forecasting horizon increases, an accurate forecast becomes more
challenging. Machine learning methods have attracted growing
attention as efficient methods for dealing with stochastic load
patterns and producing accurate forecasts. In this study, long-term
load forecasting is studied for the New England Network using
several commonly used machine learning methods: feedforward
artificial neural network, support vector machine, recurrent neural
network, generalized regression neural network, k-nearest neighbors,
and Gaussian Process Regression. The results of these methods are
compared using the mean absolute percentage error (MAPE).
I. INTRODUCTION
Distributed energy resources (DERs) are increasingly penetrating
power systems, and many studies have focused on their implementation
[1-3]. Although DERs benefit a power system, their
nondispatchability, intermittency, and uncertainty present
unprecedented challenges to power grid operation and planning [4].
Under these conditions, load forecasting (LF) plays a critical role
in the operation and planning of a power system. Depending on the
purpose of LF, its lead time can vary from seconds to years. Very
short-term load forecasting (VSTLF) [5] and short-term load
forecasting (STLF) [6] usually have lead times of seconds to weeks
and are often used for control and operation purposes. In contrast,
medium-term load forecasting (MTLF) [7] and long-term load
forecasting (LTLF) [8] have lead times of months, years, or even
decades and are often used for scheduling and planning purposes [9].
The driving inputs of a forecasting model are important to its
efficiency. They depend on the purpose of the forecast and its term
(very short to long term). In [10], historical price data are applied
to predict the hourly prices in the California Independent System
Operator (CAISO) day-ahead electricity market. In load forecasting
models, temperature is one of the most common input variables, along
with historical load data. However, since the load pattern is a
nonlinear function of temperature, heating degree days (HDD) and
cooling degree days (CDD) are applied as weather indicators in load
forecasting models [11].
The methodologies applied in LF can be classified into three main
categories: statistical analysis, machine learning, and hybrid
methods. Among them, machine learning methods have attracted more
attention in recent years [12]. Despite the benefits of hybrid
methods, their parameters need to be tuned well to achieve accurate
forecasting [7]. Among machine learning methods, feedforward
artificial neural network (ANN), support vector regression (SVR),
recurrent neural network (RNN), k-nearest neighbors (KNN),
generalized regression neural network (GRNN), and Gaussian Process
Regression (GPR) are the most common in load forecasting. In this
study, LTLF of monthly load in the ISO New England Network is
performed using the aforementioned machine learning methods, and
their results are quantified and compared by mean absolute
percentage error (MAPE).
The rest of the paper is organized as follows. Section II describes
the forecasting inputs and output, the methodology of applying them,
and the forecasting models. Section III presents the simulation
results for the case study using the different methods. The
conclusion is drawn in Section IV.
II. FORECAST MODEL
Obtaining accurate forecast results depends on various factors.
Generally, the horizon of LF, the certainty of the inputs, and the
efficiency of the forecasting model are the major factors influencing
the accuracy of forecasting results. As the horizon of the load
forecast increases, accurate prediction at high time resolution
becomes more challenging. The main reason is the high uncertainty in
the inputs of a long-term forecasting model. Accordingly, in VSTLF
and STLF, since the forecasting horizon spans only seconds to weeks,
the weather indicators are more accurate inputs, while MTLF and LTLF,
which cover lead times of months to years, must rely on less accurate
weather indicators.
A. Input and Output of the Models
As mentioned, the most common variables in LF are weather indicators.
However, the relationship between temperature and energy usage is not
linear. Thus, two derived weather indicators, HDD and CDD, are
applied in the forecasting model. HDD indicates how much a unit needs
to be heated; for a given day, it is the number of degrees by which
the average temperature of the day is below 65 °F. Conversely, CDD
indicates how much a unit needs to be cooled and is the number of
degrees by which the average temperature of the day is above 65 °F.
Such variables yield linear relationships between energy and weather
indicators. In this study, the forecasting model predicts monthly
energy for the New England Network, and the total HDDs and CDDs of
each month are applied as inputs of the model.
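The degree-day definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `degree_days` and the sample temperatures are hypothetical, and the daily average temperatures are assumed to be already available in °F.

```python
BASE_TEMP_F = 65.0  # base temperature given in the text, in degrees Fahrenheit


def degree_days(daily_avg_temps_f, base=BASE_TEMP_F):
    """Return (total HDD, total CDD) for a sequence of daily average temperatures."""
    # A day contributes to HDD only when its average is below the base ...
    hdd = sum(max(base - t, 0.0) for t in daily_avg_temps_f)
    # ... and to CDD only when its average is above the base.
    cdd = sum(max(t - base, 0.0) for t in daily_avg_temps_f)
    return hdd, cdd


# Hypothetical daily averages (deg F) for a short illustrative period.
sample_temps = [30.0, 50.0, 65.0, 70.0, 80.0]
total_hdd, total_cdd = degree_days(sample_temps)
print(total_hdd, total_cdd)
```

Summing the daily values over all days of a month gives the monthly totals that the model uses as inputs.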
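To improve the accuracy of the forecasting model, the historical
record of energy is also used as an input through a moving average.
As shown in (1), to predict the energy in a target month, the average
energy of the same calendar month over the previous 11 years is fed
to the model along with the weather indicators.
$$\bar{y}_t = \frac{1}{n}\sum_{i=1}^{n} y_{t-12i} \tag{1}$$
Here, $n$ is the number of historical data points, which is 11 in
this case, $y$ represents the target variable, and $t$ indexes
months, so that $y_{t-12i}$ is the energy of the same calendar month
$i$ years before the target month.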
Fig. 1 illustrates the overall structure of inputs and outputs in
the forecasting model. For each month of forecasted energy as the
target, the model uses the corresponding total HDD and CDD and the
average energy of the 11 corresponding months of the past years.
Fig. 1. Forecasting structure
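As a rough illustration of the historical-average input, the sketch below averages the energy of the same calendar month over the preceding 11 years. The dictionary layout keyed by (year, month), the function name `historical_average`, and the toy data are assumptions made for the example and are not taken from the paper.

```python
def historical_average(energy_by_month, target_year, target_month, n_years=11):
    """Average energy of the same calendar month over the n_years before target_year.

    energy_by_month maps (year, month) -> monthly energy.
    """
    past = [energy_by_month[(target_year - i, target_month)]
            for i in range(1, n_years + 1)]
    return sum(past) / n_years


# Hypothetical data: January energy rises by 1 unit per year from 2000 to 2010.
toy_energy = {(year, 1): 100.0 + (year - 2000) for year in range(2000, 2011)}
jan_2011_avg = historical_average(toy_energy, 2011, 1)
print(jan_2011_avg)
```

Calling the function with a later target year shifts the 11-year window forward, which mirrors the moving-average behavior described in the text.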
B. Forecasting Models
In this section, the machine learning methods commonly used in
forecasting are briefly reviewed.
1) Feedforward Artificial Neural Network Model
The ANN method provides an efficient way to model a complex
nonlinear system. In other words, when forecasting with an ANN, the
forecaster does not need a clear understanding of the complex
relationship between inputs and outputs.
Fig. 2 depicts a typical neural network, which normally consists of
three layers: input, hidden, and output. Each layer consists of
several neurons, which are connected to the neurons of other layers
by weighted connections. As shown in this figure, the arrowheads of
the connections indicate that all data propagate in the direction
from the inputs to the output. Such a structure is called a
feedforward ANN model. The numbers of neurons in the input and output
layers equal the numbers of inputs and outputs, respectively. The
hidden layer, located between the input layer and the output layer,
has an arbitrary number of neurons, which is chosen by the
forecaster. For many forecasting problems, one or two hidden layers
often give good results.
Fig. 2. Typical structure of feedforward ANN model
The training algorithm for this study is the Levenberg-Marquardt
algorithm, which takes more training time but gives better results.
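To make the feedforward structure concrete, the following minimal sketch computes the forward pass of a network with a single hidden layer. It assumes tanh hidden units and a linear output unit, neither of which is stated in the text, and all weights are hypothetical values chosen only for illustration. Training (for example, by Levenberg-Marquardt) is not shown.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer feedforward network.

    Assumes tanh hidden units and a linear output unit (not stated in the text).
    """
    # Each hidden neuron applies tanh to a weighted sum of all inputs plus a bias.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # The output neuron is a weighted sum of the hidden activations plus a bias.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical weights: 2 inputs, 3 hidden neurons, 1 output.
w_hidden = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [1.0, -1.0, 0.5]
b_out = 0.2

y_hat = forward([1.0, 2.0], w_hidden, b_hidden, w_out, b_out)
print(y_hat)
```

Data flow only from the inputs through the hidden layer to the output, with no feedback connections, which is what distinguishes this structure from a recurrent network.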
2) Support Vector Machine
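Support vector regression (SVR) is the version of the support vector
machine method that is applied to forecasting. Taking $x_i$ as the
input variable vector and $y_i$ as the output variable, the SVR
solution is obtained by minimizing the sum of the training error
$C\sum_{i=1}^{N}(\xi_i+\xi_i^*)$ and the regularization term
$\frac{1}{2}\|w\|^2$ in (2), subject to the constraints in (3).
$$\min_{w,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}(\xi_i+\xi_i^*) \tag{2}$$
$$\begin{aligned}
y_i-\left(w^{T}\phi(x_i)+b\right) &\le \varepsilon+\xi_i \\
\left(w^{T}\phi(x_i)+b\right)-y_i &\le \varepsilon+\xi_i^* \\
\xi_i,\ \xi_i^* &\ge 0,\quad i=1,\dots,N
\end{aligned} \tag{3}$$
Here, $N$ is the total number of observation sets, $\xi_i$ and
$\xi_i^*$ are the upper and lower training errors associated with
$\varepsilon$ (the margin of tolerance), $C$ is the constant that
weights the training error against the regularization term, $w$ and
$b$ are the weight vector and bias of the regression function, and
$\phi$ is the mapping associated with the kernel function, which
transforms $x_i$ to a higher-dimensional space.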
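The role of the tolerance margin and the slack variables can be illustrated by evaluating the objective in (2) for a fixed candidate model. The sketch below assumes a linear model (the feature mapping is taken as the identity) and a trade-off constant named `c`; both are simplifications for illustration, and the data are hypothetical. It does not solve the optimization problem, which in this study is handled by LIBSVM.

```python
def svr_objective(w, b, xs, ys, epsilon, c):
    """Evaluate the primal SVR objective for a linear model f(x) = w.x + b.

    Each slack variable takes its smallest value allowed by constraints (3).
    """
    reg = 0.5 * sum(wi * wi for wi in w)
    slack_total = 0.0
    for x, y in zip(xs, ys):
        f = sum(wi * xi for wi, xi in zip(w, x)) + b
        xi_upper = max(y - f - epsilon, 0.0)  # target lies above the tolerance tube
        xi_lower = max(f - y - epsilon, 0.0)  # target lies below the tolerance tube
        slack_total += xi_upper + xi_lower
    return reg + c * slack_total


# Hypothetical one-dimensional data and candidate model f(x) = x.
xs = [[1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 2.5]
value = svr_objective(w=[1.0], b=0.0, xs=xs, ys=ys, epsilon=0.5, c=10.0)
print(value)
```

Only the second point lies outside the tube of half-width 0.5, so it is the only one that contributes a nonzero slack to the objective.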
3) Recurrent Neural Network
The structure of a recurrent neural network (RNN) includes an input
layer, a hidden layer, a context layer, and an output layer [13, 14].
Equations (4) and (5) show the calculations in the RNN model for
training data points $x_t$ and target values $y_t$.
$$h_t = \sigma\left(W_h x_t + U_h h_{t-1} + b_h\right) \tag{4}$$
$$y_t = \sigma\left(W_y h_t + b_y\right) \tag{5}$$
where $h_t$ denotes the hidden layer vector, $b$ is a bias vector,
$W$ and $U$ are weight matrices, and $\sigma$ represents the
activation function.
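A minimal sketch of the recurrence in (4) and (5) is given below. It assumes tanh as the activation function for both layers and a zero initial hidden state; the parameter names and the scalar toy weights are hypothetical and serve only to show how the hidden state carries information from one time step to the next.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_forward(xs, w_h, u_h, b_h, w_y, b_y, act=math.tanh):
    """Run the recurrence over a sequence and return the list of outputs."""
    h = [0.0] * len(b_h)  # initial hidden (context) state, assumed to be zero
    outputs = []
    for x in xs:
        # Hidden state from the current input and the previous hidden state, as in (4).
        pre_h = [a + b + c for a, b, c in zip(matvec(w_h, x), matvec(u_h, h), b_h)]
        h = [act(v) for v in pre_h]
        # Output from the current hidden state, as in (5).
        pre_y = [a + b for a, b in zip(matvec(w_y, h), b_y)]
        outputs.append([act(v) for v in pre_y])
    return outputs


# Hypothetical scalar example: one input, one hidden unit, one output.
outs = rnn_forward(xs=[[0.5], [0.0]],
                   w_h=[[1.0]], u_h=[[1.0]], b_h=[0.0],
                   w_y=[[1.0]], b_y=[0.0])
print(outs)
```

In this toy run the second input is zero, yet the second output is nonzero because the hidden state retains the effect of the first input.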
4) Generalized Regression Neural Network
The generalized regression neural network (GRNN) is a nonparametric
model whose structure includes an input layer, an output layer, a
radial basis layer, and a special linear layer. The GRNN model
predicts the target value corresponding to a given data point by
calculating the weighted average of the target values of the training
data points in the vicinity of that data point [15]. As shown in (6),
the target value $\hat{y}(x)$ corresponding to the data point $x$ is
predicted as the average of the training targets $y_i$, with weights
assigned by a kernel function according to the distance between the
predictors $x_i$ in the training set and the data point $x$. In this
case, the kernel function $K(\cdot)$ is a standard Gaussian kernel.
$$\hat{y}(x) = \sum_{i=1}^{N} w_i\, y_i \tag{6}$$
$$w_i = \frac{K\left(\|x-x_i\|/h\right)}{\sum_{j=1}^{N} K\left(\|x-x_j\|/h\right)} \tag{7}$$
Here, $N$ is the number of training points and $h$ is the bandwidth
of the kernel.
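The kernel-weighted average in (6) and (7) can be sketched as follows. The function names and the toy data are hypothetical; the Gaussian kernel is written without its normalizing constant because that constant cancels between the numerator and denominator of (7).

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel; the normalizing constant cancels in the weights."""
    return math.exp(-0.5 * u * u)


def grnn_predict(x, train_x, train_y, h):
    """Kernel-weighted average of the training targets, as in (6) and (7)."""
    kernel_values = [gaussian_kernel(math.dist(x, xi) / h) for xi in train_x]
    total = sum(kernel_values)
    return sum(k / total * yi for k, yi in zip(kernel_values, train_y))


# Hypothetical training set with two one-dimensional points.
train_x = [[0.0], [2.0]]
train_y = [1.0, 3.0]
midpoint_pred = grnn_predict([1.0], train_x, train_y, h=1.0)
print(midpoint_pred)
```

A query halfway between the two training points weights them equally, while a small bandwidth `h` makes the prediction follow the nearest training target more closely.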
5) K-Nearest Neighbors Regression
The K-nearest neighbors (KNN) method is a nonparametric method
applied to both regression and classification. In this method, the
prediction is based on the target values of the K nearest neighbors
of the given point. In other words, given a data point $x$, the K
nearest data points in the training data set are selected, and the
average of their target values is taken as the predicted target
value, as follows.
$$\hat{y}(x) = \frac{1}{K}\sum_{i=1}^{K} y_i \tag{8}$$
where $y_i$ is the target value of the $i$-th nearest neighbor of
$x$.
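A minimal sketch of the averaging rule in (8) is shown below. It assumes Euclidean distance as the measure of nearness, which the text does not specify, and the function name and toy data are hypothetical.

```python
import math


def knn_predict(x, train_x, train_y, k):
    """Average the targets of the k training points nearest to x (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(x, pair[0]))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical one-dimensional training set with one distant outlier.
train_x = [[0.0], [1.0], [2.0], [10.0]]
train_y = [1.0, 2.0, 3.0, 100.0]
pred = knn_predict([1.2], train_x, train_y, k=3)
print(pred)
```

With `k=1` and a query that coincides with a training point, the function returns that point's own target, which is one way a zero error on the training set can arise.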
6) Gaussian Process Regression
Gaussian Process Regression (GPR) is a nonparametric method based on
assuming a prior distribution (a multivariate joint Gaussian
distribution) for any subset of target values of different data
points in the training set [16]. In GPR, if two input vectors are
close, the correlation between their function values is higher. The
posterior distribution for a predicted value is derived from the
prior distribution. The covariance between the function values at two
training inputs $x_i$ and $x_j$ can be modeled as follows [15].
$$\operatorname{cov}\big(f(x_i), f(x_j)\big) = k(x_i, x_j) \tag{9}$$
where $k$ is the covariance (kernel) function. So, the vector
$\mathbf{f}$ (consisting of the function values $f(x_i)$ for the
training points $X$) follows a multivariate Gaussian density function
as follows.
$$\mathbf{f} \sim \mathcal{N}\big(0, K(X,X)\big) \tag{10}$$
where $\mathcal{N}$ denotes the multivariate normal density function
and $K(X,X)$ denotes the covariance matrix whose $(i,j)$ element is
$k(x_i, x_j)$. Considering the target values as (11), the predicted
posterior function value $\bar{f}_*$ for an input value $x_*$ is
derived as (12).
$$\mathbf{y} = \mathbf{f} + \boldsymbol{\epsilon} \tag{11}$$
$$\bar{f}_* = E\left(f_* \mid X, \mathbf{y}, x_*\right) = K(x_*, X)\left[K(X,X)+\sigma_n^2 I\right]^{-1}\mathbf{y} \tag{12}$$
where $\boldsymbol{\epsilon}$ denotes the vector of noise with
standard deviation $\sigma_n$.
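The posterior mean in (12) can be sketched in plain Python as below. The squared-exponential covariance function is an assumed choice made only for this illustration, since the specific covariance function is not recoverable from the text; the helper names, hyperparameter values, and toy data are likewise hypothetical. The linear system is solved by Gaussian elimination rather than by forming the matrix inverse explicitly.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance; an assumed choice, not taken from the text."""
    return signal_var * math.exp(-0.5 * math.dist(a, b) ** 2 / length_scale ** 2)


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def gpr_mean(x_star, train_x, train_y, noise_std, kernel=sq_exp_kernel):
    """Posterior mean K(x*, X) [K(X, X) + sigma_n^2 I]^{-1} y, as in (12)."""
    n = len(train_x)
    k_xx = [[kernel(train_x[i], train_x[j]) + (noise_std ** 2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(k_xx, train_y)  # [K(X, X) + sigma_n^2 I]^{-1} y
    k_star = [kernel(x_star, xi) for xi in train_x]
    return sum(ks * a for ks, a in zip(k_star, alpha))


# Hypothetical one-dimensional training data.
train_x = [[0.0], [1.0], [2.0]]
train_y = [1.0, 2.0, 0.5]
print(gpr_mean([1.5], train_x, train_y, noise_std=0.1))
```

With a very small noise standard deviation, the posterior mean at a training input is close to the observed target there; a larger noise level pulls the prediction toward the zero prior mean.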
C. Accuracy metric
To quantify the results of forecasting models, various statistical
metrics are available. Some common metrics are mean absolute
percentage error (MAPE) [17] defined by (13), mean absolute error
(MAE) [18] defined by (14), mean squared error (MSE) [19] defined by
(15), and root-mean-square error (RMSE) [20] defined by (16).
$$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{y_t-\hat{y}_t}{y_t}\right| \times 100 \tag{13}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|y_t-\hat{y}_t\right| \tag{14}$$
$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(y_t-\hat{y}_t\right)^2 \tag{15}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t-\hat{y}_t\right)^2} \tag{16}$$
where $N$ is the total number of time instants, $y_t$ is the target
value at time instant $t$, and $\hat{y}_t$ is the corresponding
forecasted target.
Among the aforementioned error metrics, MAPE is the most common. As
shown in (13), the MAPE gives relative errors in percentage, which do
not depend on the scale of the forecasted variable. Therefore, the
MAPE has been widely used to compare forecasting accuracy under
different scenarios. Accordingly, in this study, the MAPE is used as
the criterion to measure the proficiency of the models.
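For reference, the four error metrics (13) to (16) can be computed as in the short sketch below. The function names and the two-point example are hypothetical, and the MAPE function assumes that no actual value is zero.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root-mean-square error."""
    return math.sqrt(mse(actual, forecast))


# Hypothetical actual and forecasted values.
actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print(mape(actual, forecast), mae(actual, forecast),
      mse(actual, forecast), rmse(actual, forecast))
```

In this example both absolute errors are 10, but the relative errors are 10% and 5%, which shows how MAPE, unlike MAE or RMSE, scales each error by the size of the actual value.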
III. SIMULATION RESULTS
In this section, the long-term load forecasting models are applied
to the case study of the New England Network. The load profile of
the network from 2000 to March 2016 is shown in Fig. 3.
Fig. 3. New England Network Energy during 2000 to March 2016
As mentioned before, the input variables of a load forecasting model
depend on the case study and the forecasting horizon. In long-term
forecasting, one likely input is the population growth rate, which is
positive and exponential. However, Fig. 3 shows that the load did not
grow exponentially over the 17-year period, so population growth is
dismissed as an input for this case.
In Fig. 4, the energy usage is represented along month and year
axes. As shown, the load peaks sharply in July and August. Such
stochastic load behavior makes accurate prediction more difficult if
the model relies only on weather variables. Note that although the
residential load pattern depends on weather indicators such as
temperature, industrial and commercial loads do not correlate
strongly with temperature. However, in this case, since the sharp
changes of load in the aforementioned months as well as in January
recur at the same time every year, using historical load data is a
good way to deal with this problem. In addition, including the month
number as an input improves the prediction results. Accordingly, each
target value in the load forecasting model corresponds to weather
indicators, a historical average, and a dummy variable for the month
number.
The monthly energy data are divided into three categories:
historical data (applied in the forecasting model as one of the
inputs), a training set, and a cross-validation set. As shown in
Fig. 5, the historical data from 2000 through 2010 are shown in
green. Of the remaining data, which are the monthly energy from
January 2011 to March 2016, 60 percent are used for training the
forecast model (blue) and 40 percent for validation (red).
The data before 2011 are used as one of the inputs corresponding to
each target value. In other words, each target month benefits from
the historical data of the corresponding month over the preceding 11
years, and the historical window moves forward as the target month
moves forward through the training and validation processes.
Fig. 4. 2D representation of energy usage for New England Network
As an example, to forecast the energy in January 2011 as the first
target, the inputs are the total HDD and CDD of January 2011, the
dummy variable for the month number (one for January and zero for the
other months), and the historical monthly energy of January from 2000
to 2010. In the next step, to forecast the energy in February 2011,
the energy usage in February from 2000 to 2010, the month variable
(which is one for February), and the HDD and CDD of February 2011 are
applied in the model.
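One possible way to assemble the input vector for a single target month is sketched below. It assumes that the month dummy variable is a 12-element one-hot vector and that the inputs are simply concatenated in the order shown; the function names, the ordering, and the numeric values are hypothetical and not taken from the paper.

```python
def month_dummies(month):
    """One-hot encoding of the month number (1 = January, ..., 12 = December)."""
    return [1.0 if m == month else 0.0 for m in range(1, 13)]


def build_inputs(hdd, cdd, hist_avg, month):
    """Concatenate weather indicators, historical average, and month dummies."""
    return [hdd, cdd, hist_avg] + month_dummies(month)


# Hypothetical values for a January target.
january_row = build_inputs(hdd=1200.0, cdd=0.0, hist_avg=11500.0, month=1)
print(january_row)
```

For a February target, the same function would be called with that month's HDD, CDD, and historical average, and `month=2`, so that only the second dummy entry is one.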
Fig. 5. Training and validation set of data
As mentioned before, the most common machine learning methods in
load forecasting are ANN, SVR, RNN, GRNN, KNN, and GPR. All LF models
are implemented in MATLAB®, and the results for both the training
and validation sets are compared using the MAPE metric.
For the feedforward ANN, after trying different numbers of hidden
layers and neurons, one hidden layer with 3 neurons results in low
forecasting errors for both the training and validation data sets.
The SVR method is applied for LF using LIBSVM [21].
Table I presents the results of the 6 forecasting models as MAPEs
for both the training and validation data sets.
As seen in the table, although the LF results of all methods are
close to each other, the feedforward ANN gives better results than
the other methods on the validation set while also giving a decent
result on the training set. Note that the training-set MAPE of the
KNN method is zero because the same data used to train the model are
used to test it on the training set.
TABLE I. RESULTS OF LF MODELS FOR TRAINING AND VALIDATION DATA SETS
Method    Training MAPE (%)    Validation MAPE (%)
ANN       0.7                  1.5
SVR       0.9                  1.7
RNN       0.7                  1.9
GRNN      0.7                  2.3
KNN       0.0                  2.3
GPR       0.6                  2.0
The result of the ANN is also shown in Fig. 6. In this figure, the
green curve shows the actual energy, and the blue and red curves show
the training and validation sets, respectively. Note that since the
ANN method involves a random process, each run of the simulation may
give slightly different values. Therefore, the average of several
simulation runs is taken as the ANN's final result.
Fig. 6. LF using ANN model
IV. CONCLUSION
In this study, the performance of the most commonly used machine
learning methods in load forecasting has been studied. These methods
are feedforward artificial neural network (ANN), support vector
regression (SVR), recurrent neural network (RNN), k-nearest neighbors
(KNN), generalized regression neural network (GRNN), and Gaussian
Process Regression (GPR). The case study is the New England Network,
and its monthly energy usage from 2000 to March 2016 is used for
training and validation of the load forecasting models. The inputs of
the load forecasting models are weather indicators (HDD and CDD), a
dummy variable for the month number, and the moving average of the
historical values of the target variable. The results of the
forecasting models, expressed as MAPE, indicate that although all LF
methods perform well on both the training and validation data sets,
the feedforward ANN method gives better results than the other
forecasting methods.
REFERENCES
[1] H. Sadeghian and Z. Wang, "Decentralized Demand Side
Management with Rooftop PV in Residential Distribution Network,"
in Innovative Smart Grid Technologies Conference (ISGT), 2018
IEEE Power & Energy Society, 2017, pp. 1-5.
[2] M. Ghorbaniparvar, X. Li, and N. Zhou, "Demand side management
with a human behavior model for energy cost optimization in smart
grids," in Signal and Information Processing (GlobalSIP), 2015
IEEE Global Conference on, 2015, pp. 503-507.
[3] M. Motalleb, A. Eshraghi, E. Reihani, H. Sangrody, and R.
Ghorbani, "A Game-Theoretic Demand Response Market with
Networked Competition Model," in 49th North American Power
Symposium (NAPS), 2017.
[4] M. H. Athari and Z. Wang, "Impacts of Wind Power Uncertainty on
Grid Vulnerability to Cascading Overload Failures," IEEE
Transactions on Sustainable Energy, vol. 9, pp. 128-137, 2018.
[5] J. W. Taylor, "An evaluation of methods for very short-term load
forecasting using minute-by-minute British data," International
Journal of Forecasting, vol. 24, pp. 645-658, 2008.
[6] T. Hong, Short term electric load forecasting: North Carolina State
University, 2010.
[7] N. Amjady and A. Daraeepour, "Midterm demand prediction of
electrical power systems using a new hybrid forecast technique,"
Power Systems, IEEE Transactions on, vol. 26, pp. 755-765, 2011.
[8] R. J. Hyndman and S. Fan, "Density forecasting for long-term peak
electricity demand," Power Systems, IEEE Transactions on, vol. 25,
pp. 1142-1153, 2010.
[9] H. Sangrody and N. Zhou, "An initial study on load forecasting
considering economic factors," in 2016 IEEE Power and Energy
Society General Meeting (PESGM), Boston, MA, 2016, pp. 1-5.
[10] A. Sadeghi-Mobarakeh, M. Kohansal, E. E. Papalexakis, and H.
Mohsenian-Rad, "Data mining based on random forest model to
predict the California ISO day-ahead market prices," in Power &
Energy Society Innovative Smart Grid Technologies Conference
(ISGT), 2017 IEEE, 2017, pp. 1-5.
[11] C.-L. Hor, S. J. Watson, and S. Majithia, "Analyzing the impact of
weather variables on monthly electricity demand," Power Systems,
IEEE Transactions on, vol. 20, pp. 2078-2085, 2005.
[12] E. Foruzan, S. D. Scott, and J. Lin, "A comparative study of different
machine learning methods for electricity prices forecasting of an
electricity market," in North American Power Symposium (NAPS),
2015, 2015, pp. 1-6.
[13] H. Liu, X.-w. Mi, and Y.-f. Li, "Wind speed forecasting method
based on deep learning strategy using empirical wavelet transform,
long short term memory neural network and Elman neural network,"
Energy Conversion and Management, vol. 156, pp. 498-514, 2018.
[14] X. Chen, X. Chen, J. She, and M. Wu, "A hybrid time series
prediction model based on recurrent neural network and double joint
linear–nonlinear extreme learning network for prediction of carbon
efficiency in iron ore sintering process," Neurocomputing, vol. 249,
pp. 128-139, 2017.
[15] N. K. Ahmed, A. F. Atiya, N. E. Gayar, and H. El-Shishiny, "An
empirical comparison of machine learning models for time series
forecasting," Econometric Reviews, vol. 29, pp. 594-621, 2010.
[16] C. E. Rasmussen and C. K. Williams, Gaussian processes for
machine learning vol. 1: MIT press Cambridge, 2006.
[17] J. W. Taylor and R. Buizza, "Neural network load forecasting with
weather ensemble predictions," Power Systems, IEEE Transactions
on, vol. 17, pp. 626-632, 2002.
[18] H. Shih-Che, L. Chan-Nan, and L. Yuan-Liang, "Evaluation of AMI
and SCADA Data Synergy for Distribution Feeder Modeling,"
Smart Grid, IEEE Transactions on, vol. 6, pp. 1639-1647, 2015.
[19] K. Siwek, S. Osowski, and R. Szupiluk, "Ensemble neural network
approach for accurate load forecasting in a power system,"
International Journal of Applied Mathematics and Computer
Science, vol. 19, pp. 303-315, 2009.
[20] T. Chen and Y.-C. Wang, "Long-term load forecasting by a
collaborative fuzzy-neural approach," International Journal of
Electrical Power & Energy Systems, vol. 43, pp. 454-464, 2012.
[21] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector
machines," ACM Transactions on Intelligent Systems and
Technology (TIST), vol. 2, p. 27, 2011.
[Figure: x-axis Time (2010 to 2017); y-axis scale ×10^4; legend: Actual Data, Training Data, Validation Data]