A Scalable Short-Term Load Forecasting Model for
Micro-grid Communication Networks
By
Ashfaq Ahmad
CIIT/FA13-REE-044/ISB
MS Thesis
In
Electrical Engineering
COMSATS Institute of Information Technology
Islamabad – Pakistan
Spring, 2015
A Scalable Short-Term Load Forecasting Model for
Micro-grid Communication Networks
A Thesis Presented to
COMSATS Institute of Information Technology, Islamabad
In partial fulfillment
of the requirement for the degree of
MS (Electrical Engineering)
By
Ashfaq Ahmad
CIIT/FA13-REE-044/ISB
Spring, 2015
A Scalable Short-Term Load Forecasting Model for
Micro-grid Communication Networks
A Graduate Thesis submitted to Department of Electrical Engineering as partial
fulfillment of the requirement for the award of Degree of M.S (Electrical Engineering).
Name
Registration Number
Ashfaq Ahmad
CIIT/FA13-REE-044/ISB
Supervisor
Dr. Nadeem Javaid,
Assistant Professor,
Department of Computer Science,
COMSATS Institute of Information Technology (CIIT),
Islamabad Campus.
Final Approval
This thesis titled
A Scalable Short-Term Load Forecasting Model for
Micro-grid Communication Networks
By
Ashfaq Ahmad
CIIT/FA13-REE-044/ISB
has been approved
For the COMSATS Institute of Information Technology, Islamabad
External Examiner: ___________________________________
Dr. Hasan Mahmood
Associate Professor, Department of Electronics,
QAU, Islamabad
Supervisor: ________________________________________________
Dr. Nadeem Javaid
Assistant Professor, Department of Computer Science,
CIIT, Islamabad
HoD:_________________________________________________________
Dr. Shahid A. Khan
Professor, Department of Electrical Engineering,
CIIT, Islamabad
Declaration
I, Mr. Ashfaq Ahmad, CIIT/FA13-REE-044/ISB, hereby declare that I have
produced the work presented in this thesis during the scheduled period of study. I
also declare that I have not taken any material from any source except where duly
referenced, and that the amount of plagiarism is within the acceptable range. If a
violation of HEC rules on research has occurred in this thesis, I shall be liable to
punishable action under the plagiarism rules of the HEC.
Signature of the student:
Date: ____________________________
____________________________
Ashfaq Ahmad
CIIT/FA13-REE-044/ISB
Certificate
It is certified that Ashfaq Ahmad CIIT/FA13-REE-044/ISB has carried out all the
work related to this thesis under my supervision at the Department of Electrical
Engineering, COMSATS Institute of Information Technology, Islamabad and the
work fulfills the requirements for the award of the MS degree.
Date: ____________________________
Supervisor:
____________________________
Dr. Nadeem Javaid,
Assistant Professor
Head of Department:
____________________________
Dr. Shahid A. Khan
Professor, Department of Electrical Engineering.
DEDICATION
This thesis is dedicated to my teachers, my family and my
friends.
ACKNOWLEDGMENT
I am heartily grateful to my supervisor, Dr. Nadeem Javaid, who not only guided me but also
motivated me via insightful criticism from the beginning to the final level that enabled me to
complete this thesis.
I would like to acknowledge my family, my friends, and the cooperative CAST lab attendants.
They all kept me motivated and energetic, and this work would not have been possible without them.
Finally, I offer my regards and blessings to everyone who supported me in any way during the
completion of my thesis.
Ashfaq Ahmad
CIIT/FA13-REE-044/ISB
ABSTRACT
A Scalable Short-Term Load Forecasting Model for
Micro-grid Communication Networks
The underlying forecast model is one of the most significant factors that directly
affect the economics of energy trade, because not only prosumers but also utilities
aim to maximize their benefits. Most existing forecast models trade off forecast
accuracy against convergence rate. This thesis presents a short-term load
forecasting model for micro-grid communication networks. Unlike existing
short-term forecast models, our proposed model factors in accuracy as well as
convergence rate. To improve accuracy, we devise modifications to two
popular techniques: mutual information based feature selection and enhanced
differential evolution algorithm based error minimization. The convergence rate
of the overall forecast strategy is enhanced by modifying the heuristic
algorithm. Besides improving accuracy and convergence rate, we also modify
the feature selection technique to make the proposed model scalable.
Simulation results show that the accuracy of the proposed scalable short-term
load forecasting model is 99.5%.
TABLE OF CONTENTS
1 Introduction
1.1 The Smart Grid
1.2 Towards Localization: The Smart Micro-Grid
1.2.1 Load Forecast
1.2.2 Our Contribution
2 Related Work
2.1 Stochastic Distribution Based Strategies
2.2 ANN based Strategies
2.3 Markov Chain Based Strategies
3 Forecast Strategies: Towards Development
3.1 Challenges
3.2 Influencing Factors
3.3 Basic Units of a Generic Forecast Model
3.3.1 Feature Selector
3.3.2 Forecaster
3.3.3 Optimizer
4 ANN Based Forecast Strategy
4.1 Data Preparation Module
4.2 Feature Selection Module
4.3 Forecast Module
4.4 Simulation Results
4.4.1 Error Performance
4.4.2 Convergence Rate Analysis
5 mEDE and ANN Based Forecast Strategy
5.1 Motivation
5.2 The mEDE and ANN Based Forecast
5.2.1 Pre-Processing Module
5.2.2 Forecast Module
5.2.3 Optimization Module
5.2.4 Simulation Results
5.2.5 Error Performance
5.2.6 Convergence Rate Analysis
6 Modified Feature Selection, ANN and Modified EDE based Forecast Strategy
6.1 Motivation
6.2 The Proposed S-STLF Model
6.2.1 Modified MI based Feature Selection
6.2.2 ANN based STLF
6.2.3 mEDE Based Forecast Error Minimization
6.3 Simulation Results
6.3.1 Error Performance
6.3.2 Convergence Rate Analysis
6.3.3 Scalability Analysis
7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
8 References
LIST OF FIGURES
1.1 An SMG
4.1 ANN based forecast: Block diagram
4.2 Data preparation module for ANN based forecast
4.3 An artificial neuron
4.4 ANN based data forecast module
4.5 DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (actual vs forecast)
4.6 DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (error performance)
4.7 DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (convergence rate analysis)
4.8 EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (actual vs forecast)
4.9 EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (error performance)
4.10 EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (convergence rate analysis)
5.1 mEDE and ANN: Block diagram
5.2 DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (actual vs forecast)
5.3 DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (error performance)
5.4 DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (convergence rate analysis)
5.5 EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (actual vs forecast)
5.6 EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (error performance)
5.7 EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (convergence rate analysis)
5.8 FE (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (actual vs forecast)
5.9 FE (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (error performance)
5.10 FE (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (convergence rate analysis)
6.1 The proposed S-STLF model
6.2 PJMW: Actual vs forecast
6.3 EKPC: Actual vs forecast
6.4 DAYTOWN: Actual vs forecast
6.5 FE: Actual vs forecast
6.6 PJMW: Error performance
6.7 EKPC: Error performance
6.8 DAYTOWN: Error performance
6.9 FE: Error performance
6.10 PJMW: Convergence rate analysis
6.11 EKPC: Convergence rate analysis
6.12 DAYTOWN: Convergence rate analysis
6.13 FE: Convergence rate analysis
6.14 Impact of load on error performance
LIST OF TABLES
4.1 Simulation parameters of ANN based forecast
5.1 Simulation parameters of mEDE and ANN based forecast
6.1 Simulation parameters of mFS, ANN, and mEDE based forecast
6.2 Performance evaluation of the selected forecast strategies
Chapter 1
Introduction
1.1 The Smart Grid
In most parts of the world, especially in developed countries, transmission
and distribution systems have aged. The existing (traditional) grid system
needs renovation not only to bridge the ever-increasing gap between
demand and supply but also to meet other essential challenges such as
grid reliability, grid robustness, and customer electricity cost minimization
[1]. In this regard, the recent integration of the latest information and
communication technologies with the existing grid system has gained enormous
attention. One of the strengths of this integration is customer engagement,
which plays a key role in the economics of energy trade. In other words, the
old concept of uni-directional energy flow is replaced by the new concept of
bi-directional energy flow: the traditional consumer becomes a smart prosumer,
and the traditional grid becomes a smart one (the smart grid) [2]. The European
Technology Platform (European Commission, 2006) defines the smart grid as
follows: “a smart grid is an electricity
network that can intelligently integrate the actions of all users connected
to it–generators, consumers and those that do both in order to efficiently
deliver sustainable, economic and secure electricity supplies”. The smart grid
has revolutionized the performance of all sections of the conventional grid.
In a conventional grid, energy can only flow from the generation side to the
consumer, whereas in a smart grid the consumer can also sell surplus
electricity generated through domestic sources such as solar and wind
[3]. The introduction of smart grid infrastructure in the distribution section
has a manifold impact, since retailers and consumers are both important players
there. Both before and after installing advanced technology, utilities seek
as high a return on investment as possible. Customers, on the other hand,
seek as low an electricity bill as possible. Thus, both utilities and
consumers are greedy by nature. The traditional grid was unable to serve both
parties at the same time due to its lack of flexibility. In other words, the
absence of two-way communication or bi-directional energy flow between utility
and consumer makes the traditional grid inadequate to meet modern grid
challenges such as reliability and robustness [4]. It is likely that the smart
grid will integrate new communication technologies, advanced metering,
distributed systems, distributed storage, security and safety to achieve
considerable robustness and reliability [5, 6, 7]. From this discussion, it is
clear that, unlike the traditional grid where the utility was the dominant
player, the smart grid also involves customers in energy trade through
bi-directional energy flow.
In smart grids, user engagement via two-way communication leads to peak
load reduction, as optimal decisions are taken by the energy management
unit. The resulting grid, with its advanced metering infrastructure,
will affect how [8]:
• the load is determined and met,
• customers engage with the utility, and
• the integration of the latest technologies shapes the energy trade
between customer and utility.
Thus, we have two main players in the smart grid: user and utility (every
user is a player if more than one user is considered). Bi-directional
communication and energy flow benefit not only the utilities but also the
consumers. More specifically, consumers are no longer only consumers;
they are prosumers who can access the electricity market both as sellers
and as buyers. At the same time, smart utilities can manage their resources
efficiently. Consequently, the ever-increasing gap between demand and
supply can be bridged [6, 9]. Initially, the utility forecasts a future
load/price signal based on the past activities of the users.
Users then adjust their power usage schedules according to the utility's
price/load signal without compromising their comfort levels. However, with
ever-growing expectations, accurate forecast strategies and advanced scheduling
techniques are extremely significant, as they make the overall operation
as close to optimal as possible. In this regard, many demand side scheduling
techniques have been proposed [10, 11, 12, 13]. However, significant
challenges exist prior to scheduling, in terms of stochastic information
schemes to predict the future load. Thus, with the growing expectations
surrounding the adoption of smart grids, advanced techniques and tools are
required to optimize the overall operation. Moreover, this determination would
require that the daily operations of a smart grid utility (like strategic
decisions to bridge the gap between demand and supply, and fuel resource
planning) are properly conveyed. To sum up, all these decisions are highly
influenced by the underlying load forecast strategy [14].
1.2 Towards Localization: The Smart Micro-Grid
Considering the development of demand response in smart grids, the resulting
capacity on the user side (in residential areas) would be small enough that
we can refer to it as a micro-grid (see Fig. 1.1 [15]). It is foreseen that
over the next decade, Smart Micro-Grids (SMGs) will grow significantly due to
reduced installation cost, higher reliability, increased support from
prosumers and utilities, etc. During disturbances, an SMG can work in islanded
mode, i.e., it can disconnect itself from the main distribution system. Thus,
the SMG can maintain a high service level. Moreover, intentional islanding
(in the absence of disturbances) has the potential to provide high local
reliability [16]. Another benefit of SMGs is the exploitation of distributed
control to prevent a single point of failure.
Figure 1.1: An SMG
1.2.1 Load Forecast
In terms of load forecast, SMGs are more difficult to handle than macro
smart grids, because the load curve exhibits more volatile and non-linear
fluctuations in SMGs than in macro smart grids.
Load forecasting is one of the fundamental and essential tasks needed for
proper operation of the micro-grid. Moreover, accurate load forecasting
leads to enhanced management of resources (renewable and conventional),
which in turn directly affects the economics of energy trade.
In SMGs, load forecast is of two types: short term and long term. However,
Short-Term Load Forecast (STLF) is more difficult to realize in a micro-grid
due to lower similarity (high randomness due to more load fluctuations) in
historical load curves as compared to long-term load forecasting [16].
As mentioned earlier, the load of a micro-grid shows more fluctuations than
that of a traditional large power system. In these grids, production can be
adapted to load more dynamically than in macro grids. The load curve of an
SMG does not always show the same shape due to the random power consumption
schedules of the prosumers, which leads to more variability than in the macro
grid. However, all these operations are significantly affected by the
underlying forecast strategy used to predict the future load(s). Due to higher
volatility in the historical load curve, STLF is more challenging than
long-term load forecast [17]. Many STLF strategies are presented in the
literature. The authors in [18] use an Artificial Neural Network (ANN) and a
mutual information based technique to forecast the load/price of the next day.
In their work, Artificial Neurons (ANs) are activated by the sigmoid function
because of its ability to capture non-linearities in the load time series.
The major disadvantage of this strategy is the high relative error between the
actual and forecast curves. To minimize the relative error of [18], the work
in [16] utilizes an Enhanced version of the Differential Evolution (EDE)
algorithm. This integration minimizes the forecast error very efficiently;
however, accuracy can be improved further, and the execution time of this
strategy is relatively high. Another hybrid STLF strategy is presented in
[19]; however, this strategy is very complex to implement and its execution
time is also very high.
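The relative error between an actual and a forecast load curve can be illustrated with a short sketch. The metric below is the mean absolute percentage error (MAPE), which is assumed here for illustration only; the text above does not state which error measure the cited works use. The function name `mape` and the sample values are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error between two equal-length load curves."""
    if len(actual) != len(forecast):
        raise ValueError("curves must have the same length")
    # Per-sample relative error |actual - forecast| / |actual|
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical hourly load values (arbitrary units), not taken from the thesis data
actual = [100.0, 120.0, 80.0, 200.0]
forecast = [98.0, 123.0, 80.0, 190.0]
error_pct = mape(actual, forecast)
print(f"relative error: {error_pct:.3f}%")
```

Under this assumed metric, a lower percentage means the forecast curve follows the actual curve more closely.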
1.2.2 Our Contribution
In this thesis, we present a Scalable-STLF (S-STLF) model for Micro-grid
Communication Networks (MCNs). We use a modular strategy where the
output of each module is fed into the succeeding module. Overall,
our proposition consists of three modules: feature selector, forecaster,
and optimizer. Initially, the feature selector receives the historical time
series of load data as input, and then selects the candidate inputs carrying
the most relevant information using our improved version of the mutual
information based technique. Thus, the feature selector mitigates the curse of
dimensionality. It is followed by the forecaster (which consists of an ANN),
which receives the selected candidate inputs from the feature selector. Based
on this data, ANs (activated by the sigmoid function) are trained to
predict the load of the upcoming day. At this stage, the relative error between
the actual and forecast curves is high. Thus, the optimizer, which
consists of our modified version of the EDE (mEDE) algorithm, minimizes
the forecast error. The proposed S-STLF model for MCNs is validated via
simulations, which show that it performs better than the selected existing
strategies in terms of accuracy, convergence rate, and scalability.
The rest of the thesis is organized as follows. Chapter 2 reviews relevant
STLF contributions from the research community, Chapter 3 deals with the
basic architecture of a generic forecast model, Chapter 4 describes the
ANN based forecast strategy, Chapter 5 integrates EDE with the ANN
based forecast strategy of Chapter 4, Chapter 6 integrates the feature
selection module with the ANN+EDE based forecast strategy of Chapter 5, and
Chapter 7 concludes the thesis and provides future research directions.
Finally, references are provided at the end of the thesis.
Chapter 2
Related Work
Accurate load forecasting has a direct impact on the economics of energy
trade. We therefore discuss some of the previous research on load forecasting
in SMGs as follows.
2.1 Stochastic Distribution Based Strategies
The work in [25] presents a probabilistic approach for generating energy
consumption profiles of household appliances. The proposed approach
takes a wide range of appliances into consideration along with a high degree
of flexibility, and distinguishes between holidays and working days when
configuring household appliances. The main assumptions of this work are: (i)
Gaussian distributed ON-OFF cycles of different appliances, (ii) Gaussian
distributed energy consumption patterns of appliances, and (iii) a Gaussian
distributed number of appliances. However, the absence of a closed form
solution makes the Gaussian based forecast strategy very complex. Moreover,
these assumptions cannot always hold; thus, the accuracy of the predicted
load time series is highly questionable.
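To make assumptions (i) and (ii) concrete, the following is a minimal sketch of how a single appliance profile could be sampled when ON and OFF durations and the ON power level are drawn from Gaussian distributions. It is not the method of [25]; the function name `appliance_profile`, the slot length, and all parameter values are hypothetical choices for illustration.

```python
import random


def appliance_profile(n_slots, mean_on, std_on, mean_off, std_off,
                      mean_power, std_power, seed=0):
    """Sample one appliance load profile with Gaussian ON/OFF durations."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    profile = []
    on = False  # assumption: the appliance starts in the OFF state
    while len(profile) < n_slots:
        mean, std = (mean_on, std_on) if on else (mean_off, std_off)
        # Duration in slots: Gaussian draw, clipped to at least one slot
        duration = max(1, round(rng.gauss(mean, std)))
        # Power during an ON cycle: Gaussian draw, clipped at zero
        power = max(0.0, rng.gauss(mean_power, std_power)) if on else 0.0
        profile.extend([power] * duration)
        on = not on
    return profile[:n_slots]


# Hypothetical appliance: 96 slots of 15 minutes, power in kW
profile = appliance_profile(n_slots=96, mean_on=4, std_on=1,
                            mean_off=8, std_off=2,
                            mean_power=1.5, std_power=0.2)
print(profile[:16])
```

The clipping steps show one practical weakness of the Gaussian assumption: durations and power levels cannot be negative, so the sampled values no longer follow an exact Gaussian distribution.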
An improvement over [25] is presented in [26]. This work uses a
regularizer to overcome the computational complexity of the Gaussian
distribution based STLF strategy in [25]. Moreover, the proposed STLF strategy
captures the heteroscedasticity of load more efficiently than
[25]. Simulations are conducted to show that the proposed STLF
strategy performs better than the existing one. To sum up, [26]
overcomes the complexity of [25] to some extent; however, the
basic assumptions (Gaussian distributed ON-OFF cycles of household
appliances, number of appliances, and power consumption patterns of
appliances) still form its basis and thus make the proposal highly questionable
in terms of accuracy.
2.2 ANN based Strategies
In [18], the authors present a hybrid technique for short-term price
forecasting in SMGs. This hybrid technique comprises two steps: feature
selection and prediction. In the first step, a mutual information based
technique is implemented to remove redundancy and irrelevancy from the
input load time series. In the second step, an ANN along with an evolutionary
algorithm is used to predict the future load time curve. In this process,
the authors assume a sigmoid activation function for ANs. In addition, the
authors fine-tune some adjustable parameters during the first and second
steps via an iterative search procedure which is part of this work. In terms
of forecast accuracy, this technique is efficient as it embeds various
techniques; however, the cost paid is implementation complexity.
In [16], the authors study the characteristics of the load time series of a
micro-grid and compare them with those of a traditional power system.
More importantly, the authors propose a bi-level (upper and lower)
short-term load prediction strategy for micro-grids. The lower level is a
forecaster which utilizes a neural network and an evolutionary algorithm. The
upper level optimizes the performance of the lower level by using the
differential evolution algorithm. The proposed bi-level prediction strategy is
evaluated on real-time data from a Canadian university. MATLAB simulations
demonstrate that the proposed strategy performs STLF in SMGs with
reasonable accuracy. However, its implementation complexity is very high.
Another ANN based STLF strategy is presented in [23]. This hybrid
methodology completes the STLF task in four steps: data selection,
transformation, forecast, and error correction. In step one, some well-known
data selection techniques are used to mitigate the curse of dimensionality
in the characteristics of the input load time series. Step two applies a
wavelet transformation to the selected characteristics of the input load time
series to enable the implementation of a redundancy and irrelevancy filter.
Step three uses an ANN and a training algorithm for STLF in SMGs.
The authors choose the sigmoid activation function for ANs due to its
ability to capture non-linearity. Finally, error correcting functions are used
in step four to improve the accuracy of the proposed STLF methodology. In
simulations, this methodology is tested against practical household load,
which demonstrates that it is very good in terms of accuracy, however, at the
cost of complexity.
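The wavelet transformation step can be illustrated with a one-level Haar transform, which splits a series into a low-frequency approximation and a high-frequency detail component. The Haar wavelet is assumed here purely for simplicity; the text above does not say which wavelet [23] uses. The function names and sample values are hypothetical.

```python
import math


def haar_level1(signal):
    """One-level Haar transform: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Scaled pairwise sums keep the slow trend (low frequency)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Scaled pairwise differences keep the fast changes (high frequency)
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original series from one-level Haar coefficients."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


# Hypothetical load samples (arbitrary units)
load = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0]
approx, detail = haar_level1(load)
restored = haar_inverse(approx, detail)
print(approx, detail)
```

Because the transform is invertible, a filter can operate on the two components separately without losing information from the original series.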
Similarly, another strategy is presented in [24] to predict the occurrence
of price spikes in SMGs. The proposed strategy utilizes wavelet
transformation for input feature selection. An ANN is then trained on the
selected inputs to predict future price spikes. In
[27], another STLF strategy is presented for SMGs which is completed in
five steps: (i) database handling of historical load data, (ii) detection of
missing data and its interpolation, (iii) principal component analysis to
detect outliers, (iv) ANN based forecast, and (v) display of the forecast data
on different devices. However, the accuracy of [27] is not satisfactory.
2.3 Markov Chain Based Strategies
To make the STLF strategy robust, the authors in [22] propose a
Markov chain based strategy. This stochastic strategy aims to tackle load
time series fluctuations associated with the energy consumption of users in a
heterogeneous environment. Markov chains are used to predict the
future ON-OFF cycles of household appliances in a robust way due to their
memoryless nature (future values depend only on the current values; past
values are not considered). This memoryless nature of Markov chains not
only makes the STLF strategy robust but also relatively less complex in
comparison to the aforementioned techniques. However, the memoryless
nature of Markov chains also has a drawback: lower accuracy.
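The memoryless property can be shown with a minimal two-state sketch (0 = OFF, 1 = ON), in which the transition probabilities are estimated from an observed history and the next state is predicted from the current state alone. This is an illustration under simplifying assumptions, not the strategy of [22]; the function names and the sample history are hypothetical.

```python
def transition_matrix(states):
    """Estimate a 2x2 transition matrix from a sequence of 0/1 states."""
    counts = [[0, 0], [0, 0]]
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Fall back to a uniform row if a state was never left in the history
        matrix.append([c / total for c in row] if total else [0.5, 0.5])
    return matrix


def predict_next(matrix, current):
    """Most likely next state, given only the current state (memoryless)."""
    row = matrix[current]
    return 0 if row[0] >= row[1] else 1


# Hypothetical ON-OFF history of one appliance
history = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
P = transition_matrix(history)
next_state = predict_next(P, history[-1])
print(P, next_state)
```

Note that `predict_next` never looks further back than `history[-1]`, which is what keeps the model simple and also what limits its accuracy when longer patterns matter.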
Chapter 3
Forecast Strategies: Towards Development
The daily supply and demand planning operations of a utility are strongly
influenced by price/load forecast strategies. Accurate
load forecasting forms the basis for spot price management in the system. As
utilities show growing interest in the implementation of smart
grids, forecast strategies become more significant due to their expanded
application horizon: storage maintenance, demand side management,
integration of renewable resources, load scheduling, etc. From the
customers' point of view, accurate forecast strategies mean a proper
understanding of the relationship between price and demand, which enables them
to properly schedule their usage patterns.
3.1 Challenges
Due to growing awareness of customer participation in smart grids, utilities
are compelled to develop load/price forecast strategies. However, in
doing so utilities face many challenges such as [20]:
• A large and varied range of customers' consumption data.
• Highly volatile nature of the load/price signal.
• Highly non-linear characteristics of the load/price signal.
• Hybrid customer groups, using either traditional meters or smart meters.
• The curse of dimensionality in identifying factors, which may lead to
over-fitting.
• Complexity of the identifying parameters.
• Lack of data availability for different scenarios.
3.2 Influencing Factors
In addition to the aforementioned challenges, several factors
influence load forecasting in the smart grid [20]:
• Weather conditions: specifically when renewable energy sources are
integrated.
• Time of the day: electricity consumption varies significantly across
different time slots of the day.
• Random disturbances: for example, sudden cloudy conditions strongly
disturb solar generation.
• Electricity price market: customers' consumption patterns vary with the
price market and vice versa.
• Storage cells: storage at both utility and customer locations would greatly
affect the forecast signal.
3.3 Basic Units of a Generic Forecast Model
In view of the aforementioned challenges, the research community has
developed many forecast strategies. From these works, we conclude that a
forecast strategy comprises three basic units: feature selector, forecaster,
and optimizer. In this section, these basic units are discussed in detail.
3.3.1 Feature Selector
The basic assumption of the feature selector is that the input data contain
not only irrelevant features but also redundant ones. Irrelevant features are
those which do not provide useful information, and redundant features are
duplicates that do not provide additional information. In this unit, a
subset of the most relevant features for forecast strategy development
is selected. Incorporating a feature selector mainly provides three benefits:
reduced over-fitting, decreased training time, and improved model
interpretability.
Many forecast strategies in the literature utilize a feature
selector. For example, [19] uses four indices of load variation and empirical
mode decomposition for feature selection. The indices of load variation
observe data variation over months, between two adjacent days, between
hours of the same day, and within an hour. The empirical
mode decomposition algorithm then gradually decomposes the load/price
signal into linear components along with some residue. The decomposed
components are then ranked based on different trends and scales. Similarly,
[21] applies a forward selection algorithm to select a reduced number of
scenarios generated via Monte Carlo simulation. The work in [22] utilizes a
multi-scale setting to fine-tune information during state aggregation. In [9],
a probability distribution function and a roulette wheel mechanism are used to
generate several scenarios. Among the generated scenarios, a subset is
selected based on a scenario reduction process in which Weibull and Gaussian
probability distribution functions are utilized. In another work [23], input
data is initially classified into schedulable and non-schedulable loads. Then,
wavelet transformation is conducted to decompose the input data into detailed
components (high frequency) and approximate components (low frequency).
Finally, [18, 16, 24] use an entropy based mutual information technique for
feature selection.
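As a rough illustration of entropy based mutual information for feature selection, the sketch below estimates the mutual information (in bits) between each discrete candidate input and the target, then ranks the candidates by that score. It assumes the series have already been encoded as binary values; the encoding, the candidate names `lag_24` and `noise`, and the data are hypothetical and are not taken from [18, 16, 24].

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information I(X;Y) in bits for two equal-length discrete series."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Hypothetical binary-encoded series (1 = load above its median, 0 = below)
target = [0, 1, 0, 1, 0, 1, 0, 1]
candidates = {
    "lag_24": [0, 1, 0, 1, 0, 1, 0, 1],  # fully informative about the target
    "noise": [0, 0, 1, 1, 0, 0, 1, 1],   # statistically independent of it
}
scores = {name: mutual_information(series, target)
          for name, series in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

A candidate with a score near zero carries no information about the target and would be discarded as irrelevant; detecting redundancy would additionally require comparing candidates with each other, which this sketch omits.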
3.3.2 Forecaster
The basic purpose of this unit is to forecast the future load/price signal
based on learning algorithms. Since the load/price is highly non-linear, the
forecaster needs to capture these non-linearities with reasonable accuracy
and execution time. Supervised learning is used here because historical
load data is available. An important advantage of the forecaster is that
the forecast signal supports decision making: based on it, experts take
qualitative as well as quantitative decisions that benefit the energy
trade between the utility and its customers.
A literature review reveals that many strategies have been proposed for
this unit. For example, [19] uses an extreme learning machine with kernel
in an artificial neural network environment, and [22] uses Markov chains
to predict the next state. [16, 18, 24] use an artificial neural network based
forecaster. Among the typically used activation functions, these authors
prefer the sigmoid function for neuron activation due to its ability to handle the
non-linearities associated with the price/load signal. For training the
network, [24] uses Mallat's algorithm, [23] uses a discrete wavelet
transformation based technique, and [16] uses a multi-variate auto regressive model.
Some other well known training algorithms are Newton's Method, Gradient
Descent based back propagation, the Levenberg-Marquardt learning algorithm,
etc. Among these training algorithms, the typically used one is the
Levenberg-Marquardt learning algorithm because it can train the artificial
neural network 10–100 times faster than the classical Newton's Method and the
Gradient Descent based back propagation algorithm. The rest of the training
algorithms are still unexplored in this area, which is a potential research
area for the future. It is worth mentioning that some forecasters like [16, 18]
also use an evolutionary algorithm based local optimizer. The key benefit of the
local optimizer is its ability to avoid getting trapped in local minima/maxima
that may arise during the training process of the artificial neural network.
3.3.3 Optimizer
Generally, an optimization problem is written as:
\[ \max f_0(x) \quad \text{or} \quad \min f_0(x) \tag{3.1} \]
subject to:
\[ f_i(x) \le c_i \quad \forall i \in \mathbb{Z}^{+} \tag{3.2} \]
where the optimization variable is $x = (x_1, \ldots, x_n)$, the objective function
is $f_0 : \mathbb{R}^n \to \mathbb{R}$, the constraints are $f_i : \mathbb{R}^n \to \mathbb{R}$, and the upper bounds are
$c_i$. $x^*$ is an optimal solution of the optimization problem if and only if it
has the smallest (for minimization) or greatest (for maximization) objective
value among all the possible solution vectors that satisfy the constraints; for
minimization, we have $f_0(z) \ge f_0(x^*)$ for any $z$ with $f_i(z) \le c_i$ for all $i$.
Subject to the forecast strategy(ies), the forecaster returns the day-ahead
price/load signal with some error. The forecast strategy can be further enhanced in
terms of accuracy if error minimization is considered as the objective
function of the optimizer. However, this process costs additional execution
time. For applications where accuracy is more important than execution
time, the optimizer is of great significance. Here, heuristic optimization
techniques (like differential evolution, particle swarm optimization, etc.)
are preferred over other optimization techniques (like linear
programming, non-linear programming, etc.) due to their faster convergence rate. In this
regard, only a few techniques have been proposed that take the optimizer into
consideration. For example, [16] uses an enhanced differential evolution
algorithm and [19] uses particle swarm optimization in the optimizer. To
our knowledge, this unit is still largely unexplored and can be considered a
potential research area (ant algorithms, bee algorithms, genetic algorithms,
etc. need to be explored).
Chapter 4
ANN Based Forecast Strategy
For the complex day-ahead load forecast of SGs, any proposed prediction
strategy should handle the non-linear input/output
relationship as efficiently as possible. ANNs are widely used as forecasters
that can predict the non-linear behaviour of an SG's load time series with
acceptable accuracy. However, prior to ANN based forecasting, the input load
time series must be made compatible. Therefore, our proposed day-ahead
load forecasting model (for SGs) consists of three modules: a data preparation
module, a feature selection module and a forecast module (refer Figure 4.1).
The first module performs pre-processing to make the input data compatible
with the feature selection module and the forecast module. The second
module removes irrelevant and redundant features from the input data.
The third module consists of an ANN to forecast the day-ahead load of the
SG. Details are as follows.
Figure 4.1: ANN based forecast: Block diagram
4.1 Data Preparation Module
As mentioned earlier, the data preparation module receives the input load
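time series (historical). Suppose the input load time series is given by the
following matrix:
\[
P = \begin{bmatrix}
p_{h_1}^{d_1} & p_{h_2}^{d_1} & p_{h_3}^{d_1} & \cdots & p_{h_m}^{d_1} \\
p_{h_1}^{d_2} & p_{h_2}^{d_2} & p_{h_3}^{d_2} & \cdots & p_{h_m}^{d_2} \\
p_{h_1}^{d_3} & p_{h_2}^{d_3} & p_{h_3}^{d_3} & \cdots & p_{h_m}^{d_3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p_{h_1}^{d_n} & p_{h_2}^{d_n} & p_{h_3}^{d_n} & \cdots & p_{h_m}^{d_n}
\end{bmatrix} \tag{4.1}
\]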
where $h_m$ is the $m$th hour, $d_n$ is the $n$th day, and $p_{h_m}^{d_n}$ is the historical power
consumption value at the $m$th hour of the $n$th day. As there are 24 hours in a
day, $m = 24$. The value of $n$ depends on the designer's choice: a greater
value of $n$ leads to finer tuning during the training process of the forecast
module because more lagged samples of input data are available. However,
it also leads to more execution time.
Before feeding the ANN with the input matrix $P$, the following stepwise
operations are performed by the data preparation module (refer Fig. 4.2):
1. Local maximum: Initially, a local maximum value is calculated for
each column of the $P$ matrix: $p_{max}^{c_i} = \max\{p_{h_i}^{d_1}, p_{h_i}^{d_2}, p_{h_i}^{d_3}, \ldots, p_{h_i}^{d_n}\}$,
$\forall i \in \{1, 2, 3, \ldots, m\}$.
2. Local normalization: In this step, each column of the matrix $P$ is
normalized by its respective local maximum such that the resultant matrix
is represented by $P_{nrm}$. Now, each entry of $P_{nrm}$ ranges between 0
and 1.
3. Local median: For each column of the $P_{nrm}$ matrix, a local median value
$Med_i$ is calculated ($\forall i \in \{1, 2, 3, \ldots, m\}$).
4. Binary encoding: Each entry of the $P_{nrm}$ matrix is compared with its
respective $Med_i$ value. If the entry is less than its respective local
median value, then it is encoded with a binary 0; otherwise, it is encoded
with a binary 1. In this way, a resultant matrix containing only
binary values (0's and 1's), $P_b$, is obtained.
At this stage, the $P_b$ matrix is compatible with the feature selection and
forecast modules and is passed on to them.
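The four data preparation steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than the thesis implementation: the function name `prepare_load_matrix` and the toy load values are assumptions made for this example, rows are taken to be days and columns hours as in the matrix $P$, and all loads are assumed to be strictly positive so that the division by the column maximum is well defined.

```python
import numpy as np

def prepare_load_matrix(P):
    """Hypothetical helper: normalize each hourly column of P by its maximum
    and binary-encode it against the column median (steps 1 to 4)."""
    P = np.asarray(P, dtype=float)       # shape: (n days, m hours)
    col_max = P.max(axis=0)              # step 1: local maximum per column
    P_nrm = P / col_max                  # step 2: local normalization
    med = np.median(P_nrm, axis=0)       # step 3: local median per column
    P_b = (P_nrm >= med).astype(int)     # step 4: 0 if below the median, else 1
    return P_nrm, med, P_b

# Toy example: 3 days x 2 hours (made-up values, for illustration only).
P = np.array([[100.0, 50.0],
              [200.0, 40.0],
              [150.0, 60.0]])
P_nrm, med, P_b = prepare_load_matrix(P)
print(P_b)
```

Because every column is scaled by its own maximum, each column of `P_nrm` contains at least one entry equal to 1, and the encoding depends only on the ordering of values within a column.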
4.2 Feature Selection Module
Once the data is binary encoded, not only redundant but also irrelevant
samples need to be removed from the lagged input data samples. Removing
redundant features minimizes the execution time during the training process.
On the other hand, removing irrelevant features improves
forecast accuracy because the outliers are removed.
Figure 4.2: Data preparation module for ANN based forecast
In order to remove the irrelevant and redundant features from the binary
encoded input data matrix $P_b$, an entropy based mutual information
technique is used in [16, 18], which defines the mutual information between
input $Q$ and target $T$ by the following formula,
\[ MI(Q, T) = \sum_{i}\sum_{j} p(Q_i, T_j)\,\log_2\frac{p(Q_i, T_j)}{p(Q_i)\,p(T_j)}, \quad \forall i, j \in \{0, 1\} \tag{4.2} \]
In equation 4.2, $MI(Q, T) = 0$ means that $Q$ and $T$ are independent, a high
value of $MI(Q, T)$ means that $Q$ and $T$ are strongly related, and a low value
of $MI(Q, T)$ means that $Q$ and $T$ are loosely related.
Thus, the candidate inputs are ranked with respect to the mutual information
value between input and target values. In [16], [18], the target values
are chosen as the last samples for every hour of the day among all the
training samples (for every hour only one target value is chosen, namely the
value of the previous day). The choice of the last sample seems logical as it
is the closest value to the upcoming day with respect to time. However, it
may lead to serious forecast errors because the average
behaviour is ignored. Considering only the average behaviour is also
insufficient because the last sample has its own importance. Therefore, we
propose a solution that considers not only the last sample but also
the average behaviour. We modify equation 4.2 for three discrete
random variables (input $Q$, target $T$ and average behaviour $M$) as,
\[ MI(Q, T, M) = \sum_{i}\sum_{j}\sum_{k} p(Q_i, T_j, M_k)\,\log_2\frac{p(Q_i, T_j, M_k)}{p(Q_i)\,p(T_j)\,p(M_k)}, \quad \forall i, j, k \in \{0, 1\} \tag{4.3} \]
In expanded form, equation 4.3 is written as follows,
\[
\begin{aligned}
MI(Q, T, M) ={}& p(Q=0, T=0, M=0)\,\log_2\frac{p(Q=0, T=0, M=0)}{p(Q=0)\,p(T=0)\,p(M=0)} \\
&+ p(Q=0, T=0, M=1)\,\log_2\frac{p(Q=0, T=0, M=1)}{p(Q=0)\,p(T=0)\,p(M=1)} \\
&+ p(Q=0, T=1, M=0)\,\log_2\frac{p(Q=0, T=1, M=0)}{p(Q=0)\,p(T=1)\,p(M=0)} \\
&+ p(Q=0, T=1, M=1)\,\log_2\frac{p(Q=0, T=1, M=1)}{p(Q=0)\,p(T=1)\,p(M=1)} \\
&+ p(Q=1, T=0, M=0)\,\log_2\frac{p(Q=1, T=0, M=0)}{p(Q=1)\,p(T=0)\,p(M=0)} \\
&+ p(Q=1, T=0, M=1)\,\log_2\frac{p(Q=1, T=0, M=1)}{p(Q=1)\,p(T=0)\,p(M=1)} \\
&+ p(Q=1, T=1, M=0)\,\log_2\frac{p(Q=1, T=1, M=0)}{p(Q=1)\,p(T=1)\,p(M=0)} \\
&+ p(Q=1, T=1, M=1)\,\log_2\frac{p(Q=1, T=1, M=1)}{p(Q=1)\,p(T=1)\,p(M=1)}
\end{aligned} \tag{4.4}
\]
In order to determine the $MI$ value in equation 4.4, the joint and
independent probabilities need to be determined. For this purpose, an
auxiliary variable $A_v$ is introduced.
\[ A_v = 4T + 2M + Q, \quad \forall T, M, Q \in \{0, 1\} \tag{4.5} \]
It is clear from equation 4.5 that $A_v$ ranges between 0 and 7. $A_v^0$, $A_v^1$,
$A_v^2$, $A_v^3$, ..., $A_v^7$ count the number of sample data points (out of the total
$l$ data points) for which $A_v = 0$, $A_v = 1$, $A_v = 2$, $A_v = 3$, ..., $A_v = 7$,
respectively. In this way, we can now easily determine the joint and
independent probabilities as follows.
\[
\begin{aligned}
p(Q=0, T=0, M=0) &= \frac{A_v^0}{l}, & p(Q=0, T=0, M=1) &= \frac{A_v^2}{l}, \\
p(Q=0, T=1, M=0) &= \frac{A_v^4}{l}, & p(Q=0, T=1, M=1) &= \frac{A_v^6}{l}, \\
p(Q=1, T=0, M=0) &= \frac{A_v^1}{l}, & p(Q=1, T=0, M=1) &= \frac{A_v^3}{l}, \\
p(Q=1, T=1, M=0) &= \frac{A_v^5}{l}, & p(Q=1, T=1, M=1) &= \frac{A_v^7}{l}
\end{aligned} \tag{4.6}
\]
\[
\begin{aligned}
p(Q=0) &= \frac{A_v^0 + A_v^2 + A_v^4 + A_v^6}{l}, & p(Q=1) &= \frac{A_v^1 + A_v^3 + A_v^5 + A_v^7}{l}, \\
p(T=0) &= \frac{A_v^0 + A_v^1 + A_v^2 + A_v^3}{l}, & p(T=1) &= \frac{A_v^4 + A_v^5 + A_v^6 + A_v^7}{l}, \\
p(M=0) &= \frac{A_v^0 + A_v^1 + A_v^4 + A_v^5}{l}, & p(M=1) &= \frac{A_v^2 + A_v^3 + A_v^6 + A_v^7}{l}
\end{aligned} \tag{4.7}
\]
Based on equation 4.4, the mutual information among $Q$, $T$ and $M$ is calculated,
and thus redundant and irrelevant features are removed from the input samples.
This mutual information based technique is computed with reasonable
execution time and acceptable accuracy.
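The counting scheme built on the auxiliary variable lends itself to a short sketch. The code below is an illustration under stated assumptions, not the thesis implementation: the function name `mutual_information_3` is hypothetical, the three inputs are assumed to be equal-length binary sequences, and terms whose joint probability is zero are skipped (the usual convention that 0 log 0 = 0).

```python
import numpy as np

def mutual_information_3(Q, T, M):
    """Hypothetical helper: three-variable MI of Eq. 4.4 from binary samples."""
    Q, T, M = (np.asarray(v, dtype=int) for v in (Q, T, M))
    l = Q.size                                   # total number of data points
    A = 4 * T + 2 * M + Q                        # auxiliary variable, Eq. 4.5
    joint = np.bincount(A, minlength=8) / l      # joint probabilities, Eq. 4.6
    mi = 0.0
    for a in range(8):
        if joint[a] == 0:
            continue                             # 0 * log(0) is taken as 0
        t, m, q = (a >> 2) & 1, (a >> 1) & 1, a & 1
        p_q = np.mean(Q == q)                    # marginals, as in Eq. 4.7
        p_t = np.mean(T == t)
        p_m = np.mean(M == m)
        mi += joint[a] * np.log2(joint[a] / (p_q * p_t * p_m))
    return float(mi)

# Identical variables are maximally related; the second triple enumerates all
# eight (Q, T, M) combinations once, so the variables are independent.
mi_same = mutual_information_3([0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1])
mi_indep = mutual_information_3([0, 1, 0, 1, 0, 1, 0, 1],
                                [0, 0, 1, 1, 0, 0, 1, 1],
                                [0, 0, 0, 0, 1, 1, 1, 1])
print(mi_same, mi_indep)
```

Candidate inputs could then be ranked by this value, keeping those with the highest scores; the ranking threshold itself is a design choice not fixed by the equations above.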
4.3 Forecast Module
By evaluating load variations over several months, between two
consecutive days, and between consecutive hours of a day, [19] concluded that
an SG's load-time series signal exhibits strong volatility and randomness. This
result is expected because different users have different energy/power
consumption patterns/habits. Thus, DLF of an SG is
more difficult than its long-term load
forecast. Therefore, the basic requirement of the forecast module is to
forecast the load-time series of an SG by taking into consideration its
non-linear characteristics. In this regard, ANNs are widely used due to two
reasons: accurate forecast ability, and the ability to capture the non-linear
characteristics.
Due to the aforementioned reasons, we choose an ANN based implementation
in our forecast module. Initially, the forecast module receives the selected
features $SF(\cdot)$, and then constructs training samples 'TS' and validation
samples 'VS' from them as follows:
\[ TS = SF(i, j), \quad \forall i \in \{2, 3, \ldots, m\} \text{ and } \forall j \in \{1, 2, 3, \ldots, n\} \tag{4.8} \]
\[ VS = SF(1, j), \quad \forall j \in \{1, 2, 3, \ldots, n\} \tag{4.9} \]
From equations 4.8 and 4.9, it is clear that the ANN is trained on all the
historical load-time series candidates except the last one, which is used for
validation. Before explaining the training mechanism, it is essential to
describe the ANN. An ANN, inspired by the human nervous system,
is a set of Artificial Neurons (ANs) that perform tasks of interest (note: our
task of interest is STLF of micro-grids).
Figure 4.3: An artificial neuron
Usually, an AN performs a non-linear
mapping from $\mathbb{R}^I$ to $[0, 1]$ that depends on the activation function
used:
\[ f_{act}^{AN} : \mathbb{R}^{I} \to [0, 1] \tag{4.10} \]
where $I$ is the vector of input signals to the AN. Fig. 4.3 illustrates the structure
of an AN that receives $I = (I_1, I_2, \ldots, I_n)$. In order to either weaken or
strengthen the input signal, a weight $w_i$ is associated with each $I_i$. The AN
computes the net input $I$, and uses $f_{act}^{AN}$ to compute the output signal 'y'. The
strength of $y$ is also influenced by a bias value (threshold) 'b'. The net
input $I$ is computed as follows:
\[ I = \sum_{i=1}^{i_{max}} I_i w_i \tag{4.11} \]
The $f_{act}^{AN}$ receives $I$ and $b$ to determine $y$. Generally, activation functions are
monotonically increasing mappings with $f_{act}^{AN}(-\infty) = 0$ and $f_{act}^{AN}(+\infty) = 1$. Among
the typically used activation functions, we use the sigmoid:
\[ f_{act}^{AN}(I, b) = \frac{1}{1 + e^{-\alpha (I - b)}} \tag{4.12} \]
We choose the sigmoid $f_{act}^{AN}$ due to two reasons: $f_{act}^{AN} \in (0, 1)$, and the
parameter $\alpha$ controls the steepness of $f_{act}^{AN}$. In other words,
the sigmoid choice enables the AN to capture the non-linear characteristics
of the load time series. Since this work aims at day-ahead load forecasting
for micro-grids, and one day consists of 24 hours, the ANN consists of
24 forecasters (one AN for an hour), where each forecaster predicts the load of
one hour of the next day. In other words, 24 hourly load time-series are
separately modeled instead of one complex forecaster. The whole process
is repeated every day to forecast the load of the next day.
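As a small illustration of the neuron model described above, the following sketch evaluates the weighted sum and the sigmoid activation for one artificial neuron. The function name `sigmoid_neuron` and the example inputs are assumptions made for this example; the initial weight of 0.1 and the bias of 0 follow the simulation parameter table, and `alpha = 1.0` is an arbitrary illustrative steepness.

```python
import numpy as np

def sigmoid_neuron(inputs, weights, b=0.0, alpha=1.0):
    """Hypothetical helper: output of one artificial neuron."""
    net = float(np.dot(inputs, weights))               # weighted sum, Eq. 4.11
    return 1.0 / (1.0 + np.exp(-alpha * (net - b)))    # sigmoid, Eq. 4.12

# Three binary-encoded inputs, all weights initialised to 0.1, zero bias.
y = sigmoid_neuron([1, 0, 1], [0.1, 0.1, 0.1], b=0.0, alpha=1.0)
# When the net input equals the bias, the output is exactly 0.5.
y_mid = sigmoid_neuron([1, 1], [0.5, 0.5], b=1.0, alpha=4.0)
print(round(y, 4), y_mid)
```

A larger `alpha` makes the transition around the bias value sharper, which is the steepness control mentioned in the text.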
The question that now needs to be answered is how to determine $w_i$ and
$b$. The answer is straightforward: via learning. In our case, prior
knowledge of the load-time series exists. Therefore, we use supervised learning,
adjusting the $w_i$ and $b$ values until a certain termination criterion is satisfied.
The basic objective of supervised training is to adjust $w_i$ and $b$ such that
the error signal 'e(k)' between the target value '$\hat{y}(k)$' and the real output of
the neuron 'y(k)' is minimized.
\[ \text{Minimize } e(k) = y(k) - \hat{y}(k), \quad \forall k \in \{1, 2, 3, \ldots, m\} \tag{4.13} \]
We use the method of least squares to determine the parameter matrices,
which is given as follows,
\[ \text{Minimize } J(K) = \sum_{k=1}^{m} e^{T}(k)\, e(k) \tag{4.14} \]
To solve Eqn. 4.14, we use the multi-variate
auto regressive model presented in [28] because it solves the objective
function in relatively less time with reasonable accuracy as compared to the
typically used learning rules like gradient descent, Widrow-Hoff, and delta
[29]. According to [28], the parameter matrices are given as follows,
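\[ \sum_{i=1}^{n} W(i)\, R(j - i) = 0, \quad j = \{2, 3, \ldots, n\} \tag{4.15} \]
\[ \sum_{i=1}^{n} \bar{W}(i)\, R(i - j) = 0, \quad j = \{2, 3, \ldots, n\} \tag{4.16} \]
where $W(1) = I_D$ ($I_D$ is the identity matrix), $\bar{W}(1) = I_D$, and $R$ is the cross
correlation given as:
\[ R(i) = \frac{1}{n} \sum_{k=i}^{n-1-i} [x(k) - m]\,[x(k - i) - m]^{T} \tag{4.17} \]
In Eqn. 4.17, $m$ is the mean vector of the observed data,
\[ m = \frac{1}{n} \sum_{k=1}^{n} x(k) \tag{4.18} \]
Based on these equations, [28] defines the following prediction error
covariance matrices.
\[
\begin{aligned}
V_t &= \sum_{k=1}^{n} W_t(k)\, R(-k) \\
\bar{V}_t &= \sum_{k=1}^{n} \bar{W}_t(k)\, R(-k) \\
\Delta_t &= \sum_{k=1}^{n} W_t(-k)\, R(t - k + 1) \\
\bar{\Delta}_t &= \sum_{k=1}^{n} \bar{W}_t(k)\, R(-t + k - 1)
\end{aligned} \tag{4.19}
\]
The recursive equations are as follows:
\[
\left.
\begin{aligned}
W_{t+1}(k) &= W_t(k) + W_{t+1}(t + 1)\, \bar{W}_t(t - k + 1) \\
\bar{W}_{t+1}(k) &= \bar{W}_t(k) + \bar{W}_{t+1}(t + 1)\, W_t(t - k + 1)
\end{aligned}
\right\} \tag{4.20}
\]
\[
\left.
\begin{aligned}
W_{t+1}(t + 1) &= -\Delta_t\, \bar{V}_t^{-1} \\
\bar{W}_{t+1}(t + 1) &= -\bar{\Delta}_t\, V_t^{-1}
\end{aligned}
\right\} \tag{4.21}
\]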
In order to find the weights, Eqn. 4.20 and Eqn. 4.21 are solved recursively.
For further details about the weight update mechanism, readers are
referred to [28]. Fig. 4.4 is a pictorial representation of the steps involved
in the data forecast module.
Figure 4.4: ANN based data forecast module
Once the weights in Eqn. 4.20 and Eqn. 4.21 are recursively adjusted as per
the objective function in Eqn. 4.14, the output matrix is then binary decoded
and de-normalized to get the desired load-time series. The stepwise algorithm
of the proposed methodology is shown in Algorithm 1.
4.4 Simulation Results
We evaluate our proposed DLF model (m(MI+ANN)) by comparing it
with the existing MI+ANN model in [16]. In our simulations, historical
load time-series data from November (2014) to January (2015) is taken
from the publicly available PJM electricity market for two SGs in the United
States of America: DAYTOWN and EKPC [30]. November–December
(2014) data is used for training and validation, and January (2015)
data is used for testing. Simulation parameters are shown in Table
4.1, and their justification can be found in [16, 18, 28, 29]. In this work,
we consider two performance metrics: % error, and execution time
(convergence rate).
•Error performance: The difference between the actual and the
forecast signal/curve, measured in %.
•Convergence rate or execution time: The simulation time taken by
the system to execute a specific forecast model. Forecast models with a
small execution time are said to converge faster than those with a large
execution time. In this work, execution time is measured in
seconds.
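The text does not spell out the exact formula behind the % error metric. As one plausible reading, and purely as an assumption for illustration, the sketch below computes the mean absolute percentage error between an actual and a forecast curve; the function name `percent_error` and the sample values are hypothetical, and the actual loads are assumed to be non-zero.

```python
import numpy as np

def percent_error(actual, forecast):
    """Assumed metric: mean absolute percentage error between two curves."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(100.0 * np.mean(np.abs(actual - forecast) / np.abs(actual)))

# Made-up two-hour example: errors of 10% and 5% average to 7.5%.
err = percent_error([100.0, 200.0], [110.0, 190.0])
print(round(err, 2))
```

Execution time can be measured around a model run with a wall-clock timer such as `time.perf_counter()` from the Python standard library.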
Figures 4.5 and 4.8 illustrate how
well our proposed ANN based DALF model predicts the target values of
an SG. In these figures, the proposed m(MI+ANN) based forecast curve
follows the target curve more tightly than the existing MI+ANN
Algorithm 1 : Pseudo-code of day-ahead ANN load forecast
1: Pre-conditions: i indexes the hours of a day (i = 1, ..., m), and j indexes the days (j = 1, ..., n)
2: $P \leftarrow$ historical load data
3: Compute $p_{max}^{c_i}$ $\forall i \in \{1, 2, 3, \ldots, m\}$
4: Compute $P_{nrm}$
5: Compute $Med_i$ $\forall i \in \{1, 2, 3, \ldots, m\}$
6: for all ($i \in \{1, 2, 3, \ldots, m\}$) do
7:   for all ($j \in \{1, 2, 3, \ldots, n\}$) do
8:     if ($P_{nrm}^{(i,j)} < Med_i$) then
9:       $P_b^{i,j} \leftarrow 0$
10:    else
11:      $P_b^{i,j} \leftarrow 1$
12:    end if
13:  end for
14: end for
15: Compute the training samples TS and the validation samples VS (Eqns. 4.8 and 4.9)
16: Compute $y(1)$ by letting $W(1) = I$ and
17: $\bar{W}(1) = I$
18: while Max. # of iterations not reached do
19:   if $J(k + 1) \le J(k)$ then
20:     $y(k) \leftarrow y(k + 1)$
21:   else
22:     Train ANN as per Eqn. 4.20 and Eqn. 4.21
23:     Compute $y(k + 1)$ and go back to step (18)
24:   end if
25: end while
26: Perform decoding
27: Perform de-normalization
Table 4.1: Simulation parameters of ANN based forecast
Parameter                                   Value
Number of forecasters                       24
Number of hidden layers                     1
Number of neurons in the hidden unit        5
Number of iterations                        100
Momentum                                    0
Initial weights                             0.1
Historical load data                        26 days
Bias value                                  0
based forecast curve, which supports the theoretical discussion of
our proposed methodology in terms of non-linear forecast ability. Not only
the sigmoid $f_{act}^{AN}$ (refer equation 4.12) but also the multivariate auto-regressive
training algorithm enables the day-ahead ANN based forecast methodology
to capture non-linearities in historical load data.
Figure 4.5: DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs
MI+ANN forecast (actual vs forecast). Axes: Time (hr) vs Load (KW);
curves: Actual, m(MI+ANN) forecast, MI+ANN forecast.
4.4.1 Error Performance
Figure 4.6 shows the % forecast error when tests are conducted on the
DAYTOWN grid; our m(MI+ANN) forecasts with 2.9% and the existing MI+ANN
Figure 4.6: DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs
MI+ANN forecast (error performance). Bars, Error (%): m(MI+ANN) forecast
= 2.9, MI+ANN forecast = 3.84.
Figure 4.7: DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs
MI+ANN forecast (convergence rate analysis). Bars, Execution time (sec):
m(MI+ANN) forecast = 2.48, MI+ANN forecast = 6.54.
forecasts with 3.84% relative error. Similarly, Fig. 4.9 shows
the % forecast error when tests are conducted on the EKPC grid; our m(MI+ANN)
forecasts with 2.88% and the existing MI+ANN forecasts with 3.88%
relative error. This improvement in terms of relative % error
performance by our proposed DALF model is due to the following two
reasons: (i) the modified feature selection technique in our proposed DALF
Figure 4.8: EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN
forecast (actual vs forecast). Axes: Time (hr) vs Load (KW); curves: Actual,
m(MI+ANN) forecast, MI+ANN forecast.
Figure 4.9: EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN
forecast (error performance). Bars, Error (%): m(MI+ANN) forecast = 2.88,
MI+ANN forecast = 3.88.
model, and (ii) the multi-variate auto regressive training algorithm. The first
reason accounts for the removal of redundant as well as irrelevant features
from the input data in a more efficient way as compared to the existing
DALF model. By a more efficient way we mean that our proposal
considers the average sample in the feature selection process in addition
to the last sample and the target sample. Thus, the margin of outliers
Figure 4.10: EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN
forecast (convergence rate analysis). Bars, Execution time (sec):
m(MI+ANN) forecast = 2.58, MI+ANN forecast = 6.6.
which cause significant relative % error is reduced. The second reason
deals with the selection of an efficient training algorithm: our proposition
trains the ANN via the multi-variate auto regressive algorithm, whereas the
existing DALF model trains the ANN via the Levenberg-Marquardt algorithm.
4.4.2 Convergence Rate Analysis
As discussed earlier, there exists a trade-off between forecast accuracy
and execution time. However, Figs. 4.6–4.7 and 4.9–4.10 show that our
proposed DALF model results not only in relatively less % error but also in
less execution time. As mentioned earlier, our modifications in the
feature selection process and the selection of the multi-variate auto regressive
training algorithm cause the relative improvement in terms of % error. On the
other hand, the m(MI+ANN) model converges at a faster rate (less execution time)
than the existing MI+ANN model due to three reasons: (i) exclusion
of the local optimization algorithm subject to error minimization, (ii)
the modified feature selection process, and (iii) selection of the multi-variate auto
regressive training algorithm. Our proposition selects features from the
input data while considering the average sample, the last sample and the target
sample. This means that the chances of outliers in the selected features are
significantly decreased, and the local optimization algorithm used
by the existing MI+ANN forecast model is no longer needed. Our
proposed m(MI+ANN) forecast model therefore does not incur the execution time
taken by the iterative optimization algorithm. As a result, our proposed
DALF model converges at a faster rate than the existing DALF
model.
Chapter 5
mEDE and ANN Based Forecast Strategy
In STLF problems, the stochastic volatility of the target values of the time
series has a significant impact on the STLF errors. The
load of an MG shows larger volatility than that of a traditional large
power system. Thus, to characterize the difference
between the load time series of an SMG and a large power system,
the following indices are typically used [19].
1. LV in months: In this technique, the normalized mean square $M$ is used to show
LV over several months. Initially, sampled values of the load time series are
normalized into [0, 1] and then the variance is calculated. If $x_i$ is the $i$th
load sample in a total of $N$ samples, then $M$ is given as follows:
\[ M = \frac{1}{N} \sum_{i=1}^{N} (x'_i - \mu')^2 \tag{5.1} \]
where $x'_i = \frac{x_i}{\max(x)}$ and $\mu' = \frac{1}{N} \sum_{i=1}^{N} x'_i$.
2. LV between two consecutive days: LV between two consecutive days
can be shown by two sub-indices: the maximum difference '$D_{max}$' and the
average difference '$D_{avg}$'. If $X_i$ is the load of the $i$th day and $N_d$ is the
total number of days, then the formulae for $D_{max}$ and $D_{avg}$ are as
follows:
\[ D_{max} = \max_{i} \left( \sqrt{\frac{\Delta X_i\, \Delta X_i^{T}}{L}} \right) \tag{5.2} \]
\[ D_{avg} = \frac{1}{N_d - 1} \sum_{i=1}^{N_d - 1} \sqrt{\frac{\Delta X_i\, \Delta X_i^{T}}{L}} \tag{5.3} \]
where $\Delta X_i = X_{i+1} - X_i$ for $i = 1, \ldots, N_d - 1$, and
$X_i = [x_{(i-1) \times L + 1}, \ldots, x_{i \times L}]$ for $i = 1, \ldots, N_d$.
3. LV in a day: The daily load time series curve varies around an
average value '$\bar{x}_i$' between a minimum value '$x_i^{min}$' and a maximum value '$x_i^{max}$'.
For a total of $N_d$ days, the minimum daily load rate
is computed as follows:
\[ R_{min} = \frac{1}{N_d} \sum_{i=1}^{N_d} \frac{x_i^{min}}{x_i^{max}} \tag{5.4} \]
Similarly, the daily load rate can be computed as follows:
\[ R = \frac{1}{N_d} \sum_{i=1}^{N_d} \frac{\bar{x}_i}{x_i^{max}} \tag{5.5} \]
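4. LV in an hour: In order to analyze load variation in an hour, the
maximum slope '$m_{max}$' and the average slope '$m_{avg}$' are used. If $N$ samples
of the load time series are considered, then $m_{avg}$ and $m_{max}$ are given as
follows:
\[ m_{avg} = \frac{1}{N - 1} \sum_{i=1}^{N-1} \left\| \frac{x'_{i+1} - x'_i}{t_{i+1} - t_i} \right\| \tag{5.6} \]
\[ m_{max} = \max_{1 \le i \le N-1} \left\| \frac{x'_{i+1} - x'_i}{t_{i+1} - t_i} \right\| \tag{5.7} \]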
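The four groups of LV indices can be computed together from a single load series. The sketch below is illustrative only and rests on several assumptions that are not stated in the text: the function name `load_variation_indices` is hypothetical, the series holds a whole number of days (at least two) with `L` samples per day, loads are strictly positive, sampling is uniform with step `dt`, Eq. 5.1 is read as a variance (squared deviations), the slopes in Eqs. 5.6 and 5.7 are read as first differences of the normalized series, and the day vectors in Eqs. 5.2 and 5.3 use the raw (non-normalized) load.

```python
import numpy as np

def load_variation_indices(x, L, dt=1.0):
    """Hypothetical helper: LV indices of Eqs. 5.1 to 5.7 for a load series x."""
    x = np.asarray(x, dtype=float)
    xn = x / x.max()                                   # normalize into [0, 1]
    M = np.mean((xn - xn.mean()) ** 2)                 # Eq. 5.1 (variance)

    days = x.reshape(-1, L)                            # row i is the day vector X_i
    dX = np.diff(days, axis=0)                         # Delta X_i = X_{i+1} - X_i
    d = np.sqrt(np.sum(dX ** 2, axis=1) / L)
    D_max, D_avg = d.max(), d.mean()                   # Eqs. 5.2 and 5.3

    R_min = np.mean(days.min(axis=1) / days.max(axis=1))    # Eq. 5.4
    R = np.mean(days.mean(axis=1) / days.max(axis=1))       # Eq. 5.5

    slopes = np.abs(np.diff(xn)) / dt                  # |x'_{i+1} - x'_i| / dt
    m_avg, m_max = slopes.mean(), slopes.max()         # Eqs. 5.6 and 5.7
    return {"M": float(M), "D_max": float(D_max), "D_avg": float(D_avg),
            "R_min": float(R_min), "R": float(R),
            "m_avg": float(m_avg), "m_max": float(m_max)}

# Toy series: 2 days with 2 samples per day (made-up values).
lv = load_variation_indices([1.0, 2.0, 3.0, 4.0], L=2)
print(lv)
```

Larger values of `M`, `D_max`, `D_avg`, `m_avg` and `m_max`, and smaller values of `R_min` and `R`, indicate a more volatile load series.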
5.1 Motivation
Given the highly volatile load of SGs, any DALF proposition must
efficiently deal with the non-linear input/output relationship. In this regard,
ANNs are widely used as forecasters because these networks can predict
the non-linearities of SGs' load with low convergence time. However,
the achieved prediction accuracy is sometimes not up to the mark. This has led
to the adoption of optimization techniques that can significantly enhance
the prediction accuracy of ANNs. However, the cost paid to achieve high
accuracy is increased convergence time. Therefore, we focus on the
development of a DALF strategy that is based on a compromise
between prediction accuracy and convergence time.
5.2 The mEDE and ANN Based Forecast
Our proposed DALF strategy consists of three modules: a pre-processing
module, a forecast module, and an optimization module (refer Fig. 5.1). The
pre-processing module makes the input load time series compatible with
the forecast module, and removes redundant and irrelevant features from
the input data. Based on the sigmoid activation function and the multi-variate
auto regressive model, the forecast module (which consists of ANNs) performs
DALF of SGs. Finally, the optimization module minimizes prediction errors
to improve the accuracy of the overall DALF strategy. A detailed description of
each module is as follows.
5.2.1 Pre-Processing Module
Since the ANN based forecaster uses only binary data to predict the load of
the next day, the input data must be pre-processed to make it compatible with the forecast
module. In addition, redundant and irrelevant samples must be removed
from the input data set due to two reasons: (i) redundant features do not
provide more information and thus unnecessarily increase the execution
time during the training process (discussed later in the forecast
module), and (ii) irrelevant features do not provide useful information and
act as outliers. A detailed description of the pre-processing module is as
follows.
Figure 5.1: mEDE and ANN: Block diagram
As mentioned earlier, the pre-processing module receives the input load
time series (historical). Suppose the input load time series is given by the
following matrix,
\[
P = \begin{bmatrix}
p(h_1, d_1) & p(h_2, d_1) & p(h_3, d_1) & \cdots & p(h_m, d_1) \\
p(h_1, d_2) & p(h_2, d_2) & p(h_3, d_2) & \cdots & p(h_m, d_2) \\
p(h_1, d_3) & p(h_2, d_3) & p(h_3, d_3) & \cdots & p(h_m, d_3) \\
p(h_1, d_4) & p(h_2, d_4) & p(h_3, d_4) & \cdots & p(h_m, d_4) \\
p(h_1, d_5) & p(h_2, d_5) & p(h_3, d_5) & \cdots & p(h_m, d_5) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p(h_1, d_n) & p(h_2, d_n) & p(h_3, d_n) & \cdots & p(h_m, d_n)
\end{bmatrix} \tag{5.8}
\]
where $d_n$ is the $n$th day, $h_m$ is the $m$th hour of the day, and $p(h_m, d_n)$ is
the power consumption value of the $n$th day at the $m$th hour. As
per the standard time horizon of one complete day, $m = 24$. The value of $n$
depends entirely on the choice of the designer; increasing $n$ means finer tuning
during the training of the forecast module because more historical lagged
samples of the input power matrix are available. However, this fine tuning is
achieved at the cost of more execution time. Thus, there is a trade-off
between convergence rate and accuracy.
Before feeding the forecast module with input matrix P, algorithm 2 is
executed by the pre-processing module to ensure P’s compatibility with
the forecast module.
Algorithm 2 : Pseudo-code of the pre-processing module
1: Pre-conditions: i indexes the hours of a day (i = 1, ..., m), and j indexes the days (j = 1, ..., n)
2: $P \leftarrow$ historical load data
3: Compute $p_{max}^{c_i}$ $\forall i \in \{1, 2, 3, \ldots, m\}$
4: Compute $P_{nrm}$
5: Compute $Med_i$ $\forall i \in \{1, 2, 3, \ldots, m\}$
6: for all ($i \in \{1, 2, 3, \ldots, m\}$) do
7:   for all ($j \in \{1, 2, 3, \ldots, n\}$) do
8:     if ($P_{nrm}^{(i,j)} < Med_i$) then
9:       $P_b^{i,j} \leftarrow 0$
10:    else
11:      $P_b^{i,j} \leftarrow 1$
12:    end if
13:  end for
14: end for
Firstly, a local maximum value '$p_{max}^{c_i}$' is calculated for each column
of the historical input load matrix $P$:
\[ p_{max}^{c_i} = \max\big(p(h_i, d_1), p(h_i, d_2), p(h_i, d_3), \ldots, p(h_i, d_n)\big), \quad \forall i \in \{1, 2, 3, \ldots, m\} \tag{5.9} \]
Secondly, each column of $P$ is locally normalized by its
respective local maximum; results are saved in $P_{nrm}$ (each entry of $P_{nrm}$ lies in $[0, 1]$).
Thirdly, a local median value '$Med_i$' ($\forall i \in \{1, 2, 3, \ldots, m\}$) is computed
for each column of the $P_{nrm}$ matrix. Fourthly, each element of the $P_{nrm}$
matrix is compared with its respective local median value '$Med_i$', based on
which encoding is performed as follows:
\[ P_b(h_i, d_j) = \begin{cases} 1 & \text{if } P_{nrm}(h_i, d_j) \ge Med_i \\ 0 & \text{otherwise} \end{cases} \tag{5.10} \]
In this way, a resultant matrix '$P_b$' consisting of only binary values (0's
and 1's) is obtained. This $P_b$ matrix contains not only irrelevant features
but also redundant features. In order to remove these two types
of features from the matrix $P_b$, we use the mutual information technique that
is proposed in [16] and later used in [18] as well. According to this
technique, the mutual information between input $X$ and target $T$ is given
as follows,
\[ MI(X, T) = \sum_{i}\sum_{j} p(X_i, T_j)\,\log_2\frac{p(X_i, T_j)}{p(X_i)\,p(T_j)} \tag{5.11} \]
In (5.11), $MI(X, T) = 0$ means that the input and target variables are
independent, a high value of $MI(X, T)$ means that the two variables are
strongly related, and a low value of $MI(X, T)$ means that the two variables
are loosely related. The expanded form of (5.11) is as follows,
\[
\begin{aligned}
MI(X, T) ={}& p(X=0, T=0)\,\log_2\frac{p(X=0, T=0)}{p(X=0)\,p(T=0)} \\
&+ p(X=0, T=1)\,\log_2\frac{p(X=0, T=1)}{p(X=0)\,p(T=1)} \\
&+ p(X=1, T=0)\,\log_2\frac{p(X=1, T=0)}{p(X=1)\,p(T=0)} \\
&+ p(X=1, T=1)\,\log_2\frac{p(X=1, T=1)}{p(X=1)\,p(T=1)}
\end{aligned} \tag{5.12}
\]
In order to determine the joint and independent probabilities in (5.12), an
auxiliary variable $V_b$ is introduced.
\[ V_b = 2T + X, \quad \forall T, X \in \{0, 1\} \tag{5.13} \]
It is clear from (5.13) that $V_b$ ranges between 0 and 3. $V_b^0$, $V_b^1$, $V_b^2$, and
$V_b^3$ count the number of sample data points (out of the total $l$ data points) for
which $V_b = 0$, $V_b = 1$, $V_b = 2$, and $V_b = 3$, respectively. In this way, we can
now easily determine the independent and joint probabilities as follows.
\[
\begin{aligned}
p(X=0) &= \frac{V_b^0 + V_b^2}{l}, & p(X=1) &= \frac{V_b^1 + V_b^3}{l}, \\
p(T=0) &= \frac{V_b^0 + V_b^1}{l}, & p(T=1) &= \frac{V_b^2 + V_b^3}{l}
\end{aligned} \tag{5.14}
\]
\[
\begin{aligned}
p(X=0, T=0) &= \frac{V_b^0}{l}, & p(X=0, T=1) &= \frac{V_b^2}{l}, \\
p(X=1, T=0) &= \frac{V_b^1}{l}, & p(X=1, T=1) &= \frac{V_b^3}{l}
\end{aligned} \tag{5.15}
\]
Based on (5.12), the mutual information between $X$ and $T$ is calculated, and
thus redundant and irrelevant features are removed from the input samples.
According to [16, 18], this mutual information based technique is computed
with reasonable execution time and acceptable accuracy.
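The two-variable case follows the same counting idea with four bins instead of eight. The sketch below is a minimal illustration under assumptions: the function name `mutual_information_2`, the candidate names and the toy binary sequences are all hypothetical, and terms with zero joint probability are skipped. It also shows how candidate inputs might be ranked by their mutual information with the target.

```python
import numpy as np

def mutual_information_2(X, T):
    """Hypothetical helper: MI of Eq. 5.12 from two binary sequences."""
    X, T = np.asarray(X, dtype=int), np.asarray(T, dtype=int)
    l = X.size
    V = np.bincount(2 * T + X, minlength=4)            # counts of V_b, Eq. 5.13
    p_x = [(V[0] + V[2]) / l, (V[1] + V[3]) / l]       # Eq. 5.14
    p_t = [(V[0] + V[1]) / l, (V[2] + V[3]) / l]
    mi = 0.0
    for t in (0, 1):
        for x in (0, 1):
            p_xt = V[2 * t + x] / l                    # Eq. 5.15
            if p_xt > 0:                               # 0 * log(0) is taken as 0
                mi += p_xt * np.log2(p_xt / (p_x[x] * p_t[t]))
    return float(mi)

# Rank two made-up candidate inputs against a made-up target.
target = [0, 1, 0, 1]
candidates = {"copy_of_target": [0, 1, 0, 1], "unrelated": [0, 0, 1, 1]}
scores = {name: mutual_information_2(c, target) for name, c in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

A candidate that duplicates the target scores one bit, while a statistically unrelated candidate scores zero and would be a natural choice for removal.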
5.2.2 Forecast Module
In the literature, many research works have investigated LV in SMGs.
In particular, the authors in [19] comprehensively examined LV based on four
indices: LV in months (refer Eqn. 5.1), LV between two consecutive days
(refer Eqn. 5.2, Eqn. 5.3), LV in a day (refer Eqn. 5.4, Eqn. 5.5) and LV
in an hour (refer Eqn. 5.6, Eqn. 5.7). From these works, it is concluded
that any forecast strategy must be able to perform STLF of SMGs while
ensuring non-linear prediction capability. Therefore, we choose ANNs
because these have the ability to capture the highly volatile characteristics of
the load time series with reasonable accuracy.
For STLF, two strategies are used; direct forecasting and iterative fore-
casting [16]. However, it is discussed in [31] that the first strategy may
introduce significant round off errors and the second one introduces large
forecast errors. In order to overcome these imperfections, [16] has intro-
duced the idea of cascaded strategy. Thus, our proposed forecast module
implements the cascaded strategy. Our forecast module consists of an ANN;
24 consecutive cascaded forecasters such that each one of the 24 forecasters
has a single output to forecast the load of an hour of the upcoming day.
It is worth mentioning that the 24 hourly time series forecasters are sep-
arately modeled instead of a single complete/complex one. These 24 one
hour ahead forecasters allow improvement in terms of accuracy [16]. The
cascaded ANN forecast structure is a combination of direct and iterative
structures such that load of each hour of the next day is directly predicted
and each forecaster yields exactly one output.
In the forecast module, each forecaster is an AN that uses the sigmoid
function as its activation function. We have chosen the sigmoid activation
function because it enables the AN to capture the highly volatile
(non-linear) characteristic of the SMG's load time series. In order to update the
weights during the training process of the AN, like [16, 18], we use the
multi-variate auto regressive algorithm because it can train the ANN faster than
the Levenberg-Marquardt algorithm and the gradient descent back propagation
algorithm [29]. According to Kolmogorov's theorem, if the ANN is provided
with a proper number of ANs then it has the ability to solve a problem by
adopting one hidden layer. Thus, we have considered one hidden layer in
the ANN structure of all 24 ANs.
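As an illustration of a sigmoid-activated AN, the following sketch computes the output of a single neuron. The weights, bias and inputs are made-up values, and the function names are hypothetical; the thesis does not prescribe this implementation.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs followed by the sigmoid activation."""
    return sigmoid(np.dot(weights, inputs) + bias)


# Illustrative values only: the weighted sum is 0.5*1.0 - 0.25*2.0 = 0.
w = np.array([0.5, -0.25])
x = np.array([1.0, 2.0])
y = neuron_output(w, 0.0, x)
```

The sigmoid is non-linear and bounded, which is the property the text relies on for modelling a volatile load series.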
In short, (due to the aforementioned reasons) our proposed forecast module
is basically an ANN that consists of 24 ANs. Each AN is activated by the
sigmoid function and is trained by the multi-variate auto regressive algorithm.
Initially, the forecast module receives the binary encoded matrix $P_b$, which
is the output of the pre-processing module. From this matrix, the forecast
module constructs training and validation samples as follows:
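$$S_T = P_b(i, j), \quad \forall i \in \{2, 3, \ldots, m\} \text{ and } \forall j \in \{1, 2, 3, \ldots, n\} \tag{5.16}$$
$$S_V = P_b(1, j), \quad \forall j \in \{1, 2, 3, \ldots, n\} \tag{5.17}$$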
Eqns. 5.16 and 5.17 illustrate that the ANN is trained by all the candidate
inputs (historical load time series) except the last one. The last sample of
the historical load time series is used for validation. In fact, the
validation sample is a part of the training load samples that is removed from
them during the training process, so the validation set remains unseen by the
ANN. Moreover, the validation error can be used as a measure of the ANN's error
for the 24 hour forecast horizon. In order to make the validation error a
true representative of the forecast error, the validation sample needs to be as
close to the forecast horizon as possible. We take the day before the forecast
day as the validation sample because it includes not only the short run
trend but also the daily periodicity characteristics of the load signal [32].
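The split in Eqns. 5.16 and 5.17 can be sketched as follows. The sketch assumes that the binary encoded matrix is available as a NumPy array whose first row is the validation day; the function name `split_samples` and the example matrix are hypothetical.

```python
import numpy as np


def split_samples(P_b):
    """Split the pre-processed matrix into training and validation samples."""
    P_b = np.asarray(P_b)
    S_V = P_b[0, :]   # row 1 in the 1-based indexing of Eqn. 5.17
    S_T = P_b[1:, :]  # rows 2..m in the 1-based indexing of Eqn. 5.16
    return S_T, S_V


# Illustrative 4 x 3 binary matrix (m = 4 samples, n = 3 columns).
P_b_example = np.array([[1, 0, 1],
                        [0, 0, 1],
                        [1, 1, 0],
                        [0, 1, 1]])
S_T, S_V = split_samples(P_b_example)
```

Because the validation row is excluded from `S_T`, it stays unseen during training.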
Thus, each of the 24 ANs is trained as per the multi-variate auto regressive
algorithm by the training samples and is validated by the last/unseen
validation sample. The Mean Absolute Percentage Error (MAPE) for each of
the 24 validation samples is considered as the validation error in this
research work.
$$MAPE_i = \frac{1}{m} \sum_{j=1}^{m} \frac{|p_{act}(h_i, d_j) - p_{for}(h_i, d_j)|}{p_{act}(h_i, d_j)} \tag{5.18}$$
where $p_{act}(h_i, d_j)$ is the actual load value of the $i$th hour of the $j$th day,
$p_{for}(h_i, d_j)$ is the forecast load value of the $i$th hour of the $j$th day, and $m$
is the number of days under consideration.
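A minimal sketch of the MAPE computation for one hour over several days is given below. The function name `mape_for_hour` and the load values are illustrative assumptions. Following the equation as written, the result is a fraction rather than a percentage, since no factor of 100 is applied.

```python
import numpy as np


def mape_for_hour(actual, forecast):
    """Mean absolute percentage error over m days for one hour of the day."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Absolute error of each day, relative to the actual load of that day.
    return float(np.mean(np.abs(actual - forecast) / actual))


# Illustrative loads of the same hour on two days (m = 2).
error = mape_for_hour(actual=[100.0, 200.0], forecast=[110.0, 180.0])
```

Both days have a relative error of 0.1 here, so their mean is 0.1 as well.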
The objective of supervised training is to adaptively adjust the weight
values (fed to ANs) such that the error signal $MAPE_i$ between the target
value and the real output of the neuron is minimized. For the sake of clarity, we
represent $MAPE_i$ as $MAPE(i)$:
$$\text{Minimize } MAPE(i), \quad \forall i \in \{1, 2, 3, \ldots, m\} \tag{5.19}$$
In this research work, the method of least squares is used, thus we can
write,
$$\text{Minimize } J(I) = \sum_{k=1}^{m} MAPE^{T}(i)\, MAPE(i), \quad \forall i \in \{1, 2, 3, \ldots, m\} \tag{5.20}$$
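One way to read the least squares objective is as the sum of squared validation errors. The sketch below assumes that the MAPE values are collected in a vector and that the objective is the inner product of that vector with itself; this interpretation, the function name `least_squares_objective` and the sample values are assumptions, not taken from [28].

```python
import numpy as np


def least_squares_objective(mape_values):
    """Sum of squared errors, i.e. the inner product e^T e."""
    e = np.asarray(mape_values, dtype=float)
    return float(e @ e)


# Illustrative validation errors of two forecasters.
J = least_squares_objective([0.1, 0.2])
```

A smaller value of `J` means that the validation errors are jointly smaller.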
In order to achieve the objective function in Eqn. 5.20, we use the
multi-variate auto regressive model [28]. We choose this model for two reasons:
(i) it solves the objective function in relatively less time, and
(ii) its accuracy is reasonable. It is worth mentioning here
that both these reasons follow from a comparison of the multi-variate
auto regressive model with typically used learning models like gradient
descent, delta, and Widrow-Hoff [29]. Thus, the parameter matrices are [28],
$$\sum_{i=1}^{n} W(i) R(j-i) = 0, \quad j = \{2, 3, \ldots, n\} \tag{5.21}$$
$$\sum_{i=1}^{n} \bar{W}(i) R(i-j) = 0, \quad j = \{2, 3, \ldots, n\} \tag{5.22}$$
where $W(1) = I_D$ ($I_D$ is the identity matrix), $\bar{W}(1) = I_D$, and $R$ is the
cross-correlation given as:
$$R(i) = \frac{1}{n} \sum_{k=i}^{n-1-i} [x(k) - x_m][x(k-i) - x_m]^T \tag{5.23}$$
In Eqn. 5.23, $x$ is the vector of observed data, and $x_m$ is the mean of the
observed data.
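The lagged cross-correlation of Eqn. 5.23 can be estimated as in the following sketch. It assumes that the observations are stored row-wise in an array of shape (n, d) and that the sum runs over every index k for which both x(k) and x(k-i) exist; this choice of summation limits and the function name `lagged_covariance` are assumptions.

```python
import numpy as np


def lagged_covariance(x, lag):
    """Estimate R(lag) = (1/n) * sum_k [x(k) - x_m][x(k - lag) - x_m]^T."""
    x = np.asarray(x, dtype=float)       # shape (n, d): n samples, d channels
    n = x.shape[0]
    centered = x - x.mean(axis=0)        # subtract the mean x_m
    # Rows lag..n-1 play the role of x(k); rows 0..n-1-lag play x(k - lag).
    return centered[lag:].T @ centered[:n - lag] / n


# Illustrative data: three samples of a two-channel series.
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
R0 = lagged_covariance(data, 0)
R1 = lagged_covariance(data, 1)
```

`R0` is the ordinary (biased) covariance matrix of the data, and each `R(lag)` is a d x d matrix.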
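Based on these equations, [28] defines the prediction error covariance matrices
as follows:
$$\begin{aligned}
\Delta_t &= \sum_{k=1}^{n} W_t(-k) R(t-k+1) \\
\bar{\Delta}_t &= \sum_{k=1}^{n} \bar{W}_t(k) R(-t+k-1) \\
V_t &= \sum_{k=1}^{n} W_t(k) R(-k) \\
\bar{V}_t &= \sum_{k=1}^{n} \bar{W}_t(k) R(-k)
\end{aligned} \tag{5.24}$$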
The recursive equations are given as follows:
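$$\begin{aligned}
W_{t+1}(k) &= W_t(k) + W_{t+1}(t+1)\, \bar{W}_t(t-k+1) \\
\bar{W}_{t+1}(k) &= \bar{W}_t(k) + \bar{W}_{t+1}(t+1)\, W_t(t-k+1)
\end{aligned} \tag{5.25}$$
$$\begin{aligned}
W_{t+1}(t+1) &= -\Delta_t \bar{V}_t^{-1} \\
\bar{W}_{t+1}(t+1) &= -\bar{\Delta}_t V_t^{-1}
\end{aligned} \tag{5.26}$$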
In order to find the weights W, the recursive equations are solved. Further
details of the weight update process can be found in [28].
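As a simpler stand-in for the recursive solution, the coefficients of a first-order multi-variate auto regressive model can be estimated directly by ordinary least squares. The sketch below is not the recursive procedure of [28]; it only illustrates what fitting such a model means. The model order, the function name `fit_var1_least_squares` and the test matrix are assumptions.

```python
import numpy as np


def fit_var1_least_squares(x):
    """Fit x(t) ~ A x(t-1) by ordinary least squares and return A."""
    x = np.asarray(x, dtype=float)       # shape (T, d)
    past, future = x[:-1], x[1:]
    # Solve past @ A^T = future in the least squares sense.
    coeffs_T, *_ = np.linalg.lstsq(past, future, rcond=None)
    return coeffs_T.T


# Noise-free illustrative series generated by a known coefficient matrix.
A_true = np.array([[0.5, 0.1], [-0.2, 0.3]])
series = [np.array([1.0, 0.5])]
for _ in range(9):
    series.append(A_true @ series[-1])
A_hat = fit_var1_least_squares(np.vstack(series))
```

On noise-free data the estimate reproduces the generating matrix; the recursive equations above serve the same purpose for higher model orders without a direct solve.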
Once the weights in Eqn. 5.25 and Eqn. 5.26 are adaptively adjusted
in a recursive manner, the forecast module returns the error signal to the
optimization module. The stepwise algorithm of the proposed forecast module
is shown in Algorithm 3.
Algorithm 3: Pseudo-code of the forecast module
1: Pre-conditions: $MAPE(i)$ is the output of the AN, and $i \in \{1, 2, \ldots, 24\}$
2: Receive the $P_b$ matrix from the pre-processing module
3: Compute $S_T$ and $S_V$
4: Compute $MAPE(i)$ by letting $W(1) = I_D$ and
5: $\bar{W}(1) = I_D$
6: Compute $J(i)$
7: while Max. # of iterations not reached do
8:   if $J(i+1) \leq J(i)$ then
9:     $MAPE(i) \leftarrow MAPE(i+1)$
10:   else
11:     Train ANN as per Eqn. 5.25 and Eqn. 5.26
12:     Compute $MAPE(i)$ and go back to step (6)
13:   end if
14: end while
15: Return $J(I)$ to the optimization module
5.2.3 Optimization Module
Based on the nature of the overall forecast strategy, the basic objective of
the optimization module is to minimize the forecast error. For this purpose,
various choices are available, such as linear programming, non-linear
programming, quadratic programming, convex optimization, heuristic optimization,
etc. However, the first one is not applicable here because the problem is
highly non-linear. The non-linear problem can be converted into a linear
problem; however, the overall process would become very complex. The
second one is applicable here and gives accurate results; however, its
execution time is very high. Similarly, the third and fourth ones suffer from
slow convergence. It is worth mentioning here that optimization does
not imply exactly reaching the optimum set of solutions; rather, near
optimal solution(s) are obtained. To sum up, heuristic optimization techniques
are preferred in these situations because they provide near optimal
solution(s) with a relatively faster rate of convergence.
Differential evolution is one of the heuristic optimization techniques
proposed in [33] and its enhanced version is used for forecast