All content in this area was uploaded by Nadeem Javaid on Jul 01, 2016

Content may be subject to copyright.

A Scalable Short-Term Load Forecasting Model for

Micro-grid Communication Networks

By

Ashfaq Ahmad

CIIT/FA13-REE-044/ISB

MS Thesis

In

Electrical Engineering

COMSATS Institute of Information Technology

Islamabad – Pakistan

Spring, 2015

A Scalable Short-Term Load Forecasting Model for

Micro-grid Communication Networks

A Thesis Presented to

COMSATS Institute of Information Technology, Islamabad

In partial fulfillment

of the requirement for the degree of

MS (Electrical Engineering)

By

Ashfaq Ahmad

CIIT/FA13-REE-044/ISB

Spring, 2015

A Scalable Short-Term Load Forecasting Model for

Micro-grid Communication Networks

A Graduate Thesis submitted to Department of Electrical Engineering as partial

fulfillment of the requirement for the award of Degree of M.S (Electrical Engineering).

Name

Registration Number

Ashfaq Ahmad

CIIT/FA13-REE-044/ISB

Supervisor

Dr. Nadeem Javaid,

Assistant Professor,

Department of Computer Science,

COMSATS Institute of Information Technology (CIIT),

Islamabad Campus.

Final Approval

This thesis titled

A Scalable Short-Term Load Forecasting Model for

Micro-grid Communication Networks

By

Ashfaq Ahmad

CIIT/FA13-REE-044/ISB

has been approved

For the COMSATS Institute of Information Technology, Islamabad

External Examiner: ___________________________________

Dr. Hasan Mahmood

Associate Professor, Department of Electronics,

QAU, Islamabad

Supervisor: ________________________________________________

Dr. Nadeem Javaid

Assistant Professor, Department of Computer Science,

CIIT, Islamabad

HoD:_________________________________________________________

Dr. Shahid A. Khan

Professor, Department of Electrical Engineering,

CIIT, Islamabad

Declaration

I Mr. Ashfaq Ahmad, CIIT/FA13-REE-044/ISB, hereby declare that I have

produced the work presented in this thesis, during the scheduled period of study. I

also declare that I have not taken any material from any source except referred to

wherever due that amount of plagiarism is within acceptable range. If a violation

of HEC rules on research has occurred in this thesis, I shall be liable to punishable

action under the plagiarism rules of the HEC.

Signature of the student:

Date: ____________________________

____________________________

Ashfaq Ahmad

CIIT/FA13-REE-044/ISB

Certificate

It is certified that Ashfaq Ahmad CIIT/FA13-REE-044/ISB has carried out all the

work related to this thesis under my supervision at the Department of Electrical

Engineering, COMSATS Institute of Information Technology, Islamabad and the

work fulfills the requirements for the award of the MS degree.

Date: ____________________________

Supervisor:

____________________________

Dr. Nadeem Javaid,

Assistant Professor

Head of Department:

____________________________

Dr. Shahid A. Khan

Professor, Department of Electrical Engineering.

DEDICATION

This thesis is dedicated to my teachers, my family and my

friends.

ACKNOWLEDGMENT

I am heartily grateful to my supervisor, Dr. Nadeem Javaid, who not only guided me but also

motivated me via insightful criticism from the beginning to the final level that enabled me to

complete this thesis.

I would like to acknowledge my family, my friends, and the cooperative CAST lab attendants.

They all kept me motivated and energetic, and this work would not have been possible without them.

Finally, I offer my regards and blessings to everyone who supported me in any respect during the completion of my thesis.

Ashfaq Ahmad

CIIT/FA13-REE-044/ISB

ABSTRACT

A Scalable Short-Term Load Forecasting Model for

Micro-grid Communication Networks

The underlying forecast model is one of the most significant factors directly affecting the economics of energy trade, because prosumers and utilities alike aim to maximize their benefits. Most existing forecast models trade off forecast accuracy against convergence rate. This thesis presents a short-term load forecasting model for micro-grid communication networks. Unlike existing short-term forecast models, the proposed model addresses both accuracy and convergence rate. To improve accuracy, we devise modifications to two popular techniques: mutual information based feature selection, and enhanced differential evolution algorithm based error minimization. The convergence rate of the overall forecast strategy is enhanced by modifying the heuristic algorithm. In addition, we modify the feature selection technique to make the proposed model scalable. Simulation results show that the accuracy of the proposed scalable short-term load forecasting model is 99.5%.

TABLE OF CONTENTS

1 Introduction
1.1 The Smart Grid
1.2 Towards Localization: The Smart Micro-Grid
1.2.1 Load Forecast
1.2.2 Our Contribution
2 Related Work
2.1 Stochastic Distribution Based Strategies
2.2 ANN based Strategies
2.3 Markov Chain Based Strategies
3 Forecast Strategies: Towards Development
3.1 Challenges
3.2 Influencing Factors
3.3 Basic Units of a Generic Forecast Model
3.3.1 Feature Selector
3.3.2 Forecaster
3.3.3 Optimizer
4 ANN Based Forecast Strategy
4.1 Data Preparation Module
4.2 Feature Selection Module
4.3 Forecast Module
4.4 Simulation Results
4.4.1 Error Performance
4.4.2 Convergence Rate Analysis
5 mEDE and ANN Based Forecast Strategy
5.1 Motivation
5.2 The mEDE and ANN Based Forecast
5.2.1 Pre-Processing Module
5.2.2 Forecast Module
5.2.3 Optimization Module
5.2.4 Simulation Results
5.2.5 Error Performance
5.2.6 Convergence Rate Analysis
6 Modified Feature Selection, ANN and Modified EDE based Forecast Strategy
6.1 Motivation
6.2 The Proposed S-STLF Model
6.2.1 Modified MI based Feature Selection
6.2.2 ANN based STLF
6.2.3 mEDE Based Forecast Error Minimization
6.3 Simulation Results
6.3.1 Error Performance
6.3.2 Convergence Rate Analysis
6.3.3 Scalability Analysis
7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
8 References

LIST OF FIGURES

1.1 An SMG
4.1 ANN based forecast: Block diagram
4.2 Data preparation module for ANN based forecast
4.3 An artificial neuron
4.4 ANN based data forecast module
4.5 DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (actual vs forecast)
4.6 DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (error performance)
4.7 DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (convergence rate analysis)
4.8 EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (actual vs forecast)
4.9 EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (error performance)
4.10 EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (convergence rate analysis)
5.1 mEDE and ANN: Block diagram
5.2 DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (actual vs forecast)
5.3 DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (error performance)
5.4 DAYTOWN (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (convergence rate analysis)
5.5 EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (actual vs forecast)
5.6 EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (error performance)
5.7 EKPC (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (convergence rate analysis)
5.8 FE (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (actual vs forecast)
5.9 FE (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (error performance)
5.10 FE (27th January, 2015): MI+ANN+mEDE forecast vs Bi-level forecast and MI+ANN forecast (convergence rate analysis)
6.1 The proposed S-STLF model
6.2 PJMW: Actual vs forecast
6.3 EKPC: Actual vs forecast
6.4 DAYTOWN: Actual vs forecast
6.5 FE: Actual vs forecast
6.6 PJMW: Error performance
6.7 EKPC: Error performance
6.8 DAYTOWN: Error performance
6.9 FE: Error performance
6.10 PJMW: Convergence rate analysis
6.11 EKPC: Convergence rate analysis
6.12 DAYTOWN: Convergence rate analysis
6.13 FE: Convergence rate analysis
6.14 Impact of load on error performance

LIST OF TABLES

4.1 Simulation parameters of ANN based forecast
5.1 Simulation parameters of mEDE and ANN based forecast
6.1 Simulation parameters of mFS, ANN, and mEDE based forecast
6.2 Performance evaluation of the selected forecast strategies

Chapter 1

Introduction

1.1 The Smart Grid

In most parts of the world, especially in developed countries, transmission and distribution systems have aged. The existing (traditional) grid needs renovation not only to bridge the ever-increasing gap between demand and supply but also to meet other essential challenges such as grid reliability, grid robustness, and minimization of customers' electricity cost [1]. In this regard, the recent integration of the latest information and communication technologies with the existing grid has gained enormous attention. One of the main benefits of this integration is customer engagement, which plays a key role in the economics of energy trade. In other words, the old concept of uni-directional energy flow is replaced by the new concept of bi-directional energy flow: the traditional consumer becomes a smart prosumer, and the traditional grid becomes a smart one (the smart grid) [2]. The European Technology Platform (European Commission, 2006) defines the smart grid as follows: “a smart grid is an electricity network that can intelligently integrate the actions of all users connected to it – generators, consumers and those that do both – in order to efficiently deliver sustainable, economic and secure electricity supplies”. The smart grid has revolutionized the performance of all sections of the conventional grid. In a conventional grid, energy can only flow from the generation side to the consumer, whereas in a smart grid the consumer can also sell surplus electricity generated through domestic sources such as solar and wind [3]. The introduction of smart grid infrastructure in the distribution section, where retailers and consumers are the important players, has a manifold impact. Before and after installing advanced technology, utilities seek the highest possible return on investment. Customers, on the other hand, seek the lowest possible electricity bill. Thus, both utilities and consumers act greedily. The traditional grid could not serve both parties at the same time due to its lack of flexibility. In other words, the absence of two-way communication or bi-directional energy flow between utility and consumer makes the traditional grid inadequate to meet modern grid challenges such as reliability and robustness [4]. The smart grid is likely to integrate new communication technologies, advanced metering, distributed systems, distributed storage, security, and safety to achieve considerable robustness and reliability [5, 6, 7]. From this discussion, it is clear that, unlike the traditional grid where the utility was the dominant player, the smart grid also involves customers in energy trade through bi-directional energy flow.

In smart grids, user engagement via two-way communication leads to peak load reduction, as optimal decisions are taken by the energy management unit. The resulting grid, with its advanced metering infrastructure, will affect [8]:

• how the load is determined and met,
• how customers engage with the utility, and
• how the integration of the latest technologies shapes the energy trade between customer and utility.

Thus, there are two main players in the smart grid: the user and the utility (every user is a player if more than one user is considered). Bi-directional communication and energy flow benefit not only the utilities but also the consumers. More specifically, consumers are no longer only consumers; they are prosumers who can access the electricity market as both sellers and buyers. At the same time, smart utilities can manage their resources efficiently. Consequently, the ever-increasing gap between demand and supply can be bridged [6, 9]. Initially, the utility forecasts the future load/price signal based on the past activities of the users. Users then adjust their power usage schedules according to the utility's price/load signal without compromising their comfort levels. However, with ever-growing expectations, accurate forecast strategies and advanced scheduling techniques are of extreme significance in making the overall operation as optimal as possible. In this regard, many demand side scheduling techniques have been proposed [10, 11, 12, 13]. However, significant challenges remain upstream of scheduling, namely in the stochastic schemes used to predict the future load. Thus, with the growing adoption of smart grids, advanced techniques and tools are required to optimize the overall operation. Moreover, this requires that the daily operations of a smart grid utility (such as strategic decisions to bridge the gap between demand and supply, and fuel resource planning) are properly carried out. To sum up, all these decisions are highly influenced by the underlying load forecast strategy [14].

1.2 Towards Localization: The Smart Micro-Grid

Taking into consideration the development of demand response in smart grids, the resulting capacity on the user side (in residential areas) would be small enough to be referred to as a micro-grid (see Fig. 1.1 [15]). It is foreseen that, over the next decade, Smart Micro-Grids (SMGs) will grow significantly due to reduced installation cost, higher reliability, increased support from prosumers and utilities, etc. During disturbances, an SMG can work in islanded mode, i.e., it can disconnect itself from the main distribution system. Thus, the SMG can maintain a high service level. Moreover, intentional islanding (in the absence of disturbances) has the potential to provide high local reliability [16]. Another benefit of SMGs is the use of distributed control to prevent a single point of failure.

Figure 1.1: An SMG

1.2.1 Load Forecast

Load forecasting is more difficult in SMGs than in macro smart grids, because the load curve of an SMG exhibits more volatile and non-linear fluctuations than that of a macro smart grid. Load forecasting is one of the fundamental and essential tasks needed for proper operation of the micro-grid. Accurate load forecasting also leads to better management of resources (renewable and conventional), which in turn directly affects the economics of energy trade. In SMGs, load forecasting is of two types: short term and long term. Short Term Load Forecast (STLF) is the more difficult of the two, because the history load curves show lower similarity (higher randomness due to more load fluctuations) than in long term load forecasting [16].

As mentioned earlier, the load of a micro-grid fluctuates more than that of a traditional large power system. In micro-grids, production can be adapted to the load more dynamically than in macro grids. The load curve of an SMG does not always have the same shape, because the random power consumption schedules of the prosumers lead to more variability than in the macro grid. All these operations are significantly affected by the underlying forecast strategy used to predict the future load(s). Due to higher volatility in the history load curve, STLF is more challenging than long term load forecasting [17]. Many STLF strategies are presented in the literature. The authors in [18] use an Artificial Neural Network (ANN) and a mutual information based technique to forecast the load/price of the next day. In their work, Artificial Neurons (ANs) are activated by the sigmoid function because of its ability to capture non-linearities in the load time series. The major disadvantage of this strategy is the high relative error between the actual and forecast curves. To minimize the relative error of [18], [16] utilizes an Enhanced version of the Differential Evolution (EDE) algorithm. This integration minimizes the forecast error efficiently; however, accuracy can be improved further, and the execution time of this strategy is relatively high. Another hybrid STLF strategy is presented in [19]; however, it is very complex to implement and its execution time is also very high.
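To make the sigmoid-activated artificial neuron mentioned above concrete, the following is a minimal sketch, not the implementation of [18]. The function names, the three lagged load inputs, and the weights are hypothetical values chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps any real z into the range 0..1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Hypothetical normalized lagged loads and hand-picked weights (assumptions).
lagged_loads = [0.62, 0.58, 0.71]
weights = [0.4, -0.2, 0.9]
y = neuron_output(lagged_loads, weights, bias=-0.3)
```

Because the sigmoid is smooth and saturating, a network of such neurons can represent non-linear relations between past and future load, which is the property the cited works rely on.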

1.2.2 Our Contribution

In this thesis, we present a Scalable-STLF (S-STLF) model for Micro-grid Communication Networks (MCNs). We use a modular strategy in which the output of each module is fed into the succeeding module. Overall, our proposition consists of three modules: feature selector, forecaster, and optimizer. First, the feature selector receives the historical time series of load data as input and selects the candidate inputs carrying the most relevant information, using our improved version of the mutual information based technique. Thus, the feature selector mitigates the curse of dimensionality. Next, the forecaster, which consists of an ANN, receives the selected candidate inputs from the feature selector. Based on this data, ANs (activated by the sigmoid function) are trained to predict the load of the upcoming day. At this stage, the relative error between the actual and forecast curves is high. The optimizer, which consists of our modified version of the EDE (mEDE) algorithm, therefore minimizes the forecast error. The proposed S-STLF model for MCNs is validated via simulations, which show that it performs better than the selected existing strategies in terms of accuracy, convergence rate, and scalability.
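The relative error between the actual and forecast curves can be illustrated with a short sketch. It assumes the mean absolute percentage error (MAPE) as the error measure and reads accuracy as 100 minus MAPE; the error definition is not stated in this chapter, so both are assumptions, and the load values are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error between two equally long load curves."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("curves must be non-empty and of equal length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical hourly loads (assumed values, arbitrary units).
actual_load = [100.0, 200.0, 400.0]
forecast_load = [110.0, 190.0, 400.0]

error_percent = mape(actual_load, forecast_load)
accuracy_percent = 100.0 - error_percent  # assumed reading of "accuracy"
```

Under this reading, an optimizer that lowers the error measure raises the reported accuracy by the same amount.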

The rest of the thesis is organized as follows. Chapter 2 reviews relevant STLF contributions from the research community, chapter 3 deals with the basic architecture of a generic forecast model, chapter 4 describes the ANN based forecast strategy, chapter 5 integrates EDE with the ANN based forecast strategy of chapter 4, chapter 6 integrates the feature selection module with the ANN+EDE based forecast strategy of chapter 5, and chapter 7 concludes the thesis and provides future research directions. Finally, references are provided at the end of the thesis.

Chapter 2

Related Work

Because accurate load forecasting has a direct impact on the economics of energy trade, we discuss some of the previous research on load forecasting in SMGs as follows.

2.1 Stochastic Distribution Based Strategies

[25] presents a probabilistic approach for generating energy consumption profiles of household appliances. The approach considers a wide range of appliances with a high degree of flexibility, and configures household appliances differently for holidays and working days. The main assumptions of this work are: (i) Gaussian distributed ON-OFF cycles of different appliances, (ii) Gaussian distributed energy consumption patterns of appliances, and (iii) a Gaussian distributed number of appliances. However, the absence of a closed form solution makes the Gaussian based forecast strategy very complex. Moreover, these assumptions cannot always hold; thus, the accuracy of the predicted load time series is highly questionable.

An improvement over [25] is presented in [26]. This work uses a regularizer to overcome the computational complexity of the Gaussian distribution based STLF strategy in [25]. Moreover, the proposed STLF strategy captures the heteroscedasticity of load more efficiently than [25]. Simulations are conducted to show that the proposed STLF strategy performs better than the existing one. To sum up, [26] overcomes the complexity of [25] to some extent; however, the basic assumptions (Gaussian distributed on-off cycles of household appliances, number of appliances, and power consumption patterns of appliances) still underlie the proposal and thus make its accuracy highly questionable.

2.2 ANN based Strategies

In [18], the authors present a hybrid technique for short term price forecasting in SMGs. This hybrid technique comprises two steps: feature selection and prediction. In the first step, a mutual information based technique is implemented to remove redundancy and irrelevancy from the input load time series. In the second step, an ANN along with an evolutionary algorithm is used to predict the future load time curve. In this process, the authors use the sigmoid activation function for ANs. In addition, the authors fine-tune some adjustable parameters of the first and second steps via an iterative search procedure, which is part of this work. In terms of forecast accuracy, this technique is efficient as it combines various techniques; however, the cost paid is implementation complexity.

In [16], the authors study the characteristics of the load time series of a micro-grid and compare them with those of a traditional power system. More importantly, the authors propose a bi-level (upper and lower) short term load prediction strategy for micro-grids. The lower level is a forecaster that utilizes a neural network and an evolutionary algorithm. The upper level optimizes the performance of the lower level by using the differential evolution algorithm. The proposed bi-level prediction strategy is evaluated on real-time data from a Canadian university. MATLAB simulations demonstrate that the proposed strategy performs STLF in SMGs with reasonable accuracy. However, its implementation complexity is very high.
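The differential evolution algorithm used at the upper level can be sketched in its classic form (often written DE/rand/1/bin). This is a generic illustration and neither the enhanced variant of [16] nor the mEDE proposed in this thesis. The toy objective stands in for a forecast error as a function of two tuning parameters; it, the parameter bounds, and all names are assumptions.

```python
import random


def de_minimize(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                generations=300, seed=0):
    """Classic DE/rand/1/bin minimizer over box-constrained real vectors."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][j]  # crossover: inherit from the parent
                trial.append(v)
            # Selection: keep the trial only if it is no worse.
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


def toy_error(params):
    """Hypothetical error surface with its minimum at (0.3, 0.7)."""
    return (params[0] - 0.3) ** 2 + (params[1] - 0.7) ** 2


best_params, best_error = de_minimize(toy_error, [(0.0, 1.0), (0.0, 1.0)])
```

In a bi-level setting, the objective would instead train and evaluate the lower-level forecaster for each candidate parameter vector, which is why the execution time of such strategies grows quickly.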

Another ANN based STLF strategy is presented in [23]. This hybrid methodology completes the STLF task in four steps: data selection, transformation, forecast, and error correction. In step one, some well known data selection techniques are used to mitigate the curse of dimensionality in the characteristics of the input load time series. Step two applies wavelet transformation to the selected characteristics of the input load time series so that redundancy and irrelevancy filters can be implemented. Step three uses an ANN and a training algorithm for STLF in SMGs. More importantly, the authors choose the sigmoid activation function for ANs due to its ability to capture non-linearity. Finally, error correcting functions are used in step four to improve the accuracy of the proposed STLF methodology. In simulations, this methodology is tested against practical household load, which demonstrates that it is very accurate, however, at the cost of complexity.
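The wavelet transformation in step two can be illustrated with a single level of the Haar transform, the simplest wavelet. The wavelet family used in [23] is not stated here, so Haar is an assumption made purely for illustration, and the load samples are hypothetical.

```python
import math


def haar_step(signal):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original samples from one level of Haar coefficients."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal


# Hypothetical load samples (assumed values, arbitrary units).
load = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0]
approx, detail = haar_step(load)
rebuilt = haar_inverse(approx, detail)
```

The approximation coefficients carry the slow trend of the load, while the detail coefficients carry the fast fluctuations, so filters can be applied to each band separately before the signal is rebuilt.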

Similarly, another novel strategy is presented in [24] to predict the occurrence of price spikes in SMGs. The proposed strategy utilizes wavelet transformation for input feature selection. An ANN, trained on the selected inputs, is then used to predict future price spikes. In [27], another STLF strategy is presented for SMGs, which is completed in five steps: (i) database handling of historical load data, (ii) detection of missing data and its interpolation, (iii) principal component analysis to detect outliers, (iv) ANN based forecast, and (v) display of the forecast data on different devices. However, the accuracy of [27] is not satisfactory.

2.3 Markov Chain Based Strategies

To make the STLF strategy robust, the authors in [22] propose a Markov chain based strategy. This stochastic strategy aims to tackle the load time series fluctuations associated with the energy consumption of users in a heterogeneous environment. Markov chains are used to predict the future on-off cycles of household appliances in a robust way due to their memoryless nature (future values depend only on the current values; past values are not considered). This memoryless nature not only makes the STLF strategy robust but also makes it relatively less complex than the aforementioned techniques. However, the memoryless nature of Markov chains also has a drawback: lower accuracy.
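The memoryless prediction of on-off cycles can be sketched with a two-state, first-order Markov chain. This is an illustrative sketch and not the multi-scale method of [22]; the appliance history and the function names are hypothetical.

```python
from collections import Counter


def fit_transitions(states):
    """Estimate first-order transition probabilities from an observed sequence."""
    pair_counts = Counter(zip(states, states[1:]))
    from_counts = Counter(states[:-1])
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}


def predict_next(current, transitions):
    """Most probable next state given only the current state (memoryless)."""
    options = {b: p for (a, b), p in transitions.items() if a == current}
    if not options:
        raise ValueError("state %r was never observed" % (current,))
    return max(options, key=options.get)


# Hypothetical on-off history of one appliance (assumed data).
history = ["OFF", "OFF", "OFF", "ON", "ON", "OFF",
           "OFF", "OFF", "ON", "ON", "ON", "OFF"]
transitions = fit_transitions(history)
next_state = predict_next(history[-1], transitions)
```

Only the current state enters `predict_next`, which keeps the model simple and cheap but also discards longer patterns in the history, in line with the lower accuracy noted above.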

Chapter 3

Forecast Strategies: Towards Development

The daily supply and demand planning of a utility is strongly influenced by price/load forecast strategies. Accurate load forecasting forms the basis for spot price management in the system. As utilities show growing interest in the implementation of smart grids, forecast strategies become more significant due to their expanded application horizon: storage maintenance, demand side management, integration of renewable resources, load scheduling, etc. From the customers' point of view, accurate forecast strategies mean a proper understanding of the relationship between price and demand, which enables them to properly schedule their usage pattern.

3.1 Challenges

Due to growing awareness of customer participation in smart grids, utilities are compelled to develop load/price forecast strategies. However, in doing so utilities face many challenges, such as [20]:

• High and varied range of customers' consumption data.
• Highly volatile nature of the load/price signal.
• Highly non-linear characteristics of the load/price signal.
• Hybrid customer groups, using either traditional meters or smart meters.
• The curse of dimensionality in the identifying factors, which may lead to over-fitting.
• Complexity of the identifying parameters.
• Lack of data availability for different scenarios.

3.2 Inﬂuencing Factors

In addition to the aforementioned challenges, several factors influence load forecasting in the smart grid [20]:

• Weather conditions: specifically when renewable energy sources are integrated.
• Time of the day: electricity consumption varies significantly across the time slots of the day.
• Random disturbances: for example, sudden cloudy conditions strongly disturb solar generation.
• Electricity price market: the customers' consumption pattern varies with the market price, and vice versa.
• Storage cells: storage at both utility and customer locations would greatly affect the forecast signal.

3.3 Basic Units of a Generic Forecast Model

In view of the aforementioned challenges, the research community has developed many forecast strategies. From these works, we conclude that a forecast strategy comprises three basic units: feature selector, forecaster, and optimizer. In this section, these basic units are discussed in detail.

3.3.1 Feature Selector

The basic assumption of the feature selector is that the input data contain not only irrelevant features but also redundant ones. Irrelevant features are those that do not provide useful information, and redundant features are duplicates that provide no additional information. In this unit, a subset of the most relevant features is selected for developing the forecast strategy. Incorporating a feature selector mainly provides three benefits: reduced over-fitting, decreased training time, and improved model interpretation.

Many forecast strategies in the literature utilize a feature selector. For example, [19] uses four indices of load variation and empirical mode decomposition for feature selection. The indices of load variation observe data variation over months, between two adjacent days, between hours of the same day, and within an hour. The empirical mode decomposition algorithm then gradually decomposes the load/price signal into linear components along with some residue. The decomposed components are then ranked based on different trends and scales. Similarly, [21] applies a forward selection algorithm to select a reduced number of scenarios generated via Monte Carlo simulation. [22] utilizes a multi-scale setting to fine-tune information during state aggregation. [9] uses a probability distribution function and a roulette wheel mechanism to generate several scenarios. Among the generated scenarios, a subset is selected through a scenario reduction process in which Weibull and Gaussian probability distribution functions are utilized. In another work [23], input data is initially classified into schedulable and non-schedulable loads. Then, wavelet transformation is conducted to decompose the input data into detailed components (high frequency) and approximate components (low frequency). Finally, [18, 16, 24] use an entropy based mutual information technique for feature selection.

3.3.2 Forecaster

The basic purpose of this unit is to forecast the future load/price signal based on learning algorithms. Since the load/price signal is highly non-linear, the forecaster needs to capture these non-linearities with reasonable accuracy and execution time. Supervised learning is used here because historical load data is available. An important advantage of the forecaster is the information it provides: based on it, experts take qualitative as well as quantitative decisions that benefit the energy trade between the utility and its customers. Literature review reveals that many strategies have been proposed for this unit. For example, [19] uses an extreme learning machine with kernel in an artificial neural network environment, and [22] uses Markov chains to predict the next state. [16, 18, 24] use artificial neural network based forecasters. Among the typically used activation functions, these authors prefer the sigmoid function for neuron activation due to its ability to handle the non-linearities associated with the price/load signal. For training of the network, [24] uses Mallat's algorithm, [23] uses a discrete wavelet transformation based technique, and [16] uses a multi-variate auto regressive model. Some other well known training algorithms are Newton's method, gradient descent based back propagation, and the Levenberg-Marquardt learning algorithm. Among these training algorithms, the Levenberg-Marquardt algorithm is typically used because it can train the artificial neural network 10–100 times faster than the classical Newton's method and gradient descent based back propagation. The remaining training algorithms are still unexplored in this area and are a potential topic for future research. It is worth mentioning that some forecasters, like [16, 18], also use an evolutionary algorithm based local optimizer. The key benefit of the local optimizer is its ability to avoid being trapped in local minima/maxima that may arise during the training process of the artificial neural network.

3.3.3 Optimizer

Generally, an optimization problem is written as:

$$\max f_0(x) \quad \text{or} \quad \min f_0(x) \tag{3.1}$$

subject to:

$$f_i(x) \le c_i, \quad \forall i \in \mathbb{Z}^+ \tag{3.2}$$

where the optimization variable is $x = (x_1, \ldots, x_n)$, the objective function is $f_0 : \mathbb{R}^n \to \mathbb{R}$, the constraints are $f_i : \mathbb{R}^n \to \mathbb{R}$, and the upper bounds are $c_i$. $x^*$ is an optimal solution of the optimization problem if and only if it has the smallest or greatest objective value among all the possible solution vectors that satisfy the constraints; for a minimization problem, we have $f_0(z) \ge f_0(x^*)$ for any $z$ with $f_1(z) \le c_1, \ldots, f_n(z) \le c_n$.

Within a forecast strategy, the forecaster returns the day-ahead price/load signal with some error. The forecast strategy can be further enhanced in terms of accuracy if error minimization is considered as the objective function of the optimizer. However, this process consumes additional execution time. For applications where accuracy is more important than execution time, the optimizer is of extreme significance. Here, heuristic optimization techniques (like differential evolution, particle swarm optimization, etc.) are preferred over other optimization techniques (like linear programming, non-linear programming, etc.) due to their faster convergence rate. In this regard, only a few techniques have been proposed that take the optimizer into consideration. For example, [16] uses an enhanced differential evolution algorithm and [19] uses particle swarm optimization in the optimizer. To our knowledge, this unit is still largely unexplored and can be considered a potential research area (ant algorithms, bee algorithms, genetic algorithms, etc. need to be explored).

Chapter 4

ANN Based Forecast Strategy

Given the complexity of day-ahead load forecasting for SGs, any proposed prediction strategy should be capable of modelling the non-linear input/output relationship as efficiently as possible. ANNs are widely used as forecasters that can predict the non-linear behaviour of an SG's load time series with acceptable accuracy. However, prior to ANN based forecasting, the input load time series must be made compatible. Therefore, our proposed day-ahead load forecasting model (for SGs) consists of three modules: data preparation module, feature selection module, and forecast module (refer Figure 4.1). The first module performs pre-processing to make the input data compatible with the feature selection module and the forecast module. The second module removes irrelevant and redundant features from the input data. The third module consists of an ANN to forecast the day-ahead load of the SG. Details are as follows.

Figure 4.1: ANN based forecast: Block diagram

4.1 Data Preparation Module

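As mentioned earlier, the data preparation module receives the input load time series (historical). Suppose the input load time series is given by the following matrix:

$$
P = \begin{bmatrix}
p_{h_1}^{d_1} & p_{h_2}^{d_1} & p_{h_3}^{d_1} & \cdots & p_{h_m}^{d_1} \\
p_{h_1}^{d_2} & p_{h_2}^{d_2} & p_{h_3}^{d_2} & \cdots & p_{h_m}^{d_2} \\
p_{h_1}^{d_3} & p_{h_2}^{d_3} & p_{h_3}^{d_3} & \cdots & p_{h_m}^{d_3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p_{h_1}^{d_n} & p_{h_2}^{d_n} & p_{h_3}^{d_n} & \cdots & p_{h_m}^{d_n}
\end{bmatrix} \tag{4.1}
$$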

where $h_m$ is the $m$th hour, $d_n$ is the $n$th day, and $p_{h_m}^{d_n}$ is the historical power consumption value at the $m$th hour of the $n$th day. As there are 24 hours in a day, $m = 24$. The value of $n$ depends on the designer's choice: a greater value of $n$ leads to finer tuning during the training process of the forecast module because more lagged samples of input data are available. However, it also leads to more execution time.

Prior to feeding the ANN with the input matrix $P$, the following step-wise operations are performed by the data preparation module (refer Fig. 4.2):

1. Local maximum: Initially, a local maximum value is calculated for each column of the $P$ matrix: $p_{max}^{c_i} = \max\{p_{h_i}^{d_1}, p_{h_i}^{d_2}, p_{h_i}^{d_3}, \ldots, p_{h_i}^{d_n}\}$, $\forall i \in \{1, 2, 3, \ldots, m\}$.

2. Local normalization: In this step, each column of the matrix $P$ is normalized by its respective local maximum such that the resultant matrix is represented by $P_{nrm}$. Now, each entry of $P_{nrm}$ ranges between 0 and 1.

3. Local median: For each column of the $P_{nrm}$ matrix, a local median value $Med_i$ is calculated ($\forall i \in \{1, 2, 3, \ldots, m\}$).

4. Binary encoding: Each entry of the $P_{nrm}$ matrix is compared with its respective $Med_i$ value. If the entry is less than its respective local median value, then it is encoded with a binary 0; otherwise, it is encoded with a binary 1. In this way, a resultant matrix containing only binary values (0's and 1's), $P_b$, is obtained.

At this stage, the Pbmatrix is compatible with the forecast module and is

thus fed into it.
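The four steps above can be sketched in Python as follows. This is a minimal illustration only, assuming the load matrix is stored as a NumPy array with days as rows and hours as columns and that all load values are positive; the function name `prepare_binary_matrix` and the toy data are hypothetical and do not come from the original work.

```python
import numpy as np

def prepare_binary_matrix(P):
    """Normalise each hour column by its maximum and binary-encode it against its median."""
    P = np.asarray(P, dtype=float)
    col_max = P.max(axis=0)            # step 1: local maximum of each column
    P_nrm = P / col_max                # step 2: local normalisation (assumes positive loads)
    med = np.median(P_nrm, axis=0)     # step 3: local median of each column
    P_b = (P_nrm >= med).astype(int)   # step 4: 0 if below the median, otherwise 1
    return P_b, col_max, med

# Hypothetical toy data: 4 days (rows) x 3 hours (columns)
P = np.array([[10.0, 20.0, 30.0],
              [20.0, 10.0, 60.0],
              [40.0, 40.0, 90.0],
              [30.0, 30.0, 120.0]])
P_b, col_max, med = prepare_binary_matrix(P)
```

The column maxima are returned alongside the encoded matrix because the later de-normalization step needs them.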

4.2 Feature Selection Module

Once the data is binary encoded, both redundant and irrelevant samples need to be removed from the lagged input data samples. By removing redundant features, the execution time during the training process

Figure 4.2: Data preparation module for ANN based forecast

is minimized. On the other hand, removal of irrelevant features leads to

improvement in forecast accuracy because the outliers are removed.

In order to remove the irrelevant and redundant features from the binary encoded input data matrix $P_b$, an entropy based mutual information technique is used in [16, 18], which defines the mutual information between input $Q$ and target $T$ by the following formula:

$$MI(Q, T) = \sum_{i}\sum_{j} p(Q_i, T_j)\,\log_2\frac{p(Q_i, T_j)}{p(Q_i)\,p(T_j)}, \quad \forall i, j \in \{0, 1\} \tag{4.2}$$

In equation 4.2, $MI(Q, T) = 0$ means that $Q$ and $T$ are independent, a high value of $MI(Q, T)$ means that $Q$ and $T$ are strongly related, and a low value of $MI(Q, T)$ means that $Q$ and $T$ are loosely related.

Thus, the candidate inputs are ranked with respect to the mutual information value between input and target values. In [16, 18], the target values are chosen as the last samples for every hour of the day among all the training samples (for every hour, only one target value is chosen, namely the value of the previous day). Choosing the last sample seems logical as it is the closest value to the upcoming day with respect to time; however, it may lead to serious forecast errors because the average behaviour is ignored. Considering only the average behaviour is also insufficient because the last sample has its own importance. We therefore propose a solution that considers not only the last sample but also the average behaviour. Thus, we modify equation 4.2 for three discrete random variables as:

$$MI(Q, T, M) = \sum_{i}\sum_{j}\sum_{k} p(Q_i, T_j, M_k)\,\log_2\frac{p(Q_i, T_j, M_k)}{p(Q_i)\,p(T_j)\,p(M_k)}, \quad \forall i, j, k \in \{0, 1\} \tag{4.3}$$

In expanded form, equation 4.3 is written as follows:

$$
\begin{aligned}
MI(Q, T, M) ={}& p(Q=0, T=0, M=0)\,\log_2\frac{p(Q=0, T=0, M=0)}{p(Q=0)\,p(T=0)\,p(M=0)} \\
&+ p(Q=0, T=0, M=1)\,\log_2\frac{p(Q=0, T=0, M=1)}{p(Q=0)\,p(T=0)\,p(M=1)} \\
&+ p(Q=0, T=1, M=0)\,\log_2\frac{p(Q=0, T=1, M=0)}{p(Q=0)\,p(T=1)\,p(M=0)} \\
&+ p(Q=0, T=1, M=1)\,\log_2\frac{p(Q=0, T=1, M=1)}{p(Q=0)\,p(T=1)\,p(M=1)} \\
&+ p(Q=1, T=0, M=0)\,\log_2\frac{p(Q=1, T=0, M=0)}{p(Q=1)\,p(T=0)\,p(M=0)} \\
&+ p(Q=1, T=0, M=1)\,\log_2\frac{p(Q=1, T=0, M=1)}{p(Q=1)\,p(T=0)\,p(M=1)} \\
&+ p(Q=1, T=1, M=0)\,\log_2\frac{p(Q=1, T=1, M=0)}{p(Q=1)\,p(T=1)\,p(M=0)} \\
&+ p(Q=1, T=1, M=1)\,\log_2\frac{p(Q=1, T=1, M=1)}{p(Q=1)\,p(T=1)\,p(M=1)}
\end{aligned} \tag{4.4}
$$

In order to determine the $MI$ value in equation 4.4, the joint and independent probabilities need to be determined. For this purpose, an auxiliary variable $A_v$ is introduced:

$$A_v = 4T + 2M + Q, \quad \forall\, T, M, Q \in \{0, 1\} \tag{4.5}$$

It is clear from equation 4.5 that $A_v$ ranges between 0 and 7. $A_v^0, A_v^1, A_v^2, A_v^3, \ldots, A_v^7$ count the number of sample data points (out of total $l$ data points) for which $A_v = 0$, $A_v = 1$, $A_v = 2$, $A_v = 3$, ..., $A_v = 7$, respectively. In this way, we can now easily determine the joint and independent probabilities as follows.

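$$
\begin{aligned}
p(Q=0, T=0, M=0) &= \frac{A_v^0}{l}, & p(Q=0, T=0, M=1) &= \frac{A_v^2}{l}, \\
p(Q=0, T=1, M=0) &= \frac{A_v^4}{l}, & p(Q=0, T=1, M=1) &= \frac{A_v^6}{l}, \\
p(Q=1, T=0, M=0) &= \frac{A_v^1}{l}, & p(Q=1, T=0, M=1) &= \frac{A_v^3}{l}, \\
p(Q=1, T=1, M=0) &= \frac{A_v^5}{l}, & p(Q=1, T=1, M=1) &= \frac{A_v^7}{l}
\end{aligned} \tag{4.6}
$$

$$
\begin{aligned}
p(Q=0) &= \frac{A_v^0 + A_v^2 + A_v^4 + A_v^6}{l}, & p(Q=1) &= \frac{A_v^1 + A_v^3 + A_v^5 + A_v^7}{l}, \\
p(T=0) &= \frac{A_v^0 + A_v^1 + A_v^2 + A_v^3}{l}, & p(T=1) &= \frac{A_v^4 + A_v^5 + A_v^6 + A_v^7}{l}, \\
p(M=0) &= \frac{A_v^0 + A_v^1 + A_v^4 + A_v^5}{l}, & p(M=1) &= \frac{A_v^2 + A_v^3 + A_v^6 + A_v^7}{l}
\end{aligned} \tag{4.7}
$$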

Based on equation 4.4, the mutual information among $Q$, $T$, and $M$ is calculated, and thus redundant and irrelevant features are removed from the input samples. This mutual information based technique is computed with reasonable execution time and acceptable accuracy.
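The counting scheme of equations 4.5 to 4.7 can be illustrated with the short Python sketch below. It is only a sketch under stated assumptions: the three inputs are equal-length binary sequences, `M` is taken to be the binary-encoded average behaviour discussed above, joint outcomes that never occur are skipped (the usual convention that a zero-probability term contributes nothing), and the function name `mi_three` is hypothetical.

```python
import numpy as np

def mi_three(Q, T, M):
    """Three-variable mutual information of Eqn. 4.4 computed from the A_v counts."""
    Q, T, M = (np.asarray(v, dtype=int) for v in (Q, T, M))
    l = Q.size
    A = 4 * T + 2 * M + Q                      # auxiliary variable of Eqn. 4.5
    counts = np.bincount(A, minlength=8)       # A_v^0 ... A_v^7
    joint = counts / l                         # joint probabilities, Eqn. 4.6
    # independent (marginal) probabilities, Eqn. 4.7
    p_q = np.array([counts[[0, 2, 4, 6]].sum(), counts[[1, 3, 5, 7]].sum()]) / l
    p_t = np.array([counts[[0, 1, 2, 3]].sum(), counts[[4, 5, 6, 7]].sum()]) / l
    p_m = np.array([counts[[0, 1, 4, 5]].sum(), counts[[2, 3, 6, 7]].sum()]) / l
    mi = 0.0
    for a in range(8):
        if counts[a] == 0:
            continue                           # assumed convention: skip empty outcomes
        t, m, q = a >> 2, (a >> 1) & 1, a & 1  # decode A_v back into (T, M, Q)
        mi += joint[a] * np.log2(joint[a] / (p_q[q] * p_t[t] * p_m[m]))
    return mi
```

When the eight outcomes are equally frequent the three variables are independent and the function returns zero, which matches the interpretation given for equation 4.2.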

4.3 Forecast Module

By evaluating load variations over several months or between two con-

secutive days or between consecutive hours over a day, [19] concluded that

SG's load-time series signal exhibits strong volatility and randomness. This result is expected because different users have different energy/power consumption patterns/habits. Thus, in terms of DLF, realization of an SG is more difficult than its realization in terms of long-term load forecast. Therefore, the basic requirement of the forecast module is to forecast the load-time series of an SG by taking into consideration its non-linear characteristics. In this regard, ANNs are widely used for two reasons: accurate forecast ability and the ability to capture non-linear characteristics.

Due to the aforementioned reasons, we choose an ANN based implementation in our forecast module. Initially, the forecast module receives the selected features $SF(\cdot)$, and then constructs training samples $TS$ and validation samples $VS$ from them as follows:

$$TS = SF(i, j), \quad \forall i \in \{2, 3, \ldots, m\} \text{ and } \forall j \in \{1, 2, 3, \ldots, n\} \tag{4.8}$$

$$VS = SF(1, j), \quad \forall j \in \{1, 2, 3, \ldots, n\} \tag{4.9}$$
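Equations 4.8 and 4.9 amount to a simple split along the first index of the selected-feature array. A minimal Python sketch, assuming $SF$ is held as a two-dimensional NumPy array whose first axis corresponds to the index $i$ (the name `split_training_validation` is hypothetical):

```python
import numpy as np

def split_training_validation(SF):
    """Split selected features into training (i = 2..m) and validation (i = 1) samples."""
    SF = np.asarray(SF)
    TS = SF[1:, :]   # Eqn. 4.8: all candidates except the first index
    VS = SF[0, :]    # Eqn. 4.9: the candidate at i = 1
    return TS, VS

# Hypothetical toy array with m = 4 and n = 3
TS, VS = split_training_validation(np.arange(12).reshape(4, 3))
```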

From equations 4.8 and 4.9, it is clear that the ANN is trained by all the historical load-time series candidates except the last one, which is used for validation. This discussion leads us towards the explanation of the training mechanism. However, prior to that explanation, it is essential to describe the ANN.

Figure 4.3: An artificial neuron

An ANN, inspired by the human nervous system, is a set of Artificial Neurons (ANs) that perform tasks of interest (note: our task of interest is STLF of micro-grids). Usually, an AN performs a non-linear mapping from $\mathbb{R}^I$ to $[0, 1]$ that depends on the activation function used:

$$f_{act}^{AN} : \mathbb{R}^I \to [0, 1] \tag{4.10}$$

where $I$ is the vector of input signals to the AN. Fig. 4.3 illustrates the structure of an AN that receives $I = (I_1, I_2, \ldots, I_n)$. In order to either deplete or strengthen the input signal, a weight $w_i$ is associated with each $I_i$. The AN computes the net input $I$, and uses $f_{act}^{AN}$ to compute the output signal $y$. However, the strength of $y$ is also influenced by a bias value (threshold) $b$. The net input $I$ is computed as follows:

$$I = \sum_{i=1}^{i_{max}} I_i w_i \tag{4.11}$$

The $f_{act}^{AN}$ receives $I$ and $b$ to determine $y$. Generally, activation functions are mappings that monotonically increase ($f_{act}^{AN}(-\infty) = 0$ and $f_{act}^{AN}(+\infty) = 1$). Among the typically used activation functions, we use the sigmoid $f_{act}^{AN}$:

$$f_{act}^{AN}(I, b) = \frac{1}{1 + e^{-\alpha(I - b)}} \tag{4.12}$$

We choose the sigmoid $f_{act}^{AN}$ for two reasons: $f_{act}^{AN} \in (0, 1)$, and the parameter $\alpha$ controls the steepness of $f_{act}^{AN}$. In other words, the choice of the sigmoid $f_{act}^{AN}$ enables the AN to capture the non-linear characteristics of the load time series. Since this work aims at day-ahead load forecasting for micro-grids, and one day consists of 24 hours, the ANN consists of 24 forecasters (one AN per hour), where each forecaster predicts the load of one hour of the next day. In other words, 24 hourly load time series are modelled separately instead of using one complex forecaster. The whole process is repeated every day to forecast the load of the next day.
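A single AN as described by equations 4.11 and 4.12 can be sketched in a few lines of Python. This is an illustration only; the function name `neuron_output`, the default `alpha = 1.0`, and the example inputs are assumptions made here, not values taken from the original work.

```python
import math

def neuron_output(inputs, weights, bias, alpha=1.0):
    """Output y of one artificial neuron with a sigmoid activation."""
    net = sum(x * w for x, w in zip(inputs, weights))       # net input, Eqn. 4.11
    return 1.0 / (1.0 + math.exp(-alpha * (net - bias)))    # sigmoid, Eqn. 4.12

# Hypothetical example: three binary inputs, all weights 0.1, bias 0.2
y = neuron_output([1, 0, 1], [0.1, 0.1, 0.1], bias=0.2)
```

In this example the net input equals the bias, so the neuron sits exactly at the midpoint of the sigmoid; increasing `alpha` makes the transition around that point steeper.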

The question that now needs to be answered is how to determine $w_i$ and $b$. The answer is straightforward: via learning. In our case, prior knowledge of the load-time series exists. Therefore, we use supervised learning, adjusting the $w_i$ and $b$ values until a certain termination criterion is satisfied. The basic objective of supervised training is to adjust $w_i$ and $b$ such that the error signal $e(k)$ between the target value $\hat{y}(k)$ and the real output of the neuron $y(k)$ is minimized:

$$\text{Minimize } e(k) = y(k) - \hat{y}(k), \quad \forall k \in \{1, 2, 3, \ldots, m\} \tag{4.13}$$

We use the method of least squares to determine the parameter matrices, which is given as follows:

$$\text{Minimize } J(K) = \sum_{k=1}^{m} e^{T}(k)\,e(k), \quad \forall k \in \{1, 2, 3, \ldots, m\} \tag{4.14}$$

To solve Eqn. 4.14, we use the multi-variate auto regressive model presented in [28] because it solves the objective function in relatively less time with reasonable accuracy as compared to the typically used learning rules like gradient descent, Widrow-Hoff, and delta [29]. According to [28], the parameter matrices are given as follows:

$$\sum_{i=1}^{n} W(i)\,R(j - i) = 0, \quad j = \{2, 3, \ldots, n\} \tag{4.15}$$

$$\sum_{i=1}^{n} \bar{W}(i)\,R(i - j) = 0, \quad j = \{2, 3, \ldots, n\} \tag{4.16}$$

where $W(1) = I_D$ ($I_D$ is the identity matrix), $\bar{W}(1) = I_D$, and $R$ is the cross-correlation given as:

$$R(i) = \frac{1}{n}\sum_{k=i}^{n-1-i} [x(k) - m][x(k - i) - m]^{T} \tag{4.17}$$

In Eqn. 4.17, $m$ is the mean vector of the observed data:

$$m = \frac{1}{n}\sum_{k=1}^{n} x(k) \tag{4.18}$$

Based on these equations, [28] defines the following prediction error covariance matrices:

$$
\begin{aligned}
V_t &= \sum_{k=1}^{n} W_t(k)\,R(-k) \\
\bar{V}_t &= \sum_{k=1}^{n} \bar{W}_t(k)\,R(-k) \\
\Delta_t &= \sum_{k=1}^{n} W_t(-k)\,R(t - k + 1) \\
\bar{\Delta}_t &= \sum_{k=1}^{n} \bar{W}_t(k)\,R(-t + k - 1)
\end{aligned} \tag{4.19}
$$

The recursive equations are as follows:

$$
\begin{aligned}
W_{t+1}(k) &= W_t(k)\,W_{t+1}(t + 1)\,\bar{W}_t(t - k + 1) \\
\bar{W}_{t+1}(k) &= \bar{W}_t(k)\,\bar{W}_{t+1}(t + 1)\,W_t(t - k + 1)
\end{aligned} \tag{4.20}
$$

$$
\begin{aligned}
W_{t+1}(t + 1) &= -\Delta_t\,\bar{V}_t^{-1} \\
\bar{W}_{t+1}(t + 1) &= -\bar{\Delta}_t\,V_t^{-1}
\end{aligned} \tag{4.21}
$$

In order to find the weights, Eqn. 4.20 and Eqn. 4.21 are solved recursively. For further details about the weight update mechanism, readers are referred to [28]. Fig. 4.4 is a pictorial representation of the steps involved in the forecast module.
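The quantities that feed the recursion, namely the mean vector and the lagged cross-correlation matrices of Eqns. 4.17 and 4.18, can be sketched in Python as follows. This is an illustration under assumptions: the observations are stored as an array of shape (n, D), they are treated as zero-indexed so that the summation limits printed in Eqn. 4.17 stay within range, and the function names are hypothetical. The recursion of [28] itself is not reproduced here.

```python
import numpy as np

def mean_vector(x):
    """Mean vector m of the observed samples (Eqn. 4.18); x has shape (n, D)."""
    return np.asarray(x, dtype=float).mean(axis=0)

def cross_correlation(x, lag):
    """Lagged cross-correlation matrix R(lag), using the summation limits of Eqn. 4.17."""
    x = np.asarray(x, dtype=float)
    n, dim = x.shape
    m = mean_vector(x)
    R = np.zeros((dim, dim))
    # Assumption: samples are x(0), ..., x(n-1); k runs from lag to n-1-lag inclusive.
    for k in range(lag, n - lag):
        R += np.outer(x[k] - m, x[k - lag] - m)
    return R / n

# Hypothetical one-dimensional toy series
x = [[1.0], [2.0], [3.0], [4.0]]
R0 = cross_correlation(x, 0)
R1 = cross_correlation(x, 1)
```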

Figure 4.4: ANN based data forecast module

Once the weights in Eqn. 4.20 and Eqn. 4.21 are recursively adjusted as per the objective function in Eqn. 4.14, the output matrix is then binary decoded and de-normalized to get the desired load-time series. The stepwise algorithm of the proposed methodology is shown in Algorithm 1.

4.4 Simulation Results

We evaluate our proposed DLF model (m(MI+ANN)) by comparing it with the existing MI+ANN model in [16]. In our simulations, historical load time-series data from November (2014) to January (2015) is taken from the publicly available PJM electricity market for two SGs in the United States of America: DAYTOWN and EKPC [30]. November–December (2014) data is used for training and validation, and January (2015) data is used for testing. Simulation parameters are shown in Table 4.1, and their justification can be found in [16, 18, 28, 29]. In this work, we consider two performance metrics: % error and execution time (convergence rate).

• Error performance: It is the difference between the actual and the forecast signal/curve, and is measured in %.

• Convergence rate or execution time: The simulation time taken by the system to execute a specific forecast model. Forecast models with a smaller execution time are said to converge faster than those with a larger execution time. In this work, execution time is measured in seconds.
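The two metrics above could be measured along the following lines. This is a hedged sketch: the text does not spell out the exact error formula, so the mean absolute percentage error is assumed here, and the helper names `percent_error` and `timed` are hypothetical.

```python
import time

def percent_error(actual, forecast):
    """Mean absolute percentage error between two curves (an assumed definition of % error)."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)

def timed(fn, *args):
    """Run fn(*args) and return its result together with the elapsed time in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical two-point example
err, seconds = timed(percent_error, [100.0, 200.0], [110.0, 190.0])
```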

Figures 4.5 and 4.8 illustrate how well our proposed ANN based DALF model predicts the target values of an SG. In these figures, the proposed m(MI+ANN) based forecast curve follows the target curve more tightly than the existing MI+ANN

Algorithm 1: Pseudo-code of day-ahead ANN load forecast

 1: Pre-conditions: i indexes the hours of a day, and j indexes the days
 2: $P \leftarrow$ historical load data
 3: Compute $p_{max}^{c_i}$, $\forall i \in \{1, 2, 3, \ldots, m\}$
 4: Compute $P_{nrm}$
 5: Compute $Med_i$, $\forall i \in \{1, 2, 3, \ldots, m\}$
 6: for all $i \in \{1, 2, 3, \ldots, m\}$ do
 7:     for all $j \in \{1, 2, 3, \ldots, n\}$ do
 8:         if $P_{nrm}^{(i,j)} < Med_i$ then
 9:             $P_b^{i,j} \leftarrow 0$
10:         else
11:             $P_b^{i,j} \leftarrow 1$
12:         end if
13:     end for
14: end for
15: Compute $TS$ and $VS$
16: Compute $y(1)$ by letting $W(1) = I$ and
17: $\bar{W}(1) = I$
18: while max. # of iterations not reached do
19:     if $J(k + 1) \le J(k)$ then
20:         $y(k) \leftarrow y(k + 1)$
21:     else
22:         Train ANN as per Eqn. 4.20 and Eqn. 4.21
23:         Compute $y(k + 1)$ and go back to step (18)
24:     end if
25: end while
26: Perform decoding
27: Perform de-normalization

Table 4.1: Simulation parameters of ANN based forecast

Parameter                                 Value
Number of forecasters                     24
Number of hidden layers                   1
Number of neurons in the hidden unit      5
Number of iterations                      100
Momentum                                  0
Initial weights                           0.1
Historical load data                      26 days
Bias value                                0

based forecast curve, which supports the theoretical discussion of our proposed methodology in terms of non-linear forecast ability. Both the sigmoid $f_{act}^{AN}$ (refer Eqn. 4.12) and the multi-variate auto regressive training algorithm enable the day-ahead ANN based forecast methodology to capture the non-linearities in historical load data.

Figure 4.5: DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (actual vs forecast). [Plot of Load (KW) versus Time (hr) with three curves: Actual, m(MI+ANN) forecast, and MI+ANN forecast.]

4.4.1 Error Performance

Figure 4.6 shows the % forecast error when tests are conducted on DAY-

TOWN grid; our m(MI+ANN) forecasts with 2.9% and the existing MI+ANN

Figure 4.6: DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (error performance). [Bar chart of Error (%): m(MI+ANN) forecast 2.9, MI+ANN forecast 3.84.]

Figure 4.7: DAYTOWN (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (convergence rate analysis). [Bar chart of Execution time (sec): m(MI+ANN) forecast 2.48, MI+ANN forecast 6.54.]

forecasts with 3.84% relative errors, respectively. Similarly, Fig. 4.9 shows

the % forecast error when tests are conducted on EKPC grid; our m(MI+ANN)

forecasts with 2.88% and the existing MI+ANN forecasts with 3.88% rel-

ative errors, respectively. This improvement in terms of relative % error

performance by our proposed DALF model is due to the following two rea-

sons; (i) the modiﬁed feature selection technique in our proposed DALF

Figure 4.8: EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (actual vs forecast). [Plot of Load (KW) versus Time (hr) with three curves: Actual, m(MI+ANN) forecast, and MI+ANN forecast.]

Figure 4.9: EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (error performance). [Bar chart of Error (%): m(MI+ANN) forecast 2.88, MI+ANN forecast 3.88.]

model, and (ii) the multi-variate auto regressive training algorithm. The first reason accounts for the removal of redundant as well as irrelevant features from the input data in a more efficient way than in the existing DALF model. By a more efficient way, we mean that our proposal considers the average sample in the feature selection process in addition to the last sample and the target sample. Thus, the margin of outliers

Figure 4.10: EKPC (27th January, 2015): m(MI+ANN) forecast vs MI+ANN forecast (convergence rate analysis). [Bar chart of Execution time (sec): m(MI+ANN) forecast 2.58, MI+ANN forecast 6.6.]

which cause significant relative % error is reduced. The second reason deals with the selection of an efficient training algorithm: our proposition trains the ANN via the multi-variate auto regressive algorithm, whereas the existing DALF model trains the ANN via the Levenberg-Marquardt algorithm.

4.4.2 Convergence Rate Analysis

As discussed earlier, there exists a trade-off between forecast accuracy and execution time. However, Figs. 4.6–4.7 and 4.9–4.10 show that our proposed DALF model results not only in relatively less % error but also in less execution time. As mentioned earlier, our modifications in the feature selection process and the selection of the multi-variate auto regressive training algorithm cause the relative improvement in terms of % error. On the other hand, the m(MI+ANN) model converges at a faster rate (less execution time) than the existing MI+ANN model due to three reasons: (i) exclusion of the local optimization algorithm for error minimization, (ii) the modified feature selection process, and (iii) selection of the multi-variate auto regressive training algorithm. Our proposition selects features from the input data while considering the average sample, the last sample, and the target sample. This means that the chance of outliers in the selected features is significantly decreased, and the local optimization algorithm used by the existing MI+ANN forecast model is no longer needed. Our proposed m(MI+ANN) forecast model therefore does not incur the execution time taken by the iterative optimization algorithm. As a result, our proposed DALF model converges at a faster rate than the existing DALF model.
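As a worked reading of the values reported in Figs. 4.6, 4.7, 4.9 and 4.10, the relative reductions achieved by m(MI+ANN) over MI+ANN can be computed as follows. The percentages are derived here from the reported values and are rounded to one decimal place.

```latex
\begin{aligned}
\text{DAYTOWN, \% error:} \quad & \frac{3.84 - 2.9}{3.84} = \frac{0.94}{3.84} \approx 24.5\% \\
\text{EKPC, \% error:} \quad & \frac{3.88 - 2.88}{3.88} = \frac{1.00}{3.88} \approx 25.8\% \\
\text{DAYTOWN, execution time:} \quad & \frac{6.54 - 2.48}{6.54} = \frac{4.06}{6.54} \approx 62.1\% \\
\text{EKPC, execution time:} \quad & \frac{6.6 - 2.58}{6.6} = \frac{4.02}{6.6} \approx 60.9\%
\end{aligned}
```

In absolute terms, the forecast error drops by roughly one percentage point on both grids, while the execution time drops by about four seconds.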

Chapter 5

mEDE and ANN Based Forecast Strategy

In STLF problems, the stochastic volatility of the target values of the time series has a significant impact on the STLF errors. The load of an MG shows larger volatility than that of a traditional large power system. Thus, to characterize the difference between the load time series of a SMG and that of a large power system, the following load variation (LV) indices are typically used [19].

1. LV in months: In this technique, the normalized mean square $M$ is used to show LV over several months. Initially, the sampled values of the load time series are normalized into $[0, 1]$ and then the variance is calculated. If $x_i$ is the $i$th load sample in a total of $N$ samples, then $M$ is given as follows:

$$M = \frac{1}{N}\sum_{i=1}^{N} (x'_i - \mu')^2 \tag{5.1}$$

where $x'_i = \frac{x_i}{\max(x)}$ and $\mu' = \frac{1}{N}\sum_{i=1}^{N} x'_i$.

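2. LV between two consecutive days: LV between two consecutive days can be shown by two sub-indices: maximum difference $D_{max}$ and average difference $D_{avg}$. If $X_i$ is the load of the $i$th day and $N_d$ is the total number of days, then the formulae for $D_{max}$ and $D_{avg}$ are as follows:

$$D_{max} = \max\left(\sqrt{\frac{\Delta X_i\,\Delta X_i^{T}}{L}}\right) \tag{5.2}$$

$$D_{avg} = \frac{1}{N_d - 1}\sum_{i=1}^{N_d - 1}\sqrt{\frac{\Delta X_i\,\Delta X_i^{T}}{L}} \tag{5.3}$$

where $\Delta X_i = X_{i+1} - X_i$, $i = 1, \ldots, N_d - 1$ for Eqn. 5.2, $i = 1, \ldots, N_d$ for Eqn. 5.3, and $X_i = \frac{[x_{(i-1) \times L + 1}, \ldots, x_{i \times L}]}{x_{(i-1) \times L + 1}}$.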

3. LV in a day: The daily load time series curve shows variations, which are characterized by its average value $\bar{x}_i$, minimum value $x_i^{min}$ and maximum value $x_i^{max}$. For a total of $N_d$ days, the minimum daily load rate is computed as follows:

$$R_{min} = \frac{1}{N_d}\sum_{i=1}^{N_d}\frac{x_i^{min}}{x_i^{max}} \tag{5.4}$$

Similarly, the daily load rate can be computed as follows:

$$R = \frac{1}{N_d}\sum_{i=1}^{N_d}\frac{\bar{x}_i}{x_i^{max}} \tag{5.5}$$

4. LV in an hour: In order to analyze load variation in an hour, the maximum slope $m_{max}$ and the average slope $m_{avg}$ are used. If $N$ samples of the load time series are considered, then $m_{avg}$ and $m_{max}$ are given as follows:

$$m_{avg} = \frac{1}{N - 1}\sum_{i=1}^{N - 1}\left\| \frac{x'_{i+1} - x'_i}{t_{i+1} - t_i} \right\| \tag{5.6}$$

$$m_{max} = \max_{i}\left\| \frac{x'_{i+1} - x'_i}{t_{i+1} - t_i} \right\| \tag{5.7}$$
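Three of the four indices above (LV in months, LV in a day, and LV in an hour) can be sketched in Python as follows. This is an illustration under assumptions: the load series is a flat sequence of positive samples, $L$ is taken to be the number of samples per day, $M$ is computed as the variance of the normalized samples as the text describes, slopes are taken between consecutive samples, and all function names are hypothetical. The day-to-day indices $D_{max}$ and $D_{avg}$ are left out because their normalization is not fully specified here.

```python
import numpy as np

def lv_months(x):
    """Index M (Eqn. 5.1): variance of the max-normalised load samples."""
    xn = np.asarray(x, dtype=float) / np.max(x)
    return np.mean((xn - xn.mean()) ** 2)

def lv_day(x, L):
    """Indices R_min and R (Eqns. 5.4 and 5.5) for a series with L samples per day."""
    days = np.asarray(x, dtype=float).reshape(-1, L)
    r_min = np.mean(days.min(axis=1) / days.max(axis=1))
    r = np.mean(days.mean(axis=1) / days.max(axis=1))
    return r_min, r

def lv_hour(x, t):
    """Indices m_avg and m_max (Eqns. 5.6 and 5.7) from consecutive-sample slopes."""
    xn = np.asarray(x, dtype=float) / np.max(x)
    slopes = np.abs(np.diff(xn) / np.diff(np.asarray(t, dtype=float)))
    return slopes.mean(), slopes.max()

# Hypothetical toy series: two days with two samples each
x = [1.0, 2.0, 3.0, 4.0]
M = lv_months(x)
r_min, r = lv_day(x, L=2)
m_avg, m_max = lv_hour(x, t=[0.0, 1.0, 2.0, 3.0])
```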

5.1 Motivation

Given the highly volatile load of SGs, any DALF proposition must efficiently deal with the non-linear input/output relationship. In this regard, ANNs are widely used as forecasters because these networks can predict the non-linearities of SGs' load with low convergence time. However, the achieved prediction accuracy is sometimes not up to the mark, which leads to the adoption of optimization techniques that can significantly enhance the prediction accuracy of ANNs. The cost paid to achieve high accuracy is increased convergence time. Therefore, we focus on the development of a DALF strategy that is based on a compromise between prediction accuracy and convergence time.

5.2 The mEDE and ANN Based Forecast

Our proposed DALF strategy consists of three modules: pre-processing module, forecast module, and optimization module (refer Fig. 5.1). The pre-processing module makes the input load time series compatible with the forecast module, and removes redundant and irrelevant features from the input data. Based on the sigmoid activation function and the multi-variate auto regressive model, the forecast module (which consists of ANNs) performs DALF of SGs. Finally, the optimization module minimizes prediction errors to improve the accuracy of the overall DALF strategy. A detailed description of each module is as follows.

5.2.1 Pre-Processing Module

Since the ANN based forecaster uses only binary data to predict load of

the next day, the input data must be made compatible. In other words, the

Figure 5.1: mEDE and ANN: Block diagram

input data must be pre-processed to make it compatible with the forecast module. In addition, redundant and irrelevant samples must be removed from the input data set for two reasons: (i) redundant features do not provide more information and thus unnecessarily increase the execution time during the training process (discussed later in the forecast module), and (ii) irrelevant features do not provide useful information and act as outliers. A detailed description of the pre-processing module is as follows.

As mentioned earlier, the pre-processing module receives the input load time series (historical). Suppose the input load time series is given by the following matrix:

$$
P = \begin{bmatrix}
p(h_1, d_1) & p(h_2, d_1) & p(h_3, d_1) & \cdots & p(h_m, d_1) \\
p(h_1, d_2) & p(h_2, d_2) & p(h_3, d_2) & \cdots & p(h_m, d_2) \\
p(h_1, d_3) & p(h_2, d_3) & p(h_3, d_3) & \cdots & p(h_m, d_3) \\
p(h_1, d_4) & p(h_2, d_4) & p(h_3, d_4) & \cdots & p(h_m, d_4) \\
p(h_1, d_5) & p(h_2, d_5) & p(h_3, d_5) & \cdots & p(h_m, d_5) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p(h_1, d_n) & p(h_2, d_n) & p(h_3, d_n) & \cdots & p(h_m, d_n)
\end{bmatrix} \tag{5.8}
$$

where $d_n$ is the $n$th day, $h_m$ is the $m$th hour of the day, and $p(h_m, d_n)$ is the power consumption value of the $n$th day at the $m$th hour. As per the standard time horizon of one complete day, $m = 24$. The value of $n$ depends entirely on the designer's choice; increasing $n$ means finer tuning during the training of the forecast module because more historical lagged samples of the input power matrix are available. However, this fine tuning is achieved at the cost of more execution time. Thus, there is a trade-off between convergence rate and accuracy.

Before feeding the forecast module with input matrix P, algorithm 2 is

executed by the pre-processing module to ensure P’s compatibility with

the forecast module.

Algorithm 2: Pseudo-code of the pre-processing module

 1: Pre-conditions: i indexes the hours of a day, and j indexes the days
 2: $P \leftarrow$ historical load data
 3: Compute $p_{max}^{c_i}$, $\forall i \in \{1, 2, 3, \ldots, m\}$
 4: Compute $P_{nrm}$
 5: Compute $Med_i$, $\forall i \in \{1, 2, 3, \ldots, m\}$
 6: for all $i \in \{1, 2, 3, \ldots, m\}$ do
 7:     for all $j \in \{1, 2, 3, \ldots, n\}$ do
 8:         if $P_{nrm}^{(i,j)} < Med_i$ then
 9:             $P_b^{i,j} \leftarrow 0$
10:         else
11:             $P_b^{i,j} \leftarrow 1$
12:         end if
13:     end for
14: end for

Firstly, a local maximum value $p_{max}^{c_i}$ is calculated for each column of the historical input load matrix $P$:

$$p_{max}^{c_i} = \max\big(p(h_i, d_1), p(h_i, d_2), p(h_i, d_3), \ldots, p(h_i, d_n)\big), \quad \forall i \in \{1, 2, 3, \ldots, m\} \tag{5.9}$$

Secondly, each column of $P$ is normalized by its respective local maximum; the results are saved in $P_{nrm}$ (each entry of $P_{nrm}$ lies in $[0, 1]$). Thirdly, a local median value $Med_i$ ($\forall i \in \{1, 2, 3, \ldots, m\}$) is computed for each column of the $P_{nrm}$ matrix. Fourthly, each element of the $P_{nrm}$ matrix is compared with its respective local median value $Med_i$, based on which encoding is performed as follows:

$$
P_b(h_i, d_j) = \begin{cases}
1 & \text{if } P_{nrm}(h_i, d_j) \ge Med_i \\
0 & \text{otherwise}
\end{cases} \tag{5.10}
$$

In this way, a resultant matrix $P_b$ consisting of only binary values (0's and 1's) is obtained. This $P_b$ matrix contains not only irrelevant features but also redundant features. In order to remove these two types of features from the matrix $P_b$, we use the mutual information technique that was proposed in [16] and later used in [18] as well. According to this technique, the mutual information between input $X$ and target $T$ is given as follows:

$$MI(X, T) = \sum_{i}\sum_{j} p(X_i, T_j)\,\log_2\frac{p(X_i, T_j)}{p(X_i)\,p(T_j)} \tag{5.11}$$

In (5.11), $MI(X, T) = 0$ means that the input and target variables are independent, a high value of $MI(X, T)$ means that the two variables are strongly related, and a low value of $MI(X, T)$ means that the two variables are loosely related. The expanded form of (5.11) is as follows:

$$
\begin{aligned}
MI(X, T) ={}& p(X=0, T=0)\,\log_2\frac{p(X=0, T=0)}{p(X=0)\,p(T=0)} \\
&+ p(X=0, T=1)\,\log_2\frac{p(X=0, T=1)}{p(X=0)\,p(T=1)} \\
&+ p(X=1, T=0)\,\log_2\frac{p(X=1, T=0)}{p(X=1)\,p(T=0)} \\
&+ p(X=1, T=1)\,\log_2\frac{p(X=1, T=1)}{p(X=1)\,p(T=1)}
\end{aligned} \tag{5.12}
$$

In order to determine the joint and independent probabilities in (5.12), an auxiliary variable $V_b$ is introduced:

$$V_b = 2T + X, \quad \forall\, T, X \in \{0, 1\} \tag{5.13}$$

It is clear from (5.13) that $V_b$ ranges between 0 and 3. $V_b^0$, $V_b^1$, $V_b^2$, and $V_b^3$ count the number of sample data points (out of total $l$ data points) for which $V_b = 0$, $V_b = 1$, $V_b = 2$, and $V_b = 3$, respectively. In this way, we can now easily determine the independent and joint probabilities as follows:

$$
\begin{aligned}
p(X=0) &= \frac{V_b^0 + V_b^2}{l}, & p(X=1) &= \frac{V_b^1 + V_b^3}{l}, \\
p(T=0) &= \frac{V_b^0 + V_b^1}{l}, & p(T=1) &= \frac{V_b^2 + V_b^3}{l}
\end{aligned} \tag{5.14}
$$

$$
\begin{aligned}
p(X=0, T=0) &= \frac{V_b^0}{l}, & p(X=0, T=1) &= \frac{V_b^2}{l}, \\
p(X=1, T=0) &= \frac{V_b^1}{l}, & p(X=1, T=1) &= \frac{V_b^3}{l}
\end{aligned} \tag{5.15}
$$

Based on (5.12), the mutual information between $X$ and $T$ is calculated, and thus redundant and irrelevant features are removed from the input samples. According to [16, 18], this mutual information based technique is computed with reasonable execution time and acceptable accuracy.
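The two-variable computation of (5.12) to (5.15), together with the ranking of candidate inputs by their mutual information with the target, can be sketched in Python as follows. This is only an illustration: the candidates and the target are assumed to be equal-length binary sequences, outcomes that never occur are skipped, and the names `mi_binary` and `rank_candidates` are hypothetical. How many of the ranked candidates are finally kept is not specified here and is left to the caller.

```python
import numpy as np

def mi_binary(X, T):
    """Mutual information of (5.12) computed from the V_b counts of (5.13)."""
    X = np.asarray(X, dtype=int)
    T = np.asarray(T, dtype=int)
    l = X.size
    V = np.bincount(2 * T + X, minlength=4)        # V_b^0 ... V_b^3
    p_x = np.array([V[0] + V[2], V[1] + V[3]]) / l  # independent probabilities, (5.14)
    p_t = np.array([V[0] + V[1], V[2] + V[3]]) / l
    mi = 0.0
    for v in range(4):
        if V[v] == 0:
            continue                                # assumed convention: skip empty outcomes
        t, x = v >> 1, v & 1                        # decode V_b back into (T, X)
        p = V[v] / l                                # joint probability, (5.15)
        mi += p * np.log2(p / (p_x[x] * p_t[t]))
    return mi

def rank_candidates(candidates, target):
    """Return candidate indices sorted by decreasing mutual information, plus the scores."""
    scores = [mi_binary(c, target) for c in candidates]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Hypothetical toy data: the second candidate matches the target exactly
target = [0, 1, 0, 1]
order, scores = rank_candidates([[0, 0, 1, 1], [0, 1, 0, 1]], target)
```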

5.2.2 Forecast Module

In the literature, many research works have investigated LV in SMGs. In particular, the authors in [19] comprehensively examined LV based on four indices: LV in months (refer Eqn. 5.1), LV between two consecutive days (refer Eqn. 5.2 and Eqn. 5.3), LV in a day (refer Eqn. 5.4 and Eqn. 5.5), and LV in an hour (refer Eqn. 5.6 and Eqn. 5.7). From these works, it is concluded that any forecast strategy must be able to perform STLF of SMGs while ensuring non-linear prediction capability. Therefore, we choose ANNs because they have the ability to capture the highly volatile characteristics of the load time series with reasonable accuracy.

For STLF, two strategies are used: direct forecasting and iterative forecasting [16]. However, it is discussed in [31] that the first strategy may introduce significant round-off errors and the second one introduces large forecast errors. In order to overcome these imperfections, [16] introduced the idea of a cascaded strategy. Thus, our proposed forecast module implements the cascaded strategy. Our forecast module consists of an ANN of 24 consecutive cascaded forecasters, such that each one of the 24 forecasters has a single output to forecast the load of one hour of the upcoming day. It is worth mentioning that the 24 hourly time series forecasters are modeled separately instead of as a single complex one. These 24 one-hour-ahead forecasters allow improvement in terms of accuracy [16]. The cascaded ANN forecast structure is a combination of direct and iterative structures such that the load of each hour of the next day is directly predicted and each forecaster yields exactly one output.

In the forecast module, each forecaster is an AN that implements the sigmoid function as its activation function. We have chosen the sigmoid activation function because it enables the AN to capture the highly volatile (non-linear) characteristic of the SMG's load time series. In order to update the weights during the training process of the AN, like [16, 18], we use the multi-variate auto regressive algorithm because it can train the ANN faster than the Levenberg-Marquardt algorithm and the gradient descent back propagation algorithm [29]. According to Kolmogorov's theorem, if the ANN is provided with a proper number of ANs then it has the ability to solve a problem by adopting one hidden layer. Thus, we have considered one hidden layer in the ANN structure of all 24 ANs.
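As a rough illustration of this structure, the sketch below builds 24 independent single-output forecasters, each with one sigmoid hidden layer. Everything beyond "24 forecasters, one hidden layer, sigmoid activation, one output each" is an assumption made for the example: the names `forecaster_output` and `build_forecasters`, the random initial weights, the hidden-layer size, and the sigmoid on the output node. The weight training itself is not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forecaster_output(inputs, hidden_weights, hidden_biases, out_weights, out_bias):
    """Forward pass of one single-output forecaster with one hidden layer."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_weights, hidden)) + out_bias)


def build_forecasters(n_inputs, n_hidden, n_hours=24, seed=0):
    """Create one (hypothetical, randomly initialised) forecaster per hour."""
    rng = random.Random(seed)
    forecasters = []
    for _ in range(n_hours):
        hidden_weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_hidden)
        ]
        hidden_biases = [0.0] * n_hidden
        out_weights = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        forecasters.append((hidden_weights, hidden_biases, out_weights, 0.0))
    return forecasters


# One output per hour of the upcoming day, from a binary-encoded input vector
features = [1, 0, 1, 1]
day_ahead = [forecaster_output(features, *f) for f in build_forecasters(4, 3)]
```

Keeping the forecasters as separate parameter sets mirrors the point made above that the 24 hourly series are modeled separately rather than by a single multi-output network.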

In short, for the aforementioned reasons, our proposed forecast module is basically an ANN that consists of 24 ANs. Each AN is activated by the sigmoid function and is trained by the multi-variate auto regressive algorithm. Initially, the forecast module receives the binary encoded matrix $P_b$, which is the output of the pre-processing module. From this matrix, the forecast module constructs training and validation samples as follows:

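$$
S_T = P_b(i, j), \quad \forall\, i \in \{2, 3, \ldots, m\} \text{ and } \forall\, j \in \{1, 2, 3, \ldots, n\} \tag{5.16}
$$

$$
S_V = P_b(1, j), \quad \forall\, j \in \{1, 2, 3, \ldots, n\} \tag{5.17}
$$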

Eqns. 5.16 and 5.17 illustrate that the ANN is trained by all the candidate inputs (historical load time series) except the last one. The last sample of the historical load time series is used for validation. In fact, the validation sample is a part of the training load samples that is removed from them during the training process. Thus, the validation sample remains unseen by the ANN. Moreover, the validation error can be used as a measure of the ANN's error for the 24 hour forecast horizon. In order to make the validation error a true representative of the forecast error, the validation sample needs to be as close to the forecast horizon as possible. We take the validation sample to be the day before the forecast day because it includes not only the short run trend but also the daily periodicity characteristics of the load signal [32].
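A minimal sketch of the split in Eqns. 5.16 and 5.17 follows. It assumes the binary encoded matrix is held as a list of rows and that its first row (index 1 in the equations, index 0 in Python) is the most recent day; the function name `split_samples` is hypothetical.

```python
def split_samples(p_b):
    """Split the encoded matrix into training rows and one validation row.

    Row 1 of P_b (Python index 0) is held out as S_V, as in Eqn. 5.17;
    rows 2..m form S_T, as in Eqn. 5.16.
    """
    if len(p_b) < 2:
        raise ValueError("P_b needs at least two rows to form both sets")
    s_v = list(p_b[0])
    s_t = [list(row) for row in p_b[1:]]
    return s_t, s_v


# Toy 3 x 2 matrix: the first row is held out, the remaining rows train the ANN
s_t, s_v = split_samples([[1, 0], [0, 1], [1, 1]])
```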

Thus, each of the 24 ANs is trained as per the multi-variate auto regressive algorithm on the training samples and is validated on the last (unseen) validation sample. The Mean Absolute Percentage Error (MAPE) for each of the 24 validation samples is considered as the validation error in this research work.

$$
MAPE_i = \frac{1}{m} \sum_{j=1}^{m} \frac{\left| p_{\mathrm{act}}(h_i, d_j) - p_{\mathrm{for}}(h_i, d_j) \right|}{p_{\mathrm{act}}(h_i, d_j)} \tag{5.18}
$$

where $p_{\mathrm{act}}(h_i, d_j)$ is the actual load value of the $i$th hour of the $j$th day, $p_{\mathrm{for}}(h_i, d_j)$ is the forecast load value of the $i$th hour of the $j$th day, and $m$ is the number of days under consideration.
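For a single hour, Eqn. 5.18 reduces to a short computation over the days under consideration. The sketch below is illustrative only: the function name `mape` is an assumption, the result is returned as a fraction (multiply by 100 for a percentage), and strictly positive actual loads are assumed so that the division is defined.

```python
def mape(actual, forecast):
    """MAPE of Eqn. 5.18 for one hour across m days, as a fraction."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual load values must be strictly positive")
    m = len(actual)
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / m


# Two days, same hour: each forecast is off by 10 % of the actual load
error = mape([100.0, 200.0], [110.0, 180.0])
```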

The objective of supervised training is to adaptively adjust the weight values (fed to ANs) such that the error signal $MAPE_i$ between the target value and the real output of the neuron is minimized. For the sake of clarity, we represent $MAPE_i$ as $MAPE(i)$.

$$
\text{Minimize } MAPE(i) \quad \forall\, i \in \{1, 2, 3, \ldots, m\} \tag{5.19}
$$

In this research work, the method of least squares is used, thus we can

write,

$$
\text{Minimize } J(I) = \sum_{k=1}^{m} MAPE^{T}(i)\, MAPE(i), \quad \forall\, i \in \{1, 2, 3, \ldots, m\} \tag{5.20}
$$

In order to achieve the objective function in Eqn. 5.20, we use the multi-variate auto regressive model [28]. We choose this model for two reasons: (i) it provides a solution to the objective function in relatively less time, and (ii) its accuracy is reasonable. It is worth mentioning here that both these reasons follow from a comparison of the multi-variate auto regressive model with typically used learning models like gradient descent, delta, and Widrow-Hoff [29]. Thus, the parameter matrices are [28],

$$
\sum_{i=1}^{n} W(i)\, R(j-i) = 0, \quad j = \{2, 3, \ldots, n\} \tag{5.21}
$$

$$
\sum_{i=1}^{n} \bar{W}(i)\, R(i-j) = 0, \quad j = \{2, 3, \ldots, n\} \tag{5.22}
$$

where $W(1) = I_D$ and $\bar{W}(1) = I_D$ ($I_D$ is the identity matrix), and $R$ is the cross-correlation given as:

$$
R(i) = \frac{1}{n} \sum_{k=i}^{n-1-i} \left[ x(k) - x_m \right] \left[ x(k-i) - x_m \right]^{T} \tag{5.23}
$$

In Eqn. 5.23, $x$ is the vector of observed data, and $x_m$ is the mean of the observed data.
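Eqn. 5.23 can be evaluated directly, as in the sketch below. It follows the summation limits exactly as printed and assumes zero-based indexing, so that x(0), ..., x(n-1) are the n observation vectors; the function name `cross_correlation` and the list-of-lists representation are assumptions made for the example.

```python
def cross_correlation(x, lag):
    """R(lag) of Eqn. 5.23 for a list of n observation vectors of length d.

    Returns a d x d matrix as a list of rows.
    """
    n = len(x)
    if n == 0 or lag < 0:
        raise ValueError("x must be non-empty and lag non-negative")
    d = len(x[0])
    # Component-wise mean x_m of the observed data
    mean = [sum(vec[c] for vec in x) / n for c in range(d)]
    r = [[0.0] * d for _ in range(d)]
    for k in range(lag, n - lag):  # k = lag, ..., n - 1 - lag
        a = [x[k][c] - mean[c] for c in range(d)]
        b = [x[k - lag][c] - mean[c] for c in range(d)]
        for p in range(d):
            for q in range(d):
                r[p][q] += a[p] * b[q]  # outer product [a][b]^T
    return [[value / n for value in row] for row in r]


# Scalar series 1, 2, 3 (d = 1): lag 0 gives the 1/n-normalised variance
r0 = cross_correlation([[1.0], [2.0], [3.0]], 0)
r1 = cross_correlation([[1.0], [2.0], [3.0]], 1)
```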

Based on these equations, [28] defines the prediction error covariance matrices as follows,

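$$
\begin{aligned}
\Delta_t &= \sum_{k=1}^{n} W_t(-k)\, R(t-k+1) \\
\bar{\Delta}_t &= \sum_{k=1}^{n} W_t(k)\, R(-t+k-1) \\
V_t &= \sum_{k=1}^{n} W_t(k)\, R(-k) \\
\bar{V}_t &= \sum_{k=1}^{n} W_t(k)\, R(-k)
\end{aligned}
\tag{5.24}
$$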

The recursive equations are given as follows:

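$$
\begin{aligned}
W_{t+1}(k) &= W_t(k)\, W_{t+1}(t+1)\, W_t(t-k+1) \\
\bar{W}_{t+1}(k) &= W_t(k)\, W_{t+1}(t+1)\, W_t(t-k+1)
\end{aligned}
\tag{5.25}
$$

$$
\begin{aligned}
W_{t+1}(t+1) &= -\Delta_t V_t^{-1} \\
\bar{W}_{t+1}(t+1) &= -\Delta_t V_t^{-1}
\end{aligned}
\tag{5.26}
$$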

In order to find the weights W, the recursive equations are solved. Further details of the weight update process can be found in [28].

Once the weights in Eqn. 5.25 and Eqn. 5.26 are adaptively adjusted in a recursive manner, the forecast module returns the error signal to the optimization module. The stepwise algorithm of the proposed forecast module is shown in Algorithm 3.

Algorithm 3: Pseudo-code of the forecast module
1: Pre-conditions: MAPE(i) is the output of AN, and i ∈ {1, 2, ..., 24}
2: Receive the $P_b$ matrix from the pre-processing module
3: Compute $S_T$ and $S_V$
4: Compute MAPE(i) by letting $W(1) = I_D$ and
5: $\bar{W}(1) = I_D$
6: Compute J(i)
7: while Max. # of iterations not reached do
8:     if J(i + 1) ≤ J(i) then
9:         MAPE(i) ← MAPE(i + 1)
10:    else
11:        Train ANN as per Eqn. 5.25 and Eqn. 5.26
12:        Compute MAPE(i) and go back to step (6)
13:    end if
14: end while
15: Return J(I) to the optimization module

5.2.3 Optimization Module

Based on the nature of the overall forecast strategy, the basic objective of the optimization module is to minimize the forecast error. For this purpose, various choices are available, such as linear programming, non-linear programming, quadratic programming, convex optimization, and heuristic optimization. However, the first one is not applicable here because the problem is highly non-linear. The non-linear problem can be converted into a linear problem; however, the overall process would become very complex. The second one is applicable here and gives accurate results; however, its execution time is very high. Similarly, the third and fourth ones suffer from slow convergence. It is worth mentioning here that optimization does not imply exactly reaching the optimum set of solutions; rather, near optimal solution(s) are obtained. To sum up, heuristic optimization techniques are preferred in these situations because they provide near optimal solution(s) at a relatively faster rate of convergence.

Differential evolution is one of the heuristic optimization techniques proposed in [33] and its enhanced version is used for forecast