Journal of Physics: Conference Series

PAPER • OPEN ACCESS

Probabilistic Load Forecasting of Adaptive Multiple Polynomial

Regression considering Temperature Scenario and Dummy variables

To cite this article: Jiang Li et al 2020 J. Phys.: Conf. Ser. 1550 032117

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution

of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Published under licence by IOP Publishing Ltd

IWAACE 2020

Journal of Physics: Conference Series 1550 (2020) 032117

IOP Publishing

doi:10.1088/1742-6596/1550/3/032117

1

Probabilistic Load Forecasting of Adaptive Multiple

Polynomial Regression considering Temperature Scenario

and Dummy variables

Jiang Li 1, Liyang Ren 1, Baocai Wang 1 and Guoqing Li1

1Northeast Electric Power University, Jilin, Jilin, 132000, China

2China Electric Power Research Institute, Beijing, 100000, China

*Liyang Ren: 2024468066@qq.com

Abstract. Low-resolution monthly or yearly historical data generally lead to low accuracy in load forecasting. To improve the accuracy of data-based forecasting across time-related scenarios, this paper builds an adaptive multiple polynomial regression model that considers temperature scenarios and dummy variables, using temperature data from Sydney, Australia and the New South Wales Natural Load Dataset. The variables fall into three groups: trend variables, date variables and temperature variables. Trend variables capture overall economic development and changes in user habits. Date variables capture the characteristics of working days and holidays. A cubic function of temperature is fitted to the New South Wales load history to describe the relationship between load and temperature. Temperature scenarios are generated by considering the different loads of different seasons and by a probabilistic search over candidate scenarios. The load forecasting interval under the different scenarios is then given and analyzed using the dummy variables. Finally, the method is validated on the historical data of the study area. The forecasts are accurate and the model is intuitive and easy to interpret, so it can provide a reliable basis for decisions in long-term load forecasting. In simulation, the accuracy of load forecasting based on 3 years of history increases by 3.8%.

1. Introduction

Long-term load forecasting is essential to the production, operation, planning and construction of power systems, and it is both the basis and the value of mining historical data [1, 2]. With the diversification of loads and the integration of large-scale distributed renewable energy sources, it is increasingly difficult to give the load forecasting interval under complex scenarios.

In recent years, research has focused mainly on short-term load forecasting, and papers on long-term load forecasting are comparatively few [3]. In practice, forecasting is essentially a stochastic problem. Exact forecasting of the future is therefore impossible, and a long-term forecast can only serve as a reference for reducing the effect of uncertainty as far as possible [4]. One way to address this is scenario analysis, which examines selected scenarios of the future. Because weather and economic forecasts are uncertain, the forecasting process is encouraged to provide explicit forecast values under different scenarios. Other load forecasting approaches include predictive modeling, weather normalization and probabilistic forecasting [5]. There are many traditional short-term load prediction methods, such as the regression method and the gray prediction method, as well as intelligent algorithms such as the support vector machine and the neural network [6]. The gray prediction method requires few samples and is easy to implement, but it is suited to loads that follow an exponential trend [7-9]. The neural network method gives effective predictions, but as a black-box model it cannot explain the relationship between input and output variables, is easily trapped in local optima, and is therefore difficult to initialize [10]. Regression analysis rests on a simple principle and has a clear solution algorithm; it is fast, has strong explanatory power, and was the earliest method used in load forecasting. Reference [11] proposes a new approach to forecasting the hourly electric load for the next day. The methodology is based on neural networks and is supported only by detailed information on consumers' typical behavior and by climatic information. It was tested on two real distribution substation outputs, which demonstrated its effectiveness and practical applicability [12]. References [13, 14] provide new ideas for regression prediction. However, these methods cannot reflect the inherent mechanism of load fluctuation: they consider only quantitative factors such as gross national product and population, and neglect temperature, periodic load characteristics and the special nature of holiday load, which limits their adaptability under different scenarios.

As living standards rise, the share of temperature-sensitive household load is increasing, so the dependence of load on temperature becomes more and more pronounced. Because temperature is uncertain, load forecasting is a stochastic problem. Most existing methods produce point forecasts and cannot determine the interval within which the load will fluctuate in the future, so it is unsound to judge a long-term load forecast only by comparing predicted and true values point by point [15, 16]. The low-resolution features used by traditional prediction methods, such as the monthly maximum or minimum temperature, provide very limited information to the prediction model. The resulting prediction errors are large and the interpretability is poor, because such features cannot indicate the moment at which the temperature occurs or the dynamic response of the load to temperature [17, 18]. Therefore, this paper proposes a high-precision load forecasting method that adapts to different data quality.

The main contribution of this paper is to generate temperature scenarios and apply them to the probabilistic load forecasting problem by using dummy variables. The long-term load forecasting accuracy is improved, and both the upper and lower bounds are given by the probabilistic forecast [19]. Based on hourly historical data, we first establish a regression model in which dummy variables quantify the yearly, weekly and daily cycles. When the weekly historical data are classified, the special nature of holidays is taken into account, and the temperature effect is considered in the periodic load forecast for both working days and holidays. By comparing the forecasting error of scenarios built on different time scales, the scenario with the highest accuracy is selected. A probabilistic error measure is used to optimize the scenario parameters, and the forecasting interval is used to bound the load variation. The concept of the temperature scenario is explained in item 4) of Section 3.3, and its construction is demonstrated by simulation in Section 4.2.

The remainder of the paper is organized as follows. Section 2 establishes the generalized multivariate linear regression model for load forecasting, based on hourly historical data and regression constants. Section 3 describes the detailed model for probabilistic load forecasting, which uses trend variables, date dummy variables and temperature scenarios. Section 4 verifies the performance of the proposed method. Section 5 concludes the paper and proposes future work.

2. Generalized Multiple Linear Regression

In this section, a multivariate linear regression model is first given, and a polynomial regression model is then proposed to handle the uncertainties arising from working days and holidays.

The general form of a multivariate linear regression model is:

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + e_i \quad (i = 1, 2, \dots, n) \tag{1}$$

Here β0 is the regression constant and β1, …, βp are the partial regression coefficients. Y is the explained (dependent) variable, X1, X2, …, Xp are the explanatory (independent) variables, and ei is the random error [20-22]. Compared with other load forecasting methods, a method based on principal component regression retains most of the information in the original variables and reduces the correlation among the data, which improves the accuracy of load forecasting [23].

In practical problems, the relationship between the explained variable Y and the explanatory variable X is often nonlinear, but in many models it can be transformed into a linear one by applying a function to the independent or dependent variables. Linear regression can then be used to estimate the unknown parameters and to perform regression diagnostics [24].

In polynomial regression, a regressor may be a power of an independent variable or the product of two independent variables that interact. The regression equation is:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \beta_4 X_{i1}^3 + e_i \tag{2}$$

Setting $X_{i3} = X_{i1} X_{i2}$ and $X_{i4} = X_{i1}^3$ transforms the polynomial regression into a linear regression of four variables:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + e_i \tag{3}$$
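The change of variables behind Eq. (3) can be illustrated with a minimal Python sketch. It assumes that the two nonlinear terms are the interaction X1·X2 and the cube of X1; the function names and the coefficient values are hypothetical and are not taken from the paper.

```python
def design_row(x1, x2):
    """Map one observation (X1, X2) to the regressors of the linear form.

    The leading 1.0 multiplies the regression constant beta_0.
    """
    x3 = x1 * x2   # interaction term, treated as a new variable X3
    x4 = x1 ** 3   # cubic term, treated as a new variable X4
    return [1.0, x1, x2, x3, x4]


def predict(beta, x1, x2):
    """Evaluate beta_0 + beta_1*X1 + beta_2*X2 + beta_3*X3 + beta_4*X4."""
    return sum(b * x for b, x in zip(beta, design_row(x1, x2)))


# Illustrative coefficients only (not estimates from the paper).
beta = [2.0, 0.5, -1.0, 0.25, 0.1]
y_hat = predict(beta, 2.0, 3.0)
```

Once every observation is mapped through `design_row`, the coefficients can be estimated with any ordinary least squares routine, because the model is linear in the coefficients even though it is nonlinear in X1 and X2.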

In regression analysis, qualitative variables are quantified by introducing dummy variables that take only the values 0 and 1: the dummy variable is 1 when the attribute is present and 0 otherwise. If a qualitative variable has K categories, K-1 such 0-1 dummy variables must be introduced. Taking working days and holidays as an example:

$$X_1 = \begin{cases} 1, & \text{working days} \\ 0, & \text{holidays} \end{cases}$$

The regression equation describing the load characteristics of working days and holidays is then:

$$Y = \beta_0 + \beta_1 X_1 \tag{4}$$

For working days, X1 = 1 and the regression equation is E(Y) = β0 + β1. For holidays, X1 = 0 and the regression equation is E(Y) = β0 [25]. The daily load characteristics are thus described by the regression constants.
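A small Python sketch makes the meaning of the two constants concrete. For a model with an intercept and a single 0/1 dummy, the least squares estimates reduce to group means: β0 is the mean load of the reference group (holidays) and β1 is the difference between the working-day mean and the holiday mean. The load values below are made up for illustration and the function name is hypothetical.

```python
from statistics import mean


def fit_single_dummy(loads, is_working_day):
    """Least squares fit of Y = beta_0 + beta_1 * X1 with X1 in {0, 1}.

    With one dummy regressor the estimates are simply group means.
    """
    work = [y for y, d in zip(loads, is_working_day) if d == 1]
    rest = [y for y, d in zip(loads, is_working_day) if d == 0]
    beta0 = mean(rest)           # E(Y) for holidays (X1 = 0)
    beta1 = mean(work) - beta0   # extra load on working days (X1 = 1)
    return beta0, beta1


# Made-up daily loads in MW: three working days followed by two holidays.
loads = [5200.0, 5400.0, 5300.0, 4100.0, 4300.0]
flags = [1, 1, 1, 0, 0]
beta0, beta1 = fit_single_dummy(loads, flags)
```

With these numbers the holiday level is the mean of the last two values and the working-day level is `beta0 + beta1`, which mirrors E(Y) = β0 and E(Y) = β0 + β1.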

3. Building the forecasting model

In this section, the dummy variables, namely the trend variable and the date variables, are introduced first. The interactions among the variables are then modeled in the linear regression expression. Finally, two methods for generating temperature scenarios are presented, the moving-day temperature method and the probabilistic temperature scenario method, and the probabilistic prediction errors under different time scenarios are analyzed.

3.1. Trend variables

The data are the temperature records of Sydney, Australia and the New South Wales Natural Load Dataset.

Figure 1 shows scatter plots of the hourly load and temperature of the region from 2006 to 2013, and Table 1 lists the annual load. The annual load is relatively stable but rises gradually from year to year, with a slight dip only in 2009. This growth may be caused by socio-economic development and population growth, which increase electricity consumption.

Figure 1. Scatter plot of history data (2006-2013)

Table 1. Annual load (2006-2013)

| Year | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|---|---|---|---|
| Load (GW) | 28286 | 28434 | 28579 | 28511 | 28741 | 29068 | 29415 | 29734 |

To describe this increasing trend, we introduce the trend variable Tr into the regression model and define it as a sequence of natural numbers that increases by one every hour. For example, Tr is 1 for the first hour of 2006, 2 for the second hour, and so on. The trend variable is a linear approximation of the load growth. The trend expression of economic growth is:

$$\mathrm{Load} = \beta_0 + \beta_1 T_r + e_i \tag{5}$$
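The trend variable can be computed directly from a timestamp. The sketch below assumes hourly timestamps without daylight-saving gaps and takes the first hour of 2006 as the origin, as in the example in the text; `trend_index` and `ORIGIN` are hypothetical names.

```python
from datetime import datetime

# First hour of the history (assumed origin, following the example in the text).
ORIGIN = datetime(2006, 1, 1, 0)


def trend_index(ts, origin=ORIGIN):
    """Return Tr for an hourly timestamp: 1 for the first hour, 2 for the second, ..."""
    hours_elapsed = int((ts - origin).total_seconds() // 3600)
    return hours_elapsed + 1


first = trend_index(datetime(2006, 1, 1, 0))
second = trend_index(datetime(2006, 1, 1, 1))
start_2007 = trend_index(datetime(2007, 1, 1, 0))
```

Because 2006 has 365 days, the first hour of 2007 receives the index 365 × 24 + 1.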

3.2. Date variable

Power consumption behavior is one of the main factors that affect load fluctuation. This section uses date variables to describe the periodic daily, weekly and yearly characteristics of the load. As can be seen from Figure 1, the load fluctuates with an annual period. The yearly component of the load is closely related to seasonal climate: the load peaks in summer and winter and is lowest in spring and autumn. This paper introduces the dummy variable M with 12 categories, treated as follows.

[Figure 1 panels: (a) Hourly Load, Load/MW versus Time/h; (b) Hourly Temperature, Temperature versus Time/h; time axis from 2006/1/1 to 2013/1/1]

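$$X_1 = \begin{cases} 1, & \text{January} \\ 0, & \text{others} \end{cases} \quad X_2 = \begin{cases} 1, & \text{February} \\ 0, & \text{others} \end{cases} \quad \cdots \quad X_{11} = \begin{cases} 1, & \text{November} \\ 0, & \text{others} \end{cases} \quad X_{12} = \begin{cases} 1, & \text{December} \\ 0, & \text{others} \end{cases}$$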

The regression equation describing the monthly load characteristics is:

$$\begin{aligned} Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{12} X_{12} \\ \mathrm{Load}_t &= \beta_0 + \beta_1 M_t + \beta_2 D_t + \beta_3 H_t + \beta_4 D_t H_t + e \end{aligned} \tag{6}$$

where β denotes the regression coefficients, Mt, Dt and Ht are the month, day and hour dummy variables, DtHt is the interaction between the dummy variables D (day) and H (hour), and e is the random error. When the load in January is described, the variables in the regression equation are X1 = 1 and X2 = X3 = … = X12 = 0.

The load also differs considerably between date types within a week, but it shows a clear periodic pattern, as shown in Figure 2. On normal days there is a significant difference between weekdays and weekends: the total weekend load is significantly lower than the weekday load, although the daily cyclical pattern remains.

[Figure 2 plot: Load/MW (4000 to 12000) versus Week/h, from Saturday to the Sunday of the following week]

Figure 2. Weekly load (2006/3/25—2006/4/2)

[Figure 3 plot: Load/MW versus Temperature, with the fitted Section Function, Quadratic Function and Cubic Function]

Figure 3. The fitting plot of hourly temperature- hourly load

To describe these load characteristics, we introduce the dummy variable D for the load difference between date types. A week is divided into 7 categories, so 6 dummy variables are introduced and processed in the same way as the monthly variable M [23]. Within the daily cycle, the load characteristics differ significantly between the hours of the day, and nighttime consumption is significantly lower than daytime consumption because industrial consumption is reduced. The dummy variable H is introduced to describe this; it has 24 categories and therefore 23 dummy variables.

The morning load on a working day is significantly higher than on a holiday morning, because people do not have to get up early for work on a day off and so use less electricity. We therefore introduce the interaction between H and D into the model. Each holiday generally occurs in a fixed period every year, and its load composition differs from that of normal days. During holidays, a large number of factories, enterprises and institutions stop drawing power, and the remaining load consists mainly of residential load, commercial load and continuous industrial load, so the load is significantly lower than on a normal day [24]. Under the flexible holiday adjustment policy, holidays may be merged into extended holidays or exchanged for working days; accounting for this improves the overall forecast. In summary, the regression equation can be expressed as follows:

$$\mathrm{Load}_t = \beta_0 + \beta_1 M_t + \beta_2 D_t + \beta_3 H_t + \beta_4 D_t H_t + e \tag{7}$$
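A hedged Python sketch of how the regressors of Eq. (7) might be built for one timestamp is given below. It assumes the K-1 dummy coding of Section 2 (January, Monday and hour 0 are the arbitrary reference categories) and leaves out holiday handling; the helper names are hypothetical.

```python
from datetime import datetime


def one_hot_drop_first(index, n_categories):
    """K-1 dummy coding: category 0 is the reference and maps to all zeros."""
    row = [0] * (n_categories - 1)
    if index > 0:
        row[index - 1] = 1
    return row


def date_features(ts):
    """Build the M, D, H dummies and the D x H interaction for one timestamp."""
    m = one_hot_drop_first(ts.month - 1, 12)   # 11 month dummies
    d = one_hot_drop_first(ts.weekday(), 7)    # 6 day-of-week dummies
    h = one_hot_drop_first(ts.hour, 24)        # 23 hour dummies
    dh = [di * hj for di in d for hj in h]     # 6 * 23 interaction columns
    return m + d + h + dh


# 2006/3/25 is the Saturday on which the week shown in Figure 2 starts.
row = date_features(datetime(2006, 3, 25, 8))
n_columns = len(row)
n_active = sum(row)
```

Each timestamp activates at most one month dummy, one day dummy, one hour dummy and one interaction column, so the design matrix is sparse even though it is wide.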

3.3. Temperature Scenarios Generation

1) Analysis of Temperature Variables

In this section, a cubic load-temperature function is introduced. Figure 3 gives an hourly temperature-load scatter plot for the region from 2006 to 2011, together with its piecewise linear, quadratic and cubic fitting functions. The temperature-load relationship is asymmetric, whereas a quadratic function can only describe a symmetric relationship. The cubic function is therefore better suited to load forecasting than the quadratic function.

2) Interaction of Temperature Variables

Summer temperatures are higher than winter temperatures, and the temperature differs from month to month, so the interaction M×T between the month variable M and the temperature variable T should be considered. Within a day, the temperature also changes regularly with the time of day: daytime temperatures are higher than nighttime temperatures, so the interaction between the variables H and T also needs to be considered. The temperature function in the regression model is:

$$\mathrm{Load}_t = \beta_0 + \beta_1 T_t + \beta_2 T_t^2 + \beta_3 T_t^3 + \beta_4 T_t H_t + \beta_5 T_t^2 H_t + \beta_6 T_t^3 H_t + \beta_7 T_t M_t + \beta_8 T_t^2 M_t + \beta_9 T_t^3 M_t + e \tag{8}$$

where β denotes the regression coefficients and Mt and Ht are dummy variables. H×T, H×T² and H×T³ are the interactions between the variables H and T, M×T, M×T² and M×T³ are the interactions between the variables M and T, and e is the random error.

3) Proximity of Temperature

The proximity (recency) effect is a phenomenon in psychology: when people memorize a series of items, the last items are remembered better than those in the middle. A similar effect exists between load and temperature, in that the temperatures before the current time also affect the load. We therefore add preceding temperatures to the model in the same form as T. $T_{t-i}$ denotes the temperature i hours before the current time (i = 1, 2, 3), giving the variables $T_{t-i}$, $T_{t-i}^2$, $T_{t-i}^3$, $T_{t-i}H_t$, $T_{t-i}^2H_t$, $T_{t-i}^3H_t$, $T_{t-i}M_t$, $T_{t-i}^2M_t$ and $T_{t-i}^3M_t$. $T_a$ denotes the average temperature over the 24 hours before the current time, giving the proximity variables $T_a$, $T_a^2$, $T_a^3$, $T_aH_t$, $T_a^2H_t$, $T_a^3H_t$, $T_aM_t$, $T_a^2M_t$ and $T_a^3M_t$.

Considering the proximity effect, the load forecasting function is:

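$$\begin{aligned}
\mathrm{Load}_t = {} & \beta_0 + \beta_1 T_t + \beta_2 T_t^2 + \beta_3 T_t^3 + \beta_4 T_t H_t + \beta_5 T_t^2 H_t + \beta_6 T_t^3 H_t + \beta_7 T_t M_t + \beta_8 T_t^2 M_t + \beta_9 T_t^3 M_t \\
& + \sum_{i=1}^{3} \left( \beta_{i1} T_{t-i} + \beta_{i2} T_{t-i}^2 + \beta_{i3} T_{t-i}^3 + \beta_{i4} T_{t-i} H_t + \beta_{i5} T_{t-i}^2 H_t + \beta_{i6} T_{t-i}^3 H_t + \beta_{i7} T_{t-i} M_t + \beta_{i8} T_{t-i}^2 M_t + \beta_{i9} T_{t-i}^3 M_t \right) \\
& + \beta_{a1} T_a + \beta_{a2} T_a^2 + \beta_{a3} T_a^3 + \beta_{a4} T_a H_t + \beta_{a5} T_a^2 H_t + \beta_{a6} T_a^3 H_t + \beta_{a7} T_a M_t + \beta_{a8} T_a^2 M_t + \beta_{a9} T_a^3 M_t + e
\end{aligned} \tag{9}$$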
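The proximity inputs of Eq. (9) can be derived from an hourly temperature series as in the following sketch. It assumes that the series is a plain list indexed by hour and that the 24-hour average covers the 24 hours strictly before the current hour; the function names are hypothetical.

```python
def proximity_temperatures(temps, t):
    """Return ([T_{t-1}, T_{t-2}, T_{t-3}], T_a) for hour index t.

    T_a is the mean of the 24 hourly temperatures before hour t.
    """
    if t < 24:
        raise ValueError("need at least 24 earlier hourly readings")
    lags = [temps[t - i] for i in (1, 2, 3)]
    t_avg = sum(temps[t - 24:t]) / 24.0
    return lags, t_avg


def cubic_terms(x):
    """Polynomial terms x, x^2, x^3 used for every temperature variable."""
    return [x, x ** 2, x ** 3]


# Synthetic series 0, 1, 2, ... used only to make the indexing visible.
temps = [float(h) for h in range(48)]
lags, t_avg = proximity_temperatures(temps, 24)
lag_features = [term for lag in lags for term in cubic_terms(lag)]
```

The interaction columns with the hour and month dummies are then obtained by multiplying each of these terms with the corresponding dummy, in the same way as for the current temperature.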

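In summary, the multiple linear regression load forecasting model based on the time-temperature proximity effect is:

$$\begin{aligned}
\mathrm{Load}_t = {} & \beta_0 + \beta_1 Tr_t + \beta_2 M_t + \beta_3 D_t + \beta_4 H_t + \beta_5 D_t H_t \\
& + \beta_6 T_t + \beta_7 T_t^2 + \beta_8 T_t^3 + \beta_9 T_t H_t + \beta_{10} T_t^2 H_t + \beta_{11} T_t^3 H_t + \beta_{12} T_t M_t + \beta_{13} T_t^2 M_t + \beta_{14} T_t^3 M_t \\
& + \sum_{i=1}^{3} \left( \beta_{i6} T_{t-i} + \beta_{i7} T_{t-i}^2 + \beta_{i8} T_{t-i}^3 + \beta_{i9} T_{t-i} H_t + \beta_{i10} T_{t-i}^2 H_t + \beta_{i11} T_{t-i}^3 H_t + \beta_{i12} T_{t-i} M_t + \beta_{i13} T_{t-i}^2 M_t + \beta_{i14} T_{t-i}^3 M_t \right) \\
& + \beta_{a6} T_a + \beta_{a7} T_a^2 + \beta_{a8} T_a^3 + \beta_{a9} T_a H_t + \beta_{a10} T_a^2 H_t + \beta_{a11} T_a^3 H_t + \beta_{a12} T_a M_t + \beta_{a13} T_a^2 M_t + \beta_{a14} T_a^3 M_t + e \quad (i = 1, 2, 3)
\end{aligned} \tag{10}$$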

4) Probabilistic Temperature Scenario Generation

The meteorological characteristics of the same period are similar from year to year, but a given weather pattern may arrive a few days earlier or later. For example, the temperature in May 2006 may resemble the temperature in April or June 2009. Load and temperature are strongly correlated, and the load follows temperature changes: if high temperatures persist for a long time in summer, people keep using air conditioning, which increases the load. Such temperature variations lead to large load differences, so they should be considered in long-term load forecasting, and power planning and scheduling should be prepared to cope with this meteorological uncertainty.

In this section, a probabilistic temperature scenario generation method based on moving temperature scenarios is proposed and compared with the fixed temperature scenario generation method. With k years of history, the fixed method yields k temperature scenarios. The moving-day temperature method exploits the variability of temperature: the historical temperature series is moved forward or backward by up to n days to create more equally probable temperature scenarios. Table 2 illustrates a move of one day forward and one day backward. If k years of history are each moved forward and backward by up to n days, (2n+1)k temperature scenarios are generated.

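Table 2. Schematic diagram of shifted data

| Scenario | Preceding day | First day | Last day |
|---|---|---|---|
| Base year | $T_{d(365,\,i-1)}$ | $T_{d(1,\,i)}$ | $T_{d(365,\,i)}$ |
| Moved forward by 1 day | $T_{d(1,\,i)}$ | $T_{d(2,\,i)}$ | $T_{d(1,\,i+1)}$ |
| Moved back by 1 day | $T_{d(364,\,i-1)}$ | $T_{d(365,\,i-1)}$ | $T_{d(364,\,i)}$ |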
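The shifting scheme can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it shifts each year circularly, whereas the schematic in Table 2 takes the days at the edges from the neighbouring year. The function name and the synthetic data are hypothetical.

```python
def shifted_scenarios(history, n):
    """Create moving-day temperature scenarios.

    history: dict mapping a year to its list of daily temperatures.
    n: maximum number of days to move forward (+) or backward (-).
    Returns a dict keyed by (year, shift) with one shifted series per key.
    """
    scenarios = {}
    for year, temps in history.items():
        length = len(temps)
        for shift in range(-n, n + 1):
            # Simplifying assumption: wrap around inside the same year
            # instead of borrowing days from the adjacent year.
            scenarios[(year, shift)] = [
                temps[(day + shift) % length] for day in range(length)
            ]
    return scenarios


# Synthetic data: the "temperature" of each day is just its index.
history = {2011: list(range(365)), 2012: list(range(365))}
scenarios = shifted_scenarios(history, n=1)
```

With k = 2 years and n = 1 the sketch produces (2n+1)k = 6 scenarios, and the unshifted series (shift 0) reproduces the fixed-day scenario.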

The number of history years k and the number of moving days n are the key parameters that determine the prediction accuracy. We select them by minimizing a probabilistic error score, defined as:

$$S(y_t, y_{t,q}, q) = \begin{cases} \left(1 - \dfrac{q}{100}\right)\left(y_{t,q} - y_t\right), & y_t < y_{t,q} \\ \dfrac{q}{100}\left(y_t - y_{t,q}\right), & y_t \ge y_{t,q} \end{cases} \tag{11}$$

where q is the given quantile index (q = 1, 2, …, 99), yt is the actual load at time t, and yt,q is the q-th quantile load forecast at time t. The smaller the score, the smaller the error.
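The score can be sketched in Python under the assumption that Eq. (11) is the standard pinball (quantile) loss. Averaging the score over all hours and over q = 1, …, 99 is a common convention and is assumed here rather than taken from the paper; the function names are hypothetical.

```python
def pinball_loss(y, y_q, q):
    """Pinball loss of the q-th quantile forecast y_q against the actual load y."""
    tau = q / 100.0
    if y < y_q:
        return (1.0 - tau) * (y_q - y)   # forecast above the actual value
    return tau * (y - y_q)               # forecast at or below the actual value


def probabilistic_error(actuals, quantile_forecasts):
    """Average pinball loss over all time points and quantiles.

    quantile_forecasts[t][q - 1] is the q-th quantile forecast at time t.
    """
    total, count = 0.0, 0
    for y, forecasts in zip(actuals, quantile_forecasts):
        for q, y_q in enumerate(forecasts, start=1):
            total += pinball_loss(y, y_q, q)
            count += 1
    return total / count


over = pinball_loss(100.0, 110.0, 90)    # 90 % quantile above the actual load
under = pinball_loss(120.0, 110.0, 90)   # 90 % quantile below the actual load
```

The asymmetry is intentional: for a high quantile, an actual load above the forecast is penalized more heavily than one below it, which rewards quantiles that are exceeded with the intended frequency.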

3.4. System flow chart

This section describes how the forecast for an entered date is produced. First, it is determined whether the date is a normal working day. If it is not, the forecast is corrected according to the temperature scenario before entering the normal cycle. In the normal cycle, a forecast is made once per hour for 24 hours, after which H returns to zero and the forecast for a new day begins.

The flow chart of the load forecasting system is shown in Figure 4.

[Figure 4 flow chart: Start; establish the generalized linear regression equation; introduce the trend variable Tr and the date variables (D day, M month, H hour) to obtain the load characteristic equation Loadt = β0 + β1Mt + β2Dt + β3Ht + β4DtHt + e; introduce the temperature variable, consider its interaction with the date variables and choose the temperature scenario to obtain the temperature equation of Eq. (8); construct the prediction model; enter the forecast date and set H = 0, D = 0; determine whether it is a normal working day; calculate the predicted value and compare it with the actual value; output the forecast result; if H < 23, set H = H + 1, otherwise set D = D + 1 and H = 0; repeat until the end date is reached; End]

Figure 4. Flow chart of load forecasting based on temperature scenario

4. Case study

In this section, we use temperature data from Sydney, Australia and the New South Wales Natural Load Dataset. The hourly data from 2006 to 2010 are used to fit the regression model, which then predicts the load in 2011 for error analysis. The mean absolute percentage error (MAPE) is used to evaluate the prediction accuracy of the model.

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\% \tag{12}$$

where $y_t$ is the actual load and $\hat{y}_t$ is the predicted value.
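A minimal Python sketch of the MAPE computation follows. It assumes strictly positive actual loads, so that the division is well defined; the function name and the sample values are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(y - y_hat) / abs(y) for y, y_hat in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up loads in MW: each forecast is off by 10 % of the actual value.
error = mape([100.0, 200.0], [110.0, 180.0])
```

Because each error is divided by the actual load, hours with a low load weigh as much as peak hours, which should be kept in mind when MAPE values for different periods are compared.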

Table 3. Deviation for different models

| Model | R2 | MAPE (%) | Standard deviation (MW) |
|---|---|---|---|
| 1 | 0.867 | 8.94 | 318.30 |
| 2 | 0.944 | 4.27 | 186.57 |
| 3 | 0.952 | 3.33 | 143.40 |
| 4 | 0.967 | 2.95 | 128.59 |

Model 1 considers only the single-factor variables (Tr, M, D, H, T, T² and T³); its accuracy is low because the interactions between variables are ignored. Model 2 adds the coupled temperature and date variables (M∙T, M∙T², M∙T³, D∙T, D∙T², D∙T³, D∙H, H∙T, H∙T² and H∙T³) to the regression model to improve the forecasting accuracy. Model 3 revises the coupled variables of Model 2. Model 4 adds the short-term effect of temperature: the corrected R² reaches 0.967 and the MAPE falls to 2.95%, which confirms that the short-term temperature effect improves the prediction accuracy.

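Prediction model 4, which considers temperature, is

$$L_t = \beta_1 Tr_t + \beta_2 M_t + \beta_3 D_t + \beta_4 H_t + \beta_5 D_t H_t + f(T_t) \tag{13}$$

where $f(T_t)$ is the temperature model, given by

$$\begin{aligned} f(T_t) = {} & \beta_6 T_t + \beta_7 T_t^2 + \beta_8 T_t^3 + \beta_9 M_t T_t + \beta_{10} M_t T_t^2 + \beta_{11} M_t T_t^3 \\ & + \beta_{12} H_t T_t + \beta_{13} H_t T_t^2 + \beta_{14} H_t T_t^3 + \beta_{15} M_t H_t T_t + \beta_{16} M_t H_t T_t^2 + \beta_{17} M_t H_t T_t^3 \end{aligned} \tag{14}$$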

Over the year, summer temperatures are higher than winter temperatures, that is, the temperature differs between months, so the interactions between the variables M and T (M×T, M×T² and M×T³) should be considered. Within a day, the temperature also changes periodically with the time of day: the temperature at noon is higher than at night.

4.1. Data length for estimating the regression parameters

The length of the historical data used in the regression model is a key factor affecting the forecasting accuracy. Table 4 lists the errors for different data lengths. The second row of Table 4, for example, gives the errors when the last 2 years of history are used to forecast the load. The averages show that the minimum parameter-estimation error, 3.17%, is obtained with 3 years of history. This paper therefore uses the last 3 years of data to estimate the regression parameters.

Table 4. Error for different data length

| Data length (years) | 2009 (%) | 2010 (%) | 2011 (%) | 2012 (%) | Average (%) |
|---|---|---|---|---|---|
| 1 | 4.09 | 3.23 | 3.19 | 2.89 | 3.35 |
| 2 | 4.30 | 2.94 | 2.94 | 2.67 | 3.21 |
| 3 | 4.26 | 2.89 | 2.96 | 2.58 | 3.17 |
| 4 | 4.37 | 2.64 | 3.20 | 2.70 | 3.23 |
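The selection of the data length can be reproduced from the yearly errors in Table 4 with a few lines of Python. The dictionary below copies the table values; the variable names are hypothetical.

```python
# Yearly errors (%) from Table 4, keyed by data length in years.
errors = {
    1: [4.09, 3.23, 3.19, 2.89],
    2: [4.30, 2.94, 2.94, 2.67],
    3: [4.26, 2.89, 2.96, 2.58],
    4: [4.37, 2.64, 3.20, 2.70],
}

# Average error per data length, then the length with the smallest average.
averages = {length: sum(vals) / len(vals) for length, vals in errors.items()}
best_length = min(averages, key=averages.get)
```

The averages computed this way agree with the last column of Table 4 after rounding to two decimals, and the minimum is reached for a data length of 3 years.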

4.2. Temperature scenario generation

The probabilistic load forecasting procedure is as follows. First, the probabilistic parameter optimization is performed and the k-n parameter with the highest accuracy is selected. Then (2n+1)k temperature scenarios are created, and each scenario is used as the input of the prediction model to simulate the temperature of the forecast year. Forecasting each scenario separately gives (2n+1)k prediction results, from which the median or the interval bounds can be determined. This is of great significance for guiding medium- and long-term power grid planning and scheduling.
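Turning the per-scenario forecasts into a median and an interval can be sketched as follows. The sketch assumes that all scenarios are equally probable, as stated for the moving-day scenarios, and uses the 10% and 90% quantiles as interval bounds; the function name and the sample values are hypothetical.

```python
from statistics import quantiles


def forecast_band(scenario_forecasts):
    """Return the (10 %, 50 %, 90 %) quantiles of the forecasts for one time point.

    scenario_forecasts holds one load forecast per temperature scenario.
    """
    deciles = quantiles(scenario_forecasts, n=10, method="inclusive")
    return deciles[0], deciles[4], deciles[8]


# Made-up forecasts from 11 equally probable scenarios.
lower, median, upper = forecast_band([float(v) for v in range(1, 12)])
```

Applying `forecast_band` to every forecast hour gives the lower bound, the median curve and the upper bound of the probabilistic forecast.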

Temperature scenarios were generated from the history of 2005-2012, and the load was forecast for the actual date types of 2013 (k = 1, 2, ..., 8). When k = 1, the scenario is generated from the 2012 temperature data; when k = 2, it is based on 2011-2012, and so on. This section searches for the optimal k-n parameter by moving the historical temperature data of k year(s) by n day(s). Table 5 shows the probabilistic error of the different temperature scenarios.

As can be seen in Table 5, the optimal k-n parameter is 8-13, that is, 8 years of history and 13 moving days. Its probabilistic error is 59.35%, which is more accurate than the 63.05% of the 8-year fixed-day temperature scenario. The probabilistic error for the 8-year fixed-day temperature scenario was 4.69%, and the median MAPE was 4.51%. Figure 5 shows the probabilistic error curves of the temperature scenarios based on different numbers of history years (k = 1, 2, ..., 8). As the number of moving days increases, the probabilistic error first falls and then fluctuates. In the initial stage, the forecasting accuracy can be improved significantly by increasing the number of moving days. For k = 1, that is, when only the 2012 temperature data are used, moving the data by three days reduces the probabilistic error from 78.37% to 66.27% and the median MAPE from 5.67 to 5.01, an accuracy improvement of 11.6%.
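As a check on the last figure, the 11.6% is the relative reduction of the median MAPE quoted for k = 1:

$$\frac{5.67 - 5.01}{5.67} = \frac{0.66}{5.67} \approx 0.116 = 11.6\%$$

The same calculation applied to the probabilistic error gives $(78.37 - 66.27)/78.37 \approx 15.4\%$.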

Table 5. Probabilistic error (%) of different temperature scenarios

| n \ k | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 0 | 78.37 | 77.23 | 69.63 | 67.86 | 64.48 | 63.49 | 63.05 | 63.05 |
| 1 | 76.42 | 76.41 | 63.36 | 64.45 | 61.83 | 61.08 | 60.99 | 63.58 |
| 2 | 70.43 | 68.82 | 61.63 | 62.41 | 60.61 | 60.12 | 60.13 | 61.57 |
| 3 | 66.27 | 64.34 | 60.79 | 61.21 | 59.99 | 59.94 | 59.86 | 60.02 |
| 4 | 65.45 | 64.17 | 60.49 | 61.00 | 59.78 | 59.87 | 59.63 | 60.09 |
| 5 | 64.77 | 63.71 | 60.40 | 60.32 | 59.77 | 59.77 | 59.63 | 59.92 |
| 6 | 63.57 | 62.46 | 60.38 | 60.83 | 59.83 | 59.76 | 59.68 | 59.55 |
| 7 | 63.62 | 63.50 | 60.31 | 60.63 | 59.88 | 59.76 | 59.70 | 59.70 |
| 8 | 62.61 | 62.8 | 60.26 | 60.55 | 59.87 | 59.75 | 59.73 | 59.69 |
| 9 | 62.80 | 62.67 | 60.24 | 60.30 | 59.82 | 59.62 | 59.78 | 59.48 |
| 10 | 62.56 | 62.64 | 60.22 | 60.28 | 59.80 | 59.75 | 59.84 | 59.42 |
| 11 | 61.91 | 62.21 | 60.20 | 60.13 | 59.73 | 59.74 | 59.87 | 59.38 |
| 12 | 61.03 | 61.90 | 60.17 | 60.01 | 59.68 | 59.73 | 59.86 | 59.35 |
| 13 | 61.72 | 61.85 | 60.20 | 60.04 | 59.70 | 59.72 | 59.93 | 59.43 |
| 15 | 61.23 | 62.16 | 60.32 | 60.4 | 59.91 | 59.87 | 60.12 | 59.75 |
| 20 | 62.03 | 62.57 | 60.73 | 61.08 | 60.07 | 60.20 | 60.30 | 60.29 |
| 30 | 62.85 | 62.71 | 61.5 | 61.87 | 60.78 | 60.94 | 61.15 | 61.07 |

Figure 5. Curve of Probabilistic error

The regression model parameters are estimated using the temperature scenarios, and the load for 2014 is forecast. In Figure 6, the dashed lines are the forecast curves based on the temperature scenarios, the black line is the actual load and the red line is the median load. Compared with the fixed-day temperature scenario, the forecasting accuracy of the moving-day temperature scenario is clearly improved [26].

Figure 6. Probabilistic load forecasts for different term (2014)

Table 6 summarizes the forecast results. For July 16 to July 22, 2014, the probabilistic error with the moving-day temperature is 1.82, which is smaller than the 2.62 obtained with the fixed-day temperature. The median MAPE is 4.07, a reduction of 40.32% relative to the 6.82 obtained with the fixed-day temperature. For the whole of 2014, the median-load MAPE drops from 4.90 to 4.76. The predictive accuracy of the probabilistic load forecast based on the moving-day temperature scenario is thus significantly higher than that of the fixed-day forecast, and moving the daily temperature makes it possible to reach a higher forecast level especially when historical temperature data are limited.

[Figure 5 axes: Error versus Moving Days (n)]

[Figure 6 panels: (a) Fixed-day Forecast; (b) Mobile-day Forecast; Load/MW versus Time/h, 7/16 to 7/22]

Table 6. Error statistics

Error               2014                      7/16-7/22, 2014
                    Fixed-day   Mobile-day    Fixed-day   Mobile-day
Probability error   63.63       60.53         2.62        1.82
Median (MAPE)       4.90        4.76          6.82        4.07

Based on the preferred parameter k-h and the actual 2014 date types, we forecast the 2014 monthly electricity consumption and monthly maximum load. A 10% quantile, a median and a 90% quantile load value are taken at each forecast point.
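Taking the three quantiles at each forecast point can be sketched as follows, assuming that every temperature scenario produces one load forecast per point and that quantiles are computed across scenarios with linear interpolation. The interpolation rule, the function names and the load values are assumptions for illustration only.

```python
def quantile(values, q):
    """Quantile of `values` with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def quantile_band(scenario_forecasts):
    """For each forecast point, return (10% quantile, median, 90% quantile).

    scenario_forecasts: one list of load values per temperature scenario.
    """
    band = []
    for point_values in zip(*scenario_forecasts):
        band.append(tuple(quantile(point_values, q) for q in (0.1, 0.5, 0.9)))
    return band


# Hypothetical loads (MW) at 3 forecast points under 5 scenarios.
forecasts = [
    [4000, 4200, 3900],
    [4100, 4300, 4000],
    [4200, 4400, 4100],
    [4300, 4500, 4200],
    [4400, 4600, 4300],
]
band = quantile_band(forecasts)
for low, median, high in band:
    print(round(low, 1), round(median, 1), round(high, 1))
```

Each tuple in `band` corresponds to one forecast point, so plotting the three columns against time gives the three quantile lines.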

Figures 7 and 8 show the monthly maximum load and the monthly electricity consumption, respectively. The broken line in each figure represents the predicted value based on the historical temperature and the daily moving temperature data from 2005 to 2013. The three solid lines, from top to bottom, are the 90% quantile, the median and the 10% quantile, and the solid black points represent the actual load.

Figure 7. Monthly peak load (2014)

Figure 8. Monthly load (2014)

As can be seen from the figures, most of the actual load points fall near the median, and a few points fall outside the range between the 10% and 90% quantile lines. The 10% and 90% quantile loads indicate extreme cases with a lower probability of occurrence, but this does not mean that they never happen. The maximum load in February, October, November and December and the total electricity consumption in June and October all lie on the 10% quantile line. In March and April, the electricity consumption lies on the 90% quantile line. The maximum load in May is below the 10% quantile, and the electricity consumption in January exceeds the 90% quantile. The defined prediction interval therefore reflects the real value of the load more accurately than a single point forecast.
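Checking where the actual values fall relative to the 10% and 90% quantile lines can be sketched as below. The monthly values are hypothetical placeholders, not the data behind Figures 7 and 8, and the function names are illustrative.

```python
def classify_against_band(actual, low, high):
    """Label each actual value as 'below', 'inside' or 'above' its interval."""
    labels = []
    for a, lo, hi in zip(actual, low, high):
        if a < lo:
            labels.append("below")
        elif a > hi:
            labels.append("above")
        else:
            labels.append("inside")
    return labels


def coverage(labels):
    """Fraction of points that fall inside the interval (bounds included)."""
    return labels.count("inside") / len(labels)


# Hypothetical monthly peak loads (MW) with 10% and 90% quantile forecasts.
actual = [5200, 4300, 4500, 4100, 3800, 5000]
low = [4800, 4300, 4200, 3900, 3900, 4700]
high = [5600, 5000, 4900, 4600, 4500, 5500]
labels = classify_against_band(actual, low, high)
print(labels, round(coverage(labels), 2))
```

For a well-calibrated interval between the 10% and 90% quantiles, roughly 80% of the actual values would be expected to fall inside, so a few points outside the band are consistent with the forecast rather than a failure of it.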

Figures 9 and 10 show the monthly maximum load and monthly electricity consumption forecasts for 2012-2014, based on temperature scenarios created with the data of 2005-2011. As can be seen from the figures, in each year the maximum load in February-May and the electricity consumption in August-November are small, even falling below the 10% quantile load value; during these periods, the related planning should follow the 10% quantile. In January of each year, it is reasonable to arrange the power generation plan according to the median monthly maximum load. In July, arranging the power generation plan according to the 90% quantile load is more reasonable.

Figures 7 and 8 axes: Time/Month (1 to 12) against Load/MW (scaled by ×10^6 in the second plot). Legend: 90% quantiles, median, 10% quantiles.


Figure 9. Monthly peak load (2012-2014)

Figure 10. Monthly load (2012-2014)

Compared with point load forecasting, the proposed probabilistic forecasting method provides a range of possible load values, which reflects the fluctuation range and trend of the load more accurately and allows different quantile intervals to be defined as needed. The width of the forecast interval differs between time periods, which provides policy makers with more useful information than point forecasts can offer.

5. Conclusion

This paper extends the multiple linear regression model into an adaptive multiple polynomial regression model. Trend variables, date variables and temperature variables are used as dummy variables to describe the inherent characteristics of future load changes. Economic development, the electricity consumption habits of working days and holidays, the temperature effect and so on are represented as linear, quadratic and cubic terms of the polynomial model. The proposed method uses 12 month, 7 day and 24 hour categories as the main factors for scenario generation. Temperature scenario optimization is applied to analyze the median error and the bounds of the load forecast, and the load forecasting accuracy based on 3 years of history is improved by 3.8%.
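As a minimal sketch of the kinds of regressors summarized above, the function below builds one row of a design matrix from a trend value, a cubic polynomial in temperature, and one-hot dummy variables for the 12 month, 7 weekday and 24 hour categories. The layout, the name `feature_row` and the absence of interaction terms are assumptions for illustration; the paper's full model specification is not reproduced here.

```python
from datetime import datetime


def feature_row(timestamp, temperature, trend):
    """Regressors for one hourly observation (illustrative layout only).

    Returns [1, trend, T, T^2, T^3] followed by one-hot month (12),
    weekday (7) and hour (24) dummy variables.
    """
    month = [1.0 if m == timestamp.month else 0.0 for m in range(1, 13)]
    weekday = [1.0 if d == timestamp.weekday() else 0.0 for d in range(7)]
    hour = [1.0 if h == timestamp.hour else 0.0 for h in range(24)]
    poly = [temperature, temperature ** 2, temperature ** 3]
    return [1.0, float(trend)] + poly + month + weekday + hour


# Hypothetical observation: 18:00 on 16 July 2014 at 12 degrees, trend index 100.
row = feature_row(datetime(2014, 7, 16, 18), temperature=12.0, trend=100)
print(len(row))
```

With an intercept and complete one-hot sets, the columns are linearly dependent, so a least-squares fit would normally drop one level per category or use a regularized solver.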

Case studies show that the proposed probability forecasting method can explain the trend of future

load changes more accurately, and it can provide more useful information for the long-term load

forecasting. It will help policy-makers estimate the possible uncertainties and risk factors of future loads.

This will lay a solid foundation for load forecasting in complex operations.

Acknowledgment

This work was supported by the National Natural Science Foundation of China through the project "Probability prediction and active smoothing theory of renewable energy slope events in AC and DC power grids" (Project No. 51977030).

Author Biographies

Jiang Li obtained a bachelor's degree in electrical engineering and automation from Shanghai Electric Power College in 2003, a master's degree in electrical engineering and automation from Northeast Electric Power University in 2006, and a doctorate in electrical engineering and automation from North China Electric Power University in 2010. He was a visiting scholar at Cornell University in the United States in 2014 and at the American Energy System Research Center in 2015.

Liyang Ren obtained a bachelor's degree in electrical engineering and automation from Northeast Electric Power University in 2017. Since 2017, he has been pursuing a master's degree in Power Systems and Automation at Northeast Electric Power University.

Figures 9 and 10 axes: Time/Month (2012/1 to 2014/7) against Load/MW (scaled by ×10^6 in the second plot). Legend: 90% quantiles, median, 10% quantiles.


References

[1] Kanggu Park; Seungwook Yoon; Euiseok Hwang. Hybrid Load Forecasting for Mixed-Use Complex Based on the Characteristic Load Decomposition by Pilot Signals. IEEE Access, December 2018; pp. 12297-12306.

[2] Mohamed Reda Nezzar; Nadir Farah; Tarek Khadir. Mid-long term Algerian electric load forecasting using regression approach. IEEE Transactions on Power Systems, July 2013; pp. 121-126.

[3] Weicong Kong; Zhao Yang Dong; et al. Short-Term Residential Load Forecasting Based on Resident Behaviour Learning. 2018; pp. 1087-1088.

[4] T. Hong; J. Wilson; J. Xie. Long term probabilistic load forecasting and normalization with hourly information. 2013; pp. 456-462.

[5] Qingshan Xu; Yifan Ding; Qingguo Yan; et al. Day-Ahead Load Peak Shedding/Shifting Scheme Based on Potential Load Values Utilization: Theory and Practice of Policy-Driven Demand Response in China. IEEE Access, August 2017; pp. 22892-22901.

[6] Chen Y.; Kloft M.; Yang Y.; et al. Mixed kernel based extreme learning machine for electric load forecasting. Neurocomputing, 2018.

[7] Zhang X.; Wang R.; Zhang T.; et al. Short-Term load forecasting using a novel deep learning framework. Energies, 2018, 11, 1554.

[8] Shepero M.; Meer D. V. D.; Munkhammar J.; et al. Residential probabilistic load forecasting: A method using Gaussian process designed for electric load data. Applied Energy, 2018; pp. 159-172.

[9] Bowen Li; Jing Zhang; Yu He; et al. Short-Term Load-Forecasting Method Based on Wavelet Decomposition With Second-Order Gray Neural Network Model Combined With ADF Test. IEEE Access, May 2017; pp. 16324-16331.

[10] Li Y.; Huang Y.; Zhang M.; et al. Short-Term load forecasting for electric vehicle charging station based on niche immunity lion algorithm and convolutional neural network. Energies, 2018.

[11] Wang Y.; Zhang N.; Chen Q.; et al. Data-driven probabilistic net load forecasting with high penetration of invisible PV. IEEE Transactions on Power Systems, 2017; pp. 1-1.

[12] Fan G.F.; Peng L.L.; Hong W.C. Short term load forecasting based on phase space reconstruction algorithm and bi-square kernel regression model. Applied Energy, 2018, 224, 13-33.

[13] Giuseppe Fenza; Mariacristina Gallo; Vincenzo Loia. Drift-Aware Methodology for Anomaly Detection in Smart Grid. IEEE Access, December 2018; pp. 9645-9657.

[14] Singh P.; Dwivedi P. Integration of new evolutionary approach with artificial neural network for solving short term load forecast problem. Applied Energy, 2018, 217, 537-549.

[15] Prakash A.; Xu S.; Rajagopal R.; et al. Robust building energy load forecasting using physically-based kernel models. Energies, 2018, 11, 862.

[16] Yang Y.; Li S.; Li W.; et al. Power load probability density forecasting using Gaussian process quantile regression. Applied Energy, 2018, 213.

[17] Barman M.; Choudhury N.B.D.; Sutradhar S. A regional hybrid GOA-SVM model based on similar day approach for short-term load forecasting in Assam, India. Energy, 2018, 145.

[18] Karimi M.; Karami H.; Gholami M.; et al. Priority index considering temperature and date proximity for selection of similar days in knowledge-based short term load forecasting method. Energy, 2018, 144, 928-940.

[19] Simona Vasilica Oprea; Adela Bâra; Vlad Diaconta. Sliding Time Window Electricity Consumption Optimization Algorithm for Communities in the Context of Big Data Processing. IEEE Access, December 2018; pp. 13050-13067.

[20] Yang Z.C. Discrete cosine transform-based predictive model extended in the least-squares sense for hourly load forecasting. IET Generation Transmission & Distribution, 2016, 10, 3930-3939.

[21] Kaur A.; Nonnenmacher L.; Coimbra C.F.M. Net load forecasting for high renewable energy penetration grids. Energy, 2016, 114, 1073-1084.


[22] Gu C.; Jirutitijaroen P. Dynamic state estimation under communication failure using kriging based bus load forecasting. IEEE Transactions on Power Systems, 2015, 30, 2831-2840.

[23] Park H.; Baldick R.; Morton D.P. A stochastic transmission planning model with dependent load and wind forecasts. IEEE Transactions on Power Systems, 2015, 30, 3003-3011.

[24] Che J.X.; Wang J.Z. Short-term load forecasting using a kernel-based support vector regression combination model. Applied Energy, 2014, 132, 602-609.

[25] Hernández L.; Baladrón C.; Aguiar J.M.; et al. Artificial neural networks for short-term load forecasting in microgrids environment. Energy, 2014, 75, 252-264.

[26] Vasudev Dehalwar; Akhtar Kalam; Mohan Lal Kolhe; et al. Electricity load forecasting for urban area using weather forecast information. IEEE International Conference on Power and Renewable Energy, Oct 2016; pp. 21-23.