Short-term Forecasting of Nodal Electricity
Demand in New Zealand
Erwann Sbaï and Michael Simpson [1]
December 2008
[1] University of Auckland. This work is part of an Honours dissertation written by Michael Simpson under the supervision of Erwann Sbaï.
Abstract
This paper compares the accuracy of several models for short-term electricity load forecasting
for nodes in New Zealand. The literature on short-term electricity load forecasting is reviewed,
resulting in the selection of the seasonal ARIMA model and the double seasonal Holt-Winters
exponential smoothing model for empirical testing. We also estimate a one season ARIMA
model with some double seasonal aspects. A multivariate regression model is also developed
for comparison, utilising aspects of several multivariate models used in other short-term
electricity load forecasting papers. A time series of half-hourly electricity demand in the
Hayward node in New Zealand is used for estimation and forecasting. The data clearly indicates
a daily and a weekly seasonal pattern. The seasonal ARIMA model outperforms the Holt-Winters
and multivariate regression models in forecasting electricity demand from half an hour
to three hours ahead. The superiority of the seasonal ARIMA model increases as the forecasting
length increases. Significant autocorrelation of residuals in the Holt-Winters and multivariate
regression models leads to an error adjustment term being included, which improves the
forecasting accuracy of the Holt-Winters models but not the multivariate regression model.
1. Introduction
Accurate forecasting of short-term electricity load is important for all participants in the New
Zealand electricity spot market. Generators require accurate forecasts in order to determine
the most efficient method of producing the load needed in the forthcoming periods, and to
submit supply schedules to the spot market for each half-hour period of the day, in order to
maximise profits with respect to the other generators' bids. Transpower, the systems operator,
utilises forecasts of demand in forthcoming periods to determine the lowest cost method of
providing the required quantity of electricity for each node in New Zealand, given the supply
bids submitted by each generator. Overestimation of the quantity required results in financial
losses from generators being kept on standby unnecessarily, while underestimation can result
in the costly start-up of cold generators or, in the worst-case scenario, the frequency dropping
from the required 50 Hz, with the potential tripping of generators resulting in a blackout (Bunn,
1982).
The New Zealand electricity transmission grid is divided into 244 nodes, where generated
electricity exits the high-voltage network and is distributed to customers. Electricity generators
in New Zealand submit a supply function at least two hours ahead for each node they are
willing to supply, for each half-hour period of the day. The system allows each
generator to submit five price bands, and the quantity they are willing to supply at each price.
Transpower forecasts the quantity required in each node for the approaching half hour, and
calculates the quantities each generator should supply to each node in order to achieve the
lowest cost. Each generator is paid the price of the last quantity required to meet the demand
forecast in each node. This mechanism is illustrated in Figure 1, in which Generators 1, 2 and 3
submit their supply schedules to the systems operator. Transpower determines that 540 MW
are needed for the approaching half hour. Adding the quantities provided at the lowest cost
from the three generators to cover the required quantity results in a price of $75.02 per
megawatt generated.
Figure 1: Price Determination in Electricity Spot Market
Source: Genesis Energy
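The price-setting rule described above can be sketched in a few lines of Python. This is a simplified, single-node illustration only: the function name `clear_market` and the offers are hypothetical and are not the values behind Figure 1.

```python
def clear_market(offers, demand_mw):
    """Dispatch the cheapest offers first and return (price, dispatch).

    offers: list of (generator, price, quantity_mw) tuples, one per price band.
    The clearing price is the price of the last band needed to cover demand.
    """
    dispatch = {}
    remaining = demand_mw
    price = None
    for generator, band_price, quantity in sorted(offers, key=lambda o: o[1]):
        if remaining <= 0:
            break
        taken = min(quantity, remaining)
        dispatch[generator] = dispatch.get(generator, 0) + taken
        remaining -= taken
        price = band_price  # price of the marginal band so far
    if remaining > 0:
        raise ValueError("offers do not cover demand")
    return price, dispatch


# Hypothetical price bands (not the data shown in Figure 1).
offers = [
    ("Generator 1", 20.0, 200),
    ("Generator 1", 90.0, 100),
    ("Generator 2", 45.0, 250),
    ("Generator 3", 75.0, 150),
]
price, dispatch = clear_market(offers, demand_mw=540)
```

With these assumed offers, the 540 MW requirement is met partway through Generator 3's band, so that band sets the price paid to every dispatched generator.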
The importance of accurate forecasting and the power of computer algorithms in estimating
complex models have encouraged extensive research into forecasting models for short-term
electricity demand. Seasonal ARIMA and Holt-Winters exponential smoothing have been widely
used as they require only the quantity demanded variable, and are relatively simple and robust
in forecasting. A variety of multivariate models have also been empirically tested in other
papers. In this paper the forecasting ability of seasonal ARIMA, Holt-Winters exponential
smoothing and multivariate regression models will be empirically tested on data from the
Hayward node in New Zealand which, due to its relatively small size, is far more unpredictable
than aggregate country demand data used for empirical testing in other papers. The following
section discusses the relevant literature on the three short-term forecasting models. We then
introduce the data used for estimation and forecasting, and outline the structure of the models
to be estimated. Section 4 discusses the estimation of the models and empirical testing,
followed by a summary and a conclusion.
2. Literature Review
The Holt-Winters exponential smoothing model was pioneered by Winters (1960) in a
comparison of forecasting methods for sales, as a modification of existing exponential
smoothing models. The basic exponential smoothing model forecasts the value of a time series
variable in the next period as a weighted average of the actual value in the current period
and the previous period's forecast for the current period. The Holt-Winters model added a
seasonal ratio and linear trend to this model, to allow for seasonal variations and changes in the
underlying mean over time. Additional modifications have been suggested in order to improve
forecasting accuracy. Chatfield (1978) noted significant autocorrelation of residuals in
comparing Holt-Winters models to Box-Jenkins ARIMA models for forecasting ability, and used
a method suggested by Reid (1975) for improving the forecasting ability of exponential
smoothing. The adjustment involves fitting a linear relationship $\lambda e_t$ to the one-step-ahead
forecast. The parameter lambda can be estimated after the estimation of the other parameters
in the model, or simultaneously, which Chatfield suggests may be more efficient.
A second major modification to the Holt-Winters exponential smoothing model is the addition
of a second seasonal index, introduced by Taylor (2003) when comparing univariate methods
for short-term electricity demand forecasting in England and Wales. The double seasonal model
achieves more accurate forecasts than the traditional Holt-Winters model, since electricity
demand clearly has both a daily and a weekly seasonal pattern. Taylor found that by combining
the double seasonal Holt-Winters model with an adjustment for error autocorrelation,
forecasting accuracy was improved. In this paper, one and two season Holt-Winters models will
be estimated with and without the error adjustment term, and a new estimation method will
be discussed in order for the model to be estimated with relatively little knowledge of
programming.
The multiplicative seasonal ARIMA model is well established in the short-term load forecasting
literature, and is often used as a benchmark against which alternative methods are compared. Box, Jenkins, &
Reinsel (1994) noted that the model can be extended to include multiple seasons. Darbellay
and Slama (2000) found that the double seasonal ARIMA model outperformed the non-linear
artificial neural network model in forecasting hourly electricity demand in the Czech Republic.
The suggested reason for the superiority of the linear ARIMA model was the linearity of the
autocorrelation in the data. Taylor (2003) compared double seasonal ARIMA with the double
seasonal Holt-Winters model and found ARIMA to be less accurate. However, Taylor (2006)
used an ARIMA model once again to compare against a variety of non-linear models, and found
ARIMA to be more accurate than all but one. Despite the success of the double seasonal ARIMA
model, the lack of an available estimation method will limit this paper to a one season ARIMA
model, which will contain certain aspects of a double seasonal model, though not in the
orthodox format. [2]
[2] We restrict ourselves to the use of a single econometric software package: EViews 6.0.
Multivariate models for short-term load forecasting are less common than univariate models,
due in part to their impracticality for real-time forecasting. Weather variables are the major
factors affecting electricity demand, and gathering data for predictions in the short term would
be very demanding (Bunn, 1982). Also, weather variables tend to change in a smooth fashion,
which may already be reflected in lagged demand values in univariate models (Taylor, 2003).
Some models have been developed that achieve accurate results, particularly in longer-term
forecasts. Hyde and Hodnett (1997) developed a regression model to predict daily load demand
in Ireland, to replace the Electricity Supply Board's method of forecasting based on observed
loads from a similar day and adjusting based on weather variables. The proposed model
consisted of a normal level for the day type, weather effects, and special events such as power
outages or industrial strikes, estimated using regression analysis. An error adjustment term was
also included, allowing the model to adjust its forecast based on short-term deviations from
previous days' forecasts. Favourable results were reported for most times of the year, and it was
proposed that periods of significant deviation from the forecast be dealt with using rule-based
procedures for handling special days. Cottet and Smith (2003) developed a vector
autoregressive model using temperature and humidity variables, and dummy variables for day
of the week and public holidays, to estimate separate regressions for each of the 48 half-hour
periods of the day. The model is able to predict not only the distribution of the daily load, but also
the time and quantity of the daily peak load. The resulting accuracy is difficult to judge, as other
models were not used for comparison. Our work will combine aspects of these papers to
develop a simple multivariate regression model which includes temperature measurements and
dummy variables for half-hour of the day instead of separate regressions for each half-hour, to
allow for the inclusion of lagged quantity variables.
3. Data and Econometric Models
Data
The data consists of a time series of half-hourly observations of electricity demand from the
Hayward node in New Zealand for the one year and four week period from July 21 2005 to
August 16 2006. This period was chosen because the univariate models will only use the last
twelve weeks from May 25 2006 to August 16 2006, which contain only one public holiday and
no changes in daylight savings, thus preventing unnecessary challenges to the models. The last
four weeks of the sample will be used to test the forecasting accuracy of all of the models, and
the previous eight weeks will be used to estimate the seasonal ARIMA and exponential
smoothing models. The first year, from July 21 2005 to July 20 2006, will be used to estimate
the regression model, since the model requires at least one year of data to estimate all of the
parameters.
Figure 1 shows demand in the Hayward node for each half-hour on July 18 2006. The variation
in demand across time is apparent, with peaks in demand at 8am and 6pm, a trough in the
middle of the day, and a deeper trough around 2am.
Figure 1: Electricity Demand in the Hayward Node on 18 July 2006
Figure 2 shows demand in the Hayward node across a two-week period in July 2006. The strong
daily pattern is reflected in almost all of the twenty-four hour periods shown, with some
variations in the magnitude of the peaks in demand. The two weeks have a similar overall
pattern, with lower than average demand on Monday, Saturday and Sunday, although even this
is variable, with demand on the Saturday of the first week as high as the Wednesday, Thursday
and Friday of that week. These variations in the overall pattern will make forecasting more
difficult, especially over a longer timeframe than one period ahead. Despite variations in the
seasonal pattern, our data clearly has two seasonal patterns, one over 48 periods and one over
336 periods.
[Plot of electricity demand on 18 July 2006. Vertical axis: Demand (kW); horizontal axis: Half-hour.]
Figure 2: Electricity Demand in the Hayward Node 16 July 2006 to 29 July 2006
Hourly temperature observations for Wellington City were retrieved from the National Institute
of Water and Atmospheric Research. Half-hourly observations were not available, resulting in
the same temperature observation being used for both periods in each hour. Some smoothing
of the data, so that periods between the recorded hourly observations took the average of the
two recorded observations, may be more accurate, but the sheer scale of observations in the
one year and four week period made this approach impractical. Figure 3 shows temperature
observations over the same two-week period as Figure 2. The variation and lack of a consistent
seasonal pattern may explain some of the variation in demand that is not attributed to the time
of day or the day of the week, hence allowing for more accurate forecasting.
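The two ways of obtaining half-hourly temperatures from hourly readings discussed above can be compared in a short sketch. The function names and the readings are hypothetical; the first function mirrors the approach used in this paper, the second the smoothing that was considered but not applied.

```python
def repeat_hourly(hourly):
    """Use each hourly reading for both half-hours of that hour."""
    return [temp for temp in hourly for _ in range(2)]


def interpolate_hourly(hourly):
    """Keep the reading on the hour; use the mean of neighbouring readings on the half-hour."""
    half_hourly = []
    for i, temp in enumerate(hourly):
        half_hourly.append(temp)
        # The final half-hour has no following reading, so the last value is repeated.
        following = hourly[i + 1] if i + 1 < len(hourly) else temp
        half_hourly.append((temp + following) / 2)
    return half_hourly


hourly = [8.0, 9.0, 11.0]  # hypothetical readings in degrees Celsius
repeated = repeat_hourly(hourly)
smoothed = interpolate_hourly(hourly)
```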
[Plot of electricity demand, 16 July 2006 to 29 July 2006. Vertical axis: Demand (kW); horizontal axis: Half-hour.]
Figure 3: Temperature in Wellington 16 July 2006 to 29 July 2006
Stationarity
In order for a reliable econometric model to be estimated, some knowledge of the trends in the
data is needed. An important consideration in time series models is whether the data is
stationary. If a data series is non-stationary, with the mean or the variance changing over time,
then the model may be misspecified and lead to a spurious regression. The stationarity of the
quantity series must be tested, both for the twelve-week period for the ARIMA and Holt-Winters
models, and for the longer period for the multivariate regression model. If the longer
period is found to be non-stationary, and the temperature variable is also non-stationary, the
two variables should be tested for cointegration before differencing is initiated.
[Plot of temperature in Wellington, 16 July 2006 to 29 July 2006. Vertical axis: Temperature (degrees Celsius); horizontal axis: Half-hours.]
Multiplicative or Additive Model Selection
Both the Holt-Winters exponential smoothing model and the seasonal ARIMA model can be
specified in either multiplicative or additive form. The multiplicative form is appropriate in
cases where the magnitude of the seasonal variation of the data increases as the mean of the
data increases (Taylor, 2003). In other short-term load forecasting papers, multiplicative models
have been used, suggesting it is the correct choice. Testing for this condition in our data
ensures the correct model is used.
A seven-day moving mean is calculated for each observation for a six-month period. This is
subtracted from each observation to find the deviation from the moving mean. For each week
in the six-month period, the average of the moving mean and the average of the absolute
deviation from the moving mean are determined. The correlation between these averages is
over 0.95, which indicates that for our data, seasonal variation is positively related to the mean
level, suggesting multiplicative models should be used.
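The check described above can be sketched as follows. The paper does not say whether the moving mean is trailing or centred, so a trailing 336-period mean is assumed here, and the series is simulated purely for illustration; all function names are our own.

```python
import math


def moving_mean(x, window):
    """Trailing moving mean; entries before a full window are None."""
    out, total = [], 0.0
    for i, value in enumerate(x):
        total += value
        if i >= window:
            total -= x[i - window]
        out.append(total / window if i >= window - 1 else None)
    return out


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def level_vs_spread_correlation(x, week=336):
    """Correlate weekly mean level with weekly mean absolute deviation."""
    mm = moving_mean(x, week)
    weekly_level, weekly_spread = [], []
    # Start at the first week for which every observation has a moving mean.
    for start in range(week, len(x) - week + 1, week):
        idx = range(start, start + week)
        weekly_level.append(sum(mm[i] for i in idx) / week)
        weekly_spread.append(sum(abs(x[i] - mm[i]) for i in idx) / week)
    return pearson(weekly_level, weekly_spread)


# Simulated series: a daily cycle whose amplitude scales with a rising weekly level.
series = [
    (8000 + 500 * (t // 336)) * (1 + 0.3 * math.sin(2 * math.pi * t / 48))
    for t in range(336 * 10)
]
correlation = level_vs_spread_correlation(series)
```

A correlation close to one, as in this simulated case, is the pattern that points towards a multiplicative specification.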
One Season Holt-Winters Exponential Smoothing
The one season Holt-Winters model is a modification of the exponential smoothing model. It
consists of a local level $S_t$, which is equivalent to a deseasonalised local mean, a local trend $T_t$,
which is the difference between local levels ($S_t - S_{t-1}$), and a seasonal index $I_t$, estimated by
dividing the actual observation $X_t$ by the local level. The specification of the variables is outlined
below.
Level: $S_t = \alpha (X_t / I_{t-s}) + (1-\alpha)(S_{t-1} + T_{t-1})$
Trend: $T_t = \beta (S_t - S_{t-1}) + (1-\beta) T_{t-1}$
Index: $I_t = \omega (X_t / S_t) + (1-\omega) I_{t-s}$
Forecast: $X_{t+k} = (S_t + kT_t) I_{t+k-s}$
where $\alpha$, $\beta$ and $\omega$ are smoothing parameters between zero and one, $s$ is the number of periods
in the seasonal cycle, and $X_{t+k}$ is the $k$-step-ahead forecast.
Since the data on electricity demand has two seasonal components, the most prominent
seasonal cycle must be chosen for the one season exponential smoothing model. Choosing a
season of 336 periods will incorporate both the effect of the day of the week and the half-hour
of the day into the model, since the seasonal index will be taken from the same half-hour and
the same day of the week from the previous week, while choosing a season of 48 periods
ignores the effect the day of the week has on demand. Therefore, choosing a season of 336
periods is more sensible.
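The one season recursions can be written directly as code. The sketch below is a plain transcription of the level, trend, index and forecast equations; the function names, and the convention of storing the most recent s seasonal indices oldest first, are our own assumptions.

```python
def holt_winters_step(x_t, level, trend, index_lag_s, alpha, beta, omega):
    """One update of the multiplicative Holt-Winters recursions.

    index_lag_s is the seasonal index from s periods earlier (I_{t-s}).
    Returns the new level S_t, trend T_t and seasonal index I_t.
    """
    new_level = alpha * (x_t / index_lag_s) + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    new_index = omega * (x_t / new_level) + (1 - omega) * index_lag_s
    return new_level, new_trend, new_index


def run_holt_winters(x, level, trend, indices, alpha, beta, omega):
    """Filter the series x and return the final level, trend and indices."""
    indices = list(indices)  # the s most recent indices, oldest first
    for x_t in x:
        level, trend, new_index = holt_winters_step(
            x_t, level, trend, indices[0], alpha, beta, omega)
        indices = indices[1:] + [new_index]
    return level, trend, indices


def holt_winters_forecast(level, trend, indices, k):
    """k-step-ahead forecast (S_t + k T_t) I_{t+k-s}, valid for 1 <= k <= s."""
    return (level + k * trend) * indices[k - 1]
```

For the data in this paper the season length would be 336, so `indices` would hold 336 values; the same code runs unchanged with a shorter season for experimentation.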
Double Seasonal Holt-Winters Exponential Smoothing
As described in section 2, the Holt-Winters exponential smoothing method has been adapted
for data containing two seasons by Taylor (2003). The major addition is a second seasonal
index, resulting in a daily index $D_t$ and a weekly index $W_t$. There are also changes in the
formulation of the level and the forecast to include the effect of the second season.
Level: $S_t = \alpha (X_t / (D_{t-s_1} W_{t-s_2})) + (1-\alpha)(S_{t-1} + T_{t-1})$
Trend: $T_t = \beta (S_t - S_{t-1}) + (1-\beta) T_{t-1}$
Daily Index: $D_t = \delta (X_t / (S_t W_{t-s_2})) + (1-\delta) D_{t-s_1}$
Weekly Index: $W_t = \omega (X_t / (S_t D_{t-s_1})) + (1-\omega) W_{t-s_2}$
Forecast: $X_{t+k} = (S_t + kT_t) D_{t+k-s_1} W_{t+k-s_2}$
where $s_1$ is the number of periods in the first season (48), and $s_2$ the number of periods in the
second season (336).
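A minimal sketch of the double seasonal recursions follows, transcribed from the equations above. The function names and the storage convention (the latest $s_1$ daily and $s_2$ weekly indices, oldest first) are assumptions made for illustration.

```python
def double_seasonal_step(x_t, level, trend, d_lag, w_lag, alpha, beta, delta, omega):
    """One update of the double seasonal recursions.

    d_lag is D_{t-s1} and w_lag is W_{t-s2}.
    """
    new_level = alpha * (x_t / (d_lag * w_lag)) + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    new_d = delta * (x_t / (new_level * w_lag)) + (1 - delta) * d_lag
    new_w = omega * (x_t / (new_level * d_lag)) + (1 - omega) * w_lag
    return new_level, new_trend, new_d, new_w


def run_double_seasonal(x, level, trend, daily, weekly, alpha, beta, delta, omega):
    """Filter x; daily and weekly hold the latest s1 and s2 indices, oldest first."""
    daily, weekly = list(daily), list(weekly)
    for x_t in x:
        level, trend, new_d, new_w = double_seasonal_step(
            x_t, level, trend, daily[0], weekly[0], alpha, beta, delta, omega)
        daily = daily[1:] + [new_d]
        weekly = weekly[1:] + [new_w]
    return level, trend, daily, weekly


def double_seasonal_forecast(level, trend, daily, weekly, k):
    """(S_t + k T_t) D_{t+k-s1} W_{t+k-s2}, valid for 1 <= k <= s1."""
    return (level + k * trend) * daily[k - 1] * weekly[k - 1]
```

Because forecasts in this paper extend at most six periods ahead, the restriction that k not exceed the shorter season length is not binding here.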
Error Adjustment for Holt-Winters Models
Holt-Winters estimation may be improved using a simple AR(1) error correction term, of the
form $e_t = \lambda e_{t-1} + \varepsilon_t$. For forecasting, the forecast equation above is expanded as follows:
$X_{t+k} = (S_t + kT_t) D_{t+k-s_1} W_{t+k-s_2} - \lambda^k e_t$
Forecasts for more than one period ahead will not have knowledge of the previous period's
error term, since the observed demand will not be known. However, the relationship between
error terms $k$ lags apart is:
$E[e_t] = E[\lambda e_{t-1} + \varepsilon_t] = E[\lambda(\lambda e_{t-2} + \varepsilon_{t-1})] = \dots = \lambda^k e_{t-k}$
where the expectation is taken given the error observed $k$ periods earlier, and the innovations $\varepsilon$ have zero mean.
Therefore, a forecast for $k$ periods ahead will subtract $\lambda^k$ times the current period's error term.
The Holt-Winters parameters and the error adjustment parameter lambda will be estimated
simultaneously to improve efficiency.
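The effect of the adjustment at different horizons can be illustrated numerically. In the sketch below the error is taken to be forecast minus observed demand, which is the sign convention implied by subtracting the adjustment; this is an assumption, since the text does not define $e_t$ explicitly. The value of lambda is 0.34, the estimate reported in Table 2 for the double seasonal model.

```python
def adjusted_forecast(base_forecast, last_error, lam, k):
    """Subtract lambda**k times the latest one-step error from a k-step forecast.

    last_error is assumed to be (forecast - observed) for the current period.
    """
    return base_forecast - (lam ** k) * last_error


lam = 0.34
# Weight placed on the current error when forecasting 1 to 6 periods ahead.
weights = [lam ** k for k in range(1, 7)]

# Hypothetical example: an unadjusted forecast of 9000 kW and a current error of +200 kW.
one_step = adjusted_forecast(9000.0, 200.0, lam, 1)
six_step = adjusted_forecast(9000.0, 200.0, lam, 6)
```

With this lambda the weight falls from 0.34 at one step to roughly 0.0015 at six steps, so the six-step forecast is almost unchanged by the adjustment.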
Seasonal ARIMA
An autoregressive integrated moving average (ARIMA) model explains one variable with respect
to its past and the history of its residuals. ARIMA models can be written as ARIMA(p,d,q), where
p is the number of autoregressive terms, d is the order of integration, and q is the number of
moving average terms in the model.
The autoregressive terms explain the variable with respect to previous observations of the
variable. For example, an AR(2) model appears as follows:
$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + u_t$.
The moving average terms explain the variable with respect to previous residuals, as the
following MA(2) model demonstrates:
$Y_t = u_t + \phi_1 u_{t-1} + \phi_2 u_{t-2}$.
The order of integration indicates the number of times the data must be differenced before it
becomes stationary.
An ARIMA(1,1,1) model is written as
$(1-\phi L)(\Delta Y_t - c) = (1-\theta L)\varepsilon_t$,
where $L$ is the lag operator, with $LY_t = Y_{t-1}$ and $L\varepsilon_t = \varepsilon_{t-1}$. This process can also be written linearly
as
$\Delta Y_t = (1-\phi)c + \phi\Delta Y_{t-1} - \theta\varepsilon_{t-1} + \varepsilon_t$.
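The step from the lag-operator form to the linear form can be made explicit. The derivation below uses only the definitions given above, together with the fact that the lag operator leaves the constant $c$ unchanged.

```latex
\begin{align*}
(1-\phi L)(\Delta Y_t - c) &= (1-\theta L)\varepsilon_t \\
(\Delta Y_t - c) - \phi(\Delta Y_{t-1} - c) &= \varepsilon_t - \theta\varepsilon_{t-1} \\
\Delta Y_t &= (1-\phi)c + \phi\Delta Y_{t-1} - \theta\varepsilon_{t-1} + \varepsilon_t
\end{align*}
```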
In the case where the data contains not only a relationship between the variable and past
observations of the variable and residuals but also a seasonal component, the basic ARIMA
model outlined above can be extended to a seasonal ARIMA model, containing seasonal
autoregressive and moving average terms, as well as seasonal differencing. The seasonal ARIMA
model can be classified as ARIMA(p,d,q)×(P,D,Q,s), where P is the number of seasonal
autoregressive terms, D is the order of seasonal differencing, Q the number of seasonal moving
average terms included in the model and s the number of periods in the season. For example,
ARIMA(1,0,1)×(1,0,1,336) is written as $(1-\phi L)(1-\phi L^{336})(Y_t - c) = (1-\theta L)(1-\omega L^{336})\varepsilon_t$. The requirement
for seasonal AR and MA terms can be found by analysing the correlogram of residuals after
the basic ARIMA model has been estimated.
Seasonal ARIMA models with more than one seasonal period can be developed. Darbellay and
Slama (2000), and also Taylor (2003), used seasonal periods of one day and one week for their
double seasonal ARIMA models. An example that would be a natural starting point for the data
in this paper would be an ARIMA(1,0,1)×(1,0,1,48)×(1,0,1,336) model. This can be written as
$(1-\phi L)(1-\phi_1 L^{48})(1-\phi_2 L^{336})(Y_t - c) = (1-\theta L)(1-\omega_1 L^{48})(1-\omega_2 L^{336})\varepsilon_t$.
A season of 336 periods will be used in the estimation of a seasonal ARIMA model for this data.
Some seasonal autoregressive or moving average terms may be added for the 48-period
season, though not in the form outlined in the double seasonal model above. The estimation
techniques used prevent a second season being added in a separate bracket as written above,
but lags of 48 periods may be included in the 336-period bracket; i.e.
$(1-\phi L)(1-\phi_1 L^{48} - \phi_2 L^{336})(Y_t - c) = (1-\theta L)(1-\omega_1 L^{48} - \omega_2 L^{336})\varepsilon_t$.
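To see which lags the hybrid specification actually involves, the bracketed lag polynomials can be multiplied out. The sketch below does this for the autoregressive side, representing each polynomial as a mapping from lag to coefficient; the coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def multiply_lag_polynomials(p, q):
    """Multiply two lag polynomials given as {lag: coefficient} dicts."""
    product = {}
    for lag_p, coef_p in p.items():
        for lag_q, coef_q in q.items():
            lag = lag_p + lag_q
            product[lag] = product.get(lag, 0.0) + coef_p * coef_q
    return product


# Hypothetical coefficients, for illustration only.
phi, phi_1, phi_2 = 0.5, 0.3, 0.2
nonseasonal = {0: 1.0, 1: -phi}                # 1 - phi L
seasonal = {0: 1.0, 48: -phi_1, 336: -phi_2}   # 1 - phi_1 L^48 - phi_2 L^336
ar_side = multiply_lag_polynomials(nonseasonal, seasonal)
```

The product contains terms at lags 1, 48, 49, 336 and 337, so the hybrid bracket introduces cross terms at lags 49 and 337 but, unlike the true double seasonal model, none at lag 384 or 385.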
Multivariate Regression
A simple multivariate regression model can be used as a comparison for the univariate ARIMA and
exponential smoothing models. Although there are few time series variables to regress
electricity demand against, with the one obvious choice being temperature, dummy variables
can be used for the half-hour period of the day, the day of the week, the month of the year and
also the year itself. Other dummy variables can be tested, and adding lagged dependent
variables is another way to improve the accuracy of the regression model. Bunn (1982)
discussed some “special effects” that may be taken into account when forecasting short-term
demand in electricity. These include weather variables, television audience behaviour, holidays,
onset of darkness and daylight-savings time. Limitations in the availability of data prevent
some of these variables being tested. The general formula for such a model is as follows:
$X_t = c + \sum_{i=2}^{48} \phi_i h_{i,t} + \sum_{j=2}^{7} \theta_j d_{j,t} + \sum_{k=2}^{12} \rho_k m_{k,t} + \omega y_t + \alpha \, \mathrm{temp}_t + \{\text{other dummy variables}\} + \{\text{lagged dependent variables}\} + e_t$,
where $h_{i,t}$ is the dummy variable for half-hour $i$, $d_{j,t}$ the dummy variable for day of the week $j$, $m_{k,t}$ the
dummy variable for month $k$, and $y_t$ the year dummy variable, and $\phi_i$, $\theta_j$, $\rho_k$ and $\omega$ are the parameters
for the 47 half-hours, 6 days, 11 months and the one year requiring dummy variables. Dummy
variables need to be estimated for all but one of the half-hours, days, months and years, with
the half-hour, day, month and year not allocated a dummy variable used as a base.
Hyde and Hodnett (1997) found that adding an error adjustment term improved forecasting
accuracy. A simple error adjustment term will be added, as for Holt-Winters, in the form of
$e_t = \lambda e_{t-1} + \varepsilon_t$.
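The dummy-variable structure can be illustrated by building a single row of regressors for one observation. This is a sketch under stated assumptions: the function name is our own, the base categories (first half-hour of the day, Monday, January and the year 2005) are chosen arbitrarily, and the "other dummy variables" and lagged dependent variables are left out.

```python
from datetime import datetime


def regressor_row(timestamp, temperature, base_year):
    """Build one row of regressors: intercept, dummies, temperature.

    The first half-hour of the day, Monday, January and base_year are the
    omitted base categories.
    """
    half_hour = timestamp.hour * 2 + timestamp.minute // 30  # 0..47
    row = [1.0]  # intercept c
    row += [1.0 if half_hour == i else 0.0 for i in range(1, 48)]           # 47 half-hour dummies
    row += [1.0 if timestamp.weekday() == j else 0.0 for j in range(1, 7)]  # 6 day dummies
    row += [1.0 if timestamp.month == k else 0.0 for k in range(2, 13)]     # 11 month dummies
    row.append(0.0 if timestamp.year == base_year else 1.0)                 # year dummy
    row.append(temperature)
    return row


# Hypothetical observation: 8:30am on 18 July 2006 with a reading of 9.5 degrees.
row = regressor_row(datetime(2006, 7, 18, 8, 30), temperature=9.5, base_year=2005)
```

Each row therefore has 67 entries before any further dummies or lags are appended: one intercept, 47 + 6 + 11 + 1 dummy variables, and the temperature.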
4. Results and Interpretation
Empirical analysis is performed to compare the forecasting accuracy of the models. Demand is
forecast for one to six periods ahead since the generators and the systems operator for the
spot market require different lengths of foresight in demand for their different roles. The
possibility exists that some models that accurately forecast one period ahead may be less
accurate in forecasting six periods ahead. As mentioned previously, the univariate models have
estimation periods of eight weeks, the multivariate regression model has an estimation period
of one year, and the same four-week period is used as a forecast sample for all models.
Stationarity
The twelve-week period of quantity demanded observations used for univariate estimation and
forecasting is tested for non-stationarity. There is very strong evidence against quantity
containing a unit root. Therefore, quantity does not need to be differenced before estimating
Holt-Winters and ARMA models.
Testing the one year and four week period of quantity data provides no evidence against
non-stationarity in the data. In the absence of other time series variables in the data, this would
require quantity to be differenced to remove the non-stationarity. However, since temperature
is a time series variable, there may exist cointegration between quantity and temperature.
A linear regression with an intercept is run on quantity and temperature, and the residuals are
tested for non-stationarity. There is very strong evidence against non-stationarity, so
Engle-Granger cointegration exists between quantity and temperature, which means the data is not
required to be differenced before the regression is run.
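The two-step procedure described above (regress quantity on temperature, then test the residuals for a unit root) can be sketched as follows. The paper does not state which unit root test was applied, so the simplest Dickey-Fuller regression, without a constant or lagged differences, is assumed here. The data are simulated, and the function names are our own.

```python
import random


def ols_simple(y, x):
    """Intercept and slope of y on x by ordinary least squares."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def dickey_fuller_t(e):
    """t-statistic on rho in  e_t - e_{t-1} = rho * e_{t-1} + u_t."""
    lagged = e[:-1]
    diff = [b - a for a, b in zip(e[:-1], e[1:])]
    sxx = sum(v * v for v in lagged)
    rho = sum(v * d for v, d in zip(lagged, diff)) / sxx
    resid = [d - rho * v for v, d in zip(lagged, diff)]
    sigma2 = sum(r * r for r in resid) / (len(resid) - 1)
    return rho / (sigma2 / sxx) ** 0.5


def engle_granger_t(y, x):
    """Step 1: levels regression. Step 2: unit root statistic on its residuals."""
    intercept, slope = ols_simple(y, x)
    residuals = [b - intercept - slope * a for a, b in zip(x, y)]
    return dickey_fuller_t(residuals)


# Simulated cointegrated pair: x is a random walk, y tracks it with stationary noise.
rng = random.Random(0)
x, level = [], 0.0
for _ in range(500):
    level += rng.gauss(0, 1)
    x.append(level)
y = [2.0 + 3.0 * v + rng.gauss(0, 1) for v in x]
t_stat = engle_granger_t(y, x)
```

A strongly negative statistic is evidence against a unit root in the residuals. Because the residuals come from an estimated regression, the statistic should be compared with Engle-Granger critical values, which are more negative than the standard Dickey-Fuller ones.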
One Season Holt-Winters Exponential Smoothing
The initial trend value is calculated by taking the average of: (i) the difference between the
mean of the first and the second 336 observations divided by 336; and (ii) the mean of the first
differences from the first two weeks of the sample. The initial level value is calculated by taking
the mean of the first 672 observations, and adding 336.5 times the initial trend. The model also
requires initial seasonal indices for each of the 336 periods in the season. These are calculated
as the average of the ratio of observation to 336-point moving means for each corresponding
half-hour period, taken from the first two weeks of the sample.
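The initial trend and level calculations can be sketched as below. The function name is our own, and the season length is left as a parameter, with the multiplier 336.5 generalised to s + 0.5; that generalisation is an assumption made so the example can be checked on a short series. The seasonal indices are omitted because the alignment of the moving mean is not specified in the text.

```python
def initial_trend_and_level(x, s=336):
    """Initial trend and level from the first two seasons of x.

    Trend: average of (i) the change in seasonal means divided by s and
    (ii) the mean first difference over the first two seasons.
    Level: mean of the first 2s observations plus (s + 0.5) times the trend.
    """
    first, second = x[:s], x[s:2 * s]
    between_seasons = (sum(second) / s - sum(first) / s) / s
    diffs = [b - a for a, b in zip(x[:2 * s - 1], x[1:2 * s])]
    mean_diff = sum(diffs) / len(diffs)
    trend = (between_seasons + mean_diff) / 2
    level = sum(x[:2 * s]) / (2 * s) + (s + 0.5) * trend
    return trend, level


# Check on a short linear series x_t = 10 + 2t, t = 1..8, with a season of 4.
toy = [10 + 2 * t for t in range(1, 9)]
toy_trend, toy_level = initial_trend_and_level(toy, s=4)
```

On the linear toy series both trend estimates equal the true slope of 2, and the level of 28 is the value the line reaches one period after the initialisation window.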
The parameters are estimated in Excel to minimise the mean absolute percentage error (MAPE)
of one-step-ahead forecasts. MAPE is the most commonly used forecasting error summary in
short-term electricity demand forecasting (Taylor, 2003). Random numbers between zero and
one are generated for the parameter values, and the combinations of parameters that result
in the lowest MAPEs are selected and used to narrow the range of random number
generation for each parameter. This process is repeated until the specific parameters that
minimise the MAPE for the estimation sample are identified.
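The estimation procedure was carried out in Excel; the sketch below re-expresses the same idea in Python for readers who prefer code. The number of rounds, draws per round and candidates retained are assumptions, as is the toy objective, which stands in for the one-step-ahead MAPE of a model on the estimation sample.

```python
import random


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def narrowing_random_search(objective, n_params, rounds=8, draws=200, keep=10, seed=0):
    """Minimise objective over [0, 1]^n_params by repeated random sampling.

    Each round draws candidates uniformly inside the current bounds, keeps the
    best few, and shrinks the bounds to the range those candidates span.
    """
    rng = random.Random(seed)
    bounds = [(0.0, 1.0)] * n_params
    best = None
    for _ in range(rounds):
        candidates = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(draws)]
        if best is not None:
            candidates.append(best[1])  # never lose the best point found so far
        scored = sorted((objective(c), c) for c in candidates)
        best = scored[0]
        top = [c for _, c in scored[:keep]]
        bounds = [(min(c[i] for c in top), max(c[i] for c in top))
                  for i in range(n_params)]
    return best[1], best[0]


# Toy objective with a known minimum at (0.3, 0.8).
def toy_objective(params):
    return (params[0] - 0.3) ** 2 + (params[1] - 0.8) ** 2


best_params, best_value = narrowing_random_search(toy_objective, n_params=2)
```

In practice the objective would run the Holt-Winters recursions over the estimation sample for a given parameter vector and return `mape(actual, one_step_forecasts)`.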
Double Seasonal Holt-Winters Exponential Smoothing
Initial values for the level and the trend are calculated in the same manner as for Holt-Winters
with one season. The initial values for the 48 day indices are estimated by taking the average of
the ratios of observations to 48-point moving mean for the corresponding periods from the first
week of the sample. The initial values for the 336 week indices are estimated by taking the
average of the ratios of observations to 336-point moving mean from the first two weeks of the
sample, divided by the corresponding day indices.
The same process for estimation of the parameters is used for double seasonal Holt-Winters as
is used for one season Holt-Winters.
Error Adjustment for Holt-Winters Models
After estimating the Holt-Winters model for one and two seasons, significant positive
autocorrelation of residuals was found. This is unsurprising, given the prevalence of
autocorrelation in other literature on Holt-Winters forecasting. For both the one
season and double seasonal Holt-Winters models with error correction, the error
correction parameter lambda is estimated jointly with the other parameters, using
the same method. Random numbers between 0 and
1 are generated for lambda since the autocorrelation coefficient is significantly positive, and the
range is narrowed down to find the value of lambda that minimises the MAPE of the estimation
sample. The estimated coefficients for the Holt-Winters exponential smoothing models without
and with error adjustment terms are reported in Tables 1 and 2 respectively.
Table 1: Estimated Coefficients of Holt-Winters Exponential Smoothing Models
Model                          Level α   Trend β   Within-day seasonality δ   Within-week seasonality ω
One Season Holt-Winters        0.94      0         -                          1
Double Seasonal Holt-Winters   0.95      0         0.8                        1
Table 2: Estimated Coefficients of Holt-Winters Exponential Smoothing Models with Error Adjustment
Model                          Level α   Trend β   Within-day seasonality δ   Within-week seasonality ω   Error adjustment λ
One Season Holt-Winters        0.86      0         -                          1                           0.4
Double Seasonal Holt-Winters   0.91      0         0.14                       0.14                        0.34
Seasonal ARIMA
Several statistics are considered important for identifying the correct specification of the
ARIMA model. In keeping with the estimation of the Holt-Winters model, the MAPE of
one-step-ahead forecasts is one of the statistics considered. Other criteria used to select the
specification of the model include the Akaike Information Criterion, the adjusted R-squared,
and the Box-Jenkins methodology to minimise or eliminate the autocorrelation of residuals.
Due to the difficulty in estimating a true double seasonal ARIMA model, the hybrid model which
performed the best overall with regard to the criteria above is difficult to write in compact
form. In expanded form, the model is written as:
$(1 - 1.36L + 0.41L^{2})(1 - 0.32L^{48} - 0.24L^{336} - 0.15L^{672} - 0.12L^{1008} - 0.12L^{1344})(Y_t - 8421) = (1 + 0.16L^{48})(1 + 0.04L^{336} + 0.04L^{672})\varepsilon_t$
This specification has two autoregressive terms, seasonal autoregressive terms of lags 48, 336,
672, 1008 and 1344, a moving-average term with a lag of 48, and seasonal moving-average
terms of lags 336 and 672. Of the many combinations of autoregressive and moving-average
terms tested, this was the only specification that did not have statistically significant
autocorrelation of residuals. It also has the second-lowest Akaike Information Criterion and
MAPE, and the highest adjusted R-squared, which all indicate a well-specified model.
Multivariate Regression
Initially, a model with only dummy variables for the half-hour, day, month and year is
estimated, with an MAPE for the period of estimation of 16.4. Adding temperature to the
model decreases the MAPE to 15.9, though of the other variables added, only a dummy
variable for public holidays that fall on weekdays was significant. Variables found to be
insignificant include dummy variables for daylight savings and school holidays, and a time
variable that increases by one for each subsequent period.
Adding lagged dependent variables has a significant effect on decreasing the MAPE of the
forecasts. For example, adding a dependent variable of one lag decreases the MAPE from 15.8
to 4.32. Adding lags from 1 to 5 periods behind, 46 to 49 periods behind, and 329 and 334 to
339 periods behind all added to the forecasting accuracy of the model.
The Breusch-Godfrey serial correlation test provides extremely strong evidence against the
hypothesis of no autocorrelation of residuals. Autocorrelation is particularly severe with residuals from 48
periods behind. However, autocorrelation only affects the standard errors of the coefficients,
not the estimates of the coefficients themselves. In testing the significance of variables this
would be a considerable problem. Since we are only interested in estimating a model to provide
accurate forecasts, autocorrelation is far less important. Forecasting accuracy may be improved
by including a lagged error term as was used in the Holt-Winters models. In this case,
including a lagged error term for one period behind and for 48 periods behind did
not improve the forecasting accuracy of the model. The White test provided extremely strong
evidence against the hypothesis of no heteroskedasticity in the model. This implies that the magnitude of the
residuals in the model varies as at least one of the variables in the model varies. Once again,
heteroskedasticity affects the standard errors of the coefficients and not the estimation of the
coefficients themselves, so for building a forecasting model this will not have an impact. The
estimated coefficients and standard errors for the multivariate regression model are reported
in Table 7 of Appendix 1.
Discussion
Figures 6 and 7 compare the accuracy of the models for forecasting one to six periods ahead.
The seasonal ARIMA model performed best at every horizon except one period ahead, where it
was second best with a MAPE of 2.02, slightly behind the error adjusted double seasonal
Holt-Winters exponential smoothing model’s MAPE of 2.00. The error adjusted one season
Holt-Winters, error adjusted double seasonal Holt-Winters and seasonal ARIMA models were very
similar in accuracy for one period ahead, and even the multivariate regression model was not
far behind. From two to six periods ahead, the Holt-Winters and multivariate regression models
were fairly similar to one another in accuracy, highlighting the relative supremacy of the
seasonal ARIMA model. The seasonal ARIMA MAPE for six periods ahead was 6.21, well below the
other models’ MAPEs of between 6.93 and 7.37. One reason for this could be the superior error
adjustment structure of the seasonal ARIMA model. Three moving average terms (including two
seasonal moving average terms with one and two-week lags) were included, and each had the same
power for any number of forecasting periods ahead. However, for the Holt-Winters models with
error adjustment terms, the coefficient for the error adjustment decreases greatly in magnitude
as the forecasting period ahead is extended, so that when forecasting six periods ahead, the
error adjustment would have had very little impact on the actual forecast. The multivariate
regression model had no error adjustment term, as including one did not improve forecasting
ability in estimation. The inaccuracy of the multivariate model in forecasting from two to six
periods ahead is not surprising given its relative inaccuracy one period ahead. It must be
noted that there are many multivariate models in the literature for short-term electricity load
forecasting, so we cannot conclude from these results that univariate models are superior.
These results are surprising given that Taylor found the double seasonal Holt-Winters model
with an error adjustment to be clearly more accurate than the seasonal ARIMA model. Equally
surprising is that Taylor’s paper estimated a true double seasonal ARIMA model, while our
seasonal ARIMA model was effectively a one season model with a crude adjustment. A properly
specified double seasonal ARIMA model would be expected to yield even more accurate
results.
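The fading influence of the Holt-Winters error adjustment at longer horizons can be seen in a small sketch. It assumes, for illustration only, that the adjustment enters a forecast k periods ahead as phi raised to the power k times the most recent one-step-ahead error. The value phi = 0.6 and the error of 100 are hypothetical and are not estimates from this study.

```python
def error_adjustment(phi, last_error, horizon):
    """Adjustment added to a forecast `horizon` periods ahead under the assumed phi**horizon form."""
    return (phi ** horizon) * last_error


PHI = 0.6           # hypothetical adjustment coefficient, not an estimate from this study
LAST_ERROR = 100.0  # hypothetical one-step-ahead error in the latest period

adjustments = [error_adjustment(PHI, LAST_ERROR, k) for k in range(1, 7)]
for k, adj in enumerate(adjustments, start=1):
    print(f"{k} periods ahead: adjustment = {adj:.2f}")
```

Under these assumed values, the adjustment six periods ahead is under a tenth of its size one period ahead, which is the pattern described above.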
Figure 6: MAPEs of forecasts from one to six periods ahead for data from July 21 to August 16 2006.
Forecast Periods Ahead
Forecasting Method 1 2 3 4 5 6
One Season Holt-Winters 2.29 3.73 4.90 5.83 6.65 7.37
One Season Error Adjusted Holt-Winters 2.04 3.53 4.76 5.73 6.57 7.27
Double Seasonal Holt-Winters 2.18 3.54 4.64 5.55 6.31 7.00
Double Seasonal Error Adjusted Holt-Winters 2.00 3.41 4.55 5.46 6.24 6.93
Multivariate Regression 2.10 3.63 4.85 5.83 6.56 7.13
Seasonal ARIMA 2.02 3.26 4.27 5.08 5.71 6.21
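The widening lead of the seasonal ARIMA model can be checked directly from the MAPEs in Figure 6. The short sketch below computes, for each horizon, the gap between the best competing model and the seasonal ARIMA model. The function name `gap_to_best_rival` is hypothetical.

```python
# MAPEs for one to six periods ahead, as reported in Figure 6.
MAPE = {
    "One Season Holt-Winters": [2.29, 3.73, 4.90, 5.83, 6.65, 7.37],
    "One Season Error Adjusted Holt-Winters": [2.04, 3.53, 4.76, 5.73, 6.57, 7.27],
    "Double Seasonal Holt-Winters": [2.18, 3.54, 4.64, 5.55, 6.31, 7.00],
    "Double Seasonal Error Adjusted Holt-Winters": [2.00, 3.41, 4.55, 5.46, 6.24, 6.93],
    "Multivariate Regression": [2.10, 3.63, 4.85, 5.83, 6.56, 7.13],
    "Seasonal ARIMA": [2.02, 3.26, 4.27, 5.08, 5.71, 6.21],
}


def gap_to_best_rival(table, model):
    """Best rival MAPE minus the model's MAPE at each horizon (positive means the model wins)."""
    rivals = [values for name, values in table.items() if name != model]
    gaps = []
    for i, own in enumerate(table[model]):
        best_rival = min(r[i] for r in rivals)
        gaps.append(round(best_rival - own, 2))
    return gaps


gaps = gap_to_best_rival(MAPE, "Seasonal ARIMA")
for horizon, gap in enumerate(gaps, start=1):
    print(f"{horizon} periods ahead: gap = {gap:+.2f}")
```

The gap is -0.02 one period ahead and then grows steadily to 0.72 at six periods ahead, in line with the discussion above.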
Figure 7: MAPEs of forecasts from one to six periods ahead for data from July 21 to August 16 2006.
[Line chart: MAPE (vertical axis, 0.00 to 8.00) against Forecast Periods Ahead (horizontal axis, 1 to 6), with one line for each of One Season Holt-Winters, One Season Error Adjusted Holt-Winters, Double Seasonal Holt-Winters, Double Seasonal Error Adjusted Holt-Winters, Multivariate Regression and Seasonal ARIMA.]
The results when comparing the four Holt-Winters models are unsurprising, with the double
seasonal Holt-Winters models performing better than the one season versions. The data clearly
contained two seasonal patterns, so the one season Holt-Winters model would not have captured
the seasonal variation as successfully as the double seasonal model. The error adjusted versions
also performed better than the versions without an error adjustment, especially for very
short-term forecasting, for the reasons explained above.
5. Conclusion
The importance of short-term electricity load forecasting, and the significant financial gains
that players in the spot market can achieve through increased accuracy, have encouraged a
substantial amount of research into various univariate and multivariate models. Seasonal
ARIMA, Holt-Winters exponential smoothing and variations on weather-related multivariate
models have proven most popular in recent years. In this study, one season and double seasonal
Holt-Winters exponential smoothing models with and without error adjustment terms, a hybrid
of a single and double seasonal ARIMA model, and a dummy variable model with temperature
and lagged dependent variables were compared for forecasting electricity load in the Hayward
node of New Zealand for one to six periods ahead.
The seasonal ARIMA model performed the best in minimising the mean absolute percentage
error, especially for longer-term forecasting. The error adjustment term in the Holt-Winters
models clearly improved very short-term forecasting, but the improvements were smaller
for longer-term forecasts. Given the relative superiority of the seasonal ARIMA model in
forecasting for the Hayward node, the estimation and performance of a pure double seasonal
ARIMA model for nodal electricity demand should be investigated. Also, other multivariate
models should be empirically tested before it can be concluded that univariate models are
superior for short-term load forecasting, as the multivariate model used in this study is only one
of a variety of multivariate models proposed in the literature. Another extension would be
to use the models to forecast demand for other nodes in New Zealand. The distribution of
demand for electricity over a day varies greatly across nodes, given that the characteristics of
electricity consumers, especially non-residential consumers, also vary greatly across nodes. A
combination of models may be a sensible way to deal with the relative strengths and
weaknesses of models, with weights varying for different periods of time (Smith, 1989) or
different forecast periods ahead.
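As a rough illustration of horizon-dependent weights, the sketch below combines two forecasts with weights inversely proportional to each model's MAPE at that horizon. Inverse-MAPE weighting is an assumption made here for simplicity and is not the scheme of Smith (1989). The MAPEs are those reported in Figure 6, while the two point forecasts are hypothetical values.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalised to sum to one."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(forecasts, weights):
    """Weighted average of the individual forecasts."""
    return sum(f * w for f, w in zip(forecasts, weights))


# MAPEs for one to six periods ahead, as reported in Figure 6.
ARIMA_MAPE = [2.02, 3.26, 4.27, 5.08, 5.71, 6.21]
HW_MAPE = [2.00, 3.41, 4.55, 5.46, 6.24, 6.93]  # double seasonal, error adjusted

weights_by_horizon = [
    inverse_error_weights([arima, hw]) for arima, hw in zip(ARIMA_MAPE, HW_MAPE)
]

# Hypothetical point forecasts of the same target period from the two models.
arima_forecast, hw_forecast = 5200.0, 5300.0
for k, (w_arima, w_hw) in enumerate(weights_by_horizon, start=1):
    combined = combine([arima_forecast, hw_forecast], [w_arima, w_hw])
    print(f"{k} periods ahead: ARIMA weight = {w_arima:.3f}, combined = {combined:.1f}")
```

With these inputs the seasonal ARIMA weight starts just below one half at one period ahead and rises above one half at longer horizons, reflecting its growing relative accuracy.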
6. Acknowledgements
I would like to sincerely thank Dr Erwann Sbai for his constant guidance and assistance on all
aspects of this paper. I would also like to thank Dr James W. Taylor for his helpful response to
queries about estimation of the Holt-Winters exponential smoothing model.
Appendix 1
Table 1
Dickey-Fuller Unit Root Test for Quantity for One Year and Four Week Period
Null Hypothesis: QUANTITY has a unit root
Exogenous: Constant
Lag Length: 44 (Automatic based on SIC, MAXLAG=44)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic -1.884183 0.3401
Test critical values: 1% level -3.430527
5% level -2.861502
10% level -2.566791
*MacKinnon (1996) one-sided p-values.
Augmented Dickey-Fuller Test Equation
Dependent Variable: D(QUANTITY)
Method: Least Squares
Date: 11/16/08 Time: 13:53
Sample (adjusted): 46 18816
Included observations: 18771 after adjustments
Table 2
Dickey-Fuller Unit Root Test for Temperature for One Year and Four Week Period
Null Hypothesis: TEMPHH has a unit root
Exogenous: Constant
Lag Length: 40 (Automatic based on SIC, MAXLAG=44)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic -1.111129 0.2426
Test critical values: 1% level -2.565085
5% level -1.940841
10% level -1.616688
*MacKinnon (1996) one-sided p-values.
Augmented Dickey-Fuller Test Equation
Dependent Variable: D(TEMPHH)
Method: Least Squares
Date: 11/16/08 Time: 13:57
Sample (adjusted): 42 18816
Included observations: 18775 after adjustments
Table 3
Dickey-Fuller Unit Root Test for Residuals of Regression for One Year and Four Week Period
Null Hypothesis: RESID01 has a unit root
Exogenous: Constant
Lag Length: 44 (Automatic based on SIC, MAXLAG=44)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic -3.624805 0.0053
Test critical values: 1% level -3.430527
5% level -2.861502
10% level -2.566791
*MacKinnon (1996) one-sided p-values.
Augmented Dickey-Fuller Test Equation
Dependent Variable: D(RESID01)
Method: Least Squares
Date: 11/16/08 Time: 13:59
Sample (adjusted): 46 18816
Included observations: 18771 after adjustments
Table 4
Dickey-Fuller Unit Root Test for Quantity for Twelve Week Period
Null Hypothesis: Q has a unit root
Exogenous: Constant
Lag Length: 30 (Automatic based on SIC, MAXLAG=31)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic -12.10734 0.0000
Test critical values: 1% level -3.431567
5% level -2.861963
10% level -2.567038
*MacKinnon (1996) one-sided p-values.
Augmented Dickey-Fuller Test Equation
Dependent Variable: D(Q)
Method: Least Squares
Date: 11/11/08 Time: 14:16
Sample (adjusted): 32 4704
Included observations: 4673 after adjustments
Table 5
Estimated Coefficients and Standard Errors from ARIMA Estimation
Dependent Variable: Q
Method: Least Squares
Date: 10/28/08 Time: 18:35
Sample (adjusted): 1347 2688
Included observations: 1342 after adjustments
Convergence achieved after 11 iterations
Backcast: OFF (Roots of MA process too large)
Variable Coefficient Std. Error t-Statistic Prob.
C 8420.591 2630.438 3.201212 0.0014
AR(1) 1.366646 0.025380 53.84793 0.0000
AR(2) 0.406294 0.025366 16.01724 0.0000
SAR(48) 0.318523 0.033220 9.588285 0.0000
SAR(336) 0.244005 0.042613 5.726087 0.0000
SAR(672) 0.151092 0.032099 4.707000 0.0000
SAR(1008) 0.120835 0.025237 4.787964 0.0000
SAR(1344) 0.120494 0.024087 5.002441 0.0000
MA(48) 0.159622 0.041761 3.822280 0.0001
SMA(336) 0.039639 0.053064 0.747005 0.4552
SMA(672) 0.043258 0.051023 0.847815 0.3967
R-squared 0.991933
Mean dependent var 8151.314
Adjusted R-squared 0.991872
S.D. dependent var 2359.418
S.E. of regression 212.7091
Akaike info criterion 13.56589
Sum squared resid 60221297
Schwarz criterion 13.60853
Log likelihood -9091.713
F-statistic 16366.24
Durbin-Watson stat 1.995216
Prob(F-statistic) 0.000000
Table 6
Autocorrelation Test for Preferred ARIMA Model
Breusch-Godfrey Serial Correlation LM Test:
F-statistic 1.198490 Probability 0.096106
Obs*R-squared 119.0366 Probability 0.094168
Test Equation:
Dependent Variable: RESID
Method: Least Squares
Date: 11/11/08 Time: 14:25
Presample missing value lagged residuals set to zero.
Table 7
Estimated Coefficients and Standard Errors of Multivariate Regression Model
Dependent Variable: QUANTITY
Method: Least Squares
Date: 10/30/08 Time: 15:26
Sample (adjusted): 340 17472
Included observations: 17133 after adjustments
Variable Coefficient Std. Error t-Statistic Prob.
C 194.7551 29.96662 6.499069 0.0000
QUANTITY(-1) 1.227663 0.007560 162.3895 0.0000
QUANTITY(-2) 0.321918 0.012022 26.77658 0.0000
QUANTITY(-3) 0.022960 0.012195 1.882772 0.0597
QUANTITY(-4) -0.029077 0.011742 -2.476353 0.0133
QUANTITY(-5) 0.021643 0.006714 3.223700 0.0013
QUANTITY(-46) 0.023362 0.007015 3.330206 0.0009
QUANTITY(-47) 0.023487 0.011872 1.978360 0.0479
QUANTITY(-48) 0.135758 0.011926 11.38327 0.0000
QUANTITY(-49) 0.150994 0.007183 21.02119 0.0000
QUANTITY(-329) 0.021161 0.002086 10.14464 0.0000
QUANTITY(-334) 0.044443 0.007351 6.046013 0.0000
QUANTITY(-335) 0.078885 0.011893 6.632837 0.0000
QUANTITY(-336) 0.095609 0.012219 7.824341 0.0000
QUANTITY(-337) 0.147344 0.012198 12.07947 0.0000
QUANTITY(-338) 0.014878 0.011944 1.245573 0.2129
QUANTITY(-339) 0.026491 0.007308 3.625099 0.0003
NUM2006 27.46326 19.29675 1.423207 0.1547
FEBRUARY 12.65403 9.012376 1.404073 0.1603
MARCH 13.73540 9.330467 1.472102 0.1410
APRIL 19.48600 9.386951 2.075861 0.0379
MAY 95.99835 13.45055 7.137132 0.0000
JUNE 102.5430 15.79644 6.491528 0.0000
JULY 94.52589 16.41292 5.759238 0.0000
AUGUST 51.76024 23.89098 2.166518 0.0303
SEPTEMBER 37.93423 22.91510 1.655425 0.0979
OCTOBER 21.43332 22.39648 0.956995 0.3386
NOVEMBER 22.70637 21.82012 1.040616 0.2981
DECEMBER 21.08347 21.39019 0.985661 0.3243
TUESDAY 51.81510 7.029219 7.371388 0.0000
WEDNESDAY 34.98138 7.041428 4.967939 0.0000
THURSDAY 31.24492 7.050731 4.431444 0.0000
FRIDAY 24.48236 7.011854 3.491568 0.0005
SATURDAY 20.24399 6.946393 2.914317 0.0036
SUNDAY 12.13907 6.900171 1.759242 0.0786
H2 19.15437 18.43024 1.039290 0.2987
H3 17.84026 18.17636 0.981509 0.3264
H4 -49.21197 18.30412 -2.688573 0.0072
H5 1.915839 18.47415 0.103704 0.9174
H6 24.22863 19.06027 1.271159 0.2037
H7 24.23929 19.46469 1.245296 0.2130
H8 22.28745 20.12121 1.107660 0.2680
H9 31.54093 20.39318 1.546641 0.1220
H10 -4.991769 20.55144 -0.242891 0.8081
H11 55.69474 20.74062 2.685298 0.0073
H12 -6.815370 21.08962 -0.323162 0.7466
H13 211.5525 21.29253 9.935525 0.0000
H14 88.29046 21.76058 4.057359 0.0000
H15 199.9586 21.56358 9.272978 0.0000
H16 66.28875 21.78833 3.042397 0.0024
H17 73.84176 21.59334 3.419655 0.0006
H18 6.354603 21.24608 0.299095 0.7649
H19 155.9929 20.74055 7.521152 0.0000
H20 73.40924 20.27106 3.621381 0.0003
H21 105.6186 19.55145 5.402089 0.0000
H22 80.32300 19.40940 4.138355 0.0000
H23 91.14822 19.30067 4.722543 0.0000
H24 119.2945 19.18206 6.219067 0.0000
H25 74.02396 19.27193 3.841025 0.0001
H26 65.27203 19.28548 3.384517 0.0007
H27 200.3019 19.48236 10.28119 0.0000
H28 39.66476 19.98651 1.984577 0.0472
H29 141.8016 20.09571 7.056310 0.0000
H30 146.0759 20.53381 7.113922 0.0000
H31 67.96664 20.44214 3.324829 0.0009
H32 150.5739 20.26742 7.429360 0.0000
H33 135.5470 20.76032 6.529141 0.0000
H34 113.6618 21.08011 5.391897 0.0000
H35 242.5684 21.22070 11.43075 0.0000
H36 274.8353 21.52374 12.76894 0.0000
H37 42.16129 21.28147 1.981127 0.0476
H38 83.16012 20.82901 3.992514 0.0001
H39 14.90985 20.91307 0.712944 0.4759
H40 87.30350 20.45918 4.267204 0.0000
H41 121.8005 19.87114 6.129516 0.0000
H42 116.4392 19.47258 5.979650 0.0000
H43 38.31945 19.16478 1.999472 0.0456
H44 43.00934 18.90118 2.275485 0.0229
H45 -46.20064 18.80425 -2.456925 0.0140
H46 92.62841 18.65968 4.964094 0.0000
H47 177.9301 18.38378 9.678647 0.0000
H48 -141.9948 19.04131 -7.457196 0.0000
HOLIDAYWKD 69.38390 45.04178 1.540434 0.1235
TEMPHH 10.41107 0.931714 11.17410 0.0000
R-squared 0.990608
Mean dependent var 5195.558
Adjusted R-squared 0.990563
S.D. dependent var 2435.223
S.E. of regression 236.5724
Akaike info criterion 13.77528
Sum squared resid 9.54E+08
Schwarz criterion 13.81327
Log likelihood -117921.9
F-statistic 21666.12
Durbin-Watson stat 2.000209
Prob(F-statistic) 0.000000
Appendix 2
References
Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1994). Time Series Analysis: Forecasting and Control, 3rd
edn, Prentice-Hall: Englewood Cliffs, NJ, p333.
Bunn, D. W. (1982). A Review of Procedures in the Electricity Supply Industry. The Journal of the
Operational Research Society, Vol 33(6), 533-545.
Chatfield, C. (1978). The Holt-Winters Forecasting Procedure. Applied Statistics, Vol 27(3), 264-279.
Cottet, R. and Smith, M. (2003). Bayesian Modeling and Forecasting of Intraday Electricity Load. Journal
of the American Statistical Association, Vol 98, 839-849.
Darbellay, G. A. and Slama, M. (2000). Forecasting the Short-term Demand for Electricity – Do Neural
Networks Stand a Better Chance? International Journal of Forecasting, Vol 16, 71-83.
Genesis Energy. Electricity Trading in the New Zealand Electricity Market. Retrieved from
http://www.genesisenergy.co.nz/shadomx/apps/fms/fmsdownload.cfm?file_uuid=42F2E11E7E95
D74807D04721F0FC48B7&siteName=genesis on 13 March 2008.
Hyde, O. and Hodnett, P. F. (1997). Modeling the Effects of Weather in Short-term Electricity Load
Forecasting. Mathematical Engineering in Industry, Vol 6, 155-169.
Reid, D. J. (1975). A Review of Short-term Projection Techniques. Practical Aspects of Forecasting.
Operational Research Society: London, pp8-25.
Smith, D. G. C. (1989). Combination of Forecasts in Electricity Demand Prediction. Journal of Forecasting,
Vol 8, 349-356.
Taylor, J. W. (2003). Short-term Electricity Demand Forecasting using Double Seasonal Exponential
Smoothing. Journal of the Operational Research Society, Vol 54, 799-805.
Taylor, J. W. (2006). A Comparison of Univariate Methods for Forecasting Electricity Demand up to a Day
Ahead. International Journal of Forecasting, Vol 22, 1-16.
Vogelvang, B. (2005). Econometrics: Theory and Applications with EViews. Pearson Education Limited,
Essex, England.