
A Fast Technique for Smart Home Management: ADP with Temporal Difference Learning

Chanaka Keerthisinghe, Student Member, IEEE, Gregor Verbič, Senior Member, IEEE, and Archie C. Chapman, Member, IEEE

Abstract—This paper presents a computationally efficient smart home energy management system (SHEMS) using an approximate dynamic programming (ADP) approach with temporal difference learning for scheduling distributed energy resources. This approach improves the performance of a SHEMS by incorporating stochastic energy consumption and PV generation models over a horizon of several days, using only the computational power of existing smart meters. In this paper, we consider a PV-storage (thermal and battery) system; however, our method can extend to multiple controllable devices without the exponential growth in computation that other methods, such as dynamic programming (DP) and stochastic mixed-integer linear programming (MILP), suffer from. Specifically, probability distributions associated with the PV output and demand are kernel estimated from empirical data collected during the Smart Grid Smart City project in NSW, Australia. Our results show that ADP computes a solution much faster than both DP and stochastic MILP, with only a slight reduction in quality compared to the optimal DP solution. In addition, incorporating a thermal energy storage unit using the proposed ADP-based SHEMS reduces the daily electricity cost by up to 26.3% without a noticeable increase in the computational burden. Moreover, ADP with a two-day decision horizon reduces the average yearly electricity cost by 4.6% over a daily DP method, yet requires less than half of the computational effort.

Index Terms—demand response, smart home energy management, distributed energy resources, approximate dynamic programming, dynamic programming, stochastic mixed-integer linear programming, value function approximation, temporal difference learning.

NOMENCLATURE

k — Time-step
K — Total number of time-steps
i — Index of controllable devices
I — Total number of controllable devices
j — Index of stochastic variables
J — Total number of stochastic variables
s^{i,j}_k — State of i or j at time-step k
x^i_k — Decision of controllable device i at time-step k
ω^j_k — Variation of stochastic variable j at time-step k
C_k — Cost or reward at time-step k
π — Policy
r — Realisation of random information
R — Total number of random realisations
α — Stepsize
V^π_k — Expected future cost/reward for following π from k

Chanaka Keerthisinghe, Gregor Verbič, and Archie C. Chapman are with the School of Electrical and Information Engineering, The University of Sydney, Sydney, New South Wales, Australia. E-mail: {chanaka.keerthisinghe, gregor.verbic, archie.chapman}@sydney.edu.au.

s^{i,max} — Maximum state of a controllable device
s^{i,min} — Minimum state of a controllable device
μ^i — Efficiency of a controllable device [%]
l^i — Losses of a controllable device per time-step
a — Index of a segment in the VFA
A_k — Total number of segments at time-step k
z_{ka} — Capacity of a segment
v̄_{ka} — Slope of a segment
n — A particular scenario
N — Total number of scenarios in stochastic MILP

I. INTRODUCTION

FUTURE electrical power grids will be based on a "demand-following-supply" paradigm, because of the increasing penetration of intermittent renewable energy sources and advances in technology enabling non-dispatchable loads. A cornerstone of achieving this is demand side management, which can be roughly divided into demand response (DR) and direct load control. This research focuses on DR, which is used to reduce electricity costs on one side and provide system services on the other. In particular, we focus on residential customers, because residential loads account for roughly 30% of energy use in Australia [1] and their diurnal patterns drive daily and seasonal peak loads.

Currently, DR is implemented using small-scale voluntary programs and fixed pricing strategies. The main drawback of using the same pricing strategy for all customers is the possibility of inducing unwanted peaks in the demand curve as customers respond to prices. Also, it is not possible for residential and small commercial users to directly participate in wholesale energy or ancillary services markets because of the regulatory regimes and computational requirements. As such, DR revolves around the interaction between an aggregator and customers, as shown in Fig. 1. In many cases, the retailer acts as the aggregator. The aggregator's task is to construct a scheme that coordinates, schedules or otherwise controls part of a participating user's load via interaction with its smart home energy management system (SHEMS) [2]. In the context of DR, the aggregator sends control signals in the form of electricity price signals to the SHEMS. The SHEMS then schedules and coordinates the customer's distributed generation (DG), storage and flexible loads, collectively known as distributed energy resources (DERs), to minimise energy costs while maintaining a suitable level of comfort. In particular, this research assumes that the exact electricity price signals are available from a DR aggregator/retailer in the form of time-of-use pricing (ToUP). ToUP is chosen as it is prevalent in Australia [3], [4].

Fig. 1. Customers, aggregator and the wholesale electricity market.

The DERs in the smart homes considered in this paper comprise a PV unit, a battery and a thermal energy storage (TES) unit (i.e. an electric water heater). In the existing literature, a range of DERs have been used to achieve DR [5]–[8]. However, our choice stems from Australia's increasing penetration of rooftop PV and battery storage systems in response to rising electricity costs and decreasing technology costs, and its existing fleet of hot water storage devices [9], [10]. According to AEMO, the payback period for residential PV-storage systems is already below 15 years in South Australia, with the other states to follow suit in less than a decade [11]. Similarly, residential users with PV-storage systems in the USA have been forecast to reach grid parity within the next decade [12].

The SHEMS schedules the DERs in such a way that the electrical power drawn from the grid is minimised, especially during peak periods. Given the recent drop in PV costs, feed-in tariffs (FiTs) in Australia are significantly less than the retail tariffs paid by households, and it is anticipated that this may happen in other parts of the world too. As such, selling power back to the electrical grid is uneconomical, and households have a strong incentive to self-consume as much locally generated power as possible. Therefore, when PV generation is higher than electrical demand, an effective SHEMS should either store the surplus energy or consume it using controllable loads.

The underlying optimisation problem here is a sequential decision-making process under uncertainty. As shown in [13], SHEMSs that model stochastic variables, such as variations in PV output and in electrical and thermal demand, using appropriate probability distributions yield better quality schedules than those obtained from a deterministic model. Moreover, in [14] we showed that extending the optimisation horizon beyond one day is economically beneficial, as the SHEMS can exploit inter-daily variations in consumption and solar-insolation patterns. Given this, stochastic mixed-integer linear programming (MILP) [13], [15]–[18], particle swarm optimisation [19], [20] and dynamic programming (DP) [21], [22] have previously been proposed to solve the SHEMS problem. In particular, DP accommodates full non-linear controllable device models and produces close-to-optimal solutions when finely discretised. However, DP is infeasible if the optimisation horizon is extended or when multiple controllable devices are added to the SHEMS, due to the exponential growth in the state and action spaces [23].

Given these insights, in [24] we reported our preliminary work towards implementing a computationally efficient SHEMS using approximate dynamic programming (ADP) with temporal difference learning, an approach that has successfully been applied to the control of grid-level storage in [25]. In the proposed approach, we obtain policies from value function approximations (VFAs)¹. Our choice to use VFAs is based on the observation that they work best in time-dependent problems with regular load, energy and price patterns, relatively high noise and less accurate forecasts (errors grow with the horizon) [26]. Other ADP methods are better suited to applications with different characteristics [27]–[33].

Building on this, the main contributions of this paper are the development of a computationally efficient SHEMS using ADP with temporal difference learning, which can incorporate multiple controllable devices, and a demonstration of its practical implementation using empirical data. Specifically, the proposed ADP method enables us to:

1) incorporate stochastic input variables without a noticeable increase in the computational burden;
2) extend the decision horizon with less computational burden to consider uncertainties over several days, which results in significant financial benefits;
3) integrate multiple controllable devices with less computational burden than DP;
4) embed the SHEMS into an existing smart meter, as it uses less memory than existing methods.

To show the performance of ADP, we run it with a two-day decision horizon over three years using a rolling-horizon approach, in which the expected electricity cost of each day is minimised considering the uncertainties over the next day. The daily performance is benchmarked against DP and stochastic MILP by applying them to three different scenarios with different electrical and thermal demand patterns and PV outputs. The three-year evaluation is benchmarked against a daily DP approach by applying it to 10 smart homes. Moreover, the PV output and demand profiles are from data [34] collected during the Smart Grid Smart City project by Ausgrid and their consortium partners in NSW, Australia, which investigated the benefits and costs of implementing a range of smart grid technologies in Australian households [35]. Specifically, the SHEMSs estimate the probability distributions associated with the PV output and electrical demand from empirical data using kernel regression, which is more realistic than assuming a parametric form. In addition, throughout the paper we demonstrate the benefits of residential PV-storage systems with a SHEMS.

¹A value function describes the expected future cost of following a policy from a given state.

The paper is structured as follows: Section II states the stochastic energy management problem and the existing solution techniques. This is followed by a description of the ADP formulation in Section III. The stochastic variable models are described in Section IV. Section V presents the simulation results and the discussion.

II. SMART HOME ENERGY MANAGEMENT PROBLEM

In this section, we describe the general formulation of the sequential stochastic optimisation problem, formulate our stochastic SHEMS problem as a sequential stochastic optimisation problem, and present a short description of the stochastic MILP and DP methods used to solve it.

A. General sequential stochastic optimisation problem

A sequential stochastic optimisation problem comprises:

• A sequence of time-steps, K = {1, ..., k, ..., K}, where k and K denote a particular time-step and the total number of time-steps in the decision horizon, respectively.
• A set of controllable devices, I = {1, ..., i, ..., I}, where each i is represented using:
  – A state variable, s^i_k ∈ S.
  – A decision variable, x^i_k ∈ X, which is a control action.
• A set of non-controllable inputs, J = {1, ..., j, ..., J}, where each j is represented using:
  – A state variable, s^j_k ∈ S.
  – A random variable, ω^j_k ∈ Ω, capturing exogenous information or perturbations.
Given this, we let s_k = [s^i_k ... s^I_k, s^j_k ... s^J_k]^T, x_k = [x^i_k ... x^I_k]^T, and ω_k = [ω^j_k ... ω^J_k]^T. Note that the state variables contain the information that is necessary and sufficient to make the decisions and compute costs, rewards and transitions.
• Constraints for the state and decision variables.
• Transition functions, s_{k+1} = s^M(s_k, x_k, ω_k), describing the evolution of the states from k to k + 1, where s^M(·) is the system model, which consists of controllable device i's operational constraints such as power flow limits, efficiencies and losses. Note that transition functions are only required for the controllable devices.
• An objective function:

  F = E{ Σ_{k=1}^{K} C_k(s_k, x_k, ω_k) },   (1)

  where C_k(s_k, x_k, ω_k) is the contribution (i.e. cost or reward of energy, or a discomfort penalty) incurred at time-step k, which accumulates over time.
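The components above can be exercised with a generic simulation loop; the following minimal Python sketch evaluates one sampled term of the expectation in (1), with the policy, transition model and contribution function supplied by the caller as hypothetical stand-ins for the formal definitions:

```python
def simulate_policy(policy, transition, contribution, s1, sample_omega, K):
    """Accumulate sum_{k=1}^{K} C_k(s_k, x_k, w_k) along one sampled path
    of the random information (a single realisation of the sum in (1))."""
    s, total = s1, 0.0
    for k in range(1, K + 1):
        x = policy(k, s)             # decision from the current state
        w = sample_omega(k)          # exogenous random information
        total += contribution(k, s, x, w)
        s = transition(s, x, w)      # s_{k+1} = s^M(s_k, x_k, w_k)
    return total

def expected_cost(policy, transition, contribution, s1, sample_omega, K,
                  runs=1000):
    """Monte Carlo estimate of the expectation F in (1)."""
    return sum(simulate_policy(policy, transition, contribution, s1,
                               sample_omega, K) for _ in range(runs)) / runs
```

With a deterministic noise model (`sample_omega` returning 0), both functions reduce to a plain rollout of the chosen policy.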

B. Instantiation

The objective of the SHEMS is to minimise energy costs over a decision horizon. The sequential stochastic optimisation problem is solved before the start of each day, using either a daily or a two-day decision horizon. In this paper we consider a PV unit, a battery and a hot water system (TES unit), as depicted in Fig. 2. We use a single inverter for both the battery and the PV, a configuration that is becoming popular in Australia.

Fig. 2. Illustration of electrical and thermal energy flows in a smart home, and the state, decision and random variables used to formulate the problem.

To optimise performance, a SHEMS needs to incorporate the variations in the PV output and in the electrical and thermal demand of the household. Given this, we model the stochastic variables using their means as state variables and their variations as random variables. This enables us to use an algorithmic strategy that separates the transition function into a deterministic term, using the mean, and a random term, using the variation (discussed in Section III). In some cases, electricity prices may also be considered stochastic. However, in this paper, we assume that the exact electricity prices are available before the start of the decision horizon from a residential DR aggregator/retailer (i.e. in the form of ToUP).

In more detail, we cast our SHEMS problem as the sequential stochastic optimisation formulation in Section II-A as follows. The daily decision horizon is a 24-hour period, divided into K = 48 time-steps with a 30-minute resolution; the two-day decision horizon is treated similarly. The 30-minute time resolution is chosen to match typical dispatch timelines, and because the PV and demand data from the Smart Grid Smart City project [34] are only available at 30-minute intervals. If required, the proposed ADP approach can increase the time resolution with less computational burden than existing methods. The controllable devices are the battery and the TES, while the non-controllable inputs are the PV output and the electrical and thermal demand. As depicted in Fig. 2, for each time-step, k, in the decision horizon, state variables represent: the battery SOC, s^b_k; the TES SOC, s^t_k; the mean PV output, s^pv_k; the mean electrical demand, s^{d,e}_k; the mean thermal demand, s^{d,t}_k; and the electricity tariff, s^p_k. Control variables consist of: the charge and discharge rates of the battery, x^b_k; and the electric water heater input, x^wh_k. Random variables are: the variations in PV output, ω^pv_k; the variations in thermal demand, ω^{d,t}_k; and the variations in electrical demand, ω^{d,e}_k. We use empirical data to estimate the probability distributions associated with the uncertain variables using kernel regression, which is more realistic than assuming a parametric form (discussed in detail in Section IV).
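As a non-parametric density estimate of the kind described above, a Gaussian kernel density estimator can be sketched in a few lines; the sample values and bandwidth below are illustrative, not taken from the paper's data set:

```python
import math

def gaussian_kde(samples, bandwidth):
    """Kernel-estimated probability density from empirical samples, in the
    spirit of the paper's non-parametric treatment of PV-output and demand
    variations (the samples and bandwidth here are hypothetical)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)
    return density

# Hypothetical half-hourly PV-output deviations (kWh) around the mean:
pv_variation = [-0.3, -0.1, 0.0, 0.05, 0.1, 0.2, 0.4]
pdf = gaussian_kde(pv_variation, bandwidth=0.15)
```

The returned density is highest near the bulk of the observed deviations and integrates to one over the real line.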

The energy balance constraint is given by:

s^{d,e}_k + ω^{d,e}_k + x^wh_k = μ^i x^i_k + x^g_k,   (2)

where x^i_k = s^pv_k + ω^pv_k − μ^b x^b_k is the inverter power at the DC side (a positive value means power into the inverter); μ^i is the efficiency of the inverter (note that the efficiency is 1/μ^i when the inverter power is negative); μ^b is the efficiency of the battery action corresponding to either charging or discharging; and x^g_k is the electrical grid power. The charge rate of the battery is constrained by the maximum charge rate, x^{b+}_k ≤ γ^c, and the discharge rate by the maximum discharge rate, x^{b−}_k ≥ γ^d. The electric water heater input must never exceed the maximum possible input, x^wh_k ≤ γ^wh. To satisfy thermal demand at all time-steps, we ensure that the TES has enough energy at each time-step, s^{t,req}_k, to satisfy the thermal demand for the next 2 hours. Therefore, the energy stored in the TES is always within the limits:

s^{t,req} ≤ s^t_k ≤ s^{t,max}.   (3)

The energy stored in the battery must be within the limits s^{b,min} ≤ s^b_k ≤ s^{b,max}.

Transition functions govern how the state variables evolve over time. The battery SOC, denoted s^b_k ∈ [s^{b,min}, s^{b,max}], progresses by:

s^b_{k+1} = (1 − l^b(s^b_k)) s^b_k − x^{b−}_k + μ^{b+} x^{b+}_k,   (4)

where l^b(s^b_k) models the self-discharging process of the battery. The TES SOC is denoted s^t_k ∈ [s^{t,req}, s^{t,max}], and evolves according to:

s^t_{k+1} = (1 − l^t(s^t_k)) s^t_k − s^{d,t}_k − ω^{d,t}_k + μ^wh x^wh_k,   (5)

where l^t(s^t_k) models the thermal loss of the TES and μ^wh is the efficiency of the electric water heater. In the above, both transition functions are non-linear functions of the state.
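Transitions (4) and (5) can be written directly as code; the sketch below uses constant loss and efficiency figures for simplicity, whereas in the paper the losses l^b(·) and l^t(·) depend non-linearly on the state:

```python
def battery_next_soc(s_b, x_charge, x_discharge, mu_b_plus=1.0,
                     self_discharge=0.001):
    """Battery SOC transition (4): self-discharge loss, discharge drawn out,
    charge scaled by the charging efficiency (figures are illustrative)."""
    return (1.0 - self_discharge) * s_b - x_discharge + mu_b_plus * x_charge

def tes_next_soc(s_t, thermal_demand, demand_variation, x_wh, mu_wh=0.9,
                 thermal_loss=0.005):
    """TES SOC transition (5): thermal loss, thermal demand (mean plus
    variation) drawn off, heater input scaled by the heater efficiency."""
    return ((1.0 - thermal_loss) * s_t - thermal_demand - demand_variation
            + mu_wh * x_wh)
```

With losses set to zero, a 1 kWh charge action moves a 5 kWh battery to 6 kWh, and 1 kWh of heater input at μ^wh = 0.9 adds 0.9 kWh to the TES.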

The discharge efficiency of the battery and the efficiency of the inverter are non-linear, and the different ways in which the stochastic MILP, DP and ADP approaches represent them are illustrated in Fig. 3. These indicate that DP and ADP can directly incorporate non-linear characteristics, while linear approximations have to be made with stochastic MILP. For all the implemented SHEMSs, the following device characteristics are the same: the charging efficiency of the battery is μ^{b+} = 1; the minimum and maximum battery SOC are 2 kWh and 10 kWh, respectively; the maximum charge and discharge rates of the battery are 2 kWh; the electric water heater efficiency is μ^wh = 0.9, while its maximum possible input is 3 kWh; and the TES limit is set to 12 kWh.

The optimal policy, π*, is a choice of action for each state, π : S → X, that minimises the expected sum of future costs over the decision horizon; that is:

F^{π*} = min_π E{ Σ_{k=0}^{K} C_k(s_k, π(s_k), ω_k) },   (6)

where C_k(s_k, x_k, ω_k) is the cost incurred at a given time-step, which is given by:

C_k(s_k, x_k, ω_k) = s^p_k (s^{d,e}_k + ω^{d,e}_k − μ^i x^i_k + x^wh_k).   (7)

Fig. 3. Characteristics of the battery and the inverter: (a) inverter efficiency vs. inverter input power, and (b) battery discharge efficiency vs. discharge rate, as represented by ADP/DP and by stochastic MILP.

Note that we do not use any specific user comfort criteria in the contribution function. However, we endeavour to supply the thermal demand at all time-steps without any user discomfort, by penalising undesired states of the TES in DP and ADP, and by directly using constraint (3) in stochastic MILP.

The problem is formulated as an optimisation of the expected contribution because the contribution is generally a random variable due to the effect of ω_k. In all the SHEMSs, we obtain the decisions x_k = π(s_k) = [x^b_k, x^wh_k], depending on the state variables s_k = [s^b_k, s^t_k, s^{d,e}_k, s^pv_k, s^{d,t}_k, s^p_k], and realisations of the random variables ω_k = [ω^pv_k, ω^{d,e}_k, ω^{d,t}_k], at each time-step.
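The per-step cost (7) is the tariff multiplied by the net grid draw implied by the energy balance (2); a minimal sketch, in which the constant efficiencies are illustrative stand-ins for the non-linear curves of Fig. 3:

```python
def grid_power(demand_e, omega_e, x_wh, pv_mean, omega_pv, x_batt,
               mu_inv=0.96, mu_batt=0.95):
    """Grid import implied by the energy balance (2):
    x_g = demand + heater input - inverter AC output.
    Efficiencies are hypothetical constants (the paper uses non-linear
    characteristics); the inverter efficiency flips to 1/mu when the
    DC-side power is negative."""
    x_inv = pv_mean + omega_pv - mu_batt * x_batt   # DC-side inverter power
    eff = mu_inv if x_inv >= 0 else 1.0 / mu_inv
    return demand_e + omega_e + x_wh - eff * x_inv

def step_cost(price, demand_e, omega_e, x_wh, pv_mean, omega_pv, x_batt):
    """Per-step cost (7): tariff times the net grid draw."""
    return price * grid_power(demand_e, omega_e, x_wh,
                              pv_mean, omega_pv, x_batt)
```

A negative grid power corresponds to surplus PV generation, which, given the low FiTs discussed in Section I, an effective SHEMS would rather store or consume locally.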

C. Solution techniques

The first method we use is a scenario-based MILP approach, which we referred to as stochastic MILP in [23]. This technique requires us to linearise the constraints and transition functions of Section II-B and model the problem as a mathematical program. The second method we use is DP, in which we model our problem as a Markov decision process (MDP). This method enables us to incorporate all the non-linear constraints and transition functions with no additional computational burden over using linear constraints and transition functions. Details of these methods are as follows:

1) Stochastic MILP: The deterministic version of the SHEMS problem can be solved using a MILP approach, which optimises a linear objective function subject to linear constraints with continuous and integer variables [13]. Note that the transition functions presented in Section II-B are treated as constraints in the MILP formulation. Integer variables are used to model power flow directions.

To incorporate stochasticity, a large set of scenarios is generated by sampling from all combined realisations of the stochastic variables of Section II-B. A larger number of scenarios should improve the solutions by better capturing the stochastic variables, but this imposes a greater computational burden. Therefore, heuristic scenario reduction techniques are employed to obtain a scenario set of size N, which can be solved within a given time with reasonable accuracy.

Given this, a scenario-based stochastic MILP formulation of the problem is described by:

min Σ_{n=1}^{N} P_n(s^{j,n}) Σ_{k=1}^{K} ( s^{p,buy}_k x^{g+}_k − s^{p,sell}_k x^{g−}_k ),   (8)

where P_n(s^{j,n}) is the probability of a particular scenario n corresponding to realisations of the stochastic variables s^j, subject to Σ_{n=1}^{N} P_n(s^{j,n}) = 1.

For each realised scenario, the optimisation problem is solved for the whole horizon at once using a standard MILP solver, so the solution time grows exponentially with the length of the horizon. Here CPLEX is used; however, all commercial solvers give solutions of similar quality. As such, in the existing literature, a one-day optimisation horizon is typically assumed. Moreover, the solutions are of lower quality because of the linear approximations made and the inability to incorporate all the probability distributions [23]. In response to these limitations, DP was proposed in [22] to improve the solution quality.

2) Dynamic programming (DP): The problem in (6) is easily cast as an MDP due to the separable objective function and the Markov property of the transition functions. Given this, DP solves the MDP form of (6) by computing a value function V^π(s_k). This is the expected future cost of following a policy, π, starting in state s_k, and is given by:

V^π(s_k) = Σ_{s′∈S} P(s′ | s_k, π(s_k), ω_k) [ C(s_k, π(s_k), s′) + V^π(s′) ].   (9)

An optimal policy, π*, is one that minimises (6), and which also satisfies Bellman's optimality condition:

V^{π*}_k(s_k) = min_π { C_k(s_k, π(s_k)) + E{ V^{π*}_{k+1}(s′) | s_k } }.   (10)

The expression in (10) is typically computed using backward induction, a procedure called value iteration, and then an optimal policy is extracted from the value function by selecting a minimum-value action for each state. This is the key functional point of difference between DP and stochastic MILP. DP enables us to plan offline by generating value functions for every time-step. Once we have the value functions, we can make faster online decisions using (10) (more details are given towards the end of this section). Note that a value function at a given time-step consists of the expected future cost from all the states. This process of mapping states to actions is not possible with stochastic MILP.
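Backward induction over a finite horizon can be sketched for a deterministic toy problem; the model functions below (costs, transitions, terminal penalty) are caller-supplied stand-ins, not the paper's device models:

```python
def value_iteration(K, states, actions, cost, step, terminal_cost):
    """Finite-horizon backward induction computing (10) for a deterministic
    toy MDP: V_k(s) = min_x [ C_k(s, x) + V_{k+1}(s') ], with the policy
    recorded as the minimising action at each (k, s)."""
    V = {(K, s): terminal_cost(s) for s in states}
    policy = {}
    for k in range(K - 1, -1, -1):     # sweep backward in time
        for s in states:
            best_x, best_v = None, float("inf")
            for x in actions(s):
                v = cost(k, s, x) + V[(k + 1, step(s, x))]
                if v < best_v:
                    best_x, best_v = x, v
            V[(k, s)] = best_v
            policy[(k, s)] = best_x
    return V, policy
```

On a three-level storage with a cheap first period and an expensive second one (and a terminal penalty for ending empty), the extracted policy buys early and sells late, which is exactly the arbitrage behaviour a SHEMS exploits.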

An illustration of a deterministic DP using a simplified model of a battery storage is shown in Fig. 4. At every time-step, there are three battery SOC states (i.e. highest, middle and lowest) and three possible battery actions that result in different instantaneous costs. At the last time-step, k = K, the expected future cost from the desired state, s_K = M, is zero, while the other two states are penalised with a large cost. This is an important step that allows us to control the end-of-day battery SOC (discussed in Section V). The expected future cost at every possible state is calculated using (10), which is the minimum of the combined instantaneous cost resulting from the decision that we take and the expected future cost from the state we end up at in the next time-step.

Fig. 4. A deterministic DP example using a battery storage, where the expected future cost is calculated using (10). The instantaneous contributions from the battery decisions are on the edges of the lines, while the expected future cost is below the states. The optimal policy satisfies (6) and is obtained using (10).

In Fig. 4, the instantaneous cost is on the edges of the lines, while the expected future cost is below the states. An optimal policy is extracted from the value functions by selecting a minimum-value action for each state using (10). For example, from s^b_1, if we take the optimal decision to go to s^b_2 = L, then the total combined cost of 10 consists of an instantaneous cost of 2 and an expected future cost of 8. Even though the expected future cost of 7 from s^b_2 = M is lower than the expected future cost from s^b_2 = L, the instantaneous cost that takes us there is 4, so the total combined cost is 11. Given this, the expected future cost of following the optimal policy from s^b_1 is 10, and at time-step 2 we will be at s^b_2 = L.
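The arithmetic in the Fig. 4 walk-through can be checked directly; a minimal rendering of the two candidate decisions from s^b_1, with the values taken from the discussion above:

```python
# Candidate decisions from the first state in the Fig. 4 walk-through:
# (instantaneous cost of the action, expected future cost of the landing state)
to_L = (2, 8)   # go to the low SOC state
to_M = (4, 7)   # go to the middle SOC state

total_L = sum(to_L)        # combined cost via L
total_M = sum(to_M)        # combined cost via M

# The Bellman recursion (10) picks the minimum combined cost, so the optimal
# decision is to go to L even though M has the lower expected future cost.
best = min(total_L, total_M)
```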

There are several reasons to prefer DP over MILP. First, DP produces close-to-optimal solutions when the value functions obtained during the offline planning phase are from finely discretised state, action and outcome spaces. Second, in practical applications, the SHEMS can make real-time decisions using the policy implied by (10). This means that at each time-step, the optimal decision from the current state can be executed. Note that (10) is a simple linear program at each time-step, so it is computationally feasible using existing smart meters. This stands in contrast to a stochastic MILP formulation, which would require solving the entire stochastic MILP program, which is computationally difficult even for the offline planning. Third, we can always obtain a solution with DP regardless of the constraints and the inputs, while MILP fails to find a solution when the constraints cannot be satisfied. From our experience, MILP fails to find a solution when the end-of-day TES SOC is fixed, because the energy drawn from the TES unit, which is the thermal demand of the household, cannot be controlled. We can overcome this by either removing the end-of-day TES constraint or by allowing a range of values. However, this means that we end up with a sub-optimal TES SOC at the end of the day, or require user interaction to adjust the TES SOC, which we should avoid in practical applications.

Algorithm 1: ADP using Temporal Difference Learning TD(1)
1: Initialise V̄^0_k, ∀k ∈ K.
2: Set r = 1 and k = 1.
3: Set s_1.
4: while r ≤ R do
5:   Choose a sample path ω^r.
6:   for k = 0, ..., K do
7:     Solve the deterministic problem (17).
8:     for i = 1, ..., I do
9:       Find the right and left marginal contributions (18).
10:    end for
11:    if k < K then
12:      Find the post-decision states (11) and the next pre-decision states (12).
13:    end if
14:  end for
15:  for k = K, ..., 0 do
16:    Calculate the marginal values (19).
17:    Update the estimates of the marginal values (20).
18:    Update the VFAs using the CAVE algorithm.
19:    Combine the value functions of each controllable device (16).
20:  end for
21:  r = r + 1.
22: end while
23: Return the value function approximations V̄^R_k, ∀k.
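The double-pass structure of Algorithm 1 can be illustrated on a stripped-down scalar storage problem. In this sketch the VFA is a plain lookup table with optimistic initialisation (the paper uses concave piecewise-linear functions updated with CAVE), the exogenous noise is omitted, and the prices and storage levels are hypothetical:

```python
def td1_train(K, n_levels, prices, R=200, alpha=0.2):
    """TD(1) sketch in the spirit of Algorithm 1: a forward pass follows the
    policy implied by the current VFA, then a backward pass folds the
    realised downstream rewards into the estimates with stepsize alpha.
    Rewards are -price * action (charging costs, discharging earns)."""
    # Optimistic initial values encourage exploration; terminal values are 0.
    V = [[10.0] * n_levels for _ in range(K)] + [[0.0] * n_levels]
    for _ in range(R):
        path, s = [], 1                            # start mid-level each pass
        for k in range(K):                         # forward pass
            _, x = max((-prices[k] * x + V[k + 1][s + x], x)
                       for x in (-1, 0, 1) if 0 <= s + x < n_levels)
            path.append((s, -prices[k] * x))       # (visited state, reward)
            s += x
        cum = 0.0
        for k in range(K - 1, -1, -1):             # backward pass
            s_k, reward = path[k]
            cum += reward                          # realised reward-to-go
            V[k][s_k] = (1 - alpha) * V[k][s_k] + alpha * cum
    return V
```

With a cheap first period and an expensive second, the learned value of starting mid-level converges to the profit from holding and then discharging at the high price.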

However, the required computation to generate value functions using DP grows exponentially with the size of the state, action and outcome spaces. One way of overcoming this problem is to approximate the value function, while maintaining the benefits of DP.

III. APPROXIMATE DYNAMIC PROGRAMMING (ADP)

ADP, also known as forward DP, is an algorithmic strategy for approximating a value function that steps forward in time, in contrast to the backward induction used in value iteration. Policies in ADP are extracted from these VFAs [29]. Similar to DP, ADP operates on an MDP formulation of the problem, so all the non-linear constraints and transition functions in Section II-B can be incorporated with the same computational burden as modelling linear transition functions and constraints. ADP is an anytime optimisation technique², so we always obtain a solution regardless of the constraints and the inputs. In this instance, the problem is formulated as a maximisation problem for convenience.

A. Policy-based value function approximation

VFAs are obtained iteratively, and here the focus is on approximating the value function around a post-decision state vector, s^x_k, which is the state of the system at discrete time k, soon after making the decisions but before the realisation of any random variables [29]. This is because approximating the expectation within the max or min operator in (10) is difficult in large practical applications, as transition probabilities from all the possible states are required. Pseudo-code of the method used to approximate the value function is given in Algorithm 1, which is a double-pass algorithm referred to as temporal difference learning with a discount factor λ = 1, or TD(1).

²An anytime algorithm returns a feasible solution even if it is interrupted prematurely. The quality of the solution, however, improves if the algorithm is allowed to run until the desired convergence.

Fig. 5. Illustration of the modified Markov decision process, which separates the state variables into post-decision states and pre-decision states.

Given this, the original transition function s_{k+1} = s^M(s_k, x_k, ω_k) is divided into the post-decision state:

s^x_k = s^{M,x}(s_k, x_k),   (11)

and the next pre-decision state:

s_{k+1} = s^{M,ω}(s^x_k, ω_k),   (12)

which are used in line 12 of Algorithm 1.

An example of the modified MDP is illustrated in Fig. 5, which uses the mean and the variation of the stochastic variables to obtain the post-decision and next pre-decision states, respectively. In more detail, at s_1, there are three possible decisions that take us to three post-decision states, corresponding to the highest, middle and lowest states. However, the next pre-decision state, s_2, depends on the random variables ω_1.
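For a scalar storage state, the split (11)/(12) amounts to applying the decision deterministically and then realising the noise; a minimal sketch with illustrative numbers:

```python
def post_decision_state(s_b, x_b):
    """(11): apply the decision deterministically, before any noise —
    the state the VFA is built around."""
    return s_b + x_b

def next_pre_decision_state(s_x, omega):
    """(12): realise the exogenous information on top of the
    post-decision state."""
    return s_x + omega

# From s_1 = 5 kWh, charging by 1 kWh gives the post-decision state 6 kWh;
# a realised net variation of -0.4 kWh then yields the next pre-decision
# state (values are hypothetical).
s_x = post_decision_state(5.0, 1.0)
s_2 = next_pre_decision_state(s_x, -0.4)
```

The payoff of this decomposition is that the VFA in (14) is evaluated at s^x_k, a deterministic function of the decision, so no transition probabilities are needed inside the optimisation.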

Given this, the new form of the value function is written as:

    \bar{V}_k^{\pi}(s_k) = \max_{x_k} \{ C_k(s_k, x_k) + \bar{V}_k^{\pi,x}(s_k^x) \},        (13)

where \bar{V}_k^{\pi,x}(s_k^x) is the VFA around the post-decision state s_k^x, given by:

    \bar{V}_k^{\pi,x}(s_k^x) = E[ V_{k+1}^{\pi}(s_{k+1}) \,|\, s_k^x ].        (14)

This method is computationally feasible because E[V_{k+1}^{\pi}(s_{k+1}) | s_k^x] is a function of the post-decision state s_k^x, which is a deterministic function of x_k. However, in order to solve (13), we still need to calculate the value functions in (14) for every possible state s_k^x for all k. This can be computationally difficult since s_k^x is continuous and multidimensional, so we approximate (14).

Two strategies are employed. First, we construct lookup tables for the VFAs in (14) that are concave and piecewise linear in the resource dimension of all the state variables of the controllable devices [25]. For example, in the VFA for k = 49, which is depicted in Fig. 6(a), the expected future rewards stay the same after approximately 7 kWh, so if we are at 7 kWh in time-step k = 48, charging the battery further will yield no future reward and will only incur an instantaneous cost if the electricity has to come from the grid. However, if the electricity price or demand is high, then we can discharge the battery, as the expected future rewards will only decrease slightly. Given this, we never charge the storage when there is no marginal value, so the slopes of the VFA are always greater than or equal to zero.

Fig. 6. (a) Expected future reward or VFA (14) for following the optimal policy vs. state of the battery for time-steps k = 49 and k = 60, and (b) value of the objective function (i.e. reward) vs. iterations for the ADP approach and the expected value from DP.

Accordingly, the VFA is given by:

    \bar{V}_k^i(s_k^{i,x}) = \sum_{a=1}^{A_k} \bar{v}_{ka} z_{ka},        (15)

where \sum_a z_{ka} = s_k^{i,x} and 0 \le z_{ka} \le \bar{z}_{ka} for all a. Here, z_{ka} is the resource coordinate variable for segment a \in (1, ..., A_k), A_k \in \mathcal{A}, \bar{z}_{ka} is the capacity of the segment and \bar{v}_{ka} is the slope. Other strategies that could be used for this step are parametric and non-parametric approximations of the value functions [29].
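A lookup-table VFA of the form (15) can be evaluated by filling the segments in order of decreasing slope. The segment capacities and slope values below are illustrative assumptions, not values learned by the algorithm:

```python
import numpy as np

def evaluate_vfa(s, seg_caps, slopes):
    """Evaluate the piecewise-linear concave VFA (15) at resource level s by
    filling segments in order; slopes must be sorted non-increasing so that
    the greedy fill is optimal (concavity)."""
    value, remaining = 0.0, float(s)
    for cap, slope in zip(seg_caps, slopes):
        z = min(cap, max(remaining, 0.0))  # 0 <= z_a <= capacity of segment a
        value += slope * z
        remaining -= z
    return value

# Illustrative VFA (assumed numbers): ten 1 kWh segments with diminishing
# marginal value, flat (zero slope) beyond roughly 7 kWh, as in Fig. 6(a).
seg_caps = np.ones(10)
slopes = np.maximum(0.0, 0.20 - 0.03 * np.arange(10))
```

Concavity (non-increasing slopes) is what makes the greedy fill optimal; it also allows the decision problem in (13) to be solved as a small linear program.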

Second, we handle the multidimensional state space by generating independent VFAs for each controllable device, which are then combined to obtain the optimal policy. The separable VFA is given by:

    \bar{V}_k(s_k^x) = \sum_{i=1}^{I} \bar{V}_k^i(s_k^{i,x}).        (16)

It is possible to separate the VFAs for the battery and the TES because their state transitions are independent, as shown in (4) and (5), respectively. Instead, the inter-device coupling between the battery and the TES arises only through their effect on energy costs, which is captured in the contribution function in (7). If the state transition functions of the controllable devices depend on each other, then the VFAs are not separable and we have to use multidimensional value functions. In such situations, the number of iterations needed for VFA convergence increases, and concavity needs to be generalised as well.

In more detail, Algorithm 1 proceeds as follows:

1) Set the initial VFAs to zero (i.e. all the slopes to zero) or to an initial guess to speed up the convergence (lines 1-3). Estimates for the initial VFAs can be obtained by solving the deterministic problem using MILP. The value of the initial starting state s_1^i is assumed.

2) For each realisation of the random variables, step forward in time by solving the following deterministic problem (line 7) using the VFA from the previous iteration:

    x_k^r = \arg\max_{x_k \in X_k} \{ C(s_k^r, x_k) + \bar{V}_k^{r-1}(s_k^{x,r}) \}
          = \arg\max_{x_k \in X_k} \{ C(s_k^r, x_k) + \sum_{a=1}^{A_k^{r-1}} \bar{v}_{ka}^{r-1} z_{ka} \}.        (17)

3) Determine the positive and the negative marginal contributions \hat{c}_k^{r,i+}(s_k^{r,i}) and \hat{c}_k^{r,i-}(s_k^{r,i}), respectively (line 9), for each controllable device, using:

    \hat{c}_k^{r,i+}(s_k^{r,i}) = [ c_k^{r,i+}(s_k^{r,i+}, x_k^{r,i+}) - c_k^{r,i}(s_k^{r,i}, x_k^{r,i}) ] / \delta s,
    \hat{c}_k^{r,i-}(s_k^{r,i}) = [ c_k^{r,i}(s_k^{r,i}, x_k^{r,i}) - c_k^{r,i-}(s_k^{r,i-}, x_k^{r,i-}) ] / \delta s,        (18)

where s_k^{r,i+} = s_k^{r,i} + \delta s, x_k^{r,i+} = X_k^{\pi}(s_k^{r,i+}), and \delta s is the mesh size of the state space. We do this similarly for s_k^{r,i-} and x_k^{r,i-}.

4) Find the post-decision and the next pre-decision states using (11) and (12), respectively. The transition functions of the controllable devices can be non-linear (line 12).

5) Starting from K, step backward in time to compute the slopes \hat{v}_k^{r,i+}, which are then used to update the VFA (line 16). Compute \hat{v}_k^{r,i+} as:

    \hat{v}_k^{r,i+}(s_k^{r,i}) = \hat{c}_K^{r,i+}(s_K^{r,i})   if k = K,
    \hat{v}_k^{r,i+}(s_k^{r,i}) = \hat{c}_k^{r,i+}(s_k^{r,i}) + \Delta_k^{r,i+} \hat{v}_{k+1}^{r,i+}(s_{k+1}^{r,i})   otherwise,        (19)

where \Delta_k^{r,i+} = (1/\delta s) S^M(x_k^{r,i} - x_k^{r,i+}) is the marginal flow (i.e. whether or not there is a change in the energy in the storage as a result of the perturbation). We do this similarly for \hat{v}_k^{r,i-}(s_k^{r,i}). Note that we take the power coming out of the storage as negative.

6) Update the estimates of the marginal values [27]:

    \bar{v}_{k-1}^{r,i+}(s_{k-1}^{x,r,i}) = (1 - \alpha_{r-1}) \bar{v}_{k-1}^{r-1,i+}(s_{k-1}^{x,r,i}) + \alpha_{r-1} \hat{v}_k^{r,i+},        (20)

where \alpha is a "stepsize", \alpha \in (0, 1], and similarly for \bar{v}_{k-1}^{r,i-}(s_{k-1}^{x,r,i}) (line 17). In this research, a harmonic stepsize formula is used, \alpha = b/(b + r), where b is a constant. This stepsize formula satisfies conditions ensuring that the values converge as r \to \infty [36].
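The smoothing update (20) and the harmonic stepsize can be written directly. The constant b = 25 below is an arbitrary illustrative choice, not the paper's tuned value:

```python
def harmonic_stepsize(r, b=25.0):
    """Harmonic stepsize alpha_r = b/(b + r). It satisfies the standard
    stochastic-approximation conditions (sum of alpha diverges, sum of
    alpha^2 converges), so the slope estimates converge as r -> infinity."""
    return b / (b + r)

def update_slope(v_bar_prev, v_hat, r):
    """Smoothing update (20): blend the previous slope estimate with the
    newly sampled marginal value v_hat using alpha_{r-1}."""
    alpha = harmonic_stepsize(r - 1)
    return (1.0 - alpha) * v_bar_prev + alpha * v_hat
```

At r = 1 the stepsize is 1, so the initial guess is overwritten by the first sampled slope; later iterations average new samples in ever more gently.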

7) Use the concave adaptive value estimation (CAVE) algorithm to update the VFAs [37] (line 18).

8) Combine the value functions of each device using (16).

9) Repeat this procedure over R iterations, which are generated randomly according to the probability distributions of the random variables. We find that R = 1000 realisations is enough for the objective function to come within an acceptable accuracy even for the worst possible scenario. We investigated a range of scenarios and an example is given in Fig. 6(b).

Fig. 7. Probability density functions of (a) the PV output over different times of a sunny day (January 1st, 2013), and (b) the electrical demand over different times of a high demand day (July 2nd, 2012).
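The forward-pass decision in step 2, i.e. equation (17), reduces to a small enumeration when the decisions are discretised. The sketch below uses a toy contribution function, toy prices and a constant-slope VFA purely for illustration; none of these numbers come from the paper:

```python
import numpy as np

def greedy_decision(k, s, price, slopes, cap, actions=(-1, 0, 1)):
    """Forward-pass decision (17): enumerate the feasible discrete storage
    decisions and pick the one maximising the instantaneous contribution plus
    the post-decision VFA. Toy contribution model: charging x kWh costs
    price[k]*x from the grid; discharging (x < 0) offsets demand at the same
    rate."""
    def vfa(state):  # piecewise-linear VFA with unit segments, as in (15)
        return slopes[k + 1, :int(state)].sum()
    feasible = [x for x in actions if 0 <= s + x <= cap]
    values = {x: -price[k] * x + vfa(s + x) for x in feasible}
    return max(values, key=values.get)

# Toy setup (assumed): three prices and a VFA valuing stored energy at $0.30/kWh
price = np.array([0.11, 0.20, 0.47])
slopes = np.full((4, 10), 0.30)
x_cheap = greedy_decision(0, 5, price, slopes, cap=10)  # charges when cheap
x_peak = greedy_decision(2, 5, price, slopes, cap=10)   # discharges at the peak
```

Even this toy version shows the intended behaviour: the VFA slope acts as a shadow price of stored energy, so the policy charges when the tariff is below it and discharges when the tariff is above it.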

Note that when solving a deterministic SHEMS problem using ADP, the post-decision and next pre-decision states are the same because the random variables will be zero. However, the remaining steps in Algorithm 1 stay the same. This means that, with ADP, there is no noticeable computational burden in considering variation in the stochastic variables compared to solving the deterministic problem. In contrast, with DP we have to loop over all possible combinations of realisations of the random variables, which significantly increases the computational burden.

We now present the probabilistic models of the stochastic variables.

IV. ESTIMATING STOCHASTIC VARIABLE MODELS

In order to optimise performance, it is important for a SHEMS to incorporate variations in the PV output and the electrical and thermal demand, and to do so over a horizon of several days. The benefits of using a stochastic optimisation over a deterministic optimisation are discussed in Section V-D and in [13], [19], [20], [22], [23]. Given this, SHEMSs require the mean PV output and demand, with their appropriate probability distributions, before the start of the decision horizon. The effects of these random variations on the SHEMS problem are discussed below:

1) PV output depends on solar insolation, a forecast of

which can be obtained before the horizon starts with a

reasonable accuracy from weather forecasting services.

PV output is important to the SHEMS problem as it is

a key source of energy and is expected to be closely

coupled with the battery storage proﬁle. Failing to accom-

modate for variation in PV generation would be expected

to increase costs to the household as more power is

imported from the grid.

2) Electrical demand of the household depends on the num-

ber of occupants and their behavioural patterns, which is

difﬁcult to predict in the real world. In the context of

SHEMS, electrical demand should be supplied from the

DG units, storage units and the electrical grid. Failure to

accommodate variations in electrical demand may result

in additional costs to the household.

3) Thermal demand is also difﬁcult to predict in the real

world so failure to accommodate variations in thermal

demand may result in user discomfort.

In this paper, the mean PV output and electrical demand are taken from a data set collected during the Smart Grid Smart City project. The data set [34] consists of PV output and electrical demand measurements at 30-minute intervals over 3 years for 50 households. In real-world applications, the SHEMS estimates the mean PV output using the weather forecast [38] and the mean electrical demand using a suitable demand prediction algorithm [39], [40].

Commonly, probability distributions associated with these

stochastic variables are modelled as Gaussian or skew-Laplace

distributions. However, in this paper, we kernel estimate the

probability distributions of PV-output and electrical demand

using a hierarchical approach, where we ﬁrst cluster total

daily empirical data, and then kernel estimate probability

distributions within each cluster. In more detail, we obtain

the probability distributions of the PV output, which depend

on the time and type of day (sunny, normal or cloudy days)

in two steps.

1) First, we cluster the daily empirical data using a k-means algorithm to obtain three clusters with different total daily PV generation, corresponding to sunny, normal and cloudy days.

2) Second, for each time-step in the corresponding clusters, we estimate a probability distribution of the PV output using an Epanechnikov kernel estimation technique.
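The two-step hierarchical estimation can be sketched as follows. The 1-D k-means with quantile initialisation and the fixed bandwidth are simplifications for illustration:

```python
import numpy as np

def kmeans_1d(totals, k=3, iters=50):
    """Cluster total daily PV generation (or demand) into k day types,
    e.g. sunny/normal/cloudy. Minimal 1-D k-means with quantile
    initialisation; a library implementation would do equally well."""
    centres = np.quantile(totals, np.linspace(0.1, 0.9, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(totals[:, None] - centres[None, :]), axis=1)
        centres = np.array([totals[labels == j].mean() if np.any(labels == j)
                            else centres[j] for j in range(k)])
    return labels

def epanechnikov_pdf(x, samples, h):
    """Kernel density estimate at x from one time-step's samples within a
    cluster, using the Epanechnikov kernel K(u) = 0.75*(1 - u^2), |u| <= 1."""
    u = (x - samples) / h
    return np.mean(np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)) / h
```

The day-type label selects which per-time-step kernel estimate the SHEMS samples from during the ADP iterations.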

Note that the draws from the kernel estimates within a day-

type are independent. We obtain the probability distributions

of the electrical demand in a similar way except the clustering

is done according to days with high, normal or low demand

levels. The probability distributions of the PV output and

electrical demand follow skewed unimodal distributions, as depicted in Fig. 7. It is worthwhile to note that before the start

of the decision horizon, the SHEMS uses the predicted mean

PV-output and the electrical demand to determine the type of

day and hence the corresponding probability distribution. Note

that the prediction is accurate enough to decide the type of day

but not accurate enough to use a deterministic optimisation.

Given this, we investigate the effects on the total electricity

costs from deterministic and stochastic SHEMSs, using a range

of possible PV output and demand proﬁles, in Section V-D.

Finally, we construct the magnitudes of the thermal demand and the times at which they occur using Australian Standard AS4552 [41] and the hot water plug readings in [35]. We

assume a Gaussian distribution for the thermal demand be-

cause there is not enough empirical data to obtain a reasonable

distribution.

TABLE I
DAILY OPTIMISATION RESULTS FOR THE THREE SCENARIOS.

Total daily:                 Scenario 1   Scenario 2   Scenario 3
Electrical demand (kWh)           24.72        64.75        10.48
Thermal demand (kWh)              10.42         19.1         8.66
PV generation (kWh)                9.27         6.06        12.31
Benchmark cost ($)                 6.13         10.5         1.77
With PV ($)                        5.75         9.86         1.37
DP (s_1^b = s_K^b = 6) ($)         3.16         8.04         0.72
DP ($)                             3.05         7.96         0.59
ADP ($)                            3.14         7.79         0.6
Stochastic MILP ($)                3.15         8.11         0.63
Dummy TES control ($)              2.13         2.56         0.83
DP TES control ($)                 0.91         1.68         0.58
Marginal value of TES ($)          1.22         0.88         0.25

Now we show the performance of the presented SHEMSs

using real data.

V. SIMULATION RESULTS AND DISCUSSION

There are three sets of simulations. The first set consists of discussions of: the challenges of estimating the end-of-day SOC; the benefits of PV-storage systems; the quality of the solutions from ADP, DP and stochastic MILP; and the benefits of a stochastic optimisation over a deterministic one (Sections V-A, -B, -C and -D). The second set discusses computational aspects and the effects of extending the decision horizon (Sections V-E and -F). The third set is about the year-long optimisation (Section V-G). The time-of-use electricity tariff consists of off-peak, shoulder and peak periods at $0.11, $0.20 and $0.47 per kWh, respectively. On weekdays, the off-peak period is between 12 am and 7 am and between 10 pm and 12 am, and the peak is between 2 pm and 8 pm. On weekends, the peak period is replaced by the shoulder. The stochastic MILP uses 6000 scenarios. MATLAB is used to implement all the SHEMSs.
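The tariff can be encoded as a simple lookup, sketched here in Python for illustration (the paper's SHEMSs are implemented in MATLAB):

```python
def tou_rate(hour, weekday=True):
    """Time-of-use tariff from the simulations: off-peak $0.11/kWh between
    12 am-7 am and 10 pm-12 am, peak $0.47/kWh between 2 pm-8 pm on weekdays
    (replaced by the shoulder on weekends), and shoulder $0.20/kWh otherwise."""
    if hour < 7 or hour >= 22:
        return 0.11
    if weekday and 14 <= hour < 20:
        return 0.47
    return 0.20
```

The wide gap between the off-peak and peak rates is what makes shifting storage charging into the early morning profitable.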

A. Challenges of estimating the end-of-day SOC

In the first set of simulations, we discuss the challenges of estimating the end-of-day battery SOC (Section V-A), the benefits of residential PV-storage systems (Section V-B), the performance of the three SHEMSs over a day (Section V-C) and the benefits of using a stochastic optimisation over a deterministic one (Section V-D), using three scenarios for a residential building in Central Coast, NSW, Australia, shown in Fig. 8: Scenarios 1, 2 and 3 are on August 20th, 2012, July 2nd, 2012, and January 1st, 2013, respectively. The PV system size is 2.2 kWp.

From our preliminary investigations, and as depicted in Fig. 6(a), the expected future rewards at the start of day two (time-step k = 49) increase only slightly beyond 6 kWh, which suggests that using half of the available battery capacity as the start-of-day and end-of-day battery SOC (s_1^b = s_K^b = 6 kWh) in daily optimisations with DP is a valid assumption. However, we observe that on days with a low morning demand, high

Fig. 8. PV output and electrical and thermal demand for (a) Scenario 1, (b) Scenario 2 and (c) Scenario 3.

PV output and medium-high evening demand (see Fig. 8(c)), s_1^b = s_K^b = 2 kWh gives the best results, because the battery can be used to supply the evening demand and there is no need to charge it back. However, the next day's electricity cost can significantly increase if we are anticipating a high morning demand and low-to-high PV output (see Fig. 8(a)-(b)). Because of such situations, it is beneficial to control the end-of-day battery SOC by considering uncertainties over several days, which is our special focus in Sections V-E and V-F.

B. Beneﬁts of PV-storage systems

Residential PV-storage systems using a DP-based SHEMS result in significant financial benefits, which is evident from the electricity costs of the three scenarios in three instances (i.e. the benchmark cost with neither PV nor storage, the cost with PV but no storage, and the PV-storage system with a SHEMS) in Table I. The daily electricity cost can be reduced by 6.2%, 6.1% and 22.6% for Scenarios 1, 2 and 3, respectively, if there is only a PV system. We can further improve this by adding a battery and effectively controlling its SOC using a SHEMS, in which the battery is charged to a certain level from solar power and the electrical grid before peak periods. A DP-based SHEMS constrained to a 60% start-of-day and end-of-day battery SOC reduces the total electricity cost by a further 42.25%, 17.33% and 36.72% for Scenarios 1, 2 and 3, respectively. That is a total cost reduction of 48.45%, 23.43% and 59.32% for Scenarios 1, 2 and 3, respectively, by controlling both the battery and the TES. As shown in Fig. 8, the inhabitants are away during the day in Scenarios 1 and 3, so the extra PV generation is stored in the battery, which shows the benefit of having storage. In contrast to Scenarios 1 and 3, the electrical demand in Scenario 2

Fig. 9. The total electricity cost of the PV-battery systems of Scenarios 1-3 from deterministic and stochastic ADP for a range of possible PV output and demand profiles, which are generated by adding Gaussian noise with varying standard deviation and zero mean to the actual PV output and electrical demand.

exceeds the PV generation, so the benefit of the battery is minimal. The electricity costs of controlling the TES in Scenarios 1 and 3 are the lowest because the surplus solar and battery power is used to charge the TES instead of being sent back to the grid, as FiTs are negligible in Australia. Note that we obtain the benchmark cost of the TES by assuming a dummy control system, which operates regardless of the electricity price. The dummy TES control electricity cost varies depending on the time the hot water is used.

C. Quality of the solutions from ADP, DP and stochastic MILP

ADP and DP result in better quality solutions than stochastic MILP, as they both incorporate the stochastic input variables using appropriate probabilistic models and non-linear constraints [24]. However, ADP results in slightly lower quality schedules compared to the optimal DP solutions because the value functions used are approximations. This is evident in Table I. Note that the DP-based SHEMS with two controllable devices (battery and thermal) is computationally intractable for use in an existing smart meter, and we only use it to compare solutions with ADP.

D. Benefits of a stochastic optimisation over a deterministic optimisation

A stochastic optimisation always performs at least as well as a deterministic one, and we investigate this using a range of possible PV output and demand profiles, as shown in Fig. 9. Fig. 9 is obtained as follows. First, we obtain VFAs from both the stochastic and the deterministic optimisations using ADP. The stochastic optimisation uses the kernel-estimated probability distributions of the PV output and electrical demand, while the deterministic ADP uses only the predicted mean PV output and electrical demand (i.e. all the random variables are zero). Second, we obtain the total electricity cost for different possible PV output and demand profiles, which are generated by adding Gaussian noise with varying standard deviation and zero mean to the actual PV output and electrical demand. The mean absolute errors between the actual (i.e. zero Gaussian noise) and the predicted mean electrical demand are 0.238%, 0.696% and 0.117% for Scenarios 1, 2 and 3, respectively, which are the initial points. Note that the forecast errors associated with residential electrical demand predictions are typically very high; our aim here is not to minimise forecast errors but to find a suitable stochastic optimisation technique that performs well under uncertainty.
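The noisy test profiles for this comparison can be generated as follows. Clipping negative values at zero and the fixed seed are our assumptions for this sketch:

```python
import numpy as np

def noisy_profiles(pv, demand, sigmas, seed=0):
    """Generate test profiles for comparing deterministic and stochastic
    SHEMSs: add zero-mean Gaussian noise with each standard deviation in
    `sigmas` (kW) to the actual PV output and demand profiles."""
    rng = np.random.default_rng(seed)
    out = []
    for sigma in sigmas:
        pv_n = np.clip(pv + rng.normal(0.0, sigma, pv.shape), 0.0, None)
        d_n = np.clip(demand + rng.normal(0.0, sigma, demand.shape), 0.0, None)
        out.append((sigma, pv_n, d_n))
    return out

# Flat half-hourly profiles used purely as placeholders for the actual data
profiles = noisy_profiles(np.full(48, 1.0), np.full(48, 0.8),
                          sigmas=np.arange(0.0, 1.01, 0.2))
```

The sigma = 0 entry reproduces the actual profiles, which corresponds to the initial points on the horizontal axis of Fig. 9.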

ADP enables us to incorporate the stochastic input variables without a noticeable increase in computational effort over a deterministic ADP. Moreover, stochastic ADP requires less computational effort than deterministic DP. An ADP-based stochastic optimisation for a PV-battery system can reduce the total electricity cost by 13.62%, 0.16% and 94.67% for Scenarios 1, 2 and 3, respectively, in the instances without Gaussian noise. The benefits for Scenarios 1 and 3 are noticeable, and their forecast errors are what we can expect from electrical demand prediction algorithms [39], [40]. The benefit of the stochastic optimisation is minimal when the forecast errors are very high (Scenario 2) or very low, both of which are highly unlikely. In Scenario 2, the stochastic optimisation gives slightly lower quality results beyond a 0.2 kW standard deviation of Gaussian noise because both the forecast and kernel estimation errors are high. However, these situations are highly unlikely and the resulting cost is negligible compared to the benefits of a stochastic optimisation. Moreover, the initial point of Scenario 2 already has a mean absolute error of 0.696%, which is highly unlikely to increase any further in a practical scenario. In summary, even though the benefits vary with the scenario and the forecast error, a stochastic optimisation performs better than or the same as a deterministic one, and ADP provides these benefits without a noticeable increase in computational effort.

E. Computational aspects

In our second set of simulation results, we discuss the computational performance of the three solution techniques (Section V-E) and the benefits of extending the decision horizon (Section V-F) for households with PV-battery systems. ADP computes a solution much faster than both DP and stochastic MILP and, most importantly, ADP with a two-day decision horizon computes a solution in less than half the computational time of a daily DP, as depicted in Fig. 10(a). The computational times of the SHEMSs using ADP and DP both increase linearly as we increase the decision horizon; however, ADP has a smaller slope. This linear increase with DP arises because the state transitions in this problem are only between two adjacent time-steps, so the time does not increase exponentially. The computational time of ADP with a two-day decision horizon increases by only 4 minutes when

Fig. 10. Effects of extending the decision horizon for PV-battery systems: (a) computational time of ADP, DP and stochastic MILP against the length of the decision horizon, and (b) normalised electricity cost with error bars against the length of the decision horizon.

Fig. 11. Electricity cost savings over 3 years for 10 households, where the blue lines indicate the 25th and 75th percentiles and the red lines indicate the median.

the TES is added, while a finely discretised DP-based SHEMS takes approximately 2.5 hours.

F. Effects of extending the decision horizon

Extending the decision horizon beyond one day to consider inter-daily variations in PV output and electrical demand results in significant benefits, as depicted in Fig. 10(b), which shows the normalised electricity cost against the length of the decision horizon. We obtain our results using a DP-based SHEMS for 10 households over 2 months.³ Our results show that increasing the decision horizon beyond two days has no significant benefit. However, increasing the decision horizon up to one week is beneficial in some situations, e.g. off-grid systems, or if there are high variations in PV output and demand. The benefit of the two-day decision horizon varies depending on the household, which we discuss in the next section.

³Here we use DP as we want to obtain the exact solution. In a practical implementation, extending the decision horizon with DP is difficult as the computational power of existing smart meters is limited.

TABLE II
YEARLY OPTIMISATION RESULTS FOR TWO HOUSEHOLDS OVER 3 YEARS

                                     Household 1                 Household 2
Total:                        Year 1   Year 2   Year 3    Year 1   Year 2   Year 3
PV output (MWh)                 2.91     2.82     2.89      5.99     5.56     5.35
Demand (MWh)                    4.29     4.82     4.38      9.82    10.24    12.85
Benchmark cost ($)            568.09    610.6      558    1208.3     1276   1588.2
With PV ($)                   440.52    482.5    417.1     821.7    891.4   1194.8
PV-battery DP ($)             248.37   297.55   238.15     534.8   596.25   890.25
PV-battery ADP ($)            232.25   285.63   224.18     526.2   589.23   882.29
PV-battery Stoch. MILP ($)    281.59   333.60   276.23    554.83   599.07   906.75

G. Year-long optimisation

In our third set of simulation results, we compare the ADP-based SHEMS with a two-day decision horizon to a DP approach with a daily decision horizon over three years for 10 households with PV-battery systems. We omit the TES as we already identified its benefits in Section V-B and, moreover, a yearly optimisation using DP with two controllable devices is computationally difficult. The time periods of the three years are: year 1 from July 1st, 2012 to June 30th, 2013; year 2 from July 1st, 2011 to June 30th, 2012; and year 3 from July 1st, 2010 to June 30th, 2011. Electricity cost savings for all the households over the three years are given in Fig. 11, and in Table II we present detailed results for two households in Central Coast (Household 1) and Sydney (Household 2), NSW, Australia. The PV sizes of Households 1 and 2 are 2.2 kWp and 3.78 kWp, respectively.

The proposed ADP-based SHEMS, implemented with a two-day decision horizon that considers variations in PV output and electrical demand, reduces the average yearly electricity cost by 4.63% compared to a daily DP-based SHEMS, as depicted in Fig. 11. We also find that the average yearly savings are 5.12%, 3.89% and 4.95% for years 1-3, respectively. This is because 2013 was a sunnier year than 2012 and 2011, so the two-day optimisation has greater benefits. For example, if we are anticipating a sunny weekend with low demand, then we can completely discharge the battery on Friday night. However, if we have a half-charged battery on Friday night, then we will waste most of the free energy, as the storage capacity is limited. Our results also show that a daily DP results in significant cost savings of 12.04% ($107.35) and 1.91% ($39.35) for Households 1 and 2, respectively, compared to a daily stochastic MILP approach. The difference in the savings arises for the following reasons. In scenarios with high demand (i.e. Household 2), most of the time the battery discharges its maximum possible power to the household during peak periods, so the battery and the inverter operate at their maximum efficiencies even though the MILP solver does not consider the non-linear constraints. The converse occurs in scenarios with low demand (i.e. Household 1).

For demonstration purposes, we show optimisation results for two households over three years in Table II. A residential PV-battery system with the proposed ADP-based SHEMS reduces the total electricity cost over 3 years by 57.27% and 50.95% for Households 1 and 2, respectively, compared to the 22.80% and 28.60% improvements from having only a PV system. It is important to note that a DP-based SHEMS over a two-day decision horizon may result in a slightly better solution. However, it is computationally difficult, and the available computational power will be limited, as it is not worthwhile investing in specialised equipment to solve this problem given the savings on offer.

VI. CONCLUSION

This paper has shown the benefits of having a smart home energy management system and presented an approximate dynamic programming approach for implementing a computationally efficient smart home energy management system with schedules of similar quality to those of dynamic programming. This approach enables us to extend the decision horizon up to a week with high resolution while considering multiple devices. Our results indicate that these improvements provide financial benefits to households employing them in a smart home energy management system. Moreover, stochastic approximate dynamic programming always performs at least as well as deterministic approximate dynamic programming under uncertainty, without a noticeable increase in computational effort.

In practical applications, we can use value function approximations generated by approximate dynamic programming during an offline planning phase to make faster online decisions. This is not possible with stochastic mixed-integer linear programming, and generating value functions using dynamic programming is computationally difficult. In future work, we will learn initial value function approximations for approximate dynamic programming from historical results, which will further speed up the value function approximation convergence. Given the benefits outlined in this paper and the possible future work, we recommend the use of approximate dynamic programming in smart home energy management systems.

REFERENCES

[1] Australian Government Bureau of Resources and Energy Economics.

Energy in Australia 2014.

[2] A. C. Chapman, G. Verbic, and D. J. Hill, “Algorithmic and strategic

aspects to integrating demand-side aggregation and energy management

methods,” IEEE Transactions on Smart Grid, vol. PP, no. 99, pp. 1–13,

2016.

[3] Ausgrid, NSW, Australia. Time-of-use pricing.

[4] Energy Australia. Flexible pricing FAQs.

[5] S. Lu, N. Samaan, R. Diao, M. Elizondo, C. Jin, E. Mayhorn, Y. Zhang,

and H. Kirkham, “Centralized and decentralized control for demand

response,” in Innovative Smart Grid Technologies (ISGT), 2011 IEEE

PES, 2011, pp. 1–8.

[6] M. Pipattanasomporn, M. Kuzlu, and S. Rahman, “An algorithm for

intelligent home energy management and demand response analysis,”

IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 2166–2173, 2012.

[7] S. Li, D. Zhang, A. B. Roget, and Z. O’Neill, “Integrating home energy

simulation and dynamic electricity price for demand response study,”

IEEE Transactions on Smart Grid, vol. 5, no. 2, pp. 779–788, March

2014.

[8] M. Muratori and G. Rizzoni, “Residential demand response: Dynamic

energy management and time-varying electricity pricing,” IEEE Trans-

actions on Power Systems, vol. 31, no. 2, pp. 1108–1117, March 2016.

[9] Australian PV Institute. Australian PV market since April 2001.

[Online]. Available: http://pv-map.apvi.org.au/analyses

[10] Electric Power Research Institute (EPRI), “The integrated grid realizing

the full value of central and distributed energy resources,” Tech. Rep.,

2014.

[11] Australian Energy Market Operator (AEMO), “Emerging technologies

information,” Tech. Rep., 2015.

[12] Rocky Mountain Institute, Homer Energy, Cohnreznick Think Energy,

“The economics of grid defection when and where distributed solar

generation plus storage competes with traditional utility service,” Tech.

Rep., 2014.

[13] Z. Chen, L. Wu, and Y. Fu, “Real-time price-based demand response

management for residential appliances via stochastic optimization and

robust optimization,” IEEE Transactions on Smart Grid, vol. 3, no. 4,

pp. 1822–1831, 2012.

[14] C. Keerthisinghe, G. Verbiˇ

c, and A. Chapman, “Evaluation of a multi-

stage stochastic optimisation framework for energy management of

residential PV-storage systems,” in Power Engineering Conference (AU-

PEC), 2014 Australasian Universities, Sept 2014, pp. 1–6.

[15] M. Bozchalui, S. Hashmi, H. Hassen, C. Canizares, and K. Bhattacharya,

“Optimal operation of residential energy hubs in smart grids,” IEEE

Transactions on Smart Grid, vol. 3, no. 4, pp. 1755–1766, 2012.

[16] F. De Angelis, M. Boaro, D. Fuselli, S. Squartini, F. Piazza, and

Q. Wei, “Optimal home energy management under dynamic electrical

and thermal constraints,” IEEE Transactions on Industrial Informatics,

vol. 9, no. 3, pp. 1518–1527, 2013.

[17] J. Wang, Z. Sun, Y. Zhou, and J. Dai, “Optimal dispatching model of smart home energy management system,” in Innovative Smart Grid Technologies - Asia (ISGT Asia), 2012 IEEE, 2012, pp. 1–5.

[18] K. C. Sou, J. Weimer, H. Sandberg, and K. Johansson, “Scheduling smart home appliances using mixed integer linear programming,” in Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on, 2011, pp. 5144–5149.

[19] M. Pedrasa, E. Spooner, and I. MacGill, “Robust scheduling of residential distributed energy resources using a novel energy service decision-support tool,” in Innovative Smart Grid Technologies (ISGT), 2011 IEEE PES, 2011, pp. 1–8.

[20] M. Pedrasa, T. Spooner, and I. MacGill, “Coordinated scheduling of residential distributed energy resources to optimize smart home energy services,” IEEE Transactions on Smart Grid, vol. 1, no. 2, pp. 134–143, 2010.

[21] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Athena Scientific, Belmont, Massachusetts, 2005, vol. 1.

[22] H. Tischer and G. Verbič, “Towards a smart home energy management system - a dynamic programming approach,” in Innovative Smart Grid Technologies Asia (ISGT), 2011 IEEE PES, 2011, pp. 1–7.

[23] C. Keerthisinghe, G. Verbič, and A. Chapman, “Addressing the stochastic nature of energy management in smart homes,” in Power Systems Computation Conference (PSCC), 2014, Aug 2014, pp. 1–7.

[24] ——, “Energy management of PV-storage systems: ADP approach with temporal difference learning,” in Power Systems Computation Conference (PSCC), 2016, Jun 2016, pp. 1–7.

[25] D. F. Salas and W. B. Powell, “Benchmarking a scalable approximate dynamic programming algorithm for stochastic control of multidimensional energy storage problems,” Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, Tech. Rep., 2013.

[26] D. Jiang, T. Pham, W. Powell, D. Salas, and W. Scott, “A comparison of approximate dynamic programming techniques on benchmark energy storage problems: Does anything work?” in Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2014 IEEE Symposium on, Dec 2014, pp. 1–8.

[27] R. Anderson, A. Boulanger, W. Powell, and W. Scott, “Adaptive stochastic control for the smart grid,” Proceedings of the IEEE, vol. 99, no. 6, pp. 1098–1115, 2011.

[28] W. B. Powell, A. George, H. Simao, W. Scott, A. Lamont, and J. Stewart, “SMART: A stochastic multiscale model for the analysis of energy resources, technology, and policy,” INFORMS Journal on Computing, vol. 24, no. 4, pp. 665–682, 2012.

[29] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley and Sons, Inc., 2007.

[30] ——, “Approximate dynamic programming for large-scale resource allocation problems,” Princeton University, Princeton, New Jersey 08544, USA, 2005.

[31] F. Borghesan, R. Vignali, L. Piroddi, M. Prandini, and M. Strelec, “Approximate dynamic programming-based control of a building cooling system with thermal storage,” in Innovative Smart Grid Technologies Europe (ISGT EUROPE), 2013 4th IEEE/PES, Oct 2013, pp. 1–5.

[32] M. Strelec and J. Berka, “Microgrid energy management based on approximate dynamic programming,” in Innovative Smart Grid Technologies Europe (ISGT EUROPE), 2013 4th IEEE/PES, Oct 2013, pp. 1–5.

[33] P. Samadi, H. Mohsenian-Rad, V. Wong, and R. Schober, “Real-time pricing for demand response based on stochastic approximation,” IEEE Transactions on Smart Grid, vol. 5, no. 2, pp. 789–798, March 2014.

[34] E. L. Ratnam, S. R. Weller, C. M. Kellett, and A. T. Murray, “Residential load and rooftop PV generation: an Australian distribution network dataset,” International Journal of Sustainable Energy, pp. 1–20.

[35] Smart-Grid Smart-City Customer Trial Data. [Online]. Available: https://data.gov.au/dataset/smart-grid-smart-city-customer-trial-data

[36] T. Jaakkola, M. I. Jordan, and S. P. Singh, “Convergence of stochastic iterative dynamic programming algorithms,” Neural Computation, vol. 6, pp. 1185–1201, 1994.

[37] G. A. Godfrey and W. B. Powell, “An adaptive, distribution-free algorithm for the newsvendor problem with censored demands, with applications to inventory and distribution,” Management Science, vol. 47, no. 8, pp. 1101–1112, 2001.

[38] N. Sharma, P. Sharma, D. Irwin, and P. Shenoy, “Predicting solar generation from weather forecasts using machine learning,” in Smart Grid Communications (SmartGridComm), 2011 IEEE International Conference on, Oct 2011, pp. 528–533.

[39] K. Bao, F. Allerding, and H. Schmeck, “User behavior prediction for energy management in smart homes,” in Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on, vol. 2, July 2011, pp. 1335–1339.

[40] D. Lachut, N. Banerjee, and S. Rollins, “Predictability of energy use in homes,” in Green Computing Conference (IGCC), 2014 International, Nov 2014, pp. 1–10.

[41] E3 - Equipment Energy Efficiency, “Water heating data collection and analysis,” Tech. Rep., 2012.

Chanaka Keerthisinghe (S'10) received the B.E. (Hons.) and M.E. (Hons.) degrees in Electrical and Electronic Engineering from the University of Auckland, New Zealand, in 2011 and 2012, respectively. He is currently a Ph.D. candidate at the Centre for Future Energy Networks, University of Sydney, Australia. His research interests include demand response, energy management in residential buildings, wireless charging of electric vehicles, and applying approximate dynamic programming and optimisation techniques in power systems.

Gregor Verbič received the B.Sc., M.Sc., and Ph.D. degrees in electrical engineering from the University of Ljubljana, Ljubljana, Slovenia, in 1995, 2000, and 2003, respectively. In 2005, he was a North Atlantic Treaty Organization Natural Sciences and Engineering Research Council of Canada Postdoctoral Fellow with the University of Waterloo, Waterloo, ON, Canada. Since 2010, he has been with the School of Electrical and Information Engineering, University of Sydney, Sydney, NSW, Australia. His expertise is in power system operation, stability and control, and electricity markets. His current research interests include integration of renewable energies into power systems and markets, optimization and control of distributed energy resources, demand response, and energy management in residential buildings. He was a recipient of the IEEE Power and Energy Society Prize Paper Award in 2006. He is an Associate Editor of the IEEE TRANSACTIONS ON SMART GRID.

Archie C. Chapman (M'14) received the B.A. degree in math and political science, and the B.Econ. (Hons.) degree from the University of Queensland, Brisbane, QLD, Australia, in 2003 and 2004, respectively, and the Ph.D. degree in computer science from the University of Southampton, Southampton, U.K., in 2009. He is currently a Research Fellow in Smart Grids with the School of Electrical and Information Engineering, Centre for Future Energy Networks, University of Sydney, Sydney, NSW, Australia. His expertise is in game-theoretic and reinforcement learning techniques for optimization and control in large distributed systems. His research focuses on integrating renewables into legacy power networks, using distributed energy and load scheduling methods, and on designing tariffs and market mechanisms that support efficient use of existing infrastructure and new controllable devices.