
A Fast Technique for Smart Home Management: ADP with Temporal Difference Learning

Chanaka Keerthisinghe, Student Member, IEEE, Gregor Verbič, Senior Member, IEEE, and Archie C. Chapman, Member, IEEE

Abstract—This paper presents a computationally efficient smart home energy management system (SHEMS) using an approximate dynamic programming (ADP) approach with temporal difference learning for scheduling distributed energy resources. This approach improves the performance of a SHEMS by incorporating stochastic energy consumption and PV generation models over a horizon of several days, using only the computational power of existing smart meters. In this paper, we consider a PV-storage (thermal and battery) system; however, our method can extend to multiple controllable devices without the exponential growth in computation that other methods, such as dynamic programming (DP) and stochastic mixed-integer linear programming (MILP), suffer from. Specifically, probability distributions associated with the PV output and demand are kernel estimated from empirical data collected during the Smart Grid Smart City project in NSW, Australia. Our results show that ADP computes a solution much faster than both DP and stochastic MILP, with only a slight reduction in quality compared to the optimal DP solution. In addition, incorporating a thermal energy storage unit using the proposed ADP-based SHEMS reduces the daily electricity cost by up to 26.3% without a noticeable increase in the computational burden. Moreover, ADP with a two-day decision horizon reduces the average yearly electricity cost by 4.6% over a daily DP method, yet requires less than half of the computational effort.

Index Terms—demand response, smart home energy management, distributed energy resources, approximate dynamic programming, dynamic programming, stochastic mixed-integer linear programming, value function approximation, temporal difference learning.

NOMENCLATURE

k — Time-step
K — Total number of time-steps
i — Index of controllable devices
I — Total number of controllable devices
j — Index of stochastic variables
J — Total number of stochastic variables
s^{i,j}_k — State of i or j at time-step k
x^i_k — Decision of controllable device i at time-step k
ω^j_k — Variation of stochastic variable j at time-step k
C_k — Cost or reward at time-step k
π — Policy
r — Realisation of random information
R — Total number of random realisations
α — Stepsize
V^π_k — Expected future cost/reward for following π from k

Chanaka Keerthisinghe, Gregor Verbič, and Archie C. Chapman are with the School of Electrical and Information Engineering, The University of Sydney, Sydney, New South Wales, Australia. E-mail: {chanaka.keerthisinghe, gregor.verbic, archie.chapman}@sydney.edu.au.

s^{i,max} — Maximum state of a controllable device
s^{i,min} — Minimum state of a controllable device
μ^i — Efficiency of a controllable device [%]
l^i — Losses of a controllable device per time-step
a — Index of a segment in the VFA
A_k — Total number of segments at time-step k
z_{ka} — Capacity of a segment
v̄_{ka} — Slope of a segment
n — A particular scenario
N — Total number of scenarios in stochastic MILP

I. INTRODUCTION

FUTURE electrical power grids will be based on a "demand-following-supply" paradigm, because of the increasing penetration of intermittent renewable energy sources and advances in technology enabling non-dispatchable loads. A cornerstone of achieving this is demand side management, which can be roughly divided into demand response (DR) and direct load control. This research focuses on DR, which is used to reduce electricity costs on one side and provide system services on the other. In particular, we focus on residential customers, because residential loads account for roughly 30% of energy use in Australia [1] and their diurnal patterns drive daily and seasonal peak loads.

Currently, DR is implemented using small-scale voluntary programs and fixed pricing strategies. The main drawback of using the same pricing strategy for all customers is the possibility of inducing unwanted peaks in the demand curve as customers respond to prices. Also, it is not possible for residential and small commercial users to directly participate in wholesale energy or ancillary services markets because of the regulatory regimes and computational requirements. As such, DR revolves around the interaction between an aggregator and customers, as shown in Fig. 1. In many cases, the retailer acts as the aggregator. The aggregator's task is to construct a scheme that coordinates, schedules or otherwise controls part of a participating user's load via interaction with its smart home energy management system (SHEMS) [2]. In the context of DR, the aggregator sends control signals in the form of electricity price signals to the SHEMS. The SHEMS then schedules and coordinates the customer's distributed generation (DG), storage and flexible loads, collectively known as distributed energy resources (DERs), to minimise energy costs while maintaining a suitable level of comfort. In particular, this research assumes that the exact electricity price signals are available from a DR aggregator/retailer in the form of time-of-use pricing (ToUP). ToUP is chosen as it is prevalent in Australia [3], [4].

Fig. 1. Customers, aggregator and the wholesale electricity market.

The DERs in the smart homes considered in this paper comprise a PV unit, a battery and a thermal energy storage (TES) unit (i.e. an electric water heater). In the existing literature, a range of DERs have been used to achieve DR [5]–[8]. However, our choice stems from Australia's increasing penetration of rooftop PV and battery storage systems in response to rising electricity costs and decreasing technology costs, and its existing fleet of hot water storage devices [9], [10]. According to AEMO, the payback period for residential PV-storage systems is already below 15 years in South Australia, with the other states to follow suit in less than a decade [11]. Similarly, residential users with PV-storage systems in the USA have been forecast to reach grid parity within the next decade [12].

The SHEMS schedules the DERs in such a way that the electrical power drawn from the grid is minimised, especially during peak periods. Given the recent drop in PV costs, feed-in tariffs (FiTs) in Australia are significantly less than the retail tariffs paid by households, and it is anticipated that this may happen in other parts of the world too. As such, selling power back to the electrical grid is uneconomical, and households have a strong incentive to self-consume as much locally generated power as possible. Therefore, when PV generation is higher than electrical demand, an effective SHEMS should either store the surplus energy or consume it using controllable loads.

The underlying optimisation problem here is a sequential decision-making process under uncertainty. As shown in [13], SHEMSs that model stochastic variables, such as variations in PV output and in electrical and thermal demand, using appropriate probability distributions yield better quality schedules than those obtained from a deterministic model. Moreover, in [14] we showed that extending the optimisation horizon beyond one day is economically beneficial, as the SHEMS can exploit inter-daily variations in consumption and solar-insolation patterns. Given this, stochastic mixed-integer linear programming (MILP) [13], [15]–[18], particle swarm optimisation [19], [20] and dynamic programming (DP) [21], [22] have previously been proposed to solve the SHEMS problem. In particular, DP accommodates full non-linear controllable device models and produces close-to-optimal solutions when finely discretised. However, DP is infeasible if the optimisation horizon is extended or when multiple controllable devices are added to the SHEMS, due to the exponential growth in the state and action spaces [23].

Given these insights, in [24] we reported our preliminary work towards implementing a computationally efficient SHEMS using approximate dynamic programming (ADP) with temporal difference learning, an approach that has successfully been applied to the control of grid-level storage in [25]. In the proposed approach, we obtain policies from value function approximations (VFAs)¹. Our choice to use VFAs is based on the observation that they work best in time-dependent problems with regular load, energy and price patterns, relatively high noise and less accurate forecasts (errors grow with the horizon) [26]. Other ADP methods are better suited to applications with different characteristics [27]–[33].

Building on this, the main contributions of this paper are the development of a computationally efficient SHEMS using ADP with temporal difference learning, which can incorporate multiple controllable devices, and a demonstration of its practical implementation using empirical data. Specifically, the proposed ADP method enables us to:

1) incorporate stochastic input variables without a noticeable increase in the computational burden;
2) extend the decision horizon with less computational burden to consider uncertainties over several days, which results in significant financial benefits;
3) integrate multiple controllable devices with less computational burden than DP;
4) embed the SHEMS into an existing smart meter, as it uses less memory than existing methods.

To show the performance of ADP, we run it with a two-day decision horizon over three years using a rolling-horizon approach, in which the expected electricity cost of each day is minimised considering the uncertainties over the next day. The daily performance is benchmarked against DP and stochastic MILP by applying them to three different scenarios with different electrical and thermal demand patterns and PV outputs. The three-year evaluation is benchmarked against a daily DP approach by applying it to 10 smart homes. Moreover, the PV output and demand profiles are from data [34] collected during the Smart Grid Smart City project by Ausgrid and their consortium partners in NSW, Australia, which investigated the benefits and costs of implementing a range of smart grid technologies in Australian households [35]. Specifically, the SHEMSs estimate the probability distributions associated with the PV output and electrical demand from empirical data using kernel regression, which is more realistic than assuming a parametric form. In addition, throughout the paper we demonstrate the benefits of residential PV-storage systems with a SHEMS.

¹A value function describes the expected future cost of following a policy from a given state.

The paper is structured as follows: Section II states the stochastic energy management problem and the existing solution techniques. This is followed by a description of the ADP formulation in Section III. The stochastic variable models are described in Section IV. Section V presents the simulation results and the discussion.

II. SMART HOME ENERGY MANAGEMENT PROBLEM

In this section, we describe the general formulation of the sequential stochastic optimisation problem, formulate our stochastic SHEMS problem as a sequential stochastic optimisation problem, and present a short description of the stochastic MILP and DP methods used to solve it.

A. General sequential stochastic optimisation problem

A sequential stochastic optimisation problem comprises:

• A sequence of time-steps, K = {1, ..., k, ..., K}, where k and K denote a particular time-step and the total number of time-steps in the decision horizon, respectively.
• A set of controllable devices, I = {1, ..., i, ..., I}, where each i is represented using:
  – A state variable, s^i_k ∈ S.
  – A decision variable, x^i_k ∈ X, which is a control action.
• A set of non-controllable inputs, J = {1, ..., j, ..., J}, where each j is represented using:
  – A state variable, s^j_k ∈ S.
  – A random variable, ω^j_k ∈ Ω, capturing exogenous information or perturbations.
Given this, we let s_k = [s^i_k ... s^I_k, s^j_k ... s^J_k]^T, x_k = [x^i_k ... x^I_k]^T, and ω_k = [ω^j_k ... ω^J_k]^T. Note that the state variables contain the information that is necessary and sufficient to make the decisions and compute costs, rewards and transitions.
• Constraints for the state and decision variables.
• Transition functions, s_{k+1} = s^M(s_k, x_k, ω_k), describing the evolution of the states from k to k + 1, where s^M(·) is the system model, which consists of controllable device i's operational constraints such as power flow limits, efficiencies and losses. Note that transition functions are only required for the controllable devices.
• An objective function:

  F = E{ Σ_{k=1}^{K} C_k(s_k, x_k, ω_k) },   (1)

  where C_k(s_k, x_k, ω_k) is the contribution (i.e. cost or reward of energy, or a discomfort penalty) incurred at time-step k, which accumulates over time.
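The components above can be exercised with a generic simulation loop; the following minimal Python sketch evaluates one sampled term of the expectation in (1), with the policy, transition model and contribution function supplied by the caller as hypothetical stand-ins for the formal definitions:

```python
def simulate_policy(policy, transition, contribution, s1, sample_omega, K):
    """Accumulate sum_{k=1}^{K} C_k(s_k, x_k, w_k) along one sampled path
    of the random information (a single realisation of the sum in (1))."""
    s, total = s1, 0.0
    for k in range(1, K + 1):
        x = policy(k, s)             # decision from the current state
        w = sample_omega(k)          # exogenous random information
        total += contribution(k, s, x, w)
        s = transition(s, x, w)      # s_{k+1} = s^M(s_k, x_k, w_k)
    return total

def expected_cost(policy, transition, contribution, s1, sample_omega, K,
                  runs=1000):
    """Monte Carlo estimate of the expectation F in (1)."""
    return sum(simulate_policy(policy, transition, contribution, s1,
                               sample_omega, K) for _ in range(runs)) / runs
```

With a deterministic noise model (`sample_omega` returning 0), both functions reduce to a plain rollout of the chosen policy.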

B. Instantiation

The objective of the SHEMS is to minimise energy costs over a decision horizon. The sequential stochastic optimisation problem is solved before the start of each day, using either a daily or a two-day decision horizon. In this paper we consider a PV unit, a battery and a hot water system (TES unit), as depicted in Fig. 2. We use a single inverter for both the battery and the PV, a configuration that is becoming popular in Australia.

Fig. 2. Illustration of electrical and thermal energy flows in a smart home, and the state, decision and random variables used to formulate the problem.

To optimise performance, a SHEMS needs to incorporate the variations in the PV output and in the electrical and thermal demand of the household. Given this, we model the stochastic variables using their means as state variables and their variations as random variables. This enables us to use an algorithmic strategy that separates the transition function into a deterministic term, using the mean, and a random term, using the variation (discussed in Section III). In some cases, electricity prices may also be considered stochastic. However, in this paper, we assume that the exact electricity prices are available before the start of the decision horizon from a residential DR aggregator/retailer (i.e. in the form of ToUP).

In more detail, we cast our SHEMS problem as the sequential stochastic optimisation formulation in Section II-A as follows. The daily decision horizon is a 24-hour period, divided into K = 48 time-steps with a 30-minute resolution; the two-day decision horizon is treated similarly. The 30-minute time resolution is chosen to match typical dispatch timelines, and because the PV and demand data from the Smart Grid Smart City project [34] are only available at 30-minute intervals. If required, the proposed ADP approach can increase the time resolution with less computational burden than existing methods. The controllable devices are the battery and the TES, while the non-controllable inputs are the PV output and the electrical and thermal demand. As depicted in Fig. 2, for each time-step, k, in the decision horizon, state variables represent: the battery SOC, s^b_k; the TES SOC, s^t_k; the mean PV output, s^pv_k; the mean electrical demand, s^{d,e}_k; the mean thermal demand, s^{d,t}_k; and the electricity tariff, s^p_k. Control variables consist of: the charge and discharge rates of the battery, x^b_k; and the electric water heater input, x^wh_k. Random variables are: the variations in PV output, ω^pv_k; the variations in thermal demand, ω^{d,t}_k; and the variations in electrical demand, ω^{d,e}_k. We use empirical data to estimate the probability distributions associated with the uncertain variables using kernel regression, which is more realistic than assuming a parametric form (discussed in detail in Section IV).
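As a non-parametric density estimate of the kind described above, a Gaussian kernel density estimator can be sketched in a few lines; the sample values and bandwidth below are illustrative, not taken from the paper's data set:

```python
import math

def gaussian_kde(samples, bandwidth):
    """Kernel-estimated probability density from empirical samples, in the
    spirit of the paper's non-parametric treatment of PV-output and demand
    variations (the samples and bandwidth here are hypothetical)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)
    return density

# Hypothetical half-hourly PV-output deviations (kWh) around the mean:
pv_variation = [-0.3, -0.1, 0.0, 0.05, 0.1, 0.2, 0.4]
pdf = gaussian_kde(pv_variation, bandwidth=0.15)
```

The returned density is highest near the bulk of the observed deviations and integrates to one over the real line.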

The energy balance constraint is given by:

s^{d,e}_k + ω^{d,e}_k + x^wh_k = μ^i x^i_k + x^g_k,   (2)

where x^i_k = s^pv_k + ω^pv_k − μ^b x^b_k is the inverter power at the DC side (a positive value means power into the inverter); μ^i is the efficiency of the inverter (note that the efficiency is 1/μ^i when the inverter power is negative); μ^b is the efficiency of the battery action corresponding to either charging or discharging; and x^g_k is the electrical grid power. The charge rate of the battery is constrained by the maximum charge rate, x^{b+}_k ≤ γ^c, and the discharge rate by the maximum discharge rate, x^{b−}_k ≥ γ^d. The electric water heater input must never exceed the maximum possible input, x^wh_k ≤ γ^wh. To satisfy thermal demand at all time-steps, we ensure that the TES has enough energy at each time-step, s^{t,req}_k, to satisfy the thermal demand for the next 2 hours. Therefore, the energy stored in the TES is always within the limits:

s^{t,req} ≤ s^t_k ≤ s^{t,max}.   (3)

The energy stored in the battery must be within the limits s^{b,min} ≤ s^b_k ≤ s^{b,max}.

Transition functions govern how the state variables evolve over time. The battery SOC, denoted s^b_k ∈ [s^{b,min}, s^{b,max}], progresses by:

s^b_{k+1} = (1 − l^b(s^b_k)) s^b_k − x^{b−}_k + μ^{b+} x^{b+}_k,   (4)

where l^b(s^b_k) models the self-discharging process of the battery. The TES SOC is denoted s^t_k ∈ [s^{t,req}, s^{t,max}], and evolves according to:

s^t_{k+1} = (1 − l^t(s^t_k)) s^t_k − s^{d,t}_k − ω^{d,t}_k + μ^wh x^wh_k,   (5)

where l^t(s^t_k) models the thermal loss of the TES and μ^wh is the efficiency of the electric water heater. In the above, both transition functions are non-linear functions of the state.
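Transitions (4) and (5) can be written directly as code; the sketch below uses constant loss and efficiency figures for simplicity, whereas in the paper the losses l^b(·) and l^t(·) depend non-linearly on the state:

```python
def battery_next_soc(s_b, x_charge, x_discharge, mu_b_plus=1.0,
                     self_discharge=0.001):
    """Battery SOC transition (4): self-discharge loss, discharge drawn out,
    charge scaled by the charging efficiency (figures are illustrative)."""
    return (1.0 - self_discharge) * s_b - x_discharge + mu_b_plus * x_charge

def tes_next_soc(s_t, thermal_demand, demand_variation, x_wh, mu_wh=0.9,
                 thermal_loss=0.005):
    """TES SOC transition (5): thermal loss, thermal demand (mean plus
    variation) drawn off, heater input scaled by the heater efficiency."""
    return ((1.0 - thermal_loss) * s_t - thermal_demand - demand_variation
            + mu_wh * x_wh)
```

With losses set to zero, a 1 kWh charge action moves a 5 kWh battery to 6 kWh, and 1 kWh of heater input at μ^wh = 0.9 adds 0.9 kWh to the TES.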

The discharge efficiency of the battery and the efficiency of the inverter are non-linear, and the different ways in which the stochastic MILP, DP and ADP approaches represent them are illustrated in Fig. 3. These indicate that DP and ADP can directly incorporate non-linear characteristics, while linear approximations have to be made with stochastic MILP. For all the implemented SHEMSs, the following device characteristics are the same: the charging efficiency of the battery is μ^{b+} = 1; the minimum and maximum battery SOC are 2 kWh and 10 kWh, respectively; the maximum charge and discharge rates of the battery are 2 kWh; the electric water heater efficiency is μ^wh = 0.9, while its maximum possible input is 3 kWh; and the TES limit is set to 12 kWh.

The optimal policy, π*, is a choice of action for each state, π : S → X, that minimises the expected sum of future costs over the decision horizon; that is:

F^{π*} = min_π E{ Σ_{k=0}^{K} C_k(s_k, π(s_k), ω_k) },   (6)

where C_k(s_k, x_k, ω_k) is the cost incurred at a given time-step, which is given by:

C_k(s_k, x_k, ω_k) = s^p_k (s^{d,e}_k + ω^{d,e}_k − μ^i x^i_k + x^wh_k).   (7)

Fig. 3. Characteristics of the battery and the inverter: (a) inverter efficiency vs. inverter input power, and (b) battery discharge efficiency vs. discharge rate, as represented by ADP/DP and by stochastic MILP.

Note that we do not use any specific user comfort criteria in the contribution function. However, we endeavour to supply the thermal demand at all time-steps without any user discomfort, by penalising undesired states of the TES in DP and ADP, and by directly using constraint (3) in stochastic MILP.

The problem is formulated as an optimisation of the expected contribution because the contribution is generally a random variable due to the effect of ω_k. In all the SHEMSs, we obtain the decisions x_k = π(s_k) = [x^b_k, x^wh_k], depending on the state variables s_k = [s^b_k, s^t_k, s^{d,e}_k, s^pv_k, s^{d,t}_k, s^p_k], and realisations of the random variables ω_k = [ω^pv_k, ω^{d,e}_k, ω^{d,t}_k], at each time-step.
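The per-step cost (7) is the tariff multiplied by the net grid draw implied by the energy balance (2); a minimal sketch, in which the constant efficiencies are illustrative stand-ins for the non-linear curves of Fig. 3:

```python
def grid_power(demand_e, omega_e, x_wh, pv_mean, omega_pv, x_batt,
               mu_inv=0.96, mu_batt=0.95):
    """Grid import implied by the energy balance (2):
    x_g = demand + heater input - inverter AC output.
    Efficiencies are hypothetical constants (the paper uses non-linear
    characteristics); the inverter efficiency flips to 1/mu when the
    DC-side power is negative."""
    x_inv = pv_mean + omega_pv - mu_batt * x_batt   # DC-side inverter power
    eff = mu_inv if x_inv >= 0 else 1.0 / mu_inv
    return demand_e + omega_e + x_wh - eff * x_inv

def step_cost(price, demand_e, omega_e, x_wh, pv_mean, omega_pv, x_batt):
    """Per-step cost (7): tariff times the net grid draw."""
    return price * grid_power(demand_e, omega_e, x_wh,
                              pv_mean, omega_pv, x_batt)
```

A negative grid power corresponds to surplus PV generation, which, given the low FiTs discussed in Section I, an effective SHEMS would rather store or consume locally.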

C. Solution techniques

The first method we use is a scenario-based MILP approach, which we referred to as stochastic MILP in [23]. This technique requires us to linearise the constraints and transition functions of Section II-B and model the problem as a mathematical program. The second method we use is DP, in which we model our problem as a Markov decision process (MDP). This method enables us to incorporate all the non-linear constraints and transition functions with no additional computational burden over using linear constraints and transition functions. Details of these methods are as follows:

1) Stochastic MILP: The deterministic version of the SHEMS problem can be solved using a MILP approach, which optimises a linear objective function subject to linear constraints with continuous and integer variables [13]. Note that the transition functions presented in Section II-B are treated as constraints in the MILP formulation. Integer variables are used to model power flow directions.

To incorporate stochasticity, a large set of scenarios is generated by sampling from all combined realisations of the stochastic variables of Section II-B. A larger number of scenarios should improve the solutions by better capturing the stochastic variables, but this imposes a greater computational burden. Therefore, heuristic scenario reduction techniques are employed to obtain a scenario set of size N, which can be solved within a given time with reasonable accuracy.

Given this, a scenario-based stochastic MILP formulation of the problem is described by:

min Σ_{n=1}^{N} P_n(s^{j,n}) Σ_{k=1}^{K} ( s^{p,buy}_k x^{g+}_k − s^{p,sell}_k x^{g−}_k ),   (8)

where P_n(s^{j,n}) is the probability of a particular scenario n corresponding to realisations of the stochastic variables s^j, subject to Σ_{n=1}^{N} P_n(s^{j,n}) = 1.

For each realised scenario, the optimisation problem is solved for the whole horizon at once using a standard MILP solver, so the solution time grows exponentially with the length of the horizon. Here CPLEX is used; however, all commercial solvers give solutions of similar quality. As such, in the existing literature, a one-day optimisation horizon is typically assumed. Moreover, the solutions are of lower quality because of the linear approximations made and the inability to incorporate all the probability distributions [23]. In response to these limitations, DP was proposed in [22] to improve the solution quality.

2) Dynamic programming (DP): The problem in (6) is easily cast as an MDP due to the separable objective function and the Markov property of the transition functions. Given this, DP solves the MDP form of (6) by computing a value function V^π(s_k). This is the expected future cost of following a policy, π, starting in state s_k, and is given by:

V^π(s_k) = Σ_{s′∈S} P(s′ | s_k, π(s_k), ω_k) [ C(s_k, π(s_k), s′) + V^π(s′) ].   (9)

An optimal policy, π*, is one that minimises (6), and which also satisfies Bellman's optimality condition:

V^{π*}_k(s_k) = min_π { C_k(s_k, π(s_k)) + E{ V^{π*}_{k+1}(s′) | s_k } }.   (10)

The expression in (10) is typically computed using backward induction, a procedure called value iteration, and then an optimal policy is extracted from the value function by selecting a minimum-value action for each state. This is the key functional point of difference between DP and stochastic MILP. DP enables us to plan offline by generating value functions for every time-step. Once we have the value functions, we can make faster online decisions using (10) (more details are given towards the end of this section). Note that a value function at a given time-step consists of the expected future cost from all the states. This process of mapping states to actions is not possible with stochastic MILP.
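Backward induction over a finite horizon can be sketched for a deterministic toy problem; the model functions below (costs, transitions, terminal penalty) are caller-supplied stand-ins, not the paper's device models:

```python
def value_iteration(K, states, actions, cost, step, terminal_cost):
    """Finite-horizon backward induction computing (10) for a deterministic
    toy MDP: V_k(s) = min_x [ C_k(s, x) + V_{k+1}(s') ], with the policy
    recorded as the minimising action at each (k, s)."""
    V = {(K, s): terminal_cost(s) for s in states}
    policy = {}
    for k in range(K - 1, -1, -1):     # sweep backward in time
        for s in states:
            best_x, best_v = None, float("inf")
            for x in actions(s):
                v = cost(k, s, x) + V[(k + 1, step(s, x))]
                if v < best_v:
                    best_x, best_v = x, v
            V[(k, s)] = best_v
            policy[(k, s)] = best_x
    return V, policy
```

On a three-level storage with a cheap first period and an expensive second one (and a terminal penalty for ending empty), the extracted policy buys early and sells late, which is exactly the arbitrage behaviour a SHEMS exploits.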

An illustration of a deterministic DP using a simplified model of a battery storage is shown in Fig. 4. At every time-step, there are three battery SOC states (i.e. highest, middle and lowest) and three possible battery actions that result in different instantaneous costs. At the last time-step, k = K, the expected future cost from the desired state, s_K = M, is zero, while the other two states are penalised with a large cost. This is an important step that allows us to control the end-of-day battery SOC (discussed in Section V). The expected future cost at every possible state is calculated using (10), which is the minimum of the combined instantaneous cost resulting from the decision that we take and the expected future cost from the state we end up at in the next time-step.

Fig. 4. A deterministic DP example using a battery storage, where the expected future cost is calculated using (10). The instantaneous contributions from the battery decisions are on the edges of the lines, while the expected future cost is below the states. The optimal policy satisfies (6) and is obtained using (10).

In Fig. 4, the instantaneous cost is on the edges of the lines, while the expected future cost is below the states. An optimal policy is extracted from the value functions by selecting a minimum-value action for each state using (10). For example, from s^b_1, if we take the optimal decision to go to s^b_2 = L, then the total combined cost of 10 consists of an instantaneous cost of 2 and an expected future cost of 8. Even though the expected future cost of 7 from s^b_2 = M is lower than the expected future cost from s^b_2 = L, the instantaneous cost that takes us there is 4, so the total combined cost is 11. Given this, the expected future cost of following the optimal policy from s^b_1 is 10, and at time-step 2 we will be at s^b_2 = L.
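The arithmetic in the Fig. 4 walk-through can be checked directly; a minimal rendering of the two candidate decisions from s^b_1, with the values taken from the discussion above:

```python
# Candidate decisions from the first state in the Fig. 4 walk-through:
# (instantaneous cost of the action, expected future cost of the landing state)
to_L = (2, 8)   # go to the low SOC state
to_M = (4, 7)   # go to the middle SOC state

total_L = sum(to_L)        # combined cost via L
total_M = sum(to_M)        # combined cost via M

# The Bellman recursion (10) picks the minimum combined cost, so the optimal
# decision is to go to L even though M has the lower expected future cost.
best = min(total_L, total_M)
```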

There are several reasons to prefer DP over MILP. First, DP produces close-to-optimal solutions when the value functions obtained during the offline planning phase are from finely discretised state, action and outcome spaces. Second, in practical applications, the SHEMS can make real-time decisions using the policy implied by (10). This means that at each time-step, the optimal decision from the current state can be executed. Note that (10) is a simple linear program at each time-step, so it is computationally feasible using existing smart meters. This stands in contrast to a stochastic MILP formulation, which would require solving the entire stochastic MILP program, which is computationally difficult even for the offline planning. Third, we can always obtain a solution with DP regardless of the constraints and the inputs, while MILP fails to find a solution when the constraints cannot be satisfied. From our experience, MILP fails to find a solution when the end-of-day TES SOC is fixed, because the energy drawn from the TES unit, which is the thermal demand of the household, cannot be controlled. We can overcome this by either removing the end-of-day TES constraint or by allowing a range of values. However, this means that we end up with a sub-optimal TES SOC at the end of the day, or require user interaction to adjust the TES SOC, which we should avoid in practical applications.

Algorithm 1: ADP using Temporal Difference Learning TD(1)
1: Initialise V̄^0_k, ∀k ∈ K.
2: Set r = 1 and k = 1.
3: Set s_1.
4: while r ≤ R do
5:   Choose a sample path ω^r.
6:   for k = 0, ..., K do
7:     Solve the deterministic problem (17).
8:     for i = 1, ..., I do
9:       Find the right and left marginal contributions (18).
10:    end for
11:    if k < K then
12:      Find the post-decision states (11) and the next pre-decision states (12).
13:    end if
14:  end for
15:  for k = K, ..., 0 do
16:    Calculate the marginal values (19).
17:    Update the estimates of the marginal values (20).
18:    Update the VFAs using the CAVE algorithm.
19:    Combine the value functions of each controllable device (16).
20:  end for
21:  r = r + 1.
22: end while
23: Return the value function approximations V̄^R_k, ∀k.
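The double-pass structure of Algorithm 1 can be illustrated on a stripped-down scalar storage problem. In this sketch the VFA is a plain lookup table with optimistic initialisation (the paper uses concave piecewise-linear functions updated with CAVE), the exogenous noise is omitted, and the prices and storage levels are hypothetical:

```python
def td1_train(K, n_levels, prices, R=200, alpha=0.2):
    """TD(1) sketch in the spirit of Algorithm 1: a forward pass follows the
    policy implied by the current VFA, then a backward pass folds the
    realised downstream rewards into the estimates with stepsize alpha.
    Rewards are -price * action (charging costs, discharging earns)."""
    # Optimistic initial values encourage exploration; terminal values are 0.
    V = [[10.0] * n_levels for _ in range(K)] + [[0.0] * n_levels]
    for _ in range(R):
        path, s = [], 1                            # start mid-level each pass
        for k in range(K):                         # forward pass
            _, x = max((-prices[k] * x + V[k + 1][s + x], x)
                       for x in (-1, 0, 1) if 0 <= s + x < n_levels)
            path.append((s, -prices[k] * x))       # (visited state, reward)
            s += x
        cum = 0.0
        for k in range(K - 1, -1, -1):             # backward pass
            s_k, reward = path[k]
            cum += reward                          # realised reward-to-go
            V[k][s_k] = (1 - alpha) * V[k][s_k] + alpha * cum
    return V
```

With a cheap first period and an expensive second, the learned value of starting mid-level converges to the profit from holding and then discharging at the high price.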

However, the required computation to generate value functions using DP grows exponentially with the size of the state, action and outcome spaces. One way of overcoming this problem is to approximate the value function, while maintaining the benefits of DP.

III. APPROXIMATE DYNAMIC PROGRAMMING (ADP)

ADP, also known as forward DP, is an algorithmic strategy for approximating a value function that steps forward in time, in contrast to the backward induction used in value iteration. Policies in ADP are extracted from these VFAs [29]. Similar to DP, ADP operates on an MDP formulation of the problem, so all the non-linear constraints and transition functions in Section II-B can be incorporated with the same computational burden as modelling linear transition functions and constraints. ADP is an anytime optimisation technique², so we always obtain a solution regardless of the constraints and the inputs. In this instance, the problem is formulated as a maximisation problem for convenience.

A. Policy-based value function approximation

VFAs are obtained iteratively, and here the focus is on approximating the value function around a post-decision state vector, s^x_k, which is the state of the system at discrete time k, soon after making the decisions but before the realisation of any random variables [29]. This is because approximating the expectation within the max or min operator in (10) is difficult in large practical applications, as transition probabilities from all the possible states are required. Pseudo-code of the method used to approximate the value function is given in Algorithm 1, which is a double-pass algorithm referred to as temporal difference learning with a discount factor λ = 1, or TD(1).

²An anytime algorithm returns a feasible solution even if it is interrupted prematurely. The quality of the solution, however, improves if the algorithm is allowed to run until the desired convergence.

Fig. 5. Illustration of the modified Markov decision process, which separates the state variables into post-decision states and pre-decision states.

Given this, the original transition function s_{k+1} = s^M(s_k, x_k, ω_k) is divided into the post-decision state:

s^x_k = s^{M,x}(s_k, x_k),   (11)

and the next pre-decision state:

s_{k+1} = s^{M,ω}(s^x_k, ω_k),   (12)

which are used in line 12 of Algorithm 1.

An example of the modified MDP is illustrated in Fig. 5, which uses the mean and the variation of the stochastic variables to obtain the post-decision and next pre-decision states, respectively. In more detail, at s_1, there are three possible decisions that take us to three post-decision states, corresponding to the highest, middle and lowest states. However, the next pre-decision state, s_2, depends on the random variables ω_1.
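For a scalar storage state, the split (11)/(12) amounts to applying the decision deterministically and then realising the noise; a minimal sketch with illustrative numbers:

```python
def post_decision_state(s_b, x_b):
    """(11): apply the decision deterministically, before any noise —
    the state the VFA is built around."""
    return s_b + x_b

def next_pre_decision_state(s_x, omega):
    """(12): realise the exogenous information on top of the
    post-decision state."""
    return s_x + omega

# From s_1 = 5 kWh, charging by 1 kWh gives the post-decision state 6 kWh;
# a realised net variation of -0.4 kWh then yields the next pre-decision
# state (values are hypothetical).
s_x = post_decision_state(5.0, 1.0)
s_2 = next_pre_decision_state(s_x, -0.4)
```

The payoff of this decomposition is that the VFA in (14) is evaluated at s^x_k, a deterministic function of the decision, so no transition probabilities are needed inside the optimisation.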

Given this, the new form of the value function is written as:

    \bar{V}_k^{\pi}(s_k) = \max_{x_k} \{ C_k(s_k, x_k) + \bar{V}_k^{\pi,x}(s_k^x) \},        (13)

where \bar{V}_k^{\pi,x}(s_k^x) is the VFA around the post-decision state s_k^x, given by:

    \bar{V}_k^{\pi,x}(s_k^x) = E[ V_{k+1}^{\pi}(s_{k+1}) \,|\, s_k^x ].        (14)

This method is computationally feasible because E[V_{k+1}^{\pi}(s_{k+1}) | s_k^x] is a function of the post-decision state s_k^x, which is a deterministic function of x_k. However, in order to solve (13), we still need to calculate the value functions in (14) for every possible state s_k^x for all k. This can be computationally difficult since s_k^x is continuous and multidimensional, so we approximate (14).

Two strategies are employed. First, we construct lookup tables for the VFAs in (14) that are concave and piecewise linear in the resource dimension of all the state variables of the controllable devices [25]. For example, in the VFA for k = 49, which is depicted in Fig. 6(a), the expected future rewards stay the same after approximately 7 kWh, so if we are at 7 kWh in time-step k = 48, charging the battery further will yield no future reward and will only incur an instantaneous cost if the electricity has to come from the grid. However, if the electricity price or demand is high, then we can discharge the battery, as the expected future rewards will only decrease slightly. Given this, we never charge the storage when there is no marginal value, so the slopes of the VFA are always greater than or equal to zero.

Fig. 6. (a) Expected future reward or VFA (14) for following the optimal policy vs. state of the battery for time-steps k = 49 and k = 60, and (b) value of the objective function (i.e. reward) vs. iterations for the ADP approach and the expected value from DP.

Accordingly, the VFA is given by:

    \bar{V}_k^i(s_k^{i,x}) = \sum_{a=1}^{A_k} \bar{v}_{ka} z_{ka},        (15)

where \sum_a z_{ka} = s_k^{i,x} and 0 \le z_{ka} \le \bar{z}_{ka} for all a. Here, z_{ka} is the resource coordinate variable for segment a \in (1, ..., A_k), A_k \in \mathcal{A}, \bar{z}_{ka} is the capacity of the segment and \bar{v}_{ka} is the slope. Other strategies that could be used for this step are parametric and non-parametric approximations of the value functions [29].
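A lookup-table VFA of the form (15) can be evaluated by filling the segments in order of decreasing slope. The segment capacities and slope values below are illustrative assumptions, not values learned by the algorithm:

```python
import numpy as np

def evaluate_vfa(s, seg_caps, slopes):
    """Evaluate the piecewise-linear concave VFA (15) at resource level s by
    filling segments in order; slopes must be sorted non-increasing so that
    the greedy fill is optimal (concavity)."""
    value, remaining = 0.0, float(s)
    for cap, slope in zip(seg_caps, slopes):
        z = min(cap, max(remaining, 0.0))  # 0 <= z_a <= capacity of segment a
        value += slope * z
        remaining -= z
    return value

# Illustrative VFA (assumed numbers): ten 1 kWh segments with diminishing
# marginal value, flat (zero slope) beyond roughly 7 kWh, as in Fig. 6(a).
seg_caps = np.ones(10)
slopes = np.maximum(0.0, 0.20 - 0.03 * np.arange(10))
```

Concavity (non-increasing slopes) is what makes the greedy fill optimal; it also allows the decision problem in (13) to be solved as a small linear program.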

Second, we handle the multidimensional state space by generating independent VFAs for each controllable device, which are then combined to obtain the optimal policy. The separable VFA is given by:

    \bar{V}_k(s_k^x) = \sum_{i=1}^{I} \bar{V}_k^i(s_k^{i,x}).        (16)

It is possible to separate the VFAs for the battery and the TES because their state transitions are independent, as shown in (4) and (5), respectively. Instead, the inter-device coupling between the battery and the TES arises only through their effect on energy costs, which is captured in the contribution function in (7). If the state transition functions of the controllable devices depend on each other, then the VFAs are not separable and we have to use multidimensional value functions. In such situations, the number of iterations needed for VFA convergence increases, and concavity needs to be generalised as well.

In more detail, Algorithm 1 proceeds as follows:

1) Set the initial VFAs to zero (i.e. all the slopes to zero) or to an initial guess to speed up the convergence (lines 1-3). Estimates for the initial VFAs can be obtained by solving the deterministic problem using MILP. The value of the initial starting state s_1^i is assumed.

2) For each realisation of the random variables, step forward in time by solving the following deterministic problem (line 7) using the VFA from the previous iteration:

    x_k^r = \arg\max_{x_k \in X_k} \{ C(s_k^r, x_k) + \bar{V}_k^{r-1}(s_k^{x,r}) \}
          = \arg\max_{x_k \in X_k} \{ C(s_k^r, x_k) + \sum_{a=1}^{A_k^{r-1}} \bar{v}_{ka}^{r-1} z_{ka} \}.        (17)

3) Determine the positive and the negative marginal contributions \hat{c}_k^{r,i+}(s_k^{r,i}) and \hat{c}_k^{r,i-}(s_k^{r,i}), respectively (line 9), for each controllable device, using:

    \hat{c}_k^{r,i+}(s_k^{r,i}) = [ c_k^{r,i+}(s_k^{r,i+}, x_k^{r,i+}) - c_k^{r,i}(s_k^{r,i}, x_k^{r,i}) ] / \delta s,
    \hat{c}_k^{r,i-}(s_k^{r,i}) = [ c_k^{r,i}(s_k^{r,i}, x_k^{r,i}) - c_k^{r,i-}(s_k^{r,i-}, x_k^{r,i-}) ] / \delta s,        (18)

where s_k^{r,i+} = s_k^{r,i} + \delta s, x_k^{r,i+} = X_k^{\pi}(s_k^{r,i+}), and \delta s is the mesh size of the state space. We do this similarly for s_k^{r,i-} and x_k^{r,i-}.

4) Find the post-decision and the next pre-decision states using (11) and (12), respectively. The transition functions of the controllable devices can be non-linear (line 12).

5) Starting from K, step backward in time to compute the slopes \hat{v}_k^{r,i+}, which are then used to update the VFA (line 16). Compute \hat{v}_k^{r,i+} as:

    \hat{v}_k^{r,i+}(s_k^{r,i}) = \hat{c}_K^{r,i+}(s_K^{r,i})   if k = K,
    \hat{v}_k^{r,i+}(s_k^{r,i}) = \hat{c}_k^{r,i+}(s_k^{r,i}) + \Delta_k^{r,i+} \hat{v}_{k+1}^{r,i+}(s_{k+1}^{r,i})   otherwise,        (19)

where \Delta_k^{r,i+} = (1/\delta s) S^M(x_k^{r,i} - x_k^{r,i+}) is the marginal flow (i.e. whether or not there is a change in the energy in the storage as a result of the perturbation). We do this similarly for \hat{v}_k^{r,i-}(s_k^{r,i}). Note that we take the power coming out of the storage as negative.

6) Update the estimates of the marginal values [27]:

    \bar{v}_{k-1}^{r,i+}(s_{k-1}^{x,r,i}) = (1 - \alpha_{r-1}) \bar{v}_{k-1}^{r-1,i+}(s_{k-1}^{x,r,i}) + \alpha_{r-1} \hat{v}_k^{r,i+},        (20)

where \alpha is a "stepsize", \alpha \in (0, 1], and similarly for \bar{v}_{k-1}^{r,i-}(s_{k-1}^{x,r,i}) (line 17). In this research, a harmonic stepsize formula is used, \alpha = b/(b + r), where b is a constant. This stepsize formula satisfies conditions ensuring that the values converge as r \to \infty [36].
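The smoothing update (20) and the harmonic stepsize can be written directly. The constant b = 25 below is an arbitrary illustrative choice, not the paper's tuned value:

```python
def harmonic_stepsize(r, b=25.0):
    """Harmonic stepsize alpha_r = b/(b + r). It satisfies the standard
    stochastic-approximation conditions (sum of alpha diverges, sum of
    alpha^2 converges), so the slope estimates converge as r -> infinity."""
    return b / (b + r)

def update_slope(v_bar_prev, v_hat, r):
    """Smoothing update (20): blend the previous slope estimate with the
    newly sampled marginal value v_hat using alpha_{r-1}."""
    alpha = harmonic_stepsize(r - 1)
    return (1.0 - alpha) * v_bar_prev + alpha * v_hat
```

At r = 1 the stepsize is 1, so the initial guess is overwritten by the first sampled slope; later iterations average new samples in ever more gently.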

7) Use the concave adaptive value estimation (CAVE) algorithm to update the VFAs [37] (line 18).

8) Combine the value functions of each device using (16).

9) Repeat this procedure over R iterations, which are generated randomly according to the probability distributions of the random variables. We find that R = 1000 realisations is enough for the objective function to come within an acceptable accuracy even for the worst possible scenario. We investigated a range of scenarios and an example is given in Fig. 6(b).

Fig. 7. Probability density functions of (a) the PV output over different times of a sunny day (January 1st, 2013), and (b) the electrical demand over different times of a high demand day (July 2nd, 2012).
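The forward-pass decision in step 2, i.e. equation (17), reduces to a small enumeration when the decisions are discretised. The sketch below uses a toy contribution function, toy prices and a constant-slope VFA purely for illustration; none of these numbers come from the paper:

```python
import numpy as np

def greedy_decision(k, s, price, slopes, cap, actions=(-1, 0, 1)):
    """Forward-pass decision (17): enumerate the feasible discrete storage
    decisions and pick the one maximising the instantaneous contribution plus
    the post-decision VFA. Toy contribution model: charging x kWh costs
    price[k]*x from the grid; discharging (x < 0) offsets demand at the same
    rate."""
    def vfa(state):  # piecewise-linear VFA with unit segments, as in (15)
        return slopes[k + 1, :int(state)].sum()
    feasible = [x for x in actions if 0 <= s + x <= cap]
    values = {x: -price[k] * x + vfa(s + x) for x in feasible}
    return max(values, key=values.get)

# Toy setup (assumed): three prices and a VFA valuing stored energy at $0.30/kWh
price = np.array([0.11, 0.20, 0.47])
slopes = np.full((4, 10), 0.30)
x_cheap = greedy_decision(0, 5, price, slopes, cap=10)  # charges when cheap
x_peak = greedy_decision(2, 5, price, slopes, cap=10)   # discharges at the peak
```

Even this toy version shows the intended behaviour: the VFA slope acts as a shadow price of stored energy, so the policy charges when the tariff is below it and discharges when the tariff is above it.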

Note that when solving a deterministic SHEMS problem using ADP, the post-decision and next pre-decision states are the same because the random variables will be zero. However, the remaining steps in Algorithm 1 stay the same. This means that, with ADP, there is no noticeable computational burden in considering variation in the stochastic variables compared to solving the deterministic problem. In contrast, with DP we have to loop over all possible combinations of realisations of the random variables, which significantly increases the computational burden.

We now present the probabilistic models of the stochastic variables.

IV. ESTIMATING STOCHASTIC VARIABLE MODELS

In order to optimise performance, it is important for a SHEMS to incorporate variations in the PV output and the electrical and thermal demand, and to do so over a horizon of several days. The benefits of using a stochastic optimisation over a deterministic optimisation are discussed in Section V-D and in [13], [19], [20], [22], [23]. Given this, SHEMSs require the mean PV output and demand, with their appropriate probability distributions, before the start of the decision horizon. The effects of these random variations on the SHEMS problem are discussed below:

1) PV output depends on solar insolation, a forecast of

which can be obtained before the horizon starts with a

reasonable accuracy from weather forecasting services.

PV output is important to the SHEMS problem as it is

a key source of energy and is expected to be closely

coupled with the battery storage proﬁle. Failing to accom-

modate for variation in PV generation would be expected

to increase costs to the household as more power is

imported from the grid.

2) Electrical demand of the household depends on the num-

ber of occupants and their behavioural patterns, which is

difﬁcult to predict in the real world. In the context of

SHEMS, electrical demand should be supplied from the

DG units, storage units and the electrical grid. Failure to

accommodate variations in electrical demand may result

in additional costs to the household.

3) Thermal demand is also difﬁcult to predict in the real

world so failure to accommodate variations in thermal

demand may result in user discomfort.

In this paper, the mean PV output and electrical demand are taken from a data set collected during the Smart Grid Smart City project. The data set [34] consists of PV output and electrical demand measurements at 30-minute intervals over 3 years for 50 households. In real-world applications, the SHEMS estimates the mean PV output using the weather forecast [38] and the mean electrical demand using a suitable demand prediction algorithm [39], [40].

Commonly, probability distributions associated with these

stochastic variables are modelled as Gaussian or skew-Laplace

distributions. However, in this paper, we kernel estimate the

probability distributions of PV-output and electrical demand

using a hierarchical approach, where we ﬁrst cluster total

daily empirical data, and then kernel estimate probability

distributions within each cluster. In more detail, we obtain

the probability distributions of the PV output, which depend

on the time and type of day (sunny, normal or cloudy days)

in two steps.

1) First, we cluster the daily empirical data using a k-means algorithm to obtain three clusters with different total daily PV generation, corresponding to sunny, normal and cloudy days.

2) Second, for each time-step in the corresponding clusters, we estimate a probability distribution of the PV output using an Epanechnikov kernel estimation technique.
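The two-step hierarchical estimation can be sketched as follows. The 1-D k-means with quantile initialisation and the fixed bandwidth are simplifications for illustration:

```python
import numpy as np

def kmeans_1d(totals, k=3, iters=50):
    """Cluster total daily PV generation (or demand) into k day types,
    e.g. sunny/normal/cloudy. Minimal 1-D k-means with quantile
    initialisation; a library implementation would do equally well."""
    centres = np.quantile(totals, np.linspace(0.1, 0.9, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(totals[:, None] - centres[None, :]), axis=1)
        centres = np.array([totals[labels == j].mean() if np.any(labels == j)
                            else centres[j] for j in range(k)])
    return labels

def epanechnikov_pdf(x, samples, h):
    """Kernel density estimate at x from one time-step's samples within a
    cluster, using the Epanechnikov kernel K(u) = 0.75*(1 - u^2), |u| <= 1."""
    u = (x - samples) / h
    return np.mean(np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)) / h
```

The day-type label selects which per-time-step kernel estimate the SHEMS samples from during the ADP iterations.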

Note that the draws from the kernel estimates within a day-

type are independent. We obtain the probability distributions

of the electrical demand in a similar way except the clustering

is done according to days with high, normal or low demand

levels. The probability distributions of the PV output and

electrical demand follow skewed unimodal distributions, as depicted in Fig. 7. It is worthwhile to note that before the start

of the decision horizon, the SHEMS uses the predicted mean

PV-output and the electrical demand to determine the type of

day and hence the corresponding probability distribution. Note

that the prediction is accurate enough to decide the type of day

but not accurate enough to use a deterministic optimisation.

Given this, we investigate the effects on the total electricity

costs from deterministic and stochastic SHEMSs, using a range

of possible PV output and demand proﬁles, in Section V-D.

Finally, we construct the magnitudes of the thermal demand and the times at which they occur using Australian Standard AS4552 [41] and the hot water plug readings in [35]. We

assume a Gaussian distribution for the thermal demand be-

cause there is not enough empirical data to obtain a reasonable

distribution.

TABLE I
DAILY OPTIMISATION RESULTS FOR THE THREE SCENARIOS.

Total daily:                 Scenario 1   Scenario 2   Scenario 3
Electrical demand (kWh)           24.72        64.75        10.48
Thermal demand (kWh)              10.42         19.1         8.66
PV generation (kWh)                9.27         6.06        12.31
Benchmark cost ($)                 6.13         10.5         1.77
With PV ($)                        5.75         9.86         1.37
DP (s_1^b = s_K^b = 6) ($)         3.16         8.04         0.72
DP ($)                             3.05         7.96         0.59
ADP ($)                            3.14         7.79         0.6
Stochastic MILP ($)                3.15         8.11         0.63
Dummy TES control ($)              2.13         2.56         0.83
DP TES control ($)                 0.91         1.68         0.58
Marginal value of TES ($)          1.22         0.88         0.25

Now we show the performance of the presented SHEMSs

using real data.

V. SIMULATION RESULTS AND DISCUSSION

There are three sets of simulations. The first set consists of discussions of: the challenges of estimating the end-of-day SOC; the benefits of PV-storage systems; the quality of the solutions from ADP, DP and stochastic MILP; and the benefits of a stochastic optimisation over a deterministic one (Sections V-A, -B, -C and -D). The second set discusses computational aspects and the effects of extending the decision horizon (Sections V-E and -F). The third set is about the year-long optimisation (Section V-G). The time-of-use electricity tariff consists of off-peak, shoulder and peak periods at $0.11, $0.20 and $0.47 per kWh, respectively. On weekdays, the off-peak period is between 12 am and 7 am and between 10 pm and 12 am, and the peak is between 2 pm and 8 pm. On weekends, the peak period is replaced by the shoulder. The stochastic MILP uses 6000 scenarios. MATLAB is used to implement all the SHEMSs.
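The tariff can be encoded as a simple lookup, sketched here in Python for illustration (the paper's SHEMSs are implemented in MATLAB):

```python
def tou_rate(hour, weekday=True):
    """Time-of-use tariff from the simulations: off-peak $0.11/kWh between
    12 am-7 am and 10 pm-12 am, peak $0.47/kWh between 2 pm-8 pm on weekdays
    (replaced by the shoulder on weekends), and shoulder $0.20/kWh otherwise."""
    if hour < 7 or hour >= 22:
        return 0.11
    if weekday and 14 <= hour < 20:
        return 0.47
    return 0.20
```

The wide gap between the off-peak and peak rates is what makes shifting storage charging into the early morning profitable.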

A. Challenges of estimating the end-of-day SOC

In the first set of simulations, we discuss the challenges of estimating the end-of-day battery SOC (Section V-A), the benefits of residential PV-storage systems (Section V-B), the performance of the three SHEMSs over a day (Section V-C) and the benefits of using a stochastic optimisation over a deterministic one (Section V-D), using three scenarios for a residential building in Central Coast, NSW, Australia, shown in Fig. 8: Scenarios 1, 2 and 3 are on August 20th, 2012, July 2nd, 2012, and January 1st, 2013, respectively. The PV system size is 2.2 kWp.

From our preliminary investigations, and as depicted in Fig. 6(a), the expected future rewards at the start of day two (time-step k = 49) increase only slightly beyond 6 kWh, which suggests that using half of the available battery capacity as the start-of-day and end-of-day battery SOC (s_1^b = s_K^b = 6 kWh) in daily optimisations with DP is a valid assumption. However, we observe that on days with a low morning demand, high

Fig. 8. PV output and electrical and thermal demand for (a) Scenario 1, (b) Scenario 2 and (c) Scenario 3.

PV output and medium-high evening demand (see Fig. 8(c)), s_1^b = s_K^b = 2 kWh gives the best results, because the battery can be used to supply the evening demand and there is no need to charge it back. However, the next day's electricity cost can significantly increase if we are anticipating a high morning demand and low-to-high PV output (see Fig. 8(a)-(b)). Because of such situations, it is beneficial to control the end-of-day battery SOC by considering uncertainties over several days, which is our special focus in Sections V-E and V-F.

B. Beneﬁts of PV-storage systems

Residential PV-storage systems using a DP-based SHEMS result in significant financial benefits, which is evident from the electricity costs of the three scenarios in three instances (i.e. the benchmark cost with neither PV nor storage, the cost with PV but no storage, and the PV-storage system with a SHEMS) in Table I. The daily electricity cost can be reduced by 6.2%, 6.1% and 22.6% for Scenarios 1, 2 and 3, respectively, if there is only a PV system. We can further improve this by adding a battery and effectively controlling its SOC using a SHEMS, in which the battery is charged to a certain level from solar power and the electrical grid before peak periods. A DP-based SHEMS constrained to a 60% start-of-day and end-of-day battery SOC reduces the total electricity cost by a further 42.25%, 17.33% and 36.72% for Scenarios 1, 2 and 3, respectively. That is a total cost reduction of 48.45%, 23.43% and 59.32% for Scenarios 1, 2 and 3, respectively, by controlling both the battery and the TES. As shown in Fig. 8, the inhabitants are away during the day in Scenarios 1 and 3, so the extra PV generation is stored in the battery, which shows the benefit of having storage. In contrast to Scenarios 1 and 3, the electrical demand in Scenario 2

Fig. 9. The total electricity cost of the PV-battery systems of Scenarios 1-3 from deterministic and stochastic ADP for a range of possible PV output and demand profiles, which are generated by adding Gaussian noise with varying standard deviation and zero mean to the actual PV output and electrical demand.

exceeds the PV generation, so the benefit of the battery is minimal. The electricity costs of controlling the TES in Scenarios 1 and 3 are the lowest because the surplus solar and battery power is used to charge the TES instead of being sent back to the grid, as FiTs are negligible in Australia. Note that we obtain the benchmark cost of the TES by assuming a dummy control system, which operates regardless of the electricity price. The dummy TES control electricity cost varies depending on the time the hot water is used.

C. Quality of the solutions from ADP, DP and stochastic MILP

ADP and DP result in better quality solutions than stochastic MILP, as they both incorporate the stochastic input variables using appropriate probabilistic models and non-linear constraints [24]. However, ADP results in slightly lower quality schedules compared to the optimal DP solutions because the value functions used are approximations. This is evident in Table I. Note that the DP-based SHEMS with two controllable devices (battery and thermal) is computationally intractable for use in an existing smart meter, and we only use it to compare solutions with ADP.

D. Benefits of a stochastic optimisation over a deterministic optimisation

A stochastic optimisation always performs at least as well as a deterministic one, and we investigate this using a range of possible PV output and demand profiles, as shown in Fig. 9. Fig. 9 is obtained as follows. First, we obtain VFAs from both the stochastic and the deterministic optimisations using ADP. The stochastic optimisation uses the kernel-estimated probability distributions of the PV output and electrical demand, while the deterministic ADP uses only the predicted mean PV output and electrical demand (i.e. all the random variables are zero). Second, we obtain the total electricity cost for different possible PV output and demand profiles, which are generated by adding Gaussian noise with varying standard deviation and zero mean to the actual PV output and electrical demand. The mean absolute errors between the actual (i.e. zero Gaussian noise) and the predicted mean electrical demand are 0.238%, 0.696% and 0.117% for Scenarios 1, 2 and 3, respectively, which are the initial points. Note that the forecast errors associated with residential electrical demand predictions are typically very high; our aim here is not to minimise forecast errors but to find a suitable stochastic optimisation technique that performs well under uncertainty.
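The noisy test profiles for this comparison can be generated as follows. Clipping negative values at zero and the fixed seed are our assumptions for this sketch:

```python
import numpy as np

def noisy_profiles(pv, demand, sigmas, seed=0):
    """Generate test profiles for comparing deterministic and stochastic
    SHEMSs: add zero-mean Gaussian noise with each standard deviation in
    `sigmas` (kW) to the actual PV output and demand profiles."""
    rng = np.random.default_rng(seed)
    out = []
    for sigma in sigmas:
        pv_n = np.clip(pv + rng.normal(0.0, sigma, pv.shape), 0.0, None)
        d_n = np.clip(demand + rng.normal(0.0, sigma, demand.shape), 0.0, None)
        out.append((sigma, pv_n, d_n))
    return out

# Flat half-hourly profiles used purely as placeholders for the actual data
profiles = noisy_profiles(np.full(48, 1.0), np.full(48, 0.8),
                          sigmas=np.arange(0.0, 1.01, 0.2))
```

The sigma = 0 entry reproduces the actual profiles, which corresponds to the initial points on the horizontal axis of Fig. 9.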

ADP enables us to incorporate the stochastic input variables without a noticeable increase in computational effort over a deterministic ADP. Moreover, stochastic ADP requires less computational effort than deterministic DP. An ADP-based stochastic optimisation for a PV-battery system can reduce the total electricity cost by 13.62%, 0.16% and 94.67% for Scenarios 1, 2 and 3, respectively, in the instances without Gaussian noise. The benefits for Scenarios 1 and 3 are noticeable, and their forecast errors are what we can expect from electrical demand prediction algorithms [39], [40]. The benefit of the stochastic optimisation is minimal when the forecast errors are very high (Scenario 2) or very low, both of which are highly unlikely. In Scenario 2, the stochastic optimisation gives slightly lower quality results beyond a 0.2 kW standard deviation of Gaussian noise because both the forecast and kernel estimation errors are high. However, these situations are highly unlikely and the resulting cost is negligible compared to the benefits of a stochastic optimisation. Moreover, the initial point of Scenario 2 already has a mean absolute error of 0.696%, which is highly unlikely to increase any further in a practical scenario. In summary, even though the benefits vary with the scenario and the forecast error, a stochastic optimisation performs better than or the same as a deterministic one, and ADP provides these benefits without a noticeable increase in computational effort.

E. Computational aspects

In our second set of simulation results, we discuss the computational performance of the three solution techniques (Section V-E) and the benefits of extending the decision horizon (Section V-F) for households with PV-battery systems. ADP computes a solution much faster than both DP and stochastic MILP and, most importantly, ADP with a two-day decision horizon computes a solution in less than half the computational time of a daily DP, as depicted in Fig. 10(a). The computational times of the SHEMSs using ADP and DP both increase linearly as we increase the decision horizon; however, ADP has a smaller slope. This linear increase with DP arises because the state transitions in this problem are only between two adjacent time-steps, so the time does not increase exponentially. The computational time of ADP with a two-day decision horizon increases by only 4 minutes when

Fig. 10. Effects of extending the decision horizon for PV-battery systems: (a) computational time of ADP, DP and stochastic MILP against the length of the decision horizon, and (b) normalised electricity cost with error bars against the length of the decision horizon.

Fig. 11. Electricity cost savings over 3 years for 10 households, where the blue lines indicate the 25th and 75th percentiles and the red lines indicate the median.

the TES is added, while a finely discretised DP-based SHEMS takes approximately 2.5 hours.

F. Effects of extending the decision horizon

Extending the decision horizon beyond one day to consider inter-daily variations in PV output and electrical demand results in significant benefits, as depicted in Fig. 10(b), which shows the normalised electricity cost against the length of the decision horizon. We obtain our results using a DP-based SHEMS for 10 households over 2 months.³ Our results show that increasing the decision horizon beyond two days has no significant benefit. However, increasing the decision horizon up to one week is beneficial in some situations, e.g. off-grid systems, or if there are high variations in PV output and demand. The benefit of the two-day decision horizon varies depending on the household, which we discuss in the next section.

³Here we use DP as we want to obtain the exact solution. In a practical implementation, extending the decision horizon with DP is difficult as the computational power of existing smart meters is limited.

TABLE II
YEARLY OPTIMISATION RESULTS FOR TWO HOUSEHOLDS OVER 3 YEARS

                                     Household 1                 Household 2
Total:                        Year 1   Year 2   Year 3    Year 1   Year 2   Year 3
PV output (MWh)                 2.91     2.82     2.89      5.99     5.56     5.35
Demand (MWh)                    4.29     4.82     4.38      9.82    10.24    12.85
Benchmark cost ($)            568.09    610.6      558    1208.3     1276   1588.2
With PV ($)                   440.52    482.5    417.1     821.7    891.4   1194.8
PV-battery DP ($)             248.37   297.55   238.15     534.8   596.25   890.25
PV-battery ADP ($)            232.25   285.63   224.18     526.2   589.23   882.29
PV-battery Stoch. MILP ($)    281.59   333.60   276.23    554.83   599.07   906.75

G. Year-long optimisation

In our third set of simulation results, we compare the ADP-based SHEMS with a two-day decision horizon to a DP approach with a daily decision horizon over three years for 10 households with PV-battery systems. We omit the TES as we already identified its benefits in Section V-B and, moreover, a yearly optimisation using DP with two controllable devices is computationally difficult. The time periods of the three years are: year 1 from July 1st, 2012 to June 30th, 2013; year 2 from July 1st, 2011 to June 30th, 2012; and year 3 from July 1st, 2010 to June 30th, 2011. Electricity cost savings for all the households over the three years are given in Fig. 11, and in Table II we present detailed results for two households in Central Coast (Household 1) and Sydney (Household 2), NSW, Australia. The PV sizes of Households 1 and 2 are 2.2 kWp and 3.78 kWp, respectively.

The proposed ADP-based SHEMS, implemented with a two-day decision horizon that considers variations in PV output and electrical demand, reduces the average yearly electricity cost by 4.63% compared to a daily DP-based SHEMS, as depicted in Fig. 11. We also find that the average yearly savings are 5.12%, 3.89% and 4.95% for years 1-3, respectively. This is because 2013 was a sunnier year than 2012 and 2011, so the two-day optimisation has greater benefits. For example, if we are anticipating a sunny weekend with low demand, then we can completely discharge the battery on Friday night. However, if we have a half-charged battery on Friday night, then we will waste most of the free energy, as the storage capacity is limited. Our results also show that a daily DP results in significant cost savings of 12.04% ($107.35) and 1.91% ($39.35) for Households 1 and 2, respectively, compared to a daily stochastic MILP approach. The difference in the savings arises for the following reasons. In scenarios with high demand (i.e. Household 2), most of the time the battery discharges its maximum possible power to the household during peak periods, so the battery and the inverter operate at their maximum efficiencies even though the MILP solver does not consider the non-linear constraints. The converse occurs in scenarios with low demand (i.e. Household 1).

For demonstration purposes, we show optimisation results for two households over three years in Table II. A residential PV-battery system with the proposed ADP-based SHEMS reduces the total electricity cost over 3 years by 57.27% and 50.95% for Households 1 and 2, respectively, compared to the 22.80% and 28.60% improvements from having only a PV system. It is important to note that a DP-based SHEMS over a two-day decision horizon may result in a slightly better solution. However, it is computationally difficult, and the available computational power will be limited, as it is not worthwhile investing in specialised equipment to solve this problem given the savings on offer.

VI. CONCLUSION

This paper has shown the benefits of having a smart home energy management system and presented an approximate dynamic programming approach for implementing a computationally efficient smart home energy management system with schedules of similar quality to those of dynamic programming. This approach enables us to extend the decision horizon up to a week with high resolution while considering multiple devices. Our results indicate that these improvements provide financial benefits to households employing them in a smart home energy management system. Moreover, stochastic approximate dynamic programming always performs at least as well as deterministic approximate dynamic programming under uncertainty, without a noticeable increase in computational effort.

In practical applications, we can use value function approximations generated by approximate dynamic programming during an offline planning phase to make faster online decisions. This is not possible with stochastic mixed-integer linear programming, and generating value functions using dynamic programming is computationally difficult. In future work, we will learn initial value function approximations for approximate dynamic programming from historical results, which will further speed up the value function approximation convergence. Given the benefits outlined in this paper and the possible future work, we recommend the use of approximate dynamic programming in smart home energy management systems.

REFERENCES

[1] Australian Government Bureau of Resources and Energy Economics.

Energy in Australia 2014.

[2] A. C. Chapman, G. Verbic, and D. J. Hill, “Algorithmic and strategic

aspects to integrating demand-side aggregation and energy management

methods,” IEEE Transactions on Smart Grid, vol. PP, no. 99, pp. 1–13,

2016.

[3] Ausgrid, NSW, Australia. Time-of-use pricing.

[4] Energy Australia. Flexible pricing FAQs.

[5] S. Lu, N. Samaan, R. Diao, M. Elizondo, C. Jin, E. Mayhorn, Y. Zhang,

and H. Kirkham, “Centralized and decentralized control for demand

response,” in Innovative Smart Grid Technologies (ISGT), 2011 IEEE

PES, 2011, pp. 1–8.

[6] M. Pipattanasomporn, M. Kuzlu, and S. Rahman, “An algorithm for

intelligent home energy management and demand response analysis,”

IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 2166–2173, 2012.

[7] S. Li, D. Zhang, A. B. Roget, and Z. O’Neill, “Integrating home energy

simulation and dynamic electricity price for demand response study,”

IEEE Transactions on Smart Grid, vol. 5, no. 2, pp. 779–788, March

2014.

[8] M. Muratori and G. Rizzoni, “Residential demand response: Dynamic

energy management and time-varying electricity pricing,” IEEE Trans-

actions on Power Systems, vol. 31, no. 2, pp. 1108–1117, March 2016.

[9] Australian PV Institute. Australian PV market since April 2001.

[Online]. Available: http://pv-map.apvi.org.au/analyses

[10] Electric Power Research Institute (EPRI), “The integrated grid realizing

the full value of central and distributed energy resources,” Tech. Rep.,

2014.

[11] Australian Energy Market Operator (AEMO), “Emerging technologies

information,” Tech. Rep., 2015.

[12] Rocky Mountain Institute, Homer Energy, Cohnreznick Think Energy,

“The economics of grid defection when and where distributed solar

generation plus storage competes with traditional utility service,” Tech.

Rep., 2014.

[13] Z. Chen, L. Wu, and Y. Fu, “Real-time price-based demand response

management for residential appliances via stochastic optimization and

robust optimization,” IEEE Transactions on Smart Grid, vol. 3, no. 4,

pp. 1822–1831, 2012.

[14] C. Keerthisinghe, G. Verbiˇ

c, and A. Chapman, “Evaluation of a multi-

stage stochastic optimisation framework for energy management of

residential PV-storage systems,” in Power Engineering Conference (AU-

PEC), 2014 Australasian Universities, Sept 2014, pp. 1–6.

[15] M. Bozchalui, S. Hashmi, H. Hassen, C. Canizares, and K. Bhattacharya,

“Optimal operation of residential energy hubs in smart grids,” IEEE

Transactions on Smart Grid, vol. 3, no. 4, pp. 1755–1766, 2012.

[16] F. De Angelis, M. Boaro, D. Fuselli, S. Squartini, F. Piazza, and

Q. Wei, “Optimal home energy management under dynamic electrical

and thermal constraints,” IEEE Transactions on Industrial Informatics,

vol. 9, no. 3, pp. 1518–1527, 2013.

[17] J. Wang, Z. Sun, Y. Zhou, and J. Dai, “Optimal dispatching model of smart home energy management system,” in Innovative Smart Grid Technologies - Asia (ISGT Asia), 2012 IEEE, 2012, pp. 1–5.

[18] K. C. Sou, J. Weimer, H. Sandberg, and K. Johansson, “Scheduling smart home appliances using mixed integer linear programming,” in Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on, 2011, pp. 5144–5149.

[19] M. Pedrasa, E. Spooner, and I. MacGill, “Robust scheduling of residential distributed energy resources using a novel energy service decision-support tool,” in Innovative Smart Grid Technologies (ISGT), 2011 IEEE PES, 2011, pp. 1–8.

[20] M. Pedrasa, T. Spooner, and I. MacGill, “Coordinated scheduling of residential distributed energy resources to optimize smart home energy services,” IEEE Transactions on Smart Grid, vol. 1, no. 2, pp. 134–143, 2010.

[21] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Athena Scientific, Belmont, Massachusetts, 2005, vol. 1.

[22] H. Tischer and G. Verbič, “Towards a smart home energy management system - a dynamic programming approach,” in Innovative Smart Grid Technologies Asia (ISGT), 2011 IEEE PES, 2011, pp. 1–7.

[23] C. Keerthisinghe, G. Verbič, and A. Chapman, “Addressing the stochastic nature of energy management in smart homes,” in Power Systems Computation Conference (PSCC), 2014, Aug 2014, pp. 1–7.

[24] ——, “Energy management of PV-storage systems: ADP approach with temporal difference learning,” in Power Systems Computation Conference (PSCC), 2016, Jun 2016, pp. 1–7.

[25] D. F. Salas and W. B. Powell, “Benchmarking a scalable approximate dynamic programming algorithm for stochastic control of multidimensional energy storage problems,” Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, Tech. Rep., 2013.

[26] D. Jiang, T. Pham, W. Powell, D. Salas, and W. Scott, “A comparison of approximate dynamic programming techniques on benchmark energy storage problems: Does anything work?” in Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2014 IEEE Symposium on, Dec 2014, pp. 1–8.

[27] R. Anderson, A. Boulanger, W. Powell, and W. Scott, “Adaptive stochastic control for the smart grid,” Proceedings of the IEEE, vol. 99, no. 6, pp. 1098–1115, 2011.

[28] W. B. Powell, A. George, H. Simao, W. Scott, A. Lamont, and J. Stewart, “SMART: A stochastic multiscale model for the analysis of energy resources, technology, and policy,” INFORMS Journal on Computing, vol. 24, no. 4, pp. 665–682, 2012.

[29] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley and Sons, Inc., 2007.

[30] ——, “Approximate dynamic programming for large-scale resource allocation problems,” Princeton University, Princeton, New Jersey 08544, USA, 2005.

[31] F. Borghesan, R. Vignali, L. Piroddi, M. Prandini, and M. Strelec, “Approximate dynamic programming-based control of a building cooling system with thermal storage,” in Innovative Smart Grid Technologies Europe (ISGT EUROPE), 2013 4th IEEE/PES, Oct 2013, pp. 1–5.

[32] M. Strelec and J. Berka, “Microgrid energy management based on approximate dynamic programming,” in Innovative Smart Grid Technologies Europe (ISGT EUROPE), 2013 4th IEEE/PES, Oct 2013, pp. 1–5.

[33] P. Samadi, H. Mohsenian-Rad, V. Wong, and R. Schober, “Real-time pricing for demand response based on stochastic approximation,” IEEE Transactions on Smart Grid, vol. 5, no. 2, pp. 789–798, March 2014.

[34] E. L. Ratnam, S. R. Weller, C. M. Kellett, and A. T. Murray, “Residential load and rooftop PV generation: an Australian distribution network dataset,” International Journal of Sustainable Energy, pp. 1–20.

[35] Smart-Grid Smart-City Customer Trial Data. [Online]. Available: https://data.gov.au/dataset/smart-grid-smart-city-customer-trial-data

[36] T. Jaakkola, M. I. Jordan, and S. P. Singh, “Convergence of stochastic iterative dynamic programming algorithms,” Neural Computation, vol. 6, pp. 1185–1201, 1994.

[37] G. A. Godfrey and W. B. Powell, “An adaptive, distribution-free algorithm for the newsvendor problem with censored demands, with applications to inventory and distribution,” Management Science, vol. 47, no. 8, pp. 1101–1112, 2001.

[38] N. Sharma, P. Sharma, D. Irwin, and P. Shenoy, “Predicting solar generation from weather forecasts using machine learning,” in Smart Grid Communications (SmartGridComm), 2011 IEEE International Conference on, Oct 2011, pp. 528–533.

[39] K. Bao, F. Allerding, and H. Schmeck, “User behavior prediction for energy management in smart homes,” in Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on, vol. 2, July 2011, pp. 1335–1339.

[40] D. Lachut, N. Banerjee, and S. Rollins, “Predictability of energy use in homes,” in Green Computing Conference (IGCC), 2014 International, Nov 2014, pp. 1–10.

[41] E3 - Equipment Energy Efficiency, “Water heating data collection and analysis,” Tech. Rep., 2012.

Chanaka Keerthisinghe (S'10) received the B.E. (Hons.) and M.E. (Hons.) degrees in Electrical and Electronic Engineering from the University of Auckland, New Zealand, in 2011 and 2012, respectively. He is currently a Ph.D. candidate at the Centre for Future Energy Networks, University of Sydney, Australia. His research interests include demand response, energy management in residential buildings, wireless charging of electric vehicles, and applying approximate dynamic programming and optimisation techniques in power systems.

Gregor Verbič received the B.Sc., M.Sc., and Ph.D. degrees in electrical engineering from the University of Ljubljana, Ljubljana, Slovenia, in 1995, 2000, and 2003, respectively. In 2005, he was a North Atlantic Treaty Organization Natural Sciences and Engineering Research Council of Canada Postdoctoral Fellow with the University of Waterloo, Waterloo, ON, Canada. Since 2010, he has been with the School of Electrical and Information Engineering, University of Sydney, Sydney, NSW, Australia. His expertise is in power system operation, stability and control, and electricity markets. His current research interests include integration of renewable energies into power systems and markets, optimization and control of distributed energy resources, demand response, and energy management in residential buildings. He was a recipient of the IEEE Power and Energy Society Prize Paper Award in 2006. He is an Associate Editor of the IEEE TRANSACTIONS ON SMART GRID.

Archie C. Chapman (M'14) received the B.A. degree in math and political science, and the B.Econ. (Hons.) degree from the University of Queensland, Brisbane, QLD, Australia, in 2003 and 2004, respectively, and the Ph.D. degree in computer science from the University of Southampton, Southampton, U.K., in 2009. He is currently a Research Fellow in Smart Grids with the School of Electrical and Information Engineering, Centre for Future Energy Networks, University of Sydney, Sydney, NSW, Australia. His expertise is in game-theoretic and reinforcement learning techniques for optimization and control in large distributed systems. His research focuses on integrating renewables into legacy power networks, using distributed energy and load scheduling methods, and on designing tariffs and market mechanisms that support efficient use of existing infrastructure and new controllable devices.