A Fast Technique for Smart Home Management:
ADP with Temporal Difference Learning
Chanaka Keerthisinghe, Student Member, IEEE, Gregor Verbič, Senior Member, IEEE,
and Archie C. Chapman, Member, IEEE
Abstract—This paper presents a computationally efficient
smart home energy management system (SHEMS) using an ap-
proximate dynamic programming (ADP) approach with temporal
difference learning for scheduling distributed energy resources.
This approach improves the performance of a SHEMS by
incorporating stochastic energy consumption and PV generation
models over a horizon of several days, using only the com-
putational power of existing smart meters. In this paper, we
consider a PV-storage (thermal and battery) system, however,
our method can extend to multiple controllable devices without
the exponential growth in computation that other methods such
as dynamic programming (DP) and stochastic mixed-integer
linear programming (MILP) suffer from. Specifically, probability
distributions associated with the PV output and demand are
kernel estimated from empirical data collected during the Smart
Grid Smart City project in NSW, Australia. Our results show
that ADP computes a solution much faster than both DP and
stochastic MILP, and provides only a slight reduction in quality
compared to the optimal DP solution. In addition, incorporating
a thermal energy storage unit using the proposed ADP-based
SHEMS reduces the daily electricity cost by up to 26.3% without
a noticeable increase in the computational burden. Moreover,
ADP with a two-day decision horizon reduces the average yearly
electricity cost by a 4.6% over a daily DP method, yet requires
less than half of the computational effort.
Index Terms—demand response, smart home energy man-
agement, distributed energy resources, approximate dynamic
programming, dynamic programming, stochastic mixed-integer
linear programming, value function approximation, temporal
difference learning.
NOMENCLATURE
k — Time-step
K — Total number of time-steps
i — Index of controllable devices
I — Total number of controllable devices
j — Index of stochastic variables
J — Total number of stochastic variables
s_k^{i,j} — State of i or j at time-step k
x_k^i — Decision of controllable device i at time-step k
ω_k^j — Variation of stochastic variable j at time-step k
C_k — Cost or reward at time-step k
π — Policy
r — Realisation of random information
R — Total number of random realisations
α — Stepsize
V_k^π — Expected future cost/reward for following π from k
s^{i,max} — Maximum state of a controllable device
s^{i,min} — Minimum state of a controllable device
µ^i — Efficiency of a controllable device [%]
l^i — Losses of a controllable device per time-step
a — Index of a segment in the VFA
A_k — Total number of segments at time-step k
z_{ka} — Capacity of a segment
v̄_{ka} — Slope of a segment
n — A particular scenario
N — Total number of scenarios in stochastic MILP
Chanaka Keerthisinghe, Gregor Verbič, and Archie C. Chapman are with the School of Electrical and Information Engineering, The University of Sydney, Sydney, New South Wales, Australia. E-mail: {chanaka.keerthisinghe, gregor.verbic, archie.chapman}@sydney.edu.au.
I. INTRODUCTION
FUTURE electrical power grids will be based on a
“demand-following-supply” paradigm, because of the in-
creasing penetration of intermittent renewable energy sources
and advances in technology enabling non-dispatchable loads.
A cornerstone of achieving this is demand side management,
which can be roughly divided into demand response (DR) and
direct load control. This research focuses on DR, which is used
to reduce electricity costs on one side and provide system
services on the other. In particular, we focus on residential
customers, because roughly 30% of the energy use in Australia
is comprised of residential loads [1] and their diurnal patterns
drive daily and seasonal peak loads.
Currently, DR is implemented using small-scale voluntary
programs and fixed pricing strategies. The main drawback of
using the same pricing strategy for all the customers is the
possibility of inducing unwanted peaks in the demand curve
as customers respond to prices. Also, it is not possible for
residential and small commercial users to directly participate
in wholesale energy or ancillary services markets because
of the regulatory regimes and computational requirements.
As such, DR revolves around the interaction between an
aggregator and customers, as shown in Fig. 1. In many cases, the retailer acts as the aggregator. The aggregator's task is to construct a scheme that coordinates, schedules or otherwise controls part of a participating user's load via interaction with its smart home energy management system (SHEMS) [2]. In
the context of DR, the aggregator sends control signals in the
form of electricity price signals to the SHEMS. The SHEMS
then schedules and coordinates the customers' distributed generation (DG), storage and flexible loads, collectively known as
distributed energy resources (DERs), to minimise energy costs
while maintaining a suitable level of comfort. In particular, this
research assumes that the exact electricity price signals are
available from a DR aggregator/retailer in the form of time-of-use pricing (ToUP). ToUP is chosen as it is prevalent in Australia [3], [4].

Fig. 1. Customers, aggregator and the wholesale electricity market.
The DERs in the smart homes considered in this paper com-
prise a PV unit, a battery and a thermal energy storage (TES)
unit (i.e. electric water heater). In the existing literature, a
range of DER have been used to achieve DR [5]–[8]. However,
our choice stems from Australia’s increasing penetration of
rooftop PV and battery storage systems in response to rising
electricity costs, decreasing technology costs and existing
fleet of hot water storage devices [9], [10]. According to
AEMO, the payback period for residential PV-storage systems
is already below 15 years in South Australia, with the other
states to follow suit in less than a decade [11]. Similarly,
residential users with PV-storage systems in the USA have
been forecast to reach grid parity within the next decade [12].
The SHEMS schedules the DER in such a way that
electrical power drawn from the grid is minimised, especially
during peak periods. Given the recent drop in PV costs, feed-
in tariffs (FiTs) in Australia are significantly less than the
retail tariffs paid by the households, and it is anticipated that
this may happen in other parts of the world too. As such,
selling power back to the electrical grid is uneconomical,
and households have a strong incentive to self-consume as
much locally generated power as possible. Therefore, when
PV generation is higher than electrical demand, an effective
SHEMS should either store the surplus energy or consume it
using controllable loads.
The underlying optimisation problem here is a sequential
decision making process under uncertainty. As shown in [13],
SHEMSs that consider stochastic variables such as variations
in PV output, electrical and thermal demand using appropriate
probability distributions yield better quality schedules than
those obtained from a deterministic model. Moreover, in [14]
we showed that extending the optimisation horizon beyond
one day is economically beneficial as the SHEMS can exploit
inter-daily variations in consumption and solar-insolation pat-
terns. Given this, stochastic mixed-integer linear programming
(MILP) [13], [15]–[18], particle swarm optimisation [19], [20]
and dynamic programming (DP) [21], [22] have previously
been proposed to solve the SHEMS problem. In particular, DP
accommodates full non-linear controllable device models and
produces close-to-optimal solutions when finely discretised.
However, DP is infeasible if the optimisation horizon is
extended or when multiple controllable devices are added to
the SHEMS due to the exponential growth in the state and
action spaces [23].
Given these insights, in [24] we reported our prelimi-
nary work towards implementing a computationally efficient
SHEMS using approximate dynamic programming (ADP) with
temporal difference learning, an approach that has successfully
been applied for the control of grid level storage in [25]. In this
proposed approach, we obtain policies from value function ap-
proximations (VFAs)1. Our choice to use VFAs is based on the
observation that they work best in time-dependent problems
with regular load, energy and price patterns, relatively high
noise and less accurate forecasts (errors grow with the horizon)
[26]. Other ADP methods are better suited to applications with
different characteristics [27]–[33].
Building on this, the main contributions of this paper are
the development of a computationally efficient SHEMS using
ADP with temporal difference learning, which can incorporate
multiple controllable devices, and a demonstration of its
practical implementation using empirical data. Specifically, the
proposed ADP method enables us to:
1) incorporate stochastic input variables without a noticeable
increase in the computational burden;
2) extend the decision horizon with less computational bur-
den to consider uncertainties over several days, which
results in significant financial benefits;
3) enable integration of multiple controllable devices with
less computational burden than DP;
4) integrate the SHEMS into an existing smart meter as it
uses less memory compared to existing methods.
In order to show the performance of ADP, we use it with
a two-day decision horizon over three years using a rolling
horizon approach where the expected electricity cost of each
day is minimised considering the uncertainties over the next
day. The daily performance is benchmarked against DP and
stochastic MILP by applying them to three different scenarios
with different electrical and thermal demand patterns and PV
outputs. The three-year evaluation is benchmarked against a daily DP approach by applying it to 10 smart homes.
Moreover, the PV output and demand profiles are from data
[34] collected during the Smart Grid Smart City project by
Ausgrid and their consortium partners in NSW, Australia,
which investigated the benefits and costs of implementing
a range of smart grid technologies in Australian households
[35]. Specifically, SHEMSs estimate probability distributions associated with the PV output and electrical demand from empirical data using kernel regression, which is more realistic than assuming parametric distributions. In addition, throughout the paper we demonstrate the benefits of residential PV-storage systems with a SHEMS.
¹A value function describes the expected future cost of following a policy from a given state.
The paper is structured as follows: Section II states the
stochastic energy management problem and the existing so-
lution techniques. This is followed by a description of the
ADP formulation in Section III. The stochastic variable models
are described in Section IV. Section V presents the simulation results and the discussion.
II. SMART HOME ENERGY MANAGEMENT PROBLEM
In this section, we describe the general formulation of
the sequential stochastic optimisation problem, formulate our
stochastic SHEMS problem as a sequential stochastic optimisation problem, and present short descriptions of the stochastic MILP and DP methods used to solve this problem.
A. General sequential stochastic optimisation problem
A sequential stochastic optimisation problem comprises:
- A sequence of time-steps, K = {1, ..., k, ..., K}, where k and K denote a particular time-step and the total number of time-steps in the decision horizon, respectively.
- A set of controllable devices, I = {1, ..., i, ..., I}, where each i is represented using:
  - A state variable, s_k^i ∈ S.
  - A decision variable, x_k^i ∈ X, which is a control action.
- A set of non-controllable inputs, J = {1, ..., j, ..., J}, where each j is represented using:
  - A state variable, s_k^j ∈ S.
  - A random variable, ω_k^j, capturing exogenous information or perturbations.
Given this, we let: s_k = [s_k^i ... s_k^I, s_k^j ... s_k^J]^T, x_k = [x_k^i ... x_k^I]^T, and ω_k = [ω_k^j ... ω_k^J]^T. Note that the state variables contain the information that is necessary and sufficient to make the decisions and compute costs, rewards and transitions.
- Constraints for state and decision variables.
- Transition functions, s_{k+1} = s^M(s_k, x_k, ω_k), describing the evolution of states from k to k+1, where s^M(·) is the system model that consists of controllable device i's operational constraints such as power flow limits, efficiencies and losses. Note that the transition functions are only required for the controllable devices.
- An objective function:

F = E{ Σ_{k=1}^{K} C_k(s_k, x_k, ω_k) },  (1)

where C_k(s_k, x_k, ω_k) is the contribution (i.e. cost or reward of energy, or a discomfort penalty) incurred at time-step k, which accumulates over time.
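As a concrete illustration of objective (1), the expected cumulative contribution of a policy can be estimated by Monte Carlo simulation over sampled noise trajectories. The sketch below is illustrative only: the one-dimensional storage model, cost function and uniform noise are made-up stand-ins, not the paper's device models.

```python
import random

def transition(s, x, w):
    """Toy transition s_{k+1} = s^M(s_k, x_k, w_k): storage gains x, loses w."""
    return max(0.0, s + x - w)

def cost(s, x, w, price=0.25):
    """Toy contribution C_k: pay for energy bought to cover action and noise."""
    return price * max(0.0, x + w)

def expected_cost(policy, s0, K=48, runs=1000, seed=0):
    """Monte Carlo estimate of F = E[ sum_k C_k(s_k, x_k, w_k) ]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        s = s0
        for k in range(K):
            w = rng.uniform(0.0, 0.5)  # sampled exogenous variation w_k
            x = policy(s, k)
            total += cost(s, x, w)
            s = transition(s, x, w)
    return total / runs

# Simple hypothetical policy: top the storage up towards 5 kWh each step.
flat_policy = lambda s, k: max(0.0, min(1.0, 5.0 - s))
```

Averaging over sampled trajectories in this way is exactly how a policy's expected objective value is assessed when the expectation in (1) has no closed form.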
B. Instantiation
The objective of the SHEMS is to minimise energy costs
over a decision horizon. The sequential stochastic optimisation
problem is solved before the start of each day, using either a
daily or a two-day decision horizon. In this paper we consider a PV unit, a battery and a hot water system (TES unit), as depicted in Fig. 2. We use a single inverter for both the battery and the PV, which is becoming popular in Australia.

Fig. 2. Illustration of electrical and thermal energy flows in a smart home, and the state, decision and random variables used to formulate the problem.
In order to optimise performance, a SHEMS needs to
incorporate the variations in PV output, electrical and thermal
demand of the household. Given this, we model stochastic
variables using their mean as state variables and variation as
random variables. This enables us to use an algorithmic strat-
egy that separates the transition function into a deterministic
term, using the mean, and a random term, using variation
(discussed in Section III). In some cases, electricity prices
may be considered as stochastic variables. However, in this
paper, we assume that the exact electricity prices are available
before the start of the decision horizon from a residential DR
aggregator/retailer (i.e. in the form of ToUP).
In more detail, we cast our SHEMS problem as the sequential stochastic optimisation formulation in Section II-A as follows. The daily decision horizon is a 24-hour period, divided into K = 48 time-steps with a 30-minute resolution. We do this similarly for the two-day decision horizon. A 30-minute time resolution is chosen to match typical dispatch timelines, and because the PV and demand data from the Smart Grid Smart City project [34] are only available at 30-minute intervals. If required, the proposed ADP approach can increase the time resolution with less computational burden than existing methods. The controllable devices are the battery and the TES, while the non-controllable inputs are the PV output and the electrical and thermal demand. As depicted in Fig. 2, for each time-step, k, in the decision horizon, state variables are used to represent: the battery SOC, s_k^b; TES SOC, s_k^t; mean PV output, s_k^pv; mean electrical demand, s_k^{d,e}; mean thermal demand, s_k^{d,t}; and electricity tariff, s_k^p. Control variables consist of: the charge and discharge rates of the battery, x_k^b; and the electric water heater input, x_k^wh. Random variables are: the variations in PV output, ω_k^pv; variations in thermal demand, ω_k^{d,t}; and variations in electrical demand, ω_k^{d,e}. We use empirical data to estimate the probability distributions associated with the uncertain variables using kernel regression, which is more realistic than assuming parametric distributions (discussed in detail in Section IV).
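To illustrate the kind of kernel estimation used here, the sketch below builds a one-dimensional Gaussian kernel density estimate of a stochastic input (e.g. the PV-output variation at one time-step). The samples and bandwidth are hypothetical; the paper estimates these distributions from the Smart Grid Smart City data.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density f(x) = (1 / (n h sqrt(2 pi))) * sum_i exp(-((x - x_i)/h)^2 / 2)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in samples)
    return density

# Hypothetical PV-variation samples (kWh per 30-minute step) for one time-step.
samples = [-0.3, -0.1, 0.0, 0.05, 0.1, 0.2, 0.25, 0.4]
f = gaussian_kde(samples, bandwidth=0.15)
```

The resulting density integrates to one and places mass only where variations were actually observed, which is what makes the kernel approach more faithful than fitting an assumed parametric family.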
The energy balance constraint is given by:

s_k^{d,e} + ω_k^{d,e} + x_k^wh = µ^i x_k^i + x_k^g,  (2)

where x_k^i = s_k^pv + ω_k^pv − µ^b x_k^b is the inverter power at the DC side (a positive value means power into the inverter); µ^i is the efficiency of the inverter (note that the efficiency is 1/µ^i when the inverter power is negative); µ^b is the efficiency of the battery action corresponding to either charging or discharging; and x_k^g is the electrical grid power. The charge rate of the battery is constrained by the maximum charge rate, x_k^{b+} ≤ γ^c, and the discharge rate of the battery is constrained by the maximum discharge rate, x_k^{b−} ≤ γ^d. The electric water heater input should never exceed the maximum possible electric water heater input, x_k^wh ≤ γ^wh. In order to satisfy thermal demand at all time-steps, we make sure that the TES has enough energy at each time-step, s_k^{t,req}, to satisfy thermal demand for the next 2 hours. Therefore, the energy stored in the TES is always within the limits:

s^{t,req} ≤ s_k^t ≤ s^{t,max}.  (3)

The energy stored in the battery should be within the limits s^{b,min} ≤ s_k^b ≤ s^{b,max}.
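A minimal sketch of the energy balance (2), computing the grid power implied by a given battery action and heater input. The parameter values and the sign convention for the battery action are assumptions for illustration, not the paper's calibrated model.

```python
def grid_power(d_e, w_e, x_wh, s_pv, w_pv, x_b, mu_i=0.95, mu_b=0.95):
    """Rearranging (2): x^g = s^{d,e} + w^{d,e} + x^{wh} - mu^i * x^i, with
    x^i = s^{pv} + w^{pv} - mu^b * x^b the DC-side inverter power
    (positive into the inverter; the efficiency is 1/mu^i when x^i is negative)."""
    x_i = s_pv + w_pv - mu_b * x_b           # DC-side inverter power
    eff = mu_i if x_i >= 0 else 1.0 / mu_i   # direction-dependent efficiency
    return d_e + w_e + x_wh - eff * x_i
```

With no PV and no battery action, the grid simply covers demand plus the heater input; with a large PV surplus the expression goes negative, i.e. export to the grid.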
Transition functions govern how the state variables evolve over time. The battery SOC, denoted s_k^b ∈ [s^{b,min}, s^{b,max}], progresses by:

s_{k+1}^b = (1 − l^b(s_k^b)) s_k^b − x_k^{b−} + µ^{b+} x_k^{b+},  (4)

where l^b(s_k^b) models the self-discharging process of the battery. The TES SOC is denoted s_k^t ∈ [s^{t,req}, s^{t,max}], and evolves according to:

s_{k+1}^t = (1 − l^t(s_k^t)) s_k^t − s_k^{d,t} − ω_k^{d,t} + µ^wh x_k^wh,  (5)

where l^t(s_k^t) models the thermal loss of the TES and µ^wh is the efficiency of the electric water heater. In the above, both transition functions are non-linear functions of state.
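The transition functions (4) and (5) can be sketched as follows. The constant loss terms below stand in for the paper's state-dependent, non-linear loss models l^b and l^t, and all parameter values are illustrative.

```python
def battery_step(s_b, x_charge, x_discharge, mu_b_plus=1.0, loss=0.001):
    """(4): s_{k+1}^b = (1 - l^b(s^b)) s^b - x^{b-} + mu^{b+} x^{b+}.
    loss is a constant stand-in for the state-dependent self-discharge l^b."""
    return (1.0 - loss) * s_b - x_discharge + mu_b_plus * x_charge

def tes_step(s_t, d_t, w_t, x_wh, mu_wh=0.9, loss=0.005):
    """(5): s_{k+1}^t = (1 - l^t(s^t)) s^t - s^{d,t} - w^{d,t} + mu^{wh} x^{wh}.
    loss is a constant stand-in for the state-dependent thermal loss l^t."""
    return (1.0 - loss) * s_t - d_t - w_t + mu_wh * x_wh
```

In a simulation loop these two updates, together with the realised variations ω_k, carry the storage states from one time-step to the next.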
The discharge efficiency of the battery and the efficiency of the inverter are non-linear, and the different ways that the stochastic MILP, DP and ADP approaches represent them are illustrated in Fig. 3. These indicate that DP and ADP can directly incorporate non-linear characteristics, while linear approximations have to be made with stochastic MILP. For all the implemented SHEMSs the following device characteristics are the same: the charging efficiency of the battery is µ^{b+} = 1; the minimum and maximum battery SOC are 2 kWh and 10 kWh, respectively; the maximum charge and discharge rates of the battery are 2 kWh; the electric water heater efficiency is µ^wh = 0.9 while its maximum possible input is 3 kWh; and the TES limit is set to 12 kWh.
The optimal policy, π*, is a choice of action for each state, π : S → X, that minimises the expected sum of future costs over the decision horizon; that is:

F^π = min_π E{ Σ_{k=0}^{K} C_k(s_k, π(s_k), ω_k) },  (6)

where C_k(s_k, x_k, ω_k) is the cost incurred at a given time-step, which is given by:

C_k(s_k, x_k, ω_k) = s_k^p ( s_k^{d,e} + ω_k^{d,e} − µ^i x_k^i + x_k^wh ).  (7)
Note that we do not use any specific user comfort criteria in the contribution function. However, we endeavour to supply the thermal demand at all time-steps without any user discomfort by penalising undesired states of the TES in DP and ADP, and by directly using the constraint (3) in stochastic MILP.

Fig. 3. Characteristics of the battery and the inverter: (a) efficiency of the inverter vs. input power; (b) discharge efficiency of the battery vs. discharge rate. ADP and DP use the non-linear curves, while stochastic MILP uses linear approximations.
The problem is formulated as an optimisation of the expected contribution because the contribution is generally a random variable due to the effect of ω_k. In all the SHEMSs, we obtain the decisions x_k = π(s_k) = [x_k^b, x_k^wh], depending on the state variables s_k = [s_k^b, s_k^t, s_k^{d,e}, s_k^pv, s_k^{d,t}, s_k^p], and realisations of the random variables ω_k = [ω_k^pv, ω_k^{d,e}, ω_k^{d,t}], at each time-step.
C. Solution techniques
The first method we use is a scenario-based MILP ap-
proach, which we referred to as stochastic MILP in [23].
This technique requires us to linearise the constraints and
transition functions mentioned in Section II-B and model the
problem as a mathematical programming problem. The second
method we use is DP, in which we model our problem as
a Markov decision process (MDP). This method enables us
to incorporate all the non-linear constraints and transition
functions with no additional computational burden over using
linear constraints and transition functions. Details of these
methods are as follows:
1) Stochastic MILP: The deterministic version of the
SHEMS problem can be solved using a MILP approach,
which optimises a linear objective function subject to linear
constraints with continuous and integer variables [13]. Note
that the transition functions presented in Section II-B are
considered as constraints in the MILP formulation. Integer
variables are used to model power flow directions.
In order to incorporate stochasticity, a large set of scenarios is generated by sampling from all combined realisations of the stochastic variables mentioned in Section II-B. A larger
number of scenarios should improve the solutions generated by
better incorporating the stochastic variables, but this imposes
a greater computational burden. Therefore, heuristic scenario
reduction techniques are employed to obtain a scenario set
of size N, which can be solved within a given time with
reasonable accuracy.
Given this, a scenario-based stochastic MILP formulation of the problem is described by:

min Σ_{n=1}^{N} P_n(s^{j,n}) Σ_{k=1}^{K} ( s_k^{p,buy} x_k^{g+} − s_k^{p,sell} x_k^{g−} ),  (8)

where P_n(s^{j,n}) is the probability of a particular scenario n corresponding to realisations of the stochastic variables s^j, subject to Σ_{n=1}^{N} P_n(s^{j,n}) = 1.
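Evaluating the scenario-weighted objective (8) for a fixed schedule can be sketched as below. A real stochastic MILP would optimise the grid import/export variables subject to the constraints; here they are simply given, and all prices and scenarios are made up.

```python
def scenario_cost(prob, buy_price, sell_price, grid_in, grid_out):
    """P_n * sum_k ( p_k^buy * x_k^{g+} - p_k^sell * x_k^{g-} ) for one scenario."""
    return prob * sum(pb * gi - ps * go
                      for pb, ps, gi, go
                      in zip(buy_price, sell_price, grid_in, grid_out))

def stochastic_objective(scenarios):
    """Sum of scenario costs over the reduced set; probabilities must sum to 1."""
    assert abs(sum(p for p, *_ in scenarios) - 1.0) < 1e-9
    return sum(scenario_cost(p, pb, ps, gi, go)
               for p, pb, ps, gi, go in scenarios)

# Two equiprobable toy scenarios over K = 2 steps: import-only vs. export-only.
scenarios = [
    (0.5, [0.30, 0.30], [0.06, 0.06], [1.0, 1.0], [0.0, 0.0]),
    (0.5, [0.30, 0.30], [0.06, 0.06], [0.0, 0.0], [1.0, 1.0]),
]
```

The low sell price relative to the buy price is what makes self-consumption attractive, as discussed in the introduction.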
For each realized scenario, the optimisation problem is
solved for the whole horizon at once using a standard MILP
solver, so the solution time grows exponentially with the length
of the horizon. Here CPLEX is used; however, all commercial solvers give similar-quality solutions. As such, in the existing
literature, a one day optimisation horizon is typically assumed.
Moreover, the solutions are of lower quality because of the
linear approximations made and the inability to incorporate
all the probability distributions [23]. In response to these
limitations, DP was proposed in [22] to improve the solution
quality.
2) Dynamic programming (DP): The problem in (6) is easily cast as an MDP due to the separable objective function and the Markov property of the transition functions. Given this, DP solves the MDP form of (6) by computing a value function V^π(s_k). This is the expected future cost of following a policy, π, starting in state s_k, and is given by:

V^π(s_k) = Σ_{s'∈S} P(s'|s_k, π(s_k), ω_k) [ C(s_k, π(s_k), s') + V^π(s') ].  (9)

An optimal policy, π*, is one that minimises (6), and which also satisfies Bellman's optimality condition:

V_k^π(s_k) = min_π { C_k(s_k, π(s_k)) + E{ V_{k+1}^π(s') | s_k } }.  (10)
The expression in (10) is typically computed using backward
induction, a procedure called value iteration, and then an
optimal policy is extracted from the value function by selecting
a minimum value action for each state. This is the key func-
tional point of difference between DP and stochastic MILP.
DP enables us to plan offline by generating value functions for every time-step. Once we have the value functions, we can make fast online decisions using (10) (more details are towards the end of this section). Note that a value function at a given time-step consists of the expected future cost from all the states. This process of mapping states to actions is not possible with stochastic MILP.
An illustration of a deterministic DP using a simplified model of a battery storage is shown in Fig. 4. At every time-step, there are three battery SOC states (i.e. highest, middle, and lowest) and three possible battery actions that result in different instantaneous costs. At the last time-step, k = K, the expected future cost from the desired state, s_K^b = M, is zero, while the other two states are penalised with a large cost. This is an important step that allows us to control the end-of-day battery SOC (discussed in Section V). The expected future cost at every possible state is calculated using (10), which is the minimum of the combined instantaneous cost that results from the decision that we take and the expected future cost from the state we end up at in the next time-step. In Fig. 4, the instantaneous cost is on the edges of the lines while the expected future cost is below the states. An optimal policy is extracted from the value functions by selecting a minimum-value action for each state using (10). For example, from s_1^b, if we take the optimal decision to go to s_2^b = L then the total combined cost of 10 consists of an instantaneous cost of 2 and an expected future cost of 8. Even though the expected future cost of 7 from s_2^b = M is lower than the expected future cost from s_2^b = L, the instantaneous cost that takes us there is 4, so the total combined cost is 11. Given this, the expected future cost of following the optimal policy from s_1^b is 10, and at time-step 2 we will be at s_2^b = L.

Fig. 4. A deterministic DP example using a battery storage, where the expected future cost is calculated using (10). The instantaneous contributions from the battery decisions are on the edges of the lines while the expected future cost is below the states. The optimal policy satisfies (6), which is obtained using (10).
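The backward-induction procedure behind Fig. 4 can be sketched on a toy three-state battery model. The stage and terminal costs below are invented, and the transition structure is simplified (every state reachable from every state), so this illustrates only the recursion in (10), not the paper's model.

```python
STATES = ["L", "M", "H"]            # low, middle, high battery SOC
LEVEL = {"L": 0, "M": 1, "H": 2}

def toy_cost(k, s, s2):
    """Invented stage cost: charging costs 2 per level up, discharging earns 1."""
    d = LEVEL[s2] - LEVEL[s]
    return 2 * d if d > 0 else d

def backward_induction(K, stage_cost, terminal_cost):
    """Compute V_k(s) = min_x [ C_k(s, x) + V_{k+1}(s') ], stepping k = K-1, ..., 1."""
    V = {K: dict(terminal_cost)}
    policy = {}
    for k in range(K - 1, 0, -1):
        V[k], policy[k] = {}, {}
        for s in STATES:
            best = min(((stage_cost(k, s, s2) + V[k + 1][s2], s2)
                        for s2 in STATES), key=lambda t: t[0])
            V[k][s], policy[k][s] = best
    return V, policy

# Terminal costs pin the end-of-day SOC to M, as in the Fig. 4 example.
V, pol = backward_induction(4, toy_cost, {"L": 100, "M": 0, "H": 100})
```

The large terminal penalty on L and H is exactly the device used in the paper to steer the end-of-day battery SOC to the desired state.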
There are several reasons to prefer DP over MILP. First, DP
produces close-to-optimal solutions when the value functions
obtained during the offline planning phase are from finely dis-
cretised state, action and outcome spaces. Second, in practical
applications, the SHEMS can make real-time decisions using
the policy implied by (10). This means that at each time-step,
the optimal decision from the current state can be executed.
Note that (10) is a simple linear program at each time-step so
is computationally feasible using existing smart meters. This
stands in contrast to a stochastic MILP formulation, which
would involve solving the entire stochastic MILP program,
which is computationally difficult even for the offline planning.
Third, we can always obtain a solution with DP regardless
of the constraints and the inputs while MILP fails to find
a solution when the constraints are not satisfied. From our
experience, MILP fails to find a solution when the end-of-
day TES SOC is fixed, because the energy out of the TES
unit can not be controlled, which is the thermal demand of
the household. We can overcome this by either removing the
end-of-day TES constraint or by having a range of values.
However, this means we end up with a sub optimal level of
Algorithm 1: ADP using Temporal Difference Learning TD(1)
1: Initialise V̄_k^0, ∀k ∈ K.
2: Set r = 1 and k = 1.
3: Set s_1.
4: while r ≤ R do
5:   Choose a sample path ω^r.
6:   for k = 0, ..., K do
7:     Solve the deterministic problem (17).
8:     for i = 1, ..., I do
9:       Find right and left marginal contributions (18).
10:    end for
11:    if k < K then
12:      Find the post-decision states (11) and the next pre-decision states (12).
13:    end if
14:  end for
15:  for k = K, ..., 0 do
16:    Calculate the marginal values (19).
17:    Update the estimates of the marginal values (20).
18:    Update the VFAs using the CAVE algorithm.
19:    Combine the value functions of each controllable device (16).
20:  end for
21:  r = r + 1.
22: end while
23: Return the value function approximations V̄_k^R, ∀k.
TES SOC at the end of the day or require user interaction
to adjust the TES SOC, which we should avoid in practical
applications.
However, the required computation to generate value func-
tions using DP grows exponentially with the size of the
state, action and outcome spaces. One way of overcoming
this problem is to approximate the value function, while
maintaining the benefits of DP.
III. APP ROXIMATE DYNA MI C PROGRAMMING (ADP)
ADP, also known as forward DP, is an algorithmic strategy
for approximating a value function, which steps forward in
time, compared to backward induction used in value iteration.
Policies in ADP are extracted from these VFAs [29]. Similar to
DP, ADP operates on an MDP formulation of the problem, so all the non-linear constraints and transition functions in Section II-B can be incorporated with the same computational burden as modelling linear transition functions and constraints. ADP is an anytime optimisation solution technique², so we always obtain a solution regardless of the constraints and the inputs. In this instance, the problem is formulated as a maximisation problem for convenience.
A. Policy-based value function approximation
VFAs are obtained iteratively, and here the focus is on approximating the value function around a post-decision state vector, s_k^x, which is the state of the system at discrete time, k, soon after making the decisions but before the realisation of any random variables [29]. This is because approximating the expectation within the max or min operator in (10) is difficult in large practical applications, as transition probabilities from all the possible states are required. Pseudo-code of the method used to approximate the value function is given in Algorithm 1, which is a double-pass algorithm referred to as temporal difference learning with a discount factor λ = 1, or TD(1).

Given this, the original transition function s_{k+1} = s^M(s_k, x_k, ω_k) is divided into the post-decision state:

s_k^x = s^{M,x}(s_k, x_k),  (11)

and the next pre-decision state:

s_{k+1} = s^{M,ω}(s_k^x, ω_k),  (12)

which are used in line 12 of Algorithm 1.

An example of the new modified MDP is illustrated in Fig. 5, which uses the mean and variation of the stochastic variables to obtain the post-decision and next pre-decision states, respectively. In more detail, at s_1, there are three possible decisions that take us to three post-decision states, which correspond to the highest, middle and lowest states. However, the next pre-decision state s_2 depends on the random variables ω_1.

Fig. 5. Illustration of the modified Markov decision process, which separates the state variables into post-decision states and pre-decision states.

²An anytime algorithm is an algorithm that returns a feasible solution even if it is interrupted prematurely. The quality of the solution, however, improves if the algorithm is allowed to run until the desired convergence.
Given this, the new form of the value function is written as:
$$\bar{V}^\pi_k(s_k) = \max_{x_k} \left( C_k(s_k, x_k) + \bar{V}^{\pi,x}_k(s^x_k) \right), \qquad (13)$$
where $\bar{V}^{\pi,x}_k(s^x_k)$ is the VFA around the post-decision state $s^x_k$, given by:
$$\bar{V}^{\pi,x}_k(s^x_k) = \mathbb{E}\left[ V^\pi_{k+1}(s_{k+1}) \mid s^x_k \right]. \qquad (14)$$
This method is computationally feasible because $\mathbb{E}\left[ V^\pi_{k+1}(s_{k+1}) \mid s^x_k \right]$ is a function of the post-decision state $s^x_k$, which is a deterministic function of $x_k$. However, in order to solve (13), we still need to calculate the value functions in (14) for every possible state $s^x_k$ for all $k$. This can be computationally difficult since $s^x_k$ is continuous and multidimensional, so we approximate (14).
Two strategies are employed. First, we construct lookup tables for the VFAs in (14) that are concave and piecewise linear in the resource dimension of each controllable device's state variables [25]. For example, in the VFA for $k = 49$, which
Fig. 6. (a) Expected future reward or VFA (14) for following the optimal policy vs. state of the battery for time-steps $k = 49$ and $k = 60$, and (b) value of the objective function (i.e. reward) vs. iterations for the ADP approach and the expected value from DP.
is depicted in Fig. 6(a), the expected future rewards stay the same beyond approximately 7 kWh, so if we are at 7 kWh in time-step $k = 48$, charging the battery further yields no future reward and only incurs an instantaneous cost if the electricity has to come from the grid. However, if the electricity price or demand is high, then we can discharge the battery, as the expected future rewards will only decrease slightly. Given this, we never charge the storage when there is no marginal value, so the slopes of the VFA are always greater than or equal to zero. Accordingly, the VFA is given by:
$$\bar{V}^i_k(s^{i,x}_k) = \sum_{a=1}^{A_k} \bar{v}_{ka} z_{ka}, \qquad (15)$$
where $\sum_a z_{ka} = s^{i,x}_k$ and $0 \le z_{ka} \le \bar{z}_{ka}$ for all $a$. Here, $z_{ka}$ is the resource coordinate variable for segment $a \in (1, \ldots, A_k)$, $A_k \in \mathcal{A}$, $\bar{z}_{ka}$ is the capacity of the segment and $\bar{v}_{ka}$ is the slope. Other strategies that could be used for this step are parametric and non-parametric approximations of the value functions [29].
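A minimal sketch of evaluating the piecewise-linear VFA in (15) follows; the segment capacities and slopes below are illustrative, not values from the paper. Because the slopes are non-negative and non-increasing (concave), the segments can be filled greedily.

```python
# Evaluate the concave piecewise-linear VFA of (15): the value of a
# post-decision storage level is the sum of slope * filled amount
# over the segments, filled greedily in order.

def pwl_vfa(soc, seg_caps, seg_slopes):
    value, remaining = 0.0, soc
    for cap, slope in zip(seg_caps, seg_slopes):
        z = min(remaining, cap)      # 0 <= z_a <= zbar_a
        value += slope * z
        remaining -= z
        if remaining <= 0.0:
            break
    return value

seg_caps = [2.0, 2.0, 2.0, 2.0, 2.0]        # zbar_a (kWh), illustrative
seg_slopes = [0.30, 0.22, 0.10, 0.02, 0.0]  # vbar_a ($/kWh), non-increasing
```

With the last slope at zero, the value flattens out at high storage levels, matching the behaviour seen in Fig. 6(a).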
Second, we handle the multidimensional state space by generating independent VFAs for each controllable device, which are then combined to obtain the optimal policy. The separable VFA is given by:
$$\bar{V}_k(s^x_k) = \sum_{i=1}^{I} \bar{V}^i_k(s^{i,x}_k). \qquad (16)$$
It is possible to separate the VFAs for the battery and TES because their state transitions are independent, as shown in (4) and (5), respectively. Instead, the inter-device coupling between the battery and the TES arises only through their effect on energy costs, which is captured in the contribution function (7). If the state transition functions of the controllable devices depend on each other, then the VFAs are not separable and we have to use multidimensional value functions. In such situations, the number of iterations needed for VFA convergence increases, and concavity needs to be generalised as well.
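The separable structure of (16) can be sketched as a plain sum over per-device value functions, so each lookup table stays one-dimensional; the device names and toy value functions below are hypothetical.

```python
# Sketch of the separable VFA (16): one independent value function per
# controllable device, combined by summation. The toy value functions
# are hypothetical placeholders for the per-device lookup tables.

def separable_vfa(post_states, device_vfas):
    return sum(vfa(post_states[name]) for name, vfa in device_vfas.items())

device_vfas = {
    "battery": lambda soc: 0.10 * min(soc, 7.0),  # flat beyond 7 kWh
    "tes": lambda soc: 0.05 * soc,
}
```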
In more detail, Algorithm 1 proceeds as follows:
1) Set the initial VFAs to zero (i.e. all the slopes to zero) or to an initial guess to speed up convergence (lines 1-3). Estimates for the initial VFAs can be obtained by solving the deterministic problem using MILP. The value of the initial starting state $s^i_1$ is assumed.
2) For each realisation of the random variables, step forward in time by solving the following deterministic problem (line 7) using the VFA from the previous iteration:
$$x^r_k = \arg\max_{x_k \in X_k} \left( C(s^r_k, x_k) + \bar{V}^{r-1}_k(s^{x,r}_k) \right) = \arg\max_{x_k \in X_k} \left( C(s^r_k, x_k) + \sum_{a=1}^{A^{r-1}_k} \bar{v}^{r-1}_{ka} z_{ka} \right). \qquad (17)$$
3) Determine the positive and negative marginal contributions $\hat{c}^{r,i+}_k(s^{r,i}_k)$ and $\hat{c}^{r,i-}_k(s^{r,i}_k)$, respectively (line 9) for each controllable device, using:
$$\hat{c}^{r,i+}_k(s^{r,i}_k) = \frac{c^{r,i+}_k(s^{r,i+}_k, x^{r,i+}_k) - c^{r,i}_k(s^{r,i}_k, x^{r,i}_k)}{\delta s},$$
$$\hat{c}^{r,i-}_k(s^{r,i}_k) = \frac{c^{r,i}_k(s^{r,i}_k, x^{r,i}_k) - c^{r,i-}_k(s^{r,i-}_k, x^{r,i-}_k)}{\delta s}, \qquad (18)$$
where $s^{r,i+}_k = s^{r,i}_k + \delta s$, $x^{r,i+}_k = X^\pi_k(s^{r,i+}_k)$, and $\delta s$ is the mesh size of the state space. We do this similarly for $s^{r,i-}_k$ and $x^{r,i-}_k$.
4) Find the post-decision and the next pre-decision states
using (11) and (12), respectively. Transition functions of
the controllable devices can be non-linear (line 12).
5) Starting from $K$, step backward in time to compute the slopes, $\hat{v}^{r,i+}_k$, which are then used to update the VFA (line 16). Compute $\hat{v}^{r,i+}_k$ as:
$$\hat{v}^{r,i+}_k(s^{r,i}_k) = \begin{cases} \hat{c}^{r,i+}_K(s^{r,i}_K), & \text{if } k = K \\ \hat{c}^{r,i+}_k(s^{r,i}_k) + \Delta^{r,i+}_k \hat{v}^{r,i+}_{k+1}(s^{r,i}_{k+1}), & \text{otherwise} \end{cases} \qquad (19)$$
where $\Delta^{r,i+}_k = \frac{1}{\delta s} S^M(x^{r,i}_k - x^{r,i+}_k)$ is the marginal flow (i.e. whether or not there is a change in energy in the storage as a result of the perturbation). We do this similarly for $\hat{v}^{r,i-}_k(s^{r,i}_k)$. Note that we take the power coming out of the storage as negative.
6) Update the estimates of the marginal values [27]:
$$\bar{v}^{r,i+}_{k-1}(s^{x,r,i}_{k-1}) = (1 - \alpha_{r-1})\, \bar{v}^{r-1,i+}_{k-1}(s^{x,r,i}_{k-1}) + \alpha_{r-1}\, \hat{v}^{r,i+}_k, \qquad (20)$$
where $\alpha$ is a "stepsize", $\alpha \in (0, 1]$, and similarly for $\bar{v}^{r,i-}_{k-1}(s^{x,r,i}_{k-1})$ (line 17). In this research, a harmonic step-size formula is used, $\alpha = b/(b + r)$, where $b$ is a constant. This step-size formula satisfies conditions ensuring that the values converge as $r \to \infty$ [36].
7) Use the concave adaptive value estimation (CAVE) algorithm to update the VFAs [37] (line 18).
8) Combine the value functions of each device using (16).
9) Repeat this procedure over $R$ iterations, which are generated randomly according to the probability distributions of the random variables. We find that $R = 1000$ realisations is enough for the objective function to come
Fig. 7. Probability density functions of the (a) PV output over different times of a sunny day (January 1st, 2013), and (b) electrical demand over different times of a high demand day (July 2nd, 2012).
within an acceptable accuracy even for the worst possible scenario. We investigated a range of scenarios; an example is given in Fig. 6(b).
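Steps 5 and 6 of the procedure above can be compressed into the following sketch for one device and one sample path; all names and the stepsize constant are illustrative.

```python
# Backward pass of TD(1) as in (19), followed by the harmonic-stepsize
# smoothing of (20). Inputs are per-time-step marginal contributions
# chat_k and marginal flows Delta_k along one sampled trajectory.

def backward_slopes(marginal_contribs, marginal_flows):
    """vhat_K = chat_K; vhat_k = chat_k + Delta_k * vhat_{k+1}."""
    K = len(marginal_contribs)
    vhat = [0.0] * K
    vhat[K - 1] = marginal_contribs[K - 1]
    for k in range(K - 2, -1, -1):
        vhat[k] = marginal_contribs[k] + marginal_flows[k] * vhat[k + 1]
    return vhat

def harmonic_smooth(vbar_prev, vhat, r, b=25.0):
    """Smoothing (20) with the harmonic stepsize alpha = b / (b + r)."""
    alpha = b / (b + r)
    return (1.0 - alpha) * vbar_prev + alpha * vhat
```

When the marginal flow is one everywhere (the perturbation propagates through the whole trajectory), the backward pass simply accumulates the remaining marginal contributions.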
Note that when solving a deterministic SHEMS problem using ADP, the post-decision and next pre-decision states are the same because the random variables are zero. However, the remaining steps in Algorithm 1 stay the same. This means that, with ADP, there is no noticeable computational burden in considering variation in the stochastic variables compared to solving the deterministic problem. In contrast, with DP we have to loop over all possible combinations of realisations of the random variables, which significantly increases the computational burden.
Now we present the probabilistic models of the stochastic
variables.
IV. ESTIMATING STOCHASTIC VARIABLE MODELS
In order to optimise performance, it is important for a
SHEMS to incorporate variations in the PV output and elec-
trical and thermal demand, and to do so over a horizon of
several days. The benefits of using a stochastic optimisation over a deterministic optimisation are discussed in Section V-D and in [13], [19], [20], [22], [23]. Given this, SHEMSs require the mean PV output and demand, with their appropriate probability distributions, before the start of the decision horizon. The effects of these random variations on the SHEMS problem are discussed below:
1) PV output depends on solar insolation, a forecast of which can be obtained with reasonable accuracy from weather forecasting services before the horizon starts. PV output is important to the SHEMS problem as it is a key source of energy and is expected to be closely coupled with the battery storage profile. Failing to accommodate variation in PV generation would be expected to increase costs to the household, as more power is imported from the grid.
2) Electrical demand of the household depends on the number of occupants and their behavioural patterns, which are difficult to predict in the real world. In the context of a SHEMS, electrical demand should be supplied from the DG units, storage units and the electrical grid. Failure to accommodate variations in electrical demand may result in additional costs to the household.
3) Thermal demand is also difficult to predict in the real
world so failure to accommodate variations in thermal
demand may result in user discomfort.
In this paper, the mean PV output and electrical demand are from a data set collected during the Smart Grid Smart City project. The data set [34] consists of PV output and electrical demand measurements at 30-minute intervals over 3 years for 50 households. In real-world applications, SHEMSs estimate the mean PV output using the weather forecast [38] and the mean electrical demand using a suitable demand prediction algorithm [39], [40].
Commonly, probability distributions associated with these stochastic variables are modelled as Gaussian or skew-Laplace distributions. However, in this paper, we kernel estimate the probability distributions of PV output and electrical demand using a hierarchical approach: we first cluster total daily empirical data, and then kernel estimate probability distributions within each cluster. In more detail, we obtain the probability distributions of the PV output, which depend on the time and the type of day (sunny, normal or cloudy), in two steps.
1) First, we cluster the daily empirical data using a k-means algorithm to obtain 3 clusters with different total daily PV generation, corresponding to sunny, normal and cloudy days.
2) Second, for each time-step in the corresponding cluster, we estimate a probability distribution of the PV output using an Epanechnikov kernel estimator.
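The two steps above can be sketched as follows; the synthetic daily totals, the quantile-based k-means initialisation and the bandwidth are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the hierarchical estimation: 1-D k-means on total daily PV
# energy to find day types, then an Epanechnikov kernel density per
# time-step within a cluster.

import numpy as np

def kmeans_1d(totals, k=3, iters=50):
    """Cluster total daily generation into k day types (quantile
    initialisation keeps the sketch deterministic)."""
    centers = np.quantile(totals, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(totals[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = totals[labels == j].mean()
    return labels, centers

def epanechnikov_pdf(x, samples, h):
    """KDE with the kernel K(u) = 0.75 (1 - u^2) on |u| <= 1."""
    u = (x - samples[:, None]) / h
    kern = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return kern.mean(axis=0) / h
```

The bounded support of the Epanechnikov kernel keeps the estimated densities from leaking far outside the observed range of normalised PV output.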
Note that the draws from the kernel estimates within a day-type are independent. We obtain the probability distributions of the electrical demand in a similar way, except the clustering is done according to days with high, normal or low demand levels. The probability distributions of the PV output and electrical demand follow skewed unimodal distributions, as depicted in Fig. 7. It is worthwhile to note that, before the start of the decision horizon, the SHEMS uses the predicted mean PV output and electrical demand to determine the type of day, and hence the corresponding probability distribution. Note that the prediction is accurate enough to decide the type of day, but not accurate enough to use a deterministic optimisation.
Given this, we investigate the effects on the total electricity
costs from deterministic and stochastic SHEMSs, using a range
of possible PV output and demand profiles, in Section V-D.
Finally, we construct the magnitudes of the thermal demand and the times they occur using Australian Standard AS4552 [41] and the hot water plug readings in [35]. We assume a Gaussian distribution for the thermal demand because there is not enough empirical data to obtain a reasonable distribution.
TABLE I
DAILY OPTIMISATION RESULTS FOR THE THREE SCENARIOS.

Total daily:                        Scenario 1   Scenario 2   Scenario 3
Electrical demand (kWh)             24.72        64.75        10.48
Thermal demand (kWh)                10.42        19.1         8.66
PV generation (kWh)                 9.27         6.06         12.31
Benchmark cost ($)                  6.13         10.5         1.77
With PV ($)                         5.75         9.86         1.37
DP ($s^b_1 = s^b_K = 6$) ($)        3.16         8.04         0.72
DP ($)                              3.05         7.96         0.59
ADP ($)                             3.14         7.79         0.6
Stochastic MILP ($)                 3.15         8.11         0.63
Dummy TES control ($)               2.13         2.56         0.83
DP TES control ($)                  0.91         1.68         0.58
Marginal value of TES ($)           1.22         0.88         0.25
Now we show the performance of the presented SHEMSs using real data.
V. SIMULATION RESULTS AND DISCUSSION
There are three sets of simulations. The first set consists of discussions about: the challenges of estimating the end-of-day SOC; the benefits of PV-storage systems; the quality of the solutions from ADP, DP and stochastic MILP; and the benefits of a stochastic optimisation over a deterministic one (Sections V-A, -B, -C and -D). The second set discusses computational aspects and the effects of extending the decision horizon (Sections V-E and -F). The third set is about the year-long optimisation (Section V-G). The time-of-use electricity tariff consists of off-peak, shoulder and peak periods at $0.11, $0.20 and $0.47 per kWh, respectively. On weekdays, the off-peak period is between 12 am - 7 am and 10 pm - 12 am, and the peak is between 2 pm - 8 pm. On weekends, the peak period is replaced by the shoulder. Stochastic MILP uses 6000 scenarios. MATLAB is used to implement all the SHEMSs.
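The tariff just described can be written down directly; this sketch reads the period boundaries off the text above (hours are 0-23, and the weekend peak is replaced by the shoulder).

```python
# Time-of-use tariff from the text: off-peak 12 am - 7 am and
# 10 pm - 12 am at $0.11/kWh, weekday peak 2 pm - 8 pm at $0.47/kWh,
# shoulder otherwise at $0.20/kWh.

def tou_price(hour, weekday):
    if hour < 7 or hour >= 22:
        return 0.11                  # off-peak, every day
    if weekday and 14 <= hour < 20:
        return 0.47                  # weekday peak
    return 0.20                      # shoulder
```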
A. Challenges of estimating the end-of-day SOC
In the first set of simulations, we discuss the challenges of estimating the end-of-day battery SOC (Section V-A), the benefits of residential PV-storage systems (Section V-B), the performance of the three SHEMSs over a day (Section V-C) and the benefits of using a stochastic optimisation over a deterministic one (Section V-D), using three scenarios for a Central Coast, NSW, Australia, based residential building shown in Fig. 8: Scenarios 1, 2 and 3 are on August 20th, 2012, July 2nd, 2012, and January 1st, 2013, respectively. The PV system size is 2.2 kWp.
From our preliminary investigations, and as depicted in Fig. 6(a), the expected future reward at the start of day two (time-step $k = 49$) only increases slightly after 6 kWh, which suggests that using half of the available battery capacity as the start-of-day and end-of-day battery SOC ($s^b_1 = s^b_K = 6$ kWh) in daily optimisations with DP is a valid assumption. However, we observe that on days with a low morning demand, high
Fig. 8. PV output and electrical and thermal demand for (a) Scenario 1, (b) Scenario 2 and (c) Scenario 3.
PV output and medium-high evening demand (see Fig. 8(c)), $s^b_1 = s^b_K = 2$ kWh gives the best results, because the battery can be used to supply the evening demand and there is no need to charge it back. However, the next day's electricity cost can increase significantly if we are anticipating a high morning demand and low-to-high PV output (see Fig. 8(a-b)). Because of such situations, it is beneficial to control the end-of-day battery SOC by considering uncertainties over several days, which is our special focus in Sections V-E and V-F.
B. Benefits of PV-storage systems
Residential PV-storage systems using a DP-based SHEMS result in significant financial benefits, which is evident from the electricity costs of the three scenarios in three instances (i.e. the benchmark cost with neither PV nor storage, the cost with PV but no storage, and the PV-storage system with SHEMSs) in Table I. The daily electricity cost can be reduced by 6.2%, 6.1% and 22.6% for Scenarios 1, 2 and 3, respectively, if there is only a PV system. We can further improve this by adding a battery and effectively controlling its SOC using a SHEMS, in which the battery is charged to a certain level from solar power and the electrical grid before peak periods. A DP-based SHEMS constrained to a 60% start-of-day and end-of-day battery SOC reduces the total electricity cost by a further 42.25%, 17.33% and 36.72% for Scenarios 1, 2 and 3, respectively. That is a total cost reduction of 48.45%, 23.43% and 59.32% for Scenarios 1, 2 and 3, respectively, by controlling both the battery and the TES. As shown in Fig. 8, the inhabitants are away during the day in Scenarios 1 and 3, so the extra PV generation is stored in the battery, which shows the benefits of having storage. In contrast to Scenarios 1 and 3, the Scenario 2 electrical demand
Fig. 9. The total electricity cost of the PV-battery systems of Scenarios 1-3 from deterministic and stochastic ADP for a range of possible PV output and demand profiles, which are generated by adding Gaussian noise with varying standard deviation and zero mean to the actual PV output and electrical demand.
exceeds PV generation, so the benefit of the battery is minimal. The electricity costs of controlling the TES in Scenarios 1 and 3 are the lowest because the surplus solar and battery power is used to charge the TES instead of being sent back to the grid, as FiTs are negligible in Australia. Note that we obtain the benchmark cost of the TES by assuming a dummy control system, which operates regardless of the electricity price. The dummy TES control electricity cost varies depending on the time the hot water is used.
C. Quality of the solutions from ADP, DP and stochastic MILP
ADP and DP result in better quality solutions than stochastic MILP, as they both incorporate the stochastic input variables using appropriate probabilistic models and non-linear constraints [24]. However, ADP results in slightly lower quality schedules compared to the optimal DP solutions because the value functions used are approximations. This is evident in Table I. Note that the DP-based SHEMS with two controllable devices (battery and thermal) is computationally intractable for use in an existing smart meter, and we only use it to compare solutions with ADP.
D. Benefits of a stochastic optimisation over a deterministic optimisation
A stochastic optimisation always performs at least as well as a deterministic one, and we investigate this using a range of possible PV output and demand profiles, as shown in Fig. 9. Fig. 9 is obtained as follows: first, we obtain VFAs from both the stochastic and deterministic optimisations using ADP. The stochastic optimisation uses the kernel-estimated probability distributions of the PV output and electrical demand, while the deterministic ADP only uses the predicted mean PV output and electrical demand (i.e. all the random variables are zero). Second, we obtain the total electricity cost for different possible PV output and demand profiles, which are generated by adding Gaussian noise with varying standard deviation and zero mean to the actual PV output and electrical demand. The mean absolute errors of the actual (i.e. zero Gaussian noise) and the predicted mean electrical demand are 0.238%, 0.696%, and 0.117% for Scenarios 1, 2 and 3, respectively, which are the initial points. Note that the forecast errors associated with residential electrical demand predictions are typically very high, and our aim here is not to minimise forecast errors but to find a suitable stochastic optimisation technique that performs well under uncertainty.
ADP enables us to incorporate the stochastic input variables without a noticeable increase in the computational effort over a deterministic ADP. Moreover, stochastic ADP requires less computational effort than deterministic DP. An ADP-based stochastic optimisation for a PV-battery system can reduce the total electricity cost by 13.62%, 0.16%, and 94.67% for Scenarios 1, 2, and 3, respectively, in the instances without Gaussian noise. The benefits in Scenarios 1 and 3 are noticeable, and their forecast errors are what we can expect from electrical demand prediction algorithms [39], [40]. The benefit of the stochastic optimisation is minimal when the forecast errors are very high (Scenario 2) or very low, both of which are unlikely. In Scenario 2, the stochastic optimisation gives slightly lower quality results beyond a 0.2 kW standard deviation of Gaussian noise because both the forecast and kernel estimation errors are high. However, these situations are highly unlikely, and the resulting cost is negligible compared to the benefits of a stochastic optimisation. Moreover, the initial point of Scenario 2 already has a mean absolute error of 0.696%, which is highly unlikely to increase any further in a practical scenario. In summary, even though the benefits vary with the scenario and the forecast error, a stochastic optimisation performs at least as well as a deterministic one, and ADP provides these benefits without a noticeable increase in the computational effort.
E. Computational aspects
In our second set of simulation results, we discuss the computational performance of the three solution techniques (Section V-E) and the benefits of extending the decision horizon (Section V-F) for households with PV-battery systems.
ADP computes a solution much faster than both DP and stochastic MILP and, most importantly, ADP with a two-day decision horizon computes a solution in less than half the computational time of a daily DP, as depicted in Fig. 10(a). The computational time of the SHEMSs using both ADP and DP increases linearly as we extend the decision horizon; however, ADP has a smaller slope. This linear increase with DP occurs because the state transitions in this problem are only between two adjacent time-steps, so the computation time does not increase exponentially. The computational time of ADP with a two-day decision horizon increases by only 4 minutes when
Fig. 10. Effects of extending the decision horizon for PV-battery systems, (a) computational time of ADP, DP and stochastic MILP against the length of the decision horizon, and (b) normalised electricity cost with error bars against the length of the decision horizon.
Fig. 11. Electricity cost savings over 3 years for 10 households, where blue lines indicate the 25th and 75th percentiles and red lines indicate the median.
the TES is added, while a finely discretised DP-based SHEMS takes approximately 2.5 hours.
F. Effects of extending the decision horizon
Extending the decision horizon beyond one day to consider inter-daily variations in PV output and electrical demand results in significant benefits, as depicted in Fig. 10(b), which shows the normalised electricity cost vs. the length of the decision horizon. We obtain our results using a DP-based SHEMS for 10 households over 2 months.³ Our results show that increasing the decision horizon beyond two days has no significant benefits. However, increasing the decision horizon to up to one week is beneficial in some situations, e.g. in off-grid systems and when there are high variations in PV output and demand. The benefit of the two-day decision horizon varies depending on the household, which we discuss in the next section.
3Here we use DP as we want to obtain the exact solution. In a practical
implementation, extending the decision horizon with DP is difficult as the
computational power is limited in existing smart meters.
TABLE II
YEARLY OPTIMISATION RESULTS FOR TWO HOUSEHOLDS OVER 3 YEARS.

                            Household 1                Household 2
Total:                 Year 1  Year 2  Year 3    Year 1  Year 2  Year 3
PV output (MWh)        2.91    2.82    2.89      5.99    5.56    5.35
Demand (MWh)           4.29    4.82    4.38      9.82    10.24   12.85
Benchmark cost ($)     568.09  610.6   558       1208.3  1276    1588.2
With PV ($)            440.52  482.5   417.1     821.7   891.4   1194.8
PV-battery DP ($)      248.37  297.55  238.15    534.8   596.25  890.25
PV-battery ADP ($)     232.25  285.63  224.18    526.2   589.23  882.29
PV-battery
stochastic MILP ($)    281.59  333.60  276.23    554.83  599.07  906.75
G. Year-long optimisation
In our third set of simulation results, we compare the ADP-based SHEMS with a two-day decision horizon to a DP approach with a daily decision horizon over three years for 10 households with PV-battery systems. We omit the TES as we have already identified its benefits in Section V-B, and, moreover, a yearly optimisation using DP with two controllable devices is computationally difficult. The time periods of the three years are: year 1 from July 1st, 2012 to June 30th, 2013; year 2 from July 1st, 2011 to June 30th, 2012; and year 3 from July 1st, 2010 to June 30th, 2011. Electricity cost savings for all the households over the three years are given in Fig. 11, and in Table II we present detailed results for two households, in Central Coast (Household 1) and Sydney (Household 2), NSW, Australia. The PV sizes of Households 1 and 2 are 2.2 kWp and 3.78 kWp, respectively.
The proposed ADP-based SHEMS, implemented with a two-day decision horizon that considers variations in PV output and electrical demand, reduces the average yearly electricity cost by 4.63% compared to a daily DP-based SHEMS, as depicted in Fig. 11. We also find that the average yearly savings are 5.12%, 3.89% and 4.95% for years 1-3, respectively. This is because 2013 was a sunny year compared to 2012 and 2011, so the two-day optimisation has greater benefits. For example, if we are anticipating a sunny weekend with low demand, then we can completely discharge the battery on Friday night. However, if we have a half-charged battery on Friday night, then we will waste most of the free energy, as the storage capacity is limited. Our results also show that a daily DP results in significant cost savings of 12.04% ($107.35) and 1.91% ($39.35) for Households 1 and 2, respectively, compared to a daily stochastic MILP approach. The difference in the savings arises for the following reasons. In scenarios with high demand (i.e. Household 2), most of the time the battery discharges its maximum possible power to the household during peak periods, so the battery and the inverter operate at their maximum efficiencies even though the MILP solver does not consider the non-linear constraints. The converse happens in scenarios with low demand (i.e. Household 1).
For demonstration purposes, we show the optimisation results for two households over three years in Table II. The proposed ADP-based SHEMS for a residential PV-battery system reduces the total electricity cost over 3 years by 57.27% and 50.95% for Households 1 and 2, respectively, compared to the 22.80% and 28.60% improvements from having only a PV system. It is important to note that a DP-based SHEMS over a two-day decision horizon may result in a slightly better solution. However, it is computationally difficult, and the available computational power will be limited, as it is not worthwhile investing in specialised equipment to solve this problem given the savings on offer.
VI. CONCLUSION
This paper has shown the benefits of a smart home energy management system and presented an approximate dynamic programming approach for implementing a computationally efficient smart home energy management system with similar quality schedules to those of dynamic programming. This approach enables us to extend the decision horizon to up to a week at high resolution while considering multiple devices. Our results indicate that these improvements provide financial benefits to households employing them in a smart home energy management system. Moreover, stochastic approximate dynamic programming always performs at least as well as deterministic approximate dynamic programming under uncertainty, without a noticeable increase in the computational effort.
In practical applications, we can use the value function approximations generated by approximate dynamic programming during an offline planning phase to make faster online decisions. This is not possible with stochastic mixed-integer linear programming, and generating value functions using dynamic programming is computationally difficult. In the future, we will learn the initial value function approximations of approximate dynamic programming from historical results, which will further speed up the value function approximation convergence. Given the benefits outlined in this paper and the possible future work, we recommend the use of approximate dynamic programming in smart home energy management systems.
REFERENCES
[1] Australian Government Bureau of Resources and Energy Economics.
Energy in Australia 2014.
[2] A. C. Chapman, G. Verbic, and D. J. Hill, “Algorithmic and strategic
aspects to integrating demand-side aggregation and energy management
methods,” IEEE Transactions on Smart Grid, vol. PP, no. 99, pp. 1–13,
2016.
[3] Ausgrid, NSW, Australia. Time-of-use pricing.
[4] Energy Australia. Flexible pricing FAQs.
[5] S. Lu, N. Samaan, R. Diao, M. Elizondo, C. Jin, E. Mayhorn, Y. Zhang,
and H. Kirkham, “Centralized and decentralized control for demand
response,” in Innovative Smart Grid Technologies (ISGT), 2011 IEEE
PES, 2011, pp. 1–8.
[6] M. Pipattanasomporn, M. Kuzlu, and S. Rahman, “An algorithm for
intelligent home energy management and demand response analysis,”
IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 2166–2173, 2012.
[7] S. Li, D. Zhang, A. B. Roget, and Z. O’Neill, “Integrating home energy simulation and dynamic electricity price for demand response study,” IEEE Transactions on Smart Grid, vol. 5, no. 2, pp. 779–788, March 2014.
[8] M. Muratori and G. Rizzoni, “Residential demand response: Dynamic energy management and time-varying electricity pricing,” IEEE Transactions on Power Systems, vol. 31, no. 2, pp. 1108–1117, March 2016.
[9] Australian PV Institute. Australian PV market since April 2001.
[Online]. Available: http://pv-map.apvi.org.au/analyses
[10] Electric Power Research Institute (EPRI), “The integrated grid realizing
the full value of central and distributed energy resources,” Tech. Rep.,
2014.
[11] Australian Energy Market Operator (AEMO), “Emerging technologies
information,” Tech. Rep., 2015.
[12] Rocky Mountain Institute, Homer Energy, Cohnreznick Think Energy,
“The economics of grid defection when and where distributed solar
generation plus storage competes with traditional utility service,” Tech.
Rep., 2014.
[13] Z. Chen, L. Wu, and Y. Fu, “Real-time price-based demand response
management for residential appliances via stochastic optimization and
robust optimization,” IEEE Transactions on Smart Grid, vol. 3, no. 4,
pp. 1822–1831, 2012.
[14] C. Keerthisinghe, G. Verbič, and A. Chapman, “Evaluation of a multi-stage stochastic optimisation framework for energy management of residential PV-storage systems,” in Power Engineering Conference (AUPEC), 2014 Australasian Universities, Sept 2014, pp. 1–6.
[15] M. Bozchalui, S. Hashmi, H. Hassen, C. Canizares, and K. Bhattacharya,
“Optimal operation of residential energy hubs in smart grids,” IEEE
Transactions on Smart Grid, vol. 3, no. 4, pp. 1755–1766, 2012.
[16] F. De Angelis, M. Boaro, D. Fuselli, S. Squartini, F. Piazza, and
Q. Wei, “Optimal home energy management under dynamic electrical
and thermal constraints,” IEEE Transactions on Industrial Informatics,
vol. 9, no. 3, pp. 1518–1527, 2013.
[17] J. Wang, Z. Sun, Y. Zhou, and J. Dai, “Optimal dispatching model
of smart home energy management system,” in Innovative Smart Grid
Technologies - Asia (ISGT Asia), 2012 IEEE, 2012, pp. 1–5.
[18] K. C. Sou, J. Weimer, H. Sandberg, and K. Johansson, “Scheduling smart
home appliances using mixed integer linear programming,” in Decision
and Control and European Control Conference (CDC-ECC), 2011 50th
IEEE Conference on, 2011, pp. 5144–5149.
[19] M. Pedrasa, E. Spooner, and I. MacGill, “Robust scheduling of residential distributed energy resources using a novel energy service decision-support tool,” in Innovative Smart Grid Technologies (ISGT), 2011 IEEE PES, 2011, pp. 1–8.
[20] M. Pedrasa, T. Spooner, and I. MacGill, “Coordinated scheduling of
residential distributed energy resources to optimize smart home energy
services,” IEEE Transactions on Smart Grid, vol. 1, no. 2, pp. 134–143,
2010.
[21] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Athena Scientific, Belmont, Massachusetts, 2005, vol. 1.
[22] H. Tischer and G. Verbič, “Towards a smart home energy management system - a dynamic programming approach,” in Innovative Smart Grid Technologies Asia (ISGT), 2011 IEEE PES, 2011, pp. 1–7.
[23] C. Keerthisinghe, G. Verbič, and A. Chapman, “Addressing the stochastic nature of energy management in smart homes,” in Power Systems Computation Conference (PSCC), 2014, Aug 2014, pp. 1–7.
[24] ——, “Energy management of PV-Storage systems: ADP approach
with temporal difference learning,” in Power Systems Computation
Conference (PSCC), 2016, Jun 2016, pp. 1–7.
[25] D. F. Salas and W. B. Powell, “Benchmarking a scalable approximate dynamic programming algorithm for stochastic control of multidimensional energy storage problems,” Department of Operations Research and Financial Engineering, Princeton, NJ, Tech. Rep., 2013.
[26] D. Jiang, T. Pham, W. Powell, D. Salas, and W. Scott, “A comparison
of approximate dynamic programming techniques on benchmark energy
storage problems: Does anything work?” in Adaptive Dynamic Program-
ming and Reinforcement Learning (ADPRL), 2014 IEEE Symposium on,
Dec 2014, pp. 1–8.
[27] R. Anderson, A. Boulanger, W. Powell, and W. Scott, “Adaptive stochas-
tic control for the smart grid,” Proceedings of the IEEE, vol. 99, no. 6,
pp. 1098–1115, 2011.
[28] W. B. Powell, A. George, H. Simao, W. Scott, A. Lamont, and J. Stewart, “SMART: A Stochastic Multiscale Model for the Analysis of Energy Resources, Technology, and Policy,” INFORMS Journal on Computing, vol. 24, no. 4, pp. 665–682, 2012.
[29] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley and Sons, Inc., 2007.
[30] ——, “Approximate dynamic programming for large-scale resource allocation problems,” Princeton University, Princeton, New Jersey 08544, USA, 2005.
[31] F. Borghesan, R. Vignali, L. Piroddi, M. Prandini, and M. Strelec, “Approximate dynamic programming-based control of a building cooling system with thermal storage,” in Innovative Smart Grid Technologies Europe (ISGT EUROPE), 2013 4th IEEE/PES, Oct 2013, pp. 1–5.
[32] M. Strelec and J. Berka, “Microgrid energy management based on
approximate dynamic programming,” in Innovative Smart Grid Tech-
nologies Europe (ISGT EUROPE), 2013 4th IEEE/PES, Oct 2013, pp.
1–5.
[33] P. Samadi, H. Mohsenian-Rad, V. Wong, and R. Schober, “Real-time
pricing for demand response based on stochastic approximation,” IEEE
Transactions on Smart Grid, vol. 5, no. 2, pp. 789–798, March 2014.
[34] E. L. Ratnam, S. R. Weller, C. M. Kellett, and A. T. Murray, “Residential
load and rooftop PV generation: an Australian distribution network
dataset,” International Journal of Sustainable Energy, vol. 0, no. 0, pp.
1–20, 0.
[35] Smart-Grid Smart-City Customer Trial Data. [Online]. Available:
https://data.gov.au/dataset/smart-grid-smart-city-customer-trial-data
[36] T. Jaakkola, M. I. Jordan, and S. P. Singh, “Convergence of stochastic iterative dynamic programming algorithms,” Neural Computation, vol. 6, pp. 1185–1201, 1994.
[37] G. A. Godfrey and W. B. Powell, “An adaptive, distribution-free algorithm for the newsvendor problem with censored demands, with applications to inventory and distribution,” Management Science, vol. 47, no. 8, pp. 1101–1112, 2001.
[38] N. Sharma, P. Sharma, D. Irwin, and P. Shenoy, “Predicting solar
generation from weather forecasts using machine learning,” in Smart
Grid Communications (SmartGridComm), 2011 IEEE International
Conference on, Oct 2011, pp. 528–533.
[39] K. Bao, F. Allerding, and H. Schmeck, “User behavior prediction for
energy management in smart homes,” in Fuzzy Systems and Knowledge
Discovery (FSKD), 2011 Eighth International Conference on, vol. 2,
July 2011, pp. 1335–1339.
[40] D. Lachut, N. Banerjee, and S. Rollins, “Predictability of energy use in
homes,” in Green Computing Conference (IGCC), 2014 International,
Nov 2014, pp. 1–10.
[41] E3 - Equipment Energy Efficiency, “Water heating data collection and analysis,” Tech. Rep., 2012.
Chanaka Keerthisinghe (S'10) received the B.E. (Hons)
and M.E. (Hons) in Electrical and Electronic En-
gineering from the University of Auckland, New
Zealand, in 2011 and 2012, respectively. He is
currently a PhD candidate at the Centre for Future
Energy Networks, University of Sydney, Australia.
Chanaka’s research interests include demand re-
sponse, energy management in residential buildings,
wireless charging of electric vehicles and applying
approximate dynamic programming and optimisa-
tion techniques in power systems.
Gregor Verbič received the B.Sc., M.Sc., and Ph.D.
degrees in electrical engineering from the University
of Ljubljana, Ljubljana, Slovenia, in 1995, 2000, and
2003, respectively. In 2005, he was a North Atlantic
Treaty Organization Natural Sciences and Engineer-
ing Research Council of Canada Postdoctoral Fellow
with the University of Waterloo, Waterloo, ON,
Canada. Since 2010, he has been with the School of
Electrical and Information Engineering, University
of Sydney, Sydney, NSW, Australia. His expertise
is in power system operation, stability and control,
and electricity markets. His current research interests include integration of
renewable energies into power systems and markets, optimization and control
of distributed energy resources, demand response, and energy management
in residential buildings. He was a recipient of the IEEE Power and Energy
Society Prize Paper Award in 2006. He is an Associate Editor of the IEEE
TRANSACTIONS ON SMART GRID.
Archie C. Chapman (M'14) received the B.A. degree in math and political science, and the B.Econ.
(Hons.) degree from the University of Queensland,
Brisbane, QLD, Australia, in 2003 and 2004, re-
spectively, and the Ph.D. degree in computer science
from the University of Southampton, Southampton,
U.K., in 2009. He is currently a Research Fellow
in Smart Grids with the School of Electrical and
Information Engineering, Centre for Future Energy
Networks, University of Sydney, Sydney, NSW, Australia. His expertise is in game-theoretic and
reinforcement learning techniques for optimization and control in large
distributed systems. His research focuses on integrating renewables into legacy
power networks, using distributed energy and load scheduling methods, and on
designing tariffs and market mechanisms that support efficient use of existing
infrastructure and new controllable devices.