
Non-intrusive energy disaggregation by detecting similarities in consumption patterns


Revista Facultad de Ingeniería, Universidad de Antioquia, No.98, pp. 27-46, Jan-Mar 2021
Nonintrusive energy disaggregation by
detecting similarities in consumption patterns
Desagregación de energía no intrusiva a través de la detección de similitudes en los patrones
de consumo eléctrico
Juan P. Chavat1, Jorge Graneri1, Sergio Nesmachnow1
1Facultad de Ingeniería, Universidad de la República. Herrera y Reissig 565, C. P. 11300. Montevideo, Uruguay.
CITE THIS ARTICLE AS:
J. P. Chavat, J. Graneri, and S. Nesmachnow. "Nonintrusive energy disaggregation by
detecting similarities in consumption patterns", Revista Facultad de Ingeniería
Universidad de Antioquia, no. 98, pp. 27-46, Jan-Mar 2021.
[Online]. Available: https://www.doi.org/10.17533/udea.redin.20200370
ARTICLE INFO:
Received: December 09, 2019
Accepted: March 13, 2020
Available online: March 30, 2020
KEYWORDS:
Non-intrusive load monitoring; pattern similarities; energy efficiency
Monitoreo no intrusivo de energía; similitud de patrones; eficiencia energética
ABSTRACT: Breaking down the aggregated energy consumption into a detailed
consumption per appliance is a crucial tool for energy efficiency in residential buildings.
Non-intrusive load monitoring allows implementing this strategy using just a smart
energy meter without installing extra hardware. The obtained information is critical to
provide an accurate characterization of energy consumption in order to avoid an overload
of the electric system, and also to elaborate special tariffs to reduce the electricity cost
for users. This article presents an approach for energy consumption disaggregation in
households, based on detecting similar consumption patterns from previously recorded
labelled datasets. The experimental evaluation of the proposed method is performed
over four different problem instances that model real household scenarios using data
from an energy consumption repository. Experimental results are compared with two
built-in algorithms provided by the nilmtk framework (combinatorial optimization and
factorial hidden Markov model). The proposed algorithm was able to achieve accurate
results regarding standard prediction metrics. The accuracy was not affected in a
signicant manner by the presence of ambiguity between the energy consumption of
different appliances or by the difference of consumption between training and test
appliances.
RESUMEN: Desglosar el consumo energético agregado en un consumo detallado por
electrodoméstico es una herramienta crucial para la eficiencia energética en edificios
residenciales. El monitoreo no intrusivo de consumo energético permite implementar
esta estrategia usando solo un medidor de energía inteligente, sin instalar hardware
adicional. La información obtenida es crítica para caracterizar el consumo de energía
con el n de evitar sobrecargas del sistema eléctrico y para elaborar tarifas que
reduzcan los costos de electricidad de los usuarios. Este artículo presenta un enfoque
para la desagregación del consumo de energía en hogares, basado en la detección
de patrones similares de consumo en conjuntos de datos registrados previamente.
La evaluación experimental se realiza en cuatro instancias que modelan escenarios
de hogares reales utilizando datos de un repositorio de consumo de energía. Los
resultados experimentales se comparan con algoritmos del entorno de trabajo nilmtk
(optimización combinatoria y modelo oculto de Markov factorial). El algoritmo propuesto
alcanzó resultados precisos, de acuerdo con métricas estándar de predicción. La
precisión no fue afectada significativamente por la presencia de ambigüedad entre el
consumo de energía de diferentes dispositivos o por la diferencia de consumo entre los
dispositivos de entrenamiento y de validación.
1. Introduction
In the last fty years, residential buildings have
uninterruptedly increased their electricity utilization.
This phenomenon occurred worldwide, as described
by the World Energy Outlook report, elaborated by the
International Energy Agency [1]. The increment is also
27
* Corresponding author: Juan P. Chavat
E-mail: juan.pablo.chavat@ng.edu.uy
ISSN 0120-6230
e-ISSN 2422-2844
DOI: 10.17533/udea.redin.20200370 27
Juan P. Chavat et al., Revista Facultad de Ingeniería, Universidad de Antioquia, No. 98, pp. 27-46, 2021
a trend expected for the near future, e.g., the electric
power demand in 2050 is expected to be twice as much
as that demanded in 2010 [2]. Under this premise, many
investigations have been carried out to achieve efcient
use of electricity in industries and households [35].
Furthermore, this is a very relevant problem under the
paradigm of smart cities [6].
One of the most rational approaches implemented to
guarantee more efficient use of electric energy in homes is
based on encouraging a change in user behaviour that favours
saving. The basis for offering incentives for behavioural
change is mostly derived from the analysis of electricity
utilization and energy consumption patterns.
Several methods have been proposed for the analysis
of electricity utilization in residential and non-residential
buildings [7,8]. The methods are classified into two main
groups: intrusive and non-intrusive. Intrusive methods
require placing sensors on every appliance to collect load
data, which intrudes on the dwelling.
On the other hand, Non-Intrusive Load Monitoring
(NILM) methods are applied just using the main load
meter that provides the aggregate consumption data,
without requiring additional hardware, thus avoiding
intrusions on the dwelling. Based on a detailed analysis
of the current and voltage of the aggregated load of a
building (e.g., measuring the changes in the signal), NILM
methods are able to determine the state (ON/OFF) and
energy consumption of each appliance. In particular,
NILM techniques can be applied in residential households
without instrumenting them for the analysis, in order
to gain valuable knowledge about energy consumption and
appliance utilization.
The fact that NILM uses only the aggregate load to
disaggregate the signal of each appliance makes it a
more practical method than intrusive methods to generate
detailed information about household energy consumption.
The disaggregated information is useful to provide
breakdown bill information to the consumer, schedule
the activation of appliances, detect malfunctioning, and
suggest actions that can lead to a significant reduction
in electricity consumption (e.g., up to about 20% in some
cases [9]), among other uses.
Following this line of work, this article presents an
approach for solving the energy disaggregation problem
in residential households by applying a pattern similarities
algorithm.
The proposed algorithm is based on the idea of recognizing
the states of appliances (ON/OFF) and determining energy
consumption patterns, taking into account the historical
energy consumption information for each appliance and
the aggregate consumption signal. A traditional two-phase
procedure is applied, consisting of training and testing
phases. The experimental evaluation of the proposed
algorithm is performed over synthetic datasets, built using
a specic methodology and real energy consumption data
from the well-known UK-DALE repository [10].
The experimental analysis was conceived to analyze
the performance of the proposed method for household
energy disaggregation. Appliance consumption and
the sampling intervals vary in each experiment to create
complex scenarios, including complicated features such
as consumption ambiguity between appliances.
Relevant metrics were studied, including precision (the
conditional probability that the appliance is ON given that
the prediction is ON); recall (the conditional probability
that the prediction is ON given that the appliance is ON);
F-score (the harmonic mean of precision and recall); the error
of the total assigned energy consumption; and the mean
normalized error in assigned energy consumption.
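As a minimal sketch of the ON/OFF classification metrics mentioned above, the following Python snippet computes precision, recall, and F-score for a single appliance from its true and predicted state series. The function name `on_off_metrics` and the sample data are illustrative assumptions; they are not taken from nilmtk or from the implementation evaluated in this article.

```python
import numpy as np

def on_off_metrics(y_true, y_pred):
    """Precision, recall and F-score for one appliance's ON/OFF series."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # predicted ON, actually ON
    fp = np.sum(~y_true & y_pred)   # predicted ON, actually OFF
    fn = np.sum(y_true & ~y_pred)   # predicted OFF, actually ON
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return float(precision), float(recall), float(f_score)

# Hypothetical true and predicted states over six time intervals.
truth = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
precision, recall, f_score = on_off_metrics(truth, pred)
```

In this toy series there are two true positives, one false positive, and one false negative, so the three metrics coincide at 2/3.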
Experimental results were processed using the available
tools from the nilmtk toolkit, including the comparison
of the proposed algorithm with two standard built-in
methods of the toolkit: Combinatorial Optimization (CO)
and Factorial Hidden Markov Model (FHMM). Results show
that the proposed algorithm is able to achieve accurate
results, accounting for an average of 0.95 on the F-score
metric in the most complex problem instances and low
error in assigned energy consumption. The proposed
algorithm signicantly outperformed both CO and FHMM
in all problem instances studied.
The research reported in this article was developed within
the project “Computational intelligence to characterize the
use of electric energy in residential customers”, funded by
the National Administration of Power Plants and Electrical
Transmissions (Spanish: Administración Nacional de
Usinas y Trasmisiones Eléctricas, UTE), the Uruguayan
government-owned power company and Universidad de la
República, Uruguay. The project proposes the application
of computational intelligence techniques for processing
household electricity consumption data to characterize
energy consumption, determine the use of appliances
that have more impact on total consumption, and
identify consumption patterns in residential customers.
Knowledge and results generated in the project will
be helpful to conceive and design a specific automatic
recommendation system that takes into account both the
point of view of users and the electricity company.
This work extends our previous article Household
energy disaggregation based on pattern consumption
similarities [11], presented at II Ibero-American Congress
on Smart Cities, Soria, Spain, 2019. The main contributions
of this article are: i) a detailed description of the problem
and the proposed algorithm; ii) an extended experimental
analysis by including new instances that account for
different periods between consecutive load records (10 and
15 minutes), in accordance with the available consumption
data from UTE, in Uruguay; and iii) new problem instances
including noise in the energy consumption records, in
order to analyze the robustness of the proposed method
for household energy consumption disaggregation.
The article is structured as follows. Section 2 presents
the formulation of the problem addressed in the article,
while Section 3 presents a review of the main related
work. Section 4 describes the proposed algorithm for
residential household energy consumption disaggregation,
and Section 5 reports the experimental analysis over all the
considered problem instances. Finally, Section 6 presents
the conclusions and the main lines of future work.
2. The problem of energy
consumption disaggregation
This section describes the problem of energy consumption
disaggregation in residential households and its
mathematical formulation.
2.1 Generic description of the energy
consumption disaggregation problem
The problem consists of disaggregating the overall
energy consumption of a household into the individual
consumption of a given number of appliances. Energy
disaggregation is a particular case of a classification
problem. One of the most widely studied approaches
considers a set of signatures for household appliances
to solve the related classification problem. However, it is
difficult to find the set of features to accurately describe
each appliance, which can be applied to different houses
and different consumption patterns [12].
2.2 Mathematical formulation of the energy
consumption disaggregation problem
The mathematical formulation of the problem of energy
consumption disaggregation considers the following
elements:
• A set of appliances available in a household
$A = \{a_i\}$, $i = 1, \ldots, m$.
• A period of time $T$, discretized in intervals $t$.
• A function $C: A \times T \to \mathbb{R}$. $x^i_t = C(a_i, t)$ gives the
power consumption of each appliance in a given time
interval $t$.
• The aggregate power consumption of the household
at a given time interval $t$, $x_t$. The aggregate power
consumption is expressed as the sum of the individual
power consumption $x^i_t$ of each appliance in use in that
time interval: $x_t = \sum_{a_i \in A} x^i_t$.
• A binary variable $y^i_t$ that indicates the status of
appliance $i$ in time interval $t$. $y^i_t$ takes the value 1 when
appliance $i$ is ON and the value 0 when it is OFF.
The simplest version of the problem is the binary variant.
It assumes two possible values for the power consumption
of each appliance, i.e., $x^i_t = C(a_i, t) \times y^i_t$, that is to say,
that the power consumption of appliance $i$ is constant
when switched on, and it does not depend on the activity
being performed by the appliance.
The total power consumption is described as a function
$f: \{0,1\}^m \to \mathbb{R}$ defined by the expression in Equation 1.
$$x_t = f((y^1_t, y^2_t, \cdots, y^m_t)) = c_1 y^1_t + c_2 y^2_t + \cdots + c_m y^m_t \tag{1}$$
For those cases in which function $f$ is injective
(one-to-one), the problem is trivial. Otherwise, the
time series $\{x_t\}_{t \in T}$ must be studied, in order to learn
and deduce, from the variation of power consumption over
time, the individual power consumption (or signatures) of
the individual appliances.
Let us suppose an instance of the problem considering five
appliances: fridge (power consumption 250 W), washing
machine (power consumption 1500 W), dishwasher (power
consumption 2250 W), kettle (power consumption 2000 W),
and home theater (power consumption 80 W). For this
set of appliances, the aggregate power consumption is
a non-injective function. There is ambiguity between the
power consumption of the fridge and kettle (combined)
with the power consumption of the dishwasher, as defined
by Equation 2. The variation of the aggregate power
consumption in time must be studied to deduce if the
dishwasher or the combination of fridge and kettle is ON.
$$f((1,0,0,1,0)) = f((0,0,1,0,0)) = 2250\ \mathrm{W} \tag{2}$$
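The ambiguity in this example can be checked by brute force. The sketch below enumerates all $2^5$ ON/OFF configurations of the five appliances, using the nominal powers listed above, and groups the configurations that yield the same aggregate power. The helper names `aggregate_power` and `ambiguous_groups` are illustrative assumptions, not part of the proposed method.

```python
from itertools import product

# Nominal ON power (W) of the five appliances in the example, in the
# same order as the state vectors of Equation 2.
POWER_W = {
    "fridge": 250,
    "washing machine": 1500,
    "dishwasher": 2250,
    "kettle": 2000,
    "home theater": 80,
}

def aggregate_power(state):
    """Evaluate f(y): total power for a tuple of 0/1 appliance states."""
    return sum(c * y for c, y in zip(POWER_W.values(), state))

def ambiguous_groups():
    """Group the state vectors that produce the same aggregate power."""
    by_total = {}
    for state in product((0, 1), repeat=len(POWER_W)):
        by_total.setdefault(aggregate_power(state), []).append(state)
    return {total: states for total, states in by_total.items()
            if len(states) > 1}

groups = ambiguous_groups()
```

For these values, `groups[2250]` contains both the dishwasher-only configuration and the fridge-plus-kettle configuration (250 W + 2000 W = 2250 W), and the same collision reappears whenever the washing machine or the home theater is added to both sides.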
Several attributes can be studied, and patterns can be
detected to solve ambiguities. In the previous example,
additional information can be used to solve the ambiguity:
e.g., the mean time of utilization of each appliance (it is a
couple of minutes for the kettle and more prolonged than
an hour for the dishwasher).
Other, more sophisticated patterns can be detected to
solve problem instances with more complex ambiguities.
In general, the variation of the aggregated power
consumption in a time neighbourhood of instant $t$ can
be used to deduce the configuration of all appliances
$\{(y^1_t, y^2_t, \cdots, y^m_t)\}$ with $t \in T$. The proposed approach
is based on using the available information to make
predictions $\{(\hat{y}^1_t, \hat{y}^2_t, \cdots, \hat{y}^m_t)\}$ with $t \in T$ that maximize
the number of time intervals $t \in T$ for which the status of
every appliance is correctly detected, represented by the
sum in Equation 3.
$$\sum_{t \in T} \prod_{i=1}^{m} \mathbf{1}_{\{\hat{y}^i_t = y^i_t\}} \tag{3}$$
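The sum in Equation 3 counts the time intervals in which all $m$ appliance states are predicted correctly at once. A minimal sketch of that count, assuming the true and predicted states are stored as 0/1 arrays of shape (number of intervals, $m$), is shown below; the function name and the sample arrays are hypothetical.

```python
import numpy as np

def count_fully_correct(y_true, y_pred):
    """Number of time intervals where every appliance state is correct.

    Both arguments are arrays of shape (T, m) holding 0/1 states.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # The product over appliances of the indicator 1{y_hat = y} equals 1
    # only when all m states match, which is what `.all(axis=1)` checks.
    return int((y_true == y_pred).all(axis=1).sum())

# Hypothetical states for m = 3 appliances over four time intervals.
y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
score = count_fully_correct(y_true, y_pred)
```

Here only the first and third intervals are fully correct, so the count is 2; a single wrong appliance state makes the whole interval count as a miss.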
3. Related works
The analysis of the related literature allows identifying
several proposals on the design and application of
software-based methods for energy consumption
disaggregation. This section reviews the main related
works on this topic.
Hart presented the concept of Non-intrusive Appliance
Load Monitoring in the pioneering publication on this
research area [13]. The author stated that the previously
presented approaches on the subject had a strong
hardware component, installing intrusively monitoring
points in each household appliance connected to a central
information collector. These methods, in general, had
the characteristic of relegating the software to the task
of collecting data. Hart proposed an approach based on
using simple hardware and sophisticated software for the
analysis, therefore eliminating permanent intrusion in
homes (i.e., the “non-intrusive” term was coined).
Hart dened a model for the analysis considering
that electrical appliances are connected in parallel to
the electrical network and that the power consumed is
additive (Equation 4), where ai(t)represents the ON/OFF
state of an appliance at time t.
ai(t) = (1if appliance iis ON at time t
0otherwise (4)
Multiphase loads with $p$ phases are modelled as vectors
of dimension $p$ where each component is the load in each
phase. The total load of the vector is the sum of the
$p$ components. $P_i$ is defined as a vector representing
the power consumed by device $i$ when it is turned on
(Equation 5), where $P(t)$ is the $p$-vector corresponding to
time $t$, and $e(t)$ represents the noise or the recorded error
for time $t$.
$$P(t) = \sum_{i=1}^{n} a_i(t) \cdot P_i + e(t) \tag{5}$$
The proposed model involves solving a combinatorial
optimization problem to determine vector $a(t)$ from the
known information, i.e., vectors $P_i$ and $P(t)$, in order to
minimize the error (Equation 6).
$$\hat{a}(t) = \arg\min_{a} \left| P(t) - \sum_{i=1}^{n} a_i(t) \cdot P_i \right| \tag{6}$$
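For a small number of devices, Equation 6 can be solved by exhaustive search. The sketch below is an illustration under stated assumptions, not Hart's procedure nor the CO implementation of nilmtk: it measures the error with the Euclidean norm, enumerates all $2^n$ state vectors for a single time instant, and uses hypothetical device powers.

```python
from itertools import product
import numpy as np

def disaggregate_co(p_total, p_devices):
    """Brute-force solution of Equation 6 for a single time instant.

    p_total:   array of shape (p,), measured aggregate power P(t).
    p_devices: array of shape (n, p), row i is the ON power P_i.
    Returns the 0/1 state vector with the smallest residual norm
    (Euclidean norm, an assumption of this sketch) and that norm.
    """
    p_total = np.asarray(p_total, dtype=float)
    p_devices = np.asarray(p_devices, dtype=float)
    best_state, best_err = None, np.inf
    # Enumerating all 2**n states is only feasible for small n.
    for state in product((0, 1), repeat=len(p_devices)):
        residual = p_total - np.asarray(state) @ p_devices
        err = np.linalg.norm(residual)
        if err < best_err:
            best_state, best_err = state, err
    return best_state, float(best_err)

# Single-phase example (p = 1) with three hypothetical devices.
state, err = disaggregate_co([1590.0], [[250.0], [1500.0], [80.0]])
```

With a reading of 1590 W, the closest combination is the second and third devices ON (1580 W), leaving a residual of 10 W. The loop visits $2^n$ candidates, which illustrates why the approach does not scale to large $n$.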
However, the resulting combinatorial optimization problem
is NP-hard and therefore, computationally intractable for
large values of n. Heuristic algorithms allow computing
solutions of acceptable quality, but their applicability is
limited because in practice the set of vectors $P_i$ is not fully
known, the value $n$ is not fixed, and unknown devices tend
to be described as a combination of those already known.
Furthermore, a small variation in the measurement of
$P(t)$ can cause significant changes in $a(t)$, mistakenly
predicting simultaneous ON and OFF events. In order to
avoid these mispredictions, Hart proposed the principle
of continuity switch, establishing that for small intervals
of time it is expected that few appliances have a change in
their status (ON/OFF). Additionally, the principle assumes
that no household appliance has a negative consumption
in order to eliminate the ambiguity produced between the
switch-on of a given appliance and the shutdown of an
energy generator. For this reason, it is assumed that there
are no electric generators connected to the network in the
studied home.
In recent works, NILM has been treated as a machine
learning problem, applying supervised and unsupervised
learning methods to solve it. Supervised learning
approaches are based on data sets of consumption of each
device and the aggregate signal. The approach aims to
generate models that learn how to disaggregate the signal
of the devices from the aggregate signal. The techniques most
commonly applied in this approach are Bayesian learning
and neural networks. Unsupervised approaches seek to
learn signatures of possible devices from the aggregate
signal without knowing a priori what devices are inside the
circuit.
As with all machine learning problems, it is essential
to have measurement data in order to apply the different
algorithms. Bongli et al. presented a survey of the test
data sets available to researchers and the main techniques
used for the unsupervised NILM approach [14]. The most
used unsupervised learning techniques are those based
on Hidden Markov Models (HMM), which define a number
of hidden states among which the model can move,
representing the operating conditions of the device
(e.g., ON, OFF, and possible intermediate states) and an
observable result, which depends on the real state that
represents the analyzed consumption data.
Kelly and Knottenbelt analyzed three deep neural
networks applied to the NILM problem along with their
generalization when processing appliances not present
in the training stage [15]. The proposed neural networks
had between one and 150 million trainable parameters;
therefore, large amounts of training data were needed.
The data set used was UK-DALE, which records the total
electricity consumption of five houses and their appliances.
The work used a six-second sampling interval version
of the dataset for the total and per-appliance electricity
consumption, and limited the analysis to five appliances
(fridge, washing machine, dishwasher, kettle, and
microwave). Each appliance is present in at least three
of the ve houses, and their electricity consumption
is heterogeneous, ranging from ON/OFF appliances
(e.g., kettle) to multi-state appliances (e.g., washing
machines, which have complex consumption patterns).
Low-energy appliances were not taken into account since
their consumption tends to be lost in the noise of the
network. The approach consisted of training a neural
network for each household appliance that processes a
sequence of aggregate total consumption and returns
the prediction of the power demanded by the associated
appliance. Three neural network architectures were
studied: i) long short-term memory (LSTM) recurrent
neural network, suitable for working with data sequences
because of its ability to associate the entire history of
the inputs to an output vector; ii) denoising autoencoder
(dAE), which cleans
the aggregate consumption signal to obtain only the
component corresponding to the target appliance; and iii) rectangles
network, which focuses on detecting the start and end
of the use of the target appliance, and its average power
demand during that period. The networks were trained using
50% of real data and 50% of synthetic data, generated
with the signatures of UK-DALE appliances. Results
were compared with CO and FHMM. In the training stage
(using training data), dAE outperformed CO and FHMM
in all appliances regarding all metrics, except relative
error in total energy. The rectangles network computed
better results than CO and FHMM in all appliances,
except the microwave, regarding all studied metrics. In
the evaluation stage (using evaluation data), dAE and
rectangles network outperformed CO and FHMM in F1
score, precision, proportion of total energy correctly
assigned, and mean absolute error. LSTM network
outperformed CO and FHMM in ON/OFF appliances but
was behind in multi-state appliances.
Several related works have used the nilmtk tool,
developed by Batra et al. nilmtk is a framework for NILM
analysis implemented in Python that facilitates using
multiple data sets by converting them to a standard data
model [7].
Furthermore, nilmtk implements algorithms for data
preprocessing, statistics to describe the data sets,
disaggregation algorithms (such as CO and FHMM),
and metrics to evaluate the performance of developed
models. Preprocessing algorithms include downsample,
to normalize the frequency of consumption signals; and
voltage normalization, which implements a method
to normalize the data and is able to combine different sets
of household data to deal with the variation of voltage in
different countries.
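The idea behind downsampling can be illustrated with plain pandas. The snippet below is a sketch that does not use nilmtk's own preprocessing API; the six-second series, its values, and the one-minute target interval are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical aggregate power readings (W), one sample every 6 seconds.
index = pd.date_range("2021-01-01 00:00:00", periods=20, freq="6s")
power = pd.Series(np.arange(20, dtype=float) * 10.0, index=index)

# Downsample to one reading per minute by averaging the samples that
# fall in each one-minute bin.
power_1min = power.resample("1min").mean()
```

The twenty six-second samples span two minutes, so the result holds two values, each the mean of ten consecutive readings.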
The REDD dataset was introduced to study the
performance of the FHMM algorithm in the NILM
problem [16]. The experimental analysis used two weeks
of data from ve households with ten-second sampling
intervals. Results showed that FHMM classied correctly
64.5% of the consumption in the training set and 47.7%
in the evaluation test. Although results are reasonable,
it is evident their degradation between the training and
the evaluation sets. The authors posed the challenge of
combining REDD with the massive amount of untagged
data generated daily by public energy companies.
4. The pattern similarities algorithm
for energy disaggregation
This section describes the proposed algorithm to solve the
problem of energy consumption disaggregation based on
similar consumption patterns.
4.1 Algorithm description
The main details of the proposed algorithm are presented
next.
Generic description
Function $f: \{0,1\}^m \to \mathbb{R}$ gives the aggregate power
consumption of a house for a set of appliances. A function
$g: \mathbb{R}^{2d+1} \to \mathbb{R}^m$ is considered, where the positive number
$d$ determines a time neighbourhood for the predictions
(Equation 7).
$$(\hat{y}^1_t, \hat{y}^2_t, \cdots, \hat{y}^m_t) := g_{W,Z}(x^i_{t-d}, \cdots, x^i_t, \cdots, x^i_{t+d}) \tag{7}$$
In Equation 7, $(\hat{y}^1_t, \hat{y}^2_t, \cdots, \hat{y}^m_t)$ is the estimated
configuration of the set of house appliances. Function
$g_{W,Z}$ has random elements; it is defined using the
information of a training dataset $\{W, Z\} = \{w_t, z_t\}$ such
that for $t = 1, \cdots, n$, $w_t \in \{0,1\}^m$, $z_t \in \mathbb{R}$ and Equation 8
holds.
$$z_t = f((w^1_t, w^2_t, \cdots, w^m_t)) \tag{8}$$
Parameters of function $g_{W,Z}$ are chosen empirically to
maximize the sum in Equation 9, where $A$ is the set of
ambiguous configurations $A = \{y \in \{0,1\}^m \,/\, \exists\, y' \in
\{0,1\}^m, y' \neq y, f(y') = f(y)\}$, equivalent to maximizing
the number of time intervals $t \in T$ for which every
appliance status is correctly detected (Equation 3).
$$\sum_{y_t \in A} \prod_{i=1}^{m} \mathbf{1}_{\{\hat{y}^i_t = y^i_t\}} \tag{9}$$
The output of the algorithm is y, the vector of
disaggregated power consumption, computed using
the following input:
• The vector $X$ containing the aggregate power
consumption of one house measured over a period
with a certain time frequency.
• A training set $Z$ containing the aggregate power
consumption of one or several houses measured over
a period with the same time frequency as $X$.
• A training set $W$ containing the disaggregated power
consumption of the house (houses) described in $Z$
over the same period and with the same frequency at which
$X$ is measured.
• The parameter $d$ that defines a time interval
neighbourhood.
• The parameter $\delta$ that defines a power consumption
neighbourhood.
• The parameter $H$ that separates high from low power
consumption.
The proposed algorithm, named Pattern Similarities (PS),
consists of two parts, training and testing (prediction),
described next.
Training Stage
Algorithm 1 describes the training stage. This stage builds
an array ($M_Z$), whose elements relate each consumption
record on the training set to nearby records in the past
and in the future. Each element of the array ($z_j \in M_Z$)
can be interpreted as the value of a feature of appliance
signatures. The main loop (lines 2–10) iterates over each
sample in the training dataset. In each iteration step, the
algorithm counts how many consumptions from the time
neighbour samples are similar to the consumption of the
iteration step sample (nested loop in lines 4–8). In line 9,
the array ($M_Z$) is updated with the value of the counter,
in the position corresponding to the consumption sample
analyzed in the iteration. That array is then used, in the
testing stage, to find samples whose consumption pattern
is similar to the sample processed.
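The counting loop of Algorithm 1 can be sketched in Python as follows. This is an illustration rather than the authors' implementation: the function name `neighbourhood_feature` is hypothetical, and the comparison `z[j] > z[i] - phi` follows one reading of line 5 of the listing, where `phi` is the threshold written φ.

```python
import numpy as np

def neighbourhood_feature(z, d, phi):
    """Counter array of Algorithm 1 for a 1-D aggregate power series z.

    For each sample i, count the samples j with |j - i| < d whose
    value satisfies z[j] > z[i] - phi (the sample i itself included).
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    feature = np.zeros(n, dtype=int)
    for i in range(n):
        lo = max(0, i - d + 1)      # first index with |j - i| < d
        hi = min(n, i + d)          # one past the last such index
        feature[i] = np.sum(z[lo:hi] > z[i] - phi)
    return feature

# Hypothetical training series (W): fridge alone, then fridge plus kettle.
z_train = [0.0, 250.0, 250.0, 2250.0, 2250.0, 250.0]
m_z = neighbourhood_feature(z_train, d=2, phi=10.0)
```

With `d=2` each sample is compared with itself and its two immediate neighbours, so the feature takes values between 1 and 3; samples at the edges of the series have smaller neighbourhoods.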
Algorithm 1 PS algorithm: training stage
 1: M_Z ← array of length |Z|
 2: for all z_i ∈ Z do
 3:   counter ← 0
 4:   for all {z_j ∈ Z : |j − i| < d} do
 5:     if z_j > z_i − φ then
 6:       counter ← counter + 1
 7:     end if
 8:   end for
 9:   M_Z[i] ← counter
10: end for
Testing Stage
Algorithm 2 describes the processing of the testing stage.
The first loop (lines 1–10) is similar to the main loop of
the training stage, but applied to the testing dataset.
The result is an array $M_X$, whose elements relate each
consumption record on the testing set to nearby records in
the past and in the future. The second loop (lines 11–27)
iterates over each testing sample to find similarities with
samples of the training dataset. The third loop (nested into
the second, lines 13–17) compares each element of the
array created in the training stage to the corresponding
sample of the array created in the first loop. If the
difference between elements is lower than threshold $\delta$
and the testing sample has a consumption greater than
threshold $H$ ($\delta$ and $H$ defined in 4.1), a reference index of
the training element is added to set $I$ to be considered in
next comparisons. Thus, two key elements of the problem,
the energy consumption and its variation in a time
neighbourhood, are used in the disaggregation process.
Algorithm 2 PS algorithm: testing stage
 1: M_X ← array of length |X|
 2: for all x_i ∈ X do
 3:   counter ← 0
 4:   for all {x_j ∈ X : |j − i| < d} do
 5:     if x_j > x_i − φ then
 6:       counter ← counter + 1
 7:     end if
 8:   end for
 9:   M_X[i] ← counter
10: end for
11: for all x_i ∈ X do
12:   I ← ∅
13:   for all z_j ∈ Z do
14:     if |z_j − x_i| ≤ δ AND x_i > H then
15:       I ← I ∪ {j}
16:     end if
17:   end for
18:   if |I| ≥ 1 then
19:     J ← argmin{|M_Z(I(·)) − M_X(i)|}
20:     k ← rand{1, . . . , length(J)}
21:     y(i, ·) ← w(I(J(k)), ·)
22:   else
23:     J ← argmin{|z(·) − x(i)|}
24:     k ← rand{1, . . . , length(J)}
25:     y(i, ·) ← w(J(k), ·)
26:   end if
27: end for
If the set $I$ has elements, the samples that minimize
the difference between signature features (the difference
between $M_Z$ and $M_X$) are selected. The minimum of the
vector $|M_Z(I(\cdot)) - M_X(i)|$ is not necessarily attained
at a single sample, thus line 19 defines set $J$ with the
indexes where the minimum is attained. In line 20, one
index of the set $J$ is randomly chosen. If set $I$ is empty,
i.e., no training sample similar to the sample being processed
was found, the algorithm selects the training samples that
minimize the difference of consumption with the sample
that is being processed (line 23). The reference indexes
of the selected samples are stored in the set $J$ (line 23),
and one of them is chosen (line 24). Once the algorithm
has found a similar training sample, its reference index is
related to the appliance states (ON/OFF) at the time of the
record (line 21 or 25, depending on $I$).
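The testing stage described above can be sketched in Python with numpy. This is a minimal illustration under stated assumptions, not the authors' code (which is available in their repository): the names `neighbourhood_feature` and `ps_predict` are hypothetical, the thresholds are passed as `phi`, `delta`, and `high` (for φ, δ, and H), and the neighbourhood comparison follows one reading of line 5 of the listings.

```python
import numpy as np

def neighbourhood_feature(x, d, phi):
    """For each sample i, count neighbours j (|j - i| < d) with x[j] > x[i] - phi."""
    x = np.asarray(x, dtype=float)
    return np.array([
        np.sum(x[max(0, i - d + 1):i + d] > x[i] - phi)
        for i in range(len(x))
    ])

def ps_predict(x, z, w, d, phi, delta, high, rng=None):
    """Sketch of the PS testing stage.

    x: aggregate test series, shape (T,).
    z: aggregate training series, shape (n,).
    w: training appliance states, shape (n, m).
    Returns the predicted states, shape (T, m).
    """
    rng = np.random.default_rng() if rng is None else rng
    x, z, w = np.asarray(x, float), np.asarray(z, float), np.asarray(w)
    m_x = neighbourhood_feature(x, d, phi)
    m_z = neighbourhood_feature(z, d, phi)
    y = np.empty((len(x), w.shape[1]), dtype=w.dtype)
    for i, x_i in enumerate(x):
        # Set I: training samples with similar consumption, considered
        # only when the test sample is a high-consumption one.
        if x_i > high:
            cand = np.flatnonzero(np.abs(z - x_i) <= delta)
        else:
            cand = np.array([], dtype=int)
        if cand.size:
            # Keep the candidates whose feature value is closest.
            diff = np.abs(m_z[cand] - m_x[i])
            best = cand[diff == diff.min()]
        else:
            # Fallback: training samples with the closest consumption.
            diff = np.abs(z - x_i)
            best = np.flatnonzero(diff == diff.min())
        y[i] = w[rng.choice(best)]   # ties are broken at random
    return y

# Hypothetical training data: fridge (250 W) and kettle (2000 W).
z = [0, 250, 250, 2250, 250, 0]
w = [[0, 0], [1, 0], [1, 0], [1, 1], [1, 0], [0, 0]]
y_hat = ps_predict([250, 2250, 0], z, w, d=2, phi=10.0, delta=5.0,
                   high=100.0, rng=np.random.default_rng(0))
```

In this toy case every tie involves training samples with identical appliance states, so the prediction does not depend on the random choice. The inner search is linear in the size of the training set for each test sample; a sorted index over `z` would be a natural optimization, but it is left out to keep the sketch close to the listing.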
Figure 1 graphically summarizes the stages of the
proposed PS algorithm.
4.2 Implementation
A rst version of the proposed algorithm was developed
on Matlab, version 8.3.0.532 (R2014a), as a proof of
concept. After that, the PS algorithm was re-implemented
on python version 3, using pandas and numpy, which
allows the implementation to be included as part of a
pipe of execution in nilmtk. For this stage, several
modications were included in the metrics and utils les
of the framework.
Two scripts were implemented for generating the
synthetic datasets. The first script reads the UK-DALE
dataset (HDF5 file), normalizes the values for houses and
appliances, and builds a directory structure that contains
metadata and the normalized data in CSV files. The
normalization replaces all records over a given threshold
by an indicated value, and sets all other values to zero. For
the generation of instances that include noise, the script
executes a function that adds the power consumption of extra
appliances, whose behaviour is modelled with exponential
probability models (see details in Subsection 5.3).
The second script reads the directory structure and its
content to generate a new HDF5 file with the synthetic
dataset. In the resulting dataset, data have the same
sample rate as in the original dataset. The algorithm
implementation, the scripts for generating the datasets,
and the modified nilmtk files are available in a public
repository (gitlab.com/jpchavat/nilm-scripts).
5. Experimental analysis
This section presents the experimental analysis of the
proposed PS algorithm. The algorithm was executed
in a nilmtk execution pipeline, using a synthetic
dataset based on the UK-DALE dataset as input. The
disaggregation accuracy was studied considering different
sample intervals, in order to analyze possible degradations
when less data is available (i.e., larger sample
intervals), and considering ambiguous appliance loads and
noise in the signals, in order to analyze its robustness.
Results of the PS algorithm were compared with the
results of the CO and FHMM algorithms executed in the same
settings.
5.1 Datasets and problem instances
This subsection describes the datasets and problem
instances considered in the experimental evaluation.
Datasets
Datasets used in the experiments were synthetically
generated based on real data from house #1 of the
UK-DALE dataset. Data for the following appliances were
considered: fridge, washing machine, kettle, dishwasher,
and home theatre. These appliances are representative
of devices that contribute the most to household energy
consumption [17]. Several Python scripts were written
to generate the instances. The tools of the nilmtk
framework and the pandas and numpy libraries were
used throughout the process.
The instances of the datasets were generated according to
the following procedure:
1. Python scripts read the UK-DALE data structure and create the associated metadata, following the structure of NILM-Metadata proposed by Kelly and Knottenbelt [18].
2. Two types of values were used for the normalization. In one case, the mean of the maximum current consumption of each activation is computed for each appliance. In the other case, a list of constant values was set for the normalization.
3. Each record in the UK-DALE dataset is normalized using the values from step 2: if the record corresponds to a power consumption above a given threshold (set to 5.0 W), it is replaced by the value calculated/defined in step 2; otherwise, it is replaced by zero.
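The thresholding in step 3 can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `normalize_power` and the sample readings are hypothetical, and only the 5.0 W threshold and the replace-or-zero rule come from the text.

```python
import numpy as np

def normalize_power(power, on_value, threshold=5.0):
    """Replace readings above `threshold` (W) by `on_value`; set the rest to zero."""
    power = np.asarray(power, dtype=float)
    return np.where(power > threshold, on_value, 0.0)

# Hypothetical fridge readings in watts, normalized to 250 W
readings = [0.0, 3.2, 87.0, 251.4, 4.9, 120.0]
normalized = normalize_power(readings, on_value=250.0)
print(normalized.tolist())
```

Readings at or below the threshold, such as standby consumption, become zero, so each appliance signal takes only two values.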
The resulting datasets have the same sample rate as the original UK-DALE dataset, with the particularity that they do not present gaps, i.e., if the original sample rate is six seconds, the generated dataset will have strictly one record every six seconds, with zeros filling the gaps present in the original dataset.

Figure 1 Stages of the proposed PS algorithm (panels: 1. Training; 2. Prediction)
The applied methodology for generating problem instances
is generic and allows creating data for every building
and every appliance. It can be used over base data
from the UK-DALE dataset or other similar datasets from
repositories in the literature.
Problem instances
Five different base instances were generated for
the experimental analysis: instances #1 to #3 by
downsampling the UK-DALE dataset to an interval of 5
minutes and instances #4 and #5 by downsampling the
UK-DALE dataset to intervals of 5, 10, and 15 minutes (see details
in Subsection 5.2). A datetime range limit was established
for training and testing data. For training data, the limits
were set from 2013-01-01 at 00:00:00 to 2013-07-01 at
00:00:00, while for the testing data, the limits were set
from 2013-07-01 at 00:00:00 to 2013-12-31 at 23:59:59. A
threshold of minimum consumption (H) was applied in
the normalization, which was set to 5.0 W. This threshold
allows discarding standby power consumption records.
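The downsampling and the date-based split can be sketched with pandas. This is an illustration under stated assumptions: the text does not say how records are combined when downsampling, so keeping the first record of each interval is an assumption, as are the function name `downsample_and_split`, the half-open split at 2013-07-01 00:00:00, and the random two-day signal.

```python
import numpy as np
import pandas as pd

def downsample_and_split(power, interval="5min", split="2013-07-01 00:00:00"):
    """Downsample a power series and split it into training and testing parts."""
    # Assumption: keep the first record of each interval
    coarse = power.resample(interval).first()
    split_ts = pd.Timestamp(split)
    train = coarse[coarse.index < split_ts]   # records before the split date
    test = coarse[coarse.index >= split_ts]   # records from the split date on
    return train, test

# Hypothetical 6-second signal covering two days around the split date
index = pd.date_range("2013-06-30", periods=2 * 14400, freq=pd.Timedelta(seconds=6))
rng = np.random.default_rng(0)
power = pd.Series(rng.choice([0.0, 250.0], size=len(index)), index=index)

train, test = downsample_and_split(power, "5min")
print(len(train), len(test))
```

Passing `"10min"` or `"15min"` as the interval gives the coarser sub-instances.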
The rst four instances were generated to analyze the
efcacy of the proposed algorithm to solve different cases
of energy consumption ambiguity. The fth instance, apart
from the presence of ambiguity, includes noise signals of
appliances that are not intended to be disaggregated.
A description of each problem instance and the motivation
of using it is provided next.
Instance #1. The generated dataset normalizes the
consumption of each appliance using the value of the
median of maximum consumption per activation (i.e.,
periods in which an appliance remains in state ON).
Outliers were filtered by lower and upper limits defined
by the standard deviation. The generated dataset is used
for training and testing the algorithms. This instance aims
at working with values close to the real ones but keeping
constant consumption values over time.
Instance #2. The generated dataset normalizes the
consumption values to generate ambiguity between the
consumption of two appliances: kettle and dishwasher.
The same dataset is used for training and testing the
algorithms. This instance aims at testing how the
algorithms solve the most basic case of ambiguity.
Instance #3. The generated dataset normalizes
consumption values in a similar way to instance
#2, but in this case including ambiguities between the
sum of consumption of three appliances (fridge, home
theatre, and washing machine) with the consumption of
another appliance (dishwasher). The same dataset is used
for training and testing the algorithms. This instance aims
at studying how the algorithms solve a more sophisticated
case of ambiguity.
Instance #4. The training dataset is the same as
in instance #2, but a new dataset was generated
for the testing stage, introducing small variations in
the consumption of every appliance except the washing
machine. For example, the consumption of the fridge was
normalized to 260 W instead of 250 W. This instance was
designed to evaluate the proposed algorithm in a scenario
where testing appliances are similar but not equal to the
appliances used for training.
Instance #5. The dataset takes as a base the testing dataset
of the instance #4 and adds the consumption of extra
appliances to simulate noise in the signal. The behaviour
of each extra appliance is modelled as a discretization of
an exponential variable, following the procedure explained in Subsection
5.3. This instance aims to analyze the robustness of the
algorithm in the presence of unknown power consumption
values.
Table 1 reports the normalized power consumption values
for each appliance, the sampling intervals, and the
presence of noise used for training and testing in each
instance. In turn, Figure 2 shows the percentage of records
when each appliance is in state ON/OFF, which is the same
for all the generated datasets. Values were obtained by
applying data analysis to the UK-DALE dataset.
Figure 2 Percentage of operating time of each appliance (ON/OFF, in %: fridge 37/63; washing machine 4/96; kettle 0.5/99.5; dishwasher 1.9/98.1; home theater 19/81)
5.2 Analysis of different sampling intervals
Instances #4 and #5 each have three sub-instances
that vary the sampling interval, i.e., the period between
two consecutive power consumption records: 5 minutes
(sub-instances #4-5 and #5-5), 10 minutes (sub-instances
#4-10 and #5-10) and 15 minutes (sub-instances #4-15
and #5-15). Instances with different sampling intervals
are conceived to evaluate the proposed PS algorithm in
scenarios where the available data is sparser in
time, closer to the real data available from the national
electric company (UTE).
5.3 Adding noise to model uncertainty
Instance #5 includes extra appliances, not considered
in previous instances, which are not intended to be
disaggregated. Instead, they are used to add noise to
the aggregated consumption signal. The main goal of
including those appliances is to analyze the robustness of
the proposed approach in the presence of uncertain
power consumption data. The procedure for generating
the consumption of an extra appliance is as follows.
Switching a given appliance on and off is assumed to follow a Poisson point process, i.e., the switching events occur continuously and independently at a nearly constant average rate. The time interval in which the appliance status is OFF ($T_{OFF}$) is assumed to be a discretization of an exponential distribution of parameter $\lambda$, and the time interval in which the appliance status is ON ($T_{ON}$) is assumed to be a discretization of an exponential distribution of parameter $\mu$ (Equations 10 and 11), where $U_1$, $U_2$ are random numbers with uniform distribution in [0,1] and $\lfloor x \rfloor$ stands for the integer part of $x$.
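$$T_{OFF} = \left\lfloor -\frac{1}{\lambda} \log(1 - U_1) \right\rfloor + 1, \tag{10}$$

$$T_{ON} = \left\lfloor -\frac{1}{\mu} \log(1 - U_2) \right\rfloor + 1, \tag{11}$$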
The procedure applied to generate noise for a single
appliance is described in Algorithm 3.
Algorithm 3 Procedure for generating the status of an extra appliance along N time intervals
1: Input: N, λ, µ; output: y
2: y ← 0
3: m ← 0
4: while m < N do
5:     T1 ← ⌊(−1/λ) log(1 − rand[0,1])⌋ + 1
6:     T2 ← ⌊(−1/µ) log(1 − rand[0,1])⌋ + 1
7:     for i = 1, ..., min(T1, N − m) do
8:         y[m + i] ← 0
9:     end for
10:    m ← m + T1
11:    if m < N then
12:        for i = 1, ..., min(T2, N − m) do
13:            y[m + i] ← 1
14:        end for
15:        m ← m + T2
16:    end if
17: end while
The procedure in Algorithm 3 works as follows. The output vector y is initialized as a vector of zeros of length N. The main loop iterates until N time intervals have been generated. T1 represents a realisation of variable $T_{OFF}$, generated using the distribution defined in Equation 10 (line 5). Variable $-\frac{1}{\lambda}\log(1 - U_1)$ has an exponential distribution with parameter $\lambda$ and $T_{OFF}$ has a geometric distribution with parameter $1 - e^{-\lambda}$ [19]. T2 represents a realisation of variable $T_{ON}$, which has a geometric distribution with parameter $1 - e^{-\mu}$, generated using the distribution defined in Equation 11 (line 6).
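The geometric distribution stated above follows from a short derivation, sketched here with the notation of Equation 10, writing $X = -\frac{1}{\lambda}\log(1 - U_1)$ (an auxiliary symbol introduced only for this sketch) and $k = 1, 2, \ldots$:

```latex
\begin{aligned}
P(T_{OFF} = k) &= P(\lfloor X \rfloor = k - 1) = P(k - 1 \le X < k) \\
               &= e^{-\lambda (k-1)} - e^{-\lambda k}
                = \left(e^{-\lambda}\right)^{k-1} \left(1 - e^{-\lambda}\right), \\
E[T_{OFF}] &= \frac{1}{1 - e^{-\lambda}}.
\end{aligned}
```

The same argument with $\mu$ in place of $\lambda$ gives the distribution and mean of $T_{ON}$.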
Table 1 Instances of datasets: normalized consumption of appliances, sampling intervals, and the presence of noise

| instance | fridge | washing machine | kettle | dishwasher | home theater | sampling interval | noise |
|---|---|---|---|---|---|---|---|
| #1 (testing, training) | 117 W | 3325 W | 2390 W | 2741 W | 93 W | 5 minutes | No |
| #2 (testing, training) | 250 W | 2000 W | 2500 W | 2500 W | 80 W | 5 minutes | No |
| #3 (testing, training) | 300 W | 1800 W | 2200 W | 2300 W | 200 W | 5 minutes | No |
| #4 (testing) | 250 W | 2000 W | 2500 W | 2500 W | 80 W | 5 minutes | No |
| #4-5 (training) | 260 W | 2000 W | 2400 W | 2600 W | 70 W | 5 minutes | No |
| #4-10 (training) | 260 W | 2000 W | 2400 W | 2600 W | 70 W | 10 minutes | No |
| #4-15 (training) | 260 W | 2000 W | 2400 W | 2600 W | 70 W | 15 minutes | No |
| #5-5 (testing, training) | 250 W | 2000 W | 2500 W | 2500 W | 80 W | 5 minutes | Yes |
| #5-10 (testing, training) | 250 W | 2000 W | 2500 W | 2500 W | 80 W | 10 minutes | Yes |
| #5-15 (testing, training) | 250 W | 2000 W | 2500 W | 2500 W | 80 W | 15 minutes | Yes |

Table 2 Description of the appliances included in the problem instances to simulate noisy loads. ST is the sampling time expressed in seconds

| appliance | λ | µ | quantity | consumption |
|---|---|---|---|---|
| lamp | ST/3600 | ST/300 | 3 | 8 W |
| lamp | ST/15000 | ST/300 | 3 | 10 W |
| microwave | ST/7200 | ST/100 | 1 | 2000 W |
| TV | ST/30000 | ST/7500 | 1 | 40 W |

In lines 7–9, the OFF period of the appliance is added to the noise generated so far (components equal to zero). The case in which the number of time intervals simulated so far would exceed the desired length N is handled by the expression min(T1, N − m). In line 10, the value of m is updated according to the number of zeros added in lines 7–9. If the value of m is smaller than N, the ON period of the appliance is added to the noise generated so far (components equal to one) (lines 12–14). The case in which the number of time intervals simulated so far would exceed the desired length N is handled by the expression min(T2, N − m).
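Algorithm 3 can be sketched in Python as follows. This is an illustrative translation, not the authors' script: the function name `generate_status`, the 0-based indexing, and the optional `rng` argument are choices made here.

```python
import math
import random

def generate_status(n, lam, mu, rng=None):
    """Return a list of n ON/OFF flags (1/0) for one extra appliance."""
    rng = rng if rng is not None else random.Random()
    y = [0] * n   # output vector, initialized to zeros (line 2)
    m = 0         # number of intervals generated so far (line 3)
    while m < n:
        # Discretized exponential durations (lines 5-6); rng.random() is in [0, 1)
        t_off = math.floor(-math.log(1.0 - rng.random()) / lam) + 1
        t_on = math.floor(-math.log(1.0 - rng.random()) / mu) + 1
        # OFF period: entries are already zero, so only m advances (lines 7-10)
        m += t_off
        if m < n:
            # ON period, truncated at the end of the vector (lines 12-15)
            for i in range(m, min(m + t_on, n)):
                y[i] = 1
            m += t_on
    return y

# One 8 W lamp over a day of 5-minute intervals (ST = 300 s, parameters from Table 2)
status = generate_status(288, lam=300 / 3600, mu=300 / 300, rng=random.Random(42))
print(sum(status), "intervals ON out of", len(status))
```

Because every OFF duration is at least one interval, the generated series always starts in state OFF.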
A series of zeros and ones is then generated for each extra appliance, according to the values of parameters $\lambda$ and $\mu$ reported in Table 2, and the noise, in the form of the aggregate power of these extra appliances, is calculated according to their power consumption values. For instance, for an interval of 5 minutes and three lamps of 8 W, $\lambda = 300/3600$ implies that the mean time OFF for these appliances is $(1 - e^{-\lambda})^{-1} \approx 12.5$ intervals of 5 minutes, i.e., approximately 62 minutes, and the mean time ON is $(1 - e^{-\mu})^{-1} = (1 - e^{-1})^{-1} \approx 1.58$ intervals of 5 minutes, i.e., approximately 8 minutes. The same computation for lamps of 10 W results in a mean time OFF of 4 hours and 12 minutes. Similarly, $\lambda = 1/100$ gives a mean time OFF for the TV set of 8 hours and 22 minutes and $\mu = 1/25$ gives a mean time ON of 2 hours and 7 minutes. Finally, $\lambda = 1/24$ gives a mean time OFF for the microwave of 2 hours and 2 minutes and $\mu = 3$ gives a mean time ON of 5 minutes. Because the values of $\lambda$ and $\mu$ are proportional to the length of the sampling interval, similar ON and OFF mean values are obtained for the 10 and 15 minutes samples.
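The mean durations quoted above can be reproduced with a few lines of Python. The helper name `mean_duration_minutes` and the dictionary layout are hypothetical; the parameter values are those of Table 2.

```python
import math

def mean_duration_minutes(rate, sampling_seconds):
    """Mean of a geometric duration with parameter 1 - exp(-rate), in minutes."""
    mean_intervals = 1.0 / (1.0 - math.exp(-rate))
    return mean_intervals * sampling_seconds / 60.0

ST = 300  # sampling time in seconds (5-minute interval)
# Denominators of lambda and mu from Table 2 (lambda = ST / first, mu = ST / second)
appliances = {
    "lamp 8 W": (3600, 300),
    "lamp 10 W": (15000, 300),
    "microwave": (7200, 100),
    "TV": (30000, 7500),
}
for name, (off_den, on_den) in appliances.items():
    off = mean_duration_minutes(ST / off_den, ST)
    on = mean_duration_minutes(ST / on_den, ST)
    print(f"{name}: mean OFF {off:.1f} min, mean ON {on:.1f} min")
```

Setting `ST` to 600 or 900 shows how the mean durations change only slightly for the 10 and 15 minutes samples.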
In summary, for each sample interval, noise is generated in
the form of seven low consumption appliances (six lamps
and one TV set) and one high consumption appliance
(microwave), according to Algorithm 3 and Table 2.
Sub-instances #5-5, #5-10 and #5-15 are formulated
to evaluate the PS algorithm in scenarios where there are
appliances apart from the ones to be disaggregated. In real
scenarios, it is not reasonable to assume that two different
houses have identical sets of appliances. Additionally,
it is not possible to measure all appliances of a house
separately, and the consumption of the appliances not
included in the set of interest could be considered as
noise in the context of the problem. These facts justify the
inclusion of such sub-instances in order to get a more realistic
problem.
5.4 Software and hardware platform
The nilmtk framework was used to implement an execution
pipeline for the experiments, as described in Figure 3.
The first stage of the pipeline loads the dataset, while the
second splits the dataset into a training set and a testing
set. The training set is used to train the algorithm in its
different instances and then the testing set is used to
obtain the disaggregation results. Finally, results are
compared with the ground truth data (i.e., the test set) to
compute a set of metrics.
The experimental evaluation was performed on the infrastructure of the National
Supercomputing Center (Cluster-UY), which
has Intel Xeon-Gold 6138 nodes (up to 1120 CPU
cores), 3.5 TB of RAM, and 28 Nvidia Tesla P100 GPUs,
connected by a high-speed 10 Gbps Ethernet network
(cluster.uy) [20].
5.5 Baseline algorithms for comparison
Two methods from the related literature were considered as baselines for the comparison of the results obtained by the proposed algorithm: CO and FHMM.

Figure 3 Execution pipeline implemented in nilmtk (stages: load datasets; split into train/test sets; train using train set; test trained model; process metrics)
The CO method was first presented by Hart and is included in the nilmtk framework. The approach of CO is to find the optimal combination of appliance states that minimises the difference between the aggregated consumption and the sum of the consumption of the appliances predicted to be in state ON. CO searches for a vector $\hat{a}$ that minimises the expression in Equation 6. The complexity of the CO algorithm is exponential in the number of appliances. Thus, it is not useful to address scenarios with a large number of appliances.
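The idea behind CO can be illustrated with a brute-force sketch for two-state appliances. This is not the nilmtk implementation: the function name `combinatorial_optimization` is hypothetical, and the appliance powers are the normalized values of instance #1 in Table 1.

```python
from itertools import product

def combinatorial_optimization(aggregate, powers):
    """Return the ON/OFF combination whose summed power is closest to `aggregate`."""
    names = list(powers)
    best_states, best_error = None, float("inf")
    # 2 ** len(names) combinations are evaluated: exponential in the number of appliances
    for states in product((0, 1), repeat=len(names)):
        total = sum(s * powers[n] for s, n in zip(states, names))
        error = abs(aggregate - total)
        if error < best_error:
            best_states, best_error = states, error
    return dict(zip(names, best_states)), best_error

# Normalized consumption of instance #1 (Table 1), in watts
powers = {"fridge": 117, "washing machine": 3325, "kettle": 2390,
          "dishwasher": 2741, "home theater": 93}
states, error = combinatorial_optimization(2507, powers)
print(states, error)
```

With the values of instance #2, where kettle and dishwasher are both normalized to 2500 W, an aggregate of 2500 W is matched equally well by either appliance, so the minimisation alone cannot tell them apart.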
After the introduction of FHMM [21], different variations
were developed to solve the energy disaggregation
problem [22]. HMMs are mixture models that encode
historical information of a time series in a single
multinomial variable, represented as a hidden state;
FHMM extends HMM to allow modeling multiple
independent hidden state sequences simultaneously.
FHMM scales worse than CO in scenarios with a large
number of appliances, due to the inherent computational
complexity of the method.
5.6 Metrics for results evaluation
A set of standard metrics was applied to evaluate the efficacy of the proposed PS and baseline algorithms. Consider that $x_i^{(n)}$ is the actual status series for appliance $n$ and $\hat{x}_i^{(n)}$ the status series predicted by the algorithm. Then, True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) ratios are defined by Equations 12–15.
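$$TP = \sum_i \mathrm{AND}\left(x_i^{(n)} = 1, \hat{x}_i^{(n)} = 1\right) \tag{12}$$

$$FP = \sum_i \mathrm{AND}\left(x_i^{(n)} = 0, \hat{x}_i^{(n)} = 1\right) \tag{13}$$

$$TN = \sum_i \mathrm{AND}\left(x_i^{(n)} = 0, \hat{x}_i^{(n)} = 0\right) \tag{14}$$

$$FN = \sum_i \mathrm{AND}\left(x_i^{(n)} = 1, \hat{x}_i^{(n)} = 0\right) \tag{15}$$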
Five metrics are considered in the analysis:
- precision of the prediction, defined as an estimator of the conditional probability of predicting ON given that the appliance is ON (Equation 16).
- recall, defined as the conditional probability that the appliance is ON given that the prediction is ON (Equation 17).
- F-Score, defined as the harmonic mean of precision and recall (Equation 18).
- Error in Total Energy Assigned (TEE), defined as the error of the total assigned consumptions (Equation 19).
- Normalized Error in Assigned Power (NEAP), defined as the mean normalized error in assigned consumptions (Equation 20).
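$$\text{precision} = \frac{TP}{TP + FN} \tag{16}$$

$$\text{recall} = \frac{TP}{TP + FP} \tag{17}$$

$$\text{F-Score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{18}$$

$$TEE^{(n)} = \left| \sum_t y_t^{(n)} - \sum_t \hat{y}_t^{(n)} \right| \tag{19}$$

$$NEAP^{(n)} = \frac{\sum_t \left| y_t^{(n)} - \hat{y}_t^{(n)} \right|}{\sum_t y_t^{(n)}} \tag{20}$$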
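Equations 12–20 can be computed for one appliance as in the sketch below. The function name `evaluate` and the toy series are hypothetical. Precision and recall follow Equations 16 and 17 as written in this article, which assign the two names the other way round from the more common convention; the F-Score is unaffected because it is symmetric in both.

```python
def evaluate(x_true, x_pred, y_true, y_pred):
    """Compute the five metrics for one appliance.

    x_* are ON/OFF status series (1/0); y_* are power series in watts.
    """
    pairs = list(zip(x_true, x_pred))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # Equation 12
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # Equation 13
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # Equation 15

    def ratio(num, den):
        # Guard against empty denominators (a choice made for this sketch)
        return num / den if den else 0.0

    precision = ratio(tp, tp + fn)                      # Equation 16, as in the article
    recall = ratio(tp, tp + fp)                         # Equation 17, as in the article
    f_score = ratio(2 * precision * recall, precision + recall)
    tee = abs(sum(y_true) - sum(y_pred))                # Equation 19
    neap = ratio(sum(abs(a - p) for a, p in zip(y_true, y_pred)), sum(y_true))
    return {"precision": precision, "recall": recall,
            "F-score": f_score, "TEE": tee, "NEAP": neap}

# Toy example: a 250 W appliance over six intervals
x_true = [1, 1, 0, 0, 1, 0]
x_pred = [1, 0, 0, 1, 1, 1]
metrics = evaluate(x_true, x_pred,
                   [250 * s for s in x_true], [250 * s for s in x_pred])
print(metrics)
```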
5.7 Results
This subsection reports the numerical results of PS and
the baseline CO and FHMM algorithms in the experimental
evaluation. Regarding PS, all results were obtained
using the following parameter configuration, set by a
rule-of-thumb and empirical evaluation: δ = 100, d = 10,
H = 500 and φ = 250.
Instances without noise and sampling interval of 5
minutes
Tables 3–6 report the results of the proposed algorithm
(PS) and the baseline algorithms (CO and FHMM), on
instances #1 to #4, considering a period of 5 minutes
between consecutive energy consumption records.
Results in Table 3 indicate that PS was able to accurately
solve problem instances without ambiguity between
the power consumption of appliances. F-score values
between 0.92 and 1.0 were obtained. Both CO and FHMM
got F-score values around 0.6 for fridge and washing
machine, around 0.3 for dishwasher and home theater,
and 0.04 (i.e., almost null) for kettle. In all cases, F-score
values were lower than those obtained with PS.
Results in Table 4 indicate that F-score values obtained
by PS for appliances with ambiguities decreased up to
9%, while the rest of the F-score values remained similar
to those of instance #1. Regarding the baseline algorithms, CO
showed a decrease of 50% in the prediction of appliances
with ambiguity, while results of FHMM remained similar
to the ones computed for instance #1, except for the kettle
(F-score value decreased 66%).
Results in Table 5 indicate that the F-score values
of PS decreased for washing machine (3%), dishwasher
(6%), and kettle (the worst value, 25% less than for
instance #1). On the other hand, F-score values increased
for home theatre (6%) and did not vary for the fridge.
F-score values of the CO algorithm decreased for washing
machine (42%), kettle (67%), and dishwasher (42%),
compared with instance #1. In turn, F-score values for
FHMM decreased for all the appliances (up to 66% for
kettle) except the home theatre (increased 33%).
Finally, results in Table 6 demonstrate that the proposed PS algorithm has a robust behaviour when using different normalized datasets for training and testing steps, which slightly differ in the power consumption values used in the normalization. The F-score for PS was greater than 0.99 for fridge and washing machine, greater than 0.97 for dishwasher, and greater than 0.94 for home theatre. The lowest F-score value was obtained for the kettle (0.85), which, similarly to instances #2 and #3, had the lowest F-score values among all appliances. With respect to instance #1, the F-score of the kettle decreased 15%. The rest of the appliances experienced a decrease/increase lower than 2%. In the case of the CO algorithm, with respect to instance #1, the F-score decreased 13% for the fridge, 55% for the washing machine, 46% for the kettle, and 43% for the dishwasher. In the case of the home theatre, the F-score increased 8%. For the FHMM algorithm, F-score values of fridge and dishwasher varied less than 1.6% with respect to instance #1, and decreased for washing machine (11%) and kettle (67%).
Instances without noise and variable sampling
interval
Tables 6–8 report the results of the PS, CO, and FHMM algorithms on instances #4-5, #4-10 and #4-15, where the sampling interval takes the values 5, 10 and 15 minutes.

In the case of the F-score of the PS algorithm, the results show a decrease of 10.5% for the kettle from the 5 minutes sampling interval to the 15 minutes version, while for the other appliances the decrease is below 2.2%. Concerning the TEE and NEAP metrics, the results of the PS algorithm remain below 183 kW and 0.53, respectively. In general, the F-score of both CO and FHMM algorithms remained lower than the F-score of PS. When the sampling interval varies, the results of the baseline algorithms are mixed. The CO algorithm was able to improve up to 70% in the case of the washing machine, but it reduced more than 50% in the case of the kettle. The F-score of the FHMM algorithm remained similar along the three instances, with no variations or variations below 10%. The TEE for both CO and FHMM algorithms decreased in general, up to a decimal part in some cases, while the NEAP metric varied up to 20%, increasing or decreasing.
The graphics in Figure 4 and Figure 5 summarize
the F-score results of the three studied algorithms in
the three variations of the sampling intervals for the
appliances fridge and kettle, respectively. The fridge
was selected for the graphic because it presents the
longest activation time, while the kettle was selected
because it presents the shortest activation time. Overall,
results of the PS algorithm were always better than those
computed by CO and FHMM. Furthermore, PS results show
high robustness, even when dealing with long sampling
intervals.
Instances with noise and variable sampling interval
Tables 9–11 report the results of PS, CO, and FHMM on
instances #5-5, #5-10 and #5-15. In these instances,
apart from varying the sampling interval in 5, 10 and 15
minutes, the consumption of extra appliances is added to generate
noise in the signal.
Table 3 Results of CO, FHMM, and PS on instance #1

| algorithm | metric | fridge | washing machine | kettle | dishwasher | home theater |
|---|---|---|---|---|---|---|
| CO | TEE (kW) | 292.18 | 2318.84 | 2767.39 | 6651.69 | 1118.64 |
| CO | NEAP | 0.8663 | 0.7644 | 5.9284 | 2.6279 | 2.1975 |
| CO | precision | 0.8324 | 0.9863 | 0.7153 | 0.9758 | 0.8413 |
| CO | recall | 0.5584 | 0.4827 | 0.0228 | 0.2301 | 0.2814 |
| CO | F-score | 0.6684 | 0.6481 | 0.0442 | 0.3724 | 0.4218 |
| FHMM | TEE (kW) | 306.46 | 3209.08 | 3399.42 | 5371.37 | 948.72 |
| FHMM | NEAP | 0.8843 | 0.8367 | 6.8117 | 2.7134 | 2.3119 |
| FHMM | precision | 0.7576 | 0.9817 | 0.7810 | 0.9768 | 0.5799 |
| FHMM | recall | 0.5408 | 0.5078 | 0.0258 | 0.2377 | 0.2199 |
| FHMM | F-score | 0.6311 | 0.6694 | 0.0500 | 0.3823 | 0.3188 |
| PS | TEE (kW) | 23.87 | 0.00 | 0.00 | 0.00 | 29.67 |
| PS | NEAP | 0.0218 | 0.0000 | 0.0000 | 0.0000 | 0.1497 |
| PS | precision | 0.9839 | 1.0000 | 1.0000 | 1.0000 | 0.9409 |
| PS | recall | 0.9942 | 1.0000 | 1.0000 | 1.0000 | 0.9121 |
| PS | F-score | 0.9891 | 1.0000 | 1.0000 | 1.0000 | 0.9263 |
Table 4 Results of CO, FHMM, and PS on instance #2

| algorithm | metric | fridge | washing machine | kettle | dishwasher | home theater |
|---|---|---|---|---|---|---|
| CO | TEE (kW) | 2228.32 | 1701.36 | 5595.52 | 7206.13 | 685.29 |
| CO | NEAP | 1.0053 | 1.5412 | 9.8478 | 3.0491 | 1.6285 |
| CO | precision | 0.6973 | 0.8271 | 0.6715 | 0.9807 | 0.7781 |
| CO | recall | 0.5123 | 0.2457 | 0.0111 | 0.1184 | 0.2907 |
| CO | F-score | 0.5907 | 0.3789 | 0.0219 | 0.2113 | 0.4233 |
| FHMM | TEE (kW) | 1401.84 | 962.88 | 7904.24 | 5016.79 | 431.27 |
| FHMM | NEAP | 0.9007 | 1.1175 | 13.2448 | 2.1841 | 1.7049 |
| FHMM | precision | 0.7687 | 0.9149 | 0.7007 | 0.9787 | 0.6649 |
| FHMM | recall | 0.5573 | 0.4790 | 0.0084 | 0.2379 | 0.2850 |
| FHMM | F-score | 0.6461 | 0.6288 | 0.0166 | 0.3828 | 0.3990 |
| PS | TEE (kW) | 0.00 | 0.00 | 42.50 | 42.50 | 14.88 |
| PS | NEAP | 0.0000 | 0.0000 | 0.1788 | 0.0473 | 0.1264 |
| PS | precision | 1.0000 | 1.0000 | 0.9416 | 0.9681 | 0.9460 |
| PS | recall | 1.0000 | 1.0000 | 0.8866 | 0.9843 | 0.9289 |
| PS | F-score | 1.0000 | 1.0000 | 0.9133 | 0.9761 | 0.9374 |
Regarding the F-score metric of the PS algorithm,
the comparison of instance #5-5 (smaller sampling
interval) and instance #5-15 (larger sampling interval)
indicates that the results decreased up to 13% for washing
machine, kettle, and dishwasher, while the F-score remained
similar for fridge and home theatre. Values of TEE and
NEAP metrics were all above 65 kW and 0.53 respectively,
regardless of the appliance.
In the three sub-instances studied, the F-score of the PS algorithm was higher than the F-score of CO and FHMM. In addition, PS presented lower values than CO or FHMM for TEE and NEAP, in all cases. The F-score of CO decreased 9% (for fridge) and 13.5% (for home theatre), and did not vary for other appliances. TEE decreased up to one third for all appliances, while NEAP presented similar values. F-score for FHMM decreased up to 17% in all appliances except the washing machine (which increased 3.5%). TEE decreased up to one fifth in all appliances, and NEAP remained similar.

Table 5 Results of CO, FHMM, and PS on instance #3

| algorithm | metric | fridge | washing machine | kettle | dishwasher | home theater |
|---|---|---|---|---|---|---|
| CO | TEE (kW) | 1690.42 | 2194.13 | 6298.06 | 6720.05 | 949.90 |
| CO | NEAP | 0.9386 | 1.6483 | 12.1919 | 3.0818 | 1.7343 |
| CO | precision | 0.8217 | 0.8678 | 0.5876 | 0.9826 | 0.8432 |
| CO | recall | 0.5754 | 0.2400 | 0.0073 | 0.1212 | 0.3250 |
| CO | F-score | 0.6768 | 0.3760 | 0.0145 | 0.2157 | 0.4692 |
| FHMM | TEE (kW) | 2069.24 | 1655.52 | 6273.13 | 6895.43 | 1561.12 |
| FHMM | NEAP | 1.1036 | 1.2927 | 12.1388 | 3.1483 | 2.0024 |
| FHMM | precision | 0.4318 | 0.9067 | 0.6387 | 0.9797 | 0.7645 |
| FHMM | recall | 0.4512 | 0.3677 | 0.0087 | 0.1380 | 0.2942 |
| FHMM | F-score | 0.4413 | 0.5232 | 0.0171 | 0.2419 | 0.4249 |
| PS | TEE (kW) | 4.50 | 82.80 | 15.40 | 89.70 | 13.60 |
| PS | NEAP | 0.0221 | 0.0668 | 0.5000 | 0.1092 | 0.0377 |
| PS | precision | 0.9893 | 0.9771 | 0.7372 | 0.9266 | 0.9845 |
| PS | recall | 0.9886 | 0.9570 | 0.7566 | 0.9629 | 0.9780 |
| PS | F-score | 0.9889 | 0.9670 | 0.7468 | 0.9444 | 0.9812 |

Table 6 Results of CO, FHMM, and PS on instance #4-5

| algorithm | metric | fridge | washing machine | kettle | dishwasher | home theater |
|---|---|---|---|---|---|---|
| CO | TEE (kW) | 2543.42 | 2239.86 | 5208.68 | 7414.75 | 637.84 |
| CO | NEAP | 0.9819 | 1.7824 | 9.6185 | 3.0921 | 1.8408 |
| CO | precision | 0.6597 | 0.7653 | 0.6533 | 0.9826 | 0.7895 |
| CO | recall | 0.5202 | 0.1823 | 0.0121 | 0.1193 | 0.3205 |
| CO | F-score | 0.5817 | 0.2944 | 0.0238 | 0.2128 | 0.4559 |
| FHMM | TEE (kW) | 1829.65 | 699.35 | 8567.56 | 5080.58 | 560.53 |
| FHMM | NEAP | 0.9218 | 1.1591 | 14.6967 | 2.2034 | 1.9148 |
| FHMM | precision | 0.7209 | 0.8403 | 0.6971 | 0.9797 | 0.6961 |
| FHMM | recall | 0.5453 | 0.4634 | 0.0083 | 0.2383 | 0.2931 |
| FHMM | F-score | 0.6210 | 0.5974 | 0.0163 | 0.3834 | 0.4125 |
| PS | TEE (kW) | 182.69 | 62.00 | 42.60 | 111.00 | 145.00 |
| PS | NEAP | 0.0440 | 0.0142 | 0.3221 | 0.0821 | 0.2720 |
| PS | precision | 0.9985 | 1.0000 | 0.8066 | 0.9758 | 0.9666 |
| PS | recall | 0.9957 | 0.9860 | 0.8984 | 0.9787 | 0.9166 |
| PS | F-score | 0.9971 | 0.9930 | 0.8500 | 0.9773 | 0.9409 |
Table 7 Results of CO, FHMM, and PS on instance #4-10

| algorithm | metric | fridge | washing machine | kettle | dishwasher | home theater |
|---|---|---|---|---|---|---|
| CO | TEE (kW) | 1280.93 | 746.31 | 1607.04 | 4981.13 | 395.14 |
| CO | NEAP | 1.0765 | 1.5259 | 6.8454 | 4.1716 | 2.1725 |
| CO | precision | 0.4443 | 0.9471 | 0.6842 | 0.9764 | 0.7920 |
| CO | recall | 0.4132 | 0.2742 | 0.0121 | 0.0623 | 0.2694 |
| CO | F-score | 0.4282 | 0.4252 | 0.0238 | 0.1171 | 0.4020 |
| FHMM | TEE (kW) | 581.89 | 280.14 | 3315.45 | 3020.55 | 460.40 |
| FHMM | NEAP | 0.9491 | 1.2129 | 12.1956 | 2.6429 | 2.2588 |
| FHMM | precision | 0.8070 | 0.9188 | 0.5940 | 0.9685 | 0.8431 |
| FHMM | recall | 0.5167 | 0.4506 | 0.0068 | 0.2137 | 0.2770 |
| FHMM | F-score | 0.6301 | 0.6046 | 0.0135 | 0.3502 | 0.4170 |
| PS | TEE (kW) | 92.62 | 22.00 | 9.20 | 55.80 | 73.27 |
| PS | NEAP | 0.0428 | 0.0100 | 0.3321 | 0.0915 | 0.2675 |
| PS | precision | 0.9990 | 1.0000 | 0.8195 | 0.9705 | 0.9703 |
| PS | recall | 0.9965 | 0.9901 | 0.8790 | 0.9743 | 0.9179 |
| PS | F-score | 0.9977 | 0.9950 | 0.8482 | 0.9724 | 0.9434 |

Table 8 Results of CO, FHMM, and PS on instance #4-15

| algorithm | metric | fridge | washing machine | kettle | dishwasher | home theater |
|---|---|---|---|---|---|---|
| CO | TEE (kW) | 670.46 | 551.67 | 1477.47 | 2706.10 | 191.76 |
| CO | NEAP | 1.0734 | 1.4782 | 9.2879 | 3.3986 | 2.0149 |
| CO | precision | 0.5722 | 0.9498 | 0.6867 | 0.9712 | 0.5858 |
| CO | recall | 0.4674 | 0.2714 | 0.0067 | 0.0993 | 0.2092 |
| CO | F-score | 0.5145 | 0.4222 | 0.0133 | 0.1802 | 0.3084 |
| FHMM | TEE (kW) | 295.81 | 255.21 | 2085.94 | 2036.45 | 209.06 |
| FHMM | NEAP | 0.9604 | 1.2312 | 12.3144 | 2.6414 | 2.0098 |
| FHMM | precision | 0.7612 | 0.9281 | 0.6867 | 0.9510 | 0.6825 |
| FHMM | recall | 0.4861 | 0.4459 | 0.0067 | 0.2021 | 0.2471 |
| FHMM | F-score | 0.5933 | 0.6024 | 0.0132 | 0.3333 | 0.3629 |
| PS | TEE (kW) | 61.38 | 20.00 | 8.30 | 59.70 | 48.73 |
| PS | NEAP | 0.0441 | 0.0136 | 0.5236 | 0.1216 | 0.2657 |
| PS | precision | 0.9982 | 1.0000 | 0.7590 | 0.9424 | 0.9705 |
| PS | recall | 0.9960 | 0.9866 | 0.7590 | 0.9703 | 0.9191 |
| PS | F-score | 0.9971 | 0.9933 | 0.7590 | 0.9561 | 0.9441 |

Figure 4 F-score of CO, FHMM, and PS for appliance fridge in instances #4-5, #4-10, and #4-15 (x-axis: period in minutes; y-axis: F-score)

Figure 5 F-score of CO, FHMM, and PS for appliance kettle in instances #4-5, #4-10, and #4-15 (x-axis: period in minutes; y-axis: F-score)

When comparing the results for all sub-instances of instances #4 and #5 (i.e., lack of noise vs. presence of noise), the results of the PS algorithm are different depending on the sub-instance:
- #4-5 vs #5-5: the F-score for the washing machine and home theatre decreased up to 3%. On the other hand, the F-score increased 4.5% for the kettle and remained equal for the other appliances. Concerning TEE and NEAP, both metrics reduced their values when comparing sub-instance #4-5 with #5-5.
- #4-10 vs #5-10: the F-score for washing machine, kettle, dishwasher, and home theatre decreased from 0.5% up to 9.1%, while it did not vary for the fridge. Regarding TEE, the values decreased for all appliances except the kettle. Results for the NEAP metric showed values lower than 0.35 in both instances, for all appliances.
- #4-15 vs #5-15: the F-score for the washing machine and dishwasher decreased up to 6.5%, while it increased up to 1.4% for the kettle and home theatre. F-score values for the fridge remained similar between instances. For TEE, results decreased for the fridge and increased for the other appliances, maintaining in all cases values below 62 kW. The NEAP metric results show similar values for both instances.
The F-score of the PS algorithm for the washing machine decreases in the three sub-instances with the presence of noise. Similarly, a decrease in the F-score was recorded in two of the three sub-instances for the home theatre and the dishwasher. On the other hand, the fridge kept values similar to those in the sub-instances without noise (i.e., instance #4), while the kettle increases its F-score in two of the three sub-instances. Results in sub-instances with the presence of noise show a tendency towards lower performance for appliances with a low number of activations and long activation times, in the presence of noise and increasing sampling intervals.
The graphic in Figure 6 summarizes the F-score results
of the algorithms CO, FHMM, and PS for the appliance
fridge in the problem instances #5-5, #5-10, and #5-15.
The graphic in Figure 7 summarizes the F-score of the
algorithms for the appliance kettle in the same scenarios.
Figure 6 F-score of CO, FHMM, and PS for appliance fridge in instances #5-5, #5-10, and #5-15 (x-axis: period in minutes; y-axis: F-score)

Figure 7 F-score of CO, FHMM, and PS for appliance kettle in instances #5-5, #5-10 and #5-15 (x-axis: period in minutes; y-axis: F-score)
Results of PS in Figures 6 and 7 show that the F-score for the
appliance fridge, which presents the highest number of activations,
remains unchanged across the instances. In
contrast, results for CO and FHMM decrease across
the instances. On the other hand, the results of PS for
the appliance kettle, which presents the lowest number
of activations, show that the F-score decreases as the
sampling interval increases. The same behaviour would be
expected for the algorithms CO and FHMM, but their
F-score values are too low to draw a conclusion.
The reported results suggest that the presence of extra
appliances that generate noise in the aggregated signal
decreases the F-score of the PS algorithm in
most cases, by up to 9.1%. The decrease in F-score for
PS is observed more frequently in appliances with low
activation times than in those with high activation times. For
CO and FHMM, results suggest that the F-score decreases
independently of the activation time. In general, results
indicate that in the presence of noise, as the sampling
interval increases, the F-score is degraded by
up to 13%.
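The effect of the sampling interval on PS can be read directly from the F-score rows of Tables 9 to 11. The following Python sketch copies those values and computes, for each appliance, the change between the 5-minute and 15-minute sub-instances of instance #5. Expressing the change relative to the 5-minute score is an assumption made here for illustration; the text does not state whether its percentages are relative or absolute. The names `PS_F_SCORE` and `relative_drop` are hypothetical.

```python
# F-scores of PS on instances #5-5, #5-10 and #5-15 (Tables 9, 10 and 11).
# Keys of the inner dictionaries are sampling intervals in minutes.
PS_F_SCORE = {
    "fridge": {5: 0.9999, 10: 0.9999, 15: 0.9999},
    "washing machine": {5: 0.9658, 10: 0.9376, 15: 0.9281},
    "kettle": {5: 0.8921, 10: 0.8278, 15: 0.7708},
    "dishwasher": {5: 0.9709, 10: 0.9534, 15: 0.9341},
    "home theater": {5: 0.9369, 10: 0.9388, 15: 0.9384},
}


def relative_drop(first: float, last: float) -> float:
    """Decrease from `first` to `last`, as a fraction of `first` (assumed measure)."""
    return (first - last) / first


# Relative drop between the shortest and the longest sampling interval.
drops = {
    appliance: relative_drop(scores[5], scores[15])
    for appliance, scores in PS_F_SCORE.items()
}

for appliance, drop in drops.items():
    print(f"{appliance:16s} {100 * drop:6.2f}%")
```

Under this assumption the kettle shows the largest drop (about 13.6%), while the fridge is unchanged and the home theater varies by less than 0.2%.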
Summary
Overall, the proposed PS algorithm achieved satisfactory
results for all the studied instances of the power
consumption disaggregation problem.
Regarding the F-score metric, for instances without noise
and fixed sampling intervals of 5 minutes, improvements
of PS were 60% over CO and 57% over FHMM on average,
and up to 64% over CO in problem instance #4-5 and
up to 60% over FHMM in problem instance #3. When
considering different sampling intervals, improvements
were 69% over CO and 59% over FHMM on average, and up
to 98% over CO and FHMM in problem instances #4-15 and
#4-5, respectively. For instance #4-15, with the maximum
sampling interval, results improved between 48% (worst
case) and 98% (best case), with an average improvement
of 70%, over CO. Similar results were obtained when
comparing with FHMM: PS results improved 98% in the
best case, 60% on average, and 39% in the worst case.
In problem instances with noise and different sampling
intervals, PS improved over baseline CO results up to 98%
in the best case (instance #5-15), 69% on average, and
43% in the worst case (instance #5-5). For baseline FHMM
results, PS improved up to 98% in the best case (the three
sub-instances), 61% on average, and 32% in the worst
case (instance #5-5). For instance #5-15, improvements
of PS ranged from 48% to 98% over CO and from 39% to
98% over FHMM (F-score values in Table 11).
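The summary does not restate how the improvement percentages are obtained. One reading that reproduces the worst-case and best-case figures quoted for instance #5-5 is the F-score gain relative to the PS score, (F_PS - F_baseline) / F_PS. This is an inference from the values in Table 9, not a definition taken from the article, and the names used in the sketch below are hypothetical.

```python
# F-score rows of Table 9 (instance #5-5), one list entry per appliance:
# fridge, washing machine, kettle, dishwasher, home theater.
APPLIANCES = ["fridge", "washing machine", "kettle", "dishwasher", "home theater"]
F_SCORE_5_5 = {
    "CO": [0.567, 0.3851, 0.0199, 0.17, 0.3742],
    "FHMM": [0.6781, 0.543, 0.0166, 0.3495, 0.3839],
    "PS": [0.9999, 0.9658, 0.8921, 0.9709, 0.9369],
}


def improvement(ps: float, baseline: float) -> float:
    """Gain of PS over a baseline, relative to the PS score (assumed definition)."""
    return (ps - baseline) / ps


# Per-appliance improvement of PS over each baseline algorithm.
improvements = {
    baseline: [
        improvement(ps, other)
        for ps, other in zip(F_SCORE_5_5["PS"], F_SCORE_5_5[baseline])
    ]
    for baseline in ("CO", "FHMM")
}

for baseline, values in improvements.items():
    worst = min(values)
    best = max(values)
    print(
        f"PS over {baseline}: worst {100 * worst:.1f}% "
        f"({APPLIANCES[values.index(worst)]}), "
        f"best {100 * best:.1f}% ({APPLIANCES[values.index(best)]})"
    )
```

With the Table 9 values, the worst cases are about 43% over CO and 32% over FHMM (both for the fridge), and the best cases are about 98% (both for the kettle), in line with the figures reported above for instance #5-5.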
Furthermore, PS systematically obtained the lowest
values of both TEE and NEAP for all instances. The
degradation of results obtained for the kettle in problem
instances with ambiguity, long sampling periods, or noise
suggests that its low percentage of operating time (0.5%
of the total time in ON state) negatively affects the results.
The more complex the dataset is, the more consumption
data are needed in the testing dataset, especially to
capture the ON/OFF behavior of appliances with shorter
operating time.
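For readers unfamiliar with the two error metrics, the sketch below shows how they can be computed, assuming the definitions given in the NILMTK toolkit paper [7]: the error in total energy assigned is the absolute difference between the summed estimated and summed actual consumption of an appliance, and the normalized error in assigned power is the sum of absolute per-sample errors divided by the total actual consumption. The definitions used in this article are stated in an earlier section and may differ in detail; the function names and the sample readings are hypothetical.

```python
from typing import Sequence


def total_energy_error(truth: Sequence[float], estimate: Sequence[float]) -> float:
    """Absolute difference between total estimated and total actual consumption."""
    return abs(sum(estimate) - sum(truth))


def normalized_error_assigned_power(
    truth: Sequence[float], estimate: Sequence[float]
) -> float:
    """Sum of absolute per-sample errors, normalized by total actual consumption."""
    total = sum(truth)
    if total == 0:
        raise ValueError("the appliance never consumed power in the ground truth")
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / total


# Hypothetical per-sample consumption of one appliance (ground truth vs. estimate).
truth = [0.0, 100.0, 100.0, 0.0]
estimate = [0.0, 90.0, 120.0, 10.0]

tee = total_energy_error(truth, estimate)
neap = normalized_error_assigned_power(truth, estimate)
print(f"TEE = {tee}, NEAP = {neap}")
```

Under these definitions, positive and negative per-sample errors can cancel in TEE but not in NEAP, so the two metrics are best read together.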
The graphics in Figures 8 and 9 summarize the F-score
obtained by the CO, FHMM, and PS algorithms for all studied
instances with a sampling interval of 5 minutes for the fridge
and the washing machine, respectively. These appliances
were selected because they present the longest (fridge)
and the mean (washing machine) activation times. In all
scenarios, PS obtained considerably better results than the
baseline algorithms.
Figure 8 F-score of CO, FHMM, and PS for instances with a
sampling interval of 5 minutes, for appliance fridge
(x-axis: instances #1, #2, #3, #4-5, #5-5; y-axis: F-score, 0 to 1)
Figure 9 F-score of CO, FHMM, and PS for instances
with a sampling interval of 5 minutes, for appliance
washing machine
(x-axis: instances #1, #2, #3, #4-5, #5-5; y-axis: F-score, 0 to 1)
Table 9 Results of CO, FHMM, and PS on instance #5-5
algorithm | metric    | fridge  | washing machine | kettle  | dishwasher | home theater
CO        | TEE (kW)  | 2217.76 | 1639.73         | 5802.36 | 8796.79    | 840.29
CO        | NEAP      | 1.0307  | 1.6097          | 10.1835 | 3.6616     | 1.9241
CO        | precision | 0.6714  | 0.7493          | 0.6241  | 0.9865     | 0.7221
CO        | recall    | 0.4906  | 0.2591          | 0.0101  | 0.093      | 0.2525
CO        | F-score   | 0.567   | 0.3851          | 0.0199  | 0.17       | 0.3742
FHMM      | TEE (kW)  | 1080.94 | 1476.69         | 8210.2  | 5593.23    | 599.52
FHMM      | NEAP      | 0.8928  | 1.2786          | 13.6761 | 2.408      | 1.7635
FHMM      | precision | 0.8592  | 0.8769          | 0.7263  | 0.9787     | 0.7177
FHMM      | recall    | 0.56    | 0.3932          | 0.0084  | 0.2127     | 0.262
FHMM      | F-score   | 0.6781  | 0.543           | 0.0166  | 0.3495     | 0.3839
PS        | TEE (kW)  | 0.5     | 36.0            | 20.0    | 20.0       | 10.16
PS        | NEAP      | 0.0001  | 0.0686          | 0.219   | 0.058      | 0.1269
PS        | precision | 0.9999  | 0.9698          | 0.9051  | 0.9671     | 0.9428
PS        | recall    | 1.0     | 0.9619          | 0.8794  | 0.9747     | 0.9311
PS        | F-score   | 0.9999  | 0.9658          | 0.8921  | 0.9709     | 0.9369

Table 10 Results of CO, FHMM, and PS on instance #5-10
algorithm | metric    | fridge  | washing machine | kettle  | dishwasher | home theater
CO        | TEE (kW)  | 782.62  | 1164.42         | 2063.2  | 4600.55    | 325.91
CO        | NEAP      | 0.9966  | 1.6854          | 8.0059  | 3.9545     | 1.8397
CO        | precision | 0.6627  | 0.9334          | 0.782   | 0.9606     | 0.7179
CO        | recall    | 0.4833  | 0.2249          | 0.0108  | 0.0729     | 0.2315
CO        | F-score   | 0.5589  | 0.3625          | 0.0213  | 0.1355     | 0.3501
FHMM      | TEE (kW)  | 373.1   | 635.38          | 3384.24 | 3238.49    | 483.63
FHMM      | NEAP      | 0.9853  | 1.3113          | 12.0121 | 2.8532     | 2.0886
FHMM      | precision | 0.7949  | 0.9434          | 0.5865  | 0.9587     | 0.7987
FHMM      | recall    | 0.4918  | 0.3924          | 0.0064  | 0.1908     | 0.2492
FHMM      | F-score   | 0.6077  | 0.5543          | 0.0127  | 0.3183     | 0.3799
PS        | TEE (kW)  | 0.5     | 58.0            | 17.5    | 17.5       | 6.32
PS        | NEAP      | 0.0002  | 0.1232          | 0.3534  | 0.0925     | 0.1234
PS        | precision | 0.9998  | 0.9252          | 0.8496  | 0.9469     | 0.946
PS        | recall    | 1.0     | 0.9503          | 0.8071  | 0.9601     | 0.9316
PS        | F-score   | 0.9999  | 0.9376          | 0.8278  | 0.9534     | 0.9388

6. Conclusions and future work
This article presented a proposal to address the problem
of household energy disaggregation using a non-intrusive
approach. The PS algorithm, based on detecting
similarities between power consumption patterns, was proposed.
The method works in two stages:
the training stage, which creates the data used to find pattern
similarities; and the testing stage, which looks for pattern
similarities between the training data and the data to be
disaggregated.
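The two stages can be illustrated with a deliberately simplified sketch. It is not the PS algorithm evaluated in this article, whose details are given in an earlier section; it is a nearest-neighbour toy that assumes fixed-width windows of the aggregate signal and a sum-of-absolute-differences distance. The function names, the window width, and all sample readings are hypothetical.

```python
from typing import Dict, List, Tuple

Window = Tuple[float, ...]
States = Dict[str, bool]
Model = List[Tuple[Window, States]]


def train(aggregate: List[float], labels: List[States], width: int) -> Model:
    """Training stage: store each aggregate window with the labelled
    appliance states observed at the last sample of that window."""
    return [
        (tuple(aggregate[i - width + 1 : i + 1]), labels[i])
        for i in range(width - 1, len(aggregate))
    ]


def distance(a: Window, b: Window) -> float:
    """Sum of absolute differences between two windows (assumed similarity measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def disaggregate(model: Model, aggregate: List[float], width: int) -> List[States]:
    """Testing stage: for each window of the unlabelled aggregate signal,
    copy the states of the most similar training window."""
    result = []
    for i in range(width - 1, len(aggregate)):
        window = tuple(aggregate[i - width + 1 : i + 1])
        _, states = min(model, key=lambda entry: distance(entry[0], window))
        result.append(states)
    return result


# Hypothetical labelled recording: aggregate power and per-appliance ON/OFF states.
train_aggregate = [0.0, 100.0, 2100.0, 2100.0, 100.0, 0.0, 2000.0, 0.0]
fridge_on = [False, True, True, True, True, False, False, False]
kettle_on = [False, False, True, True, False, False, True, False]
train_labels = [{"fridge": f, "kettle": k} for f, k in zip(fridge_on, kettle_on)]

model = train(train_aggregate, train_labels, width=2)

# Hypothetical unlabelled aggregate signal to be disaggregated.
test_aggregate = [0.0, 95.0, 2080.0, 110.0, 5.0]
predicted = disaggregate(model, test_aggregate, width=2)
for states in predicted:
    print(states)
```

Using a window rather than a single sample lets the lookup distinguish, for example, a reading reached by switching an appliance on from the same reading reached by switching another one off, which is one simple way of reducing ambiguity between appliances with similar consumption.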
The experimental evaluation was performed over
realistic problem instances that consider the presence of
ambiguous appliance consumption, different sampling
intervals, and consumption from extra appliances that are not
intended to be disaggregated but modify the aggregate
signal. Results were compared with two baseline
algorithms, CO and FHMM, from the related literature.
Table 11 Results of CO, FHMM, and PS on instance #5-15
algorithm | metric    | fridge  | washing machine | kettle  | dishwasher | home theater
CO        | TEE (kW)  | 439.5   | 764.0           | 1571.18 | 2728.08    | 261.97
CO        | NEAP      | 1.0791  | 1.5812          | 9.43    | 3.492      | 2.0609
CO        | precision | 0.6438  | 0.9552          | 0.6506  | 0.9597     | 0.6819
CO        | recall    | 0.4326  | 0.2385          | 0.0067  | 0.0989     | 0.2149
CO        | F-score   | 0.5175  | 0.3817          | 0.0134  | 0.1794     | 0.3268
FHMM      | TEE (kW)  | 202.69  | 403.58          | 2172.2  | 2279.77    | 251.7
FHMM      | NEAP      | 1.0065  | 1.308           | 12.3287 | 2.9379     | 2.042
FHMM      | precision | 0.7144  | 0.9349          | 0.6506  | 0.9625     | 0.7069
FHMM      | recall    | 0.4617  | 0.4043          | 0.0061  | 0.1723     | 0.2221
FHMM      | F-score   | 0.5609  | 0.5645          | 0.0121  | 0.2922     | 0.3381
PS        | TEE (kW)  | 0.5     | 30.0            | 65.0    | 65.0       | 4.0
PS        | NEAP      | 0.0002  | 0.1452          | 0.5301  | 0.1268     | 0.1241
PS        | precision | 1.0     | 0.9376          | 0.8916  | 0.8991     | 0.9453
PS        | recall    | 0.9998  | 0.9189          | 0.6789  | 0.972      | 0.9317
PS        | F-score   | 0.9999  | 0.9281          | 0.7708  | 0.9341     | 0.9384
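The precision, recall, and F-score rows of Table 11 can be cross-checked against each other. The sketch below assumes the standard definition of the F-score as the harmonic mean of precision and recall; the article defines its metrics in an earlier section, so this is a consistency check under that assumption, with hypothetical names.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (algorithm, appliance, precision, recall, reported F-score), from Table 11.
TABLE_11_ROWS = [
    ("CO", "fridge", 0.6438, 0.4326, 0.5175),
    ("FHMM", "washing machine", 0.9349, 0.4043, 0.5645),
    ("PS", "kettle", 0.8916, 0.6789, 0.7708),
]

# Recompute each F-score from the reported precision and recall.
recomputed = [f_score(p, r) for _, _, p, r, _ in TABLE_11_ROWS]

for (algorithm, appliance, _, _, reported), value in zip(TABLE_11_ROWS, recomputed):
    print(f"{algorithm:4s} {appliance:16s} reported {reported:.4f} recomputed {value:.4f}")
```

For these three rows the recomputed values agree with the reported ones to four decimal places. The kettle row also shows why the F-score is informative here: the baselines reach a precision of 0.65 for the kettle, but their recall below 0.01 drives the F-score close to zero.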
PS achieved very satisfactory results, significantly
outperforming CO and FHMM, with improvements in
the F-score up to 64% for instances #1–#4-5, up to
69% for sub-instances of instance #4 and up to 98% for
sub-instances of instance #5.
Overall, the obtained results showed that the proposed PS
algorithm is effective for addressing the problem of energy
consumption disaggregation. The proposed approach
can be applied in practice as the first step for household
energy planning by using intelligent recommendation
systems [23].
The main lines for future work are related to performing an
in-depth study of the training parameters of the proposed
algorithm, in order to capture those patterns that currently
are not learnt due to uncertainty or insufficient information
available to solve ambiguities. Furthermore, the proposed
approach must be extended by including the study of
instances with the presence of multi-state or continuously
variable appliances. The proposed methods can also
be integrated into more sophisticated computational
intelligence methods (e.g., long short-term memory neural
networks) to solve the problem.
7. Declaration of competing interest
None declared under financial, professional and personal
competing interests.
8. Acknowledgements
The research was partially supported by ‘Comisión
Sectorial de Investigación Científica’, Universidad de la
República, Uruguay, and National Electricity Company
(UTE), Uruguay, under project ‘Computational intelligence
to characterize household energy consumption’. The
work of S. Nesmachnow was partly supported by ANII and
PEDECIBA, Uruguay.
References
[1] International Energy Agency, “World Energy Outlook 2015,” White
paper, 2015.
[2] D. Larcher and J. Tarascon, “Towards greener and more sustainable
batteries for electrical energy storage,” Nature Chemistry, vol. 7,
no. 1, pp. 19–29, 2015.
[3] R. Ford, “Reducing domestic energy consumption through behaviour
modification,” Ph.D. dissertation, Oxford University, 2009.
[4] E. Luján, A. Otero, S. Valenzuela, E. Mocskos, L. Steffenel, and
S. Nesmachnow, “Cloud Computing for Smart Energy Management
(CC-SEM Project),” in Smart Cities, ser. Communications in
Computer and Information Science. Springer, 2019, vol. 978.
[5] E. Orsi and S. Nesmachnow, “Smart home energy planning using IoT
and the cloud,” in IEEE URUCON, 2017.
[6] R. Massobrio, S. Nesmachnow, A. Tchernykh, A. Avetisyan, and
G. Radchenko, “Towards a cloud computing paradigm for big data
analysis in smart cities,” Programming and Computer Software,
vol. 44, no. 3, pp. 181–189, 2018.
[7] N. Batra, J. Kelly, O. Parson, H. Dutta, W. Knottenbelt, A. Rogers,
A. Singh, and M. Srivastava, “NILMTK: an open source toolkit for
non-intrusive load monitoring,” in 5th International Conference on
Future Energy Systems, 2014, pp. 265–276.
[8] R. Porteiro, S. Nesmachnow, and L. Hernández, “Short term load
forecasting of industrial electricity using machine learning,” in
Smart Cities, ser. Communications in Computer and Information
Science, S. Nesmachnow and L. Hernández, Eds. Springer, 2019,
vol. 1152.
[9] B. Neenan, J. Robinson, and R. Boisvert, “Residential electricity use
feedback: A research synthesis and economic framework,” Electric
Power Research Institute, 2009.
[10] J. Kelly and W. Knottenbelt, “The UK-DALE dataset, domestic
appliance-level electricity demand and whole-house demand from
five UK homes,” Scientific Data, vol. 2, 2015.
[11] J. Chavat, J. Graneri, and S. Nesmachnow, “Household energy
disaggregation based on pattern consumption similarities,” in 2nd
Iberoamerican Congress on Smart Cities, 2019.
[12] M. Figueiredo, A. De Almeida, and B. Ribeiro, “Home electrical signal
disaggregation for non-intrusive load monitoring (NILM) systems,”
Neurocomputing, vol. 96, pp. 66–73, 2012.
[13] G. Hart, “Nonintrusive appliance load monitoring,” Proceedings of
the IEEE, vol. 80, no. 12, pp. 1870–1891, 1992.
[14] R. Bongli, S. Squartini, M. Fagiani, and F. Piazza, “Unsupervised
algorithms for non-intrusive load monitoring: An up-to-date
overview,” in 15th International Conference on Environment and
Electrical Engineering, 2015.
[15] J. Kelly and W. Knottenbelt, “Neural NILM: Deep Neural Networks
Applied to Energy Disaggregation,” in 2nd ACM International
Conference on Embedded Systems for Energy-Efficient Built
Environments, 2015, pp. 55–64.
[16] J. Kolter and M. Johnson, “REDD: A public data set for energy
disaggregation research,” in Workshop on Data Mining Applications
in Sustainability, 2011, pp. 59–62.
[17] A. Soares, A. Gomes, and C. Antunes, “Categorization of residential
electricity consumption as a basis for the assessment of the impacts
of demand response actions,” Renewable and Sustainable Energy
Reviews, vol. 30, pp. 490–503, 2014.
[18] J. Kelly and W. Knottenbelt, “Metadata for Energy Disaggregation,”
in The 2nd IEEE International Workshop on Consumer Devices and
Systems, Västerås, Sweden, Jul. 2014.
[19] J. Gibbons and S. Chakraborti, Nonparametric Statistical Inference.
CRC Press, 2003.
[20] S. Nesmachnow and S. Iturriaga, “Cluster-UY: High Performance
Scientific Computing in Uruguay,” in International Supercomputing
Conference in Mexico, 2019.
[21] Z. Ghahramani and M. Jordan, “Factorial hidden Markov models,”
in Advances in Neural Information Processing Systems, 1996, pp.
472–478.
[22] H. Kim, M. Marwah, M. Arlitt, G. Lyon, and J. Han, “Unsupervised
disaggregation of low frequency power measurements,” in SIAM
international conference on data mining, 2011, pp. 747–758.
[23] G. Colacurcio, S. Nesmachnow, J. Toutouh, F. Luna, and D. Rossit,
“Multiobjective household energy planning using evolutionary
algorithms,” in Smart Cities, ser. Communications in Computer
and Information Science, S. Nesmachnow and L. Hernández, Eds.
Springer, 2019, vol. 1152.
46
... The more commonly used unsupervised techniques are hidden Markov models (HMMs), which define several hidden states that the model can transition to, thus representing the operational condition of the device (on, off, and intermediate states). They then relate these states to observable results based on the analyzed consumption data [4]. ...
Article
Full-text available
Climate change, primarily driven by human activities such as burning fossil fuels, is causing significant long-term changes in temperature and weather patterns. To mitigate these impacts, there is an increased focus on renewable energy sources. However, optimizing power consumption through effective usage control and waste recycling also offers substantial potential for reducing energy demands. This study explores non-intrusive load monitoring (NILM) to estimate disaggregated energy consumption from a single household meter, leveraging advancements in deep learning such as convolutional neural networks. The study uses the UK-DALE dataset to extract and plot power consumption data from the main meter and identify five household appliances. Convolutional neural networks (CNNs) are trained with transfer learning using VGG16 and MobileNet. The models are validated, tested on split datasets, and combined using ensemble methods for improved performance. A new voting scheme for ensembles is proposed, named weighted average confidence voting (WeCV), and it is used to create combinations of the best 3 and 5 models and applied to NILM. The base models achieve up to 97% accuracy. The ensemble methods applying WeCV show an increased accuracy of 98%, surpassing previous state-of-the-art results. This study shows that CNNs with transfer learning effectively disaggregate household energy use, achieving high accuracy. Ensemble methods further improve performance, offering a promising approach for optimizing energy use and mitigating climate change.
... At the device level, some authors treated the problem by detecting the appliance "on" (positive) state or event and"off" (negative) state or event, e.g. [12], [59], [60]; similar to dichotomous classification. The appliance is considered in the "on" state if the predicted power exceeds a certain per-appliance threshold. ...
Article
Full-text available
This paper examines a contextual paradigm for energy disaggregation using Non-Intrusive Load Monitoring (NILM). Due to numerous issues including low sampling rates, missing data, misaligned readings, and diverse combinations of nonlinear and multi-state appliances, this problem is challenging and complex. We proposed two different deep learning models for household energy disaggregation with shared parameter learning based on Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs). The proposed models utilize a sliding window of the mains aggregate readings to predict the per-appliance consumption at the end point of the sequence; using the entire input sequence gives more contextual information and reduces the prediction complexity in other problem settings. We evaluated the performance using two benchmark datasets, ENERTALK and UK-DALE, under different settings and scenarios including sampling rates, imputation methods, cross-dataset generalization, and single and multi-target settings. The results demonstrate that the proposed models show better robustness and generalization capability than the other sequence-to-point models when no consumption information is discarded in the alignment process, especially for cross-domain disaggregation.
... • Design and analysis of methods for residential electricity consumption disaggregation, with the main goal of automatically determining the appliances switched on in a household, using as input the total electricity consumption reported by a smart meter and other relevant features, by applying computational intelligence 6,23 . These techniques allow overcoming the difficulties and costs of implementing intrusive measurements, which usually are only performed in a small number of households and used as input for computational intelligence and machine learning methods. ...
Article
Full-text available
This article introduces a dataset containing electricity consumption records of residential households in Uruguay (mostly in Montevideo). The dataset is conceived to analyze customer behavior and detect patterns of energy consumption that can help to improve the service. The dataset is conformed by three subsets that cover total household consumption, electric water heater consumption, and by-appliance electricity consumption, with sample intervals from one to fifteen minutes. The datetime ranges of the recorded consumptions vary depending on the subset, from some weeks long to some years long. The data was collected by the Uruguayan electricity company (UTE) and studied by Universidad de la República. The presented dataset is a valuable input for researchers in the study of energy consumption patterns, energy disaggregation, the design of energy billing plans, among other relevant issues related to the intelligent utilization of energy in modern smart cities.
... This article proposes a novel mixed integer programming model for scheduling deferrable electric appliances in households, which simultaneously considers minimizing the electricity cost and maximizing the users satisfaction. Users satisfaction measures to what extend the starting time and duration for appliances usage scheduled by the model match the users preferences, which is estimated through the analysis of historical data [6][7][8]. However, since this parameter can show certain variability between different days, stochastic resolution approaches that consider this uncertain behaviour are devised. ...
Article
Full-text available
In the last decades, cities have increased the number of activities and services that depends on an efficient and reliable electricity service. In particular, households have had a sustained increase of electricity consumption to perform many residential activities. Thus, providing efficient methods to enhance the decision making processes in demand-side management is crucial for achieving a more sustainable usage of the available resources. In this line of work, this article presents an optimization model to schedule deferrable appliances in households, which simultaneously optimize two conflicting objectives: the minimization of the cost of electricity bill and the maximization of users satisfaction with the consumed energy. Since users satisfaction is based on human preferences, it is subjected to a great variability and, thus, stochastic resolution methods have to be applied to solve the proposed model. In turn, a maximum allowable power consumption value is included as constraint, to account for the maximum power contracted for each household or building. Two different algorithms are proposed: a simulation-optimization approach and a greedy heuristic. Both methods are evaluated over problem instances based on real-world data, accounting for different household types. The obtained results show the competitiveness of the proposed approach, which are able to compute different compromising solutions accounting for the trade-off between these two conflicting optimization criteria in reasonable computing times. The simulation-optimization obtains better solutions, outperforming and dominating the greedy heuristic in all considered scenarios.
Chapter
This article presents a study to characterize the electricity consumption in residential buildings in Uruguay. Understanding residential electricity consumption is a relevant concept to identify factors that influence electricity usage, and allows developing specific and custom energy efficiency policies. The study focuses on two home appliances: air conditioner and water heater, which represents a large share of the electricity consumption of Uruguayan households. A data-analysis approach is applied to process several data sources and compute relevant indicators. Statistical methods are applied to study the relationships between different relevant variables, including appliance ownership, average income of households, and temperature, and the residential electricity consumption. A specific application of the data analysis is presented: a regression model to determine the consumption patterns of water heaters in households. Results show that the proposed approach is able to compute good values for precision, recall and F1-score and an excellent value for accuracy (0.92). These results are very promising for conducting an economic analysis that takes into account the investment cost of remotely controlling water heaters and the benefits derived from managing their demand.
Article
This article presents an approach applying computational intelligence for detecting the use of air conditioners in households. The main objective is determining the intensive use of air conditioning with a high level of confidence. Unsupervised (-means) and supervised (Artificial Neural Networks) approaches are developed for classifying consumers for a case study in Uruguay, using data collected by a smart meters network and open weather data. Data from 29 Uruguayan cities were considered in the period from January 1, 2022, to December 31, 2022. Two thermal models are developed for estimating the temperature inside households. The main results indicate that the proposed approach is able to reach a high classification accuracy, up to 94.5% and a high classification recall, up to 95%, for the considered real case study. The final scope of the work is developing smart tools for classifying consumers, to design and suggest specific commercial products that promote energy efficiency.
Chapter
This article presents an unsupervised machine learning approach for the problem of detecting use of air conditioning in households, during the summer. This is a relevant problem in the context of the modern smart grid approach under the paradigm of smart cities. The proposed methodology applies data analysis, a thermal inertial model for estimating the temperature inside a household, statistical analysis, clustering, and classification. The proposed model is validated on a real case study, considering households with known use of air conditioning in summer. In the evaluation, the proposed classification methodology reached an accuracy of 0.897, a promising result considering the very small cardinality of the set of households. The proposed method is valuable since it applies an unsupervised approach, which does not require large volumes of labeled data for training, and allows determining characteristics in the electricity consumption patterns that are useful for categorization. In turn, it is a non-intrusive method and does not require investing in the installation of complex devices or conducting consumer surveys.KeywordsUnsupervised learningData analysisResidential electricity consumption
Article
Non-intrusive load monitoring (NILM) is among successful approaches aiding residential energy management. However, the presence of multi-mode appliances and appliances with close power values and lack of a proper volume of training dataset have remained influential in worsening the computational complexity and diminishing the accuracy of classification-based NILM algorithms. To tackle these challenges, we propose a novel classification process, which considers the correlation of water and electricity consumption of some appliances as a novel signature in the network to improve the accuracy of disaggregation process in overlapping modes and tackle the lack of proper volume of training dataset. In the first phase of the proposed method, the K-nearest neighbors method, as a fast classification technique, is employed to extract power signals of appliances with exclusive non-close power values. Then, two different deep learning-based methods are proposed to disaggregate the consumption of appliances with close consumption values considering the correlation of electricity and water consumption of some appliances. Throughout these methods the water consumption of these appliances are also disaggregated. The main objectives of the proposed methods are increasing accuracy in the close modes of power consumption, and informing consumers about the water consumption pattern of some appliances. To illustrate the proposed processes and validate its effectiveness, the Almanac Minutely Power Dataset as a real dataset with a sampling rate of 1-minute is considered. The numerical results show marked improvement with respect to the existing classification-based NILM techniques. Moreover, it shows the applicability of proposed methods in dealing with low-frequency readings carried out by existing smart meters.
Article
Residential energy flexibility is considered one of the efficient concepts to alleviate the ever-increasing concerns of better balancing supply and demand. A positive assumption that all buildings have the same energy flexibility potential, is not applicable in a realistic situation especially when direct load control is not applied for each. This paper proposes a novel approach to characterize the energy flexibility of shiftable appliances and EVs (as two main sources of energy flexibility) protecting consumers’ data privacy and considering the usage behavior. First, an xg-boost regression is utilized to non-intrusively extract the consumption of appliances. Then, the uncertainty of the power values and the operation time of each appliance is computed based on the extracted consumption patterns. Finally, a price-based DR model is used to determine their energy flexibility and prioritize them according to the change in their operating time before and after optimization. Case studies are conducted and results prove that the proposed method is computationally cost-effective and outperforms other methods in terms of accuracy, customer privacy and comfort. Moreover, the results show that the proposed model can significantly decrease the flattening signal up to 3% for each residential building and up to 25% in the residential aggregated level.
Article
Power supply is one of the basic needs in modern smart homes. Computer-aid tools help optimizing energy utilization, contributing to sustainable goals of modern societies. For this purpose, this article presents a mathematical formulation to the household energy planning problem and a specific resolution method to build schedules for using deferrable electric that can reduce the cost of the electricity bill while keeping user satisfaction at a satisfactory level. User satisfaction have a great variability, since it is based on human preferences, thus a stochastic simulation-optimization approach is applied for handling uncertainty in the optimization process. Results over instances based on real-world data show the competitiveness of the proposed approach, which is able to compute different compromise solution accounting for the trade-off between these two conflicting optimization criteria.
Preprint
Full-text available
This article presents the advances in the design and implementation of a recommendation system for planning the use of household appliances, focused on improving energy efficiency from the point of view of both energy companies and end-users. The system proposes using historical information and data from sensors to define instances of the planning problem considering user preferences, which in turn are proposed to be solved using a multiobjective evolutionary approach, in order to minimize energy consumption and maximize quality of service offered to users. Promising results are reported on realistic instances of the problem, compared with situations where no intelligent energy planning are used (i.e., ‘Bussiness as Usual’ model) and also with a greedy algorithm developed in the framework of the reference project. The proposed evolutionary approach was able to improve up to 29.0% in energy utilization and up to 65.3% in user preferences over the reference methods.
Conference Paper
Full-text available
This paper describes the CC-SEM project, a research effort focused on building an integrated platform for smart monitoring, controlling, and planning energy consumption and generation in urban scenarios. The project integrates cutting-edge technologies (Big Data analysis, computational intelligence, Internet of Things, High Performance Computing and Cloud Computing), specific hardware for energy monitoring/controlling built within the project and explores their communication. The proposed platform considers the point of view of both citizens and administrators, providing a set of tools for controlling home devices (for end users), planning/simulating scenarios of energy generation (for energy companies and administrators), and shows some advances in communication infrastructure for transmitting the generated data.
Article
Full-text available
In this paper, we present a Big Data analysis paradigm related to smart cities using cloud computing infrastructures. The proposed architecture follows the MapReduce parallel model implemented using the Hadoop framework. We analyse two case studies: a quality-of-service assessment of public transportation system using historical bus location data, and a passenger-mobility estimation using ticket sales data from smartcards. Both case studies use real data from the transportation system of Montevideo, Uruguay. The experimental evaluation demonstrates that the proposed model allows processing large volumes of data efficiently.
Conference Paper
Full-text available
Research on Smart Grids has recently focused on the energy monitoring issue, with the objective to maximize the user consumption awareness in building contexts on one hand, and to provide a detailed description of customer habits to the utilities on the other. One of the hottest topic in this field is represented by Non-Intrusive Load Monitoring (NILM): it refers to those techniques aimed at decomposing the consumption aggregated data acquired at a single point of measurement into the diverse consumption profiles of appliances operating in the electrical system under study. The focus here is on unsupervised algorithms, which are the most interesting and of practical use in real case scenarios. Indeed, these methods rely on a sustainable amount of a-priori knowledge related to the applicative context of interest, thus minimizing the user intervention to operate, and are targeted to extract all information to operate directly from the measured aggregate data. This paper reports and describes the most promising unsupervised NILM methods recently proposed in the literature, by dividing them into two main categories: load classification and source separation approaches. An overview of the public available dataset used on purpose and a comparative analysis of the algorithms performance is provided, together with a discussion of challenges and future research directions.
Conference Paper
Non-intrusive load monitoring, or energy disaggregation, aims to separate household energy consumption data collected from a single point of measurement into appliance-level consumption data. In recent years, the field has rapidly expanded due to increased interest as national deployments of smart meters have begun in many countries. However, empirically comparing disaggregation algorithms is currently virtually impossible. This is due to the different data sets used, the lack of reference implementations of these algorithms and the variety of accuracy metrics employed. To address this challenge, we present the Non-intrusive Load Monitoring Toolkit (NILMTK), an open source toolkit designed specifically to enable the comparison of energy disaggregation algorithms in a reproducible manner. This work is the first research to compare multiple disaggregation approaches across multiple publicly available data sets. Our toolkit includes parsers for a range of existing data sets, a collection of preprocessing algorithms, a set of statistics for describing data sets, two reference benchmark disaggregation algorithms and a suite of accuracy metrics. We demonstrate the range of reproducible analyses which are made possible by our toolkit, including the analysis of six publicly available data sets and the evaluation of both benchmark disaggregation algorithms across such data sets.
Article
Many countries are rolling out smart electricity meters. These measure a home's total power demand. However, research into consumer behaviour suggests that consumers are best able to improve their energy efficiency when provided with itemised, appliance-by-appliance consumption information. Energy disaggregation is a computational technique for estimating appliance-by-appliance energy consumption from a whole-house meter signal. To conduct research on disaggregation algorithms, researchers require data describing not just the aggregate demand per building but also the 'ground truth' demand of individual appliances. In this context, we present UK-DALE: an open-access dataset from the UK recording Domestic Appliance-Level Electricity at a sample rate of 16 kHz for the whole-house and at 1/6 Hz for individual appliances. This is the first open access UK dataset at this temporal resolution. We recorded from five houses, one of which was recorded for 655 days, the longest duration we are aware of for any energy dataset at this sample rate. We also describe the low-cost, open-source, wireless system we built for collecting our dataset.
Conference Paper
Energy disaggregation is the process of estimating the energy consumed by individual electrical appliances given only a time series of the whole-home power demand. Energy disaggregation researchers require datasets of the power demand from individual appliances and the whole-home power demand. Multiple such datasets have been released over the last few years but provide metadata in a disparate array of formats including CSV files and plain-text README files. At best, the lack of a standard metadata schema makes it unnecessarily time-consuming to write software to process multiple datasets and, at worst, the lack of a standard means that crucial information is simply absent from some datasets. We propose a metadata schema for representing appliances, meters, buildings, datasets, prior knowledge about appliances and appliance models. The schema is relational and provides a simple but powerful inheritance mechanism.
Chapter
Forecasting the day-ahead electricity load is beneficial for both suppliers and consumers. The reduction of electricity waste and the rational dispatch of electric generator units can be significantly improved with accurate load forecasts. This article is focused on studying and developing computational intelligence techniques for electricity load forecasting. Several models are developed to forecast the electricity load of the next hour using real data from an industrial pole in Spain. Feature selection and feature extraction are performed to reduce overfitting and therefore achieve better models, reducing the training time of the developed methods. The best of the implemented models is optimized using grid search strategies over the hyperparameter space. Then, twenty-four different instances of the optimal model are trained to forecast the next twenty-four hours. Considering the computational complexity of the applied techniques, they are developed and evaluated on the computational platform of the National Supercomputing Center (Cluster-UY), Uruguay. Standard performance metrics are applied to evaluate the proposed models. The main results indicate that the best model, based on ExtraTreesRegressor, achieves a mean absolute percentage error of 2.55% on the day-ahead hourly forecast, which is a promising result.