Abstract

Research on smart grid technologies is expected to result in effective climate change mitigation. Non-Intrusive Load Monitoring (NILM) is seen as a key technique for enabling innovative smart-grid services. By breaking down the energy consumption of households and industrial facilities into its components, NILM techniques provide information on the appliances present and can be applied to perform diagnostics. As with related Machine Learning problems, research and development require a sufficient amount of data to train and validate new approaches. As a viable alternative to collecting datasets in buildings during expensive and time-consuming measurement campaigns, the idea of generating synthetic datasets for NILM has recently gained momentum. With SynD, we present a synthetic energy dataset with a focus on residential buildings. We release 180 days of synthetic power data at the aggregate level (i.e. mains) and for individual appliances. SynD is the result of a custom simulation process that relies on power traces of real household appliances. In addition, we present several case studies that demonstrate the similarity between our dataset and four real-world energy datasets.
SCIENTIFIC DATA | (2020) 7:108 | https://doi.org/10.1038/s41597-020-0434-6
www.nature.com/scientificdata
A synthetic energy dataset for non-intrusive load monitoring in households
Christoph Klemenjak ✉, Christoph Kovatsch, Manuel Herold & Wilfried Elmenreich
Load monitoring is vital for effective and accurate energy monitoring in buildings. Detailed insights can empower further research, help streamline processes, and improve a building's energy efficiency1. Introduced in2, Non-Intrusive Load Monitoring (NILM) techniques serve to break down a building's aggregate energy consumption to identify active appliances and also to provide diagnostic information. Extensive reviews can be found in3 and4. NILM can be considered a Machine Learning problem. As such, it requires datasets to train models, to conduct performance evaluation, to evaluate the benefit in real scenarios, and also to perform benchmarking on a common basis. In the case of NILM, ground-truth data on aggregate and appliance-level energy consumption are crucial4.
Traditionally, NILM scholarship relies on energy consumption datasets. Such datasets usually contain information on energy consumption at the aggregate level (monitored at the mains) and of individual loads, the latter provided by plug-level meters. Energy consumption datasets are the outcome of measurement campaigns in buildings or industrial facilities, which require expensive measurement equipment, bring bureaucratic burdens, and are time-consuming5. As a viable alternative, the idea of generating synthetic data has recently gained momentum. The main motivation behind generating synthetic datasets is to reduce the costs of measurement campaigns and save valuable work hours. Custom simulators provide energy consumption datasets on demand and, unlike real datasets, without limitations on measurement periods. Furthermore, real datasets suffer from missing readings (gaps), misaligned timestamps, and corrupted data as a result of sensor miscalculation or malfunction6,7. Synthetic data do not exhibit such issues.
With SynD, we present a synthetic energy consumption dataset for Non-Intrusive Load Monitoring (NILM) with a focus on the residential sector. SynD provides 180 days of a simulated household with 21 household appliances. We derive custom appliance models from the outcome of our measurement campaign in two Austrian households by applying a modelling approach similar to8 and9. As shown in the evaluation, the household simulated in SynD can be associated with the relaxed lifestyle of a single person or a young couple. We implemented a dataset generator that uses our custom appliance models to simulate one household for given input parameters such as sampling rate and duration. Like traditional energy consumption datasets, SynD provides aggregate power readings as well as power readings of individual household appliances. Furthermore, our dataset complies with the majority of suggestions for energy datasets presented in10. For instance, we release SynD in two different versions: besides the widely used CSV format, we also provide an HDF5 version of SynD that is fully compatible with NILMTK11, a toolkit for reproducible NILM experiments with state-of-the-art algorithms12.

Institute of Networked and Embedded Systems, University of Klagenfurt, 9020, Klagenfurt, Austria. e-mail: klemenjak@ieee.org

Content courtesy of Springer Nature, terms of use apply. Rights reserved
To the best of our knowledge, there are three major contributions to synthetic dataset generation for NILM: AMBAL8, SmartSim13, and SHED14.
The Automated Model Builder for Appliance Loads (AMBAL), presented in8, extracts appliance models from real datasets. These models consist of sequences of parametrised signatures and are used by a trace generator to simulate a real household. Since the creators of AMBAL have not released a dataset, we report insights provided by8. Besides a statistical analysis of commercial and residential energy datasets, the authors of14 released SHED (https://nilm.telecom-paristech.fr/shed/), a synthetic dataset with a focus on commercial buildings. SHED is generated by a custom algorithm that simulates current and power readings for several buildings. We draw comparisons on the basis of the power consumption data provided in SHED. SmartSim is a device-accurate smart home energy trace generator. This simulation framework uses device energy models and device usage models to simulate a household. SmartSim derives its modelling methodology from the empirical characterisation studies presented in9. Device models in SmartSim build on energy data from Smart*, a real-world energy dataset15. To compare SynD and SmartSim, we consider the latest version on GitHub (https://github.com/klemenjak/smartsim/tree/master/house_1).
We summarise key dierences between related work and our contribution in Table1. SynD provides 180
days of a simulated household that consists of 21 appliances. We provide aggregate and submeter readings at a
rate of 5 Hz, which is suspected to be suitable for low-frequency NILM investigations, as a recent study on data
requirements for NILM claims16. Besides energy data, we provide an extensive amount of metadata in the NILM
metadata format17.
Methods
In this section, we describe the methods applied to create the synthetic energy consumption dataset (SynD). First, we report on a measurement campaign that was conducted in real households in Carinthia, a province of Austria. Second, we explain how our approach groups household appliances into categories according to their energy consumption behaviour. Finally, we describe our dataset generation approach in detail.
During a measurement campaign in two Austrian households, one in Klagenfurt and one in Villach, we monitored 21 electrical household appliances. The main goal of the measurement campaign was to record representative power consumption patterns of those 21 appliances, where a power consumption pattern is the shape of the power consumption over time for a single operation18. Table 2 summarises the monitored appliances, their manufacturers, and the number of patterns recorded during the campaign. For household appliances with a wide variety of operational programmes or adjustable settings such as temperature or intensity, we recorded power consumption patterns of the most frequently used options. Figure 1 shows recorded power consumption patterns for two programmes of a dishwasher. Although both patterns refer to the same device, we observe a clear difference in shape, length, and energy consumption between them.
As data logger, we used a Rohde & Schwarz HMC8015 power analyzer, which complies with IEC 62301, EN 50564, and EN 61000-3-2. Table 3 summarises the main specifications of this device. With a measurement accuracy of 0.05% of reading and a temporal resolution of 100 ms, the measurement device meets the instrumentation requirements for energy datasets suggested in10. Using a socket adapter, the HZC815-EU (EU connector), we attached the measurement device to one electrical appliance at a time. Figure 2 depicts how the measurements were conducted. We stored the outcome of our measurement campaign as CSV files, which contain active-power readings with a sampling interval of 100 ms.
One way of categorising appliances is by their number of operational states3. In our considerations, we focus on specific time windows of power consumption rather than on single operational states. Inspired by the empirical characterisation in9, the automated model builder for appliance loads8, and the concept of predictability of power consumption patterns in18, we defined four appliance categories: constantly-on, periodical, single pattern, and multi pattern.
• Constantly-On: Appliances of this group consume energy without any downtime. In our dataset, an example
of such an appliance is the WiFi router, which operates continuously.
Property        AMBAL        SmartSim     SHED         SynD
Appliances      14           25           66           21
Duration        n/a          7 days       14 days      180 days
NILMTK format   n/a          No           No           Yes
Released data   No           Yes          Yes          Yes
Sampling rate   1 Hz         1 Hz         0.033 Hz     5 Hz
Scope           residential  residential  commercial   residential
Table 1. Comparison of existing synthetic energy datasets.
• Periodical: We refer to appliances that run autonomously and have a recurring consumption pattern as periodical appliances. Fridges are a common example: they operate autonomously and have predictable duty cycles.
• Single pattern: e vast majority of household appliances do not operate autonomously i.e. they require a
user either to operate or to start a specic programme. From that follows that such appliances are activated
by a user, perform a specic task, and turn o or are turned o aer completion of that task. e group of
ID Household appliance Manufacturer Patterns Category
2 Fridge Bomann 1 Periodical
3Dishwasher Bosch 3Multi
4Electric heater Ningbo Elect. 2Multi
5Washing machine Miele 2 Multi
6Toa s ter Philips 3 Multi
7Fan CasaFan 2Multi
8 Microwave Siemens 3 Multi
9Iron Moulinex 2 Multi
10 Hot air gun ermo Elect. 2Multi
11 Router Linksys 1 Constantly-On
12 Coee machine DeLonghi 3 Multi
13 TV Panasonic 2Multi
14 Printer HP 2 Multi
15 Laptop computer Lenovo 2Multi
16 Lamp TaoTronics 1 Single
17 Gaming PC Acer 2 Multi
18 Pocket Radio Schneider 1 Single
19 Monitor DELL 1 Single
20 Electric oven Severin 1 Single
21 Hair dryer Philips 1 Single
22 Water kettle CLA Tronic 1 Single
Tab le 2. Household appliances in SynD.
Fig. 1 Power consumption patterns of the dishwasher in SynD: (a) pattern of programme A (b) pattern of
programme B.
Specication Description
A/D converter resolution 16 bit
Measurement accuracy 0.05 % of reading
Power range 50 μW to 12 kW
Physical quantity active power in W
Resolution of out put data 100 ms
Sampling frequency (waveform) 500 Hz
Tab le 3. Specications of the HMC8015 power analyzer.
Content courtesy of Springer Nature, terms of use apply. Rights reserved
4
SCIENTIFIC DATA | (2020) 7:108 | https://doi.org/10.1038/s41597-020-0434-6
www.nature.com/scientificdata
www.nature.com/scientificdata/
single-pattern appliances considers appliances with a single power consumption pattern. For instance, we can
observe a similar power consumption pattern during every operation for water kettles. External factors such
as the lling level of the kettle inuence the length of the pattern to a certain degree but the main characteris-
tics of the pattern, such as peak consumption and shape, can be predicted fairly well.
• Multi pattern: Appliances of the multi-pattern category offer several modes of operation with distinct power consumption patterns. Examples of multi-pattern appliances are dishwashers, washing machines, and electric heaters. Patterns of such appliances not only differ in length but also show distinct process steps. Consequently, these appliances perform different tasks during different programmes, which can lead to completely different power consumption patterns. Figure 1 shows power consumption patterns of two different programmes of the dishwasher in SynD, and we observe clear differences between the two. Therefore, we emphasise the importance of considering multiple consumption patterns to better model such appliances.
An overview of household appliances and their associated categories in our dataset is provided in Table 2. The difficulty of categorising household appliances lies in the extraction of consumption patterns and in the predictability of appliance usage as well as of its duration18. While some appliances such as dishwashers are designed to have clear programmes of operation with a predictable end time, it is challenging to identify the most appropriate power consumption pattern for user-controlled appliances such as hair dryers and microwave ovens. We addressed this issue by incorporating expert knowledge into our measurement campaign. This knowledge stems from studies related to a personalised feedback system for energy management in households19 and from conclusions of a measurement campaign in Austrian households20. Based on this knowledge, we adapted the residents' behaviour during measurements, i.e. appliance usage, so as to produce consumption patterns that are as representative as possible.
SynD is the result of a simulation process that relies on power consumption patterns of real household appliances in two Austrian households. We provide detailed insights into the simulation process following a top-down approach: we begin with the big picture of our implementation and conclude with details on dynamic placing and interpolation of consumption patterns.
In principle, the simulation follows a straightforward procedure, as Box 1 outlines. Parametrised by a set of input parameters, we simulate the power consumption of one imaginary household day by day. In our simulation approach, days are defined to be independent observations, i.e. the energy consumption of one day does not influence the energy consumption of the next day. While a real household might show some correlations of appliance usage between subsequent days or weekdays, we opted for a simple model assuming independent days, since this effect is hard to characterise based on existing data and is not very relevant for current load disaggregation algorithms. For every day in our simulation, we obtain the power consumption of the selected household appliances individually. By default, we consider all 21 appliances. In addition to the individual power readings of appliances, we also obtain the aggregate power consumption of the household by summing the individual power readings of appliances. Figure 3 shows the aggregate power signal for one day. Aggregate power signals are particularly interesting for applications such as Non-Intrusive Load Monitoring (i.e. load disaggregation) and energy forecasting.
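The day-by-day procedure and the summation that yields the mains signal can be sketched as follows. This is a minimal illustration, not the released generator: the appliance "models" are hypothetical callables that return one day of active power in W, and NumPy's legacy `RandomState` is used because it exposes the `randint` and `normal` generators named later in the Methods section. At the default interval of 0.2 s, one day corresponds to 86,400 s / 0.2 s = 432,000 samples per appliance.

```python
import numpy as np

SAMPLING_INTERVAL_S = 0.2
SAMPLES_PER_DAY = round(24 * 60 * 60 / SAMPLING_INTERVAL_S)  # 432,000 at 5 Hz


def simulate_day(appliance_models, rng):
    """Simulate one day; days are independent, so no state carries over."""
    traces = {name: model(rng) for name, model in appliance_models.items()}
    # The aggregate (mains) signal is the sample-wise sum of all appliance traces.
    aggregate = np.sum(np.stack(list(traces.values())), axis=0)
    return traces, aggregate


def simulate(appliance_models, n_days, seed=42):
    """Run the day loop and concatenate the daily blocks into time series."""
    rng = np.random.RandomState(seed)  # seeding makes a run repeatable
    per_appliance = {name: [] for name in appliance_models}
    mains = []
    for _ in range(n_days):
        traces, aggregate = simulate_day(appliance_models, rng)
        for name, trace in traces.items():
            per_appliance[name].append(trace)
        mains.append(aggregate)
    series = {name: np.concatenate(days) for name, days in per_appliance.items()}
    return series, np.concatenate(mains)


# Hypothetical toy models: a router drawing 6 W around the clock and a lamp
# drawing 9 W during the first 1,000 samples of each day.
toy_models = {
    "router": lambda rng: np.full(SAMPLES_PER_DAY, 6.0),
    "lamp": lambda rng: np.where(np.arange(SAMPLES_PER_DAY) < 1000, 9.0, 0.0),
}
appliances, mains = simulate(toy_models, n_days=2)
```

Because each day is generated from scratch, extending the run from 2 to 180 days only changes the loop count, which mirrors the independence assumption stated above.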
As soon as the simulation process finishes, the obtained dataset is either saved to an HDF5 file following the NILMTK data format11 or compressed to a ZIP archive. In the latter case, the archive contains metadata as well as 22 CSV files (one file per appliance plus one file for the aggregate power).
Fig. 2 Reenactment of our measurement setup.

Our simulation approach assumes that household appliances do not alter their behaviour due to the operation of other appliances, i.e. appliances operate independently. We simulate the power consumption behaviour of appliances individually and neglect any correlations between them, which we identified as a necessary step to simplify the modelling problem. Appliance simulations in SynD share a set of input parameters: sampling interval, duration and power type. By default, the simulator generates a dataset with a duration of 180 days and a sampling interval of 0.2 s. The vast majority of low-rate NILM datasets provide either active power readings or apparent power readings5. Also, active power readings are used for billing in real energy grids. For this reason, the emphasis of our simulation process is on active power. Our approach simulates the power consumption of appliances day by day. To generate data for a new day, the simulation process follows three steps, which we discuss in detail:
1. Selection of a power consumption pattern from templates
2. Interpolation or resampling of the selected pattern
3. Identication of the time of usage and insertion of the selected pattern
e rst step of simulating the power consumption of an appliance in SynD is to select a power consumption
pattern for the current day of simulation. As already pointed out, we dened four dierent appliance categories:
constantly-on, periodical, single-pattern and multi-pattern. e category of an appliance decides on how a power
consumption pattern is selected during the simulation:
• For appliances of the constantly-on category, such as the router, the simulator loads the power consumption
pattern recorded during the measurement campaign and successively inserts this pattern until data for one
day is generated.
• Appliances such as fridges show a periodical power consumption behaviour. For such appliances, we recorded
multiple operational cycles during the measurement campaign. To mimic real periodical appliances, the sim-
ulation loads the recorded data and inserts this sequence of power consumption patterns until data for one
day is generated.
• For appliances of the single-pattern category, the simulator selects the one power consumption pattern
recorded during the measurement campaign.
• We incorporate several multi-pattern appliances in SynD, as Table 2 shows. For appliances of this category, the simulator randomly selects one of the recorded patterns, where all patterns are equally likely to be selected.
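The category-dependent first step can be sketched as below. This is a minimal sketch under stated assumptions: patterns are NumPy arrays of active power, the function names are hypothetical, and `np.resize`, which repeats an array cyclically up to a requested length, stands in for inserting the recorded trace back-to-back until one day is filled.

```python
import numpy as np


def fill_day(recording, samples_per_day):
    """Repeat a recorded trace back-to-back and cut it to exactly one day."""
    return np.resize(np.asarray(recording, dtype=float), samples_per_day)


def select_pattern(category, patterns, rng, samples_per_day):
    """First simulation step: choose (or build) the pattern for the current day."""
    if category in ("constantly-on", "periodical"):
        # One recording (e.g. router standby or a fridge's duty cycles) is tiled.
        return fill_day(patterns[0], samples_per_day)
    if category == "single":
        # Only one pattern was recorded, so it is always chosen.
        return np.asarray(patterns[0], dtype=float)
    if category == "multi":
        # Every recorded pattern is equally likely: randint draws from [0, len).
        return np.asarray(patterns[rng.randint(0, len(patterns))], dtype=float)
    raise ValueError(f"unknown category: {category!r}")


rng = np.random.RandomState(0)
fridge_day = select_pattern("periodical", [[80.0, 80.0, 0.0]], rng, samples_per_day=7)
# fridge_day -> [80., 80., 0., 80., 80., 0., 80.]
```

For constantly-on and periodical appliances the returned array already spans a full day, which is why no further processing is needed for them.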
Box 1 e simulation process in SynD.
Fig. 3 One day in the life of SynD.
For appliances of the categories constantly-on and periodical, the simulation is completed after the first step. This is because our approach mimics the real-world behaviour of constantly-on and periodical appliances by repeatedly inserting data that was recorded during the measurement campaign, so no further processing is required. For example, we expect fridges to show a strong periodical behaviour without any noticeable deviations (unless the fridge is left open for a considerable duration or warm food has been put in). In contrast, the simulation of single and multi-pattern appliances requires more wide-ranging processing strategies in order to better mimic their true behaviour. For instance, after selecting a power consumption pattern for a single or multi-pattern appliance, we introduce a random variable with a uniform probability distribution, which decides whether or not to ignore the selected pattern. In this way, we randomly discard the outcome of the pattern selection, since in real households residents rarely use all of their appliances on a daily basis. If the random variable prompts the simulation to ignore the pattern, we insert a null vector (all zeros) for that day instead of the selected power consumption pattern. For every single-pattern and multi-pattern appliance, we defined a unique probability distribution. The probability distributions have been obtained from the appliance utilisation in GREEND20, an energy consumption dataset that is the outcome of measurement campaigns in several Austrian and Italian households.
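The decision whether an appliance is used on a given day amounts to a Bernoulli trial driven by a uniform random variable. A minimal sketch follows; the usage probability `p_use` and the helper name `apply_daily_usage` are hypothetical placeholders for the per-appliance distributions the authors derived from GREEND.

```python
import numpy as np


def apply_daily_usage(pattern, p_use, rng):
    """Keep the selected pattern with probability p_use, else insert zeros.

    A sample u ~ U[0, 1) is compared against p_use, so the appliance is
    used on a fraction p_use of days in expectation.
    """
    pattern = np.asarray(pattern, dtype=float)
    if rng.uniform(0.0, 1.0) < p_use:
        return pattern
    return np.zeros_like(pattern)  # null vector: appliance unused that day


rng = np.random.RandomState(7)
kettle = np.array([0.0, 2000.0, 2000.0, 0.0])
days_used = sum(apply_daily_usage(kettle, 0.3, rng).any() for _ in range(10_000))
# days_used is close to 3,000 (about 30 % of the simulated days)
```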
In principle, appliances can be divided into two main groups. The first group defines clear programmes, which result in predictable power consumption patterns. Widespread examples of this group are dishwashers and washing machines. Such appliances offer a set of different washing programmes, each of which results in more or less the same power consumption pattern every time it is run. For this group of appliances, our simulator does not manipulate the selected power consumption pattern. In contrast to the first group, there exists a large variety of electrical appliances without unique or pre-defined programmes. For instance, hairdryers, vacuum cleaners, microwave ovens, water kettles and electric heaters belong to this group18. These appliances are either actively controlled by residents or strongly depend on individual user settings. Furthermore, such appliances show considerable variations in terms of daily energy usage. To mimic this behaviour, we implemented a special interpolation policy for this second group of appliances. First, the simulator checks whether interpolation is required for the selected appliance, i.e. to which group the appliance belongs. If interpolation is required, the simulator draws a random number from a uniform distribution. The parameters of the uniform distribution depend on the appliance and are listed in Table 4. We derived those parameters by analysing existing datasets and estimating common lower and upper bounds on the duration of usage per appliance. The obtained sample defines the length of the power consumption pattern after interpolation, i.e. its duration. Finally, the simulator applies interpolation to alter that specific power consumption pattern. In this way, we add samples to or remove samples from the pattern, depending on its targeted length.
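The interpolation policy can be sketched with linear interpolation onto a new time grid whose length is drawn from the appliance's duration range in Table 4. This is a minimal sketch assuming linear interpolation via `np.interp` and a 0.2 s output interval; the paper does not state which interpolation scheme the generator uses, and `stretch_pattern` is a hypothetical name.

```python
import numpy as np


def stretch_pattern(pattern, min_minutes, max_minutes, rng, interval_s=0.2):
    """Resize a pattern to a random usage duration drawn from U[min, max).

    Samples are added (stretching) or removed (compressing) by linearly
    interpolating the pattern onto a grid with the target number of points.
    """
    pattern = np.asarray(pattern, dtype=float)
    target_minutes = rng.uniform(min_minutes, max_minutes)
    target_len = max(2, int(round(target_minutes * 60 / interval_s)))
    # Map both the original and the target grid onto [0, 1] so that the
    # first and last samples of the pattern are kept.
    old_grid = np.linspace(0.0, 1.0, num=len(pattern))
    new_grid = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_grid, old_grid, pattern)


rng = np.random.RandomState(1)
kettle = np.concatenate([[0.0], np.full(1200, 2000.0), [0.0]])  # ~4 min at 5 Hz
stretched = stretch_pattern(kettle, 3, 7, rng)  # water kettle: 3-7 min (Table 4)
```

Linear interpolation never produces values outside the range of the original pattern, so the peak power of the appliance is preserved while its duration changes.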
Residents distinguish themselves by particular habits and individual daily routines. At the household level, this may lead to certain time windows with higher energy consumption, i.e. residential rush hours. However, assuming that appliances always operate at exactly the same time of day would be a misleading modelling assumption. For this reason, a reasonable level of timing variation has to be introduced into appliance simulations, i.e. appliance usage has to be shifted within reasonable time windows. We approach this issue by spreading out the use of household appliances over the day. We implemented a random placing mechanism that randomly selects power-on times of appliances from pre-defined time windows. Those time windows were defined for single and multi-pattern appliances and are summarised in Table 4. We derived those time windows from studies related to a personalised feedback system19 and a measurement campaign in Austrian households20.

Appliance        Range of mean μ [time of day]  Std. deviation σ [min]  Interpolation [min]
Toaster          08:00–09:30                    15
Washing machine  14:00–16:45                    60
Dishwasher       12:30–16:40                    90
Fan              12:30–16:40                    145                     17–84
Heater           18:00–19:00                    30                      50–167
Hot air gun      11:00–12:30                    30                      3–7
Iron             13:30–15:15                    30                      40–100
Microwave        16:30–17:45                    15                      2–5
Radio            08:30–09:30                    30                      15–35
Water kettle     11:30–17:00                    30                      3–7
Hairdryer        07:45–16:45                    30                      4–8
Electric oven    08:00–17:15                    60                      5–15
Monitor          14:00–16:45                    1                       20–100
TV               15:15–19:00                    1                       35–250
Printer          09:45–19:30                    1                       1–15
Coffee machine   08:20–15:15                    1
Laptop           11:00–19:30                    1                       15–85
Lamp             16:40–21:00                    1                       15–50
Gaming PC        14:00–19:30                    1                       80–167
Table 4. Pre-defined parameters for dynamic placing and interpolation: range of the mean μ of the start time, standard deviation σ of the start time, and variation of usage duration.

We define one uniform
distribution per appliance based on those time windows. During the simulation of an appliance, we draw a sample from its associated uniform distribution, e.g. we obtain a sample between 11:30 and 17:00 for the water kettle. Together with a pre-set value for the standard deviation σ, this sample serves as the mean μ of a normal distribution. Next, we draw a sample from that normal distribution to obtain the power-on time of the appliance. Our simulator ensures that the power-on time of an appliance cannot lie on the following day. This way, we ensure that the power consumption of one day cannot influence the following day. For example, in the case of the dishwasher, we draw a sample from its associated uniform distribution to obtain a time between 12:30 and 16:40. Let us assume we obtain 13:45. Next, we convert this time to the number of minutes since midnight (13 × 60 + 45 = 825 min). This number serves as the mean of a normal distribution with a standard deviation of 90 min, as Table 4 reports. To obtain the power-on time of the dishwasher, we draw a sample from the normal distribution

$$\mathcal{N}(\mu = 825\ \text{min},\ \sigma = 90\ \text{min}).$$

The obtained sample defines the starting time of the dishwasher for the current day of the simulation. Figure 4 illustrates the result of our random placing strategy for another common appliance: a printer. The plot shows ten simulated days for the printer. We observe a clear spread of the patterns during the day, with different distances between the inserted patterns. We apply this two-stage placing method to increase the probability of obtaining different starting times for appliances even if we draw the same starting times for two appliances in the first step. In this special case, the normal distribution in the second step would still provide distinct starting times for those two appliances. Avoiding identical switching times of appliances is considered an important detail in certain research problems. For instance, the Switch Continuity Principle represents an essential assumption in Non-Intrusive Load Monitoring (NILM)21 and must not be neglected. By deriving the power-on times in a nested manner and through the use of several probability distributions, we aim to achieve strong compliance with the Switch Continuity Principle (SCP) in our dataset.
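The two-stage placement can be sketched as follows, using the dishwasher row of Table 4. This is a minimal sketch under stated assumptions: the uniform stage uses `randint` over whole minutes, as described for the discrete generator, the normal stage uses `normal`, and the same-day constraint is enforced by clipping the start so that the whole pattern ends by midnight. The paper only states that the power-on time cannot fall on the following day, so clipping (rather than, e.g., redrawing) is an assumption, as are the helper names and the 120 min programme length.

```python
import numpy as np

MINUTES_PER_DAY = 24 * 60


def hhmm_to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight, e.g. '13:45' -> 825."""
    hours, minutes = map(int, hhmm.split(":"))
    return 60 * hours + minutes


def draw_power_on(window, sigma_min, pattern_minutes, rng):
    """Nested draw: mean ~ U[start, end), then power-on ~ N(mean, sigma)."""
    start, end = (hhmm_to_minutes(t) for t in window)
    mu = rng.randint(start, end)      # stage 1: mean of the start time
    t_on = rng.normal(mu, sigma_min)  # stage 2: actual power-on time
    # Keep the whole pattern inside the current day (assumed: clipping).
    return float(np.clip(t_on, 0.0, MINUTES_PER_DAY - pattern_minutes))


rng = np.random.RandomState(2020)
# Dishwasher: window 12:30-16:40, sigma = 90 min; a 120 min programme is assumed.
starts = [draw_power_on(("12:30", "16:40"), 90, 120, rng) for _ in range(5000)]
```

Two appliances that receive the same mean in stage 1 still get independent normal draws in stage 2, which is what makes identical switching times unlikely.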
Our implementation of SynD builds on random number generators provided by the NumPy package. These generators support initial seeding to foster repeatability of simulations. As generator for discrete uniform distributions, we selected randint. This function draws integers from the half-open interval [a, b), each with equal probability, i.e. following the probability mass function

$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } x \in \{a, a+1, \ldots, b-1\} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
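Two properties mentioned above, the half-open interval of `randint` and repeatability through seeding, can be checked directly. The seed value and the interval (the dishwasher window from Table 4, in minutes since midnight) are chosen purely for illustration; `RandomState` is NumPy's legacy generator interface, which exposes `randint` and `normal`.

```python
import numpy as np

a, b = 750, 1000  # 12:30 and 16:40 in minutes since midnight

# Two generators with the same seed produce identical sequences.
draws_1 = np.random.RandomState(2020).randint(a, b, size=100_000)
draws_2 = np.random.RandomState(2020).randint(a, b, size=100_000)

# Each integer in [a, b) has probability 1 / (b - a) = 1 / 250 = 0.004,
# so the empirical frequency of any single value should be close to that.
freq_of_a = np.mean(draws_1 == a)
```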
For normally distributed samples, we use the generator normal. Samples provided by this generator follow the probability density function (PDF)

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (2)$$

The actual shape of the PDF is parametrised by the mean μ and the standard deviation σ.

Fig. 4 Variation of the power-on time for the printer for ten different days.

Specification           Description
AC power types          active power in W
Compatible with NILMTK  Yes
Duration                180 days
File format             CSV and HDF5
Number of appliances    21
Number of households    1
Origin of ground-truth  Austria
Sampling interval       0.2 s
Table 5. Basic information on SynD.
Data Records
With SynD, we release an energy consumption dataset that consists of synthetic data. For a duration of 180 days, we simulated a household in Austria, with the simulations focusing on the consumption of electrical energy. The appliance models we used build on data recorded during a measurement campaign in real households. These data can be found in the archive appliance_traces.zip. Table 5 summarises key properties of SynD. The current version, published in a figshare data repository22, contains simulated active power readings of 21 appliances. More information on the appliances embedded in SynD can be obtained from Table 2. This version of SynD comes with a sampling interval of 0.2 s. Beyond power readings, we provide detailed metadata on appliances and an HDF5 version of SynD, which is compatible with the Non-Intrusive Load Monitoring Toolkit (NILMTK)11.
The initial release of SynD comprises six files, as Table 6 lists. Inspired by suggestions made in a recent paper on energy datasets10, we release SynD in two different formats: CSV and HDF5. The CSV version of SynD can be obtained from SynD_CSV.zip. This archive consists of 22 CSV files, each containing the power time series of a single appliance or, in the case of 1.csv, of the mains. The filename indicates which appliance the data are associated with. Box 2 shows the head of the file 1.csv, which contains the mains power consumption over time. Human-readable timestamps serve as index and tab characters as delimiters in all CSV files of our release.
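Reading a file in the layout of Box 2 requires only the standard library. The sketch below assumes, as Box 2 suggests, that files have no header row, with a timestamp in the first column and active power in W in the second, separated by a tab; the helper names are hypothetical. It also converts the 0.2 s readings into energy by rectangular integration, E = Σ P·Δt.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def read_power_csv(handle):
    """Parse tab-separated (timestamp, active power in W) rows."""
    timestamps, watts = [], []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        timestamps.append(datetime.strptime(row[0], TIMESTAMP_FORMAT))
        watts.append(float(row[1]))
    return timestamps, watts


def energy_wh(watts, interval_s=0.2):
    """Rectangular integration: sum of P * dt, converted from Ws to Wh."""
    return sum(watts) * interval_s / 3600.0


# The first rows of 1.csv as shown in Box 2, with tab delimiters.
head_of_mains = (
    "2019-09-29 00:00:00.000\t3.842\n"
    "2019-09-29 00:00:00.200\t3.842\n"
    "2019-09-29 00:00:00.400\t3.832\n"
    "2019-09-29 00:00:00.600\t3.840\n"
)
times, power = read_power_csv(io.StringIO(head_of_mains))
```

For an extracted file, the same function can be called with `open("1.csv", newline="")` in place of the in-memory string.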
e le appliance_labels.yml includes a Python dictionary that explains the mapping of CSV lenames and
appliances in SynD. e HDF5 version of SynD can be obtained from SynD.h5. e zip le metadata.zip oers
comprehensive metadata on the dataset, measurement devices (HMC8015), and all 21 household appliances.
Across all metadata les, we apply the metadata schema presented in17. We selected this metadata schema (https://
github.com/nilmtk/nilm_metadata) because of its great acceptance within the NILM community. In Box3, we
show metadata for the coee machine as an example. To the best of our abilities, we collected information on
the type, nominal power consumption and manufacturer for all appliances. e metadata les provided along-
side the dataset are meant to serve as important resources for future investigators. We provide information on
appliance-specic information, details on measurement devices, general remarks to our dataset, and references
to further resources.
File name               Description
appliance_labels.yml    Explains the mapping of IDs and appliances.
appliance_traces.zip    The power traces used to create appliance models.
metadata.zip            Contains metadata of dataset, meters and appliances.
dataset_generator.zip   The generator used to create SynD.
SynD.h5                 The NILMTK version of SynD.
SynD_CSV.zip            The CSV version of SynD.
Table 6. Files associated with SynD.
Box 2 e head of le 1 .csv.
1 2019-09-29 00:00:00.000 3.842
2 2019-09-29 00:00:00.200 3.842
3 2019-09-29 00:00:00.400 3.832
4 2019-09-29 00:00:00.600 3.840
5 ...
Box 3 Metadata of the coee machine.
1 # coffee_machine.yaml
2 rooms:
3 - B10.2.014
4 meter_model: HMC 8015 Power Analyzer
5 appliance:
6 type: Coffee machine
7 components:
8type:ESAM04.120MagnicaS
9 nominal_consumption:
10 bias: 240
11 current: 10
12 frequency: 60
13 power: 1450
14 manufacturer: DeLonghi
15 year_of_manufacture: 2011
The Methods section describes our dataset generation approach as step-by-step instructions at an abstract level, so that the approach can be understood without digging into the source code. However, to give experts deeper insight into our simulation approach, we release the first public version of our dataset generator along with SynD. The archive dataset_generator.zip contains an executable version of our generator with predefined settings. Future versions of this toolkit will be published on our GitHub repository.
The Non-Intrusive Load Monitoring Toolkit, NILMTK, is well regarded in the NILM research community. Introduced in11, it provides functionality for dataset analysis and aims to enable the benchmarking of load disaggregation algorithms. Recent contributions, presented in12, extend the toolkit with new APIs for disaggregation and experiments. To lower the entry barrier for NILMTK users, we provide a NILMTK-compatible version of our synthetic dataset. This version of SynD uses the NILMTK-DF file format11, which allows seamless integration into the toolkit and therefore easy access to the power readings. Figure 5 shows the hierarchical model of the SynD household. SynD contains one meter group (elec) that consists of 22 meters. Meter 1 represents the mains, i.e. the aggregate power consumption of the household. Meters 2 to 22 each contain the power readings of one appliance. In this version of SynD, power readings are stored as pandas DataFrames and indexed by human-readable timestamps. We demonstrate how to access and plot data from SynD using NILMTK in Box 4.
Fig. 5 NILMTK-DF format hierarchy for SynD.
Fig. 6 A comparison of aggregate power data: (a) variation of daily energy consumption over forty days, (b) heatmap of the average load profiles over forty days.
Technical Validation
Real-world energy datasets are the outcome of measurement campaigns in households and/or industrial facilities, conducted with special attention not to disrupt the daily routines of the monitored household so that the recorded data resemble reality as closely as possible5. In this section, we present analyses and case studies that indicate strong similarity between our synthetic energy dataset SynD and real-world energy consumption datasets. In our studies, we use data from multiple households contained in four common energy consumption datasets: DRED23, ECO24, REFIT25 and UK-DALE26. We selected households that are commonly used in related work. An extensive list of available energy datasets can be found in5. By assessing the similarity between real and synthetic data, we demonstrate that SynD represents a valid energy dataset. Our validation studies focus on two aspects of energy consumption datasets:
1. Aggregate consumption: We study differences in the energy consumption of households on an aggregate level (i.e. smart meter data).
2. Power consumption of single appliances: We study appliance usage in households and analyse similarities between power readings from real households and SynD.
Aggregate consumption. A household's aggregate power signal, obtained from a smart meter, can provide deep insights into the daily routines of residents, their individual habits, and present appliances such as heat pumps27. Smart meter data can also be used to predict the energy consumption of households28. The authors of29 provide a comprehensive review of smart meter data analytics. For a synthetic energy dataset, the question arises how well the simulated time series resembles the aggregate power series of real households. For this reason, we present a study that compares the aggregate power readings of SynD with readings obtained from real households. For a duration of forty days, we extracted the aggregate power signal of house 1 in DRED, houses 1 and 2 in ECO, houses 1 and 2 in REFIT, and houses 1, 2 and 5 in UK-DALE.
As a rst step, we computed the daily energy consumption of those households for forty consecutive days.
e boxplot in Fig.6a gives insights on how much the daily energy consumption varies across the observed
households. With 2.09 kWh, we observe the lowest median energy consumption in house 1 of DRED, whereas
house 5 of UK-DALE shows the highest median energy consumption with 12.80 kWh. e household com-
posed of synthetic data, SynD, ranks in the middle of observed households with 6.47 kWh. Nearest neighbours
of SynD are house 1 of UK-DALE 7.62 kWh and house 2 in ECO 5.47 kWh. Furthermore, the box associated
with SynD shows an intermediate box size, which indicates that the average deviation from the median lies in a
realistic range compared to a narrow box for DRED and a rather wide box for house 2 of REFIT. To summarise
the ndings presented by Fig.6a: e results of our rst study indicate that based on the average daily energy
consumption, the synthetic household in SynD appears to be very similar to a real household. Neither does the
daily energy consumption of SynD focuses on a narrow interval nor we observe outliers during our observation
period of forty days.
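The daily energy values summarised in Fig. 6a follow from integrating power over each calendar day. The sketch below illustrates that arithmetic with Python's standard library, assuming evenly spaced samples in watts (0.2 s apart in SynD) and a rectangle rule in which each sample holds for one sampling interval; the function name and the constant-load toy input are illustrative and not the authors' evaluation code.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def daily_energy_kwh(samples, interval_s):
    """Sum (timestamp, watts) samples into energy per calendar day in kWh.

    Each sample is assumed to hold for `interval_s` seconds (rectangle rule),
    so one sample contributes watts * seconds joules; 3.6e6 J equal 1 kWh.
    """
    joules = defaultdict(float)
    for ts, watts in samples:
        joules[ts.date()] += watts * interval_s
    return {day: j / 3.6e6 for day, j in sorted(joules.items())}


# Toy input: three days at one sample per minute with constant loads of
# 250 W, 300 W and 200 W, i.e. 6.0, 7.2 and 4.8 kWh per day.
start = datetime(2019, 9, 29)
samples = []
for day, load in enumerate([250.0, 300.0, 200.0]):
    for minute in range(24 * 60):
        samples.append((start + timedelta(days=day, minutes=minute), load))

per_day = daily_energy_kwh(samples, interval_s=60)
print({str(d): round(e, 2) for d, e in per_day.items()})
# {'2019-09-29': 6.0, '2019-09-30': 7.2, '2019-10-01': 4.8}
print("median:", round(median(per_day.values()), 2))  # median: 6.0
```

A boxplot like Fig. 6a then simply summarises the forty daily values of each household.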
Within a single day, it is important to examine at what time households consume the most energy. For a synthetic dataset, one has to demonstrate that appliance usage is assigned to realistic time windows. For instance, the average person would not consider running a dishwasher in the middle of the night a common event, though rare exceptions may apply. In our second validation study, we derived the average load profiles of nine households over forty consecutive days, eight of them real households and the remaining one the household embedded in SynD. We illustrate these average load profiles with the help of a heatmap. The heatmap in Fig. 6b divides the load profiles into time slots of 30 min and shows, for every time slot, the average power consumption during that time window. We observe strong similarities between ECO 1, REFIT 1, and UK-DALE 1. Compared to the other households in this study, ECO 2 and DRED 1 show considerably lower levels of power consumption for most times of the day. Some households exhibit particularly noticeable characteristics: REFIT 2 shows a considerably high level of power consumption in the morning, and during late evenings its consumption levels are similar to those of UK-DALE 2 and SynD 1. During the evening as well as the late evening, we identify strong similarities between the real households UK-DALE 2 and REFIT 2 and the simulated household SynD 1. In general, SynD closely resembles real households during the second half of the day. However, we identify considerably lower levels of power consumption in SynD during the morning, which rather resemble the levels observed in ECO 2 and DRED 1.
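The 30-min load profiles of Fig. 6b can be sketched in the same spirit: every sample is assigned to a time-of-day slot, and each slot is averaged over all observed days. The helper below is a hypothetical illustration of this averaging, not the authors' code; it assumes (timestamp, watts) pairs and uses a two-day toy input.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def average_load_profile(samples, slot_minutes=30):
    """Mean power in W per time-of-day slot, averaged over all days.

    Slot k covers minutes [k * slot_minutes, (k + 1) * slot_minutes) after
    midnight, so 30-min slots give 48 values per profile. Empty slots are NaN.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, watts in samples:
        slot = (ts.hour * 60 + ts.minute) // slot_minutes
        totals[slot] += watts
        counts[slot] += 1
    n_slots = 24 * 60 // slot_minutes
    return [totals[k] / counts[k] if counts[k] else float("nan")
            for k in range(n_slots)]


# Toy input: two days, one sample per minute, 100 W baseline with a 2 kW
# load between 19:00 and 19:30 on the first day only.
start = datetime(2019, 9, 29)
samples = []
for minute in range(2 * 24 * 60):
    ts = start + timedelta(minutes=minute)
    first_day = minute < 24 * 60
    peak = first_day and ts.hour == 19 and ts.minute < 30
    samples.append((ts, 2000.0 if peak else 100.0))

profile = average_load_profile(samples)
print(len(profile), profile[0], profile[38])  # 48 100.0 1050.0
```

Stacking one such 48-value profile per household as rows yields a matrix that can be drawn as a heatmap similar to Fig. 6b.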
We suspect two independent causes for these differences. First, our dataset does not contain any white goods with substantial power consumption, such as common electric stoves. As a consequence, breakfast activities such as preparing ham and eggs are not reflected in the morning energy consumption. Our dataset also does not include electric water heaters, which would typically operate in the morning. Second, load profiles are strongly influenced by the lifestyle and daily routines of residents. Families, senior citizens, adults, and young adults differ in their wake-up times as well as in how long they stay at home in the morning. The simulations for SynD were implemented largely by young adults and students. While most tasks, such as measuring a device, are straightforward and were meticulously performed, we have to assume that some design decisions, for example selecting a given device or looking up reasonable schedules in other datasets, may have been shaped by the students' own interpretation of a normal lifestyle. However, the dataset quality, in terms of comparison to other real datasets, was evaluated independently to prevent students from designing simulations that merely look realistic to them. As the heatmap shows, SynD has little power consumption during the morning, medium consumption during the afternoon and rather high consumption during the late evening. To summarise, the household simulated in SynD can be associated with a rather laid-back lifestyle of a single person or a young couple who consume little energy before noon and use their appliances in the afternoon and at night. Furthermore, Fig. 6b shows that no time window of SynD exhibits unrealistic levels of power consumption, i.e. all time windows of our synthetic dataset show reasonable power consumption levels.
Power consumption of single appliances. An integral part of our synthetic energy consumption dataset is the simulation of individual load signals, i.e. the simulation of single appliances. Our simulation approach builds on power consumption patterns that were recorded during a measurement campaign in real households. During the simulation of SynD, we manipulate, resample, and interpolate these patterns according to our random placement policy. Given this complex simulation, the question arises how similar the simulated appliances are to real appliances. We demonstrate the validity of our approach by means of two studies. In the first study, we compare the energy consumption of simulated appliances to that of appliances monitored in real-world energy datasets over a time window of forty days. In the second study, we apply statistical measures to examine similarities between SynD and other datasets.
For a duration of forty consecutive days, we computed the energy consumption of dishwashers, fridges, washing machines and water kettles for multiple households of DRED, ECO, REFIT, SynD, and UK-DALE. Where possible, we selected data from the same season to achieve a fair comparison. Note that we apply the same time window as in the previous study, i.e. forty days per household. Table 7 lists the energy consumption per appliance. Households that do not contain a given appliance type are marked as not available (n/a). The energy consumption of appliances differs significantly between the observed households. For example, the dishwasher in REFIT 1 consumed 7.75 kWh over forty days, whereas the dishwasher in REFIT 2 consumed 43.19 kWh. Similarly, we observe an energy consumption of 32.27 kWh for the washing machine in UK-DALE 5, whereas the washing machine in house 2 of the same dataset consumed merely 2.28 kWh. These differences can have various causes. First, the energy consumption of appliances strongly depends on the number of residents, their habits, their family situation, etc. As a result, common household appliances such as dishwashers and washing machines may operate more frequently in households with larger families. Second, appliances of the same kind but of different models may differ substantially in energy consumption: two appliances built to serve the same task may require different amounts of energy to complete it. Whatever the origin of these differences, we observe similarities between certain groups of dishwashers, washing machines, and water kettles. Interestingly, water kettles in British datasets (REFIT and UK-DALE) seem to consume considerably more energy over those forty days than their Continental-European counterparts (DRED and ECO) in this study. Usage patterns of electric kettles and their potential for energy savings are studied in related work30. We observe that the water kettle in SynD shows an energy consumption level similar to the kettles in DRED and ECO. As for the simulated household appliances of SynD, their energy consumption ranks either in the upper third or in the middle of the observed households. Based on this ranking, we speculate that our simulation process generates a sufficient number of patterns.
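The ranking statement above can be checked directly against Table 7. This short script only re-uses the published values: it sorts the households by forty-day consumption for each appliance type and reports where SynD 1 lands, namely 2nd of 7 dishwashers, 6th of 10 fridges, 2nd of 9 washing machines and 5th of 9 water kettles. Entries marked n/a are omitted; the 0.00 kWh kettle of UK-DALE 5 is kept as listed.

```python
# Forty-day energy consumption in kWh, copied from Table 7 (n/a omitted).
table7 = {
    "dishwasher": {"ECO 2": 15.94, "REFIT 1": 7.75, "REFIT 2": 43.19,
                   "SynD 1": 26.52, "UK-DALE 1": 12.57, "UK-DALE 2": 7.69,
                   "UK-DALE 5": 13.09},
    "fridge": {"DRED 1": 28.45, "ECO 1": 16.19, "ECO 2": 19.70,
               "REFIT 1": 15.25, "REFIT 2": 28.19, "REFIT 8": 7.80,
               "SynD 1": 16.98, "UK-DALE 1": 30.71, "UK-DALE 2": 5.49,
               "UK-DALE 5": 30.85},
    "washing machine": {"DRED 1": 4.60, "ECO 1": 22.04, "REFIT 1": 10.12,
                        "REFIT 2": 14.15, "REFIT 8": 20.91, "SynD 1": 24.92,
                        "UK-DALE 1": 21.04, "UK-DALE 2": 2.28,
                        "UK-DALE 5": 32.27},
    "water kettle": {"DRED 1": 2.45, "ECO 1": 4.27, "ECO 2": 2.39,
                     "REFIT 2": 23.06, "REFIT 8": 16.17, "SynD 1": 4.48,
                     "UK-DALE 1": 11.71, "UK-DALE 2": 34.46,
                     "UK-DALE 5": 0.00},
}


def rank_of(house, consumption):
    """1-based rank of `house` when sorted by descending consumption."""
    ordered = sorted(consumption, key=consumption.get, reverse=True)
    return ordered.index(house) + 1


for appliance, consumption in table7.items():
    print(f"{appliance}: SynD 1 ranks {rank_of('SynD 1', consumption)} "
          f"of {len(consumption)}")
```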
Dataset   House  Dishwasher  Fridge  Washing machine  Water kettle
DRED      1      n/a         28.45   4.60             2.45
ECO       1      n/a         16.19   22.04            4.27
ECO       2      15.94       19.70   n/a              2.39
REFIT     1      7.75        15.25   10.12            n/a
REFIT     2      43.19       28.19   14.15            23.06
REFIT     8      n/a         7.80    20.91            16.17
SynD      1      26.52       16.98   24.92            4.48
UK-DALE   1      12.57       30.71   21.04            11.71
UK-DALE   2      7.69        5.49    2.28             34.46
UK-DALE   5      13.09       30.85   32.27            0.00
Table 7. Energy consumption (in kWh) of selected household appliances over forty days.

Statistical similarity of appliances. Besides total energy consumption, appliances differ in their power states and power consumption patterns, i.e. their level of power consumption over time. Particularly when evaluating synthetic data, the question arises how similar the synthetic time series are to time series obtained from real measurements. To answer this question and to validate our simulation approach, we present a case study that uses statistical distance measures to quantify the similarity between household appliances from DRED, ECO, REFIT, SynD, and UK-DALE. Our study uses the same household appliances as the previous case studies and data from the same forty days. As a first step, we extract the time series of dishwashers, fridges, washing machines, and water kettles from the datasets. Where possible, we extract the time series from the same time of the year (i.e. the same months). Next, we clean the time series and resample them to a sampling interval of 10 s. Then, we derive the probability mass function (PMF) of each time series as described in31. Figure 7 shows examples of the derived PMFs. To enhance the readability of these plots, we omit power values below 10 W. For the four appliance types considered in Fig. 7, we observe that the PMFs obtained from synthetic data scatter less than those derived from real data. However, we identify strong similarity between PMFs of the same appliance category. For instance, the PMFs of the dishwashers all have three representative power states, two below 250 W and one close to or above 2000 W. Similar observations can be made for the fridges, washing machines, and water kettles in this study.
Fig. 7 PMFs created from forty days of data: (a) dishwasher 1 in SynD, (b) dishwasher 2 in ECO, (c) dishwasher 2 in UK-DALE, (d) fridge 1 in SynD, (e) fridge 1 in ECO, (f) fridge 2 in REFIT, (g) washer 1 in SynD, (h) washer 1 in UK-DALE, (i) washer 1 in DRED, (j) water kettle 1 in SynD, (k) water kettle 2 in ECO, (l) water kettle 8 in REFIT.
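The PMF derivation described above can be sketched as follows. The exact procedure of ref. 31 is not reproduced here: this sketch assumes that resampling from 0.2 s to 10 s is done by averaging blocks of 50 consecutive samples and that power values are grouped into 10 W bins. Both choices, the omitted cleaning step, and the function names are illustrative assumptions.

```python
from collections import Counter


def downsample_mean(power, factor):
    """Average consecutive blocks of `factor` samples, e.g. 50 x 0.2 s = 10 s.

    A trailing partial block is dropped so every output covers the same span.
    """
    n_full = len(power) // factor
    return [sum(power[i * factor:(i + 1) * factor]) / factor
            for i in range(n_full)]


def power_pmf(power, bin_width=10.0):
    """Empirical PMF over power bins: {lower bin edge in W: probability}."""
    counts = Counter(int(p // bin_width) * bin_width for p in power)
    total = sum(counts.values())
    return {edge: n / total for edge, n in sorted(counts.items())}


# Toy trace at 0.2 s: 60 s of standby at 2 W, then 60 s of heating at 2000 W.
trace = [2.0] * 300 + [2000.0] * 300
pmf = power_pmf(downsample_mean(trace, factor=50))
print(pmf)  # {0.0: 0.5, 2000.0: 0.5}
```

According to the text, values below 10 W are dropped only to improve the readability of the plots in Fig. 7, so such a filter would be applied to the plotted PMF rather than inside `power_pmf`.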
To quantify the similarity between synthetic and real appliances, we compute statistical distance measures between probability mass functions. In this study, we use the Hellinger distance and a distance measure based on the Jensen-Shannon divergence. The Hellinger distance32 is defined as the Euclidean norm of the difference between the square roots of two discrete probability distributions P and Q, scaled by 1/√2:
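$$D_H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{x \in X} \left(\sqrt{P(x)} - \sqrt{Q(x)}\right)^2} = \frac{1}{\sqrt{2}} \left\lVert \sqrt{P} - \sqrt{Q} \right\rVert_2 \qquad (3)$$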
A Hellinger distance of 0 indicates identical distributions, whereas the maximum value is 1. We derive the Hellinger distance between the PMFs of the dishwashers, fridges, washing machines, and water kettles. Figure 8 reports the results of our study in four matrices, one per appliance type. Each entry of a matrix states the similarity between two appliances in the form of their Hellinger distance: for every row, we compute the Hellinger distance of one appliance, for instance a dishwasher, to all other appliances of the same kind. Note that the diagonal of each matrix is always zero, since it reports the distance of a PMF to itself.
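The matrices of Fig. 8 amount to evaluating Eq. (3) for every pair of PMFs of one appliance type. The sketch below assumes PMFs stored as {power bin: probability} dictionaries and uses toy PMFs rather than data from the studied households; the helper names are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance of Eq. (3) for PMFs given as {value: probability}."""
    support = set(p) | set(q)  # a value missing from one PMF has probability 0
    squared = sum((math.sqrt(p.get(x, 0.0)) - math.sqrt(q.get(x, 0.0))) ** 2
                  for x in support)
    return math.sqrt(squared / 2.0)  # equals ||sqrt(P) - sqrt(Q)||_2 / sqrt(2)


def distance_matrix(pmfs, distance):
    """Pairwise distances between named PMFs; row i holds d(P_i, P_j)."""
    names = list(pmfs)
    return names, [[distance(pmfs[a], pmfs[b]) for b in names] for a in names]


# Toy PMFs over power bins in W (illustrative, not taken from the datasets).
pmfs = {
    "kettle A": {0.0: 0.9, 2000.0: 0.1},
    "kettle B": {0.0: 0.8, 2000.0: 0.2},
    "kettle C": {0.0: 0.9, 3000.0: 0.1},
}
names, matrix = distance_matrix(pmfs, hellinger)
for name, row in zip(names, matrix):
    print(f"{name:>8}: " + "  ".join(f"{d:.3f}" for d in row))
```

As in Fig. 8, the diagonal is zero and the matrix is symmetric, because the Hellinger distance is symmetric in P and Q.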
We observe low Hellinger distances, D_H < 0.25, between the dishwasher of SynD and the dishwashers in ECO 2, REFIT 1, REFIT 2, and UK-DALE 5. In addition, these appliances show pairwise low Hellinger distances of approximately the same magnitude as those of SynD. In contrast, we measure extraordinarily high distances between the dishwashers of UK-DALE 1 and UK-DALE 2 and the remaining dishwashers in our study. Interestingly, UK-DALE 1 and UK-DALE 2 show a Hellinger distance of 0.20 to each other. For the fridges in our study, we observe predominantly intermediate and high Hellinger distances between the PMFs. Apart from rare pairwise exceptions, such as the distance between SynD 1 and REFIT 8, we mostly observe indications of dissimilarity. For washing machines, we observe a large group with low Hellinger distances: the washers in SynD 1, DRED 1, REFIT 1, REFIT 2, REFIT 8, and UK-DALE 1 all show values below 0.35. For ECO 1, we record intermediate similarity to this group of washing machines and large dissimilarity to UK-DALE 2 and UK-DALE 5. For water kettles, we identify two major groups: the water kettles of UK-DALE and all others. Between the kettles of UK-DALE and those of other datasets, we measure high Hellinger distances, D_H > 0.70, and in many cases maximum dissimilarity. In contrast, we observe high similarity between the water kettles of SynD, ECO, DRED, and REFIT (D_H < 0.10).
Fig. 8 Hellinger distance of probability mass functions for selected appliances: (a) dishwashers, (b) fridges, (c) washing machines, (d) water kettles.
To complement our study, we apply the Jensen-Shannon distance as a second statistical measure to evaluate the similarity of the PMFs. The Jensen-Shannon distance is defined as the square root of the Jensen-Shannon divergence33 and measures the similarity between two probability distributions P and Q:
$$D_{JS}(P, Q) = \sqrt{\frac{1}{2}\left(D_{KL}(P \parallel M) + D_{KL}(Q \parallel M)\right)} \qquad (4)$$
where M is dened as the point-wise mean of P and Q:
=⋅ +MPQ
1
2
()
(5)
is distance measure is based on the Kullback-Leibler divergence, is symmetric and always returns a nite
value34. e Kullback-Leibler divergence35, oen referred to as relative entropy, is the expectation of the logarith-
mic dierence between P and Q, where the expectation is taken with regard to the probabilities of P:
=⋅
DPQPxlog
Px
Qx
() () ()
()
(6)
KL
xX
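Eqs. (4) to (6) translate into a few lines of Python. The sketch assumes PMFs stored as {power bin: probability} dictionaries and base-2 logarithms, which bound the JS distance to [0, 1]; the text does not state the logarithm base, so this is an assumption, and the function names are hypothetical. SciPy users could instead call `scipy.spatial.distance.jensenshannon` with `base=2`, which also returns the square root of the divergence.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) of Eq. (6) in bits; terms with P(x) = 0 contribute 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


def jensen_shannon_distance(p, q):
    """Square root of the JS divergence, Eqs. (4) and (5), for dict PMFs."""
    support = set(p) | set(q)
    p_full = {x: p.get(x, 0.0) for x in support}
    q_full = {x: q.get(x, 0.0) for x in support}
    m = {x: 0.5 * (p_full[x] + q_full[x]) for x in support}  # Eq. (5)
    # M(x) > 0 wherever P(x) > 0 or Q(x) > 0, so both KL terms stay finite.
    divergence = 0.5 * (kl_divergence(p_full, m) + kl_divergence(q_full, m))
    return math.sqrt(max(divergence, 0.0))  # clamp tiny negative rounding


print(jensen_shannon_distance({0.0: 1.0}, {0.0: 1.0}))     # identical: 0.0
print(jensen_shannon_distance({0.0: 1.0}, {2000.0: 1.0}))  # disjoint: 1.0
print(jensen_shannon_distance({0.0: 0.7, 2000.0: 0.3},
                              {0.0: 0.4, 2000.0: 0.6}))
```

The mixture M is what keeps the measure finite even when the two PMFs have disjoint supports, which a plain Kullback-Leibler divergence would not tolerate.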
Fig. 9 Jensen-Shannon distance of probability mass functions for selected appliances: (a) dishwashers (b)
fridges (c) washing machines (d) water kettles.
In the same manner as for the Hellinger distance, we derive the Jensen-Shannon distance for the PMFs of dishwashers, fridges, washing machines, and water kettles. Figure 9 summarises the JS distance in four matrices, one per appliance type. The obtained matrices closely resemble the outcome of the Hellinger distance study. Based on these results, we draw identical conclusions about the pairwise similarities of appliances, identify the same appliance groups, and observe that appliances of the UK-DALE dataset show higher degrees of dissimilarity in general.
In terms of statistical similarity, measured by the Hellinger or the Jensen-Shannon distance, we identify high degrees of similarity between the simulated appliances in SynD and appliances of real-world energy consumption datasets. In addition, we find high pairwise similarity between certain datasets as well as extraordinarily low similarity between other real datasets.
Discussion
We conclude this section with a summary of our technical validation studies and briefly discuss some limitations of our approach. To demonstrate the technical validity of the synthetic dataset SynD, we presented several case studies that evaluate the similarity of SynD and four other energy datasets, which stem from measurement campaigns in real households.
• We demonstrate that the variation of the household's daily energy consumption lies within a realistic range. In some cases, we identified a noticeably smaller variation for real households than for SynD.
• We derived the average load profiles of households over forty days and examined the spread of appliance usage during the day. For SynD, we identify resemblance to certain real households but also diagnose limitations of our approach.
• In studies focusing on individual appliances, we find that the appliances in SynD show energy consumption comparable to that of real household appliances over an observation period of forty days.
• We derived probability mass functions of selected appliances. Based on these PMFs, we illustrate similarities between real and simulated appliances by means of statistical distance measures, namely the Jensen-Shannon distance and the Hellinger distance.
The current version of SynD faces certain limitations, which result either from cost constraints of the measurement campaign or from our modelling approach:
• Although funds were available to invest in certified measurement hardware, the acquired hardware allowed monitoring of single-phase appliances only. Consequently, our measurement campaign excluded big consumers such as electric water heaters, three-phase electric stoves, etc.
• The current version of SynD derives the mains signal by aggregating the individual appliance-level power signals. Aggregate power signals of real households contain a certain level of noise that stems from unmetered appliances, which increases the complexity of the load disaggregation problem6. One approach to overcome this limitation could be to superimpose correlated as well as uncorrelated noise.
• Our approach considers active power only. We hypothesise that incorporating further physical quantities such as apparent power, current, or voltage would increase the value of a synthetic dataset generator for NILM.
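The second limitation suggests superimposing correlated and uncorrelated noise on the aggregated mains signal. A minimal sketch of that idea follows, assuming a first-order autoregressive (AR(1)) process for the correlated part and white Gaussian noise for the uncorrelated part. The function `noisy_mains` and all parameter values are hypothetical and not part of the released generator.

```python
import random


def noisy_mains(appliance_signals, white_sd=5.0, ar_coef=0.99, ar_sd=2.0,
                seed=0):
    """Aggregate appliance-level power (W) and superimpose two kinds of noise.

    - uncorrelated noise: independent Gaussian samples with std `white_sd`
    - correlated noise: an AR(1) process x[t] = ar_coef * x[t-1] + e[t],
      standing in for slowly drifting unmetered loads
    All parameter values are illustrative assumptions, not SynD settings.
    """
    rng = random.Random(seed)
    length = len(appliance_signals[0])
    if any(len(s) != length for s in appliance_signals):
        raise ValueError("appliance signals must have equal length")
    mains, drift = [], 0.0
    for t in range(length):
        clean = sum(signal[t] for signal in appliance_signals)
        drift = ar_coef * drift + rng.gauss(0.0, ar_sd)
        noisy = clean + drift + rng.gauss(0.0, white_sd)
        # Clip at 0 W, assuming a household without local generation.
        mains.append(max(noisy, 0.0))
    return mains


fridge = [80.0] * 100
kettle = [0.0] * 50 + [2000.0] * 50
clean = [f + k for f, k in zip(fridge, kettle)]
mains = noisy_mains([fridge, kettle])
mean_abs_noise = sum(abs(m - c) for m, c in zip(mains, clean)) / len(mains)
print(len(mains), round(mean_abs_noise, 1))
```

With both noise levels set to zero, the function reduces to the plain aggregation used in the current version of SynD.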
Usage Notes
To help users get started with SynD, we provide a simple code example that demonstrates how to access the data. We recommend using NILMTK in conjunction with SynD. In principle, working with SynD does not differ from working with other datasets that use the NILMTK data format. To read data from SynD, users create a new DataSet object that references the HDF5 file. This object provides access to the data and also offers metadata. SynD contains one meter group, elec. With the help of this elec object, users can directly access the data of the mains or of individual appliances. In the code example presented in Box 4, we create a DataSet and an elec object for SynD, print the members of the meter group elec, and then plot the aggregate power signal of the household. Further material can be obtained from our repository (https://github.com/klemenjak/synd/).
Box 4. Sample code for NILMTK.
from nilmtk import DataSet
import matplotlib.pyplot as plt

synd = DataSet('SynD.h5')  # open the NILMTK (HDF5) version of SynD
synd.set_window(start='2019-09-29', end='2019-09-30')  # restrict to one day
elec = synd.buildings[1].elec  # meter group: mains (meter 1) and 21 appliances
print(elec)

mains = elec.mains().power_series_all_data()  # aggregate power in W
plt.plot(mains)
plt.ylabel('Power in W')
plt.xlabel('Time')
plt.grid(color='0.75', linestyle='-.', linewidth=0.5)
plt.title('One day in the life of SynD')
plt.show()
Code availability
We selected Python 3 as the main programming language and identify the following dependencies of SynD: Pandas 0.22, NumPy 1.15, and NILMTK 0.3. We aimed at providing compatibility with the latest versions of these software packages and released code examples, an extensive user guide, and supplemental material under the licence Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/) on our GitHub repository (https://github.com/klemenjak/synd/).
Along with the dataset SynD, we release the first public version of our dataset generator tool via figshare22. This tool was used to create SynD and can also be used to generate new datasets on demand. We release this early version of our tool under the CC0 licence (https://creativecommons.org/publicdomain/zero/1.0/).
Received: 11 November 2019; Accepted: 2 March 2020;
Published: xx xx xxxx
References
1. Nalmpantis, C. & Vraas, D. Machine learning approaches for non-intrusive load monitoring: from qualitative to quantitative
comparation. Ar ticial Intelligence eview 52, 217–243 (2019).
2. Hart, G. W. Nonintrusive appliance load monitoring. Proceedings of the IEEE 80, 1870–1891 (1992).
3. Zoha, A., Gluha, A., Imran, M. & ajasegarar, S. Non-intrusive load monitoring approaches for disaggregated energy sensing: a
survey. Sensors 12, 16838–16866 (2012).
4. Bongli, ., Squartini, S., Fagiani, M.& Piazza, F. Unsupervised algorithms for non-intrusive load monitoring: an up-to-date
overview. 2015 IEEE 15th International Conference on Environment and Electrical Engineering (EEEIC) 1175–1180 (2015).
5. Pereira, L.& Nunes, N. Performance evaluation in non-intrusive load monitoring: Datasets, metrics, and tools - a review. Wiley
Interdisciplinary eviews: Data Mining and nowledge Discovery 8, 1–17 (2018).
6. Maonin, S. & Popowich, F. Nonintrusive load monitoring (NILM) performance evaluation. Energy Eciency 8, 809–814 (2015).
7. lemenja, C., Maonin, S.& Elmenreich, W. Towards comparability in non-intrusive load monitoring: on data and performance
evaluation. 2020 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT) 1–5 (2020).
8. Buneeva, N.& einhardt, A. Ambal: realistic load signature generation for load disaggregation performance evaluation. 2017 IEEE
International Conference on Smart Grid Communications (SmartGridComm) 443–448 (2017).
9. Barer, S., alra, S., Irwin, D.& Shenoy, P. Empirical characterization and modeling of electrical loads in smart homes. 2013
international green computing conference proceedings 1–10 (2013).
10. lemenja, C. et al. Electricity consumption data sets: pitfalls and opportunities. Proceedings of the 6th ACM International
Conference on Systems for Energy-Ecient Buildings, Cities, and Transportation 169–162 (2019).
11. Batra, N. et al. NILMT: An open source toolit for non-intrusive load monitoring. Proceedings of the 5th international conference
on Future energ y systems 265–276 (2014).
12. Batra, N. et al. Towards reproducible state-of-the-art energy disaggregation. Proceedings of the 6th ACM International Conference on
Systems for Energy-Ecient Buildings, Cities, and Transportation 193–202 (2019).
13. Chen, D. Irwin, D.& Shenoy, P. Smartsim: a device-accurate smart home simulator for energy analytics. 2016 IEEE International
Conference on Smar t Gr id Communications (SmartGridComm) 686–692 (2016).
14. Henriet, S. Simseli, U. ichard, G.& Fuentes, B. Synthetic dataset generation for non-intrusive load monitoring in commercial
buildings. Proceedings of the 4th ACM International Conference on Systems for Energy-Ecient Built Environments 1–2 (2017).
15. Barer, S. et al. Smart*: An open data set and tools for enabling research in sustainable homes. SustDD 1–5 (2012).
16. Shin, C., ho, S., Lee, H. & hee, W. Data requirements for applying machine learning to energy disaggregation. Energies 12, 1696
(2019).
17. elly, J.& nottenbelt, W. Metadata for energy disaggregation. 2014 IEEE 38th International Computer Soware and Applications
Conference Worshops 578–583 (2014).
18. lemenja, C.& Elmenreich, W. On the applicability of correlation lters for appliance detection in smart meter readings. 2017
IEEE International Conference on Smart Grid Communications (SmartGridComm) 171–176 (2017).
19. Monacchi, A. et al. An open solution to provide personalized feedbac for building energy management. Journal of Ambient
Intelligence and Smart Environments 9, 147–162 (2017).
20. Monacchi, A., Egarter, D., Elmenreich, W., D’Alessandro, S.& Tonello, A. M. Greend: An energy consumption dataset of households
in italy and austria. 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm) 511–516 (2014).
21. Maonin, S. Investigating the switch continuity principle assumed in non-intrusive load monitoring (NILM). 2016 IEEE Canadian
Conference on Electrical and Computer Engineering (CCECE) 1–4 (2016).
22. lemenja, C., ovatsch, C., Herold, M. & Elmenreich, W. SynD: A Synthetic Energy Dataset for Non-Intrusive Load Monitoring
in Households. gshare https://doi.org/10.6084/m9.gshare.c.4716179 (2020).
23. Uttama Nambi, A. S., eyes Lua, A.& Prasad, V. . Loced: location-aware energy disaggregation framewor. Proceedings of the 2nd
ACM International Conference on Embedded Systems for Energy-Ecient Built Environments 45–54 (2015).
24. Becel, C., leiminger, W., Cicchetti, ., Staae, T.& Santini, S. e eco data set and the performance of non-intrusive load
monitoring algorithms. Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Ecient Buildings 80–89 (2014).
25. Murray, D. et al. A data management platform for personalised real-time energy feedbac. Proceedings of the 8th International
Conference on Energy Ecienc y in Domestic Appliances and Lighting 1–15 (2015).
26. elly, J.& nottenbelt, W. e U-DALE dataset, domestic appliance-level electricity demand and whole-house demand from ve
U homes. Scientic Data 2, 1–14 (2015).
27. Fei, H. et al. Heat pump detec tion from coarse grained smart meter data with positive and unlabeled learning. Proceedings of the 19th
ACM SIGDD international conference on nowledge discovery and data mining 1330–1338 (2013).
28. Petrican, T. et al. Evaluating forecasting techniques for integrating household energy prosumers into smart grids. 2018 IEEE 14th
International Conference on Intelligent Computer Communication and Processing (ICCP) 79–85 (2018).
29. Wang, Y., Chen, Q., Hong, T. & Kang, C. Review of smart meter data analytics: Applications, methodologies, and challenges. IEEE Transactions on Smart Grid 10, 3125–3148 (2018).
30. Murray, D., Liao, J., Stankovic, L. & Stankovic, V. Understanding usage patterns of electric kettle and energy saving potential. Applied Energy 171, 231–242 (2016).
31. Maonin, S., Popowich, F., Bajić, I. V., Gill, B. & Bartram, L. Exploiting hmm sparsity to perform online real-time nonintrusive load
monitoring. IEEE Transactions on Smart Grid 7, 2575–2585 (2015).
32. Niulin, M. S. Hellinger distance. Encyclopedia of Mathematics 78, (2001).
33. Lin, J. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37, 145–151 (1991).
34. Endres, D. M.& Schindelin, J. E. A new metric for probability distributions. IEEE Transactions on Information theory 49, 1858–1860
(2003).
35. Macay, D. Information eory, Inference and Learning Algorithms (Cambridge university press, 2003).
Acknowledgements
The authors would like to thank Mr. Daniel Maurer for his assistance during the measurement campaign and Dr.
Andreas Reinhardt for inspiring discussions.
Author contributions
Christoph Klemenjak led the development of the dataset, acquired the measurement devices, implemented parts
of the dataset simulator, conducted the technical validation of the final dataset, and developed the main parts of the
manuscript. Christoph Kovatsch led the measurement campaign, defined the appliance categories, implemented
the main parts of the dataset simulator, and contributed to the methods section of this manuscript. Manuel Herold
assisted in implementing the dataset simulator, implemented a web interface for SynD, and provided insights on
related work. Wilfried Elmenreich added to the discussion of the technical validation and contributed to all parts
of the manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to C.K.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons license and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/
applies to the metadata files associated with this article.
© The Author(s) 2020