Appliance Detection Using Very Low-Frequency Smart Meter
Time Series
Adrien Petralia
EDF R&D - Université Paris Cité
Paris, France
adrien.petralia@gmail.com
Philippe Charpentier
EDF R&D
Palaiseau, France
philippe.charpentier@edf.fr
Paul Boniol
Université Paris Cité
Paris, France
boniol.paul@gmail.com
Themis Palpanas
Université Paris Cité - IUF
Paris, France
themis@mi.parisdescartes.fr
ABSTRACT
In recent years, smart meters have been widely adopted by electricity suppliers to improve the management of the smart grid system. These meters usually collect energy consumption data at a very low frequency (every 30min), enabling utilities to bill customers more accurately. To provide more personalized recommendations, the next step is to detect the appliances owned by customers, which is a challenging problem, due to the very-low meter reading frequency. Even though the appliance detection problem can be cast as a time series classification problem, with many such classifiers having been proposed in the literature, no study has applied and compared them on this specific problem. This paper presents an in-depth evaluation and comparison of state-of-the-art time series classifiers applied to detecting the presence/absence of diverse appliances in very low-frequency smart meter data. We report results with five real datasets. We first study the detection quality of 13 different appliances using 30min sampled data, and we subsequently propose an analysis of the possible detection performance gain by using a higher meter reading frequency. The results indicate that the performance of current time series classifiers varies significantly. Some of them, namely deep learning-based classifiers, provide promising results in terms of accuracy (especially for certain appliances), even using 30min sampled data, and are scalable to the large smart meter time series collections of energy consumption data currently available to electricity suppliers. Nevertheless, our study shows that more work is needed in this area to further improve the accuracy of the proposed solutions. This paper appeared in the proceedings of the 14th ACM International Conference on Future Energy Systems (e-Energy '23).
CCS CONCEPTS
• Computing methodologies → Learning paradigms.
KEYWORDS
Appliance Detection, Smart Meter Data, Time Series Classification
1 INTRODUCTION
The energy sector is undergoing significant changes, primarily driven by the need for a more sustainable and secure energy supply. One way to better manage our consumption is to understand it better. In the last decade, electricity suppliers have installed millions of smart meters worldwide to improve their ability to manage the electrical grid [9, 40]. These meters record detailed time-stamped data on electricity consumption, allowing both individual customers and businesses to better understand and rationalize their consumption [5]. These data are also valuable for suppliers, as they can help them anticipate energy demand more accurately. Overall, the widespread adoption of smart meters plays a crucial role in transitioning toward a more sustainable and efficient energy system.

Figure 1: Comparisons of load curves containing a dishwasher and a washing machine at different sampling frequencies (1 second vs 1min, 15min, and 30min).
For electricity suppliers, knowing the specific electrical appliances owned by their customers is critical for providing personalized and relevant recommendations, or offers. As the demand for personalized advice increases, being able to offer tailored suggestions has become an essential aspect of customer satisfaction and retention. One way to gather this information is by asking customers directly through a consumption questionnaire. However, this method can require a significant investment in terms of time and resources, which customers may not accept. Therefore, electricity suppliers need to find more efficient and non-intrusive ways of gathering this information, such as using advanced data analytics techniques to detect the appliances directly through the collected smart meter data [18].
Appliance detection has become a significant area of research, with various techniques employed to detect the presence of devices [34, 46]. The use of signature-based methods, which utilize information about the unique patterns of specific appliances, is a widely adopted approach. However, all these studies relied on data from smart meters capable of recording 1 (or even more) values per second. Nonetheless, most smart meters record consumption at a very low sampling frequency: once every 10 to 60min, and in some cases with an even lower frequency. Note that, nowadays, individual smart meters collect data at resolutions of 15min in Italy, 30min in the UK, and 60min in Spain [59]. In France, individual smart meters installed by Enedis (an Electricité De France subsidiary) collect the total household consumption index only every 30min, and this index will soon be collected every 15min. The very low frequency at which smart meters record data leads to the aggregation of multiple appliance activation signatures that occur simultaneously at different frequencies. This results in a smoothed signal, causing the loss of unique appliance pattern information. Figure 1 illustrates
this loss of information. We observe that the dishwasher (shown on the left) and washing machine (shown on the right) signatures become increasingly hard to distinguish from one another as the sampling frequency drops. Therefore, it becomes infeasible to detect appliances accurately using signature-based methods at the sampling frequencies now used in practice.
In recent years, the field of time series data mining has seen a significant amount of research dedicated to developing algorithms for classifying time series data of any kind [4, 6, 16, 24, 33]. However, most of these algorithms have not yet been used and tested for the appliance detection problem. We argue that it is necessary to evaluate these classifiers on various datasets and appliance detection cases to better understand their performance and limitations. In addition, there is a need to evaluate the impact of the smart meter reading characteristics (i.e., sampling frequency) on the classifier detection score; this information is valuable for electricity suppliers in order to determine an adequate meter reading frequency, suitable for detecting the presence of appliances with sufficient accuracy.
In this paper, we propose a benchmark of diverse state-of-the-art classification methods for the problem of appliance detection in very low-frequency electrical consumption time series. We conduct our experimental evaluation on five real smart meter datasets using different time series classifiers. We first focus on detecting appliances in very low-frequency smart meter data (30min level), as it is nowadays one of the standard sampling rates adopted by electricity suppliers. We then provide an in-depth analysis of the increase in detection quality obtained using higher-frequency smart meter readings: 15min, 10min, and 1min. To our knowledge, this is the first study to perform an exhaustive comparison of 11 state-of-the-art methods on five diverse real datasets with 13 different types of appliances, for multiple sampling frequencies. The experimental evaluation demonstrates that current time series classifiers can accurately detect several appliances, even at the 30min resolution. Specifically, deep learning techniques are the most accurate and scalable when applied to large smart meter datasets. Moreover, we demonstrate that setting the smart meter reading frequency to 1min can greatly enhance appliance detection using time series classifiers.
Our contributions are summarized as follows.
• We describe a framework for comparing the performance of different time series classification methods for the appliance detection problem, and make this framework publicly available: https://github.com/adrienpetralia/ApplianceDetectionBenchmark
• We perform an extensive experimental evaluation using 5 diverse real datasets and 11 time series classifiers, including both traditional machine learning, as well as deep learning methods.
• We report the results of our comparison, which demonstrate that (i) current time series classifiers can only detect certain appliances at the 30min resolution; (ii) deep learning classifiers are the most accurate and scalable solution; and (iii) electricity suppliers should target a minimum smart meter reading frequency of 15min.
The findings of this study can help electricity suppliers make informed decisions regarding the characteristics of future smart meter deployments. Moreover, these findings point to interesting (and still challenging) open research directions in the context of electricity consumption time series analysis, and appliance detection in particular.
2 BACKGROUND AND RELATED WORK
2.1 Smart Meter Data
An electrical consumption load curve is defined as a univariate time series X = (x_1, ..., x_T) of ordered elements x_j ∈ ℝ+, following (i_1, ..., i_T) time consumption indexes (i.e., timestamps). The sampling frequency is defined as the time difference between two consecutive record indexes, Δt := i_j − i_{j−1}. Each element x_j, usually given in Watts, indicates either the actual power at time i_j, or the average electric power drawn during the time interval Δt. The value can also be given in Watt-hours. In the literature, the definition of high and low-frequency smart meter data can differ [22]. In this study, we refer to data sampled at less than 1 second as high-frequency, and to data sampled between 1 second and 1min as low-frequency. Data sampled above 1min is referred to as very low-frequency smart meter data.
[Individual appliance load curve] By monitoring electric devices
with individual meters, we can obtain the consumption load curve of
each individual appliance in a household. However, instrumenting
every appliance in the house is prohibitively expensive.
[Aggregate load curve] The main power consumption of a house is usually recorded by a smart meter device located on the electrical meter of the household. This aggregate signal is the sum of the power consumption of all individual appliances in the household.
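The relationship between individual and aggregate load curves, and the smoothing effect of very low-frequency readings, can be sketched with pandas. The appliance power values and the one-hour horizon below are illustrative assumptions, not data from the paper:

```python
import numpy as np
import pandas as pd

# Hypothetical 1-second appliance-level load curves over one hour (in Watts).
idx = pd.date_range("2023-01-01", periods=3600, freq="s")
rng = np.random.default_rng(0)
dishwasher = pd.Series(rng.choice([0, 2000], size=3600, p=[0.7, 0.3]), index=idx)
kettle = pd.Series(rng.choice([0, 2400], size=3600, p=[0.95, 0.05]), index=idx)

# The aggregate load curve is the sum of the individual appliance curves.
aggregate = dishwasher + kettle

# Very low-frequency smart meter data: average power over each 30min interval.
# The sharp appliance signatures are smoothed away by this aggregation.
aggregate_30min = aggregate.resample("30min").mean()
print(aggregate_30min.shape)  # (2,): two readings for one hour of data
```

This is exactly the loss of information illustrated in Figure 1: 3600 points collapse to 2 averaged values at the 30min resolution.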
2.2 Non-Intrusive Load Monitoring (NILM) and
Appliance Detection
Non-Intrusive Load Monitoring (NILM) [18], also called load disaggregation, relies on identifying the individual power consumption, pattern, or on/off state activation of individual appliances using only the total aggregated load curve [27]. NILM was initially approached as a problem involving linear combinations, with algorithms aiming to estimate the proportion of total power consumption used by distinct active appliances at each time step [27]. Early research on this topic employed combinatorial optimization techniques [27]. Later, Hidden Markov Models became the dominant approach, and in the last few years, deep learning models have been the reference to perform disaggregation [22, 27, 28, 53, 57]. Furthermore, NILM approaches can be divided into supervised and unsupervised learning, depending on whether they use labeled data for training the models. Supervised learning involves classifying detected events (appliances being switched on or off) by matching extracted features [31, 34, 43, 55]. In contrast, unsupervised NILM methods detect events by analyzing feature similarities, or correlations, without using labeled data [18, 58].
Since device recognition can be seen as a step of NILM-based methods, different approaches exist in the literature to detect appliances in load curves using high or low-frequency smart meter data [3, 25, 26, 31, 43, 45, 55]. However, numerous studies using pattern recognition at low frequency require knowledge about how each device operates. A few recent research studies [3, 25, 31, 43] used time series features, or deep learning representations, to detect events or appliance activation patterns. Despite the promising results demonstrated by these studies using modern machine learning approaches, we note that they are only applied to high-frequency data (i.e., data sampled at a minimum rate of 1 sample per second).
2.2.1 Studies on Very Low-Frequency Data. Most NILM studies use high-frequency smart meter data (seconds level at maximum), and only very few studies have been conducted using very low sampling rates [41, 59]. In [59], the authors suggested three methods to estimate appliance consumption using hourly smart meter data. The first two methods are unsupervised and require knowledge about manufacturer appliance parameters. The third method is a supervised deep learning approach that requires disaggregated appliance load curves for training. The few NILM studies conducted at this sampling rate focus on estimating the consumed power of each appliance, knowing which appliances are present in the households.
Few papers in the literature [2, 14] try to tackle the problem of detecting the devices owned by a household using very low-frequency sampled data. In [2], the authors used a Hidden Semi-Markov Model (HSMM) to extract appliance features from power consumption data. These features are then merged with external variables (such as temperature) and serve to train an AdaBoost classifier [48] to detect the presence of different appliances. In [14], the authors proposed a framework that uses a deep learning approach on subsequences of a long consumption load curve to detect the appliances present in the household. A majority vote gives the final device prediction, based on the individual predictions made on every examined subsequence. The study compares their method to [2], but not to any of the current state-of-the-art time series classifiers. In addition, only one public dataset at one sampling rate was considered.
2.3 Time Series Classication
Time series classication (TSC) [
4
,
24
] is an important analysis
task across several domains. Many studies have suggested dierent
approaches to solve the TSC problem, ranging from the computation
of similarity measures between time series [
10
] to the identication
of discriminant patterns [
20
]. In addition, benchmarks, such as the
the UCR archive [
11
], have been proposed, on which exaustive
experimental studies have been conducted [
4
]. We discuss in more
detail the current state-of-the-art time series classiers in Section 3.
3 PROBLEM DEFINITION AND PROPOSED
BENCHMARK
3.1 Problem Definition
In this work, we treat the appliance detection problem as a supervised binary classification problem. We aim to identify the presence/absence of a specified appliance's activation signature in a smart meter data series, independently of the number of activations of this appliance. The presence can be simply defined by the fact that the device is switched "ON" at least once. Formally, we define the problem as follows:

Definition 3.1 (Appliance Detection Problem). Given an aggregate smart meter time series X ∈ ℝ^T and an appliance type a, we want to know if appliance a is activated at least once in X (i.e., was in an "ON" state, regardless of the time and number of activations).
3.2 Overview of Time Series Classifiers
We now provide an overview of the different approaches proposed in the literature to solve the TSC problem (refer to Figure 2). The objective is to compare the performance of these methods when applied to the appliance detection problem.

Figure 2: Taxonomy of classifiers considered in our benchmark (in blue: classifiers used in the experimental evaluation). The taxonomy covers Nearest-Neighbor classifiers (KNN with Euclidean or DTW distance), Tree-based classifiers (TSF, RISE, DrCIF), Dictionary-based classifiers (BOSS, BOSS ensemble, cBOSS ensemble), Convolutional-based classifiers (ROCKET, MiniRocket, Arsenal), and Deep learning-based classifiers (ConvNet, ResNet, ResNetAtt, InceptionTime).
3.2.1 Nearest-Neighbor Classifier. K-Nearest-Neighbor classifiers are the most simple and intuitive classifiers, based on the notion of time series similarity. Following a chosen distance measure, each new instance is classified by getting assigned the majority label of the K closest samples in the training set. The most popular distance measure is the Euclidean distance, which compares two instances point to point. However, this distance does not consider possible distortions on the temporal axis. Dynamic Time Warping (DTW) [47] is a distance measure that computes the similarity between two time series, where relevant patterns may evolve at different speeds. DTW suffers from a high computational cost, which makes it challenging to apply on large datasets.
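The DTW distance can be computed with a standard dynamic program; a minimal O(nm) sketch (without the warping-window constraint often used in practice to reduce cost):

```python
import numpy as np

def dtw_distance(x, y):
    """Classic DTW: cell (i, j) holds the squared difference (x_i - y_j)^2
    plus the cheapest of the three allowed predecessor cells."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])

# Two shifted copies of the same peak: DTW aligns them, Euclidean does not.
a = np.array([0., 0., 1., 2., 1., 0.])
b = np.array([0., 1., 2., 1., 0., 0.])
print(dtw_distance(a, b))  # 0.0: the warping path absorbs the shift
```

The nested loop makes the quadratic cost visible, which is exactly why KNN-DTW scales poorly to the large smart meter collections considered here.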
3.2.2 Tree-Based Classifier. Tree-based classifiers, like Random Forest [7], have exhibited promising results in classification tasks.
[Time Series Forest] TSF [15] is a random forest-based classifier that uses as input features extracted from randomly sampled intervals of the raw data series. The algorithm first selects a number r of intervals with random start positions and lengths; then, from each interval, three simple features are extracted: the mean, the standard deviation, and the slope. Finally, the 3r new features serve to train a classic random forest classifier.
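The interval features that TSF feeds to its random forest can be sketched as follows (the interval sampling ranges and the default r=8 are illustrative; the sktime implementation used in this benchmark handles these internally):

```python
import numpy as np

def tsf_features(x, r=8, rng=np.random.default_rng(0)):
    """TSF-style interval features: for r random intervals of the series,
    compute the mean, standard deviation, and least-squares slope -> 3r values."""
    x = np.asarray(x, dtype=float)
    feats = []
    for _ in range(r):
        length = int(rng.integers(3, len(x) + 1))          # random interval length
        start = int(rng.integers(0, len(x) - length + 1))  # random start position
        seg = x[start:start + length]
        slope = np.polyfit(np.arange(length), seg, deg=1)[0]
        feats += [seg.mean(), seg.std(), slope]
    return np.array(feats)

x = np.sin(np.linspace(0, 6, 48))  # e.g., one day of 30min readings
feats = tsf_features(x)
print(feats.shape)  # (24,): 3 features x 8 intervals
```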
[Random Interval Spectral Ensemble] The RISE algorithm [33] is a random forest classifier based on spectral extraction features, rather than simple summary statistics for each interval. It computes the Fast Fourier Transform (FFT) and the Auto-Correlation Function (ACF) for several randomly selected intervals. In contrast to TSF, the algorithm extracts only one interval from the raw series for each decision tree, and the first tree is built using the features extracted from the entire series.
[DrCIF] The Diverse Representation Canonical Interval Forest Classifier (DrCIF) algorithm [38] is an extension of the Canonical Interval Forest (CIF) classifier [37], which itself uses the Canonical Time Series Characteristics (Catch22) [35]. Unlike the two previous tree-based methods, this algorithm is an interval-based time series classifier that looks for discriminative subseries before building the decision trees.
3.2.3 Dictionary-Based Classifier. Dictionary-based approaches, also called bag-of-words approaches, transform a time series into a sequence of symbols (usually letters) according to a chosen discretization technique. Using a sliding window of a specific size l, it is then possible to count the number of repeated patterns (i.e., symbolic words) and to perform classification based on the frequency of similar patterns.
[BOSS] The Bag Of SFA Symbols (BOSS) [49] is a dictionary-based classifier that uses Symbolic Fourier Approximation (SFA) [50] as a discretization technique. It first extracts subsequences from the raw series using a predefined sliding window of length l. Then, each subseries is discretized into a word of size w over α symbols using SFA and the Multiple Coefficient Binning algorithm [49]. This symbolic sentence (i.e., word arrangement) is then converted into a histogram by counting the frequency of occurrence of each word. Finally, classification is performed using the histogram information.
[BOSS and cBOSS Ensembles] The BOSS ensemble [49] is a set of individual BOSS classifiers that use different discretization parameters w and l. The parameter l is defined as l ∈ [10, T] (T being the time series length), and the values of w are taken from {16, 14, 12, 10, 8}. The number of symbols, α, is set to the default value of 4. The algorithm keeps only the individual BOSS classifiers that performed best according to a validation test. The BOSS ensemble requires building and evaluating a large number of models, making it a time- and memory-intensive classifier for large datasets. To address this complexity, a compact version (cBOSS) was introduced, which uses a restricted set of randomly chosen parameters for ensemble creation.
3.2.4 Deep Learning-Based Classifier. The interest in deep learning methods for time series classification has risen significantly in the past few years [24, 56]. These models have shown excellent performance, reaching the top of the state-of-the-art.
[ConvNet] A Convolutional Neural Network (CNN) [42] is a type of deep learning neural network widely used in image recognition that is specially designed to extract patterns from data with a grid-like structure, such as images or time series. A CNN uses convolution, where a filter is applied on a sliding window over the time series. The ConvNet architecture proposed in [56] is composed of three stacked convolutional blocks followed by global average pooling [32] and a Softmax activation function. Each convolutional block comprises a convolutional layer followed by a batch normalization layer [23] and a ReLU activation layer. The three blocks use the following 1D kernel sizes: {8, 5, 3}.
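This architecture can be sketched in PyTorch. The kernel sizes {8, 5, 3} follow the text above; the per-block channel widths (128, 256, 128) are an assumption taken from the original ConvNet description in [56], and the final linear head produces logits (the softmax is typically folded into the loss):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv1d -> BatchNorm -> ReLU, as in each block of the ConvNet."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size, padding="same"),
            nn.BatchNorm1d(out_ch),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.block(x)

class ConvNet(nn.Module):
    """Three stacked conv blocks (kernel sizes 8, 5, 3), global average
    pooling over time, then a linear classification head."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            ConvBlock(1, 128, 8),
            ConvBlock(128, 256, 5),
            ConvBlock(256, 128, 3),
        )
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                     # x: (batch, 1, T)
        z = self.features(x).mean(dim=-1)     # global average pooling
        return self.head(z)                   # logits for the two classes

model = ConvNet()
logits = model(torch.randn(4, 1, 48))  # e.g., a batch of one-day 30min series
print(logits.shape)  # torch.Size([4, 2])
```

Because the pooling averages over the time dimension, the same network handles the different series lengths produced by the different sampling frequencies in Table 1.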
[ResNet] The Residual Network (ResNet) architecture [19] was introduced to address the vanishing gradient problem encountered in large CNNs [52]. A ResNet is formed by stacking several blocks and connecting them using residual connections (i.e., identity mapping). For time series classification, a ResNet architecture has been proposed in [56], and has demonstrated strong classification accuracy [6]. It is the same architecture as the previously described ConvNet model, with the addition of a residual connection between each convolutional block.
[ResNet with Attention Mechanism] In [14], the authors proposed an extension of the ResNet architecture to perform appliance detection. The model starts by extracting features using six convolutional blocks with dilated convolutions and residual connections, followed by two encoder/decoder modules that use a dot-product attention mechanism. In this model, the dilated convolution (i.e., adding zeros between the elements of the filter) aims to increase the receptive field of the kernels without increasing the number of parameters. After the feature extraction step, the classification step is performed using a multi-layer perceptron followed by a softmax activation function.
[InceptionTime] Inspired by inception-based networks in computer vision [54], an ensemble of five neural networks using Inception modules has been proposed for time series classification [16]. The model consists of five identical networks using residual connections and convolutional layers. Each network uses 3 Inception modules that replace the traditional residual blocks found in a ResNet architecture. Each Inception module consists of a concatenation of convolutional layers using different filter sizes. Specifically, each module comprises the following layers. In the case of multivariate time series, a 1D convolutional bottleneck layer is used to reduce the number of dimensions of the time series. Then, the output is fed to three different 1D convolutional layers with different kernel sizes (10, 20, and 40) and one max-pooling layer with kernel size 3. The last step consists of concatenating the previous four layers along the channel dimension and applying a ReLU activation function to the output, followed by batch normalization. All the convolutional layers used in the module have 32 filters and a stride parameter of 1.
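A single Inception module following the description above can be sketched in PyTorch. Two simplifications are assumptions of this sketch: the bottleneck is applied unconditionally (the paper applies it for multivariate input), and the max-pooling branch is followed by a 1x1 convolution to bring it to 32 channels, as in the original InceptionTime implementation:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """One Inception module: bottleneck, three parallel convolutions
    (kernel sizes 10, 20, 40; 32 filters each; stride 1), a max-pooling
    branch, channel-wise concatenation, then ReLU and batch normalization."""
    def __init__(self, in_channels, n_filters=32):
        super().__init__()
        self.bottleneck = nn.Conv1d(in_channels, n_filters, kernel_size=1, bias=False)
        self.convs = nn.ModuleList(
            nn.Conv1d(n_filters, n_filters, k, stride=1, padding="same", bias=False)
            for k in (10, 20, 40)
        )
        self.pool_branch = nn.Sequential(
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
            nn.Conv1d(in_channels, n_filters, kernel_size=1, bias=False),
        )
        self.bn = nn.BatchNorm1d(4 * n_filters)

    def forward(self, x):                           # x: (batch, channels, T)
        z = self.bottleneck(x)
        branches = [conv(z) for conv in self.convs] + [self.pool_branch(x)]
        out = torch.cat(branches, dim=1)            # concatenate along channels
        return self.bn(torch.relu(out))             # ReLU then batch norm, per the text

module = InceptionModule(in_channels=1)
out = module(torch.randn(4, 1, 48))
print(out.shape)  # torch.Size([4, 128, 48]): 4 branches x 32 filters
```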
3.2.5 Random Convolutional Kernel Features Classifiers. The authors of [12] proposed an approach based on convolution filters without learning any weights. Some variants of this model, based on the same principle, were later proposed in the literature.
[ROCKET] The RandOm Convolutional KErnel Transform (ROCKET) algorithm [12] uses 1D convolutional kernels to extract relevant features. Instead of learning the filter parameters using a gradient descent algorithm to detect relevant patterns, the method generates a large set of K kernels with random length, weights, bias, dilation, and padding. After applying them, the maximum and the proportion of positive values of each convolution output are extracted as new features, resulting in 2K features for each instance. Classification is then performed on these features, using a simple ridge classifier. By default, ROCKET uses 10000 random kernels.
[MiniRocket] MINImally RandOm Convolutional KErnel Transform (MiniRocket) [13] is a version of ROCKET that reduces the random sampling space of the filter parameters, and keeps only the proportion of positive values as a new feature for each kernel. These modifications lead to a lower execution time while maintaining similar performance.
[Arsenal] Arsenal [39] is an ensemble of multiple ROCKET classifiers that uses a restricted number of kernels compared to the original model. This method was proposed to estimate the variance of the classifier's predictions without changing the type of classifier.
3.2.6 Ensemble Models. To reduce the variance in predictions, using a combination of models rather than a single one is a common technique. Ensemble models combining different approaches have been proposed to address the TSC problem. Several ensemble methods have been proposed in the literature, such as TS-CHIEF (Time Series Combination of Heterogeneous and Integrated Embedding Forest) [51] and HIVE-COTE (Hierarchical Vote Collective of Transformation-Based Ensembles) [39]. The first is an ensemble of tree classifiers; the second combines 4 different classifiers and uses majority voting to provide the final prediction. However, these
Table 1: Left side: dataset characteristics (number of time series, sampling frequency, time series length). Right side: selected appliance detection cases across the five datasets; for each case, the table summarizes the number of time series available (TS) and the imbalance degree of the test set (IB Ratio). A slash indicates that no data are available for this case/dataset.

Dataset characteristics:
REFIT: 9091 time series; length 1440 (1min), 144 (10min), 96 (15min), 48 (30min)
UKDALE: 4767 time series; length 1440 (1min), 144 (10min), 96 (15min), 48 (30min)
CER: 4225 time series; length 25728 (30min)
EDF 1: 2611 time series; length 17520 (30min)
EDF 2: 1553 time series; length 26208 (10min), 17472 (15min), 8736 (30min)

Appliance detection cases (TS / IB Ratio):
Tech: Desktop Computer: 5190/0.56, 3286/0.47, 1402/0.38, 3740/0.62 (four of the five datasets); Television: REFIT only, 1134/0.92
Kitchen: Cooker: 1682/0.76 (one dataset); Kettle: REFIT 4790/0.72, UKDALE 1222/0.84; Microwave: 7434/0.55, 1678/0.77, 324/0.91 (three datasets); Electric Oven: EDF 1 510/0.85, EDF 2 1152/0.91
Washer: Dishwasher: REFIT 7798/0.44, UKDALE 2378/0.32, CER 2350/0.66, EDF 1 224/0.93, EDF 2 2846/0.75; Tumble Dryer: 3466/0.22, 2214/0.68, 1534/0.41, 3470/0.42 (four datasets); Washing Machine: REFIT 7422/0.54, UKDALE 2830/0.38
Heating: Water Heater: 3070/0.56, 1336/0.66, 548/0.86 (three datasets); Electric Heater: 1348/0.19, 1624/0.58, 1538/0.56 (three datasets); Convector/Heat Pump: EDF 1 506/0.69
Other: Electric Vehicle: 140/0.3 (one dataset)
models suer from a high execution time and cannot be applied to
very long time series such as load curves.
3.3 Energy Consumption Datasets
Numerous energy consumption datasets exist in the literature [8], and some of them have become references for conducting NILM studies [17, 29, 30]. However, these datasets typically provide aggregated and appliance-level load curves for only a few houses at a high sampling frequency. Resampling them at a very low frequency leads to significant data reduction. In order to include a broader range of appliances and to align with existing literature, we include two NILM datasets in our experiments: UK-DALE [29] and REFIT [17]. We also include one public dataset providing 30min sampled aggregate load curves for a large number of households [1]. Moreover, we include two private datasets from EDF (the main French electricity supplier). In total, we consider five diverse real datasets in our experimental evaluation. These datasets are detailed below.
3.3.1 NILM Datasets. UKDALE and REFIT are two well-known high-frequency smart meter datasets used in NILM studies [53, 57].
[UK-DALE] The UK-DALE dataset [29] contains data from 5 houses in the United Kingdom, and includes appliance-level load curves sampled every 6 seconds, as well as the whole-house aggregate data series sampled at 16kHz. Four houses were recorded for over a year and a half, while the 5th house was recorded for 655 days.
[REFIT] The REFIT project (Personalised Retrofit Decision Support Tools for UK Homes using Smart Home Technology) [17] ran between 2013 and 2015. During this period, 20 houses in the United Kingdom were monitored with smart meters and multiple sensors. This dataset provides aggregate and individual appliance load curves at 8-second sampling intervals.
3.3.2 CER Dataset. The Commission for Energy Regulation (CER) of Ireland conducted a study to assess the performance of smart meters and their impact on consumer energy consumption [1], recording the aggregate load curve consumption every 30min for over 5000 Irish homes and businesses. Participants filled out a questionnaire on the household composition, electricity consumption behavior, and the type and number of appliances present in the home or business. In this work, we use the residential sub-group of the study, i.e., 4225 households recorded from July 15, 2009, to January 1, 2011, for a total of 4225 series of length 25728 data points each.
3.3.3 EDF Datasets. To better understand its customer base and their electricity consumption behavior, Electricité De France (EDF) conducts surveys on customer samples. These customers consent to EDF using their data to analyze their consumption behaviors, and only the aggregate power consumption of the house is recorded. Similar to the CER study, customers fill out a questionnaire with information on which appliances are present in their households, and on their consumption habits. Two EDF datasets from two different studies were used in our experiments.
[EDF Dataset 1] The first one contains 2611 load curves of one year of electricity consumption recorded at a 30min sampling frequency. Data were collected between September 2019 and September 2021 from 1553 different clients. The dataset consists of 2611 time series of length 17520 from 1553 different sources.
[EDF Dataset 2] The second dataset contains 5354 load curves at a 10min sampling frequency, recorded over a period of six months. Data were collected between January 2012 and January 2015 from 1260 clients. The dataset consists of 5354 time series of length 26208 from 1260 different sources.
4 EXPERIMENTAL SETUP
All experiments are performed on a high-performance computing cluster. The source code is in Python 3.7, and for each classifier we use the default parameters provided by the authors in the original papers. For non-deep-learning approaches, we use the sktime library [36]. We perform each experiment on a server with 2 Intel Xeon Gold 6140 CPUs with 190 GB RAM. For deep-learning based models, we implement all the models using version 1.8.1 of the PyTorch framework [44], and run experiments on a server with 2 NVidia V100 GPUs with 16 GB RAM.
We consider all the classifiers presented in Section 3.2. We run each method five times using different random train/validation/test splits and report the average of these runs. Note that the error bars shown in Figure 3, Figure 7, and Figure 8 correspond to the average variability of the classifiers across the five runs. Additionally, we set a 10-hour time limit per job. Only models that finished a run (training + inference) are considered. We note that the ResNet with Attention model was not evaluated using UKDALE and REFIT data, because the residual block's dilated convolution is incompatible with the small size of the time series of these datasets.
We make all code available online: https://github.com/adrienpetralia/ApplianceDetectionBenchmark
4.1 Data Preprocessing
Since the datasets we employ in this study have been created using different sampling frequencies, we preprocess them for the experiments as explained below. The left part of Table 1 summarizes, for each dataset, the number of time series and the corresponding length, according to each sampling frequency.
4.1.1 NILM dataset preprocessing. The REFIT and UKDALE datasets provide appliance-level and total consumption load curves for a small number of houses: 20 and 5, respectively. Moreover, the electrical appliances in the houses are likely the same. Inspired by the data processing step in NILM studies [53, 57], we preprocess the datasets by slicing the entire consumption curve of each household into smaller sub-sequences.
For each experiment, we first resample the data to a specified sampling rate and fill gaps of less than 1 hour using linear interpolation. Then, we process the datasets by splitting each household's consumption load curve into smaller sub-sequences of one day, and by dropping those with missing values. The choice of one day for the sub-sequence length provides an overall balance between positive (i.e., containing the device) and negative (i.e., not containing the device) samples. In contrast, slicing the entire consumption curve into weeks leads to very few negative samples for most appliance cases. This is because the appliances in these datasets are devices that are used very frequently (on average, once every two or three days). To assign the positive or negative label (i.e., appliance presence or not) to a sub-sequence, we use the corresponding disaggregated appliance load curve, which tells us whether the appliance was switched on at least once on a given day.
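The slicing and labeling step above can be sketched as follows. This is an illustrative helper, not the authors' released code; the `ON_THRESHOLD_W` power threshold and the function name are our own assumptions.

```python
ON_THRESHOLD_W = 10.0  # assumed power threshold marking the appliance as "on"

def slice_and_label(aggregate, appliance, samples_per_day):
    """Slice an aggregate load curve into one-day sub-sequences and label
    each day from the sub-metered appliance curve (None marks a gap)."""
    samples, labels = [], []
    for start in range(0, len(aggregate) - samples_per_day + 1, samples_per_day):
        day_agg = aggregate[start:start + samples_per_day]
        day_app = appliance[start:start + samples_per_day]
        if any(v is None for v in day_agg):  # drop days with missing values
            continue
        samples.append(day_agg)
        # positive label if the appliance ran at least once that day
        labels.append(int(any(v is not None and v > ON_THRESHOLD_W
                              for v in day_app)))
    return samples, labels
```

For example, with three days of 30min-sampled data (48 readings per day) where the second day has gaps and the appliance only runs on the third day, the helper returns two sub-sequences labeled 0 and 1.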
While preprocessing the UKDALE dataset, we noticed that the fourth house of the study could not be used for the experiments, since a single disaggregated load curve regrouped multiple appliances. Thus, we use three houses for the training/validation set, whereas the sub-sequences of the one remaining house are used for the test set. With the REFIT data, we use two randomly selected houses for the test set, while the other available houses are used for the train set.
4.1.2 CER and EDF datasets preprocessing. The CER and EDF datasets provide only the total aggregated load curve of each house. As a consequence, it is impossible to know whether an appliance is activated on a given day. Therefore, we cannot slice the time series into smaller sub-sequences as for the NILM datasets, and we provide the full-length load curves as inputs to the classifiers. In addition, we process the load curves by linearly interpolating gaps of less than 1 hour, and any time series with residual missing values are not retained. The appliance presence label is assigned using the questionnaire associated with each dataset. Finally, we perform a 70%/10%/20% random split of the houses for the training, validation, and test sets, respectively.
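The 70%/10%/20% household-level split can be sketched as below; the proportions come from the paper, while the helper name and seeded shuffling are our own assumptions.

```python
import random

def split_households(house_ids, seed=0):
    """Randomly split houses into 70% train / 10% validation / 20% test,
    so that no household appears in more than one set."""
    ids = list(house_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

Splitting at the household level (rather than at the time series level) prevents leakage of a given home's consumption habits between the training and test sets.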
4.1.3 Appliance Detection Cases. We select different cases of device detection across all the datasets, including small and large appliances. The right part of Table 1 summarizes the selected appliance detection cases for all datasets. The REFIT and UKDALE datasets include mostly small appliances because, in these studies, only plugged devices were recorded. On the other hand, the CER and EDF datasets provide information about larger appliances, directly connected to the electric meters, such as Water Heaters, Heaters, and Electric Vehicles.
The selected cases aim to determine if a specific device is present in a time series using binary detection. However, the "Convector/Heat Pump" case involves classifying the types of electric heaters, i.e., distinguishing between convectors and heat pumps.
In order to ensure that the classifiers are not biased during training, we maintain an equal balance of time series labeled with positive and negative samples. However, we note that the test set reflects the actual, imbalanced nature of the data, allowing us to evaluate the classifiers' performance in a realistic scenario.
In Table 1, TS is the number of labeled time series used for each case, in which the classes are balanced. IB Ratio indicates the imbalance level of the corresponding test sets (i.e., the percentage of positive instances over the total number of instances).
4.2 Evaluation Measures
[Accuracy] When detecting appliance presence/absence, several classification cases may be unbalanced. Indeed, most people own a television or a washing machine, but do not have an electric heating system or a swimming pool. However, a model that only predicts the majority class may appear to perform well in these cases when using the classification accuracy (i.e., the ratio of well-classified instances over the total number of instances). Precision, Recall, and their harmonic mean, called the F1-Score, are well-known measures, defined as follows:

F1-score = 2·P·R / (P + R), with: P = TP / (TP + FP), R = TP / (TP + FN),

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives. Nevertheless, the precision (P), recall (R), and F1-score measures independently indicate the model's performance on one class only. In the case of a binary classification problem with data imbalance, these measures are typically applied only to the minority class. In our classification problem, the minority class varies depending on the specific device: detecting an appliance (i.e., the positive class) could correspond either to the minority or the majority class. Thus, the F1-Score measure alone is not appropriate in our case. To account for this variability and provide an overall performance measure, we use the Macro F1-score to evaluate the performance of the classifiers. Formally, for N classes (in our case, N = 2), the Macro F1-Score is defined as follows:

Macro F1-score = (1/N) · Σ_{i=1}^{N} F1-score_i
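The Macro F1-score above can be computed directly from the per-class confusion counts; a minimal sketch (equivalent to scikit-learn's `f1_score(..., average='macro')` for binary labels, though the paper does not specify its implementation):

```python
def f1(tp, fp, fn):
    """F1-score for one class from its confusion counts."""
    p = tp / (tp + fp) if tp + fp else 0.0  # precision
    r = tp / (tp + fn) if tp + fn else 0.0  # recall
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f1(y_true, y_pred):
    """Macro F1 for binary labels: unweighted mean of the per-class F1-scores."""
    scores = []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)
```

Note how a majority-class predictor is penalized: predicting all-positive on a mostly positive test set scores 0 on the negative class, which halves the macro average.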
Table 2: Results (average Macro F1-score for 5 runs) for the 11 classifiers (as well as the average score of all classifiers) evaluated through the appliance detection cases (best in bold and second best underlined). The "Appliance Average Score" row shows the average detection score for a specific device detection case if the appliance is available on multiple datasets. A slash indicates that the corresponding classifier failed to run on this case (time series length was not sufficiently large).
Appliance Dataset Arsenal Minirocket Rocket ConvNet ResNet ResNetAtt InceptionTime BOSS TSF Rise KNNeucli Avg. Score
Desktop Computer
CER 0.618 0.617 0.606 0.602 0.614 0.530 0.608 0.516 0.580 0.586 0.491 0.579
EDF 1 0.571 0.564 0.570 0.489 0.560 0.459 0.555 0.491 0.533 0.543 0.469 0.528
EDF 2 0.603 0.576 0.582 0.579 0.620 0.514 0.601 0.519 0.570 0.592 0.520 0.571
REFIT 0.697 0.683 0.674 0.715 0.740 0.623 0.542 0.525 0.600 0.548 0.635
Appliance Average Score 0.622 0.610 0.608 0.596 0.634 0.597 0.517 0.552 0.580 0.507 0.578
Television REFIT 0.656 0.647 0.645 0.695 0.699 0.718 0.485 0.737 0.664 0.513 0.646
Cooker CER 0.680 0.673 0.676 0.661 0.689 0.541 0.710 0.526 0.566 0.584 0.440 0.613
Kettle REFIT 0.368 0.376 0.381 0.522 0.477 0.415 0.536 0.359 0.428 0.421 0.428
UKDALE 0.540 0.502 0.522 0.428 0.432 0.583 0.504 0.353 0.442 0.446 0.475
Appliance Average Score 0.454 0.439 0.452 0.475 0.454 0.499 0.520 0.356 0.435 0.434 0.452
Microwave
REFIT 0.656 0.598 0.588 0.745 0.679 0.673 0.563 0.540 0.717 0.529 0.629
UKDALE 0.446 0.498 0.460 0.532 0.526 0.541 0.435 0.459 0.430 0.378 0.471
EDF 1 0.480 0.471 0.475 0.534 0.510 0.409 0.474 0.454 0.400 0.429 0.457 0.463
Appliance Average Score 0.527 0.522 0.508 0.604 0.572 0.563 0.484 0.466 0.525 0.455 0.521
Oven EDF 1 0.513 0.498 0.499 0.512 0.512 0.472 0.523 0.506 0.429 0.497 0.437 0.491
EDF 2 0.557 0.584 0.553 0.571 0.562 0.560 0.576 0.495 0.459 0.491 0.397 0.528
Appliance Average Score 0.535 0.541 0.526 0.542 0.537 0.516 0.550 0.500 0.444 0.494 0.417 0.509
Dishwasher
REFIT 0.650 0.599 0.619 0.580 0.605 0.590 0.557 0.519 0.584 0.515 0.582
UKDALE 0.458 0.465 0.465 0.419 0.380 0.384 0.399 0.429 0.554 0.525 0.448
CER 0.699 0.720 0.700 0.730 0.728 0.594 0.737 0.586 0.609 0.648 0.488 0.658
EDF 1 0.454 0.441 0.450 0.528 0.522 0.383 0.535 0.430 0.418 0.421 0.211 0.436
EDF 2 0.753 0.760 0.741 0.799 0.801 0.585 0.835 0.596 0.603 0.600 0.512 0.690
Appliance Average Score 0.603 0.597 0.595 0.611 0.607 0.616 0.514 0.516 0.561 0.450 0.563
Tumble Dryer
REFIT 0.493 0.503 0.502 0.468 0.448 0.441 0.506 0.416 0.434 0.461 0.467
CER 0.634 0.641 0.628 0.606 0.612 0.550 0.623 0.549 0.578 0.602 0.474 0.591
EDF 1 0.619 0.578 0.607 0.624 0.607 0.475 0.636 0.550 0.537 0.563 0.487 0.571
EDF 2 0.733 0.714 0.714 0.757 0.769 0.475 0.769 0.560 0.593 0.681 0.493 0.660
Appliance Average Score 0.620 0.609 0.613 0.614 0.609 0.617 0.541 0.531 0.570 0.479 0.572
Washing Machine REFIT 0.605 0.572 0.592 0.581 0.586 0.614 0.520 0.562 0.557 0.529 0.572
UKDALE 0.475 0.505 0.478 0.535 0.530 0.454 0.408 0.581 0.549 0.509 0.502
Appliance Average Score 0.540 0.538 0.535 0.558 0.558 0.534 0.464 0.572 0.553 0.519 0.537
Water Heater
CER 0.625 0.613 0.613 0.610 0.612 0.465 0.637 0.527 0.596 0.584 0.462 0.577
EDF 1 0.835 0.821 0.827 0.814 0.828 0.768 0.841 0.670 0.713 0.805 0.591 0.774
EDF 2 0.733 0.685 0.724 0.731 0.685 0.591 0.759 0.658 0.580 0.666 0.617 0.675
Appliance Average Score 0.731 0.706 0.721 0.718 0.708 0.608 0.746 0.618 0.630 0.685 0.557 0.675
Heater
CER 0.522 0.532 0.514 0.533 0.508 0.477 0.565 0.459 0.492 0.527 0.397 0.502
EDF 1 0.784 0.783 0.789 0.777 0.778 0.713 0.800 0.643 0.758 0.777 0.638 0.749
EDF 2 0.591 0.566 0.578 0.626 0.637 0.527 0.648 0.497 0.591 0.605 0.451 0.574
Appliance Average Score 0.603 0.597 0.595 0.659 0.607 0.572 0.616 0.514 0.516 0.561 0.450 0.609
Type of Heater EDF 1 0.632 0.622 0.631 0.597 0.638 0.534 0.651 0.539 0.556 0.625 0.467 0.590
Electric Vehicle EDF 1 0.689 0.730 0.670 0.681 0.699 0.553 0.720 0.541 0.456 0.725 0.556 0.638
Classiers Average Score 0.601 0.593 0.592 0.609 0.610 0.617 0.521 0.531 0.574 0.474
Classiers Average Rank 3.773 4.697 4.758 4.303 3.697 2.864 7.939 7.924 6.197 8.848
[Time Performance] Considering the computation time of classifiers is crucial for evaluating their effectiveness in real-world scenarios. We measure the time performance of the classifiers, considering the total time required for both training and inference.
5 RESULTS AND DISCUSSION
This section presents the results of our experimental evaluation. First, we normalize the different datasets to the same sampling frequency, i.e., 30min, to obtain overall results on all the cases. Then, we perform an experimental evaluation of the influence of the sampling frequency on the detection quality of the classifiers. We also analyze the impact of the data size on the detection quality. Finally, we provide a discussion of the overall results.
5.1 Accuracy for 30min Sampling Frequency
The appliance detection results of the classifiers for the 30min sampling frequency are summarized in Table 2. We observe that all classifiers return poor results for the UKDALE dataset. (We discuss and explain these results in detail in Section 5.3.) Furthermore, we note that, independently of the dataset, some appliances are easier to detect than others. The following sections provide an analysis of these results according to the type of appliance.
5.1.1 Tech Appliances. Desktop Computer and Television seem to be well detected in the REFIT dataset, with a Macro F1-Score above 0.7 for the best classifiers. The score obtained for Desktop Computer on the other datasets is not as good, but is consistent with the number of time series provided. It can be explained by the fact that the pattern is hidden behind the activation signatures of other appliances in longer smart meter load curves, and is thus hard for classifiers to detect.
5.1.2 Kitchen Appliances. First, detecting Kettle usage looks quite challenging, with poor results obtained by all classifiers and a Macro F1-Score around 0.45. Given that a kettle operates for relatively short periods, it is understandable that its activation may not be captured using 30min sampled data. Microwave ovens and classic Ovens are not well detected in the EDF datasets. However, the detection score obtained on REFIT by the two best classifiers is above 0.7, thanks to the larger amount of data available for this case in REFIT. Finally, the Cooker is well detected on the CER dataset.
5.1.3 Washer Appliances. Classifiers achieve promising results when detecting Dishwasher and Tumble Dryer in the CER and EDF 2 datasets. The lower performance obtained with the EDF 1 dataset is explained by the smaller number of labeled instances available for these cases. However, the low scores obtained on the three washer appliances for REFIT are not due to the amount of time series data. We believe that this poor detection score can be explained by the fact that these three devices are used in combination and have similar activation patterns; therefore, the classifiers cannot easily distinguish among them.
5.1.4 Heating Appliances. The best detection scores are achieved for Water Heater on the EDF 1 and EDF 2 datasets. In France, water heaters refer mainly to devices that heat water in a hot-water tank, and usually operate during hours with high power consumption levels [21]. The classifiers can effectively discern this type of pattern, even using 30min sampled data. The lower performance on the CER dataset can be attributed to the use of two types of water heaters in Ireland: instantaneous and tank-pumped. Instantaneous water heaters only operate on demand, resulting in high spikes of short duration. Using the same label for these two devices, which have different activation signatures, significantly impacts the performance of the classifiers. The results on heater detection are satisfying for the EDF datasets, and we assume that the score difference between EDF 1 and EDF 2 is mainly due to the span of the time period used for training the model. By providing a full year of electricity consumption, the model can more easily detect the heater pattern, since it trains with data covering the high consumption levels of the winter season. The poor performance on the CER dataset for heater detection can be attributed to the fact that the heater label indicates the presence of a convector electric heater, which is typically used as a supplementary heat source in winter, rather than being the primary heat source for the home.
5.1.5 Other Appliances. Electric Vehicles are well detected on the EDF 1 dataset, considering the restricted number of labeled instances available. The lengthy recharging times of electric vehicles and the high power required, combined with the fact that recharging often occurs during low-consumption night-time hours, can explain the good performance we observe.
5.1.6 Overall Classifier Results Using 30min Data. The overall results, shown in Table 2 and Figure 3, demonstrate that InceptionTime outperforms the other classifiers when considering the average score and rank; InceptionTime is followed by ResNet, Arsenal, ConvNet, MiniRocket, and Rocket. Since the ResNet model enhanced with the attention mechanism was not evaluated in all the cases, we do not include it in the total average score shown in Figure 3.
[Figure 3: box plot of Macro F1-scores (roughly 0.4 to 0.7) for KNNeucli, BOSS, TSF, Rise, Rocket, Minirocket, Arsenal, ConvNet, ResNet, and Inception.]
Figure 3: Average classifier detection score across all the detection cases and all the datasets.
[Figure 4: bar plot of running times (log-scale y-axis, roughly 1 to 480 minutes) for KNNeucli, TSF, Rise, ResNet, ConvNet, ResNetAtt, Arsenal, Inception, Minirocket, Rocket, and BOSS.]
Figure 4: Average running time per run (training + inference time) for all classifiers (log-scale y-axis).
However, this classifier achieves relatively poor performance compared to the others (refer to Table 2). In light of these results, it is essential to note the difference in performance between the best and worst performing classifiers: convolutional-based classifiers, i.e., InceptionTime, ConvNet, and ResNet, are the optimal choice for many detection cases.
Figure 4 summarizes the average total running time (i.e., training and inference time together) for the 11 classifiers we studied. Taking into consideration the performance of the convolutional-based approaches (both deep- and non-deep-learning), as well as their running time, we observe that this type of classifier is the most suitable for appliance detection using 30min sampled smart meter data. InceptionTime reaches a slightly higher detection score, but at the cost of longer execution times. A balance between performance and efficiency is achieved by the ResNet and ConvNet classifiers.
5.2 Inuence of Sampling Rate
In this part of the experimental evaluation, we analyze the im-
provement of the detection score of the dierent classiers, as the
smart meter sampling rate increases. We used the REFIT and EDF 2
datasets to perform these experiments, since these datasets provide
data at a higher frequency than every 30min.
Using the REFIT dataset, we performed experiments at four
dierent sampling rates: 1min, 10min, 15min, and 30min. To obtain
complementary results on bigger appliances that were not available
with REFIT data, we also included appliance detection cases from
the EDF 2 dataset. However, since this dataset oers data sampled
at 10 min, we could only produce results for sampling rates: 10min,
15min, and 30min.
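Producing the lower-rate versions of a series amounts to aggregating non-overlapping windows; a minimal sketch, assuming mean power is the aggregation (the paper does not specify which aggregation it uses):

```python
def downsample(series, factor):
    """Average non-overlapping windows of `factor` consecutive readings,
    e.g. factor=3 turns a 10min-sampled series into a 30min one."""
    return [sum(series[i:i + factor]) / factor
            for i in range(0, len(series) - factor + 1, factor)]
```

For instance, `downsample(series, 30)` maps a 1min series to 30min resolution; a short appliance spike that lasts two minutes is then averaged away over a half-hour window, which illustrates why short-cycle appliances become hard to detect.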
All the results are summarized in Figure 5. For clarity, we only illustrate the scores of the five best classifiers.
On average across all cases, the appliance detection accuracy decreases significantly (by almost 0.1) when the sampling rate drops
[Figure 5 panels: (a) Dishwasher (REFIT), (b) Microwave (REFIT), (c) Kettle (REFIT), (d) Desktop Computer (REFIT), (e) Television (REFIT), (f) Washing Machine (REFIT), (g) Tumble Dryer (REFIT), (h) Water Heater (EDF 2), (i) Heater (EDF 2), (j) Oven (EDF 2), (k) Average All Cases.]
Figure 5: Influence of sampling frequency on different appliance detection cases. The detection score is given for each classifier and detection case following the resampling frequency of the data. The black line shows the average score of all the classifiers.
from 1min to 30min. For the best classifier (InceptionTime), the average drop is 0.15.
However, it is interesting to note that not all appliances are significantly better detected using a higher sampling frequency. As expected, appliances that operate only for short periods, i.e., Microwave or Kettle, benefit the most from higher smart meter frequencies. For example, the results in Figure 5 show that using 1min sampled data can significantly improve Kettle detection. In this case, the best classifier, ResNet, achieves a 0.2 improvement in the detection score when the sampling rate increases from 30min to 1min. For the Microwave case, there is a 0.1 average score gain across all the classifiers using 1min sampled data.
Other appliances, such as Dishwasher, Desktop Computer, Television, Washing Machine, and Water Heater, which typically operate for long periods, are also better detected using higher sampling rates. For example, using 1min level data, the Washing Machine is detected much more accurately than when using 30min data (refer to Figure 5(f)).
5.3 Inuence of Data Size
In this last part, we analyze the impact of the number of distinct
households on classier performance. These experiments demon-
strate that classiers cannot eectively learn the patterns of an
appliance using only a small number of households when the smart
meter data sampling frequency is very low (this explains the poor
results presented in Section 5.1 for the UK-DALE dataset). Further-
more, we demonstrate that the number of households is more im-
portant for training the machine learning models than the amount
of data available for each household.
We compared the following two approaches for training: (i) randomly select a subset of the houses and use all the data from these houses to train the models, and (ii) select all houses and use a random subset of the time series from each house. We performed the experiments on the appliance detection cases using the REFIT dataset. Furthermore, in order to account for the impact of the smart meter reading frequency on these results, we performed the experiments using 4 different sampling frequencies: 1min, 10min, 15min, and 30min.
Figure 6 summarizes the results of these experiments: the graphs show the average performance of all classifiers¹ for each sampling rate. The black line represents the score value averaged across all sampling rates.
We note that for every sampling rate and detection case, it is almost always preferable to use all the available households and a subset of their time series, rather than all time series from a subset of the households. Indeed, data from the same house is frequently characterized by the consumption patterns of its residents. Instead, using data from multiple households enables the classifier to focus on and learn the actual activation patterns of the appliances. Interestingly, using a subset of the households or a subset of the time series does not seem to significantly affect the detection accuracy for the Washing Machine and the Tumble Dryer. The Tumble Dryer is indeed not well detected in our experiments. However, the detection score of the Washing Machine seems to be impacted more by the sampling frequency than by the data size.
¹ We average the performance of all classifiers listed in Table 2, except for ResNetAtt, which could not be used with the small length of the REFIT time series.
[Figure 6 panels: (a) Desktop Computer, (b) Television, (c) Kettle, (d) Microwave, (e) Dishwasher, (f) Tumble Dryer, (g) Washing Machine.]
Figure 6: Results of the data size study using the REFIT dataset. For each appliance case, the left figure shows the evolution of the classification score according to the number of houses used, i.e., sources; the right figure shows the evolution of the classification score according to the percentage of data used per house.
6 DISCUSSION
We now summarize the results of our evaluation. Figure 7 shows the average score for each classifier across all the experiments conducted in our study. The results show that the three deep learning-based methods are the most accurate overall. Among them, ResNet and ConvNet perform on average slightly better than InceptionTime. However, as shown in Figure 8, the average score depends on the time series length. ResNet and ConvNet are better on average when using the short time series (REFIT and UKDALE datasets). InceptionTime is better on average when using long time series (CER and EDF datasets), because of InceptionTime's ability to capture long-lasting patterns through the use of a combination of differently-sized kernels. Nevertheless, as the confidence intervals indicate, there is no clear winner among the three deep-learning classifiers. Based on these findings, we recommend using either ResNet or ConvNet, since their time performance is one order of magnitude faster than InceptionTime's (see Figure 4).
Overall, the experiments show that to improve appliance detection, it is beneficial for electricity suppliers to collect data over extended periods of time, and at a finer time step than 30min. Indeed, a 15min step seems to be the minimum target in order to correctly detect a certain number of appliances. Furthermore, this study shows that further work is needed to more accurately detect appliances, even for data with a 1min sampling frequency. Nevertheless, the lack of large public electricity consumption datasets that can be used to develop and train new algorithms is an important shortcoming. Having more good-quality data over long time intervals is necessary in order to allow for the development of more robust methods and further advancements in the field.
[Figure 7: box plot of Macro F1-scores (roughly 0.4 to 0.7) for KNNeucli, BOSS, TSF, Rise, Rocket, Minirocket, Arsenal, Inception, ConvNet, and ResNet.]
Figure 7: Average classifier detection score across all the experiments realized in this study (including the sampling frequency influence and data size influence experiments).
[Figure 8: Macro F1-scores of ConvNet, ResNet, and Inception on short time series (REFIT, UKDALE) vs. long time series (CER, EDF 1, EDF 2).]
Figure 8: Average score of the 3 best classifiers (ConvNet, ResNet, and InceptionTime) according to the time series length (i.e., datasets).
7 CONCLUSIONS
This paper presents a comprehensive evaluation of state-of-the-art time series classifiers applied to appliance detection in very low-frequency smart meter data. We develop the first benchmark of time series classifiers for appliance detection, using five different real datasets of very low-frequency electricity consumption with varying time series lengths. The results indicate that the performance of current time series classifiers varies significantly; only appliances that operate during long periods of time can be accurately detected using 30min sampled data. However, using 1min sampled data can drastically increase the detection accuracy for small appliances. Furthermore, deep learning-based classifiers have shown promising results in terms of accuracy, particularly for certain appliances. Overall, this study provides a valuable contribution to electricity suppliers, as well as analysts and practitioners, helping them choose the appropriate classifier for accurately detecting appliances in very low-frequency smart meter data.
REFERENCES
[1]
2012. CER Smart Metering Project - Electricity Customer Behaviour Trial, 2009-
2010. https://www.ucd.ie/issda/data/commissionforenergyregulationcer/
[2]
Adrian Albert and Ram Rajagopal. 2013. Smart Meter Driven Segmentation:
What Your Consumption Says About You. IEEE Transactions on Power Systems
28, 4 (2013), 4019–4030. https://doi.org/10.1109/TPWRS.2013.2266122
[3]
Muzaer Aslan and Ebra Nur Zurel. 2022. An ecient hybrid model for ap-
pliances classication based on time series features. Energy and Buildings 266
(2022), 112087. https://doi.org/10.1016/j.enbuild.2022.112087
[4]
Anthony Bagnall, Aaron Bostrom, James Large, and Jason Lines. 2016. The Great
Time Series Classication Bake O: An Experimental Evaluation of Recently
Proposed Algorithms. Extended Version. https://doi.org/10.48550/ARXIV.1602.
01711
[5]
Gouri R. Barai, Sridhar Krishnan, and Bala Venkatesh. 2015. Smart metering and
functionalities of smart meters in smart grid - a review. In 2015 IEEE Electrical
Power and Energy Conference (EPEC). 138–145. https://doi.org/10.1109/EPEC
.2015.7379940
[6]
Paul Boniol, Mohammed Meftah, Emmanuel Remy, and Themis Palpanas. 2022.
DCAM: Dimension-Wise Class Activation Map for Explaining Multivariate Data
Series Classication. In Proceedings of the 2022 International Conference on Man-
agement of Data (Philadelphia, PA, USA) (SIGMOD ’22). Association for Comput-
ing Machinery, New York, NY, USA, 1175–1189. https://doi.org/10.1145/3514221.
3526183
[7]
L Breiman. 2001. Random Forests. Machine Learning 45 (10 2001), 5–32. https:
//doi.org/10.1023/A:1010950718922
[8]
Deepika R. Chavan, Dagadu S. More, and Amruta M. Khot. 2022. IEDL: Indian
Energy Dataset with Low frequency for NILM. Energy Reports 8 (2022), 701–709.
https://doi.org/10.1016/j.egyr.2022.05.133 2022 The 4th International Conference
on Clean Energy and Electrical Systems.
[9]
Stanislav Chren, Bruno Rossi, and Tomáš Pitner. 2016. Smart grids deployments
within EU projects: The role of smart meters. In 2016 Smart Cities Symposium
Prague (SCSP). 1–5. https://doi.org/10.1109/SCSP.2016.7501033
[10]
T. Cover and P. Hart. 1967. Nearest neighbor pattern classication. IEEE Trans-
actions on Information Theory 13, 1 (1967), 21–27. https://doi.org/10.1109/TIT.
1967.1053964
[11]
Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan
Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh.
2018. The UCR Time Series Archive. https://doi.org/10.48550/ARXIV.1810.07758
[12] Angus Dempster, François Petitjean, and Geoffrey I. Webb. 2019. ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels. CoRR abs/1910.13051 (2019). arXiv:1910.13051 http://arxiv.org/abs/1910.13051
[13] Angus Dempster, Daniel F. Schmidt, and Geoffrey I. Webb. 2021. MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, New York, 248–257.
[14] Chunyu Deng, Kehe Wu, and Binbin Wang. 2022. Residential Appliance Detection Using Attention-based Deep Convolutional Neural Network. CSEE Journal of Power and Energy Systems 8, 2 (2022), 621–633. https://doi.org/10.17775/CSEEJPES.2020.03450
[15] Houtao Deng, George Runger, Eugene Tuv, and Martyanov Vladimir. 2013. A Time Series Forest for Classification and Feature Extraction. (2013). https://doi.org/10.48550/ARXIV.1302.2277
[16] Hassan Ismail Fawaz, Benjamin Lucas, Germain Forestier, Charlotte Pelletier, Daniel F. Schmidt, Jonathan Weber, Geoffrey I. Webb, Lhassane Idoumghar, Pierre-Alain Muller, and François Petitjean. 2020. InceptionTime: Finding AlexNet for time series classification. Data Mining and Knowledge Discovery 34, 6 (Sep. 2020), 1936–1962. https://doi.org/10.1007/s10618-020-00710-y
[17] Steven Firth, Tom Kane, Vanda Dimitriou, Tarek Hassan, Farid Fouchal, Michael Coleman, and Lynda Webb. 2017. REFIT Smart Home dataset. (June 2017). https://doi.org/10.17028/rd.lboro.2070091.v1
[18] G.W. Hart. 1992. Nonintrusive appliance load monitoring. Proc. IEEE 80, 12 (1992), 1870–1891. https://doi.org/10.1109/5.192069
[19] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. https://doi.org/10.48550/ARXIV.1512.03385
[20] J. Hills, J. Lines, E. Baranauskas, et al. 2014. Classification of time series by shapelet transformation. Data Min Knowl Disc 28 (2014), 851–881. https://doi.org/10.1007/s10618-013-0322-1
[21] P.A. Hohne, K. Kusakana, and B.P. Numbi. 2019. A review of water heating technologies: An application to the South African context. Energy Reports 5 (2019), 1–19. https://doi.org/10.1016/j.egyr.2018.10.013
[22] Patrick Huber, Alberto Calatroni, Andreas Rumsch, and Andrew Paice. 2021. Review on Deep Neural Networks Applied to Low-Frequency NILM. Energies 14, 9 (2021). https://doi.org/10.3390/en14092390
[23] Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://doi.org/10.48550/ARXIV.1502.03167
[24] Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, and Pierre-Alain Muller. 2019. Deep Learning for Time Series Classification: A Review. Data Min. Knowl. Discov. 33, 4 (July 2019), 917–963. https://doi.org/10.1007/s10618-019-00619-1
[25] Matthias Kahl, Daniel Jorde, and Hans-Arno Jacobsen. 2022. Representation Learning for Appliance Recognition: A Comparison to Classical Machine Learning. https://doi.org/10.48550/ARXIV.2209.03759
[26] Matthias Kahl, Anwar Ul Haq, Thomas Kriechbaumer, and Hans-Arno Jacobsen. 2017. A Comprehensive Feature Study for Appliance Recognition on High Frequency Energy Data. In Proceedings of the Eighth International Conference on Future Energy Systems (Shatin, Hong Kong) (e-Energy '17). Association for Computing Machinery, New York, NY, USA, 121–131. https://doi.org/10.1145/3077839.3077845
[27] Maria Kaselimi, Eftychios Protopapadakis, Athanasios Voulodimos, Nikolaos Doulamis, and Anastasios Doulamis. 2022. Towards Trustworthy Energy Disaggregation: A Review of Challenges, Methods, and Perspectives for Non-Intrusive Load Monitoring. Sensors 22 (Aug. 2022), 5872. https://doi.org/10.3390/s22155872
[28] Jack Kelly and William Knottenbelt. 2015. Neural NILM. In Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments. ACM. https://doi.org/10.1145/2821650.2821672
[29] Jack Kelly and William Knottenbelt. 2015. The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. Scientific Data 2 (March 2015). https://doi.org/10.1038/sdata.2015.7
[30] J. Zico Kolter. 2011. REDD: A Public Data Set for Energy Disaggregation Research.
[31] Pauline Laviron, Xueqi Dai, Bérénice Huquet, and Themis Palpanas. 2021. Electricity Demand Activation Extraction: From Known to Unknown Signatures, Using Similarity Search. In e-Energy '21: The Twelfth ACM International Conference on Future Energy Systems, Virtual Event, Torino, Italy, 28 June - 2 July, 2021, Herman de Meer and Michela Meo (Eds.). ACM, 148–159. https://doi.org/10.1145/3447555.3464865
[32] Min Lin, Qiang Chen, and Shuicheng Yan. 2013. Network In Network. https://doi.org/10.48550/ARXIV.1312.4400
[33] Jason Lines, Sarah Taylor, and Anthony Bagnall. 2016. HIVE-COTE: The Hierarchical Vote Collective of Transformation-Based Ensembles for Time Series Classification. In 2016 IEEE 16th International Conference on Data Mining (ICDM). 1041–1046. https://doi.org/10.1109/ICDM.2016.0133
[34] Yu Liu, Congxiao Liu, Yiwen Shen, Xin Zhao, Shan Gao, and Xueliang Huang. 2021. Non-intrusive energy estimation using random forest based multi-label classification and integer linear programming. Energy Reports 7 (2021), 283–291. https://doi.org/10.1016/j.egyr.2021.08.045 The 4th International Conference on Electrical Engineering and Green Energy, 2021.
[35] Carl H. Lubba, Sarab S. Sethi, Philip Knaute, Simon R. Schultz, Ben D. Fulcher, and Nick S. Jones. 2019. catch22: CAnonical Time-series CHaracteristics. https://doi.org/10.48550/ARXIV.1901.10200
[36] Markus Löning, Anthony Bagnall, Sajaysurya Ganesh, Viktor Kazakov, Jason Lines, and Franz J. Király. 2019. sktime: A Unified Interface for Machine Learning with Time Series. https://doi.org/10.48550/ARXIV.1909.07872
[37] Matthew Middlehurst, James Large, and Anthony Bagnall. 2020. The Canonical Interval Forest (CIF) Classifier for Time Series Classification. In 2020 IEEE International Conference on Big Data (Big Data). IEEE. https://doi.org/10.1109/bigdata50022.2020.9378424
[38] Matthew Middlehurst, James Large, and Anthony J. Bagnall. 2020. The Canonical Interval Forest (CIF) Classifier for Time Series Classification. CoRR abs/2008.09172 (2020). arXiv:2008.09172 https://arxiv.org/abs/2008.09172
[39] Matthew Middlehurst, James Large, Michael Flynn, Jason Lines, Aaron Bostrom, and Anthony J. Bagnall. 2021. HIVE-COTE 2.0: a new meta ensemble for time series classification. CoRR abs/2104.07551 (2021). arXiv:2104.07551 https://arxiv.org/abs/2104.07551
A. Petralia, et al.
[40] Megan Milam and G. Kumar Venayagamoorthy. 2014. Smart meter deployment: US initiatives. In ISGT 2014. 1–5. https://doi.org/10.1109/ISGT.2014.6816507
[41] Ayumu Miyasawa, Yu Fujimoto, and Yasuhiro Hayashi. 2019. Energy disaggregation based on smart metering data via semi-binary nonnegative matrix factorization. Energy and Buildings 183 (15 Jan. 2019), 547–558. https://doi.org/10.1016/j.enbuild.2018.10.030
[42]
Keiron O’Shea and Ryan Nash. 2015. An Introduction to Convolutional Neural
Networks. CoRR abs/1511.08458 (2015). arXiv:1511.08458 http://arxiv.org/abs/
1511.08458
[43]
Francesca Paradiso, Federica Paganelli, Antonio Luchetta, Dino Giuli, and Pino
Castrogiovanni. 2013. ANN-based appliance recognition from low-frequency
energy monitoring data. In 2013 IEEE 14th International Symposium on "A World
of Wireless, Mobile and Multimedia Networks" (WoWMoM). 1–6. https://doi.org/
10.1109/WoWMoM.2013.6583496
[44]
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory
Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban
Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan
Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith
Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning
Library. https://doi.org/10.48550/ARXIV.1912.01703
[45]
Leitao Qu, Yaguang Kong, Meng Li, Wei Dong, Fan Zhang, and Hongbo Zou.
2023. A residual convolutional neural network with multi-block for appliance
recognition in non-intrusive load identication. Energy and Buildings 281 (2023),
112749. https://doi.org/10.1016/j.enbuild.2022.112749
[46]
Florian Rossier, Philippe Lang, and Jean Hennebert. 2017. Near Real-Time Appliance Recognition Using Low Frequency Monitoring and Active Learning Methods. Energy Procedia 122 (2017), 691–696. https://doi.org/10.1016/j.egypro.2017.07.371 CISBAT 2017 International Conference: Future Buildings & Districts, Energy Efficiency from Nano to Urban Scale.
[47]
H. Sakoe and S. Chiba. 1978. Dynamic programming algorithm optimization
for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal
Processing 26, 1 (1978), 43–49. https://doi.org/10.1109/TASSP.1978.1163055
[48]
Robert E Schapire. 2013. Explaining adaboost. In Empirical inference. Springer,
37–52.
[49]
Patrick Schäfer. 2015. The BOSS is concerned with time series classification in the presence of noise. Data Mining and Knowledge Discovery 29 (Nov. 2015). https://doi.org/10.1007/s10618-014-0377-7
[50]
Patrick Schäfer and Mikael Högqvist. 2012. SFA: A symbolic fourier approximation and index for similarity search in high dimensional datasets. ACM International Conference Proceeding Series, 516–527. https://doi.org/10.1145/2247596.2247656
[51]
Ahmed Shifaz, Charlotte Pelletier, François Petitjean, and Geoffrey I. Webb. 2019. TS-CHIEF: A Scalable and Accurate Forest Algorithm for Time Series Classification. CoRR abs/1906.10329 (2019). arXiv:1906.10329 http://arxiv.org/abs/1906.10329
[52]
K Simonyan and A Zisserman. 2015. Very deep convolutional networks for large-
scale image recognition. 3rd International Conference on Learning Representations
(ICLR 2015), 1–14.
[53]
Stavros Sykiotis, Maria Kaselimi, Anastasios Doulamis, and Nikolaos Doulamis.
2022. ELECTRIcity: An Ecient Transformer for Non-Intrusive Load Monitoring.
Sensors 22, 8 (2022). https://doi.org/10.3390/s22082926
[54]
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir
Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2014.
Going Deeper with Convolutions. https://doi.org/10.48550/ARXIV.1409.4842
[55]
Seyed Mostafa Tabatabaei, Scott Dick, and Wilsun Xu. 2017. Toward Non-
Intrusive Load Monitoring via Multi-Label Classication. IEEE Transactions on
Smart Grid 8, 1 (2017), 26–40. https://doi.org/10.1109/TSG.2016.2584581
[56]
Zhiguang Wang, Weizhong Yan, and Tim Oates. 2016. Time Series Classification
from Scratch with Deep Neural Networks: A Strong Baseline. https://doi.org/
10.48550/ARXIV.1611.06455
[57]
Zhenrui Yue, Camilo Requena Witzig, Daniel Jorde, and Hans-Arno Jacobsen.
2020. BERT4NILM: A Bidirectional Transformer Model for Non-Intrusive Load
Monitoring. In Proceedings of the 5th International Workshop on Non-Intrusive
Load Monitoring (Virtual Event, Japan) (NILM’20). Association for Computing
Machinery, New York, NY, USA, 89–93. https://doi.org/10.1145/3427771.3429390
[58]
Bochao Zhao, Lina Stankovic, and Vladimir Stankovic. 2016. On a Training-
Less Solution for Non-Intrusive Appliance Load Monitoring Using Graph Signal
Processing. IEEE Access 4 (2016), 1784–1799. https://doi.org/10.1109/ACCESS.2016.2557460
[59]
Bochao Zhao, Minxiang Ye, Lina Stankovic, and Vladimir Stankovic. 2020. Non-
intrusive load disaggregation solutions for very low-rate smart meter data. Ap-
plied Energy 268 (2020), 114949. https://doi.org/10.1016/j.apenergy.2020.114949
Article
Home energy management system is proposed to reduce the influences caused by the high ratio penetration of renewable energy generation, through managing and dispatching the residential power and energy consumption in the demand side. Being aware of how the electric energy is consumed is a key step of this system. Non-intrusive Load Monitoring is regarded as the most potential method to address this problem, which aims to separate individual appliances in households by decomposing the total power consumption. In recent years, NILM is framed as a multi-label classification problem and many researches has been investigated in this field. In this paper, a non-intrusive method which can identify appliances power usage information from the total power consumption is proposed and thoroughly investigated. Firstly, the random k-labelset multi-label classification algorithm is enhanced by introducing random forest algorithm as base classifier. Then, grid search method and cross validation method are integrated to determine the optimal paraments set. This algorithm is used to achieve the appliances identification. Finally, based on the identification result, the integer linear programming is employed for power estimation of each appliance, especially multi-state appliances. Experimental results on low voltage networks simulator demonstrate that the proposed method has a high identification accuracy compared with the traditional random k-labelset multi-label classification methods with other base classifiers, and it is capable of identifying the power usages of different appliances accurately. The desirable performance of power estimation has broadened the applications of machine learning based non-intrusive energy monitoring.