
Anomaly Detection in Smart Grid Data: An Experience Report


Bruno Rossi, Stanislav Chren, Barbora Buhnova and Tomas Pitner
Faculty of Informatics
Masaryk University, Brno, Czech Republic
Email: {brossi,chren,buhnova,tomp}
Abstract—In recent years, we have been witnessing profound
transformation of energy distribution systems, fueled by
Information and Communication Technologies (ICT), towards the
so-called Smart Grid. However, while Smart Grid design strategies
have been studied by academia, only anecdotal guidance
is provided to the industry with respect to increasing the level
of grid intelligence. In this paper, we report on a successful
project that assisted the industry in this way by conducting a
large anomaly-detection study on the data of one of the power
distribution companies in the Czech Republic. In the study, we
move away from the concept of single events identified as anomalies
to the concept of collective anomaly, that is, itemsets of events
that may be anomalous based on their patterns of appearance.
This can assist the operators of the distribution system in the
transformation of their grid to a smarter grid. Analyzing
Smart Meter data streams, we used frequent itemset mining
and categorical clustering with clustering silhouette thresholding
to detect anomalous behaviour. As the main result, we provided
stakeholders with both a visual representation of the candidate
anomalies and the identification of the top-10 anomalies for a
subset of Smart Meters.
Index Terms—Smart Grids, Smart Meters, Anomaly Detection,
Clustering, Frequent Itemset Mining.
The Smart Grid can be regarded as an electricity network
that benefits both from two-way cyber-secure communication
technologies and computational intelligence for electricity
generation, transmission, substations integration, distribution
and consumption to reach the goals of a clean, safe, secure,
reliable, resilient, efficient, and sustainable infrastructure [1].
The investment into large-scale Smart Grid deployment can
be very risky, as confirmed for instance by investment losses
during Xcel Energy's SmartGridCity project [1], [2].
There is a recent trend in adding more “smartness” in
the Smart Grid infrastructure, so that the large amount of
information that can be mined from normal usage can be used
to drive the decision-making process and optimize the overall
infrastructure management [3], [4]. This effect is enhanced by
the two-way nature of the more modern infrastructures that
allow operators to fine-tune parameters remotely based on the
knowledge acquired from the operating conditions.
In this paper, we deal with anomaly detection in
Smart Grid data, that is, looking for specific patterns in Smart
Meter data streams that do not conform to expected behaviour.
In general terms, anomaly detection is a broad concept
that has been applied to different fields, ranging from systems
intrusion detection to fraud detection, with varying definitions
of expected behaviour [5]. Based on real data from one of
the power distribution companies in the Czech Republic, we
propose an approach for the detection of anomalies in the
Smart Metering infrastructure that could be useful for intervening
promptly to investigate the cause of unexpected behaviour.
Based on this analysis, we also report on the insights
acquired in terms of extensions of the approach that would
allow us to implement such a system online within the Smart
Grid infrastructure.
The proposed approach is based on frequent itemset mining:
it encodes the different event types streamed from Smart
Meters, segments the data, and uses categorical clustering
to evaluate the itemsets and detect unexpected patterns.
It allows us to detect anomalies that might have an impact on
Smart Grid security, reliability or maintenance, for example
suspicious manipulation of the Smart Meter casing,
under/over-voltage in specific locations, or failure to switch
remotely controlled appliances.
The paper is structured as follows. Section II overviews
related work in the area of anomaly detection within Smart
Grids. Section III then discusses the context of the study
and provides descriptive information about the dataset. The
anomaly detection approach is described in Section IV to-
gether with the rationale for its derivation. Section V presents
the application within the Smart Grid domain according to
the contextual information provided. The main evaluation and
discussion from the experimental part is presented in Section
VI, while Section VII brings up the conclusions.
As the Smart Grid implementation is a strategic act for
many countries, extensive attention has been paid to the study
of smart infrastructures in recent years [1], [6]. Fang et al.
[1] divide the smart infrastructure into three subsystems: (1)
the smart energy subsystem, concerned with power generation,
transmission, and distribution, (2) the smart information
subsystem, concerned with information metering, measurement,
and management, and (3) the smart communication subsystem,
concerned with wireless and wired communication technologies,
and the end-to-end communication management.
978-1-5090-1897-0/16/$31.00 © 2016 IEEE
Within the smart information subsystem (2), which frames
our work, significant advancement can be observed in both
industry and academia.
In industry, these projects are mainly led by electric utilities
or related organizations, which however often lack
expertise in information and communication technologies [1].
The devised strategies are hence evaluated via pilot projects
rather than via analytical and simulation means. While
simulation is not uncommon in the design of the smart
energy subsystem (1) and the smart communication subsystem (3),
it is rather rare within the smart information subsystem (2) [7].
In academia, many approaches exist for the analysis of data
flowing within the Smart Grid, and hence for the identification of
smart information, mostly in the cyber-security domain
[8], [9]. Within cyber-security, the approaches are mainly
concerned with detecting intrusions that harm the confidentiality,
integrity, or availability of the Smart Grid [10], [11], [12],
although more work has been done on preventative measures,
such as secure communication protocols and architectures in
the Smart Grid [8]. Overall, cyber-security in the frame of
Smart Grids is very well researched, and hence we invest
more effort in investigating the other domains.
Besides cyber-security, the analysis of the Smart Grid
information flow is concerned mainly with the detection of
faults and failures [13], [14], [15], and to a minor extent
with the study of consumer behaviour [16]. Calderaro et
al. [13] detect failures in data transmission and faults in
the distribution network with the help of Petri net analysis
and matrix operations. Kalaitzis et al. [14] study powerline
faults (on the level of the amplitude, frequency, or phase of
powerline signal) with the help of a sliding window approach
in multivariate time series. He et al. [15] are concerned with
fault detection and localization in transmission lines, using a
network inference algorithm based on Markov random fields
and dependency graphs. However, all these approaches are
based on the powerline level (i.e. modelling and observing
individual relays [13], powerline signal [14] or phasor angles
across the transmission line buses [15]), not the information
flow above it, which differentiates them from our aim.
The application of clustering to Smart Grid data is not
a novel idea: it has been used successfully to steer towards
a more intelligent Smart Grid infrastructure [3], [4].
However, existing applications focus more specifically
on clustering customers according to their behaviour from Smart
Meter data [17], or on clustering sensor data to
segment the network topology and identify a set of clusters
according to energy profiles [18].
This study was conducted in cooperation with the major
energy distribution company in the Czech Republic, in which
the smart metering infrastructure has been tested and examined
in several pilot projects since 2006. The pilot projects have
been part of the European Grid4EU initiative. Currently,
almost 40,000 Smart Meters have been deployed in
total, which constitutes about 1% of all consumers managed
by the distribution company. The individual pilot projects
have specific goals, e.g. evaluation of available technologies,
communication infrastructure, or quality of service.
In our case, we utilise the data sets from a project focusing
on local load management in low voltage power grids.
The selected consumers (both households and industry) are
equipped with Smart Meters that collect data about the power
consumption profile. The data is periodically sent to the data
concentrator installed at the Distribution Transformation
Station (DTS); there is one data concentrator per DTS.
From the data concentrator, the data is collected and
stored by the Data Central (DC) server, which is located at
the power grid operation centre. Besides the data related to the
power consumption attributes, the individual devices generate
a variety of events used for grid monitoring and maintenance.
A central role is played by the Smart Meter, an electronic
device used for the measurement and provision of billing
information to customers [19]. In this study, we consider Smart
Meters as sources of streaming data, allowing the analysis
of all data derived from the Smart Meters' operations.
The Smart Meter events are used to notify the data central
about important state changes that happened at the level of the
Smart Meter, such as powering up or down of the meter, tariff
and rate switching, time synchronization, etc. Each event
belongs to one of 76 possible event types.
An event entry is described by its origin time, its event type
and the Smart Meter device that created it. For the device,
a number of additional attributes are available, such as its
GPS coordinates, date of installation, tariff category or type
of deployment site (e.g. apartment, house, agriculture/industry,
etc.). The entire dataset contains 364,107 event entries
collected from 381 Smart Meters over the time
span from December 2014 to July 2015.
When dealing with anomaly detection, one important distinction
is based on the way we aggregate data to determine
unexpected behaviour. We usually distinguish among: i) point
anomalies, ii) context anomalies, and iii) collective anomalies
[5]. A point anomaly means that one individual event instance
can be considered anomalous when compared to the remaining
data. For example, the number of occurrences of a
“gateway on” event from a Smart Meter might be considered
anomalous if it is too low or too high on a specific
day. Context anomalies start from the assumption of separating
the behaviour from the context: the same behaviour might
not be considered an anomaly if it happens in a different
context. Based on the previous example, the same number of
occurrences of “gateway on” might not trigger an anomaly
detection mechanism if they happen at a specific time of the day
or period of the year. Compared to point anomalies, we need to
take into account the context of the event instance. The third
category, and the more interesting one for our context, is
referred to as collective anomaly. In this case, the event
instance does not represent an anomaly per se, but only if
considered within the collection of all the other event instances.
Continuing the previous example, we might consider as anomalous
the collection of events “gateway on”, “gateway off”, “gateway
on”, “gateway off” rather than “gateway on”, “transmission
start”, “gateway off”. For this type of anomaly, looking at
single event instances is not meaningful, as they need to be
considered together with the other events in the collection.
Fig. 1. Proposed Approach for Smart Meters anomaly detection (E=set of
events, S=set of data segments, T=set of transactions, F=frequent itemset,
MFI=most frequent itemset, C=set of clusters).
A typical further characterization of anomaly detection is
based on the availability of labelled data for the event instances
that can constitute anomalies (supervised, semi-supervised and
unsupervised approaches [5]). In the first two categories,
labelled data is available for either all instances or
positive instances, which usually means that a human expert
labels each anomalous instance or that some form of failure
data is available from which labels can be derived
(also known as tagging information). In our specific context, we
did not have any such information available, so we ruled
out the option of applying a supervised classification approach
for anomaly detection. In fact, due to the characteristics of the
Smart Metering dataset we opted for what we can refer to as
an unsupervised contextual and collective anomaly detection
approach. The main reasons were the unavailability of tagging
information and the need to consider events within their context
and within the broader concept of itemsets.
Given all the considerations about the way to tackle the
problem in the Smart Grid domain, our approach is similar to
the one of Barbará et al. that has been successfully applied
in the context of intrusion detection [20]. However, in our
approach we use a different way to detect outliers: clustering
silhouette indicators. Furthermore, by applying the approach to
Smart Grid data streams we identified several improvements
that we discuss in the paper. The approach is based on the idea
of first identifying clusters of what can be considered normal
behaviour, and then looking into itemsets that deviate from the
knowledge learned from the dataset. We present in detail the
steps of the approach (Fig. 1):
Step1. We first apply Association Rule Mining to identify
frequent itemsets [21], that is, sets of event instances that
recur most often. For this, we need to identify sets of
transactions from the data streams. We define one transaction
as the set of all the items derived per day and per
Smart Meter (Data Segmentation, Fig. 1, Step1). Thus, each
Smart Meter will be associated with a list of daily transactions
of operations;
Step2. Based on the aforementioned concept of collective
anomaly, we extract the most frequent itemsets from the data
transactions by applying the Apriori algorithm [21]. This
yields a list of frequent itemsets for each transaction
(Frequent Itemsets Identification, Fig. 1, Step2).
As an example, after running the first two steps we
might end up with the following itemsets:
{R2XR1On,Rate 2 switching}
{Overcurrent L1,Overcurrent L3}
{R2XR1Off,R2XR1On,Rate 1 switching}
Step3. For each data segment, we now have the list of frequent
itemsets derived from all the transactions. We then look for
the Most Frequent Itemsets, those that are present in more than one
segment (MFI Filtering, Fig. 1, Step3). The assumption is that
the itemsets that appear in more than one segment can be
considered as an initial normal behaviour, while the others can
be considered potential anomalies at this stage;
Step4. Following the concept of contextual anomaly, each
frequent itemset is further augmented with additional information,
so that the same itemset in different segments will be
represented by additional features, e.g. whether it occurred on
a working day (Contextual Information Enhancement, Fig. 1, Step4).
Note that this will increase the number of features, but will also
increase the number of itemsets, as the same itemset might
appear in two different contexts (working day / non-working
day). At this point, the additional features can be derived from
additional data sources, not only Smart Meters, as long as
they can be associated to the itemsets. An example of itemsets
at this stage:
{R2XR1On,Rate 2 switching,week-day}
{Overcurrent L1,Overcurrent L3,week-day}
{Overcurrent L1,Overcurrent L3,week-end}
Step5. We cluster all the normal itemsets identified in Step3.
The assumption is that these are representative of the
normal behaviour (Clustering MFI’, Fig. 1, Step5). At this stage,
we might also check in the clustered data whether there are
itemsets that are isolated or do not fit well in their clusters. As
the itemsets are represented by categorical / nominal variables,
we use a categorical clustering based on entropy minimization
[22], which was also used in the original approach. To continue
with the example, if clustering the previous data with a number
of clusters k = 2, we can get the following clusters as they
minimize the entropy:
C1: {R2XR1On,Rate 2 switching,week-day}
C2: {Overcurrent L1,Overcurrent L3,week-day}
C2: {Overcurrent L1,Overcurrent L3,week-end}
Fig. 2. Clustering of 17 devices (n=681, k=10)
Step6. The final step looks into the identification of anomalies
(Anomalies Identification, Fig. 1, Step6). In this step, we
consider again all the itemsets that were considered potential
anomalies in Step3. We assign each of them to a cluster and
check how well it fits the clusters created from
the previously clustered data.
For this last step, to identify the goodness of fit of the
new itemsets, we use the concept of clustering silhouette [23].
Given an itemset $i$, $a(i)$ represents the average dissimilarity
between $i$ and all the other itemsets in $i$'s cluster. Given all
the other clusters ($C_j$ where $i \notin C_j$), $d(i, C_j)$
represents the average dissimilarity of $i$ with all the itemsets
in $C_j$, and $b(i) = \min_j d(i, C_j)$ represents the distance of
itemset $i$ to the nearest cluster. The silhouette represents how
well an element fits in its cluster:
$$s_i = \frac{b(i) - a(i)}{\max(a(i), b(i))}$$
We can have three cases:
$s_i > 0$: the itemset fits generally well into the cluster;
$s_i \approx 0$: the itemset lies between two clusters;
$s_i < 0$: the itemset is probably assigned to the wrong
cluster, that is, from the silhouette definition, the itemset
has higher dissimilarity with the elements of its own
cluster than with those of some nearby cluster;
When looking for anomalies we look for the third type of
cases, that is those in which the itemset does not fit well in
created clusters. Furthermore, we can set a threshold value for
the silhouette. In the next section we present the application
of the approach to the Smart Meter data and we will also use
a visual representation of the clustering silhouette to aid in
anomalies identification.
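The silhouette computation described above can be sketched in a few lines of Python. This is only an illustration: the paper does not state which dissimilarity measure was used between itemsets, so the Jaccard distance below is an assumption, and the function names and toy clusters are hypothetical.

```python
from statistics import mean

def jaccard_distance(x, y):
    """Dissimilarity between two itemsets (assumed measure, not from the paper)."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 0.0
    return 1.0 - len(x & y) / len(union)

def silhouette(item, own_cluster, other_clusters, dist=jaccard_distance):
    """Silhouette s_i = (b - a) / max(a, b) for a single itemset.

    own_cluster: the other itemsets of the cluster the item is assigned to.
    other_clusters: list of the remaining clusters (non-empty lists of itemsets).
    """
    # a: average dissimilarity to the itemsets of the item's own cluster.
    a = mean(dist(item, other) for other in own_cluster) if own_cluster else 0.0
    # b: smallest average dissimilarity to any other cluster.
    b = min(mean(dist(item, other) for other in cluster)
            for cluster in other_clusters)
    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom

# Toy clusters built from event names used in the paper's examples.
c1 = [{"R2XR1On", "Rate 2 switching"},
      {"R2XR1On", "Rate 2 switching", "Rate 1 switching"}]
c2 = [{"Overcurrent L1", "Overcurrent L3"},
      {"Overcurrent L1", "Overcurrent L2", "Overcurrent L3"}]

# An itemset that resembles its own cluster gets a positive silhouette.
good = silhouette({"R2XR1On", "Rate 1 switching"}, c1, [c2])
# An itemset placed in the wrong cluster gets a negative silhouette.
bad = silhouette({"Overcurrent L1", "Overcurrent L3"}, c1, [c2])
```

With a different dissimilarity measure the numeric values change, but the sign convention (negative values flag poorly fitting itemsets) stays the same.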
The first consideration in analysing the Smart Meter data is
about parameter fine-tuning. Overall, for the frequent itemset
mining we need to define support (the proportion of transactions
that contain a specific itemset) and confidence (for an
association rule X ⇒ Y, the proportion of the transactions
containing X that also contain Y). In the current analysis we used
support = 0.1 and confidence = 0.8. Varying these parameters
can change the number of itemsets considered in the
initial steps. A third relevant parameter is the number of
clusters, k. Unfortunately, identifying the best number of
clusters based on the underlying dataset can be computationally
demanding and infeasible for a large number of Smart
Meters. Sensitivity analysis can be used to optimize some
clustering quality indicator, but such an approach does not scale
up to a larger number of itemsets. In the current experimental
section, we used k = 10. A last parameter is the threshold
for the clustering silhouette to determine the anomalies. We
set this parameter to -0.20, but such a value does not need to
be fixed a priori, and can be set by looking at the visual
representation of the clustering silhouette.
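To make the roles of support and confidence concrete, the following minimal Python sketch computes both and performs a level-wise, Apriori-style search for frequent itemsets. It is a simplified illustration, not the implementation used in the study: the function names and the toy transactions are hypothetical, and a lower minimum support than in the paper is used so that the tiny example yields results.

```python
from itertools import combinations

def support(itemset, transactions):
    """Proportion of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Confidence of the rule X => Y: support(X and Y together) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search: only frequent k-itemsets are extended."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= min_support]
    result = []
    while current:
        result.extend(current)
        # Candidate (k+1)-itemsets are unions of two frequent k-itemsets.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == len(a) + 1}
        current = [c for c in candidates
                   if support(c, transactions) >= min_support]
    return result

# Toy daily transactions (hypothetical data; event names taken from the paper).
transactions = [
    frozenset({"R2XR1On", "Rate 2 switching"}),
    frozenset({"R2XR1On", "Rate 2 switching", "Power-up"}),
    frozenset({"R2XR1Off", "Rate 1 switching"}),
    frozenset({"R2XR1On", "Rate 2 switching"}),
]
frequent = frequent_itemsets(transactions, min_support=0.5)
rule_conf = confidence({"R2XR1On"}, {"Rate 2 switching"}, transactions)
```

Lowering `min_support` admits rarer itemsets and quickly enlarges the result, which is the effect on the number of itemsets mentioned in the text.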
Running the approach on the dataset with 381 devices
gives a total of 364,107 events generated on the
devices, based on the 76 event types. We map these events into
20,670 transactions associated to the 381 segments (Step1).
Running the Apriori algorithm yields 273,829 non-unique
itemsets overall (Step2). This is a large
number of itemsets, due to the fact that at this stage the same
itemset might appear in different transactions. The next step
filters the itemsets based on the condition that they
are present in at least 2 segments, considering the unique itemsets
at this stage and no longer their association to a transaction. This
gives 44,450 unique itemsets that can be considered normal
behaviour according to the discussion above (Step3).
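The segmentation (Step1) and the filtering on the number of segments (Step3) can be sketched as follows. This is a minimal illustration under stated assumptions: events are taken to be `(meter_id, timestamp, event_type)` tuples, one segment corresponds to one Smart Meter, and all names and sample events are hypothetical. For brevity, the usage example feeds whole transactions into the filter instead of mined frequent itemsets.

```python
from collections import defaultdict
from datetime import datetime

def daily_transactions(events):
    """Group (meter_id, timestamp, event_type) tuples into one transaction
    (a set of event types) per meter and per calendar day."""
    segments = defaultdict(lambda: defaultdict(set))
    for meter_id, timestamp, event_type in events:
        segments[meter_id][timestamp.date()].add(event_type)
    return {meter: list(days.values()) for meter, days in segments.items()}

def split_by_segment_count(itemsets_per_segment, min_segments=2):
    """Itemsets seen in at least `min_segments` segments are kept as 'normal';
    the remaining ones are candidate anomalies."""
    seen_in = defaultdict(set)
    for segment, itemsets in itemsets_per_segment.items():
        for itemset in itemsets:
            seen_in[frozenset(itemset)].add(segment)
    normal = {s for s, segs in seen_in.items() if len(segs) >= min_segments}
    candidates = set(seen_in) - normal
    return normal, candidates

# Hypothetical events for two meters.
events = [
    ("SM1", datetime(2015, 1, 5, 8, 0), "Power-up"),
    ("SM1", datetime(2015, 1, 5, 9, 0), "Rate 1 switching"),
    ("SM1", datetime(2015, 1, 6, 8, 0), "Power-up"),
    ("SM2", datetime(2015, 1, 5, 7, 0), "Power-up"),
]
segments = daily_transactions(events)
# Simplification: each transaction is treated as an itemset here.
normal, candidates = split_by_segment_count(segments)
```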
The list of itemsets at this stage is too large to give
useful insights to a decision maker. At this stage we can add
more contextual information (Step4). Since we are considering
unique events in Step3, by mapping the events back to the ones
in the segment we increase the number of events, as an event
might happen under a different context. This further
differentiates two events and conforms to the idea of augmenting
the data points with contextual information. In the current
analysis we skipped this step, but it is applicable under the
assumption that the added contextual information is categorical
or can be converted into such a form.
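As an illustration of how such categorical context could be attached, the sketch below adds a day-type label to an itemset. It is an assumption-laden example: the helper name is hypothetical, and Saturday and Sunday are used as a stand-in for non-working days, which ignores public holidays.

```python
from datetime import date

def add_day_context(itemset, day):
    """Return the itemset extended with a categorical day-type feature."""
    # weekday() returns 0 for Monday up to 6 for Sunday.
    label = "week-end" if day.weekday() >= 5 else "week-day"
    return frozenset(itemset) | {label}

base = {"Overcurrent L1", "Overcurrent L3"}
monday = add_day_context(base, date(2015, 1, 5))      # a Monday
saturday = add_day_context(base, date(2015, 1, 10))   # a Saturday
```

The same itemset yields two distinct contextual itemsets, which is why this step increases the number of itemsets.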
We then run the clustering algorithm on the mapped events
to create clusters of itemsets that are similar in terms of
the identified categorical features (Step5). We use categorical
clustering based on entropy minimization [22]. To represent
the clusters, we use the Clusplot representation, in which
clusters are drawn as ellipses after dimensionality
reduction, showing the first two principal components [24].
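The entropy criterion that such a clustering tries to minimize can be illustrated with a short Python sketch. It only evaluates a given partition and is not the clustering algorithm of [22] itself; encoding each itemset as presence/absence attributes that are treated as independent is an assumption made for this example, and the function names are hypothetical.

```python
from math import log2

def cluster_entropy(cluster, items):
    """Sum over items of the entropy of the presence/absence distribution
    of that item within the cluster (attributes treated as independent)."""
    total = 0.0
    for item in items:
        p = sum(item in itemset for itemset in cluster) / len(cluster)
        for q in (p, 1.0 - p):
            if q > 0:
                total -= q * log2(q)
    return total

def expected_entropy(clusters):
    """Size-weighted average of the cluster entropies; lower is better."""
    items = sorted({i for c in clusters for s in c for i in s})
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c, items) for c in clusters)

# The three contextual itemsets from the example of the approach.
a = {"R2XR1On", "Rate 2 switching", "week-day"}
b = {"Overcurrent L1", "Overcurrent L3", "week-day"}
c = {"Overcurrent L1", "Overcurrent L3", "week-end"}

grouping_from_example = [[a], [b, c]]   # C1 = {a}, C2 = {b, c}
alternative_grouping = [[a, b], [c]]

e_example = expected_entropy(grouping_from_example)
e_alternative = expected_entropy(alternative_grouping)
```

Under these assumptions, grouping the two overcurrent itemsets together gives the lower expected entropy, in line with the clusters C1 and C2 shown in the example.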
After clustering has been completed, we have a set of
clusters to which all the itemsets are mapped.
To simplify the representation, we
show the categorical clustering with k = 10 clusters performed
on 681 itemsets for 17 devices as a subset of the whole
dataset (Fig. 2). The size of the clusters shows the spread
of the itemsets within the cluster and the shaded area
shows the density within clusters. Clustered data represent
the normal behaviour according to our initial assumptions.
However, as noted in the previous section, we can still see
areas worth investigating, like cluster nr. 2, which is more spread
out compared to the others.
We then run the final step (Step6). We now take back the
itemsets that were not frequent in the original dataset (NMFI’).
We cluster each one of the new itemsets to see how well they
fit into the existing clusters (based on MFI’).
If we look at the fit of different itemsets for the case
of 17 devices, we can represent each itemset according to its
silhouette width (Fig. 3). We can see that some itemsets in
the upper part of the plot ($s_i$ near 1.0) fit better in the
clusters, while those in the bottom part have negative $s_i$. As
new itemsets are clustered according to Step6 of the approach,
we look for those that have negative $s_i$.
Based on the analysed data, if we set a threshold of
-0.20, we can identify the top-10 anomalies after running the
overall approach (Table I). We report the events that were detected
as anomalies together with their silhouette value $s_i$ and a
reference index $i$. In particular, there are five itemsets that
might signal some forms of missing voltages associated with
overcurrents (itemsets indexed 1, 3, 4, 8, 9). These are itemsets
that can be worth further investigation by decision makers.
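Selecting the reported anomalies from the silhouette values amounts to a filter followed by a sort. The sketch below assumes that the threshold is applied as $s_i < -0.20$, which is consistent with the values listed in Table I; the function name and the sample scores are hypothetical.

```python
def top_anomalies(scores, threshold=-0.20, k=10):
    """Return up to k (itemset_id, s_i) pairs with s_i below the threshold,
    most negative first."""
    flagged = [(i, s) for i, s in scores.items() if s < threshold]
    return sorted(flagged, key=lambda pair: pair[1])[:k]

# Hypothetical silhouette values for five candidate itemsets.
scores = {"A": 0.81, "B": -0.322, "C": -0.05, "D": -0.255, "E": 0.10}
ranked = top_anomalies(scores)
```

Itemsets with a slightly negative silhouette (such as "C" above) are left out, which keeps the list short enough for a decision maker to review.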
Fig. 3. Lower part of the Clustering Silhouette, 17 devices (n=681, k=10)
There are several lessons learned from the application of
the approach to anomaly detection in Smart Meter data.
We initially approached the problem from different angles,
but we found out that the most important aspect when
considering anomalies is to provide a collective and contextual
overview. At least in the experimental project we described,
single point anomaly detection did not prove to be sufficient
to determine anomalous events. The addition of contextual
information and the consideration of an event type instance in
relation to other event type instances proved to be a more
powerful mechanism.
TABLE I. Top-10 anomalies according to $s_i$, devices=17, n=681, k=10
s_i     i   Itemset
-0.322  1   {“Missing voltage L2”, “Overcurrent L1”, “Overcurrent L2”, “Overcurrent L3”}
-0.255  2   {“Limiter activated”, “Power-up”}
-0.249  3   {“Limiter activated”, “Overcurrent L1”, “Power-down”, “Power-up”, “Rate switching error clear Rate switching error cleared in meter”}
-0.239  4   {“Missing voltage L2”, “Overcurrent L2”}
-0.236  5   {“Limiter activated”, “Overcurrent L1”, “Power-down”, “Rate switching error”}
-0.222  6   {“Limiter activated”, “Overcurrent L1”, “Power-down”, “Rate switching error clear Rate switching error cleared in meter”}
-0.216  7   {“Rate 1 switching”, “Rate 2 switching”}
-0.212  8   {“Missing voltage L2”, “Overcurrent L2”, “Overcurrent
-0.219  9   {“Overcurrent L1”, “Power-down”, “Power-up”, “TOU activated meter”}
-0.208  10  {“No overcurrent L2 ...”, “Overcurrent L1”, “Overcurrent
An additional aspect in the Smart Grid domain is that,
differently from other domains, we are not aware of existing
datasets that can be used to evaluate the quality of
anomaly detection from Smart Meter data against
an established ground truth. As such, the opinion of domain
experts becomes very relevant: it is however unrealistic to
provide indicators such as false negatives and false positives
due to the vast number of itemsets to review. This makes the
evaluation of the approaches difficult to perform.
Running the project also allowed us to identify several
drawbacks of the approach. These constitute an interesting list
of requirements for the improvement of the implemented
solution. One of the drawbacks of the proposed approach
is that we considered itemsets and not sequences. That is,
we did not take into account the order of events within itemsets,
either in frequent itemset mining or in clustering. One of
the aspects we are keen to explore is the usage of sequences
for the segmentation part, with a different algorithm to cluster
sequences. Together with experts' opinions, this might allow us
to derive a comparison of several approaches.
Another consideration is about using the approach for online
learning, with streaming data and real-time system behaviour. This
poses different issues than those considered in this paper, but
working towards the implementation of such a system can be
useful to support the concept of “smarter grids”.
Finally, in this paper we did not explore in detail the usage
of contextual information in the experimental part. Given
how the features have been built, this is not a problem as
long as numerical features are converted to categorical data.
We also considered an initial phase in which domain
experts could rule out non-interesting or non-relevant event
types. However, this initial phase can also be detrimental to
the possibility of detecting unexpected events.
Modern Smart Grids will permeate our lives in years to
come. While in their initial appearances data communication
was mostly one-way, we are now in the context of two-way
Smart Grids that can not only monitor but also fine-tune
behaviour based on knowledge mining capabilities. In this sense,
there is a growing need to introduce smarter behaviours in the
infrastructure, and a central role is played by Smart Meters as
devices that can engage in two-way communications.
In this paper, we evaluated an approach for anomaly detection
in Smart Grids derived from data streamed from Smart
Meters. We proposed to approach the problem by taking into
account the aspects of collective and contextual anomalies, which
can bring benefits in building a wider set of dependencies
among events derived from Smart Meters.
We presented the application of the proposed unsupervised
contextual and collective detection approach to data streams
from a large energy distributor in the Czech Republic to reason
about different types of possible anomalies (e.g. over-voltages,
under-voltages). We discussed the benefits of the approach
but also identified drawbacks that can lead to improvements
towards the implementation of an online learning system.
In running the project we found several key characteristics
that are needed: a system that learns constantly online,
that is scalable, and that can support the detection of unexpected
events, possibly leading towards a self-healing system.
[1] X. Fang, S. Misra, G. Xue, and D. Yang, “Smart grid — the new and
improved power grid: A survey,” Communications Surveys & Tutorials,
IEEE, vol. 14, no. 4, pp. 944–980, 2012.
[2] “Xcel energy. smartgridcity,”
[3] C.-W. Tsai, A. Pelov, M.-C. Chiang, C.-S. Yang, and T.-P. Hong, “A brief
introduction to classification for smart grid,” in 2013 IEEE International
Conference on Systems, Man, and Cybernetics (SMC), ser. SMC ’13.
Washington, DC, USA: IEEE Computer Society, 2013, pp. 2905–2909.
[4] C. S. Lai and L. L. Lai, “Application of big data in smart grid,” in
2015 IEEE International Conference on Systems, Man, and Cybernetics
(SMC), Oct 2015, pp. 665–670.
[5] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,”
ACM computing surveys (CSUR), vol. 41, no. 3, p. 15, 2009.
[6] V. C. Gungor, D. Sahin, T. Kocak, S. Ergut, C. Buccella, C. Cecati,
and G. P. Hancke, “A survey on smart grid potential applications and
communication requirements,” Industrial Informatics, IEEE Trans. on,
vol. 9, no. 1, pp. 28–42, 2013.
[7] K. Mets, J. A. Ojea, and C. Develder, “Combining power and
communication network simulation for cost-effective smart grid analysis,”
Communications Surveys & Tutorials, IEEE, vol. 16, no. 3, pp. 1771–
1796, 2014.
[8] W. Wang and Z. Lu, “Cyber security in the smart grid: Survey and
challenges,” Computer Networks, vol. 57, no. 5, pp. 1344–1371, 2013.
[9] Y. Yan, Y. Qian, H. Sharif, and D. Tipper, “A survey on cyber security
for smart grid communications,” Communications Surveys & Tutorials,
IEEE, vol. 14, no. 4, pp. 998–1010, 2012.
[10] Y. Zhang, L. Wang, W. Sun, R. C. Green, M. Alam et al., “Distributed
intrusion detection system in a multi-layer network architecture of smart
grids,” Smart Grid, IEEE Trans. on, vol. 2, no. 4, pp. 796–808, 2011.
[11] R. Berthier, W. H. Sanders, and H. Khurana, “Intrusion detection
for advanced metering infrastructures: Requirements and architectural
directions,” in Smart Grid Communications (SmartGridComm), 2010
First IEEE International Conference on. IEEE, 2010, pp. 350–355.
[12] C.-W. Ten, J. Hong, and C.-C. Liu, “Anomaly detection for cybersecurity
of the substations,” Smart Grid, IEEE Trans. on, vol. 2, no. 4, pp. 865–
873, 2011.
[13] V. Calderaro, C. N. Hadjicostis, A. Piccolo, and P. Siano, “Failure
identification in smart grids based on petri net modeling,” Industrial
Electronics, IEEE Trans. on, vol. 58, no. 10, pp. 4613–4623, 2011.
[14] A. Kalaitzis and J. D. Nelson, “Online joint classification and anomaly
detection via sparse coding,” in Machine Learning for Signal Processing
(MLSP), 2014 IEEE International Workshop on. IEEE, 2014, pp. 1–6.
[15] M. He and J. Zhang, “A dependency graph approach for fault detection
and localization towards secure smart grid,” Smart Grid, IEEE Trans.
on, vol. 2, no. 2, pp. 342–351, 2011.
[16] S. Lühr, G. West, and S. Venkatesh, “Recognition of emergent human
behaviour in a smart home: A data mining approach,” Pervasive and
Mobile Computing, vol. 3, no. 2, pp. 95–116, 2007.
[17] M. Zeifman, “Smart meter data analytics: Prediction of enrollment in
residential energy efficiency programs,” in 2014 IEEE International
Conference on Systems, Man, and Cybernetics (SMC), Oct 2014, pp.
[18] P. P. Rodrigues and J. Gama, “Holistic distributed stream clustering for
smart grids,” in Workshop on Ubiquitous Data Mining, 2012, p. 18.
[19] “Smart meter systems: A metering industry perspective,” a joint
project of the EEI and AEIC Meter Committees, 2011.
[20] D. Barbará, Y. Li, J. Couto, J.-L. Lin, and S. Jajodia, “Bootstrapping
a data mining intrusion detection system,” in Proc. of the 2003 ACM
symposium on Applied computing. ACM, 2003, pp. 421–425.
[21] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules
in large databases,” in 20th international conference on very large data
bases. Morgan Kaufmann Publishers Inc., 1994, pp. 487–499.
[22] D. Barbará, Y. Li, and J. Couto, “Coolcat: an entropy-based algorithm for
categorical clustering,” in Proc. of the eleventh international conference
on Information and knowledge management. ACM, 2002, pp. 582–589.
[23] P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and
validation of cluster analysis,” Journal of computational and applied
mathematics, vol. 20, pp. 53–65, 1987.
[24] G. Pison, A. Struyf, and P. J. Rousseeuw, “Displaying a clustering with
clusplot,” Computational statistics & data analysis, vol. 30, no. 4, pp.
381–392, 1999.