All content in this area was uploaded by Nadeem Javaid on Mar 07, 2021
An adaptive synthesis to handle imbalanced big
data with deep siamese network for electricity theft
detection in smart grids
Nadeem Javaid*, Naeem Jan, Muhammad Umar Javed
Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
*Corresponding author: www.njavaid.com, nadeemjavaidqau@gmail.com
Abstract—The bi-directional flow of energy and information
in the smart grid makes it possible to record and analyze
the electricity consumption profiles of consumers. Because of
the increasing rate of inflation over the past few years, people
have started looking for ways to use electricity illegally, which is termed
electricity theft. Many data analytics techniques have been proposed
in the literature for electricity theft detection (ETD). These
techniques help in the detection of suspected illegal consumers.
However, the existing approaches have a low ETD rate due either
to improper handling of the class imbalance problem in a
dataset or to the selection of an inappropriate classifier. In this
paper, a robust big data analytics technique is proposed to
resolve the aforementioned concerns. Firstly, adaptive synthesis
(ADASYN) is applied to handle the imbalanced class problem of
data. Secondly, a deep siamese network (DSN) that integrates a
convolutional neural network (CNN) and long short-term memory
(LSTM) is proposed to discriminate the features of honest and
fraudulent consumers. Specifically, the task of feature extraction
from weekly energy consumption profiles is handed over to the
CNN module while the LSTM module performs the sequence
learning. Finally, the DSN compares the shared features
provided by the CNN-LSTM and makes the final judgment. The
data analytics is performed on different train-test ratios of the
real-time smart meters’ data. The simulation results validate the
proposed model’s effectiveness in terms of high area under the
curve, F1-Score, precision and recall.
Index Terms—Big data analytics, imbalanced data, adaptive
synthesis, electricity theft detection, deep learning, long-short
term memory, convolutional neural network, deep siamese net-
work.
I. INTRODUCTION
In line with the United Nations’ 2030 vision, “electricity
for all” is a major objective of all countries.¹ Both developed
and developing countries are striving to add the maximum
amount of electricity to the national grid. While the power
authorities struggle to ensure efficient power distribution to
every household, energy theft has become a hurdle in this
endeavor. According to a report, electricity theft causes a loss of
approximately 100 million Canadian dollars per year, which
corresponds to the amount of electricity required to
power around 77,000 homes for a year [1]. The yearly loss
in revenue caused by the electricity theft in America is 6U.S.
dollars. Similarly, the percentage of electricity lost due
to theft is 0.5% to 25% in Brazil, 3.5% in the Philippines and up to
1% in the United Kingdom. Each year, the revenue loss due to
electricity theft reaches approximately 96 billion U.S. dollars
worldwide [2].
¹ http://www.worldenergyoutlook.org
With the advancements made in information and communication
technology, traditional power grids are now able
to exploit the benefits of bi-directional communication and are
known as smart grids (SGs) [3], [4]. The roll-out of advanced
metering infrastructure (AMI) in the SG makes it possible
to provide real-time and fine-grained measurements to the
utilities. The addition of a communication layer to traditional
metering establishes a bridge between consumers and the utility
[5]. Although the AMI provides numerous benefits,
power systems have become more exposed to cyber attacks
due to the addition of this extra layer [6]. In contrast,
traditional meters are only vulnerable to physical tampering.
In this paper, the fraud committed by either utilities or feeders
is beyond the scope and focus is on detecting irregularities in
the electricity consumption of consumers.
In a SG, the transmission and distribution of power include
both technical losses (TLs) and non-technical losses (NTLs).
The former include the dissipation of energy due to the Joule effect,
i.e., the heat generated when current flows through the resistance
of the conductors. The assessment of TLs is necessary for estimating
NTLs. Electricity theft is an intentional act of illegal usage of
electricity, which is a major source of NTLs. These losses
represent the energy that is consumed by the consumers
but not billed. They are also referred to as commercial losses or
electricity theft. The main issue concerning NTLs is that they
cannot be detected precisely. Only the difference between the
amount of energy dispatched by the utilities and the amount
billed for the consumed energy is calculated. The reason behind
this irregularity is either the illegal use of electricity or the
occurrence of technical faults [7]. This irregularity falls under
one of two groups: internal fraud and external fraud. The
former is committed by employees for achieving financial
benefits while the latter is perpetrated by consumers for
reducing electricity bills. Ultimately, the main goal behind this
irregularity is to hide the actual electricity consumption and
consequently achieve financial benefits [8].
The vulnerabilities related to NTLs are generally catego-
rized into three classes: physical attacks, cyber attacks and data
attacks. The physical attacks include meter tampering, reverse
metering, bypassing the meters by direct supply, double-
tapping, washing out meter display, using bogus meters, en-
countering loops in terminal blocks and deploying tilted meters
[8]. In developing countries, the most frequently committed
electricity frauds are reverse metering and direct supply [7].
Cyber attacks are launched remotely by intercepting the
communication line and replacing the actual readings with malicious
readings. Data attacks, in turn, combine physical
attacks and cyber attacks. The motive behind a data attack is to
specifically target the recorded measurements of electricity and
falsify them through fake data injection [8].
In the past, the primary means of detecting power theft
were on-site inspections and manual analysis of electricity
consumption records. However, these approaches are time
consuming and have a low success rate. Recently, the emergence
of information technology and advancements in machine
learning have resulted in more robust solutions. Generally, the
solutions to handle NTLs can be grouped into three categories:
hardware based, non-hardware based (data-driven) and a hybrid
of both. Hardware based solutions involve the deployment of
devices, e.g., sensors, at different locations and mainly
deal with the design and architecture of the smart meters [9]
to achieve a high ETD rate. However, the specialized hardware
has a high operational and maintenance cost. In contrast,
non-hardware based solutions hold high potential due to their
low operational and maintenance cost. These solutions detect
fraud through machine learning algorithms and classifiers.
They can further be categorized into state based, game theory
based and artificial intelligence (AI) based methods.
The state based methods estimate the aggregated NTLs by
calculating the TLs of a specific area. These methods calculate
the difference between the amount of energy consumed
and the corresponding invoiced energy. Moreover, different
measurements, such as deviations in voltage and power,
are estimated for detecting NTLs, which results in high precision and
low cost [10]. However, the state based methods only provide
the aggregated NTLs and fail to identify the specific source
of the loss. Unlike the state estimation based methods, the
game-theoretic methods [11] model a contest between the utility
and the aberrant consumer, in which the aim of the fraudulent
consumer is to outmatch the utility. However, the game-theoretic methods
rely heavily on strong estimation for theft characterization.
On the contrary, the AI based methods mainly focus on
the patterns of electricity consumption, which are analyzed
through machine learning algorithms. Classification and
clustering methods require labeled and unlabeled data, respectively, in order
to pick out the aberrant consumers from the massive pool of
electricity consumption profiles [12].
Detecting anomalous patterns in electricity consumption
profiles is a challenging task in the presence of an imbalanced
class distribution in the data. In a real world scenario, the
number of fair electricity consumers is significantly larger
than the number of thieves, which creates an imbalanced
distribution in the dataset. Therefore, ETD may be considered a special type
of anomaly detection. In AI based methods, classifiers mostly
result in a low ETD rate, mainly due to the underrepresentation
of the minority class [13].
The research work in [1], [14] shows that analyzing the
electricity consumption patterns of consumers is beneficial
in detecting suspicious consumers. However, after going
through the existing literature on the topic of ETD [6]-[11],
it is concluded that the existing ETD solutions have the following limitations:
• the models applied for ETD do not take care
of proper class balancing,
• in many cases, the attachment of special devices is
required,
• in highly dynamic time series analyses, methods such
as support vector machine (SVM), random forest (RF)
and logistic regression (LR) have a low ETD rate and a high
false positive (F+) rate,
• the deep learning approaches do not discriminate the
decisive features appropriately and
• in sequential time series data, the convolutional neural
network (CNN) and multi-layer perceptron (MLP) do not
perform well. Moreover, CNN fails to provide the exact
source of NTL.
In this paper, a robust big data analytics method for electricity
theft detection (ETD) in the SG is proposed to better discriminate
between fair and fraudulent consumers on the basis of electricity
consumption data. The main contributions of this study are as
follows:
• according to the nature of the problem, an enhanced strategy
for data preprocessing is adopted,
• to avoid overfitting and to handle the class imbalance issue,
the adaptive synthesis (ADASYN) method is used,
• CNN and long short-term memory (LSTM) are integrated
in a deep siamese network (DSN) in order to learn the
key features and to achieve a high ETD rate and
• performance metrics such as mean average precision
(mAP) and area under the curve (AUC) are used to better
assess the results.
The rest of the paper is organized as follows. A review of
various existing electricity theft detection strategies is given in section
II. The problem analysis and the proposed solutions are
described in section III and section IV, respectively. The
simulation results are discussed in section V. Finally, the paper
is concluded in section VI.
II. RELATED WORK
The state of the art ETD solutions are generally
categorized into two groups: hardware based solutions and
non-hardware based solutions. A comprehensive review of
system level and data level threats to AMI can be found
in [15], [16].
A. Hardware based solutions
In hardware based solutions, deployment of special purpose
hardware and modification to the physical architecture are
performed to strengthen the system against vulnerabilities. An
identity based key establishment model is proposed in [9]
in order to avoid relying on pairing. The proposed model is
based on elliptic curve cryptography (ECC), which enhances
the performance along with the mitigation of computational
overhead. Using Chebyshev polynomial to access the security
features of smart meter, a power-authenticated key exchange
protocol is proposed in [17]. To address the ephemeral security
problem, an authentication scheme based on ECC is proposed
in [18], which aims to reduce the communication and computational
complexity. Although the hardware based solutions
give acceptable results, attention remains focused on data-driven
approaches for NTL detection because the hardware based
solutions suffer from the following drawbacks [19]:
• high deployment and maintenance cost due to specialized
metering hardware,
• poor benefit-cost ratio (BCR), i.e., the cost outweighs
the benefits,
• failure in detecting the specific source of NTL and
• vulnerability of specialized meter hardware in extreme
weather conditions.
B. Non-hardware based solutions
In contrast to the hardware based solutions, data-driven
approaches have been adopted more rapidly for detecting NTLs. In [2], a
two-fold machine learning technique is adopted to minimize
the ratio of misclassified instances. In the first step, the
maximum information coefficient (MIC) determines the
correlation between the suspicions and the consumption
profiles. In the second step, clustering is performed to find
the density peaks. Similarly, in [20], clustering is used to
extract a prototype from consumption patterns. The unseen
data samples are categorized by a distance measure; an
instance at a significant distance is considered malicious. In
contrast, the works in [8], [21] use a supervised
learning approach to handle ETD through relative entropy
and gradient boosting classifiers (GBCs). A hybrid of MLP
and LSTM is adopted to detect NTL in AMI [22]. In order
to rank the suspicious consumers, fuzzy logic is applied in [23]. A
framework for feature engineering that combines a
genetic algorithm (GA) and a finite mixture model (FMM) is
implemented in [24]. For the final judgement in NTL detection, a
gradient boosting machine (GBM) is applied. GA is an
efficient heuristic algorithm; however, it does not guarantee
the global optimum. A similar approach is proposed in [25],
which uses the black hole algorithm (BHA) for feature extraction.
Although BHA extracts the optimal features from time series
data, the performance of the model is still poor in terms of
the F+ rate.
III. PROBLEM ANALYSIS
By analyzing the consumption patterns of electricity consumers,
it becomes evident that fraudsters and fair
consumers can be differentiated by their consumption profiles.
Therefore, experiments are performed on the electricity
consumption data, as inspired by [1], in order to validate
the problem. Fig. 1(a) shows the electricity consumption of
a benign consumer during October 2016. From this plot
alone, it is difficult to identify the key characteristics of
the sequential or one-dimensional (1-D) load profile. However,
by choosing the weekly load profile, it can be seen that the
consumption of a fair consumer shows symmetric behavior,
as depicted in Fig. 1(b). In our scenario, the weekly consumption
profile of consumers is preferred over the daily consumption
for the CNN, because the behavior of consumers is periodic
on a weekly basis. As shown in Fig. 1(b), a strong relation exists
between the weekly consumed energy profiles, which show the peak
consumption on the 3rd day while the lowest consumption is
recorded on the 6th day of each week. An exception is found
on the 5th day of the 4th week. The reason behind this deviation
is the intermittent nature of a fair consumer. Therefore, it
is deduced from Fig. 1(b) that the consumption profiles of
the benign consumers follow a periodic pattern. Similarly, the
daily and weekly time series of a fraudulent consumer are
shown in Fig. 2(a) and Fig. 2(b), respectively, and exhibit
non-periodic behavior at each time interval. In contrast to
Fig. 1(b), an abrupt peak, which is also the highest one, is observed on the 3rd day
of the 1st week, as shown in Fig. 2(b), which validates the
problem.
After analyzing the time series data of both fair and fraudulent
consumers, it is observed that the consumption patterns
of fair consumers are symmetric whereas those of the
suspicious consumers are asymmetric. This observation motivates
scrutinizing the electricity consumption patterns
of consumers that violate the uniform control limit.
However, it is a challenging and an arduous task to capture
the dynamic changes in time series due to the following
reasons:
1) due to the imbalanced nature of the dataset, the distribution
is skewed towards the dominating class and consequently, the
classifiers cannot learn a discriminative decision boundary. Hence,
the classifier tends to overfit [1],
2) the energy consumption data mostly contains missing
values and outliers. A smoothing spline can detect the
outliers; however, it is difficult to capture the true continuity.
The selection of the thresholds (knots) and their locations are two
big challenges. Moreover, increasing the degree beyond a
certain threshold increases the chances of misclassification.
Hence, the suspicious consumers can be misclassified. As
shown in Fig. 2(a), the consumption of a fraudulent consumer
shows unusual activity, which is smoothed out by the smoothing
spline [13], [22],
3) extracting decisive features from a highly dynamic sequential
time series is important, which a traditional CNN fails to do [1],
4) in the literature, most of the datasets related to electricity theft
are unlabeled. Synthetic attacks are launched instead, which do
not show the true relation between consumed energy [21] and
5) the selection of suitable performance metrics is of great
importance in ETD. The most widely used performance measure,
i.e., accuracy, is an inadequate measure for fraud
detection because the cases of theft are rare compared
to the honest cases. The classifier shows a high accuracy even
though the theft cases are misclassified, which negates the true
relation between weekly consumed energy [25]. Similarly, a low
ETD rate, a low AUC and a high F+ rate are observed in
[7].
[Figure 1(b) plot: x-axis Day 1 to Day 7; legend: 1st week, 2nd week, 3rd week, 4th week]
Fig. 1: Electricity consumption pattern of an honest consumer.
(a) Date-wise electricity consumption. (b) Weekly electricity
consumption.
[Figure 2(b) plot: x-axis Day 1 to Day 7; legend: 1st week, 2nd week, 3rd week, 4th week]
Fig. 2: Electricity consumption pattern of a fraudulent
consumer. (a) Date-wise electricity consumption. (b) Weekly
electricity consumption.
IV. OUR APPROACH
The proposed ETD technique consists of two steps. In the
first step, the preprocessing is done in which the issues of miss-
ing values, data standardization and handling the imbalanced
class are resolved. In the second step, a three-fold operation is
performed, which involves decisive feature extraction, analysis
of sequential time series and the application of a classifier. The
details are provided in the following subsections.
A. Data preprocessing
The preliminary analysis of data is a mandatory step in
highly dynamic time series analysis. It includes imputation,
outlier detection, data standardization, handling of imbalanced
data, etc.
1) Handling missing values and data standardization: The
electricity consumption records of consumers often contain
incomplete information or missing values. The reasons behind
this issue may be the failure of hardware or corruption of data.
In time series data, the records with missing values cannot simply be
dropped. Instead, imputation is performed
in order to fill these values. In most cases, the filling of
missing values is performed through averaging. In this paper,
the missing values are recovered through the interpolation method
[1], as follows:
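$$
f(z_i) =
\begin{cases}
\dfrac{z_{i-1} + z_{i+1}}{2}, & \text{if } z_i \in \mathrm{NaN},\ z_{i-1}, z_{i+1} \notin \mathrm{NaN}, \\
z_i, & \text{otherwise},
\end{cases}
\tag{1}
$$
where $z_i$ is the recorded or missing (null) observation in the
dataset. The null value is represented as NaN. If $z_i$ is null,
then it is filled according to equation (1).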
Similarly, the data standardization is performed using min-max
normalization [1], as given in equation (2):
$$
f(z_i) = \frac{z_i - \min(z)}{\max(z) - \min(z)},
\tag{2}
$$
where $\min(z)$ is the minimum value of $z$ and $\max(z)$
is the maximum value of $z$.
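As an illustration of equations (1) and (2), the following minimal Python sketch fills isolated missing readings and then rescales the series. The function names and the sample readings are hypothetical and are not taken from the original implementation. As in equation (1), a gap is filled only when both neighbours are recorded, so consecutive gaps stay as NaN.

```python
import math


def interpolate_missing(series):
    """Equation (1): fill a NaN only when both direct neighbours are recorded."""
    filled = list(series)
    for i in range(1, len(series) - 1):
        left, right = series[i - 1], series[i + 1]
        if math.isnan(series[i]) and not math.isnan(left) and not math.isnan(right):
            filled[i] = (left + right) / 2.0
    return filled


def min_max_normalize(series):
    """Equation (2): rescale recorded values to [0, 1]; NaN entries are left untouched."""
    recorded = [v for v in series if not math.isnan(v)]
    lowest, highest = min(recorded), max(recorded)
    span = highest - lowest
    if span == 0:
        # Constant series: map every recorded value to 0 to avoid dividing by zero.
        return [v if math.isnan(v) else 0.0 for v in series]
    return [v if math.isnan(v) else (v - lowest) / span for v in series]


nan = float("nan")
daily_kwh = [4.0, nan, 8.0, 6.0, nan, nan, 10.0]  # hypothetical daily readings
filled = interpolate_missing(daily_kwh)  # only the isolated gap at index 1 is filled
scaled = min_max_normalize(filled)
```

The zero-span guard is an assumption added for robustness; equation (2) itself does not define the constant-series case.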
Fig. 3: System model of the proposed DSN
2) Handling imbalanced class distribution: A dataset is
considered imbalanced or biased if the sample points
of one class (majority class) greatly outnumber the instances
of the other class (minority class). Due to the underrepresentation
of the minority class, the distribution is skewed towards the
majority class. Consequently, the classifier cannot learn a
discriminative decision boundary. Hence, it becomes unable to learn
the key characteristics of the minority class and tends to overfit.
The issues related to imbalanced data are not limited to
image recognition and semantic segmentation, but apply
equally to time series data [26].
The existing remedies for handling imbalanced class issues
fall under one of three solutions: the cost-sensitive approach,
the algorithm-level approach and the data-level approach
[27]. In the cost-sensitive approach, the effects of the highly dominating
class are reduced in the training stage. The misclassification
costs of both the dominating and the suppressed classes are
taken into account and the weights are assigned accordingly.
Hence, the cost-sensitive approach tweaks the minority class
towards the dominating class. In the algorithm-level approach, the
model is modified and trained in such a way that the scarce
instances are favored and over-weighted, so that the disparity
produced by the majority class is reduced during the learning
stage. Traditionally, class balancing was achieved by the
data-level approach, which includes both undersampling and
oversampling techniques. In undersampling, the majority class
is down-sized, which in most cases eliminates
informative samples. Similarly, copying
the instances of the minority class mostly leads to overfitting,
which is a drawback of oversampling. The right choice of
technique for handling the imbalanced
class issue depends upon the nature of the problem.
In this paper, the responsibility of handling imbalanced data
is shifted from the algorithm level to the data level. In particular, the
oversampling technique is adopted in order to avoid the problem
of decisive sample elimination caused by the undersampling
technique. Specifically, for oversampling, the ADASYN sampling
approach is applied [28]. In contrast to simply duplicating the instances
of the minority class, ADASYN generates synthetic minority samples
by interpolating between a minority sample and its minority-class
neighbors, and it generates more samples in regions where the
minority class is harder to learn. The resulting variation leads to better
generalization of the model. The reason behind the selection
of ADASYN is not only to avoid overfitting, but also to
emphasize outlier detection in the feature space.
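The idea behind ADASYN can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the paper: the function name `adasyn_oversample`, the neighbourhood size `k`, the fixed seed and the toy two-dimensional points are all hypothetical. Minority samples that are surrounded by more majority neighbours receive a larger share of the synthetic samples.

```python
import math
import random


def adasyn_oversample(X, y, minority=1, k=3, seed=0):
    """Return synthetic minority samples generated in the spirit of ADASYN."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    # Number of synthetic samples needed to balance both classes.
    n_needed = (len(X) - len(minority_pts)) - len(minority_pts)
    if n_needed <= 0 or len(minority_pts) < 2:
        return []

    # r_i: fraction of majority samples among the k nearest neighbours of x_i.
    ratios = []
    for x in minority_pts:
        others = [(p, label) for p, label in zip(X, y) if p is not x]
        others.sort(key=lambda item: math.dist(x, item[0]))
        ratios.append(sum(label != minority for _, label in others[:k]) / k)

    total = sum(ratios)
    if total == 0:
        weights = [1.0 / len(ratios)] * len(ratios)
    else:
        weights = [r / total for r in ratios]

    synthetic = []
    for x, weight in zip(minority_pts, weights):
        neighbours = sorted((p for p in minority_pts if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        # Harder-to-learn samples (larger weight) produce more synthetic points.
        for _ in range(round(weight * n_needed)):
            z = rng.choice(neighbours)
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(x, z)])
    return synthetic


# Toy data: 8 honest consumers (label 0) and 3 thieves (label 1), two features each.
honest = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
          [0.2, 0.4], [0.4, 0.2], [0.5, 0.5], [0.6, 0.4]]
thieves = [[0.55, 0.55], [0.9, 0.9], [1.0, 0.8]]
X = honest + thieves
y = [0] * len(honest) + [1] * len(thieves)
synthetic = adasyn_oversample(X, y)
```

Because the per-sample counts are rounded, the number of generated samples may differ slightly from the exact class gap for other data sets. In practice a maintained library implementation would normally be preferred over this sketch.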
B. Proposed deep siamese (CNN-LSTM) network architecture
for ETD
In the second step of the proposed methodology, identi-
fication of the fraudulent consumers is performed via joint
integration of CNN-LSTM with DSN. The details are provided
in the following subsections:
1) Features extraction through convolution neural net-
works: The preliminary data analytics show the periodicity
and non-periodicity in electricity consumption of fair and
fraudulent consumers. The identification of a fraudster con-
sumer is difficult when analyzing the daily electricity con-
sumption record, since the electricity consumption of each day
shows a relatively independent pattern. Therefore, aligning the
electricity consumption of several weeks is beneficial for detecting
abnormal patterns. The work done in [1] indicates that
CNN performs well in such a situation; hence, the daily electricity
consumption data is arranged into weekly consumption profiles
accordingly. A deep CNN is trained on the weekly electricity
consumption profiles through multiple stacked convolutional
layers with convolution filters, a max-pooling layer and a fully
connected layer. Convolution is the element-wise multiplication
of the filter weights with the corresponding inputs, followed by
summation. The feature map is obtained by sliding the convolution filter
or kernel over the input.
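The weekly alignment and the sliding of a kernel described above can be sketched as follows. This is only an illustration under assumptions: the helper names, the hand-picked two-row kernel and the synthetic consumption values are hypothetical, whereas the kernels of the actual CNN are learned during training. The kernel used here subtracts each week from the following week, so a perfectly periodic consumer yields an all-zero feature map.

```python
def to_weekly_matrix(daily, days_per_week=7):
    """Arrange a 1-D daily series into rows of one week each (a trailing partial week is dropped)."""
    n_weeks = len(daily) // days_per_week
    return [daily[w * days_per_week:(w + 1) * days_per_week] for w in range(n_weeks)]


def convolve_valid(matrix, kernel):
    """Slide the kernel over the matrix (stride 1, no padding) and sum the element-wise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(matrix) - k_rows + 1
    out_cols = len(matrix[0]) - k_cols + 1
    return [
        [
            sum(matrix[r + i][c + j] * kernel[i][j]
                for i in range(k_rows) for j in range(k_cols))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


week_pattern = [5.0, 6.0, 9.0, 7.0, 6.0, 4.0, 5.0]  # hypothetical kWh per weekday
periodic = week_pattern * 4 + [5.0]                  # 29 days, last day is dropped
tampered = list(periodic)
tampered[9] = 20.0                                   # spike on the 3rd day of the 2nd week

week_diff_kernel = [[-1.0], [1.0]]                   # next week minus current week
periodic_map = convolve_valid(to_weekly_matrix(periodic), week_diff_kernel)
tampered_map = convolve_valid(to_weekly_matrix(tampered), week_diff_kernel)
```

In the tampered series, only the column of the 3rd day deviates from zero, which is the kind of cross-week irregularity that is hard to see in the 1-D daily profile.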
2) Sequence learning through long short term memory: The
addition of memory to the NN makes it more powerful in
handling time series data, which is the inherent behavior
of the recurrent neural network (RNN) [29]. The problems associated
with RNN are vanishing and exploding gradients [30].
These issues make it difficult for the RNN to learn long-term
dependencies. LSTM was
introduced to overcome the aforementioned limitations [31].
The structure of LSTM is the same as that of RNN except for the repeating
module. Instead of a single NN layer, the repeating module of LSTM has
several layers, which provide a better representation of time series data.
In fact, LSTM is able to handle the vanishing gradient
problem and to remember information for a long period
of time, which is practically its default behavior.
In our work, the daily electricity consumption profile is
analyzed by LSTM. Moreover, LSTM is also able to identify
the time window of an anomalous time series.
3) Supervised learning based on deep siamese network:
DSN can be applied to problems where the aim is to
discriminate features on the basis of a similarity measure [32].
Unlike traditional CNN, which has a low generalization ability,
DSN performs better because of its strong feature extraction
capabilities [32], [33], [34]. DSN is a supervised machine
learning technique, which consists of two main components: a shared
feature extractor and a distance measure or cost estimator. The
shared feature extractor encodes the features while the
cost function estimates the difference between the two embedding
streams.
4) Mathematical formulation for CNN-LSTM: The com-
bination of CNN and LSTM is used in the proposed work to
discriminate the features of two different types of consumers,
i.e., honest and fraudulent. The mathematical formulation
of the CNN-LSTM module used in the underlying work is
described below.
The two input sequences, i.e., $\psi_i$ and $\psi_j$, are fed in parallel
to the CNN-LSTM module, such that both $\psi_i, \psi_j =
\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i$ denotes the input
features and $y_i \in \{0, 1\}$ is the corresponding target value
($y_i = 0$ implies that the instance belongs to the fair class). The
features of both classes are learned by the CNN-LSTM
module and finally the encoding of features is performed [32],
using equations (3) and (4):
$$
E_i = \delta\{\omega_n \cdot \delta\{\cdots \delta\{\omega_2 \cdot [\delta(\omega_1 \cdot \psi_i + b_1) + b_2]\} \cdots\} + b_n\},
\tag{3}
$$
$$
E_j = \delta\{\omega_n \cdot \delta\{\cdots \delta\{\omega_2 \cdot [\delta(\omega_1 \cdot \psi_j + b_1) + b_2]\} \cdots\} + b_n\},
\tag{4}
$$
where $\delta(\cdot)$, $\omega_n$ and $b_n$ denote the sigmoid function, weights and
biases, respectively. Thereafter, the shared features are fed to a
loss function, which discriminates the features on the basis of a
similarity measure. Therefore, a classification loss such as
binary cross entropy is not suitable. Instead, a contrastive loss
function is used, as in [32], to better discriminate the features,
given in equation (5):
$$
Loss^{DSN}_{i,j} = d_{i,j} \cdot \max[0, (1 - \hat{d}_{i,j})] + (1 - d_{i,j}) \cdot \hat{d}_{i,j},
\tag{5}
$$
where $\hat{d}_{i,j}$ is the Euclidean distance calculated
between the encoded features, i.e., $\hat{d}_{i,j} = \|E_i - E_j\|_2$.
Similarly, $d_{i,j}$ is the actual distance, given in equation
(6):
$$
d_{i,j} =
\begin{cases}
1, & \text{if } y_i \neq \hat{y}_j, \\
0, & \text{otherwise}.
\end{cases}
\tag{6}
$$
The objective of training the DSN is to minimize the difference
between $d_{i,j}$ and $\hat{d}_{i,j}$.
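To make equations (5) and (6) concrete, the following minimal Python sketch evaluates the loss for a single pair of embeddings. The function names, the `margin` parameter (fixed to 1 as in equation (5)) and the two-dimensional embeddings are assumptions made for illustration only; in the actual model the embeddings come from the shared CNN-LSTM encoder.

```python
import math


def euclidean_distance(e_i, e_j):
    """Predicted distance between two embeddings, i.e., the L2 norm of their difference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e_i, e_j)))


def pair_label(y_i, y_j):
    """Equation (6): 1 for a pair from different classes, 0 for a pair from the same class."""
    return 1 if y_i != y_j else 0


def dsn_loss(e_i, e_j, y_i, y_j, margin=1.0):
    """Equation (5): pull same-class pairs together, push different-class pairs apart."""
    d = pair_label(y_i, y_j)
    d_hat = euclidean_distance(e_i, e_j)
    return d * max(0.0, margin - d_hat) + (1 - d) * d_hat


# Hypothetical embeddings of three consumers.
fair_a, fair_b, thief = [0.0, 0.0], [0.3, 0.4], [3.0, 4.0]

same_class_loss = dsn_loss(fair_a, fair_b, 0, 0)   # distance 0.5 is penalised
far_apart_loss = dsn_loss(fair_a, thief, 0, 1)     # distance 5 exceeds the margin
too_close_loss = dsn_loss(fair_a, fair_b, 0, 1)    # different classes only 0.5 apart
```

A same-class pair is penalised by its distance, while a different-class pair is penalised only when it lies closer than the margin.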
V. SIMULATION RESULTS
In this section, the simulations are performed in order to
compare the performance of the proposed model with the
benchmark schemes.
A. Simulation setup
1) Dataset acquisition: The dataset is acquired from the
largest power providing company in China, i.e., the State Grid
Corporation of China (SGCC)², and
is publicly available. The daily consumption record is available
for 1035 days, i.e., from January 1, 2014 to October 31,
2016. According to the ground truth of the dataset, 9% of
the total consumers are labeled as electricity thieves, which
is a high ratio.
[Figure 4 plot: accuracy and loss versus number of epochs (0 to 14); legend: Train, Test]
Fig. 4: Performance of CNN-LSTM model.
[Figure 5 plot: ROC panel (axes labeled TPR and FPR) with Train AUC = 0.75 and Test AUC = 0.73; precision-recall panel (axes labeled Recall and Precision) with Train PR = 0.82 and Test PR = 0.52]
Fig. 5: ROC-AUC and PR curve of CNN-LSTM model.
² http://www.sgcc.com.cn/
[Figure 6 plot: accuracy and loss versus number of epochs (0 to 16); legend: Train, Test]
Fig. 6: Performance of DSN.
[Figure 7 plot: ROC panel (axes labeled TPR and FPR) with Train AUC = 0.99 and Test AUC = 0.93; precision-recall panel (axes labeled Recall and Precision) with Train PR = 0.99 and Test PR = 0.92]
Fig. 7: ROC-AUC and PR curve of DSN.
2) Performance metrics: When detecting NTL in
the pool of electricity consumption profiles, the
true positives (T+) and true negatives
(T−) are the correctly classified instances. In contrast, the
false negatives (F−) and false positives (F+) reflect the opposite scenario,
where F− is the number of fraudulent consumers that are
misclassified as fair and F+ is the number of fair consumers that are
misclassified as fraudulent. The objective behind
the accurate detection of NTL is to reduce F+ and
to maximize T+. Other performance metrics
related to classification are recall, precision, specificity,
F1-score, accuracy, mAP and the AUC of the receiver operating
characteristics (ROC) curve. The first five are given by equations (7)-(11),
taken from [13].
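$$
\mathrm{Recall} = \frac{T^{+}}{T^{+} + F^{-}},
\tag{7}
$$
$$
\mathrm{Precision} = \frac{T^{+}}{T^{+} + F^{+}},
\tag{8}
$$
$$
\mathrm{Specificity} = \frac{T^{-}}{T^{-} + F^{+}},
\tag{9}
$$
$$
\mathrm{F1\text{-}score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},
\tag{10}
$$
$$
\mathrm{Accuracy} = \frac{T^{+} + T^{-}}{T^{+} + T^{-} + F^{+} + F^{-}}.
\tag{11}
$$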
Although accuracy and recall are widely used in the literature as
performance metrics, they are inadequate in the case of an
imbalanced class distribution, as shown in Table I. Similarly,
precision, specificity and F1-score do not show accurate
results and are not reliable when used individually.
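The following short Python sketch evaluates equations (7)-(11) from the four confusion counts and illustrates why accuracy alone can mislead on skewed data. The function name and the counts are hypothetical; the 9% theft ratio merely mirrors the ratio reported for the dataset. Returning 0 for an undefined ratio is a convention assumed here, not something the equations prescribe.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute equations (7)-(11) from the confusion counts T+, T-, F+ and F-."""
    def ratio(numerator, denominator):
        # Undefined ratios (zero denominator) are reported as 0.0 by convention.
        return numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    specificity = ratio(tn, tn + fp)
    f1_score = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1_score": f1_score,
        "accuracy": accuracy,
    }


# 1000 hypothetical consumers, 90 of them thieves (9%).
# A classifier that labels everyone as fair still reaches 91% accuracy.
all_fair = classification_metrics(tp=0, tn=910, fp=0, fn=90)

# A classifier that actually finds thieves, at the cost of some false alarms.
detector = classification_metrics(tp=60, tn=870, fp=40, fn=30)
```

The first classifier has a high accuracy but a recall of zero, which is the failure mode that motivates the use of AUC and mAP in this work.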
In order to detect NTL without the loss of information,
selection of reliable performance metrics is required [13].
The performance metrics such as mAP and AUC are applied
in this work to better comprehend the imbalanced data. As
mentioned in [1], [2], ROC curve and mAP are the best
performance metrics used for detecting suspects in imbalanced
class distribution.
The ROC curve is the graphical representation of the T+ rate
against the F+ rate. It is used to evaluate the performance of a
classifier. The area under the ROC curve is called AUC, which
measures how well the distribution of the fraudulent class is
separated from that of the fair class. The AUC ranges
from 0 to 1. The ideal situation arises when the two distributions do not overlap
each other. An AUC approaching 1 demonstrates the validity of the
classifier while an AUC of 0.5 or less shows that the classifier
does not have the ability to discriminate the classes [1], [2].
AUC is calculated using equation (12), taken from [1]:
$$
AUC = \frac{\sum_{i \in S} R_i - \frac{1}{2}|S|(|S| + 1)}{|S| \cdot |H|},
\tag{12}
$$
where $R_i$ denotes the rank of fraudulent consumer $i$ when all consumers
are sorted by suspicion degree in ascending order while $|S|$ and $|H|$ are the
numbers of suspicious and honest consumers, respectively.
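Equation (12) can be evaluated directly from the ranks of the suspicion scores, as in the minimal Python sketch below. The function name and the example scores are hypothetical. Assigning the average rank to tied scores is an assumption added here; equation (12) itself does not specify how ties are handled.

```python
def auc_from_ranks(scores, labels):
    """Equation (12): rank-based AUC; labels are 1 for fraudulent and 0 for honest consumers."""
    n_s = sum(1 for label in labels if label == 1)
    n_h = len(labels) - n_s
    if n_s == 0 or n_h == 0:
        raise ValueError("both classes must be present")

    # Ascending 1-based ranks; tied scores share their average rank (assumption).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1

    rank_sum = sum(rank for rank, label in zip(ranks, labels) if label == 1)
    return (rank_sum - n_s * (n_s + 1) / 2.0) / (n_s * n_h)


partly_separated = auc_from_ranks([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
fully_separated = auc_from_ranks([0.10, 0.20, 0.80, 0.90], [0, 0, 1, 1])
uninformative = auc_from_ranks([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1])
```

In the first example the fraudulent consumers hold ranks 2 and 4, so the rank sum is 6 and the AUC is (6 - 3) / 4 = 0.75.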
The second performance metric used in this paper is mAP.
It is defined as the mean of all average precisions. It is used
to measure the quality of information retrieval when the performance metrics
given in equations (7)-(11) fail.
Let $y_k$ denote the number of fraudulent consumers among the
top $k$ ranked consumers, such that the
precision at $k$ is defined as $P@k = \frac{y_k}{k}$. The calculations performed
by mAP for information retrieval are given in equation (13),
taken from [1]:
$$
mAP@N = \frac{\sum_{i=1}^{r} P@k_i}{r},
\tag{13}
$$
where $r$ is the number of fraudulent consumers among the top
$N$ ranked consumers and $k_i$ is the rank position of the $i$-th of them.
The value of $N$ is 100 in our scenario.
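A minimal Python sketch of equation (13) is given below. The function name, the example scores and the choice of N = 4 are hypothetical and serve only to show the mechanics; the sketch assumes that consumers are ranked by suspicion score in descending order and returns 0 when no fraudulent consumer appears in the top N.

```python
def map_at_n(scores, labels, n):
    """Equation (13): mean of P@k taken at the positions of the thieves within the top n."""
    # Rank consumers from most to least suspicious and keep the top n.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    hits = 0
    precisions = []
    for k, index in enumerate(top, start=1):
        if labels[index] == 1:
            hits += 1
            precisions.append(hits / k)  # P@k = y_k / k at this thief's position
    return sum(precisions) / len(precisions) if precisions else 0.0


scores = [0.9, 0.8, 0.7, 0.6, 0.5]   # hypothetical suspicion scores
labels = [1, 0, 1, 0, 1]             # 1 marks a fraudulent consumer
result = map_at_n(scores, labels, n=4)
```

Here the thieves within the top four sit at positions 1 and 3, so the result is the mean of P@1 = 1 and P@3 = 2/3.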
B. Affect of imbalanced distribution on performance metrics
In imbalanced class problem, one class significantly domi-
nates the other class, which results in the suppression of the
minority class. The affect of least important and significant
performance metrics can be seen in Table I.
Table I shows the comparative analysis of DSN with and
without handling the imbalanced class distribution. Without
handling the imbalance, recall is particularly poor (0.3771),
because many fraudulent instances are misclassified as fair.
ADASYN performs better than random undersampling (RUS) on
every metric. RUS performs poorly because undersampling
eliminates decisive features.
Using accuracy as the performance metric can hide a low T+
rate and a high F+ rate. Table I shows that, without handling
the class imbalance, the accuracy (0.7065) is higher than with
RUS (0.6519), yet AUC and mAP remain low. Therefore, it is
deduced that accuracy alone does not guarantee correct
classification of the instances in skewed distributions.
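A small toy case makes the point about accuracy concrete. The sketch below uses the standard confusion-matrix definitions of accuracy, precision, recall and F1-score; the function name `classification_metrics` and the 90/10 class split are assumptions for illustration and are not taken from the dataset used in this paper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fraudulent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up skewed population: 90 honest (0) and 10 fraudulent (1) consumers.
y_true = [0] * 90 + [1] * 10
# A degenerate classifier that labels every consumer as honest.
y_pred = [0] * 100
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy is 0.9 although no fraudulent consumer is detected
```

The degenerate classifier reaches 90% accuracy while its recall is zero, which is the behaviour that motivates ranking-based metrics such as AUC and mAP.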
TABLE I: Significance of handling imbalanced class distribution

| Performance metric | Without handling class imbalance | Undersampling (RUS) | Oversampling (ADASYN) |
|---|---|---|---|
| mAP | 0.5952 | 0.5997 | 0.8988 |
| AUC | 0.6270 | 0.6520 | 0.9250 |
| F1-Score | 0.6467 | 0.5500 | 0.9249 |
| Accuracy | 0.7065 | 0.6519 | 0.9241 |
| Precision | 0.7524 | 0.6500 | 0.9153 |
| Recall | 0.3771 | 0.6300 | 0.9347 |
C. Comparative analysis
The proposed model is compared with the baseline methods
for validation. The baseline methods used for comparison are
discussed below.
1) Support vector machine: SVM is an elegant technique
used for both classification and regression tasks. It separates
the classes by a hyperplane, whose construction depends
entirely on the selection of support vectors. Table II shows
the hyperparameters optimized through grid search. The
regularization parameter C is selected to be 0.001, with the
radial basis function (RBF) as the kernel.
TABLE II: SVM hyperparameters' selection

| Hyperparameter | Values range | Optimal value |
|---|---|---|
| C | 0.001, 0.01, 0.1, 1, 10, 100 | 0.001 |
| Kernel | Linear, RBF | RBF |
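The exhaustive search over the grid in Table II can be sketched generically as follows. This is a minimal illustration, not the authors' code: `grid_search` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for the cross-validated score that a real run would compute by training an SVM for each combination.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Search space copied from Table II.
svm_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100],
    "kernel": ["linear", "rbf"],
}


def toy_score(params):
    # Hypothetical scoring function used only to make the sketch runnable;
    # it does not reproduce the results reported in this paper.
    bonus = 0.5 if params["kernel"] == "rbf" else 0.0
    return bonus - abs(params["C"] - 0.1)


n_combinations = len(list(product(*svm_grid.values())))  # 6 values of C x 2 kernels
best_params, best_score = grid_search(svm_grid, toy_score)
```

The same function applies unchanged to the LR and RF search spaces by swapping in their value ranges.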
2) Logistic regression: LR is a simple and elegant
technique used for binary classification. The two classes are
separated by a hyperplane h = (w, b), where w and b denote
the normal vector and the intercept of the hyperplane,
respectively. Finding the optimal w and b yields the
hyperplane that best separates the two classes.
The operations performed by LR are the same as those of an
NN, in which the input features are combined with trained
weight matrices.
A distance $d_i = \frac{w^{T} x_i}{\|w\|}$ is computed for each
observation to find its margin from the hyperplane, where $w$
and $x_i$ denote the weights and the corresponding input feature
vector. $\|w\|$ is the norm of the normal to the hyperplane, and
$w$ is assumed to be a unit vector. The weighted input features
are passed through a sigmoid function
$f(d) = \frac{1}{1 + e^{-d}}$. The LIBLINEAR solver is used to
train the classifier and find the optimal weights using the
logarithmic loss function [35].
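The two operations described above, the normalized distance to the hyperplane and the sigmoid squashing, can be sketched as follows. The function names and the example vectors are assumptions for illustration; the sigmoid is written in its standard logistic form $1/(1+e^{-d})$, and the intercept is omitted to match the distance formula in the text.

```python
import math


def signed_distance(w, x):
    """Distance of observation x from the hyperplane with normal w: w^T x / ||w||."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return sum(wi * xi for wi, xi in zip(w, x)) / norm


def sigmoid(d):
    """Standard logistic function mapping a distance to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-d))


# Made-up weights and observation: w^T x = 11 and ||w|| = 5, so d = 2.2.
d = signed_distance([3.0, 4.0], [1.0, 2.0])
probability = sigmoid(d)
```

Observations on the hyperplane (d = 0) map to 0.5, and the output moves towards 1 or 0 as the observation lies further on the positive or negative side.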
The hyperparameters of LR obtained through grid search are
given in Table III, where C is the hyperparameter used to
handle overfitting and R denotes the type of regularization.
The best hyperparameters are achieved when C = 0.01 and
the L2 norm is selected as the type of regularization. The careful
selection of these parameters is essential for the performance
of the forecasting model [36].
TABLE III: LR hyperparameters' selection

| Hyperparameter | Values range | Optimal value |
|---|---|---|
| C | 0.001, 0.01, 0.1, 10, 100 | 0.01 |
| R | L1 norm, L2 norm | L2 norm |
3) Random forest: RF is an ensemble model with the decision
tree (DT) as its base classifier. In order to make better
predictions, RF combines multiple DTs on the basis of
bootstrapping and feature sampling. Applying bootstrapping
and feature sampling together yields a different model
each time. The essence of RF is that it reduces variance
efficiently. The samples selected to train a tree are called
in-bag samples (ibs), while the remaining samples are known as
out-of-bag (oob) samples. The ibs are used to train the
classifier, while the oob samples are used to validate the model.
Table IV shows the best generalized hyperparameters of RF.
TABLE IV: RF hyperparameters' selection

| Hyperparameter | Values range | Optimal value |
|---|---|---|
| Number of decision trees | 800, 1200, 1600, 2000 | 1200 |
| Maximum depth | 10, 15, 20, 25 | 20 |
| Minimum sample splits | 5, 10, 15, 20 | 15 |
| Minimum sample leaves | 4, 8, 12, 16 | 16 |
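The in-bag and out-of-bag split used by each tree of the RF can be sketched with the standard library alone. The function name `bootstrap_split`, the sample count and the seed are assumptions for illustration; a full RF would repeat this per tree and additionally sample a subset of features.

```python
import random


def bootstrap_split(n_samples, seed=None):
    """Draw a bootstrap sample of indices and return (in_bag, out_of_bag)."""
    rng = random.Random(seed)
    # Sampling with replacement: some indices repeat, others are never drawn.
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


in_bag, out_of_bag = bootstrap_split(n_samples=10, seed=0)
# in_bag indices would train one tree; out_of_bag indices would validate it.
```

Each tree sees a different resample, which is what decorrelates the trees and lets the ensemble average reduce variance.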
4) CNN-LSTM: For the comparative analysis, CNN
and LSTM are integrated to extract the features and analyze
the time series data [37]. The details of the hyperparameters
are given in Table V. The performance of the CNN-LSTM model
without DSN is shown in Fig. 4 and Fig. 5. Although the
features are extracted by CNN and the sequence information
is preserved by LSTM, this hybrid model still fails to provide
efficient results due to the lack of discrimination between
instances.
TABLE V: CNN-LSTM hyperparameters' selection

| Hyperparameter | Optimal value | Hyperparameter | Optimal value |
|---|---|---|---|
| Number of neurons | 64 | Stride | 1 |
| Number of CNN-layers | 6 | Dropout | 0.1 |
| Number of LSTM-layers | 4 | Dense layer | 128 |
| Number of filters | 10 | Activation function | LeakyReLU |
5) Wide and deep CNN: To capture both the wide and
deep information in time series data for NTL detection, a
wide and deep CNN (WD-CNN) is proposed in [1]. The wide
component takes the daily consumption (1-D data) as input,
while the deep component analyzes the weekly consumption
profile, which is represented as 2-D data. The rectified linear
unit (ReLU), which passes only positive values, is used as the
activation function. AUC and mAP are used to measure the
performance of the model. The hyperparameters used to train
the model are the same as those used in [1].
6) Results and discussion: Table VI provides an overview
of the performance metrics of each classifier for three
training ratios, i.e., 60%, 70% and 80%. The detail of each
performance metric is given in order to better understand its
importance in ETD. The results obtained for the traditional
classifiers, i.e., LR, SVM and RF, show a generally increasing
trend, which indicates that their performance is enhanced
by the increase in training instances. In contrast, the deep
networks depend on the selection of hyperparameters
as well as on the model's training ratio. Moreover, it
is clear from Table VI that the proposed model is successfully
applied to both small and very large datasets.
The proposed model's performance is visualized in
Fig. 6 and Fig. 7.
VI. CONCLUSION
In this paper, electricity theft is detected in the SG using
a dataset obtained through AMI. A novel theft detection
method is introduced through the joint integration of CNN-LSTM
and DSN. The CNN component handles the weekly 2-D
electricity consumption profile and generalizes the model
efficiently, whereas the LSTM module memorizes
the daily 1-D sequential electricity consumption data. The
DSN then performs judgment on the shared feature extractor
and discriminates the deviating patterns of fraudulent class
consumers from those of fair class consumers. The analysis is
performed on high-resolution time series data provided by
SGCC. The simulation results show that DSN has a high
ETD rate, with an increased AUC and mAP of 0.93 and
0.90, respectively. Its comparative analysis with the benchmark
methods, i.e., LR, SVM, RF, CNN-LSTM and WD-CNN,
shows that it achieves the highest values for all performance
parameters: precision, recall, mAP, accuracy, AUC and
F1-score. It maintains its performance for all three training
ratios: 60%, 70% and 80%.
REFERENCES
[1] Zheng, Z., Yang, Y., Niu, X., Dai, H.N. and Zhou, Y., 2017. Wide
and deep convolutional neural networks for electricity-theft detection to
secure smart grids. IEEE Transactions on Industrial Informatics, 14(4),
pp.1606-1615.
[2] Zheng, K., Chen, Q., Wang, Y., Kang, C. and Xia, Q., 2018. A novel
combined data-driven approach for electricity theft detection. IEEE
Transactions on Industrial Informatics, 15(3), pp.1809-1819.
[3] Asif Khan and Nadeem Javaid, "Jaya Learning-Based Optimization for
Optimal Sizing of Stand-Alone Photovoltaic, Wind Turbine, and Battery
Systems", Engineering, Pages: 1-21, Published: 2020, ISSN: 2095-8099.
[4] Ashfaq Ahmad, Nadeem Javaid, Mohsen Guizani, Nabil Ali Alrajeh and
Zahoor Ali Khan, "An Accurate and Fast Converging Short-Term Load
Forecasting Model for Industrial Applications in a Smart Grid", IEEE
Transactions on Industrial Informatics, Volume: 13, Issue: 5, Pages:
2587-2596, Published: October 2017, ISSN: 1551-3203.
[5] Sana Mujeeb and Nadeem Javaid, "ESAENARX and DE-RELM: Novel
Schemes for Big Data Predictive Analytics of Electricity Load and
Price", Sustainable Cities and Society, Volume: 51, Article Number:
101642, Pages: 1-16, Published: November 2019, ISSN: 2210-6707.
[6] Jokar, P., Arianpoo, N. and Leung, V.C., 2015. Electricity theft detection
in AMI using customers’ consumption patterns. IEEE Transactions on
Smart Grid, 7(1), pp.216-226.
[7] Saeed, M.S., Mustafa, M.W., Sheikh, U.U., Jumani, T.A. and Mirjat,
N.H., 2019. Ensemble Bagged Tree Based Classification for Reducing
Non-Technical Losses in Multan Electric Power Company of Pakistan.
Electronics, 8(8), pp.860-876.
[8] Singh, S.K., Bose, R. and Joshi, A., 2019. Energy theft detection for
AMI using principal component analysis based reconstructed data. IET
Cyber-Physical Systems: Theory & Applications, 4(2), pp.179-185.
[9] Mohamad, A.M. and Mohamed, Y.A.R.I., 2019. Investigation and
Assessment of Stabilization Solutions for DC Microgrid With Dynamic
Loads. IEEE Transactions on Smart Grid, 10(5), pp.5735-5747.
[10] Martins, A.V., Bacurau, R.M., dos Santos, A.D. and Ferreira, E.C., 2019.
Non-Intrusive Energy Meter for Non-Technical Losses Identification.
IEEE Transactions on Instrumentation and Measurement, pp.1-8.
[11] Amin, S., Schwartz, G.A., Cardenas, A.A. and Sastry, S.S., 2015. Game-
theoretic models of electricity theft detection in smart utility networks:
Providing new capabilities with advanced metering infrastructure. IEEE
Control Systems Magazine, 35(1), pp.66-81.
[12] Ahmad, T., Chen, H., Wang, J. and Guo, Y., 2018. Review of various
modeling techniques for the detection of electricity theft in smart grid
environment. Renewable and Sustainable Energy Reviews, 82,
pp.2916-2933.
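TABLE VI: Comparative analysis of DSN with benchmark schemes

Training ratio 60%

| Method | P | R | F1 | Acc | mAP | AUC |
|---|---|---|---|---|---|---|
| LR | 0.710 | 0.710 | 0.680 | 0.700 | 0.645 | 0.702 |
| SVM | 0.675 | 0.670 | 0.680 | 0.676 | 0.6140 | 0.677 |
| RF | 0.700 | 0.550 | 0.551 | 0.710 | 0.687 | 0.706 |
| CNN-LSTM | 0.664 | 0.615 | 0.661 | 0.836 | 0.638 | 0.666 |
| WD-CNN | 0.640 | 0.691 | 0.651 | 0.820 | 0.669 | 0.689 |
| DSN | 0.875 | 0.839 | 0.857 | 0.839 | 0.814 | 0.860 |

Training ratio 70%

| Method | P | R | F1 | Acc | mAP | AUC |
|---|---|---|---|---|---|---|
| LR | 0.725 | 0.740 | 0.715 | 0.720 | 0.640 | 0.716 |
| SVM | 0.685 | 0.675 | 0.675 | 0.680 | 0.619 | 0.684 |
| RF | 0.740 | 0.730 | 0.735 | 0.740 | 0.652 | 0.735 |
| CNN-LSTM | 0.629 | 0.662 | 0.636 | 0.839 | 0.641 | 0.670 |
| WD-CNN | 0.624 | 0.720 | 0.770 | 0.770 | 0.689 | 0.718 |
| DSN | 0.840 | 0.850 | 0.845 | 0.844 | 0.819 | 0.844 |

Training ratio 80%

| Method | P | R | F1 | Acc | mAP | AUC |
|---|---|---|---|---|---|---|
| LR | 0.730 | 0.725 | 0.725 | 0.730 | 0.668 | 0.720 |
| SVM | 0.680 | 0.680 | 0.680 | 0.680 | 0.628 | 0.688 |
| RF | 0.750 | 0.750 | 0.750 | 0.750 | 0.681 | 0.749 |
| CNN-LSTM | 0.670 | 0.69 | 0.676 | 0.832 | 0.66 | 0.73 |
| WD-CNN | 0.661 | 0.760 | 0.685 | 0.840 | 0.711 | 0.756 |
| DSN | 0.912 | 0.923 | 0.928 | 0.953 | 0.900 | 0.934 |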
[13] Avila, N.F., Figueroa, G. and Chu, C.C., 2018. NTL Detection in
Electric Distribution Systems Using the Maximal Overlap Discrete
Wavelet-Packet Transform and Random Undersampling Boosting. IEEE
Transactions on Power Systems, 33(6), pp.7171-7180.
[14] Li, W., Logenthiran, T., Phan, V.T. and Woo, W.L., 2019. A novel smart
energy theft system (SETS) for IoT-based smart home. IEEE Internet of
Things Journal, 6(3), pp.5531-5539.
[15] Kumar, P., Lin, Y., Bai, G., Paverd, A., Dong, J.S. and Martin, A.,
2019. Smart grid metering networks: A survey on security, privacy and
open research issues. IEEE Communications Surveys & Tutorials, 21(3),
pp.2886-2927.
[16] Viegas, J.L., Esteves, P.R., Melicio, R., Mendes, V.M.F. and Vieira,
S.M., 2017. Solutions for detection of non-technical losses in the
electricity grid: A review. Renewable and Sustainable Energy Reviews,
80, pp.1256-1268.
[17] Abbasinezhad-Mood, D. and Nikooghadam, M., 2018. Efficient
anonymous password-authenticated key exchange protocol to read isolated
smart meters by utilization of extended Chebyshev chaotic maps. IEEE
Transactions on Industrial Informatics, 14(11), pp.4815-4828.
[18] Abbasinezhad-Mood, D. and Nikooghadam, M., 2018. Design and
hardware implementation of a security-enhanced elliptic curve cryptography
based lightweight authentication scheme for smart grid communications.
Future Generation Computer Systems, 84, pp.47-57.
[19] Saeed, Muhammad Salman, Mohd Wazir Mustafa, Nawaf N. Hamadneh,
Nawa A. Alshammari, Usman Ullah Sheikh, Touqeer Ahmed Jumani,
Saifulnizam Bin Abd Khalid, and Ilyas Khan. "Detection of Non-
Technical Losses in Power Utilities—A Comprehensive Systematic
Review." Energies 13, no. 18 (2020): 4727.
[20] Viegas, J.L., Esteves, P.R. and Vieira, S.M., 2018. Clustering-based
novelty detection for identification of non-technical losses. International
Journal of Electrical Power & Energy Systems, 101, pp.301-310.
[21] Punmiya, R. and Choe, S., 2019. Energy theft detection using gradient
boosting theft detector with feature engineering-based preprocessing.
IEEE Transactions on Smart Grid, 10(2), pp.2326-2329.
[22] Buzau, M.M., Tejedor-Aguilera, J., Cruz-Romero, P. and Gomez-
Exposito, A., 2019. Hybrid deep neural networks for detection of non-
technical losses in electricity smart meters. IEEE Transactions on Power
Systems, pp.1-10.
[23] Spiric, J.V., Stankovic, S.S. and Docic, M.B., 2018. Identification
of suspicious electricity customers. International Journal of Electrical
Power & Energy Systems, 95, pp.635-643.
[24] Razavi, R., Gharipour, A., Fleury, M. and Akpan, I.J., 2019. A practical
feature-engineering framework for electricity theft detection in smart
grids. Applied energy, 238, pp.481-494.
[25] Ramos, C.C., Rodrigues, D., de Souza, A.N. and Papa, J.P., 2016. On the
study of commercial losses in Brazil: a binary black hole algorithm for
theft characterization. IEEE Transactions on Smart Grid, 9(2),
pp.676-683.
[26] Khan, S.H., Bennamoun, M., Sohel, F., Togneri, R. and Naseem, I., 2016.
Integrating geometrical context for semantic labeling of indoor scenes
using rgb images. International Journal of Computer Vision, 117(1),
pp.1-20.
[27] Razavi-Far, R., Farajzadeh-Zanjani, M., Wang, B., Saif, M. and
Chakrabarti, S., 2019. Imputation-based Ensemble Techniques for Class
Imbalance Learning. IEEE Transactions on Knowledge and Data
Engineering, pp.1-14.
[28] He, H., Bai, Y., Garcia, E.A. and Li, S., 2008, June. ADASYN: Adaptive
synthetic sampling approach for imbalanced learning. In 2008 IEEE
international joint conference on neural networks (IEEE world congress
on computational intelligence) (pp. 1322-1328). IEEE.
[29] Rabiya Khalid, Nadeem Javaid, Fahad A. Al-zahrani, Khursheed
Aurangzeb, Emad-ul-Haq Qazi and Tehreem Ashfaq, "Electricity Load and
Price Forecasting Using Jaya-Long Short Term Memory (JLSTM) in
Smart Grids", Entropy, Volume: 22, Issue: 1, Article Number: 10, Pages:
1-21, Published: January 2020, ISSN: 1099-4300.
[30] Muhammad Adil, Nadeem Javaid, Umar Qasim, Ibrar Ullah, Muhammad
Shafiq and Jin-Ghoo Choi, "LSTM and Bat-Based RUSBoost Approach
for Electricity Theft Detection", Applied Sciences, Volume: 10, Issue:
12, Article Number: 4378, Pages: 1-21, Published: June 2020, ISSN:
1099-4300.
[31] Greff, K., Srivastava, R.K., Koutnik, J., Steunebrink, B.R. and
Schmidhuber, J., 2016. LSTM: A search space odyssey. IEEE transactions on
neural networks and learning systems, 28(10), pp.2222-2232.
[32] Hu, T., Guo, Q., Shen, X., Sun, H., Wu, R. and Xi, H., 2019. Utilizing
unlabeled data to detect electricity fraud in AMI: A semisupervised deep
learning approach. IEEE transactions on neural networks and learning
systems, 30(11), pp.3287-3299.
[33] Wang, M., Tan, K., Jia, X., Wang, X. and Chen, Y., 2020. A Deep
Siamese Network with Hybrid Convolutional Feature Extraction Module
for Change Detection Based on Multi-sensor Remote Sensing Images.
Remote Sensing, 12(2), p.205. DOI: 10.3390/rs12020205.
[34] Miao, J., Wang, B., Wu, X., Zhang, L., Hu, B. and Zhang, J.Q., 2019,
July. Deep Feature Extraction Based on Siamese Network and Auto-
Encoder for Hyperspectral Image Classification. In IGARSS 2019-2019
IEEE International Geoscience and Remote Sensing Symposium (pp.
397-400). IEEE.
[35] Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R. and Lin, C.J., 2008.
LIBLINEAR: A library for large linear classification. Journal of machine
learning research, 9(Aug), pp.1871-1874.
[36] Rabiya Khalid and Nadeem Javaid, "A Survey on Hyperparameters
Optimization Algorithms of Forecasting Models in Smart Grid", Sustainable
Cities and Society, Pages: 1-35, Article Number: 102275, Published:
2020, ISSN: 2210-6707.
[37] Hasan, M., Toma, R.N., Nahid, A.A., Islam, M.M. and Kim, J.M., 2019.
Electricity Theft Detection in Smart Grid Systems: A CNN-LSTM Based
Approach. Energies, 12(17), pp.3310-3328.
NADEEM JAVAID (S’8, M’11, SM’16) received
the bachelor degree in computer science from Gomal
University, Dera Ismail Khan, Pakistan, in
1995, the master degree in electronics from
Quaid-i-Azam University, Islamabad, Pakistan, in 1999,
and the Ph.D. degree from the University of Paris-Est,
France, in 2010. He is currently an Associate
Professor and the Founding Director of the
Communications Over Sensors (ComSens) Research Laboratory,
Department of Computer Science, COMSATS
University Islamabad, Islamabad. He has supervised
126 master and 20 Ph.D. theses. He has authored over 900 articles in technical
journals and international conferences. His research interests include energy
optimization in smart/micro grids and in wireless sensor networks, data
analytics in smart grids, and blockchain in WSNs. He was a recipient of
the Best University Teacher Award from the Higher Education Commission
of Pakistan, in 2016, and the Research Productivity Award from the Pakistan
Council for Science and Technology, in 2017. He is also an Associate Editor of
IEEE Access, an Editor of the International Journal of Space-Based and Situated
Computing and an Editor of Sustainable Cities and Society.
NAEEM JAN received the B.S. degree
in computer science from PMAS, University Institute
of Information Technology Rawalpindi, Pakistan,
in 2014, and the M.S. degree in computer
science, under the supervision of Dr. N. Javaid, from
the Department of Computer Science, COMSATS
University Islamabad, Islamabad Campus, Pakistan,
in 2017. He is currently with the ComSens
(Communication over Sensors) Research Laboratory,
COMSATS University Islamabad. His research interests
include wireless sensor networks, optimization
techniques, big data analysis, and the Internet of Things.
MUHAMMAD UMAR JAVED received the
bachelor’s and master’s degrees in electrical engineering
from Government College University Lahore,
Lahore, Pakistan, in 2014 and 2018, respectively.
He is currently pursuing the Ph.D. degree in
computer science with the Communications Over Sensors
(ComSens) Research Laboratory, COMSATS
University Islamabad, Islamabad Campus, under the
supervision of Dr. Nadeem Javaid. His research
interests include smart grids, electric vehicles, big
data analysis and blockchain.