
An adaptive synthesis to handle imbalanced big data with deep siamese network for electricity theft detection in smart grids



Nadeem Javaid*, Naeem Jan, Muhammad Umar Javed
Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
*Corresponding author:,
Abstract—The bi-directional flow of energy and information in the smart grid makes it possible to record and analyze the electricity consumption profiles of consumers. Because of rising inflation over the past few years, people have started looking for ways to use electricity illegally, a practice termed electricity theft. Many data analytics techniques have been proposed in the literature for electricity theft detection (ETD), and these techniques help in detecting suspected illegal consumers. However, existing approaches have a low ETD rate, either due to improper handling of the imbalanced class problem in the dataset or the selection of an inappropriate classifier. In this paper, a robust big data analytics technique is proposed to resolve these concerns. Firstly, adaptive synthesis (ADASYN) is applied to handle the imbalanced class problem of the data. Secondly, a deep siamese network (DSN) integrating a convolutional neural network (CNN) and long short-term memory (LSTM) is proposed to discriminate the features of honest and fraudulent consumers. Specifically, feature extraction from weekly energy consumption profiles is handled by the CNN module, while the LSTM module performs sequence learning. Finally, the DSN operates on the shared features provided by the CNN-LSTM and makes the final decision. The analysis is performed on real-time smart meter data using different train-test ratios. The simulation results validate the proposed model's effectiveness in terms of high area under the curve, F1-Score, precision and recall.
Index Terms—Big data analytics, imbalanced data, adaptive synthesis, electricity theft detection, deep learning, long short-term memory, convolutional neural network, deep siamese network.
In line with the United Nations' 2030 vision, "electricity for all" is a major objective of all countries. Both developed and developing countries are striving to add the maximum amount of electricity to the national grid. While power authorities struggle to ensure efficient power distribution to every household, energy theft has become a hurdle in this endeavor. According to a report, electricity theft causes a loss of approximately 100 million Canadian dollars per year, an amount equivalent to the electricity required to
power around 77000 homes for a year [1]. The yearly loss
in revenue caused by the electricity theft in America is 6U.S.
dollars. Similarly, the share of electricity lost to theft is 0.5% to 25% in Brazil, 3.5% in the Philippines and up to 1% in the United Kingdom. Each year, the worldwide revenue loss due to electricity theft reaches approximately 96 billion U.S. dollars [2].
With advances in information and communication technology, traditional power grids are now able to benefit from bi-directional communication and are known as smart grids (SGs) [3], [4]. The roll-out of advanced metering infrastructure (AMI) in the SG makes it possible to provide real-time, fine-grained measurements to the utilities. Adding a communication layer to traditional metering establishes a bridge between consumers and the utility [5]. Although AMI provides numerous benefits, this extra layer also makes power systems more exposed to cyber attacks [6], whereas traditional meters are only vulnerable to physical tampering. Fraud committed by utilities or at the feeder level is beyond the scope of this paper; the focus is on detecting irregularities in the electricity consumption of consumers.
In an SG, the transmission and distribution of power involve both technical losses (TLs) and non-technical losses (NTLs). The former include energy dissipated as heat through the Joule effect, i.e., resistive heating as current flows through lines and equipment. TLs must be assessed in order to account for NTLs. Electricity theft, the intentional illegal use of electricity, is a major source of NTLs. These losses represent energy that is consumed by consumers but not billed, and they are also known as commercial losses. The main issue concerning NTLs is that they cannot be measured precisely: only the difference between the energy dispatched by the utility and the energy billed to consumers can be calculated. This irregularity is caused either by the illegal use of electricity or by technical faults [7]. It falls under one of two groups: internal fraud and external fraud. The former is committed by employees for financial gain, while the latter is perpetrated by consumers to reduce their electricity bills. Ultimately, the goal behind this irregularity is to conceal the actual electricity consumption and thereby gain financially [8].
The vulnerabilities related to NTLs are generally categorized into three classes: physical attacks, cyber attacks and data attacks. Physical attacks include meter tampering, reverse metering, bypassing the meter through a direct supply, double-tapping, wiping out the meter display, using bogus meters, introducing loops in terminal blocks and deploying tilted meters [8]. In developing countries, the most frequently committed electricity frauds are reverse metering and direct supply [7]. Cyber attacks are launched remotely by intercepting the communication line and replacing actual readings with malicious ones. Data attacks combine physical and cyber attacks: their motive is to specifically target the recorded electricity measurements and falsify them through false data injection [8].
In the past, the primary means of detecting power theft were on-site inspections and manual analysis of electricity consumption records. However, these approaches are time consuming and have a low success rate. Recently, the emergence of information technology and advances in machine learning have produced more robust solutions. Generally, the solutions to handle NTLs can be grouped into three categories: hardware based, non-hardware based (data-driven) and hybrids of both. Hardware based solutions involve deploying devices, e.g., sensors, at different locations, and they mainly deal with the design and architecture of smart meters [9] to achieve a high ETD rate. However, the specialized hardware has high operational and maintenance costs. In contrast, non-hardware based solutions hold high potential due to their low operational and maintenance costs. These solutions detect fraud through machine learning algorithms and classifiers, and they can further be categorized into state based, game theory based and artificial intelligence (AI) based methods.
State based methods estimate the aggregated NTLs of a specific area by calculating its TLs, i.e., they calculate the difference between the amount of energy consumed and the corresponding invoiced energy. Moreover, different quantities, such as deviations in voltage and power, are estimated to detect NTLs, which results in high precision at low cost [10]. However, state based methods only provide the aggregated NTLs and fail to identify the specific source of the loss. Unlike state estimation based methods, game-theoretic methods [11] model a contest between the utility and the aberrant consumer, in which the fraudulent consumer aims to outmatch the utility. However, game-theoretic methods rely heavily on accurate estimates for characterizing theft.
In contrast, AI based methods mainly focus on electricity consumption patterns, which are analyzed through machine learning algorithms. Classification methods use labeled data, and clustering methods use unlabeled data, to single out aberrant consumers from the pool of massive electricity consumption profiles [12].
Detecting anomalous patterns in electricity consumption profiles is a challenging task when the class distribution of the data is imbalanced. In real-world scenarios, the number of honest electricity consumers is significantly larger than the number of thieves, which creates an imbalanced distribution in the dataset. Therefore, ETD may be considered a special type of anomaly detection. In AI based methods, classifiers mostly yield a low ETD rate, mainly due to the underrepresentation of the minority class [13].
The research work in [1], [14] shows that analyzing the electricity consumption patterns of consumers is beneficial for detecting suspicious consumers. However, after reviewing the existing literature on ETD [6]-[11], it is concluded that existing ETD approaches have the following limitations:
• the models applied for ETD do not perform proper class balancing,
• in many cases, the attachment of special devices is required,
• in highly dynamic time series analyses, methods such as support vector machine (SVM), random forest (RF), logistic regression (LR), etc., have a low ETD rate and a high false positive (F+) rate,
• deep learning approaches do not discriminate the decisive features appropriately and
• in sequential time series data, the convolutional neural network (CNN) and multi-layer perceptron (MLP) do not perform well. Moreover, CNN fails to provide the exact source of NTL.
In this paper, a robust big data analytics method for electricity theft detection (ETD) in the SG is proposed to better discriminate honest and fraudulent consumers on the basis of electricity consumption data. The main contributions of this study are as follows:
• according to the nature of the problem, an enhanced data preprocessing strategy is adopted,
• to avoid overfitting and to handle the class imbalance issue, the adaptive synthesis (ADASYN) method is used,
• CNN and long short-term memory (LSTM) are integrated in a deep siamese network (DSN) in order to learn the key features and achieve a high ETD rate and
• performance metrics such as mean average precision (mAP) and area under the curve (AUC) are used to better assess the results.
The rest of the paper is organized as follows. A review of existing electricity theft detection strategies is given in Section II. The problem analysis and the proposed solutions are described in Section III and Section IV, respectively. The simulation results are discussed in Section V. Finally, the paper is concluded in Section VI.
State-of-the-art ETD solutions are generally categorized into two groups: hardware based solutions and non-hardware based solutions. Comprehensive reviews of system-level and data-level threats to AMI can be found in [15], [16].
A. Hardware based solutions
In hardware based solutions, special purpose hardware is deployed and the physical architecture is modified to strengthen the system against vulnerabilities. An identity based key establishment model is proposed in [9] to avoid relying on pairing operations. The proposed model is based on elliptic curve cryptography (ECC), which enhances performance while mitigating computational overhead. Using Chebyshev polynomials to enhance the security of smart meters, a power-authenticated key exchange protocol is proposed in [17]. To address the ephemeral security problem, an ECC based authentication scheme is proposed in [18], which aims to reduce communication and computational complexity. Although hardware based solutions give acceptable results, research attention is still focused on data-driven approaches for NTL detection for the following reasons [19]:
• high deployment and maintenance costs due to specialized metering hardware,
• a negative benefit-cost ratio (BCR), i.e., the cost outweighs the benefits,
• failure to detect the specific source of NTL and
• vulnerability of specialized meter hardware in extreme weather conditions.
B. Non-hardware based solutions
In contrast to hardware based solutions, data-driven approaches for detecting NTLs have grown more rapidly. In [2], a two-step machine learning technique is adopted to minimize the ratio of misclassified instances. In the first step, the maximum information coefficient (MIC) determines the correlation between suspicious consumers and their consumption profiles. In the second step, clustering is performed to find density peaks. Similarly, in [20], clustering is used to extract a prototype from consumption patterns. Unseen data samples are categorized by a distance measure, and instances at a significant distance from the prototype are considered malicious. In contrast, the work in [8], [21] uses a supervised learning approach to handle ETD through relative entropy and gradient boosting classifiers (GBCs). A hybrid of MLP and LSTM is adopted to detect NTL in AMI [22]. To rank suspects, fuzzy logic is applied in [23]. A feature engineering framework combining a genetic algorithm (GA) and a finite mixture model (FMM) is implemented in [24], and a gradient boosting machine (GBM) is applied for the final judgement in NTL detection. GA is an efficient heuristic algorithm; however, it does not guarantee the global optimum. A similar approach is proposed in [25], which uses the black hole algorithm (BHA) for feature extraction. Although BHA extracts optimal features from time series data, the performance of the model is still inefficient in terms of
By analyzing the consumption patterns of electricity consumers, it becomes evident that fraudsters and honest consumers can be differentiated by their consumption profiles. Therefore, inspired by [1], experiments are performed on electricity consumption data to examine this problem. Fig. 1(a) shows the electricity consumption of a benign consumer during October 2016. From this sequential, one-dimensional (1-D) load profile, it is difficult to identify the key characteristics. However, when the weekly load profile is chosen, the consumption of an honest consumer shows symmetric behavior, as depicted in Fig. 1(b). In our scenario, the weekly consumption profile is preferred over the daily profile for the CNN, because consumer behavior is periodic on a weekly basis. As shown in Fig. 1(b), a strong relation exists between the weeks' consumption: peak consumption occurs on the 3rd day and the lowest consumption on the 6th day of each week. The exception is the 5th day of the 4th week, a deviation that reflects the intermittent nature of an honest consumer's usage. Therefore, it is deduced from Fig. 1(b) that the consumption profiles of benign consumers follow a periodic pattern. Similarly, the daily and weekly time series of a fraudulent consumer are shown in Fig. 2(a) and Fig. 2(b), respectively, and exhibit non-periodic behavior at each time interval. In contrast to Fig. 1(b), an abrupt and highest peak is observed on the 3rd day of the 1st week, as shown in Fig. 2(b), which validates the
After analyzing the time series data of both honest and fraudulent consumers, it is observed that the consumption of honest consumers follows a symmetric pattern, whereas suspicious consumers show asymmetric behavior. This observation motivates scrutinizing the electricity consumption patterns of consumers that violate the uniform control limit.
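The weekly-periodicity observation above can be illustrated with a small Python sketch. It is not part of the proposed method: the helper `weekly_periodicity` is hypothetical and scores a daily series by the mean pairwise Pearson correlation between its weeks, and both series are synthetic, loosely shaped like Fig. 1(b) (peak on day 3, low on day 6) and Fig. 2 (no weekly structure).

```python
import numpy as np

def weekly_periodicity(daily: np.ndarray) -> float:
    """Mean pairwise Pearson correlation between the weeks of a daily series.

    Hypothetical helper: assumes len(daily) is a multiple of 7 and that no
    week is constant (a constant week has an undefined correlation).
    """
    weeks = daily.reshape(-1, 7)                # rows = weeks, columns = Day 1..Day 7
    corr = np.corrcoef(weeks)                   # week-by-week correlation matrix
    upper = np.triu_indices_from(corr, k=1)     # every pair of distinct weeks once
    return float(corr[upper].mean())

rng = np.random.default_rng(0)
pattern = np.array([5.0, 6.0, 9.0, 6.0, 5.0, 2.0, 4.0])   # peak on day 3, low on day 6
honest = np.tile(pattern, 4) + rng.normal(0, 0.3, 28)    # four similar weeks
erratic = rng.uniform(0, 10, 28)                          # no shared weekly shape

print(round(weekly_periodicity(honest), 2))    # expected to be high
print(round(weekly_periodicity(erratic), 2))   # expected to be clearly lower
```

A high score means the weeks share one shape; a score near zero means there is no weekly regularity. On real profiles, missing values would have to be imputed first.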
However, it is a challenging and arduous task to capture the dynamic changes in time series for the following reasons:
1) due to the imbalanced nature of the dataset, the distribution is skewed towards the dominating class; consequently, classifiers cannot learn a proper decision boundary and tend to overfit [1],
2) energy consumption data often contain missing values and outliers. A smoothing spline can detect outliers; however, it has difficulty capturing the true continuity of the data. Selecting the knots (thresholds) and their locations are two big challenges. Moreover, increasing the degree beyond a certain threshold increases the chance of misclassification, so suspicious consumers can be misclassified. As shown in Fig. 2(a), the consumption of a fraudulent consumer shows unusual activity, which is smoothed out (normalized) by the smoothing spline [13], [22],
3) extracting decisive features from a highly dynamic sequential time series is important, and traditional CNN lacks this ability [1],
4) in the literature, most electricity theft datasets are unlabeled. Synthetic attacks are therefore launched, which do not reflect the true relation between consumed energy [21],
5) the selection of suitable performance metrics is of great importance in ETD. The most widely used performance measure, i.e., accuracy, is inadequate for fraud detection, because theft cases are rare compared to honest ones. A classifier can show high accuracy even though the theft cases are misclassified, which negates the true relation between weekly consumed energy [25]. Similarly, a low ETD rate, minimum AUC and a high F+ rate are observed in
Da y 2
Da y 3
Da y 4
Da y 5
Da y 6
Da y 7
(b )
1s t we ek
2n d we ek
3r d we ek
4t h w eek
Fig. 1: Electricity consumption pattern of a honest consumer.
(a) Date-wise electricity consumption. (b) Weekly electricity
Da y 1
Da y 2
Da y 3
Da y 4
Da y 5
Da y 6
Da y 7
(b )
45 1s t we ek
2n d we ek
3r d we ek
4t h w eek
Fig. 2: Electricity consumption pattern of a fraudulent
consumer. (a) Date-wise electricity consumption. (b) Weekly
electricity consumption.
The proposed ETD technique consists of two steps. In the first step, preprocessing is performed to handle missing values, standardize the data and resolve the imbalanced class problem. In the second step, a three-fold operation is performed, which involves decisive feature extraction, analysis of the sequential time series and the application of a classifier. The details are provided in the following subsections.
A. Data preprocessing
Preliminary data analysis is a mandatory step in highly dynamic time series analysis and includes imputation, outlier detection, data standardization, handling imbalanced data, etc.
1) Handling missing values and data standardization: The electricity consumption records of consumers contain incomplete information or missing values, which may be caused by hardware failure or data corruption. In time series data, missing values cannot simply be dropped, since doing so breaks the continuity of the series; instead, imputation is performed to fill these values. In most cases, missing values are filled by averaging. In this paper, missing values are recovered through the interpolation method of [1], as follows:
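$$
f(z_i) =
\begin{cases}
\dfrac{z_{i-1} + z_{i+1}}{2}, & \text{if } z_i \in \mathrm{NaN} \text{ and } z_{i-1}, z_{i+1} \notin \mathrm{NaN},\\[4pt]
z_i, & \text{if } z_i \notin \mathrm{NaN},
\end{cases}
\tag{1}
$$
where $z_i$ is the recorded or missing (null) observation in the dataset and a null value is represented as NaN. If $z_i$ is null, it is filled according to equation (1).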
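Equation (1) can be sketched in Python as follows. Only the neighbour-averaging rule comes from the text; how a NaN with a missing neighbour, or a NaN at either end of the series, should be handled is not specified above, so this hypothetical helper simply leaves such values as NaN.

```python
import numpy as np

def impute_neighbours(z: np.ndarray) -> np.ndarray:
    """Fill isolated NaNs with the mean of their two neighbours, as in Eq. (1).

    Assumption: NaNs whose neighbour is also NaN, or that sit at either end
    of the series, are left unchanged.
    """
    out = z.astype(float).copy()
    for i in range(1, len(z) - 1):
        # neighbours are read from the original series, not from filled values
        if np.isnan(z[i]) and not np.isnan(z[i - 1]) and not np.isnan(z[i + 1]):
            out[i] = (z[i - 1] + z[i + 1]) / 2.0
    return out

readings = np.array([3.0, np.nan, 5.0, np.nan, np.nan, 8.0])
print(impute_neighbours(readings))   # index 1 becomes 4.0; indices 3 and 4 stay NaN
```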
Similarly, data standardization is performed using min-max normalization [1], as given in equation (2):
$$
f(z_i) = \frac{z_i - \min(z)}{\max(z) - \min(z)}
\tag{2}
$$
where $\min(z)$ and $\max(z)$ denote the minimum and maximum values of $z$, respectively.
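A minimal sketch of equation (2), assuming it is applied to one consumer's series at a time (the text does not say whether the range is computed per consumer or over the whole dataset). The guard for a constant series is an added assumption to avoid division by zero.

```python
import numpy as np

def min_max(z: np.ndarray) -> np.ndarray:
    """Scale values to [0, 1] following Eq. (2); NaNs are ignored when finding the range."""
    lo, hi = np.nanmin(z), np.nanmax(z)
    if hi == lo:                                  # assumption: a constant series maps to all zeros
        return np.zeros_like(z, dtype=float)
    return (z - lo) / (hi - lo)

print(min_max(np.array([2.0, 4.0, 6.0, 10.0])))   # values 0, 0.25, 0.5 and 1
```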
Fig. 3: System model of the proposed DSN
2) Handling imbalanced class distribution: A dataset is considered imbalanced or biased if the samples of one class (the majority class) heavily outnumber the instances of the other class (the minority class). Due to the underrepresentation of the minority class, the distribution is skewed towards the majority class. Consequently, the classifier cannot learn the decision boundary properly; it is unable to learn the key characteristics of the minority class and tends to overfit. The issues related to imbalanced data are not limited to image recognition and semantic segmentation; they apply equally to time series data [26].
The existing remedies for imbalanced class issues fall under one of three approaches: the cost-sensitive approach, the algorithm-level approach and the data-level approach [27]. In the cost-sensitive approach, the effect of the dominating class is reduced in the training stage: the misclassification costs of both the dominating and the minority classes are taken into account and weights are assigned accordingly, which shifts the classifier's attention from the dominating class towards the minority class. In the algorithm-level approach, the model is modified and trained in such a way that the scarce instances are favored and over-weighted, so that the bias produced by the majority class is reduced during learning. Traditionally, class balancing was achieved by the data-level approach, which includes both undersampling and oversampling techniques. In undersampling, much of the majority class is sacrificed by down-sizing the actual data, and in most cases informative samples are eliminated. Similarly, duplicating the instances of the minority class often leads to overfitting, which is the downfall of oversampling. The right technique for handling imbalanced classes depends on the nature of the problem.
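The proposed method does not use the cost-sensitive approach, but a short Python sketch makes the weighting idea concrete. It assumes the common "balanced" heuristic, weight = n_samples / (n_classes × n_in_class), which is also what scikit-learn computes for `class_weight="balanced"`; the helper names are hypothetical.

```python
import numpy as np

def balanced_class_weights(y: np.ndarray) -> dict:
    """Weight each class by n_samples / (n_classes * n_in_class)."""
    classes, counts = np.unique(y, return_counts=True)
    n, k = len(y), len(classes)
    return {int(c): n / (k * cnt) for c, cnt in zip(classes, counts)}

def weighted_bce(y_true, p_pred, weights, eps=1e-7):
    """Binary cross-entropy in which each sample is scaled by its class weight."""
    p = np.clip(p_pred, eps, 1 - eps)
    w = np.where(y_true == 1, weights[1], weights[0])
    loss = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.mean(w * loss))

y = np.array([0] * 91 + [1] * 9)   # 9% minority, similar to the theft ratio of the SGCC data
w = balanced_class_weights(y)
print(w)                           # minority weight is about 10x the majority weight
```

With these weights, misclassifying a thief costs roughly ten times more than misclassifying an honest consumer, which is how the cost-sensitive approach counteracts the skew.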
In this paper, imbalanced data is handled at the data level rather than the algorithm level. In particular, oversampling is adopted to avoid the elimination of decisive samples caused by undersampling. Specifically, the ADASYN sampling approach [28] is applied for oversampling, because it adapts the selection of points to the local data distribution. Instead of simply duplicating minority class instances, ADASYN generates synthetic minority samples by interpolating between a minority instance and one of its minority class neighbors, and it generates more samples for minority instances that are surrounded by more majority class neighbors, i.e., those that are harder to learn. These new, slightly perturbed samples result in better generalization of the model. ADASYN is selected not only to avoid overfitting, but also to emphasize hard-to-learn minority samples, including outliers, in the feature space.
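The ADASYN procedure of [28] can be sketched in NumPy as below. This is an illustrative brute-force version, not the implementation used in this paper: the neighbour count `k`, the balance level `beta` (1.0 targets a fully balanced dataset) and the fallback when no minority sample has majority neighbours are assumptions. Each minority sample receives a number of synthetic samples proportional to the share of majority samples among its k nearest neighbours, and each synthetic sample lies on the segment between that sample and a randomly chosen minority neighbour.

```python
import numpy as np

def adasyn(X, y, beta=1.0, k=5, seed=0):
    """Minimal ADASYN sketch for a binary problem with minority label 1.

    Assumptions: brute-force Euclidean neighbours, at least k + 1 minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_min, n_maj = len(X_min), int(np.sum(y == 0))
    G = int(round((n_maj - n_min) * beta))            # total synthetic samples to create
    if G <= 0:
        return X, y

    # Difficulty ratio r_i: share of majority samples among the k nearest neighbours
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    r = np.empty(n_min)
    for i in range(n_min):
        nn = np.argsort(d_all[i])[1:k + 1]            # skip the point itself (distance 0)
        r[i] = np.sum(y[nn] == 0) / k
    if r.sum() == 0:                                  # assumption: no hard samples, spread evenly
        r = np.ones(n_min)
    g = np.rint(r / r.sum() * G).astype(int)          # synthetic samples per minority point

    # Interpolate between x_i and a random minority-class neighbour
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    synthetic = []
    for i in range(n_min):
        nn_min = np.argsort(d_min[i])[1:k + 1]
        for _ in range(g[i]):
            z = X_min[rng.choice(nn_min)]
            lam = rng.random()                        # position along the segment, in [0, 1)
            synthetic.append(X_min[i] + lam * (z - X_min[i]))
    if not synthetic:
        return X, y
    X_syn = np.array(synthetic)
    return np.vstack([X, X_syn]), np.concatenate([y, np.ones(len(X_syn), dtype=y.dtype)])

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (90, 7)), rng.normal(1.5, 1, (10, 7))])   # 7 features, e.g. one week
y = np.array([0] * 90 + [1] * 10)
X_res, y_res = adasyn(X, y)
print(np.bincount(y_res))   # roughly balanced; rounding of g_i can shift the minority count slightly
```

Because it computes all pairwise distances, this sketch only suits small datasets; for data of the SGCC size, a library implementation such as `ADASYN` from the imbalanced-learn package (`imblearn.over_sampling`) would be the practical choice.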
B. Proposed deep siamese (CNN-LSTM) network architecture
for ETD
In the second step of the proposed methodology, identi-
fication of the fraudulent consumers is performed via joint
integration of CNN-LSTM with DSN. The details are provided
in the following subsections:
1) Feature extraction through convolutional neural networks: The preliminary data analysis shows periodicity and non-periodicity in the electricity consumption of honest and fraudulent consumers, respectively. Identifying a fraudulent consumer from the daily electricity consumption record is difficult, since the consumption of each day shows a relatively independent pattern. Therefore, aligning the electricity consumption of several weeks is beneficial for detecting abnormal patterns. The work in [1] indicates that CNN performs well in such situations, so the daily electricity consumption data is transformed into weekly consumption accordingly. A deep CNN is trained on the weekly electricity consumption profile through multiple stacked convolutional layers with convolution filters, a max-pooling layer and a fully connected layer. In a convolutional layer, each output value is the sum of the element-wise products between the filter (kernel) weights and the corresponding window of the input; sliding the filter over the input produces the feature map.
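The daily-to-weekly transformation and a single convolution-plus-pooling stage can be sketched in NumPy as follows. The zero padding of the last partial week, the 3 × 3 averaging kernel, the ReLU activation and the 2 × 2 pooling window are illustrative assumptions; the text does not specify the CNN's hyper-parameters. As in most deep learning libraries, the "convolution" is computed as cross-correlation (the kernel is not flipped).

```python
import numpy as np

def to_weekly(daily: np.ndarray) -> np.ndarray:
    """Reshape a 1-D daily series into a (weeks x 7) matrix, zero-padding the tail (assumption)."""
    pad = (-len(daily)) % 7
    return np.pad(daily, (0, pad)).reshape(-1, 7)

def conv2d_valid(x: np.ndarray, kernel: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """'Valid' 2-D convolution as used in CNN layers (cross-correlation, kernel not flipped)."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            # sum of element-wise products between the kernel and the window under it
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * kernel) + bias
    return out

def max_pool2d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

daily = np.arange(1035, dtype=float)                 # placeholder values for 1035 days
weekly = to_weekly(daily)                            # one day of padding gives 148 weeks
fmap = np.maximum(conv2d_valid(weekly, np.ones((3, 3)) / 9.0), 0.0)   # ReLU feature map
pooled = max_pool2d(fmap)
print(weekly.shape, fmap.shape, pooled.shape)        # (148, 7) (146, 5) (73, 2)
```

With 1035 days of records, a single day of padding yields 148 complete weeks, so in this sketch each consumer becomes a 148 × 7 matrix whose rows are aligned weeks.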
2) Sequence learning through long short-term memory: Adding memory to a neural network (NN) makes it better able to handle time series data, which is the inherent behavior of the recurrent neural network (RNN) [29]. The problems associated with RNNs are vanishing and exploding gradients [30]. These issues arise because gradients are repeatedly multiplied through time during backpropagation, which makes long-term dependencies hard to learn. Unlike traditional RNN models, LSTM was introduced to overcome these limitations [31]. The structure of an LSTM is the same as that of an RNN except for the repeating module: instead of a single NN layer, the LSTM module contains several interacting layers (gates), which give a better representation of time series data. In fact, LSTM mitigates the vanishing gradient problem and, by design, can retain information over long periods of time.
In our work, the daily electricity consumption profile is analyzed by the LSTM. Moreover, the LSTM is also capable of identifying the time window of an anomalous time series.
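For reference, a single step of a standard LSTM cell is sketched below in NumPy. The gate ordering in the stacked weight matrices, the hidden size and the random initialization are assumptions for illustration; the text does not specify the LSTM configuration used in this work.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the assumed order: input gate, forget gate, candidate, output gate.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])              # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])          # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])      # candidate cell content
    o = sigmoid(z[3 * H:4 * H])      # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g         # additive cell update; this path mitigates vanishing gradients
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
D, H = 1, 8                          # one reading per day, 8 hidden units (arbitrary)
W, U, b = rng.normal(0, 0.1, (4 * H, D)), rng.normal(0, 0.1, (4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for reading in rng.random(30):       # 30 days of normalized consumption (synthetic)
    h, c = lstm_step(np.array([reading]), h, c, W, U, b)
print(h.shape)                       # (8,)
```

The forget gate and the additive cell-state update are what let the LSTM carry information across many daily time steps.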
3) Supervised learning based on deep siamese network: A DSN can be applied to problems where the aim is to discriminate features on the basis of a similarity measure [32]. Unlike a traditional CNN, which has low generalization ability, a DSN performs better owing to its strong feature extraction capabilities [32], [33], [34]. DSN is a supervised machine learning technique that operates in two main steps: a shared feature extractor and a distance measurer or cost estimator. The shared feature extractor encodes the features, while the cost function estimates the difference between the two embeddings.
4) Mathematical formulation for CNN-LSTM: The combination of CNN and LSTM is used in the proposed work to discriminate the features of two different types of consumers, i.e., honest and fraudulent. The mathematical formulation of the CNN-LSTM module used in this work is described below.
The two input sequences $\psi_i$ and $\psi_j$ are processed in parallel by the CNN-LSTM module, where both are drawn from the dataset $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$; $x_i$ denotes the input features and $y_i \in \{0, 1\}$ is the corresponding target value ($y_i = 0$ implies that the instance belongs to the honest class). The features of both classes are learned by the CNN-LSTM module, and finally the features are encoded [32] using equations (3) and (4):
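$$
E_i = \delta\Big(\omega_n \cdots \delta\big(\omega_2\, \delta(\omega_1 \psi_i + b_1) + b_2\big) \cdots + b_n\Big),
\tag{3}
$$
$$
E_j = \delta\Big(\omega_n \cdots \delta\big(\omega_2\, \delta(\omega_1 \psi_j + b_1) + b_2\big) \cdots + b_n\Big),
\tag{4}
$$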
where $\delta(\cdot)$, $\omega_1, \dots, \omega_n$ and $b_1, \dots, b_n$ denote the sigmoid function, weights and biases, respectively. Thereafter, the shared features are fed to a loss function, which discriminates the features on the basis of a similarity measure. Therefore, a classification loss such as binary cross-entropy is not suitable. Instead, a contrastive loss function is used, as in [32], to better discriminate the features, as given in equation (5).
$$
\ell_{i,j} = d_{i,j}\, \max\!\left[0,\ 1 - \hat{d}_{i,j}\right] + \left(1 - d_{i,j}\right) \hat{d}_{i,j},
\tag{5}
$$
where $\hat{d}_{i,j} = \lVert E_i - E_j \rVert_2$ is the Euclidean distance between the encoded features of the two inputs. Similarly, $d_{i,j}$ denotes the actual distance, given in equation (6):
$$
d_{i,j} =
\begin{cases}
1, & \text{if } y_i \neq y_j,\\
0, & \text{otherwise.}
\end{cases}
\tag{6}
$$
The objective of training the DSN is to minimize the discrepancy between $d_{i,j}$ and $\hat{d}_{i,j}$.
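Equations (3) to (6) can be tied together in a short NumPy sketch. For brevity, the shared encoder here is a stack of dense sigmoid layers exactly as written in equations (3) and (4), not the full CNN-LSTM; the layer sizes and random weights are hypothetical. The key siamese property is that `encode` is called with the same weights for both inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(psi, weights, biases):
    """Eqs. (3)-(4): stacked sigmoid layers; the SAME weights encode both inputs."""
    e = psi
    for w, b in zip(weights, biases):
        e = sigmoid(w @ e + b)
    return e

def pair_label(y_i, y_j):
    """Eq. (6): 1 for a dissimilar pair (different classes), 0 otherwise."""
    return 1.0 if y_i != y_j else 0.0

def pair_loss(d, d_hat):
    """Eq. (5): dissimilar pairs are pushed to distance >= 1, similar pairs are pulled together."""
    return d * max(0.0, 1.0 - d_hat) + (1.0 - d) * d_hat

rng = np.random.default_rng(0)
sizes = [7, 16, 4]                                   # hypothetical layer sizes
weights = [rng.normal(0, 0.5, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

psi_i, y_i = rng.random(7), 0                        # e.g. one honest week
psi_j, y_j = rng.random(7), 1                        # e.g. one fraudulent week
E_i, E_j = encode(psi_i, weights, biases), encode(psi_j, weights, biases)
d_hat = float(np.linalg.norm(E_i - E_j))             # Euclidean distance of the embeddings
print(pair_loss(pair_label(y_i, y_j), d_hat))
```

In training, this per-pair loss would be averaged over many pairs and minimized with respect to the shared weights, so that embeddings of same-class consumers move together and those of different classes move apart.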
In this section, the simulations are performed in order to
compare the performance of the proposed model with the
benchmark schemes.
A. Simulation setup
1) Dataset acquisition: The dataset is acquired from the State Grid Corporation of China (SGCC), the largest power supply company in China, and is publicly available. Daily consumption records are available for 1035 days, i.e., from January 1, 2014 to October 31, 2016. According to the ground truth of the dataset, 9% of the consumers are labeled as electricity thieves, which is a high ratio.
Fig. 4: Performance of CNN-LSTM model (accuracy versus number of epochs; training curve).
Fig. 5: ROC-AUC and PR curve of CNN-LSTM model (ROC: train AUC = 0.75, test AUC = 0.73; PR curve of precision versus recall: train PR = 0.82, test PR = 0.52).
Fig. 6: Performance of DSN (accuracy versus number of epochs; training curve).
Fig. 7: ROC-AUC and PR curve of DSN (ROC: train AUC = 0.99, test AUC = 0.93; PR curve of precision versus recall: train PR = 0.99, test PR = 0.92).
2) Performance metrics: To detect NTL from the pool of electricity consumption profiles, true positives (T+) and true negatives (T−) count the correctly classified instances. In contrast, false negatives (F−) and false positives (F+) reflect the opposite scenario: F− is the number of fraudulent consumers misclassified as honest, and F+ is the number of honest consumers misclassified as fraudulent. The objective of accurate NTL detection is to reduce F+ while maximizing T+. Other classification performance metrics are recall, precision, specificity, F1-score, accuracy, mAP and the AUC of the receiver operating characteristic (ROC) curve; recall, precision, specificity, F1-score and accuracy are given by equations (7)-(11), taken from [13].
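$$
\text{Recall} = \frac{T^{+}}{T^{+} + F^{-}},
\tag{7}
$$
$$
\text{Precision} = \frac{T^{+}}{T^{+} + F^{+}},
\tag{8}
$$
$$
\text{Specificity} = \frac{T^{-}}{T^{-} + F^{+}},
\tag{9}
$$
$$
\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}},
\tag{10}
$$
$$
\text{Accuracy} = \frac{T^{+} + T^{-}}{T^{+} + T^{-} + F^{+} + F^{-}}.
\tag{11}
$$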
Although accuracy and recall are widely used as performance metrics in the literature, they are inadequate in the case of an imbalanced class distribution, as shown in Table I. Similarly, precision, specificity and F1-Score do not give an accurate picture and are not reliable when used individually.
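The inadequacy of accuracy can be checked with equations (7)-(11) on a hypothetical test set of 1000 consumers with the 9% theft ratio of the SGCC data: a "classifier" that labels every consumer as honest reaches an accuracy of 0.91 while its recall is 0. The sketch returns 0.0 for any ratio whose denominator is zero (e.g., precision when nothing is flagged), a convention assumed here.

```python
def metrics(tp, tn, fp, fn):
    """Eqs. (7)-(11) from confusion-matrix counts (0.0 where a ratio is undefined)."""
    def ratio(a, b):
        return a / b if b else 0.0
    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }

# Hypothetical "everyone is honest" classifier on 910 honest consumers and 90 thieves
print(metrics(tp=0, tn=910, fp=0, fn=90))   # accuracy is 0.91, yet recall is 0.0
```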
To detect NTL without loss of information, reliable performance metrics must be selected [13]. Performance metrics such as mAP and AUC are applied in this work because they are better suited to imbalanced data. As mentioned in [1], [2], the ROC curve and mAP are the best performance metrics for detecting suspects under an imbalanced class distribution.
The ROC curve plots the T+ rate against the F+ rate and is used to evaluate the performance of a classifier. The area under the ROC curve is called the AUC; it measures how well the classifier separates the distribution of the fraudulent class from that of the honest class. Both axes of the ROC curve range from 0 to 1. The ideal situation arises when the score distributions of the two classes do not overlap at all. An AUC approaching 1 demonstrates the validity of the classifier, while an AUC of 0.5 or below indicates that the classifier does not discriminate the classes better than chance [1], [2]. AUC is calculated using equation (12), taken from [1]:
$$
\text{AUC} = \frac{\sum_{i \in S} R_i - \frac{|S|\,(|S| + 1)}{2}}{|S| \times |H|},
\tag{12}
$$
where $R_i$ denotes the rank of fraudulent consumer $i$ when all consumers are sorted by suspicion degree in ascending order, while $|S|$ and $|H|$ are the cardinalities of the sets of suspicious and honest consumers, respectively.
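Equation (12) can be checked numerically with the sketch below, which ranks all consumers by suspicion score in ascending order (tied scores share their average rank, an assumption the text does not address) and then applies the formula. The result equals the fraction of (fraudulent, honest) pairs in which the fraudulent consumer receives the higher score.

```python
import numpy as np

def average_ranks(scores):
    """1-based ranks in ascending order; tied scores share their average rank."""
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0   # average of positions i..j, 1-based
        i = j + 1
    return ranks

def rank_auc(scores, is_thief):
    """Eq. (12): AUC from the ranks of the suspicious (fraudulent) consumers."""
    scores = np.asarray(scores, dtype=float)
    is_thief = np.asarray(is_thief, dtype=bool)
    R = average_ranks(scores)
    S, H = is_thief.sum(), (~is_thief).sum()
    return (R[is_thief].sum() - S * (S + 1) / 2.0) / (S * H)

# Thieves hold ranks 4 and 2, so AUC = (6 - 3) / (2 * 2)
print(rank_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))   # 0.75
```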
The second performance metric used in this paper is mAP, the mean of the precision values obtained at each position in the ranked list where a fraudulent consumer appears. It is an information-retrieval metric that remains informative when the metrics of equations (7)-(11) fail on imbalanced data.
Let $k$ denote a position in the list of consumers ranked by suspicion degree, and let $y_k$ be the number of fraudulent consumers among the top $k$. The precision at $k$ is then defined as $P@k = y_k / k$. The mAP over the top $N$ ranked consumers is calculated using equation (13), taken from [1]:
$$\text{mAP@}N = \frac{\sum_{i=1}^{r} P@k_i}{r} \tag{13}$$
where $r$ is the number of fraudulent consumers among the top $N$ ranked consumers and $k_i$ is the position of the $i$-th of them. The value of $N$ is 100 in our scenario.
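A minimal sketch of $P@k$ and equation (13) follows. It assumes the input is a list of ground-truth labels (1 = fraudulent, 0 = honest) already sorted by descending suspicion degree, and it returns 0 when no fraudulent consumer appears in the top $N$ ($r = 0$), a convention the text does not specify.

```python
def precision_at_k(ranked_labels, k):
    """P@k = y_k / k, where y_k counts fraudulent consumers (label 1) in the top k."""
    return sum(ranked_labels[:k]) / k


def map_at_n(ranked_labels, n=100):
    """Equation (13): mean of P@k_i over the positions k_i of the r fraudulent
    consumers found among the top-n ranked consumers."""
    top = ranked_labels[:n]
    hit_positions = [k for k, label in enumerate(top, start=1) if label == 1]
    if not hit_positions:  # r = 0: no fraudulent consumer in the top n
        return 0.0
    return sum(precision_at_k(top, k) for k in hit_positions) / len(hit_positions)


# Hypothetical ranking: fraudulent consumers at positions 1, 3 and 4 of the top 5.
score = map_at_n([1, 0, 1, 1, 0], n=5)  # (1 + 2/3 + 3/4) / 3
```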
B. Effect of imbalanced distribution on performance metrics
In an imbalanced class problem, one class significantly outnumbers the other, which results in the suppression of the minority class. The effect of the imbalance on both the less informative and the more informative performance metrics can be seen in Table I.
Table I compares the performance of DSN with and without handling the imbalanced class distribution. The performance is worst when the imbalance is not handled, especially in terms of recall, meaning that many fraudulent instances are misclassified as fair. Likewise, ADASYN outperforms random undersampling (RUS). The low performance of RUS is due to discarding majority-class samples that carry decisive information.
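To make the contrast in Table I concrete, here is a minimal pure-Python sketch of both resampling strategies. `random_undersample` discards majority samples, whereas `adasyn` follows the adaptive synthetic sampling procedure of He et al. [28]: minority samples surrounded by more majority neighbours receive more synthetic samples, each interpolated towards a randomly chosen minority neighbour. The function names, the defaults (k = 5, beta = 1 for full balancing), the per-sample rounding and the toy data are assumptions for illustration; this is not the implementation used in this paper.

```python
import math
import random


def random_undersample(X, y, minority=1, seed=0):
    """RUS: keep every minority sample and an equal-sized random subset of the majority."""
    rng = random.Random(seed)
    min_idx = [i for i, t in enumerate(y) if t == minority]
    maj_idx = [i for i, t in enumerate(y) if t != minority]
    keep = min_idx + rng.sample(maj_idx, len(min_idx))
    return [X[i] for i in keep], [y[i] for i in keep]


def _nearest(point, candidates, k, skip=None):
    """Indices of the k candidates closest to point (Euclidean), excluding index skip."""
    order = sorted((math.dist(point, c), j)
                   for j, c in enumerate(candidates) if j != skip)
    return [j for _, j in order[:k]]


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples and their labels, generated ADASYN-style."""
    rng = random.Random(seed)
    min_idx = [i for i, t in enumerate(y) if t == minority]
    X_min = [X[i] for i in min_idx]
    # G: total synthetic samples to create; beta = 1 targets a balanced dataset.
    G = round((len(X) - 2 * len(X_min)) * beta)
    # r_i: share of majority samples among the k nearest neighbours of x_i.
    ratios = []
    for i in min_idx:
        nn = _nearest(X[i], X, k, skip=i)
        ratios.append(sum(y[j] != minority for j in nn) / k)
    total = sum(ratios)
    if G <= 0 or total == 0:
        return [], []
    synthetic = []
    for pos, i in enumerate(min_idx):
        g_i = round(ratios[pos] / total * G)  # harder samples get more points
        nn_min = _nearest(X[i], X_min, k, skip=pos)  # minority-only neighbours
        if not nn_min:
            continue
        for _ in range(g_i):
            x_z = X_min[rng.choice(nn_min)]
            lam = rng.random()  # random point on the segment from x_i to x_z
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], x_z)])
    return synthetic, [minority] * len(synthetic)


# Toy 2-D data: 8 majority (0) and 4 minority (1) samples.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 2], [2, 0], [2, 2], [1, 2],
     [1.5, 1.5], [3, 3], [3.5, 3], [3, 3.5]]
y = [0] * 8 + [1] * 4
X_rus, y_rus = random_undersample(X, y)       # 4 majority samples are thrown away
X_syn, y_syn = adasyn(X, y, k=3)              # new minority points are added instead
```

RUS shrinks the training set to twice the minority size, while ADASYN keeps every original sample and concentrates new points near the class boundary, where the minority sample at (1.5, 1.5) is surrounded by majority neighbours.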
Relying on accuracy as the performance metric can mask a low T+ rate and a high F+ rate. Table I illustrates this: without handling the imbalance, the accuracy (0.7065) is higher than with RUS (0.6519), yet AUC and mAP are the lowest of all three settings. Therefore, high accuracy does not guarantee correct classification of instances in skewed distributions.
TABLE I: Significance of handling imbalanced class
Performance metric Imbalanced class (not handled) RUS ADASYN
mAP 0.5952 0.5997 0.8988
AUC 0.6270 0.6520 0.9250
F1-Score 0.6467 0.5500 0.9249
Accuracy 0.7065 0.6519 0.9241
Precision 0.7524 0.6500 0.9153
Recall 0.3771 0.6300 0.9347
C. Comparative analysis
The proposed model is compared with the following baseline methods for validation.
1) Support vector machine: SVM is an elegant technique used for both classification and regression tasks. It separates the classes by a maximum-margin hyperplane whose construction depends entirely on the selected support vectors. Table II shows the optimal hyperparameters obtained through grid search: the regularization parameter C is 0.001, with the radial basis function (RBF) as the kernel.
TABLE II: SVM hyperparameters’ selection
Hyperparameter Values range Optimal value
C 0.001, 0.01, 0.1, 1, 10, 100 0.001
Kernel Linear, RBF RBF
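Grid search, used here to select the SVM hyperparameters of Table II (and the LR hyperparameters of Table III), evaluates every combination in the search space and keeps the best one. The sketch below assumes a caller-supplied `score_fn`, for example the validation AUC of a model trained with the given parameters. The `toy_score` scorer in the usage lines is a hypothetical stand-in and does not reproduce the results of this paper.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # higher is better, e.g. validation AUC
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Search space of Table II.
svm_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100], "kernel": ["linear", "rbf"]}


def toy_score(params):
    # Hypothetical scorer for illustration only; replace with real validation.
    return -abs(params["C"] - 1) + (0.1 if params["kernel"] == "rbf" else 0.0)


best, score = grid_search(svm_grid, toy_score)
print(best)  # {'C': 1, 'kernel': 'rbf'}
```

In practice, `score_fn` would typically average a metric such as AUC over cross-validation folds so that the selected hyperparameters generalize.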
2) Logistic regression: LR is a simple and elegant technique used for binary classification. It separates the two classes by a hyperplane h = (w, b), where w and b denote the normal vector and the intercept of the hyperplane, respectively. Finding the optimal w and b yields a hyperplane that accurately separates the two classes. LR performs the same operations as a neural network (NN) with a single output neuron: the input features are multiplied by the trained weights and passed through an activation function.
For each observation, the distance to the hyperplane, $d_i = \frac{w^{T} x_i}{\|w\|}$, is used to find the margin, where $w$ and $x_i$ denote the weight vector and the corresponding input feature vector, respectively. Since $w$ is normal to the hyperplane and is assumed to be a unit vector, $\|w\| = 1$. The weighted input features are then passed through the sigmoid function $f(d) = \frac{1}{1 + e^{-d}}$. The LIBLINEAR library is used to train the classifier and find the optimal weights by minimizing the logarithmic loss function [35].
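The sketch below illustrates the pieces just described: the weighted sum of the input features, the sigmoid $f(d)$, and minimization of the logarithmic loss with an L2 penalty. It uses plain batch gradient descent rather than the solvers in LIBLINEAR [35], adds an intercept $b$ to $w^{T} x$, and its learning rate, epoch count and toy data are assumptions chosen for illustration.

```python
import math


def sigmoid(d):
    """Logistic function f(d) = 1 / (1 + e^{-d}), written to avoid overflow."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)
    return z / (1.0 + z)


def train_logistic_regression(X, y, C=1.0, lr=0.5, epochs=2000):
    """Batch gradient descent on the L2-regularized log loss.

    Minimizes (1 / (C * n)) * (0.5 * ||w||^2 + C * sum_i logloss_i), which has
    the same minimizer as 0.5 * ||w||^2 + C * sum_i logloss_i.
    Labels are 0/1; the intercept b is not regularized.
    """
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            # derivative of the log loss with respect to d = w.x + b
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(m):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (wj / (C * n) + gj / n) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that consumer features x belong to the positive (fraud) class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy 1-D data: small values labelled 0 (fair), large values labelled 1 (fraud).
w, b = train_logistic_regression([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
```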
The hyperparameters of LR obtained through grid search are given in Table III, where C is the inverse regularization strength used to control overfitting and R is the type of regularization. The best results are achieved with C = 0.01 and L2-norm regularization. Careful selection of such hyperparameters is essential for model performance [36].
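For reference, the L2- and L1-regularized logistic regression problems solved by LIBLINEAR [35] can be written as follows, with labels $y_i \in \{-1, +1\}$ and $l$ training instances. C weights the loss term against the regularization term, so a smaller C means stronger regularization.

$$\min_{w}\ \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \log\left(1 + e^{-y_i w^{T} x_i}\right) \quad \text{(L2-norm)}$$
$$\min_{w}\ \|w\|_{1} + C \sum_{i=1}^{l} \log\left(1 + e^{-y_i w^{T} x_i}\right) \quad \text{(L1-norm)}$$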
TABLE III: LR hyperparameters’ selection
Hyperparameter Values range Optimal value
C 0.001, 0.01, 0.1, 10, 100 0.01
R L1norm, L2norm L2norm
3) Random forest: RF is an ensemble model with the decision tree (DT) as its base classifier. To make better predictions, RF combines multiple DTs built using bootstrapping and feature sampling. Because both the bootstrap samples and the feature subsets are drawn randomly, each tree is a different model. The essence of RF is that it reduces variance efficiently. The samples drawn by bootstrapping for a tree are called in-bag samples (ibs), while the remaining samples are known as out-of-bag (oob) samples. The ibs are used to train the classifier, while the oob samples are used to validate the model. Table IV shows the best generalized hyperparameters of RF.
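The in-bag/out-of-bag split and feature sampling described above can be sketched as follows. Drawing $n$ indices with replacement leaves roughly $(1 - 1/n)^n \approx e^{-1} \approx 36.8\%$ of the samples out-of-bag for large $n$. The sketch covers only the sampling steps, not tree construction; standard random forests typically redraw the feature subset at every split, and the sizes used here are arbitrary assumptions.

```python
import random


def bootstrap_split(n_samples, rng):
    """One bootstrap draw: indices sampled with replacement are in-bag (ibs);
    indices never drawn are out-of-bag (oob)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob


def feature_subset(n_features, max_features, rng):
    """Random subset of feature indices (random-forest feature sampling)."""
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(42)
in_bag, oob = bootstrap_split(1000, rng)
print(len(oob) / 1000)  # roughly 0.37
print(feature_subset(n_features=7, max_features=3, rng=rng))
```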
TABLE IV: RF hyperparameters’ selection
Hyperparameter Values range Optimal value
Number of decision trees 800, 1200, 1600, 2000 1200
Maximum depth 10, 15, 20, 25 20
Minimum sample splits 5, 10, 15, 20 15
Minimum sample leaves 4, 8, 12, 16 16
4) CNN-LSTM: For comparative analysis, CNN and LSTM are integrated to extract features and analyze the time series data [37]. The hyperparameters are given in Table V. The performance of the CNN-LSTM model without DSN is shown in Fig. 4 and Fig. 5. Although CNN extracts the features and LSTM preserves the sequence information, this hybrid model still fails to provide efficient results due to the lack of discrimination between honest and fraudulent consumers.
TABLE V: CNN-LSTM hyperparameters’ selection
Hyperparameter Optimal value
Number of neurons 64
Number of CNN layers 6
Number of LSTM layers 4
Number of filters 10
Stride 1
Dropout 0.1
Dense layer 128
Activation function LeakyReLU
5) Wide and deep CNN: To capture both the wide and the deep information in time series data for NTL detection, a wide and deep CNN (WD-CNN) is proposed in [1]. The wide component takes the daily consumption (1-D data) as input, while the deep component analyzes the weekly consumption profile, represented as 2-D data. The rectified linear unit (ReLU), which passes positive values and outputs zero otherwise, is used as the activation function. AUC and mAP are used to measure the performance of the model, and the hyperparameters used to train it are the same as in [1].
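Both WD-CNN and the CNN component of the proposed model work on a 2-D weekly view of the consumption series. A minimal sketch of that reshaping, assuming a plain list of daily readings, is shown below; dropping the trailing days that do not complete a week is an assumption, and padding would be an alternative.

```python
def to_weekly_matrix(daily, days_per_week=7):
    """Reshape 1-D daily readings into a 2-D (weeks x days) matrix.

    Assumption: trailing days that do not complete a week are dropped.
    """
    n_weeks = len(daily) // days_per_week
    return [daily[w * days_per_week:(w + 1) * days_per_week]
            for w in range(n_weeks)]


daily_kwh = [float(d) for d in range(15)]  # hypothetical 15 days of readings
weekly = to_weekly_matrix(daily_kwh)       # the wide branch would use daily_kwh directly
print(len(weekly), len(weekly[0]))  # 2 7
```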
6) Results and discussion: Table VI gives an overview of the performance metrics of each classifier for training ratios of 60%, 70% and 80%; the role of each metric in ETD is discussed earlier in this section. The results of the traditional classifiers, i.e., LR, SVM and RF, show an increasing trend: their performance improves as the number of training instances increases. In contrast, the performance of the deep networks depends on the selection of hyperparameters as well as on the training ratio. Moreover, Table VI shows that the proposed model performs well with both smaller and larger training sets. The proposed model's performance is also visualized in Fig. 6 and Fig. 7.
In this paper, electricity theft in the SG is detected using a dataset obtained through AMI. A novel theft detection method is introduced via the joint integration of CNN-LSTM and DSN. The CNN component handles the weekly 2-D electricity consumption profile and helps the model generalize efficiently, whereas the LSTM module memorizes the daily 1-D sequential electricity consumption data. The DSN then makes the final judgment on the features provided by the shared feature extractor and discriminates the deviating patterns of fraudulent consumers from those of fair consumers. The analysis is performed on high-resolution time series data provided by SGCC. The simulation results show that DSN achieves a high ETD rate, with AUC and mAP of 0.93 and 0.90, respectively. Its comparison with the benchmark methods, i.e., LR, SVM, RF, CNN-LSTM and WD-CNN, shows that it achieves the highest values of all performance metrics: precision, recall, mAP, accuracy, AUC and F1-score. It retains this advantage for all three training ratios: 60%, 70% and 80%.
[1] Zheng, Z., Yang, Y., Niu, X., Dai, H.N. and Zhou, Y., 2017. Wide
and deep convolutional neural networks for electricity-theft detection to
secure smart grids. IEEE Transactions on Industrial Informatics, 14(4),
[2] Zheng, K., Chen, Q., Wang, Y., Kang, C. and Xia, Q., 2018. A novel
combined data-driven approach for electricity theft detection. IEEE
Transactions on Industrial Informatics, 15(3), pp.1809-1819.
[3] Khan, A. and Javaid, N., 2020. Jaya Learning-Based Optimization for Optimal Sizing of Stand-Alone Photovoltaic, Wind Turbine, and Battery Systems. Engineering, pp.1-21.
[4] Ahmad, A., Javaid, N., Guizani, M., Alrajeh, N.A. and Khan, Z.A., 2017. An Accurate and Fast Converging Short-Term Load Forecasting Model for Industrial Applications in a Smart Grid. IEEE Transactions on Industrial Informatics, 13(5), pp.2587-2596.
[5] Mujeeb, S. and Javaid, N., 2019. ESAENARX and DE-RELM: Novel Schemes for Big Data Predictive Analytics of Electricity Load and Price. Sustainable Cities and Society, 51, p.101642.
[6] Jokar, P., Arianpoo, N. and Leung, V.C., 2015. Electricity theft detection
in AMI using customers’ consumption patterns. IEEE Transactions on
Smart Grid, 7(1), pp.216-226.
[7] Saeed, M.S., Mustafa, M.W., Sheikh, U.U., Jumani, T.A. and Mirjat,
N.H., 2019. Ensemble Bagged Tree Based Classification for Reducing
Non-Technical Losses in Multan Electric Power Company of Pakistan.
Electronics, 8(8), pp.860-876.
[8] Singh, S.K., Bose, R. and Joshi, A., 2019. Energy theft detection for
AMI using principal component analysis based reconstructed data. IET
Cyber-Physical Systems: Theory & Applications, 4(2), pp.179-185.
[9] Mohamad, A.M. and Mohamed, Y.A.R.I., 2019. Investigation and Assessment
of Stabilization Solutions for DC Microgrid With Dynamic
Loads. IEEE Transactions on Smart Grid, 10(5), pp.5735-5747.
[10] Martins, A.V., Bacurau, R.M., dos Santos, A.D. and Ferreira, E.C., 2019.
Non-Intrusive Energy Meter for Non-Technical Losses Identification.
IEEE Transactions on Instrumentation and Measurement, pp.1-8.
[11] Amin, S., Schwartz, G.A., Cardenas, A.A. and Sastry, S.S., 2015. Game-
theoretic models of electricity theft detection in smart utility networks:
Providing new capabilities with advanced metering infrastructure. IEEE
Control Systems Magazine, 35(1), pp.66-81.
[12] Ahmad, T., Chen, H., Wang, J. and Guo, Y., 2018. Review of various
modeling techniques for the detection of electricity theft in smart grid
environment. Renewable and Sustainable Energy Reviews, 82, pp.2916-
TABLE VI: Comparative analysis of DSN with benchmark schemes
Method Training ratio Performance metric values
LR 60% 0.710 0.710 0.680 0.700 0.645 0.702
LR 70% 0.725 0.740 0.715 0.720 0.640 0.716
LR 80% 0.730 0.725 0.725 0.730 0.668 0.720
SVM 60% 0.675 0.670 0.680 0.676 0.6140 0.677
SVM 70% 0.685 0.675 0.675 0.680 0.619 0.684
SVM 80% 0.680 0.680 0.680 0.680 0.628 0.688
RF 60% 0.700 0.550 0.551 0.710 0.687 0.706
RF 70% 0.740 0.730 0.735 0.740 0.652 0.735
RF 80% 0.750 0.750 0.750 0.750 0.681 0.749
CNN-LSTM 60% 0.664 0.615 0.661 0.836 0.638 0.666
CNN-LSTM 70% 0.629 0.662 0.636 0.839 0.641 0.670
CNN-LSTM 80% 0.670 0.69 0.676 0.832 0.66 0.73
WD-CNN 60% 0.640 0.691 0.651 0.820 0.669 0.689
WD-CNN 70% 0.624 0.720 0.770 0.770 0.689 0.718
WD-CNN 80% 0.661 0.760 0.685 0.840 0.711 0.756
DSN 60% 0.875 0.839 0.857 0.839 0.814 0.860
DSN 70% 0.840 0.850 0.845 0.844 0.819 0.844
DSN 80% 0.912 0.923 0.928 0.953 0.900 0.934
[13] Avila, N.F., Figueroa, G. and Chu, C.C., 2018. NTL Detection in
Electric Distribution Systems Using the Maximal Overlap Discrete
Wavelet-Packet Transform and Random Undersampling Boosting. IEEE
Transactions on Power Systems, 33(6), pp.7171-7180.
[14] Li, W., Logenthiran, T., Phan, V.T. and Woo, W.L., 2019. A novel smart
energy theft system (SETS) for IoT-based smart home. IEEE Internet of
Things Journal, 6(3), pp.5531-5539.
[15] Kumar, P., Lin, Y., Bai, G., Paverd, A., Dong, J.S. and Martin, A.,
2019. Smart grid metering networks: A survey on security, privacy and
open research issues. IEEE Communications Surveys & Tutorials, 21(3),
[16] Viegas, J.L., Esteves, P.R., Melicio, R., Mendes, V.M.F. and Vieira,
S.M., 2017. Solutions for detection of non-technical losses in the
electricity grid: A review. Renewable and Sustainable Energy Reviews,
80, pp.1256-1268.
[17] Abbasinezhad-Mood, D. and Nikooghadam, M., 2018. Efficient anonymous
password-authenticated key exchange protocol to read isolated
smart meters by utilization of extended Chebyshev chaotic maps. IEEE
Transactions on Industrial Informatics, 14(11), pp.4815-4828.
[18] Abbasinezhad-Mood, D. and Nikooghadam, M., 2018. Design and hardware
implementation of a security-enhanced elliptic curve cryptography
based lightweight authentication scheme for smart grid communications.
Future Generation Computer Systems, 84, pp.47-57.
[19] Saeed, Muhammad Salman, Mohd Wazir Mustafa, Nawaf N. Hamadneh,
Nawa A. Alshammari, Usman Ullah Sheikh, Touqeer Ahmed Jumani,
Saifulnizam Bin Abd Khalid, and Ilyas Khan. "Detection of Non-
Technical Losses in Power Utilities—A Comprehensive Systematic
Review." Energies 13, no. 18 (2020): 4727.
[20] Viegas, J.L., Esteves, P.R. and Vieira, S.M., 2018. Clustering-based
novelty detection for identification of non-technical losses. International
Journal of Electrical Power & Energy Systems, 101, pp.301-310.
[21] Punmiya, R. and Choe, S., 2019. Energy theft detection using gradient
boosting theft detector with feature engineering-based preprocessing.
IEEE Transactions on Smart Grid, 10(2), pp.2326-2329.
[22] Buzau, M.M., Tejedor-Aguilera, J., Cruz-Romero, P. and Gomez-
Exposito, A., 2019. Hybrid deep neural networks for detection of non-
technical losses in electricity smart meters. IEEE Transactions on Power
Systems, pp.1-10.
[23] Spiric, J.V., Stankovic, S.S. and Docic, M.B., 2018. Identification
of suspicious electricity customers. International Journal of Electrical
Power & Energy Systems, 95, pp.635-643.
[24] Razavi, R., Gharipour, A., Fleury, M. and Akpan, I.J., 2019. A practical
feature-engineering framework for electricity theft detection in smart
grids. Applied energy, 238, pp.481-494.
[25] Ramos, C.C., Rodrigues, D., de Souza, A.N. and Papa, J.P., 2016. On the
study of commercial losses in Brazil: a binary black hole algorithm for
theft characterization. IEEE Transactions on Smart Grid, 9(2), pp.676-
[26] Khan, S.H., Bennamoun, M., Sohel, F., Togneri, R. and Naseem, I., 2016.
Integrating geometrical context for semantic labeling of indoor scenes
using rgb images. International Journal of Computer Vision, 117(1),
[27] Razavi-Far, R., Farajzadeh-Zanjani, M., Wang, B., Saif, M. and
Chakrabarti, S., 2019. Imputation-based Ensemble Techniques for Class
Imbalance Learning. IEEE Transactions on Knowledge and Data Engineering, pp.1-14.
[28] He, H., Bai, Y., Garcia, E.A. and Li, S., 2008, June. ADASYN: Adaptive
synthetic sampling approach for imbalanced learning. In 2008 IEEE
international joint conference on neural networks (IEEE world congress
on computational intelligence) (pp. 1322-1328). IEEE.
[29] Khalid, R., Javaid, N., Al-zahrani, F.A., Aurangzeb, K., Qazi, E.U.H. and Ashfaq, T., 2020. Electricity Load and Price Forecasting Using Jaya-Long Short Term Memory (JLSTM) in Smart Grids. Entropy, 22(1), p.10.
[30] Adil, M., Javaid, N., Qasim, U., Ullah, I., Shafiq, M. and Choi, J.G., 2020. LSTM and Bat-Based RUSBoost Approach for Electricity Theft Detection. Applied Sciences, 10(12), p.4378.
[31] Greff, K., Srivastava, R.K., Koutnik, J., Steunebrink, B.R. and Schmidhuber, J., 2016. LSTM: A search space odyssey. IEEE transactions on
neural networks and learning systems, 28(10), pp.2222-2232.
[32] Hu, T., Guo, Q., Shen, X., Sun, H., Wu, R. and Xi, H., 2019. Utilizing
unlabeled data to detect electricity fraud in AMI: A semisupervised deep
learning approach. IEEE transactions on neural networks and learning
systems, 30(11), pp.3287-3299.
[33] Wang, M., Tan, K., Jia, X., Wang, X. and Chen, Y., 2020. A Deep
Siamese Network with Hybrid Convolutional Feature Extraction Module
for Change Detection Based on Multi-sensor Remote Sensing Images.
Remote Sensing, 12(2), p.205. DOI: 10.3390/rs12020205.
[34] Miao, J., Wang, B., Wu, X., Zhang, L., Hu, B. and Zhang, J.Q., 2019,
July. Deep Feature Extraction Based on Siamese Network and Auto-
Encoder for Hyperspectral Image Classification. In IGARSS 2019-2019
IEEE International Geoscience and Remote Sensing Symposium (pp.
397-400). IEEE.
[35] Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R. and Lin, C.J., 2008.
LIBLINEAR: A library for large linear classification. Journal of machine
learning research, 9(Aug), pp.1871-1874.
[36] Khalid, R. and Javaid, N., 2020. A Survey on Hyperparameters Optimization Algorithms of Forecasting Models in Smart Grid. Sustainable Cities and Society, p.102275.
[37] Hasan, M., Toma, R.N., Nahid, A.A., Islam, M.M. and Kim, J.M., 2019.
Electricity Theft Detection in Smart Grid Systems: A CNN-LSTM Based
Approach. Energies, 12(17), pp.3310-3328.
NADEEM JAVAID (S’8, M’11, SM’16) received the bachelor's degree in computer science from Gomal University, Dera Ismail Khan, Pakistan, in 1995, the master's degree in electronics from Quaid-i-Azam University, Islamabad, Pakistan, in 1999, and the Ph.D. degree from the University of Paris-Est, France, in 2010. He is currently an Associate Professor and the Founding Director of the Communications Over Sensors (ComSens) Research Laboratory, Department of Computer Science, COMSATS University Islamabad, Islamabad. He has supervised 126 master's and 20 Ph.D. theses. He has authored over 900 articles in technical journals and international conferences. His research interests include energy optimization in smart/micro grids and in wireless sensor networks, data analytics in smart grids, and blockchain in WSNs. He was a recipient of the Best University Teacher Award from the Higher Education Commission of Pakistan in 2016, and the Research Productivity Award from the Pakistan Council for Science and Technology in 2017. He is also an Associate Editor of IEEE Access, an Editor of the International Journal of Space-Based and Situated Computing, and an Editor of Sustainable Cities and Society.
NAEEM JAN received the B.S. degree in computer science from PMAS, University Institute of Information Technology Rawalpindi, Pakistan, in 2014, and the M.S. degree in computer science, under the supervision of Dr. N. Javaid, from the Department of Computer Science, COMSATS University Islamabad, Islamabad Campus, Pakistan, in 2017. He is currently with the ComSens (Communication over Sensors) Research Laboratory, COMSATS University Islamabad. His research interests include wireless sensor networks, optimization techniques, big data analysis, and the Internet of Things.
MUHAMMAD UMAR JAVED received the bachelor's and master's degrees in electrical engineering from Government College University Lahore, Lahore, Pakistan, in 2014 and 2018, respectively. He is currently pursuing the Ph.D. degree in computer science with the Communications Over Sensors (ComSens) Research Laboratory, COMSATS University Islamabad, Islamabad Campus, under the supervision of Dr. Nadeem Javaid. His research interests include smart grids, electric vehicles, big data analysis and blockchain.
... Given the importance of boosting and DL algorithms, a limited but growing body of literatures [8]- [11] utilized the publicly available SGCC (State Grid Corporation of China) dataset and successfully applied for NTL detection in smart grid. Hussain et al. [8] used a feature engineered based category boosting (CatBoost) algorithm in conjunction with the SMOTETomek sampling algorithm for ETD. ...
... Due to this, the input is limited to a fixed size window and the prediction model cannot capture a descent in the EC data if it occurred before the analysis period. More recent work in [11] utilized a deep siamese network (DSN) to discriminate between honest and dishonest consumers in EC data. The proposed model achieved good prediction results but at the cost of two shortcomings, as compared to the other well performing DL methods [12]. ...
... First, a random noise (Jitter) is added to each input pattern during network training. The addition of noise is attained via the Gaussian Noise layer [9], WD-CNN [10] and DSN [11] due to the reasons as discussed in Section II. ...
... Such data balancing methods are simple to execute; however, they can cause significant data loss, resulting in a reduction in the accuracy of the developed model. In another article [39], the data class distribution was balanced via the use of the ADAptive SYNthesis (ADASYN) based oversampling technique. While the developed approach obtained better generalizing ability, it achieved lower accuracy owing to the underfitting of the developed model. ...
... After feature engineering, choosing a suitable classifier for efficiently separating genuine and fraudulent customers is the next challenge in any supervised ML technique. Nagi et al. [39] used a predictive modelling technique based on support vector machines (SVM) to identify abnormal behaviour of the consumers. The SVM-based ML model was developed using customer load profile data and other characteristics such as creditworthiness rating, meter reading data, and fraudulent activity report to identify abnormal consumer behaviour effectively. ...
... However, the detection hit rate achieved was merely 60% which is significantly very low, particularly when consumers are in the millions. In one of the most recent studies, a deep Siamese network (DSN) coupled with a convolutional neural network (CNN) and long-short term memory (LSTM) was proposed by Javaid et al. [39] to differentiate the characteristics of genuine and dishonest consumers. The authors achieved a reasonable accuracy; however, the precision and recall rates were comparatively lower. ...
Full-text available
Abstract This paper presents a novel, sequentially executed supervised machine learning‐based electric theft detection framework using a Jaya‐optimized combined Kernel and Tree Boosting (KTBoost) classifier. It utilizes the intelligence of the XGBoost algorithm to estimate the missing values in the acquired dataset during the data pre‐processing phase. An oversampling algorithm based on the Robust‐SMOTE technique is utilized to avoid the unbalanced data class distribution issue. Afterward, with the aid of few very significant statistical, temporal, and spectral features extracted from the acquired kWh dataset, the complex underlying data patterns are comprehended to enhance the accuracy and detection rate of the classifier. For effectively classifying the consumers into “Honest” and “Fraudster,” the ensemble machine learning‐based classifier KTBoost, with Jaya algorithm optimized hyperparameters, is utilized. Finally, the developed model is re‐trained using a reduced set of highly important features to minimize the computational resources without compromising the performance of the developed model. The outcome of this study reveals that the proposed theft detection method achieves the highest accuracy (93.38%), precision (95%), and recall (93.18%) among all the studied methods, thus signifying its importance in the studied area of research.
... From [32] and [36], the existing literature is teemed with various oversampling techniques that are employed to handle the problem of class imbalance. In oversampling techniques, the minority class samples are augmented and the proportion of classes is equalized. ...
... : z l ← W l σ (h l ) + b l30 h Bi−LSTM = tanh(z l ) 31 Back propagation:32 U l T (x), W l T (x) and b l T (x) 33 end 34 end 35 Hybrid layer:36 NTL det = σ (W [h 2D−CNN , h Bi−LSTM ] + b)V. EXPERIMENTS AND RESULTSIn this section, the experimental results of the proposed and the existing schemes are presented. ...
Full-text available
In this paper, we present a hybrid deep learning model that is based on a two-dimensional convolutional neural network (2D-CNN) and a bidirectional long short-term memory network (Bi-LSTM)to detect non-technical losses (NTLs) in smart meters. NTLs occur due to the fraudulent use of electricity. The global integration of smart meters has proven to be beneficial for the storage of historical electricity consumption (EC) data. The proposed methodology learns the deep insights from the historical EC data and informs power utilities about the presence of NTLs. However, the effective detection of NTLs faces the problem of class imbalance that occurs due to the rare availability of fraudulent electricity consumers. To solve this issue, an evolutionary bidirectional Wasserstein generative adversarial network (Bi-WGAN) is employed. Bi-WGAN synthesizes the most plausible fraudulent EC samples by integrating an auxiliary encoder module. Besides, the inevitable curse of high dimensional data reduces the generalization ability of classifiers. The proposed hybrid model efficiently handles the highly dynamic data by utilizing its potent feature extracting capabilities. The one-dimensional daily EC data is passed to Bi-LSTM model for capturing the non-malicious changes from consumers’ profiles. Meanwhile, 2D-CNN takes 2D weekly EC data as input to extract the potential features by applying different convolutions and pooling operations. Extensive experiments are conducted on a realistic smart meters dataset to prove the effectiveness of the proposed model. The results show that the proposed model outperforms the state-of-the-art models by achieving area under the curve receiver operating characteristics of 0.97 and precision-recall area under the curve of 0.98, which make it suitable for real-world scenarios.
... In addition, F1-score measures the balance ratio between precision and recall for better evaluation of the model [36]. On the other hand, MCC is another suitable performance metric, which provides the correlation between positive and negative predictions [45]. The formula used to calculate the MCC score is described below [45]. ...
... On the other hand, MCC is another suitable performance metric, which provides the correlation between positive and negative predictions [45]. The formula used to calculate the MCC score is described below [45]. ...
Full-text available
Electricity theft (ET) is an utmost problem for power utilities because it threatens public safety, disturbs the normal working of grid infrastructure and increases revenue losses. In the literature, many machine learning (ML), deep learning (DL) and statistical based models are introduced to detect ET. However, these models do not give optimal results due to the following reasons: curse of dimensionality, class imbalance problem, inappropriate hyper-parameter tuning of ML and DL models, etc. Keeping the aforementioned concerns in view, we introduce a hybrid DL model for the efficient detection of electricity thieves in smart grids. AlexNet is utilized to handle the curse of dimensionality issue while the final classification of energy thieves and normal consumers is performed through adaptive boosting (AdaBoost). Moreover, class imbalance problem is resolved using an undersampling technique, named as near miss. Furthermore, hyper-parameters of AdaBoost and AlexNet are tuned using artificial bee colony optimization algorithm. The real smart meters’ dataset is used to assess the efficacy of the hybrid model. The substantial amount of simulations proves that the hybrid model obtains the highest classification results as compared to its counterparts. Our proposed model obtains 88%, 86%, 84%, 85%, 78% and 91% accuracy, precision, recall, F1-score, Matthew correlation coefficient and area under the curve receiver operating characteristics, respectively.
... Furthermore, recall, area under the curve (AUC), precision, and F1-score metrics are considered the appropriate measures in order to compute the classifiers' performance using the imbalanced data [35]. Based on the cases mentioned above, the accuracy metric is not an appropriate performance measure [48,49]. ...
Full-text available
Electricity theft is one of the challenging problems in smart grids. The power utilities around the globe face huge economic loss due to ET. The traditional electricity theft detection (ETD) models confront several challenges, such as highly imbalance distribution of electricity consumption data, curse of dimensionality and inevitable effects of non-malicious factors. To cope with the aforementioned concerns, this paper presents a novel ETD strategy for smart grids based on theft attacks, long short-term memory (LSTM) and gated recurrent unit (GRU) called TLGRU. It includes three subunits: (1) synthetic theft attacks based data balancing, (2) LSTM based feature extraction, and (3) GRU based theft classification. GRU is used for drift identification. It stores and extracts the long-term dependency in the power consumption data. It is beneficial for drift identification. In this way, a minimum false positive rate (FPR) is obtained. Moreover, dropout regularization and Adam optimizer are added in GRU for tackling overfitting and trapping model in the local minima, respectively. The proposed TLGRU model uses the realistic EC profiles of the Chinese power utility state grid corporation of China for analysis and to solve the ETD problem. From the simulation results, it is exhibited that 1% FPR, 97.96% precision, 91.56% accuracy, and 91.68% area under curve for ETD are obtained by the proposed model. The proposed model outperforms the existing models in terms of ETD.
Non-technical losses (NTLs) are one of the major causes of revenue losses for electric utilities. In the literature, various machine learning (ML)/deep learning (DL) approaches are employed to detect NTLs. The existing studies are mostly concerned with tuning the hyperparameters of ML/DL methods for efficient detection of NTL, i.e., electricity theft detection. Some of them focus on the selection of prominent features from data to improve the performance of electricity theft detection. However, the curse of dimensionality affects the generalization ability of ML/DL classifiers and leads to computational, storage, and overfitting problems. Therefore, to deal with the above-mentioned issues, this study proposes a system based on metaheuristic techniques (artificial bee colony and genetic algorithm) and denoising autoencoder for electricity theft detection using big data in electric power systems. The former (metaheuristics) are used to select prominent features, while the latter is utilized to extract high variance features from electricity consumption data. Firstly, 11 new features are synthesized using statistical and electrical parameters from the user’s consumption history. Then, the synthesized features are used as input to metaheuristic techniques to find a subset of optimal features. Finally, the optimal features are fed as input to the denoising autoencoder to extract features with high variance. The ability of both metaheuristic and autoencoder techniques to select and extract features is measured using a support vector machine. The proposed system reduces the overfitting, storage, and computational overhead of ML classifiers. Moreover, we perform several experiments to verify the effectiveness of our proposed system and results reveal that the proposed system has better performance than its counterparts.
Facial recognition technology has become an increasingly powerful tool for recognizing individual wild animals. In this paper, we develop an automatic detection and recognition method that combines body features of big cats, based on deep convolutional neural networks (CNNs). We collected a dataset of 12244 images of 47 individual Amur tigers (Panthera tigris altaica) at the Siberian Tiger Park, taken with mobile phones and digital cameras, and 1940 images and videos of 12 individual wild Amur leopards (Panthera pardus orientalis), captured by infrared cameras. First, the Single Shot MultiBox Detector (SSD) algorithm automatically detects the feature regions in each image. For the different feature regions, such as face, stripes or spots, CNN and multilayer perceptron (MLP) models are applied independently to identify tiger and leopard individuals. Our results show that the identification accuracy for Amur tigers reaches 93.27% for the front face, 93.33% for the right body stripe and 93.46% for the left body stripe. Furthermore, the combination of right face, left body stripe and right body stripe achieves the highest accuracy, up to 95.55%. Consequently, combining different body parts can improve individual identification accuracy. However, using more body parts does not necessarily yield higher accuracy: the combination model with 3 body parts has the highest accuracy. The identification accuracy for Amur leopards reaches 86.90% for the front face, 89.13% for left body spots and 88.33% for right body spots. For leopards, combining body parts yields lower accuracy than using a single part; combining the face with a body-spot part does not improve identification accuracy, and the most effective identification part is still the left or right body-spot part alone. The method can be applied to long-term monitoring of big cats, including big data analysis of animal behavior, and can help with individual identification of other wildlife species.
The volume of online advertising data is growing rapidly due to the fast development of science and technology. Click-through rate (CTR) prediction has become a critical task in the digital advertising industry and a key element in increasing advertising profits and improving user experience. This article therefore formulates CTR prediction as a sequence classification task. We then propose a novel optimization strategy to solve the high-dimensional problem and find a subset of relevant variables that ensures high model performance and maximizes the number of clicks. Specifically, we introduce a feature selection and hyper-parameter optimization approach using genetic algorithms (GA) and the upper confidence bound (UCB) model to optimize micro-targeting, together with a long short-term memory (LSTM) network-based CTR prediction model. The proposed UCB-LSTM-GA model and two hybrid models, LSTM-GA and LSTM-PSO, are evaluated against each other and against other machine-learning-based classification methods, including LSTM with a UCB algorithm (UCB-LSTM), the High-order Attentive Factorization Machine (HoAFM), a genetic algorithm-artificial neural network (GA-ANN), and a feature interaction graph neural network (Fi-GNN). Our solution achieved accuracy, precision, and recall of up to 87%, 89%, and 92%, respectively, on the real-world Avazu dataset using popular Python tools.
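UCB is a multi-armed bandit rule that balances exploiting the best-looking option with exploring less-tried ones. As an illustration only, the sketch below applies the standard UCB1 rule to choosing among ads with unknown click probabilities; the click probabilities and number of rounds are made up, and how the paper couples UCB with the LSTM and GA is not shown.

```python
import math
import random

def ucb1(click_probs, rounds=10000, seed=0):
    """UCB1 over ads with hidden Bernoulli click probabilities; returns how often each ad was shown."""
    rng = random.Random(seed)
    k = len(click_probs)
    shows = [0] * k
    clicks = [0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # show every ad once before using the bound
        else:
            # empirical CTR plus an exploration bonus that shrinks as an ad is shown more
            arm = max(range(k), key=lambda i: clicks[i] / shows[i]
                      + math.sqrt(2 * math.log(t) / shows[i]))
        shows[arm] += 1
        clicks[arm] += rng.random() < click_probs[arm]  # simulated click (True counts as 1)
    return shows

shows = ucb1([0.05, 0.10, 0.30])
```

Over many rounds most impressions go to the ad with the highest true CTR, while the weaker ads are still shown occasionally.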
Electricity theft detection (ETD) is critical to maintaining cost-efficiency in smart grids. However, existing theft detection methods can struggle to handle large electricity consumption datasets because of missing values, data variance and nonlinear data relationships, and there is a lack of integrated infrastructure for coordinating electricity load data analysis procedures. To help address these problems, a simple yet effective ETD model is developed that combines three modules. The first module combines data imputation, outlier handling, normalization and class balancing algorithms to enhance the time-series characteristics and generate better-quality data for training the classifiers. Next, three different machine learning (ML) methods, which are uncorrelated and skillful on the problem in different ways, are employed as base learners. Finally, a recently developed deep learning approach, the temporal convolutional network (TCN), is used to ensemble the outputs of the ML algorithms for improved classification accuracy. Experimental results confirm that the proposed framework yields highly accurate and robust classification performance compared with other well-established machine and deep learning models, and can thus be a practical tool for electricity theft detection in industrial applications.
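The outlier-handling and normalization steps of the first module can be sketched as below. The abstract does not specify the rules, so this assumes a three-sigma cap on unusually high readings followed by min-max scaling, applied per consumer; imputation, class balancing, the base learners and the TCN ensemble are not shown.

```python
import statistics

def clip_outliers(series, k=3.0):
    """Cap readings above mean + k * std (a three-sigma-style rule; k=3 is an assumption)."""
    cap = statistics.mean(series) + k * statistics.pstdev(series)
    return [min(v, cap) for v in series]

def min_max(series):
    """Scale readings to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]

def preprocess(series):
    return min_max(clip_outliers(series))

readings = [1.0] * 20 + [100.0]  # one meter spike among flat daily readings
scaled = preprocess(readings)
```

Clipping before scaling keeps a single spike from compressing all the ordinary readings toward zero.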
A bi-directional communication network is the foundation of the Smart Distribution Network (SDN), but it also exposes the SDN to more serious communication risks. Most current research addresses this problem with Intrusion Detection Systems (IDSs), but focuses on detection performance while ignoring the real-time requirements, redundant network traffic features, and unbalanced data distribution of SDN communication networks. To address these problems, this paper proposes a feature-engineering-based AutoEncoder (AE)-LightGBM intrusion detection system for SDN. The proposed system first uses Borderline-SMOTE to optimize the data distribution; an AE then performs feature engineering to extract the main features; finally, LightGBM is trained to recognize intrusions using the extracted features. Experimental results on the KDDCup99 and NSL-KDD datasets show that the accuracy, precision, and F1-score of the proposed model are better than those of traditional models and related works, and that it has significant advantages in real-time performance.
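Borderline-SMOTE differs from plain SMOTE in that it only synthesizes samples from minority points whose neighbourhoods are dominated, but not completely occupied, by the majority class. A pure-Python sketch of the Borderline-SMOTE-1 variant on 2-D toy points follows; the neighbourhood sizes m and k and the data are assumptions, and in practice a library implementation (for example imbalanced-learn's BorderlineSMOTE) would be used.

```python
import math
import random

def borderline_smote(minority, majority, n_new, m=5, k=3, seed=0):
    """Borderline-SMOTE-1 sketch: oversample only minority points near the class boundary."""
    rng = random.Random(seed)
    points = minority + majority
    is_major = [False] * len(minority) + [True] * len(majority)
    danger = []
    for i, p in enumerate(minority):
        nearest = sorted((j for j in range(len(points)) if j != i),
                         key=lambda j: math.dist(p, points[j]))[:m]
        n_major = sum(is_major[j] for j in nearest)
        if m / 2 <= n_major < m:  # borderline: mostly majority neighbours, but not all (noise)
            danger.append(i)
    synthetic = []
    if not danger:
        return danger, synthetic
    for _ in range(n_new):
        i = rng.choice(danger)
        base = minority[i]
        peers = sorted((j for j in range(len(minority)) if j != i),
                       key=lambda j: math.dist(base, minority[j]))[:k]
        peer = minority[rng.choice(peers)]
        gap = rng.random()  # interpolate at a random point on the segment base -> peer
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, peer)))
    return danger, synthetic

majority = [(float(x), 0.0) for x in range(10)]
minority = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0),  # near the boundary
            (20.0, 20.0), (21.0, 20.0), (20.0, 21.0), (21.0, 21.0), (20.5, 20.5)]  # safe cluster
danger, new_points = borderline_smote(minority, majority, n_new=6, k=2)
```

Only the three points next to the majority row are flagged as "danger", so all synthetic samples are placed along the class boundary rather than inside the already-safe cluster.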
Electricity theft and fraud in energy consumption have been two of the major issues for power distribution companies (PDCs) for many years. PDCs around the world are trying different methodologies for detecting electricity theft. Traditional methods for non-technical loss (NTL) detection, such as onsite inspection and reward-and-penalty policies, have lost their place in the modern era because they are ineffective and time-consuming. With advances in artificial intelligence (AI), researchers in data mining and AI have proposed newer and more efficient NTL detection methods. AI-based NTL detection methods are superior to conventional methods in terms of accuracy, efficiency, time consumption, precision, and labor required. The importance of these methods is reflected in the growing number of research articles on them. However, the authors noted the lack of a comprehensive study that provides a one-stop source of information on AI-based NTL methods, which motivated this review. This article systematically reviews and classifies the methods explored for NTL detection in recent literature, along with their benefits and limitations. To this end, the selected research articles are classified based on the algorithms used, the features extracted, and the evaluation metrics. Furthermore, a summary of the different types of algorithms used for NTL detection is provided, along with their applications in this field. Lastly, the major NTL detection categories, i.e., data-based, network-based, and hybrid methods, are compared in terms of performance, expense, and response time.
It is expected that this study will serve as a one-stop source of information for new researchers and experts working in this area.
Electrical losses in power systems are divided into non-technical losses (NTLs) and technical losses (TLs). NTL is more harmful than TL because it includes electricity theft, faulty meters and billing errors. It is one of the major concerns in power systems worldwide and incurs huge revenue losses for utility companies. Electricity theft detection (ETD) is the mechanism used by industry and academia to detect electricity theft. However, because of imbalanced data, overfitting and high-dimensional data, ETD cannot be applied efficiently. This paper therefore proposes a solution to address these limitations. A long short-term memory (LSTM) network is applied to detect abnormal patterns in electricity consumption data, together with a random under-sampling boosting (RUSBoost) classifier whose parameters are optimized by the bat algorithm. The proposed system model uses normalization and interpolation to pre-process the electricity data. The pre-processed data are then fed into the LSTM module for feature extraction, and the extracted features are passed to the RUSBoost module for classification. The simulation results show that the proposed solution resolves the issues of data imbalance, overfitting and the handling of massive time-series data. Additionally, the proposed method outperforms state-of-the-art techniques, i.e., support vector machine (SVM), convolutional neural network (CNN) and logistic regression (LR), in a comparative analysis using F1-score, precision, recall and receiver operating characteristic (ROC) curves.
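RUSBoost combines random under-sampling with boosting: before each boosting round, majority-class samples are randomly discarded so the weak learner sees balanced classes. Only that under-sampling step is sketched below, assuming binary labels with 1 for theft; the boosting loop, the LSTM features and the bat-algorithm tuning are omitted.

```python
import random

def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes (labels 0/1) have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = sorted((pos, neg), key=len)
    keep = sorted(minority + rng.sample(majority, len(minority)))  # preserve original row order
    return [X[i] for i in keep], [y[i] for i in keep]

X = [[float(i)] for i in range(13)]
y = [1] * 3 + [0] * 10  # 3 theft users, 10 honest users
X_bal, y_bal = random_undersample(X, y)
```

Unlike oversampling, this discards information, which is why RUSBoost repeats it with a fresh random subset in every boosting round.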
Renewable energy sources (RESs) are considered reliable and green sources of electric power generation. Photovoltaics (PVs) and wind turbines (WTs) are used to provide electricity in remote areas. Optimal sizing of hybrid RESs is a vital challenge in a stand-alone environment. Previously proposed meta-heuristic algorithms depend on algorithm-specific parameters to achieve an optimal solution. This paper proposes a hybrid of the Jaya and teaching-learning-based optimization (TLBO) algorithms, named JLBO, for the optimal unit sizing of a PV-WT-battery hybrid system that satisfies the consumer's load at minimal total annual cost (TAC). System reliability is taken into account through the maximum allowable loss of power supply probability (LPSPmax). The results obtained with JLBO are compared with those of the original Jaya, TLBO, and genetic algorithms. JLBO achieves a lower TAC, and the PV-WT-battery hybrid system is found to be the most economical scenario, providing a cost-effective solution for all considered LPSPmax values compared with PV-battery and WT-battery systems.
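JLBO minimizes TAC subject to the LPSPmax reliability limit, so every candidate sizing needs an LPSP evaluation. A minimal sketch of that check under simple assumptions (hourly energy balance, per-unit PV and wind profiles, a lossless battery that starts full); the helper names are hypothetical, and the cost model and the JLBO search itself are not shown.

```python
def lpsp(pv_kw, wt_kw, battery_kwh, pv_profile, wind_profile, load):
    """Loss of power supply probability = unmet energy / total demand over the horizon.

    Assumptions: hourly steps, per-unit generation profiles, a lossless battery that
    starts full, and no minimum state of charge.
    """
    soc = battery_kwh
    unmet = 0.0
    for pv, wind, demand in zip(pv_profile, wind_profile, load):
        net = pv_kw * pv + wt_kw * wind - demand
        if net >= 0:
            soc = min(battery_kwh, soc + net)  # store surplus up to capacity
        else:
            discharge = min(soc, -net)  # cover the deficit from the battery
            soc -= discharge
            unmet += -net - discharge  # whatever the battery cannot cover is lost load
    return unmet / sum(load)

def feasible(sizing, profiles, lpsp_max):
    """A sizing (pv_kw, wt_kw, battery_kwh) is acceptable if its LPSP does not exceed LPSPmax."""
    return lpsp(*sizing, *profiles) <= lpsp_max
```

For example, with a two-hour horizon where PV covers only the first hour, a battery of 1 kWh lets a 1 kW array meet the whole load, while the same array without storage leaves half the demand unmet.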
Forecasting in the smart grid (SG) plays a vital role in balancing electricity demand and supply, efficient energy management, and better planning, dispatching and scheduling of energy generation units and renewable energy sources. Existing forecasting models are used, and new ones are being developed, for a wide range of SG applications. These algorithms have hyperparameters that must be carefully optimized before forecasting, and well-optimized values can increase forecasting accuracy significantly. In this paper, we present a brief literature review of forecasting models and the optimization methods used to tune their hyperparameters. We also discuss data preprocessing methods. A comparative analysis of these forecasting models according to their hyperparameter optimization, error metrics and preprocessing methods is also presented. In addition, we critically analyze existing optimization and data preprocessing models and highlight the important findings. A survey of existing survey papers is also presented, and a recency score is computed for each based on the number of recent papers it reviews, where "recent" means published in the same year as the survey or in the three preceding years. Finally, future research directions are discussed in detail.
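The recency criterion can be written as a small function. This sketch assumes the score is the fraction of reviewed papers that are recent; the abstract only says the score is based on the number of recent papers, so a raw count would be an equally valid reading.

```python
def recency_score(survey_year, reviewed_years):
    """Share of reviewed papers published in the survey's year or the three years before it.

    Assumption: the score is a ratio; the abstract only says it is based on the number
    of recent papers reviewed.
    """
    recent = sum(1 for year in reviewed_years if survey_year - 3 <= year <= survey_year)
    return recent / len(reviewed_years) if reviewed_years else 0.0

score = recency_score(2020, [2020, 2019, 2017, 2016, 2010])
```

Here 2020, 2019 and 2017 fall inside the window, so three of the five reviewed papers count as recent.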
Information extraction from multi-sensor remote sensing images has attracted increasing attention with the development of remote sensing sensors. In this study, a supervised change detection method for multi-sensor images is proposed, based on a deep Siamese convolutional network with a hybrid convolutional feature extraction module (OB-DSCNH). The proposed architecture, which is based on dilated convolution, can extract deep change features effectively, and its “network in network” character increases the depth and width of the network while keeping the computational budget constant. A change decision model detects changes from the difference between the extracted features. Finally, a change detection map is obtained via an uncertainty analysis that combines multi-resolution segmentation with the output of the Siamese network. To validate the effectiveness of the proposed approach, we conducted experiments on multispectral images collected by the ZY-3 and GF-2 satellites. Experimental results demonstrate that the proposed method achieves comparable or better performance than mainstream methods in multi-sensor image change detection.
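The decision step of a Siamese change detector compares features produced by one shared encoder. A toy sketch, assuming a single linear+ReLU layer in place of the dilated hybrid CNN and a fixed distance threshold in place of the learned change decision model and uncertainty analysis; all names are hypothetical.

```python
import math

def shared_encoder(patch, weights):
    """Toy stand-in for one Siamese branch: a linear layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, patch))) for row in weights]

def change_map(patches_t1, patches_t2, weights, threshold):
    """Flag a patch as changed when its two feature vectors are far apart."""
    changed = []
    for a, b in zip(patches_t1, patches_t2):
        # Both dates pass through the SAME weights: this weight sharing is what makes it Siamese.
        fa, fb = shared_encoder(a, weights), shared_encoder(b, weights)
        changed.append(math.dist(fa, fb) > threshold)
    return changed

weights = [[1.0, 0.0], [0.0, 1.0]]
flags = change_map([[1.0, 1.0], [0.0, 0.0]], [[1.0, 1.0], [5.0, 5.0]], weights, threshold=1.0)
```

Sharing weights guarantees that identical inputs map to identical features, so any feature distance reflects a difference between the two images rather than between two encoders.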
In the smart grid (SG) environment, consumers can alter their electricity consumption patterns in response to electricity prices and incentives, which can cause prices to deviate from the initial price pattern. Electricity price and demand forecasting play a vital role in the reliability and sustainability of the SG. Forecasting using big data has become a hot research topic, as massive amounts of data are generated and stored in the SG environment. Electricity users who know future prices and demand in advance can manage their load efficiently. In this paper, long short-term memory (LSTM), a recurrent neural network (RNN), is used for electricity price and demand forecasting using big data. Researchers are actively proposing new forecasting models, with either a single input variable or multiple variables, and the literature indicates that using multiple variables enhances forecasting accuracy. Hence, our proposed model takes multiple variables as input and forecasts future values of electricity demand and price. The hyperparameters of the model are tuned using the Jaya optimization algorithm to improve its forecasting ability and training. Parameter tuning is necessary because the performance of a forecasting model depends on these values, and inappropriate values can result in inaccurate forecasts; integrating an optimization method therefore improves forecasting accuracy with minimal user effort. For efficient forecasting, the data are cleaned of missing values and outliers using the z-score method and then normalized. The forecasting accuracy of the proposed model is evaluated using the root mean square error (RMSE) and mean absolute error (MAE). For a fair comparison, the proposed model is compared with a univariate LSTM and a support vector machine (SVM).
The performance metrics show that the proposed model is more accurate than SVM and the univariate LSTM.
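Jaya has no algorithm-specific parameters: each candidate x is updated as x' = x + r1(x_best - |x|) - r2(x_worst - |x|), with r1 and r2 drawn uniformly from [0, 1], and the update is kept only if it improves the objective. A minimal sketch of that rule follows; the toy objective stands in for the LSTM's validation error and, like the bounds and population size, is an assumption.

```python
import random

def jaya_minimize(objective, bounds, pop_size=10, iterations=100, seed=0):
    """Jaya: move each candidate toward the best solution and away from the worst."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    initial_best = min(scores)
    for _ in range(iterations):
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for k in range(pop_size):
            candidate = []
            for i, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                x = pop[k][i]
                v = x + r1 * (best[i] - abs(x)) - r2 * (worst[i] - abs(x))
                candidate.append(min(hi, max(lo, v)))  # keep within the search bounds
            score = objective(candidate)
            if score < scores[k]:  # greedy acceptance: keep only improvements
                pop[k], scores[k] = candidate, score
    k_best = scores.index(min(scores))
    return pop[k_best], scores[k_best], initial_best

# Hypothetical stand-in for LSTM validation error as a function of (learning rate, hidden units).
def toy_validation_error(params):
    lr, units = params
    return (lr - 0.01) ** 2 * 1e4 + ((units - 64) / 64) ** 2

params, err, start_err = jaya_minimize(toy_validation_error, [(1e-4, 0.1), (8, 256)])
```

In a real tuning loop the objective would train the LSTM with the candidate hyperparameters (rounding integer ones such as the number of units) and return a validation RMSE or MAE.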
Among an electricity provider's non-technical losses, electricity theft has the most severe and dangerous effects. Fraudulent electricity consumption decreases supply quality, increases generation load, causes legitimate consumers to pay excessive electricity bills, and affects the overall economy. The adoption of smart grids can significantly reduce this loss through data analysis techniques. The smart grid infrastructure generates a massive amount of data, including the power consumption of individual users. Using this data, machine learning and deep learning techniques can accurately identify electricity theft users. In this paper, an electricity theft detection system is proposed based on a combination of a convolutional neural network (CNN) and a long short-term memory (LSTM) architecture. CNN is a widely used technique that automates feature extraction and classification. Since the power consumption signature is time-series data, we built a CNN-based LSTM (CNN-LSTM) model for smart grid data classification. In this work, a novel data pre-processing algorithm was also implemented to impute the missing instances in the dataset based on the local values around each missing data point. Furthermore, the dataset contained relatively few electricity theft users, which could have made the model inefficient at identifying them; this class imbalance was addressed through synthetic data generation. Finally, the results indicate that the proposed scheme can classify both the majority class (normal users) and the minority class (electricity theft users) with good accuracy.
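The paper's local-value imputation algorithm is not detailed in the abstract. One widely used local rule for smart meter data is sketched below as an assumption: a missing reading becomes the mean of its two neighbours when both exist, and 0 otherwise.

```python
import math

def fill_missing(series):
    """Impute NaN readings from their immediate neighbours (an assumed local rule).

    - both neighbours present: average of the previous and next readings
    - otherwise: 0.0
    Neighbours are read from the original series, so imputed values are not reused.
    """
    filled = list(series)
    for i, value in enumerate(series):
        if math.isnan(value):
            prev = series[i - 1] if i > 0 else math.nan
            nxt = series[i + 1] if i + 1 < len(series) else math.nan
            if math.isnan(prev) or math.isnan(nxt):
                filled[i] = 0.0
            else:
                filled[i] = (prev + nxt) / 2
    return filled

nan = math.nan
daily_kwh = fill_missing([1.0, nan, 3.0, nan, nan])
```

The isolated gap is filled with the average 2.0, while the consecutive gaps at the end, which lack a valid neighbour on one side, fall back to 0.0.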
Correct classification of rare samples is a vital data mining task and of paramount importance in many research domains. This paper focuses on developing novel class-imbalance learning techniques that integrate oversampling methods with bagging and boosting ensembles. Two novel oversampling strategies, based on single and multiple imputation, are proposed. They aim to create useful synthetic minority class samples, similar to the original ones, by estimating missing values that are deliberately induced in the minority class samples. The re-balanced datasets are then used to train the base learners of the ensemble algorithms. The proposed techniques are compared with commonly used class-imbalance learning methods in terms of three performance metrics, AUC, F-measure, and G-mean, over several synthetic binary-class datasets. The empirical results show that the proposed multiple imputation-based oversampling combined with bagging significantly outperforms the competing methods.
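A minimal single-imputation variant of the idea: copy a minority sample, "forget" one randomly chosen feature, and re-estimate it from the k nearest minority neighbours measured on the remaining features. The mean-of-neighbours imputer, k=2 and the toy data are assumptions; a multiple-imputation strategy would instead draw several plausible values for each induced gap.

```python
import math
import random

def imputation_oversample(minority, n_new, k=2, seed=0):
    """Create synthetic minority samples by inducing and then imputing one missing value."""
    rng = random.Random(seed)
    d = len(minority[0])
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        sample = list(minority[idx])
        j = rng.randrange(d)  # feature to "forget"
        observed = [f for f in range(d) if f != j]
        others = [m for i, m in enumerate(minority) if i != idx]
        # nearest minority neighbours, measured only on the observed features
        neigh = sorted(others, key=lambda m: math.dist([sample[f] for f in observed],
                                                      [m[f] for f in observed]))[:k]
        sample[j] = sum(m[j] for m in neigh) / len(neigh)  # single (mean) imputation
        synthetic.append(sample)
    return synthetic

minority = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
new_samples = imputation_oversample(minority, n_new=5)
```

Each synthetic sample keeps all but one of its original values, so it stays close to a real minority sample while still adding variation.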
Non-technical losses in electricity utilities are responsible for major revenue losses. In this paper, we propose a novel end-to-end solution that self-learns features for detecting anomalies and fraud in smart meters using a hybrid deep neural network. The network is fed with simple raw data, removing the need for handcrafted feature engineering. The proposed architecture consists of a long short-term memory network and a multilayer perceptron network. The first network analyzes the raw daily energy consumption history, while the second integrates non-sequential data such as the customer's contracted power or geographical information. The results show that the hybrid neural network significantly outperforms state-of-the-art classifiers as well as previous deep learning models used in non-technical loss detection. The model has been trained and tested with real smart meter data from Endesa, the largest electricity utility in Spain.