
An adaptive synthesis to handle imbalanced big data with deep siamese network for electricity theft detection in smart grids



The bi-directional flow of energy and information in the smart grid makes it possible to record and analyze the electricity consumption profiles of consumers. Because of rising inflation over the past few years, people have started looking for ways to use electricity illegally, which is termed electricity theft. Many data analytics techniques have been proposed in the literature for electricity theft detection (ETD). These techniques help in detecting suspected illegal consumers. However, the existing approaches have a low ETD rate due to either improper handling of the imbalanced class problem in the dataset or the selection of an inappropriate classifier. In this paper, a robust big data analytics technique is proposed to resolve these concerns. Firstly, adaptive synthesis (ADASYN) is applied to handle the imbalanced class problem. Secondly, a deep siamese network (DSN) that integrates a convolutional neural network (CNN) and long short-term memory (LSTM) is proposed to discriminate the features of honest and fraudulent consumers. Specifically, feature extraction from weekly energy consumption profiles is handed over to the CNN module, while the LSTM module performs the sequence learning. Finally, the DSN operates on the shared features provided by the CNN-LSTM and makes the final judgment. The data analytics is performed on different train-test ratios of the real-time smart meter data. The simulation results validate the proposed model's effectiveness in terms of high area under the curve, F1-score, precision and recall.
Nadeem Javaid*, Naeem Jan, Muhammad Umar Javed
Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
*Corresponding author: Nadeem Javaid
Index Terms—Big data analytics, imbalanced data, adaptive synthesis, electricity theft detection, deep learning, long short-term memory, convolutional neural network, deep siamese network.
In line with the United Nations' 2030 vision, "electricity for all" is a major objective of all countries. Both developed and developing countries are striving to add the maximum amount of electricity to the national grid. While the power authorities struggle to ensure efficient power distribution to every household, energy theft has become a hurdle in this endeavor. According to one report, electricity theft causes a loss of approximately 100 million Canadian dollars per year, which equals the amount of electricity required to power around 77000 homes for a year [1]. The yearly loss in revenue caused by electricity theft in America is 6U.S. dollars. Similarly, the percentage of electricity lost to theft is 0.5% to 25% in Brazil, 3.5% in the Philippines and up to 1% in the United Kingdom. Each year, the revenue loss due to electricity theft reaches approximately 96 billion U.S. dollars worldwide [2].
With the advancements made in information and communication technology, traditional power grids can now exploit the benefits of bi-directional communication and are known as smart grids (SGs) [3], [4]. The roll-out of advanced metering infrastructure (AMI) in the SG makes it possible to provide real-time, fine-grained measurements to the utilities. The addition of a communication layer to traditional metering establishes a bridge between consumers and the utility [5]. Although the AMI provides numerous benefits, the power systems have become more exposed to cyber attacks due to this extra layer [6]. In contrast, traditional meters are only vulnerable to physical tampering. In this paper, fraud committed by either utilities or feeders is beyond the scope; the focus is on detecting irregularities in the electricity consumption of consumers.
In an SG, the transmission and distribution of power incur both technical losses (TLs) and non-technical losses (NTLs). The former include the dissipation of energy due to the Joule effect, i.e., the heating of conductors as current flows through their resistance. The assessment of TLs is necessary for accounting for NTLs. Electricity theft, the intentional illegal usage of electricity, is a major source of NTLs. These losses represent the energy that is consumed by the consumers but not billed, and they are also known as commercial losses. The main issue concerning NTLs is that they cannot be measured precisely. Only the difference between the amount of energy dispatched by the utilities and the amount billed for the consumed energy is calculated. The reason behind this irregularity is either the illegal use of electricity or the occurrence of technical faults [7]. The irregularity falls under one of two groups: internal fraud and external fraud. The former is committed by employees for financial benefit, while the latter is perpetrated by consumers to reduce their electricity bills. Ultimately, the main goal behind this irregularity is to hide the actual electricity consumption and consequently achieve financial benefits [8].
The vulnerabilities related to NTLs are generally categorized into three classes: physical attacks, cyber attacks and data attacks. The physical attacks include meter tampering, reverse metering, bypassing the meters by direct supply, double-tapping, washing out the meter display, using bogus meters, encountering loops in terminal blocks and deploying tilted meters [8]. In developing countries, the most frequently committed electricity frauds are reverse metering and direct supply [7]. The cyber attacks are launched remotely by intercepting the communication line and replacing the actual readings with falsified readings. Data attacks, in turn, are a fusion of physical and cyber attacks. The motive behind a data attack is to specifically target the recorded electricity measurements and adulterate them through fake data injection [8].
In the past, the primary means of detecting power theft were on-site inspections and manual analysis of electricity consumption records. However, these approaches are time consuming and have a low success rate. Recently, the emergence of information technology and advancements in machine learning have resulted in more robust solutions. Generally, the solutions to handle NTLs can be grouped into three categories: hardware based, non-hardware based (data-driven) and a hybrid of both. Hardware based solutions involve the deployment of devices such as sensors at different locations, and they mainly deal with the design and architecture of the smart meters [9] to achieve a high ETD rate. However, the specialized hardware has high operational and maintenance costs. In contrast, non-hardware based solutions hold high potential due to their low operational and maintenance costs. These solutions detect fraud through machine learning algorithms and classifiers. They can further be categorized into state based, game theory based and artificial intelligence (AI) based methods.
The state based methods estimate the aggregated NTLs by calculating the TLs of a specific area. These methods calculate the difference between the amount of energy consumed and the corresponding invoiced energy. Moreover, different measurements, such as deviations in voltage and power, are estimated for detecting NTLs, which results in high precision and low cost [10]. However, the state based methods only provide the aggregated NTLs and fail to identify the specific source of the loss. Unlike the state estimation based methods, the game-theoretic methods [11] model a contest between the utility and the aberrant consumer, in which the aim of the fraudulent consumer is to outmatch the utility. However, the game-theoretic methods rely heavily on strong estimation for theft characterization. On the contrary, the AI based methods mainly focus on the patterns of electricity consumption, which are analyzed through machine learning algorithms. Classification methods require labeled data, whereas clustering methods work on unlabeled data, in order to fetch the aberrant consumers from the massive pool of electricity consumption profiles [12].
Detecting anomalous patterns in electricity consumption profiles is a challenging task in the presence of the imbalanced class distribution problem. In a real-world scenario, the number of fair electricity consumers is significantly larger than the number of thieves, which creates an imbalanced distribution in the dataset. Therefore, ETD may be considered a special type of anomaly detection. In AI based methods, classifiers mostly yield a low ETD rate, mainly due to the underrepresentation of the minority class [13].
The research works in [1], [14] show that analyzing the electricity consumption patterns of consumers is beneficial in detecting suspicious consumers. However, after going through the existing literature on ETD [6]-[11], it is concluded that the existing ETD solutions have the following limitations:
• the models applied for ETD do not take care of proper class balancing,
• in many cases, the attachment of special devices is
• in highly dynamic time series analyses, methods such as support vector machine (SVM), random forest (RF), logistic regression (LR), etc., have a low ETD rate and a high false positive (F+) rate,
• the deep learning approaches do not discriminate the decisive features appropriately and
• in sequential time series data, the convolutional neural network (CNN) and multi-layer perceptron (MLP) do not perform well. Moreover, CNN fails to provide the exact source of NTL.
In this paper, a robust big data analytics method for ETD in the SG is proposed to better discriminate between fair and fraudulent consumers on the basis of electricity consumption data. The main contributions of this study are as follows:
• according to the nature of the problem, an enhanced strategy for data preprocessing is adopted,
• to avoid overfitting and to handle the class imbalance issue, the adaptive synthesis (ADASYN) method is used,
• CNN and long short-term memory (LSTM) are integrated in a deep siamese network (DSN) in order to learn the key features and to achieve a high ETD rate and
• performance metrics such as mean average precision (mAP) and area under the curve (AUC) are used to better comprehend the results.
The rest of the paper is organized as follows. A review of the existing electricity theft detection strategies is given in section II. The problem analysis and the proposed solutions are described in section III and section IV, respectively. The simulation results are discussed in section V. Finally, the paper is concluded in section VI.
The state-of-the-art ETD solutions are generally categorized into two groups: hardware based solutions and non-hardware based solutions. A comprehensive review of the system level and data level threats to AMI can be found in [15], [16].
A. Hardware based solutions
In hardware based solutions, special purpose hardware is deployed and the physical architecture is modified to strengthen the system against vulnerabilities. An identity based key establishment model is proposed in [9] in order to avoid relying on pairing. The proposed model is based on elliptic curve cryptography (ECC), which enhances the performance and reduces the computational overhead. Using a Chebyshev polynomial to access the security features of the smart meter, a power-authenticated key exchange protocol is proposed in [17]. To address the ephemeral security problem, an authentication scheme based on ECC is proposed in [18], which aims to reduce the communication and computational complexity. Although the hardware based solutions give acceptable results, attention is still focused on data-driven approaches for NTL detection due to the following reasons [19]:
• high deployment and maintenance cost due to specialized metering hardware,
• negative benefit-cost ratio (BCR), i.e., the cost outweighs the benefits,
• failure to detect the specific source of NTL and
• vulnerability of specialized meter hardware in extreme weather conditions.
B. Non-hardware based solutions
In contrast to the hardware based solutions, data-driven approaches for detecting NTLs are advancing more rapidly. In [2], a two-fold machine learning technique is adopted to minimize the ratio of misclassified instances. In the first step, the maximum information coefficient (MIC) determines the correlation between the suspicions and the consumption profiles. In the second step, clustering is performed to find the density peaks. Similarly, in [20], clustering is used to extract a prototype from the consumption patterns. The unseen data samples are categorized by a distance measure; an instance with a significant distance from the prototype is considered malicious. In contrast, the works in [8], [21] use a supervised learning approach to handle ETD through relative entropy and gradient boosting classifiers (GBCs). A hybrid of MLP and LSTM is adopted to detect NTL in AMI [22]. In order to rank the suspects, fuzzy logic is applied in [23]. A framework for feature engineering that combines a genetic algorithm (GA) and a finite mixture model (FMM) is implemented in [24]. For the final judgement in NTL detection, a gradient boosting machine (GBM) is applied. GA is an efficient heuristic algorithm; however, it does not guarantee the global optimum. A similar approach is proposed in [25], which uses the black hole algorithm (BHA) for feature extraction. Although BHA extracts the optimal features from time series data, the performance of the model is still inefficient in terms of
By analyzing the consumption patterns of electricity con-
sumers, it becomes evident that the fraudsters and the fair
consumers can be differentiated by their consumption pro-
files. Therefore, experiments are performed on the consumed
electricity data, as inspired from [1], in order to validate
the problem. Fig. 1(a) shows the electricity consumption of
benign consumers during October 2016. By visualizing the
results, it is difficult to analyze the key characteristics from
the sequential or one-dimensional (1-D) load profile. However,
by choosing the weekly load profile, it can be seen that the
consumption of a fair consumer shows symmetric behavior,
as depicted in Fig. 1(b). In our scenario, weekly consumption
profile of consumers is preferred over daily consumption
for CNN, because the behaviors of consumers are weekly
periodic. As shown in Fig. 1(b), a strong relation exists
between the weekly consumed energy, which shows the peak
consumption on 3rd day while the lowest consumption is
recorded on 6th day of each week. The exception is found
on 5th day of the 4th week. The reason behind this deviation
is the intermittent nature of a fair consumer. Therefore, it
is deduced from Fig. 1(b) that the consumption profiles of
the benign consumers follow a periodic pattern. Similarly, the
daily and weekly time series of the fraudster consumer is
exhibited in Fig. 2(a) and Fig. 2(b), respectively, which show
a non-periodic behavior at each time interval. In contrast to
Fig. 1(b), an abrupt and highest peak is observed on 3rd day
of the 1st week, as shown in Fig. 2(b), which validates the
After analyzing the time series data of both fair and fraudster consumers, it is observed that the consumption patterns of fair consumers are symmetric, whereas those of the suspects are asymmetric. This observation motivates scrutinizing the electricity consumption patterns of consumers that violate the uniform control limit.
However, it is a challenging and arduous task to capture the dynamic changes in the time series due to the following reasons:
1) due to the imbalanced nature of the dataset, the distribution is skewed towards the dominating class; consequently, the classifier cannot learn a discriminative decision boundary and tends to overfit [1],
2) the energy consumption data mostly contains missing values and outliers. A smoothing spline can detect the outliers; however, it is difficult to capture the true continuity. The selection of the thresholds (knots) and their locations are two big challenges. Moreover, increasing the degree beyond a certain threshold increases the chances of misclassification, so the suspicious consumers can be misclassified. As shown in Fig. 2(a), the consumption of a fraudster consumer shows unusual activity, which is normalized by the smoothing spline [13], [22],
3) extracting decisive features from a highly dynamic sequential time series is important, yet a traditional CNN lacks this ability [1],
4) in the literature, most of the datasets related to electricity theft are unlabeled. Synthetic attacks are launched instead, which do not show the true relation between the consumed energy values [21],
5) the selection of suitable performance metrics is of great importance in ETD. The most widely used performance measure, i.e., accuracy, is inadequate for fraud detection because theft cases are rare compared to honest ones. The classifier shows a high accuracy even when the theft cases are misclassified, which negates the true relation between weekly consumed energy [25]. Similarly, low ETD rate, minimum AUC and high F+ rate are observed in
Fig. 1: Electricity consumption pattern of an honest consumer. (a) Date-wise electricity consumption. (b) Weekly electricity consumption (Day 1 to Day 7, 1st to 4th week).
Fig. 2: Electricity consumption pattern of a fraudulent consumer. (a) Date-wise electricity consumption. (b) Weekly electricity consumption (Day 1 to Day 7, 1st to 4th week).
The proposed ETD technique consists of two steps. In the
first step, the preprocessing is done in which the issues of miss-
ing values, data standardization and handling the imbalanced
class are resolved. In the second step, a three-fold operation is
performed, which involves decisive feature extraction, analysis
of sequential time series and the application of a classifier. The
details are provided in the following subsections.
A. Data preprocessing
The preliminary analysis of data is a mandatory step in
highly dynamic time series analysis, which includes imputa-
tion, outlier detection, data standardization, handling imbal-
ance data, etc.
1) Handling missing values and data standardization: The electricity consumption records of consumers contain incomplete information or missing values. The reasons behind this issue may be hardware failure or data corruption. In case of time series data, the records with missing values cannot simply be dropped; instead, the missing values are filled synthetically through imputation. In most cases, the missing values are filled through averaging. In this paper, the missing values are recovered through the interpolation method [1], as under:
$$f(z_i) = \frac{z_{i-1} + z_{i+1}}{2}, \quad \text{if } z_i \in \mathrm{NaN},\ z_{i-1}, z_{i+1} \notin \mathrm{NaN}, \tag{1}$$
where $z_i$ is the recorded or missed (null) observation in the dataset. The null value is represented as NaN. If $z_i$ is null, then it is filled according to equation (1).
Similarly, the data standardization is performed through min-max normalization [1], using equation (2):
$$f(z_i) = \frac{z_i - \min(z)}{\max(z) - \min(z)}, \tag{2}$$
where $\min(z)$ shows the minimum value of $z$ and $\max(z)$ represents the maximum value of $z$.
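The two preprocessing steps in equations (1) and (2) can be sketched in a few lines of NumPy. This is only an illustration, not the authors' code: the function names are hypothetical, and the treatment of missing values that equation (1) does not cover (edges or consecutive gaps) as well as of constant profiles is an assumption made here.

```python
import numpy as np

def impute_neighbors(z):
    """Fill an isolated NaN with the mean of its two neighbours, as in equation (1)."""
    z = np.asarray(z, dtype=float)
    out = z.copy()
    for i in range(1, len(z) - 1):
        if np.isnan(z[i]) and not np.isnan(z[i - 1]) and not np.isnan(z[i + 1]):
            out[i] = (z[i - 1] + z[i + 1]) / 2.0
    # Assumption: NaNs not covered by equation (1) are set to 0.
    return np.nan_to_num(out, nan=0.0)

def min_max(z):
    """Scale a profile to [0, 1], as in equation (2)."""
    z = np.asarray(z, dtype=float)
    span = z.max() - z.min()
    # Assumption: a constant profile maps to all zeros to avoid division by zero.
    if span == 0:
        return np.zeros_like(z)
    return (z - z.min()) / span

profile = [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]
filled = impute_neighbors(profile)   # isolated gap filled, double gap set to 0
scaled = min_max(filled)
```

Imputation is applied before scaling so that the minimum and maximum in equation (2) are computed on a profile without NaN entries.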
Fig. 3: System model of the proposed DSN
2) Handling imbalanced class distribution: A dataset is considered imbalanced or biased if the sample points of one class (majority class) highly dominate the instances of the other class (minority class). Due to the underrepresentation of the minority class, the distribution is skewed towards the majority class. Consequently, the classifier cannot learn a discriminative decision boundary; it fails to learn the key characteristics of the minority class and tends to overfit. The issues related to imbalanced data are not limited to image recognition and semantic segmentation, but apply equally to time series data [26].
The existing remedies for handling the imbalanced class issue fall under one of three solutions: the cost-sensitive approach, the algorithm-level approach and the data-level approach [27]. In the cost-sensitive approach, the effects of the highly dominating class are reduced in the training stage. The misclassification costs of both the dominating and the suppressed classes are taken into account and the weights are assigned accordingly. Hence, the cost-sensitive approach shifts the emphasis of learning from the dominating class towards the minority class. In the algorithm-level approach, the model is modified and trained in such a way that the scarce instances are favored and over-weighted, so that the disparity produced by the majority class is reduced during the learning stage. Traditionally, class balancing is achieved by the data-level approach, which includes both undersampling and oversampling techniques. In undersampling, the majority class is down-sized, which in most cases eliminates informative samples. Similarly, copying the instances of the minority class mostly leads to overfitting, which is the downside of oversampling. The right choice of technique for handling the imbalanced class issue depends upon the nature of the problem.
In this paper, the imbalanced data is handled at the data level rather than at the algorithm level. In particular, the oversampling technique is adopted in order to avoid the elimination of decisive samples caused by undersampling. Specifically, the ADASYN sampling approach is applied for oversampling [28]. In contrast to simply duplicating the instances of the minority class, ADASYN synthesizes new minority samples by interpolating between an existing minority instance and one of its nearest minority neighbors, and it generates more samples around the instances that are harder to learn. This interpolation acts as a form of noise injection, which results in better generalization of the model. ADASYN is selected not only to avoid overfitting, but also to emphasize outlier detection in the feature space.
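The oversampling idea described above can be sketched from scratch as follows. This is a simplified illustration of the ADASYN procedure and not the implementation used in the paper: the function name, the parameters `k`, `beta` and `seed`, and the brute-force neighbor search are assumptions made for the example, and it presumes binary labels with 1 for the minority (fraudulent) class.

```python
import numpy as np

def adasyn(X, y, k=5, beta=1.0, seed=0):
    """Minimal ADASYN sketch: returns X and y extended with synthetic minority samples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == 1]
    n_maj = int((y == 0).sum())
    # Total number of synthetic samples needed to reach the desired balance.
    G = int((n_maj - len(X_min)) * beta)

    # For each minority sample: fraction of majority points among its k nearest neighbours.
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    nn_all = np.argsort(d_all, axis=1)[:, 1:k + 1]   # index 0 is the sample itself
    r = (y[nn_all] == 0).mean(axis=1)
    if r.sum() == 0:                                  # no overlap: spread uniformly
        r = np.ones(len(X_min))
    g = np.rint(r / r.sum() * G).astype(int)          # samples to generate per instance

    # Interpolate between each minority sample and a random minority neighbour.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    nn_min = np.argsort(d_min, axis=1)[:, 1:k + 1]
    synth = []
    for i, g_i in enumerate(g):
        for _ in range(g_i):
            j = rng.choice(nn_min[i])
            lam = rng.random()
            synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))

    synth = np.array(synth).reshape(-1, X.shape[1])
    X_new = np.vstack([X, synth])
    y_new = np.concatenate([y, np.ones(len(synth), dtype=y.dtype)])
    return X_new, y_new
```

Because every synthetic point is a convex combination of two real minority samples, the new samples stay inside the region already occupied by the minority class, while the density-based weights place more of them near the class boundary.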
B. Proposed deep siamese (CNN-LSTM) network architecture
for ETD
In the second step of the proposed methodology, identi-
fication of the fraudulent consumers is performed via joint
integration of CNN-LSTM with DSN. The details are provided
in the following subsections:
1) Feature extraction through convolutional neural networks: The preliminary data analytics show the periodicity and non-periodicity in the electricity consumption of fair and fraudulent consumers, respectively. The identification of a fraudster consumer is difficult when analyzing the daily electricity consumption record, since the electricity consumption of each day shows a relatively independent pattern. Therefore, aligning the electricity consumption of several weeks is beneficial for detecting abnormal patterns. The work done in [1] indicates that CNN performs well in such a situation; hence, the daily electricity consumption data is transformed into weekly consumption. A deep CNN is trained on the weekly electricity consumption profile through multiple stacked convolutional layers with convolution filters, a max-pooling layer and a fully connected layer. Convolution is the element-wise multiplication of the weights with the corresponding inputs, followed by summation. The feature map is obtained by sliding the convolution filter (kernel) over the input.
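As a rough illustration of the two operations described above, the sketch below reshapes a daily series into a weeks-by-days matrix and slides a kernel over it. The helper names are hypothetical, dropping the trailing incomplete week is an assumption (the paper does not state how the remainder is treated), and the plain loops stand in for what a deep learning framework would do in a convolutional layer.

```python
import numpy as np

def to_weekly(daily):
    """Reshape a 1-D daily series into a (weeks, 7) matrix.

    Assumption: a trailing incomplete week is discarded.
    """
    daily = np.asarray(daily, dtype=float)
    n_weeks = len(daily) // 7
    return daily[: n_weeks * 7].reshape(n_weeks, 7)

def conv2d_valid(x, kernel):
    """Slide the kernel over x without padding; each output is sum(window * kernel)."""
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * kernel)
    return out

weekly = to_weekly(np.arange(1035))          # 1035 days give 147 full weeks
feature_map = conv2d_valid(weekly, np.ones((3, 3)) / 9.0)
```

With the weekly layout, a single 3x3 window covers the same weekdays across three consecutive weeks, which is what lets a filter respond to the weekly periodicity discussed in the problem analysis.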
2) Sequence learning through long short term memory: Adding memory to a neural network (NN) makes it more powerful in handling time series data, which is the inherent behavior of a recurrent neural network (RNN) [29]. The problems associated with RNNs are vanishing and exploding gradients [30]. Because of these issues, a plain RNN struggles to retain long-term dependencies. LSTM was introduced to overcome these limitations [31]. The structure of LSTM is the same as that of an RNN except for the repeating module. Instead of a single NN layer, the repeating module of LSTM has several interacting layers (gates), which yield a better representation of time series data. In fact, LSTM is able to handle the vanishing gradient problem and to remember information for a long period of time, which is practically its default behavior.
In our work, the daily electricity consumption profile is analyzed by LSTM. Moreover, LSTM is also able to identify the time window of an anomalous time series.
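To make the "repeating module" concrete, the following sketch implements one time step of the standard LSTM cell in NumPy. It follows the common textbook formulation (input, forget and output gates plus a candidate state) and is not taken from the paper; the stacked weight layout `W`, `U`, `b` and the hidden size are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the order input gate, forget gate, output gate, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])       # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])   # output gate: how much memory to expose
    g = np.tanh(z[3 * H:])        # candidate cell state
    c_t = f * c_prev + i * g      # additive update eases gradient flow over time
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Run the cell over one week of (hypothetical) normalized daily readings.
rng = np.random.default_rng(0)
D, H = 1, 4
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for reading in [0.2, 0.4, 0.9, 0.5, 0.3, 0.1, 0.2]:
    h, c = lstm_step(np.array([reading]), h, c, W, U, b)
```

The additive form of the cell-state update is the reason LSTM copes better with vanishing gradients than a plain RNN, whose state is rewritten entirely at every step.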
3) Supervised learning based on deep siamese network: DSN can be applied to problems where the aim is to discriminate features on the basis of a similarity measure [32]. Unlike a traditional CNN, which has low generalization ability, DSN performs better because of its strong feature extraction capabilities [32], [33], [34]. DSN is a supervised machine learning technique that consists of two main components: a shared feature extractor and a distance measurer (cost estimator). The shared feature extractor encodes the features, while the cost function estimates the difference between two embeddings.
4) Mathematical formulation for CNN-LSTM: The combination of CNN and LSTM is used in the proposed work to discriminate the features of two different types of consumers, i.e., honest and fraudulent. The mathematical formulation of the CNN-LSTM module is described below.
The two input sequences, $\psi_i$ and $\psi_j$, are taken in parallel by the CNN-LSTM module, such that $\psi_i, \psi_j = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where $x_i$ denotes the input features and $y_i \in \{0, 1\}$ is the corresponding target value ($y_i = 0$ implies that the instance belongs to the fair class). The features of both classes are learned by the CNN-LSTM module and finally the encoding of features is performed [32], using equations (3) and (4):
$$E_i = \delta\{\omega_n \cdot [\dots \delta\{\omega_2 \cdot \delta(\omega_1 \psi_i + b_1) + b_2\} \dots] + b_n\}, \tag{3}$$
$$E_j = \delta\{\omega_n \cdot [\dots \delta\{\omega_2 \cdot \delta(\omega_1 \psi_j + b_1) + b_2\} \dots] + b_n\}, \tag{4}$$
where $\delta(\cdot)$, $\omega_n$ and $b_n$ denote the sigmoid function, weights and biases, respectively. Thereafter, the shared features are fed to a loss function, which discriminates the features on the basis of a similarity measure. Therefore, a classification loss such as binary cross entropy is not suitable. Instead, a contrastive loss function is used, as in [32], to better comprehend the features, given in equation (5):
$$\mathcal{L}_{i,j} = d_{i,j} \cdot \max[0, (1 - \hat{d}_{i,j})] + (1 - d_{i,j}) \cdot \hat{d}_{i,j}, \tag{5}$$
where $\hat{d}_{i,j}$ is the Euclidean distance calculated between the encoded features, i.e., $\hat{d}_{i,j} = \|E_i - E_j\|_2$. Similarly, $d_{i,j}$ shows the actual distance, given in equation (6):
$$d_{i,j} = \begin{cases} 1, & \text{if } y_i \neq y_j \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$
The objective of training the DSN is to minimize the difference between $d_{i,j}$ and $\hat{d}_{i,j}$.
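The pairwise loss in equations (5) and (6) can be written directly in NumPy. The sketch below is illustrative only: the function name is hypothetical, the embeddings are plain vectors rather than CNN-LSTM outputs, and the `margin` parameter is an assumption that defaults to the value 1 appearing in equation (5).

```python
import numpy as np

def contrastive_loss(e_i, e_j, y_i, y_j, margin=1.0):
    """Pairwise loss of equation (5) for two embeddings and their class labels."""
    d_hat = np.linalg.norm(np.asarray(e_i, dtype=float) - np.asarray(e_j, dtype=float))
    d = 1.0 if y_i != y_j else 0.0          # equation (6): 1 for a mixed pair
    # Mixed pairs are penalized while closer than the margin,
    # same-class pairs are penalized in proportion to their distance.
    return d * max(0.0, margin - d_hat) + (1.0 - d) * d_hat

same_close = contrastive_loss([0.0, 0.0], [0.0, 0.0], 0, 0)   # identical, same class
mixed_close = contrastive_loss([0.0, 0.0], [0.3, 0.4], 0, 1)  # distance 0.5, mixed pair
mixed_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], 0, 1)    # distance 5, mixed pair
```

Minimizing this quantity pulls embeddings of the same class together and pushes honest and fraudulent embeddings at least one margin apart, after which a distance threshold can separate the two classes.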
In this section, the simulations are performed in order to
compare the performance of the proposed model with the
benchmark schemes.
A. Simulation setup
1) Dataset acquisition: The dataset is acquired from the State Grid Corporation of China (SGCC), the largest power provider in China, and is publicly available. The daily consumption record is available for 1035 days, i.e., from January 1, 2014 to October 31, 2016. According to the ground truth of the dataset, 9% of the total consumers are declared as electricity thieves, which is a high ratio.
Fig. 4: Performance of CNN-LSTM model (accuracy versus number of epochs).
Fig. 5: ROC-AUC and PR curve of CNN-LSTM model. ROC: Train (AUC = 0.75), Test (AUC = 0.73). PR (precision versus recall): Train (PR = 0.82), Test (PR = 0.52).
Fig. 6: Performance of DSN (accuracy versus number of epochs).
Fig. 7: ROC-AUC and PR curve of DSN. ROC: Train (AUC = 0.99), Test (AUC = 0.93). PR (precision versus recall): Train (PR = 0.99), Test (PR = 0.92).
2) Performance metrics: In order to detect NTL from the pool of electricity consumption profiles, the performance metrics true positives ($T^+$) and true negatives ($T^-$) count the correctly classified instances. In contrast, false negatives ($F^-$) and false positives ($F^+$) reflect the opposite scenario, where $F^-$ is the number of fraudulent consumers misclassified as fair and vice versa. The objective behind the accurate detection of NTL is to reduce $F^+$ while maximizing $T^+$. Other performance metrics related to classification are recall, precision, specificity, F1-score, accuracy, mAP, and AUC of the receiver operating characteristics (ROC) curve. The first five are given by equations (7)-(11), taken from [13].
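$$\text{Recall} = \frac{T^+}{T^+ + F^-}, \tag{7}$$
$$\text{Precision} = \frac{T^+}{T^+ + F^+}, \tag{8}$$
$$\text{Specificity} = \frac{T^-}{T^- + F^+}, \tag{9}$$
$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \tag{10}$$
$$\text{Accuracy} = \frac{T^+ + T^-}{T^+ + T^- + F^+ + F^-}. \tag{11}$$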
Although accuracy and recall are widely used in the literature as performance metrics, they are inadequate in case of imbalanced class distribution, as shown in Table I. Similarly, precision, specificity and F1-score do not give an accurate picture and are not reliable when used individually.
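The metrics in equations (7)-(11) and the weakness of accuracy on skewed data can be illustrated with a small pure-Python sketch. The function name is hypothetical, returning 0 for an undefined ratio is an assumption, and the toy labels simply mirror the 9% theft ratio reported for the dataset; they are not results from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Compute equations (7)-(11) from binary labels (1 = fraudulent, 0 = fair)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Assumption: an undefined ratio (zero denominator) is reported as 0.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, len(pairs)),
    }

# A trivial classifier that labels every consumer as fair.
y_true = [1] * 9 + [0] * 91
y_pred = [0] * 100
m = classification_metrics(y_true, y_pred)
```

In this toy case the accuracy is 0.91 although no thief is detected (recall and F1-score are 0), which is the behavior that makes accuracy misleading for ETD.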
In order to detect NTL without the loss of information, the selection of reliable performance metrics is required [13]. The performance metrics mAP and AUC are applied in this work to better evaluate the model on imbalanced data. As mentioned in [1], [2], the ROC curve and mAP are the most suitable performance metrics for detecting suspects under an imbalanced class distribution.
The ROC curve is the graphical representation of the $T^+$ rate against the $F^+$ rate. It is used to evaluate the performance of a classifier. The area under the ROC curve is called AUC, which measures how well the distribution of the fraudulent class is separated from that of the fair class. Both axes of the ROC curve range from 0 to 1. The ideal situation arises when the score distributions of the two classes do not overlap. An AUC approaching 1 demonstrates the validity of the classifier, while an AUC of 0.5 or less shows that the classifier does not have the ability to discriminate the classes [1], [2]. AUC is calculated using equation (12), taken from [1]:
$$\text{AUC} = \frac{\sum_{i \in S} R_i - \frac{1}{2}|S|(|S| + 1)}{|S| \times |H|}, \tag{12}$$
where $R_i$ denotes the rank of the suspicion degree of fraudulent consumer $i$ when all consumers are sorted in ascending order, while $|S|$ and $|H|$ are the cardinalities of the suspicious and honest consumer sets, respectively.
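Equation (12) translates into a short rank-based computation. The sketch below is illustrative: the function name is hypothetical, labels are assumed to be 1 for fraudulent and 0 for honest consumers, and tied scores are not given averaged ranks, so it assumes that all suspicion scores are distinct.

```python
import numpy as np

def auc_from_ranks(scores, labels):
    """AUC via equation (12), assuming distinct scores (no tie handling)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(scores)                    # ascending suspicion degree
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # rank 1 = least suspicious
    fraud = labels == 1
    n_s, n_h = int(fraud.sum()), int((~fraud).sum())
    return (ranks[fraud].sum() - n_s * (n_s + 1) / 2.0) / (n_s * n_h)

auc = auc_from_ranks([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In the example the two fraudulent consumers hold ranks 2 and 4, so the numerator is 6 - 3 = 3 and the AUC is 3 / (2 x 2) = 0.75: three of the four fraudulent-honest pairs are ordered correctly.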
The second performance metric used in this paper is mAP. It is defined as the mean of all average precisions. It is a ranking-based metric from information retrieval and remains useful when the performance metrics discussed in equations (7)-(11) fail.
Let $y_k$ denote the number of fraudulent consumers among the top $k$ ranked consumers, such that the precision is defined as $P@k = \frac{y_k}{k}$. The calculations performed by mAP for information retrieval are given in equation (13), taken from [1].
$$\text{mAP}@N = \frac{\sum_{i=1}^{r} P@k_i}{r}, \quad (13)$$
where $r$ denotes the number of fraudulent consumers among the top $N$ ranked consumers and $k_i$ is the rank position of the $i$-th of them. The value of $N$ is 100 in our scenario.
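A minimal Python sketch of mAP@N is given below. It assumes that consumers are already sorted by descending suspicion score, that precision P@k is evaluated at each position k where a fraudulent consumer appears within the top N, and that these values are averaged over the r fraudulent consumers found. The function name `map_at_n` and the toy ranking are hypothetical.

```python
def map_at_n(ranked_labels, n):
    """mAP@N for labels sorted by descending suspicion; 1 = fraudulent, 0 = fair."""
    hits = 0
    precisions = []
    for k, label in enumerate(ranked_labels[:n], start=1):
        if label == 1:
            hits += 1                     # y_k: fraudulent consumers in the top k
            precisions.append(hits / k)   # P@k at the position of this fraudulent consumer
    # r = len(precisions); return 0 when no fraudulent consumer is in the top N.
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy ranking: fraudulent consumers at positions 1, 3 and 4.
value = map_at_n([1, 0, 1, 1, 0], n=5)
print(round(value, 4))
```

The metric rewards rankings that place fraudulent consumers near the top, which is what matters when only the top N suspects can be inspected on site.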
B. Effect of imbalanced distribution on performance metrics
In the imbalanced class problem, one class significantly dominates the other class, which results in the suppression of the minority class. The effect of the imbalance on both the less reliable and the more reliable performance metrics can be seen in Table I.
Table I shows the comparative analysis of DSN with and without handling the imbalanced class distribution. The performance is worst when the imbalanced class issue is not handled, especially in terms of recall, where many fraudulent instances are misclassified as fair. Similarly, the performance of ADASYN is better than that of random undersampling (RUS). The low performance of RUS is due to the elimination of samples that carry decisive features.
The consequence of using accuracy as a performance metric is that it results in a low T+ rate and a high F+ rate. This can also be seen in Table I: even though the accuracy is higher, AUC and mAP remain low. Therefore, it is deduced that accuracy does not guarantee accurate classification of the instances in skewed distributions.
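The weakness of accuracy on skewed data can be illustrated with a small Python sketch. The class ratio (10 fraudulent against 90 fair consumers) and the degenerate classifier are hypothetical and serve only to show the effect; they are not the experimental data.

```python
# Hypothetical skewed test set: 10 fraudulent (1) and 90 fair (0) consumers.
labels = [1] * 10 + [0] * 90
# A degenerate classifier that labels every consumer as fair.
predictions = [0] * len(labels)

tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)

accuracy = (tp + tn) / len(labels)
recall = tp / (tp + fn)
print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
```

The classifier reaches 90% accuracy without detecting a single fraudulent consumer, which is why ranking-based metrics such as AUC and mAP are preferred under class imbalance.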
TABLE I: Significance of handling imbalanced class
Performance metrics    Without handling imbalance class    RUS       ADASYN
mAP                    0.5952                              0.5997    0.8988
AUC                    0.6270                              0.6520    0.9250
F1-Score               0.6467                              0.5500    0.9249
Accuracy               0.7065                              0.6519    0.9241
Precision              0.7524                              0.6500    0.9153
Recall                 0.3771                              0.6300    0.9347
C. Comparative analysis
The proposed model is compared with the baseline methods for validation. The baseline methods used for comparison are discussed below.
1) Support vector machine: SVM is an elegant technique used for both classification and regression tasks. It separates different classes by a hyperplane, whose construction depends entirely on the selection of support vectors. Table II shows the hyperparameter values explored and the optimal values obtained through grid search. The regularization parameter C is selected to be 0.001 with the radial basis function (RBF) as the kernel.
TABLE II: SVM hyperparameters’ selection
Hyperparameter    Values range                    Optimal value
C                 0.001, 0.01, 0.1, 1, 10, 100    0.001
Kernel            Linear, RBF                     RBF
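The grid search over the ranges in Table II can be sketched in Python as an exhaustive loop over all hyperparameter combinations. The scoring function `toy_score` below is a hypothetical stand-in for a validation score (for instance, a cross-validated AUC); it is purely illustrative and does not reproduce the experiments, so the combination it selects carries no meaning.

```python
import math
from itertools import product

# Search space as listed in Table II.
svm_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100],
    "kernel": ["linear", "rbf"],
}


def grid_search(grid, score_fn):
    """Evaluate every combination in the grid and return the best one with its score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical placeholder: a real run would train an SVM with these
    # hyperparameters and return its validation score instead.
    bonus = 0.1 if params["kernel"] == "rbf" else 0.0
    return bonus - abs(math.log10(params["C"]) + 1)


n_combinations = len(list(product(*svm_grid.values())))
best, score = grid_search(svm_grid, toy_score)
print(n_combinations, best)
```

With six values of C and two kernels, the search evaluates 12 combinations; the cost grows multiplicatively with every additional hyperparameter.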
2) Logistic regression: LR is a simple and elegant technique used for binary classification. In it, both classes are separated by a hyperplane h = (w, b), where w and b denote the normal vector and the intercept of the hyperplane, respectively. Finding the optimal w and b implies that the hyperplane can accurately separate the two classes. The operations performed by LR are the same as those of an NN, in which the input features are combined using trained weight matrices.
A distance metric for each observation, $d_i = \frac{w^{T} x_i}{\|w\|}$, is used to find the margin with the hyperplane, where $w$ and $x_i$ denote the weights and the corresponding input features. $\|w\|$ is the norm of $w$, which is assumed to be a unit vector. The weighted input features are passed through a sigmoid function $f(d) = \frac{1}{1 + e^{-d}}$. The LIBLINEAR solver is used to train the classifier and find the optimal weights using the logarithmic loss function [35].
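The distance and sigmoid computations of LR can be sketched in Python as follows. The sketch assumes the distance is the inner product of the weights and the input divided by the norm of the weights (the intercept is omitted, as in the text), and that the sigmoid is the standard logistic function 1 / (1 + exp(-d)). The weight and feature values are hypothetical.

```python
import math


def signed_distance(w, x):
    """Distance of observation x from the hyperplane with normal vector w."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return sum(wi * xi for wi, xi in zip(w, x)) / norm


def sigmoid(d):
    """Standard logistic function mapping a distance to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-d))


# Hypothetical weights and input features, for illustration only.
w = [3.0, 4.0]
x = [1.0, 2.0]
d = signed_distance(w, x)   # (3*1 + 4*2) / 5
p = sigmoid(d)
print(d, p)
```

An observation lying exactly on the hyperplane has d = 0 and is mapped to 0.5; observations farther on the positive side are mapped closer to 1.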
During grid search, the hyperparameters of LR are obtained, as given in Table III, where C is the hyperparameter used to handle overfitting and R denotes the type of regularization. The best hyperparameters are achieved when C = 0.01 and the L2 norm is selected as the type of regularization. The careful selection of these parameters is essential for the performance of the forecasting model [36].
TABLE III: LR hyperparameters’ selection
Hyperparameter    Values range                 Optimal value
C                 0.001, 0.01, 0.1, 10, 100    0.01
R                 L1 norm, L2 norm             L2 norm
3) Random forest: RF is an ensemble model with the decision tree (DT) as its base classifier. In order to make better predictions, RF combines multiple DTs on the basis of bootstrapping and feature sampling. Executing bootstrapping and feature sampling together yields a different model each time. The essence of RF is that it reduces variance efficiently. The samples selected for a tree by bootstrapping are called in-bag samples (ibs), while the remaining samples are known as out-of-bag (oob) samples. The ibs are used to train the classifier while the oob samples are used to validate the model. Table IV shows the best generalized hyperparameters of RF.
TABLE IV: RF hyperparameters’ selection
Hyperparameter              Values range              Optimal value
Number of decision trees    800, 1200, 1600, 2000     1200
Maximum depth               10, 15, 20, 25            20
Minimum sample splits       5, 10, 15, 20             15
Minimum sample leaves       4, 8, 12, 16              16
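The split into in-bag and out-of-bag samples used by RF can be sketched in Python with the standard library. The sketch assumes the usual bootstrap, i.e., drawing as many indices as there are samples, with replacement; the function name `bootstrap_split`, the sample count and the random seed are hypothetical.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample and return (in-bag indices, out-of-bag indices)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
in_bag, out_of_bag = bootstrap_split(1000, rng)
print(len(set(in_bag)), len(out_of_bag))
```

Because sampling is done with replacement, each tree leaves out roughly a third of the samples (about 1/e of them for large datasets), which is what makes the oob samples usable as a validation set without a separate split.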
4) CNN-LSTM: For the comparative analysis, CNN and LSTM are integrated to extract the features and analyze the time series data [37]. The details of the hyperparameters are given in Table V. The performance of the CNN-LSTM model without DSN is shown in Fig. 4 and Fig. 5. Although the features are extracted by the CNN and the sequence information is preserved by the LSTM, this hybrid model still fails to provide efficient results due to the lack of discrimination between
TABLE V: CNN-LSTM hyperparameters’ selection
Hyperparameter           Optimal value    Hyperparameter         Optimal value
Number of neurons        64               Stride                 1
Number of CNN-layers     6                Dropout                0.1
Number of LSTM-layers    4                Dense layer            128
Number of filters        10               Activation function    LeakyReLU
5) Wide and deep CNN: To capture both the wide and deep information in time series data for NTL detection, a wide and deep CNN (WD-CNN) is proposed in [1]. The wide component takes the daily consumption (1-D data) as input, while the deep component analyzes the weekly consumption profile, which is represented as 2-D data. The rectified linear unit (ReLU) is used as the activation function, which passes only positive values. The metrics AUC and mAP are used to measure the performance of the model. The hyperparameters used to train the model are the same as those used in [1].
6) Results and discussion: Table VI provides an overview of the performance metrics of each classifier for three training ratios, i.e., 60%, 70% and 80%. The detail of each performance metric is given in order to better understand its importance in ETD. The results obtained for the traditional classifiers, i.e., LR, SVM and RF, show a generally increasing trend: their performance is enhanced by the increase in training instances. In contrast, the performance of the deep networks depends on the selection of hyperparameters as well as on the training ratio. Moreover, it is clear from Table VI that the proposed model is successfully applied to both small-sized and immensely large-sized datasets. The proposed model’s performance is visualized in Fig. 6 and Fig. 7.
In this paper, electricity theft is detected in the SG using a dataset obtained through AMI. A novel theft detection method is introduced via joint integration of CNN-LSTM and DSN. The CNN component is capable of handling the weekly 2-D electricity consumption profile by generalizing the model efficiently, whereas the LSTM module memorizes the daily 1-D sequential electricity consumption data. The DSN then performs judgment on the shared feature extractor and discriminates the deviating patterns of fraudulent consumers from those of fair consumers. The analysis is performed on high resolution time series data provided by SGCC. The simulation results show that DSN has a high ETD rate, with an AUC and mAP of 0.93 and 0.90, respectively. Its comparative analysis with benchmark methods, such as LR, SVM, RF, CNN-LSTM and WD-CNN, shows that it achieves the highest values for all performance parameters: precision, recall, mAP, accuracy, AUC and F1-score. It maintains its performance for all three training ratios: 60%, 70% and 80%.
[1] Zheng, Z., Yang, Y., Niu, X., Dai, H.N. and Zhou, Y., 2017. Wide
and deep convolutional neural networks for electricity-theft detection to
secure smart grids. IEEE Transactions on Industrial Informatics, 14(4),
[2] Zheng, K., Chen, Q., Wang, Y., Kang, C. and Xia, Q., 2018. A novel
combined data-driven approach for electricity theft detection. IEEE
Transactions on Industrial Informatics, 15(3), pp.1809-1819.
[3] Asif Khan and Nadeem Javaid, "Jaya Learning-Based Optimization for
Optimal Sizing of Stand-Alone Photovoltaic, Wind Turbine, and Battery
Systems", Engineering, Pages: 1-21, Published: 2020, ISSN: 2095-8099.
[4] Ashfaq Ahmad, Nadeem Javaid, Mohsen Guizani, Nabil Ali Alrajeh and
Zahoor Ali Khan, "An Accurate and Fast Converging Short-Term Load
Forecasting Model for Industrial Applications in a Smart Grid", IEEE
Transactions on Industrial Informatics, Volume: 13, Issue: 5, Pages:
2587-2596, Published: October 2017, ISSN: 1551-3203.
[5] Sana Mujeeb and Nadeem Javaid, "ESAENARX and DE-RELM: Novel
Schemes for Big Data Predictive Analytics of Electricity Load and
Price", Sustainable Cities and Society, Volume: 51, Article Number:
101642, Pages: 1-16, Published: November 2019, ISSN: 2210-6707.
[6] Jokar, P., Arianpoo, N. and Leung, V.C., 2015. Electricity theft detection
in AMI using customers’ consumption patterns. IEEE Transactions on
Smart Grid, 7(1), pp.216-226.
[7] Saeed, M.S., Mustafa, M.W., Sheikh, U.U., Jumani, T.A. and Mirjat,
N.H., 2019. Ensemble Bagged Tree Based Classification for Reducing
Non-Technical Losses in Multan Electric Power Company of Pakistan.
Electronics, 8(8), pp.860-876.
[8] Singh, S.K., Bose, R. and Joshi, A., 2019. Energy theft detection for
AMI using principal component analysis based reconstructed data. IET
Cyber-Physical Systems: Theory & Applications, 4(2), pp.179-185.
[9] Mohamad, A.M. and Mohamed, Y.A.R.I., 2019. Investigation and
Assessment of Stabilization Solutions for DC Microgrid With Dynamic
Loads. IEEE Transactions on Smart Grid, 10(5), pp.5735-5747.
[10] Martins, A.V., Bacurau, R.M., dos Santos, A.D. and Ferreira, E.C., 2019.
Non-Intrusive Energy Meter for Non-Technical Losses Identification.
IEEE Transactions on Instrumentation and Measurement, pp.1-8.
[11] Amin, S., Schwartz, G.A., Cardenas, A.A. and Sastry, S.S., 2015. Game-
theoretic models of electricity theft detection in smart utility networks:
Providing new capabilities with advanced metering infrastructure. IEEE
Control Systems Magazine, 35(1), pp.66-81.
[12] Ahmad, T., Chen, H., Wang, J. and Guo, Y., 2018. Review of various
modeling techniques for the detection of electricity theft in smart grid
environment. Renewable and Sustainable Energy Reviews, 82, pp.2916-
TABLE VI: Comparative analysis of DSN with benchmark schemes
Method    | Training ratio-60%                   | Training ratio-70%                  | Training ratio-80%
LR        | 0.710 0.710 0.680 0.700 0.645  0.702 | 0.725 0.740 0.715 0.720 0.640 0.716 | 0.730 0.725 0.725 0.730 0.668 0.720
SVM       | 0.675 0.670 0.680 0.676 0.6140 0.677 | 0.685 0.675 0.675 0.680 0.619 0.684 | 0.680 0.680 0.680 0.680 0.628 0.688
RF        | 0.700 0.550 0.551 0.710 0.687  0.706 | 0.740 0.730 0.735 0.740 0.652 0.735 | 0.750 0.750 0.750 0.750 0.681 0.749
CNN-LSTM  | 0.664 0.615 0.661 0.836 0.638  0.666 | 0.629 0.662 0.636 0.839 0.641 0.670 | 0.670 0.69  0.676 0.832 0.66  0.73
WD-CNN    | 0.640 0.691 0.651 0.820 0.669  0.689 | 0.624 0.720 0.770 0.770 0.689 0.718 | 0.661 0.760 0.685 0.840 0.711 0.756
DSN       | 0.875 0.839 0.857 0.839 0.814  0.860 | 0.840 0.850 0.845 0.844 0.819 0.844 | 0.912 0.923 0.928 0.953 0.900 0.934
[13] Avila, N.F., Figueroa, G. and Chu, C.C., 2018. NTL Detection in
Electric Distribution Systems Using the Maximal Overlap Discrete
Wavelet-Packet Transform and Random Undersampling Boosting. IEEE
Transactions on Power Systems, 33(6), pp.7171-7180.
[14] Li, W., Logenthiran, T., Phan, V.T. and Woo, W.L., 2019. A novel smart
energy theft system (SETS) for IoT-based smart home. IEEE Internet of
Things Journal, 6(3), pp.5531-5539.
[15] Kumar, P., Lin, Y., Bai, G., Paverd, A., Dong, J.S. and Martin, A.,
2019. Smart grid metering networks: A survey on security, privacy and
open research issues. IEEE Communications Surveys & Tutorials, 21(3),
[16] Viegas, J.L., Esteves, P.R., Melicio, R., Mendes, V.M.F. and Vieira,
S.M., 2017. Solutions for detection of non-technical losses in the
electricity grid: A review. Renewable and Sustainable Energy Reviews,
80, pp.1256-1268.
[17] Abbasinezhad-Mood, D. and Nikooghadam, M., 2018. Efficient
anonymous password-authenticated key exchange protocol to read isolated
smart meters by utilization of extended Chebyshev chaotic maps. IEEE
Transactions on Industrial Informatics, 14(11), pp.4815-4828.
[18] Abbasinezhad-Mood, D. and Nikooghadam, M., 2018. Design and
hardware implementation of a security-enhanced elliptic curve cryptography
based lightweight authentication scheme for smart grid communications.
Future Generation Computer Systems, 84, pp.47-57.
[19] Saeed, Muhammad Salman, Mohd Wazir Mustafa, Nawaf N. Hamadneh,
Nawa A. Alshammari, Usman Ullah Sheikh, Touqeer Ahmed Jumani,
Saifulnizam Bin Abd Khalid, and Ilyas Khan. "Detection of Non-
Technical Losses in Power Utilities—A Comprehensive Systematic
Review." Energies 13, no. 18 (2020): 4727.
[20] Viegas, J.L., Esteves, P.R. and Vieira, S.M., 2018. Clustering-based
novelty detection for identification of non-technical losses. International
Journal of Electrical Power & Energy Systems, 101, pp.301-310.
[21] Punmiya, R. and Choe, S., 2019. Energy theft detection using gradient
boosting theft detector with feature engineering-based preprocessing.
IEEE Transactions on Smart Grid, 10(2), pp.2326-2329.
[22] Buzau, M.M., Tejedor-Aguilera, J., Cruz-Romero, P. and Gomez-
Exposito, A., 2019. Hybrid deep neural networks for detection of non-
technical losses in electricity smart meters. IEEE Transactions on Power
Systems, pp.1-10.
[23] Spiric, J.V., Stankovic, S.S. and Docic, M.B., 2018. Identification
of suspicious electricity customers. International Journal of Electrical
Power & Energy Systems, 95, pp.635-643.
[24] Razavi, R., Gharipour, A., Fleury, M. and Akpan, I.J., 2019. A practical
feature-engineering framework for electricity theft detection in smart
grids. Applied energy, 238, pp.481-494.
[25] Ramos, C.C., Rodrigues, D., de Souza, A.N. and Papa, J.P., 2016. On the
study of commercial losses in Brazil: a binary black hole algorithm for
theft characterization. IEEE Transactions on Smart Grid, 9(2), pp.676-
[26] Khan, S.H., Bennamoun, M., Sohel, F., Togneri, R. and Naseem, I., 2016.
Integrating geometrical context for semantic labeling of indoor scenes
using rgb images. International Journal of Computer Vision, 117(1),
[27] Razavi-Far, R., Farajzadeh-Zanjani, M., Wang, B., Saif, M. and
Chakrabarti, S., 2019. Imputation-based Ensemble Techniques for Class
Imbalance Learning. IEEE Transactions on Knowledge and Data
Engineering, pp.1-14.
[28] He, H., Bai, Y., Garcia, E.A. and Li, S., 2008, June. ADASYN: Adaptive
synthetic sampling approach for imbalanced learning. In 2008 IEEE
international joint conference on neural networks (IEEE world congress
on computational intelligence) (pp. 1322-1328). IEEE.
[29] Rabiya Khalid, Nadeem Javaid, Fahad A. Al-zahrani, Khursheed
Aurangzeb, Emad-ul-Haq Qazi and Tehreem Ashfaq, "Electricity Load and
Price Forecasting Using Jaya-Long Short Term Memory (JLSTM) in
Smart Grids", Entropy, Volume: 22, Issue: 1, Article Number: 10, Pages:
1-21, Published: January 2020, ISSN: 1099-4300.
[30] Muhammad Adil, Nadeem Javaid, Umar Qasim, Ibrar Ullah, Muhammad
Shafiq and Jin-Ghoo Choi, "LSTM and Bat-Based RUSBoost Approach
for Electricity Theft Detection", Applied Sciences, Volume: 10, Issue:
12, Article Number: 4378, Pages: 1-21, Published: June 2020, ISSN:
[31] Greff, K., Srivastava, R.K., Koutnik, J., Steunebrink, B.R. and
Schmidhuber, J., 2016. LSTM: A search space odyssey. IEEE transactions on
neural networks and learning systems, 28(10), pp.2222-2232.
[32] Hu, T., Guo, Q., Shen, X., Sun, H., Wu, R. and Xi, H., 2019. Utilizing
unlabeled data to detect electricity fraud in AMI: A semisupervised deep
learning approach. IEEE transactions on neural networks and learning
systems, 30(11), pp.3287-3299.
[33] Wang, M., Tan, K., Jia, X., Wang, X. and Chen, Y., 2020. A Deep
Siamese Network with Hybrid Convolutional Feature Extraction Module
for Change Detection Based on Multi-sensor Remote Sensing Images.
Remote Sensing, 12(2), p.205. DOI: 10.3390/rs12020205.
[34] Miao, J., Wang, B., Wu, X., Zhang, L., Hu, B. and Zhang, J.Q., 2019,
July. Deep Feature Extraction Based on Siamese Network and Auto-
Encoder for Hyperspectral Image Classification. In IGARSS 2019-2019
IEEE International Geoscience and Remote Sensing Symposium (pp.
397-400). IEEE.
[35] Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R. and Lin, C.J., 2008.
LIBLINEAR: A library for large linear classification. Journal of machine
learning research, 9(Aug), pp.1871-1874.
[36] Rabiya Khalid and Nadeem Javaid, "A Survey on Hyperparameters
Optimization Algorithms of Forecasting Models in Smart Grid", Sustainable
Cities and Society, Pages: 1-35, Article Number: 102275, Published:
2020, ISSN: 2210-6707.
[37] Hasan, M., Toma, R.N., Nahid, A.A., Islam, M.M. and Kim, J.M., 2019.
Electricity Theft Detection in Smart Grid Systems: A CNN-LSTM Based
Approach. Energies, 12(17), pp.3310-3328.
NADEEM JAVAID (S’8, M’11, SM’16) received the bachelor degree in computer science from Gomal University, Dera Ismail Khan, Pakistan, in 1995, the master degree in electronics from Quaid-i-Azam University, Islamabad, Pakistan, in 1999, and the Ph.D. degree from the University of Paris-Est, France, in 2010. He is currently an Associate Professor and the Founding Director of the Communications Over Sensors (ComSens) Research Laboratory, Department of Computer Science, COMSATS University Islamabad, Islamabad. He has supervised 126 master and 20 Ph.D. theses. He has authored over 900 articles in technical journals and international conferences. His research interests include energy optimization in smart/micro grids and in wireless sensor networks, data analytics in smart grids, and blockchain in WSNs. He was the recipient of the Best University Teacher Award from the Higher Education Commission of Pakistan, in 2016, and the Research Productivity Award from the Pakistan Council for Science and Technology, in 2017. He is also Associate Editor of IEEE Access, Editor of the International Journal of Space-Based and Situated Computing and Editor of Sustainable Cities and Society.
NAEEM JAN received the B.S. degree in computer science from PMAS, University Institute of Information Technology Rawalpindi, Pakistan, in 2014, and the M.S. degree in computer science, under the supervision of Dr. N. Javaid, from the Department of Computer Science, COMSATS University Islamabad, Islamabad Campus, Pakistan, in 2017. He is currently with the ComSens (Communication over Sensors) Research Laboratory, COMSATS University Islamabad. His research interests include wireless sensor networks, optimization techniques, big data analysis, and the Internet of Things.
MUHAMMAD UMAR JAVED received the bachelor’s and master’s degrees in electrical engineering from Government College University Lahore, Lahore, Pakistan, in 2014 and 2018, respectively. He is currently pursuing the Ph.D. degree in computer science with the Communications Over Sensors (ComSens) Research Laboratory, COMSATS University Islamabad, Islamabad Campus, under the supervision of Dr. Nadeem Javaid. His research interests include smart grids, electric vehicles, big data analysis and blockchain.
... A Siamese network (Bedi et al., 2020;Javaid & Jan, 2021;Zhang et al., 2022;Zhao et al., 2022) is a type of neural network architecture that is commonly used for tasks such as similarity matching, face recognition, signature verification, and others. It consists of two identical subnetworks that share the same weights and are trained on pairs of input examples. ...
... (n.d.); (b) first, the support vector machine (SVM) (Liu et al., 2011;Valero-Carreras et al., 2023) is used to generate correctly predicted candidate solutions to build solution pairs; (c) then, the key advantages of Siamese network (Bedi et al., 2020;Javaid & Jan, 2021;Zhang et al., 2022;Zhao et al., 2022) such as; use of identical networks with same configuration, same parameters and same weights are utilized; (d) then, correctly predicted candidate solutions are fed to this Siamese network which is more robust to class imbalance with one-shot learning approach to get better predictions with less number of candidate solutions; (e) as per empirical observations, a very less number of samples belonging to minority class are being correctly predicted by SVM, therefore, this Siamese network has been improvised to generate more synthetic samples by building optimized number of candidate solution pairs using HYNAA for improving the classification accuracy by maintaining a balance between minority and majority class samples; and (f) finally, the performance of proposed strategy has been measured by comparing with basic SMOTE and our previous work SMOTE-PSOEV (Rout et al., 2022) based on ROC-AUC learning curves and the various performance of the strategies and the computational effectiveness of this work has been evaluated. ...
... The outputs of the two networks are then compared to produce a similarity score or distance metric. Considering the key advantages of Siamese networks few authors have used this network to deal with this data imbalance problem by hybridizing this network with some other methodologies (Bedi et al., 2020;Javaid & Jan, 2021;Zhang et al., 2022;Zhao et al., 2022). Zhao et al. (2022) proposed a hybridized network of two CNNs in the Siamese network with same features and same weights named as SCNN and online reweighted example (ORE) to handle data imbalance learning. ...
Full-text available
Dealing with imbalanced data is a common challenge in machine learning, where one class has significantly fewer examples than another. Successfully addressing this challenge requires careful consideration of the data, algorithm, and evaluation metrics to ensure that the model accurately predicts the minority class. In this study, we present a hybrid approach called Siamese‐HYNAA, which combines a Siamese network and a population‐based optimizer hypercube natural aggregation algorithm (HYNAA) to generate candidate solutions for augmenting the minority class. We collected 10 imbalanced datasets ranging from 1.81 to 8.78 imbalanced ratios and built solution pairs based on correctly predicted candidate solutions using support vector machine (SVM). We then fed these solutions to the Siamese network, which employs a one‐shot learning approach to improve predictions with fewer candidate solutions. However, we found that SVM predicted only a small number of minority class samples accurately, prompting us to optimize the number of candidate solution pairs using HYNAA to generate more synthetic samples for the Siamese network. We evaluated our proposed strategy against basic SMOTE and our previous work, SMOTE‐PSOEV, using various performance measures, including ROC‐AUC learning curves, sensitivity, specificity, accuracy, Characteristic stability index, balanced accuracy, F1‐score, informedness, markedness, and execution time. Our results indicate that Siamese‐HYNAA generates promising results for imbalanced data.
... al [43] introduced pairwise loss for the Siamese network by reducing the intra class distance in the feature space while increasing the inter class distance. Also, Siamese networks come in handy and provides good accuracy with an imbalanced class distribution [23,24]. Therefore, to handle problems like imbalanced class distribution, limited Content courtesy of Springer Nature, terms of use apply. ...
... It enhances the discriminative information and modelling of spatial features making the proposed model an end to end model without a need of additional dimensionality reduction approaches. Also, the proposed model is less complex than current CNN models such as VGG Net or Inception Net and eliminates the overfitting problem [23]. With a larger number of layers and a smaller dataset, the lower layers are more likely to suffer from the vanishing gradient problem, which causes dense models to fail in such cases. ...
... Siamese Network presents a similarity measure that may be utilized for image classification challenges, particularly when the number of categories is large or the dataset has an imbalanced class distribution. [23,24]. Distance based methods are quite common solution to such problems as they compute a similarity metric between the pattern to be classified and a database of stored patterns using the same neural network [5]. ...
Full-text available
The prominence of deep learning models for classification of hyperspectral images is directly proportional to their ability to exploit spatial context and spectral bands jointly. The effectiveness of these deep learning models, however, is heavily reliant on a good amount of labelled training samples. In contrast, one of the biggest challenges with hyperspectral images is limited labelled samples availability as getting the samples annotated is a time consuming and labor-intensive process. Traditional machine learning algorithms are available for classification with a higher training time and very deep pre-trained networks like GoogleNet and VGGNet did not work well for hyperspectral image classification. The idea of one shot classification has been quite motivating in recent years to deal with the problems of limited labelled samples, imbalanced distribution of samples leading to poor classification results and overfitting. To implement one shot classification model and overcome these challenges, the proposed work is based on Siamese network that can work with limited samples or imbalanced samples. The proposed Siamese network has a handcrafted feature generation network that extracts discriminative features from the image. Experimental findings on two benchmark hyperspectral datasets demonstrate that the proposed network is capable of improving the classification performance with an overall accuracy of 95.17 and 93.25 for Pavia U and Indian Pines dataset respectively with a small scale trained data.
... The process of filling in missing values follows similar interpolation methods by sliding a time window over a specific time interval. In [20], a DL algorithm is designed to achieve both similar patterns isolation and adaptive learning features for NTL detection in smart grids. In this context, the CNN algorithm and LSTM are used together to solve complexity and data-related dynamism respectively. ...
... Moreover, one of the main contributions to data imbalance problem is the use of adaptive synthesis balance (ASB), while missing values were treated in the same way as in previous works. In [21], a similar but more complex algorithm than the one constructed in [20] is introduced for NTL detection. This time, a 2D-CNN and bidirectional LSTM (Bi-LSTM), which is a further complex variant of LSTM are used for NTL detection. ...
... Considering these works from the perspective of representation learning evolution, the timeline indicates that with each year, the upcoming methods become more and more complex, moving from an ordinary deep network to a hybrid combination of two or more models. Not to mention the adaptive learning features [20], [21], [22] and generative models [18], [21], [23]. The work reported in [19] was the only one that mainly focuses on the study of balancing methods more than the importance of data representations. ...
Electricity theft, known as "Non-Technical Loss" (NTL) is certainly one of the priorities of power distribution utilities. Indeed, NTL could lead to serious damage ranging from massive financial losses to loss of reputation resulting from poor power quality. With advances in metering infrastructure technologies, the availability of user data has fueled the emergence of data-driven methods in NTL detection. Among these methods, deep learning (DL) is an indisputable alternative to conventional human-centric approaches. Typically, modeling based on NTL data is subject to three main challenges, including (i) missing information; (ii) class imbalance; and (iii) data complexity. In this context, this paper contributes to solving these three main problems while paying more attention to data complexity related to cardinality. Accordingly, a multiverse recurrent expansion with multiple repeats (MV-REMR) algorithm is proposed in this paper. MV-REMR is able to provide deeper representations than ordinary DL networks and take advantage of different trained deep network responses to build an efficient model. For MV-REMR efficiency analysis, a realistic NTL dataset is considered. As a result, MV-REMR has shown that it can achieve what is considered excellent feature mapping proven by both scatter visualization and variations in widely used classification metrics. Moreover, MV-REMR shows its ability to marginalize the distance of data classes with superior performance. In addition, thanks to the new mapping scheme, MV-REMR shows its ability to correct outliers resulting from errors in missing values filling techniques. Finally, a comparison with some recent successful works also confirms the superiority of the MV-REMR model.
... Based on the distribution and demand side management of SGs, as well as the classification of SG technologies (i.e., communication technology, information provision, computing intelligence, and cybersecurity in Figure 1), the related works on the applications of SGs using various techniques are classified in Table 1. Neural networks [10,11,21], smart meters [12][13][14]22], and artificial intelligence [15] were adopted for power control; cyberphysical systems [18,23,24], big data [16,19,25], machine learning [17,18,26], AI [20,27] were adopted for demand side management, and network security [28][29][30][31][32] was used for communication and information transmission. All of them were implemented in smart manufacturing applications. ...
Full-text available
To enable highly automated manufacturing and net-zero carbon emissions, manufacturers have invested heavily in smart manufacturing. Sustainable and smart manufacturing involves improving the efficiency and environmental sustainability of various manufacturing operations such as resource allocation, data collecting and monitoring, and process control. Recently, a lot of artificial intelligence and optimization applications based on smart grid systems have improved the energy usage efficiency in various manufacturing operations. Therefore, this survey collects recent works on applications of artificial intelligence and optimization for smart grids in smart manufacturing and analyzes their features, requirements, and challenges. In addition, potential trends and further challenges for the integration of smart grids with renewable energies for smart manufacturing, applications of 5G and B5G (beyond 5G) technologies in the SG system, and next-generation smart manufacturing systems are discussed to provide references for further research.
... Weight coefficients emerge solely during the learning phase, culminating in the generation of a neuron activation signal destined for the subsequent layer of the network. Eventually, the neuron outputs' products are aggregated according to their respective weights, delineating the conclusive output, a process documented in [70,[73][74][75][76][77][78][79][80][81][82][83][84][85][86][87][88][89][90]. ...
Nontechnical losses of electrical energy (NTLEE) have been a persistent issue in both the Russian and global electric power industries since the end of the 20th century. Every year, these losses result in tens of billions of dollars in damages. Promptly identifying unscrupulous consumers can prevent the onset of NTLEE sources, substantially reduce the amount of NTLEE and economic damages to network grids, and generally improve the economic climate. The contemporary advancements in machine learning and artificial intelligence facilitate the identification of NTLEE sources through anomaly detection in energy consumption data. This article aims to analyze the current efficacy of computational methods in locating, detecting, and identifying nontechnical losses and their origins, highlighting the application of neural network technologies. Our research indicates that nearly half of the recent studies on identifying NTLEE sources (41%) employ neural networks. The most utilized tools are convolutional networks and autoencoders, the latter being recognized for their high-speed performance. This paper discusses the main metrics and criteria for assessing the effectiveness of NTLEE identification utilized in training and testing phases. Additionally, it explores the sources of initial data, their composition, and their impact on the outcomes of various algorithms.
... The traditional power grids are being revolutionized by the introduction of information and communication technologies, which make them intelligent. Smart grids allow for efficient energy management [11], electricity price and load forecasting [12,13,14,15,16], electricity consumption (EC) behavior characterization [17], [18] and integration of renewable energy sources [19], [20]. This advancement also brings the notion of smart meters that record EC of consumers at higher frequency. ...
The reduction of electricity theft in power sectors is an important concern of utilities as it represents an essential part of their total potential benefits. The existing electricity theft detection (ETD) approaches have unsatisfactory performance due to high dimensional and imbalanced data. Moreover, the existing approaches also have ineffective results due to the limited availability of supervised data and auxiliary information. Therefore, we introduce a new mechanism that is based on two scenarios. In the first scenario, a new supervised learning solution is presented, which is a combination of UNet and generative adversarial network (GAN), named as UNet-GAN. The GAN's structure is mainly comprised of two neural networks: generator and discriminator. Due to an excellent performance of UNet, we utilize it in both generator and discriminator parts. These two neural networks contest with each other in a game-theoretic manner to significantly boost the ETD performance. In the second scenario, a novel dynamic learning based semi-supervised solution is proposed that consists of probabilistic guider (PG) and Ladder network. The solution is termed as PG-Ladder network. PG dynamically guides the proposed network to further improve its performance in terms of ETD. Furthermore, the performance of the proposed solution is evaluated over the suitable classification indicators using real electricity consumption (EC) records. The results indicate more efficient performance of the proposed solution than the state-of-the-art approaches regarding the ETD.
... The optimal threshold and the positive class prior exhibit a substantial correlation, according to linear models and visualizations. Adaptive synthesis reliant on big data analytics has been developed by Javaid et al. [42] for the detection of electricity theft (ETD). The authors developed a deep Siamese network by combining LSTM and CNN to distinguish fraud and genuine customers. ...
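The adaptive synthesis (ADASYN) idea mentioned in this excerpt can be illustrated with a minimal sketch. It follows the general ADASYN procedure (more synthetic minority samples are generated around minority points that have more majority-class neighbours) and is not the implementation used by Javaid et al. The function names `adasyn` and `_nearest`, the parameters `k` and `beta`, and their default values are assumptions made for illustration only.

```python
import math
import random


def _nearest(point, candidates, k):
    """Return indices of the k candidates closest to point (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda j: math.dist(point, candidates[j]))
    return order[:k]


def adasyn(minority, majority, k=5, beta=1.0, seed=0):
    """Generate synthetic minority samples (illustrative sketch, not the paper's code).

    minority, majority: lists of equal-length numeric feature vectors.
    k: neighbourhood size (assumed default).
    beta: desired balance level in (0, 1]; 1.0 aims at full balance.
    """
    rng = random.Random(seed)
    everything = minority + majority          # minority points come first
    n_min = len(minority)
    total_to_make = int((len(majority) - n_min) * beta)

    # Step 1: for each minority point, the share of majority points
    # among its k nearest neighbours (the point itself is excluded).
    ratios = []
    for i, x in enumerate(minority):
        neigh = [j for j in _nearest(x, everything, k + 1) if j != i][:k]
        ratios.append(sum(1 for j in neigh if j >= n_min) / k)

    norm = sum(ratios)
    if norm == 0 or total_to_make <= 0:
        return []                              # nothing hard to learn, or already balanced

    # Step 2: allocate synthetic samples in proportion to the normalised ratios
    # and interpolate towards a randomly chosen minority neighbour.
    synthetic = []
    for i, x in enumerate(minority):
        g_i = round(ratios[i] / norm * total_to_make)
        others = [m for j, m in enumerate(minority) if j != i]
        if not others:
            continue                           # a single minority point cannot be interpolated
        neigh = _nearest(x, others, min(k, len(others)))
        for _ in range(g_i):
            z = others[rng.choice(neigh)]
            lam = rng.random()                 # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(x, z)])
    return synthetic
```

Because of rounding, the number of generated samples may differ slightly from the target. Each synthetic point lies on the line segment between two existing minority samples, so no value falls outside the range already observed in the minority class.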
In today's digital world, information is growing along with the expansion of Internet usage worldwide. As a consequence, a large volume of data, known as “Big Data”, is generated constantly. Big data analytics is one of the fastest-evolving technologies of the twenty-first century; it is a promising field for extracting knowledge from very large datasets and enhancing benefits while lowering costs. Owing to the success of big data analytics, the healthcare sector is increasingly adopting these approaches to diagnose diseases. The recent boom in medical big data and the development of computational methods have enabled researchers and practitioners to mine and visualize medical big data on a larger scale. Thus, by integrating big data analytics into healthcare, precise medical data analysis is now feasible, along with early disease detection, health status monitoring, patient treatment, and community services. Against this background, this comprehensive review considers COVID-19, a deadly disease, with the intention of offering remedies that use big data analytics. Big data applications are vital to managing pandemic conditions, for example predicting COVID-19 outbreaks and identifying cases and patterns of spread. Research on leveraging big data analytics to forecast COVID-19 is ongoing. However, precise and early identification of COVID-19 is still lacking because of the volume of medical records, such as dissimilar medical imaging modalities. Meanwhile, digital imaging has become essential to COVID-19 diagnosis, but the main challenge is the storage of massive volumes of data. Taking these limitations into account, a comprehensive analysis is presented in this systematic literature review (SLR) to provide a deeper understanding of big data in the field of COVID-19.
Deep learning (DL) has achieved great success in the field of electricity theft detection (ETD). Most existing studies have used a supervised mode for DL-based ETD, but they lack the capability of incremental detection, especially in small sample size scenarios. To address this problem, this paper proposes a semi-supervised ETD approach based on a hybrid replay strategy. From the data perspective, this paper designs a hybrid replay strategy that includes variational autoencoder (VAE) and sample scrambling ranking (SSR) methods, and uses a “rehearsal” method to obtain incremental ETD capability. From the detection method perspective, this paper designs a semi-supervised ETD architecture that uses a temporal convolutional attention network (TCAN) as a feature extractor and uses contrastive learning to improve the utilization of unlabeled sensing samples, thus reducing the labeled sample size required for the fine-tuning process. Experimental results on the Irish smart energy trial (ISET) dataset show that the proposed scheme effectively solves the problem of incremental ETD with small sample sizes, and achieves 92.72%, 92.70%, and 92.57% on accuracy, precision, and F1-score, respectively.
Electricity theft and fraud in energy consumption have been two of the major issues for power distribution companies (PDCs) for many years. PDCs around the world are trying different methodologies for detecting electricity theft. The traditional methods for non-technical loss (NTL) detection, such as onsite inspection and reward and penalty policies, have lost their place in the modern era because they are ineffective and time-consuming. With the advancement in the field of Artificial Intelligence (AI), newer and more efficient NTL detection methods have been proposed by researchers working in data mining and AI. The AI-based NTL detection methods are superior to the conventional methods in terms of accuracy, efficiency, time consumption, precision, and labor required. The importance of such AI-based NTL detection methods can be judged from the growing number of research articles on this development. However, the authors found no comprehensive study that provides a one-stop source of information on these AI-based NTL methods, which motivated this review. This article systematically reviews and classifies the methods explored for NTL detection in recent literature, along with their benefits and limitations. To this end, the articles selected for the review are classified based on the algorithms used, features extracted, and metrics used for evaluation. Furthermore, a summary of the different types of algorithms used for NTL detection is provided along with their applications in the studied field. Lastly, a comparison among the major NTL categories, i.e., data-based, network-based, and hybrid methods, is provided on the basis of their performance, expenses, and response time. It is expected that this comprehensive study will provide a one-stop source of information for new researchers and experts working in this area.
The electrical losses in power systems are divided into non-technical losses (NTLs) and technical losses (TLs). NTL is more harmful than TL because it includes electricity theft, faulty meters and billing errors. It is one of the major concerns in the power system worldwide and incurs a huge revenue loss for utility companies. Electricity theft detection (ETD) is the mechanism used by industry and academia to detect electricity theft. However, due to imbalanced data, overfitting issues and the handling of high-dimensional data, the ETD cannot be applied efficiently. Therefore, this paper proposes a solution to address the above limitations. A long short-term memory (LSTM) technique is applied to detect abnormal patterns in electricity consumption data along with the bat-based random under-sampling boosting (RUSBoost) technique for parameter optimization. Our proposed system model uses the normalization and interpolation methods to pre-process the electricity data. Afterwards, the pre-processed data are fed into the LSTM module for feature extraction. Finally, the selected features are passed to the RUSBoost module for classification. The simulation results show that the proposed solution resolves the issues of data imbalancing, overfitting and the handling of massive time series data. Additionally, the proposed method outperforms the state-of-the-art techniques; i.e., support vector machine (SVM), convolutional neural network (CNN) and logistic regression (LR). Moreover, the F1-score, precision, recall and receiver operating characteristics (ROC) curve metrics are used for the comparative analysis.
Renewable energy sources (RESs) are considered to be reliable and green electric power generation sources. Photovoltaics (PVs) and wind turbines (WTs) are used to provide electricity in remote areas. Optimal sizing of hybrid RESs is a vital challenge in a stand-alone environment. The meta-heuristic algorithms proposed in the past are dependent on algorithm-specific parameters for achieving an optimal solution. This paper proposes a hybrid algorithm of Jaya and a teaching-learning-based optimization (TLBO) named the JLBO algorithm for the optimal unit sizing of a PV-WT-battery hybrid system to satisfy the consumer's load at minimal total annual cost (TAC). The reliability of the system is considered by a maximum allowable loss of power supply probability (LPSPmax) concept. The results obtained from the JLBO algorithm are compared with the original Jaya, TLBO, and genetic algorithms. The JLBO results show superior performance in terms of TAC, and the PV-WT-battery hybrid system is found to be the most economical scenario. This system provides a cost-effective solution for all proposed LPSPmax values as compared with PV-battery and WT-battery systems.
Forecasting in the smart grid (SG) plays a vital role in maintaining the balance between demand and supply of electricity, efficient energy management, better planning of energy generation units and renewable energy sources, and their dispatching and scheduling. Existing forecasting models are being used and new models are developed for a wide range of SG applications. These algorithms have hyperparameters which need to be optimized carefully before forecasting. The optimized values of these hyperparameters increase the forecasting accuracy significantly. In this paper, we present a brief literature review of forecasting models and the optimization methods used to tune their hyperparameters. In addition, we discuss the data preprocessing methods. A comparative analysis of these forecasting models, according to their hyperparameter optimization, error methods and preprocessing methods, is also presented. Besides, we critically analyze the existing optimization and data preprocessing models and highlight the important findings. A survey of existing survey papers is also presented, and their recency score is computed based on the number of recent papers reviewed in them. By recent, we mean papers published in the year in which the survey paper was published or in the three preceding years. Finally, future research directions are discussed in detail.
Information extraction from multi-sensor remote sensing images has increasingly attracted attention with the development of remote sensing sensors. In this study, a supervised change detection method, based on the deep Siamese convolutional network with hybrid convolutional feature extraction module (OB-DSCNH), has been proposed using multi-sensor images. The proposed architecture, which is based on dilated convolution, can extract the deep change features effectively, and the character of “network in network” increases the depth and width of the network while keeping the computational budget constant. The change decision model is utilized to detect changes through the difference of extracted features. Finally, a change detection map is obtained via an uncertainty analysis, which combines the multi-resolution segmentation, with the output from the Siamese network. To validate the effectiveness of the proposed approach, we conducted experiments on multispectral images collected by the ZY-3 and GF-2 satellites. Experimental results demonstrate that our proposed method achieves comparable and better performance than mainstream methods in multi-sensor images change detection.
In the smart grid (SG) environment, consumers are enabled to alter electricity consumption patterns in response to electricity prices and incentives. This results in prices that may differ from the initial price pattern. Electricity price and demand forecasting play a vital role in the reliability and sustainability of SG. Forecasting using big data has become a new hot research topic as a massive amount of data is being generated and stored in the SG environment. Electricity users, having advanced knowledge of prices and demand of electricity, can manage their load efficiently. In this paper, a recurrent neural network (RNN), long short term memory (LSTM), is used for electricity price and demand forecasting using big data. Researchers are working actively to propose new models of forecasting. These models contain a single input variable as well as multiple variables. From the literature, we observed that the use of multiple variables enhances the forecasting accuracy. Hence, our proposed model uses multiple variables as input and forecasts the future values of electricity demand and price. The hyperparameters of this algorithm are tuned using the Jaya optimization algorithm to improve the forecasting ability and the training of the model. Parameter tuning is necessary because the performance of a forecasting model depends on the values of these parameters. Selection of inappropriate values can result in inaccurate forecasting. So, integration of an optimization method improves the forecasting accuracy with minimum user effort. For efficient forecasting, the data is preprocessed: missing values are handled and outliers are removed using the z-score method. Furthermore, the data is normalized before forecasting. The forecasting accuracy of the proposed model is evaluated using the root mean square error (RMSE) and mean absolute error (MAE). For a fair comparison, the proposed forecasting model is compared with univariate LSTM and support vector machine (SVM). The values of the performance metrics show that the proposed model has higher accuracy than SVM and univariate LSTM.
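The preprocessing and evaluation steps named in this abstract (z-score outlier removal, normalization, RMSE and MAE) can be sketched as follows. This is a minimal illustration and not the authors' code: the cut-off of 3 standard deviations, the choice of min-max scaling, and all function names are assumptions, since the abstract does not specify them.

```python
import math
from statistics import mean, pstdev


def remove_outliers_zscore(values, threshold=3.0):
    """Keep only points whose absolute z-score is within threshold (assumed cut-off)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)                    # constant series: nothing to remove
    return [v for v in values if abs((v - mu) / sigma) <= threshold]


def min_max_normalize(values):
    """Scale values linearly to [0, 1] (one common normalization; assumed here)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(mean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])
```

RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging, which is one reason the two metrics are often reported together.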
Among an electricity provider's non-technical losses, electricity theft has the most severe and dangerous effects. Fraudulent electricity consumption decreases the supply quality, increases generation load, causes legitimate consumers to pay excessive electricity bills, and affects the overall economy. The adaptation of smart grids can significantly reduce this loss through data analysis techniques. The smart grid infrastructure generates a massive amount of data, including the power consumption of individual users. Utilizing this data, machine learning and deep learning techniques can accurately identify electricity theft users. In this paper, an electricity theft detection system is proposed based on a combination of a convolutional neural network (CNN) and a long short-term memory (LSTM) architecture. CNN is a widely used technique that automates feature extraction and the classification process. Since the power consumption signature is time-series data, we were led to build a CNN-based LSTM (CNN-LSTM) model for smart grid data classification. In this work, a novel data pre-processing algorithm was also implemented to compute the missing instances in the dataset, based on the local values relative to the missing data point. Furthermore, in this dataset, the count of electricity theft users was relatively low, which could have made the model inefficient at identifying theft users. This class imbalance scenario was addressed through synthetic data generation. Finally, the results obtained indicate the proposed scheme can classify both the majority class (normal users) and the minority class (electricity theft users) with good accuracy.
Correct classification of rare samples is a vital data mining task and of paramount importance in many research domains. This paper mainly focuses on the development of the novel class-imbalance learning techniques, which make use of oversampling methods integrated with bagging and boosting ensembles. Two novel oversampling strategies based on the single and the multiple imputation methods are proposed. The proposed techniques aim to create useful synthetic minority class samples, similar to the original minority class samples, by estimation of missing values that are already induced in the minority class samples. The re-balanced datasets are then used to train base-learners of the ensemble algorithms. In addition, the proposed techniques are compared with the commonly used class imbalance learning methods in terms of three performance metrics including AUC, F-measure, and G-mean over several synthetic binary class datasets. The empirical results show that the proposed multiple imputation-based oversampling combined with bagging significantly outperforms other competitors.
Non-technical losses in electricity utilities are responsible for major revenue losses. In this paper, we propose a novel end-to-end solution to self-learn the features for detecting anomalies and frauds in smart meters using a hybrid deep neural network. The network is fed with simple raw data, removing the need of handcrafted feature engineering. The proposed architecture consists of a long short-term memory network and a multi-layer perceptrons network. The first network analyses the raw daily energy consumption history whilst the second one integrates non-sequential data such as its contracted power or geographical information. The results show that the hybrid neural network significantly outperforms state-of-the-art classifiers as well as previous deep learning models used in non-technical losses detection. The model has been trained and tested with real smart meter data of Endesa, the largest electricity utility in Spain.