
Data Augmentation using BiWGAN, Feature Extraction and Classification by Hybrid 2DCNN and BiLSTM to Detect Non-Technical Losses in Smart Grids



In this paper, we present a hybrid deep learning model that is based on a two-dimensional convolutional neural network (2D-CNN) and a bidirectional long short-term memory network (Bi-LSTM) to detect non-technical losses (NTLs) in smart meters. NTLs occur due to the fraudulent use of electricity. The global integration of smart meters has proven to be beneficial for the storage of historical electricity consumption (EC) data. The proposed methodology learns deep insights from the historical EC data and informs power utilities about the presence of NTLs. However, the effective detection of NTLs faces the problem of class imbalance, which occurs because fraudulent electricity consumers are rarely available. To solve this issue, an evolutionary bidirectional Wasserstein generative adversarial network (Bi-WGAN) is employed. Bi-WGAN synthesizes the most plausible fraudulent EC samples by integrating an auxiliary encoder module. Besides, the inevitable curse of high dimensional data reduces the generalization ability of classifiers. The proposed hybrid model efficiently handles the highly dynamic data by utilizing its potent feature extracting capabilities. The one-dimensional daily EC data is passed to the Bi-LSTM model for capturing the non-malicious changes from consumers’ profiles. Meanwhile, 2D-CNN takes 2D weekly EC data as input to extract the potential features by applying different convolution and pooling operations. Extensive experiments are conducted on a realistic smart meter dataset to prove the effectiveness of the proposed model. The results show that the proposed model outperforms the state-of-the-art models by achieving an area under the curve receiver operating characteristics of 0.97 and a precision-recall area under the curve of 0.98, which makes it suitable for real-world scenarios.
Received January 12, 2022, accepted February 4, 2022, date of publication February 8, 2022, date of current version March 16, 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3150047
MUHAMMAD ASIF 1, (Graduate Student Member, IEEE), OROOJ NAZEER1,2,
1Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
2Department of Computing and Technology, Abasyn University, Islamabad 44000, Pakistan
3School of Computer Science, University of Technology Sydney, Ultimo, NSW 2007, Australia
4Department of Computer Science, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia
5Department of Computer Sciences, College of Computer and Information Science, Princess Nourah Bint Abdulrahman University, Riyadh 11671, Saudi Arabia
Corresponding author: Nadeem Javaid (
This work is supported by Taif University Researchers Supporting Project number (TURSP-2020/292) Taif University, Taif, Saudi Arabia.
This work is also supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number
(PNURSP2022R193), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
INDEX TERMS Bidirectional generative adversarial network, convolutional neural network, data
augmentation, deep learning, electricity theft detection, feature extraction, long short-term memory network,
non-technical losses, smart grids.
The associate editor coordinating the review of this manuscript and
approving it for publication was Sotirios Goudos.
Nowadays, the major activities of human life depend on electricity, which has become an
essential part of daily living. In the modern era, electricity is generated in a variety of
ways, such as hydro power, wind power, fuel power and thermal power. However, different
losses occur during the generation of electricity [1]. The most common losses are classified
into technical losses (TLs) and non-technical losses (NTLs). TLs happen because of the
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see 27467
M. Asif et al.: Data Augmentation Using BiWGAN, Feature Extraction and Classification by Hybrid 2DCNN and BiLSTM
heat production in electrical distribution lines, short circuits in transformers or other
grid components, etc., whereas NTLs occur due to energy theft, meter bypassing, meter
malfunctioning, billing errors, etc. The major source of NTLs is electricity theft (ET).
Power utilities around the globe lose billions of dollars per annum due to NTLs. The
electric utilities in the United States of America lose almost $6 billion every year
because of NTLs [2]. Similarly, the Chinese power companies lost almost $15 million till
2018 as a result of energy theft [3]. Underdeveloped countries, such as Brazil and India,
are also affected by NTLs; they lose approximately 16% and 25% of their total energy
supply, respectively [4]. Besides the huge financial loss, NTLs also disturb the normal
flow of electricity by overloading the transformers and the grid’s internal components.
The recent enhancement in advanced metering infrastructure (AMI) integrates communication
flow with energy flow to enable the cooperation between consumers and electric utilities.
The integration of AMI brings potential benefits, such as efficient recording of
electricity usage, remote control of electricity consumption (EC), real-time pricing and
grid status information that power utilities can use to detect NTLs. However, it also
introduces numerous ways for electricity thieves to remotely compromise the smart metering
systems and manipulate meter readings [5]. Keeping the above concerns in view, electricity
theft detection (ETD) has become essential in the modern era. In addition, the availability
of massive EC data enables researchers to exploit state-of-the-art data driven methods for
better ETD.
According to the literature, researchers have performed ETD using a variety of statistical
and machine learning (ML) methods [6], [7]. In general, three types of methods are commonly
used for ETD: i) state based methods, ii) game theory based methods and iii) data driven
methods. In state or hardware based methods, special devices and sensors are integrated
with the smart meters to detect abnormal consumers [8]. However, these methods are costly
in terms of both time and money. Moreover, extra cost is incurred for the installation and
maintenance of these devices. In game theory based methods, a virtual environment is
initially created. Then, a game is played between electric utilities and consumers to
perform ETD [9]. A special utility function is formulated in which the rules and
regulations are defined. The game stops when the equilibrium state is achieved. However,
these methods have not proven to be effective because designing a suitable utility function
for complex scenarios is a challenging task. In contrast, data driven methods require only
data for model training, so they are cost effective solutions for ETD. The massive
availability of EC data enables the application of numerous data driven solutions.
Researchers have adopted different supervised and unsupervised ML solutions to detect
electricity thieves and help the power industry reduce revenue loss.
TABLE 1. List of Acronyms.
In recent literature, a variety of supervised and unsupervised methods are adopted to
detect energy thieves in smart grids. In this regard, several machine and deep learning
based solutions are proposed by researchers to perform ETD [1], [10]–[13]. However, these
solutions do not provide satisfactory results because of inefficient feature engineering.
Poor feature engineering also degrades the generalization ability of models. Moreover, the
limited amount of labeled EC data is another underlying cause that decreases the detection
accuracy. Furthermore, in deep learning models, the problem of internal covariate shift
(ICS) adversely affects the stable learning of hidden layers [1], [14]. ICS occurs when the
distribution of the inputs to a hidden layer shifts during training as the parameters of
the preceding layers change. The severe lack of fraudulent electricity consumers in
real-world scenarios creates a class imbalance problem, which is an important concern for
efficient ETD [1], [5], [14], [15]. In addition, noisy and high dimensional data leads to
the curse of dimensionality issue, which is confronted by researchers during ETD [14].
Keeping the above concerns in view, we propose a novel deep learning solution to improve
the detection accuracy of ETD in power grids. The proposed model consists of a
two-dimensional convolutional neural network (2D-CNN) and a bidirectional long short-term
memory (Bi-LSTM) network. A bidirectional Wasserstein generative adversarial network
(Bi-WGAN) is exploited for synthesizing the minority class theft samples. The
one-dimensional (1D) daily EC data is converted into a 2D form according to weeks. 2D-CNN
is developed to capture the weekly insights and periodicity from the 2D weekly data.
Meanwhile, Bi-LSTM takes the 1D data as input and extracts the long-term temporal
correlation from EC profiles. It also overcomes the effects of non-malicious factors and,
consequently, reduces the high false positive rate (FPR). Finally, a single feature vector
is formed by merging the outcomes of both models. Then, a sigmoid function is employed for
the final ETD. It is worth mentioning that this work is an extension of [16].
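The conversion of 1D daily EC data into a 2D weekly form can be illustrated with a minimal sketch. The function name `to_weekly_matrix` and the choice to drop trailing days that do not fill a complete week are assumptions made for illustration; the text above does not state how incomplete weeks are handled.

```python
from typing import List


def to_weekly_matrix(daily: List[float], days_per_week: int = 7) -> List[List[float]]:
    """Reshape a 1D daily series into a 2D list with one row per week.

    Assumption: trailing days that do not fill a whole week are dropped.
    Padding the last week would be an equally reasonable choice.
    """
    n_weeks = len(daily) // days_per_week
    return [
        daily[w * days_per_week:(w + 1) * days_per_week]
        for w in range(n_weeks)
    ]


# 16 hypothetical daily readings give 2 complete weeks of 7 days each.
daily_ec = [float(d) for d in range(1, 17)]
weekly_ec = to_weekly_matrix(daily_ec)
print(len(weekly_ec), len(weekly_ec[0]))
```

In this layout, each column holds the readings of one weekday, so a 2D convolution can see the same weekday across neighboring weeks, while the untouched 1D series remains available for the recurrent branch.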
The major contributions of this study are listed as follows.
1) A novel methodology is introduced, which combines the 2D-CNN and Bi-LSTM models. The
proposed model efficiently performs feature extraction and resolves the curse of
dimensionality issue.
2) The Bi-WGAN model is employed to resolve the inevitable class imbalance problem. The
samples generated by the model are closely related to real-world theft patterns. To the
best of our knowledge, we are the first to apply Bi-WGAN in the ETD domain for augmenting
the theft class samples.
3) The Bi-LSTM model is leveraged to handle the problem of high FPR, which occurs due to
several non-malicious factors. The model captures long-term tendencies and temporal
correlations from the EC data to minimize the effects of non-malicious changes.
4) For a comprehensive analysis of the proposed model, area under the curve (AUC),
precision, recall, AUC receiver operating characteristics (AUC-ROC), precision-recall AUC
(PR-AUC), F1-score and Matthews correlation coefficient (MCC) metrics are considered.
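For reference, several of the metrics named above can be computed from a confusion matrix and a list of classifier scores. The sketch below uses only the standard definitions; the helper names `confusion_metrics` and `roc_auc` and the example counts are hypothetical and are not taken from the experiments of this paper.

```python
import math
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Precision, recall, FPR, F1-score and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "fpr": fpr, "f1": f1, "mcc": mcc}


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC as the probability that a random positive (theft) sample is
    scored higher than a random negative (benign) one; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts for an imbalanced test set (50 thieves, 950 benign users).
metrics = confusion_metrics(tp=40, fp=10, tn=940, fn=10)
print(metrics)
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

Unlike plain accuracy, MCC and F1-score stay informative when benign users vastly outnumber thieves, which is why they are commonly reported for imbalanced problems.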
The organization of the manuscript is as follows. The related work is presented in
Section II. The formulation and analysis of the problem statement are given in Section III.
The proposed scheme is explained in Section IV. Section V describes the experimental
results of the proposed and benchmark schemes. Finally, the manuscript is concluded in
Section VI.
The literature contains numerous statistical and ML models for ETD. These models require
handcrafted feature engineering and pertinent domain expertise, which is a difficult and
time-consuming task. The existing ML models under-perform when capturing temporal
correlations and complex non-linearities from EC profiles. In general, most of the ML
models perform ETD by utilizing only 1D EC data. However, capturing latent features and
periodicity from 1D data is a difficult process [1]. In [14], it is noted that all
conventional schemes are centered around manual feature engineering in order to identify
NTL patterns. Moreover, in the existing work, no mathematical solutions are established to
distinguish shunt and double tapping attacks. The authors of [17] observe that the existing
ML algorithms do not include a proper feature engineering step, which consequently leads to
poor generalization.
The authors of [5] identify that many conventional ML techniques are exploited to detect
NTLs in power grids. However, they neglect an efficient feature engineering process, which
results in poor generalization and low detection accuracy. Many classification and
clustering techniques make an early decision about abrupt changes in consumers’
consumption, which results in a high FPR because such changes may happen due to several
non-malicious factors, e.g., weekends, change of residents, change of appliances, change of
season, etc. Moreover, the existing techniques perform poorly in the detection of zero-day
attacks. Similarly, the authors of [12] and [18] highlight the issue of inappropriate
feature engineering. The process of handcrafted feature engineering demands the involvement
of a domain expert, which is a time intensive and difficult task. In [18], the most
prominent features are extracted through an autoencoder from highly dynamic EC data to
perform efficient ETD. However, further improvement is needed to recognize some
intelligent attacks, such as shunt attack, zero data attack, double tapping attack and so
forth.
In [19], numerous clustering based techniques are exploited for anomaly detection in smart
meters’ data. However, the fluctuations and variations in the normal and theft load
profiles are not properly detected, which yields poor detection results. Similarly, the
authors in [20] analyze some traditional techniques that are applied to detect data
poisoning attacks. However, these techniques add an additional stage of data filtering,
which first removes any false labels and then performs the detection step. In [21]–[24],
the authors discuss that many pattern recognition and conventional ML techniques are
employed for NTL detection. These techniques demand extensive handcrafted feature
engineering, which is a laborious, time-consuming and financially expensive task. Moreover,
the domain experts must be involved again whenever new features are required. In addition,
these techniques perform poorly when extracting vital features from the available high
dimensional EC data.
According to [25], many conventional anomaly detection algorithms mistakenly detect normal
users as abnormal because of several non-malicious factors: changes of home residents,
weekends, changes in the number of appliances, etc. These non-malicious factors also cause
a high FPR. Moreover, in [21], it is mentioned that many researchers exploit deep learning
models for theft identification and self feature learning from the highly dynamic EC data.
However, these models are tested and evaluated on artificially generated data, which does
not allow a reliable assessment. According to [26] and [27], the manual creation of
features is not sufficient to properly detect the NTL behavior because of stochastic
changes in EC profiles. In [28], the problem of maintaining temporal correlation in the
existing ML models is highlighted. Moreover, the learning algorithms are unable to learn
the potential features from 1D raw EC data.
The study of [29] demonstrates that many researchers propose different electricity theft
detectors. However, these detectors have low detection accuracy because the EC data is
highly dynamic and rapidly growing time-series data. In [30], the authors discuss that many
conventional data mining and ML techniques are exploited to filter customers’ consumption
patterns for the detection of irregular electricity profiles. However, these techniques
under-perform because of improper feature engineering. Moreover, different non-malicious
factors mislead the classification model, which is a serious issue in the existing
research. According to [31] and [32], numerous non-malicious factors degrade the detection
accuracy of traditional ML models.
In [33], a bidirectional gated recurrent unit (Bi-GRU) is used for extracting the high
level features from the electricity load profile in order to detect NTLs. However, the
synthetic minority oversampling technique (SMOTE) and SMOTE oversampling with Tomek links
are used for data balancing, which raises the overfitting issue because they generate
duplicate records and destroy the temporal correlation between consumption patterns. In
addition, the authors of [34] find that the existing deep learning techniques are not
suitable for anomaly detection in electricity power data because of interpretability and
practicality concerns. On the other hand, the authors of [2], [12], [17], [21], [29] and
[35] highlight a critical class imbalance issue that occurs in ETD because of the scarcity
of fraudulent consumers. Consequently, the majority class dominates the minority class,
which leads to high FPR. Moreover, the learning algorithms are skewed towards the majority
class. As a result, the misclassification rate increases considerably. According to [4],
[11] and [19], the limited amount of labeled EC data makes it challenging for ML algorithms
to perform efficient ETD. Similarly, the authors of [22], [26] and [28] observe that a
severely imbalanced proportion of classes adversely affects the generalization power of
classifiers. Due to this, the classification algorithms are more likely to suffer from the
overfitting issue.
According to [32] and [36], the existing literature teems with various oversampling
techniques that are employed to handle the problem of class imbalance. In oversampling
techniques, the minority class samples are augmented until the proportion of classes is
equalized. SMOTE, K-means SMOTE, adaptive synthetic (ADASYN) and so forth are well known
oversampling techniques that are used to synthesize the minority class instances. The GAN
model is also exploited to augment the minority class samples. It has become popular due to
its tremendous success in generating artificial data. However, the above mentioned
techniques fail to capture the arbitrary fluctuations and probabilistic curve of EC
patterns while generating fraudulent samples. Consequently, the final classification
results do not provide a real-world assessment.
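To make the oversampling idea concrete, the core step of SMOTE can be sketched as follows: a new minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbors. This is a simplified illustration of the general technique, not the configuration used in any of the cited works; the function name `smote_like_sample` and the toy data are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def smote_like_sample(minority: Sequence[Sequence[float]], k: int = 2,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Create one synthetic sample by interpolating between a randomly chosen
    minority sample and one of its k nearest minority neighbors.

    Requires at least two minority samples.
    """
    rng = rng or random.Random()
    base_idx = rng.randrange(len(minority))
    base = minority[base_idx]
    # Squared Euclidean distance from the base sample to every other sample.
    others = sorted(
        (sum((a - b) ** 2 for a, b in zip(base, m)), j)
        for j, m in enumerate(minority) if j != base_idx
    )
    _, nb_idx = rng.choice(others[:k])
    neighbor = minority[nb_idx]
    lam = rng.random()  # interpolation factor in [0, 1)
    return [a + lam * (b - a) for a, b in zip(base, neighbor)]


# Three hypothetical theft profiles with two readings each.
theft = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(smote_like_sample(theft, k=2, rng=random.Random(0)))
```

Because every synthetic point is a linear blend of two existing samples, the method cannot produce fluctuation patterns that are absent from the original minority data, which is consistent with the limitation noted above.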
With the advent of AMI, the energy flow is integrated with the communication flow in order
to establish two way real time coordination between consumers and power industries.
However, with the involvement of the Internet, the communication flow can be prone to
different contamination attacks, which are harmful for power utilities and become one of
the reasons for NTLs. So, there is an important need for a robust ETD model. In [1], a wide
and deep convolutional neural network (WD-CNN) is proposed to reduce the curse of
dimensionality. However, the wide component contains only a single neural network layer,
which does not learn the temporal correlation and hidden features from 1D EC data and also
gets stuck in local optima. Moreover, the models presented in [2], [4] and [14] do not use
any feature extraction module to reduce the data dimensionality. The rapid growth in the
dimensions of time series data degrades the model’s accuracy and increases the
computational overhead. Therefore, if data dimensionality is not handled correctly, the
deep or ML models memorize the noise and redundant features, which leads to poor
generalization. Furthermore, ICS is another common issue that occurs in deep neural
networks. It happens due to the shifting of input distributions between different layers
of neural networks as the network parameters of each hidden layer change. However, in [1]
and [14], no mechanism is presented to handle the ICS problem, which adversely affects the
stable learning of neural networks. It also degrades the hidden layers’ feature learning
capabilities, increases the training time and slows down the convergence rate. Another
major issue faced by the researchers is the high FPR that occurs due to several
non-malicious factors and the injection of noise into the data by intelligent attackers.
For instance, the deep learning models used in [1] and [21] are unable to capture the
non-malicious changes and long-term temporal correlation from the EC data, which increases
the FPR as well as the onsite inspection cost.
The imbalanced nature of data is another major concern that occurs when detecting energy
thieves. It raises the overfitting and poor generalization issues. In [1], [14] and [15],
the problem of imbalanced data is not handled. As a result, the classification model is
skewed towards the larger class. Furthermore, in [11] and [29], the dataset is balanced
through random under sampling (RUS), which discards important information. Moreover, in [4]
and [22], the authors exploit the SMOTE approach for data balancing. It generates synthetic
samples without considering the overlapping of neighboring samples. Therefore, it
introduces additional noise and increases the ratio of duplicate records, which leads the
models towards overfitting. Furthermore, in ETD, the selection of appropriate performance
metrics is necessary for a sound evaluation of a model. However, in [2] and [19],
appropriate metrics are not considered for performing a comprehensive analysis.
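The information loss caused by RUS is easy to see in a minimal sketch: the majority class is cut down to the size of the minority class, and every discarded benign profile is simply never shown to the classifier. The function name `random_under_sample`, the fixed seed and the toy sizes are assumptions for illustration only.

```python
import random
from typing import List, Sequence, Tuple


def random_under_sample(majority: Sequence, minority: Sequence,
                        seed: int = 0) -> Tuple[List, List]:
    """Balance two classes by keeping a random subset of the majority class
    that is as large as the minority class (sampling without replacement).

    Assumes the majority class has at least as many samples as the minority.
    """
    rng = random.Random(seed)
    kept = rng.sample(list(majority), k=len(minority))
    return kept, list(minority)


# Hypothetical user identifiers: 1000 benign users and 50 thieves.
benign_ids = list(range(1000))
theft_ids = list(range(1000, 1050))
kept_benign, kept_theft = random_under_sample(benign_ids, theft_ids)
print(len(kept_benign), len(kept_theft), len(benign_ids) - len(kept_benign))
```

In this toy setting, 950 of the 1000 benign profiles are discarded to reach a balanced set, which is the drawback that motivates generating additional minority samples instead.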
This section describes the architecture of the proposed electricity theft detection model,
which is divided into four stages.
1) In the first stage, data preprocessing is performed, in which missing values are filled
through the linear interpolation method, outliers are handled by the three sigma rule
(TSR) and feature scaling is done using Min-Max normalization.
2) In the second stage, the class imbalance issue is resolved by augmenting the minority
class theft samples using Bi-WGAN.
3) In the third stage, a hybrid deep learning model is designed in which two modules,
termed 2D-CNN and Bi-LSTM, are integrated in a parallel manner to perform efficient
feature extraction and memorization of temporal EC patterns.
4) In the fourth stage, a hybrid module is developed to perform the classification of
theft and benign consumers.
Further explanation of the above mentioned stages is given in the upcoming subsections.
Moreover, the complete representation of the proposed scheme is shown in Fig. 1. For easy
understanding, a unique step number is assigned to each step. In the first step, data
preprocessing is carried out. In the second step, the preprocessed data is separated into
the minority theft class and the majority benign class. In the third step, data
augmentation is performed by simulating theft samples. The balanced dataset is produced at
step four by concatenating the augmented theft samples with the benign ones. In the fifth
and sixth steps, feature extraction and memorization of temporal EC patterns are performed
by 2D-CNN and Bi-LSTM, respectively. Finally, the classification is performed in the
seventh step by leveraging a fully connected neural network.
The EC data recorded through AMI may contain noisy, erroneous and missing values because
of metering faults, problems in storage devices, meter tampering, etc. The erroneous
values in the dataset should be removed to achieve accurate results. Therefore, data
preprocessing techniques are adopted to handle the above issues. Missing values are
tackled through a linear interpolation method [1].
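The equation used for filling the missing values is given below.
\[
f(x_i) =
\begin{cases}
\dfrac{x_{i-1} + x_{i+1}}{2}, & x_i = \text{NaN},\; x_{i \pm 1} \neq \text{NaN}, \\
0, & x_i = \text{NaN},\; x_{i-1} \text{ or } x_{i+1} = \text{NaN}, \\
x_i, & x_i \neq \text{NaN},
\end{cases}
\tag{1}
\]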
where $x_i$ represents the electricity usage of a consumer over a period $i$ (e.g., a
day). The equation has three parts. The first part applies when $x_i$ is missing and
neither of the neighboring EC values at periods $i \pm 1$ is NaN. In that case, the
missing value is filled with the average of the two neighboring values. Otherwise, the
missing value is filled with zero, which is the second part of the equation. The third
part states that if $x_i$ is not NaN, it is left unchanged. Similarly, some unusual values
are also found in the EC dataset. These values are referred to as outliers, and they badly
degrade the system performance. We handle outliers using a well known method, termed
TSR [37]. The mathematical equation of TSR is given below.
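\[
f(x_i) =
\begin{cases}
\bar{x} + 2\sigma(x), & \text{if } x_i > \bar{x} + 2\sigma(x), \\
x_i, & \text{otherwise,}
\end{cases}
\tag{2}
\]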
where $x$ is the real EC vector of a consumer, $\bar{x}$ represents the average value of
real usage and $\sigma$ denotes the standard deviation. In Equation 2, the condition
$x_i > \bar{x} + 2\sigma(x)$ states that if $x_i$ does not follow the Gaussian
distribution, it is declared an outlier and is replaced with $\bar{x} + 2\sigma(x)$.
After handling outliers and missing values, the EC data must be scaled. Passing EC data to
neural networks without proper feature scaling may cause exploding gradients, increase the
computational overhead and slow down the convergence of the network. Therefore, we adopt
the Min-Max normalization technique to scale the EC data to the range of 0 to 1. The
equation of Min-Max normalization is given below.
\[
x_{new} = \frac{x_i - \min(x)}{\max(x) - \min(x)}. \tag{3}
\]
In Equation 3, $\max(x)$ and $\min(x)$ represent the maximum and minimum EC of a user,
respectively. Algorithm 1 describes the complete workflow of the data preprocessing steps.
The input, output, variables and functions of the algorithm are described in lines 1 to 7.
Lines 8 to 15 define the linear interpolation method used for handling the missing values
present in the electricity load profiles. Similarly, lines 17 to 21 and line 23 deal with
outliers and feature scaling, respectively.
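The three preprocessing steps can be sketched in a few lines for a single consumer. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a missing first or last reading is treated as having a missing neighbor (so it is filled with zero), the population standard deviation is used, and a constant profile is mapped to all zeros to avoid division by zero. The steps are applied one after another over the whole vector, since the mean and standard deviation need the complete, gap-free series.

```python
import math
from typing import List


def fill_missing(x: List[float]) -> List[float]:
    """Missing-value rule: average of both neighbors when they are present in
    the original series, otherwise zero. Non-missing values are unchanged."""
    out = list(x)
    for i, v in enumerate(x):
        if not math.isnan(v):
            continue
        has_prev = i > 0 and not math.isnan(x[i - 1])
        has_next = i + 1 < len(x) and not math.isnan(x[i + 1])
        out[i] = (x[i - 1] + x[i + 1]) / 2 if (has_prev and has_next) else 0.0
    return out


def clip_outliers(x: List[float]) -> List[float]:
    """Outlier rule: cap values above mean + 2 * standard deviation."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))  # population std
    upper = mean + 2 * std
    return [min(v, upper) for v in x]


def min_max(x: List[float]) -> List[float]:
    """Min-Max normalization to the range [0, 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]  # assumption: constant profile maps to zeros
    return [(v - lo) / (hi - lo) for v in x]


def preprocess(x: List[float]) -> List[float]:
    """Apply the three steps in order for one consumer."""
    return min_max(clip_outliers(fill_missing(x)))


nan = float("nan")
print(fill_missing([1.0, nan, 3.0, nan, nan, 2.0]))
print(preprocess([1.0, nan, 3.0, 2.0, 2.0, 50.0, 2.0, 1.0]))
```

Note that the cap in `clip_outliers` only acts on the upper side, as in Equation 2, so unusually low readings are left for the classifier to interpret.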
The problem of data imbalance adversely affects the performance of classification
algorithms. This issue arises when one class has far more data samples than the other. In
ETD, this problem commonly occurs because the data samples of theft consumers are rarely
available. As a result, the classification algorithms get biased towards the majority
class and ignore the minority class. Keeping this in view, the Bi-WGAN model is adopted in
this work to resolve the class imbalance problem by simulating the EC patterns of
fraudulent consumers. In [28], it is used for extracting rich task-targeting features from
the EC data and shows satisfactory performance. Moreover, in [38], it performs efficiently
while synthesizing fake image samples. Motivated by [28] and [38], we exploit Bi-WGAN for
generating the theft class samples. The synthesized theft patterns of Bi-WGAN closely
mimic the patterns of real-world electricity thieves. Moreover, the auxiliary encoder
model strengthens the augmentation ability of the Bi-WGAN model through inverse mapping of
the original input to the latent dimension.

Algorithm 1 Data Preprocessing
 1 Input: Real dataset S_real = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, x, y ∈ R
 2 Output: Preprocessed dataset S_prep
 3 Variables and Functions: EC of user x ∈ S_real
 4 min(x): minimum consumption value of user x
 5 max(x): maximum consumption value of user x
 6 S_prep: store preprocessed data
 7 σ: standard deviation, avg(x): average value of x
   Handling missing values:
 8 for n = 1 to S_real.length do
 9     for i = 1 to x.length do
10         if x_i^n == NaN && x_{i-1}^n ≠ NaN && x_{i+1}^n ≠ NaN then
11             x_i^n = (x_{i-1}^n + x_{i+1}^n) / 2
12         end
13         if x_i^n == NaN && (x_{i-1}^n == NaN || x_{i+1}^n == NaN) then
14             x_i^n = 0
15         end
16         Outlier detection:
17         if x_i^n > avg(x^n) + 2 * σ(x^n) then
18             x_i^n = avg(x^n) + 2 * σ(x^n)
19         else
20             x_i^n = x_i^n
21         end
22         Normalization:
23         x_i^n = (x_i^n - min(x^n)) / (max(x^n) - min(x^n))
24     end
25     S_prep^n = x^n
26 end
Bi-WGAN is the advanced version of Bi-GAN and WGAN [39], [40]. It is introduced to
mitigate the drawbacks of the traditional GAN [41], which suffers from mode collapse,
vanishing gradient and Nash equilibrium problems. The mode collapse issue occurs when the
generator model generates almost the same data. In GAN, the Jensen-Shannon divergence loss
function is used, which raises the vanishing gradient issue during the adversarial
training. Furthermore, both generator and discriminator try to update their loss functions
simultaneously, which affects the convergence speed of the GAN model. Moreover, in the
traditional GAN, only the mapping from the latent space to the samples exists, while the
inverse mapping is not present. In Bi-WGAN, an external encoder module is attached to the
generator network for performing the inverse mapping of the real input to the latent
space. Moreover, an updated loss function, known as Wasserstein distance (WD) [35], is used
instead of the Jensen-Shannon divergence. This function helps the model to reach an optimal
solution in less time, which improves the convergence speed of the model towards the
global optimum. The overall working of Bi-WGAN for augmenting electricity theft samples is
explained below.
The available electricity theft data is selected as the input for training the Bi-WGAN
model, which uses the objective function of Bi-GAN and the loss function of WGAN.
Equation 4 presents the objective function of Bi-WGAN [32].
\[
\min_{G,E} \max_{D} V(D, E, G) = \mathbb{E}_{x \sim P_x(x)}\big[\log D(x, E(x))\big]
+ \mathbb{E}_{z \sim P_z(z)}\big[\log(1 - D(G(z), z))\big]. \tag{4}
\]
where $G$, $E$ and $D$ represent the generator, encoder and discriminator models,
respectively. The original distribution of electricity theft samples is denoted by
$P_x(x)$, and $P_z(z)$ indicates the distribution of latent noise $z$. $\mathbb{E}_x$ and
$\mathbb{E}_z$ depict the overall expected values for the discriminator and generator
models, respectively. $E(x)$ represents the encoded representation of the real electricity
theft data $x$. A zero-sum game is played among $G$, $E$ and $D$ to achieve an optimal
output, i.e., electricity theft patterns that closely resemble real ones. $G$ is
responsible for generating samples that mimic the patterns of real-world thieves, whereas
the goal of $D$ is to check whether the theft data it receives is real or fake. We pass
real theft samples along with the generated samples of $G$ to $D$ for differentiating
between real and fake samples. The role of $E$ is to improve the capabilities of $G$ by
adding the encoded representation $E(x)$ back to the latent dimension $z$. The training
process continues until $P_z(z)$ becomes similar to $P_x(x)$. To measure the difference
between the real and the fake probability distributions of theft samples, WD is utilized.
It shifts a small amount of $P_x(x)$ to $P_z(z)$ for generating theft samples that are
closely related to the real-world thieves. In this way, WD improves the convergence speed
and the stable learning of the Bi-WGAN model. The mathematical formulation of WD [35] is
given below.
$$W\big(P_x(x), P_z(z)\big) = \inf_{\gamma \in \Pi(P_x(x), P_z(z))} \mathbb{E}_{(x,z) \sim \gamma}\big[\lVert x - z \rVert\big], \tag{5}$$
where $\Pi(P_x(x), P_z(z))$ denotes the set of joint distributions $\gamma(x, z)$ whose marginals are $P_x(x)$ and $P_z(z)$, and $\lVert x - z \rVert$ is the cost of transporting mass from $x$ to $z$. The overall aim of $W(P_x(x), P_z(z))$ is to reduce the difference between $P_x(x)$ and $P_z(z)$ to a minimum, so that the EC samples generated by $G$ closely resemble those of real-world electricity thieves.
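To make the role of WD concrete, the following minimal sketch computes the empirical Wasserstein-1 distance between two one-dimensional samples of equal size, in which case the optimal transport plan simply pairs the sorted values. The function name and the consumption readings are hypothetical and serve only as an illustration; Bi-WGAN itself estimates the distance with a trained critic rather than by sorting.

```python
def wasserstein_1d(real, generated):
    """Empirical 1-D Wasserstein-1 distance for two equally sized samples."""
    if len(real) != len(generated):
        raise ValueError("this sketch assumes equally sized samples")
    # In one dimension the optimal transport plan pairs the sorted values.
    pairs = zip(sorted(real), sorted(generated))
    return sum(abs(x - z) for x, z in pairs) / len(real)


# Hypothetical daily consumption readings (kWh), for illustration only.
real_theft = [1.0, 2.0, 3.0, 4.0]
close_fake = [1.5, 2.5, 3.5, 4.5]
far_fake = [11.0, 12.0, 13.0, 14.0]

d_close = wasserstein_1d(real_theft, close_fake)
d_far = wasserstein_1d(real_theft, far_fake)
```

The distance grows smoothly as the generated sample moves away from the real one, which is the property that keeps the training signal informative even when the two distributions do not overlap.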
In Algorithm 2, the process of handling the class imbalance problem is presented. Lines 1 to 7 describe the input, output, variables and functions of the algorithm. The preprocessed data is split into honest and theft consumers at line 8. In lines 9 and 10, the probability distributions for Bi-WGAN are formed from the real EC data of energy thieves and from random noise, respectively. Lines 11 to 25 present the training process of both the generator and discriminator models. The training process continues until the model finds the optimal weight parameters and the minimum loss value. Afterwards, lines 27 and 28 describe the generation of theft class samples through Bi-WGAN after the model has been trained successfully. In addition, the notations and symbols used in the algorithm are taken from [42].
FIGURE 1. The proposed electricity theft detection model.
Algorithm 2 Bi-WGAN for Data Augmentation
 1  Input: Preprocessed dataset S_prep = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, x, y ∈ R
 2  Output: Parameters after training θ_G, θ_D, trained Bi-WGAN model G_train, balanced dataset S_bal
 3  Variables and Functions: S_bal, X_theft, X_honest, α = 0.00005, c = 0.01, θ_G initial generator parameter, θ_D initial discriminator parameter, size of batch m, discriminator’s counter n_critics, encoder ε, encoded input e_in
 4  RMSprop(α): optimizer
 5  split(): splitting theft and honest users’ data
 6  clip(): for clipping weights
 7  Bi-WGAN process:
 8  X_theft, X_honest = split(S_prep)
 9  P_r = P_distribution(X_theft)
10  P_z = P_distribution(z)
11  while θ_G has not converged do
12      for j = 0 to n_critics do
13          Sample from real data distribution x^m
14          Sample from latent data distribution z^m
15          e_in = ε(x)
16          x̂ = G(z)
17          l_d = …
18          θ_d = θ_d + α · RMSProp(θ_d, l_d)
19          θ_d = clip(θ_d, −c, c)
20      end
21      Sample a batch from latent variable z^m
22      l_g = − …
23      θ_g = θ_g + α · RMSProp(θ_g, l_g)
24      update G_train(θ_g)
25  end
26  After training of generator, theft samples are generated
27  X_gen = G_train.predict(N_sample)
28  S_bal = concatenate(X_gen, X_theft)
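The inner loop of Algorithm 2 (critic update followed by weight clipping) can be sketched as follows. This is a deliberately reduced illustration and not the authors' implementation: the critic is assumed to be a one-parameter linear function f(x) = w·x, the encoder and generator are omitted, the fake batch is drawn from a fixed distribution, and the RMSProp decay and epsilon values are assumptions. Only α = 0.00005 and c = 0.01 are taken from the algorithm.

```python
import random

ALPHA = 0.00005          # learning rate alpha from Algorithm 2
CLIP = 0.01              # clipping constant c from Algorithm 2
DECAY, EPS = 0.9, 1e-8   # assumed RMSProp hyperparameters


def critic_step(w, v, real_batch, fake_batch):
    """One RMSProp step on l_d = -(mean f(real) - mean f(fake)), with f(x) = w * x."""
    mean_real = sum(real_batch) / len(real_batch)
    mean_fake = sum(fake_batch) / len(fake_batch)
    grad = -(mean_real - mean_fake)               # derivative of l_d w.r.t. w
    v = DECAY * v + (1.0 - DECAY) * grad * grad   # running average of squared gradients
    w = w - ALPHA * grad / (v ** 0.5 + EPS)       # RMSProp descent step
    w = max(-CLIP, min(CLIP, w))                  # weight clipping to [-c, c]
    return w, v


random.seed(0)
w, v = 0.0, 0.0
for _ in range(500):
    # Hypothetical stand-ins for a real theft batch and a generated batch.
    real = [random.gauss(2.0, 0.1) for _ in range(32)]
    fake = [random.gauss(0.0, 0.1) for _ in range(32)]
    w, v = critic_step(w, v, real, fake)
```

In this toy setting the critic weight is driven upwards until the clipping bound stops it, which shows how clipping keeps the critic within a bounded family of functions.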
In this study, a hybrid deep learning model is developed by combining 2D-CNN and Bi-LSTM. A hybrid model performs better than a standalone model, as shown in [43]. The 2D-CNN and Bi-LSTM models are integrated in parallel. 2D-CNN takes the 2D weekly EC data to extract the potential features and periodicity from consumers’ profiles. Meanwhile, the 1D daily electricity data is passed to Bi-LSTM to memorize the global and temporally correlated features. At the end, the outcomes of both models are combined in the hybrid module for the final classification. The detailed working of these modules is provided in the following subsections.
CNN is introduced to automatically capture complex feature representations and non-linearity from highly dynamic data. It is mostly used in the domains of image processing and computer vision. However, the authors of [44] employed it for a speech recognition task, and the results showed the superior performance of CNN in capturing the latent correlations in speech data. In [1], a 2D-CNN is constructed from 2D convolution and pooling layers to explore electricity load profiles. It extracts the promising EC patterns for efficient ETD. Therefore, motivated by [1] and [44], we design a 2D-CNN model to investigate the electricity load profiles. The major task of 2D-CNN is to learn the hidden representations and potential features from the highly dynamic feature space. Most EC datasets are provided in 1D raw form and contain the daily EC records of different consumers. Since the 1D EC data exposes limited periodicity and few associations among EC patterns, the 1D daily EC profiles of consumers are transformed into 2D weekly profiles. 2D-CNN takes this data as input and passes it through various filtering, convolution and pooling operations to capture the latent trends and hidden fluctuations for better generalization. In the convolutional operations, different filters are applied. They learn hidden feature representations and generate feature maps accordingly. Afterwards, pooling operations are performed to reduce the spatial dimensions of the generated feature maps. In particular, we adopt a max pooling strategy in this work, which picks the highest value from each receptive field of a feature map and drops the remaining values. Dropout layers are added to 2D-CNN to avoid overfitting. Moreover, we add batch normalization layers to 2D-CNN to protect it from the ICS problem. Furthermore, deep learning models are very sensitive to the scale of the data, so the data should be normalized before it is passed to the next layer. Otherwise, the models become vulnerable to exploding gradients or overfitting. The mathematical formulation of the convolutional layer [1] of 2D-CNN is as follows.
$$y_i = \sigma_i(w_i * x_i + b_i), \tag{6}$$
where $\sigma_i$ depicts the sigmoid activation function and $y_i$ represents the output of the $i$th convolutional layer. $x_i$ refers to the input, which is the 2D weekly EC data. Similarly, $w_i$ denotes the weight of the $i$th convolutional layer and $b_i$ depicts the bias factor. The output $y_i$ stores the feature maps after the convolution operations are performed. Afterwards, the pooling operations are performed through a max pooling strategy. The equation of the max pooling layers is shown below.
$$y_m = \max_{i,j \in R}\,(y_{i,j}), \tag{7}$$
where $y_m$ denotes the outcomes of the max pooling layers, which contain the reduced feature maps. Similarly, $j$ depicts the $j$th neuron of a specific convolutional layer. The dropout and batch normalization layers are added to protect the model from overfitting and ICS issues. Moreover, the flatten layer is utilized to convert the feature map into a 1D vector, which connects the preceding pooling layers to the upcoming fully connected layer. The mathematical derivation of the fully connected layer is as follows.
$$y_f = g_i(w_i^f\, y_m + b_i^f), \tag{8}$$
where $g_i$ represents the activation function, and $w_i^f$ and $b_i^f$ are the weight and bias factors of the fully connected layer, respectively. $y_f$ is the output of the fully connected layer, which contains the most important feature set extracted from the 2D EC data. This feature set is further passed to the hybrid module, where it is concatenated with the feature set of Bi-LSTM for the final classification of a consumer as honest or malicious.
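The 2D-CNN pipeline described in this subsection (1D-to-2D weekly transformation, convolution with a sigmoid activation, max pooling and flattening) can be sketched in plain Python as follows. The helper names, the filter values, the 2 × 2 pooling window and the choice to drop an incomplete last week are assumptions made for illustration; the actual model is built with Keras layers and learned filters.

```python
import math


def to_weekly(daily, days_per_week=7):
    """Reshape a 1-D daily series into one row per week (a partial last week is dropped)."""
    n_weeks = len(daily) // days_per_week
    return [daily[w * days_per_week:(w + 1) * days_per_week] for w in range(n_weeks)]


def conv2d_sigmoid(x, kernel, bias=0.0):
    """Valid 2-D convolution (cross-correlation form) followed by a sigmoid activation."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(x) - kh + 1):
        row = []
        for c in range(len(x[0]) - kw + 1):
            s = bias + sum(kernel[i][j] * x[r + i][c + j]
                           for i in range(kh) for j in range(kw))
            row.append(1.0 / (1.0 + math.exp(-s)))
        out.append(row)
    return out


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill a window are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


daily = [float(d % 7) for d in range(35)]       # hypothetical 5 weeks of readings
weekly = to_weekly(daily)                       # 5 x 7 weekly matrix
kernel = [[0.1, 0.0, -0.1]] * 3                 # hypothetical 3 x 3 filter
feature_map = conv2d_sigmoid(weekly, kernel)    # 3 x 5 feature map
pooled = max_pool(feature_map)                  # 1 x 2 reduced feature map
flat = [v for row in pooled for v in row]       # flattened vector for the dense layer
```

Arranging the series by week places the same weekday of consecutive weeks in one column, so a 2D filter can respond to weekly periodicity that is spread far apart in the 1D series.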
The EC data contains many fluctuations in the EC profiles of consumers. We observed that the electricity patterns of consumers have a strong association with each other. In this regard, we adopt a Bi-LSTM model to capture the long-term trend in the EC data for better NTL detection. Bi-LSTM is selected because the authors of [45] show that it performs very well in predicting traffic routes, and the traffic routes dataset is time series data. In the case of ETD, the EC data is also time series data [46]. Another reason for using Bi-LSTM is that it stores the EC patterns for a long time in its memory states to identify the effects of non-malicious changes. As a result, it reduces the false detection of electricity consumers to a minimum.
Bi-LSTM is an extension of the traditional LSTM model in which two sub-models are trained simultaneously. The first sub-model processes the sequence in the forward direction and the other processes it in the backward direction. Both sub-models aim to learn the long-term periodicity and temporal correlation in EC load profiles. In Bi-LSTM, the provision of context about EC patterns in both directions further improves its feature learning capabilities. It also memorizes the long-term historical EC patterns of consumers’ profiles, which helps in dealing with the non-malicious changes. Consequently, the high FPR is reduced considerably. The reduction in FPR helps power utilities to save the monetary cost incurred by unnecessary onsite inspections.
Moreover, Bi-LSTM maintains the long-term sequence in EC patterns through the collaboration of short-term and long-term memory states. The long-term memory state stores the historical information for a long time and is updated at each time step with new information. The short-term memory state, in contrast, consists of memory gates that control the output at the current time step. Three memory gates work in the short-term memory state. The input gate decides how much of the input data should be kept and how much should be thrown away. It employs the sigmoid function for making this decision and uses both the current input and the previous state. Similarly, the forget gate discards unnecessary information and passes only the important information to the cell state. Finally, the output gate decides how much information is passed to the next hidden state. In addition, the long-term historical information is stored in the cell state for future decisions. Storing the information in both directions increases the detection accuracy and reduces the high FPR. The mathematical representations of the different memory gates [14] are given as follows.
$$c_t = f_t * c_{t-1} + i_t * \hat{c}_t, \tag{13}$$
where $i_t$, $f_t$ and $o_t$ denote the values of the input, forget and output gates at the current time step, respectively. Similarly, $\sigma_z$ denotes the sigmoid activation function of the corresponding gate, which decides about the activation of the gates. $W$ and $U$ indicate the weight matrices that are applied to the input of the current time step and to the state of the previous time step, respectively. Moreover, $\hat{c}_t$ denotes the candidate value of the cell state at the current time step and $c_t$ denotes the updated cell state. $h_t$ represents the hidden state at time $t$, and $b$ is the bias term.
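As an illustration of how the gates interact, the sketch below implements one LSTM time step for scalar inputs and runs it in both directions. It follows the standard LSTM formulation, which is assumed here and is not copied from the paper: sigmoid gates, a tanh candidate state, the cell update of Eq. (13), and a hidden state computed as the output gate times the tanh of the cell state. All weights and inputs are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step; p maps each gate name to a (W, U, b) triple."""
    def pre(gate):
        W, U, b = p[gate]
        return W * x_t + U * h_prev + b

    i_t = sigmoid(pre("i"))            # input gate
    f_t = sigmoid(pre("f"))            # forget gate
    o_t = sigmoid(pre("o"))            # output gate
    c_hat = math.tanh(pre("c"))        # candidate cell state (tanh is an assumption)
    c_t = f_t * c_prev + i_t * c_hat   # cell state update, Eq. (13)
    h_t = o_t * math.tanh(c_t)         # hidden state (standard form, assumed)
    return h_t, c_t


def bilstm(seq, fwd, bwd):
    """Run one pass per direction and return both final hidden states."""
    h_f = c_f = h_b = c_b = 0.0
    for x in seq:
        h_f, c_f = lstm_step(x, h_f, c_f, fwd)
    for x in reversed(seq):
        h_b, c_b = lstm_step(x, h_b, c_b, bwd)
    return [h_f, h_b]


params = {g: (0.5, 0.1, 0.0) for g in "ifoc"}   # hypothetical weights
features = bilstm([0.2, 0.4, 0.1], params, params)
```

Because the forget gate scales the previous cell state instead of overwriting it, information from early time steps can persist across many steps, which is the mechanism behind the long-term memory described above.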
The hybrid module is a combined module in which the outcomes of the Bi-LSTM and 2D-CNN modules are integrated into a single feature vector. A joint weight matrix is constructed for the hybrid training of both models. Finally, a sigmoid function is applied to the combined feature vector for the detection of NTL patterns.
$$\mathrm{NTL}_{det} = \sigma_h\big(W[h_{2DCNN}, h_{BiLSTM}] + b\big), \tag{15}$$
where $\sigma_h$ denotes the sigmoid activation function. $h_{2DCNN}$ and $h_{BiLSTM}$ represent the final outputs of the 2D-CNN and Bi-LSTM models, respectively. Similarly, $W$ denotes the joint weight of the hybrid model and $b$ is the bias factor. Algorithm 3 describes the process of feature learning and NTL detection through the hybrid 2D-CNN and Bi-LSTM model. Lines 1 to 3 describe the input, output, variables and functions of the algorithm. In lines 4 and 5, the transformation of data from 1D to 2D is given. Lines 6 to 12 present the overall working mechanism of the 2D-CNN model. Similarly, lines 14 to 30 indicate the learning process of the Bi-LSTM model. Lines 19 to 24 describe the updating process of the memory gates and cell states. These gates keep or discard the current state information according to the previous cell state information. Lines 31 to 38 show the backpropagation and the weight updating processes for the different memory gates. Finally, line 40 describes the detection of energy thieves.
Algorithm 3 The Proposed Hybrid Model
 1  Input: Balanced dataset S_bal = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, x, y ∈ R
 3  Variables and Functions: Weights W^l, U^l, b^l and l, h^L
 6  2D-CNN working:
 7  Input layer x_i = Input(X_2D)
 8  Convolutional layers: x_i denotes input of convolutional …
10  Max pooling layers: y_m = max_{i,j ∈ R}(y_{i,j})
11  Fully connected layer: y_f = g_i(w^f …
12  h_2DCNN = y_f
13  Bi-LSTM working mechanism:
14  while W^l, U^l and b^l not converge do
15      for x ∈ X_1D do
16          Same process for forward and backward pass
17          for each hidden layer l = 1 to l/2 do
18              for each time step t do
19                  i_t = σ(W^l …
20                  f_t = σ(W^l …
21                  o_t = σ(W^l …
22                  ĉ_t = σ(W^l …
23                  c^l …
24                  h^l …
25              end
26              h'^l = h^l
27          end
28          Fully connected:
29          Compute: z^l ← W^l σ(h'^l) + b^l
30          h_BiLSTM = tanh(z^l)
31          Back propagation:
32          ∇_{U^l} T(x), ∇_{W^l} T(x) and ∇_{b^l} T(x)
33      end
34  end
35  Hybrid layer:
36  NTL_det = σ(W[h_2DCNN, h_BiLSTM] + b)
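The hybrid layer of Eq. (15) amounts to concatenating the two feature vectors and applying a sigmoid to a joint linear combination. A minimal sketch is given below; the feature values, weights, bias and the 0.5 decision threshold are hypothetical and are not taken from the paper.

```python
import math


def hybrid_score(h_cnn, h_bilstm, weights, bias):
    """Sigmoid of a joint linear layer over the concatenated features, as in Eq. (15)."""
    joint = list(h_cnn) + list(h_bilstm)            # [h_2DCNN, h_BiLSTM]
    if len(joint) != len(weights):
        raise ValueError("one weight per concatenated feature is expected")
    z = sum(w * h for w, h in zip(weights, joint)) + bias
    return 1.0 / (1.0 + math.exp(-z))


h_2dcnn = [0.35, 0.35]        # hypothetical 2D-CNN features
h_bilstm = [0.10, -0.20]      # hypothetical Bi-LSTM features
W = [1.0, 1.0, 2.0, 1.0]      # hypothetical joint weights
b = -0.2                      # hypothetical bias

score = hybrid_score(h_2dcnn, h_bilstm, W, b)
label = 1 if score >= 0.5 else 0   # assumed threshold; 1 = abnormal, 0 = normal
```

Since a single weight vector spans both feature sets, the gradient of the classification loss reaches the 2D-CNN and Bi-LSTM branches together, which is what allows the two models to be trained jointly.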
In this section, the experimental results of the proposed and the existing schemes are presented. The experiments are conducted on a realistic smart meters dataset, which is released by the State Grid Corporation of China (SGCC). The detailed description of the dataset is provided in Section V-A. Moreover, Python 3.0 and Google Colab are used for the training of deep learning models. All the deep learning models are developed through TensorFlow and Keras, which are open source libraries for building deep neural networks. The baseline models are fitted using scikit-learn.
The EC dataset is a publicly available realistic smart meters’ dataset, which is released by SGCC. It comprises the daily EC of 42,372 consumers from 1 Jan 2014 to 31 Oct 2016. In the dataset, each row represents the complete electricity profile of a consumer and every column depicts the daily EC on a specific date. The normal and abnormal users in the dataset are labeled as 0 and 1, respectively. The meta information about the dataset is given in Table 2.
TABLE 2. Information of SGCC dataset.
In the ETD scenario, the available EC data is imbalanced. Therefore, the selection of appropriate performance metrics is necessary for a fair evaluation of the model. In the case of class imbalance, the accuracy metric is not suitable because it only counts the correct predictions, so a model that always predicts the majority class still obtains a high score. Moreover, both false positives (FP) and false negatives (FN) are important in ETD. Therefore, in this study, the AUC metric is selected to properly distinguish between honest and dishonest consumers. FN is also important for power utilities because it increases the financial loss. Hence, the MCC metric is selected because it takes into account all four entries of the confusion matrix: true positive (TP), FP, true negative (TN) and FN. The MCC score ranges between −1 and 1. A value close to 1 shows that the classification model efficiently detects both the positive and the negative class samples, whereas a value near 0 indicates performance no better than random guessing. In addition, we consider precision, recall, PR-AUC and F1-score metrics for a comprehensive analysis of the proposed scheme. Precision is the fraction of flagged consumers that are actual thieves, which helps electric utilities to save extra onsite inspection cost. Similarly, recall is the fraction of actual energy thieves that are flagged, which reduces the financial loss. PR-AUC considers both precision and recall and summarizes the trade-off between them.
The mathematical formulation of the aforementioned performance metrics is given as follows [22].
$$\text{Precision} = \frac{TP}{TP + FP}, \tag{16}$$
$$\text{Recall} = \frac{TP}{TP + FN}, \tag{17}$$
$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \tag{18}$$
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{T \times P}}, \quad T = (TP + FP)(TP + FN), \quad P = (TN + FP)(TN + FN), \tag{19}$$
$$\text{AUC} = \frac{\sum_{i \in \text{positive class}} RANK_i - \frac{P(1 + P)}{2}}{P \times N}, \tag{20}$$
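where $P$ and $N$ in the AUC expression represent the numbers of positive and negative class samples, respectively, and $RANK_i$ is the rank of the $i$th positive sample when all samples are sorted by their predicted scores. In line with the dataset labels (normal users 0, abnormal users 1), TP refers to the correctly identified positive class users, which are the abnormal electricity users. Similarly, TN depicts the correctly identified normal users. FN and FP represent the abnormal users misclassified as normal and the normal users misclassified as abnormal, respectively.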
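The metrics above can be computed directly from the confusion matrix counts and from the ranks of the positive samples. The following sketch uses hypothetical counts and scores chosen only to exercise the formulas; it does not reproduce any result of the paper, and it omits guards against empty classes for brevity.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Precision, recall, F1-score and MCC from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    t = (tp + fp) * (tp + fn)
    p = (tn + fp) * (tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(t * p)
    return precision, recall, f1, mcc


def rank_auc(scores, labels):
    """AUC from the rank sum of the positive samples (tied scores get average ranks)."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    k = 0
    while k < len(order):
        j = k
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[k]]:
            j += 1
        avg = (k + j) / 2.0 + 1.0           # ranks start at 1
        for m in range(k, j + 1):
            ranks[order[m]] = avg
        k = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Hypothetical counts and scores, for illustration only.
precision, recall, f1, mcc = confusion_metrics(tp=90, fp=10, tn=80, fn=20)
auc = rank_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Unlike accuracy, MCC uses all four counts in both the numerator and the denominator, so it stays low when a model ignores the minority class.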
Table 3 presents the analysis of the proposed methodology using different sampling techniques to analyze the significance of the balanced and the imbalanced data distributions. The results show that the hybrid 2D-CNN and Bi-LSTM model obtains the highest performance on the data distribution generated by Bi-WGAN. The data balanced with near miss and SMOTE does not provide satisfactory results because these schemes randomly remove data records and synthesize duplicate data records, respectively, which causes information loss and overfitting. Moreover, Bi-WGAN utilizes an auxiliary encoder module to improve the stability of learning and the convergence speed. That is why the samples generated by Bi-WGAN closely resemble the real-world theft patterns, which enables the classification model to perform efficient ETD.
TABLE 3. Proposed model performance on imbalance distribution.
In this section, the proposed model is compared with the state-of-the-art benchmark models for efficient ETD. For a fair comparison, the same data preprocessing techniques are applied to all of them. The description of the benchmark models is given below.
The support vector machine (SVM) is a widely used ML classifier that handles both classification and regression tasks. In general, it is used for binary classification. However, it also performs multi-class classification by combining several binary classifiers, and it handles non-linear decision boundaries using the kernel trick. In [2], SVM is used for final NTL detection. Therefore, we select SVM as a baseline classifier in this work.
The random forest (RF) classifier is an ensemble learning approach. It combines several decision trees into a forest and follows the bagging method, in which the final outcome is decided by averaging the outputs or taking the majority vote of different weak learners. In [21], it is used to perform ETD.
Logistic regression (LR) is a simple and well-known ML classifier. It is used for binary classification and follows the principle of neural networks. It contains a single layer of neural network with a sigmoid activation function on the output layer for binary classification. In line with the dataset labels (normal users 0, abnormal users 1), if the value on the output layer is closer to 1, the electricity user is classified as an abnormal user; otherwise, the user is classified as an honest user [21].
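The single-layer structure of LR can be illustrated with a short gradient descent sketch. The toy data, learning rate and number of epochs are hypothetical, and the labels follow the dataset convention stated earlier (0 for normal users, 1 for abnormal users); the baseline in the experiments is fitted with scikit-learn rather than with this code.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def predict_proba(w, b, x):
    """Single-layer model: weighted sum of the features followed by a sigmoid."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        errors = [predict_proba(w, b, xi) - yi for xi, yi in zip(X, y)]
        grad_w = [sum(e * xi[j] for e, xi in zip(errors, X)) / n
                  for j in range(len(w))]
        grad_b = sum(errors) / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical one-feature toy data; labels use 0 = normal, 1 = abnormal.
X = [[0.1], [0.2], [0.8], [0.9]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
p_low = predict_proba(w, b, [0.1])
p_high = predict_proba(w, b, [0.9])
```

The model has no hidden layer, so its decision boundary is linear in the input features, which limits it on high dimensional consumption data.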
WD-CNN [1] is a hybrid deep learning approach proposed to detect electricity thieves in power grids. It consists of two deep learning models, known as the wide and deep components. The wide component contains a single fully connected layer of the neural network and is used for extracting the abstract features from the 1D daily EC data. Meanwhile, the deep component captures the local features and periodicity from the 2D weekly consumption data.
For efficient ETD, a hybrid of LSTM and multilayer perceptron (MLP) is proposed in [14]. In that model, the sequential time series data is passed to LSTM for capturing the temporal correlation in the EC profiles of consumers. Similarly, the non-sequential additional data is fed to the MLP model for better detection of energy thieves. Afterwards, the outputs of both models are combined into a single feature vector. Then, final NTL detection is performed by applying the sigmoid activation function.
This section presents the analysis of the experimental results. First, we discuss the analysis of data augmentation using Bi-WGAN. In Fig. 2(a), the loss curves of the discriminator on both real and fake samples are shown along with the loss of the generator model. The blue and the orange curves exhibit the discriminator loss on real and fake samples. The gradual decay in the discriminator loss indicates that the discriminator model efficiently discriminates between the real samples and the samples synthesized by the generator model. The reason is that the discriminator model is trained more than the generator model in Bi-WGAN. In particular, the weights of the discriminator model are updated using half a batch of real samples and half a batch of fake samples at each round of the training process. On the other hand, the loss of the generator model during the training phase is shown by the green curve. The addition of an external encoder module in Bi-WGAN strengthens its ability to generate the most plausible EC samples. Due to this addition, it efficiently captures the complex probability distribution of the EC profiles. That is why the loss of the generator model gradually decreases after a few training iterations. Consequently, the generated patterns closely resemble the real-world theft patterns. More specifically, in Bi-WGAN, the Wasserstein loss function is used instead of the Jensen-Shannon divergence loss function.
FIGURE 2. (a) Training loss of Bi-WGAN generator and discriminator. (b) Real and Bi-WGAN generated EC patterns.
The Wasserstein loss function measures a score of realness or fakeness for the given samples, while the regular GAN loss function predicts the probability that a generated sample is real or fake. Hence, the Wasserstein loss function, the auxiliary encoder module in the generator network and the additional training of the discriminator model boost the performance of Bi-WGAN in generating representative electricity theft samples. Fig. 2(b) illustrates the performance of Bi-WGAN in generating fake electricity theft patterns. The red curve shows the real theft pattern of an electricity user. Similarly, the blue curve demonstrates the Bi-WGAN generated theft pattern. The figure shows that Bi-WGAN efficiently learns the underlying regularities of the real electricity theft profiles and generates synthetic theft patterns that closely follow the real ones. Moreover, the integration of an external encoder module in Bi-WGAN helps in simulating realistic real-world theft patterns.
Table 4 describes the performance results of the proposed model and the benchmark models on 70% training data and 30% testing data. The results show that the proposed model outperforms all the existing models. In the proposed hybrid model, the concurrent use of 2D-CNN and Bi-LSTM boosts its performance. It obtains an AUC-ROC score of 0.97, which is higher than those of the existing schemes, namely SVM, LR, RF, WD-CNN and LSTM-MLP. A higher AUC-ROC means that a classification model distinguishes the two classes more effectively. Moreover, the proposed model achieves a PR-AUC of 0.98. This score indicates how well the model identifies the electricity thieves. Our model obtains the highest PR-AUC because of the powerful capabilities of Bi-LSTM and 2D-CNN. In contrast, SVM obtains the lowest AUC-ROC score of 0.77 among the baseline models because it does not perform well on high dimensional data. It separates the classes with a hyperplane of dimension n − 1, where n denotes the number of features, and selecting an optimal hyperplane is very difficult for highly dynamic data. RF achieves a competitive AUC-ROC of 0.94 because it follows the ensemble learning procedure. In RF, the outcomes of several weak learners are combined for the final prediction using majority voting. Moreover, it uses a random subset of data samples and features for training each weak learner, which improves its performance. Therefore, it performs better than the conventional ML techniques: its AUC-ROC and PR-AUC of 0.94 and 0.96, respectively, are higher than those of SVM and LR. LR does not achieve satisfactory results because it is a single-layer model with no hidden layers. WD-CNN and LSTM-MLP achieve AUC-ROC scores of 0.92 and 0.95, respectively. LSTM-MLP obtains better results than WD-CNN because it uses the strong memorization and feature extraction abilities of LSTM and MLP, respectively.
Fig. 3(a) shows the loss of the proposed hybrid model during the training phase. The orange curve depicts the loss on validation data and the blue curve demonstrates the loss on training data. The hybrid model performs well on both training and validation data. We observe that the loss value decreases as the number of epochs increases. However, after 10 epochs, the loss on training data keeps decreasing gradually, whereas the loss on validation data flattens. This implies that the generalization ability of the model improves up to the 10th epoch and that further training brings no additional gain. Hence, a threshold on the number of epochs is needed to optimize the training process. In our case, the best training performance is achieved when the number of epochs reaches 10.
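One common way to choose such an epoch threshold is to monitor the validation loss and stop once it no longer improves. The sketch below illustrates this idea; the patience value and the loss values are hypothetical and are not read from Fig. 3(a), and the paper does not state that this exact rule was used.

```python
def best_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the lowest validation loss seen before
    `patience` consecutive epochs pass without improvement."""
    best, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best


# Hypothetical validation losses per epoch, for illustration only.
val_losses = [0.62, 0.48, 0.40, 0.35, 0.31, 0.36, 0.29,
              0.27, 0.26, 0.25, 0.25, 0.26, 0.26]
chosen = best_epoch(val_losses)
```

With these illustrative values, the single noisy epoch does not trigger an early stop because the loss improves again within the patience window.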
TABLE 4. Comparison analysis of the proposed model with benchmark schemes.
FIGURE 3. (a) Training and validation losses of hybrid 2D-CNN and Bi-LSTM. (b) Training and validation accuracy of hybrid 2D-CNN and Bi-LSTM.
Fig. 3(b) illustrates the accuracy of the hybrid model during the training phase. The hybrid model performs well on both training and validation datasets because of the effective gated configuration and the integration of both forward and backward passes in the Bi-LSTM model. The powerful feature extraction capabilities of the 2D-CNN model also improve the classification results. The performance of the hybrid model on validation data is more stable than on training data. This implies that the proposed hybrid model efficiently separates electricity thieves from honest consumers in the EC data due to the hybrid functionalities of 2D-CNN and Bi-LSTM. Its training accuracy gradually increases with the number of epochs, and the optimal performance is obtained when the number of epochs reaches 10. Furthermore, a large fluctuation is seen in the accuracy value at epoch 6, which is caused by a noisy batch of samples during the model’s training. However, the model stabilizes its learning after the 6th epoch. Similarly, Fig. 4(a) depicts the AUC-ROC score of the hybrid model during the training and validation phases. The model obtains an AUC-ROC score of 0.97, which implies that the hybrid model effectively discriminates the normal and theft classes. Fig. 4(b) exhibits the MCC score. The MCC metric is adopted because it incorporates all entries of the confusion matrix, namely TP, FP, TN and FN. FN and TN are also important for electric utilities because they help utilities to recover the maximum revenue. From the figure, it is observed that the MCC score increases at each iteration, which shows that the proposed model handles FN and TN well. It obtains an MCC score of 0.93, which is satisfactory for detecting electricity thieves. Consequently, it will be beneficial for power utilities to recover the maximum revenue by identifying the energy thieves. The F1-score on both validation and training datasets is depicted in Fig. 5(a). It is the harmonic mean of the precision and recall values. During training, an abrupt change is seen at the 6th epoch because of noise in the training batch. Besides, the proposed model obtains an F1-score of 0.94, which shows its superior performance on the validation dataset. A higher F1-score helps electric utilities to accurately identify and locate the energy thieves. It also helps to increase the detection rate (DR) and reduce the high FPR.
FIGURE 4. (a) AUC-ROC score of the proposed hybrid model. (b) MCC score of the proposed hybrid model.
FIGURE 5. (a) F1-score of hybrid 2D-CNN and Bi-LSTM. (b) AUC-ROC based benchmark comparison.
The AUC-ROC scores of the proposed scheme and the baseline models are illustrated in Fig. 5(b). The proposed scheme obtains an AUC-ROC score of 0.97, which is higher than those of the existing classifiers, namely SVM, LR, RF, WD-CNN and LSTM-MLP. This achievement implies that the proposed scheme efficiently distinguishes the two classes due to its hybrid feature learning mechanism. Moreover, the powerful gated configuration along with the integration of both forward and reverse feature learning paths in Bi-LSTM improves its ability to capture the non-malicious changes. Consequently, the high FPR is reduced to a minimum. The PR-AUC scores of the proposed and baseline models are shown in Fig. 6. PR-AUC focuses equally on precision and recall, and both factors are important for electric utilities when detecting electricity fraud. A high PR-AUC score indicates the efficacy of a model. The proposed scheme achieves a PR-AUC of 0.98, which is higher than those of all baseline models. This implies that the proposed scheme helps power industries to accurately identify energy fraud and recover the maximum income. Moreover, Fig. 7 illustrates the training time of the proposed and baseline models. The proposed model takes less time for training than the other deep models. The reason is that the proposed model efficiently discards the redundant and noisy features from the high dimensional EC data and thereby reduces the computational overhead. It also obtains the highest performance results among the compared models. LR takes the least time for training because it contains a single layer of neural network. However, it does not obtain satisfactory results. The SVM model takes the highest training time because it first draws multiple hyperplanes and then selects an optimal hyperplane from them to perform the classification task, which increases the computational complexity considerably.
FIGURE 6. PR-AUC based benchmark comparison.
FIGURE 7. Training time (sec) of the proposed hybrid model and baseline models.
TABLE 5. Mapping between identified limitations, proposed solutions and validation results.
The mapping of identified limitations to their proposed
solutions and validations is given in Table 5. L1 concerns the
noisy high dimensionality issue, which is solved by the proposed
hybrid of 2D-CNN and Bi-LSTM models; the results are validated
through suitable key performance indicators, as shown in
Figs. 4, 5 and 6. The poor generalization issue is highlighted
in L2. It occurs because of noisy and duplicate features in the
EC data. The issue is solved through the proposed hybrid model,
which retains only the informative features and discards the
irrelevant ones. Moreover, it efficiently extracts the temporally
correlated features from the EC data. Table 5 validates this
solution. In L3, the problem of high FPR is discussed. This
problem occurs due to several non-malicious factors and abrupt
changes in EC load profiles. It may also result from false data
injection by an intelligent attacker. The problem of high FPR
is resolved by utilizing the Bi-LSTM model, which maintains
the context of long-term temporal correlations in its memory
states. In this manner, the effects of various non-malicious
factors are easily identified by the model. The solution is
validated through the AUC-ROC shown in Fig. 5(b). The class
imbalance issue is highlighted in L4. Bi-WGAN is employed
to synthesize fraudulent electricity samples. The solution
is validated through the samples generated by Bi-WGAN,
as shown in Fig. 2(b). L5 concerns the overfitting issue, which
occurs when using SMOTE due to the duplication of EC
records. Bi-WGAN simulates plausible theft samples because
of its powerful feature learning capabilities. The solution is
validated in Fig. 2(b), where the learning process of Bi-WGAN
is presented. In L6, the ICS issue is discussed, which arises
in a neural network when the input distribution changes
from one hidden layer to the next. To mitigate ICS, we add
batch normalization layers and regularization penalties to the
neural network. The solution is validated by analyzing the
convergence speed of the proposed model, which is shown
in Figs. 3, 4 and 5. In L7, it is noted that an improper
selection of performance metrics in ETD does not provide a fair
assessment. Therefore, appropriate metrics are selected for
the fair evaluation of the proposed model. The solution is
validated by suitable performance indicators, which
are shown in Figs. 3-6.
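To illustrate why the choice of metrics in L7 matters on imbalanced data, the sketch below computes F1-score, MCC and FPR from confusion-matrix counts. It is a hedged illustration rather than the authors' evaluation code: the helper names (`confusion_counts`, `f1_score`, `mcc`, `false_positive_rate`) and the toy labels are assumptions, and label 1 is taken to mean a fraudulent consumer.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 = theft (assumed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def false_positive_rate(fp, tn):
    """Share of honest consumers wrongly flagged as fraudulent."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Toy data (hypothetical): 3 theft cases among 10 consumers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(round(f1_score(tp, fp, fn), 4))        # 0.6667
print(round(mcc(tp, fp, tn, fn), 4))         # 0.5238
print(round(false_positive_rate(fp, tn), 4)) # 0.1429
```

In this toy case the plain accuracy is 0.8, yet F1 and MCC are noticeably lower, which shows how accuracy alone can overstate performance when fraudulent consumers are rare.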
In this article, we have proposed a hybrid deep learning model
for the detection of ET in power grids. The proposed model
combines 2D-CNN and Bi-LSTM models. The noisy high
dimensionality issue is tackled through the combined capabilities
of the Bi-LSTM and 2D-CNN modules. Furthermore, the
challenge of the severe lack of fraudulent samples is addressed
by generating realistic theft samples using Bi-WGAN. All
the experiments are conducted on a realistic smart meter
dataset, which is released by the SGCC. The comparison
with baseline models shows that the proposed scheme
surpasses the performance of state-of-the-art models, such
as LR, SVM, RF, WD-CNN and LSTM-MLP. Moreover,
the simulation results show that the proposed model
achieves higher AUC-ROC, PR-AUC, F1-score and MCC
values than the baseline models. Our model obtains an
AUC-ROC of 0.97 and a PR-AUC of 0.98, which
make it suitable for real-world scenarios. Furthermore,
the proposed model can be used in different industrial
applications to detect anomalies and fraud. In the future, we will
consider EC data with a higher sampling rate to enhance the
performance of the proposed hybrid model.
The dataset used in this study is publicly available at
The authors would like to acknowledge Taif University
Researchers Supporting Project number (TURSP-2020/292),
Taif University, Taif, Saudi Arabia. The authors would
also like to acknowledge Princess Nourah bint Abdulrahman
University Researchers Supporting Project number
(PNURSP2022R193), Princess Nourah bint Abdulrahman
University, Riyadh, Saudi Arabia.
[1] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, ‘‘Wide and deep
convolutional neural networks for electricity-theft detection to secure
smart grids,’IEEE Trans. Ind. Informat., vol. 14, no. 4, pp. 1606–1615,
Apr. 2018.
[2] P. Jokar, N. Arianpoo, and V. C. M. Leung, ‘‘Electricity theft detection in
AMI using Customers’ consumption patterns,’IEEE Trans. Smart Grid,
vol. 7, no. 1, pp. 216–226, Jan. 2016.
[3] Q. Chen, K. Zheng, C. Kang, and F. Huangfu, ‘‘Detection methods
of abnormal electricity consumption behaviors: Review and prospect,’’
Autom. Electr. Power Syst., vol. 42, no. 17, pp. 189–199, 2018.
[4] S. K. Gunturi and D. Sarkar, ‘‘Ensemble machine learning models for the
detection of energy theft,’Electr. Power Syst. Res., vol. 192, Mar. 2021,
Art. no. 106904.
[5] R. Razavi, A. Gharipour, M. Fleury, and I. J. Akpan, ‘‘A practical feature-
engineering framework for electricity theft detection in smart grids,’’ Appl.
Energy, vol. 238, pp. 481–494, Mar. 2019.
[6] A. S. Iwashita, D. Rodrigues, D. S. Gastaldello, A. N. de Souza, and
J. P. Papa, ‘‘An incremental optimum-path forest classifier and its applica-
tion to non-technical losses identification,’Comput. Electr. Eng., vol. 95,
Oct. 2021, Art. no. 107389.
[7] S.-V. Oprea and A. Bâra, ‘‘Machine learning classification algorithms
and anomaly detection in conventional meters and Tunisian electricity
consumption large datasets,’Comput. Electr. Eng., vol. 94, Sep. 2021,
Art. no. 107329.
[8] C.-H. Lo and N. Ansari, ‘‘CONSUMER: A novel hybrid intrusion detec-
tion system for distribution networks in smart grid,’’ IEEE Trans. Emerg.
Topics Comput., vol. 1, no. 1, pp. 33–44, Jun. 2013.
[9] S. Amin, G. A. Schwartz, and H. Tembine, ‘‘Incentives and security in
electricity distribution networks,’’ in Proc. Int. Conf. Decis. Game Theory
Secur., Berlin, Germany: Springer, 2012, pp. 264–280.
[10] N. Javaid, H. Gul, S. Baig, F. Shehzad, C. Xia, L. Guan, and T. Sultana,
‘‘Using GANCNN and ERNET for detection of non technical losses to
secure smart grids,’IEEE Access, vol. 9, pp. 98679–98700, 2021.
[11] M. M. Buzau, J. Tejedor-Aguilera, P. Cruz-Romero, and
A. Gomez-Exposito, ‘‘Detection of non-technical losses using smart
meter data and supervised learning,’IEEE Trans. Smart Grid, vol. 10,
no. 3, pp. 2661–2670, May 2019.
[12] X. Kong, X. Zhao, C. Liu, Q. Li, D. Dong, and Y. Li, ‘‘Electricity
theft detection in low-voltage stations based on similarity measure and
DT-KSVM,’’ Int. J. Electr. Power Energy Syst., vol. 125, Feb. 2021,
Art. no. 106544.
[13] S. I. Popoola, B. Adebisi, M. Hammoudeh, H. Gacanin, and G. Gui,
‘‘Stacked recurrent neural network for BotNet detection in smart Homes,’’
Comput. Electr. Eng., vol. 92, Jun. 2021, Art. no. 107039.
[14] M.-M. Buzau, J. Tejedor-Aguilera, P. Cruz-Romero, and
A. Gomez-Exposito, ‘‘Hybrid deep neural networks for detection of
non-technical losses in electricity smart meters,’IEEE Trans. Power Syst.,
vol. 35, no. 2, pp. 1254–1263, Mar. 2020.
[15] D. Yao, M. Wen, X. Liang, Z. Fu, K. Zhang, and B. Yang, ‘‘Energy
theft detection with energy privacy preservation in the smart grid,’IEEE
Internet Things J., vol. 6, no. 5, pp. 7659–7669, Oct. 2019.
[16] M. Asif, B. Kabir, A. Ullah, S. Munawar, and N. Javaid, ‘‘Towards
energy efficient smart grids: Data augmentation through BiWGAN, feature
extraction and classification using hybrid 2DCNN and BiLSTM,’’ in Proc.
Int. Conf. Innov. Mobile Internet Services Ubiquitous Comput., Cham,
Switzerland: Springer, 2021, pp. 108–119.
[17] R. Punmiya and S. Choe, ‘‘Energy theft detection using gradient boosting
theft detector with feature engineering-based preprocessing,’IEEE Trans.
Smart Grid, vol. 10, no. 2, pp. 2326–2329, Mar. 2019.
[18] Y. Huang and Q. Xu, ‘‘Electricity theft detection based on stacked sparse
denoising autoencoder,’’ Int. J. Electr. Power Energy Syst., vol. 125,
Feb. 2021, Art. no. 106448.
[19] K. Zheng, Q. Chen, Y. Wang, C. Kang, and Q. Xia, ‘‘A novel combined
data-driven approach for electricity theft detection,’’ IEEE Trans. Ind.
Informat., vol. 15, no. 3, pp. 1809–1819, Mar. 2019.
[20] A. Takiddin, M. Ismail, U. Zafar, and E. Serpedin, ‘‘Robust electricity theft
detection against data poisoning attacks in smart grids,’IEEE Trans. Smart
Grid, vol. 12, no. 3, pp. 2675–2684, May 2021.
[21] S. Li, Y. Han, X. Yao, S. Yingchen, J. Wang, and Q. Zhao, ‘‘Electricity theft
detection in power grids with deep learning and random forests,’J. Electr.
Comput. Eng., vol. 2019, pp. 1–12, Oct. 2019.
[22] M. N. Hasan, R. N. Toma, A.-A. Nahid, M. M. M. Islam, and J.-M. Kim,
‘‘Electricity theft detection in smart grid systems: A CNN-LSTM based
approach,’Energies, vol. 12, no. 17, p. 3310, Aug. 2019.
[23] R. R. Bhat, R. D. Trevizan, R. Sengupta, X. Li, and A. Bretas, ‘‘Identi-
fying nontechnical power loss via spatial and temporal deep learning,’
in Proc. 15th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Dec. 2016,
pp. 272–279.
[24] B. Kocaman and V. Tümen, ‘‘Detection of electricity theft using data
processing and LSTM method in distribution systems,’S¯
a, vol. 45,
no. 1, pp. 1–10, Dec. 2020.
[25] G. Fenza, M. Gallo, and V. Loia, ‘‘Drift-aware methodology for anomaly
detection in smart grid,’IEEE Access, vol. 7, pp. 9645–9657, 2019.
[26] X. Lu, Y. Zhou, Z. Wang, Y. Yi, L. Feng, and F. Wang, ‘‘Knowledge embed-
ded semi-supervised deep learning for detecting non-technical losses in the
smart grid,’Energies, vol. 12, no. 18, p. 3452, Sep. 2019.
[27] C. C. O. Ramos, D. Rodrigues, A. N. de Souza, and J. P. Papa, ‘‘On the
study of commercial losses in Brazil: A binary black hole algorithm for
theft characterization,’IEEE Trans. Smart Grid, vol. 9, no. 2, pp. 676–683,
Mar. 2018.
[28] T. Hu, Q. Guo, H. Sun, T.-E. Huang, and J. Lan, ‘‘Nontechnical losses
detection through coordinated BiWGAN and SVDD,’IEEE Trans. Neural
Netw. Learn. Syst., vol. 32, no. 5, pp. 1866–1880, May 2021.
[29] N. F. Avila, G. Figueroa, and C.-C. Chu, ‘‘NTL detection in electric
distribution systems using the maximal overlap discrete wavelet-packet
transform and random undersampling boosting,’IEEE Trans. Power Syst.,
vol. 33, no. 6, pp. 7171–7180, Nov. 2018.
[30] J. I. Guerrero, I. Monedero, F. Biscarri, J. Biscarri, R. Millan, and C. Leon,
‘‘Non-technical losses reduction by improving the inspections accuracy in
a power utility,’’ IEEE Trans. Power Syst., vol. 33, no. 2, pp. 1209–1218,
Mar. 2018.
[31] M. S. Saeed, M. W. Mustafa, U. U. Sheikh, T. A. Jumani, and N. H. Mirjat,
‘‘Ensemble bagged tree based classification for reducing non-technical
losses in multan electric power company of Pakistan,’’ Electronics, vol. 8,
no. 8, p. 860, Aug. 2019.
[32] X. Gong, B. Tang, R. Zhu, W. Liao, and L. Song, ‘‘Data augmentation
for electricity theft detection using conditional variational auto-encoder,’’
Energies, vol. 13, no. 17, p. 4291, Aug. 2020.
[33] H. Gul, N. Javaid, I. Ullah, A. M. Qamar, M. K. Afzal, and G. P. Joshi,
‘‘Detection of non-technical losses using SOSTLink and bidirectional
gated recurrent unit to secure smart meters,’Appl. Sci., vol. 10, no. 9,
p. 3151, Apr. 2020.
27482 VOLUME 10, 2022
M. Asif et al.: Data Augmentation Using BiWGAN, Feature Extraction and Classification by Hybrid 2DCNN and BiLSTM
[34] X. Wang, I. Yang, and S.-H. Ahn, ‘‘Sample efficient home power anomaly
detection in real time using semi-supervised learning,’IEEE Access,
vol. 7, pp. 139712–139725, 2019.
[35] A. Aldegheishem, M. Anwar, N. Javaid, N. Alrajeh, M. Shafiq, and
H. Ahmed, ‘‘Towards sustainable energy efficiency with intelligent elec-
tricity theft detection in smart grids emphasising enhanced neural net-
works,’IEEE Access, vol. 9, pp. 25036–25061, 2021.
[36] N. Javaid, N. Jan, and M. U. Javed, ‘‘Anadaptive synthesis to handle imbal-
anced big data with deep Siamese network for electricity theft detection in
smart grids,’J. Parallel Distrib. Comput., vol. 153, pp. 44–52, Jul. 2021.
[37] V. Chandola, A. Banerjee, and V. Kumar, ‘‘Anomaly detection: A survey,’’
ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, 2009.
[38] U. Mutlu and E. Alpaydın, ‘‘Training bidirectional generative adver-
sarial networks with hints,’Pattern Recognit., vol. 103, Jul. 2020,
Art. no. 107320.
[39] M. Arjovsky, S. Chintala, and L. Bottou, ‘‘Wasserstein generative adver-
sarial networks,’’ in Proc. Int. Conf. Mach. Learn., 2017, pp. 214–223.
[40] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville,
‘‘Improved training of Wasserstein GANs,’’ 2017, arXiv:1704.00028.
[41] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
S. Ozair, A. Courville, and Y. Bengio, ‘‘Generative adversarial networks,’’
2014, arXiv:1406.2661.
[42] M. Arjovsky, S. Chintala, and L. Bottou, ‘‘Wasserstein generative adver-
sarial networks,’’ in Proc. Int. Conf. Mach. Learn., 2017, pp. 214–223.
[43] J. Yu, X. Zhang, L. Xu, J. Dong, and L. Zhangzhong, ‘‘A hybrid CNN-GRU
model for predicting soil moisture in maize root zone,’Agricult. Water
Manage., vol. 245, Feb. 2021, Art. no. 106649.
[44] J. Zhao, X. Mao, and L. Chen, ‘‘Speech emotion recognition using deep 1D
& 2D CNN LSTM networks,’Biomed. Signal Process. Control, vol. 47,
pp. 312–323, Jan. 2019.
[45] Z. Cui, R. Ke, Z. Pu, and Y. Wang, ‘‘Stacked bidirectional and unidirec-
tional LSTM recurrent neural network for forecasting network-wide traffic
state with missing values,’Transp. Res. C, Emerg. Technol., vol. 118,
Sep. 2020, Art. no. 102674.
[46] N. Javaid, A. Naz, R. Khalid, A. Almogren, M. Shafiq, and A. Khalid,
‘‘ELS-Net: A new approach to forecast decomposed intrinsic mode func-
tions of electricity load,’IEEE Access, vol. 8, pp. 198935–198949, 2020.
MUHAMMAD ASIF (Graduate Student Member,
IEEE) received the B.S. degree in information
technology from the University of Gujrat, Gujrat,
Pakistan, in 2017. He is currently pursuing the
M.S. degree in computer science with the Com-
munications Over Sensors (ComSens) Research
Laboratory, COMSATS University Islamabad,
Islamabad Campus, under the supervision of
Prof. Nadeem Javaid. His research interests
include electricity load forecasting, financial
market forecasting, and smart grids.
OROOJ NAZEER received the M.S. degree in computer science from
Abasyn University, Islamabad, under the supervision of Prof. Nadeem
received the bachelor’s degree in computer sci-
ence from Gomal University, Dera Ismail Khan,
Pakistan, in 1995, the master’s degree in elec-
tronics from Quaid-i-Azam University, Islamabad,
Pakistan, in 1999, and the Ph.D. degree from the
University of Paris-Est, France, in 2010. He is
currently a Professor and the Founding Director
of the Communications Over Sensors (ComSens)
Research Laboratory, Department of Computer
Science, COMSATS University Islamabad, Islamabad Campus. He is also
working as a Visiting Professor at the School of Computer Science, Uni-
versity of Technology Sydney, Australia. He has supervised 146 master’s
and 27 Ph.D. theses. He has authored over 900 articles in technical jour-
nals and international conferences. His research interests include energy
optimization in smart/microgrids and in wireless sensor networks using
data analytics and blockchain. He was a recipient of the Best University
Teacher Award (BUTA’16) from the Higher Education Commission (HEC)
of Pakistan, in 2016, and the Research Productivity Award (RPA’17) from
the Pakistan Council for Science and Technology (PCST), in 2017. He is an
Associate Editor of IEEE ACCESS and the Editor of Sustainable Cities and
EMAN H. ALKHAMMASH received the M.Sc. and Ph.D. degrees in com-
puter science from the University of Southampton, U.K. She is currently
working as an Associate Professor of computer science with Taif University,
Saudi Arabia. Her research area includes formal methods, AI, data science,
and so on. She was awarded as a Senior Fellow of the Higher Education
Academy (FHEA) in March 2020.
MYRIAM HADJOUNI received the Ph.D. degree (Hons.) in computer
science from Paris XI University (now Paris-Saclay University), France,
and Manouba University, Tunisia, in 2012, and the M.Sc. degree (Hons.)
from the Higher Institute of Management of Tunis, University of Tunis,
Tunisia, in 2005. She is currently working as an Assistant Professor with
the Computer Sciences Department, College of Computer and Information
Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Kingdom
of Saudi Arabia. Her research interests include, but are not restricted to,
information retrieval, artificial intelligence, data science, data analytics,
big data, and image retrieval.
VOLUME 10, 2022 27483
... Moreover, one of the main contributions to data imbalance problem is the use of adaptive synthesis balance (ASB), while missing values were treated in the same way as in previous works. In [21], a similar but more complex algorithm than the one constructed in [20] is introduced for NTL detection. This time, a 2D-CNN and bidirectional LSTM (Bi-LSTM), which is a further complex variant of LSTM are used for NTL detection. ...
... Considering these works from the perspective of representation learning evolution, the timeline indicates that with each year, the upcoming methods become more and more complex, moving from an ordinary deep network to a hybrid combination of two or more models. Not to mention the adaptive learning features [20]- [22] and generative models [18], [21], [23]. The work reported in [19] was the only one that mainly focuses on the study of balancing methods more than the importance of data representations. ...
... Another important finding related to the types of algorithms used must be discussed in this case. Indeed, we can observe that the CNN algorithm is the most used algorithm as in [16]- [19], [21], [22]. There is no doubt that the CNN has a strong feature separation capability. ...
Electricity theft, known as "Non-Technical Loss" (NTL) is certainly one of the priorities of power distribution utilities. Indeed, NTL could lead to serious damage ranging from massive financial losses to loss of reputation resulting from poor power quality. With advances in metering infrastructure technologies, the availability of user data has fueled the emergence of data-driven methods in NTL detection. Among these methods, deep learning (DL) is an indisputable alternative to conventional human-centric approaches. Typically, modeling based on NTL data is subject to three main challenges, including (i) missing information; (ii) class imbalance; and (iii) data complexity. In this context, this paper contributes to solving these three main problems while paying more attention to data complexity related to cardinality. Accordingly, a multiverse recurrent expansion with multiple repeats (MV-REMR) algorithm is proposed in this paper. MV-REMR is able to provide deeper representations than ordinary DL networks and take advantage of different trained deep network responses to build an efficient model. For MV-REMR efficiency analysis, a realistic NTL dataset is considered. As a result, MV-REMR has shown that it can achieve what is considered excellent feature mapping proven by both scatter visualization and variations in widely used classification metrics. Moreover, MV-REMR shows its ability to marginalize the distance of data classes with superior performance. In addition, thanks to the new mapping scheme, MV-REMR shows its ability to correct outliers resulting from errors in missing values filling techniques. Finally, a comparison with some recent successful works also confirms the superiority of the MV-REMR model.
... In this subsection, the proposed deep and machine learning (ML) stacking ensemble model (BiLSTM-LogitBoost) is discussed, which consists of two models BiLSTM [64] and LogitBoost [65]. The details of the proposed model are given in the below subsection. ...
... The cell maintains and remembers the data in arbitrary time-intervals. The BiLSTM [64] model is the enhanced version of the traditional LSTM [66]. ...
... The proposed BiLSTM-LogitBoost stacking ensemble model, proposed for ETD in SGs, is evaluated and discussed in this section. Some recent benchmarks, such as SVM [19], [71], logistic regression (LR) [37], decision tree (DT) [37], LSTM [21], [71], adaptive boosting (AdaBoost) [37], BiLSTM [64], LogitBoost [65], and LSTM-AdaBoost [72] are also implemented for ETD and their results are compared with the proposed model. LogitBoost with n_estimators = 25 is employed as a benchmark technique to our proposed model. ...
Full-text available
Obtaining outstanding electricity theft detection (ETD) performance in the realm of advanced metering infrastructure (AMI) and smart grids (SGs) is quite difficult due to various issues. The issues include limited availability of theft data as compared to benign data, neglecting dimensionality reduction, usage of the standalone (single) electricity theft detectors, etc. These issues lead the classification techniques to low accuracy, minimum precision, low F1 score, and overfitting problems. For these reasons, it is extremely crucial to design such a novel strategy that is capable to tackle these issues and yield outstanding ETD performance. In this article, electricity theft happening in SGs is detected using a novel ETD approach. The proposed approach comprises recursive feature elimination (RFE), k nearest neighbor oversampling (KNNOR), bidirectional long short term memory (BiLSTM), and logit boosting (LogitBoost) techniques. Furthermore, three BiLSTM networks and a LogitBoost model are combined to make a BiLSTM-LogitBoost stacking ensemble model. Data preprocessing and feature selection followed by data balancing and electricity theft classification are the four major stages of the model proposed for ETD. It is obvious from the simulations performed using state grid corporation of China (SGCC)’s electricity consumption (EC) data that our proposed model achieves 96.32% precision, 94.33% F1 score, and 89.45% accuracy, which are higher than all the benchmarks employed in this study.
... The missing values was handled as same as previous work in [7]. In [9], a hybrid algorithm between 2D-CNN and bidirectional LSTM (Bi-LSTM) are used for electricity theft detection. Accordingly, missing patterns problem is solved via linear interpolation. ...
... Table 1 summarizes the contribution of the works cited above in terms of prediction complexity, class imbalance and management of missing values. Results in Table 1 explain that authors pay more attention to solving classification Linear interpolation [7] 2021 CNN Multiple methods [8] 2021 CNN-LSTM ASB [9] 2022 2D-CNN and Bi-LSTM Bi-WGAN oversampling [10] 2022 Alexnet, AdaBoost and ABC ...
... The results indicate that the REDL learning model has greater classification ability. The only model close to the REDL model is the one presented in [9]. These close results may be related to the nature of the model, which looks like a very complex combination of 2D-CNN and Bi-LSTM and Bi-WGAN, whereas the complexity of the model, in this case, was intended to explore meaningful features. ...
Conference Paper
The increase in electricity theft has become one of the main concerns of power distribution networks. Indeed, electricity theft could not only lead to financial losses but also leads to reputation damage by reducing the quality of supply. With advanced sensing technologies of metering infrastructures, data collection of electricity consumption enables data-driven methods to emerge in such non-technical loss detections as an alternative to traditional experience-based human-centric approaches. In this context, such fraud prediction problems are generally thematic of missing patterns, class imbalance, and a higher level of cardinality where there are many possibilities that a single feature can assume. Therefore, this article is introduced specifically to solve the data representation problem and increase the sparseness between different data classes. As a result, deeper representations than deep learning networks are introduced to repeatedly merge the learning models into a more complex architecture in a sort of recurrent expansion. To verify the effectiveness of the proposed recurrent expansion of deep learning (REDL) approach, a realistic dataset of electricity theft is involved. Consequently, REDL has achieved excellent data mapping results proven by both visualization and numerical metrics and shows the ability to separate different classes with higher performance. Another important REDL feature of outliers correction has also been discovered in this study. Finally, a comparison to some recent works also proved the superiority of the REDL model.
... Data augmentation with synthetically created samples has been proven beneficial for several machine learning models [8]. A. Mikołajczyk and M. Grochowski [9] compared and analyzed multiple data augmentation methods in image classification and improved the training process efficiency for image classification. To solve the problem of class imbalance due to the lack of fraudulent electricity consumers, M. Asif et al. [10] proposed employing an evolutionary bidirectional Wasserstein Generative Adversarial Network (Bi-WGAN). They [10] use Bi-WGAN to synthesize the most plausible fraudulent electricity consumer samples to detect non-technical losses (NTL) in smart meters. ...
... To solve the problem of class imbalance due to the lack of fraudulent electricity consumers, M. Asif et al. [10] proposed employing an evolutionary bidirectional Wasserstein Generative Adversarial Network (Bi-WGAN). They [10] use Bi-WGAN to synthesize the most plausible fraudulent electricity consumer samples to detect non-technical losses (NTL) in smart meters. W. Tan and H. Guo [11] also utilized a data augmentation method in their automatic COVID-19 diagnosis framework from lung CT images and improved the generalization capability of the 2D CNN classification models. ...
Full-text available
Various studies have shown the advantages of using Machine Learning (ML) techniques for analog and digital IC design automation and optimization. Data scarcity is still an issue for electronic designs, while training highly accurate ML models. This work proposes generating and evaluating artificial data using generative adversarial networks (GANs) for circuit data to aid and improve the accuracy of ML models trained with a small training data set. The training data is obtained by various simulations in the Cadence Virtuoso, HSPICE, and Microcap design environment with TSMC 180nm and 22nm CMOS technology nodes. The artificial data is generated and tested for an appropriate set of analog and digital circuits. The experimental results show that the proposed artificial data generation significantly improves ML models and reduces the percentage error by more than 50\% of the original percentage error, which were previously trained with insufficient data. Furthermore, this research aims to contribute to the extensive application of AI/ML in the field of VLSI design and technology by relieving the training data availability-related challenges.
... However, data balancing is performed via SMOTE, where, the models tend to overfit. Moreover, an improved SMOTE, i.e., k-means clustering SMOTE (K-SMOTE) based data balancing and improved RF based electricity theft classification is done in [22]. The proposed method provides accurate and reliable locations for manual on-site inspection, so that NTL is reduced and the power system's stability and reliability are improved. ...
... It is the collection of the DTs used to make accurate and reliable predictions [12]. RF is widely utilized for binary classification problems like ETD [12], [21], [22], [43]. ...
Full-text available
Electricity theft is considered one of the most significant reasons of the non technical losses (NTL). It negatively influences the utilities in terms of the power supply quality, grid’s safety, and economic loss. Therefore, it is necessary to effectively deal with the electricity theft problem. For detecting electricity theft in smart grids (SGs), an efficient and state-of-the-art approach is designed in the underlying work based on autoencoder and bidirectional gated recurrent unit (AE-BiGRU). The proposed approach consists of six components: (1) data collection, (2) data preparation, (3) data balancing, (4) feature extraction, (5) classification and (6) performance evaluation. Moreover, bidirectional gated recurrent unit (BiGRU) is used for the identification of the anomalies in electricity consumption (EC) patterns caused due to factors like family formation changes, holidays, parties, and so on, which are referred as non-theft factors. The proposed autoencoder-bidirectional gated recurrent unit (AE-BiGRU) model employs the EC data acquired from state grid corporation of China (SGCC) for simulations. Furthermore, it is visualized from the simulation results that 90.1% accuracy and 10.2% false positive rate (FPR) are obtained by the proposed model. The results are better than different existing classifiers, i.e., logistic regression (LR), decision tree (DT), extreme gradient boosting (XGBoost), gated recurrent unit (GRU), etc.
... Another trend is the improvement of neural networks used to classify fraudsters [59]. In [60] the authors advance on this subject, presenting new artificial neural network architectures to improve NTL detection. Advancing on neural networks trends, the work in [61] combines multi-layer perceptron and gated recurrent unit, combined with the SMOTE methods for balancing data. ...
Full-text available
Non-Technical Losses (NTL) represent a serious concern for electric companies. These losses are responsible for revenue losses, as well as reduced system reliability. Part of the revenue loss is charged to legal consumers, thus, causing social imbalance. NTL methods have been developed in order to reduce the impact in physical distribution systems and legal consumers. These methods can be classified as hardware-based and non-hardware-based. Hardware-based methods need an entirely new system infrastructure to be implemented, resulting in high investment and increased cost for energy companies, thus hampering implementation in poorer nations. With this in mind, this paper performs a review of non-hardware-based NTL detection methods. These methods use distribution systems and consumers’ data to detect abnormal energy consumption. They can be classified as network-based, which use network technical parameters to search for energy losses, data-based methods, which use data science and machine learning, and hybrid methods, which combine both. This paper focuses on reviewing non-hardware-based NTL detection methods, presenting a NTL detection methods overview and a literature search and analysis.
Full-text available
Precise knowledge of secondary arc extinction instant and fault nature (temporary or permanent) is necessary for auto-reclosing after a single line-to-ground fault. Existing intelligent reclosing schemes rely on the extraction of appropriate features using a signal processing module (SPM) during online data monitoring. The value of features varied greatly under different operating scenarios as well as the computational burden is greatly enhanced owing to SPM which significantly impacts the performance of the auto-reclosing scheme. Hence, in this study bi-directional long short-term memory (Bi-LSTM) network is designed which integrates feature extraction and classification process. Thus, the proposed scheme is directly incorporated into the incoming voltage data without using any SPM/ filtering technique. The open-source test system provided by the developers of Hydro-Quebec, Canada is used for training and testing. Around 4860 different signals are collected by varying power system parameters and secondary arc conditions to develop dataset A. The Bi-LSTM model is tested under no noise, low noise of SNR 30, and high noise of SNR 10. To ensure the efficacy of the proposed scheme, uni-directional long short-term memory (U-LSTM), gated recurrent unit (GRU), and machine learning models are also trained on the same dataset. Later, for validation, a second dataset B is developed by varying surge impedance loading, frequency-dependent transmission lines, and arc resistance. Then the efficiency of pre-trained artificial neural networks (ANNs) is validated on this unseen dataset. The testing and validation on both datasets confirm superior efficiency of Bi-LSTM in comparison to U-LSTM, GRU, and other models.
In this paper, two solutions based on supervised learning models are proposed for Electricity Theft Detection (ETD). In the first solution, Adaptive Synthetic Edited Nearest Neighbor (ADASYNENN) is used to solve the class imbalance problem. For feature extraction, the Locally Linear Embedding (LLE) technique is utilized. Moreover, a Self-Attention Generative Adversarial Network (SAGAN) is used in combination with a Convolutional Neural Network (CNN) for the classification of electricity consumers. In the second solution, Synthetic Minority Oversampling Technique Edited Nearest Neighbor (SMOTEENN) is proposed. Moreover, a novel classification model, named ERNET, which is based on EfficientNet, Residual Network (ResNet) and Gated Recurrent Unit (GRU), is used to detect Non-Technical Losses (NTLs). We also use a Sparse Auto Encoder (SAE) for effective feature extraction, which makes the classification more robust and easier. Furthermore, a robust Root Mean Square Propagation (RMSProp) optimizer is used to improve the learning rate of the model. To validate the proposed models, simulations are performed using different performance metrics, such as precision, recall, F1-score, Area Under the Curve (AUC), FPR and Root Mean Square Error (RMSE). All simulations are performed using the State Grid Corporation of China (SGCC) dataset. The proposed models are compared with benchmark models, such as SAGAN, Wide and Deep Convolutional Neural Network (WDCNN), CNN and Long Short Term Memory (LSTM). The simulation results show that the proposed models outperform the existing models in terms of the aforementioned performance metrics.
Internet of Things (IoT) devices in a Smart Home Network (SHN) are highly vulnerable to complex botnet attacks. In this paper, we investigate the effectiveness of a Recurrent Neural Network (RNN) in correctly classifying network traffic samples in the minority classes of highly imbalanced network traffic data. Multiple RNN layers are stacked to learn hierarchical representations of highly imbalanced network traffic data at different levels of abstraction. We evaluate the performance of the Stacked RNN (SRNN) model on the Bot-IoT dataset. Results show that SRNN outperformed RNN in all classification scenarios. Specifically, the SRNN model learned the discriminating features of highly imbalanced network traffic samples in the training set with better representations than the RNN model. Also, the SRNN model is more robust and handles the over-fitting problem more effectively than the RNN model. Furthermore, the SRNN model achieved better generalization ability in detecting network traffic samples of the minority classes.
The bi-directional flow of energy and information in the smart grid makes it possible to record and analyze the electricity consumption profiles of consumers. Because of the increasing rate of inflation over the past few years, people have started looking for means to use electricity illegally, a practice termed electricity theft. Many data analytics techniques are proposed in the literature for electricity theft detection (ETD). These techniques help in the detection of suspected illegal consumers. However, the existing approaches have a low ETD rate due to either improper handling of the imbalanced class problem in a dataset or the selection of an inappropriate classifier. In this paper, a robust big data analytics technique is proposed to resolve the aforementioned concerns. Firstly, adaptive synthetic sampling (ADASYN) is applied to handle the imbalanced class problem of the data. Secondly, a deep Siamese network (DSN) that integrates a convolutional neural network (CNN) and a long short-term memory (LSTM) network is proposed to discriminate between the features of honest and fraudulent consumers. Specifically, the task of feature extraction from weekly energy consumption profiles is handed over to the CNN module while the LSTM module performs the sequence learning. Finally, the DSN operates on the shared features provided by the CNN-LSTM and makes the final judgment. The data analytics is performed on different train-test ratios of the real-time smart meters' data. The simulation results validate the proposed model's effectiveness in terms of high area under the curve, F1-score, precision and recall.
In smart grids, electricity theft is the most significant challenge. It cannot be identified easily since existing methods are dependent on specific devices. Also, the methods fail to extract meaningful information from high-dimensional electricity consumption data and increase the false positive rate, which limits their performance. Moreover, imbalanced data is a hurdle in accurate electricity theft detection (ETD) using data-driven methods. To address this problem, sampling techniques are used in the literature. However, the traditional sampling techniques generate insufficient and unrealistic data that degrade the ETD rate. In this work, two novel ETD models are developed. A hybrid sampling approach, i.e., synthetic minority oversampling technique with edited nearest neighbor, is introduced in the first model. Furthermore, AlexNet is used for dimensionality reduction and extracting useful information from electricity consumption data. Finally, a light gradient boosting model is used for classification. In the second model, a conditional Wasserstein generative adversarial network with gradient penalty is used to capture the real distribution of the electricity consumption data. It is constructed by adding auxiliary provisional information to generate more realistic data for the minority class. Moreover, the GoogLeNet architecture is employed to reduce the dataset’s dimensionality. Finally, adaptive boosting is used for classification of honest and suspicious consumers. Both models are trained and tested using real power consumption data provided by the State Grid Corporation of China. The proposed models’ performance is evaluated using different performance metrics like precision, recall, accuracy, F1-score, etc. The simulation results show that the proposed models outperform the existing techniques, such as support vector machine, extreme gradient boosting, convolutional neural network, etc., in terms of efficient ETD.
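To make the oversampling idea concrete, the following is a minimal sketch of the interpolation step commonly associated with the synthetic minority oversampling technique. It is not the cited authors' implementation: the function name `synthesize_minority_sample`, the neighbour count `k`, and the toy data are assumptions for illustration only, and the edited-nearest-neighbor cleaning step is omitted.

```python
import random

def synthesize_minority_sample(minority, k=2, rng=None):
    """Create one synthetic sample by interpolating between a minority
    sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two equal-length feature vectors."""
    if rng is None:
        rng = random.Random(0)  # fixed seed only so the sketch is repeatable
    base = rng.choice(minority)
    # Rank the remaining minority samples by squared Euclidean distance to base.
    others = [x for x in minority if x is not base]
    others.sort(key=lambda x: sum((a - b) ** 2 for a, b in zip(base, x)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    # The new point lies on the segment between base and neighbour.
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]

# Hypothetical toy minority class with two features per consumer.
theft_samples = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(synthesize_minority_sample(theft_samples))
```

Because each synthetic point is a convex combination of two existing minority samples, it always stays within the range already spanned by the minority class, which is one reason GAN-based generators are explored as an alternative.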
Data-driven electricity theft detectors rely on customers’ reported energy consumption readings to detect malicious behavior. One common implicit assumption in such detectors is the correct labeling of the training data. Unfortunately, these detectors are vulnerable to data poisoning attacks that assume false labels during training. This paper addresses three major problems: What is the impact of data poisoning attacks on the detector’s performance? Which detector is more robust against data poisoning attacks, i.e., generalized or customer-specific detectors? How can the detector’s robustness against data poisoning attacks be improved? Our investigations reveal that: (a) Shallow and deep learning-based detectors suffer from data poisoning attacks that may lead to a significant deterioration of detection rate of up to 17%. Furthermore, deep detectors offer 12% performance improvement over shallow detectors. (b) Generalized detectors present 4% performance improvement over customer-specific detectors even in the presence of data poisoning attacks. To enhance the detectors’ robustness against data poisoning attacks, we propose a sequential ensemble detector based on a deep auto-encoder with attention (AEA), gated recurrent units (GRUs), and feed forward neural networks. The proposed robust detector retains a stable detection performance that deteriorates by only 1-3% in the presence of strong data poisoning attacks.
Electricity theft is a big problem faced by all energy distribution services and continues to rise. Therefore, studies on electricity theft detection techniques have increased in recent years. Unsuitable calibration and illegal calibration of energy meters during production may cause non-technical losses. Non-technical losses have been a major concern because of the resulting security risks and the immeasurable loss of income. In most of the locations where meters have been tampered with, damaged meter terminals and/or illegal applications cannot be distinguished during checking. In fact, electric distribution companies will never be able to eliminate electricity theft. But it is possible to take measures to detect, prevent and reduce it. In this paper, we develop a detection approach using deep learning methods on real daily electricity consumption data (the electricity consumption dataset of the State Grid Corporation of China). Data reduction has been performed by developing a new method to make the dataset more usable and to extract meaningful results. A Long Short-Term Memory (LSTM) based deep learning method has been developed for the dataset to be able to recognize the actual daily electricity consumption data of 2016. In order to evaluate the performance of the proposed method, the accuracy, prediction and recall metrics were used with five-fold cross-validation. The performance of the proposed method was found to be better than previously reported results.
The significance of electricity cannot be overlooked, as all fields of life, like material production, health care, the educational sector, etc., depend upon it to render consistent and high-quality services and to increase productivity and business continuity. To this end, energy operators have experienced a continuously increasing trend in electricity demand for the past few decades. This may cause many issues like load shedding, increased electricity bills, imbalance between supply and demand, etc. Therefore, forecasting of electricity demand using efficient techniques is crucial for the energy operators to decide about optimal unit commitment and to make electricity dispatch plans. It also helps to avoid wastage as well as shortage of energy. In this study, a novel forecasting model, known as ELS-net, is proposed, which is a combination of an Ensemble Empirical Mode Decomposition (EEMD) method, a multi-model Ensemble Bi Long Short-Term Memory (EBiLSTM) forecasting technique and a Support Vector Machine (SVM). In the proposed model, EEMD is used to distinguish between linear and non-linear intrinsic mode functions (IMFs), EBiLSTM is used to forecast the non-linear IMFs and SVM is employed to forecast the linear IMFs. Using separate forecasting techniques for linear and non-linear IMFs decreases the computational complexity of the model. Moreover, SVM requires less computational time than EBiLSTM for linear IMFs. Simulations are performed to examine the effectiveness of the proposed model using two different datasets: New South Wales (NSW) and Victoria (VIC). For performance evaluation, Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) are used as performance metrics. The simulation results show that the proposed ELS-net model outperforms the state-of-the-art techniques, such as EMD-BILSTM-SVM, EMD-PSO-GA-SVR, BiLSTM, MLP and SVM, in terms of forecasting accuracy and minimum execution time.
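The three error metrics named above have standard definitions, which can be sketched in a few lines of plain Python. This is only an illustration of the formulas: the function names and the sample load values are assumptions, not taken from the cited study.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.
    Undefined when any actual value is zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical demand values, for illustration only.
actual_load = [100.0, 200.0, 400.0]
forecast_load = [110.0, 190.0, 400.0]
print(rmse(actual_load, forecast_load))
print(mae(actual_load, forecast_load))
print(mape(actual_load, forecast_load))
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses the error relative to the actual demand, which makes results comparable across datasets with different load scales.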
Non-technical losses stand for the energy consumed but not billed, affecting the energy grid as a whole. This issue prevails in developing countries, harming the quality of energy and preventing social programs from benefiting from tax revenues. Machine learning techniques can help mitigate it by mining information from fraudsters and legal users for further decision-making. In this paper, we deal with a steady increase in dataset size, i.e., the incremental learning problem, in which datasets are regularly provided by energy companies and the learner must be updated constantly. Since repeating the entire learning process might be prohibitive, adjusting the model to the new data proves to be a better choice. We propose an incremental Optimum-Path Forest approach with k-nn neighborhood that is considerably more efficient to train than its counterpart version, with experiments validated on general-purpose datasets and also in the context of non-technical losses identification.
Although fraud in electricity consumption is easier to detect when consumption is recorded hourly by smart meters, in most developing countries, where the propensity for fraud is higher, conventional meters are not yet affordable. Fraud detection is easier with time series data-logging because the periodicity and variability of consumption reveal deviations from a regular consumption pattern. In contrast, fraud detection with conventional meters remains a significant challenge because anomalies in consumption are well hidden within the normal consumption of other consumers. In this paper, large datasets of consumer and invoice data from Tunisia are combined and investigated with several Machine Learning (ML) classification algorithms to detect irregularities in electricity consumption. With extensive feature engineering, including multivariate Gaussian distribution, ensemble classifiers such as Light Gradient Boosting (LGB) outperform other algorithms and achieve realistic performance on challenging, unbalanced and uncorrelated input datasets.
In this paper, a novel hybrid deep learning approach is proposed to detect the nontechnical losses (NTLs) that occur in smart grids due to illegal use of electricity, faulty meters, meter malfunctioning, unpaid bills, etc. The proposed approach is based on data-driven methods due to the sufficient availability of smart meters’ data. A bi-directional Wasserstein generative adversarial network (Bi-WGAN) is utilized to generate synthetic theft samples for solving the class imbalance problem. The Bi-WGAN efficiently synthesizes the minority class theft samples by leveraging the capabilities of an additional encoder module. Moreover, the curse of dimensionality degrades the model’s generalization ability. Therefore, the high dimensionality issue is solved using the two-dimensional convolutional neural network (2D-CNN) and bidirectional long short-term memory network (Bi-LSTM). The 2D-CNN is applied to 2D weekly data to extract the most prominent features. In the 2D-CNN, the convolutional and pooling layers extract only the potential features and discard the redundant features to reduce the curse of dimensionality. This process increases the convergence speed of the model and reduces the computational overhead. Meanwhile, a Bi-LSTM is also used to detect the non-malicious changes in consumers’ load profiles using its strong memorization capabilities. Finally, the outcomes of both models are concatenated into a single feature map and a sigmoid activation function is applied for final NTL detection. The simulation results demonstrate that the proposed model outperforms the existing scheme in terms of Matthews correlation coefficient (MCC), precision-recall (PR) and area under the curve (AUC). It achieves 3%, 5% and 4% greater MCC, PR and AUC scores, respectively, as compared to the existing model.
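The "2D weekly data" fed to the 2D-CNN can be pictured as a daily consumption series folded into rows of seven days. The sketch below shows one plausible way to do this; it is not the authors' preprocessing code, and the function name `to_weekly_matrix` and the zero-padding of an incomplete final week are assumptions made for illustration.

```python
def to_weekly_matrix(daily, pad_value=0.0):
    """Fold a 1D list of daily readings into a 2D list with one row per week.
    An incomplete final week is padded with `pad_value` (an assumed choice)."""
    days_per_week = 7
    padded = list(daily)
    remainder = len(padded) % days_per_week
    if remainder:
        padded += [pad_value] * (days_per_week - remainder)
    # Slice the padded series into consecutive 7-day rows.
    return [padded[i:i + days_per_week]
            for i in range(0, len(padded), days_per_week)]

# Hypothetical 10-day consumption profile (kWh), for illustration only.
profile = [5.1, 4.8, 5.0, 5.3, 4.9, 6.2, 6.0, 5.2, 4.7, 5.1]
for week in to_weekly_matrix(profile):
    print(week)
```

In this layout each column corresponds to the same weekday, so a 2D convolution can pick up both within-week patterns (along a row) and week-to-week changes (down a column), while the unfolded daily series remains available as the sequential input for a recurrent model such as a Bi-LSTM.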