A Hybrid Deep Learning Approach for Detecting
Non Technical Losses in Smart Grids
Muhammad Asif1, Benish Kabir1, Pamir1, Ashraf Ullah1, Shoaib Munawar2, Nadeem Javaid1,∗
1Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
2Department of Electrical Engineering, International Islamic University, Islamabad 44000, Pakistan
Email: firstname.lastname@example.org, email@example.com,
firstname.lastname@example.org, email@example.com, firstname.lastname@example.org,
∗Corresponding author: email@example.com; www.njavaid.com
Abstract—In this paper, a novel hybrid deep learning approach is proposed to detect the non-technical losses (NTLs) that occur in smart grids due to illegal use of electricity, faulty meters, meter malfunctioning, unpaid bills, etc. The proposed approach is data-driven because sufficient smart meter data is available. A bi-directional Wasserstein generative adversarial network (Bi-WGAN) is utilized to generate synthetic theft samples for solving the class imbalance problem. The Bi-WGAN efficiently synthesizes the minority class theft samples by leveraging the capabilities of an additional encoder module. Moreover, the curse of dimensionality degrades the model's generalization ability. Therefore, the high dimensionality issue is addressed using a two-dimensional convolutional neural network (2D-CNN) and a bidirectional long short-term memory network (Bi-LSTM). The 2D-CNN is applied on 2D weekly data to extract the most prominent features. In the 2D-CNN, the convolutional and pooling layers extract only the relevant features and discard the redundant ones, which mitigates the curse of dimensionality. This process increases the convergence speed of the model and reduces the computational overhead. Meanwhile, a Bi-LSTM is used to detect the non-malicious changes in consumers' load profiles using its strong memorization capabilities. Finally, the outcomes of both models are concatenated into a single feature map and a sigmoid activation function is applied for final NTL detection. The simulation results demonstrate that the proposed model outperforms the existing scheme in terms of Matthews correlation coefficient (MCC), precision-recall (PR) and area under the curve (AUC). It achieves 3%, 5% and 4% greater MCC, PR and AUC scores, respectively, as compared to the existing model.
Index Terms—electricity theft detection, smart grid, deep
learning, data augmentation, feature engineering
I. INTRODUCTION
Electricity has become a necessary part of our lives. The electricity generated through hydropower, wind power, or thermal power is transmitted to the grid stations. The grid stations further transmit the electricity to power utilities for distribution in different industrial and residential regions. During the generation, transmission and distribution of electricity, different losses often occur. These losses are generally divided into technical losses (TLs) and non-technical losses (NTLs). The former occur due to energy dissipation in electricity distribution lines, short circuits in transformers, fatal electric shocks, etc. The latter happen due to metering faults, bypassing of meters, physical tampering through shunt devices, unpaid bills, etc. For power utilities, NTLs are a serious issue because they account for billions of dollars in electricity losses every year. According to a World Bank report, the United States loses $6 billion  due to NTLs. Moreover, the power utilities in Fujian, China have lost almost $15 million to date . That is why electricity theft detection (ETD) is a serious issue for the current and future era. However, the emergence of smart grids and advanced metering infrastructure (AMI) enables two-way energy and communication flow between power utilities and consumers. The smart meters collect the electricity consumption data at each time stamp. So, the sufficient availability of electricity consumption data opens a new way for the research community to contribute to efficient ETD.
The literature contains various methods of ETD. Currently, three main categories of methods exist for the detection of energy theft: 1) state or hardware based, 2) game theory based and 3) data driven. The state based methods  require additional hardware devices and sensors for theft detection, which is not a suitable approach because additional monetary cost is needed to install and maintain the devices. Similarly, the game theory based methods  create a simulated environment where a game is played between the consumers and power utilities for solving the electricity theft problem. However, this is not a suitable approach either, because designing the simulated environment for complex real world scenarios is a challenging task. Therefore, the data driven methods attract the research community's attention because they only require a dataset for the model's training. Afterwards, they are able to discriminate between normal and malicious users by exploiting different machine and deep learning techniques.
A substantial body of work in the literature identifies energy theft by utilizing supervised and unsupervised learning models. Many researchers , ,  use different machine and deep learning techniques for detecting energy theft. However, all of them have a low detection rate and poor generalization results due to inefficient feature engineering and the limited availability of labeled electricity data. Moreover, another common issue in ETD is class imbalance , , ,  because in real world scenarios, theft samples are rarely available as compared to honest samples. Furthermore, the curse of dimensionality  is also a major problem faced by researchers. It degrades the model's accuracy and increases the computational time.
The major contributions of this research are as follows.
• In this study, a 2D-CNN and Bi-LSTM hybrid model is used to solve the curse of dimensionality issue. The 2D-CNN model captures only the latent trends, hidden features and periodicity from the high dimensional feature space. Meanwhile, the Bi-LSTM learns the long-term temporal correlation from the electricity consumption data for
• The Bi-WGAN is used to solve the class imbalance problem. The minority class samples are synthesized through the Bi-WGAN. The Bi-WGAN generates plausible theft samples by using the capabilities of the additional encoder module. The encoder module performs an inverse mapping of real input to the latent space in order to strengthen the generator's capabilities.
The rest of the paper is organized as follows. Section II presents the related work. Section III describes the proposed system model in detail. Section IV presents the results and discussion of the proposed and existing models. The conclusion is presented in Section V.
II. RELATED WORK
In the literature, many researchers use different machine learning and statistical models for ETD. However, these models demand manual feature engineering and relevant domain knowledge. The existing models are applied to one-dimensional (1D) electricity consumption data, and capturing latent features from the 1D data is a challenging task . Similarly, in , the authors discuss that many existing machine learning models do not focus on proper feature engineering, which leads the models toward poor generalization results. In addition, the available electricity consumption dataset is high dimensional, so extracting the most abstract feature representation from it is a difficult task. Improper feature engineering also leads to a high false positive rate (FPR), which degrades the system performance.
In the literature, many traditional schemes focus on handcrafted feature engineering for NTL detection . However, no mathematical mechanisms are found in the existing literature for identifying the shunt and double-tapping attacks. Moreover, for detecting any new type of NTL behavior, the traditional schemes demand the re-involvement of domain experts for creating new relevant features, which is tedious and time-consuming.
In , , , , the authors note that the existing methods present no appropriate feature engineering mechanisms. The manual feature engineering process requires extra time and domain knowledge. In , an autoencoder is used to extract abstract features from high dimensional electricity consumption data. However, it still needs improvement to detect some intelligent attacks, such as zero-day attacks, with high precision. The authors of ,  observe that in the literature, the features relevant to electricity consumption are mostly designed manually using the domain experts' knowledge. However, these features are still not suitable for detecting NTL because of the arbitrarily changing patterns of electricity consumption profiles. So, for industrial users, these manually generated features are not sufficient for efficient pattern recognition and NTL detection. In , , the authors mention that several previous studies exploit different machine and deep learning models for efficient ETD and feature construction. However, none of them maintains the temporal correlation of a customer's consumption pattern over a long period for efficient theft identification. Also, learning hidden patterns from 1D electricity consumption data is a difficult task. In , , the conventional machine learning models have low detection ability and poor performance because of several non-malicious factors. In , a semi-supervised solution is proposed for ETD. However, it still needs improvement in terms of raising the detection rate (DR) and lowering the FPR.
In , , , , , the authors note that data imbalance is a vital issue in ETD. In a real world scenario, theft samples are rarely available as compared to honest samples. So, the machine learning classifiers show bias towards the majority class samples. In addition, the limited availability of theft samples degrades the DR of classification models. In , , , NTL detection through machine learning techniques becomes a challenging task due to the insufficient availability of labeled training data. Similarly, in , , , a severe imbalance in the data also affects the classification model's generalization ability and raises the chance of over-fitting. The authors of  discuss that different oversampling techniques are used to reproduce the minority class data samples for solving the data imbalance issue in ETD. The existing oversampling techniques such as the synthetic minority oversampling technique (SMOTE), adaptive synthetic (ADASYN), generative adversarial network (GAN), etc., are exploited for synthesizing the theft class samples. However, these techniques do not consider the fluctuation and probability distribution curve while generating theft samples, so they fail to give a realistic assessment.
III. PROPOSED SYSTEM MODEL
This section describes each component of the proposed system model in detail. Fig. 1 shows the complete workflow of the proposed methodology.
Fig. 1: Proposed system model
1) The electricity consumption data often contains noisy and missing values because of faulty meters, meter hacking, maintenance or storage issues, etc. Such erroneous and noisy data degrade the models' performance. To tackle these issues, we use data preprocessing techniques in this paper. The missing values are handled through the linear interpolation method. This method fills the missing values by taking the average of the next and previous day's electricity consumption. Similarly, the noise and outliers must be handled because they affect the model's performance. So, we use the three sigma rule of thumb  (TSR) to handle the outliers. Afterwards, the electricity consumption data should be normalized because deep neural networks are very sensitive to diverse data. So, we use Min-Max normalization to normalize the dataset.
2) The Bi-WGAN  is an enhanced version of the WGAN , . An additional encoder module is integrated into the Bi-WGAN for enhancing the capabilities of the generator network. In this study, to overcome the class imbalance issue, the Bi-WGAN is employed for generating plausible fake electricity theft patterns that closely mimic the real world behavior of electricity thieves. To generate such theft samples, the encoder module works in the inverse direction of the generator network: it maps the real input data back to the latent space. Moreover, the Bi-WGAN utilizes the Wasserstein distance (WD) as a loss function, which helps the model learn stably and converge quickly towards the global optimum. The WD is also called the earth mover's distance. It measures the minimum cost of moving the mass of one probability distribution onto the other, so a small WD means that the fake samples are closely related to the real samples. So, during the adversarial training of the generator and discriminator, the WD should be minimized for better generation of the fake samples.
3) In , the authors apply a 2D-CNN on temporal data for speech recognition, where it shows satisfactory performance. Moreover, in , the 2D-CNN is used to capture the hidden patterns and trends from the electricity load profile. Motivated by  and , in this methodology, a 2D-CNN is exploited for extracting the most prominent features from the high-dimensional feature space. The available electricity data is in 1D raw form. Capturing the hidden fluctuations and trends from the 1D electricity data is very difficult because the consumption patterns show no association with each other. Therefore, in this work, we transform the 1D daily electricity data into 2D weekly electricity consumption data. The data is passed to the CNN model for capturing the latent patterns and trends for better generalization. The 2D-CNN model applies 2D convolution layers to the data. Every convolution layer has a specific receptive field, or area, over which different filters are strided to generate feature maps. Afterwards, pooling layers are applied to the feature maps in order to reduce the dimensionality and the number of parameters. Max-pooling is chosen for the pooling operations. It picks the maximum value from the receptive field and discards the remaining values.
4) In the proposed methodology, a Bi-LSTM  is used for capturing the temporally correlated features from the time series data for efficient ETD. The Bi-LSTM processes the sequence in both the forward and backward directions at each time step. It thus maintains the context of previous knowledge as well as the current knowledge for better prediction. By preserving the long-term history of customer patterns, it efficiently deals with the non-malicious factors and reduces the FPR to a minimal level. Lowering the FPR also saves the cost of unnecessary on-site inspections. In the Bi-LSTM, different gates are used to maintain the sequence of information. The input gate in an LSTM takes the data of the previous and current states and passes it through the sigmoid function to decide which state information is important. Similarly, the forget gate decides which information should be kept or thrown away. Finally, the output gate decides which and how much information is passed to the next hidden state. Furthermore, a cell state is maintained for storing the necessary information for a long time. The benefit of the Bi-LSTM is that it remembers the context in both directions, which increases the detection accuracy and reduces the FPR.
5) In the hybrid layer, we concatenate the output features of both the 2D-CNN and Bi-LSTM models into a single feature map and apply a joint weight for hybrid training. Then, we apply the sigmoid activation function to the combined feature map for final classification.
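The preprocessing, 2D reshaping and hybrid output steps listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the zero fallback for a missing value without two valid neighbours, the choice to cap (rather than drop) values above mean + 3·std, and the single weight vector in `hybrid_score` are all assumptions made for illustration.

```python
import numpy as np

def fill_missing(x):
    """Replace each NaN by the mean of the previous and next day's values.

    Assumption: a NaN without two valid neighbours is set to 0.
    """
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for i in np.flatnonzero(np.isnan(x)):
        prev_ok = i > 0 and not np.isnan(x[i - 1])
        next_ok = i < len(x) - 1 and not np.isnan(x[i + 1])
        out[i] = (x[i - 1] + x[i + 1]) / 2 if prev_ok and next_ok else 0.0
    return out

def clip_outliers(x, k=3.0):
    """Three sigma rule: cap values above mean + k * std (assumed variant)."""
    x = np.asarray(x, dtype=float)
    upper = x.mean() + k * x.std()
    return np.minimum(x, upper)

def min_max(x):
    """Min-Max normalization to [0, 1]; a constant series maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def to_weekly(x, days_per_week=7):
    """Reshape a 1D daily series into a 2D (weeks, 7) array.

    Trailing days that do not fill a whole week are dropped.
    """
    x = np.asarray(x, dtype=float)
    n_weeks = len(x) // days_per_week
    return x[: n_weeks * days_per_week].reshape(n_weeks, days_per_week)

def hybrid_score(cnn_features, lstm_features, weights, bias=0.0):
    """Concatenate both feature vectors and apply a sigmoid output unit."""
    joint = np.concatenate([cnn_features, lstm_features])
    return 1.0 / (1.0 + np.exp(-(joint @ weights + bias)))

# Hypothetical 15-day series with one missing reading.
daily = np.array([1.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0,
                  9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
clean = min_max(clip_outliers(fill_missing(daily)))
weekly = to_weekly(clean)
print(weekly.shape)  # (2, 7): the 15th day is dropped
```

In a full model, `cnn_features` and `lstm_features` would come from trained 2D-CNN and Bi-LSTM branches; here they are plain vectors so that the concatenation and sigmoid step can be seen in isolation.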
IV. MODEL EVALUATION
This section contains the simulation results and discussion of the proposed and benchmark models. The proposed model is evaluated and tested on the State Grid Corporation of China (SGCC) dataset, which is publicly available on the internet.
A. Performance metrics
As ETD is a class imbalance problem, the selection of appropriate performance measures is necessary for the comprehensive evaluation of the proposed model. Therefore, in this study, PR, AUC and MCC scores are considered as the performance metrics. The mathematical formulation of these metrics is given as follows:
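$$\text{Precision} = \frac{TP}{TP + FP}, \tag{1}$$
$$\text{Recall} = \frac{TP}{TP + FN}, \tag{2}$$
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}, \tag{3}$$
$$\text{AUC} = \frac{\sum_{i \in \text{positive class}} \text{Rank}_i - \frac{P(1 + P)}{2}}{P \times N}, \tag{4}$$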
where TP and TN denote the numbers of consumers that are accurately identified as abnormal and normal, respectively. FP represents those consumers who are wrongly classified as abnormal. Similarly, FN denotes those consumers who are misclassified as normal. In the AUC formula, Rank_i is the rank of sample i when the samples are sorted by ascending predicted score, and P and N are the numbers of positive and negative samples, respectively. The precision and recall scores tell how accurately theft is predicted. The AUC-ROC score measures the separability of the fair and unfair classes. The MCC score equally focuses on TP, TN, FP and FN for fair analysis. It ranges between -1 and +1. An MCC score close to +1 indicates that the model performs best while detecting the energy thieves, and vice versa.
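As an illustration of these metrics, the following sketch computes precision, recall, MCC and the rank-based AUC from labels and scores. The MCC here uses the standard Matthews definition, and the AUC function assumes that all predicted scores are distinct (no tie handling). The function names and the six-consumer example are hypothetical.

```python
import math
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN), where label 1 marks an abnormal (theft) consumer."""
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(t & p))
    tn = int(np.sum(~t & ~p))
    fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p))
    return tp, tn, fp, fn

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc_by_rank(y_true, scores):
    """Rank-based AUC; assumes distinct scores and both classes present."""
    t = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    ranks = np.empty(len(s))
    ranks[np.argsort(s)] = np.arange(1, len(s) + 1)  # rank 1 = lowest score
    p = int(t.sum())
    n = len(t) - p
    return (ranks[t].sum() - p * (1 + p) / 2) / (p * n)

# Hypothetical predictions for six consumers.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
scores = [0.9, 0.4, 0.3, 0.1, 0.8, 0.6]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(precision(tp, fp), recall(tp, fn), mcc(tp, tn, fp, fn))
print(auc_by_rank(y_true, scores))
```

With these inputs the counts are TP = 2, TN = 2, FP = 1 and FN = 1, so the MCC is 3/9 and the AUC is 8/9: eight of the nine positive-negative pairs are ordered correctly.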
B. Simulation results
Fig. 2 depicts the loss curves of both the generator and discriminator models while training and testing on the real and fake samples. In the Bi-WGAN, during each iteration of training, half a batch of real samples and half a batch of fake samples are used to update the weights of the discriminator model. The blue curve shows the loss of the discriminator model on real samples and the orange curve shows the loss of the discriminator on fake samples produced by the generator. Both curves show that the discriminator model classifies the fake samples more efficiently than the real samples after a few iterations. They also show that the discriminator model competes well with the generator model during the adversarial training. Moreover, in the Bi-WGAN, the discriminator model is updated more often than the generator model during the training phase for better generalization results. The green curve shows the loss of the generator model during training. The generator model gradually reduces its loss on each iteration because of the additional encoder module for the inverse mapping of real samples back to the latent dimension. Due to the Wasserstein loss function and the additional encoder module, the Bi-WGAN model has the best generalization results while generating the electricity theft samples.
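The loss terms behind such curves can be sketched as follows, assuming the standard WGAN objective (the critic minimizes the mean score on fake samples minus the mean score on real samples, and the generator minimizes the negative mean score on fake samples). This is not the authors' training code: the critic scores below are hypothetical numbers rather than outputs of a trained network, and `wasserstein_1d` only illustrates the earth mover's distance for two equal-sized 1D samples.

```python
import numpy as np

def critic_loss(scores_real, scores_fake):
    """Standard WGAN critic (discriminator) loss on one half-real, half-fake batch."""
    return float(np.mean(scores_fake) - np.mean(scores_real))

def generator_loss(scores_fake):
    """Standard WGAN generator loss: push critic scores on fake samples up."""
    return float(-np.mean(scores_fake))

def wasserstein_1d(a, b):
    """Empirical Wasserstein-1 distance between two equal-sized 1D samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same size")
    return float(np.mean(np.abs(a - b)))

# Hypothetical critic scores for a half batch of real and of fake samples.
real_scores = np.array([0.8, 1.2])
fake_scores = np.array([-0.5, 0.1])

print(critic_loss(real_scores, fake_scores))
print(generator_loss(fake_scores))
print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))
```

In a typical WGAN training loop the critic step is repeated several times per generator step, which matches the statement above that the discriminator is updated more often than the generator.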
TABLE I: Mapping Table
| Limitations identified | Solutions proposed | Validations |
|---|---|---|
| L.1 Curse of dimensionality and inefficient feature engineering degrade the model's accuracy as well as increase the computational time , | S.1 A hybrid 2D-CNN and Bi-LSTM approach is used for extracting the most prominent features from the high dimensional time series data. | V.1 The performance of the proposed model is validated through MCC, AUC-ROC and precision-recall curve (PRC), as given in Figs. 3a, 3b and 4. |
| L.2 Due to the class imbalance issue, the classifier is biased towards the majority class , . | S.2 Bi-WGAN generates the most plausible real world synthetic attack samples by the addition of an encoder module along with the generator. | V.2 The proposed Bi-WGAN synthesizes fake theft samples and the results are depicted in Fig. 2. |
Fig. 2: Loss of generator and discriminator of Bi-WGAN
Table I shows the mapping of limitations to their proposed solutions and validations. The limitation L1 describes the curse of dimensionality and inefficient feature engineering issues. The authors of , , , did not consider any feature engineering mechanism for extracting the most relevant features from the high dimensional feature space, which decreases the model's detection accuracy and increases the computational overhead. So, in S1, we present a hybrid 2D-CNN and Bi-LSTM approach for extracting the most prominent features from the high-dimensional time series data. In V1, the results of S1 are validated through MCC, AUC-ROC and PRC, as shown in Figs. 3a, 3b and 4. L2 is about the data imbalance issue. In ETD, the collection of balanced data is a challenging task because in a real world scenario the electricity theft samples are rarely available as compared to the normal samples. So, the classification models get biased towards the majority class. Therefore, in S2, a Bi-WGAN model is used to generate synthetic theft samples that are closely related to the real world theft cases. In V2, the Bi-WGAN performance is validated by measuring the classification results on the synthetic samples, as shown in Fig. 2. Moreover, Fig. 2 validates the convergence speed of the Bi-WGAN in terms of loss.
Fig. 3a illustrates the MCC score. The MCC score equally focuses on TP, TN, FP and FN for quantifying the correlation between the predicted and actual classes. The calculation of the MCC score is necessary in the case of ETD because the FN rate is also important for power utilities in recovering the maximum revenue. The proposed model achieves an MCC score of 0.91, which is good in the case of ETD. It shows that the proposed model efficiently lowers the FN rate and helps the power utilities save financial and on-site inspection expenses.
Fig. 3b shows the AUC-ROC score of the proposed model and a benchmark LSTM-MLP model. The proposed and LSTM-MLP models achieve AUC-ROC scores of 0.98 and 0.96, respectively. This means that the proposed model outperforms the existing benchmark model in detecting energy thieves. The proposed model efficiently reduces the FPR to a minimal level due to the strong memorization and learning capabilities of the Bi-LSTM model. The 2D-CNN module of the hybrid model addresses the curse of dimensionality issue through its max pooling layers.
Fig. 4 illustrates the PRC score of the proposed model and the benchmark LSTM-MLP model. Both precision and recall scores are important for power utilities. These scores help the power utilities detect the electricity thieves and recover the maximum revenue. The proposed and existing LSTM-MLP models obtain PRC scores of 0.96 and 0.94, respectively. The simulation results show that the proposed model performs better than the LSTM-MLP model in detecting energy thieves.
(a) MCC score of hybrid 2D-CNN and Bi-LSTM (b) ROC-AUC curve of hybrid 2D-CNN and Bi-LSTM
Fig. 3: MCC and ROC-AUC of the proposed 2D-CNN and Bi-LSTM
Fig. 4: PRC of hybrid 2D-CNN and Bi-LSTM
V. CONCLUSION
This paper presents a novel hybrid deep learning model for efficient ETD. The problem of the imbalanced dataset is solved through the Bi-WGAN. The Bi-WGAN efficiently learns the electricity theft patterns and then generates new theft samples that closely mimic the real world theft behavior. The Bi-WGAN performs well due to the addition of an external encoder module to the generator model for the inverse mapping of real inputs back to the latent space. It increases the convergence speed of the generator model of the Bi-WGAN and helps it to generate plausible theft samples. Moreover, in the Bi-WGAN, the Wasserstein distance is used as a loss function, which makes the learning of the Bi-WGAN model more stable. Furthermore, the curse of dimensionality issue is addressed using the 2D-CNN and Bi-LSTM models. The 2D-CNN model significantly reduces the data dimensions through the pooling layers. Meanwhile, the Bi-LSTM stores only the relevant information and discards the redundant information and overlapping features. Finally, the simulation results show that the proposed model outperforms in terms of AUC-ROC, PRC and MCC score values, which are 3%, 2% and 4% greater than those of the existing model.
REFERENCES
[1] Patrick McDaniel and Stephen McLaughlin. Security and privacy challenges in the smart grid. IEEE Security & Privacy, 7(3):75–77,
[2] Qixin Chen, Kedi Zheng, Chongqing Kang, and Fenyu Huangfu. Detection methods of abnormal electricity consumption behaviors: Review and prospect. Automation of Electric Power Systems, 42(17):189–199,
[3] Chun-Hao Lo and Nirwan Ansari. Consumer: A novel hybrid intrusion detection system for distribution networks in smart grid. IEEE Transactions on Emerging Topics in Computing, 1(1):33–44, 2013.
[4] Saurabh Amin, Galina A Schwartz, and Hamidou Tembine. Incentives and security in electricity distribution networks. In International Conference on Decision and Game Theory for Security, pages 264–280.
[5] Zibin Zheng, Yatao Yang, Xiangdong Niu, Hong-Ning Dai, and Yuren Zhou. Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids. IEEE Transactions on Industrial Informatics, 14(4):1606–1615, 2017.
[6] Madalina Mihaela Buzau, Javier Tejedor-Aguilera, Pedro Cruz-Romero, and Antonio Gómez-Expósito. Detection of non-technical losses using smart meter data and supervised learning. IEEE Transactions on Smart Grid, 10(3):2661–2670, 2018.
[7] Xiangyu Kong, Xin Zhao, Chao Liu, Qiushuo Li, DeLong Dong, and Ye Li. Electricity theft detection in low-voltage stations based on similarity measure and DT-KSVM. International Journal of Electrical Power & Energy Systems, 125:106544, 2021.
[8] Madalina-Mihaela Buzau, Javier Tejedor-Aguilera, Pedro Cruz-Romero, and Antonio Gómez-Expósito. Hybrid deep neural networks for detection of non-technical losses in electricity smart meters. IEEE Transactions on Power Systems, 35(2):1254–1263, 2019.
[9] Rouzbeh Razavi, Amin Gharipour, Martin Fleury, and Ikpe Justice Akpan. A practical feature-engineering framework for electricity theft detection in smart grids. Applied Energy, 238:481–494, 2019.
[10] Donghuan Yao, Mi Wen, Xiaohui Liang, Zipeng Fu, Kai Zhang, and Baojia Yang. Energy theft detection with energy privacy preservation in the smart grid. IEEE Internet of Things Journal, 6(5):7659–7669, 2019.
[11] Rajiv Punmiya and Sangho Choe. Energy theft detection using gradient boosting theft detector with feature engineering-based preprocessing. IEEE Transactions on Smart Grid, 10(2):2326–2329, 2019.
[12] Yifan Huang and Qifeng Xu. Electricity theft detection based on stacked sparse denoising autoencoder. International Journal of Electrical Power & Energy Systems, 125:106448, 2021.
[13] Arooj Arif, Nadeem Javaid, Abdulaziz Aldegheishem, and Nabil Alrajeh. Big data analytics for identifying electricity theft using machine learning approaches in micro grids for smart communities. Concurrency and Computation Practice and Experience, DOI: 10.1002/cpe.6316,
[14] Abdulaziz Aldegheishem, Mubbashra Anwar, Nadeem Javaid, Nabil Alrajeh, Muhammad Shafiq, and Hasan Ahmed. Towards sustainable energy efficiency with intelligent electricity theft detection in smart grids emphasising enhanced neural networks. IEEE Access, 9:25036–25061,
[15] Xiaoquan Lu, Yu Zhou, Zhongdong Wang, Yongxian Yi, Longji Feng, and Fei Wang. Knowledge embedded semi-supervised deep learning for detecting non-technical losses in the smart grid. Energies, 12(18):3452,
[16] Caio CO Ramos, Douglas Rodrigues, André N de Souza, and João P Papa. On the study of commercial losses in Brazil: a binary black hole algorithm for theft characterization. IEEE Transactions on Smart Grid,
[17] Behçet Kocaman and Vedat Tümen. Detection of electricity theft using data processing and LSTM method in distribution systems. Sādhanā,
[18] Tianyu Hu, Qinglai Guo, Hongbin Sun, Tian-En Huang, and Jian Lan. Nontechnical losses detection through coordinated BiWGAN and SVDD. IEEE Transactions on Neural Networks and Learning Systems, 2020.
[19] Muhammad Salman Saeed, Mohd Wazir Mustafa, Usman Ullah Sheikh, Touqeer Ahmed Jumani, and Nayyar Hussain Mirjat. Ensemble bagged tree based classification for reducing non-technical losses in Multan Electric Power Company of Pakistan. Electronics, 8(8):860, 2019.
[20] Xuejiao Gong, Bo Tang, Ruijin Zhu, Wenlong Liao, and Like Song. Data augmentation for electricity theft detection using conditional variational auto-encoder. Energies, 13(17):4291, 2020.
[21] Zeeshan Aslam, Fahad Ahmed, Ahmad Almogren, Muhammad Shafiq, Mansour Zuair, and Nadeem Javaid. An attention guided semi-supervised learning mechanism to detect electricity frauds in the distribution systems. IEEE Access, 8:221767–221782, 2020.
[22] Shuan Li, Yinghua Han, Xu Yao, Song Yingchen, Jinkuan Wang, and Qiang Zhao. Electricity theft detection in power grids with deep learning and random forests. Journal of Electrical and Computer Engineering,
[23] Nelson Fabian Avila, Gerardo Figueroa, and Chia-Chi Chu. NTL detection in electric distribution systems using the maximal overlap discrete wavelet-packet transform and random undersampling boosting. IEEE Transactions on Power Systems, 33(6):7171–7180, 2018.
[24] Paria Jokar, Nasim Arianpoo, and Victor CM Leung. Electricity theft detection in AMI using customers consumption patterns. IEEE Transactions on Smart Grid, 7(1):216–226, 2015.
[25] Kedi Zheng, Qixin Chen, Yi Wang, Chongqing Kang, and Qing Xia. A novel combined data-driven approach for electricity theft detection. IEEE Transactions on Industrial Informatics, 15(3):1809–1819, 2018.
[26] Sravan Kumar Gunturi and Dipu Sarkar. Ensemble machine learning models for the detection of energy theft. Electric Power Systems Research, 192:106904, 2021.
[27] Md Hasan, Rafia Nishat Toma, Abdullah-Al Nahid, MM Islam, Jong-Myon Kim, et al. Electricity theft detection in smart grid systems: A CNN-LSTM based approach. Energies, 12(17):3310, 2019.
[28] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):1–58, 2009.
[29] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223. PMLR, 2017.
[30] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of Wasserstein GANs. arXiv preprint arXiv:1704.00028, 2017.
[31] Jianfeng Zhao, Xia Mao, and Lijiang Chen. Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomedical Signal Processing and Control, 47:312–323, 2019.
[32] Zhiyong Cui, Ruimin Ke, Ziyuan Pu, and Yinhai Wang. Stacked bidirectional and unidirectional LSTM recurrent neural network for forecasting network-wide traffic state with missing values. Transportation Research Part C: Emerging Technologies, 118:102674, 2020.