Received January 12, 2022, accepted February 4, 2022, date of publication February 8, 2022, date of current version March 16, 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3150047

Data Augmentation Using BiWGAN, Feature Extraction and Classification by Hybrid 2DCNN and BiLSTM to Detect Non-Technical Losses in Smart Grids

MUHAMMAD ASIF1 (Graduate Student Member, IEEE), OROOJ NAZEER1,2, NADEEM JAVAID1,3 (Senior Member, IEEE), EMAN H. ALKHAMMASH4, AND MYRIAM HADJOUNI5

1Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan

2Department of Computing and Technology, Abasyn University, Islamabad 44000, Pakistan

3School of Computer Science, University of Technology Sydney, Ultimo, NSW 2007, Australia

4Department of Computer Science, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia

5Department of Computer Sciences, College of Computer and Information Science, Princess Nourah Bint Abdulrahman University, Riyadh 11671, Saudi Arabia

Corresponding author: Nadeem Javaid (nadeemjavaidqau@gmail.com)

This work is supported by Taif University Researchers Supporting Project number (TURSP-2020/292), Taif University, Taif, Saudi Arabia. This work is also supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R193), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

ABSTRACT In this paper, we present a hybrid deep learning model based on a two-dimensional convolutional neural network (2D-CNN) and a bidirectional long short-term memory network (Bi-LSTM) to detect non-technical losses (NTLs) in smart meter data. NTLs occur mainly due to the fraudulent use of electricity. The global deployment of smart meters has made historical electricity consumption (EC) data widely available. The proposed methodology learns deep insights from the historical EC data and informs power utilities about the presence of NTLs. However, the effective detection of NTLs faces the problem of class imbalance, which arises because samples of fraudulent electricity consumers are rare. To solve this issue, an evolutionary bidirectional Wasserstein generative adversarial network (Bi-WGAN) is employed. Bi-WGAN synthesizes plausible fraudulent EC samples by integrating an auxiliary encoder module. In addition, the curse of high dimensional data reduces the generalization ability of classifiers. The proposed hybrid model handles the highly dynamic data through its feature extraction capabilities. The one-dimensional daily EC data is passed to the Bi-LSTM model to capture the non-malicious changes in consumers' profiles. Meanwhile, the 2D-CNN takes 2D weekly EC data as input and extracts features by applying convolution and pooling operations. Extensive experiments are conducted on a realistic smart meter dataset to evaluate the effectiveness of the proposed model. The results show that the proposed model outperforms the state-of-the-art models, achieving an area under the receiver operating characteristic curve of 0.97 and a precision-recall area under the curve of 0.98, which makes it suitable for real-world scenarios.

INDEX TERMS Bidirectional generative adversarial network, convolutional neural network, data

augmentation, deep learning, electricity theft detection, feature extraction, long short-term memory network,

non-technical losses, smart grids.

I. INTRODUCTION

Nowadays, most human activities depend on electricity, which has become an essential part of human life. In the modern era, electricity is generated in a variety of ways, such as hydro power, wind power, fuel power and thermal power. However, different losses occur during the generation of electricity [1]. These losses are commonly classified into technical losses (TLs) and non-technical losses (NTLs). TLs happen because of heat production in electrical distribution lines, short circuits in transformers or other grid components, etc. In contrast, NTLs occur due to energy theft, meter bypassing, meter malfunctioning, billing errors, etc. The major source of NTLs is electricity theft (ET). Power utilities around the globe lose billions of dollars per annum due to NTLs. The electric utilities in the United States of America lose almost $6 billion every year because of NTLs [2]. Similarly, the Chinese power companies lost almost $15 million till 2018 as a result of energy theft [3]. Developing countries such as Brazil and India are also affected by NTLs, losing approximately 16% and 25% of their total energy supply, respectively [4]. Besides the huge financial loss, NTLs also disturb the normal flow of electricity by overloading the transformers and the grid's internal components.

The associate editor coordinating the review of this manuscript and approving it for publication was Sotirios Goudos.

VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ 27467

M. Asif et al.: Data Augmentation Using BiWGAN, Feature Extraction and Classification by Hybrid 2DCNN and BiLSTM

The recent enhancement in advanced metering infrastructure (AMI) integrates communication flow with energy flow to enable cooperation between consumers and electric utilities. AMI brings potential benefits, such as efficient recording of electricity usage, remote control of electricity consumption (EC), real-time pricing and grid status information that helps power utilities detect NTLs. However, it also introduces numerous ways for electricity thieves to remotely compromise smart metering systems and manipulate meter readings [5]. In view of these concerns, electricity theft detection (ETD) has become essential. In addition, the availability of massive EC data enables researchers to exploit state-of-the-art data driven methods for better ETD.

According to the literature, researchers have performed ETD using a variety of statistical and machine learning (ML) methods [6], [7]. In general, three families of methods are commonly used for ETD: i) state based methods, ii) game theory based methods and iii) data driven methods. In state (or hardware) based methods, special devices and sensors are integrated with the smart meters to detect abnormal consumers [8]. However, these methods are costly in terms of both time and money, and they incur extra cost for the installation and maintenance of these devices. In game theory based methods, a virtual environment is created first. Then, a game is played between electric utilities and consumers to perform ETD [9]. A special utility function is formulated in which the rules and regulations are defined. The game stops when the equilibrium state is reached. However, these methods have not proven effective because designing a suitable utility function for complex scenarios is challenging. In contrast, data driven methods require only data for model training, which makes them cost effective solutions for ETD. The massive availability of EC data enables the application of numerous data driven solutions. Researchers have adopted different supervised and unsupervised ML solutions to detect electricity thieves and help the power industry reduce revenue loss.

TABLE 1. List of Acronyms.

In recent literature, a variety of supervised and unsupervised methods have been adopted to detect energy thieves in smart grids. In this regard, several machine and deep learning based solutions are proposed to perform ETD [1], [10]–[13]. However, these solutions do not provide satisfactory results because of inefficient feature engineering, which also degrades the generalization ability of models. Moreover, the limited amount of labeled EC data is another underlying cause of reduced detection accuracy. Furthermore, in deep learning models, the problem of internal covariate shift (ICS) adversely affects the stable learning of hidden layers [1], [14]. ICS occurs when the distribution of the inputs to a hidden layer changes during training as the parameters of the preceding layers are updated. The severe lack of fraudulent electricity consumers in real-world scenarios creates a class imbalance problem, which is an important concern for efficient ETD [1], [5], [14], [15]. In addition, noisy and high dimensional data leads to the curse of dimensionality, which researchers confront during ETD [14].

In view of the above concerns, we propose a novel deep learning solution to improve the detection accuracy of ETD in power grids. The proposed model consists of a two-dimensional convolutional neural network (2D-CNN) and a bidirectional long short-term memory (Bi-LSTM) network. A bidirectional Wasserstein generative adversarial network (Bi-WGAN) is exploited for synthesizing the minority class theft samples. The one-dimensional (1D) daily EC data is converted into a 2D form according to weeks. The 2D-CNN captures the weekly patterns and periodicity from the 2D weekly data. Meanwhile, the Bi-LSTM takes 1D data as input and extracts the long-term temporal correlation from EC profiles. It also overcomes the effects of non-malicious factors and, consequently, reduces the false positive rate (FPR). Finally, a single feature vector is formed by merging the outputs of both models. Then, a sigmoid function is employed for the final ETD. It is worth mentioning that this work is an extension of [16].
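The conversion of 1D daily EC data into a 2D weekly form can be illustrated with a minimal Python sketch. The function name `to_weekly_matrix` and the choice to drop trailing days that do not fill a whole week are assumptions made for illustration; the text above does not state how incomplete weeks are handled.

```python
import numpy as np

def to_weekly_matrix(daily_ec, days_per_week=7):
    """Reshape a 1D daily consumption series into a (weeks, 7) matrix.

    Trailing days that do not fill a complete week are dropped. This is an
    assumption of the sketch; padding would be another option.
    """
    daily_ec = np.asarray(daily_ec, dtype=float)
    n_weeks = daily_ec.size // days_per_week
    return daily_ec[: n_weeks * days_per_week].reshape(n_weeks, days_per_week)

# 16 hypothetical daily readings -> 2 full weeks, 2 leftover days dropped
daily = np.arange(16, dtype=float)
weekly = to_weekly_matrix(daily)
```

Each row of `weekly` holds one week, so a 2D convolution can compare the same weekday across adjacent weeks, while the original 1D series remains available for a sequence model.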

The major contributions of this study are listed as follows.

• A novel methodology is introduced, which combines 2D-CNN and Bi-LSTM models. The proposed model efficiently performs feature extraction and resolves the curse of dimensionality issue.

• The Bi-WGAN model is employed to resolve the class imbalance problem. The samples generated by the model closely resemble real-world theft patterns. To the best of our knowledge, this is the first application of Bi-WGAN in the ETD domain for augmenting the theft class samples.

• The Bi-LSTM model is leveraged to handle the problem of high FPR, which occurs due to several non-malicious factors. The model captures long-term tendencies and temporal correlations from the EC data to minimize the effects of non-malicious changes.

• For a comprehensive analysis of the proposed model, precision, recall, area under the curve (AUC) of the receiver operating characteristic (AUC-ROC), precision-recall AUC (PR-AUC), F1-score and Matthews correlation coefficient (MCC) metrics are considered.
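The threshold-based metrics named in the last contribution can be computed directly from a confusion matrix. The following is a minimal sketch using the standard definitions of precision, recall, F1-score and MCC; the function names and the toy labels are hypothetical, and label 1 is assumed to denote the theft class.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives; label 1 is the theft class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def binary_metrics(y_true, y_pred):
    """Precision, recall, F1-score and MCC for binary labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}

# Toy imbalanced example: 3 theft consumers among 8
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

MCC uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when the theft class is rare.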

The rest of the manuscript is organized as follows. The related work is presented in Section II. The formulation and analysis of the problem statement are given in Section III. The proposed scheme is explained in Section IV. Section V describes the experimental results of the proposed and benchmark schemes. Finally, the manuscript is concluded in Section VI.

II. RELATED WORK

The literature contains numerous statistical and ML models for ETD. These models require handcrafted feature engineering and pertinent domain expertise, which is a difficult and time-consuming task. The existing ML models under-perform when capturing temporal correlations and complex non-linearities from EC profiles. In general, most ML models perform ETD using only 1D EC data. However, capturing latent features and periodicity from 1D data is difficult [1]. In [14], it is noted that all conventional schemes are centered around manual feature engineering to identify NTL patterns. Moreover, in the existing work, no mathematical solutions are established to distinguish shunt and double tapping attacks. The authors of [17] observe that the existing ML algorithms do not include a proper feature engineering step, which consequently leads to poor generalization.

The authors of [5] identify that many conventional ML techniques are exploited to detect NTLs in power grids. However, they neglect an efficient feature engineering process, which results in poor generalization and low detection accuracy. Many classification and clustering techniques make an early decision about abrupt changes in consumers' consumption, which results in a high FPR because such changes may be caused by several non-malicious factors, e.g., weekends, change of residents, change of appliances, seasonal changes, etc. Moreover, the existing techniques perform poorly in the detection of zero-day attacks. Similarly, the authors of [12] and [18] highlight the issue of inappropriate feature engineering. Handcrafted feature engineering demands the involvement of a domain expert, which is a time intensive and difficult task. In [18], the most prominent features are extracted through an autoencoder from highly dynamic EC data to perform efficient ETD. However, further improvement is needed to recognize some intelligent attacks, such as shunt attacks, zero data attacks, double tapping attacks and so forth.

In [19], numerous clustering based techniques are exploited for anomaly detection in smart meter data. However, the fluctuations and variations in the normal and theft load profiles are not properly detected, which yields poor detection results. Similarly, the authors in [20] analyze some traditional techniques that are applied to detect data poisoning attacks. However, these techniques add an additional data filtering stage, which first removes any false labels and then performs the detection step. In [21]–[24], the authors discuss that many pattern recognition and conventional ML techniques are employed for NTL detection. These techniques demand extensive handcrafted feature engineering, which is a laborious, time-consuming and financially expensive task. Moreover, domain experts must be involved again whenever new features are required. In addition, these techniques perform poorly when extracting vital features from the available high dimensional EC data.

According to [25], many conventional anomaly detection algorithms mistakenly flag normal users as abnormal because of several non-malicious factors: changes of home residents, weekends, changes in the number of appliances, etc. These non-malicious factors are also a cause of high FPR. Moreover, in [21], it is mentioned that many researchers exploit deep learning models for theft identification and automatic feature learning from the highly dynamic EC data. However, these models are tested and evaluated on artificially generated data, which does not provide a reliable assessment. According to [26] and [27], manually created features are not sufficient to properly detect NTL behavior because of stochastic changes in EC profiles. In [28], the problem of maintaining temporal correlation in the existing ML models is highlighted. Moreover, the learning algorithms are unable to learn the potential features from 1D raw EC data.

The study of [29] notes that many researchers propose different electricity theft detectors. However, these detectors have low detection accuracy because EC data is highly dynamic and rapidly growing time-series data. In [30], the authors discuss that many conventional data mining and ML techniques are exploited to filter customers' consumption patterns for the detection of irregular electricity profiles. However, these techniques under-perform because of improper feature engineering. Moreover, different non-malicious factors mislead the classification model, which is a serious issue in the existing research. According to [31] and [32], numerous non-malicious factors degrade the detection accuracy of traditional ML models. In [33], a bidirectional gated recurrent unit (Bi-GRU) is used for extracting high level features from the electricity load profile in order to detect NTLs. However, the synthetic minority oversampling technique (SMOTE) and SMOTE with Tomek links are used for data balancing, which raises the overfitting issue because they generate duplicate records and destroy the temporal correlation between consumption patterns. In addition, the authors of [34] find that the existing deep learning techniques are not suitable for anomaly detection in electricity power data because of interpretability and practicality concerns. On the other hand, the authors of [2], [12], [17], [21], [29] and [35] highlight a critical class imbalance issue that occurs in ETD because of the limited availability of fraudulent consumers. Consequently, the majority class dominates the minority class, which leads to high FPR. Moreover, the learning algorithms are skewed towards the majority class. As a result, the misclassification rate increases considerably. According to [4], [11] and [19], the limited amount of labeled EC data makes it challenging for ML algorithms to perform efficient ETD. Similarly, the authors of [22], [26] and [28] observe that a severe class imbalance adversely affects the generalization power of classifiers. Due to this, the classification algorithms have a higher chance of suffering from the overfitting issue.

According to [32] and [36], the existing literature offers various oversampling techniques that are employed to handle the problem of class imbalance. In oversampling techniques, the minority class samples are augmented until the proportion of classes is equalized. SMOTE, K-means SMOTE, adaptive synthetic (ADASYN) sampling and so forth are well known oversampling techniques that are used to synthesize minority class instances. The GAN model is also exploited to augment the minority class samples. It has become popular due to its success in generating artificial data. However, the above mentioned techniques fail to capture the arbitrary fluctuations and the probabilistic curve of EC patterns while generating fraudulent samples. Consequently, the final classification results do not provide a real-world assessment.

III. PROBLEM ANALYSIS

With the advent of AMI, the energy flow is integrated with the communication flow to establish two-way real-time coordination between consumers and power industries. However, with the involvement of the Internet, the communication flow is prone to different contamination attacks, which are harmful for power utilities and are one of the reasons for NTLs. So, there is a pressing need for a robust ETD model. In [1], a wide and deep convolutional neural network (WD-CNN) is proposed to reduce the curse of dimensionality. However, the wide component contains only a single neural network layer, which does not learn the temporal correlation and hidden features from 1D EC data and also gets stuck in local optima. Moreover, the models presented in [2], [4] and [14] do not use any feature extraction module to reduce the data dimensionality. The rapid growth in the dimensions of time series data degrades the model's accuracy and increases the computational overhead. Therefore, if data dimensionality is not handled correctly, deep learning or ML models memorize noise and redundant features, which leads to poor generalization. Furthermore, ICS is another common issue that occurs in deep neural networks. It happens due to the shifting of the input distribution between different layers of neural networks and the changing of network parameters on each hidden layer. However, in [1] and [14], no mechanism is presented to handle the ICS problem, which adversely affects the stable learning of neural networks. It also degrades the hidden layers' feature learning capabilities, increases the training time and slows down the convergence rate. Another major issue faced by researchers is the high FPR that occurs due to several non-malicious factors and the injection of false noise into the data by intelligent attackers. For instance, the deep learning models used in [1] and [21] are unable to capture the non-malicious changes and long-term temporal correlation from the EC data, which increases the FPR and, in turn, the onsite inspection cost.

The imbalanced nature of data is another major concern when detecting energy thieves. It raises the overfitting and poor generalization issues. In [1], [14] and [15], the problem of imbalanced data is not handled. As a result, the classification model is skewed towards the larger class. Furthermore, in [11] and [29], the dataset is balanced through random under sampling (RUS), which discards important information. Moreover, in [4] and [22], the authors exploit the SMOTE approach for data balancing. It generates synthetic samples without considering the overlapping of neighboring samples. Therefore, it introduces additional noise and increases the ratio of duplicate records, which leads the models towards overfitting. Furthermore, in ETD, the selection of appropriate performance metrics is necessary for a sound evaluation of a model. However, in [2] and [19], appropriate metrics are not considered for performing a comprehensive analysis.

IV. PROPOSED ELECTRICITY THEFT DETECTION MODEL

This section describes the architecture of the proposed electricity theft detection model, which is divided into four stages.

1) In the first stage, data preprocessing is performed, in which missing values are filled through the linear interpolation method, outliers are handled by the three sigma rule (TSR) and feature scaling is done using Min-Max normalization.

2) In the second stage, the class imbalance issue is resolved by augmenting the minority class theft samples using Bi-WGAN.

3) In the third stage, a hybrid deep learning model is designed in which two modules, 2D-CNN and Bi-LSTM, are integrated in parallel to perform efficient feature extraction and memorization of temporal EC patterns.

4) In the fourth stage, a hybrid module is developed to perform the classification of theft and benign consumers.

Further explanation of the above mentioned stages is given in the upcoming subsections. Moreover, the complete representation of the proposed scheme is shown in Fig. 1, where a unique step number is assigned to each operation for easy understanding. In the first step, data preprocessing is carried out. In the second step, the preprocessed data is separated into the minority theft class and the majority benign class. In the third step, data augmentation is performed by simulating theft samples. The balanced dataset is produced at step four by concatenating the augmented theft samples with the benign ones. In the fifth and sixth steps, feature extraction and memorization of temporal EC patterns are performed by 2D-CNN and Bi-LSTM, respectively. Finally, the classification is performed in the seventh step by leveraging a fully connected neural network.

A. DATA PREPROCESSING MODULE

The EC data recorded through AMI may contain noisy, erroneous and missing values because of metering faults, problems in storage devices, meter tampering, etc. The erroneous values in the dataset should be handled to achieve accurate results. Therefore, data preprocessing techniques are adopted to handle the above issues. Missing values are tackled through a linear interpolation method [1]. The equation used for filling the missing values is given below.

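$$
f(x_i) =
\begin{cases}
\dfrac{x_{i-1} + x_{i+1}}{2}, & x_i = \mathrm{NaN},\ x_{i \pm 1} \neq \mathrm{NaN}, \\
0, & x_i = \mathrm{NaN},\ x_{i-1} \text{ or } x_{i+1} = \mathrm{NaN}, \\
x_i, & x_i \neq \mathrm{NaN}.
\end{cases}
\tag{1}
$$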

where $x_i$ represents the electricity usage of a consumer over a period $i$ (e.g., a day). The equation has three cases. In the first case, when the EC values at periods $i \pm 1$ are both available (not NaN), the missing value $x_i$ is filled with the average of the two neighboring EC values. Otherwise, the missing value is filled with zero, which is the second case. The third case states that if $x_i$ is not NaN, it is left unchanged. Similarly, some unusual values are also found in the EC dataset. These values are referred to as outliers, and they badly degrade the system performance. We handle the outliers using a well known method, termed TSR [37]. The mathematical equation of TSR is given below.

$$
f(x_i) =
\begin{cases}
\bar{x} + 2\sigma(x), & \text{if } x_i > \bar{x} + 2\sigma(x), \\
x_i, & \text{otherwise}.
\end{cases}
\tag{2}
$$

where $x$ is the real EC vector of a consumer, $\bar{x}$ is the average value of the real usage and $\sigma$ denotes the standard deviation. In equation 2, the condition $x_i > \bar{x} + 2\sigma(x)$ flags $x_i$ as an outlier, i.e., a value that does not follow the Gaussian distribution of the consumer's usage, and such a value is replaced with $\bar{x} + 2\sigma(x)$. After handling outliers and missing values, the EC data needs to be scaled. If EC data is passed to neural networks without proper feature scaling, it may cause exploding gradients, increase the computational overhead and slow down the convergence of the network. Therefore, we adopt the Min-Max normalization technique to scale the EC data to the range of 0 to 1. The equation of Min-Max normalization is given below.

$$
x_{new} = \frac{x_i - \min(x)}{\max(x) - \min(x)}. \tag{3}
$$

In equation 3, $\max(x)$ and $\min(x)$ represent the maximum and minimum EC of a user, respectively. Algorithm 1 describes the complete workflow of the data preprocessing steps. The input, output, variables and functions of the algorithm are described in lines 1 to 7. Lines 8 to 15 define the linear interpolation method used for handling the missing values present in the electricity load profiles. Similarly, lines 17 to 21 and line 23 deal with outliers and feature scaling, respectively.
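The three preprocessing steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a missing value at the first or last position is treated as having a missing neighbor (and is therefore filled with zero), the population standard deviation is used, a constant series is mapped to zeros, and the steps are applied to the whole series in sequence rather than element by element.

```python
import numpy as np

def fill_missing(x):
    """Eq. (1): mean of both neighbours if they exist, otherwise 0."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for i in np.flatnonzero(np.isnan(x)):
        has_both = (0 < i < x.size - 1
                    and not np.isnan(x[i - 1])
                    and not np.isnan(x[i + 1]))
        out[i] = (x[i - 1] + x[i + 1]) / 2 if has_both else 0.0
    return out

def clip_outliers(x):
    """Eq. (2): cap values above mean + 2 * std at that threshold."""
    threshold = x.mean() + 2 * x.std()  # population std (assumption)
    return np.minimum(x, threshold)

def min_max(x):
    """Eq. (3): scale to [0, 1]; a constant series maps to zeros (assumption)."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def preprocess(x):
    """Apply the three steps in the order described in the text."""
    return min_max(clip_outliers(fill_missing(x)))

# Hypothetical consumer profile with gaps and one spike
raw = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 2.0, 50.0, 2.0])
filled = fill_missing(raw)
clean = preprocess(raw)
```

In this example the isolated gap is filled with the neighbor average, the two adjacent gaps are filled with zero, and the spike of 50 is capped before scaling, so it no longer compresses the remaining values toward zero.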

B. DATA AUGMENTATION MODULE

The problem of data imbalance adversely affects the performance of classification algorithms. This issue arises when one class has far more data samples than the other. In ETD, this problem commonly occurs because the data samples of theft consumers are rarely available. As a result, the classification algorithms get biased towards the majority class and ignore the minority class. Keeping this in view, the Bi-WGAN model is adopted in this work to resolve the class imbalance problem by simulating the EC patterns of fraudulent consumers. In [28], it is used for extracting rich task-targeting features from the EC data and shows satisfactory performance. Moreover, in [38], it performs efficiently while synthesizing fake image samples. Hence, motivated by [28] and [38], we exploit Bi-WGAN for generating the theft class samples. The theft patterns synthesized by Bi-WGAN closely mimic the patterns of real-world electricity thieves. Moreover, the auxiliary encoder model strengthens the augmentation ability of the Bi-WGAN model through inverse mapping of the original input to the latent dimension.

Algorithm 1 Data Preprocessing

 1 Input: Real dataset S_real = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, x, y ∈ R
 2 Output: Preprocessed dataset S_prep
 3 Variables and Functions: EC of user x ⊆ S_real
 4 min(x): minimum consumption value of user x
 5 max(x): maximum consumption value of user x
 6 S_prep: store preprocessed data
 7 σ: standard deviation, avg(x): average value of x
   Handling missing values:
 8 for n = 1 to S_real.length do
 9   for i = 1 to x.length do
10     if x_i^n == NaN && x_{i−1}^n, x_{i+1}^n ≠ NaN then
11       x_i^n = (x_{i−1}^n + x_{i+1}^n) / 2
12     end
13     if x_i^n == NaN && (x_{i−1}^n || x_{i+1}^n == NaN) then
14       x_i^n = 0
15     end
16     Outlier detection:
17     if x_i^n > avg(x^n) + 2 * σ(x^n) then
18       x_i^n = avg(x^n) + 2 * σ(x^n)
19     else
20       x_i^n = x_i^n
21     end
22     Normalization:
23     x_i^n = (x_i^n − min(x^n)) / (max(x^n) − min(x^n))
24   end
25   S_prep^n = x^n
26 end

Bi-WGAN combines the advances of Bi-GAN and WGAN [39], [40]. It is introduced to mitigate the drawbacks of the traditional GAN [41]. The traditional GAN suffers from mode collapse, vanishing gradient and Nash equilibrium problems. Mode collapse occurs when the generator produces almost the same data for different inputs. In GAN, the Jensen-Shannon divergence is used as the loss function, which raises the vanishing gradient issue during adversarial training. Furthermore, both generator and discriminator try to update their loss functions simultaneously, which affects the convergence speed of the GAN model. Moreover, in the traditional GAN, only the mapping from the latent space to the samples exists, while the inverse mapping is not present. In Bi-WGAN, an external encoder module is attached to the generator network to perform the inverse mapping of the real input to the latent space. Moreover, an updated loss function, known as the Wasserstein distance (WD) [35], is used instead of the Jensen-Shannon divergence. This function helps the model reach an optimal solution in less time, which improves the convergence speed of the model towards the global optimum. The overall working of Bi-WGAN for augmenting electricity theft samples is explained below.

The available electricity theft data is selected as the input for training the Bi-WGAN model. It utilizes the objective function of Bi-GAN and the loss function of WGAN. Equation 4 presents the objective function of Bi-WGAN [32].

$$
\min_{G,E} \max_{D} V(G, E, D) = \mathbb{E}_{x \sim P_x(x)}[\log D(x, E(x))] + \mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z), z))]. \tag{4}
$$

where $G$, $E$ and $D$ represent the generator, encoder and discriminator models, respectively. The original distribution of electricity theft samples is denoted by $P_x(x)$, and $P_z(z)$ indicates the distribution of the latent noise $z$. $\mathbb{E}_x$ and $\mathbb{E}_z$ denote the expectations over the real theft samples and the latent noise, which form the discriminator and generator terms of the objective, respectively. $E(x)$ represents the encoded representation of the real electricity theft data $x$. A zero-sum game is played among $G$, $E$ and $D$ to reach an optimal output, namely theft patterns with a high resemblance to real ones. $G$ is responsible for generating samples that mimic the patterns of real-world thieves, whereas the goal of $D$ is to check whether a given theft sample is real or fake. We pass real theft samples along with the samples generated by $G$ to $D$ so that it can differentiate between real and fake samples. The role of $E$ is to improve the capabilities of $G$ by mapping the real data back to the latent dimension $z$ through the encoded representation $E(x)$. The training process continues until $P_z(z)$ becomes similar to $P_x(x)$. To measure the difference between the real and the fake probability distributions of theft samples, WD is utilized. It shifts a small amount of $P_x(x)$ to $P_z(z)$ to generate theft samples that are closely related to the real-world thieves. In this way, WD improves the convergence speed and the stable learning of the Bi-WGAN model. The mathematical formulation of WD [35] is given below.

$$
W(P_x(x), P_z(z)) = \inf_{\gamma \in \Pi(P_x(x), P_z(z))} \mathbb{E}_{(x,z) \sim \gamma}[\lVert x - z \rVert]. \tag{5}
$$

where Π(P_x(x), P_z(z)) denotes the set of joint distributions γ(x, z) whose marginals are P_x(x) and P_z(z), and ‖x − z‖ denotes the cost of transporting mass from x to z. The overall aim of minimizing W(P_x(x), P_z(z)) is to reduce the difference between P_x(x) and P_z(z) to a minimal level, so that the EC samples generated by G closely resemble those of real-world electricity thieves.
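As a small illustration of Equation 5, the sketch below computes the Wasserstein-1 distance between two one-dimensional empirical samples of equal size. In this special case the optimal coupling pairs the sorted values, so the infimum reduces to the mean absolute difference between the sorted samples. The function name `wasserstein_1d` and the toy values are assumptions made for illustration only; Bi-WGAN itself estimates the distance with a trained discriminator rather than by sorting.

```python
def wasserstein_1d(real, generated):
    """Wasserstein-1 distance between two equal-sized 1D samples.

    For 1D empirical distributions with the same number of points,
    the optimal transport plan matches the sorted values one to one.
    """
    if len(real) != len(generated):
        raise ValueError("this sketch assumes samples of equal size")
    if not real:
        raise ValueError("samples must be non-empty")
    pairs = zip(sorted(real), sorted(generated))
    return sum(abs(x - z) for x, z in pairs) / len(real)


# Hypothetical daily consumption values (not taken from the SGCC dataset).
real_theft = [0.2, 0.5, 0.9, 1.4]
generated_theft = [0.3, 0.6, 1.0, 1.5]
distance = wasserstein_1d(real_theft, generated_theft)
```

A smaller value means that the generated samples lie closer to the real ones, which is the quantity the generator is trained to reduce.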

Algorithm 2 presents the process of handling the class imbalance problem. Lines 1 to 7 describe the input, output, variables and functions of the algorithm. The preprocessed data is split into honest and theft consumers at line 8. In lines 9 and 10, the probability distributions for Bi-WGAN are formulated using the real EC data of energy thieves and random noise, respectively. Lines 11 to 25 present the training process of both the generator and discriminator models. Training continues until the model finds the optimal weight parameters and the minimum loss value. Afterwards, lines 27 and 28 describe the generation of theft class samples through Bi-WGAN after the model has been trained successfully. In addition, the notations and symbols used in the algorithm are taken from [42].

FIGURE 1. The proposed electricity theft detection model.

Algorithm 2 Bi-WGAN for Data Augmentation

 1  Input: Preprocessed dataset S_prep = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, x, y ∈ R
 2  Output: Parameters after training θ_G, θ_D; trained Bi-WGAN model G_train; balanced dataset S_bal
 3  Variables and Functions: S_bal, X_theft, X_honest, α = 0.00005, c = 0.01, θ_G initial generator parameter, θ_D initial discriminator parameter, size of batch m, discriminator's counter n_critics, encoder ε, encoded input e_in
 4  RMSProp(α): optimizer
 5  split(): splitting theft and honest users' data
 6  clip(): for clipping weights
 7  Bi-WGAN process:
 8  X_theft, X_honest = split(S_prep)
 9  P_r = P_distribution(X_theft)
10  P_z = P_distribution(z)
11  while θ_G has not converged do
12      for j = 0 to n_critics do
13          Sample a batch from the real data distribution {x_i}_{i=1}^{m} ∼ P_r
14          Sample a batch from the latent data distribution {z_i}_{i=1}^{m} ∼ P_z
15          e_in = ε(x)
16          x̂ = G(z)
17          l_d = ∇_{θ_D} [ (1/m) Σ_{i=1}^{m} D_w(x_i, e_in,i) − (1/m) Σ_{i=1}^{m} D_w(x̂_i, z_i) ]
18          θ_D = θ_D + α · RMSProp(θ_D, l_d)
19          θ_D = clip(θ_D, −c, c)
20      end
21      Sample a batch from the latent variable {z_i}_{i=1}^{m} ∼ P_z
22      l_g = −∇_{θ_G} (1/m) Σ_{i=1}^{m} D_w(G(z_i), z_i)
23      θ_G = θ_G − α · RMSProp(θ_G, l_g)
24      update G_train(θ_G)
25  end
26  After training of the generator, theft samples are generated
27  X_gen = G_train.predict(N_sample)
28  S_bal = concatenate(X_gen, X_theft)
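The loss terms and the weight clipping step of Algorithm 2 can be sketched with plain Python as follows. This is only a sketch under stated assumptions: the function names and the toy discriminator scores are hypothetical, and a real implementation would obtain the scores from neural networks and the gradients from automatic differentiation (for example in TensorFlow/Keras, which the paper reports using).

```python
def mean(values):
    return sum(values) / len(values)


def critic_objective(real_scores, fake_scores):
    """Quantity the discriminator ascends (line 17 of Algorithm 2):
    mean score on real pairs minus mean score on generated pairs."""
    return mean(real_scores) - mean(fake_scores)


def generator_loss(fake_scores):
    """Quantity the generator descends (line 22 of Algorithm 2):
    the negated mean discriminator score on generated pairs."""
    return -mean(fake_scores)


def clip_weights(weights, c=0.01):
    """Clamp every discriminator weight into [-c, c] (line 19)."""
    return [max(-c, min(c, w)) for w in weights]


# Hypothetical discriminator scores for one batch of m = 2 samples.
real_scores = [1.0, 3.0]   # D_w(x, e_in)
fake_scores = [0.0, 1.0]   # D_w(G(z), z)

objective = critic_objective(real_scores, fake_scores)
g_loss = generator_loss(fake_scores)
clipped = clip_weights([0.5, -0.2, 0.005])
```

Clipping with c = 0.01 keeps the discriminator weights in a small box, which is the mechanism WGAN uses to keep the discriminator approximately Lipschitz.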

C. ARCHITECTURE OF THE PROPOSED HYBRID MODEL

In this study, a hybrid deep learning model is developed, which combines 2D-CNN and Bi-LSTM. A hybrid model has been shown to perform better than a standalone model [43]. The 2D-CNN and Bi-LSTM models are integrated in a parallel manner. 2D-CNN takes 2D weekly EC data to extract the potential features and periodicity from consumers' profiles. Meanwhile, 1D daily electricity data is passed to Bi-LSTM to memorize the global and temporally correlated features. At the end, the outcomes of both models are combined in the hybrid module for the final classification. The detailed working of these modules is provided in the following subsections.

1) 2D CONVOLUTIONAL NEURAL NETWORK

CNN is introduced to automatically capture complex feature representations and non-linearity from highly dynamic data. It is mostly used in the domains of image processing and computer vision. However, the authors of [44] employed it for a speech recognition task, and their results showed that CNN performs well at capturing the latent correlations in speech data. In [1], a 2D-CNN is constructed with 2D convolution and pooling layers to explore electricity load profiles and to extract promising EC patterns for efficient ETD. Therefore, motivated by [1] and [44], we design a 2D-CNN model to investigate the electricity load profiles. The major task of 2D-CNN is to learn the hidden representations and potential features from the highly dynamic feature space. Most EC datasets are provided in 1D raw form and contain the daily EC records of different consumers. Since 1D EC data exposes only limited periodicity and few associations between EC patterns, the 1D daily EC profiles of consumers are transformed into 2D weekly profiles.
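The 1D-to-2D transformation described above can be sketched as a simple reshape in which each row holds one week. The helper name `to_weekly_2d` and the zero padding of an incomplete last week are assumptions of this sketch; the paper does not state how incomplete weeks are handled.

```python
def to_weekly_2d(daily, days_per_week=7, pad_value=0.0):
    """Reshape a 1D daily consumption series into rows of one week each.

    An incomplete final week is padded with `pad_value`
    (an assumption of this sketch, not a detail from the paper).
    """
    padded = list(daily)
    remainder = len(padded) % days_per_week
    if remainder:
        padded += [pad_value] * (days_per_week - remainder)
    return [padded[i:i + days_per_week]
            for i in range(0, len(padded), days_per_week)]


# Hypothetical 10-day profile: yields 2 rows of 7 values each.
weekly = to_weekly_2d([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

With this layout, values in the same column correspond to the same weekday, which lets 2D convolution filters pick up weekly periodicity.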

2D-CNN takes this data as input and passes it through various convolution and pooling operations to capture the latent trends and hidden fluctuations for better generalization. In the convolutional operations, different filters are incorporated. They learn hidden feature representations and generate feature maps accordingly. Afterwards, pooling operations are performed to reduce the spatial dimensions of the generated feature maps. In particular, we opt for a max pooling strategy in this work, which picks the highest value from each receptive field of a feature map and drops the remaining values. Dropout layers are added to 2D-CNN to avoid the overfitting issue. Moreover, we add batch normalization layers to 2D-CNN to protect it from the ICS problem. Furthermore, deep learning models are very sensitive to diverse data, so the data should be normalized before it is passed to the next layer. Otherwise, the models become vulnerable to the exploding gradient or overfitting problems. The mathematical formulation of the convolutional layer [1] of 2D-CNN is as follows.

$$y_i = \sigma_i(w_i * x_i + b_i). \tag{6}$$

where σ_i denotes the sigmoid activation function and y_i represents the output of the ith convolutional layer. x_i refers to the input, which is the 2D weekly EC data. Similarly, w_i denotes the weight of the ith convolutional layer and b_i denotes the bias factor. The output y_i stores the feature maps produced by the convolution operations. Afterwards, pooling is performed through a max pooling strategy. The equation of the max pooling layers is shown below.

$$y_m = \max_{i,j \in R}\,(y_{i,j}). \tag{7}$$

where y_m denotes the output of the max pooling layers, which contains the reduced feature maps. Similarly, j indexes the jth neuron of a specific convolutional layer, and the maximum is taken over the receptive field R. The dropout and batch normalization layers are added to prevent the model from overfitting and ICS issues. Moreover, a flatten layer is utilized to convert the feature maps into a 1D vector, which connects the preceding pooling layers to the subsequent fully connected layer. The mathematical formulation of the fully connected layer is as follows.

$$y_f = g_i(w_i^f * y_m + b_i^f). \tag{8}$$

where g_i represents the activation function. w_i^f and b_i^f denote the weight and bias factors of the fully connected layer, respectively. y_f is the output of the fully connected layer, which contains the most important feature set extracted from the 2D EC data. This feature set is passed to the hybrid module, where it is concatenated with the feature set of Bi-LSTM for the final classification of a consumer as honest or malicious.
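To make Equations 6 and 7 concrete, the following dependency-free sketch applies one convolution filter with a sigmoid activation, then max pooling, then flattening. The function names, the 'valid' padding, the stride of 1 and the 2 x 2 pooling window are assumptions of this sketch; the paper does not list these hyperparameters here. As in most deep learning libraries, the "convolution" is implemented as a cross-correlation (the kernel is not flipped).

```python
import math


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def conv2d(x, kernel, bias=0.0):
    """Equation 6 for one filter: sigmoid(w * x + b), valid padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(x) - kh + 1):
        row = []
        for c in range(len(x[0]) - kw + 1):
            s = sum(kernel[i][j] * x[r + i][c + j]
                    for i in range(kh) for j in range(kw))
            row.append(sigmoid(s + bias))
        out.append(row)
    return out


def max_pool(feature_map, size=2):
    """Equation 7: keep the largest value of each size x size receptive field."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[r + i][c + j]
                           for i in range(size) for j in range(size)))
        pooled.append(row)
    return pooled


def flatten(feature_map):
    """Turn a 2D feature map into the 1D vector fed to the dense layer."""
    return [v for row in feature_map for v in row]


pooled = max_pool([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
```

The flattened vector would then be passed through the fully connected layer of Equation 8 to obtain the 2D-CNN feature set.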

2) BIDIRECTIONAL LONG SHORT-TERM MEMORY NETWORK

The EC profiles of consumers contain many fluctuations. We observed that the electricity patterns of consumers are strongly associated with each other. In this regard, we opt for a Bi-LSTM model to capture the long-term trends in the EC data for better NTL detection. Bi-LSTM is selected because the authors of [45] show that it performs very well in predicting traffic routes, which is a time series problem, and the EC data used for ETD is also time series data [46]. Another reason for using Bi-LSTM is that it stores the EC patterns in its memory states for a long time, which helps to identify the effects of non-malicious changes. As a result, it reduces the false detection of electricity consumers to a minimal level.

Bi-LSTM is an extension of the traditional LSTM model in which two sub-models are trained simultaneously. The first sub-model processes the sequence in the forward direction and the other processes it in the backward direction. Both sub-models aim to learn the long-term periodicity and temporal correlation in EC load profiles. In Bi-LSTM, the availability of context about EC patterns in both directions further improves its feature learning capabilities. It also memorizes the long-term historical EC patterns of consumers' profiles, which helps to deal with non-malicious changes. Consequently, the FPR is reduced considerably. The reduction in FPR helps the power utilities to save the monetary cost that is incurred in unnecessary onsite inspections.

Moreover, Bi-LSTM maintains the long-term sequence in EC patterns through the collaboration of both short-term and long-term memory states. The long-term memory state (the cell state) stores the historical information for a long time and is updated at each time step with new information. The short-term memory state (the hidden state) keeps the output at the current time step. Three memory gates control these states. The input gate decides how much of the input data should be kept and how much should be thrown away. It employs a sigmoid function to make this decision and utilizes both the current input and the previous hidden state. Similarly, the forget gate discards unnecessary information and passes only important information to the cell state. Finally, the output gate decides how much information is passed to the next hidden state. In addition, the long-term historical information is stored in the cell state for future decisions. Processing the information in both directions increases the detection accuracy and reduces the FPR. The mathematical representations of the memory gates [14] are given as follows.

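$$f_t = \sigma_z(W_f x_t + U_f h_{t-1} + b_f), \tag{9}$$
$$i_t = \sigma_z(W_i x_t + U_i h_{t-1} + b_i), \tag{10}$$
$$o_t = \sigma_z(W_o x_t + U_o h_{t-1} + b_o), \tag{11}$$
$$\hat{c}_t = \sigma_z(W_c x_t + U_c h_{t-1} + b_c), \tag{12}$$
$$c_t = f_t * c_{t-1} + i_t * \hat{c}_t, \tag{13}$$
$$h_t = o_t * \sigma_z(c_t). \tag{14}$$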

where i_t, f_t and o_t denote the values of the input, forget and output gates at the current time step, respectively. Similarly, σ_z denotes the sigmoid activation function of the corresponding gate, which determines how strongly the gate is activated. W and U indicate the weight matrices that are applied to the input of the current time step and the hidden state of the previous time step, respectively. Moreover, ĉ_t and c_t signify the candidate value and the updated value of the cell state at time t, respectively. h_t represents the hidden state at time t, and b denotes the bias term.
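The gate equations above can be traced with a scalar, dependency-free sketch of one LSTM step and of a bidirectional pass. All names (`lstm_step`, `run_direction`, `bilstm_features`, the parameter dictionary) are hypothetical. Note one deliberate assumption: Equations 12 and 14 are written with σ_z, whereas common LSTM implementations use tanh for the candidate and output squashing, so the sketch exposes this choice through the `squash` argument (tanh by default; pass `sigmoid` to follow the equations literally).

```python
import math


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def lstm_step(x_t, h_prev, c_prev, params, squash=math.tanh):
    """One scalar LSTM step following Equations 9 to 14.

    `params` maps each gate name ('f', 'i', 'o', 'c') to a (W, U, b) triple.
    """
    def pre(gate):
        w, u, b = params[gate]
        return w * x_t + u * h_prev + b

    f_t = sigmoid(pre("f"))            # forget gate, Eq. 9
    i_t = sigmoid(pre("i"))            # input gate, Eq. 10
    o_t = sigmoid(pre("o"))            # output gate, Eq. 11
    c_hat = squash(pre("c"))           # candidate cell value, Eq. 12
    c_t = f_t * c_prev + i_t * c_hat   # cell state update, Eq. 13
    h_t = o_t * squash(c_t)            # hidden state, Eq. 14
    return h_t, c_t


def run_direction(sequence, params, squash=math.tanh):
    """Run the cell over a sequence and return the final hidden state."""
    h, c = 0.0, 0.0
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params, squash)
    return h


def bilstm_features(sequence, params_forward, params_backward):
    """Concatenate the final states of the forward and backward passes."""
    forward = run_direction(sequence, params_forward)
    backward = run_direction(list(reversed(sequence)), params_backward)
    return [forward, backward]


zero_params = {gate: (0.0, 0.0, 0.0) for gate in "fioc"}
h_t, c_t = lstm_step(1.0, 0.0, 2.0, zero_params)
```

With all weights set to zero every gate equals 0.5, so half of the previous cell state is retained, which makes the roles of the forget and input gates in Equation 13 easy to verify by hand.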

3) HYBRID MODULE

The hybrid module refers to a combined module where the outcomes of both Bi-LSTM and 2D-CNN modules are integrated into a single feature vector. A joint weight matrix is constructed for the hybrid training of both models. Finally, a sigmoid function is applied to the combined feature vector for the detection of NTL patterns.

$$NTL_{det} = \sigma_h(W[h_{2D\text{-}CNN}, h_{Bi\text{-}LSTM}] + b), \tag{15}$$

where σ_h denotes the sigmoid activation function. h_2D-CNN and h_Bi-LSTM represent the final outputs of the 2D-CNN and Bi-LSTM models, respectively. Similarly, W denotes the joint weight of the hybrid model and b is the bias factor. Algorithm 3 describes the process of feature learning and NTL detection through the hybrid 2D-CNN and Bi-LSTM model. Lines 1 to 3 describe the input, output, variables and functions of the algorithm. In lines 4 and 5, the transformation of data from 1D to 2D is given. Lines 6 to 12 present the overall working mechanism of the 2D-CNN model. Similarly, lines 14 to 30 indicate the learning process of the Bi-LSTM model. Lines 19 to 24 describe the updating process of the memory gates and cell states. These gates keep or discard the current state information according to the previous cell state information. Lines 31 and 32 show the backpropagation and weight updating processes for the different memory gates. Finally, line 36 describes the detection of energy thieves.
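The fusion step of Equation 15 amounts to concatenating the two feature vectors and applying a single sigmoid unit. The sketch below illustrates this with plain Python; the function name `hybrid_detect` and the toy feature and weight values are hypothetical, and in practice W and b are learned jointly with the two branches.

```python
import math


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def hybrid_detect(h_cnn, h_bilstm, weights, bias=0.0):
    """Equation 15: sigmoid(W [h_2D-CNN, h_Bi-LSTM] + b)."""
    features = list(h_cnn) + list(h_bilstm)   # concatenated feature vector
    if len(weights) != len(features):
        raise ValueError("one weight per concatenated feature is required")
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical branch outputs and joint weights.
score = hybrid_detect([0.2, 0.7], [0.4], weights=[1.0, -0.5, 2.0], bias=0.1)
```

The returned score lies between 0 and 1 and can be thresholded to obtain the final class label.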


Algorithm 3 The Proposed Hybrid Model

 1  Input: Balanced dataset S_bal = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, x, y ∈ R
 2  Output: NTL_det
 3  Variables and Functions: Weights W^l, U^l, b^l and ∀l, h^L_{t−1}
 4  X_1D = S_bal
 5  X_2D = transform(X_1D)
 6  2D-CNN working:
 7  Input layer: x_i = Input(X_2D)
 8  Convolutional layers: x_i denotes the input of the convolutional layers
 9  y_i = σ_i(w_i ∗ x_i + b_i)
10  Max pooling layers: y_m = max_{i,j∈R}(y_{i,j})
11  Fully connected layer: y_f = g_i(w_i^f ∗ y_m + b_i^f)
12  h_2D-CNN = y_f
13  Bi-LSTM working mechanism:
14  while W^l, U^l and b^l have not converged do
15      for x ∈ X_1D do
16          Same process for forward and backward pass
17          for each hidden layer l = 1 to l/2 do
18              for each time step t do
19                  i_t = σ(W_i^l x_t^l + U_i^l h_{t−1} + b_i^l)
20                  f_t = σ(W_f^l x_t^l + U_f^l h_{t−1} + b_f^l)
21                  o_t = σ(W_o^l x_t^l + U_o^l h_{t−1} + b_o^l)
22                  ĉ_t = σ(W_c^l x_t^l + U_c^l h_{t−1} + b_c^l)
23                  c_t^l = f_t^l ∗ c_{t−1}^l + i_t^l ∗ ĉ_t^l
24                  h_t^l = o_t^l ∗ σ(c_t^l)
25              end
26              h'^l = h_t^l
27          end
28          Fully connected:
29          Compute: z^l ← W^l σ(h'^l) + b^l
30          h_Bi-LSTM = tanh(z^l)
31          Back propagation:
32          ∇_{U^l} T(x), ∇_{W^l} T(x) and ∇_{b^l} T(x)
33      end
34  end
35  Hybrid layer:
36  NTL_det = σ(W[h_2D-CNN, h_Bi-LSTM] + b)

V. EXPERIMENTS AND RESULTS

In this section, the experimental results of the proposed and the existing schemes are presented. The experiments are conducted on a realistic smart meter dataset released by the State Grid Corporation of China (SGCC). The detailed description of the dataset is provided in Section V-A. Moreover, Python 3.0 and Google Colab are used for the training of deep learning models. All the deep learning models are developed with TensorFlow and Keras, which are open source libraries for building deep neural networks. The baseline models are fitted using the scikit-learn library.

A. DATASET DESCRIPTION

The EC dataset is a publicly available, realistic smart meter dataset released by SGCC. It comprises the daily EC of 42,372 consumers from 1 Jan 2014 to 31 Oct 2016. In the dataset, each row represents the complete electricity profile of a consumer and every column holds the daily EC on a specific date. The normal and abnormal users in the dataset are labeled as 0 and 1, respectively. The meta information about the dataset is given in Table 2.

TABLE 2. Information of SGCC dataset.

B. PERFORMANCE EVALUATION

In the ETD scenario, the available EC data is imbalanced. Therefore, the selection of appropriate performance metrics is necessary for a fair evaluation of the model. In the case of class imbalance, the accuracy metric is not suitable because it only counts the correct predictions and can therefore be dominated by the majority class. Moreover, both false positives (FP) and false negatives (FN) are important in the case of ETD. Therefore, in this study, the AUC metric is selected to measure how well the model distinguishes between honest and dishonest consumers. FN is particularly important for power utilities because undetected theft increases the financial loss. Hence, the MCC metric is also selected because it takes into account both the positive and the negative classes through all four entries of the confusion matrix: true positive (TP), FP, true negative (TN) and FN. In general, the MCC score ranges from −1 to 1, where 0 corresponds to random guessing. The model performs well if the MCC score is close to 1, which shows that the classification model efficiently detects both the positive and the negative class samples. In addition, we consider precision, recall, PR-AUC and F1-score for a comprehensive analysis of the proposed scheme. Precision is the fraction of consumers flagged as thieves that are actually thieves, and a high precision helps the electric utilities to save extra onsite inspection costs. Similarly, recall is the fraction of actual thieves that are flagged, so it determines how complete the list of suspected energy thieves is, which also reduces the financial loss. PR-AUC, in turn, summarizes the trade-off between precision and recall across decision thresholds.

The mathematical formulation of the aforementioned performance metrics is given as follows [22].

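$$\text{Precision} = \frac{TP}{TP + FP}, \tag{16}$$
$$\text{Recall} = \frac{TP}{TP + FN}, \tag{17}$$
$$\text{F1-score} = \frac{2 * \text{Precision} * \text{Recall}}{\text{Precision} + \text{Recall}}, \tag{18}$$
$$T = (TP + FP)(TP + FN), \qquad P = (TN + FP)(TN + FN),$$
$$\text{MCC} = \frac{TP * TN - FP * FN}{\sqrt{T * P}}, \tag{19}$$
$$\text{AUC} = \frac{\sum_{i \in \text{positive class}} \text{RANK}_i - \frac{P(1 + P)}{2}}{P * N}. \tag{20}$$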

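where, in Equation 20, P and N represent the numbers of positive and negative class samples, respectively, whereas in Equation 19, T and P are the auxiliary products defined just before it. Following the labels of the dataset, the positive class corresponds to abnormal users (label 1) and the negative class to normal users (label 0). TP refers to the correctly identified positive class users, i.e., actual electricity thieves. Similarly, TN denotes the correctly identified normal users. FN and FP represent the thieves misclassified as normal users and the normal users misclassified as thieves, respectively.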
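The metrics of Equations 16 to 20 can be checked on a small example with the following sketch. The function names and the confusion matrix counts are hypothetical and are not results from the paper; the AUC helper implements the rank-based formula of Equation 20, using average ranks for tied scores (an assumption of this sketch).

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Precision, recall, F1-score and MCC from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    t = (tp + fp) * (tp + fn)        # auxiliary product T of Eq. 19
    p = (tn + fp) * (tn + fn)        # auxiliary product P of Eq. 19
    mcc = (tp * tn - fp * fn) / math.sqrt(t * p)
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


def rank_auc(scores, labels):
    """Rank-based AUC of Eq. 20; labels are 1 (positive) or 0 (negative)."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1   # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical counts: 40 thieves caught, 10 false alarms, 45 honest, 5 missed.
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
auc = rank_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

A classifier whose predictions are mostly inverted yields a negative MCC, which illustrates that the score is not restricted to non-negative values.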

C. MEASURING EFFECTS OF IMBALANCE DISTRIBUTION

ON PERFORMANCE RESULTS

Table 3 presents the performance of the proposed methodology with different sampling techniques in order to analyze the effect of balanced and imbalanced data distributions. The results show that the hybrid 2D-CNN and Bi-LSTM model obtains the highest performance on the data distribution generated by Bi-WGAN. The data balanced with near miss and SMOTE do not provide satisfactory results because these schemes randomly remove data records and synthesize near-duplicate data records, respectively, which raises information loss and overfitting issues. Moreover, Bi-WGAN utilizes an auxiliary encoder module to improve the stability of learning and the convergence speed. That is why the samples generated by Bi-WGAN closely resemble real-world theft patterns, which enables the classification model to perform efficient ETD.
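To illustrate why SMOTE-style oversampling tends to produce records that are very close to existing ones, the sketch below shows the interpolation step that SMOTE applies between a minority sample and one of its neighbours. The function name `interpolate` and the toy profiles are hypothetical, and the nearest-neighbour search of the full method is omitted.

```python
import random


def interpolate(x, neighbor, lam):
    """SMOTE-style synthetic sample: x + lam * (neighbor - x), 0 <= lam <= 1."""
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


# Hypothetical theft profiles with two features each.
theft_a = [1.0, 4.0]
theft_b = [3.0, 2.0]

midpoint = interpolate(theft_a, theft_b, 0.5)

# In SMOTE the mixing factor is drawn at random for every synthetic sample.
rng = random.Random(0)
synthetic = interpolate(theft_a, theft_b, rng.random())
```

Every synthetic record lies on the line segment between two existing theft records, so no consumption pattern outside the span of the observed samples is ever created, whereas a trained generator can sample new patterns from the learned distribution.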

TABLE 3. Proposed model performance on imbalance distribution.

D. COMPARATIVE ANALYSIS WITH BENCHMARK MODELS

In this section, the proposed model is compared with state-of-the-art benchmark models for efficient ETD. For a fair comparison, the same data preprocessing techniques are applied to all of them. The description of the benchmark models is given below.

1) SUPPORT VECTOR MACHINE

The support vector machine (SVM) is a popular ML classifier. Both classification and regression tasks can be performed with SVM. In general, it is exploited for binary classification. However, it also handles non-linearly separable data using the kernel trick and can be extended to multi-class classification. In [2], SVM is exploited for final NTL detection. Therefore, we select SVM as a baseline classifier in this work.

2) RANDOM FOREST

The random forest (RF) classifier is an ensemble learning approach. It integrates several decision trees that together make a forest. It follows the bagging method, in which the final outcome is decided by taking the average or the majority vote of different weak learners. In [21], it is used to perform ETD.
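The two aggregation rules mentioned above can be sketched as follows. The function names and the toy tree outputs are hypothetical; a fitted random forest would supply the per-tree predictions.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class label predicted by most trees."""
    return Counter(predictions).most_common(1)[0][0]


def average_vote(probabilities, threshold=0.5):
    """Average the per-tree theft probabilities and threshold the mean."""
    mean_probability = sum(probabilities) / len(probabilities)
    return int(mean_probability >= threshold)


# Hypothetical outputs of five trees for one consumer (1 = abnormal).
label_by_vote = majority_vote([1, 0, 1, 1, 0])
label_by_average = average_vote([0.9, 0.6, 0.2])
```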

3) LOGISTIC REGRESSION

Logistic regression (LR) is a simple and well-known ML classifier. It is used for binary classification and follows the principle of neural networks. It can be viewed as a single-layer neural network with a sigmoid activation function on the output layer. If the value on the output layer is closer to 1, then the electricity user is assigned to the class labeled 1 (abnormal users in this dataset); otherwise, the user is assigned to the class labeled 0 [21].
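The single-layer view of LR can be written in a few lines. This is a minimal sketch with hypothetical weights, features and a 0.5 decision threshold; the label convention (0 for normal, 1 for abnormal) follows the dataset description in Section V-A.

```python
import math


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def logistic_predict(features, weights, bias=0.0, threshold=0.5):
    """Return (probability of class 1, predicted label) for one consumer."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


# Hypothetical two-feature consumer and weights.
probability, label = logistic_predict([0.8, 0.1], [2.0, -1.0], bias=-0.5)
```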

4) WIDE AND DEEP CNN

WD-CNN [1] is a hybrid deep learning approach proposed to detect electricity thieves in power grids. It consists of two components, known as the wide and deep components. The wide component contains a single fully connected layer of the neural network and is used for extracting the abstract features from the 1D daily EC data. Meanwhile, the deep component captures the local features and periodicity from the 2D weekly consumption data.

5) LSTM AND MLP

For efficient ETD, a hybrid of LSTM and multilayer perceptron (MLP) is proposed in [14]. In that model, the sequential time series data is passed to LSTM to capture the temporal correlation in the EC profiles of consumers. Similarly, the non-sequential additional data is fed to the MLP model for better detection of energy thieves. Afterwards, the outputs of both models are combined into a single feature vector. Then, the final NTL detection is performed by applying the sigmoid activation function.

E. PERFORMANCE ANALYSIS AND DISCUSSION

This section presents the analysis of the experimental results. First, we discuss the data augmentation using Bi-WGAN. Fig. 2(a) shows the loss curves of the discriminator on both real and fake samples along with the loss of the generator model. The blue and the orange curves show the discriminator loss on real and fake samples, respectively. The gradual decay in the discriminator loss indicates that the discriminator model efficiently discriminates between the real samples and the samples synthesized by the generator model. The reason is that the discriminator model is trained more often than the generator model in Bi-WGAN. In particular, the weights of the discriminator model are updated using half a batch of real samples and half a batch of fake samples at each round of the training process. On the other hand, the loss of the generator model during the training phase is shown by the green curve. The addition of an external encoder module in Bi-WGAN strengthens its ability to generate plausible EC samples. Due to this addition, it efficiently captures the complex probability distribution of the EC profiles. That is why the loss of the generator model is gradually reduced after a few iterations of training. Consequently, the generated patterns closely resemble real-world theft patterns. More specifically, in Bi-WGAN, the Wasserstein loss function is used instead of the Jensen-Shannon divergence loss function.

FIGURE 2. (a) Training loss of Bi-WGAN generator and discriminator. (b) Real and Bi-WGAN generated EC patterns.

The Wasserstein loss function measures a score of realness or fakeness for the given samples, while the regular GAN loss function predicts the probability of the generated samples being real or fake. Hence, the Wasserstein loss function, the auxiliary encoder module integrated with the generator network and the training procedure of the discriminator model together boost the ability of Bi-WGAN to generate realistic electricity theft samples. Fig. 2(b) illustrates the performance of Bi-WGAN in generating fake electricity theft patterns. The red curve shows the real theft pattern of an electricity user. Similarly, the blue curve shows the theft pattern generated by Bi-WGAN. From the figure, it is seen that Bi-WGAN efficiently learns the underlying regularities of the real electricity theft profiles and generates synthetic theft patterns that closely follow the real-world ones. This suggests that the integration of an external encoder module in Bi-WGAN helps in simulating realistic theft patterns.

Table 4 describes the performance results of the proposed model and the benchmark models on 70% training data and 30% testing data. From the results, it is seen that the proposed model outperforms all the existing models. In the proposed hybrid model, the concurrent usage of 2D-CNN and Bi-LSTM boosts its performance. It obtains an AUC-ROC score of 0.97, which is the best result among the compared schemes, namely SVM, LR, RF, WD-CNN and LSTM-MLP. A higher AUC-ROC means that a classification model distinguishes the two classes more efficiently. Moreover, the proposed model achieves a PR-AUC of 0.98. This score indicates how well the model identifies the electricity thieves. Our model obtains the highest PR-AUC because of the powerful capabilities of Bi-LSTM and 2D-CNN. In contrast, SVM obtains the lowest AUC-ROC score of 0.77 among the baseline models because it does not perform well on high dimensional data. It searches for a separating hyperplane of dimension n − 1, where n denotes the number of features, and the selection of an optimal hyperplane is very difficult in the case of highly dynamic data. RF achieves an AUC-ROC of 0.94 and a PR-AUC of 0.96, which are higher than those of SVM and LR, because it follows the ensemble learning procedure. In RF, the outcomes of several weak learners are combined for the final prediction using majority voting. Moreover, it uses a random subset of data samples and features for training each weak learner, which improves its results. Therefore, it performs better than the conventional ML techniques. LR does not achieve satisfactory results because it consists of only a single layer. The WD-CNN and LSTM-MLP models achieve AUC-ROC scores of 0.92 and 0.95, respectively. LSTM-MLP obtains better results than WD-CNN because it uses the strong memorization and feature extraction abilities of LSTM and MLP, respectively.

Fig. 3(a) shows the loss of the proposed hybrid model during the training phase. The orange curve depicts the loss on validation data and the blue curve shows the loss on training data. It is clearly seen that the hybrid model performs well on both training and validation data. We observe that the loss value decreases as the number of epochs increases. However, after 10 epochs of training, the loss on training data keeps decreasing gradually, while the loss on validation data levels off. This implies that the model generalizes well up to the 10th epoch. Moreover, a threshold must be set on the number of epochs to optimize the training process. For instance, in our case, the best training performance is achieved when the number of epochs reaches 10.
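One common way to choose such an epoch threshold automatically is to monitor the validation loss and stop once it no longer improves. The sketch below is an illustration of that idea only; the function names, the patience value and the loss values are hypothetical, and the paper does not state that early stopping was used.

```python
def best_epoch(val_losses):
    """1-based index of the epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda k: val_losses[k])
    return best_index + 1


def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """1-based epoch at which training stops after `patience` epochs
    without an improvement of more than `min_delta`."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses per epoch.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63]
stop_at = early_stop_epoch(losses, patience=3)
best_at = best_epoch(losses)
```

In this toy run training stops at epoch 6 and the weights from epoch 3 would be kept, since the validation loss stopped improving there.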


TABLE 4. Comparison analysis of the proposed model with benchmark schemes.

FIGURE 3. (a) Training and validation losses of hybrid 2D-CNN and

Bi-LSTM. (b) Training and validation accuracy of hybrid 2D-CNN and

Bi-LSTM.

Fig. 3(b) illustrates the accuracy of the hybrid model during the training phase. It is seen that the hybrid model performs well on both training and validation datasets because of the effective gated configuration and the integration of both forward and backward passes in the Bi-LSTM model. The powerful feature extraction capabilities of the 2D-CNN model also improve the classification results. The performance of the hybrid model on validation data is more stable than on training data. This implies that the proposed hybrid model efficiently separates electricity thieves from honest consumers in the EC data due to the hybrid functionalities of 2D-CNN and Bi-LSTM. Its training accuracy gradually increases as the number of epochs increases. The optimal performance is obtained when the number of epochs reaches 10. Furthermore, a large fluctuation is seen in the accuracy value at epoch 6, which is caused by a noisy batch of samples during the model's training. However, the model stabilizes its learning after the 6th epoch. Similarly, Fig. 4(a) depicts the AUC-ROC score of the hybrid model during the training and validation phases. It is seen that the model obtains an AUC-ROC score of 0.97, which implies that the hybrid model effectively discriminates between the normal and theft classes. Fig. 4(b) exhibits the MCC score. The MCC metric is chosen because it incorporates all four entries of the confusion matrix, namely TP, FP, TN and FN. FN and TN are also important for electric utilities because they help utilities to recover the maximum monetary cost. From the figure, it is observed that the MCC score increases at each iteration, which shows that the proposed model handles FN and TN well. It obtains an MCC score of 0.93, which is satisfactory for detecting electricity thieves. Consequently, it will be beneficial for power utilities to recover maximum revenue by identifying the energy thieves. The F1-score on both validation and training datasets is depicted in Fig. 5(a). It is determined by computing the harmonic mean of the precision and recall values. During training, an abrupt change is seen in the 6th epoch because of noise in the training batch. Besides, the proposed model obtains an F1-score of 0.94, which reflects its strong performance on the validation dataset. A higher F1-score helps the electric utilities to accurately identify and locate the energy thieves. It also helps to increase the detection rate (DR) and reduce the FPR.

FIGURE 4. (a) AUC-ROC score of the proposed hybrid model. (b) MCC score of the proposed hybrid model.

FIGURE 5. (a) F1-score of hybrid 2D-CNN and Bi-LSTM. (b) AUC-ROC based benchmark comparison.

The AUC-ROC scores of the proposed scheme and the baseline models are illustrated in Fig. 5(b). The proposed scheme obtains an AUC-ROC score of 0.97, which is higher than the scores of the existing classifiers, namely SVM, LR, RF, WD-CNN and LSTM-MLP. This implies that the proposed scheme efficiently distinguishes the two classes due to its hybrid feature learning mechanism. Moreover, the powerful gated configuration along with the integration of both forward and reverse feature learning paths in Bi-LSTM improves its ability to capture non-malicious changes. Consequently, the FPR is reduced to a minimum. The PR-AUC scores of the proposed and baseline models are shown in Fig. 6. PR-AUC focuses equally on precision and recall. In the case of detecting electricity fraud, both factors are important for electric utilities. A high PR-AUC score indicates the efficacy of a model. The proposed scheme achieves a PR-AUC of 0.98, which is higher than that of all baseline models. This implies that the proposed scheme is beneficial for power industries to accurately identify energy fraud and helps them to recover maximum income. Moreover, Fig. 7 illustrates the training time of the proposed and baseline models. It is seen that the proposed model takes less time for training than the other deep models. The reason is that the proposed model efficiently discards the redundant and noisy features from the high dimensional EC data, which reduces the computational overhead considerably. The model also obtains the highest performance results among the compared models. LR takes the least time for training because it contains a single layer. However, it does not obtain satisfactory results. The SVM model takes the highest training time because it evaluates multiple candidate hyperplanes and then selects an optimal hyperplane from them to perform the classification task. This process increases the computational complexity considerably.

FIGURE 6. PR-AUC based benchmark comparison.

FIGURE 7. Training time (sec) of the proposed hybrid model and baseline models.

TABLE 5. Mapping between identified limitations, proposed solutions and validation results.

F. MAPPING BETWEEN LIMITATIONS, SOLUTIONS AND

THEIR VALIDATIONS

The mapping of the identified limitations to their proposed solutions and validations is given in Table 5. L1 concerns the noisy, high-dimensional EC data. It is addressed by the proposed hybrid of 2D-CNN and Bi-LSTM, and the results are validated through suitable key performance indicators, as shown in Figs. 4, 5 and 6. L2 highlights the poor generalization issue, which arises from noisy and duplicate features in the EC data. It is also addressed by the proposed hybrid model, which retains only the informative features, discards the irrelevant ones, and efficiently extracts temporally correlated features from the EC data. Table 5 validates this solution. L3 discusses the problem of high FPR, which arises from several non-malicious factors and abrupt changes in EC load profiles; it may also result from false data injection by an intelligent attacker. The high FPR is reduced by the Bi-LSTM model, which maintains the context of long-term temporal correlations in its memory states, so that the model can identify the effects of the various non-malicious factors. This solution is validated through the AUC-ROC shown in Fig. 5(b). L4 highlights the class imbalance issue. Bi-WGAN is employed to synthesize fraudulent electricity samples, and the solution is validated through the generated Bi-WGAN sample shown in Fig. 2(b). L5 concerns the overfitting that occurs with SMOTE because of the duplication of EC records. Bi-WGAN instead simulates plausible theft samples owing to its powerful feature learning capability. This solution is validated in Fig. 2(b), where the learning process of Bi-WGAN is presented. L6 discusses the ICS issue, which occurs in a neural network when the input distribution is transferred from one hidden layer to the next. To mitigate ICS, we add batch normalization layers and regularization penalties to the neural network. The solution is validated by analyzing the convergence speed of the proposed model, as shown in Figs. 3, 4 and 5. L7 notes that an improper selection of performance metrics in ETD does not provide a fair assessment. Therefore, appropriate metrics are selected for a fair evaluation of the proposed model. The solution is validated by suitable performance indicators, which are shown in Figs. 3-6.
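The point made under L7 can be illustrated with a small Python sketch that derives accuracy, FPR, F1-score and MCC from a confusion matrix on an imbalanced toy population. The helper names (`confusion_counts`, `metrics`) and the data (100 consumers, 10 of them thieves) are hypothetical; they are not taken from the SGCC dataset or from the authors' code.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), with 1 = theft and 0 = honest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    """Accuracy, FPR, F1-score and MCC; undefined ratios are reported as 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "f1": 2 * precision * recall / pr_sum if pr_sum else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical imbalanced population: 10 thieves followed by 90 honest consumers.
y_true = [1] * 10 + [0] * 90
# A trivial classifier that labels every consumer as honest.
all_honest = [0] * 100
# A detector that catches 8 of the 10 thieves and raises 9 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 9 + [0] * 81

if __name__ == "__main__":
    for name, y_pred in (("all-honest", all_honest), ("detector", detector)):
        print(name, {k: round(v, 3) for k, v in metrics(y_true, y_pred).items()})
```

On this toy data the trivial classifier reaches an accuracy of 0.90 while its F1-score and MCC are both zero, whereas the detector has a slightly lower accuracy of 0.89 but clearly positive F1-score and MCC. This is the sense in which accuracy alone gives an unfair assessment on imbalanced ETD data.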

VI. CONCLUSION AND FUTURE WORK

In this article, we have proposed a hybrid deep learning model for the detection of ET in power grids. The proposed model combines 2D-CNN and Bi-LSTM models. The noisy high-dimensionality issue is tackled through the combined capabilities of the Bi-LSTM and 2D-CNN modules. Furthermore, the severe lack of fraudulent samples is addressed by generating realistic theft samples using Bi-WGAN. All the experiments are conducted on the realistic smart meter dataset released by the SGCC. The comparison with the baseline models shows that the proposed scheme surpasses the performance of the state-of-the-art models, such as LR, SVM, RF, WD-CNN and LSTM-MLP. Moreover, the simulation results show that the proposed model achieves higher AUC-ROC, PR-AUC, F1-score and MCC than the baseline models. Our model obtains an AUC-ROC of 0.97 and a PR-AUC of 0.98, which make it more suitable for real-world scenarios. Furthermore, the proposed model can be used in different industrial applications to detect anomalies and fraud. In the future, we will consider EC data with a high sampling rate to enhance the performance of the proposed hybrid model.

DATASET AVAILABILITY

The dataset used in this study is publicly available at https://github.com/henryRDlab/ElectricityTheftDetection/.

ACKNOWLEDGMENT

The authors would like to acknowledge Taif University Researchers Supporting Project number (TURSP-2020/292), Taif University, Taif, Saudi Arabia. The authors would also like to acknowledge Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R193), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

REFERENCES

[1] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, ‘‘Wide and deep

convolutional neural networks for electricity-theft detection to secure

smart grids,’’ IEEE Trans. Ind. Informat., vol. 14, no. 4, pp. 1606–1615,

Apr. 2018.

[2] P. Jokar, N. Arianpoo, and V. C. M. Leung, ‘‘Electricity theft detection in

AMI using Customers’ consumption patterns,’’ IEEE Trans. Smart Grid,

vol. 7, no. 1, pp. 216–226, Jan. 2016.

[3] Q. Chen, K. Zheng, C. Kang, and F. Huangfu, ‘‘Detection methods

of abnormal electricity consumption behaviors: Review and prospect,’’

Autom. Electr. Power Syst., vol. 42, no. 17, pp. 189–199, 2018.

[4] S. K. Gunturi and D. Sarkar, ‘‘Ensemble machine learning models for the

detection of energy theft,’’ Electr. Power Syst. Res., vol. 192, Mar. 2021,

Art. no. 106904.

[5] R. Razavi, A. Gharipour, M. Fleury, and I. J. Akpan, ‘‘A practical feature-

engineering framework for electricity theft detection in smart grids,’’ Appl.

Energy, vol. 238, pp. 481–494, Mar. 2019.

[6] A. S. Iwashita, D. Rodrigues, D. S. Gastaldello, A. N. de Souza, and

J. P. Papa, ‘‘An incremental optimum-path forest classiﬁer and its applica-

tion to non-technical losses identiﬁcation,’’ Comput. Electr. Eng., vol. 95,

Oct. 2021, Art. no. 107389.

[7] S.-V. Oprea and A. Bâra, ‘‘Machine learning classiﬁcation algorithms

and anomaly detection in conventional meters and Tunisian electricity

consumption large datasets,’’ Comput. Electr. Eng., vol. 94, Sep. 2021,

Art. no. 107329.

[8] C.-H. Lo and N. Ansari, ‘‘CONSUMER: A novel hybrid intrusion detec-

tion system for distribution networks in smart grid,’’ IEEE Trans. Emerg.

Topics Comput., vol. 1, no. 1, pp. 33–44, Jun. 2013.

[9] S. Amin, G. A. Schwartz, and H. Tembine, ‘‘Incentives and security in

electricity distribution networks,’’ in Proc. Int. Conf. Decis. Game Theory

Secur., Berlin, Germany: Springer, 2012, pp. 264–280.

[10] N. Javaid, H. Gul, S. Baig, F. Shehzad, C. Xia, L. Guan, and T. Sultana,

‘‘Using GANCNN and ERNET for detection of non technical losses to

secure smart grids,’’ IEEE Access, vol. 9, pp. 98679–98700, 2021.

[11] M. M. Buzau, J. Tejedor-Aguilera, P. Cruz-Romero, and

A. Gomez-Exposito, ‘‘Detection of non-technical losses using smart

meter data and supervised learning,’’ IEEE Trans. Smart Grid, vol. 10,

no. 3, pp. 2661–2670, May 2019.

[12] X. Kong, X. Zhao, C. Liu, Q. Li, D. Dong, and Y. Li, ‘‘Electricity

theft detection in low-voltage stations based on similarity measure and

DT-KSVM,’’ Int. J. Electr. Power Energy Syst., vol. 125, Feb. 2021,

Art. no. 106544.

[13] S. I. Popoola, B. Adebisi, M. Hammoudeh, H. Gacanin, and G. Gui,

‘‘Stacked recurrent neural network for BotNet detection in smart Homes,’’

Comput. Electr. Eng., vol. 92, Jun. 2021, Art. no. 107039.

[14] M.-M. Buzau, J. Tejedor-Aguilera, P. Cruz-Romero, and

A. Gomez-Exposito, ‘‘Hybrid deep neural networks for detection of

non-technical losses in electricity smart meters,’’ IEEE Trans. Power Syst.,

vol. 35, no. 2, pp. 1254–1263, Mar. 2020.

[15] D. Yao, M. Wen, X. Liang, Z. Fu, K. Zhang, and B. Yang, ‘‘Energy

theft detection with energy privacy preservation in the smart grid,’’ IEEE

Internet Things J., vol. 6, no. 5, pp. 7659–7669, Oct. 2019.

[16] M. Asif, B. Kabir, A. Ullah, S. Munawar, and N. Javaid, ‘‘Towards

energy efﬁcient smart grids: Data augmentation through BiWGAN, feature

extraction and classiﬁcation using hybrid 2DCNN and BiLSTM,’’ in Proc.

Int. Conf. Innov. Mobile Internet Services Ubiquitous Comput., Cham,

Switzerland: Springer, 2021, pp. 108–119.

[17] R. Punmiya and S. Choe, ‘‘Energy theft detection using gradient boosting

theft detector with feature engineering-based preprocessing,’’ IEEE Trans.

Smart Grid, vol. 10, no. 2, pp. 2326–2329, Mar. 2019.

[18] Y. Huang and Q. Xu, ‘‘Electricity theft detection based on stacked sparse

denoising autoencoder,’’ Int. J. Electr. Power Energy Syst., vol. 125,

Feb. 2021, Art. no. 106448.

[19] K. Zheng, Q. Chen, Y. Wang, C. Kang, and Q. Xia, ‘‘A novel combined

data-driven approach for electricity theft detection,’’ IEEE Trans. Ind.

Informat., vol. 15, no. 3, pp. 1809–1819, Mar. 2019.

[20] A. Takiddin, M. Ismail, U. Zafar, and E. Serpedin, ‘‘Robust electricity theft

detection against data poisoning attacks in smart grids,’’ IEEE Trans. Smart

Grid, vol. 12, no. 3, pp. 2675–2684, May 2021.

[21] S. Li, Y. Han, X. Yao, S. Yingchen, J. Wang, and Q. Zhao, ‘‘Electricity theft

detection in power grids with deep learning and random forests,’’ J. Electr.

Comput. Eng., vol. 2019, pp. 1–12, Oct. 2019.

[22] M. N. Hasan, R. N. Toma, A.-A. Nahid, M. M. M. Islam, and J.-M. Kim,

‘‘Electricity theft detection in smart grid systems: A CNN-LSTM based

approach,’’ Energies, vol. 12, no. 17, p. 3310, Aug. 2019.

[23] R. R. Bhat, R. D. Trevizan, R. Sengupta, X. Li, and A. Bretas, ‘‘Identi-

fying nontechnical power loss via spatial and temporal deep learning,’’

in Proc. 15th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Dec. 2016,

pp. 272–279.

[24] B. Kocaman and V. Tümen, ‘‘Detection of electricity theft using data

processing and LSTM method in distribution systems,’’ Sādhanā, vol. 45,

no. 1, pp. 1–10, Dec. 2020.

[25] G. Fenza, M. Gallo, and V. Loia, ‘‘Drift-aware methodology for anomaly

detection in smart grid,’’ IEEE Access, vol. 7, pp. 9645–9657, 2019.

[26] X. Lu, Y. Zhou, Z. Wang, Y. Yi, L. Feng, and F. Wang, ‘‘Knowledge embed-

ded semi-supervised deep learning for detecting non-technical losses in the

smart grid,’’ Energies, vol. 12, no. 18, p. 3452, Sep. 2019.

[27] C. C. O. Ramos, D. Rodrigues, A. N. de Souza, and J. P. Papa, ‘‘On the

study of commercial losses in Brazil: A binary black hole algorithm for

theft characterization,’’ IEEE Trans. Smart Grid, vol. 9, no. 2, pp. 676–683,

Mar. 2018.

[28] T. Hu, Q. Guo, H. Sun, T.-E. Huang, and J. Lan, ‘‘Nontechnical losses

detection through coordinated BiWGAN and SVDD,’’ IEEE Trans. Neural

Netw. Learn. Syst., vol. 32, no. 5, pp. 1866–1880, May 2021.

[29] N. F. Avila, G. Figueroa, and C.-C. Chu, ‘‘NTL detection in electric

distribution systems using the maximal overlap discrete wavelet-packet

transform and random undersampling boosting,’’ IEEE Trans. Power Syst.,

vol. 33, no. 6, pp. 7171–7180, Nov. 2018.

[30] J. I. Guerrero, I. Monedero, F. Biscarri, J. Biscarri, R. Millan, and C. Leon,

‘‘Non-technical losses reduction by improving the inspections accuracy in

a power utility,’’ IEEE Trans. Power Syst., vol. 33, no. 2, pp. 1209–1218,

Mar. 2018.

[31] M. S. Saeed, M. W. Mustafa, U. U. Sheikh, T. A. Jumani, and N. H. Mirjat,

‘‘Ensemble bagged tree based classiﬁcation for reducing non-technical

losses in multan electric power company of Pakistan,’’ Electronics, vol. 8,

no. 8, p. 860, Aug. 2019.

[32] X. Gong, B. Tang, R. Zhu, W. Liao, and L. Song, ‘‘Data augmentation

for electricity theft detection using conditional variational auto-encoder,’’

Energies, vol. 13, no. 17, p. 4291, Aug. 2020.

[33] H. Gul, N. Javaid, I. Ullah, A. M. Qamar, M. K. Afzal, and G. P. Joshi,

‘‘Detection of non-technical losses using SOSTLink and bidirectional

gated recurrent unit to secure smart meters,’’ Appl. Sci., vol. 10, no. 9,

p. 3151, Apr. 2020.

[34] X. Wang, I. Yang, and S.-H. Ahn, ‘‘Sample efﬁcient home power anomaly

detection in real time using semi-supervised learning,’’ IEEE Access,

vol. 7, pp. 139712–139725, 2019.

[35] A. Aldegheishem, M. Anwar, N. Javaid, N. Alrajeh, M. Shaﬁq, and

H. Ahmed, ‘‘Towards sustainable energy efﬁciency with intelligent elec-

tricity theft detection in smart grids emphasising enhanced neural net-

works,’’ IEEE Access, vol. 9, pp. 25036–25061, 2021.

[36] N. Javaid, N. Jan, and M. U. Javed, ‘‘An adaptive synthesis to handle imbalanced

big data with deep Siamese network for electricity theft detection in

smart grids,’’ J. Parallel Distrib. Comput., vol. 153, pp. 44–52, Jul. 2021.

[37] V. Chandola, A. Banerjee, and V. Kumar, ‘‘Anomaly detection: A survey,’’

ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, 2009.

[38] U. Mutlu and E. Alpaydın, ‘‘Training bidirectional generative adver-

sarial networks with hints,’’ Pattern Recognit., vol. 103, Jul. 2020,

Art. no. 107320.

[39] M. Arjovsky, S. Chintala, and L. Bottou, ‘‘Wasserstein generative adver-

sarial networks,’’ in Proc. Int. Conf. Mach. Learn., 2017, pp. 214–223.

[40] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville,

‘‘Improved training of Wasserstein GANs,’’ 2017, arXiv:1704.00028.

[41] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,

S. Ozair, A. Courville, and Y. Bengio, ‘‘Generative adversarial networks,’’

2014, arXiv:1406.2661.

[42] M. Arjovsky, S. Chintala, and L. Bottou, ‘‘Wasserstein generative adver-

sarial networks,’’ in Proc. Int. Conf. Mach. Learn., 2017, pp. 214–223.

[43] J. Yu, X. Zhang, L. Xu, J. Dong, and L. Zhangzhong, ‘‘A hybrid CNN-GRU

model for predicting soil moisture in maize root zone,’’ Agricult. Water

Manage., vol. 245, Feb. 2021, Art. no. 106649.

[44] J. Zhao, X. Mao, and L. Chen, ‘‘Speech emotion recognition using deep 1D

& 2D CNN LSTM networks,’’ Biomed. Signal Process. Control, vol. 47,

pp. 312–323, Jan. 2019.

[45] Z. Cui, R. Ke, Z. Pu, and Y. Wang, ‘‘Stacked bidirectional and unidirec-

tional LSTM recurrent neural network for forecasting network-wide trafﬁc

state with missing values,’’ Transp. Res. C, Emerg. Technol., vol. 118,

Sep. 2020, Art. no. 102674.

[46] N. Javaid, A. Naz, R. Khalid, A. Almogren, M. Shaﬁq, and A. Khalid,

‘‘ELS-Net: A new approach to forecast decomposed intrinsic mode func-

tions of electricity load,’’ IEEE Access, vol. 8, pp. 198935–198949, 2020.

MUHAMMAD ASIF (Graduate Student Member,

IEEE) received the B.S. degree in information

technology from the University of Gujrat, Gujrat,

Pakistan, in 2017. He is currently pursuing the

M.S. degree in computer science with the Com-

munications Over Sensors (ComSens) Research

Laboratory, COMSATS University Islamabad,

Islamabad Campus, under the supervision of

Prof. Nadeem Javaid. His research interests

include electricity load forecasting, ﬁnancial

market forecasting, and smart grids.

OROOJ NAZEER received the M.S. degree in computer science from

Abasyn University, Islamabad, under the supervision of Prof. Nadeem

Javaid.

NADEEM JAVAID (Senior Member, IEEE)

received the bachelor’s degree in computer sci-

ence from Gomal University, Dera Ismail Khan,

Pakistan, in 1995, the master’s degree in elec-

tronics from Quaid-i-Azam University, Islamabad,

Pakistan, in 1999, and the Ph.D. degree from the

University of Paris-Est, France, in 2010. He is

currently a Professor and the Founding Director

of the Communications Over Sensors (ComSens)

Research Laboratory, Department of Computer

Science, COMSATS University Islamabad, Islamabad Campus. He is also

working as a Visiting Professor at the School of Computer Science, Uni-

versity of Technology Sydney, Australia. He has supervised 146 master’s

and 27 Ph.D. theses. He has authored over 900 articles in technical jour-

nals and international conferences. His research interests include energy

optimization in smart/microgrids and in wireless sensor networks using

data analytics and blockchain. He was a recipient of the Best University

Teacher Award (BUTA’16) from the Higher Education Commission (HEC)

of Pakistan, in 2016, and the Research Productivity Award (RPA’17) from

the Pakistan Council for Science and Technology (PCST), in 2017. He is an

Associate Editor of IEEE ACCESS and the Editor of Sustainable Cities and

Society.

EMAN H. ALKHAMMASH received the M.Sc. and Ph.D. degrees in computer science from the University of Southampton, U.K. She is currently working as an Associate Professor of computer science with Taif University, Saudi Arabia. Her research areas include formal methods, AI, and data science, among others. She was recognized as a Senior Fellow of the Higher Education Academy (FHEA) in March 2020.

MYRIAM HADJOUNI received the Ph.D. degree (Hons.) in computer science from Paris XI University (now Paris-Saclay University), France, and Manouba University, Tunisia, in 2012, and the M.Sc. degree (Hons.) from the Higher Institute of Management of Tunis, University of Tunis, Tunisia, in 2005. She is currently working as an Assistant Professor with the Computer Sciences Department, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Kingdom of Saudi Arabia. Her research interests include, but are not restricted to, information retrieval, artificial intelligence, data science, data analytics, big data, and image retrieval.
