A Robust Hybrid Deep Learning Model
for Detection of Non-technical Losses to
Secure Smart Grids
FAISAL SHEHZAD1, NADEEM JAVAID1,*, (Senior Member, IEEE),
AHMAD ALMOGREN2, (Senior Member, IEEE), ABRAR AHMED3,
SARDAR MUHAMMAD GULFAM3, AYMAN RADWAN4
1Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
2Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11633, Saudi Arabia
3Department of Electrical and Computer Engineering, COMSATS University Islamabad, Islamabad 44000, Pakistan
4Instituto de Telecomunicacoes and Universidade de Aveiro, Aveiro, Portugal
*Corresponding authors: Nadeem Javaid (nadeemjavaidqau@gmail.com) and Ahmad Almogren (ahalmogren@ksu.edu.sa)
ABSTRACT To detect electricity theft in smart grids, this article introduces a hybrid deep learning model. The model tackles several issues of existing models, including the class imbalance problem, the curse of dimensionality and low theft detection rates. It integrates the benefits of both GoogLeNet and the gated recurrent unit (GRU). One dimensional electricity consumption (EC) data is fed into the GRU to remember periodic consumption patterns, while GoogLeNet is leveraged to extract latent features from two dimensional weekly stacked EC data. Furthermore, the time least square generative adversarial network (TLSGAN) is proposed to solve the class imbalance problem. The TLSGAN uses unsupervised and supervised loss functions to generate fake theft samples that closely resemble real-world theft samples. The standard generative adversarial network only updates the weights of samples that lie on the wrong side of the decision boundary, whereas TLSGAN also modifies the weights of samples on the correct side of the boundary, which prevents the vanishing gradient problem. Moreover, dropout and batch normalization layers are utilized to enhance the model's convergence speed and generalization ability. The proposed model is compared with different state-of-the-art classifiers, including multilayer perceptron (MLP), support vector machine, naive Bayes, logistic regression, the MLP-long short term memory network and the wide and deep convolutional neural network. It outperforms all of them, achieving 96% precision-recall area under the curve and 97% receiver operating characteristic area under the curve.
INDEX TERMS Electricity theft detection, gated recurrent unit, GoogLeNet, non-technical losses, smart
grids, SGCC.
I. INTRODUCTION
Two types of losses occur during the generation, transmission and distribution of electricity: technical losses (TLs) and non-technical losses (NTLs). The former occur due to the dissipation of energy in distribution lines, transformers and other electric equipment, whereas the latter are caused by meter tampering, direct hooking to transmission lines, billing errors, faulty meters, etc. These losses not only affect the performance of electricity generation companies but also damage their physical components. Moreover, a recent report shows that NTLs cause $96 billion of revenue loss every year [1]. According to the World Bank's report, India, China and Brazil bear 25%, 6% and 16% losses on their total electric supply, respectively. The NTLs are not limited to developing countries; it is estimated that developed countries like the UK and the US also lose 232 million and 6 billion US dollars per annum, respectively [2].
Electricity theft is a primary cause of NTLs. The evolution of advanced metering infrastructure (AMI) promises to overcome electricity theft by monitoring users' consumption history. However, it introduces new types of cyber-attacks, which are difficult to detect using conventional methods.
TABLE 1: List of abbreviations

Abbreviation: Full form
ADASYN: Adaptive synthetic sampling approach
AMI: Advanced metering infrastructure
CNN: Convolutional neural network
CPBETD: Consumption pattern based electricity theft detector
CatBoost: Categorical boosting
D: Discriminator
DR: Detection rate
EC: Electricity consumption
ETD: Electricity theft detection
FPR: False positive rate
GRU: Gated recurrent unit
G: Generator
KNN: k-nearest neighbors
LSTM: Long short term memory
LR: Logistic regression
LSGAN: Least square generative adversarial network
LightGBM: Light gradient boosting machine
ML: Machine learning
SVM: Support vector machine
SGCC: Smart grid corporation of China
SMOTE: Synthetic minority over-sampling technique
SMOTE_ENN: SMOTE and edited nearest neighbors
MLP: Multilayer perceptron
NTLs: Non-technical losses
NB: Naive Bayes
NaN: Not a number
TLSGAN: Time LSGAN
TPR: True positive rate
TSR: Three sigma rule
SSDAE: Stacked sparse denoising autoencoder
RUS: Random undersampling
RF: Random forest
ROS: Random oversampling
RNN: Recurrent neural network
RDBN: Real-valued deep belief network
PR-AUC: Precision recall - area under curve
PCA: Principal component analysis
ROC-AUC: Receiver operating characteristic - area under curve
TLs: Technical losses
WDCNN: Wide and deep convolutional neural network
XGBoost: eXtreme gradient boosting

Nomenclature:
a: Label of fake sample
b: Label of theft (real) sample
b_HG2: Bias of hybrid layer
c: Distance variable
Dense_GoogLeNet: Last layer of GoogLeNet
Dense_GRU: Last layer of GRU
E: Expected value of all instances
h_t: Hidden state at timestamp t
h_HG2: Hidden layer of hybrid module
ĥ: Candidate value
P_data(x): Theft data
P_g(z): Gaussian distribution
r: Reset gate
w_i: ith week EC
w_m: mth week EC
W_r: Weight of reset gate
W_z: Weight of update gate
W_HG2: Weight of hybrid layer
x_i: Complete consumption history of consumer i
x_{i,j}: Daily EC of consumer i over time period j (a day)
x_{i,j-1}: EC of the previous day
x_{i,j+1}: EC of the next day
x̄_i: Average consumption of consumer i
σ(x_i): Standard deviation of consumer i
min(x_i): Minimum value of consumer i
Y_NTL: Output of having NTLs or not
z: Update gate
Whereas, traditional meters are only compromised through physical tampering. In AMI, the meter readings are tampered with locally and remotely over the communication links before being sent to an electric utility [3]. There are
three types of approaches to address the NTLs in AMI:
state, game theory and data-driven. State-based approaches
exploit wireless sensors and radio frequency identification
tags to detect NTLs. However, these approaches require high
installation, maintenance and training costs, and they also perform poorly in extreme weather conditions [4], [5]. Besides this, game theory based approaches model a game between a power utility and consumers to achieve an equilibrium state and then extract hidden patterns from users' EC history. However, it is difficult to design a suitable utility function for utilities, regulators, distributors and energy thieves that achieves the equilibrium state within the defined time [6]. Moreover, both NTLs detection approaches have a low detection rate (DR) and a high false positive rate (FPR).
The data-driven methods have received high attention due to the availability of electricity consumption (EC) data that is collected through AMI. A normal consumer's EC follows a statistical pattern, whereas abnormal¹ EC does not follow any pattern. The machine learning (ML) and data mining techniques are trained on the collected data to learn normal² and abnormal consumption patterns. After training, the model is deployed in a smart grid to classify incoming consumers' data into normal or abnormal samples. Since these techniques use already available data and do not require hardware devices to be deployed at consumers' sites, their installation and maintenance costs are low as compared to hardware based methods. However, the class imbalance problem is a serious issue for data-driven methods, where the number of normal EC samples far exceeds the number of theft samples. Normal data is easily collected through users' consumption history, whereas theft cases are relatively rare in the real world, so few theft samples are present in users' consumption histories. This lack of theft samples affects the performance of classification models. The ML models become biased towards the majority class and ignore the minority class, which increases the FPR [7], [8]. In the literature, the authors mostly use random undersampling (RUS) and random oversampling (ROS) techniques to handle the class imbalance problem. However, these techniques have underfitting and overfitting issues, respectively, that increase the FPR and minimize the DR [3], [9], [10], [11]. The second challenging issue is the curse of dimensionality. A time series dataset contains a large number of timestamps (features) that increase both execution
¹Theft and abnormal are used interchangeably.
²Benign and normal are used interchangeably.
time and memory complexity and reduce the generalization ability of ML methods. Traditional ML methods have a low DR and overfitting issues due to the curse of dimensionality. They require domain knowledge to extract prominent features, which is a time consuming task [2], [3]. Moreover, metaheuristic techniques are proposed by understanding the working mechanisms of nature. In the literature, these techniques are mostly utilized for optimization and feature selection purposes [12], [13], [14].
In this article, the time series least square generative adversarial network (TLSGAN) is proposed, which is specifically designed to handle the data imbalance problem of time series datasets. It utilizes supervised and unsupervised loss functions and gated recurrent unit (GRU) layers to generate fake theft samples that have a high resemblance with real-world theft samples, whereas the standard GAN uses only an unsupervised loss function and generates fake theft samples with a low resemblance to real-world theft samples. Moreover, an HG2 model is proposed, which is a hybrid of GoogLeNet and GRU. It is a challenging task to capture long-term periodicity from a one dimensional (1D) time series dataset. Deep learning models have a better ability to memorize sequence patterns as compared to traditional ML models. The 1D data is fed into the GRU to capture temporally correlated patterns from users' consumption history, whereas weekly consumption data is passed to GoogLeNet to capture local features from the sequence data using inception modules. Each inception module contains multiple convolutional and max-pooling layers that extract high level features from time series data and overcome the curse of dimensionality issue. Moreover, non-malicious factors like a change in the number of persons in a house, extreme weather conditions, weekends, a big party in a house, etc., affect the performance of ML methods. The GRU is used to handle non-malicious factors because it has memory modules. These memory modules help the GRU learn sudden changes in consumption patterns and memorize them, which decreases the FPR. Moreover, dropout and batch normalization layers are used to enhance the convergence speed and generalization ability of the model and increase the DR. The main contributions of this research article are given below:
• A state-of-the-art methodology is proposed that is based on GRU and GoogLeNet. The automatic feature learning mechanism of both models increases convergence speed and accuracy and handles the curse of dimensionality. Moreover, this study integrates the benefits of both 1D and 2D EC data in a parallel manner,
• the TLSGAN is proposed to generate fake samples from existing theft patterns to tackle the class imbalance ratio,
• the GRU model is utilized to handle non-malicious factors like sudden changes in EC patterns due to an increase in family members, changes in weather conditions, etc., and
• extensive experiments are conducted on a realistic EC dataset that is provided by the smart grid corporation of China (SGCC), the largest smart grid company in China. Different performance indicators are utilized to evaluate the performance of the proposed model.
The rest of the paper is organized as follows. Sections II and III describe the related work and the problem statement, respectively. Section IV illustrates the data preprocessing steps, while Section V presents the working mechanism of TLSGAN for solving the class imbalance problem. The description of the proposed model and the experimental analysis are presented in Sections V-B and VI, respectively. Finally, the research article is concluded in Section VII.
II. RELATED WORK
In this Section, we discuss the limitations of the existing literature. In [3], the authors extend the existing consumption pattern-based electricity theft detector (CPBETD), which is based on the support vector machine (SVM), to detect abnormal patterns in EC data. However, the authors do not use any feature engineering technique to extract or select prominent features from the high dimensional time series dataset. The high dimensionality of data creates time complexity, storage and FPR issues. In [7], [10], [15], [16], [17],
feature selection is an important part of data-driven tech-
niques where significant features are selected from existing
ones. During feature selection process, less domain knowl-
edge increases FPR and decreases classification accuracy. In
[9], previous studies use only an EC dataset to train ML clas-
sifiers and predict abnormal patterns. They do not use smart
meter data and auxiliary data (geographical information, me-
ter inside or outside, etc.) to predict abnormal patterns from
electricity data. In [18], [19], different users have various consumption behaviours, and the consumption behaviour of each customer gives different results. So, it is necessary to select those features which give the best results. However, consumption behaviours are closely related and significant correlation exists between these features. The authors remove highly correlated and overlapping features, which helps to improve the DR and decrease the FPR. In [11], [20], the authors give
possibilities of implementing ML classifiers for detection
of NTLs and describe the advantage of selecting optimal
features and their impacts on classifier performance. One of
the main challenges [21] that limits the classification ability of existing methods is the high dimensionality of data. In [9],
the authors generate new features from the smart meter and
auxiliary data. These features are based on z-score, electrical
magnitude, users’ consumption patterns through clustering
technique, smart meter alarm system, geographical location
and smart meter’s placement. In [22], features are selected
from existing features based on clustering evaluation criteria.
In [8], the authors propose a new deep learning model, which
has ability to learn and extract latent features from EC data.
In [14], the authors use the black hole algorithm to select
the optimal number of features and compare the results with
particle swarm optimization, differential evolution, genetic
algorithm and harmony search. In [20], the authors perform
work on feature engineering and identify different features
like electricity contract, geographical location, weather con-
dition, etc. In [16], conventional methods are applied on data
to tackle the curse of dimensionality issue. This process is
very tedious and time-consuming.
In [17], one of the main contributions is to find the optimal number of features. It is observed that not all features contribute equally to the prediction results. In [15], the
authors use Dense-Net based convolutional neural network
(CNN) to analyse periodicity in EC data. The convolutional
layers can capture the long-term and short-term sequences
from weekly and monthly EC patterns. In [11], maximal
overlap discrete wavelet packet transform is leveraged to
extract the optimal features. In [21], the authors implement a
bidirectional Wasserstein GAN to extract the optimal features
from time series data. In [9], the authors pass a combina-
tion of newly created features in different conventional ML
classifiers and compare their results. In [18], the authors analyze the relationship between the number of selected features and the classification accuracy. In [8], [23], the authors measure the precision and recall scores of a long short term memory (LSTM) classifier on test data. The hybrid of multilayer perceptron (MLP) and LSTM outperforms the single LSTM in terms of the PR curve because the MLP adds additional information to the network, like meter location, contractual data and technical information.
In [20], the identified features are passed to gradient
boosting classifiers to classify between normal and abnormal
samples. In [2], [9], [24], the authors do not use any feature
engineering technique to extract or select the optimal features
from high dimensional time series dataset. The high dimen-
sionality of data creates time complexity, storage issues and
affects the model generalization ability. In [18], the authors
form a feature library where they select a subset of features
from existing features using clustering evaluation criteria.
However, they do not compare the adopted feature selection
strategy with other feature selection strategies. In [2], [3],
[10], [18], [25], data imbalance is a major issue for the training of ML classifiers. Benign samples are easily collected from the history of any consumer, whereas theft cases rarely happen in the real world. So, the lack of theft samples limits classification accuracy and increases the FPR. Generally, RUS and ROS techniques are utilized to solve the data imbalance problem. In [26], Chawla et al. propose the synthetic minority over-sampling technique (SMOTE) to create artificial samples of the minority class. It has many advanced versions, like Random-SMOTE, Kmeans-SMOTE, etc. However, these sampling techniques do not represent the overall distribution of data, which affects the model performance. In [2], the
authors introduce six theft cases to generate malicious sam-
ples using benign samples. They argue that the goal of theft is to report less consumption than the actual consumption or to shift load toward low tariff periods. After generating the malicious samples, the authors exploit the ROS technique to solve the class imbalance problem.
In [10], the authors use six theft cases that are introduced
by [2] to generate malicious samples and SMOTE is lever-
aged to handle the uneven distribution of samples. In [25], the authors use SMOTE and the near miss technique to tackle the class imbalance ratio. After balancing the dataset, the authors perform a comparison between bagging and boosting ensemble techniques. However, both give better results with SMOTE than with near miss. In [2], the authors argue that the goal of theft is to report less consumption or to shift load from high tariff periods to low tariff periods. So, it is possible
to generate malicious samples from benign ones. In [18], the authors use a 1D-Wasserstein GAN to generate duplicated copies of the minority class. In [19], the authors use the adaptive synthetic (ADASYN) sampling approach to tackle the class imbalance ratio and perform a comparison between different ML and deep learning techniques. In [3], [10], the SMOTE technique is used to tackle the class imbalance ratio. In [2], the authors use the ROS technique to handle the class imbalance ratio. It replicates existing samples of the minority class, which creates an overfitting problem. Moreover, they introduce six theft cases to generate malicious samples to balance the ratio between theft and normal samples. However, cases 1 and 2 do not resemble real theft cases. In [7], [8], [15], [17], [20], [21], [27], [28], the authors do not tackle the above-mentioned problem. One severe issue in ETD is the class imbalance ratio, where one class (honest consumers) dominates the other class (theft consumers). In [25], the authors use SMOTE and the near miss method to handle the class imbalance problem. In [9], [11], the authors do not tackle the class imbalance problem. The ML classifiers become biased toward the majority class, ignore the minority class and generate false alarms due to the uneven distribution of samples. A utility cannot bear false alarms because it has a low budget for on-site inspections.
III. PROBLEM STATEMENT
In [2], the authors propose CPBETD to identify normal and abnormal EC patterns. However, CPBETD does not use any feature engineering technique to solve the curse of dimensionality issue. This issue refers to a set of problems that occur due to the high dimensionality of a dataset. A dataset that contains a large number of features, generally in the order of hundreds or more, is known as a high dimensional dataset. A time series dataset has high dimensionality, which increases time complexity, reduces the DR and affects the generalization of a classifier. In [7], [8], the authors solve the curse of dimensionality issue by selecting prominent features through deep learning and meta-heuristic techniques. However, they do not address the class imbalance problem, which is a major issue in NTLs detection. In [3], [25], the authors use SMOTE to handle the class imbalance ratio. However, SMOTE creates an overfitting problem and does not perform well on time series data. In [9], the authors use the RUS technique to handle the class imbalance ratio. However, this approach discards useful information from the data, which creates an underfitting issue.
IV. DATA PREPROCESSING
Data preprocessing is an important part of data science where
the quality of data is improved by applying different tech-
niques that directly enhance the performance of ML methods.
TABLE 2: Dataset information
Time window: Jan. 1, 2014 to Oct. 31, 2016
Total consumers: 42,372
Normal consumers: 38,757
Electricity thieves: 3,615
In this Section, the data preprocessing techniques used in this
paper are discussed in detail.
A. ACQUIRING THE DATASET
The SGCC dataset is used in this study to evaluate the performance of the proposed model. It contains consumers' IDs, daily EC and labels, either 0 or 1. It comprises the EC data of 42,372 consumers, out of which 91.46% are normal and the remaining are thieves. Each consumer is labeled as either 0 or 1, where 0 represents a normal consumer and 1 represents an electricity thief. These labels are assigned by SGCC after performing on-site inspections. The dataset is in a tabular form: each row represents the complete record of one consumer, while the columns represent the daily EC of all consumers. The meta information about the dataset is given in Table 2.
B. HANDLING THE MISSING VALUES
EC datasets often contain missing or erroneous values, which are represented as not a number (NaN). These values occur for many reasons: failure of a smart meter, faults in distribution lines, unscheduled maintenance of the system, data storage problems, etc. Training data with missing values has a negative impact on the performance of ML methods. One way to handle the missing values is to remove the consumers' records that have missing values. However, this approach may remove valuable information from the data. In this study, we use a linear interpolation method to recover the missing values [3].
f(x_{i,j}) =
\begin{cases}
\frac{x_{i,j-1} + x_{i,j+1}}{2}, & \text{if } x_{i,j} = \text{NaN and } x_{i,j\pm 1} \neq \text{NaN}, \\
0, & \text{if } x_{i,j-1} = \text{NaN or } x_{i,j+1} = \text{NaN}, \\
x_{i,j}, & \text{if } x_{i,j} \neq \text{NaN}.
\end{cases}   (1)
In Equation (1), x_{i,j} represents the daily EC of a consumer i over time period j (a day), x_{i,j-1} represents the EC of the previous day and x_{i,j+1} represents the EC of the next day.
C. REMOVING THE OUTLIERS FROM DATASET
We found some outliers in the EC dataset. One of the most important steps of the data preprocessing phase is to detect and treat outliers. The supervised learning models are sensitive to the statistical distribution of data. Outliers mislead the training process; as a result, the models take longer to train and generate false results. Motivated by [7], we use the three-sigma rule (TSR) to handle outliers. The mathematical form of the TSR is given in Equation (2).
f(x_{i,j}) =
\begin{cases}
\bar{x}_i + 3\sigma(x_i), & \text{if } x_{i,j} > \bar{x}_i + 3\sigma(x_i), \\
x_{i,j}, & \text{otherwise}.
\end{cases}   (2)
Algorithm 1: Data preprocessing steps
Data: EC dataset X
1:  X = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}
2:  Variables: min_i = minimum value of consumer x_i, max_i = maximum value of consumer x_i, x̄_i = mean of consumer x_i, σ_i = standard deviation of consumer x_i, (row, col) = X.shape
3:  for i ← 1 to row do
4:    for j ← 1 to col do
5:      Fill missing values:
6:      if x_{i,j-1} ≠ NaN and x_{i,j+1} ≠ NaN and x_{i,j} = NaN then
7:        x_{i,j} = (x_{i,j-1} + x_{i,j+1}) / 2
8:      end
9:      if x_{i,j-1} = NaN or x_{i,j+1} = NaN then
10:       x_{i,j} = 0
11:     end
12:     Remove outliers:
13:     if x_{i,j} > x̄_i + 3σ_i then
14:       x_{i,j} = x̄_i + 3σ_i
15:     end
16:     Min-max normalization:
17:     x_{i,j} = (x_{i,j} − min_i) / (max_i − min_i)
18:   end
19: end
Result: X_normalized = X
x_i represents the complete EC history of consumer i, x̄_i denotes the average EC and σ(x_i) represents the standard deviation of consumer i.
D. NORMALIZATION
After handling the missing values and outliers, we apply the min-max technique to normalize the dataset because deep learning models are sensitive to the diversity of data [7]. The experimental results show that deep learning models give good results on normalized data. The mathematical form of the min-max technique is given in Equation (3).
min-max technique is given in equation (3).
xi,j =xi,j −min(xi)
max(xi)−min(xi)(3)
The min(x_i) and max(x_i) represent the minimum and maximum EC values of consumer i, respectively. All data preprocessing steps are shown in Algorithm 1. In lines 1 and 2, the dataset is acquired from an electric utility and the variables are initialized. In lines 3 to 19, the following steps are performed: missing values are filled, outliers are handled and the min-max normalization technique is applied. Finally, we obtain a normalized dataset.
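For illustration, Algorithm 1 can be sketched in Python with NumPy and pandas. The function below is our own minimal rendering of the three preprocessing steps; the row/column orientation follows Section IV-A, and the small epsilon in the normalization step is an added safeguard, not part of the original algorithm.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of Algorithm 1: imputation (Eq. 1), TSR capping (Eq. 2),
    min-max scaling (Eq. 3). Assumes rows are consumers, columns are days."""
    X = df.to_numpy(dtype=float)
    # Fill a missing reading with the mean of its neighbours; 0 if a neighbour is also missing.
    for i in range(X.shape[0]):
        for j in range(1, X.shape[1] - 1):
            if np.isnan(X[i, j]):
                left, right = X[i, j - 1], X[i, j + 1]
                X[i, j] = 0.0 if (np.isnan(left) or np.isnan(right)) else (left + right) / 2
    X = np.nan_to_num(X)  # readings at the series edges that stayed NaN become 0
    # Cap outliers per consumer with the three-sigma rule.
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    X = np.minimum(X, mean + 3 * std)
    # Min-max normalization per consumer.
    mn, mx = X.min(axis=1, keepdims=True), X.max(axis=1, keepdims=True)
    X = (X - mn) / (mx - mn + 1e-8)  # epsilon guards consumers with constant EC
    return pd.DataFrame(X, index=df.index, columns=df.columns)
```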
E. EXPLORATORY DATASET ANALYSIS
Electricity theft is a criminal behaviour that involves tampering with or bypassing smart meters, hacking smart meters through cyber attacks and manipulating meter readings using physical components or over the communication links.
FIGURE 1: Statistical analysis between normal and abnormal EC. Panels (a) and (b) show the monthly and weekly EC (kWh) of a normal consumer; panels (c) and (d) show the monthly and weekly EC of an electricity thief; panels (e) and (f) show the Pearson correlation heatmaps of weekly EC for the normal consumer and the thief, respectively.
Since EC data contains both normal and abnormal patterns, data-driven approaches have received high attention from the research community for differentiating between benign consumers and thieves. We conduct a preliminary analysis on the EC data through statistical techniques to check for the existence of periodicity and non-periodicity in consumers' EC patterns. Meta information about the dataset is given in Section IV-A. Figure 1a shows the EC pattern of a normal consumer during a month. There are a lot of fluctuations in the monthly EC pattern, so it is difficult to distinguish normal and abnormal patterns from 1D time series data. Figure 1b shows the EC patterns of a normal consumer arranged by week. The EC decreases on days 3 and 5, whereas it increases on days 2 and 4. The 2nd week shows an abnormal pattern, which differs from the other weeks. We also conduct a similar analysis on theft patterns. Figures 1c and 1d show the EC of an energy thief during a month and a week. There are a lot of fluctuations in the monthly measurements and no periodicity exists in the weekly EC patterns.
Moreover, a correlation analysis is conducted between the EC of thieves and normal consumers. Figure 1e shows the Pearson correlation values of a normal consumer, which are mostly above 0.3. This indicates a strong relationship between the weekly EC patterns of a normal consumer. Figure 1f shows the Pearson correlation values of an electricity thief,
TABLE 3: Euclidean distance similarity measure
Consumers | (w1, w2) | (w1, w3) | (w1, w4) | Average
Normal    | 4.70     | 4.83     | 3.66     | 4.40
Theft     | 4.66     | 3.54     | 12.90    | 7.03
which indicate a poor correlation between the weekly EC data. Hereinafter, we use the Euclidean distance similarity measure to examine how similar the weekly observations are to each other. The Euclidean distance is calculated for both normal and theft consumers. We compare the EC pattern of the last week of a month with the previous three weeks and then take the average of the differences to determine how much normal EC differs from abnormal EC. We observe that the Euclidean distance between normal EC patterns is low as compared to abnormal ones. Similar findings are observed across the whole dataset. To avoid repetition, the exploratory data analysis is conducted on a few observations, which are shown in Figure 1 and Table 3.
f(x) = \sqrt{(w_{i,j} - w_{m,j})^2 + \dots + (w_{i,j-n} - w_{m,j-n})^2}.   (4)

Equation (4) shows the Euclidean distance formula used to measure the similarity between weekly EC patterns. w_i and w_m denote the ith and mth weeks, and j is the EC of a specific week day, j ≤ 5.
After conducting the statistical analysis on thieves and normal consumers, we conclude that theft patterns have more fluctuations (are less periodic) than normal EC patterns. We believe that this type of pattern can also be observed in datasets collected from different regions and countries. However, it is challenging to capture long-term periodicity from a 1D time series dataset because it consists of long sequential patterns. Conventional statistical and ML models, such as the autoregressive integrated moving average, SVM and decision tree, are unable to retrieve these patterns. Based on the above analysis, we pass the 1D data to the GRU model because it is specifically designed to capture temporal patterns from time series data, whereas the 1D EC data is also stacked according to weeks and fed into GoogLeNet to extract the periodicity between weeks.
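To make the weekly analysis concrete, the sketch below stacks a 1D EC series into weeks and computes the Euclidean distances of Equation (4). The five-day week follows the j ≤ 5 convention above; the paper compares the last week against the previous three, while this sketch compares the first week against the later ones, an equivalent illustration. The function names and toy data are our own.

```python
import numpy as np

def weekly_stack(series: np.ndarray, days_per_week: int = 5) -> np.ndarray:
    """Reshape a 1D EC series into a (weeks x days) matrix, dropping any remainder."""
    weeks = len(series) // days_per_week
    return series[: weeks * days_per_week].reshape(weeks, days_per_week)

def weekly_distances(series: np.ndarray) -> np.ndarray:
    """Euclidean distance (Equation 4) between the first week and each later week."""
    W = weekly_stack(series)
    return np.linalg.norm(W[1:] - W[0], axis=1)

# Toy example: a roughly periodic (normal) pattern vs. an erratic (theft-like) one.
rng = np.random.default_rng(0)
normal = np.tile([3.0, 5.0, 2.0, 6.0, 4.0], 4) + rng.normal(0, 0.3, 20)
theft = rng.uniform(0, 8, 20)
print(weekly_distances(normal))  # small, similar distances between weeks
print(weekly_distances(theft))   # larger, irregular distances between weeks
```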
V. THE PROPOSED MODEL
The proposed system model contains the following steps:
• handling the class imbalance problem using TLSGAN,
• extracting prominent features utilizing GRU and GoogLeNet,
• classifying the theft and benign samples leveraging a fully connected neural network,
• handling the non-malicious factors using the memory units of GRU and
• enhancing the model's generalization ability with the help of dropout and batch normalization layers.
Each of the above mentioned steps is explained in the follow-
ing subsections.
A. HANDLING THE CLASS IMBALANCE PROBLEM
One of the critical problems in ETD is the class imbalance ratio, where one class (honest consumers) dominates the other class (electricity thieves). The EC data is not normally distributed and is skewed towards the majority class. When an ML model is applied to an imbalanced dataset, it becomes biased towards the majority class and does not learn important features of the minority class, which increases the FPR. Traditionally, two sampling techniques, ROS and RUS, are used to balance the dataset. However, these techniques have some limitations: overfitting, information loss and duplication of existing data. In this article, we propose TLSGAN to handle the class imbalance ratio; it is specially designed for time series datasets by utilizing GRU layers. Its objective function is based on the least-square method, which computes the difference between real and fake samples and generates new samples that are close to the real ones. The collected electricity theft data belongs to the time series domain, so GRU layers are exploited to design the TLSGAN model. Using the least square function, the model learns the distribution of the real theft data and refines the generated fake samples. Finally, the generated samples are concatenated with the real samples and the class imbalance problem is solved. The overall working mechanism of TLSGAN is explained below.
We select the existing theft data as training data. The theft samples are represented as P_data(x). A random noise or latent variable z is drawn from a Gaussian distribution P_g(z). A mapping relationship is established between P_g(z) and P_data(x) through the GAN model. The GAN model contains two deep learning models: a generator (G) and a discriminator (D). The former is responsible for learning the regularities of the P_data(x) distribution and generating fake samples. It takes a random variable z from P_g(z) as input and produces G(z) as output. Its main goal is to fit P_g(z) onto P_data(x) to generate fake samples that highly resemble the real theft samples and confuse D as often as possible. The D is responsible for discriminating whether the input data is real or fake. It takes the real theft samples and the synthetic samples generated by G as input and produces an output of either 0 or 1, which indicates whether a sample is fake or real. The mathematical form of the min-max equation of the GAN network is given below [29].
\min_G \max_D V_{GAN}(D, G) = E_{x \sim p_{data}(x)}[\log D(x)] + E_{z \sim p_z(z)}[\log(1 - D(G(z)))],   (5)
where V_GAN(D, G) is the loss function of the GAN, E_{x∼p_data(x)} is the expected value over the theft distribution and E_{z∼p_z(z)} is the expected value over the latent distribution.
The standard GAN network is suitable for unsupervised learning problems. It uses the binary cross-entropy function to draw a decision boundary between real and fake samples. The limitation of binary cross-entropy is that it tells whether a generated sample is real or fake but does not tell how far the generated samples are from the decision boundary. This creates a vanishing gradient problem and stops the training process of the GAN model. In [29], the authors propose the least square generative adversarial network
Algorithm 2: Training of TLSGAN
Data: X_normalized
1:  Variables: separate theft and benign samples from X_normalized; theft: T = {x_{i,j}, x_{i,j+1}, x_{i,j+2}, ..., x_{m,n}}; normal: N = {y_{i,j}, y_{i,j+1}, y_{i,j+2}, ..., y_{p,n}}
2:  while stopping condition is not met do
3:    t_i ⇐ sample from the theft distribution
4:    s_i ⇐ sample from the Gaussian distribution
5:    Update D by minimizing \frac{1}{t} \sum_{i=1}^{t} [\frac{1}{2} E_{t \sim p_{data}(t)}[(D(t_i) - b)^2] + \frac{1}{2} E_{s \sim p_s(s)}[(D(s_i) - a)^2]]
6:    Fix the discriminator weights
7:    z_i ⇐ sample from the latent space
8:    Update G by minimizing \frac{1}{n} \sum_{i=1}^{n} [\frac{1}{2} E_{z \sim p_z(z)}[(D(z_i) - c)^2]]
9:  end
10: b and a are the labels of theft and fake patterns, respectively
11: c is the distance by which G wants to deceive D
12: After training G, fake theft patterns are generated:
13: FakeSamples = G(z)
14: X_BalData = Concatenate(FakeSamples, N, T)
Result: Return the balanced dataset X_BalData
(LSGAN) architecture, which is an extension of the standard GAN model. It uses the least square loss instead of the binary cross-entropy loss function. The LSGAN provides two benefits. First, while the standard GAN only updates those samples that lie on the wrong side of the decision boundary, the LSGAN penalizes all samples that are far from the decision boundary, even if they reside on the correct side of it. During the penalization process, the parameters of D and the decision boundary are fixed, so G generates samples that are closer to the decision boundary. Secondly, penalizing the samples near the decision boundary produces larger changes in the gradients, which solves the vanishing gradient problem. The min-max objective function of LSGAN is given in Equation (6) [29].
\max_D V_{LSGAN}(D) = \frac{1}{2} E_{x \sim p_{data}(x)}[(D(x) - b)^2] + \frac{1}{2} E_{z \sim p_z(z)}[(D(G(z)) - a)^2],   (6)

\min_G V_{LSGAN}(G) = \frac{1}{2} E_{z \sim p_z(z)}[(D(G(z)) - c)^2],
where V_LSGAN(D) and V_LSGAN(G) are the loss functions of the LSGAN discriminator and generator, respectively. The b and a are the labels of real (theft) and fake samples, respectively, and c measures the distance between both samples; G needs to minimize this value in order to deceive D. The LSGAN is designed for generating fake images using convolutional layers. We change the internal architecture and use GRU layers instead of convolutional layers because we are working on a problem that involves sequential data.
The training process of TLSGAN is presented in Algorithm 2. We pass the X_normalized data obtained from Algorithm 1 to Algorithm 2. In the first step, the variables are initialized. In steps 2 to 9, TLSGAN is trained on theft samples to generate fake theft patterns. In steps 10 to 14, data is generated from the latent distribution and passed to G to produce fake theft samples. At the end, we concatenate the fake samples generated by G, the original theft samples and the normal samples, and return a balanced dataset X_BalData.
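A minimal Keras sketch of this training loop is given below, assuming daily EC sequences of length 1034 as in the SGCC dataset. The layer sizes, latent dimension and label values b = 1, a = 0, c = 1 are illustrative choices consistent with the least-square objective in Equation (6), not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, LATENT_DIM = 1034, 100  # SGCC series length; latent size is an assumption

def build_generator():
    # GRU layers replace LSGAN's convolutions for sequential EC data.
    return models.Sequential([
        layers.Input(shape=(LATENT_DIM, 1)),
        layers.GRU(64, return_sequences=True),
        layers.GRU(64),
        layers.Dense(SEQ_LEN, activation="sigmoid"),  # EC is min-max normalized to [0, 1]
    ])

def build_discriminator():
    # Linear output: D scores the distance from the decision boundary (least squares).
    return models.Sequential([
        layers.Input(shape=(SEQ_LEN, 1)),
        layers.GRU(64),
        layers.Dense(1, activation="linear"),
    ])

G, D = build_generator(), build_discriminator()
g_opt, d_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-4)
mse = tf.keras.losses.MeanSquaredError()
b_real, a_fake, c_target = 1.0, 0.0, 1.0  # assumed label/target values

@tf.function
def train_step(theft_batch):
    n = tf.shape(theft_batch)[0]
    z = tf.random.normal([n, LATENT_DIM, 1])
    with tf.GradientTape() as dt:  # discriminator step: first line of Equation (6)
        d_real, d_fake = D(theft_batch), D(G(z)[..., None])
        d_loss = 0.5 * mse(b_real * tf.ones_like(d_real), d_real) \
               + 0.5 * mse(a_fake * tf.ones_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(dt.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))
    with tf.GradientTape() as gt:  # generator step: second line of Equation (6)
        d_gen = D(G(z)[..., None])
        g_loss = 0.5 * mse(c_target * tf.ones_like(d_gen), d_gen)
    g_opt.apply_gradients(zip(gt.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))
    return d_loss, g_loss
```

After convergence, the balanced dataset is obtained exactly as in lines 13 and 14 of Algorithm 2: sample z, call G(z) and concatenate the generated theft samples with the real theft and normal samples.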
B. ARCHITECTURE OF HYBRID MODEL
Time series EC data has a complex structure with high random fluctuations because it is affected by various factors like high load, weather conditions, a big party in a house, etc. Traditional models like SVM, MLP, etc., are not ideal for learning complex patterns; they have a low DR and a high FPR due to the curse of dimensionality. In the literature, different deep learning models are used to learn complex patterns from time series data.
In this article, a hybrid model is proposed, which is a combination of GoogLeNet and GRU. In [30], [31], the authors prove that hybrid deep learning models perform better than individual learners. The proposed model takes advantage of both GoogLeNet and GRU by extracting and remembering the periodic features of the EC dataset. The architecture of the proposed model consists of three modules: GRU, GoogLeNet and hybrid. We pass the 1D data to the GRU module, whereas the 2D weekly EC data is passed to the GoogLeNet module. The hybrid module takes the outputs of both modules, concatenates them and gives the final decision about whether an EC pattern contains an anomaly. Hybrid deep learning models are very efficient because they allow joint training of both models. Figure 2 shows the overall structure of the proposed model. In the proposed system model, steps 1, 2 and 3 show the data preprocessing phase, where we handle missing values and outliers and normalize the dataset, respectively. In step 4, the class imbalance problem is solved. In steps 5 and 6, prominent features are extracted from the 1D and 2D EC datasets using the GRU and GoogLeNet models, respectively. Finally, in step 7, the extracted features of GRU and GoogLeNet are concatenated and passed to a fully connected neural network to classify between normal and theft samples.
C. GATED RECURRENT UNIT
We observe that there are far more fluctuations in theft EC patterns than in those of normal consumers. So, the 1D data is fed into the GRU model to capture co-occurring dependencies in the time series data. GRU was proposed by Chung et al. in 2014 to capture related dependencies in time series data. It has memory modules to remember important periodic patterns, which help to handle sudden changes in EC patterns due to non-anomalous factors like changes in weather conditions, a big party in a house, weekends, etc. Moreover, it was introduced to solve the vanishing gradient problem of the recurrent neural network (RNN). GRU and LSTM are considered variants of the RNN. In [32], the authors compare the performance of GRU and LSTM with the RNN model on different sequential datasets. Both models outperform the RNN and solve its vanishing gradient problem. In [24], the authors from Google conduct extensive experiments on 10,000 LSTM and RNN architectures. Their final experimental results show that no single model consistently performs better than the GRU. Based on the above analysis, we opt for GRU to extract optimal features from the EC dataset because it gives good results on sequential datasets. It has reset and update gates that control the flow of information inside the network. The update gate decides how much previous information should be preserved for future decisions, whereas the reset gate decides how much past information should be kept or discarded. The equations of the update and reset gates are similar to each other; the difference comes from their weights and usage. The equations of the GRU network are given below [8].
z_t = \sigma(W_z \cdot [h_{t-1}, x_t]),   (7)
r_t = \sigma(W_r \cdot [h_{t-1}, x_t]),   (8)
\hat{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t]),   (9)
h_t = (1 - z_t) * h_{t-1} + z_t * \hat{h}_t.   (10)
where t, z_t, σ, W_z and x_t represent the time step, update gate, sigmoid function, update gate weight and current input, respectively. h_{t-1}, ĥ_t and r_t are the previous hidden state, candidate value and reset gate, respectively. W_r is the reset gate weight, W is the weight of the candidate value and h_t is the hidden state. The last hidden layer of GRU is presented as Dense_GRU.
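For illustration, the GRU branch can be expressed in Keras as follows. The GRU size of 60, dropout rate of 0.4 and MLP size of 50 follow the hyperparameter settings reported in Table 5; the input length and function names are our own.

```python
from tensorflow.keras import layers, models

SEQ_LEN = 1034  # daily readings per consumer in the SGCC dataset

def build_gru_branch():
    inp = layers.Input(shape=(SEQ_LEN, 1), name="ec_1d")
    h = layers.GRU(60)(inp)     # GRU layer of size 60 (Table 5)
    h = layers.Dropout(0.4)(h)  # dropout rate of 0.4 (Table 5)
    dense_gru = layers.Dense(50, activation="relu",
                             kernel_initializer="he_normal")(h)  # Dense_GRU
    return models.Model(inp, dense_gru, name="gru_branch")
```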
D. GOOGLENET
It is difficult to capture long-term periodicity from 1D EC data. However, periodicity can be captured if the data is aligned according to weeks, as explained in Section IV-E. GoogLeNet is a deep learning model that was proposed by researchers at Google in 2014. It is designed to increase the accuracy and computational efficiency of the existing models. Its architecture is similar to existing CNN models like LeNet-5 and AlexNet. However, the core of the model consists of auxiliary classifiers and inception modules. Each inception module contains 1×1, 3×3, 5×5 and 7×7 convolutional filters that extract hidden or latent features from the EC data. After each inception module, the outputs of the convolutional and max pooling layers are concatenated and passed to the next inception module. The auxiliary classifiers calculate the training loss after the 4th and 7th inception modules and add it to the GoogLeNet network to prevent the vanishing gradient problem.
In [7], [31], the authors exploit a 2D-CNN model to extract abstract features from a time series dataset. Motivated by these articles, GoogLeNet is applied to extract latent features from the EC data. The latent features increase the model's generalization ability. The 1D EC data is transformed into 2D form according to weeks and is fed as input to the GoogLeNet model, which has inception modules. Each inception module has max pooling and multiple convolutional layers with different filter sizes. In [7], the authors use a simple CNN model to extract local patterns from EC data. In a simple CNN model, multiple convolution windows of the same size move over the EC patterns and extract features. However, convolution windows of the same size have a low ability to extract optimal features.
FIGURE 2: The proposed system model. Steps 1-3 form the data preprocessing module (missing values, outliers, normalization); step 4 is the data imbalance module, where TLSGAN balances the dataset; steps 5 and 6 are the GRU and GoogLeNet modules that extract features from the 1D and 2D data; step 7 is the hybrid module with a sigmoid output. Limitations mapped to solutions: L1 class imbalance, L2 information loss due to RUS, L3 data duplication due to ROS and L4 overfitting due to SMOTE are addressed by S1 (TimeGAN); L5 the curse of dimensionality is addressed by S2 (GoogLeNet and GRU); L6 high FPR and overfitting are addressed by S3 (dropout layers and batch normalization).
GoogLeNet overcomes this problem through its inception modules, where convolution and max pooling layers with different filter sizes extract optimal features from the EC data. Moreover, GoogLeNet has less time and memory complexity as compared to the existing deep learning models. However, it was designed for computer vision tasks, which is why it has multiple inception modules to extract edges and interest points from images. For our problem, we change the architecture and use only one inception module, which extracts the periodicity and non-periodicity from weekly EC patterns. Finally, we use flatten and fully connected layers to attain the principal features that are extracted through the convolutional and max pooling layers. The last hidden layer of GoogLeNet is presented as Dense_GoogLeNet.
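A sketch of this single inception module is shown below in Keras. The branch filter sizes follow the description above, but the filter counts are not reported in the paper, so the ones used here are illustrative; the weekly input shape follows lines 5 and 6 of Algorithm 3.

```python
from tensorflow.keras import layers, models

WEEKS, DAYS = 147, 7  # 2D weekly stacking of the EC series (Algorithm 3)

def build_googlenet_branch():
    inp = layers.Input(shape=(WEEKS, DAYS, 1), name="ec_2d")
    # One inception module: parallel convolutions of different sizes plus max pooling.
    b1 = layers.Conv2D(16, (1, 1), padding="same", activation="relu")(inp)
    b3 = layers.Conv2D(16, (3, 3), padding="same", activation="relu")(inp)
    b5 = layers.Conv2D(16, (5, 5), padding="same", activation="relu")(inp)
    bp = layers.MaxPooling2D((3, 3), strides=1, padding="same")(inp)
    x = layers.Concatenate(axis=-1)([b1, b3, b5, bp])
    x = layers.MaxPooling2D((2, 2))(x)  # reduces dimensionality, speeding up convergence
    x = layers.Flatten()(x)
    x = layers.Dropout(0.4)(x)          # dropout size of 0.4 (Table 6)
    dense_googlenet = layers.Dense(30, activation="relu",
                                   kernel_initializer="he_normal")(x)  # Dense_GoogLeNet
    return models.Model(inp, dense_googlenet, name="googlenet_branch")
```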
E. HYBRID MODULE
The GRU memorizes the periodic patterns from the 1D data, whereas GoogLeNet captures latent patterns from the 2D data. We combine Dense_GoogLeNet and Dense_GRU to aggregate the latent and temporal patterns. The outcome of the model is calculated through a sigmoid activation function and the training loss is measured using binary cross entropy.
h_{HG2} = W_{HG2} \cdot [Dense_{GoogLeNet}, Dense_{GRU}] + b_{HG2},   (11)
Y_{NTL} = \sigma(h_{HG2}).   (12)
where h_HG2 is the hidden layer of the hybrid module, W_HG2 is the weight of the hybrid layer, b_HG2 is the bias of the hybrid layer, Y_NTL is the output and σ is the sigmoid function. We pass X_BalData, taken from Algorithm 2, to Algorithm 3. On lines 1 to 3, the variables are initialized. The 1D EC data is transformed into 2D format on lines 4 to 6. On lines 7 to 17, we pass the 1D data to the GRU to extract time-related patterns, whereas the 2D data is fed into GoogLeNet to retrieve the periodicity and non-periodicity from weekly EC patterns. On lines 18 to 20, we concatenate the features of GRU and GoogLeNet and apply the sigmoid activation function, which classifies theft and normal EC patterns.
Algorithm 3: Training of HG2
Data: EC dataset X_BalData
1:  Data in 1D format:
2:  X_1D = {x_{i,j}, x_{i,j+1}, x_{i,j+2}, ..., x_{m,n}}
3:  m = 42372, n = 1034
4:  Convert the data into 2D format:
5:  Z = \begin{bmatrix} x_{1,1} & \cdots & x_{1,k} \\ \vdots & \ddots & \vdots \\ x_{j,1} & \cdots & x_{m,k} \end{bmatrix}
6:  j = 147, k = 7
7:  Pass X_1D to GRU:
8:  z_t = \sigma(W_z \cdot [h_{t-1}, x_t])
9:  r_t = \sigma(W_r \cdot [h_{t-1}, x_t])
10: \hat{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t])
11: h_t = (1 - z_t) * h_{t-1} + z_t * \hat{h}_t
12: Dense_GRU = relu(W \cdot h_t + b)
13: Pass Z to GoogLeNet:
14: Z'[a, c] = \sum_j \sum_k f[j, k] Z[a - j, c - k]
15: a, c ⇒ dimensions of the output matrix
16: Fl_GoogLeNet = flatten(Z')
17: Dense_GoogLeNet = W \cdot Fl_GoogLeNet + b
18: h_HG2 = W_HG2 \cdot [Dense_GRU, Dense_GoogLeNet] + b
19: b ⇒ bias term
20: Y_NTL = \sigma(h_HG2)
Result: Y_NTL
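Combining the two branches, the hybrid head of Equations (11) and (12) can be sketched as below. The hybrid layer size of 20 with ReLU follows Table 7, and build_gru_branch and build_googlenet_branch are the illustrative helpers sketched in the GRU and GoogLeNet subsections, not the authors' exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

gru = build_gru_branch()
goognet = build_googlenet_branch()

# Hybrid layer: concatenate Dense_GRU and Dense_GoogLeNet (Equation 11).
h = layers.Concatenate()([gru.output, goognet.output])
h = layers.Dense(20, activation="relu")(h)  # hybrid layer of size 20 (Table 7)
h = layers.BatchNormalization()(h)          # batch normalization improves convergence
y_ntl = layers.Dense(1, activation="sigmoid")(h)  # Y_NTL (Equation 12)

hg2 = models.Model([gru.input, goognet.input], y_ntl, name="HG2")
hg2.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=[tf.keras.metrics.AUC(curve="ROC"),
                     tf.keras.metrics.AUC(curve="PR")])
# Training then takes the 1D series and its 2D weekly stacking as joint inputs:
# hg2.fit([X_1d, X_2d], y, epochs=20, batch_size=128)
```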
F. PERFORMANCE METRICS
One of the main challenges of ETD is the class imbalance problem, where classifiers become biased towards the majority class and ignore the minority class. Therefore, the selection of suitable measures is necessary to evaluate the performance of classifiers on both classes. We opt for ROC-AUC and PR-AUC as the performance metrics. The ROC curve is obtained by plotting the true positive rate (TPR), also known as recall, on the y-axis against the FPR on the x-axis. It is a convenient diagnostic tool because it is not biased towards the minority or majority class, and the ROC-AUC value lies between 0 and 1. Although ROC-AUC is a good performance measure, it does not consider the precision of a classifier and does not give equal importance to both classes. Additionally, the test dataset is imbalanced, so we also take PR-AUC into account for the performance evaluation of the classifiers [8]. PR-AUC is the area under the curve of precision against recall at different threshold values. Precision measures the percentage of correctly identified electricity thieves; maximizing precision increases the recovered revenue of the utility. Recall calculates the percentage of electricity thieves that appear on the suspicious list. High precision and recall scores are very important for accomplishing the goals of a utility.
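As a quick sketch, both metrics can be computed from predicted probabilities with scikit-learn, which is also used below for the conventional models; y_true and y_score here are placeholder arrays.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([0, 0, 0, 1, 0, 1, 0, 0, 1, 0])  # 1 = electricity thief
y_score = np.array([0.1, 0.3, 0.2, 0.8, 0.4, 0.7, 0.1, 0.2, 0.9, 0.3])

print("ROC-AUC:", roc_auc_score(y_true, y_score))
# average_precision_score summarizes the PR curve (PR-AUC) and, unlike
# ROC-AUC, is sensitive to performance on the minority (theft) class.
print("PR-AUC:", average_precision_score(y_true, y_score))
```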
VI. EXPERIMENTS AND RESULTS ANALYSIS
In this paper, all models are trained and tested on the SGCC dataset. The description of the dataset is given in Section IV-A. We use Google Colab to train the deep learning and ML models, taking advantage of its cloud computing resources. The deep learning models are implemented with TensorFlow, a deep learning library, while the conventional models are fitted with the scikit-learn library.
A. PERFORMANCE ANALYSIS OF LEAST SQUARE
GENERATIVE ADVERSARIAL NETWORK
Due to the imbalanced nature of the dataset, TLSGAN is proposed to generate fake samples that have a high resemblance with real-world theft samples. The standard LSGAN uses the VGG neural network architecture to generate fake images. However, our dataset belongs to the time series domain, so we change the network architecture according to our dataset's requirements. We replace the convolutional layers with GRU layers because these layers are designed to handle problems of sequential data. Both the D and G models contain GRU and dense layers. A linear activation function is implemented at the last layer of D because it measures how far the generated samples are from the real samples and changes the weights of G to improve its performance.

The Adam optimizer is used to train the parameters of TLSGAN because it is easy to implement, computationally inexpensive, requires little memory and gives good results on large datasets. Figure 3a shows the loss functions of the generator (G) and discriminator (D) on real and generated samples during the training process. After 100 epochs, D hardly differentiates between real and fake samples, whereas the loss function value of G lies between 0.5 and 1.75, which indicates that it has developed a relation between the real and latent data points to generate new theft samples. Figure 3b shows the patterns of real theft samples. Moreover, Figures 3c and 3d present the theft samples generated by TLSGAN. Both figures show that the generated samples have a high resemblance with the original theft samples presented in Figure 3b. Similar trends are observed for both the real and latent features, which
ensure the diversity in the generated theft patterns. In Figures 3b, 3c and 3d, the x-axis represents the number of days, whereas the y-axis represents the EC in kilowatt-hours (kWh).

FIGURE 3: Performance analysis of TLSGAN: (a) loss curves of G and D on real and generated samples during training; (b) real theft samples; (c) and (d) theft samples generated by TLSGAN.

Table 4
TABLE 4: Comparison through accuracy and execution time
of different data generation techniques
Techniques Execution time (s) Accuracy (%)
TLSGAN 61.61 95
SVM_SMOTE 177 88
Borderline_SMOTE 56 90
SMOTE_TomekLinks 71.61 93
SMOTE_ENN 957 89
ADASYN 71.61 93
SMOTE 9.18 81
ROS 0.12 88
RUS 0.5 89
presents the classification accuracy and execution time of different data generation techniques. We compare the performance of the proposed TLSGAN with current variants of SMOTE: SVM_SMOTE, Borderline_SMOTE, SMOTE_TomekLinks, SMOTE_ENN and ADASYN. TLSGAN generates new theft samples that increase the classification accuracy of the proposed model. As explained above, the generated samples have a high resemblance with real theft samples, which reduces the overfitting problem that occurs with other oversampling techniques and increases the model's generalization and robustness. The execution time of TLSGAN is more than that of ROS, RUS, Borderline_SMOTE and SMOTE, while it is less than that of SVM_SMOTE, SMOTE_TomekLinks, SMOTE_ENN and ADASYN. The running time of TLSGAN depends upon the number of hidden layers and the sampling rate of the dataset. The execution time of SVM_SMOTE, Borderline_SMOTE, SMOTE_TomekLinks, SMOTE_ENN, ADASYN and SMOTE depends upon the number of samples and features in the dataset, whereas the execution time
of RUS and ROS does not change significantly with large datasets because they simply select samples from the dataset and duplicate or remove them. The SMOTE_TomekLinks and SMOTE_ENN techniques take additional time because they perform both undersampling and oversampling steps to remove redundant samples from the dataset.

FIGURE 4: Performance analysis of the gated recurrent unit: (a) PR curve; (b) ROC curve (train ROC-AUC = 83.4%, test ROC-AUC = 79.7%); (c) loss and accuracy curves on the training and testing datasets.
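For reference, the SMOTE-family baselines in Table 4 can be timed in outline with the imbalanced-learn library; the harness below is a sketch on toy data with the SGCC class ratio, not a reproduction of the reported numbers.

```python
import time
import numpy as np
from imblearn.over_sampling import SMOTE, ADASYN, BorderlineSMOTE, SVMSMOTE
from imblearn.combine import SMOTETomek, SMOTEENN

def time_sampler(sampler, X, y):
    """Return the wall-clock cost of balancing (X, y) with the given sampler."""
    start = time.perf_counter()
    sampler.fit_resample(X, y)
    return time.perf_counter() - start

# Toy imbalanced data standing in for the SGCC features (about 8.54% thieves).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))
y = (rng.random(2000) < 0.0854).astype(int)

for sampler in [SMOTE(), BorderlineSMOTE(), SVMSMOTE(), ADASYN(),
                SMOTETomek(), SMOTEENN()]:
    print(f"{type(sampler).__name__}: {time_sampler(sampler, X, y):.2f}s")
```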
B. PERFORMANCE ANALYSIS OF GATED RECURRENT
UNIT
Figures 4a, 4b and 4c show the performance of the GRU model on the SGCC dataset. Figure 4a presents the performance of the model in terms of the PR curve. The curves on the training and testing datasets move in parallel with little difference, which means that the model has learnt the patterns of theft and normal consumers and now has the ability to differentiate between both classes. Figure 4b shows the ROC curve and AUC of the model on the training and testing datasets. The GRU model attains 83.4% and 79.7% ROC-AUC values on the training and testing datasets, respectively. Figure 4c presents the loss and accuracy of the model on the training and testing datasets. It achieves good accuracy and has minimum loss after 20 epochs. Its performance might increase with more epochs, but the model might also fall into an overfitting problem. The GRU model has update and reset gates to regulate the flow of information throughout the network. These gates prevent the vanishing gradient problem and reduce the model's chances of getting stuck in local minima. Moreover, these gates increase the model's overall performance by extracting the optimal temporal features, which have time-related dependencies after certain intervals, from the EC dataset. Table 5 presents the hyperparameter settings of the GRU model.
C. PERFORMANCE ANALYSIS OF GOOGLENET
Figures 5a, 5b and 5c show the performance of the
GoogLeNet model. Figure 5a shows the PR curve of the
TABLE 5: Hyperparameters setting of gated recurrent unit
Hyperparameters Optimal values
Size of GRU layer 60
Size of MLP 50
Dropout rate 0.4
Activation function at last layer Sigmoid
Optimizer ADAM
kernel_initializer he_normal
FIGURE 5: Performance analysis of GoogLeNet: (a) PR curve; (b) ROC curve (train ROC-AUC = 95.7%, test ROC-AUC = 94.2%); (c) loss and accuracy curves on the training and testing datasets.
GoogLeNet model. The PR curve provides a good analysis of the model's performance because it gives equal weight to both normal and abnormal samples. The model obtains good PR curves on the training and testing datasets, which indicates that it learnt the patterns of both normal and abnormal samples appropriately during the training phase. Figure 5b shows the model's performance using the ROC curve and the ROC-AUC performance indicator, which evaluate how good a model is at predicting the positive class. GoogLeNet achieves 95.7% and 94.2% AUC values on the training and testing datasets, respectively, which are higher than the AUC values of the GRU model. Moreover, the loss and accuracy of the model on the training and testing datasets can be seen in Figure 5c. We visualized the model's performance beyond 20 epochs; however, we observed more fluctuations in the training and testing curves of accuracy and loss, which indicates the model's instability beyond 20 epochs. For these reasons, the model is trained for only 20 epochs, which gives good results and saves computational resources. In this model, the data shape is transformed according to weeks to learn periodic patterns and extract optimal features through the convolution and max pooling layers. The max pooling layers reduce the data dimensionality, which increases the model's convergence speed. Moreover, dropout layers are used to reduce the overfitting problem and increase the generalization property. Table 6 presents the hyperparameter settings of GoogLeNet.
TABLE 6: Hyperparameters setting of GoogLeNet
Hyperparameters Optimal values
Number of convolutional layers 2
Max pooling layers 1
Dense layer size 30
Dropout size 0.4
Activation function at last layer Sigmoid
Optimizer ADAM
kernel_initializer he_normal
FIGURE 6: Performance analysis of HG2: (a) PR curve; (b) ROC curve (train ROC-AUC = 97.8%, test ROC-AUC = 95.7%); (c) loss and accuracy curves on the training and testing datasets.
D. PERFORMANCE ANALYSIS OF HYBRID HG2 MODEL
In this Section, the performance of the HG2 model is compared with the stand-alone deep learning models. Figures 6a, 6b and 6c show the HG2 model's performance using different performance measures. HG2 achieves 97.8% and 95.7% ROC-AUC values on the training and testing datasets, respectively, which are higher than those of the GRU and GoogLeNet models. Figure 6c shows the loss and accuracy curves on the training and testing datasets, which are better than the curves of the GRU and GoogLeNet models presented in Figures 4c and 5c. In [30] and [31], the authors prove that a hybrid deep learning model performs better than individual learners: it achieves better convergence speed, takes less computational time and extracts optimal features. The GRU layers extract time-related patterns through the update and reset gates, whereas the GoogLeNet model has an inception module, which contains max pooling and multiple convolution layers with different filter sizes. These layers reduce computational complexity and extract latent and abstract patterns using local receptive fields and a weight sharing mechanism. The Keras library is used to concatenate the extracted optimal features of both the GoogLeNet and GRU classifiers. Finally, these concatenated features have the properties of both individual learners, which provides better learning for the HG2 model. Although the GRU alone gives low performance, combining it with the GoogLeNet model improves the overall performance; the combined model has the ability to learn better patterns from the EC data. The
TABLE 7: Hyperparameters setting of the proposed model
Hyperparameters                    | Optimal values
Number of convolutional layers     | 3
Max pooling layers                 | 1
Size of GRU layer                  | 60
Hybrid layer size                  | 20
Hybrid layer activation function   | ReLU
Activation function at last layer  | Sigmoid
Optimizer                          | ADAM
kernel_initializer                 | he_normal
The proposed hybrid model ignores the weak points of both
GRU and GoogLeNet and uses their strong points. This is why
the poor performance of GRU does not affect the overall
performance of the proposed model. Table 7 shows the
hyperparameters setting of HG2.
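To illustrate the concatenation step, the sketch below merges the penultimate layers of the two branch sketches given earlier through the hybrid layer of Table 7 (size 20, ReLU). The choice of which tensors to merge is an assumption, and the convolutional depth of the earlier sketch differs from the three layers listed in Table 7:

    # Sketch of merging the GRU and CNN feature branches (assumed tensors).
    from tensorflow.keras.layers import Concatenate, Dense
    from tensorflow.keras.models import Model

    gru_feat = gru_model.layers[-2].output   # 50-unit dense output of the GRU branch
    cnn_feat = cnn_model.layers[-2].output   # 30-unit dense output of the CNN branch
    merged = Concatenate()([gru_feat, cnn_feat])
    h = Dense(20, activation="relu")(merged)  # hybrid layer: size 20, ReLU
    out = Dense(1, activation="sigmoid")(h)   # sigmoid at the last layer
    hg2 = Model([gru_model.input, cnn_model.input], out)
    hg2.compile(optimizer="adam", loss="binary_crossentropy")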
E. COMPARISON WITH BENCHMARK CLASSIFIERS
In this section, the performance of the proposed model is
compared with existing state-of-the-art deep learning and ML
classifiers.
(1) Wide and deep convolutional neural network (WDCNN):
It is proposed in [7] to identify normal and abnormal patterns
in EC data. The wide component is equivalent to an MLP
module and is used to extract global knowledge from the data,
whereas the CNN component is leveraged to attain periodic
patterns from weekly EC data. We use the same dataset and
hyperparameters setting to compare this model with our
proposed model.
(2) Hybrid multilayer perceptron and long short term memory
model: In [8], the authors propose a hybrid model that is a
combination of LSTM and MLP. They pass EC data to the
LSTM to extract periodic patterns, whereas smart meter data
is fed to the MLP model to retrieve non-sequential information.
They concatenate both models through the Keras library and
prove that a hybrid model is better than a single model. We
use the same hyperparameters and dataset settings as utilized
in [8] to build the hybrid LSTM-MLP model.
(3) Naive bayes classifier (NB): It is a statistical classification
technique based on Bayes' theorem. It assumes that the input
features are independent of one another and predicts the
unknown class using a probability distribution. It has high
accuracy and speed on large datasets. Moreover, it has many
real-world applications: spam filtering, sentiment analysis,
text classification, recommendation systems, etc. NB has
different versions according to the nature of a dataset. We
utilize Gaussian NB to classify normal and abnormal data
points in the EC dataset because it is designed for
continuous-valued features.
(4) Support vector machine: SVM is a well-known classifier
in ETD. It is an enhanced version of the maximal margin
hyperplane and can classify both linear and non-linear data.
It exploits radial, sigmoid, Gaussian, etc., kernels to transform
non-linear data into a linear format and then draws a decision
boundary between electricity thieves and normal consumers.
However, its computational time is high for large datasets.
In [2], the authors use SVM to classify benign and theft
consumers. We use the radial basis function (RBF) kernel due
to the non-linearity of the data and try different values of the
C parameter. After several iterations, 100 is found to be the
optimal value of C, where SVM gives good results.
(5) Logistic regression (LR): It is a supervised ML algorithm
used for binary classification and is similar to a single-layer
neural network. To estimate the probability of NTL, it
multiplies the input features with a trained weight matrix and
then passes the resultant values to a sigmoid function to
generate an output between 0 and 1. It supports different
solvers: Newton's method, stochastic average gradient and
sparse stochastic average gradient (SAGA). Newton's method
gives the best results, which are mentioned in Table 8.
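A hedged sketch of how these three baselines can be configured with scikit-learn under the stated settings (RBF kernel with C = 100 for SVM, a Newton-type solver for LR, Gaussian NB); the random matrix is only a stand-in for the prepared EC features:

    # Sketch of the classical baselines; X and y are synthetic placeholders.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.random((200, 50))          # reduced-dimensionality EC features (assumed)
    y = rng.integers(0, 2, 200)        # 0 = benign, 1 = theft

    nb = GaussianNB().fit(X, y)
    svm = SVC(kernel="rbf", C=100, probability=True).fit(X, y)
    lr = LogisticRegression(solver="newton-cg").fit(X, y)
    for name, clf in (("NB", nb), ("SVM", svm), ("LR", lr)):
        print(name, clf.predict_proba(X[:3])[:, 1])   # theft probabilities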
Results: We compare the performance of the proposed HG2
model with different state-of-the-art classifiers. The same
training and testing datasets are used for LR, NB, MLP and
SVM. We use the RBF kernel for SVM due to the non-linearity
of the data. Moreover, the number of samples and the
dimensionality of the data are reduced because SVM requires
high computational time on large datasets. In [8], the authors
use sequential and non-sequential data for LSTM and MLP,
respectively. However, non-sequential information is not
available in our case, which is why only sequential information
is fed into the MLP and LSTM models. The hybrid of both
models gives good results and achieves 95% and 94% ROC-AUC
and PR-AUC, respectively.
In [7], the sequential data is fed into the MLP model to retrieve
global knowledge from the data, whereas the 2D stacked data
is given to the CNN model to extract periodic patterns from
weekly EC data. The WDCNN achieves 92% and 88% ROC-AUC
and PR-AUC, respectively, which are higher than the ROC-AUC
and PR-AUC of the conventional ML models.
The proposed HG2 model outperforms the hybrid and other
ML models because it extracts periodic and abstract patterns
from EC data using GRU and convolutional layers. As
discussed earlier, the GRU layers have update and reset gates
that learn important patterns and remove redundant values.
These gates control the flow of information and improve the
overall performance of the proposed model. GoogLeNet has
an inception module that contains max pooling and multiple
convolutional layers with different filter sizes. These layers
extract patterns that cannot be retrieved through human
knowledge. These abstract or latent patterns are combined
with the features extracted by the GRU model. Due to this
combination of optimal features, HG2 attains 96% and 97%
ROC-AUC and PR-AUC values, which are higher than those
of all the classifiers explained above.
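The reported scores follow the usual definitions; a minimal sketch of how they can be computed with scikit-learn, with stand-in labels and probabilities:

    # Sketch of the ROC-AUC and PR-AUC computation on placeholder outputs.
    import numpy as np
    from sklearn.metrics import roc_auc_score, precision_recall_curve, auc

    y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])            # ground-truth labels
    y_score = np.array([.1, .2, .3, .9, .7, .4, .8, .2])   # predicted probabilities
    roc_auc = roc_auc_score(y_true, y_score)
    prec, rec, _ = precision_recall_curve(y_true, y_score)
    pr_auc = auc(rec, prec)                                 # area under the PR curve
    print(f"ROC-AUC = {roc_auc:.2f}, PR-AUC = {pr_auc:.2f}")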
Table 8 shows the comparison results of the proposed model
and all other classifiers on different training ratios of the
dataset. Deep learning models are sensitive to the size of the
training data: their performance increases with the growing
amount of training data. However, this is not true for
conventional ML models, whose performance increases
according to a power law; after a certain amount of training
data, their performance does not improve [33]. However,
TABLE 8: Comparison of HG2 with existing techniques
         | Training data = 80%  | Training data = 60%  | Training data = 50%
Methods  | ROC-AUC (%) | PR-AUC (%) | ROC-AUC (%) | PR-AUC (%) | ROC-AUC (%) | PR-AUC (%)
SVM      | 77 | 64 | 77 | 64 | 78 | 64
LR       | 88 | 82 | 87 | 80 | 90 | 83
NB       | 50 | 52 | 50 | 54 | 51 | 59
MLP      | 88 | 82 | 87 | 79 | 86 | 77
MLP-LSTM | 95 | 94 | 92 | 90 | 88 | 82
WDCNN    | 92 | 88 | 91 | 89 | 56 | 63
HG2      | 96 | 97 | 93 | 91 | 88 | 85
TABLE 9: Mapping table
Limitations | Solutions | Validations
L1: Class imbalance problem | S1: TLSGAN | V1: The proposed model achieves 96% PR-AUC, which indicates that it is not biased toward the majority class, as shown in Figure 6a
L2: RUS removes important information from data | S1: TLSGAN does not remove information from the dataset | V2: Results of TLSGAN and RUS are given in Table 4; Figure 6c shows that the model does not suffer from underfitting
L3: ROS causes overfitting problem | S1: TLSGAN reduces the overfitting problem of ROS by generating fake samples that have high resemblance with real samples | V3: The model achieves good PR and ROC curves on training and testing datasets, as shown in Figures 6a and 6b
L4: SMOTE causes overfitting problem | S1: TLSGAN overcomes the overfitting issue of SMOTE and ADASYN | V4: Table 4 shows the accuracy of SMOTE, ADASYN and TLSGAN; the proposed model attains good PR and ROC curves on training and testing data, which indicates that it does not overfit
L5: Curse of dimensionality increases model complexity and reduces model generalization ability | S2: GRU and GoogLeNet are used to extract features from sequence data (1D data) | V5: In Figures 6a, 6b and 6c, the proposed model achieves good results, which indicates that GoogLeNet and GRU extract optimal temporal patterns from the EC dataset
L6: High FPR and overfitting issue | S3: Dropout and batch normalization layers are used to reduce FPR and the overfitting problem | V6: The model is evaluated on training and testing data and achieves an FPR lower than that of the existing models
HG2 maintains its superiority over the other deep learning
models and gives better performance at different training
ratios on the SGCC dataset. Both SVM and NB give good
results on balanced and large datasets. However, in our case,
these models perform poorly for the following reasons: SVM
does not perform well on noisy data, and NB's performance
is affected by continuous values because it assumes an
independent relationship between features. For MLP-LSTM,
WDCNN and HG2, if the performance stagnates or decreases,
hyperparameter tuning on the training data is required to
improve the results.
F. MAPPING AMONG LIMITATIONS, SOLUTIONS AND
VALIDATIONS
Table 9 shows the mapping of limitations, solutions and their
validations. L1 describes the class imbalance problem, where
classifiers are biased toward the majority class and ignore
the minority class, which increases the FPR score. Solution
S1 is proposed for L1: the TLSGAN is used to handle the
class imbalance problem. As shown in Table 9, V1 is the
validation of S1. The proposed model achieves a 96% PR-AUC
score, which indicates that it is not biased toward the majority
class. Moreover, it achieves a 4% FPR score, which is
acceptable for a utility. In L2, RUS randomly removes samples
of the majority class to balance the ratio of theft and normal
samples. However, it discards useful information from the
data, which causes an underfitting problem. S1 also tackles
L2: the TLSGAN is a deep learning technique designed to
generate fake samples that resemble real samples, so it does
not remove useful information from the data and thus avoids
the drawback of RUS. V2 validates this; Figure 6c shows that
the model is not stuck in an underfitting problem. In L3 and
L4, the existing data sampling techniques generate duplicated
copies of the minority class to solve the class imbalance
problem. These techniques are designed for tabular data
rather than time series data, so they face an overfitting issue
on time series data. TLSGAN is specially designed to generate
fake samples for time series datasets that have a severe class
imbalance problem. TLSGAN uses supervised and unsupervised
loss functions and generates samples that resemble the actual
data and also preserve time-related patterns. The performance
of TLSGAN is compared with advanced variants of SMOTE.
V3 and V4 are validations of S1. Table 4 shows the comparison
between different data sampling techniques, which shows that
the accuracy of TLSGAN is higher than that of the benchmark
data augmentation techniques. Figure 6c indicates that HG2
attains good loss and accuracy curves on the training and
testing datasets. Moreover, the proposed model achieves a
good PR curve, which can be seen in Figure 6a.
L5 covers the issues that occur due to the curse of
dimensionality. GoogLeNet is used to capture weekly
periodicity from the 2D data, whereas GRU is leveraged to
capture long term and short term features from the 1D data.
In S2, GRU and GoogLeNet extract temporal and latent
patterns and pass them to a hybrid neural network to classify
theft and normal samples. V5 is the validation of S2. Figures
6a, 6b and 6c show the performance of the proposed model
through accuracy, loss, PR and ROC curves, which indicates
that GRU and GoogLeNet extract optimal features from the
EC dataset and transfer them to the hybrid module. Due to
these optimal features, HG2 achieves 96% and 97% ROC-AUC
and PR-AUC scores, respectively, which are higher than those
of the existing techniques mentioned in Table 8.
L6 is about the high FPR and overfitting problem. We know
that utilities cannot bear a high FPR due to their limited
budget for on-site inspection. In S3, dropout and batch
normalization layers are leveraged to solve the overfitting
problem and reduce the FPR score. V6 validates S3 by
computing the FPR. The proposed model achieves a 4% FPR,
which is lower than the FPR of all existing models.
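The FPR reported in V6 is the share of benign consumers wrongly flagged for inspection; a minimal sketch of the computation with stand-in labels:

    # Sketch of the FPR computation from a confusion matrix.
    import numpy as np
    from sklearn.metrics import confusion_matrix

    y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])   # placeholder ground truth
    y_pred = np.array([0, 1, 1, 0, 1, 0, 0, 0])   # placeholder predictions
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    fpr = fp / (fp + tn)                           # FP / (FP + TN)
    print(f"FPR = {fpr:.2f}")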
VII. CONCLUSION
In this article, we propose a model to detect NTLs in the
electricity distribution system. The proposed model is a hybrid
of GRU and GoogLeNet. The GRU is used to extract temporal
patterns from the time series dataset, whereas GoogLeNet is
exploited to attain latent patterns from the weekly stacked EC
dataset. The performance of the proposed model is evaluated
on a realistic EC dataset provided by SGCC, the largest smart
grid company in China. The simulation results show that HG2
outperforms the benchmark classifiers: WDCNN, MLP-LSTM,
MLP, LR, NB and SVM. Moreover, the class imbalance problem
is a severe issue in ETD. The TLSGAN, which consists of GRU
and dense layers, is proposed to tackle the class imbalance
problem. The TLSGAN generates fake samples that have high
resemblance with real-world theft samples. The model is
evaluated using suitable performance measures, ROC-AUC
and PR-AUC, and the results indicate that the proposed model
outperforms the benchmark classifiers by achieving 96% and
97% ROC-AUC and PR-AUC, respectively. In fact, the proposed
model is not limited to detecting electricity theft patterns; it
can also be used in other industrial applications to classify
normal and abnormal samples or records. In the near future,
we plan to deploy the proposed model as an NTL detector in
an electricity distribution company in Pakistan to classify
normal and theft samples.
VIII. DATASET AVAILABILITY
The dataset used in this study is publicly available at this link
IX. ACKNOWLEDGEMENT
This work was supported by King Saud University, Riyadh,
Saudi Arabia, through Researchers Supporting Project num-
ber RSP-2021/184. The work of author Ayman Radwan was
supported by FCT / MEC through Programa Operacional
Regional do Centro and by the European Union through the
European Social Fund (ESF) under Investigador FCT Grant
(5G-AHEAD IF/FCT- IF/01393/2015/CP1310/CT0002).
REFERENCES
[1] Arango, L. G., E. Deccache, B. D. Bonatto, H. Arango, P. F. Ribeiro, and
P. M. Silveira. “Impact of electricity theft on power quality.” 2016 17th
International Conference on Harmonics and Quality of Power (ICHQP).
IEEE, 2016.
[2] Jokar, Paria, Nasim Arianpoo, and Victor CM Leung. “Electricity theft
detection in AMI using customers’ consumption patterns.” IEEE Transac-
tions on Smart Grid 7.1 (2015): 216-226.
[3] Punmiya, Rajiv, and Sangho Choe. “Energy theft detection using gradient
boosting theft detector with feature engineering-based preprocessing.”
IEEE Transactions on Smart Grid 10.2 (2019): 2326-2329.
[4] Lo, Chun-Hao, and Nirwan Ansari. “CONSUMER: A novel hybrid in-
trusion detection system for distribution networks in smart grid.” IEEE
Transactions on Emerging Topics in Computing 1.1 (2013): 33-44.
[5] Khoo, Benjamin, and Ye Cheng. “Using RFID for anti-theft in a Chi-
nese electrical supply company: A cost-benefit analysis.” 2011 Wireless
Telecommunications Symposium (WTS). IEEE, 2011.
[6] Amin, Saurabh, Galina A. Schwartz, and Hamidou Tembine. “Incentives
and security in electricity distribution networks.” International Conference
on Decision and Game Theory for Security. Springer, Berlin, Heidelberg,
2012.
[7] Zheng, Zibin, Yatao Yang, Xiangdong Niu, Hong-Ning Dai, and Yuren
Zhou. “Wide and deep convolutional neural networks for electricity-
theft detection to secure smart grids.” IEEE Transactions on Industrial
Informatics 14.4 (2017): 1606-1615.
[8] Buzau, Madalina-Mihaela, Javier Tejedor-Aguilera, Pedro Cruz-Romero,
and Antonio Gomez-Exposito. “Hybrid deep neural networks for detection
of non-technical losses in electricity smart meters.” IEEE Transactions on
Power Systems 35.2 (2019): 1254-1263.
[9] Buzau, Madalina Mihaela, Javier Tejedor-Aguilera, Pedro Cruz-Romero,
and Antonio Gomez-Exposito. “Detection of non-technical losses using
smart meter data and supervised learning.” IEEE Transactions on Smart
Grid 10.3 (2020): 2661-2670.
[10] Hasan, Md, Rafia Nishat Toma, Abdullah-Al Nahid, M. M. Islam, and
Jong-Myon Kim. “Electricity theft detection in smart grid systems: A
CNN-LSTM based approach.” Energies 12.17 (2019): 3310.
[11] Avila, Nelson Fabian, Gerardo Figueroa, and Chia-Chi Chu. “NTL detec-
tion in electric distribution systems using the maximal overlap discrete
wavelet-packet transform and random undersampling boosting.” IEEE
Transactions on Power Systems 33.6 (2018): 7171-7180.
[12] Aslam, Sheraz, Nadeem Javaid, Farman Ali Khan, Atif Alamri, Ahmad
Almogren, and Wadood Abdul. “Towards efficient energy management
and power trading in a residential area via integrating a grid-connected
microgrid.” Sustainability 10, no. 4 (2018): 1245.
[13] Iqbal, Zafar, Nadeem Javaid, Saleem Iqbal, Sheraz Aslam, Zahoor Ali
Khan, Wadood Abdul, Ahmad Almogren, and Atif Alamri. “A domestic
microgrid with optimized home energy management system.” Energies 11,
no. 4 (2018): 1002.
[14] Ramos, Caio C. O., Douglas Rodrigues, Andre N. de Souza, and Joao P.
Papa. “On the study of commercial losses in Brazil: a binary black hole
algorithm for theft characterization.” IEEE Transactions on Smart Grid 9.2
(2016): 676-683.
[15] Li, Bo, Kele Xu, Xiaoyan Cui, Yiheng Wang, Xinbo Ai, and Yanbo
Wang. “Multi-scale DenseNet-based electricity theft detection.” Interna-
tional Conference on Intelligent Computing. Springer, Cham, 2018.
[16] Li, Shuan, Yinghua Han, Xu Yao, Song Yingchen, Jinkuan Wang, and
Qiang Zhao. “Electricity theft detection in power grids with deep learning
and random forests.” Journal of Electrical and Computer Engineering 2019
(2019).
[17] Ghori, Khawaja Moyeezullah, Rabeeh Ayaz Abbasi, Muhammad Awais,
Muhammad Imran, Ata Ullah, and Laszlo Szathmary. “Performance anal-
ysis of different types of machine learning classifiers for non-technical loss
detection.” IEEE Access 8 (2019): 16033-16048.
[18] Kong, Xiangyu, Xin Zhao, Chao Liu, Qiushuo Li, DeLong Dong, and Ye
Li. “Electricity theft detection in low-voltage stations based on similarity
measure and DT-KSVM.” International Journal of Electrical Power &
Energy Systems 125 (2021): 106544.
[19] Aslam, Zeeshan, Fahad Ahmed, Ahmad Almogren, Muhammad Shafiq,
Mansour Zuair, and Nadeem Javaid. "An attention guided semi-supervised
learning mechanism to detect electricity frauds in the distribution sys-
tems." IEEE Access 8 (2020): 221767-221782.
[20] Coma-Puig, Bernat, and Josep Carmona. “Bridging the gap between en-
ergy consumption and distribution through non-technical loss detection.”
Energies 12.9 (2019): 1748.
[21] Hu, Tianyu, Qinglai Guo, Hongbin Sun, Tian-En Huang, and Jian Lan.
“Nontechnical losses detection through coordinated biwgan and svdd.”
IEEE Transactions on Neural Networks and Learning Systems (2020).
[22] Huang, Yifan, and Qifeng Xu. “Electricity theft detection based on stacked
sparse denoising autoencoder.” International Journal of Electrical Power &
Energy Systems 125 (2021): 106448.
[23] Javaid, Nadeem, Aqdas Naz, Rabiya Khalid, Ahmad Almogren, Muham-
mad Shafiq, and Adia Khalid. “ELS-Net: A New Approach to Forecast
Decomposed Intrinsic Mode Functions of Electricity Load.” IEEE Access
8 (2020): 198935-198949.
[24] Ding, Nan, HaoXuan Ma, Huanbo Gao, YanHua Ma, and GuoZhen Tan.
“Real-time anomaly detection based on long short-Term memory and
Gaussian Mixture Model.” Computers & Electrical Engineering 79 (2019):
106458.
[25] Gunturi, Sravan Kumar, and Dipu Sarkar. “Ensemble machine learning
models for the detection of energy theft.” Electric Power Systems Research
192 (2021): 106904.
[26] Taft, Laritza M., R. Scott Evans, Chi-Ren Shyu, Marlene J. Egger, N.
Chawla, Joyce A. Mitchell, Sidney N. Thornton, B. Bray, and M. Varner.
"Countering imbalanced datasets to improve adverse drug event predictive
models in labor and delivery." Journal of biomedical informatics 42, no. 2
(2009): 356-364.
[27] Bhat, Rajendra Rana, Rodrigo Daniel Trevizan, Rahul Sengupta, Xiaolin
Li, and Arturo Bretas. “Identifying nontechnical power loss via spatial
and temporal deep learning.” 2016 15th IEEE International Conference
on Machine Learning and Applications (ICMLA). IEEE, 2016.
[28] Saeed, Muhammad Salman, Mohd Wazir Mustafa, Usman Ullah Sheikh,
Touqeer Ahmed Jumani, and Nayyar Hussain Mirjat. “Ensemble bagged
tree based classification for reducing non-technical losses in multan elec-
tric power company of Pakistan.” Electronics 8.8 (2019): 860.
[29] Mao, Xudong, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and
Stephen Paul Smolley. “Least squares generative adversarial networks.” In
Proceedings of the IEEE international conference on computer vision, pp.
2794-2802. 2017.
[30] Huang, Chiou-Jye, Yamin Shen, Yung-Hsiang Chen, and Hsin-Chuan
Chen. “A novel hybrid deep neural network model for short term electricity
price forecasting.” International Journal of Energy Research 45.2 (2021):
2511-2532.
[31] Yu, Jingxin, Xin Zhang, Linlin Xu, Jing Dong, and Lili Zhangzhong. “A
hybrid CNN-GRU model for predicting soil moisture in maize root zone.”
Agricultural Water Management 245 (2021): 106649.
[32] Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio.
“Empirical evaluation of gated recurrent neural networks on sequence
modeling.” arXiv preprint arXiv:1412.3555 (2014).
[33] Sun, Chen, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta.
“Revisiting unreasonable effectiveness of data in deep learning era.” In
Proceedings of the IEEE international conference on computer vision, pp.
843-852. 2017.
FAISAL SHEHZAD received the B.S. degree in software
engineering from Government College University Faisalabad
(GCUF), Faisalabad, Pakistan, in 2018. He is currently
pursuing the M.S. degree in computer science with the
Communications over Sensors (ComSens) Research Laboratory,
Department of Computer Science, COMSATS University
Islamabad, Islamabad, Pakistan, under the supervision of
Dr. Nadeem Javaid. He has 5 research publications in well
reputed international journals and conferences. His research
interests include data science, smart grids, blockchain and
financial markets.
NADEEM JAVAID (S'08, M'11, SM'16) received the bachelor's
degree in computer science from Gomal University, Dera
Ismail Khan, KPK, Pakistan, in 1995, the master's degree in
electronics from Quaid-i-Azam University, Islamabad, Pakistan,
in 1999, and the Ph.D. degree in computer science from the
University of Paris-Est, France, in 2010. He is currently an
Associate Professor and the Founding Director of the
Communications over Sensors (ComSens) Research Laboratory,
Department of Computer Science, COMSATS University
Islamabad, Islamabad Campus. He has supervised 126 master's
and 20 Ph.D. theses. He has authored over 900 articles in
technical journals and international conferences. His research
interests include energy optimization in smart grids and in
wireless sensor networks using data analytics and blockchain.
He was a recipient of the Best University Teacher Award from
the Higher Education Commission of Pakistan in 2016 and the
Research Productivity Award from the Pakistan Council for
Science and Technology in 2017. He is also an Associate Editor
of IEEE Access, an Editor of the International Journal of
Space-Based and Situated Computing and an Editor of
Sustainable Cities and Society.
AHMAD ALMOGREN (SM) received the Ph.D.
degree in computer science from Southern
Methodist University, Dallas, TX, USA, in 2002.
He is currently a Professor with the Computer
Science Department, College of Computer and
Information Sciences (CCIS), King Saud Univer-
sity (KSU), Riyadh, Saudi Arabia, where he is
currently the Director of the Cyber Security Chair,
CCIS. Previously, he worked as the Vice Dean of
the Development and Quality at CCIS. He also
served as the Dean for the College of Computer and Information Sciences
and the Head of the Academic Accreditation Council, Al Yamamah Univer-
sity. He served as the General Chair for the IEEE Smart World Symposium
and a Technical Program Committee member of numerous international
conferences/workshops, such as IEEE CCNC, ACM BodyNets, and IEEE
HPCC. His research interests include mobile-pervasive
computing and cyber security.
ABRAR AHMED was born in Pakistan in 1985. He received
the B.S. degree in computer engineering from the COMSATS
Institute of Information Technology, Abbottabad, Pakistan, in
2006, the M.S. degree from Lancaster University, U.K., in
2008, and the Ph.D. degree in electrical engineering from the
COMSATS Institute of Information Technology, Islamabad,
in 2017. Since 2006, he has been associated with the COMSATS
Institute of Information Technology, Islamabad, where he
currently holds the position of Assistant Professor. His research
interests include wireless channel modeling and
characterization, smart antenna systems, non-orthogonal
multiple access techniques, and adaptive signal processing.
SARDAR MUHAMMAD GULFAM received the M.S. degree
in computer engineering from Tampere University of
Technology, Finland, in 2010 and the Ph.D. degree in electrical
engineering from the COMSATS Institute of Information
Technology, Islamabad, in 2017. He is working as a researcher
in wireless communications.
AYMAN RADWAN received the Ph.D. degree
from Queen’s University, Kingston, ON, Canada,
in 2009. He is a Senior Research Engineer (Inves-
tigador Auxiliar) with the Instituto de Telecomu-
nicações, University of Aveiro, Aveiro, Portugal.
He is mainly specialized in coordination and man-
agement of EU funded projects. He participated
in the coordination of multiple EU projects. He is
currently the Project Coordinator of the CELTIC+
Project “MUSCLES,” as well as participating in
the coordination of ITN-SECRET. He has also been the Technical Manager
of the FP7-C2POWER Project and the Coordinator of the CELTIC+ “Green-
T” Project. His current research interests include the Internet of Things, 5G,
and green communications.