A Robust Hybrid Deep Learning Model
for Detection of Non-technical Losses to
Secure Smart Grids
FAISAL SHEHZAD1, NADEEM JAVAID1,*, (Senior Member, IEEE),
AHMAD ALMOGREN2, (Senior Member, IEEE), ABRAR AHMED3,
SARDAR MUHAMMAD GULFAM3, AYMAN RADWAN4
1Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
2Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11633, Saudi Arabia
3Department of Electrical and Computer Engineering, COMSATS University Islamabad, Islamabad 44000, Pakistan
4Instituto de Telecomunicacoes and Universidade de Aveiro, Aveiro, Portugal
*Corresponding authors: Nadeem Javaid. Email: nadeemjavaidqau@gmail.com and Ahmad Almogren. Email: ahalmogren@ksu.edu.sa
ABSTRACT To deal with electricity theft detection in smart grids, this article introduces a hybrid deep learning model. The model tackles various issues of existing models, such as the class imbalance problem, the curse of dimensionality and a low theft detection rate. It integrates the benefits of both GoogLeNet and the gated recurrent unit (GRU). The one dimensional (1D) electricity consumption (EC) data is fed into the GRU to remember the periodic patterns of electricity consumption, whereas the GoogLeNet model is leveraged to extract the latent features from the two dimensional (2D) weekly stacked EC data. Furthermore, the time least square generative adversarial network (TLSGAN) is proposed to solve the class imbalance problem. The TLSGAN uses unsupervised and supervised loss functions to generate fake theft samples that have a high resemblance with real world theft samples. The standard generative adversarial network only updates the weights of those points that lie on the wrong side of the decision boundary, whereas the TLSGAN also modifies the weights of points that lie on the correct side of the decision boundary, which prevents the model from the vanishing gradient problem. Moreover, dropout and batch normalization layers are utilized to enhance the model's convergence speed and generalization ability. The proposed model is compared with different state-of-the-art classifiers, including multilayer perceptron (MLP), support vector machine, naive bayes, logistic regression, MLP-long short term memory network and wide and deep convolutional neural network. It outperforms all classifiers by achieving 96% and 97% precision-recall area under the curve and receiver operating characteristics area under the curve, respectively.
INDEX TERMS Electricity theft detection, gated recurrent unit, GoogLeNet, non-technical losses, smart
grids, SGCC.
I. INTRODUCTION
Two types of losses occur during the generation, transmission and distribution of electricity, namely technical losses (TLs) and non-technical losses (NTLs). The former occur due to the dissipation of energy in distribution lines, transformers and other electric equipment, whereas the latter are caused by meter tampering, direct hooking to transmission lines, billing errors, faulty meters, etc. These losses not only affect the performance of electricity generation companies, but also damage their physical components. Moreover, a recent report shows that NTLs cause $96 billion of revenue loss every year [1]. According to the World Bank's report, India, China and Brazil bear 25%, 6% and 16% loss on their total electric supply, respectively. The NTLs are not limited to developing countries; it is estimated that developed countries like the UK and the US also lose 232 million and 6 billion US dollars per annum, respectively [2].
Electricity theft is a primary cause of NTLs. The evolution
of advanced metering infrastructure (AMI) promises to over-
come electricity theft through monitoring users’ consumption
history. However, it introduces new types of cyber-attacks,
which are difficult to detect using conventional methods.
TABLE 1: List of abbreviations
Abbreviation Full form
ADASYN Adaptive synthetic sampling approach
AMI Advanced metering infrastructure
CNN Convolutional neural network
CPBETD Consumption pattern based electricity theft detector
Catboost Categorical boosting
D Discriminator
DR Detection rate
EC Electricity consumption
ETD Electricity theft detection
FPR False positive rate
GRU Gated recurrent unit
G Generator
KNN k-nearest neighbors
LSTM Long short term memory
LR Logistic regression
LSGAN Least square generative adversarial network
LightGBM Light gradient boosting machine
ML Machine learning
SVM Support vector machine
SGCC Smart grid corporation of China
SMOTE Synthetic minority over-sampling technique
SMOTE_ENN SMOTE and edited nearest neighbors
MLP Multilayer perceptron
NTLs Non-technical losses
NB Naive bayes
NaN Not a number
TLSGAN Time LSGAN
TPR True positive rate
TSR Three sigma rule
SSDAE Stacked sparse denoising autoencoder
RUS Random undersampling
RF Random forest
ROS Random oversampling
RNN Recurrent neural network
RDBN Real-valued deep belief network
PR-AUC Precision recall-area under curve
PCA Principal component analysis
ROC-AUC Receiver operating characteristic - area under curve
TLs Technical losses
WDCNN Wide and deep convolutional neural network
XGBoost eXtreme gradient boosting
Symbol Description
a Label of theft sample
b Label of fake sample
b_{HG2} Bias of hybrid layer
c Distance variable
Dense_{GoogLeNet} Last layer of GoogLeNet
Dense_{GRU} Last layer of GRU
E Expected value of all instances
h_t Hidden state at timestamp t
h_{HG2} Hidden layer of hybrid module
ĥ Candidate value
P_{data}(x) Theft data
P_g(z) Gaussian distribution
r Reset gate
w_i ith week EC
w_m mth week EC
W_r Weight of reset gate
W_z Weight of update gate
W_{HG2} Weight of hybrid layer
x_i Complete consumption history of consumer i
x_{i,j} Daily EC of a consumer i over time period j (a day)
x_{i,j-1} EC of the previous day
x_{i,j+1} EC of the next day
x̄_i Average consumption of consumer i
σ(x_i) Standard deviation of consumer i
min(x_i) Minimum value of consumer i
Y_{NTL} Output of having NTLs or not
z Update gate
Whereas, traditional meters are only compromised through physical tampering. In AMI, the meter readings are tampered locally and remotely over the communication links
before sending them to an electric utility [3]. There are three types of approaches to address the NTLs in AMI: state-based, game theory based and data-driven. State-based approaches exploit wireless sensors and radio frequency identification tags to detect NTLs. However, these approaches require high installation, maintenance and training costs, and they also perform poorly in extreme weather conditions [4], [5]. Besides this, game theory based approaches hold a game between a power utility and consumers to achieve an equilibrium state and then extract hidden patterns from users' EC history. However, it is difficult to design a suitable utility function for utilities, regulators, distributors and energy thieves to achieve an equilibrium state within the defined time [6]. Moreover, both NTL detection approaches have a low detection rate (DR) and a high false positive rate (FPR).
The data driven methods get high attention due to the availability of electricity consumption (EC) data that is collected through AMI. A normal consumer's EC follows a statistical pattern, whereas abnormal EC (the terms theft and abnormal are used interchangeably) does not follow any pattern. The machine learning (ML) and data mining techniques are trained on the collected data to learn normal (the terms benign and normal are used interchangeably) and abnormal consumption patterns. After training, the model is deployed in a smart grid to classify an incoming consumer's data into normal or abnormal samples. Since these techniques use already available data and do not require deploying hardware devices at consumers' sites, their installation and maintenance costs are low as compared to hardware based methods. However, the class imbalance problem is a serious issue for data driven methods, where the number of normal EC samples is much larger than the number of theft samples. Normal data is easily collected through users' consumption history.
Whereas, theft cases are relatively rare in the real world, which is why only a few theft samples are present in users' consumption histories. The lack of theft samples affects the performance of classification models. The ML models become biased towards the majority class and ignore the minority class, which increases the FPR [7], [8]. In the literature, the authors mostly use random undersampling (RUS) and random oversampling (ROS) techniques to handle the class imbalance problem. However, both techniques have underfitting and overfitting issues that increase the FPR and minimize the DR [3], [9], [10], [11]. The second challenging issue is the curse of dimensionality. A time series dataset contains a large number of timestamps (features) that increase both execution
time and memory complexity and reduce the generalization ability of ML methods. Traditional ML methods have a low DR and an overfitting issue due to the curse of dimensionality. They require domain knowledge to extract prominent features, which is a time consuming task [2], [3]. Moreover, metaheuristic techniques are proposed by understanding the working mechanisms of nature. In the literature, these techniques are mostly utilized for optimization and feature selection purposes [12], [13], [14].
In this article, the time series least square generative adversarial network (TLSGAN) is proposed, which is specifically designed to handle the data imbalance problem of time series datasets. It utilizes supervised and unsupervised loss functions and gated recurrent unit (GRU) layers to generate fake theft samples that have a high resemblance with real world theft samples. Whereas, the standard GAN uses only an unsupervised loss function to generate fake theft samples, which have a low resemblance with real world theft samples. Moreover, an HG2 model is proposed, which is a hybrid of GoogLeNet and GRU. It is a challenging task to capture long-term periodicity from a one dimensional (1D) time series dataset. Deep learning models have a better ability to memorize sequence patterns as compared to traditional ML models. The 1D data is fed into the GRU to capture temporally correlated patterns from users' consumption history. Whereas, the weekly consumption data is passed to GoogLeNet to capture local features from the sequence data using the inception modules. Each inception module contains multiple convolutional and max-pooling layers that extract high level features from the time series data and overcome the curse of dimensionality issue. Moreover, non-malicious factors like a change in the number of persons in a house, extreme weather conditions, weekends, a big party in a house, etc., affect the performance of ML methods. The GRU is used to handle non-malicious factors because it has memory modules. These memory modules help the GRU to learn sudden changes in consumption patterns and memorize them, which decreases the FPR. Moreover, dropout and batch normalization layers are used to enhance the convergence speed, improve the model's generalization ability and increase the DR. The main contributions of this research article are given below:
• A state-of-the-art methodology is proposed that is based on GRU and GoogLeNet. The automatic feature learning mechanism of both models increases the convergence speed and accuracy, and handles the curse of dimensionality. Moreover, this study integrates the benefits of both 1D and 2D EC data in a parallel manner.
• The TLSGAN is proposed to generate fake samples from existing theft patterns to tackle the class imbalance ratio.
• The GRU model is utilized to handle non-malicious factors like sudden changes in EC patterns due to an increase in family members, changes in weather conditions, etc.
• Extensive experiments are conducted on a realistic EC dataset that is provided by the smart grid corporation of China (SGCC), the largest smart grid company in China. Different performance indicators are utilized to evaluate the performance of the proposed model.
The remainder of the paper is organized as follows. Sections II and III describe the related work and the problem statement, respectively. Section IV illustrates the data preprocessing steps, while Section V presents the working mechanism of TLSGAN for solving the class imbalance problem. The description of the proposed model and the experimental analysis are presented in Sections V-B and VI, respectively. Finally, the research article is concluded in Section VII.
II. RELATED WORK
In this Section, we discuss the limitations of the existing literature. In [3], the authors extend the existing consumption pattern-based electricity theft detector (CPBETD) that is
pattern-based electricity theft detector (CPBETD) that is
based on support vector machine (SVM) to detect the ab-
normal patterns from EC data. However, the authors do not
use any feature engineering technique to extract or select
the prominent features from high dimensional time series
dataset. The high dimensionality of data creates time com-
plexity, storage and FPR issues. In [7], [10], [15], [16], [17],
feature selection is an important part of data-driven tech-
niques where significant features are selected from existing
ones. During feature selection process, less domain knowl-
edge increases FPR and decreases classification accuracy. In
[9], previous studies use only an EC dataset to train ML clas-
sifiers and predict abnormal patterns. They do not use smart
meter data and auxiliary data (geographical information, me-
ter inside or outside, etc.) to predict abnormal patterns from
electricity data. In [18], [19], there are various consumption
behaviours of different users. The consumption behaviour
of each customer gives different results. So, it is necessary
to select those features, which give best results. However,
consumption behaviours are closely related and significant
correlation exists between these features. The authors remove
highly correlated and overlapped features, which helps to
improve DR and decrease FPR. In [11], [20], the authors give
possibilities of implementing ML classifiers for detection
of NTLs and describe the advantage of selecting optimal
features and their impacts on classifier performance. One of
the main challenges [21] that limited the classification ability
of existing methods is high dimensionality of data. In [9],
the authors generate new features from the smart meter and
auxiliary data. These features are based on z-score, electrical
magnitude, users’ consumption patterns through clustering
technique, smart meter alarm system, geographical location
and smart meter’s placement. In [22], features are selected
from existing features based on clustering evaluation criteria.
In [8], the authors propose a new deep learning model, which
has ability to learn and extract latent features from EC data.
In [14], the authors use the black hole algorithm to select
the optimal number of features and compare the results with
particle swarm optimization, differential evolution, genetic
algorithm and harmony search. In [20], the authors perform
work on feature engineering and identify different features
like electricity contract, geographical location, weather con-
dition, etc. In [16], conventional methods are applied on data
to tackle the curse of dimensionality issue. This process is
very tedious and time-consuming.
In [17], one of the main contributions is to find the optimal number of features. It is observed that not all
features equally contribute to prediction results. In [15], the
authors use Dense-Net based convolutional neural network
(CNN) to analyse periodicity in EC data. The convolutional
layers can capture the long-term and short-term sequences
from weekly and monthly EC patterns. In [11], maximal
overlap discrete wavelet packet transform is leveraged to
extract the optimal features. In [21], the authors implement a
bidirectional Wasserstein GAN to extract the optimal features
from time series data. In [9], the authors pass a combina-
tion of newly created features in different conventional ML
classifiers and compare their results. In [18], the authors
perform comparison between a number of selected features
and classification accuracy. In [8], [23], the authors measure
precision and recall score of long short term memory (LSTM)
classifier on test data. The hybrid of multilayer perceptron
(MLP) and LSTM outperform the single LSTM in terms of
PR curve because MLP adds additional information to the
network like meter location, contractual data and technical
information.
In [20], the identified features are passed to gradient
boosting classifiers to classify between normal and abnormal
samples. In [2], [9], [24], the authors do not use any feature
engineering technique to extract or select the optimal features
from high dimensional time series dataset. The high dimen-
sionality of data creates time complexity, storage issues and
affects the model generalization ability. In [18], the authors
form a feature library where they select a subset of features
from existing features using clustering evaluation criteria.
However, they do not compare the adopted feature selection
strategy with other feature selection strategies. In [2], [3],
[10], [18], [25] , data imbalance is a major issue for training
of ML classifiers. Benign samples are easily collected by
getting the history of any consumers. Whereas, theft cases
rarely happen in the real world. So, lack of theft samples
limit classification accuracy and increase FPR. Generally,
RUS and ROS techniques are utilized to solve the data imbalance problem. In [26], Chawla et al. propose the synthetic minority oversampling technique (SMOTE) to create artificial samples of the minority class. It has many advanced versions
like Random-SMOTE, Kmeans-SMOTE, etc. However, these
sampling techniques do not represent the overall distribution
of data, which affects the model performance. In [2], the
authors introduce six theft cases to generate malicious sam-
ples using benign samples. They argue that goal of theft is
to report less consumption than actual consumption or shift
load toward low tariff periods. After generating malicious
samples, the authors exploit ROS technique to solve class
imbalance problem.
In [10], the authors use six theft cases that are introduced
by [2] to generate malicious samples and SMOTE is lever-
aged to handle uneven distribution of samples. In [25], the
authors use SMOTE and near miss technique to tackle class
imbalance ratio. After balancing the dataset, the authors
perform comparison between bagging and boosting ensemble
techniques. However, both techniques give better results on
SMOTE rather than near miss. In [2], the authors argue that
goal of theft is to report less consumption or shift load from
high tariff periods to low tariff periods. So, it is possible
to generate malicious samples from benign ones. In [18],
the authors use 1D-Wasserstein GAN to generate duplicated
copies of the minority class. In [19], the authors use the adaptive synthetic (ADASYN) sampling approach to tackle the class imbalance ratio and perform a comparison between different ML
and deep learning techniques. In [3], [10], SMOTE technique
is used to tackle the class imbalance ratio. In [2], authors
use ROS technique to handle the class imbalance ratio. It
replicates existing samples of minority class, which create an
overfitting problem. Moreover, they introduce six theft cases
to generate malicious samples to balance ratio between theft
and normal samples. However, cases 1 and 2 do not have
resemblance with real theft cases. In [7], [8], [15], [17], [20], [21], [27], [28], the authors do not tackle the above mentioned problem. One severe issue in ETD is the class imbalance ratio, where one class (honest consumers) is dominant over the other class (theft consumers). In [25], the authors use SMOTE and
near miss method to handle class imbalance problem. In [9],
[11], the authors do not tackle class imbalance problem. The
ML classifiers become biased toward majority class, ignore
the minority class and generate false alarms due to uneven
distribution of samples. A utility cannot bear false alarm
because it has low budget for on site inspection.
III. PROBLEM STATEMENT
In [2], the authors propose a CPBETD to identify normal
and abnormal EC patterns. However, the CPBETD does not
use any feature engineering technique to solve the curse of
dimensionality issue. This issue refers to a set of problems that occur due to the high dimensionality of a dataset. A dataset that contains a large number of features, generally in the order of hundreds or more, is known as a high dimensional dataset.
A time series dataset has high dimensionality that increases
time complexity, reduces DR and affects the generalization of
a classifier. In [7], [8], the authors solve the curse of dimen-
sionality issue by selecting the prominent features through
deep learning and meta-heuristic techniques. However, the
authors do not address class imbalance problem, which is a
major issue in NTLs detection. In [3], [25], the authors use
SMOTE to handle class imbalance ratio. However, SMOTE
creates an overfitting problem. It does not perform well on
time series data. In [9], the authors use RUS technique to
handle class imbalance ratio. However, this approach dis-
cards the useful information from data, which creates an
underfitting issue.
IV. DATA PREPROCESSING
Data preprocessing is an important part of data science where
the quality of data is improved by applying different tech-
niques that directly enhance the performance of ML methods.
TABLE 2: Dataset information
Time window Jan. 1, 2014 to Oct. 31, 2016
Total consumers 42372
Normal consumers 38757
Electricity thieves 3615
In this Section, the data preprocessing techniques used in this
paper are discussed in detail.
A. ACQUIRING THE DATASET
SGCC dataset is used in this study to evaluate the perfor-
mance of the proposed model. It contains consumers’ IDs,
daily EC and labels of either 0 or 1. It comprises the EC data of 42,372 consumers, out of which 91.46% are normal and the remaining are thieves. Each consumer is labeled as either 0 or 1, where 0 represents a normal consumer and 1 represents an electricity thief. These labels are assigned by SGCC after performing on-site inspections. The dataset is in a tabular form: each row represents the complete record of one consumer, while the columns represent the daily EC of all consumers. The meta information about the dataset is given in Table 2.
B. HANDLING THE MISSING VALUES
EC datasets often contain missing or erroneous values, which
are represented as not a number (NaN). These values occur due to many reasons: failure of a smart meter, a fault in distribution lines, unscheduled maintenance of a system, a data storage problem, etc. Training data with missing values has a
negative impact on the performance of ML methods. One
way to handle the missing values is to remove the consumers’
records that have missing values. However, this approach
may remove valuable information from data. In this study,
we use a linear imputation method to recover missing values
[3].
f(x_i) =
\begin{cases}
\dfrac{x_{i,j-1} + x_{i,j+1}}{2}, & \text{if } x_{i,j} = \text{NaN} \text{ and } x_{i,j\pm1} \neq \text{NaN},\\
0, & \text{if } x_{i,j-1} = \text{NaN} \text{ or } x_{i,j+1} = \text{NaN},\\
x_{i,j}, & \text{if } x_{i,j} \neq \text{NaN}.
\end{cases}
\quad (1)
In Equation (1), x_{i,j} represents the daily EC of a consumer i over time period j (a day), x_{i,j-1} represents the EC of the previous day, and x_{i,j+1} represents the EC of the next day.
C. REMOVING THE OUTLIERS FROM DATASET
We have found some outliers in the EC dataset. One of
the most important steps of data preprocessing phase is to
detect and treat outliers. The supervised learning models are
sensitive to the statistical distribution of data. The outliers mislead the training process; as a result, the models take a longer time to train and generate false results. Motivated by [7], we use the three-sigma rule (TSR) to handle the outliers. The mathematical form of TSR is given in Equation (2).
f(x_i) =
\begin{cases}
\bar{x}_i + 3 \times \sigma(x_i), & \text{if } x_{i,j} > \bar{x}_i + 3 \times \sigma(x_i),\\
x_{i,j}, & \text{otherwise}.
\end{cases}
\quad (2)
Algorithm 1: Data preprocessing steps
Data: EC dataset: X
1: X = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}
2: Variables: min_i = minimum value of consumer x_i, max_i = maximum value of consumer x_i, x̄_i = mean of consumer x_i, σ_i = standard deviation of consumer x_i, (row, col) = X.shape
3: for i = 1 to row do
4:   for j = 1 to col do
5:     Fill missing values:
6:     if x_{i,j-1} ≠ NaN and x_{i,j+1} ≠ NaN and x_{i,j} == NaN then
7:       x_{i,j} = (x_{i,j-1} + x_{i,j+1}) / 2
8:     end
9:     if x_{i,j-1} == NaN or x_{i,j+1} == NaN then
10:      x_{i,j} = 0
11:    end
12:    Remove outliers:
13:    if x_{i,j} > x̄_i + 3σ_i then
14:      x_{i,j} = x̄_i + 3σ_i
15:    end
16:    Min-max normalization:
17:    x_{i,j} = (x_{i,j} − min_i) / (max_i − min_i)
18:  end
19: end
Result: X_normalized = X
x_i represents the complete EC history of consumer i, x̄_i denotes the average EC and σ(x_i) represents the standard deviation of consumer i.
D. NORMALIZATION
After handling the missing values and outliers, we apply the
min-max technique to normalize the dataset because all deep
learning models are sensitive to the diversity of data [7]. The
experimental results show that deep learning models give
good results on normalized data. The mathematical form of
the min-max technique is given in Equation (3).
x_{i,j} = \frac{x_{i,j} - \min(x_i)}{\max(x_i) - \min(x_i)} \quad (3)
The min(x_i) and max(x_i) represent the minimum and maximum values of EC of consumer i, respectively. All data preprocessing steps are shown in Algorithm 1. In lines 1 and 2, the dataset is acquired from an electric utility and the variables are initialized. In lines 3 to 19, the following steps are performed: fill missing values, handle outliers and apply the min-max normalization technique. Finally, we obtain a normalized dataset.
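For reference, the steps of Algorithm 1 can be written compactly with pandas. The following is a minimal sketch, assuming one consumer per row and one day per column; the function name and the exact pandas calls are illustrative, not the original implementation.

import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of Algorithm 1: impute NaNs, clip outliers (TSR) and min-max
    normalize. df holds one consumer per row and one day per column."""
    # Equation (1): replace a missing day by the mean of its neighbouring days,
    # falling back to 0 when a neighbour is also missing (the zero fallback is
    # applied here only to cells that are themselves missing).
    neighbours = (df.shift(-1, axis=1) + df.shift(1, axis=1)) / 2
    df = df.where(~df.isna(), neighbours).fillna(0)

    # Equation (2): three-sigma rule, clip values above mean + 3*std per consumer.
    upper = df.mean(axis=1) + 3 * df.std(axis=1)
    df = df.clip(upper=upper, axis=0)

    # Equation (3): per-consumer min-max normalization.
    lo, hi = df.min(axis=1), df.max(axis=1)
    return df.sub(lo, axis=0).div((hi - lo).replace(0, 1), axis=0)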
E. EXPLORATORY DATASET ANALYSIS
Electricity theft is a criminal behaviour, which is done by
tampering or bypassing smart meters, hacking smart meters
through cyber attacks and manipulating meter readings us-
ing physical components or over the communication links.
FIGURE 1: Statistical analysis between normal and abnormal EC. (a) Monthly EC (kWh) of a normal consumer; (b) weekly EC of a normal consumer (weeks 1 to 4, days 1 to 5); (c) monthly EC (kWh) of an electricity thief; (d) weekly EC of an electricity thief; (e) Pearson correlation matrix between the weekly EC of a normal consumer; (f) Pearson correlation matrix between the weekly EC of an electricity thief.
Since EC data contains normal and abnormal patterns, data driven approaches receive high attention from the research community to differentiate between benign and thief consumers. We conduct a preliminary analysis on EC data
through statistical techniques to check existence of period-
icity and non-periodicity in consumers’ EC patterns. Meta
information about dataset is given in Section IV-A. Figure 1a
shows the EC pattern of a normal consumer during a month.
There are a lot of fluctuations in a monthly EC pattern. So,
it is difficult to find normal and abnormal patterns from 1D
time series data. Figure 1b shows the EC patterns of a normal consumer according to weeks. The EC decreases on days 3 and 5, whereas it increases on days 2 and 4. The 2nd week shows an abnormal pattern, which is different from the other weeks. We also conduct a similar type of analysis on
theft patterns. Figures 1c and 1d show EC during a month
and a week of an energy thief. There are a lot of fluctuations
in monthly measurements and no periodicity exists in weekly
EC patterns.
Moreover, the correlation analysis is conducted between the EC of thieves and normal consumers. Figure 1e shows the Pearson correlation values of a normal consumer, which are mostly above 0.3. This is an indication of a strong relationship between the weekly EC patterns of a normal consumer. Figure 1f shows the Pearson correlation values of an electricity thief, which indicate a poor correlation between the weekly EC data.

TABLE 3: Euclidean distance similarity measure
Consumers (w1, w2) (w1, w3) (w1, w4) Average
Normal 4.70 4.83 3.66 4.40
Theft 4.66 3.54 12.90 7.03
Hereinafter, we use the Euclidean distance similarity measure to examine how similar the weekly observations are to each other. The Euclidean distance is calculated for both normal and theft consumers. We compare the EC pattern of the last week of a month with the previous three weeks and then take the average of the differences to decide how much normal EC differs from abnormal EC. We observe that the Euclidean distance between normal EC patterns is low as compared to the abnormal ones. Similar findings are observed across the whole dataset. To avoid repetition, the exploratory data analysis is conducted on some observations, which are shown in Figure 1 and Table 3.
f(x) = \sqrt{(w_{i,j} - w_{m,j})^2 + \dots + (w_{i,j_n} - w_{m,j_n})^2}. \quad (4)

Equation (4) shows the Euclidean distance formula used to measure the similarity between weekly EC patterns, where w_i and w_m denote the ith and mth weeks and j indexes the EC of a specific week day (j ≤ 5).
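The comparison behind Equation (4) and Table 3 can be sketched as follows; the function name is illustrative and the week length is parameterized, since the weekly plots in Figure 1 use five days.

import numpy as np

def weekly_distances(record: np.ndarray, week_len: int = 7) -> np.ndarray:
    """Euclidean distance (Equation (4)) between the last full week of one
    consumer's EC record and each of the preceding weeks."""
    n_weeks = len(record) // week_len
    weeks = record[: n_weeks * week_len].reshape(n_weeks, week_len)
    return np.linalg.norm(weeks[:-1] - weeks[-1], axis=1)

# The per-consumer averages reported in Table 3 correspond to
# weekly_distances(record).mean(), computed separately for a normal
# consumer and for an electricity thief.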
After conducting statistical analysis on thieves and normal
consumers, we conclude that theft patterns have more fluc-
tuations (less periodic) than normal EC patterns. We believe that this type of pattern can also be observed in datasets collected from different regions and countries. However, it is challenging to capture long-term periodicity from
1D time series dataset because it consists of long sequential
patterns. The conventional statistical and ML models, such as
autoregressive integrated moving average, SVM and decision
tree are unable to retrieve these patterns. Based on the above
analysis, we pass 1D data to GRU model because it is spe-
cially designed to capture temporal patterns from time series
data. Whereas, 1D EC data is stacked according to weeks and
is fed into GoogLeNet to extract periodicity between weeks.
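Accordingly, the two parallel inputs can be prepared as sketched below: the 1D daily sequence for the GRU and the weekly stacked 2D array for GoogLeNet. The array shapes follow the dataset description (42,372 consumers and 1,034 days stacked into 147 weeks of 7 days, as in Algorithm 3); the helper name is an assumption.

import numpy as np

def make_model_inputs(X_daily: np.ndarray, week_len: int = 7):
    """X_daily: (consumers, days) EC matrix, e.g. (42372, 1034).
    Returns the 1D input for the GRU branch and the weekly 2D input
    for the GoogLeNet branch."""
    n_weeks = X_daily.shape[1] // week_len            # 1034 days -> 147 full weeks
    X_1d = X_daily[..., np.newaxis]                   # (consumers, days, 1)
    X_2d = X_daily[:, : n_weeks * week_len].reshape(
        X_daily.shape[0], n_weeks, week_len, 1)       # (consumers, 147, 7, 1)
    return X_1d, X_2d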
V. THE PROPOSED MODEL
The proposed system model contains the following steps:
• handling the class imbalance problem using TLSGAN,
• extracting prominent features utilizing GRU and GoogLeNet,
• classifying the theft and benign samples leveraging a fully connected neural network,
• handling the non-malicious factors using the memory units of GRU and
• enhancing the model's generalization ability with the help of dropout and batch normalization layers.
Each of the above mentioned steps is explained in the follow-
ing subsections.
A. HANDLING THE CLASS IMBALANCE PROBLEM
One of the critical problems in ETD is the class imbalance ratio, where one class (honest consumers) is dominant over the other class (electricity thieves). The EC data is not normally distributed and is skewed towards the majority class. When a ML model is applied to an imbalanced dataset, it becomes biased towards the majority class and does not learn important features of the minority class, which increases the FPR. Traditionally,
two sampling techniques such as ROS and RUS are used to
balance the dataset. However, these techniques have some
limitations: overfitting, information loss and duplication of
existing data. In this article, we propose TLSGAN to handle
class imbalance ratio because it is specially designed for
time series datasets by utilizing GRU layers. Its objective
function is based on the least-square method that computes
the difference between real and fake samples and generates
new samples, which have high closeness to real samples.
The collected electricity theft data belongs to the time se-
ries domain. So, GRU layers are exploited to design the
TLSGAN model. Using the least square function, the model
learns a small amount of both real theft data distribution and
generated fake samples. Finally, the generated samples are
concatenated with real samples and class imbalance problem
is solved. The overall working mechanism of TLSGAN is
explained below.
We select the existing theft data as training data. The theft samples are presented as P_data(x). A random noise or latent variable z is drawn from a Gaussian distribution P_g(z). A mapping relationship is established between P_g(z) and P_data(x) through the GAN model. The GAN model contains two deep learning models: a generator (G) and a discriminator (D). The former is responsible for learning regularities from the P_data(x) distribution and generating fake samples. It takes a random variable z as input from P_g(z) and produces G(z) as output. Its main goal is to fit P_g(z) onto P_data(x) to generate fake samples that highly resemble real theft samples and to confuse the D as many times as possible. The D is responsible for discriminating whether the input data is real or fake. It takes real theft samples and synthetic samples generated by G as input and produces an output of either 0 or 1, which indicates whether the generated samples are real or fake. The mathematical form of the min-max equation of the GAN network is given below [29].
\min_G \max_D V_{GAN}(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))], \quad (5)

where V_{GAN}(D, G) is the loss function of the GAN, \mathbb{E}_{x \sim p_{data}(x)} is the expected value over the theft distribution and \mathbb{E}_{z \sim p_z(z)} is the expected value over the latent distribution.
The standard GAN network is suitable for unsupervised
learning problems. It uses the binary cross-entropy function
to draw a decision boundary between real and fake samples. The limitation of binary cross-entropy is that it tells whether the generated sample is real or fake, but it does not tell how far the generated samples are from the decision boundary. This creates a vanishing gradient problem and stops the training process of the GAN model. In [29], the
authors propose a least square generative adversarial network
Algorithm 2: Training of TLSGAN
Data: X_normalized
1: Variables: separate theft and benign samples from X_normalized, Theft: T = {x_{i,j}, x_{i,j+1}, x_{i,j+2}, ..., x_{m,n}}, Normal: N = {y_{i,j}, y_{i,j+1}, y_{i,j+2}, ..., y_{p,n}}
2: while stopping condition is not met do
3:   t_i ← sample from theft distribution
4:   s_i ← sample from Gaussian distribution
5:   Update D: (1/t) Σ_{i=1}^{t} [ (1/2) E_{t∼p_data(t)}[(D(t_i) − b)^2] + (1/2) E_{s∼p_s(s)}[(D(s_i) − a)^2] ]
6:   Fix discriminator weights
7:   Z_i ← sample from latent space
8:   Update G: (1/n) Σ_{i=1}^{n} [ (1/2) E_{z∼p_z(z)}[(D(G(Z_i)) − c)^2] ]
9: end
10: a and b are the labels of theft and fake patterns
11: c is the distance with which G wants to deceive D
12: After training of G, fake theft patterns are generated
13: FakeSamples = G(z)
14: X_BalData = Concatenate(FakeSamples, N, T)
Result: Return balanced dataset: X_BalData
(LSGAN) architecture, which is an extension of the standard
GAN model. It uses the least square loss instead of binary
cross-entropy loss function. The LSGAN provides two benefits. First, the standard GAN only updates those samples that are on the wrong side of the decision boundary, whereas the LSGAN penalizes all the samples that are away from the decision
boundary, even if the samples reside at the correct side of the
boundary. During the penalization process, the parameters of
D and decision boundary are fixed. Now, G generates samples
that are closer to the decision boundary. Secondly, penaliz-
ing the samples near the decision boundary produces more
changes in gradients, which solves the vanishing gradient
problem. The min-max objective function of LSGAN is given
in Equation (6) [29].

\max_D V_{LSGAN}(D) = \frac{1}{2}\mathbb{E}_{x \sim p_{data}(x)}[(D(x) - b)^2] + \frac{1}{2}\mathbb{E}_{z \sim p_z(z)}[(D(G(z)) - a)^2],

\min_G V_{LSGAN}(G) = \frac{1}{2}\mathbb{E}_{z \sim p_z(z)}[(D(G(z)) - c)^2], \quad (6)

where V_{LSGAN}(D) and V_{LSGAN}(G) are the loss functions of the LSGAN, a and b are the labels of the real (theft data) and fake samples, and c is the value of the distance between both samples. The G needs to minimize this value in order to deceive D. The
LSGAN is designed for generating fake images using con-
volutional layers. We change the internal architecture and
use GRU layers instead of convolutional layers because we
are working on a problem that belongs to sequential data.
The training process of TLSGAN is presented in Algorithm 2. We pass the X_normalized data, which is obtained from Algorithm 1, to Algorithm 2. In the first step, the variables are initialized. In steps 2 to 9, TLSGAN is trained on theft samples to generate fake theft patterns. In steps 10 to 14, data is generated from the latent distribution and passed to G to produce fake theft samples. At the end, we concatenate the fake samples generated by G, the original theft samples and the normal samples, and return a balanced dataset X_BalData.
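A minimal Keras sketch of this training scheme is given below. It follows the structure of Algorithm 2 (a GRU-based G and D trained with a least-squares, i.e. mean-squared-error, loss), but the layer sizes, the latent dimension and the label values a, b and c are illustrative assumptions rather than the exact configuration used here.

import numpy as np
from tensorflow.keras import layers, Model

SEQ_LEN, LATENT = 1034, 100          # sequence length and noise size (illustrative)

def build_generator():
    z = layers.Input((LATENT,))
    h = layers.RepeatVector(SEQ_LEN)(z)                 # spread noise along the time axis
    h = layers.GRU(64, return_sequences=True)(h)        # GRU layers instead of conv layers
    x = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(h)
    return Model(z, x)

def build_discriminator():
    x = layers.Input((SEQ_LEN, 1))
    out = layers.Dense(1, activation="linear")(layers.GRU(64)(x))  # linear last layer
    d = Model(x, out)
    d.compile(optimizer="adam", loss="mse")             # least-squares objective, Eq. (6)
    return d

G, D = build_generator(), build_discriminator()
D.trainable = False                                      # freeze D inside the combined model
z_in = layers.Input((LATENT,))
gan = Model(z_in, D(G(z_in)))
gan.compile(optimizer="adam", loss="mse")

def train_step(theft_batch, a=1.0, b=0.0, c=1.0):
    """One TLSGAN update (Algorithm 2): a and b are the targets for real and
    fake samples in D's loss, c is the target G pushes its samples towards."""
    m = len(theft_batch)
    z = np.random.normal(size=(m, LATENT))
    D.train_on_batch(theft_batch, a * np.ones((m, 1)))              # real theft samples
    D.train_on_batch(G.predict(z, verbose=0), b * np.ones((m, 1)))  # generated samples
    gan.train_on_batch(z, c * np.ones((m, 1)))                      # generator update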
B. ARCHITECTURE OF HYBRID MODEL
Time series data of EC has complex structure with high
random fluctuations because it is affected by various factors
like high load, weather conditions, big party in a house, etc.
Traditional models like SVM, MLP, etc., are not ideal to learn
complex patterns. The models have low DR and high FPR
due to curse of dimensionality issue. In literature, different
deep learning models are used to learn complex patterns from
time series data.
In this article, a hybrid model is proposed, which is a com-
bination of GoogLeNet and GRU. In [30], [31], the authors
prove that hybrid deep learning models perform better than
individual learners. The proposed model takes advantages of
both GoogLeNet and GRU by extracting and remembering
periodic features of EC dataset. The architecture of the pro-
posed model consists of three modules: GRU, GoogLeNet
and hybrid. We pass 1D data to the GRU module. Whereas,
2D weekly EC data is passed to the GoogLeNet module. The
hybrid module takes outputs of both modules, concatenates
them and gives the final result about whether there is an anomaly in the EC patterns. Hybrid deep learning models are very efficient because they allow joint training of both models. Figure
2 shows overall structure of the proposed model. In the
proposed system model, steps 1, 2 and 3 show data pre-
processing phase where we handle missing values, outliers
and normalize the dataset, respectively. In step 4, the class
imbalance problem is solved. In steps 5 and 6, prominent
features are extracted from 1D and 2D EC datasets using
GRU and GoogLeNet models, respectively. Finally, in step 7,
extracted features of GRU and GoogLeNet are concatenated
and passed to a fully connected neural network to classify
between normal and theft samples.
C. GATED RECURRENT UNIT
We observe that there are a lot of fluctuations in theft EC patterns as compared to those of normal consumers. So, the 1D data is fed into the GRU model to capture co-occurring dependencies in time series data. The GRU was proposed by Chung et al. in 2014 to capture related dependencies in time series data. It has memory modules to remember important periodic patterns, which help to handle sudden changes in EC patterns due to non-anomalous factors like changes in weather conditions, a big party in a house, weekends, etc. Moreover, it is introduced
to solve the vanishing gradient problem of recurrent neural
network (RNN). GRU and LSTM are considered as variants
of RNN. In [32], the authors compare the performance of
GRU and LSTM with RNN model on different sequential
datasets. Both models outperform the RNN and solve its
vanishing gradient problem. In [24], the authors from Google
conduct extensive experiments on 10,000 LSTM and RNN
architectures. Their final experimental results show that no
single model is found that performs better than GRU. Based
on the above analysis, we opt for GRU to extract optimal features from the EC dataset because it gives good results on sequential datasets. It has reset and update gates that control the flow of information inside the network. The update gate decides how much previous information should be preserved for future decisions, whereas the reset gate decides how much past information should be kept or discarded. The equations of the update and reset gates are similar to each other; however, the difference comes from the weights and the gates' usage. The equations of the GRU network are given below [8].
z_t = \sigma(W_z \cdot [h_{t-1}, x_t]), \quad (7)
r_t = \sigma(W_r \cdot [h_{t-1}, x_t]), \quad (8)
\hat{h}_t = \tanh(W \cdot [r_t \ast h_{t-1}, x_t]), \quad (9)
h_t = (1 - z_t) \ast h_{t-1} + z_t \ast \hat{h}_t. \quad (10)
Where t, z_t, σ, W_z and x_t represent the time step, update gate, sigmoid function, update gate weight and current input, respectively. h_{t-1}, ĥ_t and r_t are the previous hidden state, candidate value and reset gate, respectively. W_r is the reset gate weight, W is the weight of the candidate value and h_t is the hidden state. The last hidden layer of the GRU is presented as Dense_GRU.
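As a concrete illustration, the GRU branch can be sketched in Keras as follows. The GRU size (60) and dropout rate (0.4) follow Table 5; the width of the dense layer and the exact position of batch normalization are assumptions.

from tensorflow.keras import layers

def build_gru_branch(seq_len: int, gru_units: int = 60, dropout: float = 0.4):
    """1D daily EC sequence in, Dense_GRU feature vector out."""
    inp = layers.Input(shape=(seq_len, 1), name="ec_1d")
    h = layers.GRU(gru_units)(inp)        # implements the gates of Eqs. (7)-(10)
    h = layers.BatchNormalization()(h)
    h = layers.Dropout(dropout)(h)
    dense_gru = layers.Dense(50, activation="relu", name="Dense_GRU")(h)
    return inp, dense_gru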
D. GOOGLENET
It is difficult to capture long-term periodicity from 1D
EC data. However, periodicity can be captured if data is
aligned according to weeks as explained in Section IV-E.
The GoogLeNet is a deep learning model that is proposed
by researchers at Google in 2014. It is designed to increase
the accuracy and computational efficiency of the existing
models. Its architecture is similar to the existing CNN models
like LeNet-5 and AlexNet, etc. However, the core of the
model is its auxiliary classifiers and inception modules. Each inception module contains 1×1, 3×3, 5×5 and 7×7 convolutional filters that extract hidden or latent features from the EC data. After each inception module, the outputs of the convolutional and max pooling layers are concatenated and passed to the next inception module. The auxiliary classifiers calculate the training loss after the 4th and 7th inception modules and add it to the GoogLeNet network to prevent it from the vanishing gradient problem.
In [7], [31], the authors exploit a 2D-CNN model to extract abstract features from a time series dataset. Motivated
from these articles, the GoogLeNet is applied to extract latent
features from EC data. The latent features increase model’s
generalization ability. The 1D EC data is transformed into 2D
according to weeks and is fed as input to GoogLeNet model,
which has inception modules. Each inception module has
max pooling and multiple convolutional layers with different
filter sizes. In [7], the authors use a simple CNN model to extract local patterns from EC data. In a simple CNN model, multiple convolution windows of the same size move over the EC patterns and extract optimal features.
FIGURE 2: The proposed system model. Steps 1 to 3: data preprocessing module (handling missing values, outliers and normalization); step 4: data imbalance module; steps 5 and 6: GRU and GoogLeNet modules; step 7: hybrid module. Limitations: L1 class imbalance, L2 information loss due to RUS, L3 data duplication due to ROS, L4 overfitting due to SMOTE, L5 curse of dimensionality issue, L6 high FPR and overfitting issue. Solutions: S1 TimeGAN, S2 GoogLeNet and GRU, S3 dropout layers and batch normalization.
However, convolution windows of the same fixed size have a low ability to extract optimal features.
The GoogLeNet overcomes this problem through its inception modules, where different numbers of convolution and max pooling layers extract optimal features from the EC data. Moreover, GoogLeNet has less time and memory complexity as compared to the existing deep learning models. However, it is designed for computer vision tasks, which is why it has multiple inception modules to extract edges and interest points from images. For our problem, we change the architecture and use only one inception module that extracts periodicity and non-periodicity from weekly EC patterns. Finally, we use flatten and fully connected layers to attain the principal features that are extracted through the convolutional and max pooling layers. The last hidden layer of GoogLeNet is presented as Dense_GoogLeNet.
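A single inception module of the kind described above can be sketched in Keras as follows; the filter counts and pooling window are illustrative assumptions, while the parallel 1×1, 3×3, 5×5 and 7×7 convolutions and the concatenation mirror the description in the text.

from tensorflow.keras import layers

def inception_block(x, filters: int = 16):
    """Parallel convolutions of several sizes plus max pooling over the
    weekly 2D EC input, concatenated along the channel axis."""
    branches = [
        layers.Conv2D(filters, (k, k), padding="same", activation="relu")(x)
        for k in (1, 3, 5, 7)
    ]
    pool = layers.MaxPooling2D(pool_size=(3, 3), strides=(1, 1), padding="same")(x)
    branches.append(layers.Conv2D(filters, (1, 1), padding="same",
                                  activation="relu")(pool))
    return layers.Concatenate()(branches)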
E. HYBRID MODULE
GRU memorizes the periodic patterns from 1D data.
Whereas, GoogLeNet captures latent patterns from 2D data.
We combine the DenseGoogLeNet and DenseGRU to aggregate
latent and temporal patterns. The outcome of the model is
calculated through sigmoid activation function and training
loss is measured using binary cross entropy.
h_{HG2} = W_{HG2} \cdot [Dense_{GoogLeNet}, Dense_{GRU}] + b_{HG2}, \quad (11)

Y_{NTL} = \sigma(h_{HG2}). \quad (12)

Here, h_{HG2} is the hidden layer of the hybrid module, W_{HG2} is the weight of the hybrid layer, b_{HG2} is the bias of the hybrid layer, Y_{NTL} is the output and σ is the sigmoid function. We pass X_BalData, which is taken from Algorithm 2, to Algorithm 3. On lines 1 to 3, the variables are initialized. The 1D EC data is transformed into 2D format on lines 4 to 6. On lines 7 to 17, we pass the 1D data to the GRU to extract time-related patterns, whereas the 2D data is fed into GoogLeNet to retrieve the periodicity and non-periodicity from the weekly EC patterns. On lines 18 to 20, we concatenate the features of GRU and GoogLeNet and apply the sigmoid activation function, which classifies theft and normal EC patterns.
Algorithm 3: Training of HG2
Data: EC dataset: X_BalData
1: Data in 1D format
2: X_1D = {x_{i,j}, x_{i,j+1}, x_{i,j+2}, ..., x_{m,n}}
3: m = 42372, n = 1034
4: Convert data into 2D format
5: Z = [x_{1,1} ... x_{1,k}; ...; x_{j,1} ... x_{m,k}]
6: j = 147, k = 7
7: Pass X_1D to GRU
8: z_t = σ(W_z · [h_{t-1}, x_t])
9: r_t = σ(W_r · [h_{t-1}, x_t])
10: ĥ_t = tanh(W · [r_t * h_{t-1}, x_t])
11: h_t = (1 − z_t) * h_{t-1} + z_t * ĥ_t
12: Dense_GRU = relu(W · h_t + b)
13: Pass Z to GoogLeNet
14: Z'[a, c] = (f * Z)[a, c] = Σ_j Σ_k f[j, k] Z[a − j, c − k]
15: a, c: dimensions of the output matrix
16: Fl_GoogLeNet = flatten(Z')
17: Dense_GoogLeNet = W · Fl_GoogLeNet + b
18: h_HG2 = W_HG2 · [Dense_GRU, Dense_GoogLeNet] + b
19: b: bias term
20: Y_NTL = σ(h_HG2)
Result: Y_NTL
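Putting the pieces together, the HG2 model of Algorithm 3 and Equations (11)-(12) can be sketched as follows. It reuses the inception_block sketched in Section V-D; the GRU size (60), GoogLeNet dense size (30) and hybrid layer size (20, ReLU) follow Tables 5 to 7, while the remaining details are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_hg2(seq_len: int = 1034, weeks: int = 147, week_len: int = 7):
    # GRU branch on the 1D daily sequence (Algorithm 3, lines 7-12)
    in_1d = layers.Input(shape=(seq_len, 1))
    dense_gru = layers.Dense(50, activation="relu")(layers.GRU(60)(in_1d))

    # GoogLeNet-style branch on the weekly 2D data (Algorithm 3, lines 13-17)
    in_2d = layers.Input(shape=(weeks, week_len, 1))
    g = inception_block(in_2d)                      # single inception module
    dense_googlenet = layers.Dense(30, activation="relu")(layers.Flatten()(g))

    # Hybrid head, Equations (11) and (12)
    h = layers.Concatenate()([dense_gru, dense_googlenet])
    h = layers.Dense(20, activation="relu")(h)      # hybrid layer
    y_ntl = layers.Dense(1, activation="sigmoid")(h)

    model = Model([in_1d, in_2d], y_ntl)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="roc_auc")])
    return model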
F. PERFORMANCE METRICS
One of the main challenges of ETD is a class imbalance prob-
lem where classifiers become biased towards the majority
class and ignore the minority class. Therefore, the selection
of suitable measures is necessary to evaluate the performance
of classifiers for both classes. We opt for ROC-AUC and PR-AUC as performance metrics. The ROC-AUC is retrieved by plotting the true positive rate (TPR), also known as recall, on the y-axis and the FPR on the x-axis. It is a convenient diagnostic tool because it is not biased towards the minority and majority classes. Its value lies between 0 and 1. Although ROC-AUC is a good performance measure, it does not consider the precision of a classifier and does not give equal importance to both classes. Additionally, the test dataset has an imbalanced nature, so we also take into account PR-AUC for the performance evaluation of the classifiers [8]. PR-AUC is the area under the curve of precision against recall over different threshold values. The precision measures the percentage of correctly identified electricity thieves; its maximization increases the recovery revenue of the utility. The recall calculates the percentage of electricity thieves that appear on the suspicious list.
High scores of precision and recall are very important for
accomplishing the goals of a utility.
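For completeness, both indicators can be computed with scikit-learn; this is a small sketch in which y_true and y_score denote the ground-truth labels and the model's predicted probabilities, and the helper name is illustrative.

from sklearn.metrics import auc, precision_recall_curve, roc_auc_score

def evaluate(y_true, y_score):
    """Return the two reported indicators: ROC-AUC and PR-AUC."""
    roc_auc = roc_auc_score(y_true, y_score)
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    pr_auc = auc(recall, precision)
    return roc_auc, pr_auc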
VI. EXPERIMENTS AND RESULTS ANALYSIS
In this paper, all models are trained and tested on SGCC
dataset. The description of the dataset is given in Section
IV-A. We use Google Colab to train deep learning and
ML models by taking advantage of distributed clustering
computing. Deep learning models are implemented through
TensorFlow, which is a deep learning library. Moreover, con-
ventional models are fitted through the scikit-learn library.
A. PERFORMANCE ANALYSIS OF LEAST SQUARE
GENERATIVE ADVERSARIAL NETWORK
Due to the imbalance nature of the dataset, TLSGAN is pro-
posed to generate fake samples that have high resemblance
with real-world theft samples. The standard LSGAN uses
VGG neural network architecture to generate fake images.
However, our dataset belongs to the time series domain. So,
we change the network architecture according to our dataset’s
requirement. We replace the convolutional layers with GRU
layers because these layers are designed to handle problems
of sequential data. Both D and G models contain GRU and
dense layers. The linear activation function is implemented at the last layer of D because it measures how far the generated samples are from the real samples and changes the weights of G to improve its performance.
Adam optimizer is used to train the parameters of TLS-
GAN because it is easy to implement, computationally less
expensive, requires little memory and gives good results on
large datasets. Figure 3a shows the loss function of Generator
(G) and Discriminator (D) on real and generated samples
during training process. After 100 epochs, D hardly differen-
tiates between real and fake samples. Whereas, G has the loss
function value between 0.5 and 1.75, which indicates that it
has developed a relation between real and latent data points
to generate new theft samples. Figure 3b shows patterns of
real theft samples. Moreover, Figures 3c and 3d present the
theft samples generated by TLSGAN. Both figures show that
generated samples have a high resemblance with original
samples of thieves that are presented in Figure 3b. Similar
trends are observed for both real and latent features, which
ensure the diversity in generated theft patterns. In Figures 3b, 3c and 3d, the x-axis represents the number of days, whereas, the y-axis represents the EC in kilowatt-hour (kWh).

FIGURE 3: Performance analysis of TLSGAN. (a) Generator and discriminator loss during training; (b) real theft samples; (c) and (d) theft samples generated by TLSGAN.
TABLE 4: Comparison through accuracy and execution time
of different data generation techniques
Techniques Execution time (s) Accuracy (%)
TLSGAN 61.61 95
SVM_SMOTE 177 88
Borderline_SMOTE 56 90
SMOTE_TomekLinks 71.61 93
SMOTE_ENN 957 89
ADASYN 71.61 93
SMOTE 9.18 81
ROS 0.12 88
RUS 0.5 89
Table 4 presents the classification accuracy and execution time of different data generation techniques. We compare the performance
of proposed TLSGAN with current variants of SMOTE:
SVM_SMOTE, Borderline SMOTE, SMOTE_TomekLinks,
SMOTE_ENN and ADASYN. TLSGAN generates new theft
samples that increase the classification accuracy of the pro-
posed model. As explained above, the generated samples
have a high resemblance with real theft samples that re-
duces the overfitting problem, which occurs in other over-
sampling techniques and increases the model generalization
and robustness properties. The execution time of TLSGAN
is more than ROS, RUS, Borderline-SMOTE and SMOTE.
While it is less than SVM_SMOTE, SMOTE_TomekLinks,
SMOTE_ENN and ADASYN. The running time of TLS-
GAN depends upon the number of hidden layers and the sam-
pling rate of a dataset. The execution time of SVM_SMOTE,
Borderline-SMOTE, SMOTE_TomekLinks, SMOTE_ENN,
ADASYN and SMOTE depends upon the number of sam-
ples and features in a dataset. Whereas, the execution time
of RUS and ROS does not significantly change with large datasets because they simply select samples from the dataset and duplicate or remove them. SMOTE_TomekLinks and SMOTE_ENN techniques take too much time because they perform under sampling and over sampling steps to remove redundant samples from the datasets.

FIGURE 4: Performance analysis of gated recurrent unit. (a) PR curve on the training and testing datasets; (b) ROC curve (Train ROC-AUC = 83.4%, Test ROC-AUC = 79.7%); (c) loss and accuracy curves on the training and testing datasets.
B. PERFORMANCE ANALYSIS OF GATED RECURRENT
UNIT
Figures 4a, 4b and 4c show the performance of the GRU model on the SGCC dataset. Figure 4a presents the performance of the model in terms of the PR curve. The curves on the training and testing datasets move in parallel with only a small difference, which means that the model has learnt the patterns of theft and normal consumers and is now able to differentiate between both classes. Figure 4b shows the ROC curve and AUC of the model on the training and testing datasets. The GRU model attains 83.4% and 79.7% ROC-AUC values on the training and testing datasets, respectively. Figure 4c presents the loss and accuracy of the model on the training and testing datasets. It achieves good accuracy and minimum loss after 20 epochs. Its performance may increase with a larger number of epochs, but it is also possible that the model may fall into an overfitting problem.
GRU model has update and reset gates to regulate the flow
of information throughout the network. These gates prevent
the model from vanishing gradient problem and reduce its
chances of sticking in local minima problem. Moreover, these
gates increase the model’s overall performance by extracting
the optimal temporal features from the EC dataset, which
have time-related dependencies after certain intervals. Table
5 presents the hyperparameters setting of GRU model.
C. PERFORMANCE ANALYSIS OF GOOGLENET
Figures 5a, 5b and 5c show the performance of the GoogLeNet model.
TABLE 5: Hyperparameters setting of gated recurrent unit
Hyperparameters Optimal values
Size of GRU layer 60
Size of MLP 50
Dropout rate 0.4
Activation function at last layer Sigmoid
Optimizer ADAM
kernel_initializer he_normal
FIGURE 5: Performance analysis of GoogLeNet. (a) PR curve on the training and testing datasets; (b) ROC curve (Train ROC-AUC = 95.7%, Test ROC-AUC = 94.2%); (c) loss and accuracy curves on the training and testing datasets.
Figure 5a shows the PR curve of the GoogLeNet model. The PR curve provides a good analysis of the model's performance because it gives equal weight to both normal and abnormal samples. The model obtains good
PR curves on training and testing datasets, which indicate
that it learns patterns of both normal and abnormal samples
appropriately during the training phase. Figure 5b shows
the model’s performance using the ROC curve and ROC-
AUC performance indicators. These indicators evaluate how good a model is at predicting the positive class. GoogLeNet achieves 95.7% and 94.2% AUC values on the training and testing datasets, respectively, which are higher than the AUC values of the GRU model. Moreover, the loss and accuracy of the model on the training and testing datasets can be seen in Figure 5c. We visualize the model's performance with more than 20 epochs; however, we observe more fluctuations in the training and testing curves of accuracy and loss, which indicate the model's instability beyond 20 epochs. Due to the above mentioned reasons, the model is trained for only 20 epochs, which gives good results and saves our computational resources. In this model, the data
shape is transformed according to weeks to learn periodic
patterns and extract optimal features through convolution and
max pooling layers. The max pooling layers reduce data
dimensionality that increases the model’s convergence speed.
Moreover, dropout layers are used to reduce overfitting prob-
lem and increase generalization property. Table 6 presents the
hyperparameters setting of GoogLeNet.
TABLE 6: Hyperparameters setting of GoogLeNet
Hyperparameters | Optimal values
Number of convolutional layers | 2
Max pooling layers | 1
Dense layer size | 30
Dropout size | 0.4
Activation function at last layer | Sigmoid
Optimizer | ADAM
kernel_initializer | he_normal
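The following is a simplified, hedged sketch of an inception-style branch on the weekly stacked EC data, loosely following the Table 6 settings; the input shape, filter counts and kernel sizes are assumptions for illustration and do not reproduce the full GoogLeNet architecture.

```python
# Illustrative inception-style branch (Table 6 settings); a sketch, not the exact model.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Concatenate,
                                     Flatten, Dense, Dropout)

inp = Input(shape=(148, 7, 1))                        # hypothetical weeks x days x 1
b1 = Conv2D(16, (1, 1), padding="same", activation="relu",
            kernel_initializer="he_normal")(inp)      # 1x1 convolution branch
b2 = Conv2D(16, (3, 3), padding="same", activation="relu",
            kernel_initializer="he_normal")(inp)      # 3x3 convolution branch
merged = Concatenate()([b1, b2])                      # inception-style concatenation
pooled = MaxPooling2D(pool_size=(2, 2))(merged)       # single max pooling layer
x = Flatten()(pooled)
x = Dropout(0.4)(x)                                   # dropout 0.4
x = Dense(30, activation="relu", kernel_initializer="he_normal")(x)  # dense size 30
out = Dense(1, activation="sigmoid")(x)               # sigmoid output

googlenet_branch = Model(inp, out)
googlenet_branch.compile(optimizer="adam", loss="binary_crossentropy")
```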
FIGURE 6: Performance analysis of HG2: (a) PR curve (train and test), (b) ROC curve (train ROC-AUC = 97.8%, test ROC-AUC = 95.7%), (c) loss and accuracy curves over 20 epochs.
D. PERFORMANCE ANALYSIS OF HYBRID HG2 MODEL
In this section, the performance of the HG2 model is compared with the stand-alone deep learning models. Figures 6a, 6b and 6c show the HG2 model's performance using different performance measures. HG2 achieves 97.8% and 95.7% ROC-AUC values on the training and testing datasets, respectively, which are higher than those of the GRU and GoogLeNet models. Figure 6c shows the loss and accuracy curves on the training and testing datasets, which are better than the curves of the GRU and GoogLeNet models presented in Figures 4c and 5c. In [30] and [31], the authors prove that a hybrid deep learning model performs better than individual learners: it achieves better convergence speed, takes less computational time and extracts optimal features. The GRU layers extract time related patterns through the update and reset gates, whereas the GoogLeNet model has an inception module, which contains max pooling and multiple convolution layers with different filter sizes. These layers reduce computational complexity and extract latent and abstract patterns using local receptive fields and a weight sharing mechanism. The Keras library is used to concatenate the extracted optimal features of both the GoogLeNet and GRU classifiers; a minimal sketch of this concatenation is given after Table 7. Finally, these concatenated features combine the properties of both individual learners, which provides better learning for the HG2 model. Although the GRU alone gives comparatively low performance, combining it with the GoogLeNet model improves the overall performance. The combined model has the ability to learn better patterns from the EC data.
TABLE 7: Hyperparameters setting of proposed model
Hyperparameters | Optimal values
Number of convolutional layers | 3
Max pooling layers | 1
Size of GRU layer | 60
Hybrid layer size | 20
Hybrid layer activation function | ReLU
Activation function at last layer | Sigmoid
Optimizer | ADAM
kernel_initializer | he_normal
The proposed hybrid model avoids the weak points of both GRU and GoogLeNet and uses the strong points of both. This is the reason why the poor performance of the stand-alone GRU does not affect the overall performance of the proposed model. Table 7 shows the hyperparameters setting of HG2.
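The following is a hedged sketch of how the two branches can be merged with the Keras functional API, mirroring the described HG2 design (GRU features concatenated with convolutional features, a hybrid layer of 20 ReLU units and a sigmoid output, as in Table 7); the input shapes and the single convolutional block are simplifying assumptions made for the example.

```python
# Illustrative concatenation of the GRU and convolutional branches (HG2-style sketch).
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (GRU, Conv2D, MaxPooling2D, Flatten,
                                     Dense, Concatenate)

seq_in = Input(shape=(1034, 1), name="ec_sequence")         # 1D EC series (assumed length)
gru_feat = GRU(60, kernel_initializer="he_normal")(seq_in)  # GRU layer of size 60

img_in = Input(shape=(148, 7, 1), name="ec_weekly")         # 2D weekly stacked EC data
x = Conv2D(16, (3, 3), padding="same", activation="relu")(img_in)
x = MaxPooling2D((2, 2))(x)
cnn_feat = Flatten()(x)

merged = Concatenate()([gru_feat, cnn_feat])                # combine both feature sets
hybrid = Dense(20, activation="relu")(merged)               # hybrid layer (Table 7)
out = Dense(1, activation="sigmoid")(hybrid)                # theft probability

hg2 = Model(inputs=[seq_in, img_in], outputs=out)
hg2.compile(optimizer="adam", loss="binary_crossentropy")
```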
E. COMPARISON WITH BENCHMARK CLASSIFIERS
In this section, the performance of the proposed model is compared with existing state-of-the-art deep learning and ML classifiers.
(1) Wide and deep convolutional neural network (WDCNN): It is proposed in [7] to identify normal and abnormal patterns in EC data. The wide component is equivalent to an MLP module and is used to extract global knowledge from the data, whereas the CNN is leveraged to attain periodic patterns from the weekly EC data. We use the same dataset and hyperparameters setting to compare this model with our proposed model.
(2) Hybrid multilayer perceptron and long short term memory model: In [8], the authors propose a hybrid model that is a combination of LSTM and MLP. They pass the EC data to the LSTM to extract periodic patterns, whereas the smart meter data is fed to the MLP model to retrieve non-sequential information. They concatenate both models through the Keras library and prove that a hybrid model is better than a single model. We use the same hyperparameters and dataset settings as utilized in [8] to build the hybrid LSTM-MLP model.
(3) Naive Bayes classifier (NB): It is a statistical classification technique based on Bayes' theorem. It assumes that the input features are independent of one another and predicts the unknown class using a probability distribution. It has high accuracy and speed on large datasets. Moreover, it has many real world applications: spam filtering, sentiment analysis, text classification, recommendation systems, etc. NB has different versions according to the nature of a dataset. We utilize Gaussian NB to classify normal and abnormal data points in the EC dataset because it is specially designed for continuous valued features; a minimal sketch is given below.
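A minimal sketch of this baseline, assuming scikit-learn; the synthetic, imbalanced data below is only a placeholder standing in for the preprocessed EC features.

```python
# Hypothetical Gaussian NB baseline on placeholder data resembling the EC setup.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=148, weights=[0.92],
                           random_state=0)                 # imbalanced placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
print("NB ROC-AUC:", roc_auc_score(y_test, nb.predict_proba(X_test)[:, 1]))
```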
(4) Support vector machine: SVM is a well-known classifier in ETD. It is an enhanced version of the maximal margin hyperplane classifier. It can classify both linear and non-linear data. It exploits radial basis, sigmoid, Gaussian, etc., kernels to transform non-linear data into a linearly separable format and then draws a decision boundary between electricity thieves and normal consumers. However, its computational time is high for large datasets. In [2], the authors use SVM to classify benign and theft consumers. We use the radial basis function (RBF) kernel due to the non-linearity of the data and try different values of the C parameter. After several iterations, 100 is found to be the optimal value of C at which SVM gives good results; a corresponding sketch is given below.
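A hedged sketch of this baseline with an RBF kernel and C = 100; the scaling step is an assumption, and the data split is reused from the NB sketch above.

```python
# Hypothetical RBF-kernel SVM baseline with C = 100 (reuses X_train/X_test from above).
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

svm = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", C=100, probability=True, random_state=0))
svm.fit(X_train, y_train)
print("SVM ROC-AUC:", roc_auc_score(y_test, svm.predict_proba(X_test)[:, 1]))
```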
(5) Logistic regression (LR): It is a supervised ML algorithm used for binary classification tasks. It is similar to a single layer neural network. To estimate the probability of an NTL, it multiplies the input features with a trained weight matrix and then passes the resultant values to a sigmoid function to generate an output between 0 and 1. It has different solver methods: Newton's method, stochastic average gradient and sparse stochastic average gradient (SAGA). However, Newton's method gives the best results, which are reported in Table 8; a corresponding sketch is given below.
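A minimal sketch of this baseline, again assuming scikit-learn and reusing the split from the NB sketch; the "newton-cg" solver stands in for Newton's method, and the other solvers can be swapped in for comparison.

```python
# Hypothetical logistic regression baseline using a Newton-type solver.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

lr = LogisticRegression(solver="newton-cg", max_iter=1000)  # or "sag" / "saga"
lr.fit(X_train, y_train)
print("LR ROC-AUC:", roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1]))
```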
Results: We compare the performance of the proposed HG2 model with different state-of-the-art classifiers. The same training and testing datasets are used for LR, NB, MLP and SVM. We use the RBF kernel for SVM due to the non-linearity of the data. Moreover, the number of samples and the dimensionality of the data are reduced for SVM because it requires high computational time on large datasets. In [8], the authors use sequential and non-sequential data for LSTM and MLP, respectively. However, non-sequential information is not available in our case, which is why only sequential information is fed into the MLP and LSTM models. The hybrid of both models gives good results and achieves 95% and 94% ROC-AUC and PR-AUC, respectively.
In [7], the sequential data is fed into the MLP model to retrieve global knowledge from the data, whereas the 2D stacked data is given to the CNN model to extract periodic patterns from the weekly EC data. The WDCNN achieves 92% and 88% ROC-AUC and PR-AUC, respectively, which are higher than the ROC-AUC and PR-AUC of the conventional ML models.
The proposed HG2 model outperforms the hybrid and other ML models because it extracts periodic and abstract patterns from the EC data using GRU and convolutional layers. As discussed earlier, the GRU layers have update and reset gates that learn important patterns and discard redundant values. These gates control the flow of information and improve the overall performance of the proposed model. GoogLeNet has an inception module that contains max pooling and multiple convolutional layers with different filter sizes. These layers extract patterns that cannot be retrieved through human knowledge. These abstract or latent patterns are combined with the features extracted by the GRU model. Due to this combination of optimal features, HG2 attains 96% and 97% ROC-AUC and PR-AUC values, which are higher than those of all the classifiers explained above. The computation of both indicators is sketched below.
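For reference, the two reported indicators can be computed from model scores as in the following sketch, assuming scikit-learn; y_true and y_score are placeholders for the test labels and the predicted theft probabilities of any of the models.

```python
# Illustrative computation of ROC-AUC and PR-AUC from predicted scores.
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve, auc

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])             # placeholder labels
y_score = np.array([.1, .2, .8, .3, .7, .9, .4, .2, .6, .1])  # placeholder scores

roc_auc = roc_auc_score(y_true, y_score)
precision, recall, _ = precision_recall_curve(y_true, y_score)
pr_auc = auc(recall, precision)                                # area under PR curve
print(f"ROC-AUC = {roc_auc:.2f}, PR-AUC = {pr_auc:.2f}")
```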
Table 8 shows the comparison results of the proposed model and all the other classifiers for different training ratios of the dataset. The deep learning models are sensitive to the size of the training data: their performance increases with the growing amount of training data. However, this is not true for conventional ML models, whose performance increases according to a power law; after a certain amount of training data, their performance does not improve [33].
TABLE 8: Comparison of HG2 with existing techniques (ROC-AUC (%) / PR-AUC (%))
Methods | Training data = 80% | Training data = 60% | Training data = 50%
SVM | 77 / 64 | 77 / 64 | 78 / 64
LR | 88 / 82 | 87 / 80 | 90 / 83
NB | 50 / 52 | 50 / 54 | 51 / 59
MLP | 88 / 82 | 87 / 79 | 86 / 77
MLP-LSTM | 95 / 94 | 92 / 90 | 88 / 82
WDCNN | 92 / 88 | 91 / 89 | 56 / 63
HG2 | 96 / 97 | 93 / 91 | 88 / 85
TABLE 9: Mapping table
L1: Class imbalance problem. | S1: TLSGAN. | V1: The proposed model achieves 96% PR-AUC, which indicates that the model is not biased toward the majority class, as shown in Figure 6a.
L2: RUS removes important information from the data. | S1: TLSGAN does not remove information from the dataset. | V2: The results of TLSGAN and RUS are given in Table 4; Figure 6c shows that the model is not stuck in the underfitting problem.
L3: ROS causes the overfitting problem. | S1: TLSGAN reduces the overfitting problem of ROS by generating fake samples, which have high resemblance with real samples. | V3: It achieves good PR and ROC curves on the training and testing datasets, as shown in Figures 6a and 6b.
L4: SMOTE causes the overfitting problem. | S1: TLSGAN overcomes the overfitting issue of SMOTE and ADASYN. | V4: Table 4 shows the accuracy of SMOTE, ADASYN and TLSGAN. The proposed model attains good PR and ROC curves on the training and testing data, which indicates that the model is not stuck in the overfitting problem.
L5: The curse of dimensionality increases model complexity and reduces the model's generalization ability. | S2: GRU and GoogLeNet are used to extract features from the sequence (1D) data. | V5: In Figures 6a, 6b and 6c, the proposed model achieves good results, which indicates that GoogLeNet and GRU extract optimal temporal patterns from the EC dataset.
L6: High FPR and overfitting issue. | S3: Dropout and batch normalization layers are used to reduce the FPR and the overfitting problem. | V6: We evaluate the model on the training and testing data and achieve an FPR score that is lower than the FPR of the existing models.
However, HG2 maintains its superiority over the other deep learning models and gives better performance at the different training ratios of the SGCC dataset. Both SVM and NB give good results on balanced and large datasets. However, in our case, these models perform poorly for the following reasons: SVM does not perform well on noisy data, and NB's performance is affected by continuous values because it assumes an independent relationship between features. For MLP-LSTM, WDCNN and HG2, if the performance is not increasing, or is decreasing, with more training data, then hyperparameter tuning must be performed on the training data to improve the results.
F. MAPPING AMONG LIMITATIONS, SOLUTIONS AND VALIDATIONS
Table 9 shows the mapping of limitations, solutions and their validations. L1 describes the class imbalance problem, where classifiers are biased towards the majority class and ignore the minority class, which increases the FPR score. Solution S1 is proposed for L1. In S1, the TLSGAN is used to handle the class imbalance problem. As shown in Table 9, V1 is the validation of S1. The proposed model achieves a 96% PR-AUC score, which indicates that the model is not biased toward the majority class. Moreover, it achieves a 4% FPR score, which is acceptable for a utility. In L2, RUS randomly removes samples of the majority class to balance the ratio of theft and normal samples. However, it discards useful information from the data, which causes the underfitting problem. Solution S1 also tackles L2: the TLSGAN is a deep learning technique designed to generate fake samples that resemble real samples, so it does not remove useful information from the data and thereby avoids the drawback of RUS. V2 validates S1 for this limitation. Figure 6c shows that the model is not stuck in the underfitting problem.
In L3 and L4, the existing data sampling techniques generate duplicated copies of the minority class to solve the class imbalance problem. These techniques are designed for tabular data rather than time series data, so they face the overfitting issue on time series data. TLSGAN is specially designed to generate fake samples for time series datasets that have a severe class imbalance problem. TLSGAN uses supervised and unsupervised loss functions and generates samples that resemble the actual data and also preserve the time related patterns. The performance of TLSGAN is compared with advanced variants of the SMOTE technique. V3 and V4 are the validations of S1. Table 4 shows the comparison between the different data sampling techniques, which shows that the accuracy of TLSGAN is higher than that of the benchmark data augmentation techniques. Figure 6c indicates that HG2 attains good loss and accuracy curves on the training and testing datasets. Moreover, the proposed model achieves a good PR curve, which can be seen in Figure 6a. The adversarial losses that this idea builds on are sketched below.
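As background for the unsupervised part of this idea, the following sketch shows the least squares adversarial losses of the LSGAN formulation in [29] that the TLSGAN builds on; the GRU-based generator and discriminator and the additional supervised loss term are intentionally omitted, so this is only an illustration and not the full TLSGAN.

```python
# Illustrative least-squares adversarial losses (LSGAN-style [29]); not the full TLSGAN.
import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()

def discriminator_loss(real_scores, fake_scores):
    # Push discriminator scores of real samples toward 1 and of fake samples toward 0.
    return (mse(tf.ones_like(real_scores), real_scores) +
            mse(tf.zeros_like(fake_scores), fake_scores))

def generator_loss(fake_scores):
    # The generator is penalized whenever fake samples score away from 1, even if
    # they already lie on the correct side of the decision boundary, which keeps
    # gradients informative and mitigates the vanishing gradient problem.
    return mse(tf.ones_like(fake_scores), fake_scores)
```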
L5 covers the issues that occur due to the curse of dimensionality. The GoogLeNet is used to capture weekly periodicity from the 2D data, whereas the GRU is leveraged to capture long term and short term features from the 1D data. In S2, GRU and GoogLeNet extract temporal and latent patterns and pass them to a hybrid neural network to classify theft and normal samples. V5 is the validation of S2. Figures 6a, 6b and 6c show the performance of the proposed model through the accuracy, loss, PR and ROC curves, which indicate that GRU and GoogLeNet extract optimal features from the EC dataset and transfer them to the hybrid module. Due to these optimal features, HG2 achieves 96% and 97% ROC-AUC and PR-AUC scores, respectively, which are higher than those of the existing techniques mentioned in Table 8.
L6 is about the high FPR and the overfitting problem. We know
that utilities cannot bear a high FPR due to their limited budget for on-site inspection. In S3, dropout and batch normalization layers are leveraged to solve the overfitting problem and reduce the FPR score. V6 validates S3 by computing the FPR. The proposed model achieves a 4% FPR, which is lower than the FPR of all the existing models; the computation is sketched below.
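The FPR reported above can be computed from a confusion matrix as in the following minimal sketch, assuming scikit-learn; the label and prediction vectors are placeholders.

```python
# Illustrative FPR computation from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]   # placeholder test labels
y_pred = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]   # placeholder hard predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)                       # false positive rate
print(f"FPR = {fpr:.2f}")
```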
VII. CONCLUSION
In this article, we propose a model to detect NTLs in the electricity distribution system. The proposed model is a hybrid of GRU and GoogLeNet. The GRU is used to extract temporal patterns from the time series dataset, whereas the GoogLeNet is exploited to attain latent patterns from the weekly stacked EC dataset. The performance of the proposed model is evaluated on a realistic EC dataset provided by SGCC, the largest smart grid company in China. The simulation results show that HG2 outperforms the benchmark classifiers: WDCNN, MLP-LSTM, MLP, LR, NB and SVM. Moreover, the class imbalance problem is a severe issue in ETD. The TLSGAN, which consists of GRU and dense layers, is proposed to tackle the class imbalance problem. The TLSGAN generates fake samples that have high resemblance with real world theft samples. The model is evaluated using suitable performance measures: ROC-AUC and PR-AUC. The results of these measures indicate that the proposed model outperforms the benchmark classifiers and achieves 96% and 97% ROC-AUC and PR-AUC, respectively. In fact, the proposed model is not limited to detecting electricity theft patterns only; it can also be used in other industrial applications to classify normal and abnormal samples or records. In the near future, we plan to implement the proposed model as an NTL detector in an electricity distribution company in Pakistan to classify normal and theft samples.
VIII. DATASET AVAILABILITY
The dataset used in this study is publicly available at this link
IX. ACKNOWLEDGEMENT
This work was supported by King Saud University, Riyadh,
Saudi Arabia, through Researchers Supporting Project num-
ber RSP-2021/184. The work of author Ayman Radwan was
supported by FCT / MEC through Programa Operacional
Regional do Centro and by the European Union through the
European Social Fund (ESF) under Investigador FCT Grant
(5G-AHEAD IF/FCT- IF/01393/2015/CP1310/CT0002).
REFERENCES
[1] Arango, L. G., E. Deccache, B. D. Bonatto, H. Arango, P. F. Ribeiro, and
P. M. Silveira. “Impact of electricity theft on power quality.” 2016 17th
International Conference on Harmonics and Quality of Power (ICHQP).
IEEE, 2016.
[2] Jokar, Paria, Nasim Arianpoo, and Victor CM Leung. “Electricity theft
detection in AMI using customers’ consumption patterns.” IEEE Transac-
tions on Smart Grid 7.1 (2015): 216-226.
[3] Punmiya, Rajiv, and Sangho Choe. “Energy theft detection using gradient
boosting theft detector with feature engineering-based preprocessing.”
IEEE Transactions on Smart Grid 10.2 (2019): 2326-2329.
[4] Lo, Chun-Hao, and Nirwan Ansari. “CONSUMER: A novel hybrid in-
trusion detection system for distribution networks in smart grid.” IEEE
Transactions on Emerging Topics in Computing 1.1 (2013): 33-44.
[5] Khoo, Benjamin, and Ye Cheng. “Using RFID for anti-theft in a Chi-
nese electrical supply company: A cost-benefit analysis.” 2011 Wireless
Telecommunications Symposium (WTS). IEEE, 2011.
[6] Amin, Saurabh, Galina A. Schwartz, and Hamidou Tembine. “Incentives
and security in electricity distribution networks.” International Conference
on Decision and Game Theory for Security. Springer, Berlin, Heidelberg,
2012.
[7] Zheng, Zibin, Yatao Yang, Xiangdong Niu, Hong-Ning Dai, and Yuren
Zhou. “Wide and deep convolutional neural networks for electricity-
theft detection to secure smart grids.” IEEE Transactions on Industrial
Informatics 14.4 (2017): 1606-1615.
[8] Buzau, Madalina-Mihaela, Javier Tejedor-Aguilera, Pedro Cruz-Romero,
and Antonio Gomez-Exposito. “Hybrid deep neural networks for detection
of non-technical losses in electricity smart meters.” IEEE Transactions on
Power Systems 35.2 (2019): 1254-1263.
[9] Buzau, Madalina Mihaela, Javier Tejedor-Aguilera, Pedro Cruz-Romero,
and Antonio Gomez-Exposito. “Detection of non-technical losses using
smart meter data and supervised learning.” IEEE Transactions on Smart
Grid 10.3 (2020): 2661-2670.
[10] Hasan, Md, Rafia Nishat Toma, Abdullah-Al Nahid, M. M. Islam, and
Jong-Myon Kim. “Electricity theft detection in smart grid systems: A
CNN-LSTM based approach.” Energies 12.17 (2019): 3310.
[11] Avila, Nelson Fabian, Gerardo Figueroa, and Chia-Chi Chu. “NTL detec-
tion in electric distribution systems using the maximal overlap discrete
wavelet-packet transform and random undersampling boosting.” IEEE
Transactions on Power Systems 33.6 (2018): 7171-7180.
[12] Aslam, Sheraz, Nadeem Javaid, Farman Ali Khan, Atif Alamri, Ahmad
Almogren, and Wadood Abdul. “Towards efficient energy management
and power trading in a residential area via integrating a grid-connected
microgrid.” Sustainability 10, no. 4 (2018): 1245.
[13] Iqbal, Zafar, Nadeem Javaid, Saleem Iqbal, Sheraz Aslam, Zahoor Ali
Khan, Wadood Abdul, Ahmad Almogren, and Atif Alamri. “A domestic
microgrid with optimized home energy management system.” Energies 11,
no. 4 (2018): 1002.
[14] Ramos, Caio CO, Douglas Rodrigues, Andre N. de Souza, and Joao P.
Papa. “On the study of commercial losses in Brazil: a binary black hole
algorithm for theft characterization.” IEEE Transactions on Smart Grid 9.2
(2016): 676-683.
[15] Li, Bo, Kele Xu, Xiaoyan Cui, Yiheng Wang, Xinbo Ai, and Yanbo
Wang. “Multi-scale DenseNet-based electricity theft detection.” Interna-
tional Conference on Intelligent Computing. Springer, Cham, 2018.
[16] Li, Shuan, Yinghua Han, Xu Yao, Song Yingchen, Jinkuan Wang, and
Qiang Zhao. “Electricity theft detection in power grids with deep learning
and random forests.” Journal of Electrical and Computer Engineering 2019
(2019).
[17] Ghori, Khawaja Moyeezullah, Rabeeh Ayaz Abbasi, Muhammad Awais,
Muhammad Imran, Ata Ullah, and Laszlo Szathmary. “Performance anal-
ysis of different types of machine learning classifiers for non-technical loss
detection.” IEEE Access 8 (2019): 16033-16048.
[18] Kong, Xiangyu, Xin Zhao, Chao Liu, Qiushuo Li, DeLong Dong, and Ye
Li. “Electricity theft detection in low-voltage stations based on similarity
measure and DT-KSVM.” International Journal of Electrical Power &
Energy Systems 125 (2021): 106544.
[19] Aslam, Zeeshan, Fahad Ahmed, Ahmad Almogren, Muhammad Shafiq,
Mansour Zuair, and Nadeem Javaid. "An attention guided semi-supervised
learning mechanism to detect electricity frauds in the distribution sys-
tems." IEEE Access 8 (2020): 221767-221782.
[20] Coma-Puig, Bernat, and Josep Carmona. “Bridging the gap between en-
ergy consumption and distribution through non-technical loss detection.”
Energies 12.9 (2019): 1748.
[21] Hu, Tianyu, Qinglai Guo, Hongbin Sun, Tian-En Huang, and Jian Lan.
“Nontechnical losses detection through coordinated biwgan and svdd.”
IEEE Transactions on Neural Networks and Learning Systems (2020).
[22] Huang, Yifan, and Qifeng Xu. “Electricity theft detection based on stacked
sparse denoising autoencoder.” International Journal of Electrical Power &
Energy Systems 125 (2021): 106448.
[23] Nadeem Javaid, Aqdas Naz, Rabiya Khalid, Ahmad Almogren, Muham-
mad Shafiq, and Adia Khalid. “ELS-Net: A New Approach to Forecast
Decomposed Intrinsic Mode Functions of Electricity Load.” IEEE Access
8 (2020): 198935-198949.
[24] Ding, Nan, HaoXuan Ma, Huanbo Gao, YanHua Ma, and GuoZhen Tan.
“Real-time anomaly detection based on long short-Term memory and
Gaussian Mixture Model.” Computers & Electrical Engineering 79 (2019):
106458.
[25] Gunturi, Sravan Kumar, and Dipu Sarkar. “Ensemble machine learning
models for the detection of energy theft.” Electric Power Systems Research
192 (2021): 106904.
[26] Taft, Laritza M., R. Scott Evans, Chi-Ren Shyu, Marlene J. Egger, N.
Chawla, Joyce A. Mitchell, Sidney N. Thornton, B. Bray, and M. Varner.
"Countering imbalanced datasets to improve adverse drug event predictive
models in labor and delivery." Journal of biomedical informatics 42, no. 2
(2009): 356-364.
[27] Bhat, Rajendra Rana, Rodrigo Daniel Trevizan, Rahul Sengupta, Xiaolin
Li, and Arturo Bretas. “Identifying nontechnical power loss via spatial
and temporal deep learning.” 2016 15th IEEE International Conference
on Machine Learning and Applications (ICMLA). IEEE, 2016.
[28] Saeed, Muhammad Salman, Mohd Wazir Mustafa, Usman Ullah Sheikh,
Touqeer Ahmed Jumani, and Nayyar Hussain Mirjat. “Ensemble bagged
tree based classification for reducing non-technical losses in multan elec-
tric power company of Pakistan.” Electronics 8.8 (2019): 860.
[29] Mao, Xudong, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and
Stephen Paul Smolley. “Least squares generative adversarial networks.” In
Proceedings of the IEEE international conference on computer vision, pp.
2794-2802. 2017.
[30] Huang, ChiouJye, Yamin Shen, Yung-Hsiang Chen, and Hsin Chuan
Chen. “A novel hybrid deep neural network model for short term electricity
price forecasting.” International Journal of Energy Research 45.2 (2021):
2511-2532.
[31] Yu, Jingxin, Xin Zhang, Linlin Xu, Jing Dong, and Lili Zhangzhong. “A
hybrid CNN-GRU model for predicting soil moisture in maize root zone.”
Agricultural Water Management 245 (2021): 106649.
[32] Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio.
“Empirical evaluation of gated recurrent neural networks on sequence
modeling.” arXiv preprint arXiv:1412.3555 (2014).
[33] Sun, Chen, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta.
“Revisiting unreasonable effectiveness of data in deep learning era.” In
Proceedings of the IEEE international conference on computer vision, pp.
843-852. 2017.
FAISAL SHEHZAD received the BS Software Engineering (SE) degree from Government College University Faisalabad (GCUF), Faisalabad, Pakistan, in 2018. He is currently pursuing the MS in computer science with the Communications over Sensors (ComSens) Research Laboratory, Department of Computer Science, COMSATS University Islamabad, Islamabad, Pakistan, under the supervision of Dr. Nadeem Javaid. He has 5 research publications in well reputed international journals and conferences. His research interests include data science, smart grids, blockchain and financial markets.
NADEEM JAVAID (S’8, M’11, SM’16) received
the bachelor’s degree in computer science from
Gomal University, Dera Ismail Khan, KPK, Pak-
istan, in 1995, the master’s degree in electronics
from Quaid-i-Azam University, Islamabad, Pak-
istan, in 1999, and the Ph.D. degree in computer
science from the University of Paris-Est, France,
in 2010. He is currently an Associate Professor
and the Founding Director of the Communica-
tions over Sensors (ComSens) Research Labora-
tory, Department of Computer Science, COMSATS University Islamabad,
Islamabad Campus. He has supervised 126 master’s and 20 Ph.D. theses.
He has authored over 900 articles in technical journals and international
conferences. His research interests include energy optimization in smart
grids and in wireless sensor networks using data analytics and blockchain.
He was recipient of the Best University Teacher Award from the Higher
Education Commission of Pakistan in 2016 and the Research Productivity
Award from the Pakistan Council for Science and Technology in 2017.
He is also an Associate Editor of IEEE Access and the Editor of the
International Journal of Space Based and Situated Computing and editor of
the Sustainable Cities and Society.
AHMAD ALMOGREN (SM) received the Ph.D.
degree in computer science from Southern
Methodist University, Dallas, TX, USA, in 2002.
He is currently a Professor with the Computer
Science Department, College of Computer and
Information Sciences (CCIS), King Saud Univer-
sity (KSU), Riyadh, Saudi Arabia, where he is
currently the Director of the Cyber Security Chair,
CCIS. Previously, he worked as the Vice Dean of
the Development and Quality at CCIS. He also
served as the Dean for the College of Computer and Information Sciences
and the Head of the Academic Accreditation Council, Al Yamamah Univer-
sity. He served as the General Chair for the IEEE Smart World Symposium
and a Technical Program Committee member of numerous international
conferences/workshops, such as IEEE CCNC, ACM BodyNets, and IEEE
HPCC. His research interests include mobile-pervasive computing and cyber
security.
ABRAR AHMED was born in Pakistan in 1985.
He received the B.S. in computer engineering from
the COMSATS Institute of Information Technol-
ogy, Abbottabad, Pakistan, in 2006, the M.S. de-
gree from Lancaster University, U.K, in 2008, the
Ph.D. degree in electrical engineering from the
COMSATS Institute of Information Technology,
Islamabad in 2017. Since 2006, he has been Asso-
ciated with the COMSATS Institute of Information
Technology, Islamabad, where he currently holds
the position of Assistant Professor. His research interests include wireless
channel modeling and characterizing, smart antenna systems, nonorthogonal
multiple access techniques, and adaptive signal processing.
SARDAR MUHAMMAD GULFAM received MS
in computer engineering from Tampere University
of Technology, Finland, in 2010 and the Ph.D. de-
gree in electrical engineering from the COMSATS
Institute of Information Technology, Islamabad in
2017. He is working as a researcher in wireless
communication.
AYMAN RADWAN received the Ph.D. degree
from Queen’s University, Kingston, ON, Canada,
in 2009. He is a Senior Research Engineer (Inves-
tigador Auxiliar) with the Instituto de Telecomu-
nicações, University of Aveiro, Aveiro, Portugal.
He is mainly specialized in coordination and man-
agement of EU funded projects. He participated
in the coordination of multiple EU projects. He is
currently the Project Coordinator of the CELTIC+
Project “MUSCLES,” as well as participating in
the coordination of ITN-SECRET. He has also been the Technical Manager
of the FP7-C2POWER Project and the Coordinator of the CELTIC+ “Green-
T” Project. His current research interests include the Internet of Things, 5G,
and green communications.
VOLUME 4, 2016 17
... Shehzad et al. [16] developed a mixed deep learning (DL) model to find NTLs in power networks. The hybrid model was constructed using GoogleNet and GRU. ...
... Overall from this literature study, it is understood that in [14] the blockchain network's scalability becomes a challenge when the smart grid increases in size and the number of users, and transactions, in [15] requires devising a technique to reduce the computational resources needed, in [16] capturing long-term periodicity in 1D time series datasets is problematic, in [17] it reacts too quickly to changes in the input data, in [18] there is a need to minimizing detection delays for electricity theft, in [19] the identification of electricity theft has an impact on consumer privacy, in [20,21] there is a need to enhance data-driven evaluation approaches by addressing class imbalance, in [22] deep neural networks require a significant quantity of data for training, in [23,24] had somewhat higher temporal complexity, in [25] the model was trained solely on the EC dataset, limiting its ability to accurately identify the location of energy thefts, in [26,27] struggle with huge datasets [28] and are occasionally computationally expensive, in [29] careful hyperparameter tuning is required, in [30] solely examines electricity usage statistics [31], which contain incomplete information and in [32] trouble with large datasets. Hence, there is a need for a novel approach to eliminate all these issues and expand the effectiveness and functionality of smart grid systems' detection of electricity theft. ...
... Shehzad et al. [16] Employs hybrid deep learning (GRU and GoogLeNet) for enhanced NTL detection. ...
Article
Full-text available
Energy management inside a blockchain framework developed for smart grids is primarily concerned with improving intrusion detection to protect data privacy. The emphasis is on real‐time detection of cyberattacks and preemptive forecasting of possible risks, especially in the realm of electricity theft within smart grid systems. Existing Electricity Theft Detection techniques for smart grids have obstacles such as class imbalance, which leads to poor generalization, increased complexity due to large EC data aspects, and a high false positive rate in supervised models, resulting in incorrect classification of regular customers as abnormal. To provide security in the smart grid, a novel BLS Privacy Blockchain with Siamese Bi‐LSTM is proposed. Initially, the privacy‐preserving Boneh‐Lynn‐Shacham blockchain technique is built on BLS Short signature and hash algorithms, which mitigate misclassification rates and false positives in the detection of smart grid attacks. Then, a hybrid framework employs an intrusion detection algorithm based on Siamese Bidirectional Long Short‐Term Memory to semantically distinguish between harmful and authentic behaviors, thereby improving data quality and predictive capabilities. Furthermore, a Recurrent Neural Network‐Generative Adversarial Network is presented for detecting electricity fraud, which addresses the issue of class imbalance. This uses both supervised and unsupervised loss functions to produce synthetic theft samples that closely resemble actual theft incidents. From the experiment, it is showing that the proposed models perform with high accuracy and low error rates. The proposed model from the outcomes when compared to other existing models achieves high accuracy, detection rate, recall, and low computation time.
... Among the most prevalent data-oriented approaches are those centered on data analysis [7], [8], machine learning (ML) techniques, including support vector machines (SVMs) [9], artificial neural networks (ANNs) [10], and fuzzy logic [11]. Additionally, sophisticated deep learning (DL) models have been introduced, such as convolutional neural networks [12] and recurrent neural networks [13]. ...
... Here x i,j is the original EC data of a consumer i at time j. min(x i ) and max(x i ) represent the minimum and maximum EC values of a consumer i, respectively [29]. The detail working implementation of PowerTrust preprocessing is shown in algorithm 1, where prior steps indicate the intialization of variable and SGCC data collection. ...
Article
Full-text available
Artificial intelligence (AI) is transforming the electrical grid by incorporating advanced communication protocols and novel monitoring infrastructure. This transformation alters traditional methods of electricity consumption and billing, shifting from manual meter reading to dynamic peak-hour tariff rates. While people are open to upgrading their energy consumption systems with cutting-edge technology, they express concerns over transmitting their data over wireless networks, fearing unauthorized access to their private information for unpredictable purposes. Thus, establishing trust in the new technology is crucial before individuals feel secure about sharing their personal information. In this study, we introduce PowerTrust, an ensemble learning stacking model designed to evaluate trust and safeguard user privacy in the Internet of Grid Things. PowerTrust is divided into two parts. In the first part, it assesses the trustworthiness of smart grid devices. In the second part, it proposes a secure scrambling method to protect electricity readings before they are transmitted to the control center. During trust evaluation, data is balanced, important features are extracted, and then a classification model is applied. The yynthetic minority oversampling technique is used to balance the dataset, and recursive feature elimination is used to select important features. The results show that the proposed scheme is secure and efficient in maintaining trust and privacy in the grid environment.
... After overcoming outlier and missing data issues, we move on to data normalization, a key aspect for the MicroTrust model, which often performs poorly on sparse, diverse, and unscaled data. To achieve this goal, the MM scaling technique is used to normalize the data [26]. Min-max scaling is an operation that tries to normalize the data uniformly, as shown in Equation 3. ...
Article
Full-text available
The global surge in population, coupled with the continuous emergence of new digital devices, has significantly increased the worldwide demand for electricity, intensifying the existing energy crisis. Microgrids are pivotal in addressing the current needs of utility industries and driving innovation. However, numerous challenges remain, particularly in developing countries where inadequate infrastructure hampers electricity generation, transmission, and distribution. A potential solution lies in energy sharing, which offers mutual benefits to both microgrid operators and consumers. The integration of Internet of Things (IoT) technology within microgrids enhances real-time monitoring and management of energy resources, thereby improving both efficiency and reliability. Microgrids have the capability to share electricity resources with other microgrids, local communities, or contribute surplus energy to the national grid. However, maintaining the integrity of participants in these energy exchanges is crucial. It is necessary to distinguish between trustworthy and potentially malicious microgrids or consumers. To address this issue, the implementation of a MicroTrust mechanism is proposed. The results demonstrate that MicroTrust effectively evaluates the trustworthiness of CCs and various requesting consumer nodes, providing a robust solution for secure and reliable energy sharing.
... • Machine Learning Approach: Many researchers have used machine learning to detect power theft. For example, a robust deep learning model using GoogleNet and gated recurrent units (GRU) has been developed [94]. • Cryptography and key management: Communication security and privacy are SG systems' top priorities. ...
Article
Full-text available
The rapid integration of Information and Communication Technology (ICT) is transforming the traditional electrical grid into a Smart Grid. Smart grids enable two-way communication and improved monitoring and control between utilities and customers. However, due to its heterogeneous nature, public exposure, and weak security at low-powered devices, the smart grid has vulnerabilities to various malicious threats, adversaries, and cyber attacks, which may affect cost and service availability. Additionally, when the systems’ confidentiality, integrity, or availability are compromised, the resulting fallout can threaten national security and have cascading effects on human lives. Given the extreme consequences of an attack, smart-grid technology must be thoroughly tested for correct operation and security before it is deployed. As a result, vulnerability testing of smart grids, not only for correctness but for security purposes, has been the subject of numerous studies by academics, government agencies, and private companies. This paper reviews the vulnerabilities associated with the smart grid and spotlights simulation as the vulnerability testing methodology conducted in recent pertinent research works. It also presents various security aspects of the smart grid, including grid applications, system and network infrastructure and components, cyber threats and attacks, different mitigation techniques, and simulation for security aspects. Finally, we analyze the gaps in the current research, focusing on integrating cybersecurity analysis in simulation. We then recommend future research directions focused on smart grid cyber ranges.
... Recently, some authors have proposed attention mechanisms to overcome some issues inherent to fraud detection problems, such as data labeling, data imbalance, and loss of historical information [17], [18]. In addition, a hybrid deep learning model using temporal least squares generative adversarial least squares networks has proven effective in overcoming class imbalance [19]. ...
Article
Full-text available
Non-technical losses (NTL) pose multiple challenges across distribution grids. This paper introduces a comprehensive framework combining data-driven methods to detect and categorize fraud due to meter tampering or direct connections while identifying potential culprits. The hybrid methodology utilizes grid and consumer-related data to obtain NTL curves through an energy balance approach, yielding indicators such as magnitude, duration, and other features. A Random Forest classifier trained with real historical cases of NTL achieves a weighted F1 score of 0.859, effectively labelling fraud types. Additionally, an unsupervised detection model, integrating clustering and correlation methods, enables accurate identification of tampered meters. The paper introduces two adjustable parameters enabling utilities to fine-tune meter tampering detection strategies based on economic considerations. The results demonstrate that true positives can be increased at the expense of increasing false positives. Accurate fraud identification is achieved using the Fuzzy C-Means algorithm, with an F1 score of 0.9. The algorithm is tested on grids with distributed generation, with a decrease of 10% on the predictive performance when half of the users are prosumers, demonstrating the methodology’s promising performance in real-world scenarios.
... The estimate of TLs is required for the tracking of NTLs. ET is a deliberate act of illegal electricity usage, a significant source of NTLs [7,8]. The ET severely threatens SGs because it causes monetary losses. ...
Article
Full-text available
The most significant issue today is electricity theft (ET) which causes much loss to electricity boards. The development of smart grids (SGs) is crucial for ET detection (ETD) because these systems produce enormous amounts of data, including information on customer consumption, which can be used to identify ET using machine learning and deep learning (DL) techniques. However, the existing models majorly suffers with lower prediction accuracy because of over-fitting and dataset imbalancing issues. Therefore, to overcome these shortcomings, this paper proposes a novel DL approach for ETD in the Internet of Things-based SGs using parameter-tuned bidirectional long short-term memory (PTBiLSTM) with pre-trained feature learning model. The proposed system mainly comprises '4' phases: preprocessing, dataset balancing, feature selection, and ETD. Initially, the consumers’ electricity consumption data are collected from the theft detection dataset 2022 (TDD2022) dataset. Then, the data balancing is carried out by using Gaussian distribution, including fuzzy C-means approach to handle the imbalance data. Afterward, the meaningful features from the balanced dataset are extracted using the hard swish and dropout layer included residual neural network-50 (ResNet-50) model. Finally, the ETD is done, which utilizes a PTBiLSTM. The proposed models’ performance is evaluated using different performance metrics like accuracy, precision, recall, f-measure, the area under the curve, and kappa. The outcomes proved the efficiency of the proposed method over other related schemes in the ETD of SGs.
... The proposed various stand-alone oversampling techniques like synthetic majority oversampling technique (SMOTE), adaptive synthetic sampling (ADAYN), random oversampling (ROS) and undersampling techniques like random undersampling (RUS), random undersampling boosting (RUSBOOST) and near-miss (NM) [9][10][11][12]. The oversampling techniques generate the synthetic data points in the minority class, making the classifier overfit [13]. While in the undersampling, the samples are randomly removed the samples from the majority class, which leads the classifier toward underfitting. ...
Article
This research presents the development and implementation of an integrated artificial intelligence model for electricity theft detection, combining Convolutional Neural Networks (CNN) and Support Vector Machines (SVM). The primary objective was to create a more accurate, efficient, and scalable method for identifying fraudulent electricity consumption patterns. Our CNN-SVM hybrid model leverages CNNs for automatic feature extraction from complex consumption data and SVMs for effective classification. This synergy allows for superior performance in detecting subtle anomalies indicative of electricity theft. The methodology involved pre-processing a large dataset of electricity consumption records, training the CNN to extract relevant features, and optimising the SVM classifier to distinguish between normal and fraudulent patterns. We evaluated the model's performance using metrics including accuracy, precision, recall, F1-score, and ROC AUC. Results demonstrated that our integrated CNN-SVM model significantly outperformed conventional machine learning techniques and standalone models in electricity theft detection. The model achieved an accuracy of 96.6%, with a precision of 97.2% and a recall of 96.1%. Comparative analysis against other state-of-the-art algorithms revealed consistently superior performance across all evaluation metrics. To enhance practical applicability, we developed and deployed a web application that implements the model, allowing for user-friendly interaction and real-time theft detection. This addition bridges the gap between research and real-world implementation, providing utility companies with an accessible tool for fraud detection. The study also explored the model's potential for real-time application and scalability to large-scale utility operations. Our findings suggest that the computational efficiency of the CNN-SVM model, coupled with the web application, offers utility companies a powerful and accessible tool for rapid response to potential fraud. This research contributes to the field of electricity theft detection by introducing a novel, high-performance AI model with a practical web-based implementation. The proposed approach not only improves detection accuracy but also offers potential for immediate real-world application, paving the way for more effective fraud prevention in the utility sector.
Article
Full-text available
Electricity theft presents a significant financial burden to utility companies globally, amounting to trillions of dollars annually. This pressing issue underscores the need for transformative measures within the electrical grid. Accordingly, our study explores the integration of block chain technology into smart grids to combat electricity theft, improve grid efficiency, and facilitate renewable energy integration. Block chain’s core principles of decentralization, transparency, and immutability align seamlessly with the objectives of modernizing power systems and securing transactions within the electricity grid. However, as smart grids advance, they also become more vulnerable to attacks, particularly from smart meters, compared to traditional mechanical meters. Our research aims to introduce an advanced approach to identifying energy theft while prioritizing user privacy, a critical aspect often neglected in existing methodologies that mandate the disclosure of sensitive user data. To achieve this goal, we introduce three distributed algorithms: lower–upper decomposition (LUD), lower–upper decomposition with partial pivoting (LUDP), and optimized LUD composition (OLUD), tailored specifically for peer-to-peer (P2P) computing in smart grids. These algorithms are meticulously crafted to solve linear systems of equations and calculate users’ “honesty coefficients,” providing a robust mechanism for detecting fraudulent activities. Through extensive simulations, we showcase the efficiency and accuracy of our algorithms in identifying deceitful users while safeguarding data confidentiality. This innovative approach not only bolsters the security of smart grids against energy theft, but also addresses privacy and security concerns inherent in conventional energy-theft detection methods.
Article
Full-text available
Electricity theft is one of the main causes of non-technical losses and its detection is important for power distribution companies to avoid revenue loss. The advancement of traditional grids to smart grids allows a two-way flow of information and energy that enables real-time energy management, billing and load surveillance. This infrastructure enables power distribution companies to automate electricity theft detection (ETD) by constructing new innovative data-driven solutions. Whereas, the traditional ETD approaches do not provide acceptable theft detection performance due to high-dimensional imbalanced data, loss of data relationships during feature extraction and the requirement of experts' involvement. Hence, this paper presents a new semi-supervised solution for ETD, which consists of relational denoising autoencoder (RDAE) and attention guided (AG) TripleGAN, named as RDAE-AG-TripleGAN. In this system, RDAE is implemented to derive features and their associations while AG performs feature weighting and dynamically supervises the AG-TripleGAN. As a result, this procedure significantly boosts the ETD. Furthermore, to demonstrate the acceptability of the proposed methodology over conventional approaches, we conducted extensive simulations using the real power consumption data of smart meters. The proposed solution is validated over the most useful and suitable performance indicators: area under the curve, precision, recall, Matthews correlation coefficient, F1-score and precision-recall area under the curve. The simulation results prove that the proposed method efficiently improves the detection of electricity frauds against conventional ETD schemes such as extreme gradient boosting machine and transductive support vector machine. The proposed solution achieves the detection rate of 0.956, which makes it more acceptable for electric utilities than the existing approaches.
Article
Full-text available
The significance of electricity cannot be overlooked as all fields of life like material production, health care, educational sector, etc., depend upon it to render consistent and high-quality services, increase productivity and business continuity. To this end, energy operators have experienced a continuous increasing trend in the electricity demand for the past few decades. This may cause many issues like load shedding, increased electricity bills, imbalance between supply and demand, etc. Therefore, forecasting of electricity demand using efficient techniques is crucial for the energy operators to decide about optimal unit commitment and to make electricity dispatch plans. It also helps to avoid wastage as well as the shortage of energy. In this study, a novel forecasting model, known as ELS-net is proposed, which is a combination of an Ensemble Empirical Mode Decomposition (EEMD) method, multi-model Ensemble Bi Long Short-Term Memory (EBiLSTM) forecasting technique and Support Vector Machine (SVM). In the proposed model, EEMD is used to distinguish between linear and non-linear intrinsic mode functions (IMFs), EBiLSTM is used to forecast the non-linear IMFs and SVM is employed to forecast the linear IMFs. Using separate forecasting techniques for linear and non-linear IMFs decreases the computational complexity of the model. Moreover, SVM requires low computational time as compared to EBiLSTM for linear IMFs. Simulations are performed to examine the effectiveness of the proposed model using two different datasets: New South Wales (NSW) and Victoria (VIC). For performance evaluation, Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) are used as performance metrics. From the simulation results, it is obvious that the proposed ELS-net model outperforms the start-of-the-art techniques, such as EMD-BILSTM-SVM, EMD-PSO-GA-SVR, BiLSTM, MLP and SVM in terms of forecasting accuracy and minimum execution time.
Conference Paper
Full-text available
Electricity theft detection issue has drawn lots of attention during last decades. Timely identification of the electricity theft in the power system is crucial for the safety and availability of the system. Although sustainable efforts have been made, the detection task remains challenging and falls short of accuracy and efficiency, especially with the increase of the data size. Recently, convolutional neural network-based methods have achieved better performance in comparison with traditional methods, which employ handcrafted features and shallow-architecture classifiers. In this paper, we present a novel approach for automatic detection by using a multi-scale dense connected convolution neural network (multi-scale DenseNet) in order to capture the long-term and short-term periodic features within the sequential data. We compare the proposed approaches with the classical algorithms, and the experimental results demonstrate that the multi-scale DenseNet approach can significantly improve the accuracy of the detection. Moreover, our method is scalable, enabling larger data processing while no handcrafted feature engineering is needed.
Article
Full-text available
With the ever-growing demand of electric power, it is quite challenging to detect and prevent Non-Technical Loss (NTL) in power industries. NTL is committed by meter bypassing, hooking from the main lines, reversing and tampering the meters. Manual on-site checking and reporting of NTL remains an unattractive strategy due to the required manpower and associated cost. The use of machine learning classifiers has been an attractive option for NTL detection. It enhances data-oriented analysis and high hit ratio along with less cost and manpower requirements. However, there is still a need to explore the results across multiple types of classifiers on a real-world dataset. This paper considers a real dataset from a power supply company in Pakistan to identify NTL. We have evaluated 15 existing machine learning classifiers across 9 types which also include the recently developed CatBoost, LGBoost and XGBoost classifiers. Our work is validated using extensive simulations. Results elucidate that ensemble methods and Artificial Neural Network (ANN) outperform the other types of classifiers for NTL detection in our real dataset. Moreover, we have also derived a procedure to identify the top-14 features out of a total of 71 features, which are contributing 77% in predicting NTL. We conclude that including more features beyond this threshold does not improve performance and thus limiting to the selected feature set reduces the computation time required by the classifiers. Last but not least, the paper also analyzes the results of the classifiers with respect to their types, which has opened a new area of research in NTL detection.
Article
Soil water content in maize root zone is the main basis of irrigation decision-making. Therefore, it is important to predict the soil water content at different depths in maize root zone for rational agricultural irrigation. This study proposed a hybrid convolutional neural network-gated recurrent unit (CNN-GRU) integrated deep learning model that combines a CNN with strong feature expression capacity and a GRU neural network with strong memory capacity. The model was trained and tested with the soil water content and meteorological data from five representative sites in key maize producing areas, Shandong Province, China. We designed the model structure and selected the input variables based on a Pearson correlation analysis and soil water content auto-correlation analysis. The results showed that the hybrid CNN-GRU model performed better than the independent CNN or GRU model with respect to prediction accuracy and convergence rate. The average mean squared error (MSE), mean absolute error and root mean squared error of the hybrid CNN-GRU model on day 3 were 0.91, 0.51 and 0.93, respectively. The prediction accuracy of the model improved with increasing soil depth. Extending the forecast period, the prediction accuracy values of the hybrid CNN-GRU model for the soil water content on days 5, 7 and 10 were comparable, with an average MSE of less than 1.0.
Article
The theft of electricity affects power supply quality and safety of grid operation, and non-technical losses (NTL) have become the major reason of unfair power supply and economic losses for power companies. For more effective electricity theft inspection, an electricity theft detection method based on similarity measure and decision tree combined K-Nearest Neighbor and support vector machine (DT-KSVM) is proposed in the paper. Firstly, the condensed feature set is devised based on feature selection strategy, typical power consumption characteristic curves of users are obtained based on kernel fuzzy C-means algorithm (KFCM). Next, to solve the problem of lack of stealing data and realize the reasonable use of advanced metering infrastructure (AMI). One dimensional Wasserstein generative adversarial networks (1D-WGAN) is used to generate more simulated stealing data. Then the numerical and morphological features in the similarity measurement process are comprehensively considered to conduct preliminary detection of NTL. And DT-KSVM is used to perform secondary detection and identify suspicious customers. At last, simulation experiments verify the effectiveness of the proposed method.
Article
Advanced metering infrastructure allows the two-way sharing of information between smart meters and utilities. However, it makes smart grids more vulnerable to cyber-security threats such as energy theft. This study suggests ensemble machine learning (ML) models for the detection of energy theft in smart grids using customers’ consumption patterns. Ensemble ML models are meta-algorithms that create a pool of several ML approaches and combine them smartly into one predictive model to reduce variance and bias. A number of algorithms, including adaptive boosting, categorical boosting, extreme-boosting, light boosting, random forest, and extra trees, were tested to find their false positive and detection rates. A data pre-processing method was employed to improve detection performance. The statistical approach of minority over-sampling was also employed to tackle over-fitting. An extensive analysis based on a practical dataset of 5000 customers reveals that bagging models outperform other algorithms. The random forest and extra trees models achieve the highest area under the curve score of 0.90. The precision analysis shows that the proposed bagging methods perform better.
Article
A Ubiquitous Power Internet of Things is fundamentally an Internet of Things focused on power systems. Being able to predict electricity prices accurately may help power producers identify customer needs and regulate the power grid effectively, and it may also help electricity traders manage risks, make correct decisions, and obtain greater benefits. In this paper, a novel hybrid model is proposed for short-term electricity price prediction. The model, called SEPNet for convenience, consists of three algorithms: Variational Mode Decomposition (VMD), a Convolutional Neural Network (CNN), and a Gated Recurrent Unit (GRU). The annual electricity price data is divided into seasons because of the seasonal differences in the electricity price time series. The VMD algorithm decomposes the complex time series of electricity prices into intrinsic mode functions (IMFs) with different center frequencies. The CNN then extracts time-domain features from all the IMFs in the VMD domain, and the GRU processes and learns the time-domain features extracted by the CNN to produce the final prediction. A comparison is made with five models: LSTM, CNN, VMD-CNN, BP, and VMD-ELMAN. The results showed that the proposed model had the best performance, including against CNN and VMD-CNN. Using VMD improved the Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) over the four seasons by 84% and 81%, respectively, and adding the GRU in the SEPNet model further improved the MAPE and RMSE by 19% and 25%, respectively. The MAPE and RMSE averaged over the four seasons are 0.730% and 0.453, respectively. This confirms that the SEPNet model is feasible and highly accurate for predicting short-term electricity prices.
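The decompose-then-forecast idea behind SEPNet can be sketched as follows: the price series is split into IMFs by VMD, the IMFs are stacked as channels of a sliding window, and a CNN-GRU head produces the next-step prediction. The vmdpy package and its call signature are assumptions (the paper does not name a library), and all sizes are illustrative.

# Minimal sketch of VMD decomposition followed by a CNN-GRU forecasting head.
# The vmdpy package is an assumption; install with: pip install vmdpy
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from vmdpy import VMD   # assumed third-party implementation of Variational Mode Decomposition

prices = np.random.rand(2000)       # placeholder hourly price series (even length for VMD)
K = 5                               # number of IMFs, illustrative
# assumed call order: signal, alpha, tau, K, DC, init, tol; returns modes, spectra, frequencies
imfs, _, _ = VMD(prices, 2000, 0.0, K, 0, 1, 1e-7)    # imfs has shape (K, len(prices))

# sliding windows: each sample is (window length, K channels); target is the next price
WIN = 24
X = np.stack([imfs[:, i:i + WIN].T for i in range(len(prices) - WIN)])   # (samples, WIN, K)
y = prices[WIN:]

model = models.Sequential([
    layers.Input(shape=(WIN, K)),
    layers.Conv1D(32, 3, padding="same", activation="relu"),  # time-domain features across IMFs
    layers.GRU(64),                                           # temporal memory over the window
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=64, verbose=0)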
Article
Inspired by the powerful feature extraction and data reconstruction abilities of the autoencoder, a stacked sparse denoising autoencoder is developed for electricity theft detection in this paper. The technical route is to employ the electricity data of honest users as the training samples, so that the autoencoder learns effective features from these data and reconstructs the inputs as closely as possible. Since anomalous behavior contributes little to the autoencoder's training, the detector returns a comparatively high reconstruction error for it; theft users can therefore be recognized by setting an appropriate error threshold. To improve the feature extraction ability and robustness, sparsity and noise are introduced into the autoencoder, and the particle swarm optimization algorithm is applied to optimize the hyper-parameters. Moreover, the receiver operating characteristic curve is used to estimate the optimal error threshold. Finally, the proposed approach is evaluated and verified on an electricity dataset from Fujian, China.
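The reconstruction-error detection logic can be sketched briefly: train a sparse denoising autoencoder on honest users only, then flag records whose reconstruction error exceeds a threshold. In the sketch below the threshold is taken as a simple percentile of the honest-user errors, as a stand-in for the ROC-based estimate, and the layer sizes, noise level and sparsity weight are illustrative assumptions.

# Minimal sketch of a sparse denoising autoencoder for reconstruction-error-based theft detection.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

DIM = 48   # assumed: 48 half-hourly readings per daily profile

def build_autoencoder():
    inputs = layers.Input(shape=(DIM,))
    # the L1 activity regulariser imposes sparsity on the hidden code
    h = layers.Dense(16, activation="relu",
                     activity_regularizer=regularizers.l1(1e-4))(inputs)
    outputs = layers.Dense(DIM, activation="sigmoid")(h)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

ae = build_autoencoder()
honest = np.random.rand(1000, DIM).astype("float32")          # placeholder honest-user profiles
noisy = honest + 0.1 * np.random.randn(*honest.shape)         # denoising: corrupt the inputs
ae.fit(noisy, honest, epochs=5, batch_size=32, verbose=0)     # reconstruct the clean profiles

def reconstruction_error(x):
    return np.mean((ae.predict(x, verbose=0) - x) ** 2, axis=1)

# percentile threshold used here as a simple stand-in for the paper's ROC-based choice
threshold = np.percentile(reconstruction_error(honest), 95)
suspicious = reconstruction_error(np.random.rand(10, DIM).astype("float32")) > threshold
print(suspicious)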
Article
Non-technical losses (NTLs) are estimated to be considerable and to increase every year. Recently, high-resolution measurements from globally deployed smart meters have brought deeper insights into users' consumption patterns that can potentially be exploited for NTL detection. However, consumption-pattern-based NTL detection now faces two major challenges: the inefficiency of handling high dimensionality and the severe lack of fraudulent samples. To overcome them, an NTL detection model based on deep learning and anomaly detection is proposed in this article, namely the bidirectional Wasserstein GAN and support vector data description-based NTL detector (BSBND). Motivated by the powerful ability of generative adversarial networks (GANs) to learn deep representations from high-dimensional data distributions, the BSBND uses a BiWGAN for feature extraction from high-dimensional raw consumption records, and a one-class classifier trained only on benign samples, SVDD, is adopted to map the features into judgments. Moreover, a novel alternate coordinating algorithm is proposed to optimize the cooperation between the upstream BiWGAN and the downstream SVDD, and an interpreting algorithm is proposed to visualize the basis of each fraudulent judgment. Case studies demonstrate the superiority of the BSBND over state-of-the-art methods, the powerful feature extraction ability of the BiWGAN, and the effectiveness of the proposed coordinating and interpreting algorithms.
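The two-stage structure (feature extraction followed by a one-class decision) can be illustrated with a short sketch. scikit-learn has no SVDD implementation, so a one-class SVM with an RBF kernel is used here as a closely related stand-in, and a plain (untrained) encoder stands in for the BiWGAN feature extractor; everything below is an illustrative assumption rather than the BSBND itself.

# Minimal sketch of "compress high-dimensional consumption records, then fit a
# one-class model on benign samples only"; not the BSBND implementation.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.svm import OneClassSVM

DIM, CODE = 365, 16     # assumed: a year of daily readings compressed to a 16-dim feature code

# stand-in encoder; in the paper this role is played by a BiWGAN trained adversarially
encoder = models.Sequential([
    layers.Input(shape=(DIM,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(CODE),
])

benign = np.random.rand(2000, DIM).astype("float32")    # placeholder benign records
features = encoder.predict(benign, verbose=0)

# one-class model trained on benign features only (OneClassSVM used in place of SVDD)
ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(features)

# new records: +1 means consistent with benign behaviour, -1 means flagged as suspicious NTL
new_records = np.random.rand(5, DIM).astype("float32")
print(ocsvm.predict(encoder.predict(new_records, verbose=0)))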