All content in this area was uploaded by Nadeem Javaid on Nov 15, 2021
COMSATS University Islamabad, Islamabad Campus
Synopsis For the Degree of ✓□ M.S/MPhil. □PhD.
Name of Student Faisal Shehzad
Department Department of Computer Science
Registration No. FA19-RCS-039
Date of Thesis Registration March 29, 2021
Name of (i) Research Supervisor Dr. Nadeem Javaid
Name of (ii) Co-Supervisor Dr. Mariam Akbar
Research Area Data science in Micro Grids
Members of Supervisory Committee
1. Dr. Nadeem Javaid
2. Dr. Mariam Akbar
3. Dr. Saif Ur Rehman Khan
Title of Research Proposal Using Artificial Intelligence for Detecting Electricity Thefts in Smart
Grids and Predicting Trends in Financial Markets (MS Synopsis)
Signature of Student: Faisal Shehzad
Summary of the Research
In this synopsis, the first solution introduces a hybrid deep learning model that tackles the class
imbalance problem, the curse of dimensionality and the low detection rate of existing models. The
proposed model integrates the benefits of both GoogLeNet and the gated recurrent unit (GRU). The
one dimensional electricity consumption (EC) data is fed into the GRU to remember periodic patterns,
whereas the GoogLeNet model is leveraged to extract latent features from the two dimensional weekly
stacked EC data. Furthermore, the time series least square generative adversarial network is
proposed to solve the class imbalance problem.
The second solution presents a framework, which is employed to solve the curse of dimensionality
issue. In the literature, existing studies are mostly concerned with tuning the hyperparameters of
ML/DL methods for efficient detection of non-technical losses (NTLs). Some of them focus on the
selection of prominent features from data to improve the performance of electricity theft detection.
However, the curse of dimensionality affects the generalization ability of ML/DL classifiers and
leads to computational, storage and overfitting problems. Therefore, to deal with the above-mentioned
issues, this study proposes a system based on metaheuristic techniques (artificial bee colony and
genetic algorithm) and a denoising autoencoder for electricity theft detection using big data in
electric power systems.
The third solution introduces a hybrid deep learning model for the prediction of upward and
downward trends in financial market data. The financial market exhibits complex and volatile
behavior that is difficult to predict using conventional machine learning (ML) and statistical
methods, as well as shallow neural networks. Its behavior depends on many factors such as political
upheavals, investor sentiment, interest rates, government policies, natural disasters, etc. However,
it is possible to predict upward and downward trends in financial market behavior using complex
DL models.
In this synopsis, we propose three solutions to different issues in smart grids and the
financial market. The proposed solutions will be validated in the thesis work using real-world
datasets.
Table 1: List of abbreviations
Abbreviation Full form
ADASYN Adaptive synthetic sampling approach
AMI Advanced metering infrastructure
APPL Apple technology company
ANN Artificial neural network
Adam Adaptive moment estimation
ABC Artificial bee colony
Adagrad Adaptive Gradient Algorithm
BA British airways
CNN Convolutional neural network
CPBETD Consumption pattern based electricity theft detector
Catboost Categorical boosting
D Discriminator
DR Detection rate
DL Deep learning
DT Decision tree
ETD Electricity theft detection
EC Electricity consumption
EMH Efficient-market hypothesis
FPR False positive rate
FP False positive
FN False negative
GA Genetic algorithm
GRU Gated recurrent unit
G Generator
I Electric current
IBM International Business Machines
KNN k-nearest neighbors
LReLU Leaky rectified linear unit
LSTM Long short term memory
LR Logistic regression
LSGAN Least square generative adversarial network
LightGBM Light gradient boosting machine
ML Machine learning
SVM Support vector machine
SGCC Smart grid corporation of China
SMOTE Synthetic minority over-sampling technique
MR Modification rate
MLP Multilayer perceptron
MAD Mean absolute deviation
MSE Mean square error
NTLs Non technical losses
NB Naive bayes
NaN Not a number
TimeGAN Time series LSGAN
TPR True positive rate
TSR Three sigma rule
SSDAE Stacked sparse denoising autoencoder
R Resistance
RUS Random undersampling
RF Random forest
ROS Random oversampling
RNN Recurrent neural network
RDBN Real-valued deep belief network
HRG ResNet and GRU
RMSProp Root mean squared propagation
PR-AUC Precision recall-area under curve
PCA Principal component analysis
PRECON Pakistan residential electricity consumption
ROC-AUC Receiver operating characteristic - area under curve
ROC curve Receiver operating characteristic curve
TLs Technical losses
TP True positive
TN True negative
VGG Visual geometry group
WMT Wal-Mart Stores
WDCNN Wide and deep convolutional neural network
XGBoost eXtreme gradient boosting
Table 2: List of acronyms
Symbol Description
ĥ Candidate value
h_t Hidden state
h_{t−1} Previous hidden state
h_GoogLeNet Last hidden layer of GoogLeNet
h_GRU Last hidden layer of GRU
h_InGRU Hidden layer of hybrid module
r Reset gate
W_InGRU Weight of hybrid module
W Candidate value weight
W_r Reset gate weight
W_z Update gate weight
x̄ Average of daily electricity consumption
x_i Daily electricity consumption per hour
z Update gate
σ Sigmoid function
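For orientation, the gate symbols in Table 2 match the widely used GRU formulation, sketched below. The synopsis does not state these equations at this point, so the input symbol x_t, the omission of bias terms and the concatenation notation [·,·] are assumptions made for illustration; ⊙ denotes element-wise multiplication.

```latex
\begin{align}
z_t &= \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \\
r_t &= \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \\
\hat{h}_t &= \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t
\end{align}
```

Some references swap the roles of z_t and 1 − z_t in the last line; the two conventions are equivalent up to relabeling the update gate.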
1 Introduction
Data science is an interdisciplinary field that uses scientific methods, advanced algorithms
and computer software to extract valuable knowledge from noisy, structured and unstructured
data, and applies the extracted knowledge to a wide range of domains. It draws on multiple
fields: physics, mathematics, computer science, statistics, domain knowledge, etc. It has
applications in multiple fields:
Figure 1. Data science pillars (labels shown in the figure: computer science; machine learning;
math, physics & statistics; software development; research and innovative ideas; domain
knowledge; data science)
1. Fraud and risk detection in banking: The earliest application of data science was in the
banking sector. Banking systems collect customer information through paperwork when
giving loans or other products. They then apply advanced machine learning, deep learning
and data mining methods to extract useful insights, which indicate the probability of risk
or default. Moreover, this analysis also helps them to increase the purchasing power of
consumers by learning their habits.
2. Health care: The health care sector receives tremendous benefits from data science, for
example medical image analysis to identify the areas where cancerous or malignant cells are
growing rapidly, and genetics and genomics analysis to understand the relationship between
disease, drug response and the genetics of an individual person.
3. Recommendation systems (Netflix, Amazon and eBay): Recommendation systems are designed
to recommend new items or articles to users on the basis of their habits and other factors.
Big companies like Netflix, Amazon, eBay, YouTube, etc. utilize these systems to recommend
products to their clients. Nowadays, data scientists use advanced clustering, classification
and forecasting models to extract complex patterns in clients’ behaviors, which help
companies in targeted marketing.
4. Computer vision: It is one of the hottest research topics in data science. The tagging
feature that we interact with when using Facebook is one of the best examples of data
science in computer vision. So, data science has become part of our lives knowingly or
unknowingly. It has applications in many other fields like sentiment analysis, targeted
marketing, gaming, natural language processing, etc.
In this synopsis, we analyze the applications of data science in two domains. In the first phase,
data science techniques are applied to detect anomalies in electricity consumption (EC) data,
which is collected with the help of smart grids. The first phase consists of two solutions. In
the first solution, the feature extraction and classification steps are performed with the help of
a hybrid deep learning model, while the second solution focuses only on the curse of
dimensionality issue, which is handled using a denoising autoencoder and metaheuristic
techniques. In the second phase, upward and downward trends are predicted in the financial
market for the benefit of potential investors. A detailed description of both phases is given below.
Electricity is an important factor in human lives and becomes essential for the economic and
social development of any country. Various losses occur in the transmission and distribution of
electricity, namely technical losses and non-technical losses (NTLs). The technical losses occur
due to the dissipation of energy in transmission lines, transformers, and electrical equipment,
while the NTLs occur due to direct connection to transmission lines, meter tampering, faulty
meters and changes in meter readings through communication links. Moreover, a recent report
shows that NTLs cause $96 billion of revenue loss every year [1]. According to the World
Bank’s report, India, China and Brazil bear 25%, 6% and 16% loss on their total electric sup-
ply, respectively. The NTLs are not limited to only developing countries; it is estimated that
developed countries like UK and US also lose 232 million and 6 billion US dollars per annum,
respectively [2–4].
Electricity theft is a primary cause of NTLs. The evolution of advanced metering
infrastructure (AMI) promises to overcome electricity theft through monitoring users’ consumption
history. However, it introduces new types of cyber-attacks, which are difficult to detect using
conventional methods. Traditional meters can only be compromised through physical tampering,
whereas in AMI the meter readings can be tampered with both locally and remotely over the
communication links before they are sent to an electric utility [5–16]. There are three types of
approaches to address the NTLs in AMI: state based, game theory based and data driven. State
based approaches exploit wireless sensors and radio frequency identification tags to detect NTLs.
However, these approaches have high installation, maintenance and training costs and they also
perform poorly in extreme weather conditions [17–25]. Game theory based approaches hold a game
between a power utility and consumers to achieve an equilibrium state and then extract hidden
patterns from users’ EC history. However, it is difficult to design a suitable utility function for
utilities, regulators, distributors and energy thieves to achieve the equilibrium state within the
defined time [26–32]. Moreover, both of these NTL detection approaches have a low detection rate
(DR) and a high false positive rate (FPR).
The data driven methods receive high attention due to the availability of electricity consumption
(EC) data that is collected through AMI. A normal consumer’s EC follows a statistical pattern,
whereas abnormal EC does not follow any pattern (the words theft and abnormal are used
interchangeably, as are the words benign and normal). The machine learning (ML) and data
mining techniques are trained on the collected data to learn normal and abnormal consumption
patterns. After training, the model is deployed in a smart grid to classify incoming consumers’
data into normal or abnormal samples. Since these techniques use already available data and
do not require hardware devices to be deployed at consumers’ sites, their installation and
maintenance costs are low as compared to hardware based methods. However, the class imbalance
problem is a serious issue for data driven methods, where the number of normal EC samples is
larger than the number of theft ones. Normal data is easily collected through users’ consumption
history, whereas theft cases are relatively rare in the real world, which is why only a few theft
samples are present in users’ consumption history. The lack of theft samples affects
the performance of classification models. The ML models become biased towards the majority
class and ignore the minority class, which increases the FPR [33–44]. In the literature, authors
mostly use random undersampling (RUS) and random oversampling (ROS) techniques to handle
the class imbalance problem. However, these techniques suffer from underfitting and overfitting
issues, respectively, which increase the FPR and reduce the DR [5,45–57]. The second challenging
issue is the curse of dimensionality. A time series dataset contains a large number of timestamps
(features), which increase both execution time and memory complexity and reduce the
generalization ability of ML methods. As a result, traditional ML methods have low DR and
overfitting issues. They also require domain knowledge to extract prominent features, which is a
time consuming task [2,5]. Moreover, metaheuristic techniques are inspired by the working
mechanisms of nature. In the literature, these techniques are mostly utilized for optimization
and feature selection purposes [58].
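The two baseline resampling strategies named above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names, the label convention (1 for theft, 0 for normal) and the toy data are assumptions and do not come from the cited studies.

```python
import random
from collections import Counter


def _split_indices(labels, minority_label):
    """Return (minority indices, majority indices) for a binary label list."""
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    return minority, majority


def random_oversample(samples, labels, minority_label=1, seed=0):
    """ROS: duplicate randomly chosen minority samples until classes match."""
    rng = random.Random(seed)
    minority, majority = _split_indices(labels, minority_label)
    n_extra = max(len(majority) - len(minority), 0)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    keep = majority + minority + extra
    return [samples[i] for i in keep], [labels[i] for i in keep]


def random_undersample(samples, labels, minority_label=1, seed=0):
    """RUS: keep only as many randomly chosen majority samples as minority ones."""
    rng = random.Random(seed)
    minority, majority = _split_indices(labels, minority_label)
    kept_majority = rng.sample(majority, min(len(majority), len(minority)))
    keep = kept_majority + minority
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy data: 8 normal consumers (label 0) and 2 theft cases (label 1).
samples = [[float(i), float(i) + 0.5] for i in range(10)]
labels = [0] * 8 + [1] * 2

x_ros, y_ros = random_oversample(samples, labels)
x_rus, y_rus = random_undersample(samples, labels)
print("ROS class counts:", Counter(y_ros))
print("RUS class counts:", Counter(y_rus))
```

ROS only repeats existing theft records, which is one way to see the overfitting risk, while RUS throws away most normal records, which is one way to see the information loss behind underfitting.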
In this synopsis, a time series least square generative adversarial network (TLSGAN) is
proposed, which is specifically designed to handle the data imbalance problem of time series
datasets. It utilizes supervised and unsupervised loss functions and gated recurrent unit (GRU)
layers to generate fake theft samples, which have high resemblance with real world theft samples.
In contrast, a standard GAN uses only an unsupervised loss function to generate fake theft
samples, which have low resemblance with real world theft samples. Moreover, an HG2 model is
proposed, which is a hybrid of GoogLeNet and GRU. It is a challenging task to capture long-term
periodicity from a one dimensional (1D) time series dataset. The deep learning (DL) models have a
better ability to memorize sequence patterns as compared to traditional ML models. The 1D data is
fed into the GRU to capture temporally correlated patterns from users’ consumption history,
whereas weekly consumption data is passed to GoogLeNet to capture local features from sequence
data using the inception modules. Each inception module contains multiple convolutional and
max-pooling layers that extract high level features from time series data and overcome the curse
of dimensionality issue. Moreover, non malicious factors like a change in the number of persons in
a house, extreme weather conditions, weekends, a big party in a house, etc., affect the performance
of ML methods. The GRU is used to handle non malicious factors because it has memory modules.
These memory modules help the GRU to learn sudden changes in consumption patterns and
memorize them, which decreases the FPR. Moreover, dropout and batch normalization layers are
used to enhance convergence speed and model generalization ability and to increase the DR. The
main contributions of solution 1 are given below.
• a hybrid model is proposed that is a combination of GRU and GoogLeNet. The former
extracts the temporal patterns from the EC dataset while the latter retrieves latent or abstract
features that cannot be observed by the human eye. The self-learning mechanism of the
hybrid model increases convergence speed, accuracy and overall performance. Moreover,
we work on 1D and 2D data in parallel. The 1D data is fed into GRU to learn time-related
patterns, whereas 2D data is passed to GoogLeNet to capture latent features from weekly
consumption,
• the class imbalance problem is a severe issue in ETD that drastically affects the
performance of classifiers. TimeGAN is exploited to generate fake samples from existing theft
patterns to tackle the class imbalance problem,
• extensive experiments are conducted on a realistic EC dataset that is provided by smart
grid corporation of China (SGCC), the largest smart grid company in China. Moreover,
different performance indicators are utilized to evaluate the performance of the proposed
model: receiver operating characteristic curve (ROC curve), ROC-area under the curve
(ROC-AUC), precision recall curve (PR curve) and PR-area under the curve (PR-AUC),
• the GRU model is utilized to handle non malicious factors like sudden changes in
consumption patterns due to an increase in family members, a change in weather conditions, etc.
It has memory modules to remember the consumption history of a user and compares the
current input with the user’s previously saved history before giving the final prediction about
whether an anomaly is present and
• batch normalization and dropout layers are used to enhance the convergence speed of the
model and reduce the overfitting issue.
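The 1D and 2D views of the same consumption series mentioned in the first contribution can be sketched as follows. This is a minimal illustration: the function name `stack_weekly`, the seven-day row width and the choice to drop a trailing incomplete week are assumptions, since the synopsis does not specify how weeks are aligned or padded.

```python
def stack_weekly(daily_readings, days_per_week=7):
    """Reshape a 1D list of daily readings into rows of one week each.

    Assumption: readings that do not fill a whole week at the end are dropped.
    """
    n_weeks = len(daily_readings) // days_per_week
    return [
        daily_readings[w * days_per_week:(w + 1) * days_per_week]
        for w in range(n_weeks)
    ]


# Hypothetical 16 days of consumption (kWh) for one consumer.
daily = [float(d) for d in range(16)]
weekly = stack_weekly(daily)

# The 1D list would feed the recurrent branch; the 2D weekly matrix would feed
# the convolutional branch.
print(len(daily), "daily values ->", len(weekly), "weeks of", len(weekly[0]), "days")
```

Stacking by week places the same weekday of consecutive weeks in one column, which is what lets 2D convolutions pick up weekly periodicity.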
In the second solution, we address the curse of dimensionality issue. Despite the extensive use
of ML classifiers, only some ML researchers focus on the curse of dimensionality, which leads to
overfitting, computational overhead, and memory limitations. In [2], Joker et al. propose an EC
theft detection method that is based on support vector machine (SVM) and hardware devices
to distinguish between normal and abnormal patterns. Both of the above problems generate
false alarms that are not sustainable for an electric utility due to the limited budget for on-site
inspections. In [58], the authors use four metaheuristic techniques: black hole, harmony search,
particle swarm optimization, and differential evolution to select optimal features from the EC
dataset. They use accuracy as a fitness function to evaluate the performance of the features
selected by the four techniques. However, accuracy is not a good measure for imbalanced
datasets. In this study, a framework based on three modules is developed to address the above
issues. The contributions of the second solution are listed below.
• A hybrid method based on metaheuristics and ML methods has been developed using
big data for efficient electricity theft detection.
• In order to reduce the FPR and improve the DR, updated versions of theft cases are exploited
to generate malicious samples from benign ones.
• Eleven different features are synthesized from the EC data using statistical and electrical
parameters. The features provide good classification accuracy and F1-score, indicating
that they are good representatives of the EC data.
• The metaheuristic techniques, i.e., ABC and GA, are used to select optimal features from
the newly synthesized features. Denoising autoencoders are used to reduce the high
dimensionality and extract features with high variance. This process reduces the
computational cost and memory constraints that limit the real-time applications of ML classifiers
for smart grids.
• The metaheuristic techniques are used in the literature to select a subset of features from EC
data. They use accuracy as a fitness function. However, it is not a suitable measure for
imbalanced datasets. That is why F1-score is utilized as a fitness function, because it balances
precision and recall instead of being dominated by the majority class.
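The fitness-function argument in the last contribution can be checked with a small worked example. The sketch below is an illustration only: the 95/5 class split, the two hypothetical classifiers and the helper names are assumptions, not results from this study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the positive (theft) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    # By convention here, F1 is 0 when no theft case is predicted or present.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical test set: 95 normal consumers (0) and 5 theft cases (1).
y_true = [0] * 95 + [1] * 5
always_normal = [0] * 100               # never raises an alarm
finds_three = [0] * 95 + [1, 1, 1, 0, 0]  # catches 3 of the 5 theft cases

for name, y_pred in [("always normal", always_normal), ("finds three", finds_three)]:
    print(name, "accuracy:", accuracy(y_true, y_pred), "F1:", f1_score(y_true, y_pred))
```

A classifier that never flags theft still reaches 95% accuracy on this split while its F1-score is zero, which is why accuracy is a poor fitness signal for a feature-selection search on imbalanced data.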
Although predicting future trends and the direction of financial market movements is one of
the important tasks in the financial industry, performing such a task is very difficult due to the
complex and volatile nature of the financial market [59]. It depends on many factors such as
the political conditions of a country, government policies, investor sentiments, etc. [60]. For many
years, people in academia and in the financial market have believed that future trends cannot
be predicted. This belief is based on the random walk theory [61] and the efficient market
hypothesis (EMH) [62]. The financial market moves along a random path
and behaves like Brownian motion. Thus, according to [63], there is only a 50% chance of
predicting its behavior. Furthermore, according to the EMH, its behavior depends on the
information currently available. So, we are not able to predict the movements of the financial
market on a regular basis.
Forecasting in the financial market is guided by two schools of thought, namely technical
analysis and fundamental analysis. The former analyzes the stock price and its turnover. It takes
into account all the factors that affect the stock price such as economic conditions, social or
cultural factors, etc. Technical analysis also assumes that price movements continue until the
stock reaches its peak and then reverses. The latter school of thought examines fundamental
factors such as investor sentiment, natural disasters, company ratings that affect price move-
ments in the financial markets. So, in fundamental analysis, researchers focus on the factors that
affect stock prices, while in technical analysis, they measure the impact of these factors on stock
prices. Thus, the second school of thought strengthens the EMH, which states that the financial
market is independent because these fundamental factors can be changed at any time. However,
despite extensive debates about the EMH in academia and industry, no one has confirmed or
refuted it. In practice, researchers develop complex mathematical models and profit from them
by predicting the behavior of financial markets [64].
Existing literature uses statistical and econometric techniques to predict the upward and
downward trends of financial markets [60]. However, these techniques have low accuracy in
predicting the behavior of financial markets, resulting in large losses for potential investors. Re-
cently, machine learning (ML) and deep learning (DL) models are receiving a lot of attention
from the research community as they are trained on historical data of the financial market [65].
However, these models have the following problems: the curse of dimensionality, inappropriate
tuning of parameters, inability to learn complex patterns, and low accuracy of the independent
models. [66] do not address the problem of the curse of dimensionality. [67] use principal com-
ponent analysis (PCA) to reduce the high dimensionality of the data. However, it gives good
results for linear data compared to non-linear data. We know that the financial market has com-
plex and nonlinear behavior. [68] use a stacked autoencoder to solve this problem. However, this
is sensitive to the diversity of the data, which affects its generalization ability. Another problem
with single classifiers is the trade-off between bias and variance. We prefer models that have
low bias and variance, but this is difficult to achieve. In this study, a denoising autoencoder is
used to solve the curse of dimensionality problem, while a hybrid DL model is proposed to
predict upward and downward trends in financial market data. Moreover, the problem of bias and
variance of individual classifiers is addressed. The main contributions of solution 3 are given below.
• a hybrid DL model HRG is proposed for financial trend prediction, which is a combination
of ResNet and gated recurrent unit (GRU),
• the problem of the curse of dimensionality is solved using a denoising autoencoder and
• extensive experiments are conducted to compare the performance of the proposed model
with the performance of benchmark models. We use different datasets, i.e., IBM, APPL,
BA, and WMT, to evaluate the effectiveness of the proposed model in real-world scenar-
ios.
2 Related work
In this section, we have studied the state-of-the-art articles to understand how researchers are
using machine learning and deep learning methods in smart grids and financial market.
2.1 Solving the Curse of Dimensionality Issue
In [5], the authors use a support vector machine (SVM) to detect abnormal patterns in
EC data. However, they do not use any feature engineering technique to extract or select
the prominent features, although a time-series dataset contains a large number of dimensions. The
high dimensionality of data creates time complexity, storage and FPR issues. In [33,45,69–73],
feature selection is an important part of data-driven techniques, where significant features are
selected from existing ones. Domain knowledge is required for feature selection. During the feature
selection process, insufficient domain knowledge increases the FPR and decreases classification
accuracy. In [57], the authors note that previous studies use only the EC dataset to train ML
classifiers and predict abnormal patterns. These studies do not use smart meter information and
auxiliary data (geographical information, whether the meter is inside or outside, etc.) to predict
normal and abnormal patterns from EC data.
In [74–78], the authors observe that different users have various consumption behaviours and
that the consumption behaviours of each customer give different results. So, it is necessary to
select those features that give the best results. However, consumption behaviours are closely
related and significant correlation exists between these features. The authors remove highly
correlated and overlapping features, which helps to improve the DR and decrease the FPR. In [34],
the authors point out that existing NTL detection methods are based on data driven approaches and
specific hardware devices. The hardware based methods are time-consuming, less efficient and more
expensive, while data driven methods require domain experts to classify normal and abnormal
patterns. The introduction of smart meters helps in NTL detection. However, they introduce several
new types of attacks that are difficult to detect through existing detection algorithms, which are
based on domain knowledge. In case of NTLs, electricity thieves report lower consumption than
their actual consumption. They use different techniques to change consumption patterns. For
example, shunt devices divert current from an input terminal to output terminals or bypass a smart
meter. In double-tapping attacks, appliances are directly connected to distribution lines. As there
is no mathematical formulation for these attacks, handcrafted features are designed to detect them.
For example, features that show sudden changes in consumption patterns are used for shunt attacks.
Faulty meters are detected through the number of zeroes or missing values in electricity
measurements. However, handcrafted features require continuous work, are less efficient and more
expensive, and depend upon domain knowledge to select optimal features.
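The kind of handcrafted features described above can be sketched as follows. This is an illustration under stated assumptions: the feature names, the use of `None` or NaN to mark missing readings and the definition of a "sudden change" as the largest drop between consecutive valid readings are all hypothetical choices, not the definitions used in [34].

```python
import math


def is_missing(reading):
    """Treat None and NaN as missing readings (an assumption of this sketch)."""
    return reading is None or (isinstance(reading, float) and math.isnan(reading))


def handcrafted_features(readings):
    """Compute simple indicators for faulty meters and sudden consumption drops."""
    valid = [r for r in readings if not is_missing(r)]
    # Largest decrease between two consecutive valid readings (0.0 if none).
    drops = [prev - cur for prev, cur in zip(valid, valid[1:])]
    return {
        "n_missing": sum(1 for r in readings if is_missing(r)),
        "n_zero": sum(1 for r in valid if r == 0),
        "max_drop": max(drops, default=0.0),
    }


# Hypothetical daily readings (kWh) with gaps, zeros and one abrupt drop.
readings = [5.0, 5.2, None, 0.0, 0.0, float("nan"), 1.0]
features = handcrafted_features(readings)
print(features)
```

Each new attack type would need another rule of this kind, which illustrates why such features depend on continuous expert effort.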
In [58,79–88], the authors argue that, despite the extensive use of ML techniques, no one focuses
on the selection of optimal features. In [46,89–94,123], the authors discuss the possibilities of
implementing ML classifiers for the detection of NTLs and describe the advantages of selecting
optimal features and their impact on classifier performance. One of the main challenges that limits
the classification ability of existing methods is the high dimensionality of data [95]. Although
smart meters greatly improve data collection procedures and provide high dimensional data to
capture complex patterns, research shows that most existing classification methods are based on
conventional ML classifiers like ANN, SVM and decision tree, which have limited generalization
ability and are unable to learn complex patterns in high dimensional datasets. In [5], the authors
use gradient boosting classifiers: categorical boosting (CatBoost), light gradient boosting
machine (LightGBM) and extreme gradient boosting (XGBoost), which have a built-in
weighted feature importance module to create a subset of optimal features. Moreover, stochastic
features are extracted through the mean, standard deviation, maximum and minimum values of a
consumer’s observations. These stochastic features have a minor effect on DR and a major effect
on FPR.
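The four statistical features mentioned above are simple to compute per consumer. The sketch below uses only the Python standard library; the function name and the choice of the population standard deviation (rather than the sample version) are assumptions, since [5] is not quoted here on that detail.

```python
import statistics


def summary_features(readings):
    """Return mean, standard deviation, maximum and minimum of one consumer's readings."""
    return {
        "mean": statistics.fmean(readings),
        # Population standard deviation; the sample version would be statistics.stdev.
        "std": statistics.pstdev(readings),
        "max": max(readings),
        "min": min(readings),
    }


# Hypothetical consumption readings (kWh) for one consumer.
consumer_readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = summary_features(consumer_readings)
print(stats)
```

These four numbers replace the full time series with a fixed-length vector, so they cut dimensionality sharply but also discard the temporal ordering of the readings.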
In [33,96–102], the authors propose a hybrid model, which is a combination of a multilayer
perceptron (MLP) and a convolutional neural network (CNN). The CNN has the ability to extract
latent features from EC data through its different types of layers: max pooling and convolution
layers. Both layers extract abstract information, reduce data dimensionality and computation time,
and increase model generalization ability. The authors attain global knowledge from 1D data
and local information from 2D data (weekly consumption) through the MLP and CNN,
respectively. In [57,103–110], the authors generate new features from the smart meter and auxiliary
data. These features are based on the z-score, electrical magnitude, users’ consumption patterns
obtained through a clustering technique, the smart meter alarm system, geographical location and
smart meter placement (indoor or outdoor). In [111], features are selected from existing features
based on clustering evaluation criteria. When the number of features to select is 1, the single
feature with the highest clustering evaluation criterion is selected from among all features. When
the number of features to select is 2, the two features with the highest clustering evaluation
criteria are selected. A DL-based stacked autoencoder is utilized to extract the latent features,
which significantly improves the DR.
In [34], the authors propose a new DL model, which has the ability to learn and extract latent
features from EC data. The proposed methodology integrates both sequential and non-sequential
data. The former (EC) and the latter (smart meter data, geographical information, etc.) are fed
into LSTM and MLP modules, respectively. The EC values are recorded on an hourly basis. However,
the authors reduce the granularity to the daily level by simply taking an average over twenty-four
hours. The LSTM model gives higher accuracy and extracts better hidden features from daily
consumption than from monthly or hourly consumption. In [58,112–118], the authors use the black
hole algorithm (BHA) to select the optimal number of features. BHA is based on the concept of a
black hole, which has a high gravitational force as compared to other stars and sucks in the stars
that come near its boundary. BHA is designed for continuous data. However, the authors convert it
into a binary algorithm using the sigmoid activation function because the goal is to select a
subset of features from existing ones. In the end, the authors compare the results of BHA with
particle swarm optimization, differential evolution, genetic algorithm (GA) and harmony search.
In [119–122], the authors utilize blockchain to achieve privacy in the smart grid domain. In [123],
the authors work on feature engineering. They identify different features like electricity
contract, geographical location, weather condition, etc.
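One common way to turn a continuous metaheuristic position into a binary feature mask is a sigmoid transfer function, sketched below. The exact rule used in [58] is not given in this synopsis, so the stochastic thresholding shown here (select a feature when a uniform random number falls below the sigmoid of its position) and all names are assumptions for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize_position(position, rng):
    """Map each continuous coordinate to 1 (keep feature) or 0 (drop feature)."""
    return [1 if rng.random() < sigmoid(p) else 0 for p in position]


def select_features(feature_names, mask):
    """Return the names of the features whose mask bit is 1."""
    return [name for name, bit in zip(feature_names, mask) if bit == 1]


rng = random.Random(42)
# Hypothetical candidate solution: one continuous coordinate per feature.
feature_names = ["mean", "std", "max", "min"]
position = [50.0, -50.0, 50.0, -50.0]

mask = binarize_position(position, rng)
print(mask, select_features(feature_names, mask))
```

Coordinates far above zero are almost always selected and coordinates far below zero almost never, while values near zero behave like a coin flip, which keeps some exploration in the search.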
In [70,124–133], the authors note that applying conventional techniques to data to reduce the
curse of dimensionality is very tedious and time-consuming. A CNN performs downsampling
and extracts high variance features. The last layer of the CNN is a fully connected layer
where sigmoid is used as an activation function that classifies data points as theft or normal
ones. In their proposed technique, the authors use the CNN as a feature extractor and pass these
features to a random forest (RF) classifier. RF performs bagging and random feature selection,
which overcome the overfitting problem. In [71], one of the main contributions is to find the
optimal number of features. It is observed that not all features contribute equally to prediction
results. The authors’ aim is to find a threshold beyond which including or excluding
features does not affect the prediction results. They apply the Gini index to calculate
each feature’s score in prediction results, sort the scores in descending order and select the top
features, which contribute the most to the prediction results. The authors find 14 optimal features
for which all classifiers give the best performance. Moreover, these selected features reduce the
computation time of classifiers and mitigate the curse of dimensionality issue. In [69,134–141],
the authors use a Dense-Net based CNN to analyse periodicity in EC data. Convolutional layers can
capture the long-term and short-term sequences from weekly and monthly consumption patterns.
Different types of convolutional layers extract different types of features. So, they overcome
the problem of handcrafted feature engineering by domain experts. To capture abstract and
latent (hidden) features, the authors change the connection sequence in a dense block and convert it
into a multi-dimensional block whose input contains the outputs of the previous blocks. The
authors compare the proposed multi-dense block network with RF, gradient boosting classifier,
simple CNN, 1D-Dense-Net.
In [46], maximal overlap discrete wavelet packet transform (MODWPT) is used to extract
the optimal number of features. It decomposes the consumer load profile into wavelet sig-
nals. Wavelet coefficient and standard deviation of wavelet signals help to select the optimal
number of features. In [95], to address the curse of dimensionality issue, authors implement
a bidirectional Wasserstein generative adversarial network (BiWGAN) to extract the optimal
features from time-series data. Generative Adversarial Networks (GANs) gain much attention
from academia and industry due to their various applications like generating fake samples of
images, etc. A GAN contains two parts: a generator and a discriminator. The former generates fake
samples and tries to fool the discriminator, whereas the latter compares fake and real samples.
Both sub-models are trained in an adversarial manner. When they reach equilibrium, the
discriminator fails to distinguish between fake and real samples.
In [5], the authors report the DR and FPR of 100 customers against the number of selected
features, using gradient boosting classifiers. The results indicate a slight improvement in DR
and FPR when the number of selected features is reduced. However, a sudden decrease in DR is
observed when the number of selected features is too low. The authors choose an optimal number
of features for which the proposed method gives high DR and low processing time. In [33],
the authors perform exploratory data analysis to visualize periodicity in monthly and weekly
consumption data. The results show that periodicity is visible when the data is analysed in a 2D
(weekly) manner. The authors feed the weekly consumption into a 2D-CNN model to extract hidden
or latent features. In [57], the authors feed a combination of newly created features into
different conventional ML classifiers and compare their results. In [74], the authors compare
the number of selected features with classification accuracy. When fewer than four features are
selected, the accuracy decreases. So, the optimal number of features is four, which reduces the
execution time and memory complexity and improves the generalization ability of the model.
In [34], the authors measure the precision and recall scores of an LSTM classifier on test data.
The hybrid of MLP and LSTM outperforms the single LSTM in terms of the precision-recall curve
(PR-curve) because the MLP adds additional feature information to the network, such as meter
location, contractual data and technical information.
In [58], authors use accuracy score, convergence rate and computation time to compare the per-
formance of BHA and benchmark meta-heuristic techniques. In [123], these identified features
are passed to gradient boosting classifiers like LightGBM, CatBoost and XGBoost to distinguish
between normal and abnormal samples.
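The weekly 2D arrangement described for [33] can be illustrated with a short sketch. It is only an illustration: the function name `stack_weekly` and the decision to drop an incomplete trailing week are assumptions, not details taken from [33].

```python
import numpy as np

def stack_weekly(daily_ec, days_per_week: int = 7) -> np.ndarray:
    """Reshape a 1D daily EC series into a (weeks, days_per_week) matrix.

    Assumption: days that do not fill a complete week are dropped.
    """
    daily_ec = np.asarray(daily_ec, dtype=float)
    n_weeks = daily_ec.size // days_per_week
    return daily_ec[: n_weeks * days_per_week].reshape(n_weeks, days_per_week)

# Hypothetical series of 16 daily readings: two full weeks plus two extra days.
series = np.arange(16, dtype=float)
weekly = stack_weekly(series)
print(weekly.shape)  # (2, 7)
```

Each row of the resulting matrix is one week, so the same weekday lines up in a column, which is what allows a 2D convolution to pick up weekly periodicity.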
Receiver operating characteristics area under curve (ROC-AUC) and PR-curve are utilized
to evaluate gradient boosting classifiers’ performance. The authors use a dataset of Naturgy
electric company of Spain and achieve accuracy more than 50%. In [71], authors use preci-
sion, recall and f1-score measures to evaluate the performance of deployed classifiers. In [69],
authors use log loss and ROC-AUC to compare the performance of all deployed classifiers. The
proposed model achieves a ROC-AUC of 0.86 and a log loss of 0.25. In [46], classification
accuracy is used to evaluate classifier performance on test data. In [95], the authors evaluate
the performance of the proposed model through DR and FPR. In [2,57,142], the authors do not use
any feature engineering technique to extract or select the optimal features, although the
time-series dataset contains a large number of features. The high dimensionality of the data
increases time complexity, causes storage issues and affects the model generalization ability.
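The measures named above can all be derived from the four confusion-matrix counts. The sketch below uses the usual definitions (DR as the true positive rate, FPR as the share of normal consumers flagged as thieves). The function name and the example counts are hypothetical and do not come from any of the cited papers.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute DR (recall), FPR, precision and F1-score from confusion counts.

    Assumes every denominator is non-zero, i.e. both classes are present
    and at least one sample is predicted as theft.
    """
    dr = tp / (tp + fn)          # detection rate = recall = true positive rate
    fpr = fp / (fp + tn)         # false positive rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * dr / (precision + dr)
    return {"dr": dr, "fpr": fpr, "precision": precision, "f1": f1}

# Hypothetical counts: 100 thieves and 900 normal consumers.
metrics = detection_metrics(tp=80, fp=50, tn=850, fn=20)
print(metrics["dr"])  # 0.8
```

With these counts the classifier catches 80% of the thieves, yet only 80 of the 130 flagged consumers are actual thieves, which is why accuracy alone is a weak measure on imbalanced data.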
In [74], the authors form a feature library from which they select a subset of the existing
features using clustering evaluation criteria. However, they do not compare the adopted feature
selection strategy with other feature selection strategies. Moreover, clustering has a high
computation time for a large number of observations and features. Sparsity and outliers also
affect the performance of clustering evaluation criteria. The authors use an autoencoder to
extract new features from existing ones to solve the curse of dimensionality issue. The
autoencoder contains two neural networks. The first network transforms high dimensional data
into low dimensional data, while the second network removes noise and correlated features from
the encoded data and converts it back into high dimensional data. However, autoencoders require
a large amount of data and computational time for training, and they do not give good results on
noisy data. The authors only consider EC data to detect abnormal patterns. They also consider
smart meter data (model, location inside or outside, alarm system, etc.) and geographical data.
In [58], meta-heuristic techniques are used to select optimal features. However, these
techniques take a lot of computation time and have a high chance of getting stuck in local minima.
Table 3: Related work
Ref. | Limitations already addressed | Solution already proposed | Validations already done | Limitations to be done
[2] | Data imbalance problem, Contamination attacks, Effect of outliers, Privacy issue | Malicious samples generation through six theft cases, TLs information used to identify contamination attacks, High consumer privacy | Generated theft cases resemble real world theft cases, DR, FPR, High DR | Overfitting of ROS, SVM not designed for time-series data, Low performance of SVM on noisy data, Difficult to tune hyper parameters of SVM, Cases 1 and 2 do not resemble real world theft cases
[5] | Curse of dimensionality, Overfitting, Less resemblance with real world theft cases | Stochastic features, Weighted features selection, Handle overfitting through SMOTE, Update theft cases 1 and 2 | DR, FPR, Time complexity, Recall | Difficult to tune hyper parameters of GBCs, High time complexity of GBCs, Overfitting issue of SMOTE, Privacy leakage due to high sampling rate
[33] | Low DR of traditional methods, Require domain knowledge to extract prominent features, Low accuracy of conventional ML methods, Difficult to analyse periodicity from 1D data, Missing values and outliers | Data driven approaches do not require hardware devices, Latent features and periodicity extracted through DL models, Missing values through linear interpolation, Outliers by three sigma rule | Less expensive, Do not require hardware devices, ROC, Precision, Recall, FPR | MLP not designed for time-series data, Class imbalance problem, Limitation of ReLU function
[34] | Cyber attacks difficult to detect through conventional ML methods, CNN and MLP networks not designed for sequence data | DL models have ability to learn and extract latent features of data, LSTM models are designed to handle sequence data | TPR, FPR, PR-curve, ROC-AUC | High complexity of MLP, Class imbalance problem, High dimensionality of time-series data
[57] | Data imbalance problem, Use smart meter and auxiliary data, Drift patterns | Class imbalance problem handled through RUS, New features created from smart meter and auxiliary data, Recent and old abnormal patterns detected through Z-score and K-means clustering | PR-curve, Good accuracy on new generated features, ROC-AUC | Underfitting issue, High dimensionality of time-series, Computation overhead of Grid search CV
[45] | Class imbalance problem, High FPR, Manual feature engineering | Use SMOTE to solve class imbalance problem, Extract optimal features through CNN, Sequence data classification through LSTM | Precision, Recall, F1-score | Overfitting of SMOTE, Difficult to tune hyper parameters of LSTM
[46] | Current approaches expensive and time consuming, Class imbalance problem, Not suitable performance measures, Curse of dimensionality | Proposed data driven based framework, Solve class imbalance through RUS, Optimal feature selection through MODWPT, Select meaningful performance measures for class imbalance problem | Accuracy, Recall, F1-score, Specificity, ROC-AUC, MCC | Information loss due to RUS, FPR measure not considered
[58] | Curse of dimensionality | BHA to select prominent features, Perform comparison between BHA, GA, PSO and HS | Convergence speed, Accuracy score, Computation time | Class imbalance problem, High FPR, Computation overhead of meta heuristic techniques
[69] | Require handcrafted features | Multi Dense-Net CNN to capture periodicity and hidden features | Precision, Log loss | Class imbalance problem, Not suitable performance measures
[70] | Existing techniques unable to detect new type of attacks, Require manual checking, Traditional approaches expensive, Generate malicious samples, Class imbalance problem | SMOTE to solve class imbalance problem, Extract features through CNN, Classify data points through RF, Use theft cases to generate malicious samples | Precision, Recall, F1-score | Overfitting of SMOTE
[71] | Existing methods expensive and time consuming, Curse of dimensionality, Limited budget for on site inspection, Observer meter only identifies specific area not culprit | Machine learning methods efficient and less time consuming, Feature selection through Gini index, Compare between ML methods, Ensemble methods achieve best results | Precision, Recall and F1-score measures | Class imbalance problem
[74] | Class imbalance problem, Feature selection and extraction, Reduce misclassification of SVM | 1D-GAN, Features selection using clustering evaluation criteria, Feature extraction using autoencoder, Proposed similarity measure through Euclidean distance and dynamic time warping, Reduce SVM misclassification error through projection vector and KNN | Accuracy, ROC curve, Execution time | No comparison done with base techniques
[123] | Existing methods expensive, Require hardware devices to detect NTL, High FPR | Perform feature engineering to select optimal features, Extracted features are evaluated through gradient boosting classifiers | ROC-AUC, PR-curve | Class imbalance problem
[95] | Severe class imbalance problem, Curse of data dimensionality | Feature extraction through GAN, Handle severe class imbalance problem through one class SVM | DR, FPR | Class imbalance problem, Low DR
[111] | Low generalization ability and high FPR of existing classifiers, Vanishing gradient problem, Class imbalance problem | Autoencoders have good generalization ability on high dimensional datasets | ROC-AUC, FPR, Execution time, DR | Privacy issue due to high sampling rate, High FPR, PSO local optima problem
[142] | Low accuracy, Overfitting, Low convergence speed, High FPR | Introduced new version of LSTM | Precision, Recall, F1-score, Convergence speed | Not suitable for large datasets
[143] | Existing methods expensive and time consuming, High FPR, Low DR, Class imbalance problem, Not used all records | Handle class imbalance, Low FPR, High DR, Bagging methods perform better on larger datasets | ROC curve, Confusion matrix, Computation time, DR | Overfitting of SMOTE
[144] | Labelled data required for supervised methods, Low performance of unsupervised learning methods, Difficult interpretability and practicality of DL methods | Suitable for low power hardware devices, Overcome the limitation of classification and clustering methods | Precision, Recall, Classification accuracy | High time complexity and difficult to tune hyper parameters of SVM
[145] | Tedious task to design utility function, Low DR, High FPR, Class imbalance problem, Poor generalization ability, Sudden deviation in normal consumption | High performance of ensemble methods | Accuracy, Recall, F1-score, Sensitivity, FPR | Information loss due to RUS, FPR measure not considered
[146] | Labelled data required for supervised learning methods, Low performance of unsupervised methods, Difficult interpretability and practicality of DL methods | Suitable for low power hardware devices, Semi-supervised model requires low amount of labelled data, Overcome the limitation of classification and clustering methods | Precision, Recall, Accuracy | High time complexity and difficult to tune hyper parameters of SVM
[147] | Bad performance of classifiers against sudden changes | Remove outliers through K-Means clustering, Patterns learned by LSTM, Decide about theft or normal pattern through prediction error | Precision, Recall | Larger detection delay, High time complexity of LSTM
[148] | Supervised learning methods not feasible for practical applications, Low performance of existing methods on large datasets, Require domain experts for feature engineering | Spectral density function and decision tree used to extract optimal features, Ensemble method used to design different architectures of autoencoders | Feature extraction ability of autoencoders, Computation time, Shapiro test | -
[149] | Low DR due to unlabelled data, Only electricity consumption data used, False alarm generation | Remove outliers through K-means clustering algorithm, Generate theft samples, Classify normal and theft observations through ANN | Precision, FPR, Accuracy | ANN not suitable for time-series data, High time complexity of ANN
[150] | Handle data poisoning attacks, Robust theft detector, Design a robust theft detector | Effect of data poisoning attacks, Generalized model against data poisoning attacks, Compare performance of sequential and parallel ensemble methods | DR, FPR, Specificity, Classification accuracy, F1-score | NA
[156] | What types of attacks can be applied at generation side to falsify the data? What type of data is used by electric utilities to detect the attacks? Which DL model gives high and robust performance? | Design attacks that are applied at generation side, Apply attacks to generate malicious samples, Design hybrid DL models and evaluate their performance | DR, Precision, Recall, F1-score | Overfitting problem
[157] | Supervised learning methods not feasible for practical applications, Low performance of existing methods, Require domain experts to extract optimal features | Spectral density function and decision tree used for feature engineering, Ensemble method used to design different architectures of autoencoders | Feature extraction through autoencoders, Computation time, Shapiro test | No mechanism to tell whether data is normal or not
2.2 Handling the Class Imbalanced Problem
In [2,5,45,74,143], data imbalance is identified as a major issue in training ML classifiers.
Benign samples are easily collected from the consumption history of any consumer, whereas theft
cases rarely happen in the real world. The lack of theft samples therefore limits classification
accuracy and increases FPR. There are two main approaches to the data imbalance problem: random
undersampling (RUS) and random oversampling (ROS). The former randomly selects samples from the
majority class and discards them; as a result, it loses potentially useful information. The
latter selects existing samples of the minority class and duplicates them. The synthetic
minority oversampling technique (SMOTE) creates artificial samples of the minority class using
Euclidean distance. SMOTE has many advanced versions like Random-SMOTE, Kmeans-SMOTE, etc.
However, these sampling techniques do not represent the overall distribution of the data, which
badly affects the FPR and DR. In [2], the authors introduce six theft cases to generate malicious
samples from benign samples. They argue that the goal of theft is to report less consumption
than the actual consumption or to shift load toward low tariff periods. After generating
malicious samples, the authors exploit the ROS technique to solve the class imbalance problem.
In [74], the authors use a GAN to create theft samples. GANs belong to the DL domain and are
mostly used in the image processing field to generate fake images. The EC data is a 1D
time-series, so the authors implement a 1D-Wasserstein GAN (WGAN) to generate fake theft samples
that closely resemble real world theft cases. WGAN contains two sub models: a generator and a
discriminator. Both modules follow a game theory based approach and try to deceive each other to
generate new fake samples. In [45], the authors use the six theft cases introduced by [2] to
generate malicious samples, and SMOTE is exploited to handle the class imbalance problem.
In [143], the authors use SMOTE and the near miss technique to tackle the class imbalance
problem. Near miss is an undersampling technique, which removes samples of the majority class
from the data until both classes have an equal ratio. After balancing the dataset, the authors
compare bagging and boosting ensemble techniques. However, both techniques give better results on
SMOTE than on near miss. In near miss, some samples of the majority class are removed to
balance the dataset. These samples may contain important information, and their removal
decreases classification accuracy. In SMOTE, by contrast, no information is removed from the data.
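The three resampling ideas discussed in this section can be sketched in a few lines of NumPy: ROS duplicates minority samples, RUS discards majority samples and a SMOTE-style routine interpolates between a minority sample and one of its nearest minority neighbours under Euclidean distance. This is a simplified illustration and not the implementation used in any of the cited works. All function names, the toy data and the choice of `k` are assumptions.

```python
import numpy as np

def random_oversample(X, y, minority=1, seed=0):
    """ROS: duplicate randomly chosen minority samples until classes are equal."""
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    extra = rng.choice(min_idx, size=maj_idx.size - min_idx.size, replace=True)
    keep = np.concatenate([maj_idx, min_idx, extra])
    return X[keep], y[keep]

def random_undersample(X, y, minority=1, seed=0):
    """RUS: keep a random subset of the majority class of minority size."""
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    kept_maj = rng.choice(maj_idx, size=min_idx.size, replace=False)
    keep = np.concatenate([kept_maj, min_idx])
    return X[keep], y[keep]

def smote_like(X_min, n_new, k=3, seed=0):
    """SMOTE-style synthesis: interpolate towards one of the k nearest
    minority neighbours (Euclidean distance) of a random minority sample."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Toy data (hypothetical): 16 normal consumers, 4 thieves, 5 features each.
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 5))
y = np.array([0] * 16 + [1] * 4)

X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)
synthetic = smote_like(X[y == 1], n_new=12)
print(X_ros.shape, X_rus.shape, synthetic.shape)  # (32, 5) (8, 5) (12, 5)
```

The shapes make the trade-off visible: ROS and the SMOTE-style routine keep all 16 majority samples and add 12 minority samples, while RUS throws away 12 of the 16 majority samples.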
In [2], the authors argue that the goal of theft is to report less consumption or to shift load
from high tariff periods to low tariff periods. So, it is possible to generate malicious samples
from benign ones. In [74], the authors use 1D-WGAN to generate synthetic samples of the minority
class. Visualization plots of fake and real samples help to judge the effectiveness of the
generated samples. Finally, the authors compare the performance of 1D-WGAN with data generation
techniques like SMOTE and improved SMOTE. In [5,45], the SMOTE technique is used to tackle the
class imbalance ratio. In [2], the authors use the ROS technique to handle the class imbalance
ratio. ROS replicates existing samples of the minority class, which creates an overfitting
problem where the classifier gives higher accuracy on training data than on test data.
Electricity theft cases rarely happen in the real world. Theft samples for a customer rarely or
never exist, which limits the DR of any ML classifier. The authors introduce six theft cases to
generate malicious samples and balance the ratio between normal and theft cases. However, cases
1 and 2 do not resemble real theft cases. In [33,34,69,71,95,123,144,145], the authors do not
tackle the class imbalance problem. One of the severe issues in ETD is the class imbalance
ratio, where one class (honest consumers) dominates the other (theft consumers). The data is not
evenly distributed and is skewed towards the majority class. If an ML model is trained on an
imbalanced dataset, it becomes biased towards the majority class and does not learn important
features of the minority class, which increases the FPR. In [143], the authors use SMOTE and the
near miss method to handle the class imbalance problem. Near miss reduces the majority class
samples to balance the ratio between normal and theft samples. This technique discards useful
information, which creates an underfitting problem due to the limited number of samples.
In [46,57], the class imbalance ratio is treated as a severe problem in ETD, where
non-fraudulent consumers outnumber fraudulent ones. Due to this problem, ML classifiers are
biased towards the majority class, ignore the minority class and generate false alarms. A
utility cannot bear false alarms because it has a low budget for on-site inspection. The authors
apply the RUS technique to handle the data imbalance problem. This method randomly selects
samples of the majority class and removes them until both classes have an equal ratio. However,
in case of a highly imbalanced ratio, it discards important information, which creates an
underfitting problem.
2.3 Optimizing the Hyperparameters
The parameters whose values define the structure of an ML model are known as hyper-parameters.
The process of choosing the best hyper-parameters is called hyper-parameter tuning. Different
techniques are used in the literature to find optimal hyper-parameters, such as random search,
grid search and meta-heuristic techniques. In [5,45,58,71,74,142–149], the authors use the
random search method to find the optimal hyper-parameters. Random search sets up a grid of
hyper-parameters, selects random combinations to train models and calculates classification
accuracy. The number of search iterations depends upon the available time and system resources.
In [2,33,34,46,57,69,70,95,123,150], the authors use grid search to tune the hyper-parameters of
ML models. Grid search also sets up a grid of hyper-parameters, but it trains a model on every
combination and calculates classifier performance. It is computationally expensive because it
checks each combination. Both techniques have advantages and disadvantages. In the surveyed
literature, experimental results report that grid search performs better than random search. The
choice of technique depends upon system resources and the nature of the dataset. The literature
suggests that grid search is suitable for smaller datasets, whereas random search is better for
larger datasets. In [111], the authors select the best hyper-parameters of an autoencoder
through particle swarm optimization (PSO). PSO is a swarm based optimization technique, which is
used to solve numerical and other ML problems. However, PSO falls into local optima in high
dimensional feature spaces and does not give good results in the iterative process.
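The difference between the two search strategies can be sketched with the standard library alone. The search space, the toy scoring function and the function names below are hypothetical stand-ins: in practice the score would come from training and validating a classifier.

```python
import itertools
import random

def grid_search(space: dict, score) -> dict:
    """Evaluate every combination in the grid and return the best one."""
    combos = (dict(zip(space, values))
              for values in itertools.product(*space.values()))
    return max(combos, key=score)

def random_search(space: dict, score, n_iter: int, seed: int = 0) -> dict:
    """Evaluate n_iter randomly drawn combinations and return the best one."""
    rng = random.Random(seed)
    candidates = [{name: rng.choice(values) for name, values in space.items()}
                  for _ in range(n_iter)]
    return max(candidates, key=score)

# Hypothetical search space: 3 x 4 = 12 combinations in total.
space = {"learning_rate": [0.001, 0.01, 0.1], "n_units": [16, 32, 64, 128]}

def toy_score(params: dict) -> float:
    """Toy objective standing in for validation accuracy (best at 0.01 / 64)."""
    return -(abs(params["learning_rate"] - 0.01) * 100
             + abs(params["n_units"] - 64) / 64)

best_grid = grid_search(space, toy_score)                  # 12 evaluations
best_random = random_search(space, toy_score, n_iter=5)    # 5 evaluations
print(best_grid)  # {'learning_rate': 0.01, 'n_units': 64}
```

Grid search is guaranteed to find the best point on the grid at the cost of evaluating all 12 combinations. Random search caps the cost at `n_iter` evaluations but may miss the best combination.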
Predicting price movements is always a difficult task, as the financial market has complex and
non-linear behaviour that depends on many factors such as government policies, investor
sentiment, etc. To deal with this task, various statistical, ML and DL methods are used in the
literature to predict upward and downward trends in the financial market. The authors of [162]
predict daily stock returns using an artificial neural network (ANN) and deep neural networks.
The performance of the implemented models is evaluated using SPDR and S&P 500 ETF datasets. PCA
is used to reduce the high dimensionality of the data, which otherwise leads to overfitting. The
authors of [163] propose a hybrid model based on autoregressive fractional integrated moving
average and long short term memory (LSTM). The performance of the model is compared with
autoregressive integrated moving average and a regression residual based neural network using
mean square error, root mean square error and mean absolute percentage error. In [164], the
authors propose a DL model, which consists of a recurrent neural network. In [165], sentiment
analysis is performed on financial market data by extracting sentiment related features from
news articles. The extracted features are fed as input to a support vector machine (SVM) to
predict upward and downward trends in market behaviour. The parameters of the SVM are tuned
using the particle swarm optimization technique, and the results indicate that the model has
high accuracy and low complexity compared to DL models. The authors of [60] implement the LSTM
classifier on a large S&P 500 dataset (ranging from 1992 to 2015) and compare its performance
with state of the art classifiers: random forest (RF), deep neural network and logistic
regression. In [166], the authors also use LSTM to predict market behaviour and provide answers
to five questions related to the financial market. In [146], features are extracted from
time-series data using a 1D convolutional neural network (CNN) to protect the model from the
bias of technical indicators. Their results show that 1D convolutional layers retrieve
well-generalised and representative features that give better results compared to
conventional technical indicators.
Existing literature uses statistical and econometrical techniques to predict the future behav-
ior of the financial market. However, these techniques have low detection accuracy, resulting
in a large loss for potential investors. Recently, the research community has shown great in-
terest in the ML and DL models because of their ability to learn temporal and latent patterns
in financial market data. However, ML and DL models have limitations such as the curse of
dimensionality, inappropriate parameter settings, the low accuracy of standalone models and an
inability to learn complex patterns. In [66], an SVM-based model is proposed to forecast
financial market volatility; however, the high dimensionality of the data is not handled, which
increases model complexity and causes generalization problems. As a result, the proposed model
suffers from overfitting. In [67], PCA is used to overcome the effect of
the curse of dimensionality. However, it is designed for linear datasets. It does not give good
results on volatile and complex datasets. In [68], a stacked autoencoder is used for dimension-
ality reduction. However, it is sensitive to data diversity, as small changes in the input lead to a
huge value of sum-of-squared error.
At the start of this research work, we read state-of-the-art articles from the smart grid and
financial market domains. From these articles, we identified three sub problems, stated in
Sections 3.1, 3.2 and 3.3, and proposed a solution for each of them. The proposed solutions will
be validated in the thesis work.
3 Problem Statement
After reading state-of-the-art research articles, we have identified three sub problems, which
are explained below.
3.1 Sub Problem Statement 1
In [2], the authors propose a CPBETD and use SVM to identify normal and abnormal EC
patterns. However, the CPBETD does not use any feature engineering technique to solve the
curse of dimensionality issue. The curse of dimensionality refers to a set of problems that
occur due to the high dimensionality of a dataset. A dataset that contains a large number of
features, generally in the order of hundreds or more, is known as a high dimensional dataset. A
time series dataset has high dimensionality, which increases time complexity, reduces DR and
affects the generalization of a classifier. In [33,34], the authors solve the curse of
dimensionality issue by selecting the prominent features through DL and meta-heuristic
techniques. However, the authors do not address the class imbalance problem, which is a major
issue in NTL detection. In [5,143], the authors use SMOTE to handle the class imbalance ratio.
However, SMOTE creates an overfitting problem where ML models give good accuracy on training
data, whereas their accuracy decreases on testing data. So, it does not perform well on time
series data. In [57], the authors use the RUS technique to handle the class imbalance ratio.
However, this approach discards useful information from the data, which creates an underfitting
issue.
Proposed Solution 1 is designed to address the limitations identified in Sub Problem Statement 1.
3.2 Sub Problem Statement 2
For a given consumer in the smart grid, normal samples are easily collected from the consumer's
history, while theft cases are rarely or never available in users' consumption history.
Despite the extensive use of ML classifiers, few researchers focus on the curse of
dimensionality issue, which creates overfitting, computational overhead and storage constraints.
In [2], Jokar et al. propose an EC theft detector that is based on a support vector machine
(SVM) and hardware devices to differentiate between normal and abnormal patterns. Both above
mentioned issues generate false alarms, which are not bearable for an electric utility due to
the limited budget for on-site inspections. In [58], the authors use four metaheuristic
techniques (black hole, harmony search, particle swarm optimization and differential evolution)
to select optimal features from the EC dataset. They use accuracy as the fitness function to
evaluate the features selected by the four techniques. However, accuracy is not a good measure
for class imbalanced datasets. In this study, a framework based on three modules is designed to
tackle the above mentioned issues.
Proposed Solution 2 is designed to address the limitations identified in Sub Problem Statement 2.
3.3 Sub Problem Statement 3
The financial market has a complex and volatile nature. In the existing literature, statistical
and econometric techniques are utilized to predict the future behaviour of the financial market.
However, these techniques have low accuracy, which results in a huge loss to potential
investors. Recently, the research community has shown high interest in ML and DL models
because they are able to learn temporal and latent patterns in financial market data. However,
ML and DL models have limitations such as the curse of dimensionality, inappropriate parameter
tuning, the low accuracy of stand-alone models and an inability to learn complex patterns.
In [66], the high dimensionality of the data is not handled, which leads to model complexity and
generalization issues. As a result, the implemented models suffer from overfitting. In [67], PCA
is used to overcome the effect of the curse of dimensionality. However, it is designed for
linear datasets and does not give good results on volatile and complex datasets. In [68], a
stacked autoencoder is leveraged for dimensionality reduction. However, it is sensitive to data
diversity because minor changes in the input create a huge sum-of-squared error. It also does
not have a good generalization property. Furthermore, single classifiers have an overfitting
issue, where they give better results on training data than on test data.
Proposed Solution 3 is designed to address the above mentioned limitations.
4 The Proposed Solution 1: Hybrid of GRU and GoogLeNet for Classification of Malicious and Benign Samples
Proposed model 1 addresses the limitations identified in Sub Problem Statement 1 (Section 3.1).
4.1 Acquiring the Dataset
SGCC dataset is used in this study to evaluate the performance of the proposed model. It con-
tains consumers’ IDs, daily EC and labels either 0 or 1. It comprises EC data of 42,372 con-
sumers, out of which 91.46% are normal and remaining are thieves. Each consumer is labeled
as either 0 or 1, where 0 represents normal consumer and 1 represents electricity thief. These
labels are assigned by SGCC after performing on-site inspections. The dataset is in tabular
form: each row is the complete record of one consumer, while each column holds the EC of all
consumers on one day. The meta information about the dataset is given in Table 4.
4.2 Data Preprocessing
Data preprocessing is an important part of data science where the quality of data is improved
by applying different techniques that directly enhance the performance of ML methods. In this
section, the data preprocessing techniques used in this synopsis are discussed in detail.
4.3 Handling the Missing Values
EC datasets often contain missing or erroneous values, which are represented as not a number
(NaN). These values occur for many reasons: failure of a smart meter, a fault in distribution
lines, unscheduled maintenance of a system, a data storage problem, etc. Training data with
missing values has a negative impact on the performance of ML methods. One way to handle
missing values is to remove the consumers' records that contain them. However, this approach
may remove valuable information from the data. In this study, we use a linear imputation
method, given in Equation (1), to recover missing values [5].
Table 4: Dataset information
Time window | Jan. 1, 2014 to Oct. 31, 2016
Total consumers | 42372
Normal consumers | 38757
Electricity thieves | 3615
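$$
f(x_i) =
\begin{cases}
\dfrac{x_{i,j-1} + x_{i,j+1}}{2}, & x_{i,j} = \mathrm{NaN},\ x_{i,j\pm 1} \neq \mathrm{NaN}, \\
0, & x_{i,j-1} = \mathrm{NaN} \ \text{or} \ x_{i,j+1} = \mathrm{NaN}, \\
x_{i,j}, & x_{i,j} \neq \mathrm{NaN}.
\end{cases}
\tag{1}
$$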
In Equation (1), $x_{i,j}$ represents the daily EC of consumer $i$ over time period $j$ (a day),
$x_{i,j-1}$ represents the EC of the previous day and $x_{i,j+1}$ represents the EC of the next day.
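Equation (1) can be sketched for a single consumer as follows. Two points are assumptions rather than statements from the source: neighbours are read from the original series (so two consecutive NaN values both become 0), and a missing first or last reading is treated as having a NaN neighbour. The function name `impute_missing` is hypothetical.

```python
import numpy as np

def impute_missing(x) -> np.ndarray:
    """Fill NaN readings of one consumer following Equation (1)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for j in np.flatnonzero(np.isnan(x)):
        prev_ok = j > 0 and not np.isnan(x[j - 1])
        next_ok = j < x.size - 1 and not np.isnan(x[j + 1])
        if prev_ok and next_ok:
            out[j] = (x[j - 1] + x[j + 1]) / 2   # mean of both neighbours
        else:
            out[j] = 0.0                          # a neighbour is also missing
    return out

# Hypothetical daily readings with one isolated gap and one double gap.
readings = [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]
print(impute_missing(readings))  # [1. 2. 3. 0. 0. 6.]
```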
Algorithm 1: Data preprocessing steps
Data: EC dataset: X
 1  X = (x_1, y_1), (x_2, y_2), ..., (x_m, y_m)
 2  Variables: min_i = minimum value of consumer x_i, max_i = maximum value of consumer x_i,
    x̄_i = mean of consumer x_i, σ_i = standard deviation of consumer x_i, row, col = X.shape
 3  for i ← 1 to row do
 4      for j ← 1 to col do
 5          Fill missing values:
 6          if x_{i,j−1} ≠ NaN && x_{i,j+1} ≠ NaN && x_{i,j} == NaN then
 7              x_{i,j} = (x_{i,j−1} + x_{i,j+1}) / 2
 8          end
 9          if x_{i,j} == NaN && (x_{i,j−1} == NaN ∥ x_{i,j+1} == NaN) then
10              x_{i,j} = 0
11          end
12          Remove outliers:
13          if x_{i,j} > x̄_i + 3σ_i then
14              x_{i,j} = x̄_i + 3σ_i
15          end
16          Min-max normalization:
17          x_{i,j} = (x_{i,j} − min_i) / (max_i − min_i)
18      end
19  end
Result: X_normalized = X
4.4 Removing the Outliers from Dataset
We have found some outliers in the EC dataset. Detecting and treating outliers is one of the
most important steps of the data preprocessing phase because supervised learning models are
sensitive to the statistical distribution of data. Outliers mislead the training process; as a
result, the models take longer to train and generate false results. Motivated by [33], we use
the three-sigma rule (TSR) to handle outliers. The mathematical form of TSR is given in Equation (2).
\[
f(x_i) =
\begin{cases}
\bar{x}_i + 3 \times \sigma(x_i), & \text{if } x_{i,j} > \bar{x}_i + 3 \times \sigma(x_i), \\
x_{i,j}, & \text{otherwise.}
\end{cases}
\tag{2}
\]
Here, $x_i$ represents the complete EC history of consumer i, $\bar{x}_i$ denotes the average EC
and $\sigma(x_i)$ represents the standard deviation of consumer i.
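A minimal sketch of the three-sigma rule for one consumer is given below. The function name `clip_three_sigma` and the made-up readings are illustrative only, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics

def clip_three_sigma(row):
    """Cap readings above mean + 3*sigma for one consumer, as in Equation (2)."""
    mean = statistics.fmean(row)
    sigma = statistics.pstdev(row)  # assumption: population standard deviation
    upper = mean + 3 * sigma
    # readings at or below the threshold are returned unchanged
    return [min(value, upper) for value in row]

readings = [1.0] * 19 + [100.0]  # made-up series with a single spike
clipped = clip_three_sigma(readings)
print(clipped[-1])
```

Only the upper tail is capped, which matches Equation (2). The mean and standard deviation are computed once from the series before capping.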
4.5 Normalization
After handling the missing values and outliers, we apply the min-max technique to normalize
the dataset because DL models are sensitive to the diversity of data [33]. The experimental
results show that DL models give good results on normalized data. The mathematical form of the
min-max technique is given in Equation (3).
\[
x_{i,j} = \frac{x_{i,j} - \min(x_i)}{\max(x_i) - \min(x_i)}
\tag{3}
\]
The $\min(x_i)$ and $\max(x_i)$ represent the minimum and maximum values of EC of consumer i,
respectively. All data preprocessing steps are shown in Algorithm 1. In lines 1 and 2, the
dataset is acquired from an electric utility and variables are initialized. In lines 3 to 19,
the following steps are performed: impute missing values, handle outliers and apply the min-max
normalization technique. Finally, we obtain a normalized dataset.
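Equation (3) can be sketched per consumer as follows. The function name `min_max_normalize` is hypothetical, and mapping a constant series to zeros is an assumption made here to avoid a division by zero; the text does not discuss that case.

```python
def min_max_normalize(row):
    """Scale one consumer's EC series to [0, 1] following Equation (3)."""
    lo, hi = min(row), max(row)
    if hi == lo:
        # assumption: a constant series carries no range, so return zeros
        return [0.0 for _ in row]
    return [(value - lo) / (hi - lo) for value in row]

print(min_max_normalize([2.0, 4.0, 6.0]))
```

In this sketch the minimum and maximum are taken once per consumer, after imputation and outlier handling, rather than being re-evaluated inside the inner loop.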
4.6 Exploratory Dataset Analysis
Electricity theft is a criminal behaviour, which is carried out by tampering with or bypassing
smart meters, hacking smart meters through cyber attacks and manipulating meter readings using
physical components or over the communication links. Since EC data contains both normal and
abnormal patterns, data driven approaches receive high attention from the research community
to differentiate between benign and thief consumers. We conduct a preliminary analysis on
EC data through statistical techniques to check the existence of periodicity and non-periodicity
in consumers' EC patterns. Meta information about the dataset is given in Section 4.1.
Figure 2. Monthly electricity consumption of a normal consumer (x-axis: days; y-axis: kWh)
Figure 3. Weekly electricity consumption of a normal consumer (x-axis: days 1 to 5; y-axis: kWh; one curve per week, 1st to 4th week)
Figure 2 shows the EC pattern of a normal consumer during a month. There are a lot of
fluctuations in the monthly EC pattern, so it is difficult to distinguish normal and abnormal
patterns from 1D time series data. Figure 3 shows the EC patterns of a normal consumer arranged
by weeks. The EC decreases on days 3 and 5, whereas it increases on days 2 and 4. However, the
2nd week shows an abnormal pattern, which is different from the other weeks. We conduct a similar
analysis on theft patterns. Figures 4 and 5 show the EC of an energy thief during a month and
arranged by weeks, respectively. There are a lot of fluctuations in the monthly measurements and
no periodicity exists in the weekly EC patterns.
Figure 4. Monthly electricity consumption of an abnormal consumer (x-axis: days; y-axis: kWh)
Figure 5. Weekly electricity consumption of an abnormal consumer (x-axis: days 1 to 5; y-axis: kWh; one curve per week, 1st to 4th week)
Moreover, a correlation analysis is conducted on the EC of thieves and normal consumers.
Figure 6 shows the Pearson correlation values of a normal consumer, which are mostly more
than 0.3. This indicates a strong relationship between the weekly EC patterns of a normal
consumer. Figure 7 shows the Pearson correlation values of an electricity thief, which indicate
poor correlation between weekly EC data. Next, we use the Euclidean distance similarity measure
to examine how similar the weekly observations are to each other. The Euclidean distance
is calculated for both normal and theft consumers. We compare the EC pattern of the last week of
a month with the previous three weeks and then take the average of the differences to decide how
much normal EC differs from abnormal EC. We observe that the Euclidean distance between
normal EC patterns is low as compared to abnormal ones. Similar findings are obtained on
the whole dataset. To avoid repetition, the exploratory data analysis is reported for some
observations only, which are shown in Figures 2-7 and Table 5.
\[
f(x) = \sqrt{(w_{i,j} - w_{m,j})^2 + \dots + (w_{i,j-n} - w_{m,j-n})^2}.
\tag{4}
\]
Equation (4) shows the Euclidean distance formula used to measure the similarity between weekly
EC patterns. The $w_i$ and $w_m$ denote the ith and mth weeks, and $w_{i,j}$ is the EC of week i
on weekday j, with j ≤ 5.
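The two similarity measures used in this analysis can be sketched as below. The helper names and the four five-day weeks are made-up values for illustration, not taken from the SGCC dataset, and the averaging step follows the description of comparing the last week with the three preceding ones.

```python
import math

def euclidean(week_a, week_b):
    """Euclidean distance between two weekly EC patterns, as in Equation (4)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(week_a, week_b)))

def pearson(week_a, week_b):
    """Pearson correlation coefficient between two weekly EC patterns."""
    n = len(week_a)
    mean_a, mean_b = sum(week_a) / n, sum(week_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(week_a, week_b))
    var_a = sum((a - mean_a) ** 2 for a in week_a)
    var_b = sum((b - mean_b) ** 2 for b in week_b)
    return cov / math.sqrt(var_a * var_b)  # undefined for a constant week

def mean_distance_to_last_week(weeks):
    """Average distance between the last week and each preceding week."""
    last = weeks[-1]
    distances = [euclidean(week, last) for week in weeks[:-1]]
    return sum(distances) / len(distances)

# made-up weekly patterns (5 weekdays each), for illustration only
weeks = [
    [10.0, 12.0, 9.0, 13.0, 8.0],
    [11.0, 13.0, 10.0, 14.0, 9.0],
    [10.0, 12.0, 9.0, 13.0, 8.0],
    [12.0, 14.0, 11.0, 15.0, 10.0],
]
print(mean_distance_to_last_week(weeks))
print(pearson(weeks[0], weeks[1]))
```

A periodic consumer is expected to give a small average distance and high correlations, whereas irregular weekly patterns push the distance up and the correlations down.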
After conducting statistical analysis on thieves and normal consumers, we conclude that
theft patterns have more fluctuations (are less periodic) than normal EC patterns. We believe
that this type of pattern can also be observed in datasets that are collected from different
regions of countries. However, it is challenging to capture long-term periodicity from a 1D time
series dataset because it consists of long sequential patterns. The conventional statistical and
ML models, such as autoregressive integrated moving average, SVM and decision tree, are unable
to retrieve these patterns. Based on the above analysis, we pass 1D data to the GRU model because
it is specially designed to capture temporal patterns from time series data, whereas the 1D EC
data is also stacked according to weeks and fed into GoogLeNet to extract periodicity between weeks.
Figure 6. Pearson correlation analysis of weekly consumption of a normal consumer
            1st week   2nd week   3rd week   4th week
1st week    1          0.59       0.31       0.2
2nd week    0.59       1          0.8        0.27
3rd week    0.31       0.8        1          0.58
4th week    0.2        0.27       0.58       1
Figure 7. Pearson correlation analysis of weekly consumption of an abnormal consumer
            1st week   2nd week   3rd week   4th week
1st week    1          -0.067     0.43       0.042
2nd week    -0.067     1          0.51       -0.79
3rd week    0.43       0.51       1          -0.79
4th week    0.042      -0.79      -0.79      1
Table 5: Euclidean distance similarity measure
Consumers   w1, w2   w1, w3   w1, w4   Average
Normal      4.70     4.83     3.66     4.40
Theft       4.66     3.54     12.90    7.03
4.7 The Description of Proposed Model
The proposed system model contains the following steps:
• handling the class imbalance problem using TLSGAN,
• extracting prominent features utilizing GRU and GoogLeNet,
• classifying the theft and benign samples leveraging a fully connected neural network,
• handling the non-malicious factors using memory units of GRU and
• enhancing the model's generalization ability with the help of dropout and batch normalization
layers.
Each of the above-mentioned steps is explained in the following subsections.
4.7.1 Handling the Class Imbalance Problem
One of the critical problems in ETD is the class imbalance ratio, where one class (honest
consumers) dominates the other class (electricity thieves). The EC data is not normally
distributed and is skewed towards the majority class. When an ML model is applied to an
imbalanced dataset, it becomes biased towards the majority class and does not learn important
features of the minority class, which increases the FPR. Traditionally, two sampling techniques,
ROS and RUS, are used to balance the dataset. However, these techniques have some limitations:
overfitting, information loss and duplication of existing data. In this synopsis, we propose
TLSGAN to handle the class imbalance ratio because it is specially designed for time series
datasets by utilizing GRU layers. Its objective function is based on the least-square method,
which computes the difference between real and fake samples and generates new samples that
closely resemble real samples. The collected electricity theft data belongs to the time series
domain, so GRU layers are exploited to design the TLSGAN model. Using the least square function,
the model learns from both the real theft data distribution and the generated fake samples.
Finally, the generated samples are concatenated with real samples and the class imbalance problem
is solved. The overall working mechanism of TLSGAN is explained below.
We select the existing theft data as training data. The theft samples are presented as
$P_{data}(x)$. A random noise or latent variable z is drawn from a Gaussian distribution
$P_g(z)$. A mapping relationship is established between $P_g(z)$ and $P_{data}(x)$ through the
GAN model. The GAN model contains two DL models: generator (G) and discriminator (D). The former
is responsible for learning regularities from the $P_{data}(x)$ distribution and generating fake
samples. It takes a random variable z as input from $P_g(z)$ and produces G(z) as output. Its
main goal is to fit $P_g(z)$ onto $P_{data}(x)$ to generate fake samples that closely resemble
real theft samples and confuse D as many times as possible. D is responsible for discriminating
whether the input data is real or fake. It takes real theft samples and synthetic samples
generated by G as input and produces an output of either 0 or 1, which indicates whether the
input sample is real or fake. The mathematical form of the min-max equation of the GAN network
is given below [151].
\[
\min_G \max_D V_{GAN}(D, G) = E_{x \sim p_{data}(x)}[\log D(x)] + E_{z \sim p_z(z)}[\log(1 - D(G(z)))],
\tag{5}
\]
where $V_{GAN}(D, G)$ is the loss function of GAN, $E_{x \sim p_{data}(x)}$ is the expected value
over the theft distribution and $E_{z \sim p_z(z)}$ is the expected value over the latent distribution.
The standard GAN network is suitable for unsupervised learning problems. It uses the
binary cross-entropy function to draw a decision boundary between real and fake samples. The
limitation of binary cross-entropy is that it tells whether the generated sample is real or fake
but does not tell how far the generated samples are from the decision boundary. This
creates a vanishing gradient problem and stops the training process of the GAN model. In
[151], the authors propose a least square generative adversarial network (LSGAN) architecture,
which is an extension of the standard GAN model. It uses the least square loss instead of the
binary cross-entropy loss function. The LSGAN provides two benefits. Firstly, the standard GAN
only updates those samples that are on the wrong side of the decision boundary, whereas the
LSGAN penalizes all the samples that are away from the decision boundary, even if they reside
on the correct side of the boundary. During the penalization process, the parameters of D and
the decision boundary are fixed, so G generates samples that are closer to the decision
boundary. Secondly, penalizing these samples produces more changes in gradients, which solves
the vanishing gradient problem. The objective functions of LSGAN are given in Equation (6) [151].
\[
\min_D V_{LSGAN}(D) = \frac{1}{2} E_{x \sim p_{data}(x)}[(D(x) - b)^2] + \frac{1}{2} E_{z \sim p_z(z)}[(D(G(z)) - a)^2],
\tag{6}
\]
\[
\min_G V_{LSGAN}(G) = \frac{1}{2} E_{z \sim p_z(z)}[(D(G(z)) - c)^2],
\]
where $V_{LSGAN}(G)$ is the loss function of LSGAN, b and a are the labels of real (theft
data) and fake samples, respectively, and c is the value that G wants D to output for fake
samples. G minimizes the distance between D(G(z)) and c in order to deceive D. The LSGAN is
designed for generating fake images using convolutional layers. We change the internal
architecture and use GRU layers instead of convolutional layers because we are working on a
problem that belongs to sequential data. The training process of TLSGAN is presented in
Algorithm 2. We pass X_normalized, which is obtained from Algorithm 1, to Algorithm 2. In the
first step, variables are initialized. In steps 2 to 9, TLSGAN is trained on theft samples to
generate fake theft patterns. In steps 10 to 14, the data is generated from the latent
distribution and passed to G to produce fake theft samples. At the end, we concatenate the fake
samples generated by G, the original theft samples and the normal samples and return a balanced
dataset X_BalData.
Algorithm 2: Training of TLSGAN
Data: X_normalized
1   Variables: Separate theft & benign samples from X_normalized,
    Theft: T = {x_{i,j}, x_{i,j+1}, x_{i,j+2}, ..., x_{m,n}}, Normal: N = {y_{i,j}, y_{i,j+1}, y_{i,j+2}, ..., y_{p,n}}
2   while stopping condition is not met do
3       t_i ⇒ Sample from theft distribution
4       s_i ⇒ Sample from Gaussian distribution
5       (1/t) Σ_{i=1}^{t} [ ½ E_{t∼p_data(t)}[(D(t_i) − b)²] + ½ E_{s∼p_s(s)}[(D(G(s_i)) − a)²] ]
6       Fix discriminator weights
7       z_i ⇒ Sample from latent space
8       (1/n) Σ_{i=1}^{n} [ ½ E_{z∼p_z(z)}[(D(G(z_i)) − c)²] ]
9   end
10  b and a are labels of theft and fake patterns
11  c is the value that G wants D to output in order to deceive D
12  After training of G, fake theft patterns are generated
13  FakeSamples = G(z)
14  X_BalData = Concatenate(FakeSamples, N, T)
Result: Return balanced dataset: X_BalData
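The least-square losses of the discriminator and generator can be sketched as plain functions of discriminator outputs, both written as quantities to minimise as in the LSGAN formulation. The function names are hypothetical, and the label values a = 0, b = 1 and c = 1 are an assumption (a common 0-1 coding); the synopsis does not state the values used for TLSGAN.

```python
def discriminator_loss(d_real, d_fake, a=0.0, b=1.0):
    """Least-square loss of D: push D(x) towards b and D(G(z)) towards a."""
    real_term = sum((d - b) ** 2 for d in d_real) / len(d_real)
    fake_term = sum((d - a) ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term

def generator_loss(d_fake, c=1.0):
    """Least-square loss of G: push D(G(z)) towards c."""
    return 0.5 * sum((d - c) ** 2 for d in d_fake) / len(d_fake)

# made-up discriminator outputs for a batch of two real and two fake samples
d_real = [0.9, 0.8]
d_fake = [0.2, 0.1]
print(discriminator_loss(d_real, d_fake))
print(generator_loss(d_fake))
```

Unlike binary cross-entropy, both losses grow with the squared distance from the target label, so a fake sample that is already classified as real but lies far from c still contributes a gradient to G.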
4.7.2 Architecture of Hybrid Model
Time series data of EC has a complex structure with high random fluctuations because it is
affected by various factors like high load, weather conditions, a big party in a house, etc.
Traditional models like SVM, MLP, etc., are not ideal for learning complex patterns. These models
have low DR and high FPR due to the curse of dimensionality issue. In literature, different DL
models are used to learn complex patterns from time series data.
In this synopsis, a hybrid model is proposed, which is a combination of GoogLeNet and
GRU. In [152,153], the authors show that hybrid DL models perform better than individual
learners. The proposed model takes advantage of both GoogLeNet and GRU by extracting and
remembering periodic features of the EC dataset. The architecture of the proposed model consists
of three modules: GRU, GoogLeNet and hybrid. We pass 1D data to the GRU module, whereas
2D weekly EC data is passed to the GoogLeNet module. The hybrid module takes the outputs of
both modules, concatenates them and gives the final decision about the presence of an anomaly in
EC patterns. The hybrid DL models are very efficient because they allow joint training of both
models. Figure 8 shows the overall structure of the proposed model. In the proposed system model,
steps 1, 2 and 3 show the data preprocessing phase, where we handle missing values, handle
outliers and normalize the dataset, respectively. In step 4, the class imbalance problem is
solved. In steps 5 and 6, prominent features are extracted from 1D and 2D EC datasets using GRU
and GoogLeNet models, respectively. Finally, in step 7, the extracted features of GRU and
GoogLeNet are concatenated and passed to a fully connected neural network to classify between
normal and theft samples.
Figure 8. The proposed system model: HG2 (steps 1 to 3: data preprocessing module for missing values, outliers and normalization; step 4: data imbalance module with generator and discriminator; steps 5 and 6: GRU and GoogLeNet modules on 1D and 2D data; step 7: hybrid module with sigmoid output)
Legend of Figure 8:
L1: Class imbalance
L2: Information loss due to RUS
L3: Data duplication due to ROS
L4: Overfitting due to SMOTE
L5: Curse of dimensionality issue
L6: High FPR & overfitting issue
S1: TimeGAN
S2: GoogLeNet and GRU
S3: Dropout layers and batch normalization
4.7.3 Gated Recurrent Unit
We observe that there are a lot of fluctuations in theft EC patterns as compared to those of
normal consumers. So, 1D data is fed into the GRU model to capture co-occurring dependencies in
time series data. GRU was proposed by Cho et al. in 2014 to capture related dependencies in time
series data. It has memory modules to remember important periodic patterns, which help to
handle sudden changes in EC patterns due to non-anomalous factors like changes in weather
conditions, a big party in a house, weekends, etc. Moreover, it was introduced to solve the
vanishing gradient problem of the recurrent neural network (RNN). GRU and LSTM are considered
variants of RNN. In [157], the authors compare the performance of GRU and LSTM with the
RNN model on different sequential datasets. Both models outperform the RNN and solve its
vanishing gradient problem. In [142], the authors from Google conduct extensive experiments
on 10,000 LSTM and RNN architectures. Their final experimental results show that no single
model is found that performs better than GRU. Based on the above analysis, we opt for GRU to
extract optimal features from the EC dataset because it gives good results on sequential
datasets. It has reset and update gates that control the flow of information inside the network.
The update gate decides how much previous information should be preserved for future decisions,
whereas the reset gate decides how much past information should be kept or discarded. The
equations of the update and reset gates are similar to each other. However, the difference comes
from the weights and the gates' usage. The equations of the GRU network are given below [34].
\[ z_t = \sigma(W_z \cdot [h_{t-1}, x_t]), \tag{7} \]
\[ r_t = \sigma(W_r \cdot [h_{t-1}, x_t]), \tag{8} \]
\[ \hat{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t]), \tag{9} \]
\[ h_t = (1 - z_t) * h_{t-1} + z_t * \hat{h}_t. \tag{10} \]
Where t, $z_t$, $\sigma$, $W_z$ and $x_t$ represent the time step, update gate, sigmoid function,
update gate weight and current input, respectively. $h_{t-1}$, $\hat{h}_t$ and $r_t$ are the
previous hidden state, candidate value and reset gate, respectively. $W_r$ is the reset gate
weight, W is the weight of the candidate value and $h_t$ is the hidden state.
The last hidden layer of GRU is presented as $Dense_{GRU}$.
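Equations (7) to (10) describe a single time step of the GRU, which can be sketched in plain Python as below. The helper names and the all-zero toy weights are illustrative assumptions; bias terms are omitted because the equations do not include them, and a trained model would learn the weight matrices.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(matrix, vector):
    """Multiply a weight matrix (list of rows) with a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def gru_step(h_prev, x_t, W_z, W_r, W):
    """One GRU time step following Equations (7) to (10)."""
    concat = h_prev + x_t                                  # [h_{t-1}, x_t]
    z = [sigmoid(v) for v in matvec(W_z, concat)]          # update gate, Eq. (7)
    r = [sigmoid(v) for v in matvec(W_r, concat)]          # reset gate, Eq. (8)
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t   # [r_t * h_{t-1}, x_t]
    h_hat = [math.tanh(v) for v in matvec(W, gated)]       # candidate, Eq. (9)
    # interpolate between the previous state and the candidate, Eq. (10)
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_hat)]

# toy setting: hidden size 2, input size 1, all weights zero
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
h = [0.4, -0.2]
for x in ([1.0], [0.5]):          # a made-up two-step input sequence
    h = gru_step(h, x, zeros, zeros, zeros)
print(h)
```

With zero weights both gates equal 0.5 and the candidate is zero, so each step simply halves the hidden state, which makes the role of the update gate in Equation (10) easy to check by hand.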
4.7.4 GoogLeNet
It is difficult to capture long-term periodicity from 1D EC data. However, periodicity can be
captured if the data is aligned according to weeks, as explained in Section 4.6. GoogLeNet is
a DL model that was proposed by researchers at Google in 2014. It is designed to increase the
accuracy and computational efficiency of the existing models. Its architecture is similar to
existing CNN models like LeNet-5 and AlexNet. However, the core of the model is auxiliary
classifiers and inception modules. Each inception module contains 1×1, 3×3, 5×5 and 7×7
convolutional filters that extract hidden or latent features from EC data. After each inception
module, the outputs of the convolutional and max pooling layers are concatenated and passed to
the next inception module. The auxiliary classifiers calculate the training loss after the 4th
and 7th inception modules and add it to the GoogLeNet network to protect it from the vanishing
gradient problem.
In [33,153], the authors exploit a 2D-CNN model to extract abstract features from a time
series dataset. Motivated by these articles, GoogLeNet is applied to extract latent features
from EC data. The latent features increase the model's generalization ability. The 1D EC data is
transformed into 2D according to weeks and is fed as input to the GoogLeNet model, which has
inception modules. Each inception module has max pooling and multiple convolutional layers
with different filter sizes. In [33], the authors use a simple CNN model to extract local
patterns from EC data. In a simple CNN model, multiple convolution windows of the same size move
over EC patterns and extract optimal features. However, convolution windows of the same size
have low ability to extract optimal features.
The GoogLeNet overcomes this problem through inception modules. A different number of
convolutional and max pooling layers extract optimal features from EC data. Moreover, GoogLeNet
has less time and memory complexity as compared to the existing DL models. However, it is
designed for computer vision tasks, which is why it has multiple inception modules to extract
edges and interest points from images. For our problem, we change the architecture and use only
one inception module that extracts periodicity and non-periodicity from weekly EC patterns.
Finally, we use flatten and fully connected layers to attain the principal features that are
extracted through convolutional and max pooling layers. The last hidden layer of GoogLeNet is
presented as $Dense_{GoogLeNet}$.
Algorithm 3: Training of HG2
Data: EC dataset: X_BalData
1   Data in 1D format
2   X_1D = {x_{i,j}, x_{i,j+1}, x_{i,j+2}, ..., x_{m,n}}
3   m = 42372, n = 1034
4   Convert data in 2D format
5   Z = [x_{1,1} ... x_{1,k}; ...; x_{j,1} ... x_{j,k}]
6   j = 147, k = 7
7   Pass X_1D to GRU
8   z_t = σ(W_z · [h_{t−1}, x_t])
9   r_t = σ(W_r · [h_{t−1}, x_t])
10  ĥ_t = tanh(W · [r_t ∗ h_{t−1}, x_t])
11  h_t = (1 − z_t) ∗ h_{t−1} + z_t ∗ ĥ_t
12  Dense_GRU = relu(W · h_t + b)
13  Pass Z to GoogLeNet
14  Z[a, c] = (f ∗ Z)[a, c] = Σ_j Σ_k f[j, k] Z[a − j, c − k]
15  a, c ⇒ dimension of output matrix
16  Fl_GoogLeNet = flatten(Z)
17  Dense_GoogLeNet = W · Fl_GoogLeNet + b
18  h_HG2 = W_HG2 · [Dense_GRU, Dense_GoogLeNet] + b
19  b ⇒ bias term
20  Y_NTL = σ(h_HG2)
Result: Y_NTL
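The conversion of a consumer's 1D series into the 2D weekly layout can be sketched as follows. The function name `stack_weeks` is hypothetical, and dropping the trailing days that do not fill a complete week is an assumption: Algorithm 3 states a 147 × 7 layout for 1034 daily readings but does not say how the remaining days are handled.

```python
def stack_weeks(series, days_per_week=7):
    """Reshape a 1D daily EC series into rows of one week each."""
    n_weeks = len(series) // days_per_week  # assumption: incomplete last week is dropped
    return [
        series[week * days_per_week:(week + 1) * days_per_week]
        for week in range(n_weeks)
    ]

daily = [float(day) for day in range(1034)]  # placeholder values, one per day
weekly = stack_weeks(daily)
print(len(weekly), len(weekly[0]))
```

Each row of the result holds one week, so a convolution window that spans several rows sees the same weekday across consecutive weeks.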
4.7.5 Hybrid Module
GRU memorizes the periodic patterns from 1D data, whereas GoogLeNet captures latent patterns
from 2D data. We combine $Dense_{GoogLeNet}$ and $Dense_{GRU}$ to aggregate latent and
temporal patterns. The outcome of the model is calculated through the sigmoid activation
function and the training loss is measured using binary cross entropy.
\[ h_{HG2} = W_{HG2} \cdot [Dense_{GoogLeNet}, Dense_{GRU}] + b_{HG2}, \tag{11} \]
\[ Y_{NTL} = \sigma(h_{HG2}). \tag{12} \]
Where $h_{HG2}$ is the hidden layer of the hybrid module, $W_{HG2}$ is the weight of the hybrid
layer, $b_{HG2}$ is the bias of the hybrid layer, $Y_{NTL}$ is the output and $\sigma$ is the
sigmoid function. We pass X_BalData, which is taken from Algorithm 2, to Algorithm 3. On lines
1 to 3, variables are initialized. The 1D EC data is transformed into 2D format on lines 4 to 6.
On lines 7 to 17, we pass 1D data to GRU to extract time-related patterns, whereas 2D data is
fed into GoogLeNet to retrieve periodicity and non-periodicity from weekly EC patterns. On lines
18 to 20, we concatenate the features of GRU and GoogLeNet and apply the sigmoid activation
function, which classifies theft and normal EC patterns.
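The hybrid layer and its training loss can be sketched for a single sample as below. The function names, the single output unit and the toy feature values are assumptions for illustration; in the proposed model the two feature vectors come from the trained GRU and GoogLeNet modules.

```python
import math

def hybrid_output(dense_gru, dense_googlenet, w_hg2, b_hg2):
    """Concatenate both feature vectors, apply one linear unit and a sigmoid."""
    features = dense_googlenet + dense_gru          # list concatenation
    h = sum(w * f for w, f in zip(w_hg2, features)) + b_hg2
    return 1.0 / (1.0 + math.exp(-h))               # theft probability

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross entropy for one sample; eps avoids log(0)."""
    y_pred = min(max(y_pred, eps), 1 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

# toy features and weights, for illustration only
y_hat = hybrid_output([0.0, 0.0], [0.0, 0.0], [1.0, 1.0, 1.0, 1.0], 0.0)
print(y_hat, binary_cross_entropy(1, y_hat))
```

Because the loss is computed on the output of the concatenated features, its gradient flows back into both branches, which is what allows the two modules to be trained jointly.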
5 Proposed Solution 2: Framework based on Denoising Autoencoder and Metaheuristic Techniques
Solution 2 is proposed to tackle the limitations identified in problem statement 3.2.
5.1 Dataset Description
The dataset used in this study is PRECON (see footnote 3), which is provided by an energy
informatics group in Pakistan. The consumers that participated in this research have installed
smart meters in their houses. So, it is a reasonable assumption that the users are honest
consumers. The public availability, the large number of measurements and the information about
different types of consumers make this dataset an excellent source to perform analysis on the
users' consumption history. It includes electricity measurements of 42 houses that are recorded
every second from July 15, 2018 to May 31, 2019. The sampling rate of the electricity
measurements is reduced to one sample per half hour because a high sampling rate creates issues
of memory complexity and computational overhead. It also affects consumers' privacy. The dataset
contains electricity measurements of only normal consumers. One solution is to use a one-class
SVM to classify normal and abnormal patterns. However, it gives low DR and high FPR, which are
not suitable for an electric utility because it has a limited budget for on-site inspections.
So, the purpose of the proposed framework is to generate malicious samples using honest
consumers' samples. In [5], the authors introduce an updated version of six types of theft cases
to generate malicious samples. They argue that the purpose of theft is to report less EC or to
shift the load from high tariff periods to low tariff periods. The theft cases are applied on
the PRECON dataset and malicious samples are generated. The description of each theft case is
given below.
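\[ f_1(x_t) = \alpha x_t, \quad \alpha = random(0.1, 0.9), \tag{13} \]
\[ f_2(x_t) = \beta x_t, \quad \beta = random(0.1, 1.0), \tag{14} \]
\[ f_3(x_t) = \gamma_t x_t, \quad \gamma_t = random[0, 1], \tag{15} \]
\[ f_4(x_t) = \beta \, mean(x), \quad \beta = random(0.1, 1.0), \tag{16} \]
\[ f_5(x_t) = mean(X), \tag{17} \]
\[ f_6(x_t) = x_{48-t}. \tag{18} \]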
Where $X = \{x_1, x_2, x_3, \dots, x_t\}$ is an input vector of benign samples during T = [1, 48].
$f_1(.)$ multiplies the EC measurements of every interval with the same random number from 0.1 to
0.9, while $f_2(.)$ multiplies the meter reading at each time interval t with the same random
number from 0.1 to 1.0. $f_3(.)$ multiplies the meter reading at time t with a different random
number for each t. $f_4(.)$ and $f_5(.)$ report a random factor multiplied with the average EC
and the average EC of an entire day, respectively. $f_6(.)$ reverses the order of meter readings.
$f_5(.)$ and $f_6(.)$ launch an attack against the load control mechanism to shift the load from
high tariff to low tariff periods. Figures 9 and 10 show the behavior of EC before and after
applying the six theft attacks.
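The six theft cases in Equations (13) to (18) can be sketched for one day of 48 half-hourly readings as below. The function name `theft_attacks` and the returned dictionary keys are hypothetical. The sketch follows the equations as written, so f1, f2 and f4 each use a single random factor for the whole day and f3 draws a new factor per interval; reading Equation (18) as a plain reversal of the day is an assumption about the index convention.

```python
import random

def theft_attacks(x, rng=None):
    """Generate six malicious variants of one day of benign readings."""
    rng = rng or random.Random(0)
    mean_x = sum(x) / len(x)
    alpha = rng.uniform(0.1, 0.9)    # Eq. (13): one factor for the whole day
    beta_2 = rng.uniform(0.1, 1.0)   # Eq. (14): one factor for the whole day
    beta_4 = rng.uniform(0.1, 1.0)   # Eq. (16): one factor applied to the mean
    return {
        "f1": [alpha * v for v in x],
        "f2": [beta_2 * v for v in x],
        "f3": [rng.uniform(0.0, 1.0) * v for v in x],  # Eq. (15): factor per interval
        "f4": [beta_4 * mean_x for _ in x],
        "f5": [mean_x for _ in x],                     # Eq. (17): flat daily average
        "f6": x[::-1],                                 # Eq. (18): reversed order
    }

benign = [float(v) for v in range(1, 49)]  # made-up day of 48 readings
malicious = theft_attacks(benign, random.Random(1))
print(sorted(malicious))
```

Cases f1 to f4 lower the reported consumption, whereas f5 and f6 keep the daily total unchanged and only move it between tariff periods.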
5.2 Synthesized Features
We synthesize 11 features from EC data that are based on statistical and electrical measures.
The description of these features is given below.
1. Minimum consumption: It represents the minimum EC of a consumer on a specific day.
2. Maximum consumption: It represents the maximum EC of a consumer on a specific day.
3. Mean: The mean is a single value that is used to represent the central location of a set
of data points. We calculate the mean consumption of an entire day and consider it as a
feature because it tells about the overall consumption behavior of a user. It is calculated as
$\bar{x}_i = \frac{x_{i1} + x_{i2} + \dots + x_{in}}{n}$, where n denotes the total number of
features, $x_{ij}$ denotes the EC of consumer i at interval j, and i = 1, ..., m with m the
total number of records.
4. Standard deviation: It is an important statistical measure, which tells how much the data
is dispersed from the central location. The dispersion in user consumption is obtained
through a single value. It is calculated as $s = \sqrt{\frac{\sum_j (x_{ij} - \bar{x}_i)^2}{n - 1}}$,
where $\bar{x}_i$ represents the mean consumption of consumer i.
5. Skewness: It is an important statistical factor that is mostly used to measure symmetry or
asymmetry in the EC dataset. It tells about the nature of a dataset: left skewed, right skewed
or normal distribution.
6. Mean absolute deviation (MAD): It measures how far the EC of each interval is from
its mean value. A small value of MAD indicates that data points are located near the
mean consumption and vice versa. It is calculated as $MAD = \frac{\sum_j |x_{ij} - \bar{x}_i|}{n}$.
7. Peak to average ratio: It is the ratio of the maximum value of the power signal to its average
value. It is a very important factor in EC measurement because a large value decreases
the performance of a smart grid.
8. Peak to peak value: It is the difference between the maximum and minimum values of EC of
consumer i.
9. Electric current (I): The flow rate of electric charges through any cross-sectional area is
known as electric current. $I = \frac{P}{V}$, where P represents electric power, while V denotes
voltage.
10. Active power: It is the amount of electrical energy that is consumed or utilized by an
electric circuit per unit time.
11. Resistance (R): It is a measure of the opposition to I passing through the electric wires.
It is calculated as $R = \frac{V}{I}$.
The same types of features are synthesized in [5]. They only consider statistical measures as
features. However, electrical measures have more importance as compared to statistical features.
So, in this study, we have considered both of them.
Footnote 3: Pakistan residential electrical consumption dataset.
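A minimal sketch of the feature synthesis for one consumer-day is given below. The function names are hypothetical. Skewness is computed as the moment coefficient (third central moment over the 1.5 power of the second), which is an assumption because no formula is given, and peak to peak is taken as maximum minus minimum, the usual definition. The electrical helper assumes that power and voltage readings are available for the same interval.

```python
import math

def statistical_features(x):
    """Statistical features of one day of EC readings."""
    n = len(x)
    lo, hi = min(x), max(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n      # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n      # third central moment
    return {
        "min": lo,
        "max": hi,
        "mean": mean,
        "std": math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1)),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,  # assumption: moment coefficient
        "mad": sum(abs(v - mean) for v in x) / n,
        "peak_to_average": hi / mean,
        "peak_to_peak": hi - lo,
    }

def electrical_features(power, voltage):
    """Current and resistance from power and voltage (I = P / V, R = V / I)."""
    current = power / voltage
    resistance = voltage / current if current else float("inf")
    return {"current": current, "active_power": power, "resistance": resistance}

print(statistical_features([1.0, 2.0, 3.0, 4.0, 5.0]))
print(electrical_features(460.0, 230.0))
```

The statistical block yields eight values and the electrical block three, which together give the 11 features listed above.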
Figure 9. EC before and after applying theft attacks (x-axis: time; y-axis: power usage (kWh); curves: real usage, theft 1, theft 2, theft 3)
5.3 Description of the Proposed Framework
In this study, a framework is proposed that is based on three modules. In the first module, the
sampling rate of EC measurements is reduced from one sample per second to one sample per half
hour. The min-max normalization is applied because the SVM classifier is sensitive to the
diversity of data. Afterwards, the updated version of theft cases is exploited to generate
malicious or theft samples from benign ones. The generated samples are concatenated with benign
ones and then passed to the SVM classifier to classify normal and abnormal patterns. In the
second module, we synthesize 11 new features from EC data (obtained in the previous module) using
statistical and electrical parameters. The new features are then fed as input to the SVM and to
the metaheuristic techniques (ABC and GA). The former checks whether the new features are good
representatives of EC data or not, while the latter select a subset of prominent features that
gives the highest F1-score. In the third module, the subset of prominent features is passed to
the denoising autoencoder to extract high variance features, which are fed to the SVM as input to
differentiate between normal and abnormal patterns. The proposed framework reduces overfitting,
computational overhead and storage constraints that limit the adaptability of an ML classifier
for real time applications of smart grids. The descriptions of GA, ABC and the denoising
autoencoder are given in subsections 5.3.1, 5.3.2 and 5.3.3, respectively. The working process of
the proposed framework is given in Figure 11.
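The first step of Module 1, reducing per-second readings to half-hourly samples, can be sketched as follows. The function name `downsample` is hypothetical, and aggregating each block by its mean is an assumption, since the text does not say whether readings are averaged, summed or subsampled. An incomplete trailing block is dropped.

```python
def downsample(readings, factor=1800):
    """Average consecutive blocks of `factor` readings (1800 s = half an hour)."""
    n_blocks = len(readings) // factor  # assumption: incomplete last block is dropped
    return [
        sum(readings[block * factor:(block + 1) * factor]) / factor
        for block in range(n_blocks)
    ]

one_day = [0.0] * 86400            # placeholder: one reading per second for a day
print(len(downsample(one_day)))
```

One day of per-second data contains 86400 readings, so the result has 48 samples per day, which matches the interval T = [1, 48] used by the theft cases.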
5.3.1 Genetic Algorithm
The GA is an evolutionary algorithm inspired by Charles Darwin's theory. It was popularized
by David E. Goldberg and has applications in various fields such as filtering and signal
processing, ML, DL, code breaking, numerical optimization problems, etc. [159]. This algorithm
mimics the process of natural selection, where only the fittest individuals survive. At the
beginning, we randomly initialize the population and calculate the fitness value of each
individual. The individuals with high fitness values have more chances to be included in the new
offspring population. Three common methods are used to select a new offspring: roulette wheel,
elitism and tournament selection. In this study, tournament selection is used to select
individuals from the initial population to form a new offspring. In this method, a tournament is
held between two or three individuals and the individual with the highest fitness value is
selected. Crossover and mutation steps are then performed to maintain diversity and prevent the
solution from getting stuck in local optima. Crossover occurs between two parents, resulting in
two offspring that possess the traits of the parents. Mutation is a process that randomly selects
a bit from an offspring and flips it to avoid premature convergence.
Figure 10. EC before and after applying theft attacks (x-axis: time; y-axis: power usage (kWh); curves: real usage, theft 4, theft 5, theft 6)
Figure 11. Diagram of proposed framework (Module 1: reduce sampling rate, normalization, generate malicious samples, concatenate malicious and normal samples, split data for training and testing, prediction with SVM; Module 2: feature synthesis, prediction with SVM, feature selection by GA and ABC, selection of the feature subset with the highest F1-score; Module 3: denoising autoencoder with encoder, code, decoder and error, parameter tuning, prediction with SVM of theft/normal consumers)
The process of GA for selecting optimal features is defined as:
• initialize the population,
• calculate the F1-score of each member (solution) of the population using the SVM classifier,
• select a subset of members using the tournament selection approach,
• perform crossover on the selected members,
• apply mutation on each member of the population and
• repeat for the required number of generations.
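The steps above can be sketched as a small GA over bit masks, where a 1 keeps a feature and a 0 drops it. All names and parameter values are illustrative assumptions. In the framework the fitness is the F1-score of an SVM trained on the selected features; to keep the sketch self-contained, a toy fitness that counts agreements with a made-up target mask is used instead. Mutation here flips each bit with a small probability, a common variant of the single-bit flip described in the text, and the best mask seen so far is remembered.

```python
import random

def genetic_feature_selection(fitness, n_features, pop_size=20, generations=30,
                              tournament_size=3, mutation_rate=0.05, seed=0):
    """Return the best bit mask found by a simple GA with tournament selection."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(member) for member in population]

        def tournament():
            # pick a few random members and keep the fittest one
            contenders = rng.sample(range(pop_size), tournament_size)
            return population[max(contenders, key=lambda i: scores[i])]

        children = []
        while len(children) < pop_size:
            parent_1, parent_2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)       # single-point crossover
            for child in (parent_1[:cut] + parent_2[cut:],
                          parent_2[:cut] + parent_1[cut:]):
                # bit-flip mutation applied independently to every bit
                child = [1 - bit if rng.random() < mutation_rate else bit
                         for bit in child]
                children.append(child)
        population = children[:pop_size]
        best = max(population + [best], key=fitness)   # remember the best mask
    return best

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1]  # made-up "ideal" mask for the toy fitness

def toy_fitness(mask):
    """Hypothetical stand-in for the SVM F1-score: agreements with TARGET."""
    return sum(1 for bit, goal in zip(mask, TARGET) if bit == goal)

selected = genetic_feature_selection(toy_fitness, n_features=11)
print(selected, toy_fitness(selected))
```

Replacing `toy_fitness` with a function that trains a classifier on the masked features and returns its F1-score would turn the sketch into a wrapper-style feature selector, at the cost of one model fit per fitness evaluation.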
5.3.2 Artificial Bee Colony Algorithm
ABC belongs to the family of swarm algorithms used to solve optimization problems. It was
proposed by Dervis Karaboga in 2005 by studying the foraging strategy of honey bees. The
process of the algorithm starts when the bees leave the hive to find food sources. After
finding the food sources, they return to the hive and exchange information with other bees about
the food sources, such as quality, distance and direction from the hive. The ABC algorithm
includes the following components: food sources, employed bees and unemployed bees.
• Food sources: Each food source is a solution to the defined problem.
• Employed bees: They leave the hive, find the food sources and store information about
them, such as quality, direction and distance from the hive. They return to the hive and
share the information with other bees. The number of solutions or food sources is equal
to the number of employed bees.
• Unemployed bees: There are two types of unemployed bees: onlooker and scout. The
onlooker bees receive information about the food sources from employed bees. They
select the food sources with the best quality, explore the area near them and try to find
food sources with better quality. If a food source has been explored the maximum number of
times by an employed bee, that bee becomes a scout bee. The scout bee explores the search
space and tries to find food sources with better quality than the existing ones. After finding
a food source, the scout bee becomes an employed bee.
This is the general working behaviour of the ABC algorithm. However, it was originally proposed
for optimization problems with continuous values. In [160], the authors propose an improved version
of the ABC algorithm to select a subset of salient features from the existing ones. They use
the concept of modification rate (MR) to update the population in each iteration. The overall
working mechanism of ABC for selecting a subset of optimal features is described below:
1. Initialize the food sources: for feature selection, the goal is to select the subset
of features that yields the highest accuracy. For this purpose, the ABC algorithm is
initialized with N food sources, where N is equal to the total number of features. Each
food source is initialized with a bit vector of size N, where 1 indicates that the feature
is part of the feature subset,
2. The feature subset of each food source is fed as input into the SVM classifier and the
F1-score is calculated, which is used as the fitness value,
3. Determine the neighbours of each food source using MR: the employed bees visit the
food sources and explore their neighbours. In feature selection, a neighbour is created
from the bit vector of the original food source. Whether a feature is inserted into the bit
vector of the neighbour food source is decided using MR: a uniform random number
between 0 and 1 is generated and, if its value is less than MR, the feature is included;
otherwise it is not,
4. The feature subsets of the neighbours are fed into the SVM classifier and the fitness of
each neighbour food source is calculated,
5. If the fitness of the newly created food source is better than that of the food source
under exploration, the information about the new food source is updated and shared with
other bees. Otherwise, the LIMIT variable is incremented. If the value of LIMIT exceeds
MAX LIMIT, the food source is abandoned. For each abandoned food source, a scout
bee is generated, which creates a new food source, becomes an employed bee and
calculates the fitness of the newly created food source,
6. The onlooker bees collect information about the food sources evaluated by the employed
bees. They select food sources with a better fitness value, i.e., a higher probability of
selection. After this, they become employed bees and processing starts again at step 3,
7. Memorize the food source with the highest fitness value once the onlooker bees' phase is
completed,
8. Scout bees are generated to create new food sources in replacement of the abandoned ones.
After this, processing starts again at step 3 and
9. This processing continues until the maximum number of iterations is reached.
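The nine steps can be condensed into the following Python sketch. It is an illustration under stated assumptions rather than the method of [160] or of this synopsis: the function name and parameters are hypothetical, food source i is assumed to start with only feature i switched on, scouts are assumed to draw a fresh random bit vector, and the fitness is a pluggable callable standing in for the SVM F1-score.

```python
import random


def abc_feature_selection(n_features, fitness, mr=0.2, max_limit=3,
                          iterations=20, seed=0):
    """ABC-style feature selection sketch; returns (best_mask, best_score)."""
    rng = random.Random(seed)
    # Step 1: N food sources; source i starts with only feature i (assumption).
    sources = [[1 if j == i else 0 for j in range(n_features)]
               for i in range(n_features)]
    # Step 2: fitness of every food source.
    scores = [fitness(src) for src in sources]
    limits = [0] * n_features
    best = {"mask": None, "score": float("-inf")}

    def remember(i):
        # Step 7: keep the best food source seen so far.
        if scores[i] > best["score"]:
            best["mask"], best["score"] = sources[i][:], scores[i]

    def explore(i):
        # Step 3: each excluded feature enters the neighbour if rand < MR.
        neighbour = [1 if bit == 1 or rng.random() < mr else 0
                     for bit in sources[i]]
        # Step 4: evaluate the neighbour.
        score = fitness(neighbour)
        # Step 5: greedy replacement, otherwise increase the LIMIT counter.
        if score > scores[i]:
            sources[i], scores[i], limits[i] = neighbour, score, 0
        else:
            limits[i] += 1
        remember(i)

    for i in range(n_features):
        remember(i)

    for _ in range(iterations):
        for i in range(n_features):  # employed bees visit every source
            explore(i)
        # Step 6: onlookers favour sources with higher fitness.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        for i in rng.choices(range(n_features), weights=weights, k=n_features):
            explore(i)
        # Step 8: scouts replace sources whose LIMIT exceeds MAX LIMIT.
        for i in range(n_features):
            if limits[i] > max_limit:
                sources[i] = [rng.randint(0, 1) for _ in range(n_features)]
                scores[i] = fitness(sources[i])
                limits[i] = 0
                remember(i)

    return best["mask"], best["score"]


def toy_fitness(mask):
    # Hypothetical stand-in for the SVM F1-score.
    return sum(mask[i] for i in (0, 2, 5)) - 0.1 * sum(mask)


mask, score = abc_feature_selection(8, toy_fitness)
```

Because neighbours only add features, removal of features in this sketch happens through the scout phase; other neighbour rules are possible.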
5.3.3 Denoising Autoencoder
The autoencoder is an unsupervised neural network with three layers: input, bottleneck and
output. The structure of the autoencoder is given in Figure 12. It converts the original data into
low dimensional code and then tries to reconstruct original data from coded data with minimum
difference. The working process of the autoencoder contains two steps: encoding and decoding.
In the first step, the input data is converted into low dimensional space, which is known as code
(See denoising autoencoder code for feature extraction).
$$C = g_e(X) = \mathrm{LReLU}(W_e X + b_e) \quad (19)$$
In the second step, the decoder reconstructs the data from the code and maps it to the original data.
$$Y = g_d(C) = \mathrm{LReLU}(W_d C + b_d) \quad (20)$$
Figure 12. Denoising autoencoder (input layer X, encoder g_e, bottleneck layer C, decoder g_d, output layer Y; for the ideal case X = Y)
In equations 19 and 20, $X = \{x_1, x_2, x_3, ..., x_t\}$ is the input feature vector, $C = \{c_1, c_2, c_3, ..., c_m\}$
is the code vector produced by the $m$ neurons of the bottleneck layer and $Y = \{y_1, y_2, y_3, ..., y_t\}$ is the
output layer vector. $W_e$ and $W_d$ are the weights from input to hidden and hidden to output layers,
respectively, and $b_e$ and $b_d$ are the corresponding bias terms. The $g_e(\cdot)$ and $g_d(\cdot)$ are the activation
functions of the encoding and decoding layers, respectively. These functions try to map the
relationship between encoded and input data. In this study, the leaky rectified linear unit (LReLU)
is used as the activation function instead of ReLU because it keeps a small non-zero slope for
negative values instead of setting them to zero. After adjusting the parameter settings of the
encoder and decoder, the error between the original input data and the reconstructed data is
minimized through the mean square error (MSE) [161].
$$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} (X_i - Y_i)^2, \quad (21)$$
where $m$ is the total number of samples or records. So far, we have discussed a simple autoencoder,
in which the input and the target output are identical; this makes it sensitive to the training data
and reduces its generalization ability. In a denoising autoencoder, the input data is slightly
corrupted by adding some random noise before it is passed to the autoencoder, which makes the
model less sensitive to the training data. This modification improves the generalization ability of
the autoencoder, which enhances its feature extraction ability.
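To make the encode, decode and error steps concrete, the sketch below runs one forward pass of a tiny denoising autoencoder in plain Python. It is illustrative only: the layer sizes, the Gaussian noise level, the random weights and all function names are assumptions, and no training loop is shown. The key point is that the network sees the corrupted input, while the error is measured against the clean input.

```python
import random


def leaky_relu(values, alpha=0.01):
    # LReLU keeps a small slope (alpha) for negative inputs.
    return [v if v > 0 else alpha * v for v in values]


def affine(weights, x, bias):
    # Computes W x + b for a weight matrix given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def corrupt(x, noise_std, rng):
    # Denoising step: add small Gaussian noise to the clean input.
    return [xi + rng.gauss(0.0, noise_std) for xi in x]


def forward(x_in, w_enc, b_enc, w_dec, b_dec):
    code = leaky_relu(affine(w_enc, x_in, b_enc))    # encoder, equation 19
    recon = leaky_relu(affine(w_dec, code, b_dec))   # decoder, equation 20
    return code, recon


def mse(clean_batch, recon_batch):
    # Mean over samples of the squared reconstruction error.
    m = len(clean_batch)
    return sum(sum((x - y) ** 2 for x, y in zip(xs, ys))
               for xs, ys in zip(clean_batch, recon_batch)) / m


rng = random.Random(0)
t, m_code = 4, 2  # assumed input size and bottleneck size
w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(t)] for _ in range(m_code)]
w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(m_code)] for _ in range(t)]
b_enc, b_dec = [0.0] * m_code, [0.0] * t

batch = [[0.2, 0.4, 0.6, 0.8], [0.9, 0.1, 0.5, 0.3]]
codes, recons = [], []
for x in batch:
    code, recon = forward(corrupt(x, 0.05, rng), w_enc, b_enc, w_dec, b_dec)
    codes.append(code)
    recons.append(recon)
loss = mse(batch, recons)  # compared with the clean data, not the noisy data
```

In a full implementation the weights would be updated to reduce `loss`, and `codes` would be used as the extracted features.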
6 Proposed Solution 3: Hybrid deep learning model based on ResNet
and GRU for classification of upwards and downwards trends
Solution 3 is proposed to tackle the limitations identified in sub problem statement 3.3.
6.1 Data Processing Phase
Pre-processing of data is an important phase, which includes the following steps: acquisition of
the data, handling the missing values, normalization and overcoming the curse of dimensionality
(see Code).
6.1.1 Acquiring Datasets
IBM, APPL, BA and WMT datasets are considered to evaluate the performance of RG 2. All four
are large, globally known companies that exhibit complex behavioural patterns in the financial
market, which are very difficult to predict using statistical and conventional ML models. In
this study, we work with highly granular data to analyse the upward and downward trends in
the financial market. The data is collected from the Yahoo Finance website from 25-08-2021 to
31-08-2021. The collection window is limited because high granularity data (collected every
minute) is only accessible for the last seven days: such data leads to privacy risk, so most
companies do not provide high frequency data for commercial use. Next, 92 technical
indicators are used to determine the relationship between supply and demand of assets in the
financial market. The mathematical equations and technical details of these indicators can be
found in [167]. The labels of uptrends and downtrends are assigned based on the closing price
of each interval. We take the difference between the current closing price and the closing price
of the previous interval. If the difference is positive, we assign the label 1, indicating an upward
trend; otherwise 0 is assigned, indicating a negative or downward trend in the financial market
behaviour.
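The labelling rule can be written as a short function. This is a sketch: the function name and the sample prices are hypothetical, and the first interval is assumed to receive no label because it has no previous closing price.

```python
def label_trends(closing_prices):
    """Return 1 for an upward move and 0 otherwise, one label per interval
    starting from the second one (the first has no previous close)."""
    labels = []
    for previous, current in zip(closing_prices, closing_prices[1:]):
        labels.append(1 if current - previous > 0 else 0)
    return labels


closes = [100.0, 100.5, 100.5, 99.8, 101.2]  # made-up minute closes
labels = label_trends(closes)
print(labels)  # [1, 0, 0, 1]
```

Note that an unchanged price is labelled 0 under this rule, since the difference is not positive.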
6.1.2 Normalization
First, all datasets are checked for missing values using the Numpy and Pandas libraries.
Then, min-max normalization is performed, scaling the data between 0 and 1. The results of
the experiment show that the DL models perform best with normalized data because they are
sensitive to the diversity of the data. When the data is not normalized, the exploding gradient
problem occurs, stopping the learning process of the DL models. The mathematical equation for
the min-max method is given below.
$$x_{i,j} = \frac{x_{i,j} - \min(x_i)}{\max(x_i) - \min(x_i)} \quad (22)$$
$x_{i,j}$ represents the current behaviour of the financial market, while $\min(x_i)$ and $\max(x_i)$ represent
the minimum and maximum values of the whole interval.
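Equation 22 amounts to the following few lines. The function name is hypothetical, and mapping a constant series to zeros is an assumption made here to avoid division by zero; the synopsis does not state how that case is handled.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] with min-max normalization."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption: a constant series carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


scaled = min_max_scale([10.0, 15.0, 20.0])
print(scaled)  # [0.0, 0.5, 1.0]
```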
6.1.3 Denoising Autoencoder
The curse of dimensionality is a common problem for ML and DL models, which badly affects
their performance. Because of it, the models have a high execution time and perform better
during training than during the testing phase. In the literature, feature selection and
extraction methods are utilized to reduce the dimensionality of data. The former select features
from the existing ones, while the latter compress the original features and create new ones. In the
literature, the authors utilize PCA to extract high variance features from financial market data.
However, PCA works well on linear data, while the financial market has complex and non-linear
behaviour. The autoencoder is a DL model, which consists of encoder and decoder parts. The
encoder compresses the high dimensional data into low dimensional data, which is known as
code. The decoder takes the code produced by the encoder and transforms it back into
high-dimensional data (see Code).
$$Y = \mathrm{encoder}(W_{encoder} * X + b_{encoder}) \quad (23)$$
$$X' = \mathrm{decoder}(W_{decoder} * Y + b_{decoder}) \quad (24)$$
$X$, $Y$ and $X'$ are the original input, coded and reconstructed data, respectively. The autoencoder
is mostly used for feature extraction. However, in some cases it learns the identity function, where
the output simply copies the input, which makes the autoencoder useless: it performs better on seen
data than on unseen data. In this study, we adopt the denoising autoencoder, an advanced version
that solves this problem by introducing noise into the original input data. The slight corruption
of the input data increases its generalization ability. Compared to a simple autoencoder, it has a
better feature extraction capability.
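$$z_t = \sigma(x_t * W_z + U_z h_{t-1}) \quad (25)$$
$$r_t = \sigma(x_t * W_r + U_r h_{t-1}) \quad (26)$$
$$h'_t = \tanh(x_t * W + r_t \circ U h_{t-1}) \quad (27)$$
$$h_t = z_t \circ h_{t-1} + (1 - z_t) \circ h'_t \quad (28)$$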
The equations of the reset and update gates have the same form. However, the two gates use
different weight matrices to keep or forget past information. $x_t$ and $z_t$ are the current input and
the update gate, respectively. $W_z$ and $U_z$ are the weight matrices of the update gate. $W_r$ and $U_r$
are the weight matrices of the reset gate. $h'_t$ and $h_t$ are the candidate and new output values,
respectively, and $h_{t-1}$ is the previous output. In the end, we apply flatten and dense layers to
combine the output of the GRU model with ResNet.
$$Dense_{GRU} = W_{Dense} * \mathrm{flatten}(h_t) + b_{Dense} \quad (29)$$
$W_{Dense}$ and $b_{Dense}$ are the weight matrix and bias term of the dense layer.
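A single GRU time step following equations 25 to 27 can be sketched in plain Python as below. The helper names and matrix shapes are assumptions, bias terms are omitted as in the equations, and the final state is computed with the standard interpolation h_t = z_t ∘ h_{t-1} + (1 − z_t) ∘ h'_t, which is an assumption about how equation 28 is meant to be read.

```python
import math


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def vec_mat(x, w):
    # Row vector x (length d) times matrix w (d rows, h columns).
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]


def mat_vec(u, h):
    # Matrix u (h rows, h columns) times column vector h.
    return [sum(a * b for a, b in zip(row, h)) for row in u]


def gru_step(x_t, h_prev, w_z, u_z, w_r, u_r, w, u):
    # Update gate (equation 25) and reset gate (equation 26).
    z = sigmoid([a + b for a, b in zip(vec_mat(x_t, w_z), mat_vec(u_z, h_prev))])
    r = sigmoid([a + b for a, b in zip(vec_mat(x_t, w_r), mat_vec(u_r, h_prev))])
    # Candidate state (equation 27): the reset gate scales the past state.
    past = [ri * ui for ri, ui in zip(r, mat_vec(u, h_prev))]
    h_cand = [math.tanh(a + b) for a, b in zip(vec_mat(x_t, w), past)]
    # New state: the update gate blends the past state and the candidate.
    return [zi * hp + (1.0 - zi) * hc
            for zi, hp, hc in zip(z, h_prev, h_cand)]


# With all-zero weights both gates equal 0.5 and the candidate is 0,
# so the new state is half of the previous state.
zeros_in = [[0.0, 0.0] for _ in range(3)]   # input size 3, hidden size 2
zeros_hid = [[0.0, 0.0] for _ in range(2)]
h_new = gru_step([1.0, 2.0, 3.0], [0.2, -0.4],
                 zeros_in, zeros_hid, zeros_in, zeros_hid, zeros_in, zeros_hid)
```

In practice a library layer such as a Keras GRU would be used; the sketch only shows how the gates interact.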
Figure 13. Residual module with skip connection (labels: input X, weight matrix, ReLU, weight matrix, X' = F(X), skip connection giving X' = F(X) + X, ReLU)
Figure 14. The proposed model: HRG (stages: 1 Finance data, 2 Handle missing values, 3 Normalization, 4 Feature extraction with the denoising autoencoder (clean data, noisy data, encoder, coded data, decoder, reconstructed data), 5 GRU module (GRU1 to GRUn, DenseGRU), 6 ResNet module (DenseResNet), 7 Hybrid module, 8 Prediction (Ptrend))
Algorithm 4: The algorithm of HRG
Data: Financial market dataset: X
1  X = {x_{i,j}, x_{i+1,j+1}, ..., x_{m,n}}
2  Indicators = {Ind_1, Ind_2, ..., Ind_92}
3  IndiDataFrame = [...]
4  for i ← m do
5      for e ← 92 do
6          for j ← n do
7              IndiDataFrame[i, j] = Indicators[e].X_i
8          end
9      end
10 end
11 min_i = minimum value of indicator during interval i
12 max_i = maximum value of indicator during interval i
13 for i ← m do
14     for j ← n do
15         x_{i,j} = (x_{i,j} − min_i) / (max_i − min_i)
16     end
17 end
18 while epoch ← 50 do
19     Initialize a layer with weights and bias
20     W_encoder, b_encoder, W_decoder, b_decoder
21     Add noise in input data
22     X = IndiDataFrame ∗ 0.5
23     Training of encoder and decoder
24     Y = encoder(W_encoder ∗ X + b_encoder)
25     ResX = decoder(W_decoder ∗ Y + b_decoder)
26     RMSE = sqrt( Σ_{i=1}^{m} (IndiDataFrame_i − ResX_i)^2 / m )
27     Goal: Reduce RMSE value
28     Return: ResX
29 end
30 ResX[label] = 0
31 for i = 1 ← m do
32     Previous closing price
33     PCPrice = X[i−1, 3]
34     Current closing price
35     CCPrice = X[i, 3]
36     if CCPrice ≥ PCPrice then
37         ResX[i, −1] = 1
38     end
39 end
40 Features and label separation
41 X = ResX[:, :−1], y = ResX[:, −1]
42 while epoch ← 50 do
43     DenseGRU = W_Dense ∗ flatten(X) + b_Dense
44     DenseResNet = W_Dense ∗ flatten(X) + b_Dense
45     h_HRG = W_HRG [DenseGRU + DenseResNet] + b_HRG
46     ŷ = σ(W_Hybrid ∗ h_HRG + b_Hybrid)
47     Loss = −(1/m) Σ_{i=1}^{m} [ y_i · log ŷ_i + (1 − y_i) · log(1 − ŷ_i) ]
48     Goal: minimize loss
49 end
Result: P_trend = ŷ
7 The Description of Proposed Model
In this section, the proposed hybrid model, which combines GRU and ResNet, is described.
The former has memory modules that help to learn and remember long and short term temporal
patterns. The latter has convolutional layers, pooling layers and skip connections that help to
extract abstract or latent patterns from financial market data that cannot be seen by the human
eye. The descriptions of GRU and ResNet are given below (see Code).
7.1 Gated Recurrent Unit
Sequential models have been developed to predict the complex behaviour of time series data.
The recurrent neural network is the first sequential model and works well for short sequential
patterns. However, for long sequential patterns, it suffers from the vanishing gradient problem,
so it remembers only the most recent information and forgets past information. LSTM and GRU
are its advanced versions that have been proposed to solve the vanishing gradient problem. In
this study, we choose the GRU model because it has lower internal complexity and gives better
results than LSTM on smaller datasets. GRU contains update and reset gates. The former
decides how much information from the past is passed along the path, while the latter decides how
much information is removed from the network. GRU utilizes the sigmoid $\sigma(x) = \frac{1}{1 + e^{-x}}$ and the
hyperbolic tangent $\tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1}$, which are non-linear activation functions, to learn complex
and non-linear patterns from financial market data. The mathematical form of the GRU model is
given in equations 25 to 29 [168].
7.2 ResNet Model
CNN is an advanced version of the ANN that is widely used for object detection, video
segmentation, object localization, etc. It uses convolution operations on images in a different
way. The idea of these operations comes from the field of computer vision, where
hand-crafted convolution filters were developed to extract various features from images. In the
CNN model, however, the optimized values of these filters are learned during the training process
by stochastic gradient descent. The CNN model mainly consists of two types of layers,
namely convolutional and pooling layers.
Multiple convolutional layers: One of the limitations of a simple neural network is poor
scalability due to the complete connectivity of neurons. Convolutional layers overcome this
limitation by using kernels or filters that move over the entire image, extracting optimal values
and ignoring redundant values. This mechanism not only reduces the complexity of the model
but also improves scalability. Multiple convolutional layers are stacked together. The initial
layers extract low-level features, while the middle and last layers extract high-level, abstract
features that cannot be seen by the human eye.
Pooling layers: There are two types of pooling layers, i.e., max and average. The max-pooling
layers take the maximum value from the feature map and ignore the other values, while
average pooling layers take the average of all values and produce a single value. So both
convolutional layers and pooling layers extract optimal features and remove the redundant ones.
This process not only reduces the complexity of the model but also increases the scalability.
If we simply stack the convolutional layers, a vanishing gradient problem occurs, which stops
the learning process of DL models. Existing models like AlexNet, GoogLeNet and VGG
perform poorly when the number of convolutional layers is increased. ResNet is an advanced
version of CNN that solves the vanishing gradient problem by introducing the concept of skip
connections. In these connections, the output of the previous layers is added to the output of
the next layers. For example, $X$ is an input matrix and $X'$ is the matrix obtained after applying
convolution operations, max-pooling, batch normalization operations and non-linear activation
functions. Both matrices are then added and the resulting matrix is fed into the next layers for
further operations. The residual block representing the idea of skip connection can be seen in
Figure 13.
The ResNet model with skip connections solves the vanishing gradient problem because
it allows the gradient to flow through alternative paths. As a result, the higher layers perform at
least as well as the lower layers. In the end, flatten and dense layers are used to combine the
resulting output with the extracted features of the GRU model.
$$Dense_{ResNet} = W_{Dense} * \mathrm{flatten}(X') + b_{Dense} \quad (30)$$
$Dense_{ResNet}$ is the last dense layer of the ResNet model, while $W_{Dense}$ and $b_{Dense}$ are the weight matrix
and bias term of the dense layer, respectively.
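The skip connection can be illustrated with a toy residual block. This sketch replaces the convolution, pooling and batch normalization operations with two plain weight matrices, so it only demonstrates the addition of the input to the transformed output; the function names and shapes are assumptions.

```python
def relu(values):
    return [v if v > 0 else 0.0 for v in values]


def dense(weights, x):
    # Weight matrix (list of rows) times vector x, without bias.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def residual_block(x, w1, w2):
    # F(X): weight matrix -> ReLU -> weight matrix.
    f_x = dense(w2, relu(dense(w1, x)))
    # Skip connection: add the input back, then apply the final ReLU.
    return relu([f + xi for f, xi in zip(f_x, x)])


# With all-zero weights F(X) = 0, so the block reduces to ReLU(X):
# the input still reaches the output through the skip path.
zero = [[0.0] * 3 for _ in range(3)]
out = residual_block([1.0, -2.0, 3.0], zero, zero)
print(out)  # [1.0, 0.0, 3.0]
```

The zero-weight case shows why stacking such blocks does not block the signal: even an uninformative layer passes its input forward.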
7.3 Hybrid Module
The $Dense_{GRU}$ and $Dense_{ResNet}$ are the last layers of the GRU and ResNet models. The hybrid
module simply concatenates both layers using the Keras API.
$$h_{HRG} = W_{HRG} [Dense_{GRU} + Dense_{ResNet}] + b_{HRG} \quad (31)$$
Now, the sigmoid activation function is applied to $h_{HRG}$, which returns the probability of
having an upward or downward trend in financial market behaviour.
$$P_{trend} = \sigma(W_{Hybrid} * h_{HRG} + b_{Hybrid}) \quad (32)$$
$P_{trend}$ takes the value 1 or 0, which indicate upward and downward trends in financial market
behaviour, respectively. $W_{Hybrid}$ and $b_{Hybrid}$ are the weight matrix and bias term of the hybrid
module. The diagram of the proposed model is presented in Figure 14. The working mechanism of
the HRG model is shown in Algorithm 4. In lines 1 to 10, datasets are acquired from the Yahoo
Finance website and 92 technical indicators are obtained from them. In lines 11 to 17, the
min-max normalization technique is applied. In lines 18 to 29, the denoising autoencoder is used
to extract optimal features from the time series dataset. In lines 30 to 39, labels are assigned
to the technical indicator data frames, where 1 and 0 represent upward and downward trends in
financial market behaviour, respectively. In lines 40 to 49, ResNet and GRU layers are utilized
to extract latent and temporal patterns from the financial market data, and fully connected layers
are used to predict market behaviour.
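The hybrid module and its loss can be sketched as follows. All names and shapes are hypothetical; the bracket in equation 31 is read as concatenation of the two dense outputs, and the loss is the binary cross-entropy used in the algorithm of HRG, with a small clipping constant added here for numerical safety (an assumption).

```python
import math


def dense(weights, x, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def hybrid_head(dense_gru, dense_resnet, w_hrg, b_hrg, w_hybrid, b_hybrid):
    merged = dense_gru + dense_resnet       # list concatenation of both branches
    h_hrg = dense(w_hrg, merged, b_hrg)     # equation 31
    logit = sum(w * h for w, h in zip(w_hybrid, h_hrg)) + b_hybrid
    return sigmoid(logit)                   # equation 32: probability of an uptrend


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)     # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# With zero weights the head is maximally uncertain and returns 0.5.
w_hrg = [[0.0] * 4 for _ in range(2)]
p_trend = hybrid_head([0.3, 0.7], [0.1, 0.9], w_hrg, [0.0, 0.0], [0.0, 0.0], 0.0)
loss = binary_cross_entropy([1, 0], [p_trend, p_trend])
label = 1 if p_trend >= 0.5 else 0  # assumed 0.5 threshold for the final class
```

The 0.5 threshold is a common convention for turning the probability into the 1 or 0 trend label; the synopsis does not specify the threshold.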
8 Validation of the proposed system models
In this synopsis, we have proposed three solutions for sub problems 3.1, 3.2 and 3.3. The first
solution is a hybrid DL model based on GoogLeNet and GRU, which is used to classify
malicious and normal samples. In the second solution, we have proposed a framework that is
based on metaheuristic techniques and a denoising autoencoder to solve the curse of
dimensionality issue. This issue creates an overfitting problem and increases the training time of
ML and DL models. In the third solution, a hybrid DL model is introduced that combines ResNet
and GRU for classification of upward and downward trends in financial market data. The
validation of the proposed solutions will be done in the thesis work.
9 Conclusions and Future Work
In this synopsis, we have proposed three different solutions for sub problems 3.1,3.2 and 3.3.
The conclusion of each solution is given below.
In the first solution, we propose a model to detect NTLs in the electricity distribution system.
The proposed model is a hybrid of GRU and GoogLeNet. The GRU is used to extract temporal
patterns from the time series dataset, whereas GoogLeNet is exploited to extract abstract and
latent patterns from the weekly stacked EC dataset. The proposed model is evaluated on a realistic
EC dataset provided by SGCC, the largest smart grid company in China. Moreover, the
class imbalance problem is a severe issue in ETD. The TLSGAN, which consists of
GRU and dense layers, is proposed to tackle the class imbalance problem. The TLSGAN generates
fake samples that closely resemble real-world theft samples.
In the second solution, we propose a framework based on metaheuristic techniques (ABC and
GA) and denoising autoencoders. The former select optimal features, while the latter extract
high variance features from the EC data. We use the PRECON dataset, which contains
information on consumers with different social, demographic and financial backgrounds. We
synthesize multiple features from users' consumption patterns using statistical and electrical
parameters. The newly generated features are fed into the metaheuristic techniques to find a subset
of optimal features. The denoising autoencoder is used to extract high variance features from the
subset of optimal features selected by GA and ABC. Moreover, the lack or infrequent availability
of theft samples is a serious problem in ETD, which reduces the classification accuracy and leads
to false alarms. We use an updated version of theft samples to generate malicious samples and
concatenate them with normal samples to improve DR.
In the third solution, we develop a hybrid model, HRG, based on ResNet and GRU modules
to integrate the advantages of both models. The former is used to extract latent or abstract
patterns, while the latter is used to find temporal patterns in financial market data. The performance
of the proposed model is evaluated using real financial market datasets obtained from
well-known companies around the globe, i.e., IBM, APPL, BA and WMT. Moreover, the curse of
dimensionality is a major problem that severely affects the performance of forecasting models:
it leads to overfitting and increases the complexity of the model. The denoising autoencoder,
an advanced version of the simple autoencoder, is used to extract important features from financial
market datasets. A small amount of noise is added to the input data, which makes the model
insensitive to the diversity of data and increases its generalization ability.
For validation, the proposed models will be implemented by writing the source code of the
ML and DL models in Google Colab [169]. Exploratory data analysis will be
performed using the Numpy [170], Pandas [171], Seaborn [172] and Matplotlib [173] libraries.
The DL models will be implemented using Keras [174], TensorFlow [175], Caffe
[176] and Pytorch [177], while the ML models will be implemented using the scikit-learn [178] and
H2O [179] libraries. If the proposed system models are implemented in time, hyperparameter
tuning of the proposed models will be performed using metaheuristic techniques [78].
Moreover, the performance of the proposed models will be evaluated on the following datasets:
SGCC, PRECON, IBM, WMT, BA and APPL.
References
[1] Arango, L. G., Deccache, E., Bonatto, B. D., Arango, H., Ribeiro, P. F., & Silveira, P. M.
(2016, October). Impact of electricity theft on power quality. In 2016 17th International
Conference on Harmonics and Quality of Power (ICHQP) (pp. 557-562). IEEE.
[2] Jokar, P., Arianpoo, N., & Leung, V. C. (2015). Electricity theft detection in AMI using
customers’ consumption patterns. IEEE Transactions on Smart Grid, 7(1), 216-226.
[3] Aslam, S., Javaid, N., Khan, F. A., Alamri, A., Almogren, A., & Abdul, W. (2018). Towards
efficient energy management and power trading in a residential area via integrating a grid-
connected microgrid. Sustainability, 10(4), 1245.
[4] Rasheed, M. B., Javaid, N., Ahmad, A., Awais, M., Khan, Z. A., Qasim, U., & Alrajeh,
N. (2016). Priority and delay constrained demand side management in real-time price envi-
ronment with renewable energy source. International Journal of Energy Research, 40(14),
2002-2021.
[5] Punmiya, R., & Choe, S. (2019). Energy theft detection using gradient boosting theft detec-
tor with feature engineering-based preprocessing. IEEE Transactions on Smart Grid, 10(2),
2326-2329.
[6] Zheng, K., Chen, Q., Wang, Y., Kang, C., & Xia, Q. (2018). A novel combined data-driven
approach for electricity theft detection. IEEE Transactions on Industrial Informatics, 15(3),
1809-1819.
[7] Razavi, R., Gharipour, A., Fleury, M., & Akpan, I. J. (2019). A practical feature-engineering
framework for electricity theft detection in smart grids. Applied energy, 238, 481-494.
[8] Otuoze, A. O., Mustafa, M. W., Mohammed, O. O., Saeed, M. S., Surajudeen-Bakinde,
N. T., & Salisu, S. (2019). Electricity theft detection by sources of threats for smart city
planning. IET Smart Cities, 1(2), 52-60.
[9] Tariq, M., & Poor, H. V. (2016). Electricity theft detection and localization in grid-tied
microgrids. IEEE Transactions on Smart Grid, 9(3), 1920-1929.
[10] Amin, S., Schwartz, G. A., Cardenas, A. A., & Sastry, S. S. (2015). Game-theoretic mod-
els of electricity theft detection in smart utility networks: Providing new capabilities with
advanced metering infrastructure. IEEE Control Systems Magazine, 35(1), 66-81.
[11] Yan, Z., & Wen, H. (2021). Electricity theft detection base on extreme gradient boosting
in AMI. IEEE Transactions on Instrumentation and Measurement, 70, 1-9.
[12] Aldegheishem, A., Anwar, M., Javaid, N., Alrajeh, N., Shafiq, M., & Ahmed, H. (2021).
Towards sustainable energy efficiency with intelligent electricity theft detection in smart
grids emphasising enhanced neural networks. IEEE Access, 9, 25036-25061.
[13] Finardi, P., Campiotti, I., Plensack, G., de Souza, R. D., Nogueira, R., Pinheiro,
G., & Lotufo, R. (2020). Electricity Theft Detection with self-attention. arXiv preprint
arXiv:2002.06219.
[14] Khan, Z. A., Adil, M., Javaid, N., Saqib, M. N., Shafiq, M., & Choi, J. G. (2020). Electric-
ity theft detection using supervised learning techniques on smart meter data. Sustainability,
12(19), 8023.
[15] Feng, X., Hui, H., Liang, Z., Guo, W., Que, H., Feng, H. & Ding, Y. (2020). A Novel Elec-
tricity Theft Detection Scheme Based on Text Convolutional Neural Networks. Energies,
13(21), 5758.
[16] Gong, X., Tang, B., Zhu, R., Liao, W., & Song, L. (2020). Data augmentation for electricity
theft detection using conditional variational auto-encoder. Energies, 13(17), 4291.
[17] Lo, C. H., & Ansari, N. (2013). CONSUMER: A novel hybrid intrusion detection system
for distribution networks in smart grid. IEEE Transactions on Emerging Topics in Comput-
ing, 1(1), 33-44.
[18] Khoo, B., & Cheng, Y. (2011, April). Using RFID for anti-theft in a Chinese electrical sup-
ply company: A cost-benefit analysis. In 2011 Wireless Telecommunications Symposium
(WTS) (pp. 1-6). IEEE.
[19] Anzar, M., Nadeem, J., Muhammad, A. K., & Sohail, R. (2015). An Overview of Load
Management Techniques in Smart Grid. Int. J. Energy Res, 39(11), 1437-1450.
[20] Mohite, N., Ranaware, R. & Kakade, P., (2014). “GSM based electricity theft detection.”
International Journal of Scientific Engineering and Applied Science (IJSEAS), 8(10), pp.51-59.
[21] Jumale, P., Khaire, A., Jadhawar, H., Awathare, S., & Mali, M. (2016). Survey: Electricity
Theft Detection Technique. International Journal of Computer Engineering and Information
Technology, 8(2), 30.
[22] Bihl, T. J., & Hajjar, S. (2017, June). Electricity theft concerns within advanced energy
technologies. In 2017 IEEE National Aerospace and Electronics Conference (NAECON)
(pp. 271-278). IEEE.
[23] Dineshkumar, K., Ramanathan, P., & Ramasamy, S. (2015, March). Development of ARM
processor based electricity theft control system using GSM network. In 2015 International
Conference on Circuits, Power and Computing Technologies [ICCPCT-2015] (pp. 1-6).
IEEE.
[24] Patil, N. V., Kanase, R. S., Bondar, D. R., & Bamane, P. D. (2017, February). Intelligent
energy meter with advanced billing system and electricity theft detection. In 2017 Interna-
tional Conference on Data Management, Analytics and Innovation (ICDMAI) (pp. 36-41).
IEEE.
[25] Chauhan, A. A. (2015, May). Non-technical losses in power system and monitoring of
electricity theft over low-tension poles. In 2015 Second International Conference on Ad-
vances in Computing and Communication Engineering (pp. 280-284). IEEE.
[26] Amin, S., Schwartz, G. A., & Tembine, H. (2012, November). Incentives and security in
electricity distribution networks. In International Conference on Decision and Game Theory
for Security (pp. 264-280). Springer, Berlin, Heidelberg.
[27] Cárdenas, A. A., Amin, S., Schwartz, G., Dong, R., & Sastry, S. (2012, October). A
game theory model for electricity theft detection and privacy-aware control in AMI systems.
In 2012 50th Annual Allerton Conference on Communication, Control, and Computing
(Allerton) (pp. 1830-1837). IEEE.
[28] Kalra, P. (2014). Theft Detection Schemes for Smart Grid Application. Journal Impact
Factor, 5(12), 321-327.
[29] Liang, X. & Xiao, Y., (2012). “Game theory for network security." IEEE Communications
Surveys & Tutorials, 15(1), pp.472-486.
[30] Salinas, S. A., & Li, P. (2015). Privacy-preserving energy theft detection in microgrids: A
state estimation approach. IEEE Transactions on Power Systems, 31(2), 883-894.
[31] Podimata, M. V., & Yannopoulos, P. C. (2015). Evolution of game theory application in
irrigation systems. Agriculture and agricultural science procedia, 4, 271-281.
[32] McLaughlin, S., Holbert, B., Fawaz, A., Berthier, R., & Zonouz, S. (2013). A multi-sensor
energy theft detection framework for advanced metering infrastructures. IEEE Journal on
Selected Areas in Communications, 31(7), 1319-1330.
[33] Zheng, Z., Yang, Y., Niu, X., Dai, H. N., & Zhou, Y. (2017). Wide and deep convolutional
neural networks for electricity-theft detection to secure smart grids. IEEE Transactions on
Industrial Informatics, 14(4), 1606-1615.
[34] Buzau, M. M., Tejedor-Aguilera, J., Cruz-Romero, P., & Gómez-Expósito, A. (2019). Hy-
brid deep neural networks for detection of non-technical losses in electricity smart meters.
IEEE Transactions on Power Systems, 35(2), 1254-1263.
[35] Hussain, S., Mustafa, M. W., Jumani, T. A., Baloch, S. K., Alotaibi, H., Khan, I., &
Khan, A. (2021). A novel feature engineered-CatBoost-based supervised machine learning
framework for electricity theft detection. Energy Reports, 7, 4425-4436.
[36] Bohani, F. A., Suliman, A., Saripuddin, M., Sameon, S. S., Md Salleh, N. S., & Nazeri,
S. (2021). A Comprehensive Analysis of Supervised Learning Techniques for Electricity
Theft Detection. Journal of Electrical and Computer Engineering, 2021.
[37] Aslam, Z., Javaid, N., Ahmad, A., Ahmed, A., & Gulfam, S. M. (2020). A combined
deep learning and ensemble learning methodology to avoid electricity theft in smart grids.
Energies, 13(21), 5599.
[38] Park, C. H., & Kim, T. (2020). Energy Theft Detection in Advanced Metering Infrastruc-
ture Based on Anomaly Pattern Detection. Energies, 13(15), 3832.
[39] Jindal, A., Dua, A., Kaur, K., Singh, M., Kumar, N., & Mishra, S. (2016). Decision tree and
SVM-based data analytics for theft detection in smart grid. IEEE Transactions on Industrial
Informatics, 12(3), 1005-1016.
[40] Nabil, M., Ismail, M., Mahmoud, M., Shahin, M., Qaraqe, K., & Serpedin, E. (2019).
Deep learning-based detection of electricity theft cyber-attacks in smart grid AMI networks.
In Deep Learning Applications for Cyber Security (pp. 73-102). Springer, Cham.
[41] Kocaman, B., & Tümen, V. (2020). Detection of electricity theft using data processing and
LSTM method in distribution systems. Sādhanā, 45(1), 1-10.
[42] Pereira, J., & Saraiva, F. (2021). Convolutional neural network applied to detect electricity
theft: A comparative study on unbalanced data handling techniques. International Journal
of Electrical Power & Energy Systems, 131, 107085.
[43] Hussain, S., Mustafa, M. W., Jumani, T. A., Baloch, S. K., & Saeed, M. S. (2020). A
novel unsupervised feature-based approach for electricity theft detection using robust PCA
and outlier removal clustering algorithm. International Transactions on Electrical Energy
Systems, 30(11), e12572.
[44] Jamil, F. (2018). Electricity theft among residential consumers in Rawalpindi and Islam-
abad. Energy Policy, 123, 147-154.
[45] Hasan, M., Toma, R. N., Nahid, A. A., Islam, M., & Kim, J. M. (2019). Electricity theft
detection in smart grid systems: A CNN-LSTM based approach. Energies, 12(17), 3310.
[46] Avila, N. F., Figueroa, G., & Chu, C. C. (2018). NTL detection in electric distribution
systems using the maximal overlap discrete wavelet-packet transform and random under-
sampling boosting. IEEE Transactions on Power Systems, 33(6), 7171-7180.
[47] Tjahjono, A., Raziqurrahman, S., & Wardhani, R. N. (2020, February). Consumer power
prediction based on neural network for electricity theft detection. In Journal of Physics:
Conference Series (Vol. 1450, No. 1, p. 012048). IOP Publishing.
[48] Odoom, D. (2020). A Methodology in Utilizing Machine Learning Algorithm for Electric-
ity Theft Detection in Ghana. Available at SSRN 3659614.
[49] Shen, Y., Shao, P., Chen, G., Gu, X., Wen, T., Zang, L., & Zhu, J. (2021). An identifi-
cation method of anti-electricity theft load based on long and short-term memory network.
Procedia Computer Science, 183, 440-447.
[50] Lin, G., Feng, X., Guo, W., Cui, X., Liu, S., Jin, W., & Ding, Y. (2021). Electricity Theft
Detection Based on Stacked Autoencoder and the Undersampling and Resampling Based
Random Forest Algorithm. IEEE Access, 9, 124044-124058.
[51] Qu, Z., Liu, H., Wang, Z., Xu, J., Zhang, P., & Zeng, H. (2021). A combined genetic
optimization with AdaBoost ensemble model for anomaly detection in buildings electricity
consumption. Energy and Buildings, 248, 111193.
[52] Bian, J., Wang, L., Scherer, R., Woźniak, M., Zhang, P., & Wei, W. (2021). Abnormal
Detection of Electricity Consumption of User Based on Particle Swarm Optimization and
Long Short Term Memory With the Attention Mechanism. IEEE Access, 9, 47252-47265.
[53] Afridi, A., Wahab, A., Khan, S., Ullah, W., Khan, S., Islam, S. Z. U., & Hussain, K.
(2021). An efficient and improved model for power theft detection in Pakistan. Bulletin of
Electrical Engineering and Informatics, 10(4), 1828-1837.
[54] Aslam, Z., Ahmed, F., Almogren, A., Shafiq, M., Zuair, M., & Javaid, N. (2020). An
attention guided semi-supervised learning mechanism to detect electricity frauds in the dis-
tribution systems. IEEE Access, 8, 221767-221782.
[55] Gunturi, S. K., & Sarkar, D. (2021). Ensemble machine learning models for the detection
of energy theft. Electric Power Systems Research, 192, 106904.
[56] Komolafe, O. M., & Udofia, K. M. (2020). A technique for electrical energy theft detection
and location in low voltage power distribution systems. Engineering and Applied Sciences,
5(2), 41.
[57] Buzau, M. M., Tejedor-Aguilera, J., Cruz-Romero, P., & Gómez-Expósito, A. (2018).
Detection of non-technical losses using smart meter data and supervised learning. IEEE
Transactions on Smart Grid, 10(3), 2661-2670.
[58] Ramos, C. C., Rodrigues, D., de Souza, A. N., & Papa, J. P. (2016). On the study of
commercial losses in Brazil: a binary black hole algorithm for theft characterization. IEEE
Transactions on Smart Grid, 9(2), 676-683.
[59] Coussement, K., De Bock, K. W., & Geuens, S. (2021). A decision-analytic framework for
interpretable recommendation systems with multiple input data sources: a case study for a
European e-tailer. Annals of Operations Research, 1-24.
[60] Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for
financial market predictions. European Journal of Operational Research, 270(2), 654-669.
[61] Fama, E. F., Fisher, L., Jensen, M., & Roll, R. (1969). The adjustment of stock prices to
new information. International Economic Review, 10(1).
[62] Jensen, M. C. (1978). Some anomalous evidence regarding market efficiency. Journal of
Financial Economics, 6(2/3), 95-101.
[63] Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal
of Computational Science, 2(1), 1-8.
[64] Ferreira, F. G. D. C., Gandomi, A. H., & Cardoso, R. T. N. (2020, December). Finan-
cial time-series analysis of Brazilian stock market using machine learning. In 2020 IEEE
Symposium Series on Computational Intelligence (SSCI) (pp. 2853-2860). IEEE.
[65] Krauss, C., Do, X. A., & Huck, N. (2017). Deep neural networks, gradient-boosted trees,
random forests: Statistical arbitrage on the S&P 500. European Journal of Operational Re-
search, 259(2), 689-702.
[66] Yang, R., Yu, L., Zhao, Y., Yu, H., Xu, G., Wu, Y., & Liu, Z. (2020). Big data analytics for
financial Market volatility forecast based on support vector machine. International Journal
of Information Management, 50, 452-462.
[67] Wei, X., Chen, W., & Li, X. (2021). Exploring the financial indicators to improve the
pattern recognition of economic data based on machine learning. Neural Computing and
Applications, 33(2), 723-737.
[68] Mohanty, D. K., Parida, A. K., & Khuntia, S. S. (2021). Financial market prediction under
deep learning framework using auto encoder and kernel extreme learning machine. Applied
Soft Computing, 99, 106898.
[69] Li, B., Xu, K., Cui, X., Wang, Y., Ai, X., & Wang, Y. (2018, August). Multi-scale
DenseNet-based electricity theft detection. In International Conference on Intelligent Com-
puting (pp. 172-182). Springer, Cham.
[70] Li, S., Han, Y., Yao, X., Yingchen, S., Wang, J., & Zhao, Q. (2019). Electricity theft
detection in power grids with deep learning and random forests. Journal of Electrical and
Computer Engineering, 2019.
[71] Ghori, K. M., Abbasi, R. A., Awais, M., Imran, M., Ullah, A., & Szathmary, L. (2019).
Performance analysis of different types of machine learning classifiers for non-technical
loss detection. IEEE Access, 8, 16033-16048.
[72] Mujeeb, S., Javaid, N., Ilahi, M., Wadud, Z., Ishmanov, F., & Afzal, M. K. (2019). Deep
long short-term memory: A new price and load forecasting scheme for big data in smart
cities. Sustainability, 11(4), 987.
[73] Javaid, N., Gul, H., Baig, S., Shehzad, F., Xia, C., Guan, L., & Sultana, T. (2021). Using
GANCNN and ERNET for Detection of Non Technical Losses to Secure Smart Grids. IEEE
Access, 9, 98679-98700.
[74] Kong, X., Zhao, X., Liu, C., Li, Q., Dong, D., & Li, Y. (2021). Electricity theft detection
in low-voltage stations based on similarity measure and DT-KSVM. International Journal
of Electrical Power & Energy Systems, 125, 106544.
[75] Arif, A., Javaid, N., Aldegheishem, A., & Alrajeh, N. (2021). Big data analytics for iden-
tifying electricity theft using machine learning approaches in microgrids for smart commu-
nities. Concurrency and Computation: Practice and Experience, e6316.
[76] Javaid, N., Jan, N., & Javed, M. U. (2021). An adaptive synthesis to handle imbalanced
big data with deep siamese network for electricity theft detection in smart grids. Journal of
Parallel and Distributed Computing, 153, 44-52.
[77] Aldegheishem, A., Anwar, M., Javaid, N., Alrajeh, N., Shafiq, M., & Ahmed, H. (2021).
Towards sustainable energy efficiency with intelligent electricity theft detection in smart
grids emphasising enhanced neural networks. IEEE Access, 9, 25036-25061.
[78] Khalid, R., & Javaid, N. (2020). A survey on hyperparameters optimization algorithms of
forecasting models in smart grid. Sustainable Cities and Society, 61, 102275.
[79] Wang, X., & Ahn, S. H. (2020). Real-time prediction and anomaly detection of electrical
load in a residential community. Applied Energy, 259, 114145.
[80] Oprea, S. V., & Bâra, A. (2021). Machine learning classification algorithms and anomaly
detection in conventional meters and Tunisian electricity consumption large datasets. Com-
puters & Electrical Engineering, 94, 107329.
[81] Krishna, V., & Bose. (2020). Emerging Research in Data Engineering Systems and Com-
puter Communications. Springer Singapore.
[82] Yao, R., Wang, N., Liu, Z., Chen, P., & Sheng, X. (2021). Intrusion Detection System in
the Advanced Metering Infrastructure: A Cross-Layer Feature-Fusion CNN-LSTM-Based
Approach. Sensors, 21(2), 626.
[83] Henriques, H. O., Corrêa, R. L. S., Fortes, M. Z., Borba, B. S. M. C., & Ferreira, V. H.
(2020). Monitoring technical losses to improve non-technical losses estimation and detec-
tion in LV distribution systems. Measurement, 161, 107840.
[84] Cui, L., Qu, Y., Gao, L., Xie, G., & Yu, S. (2020). Detecting false data attacks using
machine learning techniques in smart grid: A survey. Journal of Network and Computer
Applications, 102808.
[85] Emadaleslami, M., & Haghifam, M. R. (2021). A Machine Learning Approach to De-
tect Energy Fraud in Smart Distribution Network. International Journal of Smart Electrical
Engineering, 10(02), 59-66.
[86] Han, W., & Xiao, Y. (2020). Edge computing enabled non-technical loss fraud detection
for big data security analytic in Smart Grid. Journal of Ambient Intelligence and Humanized
Computing, 11(4), 1697-1708.
[87] Long, H., Chen, C., Gu, W., Xie, J., Wang, Z., & Li, G. (2020). A Data-Driven Combined
Algorithm for Abnormal Power Loss Detection in the Distribution Network. IEEE Access,
8, 24675-24686.
[88] Qin, H., Zhou, H., & Cao, J. (2020). Imbalanced learning algorithm based intelligent
abnormal electricity consumption detection. Neurocomputing, 402, 112-123.
[89] Khalid, R., Javaid, N., Al-Zahrani, F. A., Aurangzeb, K., Qazi, E. U. H., & Ashfaq, T.
(2020). Electricity load and price forecasting using Jaya-Long Short Term Memory (JL-
STM) in smart grids. Entropy, 22(1), 10.
[90] Aslam, S., Herodotou, H., Mohsin, S. M., Javaid, N., Ashraf, N., & Aslam, S. (2021). A
survey on deep learning methods for power load and renewable energy forecasting in smart
microgrids. Renewable and Sustainable Energy Reviews, 144, 110992.
[91] Javaid, N., Naz, A., Khalid, R., Almogren, A., Shafiq, M., & Khalid, A. (2020). ELS-Net:
A New Approach to Forecast Decomposed Intrinsic Mode Functions of Electricity Load.
IEEE Access, 8, 198935-198949.
[92] Zahid, M., Ahmed, F., Javaid, N., Abbasi, R. A., Zainab Kazmi, H. S., Javaid, A., &
Ilahi, M. (2019). Electricity price and load forecasting using enhanced convolutional neural
network and enhanced support vector regression in smart grids. Electronics, 8(2), 122.
[93] Mujeeb, S., & Javaid, N. (2019). ESAENARX and DE-RELM: Novel schemes for big data
predictive analytics of electricity load and price. Sustainable Cities and Society, 51, 101642.
[94] Adil, M., Javaid, N., Qasim, U., Ullah, I., Shafiq, M., & Choi, J. G. (2020). LSTM and bat-
based RUSBoost approach for electricity theft detection. Applied Sciences, 10(12), 4378.
[95] Hu, T., Guo, Q., Sun, H., Huang, T. E., & Lan, J. (2020). Nontechnical losses detection
through coordinated BiWGAN and SVDD. IEEE Transactions on Neural Networks and Learning
Systems, 32(5), 1866-1880.
[96] Villar-Rodriguez, E., Del Ser, J., Oregi, I., Bilbao, M. N., & Gil-Lopez, S. (2017). De-
tection of non-technical losses in smart meter data based on load curve profiling and time
series analysis. Energy, 137, 118-128.
[97] Han, W., & Xiao, Y. (2017). A novel detector to detect colluded non-technical loss frauds
in smart grid. Computer Networks, 117, 19-31.
[98] Rouzbahani, H. M., Bahrami, A. H., & Karimipour, H. (2021). A Snapshot Ensemble Deep
Neural Network Model for Attack Detection in Industrial Internet of Things. In AI-Enabled
Threat Detection and Security Analysis for Industrial IoT (pp. 181-194). Springer, Cham.
[99] Hu, T., Guo, Q., Shen, X., Sun, H., Wu, R., & Xi, H. (2019). Utilizing unlabeled data to
detect electricity fraud in AMI: A semisupervised deep learning approach. IEEE Transactions
on Neural Networks and Learning Systems, 30(11), 3287-3299.
[100] Qin, H., Zhou, H., & Cao, J. (2020). Imbalanced learning algorithm based intelligent
abnormal electricity consumption detection. Neurocomputing, 402, 112-123.
[101] Ganguly, P., Nasipuri, M., & Dutta, S. (2018). A novel approach for detecting and mitigat-
ing the energy theft issues in the smart metering infrastructure. Technology and Economics
of Smart Grids and Sustainable Energy, 3(1), 1-11.
[102] Ahmad, T. (2017). Non-technical loss analysis and prevention using smart meters. Re-
newable and Sustainable Energy Reviews, 72, 573-589.
[103] Cheng, G., Zhang, Z., Li, Q., Li, Y., & Jin, W. (2021). Energy Theft Detection in an Edge
Data Center Using Deep Learning. Mathematical Problems in Engineering, 2021.
[104] Kirankumar, T., & Madhu, G. S. (2018). Power theft detection using probabilistic neural
network classifier. International Research Journal of Engineering and Technology (IRJET),
5(8), 834-838.
[105] Ibrahim, N. M., Al-Janabi, S. T. F., & Al-Khateeb, B. (2021). Electricity-theft detection
in smart grids based on deep learning. Bulletin of Electrical Engineering and Informatics,
10(4), 2285-2292.
[106] Ibrahim, N., Al-Janabi, S., & Al-Khateeb, B. (2021). Electricity-Theft Detection in Smart
Grid Based on Deep Learning. Bulletin of Electrical Engineering and Informatics, 10(4),
2285-2292.
[107] Yip, S. C., Wong, K., Hew, W. P., Gan, M. T., Phan, R. C. W., & Tan, S. W. (2017).
Detection of energy theft and defective smart meters in smart grids using linear regression.
International Journal of Electrical Power & Energy Systems, 91, 230-240.
[108] Yao, D., Wen, M., Liang, X., Fu, Z., Zhang, K., & Yang, B. (2019). Energy theft detection
with energy privacy preservation in the smart grid. IEEE Internet of Things Journal, 6(5),
7659-7669.
[109] Karabiber, A. (2019). Detecting and pricing nontechnical losses by using utility power
meters in electricity distribution grids. Journal of Electrical Engineering & Technology,
14(5), 1933-1942.
[110] Gul, H., Javaid, N., Ullah, I., Qamar, A. M., Afzal, M. K., & Joshi, G. P. (2020). Detection
of non-technical losses using SOSTLink and bidirectional gated recurrent unit to secure
smart meters. Applied Sciences, 10(9), 3151.
[111] Huang, Y., & Xu, Q. (2021). Electricity theft detection based on stacked sparse denoising
autoencoder. International Journal of Electrical Power & Energy Systems, 125, 106448.
[112] Khan, Z. A., Zafar, A., Javaid, S., Aslam, S., Rahim, M. H., & Javaid, N. (2019). Hybrid
meta-heuristic optimization based home energy management system in smart grid. Journal
of Ambient Intelligence and Humanized Computing, 10(12), 4837-4853.
[113] Naz, M., Iqbal, Z., Javaid, N., Khan, Z. A., Abdul, W., Almogren, A., & Alamri, A.
(2018). Efficient power scheduling in smart homes using hybrid grey wolf differential evo-
lution optimization technique with real time and critical peak pricing schemes. Energies,
11(2), 384.
[114] Javaid, N., Ahmed, F., Ullah, I., Abid, S., Abdul, W., Alamri, A., & Almogren, A. S.
(2017). Towards cost and comfort based hybrid optimization for residential load scheduling
in a smart grid. Energies, 10(10), 1546.
[115] Javaid, N., Hussain, S. M., Ullah, I., Noor, M. A., Abdul, W., Almogren, A., & Alamri,
A. (2017). Demand side management in nearly zero energy buildings using heuristic opti-
mizations. Energies, 10(8), 1131.
[116] Manzoor, A., Javaid, N., Ullah, I., Abdul, W., Almogren, A., & Alamri, A. (2017). An
intelligent hybrid heuristic scheme for smart metering based demand side management in
smart homes. Energies, 10(9), 1258.
[117] Aslam, S., Khalid, A., & Javaid, N. (2020). Towards efficient energy management in
smart grids considering microgrids with day-ahead energy forecasting. Electric Power Sys-
tems Research, 182, 106232.
[118] Ullah, A., Javaid, N., Yahaya, A. S., Sultana, T., Al-Zahrani, F. A., & Zaman, F. (2021).
A Hybrid Deep Neural Network for Electricity Theft Detection Using Intelligent Antenna-
Based Smart Meters. Wireless Communications and Mobile Computing, 2021.
[119] Sultana, T., Almogren, A., Akbar, M., Zuair, M., Ullah, I., & Javaid, N. (2020). Data
sharing system integrating access control mechanism using blockchain-based smart con-
tracts for IoT devices. Applied Sciences, 10(2), 488.
[120] Khalid, R., Samuel, O., Javaid, N., Aldegheishem, A., Shafiq, M., & Alrajeh, N. (2021).
A Secure Trust Method for Multi-Agent System in Smart Grids Using Blockchain. IEEE
Access, 9, 59848-59859.
[121] Samuel, O., & Javaid, N. GarliChain: A Privacy Preserving System for Smart Grid Con-
sumers using Blockchain.
[122] Khalid, R., Javaid, N., Almogren, A., Javed, M. U., Javaid, S., & Zuair, M. (2020).
A blockchain-based load balancing in decentralized hybrid P2P energy trading market in
smart grid. IEEE Access, 8, 47047-47062.
[123] Coma-Puig, B., & Carmona, J. (2019). Bridging the gap between energy consumption
and distribution through non-technical loss detection. Energies, 12(9), 1748.
[124] Somefun, T. E., Awosope, C. O. A., & Chiagoro, A. (2019). Smart prepaid energy meter-
ing system to detect energy theft with facility for real time monitoring. International Journal
of Electrical and Computer Engineering, 9(5), 4184.
[125] He, Y., Mendis, G. J., & Wei, J. (2017). Real-time detection of false data injection attacks
in smart grid: A deep learning-based intelligent mechanism. IEEE Transactions on Smart
Grid, 8(5), 2505-2516.
[126] Liu, J., Li, G., Song, W., Liu, D., & Jiang, T. (2020). Electricity Stealing Behavior De-
tection Method based on BP Neural Network. Design Engineering, 103-110.
[127] Liu, Y., & Hu, S. (2015). Cyberthreat analysis and detection for energy theft in social
networking of smart homes. IEEE Transactions on Computational Social Systems, 2(4),
148-158.
[128] Saeed, M. S., Mustafa, M. W., Sheikh, U. U., Jumani, T. A., & Mirjat, N. H. (2019).
Ensemble bagged tree based classification for reducing non-technical losses in Multan Electric
Power Company of Pakistan. Electronics, 8(8), 860.
[129] Zhang, W., Dong, X., Li, H., Xu, J., & Wang, D. (2020). Unsupervised detection of
abnormal electricity consumption behavior based on feature engineering. IEEE Access, 8,
55483-55500.
[130] Hong, H., Su, Y., Zheng, P., Cheng, N., & Zhang, J. (2021). A SVM-based detection
method for electricity stealing behavior of charging pile. Procedia Computer Science, 183,
295-302.
[131] Micheli, G., Soda, E., Vespucci, M. T., Gobbi, M., & Bertani, A. (2019). Big data analyt-
ics: an aid to detection of non-technical losses in power utilities. Computational Manage-
ment Science, 16(1), 329-343.
[132] Lu, X., Zhou, Y., Wang, Z., Yi, Y., Feng, L., & Wang, F. (2019). Knowledge embedded
semi-supervised deep learning for detecting non-technical losses in the smart grid. Energies,
12(18), 3452.
[133] Sasirekha, P., & Karthikeyan, R. Non-Technical Loss Detection in Electric Power Distri-
bution Networks by using Random Forest Fed Support Vector Machines. American Inter-
national Journal of Research in Science, Technology, Engineering & Mathematics, 96.
[134] Khederzadeh, M. (2019). Application of importance sampling method for non-technical
losses detection in electrical distribution systems using smart meters.
[135] Sankari, E., & Rajesh, R. (2015). Detection of Non-Technical Loss in Power Utilities
using Data Mining Techniques. International Journal for Innovative Research in Science &
Technology, 1(9), 97-101.
[136] Normanyo, E., & Kassim, M. (2018). Detecting Electric Power Theft in Households
Using Hidden Markov Models. Energy, 6.
[137] Alfarra, A. H. T., Attia, B. A., & El Safty, C. S. M. Nontechnical Loss Detection for
Metered Customers in Alexandria Electricity Distribution Company Using Support Vector
Machine.
[138] Pazi, S., Clohessy, C. M., & Sharp, G. D. (2020). A framework to select a classification
algorithm in electricity fraud detection. South African Journal of Science, 116(9-10), 1-7.
[139] Yip, S. C., Tan, W. N., Tan, C., Gan, M. T., & Wong, K. (2018). An anomaly detection
framework for identifying energy theft and defective meters in smart grids. International
Journal of Electrical Power & Energy Systems, 101, 189-203.
[140] Messinis, G. M., Rigas, A. E., & Hatziargyriou, N. D. (2019). A hybrid method for non-
technical loss detection in smart distribution grids. IEEE Transactions on Smart Grid, 10(6),
6080-6091.
[141] Aniedu, A. N., Inyiama, H. C., Azubogu, A. C., & Nwokoye, S. C. Pattern Recogni-
tion using Support Vector Machines as a Solution for Non-Technical Losses in Electricity
Distribution Industry.
[142] Ding, N., Ma, H., Gao, H., Ma, Y., & Tan, G. (2019). Real-time anomaly detection
based on long short-term memory and Gaussian Mixture Model. Computers & Electrical
Engineering, 79, 106458.
[143] Gulfam, S. M., & Radwan, A. A Robust Hybrid Deep Learning Model for Detection
of Non-technical Losses to Secure Smart Grids.
[144] Bhat, R. R., Trevizan, R. D., Sengupta, R., Li, X., & Bretas, A. (2016, December).
Identifying nontechnical power loss via spatial and temporal deep learning. In 2016 15th
IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 272-
279). IEEE.
[145] Saeed, M. S., Mustafa, M. W., Sheikh, U. U., Jumani, T. A., & Mirjat, N. H. (2019).
Ensemble bagged tree based classification for reducing non-technical losses in Multan Electric
Power Company of Pakistan. Electronics, 8(8), 860.
[146] Wang, X., Yang, I., & Ahn, S. H. (2019). Sample efficient home power anomaly detection
in real time using semi-supervised learning. IEEE Access, 7, 139712-139725.
[147] Fenza, G., Gallo, M., & Loia, V. (2019). Drift-aware methodology for anomaly detection
in smart grid. IEEE Access, 7, 9645-9657.
[148] Fan, C., Xiao, F., Zhao, Y., & Wang, J. (2018). Analytical investigation of
autoencoder-based methods for unsupervised anomaly detection in building energy data.
Applied Energy, 211, 1123-1135.
[149] Maamar, A., & Benahmed, K. (2019). A hybrid model for anomalies detection in AMI
system combining K-means clustering and deep neural network. Comput. Mater. Continua,
60(1), 15-39.
[150] Takiddin, A., Ismail, M., Zafar, U., & Serpedin, E. (2020). Robust Electricity Theft De-
tection Against Data Poisoning Attacks in Smart Grids. IEEE Transactions on Smart Grid,
12(3), 2675-2684.
[151] Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z., & Paul Smolley, S. (2017). Least squares
generative adversarial networks. In Proceedings of the IEEE International Conference on
Computer Vision (pp. 2794-2802).
[152] Huang, C. J., Shen, Y., Chen, Y. H., & Chen, H. C. (2021). A novel hybrid deep neural
network model for short-term electricity price forecasting. International Journal of Energy
Research, 45(2), 2511-2532.
[153] Yu, J., Zhang, X., Xu, L., Dong, J., & Zhangzhong, L. (2021). A hybrid CNN-GRU
model for predicting soil moisture in maize root zone. Agricultural Water Management,
245, 106649.
[154] McLaughlin, S., Holbert, B., Fawaz, A., Berthier, R., & Zonouz, S. (2013). A multi-
sensor energy theft detection framework for advanced metering infrastructures. IEEE Jour-
nal on Selected Areas in Communications, 31(7), 1319-1330.
[155] Xiao, Z., Xiao, Y., & Du, D. H. C. (2013). Non-repudiation in neighborhood area net-
works for smart grid. IEEE Communications Magazine, 51(1), 18-26.
[156] Ismail, M., Shaaban, M. F., Naidu, M., & Serpedin, E. (2020). Deep learning detection
of electricity theft cyber-attacks in renewable distributed generation. IEEE Transactions on
Smart Grid, 11(4), 3428-3437.
[157] Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated
recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[158] Sun, C., Shrivastava, A., Singh, S., & Gupta, A. (2017). Revisiting unreasonable
effectiveness of data in deep learning era. In Proceedings of the IEEE International Conference
on Computer Vision (pp. 843-852).
[159] Mirjalili, S. (2019). Genetic algorithm. In Evolutionary algorithms and neural networks
(pp. 43-55). Springer, Cham.
[160] Schiezaro, M., & Pedrini, H. (2013). Data feature selection based on Artificial Bee
Colony algorithm. EURASIP Journal on Image and Video Processing, 2013(1), 1-8.
[161] Yan, B., & Han, G. (2018). Effective feature extraction via stacked sparse autoencoder to
improve intrusion detection system. IEEE Access, 6, 41238-41248.
[162] Zhong, X., & Enke, D. (2019). Predicting the daily return direction of the stock market
using hybrid machine learning algorithms. Financial Innovation, 5(1), 1-20.
[163] Bukhari, A. H., Raja, M. A. Z., Sulaiman, M., Islam, S., Shoaib, M., & Kumam, P.
(2020). Fractional neuro-sequential ARFIMA-LSTM for financial market forecasting. IEEE
Access, 8, 71326-71338.
[164] Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., & Iosifidis, A.
(2017, August). Using deep learning to detect price change indications in financial markets.
In 2017 25th European Signal Processing Conference (EUSIPCO) (pp. 2511-2515). IEEE.
[165] Chiong, R., Fan, Z., Hu, Z., Adam, M. T., Lutz, B., & Neumann, D. (2018, July). A sen-
timent analysis-based machine learning approach for financial market prediction via news
disclosures. In Proceedings of the Genetic and Evolutionary Computation Conference Com-
panion (pp. 278-279).
[166] Li, A. W., & Bastos, G. S. (2020). Stock market forecasting using deep learning and
technical analysis: a systematic review. IEEE Access, 8, 185232-185242.
[167] https://github.com/bukosabino/ta [Last accessed: 2021-11-10]
[168] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H.,
& Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for sta-
tistical machine translation. arXiv preprint arXiv:1406.1078.
[169] https://jupyter.org/ [Last accessed: 2021-11-14]
[170] https://numpy.org/ [Last accessed: 2021-11-14]
[171] https://pandas.pydata.org/ [Last accessed: 2021-11-14]
[172] https://seaborn.pydata.org/ [Last accessed: 2021-11-14]
[173] https://matplotlib.org/ [Last accessed: 2021-11-14]
[174] https://keras.io/ [Last accessed: 2021-11-14]
[175] https://www.tensorflow.org/ [Last accessed: 2021-11-14]
[176] https://caffe.berkeleyvision.org/ [Last accessed: 2021-11-14]
[177] https://pytorch.org/ [Last accessed: 2021-11-14]
[178] https://scikit-learn.org/stable/ [Last accessed: 2021-11-14]
[179] https://www.h2o.ai/ [Last accessed: 2021-11-14]
Journal Publications
1 Faisal Shehzad, Nadeem Javaid, Ahmad Almogren, Abrar Ahmed, Sardar Muhammad
Gulfam and Ayman Radwan, “A Robust Hybrid Deep Learning Model for Detection
of Non-technical Losses to Secure Smart Grids”, in IEEE Access, doi:
10.1109/ACCESS.2021.3113592.
4 Nadeem Javaid, Hira Gul, Sobia Baig, Faisal Shehzad, Chengjun Xia, Lin Guan, Tanzeela
Sultana, “Using GANCNN and ERNET for Detection of Non Technical Losses to Secure
Smart Grids”, IEEE Access, Volume: NN, Pages: NN, Published: June 2021, ISSN:
2169-3536. DOI: 10.1109/ACCESS.2021.3092645.
Conference Proceedings
1 Faisal Shehzad, Muhammad Asif, Zeeshan Aslam, Shahzaib Anwar, Hamza Rashid,
Muhammad Ilyasd and Nadeem Javaid, “Comparative Study of Data Driven Approaches towards Efficient
Electricity Theft Detection in Micro Grids”, in the 13th International Conference on Innovative
Mobile and Internet Services in Ubiquitous Computing (IMIS), 2021, ISBN: 978-3-030-22263-5.
2 Faisal Shehzad, Nadeem Javaid, Usman Farooq, Hamza Tariq, Israr Ahmad and Sadia Jabeen,
“IoT Enabled E-business via Blockchain Technology using Ethereum Platform”, in 34th Interna-
tional Conference on Web, Artificial Intelligence and Network Applications, (WAINA) 2020, Ad-
vances in Intelligent Systems and Computing, vol 1150, pp: 671-683, ISBN: 978-3-030-44038-1.
DOI: https://doi.org/10.1007/978-3-030-44038-1_62.
3 Omaji Samuel, Nadeem Javaid, Faisal Shehzad, Muhammad Sohaib Iftikhar, Muhammad Zo-
haib Iftikhar, Hassan Farooq and Muhammad Ramzan, “Electric Vehicles Privacy Preserving us-
ing Blockchain in Smart Community”, in 14th International Conference on Broad-Band Wireless
Computing, Communication and Applications (BWCCA), 2019, pp: 67-80, ISBN: 978-3-030-
33505-2. DOI: https://doi.org/10.1007/978-3-030-33506-9_7.
4 Abdul Ghaffar, Muhammad Azeem, Zain Abubaker, Muhammad Usman Gurmani, Tanzeela Sul-
tana, Faisal Shehzad and Nadeem Javaid, “Smart Contracts for Research Lab Sharing Scholars
Data Rights Management over the Ethereum Blockchain Network”, in the 14th International Con-
ference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2019, pp: 70-81, ISBN:
978-3-030-33508-3. DOI: https://doi.org/10.1007/978-3-030-33509-0_7.
PART II
Recommendation by the Research Supervisor
Name: Dr. Nadeem Javaid Signature: _____________________ Date: March 29, 2021
Recommendation by the Research Co-Supervisor
Name: Dr. Mariam Akbar Signature: _____________________ Date: March 29, 2021
Signed by Supervisory Committee
S.# Name of Committee member Designation Signature & Date
1 Dr. Nadeem Javaid Assistant Professor March 29, 2021
2 Dr. Mariam Akbar Associate Professor March 29, 2021
3 Dr. Saif Ur Rehman Khan Assistant Professor March 29, 2021
Approved by Departmental Advisory Committee
Certified that the synopsis has been seen by the members of the DAC and is considered suitable
for putting up to BASAR.
Secretary
Departmental Advisory Committee
Name: _____________________________
Signature: _____________________________
Date: _____________________________
Chairman/HoD: ____________________________
Signature: _____________________________
Date: _____________________________
PART III
Dean, Faculty of Information Sciences & Technology
_____________________ Approved for placement before BASAR.
_____________________ Not approved on the basis of the following reasons
Signature _____________________ Date ________
Secretary BASAR
_____________________ Approved for placement before BASAR.
_____________________ Not approved on the basis of the following reasons
Signature _____________________ Date ________
Dean, Faculty of Information Sciences & Technology
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
Signature _____________________ Date ________
Please provide the list of courses studied
1. Research Methodology in IT
2. Advanced Topics in Artificial Intelligence
3. Performance Evaluation of Networks
4. Advanced Topics in Simulation and Modeling
5. Advanced Algorithms Analysis
6. Theory of Computation
7. Advanced Topics in Data Mining
8. Intelligent Systems Design