Exploiting Data Science for Detecting Electricity Thefts
in Smart Grids and Predicting Trends in Financial
Markets (MS Thesis without Source Codes)
By
Faisal Shehzad
CIIT/SP19-RCS-013/ISB
MS Thesis
in
Computer Science
COMSATS University Islamabad, Islamabad - Pakistan
Spring, 2021
A Thesis Presented to
COMSATS University Islamabad
In partial fulfillment
of the requirement for the degree of
MS (Computer Science)
A Post Graduate Thesis submitted to the Department of Computer Science as partial fulfilment
of the requirement for the award of Degree of MS (Computer Science).
Name Registration Number
Faisal Shehzad CIIT/SP19-RCS-013/ISB
Supervisor:
Dr. Nadeem Javaid,
Associate Professor, Department of Computer Science,
COMSATS University Islamabad,
Islamabad, Pakistan
Visiting Professor, School of Computer Science,
University of Technology Sydney (UTS), Ultimo, NSW, 2007, Australia
Co-Supervisor:
Dr. Mariam Akbar,
Assistant Professor, Department of Computer Science,
COMSATS University Islamabad,
Islamabad, Pakistan
Final Approval
This thesis titled
Exploiting Data Science for Detecting Electricity Thefts in Smart Grids and
Predicting Trends in Financial Markets (MS Thesis without Source Codes)
By
Faisal Shehzad
CIIT/SP19-RCS-013/ISB
has been approved
For the COMSATS University Islamabad, Islamabad
External Examiner:
Dr. Muhammad Younus Javed
Vice Chancellor, Mirpur University of Science and Technology, Mirpur, Pakistan
Supervisor:
Dr. Nadeem Javaid
Associate Professor, Department of Computer Science,
COMSATS University Islamabad, Islamabad, Pakistan,
Visiting Professor, School of Computer Science,
University of Technology Sydney (UTS), Ultimo, NSW, 2007, Australia
Co-Supervisor:
Dr. Mariam Akbar,
Assistant Professor, Department of Computer Science,
COMSATS University Islamabad, Islamabad, Pakistan
Incharge:
Dr. Majid Iqbal Khan,
Associate Professor, Department of Computer Science,
COMSATS University Islamabad, Islamabad
Declaration
I, Faisal Shehzad (Registration No. CIIT/SP19-RCS-013/ISB), hereby declare that I have produced
the work presented in this thesis during the scheduled period of study. I also declare that I
have not taken any material from any source except where referred to, and that the amount of
plagiarism is within the acceptable range. If a violation of HEC rules on research has occurred
in this thesis, I shall be liable to punishable action under the plagiarism rules of the HEC.
Date: July, 2021
Faisal Shehzad
CIIT/SP19-RCS-013/ISB
Certificate
It is certified that Faisal Shehzad (Registration No. CIIT/SP19-RCS-013/ISB) has carried out
all the work related to this thesis under my supervision at the Department of Computer Science,
COMSATS University, Islamabad and the work fulfils the requirement for award of MS degree.
Date: July, 2021
Supervisor:
Dr. Nadeem Javaid
Associate Professor, Department of Computer Science
Visiting Professor, School of Computer Science,
University of Technology Sydney (UTS), Ultimo, NSW, 2007, Australia
Co-Supervisor:
Dr. Mariam Akbar
Assistant Professor, Computer Information Science,
Incharge:
Dr. Majid Iqbal Khan
Department of Computer Science
DEDICATION
Dedicated
to my mentor Dr. Nadeem Javaid, loving Parents and Brothers, who equipped
me with pearls of knowledge and showed me the way of spiritual and personal
enlightenment in this world and the world hereafter.
ACKNOWLEDGEMENT
First of all, thanks to Allah Almighty, who gave me the strength and confidence to complete this
dissertation. I would also like to express my profound appreciation to the many people who
supported me during my MS and helped me to complete my thesis. Their generous support made
this research work possible.
Firstly, I would like to express my sincere gratitude to my advisor, Dr. Nadeem Javaid, for
his continuous support of my MS study and related research, and for his patience, motivation
and immense knowledge. His guidance helped me throughout the research and the writing of this
thesis. I could not have imagined having a better advisor and mentor for my MS study. I am
truly indebted to him for his knowledge, thoughts and friendship.
I would like to thank my parents and my brothers Farooq Azam, Zaib and Qaisar Shehzad for
their continuous support, understanding and assistance whenever I needed them throughout
my MS studies and research work. Moreover, I would like to say special thanks to Faiza Anwar,
who motivated me in a difficult time. I am always grateful to them for their encouragement
and support.
Last but not least, I am greatly thankful to the Director of the ComSens Lab and all of my
colleagues at CUI for providing me with a warm and friendly atmosphere.
ABSTRACT
Exploiting Data Science for Detecting Electricity Thefts in Smart
Grids and Predicting Trends in Financial Markets (MS Thesis
without Source Codes)
Data science is an emerging field with applications in many disciplines, such as healthcare,
advanced image recognition, airline route planning, augmented reality and targeted advertising.
In this thesis, we exploit its applications in smart grids and financial markets through three
major contributions. In the first two contributions, machine learning (ML) and deep learning
(DL) models are used to detect anomalies in electricity consumption (EC) data, while in the
third contribution, upward and downward trends in financial markets are predicted for the
benefit of potential investors.
Non-technical losses (NTLs) are one of the major causes of revenue loss for electric utilities.
In the literature, various ML and DL approaches are employed to detect NTLs. The first
solution introduces a hybrid DL model that tackles the class imbalance problem, the curse
of dimensionality and the low detection rate of existing models. The proposed model integrates
the benefits of both GoogLeNet and the gated recurrent unit (GRU). The one-dimensional EC data
is fed into the GRU to remember periodic patterns, whereas the GoogLeNet model is leveraged to
extract latent features from the two-dimensional, weekly stacked EC data. Furthermore, the
time least square generative adversarial network (TLSGAN) is proposed to solve the class
imbalance problem. The TLSGAN uses unsupervised and supervised loss functions to generate
fake theft samples that closely resemble real-world theft samples. The standard generative
adversarial network updates the weights using only those points that lie on the wrong side of
the decision boundary, whereas TLSGAN also updates the weights using points that lie on the
correct side of the decision boundary, which protects the model from the vanishing gradient
problem. Moreover, dropout and batch normalization layers are used to enhance the model’s
convergence speed and generalization ability. The proposed model is compared with different
state-of-the-art classifiers, including multilayer perceptron (MLP), support vector machine,
naive Bayes, logistic regression, MLP-long short-term memory network, and wide and deep
convolutional neural network.
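As a rough illustration of the two input views described above, the following sketch reshapes a one-dimensional daily EC series into a two-dimensional weekly stack. The function name `stack_weekly`, the sample readings and the choice to drop trailing days that do not fill a whole week are assumptions made for this example; they are not taken from the thesis.

```python
from typing import List, Sequence

DAYS_PER_WEEK = 7


def stack_weekly(daily_ec: Sequence[float]) -> List[List[float]]:
    """Reshape a 1-D daily EC series into rows of seven days (one row per week).

    Trailing days that do not fill a whole week are dropped. This is an
    assumption of the sketch, not a rule stated in the thesis.
    """
    n_weeks = len(daily_ec) // DAYS_PER_WEEK
    return [
        list(daily_ec[w * DAYS_PER_WEEK:(w + 1) * DAYS_PER_WEEK])
        for w in range(n_weeks)
    ]


# Hypothetical consumer with 16 daily readings; the values are illustrative only.
daily = [float(d) for d in range(16)]

sequence_view = daily              # 1-D view, the kind of input a GRU branch consumes
weekly_view = stack_weekly(daily)  # 2-D view (weeks x days), the kind of input a CNN branch consumes
```

Here `weekly_view` has two rows of seven values each, and the last two readings are discarded because they do not complete a third week.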
The second solution presents a framework that addresses the curse of dimensionality. In the
literature, existing studies are mostly concerned with tuning the hyperparameters of ML/DL
methods for efficient detection of NTLs, i.e., electricity theft detection. Some of them focus
on selecting prominent features from the data to improve the performance of electricity theft
detection. However, the curse of dimensionality affects the generalization ability of ML/DL
classifiers and leads to computational, storage and overfitting problems. Therefore, to deal
with the above-mentioned issues, this study proposes a system based on metaheuristic techniques
(artificial bee colony and genetic algorithm) and a denoising autoencoder for electricity theft
detection using big data in electric power systems. The former (metaheuristics) are used to
select prominent features, while the latter is used to extract high-variance features from
electricity consumption data. First, new features are synthesized from statistical and
electrical parameters of the user’s consumption history. Then, the synthesized features are
used as input to the metaheuristic techniques to find a subset of optimal features. Finally,
the optimal features are fed as input to the denoising autoencoder to extract features with
high variance. The ability of both techniques to select and extract features is measured using
a support vector machine. The proposed system reduces the overfitting, storage and
computational overhead of ML classifiers. Moreover, we perform several experiments to verify
the effectiveness of the proposed system, and the results reveal that it achieves higher
performance than its counterparts.
The third solution introduces a hybrid DL model for predicting upward and downward trends in
financial market data. The financial market exhibits complex and volatile behavior that is
difficult to predict using conventional ML and statistical methods, as well as shallow
neural networks. Its behavior depends on many factors, such as political upheavals, investor
sentiment, interest rates, government policies and natural disasters. However, it is possible
to predict upward and downward trends in financial market behavior using complex DL models.
This work therefore addresses the following limitations that adversely affect the performance
of existing ML and DL models: the curse of dimensionality, the low accuracy of standalone
models, and the inability to learn complex patterns from high-frequency time series data. The
denoising autoencoder is used to reduce the high dimensionality of the data, which mitigates
overfitting and reduces the training time of the ML and DL models. Moreover, a hybrid DL
model, HRG, is proposed based on a ResNet module and gated recurrent units. The former is used
to extract latent or abstract patterns that are not visible to the human eye, while the latter
retrieves temporal patterns from the financial market dataset. Thus, HRG integrates the
advantages of both models. It is evaluated on real-world financial market datasets obtained
from IBM, APPL, BA and WMT. Various performance indicators, such as F1-score, accuracy,
precision, recall and receiver operating characteristic-area under the curve (ROC-AUC), are
used to assess the performance of the proposed and benchmark models. HRG achieves ROC-AUC
values of 0.95, 0.90, 0.82 and 0.80 on the APPL, IBM, BA and WMT datasets, respectively, which
are higher than the ROC-AUC values of all implemented ML and DL models.
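To make the ROC-AUC indicator concrete, the sketch below computes it from its pairwise definition: the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. The function name `roc_auc`, the toy labels and scores, and the convention that label 1 denotes an upward trend are assumptions for illustration; they are not results or code from the thesis.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (rank-based) ROC-AUC for binary labels in {0, 1}."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    # Count pairs where the positive outranks the negative; a tie counts as 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = upward trend, 0 = downward trend (hypothetical labels and scores).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes, which is the scale on which the values reported above should be read.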
Journal Publications
1. Faisal Shehzad, Nadeem Javaid, Ahmad Almogren, Abrar Ahmed, Sardar Muhammad Gulfam and
Ayman Radwan, “A Robust Hybrid Deep Learning Model for Detection of Non-technical Losses to
Secure Smart Grids”, in IEEE Access, DOI: 10.1109/ACCESS.2021.3113592.
2. Nadeem Javaid, Hira Gul, Sobia Baig, Faisal Shehzad, Chengjun Xia, Lin Guan and Tanzeela
Sultana, “Using GANCNN and ERNET for Detection of Non Technical Losses to Secure
Smart Grids”, IEEE Access, Volume: NN, Pages: NN, Published: June 2021, ISSN:
2169-3536. DOI: 10.1109/ACCESS.2021.3092645.
Conference Proceedings
1. Faisal Shehzad, Muhammad Asif, Zeeshan Aslam, Shahzaib Anwar, Hamza Rashid, Muhammad
Ilyasd and Nadeem Javaid, “Comparative Study of Data Driven Approaches towards Efficient
Electricity Theft Detection in Micro Grids”, in the 13th International Conference on
Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2021, ISBN:
978-3-030-22263-5.
2. Faisal Shehzad, Nadeem Javaid, Usman Farooq, Hamza Tariq, Israr Ahmad and Sadia Jabeen,
“IoT Enabled E-business via Blockchain Technology using Ethereum Platform”, in 34th
International Conference on Web, Artificial Intelligence and Network Applications (WAINA),
2020, Advances in Intelligent Systems and Computing, vol 1150, pp: 671-683, ISBN:
978-3-030-44038-1. DOI: https://doi.org/10.1007/978-3-030-44038-1_62.
3. Omaji Samuel, Nadeem Javaid, Faisal Shehzad, Muhammad Sohaib Iftikhar, Muhammad
Zohaib Iftikhar, Hassan Farooq and Muhammad Ramzan, “Electric Vehicles Privacy Preserving
using Blockchain in Smart Community”, in 14th International Conference on Broad-Band
Wireless Computing, Communication and Applications (BWCCA), 2019, pp: 67-80, ISBN:
978-3-030-33505-2. DOI: https://doi.org/10.1007/978-3-030-33506-9_7.
4. Abdul Ghaffar, Muhammad Azeem, Zain Abubaker, Muhammad Usman Gurmani, Tanzeela
Sultana, Faisal Shehzad and Nadeem Javaid, “Smart Contracts for Research Lab Sharing
Scholars Data Rights Management over the Ethereum Blockchain Network”, in the 14th
International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2019,
pp: 70-81, ISBN: 978-3-030-33508-3. DOI: https://doi.org/10.1007/978-3-030-33509-0_7.
TABLE OF CONTENTS
Dedication
Acknowledgements
Abstract
Journal Publications
Conference Proceedings
List of Figures
List of Tables
List of Symbols
Chapter 1
1.1 Introduction
1.2 Organization of Thesis
Chapter 2
2.1 Literature Review
2.1.1 Solving the Curse of Dimensionality Issue
2.1.2 Handling the Class Imbalanced Problem
2.1.3 Optimizing the Hyperparameters
2.2 Problem Statement
2.2.1 Sub Problem Statement 1
2.2.2 Sub Problem Statement 2
2.2.3 Sub Problem Statement 3
Chapter 3
3.1 The Proposed Solution 1: Hybrid of GRU and GoogLeNet for classification of malicious and benign samples
3.1.1 Acquiring the Dataset
3.1.2 Data Preprocessing
3.1.3 Handling the Missing Values
3.1.4 Removing the Outliers from Dataset
3.1.5 Normalization
3.1.6 Exploratory Dataset Analysis
3.1.7 The Description of Proposed Model
3.1.7.1 Handling the Class Imbalance Problem
3.1.7.2 Architecture of Hybrid Model
3.1.7.3 Gated Recurrent Unit
3.1.7.4 GoogLeNet
3.1.7.5 Hybrid Module
3.2 Proposed Solution 2: Framework based on Denoising Autoencoder and Metaheuristic Techniques
3.2.1 Dataset Description
3.2.2 Synthesized Features
3.2.3 Description of the Proposed Framework
3.2.3.1 Genetic Algorithm
3.2.3.2 Artificial Bee Colony Algorithm
3.2.3.3 Denoising Autoencoder
3.3 Proposed Solution 3: Hybrid deep learning model based on ResNet and GRU for classification of upwards and downwards trends
3.3.1 Data Processing Phase
3.3.1.1 Acquiring Datasets
3.3.1.2 Normalization
3.3.1.3 Denoising Autoencoder
3.4 The Description of Proposed Model
3.4.1 Gated Recurrent Unit
3.4.2 ResNet Model
3.4.3 Hybrid Module
Chapter 4
4.1 Performance Analysis of Proposed Solution 1
4.1.1 Performance Metrics
4.1.2 Experiments and Results Analysis
4.1.2.1 Analysis of Least Square Generative Adversarial Network
4.1.3 Performance Analysis of Gated Recurrent Unit
4.1.4 Performance Analysis of GoogLeNet
4.1.5 Performance Analysis of Hybrid HG2 Model
4.1.6 Comparison with Benchmark Classifiers
4.1.7 Mapping among Limitations, Solutions and Validations
4.2 Performance Analysis of Proposed Solution 2
4.2.1 Performance Measures
4.2.2 Experiments and Results Discussion
4.2.3 Results’ Discussion of Metaheuristic Techniques
4.2.4 Experimental Results of Denoising Autoencoder
4.2.5 Mapping Table
4.3 Performance Analysis of Proposed Solution 3
4.4 Experimental and Result Discussion
4.4.1 Performance Indicators
4.4.2 Performance Analysis of Denoising Autoencoder
4.4.3 Comparison
4.5 Hyperparameters Study
4.6 Mapping table
Chapter 5
5.1 Conclusions
5.2 Future Work
Chapter 6
Journal Publications
Conference Proceedings
LIST OF FIGURES
1.1 Data science pillars
3.1 Monthly electricity consumption of a normal consumer
3.2 Weekly electricity consumption of a normal consumer
3.3 Monthly electricity consumption of an abnormal consumer
3.4 Weekly electricity consumption of a normal consumer
3.5 Pearson correlation analysis weekly consumption of a normal consumer
3.6 Pearson correlation analysis weekly consumption of an abnormal consumer
3.7 The proposed system model: HG2
3.8 EC before and after applying theft attacks
3.9 EC before and after applying theft attacks
3.10 Diagram of proposed framework
3.11 Denoising autoencoder
3.12 Residual module with skip connection
3.13 The proposed model: HRG
4.1 Generator and discriminator loss
4.2 Consumption patterns of real theft samples
4.3 Consumption patterns of generated theft samples
4.4 Consumption patterns of generated theft samples
4.5 Precision recall curve
4.6 Receiver operating characteristic curve
4.7 Accuracy and loss on training and testing data
4.8 Precision recall curve
4.9 Receiver operating characteristic curve
4.10 Accuracy and loss on training and testing data
4.11 Precision recall curve
4.12 Receiver operating characteristic curve
4.13 Accuracy and loss on training and testing data
4.14 Feature selection
4.15 Execution time
4.16 Convergence analysis
4.17 Convergence rate of denoising autoencoder
4.18 Analyze the performance of extracted features by denoising autoencoder
4.19 APPL dataset
4.20 IBM dataset
4.21 BA dataset
4.22 WMT dataset
4.23 APPL dataset
4.24 IBM dataset
4.25 BA dataset
4.26 WMT dataset
4.27 Effect of α
4.28 Effect of α
4.29 Effect of β
4.30 Effect of β
4.31 Effect of γ
4.32 Effect of γ
LIST OF TABLES
2.1 Related work
3.1 Dataset information
3.2 Euclidean distance similarity measure
4.1 Comparison through accuracy and execution time of different data generation techniques
4.2 Hyperparameters setting of gated recurrent unit
4.3 Hyperparameters setting of GoogLeNet
4.4 Hyperparameters setting of proposed model
4.5 Comparison of HG2 with existing techniques
4.6 Mapping table
4.7 Performance analysis before and after features’ synthesization
4.8 Parameters settings of meta-heuristic techniques
4.9 Features selected by ABC and GA
4.10 Performance analysis of metaheuristic techniques
4.11 Hyperparameter setting of denoising autoencoder
4.12
4.13 Hyperparameter setting of denoising autoencoder
4.14 Proposed model performance on APPL dataset
4.15 Proposed model performance on IBM dataset
4.16 Proposed model performance on BA dataset
4.17 Proposed model performance on WMT dataset
4.18
List of Abbreviations and Symbols
Abbreviation Full form
ADASYN Adaptive synthetic sampling approach
AMI Advanced metering infrastructure
APPL Apple technology company
ANN Artificial neural network
Adam Adaptive moment estimation
ABC Artificial bee colony
Adagrad Adaptive Gradient Algorithm
BA British airways
CNN Convolutional neural network
CPBETD Consumption pattern based electricity theft detector
Catboost Categorical boosting
D Discriminator
DR Detection rate
DL Deep learning
DT Decision tree
ETD Electricity theft detection
EC Electricity consumption
EMH Efficient-market hypothesis
FPR False positive rate
FP False positive
FN False negative
GA Genetic algorithm
GRU Gated recurrent unit
G Generator
I Electric current
IBM International Business Machines
KNN k-nearest neighbors
LReLU Leaky rectified linear unit
LSTM Long short term memory
LR Logistic regression
LSGAN Least square generative adversarial network
LightGBM Light gradient boosting machine
ML Machine learning
SVM Support vector machine
SGCC State Grid Corporation of China
SMOTE Synthetic minority over-sampling technique
MR Modification rate
MLP Multilayer perceptron
MAD Mean absolute deviation
MSE Mean square error
NTLs Non technical losses
NB Naive bayes
NaN Not a number
TimeGAN Time series LSGAN
TPR True positive rate
TSR Three sigma rule
SSDAE Stacked sparse denoising autoencoder
R Resistance
RUS Random undersampling
RF Random forest
ROS Random oversampling
RNN Recurrent neural network
RDBN Real-valued deep belief network
HRG ResNet and GRU
RMSProp Root mean squared propagation
PR-AUC Precision recall-area under curve
PCA Principal component analysis
PRECON Pakistan residential electricity consumption
ROC-AUC Receiver operating characteristic - area under curve
ROC curve Receiver operating characteristic curve
TLs Technical losses
TP True positive
TN True negative
VGG Visual geometry group
WMT Wal-Mart Stores
WDCNN Wide and deep convolutional neural network
XGBoost eXtreme gradient boosting
Symbol Description
$a$ Label of theft sample
$b$ Label of fake sample
$b_{HG2}$ Bias of hybrid layer
$c$ Distance variable with which G wants to deceive D
$Dense_{GoogLeNet}$ Last layer of GoogLeNet
$Dense_{GRU}$ Last layer of GRU
$E$ Expected value of all instances
$h_t$ Hidden state at time step $t$
$h_{HG2}$ Hidden layer of hybrid module
$\hat{h}$ Candidate value
$P_{data}(x)$ Theft data
$P_g(z)$ Gaussian distribution
$r$ Reset gate
$w_i$ $i$th week EC
$w_m$ $m$th week EC
$W_r$ Weight of reset gate
$W_z$ Weight of update gate
$W_{HG2}$ Weight of hybrid layer
$x_i$ Represents complete consumption history of consumer $i$
$x_{i,j}$ Represents daily EC of a consumer $i$ over time period $j$ (a day)
$x_{i,j-1}$ Represents EC of a previous day
$x_{i,j+1}$ Represents the EC of a next day
$\bar{x}_i$ Denotes average consumption
$\sigma(x_i)$ Represents standard deviation of consumer $i$
$\min(x_i)$ Represents minimum value of consumer $i$
$Y_{NTL}$ Output of having NTLs or not
$z$ Update gate
Chapter 1
Introduction
1.1 Introduction
Data science is an interdisciplinary field that uses scientific methods, advanced algorithms
and computer software to extract valuable knowledge from noisy, structured and unstructured
data, and applies the extracted knowledge to a wide range of domains. The data science domain
draws on knowledge from multiple fields: physics, mathematics, computer science, statistics,
domain knowledge, etc.
Figure 1.1: Data science pillars (computer science; machine learning; math, physics and
statistics; software development; research and innovative ideas; domain knowledge)
It has applications in multiple fields:
1. Fraud and risk detection in banking: The earliest application of data science was in the banking sector. Banking systems collect customer information from the paperwork completed when granting loans or other products. They then apply advanced machine learning, deep learning and data mining methods to extract useful insights, which indicate the probability of risk or default. Moreover, this analysis helps banks to learn customers' habits and thereby increase their purchasing power.
2. Health care: The health care sector receives tremendous benefits from data science, for example medical image analysis to identify the areas where cancerous or malignant cells are growing rapidly, and genetics and genomics analysis to understand the relationship between disease, drug response and the genetics of an individual person.
3. Recommendation systems (Netflix, Amazon, eBay): Recommendation systems are designed to recommend new items or articles to users on the basis of their habits and other factors. Large companies such as Netflix, Amazon, eBay and YouTube utilize these systems to recommend products to their clients. Nowadays, data scientists
2Thesis by: Faisal Shehzad
use advanced clustering, classification and forecasting models to extract complex patterns in clients' behaviors, which help companies in targeted marketing.
4. Computer vision: It is one of the most active research topics in data science. The tagging feature of Facebook, which many of us use regularly, is one of the best examples of data science in computer vision. Data science has thus become part of our lives, knowingly or unknowingly. It also has applications in many other fields like sentiment analysis, targeted marketing, gaming, natural language processing, etc.
In this thesis, we analyze the applications of data science in two domains. In the first phase, data science techniques are applied to detect anomalies in electricity consumption (EC) data, which is collected with the help of smart grids. The first phase consists of two solutions. In the first solution, feature extraction and classification are performed with the help of a hybrid deep learning model, while the second solution focuses only on the curse of dimensionality issue, which is handled using a denoising autoencoder and metaheuristic techniques. In the second phase, upward and downward trends in the financial market are predicted to benefit potential investors. A detailed description of both phases is given below.
Electricity is an important factor in human lives and is essential for the economic and social development of any country. Various losses occur in the transmission and distribution of electricity, namely technical losses and non-technical losses (NTLs). Technical losses occur due to the dissipation of energy in transmission lines, transformers and electrical equipment, while NTLs occur due to direct connections to transmission lines, meter tampering, faulty meters and changes in meter readings through communication links. Moreover, a recent report shows that NTLs cause 96 billion in revenue loss every year [1]. According to the World Bank's report, India, China and Brazil lose 25%, 6% and 16% of their total electricity supply, respectively. NTLs are not limited to developing countries; it is estimated that developed countries like the UK and the US also lose 232 million and 6 billion US dollars per annum, respectively [2,3,4].
Electricity theft is a primary cause of NTLs. The evolution of advanced metering infrastructure (AMI) promises to overcome electricity theft by monitoring users' consumption history. However, it introduces new types of cyber-attacks, which are difficult to detect using conventional methods. Traditional meters can only be compromised through physical tampering, whereas in AMI the meter readings can be tampered with both locally and remotely over the communication links before they are sent to an electric utility [5,6,7,8,9,10,11,12,13,14,15,16]. There are three types of approaches to address NTLs in AMI: state-based, game theory based and data driven. State-based approaches exploit wireless sensors and radio frequency identification tags to detect NTLs. However, these approaches have high installation, maintenance and training costs, and they also perform poorly in extreme weather conditions [17,18,19,20,21,22,23,24,25]. Game theory based approaches, in turn, hold a game between a power utility and consumers to reach an equilibrium state and then extract hidden patterns from users' EC history. However, it is difficult to design a suitable utility function for utilities, regulators, distributors and energy thieves that reaches the equilibrium state within the defined time [26,27,28,29,30,31,32]. Moreover, both of these NTL detection approaches have a low detection rate (DR) and a high false positive rate (FPR).
Data driven methods receive considerable attention due to the availability of electricity consumption (EC) data, which is collected through AMI. A normal consumer's EC follows a statistical pattern, whereas abnormal¹ EC does not follow any pattern. Machine learning (ML) and data mining techniques are trained on the collected data to learn normal² and abnormal consumption patterns. After training, the model is deployed in a smart grid to classify incoming consumers' data as normal or abnormal. Since these techniques use already available data and do not require hardware devices to be deployed at consumers' sites, their installation and maintenance costs are low compared to hardware based methods. However, class imbalance is a serious issue for data driven methods, because the number of normal EC samples is larger than the number of theft samples. Normal data is easily collected from users' consumption history, whereas theft cases are relatively rare in the real world, so only a few theft samples are present in users' consumption history. This lack of theft samples affects the performance of classification models. ML models become biased towards the majority class and ignore the minority class, which increases the FPR [33,34,35,36,37,38,39,40,41,42,43,44]. In the literature, authors mostly use random undersampling (RUS) and random oversampling (ROS) techniques to handle the class imbalance problem. However, these techniques suffer from underfitting and overfitting, respectively, which increase the FPR and reduce the DR [5,45,46,47,48,49,50,51,52,53,54,55,56,57]. The second challenging issue is the curse of dimensionality. A time series dataset contains a large number of timestamps (features), which increase both execution time and memory complexity and reduce the generalization ability of ML methods. Consequently, traditional ML methods have a low DR and tend to overfit. They also require domain knowledge to extract prominent features, which is a time consuming task [2,5]. Moreover, metaheuristic techniques are inspired by the working mechanisms of nature. In the literature, these techniques are mostly utilized for optimization and feature selection purposes [58].
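The two resampling baselines mentioned above can be illustrated with a minimal sketch. It assumes binary labels where 1 marks the minority (theft) class and balances the classes to equal size; the function names, the toy data and the fixed seed are illustrative assumptions, not the implementation used in the cited works.

```python
import random

def split_indices(labels, minority=1):
    """Return (minority_indices, majority_indices) for a binary label list."""
    minority_idx = [i for i, lab in enumerate(labels) if lab == minority]
    majority_idx = [i for i, lab in enumerate(labels) if lab != minority]
    return minority_idx, majority_idx

def random_oversample(samples, labels, minority=1, seed=0):
    """ROS: duplicate randomly chosen minority samples until classes are equal."""
    rng = random.Random(seed)
    minority_idx, majority_idx = split_indices(labels, minority)
    n_extra = len(majority_idx) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    keep = majority_idx + minority_idx + extra
    return [samples[i] for i in keep], [labels[i] for i in keep]

def random_undersample(samples, labels, minority=1, seed=0):
    """RUS: discard randomly chosen majority samples until classes are equal."""
    rng = random.Random(seed)
    minority_idx, majority_idx = split_indices(labels, minority)
    kept_majority = rng.sample(majority_idx, len(minority_idx))
    keep = kept_majority + minority_idx
    return [samples[i] for i in keep], [labels[i] for i in keep]

# Toy data: 8 normal consumers (label 0) and 2 theft consumers (label 1).
samples = [[float(i), float(i) + 0.5] for i in range(10)]
labels = [0] * 8 + [1] * 2

ros_samples, ros_labels = random_oversample(samples, labels)
rus_samples, rus_labels = random_undersample(samples, labels)
```

ROS only repeats existing theft records, which is why it tends to overfit, while RUS throws away normal records, which is why it tends to underfit.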
In this thesis, time series least square generative adversarial network (TLSGAN) is proposed,
which is specifically designed to handle data imbalance problem of time series datasets. It
utilizes supervised and unsupervised loss functions and gated recurrent unit (GRU) layers
to generate fake theft samples, which have high resemblance with real world theft samples.
¹ The words theft and abnormal are used interchangeably.
² The words benign and normal are used interchangeably.
In contrast, a standard GAN uses only an unsupervised loss function to generate fake theft samples, which have low resemblance to real world theft samples. Moreover, an HG2 model is proposed, which is a hybrid of GoogLeNet and GRU. It is a challenging task to capture long-term periodicity from a one dimensional (1D) time series dataset. Deep learning (DL) models have a better ability to memorize sequence patterns than traditional ML models. The 1D data is fed into the GRU to capture temporally correlated patterns from users' consumption history, whereas weekly consumption data is passed to GoogLeNet to capture local features from the sequence data using inception modules. Each inception module contains multiple convolutional and max-pooling layers that extract high level features from time series data and mitigate the curse of dimensionality issue. Moreover, non malicious factors, such as a change in the number of persons in a house, extreme weather conditions, weekends or a big party in a house, affect the performance of ML methods. The GRU is used to handle non malicious factors because it has memory modules. These memory modules help the GRU to learn sudden changes in consumption patterns and memorize them, which decreases the FPR. Moreover, dropout and batch normalization layers are used to enhance convergence speed and model generalization ability and to increase the DR. The main contributions of solution 1 are given below.
• A hybrid model is proposed that is a combination of GRU and GoogLeNet. The former extracts temporal patterns from the EC dataset, while the latter retrieves latent or abstract features that cannot be observed by the human eye. The self-learning mechanism of the hybrid model increases convergence speed, accuracy and overall performance. Moreover, we work on 1D and 2D data in parallel. The 1D data is fed into the GRU to learn time-related patterns, whereas the 2D data is passed to GoogLeNet to capture latent features from weekly consumption.
• The class imbalance problem is a severe issue in ETD that drastically affects the performance of classifiers. TimeGAN is exploited to generate fake samples from existing theft patterns to tackle the class imbalance.
• Extensive experiments are conducted on a realistic EC dataset that is provided by the State Grid Corporation of China (SGCC), the largest smart grid company in China. Moreover, different performance indicators are utilized to evaluate the performance of the proposed model: receiver operating characteristic curve (ROC curve), ROC-area under the curve (ROC-AUC), precision recall curve (PR curve) and PR-area under the curve (PR-AUC).
• The GRU model is utilized to handle non malicious factors like sudden changes in consumption patterns due to an increase in family members, a change in weather conditions, etc. It has memory modules that remember the consumption history of a user and compare the current input with the previously saved history before giving the final prediction about the presence of an anomaly.
• Batch normalization and dropout layers are used to enhance the convergence speed of the model and reduce overfitting.
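The parallel use of 1D and 2D data described in the first contribution can be made concrete with a minimal sketch of the weekly stacking step. It assumes one reading per day and drops an incomplete trailing week; the thesis may instead pad the series, so the function name and this truncation rule are illustrative assumptions.

```python
def stack_weeks(daily_ec, days_per_week=7):
    """Reshape a 1D daily EC series into a 2D list with one row per week.

    Assumption: readings after the last complete week are dropped.
    """
    n_weeks = len(daily_ec) // days_per_week
    return [
        daily_ec[w * days_per_week:(w + 1) * days_per_week]
        for w in range(n_weeks)
    ]

# 16 daily readings give 2 complete weeks; the last 2 readings are dropped.
daily = [float(d) for d in range(16)]
weekly = stack_weeks(daily)
```

The 1D list `daily` corresponds to the input of the recurrent branch, and the 2D list `weekly` corresponds to the input of the convolutional branch.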
In the second solution, we address the curse of dimensionality issue. Despite the extensive use of ML classifiers, only some ML researchers focus on the curse of dimensionality, which leads to overfitting, computational overhead and memory limitations. In [2], Joker et al. propose an EC theft detection method that is based on a support vector machine (SVM) and hardware devices to distinguish between normal and abnormal patterns. Both of the above problems generate false alarms, which are not sustainable for an electric utility due to the limited budget for on-site inspections. In [58], the authors use four metaheuristic techniques, namely black hole, harmony search, particle swarm optimization and differential evolution, to select optimal features from the EC dataset. They use accuracy as the fitness function to evaluate the features selected by the four techniques. However, accuracy is not a good measure for imbalanced datasets. In this study, a framework based on three modules is developed to address the above issues. The contributions of the second solution are listed below.
• A hybrid method based on metaheuristics and ML methods has been developed using big data for efficient electricity theft detection.
• In order to reduce the FPR and improve the DR, updated versions of theft cases are exploited to generate malicious samples from benign ones.
• Eleven different features are synthesized from the EC data using statistical and electrical parameters. The features provide good classification accuracy and F1-score, indicating that they are good representatives of the EC data.
• The metaheuristic techniques, i.e., ABC and GA, are used to select optimal features from the newly synthesized features. Denoising autoencoders are used to reduce the high dimensionality and extract features with high variance. This process reduces the computational cost and memory constraints that limit the real-time applications of ML classifiers for smart grids.
• Metaheuristic techniques are used in the literature to select a subset of features from EC data, with accuracy as the fitness function. However, accuracy is not a suitable measure for imbalanced datasets. That is why the F1-score is utilized as the fitness function, because it combines precision and recall and is therefore not dominated by the majority class.
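The argument against accuracy as a fitness function can be checked with a small worked example. The sketch below assumes binary labels with 1 for theft and evaluates a trivial classifier that labels every consumer as normal; the 95:5 class ratio is an illustrative assumption, not a property of the dataset used in this thesis.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (theft) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no theft sample detected, so precision/recall carry no credit
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 95 normal consumers, 5 thieves, and a classifier that never predicts theft.
y_true = [0] * 95 + [1] * 5
y_all_normal = [0] * 100

acc = accuracy(y_true, y_all_normal)
f1 = f1_score(y_true, y_all_normal)
```

Here the accuracy is 0.95 although no theft case is detected, while the F1-score is 0.0, which is why a feature subset selected by accuracy alone can look good and still be useless for theft detection.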
Although predicting future trends and the direction of financial market movements is one of the important tasks in the financial industry, performing such a task is very difficult due to the complex and volatile nature of the financial market [59]. It depends on many factors such as the political conditions of a country, government policies, investor sentiment, etc. [60]. For many years, people in academia and in the financial market have believed that future trends cannot be predicted. This belief is based on the random walk theory [61] and the efficient market hypothesis (EMH) [62]. According to the random walk theory, the financial market moves along a random path and behaves like Brownian motion. Thus, according to [63], there is only a 50% chance of predicting its behavior correctly. Furthermore, according to the EMH, its behavior depends on the information currently available. So, under these theories, the movements of the financial market cannot be predicted on a regular basis.
Forecasting in the financial market is guided by two schools of thought, namely technical
analysis and fundamental analysis. The former analyzes the stock price and its turnover.
It takes into account all the factors that affect the stock price such as economic conditions,
social or cultural factors, etc. Technical analysis also assumes that price movements continue
until the stock reaches its peak and then reverses. The latter school of thought examines
fundamental factors such as investor sentiment, natural disasters, company ratings that affect
price movements in the financial markets. So, in fundamental analysis, researchers focus on the
factors that affect stock prices, while in technical analysis, they measure the impact of these
factors on stock prices. Thus, the second school of thought strengthens the EMH, which states
that the financial market is independent because these fundamental factors can be changed at
any time. However, despite extensive debates about the EMH in academia and industry, no
one has confirmed or refuted it. In practice, researchers develop complex mathematical models
and profit from them by predicting the behavior of financial markets [64].
Existing literature uses statistical and econometric techniques to predict the upward and down-
ward trends of financial markets [60]. However, these techniques have low accuracy in predicting
the behavior of financial markets, resulting in large losses for potential investors. Recently, ma-
chine learning (ML) and deep learning (DL) models are receiving a lot of attention from the
research community as they are trained on historical data of the financial market [65]. However,
these models have the following problems: the curse of dimensionality, inappropriate tuning of
parameters, inability to learn complex patterns, and low accuracy of the independent models.
[66] do not address the problem of the curse of dimensionality. [67] use principal component
analysis (PCA) to reduce the high dimensionality of the data. However, it gives good results
for linear data compared to non-linear data. We know that the financial market has complex
and nonlinear behavior. [68] use a stacked autoencoder to solve this problem. However, this is
sensitive to the diversity of the data, which affects its generalization ability. Another problem
with single classifiers is the trade-off between bias and variance. We prefer models that have
low bias and variance, but this is difficult to achieve. In this study, a denoising autoencoder
is used to solve the curse of dimensionality problem, while a hybrid DL model is proposed to
predict upward and downward trends in financial market data. Moreover, the problem of bias
and variance of individual classifiers is addressed. The main contributions of solution 3 are given below.
• A hybrid DL model, HRG, is proposed for financial trend prediction, which is a combination of ResNet and gated recurrent unit (GRU).
• The problem of the curse of dimensionality is solved using a denoising autoencoder.
• Extensive experiments are conducted to compare the performance of the proposed model with that of benchmark models. We use different datasets, i.e., IBM, APPL, BA and WMT, to evaluate the effectiveness of the proposed model in real-world scenarios.
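To make the prediction target concrete, the sketch below shows one common way of turning a price series into upward and downward labels. The labeling rule (1 if the next closing price is higher than the current one, 0 otherwise) and the toy prices are assumptions for illustration; this section of the thesis does not state the exact rule.

```python
def trend_labels(close_prices):
    """Label each day by the direction of the next day's closing price.

    Assumption: 1 = upward (next close is higher), 0 = downward or unchanged.
    The last day has no successor, so the output is one element shorter.
    """
    return [
        1 if nxt > cur else 0
        for cur, nxt in zip(close_prices, close_prices[1:])
    ]

# Hypothetical closing prices for five consecutive trading days.
prices = [100.0, 101.5, 101.0, 101.0, 103.2]
labels = trend_labels(prices)
```

A classifier trained on such labels solves a binary classification problem, which matches the upward/downward formulation used in this thesis.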
1.2 Organization of Thesis
The remainder of the thesis is organized as follows. Chapter 2 describes the related work and problem statement. Chapters 3 and 4 present the system models and experimental results. Finally, the conclusion and future work of the thesis are given in Chapter 5.
Chapter 2
Literature Review and Problem Statement
2.1 Literature Review
In this section, we study state-of-the-art articles to understand how researchers use machine learning and deep learning methods in smart grids and financial markets.
2.1.1 Solving the Curse of Dimensionality Issue
In [5], the authors use a support vector machine (SVM) to detect abnormal patterns in EC data. However, they do not use any feature engineering technique to extract or select the prominent features, even though a time-series dataset contains a large number of dimensions. The high dimensionality of data creates time complexity, storage and FPR issues. In [33,45,69,70,71,72,73], feature selection is an important part of data-driven techniques, where significant features are selected from the existing ones. Domain knowledge is required for feature selection, and insufficient domain knowledge during the feature selection process increases the FPR and decreases classification accuracy. The authors of [57] note that previous studies use only the EC dataset to train ML classifiers and predict abnormal patterns. These studies do not use smart meter information and auxiliary data (geographical information, meter placement inside or outside, etc.) to predict normal and abnormal patterns from EC data.
In [74,75,76,77,78], the authors observe that different users exhibit various consumption behaviours, and that the consumption behaviours of each customer give different results. So, it is necessary to select those features that give the best results. However, consumption behaviours are closely related, and significant correlation exists between these features. The authors remove highly correlated and overlapping features, which helps to improve the DR and decrease the FPR. According to [34], existing NTL detection methods are based on data driven approaches and specific hardware devices. The hardware based methods are time-consuming, less efficient and more expensive, while data driven methods require domain experts to classify normal and abnormal patterns. The introduction of smart meters helps in NTL detection. However, smart meters introduce several new types of attacks that are difficult to detect with existing detection algorithms, which are based on domain knowledge. In the case of NTLs, an electricity thief reports lower consumption than the actual consumption. Theft customers use different techniques to change consumption patterns. For example, shunt devices divert current from the input terminal to the output terminals or bypass a smart meter. In double-tapping attacks, appliances are directly connected to distribution lines. As there is no mathematical formulation for these attacks, handcrafted features are designed to detect them. For example, features that show sudden changes in consumption patterns are used for shunt attacks. Faulty meters are detected through the number of zeros or missing values in electricity measurements. However, handcrafted features require continuous work, are less efficient and more expensive, and depend on domain knowledge to select optimal features.
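The two handcrafted features mentioned above can be sketched as follows. The code assumes that missing readings are stored as None and measures a sudden change as the largest relative drop between consecutive valid readings; both conventions and the function names are illustrative assumptions rather than definitions taken from the cited works.

```python
def zero_or_missing_count(readings):
    """Count readings that are zero or missing (None), a faulty-meter indicator."""
    return sum(1 for r in readings if r is None or r == 0)

def largest_relative_drop(readings):
    """Largest fractional decrease between consecutive valid readings.

    Assumption: missing values are skipped, and a drop is measured relative
    to the previous reading, so 10.0 -> 5.0 gives 0.5.
    """
    valid = [r for r in readings if r is not None]
    drops = [
        (prev - cur) / prev
        for prev, cur in zip(valid, valid[1:])
        if prev > 0
    ]
    return max(drops, default=0.0)

# Hypothetical daily readings with one missing value and a sudden halving.
readings = [10.0, 10.0, None, 5.0, 5.0, 0, 4.0]
n_zero_or_missing = zero_or_missing_count(readings)
max_drop = largest_relative_drop(readings)
```

Such features are cheap to compute, but each one targets a single attack pattern, which illustrates why handcrafted features need continuous maintenance as new attack types appear.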
In [58,79,80,81,82,83,84,85,86,87,88], the authors note that, despite the extensive use of ML techniques, no one focuses on the selection of optimal features. In [46,123,89,90,91,92,93,94], the authors discuss the possibilities of implementing ML classifiers for the detection of NTLs and describe the advantage of selecting optimal features and their impact on classifier performance. One of the main challenges [95] that limits the classification ability of existing methods is the high dimensionality of data. Although smart meters greatly improve data collection procedures and provide high dimensional data to capture complex patterns, research shows that most existing classification methods are based on conventional ML classifiers like ANN, SVM and decision tree, which have limited generalization ability and are unable to learn complex patterns in high dimensional datasets. In [5], the authors use gradient boosting classifiers, namely Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM) and eXtreme Gradient Boosting (XGBoost), which have a built-in weighted feature importance module to create a subset of optimal features. Moreover, stochastic features are extracted through the mean, standard deviation, maximum and minimum values of a consumer's observations. These stochastic features have a minor effect on the DR and a major effect on the FPR.
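The stochastic features listed above reduce a consumer's whole consumption history to four numbers. A minimal sketch is given below; the use of the population standard deviation (rather than the sample standard deviation) and the dictionary keys are assumptions, since [5] is not quoted on this detail here.

```python
import statistics

def summary_features(consumption):
    """Mean, standard deviation, minimum and maximum of one consumer's readings.

    Assumption: the population standard deviation (pstdev) is used.
    """
    return {
        "mean": statistics.fmean(consumption),
        "std": statistics.pstdev(consumption),
        "min": min(consumption),
        "max": max(consumption),
    }

# Hypothetical daily readings of one consumer.
features = summary_features([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

For these readings the mean is 5.0 and the population standard deviation is 2.0, so the eight-dimensional record is represented by four features.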
In [33,96,97,98,99,100,101,102], the authors propose a hybrid model, which is a combination of a multilayer perceptron (MLP) and a convolutional neural network (CNN). The CNN is able to extract latent features from EC data through its different types of layers: convolution and max pooling layers. Both layers extract abstract information, reduce data dimensionality and computation time, and increase model generalization ability. The authors obtain global knowledge from 1D data and local information from 2D data (weekly consumption) through the MLP and the CNN, respectively. In [57,103,104,105,106,107,108,109,110], the authors generate new features from the smart meter and auxiliary data. These features are based on the z-score, electrical magnitude, users' consumption patterns obtained through a clustering technique, the smart meter alarm system, geographical location and smart meter placement (indoor or outdoor). In [111], features are selected from the existing features based on clustering evaluation criteria. When the number of features to select is one, the single feature with the highest clustering evaluation criterion is chosen; when it is two, the two features with the highest criterion are chosen. A DL-based stacked autoencoder is utilized to extract the latent features, which significantly improves the DR.
In [34], the authors propose a new DL model, which is able to learn and extract latent features from EC data. The proposed methodology integrates both sequential and non-sequential data. The former (EC) and the latter (smart meter data, geographical information, etc.) are fed into the LSTM and MLP modules, respectively. The EC values are recorded on an hourly basis. However, the authors reduce the granularity to the daily level by simply taking the average over twenty-four hours. The LSTM model gives higher accuracy and extracts better hidden features from daily consumption than from monthly or hourly consumption. In [58,112,113,114,115,116,117,118], the authors use the black hole algorithm (BHA) to select the optimal number of features. BHA is based on the concept of the black hole, which has a higher gravitational force than other stars and absorbs the stars that come near its boundary. BHA is designed for continuous data. However, the authors convert it into a binary algorithm using the sigmoid activation function, because the goal is to select a subset of features from the existing ones. In the end, the authors compare the results of BHA with particle swarm optimization, differential evolution, genetic algorithm (GA) and harmony search. In [119,120,121,122], the authors utilize blockchain to achieve privacy in the smart grid domain. In [123], the authors work on feature engineering. They identify different features like electricity contract, geographical location, weather condition, etc.
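The conversion of a continuous metaheuristic into a binary feature selector can be sketched as follows. The code assumes the stochastic sigmoid transfer rule that is common in binary swarm algorithms (a feature is selected when a uniform random number falls below the sigmoid of its position value); whether [58] uses this rule or a fixed threshold is not stated here, so the rule and the names are assumptions.

```python
import math
import random

def sigmoid(value):
    """Logistic function mapping a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))

def binarize_position(position, rng):
    """Turn a continuous position vector into a 0/1 feature mask.

    Assumption: feature k is selected (1) when a uniform random draw is
    smaller than sigmoid(position[k]).
    """
    return [1 if rng.random() < sigmoid(v) else 0 for v in position]

rng = random.Random(0)
# A strongly negative value is almost never selected, a strongly positive
# value is almost always selected, and 0.0 is selected with probability 0.5.
mask = binarize_position([-50.0, 0.0, 50.0], rng)
selected = [k for k, bit in enumerate(mask) if bit == 1]
```

Each candidate solution (star) then encodes one feature subset, and its fitness is obtained by training a classifier on the selected columns only.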
In [70,124,125,126,127,128,129,130,131,132,133], the authors note that conventional techniques are applied to the data to reduce the curse of dimensionality, a process that is very tedious and time-consuming. A CNN performs downsampling and extracts high variance features. The last layer of the CNN is a fully connected layer where sigmoid is used as the activation function to classify data points as theft or normal. In their proposed technique, the authors use the CNN as a feature extractor and pass these features to a random forest (RF) classifier. RF performs bagging and random feature selection, which mitigate the overfitting problem. In [71], one of the main contributions is to find the optimal number of features. It is observed that not all features contribute equally to the prediction results. The aim is to find a threshold beyond which including or excluding features does not affect the prediction results. The authors apply the Gini index to calculate each feature's score in the prediction results, sort the scores in descending order and select the top features that contribute most to the prediction results. The authors find 14 optimal features with which all classifiers give their best performance. Moreover, these selected features reduce the computation time of classifiers and mitigate the curse of dimensionality issue. In [69,134,135,136,137,138,139,140,141], the authors use a Dense-Net based CNN to analyse periodicity in EC data. Convolutional layers can capture long-term and short-term sequences from weekly and monthly consumption patterns. Different types of convolutional layers extract different types of features, so they overcome the problem of handcrafted feature engineering by domain experts. To capture abstract and latent (hidden) features, the authors change the connection sequence in a dense block and convert it into a multi-dimensional block whose input contains the outputs of the previous blocks. The authors compare the proposed multi-dense block network with RF, gradient boosting classifier, simple CNN and 1D-Dense-Net.
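The Gini-based ranking described for [71] can be illustrated with a minimal sketch. It shows the Gini impurity of a label set and the sort-and-select step on precomputed feature scores; the score values and feature names are hypothetical, and the way [71] derives the scores from trained classifiers is not reproduced here.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) over the class proportions p_c."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def select_top_features(scores, k):
    """Sort features by score in descending order and keep the first k names."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# A perfectly mixed binary node has impurity 0.5, a pure node has impurity 0.
mixed = gini_impurity([0, 0, 1, 1])
pure = gini_impurity([1, 1, 1])

# Hypothetical Gini-based importance scores of four features.
scores = {"mean": 0.30, "std": 0.25, "min": 0.05, "max": 0.40}
top_two = select_top_features(scores, k=2)
```

Increasing k until the classifier performance stops improving yields the threshold described above, which in [71] corresponds to 14 features.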
In [46], the maximal overlap discrete wavelet packet transform (MODWPT) is used to extract the optimal number of features. It decomposes the consumer load profile into wavelet signals. The wavelet coefficients and the standard deviation of the wavelet signals help to select the optimal number of features. In [95], to address the curse of dimensionality issue, the authors implement a bidirectional Wasserstein generative adversarial network (BiWGAN) to extract the optimal features from time-series data. Generative adversarial networks (GANs) have gained much attention from academia and industry due to their various applications, such as generating fake image samples. A GAN contains two parts: a generator and a discriminator. The former generates fake samples and tries to fool the discriminator, whereas the latter compares fake and real samples. Both sub-models are trained in an adversarial manner. When both reach the equilibrium stage, the discriminator fails to distinguish between fake and real samples.
In [5], the authors report the DR and FPR for 100 customers against the number of selected features using gradient boosting classifiers. The results indicate that a slight improvement in the DR and FPR is observed when the number of selected features is reduced. However, a sudden decrease in the DR is observed when the number of selected features is too low. An optimal number of features is chosen at which the proposed method gives a high DR and takes low processing time. In [33], the authors perform exploratory data analysis to visualize periodicity in monthly and weekly consumption data. The results show that periodicity exists if the data is analysed in a 2D manner (weeks). The authors feed weekly consumption into a 2D-CNN model to extract hidden or latent features. In [57], the authors feed a combination of newly created features into different conventional ML classifiers and compare their results. In [74], the authors compare the number of selected features with classification accuracy. When fewer than four features are selected, the accuracy decreases. So, the optimal number of features is four, which reduces the execution time and memory complexity and improves the model generalization ability. In [34], the authors measure the precision and recall of the LSTM classifier on test data. The hybrid of MLP and LSTM outperforms the single LSTM in terms of the precision-recall curve (PR-curve), because the MLP adds additional feature information to the network, such as meter location, contractual data and technical information. In [58], the authors use accuracy, convergence rate and computation time to compare the performance of BHA with benchmark meta-heuristic techniques. In [123], the identified features are passed to gradient boosting classifiers like LightGBM, CatBoost and XGBoost to distinguish between normal and abnormal samples.
The receiver operating characteristic area under the curve (ROC-AUC) and the PR-curve are utilized to evaluate the gradient boosting classifiers' performance. The authors use a dataset of the Naturgy electric company of Spain and achieve an accuracy of more than 50%. In [71], the authors use precision, recall and F1-score to evaluate the performance of the deployed classifiers. In [69], the authors use log loss and ROC-AUC to compare the performance of all deployed classifiers. The proposed model achieves a ROC-AUC of 0.86 and a log loss of 0.25. In [46], classification accuracy is used to evaluate classifier performance on test data. In [95], the authors evaluate the performance of the proposed model through DR and FPR. In [2,57,142], the authors do not use any feature engineering techniques to extract or select the optimal features, even though the time-series dataset contains a large number of features. The high dimensionality of data creates time complexity and storage issues and affects the model generalization ability.
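Since DR and FPR are used throughout this review, a short worked example of both measures is given below. It assumes the usual definitions DR = TP / (TP + FN) and FPR = FP / (FP + TN) with theft as the positive class; the toy predictions are illustrative.

```python
def detection_and_false_positive_rate(y_true, y_pred, positive=1):
    """Return (DR, FPR) with DR = TP / (TP + FN) and FPR = FP / (FP + TN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return dr, fpr

# 4 theft consumers (3 detected) and 6 normal consumers (1 falsely flagged).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
dr, fpr = detection_and_false_positive_rate(y_true, y_pred)
```

In this example the DR is 0.75 and the FPR is 1/6, so one in six honest consumers would trigger an unnecessary on-site inspection.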
In [74], the authors form a feature library in which they select a subset of features from the existing features using clustering evaluation criteria. However, they do not compare the adopted feature selection strategy with other feature selection strategies. Moreover, clustering takes high computation time in the case of a large number of observations and features. Sparsity and outliers also affect the performance of the clustering evaluation criteria. The authors use an autoencoder to extract new features from the existing ones to address the curse of dimensionality issue. The autoencoder contains two neural networks. The first network transforms high dimensional data into low dimensional data, while the second network removes noise and correlated features from the encoded data and converts them back into high dimensional data. However, autoencoders require a large amount of data and computational time for training. They also do not give good results on noisy data. The authors only consider EC data to detect abnormal patterns. They also consider smart meter data (model, location inside or outside, alarm system, etc.) and geographical data. In [58], meta-heuristic techniques are used to select optimal features. However, these techniques take a lot of computation time and have a high chance of getting stuck in local minima.
2.1.2 Handling the Class Imbalance Problem
In [2,5,45,74,143], data imbalance is identified as a major issue for the training of ML classifiers. Benign
samples are easily collected from the consumption history of any consumer, whereas theft cases rarely
happen in the real world. So, the lack of theft samples limits classification accuracy and
increases FPR. There are two main approaches to solve the data imbalance problem: ROS and RUS
techniques. The former selects existing samples of the minority class and generates duplicated records,
whereas the latter randomly selects samples from the majority class and discards them. As a result,
RUS loses potentially useful information. Chawla et al. proposed the synthetic minority
oversampling technique (SMOTE), which creates artificial minority-class samples by interpolating
between a minority sample and its nearest neighbours under Euclidean distance. SMOTE has many
advanced versions like Random-SMOTE, Kmeans-SMOTE, etc. However, these sampling techniques do not
represent the overall distribution of data, which affects the FPR and DR badly. In [2], the authors
introduce six theft cases to generate malicious samples from benign samples. They argue that the goal
of theft is to report less consumption than the actual consumption or to shift load toward low tariff
periods. After generating malicious samples, the authors exploit the ROS technique to solve the class
imbalance problem. In [74], the authors use a GAN to create theft samples. GANs belong to the DL domain
and are mostly used in the image processing field to generate fake images. Since EC data is a 1D
time series, the authors implement a 1D-Wasserstein GAN (WGAN) to generate fake theft samples that
closely resemble real-world theft cases. WGAN contains two sub-models: a generator and a
discriminator. Both sub-models follow a game theory based approach and try to deceive each other
Table 2.1: Related work

| Limitations already addressed | Solution already proposed | Validations already done | Limitations to be done |
|---|---|---|---|
| [2] Data imbalance problem, Contamination attacks, Effect of outliers, Privacy issue | Malicious samples generation through six theft cases, TLs information used to identify contamination attacks, High consumer privacy | Generated theft cases resemble real world theft cases, DR, FPR, High DR | Overfitting of ROS, SVM not designed for time-series data, Low performance of SVM on noisy data, Difficult to tune hyper parameters of SVM, Cases 1 and 2 do not resemble real world theft cases |
| [5] Curse of dimensionality, Overfitting, Less resemblance with real world theft cases | Stochastic features, Weighted features selection, Handle overfitting through SMOTE, Update theft cases 1 and 2 | DR, FPR, Time complexity, Recall | Difficult to tune hyper parameters of GBCs, High time complexity of GBCs, Overfitting issue of SMOTE, Privacy leakage due to high sampling rate |
| [33] Low DR of traditional methods, Require domain knowledge to extract prominent features, Low accuracy of conventional ML methods, Difficult to analyse periodicity from 1D data, Missing values and outliers | Data driven approaches do not require hardware devices, Latent features and periodicity extracted through DL models, Missing values through linear interpolation, Outliers by three sigma rule | Less expensive, Do not require hardware devices, ROC, Precision, Recall, FPR | MLP not designed for time-series data, Class imbalance problem, Limitation of ReLU function |
| [34] Cyber attacks difficult to detect through conventional ML methods, CNN and MLP networks not designed for sequence data | DL models have the ability to learn and extract latent features of data, LSTM models are designed to handle sequence data | TPR, FPR, PR-curve, ROC-AUC | High complexity of MLP, Class imbalance problem, High dimensionality of time-series data |
| [57] Data imbalance problem, Use smart meter and auxiliary data, Drift patterns | Class imbalance problem handled through RUS, New features created from smart meter and auxiliary data, Recent and old abnormal patterns detected through Z-score and K-means clustering | PR-curve, Good accuracy on new generated features, ROC-AUC | Underfitting issue, High dimensionality of time-series, Computation overhead of Grid search CV |
| [45] Class imbalance problem, High FPR, Manual feature engineering | Use SMOTE to solve class imbalance problem, Extract optimal features through CNN, Sequence data classification through LSTM | Precision, Recall, F1-score | Overfitting of SMOTE, Difficult to train hyper parameters of LSTM |
| [46] Current approaches expensive and time consuming, Class imbalance problem, Not suitable performance measures, Curse of dimensionality | Proposed data driven based framework, Solve class imbalance through RUS, Optimal feature selection through MODWPT, Select meaningful performance measures for class imbalance problem | Accuracy, Recall, F1-score, Specificity, ROC-AUC, MCC | Information loss due to RUS, FPR measure not considered |
| [58] Curse of dimensionality | BHA to select prominent features, Perform comparison between BHA, GA, PSO and HS | Convergence speed, Accuracy score, Computation time | Class imbalance problem, High FPR, Computation overhead of meta heuristic techniques |
| [69] Require handcrafted features | Multi Dense-Net CNN to capture periodicity and hidden features | Precision, Log loss | Class imbalance problem, Not suitable performance measures |
| [70] Existing techniques unable to detect new type of attacks, Require manual checking, Traditional approaches expensive, Generate malicious samples, Class imbalance problem | SMOTE to solve class imbalance problem, Extract features through CNN, Classify data points through RF, Use theft cases to generate malicious samples | Precision, Recall, F1-score | Overfitting of SMOTE |
| [71] Existing methods expensive and time consuming, Curse of dimensionality, Limited budget for on site inspection, Observer meter only identifies specific area not culprit | Machine learning methods efficient and less time consuming, Feature selection through Gini index, Compare between ML methods, Ensemble methods achieve best results | Precision, Recall and F1-score measures | Class imbalance problem |
| [74] Class imbalance problem, Feature selection and extraction, Reduce misclassification of SVM | 1D-GAN, Features selection using clustering evaluation criteria, Feature extraction using autoencoder, Proposed similarity measure through Euclidean distance and dynamic time warping, Reduce SVM misclassification error through projection vector and KNN | Accuracy, ROC curve, Execution time | No comparison done with base techniques |
| [123] Existing methods expensive, Require hardware devices to detect NTL, High FPR | Perform feature engineering to select optimal features, Extracted features are evaluated through gradient boosting classifiers | ROC-AUC, PR-curve | Class imbalance problem |
| [95] Severe class imbalance problem, Curse of data dimensionality | Feature extraction through GAN, Handle severe class imbalance problem through one class SVM | DR, FPR | Class imbalance problem, Low DR |
| [111] Low generalization ability and high FPR of existing classifiers, Vanishing gradient problem, Class imbalance problem | Autoencoders have good generalization ability on high dimensional datasets | ROC-AUC, FPR, Execution time, DR | Privacy issue due to high sampling rate, High FPR, PSO local optima problem |
| [142] Low accuracy, Overfitting, Low convergence speed, High FPR | Introduced new version of LSTM | Precision, Recall, F1-score, Convergence speed | Not suitable for large datasets |
| [143] Existing methods expensive and time consuming, High FPR, Low DR, Class imbalance problem, Not used all records | Handle class imbalance, Low FPR, High DR, Bagging methods perform better on larger datasets | ROC curve, Confusion matrix, Computation time, DR | Overfitting of SMOTE |
| [144] Labelled data required for supervised methods, Low performance of unsupervised learning methods, Difficult interpretability and practicality of DL methods | Suitable for low power hardware devices, Overcome the limitation of classification and clustering methods | Precision, Recall, Classification accuracy | High time complexity and difficult to tune hyper parameters of SVM |
| [145] Tedious task to design utility function, Low DR, High FPR, Class imbalance problem, Poor generalization ability, Sudden deviation in normal consumption | High performance of ensemble methods | Accuracy, Recall, F1-score, Sensitivity, FPR | Information loss due to RUS, FPR measure not considered |
| [146] Labelled data required for supervised learning methods, Low performance of unsupervised methods, Difficult interpretability and practicality of DL methods | Suitable for low power hardware devices, Semi-supervised model requires low amount of labelled data, Overcome the limitation of classification and clustering methods | Precision, Recall, Accuracy | High time complexity and difficult to tune hyper parameters of SVM |
| [147] Bad performance of classifiers against sudden changes | Remove outliers through K-means clustering, Patterns learned by LSTM, Decide about theft or normal pattern through prediction error | Precision, Recall | Larger detection delay, High time complexity of LSTM |
| [148] Supervised learning methods not feasible for practical applications, Low performance of existing methods on large datasets, Require domain experts for feature engineering | Spectral density function and decision tree used to extract optimal features, Ensemble method used to design different architecture of autoencoders | Feature extraction ability of autoencoders, Computation time, Shapiro test | |
| [149] Low DR due to unlabelled data, Only electricity consumption data used, False alarm generation | Remove outliers through K-means clustering algorithm, Generate theft samples, Classify normal and theft observations through ANN | Precision, FPR, Accuracy | ANN not suitable for time-series data, High time complexity of ANN |
| [150] Handle data poisoning attacks, Robust theft detector, Design a robust theft detector | Effect of data poisoning attacks, Generalized model against data poisoning attacks, Compare performance of sequential and parallel ensemble methods | DR, FPR, Specificity, Classification accuracy, F1-score | NA |
| [156] What types of attacks can be applied at generation side to falsify the data? What type of data is used by electric utilities to detect the attacks? Which DL model gives high and robust performance? | Design attacks that are applied at generation side, Apply attacks to generate malicious samples, Design hybrid DL models and evaluate their performance | DR, Precision, Recall, F1-score | Overfitting problem |
| [157] Supervised learning methods not feasible for practical applications, Low performance of existing methods, Require domain experts to extract optimal features | Spectral density function and decision tree used for feature engineering, Ensemble method used to design different architecture of autoencoders | Feature extraction through autoencoders, Computation time, Shapiro test | No mechanism to tell whether data is normal or not |
to generate new fake samples. In [45], the authors use the six theft cases introduced in [2]
to generate malicious samples, and SMOTE is exploited to handle the class imbalance problem. In
[143], the authors use SMOTE and the near miss technique to tackle the class imbalance problem. Near
miss is an undersampling technique that selects majority-class samples based on their distance to
minority-class samples and removes the rest until both classes have an equal ratio. After balancing
the dataset, the authors compare bagging and boosting ensemble techniques. Both techniques give
better results with SMOTE than with near miss. In near miss, some samples of the majority class
are removed to balance the dataset. These samples may contain important information, so removing
them decreases classification accuracy. In SMOTE, by contrast, no information is removed from the
data.
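The interpolation idea behind SMOTE can be sketched as follows. This is a simplified illustration and not the reference implementation: the function name smote_like, the parameter k and the toy data are assumptions made for this example. Each synthetic sample is placed on the line segment between a randomly chosen minority sample and one of its k nearest minority neighbours under Euclidean distance.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolation.

    Assumes `minority` holds at least two feature vectors (lists of floats).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [s for s in minority if s is not base]
        neighbours = sorted(others, key=lambda s: euclidean(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical minority (theft) samples with two features each.
theft = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [1.5, 2.5]]
new_samples = smote_like(theft, n_new=5)
print(len(new_samples))  # 5 synthetic samples
```

Because every synthetic point lies between two existing minority samples, no information is removed from the data, but the new points stay inside the region already covered by the minority class.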
In [2], the authors argue that the goal of theft is to report less consumption or to shift load from high
tariff periods to low tariff periods. So, it is possible to generate malicious samples from benign
ones. In [74], the authors use 1D-WGAN to generate synthetic samples of the minority class. Visualization
plots of fake and real samples help to judge the effectiveness of the generated
samples. Finally, the authors compare 1D-WGAN performance with data generation techniques
like SMOTE and improved SMOTE. In [5,45], the SMOTE technique is used to tackle the class
imbalance problem. In [2], the authors use the ROS technique to handle the class imbalance problem. The
ROS technique replicates existing samples of the minority class, which creates an overfitting
problem: the classifier gives higher accuracy on training data than on test data. Electricity theft
cases rarely happen in the real world. Theft samples for a customer rarely or never exist,
which limits the DR of any ML classifier. The authors introduce six theft cases to generate
malicious samples and balance the ratio between normal and theft cases. However, cases 1 and
2 do not resemble real theft cases. In [33,34,69,71,123,95,144,145], the
authors do not tackle the class imbalance problem. One of the severe issues in ETD is the class imbalance
ratio, where one class (honest consumers) dominates the other class (theft consumers). The data
is not evenly distributed across the classes and is skewed towards the majority class. If an ML model is
trained on an imbalanced dataset, it becomes biased towards the majority class and does not learn important
features of the minority class, which increases the FPR. In [143], the authors use SMOTE and the near
miss method to handle the class imbalance problem. Near miss reduces the majority class samples to
balance the ratio between normal and theft samples. This technique discards useful information from the
dataset, which creates an underfitting problem due to the limited number of samples. In [57,46], the
class imbalance ratio is treated as a severe problem in ETD, where non-fraudulent consumers outnumber
fraudulent ones. Due to this problem, ML classifiers become biased toward the majority class, ignore
the minority class and generate false alarms. A utility cannot bear false alarms because it has a
low budget for on-site inspection. The authors apply the RUS technique to handle the data imbalance
problem. This method randomly selects samples of the majority class and removes them until both
classes have an equal ratio. However, when the imbalance ratio is high, it discards important
information from the data,
which creates an underfitting problem.
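The difference between ROS and RUS discussed above can be made concrete with a minimal sketch. The function names and the toy data are hypothetical, and the sketch assumes that the majority class is at least as large as the minority class.

```python
import random

def random_oversample(majority, minority, seed=0):
    """ROS: duplicate randomly chosen minority samples until both classes match in size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra

def random_undersample(majority, minority, seed=0):
    """RUS: keep a random subset of the majority class as large as the minority class."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), minority

# Hypothetical dataset: 100 benign consumers and 5 theft consumers (IDs only).
benign = list(range(100))
theft = [-1, -2, -3, -4, -5]

ros_benign, ros_theft = random_oversample(benign, theft)
rus_benign, rus_theft = random_undersample(benign, theft)

print(len(ros_benign), len(ros_theft))  # 100 100: theft samples are only repeated
print(len(rus_benign), len(rus_theft))  # 5 5: 95 benign samples are discarded
```

In the ROS output, the theft class still contains only the five original distinct samples, which is why a classifier can memorise them and overfit. In the RUS output, 95 of the 100 benign samples are dropped, which illustrates the information loss behind the underfitting problem.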
2.1.3 Optimizing the Hyperparameters
The parameters whose values define the structure of an ML model are known as hyperparameters.
The process of choosing the best hyperparameters is called hyperparameter tuning. Different
techniques are used in the literature to find optimal hyperparameters, such as random search, grid search
and meta-heuristic techniques. In [5,45,58,71,74,142,143,144,145,146,147,148,149],
the authors use the random search method to find the optimal hyperparameters. Random search sets
up a grid of hyperparameters, selects random combinations of them to train models and
calculates the classification accuracy. The number of search iterations depends upon time and system
resources. In [2,33,34,57,46,69,70,123,95,150], the authors use grid search to tune the
hyperparameters of ML models. Grid search also sets up a grid of hyperparameters, but trains a model
on every combination and calculates the classifier performance. It is computationally expensive
because it checks each combination of hyperparameters. Both techniques have advantages and
disadvantages. In the existing literature, experimental results show that grid search performs better
than random search. The choice of technique depends upon system resources and
the nature of the dataset. The literature suggests that grid search is suitable for smaller datasets,
whereas random search is better for larger datasets. In [111], the authors select the best parameter of