Article

Hybrid Deep Neural Networks for Detection of Non-Technical Losses in Electricity Smart Meters

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Non-technical losses in electricity utilities are responsible for major revenue losses. In this paper, we propose a novel end-to-end solution that self-learns the features for detecting anomalies and fraud in smart meters using a hybrid deep neural network. The network is fed with simple raw data, removing the need for handcrafted feature engineering. The proposed architecture consists of a long short-term memory network and a multilayer perceptron network. The first network analyses the raw daily energy consumption history, whilst the second integrates non-sequential data such as the contracted power or geographical information. The results show that the hybrid neural network significantly outperforms state-of-the-art classifiers as well as previous deep learning models used in non-technical losses detection. The model has been trained and tested with real smart meter data from Endesa, the largest electricity utility in Spain.
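
The two-branch idea in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the layer sizes, the random weights, the ReLU in the static branch, the concatenation of the two branch outputs, and the names `init_params` and `hybrid_forward` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(weights, bias, inputs):
    """Affine map: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def lstm_last_hidden(sequence, p):
    """Single-layer LSTM over a scalar sequence; returns the final hidden state."""
    n = len(p["b_i"])
    h, c = [0.0] * n, [0.0] * n
    for x_t in sequence:
        inp = [x_t] + h  # current reading joined with the previous hidden state
        i = [sigmoid(v) for v in dense(p["W_i"], p["b_i"], inp)]    # input gate
        f = [sigmoid(v) for v in dense(p["W_f"], p["b_f"], inp)]    # forget gate
        o = [sigmoid(v) for v in dense(p["W_o"], p["b_o"], inp)]    # output gate
        g = [math.tanh(v) for v in dense(p["W_g"], p["b_g"], inp)]  # candidate cell
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(hidden=4, static_dim=3, static_hidden=3, seed=0):
    """Untrained, randomly initialised weights (hypothetical sizes)."""
    rng = random.Random(seed)
    p = {}
    for gate in "ifog":
        p["W_" + gate] = random_matrix(hidden, 1 + hidden, rng)
        p["b_" + gate] = [0.0] * hidden
    p["W_s"] = random_matrix(static_hidden, static_dim, rng)
    p["b_s"] = [0.0] * static_hidden
    p["w_out"] = random_matrix(1, hidden + static_hidden, rng)
    p["b_out"] = [0.0]
    return p


def hybrid_forward(daily_kwh, static_features, p):
    """Score one customer: LSTM branch for the series, MLP branch for static data."""
    h_seq = lstm_last_hidden(daily_kwh, p)
    h_static = [max(0.0, v) for v in dense(p["W_s"], p["b_s"], static_features)]
    # Assumption: the two branch outputs are concatenated before the output unit.
    logit = dense(p["w_out"], p["b_out"], h_seq + h_static)[0]
    return sigmoid(logit)
```

Calling `hybrid_forward([1.0] * 30, [3.3, 0.5, 1.0], init_params())` returns a value between 0 and 1. Because the weights are untrained, that value carries no meaning until the parameters are fitted to labelled data.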

... So, the lack of theft samples affects the performance of classification models. The ML models become biased towards the majority class and ignore the minority class, which increases the FPR [7], [8]. In the literature, authors mostly use random undersampling (RUS) and random oversampling (ROS) techniques to handle the class imbalance problem. ...
... In [22], features are selected from existing features based on clustering evaluation criteria. In [8], the authors propose a new deep learning model, which has the ability to learn and extract latent features from EC data. In [14], the authors use the black hole algorithm to select the optimal number of features and compare the results with particle swarm optimization, differential evolution, genetic algorithm and harmony search. ...
... In [18], the authors compare the number of selected features against classification accuracy. In [8], [23], the authors measure the precision and recall scores of a long short-term memory (LSTM) classifier on test data. The hybrid of multilayer perceptron (MLP) and LSTM outperforms the single LSTM in terms of the PR curve because the MLP adds additional information to the network, such as meter location, contractual data and technical information. ...
Article
Full-text available
To address electricity theft detection in smart grids, this article introduces a hybrid deep learning model. The model tackles several issues: the class imbalance problem, the curse of dimensionality, and the low theft detection rate of existing models. It integrates the benefits of both GoogLeNet and the gated recurrent unit (GRU). The one-dimensional electricity consumption (EC) data is fed into the GRU to remember the periodic patterns of electricity consumption, whereas the GoogLeNet model is leveraged to extract latent features from the two-dimensional weekly stacked EC data. Furthermore, the time least square generative adversarial network (TLSGAN) is proposed to solve the class imbalance problem. The TLSGAN uses unsupervised and supervised loss functions to generate fake theft samples that closely resemble real-world theft samples. The standard generative adversarial network only updates the weights of those points that lie on the wrong side of the decision boundary, whereas TLSGAN also modifies the weights of those points that lie on the correct side of the decision boundary, which protects the model from the vanishing gradient problem. Moreover, dropout and batch normalization layers are utilized to enhance the model’s convergence speed and generalization ability. The proposed model is compared with different state-of-the-art classifiers, including multilayer perceptron (MLP), support vector machine, naive Bayes, logistic regression, MLP-long short-term memory network, and wide and deep convolutional neural network. It outperforms all of these classifiers, achieving a precision-recall area under the curve of 96% and a receiver operating characteristic area under the curve of 97%.
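
As a rough illustration of what "two-dimensional weekly stacked EC data" can mean, the sketch below reshapes a one-dimensional daily series into one row per week. The function name `stack_weeks` and the choice to drop an incomplete trailing week are assumptions; the abstract does not say how partial weeks are handled.

```python
def stack_weeks(daily, days_per_week=7):
    """Reshape a 1-D daily series into rows of `days_per_week` values.

    Assumption: days that do not fill a complete final week are dropped.
    """
    n_weeks = len(daily) // days_per_week
    return [daily[w * days_per_week:(w + 1) * days_per_week]
            for w in range(n_weeks)]


# 16 daily readings give two complete weeks; the last two days are dropped.
weekly = stack_weeks(list(range(16)))
```

In such a layout each row is a week and each column a weekday, so a 2D convolution can see both within-week and week-to-week patterns, while the recurrent branch still receives the original 1-D series.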
... Moreover, limited amount of labeled EC data is another underlying cause that decreases the detection accuracy. Furthermore, in deep learning models, the problem of internal covariate shift (ICS) adversely affects the stable learning of hidden layers [1], [14]. ICS occurs when the input distribution of a hidden neural layer is transferred to other layers. ...
... ICS occurs when the input distribution of a hidden neural layer is transferred to other layers. The severe lack of fraudulent electricity consumers in real-world scenarios creates a class imbalance problem, which is an important concern for efficient ETD [1], [5], [14], [15]. In addition, the noisy and high dimensional data leads to the curse of dimensionality issue, which is confronted by the researchers during ETD [14]. ...
Article
Full-text available
In this paper, we present a hybrid deep learning model based on a two-dimensional convolutional neural network (2D-CNN) and a bidirectional long short-term memory network (Bi-LSTM) to detect non-technical losses (NTLs) in smart meters. NTLs occur due to the fraudulent use of electricity. The global integration of smart meters has proven to be beneficial for the storage of historical electricity consumption (EC) data. The proposed methodology learns deep insights from the historical EC data and informs power utilities about the presence of NTLs. However, the effective detection of NTLs faces the problem of class imbalance, which arises because fraudulent electricity consumers are rare. To solve this issue, an evolutionary bidirectional Wasserstein generative adversarial network (Bi-WGAN) is employed. Bi-WGAN synthesizes the most plausible fraudulent EC samples by integrating an auxiliary encoder module. In addition, the inevitable curse of high-dimensional data reduces the generalization ability of classifiers. The proposed hybrid model efficiently handles the highly dynamic data by utilizing its potent feature-extraction capabilities. The one-dimensional daily EC data is passed to the Bi-LSTM model to capture the non-malicious changes in consumers’ profiles. Meanwhile, the 2D-CNN takes 2D weekly EC data as input and extracts potential features by applying different convolution and pooling operations. Extensive experiments are conducted on a realistic smart meter dataset to demonstrate the effectiveness of the proposed model. The results show that the proposed model outperforms the state-of-the-art models, achieving an area under the receiver operating characteristic curve of 0.97 and a precision-recall area under the curve of 0.98, which makes it suitable for real-world scenarios.
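
Several of the abstracts on this page report the area under the receiver operating characteristic curve. A minimal sketch of that metric follows, using its pairwise-ranking interpretation; the function name `roc_auc` and the toy labels and scores are illustrative and not taken from any of the cited works.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: label 1 marks a fraudulent consumer, label 0 a normal one.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly
```

The pairwise form is quadratic in the number of samples, so it only suits small illustrations; library implementations compute the same quantity from sorted scores.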
... However, the last layer in CNN is the same as the Feed Forward Network (FFN), which is likely to be over-trained, leading to poor generalization and misclassification [22]. In [23], authors use the concept of a Hybrid Neural Network (HNN), which is the combination of MLP and LSTM. MLP is used for non-sequential data while sequential data is fed to LSTM. ...
... It also needs four linear MLP layers per cell to run at each time-step of the sequence, consuming a large amount of memory to process data [24]. Moreover, the imbalanced classification problem is not handled by [15] and [23]. In [25], authors propose a multi-view stacking model for anomaly detection using EC. ...
... It is due to the diversity, complexity and irregularity of ET behavior. Therefore, motivated by [17] and [23], we integrate additional records for each user with the EC dataset to improve the ETD. In this paper, our main focus is to identify the irregular behavior of electricity thieves for ETD. ...
Article
In the energy sector, power utilities face financial losses due to Electricity Theft (ET), which happens when electricity is consumed without being billed. Several methods have been developed to detect ET automatically. Most of these methods only assess Electricity Consumption (EC) records. However, it is challenging to detect fraudulent consumers by only observing EC records because of diverse theft strategies (line tapping, meter tampering, etc.) and the irregularity of ET behavior. Furthermore, many methods have poor classification accuracy due to imbalanced data. This work proposes two novel methods to resolve the above-mentioned issues: Tomek Link Borderline Synthetic Minority Oversampling Technique with Support Vector Machine (TBSSVM) and Temporal Convolutional Network with Enhanced Multi-Layer Perceptron (TCN-EMLP). The former resamples the data by balancing the majority and minority class instances, whereas the latter classifies normal and fraudulent consumers. Moreover, deep learning models suffer from high variance in their final results due to the assignment of different weights. Therefore, an averaging ensemble strategy is applied in this work to reduce the high variance. Furthermore, the State Grid Corporation of China (SGCC) and Pakistan Residential Electricity Consumption (PRECON) datasets are used in this paper for performing the simulations. SGCC is an imbalanced and labeled dataset, while PRECON is an unlabeled dataset comprising normal consumers’ EC records (sequential) and auxiliary (non-sequential) data. Simulation results show that the proposed model outperforms the baselines, i.e., wide and deep convolutional neural network, extreme gradient boosting, long short-term memory with multi-layer perceptron, etc., in terms of ET detection.
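
One common reading of an "averaging ensemble" is to train several models that differ only in their initial weights and average their predicted probabilities. The sketch below shows that reading. It is an assumption about the general technique, not a reconstruction of the cited work, and the function name `average_ensemble` and the sample probabilities are made up.

```python
def average_ensemble(prob_lists):
    """Average per-sample fraud probabilities across several models.

    `prob_lists` holds one list per model, each giving a probability for
    every sample in the same order.
    """
    if not prob_lists:
        raise ValueError("need predictions from at least one model")
    n_models = len(prob_lists)
    return [sum(per_sample) / n_models for per_sample in zip(*prob_lists)]


# Three hypothetical runs of the same architecture with different seeds.
runs = [
    [0.20, 0.90, 0.55],
    [0.30, 0.80, 0.45],
    [0.10, 0.85, 0.65],
]
averaged = average_ensemble(runs)
```

Averaging smooths out run-to-run differences: the third sample above is scored on either side of 0.5 by individual runs, while the averaged score is a single stable value.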
... So, the lack of theft samples affects the performance of classification models. The ML models become biased towards the majority class and ignore the minority class, which increases the FPR [33,34,35,36,37,38,39,40,41,42,43,44]. In the literature, authors mostly use random undersampling (RUS) and random oversampling (ROS) techniques to handle the class imbalance problem. ...
... The authors remove highly correlated and overlapped features, which helps to improve DR and decrease FPR. In [34], existing NTL detection methods are based on data-driven approaches and specific hardware devices. The hardware-based methods are time-consuming, less efficient and more expensive. ...
... In [34], authors proposed a new DL model, which has the ability to learn and extract latent features from EC data. The proposed methodology integrates both sequential and non-sequential data. ...
Thesis
Full-text available
Data science is an emerging field with applications in multiple disciplines, such as healthcare, advanced image recognition, airline route planning, augmented reality, and targeted advertising. In this thesis, we have exploited its applications in smart grids and financial markets with three major contributions. In the first two contributions, machine learning (ML) and deep learning (DL) models are utilized to detect anomalies in electricity consumption (EC) data, while in the third contribution, upward and downward trends in the financial markets are predicted to benefit potential investors. Non-technical losses (NTLs) are one of the major causes of revenue losses for electric utilities. In the literature, various ML and DL approaches are employed to detect NTLs. The first solution introduces a hybrid DL model, which tackles the class imbalance problem, the curse of dimensionality and the low detection rate of existing models. The proposed model integrates the benefits of both GoogLeNet and the gated recurrent unit (GRU). The one-dimensional EC data is fed into the GRU to remember periodic patterns, whereas the GoogLeNet model is leveraged to extract latent features from the two-dimensional weekly stacked EC data. Furthermore, the time least square generative adversarial network (TLSGAN) is proposed to solve the class imbalance problem. The TLSGAN uses unsupervised and supervised loss functions to generate fake theft samples that closely resemble real-world theft samples. The standard generative adversarial network only updates the weights of those points that lie on the wrong side of the decision boundary, whereas TLSGAN also modifies the weights of those points that lie on the correct side of the decision boundary, which protects the model from the vanishing gradient problem. Moreover, dropout and batch normalization layers are utilized to enhance the model’s convergence speed and generalization ability.
The proposed model is compared with different state-of-the-art classifiers, including multilayer perceptron (MLP), support vector machine, naive Bayes, logistic regression, MLP-long short-term memory network, and wide and deep convolutional neural network. The second solution presents a framework that addresses the curse of dimensionality issue. In the literature, the existing studies are mostly concerned with tuning the hyperparameters of ML/DL methods for efficient detection of NTL, i.e., electricity theft detection. Some of them focus on the selection of prominent features from data to improve the performance of electricity theft detection. However, the curse of dimensionality affects the generalization ability of ML/DL classifiers and leads to computational, storage and overfitting problems. Therefore, to deal with the above-mentioned issues, this study proposes a system based on metaheuristic techniques (artificial bee colony and genetic algorithm) and a denoising autoencoder for electricity theft detection using big data in electric power systems. The former (metaheuristics) are used to select prominent features, while the latter is utilized to extract high-variance features from electricity consumption data. First, new features are synthesized from statistical and electrical parameters from the user’s consumption history. Then, the synthesized features are used as input to the metaheuristic techniques to find a subset of optimal features. Finally, the optimal features are fed as input to the denoising autoencoder to extract features with high variance. The ability of both techniques to select and extract features is measured using a support vector machine. The proposed system reduces the overfitting, storage and computational overhead of ML classifiers. Moreover, we perform several experiments to verify the effectiveness of our proposed system, and the results reveal that the proposed system has higher performance than its counterparts.
The third solution introduces a hybrid DL model for prediction of upward and downward trends in financial market data. The financial market exhibits complex and volatile behavior that is difficult to predict using conventional ML and statistical methods, as well as shallow neural networks. Its behavior depends on many factors such as political upheavals, investor sentiment, interest rates, government policies, natural disasters, etc. However, it is possible to predict upward and downward trends in financial market behavior using complex DL models. This paper therefore addresses the following limitations that adversely affect the performance of existing ML and DL models, i.e., the curse of dimensionality, the low accuracy of the standalone models, and the inability to learn complex patterns from high-frequency time series data. The denoising autoencoder is used to reduce the high dimensionality of the data, overcoming the problem of overfitting and reducing the training time of the ML and DL models. Moreover, a hybrid DL model, HRG, is proposed based on a ResNet module and gated recurrent units. The former is used to extract latent or abstract patterns that are not visible to the human eye, while the latter retrieves temporal patterns from the financial market dataset. Thus, HRG integrates the advantages of both models. It is evaluated on real-world financial market datasets obtained from IBM, APPL, BA and WMT. Also, various performance indicators such as F1-score, accuracy, precision, recall, and receiver operating characteristic-area under the curve (ROC-AUC) are used to check the performance of the proposed and benchmark models. HRG achieves ROC-AUC values of 0.95, 0.90, 0.82 and 0.80 on the APPL, IBM, BA and WMT datasets respectively, which are higher than the ROC-AUC values of all implemented ML and DL models.
... So, the lack of theft samples affects the performance of classification models. The ML models become biased towards the majority class and ignore the minority class, which increases the FPR [33][34][35][36][37][38][39][40][41][42][43][44]. In the literature, authors mostly use random undersampling (RUS) and random oversampling (ROS) techniques to handle the class imbalance problem. ...
... The authors remove highly correlated and overlapped features, which helps to improve DR and decrease FPR. In [34], existing NTL detection methods are based on data-driven approaches and specific hardware devices. The hardware-based methods are time-consuming, less efficient and more expensive. ...
... In [34], authors proposed a new DL model, which has the ability to learn and extract latent features from EC data. The proposed methodology integrates both sequential and non-sequential data. ...
Research Proposal
Full-text available
In this synopsis, the first solution introduces a hybrid deep learning model, which tackles the class imbalance problem, the curse of dimensionality and the low detection rate of existing models. The proposed model integrates the benefits of both GoogLeNet and the gated recurrent unit (GRU). The one-dimensional EC data is fed into the GRU to remember periodic patterns, whereas the GoogLeNet model is leveraged to extract latent features from the two-dimensional weekly stacked EC data. Furthermore, the time least square generative adversarial network is proposed to solve the class imbalance problem. The second solution presents a framework that addresses the curse of dimensionality issue. In the literature, the existing studies are mostly concerned with tuning the hyperparameters of ML/DL methods for efficient detection of NTL. Some of them focus on the selection of prominent features from data to improve the performance of electricity theft detection. However, the curse of dimensionality affects the generalization ability of ML/DL classifiers and leads to computational, storage and overfitting problems. Therefore, to deal with the above-mentioned issues, this study proposes a system based on metaheuristic techniques (artificial bee colony and genetic algorithm) and a denoising autoencoder for electricity theft detection using big data in electric power systems. The third solution introduces a hybrid deep learning model for prediction of upward and downward trends in financial market data. The financial market exhibits complex and volatile behavior that is difficult to predict using conventional machine learning (ML) and statistical methods, as well as shallow neural networks. Its behavior depends on many factors such as political upheavals, investor sentiment, interest rates, government policies, natural disasters, etc. However, it is possible to predict upward and downward trends in financial market behavior using complex DL models.
In this synopsis, we have proposed three solutions to different issues in smart grids and financial markets. The proposed solutions will be validated in the thesis work using real-world datasets.
... It includes AI methods. AI methods are categorized into machine learning [39,40,41,42,43,44] and DL algorithms [14,25,26,45]. Machine learning methods are impressive, though most of the methods involve manual feature extraction due to the limited ability to manage a large set of features. ...
... Therefore, these models are widely used in recent studies [49,50,51]. In [26,45], authors have proposed HNNs for ET detection, such as LSTM-MLP and W&D-CNN. The individual components of both models are concatenated and trained jointly, and classification is performed through the FFN layer, i.e., the fully connected layer. This leads to generalization error and overfitting, so that the model is ultimately unable to distinguish between normal and fraudulent consumers. ...
... In [26,45], authors have proposed HNNs for ET detection, such as LSTM-MLP and W&D-CNN. The individual components of both models are concatenated and trained jointly, and classification is performed through the FFN layer, i.e., the fully connected layer. This leads to generalization error and overfitting, so that the model is ultimately unable to distinguish between normal and fraudulent consumers. ...
Thesis
Electricity theft (ET) is a major problem in developing countries. It affects the economy by causing revenue loss. It also decreases the reliability and stability of electricity utilities. Due to these losses, the quality of supply is affected and tariffs are imposed on legitimate consumers. ET is an essential part of Non-technical loss (NTL) and it is challenging for electricity utilities to find the responsible people. Several methodologies have been developed to identify ET behaviors automatically. However, these approaches mainly assess records of consumers' electricity usage and may prove inadequate in detecting ET due to the variety of theft attacks and the irregularity of consumers' behavior. Moreover, some important challenges need to be addressed. (i) A number of normal consumers are wrongly identified as fraudulent. This leads to a high False-positive rate (FPR). After the detection of theft, an on-site inspection is needed to validate whether the detected person is fraudulent or not, and it is costly. (ii) The imbalanced nature of datasets negatively affects the model's performance. (iii) The problem of overfitting and generalization error is often faced in deep learning models, which then predict unseen data inaccurately. So, the motivation for this work is to detect illegal consumers accurately. We have proposed four Artificial intelligence (AI) models in this thesis. In system model 1, we have proposed Enhanced artificial neural network blocks with skip connections (EANNBS). It makes training easier and reduces overfitting, FPR and generalization error, as well as execution time. Temporal convolutional network with enhanced multi-layer perceptron (TCN-EMLP) is proposed in system model 2. It analyzes the sequential data based on daily electricity-usage records obtained from smart meters. At the same time, EMLP integrates the non-sequential auxiliary data, such as data related to electrical connection type, property area, electrical appliances usage, etc.
System model 3 is based on a Residual network (RN) that is used to automate feature extraction, while three tree-based classifiers, namely Decision tree (DT), Random forest (RF) and Adaptive boosting (AdaBoost), are trained on the obtained features for classification. A hyperparameter tuning toolkit, named the Hyperactive optimization toolkit, is presented in this system model. A Bayesian optimizer is used in this toolkit, which aims to simplify the tuning process of DT, RF and AdaBoost. In system model 4, the input is forwarded to three different and well-known Machine learning (ML) techniques, i.e., Support vector machine (SVM). At this stage, a meta-heuristic algorithm named Simulated annealing (SA) is employed to acquire optimal values for the ML models' hyperparameters. Finally, the ML models' outputs are used as features for meta-classifiers to achieve the final classification with Light Gradient boosting machine (LGBM) and Multi-layer perceptron (MLP). Furthermore, the Pakistan residential electricity consumption (PRECON), State grid corporation of China (SGCC) and Commission for energy regulation (CER) datasets are used in this thesis. The SGCC dataset contains 9% fraudulent consumers, far fewer than non-fraudulent consumers, which makes the data imbalanced. Furthermore, many classification techniques have poor predictive accuracy for the positive class. These techniques mainly focus on minimizing the error rate while ignoring the minority class. Many re-sampling techniques are used in the literature to adjust the class ratio; however, these techniques sometimes remove important information that is necessary to train the model and cause overfitting. By using six previous theft attacks, we generate theft cases to mimic real-world theft attacks in the original data.
We have proposed combinations of oversampling and under-sampling techniques, namely Near miss borderline synthetic minority oversampling technique (NMB-SMOTE), Tomek link borderline synthetic minority oversampling technique with support vector machine (TBSSVM) and Synthetic minority oversampling technique with near miss (SMOTE-NM), to handle the imbalanced classification problem. We have conducted comprehensive experiments using the SGCC, CER and PRECON datasets. The performance of the suggested models is validated using different performance metrics that are derived from the Confusion matrix (CM).
... Among the existing solutions, the machine-learning-based solutions achieve the state-of-the-art performance [11]. These solutions either consider the consumption metering system [8], [9], [12]-[15] or the FIT system [5], [6] and none of the existing solutions consider the net-metering system. ...
... Some of the existing solutions employ shallow detectors [8], [9], [12], while other solutions employ deep-learning-based detectors [13]-[15]. ...
... The detector alarms a false-reading attack if the error is above a predefined threshold. However, the above works employ shallow detectors and it has been proven in [13]-[15] that the deep-learning-based detectors achieve a superior performance. ...
... A reliable evaluation measure is very necessary for assessing the performance of models in the case of imbalanced data classification problem. To address the NTL problems in Spain, the authors in [28] propose a hybrid model, which consists of LSTM and Multi Layer Perceptron (MLP). In the model, two types of data are utilized to detect the electricity thieves, i.e., sequential data and nonsequential data. ...
... This process increases the convergence speed of PLSTM to a greater extent. The mathematical representations of PLSTM's memory gates are given below [28]. ...
... In the hybrid MLP-LSTM model [28], MLP is used for feature extraction. Whereas, LSTM classifies the electricity thieves by capturing the temporal correlations. ...
Article
Full-text available
The problem of electricity theft is exponentially increasing around the globe, which is harmful to the power sectors and consumers. The recent development in the advanced metering infrastructure brings opportunities for experts to identify the electricity thieves in the smart grid community. Many advancements are made in the area of the smart grid for Electricity Theft Detection (ETD), where the data collected from the smart meters is utilized. However, the problems of imbalanced distribution of data and inaccurate classification are not efficiently addressed. Therefore, to overcome the problems, machine learning and deep learning models are proposed for ETD. Initially, to refine the smart meters' data, pre-processing methods are used. Then, the class imbalance problem is solved through Synthetic Minority Oversampling Tomek Links (ST-Links). It solves the classifier's biasness problem, which occurs due to imbalanced data. It achieves the benefits of both data oversampling and undersampling. Afterwards, an AlexNet and peephole long short-term memory network based feature extractor with an attention layer is developed to extract the relevant features from electricity consumption profiles that are most suitable to classify honest and theft consumers. After the extraction of suitable features, the classification of consumers is performed by an echo state neural network. Moreover, an evolutionary grey wolf optimization technique is utilized to tune the hyper-parameters of the proposed model. A paired t-test is also applied on the final classification results for a reliable assessment of the proposed model. The simulations are conducted on a realistic smart meters' dataset of China to check the performance of the proposed model. In addition, different benchmark models are implemented to perform a comparative analysis. 
Different meaningful performance metrics are considered for the fair evaluation of the proposed model: Matthews Correlation Coefficient (MCC), F1-score, Area Under Curve (AUC), precision and recall. The simulation results show that the proposed model obtains accuracy, recall, F1-score, AUC, PR-AUC, precision and MCC scores of 96.3%, 92.1%, 92.0%, 96.4%, 97.3%, 90.0% and 84.0%, respectively. It is worth mentioning that the application of the proposed solution is quite general. Therefore, it can be used by power companies to reduce power losses in the energy sector.
Index terms: AlexNet, data pre-processing, electricity theft detection, echo state neural network, imbalanced data, peephole LSTM, supervised learning, smart meter.
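
For reference, the threshold-based metrics named in this abstract (precision, recall, F1-score and MCC) can all be derived from the four confusion-matrix counts. The sketch below uses their standard definitions. The function name `classification_metrics`, the convention of returning 0.0 when a denominator is zero, and the toy labels are illustrative choices, not taken from the cited paper.

```python
import math


def classification_metrics(y_true, y_pred):
    """Precision, recall, F1 and MCC for binary labels (1 = theft, 0 = honest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Convention used here: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Toy example: 2 true positives, 1 false positive, 2 true negatives, 1 false negative.
m = classification_metrics([1, 1, 0, 0, 0, 1], [1, 0, 0, 0, 1, 1])
```

Unlike accuracy, MCC uses all four counts, which is why it is often preferred when theft cases are a small minority of the data.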
... These solutions are used to analyze and detect irregular patterns of consumers' electricity consumption. Deep learning based methods for the detection of electrical theft are used in [4], [9]. The authors present a study of various deep learning models, including Long-Short-Term Memory (LSTM), Multi-Layer Perceptron (MLP), Convolutional Neural Networks (CNN), Gated Recurrent Unit (GRU), etc. ...
... Despite recent advancements in deep learning and its growing success, relatively little work has been done in the literature on the class imbalance issue. The authors in [4] used LSTM-MLP model for data classification. However, imbalanced dataset issue is not addressed, which leads to poor Area under the Precision-Recall Curve (PR-AUC) score of 54.4%. ...
... Similarly, recent approaches such as CNN and MLP cannot work with sequential data. In [4], the authors use MLP combined with LSTM models for NTL detection. However, the imbalanced dataset issue is not resolved. ...
Conference Paper
Full-text available
In this paper, a data-driven solution is proposed to detect Non-Technical Losses (NTLs) in smart grids. In the real world, theft samples are far fewer than benign samples, which leads to a data imbalance issue. To resolve the issue, diverse theft attacks are applied to the benign samples to generate synthetic theft samples for data balancing and to mimic real-world theft patterns. Furthermore, several non-malicious factors influence the users' energy usage patterns, such as consumers' behavior during weekends, seasonal change and family structure. These factors adversely affect the model's performance, resulting in data misclassification. So, non-malicious factors along with smart meters' data need to be considered to enhance the theft detection accuracy. Keeping this in view, a hybrid Multi-Layer Perceptron and Gated Recurrent Unit (MLP-GRU) based Deep Neural Network (DNN) is proposed to detect electricity theft. The MLP model takes auxiliary data such as geographical information as input, while the smart meter dataset is provided as input to the GRU model. Due to the improved generalization capability of MLP with reduced overfitting and the effective gated configuration of multi-layered GRU, the proposed model proves to be an ideal solution in terms of prediction accuracy and computational time. Furthermore, the proposed model is compared with the existing MLP-LSTM model through simulations. The results show that MLP-GRU achieves scores of 0.87 and 0.89 for Area under the Receiver Operating Characteristic Curve and Area under the Precision-Recall Curve (PR-AUC), respectively, as compared to 0.72 and 0.47 for MLP-LSTM.
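
The idea of applying theft attacks to benign samples can be sketched as follows. The two manipulations shown (constant under-reporting and a zeroed window) are generic illustrations chosen for this example; the abstract does not list the attacks actually used, and every function name and parameter range here is hypothetical.

```python
import random


def scale_attack(readings, factor):
    """Under-report every reading by a constant factor in (0, 1)."""
    return [r * factor for r in readings]


def zero_window_attack(readings, start, length):
    """Report zero consumption for a contiguous window of days."""
    return [0.0 if start <= i < start + length else r
            for i, r in enumerate(readings)]


def synthesize_theft(benign_profiles, rng):
    """Apply one randomly chosen attack to each benign profile.

    Returns (profile, label) pairs with label 1 marking a synthetic theft sample.
    """
    samples = []
    for profile in benign_profiles:
        if rng.random() < 0.5:
            fake = scale_attack(profile, rng.uniform(0.1, 0.8))
        else:
            start = rng.randrange(len(profile))
            fake = zero_window_attack(profile, start, max(1, len(profile) // 4))
        samples.append((fake, 1))
    return samples


benign = [[5.0, 6.0, 4.5, 7.0], [2.0, 2.5, 3.0, 2.2]]
theft = synthesize_theft(benign, random.Random(42))
```

Adding `theft` to the benign set with label 0 gives a balanced training set in this toy case. In practice, the attack types and their parameters would need to reflect the theft patterns the utility actually observes.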
... Today, it is estimated that electricity theft costs the power industry as much as $96 billion/year globally. In developing countries, this proportion is much higher, with an estimated cost of $60 billion/year [2]. This huge loss drives up prices for end-users, increases the need for costly government subsidies, and cripples utility companies around the globe. ...
... Since these algorithms utilize already fabricated data, the computational cost is moderate with no requirement for new hardware devices and prior knowledge about network topology. However, there are several shortcomings in existing classification-based schemes, such as the high false-positive rate (FPR), time-consuming engagement of experts, and low adaption to new types of electricity fraud [2]. ...
... This procedure is repeated until the optimal values of all hyper-parameters given in Tables I and II are obtained. It is pertinent to mention that important hyper-parameters as well as their initial values for ML models (used later for comparison in Section V) and iANN are borrowed from Ref. [2]. The four main steps of the SA algorithm are as follows: i. Start with random initialization of population. ...
... However, all of them have a low detection rate and poor generalization results due to inefficient feature engineering and the limited availability of labeled electricity data. Moreover, another very common issue in ETD is class imbalance [5], [8], [9], [10], because in real-world scenarios theft samples are rarely available compared to honest samples. Furthermore, the curse of dimensionality [8] is also a major problem faced by researchers. ...
... Moreover, another very common issue in ETD is class imbalance [5], [8], [9], [10], because in real-world scenarios theft samples are rarely available compared to honest samples. Furthermore, the curse of dimensionality [8] is also a major problem faced by researchers. It degrades the model's accuracy and increases the computational time. ...
... In the literature, many traditional schemes focus on handcrafted feature engineering for NTL detection [8]. However, no mathematical mechanisms are found in the existing literature for identifying the shunt and double-tapping attacks. ...
Conference Paper
Full-text available
In this paper, a novel hybrid deep learning approach is proposed to detect the non-technical losses (NTLs) that occur in smart grids due to illegal use of electricity, faulty meters, meter malfunctioning, unpaid bills, etc. The proposed approach is based on data-driven methods due to the sufficient availability of smart meters' data. A bi-directional Wasserstein generative adversarial network (Bi-WGAN) is utilized to generate synthetic theft samples for solving the class imbalance problem. The Bi-WGAN efficiently synthesizes the minority class theft samples by leveraging the capabilities of an additional encoder module. Moreover, the curse of dimensionality degrades the model's generalization ability. Therefore, the high dimensionality issue is solved using a two-dimensional convolutional neural network (2D-CNN) and a bidirectional long short-term memory network (Bi-LSTM). The 2D-CNN is applied on 2D weekly data to extract the most prominent features. In the 2D-CNN, the convolutional and pooling layers extract only the potential features and discard the redundant features to reduce the curse of dimensionality. This process increases the convergence speed of the model and reduces the computational overhead. Meanwhile, a Bi-LSTM is used to detect the non-malicious changes in consumers' load profiles using its strong memorization capabilities. Finally, the outcomes of both models are concatenated into a single feature map and a sigmoid activation function is applied for final NTL detection. The simulation results demonstrate that the proposed model outperforms the existing scheme in terms of Matthews correlation coefficient (MCC), precision-recall (PR) and area under the curve (AUC). It achieves 3%, 5% and 4% greater MCC, PR and AUC scores, respectively, as compared to the existing model.
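The abstract above mentions applying a 2D-CNN to "2D weekly data". A minimal sketch of one way such an input could be formed is shown below, assuming a daily consumption series that is cut into rows of seven days. The function name `to_weekly_matrix`, the zero padding of an incomplete final week, and the sample readings are all illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def to_weekly_matrix(daily: Sequence[float], pad_value: float = 0.0) -> List[List[float]]:
    """Reshape a 1D daily consumption series into rows of 7 days (one row per week).

    The last row is padded with `pad_value` if the series length is not a
    multiple of 7. Padding with zeros is an assumption made for this sketch.
    """
    rows: List[List[float]] = []
    for start in range(0, len(daily), 7):
        week = list(daily[start:start + 7])
        week.extend([pad_value] * (7 - len(week)))  # no-op for complete weeks
        rows.append(week)
    return rows


# Hypothetical 16 days of readings (kWh): 1.0, 2.0, ..., 16.0
readings = [float(d) for d in range(1, 17)]
weekly = to_weekly_matrix(readings)
for row in weekly:
    print(row)
```

Each row of the resulting matrix lines up the same weekday in one column, which is what lets a 2D convolution see week-over-week patterns that a flat series hides.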
... The economic loss arises due to the electricity theft that amounts around 100 million Canadian dollars per year according to the British Columbia power and hydro authority [5]. In addition, the revenue loss incurred due to the NTL throughout the world is around 96 billion USD yearly [6]. Hence, it is very crucial to have an efficient and effective ETD approach in the smart grids (SGs). ...
... Several deep learning (DL) and machine learning (ML) based classification models are developed in [5], [6], [11]-[16] for ETD and they use energy consumption data stored in smart meters (SMs). Therefore, their costs are reasonable. ...
... The research articles that deal with the hyperparameters' tuning of the ML techniques are given in initial category. In [4]-[6], [11]-[14], those ML and DL techniques are under consideration that deal with efficient tuning of hyperparameters. In [4], a stacked sparse denoising autoencoder (SSDAE) is proposed to extract the most effective features to deal with the FPR and generalization issues. ...
Article
Full-text available
Electricity theft (ET) is one of the challenging problems in smart grids. Power utilities around the globe face huge economic losses due to ET. Traditional electricity theft detection (ETD) models confront several challenges, such as the highly imbalanced distribution of electricity consumption data, the curse of dimensionality and the inevitable effects of non-malicious factors. To cope with these concerns, this paper presents a novel ETD strategy for smart grids based on theft attacks, long short-term memory (LSTM) and gated recurrent unit (GRU), called TLGRU. It includes three subunits: (1) synthetic theft attacks based data balancing, (2) LSTM based feature extraction, and (3) GRU based theft classification. The GRU is used for drift identification: it stores and extracts the long-term dependency in the power consumption data, which is beneficial for identifying drift. In this way, a minimum false positive rate (FPR) is obtained. Moreover, dropout regularization and the Adam optimizer are added to the GRU to tackle overfitting and to keep the model from becoming trapped in local minima, respectively. The proposed TLGRU model uses the realistic EC profiles of the Chinese power utility, the State Grid Corporation of China, for analysis and to solve the ETD problem. The simulation results show that the proposed model obtains 1% FPR, 97.96% precision, 91.56% accuracy, and 91.68% area under curve for ETD. The proposed model outperforms the existing models in terms of ETD.
... The revenue loss from electricity theft is around 100 million Canadian dollars yearly, as reported by the Canadian electric power utility British Columbia Hydro and Power Authority [2]. Moreover, the monetary loss due to NTLs is reported to be $96 billion per annum globally [3]. Hence, there is an urgent need for an efficient approach to electricity theft detection (ETD). ...
... Some machine learning (ML) and deep learning (DL) techniques are proposed in [2]-[3] and [8]-[13] to use the electricity consumption (EC) data recorded by smart meters (SMs) for ETD in smart grids (SGs). However, there are some problems in existing methods which affect the model's true positive rate (TPR) and false positive rate (FPR) negatively. ...
... The first category discusses the research papers that focus on hyperparameter tuning. In [1]-[3] and [8]-[11], the authors focused on tuning the hyperparameters of ML and DL models. In [1], the authors targeted the high dimensionality issue and extracted the most effective features, so as to solve the generalization issue and achieve better performance with respect to FPR. ...
Conference Paper
Full-text available
In this paper, we present a novel approach for electricity theft detection (ETD). It comprises two modules: (1) implementations of six theft attacks for dealing with the data imbalance issue and (2) a gated recurrent unit (GRU) to tackle the model's poor performance in terms of high false positive rate (FPR) due to some non-malicious reasons (i.e., drift). In order to balance the data, the synthetic theft attacks are applied on the State Grid Corporation of China (SGCC) dataset. Subsequently, once the data is balanced, we pass the data to the GRU for ETD. As the GRU model stores and memorizes a long sequence of the balanced data, it helps to detect the real thieves instead of anomalies due to drift. The proposed methodology uses electricity consumption (EC) data from the SGCC dataset for solving the ETD problem. The performance of the adopted GRU with respect to ETD accuracy is compared with the existing support vector machine (SVM) using various performance metrics. Simulation results show that SVM achieves 64.33% accuracy, whereas the adopted GRU achieves 82.65% accuracy for efficient ETD.
... When 80% of the data was used in training, the Precision Recall (PR-AUC) reached 54.5%. [25] Random Forest (RF) and CNN ...
... In the SGCC dataset, the number of typical energy consumers outnumbers the number of thieves. This data mismatch is a serious problem in ETD that must be addressed; otherwise, the classifier would be biased towards the majority class, resulting in poor performance [25]. ...
Article
Full-text available
One of the major concerns for the utilities in the Smart Grid (SG) is electricity theft. With the implementation of smart meters, the frequency of energy usage and data collection from smart homes has increased, which enables advanced data analysis that was not previously possible. For this purpose, we have taken historical data of energy thieves and normal users. To avoid imbalanced observations and biased estimates, we applied the interpolation method. Furthermore, the data imbalance issue is resolved in this paper by the NearMiss undersampling technique, which makes the data suitable for further processing. By proposing an improved version of Zeiler and Fergus Net (ZFNet) as a feature extraction approach, we were able to reduce the model's time complexity. To minimize overfitting, increase the training accuracy and reduce the training loss, we have proposed an enhanced method that merges the Adaptive Boosting (AdaBoost) classifier with the Coronavirus Herd Immunity Optimizer (CHIO) and the Forensic based Investigation Optimizer (FBIO). In terms of low computational complexity, minimized overfitting on a large quantity of data, reduced training time and training loss and increased training accuracy, our model outperforms the benchmark scheme. Our proposed algorithms, Ada-CHIO and Ada-FBIO, have low Mean Absolute Percentage Error (MAPE) values of 6.8% and 9.5%, respectively. Furthermore, due to the stability of our model, Ada-CHIO and Ada-FBIO achieve accuracies of 93% and 90%, respectively. Statistical analysis supports the hypothesis for the proposed technique against benchmark algorithms, which also indicates the superiority of our proposed techniques.
... The cost of in-situ inspections is high and it does not represent a good return for utilities [9,17,44,48,52,55,57,72,105,115,126,127]. The number of inspectors required for these operations is very high [22,37,71,73,115]. ...
... Another barrier to the detection of NTL is the inefficiency in verifying the variability of consumption patterns of different groups of consumers, making it challenging to analyze historical customer data [9,115,127]. Due to this variation, there is a possibility that methods are ineffective because they only detect consumers with null consumption, which are only the most apparent cases of non-technical losses [22,37,79]. ...
Article
Non-technical losses refer to all electricity consumption not billed and represent a significant problem that has consequences to all sectors and a substantial negative impact on some geographical areas. These losses are complex and are attributed to several factors, leading researchers, concessionaires, and regulatory agents to seek successful solutions to reduce their effects. Thus, this article aims to identify the worldwide panorama on non-technical losses, presenting their impacts and the leading strategies and policies to mitigate them, helping the public and private sectors to understand the theme to outline effective solutions to combat this problem. A systematic review of the literature has been performed using the review protocol Preferred Reporting Items for Systematic Reviews and Meta-Analyses, which resulted in 121 journal articles published between 2000 and 2020. The results comprise a complete definition of non-technical loss, its consequences for countries, distribution utilities, and society, the barriers and strategies for their identification, and the principal policies and regulations in countries of all levels of Gross National Income per capita. The main contribution of this article is to demonstrate the impact of non-technical losses to society and the economy, and the research and investigation directions so that frauds in the electricity sector are mitigated.
... The method has been developed and tested based on real smart meter data from Endesa's industry and customers. Buzau et al. in [18] use a hybrid deep neural network combining a long short-term memory network and a multi-layer perceptron to detect anomalies and frauds in electricity meters by analyzing daily energy consumption and geographic information, and tested it with data from a Spanish power company. Ghori et al. in [19] evaluated 9 types and 15 existing machine learning classifiers, and analyzed and compared the detection performance of various classifiers using the Pakistan Power Supply Company data set. ...
... In Buzau et al. [17], the XGBoost classifier is used for non-technical loss detection. Buzau et al. in [18] use a hybrid deep neural network combining a long short-term memory network and a multilayer perceptron, which has higher accuracy than other classifiers. In Ghori et al. [19], the detection performance of 15 classifiers was tested using the data set of Pakistan Electric Power Company. ...
Article
Full-text available
In the process of power transmission and distribution, non-technical losses are usually caused by users' abnormal power consumption behavior. This behavior not only affects the dispatch and operation of the distribution network and brings hidden dangers to the security of the power grid, but also raises the operating costs of power companies and disrupts the operation of the power market. Aiming at users' abnormal electricity consumption behavior, this paper proposes a model based on particle swarm optimization and long short-term memory with the attention mechanism (PSO-Attention-LSTM). Firstly, according to actual electricity theft behavior, six typical electricity theft modes are summarized, and 4 composite modes are obtained by combining them, so as to comprehensively test the detection performance of the model for various electricity theft behaviors. Secondly, a detection model based on PSO-Attention-LSTM is proposed, and the model is built using the TensorFlow framework. The model uses the attention mechanism to give different weights to the hidden states of the LSTM, which reduces the loss of historical information, strengthens important information and suppresses useless information. PSO is used to solve the difficult problem of model parameter selection and to optimize the hyperparameters to improve the model performance. Finally, the data set of the University of Massachusetts was used for simulation and compared with convolutional neural network-long short-term memory (CNN-LSTM), attention mechanism-based long short-term memory (Attention-LSTM), LSTM, gated recurrent unit (GRU), support vector regression (SVR), random forest (RF) and linear regression (LR) to verify the effectiveness and accuracy of the method used in this article. In this paper, Matlab software is used to analyze and visualize the detection result data.
... The authors of [5] introduce Gradient Boosted Trees (GBTs) to detect the NTLs. The authors in [6] present a Hybrid Deep Neural Network (HDNN), which is based on the LSTM and Multi-Layer Perceptron (MLP). In the proposed model, auxiliary data is also considered for better detection of electricity thieves. ...
Conference Paper
Full-text available
In this paper, a hybrid deep learning model is presented to detect electricity theft in power grids, which is one source of Non-Technical Losses (NTLs). NTLs emerge due to meter malfunctioning, meter bypassing, meter tampering, etc. The main focus of this study is to detect the NTLs. However, the detection of NTLs faces three major challenges: severe class imbalance, overfitting due to highly dynamic data, and poor generalization due to the usage of synthetic data. To overcome these problems, a hybrid deep neural network is designed, which is the combination of AlexNet, Adaptive Boosting (AdaBoost) and Artificial Bee Colony (ABC), termed AlexNet-AdaBoost-ABC. AlexNet is exploited for feature extraction, while AdaBoost and ABC are used for classification and parameter tuning, respectively. Moreover, the class imbalance issue is resolved using the Near Miss (NM) undersampling technique. NM reduces the majority class samples and equalizes the proportion of the majority and minority classes. The model is evaluated on the real inspected dataset released by the State Grid Corporation of China (SGCC). The performance of the proposed model is validated through the F1-score, precision, recall, Area Under Curve (AUC) and Matthews Correlation Coefficient (MCC). The simulation results show that the proposed model outperforms the existing techniques, obtaining 3%, 2% and 4% higher values of F1-score, AUC and MCC, respectively.
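The Near Miss undersampling step mentioned above can be illustrated with a small sketch. Near Miss has several variants and the abstract does not say which one is used, so the code below assumes the NearMiss-1 rule: keep the majority samples whose mean distance to their k nearest minority samples is smallest, until both classes have the same size. The function name `near_miss_1` and the toy 2D points are hypothetical.

```python
import math
from typing import List, Sequence


def near_miss_1(
    majority: Sequence[Sequence[float]],
    minority: Sequence[Sequence[float]],
    k: int = 3,
) -> List[List[float]]:
    """Undersample `majority` down to len(minority) samples (NearMiss-1 rule).

    Each majority sample is scored by the mean Euclidean distance to its k
    nearest minority samples; the lowest-scoring samples are kept.
    """
    n_keep = len(minority)
    scored = []
    for idx, sample in enumerate(majority):
        dists = sorted(math.dist(sample, m) for m in minority)
        nearest = dists[:k]
        scored.append((sum(nearest) / len(nearest), idx))
    scored.sort()  # smallest mean distance first
    keep = sorted(idx for _, idx in scored[:n_keep])  # preserve original order
    return [list(majority[i]) for i in keep]


# Toy data: 2 theft (minority) samples and 5 honest (majority) samples.
theft = [[0.0, 0.0], [1.0, 0.0]]
honest = [[0.0, 1.0], [5.0, 5.0], [3.0, 3.0], [9.0, 9.0], [0.5, 0.5]]
balanced_honest = near_miss_1(honest, theft, k=2)
print(balanced_honest)
```

Keeping the majority samples closest to the minority class concentrates training on the decision boundary, at the cost of discarding information about typical honest consumers far from it.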
... In such cases, EnsembleNTLDetect provides optimal performance with minimal training time. Moreover, state-of-the-art NTL detection methods in [23], [48], [50] have not been verified on SGCC dataset, which partly explains their subpar performance on this dataset and lack of generalization ability. IV. ...
Preprint
Full-text available
Artificial intelligence-based techniques applied to the electricity consumption data generated from the smart grid prove to be an effective solution in reducing Non-Technical Losses (NTLs), thereby ensuring the safety, reliability, and security of smart energy systems. However, imbalanced data, consecutive missing values, large training times, and complex architectures hinder the real-time application of electricity theft detection models. In this paper, we present EnsembleNTLDetect, a robust and scalable electricity theft detection framework that employs a set of efficient data pre-processing techniques and machine learning models to accurately detect electricity theft by analysing consumers' electricity consumption patterns. This framework utilises an enhanced Dynamic Time Warping Based Imputation (eDTWBI) algorithm to impute missing values in the time series data and leverages the Near-miss undersampling technique to generate balanced data. Further, a stacked autoencoder is introduced for dimensionality reduction and to improve training efficiency. A Conditional Generative Adversarial Network (CTGAN) is used to augment the dataset to ensure robust training and a soft voting ensemble classifier is designed to detect the consumers with aberrant consumption patterns. Furthermore, experiments were conducted on the real-time electricity consumption data provided by the State Grid Corporation of China (SGCC) to validate the reliability and efficiency of EnsembleNTLDetect over the state-of-the-art electricity theft detection models in terms of various quality metrics.
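As a rough illustration of the soft voting idea named in the abstract, the sketch below averages the theft probabilities produced by several classifiers and thresholds the mean. The base models, the equal weighting, the 0.5 threshold and the probability values are assumptions made for the example; the abstract does not specify them.

```python
from typing import List, Sequence


def soft_vote(probabilities: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Combine per-model theft probabilities by unweighted averaging.

    probabilities[m][i] is the probability of theft that model m assigns to
    consumer i. Returns 1 (theft) when the mean probability reaches
    `threshold`, otherwise 0 (honest).
    """
    n_models = len(probabilities)
    n_samples = len(probabilities[0])
    labels = []
    for i in range(n_samples):
        mean_p = sum(model[i] for model in probabilities) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels


# Hypothetical outputs of three classifiers for three consumers.
model_outputs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.3],
    [0.7, 0.4, 0.5],
]
labels = soft_vote(model_outputs)
print(labels)
```

Note that the third consumer is labelled honest even though one model voted above 0.5: soft voting weighs how confident each model is, whereas hard (majority) voting would only count the votes.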
... Buzau et al. [10] propose a hybrid model, which is a combination of long short-term memory (LSTM) and multilayer perceptron (MLP). The former is validated and tested on sequential data. ...
Conference Paper
Full-text available
In this research article, we tackle the following limitations: high misclassification rate, low detection rate, the class imbalance problem, and the unavailability of malicious or theft samples. The class imbalance problem is a severe issue in electricity theft detection that affects the performance of supervised learning methods. We exploit the adaptive synthetic minority oversampling technique to tackle this problem. Moreover, theft samples are created from benign samples, and we argue that the goal of theft is to report less than the actual electricity consumption. Different machine learning and deep learning methods, including the recently developed light and extreme gradient boosting (XGBoost), are trained and evaluated on a realistic electricity consumption dataset that is provided by an electric utility in Pakistan. The consumers in the dataset belong to different demographics and different social and financial backgrounds. A number of classifiers are trained on the acquired data; however, long short-term memory (LSTM) and XGBoost attain high performance and outperform all other classifiers. XGBoost achieves a 0.981 detection rate and a 0.015 misclassification rate, whereas LSTM attains 0.976 and 0.033 detection and misclassification rates, respectively. Moreover, the performance of all implemented classifiers is evaluated through precision, recall, F1-score, etc.
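The idea that theft amounts to reporting less than the actual consumption can be sketched as follows. This is only one simple way to derive a synthetic theft sample from a benign one, assuming a single random scaling factor applied to the whole profile; the function name `under_report`, the factor range 0.1 to 0.8 and the sample readings are hypothetical and not taken from the paper.

```python
import random
from typing import List, Sequence


def under_report(
    benign: Sequence[float],
    rng: random.Random,
    low: float = 0.1,
    high: float = 0.8,
) -> List[float]:
    """Create a synthetic theft profile by scaling every reading by one factor.

    The factor is drawn uniformly from [low, high] with high < 1, so each
    reported value is strictly below the true (positive) consumption.
    """
    factor = rng.uniform(low, high)
    return [reading * factor for reading in benign]


rng = random.Random(0)  # fixed seed so the example is reproducible
benign_profile = [12.0, 10.5, 11.2, 13.0, 9.8, 14.1, 12.7]  # hypothetical daily kWh
theft_profile = under_report(benign_profile, rng)
print(theft_profile)
```

Samples produced this way would be labelled as theft and added to the training set alongside the original benign profiles.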
... To handle NTLs, several solutions have been proposed in the literature [10], [11]. These solutions are broadly catego- ...
[18] | Customer's bill data | Z-shaped and S-shaped member function | N-A
Hybrid of Gaussian mixture and LSTM [21] | Numenta Anomaly Benchmark (NAB) and synthetic dataset | Mean Square Error (MSE), F1-score, recall and precision | N-A
MODWPT [31] | Households' data of Honduras | Matthews correlation coefficient, specificity, accuracy, F1-score, AUC and recall | 94%
Anomaly detection system [32] | Consumption data of Tanzania | Accuracy, cross validation and F1-score | 87%
Binary black hole algorithm [33] | Electricity consumption dataset of Brazil | Recognition rate and time complexity | N-A
Artificial Neural Network (ANN) based customer filtering and classification and regression tree [34] | Endesa's dataset | Classification, regression and self-organizing map | N-A
Semi-supervised embedded system for NTL [35] | SGCC | Accuracy, FPR, precision, recall and F1-score | 95%
Hybrid of MLP and LSTM [36] | Endesa's smart meter data | PR, AUC and precision | N-A
Kullback-Leibler divergence detector for optimal attack [37] | Electricity consumption data of Ireland | AUC and FPR | N-A ...
Article
Full-text available
In this paper, two solutions based on supervised learning models are proposed for Electricity Theft Detection (ETD). In the first solution, Adaptive Synthetic Edited Nearest Neighbor (ADASYNENN) is used to solve the class imbalance problem. For feature extraction, the Locally Linear Embedding (LLE) technique is utilized. Moreover, a Self-Attention Generative Adversarial Network (SAGAN) is used in combination with a Convolutional Neural Network (CNN) for the classification of electricity consumers. In the second solution, Synthetic Minority Oversampling Technique Edited Nearest Neighbor (SMOTEENN) is proposed. Moreover, a novel classification model named ERNET, which is based on EfficientNet, Residual Network (ResNet) and Gated Recurrent Unit (GRU), is used to detect Non-Technical Losses (NTLs). We also use a Sparse Auto Encoder (SAE) for effective feature extraction, which makes the classification more robust and easier. Furthermore, a robust Root Mean Square Propagation (RMSProp) optimizer is used to improve the learning rate of the model. To validate the proposed models, simulations are performed using different performance metrics, such as precision, recall, F1-score, Area Under the Curve (AUC), FPR and Root Mean Square Error (RMSE). All simulations are performed using the State Grid Corporation of China (SGCC) dataset. The proposed models are compared with benchmark models, such as SAGAN, Wide and Deep Convolutional Neural Network (WDCNN), CNN and Long Short Term Memory (LSTM). The simulation results show that the proposed models outperform the existing models in terms of the aforementioned performance metrics.
... It must be emphasized that due to the major hindrance of measuring interparticle contact forces experimentally, the current way of using DEM data for constructing the ML model is the only feasible approach, while the ANN model proposed to solve CFC prediction problem can be replaced by some other ML algorithms such as long short-term memory network [50]. For instance, Zhang and Yin [34] adopted a DEM-based ML approach to predict the constitutive relationships of 2D granular materials using CNN and bidirectional long short-term memory neural network. ...
Article
Full-text available
The inhomogeneous distribution of contact force chains (CFC) in quasi-statically sheared granular materials dominates their bulk mechanical properties. Although previous micromechanical investigations have gained significant insights into the statistical and spatial distribution of CFC, they still lack the capacity to quantitatively estimate CFC evolution in a sheared granular system. In this paper, an artificial neural network (ANN) based on discrete element method (DEM) simulation data is developed and applied to predict the anisotropy of CFC in an assembly of spherical grains undergoing a biaxial test. Five particle-scale features including particle size, coordination number, x- and y-velocity (i.e., x and y-components of the particle velocity), and spin, which all contain predictive information about the CFC, are used to establish the ANN. The results of the model prediction show that the combined features of particle size and coordination number have a dominating influence on the CFC’s estimation. An excellent model performance manifested in a close match between the rose diagrams of the CFC from the ANN predictions and DEM simulations is obtained with a mean accuracy of about 0.85. This study has shown that machine learning is a promising tool for studying the complex mechanical behaviors of granular materials.
... Considering that the initial value and the evolution of the PEM fuel cell performance both change under different operating conditions, we try to predict them separately, using the multivariate polynomial regression (MPR) method and an ANN, respectively, to improve the accuracy. Some previous works also reported that a reasonable combination of algorithms can achieve a higher accuracy [32]. Therefore, we propose a combined MPR and ANN approach, M-ANN. ...
Article
Full-text available
The dead-ended anode (DEA) and anode recirculation operations are commonly used to improve the hydrogen utilization of automotive proton exchange membrane (PEM) fuel cells. The cell performance will decline over time due to the nitrogen crossover and liquid water accumulation in the anode. Highly efficient prediction of the short-term degradation behaviors of the PEM fuel cell has great significance. In this paper, we propose a data-driven degradation prediction method based on multivariate polynomial regression (MPR) and artificial neural network (ANN). This method first predicts the initial value of cell performance, and then the cell performance variations over time are predicted to describe the degradation behaviors of the PEM fuel cell. Two cases of degradation data, the PEM fuel cell in the DEA and anode recirculation modes, are employed to train the model and demonstrate the validation of the proposed method. The results show that the mean relative errors predicted by the proposed method are much smaller than those by only using the ANN or MPR. The predictive performance of the two-hidden-layer ANN is significantly better than that of the one-hidden-layer ANN. The performance curves predicted by using the sigmoid activation function are smoother and more realistic than that by using rectified linear unit (ReLU) activation function.
... However, when the data input for model training is high, its performance degraded to 54.5% Precision-Recall Area Under the Curve (PR-AUC). In [7], Shuan et al. used a well-known deep neural network model named Convolutional Neural Network (CNN) for the detection of electricity theft accurately. Nevertheless, a major drawback of the model's generalization arises when classification output is taken from a fully connected layer CNN. ...
Conference Paper
Full-text available
In Smart Grids (SG), Electricity Theft Detection (ETD) is of great importance because it makes the SG cost efficient. Existing methods for ETD cannot efficiently handle data imbalance, missing values, variance and non-linear data problems in the smart meter data. Therefore, an effective integrated strategy is required to address underlying issues and accurately detect electricity theft using big data. In this work, a simple yet effective approach is proposed by integrating two different modules, such as data pre-processing and classification, in a single framework. The first module involves data imputation, outliers handling, standardization and class balancing steps to generate quality data for classifier training. The second module classifies honest and dishonest users with a Support Vector Machine (SVM) classifier. To improve the classifier’s learning trend and accuracy, a Bayesian optimization algorithm is used to tune SVM’s hyperparameters. Simulation results confirm that the proposed framework for ETD significantly outperforms previous machine learning approaches such as random forest, logistic regression and SVM in terms of accuracy.
... On the other hand, NTLs are calculated as the consumed electrical power that is not billed [36,37,38,39,40,41]. The main causes of these losses are faulty meters, unpaid and erroneous bills, and electricity theft [26,47,48,49,50]. For electricity theft, key spoofing and password cracking are the conventional techniques, which are used to tamper the meters [26,27,42,43,44,45,46]. ...
... In such cases, EnsembleNTLDetect provides optimal performance with minimal training time. Moreover, state-of-the-art NTL detection methods in [23], [48], [50] have not been verified on SGCC dataset, which partly explains their subpar performance on this dataset and lack of generalization ability. IV. ...
Conference Paper
Full-text available
Artificial intelligence-based techniques applied to the electricity consumption data generated from the smart grid prove to be an effective solution in reducing Non-Technical Losses (NTLs), thereby ensuring the safety, reliability, and security of smart energy systems. However, imbalanced data, consecutive missing values, large training times, and complex architectures hinder the real-time application of electricity theft detection models. In this paper, we present EnsembleNTLDetect, a robust and scalable electricity theft detection framework that employs a set of efficient data pre-processing techniques and machine learning models to accurately detect electricity theft by analysing consumers' electricity consumption patterns. This framework utilises an enhanced Dynamic Time Warping Based Imputation (eDTWBI) algorithm to impute missing values in the time series data and leverages the Near-miss undersampling technique to generate balanced data. Further, a stacked autoencoder is introduced for dimensionality reduction and to improve training efficiency. A Conditional Generative Adversarial Network (CTGAN) is used to augment the dataset to ensure robust training and a soft voting ensemble classifier is designed to detect the consumers with aberrant consumption patterns. Furthermore, experiments were conducted on the real-time electricity consumption data provided by the State Grid Corporation of China (SGCC) to validate the reliability and efficiency of EnsembleNTLDetect over the state-of-the-art electricity theft detection models in terms of various quality metrics.
... Not only do users require service improvements, but providers constantly lose resources to non-technical losses (NTL). Therefore, in [9] they have implemented a control algorithm based on hybrid deep neural networks that allow the detection of operating problems and anomalies in smart meters. This proposal does not require data pre-processing, which simplifies the system that shows good results. ...
Conference Paper
Power grids continue to develop and it is increasingly difficult to guarantee the quality of service offered to the user. In several developing countries, consumption is calculated on the basis of visual inspection, which is prone to errors. Consequently, this document outlines the construction of electrical consumption telemetering equipment. This is designed to reduce human error through manual measures and have a web backup that can be accessed from anywhere. To develop the prototype voltage and current sensors are used, and the signal is conditioned for the control stage. The processing unit is the Arduino Mega embedded board, which incorporates a GPRS Shield (General Packet Radio Services) that handles communication with a LAMP server (Linux, Apache, MySQL, PHP) connected to the Internet. It also incorporates a block of connection and disconnection of the electrical service that would leave the whole house without service. Two functionalities are used to present the data, one is local on the LCD display of the equipment installed in the home (user) and the second is remote access to a website (server). The results show that in comparison with a standard voltage device it presents an error of 0.28% and 4.12% in current. In this way, the use of this prototype for real-time monitoring of electricity consumption is validated, since it works similarly to a conventional one.
... LSTM is a deep recursive network that models temporal dependencies in electricity consumption time series data. The imbalance electricity theft data is not considered by authors in [172], therefore, network is more biased towards the majority class that is normal consumers and the is proposed in [170], which also suffers from poor performance due to inefficient feature engineering, ignoring data imbalance and inappropriate performance evaluation. The curse of dimensionality also drastically impacts the LSTM and WADCNN models. ...
Thesis
The revolution of power grids from traditional grids to Smart Grids (SGs) requires effective Demand Side Management (DSM) and reliable Renewable Energy Sources (RESs) incorporation in order to maintain demand, supply balance and optimize energy in an environment friendly manner. Data analytics provide solutions to the emerging challenges of power systems, such as DSM, environmental pollution (due to carbon emission), fossil fuel dependency mitigation, RESs incorporation, cost curtailment, grid’s stability and security. To efficiently manage electricity and maximize the profit of power utilities several tasks are focused in this thesis, i.e., prediction of electricity load to avoid demand and generation mismatch, wind power forecasting to satisfy energy demand effectively, electricity price forecasting for regulating market operations, carbon emissions forecasting for reducing payment of carbon tax, Electricity Theft Detection (ETD) for recovering power utilities’ revenue loss caused by electricity theft. In addition to that, a wind power forecast based DSM scheme is proposed. Furthermore, impact of RESs integration level on carbon emissions, electricity price and consumption cost is quantified. Both forecasting and classification techniques are utilized for efficient energy management. Forecasting of electricity load, price, wind power and carbon emissions is performed, whereas, classification of fair and fraudulent electricity consumers is performed. To balance electricity demand and supply, electricity load forecasting is required. Three models are proposed for this purpose, i.e., Deep Long Short-Term Memory (DLSTM), Efficient Sparse Autoencoder Nonlinear Autoregressive eXogenous network (ESAENARX) and Differential Evolution Recurrent Extreme Learning Machine (DE-RELM). DLSTM utilizes univariate data and gives single result, whereas, ESAENARX and DE-RELM model multivariate data and predict electricity load and price simultaneously. 
Due to its adaptive and automatic feature learning mechanism, DLSTM achieves accurate results for separate forecasting of electricity load and price. ESAENARX and DE-RELM models are enhanced by a newly proposed efficient feature extractor and by tuning of the model's parameters, respectively. Real-world datasets of ISO-NE, PJM, NYISO are used for load and price forecasting. The purpose of regulating the electricity market operations is achieved by forecasting of electricity load, price, wind power and carbon emissions. Wind power generation is predicted by an efficient model named Efficient Deep Convolution Neural Network (EDCNN). Moreover, a DSM strategy is also proposed based on predicted wind power generation. Power utilities have to pay a carbon emissions tax imposed by the government. To pay less carbon emissions tax, carbon emissions prediction is required, which helps in encouraging electricity consumers to shift their consumption load to low carbon price time periods of the day. For accomplishing the carbon emissions forecasting task, an efficient model named Improved Particle Swarm Optimization based Deep Neural Network (IPSO DNN) is proposed. This model is improved by tuning the parameters of the DNN with a newly proposed improved optimization technique named IPSO. The ISO-NE dataset is used for wind power and carbon emissions forecasting. To reduce the financial loss of power utilities, ETD is very important. For this purpose four models are proposed, namely Differential Evolution Random Under Sampling Boosting (DE-RUSBoost), Jaya-RUSBoost, RUS Ensemble CNN (RUSE-CNN) and anomaly detection based ETD. In DE-RUSBoost and Jaya-RUSBoost, the parameters of the RUSBoost classifier are tuned by DE and Jaya optimization techniques, respectively. In RUSE-CNN, the RUS data balancing technique is applied along with ensemble CNN to improve ETD performance. DE-RUSBoost, Jaya-RUSBoost and RUSE-CNN are supervised models that work on labeled electricity theft data. 
Whereas, anomaly detection based ETD model is capable of identifying electricity theft from unlabeled electricity consumption data. Real-world datasets of SGCC, UMass, PRECON, CER, EnerNOC and LCL are used for ETD. Simulation results show that all the proposed models perform significantly better on real-world dataset as compared to their state-of-the-art counterpart models. The improved feature engineering and model hyper-parameter tuning enhance the performance of the proposed models in terms of prediction and classification results.
... Detection of NTL using smart meter data from Endesa and supervised learning [10] emphasize several classifiers of which XGB was the best performer. In another study, deep neural networks are the best performing algorithms using input data from Endesa (Spain) [17]. ...
Article
Detecting fraud related to electricity consumption is usually a difficult challenge as the input datasets are sometimes unreliable due to missing and inconsistent records, faults, misinterpretation of meter reading remarks, status, etc. In this paper, we obtain meaningful insights from fraud detection using real datasets of Tunisian electricity consumption metered by conventional meters. We propose an extensive feature engineering approach using the structured query language (SQL) analytic functions. Furthermore, double merging of datasets reveals more dimensions of the data, allowing better detection of irregularities in consumption. We analyze the results of several machine learning (ML) algorithms that manage cases of weakly correlated features and highly unbalanced datasets. The skewness of the target is approached as a regular characteristic of the input data because most consumers are fair and only a small portion attempt to mislead the utility companies by tampering with metering devices. Our fraud detection solutions consist of combining classifiers with an anomaly detection feature obtained with an unsupervised ML algorithm, Isolation Forest, and extensive feature engineering using SQL analytic functions on large datasets. Several techniques for feature processing enhanced the Area Under the Curve score for the Decision Tree algorithm from 0.68 to 0.99.
... Multi-Layer Perceptron Networks (MLP): Costa and Buzau [41]- [42] constructed Multi-Layer Perceptron Networks to detect electricity frauds. We adopted a similar MLP structure with minor adjustments. ...
Article
Non-technical losses (NTLs) cause substantial commercial concerns to distribution network operators (DNOs). 80% of NTLs are related to electricity theft, which involves various high-tech methods and is increasingly difficult to detect. Advanced metering infrastructure (AMI) has enabled supervised machine learning (ML) to detect NTLs, which has significantly improved detection rates. Further advances in ML-based methods require sufficient labeled datasets, which are usually not available. To address this, this article proposes a self-supervised detection method that extracts long-term consumption patterns to detect fraud in low-voltage networks, known as NTL detection contrastive predictive coding (ND-CP). Smart meter data sequences are first fed into a one-dimensional convolutional neural network (1D-CNN). A gated recurrent unit (GRU) is then used to extract global information. After that, the prediction output of the GRU model is used to construct positive and negative sample pairs for contrastive learning. Eventually, a single-layer neural network classifier for detection is trained using the long-term features extracted by ND-CP. Experiments are conducted with real electricity consumption data to verify the effectiveness of the proposed method.
Article
Non-technical losses (NTLs) are one of the major causes of revenue losses for electric utilities. In the literature, various machine learning (ML)/deep learning (DL) approaches are employed to detect NTLs. The existing studies are mostly concerned with tuning the hyperparameters of ML/DL methods for efficient detection of NTL, i.e., electricity theft detection. Some of them focus on the selection of prominent features from data to improve the performance of electricity theft detection. However, the curse of dimensionality affects the generalization ability of ML/DL classifiers and leads to computational, storage, and overfitting problems. Therefore, to deal with the above-mentioned issues, this study proposes a system based on metaheuristic techniques (artificial bee colony and genetic algorithm) and denoising autoencoder for electricity theft detection using big data in electric power systems. The former (metaheuristics) are used to select prominent features, while the latter is utilized to extract high variance features from electricity consumption data. Firstly, 11 new features are synthesized using statistical and electrical parameters from the user’s consumption history. Then, the synthesized features are used as input to metaheuristic techniques to find a subset of optimal features. Finally, the optimal features are fed as input to the denoising autoencoder to extract features with high variance. The ability of both metaheuristic and autoencoder techniques to select and extract features is measured using a support vector machine. The proposed system reduces the overfitting, storage, and computational overhead of ML classifiers. Moreover, we perform several experiments to verify the effectiveness of our proposed system and results reveal that the proposed system has better performance than its counterparts.
Article
With the development of smart cities infrastructure, these cities’ energy efficiency has become a major problem. Many public buildings, including health centers, educational institutions, and other institutions, consume more energy. It is of great significance that we produce as much energy as possible to be close to the real energy demand. IoT systems have many benefits in the smart city, such as controlling and reducing energy consumption. Therefore, governments can save a lot of money. In the current study, IoT sensors are used to collect data from home appliances. There was a particular trend in the amount of energy consumed by each home appliance. Moreover, we investigate the potential to integrate machine learning-based anomaly detection approaches to improve the maintenance of the power systems and control as a fundamental part of the smart city concept. The models were selected to catch trends and changes in energy consumption at an early stage. Based on the results, the Prophet and LightGBM models outperform vector autoregressive (VAR) models in point anomaly detection. Furthermore, the Prophet and LightGBM models predict how much energy will be used in the future based on weather and time information.
Conference Paper
In this paper, the problem of misclassification due to cross pairs across a decision boundary is investigated. A cross pair is a junction of two samples from opposite classes. These cross pairs are identified using the Tomek links technique. The majority class samples associated with cross pairs are removed to segregate the two opposite classes through an affine decision boundary. Due to the non-availability of theft data, six theft cases are used to synthesize theft data to mimic real-world scenarios. These six theft cases are applied to benign class data, where benign samples are modified and malicious samples are synthesized. Furthermore, to tackle the class imbalance issue, K-means SMOTE is used to provide balanced data. Moreover, the technical route is to train the model on time-series data of both classes. Training a model on imbalanced data tends to cause misclassification of the samples, due to bias towards the majority class, which results in a high FPR. The balanced data is provided as an input to a hybrid bi-directional GRU and bi-directional LSTM model. The two classes are efficiently classified with high accuracy, a high detection rate and a low FPR.
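To make the cross-pair idea concrete, the following is a minimal sketch of Tomek link detection, assuming Euclidean distance and the usual definition of a link: two samples of opposite classes that are each other's nearest neighbour. The helper names and the toy points are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def nearest_neighbor(idx: int, points: Sequence[Sequence[float]]) -> int:
    """Index of the point closest (Euclidean) to points[idx], excluding itself."""
    return min((j for j in range(len(points)) if j != idx),
               key=lambda j: math.dist(points[idx], points[j]))


def tomek_links(points: Sequence[Sequence[float]],
                labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Pairs (i, j) of opposite-class samples that are mutual nearest neighbours."""
    links = []
    for i in range(len(points)):
        j = nearest_neighbor(i, points)
        if i < j and labels[i] != labels[j] and nearest_neighbor(j, points) == i:
            links.append((i, j))
    return links


def drop_majority_in_links(points, labels, majority_label):
    """Remove only the majority-class member of every Tomek link."""
    to_drop = {k for pair in tomek_links(points, labels)
               for k in pair if labels[k] == majority_label}
    keep = [k for k in range(len(points)) if k not in to_drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy 2-D features: label 0 = benign (majority), label 1 = theft (minority).
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0), (3.1, 3.2)]
labels = [0, 0, 0, 1, 1, 1]
links = tomek_links(points, labels)
cleaned_points, cleaned_labels = drop_majority_in_links(points, labels, majority_label=0)
```

In this toy set only the benign point at (1.0, 1.0) sits directly against a theft sample, so it is the only one removed.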
Article
Existing power systems as well as smart grids utilize a number of advanced computing, networking, and measurement technologies that improve their planning and operation, with the aim of a fully automated system in terms of monitoring and control. As power systems become more complicated, they face a combination of known and unknown vulnerabilities and threats that are more targeted and sophisticated. Among them, malicious activity that influences the measuring devices is of major importance since it can instantaneously affect the physical operation and reliability of the grid. Conventional and smart energy meters incorporated into power systems are the most vulnerable measuring devices. The problem of manipulating conventional or smart energy meters in order to influence power system operation and reliability is known as the energy theft (ENT) problem. This problem has become of major importance in many countries all over the world. Energy theft mainly occurs at the transmission and distribution levels. To reduce the impact of energy theft, many methods have been proposed in the literature. This paper presents a review of smart metering in the European Union (EU) along with a classification and contribution analysis of the most cited ENT problem solutions published in the literature, while a bibliographic analysis concerning the impact of the most cited authors, affiliations, and references is also conducted.
Article
Electricity theft has a significant impact on power grids by generating non-technical losses, which eventually degrade the power quality and reduce profit. In this paper, we propose a hybrid approach based on deep learning and a support vector machine for the detection of energy theft, to help energy supplier companies address the issues of insufficient power, irregular power expenditure and ineffective electricity monitoring. A deep convolutional neural network is proposed for feature learning using smart meter data at different time scales, varying from hours to days. The extracted features are then used to train a support vector machine, which classifies them into two categories, theft and non-theft. Furthermore, a dropout layer is introduced in the convolutional neural network model to avoid overfitting. Several careful experiments were carried out on real-time customer smart meter data, and the results validate the effectiveness of the proposed method in terms of accuracy and lower detection error.
Chapter
In this paper, a novel hybrid deep learning approach is proposed to detect the non-technical losses (NTLs) that occur in smart grids due to illegal use of electricity, faulty meters, meter malfunctioning, unpaid bills, etc. The proposed approach is based on data-driven methods due to the sufficient availability of smart meters’ data. A bi-directional Wasserstein generative adversarial network (Bi-WGAN) is utilized to generate synthetic theft samples for solving the class imbalance problem. The Bi-WGAN efficiently synthesizes the minority class theft samples by leveraging the capabilities of an additional encoder module. Moreover, the curse of dimensionality degrades the model’s generalization ability. Therefore, the high dimensionality issue is solved using a two-dimensional convolutional neural network (2D-CNN) and a bidirectional long short-term memory network (Bi-LSTM). The 2D-CNN is applied to 2D weekly data to extract the most prominent features. In the 2D-CNN, the convolutional and pooling layers extract only the potential features and discard the redundant features to reduce the curse of dimensionality. This process increases the convergence speed of the model and reduces the computational overhead. Meanwhile, a Bi-LSTM is also used to detect the non-malicious changes in consumers’ load profiles using its strong memorization capabilities. Finally, the outcomes of both models are concatenated into a single feature map and a sigmoid activation function is applied for final NTL detection. The simulation results demonstrate that the proposed model outperforms the existing scheme in terms of Matthews correlation coefficient (MCC), precision-recall (PR) and area under the curve (AUC). It achieves 3%, 5% and 4% greater MCC, PR and AUC scores, respectively, as compared to the existing model.
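The "2D weekly data" fed to the 2D-CNN can be pictured as a daily consumption series reshaped so that each row holds one week. The sketch below shows one plausible way to do this; the function name `stack_weekly`, the choice to drop an incomplete trailing week, and the readings are assumptions, since the abstract does not describe the exact preprocessing.

```python
from typing import List, Sequence


def stack_weekly(daily: Sequence[float], days_per_week: int = 7) -> List[List[float]]:
    """Reshape a 1-D daily series into rows of one week each.

    Trailing days that do not fill a whole week are dropped (an assumption;
    padding would be an equally valid choice).
    """
    n_weeks = len(daily) // days_per_week
    return [list(daily[w * days_per_week:(w + 1) * days_per_week])
            for w in range(n_weeks)]


# 16 days of made-up daily readings in kWh: 1.0, 2.0, ..., 16.0
daily_kwh = [float(d) for d in range(1, 17)]
weekly = stack_weekly(daily_kwh)  # two full weeks; the last two days are dropped
```

With this layout, the same weekday lines up in a column, which is what lets a 2D convolution see week-over-week periodicity.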
Article
The advanced metering infrastructure is a key building block for the smart grid, which is responsible for facilitating communication between the smart meter and the electricity provider. However, the development of advanced metering infrastructure leads to more sophisticated attacks that can be launched on the network and have a tremendous effect on the grid. Non-technical losses are the number one and most critical issue contributing to financial losses on smart grids. This paper, therefore, discusses the types and causes of attacks that trigger non-technical losses in the advanced metering infrastructure system and analyzes attacks that result from non-technical losses attack models. This paper also addresses various features and feature-engineering methods and their ability to create distinct classes between normal and attack samples. This study further examines the performances of different learning models in the detection of different types of attacks. Finally, the paper provides a summary and suggestions for further improvement in the detection of non-technical loss attacks.
Article
The vulnerability of power theft has hampered the electricity industry for decades. It obstructs social progress by having varying degrees of impact on home, commercial, and industrial customers. Sneak thieves have caught up with contemporary metering systems, putting electricity suppliers in trouble financially. This comparative analysis is the first step in the presentation of principles. Theft of electricity has serious consequences for the power grid's proper operation as well as the economic benefits of power corporations and commercial power service providers. An effective anti-power-theft algorithm is required for tracking power usage statistics in order to detect electricity theft. In this literature review, we compare the Support Vector Machine (SVM) algorithm with other techniques for detecting abnormal usage among consumers (i.e., electricity fraudsters) in time-series data on power consumption. The results show that some combinations reach significantly better values than others, both when comparing balancing techniques for the same machine learning method and when comparing the combinations with each other.
Chapter
In this research article, we tackle the following limitations: high misclassification rate, low detection rate, the class imbalance problem, and the unavailability of malicious or theft samples. The class imbalance problem is a severe issue in electricity theft detection that affects the performance of supervised learning methods. We exploit the adaptive synthetic minority oversampling technique to tackle this problem. Moreover, theft samples are created from benign samples, and we argue that the goal of theft is to report less than the actual electricity consumption. Different machine learning and deep learning methods, including the recently developed light and extreme gradient boosting (XGBoost), are trained and evaluated on a realistic electricity consumption dataset that is provided by an electric utility in Pakistan. The consumers in the dataset belong to different demographics and different social and financial backgrounds. A number of classifiers are trained on the acquired data; however, long short-term memory (LSTM) and XGBoost attain high performance and outperform all other classifiers. XGBoost achieves a 0.981 detection rate and a 0.015 misclassification rate, whereas LSTM attains 0.976 and 0.033 detection and misclassification rates, respectively. Moreover, the performance of all implemented classifiers is evaluated through precision, recall, F1-score, etc.
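The idea of creating theft samples from benign ones by under-reporting consumption can be sketched as follows. This is only an illustration under stated assumptions: the single scaling factor, its range (0.1 to 0.8), and the function name `under_report` are hypothetical, and the paper may use different or additional attack patterns.

```python
import random
from typing import List, Optional, Sequence


def under_report(benign: Sequence[float], low: float = 0.1, high: float = 0.8,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Build a synthetic theft profile by scaling every reading by one factor < 1.

    The factor range is an illustrative assumption, not a value from the paper.
    """
    rng = rng or random.Random()
    factor = rng.uniform(low, high)
    return [reading * factor for reading in benign]


# One week of made-up benign daily readings in kWh.
benign_week = [12.4, 11.8, 13.1, 12.9, 14.0, 15.2, 14.7]
theft_week = under_report(benign_week, rng=random.Random(42))  # seeded for repeatability
```

Each synthetic sample keeps the shape of the benign profile but reports a lower total, which matches the stated assumption that the thief's goal is to be billed for less than was consumed.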
Chapter
In this paper, a data-driven solution is proposed to detect Non-Technical Losses (NTLs) in smart grids. In the real world, theft samples are fewer than benign samples, which leads to a data imbalance issue. To resolve the issue, diverse theft attacks are applied to the benign samples to generate synthetic theft samples for data balancing and to mimic real-world theft patterns. Furthermore, several non-malicious factors influence users’ energy usage patterns, such as consumers’ behavior during weekends, seasonal change and family structure. These factors adversely affect the model’s performance, resulting in data misclassification. So, non-malicious factors along with smart meters’ data need to be considered to enhance the theft detection accuracy. Keeping this in view, a hybrid Multi-Layer Perceptron and Gated Recurrent Unit (MLP-GRU) based Deep Neural Network (DNN) is proposed to detect electricity theft. The MLP model takes auxiliary data such as geographical information as input, while the smart meter dataset is provided as an input to the GRU model. Due to the improved generalization capability of the MLP with reduced overfitting and the effective gated configuration of the multi-layered GRU, the proposed model proves to be an ideal solution in terms of prediction accuracy and computational time. Furthermore, the proposed model is compared with the existing MLP-LSTM model and simulations are performed. The results show that MLP-GRU achieves scores of 0.87 and 0.89 for Area under the Receiver Operating Characteristic Curve and Area under the Precision-Recall Curve (PR-AUC), respectively, as compared to 0.72 and 0.47 for MLP-LSTM.
Chapter
In this paper, we present a novel approach for electricity theft detection (ETD). It comprises two modules: (1) implementations of six theft attacks for dealing with the data imbalance issue and (2) a gated recurrent unit (GRU) to tackle the model’s poor performance in terms of high false positive rate (FPR) due to non-malicious reasons (i.e., drift). In order to balance the data, the synthetic theft attacks are applied to the State Grid Corporation of China (SGCC) dataset. Subsequently, once the data is balanced, we pass the data to the GRU for ETD. As the GRU model stores and memorizes a long sequence of the balanced data, it helps to detect the real thieves instead of anomalies due to drift. The proposed methodology uses electricity consumption (EC) data from the SGCC dataset for solving the ETD problem. The performance of the adopted GRU with respect to ETD accuracy is compared with the existing support vector machine (SVM) using various performance metrics. Simulation results show that SVM achieves 64.33% accuracy, whereas the adopted GRU achieves 82.65% accuracy.
Chapter
In this paper, a hybrid deep learning model is presented to detect electricity theft in power grids, which contributes to Non-Technical Losses (NTLs). NTLs emerge due to meter malfunctioning, meter bypassing, meter tampering, etc. The main focus of this study is to detect NTLs. However, the detection of NTLs faces three major challenges: severe class imbalance, overfitting due to highly dynamic data, and poor generalization due to the usage of synthetic data. To overcome the aforementioned problems, a hybrid deep neural network is designed, which is the combination of Alexnet, Adaptive Boosting (AdaBoost) and Artificial Bee Colony (ABC), termed Alexnet-Adaboost-ABC. Alexnet is exploited for feature extraction, while AdaBoost and ABC are used for classification and parameter tuning, respectively. Moreover, the class imbalance issue is resolved using the Near Miss (NM) undersampling technique. NM effectively reduces the majority class samples and standardizes the proportion of majority and minority classes. The model is evaluated on the real inspected dataset released by the State Grid Corporation of China (SGCC). The performance of the proposed model is validated through the F1-score, precision, recall, Area Under Curve (AUC) and Matthews Correlation Coefficient (MCC). The simulation results show that the proposed model outperforms the existing techniques, obtaining 3%, 2% and 4% higher values of F1-score, AUC and MCC, respectively.
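A minimal sketch of Near Miss undersampling is given below. It assumes the NearMiss-1 variant, which keeps the majority samples whose mean distance to their k nearest minority samples is smallest; the abstract does not state which variant is used, and the function name and toy points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def near_miss_1(majority: Sequence[Point], minority: Sequence[Point],
                n_keep: int, k: int = 3) -> List[Point]:
    """Keep the n_keep majority samples closest, on average, to their k nearest
    minority samples (NearMiss-1; the variant is an assumption)."""
    def mean_dist_to_k_nearest(p: Point) -> float:
        nearest = sorted(math.dist(p, q) for q in minority)[:k]
        return sum(nearest) / len(nearest)

    return sorted(majority, key=mean_dist_to_k_nearest)[:n_keep]


# Toy 2-D features: two theft (minority) and four benign (majority) samples.
minority = [(0.0, 0.0), (0.0, 1.0)]
majority = [(0.5, 0.5), (5.0, 5.0), (1.0, 0.0), (10.0, 10.0)]
kept = near_miss_1(majority, minority, n_keep=len(minority), k=2)
balanced_size = len(kept) + len(minority)
```

Keeping as many majority samples as there are minority samples yields a 1:1 class ratio, at the cost of discarding the benign samples that lie far from the decision region.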
Article
For dealing with electricity theft detection in smart grids, this article introduces a hybrid deep learning model. The model tackles various issues such as the class imbalance problem, the curse of dimensionality and the low theft detection rate of the existing models. The model integrates the benefits of both GoogLeNet and gated recurrent unit (GRU). The one dimensional electricity consumption (EC) data is fed into the GRU to remember the periodic patterns of electricity consumption, whereas the GoogLeNet model is leveraged to extract the latent features from the two dimensional weekly stacked EC data. Furthermore, the time least square generative adversarial network (TLSGAN) is proposed to solve the class imbalance problem. The TLSGAN uses unsupervised and supervised loss functions to generate fake theft samples, which have high resemblance with real world theft samples. The standard generative adversarial network only updates the weights of those points that lie on the wrong side of the decision boundary, whereas TLSGAN also modifies the weights of those points that lie on the correct side of the decision boundary, which protects the model from the vanishing gradient problem. Moreover, dropout and batch normalization layers are utilized to enhance the model's convergence speed and generalization ability. The proposed model is compared with different state-of-the-art classifiers including multilayer perceptron (MLP), support vector machine, naive Bayes, logistic regression, MLP-long short term memory network and wide and deep convolutional neural network. It outperforms all classifiers by achieving 96% and 97% precision-recall area under the curve and receiver operating characteristics area under the curve, respectively.
Article
Full-text available
Today's energy resources are closer to consumers thanks to sustainable energy and advanced metering infrastructure (AMI), such as smart meters. Smart meters are controlled and manipulated through various interfaces in smart grids, such as cyber, physical and social interfaces. Recently, a large number of non-technical losses (NTLs) have been reported in smart grids worldwide. These are partially caused by false data injections (FDIs). Therefore, ensuring a secure communication medium and protected AMIs is critical to ensuring reliable power supply to consumers. In this paper, we propose a novel Big Data-driven solution that employs machine learning, deep learning and parallel computing techniques. We additionally obtained robust statistical features to detect FDI-based cyber threats at the distribution level. The performance of the proposed model for NTL detection is investigated using private smart grid datasets in the Turkish distribution network for AMI-level cyber threats, and the results are compared to state-of-the-art machine learning algorithms used for NTL classification problems. Our approach shows promising results, as the accuracy, specificity, and precision metrics of most classifiers are above 90% and false positive rates vary between 0.005 and 0.027.
Article
Although fraud in electricity consumption is easier to detect when consumption is recorded hourly by smart meters, in most developing countries, where the propensity for fraud is higher, smart meters are not yet affordable. Fraud detection is easier with time series data-logging because the periodicity and variability of consumption reveal deviations from a regular consumption pattern. In contrast, fraud detection with conventional meters remains a significant challenge because anomalies in consumption are well hidden within the normal consumption of other consumers. In this paper, large datasets regarding consumers and invoice data from Tunisia are combined and investigated with several Machine Learning (ML) classification algorithms, to detect irregularities in electricity consumption. With extensive feature engineering, including multivariate Gaussian distribution, ensemble classifiers such as Light Gradient Boosting (LGB) outperform other algorithms and achieve realistic performance from challenging, unbalanced and uncorrelated input datasets.
Article
Load anomalies in power distribution systems are relatively rare, yielding imbalanced classification datasets. Consequently, traditional artificial intelligence approaches for detecting load anomalies tend to be relatively inaccurate or rely heavily on pre-processing optimization. Reasoning that hyperdimensional computing (HDC) lends itself naturally to classification with imbalanced datasets, we present an HDC-based method for detecting anomalies accurately and in real time using raw meter data without prior optimization. An associative memory is used to classify the hypervectors, providing resilience against data imbalance. Furthermore, a novel retraining function is designed to further improve classification accuracy. The proposed method is evaluated using real-world datasets from two sources with significantly different characteristics–a US university campus that relies on state-of-the-art power monitoring and uses mostly non-renewable energy sources, and a rural village in Tanzania that relies on wireless monitoring and uses 100% solar energy. In comparison with traditional AI approaches, it achieves consistently higher accuracy and runtime efficiency on both datasets and under a wide variety of evaluation metrics. To the best of our knowledge, this work is the first investigation of HDC for anomaly detection in smart grids. Our results show that HDC-based methods have considerable potential for accurate real-time detection of load anomalies.
Article
Full-text available
Non-technical electricity losses due to anomalies or frauds are accountable for important revenue losses in power utilities. Recent advances have been made in this area, fostered by the roll-out of smart meters. In this paper, we propose a methodology for non-technical loss detection using supervised learning. The methodology has been developed and tested on real smart meter data of all the industrial and commercial customers of Endesa. This methodology uses all the information the smart meters record (energy consumption, alarms and electrical magnitudes) to obtain an in-depth analysis of the customer’s consumption behavior. It also uses auxiliary databases to provide additional information regarding the geographical location and technological characteristics of each smart meter. The model has been trained, validated and tested on the results of approximately 57000 on-field inspections. It is currently in use in a non-technical loss detection campaign for big customers. Several state-of-the-art classifiers have been tested. The results show that extreme gradient boosted trees outperform the rest of the classifiers.
Article
Full-text available
Electricity theft can be harmful to power grid suppliers and cause economic losses. Integrating information flows with energy flows, smart grids can help to solve the problem of electricity theft owing to the availability of massive data generated from smart grids. The analysis of smart grid data is helpful in detecting electricity theft because of the abnormal electricity consumption pattern of energy thieves. However, the existing methods have poor detection accuracy of electricity theft since most of them were conducted on one dimensional (1-D) electricity consumption data and failed to capture the periodicity of electricity consumption. In this paper, we propose a novel electricity-theft detection method based on a Wide & Deep Convolutional Neural Networks (CNN) model to address the above concerns. In particular, the Wide & Deep CNN model consists of two components: the Wide component and the Deep CNN component. The Deep CNN component can accurately identify the non-periodicity of electricity theft and the periodicity of normal electricity usage based on two dimensional (2-D) electricity consumption data. Meanwhile, the Wide component can capture the global features of 1-D electricity consumption data. As a result, the Wide & Deep CNN model can achieve excellent performance in electricity-theft detection. Extensive experiments based on a realistic dataset show that the Wide & Deep CNN model outperforms other existing methods.
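As a rough illustration of the 2-D input mentioned in this abstract, the sketch below stacks a 1-D daily consumption series into one row per week. The function name `stack_weekly`, the made-up readings, and the choice to drop a trailing partial week are assumptions made here for illustration, not details taken from the paper.

```python
def stack_weekly(daily, days_per_week=7):
    """Reshape a 1-D daily series into a list of week-long rows.

    Assumption: a trailing partial week is dropped rather than padded.
    """
    n_weeks = len(daily) // days_per_week
    return [
        daily[w * days_per_week:(w + 1) * days_per_week]
        for w in range(n_weeks)
    ]


# 16 days of made-up readings: two full weeks plus two leftover days.
daily_kwh = [float(day) for day in range(16)]
weeks = stack_weekly(daily_kwh)
```

Each row then holds one week, so the same weekday lines up in a column, which is the kind of layout a 2-D convolution could exploit to pick up weekly periodicity.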
Conference Paper
Full-text available
Developments in ambient energy and radio frequency (RF) energy harvesting have the potential to provide in situ power for sensor systems; however, they also have the potential to illicitly collect generated energy. Additionally, new methods of electricity theft have appeared with the introduction of smart grid components. This paper provides an understanding of electricity theft as it relates to advanced energy applications, e.g. energy harvesting and the smart grid. A discussion is also provided of the interdisciplinary, ethical and education issues of electricity theft as it relates to these domains.
Conference Paper
Full-text available
Electricity theft is responsible for economic problems for the electric utility due to revenue loss caused by electricity consumers that are not paying for it. The stealer has a tendency to consume more energy, resulting also in power quality problems. An increase in power demand to values greater than the transformer rated power can result in different quality deviations, like transformer overload, voltage unbalance and steady state voltage drop on system buses. The objective of this paper is to analyze, by using MatLab simulations and considering different grid configurations, how electricity theft results in power quality issues, specifically voltage drop in steady state. Additionally, it is shown how the steady state voltage drop can result in economic penalties for the electric utility when the proper voltage exceeds the network operational standards.
Chapter
Full-text available
With the advent of smart grids, distribution utilities have initiated a large deployment of smart meters on the premises of the consumers. The enormous amount of data obtained from the consumers and communicated to the utility give new perspectives and possibilities for various analytics-based applications. In this paper the current smart metering-based energy-theft detection schemes are reviewed and discussed according to two main distinctive categories: A) system state-based, and B) artificial intelligence-based.
Article
Full-text available
Detection of non-technical losses (NTL), which include electricity theft, faulty meters or billing errors, has attracted increasing attention from researchers in electrical engineering and computer science. NTLs cause significant harm to the economy, as in some countries they may range up to 40% of the total electricity distributed. The predominant research direction is employing artificial intelligence to predict whether a customer causes NTL. This paper first provides an overview of how NTLs are defined and their impact on economies, which includes loss of revenue and profit of electricity providers and a decrease in the stability and reliability of electrical power grids. It then surveys the state-of-the-art research efforts in an up-to-date and comprehensive review of algorithms, features and data sets used. It finally identifies the key scientific and engineering challenges in NTL detection and suggests how they could be addressed in the future.
Conference Paper
Full-text available
Non-technical losses (NTL) such as electricity theft cause significant harm to our economies, as in some countries they may range up to 40% of the total electricity distributed. Detecting NTLs requires costly on-site inspections. Accurate prediction of NTLs for customers using machine learning is therefore crucial. To date, related research largely ignores that the two classes of regular and non-regular customers are highly imbalanced and that NTL proportions may change, and it mostly considers small data sets, often not allowing the results to be deployed in production. In this paper, we present a comprehensive approach to assess three NTL detection models for different NTL proportions in large real world data sets of 100Ks of customers: Boolean rules, fuzzy logic and Support Vector Machine. This work has resulted in appreciable results that are about to be deployed in a leading industry solution. We believe that the considerations and observations made in this contribution are necessary for future smart meter research in order to report their effectiveness on imbalanced and large real world data sets.
Article
Full-text available
Nowadays the electric utilities have to handle problems with the non-technical losses caused by frauds and thefts committed by some of their consumers. In order to minimize this, some methodologies have been created to perform the detection of consumers that might be fraudsters. In this context, the use of classification techniques can improve the hit rate of the fraud detection and increase the financial income. This paper proposes the use of the knowledge-discovery in databases process based on artificial neural networks applied to the classifying process of consumers to be inspected. An experiment performed in a Brazilian electric power distribution company indicated an improvement of over 50% of the proposed approach if compared to the previous methods used by that company.
Article
Full-text available
This paper proposes a comprehensive framework to detect non-technical losses (NTLs) and recover electrical energy (lost by abnormalities or fraud) by means of a data mining analysis in the Spanish Power Electric Industry. It is divided into four sections: data selection, data preprocessing, descriptive data mining, and predictive data mining. The authors insist on the importance of knowing the particular characteristics of the Power Company customer: the main features available in databases are described. The paper presents two innovative statistical estimators to attach importance to variability and trend analysis of electric consumption and offers a predictive model, based on the Generalized Rule Induction (GRI) model. This predictive analysis discovers association rules in the data and is supplemented by a binary Quest tree classification method. The quality of this framework is illustrated by a case study considering a real database, supplied by Endesa Company.
Article
Full-text available
Recall and precision are often used to evaluate the effectiveness of information retrieval systems. They are easy to define if there is a single query and if the retrieval result generated for the query is a linear ordering. However, when the retrieval results are weakly ordered, in the sense that several documents have an identical retrieval status value with respect to a query, some probabilistic notion of precision has to be introduced. Relevance probability, expected precision, and so forth, are some alternatives mentioned in the literature for this purpose. Furthermore, when many queries are to be evaluated and the retrieval results averaged over these queries, some method of interpolation of precision values at certain preselected recall levels is needed. The currently popular approaches for handling both a weak ordering and interpolation are found to be inconsistent, and the results obtained are not easy to interpret. Moreover, in cases where some alternatives are available, no comparative analysis that would facilitate the selection of a particular strategy has been provided. In this paper, we systematically investigate the various problems and issues associated with the use of recall and precision as measures of retrieval system performance. Our motivation is to provide a comparative analysis of methods available for defining precision in a probabilistic sense and to promote a better understanding of the various issues involved in retrieval performance evaluation.
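For reference, the basic set-based definitions that this abstract starts from can be sketched as follows. This covers only the simple single-query case with an unranked result set, not the probabilistic or interpolated variants the paper investigates. The function name and the document identifiers are hypothetical.

```python
def precision_recall(relevant, retrieved):
    """Set-based precision and recall for a single query.

    precision = |relevant ∩ retrieved| / |retrieved|
    recall    = |relevant ∩ retrieved| / |relevant|
    Assumption: an empty denominator yields 0.0.
    """
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids: 4 relevant documents, 3 retrieved, 2 in common.
p, r = precision_recall(
    relevant={"d1", "d2", "d3", "d4"},
    retrieved={"d1", "d2", "d5"},
)
```

Here two of the three retrieved documents are relevant and two of the four relevant documents were retrieved, so precision is 2/3 and recall is 1/2.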
Article
Full-text available
LIBLINEAR is an open source library for large-scale linear classification. It supports logistic regression and linear support vector machines. We provide easy-to-use command-line tools and library calls for users and developers. Comprehensive documents are available for both beginners and advanced users. Experiments demonstrate that LIBLINEAR is very efficient on large sparse data sets.
Article
Full-text available
Whereas before 2006 it appears that deep multilayer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future. We first observe the influence of the non-linear activation functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, which explains the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence.
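The abstract does not spell out the initialization scheme. The form usually attributed to this paper (often called Glorot or Xavier uniform initialization) draws each weight uniformly from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The sketch below assumes that form; the function name and the fixed seed are illustrative choices.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng=None):
    """Return a fan_in x fan_out weight matrix (list of rows).

    Assumed form: weights ~ U(-limit, limit),
    with limit = sqrt(6 / (fan_in + fan_out)).
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [
        [rng.uniform(-limit, limit) for _ in range(fan_out)]
        for _ in range(fan_in)
    ]


# With fan_in=64 and fan_out=32 the limit is sqrt(6 / 96) = 0.25.
weights = glorot_uniform(fan_in=64, fan_out=32)
```

Scaling the range by both fan-in and fan-out is meant to keep the variance of activations and of back-propagated gradients roughly constant from layer to layer.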
Article
Full-text available
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
Article
Full-text available
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
Article
The illegal use of electricity, defective meters, and a malfunctioning infrastructure are major causes of Non-Technical Losses (NTLs) in electric distribution systems. Although the use of supervised machine learning techniques to detect NTLs has been widely studied, further research is needed in order to address some significant challenges. (i) Given that non-fraudulent consumers remarkably outnumber fraudulent ones, the imbalanced nature of the dataset can have a major negative impact on the performance of supervised machine learning methods. (ii) Given the large number of dimensions present in the time series data used for training and testing classifiers, advanced signal processing techniques are required in order to extract the most relevant information. (iii) The effectiveness of classifiers must be evaluated using meaningful performance measures for imbalanced data. This paper proposes a framework that addresses the three previous challenges. The core of the proposed framework is the application of the Maximal Overlap Discrete Wavelet-Packet Transform (MODWPT) for feature extraction from time series data and the Random Undersampling Boosting (RUSBoost) algorithm for NTL detection. Moreover, our framework is evaluated using an extensive list of performance metrics. Experiments show that the MODWPT combined with the RUSBoost algorithm can significantly improve the quality of NTL predictions.
Article
Electricity theft has been a major issue for many years. Distribution System Operators (DSOs) have been trying to detect electricity theft; however, the phenomenon persists, while simple meter inspection methods cannot adequately identify most cases of fraud. In this paper the most recent and characteristic research papers on Non-Technical Loss (NTL) detection are reviewed and their key features are summarized. NTL detection schemes are organized in three large categories: data oriented, network oriented and hybrids. Data oriented and network oriented methods are further divided into subcategories, according to the main concept behind NTL detection. Apart from categorizing the various methods, the authors focus on algorithms, data types and size, features, evaluation metrics and NTL detection system response times. An overview of the algorithms used by NTL detection systems is presented, focusing on why they are suitable for the specific application. The data types consumed by each NTL detection system are defined and features typically extracted from these data types are presented. In addition, the authors provide a comprehensive list of performance metrics used and comment on their importance. Finally, a qualitative comparison of NTL detectors is provided, focusing on performance issues, costs, data variety/quality issues and system response times.
Article
The Endesa Company is the main power utility in Spain. One of the main concerns of power distribution companies is energy loss, both technical and non-technical. A non-technical loss (NTL) in power utilities is defined as any consumed energy or service that is not billed because of some type of anomaly. The NTL reduction in Endesa is based on the detection and inspection of the customers that have null consumption during a certain period. The problem with this methodology is the low rate of success of these inspections. This paper presents a framework and methodology, developed as two coordinated modules, that improves this type of inspection. The first module is based on customer filtering using text mining and a complementary artificial neural network. The second module, developed from a data mining process, contains a Classification & Regression tree and a Self-Organizing Map neural network. With these modules, the success of the inspections is multiplied by 3. The proposed framework was developed as part of a collaboration project with Endesa.
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Conference Paper
Fraud detection in electricity consumption is a major challenge for power distribution companies. While many pattern recognition techniques have been applied to identify electricity theft, they often require extensive handcrafted feature engineering. Instead, through deep layers of transformation, nonlinearity, and abstraction, Deep Learning (DL) automatically extracts key features from data. In this paper, we design spatial and temporal deep learning solutions to identify nontechnical power losses (NTL), including Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) and Stacked Autoencoder. These models are evaluated in a modified IEEE 123-bus test feeder. For the same tests, we also conduct comparison experiments using three conventional machine learning approaches: Random Forest, Decision Trees and shallow Neural Networks. Experimental results demonstrate that the spatiotemporal deep learning approaches outperform conventional machine learning approaches.
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Article
We map categorical variables in a function approximation problem into Euclidean spaces, which are the entity embeddings of the categorical variables. The mapping is learned by a neural network during the standard supervised training process. Entity embedding not only reduces memory usage and speeds up neural networks compared with one-hot encoding, but more importantly, by mapping similar values close to each other in the embedding space, it reveals the intrinsic properties of the categorical variables. We applied it successfully in a recent Kaggle competition and were able to reach the third position with relatively simple features. We further demonstrate in this paper that entity embedding helps the neural network to generalize better when the data is sparse and statistics are unknown. Thus it is especially useful for datasets with lots of high cardinality features, where other methods tend to overfit. We also demonstrate that the embeddings obtained from the trained neural network boost the performance of all tested machine learning methods considerably when used as the input features instead. As entity embedding defines a distance measure for categorical variables, it can be used for visualizing categorical data and for data clustering.
Article
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and Friedman, Hastie and Tibshirani are discussed.
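The least-squares case described in this abstract can be sketched in a few lines: for squared error the negative gradient is simply the residual, so each round fits a weak learner to the current residuals and adds it to the expansion. The one-split stump learner, the shrinkage factor `learning_rate`, and the toy data below are assumptions chosen to keep the sketch short; they are not the paper's exact procedure.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump by minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Stagewise additive model: each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # constant initial model
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For loss 0.5 * (y - F(x))^2 the negative gradient is y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Toy step function (made-up data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Swapping the residual line for the negative gradient of another loss (for example the sign of the residual for least absolute deviation) gives the other regression variants the abstract lists.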
Article
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
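The per-mini-batch normalization described here can be sketched as a training-mode forward pass: each feature is centered and scaled using the mean and variance of the current batch, then rescaled and shifted. As a simplifying assumption, `gamma` and `beta` are scalars here, whereas in practice they are learned per feature. The running statistics used at inference time are omitted.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) over the mini-batch (rows).

    y = gamma * (x - mean) / sqrt(var + eps) + beta
    Assumption: scalar gamma/beta; training-mode statistics only.
    """
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n  # biased variance
        scale = gamma / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = (column[i] - mean) * scale + beta
    return out


# Two features on very different scales (made-up values).
normalized = batch_norm([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

After normalization both columns have approximately zero mean and unit variance, regardless of their original scale.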
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has little memory requirements and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
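The "adaptive estimates of lower-order moments" can be made concrete with a scalar sketch of the update rule: exponential moving averages of the gradient and of its square, bias correction, then a step scaled by the ratio of the two. The function name and the toy objective f(x) = x^2 are illustrative. The step size of 0.1 is chosen for this toy problem and is larger than the commonly quoted default of 0.001.

```python
import math


def adam_minimize(grad, x0, steps=500, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function given its gradient, using Adam updates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = x^2 with gradient 2x, started at x = 1.
x_min = adam_minimize(lambda x: 2.0 * x, x0=1.0)
```

Because the step is divided by the root of the second-moment estimate, multiplying the gradient by a constant leaves the update essentially unchanged, which is the rescaling invariance the abstract mentions.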
Article
Currently, power distribution companies have several problems that are related to energy losses. For example, the energy used might not be billed due to illegal manipulation or a breakdown in the customer’s measurement equipment. These types of losses are called non-technical losses (NTLs), and these losses are usually greater than the losses that are due to the distribution infrastructure (technical losses). Traditionally, a large number of studies have used data mining to detect NTLs, but to the best of our knowledge, there are no studies that involve the use of a Knowledge-Based System (KBS) that is created based on the knowledge and expertise of the inspectors. In the present study, a KBS was built that is based on the knowledge and expertise of the inspectors and that uses text mining, neural networks, and statistical techniques for the detection of NTLs. Text mining, neural networks, and statistical techniques were used to extract information from samples, and this information was translated into rules, which were joined to the rules that were generated by the knowledge of the inspectors. This system was tested with real samples that were extracted from Endesa databases. Endesa is one of the most important distribution companies in Spain, and it plays an important role in international markets in both Europe and South America, having more than 73 million customers.
Conference Paper
Distribution system operators now have access to a large amount of real-time operational data from SCADA, intelligent electronic devices (IED), automatic meter reading (AMR) and advanced metering infrastructure (AMI) systems. Customer and transformer loading data obtained from the AMI system at different time intervals provide an opportunity to develop a distribution state estimation (DSE) algorithm. DSE is an enabler of many smart distribution applications. In this paper, a DSE-based quasi-real-time search for potential irregularities in electricity usage is presented. The proposed approach can be used to reduce non-technical loss in the distribution network.
Conference Paper
Non-technical loss is neither a problem with a trivial solution nor one of purely regional character, and its minimization helps guarantee investment in product quality and in the maintenance of power systems, which the competitive environment that followed privatization on the national scene has made necessary. In this paper, we show how to improve the training phase of a neural network-based classifier using a recently proposed meta-heuristic technique called Charged System Search, which is based on the interactions between electrically charged particles. The experiments were carried out in the context of non-technical loss in power distribution systems, on a dataset obtained from a Brazilian electrical power company, and have demonstrated the robustness of the proposed technique against several other nature-inspired optimization techniques for training neural networks. Thus, it is possible to improve some applications on Smart Grids.
Article
Illegal use of electric energy is a widespread practice in many parts of the world. Smart metering enables improved customer load models and the detection of theft and stressed assets. In this paper, a state-estimation-based approach for distribution transformer load estimation is exploited to detect meter malfunction or tampering and to provide quantitative evidence of non-technical loss (NTL). A measure of the overall fit of the estimated values to the pseudo feeder bus injection measurements, based on customer metering data aggregated at the distribution transformers, is used to localize the electricity usage irregularity. Following the state estimation results, an analysis of variance (ANOVA) is used to create a suspect list of customers with metering problems and to estimate the actual usage. Typical Taiwan Power Company (TPC) distribution feeder data are used in the tests. Results of NTL detection under meter defect and energy theft scenarios are presented. Experience indicates that the proposed scheme can give a good trace of the actual usage at feeder buses and can supplement the current meter data validation, estimation and editing (VEE) process to improve meter data quality.
Article
Total losses in the transmission and distribution (T&D) of electrical energy, including non-technical losses (NTL), are huge and harm the interests of both the utility company and its customers. In this context, this paper explains the importance of customer load profile evaluation for the detection of illegal consumers. Classifying customers based on load profile evaluation using SVMLIB requires choosing a training function and related parameters. Selecting these parameters consumes a lot of time and is not advisable when evaluating real-time electricity consumption patterns, because suspicious profiles must be predicted instantly. In light of this issue, this paper implements a neural network (NN) model and suggests a hierarchical model for enhanced estimation of the classification efficiency that would be obtained if the data were classified using support vector machines (SVM). In addition, this paper proposes an encoding technique that can identify illegal consumers with better efficiency and faster classification of data.
Article
MapReduce and its variants have been highly successful in implementing large-scale data-intensive applications on commodity clusters. However, most of these systems are built around an acyclic data flow model that is not suitable for other popular applications. This paper focuses on one such class of applications: those that reuse a working set of data across multiple parallel operations. This includes many iterative machine learning algorithms, as well as interactive data analysis tools. We propose a new framework called Spark that supports these applications while retaining the scalability and fault tolerance of MapReduce. To achieve these goals, Spark introduces an abstraction called resilient distributed datasets (RDDs). An RDD is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost. Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
Article
Non-technical loss (NTL) during the transmission of electrical energy is a major problem in developing countries, and it has been very difficult for utility companies to detect and fight the people responsible for theft. Electricity theft forms a major part of NTL. These losses affect the quality of supply, increase the load on the generating station, and affect the tariff imposed on genuine customers. This paper discusses the factors that influence consumers to steal electricity. In view of these ill effects, various methods for the detection and estimation of theft are discussed. This paper proposes an architectural design of a smart meter, an external control station, a harmonic generator, and a filter circuit. The motivation of this work is to deter illegal consumers and to conserve and effectively utilize energy. In addition, smart meters are designed to provide data on various parameters related to instantaneous power consumption. NTL in the distribution feeder is computed by the external control station from the sending-end information of the distribution feeder. If a considerable amount of NTL is detected, the harmonic generator is operated at that feeder to introduce an additional harmonic component that destroys the appliances of the illegal consumers. For illustration, a cost-benefit analysis for implementation of the proposed system in India is presented.
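The feeder-level computation mentioned in the abstract above can be pictured as a simple energy balance. The sketch below is only one plausible reading: it assumes NTL is the energy sent into the feeder minus the sum of metered consumption minus an estimated technical loss. The function names, the 5% technical-loss fraction, and the 10% alarm threshold are hypothetical values chosen for illustration and are not taken from the paper.

```python
def feeder_ntl(sent_kwh, metered_kwh, technical_loss_fraction=0.05):
    """Estimate the non-technical loss (kWh) on one feeder.

    sent_kwh                -- energy measured at the sending end of the feeder
    metered_kwh             -- iterable of per-customer smart meter readings
    technical_loss_fraction -- assumed share of sent energy lost in lines
                               and transformers (hypothetical default)
    """
    if sent_kwh < 0 or not 0.0 <= technical_loss_fraction < 1.0:
        raise ValueError("invalid feeder energy or loss fraction")
    billed = sum(metered_kwh)
    technical = sent_kwh * technical_loss_fraction
    # Energy that is neither billed nor explained by technical losses.
    return max(sent_kwh - billed - technical, 0.0)


def feeder_is_suspicious(sent_kwh, metered_kwh, threshold_fraction=0.10,
                         technical_loss_fraction=0.05):
    """Flag the feeder when NTL exceeds an assumed share of the sent energy."""
    ntl = feeder_ntl(sent_kwh, metered_kwh, technical_loss_fraction)
    return ntl > threshold_fraction * sent_kwh


# Example: 1000 kWh sent, 750 kWh metered, about 50 kWh assumed technical loss.
example_ntl = feeder_ntl(1000.0, [300.0, 250.0, 200.0])
example_flag = feeder_is_suspicious(1000.0, [300.0, 250.0, 200.0])
```

In practice the technical-loss term would come from a network model rather than a fixed fraction; the constant is used here only to keep the balance easy to follow.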