Chapter

Electricity Theft Detection Using Machine Learning Techniques to Secure Smart Grid

Abstract

Non-Technical Losses (NTLs) are a major problem in power systems and cause large revenue losses for electric utilities. Electricity Theft Detection (ETD) has been an important research topic for years, and existing work has had considerable success in detecting electricity thieves. Further research is needed to improve on this work and to overcome the problems of data imbalance and limited detection accuracy. In this paper, we propose a solution that addresses these two challenges. The proposed solution consists of a Long Short-Term Memory (LSTM) network and the Random Under-Sampling Boosting (RUSBoost) technique. First, the data is pre-processed using normalization and interpolation. The pre-processed data is then passed to the LSTM module for feature extraction. Finally, the refined features are passed to the RUSBoost module for classification. This technique addresses the data imbalance problem without causing loss of information or overfitting. For evaluation, the proposed model is compared with state-of-the-art techniques. The experimental results show that the proposed model achieves high performance in terms of F1-score, precision, recall, and the Receiver Operating Characteristic (ROC) curve. The proposed technique is efficient and can help electric utilities recover revenue losses.
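The abstract names interpolation and normalization as the pre-processing steps but does not say which variants are used. The following is a minimal Python sketch that assumes linear interpolation of missing readings and min-max scaling; the function names `interpolate_missing` and `min_max_normalize` are hypothetical and not taken from the chapter.

```python
from typing import List, Optional


def interpolate_missing(series: List[Optional[float]]) -> List[float]:
    """Fill missing readings (None) by linear interpolation between known neighbours.

    Assumption: gaps at the start or end take the nearest known value,
    and a series with no known values is filled with zeros.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        return [0.0] * len(series)
    filled = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(series[right]))
        elif right is None:
            filled.append(float(series[left]))
        else:
            weight = (i - left) / (right - left)
            filled.append(series[left] + weight * (series[right] - series[left]))
    return filled


def min_max_normalize(series: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant or empty series maps to zeros."""
    if not series:
        return []
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]


# Hypothetical daily readings with gaps, for illustration only.
raw = [None, 2.0, None, None, 8.0, None]
clean = min_max_normalize(interpolate_missing(raw))
```

Normalizing each consumer's series separately, as done here, is one possible choice; scaling over the whole dataset is another, and the abstract does not say which applies.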

... As a result, data-driven strategies for detecting NTL have lately been favored. Game theory [30], [31], statistical methods [38], [39], and machine learning methods [40], [41] are examples of such methods. Compared to hardware alternatives, these methods are easier to implement and less expensive. ...
Article
Smart meters are key elements of a smart grid. Data from smart meters can help us analyze energy consumption behaviour. Machine learning and deep learning approaches can be used to mine the hidden theft detection information in smart meter data. However, this requires effective data extraction. This research presents a theft detection dataset (TDD2022) and a machine learning-based solution for automated theft identification in a smart grid environment. An effective theft generator is modelled and used to obtain a multi-class theft detection dataset from publicly available consumer energy consumption data, owned by the “Open Energy Data Initiative” (OEDI) platform. This is an important and interesting phase to explore in the smart grid field. The proposed dataset can be used for benchmarking and comparative studies. We evaluated the proposed dataset using five different machine learning techniques: k-nearest neighbours (KNN), decision trees (DT), random forest (RF), bagging ensemble (BE), and artificial neural networks (ANN) with different evaluation alternatives (mechanisms). Overall, the best empirical results were obtained with the RF-based theft detection model, which improved the performance metrics by 10% or more over the other developed models.
Article
Full-text available
Anomaly detection in home power monitoring can be categorized into two main types: detection of electrical theft, leakage, or nontechnical loss and monitoring anomalies in the daily activities of residents. Focusing on the application and practicality of anomaly detection, we propose sample efficient home power anomaly detection (SEPAD) with improved monitoring performance in terms of electricity usage as well as changes in the daily living activities of residents via provision of detailed feedback. SEPAD consists of two classifiers: an appliance pattern matching classifier (APMC) and an energy consumption habit classifier (ECHC). The APMC uses a single-source separation framework based on a semi-supervised support vector machine (semi-SVM) model. This semi-supervised learning method requires only a small amount of labeled data to achieve high accuracy in near real time and is a sample efficient detection method. The hidden Markov model (HMM)-based ECHC improves the rationality of SEPAD by providing anomaly detection functionality with respect to the daily activities of householders, especially the elderly and residents in developing areas. When SEPAD detects the appearance of an unknown pattern or known patterns contrary to the household’s electricity usage habits, it triggers an alarm. SEPAD was applied to monitor power consumption data from Mkalama, a rural area in Tanzania with 52 households containing nearly 150 occupants connected to a solar powered off-grid network. The results of the practical test demonstrate the high accuracy and practicality of the proposed method.
Article
Full-text available
Non-technical losses (NTL) caused by faults or electricity theft are greatly harmful to the power grid. Industrial customers consume most of the electrical energy, and it is important to reduce this part of NTL. Currently, most work concentrates on analyzing characteristics of electricity consumption to detect NTL among residential customers. However, the related feature models cannot be adapted to industrial customers because they do not have a fixed electricity consumption pattern. Therefore, this paper starts from the principle of electricity measurement, and proposes a deep learning-based method to extract advanced features from massive smart meter data rather than artificial features. First, we organize electricity magnitudes as one-dimensional sample data and embed the knowledge of electricity measurement in channels. Then, this paper proposes a semi-supervised deep learning model which uses a large amount of unlabeled data and an adversarial module to avoid overfitting. The experimental results show that our approach can achieve satisfactory performance even when trained on very small samples. Compared with the state-of-the-art methods, our method achieves clear improvement in all metrics.
Article
Full-text available
Among an electricity provider's non-technical losses, electricity theft has the most severe and dangerous effects. Fraudulent electricity consumption decreases the supply quality, increases generation load, causes legitimate consumers to pay excessive electricity bills, and affects the overall economy. The adaptation of smart grids can significantly reduce this loss through data analysis techniques. The smart grid infrastructure generates a massive amount of data, including the power consumption of individual users. Utilizing this data, machine learning and deep learning techniques can accurately identify electricity theft users. In this paper, an electricity theft detection system is proposed based on a combination of a convolutional neural network (CNN) and a long short-term memory (LSTM) architecture. CNN is a widely used technique that automates feature extraction and the classification process. Since the power consumption signature is time-series data, we were led to build a CNN-based LSTM (CNN-LSTM) model for smart grid data classification. In this work, a novel data pre-processing algorithm was also implemented to compute the missing instances in the dataset, based on the local values relative to the missing data point. Furthermore, in this dataset, the count of electricity theft users was relatively low, which could have made the model inefficient at identifying theft users. This class imbalance scenario was addressed through synthetic data generation. Finally, the results obtained indicate the proposed scheme can classify both the majority class (normal users) and the minority class (electricity theft users) with good accuracy.
Article
Full-text available
Non-technical losses (NTLs) have been a major concern for power distribution companies (PDCs). Billions of dollars are lost each year due to fraud in billing, metering, and illegal consumer activities. Various studies have explored different methodologies for efficiently identifying fraudster consumers. This study proposes a new approach for NTL detection in PDCs by using the ensemble bagged tree (EBT) algorithm. The bagged tree is an ensemble of many decision trees which considerably improves on the classification performance of individual decision trees by combining their predictions to reach a final decision. This approach relies on consumer energy usage data to identify any abnormality in consumption which could be associated with NTL behavior. The key motive of the current study is to provide assistance to the Multan Electric Power Company (MEPCO) in Punjab, Pakistan for its campaign against energy stealers. The model developed in this study generates the list of suspicious consumers with irregularities in consumption data to be further examined on-site. The accuracy of the EBT algorithm for NTL detection is found to be 93.1%, which is considerably higher compared to conventional techniques such as support vector machine (SVM), k-nearest neighbor (KNN), decision trees (DT), and random forest (RF) algorithm.
Article
Full-text available
1. Species distribution models are used to study biogeographic patterns and guide decision‐making. The variable quality of these models makes it critical to assess whether a model's outputs are suitable for the intended use, but commonly‐used evaluation approaches are inappropriate for many ecological contexts. In particular, unrealistically high performance assessments have been associated with models for rare species and predictions over large geographic extents.
2. We evaluated the area under the precision‐recall curve (AUC‐PR) as a performance metric for rare binary events, focusing on assessment of species distribution models. Precision is the probability that a species is present given a predicted presence, while recall (more commonly called sensitivity) is the probability the model predicts presence in locations where the species has been observed. We simulated species at three levels of prevalence, compared AUC‐PR and the area under the receiver operating characteristic curve (AUC‐ROC) when the geographic extent of predictions was increased, and assessed how well each metric reflected a model's utility to guide surveys for new populations.
3. AUC‐PR was robust to species rarity and, unlike AUC‐ROC, not affected by an increasing geographic extent. The major advantages of AUC‐PR arise because it does not incorporate correctly predicted absences, and is therefore less prone to exaggerate model performance for unbalanced datasets. AUC‐PR and precision were useful indicators of a model's utility for guiding surveys.
4. We show that AUC‐PR has important advantages for evaluating models of rare species, and its benefits in the context of unbalanced binary responses will make it applicable for other ecological studies.
By not considering the true negative quadrant of the confusion matrix, AUC‐PR ameliorates issues that arise when the geographic extent is increased beyond the species’ range or when a large number of background points are used when absence information is unavailable. However, no single metric captures all aspects of performance, nor provides an absolute index that can be compared across datasets. Our results indicate AUC‐PR and precision can provide useful and intuitive metrics for evaluating a model's utility for guiding sampling, and can complement other metrics to help delineate a model's appropriate use. This article is protected by copyright. All rights reserved.
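The contrast between the two metrics can be illustrated with a small Python sketch. It is not the authors' evaluation code: it assumes the step-wise "average precision" estimate of AUC-PR and the pairwise-ranking form of AUC-ROC, and the labels and scores are invented for illustration.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Ties in scores are broken by input order, which is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: two positives among five samples.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5]

# Add 95 easily rejected negatives, mimicking a larger "extent" of true absences.
wide_labels = labels + [0] * 95
wide_scores = scores + [0.1] * 95

ap_small, ap_wide = average_precision(labels, scores), average_precision(wide_labels, wide_scores)
roc_small, roc_wide = roc_auc(labels, scores), roc_auc(wide_labels, wide_scores)
```

In this toy case the added negatives all rank below every positive, so the average precision is unchanged while AUC-ROC rises, which is the behaviour the abstract attributes to ignoring true negatives.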
Article
Full-text available
To detect energy theft attacks in Advanced Metering Infrastructure (AMI), we propose a detection method based on principal component analysis (PCA) approximation. PCA approximation reduces the dimensionality of high-dimensional AMI data and extracts the underlying consumption trends of a consumer that repeat on a daily or weekly basis. AMI data is reconstructed using principal components (PCs) and used for computing relative entropy. In the proposed method, relative entropy is used to measure the similarity between two probability distributions derived from the reconstructed consumption dataset. When energy theft attacks are injected into AMI, the probability distribution of energy consumption deviates from the historical consumption, leading to a larger relative entropy. The proposed detection method is tested under different attack scenarios using real smart meter data. Test results show that the proposed method can detect theft attacks with a high detection percentage.
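The relative entropy step can be sketched in Python as follows. This sketch leaves out the PCA reconstruction entirely and works on raw readings; the bin edges, the smoothing constant `eps`, and the 0.4 scaling attack are assumptions made for illustration, not details from the paper.

```python
import math


def histogram(values, edges, eps=1e-6):
    """Smoothed probability distribution of values over bins given by edges.

    Values outside [edges[0], edges[-1]] are ignored. The small eps keeps every
    bin non-zero so the logarithm below is always defined.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for b in range(len(counts)):
            is_last = b == len(counts) - 1
            if edges[b] <= v < edges[b + 1] or (is_last and v == edges[-1]):
                counts[b] += 1
                break
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Invented readings (kWh): historical consumption and a scaled-down "attacked" copy.
history = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1]
attacked = [0.4 * v for v in history]
edges = [0.0, 0.5, 1.0, 1.5]

baseline = relative_entropy(histogram(history, edges), histogram(history, edges))
divergence = relative_entropy(histogram(attacked, edges), histogram(history, edges))
```

A detector built on this idea would flag a consumer when `divergence` exceeds a threshold; how that threshold is chosen is not described in the abstract.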
Article
Full-text available
The two-way flow of information and energy is an important feature of the Energy Internet. Data analytics is a powerful tool in the information flow that aims to solve practical problems using data mining techniques. As the problem of electricity thefts via tampering with smart meters continues to increase, the abnormal behaviors of thefts become more diversified and more difficult to detect. Thus, a data analytics method for detecting various types of electricity thefts is required. However, the existing methods either require a labeled dataset or additional system information which is difficult to obtain in reality or have poor detection accuracy. In this paper, we combine two novel data mining techniques to solve the problem. One technique is the Maximum Information Coefficient (MIC), which can find the correlations between the non-technical loss (NTL) and a certain electricity behavior of the consumer. MIC can be used to precisely detect thefts that appear normal in shapes. The other technique is the clustering technique by fast search and find of density peaks (CFSFDP). CFSFDP finds the abnormal users among thousands of load profiles, making it quite suitable for detecting electricity thefts with arbitrary shapes. Next, a framework for combining the advantages of the two techniques is proposed. Numerical experiments on the Irish smart meter dataset are conducted to show the good performance of the combined method.
Article
Full-text available
Electricity theft can be harmful to power grid suppliers and cause economic losses. By integrating information flows with energy flows, smart grids can help to solve the problem of electricity theft owing to the availability of the massive data they generate. Analysis of smart grid data is helpful in detecting electricity theft because energy thieves show abnormal electricity consumption patterns. However, the existing methods have poor detection accuracy since most of them were conducted on one-dimensional (1-D) electricity consumption data and failed to capture the periodicity of electricity consumption. In this paper, we propose a novel electricity-theft detection method based on a Wide & Deep Convolutional Neural Network (CNN) model to address the above concerns. In particular, the Wide & Deep CNN model consists of two components: the Wide component and the Deep CNN component. The Deep CNN component can accurately identify the non-periodicity of electricity theft and the periodicity of normal electricity usage based on two-dimensional (2-D) electricity consumption data. Meanwhile, the Wide component can capture the global features of 1-D electricity consumption data. As a result, the Wide & Deep CNN model achieves excellent performance in electricity-theft detection. Extensive experiments on a realistic dataset show that the Wide & Deep CNN model outperforms other existing methods.
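The abstract does not spell out how the 2-D input is formed. One plausible reading, sketched here in Python as an assumption, is to stack the 1-D daily series into rows of one week each so that the same weekday lines up in a column; `to_weekly_matrix` is a hypothetical helper name.

```python
def to_weekly_matrix(daily, days_per_week=7):
    """Reshape a 1-D daily series into rows of one week each.

    Assumption: a trailing incomplete week is dropped rather than padded.
    """
    n_weeks = len(daily) // days_per_week
    return [daily[w * days_per_week:(w + 1) * days_per_week] for w in range(n_weeks)]


# Invented series: 16 daily readings give two full weeks, with two days dropped.
daily = [float(d) for d in range(16)]
weekly = to_weekly_matrix(daily)

# Column 0 now holds the same weekday across weeks, which exposes weekly periodicity.
first_weekday = [row[0] for row in weekly]
```

With this layout a 2-D convolution can compare neighbouring days within a week and the same weekday across weeks, whereas the flat 1-D series stays available for the Wide component.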
Article
Full-text available
Detection of non-technical losses (NTL), which include electricity theft, faulty meters or billing errors, has attracted increasing attention from researchers in electrical engineering and computer science. NTLs cause significant harm to the economy, as in some countries they may range up to 40% of the total electricity distributed. The predominant research direction is employing artificial intelligence to predict whether a customer causes NTL. This paper first provides an overview of how NTLs are defined and their impact on economies, which includes loss of revenue and profit of electricity providers and decrease of the stability and reliability of electrical power grids. It then surveys the state-of-the-art research efforts in an up-to-date and comprehensive review of algorithms, features and data sets used. It finally identifies the key scientific and engineering challenges in NTL detection and suggests how they could be addressed in the future.
Conference Paper
Full-text available
Non-technical losses (NTL) such as electricity theft cause significant harm to our economies, as in some countries they may range up to 40% of the total electricity distributed. Detecting NTLs requires costly on-site inspections. Accurate prediction of NTLs for customers using machine learning is therefore crucial. To date, related research largely ignores that the two classes of regular and non-regular customers are highly imbalanced and that NTL proportions may change, and it mostly considers small data sets, which often prevents the results from being deployed in production. In this paper, we present a comprehensive approach to assess three NTL detection models for different NTL proportions in large real world data sets of 100Ks of customers: Boolean rules, fuzzy logic and Support Vector Machine. This work has resulted in appreciable results that are about to be deployed in a leading industry solution. We believe that the considerations and observations made in this contribution are necessary for future smart meter research in order to report their effectiveness on imbalanced and large real world data sets.
Article
Full-text available
Class imbalance is a problem that is common to many application domains. When examples of one class in a training data set vastly outnumber examples of the other class(es), traditional data mining algorithms tend to create suboptimal classification models. Several techniques have been used to alleviate the problem of class imbalance, including data sampling and boosting. In this paper, we present a new hybrid sampling/boosting algorithm, called RUSBoost, for learning from skewed training data. This algorithm provides a simpler and faster alternative to SMOTEBoost, which is another algorithm that combines boosting and data sampling. This paper evaluates the performances of RUSBoost and SMOTEBoost, as well as their individual components (random undersampling, synthetic minority oversampling technique, and AdaBoost). We conduct experiments using 15 data sets from various application domains, four base learners, and four evaluation metrics. RUSBoost and SMOTEBoost both outperform the other procedures, and RUSBoost performs comparably to (and often better than) SMOTEBoost while being a simpler and faster technique. Given these experimental results, we highly recommend RUSBoost as an attractive alternative for improving the classification performance of learners built using imbalanced data.
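The sampling half of RUSBoost can be sketched in Python as below. This covers only random undersampling of the majority class; the boosting loop that would call it in each round is omitted. The function name, the `ratio` parameter, and the toy data are assumptions for illustration.

```python
import random


def random_undersample(X, y, majority_label=0, ratio=1.0, seed=0):
    """Keep every minority sample and a random subset of the majority class.

    ratio is the desired majority-to-minority size after sampling (assumed
    parameter); the kept majority count never exceeds what is available.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label != majority_label]
    majority = [i for i, label in enumerate(y) if label == majority_label]
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    kept = rng.sample(majority, n_keep)
    indices = sorted(minority + kept)
    return [X[i] for i in indices], [y[i] for i in indices]


# Toy data: 90 normal consumers (label 0) and 10 theft cases (label 1).
X = list(range(100))
y = [0] * 90 + [1] * 10
X_balanced, y_balanced = random_undersample(X, y)
```

In a boosting setting a fresh subset would typically be drawn in every round, for example by varying `seed`, so that different majority samples are seen across rounds even though each round trains on a small balanced set.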
Article
Anomaly detection is a long-standing problem in system designation. High-quality anomaly detection can benefit plenty of applications (e.g. system monitoring, disaster precaution and intrusion detection). Most existing anomaly detection algorithms struggle to meet effectiveness and real-time requirements simultaneously. Therefore, in this paper, LGMAD, a real-time anomaly detection algorithm based on Long Short-Term Memory (LSTM) and a Gaussian Mixture Model (GMM), is proposed. Specifically, we evaluate the real-time anomalies of each univariate sensing time-series via an LSTM model, and then a Gaussian Mixture Model is adopted to give a multidimensional joint detection of possible anomalies. Both the NAB dataset and a self-made dataset are employed to verify our approach. Extensive experiments are conducted to demonstrate the superiority of LGMAD compared to existing anomaly detection algorithms.
Article
Non-technical losses in electricity utilities are responsible for major revenue losses. In this paper, we propose a novel end-to-end solution to self-learn the features for detecting anomalies and frauds in smart meters using a hybrid deep neural network. The network is fed with simple raw data, removing the need for handcrafted feature engineering. The proposed architecture consists of a long short-term memory network and a multi-layer perceptron network. The first network analyses the raw daily energy consumption history whilst the second one integrates non-sequential data such as its contracted power or geographical information. The results show that the hybrid neural network significantly outperforms state-of-the-art classifiers as well as previous deep learning models used in non-technical losses detection. The model has been trained and tested with real smart meter data of Endesa, the largest electricity utility in Spain.
Article
Recently, the radical digital transformation has deeply affected the traditional electricity grid and transformed it into an intelligent network (smart grid). This mutation is based on the progressive development of advanced technologies: advanced metering infrastructure (AMI) and smart meters, which play a crucial role in the development of the smart grid. AMI technologies have a promising potential in terms of improvement in energy efficiency, better demand management, and reduction in electricity costs. However, the possibility of hacking smart meters and electricity theft is still among the most significant challenges facing electricity companies. In this regard, we propose a hybrid approach to detect anomalies associated with electricity theft in the AMI system, based on a combination of two robust machine learning algorithms: K-means and Deep Neural Network (DNN). The K-means unsupervised machine learning algorithm is used to identify groups of customers with similar electricity consumption patterns to understand different types of normal behavior. The DNN algorithm is used to build an accurate anomaly detection model capable of detecting changes or anomalies in usage behavior and deciding whether the customer has a normal or malicious consumption behavior. The proposed model is constructed and evaluated based on a real dataset from the Irish Smart Energy Trials. The results show a high performance of the proposed model compared to the models mentioned in the literature. Keywords: Anomaly detection, advanced metering infrastructure (AMI), smart grid, behavior, machine learning, deep neural network (DNN), cyber-security.
Article
The reduction of non-technical losses is a significant part of the total potential benefits resulting from implementations of the smart grid concept. This paper proposes a data-based method to detect sources of theft and other commercial losses. Prototypes of typical consumption behavior are extracted through clustering of data collected from smart meters. A distance-based novelty detection framework classifies new data samples as malign if their distance to the typical consumption prototypes is significant. The proposed method works on the space of four different indicators of irregular consumption, enabling the easy interpretation of results. A use case based on real data is presented to evaluate the method. The threat model considers sixteen different possible types of changes in consumption pattern that result from non-technical losses, including attacks and defects present since the first day of metering. The proposed clustering-based novelty detection method for identification of non-technical losses, using the Gustafson-Kessel fuzzy clustering algorithm, achieves a true positive rate of 63.6% and false positive rate of 24.3%, outperforming other state-of-the-art unsupervised learning methods.
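The distance-based decision rule and the two reported rates can be sketched in Python as follows. The paper uses Gustafson-Kessel fuzzy clustering, whose distance adapts to each cluster's shape; this sketch swaps in plain Euclidean distance to fixed prototypes, and the prototypes, threshold, and labels are invented.

```python
import math


def is_novel(sample, prototypes, threshold):
    """Flag a sample whose distance to the nearest prototype exceeds threshold.

    Simplification: Euclidean distance, not the Gustafson-Kessel adaptive norm.
    """
    return min(math.dist(sample, p) for p in prototypes) > threshold


def detection_rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Invented prototypes of typical consumption in a 2-D indicator space.
prototypes = [(0.0, 0.0), (10.0, 10.0)]
samples = [(0.5, 0.5), (5.0, 5.0), (9.0, 10.5), (20.0, 0.0)]
flags = [int(is_novel(s, prototypes, threshold=2.0)) for s in samples]

# Invented ground truth, for the rate computation only.
tpr, fpr = detection_rates([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
```

Raising the threshold lowers both rates together, so a reported pair such as the abstract's 63.6% and 24.3% reflects one operating point on that trade-off.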
Article
In this paper, customers' invoiced energy time series from the recent past are analyzed in order to detect the cause of increased total electricity losses in the low voltage network. The recent past covers the period from 2003 to 2017. This period is organized in five successive decades whose beginnings are shifted by one year. Fuzzy logic is used as a method for determining a set of suspicious electricity customers in the first phase of electricity fraud detection. At this phase, every customer located in the area of increased total electricity losses is analyzed. For each customer, a time series of invoiced energy is formed. Selected time series data and their relations are used to create fuzzy sets of suspicion. Then, the total suspicion value of each customer is determined by using fuzzy logic. Based on the estimated total and technical energy losses in the customer’s area (the region that is supplied by one or more MV/LV transformer stations) and the balance of total energy, invoiced energy and energy of losses, a boundary value of suspicion percentage is determined. All customers whose percentage suspicion value is greater than the boundary value are declared suspect. Thus, suspicious customers and their locations which need inspection are obtained. On-site inspection of suspected customers is not performed and is not the subject of this paper.
Article
Practical building operations usually deviate from the designed building operational performance due to the wide existence of operating faults and improper control strategies. Great energy saving potential can be realized if inefficient or faulty operations are detected and amended in time. The vast amounts of building operational data collected by the Building Automation System have made it feasible to develop data-driven approaches to anomaly detection. Compared with supervised analytics, unsupervised anomaly detection is more practical in analyzing real-world building operational data, as anomaly labels are typically not available. Autoencoder is a very powerful method for the unsupervised learning of high-level data representations. Recent development in deep learning has endowed autoencoders with even greater capability in analyzing complex, high-dimensional and large-scale data. This study investigates the potential of autoencoders in detecting anomalies in building energy data. An autoencoder-based ensemble method is proposed while providing a comprehensive comparison on different autoencoder types and training schemes. Considering the unique learning mechanism of autoencoders, specific methods have been designed to evaluate the autoencoder performance. The research results can be used as foundation for building professionals to develop advanced tools for anomaly detection and performance benchmarking.
Technical Report
TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
Article
The Endesa Company is the main power utility in Spain. One of the main concerns of power distribution companies is energy loss, both technical and non-technical. A non-technical loss (NTL) in power utilities is defined as any consumed energy or service that is not billed by some type of anomaly. The NTL reduction in Endesa is based on the detection and inspection of the customers that have null consumption during a certain period. The problem with this methodology is the low rate of success of these inspections. This paper presents a framework and methodology, developed as two coordinated modules, that improves this type of inspection. The first module is based on a customer filtering based on text mining and a complementary artificial neural network. The second module, developed from a data mining process, contains a Classification & Regression tree and a Self-Organizing Map neural network. With these modules, the success of the inspections is multiplied by 3. The proposed framework was developed as part of a collaboration project with Endesa.
Article
According to the Brazilian Electricity Regulatory Agency, Brazil lost approximately US$ 4 billion to commercial losses during 2011, which corresponds to more than 27,000 GWh. With the strengthening of the Smart Grid, a considerable amount of research has appeared, mainly on the application of several artificial intelligence techniques to automatically detect commercial losses, but the problem of selecting the most representative features has not been widely discussed. In this paper, we draw a parallel between the problem of commercial losses in Brazil and the task of characterizing irregular consumers by means of a recent meta-heuristic optimization technique called the Black Hole Algorithm. The experimental setup is conducted over two private datasets (commercial and industrial) provided by a Brazilian electric utility, and it shows the importance of selecting the most relevant features in the context of theft characterization.
Entropy-based electricity theft detection in AMI network
  • S K Singh
  • R Bose
  • A Joshi
Modeling tabular data using conditional GAN
  • L Xu
  • M Skoularidou
  • A Cuesta-Infante
  • K Veeramachaneni