Conference Paper

Electricity Theft Detection using Pipeline in Machine Learning


... Supervised ML-based ETD solutions, on the other hand, employ prelabelled SM data (i.e., "Normal" and "Theft") to train classifier models to identify electricity theft (Anwar et al., 2020; Hasan et al., 2019; Zheng et al., 2018). The generalisation made by the model on the data during the model training stage categorises unlabelled data into different groups, reducing the cost and effort associated with site visits. ...
... The authors (Anwar et al., 2020) propose a machine-learning pipeline of SMOTE, KPCA, and SVM. The proposed ETD pipeline by Anwar et al. (2020) first handles data class imbalance using the Synthetic Minority Oversampling Technique (SMOTE), then integrates a kernel function and principal component analysis (KPCA) to extract relevant features from the dataset. The support vector machine (SVM) is used as a classifier, obtaining 88%, 85%, 87%, 86%, and 89% for theft detection rate or recall (DR), precision, F1-score (F1), Matthews correlation coefficient, and area under the receiver operating characteristic curve, respectively. ...
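The metrics named in the snippet above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using only the Python standard library; the function name and the counts are hypothetical and are not taken from the cited paper, and "theft" is assumed to be the positive class.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return recall, precision, F1 and MCC from confusion-matrix counts.

    tp/fp/tn/fn are counts with "theft" treated as the positive class.
    Each ratio falls back to 0.0 when its denominator is zero.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # detection rate (DR)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0   # Matthews correlation
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc}


# Hypothetical counts for an imbalanced test set (100 theft, 900 normal).
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
print(metrics)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells, which is one reason it is often reported alongside recall on imbalanced theft datasets.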
Article
Full-text available
Electricity ranks among the world’s most plundered commodities. The fraudulent act of acquiring electrical power without paying for it is termed electricity theft. Electricity theft is captured in power distribution systems as non-technical losses (NTL), representing a major loss in revenue for power utility companies. Electricity theft has far-reaching financial consequences owing to unrealised revenue, and this has a knock-on effect on both developed and developing countries because electricity represents a major part of a country’s GDP and facilitates other industries. AMI-based smart energy meters (SM) gather large amounts of electricity consumption (EC) data that power utilities can utilise to monitor and detect fraudulent customers. This EC data is fed to a machine learning (ML) based electricity theft detection model to learn the behaviour of fraudulent customers. However, existing ML-based electricity theft detection (ETD) models do not produce the best outcomes because of consecutive missing values in EC datasets, data class imbalance problems, inappropriate hyperparameter tuning of ML models, etc. This research introduces an ETD model using an extremely randomised trees classifier to detect electricity theft in smart grids efficiently. SMOTE Tomek sampling is used to deal with the data class imbalance, and the grid search optimisation technique is employed to optimise the hyperparameters of the proposed model. The proposed model shows its capacity to detect electricity theft by obtaining 98%, 95.06%, 98%, 97%, 98%, and 99.65% for accuracy, Matthews correlation coefficient, detection rate, precision, F1-score, and area under the receiver operating characteristic curve, respectively.
... The ultimate goal of electricity thieves is to consume energy without being billed by utility companies [3], or pay the bills amounting to less than the consumed amount [4]. As a result, utility companies suffer a huge revenue loss due to electricity theft. ...
... Due to high-cost demand in the above categories, many researchers work on data-driven methods to overcome the electricity theft problem. For instance, the authors in [3] designed an electricity theft detection system by employing three algorithms in the pipeline: Synthetic Minority Over-sampling Technique (SMOTE), Kernel function and Principal Component Analysis (KPCA) and SVM. They used SMOTE to generate synthetic data for balancing an unbalanced dataset, KPCA to extract features and SVM for classification. ...
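The balancing step described above can be illustrated with a simplified SMOTE-style oversampler. This is a sketch of the general idea only (interpolating between a minority sample and one of its nearest minority neighbours), not the implementation used in [3]; the function name, the parameters `k` and `seed`, and the toy data are assumptions made for illustration.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation.

    minority: list of feature vectors (lists of floats), at least two samples.
    For each new point, pick a random minority sample, pick one of its k
    nearest minority neighbours, and interpolate at a random position
    on the line segment between the two.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Hypothetical two-feature "theft" samples.
theft = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
print(smote_sketch(theft, n_new=3, k=2))
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region already covered by the minority class. In practice, oversampling is normally applied to the training split only, so that synthetic points do not leak into validation or test data.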
... They varied training and validation data ratios to obtain a maximum AUC value of 79%. By utilizing the same dataset used in [3] and [4], the method we present in this paper achieves AUC results beyond 90% on both validation and testing. ...
Article
Full-text available
Electricity theft is a global problem that negatively affects both utility companies and electricity users. It destabilizes the economic development of utility companies, causes electric hazards and raises the cost of energy for users. The development of smart grids plays an important role in electricity theft detection since they generate massive data that includes customer consumption data which, through machine learning and deep learning techniques, can be utilized to detect electricity theft. This paper introduces a theft detection method that uses comprehensive features in time and frequency domains in a deep neural network-based classification approach. We address dataset weaknesses such as missing data and class imbalance problems through data interpolation and synthetic data generation processes. We analyze and compare the contribution of features from both time and frequency domains, run experiments in combined and reduced feature space using principal component analysis and finally incorporate a minimum redundancy maximum relevance scheme for validating the most important features. We improve the electricity theft detection performance by optimizing hyperparameters using a Bayesian optimizer and we employ an adaptive moment estimation optimizer to carry out experiments using different values of key parameters to determine the optimal settings that achieve the best accuracy. Lastly, we show the competitiveness of our method in comparison with other methods evaluated on the same dataset. On validation, we obtained 97% area under the curve (AUC), which is 1% higher than the best AUC in existing works, and 91.8% accuracy, which is the second-best on the benchmark.
... This work is an extension of [35]. The main contributions of this work are listed below. ...
... So, the ratio is 1:10, which shows that the data set is highly imbalanced. The dataset cannot be used without preprocessing, as an imbalanced dataset negatively affects the performance of a classifier [35]. ...
... • Support Vector Machine: it is widely used for both regression and classification problems, as it is a flexible and powerful supervised algorithm [59], [35]. Table 6 shows the values of parameters used for simulations of SVM. ...
Article
Full-text available
In smart grids, electricity theft is the most significant challenge. It cannot be identified easily since existing methods are dependent on specific devices. Also, the methods fall short in extracting meaningful information from high-dimensional electricity consumption data and increase the false positive rate, which limits their performance. Moreover, imbalanced data is a hurdle to accurate electricity theft detection (ETD) using data-driven methods. To address this problem, sampling techniques are used in the literature. However, the traditional sampling techniques generate insufficient and unrealistic data that degrade the ETD rate. In this work, two novel ETD models are developed. A hybrid sampling approach, i.e., synthetic minority oversampling technique with edited nearest neighbor, is introduced in the first model. Furthermore, AlexNet is used for dimensionality reduction and extracting useful information from electricity consumption data. Finally, a light gradient boosting model is used for classification. In the second model, a conditional Wasserstein generative adversarial network with gradient penalty is used to capture the real distribution of the electricity consumption data. It is constructed by adding auxiliary provisional information to generate more realistic data for the minority class. Moreover, the GoogLeNet architecture is employed to reduce the dataset’s dimensionality. Finally, adaptive boosting is used for classification of honest and suspicious consumers. Both models are trained and tested using real power consumption data provided by the State Grid Corporation of China. The proposed models’ performance is evaluated using different performance metrics like precision, recall, accuracy, F1-score, etc. The simulation results show that the proposed models outperform the existing techniques, such as support vector machine, extreme gradient boosting, convolutional neural network, etc., in terms of efficient ETD.
... Support Vector Machine (SVM) is the most commonly used technique for electricity theft detection to achieve a high detection rate and fewer false alarms. Certain aspects of the electricity consumption data such as historical consumption data (location, seasonality, and category), load profile information, identification of consumers with a high probability of abnormal behaviour, and high dimensional data have been explored well using SVMs [9], Genetic algorithm-based SVM [5], fuzzy-based SVMs [10], and PCA based SVMs [11]. Electricity thieves have also been identified by analyzing their load profiles at different hierarchies of the power grid (transmission, distribution, and consumer) using hybrid SVM models such as decision tree-based SVMs [12], decision trees-k-nearest neighbour SVMs [13], Extreme learning machine (ELM), online sequential ELMs [14], and even multi-class SVMs [15]. ...
... Ignoring such missing values might lead to downsizing the dataset, which poses a significant challenge in carrying out reliable analysis. Previous works [11], [23]-[25] have used linear interpolation, the mean of the previous and following day's consumption, filling with the mean or median of a complete column, and dropping rows which have missing values beyond a certain threshold. Such methods perform well for isolated ... RQ1: How to handle large gaps, i.e., consecutive missing values, in time series data with high seasonal trends in an effective way? ...
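Of the simple imputation strategies listed in the snippet above, linear interpolation is the easiest to make concrete. The sketch below is an assumption-laden illustration (missing readings are represented as `None`, and gaps at the edges are filled with the nearest known value); it is not the procedure from any of the cited works.

```python
def interpolate_gaps(series):
    """Fill None entries by linear interpolation between known neighbours.

    Leading and trailing gaps, which have only one known neighbour, are
    filled with the nearest known value. A series with no known values
    is returned unchanged.
    """
    values = list(series)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return values
    first, last = known[0], known[-1]
    for i in range(first):                      # leading gap
        values[i] = values[first]
    for i in range(last + 1, len(values)):      # trailing gap
        values[i] = values[last]
    for left, right in zip(known, known[1:]):   # interior gaps
        step = values[right] - values[left]
        span = right - left
        for i in range(left + 1, right):
            values[i] = values[left] + step * (i - left) / span
    return values


# Hypothetical daily consumption with missing readings.
daily_kwh = [None, 2.0, None, None, 5.0, None]
print(interpolate_gaps(daily_kwh))
```

A straight line across a long run of consecutive missing days erases any weekly or seasonal shape inside the gap, which is the limitation the research question in the snippet is concerned with.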
Conference Paper
Full-text available
Artificial intelligence-based techniques applied to the electricity consumption data generated from the smart grid prove to be an effective solution in reducing Non-Technical Losses (NTLs), thereby ensuring the safety, reliability, and security of smart energy systems. However, imbalanced data, consecutive missing values, large training times, and complex architectures hinder the real-time application of electricity theft detection models. In this paper, we present EnsembleNTLDetect, a robust and scalable electricity theft detection framework that employs a set of efficient data pre-processing techniques and machine learning models to accurately detect electricity theft by analysing consumers' electricity consumption patterns. This framework utilises an enhanced Dynamic Time Warping Based Imputation (eDTWBI) algorithm to impute missing values in the time series data and leverages the Near-miss undersampling technique to generate balanced data. Further, a stacked autoencoder is introduced for dimensionality reduction and to improve training efficiency. A Conditional Generative Adversarial Network (CTGAN) is used to augment the dataset to ensure robust training and a soft voting ensemble classifier is designed to detect the consumers with aberrant consumption patterns. Furthermore, experiments were conducted on the real-time electricity consumption data provided by the State Grid Corporation of China (SGCC) to validate the reliability and efficiency of EnsembleNTLDetect over the state-of-the-art electricity theft detection models in terms of various quality metrics.
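Soft voting, as mentioned in the abstract above, averages the class probabilities predicted by each base model and picks the class with the highest mean. The sketch below shows the general technique only; the base models, weights, and probabilities are hypothetical and do not reproduce the EnsembleNTLDetect configuration.

```python
def soft_vote(probabilities, weights=None):
    """Combine per-model class probabilities by (weighted) averaging.

    probabilities: one list per model, e.g. [p_normal, p_theft].
    weights: optional per-model weights; equal weights by default.
    Returns (predicted_class_index, averaged_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: averaged[c])
    return label, averaged


# Hypothetical outputs of three base classifiers for one consumer:
# two lean slightly towards "normal" (index 0), one is confident of "theft".
model_outputs = [[0.55, 0.45], [0.52, 0.48], [0.10, 0.90]]
label, averaged = soft_vote(model_outputs)
print(label, averaged)
```

In this example a majority (hard) vote would choose class 0, whereas soft voting chooses class 1 because it takes each model's confidence into account.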
... In the last few decades, electrical power has become a backbone for the development of any country [112,113,114]. It has the potential to either raise or reduce the country's economy. ...
... However, the existing methods for detecting electricity theft are diverse and are complicated by the unbalanced nature of the data sets. The work of [17] addresses these difficulties by developing a model that integrates three algorithms: first, the synthetic minority oversampling technique (SMOTE) to balance the data set; second, kernel principal component analysis (KPCA), which combines a kernel function with principal component analysis, for feature extraction from high-dimensional time-series data; and third, a Support Vector Machine (SVM) for classification. ...
Article
Full-text available
Energy theft and defective meters not only lead to non-technical losses (NTLs) that are extremely detrimental to energy distributors and power infrastructure, but NTLs are also a major cause of damages to electricity and massive losses of revenue each year. Automatic meter reading (AMR) is a system used by the Provincial Electricity Authority (PEA) of Thailand for NTL detection that works in conjunction with physical inspection. At present, AMR alone cannot classify and identify the types of abnormalities that occur. In addition, another important issue that has rarely been studied and should not be neglected is the balancing of data. Because imbalance has a negative impact on minority classes, learners misclassify them, which leads to incorrect predictions. This paper proposes unbalanced data techniques for classifying energy theft, defective and normal customers based on AMR monitoring of PEA. In order to handle the multiclass imbalance problems, three methods are evaluated and compared: anomaly models (AM), adaptive synthetic sampling (ADASYN), and image data augmentation (IA). The data were converted from time series into images using a recurrence plot (RP) and classified as abnormal patterns of imaging time series using six deep learning models: LeNet5, AlexNet, VGGNet19, DenseNet121, ResNet50, and InceptionV3. The experimental results demonstrate that data generation techniques using anomaly models and DenseNet121 for classifying provided the best results compared to other techniques, and data extraction using images yields better results than time series. Moreover, compared to balanced and unbalanced data, classification evaluation using AUC-ROC and F1-score are the most appropriate evaluation methods, and importantly, balancing the data before classification improved the model performance.
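A recurrence plot, as used in the abstract above to turn a time series into an image, marks every pair of time steps whose values are close to each other. The sketch below shows the simplest thresholded form on a scalar series; the threshold `eps` and the sample readings are assumptions, and the paper's exact construction (for example, any time-delay embedding) is not given in the abstract.

```python
def recurrence_plot(series, eps):
    """Return a binary recurrence matrix for a scalar time series.

    Entry [i][j] is 1 when readings i and j differ by at most eps,
    otherwise 0. The matrix is symmetric with ones on the diagonal.
    """
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical meter readings with one abnormal spike at index 2.
readings = [1.0, 1.1, 3.0, 1.05]
for row in recurrence_plot(readings, eps=0.2):
    print(row)
```

The resulting matrix can be treated as a single-channel image, which is what allows image classifiers such as the CNN architectures listed in the abstract to be applied to consumption data.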
Article
Full-text available
Despite the widespread use of artificial intelligence-based methods in detecting electricity theft by smart grid customers, current methods suffer from two main flaws: a limited amount of data on electricity theft customers compared to that on normal customers and an imbalanced dataset that can significantly affect the accuracy of the detection method. Additionally, most existing methods for detecting electricity theft rely solely on one-dimensional electricity consumption data, which fails to capture the periodicity of consumption and overlooks the temporal correlation of customers’ electricity consumption based on their weekly, monthly, or other time scales. To address the mentioned issues, this paper proposes a novel approach that first employed a time series generative adversarial network to balance the dataset by generating synthetic data for electricity theft customers. Then, a hybrid multi-time-scale neural network-based model was utilized to extract customers’ features and a CatBoost classifier was applied to achieve classification. Experiments were conducted on a real-world smart meter dataset obtained from the State Grid Corporation of China. The results demonstrated that the proposed method could detect electricity theft by customers with a precision rate of 96.64%, a recall rate of 96.87%, and a significantly reduced false detection rate of 3.77%.
Thesis
Full-text available
Instead of planting new electricity generation units, there is a need to design an efficient energy management system to achieve a normalized trend of power consumption. The smart grid has evolved as a solution, where a Demand Response (DR) strategy is used to modify the consumer's nature of demand. In return, utilities pay incentives to the consumer. This concept is equally applicable to residential and commercial areas; however, the increasing load demand and irregular electricity load profile in residential areas have encouraged us to propose an efficient home energy management system for optimal scheduling of home appliances. Electricity consumers have a stochastic nature, for which nature-inspired optimization techniques provide optimal solutions. However, these optimization techniques behave stochastically according to the situation. For this reason, we have proposed different optimization techniques for different scenarios. The objectives of this thesis include: reduction in electricity bill and peak to average ratio, minimization of waiting time to start appliances (comfort maximization) and minimization of wastage of surplus energy by exploiting the coordination among appliances and homes. In order to meet the electricity demand of the consumers, the energy consumption patterns of a consumer are maintained through scheduling the appliances on day-ahead and real-time bases. This is made possible by the defined fitness criterion for the proposed hybrid bacterial foraging genetic algorithm and hybrid elephant adaptive cuckoo search optimization techniques, which helps in balancing the load during On-peak and Off-peak hours. Moreover, the concept of coordination and coalition among home appliances is presented for real-time scheduling. The fitness criterion helps the scheduler to optimally decide the ON/OFF status of appliances in order to reduce the waiting time of the appliance.
A multi-objective optimization based solution is proposed to resolve the trade-off between conflicting objectives: electricity bill, waiting time of appliances and electricity load shifting according to the defined electricity load pattern. Two optimization techniques, binary multi-objective bird swarm optimization and a hybrid of bird swarm and cuckoo search algorithms, are proposed to obtain the Pareto front. The main objective of DR is to encourage the consumer to shift the peak load and get incentives in terms of cost reduction. However, prices remain the same for all the users whether or not they shift the peak load. In this thesis, a Game Theory (GT) based Time of Use pricing model is presented to define the pricing strategy for On-peak and Off-peak hours. The price is defined for each user according to the utilized load using coalitional GT. Further, the proposed pricing model is analyzed for scheduled and unscheduled load. In this regard, Salp swarm and rainfall algorithms are used for scheduling of appliances and an aggregated fitness criterion is defined for load shifting to avoid the peak rebound effect. We also propose the coordination and coalition based Energy Management System-as-a-Service on Fog (EMSaaS_Fog). With the increase in the number of electricity consumers, the computational complexity of the energy management system is becoming a threat to the efficiency of a system in a real-time environment. To deal with this dilemma, the utility shifts computational and storage units to the cloud and fog. The proposed EMSaaS_Fog effectively handles the coalition among the apartments within a building to maintain balance between the demand and supply. Moreover, we consider a small community, which consists of multiple smart homes. A microgrid is installed at each residence for electricity generation. It is connected with the fog server to share and store information. Smart energy consumers are able to share details of excess energy with each other through the fog server.
Article
Full-text available
With the ever-growing demand of electric power, it is quite challenging to detect and prevent Non-Technical Loss (NTL) in power industries. NTL is committed by meter bypassing, hooking from the main lines, reversing and tampering the meters. Manual on-site checking and reporting of NTL remains an unattractive strategy due to the required manpower and associated cost. The use of machine learning classifiers has been an attractive option for NTL detection. It enhances data-oriented analysis and high hit ratio along with less cost and manpower requirements. However, there is still a need to explore the results across multiple types of classifiers on a real-world dataset. This paper considers a real dataset from a power supply company in Pakistan to identify NTL. We have evaluated 15 existing machine learning classifiers across 9 types which also include the recently developed CatBoost, LGBoost and XGBoost classifiers. Our work is validated using extensive simulations. Results elucidate that ensemble methods and Artificial Neural Network (ANN) outperform the other types of classifiers for NTL detection in our real dataset. Moreover, we have also derived a procedure to identify the top-14 features out of a total of 71 features, which are contributing 77% in predicting NTL. We conclude that including more features beyond this threshold does not improve performance and thus limiting to the selected feature set reduces the computation time required by the classifiers. Last but not least, the paper also analyzes the results of the classifiers with respect to their types, which has opened a new area of research in NTL detection.
Article
Full-text available
As one of the major factors of the nontechnical losses (NTLs) in distribution networks, electricity theft causes significant harm to power grids, which influences power supply quality and reduces operating profits. In order to help utility companies solve the problems of inefficient electricity inspection and irregular power consumption, a novel hybrid convolutional neural network-random forest (CNN-RF) model for automatic electricity theft detection is presented in this paper. In this model, a convolutional neural network (CNN) is first designed to learn the features between different hours of the day and different days from massive and varying smart meter data by the operations of convolution and downsampling. In addition, a dropout layer is added to reduce the risk of overfitting, and the backpropagation algorithm is applied to update network parameters in the training phase. Then, the random forest (RF) is trained based on the obtained features to detect whether the consumer steals electricity. To build the RF in the hybrid model, the grid search algorithm is adopted to determine optimal parameters. Finally, experiments are conducted based on real energy consumption data, and the results show that the proposed detection model outperforms other methods in terms of accuracy and efficiency.
Article
Full-text available
Non-technical losses (NTL) caused by faults or electricity theft are greatly harmful to the power grid. Industrial customers consume most of the electrical energy, and it is important to reduce this part of NTL. Currently, most work concentrates on analyzing characteristics of electricity consumption to detect NTL among residential customers. However, the related feature models cannot be adapted to industrial customers because they do not have a fixed electricity consumption pattern. Therefore, this paper starts from the principle of electricity measurement, and proposes a deep learning-based method to extract advanced features from massive smart meter data rather than artificial features. Firstly, we organize electricity magnitudes as one-dimensional sample data and embed the knowledge of electricity measurement in channels. Then, this paper proposes a semi-supervised deep learning model which uses a large amount of unlabeled data and an adversarial module to avoid overfitting. The experimental results show that our approach can achieve satisfactory performance even when trained on very small samples. Compared with the state-of-the-art methods, our method has achieved obvious improvement in all metrics.
Article
Full-text available
Among an electricity provider's non-technical losses, electricity theft has the most severe and dangerous effects. Fraudulent electricity consumption decreases the supply quality, increases generation load, causes legitimate consumers to pay excessive electricity bills, and affects the overall economy. The adaptation of smart grids can significantly reduce this loss through data analysis techniques. The smart grid infrastructure generates a massive amount of data, including the power consumption of individual users. Utilizing this data, machine learning and deep learning techniques can accurately identify electricity theft users. In this paper, an electricity theft detection system is proposed based on a combination of a convolutional neural network (CNN) and a long short-term memory (LSTM) architecture. CNN is a widely used technique that automates feature extraction and the classification process. Since the power consumption signature is time-series data, we were led to build a CNN-based LSTM (CNN-LSTM) model for smart grid data classification. In this work, a novel data pre-processing algorithm was also implemented to compute the missing instances in the dataset, based on the local values relative to the missing data point. Furthermore, in this dataset, the count of electricity theft users was relatively low, which could have made the model inefficient at identifying theft users. This class imbalance scenario was addressed through synthetic data generation. Finally, the results obtained indicate the proposed scheme can classify both the majority class (normal users) and the minority class (electricity theft users) with good accuracy.
Article
Full-text available
Non-technical losses (NTLs) have been a major concern for power distribution companies (PDCs). Billions of dollars are lost each year due to fraud in billing, metering, and illegal consumer activities. Various studies have explored different methodologies for efficiently identifying fraudulent consumers. This study proposes a new approach for NTL detection in PDCs by using the ensemble bagged tree (EBT) algorithm. The bagged tree is an ensemble of many decision trees which considerably improves on the classification performance of individual decision trees by combining their predictions to reach a final decision. This approach relies on consumer energy usage data to identify any abnormality in consumption which could be associated with NTL behavior. The key motive of the current study is to provide assistance to the Multan Electric Power Company (MEPCO) in Punjab, Pakistan for its campaign against energy stealers. The model developed in this study generates the list of suspicious consumers with irregularities in consumption data to be further examined on-site. The accuracy of the EBT algorithm for NTL detection is found to be 93.1%, which is considerably higher compared to conventional techniques such as support vector machine (SVM), k-nearest neighbor (KNN), decision trees (DT), and random forest (RF) algorithm.
Article
Full-text available
With the collection of massive amounts of data every day, big data analytics has emerged as an important trend for many organizations. These collected data can contain important information that may be key to solving wide-ranging problems, such as cyber security, marketing, healthcare, and fraud. To analyze their large volumes of data for business analyses and decisions, large companies, such as Facebook and Google, adopt analytics. Such analyses and decisions impact existing and future technology. In this study, we explore how big data analytics is utilized as a technique for solving problems of complex and unstructured data using such technologies as Hadoop, Spark, and MapReduce. We also discuss the data challenges introduced by big data according to the literature, including its six V’s. Moreover, we investigate case studies of big data analytics on various techniques of such analytics, namely, text, voice, video, and network analytics. We conclude that big data analytics can bring positive changes in many fields, such as education, military, healthcare, politics, business, agriculture, banking, and marketing, in the future.
Article
Full-text available
The two-way flow of information and energy is an important feature of the Energy Internet. Data analytics is a powerful tool in the information flow that aims to solve practical problems using data mining techniques. As the problem of electricity thefts via tampering with smart meters continues to increase, the abnormal behaviors of thefts become more diversified and more difficult to detect. Thus, a data analytics method for detecting various types of electricity thefts is required. However, the existing methods either require a labeled dataset or additional system information which is difficult to obtain in reality or have poor detection accuracy. In this paper, we combine two novel data mining techniques to solve the problem. One technique is the Maximum Information Coefficient (MIC), which can find the correlations between the non-technical loss (NTL) and a certain electricity behavior of the consumer. MIC can be used to precisely detect thefts that appear normal in shapes. The other technique is the clustering technique by fast search and find of density peaks (CFSFDP). CFSFDP finds the abnormal users among thousands of load profiles, making it quite suitable for detecting electricity thefts with arbitrary shapes. Next, a framework for combining the advantages of the two techniques is proposed. Numerical experiments on the Irish smart meter dataset are conducted to show the good performance of the combined method.
Article
Full-text available
The great amount of data collected by the Advanced Metering Infrastructure can help electric utilities to detect energy theft, a phenomenon that globally costs over 25 billion dollars per year. To address this challenge, this paper describes a new approach to non-technical loss analysis in power utilities using a variant of P2P computing that allows frauds to be identified even when not all smart meters are reachable. Specifically, the proposed approach compares data recorded by the smart meters and by the collector in the same neighborhood area and detects the fraudulent customers through the application of a Multiple Linear Regression model. Using real utility data, the regression model has been compared with other data mining techniques such as SVM, neural networks and logistic regression, in order to validate the proposed approach. The empirical results show that the Multiple Linear Regression model can efficiently identify the energy thieves even in areas with meter reachability problems.
Article
Full-text available
Non-technical electricity losses due to anomalies or frauds are accountable for important revenue losses in power utilities. Recent advances have been made in this area, fostered by the roll-out of smart meters. In this paper, we propose a methodology for non-technical loss detection using supervised learning. The methodology has been developed and tested on real smart meter data of all the industrial and commercial customers of Endesa. This methodology uses all the information the smart meters record (energy consumption, alarms and electrical magnitudes) to obtain an in-depth analysis of the customer’s consumption behavior. It also uses auxiliary databases to provide additional information regarding the geographical location and technological characteristics of each smart meter. The model has been trained, validated and tested on the results of approximately 57000 on-field inspections. It is currently in use in a non-technical loss detection campaign for big customers. Several state-of-the-art classifiers have been tested. The results show that extreme gradient boosted trees outperform the rest of the classifiers.
Article
Full-text available
Electricity theft can be harmful to power grid suppliers and cause economic losses. By integrating information flows with energy flows, smart grids can help to solve the problem of electricity theft owing to the availability of massive data generated from smart grids. Analysis of smart grid data is helpful in detecting electricity theft because of the abnormal electricity consumption pattern of energy thieves. However, the existing methods have poor electricity-theft detection accuracy since most of them were conducted on one dimensional (1-D) electricity consumption data and failed to capture the periodicity of electricity consumption. In this paper, we propose a novel electricity-theft detection method based on a Wide & Deep Convolutional Neural Networks (CNN) model to address the above concerns. In particular, the Wide & Deep CNN model consists of two components: the Wide component and the Deep CNN component. The Deep CNN component can accurately identify the non-periodicity of electricity-theft and the periodicity of normal electricity usage based on two dimensional (2-D) electricity consumption data. Meanwhile, the Wide component can capture the global features of 1-D electricity consumption data. As a result, the Wide & Deep CNN model can achieve excellent performance in electricity-theft detection. Extensive experiments based on a realistic dataset show that the Wide & Deep CNN model outperforms other existing methods.
Article
Full-text available
In smart grids, smart meters may potentially be attacked or compromised to cause certain security risks. It is challenging to identify malicious meters when there are a large number of users. In this paper, we explore the malicious meter inspection (MMI) problem in neighborhood area smart grids. We propose a suite of inspection algorithms in a progressive manner. First, we present a basic scanning method, which takes linear time to accomplish inspection. The scanning method is efficient when the malicious meter ratio is high. Then, we propose a binary-tree-based inspection algorithm, which performs better than scanning when the malicious meter ratio is low. Finally, we employ an adaptive-tree-based algorithm, which leverages advantages of both the scanning and binary-tree inspections. Our approaches are tailored to fit both static and dynamic situations. The theoretical and experimental results have shown the effectiveness of the adaptive tree approach.
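The trade-off between scanning and binary-tree inspection described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a hypothetical oracle `group_has_malicious` that reports whether any meter in a group is malicious (for example, by comparing a group total against an aggregate reading), and it simply counts how many inspections each strategy performs.

```python
from typing import Callable, List, Sequence, Tuple


def scan_inspect(
    meters: Sequence[int], is_malicious: Callable[[int], bool]
) -> Tuple[List[int], int]:
    """Linear scan: inspect every meter once, so the cost is len(meters)."""
    found = [m for m in meters if is_malicious(m)]
    return found, len(meters)


def binary_tree_inspect(
    meters: Sequence[int], group_has_malicious: Callable[[List[int]], bool]
) -> Tuple[List[int], int]:
    """Group inspection: test a group, and split it in half only if it is dirty.

    `group_has_malicious` is a hypothetical oracle assumed for this sketch.
    Returns the malicious meters found and the number of group inspections.
    """
    found: List[int] = []
    inspections = 0
    stack = [list(meters)]
    while stack:
        group = stack.pop()
        if not group:
            continue
        inspections += 1
        if not group_has_malicious(group):
            continue  # the whole group is clean, no need to look inside
        if len(group) == 1:
            found.append(group[0])
            continue
        mid = len(group) // 2
        stack.append(group[mid:])  # right half is examined after the left half
        stack.append(group[:mid])
    return found, inspections
```

With a single malicious meter among many, the group strategy prunes clean halves early and needs fewer inspections than the scan; when most meters are malicious, nearly every group must be split and the scan becomes the cheaper option, which matches the behavior the abstract reports.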
Article
Non-technical losses in electricity utilities are responsible for major revenue losses. In this paper, we propose a novel end-to-end solution to self-learn the features for detecting anomalies and frauds in smart meters using a hybrid deep neural network. The network is fed with simple raw data, removing the need for handcrafted feature engineering. The proposed architecture consists of a long short-term memory network and a multi-layer perceptron network. The first network analyses the raw daily energy consumption history whilst the second one integrates non-sequential data such as the contracted power or geographical information. The results show that the hybrid neural network significantly outperforms state-of-the-art classifiers as well as previous deep learning models used in non-technical loss detection. The model has been trained and tested with real smart meter data of Endesa, the largest electricity utility in Spain.
Article
Code-driven systems have extended to more than half of the world’s population through ambient data and connectivity, offering formerly unimagined opportunities and unexpected threats. Advances in Artificial Intelligence (AI) are increasing day by day, especially in industrial builds. The unconventional use of AI in cyber-attacks is quite daunting. The idea of a machine that grows its own knowledge through self-learning and becomes sophisticated enough to attack systems is a worrying problem for the cyber world. Most of the time, these AI-enabled cyber-attacks are performed using advanced malware that incorporates advanced evasion techniques to evade security perimeters. Traditional cyber security methods fail to cope with these attacks. To address these issues, a robust traffic classification system using Principal Component Analysis (PCA) and Artificial Neural Network (ANN) is proposed for providing extreme surveillance. Further, the proposed method aims to expose various AI-based cyber-attacks, their present-day impact, and their future prospects. Simulation is carried out using a self-developed autonomous agent that learns by itself. Experimental results confirm that the proposed scheme classifies attack traffic with 99% accuracy when compared to state-of-the-art methods.
Article
Recently, the radical digital transformation has deeply affected the traditional electricity grid and transformed it into an intelligent network (smart grid). This mutation is based on the progressive development of advanced technologies: advanced metering infrastructure (AMI) and smart meters, which play a crucial role in the development of the smart grid. AMI technologies have promising potential in terms of improvement in energy efficiency, better demand management, and reduction in electricity costs. However, the possibility of hacking smart meters and electricity theft is still among the most significant challenges facing electricity companies. In this regard, we propose a hybrid approach to detect anomalies associated with electricity theft in the AMI system, based on a combination of two robust machine learning algorithms: K-means and Deep Neural Network (DNN). The K-means unsupervised machine learning algorithm is used to identify groups of customers with similar electricity consumption patterns to understand different types of normal behavior. The DNN algorithm is used to build an accurate anomaly detection model capable of detecting changes or anomalies in usage behavior and deciding whether the customer has a normal or malicious consumption behavior. The proposed model is constructed and evaluated based on a real dataset from the Irish Smart Energy Trials. The results show a high performance of the proposed model compared to the models mentioned in the literature. Keywords: Anomaly detection, advanced metering infrastructure (AMI), smart grid, behavior, machine learning, deep neural network (DNN), cyber-security.
Conference Paper
Outlier detection in data streams is used in many applications, such as network flow monitoring, stock trading fluctuation detection and network intrusion detection [1]. These applications require that the algorithms finish outlier detection effectively in a limited amount of time and memory space. Local Outlier Factor (LOF) is a fundamental density-based outlier detection algorithm [2]; it determines whether an object is an outlier by calculating the LOF score of each observation. Many LOF-based algorithms have achieved excellent results in outlier detection in data streams, but most existing LOF-based algorithms suffer from excessive computation. In this paper, we propose a fast outlier detection algorithm for data streams that effectively reduces the LOF calculation over the whole data by Z-score pruning. The algorithm consists of three phases. First, it generates the prediction data through the generator. Second, it judges whether the observation object is a potential outlier by the Z-score of the residual between the original value and the predicted value. Finally, it calculates the LOF of the observation object in the current time window according to the judgment result of the previous step. Experiments show that our algorithm effectively reduces detection time through Z-score pruning while preserving detection accuracy.
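The pruning step in the second phase can be sketched as follows. This is only an illustration under stated assumptions, not the paper's method: the "generator" is replaced by a simple moving average over the previous `window` values, and the names `zscore_candidates`, `window`, and `threshold` are hypothetical. Only the indices returned here would then be passed to the (more expensive) LOF computation.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_candidates(
    stream: Sequence[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return indices whose prediction residual has |z-score| > threshold.

    Assumption: the predictor is a moving average of the previous `window`
    values, standing in for whatever generator the paper actually uses.
    """
    indices: List[int] = []
    residuals: List[float] = []
    for i in range(window, len(stream)):
        predicted = mean(stream[i - window:i])
        indices.append(i)
        residuals.append(stream[i] - predicted)
    if not residuals:
        return []
    mu = mean(residuals)
    sigma = pstdev(residuals)
    if sigma == 0:
        return []  # every residual is identical, nothing stands out
    return [
        i for i, r in zip(indices, residuals)
        if abs((r - mu) / sigma) > threshold
    ]
```

Because most observations fall below the threshold, the LOF score is computed for only a small subset of the window, which is where the time saving described in the abstract would come from.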
Article
Machine learning is a popular topic in data analysis and modeling. Many different machine learning algorithms have been developed and implemented in a variety of programming languages over the past 20 years. In this article, we first provide an overview of machine learning and clarify its difference from statistical inference. Then, we review Scikit-learn, a machine learning package in the Python programming language that is widely used in data science. The Scikit-learn package includes implementations of a comprehensive list of machine learning methods under unified data and modeling procedure conventions, making it a convenient toolkit for educational and behavioral statisticians.
Article
Despite many potential advantages, Advanced Metering Infrastructures have introduced new ways to falsify meter readings and commit electricity theft. This study contributes a new model-agnostic, feature-engineering framework for theft detection in smart grids. The framework introduces a combination of Finite Mixture Model clustering for customer segmentation and a Genetic Programming algorithm for identifying new features suitable for prediction. Utilizing demand data from more than 4000 households, a Gradient Boosting Machine algorithm is applied within the framework, significantly outperforming the results of prior machine-learning, theft-detection methods. This study further examines some important practical aspects of deploying theft detection including: the detection delay; the required size of historical demand data; the accuracy in detecting thefts of various types and intensity; detecting irregular and unseen attacks; and the computational complexity of the detection algorithm.
Article
For smart grid energy theft identification, this letter introduces a gradient boosting theft detector (GBTD) based on the three latest gradient boosting classifiers (GBCs): extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and light gradient boosting machine (LightGBM). While most existing ML algorithms just focus on fine-tuning the hyperparameters of the classifiers, our ML algorithm, GBTD, focuses on feature-engineering-based preprocessing to improve detection performance as well as time complexity. GBTD improves both the detection rate (DR) and the false positive rate (FPR) of those GBCs by generating stochastic features like the standard deviation, mean, minimum, and maximum value of daily electricity usage. GBTD also reduces the classifier complexity with weighted feature-importance (WFI) based extraction techniques. Emphasis has been laid upon the practical application of the proposed ML for theft detection by minimizing FPR, reducing data storage space, and improving the time complexity of the GBTD classifiers. Additionally, this letter proposes an updated version of the existing six theft cases to mimic real-world theft patterns and applies them to the dataset for numerical evaluation of the proposed algorithm.
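The summary features named above (standard deviation, mean, minimum, and maximum of daily usage) are simple to compute. The sketch below is a minimal illustration using only the Python standard library; the function name `daily_usage_features` and the choice of the population standard deviation are assumptions, since the abstract does not say which estimator the authors use.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def daily_usage_features(daily_kwh: Sequence[float]) -> Dict[str, float]:
    """Summarize one customer's daily consumption series as four features.

    Assumption: population standard deviation (pstdev); a sample estimator
    would be an equally plausible reading of the abstract.
    """
    if not daily_kwh:
        raise ValueError("daily_kwh must contain at least one reading")
    return {
        "mean": mean(daily_kwh),
        "std": pstdev(daily_kwh),
        "min": min(daily_kwh),
        "max": max(daily_kwh),
    }
```

In a pipeline of this kind, the four values would typically be appended to each customer's feature vector before training a gradient boosting classifier.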
Article
The reduction of non-technical losses is a significant part of the total potential benefits resulting from implementations of the smart grid concept. This paper proposes a data-based method to detect sources of theft and other commercial losses. Prototypes of typical consumption behavior are extracted through clustering of data collected from smart meters. A distance-based novelty detection framework classifies new data samples as malign if their distance to the typical consumption prototypes is significant. The proposed method works on the space of four different indicators of irregular consumption, enabling the easy interpretation of results. A use case based on real data is presented to evaluate the method. The threat model considers sixteen different possible types of changes in consumption pattern that result from non-technical losses, including attacks and defects present since the first day of metering. The proposed clustering-based novelty detection method for identification of non-technical losses, using the Gustafson-Kessel fuzzy clustering algorithm, achieves a true positive rate of 63.6% and false positive rate of 24.3%, outperforming other state-of-the-art unsupervised learning methods.
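The distance-based novelty test can be sketched in a few lines. This is a simplified illustration rather than the paper's method: the prototypes are assumed to be given (the paper extracts them with Gustafson-Kessel fuzzy clustering, which is not reproduced here), plain Euclidean distance is assumed, and `is_novel` and `threshold` are hypothetical names.

```python
from math import dist
from typing import Sequence


def is_novel(
    sample: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    threshold: float,
) -> bool:
    """Flag a sample whose nearest consumption prototype is farther than threshold.

    Assumptions: prototypes are precomputed cluster centers and the distance
    is Euclidean; both are simplifications made for this sketch.
    """
    if not prototypes:
        raise ValueError("at least one prototype is required")
    nearest = min(dist(sample, p) for p in prototypes)
    return nearest > threshold
```

Raising `threshold` lowers the false positive rate at the cost of missing more thefts, which is the trade-off behind the true and false positive rates quoted in the abstract.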
Article
The illegal use of electricity, defective meters, and a malfunctioning infrastructure are major causes of Non-Technical Losses (NTLs) in electric distribution systems. Although the use of supervised machine learning techniques to detect NTLs has been widely studied, further research is needed in order to address some significant challenges. (i) Given that non-fraudulent consumers remarkably outnumber fraudulent ones, the imbalanced nature of the dataset can have a major negative impact on the performance of supervised machine learning methods. (ii) Given the large number of dimensions present in the time series data used for training and testing classifiers, advanced signal processing techniques are required in order to extract the most relevant information. (iii) The effectiveness of classifiers must be evaluated using meaningful performance measures for imbalanced data. This paper proposes a framework that addresses the three previous challenges. The core of the proposed framework is the application of the Maximal Overlap Discrete Wavelet-Packet Transform (MODWPT) for feature extraction from time series data and the Random Undersampling Boosting (RUSBoost) algorithm for NTL detection. Moreover, our framework is evaluated using an extensive list of performance metrics. Experiments show that the MODWPT combined with the RUSBoost algorithm can significantly improve the quality of NTL predictions.
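Challenge (iii) can be made concrete with a short sketch that computes precision, recall (detection rate), F1-score, and Matthews correlation coefficient from confusion-matrix counts, alongside plain accuracy for comparison. The function name `imbalance_metrics` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
from math import sqrt
from typing import Dict


def imbalance_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute metrics from confusion counts, with theft as the positive class.

    Convention assumed here: any ratio with a zero denominator is reported
    as 0.0 instead of raising an error.
    """
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    mcc = ratio(
        tp * tn - fp * fn,
        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "accuracy": accuracy,
    }
```

For example, a classifier that labels all 100 customers as normal when 10 are thieves (`tp=0, fp=0, fn=10, tn=90`) still reaches 90% accuracy while its recall and MCC are both zero, which is why accuracy alone is a poor measure on imbalanced theft data.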
Article
Practical building operations usually deviate from the designed building operational performance due to the wide existence of operating faults and improper control strategies. Great energy saving potential can be realized if inefficient or faulty operations are detected and amended in time. The vast amounts of building operational data collected by the Building Automation System have made it feasible to develop data-driven approaches to anomaly detection. Compared with supervised analytics, unsupervised anomaly detection is more practical in analyzing real-world building operational data, as anomaly labels are typically not available. The autoencoder is a very powerful method for the unsupervised learning of high-level data representations. Recent developments in deep learning have endowed autoencoders with even greater capability in analyzing complex, high-dimensional and large-scale data. This study investigates the potential of autoencoders in detecting anomalies in building energy data. An autoencoder-based ensemble method is proposed, together with a comprehensive comparison of different autoencoder types and training schemes. Considering the unique learning mechanism of autoencoders, specific methods have been designed to evaluate the autoencoder performance. The research results can be used as a foundation for building professionals to develop advanced tools for anomaly detection and performance benchmarking.
Article
The Endesa Company is the main power utility in Spain. One of the main concerns of power distribution companies is energy loss, both technical and non-technical. A non-technical loss (NTL) in power utilities is defined as any consumed energy or service that is not billed because of some type of anomaly. NTL reduction in Endesa is based on the detection and inspection of the customers that have null consumption during a certain period. The problem with this methodology is the low success rate of these inspections. This paper presents a framework and methodology, developed as two coordinated modules, that improves this type of inspection. The first module filters customers using text mining and a complementary artificial neural network. The second module, developed from a data mining process, contains a Classification & Regression tree and a Self-Organizing Map neural network. With these modules, the success rate of the inspections is multiplied by 3. The proposed framework was developed as part of a collaboration project with Endesa.
Article
According to the Brazilian Electricity Regulatory Agency, Brazil lost approximately US$ 4 billion to commercial losses during 2011, which corresponds to more than 27,000 GWh. The strengthening of the Smart Grid has brought a considerable amount of research, mainly with respect to the application of several artificial intelligence techniques in order to automatically detect commercial losses, but the problem of selecting the most representative features has not been widely discussed. In this paper, we draw a parallel between the problem of commercial losses in Brazil and the task of irregular consumer characterization by means of a recent meta-heuristic optimization technique called the Black Hole Algorithm. The experimental setup is conducted over two private datasets (commercial and industrial) provided by a Brazilian electric utility, and it shows the importance of selecting the most relevant features in the context of theft characterization.
Article
As one of the key components of the smart grid, advanced metering infrastructure brings many potential advantages such as load management and demand response. However, computerizing the metering system also introduces numerous new vectors for energy theft. In this paper, we present a novel consumption pattern-based energy theft detector, which leverages the predictability property of customers' normal and malicious consumption patterns. Using distribution transformer meters, areas with a high probability of energy theft are shortlisted, and by monitoring abnormalities in consumption patterns, suspicious customers are identified. Application of appropriate classification and clustering techniques, as well as concurrent use of transformer meters and anomaly detectors, makes the algorithm robust against nonmalicious changes in usage pattern, and provides a high and adjustable performance with a low sampling rate. Therefore, the proposed method does not invade customers' privacy. Extensive experiments on a real dataset of 5000 customers show a high performance for the proposed method.
World Loses $89.3 Billion to Electricity Theft Annually, $58.7 Billion in Emerging Markets. 2014
  • Pr Newswire
Malware traffic classification using principal component analysis and artificial neural network for extreme surveillance
  • D Arivudainambi
  • V K Ka
  • P Visu