Thesis

A Novel Combined And Game-Theoretic Approach To Avoid Smart Grids From Energy Stealing (MS Thesis with Source Codes)

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Electricity is used in around 80% of the world, which underlines the importance of using it securely and efficiently. Non-technical losses (NTLs) have become one of the biggest issues for electric utilities worldwide. Intentional tampering with electric meters and false data injection account for the largest proportion of NTLs and have hazardous effects on power systems. The digitalization of traditional grids enables an efficient exchange of information over short periods, which allows electric utilities to devise innovative data-driven solutions for electricity theft detection (ETD). The existing data-driven approaches for ETD have limited ability to handle high-dimensional, noisy and imbalanced data. Moreover, these approaches have limited potential to derive feature associations during feature extraction. These limitations raise the misclassification rate, which makes the existing ETD approaches unacceptable for electric utilities.

Therefore, in this thesis, we propose a new data-driven methodology, which consists of four new solutions to systematically detect electricity fraudsters in the smart grid environment. In the first solution, we present a new class balancing mechanism based on the interquartile minority oversampling technique to handle data imbalance. Then, a combined ETD model composed of long short-term memory (LSTM), UNet and adaptive boosting (Adaboost), termed LSTM-UNet-Adaboost, is presented to detect electricity fraud. In the second solution, we introduce a new mechanism based on two scenarios. In the first scenario, a new supervised learning mechanism is presented, which combines UNet and a generative adversarial network (GAN) and is named UNet-GAN. The GAN's structure mainly comprises two neural networks: a generator and a discriminator. Owing to the excellent performance of UNet, we use it in both the generator and the discriminator. These two neural networks contest with each other in a game-theoretic manner to significantly boost the ETD performance. In the second scenario, we propose a new dynamic learning based semi-supervised solution, which consists of a probabilistic guider (PG) and a Ladder network and is termed the PG-Ladder network. The PG dynamically guides the PG-Ladder network to further improve its ETD performance.

Furthermore, conventional approaches require extensive expert involvement and lose data relationships during feature extraction. Therefore, in the third solution, we address these issues with a relational denoising autoencoder (RDAE) combined with an attention guided (AG) TripleGAN, named RDAE-AG-TripleGAN. The fourth solution is a new two-fold end-to-end semi-supervised solution that addresses the limitations of conventional clustering mechanisms and the scarcity of labeled electricity consumption (EC) data. In the first fold, it groups similar EC cases by employing a grey wolf optimization (GWO) based clustering mechanism, namely clustering by fast search and find of density peaks (CFSFDP); the combination is known as GC. In the second fold, we design a new relational stacked denoising autoencoder (RSDAE) enabled semi-supervised GAN, termed RGAN, for ETD. The combined solution is therefore named GC-RGAN. In this system, the RSDAE acts as both the feature extractor and the generator of the proposed RGAN. The proposed semi-supervised solutions efficiently exploit both labeled and unlabeled representations.

The proposed solutions are simulated and evaluated on the real-time smart meter dataset of the State Grid Corporation of China using suitable performance indicators, e.g., area under the curve and Matthews correlation coefficient. The simulation outcomes validate that the proposed methodology surpasses other traditional methods, such as semi-supervised support vector machine and random forest, for ETD and is acceptable for real-time practice.
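The Matthews correlation coefficient named among the performance indicators is often preferred on imbalanced theft data because plain accuracy can look good for a detector that never flags anyone. The following is a minimal sketch of the standard formula, not code from the thesis; the function name and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical test set: 900 honest consumers, 100 electricity thieves.
# A detector that labels everyone as honest reaches 90% accuracy but MCC 0.
always_honest = matthews_corrcoef(tp=0, fp=0, tn=900, fn=100)

# A detector that finds 80 of the 100 thieves at the cost of 45 false alarms.
detector = matthews_corrcoef(tp=80, fp=45, tn=855, fn=20)

print(always_honest, round(detector, 3))
```

The score ranges from -1 to 1, with 0 corresponding to a detector that is no better than chance.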


... Specifically, papers [16], [17] use support vector machines (SVM) to find hyperplanes that separate ordinary users from electricity-stealing users. Papers [18], [19] adopt long short-term memory (LSTM) networks to learn the temporal characteristics of electricity data and counter electricity theft. Papers [20], [21] leverage convolutional neural networks (CNN) to learn deep features of electricity data, with which electricity-stealing users can be identified accurately. ...
Preprint
With the increasing number of electricity-stealing users, national interests are jeopardized and an economic burden is placed on governments. However, due to small-scale stealing and its random time coherence, it is difficult to find electricity-stealing users. To solve this issue, we first generate a hybrid dataset composed of real electricity data and specific electricity-stealing data. Then, we put forward the timing shift based bi-residual network (TS-BiResNet) model. It learns the features of electricity consumption data in two aspects, i.e., shallow features and deep features, while also taking the time factor into consideration. The simulation results show that the TS-BiResNet model can detect electricity-stealing behaviors that are small in scale and randomly coherent with time. Moreover, its detection accuracy is superior to that of the benchmark schemes, i.e., the long short-term memory (LSTM) model and the Bi-ResNet model.
Article
Electricity theft is one of the main causes of non-technical losses and its detection is important for power distribution companies to avoid revenue loss. The advancement of traditional grids to smart grids allows a two-way flow of information and energy that enables real-time energy management, billing and load surveillance. This infrastructure enables power distribution companies to automate electricity theft detection (ETD) by constructing innovative data-driven solutions. However, the traditional ETD approaches do not provide acceptable theft detection performance due to high-dimensional imbalanced data, loss of data relationships during feature extraction and the requirement of expert involvement. Hence, this paper presents a new semi-supervised solution for ETD, which consists of a relational denoising autoencoder (RDAE) and an attention guided (AG) TripleGAN, named RDAE-AG-TripleGAN. In this system, the RDAE is implemented to derive features and their associations, while AG performs feature weighting and dynamically supervises the AG-TripleGAN. As a result, this procedure significantly boosts the ETD performance. Furthermore, to demonstrate the acceptability of the proposed methodology over conventional approaches, we conducted extensive simulations using the real power consumption data of smart meters. The proposed solution is validated using the most useful and suitable performance indicators: area under the curve, precision, recall, Matthews correlation coefficient, F1-score and precision-recall area under the curve. The simulation results show that the proposed method improves the detection of electricity fraud compared with conventional ETD schemes such as extreme gradient boosting machine and transductive support vector machine. The proposed solution achieves a detection rate of 0.956, which makes it more acceptable for electric utilities than the existing approaches.
Article
Electricity is used in around 80% of the world. Electricity theft has dangerous effects on utilities in terms of power efficiency and costs billions of dollars per annum. The enhancement of the traditional grids gave rise to smart grids that enable one to resolve the dilemma of electricity theft detection (ETD) using an extensive amount of data formulated by smart meters. These data are used by power utilities to examine the consumption behaviors of consumers and to decide whether the consumer is an electricity thief or benign. However, the traditional data-driven methods for ETD have poor detection performances due to the high-dimensional imbalanced data and their limited ETD capability. In this paper, we present a new class balancing mechanism based on the interquartile minority oversampling technique and a combined ETD model to overcome the shortcomings of conventional approaches. The combined ETD model is composed of long short-term memory (LSTM), UNet and adaptive boosting (Adaboost), and termed LSTM-UNet-Adaboost. In this regard, LSTM-UNet-Adaboost combines the advantages of deep learning (LSTM-UNet) along with ensemble learning (Adaboost) for ETD. Moreover, the performance of the proposed LSTM-UNet-Adaboost scheme was simulated and evaluated over the real-time smart meter dataset given by the State Grid Corporation of China. The simulations were conducted using the most appropriate performance indicators, such as area under the curve, precision, recall and F1 measure. The proposed solution obtained the highest results as compared to the existing benchmark schemes in terms of selected performance measures. More specifically, it achieved the detection rate of 0.92, which was the highest among existing benchmark schemes, such as logistic regression, support vector machine and random under-sampling boosting technique. Therefore, the simulation outcomes validate that the proposed LSTM-UNet-Adaboost model surpasses other traditional methods in terms of ETD and is more acceptable for real-time practices.
Article
Due to the increase in the number of electricity thieves, electric utilities are facing problems in providing electricity to their consumers efficiently. Accurate Electricity Theft Detection (ETD) is challenging due to inaccurate classification on imbalanced electricity consumption data, overfitting issues and the high False Positive Rate (FPR) of the existing techniques. Therefore, intensified research is needed to accurately detect electricity thieves and to recover the huge revenue loss for utility companies. To address the above limitations, this paper presents a new model, which is based on supervised machine learning techniques and real electricity consumption data. Initially, the electricity data are pre-processed using interpolation, the three-sigma rule and normalization methods. Since the distribution of labels in the electricity consumption data is imbalanced, the Adasyn algorithm is used to address this class imbalance problem. It achieves two objectives. Firstly, it intelligently increases the minority class samples in the data. Secondly, it prevents the model from being biased towards the majority class samples. Afterwards, the balanced data are fed into a Visual Geometry Group (VGG-16) module to detect abnormal patterns in electricity consumption. Finally, a Firefly Algorithm based Extreme Gradient Boosting (FA-XGBoost) technique is exploited for classification. Simulations are conducted to show the performance of the proposed model. Moreover, state-of-the-art methods are also implemented for comparative analysis, i.e., Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Logistic Regression (LR). For validation, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), Receiver Operating Characteristic Area Under Curve (ROC-AUC), and Precision Recall Area Under Curve (PR-AUC) metrics are used. Firstly, the simulation results show that the proposed Adasyn method improves the performance of the FA-XGBoost classifier, which achieves an F1-score, precision, and recall of 93.7%, 92.6%, and 97%, respectively. Secondly, the VGG-16 module achieves a higher generalized performance, with accuracy of 87.2% and 83.5% on training and testing data, respectively. Thirdly, the proposed FA-XGBoost correctly identifies actual electricity thieves, i.e., a recall of 97%. Moreover, the model is superior to the other state-of-the-art models in terms of handling large time series data and accurate classification. These models can be efficiently applied by utility companies using real electricity consumption data to identify electricity thieves and overcome major revenue losses in the power sector.
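The pre-processing chain mentioned above (interpolation, three-sigma rule, normalization) could be sketched as follows. This is an illustrative reading rather than the paper's code: the function names, the neighbour-average interpolation rule, the upper bound of mean plus three standard deviations, the min-max scaling and the sample readings are all assumptions.

```python
import statistics


def interpolate_missing(series):
    """Fill an isolated missing reading (None) with the mean of its neighbours.

    Gaps without two valid neighbours are set to 0.0 (an assumed fallback).
    """
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            left = series[i - 1] if i > 0 else None
            right = series[i + 1] if i + 1 < len(series) else None
            if left is not None and right is not None:
                filled[i] = (left + right) / 2
            else:
                filled[i] = 0.0
    return filled


def clip_three_sigma(series):
    """Cap readings that exceed mean + 3 * standard deviation."""
    upper = statistics.fmean(series) + 3 * statistics.pstdev(series)
    return [min(value, upper) for value in series]


def min_max_normalize(series):
    """Scale readings to the range [0, 1]."""
    low, high = min(series), max(series)
    if high == low:
        return [0.0 for _ in series]
    return [(value - low) / (high - low) for value in series]


# Hypothetical daily consumption (kWh) with one gap and one extreme spike.
raw = [1.0, None, 3.0, 2.0, 2.5, 1.5, 2.0, 3.0, 2.5, 1.0, 2.0, 60.0]
filled = interpolate_missing(raw)
clipped = clip_three_sigma(filled)
cleaned = min_max_normalize(clipped)
print(filled[1], round(clipped[-1], 2), cleaned[0], cleaned[-1])
```

Clipping before scaling keeps a single extreme reading from compressing all other values towards zero.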
Article
In systems connected to smart grids, smart meters with fast and efficient responses are very helpful in detecting anomalies in real time. However, sending data at a frequency of a minute or less is not common with today’s technology because of the bottleneck of the communication network and storage media. Because mitigation cannot be done in real time, we propose prediction techniques using Deep Neural Network (DNN), Support Vector Regression (SVR), and k-Nearest Neighbors (KNN). In addition to these techniques, the prediction timestep is chosen per day and wrapped in sliding windows, and clustering using Kmeans and intersection Kmeans and HDBSCAN is also evaluated. The predictive ability applied here is to predict whether anomalies in electricity usage will occur in the next few weeks. The aim is to give users time to check their usage and, from the utility side, to determine whether it is necessary to prepare a sufficient supply. We also propose latency reduction to counter the higher latency of a traditional centralized system by adding an Edge Meter Data Management System (MDMS) layer and Cloud-MDMS as the inference and training model. Based on experiments run on a Raspberry Pi, the best solution is the DNN, which has the shortest latency of 1.25 ms and a persistent file size of 159 kB at 128 timesteps.
Article
Improving performance of deep learning models and reducing their training times are ongoing challenges in deep neural networks. There are several approaches proposed to address these challenges, one of which is to increase the depth of the neural networks. Such deeper networks not only increase training times, but also suffer from vanishing gradients problem while training. In this work, we propose gradient amplification approach for training deep learning models to prevent vanishing gradients and also develop a training strategy to enable or disable gradient amplification method across several epochs with different learning rates. We perform experiments on VGG-19 and Resnet models (Resnet-18 and Resnet-34), and study the impact of amplification parameters on these models in detail. Our proposed approach improves performance of these deep learning models even at higher learning rates, thereby allowing these models to achieve higher performance with reduced training time.
Article
A multi-microgrid (MMG) system is a new approach that concurrently incorporates different types of distributed energy resources, energy storage systems and demand responses to provide reliable and independent electricity for the community. However, the MMG system faces problems of management, real-time economic operation and control. Therefore, this study proposes an energy management system (EMS) that turns an infinite number of MMGs into a coherent and efficient system, where each MMG can achieve its goals and perspectives. The proposed EMS employs a cooperative game to achieve efficient coordination and operation of the MMG system and also ensures a fair energy cost allocation among members in the coalition. This study considers the energy cost allocation problem when the number of members in the coalition grows exponentially. The energy cost allocation problem is solved using a column generation algorithm. The proposed model includes energy storage systems, demand loads, real-time electricity prices and renewable energy. The daily operating cost of the MMG is estimated using a proposed deep convolutional neural network (CNN) and analyzed in this study. An optimal scheduling policy to optimize the total daily operating cost of the MMG is also proposed. In addition, other existing optimal scheduling policies, such as approximate dynamic programming (ADP), model predictive control (MPC), and greedy policy, are considered for comparison. To evaluate the effectiveness of the proposed model, the real-time electricity prices of the Electric Reliability Council of Texas are used. Simulation results show that each MMG can achieve energy cost savings through a coalition of MMGs. Moreover, the proposed optimal policy method achieves a reduction in the MG's daily operating cost of up to 87.86%, compared with 79.52% for the MPC method, 73.94% for the greedy policy method and 79.42% for the ADP method.
Article
With the country's development and construction, the rapid growth of electricity consumption has caused electricity undersupply. Precisely forecasting the electricity load has become a necessary daily task. A new clustering-based load forecasting method is used to forecast residential electricity consumption. Unlike other methods, which use Euclidean distance as their evaluation index, the method in this paper is defined with the Pearson correlation coefficient. Furthermore, a CNN is used in the residential load forecasting experiment. The results indicate that the new method offers more accurate forecasts than the traditional methods.
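To illustrate why a correlation-based index can group load profiles differently from Euclidean distance, here is a minimal sketch. Using `1 - r` as the dissimilarity is an assumption made for this example (the abstract only says the method is defined with the Pearson correlation coefficient), and the function names and load profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def pearson_distance(x, y):
    """Assumed dissimilarity: 0 for identical shape, 2 for opposite shape."""
    return 1.0 - pearson(x, y)


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical load profiles over four time slots.
base = [1.0, 2.0, 4.0, 3.0]
scaled = [10.0, 20.0, 40.0, 30.0]   # same shape, ten times the magnitude
mirrored = [4.0, 3.0, 1.0, 2.0]     # similar magnitude, opposite shape

print(pearson_distance(base, scaled), pearson_distance(base, mirrored))
print(euclidean_distance(base, scaled), euclidean_distance(base, mirrored))
```

Under the correlation-based index the scaled profile is the closest match to the base profile, whereas Euclidean distance ranks the mirrored profile as closer because it only compares magnitudes.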
Article
Energy theft refers to the intentional and illegal usage of electricity by various means. A number of studies have been conducted on energy theft detection in the advanced metering infrastructure using machine learning methods. However, applying machine learning for energy theft detection has a problem in that it is difficult to obtain enough electricity theft data to train a machine learning model. In this paper, we propose a method based on anomaly pattern detection to detect electricity theft in data streams generated from smart meters. The proposed method requires only normal energy consumption data to train the model. Previous usage records of customers being monitored are not needed for energy theft detection. This characteristic makes the proposed method applicable in real situations. Experiments were conducted using real smart meter data and artificial attack data, including the preprocessing of daily consumption vectors by standard normalization, the construction of an outlier detection model on normal electricity consumption data of randomly chosen customers, and the application of anomaly pattern detection on test data streams. Some promising results were obtained, notably, that attacks of types 4, 5, 6 were detected with an average F1 value of 0.93 and average delay of 19 days.
Article
The electrical losses in power systems are divided into non-technical losses (NTLs) and technical losses (TLs). NTL is more harmful than TL because it includes electricity theft, faulty meters and billing errors. It is one of the major concerns in the power system worldwide and incurs a huge revenue loss for utility companies. Electricity theft detection (ETD) is the mechanism used by industry and academia to detect electricity theft. However, due to imbalanced data, overfitting issues and the handling of high-dimensional data, the ETD cannot be applied efficiently. Therefore, this paper proposes a solution to address the above limitations. A long short-term memory (LSTM) technique is applied to detect abnormal patterns in electricity consumption data along with the bat-based random under-sampling boosting (RUSBoost) technique for parameter optimization. Our proposed system model uses the normalization and interpolation methods to pre-process the electricity data. Afterwards, the pre-processed data are fed into the LSTM module for feature extraction. Finally, the selected features are passed to the RUSBoost module for classification. The simulation results show that the proposed solution resolves the issues of data imbalancing, overfitting and the handling of massive time series data. Additionally, the proposed method outperforms the state-of-the-art techniques; i.e., support vector machine (SVM), convolutional neural network (CNN) and logistic regression (LR). Moreover, the F1-score, precision, recall and receiver operating characteristics (ROC) curve metrics are used for the comparative analysis.
Article
Electricity fraud in billing is a primary concern for Distribution System Operators (DSOs). It is estimated that billions of dollars are wasted annually due to these illegal activities. DSOs around the world, especially in underdeveloped countries, still use conventional, time-consuming and inefficient methods for Non-Technical Loss (NTL) detection. This research work attempts to solve this problem by developing an efficient energy theft detection model to identify fraudulent customers in a power distribution system. The key motivation for the present study is to assist DSOs in their fight against energy theft. The proposed computational model initially uses a set of distinct features extracted from consumers’ monthly consumption data, obtained from Multan Electric Power Company (MEPCO), Pakistan, to segregate honest and fraudulent customers. Pearson’s chi-square feature selection algorithm is adopted to select the most relevant of the extracted features. Finally, the Boosted C5.0 Decision Tree (DT) algorithm is used to classify honest and fraudulent consumers based on the selected features. To validate the proposed NTL detection approach, its performance is compared with that of a few state-of-the-art machine learning algorithms (one of the most exciting recent technologies in Artificial Intelligence), such as Random Forest (RF), Support Vector Machine (SVM), Artificial Neural Network (ANN) and Extreme Gradient Boosting (XGBoost). The proposed NTL detection method provides an accuracy of 94.6%, sensitivity of 78.1%, specificity of 98.2%, F1 score of 84.9% and precision of 93.2%, which are significantly higher than those of the above-mentioned algorithms.
Article
Imbalanced data refers to a problem in machine learning where instances are unequally distributed across classes. Performing a classification task on such data can often introduce bias in favour of the majority class. The bias is amplified in cases of high-dimensional data. To address this problem, there exist many real-world data mining techniques, such as over-sampling and under-sampling, which can reduce the data imbalance. The Synthetic Minority Oversampling Technique (SMOTe) provides one such state-of-the-art and popular solution to tackle class imbalance, even on high-dimensional data. In this work, a novel and consistent oversampling algorithm is proposed that can further enhance classification performance, especially on binary imbalanced datasets. It is named NMOTe (Navo Minority Oversampling Technique), an upgraded and superior alternative to the existing techniques. A critical analysis and comprehensive overview of the literature has been carried out to gain deeper insight into the problem and to motivate the need for an optimal solution. The performance of NMOTe on some standard datasets is established in this work to give a statistical understanding of why it has edged out the existing state of the art to become the most robust technique for solving the two-class data imbalance problem.
Article
Due to the existence of marine environmental noise, coupled with the instability of underwater acoustic channel, ship-radiated noise (SRN) signals detected by sensors tend to suffer noise pollution as well as distortion caused by the transmission medium, making the denoising of the raw detected signals the new focus in the field of underwater acoustic target recognition. In view of this, this paper presents a novel hybrid feature extraction scheme integrating improved variational mode decomposition (IVMD), normalized maximal information coefficient (norMIC) and permutation entropy (PE) for SRN signals. Firstly, the IVMD method is employed to decompose the SRN signals into a number of finite intrinsic mode functions (IMFs). The noise IMFs are then filtered out by a denoising method before PE extraction. Next, the MIC between each retained IMF and the raw SRN signal and PE of retained IMFs are calculated, respectively. After this, the norMICs are used to weigh the PE values of the retained IMFs and the sum of the weighted PE results is regarded as the classification parameter. Finally, the feature vectors are fed into the particle swarm optimization-based support vector machine multi-class classifier (PSO-SVM) to identify different types of SRN samples. The experimental results have indicated that the classification accuracy of the proposed method is as high as 99.1667%, which is much higher than that of other currently existing methods. Hence, the method proposed in this paper is more suitable for feature extraction of SRN signals in practical application.
Article
One of the keys to enhancing the quality of electric power supply is the accuracy of consumption metering. The current development of sensors, devices and systems for electricity metering provides the basis for this service. Nevertheless, in many situations this achievement is undermined, so appropriate measures must be adopted even when significant costs have already been incurred. This paper proposes and discusses an optimal solution based on identifying and minimizing measurement errors in order to increase the accuracy of electricity readings and lower electricity losses and related costs. In this regard, a mathematical model was developed, and a particular algorithm for this problem is proposed and tested in the case of a power distribution company, where an average improvement of 4% in own technological consumption was recorded.
Article
Forecasting in the smart grid (SG) plays a vital role in maintaining the balance between demand and supply of electricity, efficient energy management, better planning of energy generation units and renewable energy sources and their dispatching and scheduling. Existing forecasting models are being used and new models are developed for a wide range of SG applications. These algorithms have hyperparameters which need to be optimized carefully before forecasting. The optimized values of these hyperparameters increase the forecasting accuracy up to a significant level. In this paper, we present a brief literature review of forecasting models and the optimization methods used to tune their hyperparameters. In addition, we discuss the data preprocessing methods. A comparative analysis of these forecasting models, according to their hyperparameter optimization, error methods and preprocessing methods, is also presented. Besides, we critically analyze the existing optimization and data preprocessing models and highlight the important findings. A survey of existing survey papers is also presented and their recency score is computed based on the number of recent papers reviewed in them. By recent, we mean the year in which a survey paper is published and the three preceding years. Finally, future research directions are discussed in detail.
Article
Effective detection of electricity theft is essential to maintain power system reliability. With the development of smart grids, traditional electricity theft detection technologies have become ineffective in dealing with the increasingly complex data on the users’ side. To improve the auditing efficiency of grid enterprises, a new electricity theft detection method based on an improved synthetic minority oversampling technique (SMOTE) and an improved random forest (RF) method is proposed in this paper. The data of normal and electricity theft users were classified as positive data (PD) and negative data (ND), respectively. In practice, the number of ND was far less than that of PD, which made the dataset composed of these two types of data unbalanced. An improved SMOTE based on the K-means clustering algorithm (K-SMOTE) was first presented to balance the dataset. The cluster centers of the ND were determined by the K-means method. Then, the ND were interpolated by SMOTE on the basis of the cluster centers to balance the entire dataset. Finally, the RF classifier was trained with the balanced dataset, and the optimal number of decision trees in the RF was decided according to the convergence of the out-of-bag data error (OOB error). Electricity theft behaviors on the user side were detected by the trained RF classifier.
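The balancing step described above (K-means cluster centers for the minority class, then SMOTE-style interpolation anchored on those centers) might look roughly like the following pure-Python sketch. It is not the paper's implementation: the function names, the value of `k`, the seed and the toy data are assumptions, and interpolating between a cluster center and a randomly chosen member of that cluster is only one plausible reading of "on the basis of the cluster center".

```python
import random


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd's algorithm; returns (centers, clusters)."""
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])),
            )
            clusters[nearest].append(point)
        # Recompute each center as the mean of its members (keep it if empty).
        centers = [
            [sum(dim) / len(members) for dim in zip(*members)] if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    return centers, clusters


def oversample_minority(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples on segments from a center to a member."""
    rng = random.Random(seed)
    centers, clusters = kmeans(minority, k, rng)
    populated = [(c, members) for c, members in zip(centers, clusters) if members]
    synthetic = []
    for _ in range(n_new):
        center, members = rng.choice(populated)
        sample = rng.choice(members)
        gap = rng.random()  # position along the center-to-sample segment
        synthetic.append([c + gap * (s - c) for c, s in zip(center, sample)])
    return synthetic


# Hypothetical two-feature minority (theft) samples forming two groups.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [5.0, 5.0], [5.1, 4.8]]
new_samples = oversample_minority(minority, n_new=10)
print(len(new_samples), new_samples[0])
```

Because every synthetic point is a convex combination of a cluster center and an existing sample, the generated data stay inside the range spanned by the original minority samples.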
Article
In this study, a novel framework is proposed for efficient energy management of residential buildings to reduce the electricity bill, alleviate peak-to-average ratio (PAR), and acquire the desired trade-off between electricity bill and user-discomfort in smart grid. The proposed framework is an integrated framework of artificial neural network (ANN) based forecast engine and our proposed day-ahead grey wolf modified enhanced differential evolution algorithm (DA-GmEDE) based home energy management controller (HEMC). The forecast engine forecasts price-based demand response (DR) signal and energy consumption patterns and HEMC schedules smart home appliances under the forecasted pricing signal and energy consumption pattern for efficient energy management. The proposed DA-GmEDE based strategy is compared with two benchmark strategies: day-ahead genetic algorithm (DA-GA) based strategy, and day-ahead game-theory (DA-game-theoretic) based strategy for performance validation. Moreover, extensive simulations are conducted to test the effectiveness and productiveness of the proposed DA-GmEDE based strategy for efficient energy management. The results and discussion illustrate that the proposed DA-GmEDE strategy outperforms the benchmark strategies by 33.3% in terms of efficient energy management.
Article
Dividing abstract object sets into multiple groups, called clustering, is essential for effective data mining. Clustering can find innate but unknown real-world knowledge that is inaccessible by any other means. Rodriguez and Laio published a paper in Science about a density-based fast clustering algorithm called CFSFDP. CFSFDP is a highly efficient algorithm that clusters objects by using fast searching of density peaks. However, with CFSFDP, the essential second step of finding clustering centers must be done manually. Furthermore, when the number of data objects increases or the decision graph is complicated, determining clustering centers manually is difficult and time-consuming, and clustering accuracy drops sharply. To solve this problem, this paper proposes an improved clustering algorithm, ACDPC, that is based on data detection and can automatically determine clustering centers without manual intervention. First, the algorithm calculates the comprehensive metrics and sorts them based on the CFSFDP method. Second, the distance between the sorted objects is used to judge whether they are the correct clustering centers. Finally, the remaining objects are grouped into clusters. This algorithm can efficiently and automatically determine clustering centers without calculating additional variables. We verified ACDPC using three standard datasets and compared it with other clustering algorithms. The experimental results show that ACDPC is more efficient and robust than alternative methods.
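The two quantities at the heart of CFSFDP, a local density for each point and its distance to the nearest point of higher density, can be sketched as below. This is a simplified illustration and not the ACDPC algorithm: the function name, the cutoff value, the toy points and the ranking by the product of density and distance (used here in place of reading the decision graph by eye) are assumptions for the example.

```python
import math


def density_peaks(points, cutoff):
    """Return (rho, delta) as used in density-peak clustering.

    rho[i]   = number of other points closer than `cutoff` to point i
    delta[i] = distance to the nearest point with higher rho, or the largest
               distance from point i if no point has a higher density
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical 2-D data: two groups, each with one densely surrounded point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (10.4, 10.4),
]
rho, delta = density_peaks(points, cutoff=0.8)

# Candidate centers combine high density with a large distance to any denser point.
scores = [r * d for r, d in zip(rho, delta)]
centers = sorted(sorted(range(len(points)), key=scores.__getitem__)[-2:])
print(rho, centers)
```

In the original method the centers are picked by inspecting a plot of delta against rho; ranking by the product is a common shortcut when the number of clusters is assumed to be known.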
Article
Full-text available
In recent years, various types of power theft incidents have occurred frequently, and the training of power-stealing detection models is susceptible to imbalanced data sets and data noise, which leads to errors in power-stealing detection. Therefore, a power-stealing detection model is proposed, which is based on Improved Conditional Generation Adversarial Network (CWGAN), Stacked Convolution Noise Reduction Autoencoder (SCDAE) and Lightweight Gradient Boosting Decision Machine (LightGBM). The model performs generation-adversarial operations on the original unbalanced power consumption data to balance the electricity data and avoid the interference of the imbalanced data set with classifier training. In addition, the convolution method is used to stack the noise reduction auto-encoder to achieve dimension reduction of power consumption data, extract data features and reduce the impact of random noise. Finally, LightGBM is used for power theft detection. The experiments show that CWGAN can effectively balance the distribution of power consumption data. A comparison of the detection indicators of the proposed model with those of various advanced power-stealing models on the same data set shows that the proposed model is superior to the other models in the detection of power stealing.
Article
Full-text available
The detection of abnormal electricity consumption behavior has been of great importance in recent years. However, existing research often focuses on algorithm improvement and ignores the process of obtaining features. The optimal feature set, which reflects customers’ electricity consumption behavior, has a significant influence on the final detection results. Moreover, it is not straightforward to obtain datasets with label information. In this paper, a method based on feature engineering for unsupervised detection of abnormal electricity consumption behavior is proposed. First, the original feature set is constructed by brainstorming in the feature engineering step. Then, the optimal feature set, which reflects the customers' electricity consumption behavior, is obtained by features selected based on the variance and similarity between them. After that, in the abnormal detection step, a density-based clustering algorithm, in which the best clustering parameters are selected through iteration and evaluation, combined with unsupervised clustering evaluation indexes, is used to detect abnormal electricity consumption behaviors. Finally, using the load dataset of an industrial park, several typical feature strategies are applied for comparison with the feature engineering proposed in this paper. To perform the evaluation, the label information of abnormal behaviors is obtained by combining the original electricity consumption behavior detection results with abnormal data injections. The abnormal detection method proposed has given good results and outperformed typical feature strategies in an effective and generalizable way.
Article
Full-text available
The clustering algorithm plays an important role in data mining and image processing. The breakthrough of algorithm precision and method directly affects the direction and progress of the following research. At present, types of clustering algorithms are mainly divided into hierarchical, density-based, grid-based and model-based ones. This paper mainly studies the Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm, which is a new clustering method based on density. The algorithm has the characteristics of no iterative process, few parameters and high precision. However, we found that the clustering algorithm did not consider the original topological characteristics of the data. We also found that the clustering data is similar to the social network nodes mentioned in DeepWalk, which satisfied power-law distribution. In this study, we tried to consider the topological characteristics of the graph in the clustering algorithm. Based on previous studies, we propose a clustering algorithm that adds the topological characteristics of original data on the basis of the CFSFDP algorithm. Our experimental results show that the clustering algorithm with topological features significantly improves the clustering effect and proves that the addition of topological features is effective and feasible.
Article
Full-text available
This paper proposes a Bayesian network (BN) that can construct the nonlinear dependence among wind speed, solar irradiation, and load. The correlations of random variables (RVs) are analyzed using the Pearson correlation coefficient, Kendall rank correlation coefficient, and Spearman rank correlation coefficient. According to Bayesian theory, the Bayesian information criterion (BIC) and maximum likelihood estimation (MLE) method are employed to determine the structure and parameters of BN. Then the BN model of RVs is established. The constructed BN model is the joint probability distribution of RVs that can present the nonlinear dependence and marginal distribution (MD) of RVs without limitation. The testing samples of wind speed, solar irradiation, and load are generated using the autoregressive integrated moving average (ARMA) model. Then these samples are utilized to construct the probability model of a BN and C-vine copula. The modeling accuracy and efficiency of the probability models of BN and C-vine copula are compared, and the quality of the synthetic samples output from them are analyzed. In the modified IEEE 118-bus test system, two kinds of synthetic samples are used to calculate the probabilistic load flow (PLF), and the accuracy and efficiency of the PLF based on the BN are tested. The validity of the BN model is verified.
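The three correlation coefficients named in this abstract can be compared on a small example. The sketch below uses plain Python and assumes there are no tied values (tie corrections are omitted); the function names and sample data are hypothetical and do not come from the paper.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall tau: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


# A monotone but nonlinear relationship (y = x^3).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(pearson(x, y), spearman(x, y), kendall(x, y))
```

On monotone nonlinear data the two rank-based coefficients reach one while Pearson stays below one, which is why rank correlations are commonly reported alongside Pearson when the dependence is nonlinear.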
Article
Full-text available
Electric load data are essential for data-driven approaches (including deep learning) in smart grid, and advanced smart meter technologies provide fine-grained data with reliable communications. Despite the recent development of smart metering devices, however, missing data still arise due to unexpected device power off, communication failure, measuring error, or other unknown reasons. In this paper, we investigate a deep learning framework for missing imputation of smart meter data by leveraging a denoising autoencoder (DAE). Then, we compare the performance of the proposed DAE with traditional methods as well as other recently developed generative models, e.g., variational autoencoder and Wasserstein autoencoder. The proposed DAE based imputation shows significantly better results compared to other methods in terms of root mean square error (RMSE) by up to 28.9% for point-wise error, and by up to 56% for daily-accumulated error.
Conference Paper
Full-text available
Short-term electricity supply and demand forecasting using weather parameters, including temperature, wind speed, and solar radiation, improves the operational efficiency and accuracy of power systems. Many weather parameters have an influential effect on the supply and demand of electricity, but temperature, solar radiation, and wind speed are the most important. Our proposed time series model is based on preprocessing, feature extraction, data preparation, and an Enhanced Convolutional Neural Network (ECNN) module for short-term weather parameter forecasting up to 6 hours ahead. The proposed ECNN time series model is applied to 61 locations in the United States, with data collected from the National Solar Radiation Database (NSRDB). The model is trained on 15 years of data and validated on an additional two years of out-of-sample data. Simulation results show that our proposed model performs better than traditional benchmark models in terms of the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Relative Root Mean Square Error (RMSE%) performance metrics. The results show that the proposed model is effective for short-term forecasting of temperature, solar radiation, and wind speed. Moreover, the proposed model improves the accuracy and operational efficiency of power systems.
Article
Full-text available
This study proposes an efficient energy management method to systematically manage the energy consumption in the residential area to alleviate the peak to average ratio and mitigate electricity cost along with user comfort maximization. We developed an efficient energy management scheme using mixed integer linear programming (MILP), which schedules smart appliances and charging/discharging of electric vehicles (EVs) optimally in order to mitigate energy costs. In the proposed model, the consumer is able to generate its own energy from a microgrid consisting of solar panels and wind turbines. We also consider an energy storage system (ESS) for efficient energy utilization. This work also performs energy forecasting using wind speed and solar radiation prediction for efficient energy management. Moreover, we perform extensive simulations to validate our developed MILP based scheme and results affirm the effectiveness and productiveness of our proposed energy efficient technique.
Article
Full-text available
In order to keep track of the operational state of power grids, the world’s largest sensor system, smart grid, was built by deploying hundreds of millions of smart meters. Such a system makes it possible to discover and make quick response to any hidden threat to the entire power grid. Non-technical losses (NTLs) have always been a major concern for their consequent security risks as well as immeasurable revenue loss. However, various causes of NTL may have different characteristics reflected in the data. As a result, accurately capturing these anomalies in such a large volume of collected data records is rather tricky. In this paper, we propose a new methodology for detecting abnormal electricity consumption. We transform the collected time-series data into an image representation that reflects users’ relatively long-term consumption behaviors well. Inspired by the excellent neural network architecture used for object detection in computer vision, we designed our deep learning model that takes the transformed images as input and yields joint features inferred from the multiple aspects the input provides. Considering the limited amount of labeled samples, especially the abnormal ones, we used our model in a semi-supervised fashion of the kind introduced in recent years. The model is tested on samples which are verified by on-field inspections and our method showed significant improvement for NTL detection compared with the state-of-the-art methods.
Article
Full-text available
With the ever-growing demand of electric power, it is quite challenging to detect and prevent Non-Technical Loss (NTL) in power industries. NTL is committed by meter bypassing, hooking from the main lines, reversing and tampering the meters. Manual on-site checking and reporting of NTL remains an unattractive strategy due to the required manpower and associated cost. The use of machine learning classifiers has been an attractive option for NTL detection. It enhances data-oriented analysis and high hit ratio along with less cost and manpower requirements. However, there is still a need to explore the results across multiple types of classifiers on a real-world dataset. This paper considers a real dataset from a power supply company in Pakistan to identify NTL. We have evaluated 15 existing machine learning classifiers across 9 types which also include the recently developed CatBoost, LGBoost and XGBoost classifiers. Our work is validated using extensive simulations. Results elucidate that ensemble methods and Artificial Neural Network (ANN) outperform the other types of classifiers for NTL detection in our real dataset. Moreover, we have also derived a procedure to identify the top-14 features out of a total of 71 features, which are contributing 77% in predicting NTL. We conclude that including more features beyond this threshold does not improve performance and thus limiting to the selected feature set reduces the computation time required by the classifiers. Last but not least, the paper also analyzes the results of the classifiers with respect to their types, which has opened a new area of research in NTL detection.
Article
Full-text available
To facilitate the management of 3D content in applications, some researchers add semantics to the geometric description of 3D models. However, the insurmountable semantic gap between 3D model and semantic description is the biggest obstacle to the matching of them. This paper proposes a novel network framework named Multi-modal Auxiliary Classifier Generative Adversarial Network with autoencoder (MACGAN-AE) for the matching of 3D model and its semantic description. Firstly, the Multi-modal Auxiliary Classifier Generative Adversarial Network is presented to solve the multi-modal classification. It captures the latent correlated representation between multi-modes and bridges the semantic gap of them. Then, the autoencoder is introduced to construct MACGAN-AE to further enhance the correlation between 3D model and its semantic description. The framework is expected to minimize the semantic gap between 3D model and its corresponding semantic description. In addition, to preserve the relationships between data after feature projection, this paper also defines a structure-preserving loss to reduce the intra-class distance and increase the inter-class distance. Experimental results on XMediaNet dataset demonstrate that our method significantly outperforms other methods.
Article
Full-text available
Anomaly detection in home power monitoring can be categorized into two main types: detection of electrical theft, leakage, or nontechnical loss and monitoring anomalies in the daily activities of residents. Focusing on the application and practicality of anomaly detection, we propose sample efficient home power anomaly detection (SEPAD) with improved monitoring performance in terms of electricity usage as well as changes in the daily living activities of residents via provision of detailed feedback. SEPAD consists of two classifiers: an appliance pattern matching classifier (APMC) and an energy consumption habit classifier (ECHC). The APMC uses a single-source separation framework based on a semi-supervised support vector machine (semi-SVM) model. This semi-supervised learning method requires only a small amount of labeled data to achieve high accuracy in near real time and is a sample efficient detection method. The hidden Markov model (HMM)-based ECHC improves the rationality of SEPAD by providing anomaly detection functionality with respect to the daily activities of householders, especially the elderly and residents in developing areas. When SEPAD detects the appearance of an unknown pattern or known patterns contrary to the household’s electricity usage habits, it triggers an alarm. SEPAD was applied to monitor power consumption data from Mkalama, a rural area in Tanzania with 52 households containing nearly 150 occupants connected to a solar powered off-grid network. The results of the practical test demonstrate the high accuracy and practicality of the proposed method.
Article
Full-text available
Non-technical losses (NTL) caused by faults or electricity theft are greatly harmful to the power grid. Industrial customers consume most of the electric energy, and it is important to reduce this part of NTL. Currently, most work concentrates on analyzing the characteristics of electricity consumption to detect NTL among residential customers. However, the related feature models cannot be adapted to industrial customers because they do not have a fixed electricity consumption pattern. Therefore, this paper starts from the principle of electricity measurement, and proposes a deep learning-based method to extract advanced features from massive smart meter data rather than artificial features. Firstly, we organize electricity magnitudes as one-dimensional sample data and embed the knowledge of electricity measurement in channels. Then, this paper proposes a semi-supervised deep learning model which uses a large amount of unlabeled data and an adversarial module to avoid overfitting. The experimental results show that our approach can achieve satisfactory performance even when trained on very small samples. Compared with the state-of-the-art methods, our method has achieved obvious improvement in all metrics.
Article
Full-text available
This paper proposes a filter-based feature selection method by combining the measurement of kernel canonical correlation analysis (KCCA) with the mutual information (MI)-based feature selection method, named mRMJR-KCCA. The mRMJR-KCCA maximizes the relevance between the feature candidate and the target class labels and simultaneously minimizes the joint redundancy between the feature candidate and the already selected features in the view of KCCA. To improve the computation efficiency, we adopt the Incomplete Cholesky Decomposition to approximate the kernel matrix in implementing the KCCA in mRMJR-KCCA for larger-size datasets. The proposed method is experimentally evaluated on 13 classification-associated datasets. Compared with certain popular feature selection methods, the experimental results demonstrate the better performance of the proposed mRMJR-KCCA.
Article
Full-text available
Non-technical losses (NTLs) have been a major concern for power distribution companies (PDCs). Billions of dollars are lost each year due to fraud in billing, metering, and illegal consumer activities. Various studies have explored different methodologies for efficiently identifying fraudster consumers. This study proposes a new approach for NTL detection in PDCs by using the ensemble bagged tree (EBT) algorithm. The bagged tree is an ensemble of many decision trees which considerably improves the classification performance of many individual decision trees by combining their predictions to reach a final decision. This approach relies on consumer energy usage data to identify any abnormality in consumption which could be associated with NTL behavior. The key motive of the current study is to provide assistance to the Multan Electric Power Company (MEPCO) in Punjab, Pakistan for its campaign against energy stealers. The model developed in this study generates the list of suspicious consumers with irregularities in consumption data to be further examined on-site. The accuracy of the EBT algorithm for NTL detection is found to be 93.1%, which is considerably higher compared to conventional techniques such as support vector machine (SVM), k-th nearest neighbor (KNN), decision trees (DT), and random forest (RF) algorithm.
Article
Full-text available
In advanced metering infrastructure (AMI) networks, smart meters installed at the consumer side should report fine-grained power consumption readings (every few minutes) to the system operator for billing, real-time load monitoring, and energy management. On the other hand, AMI networks are vulnerable to cyber-attacks where malicious consumers report false (low) electricity consumption to reduce their bills in an illegal way. Therefore, it is imperative to develop schemes to accurately identify the consumers that steal electricity by reporting false electricity usage. Most of the existing schemes rely on machine learning for electricity theft detection using the consumers’ fine-grained power consumption meter readings. However, this fine-grained data that is used for electricity theft detection, load monitoring, and billing can also be misused to infer sensitive information regarding the consumers such as whether they are on travel, the appliances they use, etc. In this paper, we propose an efficient and privacy-preserving electricity theft detection scheme for AMI networks, which we refer to as PPETD. Our scheme allows system operators to identify the electricity thefts, monitor the loads, and compute electricity bills efficiently using masked fine-grained meter readings without violating the consumers’ privacy. PPETD uses secret sharing to allow the consumers to send masked readings to the system operator such that these readings can be aggregated for the purpose of monitoring and billing. In addition, secure two-party protocols using arithmetic and binary circuits are executed by the system operator and each consumer to evaluate a generalized convolutional neural network model on the reported masked fine-grained power consumption readings for the purpose of electricity theft detection. An extensive analysis on real datasets is performed to evaluate the security and the performance of PPETD. Our results confirm that our scheme is accurate in detecting fraudulent consumers with privacy preservation and acceptable communication and computation overhead.
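The idea that masked readings can still be aggregated can be illustrated with a toy additive-masking sketch. This is not the PPETD protocol: it assumes a trusted party hands out masks that sum to zero, which is a simplification, and the modulus, function names, and readings are all hypothetical.

```python
import secrets

MODULUS = 2 ** 32  # hypothetical ring size; readings are integers (e.g. Wh)


def zero_sum_masks(n):
    """Generate n random masks whose sum is 0 modulo MODULUS."""
    masks = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    masks.append((-sum(masks)) % MODULUS)
    return masks


def mask_readings(readings):
    """Each meter adds its own mask, so a single masked value reveals nothing."""
    masks = zero_sum_masks(len(readings))
    return [(r + m) % MODULUS for r, m in zip(readings, masks)]


def aggregate(masked):
    """The operator sums the masked values; the masks cancel out."""
    return sum(masked) % MODULUS


readings = [120, 340, 95]          # hypothetical per-meter readings
masked = mask_readings(readings)
print(aggregate(masked) == sum(readings))
```

The operator learns only the total load, provided the true total is smaller than the modulus. Evaluating a neural network on masked inputs, as the abstract describes, needs the additional two-party protocols that this sketch does not attempt.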
Article
Non-technical losses are a component of energy losses associated with energy theft and fraud by the final consumers, hindering revenues of distribution utilities. This paper aims to compare the implemented solutions in the countries of South America to reduce non-technical losses. In this comparison, we introduce a new indicator based on the World Bank's database as input information. Considering that some regulatory agencies take policy actions related to non-technical losses to improve the quality of the electricity supply, we also present a correlation analysis of the proposed indicator and the electricity supply quality index. This analysis shows that in most of South America's countries, there is a high correlation within the studied horizon. An adequate characterization of the temporal variation in the proposed indicator can characterize the evolution of the consumers' perception of the quality in the electricity supply. This indicator allows each country's regulatory agency to analyze how the performed action is reducing non-technical losses concerning neighboring countries.
Article
Smart grid is the new trend for clean, sustainable, efficient and reliable energy generation, delivery and use. To ensure stable and secure operation is essential for the smart grid, which needs effective stability analysis and control. As the smart grid has evolved through a growing scale of interconnection, increasing integration of renewable energy, widespread operation of direct current power transmission systems, and liberalization of electricity markets, the stability characteristics of it are much more complex than the past. Due to these changes, conventional stability analysis and control approaches have a series of drawbacks in terms of speed, effectiveness and economy. On the contrary, the emerging artificial intelligence (AI) techniques provide powerful and promising tools for stability analysis and control in smart grids and have attracted growing attention. This paper aims to give a comprehensive and clear picture of recent advances in this research area. First, we present a general overview of AI, including its definitions, history and state-of-the-art methodologies. And then, this paper gives a comprehensive review of its applications to security assessment, stability assessment, fault diagnosis, and stability control in smart grids. These applications have achieved impressive results. Nevertheless, we also identify some major challenges these applications face in practice: high requirements on data, imbalanced learning, interpretability of AI, difficulties in transfer learning, the robustness of AI to communication quality, and the robustness against attack or adversarial examples. Furthermore, we provide suggestions for potential important future investigation directions to overcome these challenges and bridge the gap between research and practice.
Article
Uncertainty modeling of Renewable Energy Sources, load demand, electricity price, etc. creates a high volume of data in smart grids. Accordingly, in this paper, a precise forecasting method based on a deep learning concept with a Micro-clustering (MC) task is presented. The MC method is structured based on hybrid unsupervised and supervised clustering tasks by K-means and Gaussian Support Vector Machine, respectively. In the proposed method, the input data sequence is clustered by the MC task, and then the forecasting process is employed. By applying the MC, input data in each hour is categorized into different groups, and a distinctive forecasting unit is allocated to each one. In this way, more clusters and forecasting networks are earmarked for the hours with higher fluctuation rates. The Bi-directional Long Short-Term Memory (B-LSTM), which is one of the newest recurrent artificial neural networks, is proposed as the forecasting unit. The B-LSTM has bidirectional memory (feedforward and feedback loops) that helps us to investigate both previous and future hidden layers data. The optimal number of clusters in each hour is determined based on the Davies-Bouldin index. To evaluate the performance of the proposed method, in this study, three forecasting tasks including the wind speed, load demand, and electricity price are studied in different periods using the Ontario province, Canada data set. The results are compared with other benchmarking methods to verify the robustness and effectiveness of the proposed method. In fact, the proposed method, which is equipped with the MC technique and B-LSTM networks, significantly improves the forecasting results, especially at spike points.
Article
Clustering by fast search and find of density peak (CFSFDP) is a simple and crisp density-clustering algorithm. The original algorithm is not suitable for direct application to anomaly detection. Its clustering results have a high level of redundant density information. If used directly as behavior profiles, the computation and storage costs of anomaly detection are high. Therefore, an improved algorithm based on CFSFDP is proposed for anomaly detection. The improved algorithm uses a few data points and their radius to support behavior profiles, and deletes the redundant data points without supporting profiles. This method not only reduces the large amount of data storage and distance calculation in the process of generating profiles, but also reduces the search space of profiles in the detection process. Numerous experiments show that the improved algorithm generates profiles faster than density-based spatial clustering of application with noise (DBSCAN), and has better profile precision than adaptive real-time anomaly detection with incremental clustering (ADWICE). The improved algorithm inherits the arbitrary shape clusters of CFSFDP, and improves the storage and computation performance. Compared with DBSCAN and ADWICE, the improved anomaly-detection algorithm based on CFSFDP has more balanced detection precision and real-time performance.
Article
In this paper, we present a novel data-driven approach to detect outage events in partially observable distribution systems by capturing the changes in smart meters’ (SMs) data distribution. To achieve this, first, a breadth-first search (BFS)-based mechanism is proposed to decompose the network into a set of zones that maximize outage location information in partially observable systems. Then, using SM data in each zone, a generative adversarial network (GAN) is designed to implicitly extract the temporal-spatial behavior in normal conditions in an unsupervised fashion. After training, an anomaly scoring technique is leveraged to determine if real-time measurements indicate an outage event in the zone. Finally, to infer the location of the outage events in a multi-zone network, a zone coordination process is proposed to take into account the interdependencies of intersecting zones. We have provided analytical guarantees of performance for our algorithm using the concept of entropy, which is leveraged to quantify outage location information in multi-zone grids. The proposed method has been tested and verified on distribution feeder models with real SM data.
Article
With the fast development of industrial Internet of Things (IoT) for smart energy, data processing and storing are closer to the end used side. Edge data center, an intermediate platform between end data source and centralized data center, can reduce the data transmission pressure and processing time. To provide dependable data source for decision making and to reduce property loss, energy theft detection is important to an edge data center. In this work, we propose a threshold-based abnormality detector for energy theft detection in an edge data center. The framework includes training feature extractor based on VAE-GAN, implementing k-means clustering to determine the representative features of normal load profiles, and finally formulating a threshold-based abnormality detector based on defined abnormality degree. We demonstrate that when VAE-GAN converges, it can grasp the temporal relationship and statistical distribution of real data. The encoder of VAE-GAN has good feature extraction performance and the distribution of normal and abnormal data can be easily separated. Also, we prove that the proposed feature representation is better than the feature extracted by other advanced feature extractors. By comparison with state-of-the-art detection models, the proposed detector is more computationally efficient and robust against the attack type changes.
Article
This paper presents a novel approach for detection and identification of energy theft in distribution systems considering advanced metering infrastructure. For the energy theft detection stage, a three phase state estimator based on phasor measurement units is used to detect the transformers which have evidence of energy theft. The next step is to identify consumers which are stealing energy. A Self-Organizing Map (SOM) was trained for clustering consumers according to similar consumption patterns. For each class defined by the SOM, a Multilayer Perceptron Artificial Neural Network (MP-ANN) for classification of consumers into two classes, either honest or fraudulent, was created. The main contribution of the energy theft detection step is the reduction of the number of transformers which have suspect consumers without the need to install measurement units on all transformers. The use of ANN allows to identify the fraudulent users considering either cyber or physical attacks. Tests were conducted for energy theft detection step on the IEEE 70 busbar test system. Real data from 5000 consumers were used for identification of fraudulent users. The results show the effectiveness and robustness of the proposed technique, presenting a detection rate close to 93% with a false positive rate less than 2%.
Article
Abnormal electricity consumption (AEC) has caused huge economic losses to power supply enterprises in the past years, and has also posed severe threats to the safety of people’s daily lives. Accurate AEC detection is crucial to reducing the non-technical losses (NTLs) suffered by power supply enterprises and the State Grid. Compared with the huge amount of electricity data flow, AEC data are relatively few, which makes AEC detection a typical imbalanced learning problem. To address this issue, two effective AEC detection algorithms from the perspective of data balancing and data weighting, respectively, are studied in this paper: (i) the K-means clustering and synthetic minority oversampling (K-means SMOTE) technique combined with the artificial neural network (ANN) trained by kernel extreme learning machine (KELM), and (ii) the deep weighted ELM (DWELM), which builds on an improved multiclass AdaBoost imbalanced learning algorithm (AdaBoost-ID) and an enhanced deep representation network based ELM (EH-DrELM). Experiments on the electricity consumption data of State Grid Zhejiang Electric Power Corporation are presented to show the effectiveness of the proposed algorithms. Comparisons with many state-of-the-art methods are provided to demonstrate their superiority.
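The oversampling step behind the data-balancing approach can be sketched with plain SMOTE-style interpolation. This is a simplified illustration, not the K-means SMOTE variant used in the paper: the clustering stage is omitted, and the function name, parameters, and sample points are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class (abnormal) samples in a 2-D feature space.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote(minority, n_new=3):
    print(point)
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies rather than duplicating existing records.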
Article
With the development of demand response technologies, the pricing scheme in smart grids is moving from flat pricing to multiple pricing (MP), which facilitates energy saving on the consumer side. However, the flexible pricing policy may be exploited for the stealthy reduction of utility bills. In this paper, we present a hidden electricity theft (HET) attack that exploits the emerging MP scheme. The basic idea is that attackers can tamper with smart meters to deceive the utility into recording some electricity as consumed at a lower price. To construct the HET attack, we propose an optimization problem that aims to maximize the attack profits while evading current detection methods, and design two algorithms to conduct the attack on smart meters. Moreover, we disclose and exploit several new vulnerabilities of smart meters to demonstrate the feasibility of HET attacks. To protect smart grids against HET attacks, we propose several defense and detection countermeasures, including selective protection of smart meters, limiting the attack cycle, and updating the billing mechanism. Extensive experiments on a real data set demonstrate that the attack could cause high economic losses, and that the proposed countermeasures could effectively mitigate the attack's impact at a low cost.
Article
Machine learning has a wide range of applications in the recognition of power loads (PLs). To address problems in current PL classification algorithms, such as poor generalization and a tendency to fall into local optima, an improved algorithm based on the denoising deconvolutional auto-encoder was proposed to classify field PL data. With the mirror-symmetric structure of the network, the convolutional module can extract the distinctive features, while the deconvolutional module can reduce data redundancy and maintain high-activation pixels. The data preprocessing accomplishes data dimensionality reduction. In order to accelerate convergence and improve classification accuracy, unsupervised pre-training and ℓ2 regularization were used. The experimental results on the field data of a provincial power grid demonstrate that the proposed algorithm has better generalization performance and a higher recognition rate than other algorithms, thus providing an efficient and objective way to recognize PLs.
Article
With the wide deployment of smart meters in distribution systems, a new challenge emerges for the storage and transmission of the huge volume of power consumption data collected by smart meters. In this paper, a deep-learning-based compression method for smart meter data is proposed via a stacked convolutional sparse auto-encoder (SCSAE). An efficient and lightweight auto-encoder structure is first designed by leveraging the unique characteristics of smart meter readings. Specifically, the encoder is designed based on 2D separable convolution layers and the decoder is based on transposed convolution layers. Compared with the existing auto-encoder method and traditional methods, the proposed structure is redesigned, and the parameters and reconstruction errors are efficiently reduced. In addition, cluster-based indexes are used to represent the regularity of power consumption behavior, and the relationship between electricity consumption behavior and compression effect is studied. Case studies illustrate that the proposed method can attain significant enhancement in model size, computational efficiency, and reconstruction error reduction while maintaining the most abundant details. Grouping compression that considers users' electricity consumption rules can further improve the compression effect.
Article
The advent of Big Data has ushered in a new era of scientific breakthroughs. One of the common issues that affects raw data is the class imbalance problem, which refers to an imbalanced distribution of values of the response variable. This issue is present in fraud detection, network intrusion detection, medical diagnostics, and a number of other fields where negatively labeled instances significantly outnumber positively labeled instances. Modern machine learning techniques struggle to deal with imbalanced data because they focus on minimizing the error rate for the majority class while ignoring the minority class. The goal of our paper is to demonstrate the effects of class imbalance on classification models. Concretely, we study the impact of varying class imbalance ratios on classifier accuracy. By highlighting the precise nature of the relationship between the degree of class imbalance and the corresponding effects on classifier performance, we hope to help researchers better tackle the problem. To this end, we carry out extensive experiments using 10-fold cross validation on a large number of datasets. In particular, we determine that the relationship between the class imbalance ratio and the accuracy is convex.
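The point that a classifier can minimize error on the majority class while ignoring the minority class can be made concrete with a small arithmetic sketch. The function `majority_baseline_metrics` and the sample counts are hypothetical and are not taken from the paper's experiments; the sketch only shows how accuracy rises with the imbalance ratio for a classifier that never detects a single positive instance.

```python
def majority_baseline_metrics(n_negative, n_positive):
    """Accuracy and minority recall of a classifier that always predicts 'negative'."""
    total = n_negative + n_positive
    accuracy = n_negative / total  # every negative is right, every positive is wrong
    recall = 0.0                   # no positive instance is ever flagged
    return accuracy, recall

# Made-up class counts: 100 positives against a growing number of negatives.
for ratio in (1, 9, 99):
    acc, rec = majority_baseline_metrics(ratio * 100, 100)
    print(f"{ratio}:1 imbalance -> accuracy {acc:.2f}, minority recall {rec:.2f}")
```

At a 99:1 ratio this trivial classifier reaches 99% accuracy with zero recall on the minority class, which is why accuracy alone is a weak measure on imbalanced data.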
Article
Anomaly detection is a long-standing problem in system designation. High-quality anomaly detection can benefit many applications (e.g. system monitoring, disaster precaution and intrusion detection). Most existing anomaly detection algorithms cannot meet the requirements for effectiveness and real-time capability simultaneously. Therefore, in this paper, LGMAD, a real-time anomaly detection algorithm based on Long-Short Term Memory (LSTM) and the Gaussian Mixture Model (GMM), is proposed. Specifically, we evaluate the real-time anomalies of each univariate sensing time series via an LSTM model, and then a Gaussian Mixture Model is adopted to give a multidimensional joint detection of possible anomalies. Both the NAB dataset and a self-made dataset are employed to verify our approach. Extensive experiments are conducted to demonstrate the superiority of LGMAD compared to existing anomaly detection algorithms.
Article
The extensive deployment of smart meters in millions of households provides a huge amount of individual electricity consumption data for demand side analysis at a fine granularity. Different from traditional aggregated system-level data, smart meter data is more irregular and unpredictable. As a result, probabilistic load forecasting, which can provide a better understanding of the uncertainty and volatility in future demand, is critical to constructing energy-efficient and reliable smart grids. In this paper, a recently developed technique called Bayesian deep learning is employed to solve this challenging problem. In particular, a novel multitask probabilistic load forecasting framework based on Bayesian deep learning is proposed to quantify the shared uncertainties across distinct customer groups while accounting for their differences. Further, a clustering-based pooling method is designed to increase the data diversity and volume for the framework. This not only addresses the problem of overfitting but also improves the predictive performance. Numerical results are presented which demonstrate that the proposed framework provides superior probabilistic forecasting accuracy over conventional methods.
Article
Abstract: Recently, the radical digital transformation has deeply affected the traditional electricity grid and transformed it into an intelligent network (smart grid). This change is based on the progressive development of advanced technologies: advanced metering infrastructure (AMI) and smart meters, which play a crucial role in the development of the smart grid. AMI technologies have promising potential in terms of improved energy efficiency, better demand management, and reduced electricity costs. However, the possibility of hacking smart meters and stealing electricity is still among the most significant challenges facing electricity companies. In this regard, we propose a hybrid approach to detect anomalies associated with electricity theft in the AMI system, based on a combination of two robust machine learning algorithms: K-means and Deep Neural Network (DNN). The K-means unsupervised machine learning algorithm is used to identify groups of customers with similar electricity consumption patterns in order to understand different types of normal behavior. The DNN algorithm is used to build an accurate anomaly detection model capable of detecting changes or anomalies in usage behavior and deciding whether a customer has normal or malicious consumption behavior. The proposed model is constructed and evaluated on a real dataset from the Irish Smart Energy Trials. The results show a high performance of the proposed model compared to the models mentioned in the literature. Keywords: Anomaly detection, advanced metering infrastructure (AMI), smart grid, behavior, machine learning, deep neural network (DNN), cyber-security.
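The first stage of the hybrid approach described here, grouping customers with similar consumption patterns, can be sketched with a plain K-means (Lloyd's algorithm) loop. This is only a minimal illustration under stated assumptions: the DNN stage is omitted, the two-feature profiles are made-up values rather than data from the Irish Smart Energy Trials, and the naive initialization (first `k` points) is a simplification that a real implementation would replace.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""

    def nearest(p, centroids):
        # Index of the centroid with the smallest squared Euclidean distance.
        distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        return distances.index(min(distances))

    # Assumption: initialize with the first k points (not robust in general).
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical profiles: [average morning kWh, average evening kWh] per customer.
profiles = [[1.0, 4.0], [4.1, 1.0], [1.2, 3.8], [3.9, 1.2], [0.9, 4.2], [4.0, 0.8]]
centroids, labels = kmeans(profiles, k=2)
print(labels)
```

In a pipeline like the one the abstract outlines, the cluster label of each customer would then be available to the downstream classifier as a description of that customer's typical behavior.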