Conference Paper

Unsupervised Fault Detection Based on Laplacian Score and TEDA

... Neural Network: self-organizing map [36]; ANN [37,38]. Random Forest: classification problem (normal, fault) [39]. k-Nearest Neighbors (kNN): ensemble method based on kNN with random forest, k-means for feature selection [40]. Naïve Bayes classifier: ensemble method based on Naïve Bayes classifier with random forest, k-means for feature selection [40]. Kernel PCA: training on only normal data points and using a threshold for fault detection [19,41,42]. TEDA (Typicality and Eccentricity Data Analytics): unsupervised algorithm, no previous knowledge needed; detects outliers as faulty data samples [3,43-45]. Improved Support Vector Machines (SVM) ...
... With the TEDA method, no prior knowledge of the processes and data samples and no user-defined parameters are needed. TEDA was used as an unsupervised learning algorithm by Lou and Li [45], who selected features via the Laplacian Score method before training so that the a priori knowledge required during the pre-processing stage becomes negligible. ...
Article
Increasing productivity and decreasing production loss are important goals for modern industry to stay economically competitive. This requires efficient fault management and the quick amendment of faults in production lines. Prioritizing faults accelerates the fault amendment process but depends on preceding fault detection and classification. Data-driven methods can support fault management. The increasing use of sensors to monitor machine health status in production lines leads to large amounts of data and high complexity. Machine Learning methods exploit this data to support fault management. This paper reviews literature that presents methods for several steps of fault management and provides an overview of requirements for fault handling and of methods for fault detection, fault classification, and fault prioritization, as well as their prerequisites. The paper shows that fault prioritization lacks research on available learning methods and underlines that expert opinions are needed.
... Effectiveness analysis of the variables is performed with random forest (RF). RF is a variable selection method that ranks each variable's significance according to its score [25]. Fig. 6 illustrates the scores of all variables (Table I). ...
Article
Wind farms are usually located in plateau mountains and northern coastal areas, bringing a high probability of blade icing. Blade icing can even lead to blade cracks and turbine collapse. Traditional methods of blade icing diagnosis increase operating costs and carry the potential risk of damaging the original mechanical structure. A data-driven model based on a novel convolutional recurrent neural network is proposed in this paper. The method can effectively extract hidden features for accurate icing diagnosis. The hyperparameters of the proposed model are optimized by the improved African Vultures optimization algorithm (IAVOA). To alleviate the critical data imbalance, Adaptive Synthetic sampling (ADASYN) is used to oversample the minority classes of icing status. In comparison to state-of-the-art classification methods, the proposed method demonstrates outstanding effectiveness in blade icing diagnosis using sensor data from Supervisory Control and Data Acquisition (SCADA) systems. An effectiveness analysis of variables, an ablation study, and a sensitivity analysis validate the performance of the proposed method.
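The ADASYN oversampling step mentioned in this abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the neighbor count `k`, the seed, and the SMOTE-style interpolation are illustrative assumptions, and real use would rely on a library such as imbalanced-learn.

```python
import math
import random

def adasyn(minority, majority, k=5, seed=0):
    """Minimal ADASYN sketch: generate synthetic minority samples,
    concentrating on minority points surrounded by majority points."""
    rng = random.Random(seed)
    data = [(x, 0) for x in minority] + [(x, 1) for x in majority]

    # Ratio of majority points among each minority sample's k neighbors.
    ratios = []
    for x in minority:
        neighbors = sorted((p for p in data if p[0] is not x),
                           key=lambda p: math.dist(x, p[0]))[:k]
        ratios.append(sum(label for _, label in neighbors) / k)

    total = sum(ratios) or 1.0
    g_total = len(majority) - len(minority)   # samples needed to balance
    synthetic = []
    for x, r in zip(minority, ratios):
        g = round(r / total * g_total)        # more synthetics where r is high
        min_neighbors = sorted((m for m in minority if m is not x),
                               key=lambda m: math.dist(x, m))[:k]
        for _ in range(g):
            nb = rng.choice(min_neighbors)
            lam = rng.random()
            # Interpolate between the minority point and a minority neighbor.
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

The adaptive part is the per-sample count `g`: minority points whose neighborhoods contain more majority samples receive proportionally more synthetic offspring.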
... The evolving rule base for classification is performed by AnYa fuzzy systems [19]. Applications of TEDA also include approaches related to anomaly detection [20], real-time fault detection [21]- [22] and autonomous online learning for construction of local models in feature space [23]. ...
Article
Dealing with uncertain data requires effective methods to properly describe their real meaning in terms of a trade-off between interpretability and generality on the process of knowledge formation based on data abstraction. This paper proposes an online granulation process based on Evolving Ellipsoidal Fuzzy Information Granules (EEFIG) and the Principle of Justifiable Granularity (PJG) for data streams parameterization. The granulation process consists in the information granule development taking into consideration the data stream with a simplified optimal granularity allocation. In the sequel, an evolving Takagi-Sugeno fuzzy model based on the ellipsoidal granules is proposed for data reconstruction and one-step ahead prediction from past data numerical evidence. Experimental studies concerning clustering, data granulation, and time-series forecasting are performed to illustrate the effectiveness of the proposed method.
Article
Fault detection in industrial processes is a field of application that has gained considerable attention in the past few years, resulting in a large variety of techniques and methodologies designed to solve that problem. However, many of the approaches presented in the literature require significant amounts of prior knowledge about the process, such as mathematical models, data distributions, and pre-defined parameters. In this paper, we propose the application of TEDA (Typicality and Eccentricity Data Analytics), a fully autonomous algorithm, to the problem of fault detection in industrial processes. To perform fault detection, TEDA analyzes the density of each incoming data sample, calculated from the distance between that sample and all the samples read so far. TEDA is an online algorithm that learns autonomously and does not require any previous knowledge about the process or any user-defined parameters. Moreover, it requires minimal computational effort, enabling its use in real-time applications. The efficiency of the proposed approach is demonstrated with two different real-world industrial plant data streams that provide “normal” and “faulty” data. The results shown in this paper are very encouraging when compared with traditional fault detection approaches.
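The per-sample eccentricity computation that TEDA performs online can be sketched for scalar streams as follows. This is a simplified, hedged illustration (scalar data, Euclidean distance, recursive mean and mean-of-squares, and the commonly used threshold (n² + 1)/(2k) on normalized eccentricity), not the authors' exact algorithm.

```python
def teda_detect(stream, n=3.0):
    """Streaming TEDA-style outlier flags for scalar samples.

    Maintains the running mean and mean of squares recursively; each new
    sample's normalized eccentricity is compared against (n^2 + 1)/(2k)."""
    mean = 0.0
    mean_sq = 0.0
    flags = []
    for k, x in enumerate(stream, start=1):
        mean += (x - mean) / k            # recursive mean update
        mean_sq += (x * x - mean_sq) / k  # recursive mean of squares
        var = mean_sq - mean * mean       # biased variance
        if k < 3 or var <= 0.0:
            flags.append(False)           # too few samples to judge
            continue
        ecc = 1.0 / k + (mean - x) ** 2 / (k * var)  # eccentricity
        zeta = ecc / 2.0                             # normalized eccentricity
        flags.append(zeta > (n * n + 1.0) / (2.0 * k))
    return flags
```

Because every update is recursive, each sample is processed in constant time, which is what makes the method attractive for real-time streams.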
Article
We introduce a framework for filtering features that employs the Hilbert-Schmidt Independence Criterion (HSIC) as a measure of dependence between the features and the labels. The key idea is that good features should maximise such dependence. Feature selection for various supervised learning problems (including classification and regression) is unified under this framework, and the solutions can be approximated using a backward-elimination algorithm. We demonstrate the usefulness of our method on both artificial and real world datasets.
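The empirical HSIC criterion itself is compact enough to sketch. The version below uses linear kernels, under which HSIC reduces to the squared empirical cross-covariance; the kernel choice and the 1/(n-1)² normalization are illustrative assumptions, not the paper's full setup.

```python
def hsic_score(x, y):
    """Empirical HSIC with linear kernels: tr(K H L H) / (n - 1)^2,
    where H = I - (1/n) 11' is the centering matrix."""
    n = len(x)
    K = [[a * b for b in x] for a in x]   # linear kernel on the feature
    L = [[a * b for b in y] for a in y]   # linear kernel on the labels

    def centered(M):
        # (H M H)_ij = M_ij - rowmean_i - colmean_j + grandmean
        row = [sum(r) / n for r in M]
        col = [sum(M[i][j] for i in range(n)) / n for j in range(n)]
        grand = sum(row) / n
        return [[M[i][j] - row[i] - col[j] + grand for j in range(n)]
                for i in range(n)]

    Kc, Lc = centered(K), centered(L)
    # tr(Kc Lc) for symmetric matrices = elementwise product sum.
    return sum(Kc[i][j] * Lc[i][j]
               for i in range(n) for j in range(n)) / (n - 1) ** 2
```

A feature identical to the labels scores much higher than one that merely alternates, matching the intuition that good features maximise dependence on the labels.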
Article
In this article, we describe an unsupervised feature selection algorithm suitable for data sets, large in both dimension and size. The method is based on measuring similarity between features, whereby redundancy therein is removed. This does not need any search and, therefore, is fast. A new feature similarity measure, called the maximum information compression index, is introduced. The algorithm is generic in nature and has the capability of multiscale representation of data sets. The superiority of the algorithm, in terms of speed and performance, is established extensively over various real-life data sets of different sizes and dimensions. It is also demonstrated how redundancy and information loss in feature selection can be quantified with an entropy measure.
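The maximum information compression index described here is the smaller eigenvalue of the 2x2 covariance matrix of a feature pair, so a sketch is short; it is zero exactly when the two features are perfectly linearly dependent (fully redundant). Variable names below are illustrative.

```python
import math

def mici(x, y):
    """Maximum information compression index: the smaller eigenvalue of
    the 2x2 covariance matrix [[vx, cov], [cov, vy]] of features x, y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    s = vx + vy
    det = vx * vy - cov * cov
    return (s - math.sqrt(s * s - 4.0 * det)) / 2.0   # smaller eigenvalue
```

Feature pairs with a small index are highly redundant, so one member of the pair can be discarded with little information loss.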
Article
In this paper we introduce a classifier named TEDAClass (Typicality and Eccentricity based Data Analytics Classifier), which is based on the recently proposed AnYa type fuzzy rule based system. Specifically, the rules of the proposed classifier are defined according to the recently proposed TEDA framework. This novel and efficient systematic methodology for data analysis is a promising addition to traditional probability as well as to fuzzy logic. It is centred on non-parametric density estimation derived from the data sample. In addition, the proposed framework is computationally cheap and provides fast and exact per-point processing of the data set/stream. The algorithm is demonstrated to be suitable for different classification tasks. Throughout the paper we give evidence of its applicability to a wide range of practical problems. Furthermore, the algorithm can be easily adapted to different classical data analytics problems, such as clustering, regression, prediction, and outlier detection. Finally, it is very important to remark that the proposed algorithm can work "from scratch" and evolve its structure during the learning process.
Conference Paper
In this paper, we propose a new eccentricity-based anomaly detection principle and algorithm. It is based on a further development of the recently introduced data analytics framework TEDA (typicality and eccentricity data analytics). We compare TEDA with the traditional statistical approach and prove that TEDA is a generalization of it with regard to the well-known "nσ" analysis (TEDA gives exactly the same result as the traditional "nσ" analysis but does not require the restrictive prior assumptions that the traditional approach needs). Moreover, it offers non-parametric, closed-form analytical descriptions (models of the data distribution) extracted from the real data realizations rather than pre-assumed. In addition, for several types of proximity/similarity measures (such as Euclidean, cosine, and Mahalanobis distances) it can be calculated recursively, and thus very efficiently, making it suitable for real-time and online algorithms. Building on the per-sample, exact information about the data distribution in a closed analytical form, we propose a new, less conservative and more sensitive condition for anomaly detection. It is quite different from the traditional "nσ"-type conditions. We demonstrate an example where traditional conditions would lead to an increased number of false negatives or false positives in comparison with the proposed condition. The new condition is intuitive and easy to check for an arbitrary data distribution and an arbitrarily small (but not less than 3) number of data samples/points. Finally, because anomaly/novelty/change detection is a basic data-analysis operation that underpins higher-level tasks such as fault detection, drift detection in data streams, clustering, outlier detection, autonomous video analytics, and particle physics, we point to some possible applications, which will be the domain of future work.
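The claimed correspondence with the classical "nσ" analysis can be checked numerically on scalar data: with the biased variance, the TEDA condition ζ > (n² + 1)/(2k) on normalized eccentricity reduces algebraically to |x − μ| > nσ. The batch sketch below is a hedged illustration of that equivalence, not the paper's recursive algorithm.

```python
def flags_nsigma(data, n=3.0):
    """Classic batch 'n sigma' outlier test (biased variance)."""
    k = len(data)
    mu = sum(data) / k
    var = sum((x - mu) ** 2 for x in data) / k
    return [abs(x - mu) > n * var ** 0.5 for x in data]

def flags_teda(data, n=3.0):
    """Batch TEDA test: normalized eccentricity vs (n^2 + 1) / (2k)."""
    k = len(data)
    mu = sum(data) / k
    var = sum((x - mu) ** 2 for x in data) / k
    # eccentricity xi = 1/k + (x - mu)^2 / (k var); zeta = xi / 2
    zetas = [(1.0 / k + (x - mu) ** 2 / (k * var)) / 2.0 for x in data]
    return [z > (n * n + 1.0) / (2.0 * k) for z in zetas]
```

Both tests flag exactly the same points, since ζ > (n² + 1)/(2k) rearranges to (x − μ)² > n²σ².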
Article
The paper proposes a new fault-detection scheme for nonlinear input-output systems with uncertain interval-type parameters. Using a high-pass filter for detecting the system's steady-state conditions, one is able to build a confidence band of data for the system in normal operating mode. The band can be seen as an approximation of the system's nonlinear input-output mapping including the effects of uncertainties. By approximating the boundary functions of the band using an interval fuzzy model (INFUMO), we are able to build a simple and effective fault-detection system based on the boundary-crossing test and logical relations. An example of a fault-detection system for a pH neutralization process is presented to demonstrate the benefits of the proposed method.
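The boundary-crossing test at the core of this scheme is a simple band check once the boundary functions are available. In the sketch below, the identified interval-fuzzy-model boundaries are stood in for by ordinary callables, which is an assumption for illustration only.

```python
def boundary_crossing_faults(inputs, outputs, lower, upper):
    """Flag samples whose output leaves the confidence band
    [lower(u), upper(u)] built for normal operating mode.

    `lower` and `upper` stand in for the identified INFUMO boundary
    functions; here they are ordinary Python callables."""
    return [not (lower(u) <= y <= upper(u)) for u, y in zip(inputs, outputs)]
```

In practice the flagged samples would be post-processed (e.g. requiring several consecutive crossings) before declaring a fault, to suppress noise-induced false alarms.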
Article
Fault detection for aileron actuators mainly serves to enhance reliability and fault-tolerance capability. A fault detection method for aileron actuators under variable conditions is proposed in this study. In the approach, three neural networks are used for fault detection and preliminary fault localization. The first neural network, which is employed as an observer, is established to monitor the aileron actuator and estimate the system output. The second neural network generates the corresponding adaptive threshold synchronously. The last neural network is used as a force motor current observer and outputs the estimated force motor current. Faults are detected by comparing the residual error (the difference between the actual and estimated output) with the threshold, or by comparing the force motor current with the estimated force motor current. To account for the variable conditions, aerodynamic loads are introduced to the neural network, and the training order spectrums are designed. Finally, the effectiveness of the proposed scheme is demonstrated by a simulation model with different faults.
Book
A critical and important issue in the design of automatic control systems of successively increasing complexity is guaranteeing high system performance over a wide operating range and meeting the requirements on system reliability and dependability. As one of the key technologies for solving this problem, advanced fault detection and identification (FDI) technology is receiving considerable attention. The objective of this book is to introduce basic model-based FDI schemes, advanced analysis and design algorithms, and the needed mathematical and control theory tools at a level suitable for graduate students and researchers as well as for engineers. © 2008 Springer-Verlag Berlin Heidelberg. All rights reserved.
Article
This study aims at providing a fault detection and diagnosis (FDD) approach based on nonlinear parity equations identified from process data. Process knowledge is used to reduce the process nonlinearity from high-dimensional to low-dimensional nonlinear functions representing common process devices, such as valves, and incorporating the monotonicity properties of the dependencies between the variables. The fault detection approach treats the obtained process model as nonlinear parity equations, and fault diagnosis is carried out with the standard structured residual method. The applicability of the approach to complex flow networks controlled by valves is tested on the drying section of an industrial board machine, in which the key problems are leakages and blockages of valves and pipes in the steam–water network. Nonlinear model equations based on the mass balance of different parts of the network are identified and validated. Finally, fault detection and diagnosis algorithms are successfully implemented, tested, and reported.
Article
While principal component analysis (PCA) has found wide application in process monitoring, slow and normal process changes often occur in real processes, which lead to false alarms for a fixed-model monitoring approach. In this paper, we propose two recursive PCA algorithms for adaptive process monitoring. The paper starts with an efficient approach to updating the correlation matrix recursively. The algorithms, using rank-one modification and Lanczos tridiagonalization, are then proposed and their computational complexity is compared. The number of principal components and the confidence limits for process monitoring are also determined recursively. A complete adaptive monitoring algorithm that addresses the issues of missing values and outliers is presented. Finally, the proposed algorithms are applied to a rapid thermal annealing process in semiconductor processing for adaptive monitoring.
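The recursive rank-one update idea can be illustrated with a Welford-style covariance update: each new sample adjusts the running mean and adds a rank-one term to the scatter matrix, so no pass over past data is needed. The paper works with the correlation matrix of scaled data; the covariance version below is a simplified, hedged stand-in for the same mechanism.

```python
def recursive_covariance(samples):
    """Rank-one (Welford-style) recursive update of mean and covariance.

    Each sample updates the running mean, then contributes the rank-one
    term (x - mean_old)(x - mean_new)' to the scatter matrix."""
    d = len(samples[0])
    mean = [0.0] * d
    scatter = [[0.0] * d for _ in range(d)]
    for k, x in enumerate(samples, start=1):
        delta = [xi - mi for xi, mi in zip(x, mean)]    # x - mean_{k-1}
        mean = [mi + di / k for mi, di in zip(mean, delta)]
        delta2 = [xi - mi for xi, mi in zip(x, mean)]   # x - mean_k
        for i in range(d):
            for j in range(d):
                scatter[i][j] += delta[i] * delta2[j]
    k = len(samples)
    return [[s / (k - 1) for s in row] for row in scatter]
```

The result matches the batch sample covariance, but the per-sample cost is only O(d²), which is what makes recursive monitoring of drifting processes feasible.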
Conference Paper
In supervised learning scenarios, feature selection has been studied widely in the literature. Selecting features in unsupervised learning scenarios is a much harder problem, due to the absence of class labels that would guide the search for relevant information. Moreover, almost all previous unsupervised feature selection methods are "wrapper" techniques that require a learning algorithm to evaluate the candidate feature subsets. In this paper, we propose a "filter" method for feature selection which is independent of any learning algorithm. Our method can be performed in either a supervised or an unsupervised fashion. The proposed method is based on the observation that, in many real-world classification problems, data from the same class are often close to each other. The importance of a feature is evaluated by its power of locality preserving, or Laplacian Score. We compare our method with data variance (unsupervised) and Fisher score (supervised) on two data sets. Experimental results demonstrate the effectiveness and efficiency of our algorithm. Feature selection methods can be classified into "wrapper" methods and "filter" methods (4). Wrapper techniques evaluate the features using the learning algorithm that will ultimately be employed; thus, they "wrap" the selection process around the learning algorithm. Most feature selection methods are wrapper methods. Algorithms based on the filter model examine intrinsic properties of the data to evaluate the features prior to the learning tasks. Filter-based approaches almost always rely on the class labels, most commonly assessing correlations between features and the class label. In this paper, we are particularly interested in filter methods. Some typical filter methods include data variance, Pearson correlation coefficients, Fisher score, and the Kolmogorov-Smirnov test. Most of the existing filter methods are supervised.
Data variance might be the simplest unsupervised evaluation of the features. The variance along a dimension reflects its representative power and can be used as a criterion for feature selection and extraction. For example, Principal Component Analysis (PCA) is a classical feature extraction method which finds a set of mutually orthogonal basis functions that capture the directions of maximum variance in the data.
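The Laplacian Score can be sketched directly from this description: build a k-nearest-neighbor graph with heat-kernel weights, then score each feature by how well it preserves the graph's local structure (lower is better). The neighbor count `k` and kernel width `t` below are illustrative choices, not values from the paper.

```python
import math

def laplacian_scores(X, k=2, t=1.0):
    """Laplacian Score for each feature (column) of X; lower is better."""
    n, d = len(X), len(X[0])

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Symmetric kNN adjacency with heat-kernel weights S_ij = exp(-d^2/t).
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist2(X[i], X[j]))
        for j in order[:k]:
            w = math.exp(-dist2(X[i], X[j]) / t)
            S[i][j] = S[j][i] = w
    D = [sum(row) for row in S]           # degree of each node
    total_d = sum(D)

    scores = []
    for r in range(d):
        f = [X[i][r] for i in range(n)]
        shift = sum(fi * di for fi, di in zip(f, D)) / total_d
        ft = [fi - shift for fi in f]     # remove the D-weighted mean
        # f~' L f~ via the graph identity: 1/2 * sum S_ij (f_i - f_j)^2
        num = sum(S[i][j] * (ft[i] - ft[j]) ** 2
                  for i in range(n) for j in range(n)) / 2.0
        den = sum(di * fi * fi for di, fi in zip(D, ft))   # f~' D f~
        scores.append(num / den if den else float("inf"))
    return scores
```

A feature that separates the natural clusters varies little across graph edges and thus scores low, while a feature that fluctuates within neighborhoods scores high.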
Conference Paper
In real-world concept learning problems, the representation of data often uses many features, only a few of which may be related to the target concept. In this situation, feature selection is important both to speed up learning and to improve concept quality. A new feature selection algorithm, Relief, uses a statistical method and avoids heuristic search. Relief requires linear time in the number of given features and the number of training instances, regardless of the target concept to be learned. Although the algorithm does not necessarily find the smallest subset of features, the size tends to be small because only statistically relevant features are selected. This paper focuses on empirical test results in two artificial domains: the LED Display domain and the Parity domain, with and without noise. Comparison with other feature selection algorithms shows Relief's advantages in terms of learning time and the accuracy of the learned concept, suggesting Relief's practicality.
Article
Fault management is critical for a vehicle active safety system. Since a sensor fault may not always be detectable by a sensor self-test or by an electronic monitoring system whose detection often relies on out-of-range signals, a redundancy check is warranted for the detection of an in-range signal fault. In this paper, an in-vehicle roll rate sensor failure detection scheme utilizing analytical redundancy is presented. The vehicle is assumed to be equipped with a steering wheel angle sensor, a yaw rate sensor, a lateral accelerometer, and wheel speed sensors in addition to the roll rate sensor. Due to the wide variation of vehicle dynamics over a vast operating range, such as various and dynamically changing road super-elevations and road grades, the detection of a roll rate signal fault using analytical redundancy is particularly challenging. These challenges, as well as the robustness and performance of the proposed scheme, are discussed. The robust performance of the proposed scheme over model uncertainties and road disturbances is illustrated analytically and validated through experimental test data. The analytical illustrations include three elements: a robust estimation of the vehicle roll angle, a dynamic compensation of both electrical and kinematics-induced biases in the roll rate signal, and a directionally sensitive design of a robust observer which decouples the model uncertainties and disturbances from the fault. Experimental verification of no false positives and no false negatives was carried out with a variety of maneuvers and road conditions on several vehicle test platforms.