Fig 1 - uploaded by Sucheta Chauhan
LSTM block diagram with three multiplicative gates: an input gate, a forget gate, and an output gate.
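As a companion to the diagram, here is a minimal NumPy sketch of a single LSTM step showing how the three multiplicative gates act on the cell state. The weight layout (one stacked matrix over the concatenated hidden state and input) and all names are illustrative, not taken from the cited paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x] to the four stacked
    gate pre-activations (input, forget, output, candidate);
    b is the stacked bias."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0:n])        # input gate: how much new info to admit
    f = sigmoid(z[n:2*n])      # forget gate: how much old memory to keep
    o = sigmoid(z[2*n:3*n])    # output gate: how much state to expose
    g = np.tanh(z[3*n:4*n])    # candidate cell state
    c = f * c_prev + i * g     # gated memory update
    h = o * np.tanh(c)         # gated output
    return h, c

# Toy usage: hidden size 3, input size 2.
rng = np.random.default_rng(0)
h, c = np.zeros(3), np.zeros(3)
W = rng.standard_normal((12, 5)) * 0.1
b = np.zeros(12)
h, c = lstm_step(rng.standard_normal(2), h, c, W, b)
```

Because the output gate and tanh both squash their arguments, every entry of `h` stays strictly inside (-1, 1).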

Similar publications

Chapter
Full-text available
The paper introduces a multilayer long short-term memory (LSTM) based auto-encoder network to spot abnormalities in fetal ECG. The LSTM network was used to detect patterns in the time series, compute reconstruction errors, and classify a given segment as an anomaly or not. The proposed anomaly detection method provides a filtering procedure able to reproduce E...

Citations

... Anomaly detection (AD) aims to identify data objects that significantly deviate from the majority of samples, with numerous successful applications in intrusion detection [36,44], fault detection [22,85], medical diagnosis [13,38], fraud detection [2,8,9], social media analysis [81,86], etc. Recently, deep neural networks have become the primary techniques in AD due to their powerful representation learning capacity [57]. ...
Conference Paper
Full-text available
Deep learning (DL) techniques have recently found success in anomaly detection (AD) across various fields such as finance, medical services, and cloud computing. However, most of the current research tends to view deep AD algorithms as a whole, without dissecting the contributions of individual design choices like loss functions and network architectures. This view tends to diminish the value of preliminary steps like data preprocessing, as more attention is given to newly designed loss functions, network architectures, and learning paradigms. In this paper, we aim to bridge this gap by asking two key questions: (i) Which design choices in deep AD methods are crucial for detecting anomalies? (ii) How can we automatically select the optimal design choices for a given AD dataset, instead of relying on generic, pre-existing solutions? To address these questions, we introduce ADGym, a platform specifically crafted for comprehensive evaluation and automatic selection of AD design elements in deep methods. Our extensive experiments reveal that relying solely on existing leading methods is not sufficient. In contrast, models developed using ADGym significantly surpass current state-of-the-art techniques.
... If the LSTM's prediction for a future data point does not match the actual observed data, it is an indication of an anomaly. Owing to their long memory, they can be particularly useful for spotting anomalies that are based on long-term patterns [26,27]. Convolutional neural networks (CNNs) are primarily designed for image processing to identify spatial hierarchies in data. ...
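The excerpt above describes flagging a point as anomalous when the model's forecast disagrees with the observation. A minimal sketch of that idea follows; the fixed threshold and the hand-written forecasts standing in for LSTM output are illustrative assumptions.

```python
import numpy as np

def flag_anomalies(actual, predicted, threshold):
    """Flag points whose absolute prediction error exceeds the
    threshold. `predicted` stands in for a model's one-step
    forecasts (e.g. from an LSTM)."""
    errors = np.abs(np.asarray(actual) - np.asarray(predicted))
    return errors > threshold

actual    = [1.0, 1.1, 0.9, 5.0, 1.0]
predicted = [1.0, 1.0, 1.0, 1.0, 1.0]
print(flag_anomalies(actual, predicted, threshold=0.5).tolist())
# -> [False, False, False, True, False]
```

In practice the threshold is usually chosen from the error distribution on held-out normal data rather than fixed by hand.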
Article
Full-text available
Anomalies are infrequent in nature, but detecting these anomalies could be crucial for the proper functioning of any system. The rarity of anomalies could be a challenge for their detection as detection models are required to depend on the relations of the datapoints with their adjacent datapoints. In this work, we use the rarity of anomalies to detect them. For this, we introduce the reversible instance normalized anomaly transformer (RINAT). Rooted in the foundational principles of the anomaly transformer, RINAT incorporates both prior and series associations for each time point. The prior association uses a learnable Gaussian kernel to ensure a thorough understanding of the adjacent concentration inductive bias. In contrast, the series association method uses self-attention techniques to specifically focus on the original raw data. Furthermore, because anomalies are rare in nature, we utilize normalized data to identify series associations and employ non-normalized data to uncover prior associations. This approach enhances the modelled series associations and, consequently, improves the association discrepancies.
... Researchers have diligently explored methods to navigate these intricacies, propelling the evolution of log-based anomaly detection techniques (Ergen and Kozat, 2019; Malhotra et al, 2015; Chauhan and Vig, 2015; Xu et al, 2009; Zhang et al, 2020; Lu et al, 2018). From long short-term memory (LSTM)-based (Greff et al, 2016; Hochreiter and Schmidhuber, 1997) algorithms that forecast time series data and fit multivariate Gaussian distributions to prediction errors (Malhotra et al, 2015; Chauhan and Vig, 2015), to the integration of transformer networks (Nedelkoski et al, 2020), the field has witnessed innovations designed to capture anomalies in diverse ways. However, conventional LSTM-based approaches, while effective, often suffer from parameter sensitivity that can hinder their real-world applicability (Malhotra et al, 2015). ...
... These models have proven effective in capturing sequential dependencies in log data (Greff et al, 2016). Notably, LSTM-based algorithms, as described in (Malhotra et al, 2015) and (Chauhan and Vig, 2015), follow a two-step approach. First, they predict time series data using LSTM models. ...
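The two-step recipe described in the excerpts (forecast with an LSTM, then fit a multivariate Gaussian to the prediction errors) can be sketched as follows. The function names and the synthetic error vectors are illustrative; here the errors are scored by their Mahalanobis distance from the fitted Gaussian.

```python
import numpy as np

def gaussian_anomaly_scores(errors, train_errors):
    """Fit a multivariate Gaussian to prediction errors collected
    on normal data, then score new error vectors by squared
    Mahalanobis distance (higher = more anomalous)."""
    mu = train_errors.mean(axis=0)
    cov = np.cov(train_errors, rowvar=False)
    # Small ridge keeps the covariance invertible.
    cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    d = errors - mu
    # Per-row quadratic form d_i^T C^{-1} d_i.
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

rng = np.random.default_rng(1)
train = rng.normal(0.0, 0.1, size=(500, 2))  # errors on healthy data
test = np.array([[0.05, -0.02],              # ordinary forecast error
                 [1.50, 1.50]])              # grossly wrong forecast
scores = gaussian_anomaly_scores(test, train)
```

Thresholding these scores (e.g. at a high percentile of the training scores) yields the final anomaly labels.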
Preprint
Full-text available
Log messages play a critical role in system analysis and issue resolution, particularly in complex software-intensive systems that demand high availability and quality assurance. However, log-based anomaly detection faces three major challenges. Firstly, millions of log data poses a significant labeling challenge. Secondly, log data tends to exhibit a severe class imbalance. Thirdly, the task of anomaly detection in such massive datasets requires both high accuracy and efficiency. Numerous deep learning based methods have been proposed to tackle those challenges. Yet, a comprehensive solution that effectively addresses all these issues has remained elusive. Through careful examination of log messages from stable systems, we find a consistency principle: the number of unique anomaly logs is consistently small. Based on this principle, we present a novel framework called ''Whole Lifecycle Tuning Anomaly Detection with Small Sample Logs'' (SSADLog). SSADLog introduces a hyper-efficient log data pre-processing method that generates a representative subset of small sample logs. It leverages a pre-trained bidirectional encoder representations from transformers (BERT) language model to create contextual word embeddings. Furthermore, a semi-supervised fine-tuning process is employed for enhancing detection accuracy. A distinctive feature of SSADLog is its ability to fine-tune language models with small samples, achieving high-performance iterations in just approximately 30 minutes. Extensive experimental evaluations show that SSADLog greatly reduces the effort to detect anomaly log messages from millions of daily new logs and outperforms the previous representative methods across various log datasets in terms of precision, recall, and F1 score.
... The problem then shifts to become a classification problem where we want to attribute a label of anomalous or not to each cycle, instead of finding point anomalies in the data. This approach is similar to that of detecting anomalies in electrocardiogram signals [4], where each heartbeat is analyzed individually. ...
Chapter
Full-text available
The transition to Industry 4.0 provoked a transformation of industrial manufacturing with a significant leap in automation and intelligent systems. This paradigm shift has brought about a mindset that emphasizes predictive maintenance: detecting future failures when current behaviour of industrial processes and machines is thought to be normal. The constant monitoring of industrial equipment produces massive quantities of data that enables the application of machine learning approaches to this task. This study uses deep learning-based models to build a data-driven predictive maintenance framework for the air production unit (APU), a crucial system for the proper functioning of a Metro do Porto train. This public transport system moves thousands of people every day and train failures lead to delays and loss of trust by clients. Therefore, it is essential not only to detect APU failures before they occur to minimize negative impacts, but also to provide explanations for the failure warnings that can aid in decision-making processes. We propose an autoencoder architecture trained with an adversarial loss, known as the Wasserstein Autoencoder with Generative Adversarial Network (WAE-GAN), designed to detect sensor failures in systems connected to the APU. Our model can detect APU failures up to two hours before they occur, allowing timely intervention of the maintenance teams. We further augment our model with an explainability layer, by providing explanations generated by a rule-based model that focuses on rare events. Results show that our model is able to detect APU failures without any false alarms, fulfilling the requisites of Metro do Porto for early detection of the failures.
... The forget gate selectively maintains information by retaining or discarding it based on the combination of these inputs. σ refers to the sigmoid activation function [22]. The input gate is utilized to extract the current sequence input and the input value from the previous layer unit and calculate the input factor [23]. ...
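For reference, the gates described in this excerpt follow the standard LSTM formulation (as in Hochreiter and Schmidhuber; notation after Greff et al.); the weight and bias names below are the conventional generic ones, not symbols from the cited paper:

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\!\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
\end{aligned}
```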
Article
Full-text available
Su, S.; Zhu, Z.; Wan, S.; Sheng, F.; Xiong, T.; Shen, S.; Hou, Y.; Liu, C.; Li, Y.; Sun, X.; et al. An ECG Signal Acquisition and Analysis System Based on Machine Learning with Model Fusion. Sensors 2023, 23, 7643. https://doi.org/10.3390/s23177643
Abstract: Recently, cardiovascular disease has become the leading cause of death worldwide. Abnormal heart rate signals are an important indicator of cardiovascular disease. At present, the ECG signal acquisition instruments on the market are not portable, and manual analysis is applied in data processing, which cannot address the above problems. To solve these problems, this study proposes an ECG acquisition and analysis system based on machine learning. The ECG analysis system responsible for ECG signal classification includes two parts: data preprocessing and machine learning models. Multiple types of models were built for overall classification, and model fusion was conducted. Firstly, traditional models such as logistic regression, support vector machines, and XGBoost were employed, along with feature engineering that primarily included morphological features and wavelet coefficient features. Subsequently, deep learning models, including convolutional neural networks and long short-term memory networks, were introduced and utilized for model fusion classification. The system's classification accuracy for ECG signals reached 99.13%. Future work will focus on optimizing the model and developing a more portable instrument that can be utilized in the field.
... To overcome this issue, some researchers proposed to use recurrent neural network (RNN) and its variations (Gated recurrent unit (GRU) and long short-term memory (LSTM)) [10,11]. These networks are commonly used in time series processing. ...
Article
Full-text available
Epilepsy is a neurological disorder characterized by recurring seizures, detected by electroencephalography (EEG). EEG signals can be detected by manual time-consuming analysis and recently by automatic detection. The latter poses a significant challenge due to the high dimensional and non-stationary nature of EEG signals. Recently, deep learning (DL) techniques have emerged as valuable tools for seizure detection. In this study, a novel data-driven model based on DL, incorporating a self-attention mechanism (SAT), is proposed. One notable advantage of the proposed method is its simplicity in application, as the raw signal data is directly fed into the suggested network without requiring expertise in signal processing. The model leverages a one-dimensional convolutional neural network (CNN) to extract relevant features from EEG signals. These features are then passed through a long short-term memory (LSTM) module to benefit from its memory capabilities, along with a SAT mechanism. The key contribution of this paper lies in the addition of the SAT layer to the LSTM encoder, enabling enhanced exploration of the latent mapping during the encoding step. Cross-subject experiments revealed good performance of this approach with F1-score of 97.8% and 92.7% for binary and five-class epileptic seizure recognition tasks, respectively, on the public UCI dataset, and 97.9% on the CHB-MIT database, surpassing state-of-the-art DL performance. Besides, the proposed method exhibits robustness to inter-subject variability.
... Deep Learning methods in TAD: In recent years, deep learning models have demonstrated potential in addressing the complexities of TAD due to their capacity to autonomously extract features from raw data [13]. Prior research has leaned toward unsupervised [65,26,2,37,61] or semi-supervised [8,10,42,45] techniques due to the paucity of labelled data in real-world scenarios [13]. The preference for models like LSTM-AD [40], LSTM-NDT [26], DITAN [20] and THOC [53] is because they excel at minimising forecasting errors while capturing time series data's temporal dependencies. ...
Preprint
Full-text available
We introduce a Self-supervised Contrastive Representation Learning Approach for Time Series Anomaly Detection (CARLA), an innovative end-to-end self-supervised framework carefully developed to identify anomalous patterns in both univariate and multivariate time series data. By taking advantage of contrastive representation learning, CARLA effectively generates robust representations for time series windows. It achieves this by 1) learning similar representations for temporally close windows and dissimilar representations for windows and their equivalent anomalous windows and 2) employing a self-supervised approach to classify normal/anomalous representations of windows based on their nearest/furthest neighbours in the representation space. Most of the existing models focus on learning normal behaviour. The normal boundary is often tightly defined, which can result in slight deviations being classified as anomalies, resulting in a high false positive rate and limited ability to generalise normal patterns. CARLA's contrastive learning methodology promotes the production of highly consistent and discriminative predictions, thereby empowering us to adeptly address the inherent challenges associated with anomaly detection in time series data. Through extensive experimentation on 7 standard real-world time series anomaly detection benchmark datasets, CARLA demonstrates F1 and AU-PR superior to existing state-of-the-art results. Our research highlights the immense potential of contrastive representation learning in advancing the field of time series anomaly detection, thus paving the way for novel applications and in-depth exploration in this domain.
... In the related literature, we encounter methods that are close to our proposed model, so in the following, we discuss their similarities and differences and use them in the experimental comparison. Chauhan and Vig [10] use the Long Short Term Memory (LSTM) network [11] on an electrocardiography signals dataset that contains 1-dimension time series to detect anomalies. Malhotra et al. [12] use CNNs on a balanced dataset of healthy and broken tooth gears, to detect anomalies. ...
Preprint
Full-text available
Anomaly Detection in multivariate time series is a major problem in many fields. Due to their nature, anomalies sparsely occur in real data, thus making the task of anomaly detection a challenging problem for classification algorithms to solve. Methods that are based on Deep Neural Networks such as LSTM, Autoencoders, Convolutional Autoencoders etc., have shown positive results in such imbalanced data. However, the major challenge that algorithms face when applied to multivariate time series is that the anomaly can arise from a small subset of the feature set. To boost the performance of these base models, we propose a feature-bagging technique that considers only a subset of features at a time, and we further apply a transformation that is based on nested rotation computed from Principal Component Analysis (PCA) to improve the effectiveness and generalization of the approach. To further enhance the prediction performance, we propose an ensemble technique that combines multiple base models toward the final decision. In addition, a semi-supervised approach using a Logistic Regressor to combine the base models' outputs is proposed. The proposed methodology is applied to the Skoltech Anomaly Benchmark (SKAB) dataset, which contains time series data related to the flow of water in a closed circuit, and the experimental results show that the proposed ensemble technique outperforms the basic algorithms. More specifically, the performance improvement in terms of anomaly detection accuracy reaches 2% for the unsupervised and at least 10% for the semi-supervised models.
... Machine learning methods, especially deep learning-based methods, have succeeded greatly due to their powerful representation advantages. Most of the supervised and semi-supervised methods [14,18,48,50,83,85] cannot handle the challenge of limited labeled data, especially since anomalies are dynamic and new anomalies never observed before may occur. Unsupervised methods are popular as they place no strict requirements on labeled data; they include one-class classification-based, probabilistic-based, distance-based, forecasting-based, and reconstruction-based approaches [11,23,42,56,63,81,88]. ...
Conference Paper
Full-text available
Time series anomaly detection is critical for a wide range of applications. It aims to identify deviant samples from the normal sample distribution in time series. The most fundamental challenge for this task is to learn a representation map that enables effective discrimination of anomalies. Reconstruction-based methods still dominate, but the representation learning with anomalies might hurt the performance with its large abnormal loss. On the other hand, contrastive learning aims to find a representation that can clearly distinguish any instance from the others, which can bring a more natural and promising representation for time series anomaly detection. In this paper, we propose DCdetector, a multi-scale dual attention contrastive representation learning model. DCdetector utilizes a novel dual attention asymmetric design to create the permutated environment and pure contrastive loss to guide the learning process, thus learning a permutation invariant representation with superior discrimination abilities. Extensive experiments show that DCdetector achieves state-of-the-art results on multiple time series anomaly detection benchmark datasets. Code is publicly available at https://github.com/DAMO-DI-ML/KDD2023-DCdetector.
... Then, time series data are marked as abnormal or normal based on whether they exceeded the defined threshold or not. Chauhan and Vig [32] proposed a deep RNN model with LSTM units to predict Electrocardiography (ECG) signals and detect arrhythmia in the human heart. Kanarachos et al. [33] combined wavelet and Hilbert transform with deep neural network to detect anomalies in earthquake activity. ...