Conference Paper

Contextual Anomaly Detection for a Critical Industrial System Based on Logs and Metrics

Abstract

Recent advances in contextual anomaly detection attempt to combine resource metrics and event logs to uncover unexpected system behaviors at run-time. This is highly relevant for critical software systems, where monitoring is often mandated by international standards and guidelines. In this paper, we analyze the effectiveness of a metrics-logs contextual anomaly detection technique in a middleware for Air Traffic Control systems. Our study addresses the challenges of applying such techniques to a new case study with a dense volume of logs and a finer monitoring sampling rate. Guided by our experimental results, we propose and evaluate several actionable improvements, which include a change detection algorithm and the use of time windows in contextual anomaly detection.
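The two improvements named in the abstract, time windows and change detection, can be illustrated with a minimal sketch. The abstract does not say which change detection algorithm or window length the authors used, so everything below is an assumption made for illustration: the function names, the 5-second window, the one-sided CUSUM detector, and the sample data are all hypothetical.

```python
from collections import defaultdict


def window_index(timestamp, window_seconds):
    """Map a timestamp (in seconds) to the index of its fixed-length window."""
    return int(timestamp // window_seconds)


def aggregate(log_events, metric_samples, window_seconds):
    """Pair per-window log event counts with per-window mean metric values.

    log_events: iterable of event timestamps.
    metric_samples: iterable of (timestamp, value) pairs.
    Returns a list of (window, event_count, mean_metric) tuples.
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    samples = defaultdict(int)
    for ts in log_events:
        counts[window_index(ts, window_seconds)] += 1
    for ts, value in metric_samples:
        w = window_index(ts, window_seconds)
        sums[w] += value
        samples[w] += 1
    return [(w, counts[w], sums[w] / samples[w]) for w in sorted(samples)]


def cusum_change_points(values, target, slack, threshold):
    """One-sided (upward) CUSUM, used here as an assumed change detector.

    Accumulates the excess of each value over target + slack and reports
    the indices where the running sum exceeds threshold, resetting after
    each alarm.
    """
    running = 0.0
    alarms = []
    for i, value in enumerate(values):
        running = max(0.0, running + (value - target - slack))
        if running > threshold:
            alarms.append(i)
            running = 0.0
    return alarms


# Hypothetical data: event timestamps and (timestamp, CPU %) samples.
log_events = [0.5, 1.2, 1.9, 5.1, 10.3, 10.4, 10.9]
metric_samples = [(0, 20.0), (2, 22.0), (5, 21.0), (9, 23.0), (10, 60.0), (12, 64.0)]

rows = aggregate(log_events, metric_samples, window_seconds=5)
means = [mean for _, _, mean in rows]
alarms = cusum_change_points(means, target=21.0, slack=2.0, threshold=10.0)
```

Aggregating both data sources over the same windows gives each metric value a log context with which it can be compared, and the change detector then flags windows where the metric departs persistently from its expected level.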

... A regression-based strategy was developed by Farshchi et al. [32] for detecting contextual anomalies in air traffic control systems. They also describe actionable improvements, namely a change detection algorithm and the use of time windows in contextual anomaly detection. ...
... Regression algorithms use the input features to predict the output values of the data fed into the system. For example, in intelligent transportation, a regression-based model was developed by Farshchi et al. [32] to detect contextual anomalies in an air traffic control framework by identifying the correlation between events and resource metrics in the log files. ...
... Precision is defined as the number of class members classified correctly over the total number of cases classified as class members [14]. Recall is defined as the number of class members classified correctly over the total number of class members [32]. In anomaly detection, both high precision and high recall are needed for a high-quality technique. ...
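The precision and recall definitions in the excerpt above can be made concrete with a short sketch. The function name and the sample labels are hypothetical and chosen only for illustration; `True` marks a sample as a member of the anomalous class.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length sequences of booleans."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    # Precision: correct detections over everything flagged as anomalous.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: correct detections over everything that really is anomalous.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical detector output versus ground truth for six samples.
predicted = [True, True, False, True, False, False]
actual = [True, False, False, True, True, False]
precision, recall = precision_recall(predicted, actual)
```

Here two of the three flagged samples are real anomalies and two of the three real anomalies are flagged, so both values come out to 2/3.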
Article
Abstract: Anomaly detection has gained considerable attention in the past couple of years. Emerging technologies, such as the Internet of Things (IoT), are known to be among the most critical sources of data streams that produce massive amounts of data continuously from numerous applications. Examining these collected data to detect suspicious events can reduce functional threats and avoid unseen issues that cause downtime in the applications. Due to the dynamic nature of the data stream characteristics, many unresolved problems persist. In the existing literature, methods have been designed and developed to evaluate certain anomalous behaviors in IoT data stream sources. However, there is a lack of comprehensive studies that discuss all the aspects of IoT data processing. Thus, this paper attempts to fill this gap by providing a complete picture of state-of-the-art techniques addressing the major problems and core challenges in IoT data. The nature of data, anomaly types, learning mode, window model, datasets, and evaluation criteria are also presented. Research challenges related to evolving data, evolving features, windowing, ensemble approaches, nature of input data, data complexity and noise, parameter selection, data visualization, heterogeneity of data, accuracy, and large-scale and high-dimensional data are investigated. Finally, the challenges that require substantial research efforts and future directions are summarized.
... The study by Farshchi et al. [32] presented a regression-based model for finding contextual anomalies in air traffic control systems by determining the correlations between events and resource metrics in the log files. They also provided further details of possible improvements, suggesting the inclusion of a change detection algorithm and the use of time windows in contextual anomaly detection. ...
Conference Paper
The world is currently progressing towards a new connectivity era where billions of sensors are connected over a network called the Internet of Things (IoT). IoT enables a wide range of physical objects and devices to be connected and monitored with insufficient spatial and temporal detail. Despite their potential to improve multiple application domains, anomalies in the devices' behaviors pose a significant challenge, especially in the smart cities domain. Many research works have been devoted to determining such anomalous behaviors; however, there is a lack of comprehensive reviews focusing on anomaly detection techniques using statistical and machine learning methods in the smart cities domain. This work aims to fill this gap by presenting a review of anomaly detection techniques using statistical and machine learning methods. This paper explains the essential contexts related to IoT, followed by a review of the IoT anomaly detection techniques and their challenges, types, and detection modes. The paper then presents a summary of the works related to smart cities. Finally, the open challenges and future directions are highlighted.
... In [7], a contextual anomaly detection framework was designed and evaluated in the areas of electricity, temperature, and traffic systems. Similarly, [8] analyzed the effectiveness of a metrics-logs contextual anomaly detection technique in a middleware for air traffic control systems. [9] introduced the detection of driving patterns and road anomalies using D&R Sense and a support vector machine, addressing the driving styles of drivers and road anomalies such as bumps and potholes. ...
Article
In Nigeria, a crucial responsibility of the executive arm of the government is to submit annual budgetary allocations to the national assembly for approval. Given the diversity and complexity of the budget, the national assembly is mandated to carry out its constitutional duty of scrutinizing the budget to discover irregularities or anomalies and to make recommendations or substantial modifications to what it received. This is very challenging, particularly in Nigeria, where there are many different ethnicities and regions: to ensure inclusiveness, the national assembly must carry out its constitutional duty diligently and carefully, without the fear or favor that often has unintended consequences. This might not be very easy to accomplish within a short period. Thus, this research aims to detect anomalies in the budget, easing the legislative duty and thereby facilitating the appropriation process. A clustering-based machine learning technique was used for anomaly detection, and the detected anomalies are flagged for critical examination.
... When the temperature range increases, they decelerate to avoid motor malfunction, which is one of the major causes of drone crashes. A regression-based model [37] was developed to find the correlation between events and resource metrics in log files in order to detect contextual anomalies in an air traffic control system. Furthermore, the authors detail actionable improvements that include a change detection algorithm and the use of time windows in contextual anomaly detection. ...
Article
Anomaly detection has attracted considerable attention from the research community in the past few years due to the advancement of sensor monitoring technologies, low-cost solutions, and high impact in diverse application domains. Sensors generate a huge amount of data while monitoring the physical spaces and objects. These huge collected data streams can be analyzed to identify unhealthy behaviors. It may reduce functional risks, avoid unseen problems, and prevent downtime of the systems. Many research methodologies have been designed and developed to determine such anomalous behaviors in security and risk analysis domains. In this paper, we present the results of a systematic literature review about anomaly detection techniques except for these dominant research areas. We focus on the studies published from 2000 to 2018 in the application areas of intelligent inhabitant environments, transportation systems, health care systems, smart objects, and industrial systems. We have identified a number of research gaps related to the data collection, the analysis of imbalanced large datasets, limitations of statistical methods to process the huge sensory data, and few research articles in abnormal behavior prediction in real scenarios. Based on our analysis, researchers and practitioners can acquaint themselves with the existing approaches, use them to solve real problems, and/or further contribute to developing novel techniques for anomaly detection, prediction, and analysis.
Chapter
The paper discusses an approach to detecting anomalies in the behavior of users of data centers, using a specialized analytical unit based on artificial neural networks. It is proposed to use transaction log records of the databases that are part of the data center as data sets for analysis. An experimental evaluation of the proposed approach is made for several types of analytical units, which include several artificial neural networks. Experiments have demonstrated the high efficiency of the proposed approach.
Conference Paper
In some special application environments, network faults can lead to loss of important information or even mission failures, resulting in unpredictable losses. It is therefore of both research significance and practical value to evaluate the network status and predict possible faults before performing key tasks. Based on the logs collected by the router board in a real network, this paper analyses the behavior type, attribute information and the corresponding status value, and detects hidden faults or network attacks, so as to provide early warning information for operators. We propose a deep neural network model utilizing Long Short-Term Memory (LSTM) to predict the current number of level-1 logs. By comparing the predicted number of level-1 logs with the observed number, it can detect abnormal behavior such as a surge in the number of logs. Furthermore, we perform semantic analysis on attribute information to construct an attribute syntax forest, which assists maintenance staff in monitoring the system through key fingerprint information in the log. In addition, we use attribute information and status values to train unsupervised learning models such as Isolation Forest, OneClassSVM and LocalOutlierFactor. Finally, this paper analyses the results to find the causes of log surges and to assist operators in subsequent maintenance of the system.
Article
Monitoring is a consolidated practice to characterize the dependability behavior of a software system. A variety of techniques, such as event logging and operating system probes, are currently used to generate monitoring data for troubleshooting and failure analysis. In spite of the importance of monitoring, whose role can be essential in critical software systems, there is a lack of studies addressing the assessment and the comparison of the techniques aiming to monitor the occurrence of failures during operations. This paper proposes a method to characterize the monitoring techniques implemented in a software system. The method is based on a fault injection approach and allows measuring 1) precision and recall of a monitoring technique and 2) the dissimilarity of the data it generates upon failures. The method has been used in two critical software systems implementing event logging, assertion checking, and source code instrumentation techniques. We analyzed a total of 3 844 failures. With respect to our data, we observed that the effectiveness of a technique is strongly affected by the system and type of failure, and that the combination of different techniques is potentially beneficial to increase the overall failure reporting ability. More important, our analysis revealed a number of practical implications to be taken into account when developing a monitoring technique.
Article
In order to meet stringent performance requirements, system administrators must effectively detect undesirable performance behaviours, identify potential root causes and take adequate corrective measures. The problem of uncovering and understanding performance anomalies and their causes (bottlenecks) in different system and application domains is well studied. In order to assess progress, research trends and identify open challenges, we have reviewed major contributions in the area and present our findings in this survey. Our approach provides an overview of anomaly detection and bottleneck identification research as it relates to the performance of computing systems. By identifying fundamental elements of the problem, we are able to categorize existing solutions based on multiple factors such as the detection goals, nature of applications and systems, system observability, and detection methods.
Article
to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.
Conference Paper
Automated tools for understanding application behavior and its changes during the application life-cycle are essential for many performance analysis and debugging tasks. Application performance issues have an immediate impact on customer experience and satisfaction. A sudden slowdown of an enterprise-wide application can affect a large population of customers, lead to delayed projects, and ultimately result in financial loss for the company. We believe that online performance modeling should be a part of routine application monitoring. Early, informative warnings on significant changes in application performance should help service providers identify and prevent performance problems, and their negative impact on the service, in a timely manner. We propose a novel framework for automated anomaly detection and application change analysis. It is based on integration of two complementary techniques: i) a regression-based transaction model that reflects a resource consumption model of the application, and ii) an application performance signature that provides a compact model of run-time behavior of the application. The proposed integrated framework provides a simple and powerful solution for anomaly detection and analysis of essential performance changes in application behavior. An additional benefit of the proposed approach is its simplicity: it is not intrusive and is based on monitoring data that is typically available in enterprise production environments.
Article
This paper gives the main definitions relating to dependability, a generic concept including a special case of such attributes as reliability, availability, safety, integrity, maintainability, etc. Security brings in concerns for confidentiality, in addition to availability and integrity. Basic definitions are given first. They are then commented upon, and supplemented by additional definitions, which address the threats to dependability and security (faults, errors, failures), their attributes, and the means for their achievement (fault prevention, fault tolerance, fault removal, fault forecasting). The aim is to explicate a set of general concepts, of relevance across a wide range of situations and, therefore, helping communication and cooperation among a number of scientific and technical communities, including ones that are concentrating on particular types of system, of system failures, or of causes of system failures.
Article
Cloud computing systems provide the facilities to make application services resilient against failures of individual computing resources. However, resiliency is typically limited by a cloud consumer’s use and operation of cloud resources. In particular, system operations have been reported as one of the leading causes of system-wide outages. This applies specifically to DevOps operations, such as backup, redeployment, upgrade, customized scaling, and migration – which are executed at much higher frequencies now than a decade ago. We address this problem by proposing a novel approach to detect errors in the execution of these kinds of operations, in particular for rolling upgrade operations. Our regression-based approach leverages the correlation between operations’ activity logs and the effect of operation activities on cloud resources. First, we present a metric selection approach based on regression analysis. Second, the output of a regression model of selected metrics is used to derive assertion specifications, which can be used for runtime verification of running operations. We have conducted a set of experiments with different configurations of an upgrade operation on Amazon Web Services, with and without randomly injected faults to demonstrate the utility of our new approach.
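The idea of deriving runtime assertions from a regression model, as described in the abstract above, can be sketched as follows. This is not the authors' implementation: the single-predictor least-squares fit, the three-sigma tolerance, the function names, and the training data are all assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, sigma), where sigma is the residual
    standard error with n - 2 degrees of freedom.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, sigma


def make_assertion(slope, intercept, sigma, k=3.0):
    """Build a check that an observed metric lies within k * sigma of the
    value predicted from the log-derived activity count."""
    def holds(activity_count, observed_metric):
        predicted = intercept + slope * activity_count
        return abs(observed_metric - predicted) <= k * sigma
    return holds


# Hypothetical training data from fault-free runs: number of logged
# operation activities versus an observed resource metric.
activity_counts = [1, 2, 3, 4, 5, 6]
metric_values = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

slope, intercept, sigma = fit_line(activity_counts, metric_values)
holds = make_assertion(slope, intercept, sigma)

ok = holds(7, 14.1)      # close to the predicted value, assertion holds
faulty = holds(7, 9.0)   # far below the predicted value, assertion fails
```

A failed assertion signals that the resource metric no longer matches what the logged activities predict, which is the kind of contextual mismatch such a regression-based approach is meant to surface at run-time.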