Abstract
Safety-critical systems are becoming larger and more complex in order to deliver greater functionality. As a result, modelling and evaluating these systems can be a difficult and error-prone task. Among existing safety models, Fault Tree Analysis (FTA) is one of the best-known methods, owing to its easily understandable graphical structure. This study proposes a novel approach that uses Machine Learning (ML) and real-time operational data to learn the normal behaviour of the system. Afterwards, if any abnormal situation arises with respect to the normal behaviour model, the approach tries to find an explanation for the abnormality in the fault tree and then shares that knowledge with the operator. If the fault tree fails to explain the situation, a number of different recommendations, including a potential repair of the fault tree, are provided based on the nature of the situation. A decision tree is utilised for this purpose. The effectiveness of the proposed approach is shown through a hypothetical example of an Aircraft Fuel Distribution System (AFDS).
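The learning step described above can be sketched with a one-class SVM, the technique named in later citing work. Everything below (the sensor features, data, and hyperparameters) is illustrative and not the authors' implementation:

```python
# Minimal sketch, assuming hypothetical fuel-flow and pressure features.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic "normal operation" data: fuel flow rate and tank pressure.
normal = rng.normal(loc=[10.0, 5.0], scale=0.5, size=(200, 2))

# Learn the normal-behaviour envelope from operational data.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal)

# New observations: one nominal, one far outside the learned envelope.
obs = np.array([[10.1, 4.9], [2.0, 12.0]])
flags = model.predict(obs)  # +1 = consistent with normal model, -1 = anomaly

for reading, flag in zip(obs, flags):
    if flag == -1:
        # In the proposed approach, an anomaly would next be matched against
        # fault-tree events to find (or fail to find) an explanation.
        print(f"anomaly at {reading}: consult fault tree")
```

In the paper's scheme, an unflagged reading needs no action, while a flagged one triggers the fault-tree lookup and, if that fails, the decision-tree-based recommendations.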
... In  the authors use a learning approach to formulate hypotheses concerning the impact of context variability on nonfunctional requirements and enhance assurances for real-time SASs. Gheraibia et al.  introduce a machine-learning approach to crosscheck the real-time operational behavior of the system with safety artefacts specified as fault trees. Klös et al.  extend the MAPE-K feedback loop with a meta-adaptation layer learning new adaptation rules based on executable run-time models. ...
Traditional safety-critical systems are engineered in a way to be predictable in all operating conditions. They are common in industrial automation and transport applications where uncertainties (e.g., fault occurrence rates) can be modelled and precisely evaluated. Furthermore, they use high-cost hardware components to increase system reliability. On the contrary, future systems are increasingly required to be “smart” (or “intelligent”) that is to adapt to new scenarios, learn and react to unknown situations, possibly using low-cost hardware components. In order to move a step forward to fulfilling those new expectations, in this paper we address run-time stochastic evaluation of quantitative safety targets, like hazard rate, in self-adaptive event detection systems by using Bayesian Networks and their extensions. Self-adaptation allows changing correlation schemes on diverse detectors based on their reputation, which is continuously updated to account for performance degradation as well as modifications in environmental conditions. To that aim, we introduce a specific methodology and show its application to a case-study of vehicle detection with multiple sensors for which a real-world data-set is available from a previous study. Besides providing a proof-of-concept of our approach, the results of this paper pave the way to the introduction of new paradigms in the dynamic safety assessment of smart systems.
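The reputation-based fusion idea can be sketched as a naive-Bayes combination of detector reports, where a detector's reputation is its current true/false positive rate. The numbers and the conditional-independence assumption are illustrative, not the paper's full Bayesian Network model:

```python
# Hedged sketch: fuse two detectors' reports, weighting each by its reputation.
def fuse(prior, reports, reputations):
    """Posterior P(event | reports) under conditional independence.
    reports: True/False per detector; reputations: (tpr, fpr) per detector."""
    p_event, p_noevent = prior, 1.0 - prior
    for said_yes, (tpr, fpr) in zip(reports, reputations):
        p_event *= tpr if said_yes else (1.0 - tpr)
        p_noevent *= fpr if said_yes else (1.0 - fpr)
    return p_event / (p_event + p_noevent)

# A degraded detector (lower reputation) moves the posterior less.
fresh = fuse(0.1, [True, True], [(0.95, 0.05), (0.95, 0.05)])
worn = fuse(0.1, [True, True], [(0.95, 0.05), (0.60, 0.40)])
assert fresh > worn
```

Updating the `(tpr, fpr)` pairs at run time as environmental conditions change is what makes the correlation scheme self-adaptive.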
... Human Health Monitoring Medical Systems (HHMMS) are advancing at a dramatic rate, bringing with them safety improvements that aim to deliver better quality and accuracy in disease prediction, faster diagnostics, and user-friendly interfaces [1,2,3,4]. With advancements in technology, there is scope to address the present challenges in identifying and detecting the actual abnormal vital (heart-rate) cardiac signal [5,6]. ...
This paper presents an approach to prognostic diagnostics of cardiac health using Artificial Intelligence (AI) in safety-related, non-invasive biomedical systems. The approach addresses the existing challenge of identifying the actual abnormality in the vital cardiac signal amid various interfering factors, such as bio-signals corrupted by high-noise interference, electronic and software faults, and mechanical faults like sensor contact failures and equipment wear and tear. Presently, most medical systems use a 1oo1 (one-out-of-one) system architecture, with a safety procedure that raises a particular defined type of standard alarm for a specific failure when an abnormality is detected. These existing approaches may incur high maintenance costs, are subject to random failures with long system downtimes, and can affect operational safety to a certain extent. However, there is scope for improvement in segregating the actual fault-free signal and extracting the abnormal vital feature for prognostic diagnostics. With advancements in systems engineering and the use of safety-related design architectures in medical systems, we used an AI-based approach to perform data analytics on the correctly selected vital signal for prognostic analysis. As a case study, we configured the system with a 2oo2 fault-tolerant safety-related design architecture and implemented the diagnostic function using the AI-based method on the data logged during system operation. The results show a substantial improvement in the accuracy of the cardiac health findings.
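The 2oo2 idea referenced above can be sketched as a simple agreement check: a reading is accepted for diagnostics only when both redundant channels agree, so channel faults are separated from genuine cardiac events. The tolerance and helper name are illustrative assumptions:

```python
# Hedged sketch of a 2oo2 acceptance check (hypothetical tolerance in bpm).
def accept_2oo2(ch_a, ch_b, tol=2.0):
    """Return the agreed heart-rate reading, or None to flag a channel fault."""
    if abs(ch_a - ch_b) <= tol:
        return (ch_a + ch_b) / 2.0
    # Disagreement: treat as an equipment/sensor fault, not a cardiac abnormality.
    return None

assert accept_2oo2(72.0, 73.0) == 72.5
assert accept_2oo2(72.0, 120.0) is None
```

Only readings that pass this check would be passed on to the AI-based prognostic analysis, which is how the architecture segregates fault-free signals from faulted ones.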
... Linard et al. (2019) provided a new evolutionary-based approach to generate fault trees from observational data. A novel idea of merging a machine learning algorithm with fault tree updating was provided by Gheraibia et al. (2019). In that research, a one-class support vector machine combined with a decision tree was used to update the fault tree of safety-critical systems. ...
Safety and reliability are two important aspects of dependability that need to be rigorously evaluated throughout the development life-cycle of a system. Over the years, several methodologies have been developed for the analysis of the failure behaviour of systems. Fault Tree Analysis (FTA) is one of the well-established and widely used methods for the safety and reliability engineering of systems. The fault tree, in its classical static form, is inadequate for modelling dynamic interactions between components and is unable to include temporal and statistical dependencies in the model. Several attempts have been made to alleviate these limitations of static fault trees (SFT). Dynamic Fault Trees (DFT) were introduced to enhance the modelling power of their static counterpart. In DFT, the expressiveness of the fault tree was improved by introducing new dynamic gates. While the introduction of the dynamic gates helps to overcome many limitations of SFT and allows the analysis of a wide range of complex systems, it brings some overhead with it. One such overhead is that the existing combinatorial approaches used for qualitative and quantitative analysis of SFTs are no longer applicable to DFTs. This has led to several successful attempts to develop new approaches for DFT analysis. The methodologies used so far for DFT analysis include, but are not limited to, algebraic solutions, Markov models, Petri Nets, Bayesian Networks, and Monte Carlo simulation. To illustrate the usefulness of the modelling capability of DFTs, many benchmark studies have been performed in different industries. Moreover, software tools have been developed to aid the DFT analysis process. Firstly, in this chapter, we provide a brief description of the DFT methodology. Secondly, the chapter reviews a number of prominent DFT analysis techniques such as Markov chains, Petri Nets, Bayesian networks, and the algebraic approach, and provides insight into their working mechanisms, applicability, strengths, and challenges. These reviewed techniques cover both qualitative and quantitative analysis of DFTs. Thirdly, we discuss the emerging trends in machine-learning-based approaches to DFT analysis. Fourthly, the research performed on sensitivity analysis in DFTs is reviewed. Finally, we provide some potential future research directions for DFT-based safety and reliability analysis.
Reliability technology plays an important role in the present era of industrial growth, optimal efficiency, and hazard reduction. This book provides current advances and developments in reliability engineering across all its branches.
It discusses interdisciplinary solutions to complex problems using different approaches to save money, time, and manpower. It presents methodologies for coping with uncertainty in reliability optimization through techniques such as soft computing, fuzzy optimization, uncertainty analysis, and maintenance scheduling. Case studies and real-world examples, along with applications that can be used in practice, are also presented with numerous examples.
This book is useful to researchers, academicians, and practitioners working in the area of Reliability and Systems Assurance Engineering.
Provides current advances and developments across different branches of engineering
Reviews and analyses case studies and real-world examples
Presents applications to be used in practice
Includes numerous examples to illustrate theoretical results
K. Trivedi, A. Bobbio and J. Muppala, "Non-State-Space (Combinatorial) Models," in Reliability and Availability Engineering: Modeling, Analysis, and Applications, pp. 103-104, 2017.
Institution of Engineering and Technology, "IET Code of Practice: Competence for Safety Related Systems Practitioners," Engineering Safety Consultants Ltd, UK, 2016.
A. Rae, J. McDermid and R. Alexander, "The science and superstition of quantitative risk assessment," Journal of Systems Safety, pp. 28-38, 2012.
C. Perrow, "Fukushima, risk, and probability: Expect the unexpected," Bulletin of the Atomic Scientists, 2011.
J. Joyce and K. Wong, "Hazard-driven testing of safety-related software," in 21st International System Safety Conference, Vancouver, Canada, 2003.
Bureau d'Enquêtes et d'Analyses, "Safety Investigation into the accident on 1 June 2009 to the Airbus A330-203, flight AF447," BEA, France, 2012.
E. Simen, C. Agrell, A. Hafver and F. Børre Pedersen, "AI + SAFETY: Safety implications for Artificial Intelligence," DNV GL, 2018.
Replacement of the subsea control module (SCM) and halting production caused by failure of the SCM are costly, and the electrical control system plays a large role in the failure modes of the SCM. Hence, the analysis of its reliability and safety is important. Markov processes and the multiple beta factor (MBF) model are used to model the reliability and safety of the electrical control system of the SCM, considering the effects of multiple factors, including the failure detection rate, common-cause failure (CCF) and the failure rate of each module. The Morris screening method, based on a large amount of random data with equal probability, is applied to the time-varying reliability and safety models to quantify the effect levels of each factor on system reliability and safety over the time interval. The effect of each factor on the system and its degree of interaction with other factors are evaluated in the time interval. Finally, the system model is simulated in MATLAB. The simulation demonstrates that system reliability and safety gradually decrease, and the effect of each factor on them increases over time; the reliability is most sensitive to the failure rates of the input, central processing unit (CPU) and output modules, whereas the safety is most sensitive to the failure detection rate. The common-cause failure and the comparator module have some effects on the reliability and safety of the system.
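The Markov-process idea underlying such models can be sketched with a minimal two-state (working/failed, with repair) chain; the failure and repair rates below are assumed for illustration and are not taken from the paper's SCM model:

```python
# Minimal sketch: transient availability of a repairable two-state Markov model.
import numpy as np
from scipy.linalg import expm

lam, mu = 1e-3, 1e-1  # assumed failure and repair rates (per hour)
Q = np.array([[-lam, lam],
              [mu, -mu]])  # generator matrix: state 0 = working, 1 = failed

P0 = np.array([1.0, 0.0])  # start in the working state
t = 1000.0
Pt = P0 @ expm(Q * t)      # transient state probabilities, dP/dt = P Q
availability = Pt[0]

steady = mu / (lam + mu)   # long-run availability for comparison
```

A full SCM model would have one chain per module plus CCF states; sensitivity screening then amounts to re-evaluating such chains over sampled rate values.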
Critical technological systems exhibit complex dynamic characteristics such as time-dependent behaviour, functional dependencies among events, sequencing and priority of causes that may alter the effects of failure. Dynamic fault trees (DFTs) have been used in the past to model the failure logic of such systems, but the quantitative analysis of DFTs has assumed the existence of precise failure data and statistical independence among events, which are unrealistic assumptions. In this paper, we propose an improved approach to reliability analysis of dynamic systems, allowing for uncertain failure data and statistical and stochastic dependencies among events. In the proposed framework, DFTs are used for dynamic failure modelling. Quantitative evaluation of DFTs is performed by converting them into generalised stochastic Petri nets. When failure data are unavailable, expert judgment and fuzzy set theory are used to obtain reasonable estimates. The approach is demonstrated on a simplified model of a Cardiac Assist System.
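One step of that framework, turning an expert's fuzzy failure estimate into interval bounds, can be sketched with alpha-cuts of triangular fuzzy numbers. The numbers, and the interval-arithmetic AND gate used for illustration, are assumptions, not the paper's Petri-net evaluation:

```python
# Hedged sketch: triangular fuzzy failure probabilities via alpha-cuts.
def alpha_cut(tri, alpha):
    """Interval of a triangular fuzzy number (low, mode, high) at level alpha."""
    low, mode, high = tri
    return (low + alpha * (mode - low), high - alpha * (high - mode))

def and_gate(cut_a, cut_b):
    # Both basic events must fail: multiply interval bounds (probabilities in [0, 1]).
    return (cut_a[0] * cut_b[0], cut_a[1] * cut_b[1])

a = alpha_cut((0.01, 0.02, 0.04), 1.0)  # alpha = 1 collapses to the mode
b = alpha_cut((0.10, 0.20, 0.30), 1.0)
top = and_gate(a, b)
```

Sweeping alpha from 0 to 1 yields a fuzzy top-event probability, carrying the expert's uncertainty through to the system-level result.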
During the whole life-cycle of software-intensive systems in safety-critical domains, system models must consistently co-evolve with quality evaluation models like fault trees. However, performing these co-evolution steps is a cumbersome and often manual task. To understand this problem in detail, we have analyzed the evolution and mined common changes of architecture and fault tree models for a set of evolution scenarios of a part of a factory automation system called Pick and Place Unit. On the other hand, we designed a set of intra- and inter-model transformation rules which fully cover the evolution scenarios of the case study and which offer the potential to semi-automate the co-evolution process. In particular, we validated these rules with respect to completeness and evaluated them by a comparison to typical visual editor operations. Our results show a significant reduction of the amount of required user interactions in order to realize the co-evolution.
Process mining methods allow analysts to exploit logs of historical executions of business processes in order to extract insights regarding the actual performance of these processes. One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes as input an event log and produces as output a business process model that captures the control-flow relations between tasks that are observed in or implied by the event log. Several dozen automated process discovery methods have been proposed in the past two decades, striking different trade-offs between scalability, accuracy and complexity of the resulting models. So far, automated process discovery methods have been evaluated in an ad hoc manner, with different authors employing different datasets, experimental setups, evaluation measures and baselines, often leading to incomparable conclusions and sometimes unreproducible results due to the use of non-publicly available datasets. In this setting, this article provides a systematic review of automated process discovery methods and a systematic comparative evaluation of existing implementations of these methods using an open-source benchmark covering nine publicly-available real-life event logs and eight quality metrics. The review and evaluation results highlight gaps and unexplored trade-offs in the field, including the lack of scalability of several proposals and a strong divergence in the performance of different methods with respect to different quality metrics. The proposed benchmark allows researchers to empirically compare new automated process discovery methods against existing ones in a unified setting.
Condition-based maintenance strategies adapt maintenance planning through the integration of online condition monitoring of assets. The accuracy and cost-effectiveness of these strategies can be improved by integrating prognostics predictions and grouping maintenance actions respectively. In complex industrial systems, however, effective condition-based maintenance is intricate. Such systems are comprised of repairable assets which can fail in different ways, with various effects, and typically governed by dynamics which include time-dependent and conditional events. In this context, system reliability prediction is complex and effective maintenance planning is virtually impossible prior to system deployment and hard even in the case of condition-based maintenance. Addressing these issues, this paper presents an online system maintenance method that takes into account the system dynamics. The method employs an online predictive diagnosis algorithm to distinguish between critical and non-critical assets. A prognostics-updated method for predicting the system health is then employed to yield well-informed, more accurate, condition-based suggestions for the maintenance of critical assets and for the group-based reactive repair of non-critical assets. The cost-effectiveness of the approach is discussed in a case study from the power industry.
to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.
This paper gives the main definitions relating to dependability, a generic concept including a special case of such attributes as reliability, availability, safety, integrity, maintainability, etc. Security brings in concerns for confidentiality, in addition to availability and integrity. Basic definitions are given first. They are then commented upon, and supplemented by additional definitions, which address the threats to dependability and security (faults, errors, failures), their attributes, and the means for their achievement (fault prevention, fault tolerance, fault removal, fault forecasting). The aim is to explicate a set of general concepts, of relevance across a wide range of situations and, therefore, helping communication and cooperation among a number of scientific and technical communities, including ones that are concentrating on particular types of system, of system failures, or of causes of system failures.
Process mining techniques have recently received notable attention in the literature for their ability to assist in the (re)design of complex processes by automatically discovering models that explain the events registered in some log traces provided as input. Following this line of research, the paper investigates an extension of such basic approaches, where the identification of different variants of the process is explicitly accounted for, based on the clustering of log traces. Indeed, modeling each group of similar executions with a different schema allows us to single out "conformant" models, which, specifically, minimize the number of modeled enactments that are extraneous to the process semantics. Therefore, a novel process mining framework is introduced and some relevant computational issues are studied in depth. As finding an exact solution to such an enhanced process mining problem is proven to require high computational costs in most practical cases, a greedy approach is devised. This is founded on an iterative, hierarchical refinement of the process model, where, at each step, traces sharing similar behavior patterns are clustered together and equipped with a specialized schema. The algorithm guarantees that each refinement leads to an increasingly sound model, thus attaining a monotonic search. Experimental results evidence the validity of the approach with respect to both effectiveness and scalability.
Contemporary workflow management systems are driven by explicit process models, i.e., a completely specified workflow design is required in order to enact a given workflow process. Creating a workflow design is a complicated time-consuming process and, typically, there are discrepancies between the actual workflow processes and the processes as perceived by the management. Therefore, we have developed techniques for discovering workflow models. The starting point for such techniques is a so-called "workflow log" containing information about the workflow process as it is actually being executed. We present a new algorithm to extract a process model from such a log and represent it in terms of a Petri net. However, we also demonstrate that it is not possible to discover arbitrary workflow processes. We explore a class of workflow processes that can be discovered. We show that the α-algorithm can successfully mine any workflow represented by a so-called SWF-net.
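The first step of the α-algorithm described above, deriving ordering relations from a workflow log, can be sketched as follows; the two-trace log is illustrative:

```python
# Hedged sketch of the alpha-algorithm's footprint relations.
# Each trace is a sequence of task names taken from a workflow log.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]

# Directly-follows relation: x observed immediately before y in some trace.
follows = set()
for trace in log:
    for x, y in zip(trace, trace[1:]):
        follows.add((x, y))

# Causal relation x -> y: x directly followed by y, but never y by x.
causal = {(x, y) for (x, y) in follows if (y, x) not in follows}
# Parallel relation x || y: both orders observed.
parallel = {(x, y) for (x, y) in follows if (y, x) in follows}
```

From these relations the α-algorithm then builds the places and transitions of the resulting Petri net; here `b` and `c` come out parallel, matching an AND-split between them.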
Dynamic fault tree (DFT) is a top-down deductive technique extended to model systems with complex failure behaviors and interactions. In two last decades, different methods have been applied to improve its capabilities, such as computational complexity reduction, modularization, intricate failure distribution, and reconfiguration. This paper uses semi-Markov process (SMP) theorem for DFT solution with the motivation of obviating the model state-explosion, considering nonexponential failure distribution through a hierarchical solution. In addition, in the proposed method, a universal SMP for static and dynamic gates is introduced, which can generalize dynamic behaviors like functional dependencies, sequences, priorities, and spares in a single model. The efficiency of the method regarding precision and competitiveness with commercial tools, repeated events consideration, computational complexity reduction, nonexponential failure distribution consideration, and repairable events in DFT is studied by a number of examples, and the results are then compared to those of the selected existing methods.
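The sequence-dependent behaviour such solutions must capture can be illustrated with a priority-AND (PAND) gate, which fails only if A fails before B. The Monte Carlo check below is a sketch with assumed exponential rates, not the paper's semi-Markov solution:

```python
# Hedged sketch: Monte Carlo estimate of PAND gate failure probability.
import random

random.seed(42)
lam_a, lam_b, mission = 2.0, 1.0, 10.0  # assumed rates and mission time

def pand_fails():
    t_a = random.expovariate(lam_a)
    t_b = random.expovariate(lam_b)
    return t_a < t_b <= mission  # A fails first, then B, within the mission

n = 100_000
estimate = sum(pand_fails() for _ in range(n)) / n
# For a long mission, the analytic value approaches lam_a / (lam_a + lam_b) = 2/3.
```

A combinatorial SFT evaluation would give P(A)·P(B) here and miss the ordering constraint, which is exactly why dedicated DFT solutions are needed.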
In order to improve the intelligence level of fault diagnosis and condition-based maintenance of hydropower units, an Imitation Medical Diagnosis Method (IMDM) is proposed in this study. IMDM uses Bayesian networks (BN) as the technical framework and includes three components: a machine-learning BN model, an expert-experience BN model, and a maintenance decision model. Its characteristics are as follows: (i) the machine-learning model uses a new node selection method to solve the problem that traditional fault diagnosis models are difficult to connect with the state monitoring system. (ii) The expert-experience BN model improves the traditional method: the fault tree model is used to derive the BN structure, the Noisy-Or model simplifies the conditional probability tables, and a fuzzy comprehensive evaluation method obtains the conditional probabilities. (iii) By introducing expected utility theory, a maintenance decision model is devised which ensures that the optimal maintenance decision scheme after a fault can be better selected. The performance of the proposed method is evaluated using experimental data. The results show that the accuracy of the fault reasoning model is higher than 80%, and the maintenance decision model successfully selects 236 optimal maintenance decision schemes from 3159 schemes generated by 13 faults.
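The Noisy-Or simplification mentioned above can be sketched as follows: instead of a full conditional probability table, each cause carries one link strength, and the effect probability is computed from the active causes. The link strengths are illustrative:

```python
# Hedged sketch of a Noisy-Or conditional probability.
def noisy_or(link_probs, active):
    """P(effect | active causes), assuming each cause is independently inhibited."""
    p_none = 1.0  # probability that no active cause produces the effect
    for p, on in zip(link_probs, active):
        if on:
            p_none *= (1.0 - p)
    return 1.0 - p_none

# Two of three candidate causes active: P = 1 - (1-0.8)(1-0.6) = 0.92
p = noisy_or([0.8, 0.6, 0.3], [True, True, False])
```

For a node with n parents this needs n parameters instead of 2^n table entries, which is why it eases expert elicitation of BN conditional probabilities.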
Most of today's machine learning (ML) methods and implementations are based on correlations, in the sense of a statistical relationship between a set of inputs and the output(s) under investigation. The relationship might be obscure to the human mind, but through the use of ML, mathematics and statistics make it seemingly apparent. However, basing safety-critical decisions on such methods suffers from the same pitfalls as decisions based on any other correlation metric that disregards causality. Causality is key to ensuring that applied mitigation tactics will actually affect the outcome in the desired way.
This paper reviews the current situation and challenges of applying ML in high risk environments. It further outlines how phenomenological knowledge, together with an uncertainty-based risk perspective can be incorporated to alleviate the missing causality considerations in current practice.
In this brief, a hierarchical Bayesian network modeling framework is formulated for large-scale process monitoring and decision making, which includes a basic layer and a functional layer. First, the whole process is decomposed into different units, where local Bayesian networks are constructed, providing monitoring information and decision-making capability for the upper layer. The network structure is determined automatically based on the process data in each local unit of the basic layer. Then, through incorporating the topological structure of the process, a functional Bayesian network is further constructed to infer the information from the basic layer, which can be customized according to user demands, such as fault detection, fault diagnosis, and classification of operating status. The performance of the proposed method is evaluated through a benchmark process.
Dynamic systems exhibit time-dependent behaviours and complex functional dependencies amongst their components. Therefore, to capture the full system failure behaviour, it is not enough to simply determine the consequences of different combinations of failure events: it is also necessary to understand the order in which they fail. Pandora temporal fault trees (TFTs) increase the expressive power of fault trees and allow modelling of sequence-dependent failure behaviour of systems. However, like classical fault tree analysis, TFT analysis requires a lot of manual effort, which makes it time consuming and expensive. This in turn makes it less viable for use in modern, iterated system design processes, which requires a quicker turnaround and consistency across evolutions. In this paper, we propose for a model-based analysis of temporal fault trees via HiP-HOPS, which is a state-of-the-art model-based dependability analysis method supported by tools that largely automate analysis and optimisation of systems. The proposal extends HiP-HOPS with Pandora, Petri Nets and Bayesian Networks and results to dynamic dependability analysis that is more readily integrated into modern design processes. The effectiveness is demonstrated via application to an aircraft fuel distribution system.
The use of average data for dependability assessments results in a outdated system-level dependability estimation which can lead to incorrect design decisions. With increasing availability of online data, there is room to improve traditional dependability assessment techniques. Namely, prognostics is an emerging field which provides asset-specific failure information which can be reused to improve the system level failure estimation. This paper presents a framework for prognostics-updated dynamic dependability assessment. The dynamic behaviour comes from runtime updated information, asset inter-dependencies, and time-dependent system behaviour. A case study from the power generation industry is analysed and results confirm the validity of the approach for improved near real-time unavailability estimations.
Fault Tree Analysis (FTA) is a well-established and well-understood technique, widely used for dependability evaluation of a wide range of systems. Although many extensions of fault trees have been proposed, they suffer from a variety of shortcomings. In particular, even where software tool support exists, these analyses require a lot of manual effort. Over the past two decades, research has focused on simplifying dependability analysis by looking at how we can synthesise dependability information from system models automatically. This has led to the field of model-based dependability analysis (MBDA). Different tools and techniques have been developed as part of MBDA to automate the generation of dependability analysis artefacts such as fault trees. Firstly, this paper reviews the standard fault tree with its limitations. Secondly, different extensions of standard fault trees are reviewed. Thirdly, this paper reviews a number of prominent MBDA techniques where fault trees are used as a means for system dependability analysis and provides an insight into their working mechanism, applicability, strengths and challenges. Finally, the future outlook for MBDA is outlined, which includes the prospect of developing expert and intelligent systems for dependability analysis of complex open systems under the conditions of uncertainty.
This paper presents a condition-based monitoring methodology based on novelty detection applied to industrial machinery. The proposed approach includes both, the classical classification of multiple a priori known scenarios, and the innovative detection capability of new operating modes not previously available. The development of condition-based monitoring methodologies considering the isolation capabilities of unexpected scenarios represents, nowadays, a trending topic able to answer the demanding requirements of the future industrial processes monitoring systems. First, the method is based on the temporal segmentation of the available physical magnitudes, and the estimation of a set of time-based statistical features. Then, a double feature reduction stage based on Principal Component Analysis and Linear Discriminant Analysis is applied in order to optimize the classification and novelty detection performances. The posterior combination of a Feed-forward Neural Network and One-Class Support Vector Machine allows the proper interpretation of known and unknown operating conditions. The effectiveness of this novel condition monitoring scheme has been verified by experimental results obtained from an automotive industry machine.
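The novelty-detection branch of such a scheme can be sketched as feature reduction followed by a one-class classifier. This simplified sketch uses only PCA and a One-Class SVM (the paper additionally applies LDA and a feed-forward neural network for the known classes), with illustrative synthetic data:

```python
# Hedged sketch: detect operating modes unseen in training.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
known_modes = rng.normal(0.0, 1.0, size=(300, 10))  # features of known scenarios

# Reduce dimensionality, then learn the envelope of known operating conditions.
detector = make_pipeline(PCA(n_components=3), OneClassSVM(nu=0.05)).fit(known_modes)

new_known = rng.normal(0.0, 1.0, size=(1, 10))
new_mode = rng.normal(8.0, 1.0, size=(1, 10))       # far-off, unseen operating mode
labels = detector.predict(np.vstack([new_known, new_mode]))  # -1 marks novelty
```

Samples labelled -1 would be routed to the "new operating mode" handling, while +1 samples proceed to the multi-class classifier of known scenarios.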
The present study addresses a fault diagnosis system based on micro-macro data for monitoring chemical plants. Macro data are an encapsulation of process history in terms of the prior probability distribution of faults, which is obtained using fault tree analysis. The fault diagnosis system is then developed on an imbalanced dataset comprising frequent and rare faults. In addition, micro data, the records of sensors at each time step, are used to predict faults. A Bayesian network is proposed to integrate micro-macro data for diagnostic purposes. The efficiency of the proposed framework was evaluated on an industrial gas sweetening unit, and its diagnostic performance was shown to be remarkable. Thus, it was concluded that the fusion of micro-macro data enhances the performance of the fault diagnosis system. Furthermore, the extraction of significant features using principal component analysis promotes diagnosis performance. The proposed framework, compared to conventional ones, shows a 21% improvement in terms of accuracy. In addition, the error bands of fault prediction decreased through implementing a hierarchical strategy.
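The micro-macro fusion idea reduces, in its simplest form, to a Bayesian update: a fault prior obtained from fault-tree analysis (macro) is updated with a sensor-symptom likelihood (micro). The numbers below are illustrative, not from the gas sweetening case study:

```python
# Hedged sketch: combine an FTA-derived prior with sensor evidence.
def posterior(prior, p_symptom_given_fault, p_symptom_given_ok):
    """P(fault | symptom observed) by Bayes' rule."""
    p_symptom = prior * p_symptom_given_fault + (1 - prior) * p_symptom_given_ok
    return prior * p_symptom_given_fault / p_symptom

# A rare fault (FTA prior 1e-3) with a strongly indicative symptom:
p = posterior(prior=1e-3, p_symptom_given_fault=0.9, p_symptom_given_ok=0.01)
```

Even strong evidence leaves a rare fault's posterior modest here (about 0.083), which illustrates why the FTA-derived priors matter for the imbalanced, rare-fault cases the paper targets.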
In recent years, data science emerged as a new and important discipline. It can be viewed as an amalgamation of classical disciplines like statistics, data mining, databases, and distributed systems. Existing approaches need to be combined to turn abundantly available data into value for individuals, organizations, and society. Moreover, new challenges have emerged, not just in terms of size (“Big Data”) but also in terms of the questions to be answered. This book focuses on the analysis of behavior based on event data. Process mining techniques use event data to discover processes, check compliance, analyze bottlenecks, compare process variants, and suggest improvements. In later chapters, we will show that process mining provides powerful tools for today’s data scientist. However, before introducing the main topic of the book, we provide an overview of the data science discipline.
TOKYO — In the country that gave the world the word tsunami, the Japanese nuclear establishment largely disregarded the potentially destructive force of the walls of water. The word did not even appear in government guidelines until 2006, decades after plants — including the Fukushima Daiichi facility that firefighters are still struggling to get under control — began dotting the Japanese coastline. The lack of attention may help explain how, on an island nation surrounded by clashing tectonic plates that commonly produce tsunamis, the protections were so tragically minuscule compared with the nearly 46-foot tsunami that overwhelmed the Fukushima plant on March 11. Offshore breakwaters, designed to guard against typhoons but not tsunamis, succumbed quickly as a first line of defense. The wave grew three times as tall as the bluff on which the plant had been built. Japanese government and utility officials have repeatedly said that engineers could never have anticipated the magnitude 9.0 earthquake — by far the largest in Japanese history — that caused the sea bottom to shudder and generated the huge tsunami. Even so, seismologists and tsunami experts say that according to readily available data, an earthquake with a magnitude as low as 7.5 — almost garden variety around the Pacific Rim — could have created a tsunami large enough to top the bluff at Fukushima. After an advisory group issued nonbinding recommendations in 2002, Tokyo Electric Power Company, the plant owner and Japan's biggest utility, raised its maximum projected tsunami at Fukushima Daiichi to between 17.7 and 18.7 feet — considerably higher than the 13-foot-high bluff. Yet the company appeared to respond only by raising the level of an electric pump near the coast by 8 inches, presumably to protect it from high water, regulators said.
In this paper, a safety sensitivity analysis approach is developed. This approach is built upon a sensitivity analysis approach for acyclic Markov reliability models and the Markov Chain Modular approach. The safety sensitivity analysis approach is exercised using an example sensor system.
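For the simplest acyclic Markov reliability model, a single component going from up to down, the sensitivity of unreliability to the failure rate has a closed form, which a finite-difference check confirms. The rate and mission time below are assumed values for illustration only.

```python
# Sensitivity of unreliability F(t) = 1 - exp(-lam*t) to the failure rate lam
# in a two-state acyclic Markov model (up -> down, no repair).
import math

lam, t = 1e-3, 100.0                  # assumed failure rate and mission time

def unreliability(l):
    return 1.0 - math.exp(-l * t)

analytic = t * math.exp(-lam * t)     # dF/dlam = t * exp(-lam*t)
h = 1e-8
numeric = (unreliability(lam + h) - unreliability(lam - h)) / (2 * h)
print(abs(analytic - numeric) < 1e-4) # True: central difference matches
```

For larger acyclic models the same finite-difference approach applies component by component, which is essentially what a sensitivity analysis automates.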
Process mining techniques relate observed behavior (i.e., event logs) to modeled behavior (e.g., a BPMN model or a Petri net). Process models can be discovered from event logs and conformance checking techniques can be used to detect and diagnose differences between observed and modeled behavior. Existing process mining techniques can only uncover these differences, but the actual repair of the model is left to the user and is not supported. In this paper we investigate the problem of repairing a process model w.r.t. a log such that the resulting model can replay the log (i.e., conforms to it) and is as similar as possible to the original model. To solve the problem, we use an existing conformance checker that aligns the runs of the given process model to the traces in the log. Based on this information, we decompose the log into several sublogs of non-fitting subtraces. For each sublog, either a loop is discovered that can replay the sublog or a subprocess is derived that is then added to the original model at the appropriate location. The approach is implemented in the process mining toolkit ProM and has been validated on logs and models from several Dutch municipalities.
Fault tree analysis (FTA) is one of the preeminent methods for testing the reliability, trustworthiness, and safety of engineered systems. Given their pervasiveness as a major tool for assessing the risks of technology, it is imperative that methodologies such as FTA are valid and sound. If a safety assessment based on an FTA is erroneous, the system may fail to work as expected. In this paper I submit FTA to critical scrutiny. Through a detailed step-by-step investigation, it is shown that the FTA methodology rests on numerous unproven, even false assumptions. The paper sketches out a set of ethical principles for risk assessment professionals, which, if followed, will allow them to meet their professional and ethical obligations to consider the impacts of risk on all of the stakeholders involved.
Quality standards impose increasingly stringent requirements and constraints on quality of service attributes and measures. As a consequence, aspects, phenomena, and behaviors, hitherto approximated or neglected, have to be taken into account in quantitative assessment in order to provide adequate measures satisfying smaller and smaller confidence intervals and tolerances. With specific regard to reliability and availability, this means that interferences and dependencies involving the components of a system can no longer be neglected. Therefore, in order to support such a trend, specific techniques and tools are required to adequately deal with dynamic aspects in reliability and availability assessment. The main goal of this paper is to demonstrate how state-space-based techniques can satisfy such a demand. For this purpose, some examples of specific dynamic reliability behaviors, such as common cause failure and load sharing, are considered, applying state-space-based techniques to study the corresponding reliability models. Different repair policies in availability contexts are also explored. Both Markovian and non-Markovian models are studied via phase-type expansion and renewal theory in order to adequately represent and evaluate the considered dynamic reliability aspects in the case of generally distributed lifetimes and times to repair.
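The basic machinery behind such state-space techniques is a continuous-time Markov chain whose steady-state distribution yields availability. A minimal sketch for the simplest case, one component with exponential failure and repair (rates assumed), verified against the closed form mu/(lam+mu):

```python
# Steady-state availability of a two-state Markov repair model:
# state 0 = up, state 1 = down; solve pi Q = 0 with sum(pi) = 1.
import numpy as np

lam, mu = 0.1, 1.0                       # assumed failure and repair rates
Q = np.array([[-lam,  lam],
              [  mu,  -mu]])

# Stack the balance equations Q^T pi = 0 with the normalisation constraint.
A = np.vstack([Q.T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

availability = pi[0]
print(round(availability, 4))            # 0.9091 = mu / (lam + mu)
```

Dynamic behaviors such as load sharing or common cause failure are handled the same way, only with more states and rates that depend on the state, which is precisely what makes the state-space approach more expressive than a static fault tree.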
Over recent years a significant amount of research has been undertaken to develop prognostic models that can be used to predict the remaining useful life of engineering assets. Implementations by industry have only had limited success. By design, models are subject to specific assumptions and approximations, some of which are mathematical, while others relate to practical implementation issues such as the amount of data required to validate and verify a proposed model. Therefore, appropriate model selection for successful practical implementation requires not only a mathematical understanding of each model type, but also an appreciation of how a particular business intends to utilise a model and its outputs. This paper discusses business issues that need to be considered when selecting an appropriate modelling approach for trial. It also presents classification tables and process flow diagrams to assist industry and research personnel in selecting appropriate prognostic models for predicting the remaining useful life of engineering assets within their specific business environment. The paper then explores the strengths and weaknesses of the main prognostic model classes to establish what makes them better suited to certain applications than to others, and summarises how each has been applied to engineering prognostics. Consequently, this paper should provide a starting point for young researchers first considering options for remaining useful life prediction. The models described in this paper are Knowledge-based (expert and fuzzy), Life expectancy (stochastic and statistical), Artificial Neural Networks, and Physical models.
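The simplest life-expectancy-style prognostic, fit a trend to a degradation signal and extrapolate to a failure threshold, can be sketched in a few lines. The noise-free data and threshold below are assumptions chosen so the answer is exact; real signals require uncertainty quantification around the prediction.

```python
# Remaining-useful-life (RUL) sketch: linear degradation trend extrapolated
# to an assumed failure threshold on the monitored signal.
import numpy as np

t = np.arange(0.0, 11.0)          # inspection times 0..10
y = 0.5 * t                       # degradation measurements (assumed noise-free)
threshold = 10.0                  # assumed failure threshold

slope, intercept = np.polyfit(t, y, 1)
t_fail = (threshold - intercept) / slope   # predicted threshold crossing
rul = t_fail - t[-1]                       # RUL at the last inspection
print(round(rul, 6))                       # 10.0
```

Knowledge-based, neural-network, and physical models replace the linear trend with richer predictors, but the output being sought, time until the threshold is crossed, is the same.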
Hazard identification is fundamental to construction safety management; unidentified hazards present the most unmanageable risks. This paper presents an investigation indicating the current levels of hazard identification on three U.K. construction projects. A maximum of only 6.7% of the method statements analyzed on these projects managed to identify all of the hazards that should have been identified, based upon current knowledge. Maximum hazard identification levels were found to be 89.9% for a construction project within the nuclear industry, 72.8% for a project within the railway industry, and 66.5% for a project within both the railway and general construction industry sectors. The results indicate that hazard identification levels are far from ideal. A discussion on the reasons for low hazard identification levels indicates key barriers. This leads to the presentation of an Information Technology (IT) tool for construction project safety management (Total-Safety) and, in particular, a module within Total-Safety designed to help construction personnel develop method statements with improved levels of hazard identification.
New technology is making fundamental changes in the etiology of accidents and is creating a need for changes in the explanatory mechanisms used. We need better and less subjective understanding of why accidents occur and how to prevent future ones. The most effective models will go beyond assigning blame and instead help engineers to learn as much as possible about all the factors involved, including those related to social and organizational structures. This paper presents a new accident model founded on basic systems theory concepts. The use of such a model provides a theoretical foundation for the introduction of unique new types of accident analysis, hazard analysis, accident prevention strategies including new approaches to designing for safety, risk assessment techniques, and approaches to designing performance monitoring and safety metrics.
Fault diagnostic methods aim to recognize when faults exist on a system and to identify the failures that have caused the fault. The symptoms of the fault are obtained from readings from sensors located on the system. When the observed readings do not match those expected, a fault can exist. Using the detailed information provided by the sensors, a list of the failures (singly or in combination) that could cause the symptoms can be deduced. In the last two decades, fault diagnosis has received growing attention due to the complexity of modern systems and the consequent need for more sophisticated techniques to identify the failures when they occur. Detecting the causes of a fault quickly and efficiently means reducing the costs associated with system unavailability and, in certain cases, avoiding the risks of unsafe operating conditions. Bayesian belief networks (BBNs) are probabilistic models that were developed in artificial intelligence applications but are now applied in many fields. They are ideal for modelling the causal relations between faults and symptoms used in the detection process. The probabilities of events within the BBN can be updated following observations (evidence) about the system state. In this paper we investigate how BBNs can be applied to diagnose faults on a system. Initially, fault trees (FTs) are constructed to indicate how the component failures can combine to cause unexpected deviations in the variables monitored by the sensors. Converting the FTs into BBNs enables the creation of a model that represents the system with a single network composed of sub-networks. The posterior probabilities of the components' failures give a measure of those components that have caused the symptoms observed. The method gives a procedure that can be generalized for any system where the causality structure relating the system component states to the sensor readings can be developed. The technique is demonstrated with a simple example system.
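The FT-to-BBN diagnosis step can be illustrated with a toy two-component OR gate: given that the top-event symptom is observed, enumerate the joint distribution to obtain each component's posterior failure probability. The component probabilities are assumed values, and the gate is deterministic for simplicity.

```python
# Toy fault tree (T = A OR B) treated as a Bayesian network; posterior
# component failure probabilities by exact enumeration given symptom T = 1.
from itertools import product

p_fail = {"A": 0.10, "B": 0.05}   # assumed component failure probabilities

def joint(a, b):
    """P(A=a, B=b) with independent components."""
    pa = p_fail["A"] if a else 1 - p_fail["A"]
    pb = p_fail["B"] if b else 1 - p_fail["B"]
    return pa * pb

states = list(product([0, 1], repeat=2))
p_T = sum(joint(a, b) for a, b in states if a or b)          # P(T = 1)
p_A_and_T = sum(joint(a, b) for a, b in states if a and (a or b))

posterior_A = p_A_and_T / p_T     # P(A failed | symptom observed)
print(round(posterior_A, 4))      # 0.6897
```

Ranking components by such posteriors is exactly how the converted network points the operator to the most likely cause of the observed symptoms; real BBN tools perform the same computation efficiently on large networks.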
The aim of the investigation was to develop a new method for evaluating the validity of safety analyses with incident and accident descriptions, and to use the approach on some common methods of safety analysis. For this purpose descriptions of disturbances and accidents were collected at seven Finnish process plants. This resulted in 51 incident descriptions. Complementary material (18 incidents) was collected from the FACT data bank. The evaluation concerned four methods: hazard and operability study (HAZOP), action error analysis (AEA), failure mode and effect analysis (FMEA), and management oversight and risk tree (MORT). The basic idea of the evaluations was to use the descriptions of accidents and disturbances as indicators of the real accident contributors, and to decide which of them would have been identified if the methods had been applied in a plant-wide analysis. The results show a total validity of 0.55 of the methods in the cases used in the evaluations.
Voting algorithms are used to provide an error masking capability in a wide range of highly dependable commercial & research applications. These applications include N-Modular Redundant hardware systems and diversely designed software systems based on N-Version Programming. The most sophisticated & complex algorithms can even tolerate malicious (or Byzantine) subsystem errors. The algorithms can be implemented in hardware or software depending on the characteristics of the application, and the type of voter selected. Many voting algorithms have been defined in the literature, each with particular strengths and weaknesses. Having surveyed more than 70 references from the literature, this paper uses a functional classification to provide a taxonomy of the voting algorithms used in safety-critical applications. We classify voters into three categories: generic, hybrid, and purpose-built voters. Selected algorithms of each category are described, for illustrative purposes, and application areas proposed. Approaches to the comparison of algorithm behavior are also surveyed. These approaches compare the acceptability of voter behavior based on either statistical considerations (e.g., number of successes, number of benign or catastrophic results), or probabilistic computations (e.g., probability of choosing correct value in each voting cycle or average mean square error) during q voting cycles.
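Two of the simplest generic voters, exact-majority for discrete replica outputs and median for numeric ones, can be sketched directly; the replica values below are illustrative assumptions.

```python
# Two generic voters used in N-Modular Redundant systems.
from collections import Counter
from statistics import median

def majority_voter(outputs):
    """Exact-majority voter: a value agreed by more than n/2 replicas, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

def median_voter(outputs):
    """Median voter for numeric replicas: masks one extreme value when n is odd."""
    return median(outputs)

print(majority_voter([1, 1, 2]))       # 1: two of three replicas agree
print(majority_voter([1, 2, 3]))       # None: no majority exists
print(median_voter([1.0, 1.1, 5.0]))   # 1.1: the faulty 5.0 is masked
```

Hybrid and purpose-built voters refine these ideas, e.g. by weighting replicas by past reliability or applying application-specific agreement thresholds, but the error-masking principle is the same.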