November 3, 2015 3:53 RPS/Trim Size: 24cm x 17cm for Proceedings/Edited Book driver092
Analysis of a major-accident dataset by Association Rule Mining to
minimise unsafe interfaces
Christoph Doell (1), Pascal Held (1), Raphael Moura (2), Rudolf Kruse (1) and Michael Beer (2)
(1) Faculty of Computer Science, Otto von Guericke University of Magdeburg,
Universitaetsplatz 2, 39106 Magdeburg, Germany.
E-mail: {christoph.doell,pascal.held,rudolf.kruse}@ovgu.de
(2) Institute for Risk and Uncertainty, University of Liverpool, Brodie Tower, Brownlow Street,
Liverpool L69 3GQ, UK.
E-mail: {rmoura,mbeer}@liverpool.ac.uk
Major accidents may cause severe damage to humans and the environment, and can potentially
lead to significant losses at the business and societal levels. Thus, the understanding of these
complex multi-attribute events through the analysis of past accidents might assist the search
for strategies to improve engineering systems' safety and design robustness.
Therefore, we aim to explore potential relationships among contributing factors by assessing
approximately 200 major industrial accidents from the Multi-attribute Technological Accidents
Dataset (MATA-D) created by Moura et al.
Understanding these complex, high-dimensional incident data is the main purpose of this work.
We apply association rule mining techniques and perform point-failure analysis in order to
produce further insight into the dataset. Subsequently, key similarities among accidents'
contributing factors are analysed, in order to disclose relevant associations and identify to
what extent a limited number of driving forces might be generating undesirable events.
The results are regarded as additional indicators to reduce risky interfaces among contributing
factors, and to indicate further managerial actions to minimise accidents. Conclusions that
enable additional means to visualise and communicate risks to specific stakeholders are then
discussed.
Keywords: Association Rule Mining; MATA-D; major accidents; binary data.
1. Introduction
Recent major accidents in high-technology systems have been triggering serious concerns
about the safety level of complex industrial structures and raising reasonable doubts among a
distressed and perplexed society. The development of novel standards, procedures and safety
assessments, and the role of regulatory bodies, do not seem to have been able to cope with
the intricate relationship between new technologies, organisational aspects and human factors.
Proc. of the 13th International Probabilistic Workshop (IPW 2015)
Edited by Edoardo Patelli & Ioannis Kougioumtzoglou
Copyright © 2015 IPW 2015 Organisers :: Published by Research Publishing
ISBN: 978-981-00-0000-0 — doi:10.3850/978-981-00-0000-0 092
The death toll from the latest widely-known events, such as the Tianjin Port explosion in China
(August 2015, 173 fatalities), the sinking of the Korean ferry MV Sewol (April 2014, 304
fatalities), the disappearance of Malaysia Airlines flight MH370 (March 2014, 239 fatalities),
the Lac-Megantic train derailment and explosion in Quebec (July 2013, 47 fatalities) and the
sinking of the Costa Concordia in Italy (January 2012, 32 fatalities), has dealt a substantial
blow to the public's confidence in the industry's capacity to conduct operations safely.
Extensive investigation reports are usually generated from this type of catastrophic event, to
elucidate how several contributing factors merged to produce undesirable developments which
collapsed systems and structures originally designed to be highly reliable.
Therefore, there is a need to understand the complex interactions that occurred during major
events, in order to learn from previous accidents and to identify common patterns and relation-
ships that give some insight into the genesis of major accidents. This work uses the University
of Liverpool's proprietary dataset named MATA-D (Multi-attribute Technological Accidents
Dataset), which was presented by Moura et al. [7].
The following section provides further information on the dataset, covering the structuring
method and the effects of the chosen taxonomy on the type of analysis performed here. Then,
difficulties related to the interpretation of such heterogeneous and sparse data, independent of
the underlying analysis, are discussed. Earlier approaches, such as the application of
self-organising maps and hierarchical clustering, including their advantages, disadvantages
and corresponding results, are addressed. Later, the methods section explains the algorithms
used in this work, i.e. frequent itemset mining and association rule mining, followed by the
description of the experiments and their results, as well as some advice to assist stakeholders
with risk reduction measures.
2. Data Description
Hollnagel developed a methodology for conducting Human Reliability Analysis, aiming both
at predicting the performance of operators when conducting tasks (to quantify possible
outcomes) and at supporting the search for the causes of events that have occurred (i.e. to
investigate accidents) [4]. To support both the predictive and the retrospective use of the
method, he developed the CREAM taxonomy, containing 53 potential descriptions for human
erroneous actions, cognitive functions, person-related factors, technology issues and
organisational aspects. These descriptions are hierarchically grouped at the highest level into
Human Error, Technological Error and Organisational Environment Error. These can be
separated further as follows: Human Error contains the areas Action, Specific Cognitive
Functions, Temporary Person Related Functions and Permanent Person Related Functions.
Technology is split into Equipment, Procedures, Temporary Interface and Permanent Interface.
The third group is Organisational Environment, containing Organisation, Training, Ambient
Conditions and Working Conditions.
Moura et al. [7] give more details on the applied taxonomy and also examine other
taxonomies, such as those contained in the Human Error Assessment and Reduction
Technique [11] and the Human Factors Analysis and Classification System [10]. The generalist
feature of Hollnagel's classification system, as well as its hierarchical partition, favoured its
use, without further adaptation, as the structural basis for the generation of MATA-D.
The MATA-D dataset contains data on 216 major accidents that occurred in the past, generated
by a review of accident reports [7]. As four of the 53 attributes never occur, only 49 attributes
have to be handled, so the data form a 216 × 49 matrix of Boolean values. They indicate
whether or not a specific type of error occurred within the process that caused the accident.
Our universe of discourse captures a set of accidents and the corresponding contributing
factors. We have no data on near accidents, nor on events relative to the total hours of
operation. Hence, we have no information about the probability of an attribute occurring
independently of an accident. Especially when dealing with rare events, this would be crucial
from a frequentist point of view, because some potentially driving factors might occur often,
even without causing an accident.
Nevertheless, the generation of such a dataset is difficult, as the information is recorded in
human-readable documents, which have to be analysed manually in order to generate an entry
for the database. For many areas, data on accidents or near accidents are not available. This
has several reasons. First of all, we need reliable documentation of the accident, which already
limits the time frame when we think back to disasters from long ago, such as the destruction
of the German Zeppelin Hindenburg or the sinking of the Titanic. Even if data are recorded
properly, they may still be classified, as for many military accidents. Companies might be
unwilling to reveal their documents, as this might damage their reputation. This becomes even
more difficult for near accidents: only a few domains clearly identify near misses. Incidents in
aviation, even when not leading to accidents, can be reported in detail and accessed in the
NASA Aviation Safety Reporting System [2]. Future work might therefore include data on
more accidents and also near misses, to perform a wider analysis of the driving factors. For
now, we analyse the major-accident data given in the MATA-D.
3. Related Work
Earlier approaches by Moura et al. [8] clustered the data using unsupervised learning
techniques such as the Self-Organising Map (SOM) [6]. Self-organising maps are both a
method for clustering and a method for visualising high-dimensional data. They are based on
a neural network whose neurons are arranged on a grid, which forms the output space, and
whose weights are initialised randomly. Neurons on the grid are connected by edges. Learning
works as follows: for the current training sample, the algorithm finds the neuron in the grid
most similar to the sample and adapts that neuron's weights to better match it. In addition, all
connected neurons are adapted in the same way, just more weakly; neurons farther away may
also be slightly adapted, depending on the chosen parameters. If a later sample is similar to
one already learned, it will probably affect a neuron close to the one found earlier. This
behaviour groups similar samples close to each other, while dissimilar samples iteratively
converge to larger distances. After learning all training examples, the (normally two-
dimensional) grid is directly used as a visualisation.
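The update step described above can be sketched as follows. This is a minimal illustration, not the implementation used in [8]; the function name, the dictionary-based grid representation and the parameter values are our own assumptions.

```python
import math

def som_update(grid, sample, learning_rate=0.5, radius=1.0):
    """One SOM training step: find the best-matching unit (BMU) and pull
    it, and more weakly its grid neighbours, towards the sample.
    `grid` maps 2-D grid positions to weight vectors."""
    # BMU: the neuron whose weight vector is closest to the sample.
    bmu = min(grid, key=lambda pos: sum((w - s) ** 2
                                        for w, s in zip(grid[pos], sample)))
    for pos, weights in grid.items():
        # Distance is measured on the grid, not in the input space.
        grid_dist = math.dist(pos, bmu)
        # Gaussian neighbourhood: full strength at the BMU, decaying with distance.
        influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
        grid[pos] = [w + learning_rate * influence * (s - w)
                     for w, s in zip(weights, sample)]
    return bmu
```

Repeating this step over all training samples (with decaying learning rate and radius) yields the grouping behaviour described above.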
As the described work also uses MATA-D, we can directly give a few examples of these earlier
findings. Five clusters were found, showing different relations between attributes, of which we
discuss three in detail: accidents with equipment failures formed a cluster with relations to only
a few other attributes. Another cluster implied relations between design failures, insufficient
skills, communication failures and specific execution factors such as wrong time and wrong
type. A third cluster contained accidents with problems in quality control, task allocation,
maintenance and design, as well as insufficient knowledge and missing information.
Another approach was hierarchical agglomerative clustering, as described in [5]. One critical
aspect of cluster analysis is the choice of the similarity measure. For accidents given as
Boolean vectors, similarity means that they have many reasons in common and few reasons
that they do not share. The Bray-Curtis dissimilarity [3] fulfils this property. Hierarchical
agglomerative clustering initialises every data point as a cluster and subsequently merges
close clusters. Depending on the chosen similarity and linkage method, different results can be
obtained. The Bray-Curtis dissimilarity and different linkage methods were applied to
MATA-D [9], generating several clusters. One showed communication issues and the highest
fatality rate of all clusters; another grouped training aspects in combination with knowledge
issues.
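For Boolean accident vectors, the Bray-Curtis dissimilarity reduces to the number of attributes in which two accidents differ, divided by the total number of attribute occurrences in both. A minimal sketch (the function name is ours; SciPy's `scipy.spatial.distance.braycurtis` computes the same quantity):

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two Boolean accident vectors:
    0 when the contributing factors coincide, 1 when they are disjoint."""
    diff = sum(abs(a - b) for a, b in zip(u, v))   # attributes present in only one
    total = sum(a + b for a, b in zip(u, v))       # total attribute occurrences
    return diff / total if total else 0.0
```

For example, two accidents sharing one factor and differing in two others get a dissimilarity of 0.5.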
For the latter works, the similarity measure between accidents can strongly influence the
results, and in general one might argue about which measure is appropriate. Avoiding this
conflict, we apply a method based not on the (dis)similarity of accidents, but on relations
among the underlying attributes.
4. Methodology
In market basket analysis, the given data are often considered to be items bought together.
Each customer chooses items, puts them into a basket and buys them. Mathematically, we
speak of a transaction, containing the items as a binary vector that describes which items were
chosen. The set of all given transactions is called the transaction database. Given these data,
we are interested in association rules such as "90% of transactions that purchase bread and
butter also purchase milk" [1]. One desired property of association rules is that the generated
rules have a high support, meaning that the described behaviour can be seen frequently within
the transaction database. Let A be the antecedent and B the consequent of a rule; then
support(A) = P(A), where P(A) denotes the empirical probability of the occurrence of A in the
transaction database. The support of a rule is given by P(A, B). A further property of a rule is
its confidence P(B|A), describing the rate at which the consequent holds when the antecedent
occurs - for example, the value of 90% in the example above. The rule's lift,
P(A, B)/(P(A)P(B)), expresses the change in ratio: staying with the example of bread, milk
and butter, if the rate of milk being bought is 5% and, given that bread and butter are bought,
the rate of milk is 10%, then the lift of the rule is 2.0 (10%/5%).
We are interested in rules with a high support. This has the practical consequence that we do
not need to look at every possible set of premises, but only at those with high support. So
we apply frequent itemset mining as the initial step of our analysis, which generates the
relative frequencies needed for generating association rules.
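The three rule measures can be computed directly from empirical counts. A minimal sketch, assuming transactions are represented as sets of items (the function and variable names are ours, not from the paper):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent,
    estimated from a list of transactions (sets of items)."""
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)                   # count of A
    n_c = sum(consequent <= t for t in transactions)                   # count of B
    n_ac = sum((antecedent | consequent) <= t for t in transactions)   # count of A and B
    support = n_ac / n                 # P(A, B)
    confidence = n_ac / n_a            # P(B | A)
    lift = confidence / (n_c / n)      # P(A, B) / (P(A) P(B))
    return support, confidence, lift
```

In the bread-and-butter example above, a confidence of 10% against a baseline milk rate of 5% gives a lift of 2.0.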
As the proposed algorithms generate association rules automatically, the interpretation of the
rules remains a manual task, as there are several properties to be maximised: support,
confidence, lift and the simplicity of the rule. Simplicity here means the number of items in
the antecedent. For rule generation, it may be a subjective preference whether adding another
item to a rule yields enough growth in lift and confidence relative to only a small decrease in
support. As we aim to communicate the most important rules to stakeholders, we only
consider rules with up to three items in the antecedent.
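The frequent itemset mining step mentioned above can be sketched as a simplified level-wise (Apriori-style) search. This illustration, with names of our choosing, omits the subset-based candidate pruning of the full algorithm:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise frequent itemset mining: an itemset is frequent if it
    occurs in at least min_support * len(transactions) transactions.
    Returns a dict mapping each frequent itemset to its support."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    # Start with single items, then grow itemsets level by level.
    level = [frozenset([i]) for i in items]
    k = 1
    while level:
        counts = {s: sum(s <= t for t in transactions) for s in level}
        current = {s: c / n for s, c in counts.items() if c / n >= min_support}
        frequent.update(current)
        # Candidates for the next level: unions of frequent k-sets of size k + 1.
        keys = list(current)
        level = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
        k += 1
    return frequent
```

Because every subset of a frequent itemset is itself frequent, growing only from frequent sets keeps the search space small, which is exactly why high-support rules can be mined efficiently.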
5. Experiments and Results
The dataset is binary, so we can interpret it in terms of items and transactions and apply
association rule mining. We then determine those confident rules that are of simple structure
and have a high lift as well as sufficient support. As association rules can only generate
insights on frequently appearing attributes, we finally take a special look at the subset of
accidents which occurred although only a few aspects went wrong, focusing on those with
fewer than four contributing attributes.
MATA-D only contains information on accidents and not on near accidents, which makes the
interpretation of rules difficult. A rule does not mean that a given premise caused the
consequence, which in turn caused the accident, but that premises and consequences are
statistically related. This means that avoiding at least one part of the premise lowers the
chance of the consequence occurring. If the consequence can be prevented, there are at least
two fewer contributing factors - one premise and the consequence - which raises hope that the
accident can be prevented, too.
Basic Exploration
As a first step of the analysis, we look at the 216 × 53 matrix and see that only 1416 of the
11448 values are ones. This sparsely filled matrix means that, on average, accidents have only
few recorded reasons. Figure 1 shows the accidents per attribute (left) and vice versa (right).
Four attributes occur more than 100 times, six attributes between forty and a hundred times,
and the remaining 43 attributes fewer than 40 times each. This shows that a few attributes
appear very frequently, while others hardly occur at all. The ten most frequent factors are, in
decreasing order: design failure, inadequate quality control, equipment failure, inadequate
task allocation, inadequate procedure, insufficient skills, maintenance failure, insufficient
knowledge, wrong place and missing information.
When looking at the attributes per accident, only few accidents show more than 15 reasons
and most of the accidents have at least 5 contributing factors. This indicates that association
rule mining might reveal insights on these accidents.
Still, a quarter of all accidents are caused by fewer than four factors, and these infrequent
factors span a variety of all the possible attributes. Consequently, thoroughly exploring the
accidents caused by only a few point failures can give additional insight, resolving whether
some of these accidents are caused by infrequent factors.
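The two distributions discussed here are simply the column and row sums of the Boolean accident matrix. A sketch of how they could be computed (the function name is ours):

```python
def marginal_counts(matrix):
    """Column sums (accidents per attribute) and row sums
    (attributes per accident) of a Boolean accident matrix,
    given as a list of rows."""
    per_attribute = [sum(col) for col in zip(*matrix)]  # accidents per attribute
    per_accident = [sum(row) for row in matrix]         # attributes per accident
    return per_attribute, per_accident
```

Sorting `per_attribute` in decreasing order yields the frequency ranking of contributing factors, and a histogram of `per_accident` gives the right-hand plot.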
Fig. 1. Distribution of accidents per attribute (left) and attributes per accident (right).
The CREAM taxonomy allows us to perform the same analysis on a coarser level. The results
are depicted in Figure 2.
Fig. 2. Distribution of accidents per aggregated attribute (left) and attributes per accident (right).
Surprisingly, the label organisation contributes to nearly every accident. After a gap, there are
five more categories, each appearing in more than 100 accidents, and six with fewer than 50
accidents each. So, even on the higher level of the taxonomy, the data show heterogeneous
behaviour. The distribution of attributes per accident is similar to the one in Figure 1, with the
difference that the high values are cut off, as of the original 53 dimensions (the theoretical
maximum) only 13 remain.
Association Rule Generation on the lowest hierarchical level
We start by generating association rules at the dataset's lowest hierarchical level and extract
those with a high lift with respect to a minimum support of the rule's implicating part of
Table 1. Selection of rules with one item as antecedent and one item as consequent, ordered by lift.
Premise | Consequent | Support (premise) | Support (rule) | Confidence | Lift
Wrong Reasoning Insufficient Knowledge 0.1204 0.0972 0.8077 2.3576
Management Problem Inadequate Quality Contr. 0.1019 0.0880 0.8636 1.4574
Wrong Reasoning Inadequate Task Alloc. 0.1204 0.1019 0.8462 1.4505
Inadequate Procedure Inadequate Task Alloc. 0.4400 0.3657 0.8316 1.4256
Missing Info. Inadequate Task Alloc. 0.1991 0.1620 0.8140 1.3953
Incomplete Info. Inadequate Task Alloc. 0.1389 0.1111 0.8000 1.3714
Wrong Place Inadequate Task Alloc. 0.2685 0.2130 0.7931 1.3596
Maintenance Failure Inadequate Quality Contr. 0.3519 0.2778 0.7895 1.3322
Insufficient Knowledge Inadequate Task Alloc. 0.3426 0.2639 0.7703 1.3205
Insufficient Knowledge Inadequate Quality Contr. 0.3426 0.2593 0.7568 1.2770
Insufficient Knowledge Design Failure 0.3426 0.2731 0.7973 1.2390
Maintenance Failure Design Failure 0.3519 0.2731 0.7763 1.2064
Insufficient Skills Design Failure 0.3750 0.2870 0.7654 1.1894
skills, the lack of important operational information, as well as failing to proceed adequately
or to maintain regularly, seem to play key roles in increasing the possibility of a major accident.
The first rule states that wrong reasoning highly increases the chance of occurrence of
insufficient knowledge, with a lift of 2.3. Further, insufficient knowledge appears as a premise
in three rules, showing a strong relation to several consequences while having a support
(premise) of no less than 34%. One of these three consequences is inadequate task allocation,
which additionally occurs as the consequence of five rules, with wrong reasoning, missing
information, incomplete information, wrong place and inadequate procedure as premises.
Especially the last of these, with 0.44, has a very high support. The latter factors seem to be
core problems when analysing major accidents.
The next step is to take a closer look at association rules with two items in the antecedent, as
they might contain more interesting rules. These are shown in Table 2. It reveals that they do
not yield significantly better results and that they mainly consist of the same attributes as the
rules with just one implicating item.
It may be surprising that equipment failures do not appear on their own in the rules, despite
being one of the most frequent attributes. This can be interpreted as a sign that an equipment
failure alone does not make an accident; rather, it is mostly the combination with further
human errors, e.g. insufficient knowledge or maintenance failures, that builds the arrangement
which causes the accident.
Association Rule Generation on the high hierarchical level
Since Hollnagel's taxonomy allows us to perform association rule mining on a higher level of
the hierarchy, the grouped attributes may better satisfy the required support level, thus
enabling us to discover additional rules of interest. The rules generated this way should tend
to be less specific and might have higher support values, as infrequent items can now appear
in the rules as part of a bigger group.
Table 2. Selection of rules with two items as antecedent and one item as consequent, ordered by lift.
Premise -> Consequent | Support (premise) | Support (rule) | Confidence | Lift
Wrong Place + Inadequate Task Alloc. -> Inadequate Procedure | 0.2130 | 0.1713 | 0.8043 | 1.8288
Wrong Place + Maintenance Failure -> Inadequate Task Alloc. | 0.1157 | 0.1111 | 0.9600 | 1.6457
Wrong Place + Inadequate Procedure -> Inadequate Task Alloc. | 0.1806 | 0.1713 | 0.9487 | 1.6264
Missing Info. + Insufficient Skills -> Inadequate Task Alloc. | 0.1111 | 0.1019 | 0.9167 | 1.5714
Inadequate Procedure + Maintenance Failure -> Inadequate Quality Contr. | 0.1944 | 0.1806 | 0.9286 | 1.5670
Missing Info. + Insufficient Knowledge -> Inadequate Task Alloc. | 0.1019 | 0.0926 | 0.9091 | 1.5584
Maintenance Failure + Inadequate Task Alloc. -> Inadequate Quality Contr. | 0.2315 | 0.2083 | 0.9000 | 1.5188
Inadequate Procedure + Design Failure -> Inadequate Task Alloc. | 0.3056 | 0.2639 | 0.8636 | 1.4805
Inadequate Procedure + Inadequate Quality Contr. -> Inadequate Task Alloc. | 0.2963 | 0.2546 | 0.8594 | 1.4732
Equipment Failure + Insufficient Knowledge -> Inadequate Quality Contr. | 0.1944 | 0.1574 | 0.8095 | 1.3661
Inadequate Procedure + Insufficient Skills -> Inadequate Task Alloc. | 0.2269 | 0.1806 | 0.7959 | 1.3644
Insufficient Skills + Insufficient Knowledge -> Design Failure | 0.1713 | 0.1481 | 0.8649 | 1.3440
Equipment Failure + Maintenance Failure -> Design Failure | 0.2454 | 0.2037 | 0.8302 | 1.2901
The scatter plot of the rules extracted at this grouped level is depicted in Figure 4. It shows a
number of association rules with a lift close to one. Referring to the definition of lift, we
neglect those rules and regard both sides of the implication as nearly independent. In addition,
there are many rules with high lift and high confidence, and a few of them even have high
support.
Table 3 shows the rules generated on the coarse level. The first two rules indicate a strong
correlation between specific cognitive functions and action, due to very high values of
support, lift and confidence. These are the rules with the highest support, located far in the
upper right corner of Figure 4. The following six rules show that further influences on these
two very important factors are temporary person-related functions, working conditions,
temporary interface problems, and equipment and communication issues. These were
indicated as the circles with lift close to two, so unfortunately they do not give many further
insights. The following rules indicate problems concerning training. Especially the last rule,
with a support (premise)
Fig. 4. Scatter plot of exemplary association rules on grouped items.
of nearly 44%, shows that procedural problems are related to training issues. As expected, the
rules found here predominantly reflect the attributes with the highest supports.
Table 3. Selection of rules on groups of items, ordered by lift.
Premise | Consequent | Support (premise) | Support (rule) | Confidence | Lift
Spec. Cogn. Func. Action 0.4676 0.4444 0.9505 1.9010
Action Spec. Cogn. Func. 0.5000 0.4444 0.8889 1.9010
Temp. Pers. Rel. Func. Action 0.1343 0.1250 0.9310 1.8621
Work. Cond. Action 0.1204 0.1111 0.9231 1.8462
Temp. Interface Spec. Cogn. Func. 0.1528 0.1204 0.7879 1.6850
Temp. Interface Action 0.1528 0.1250 0.8182 1.6364
Temp. Pers. Rel. Func. Spec. Cogn. Func. 0.1343 0.1019 0.7586 1.6224
Equipment Communication Action 0.1620 0.1296 0.8000 1.6000
Procedure Communication Training 0.1620 0.1389 0.8571 1.5690
Work. Cond. Training 0.1204 0.1019 0.8462 1.5489
Temp. Interface Training 0.1528 0.1204 0.7879 1.4422
Procedure Training 0.4398 0.3333 0.7579 1.3873
Due to the limitations of association rule extraction, rarely occurring attributes were excluded
a priori. Unfortunately, since there is large variance in the data, these infrequent factors span
a variety of all the possible attributes. Conversely, analysing an accident caused by only a few
attributes might give additional hints about potential driving factors, as, for instance, there
may be accidents caused only by these rarely occurring attributes. Consequently, we take a
detailed look at the accidents with only few contributing factors, as those factors might be
especially important.
Analysis of accidents with few contributing factors
From now on, we focus on the 58 accidents caused by three or fewer attributes. As they were
hardly represented in the association rule mining, we look at them here in more detail.
Table 4. Few-point failures, ordered by the fraction of cases in which they occur
one Reason two Reasons three Reasons
Attribute ratio Attribute ratio Attribute ratio
Equip.Failure 9/14 Equip.Failure 14/25 Equip.Failure 13/19
Inadeq.Task.Alloc. 2/14 Design.Failure 11/25 Inadeq.Qual.Contr. 10/19
Inadeq.Qual.Contr. 1/14 Inadeq.Qual.Contr. 5/25 Design.Failure 8/19
Design.Failure 1/14 Inadeq.Task.Alloc. 5/25 Inadeq.Task.Alloc. 6/19
Adverse.Ambient.Cond. 1/14 Maint.Failure 5/25 Inadeq.Procedure 3/19
Adverse.Ambient.Cond. 3/25 Insuff.Knowl. 3/19
Inadeq.Procedure 3/25 Adverse.Ambient.Cond. 2/19
Insuff.Skills 2/25 Maint.Failure 2/19
Wrong Place 1/25 Missing.Info 2/19
Physio.Stress 1/25 Wrong Place 1/19
Insuff.Skills 1/19
Wrong.Type 1/19
Priority.Error 1/19
Decision.Error 1/19
Observ.Missed 1/19
Mgmt.Problem 1/19
Temperature 1/19
Table 4 illustrates that equipment, design and maintenance failures, as well as inadequate
task allocation and quality control, are again the attributes contributing most to the accidents
caused by fewer than four attributes.
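The tabulation above can be reproduced by restricting the matrix to accidents with at most three contributing factors and counting attribute occurrences. A sketch under the assumption that the data are given as a Boolean matrix with named columns (function and parameter names are ours):

```python
from collections import Counter

def point_failure_profile(matrix, attribute_names, max_factors=3):
    """Restrict a Boolean accident matrix to accidents with at most
    `max_factors` contributing factors and count how often each
    attribute appears among them."""
    counts = Counter()
    n_selected = 0
    for row in matrix:
        if 0 < sum(row) <= max_factors:
            n_selected += 1
            counts.update(name for name, v in zip(attribute_names, row) if v)
    return n_selected, counts
```

Splitting the selection by the exact number of factors (1, 2 or 3) would give the three columns of Table 4 separately.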
When looking again at the creation process of the data, we found that, for all incidents with
only one recorded reason, the data source was a report from an insurance company. For their
purposes, one reason might suffice, especially a strong and clear reason such as an equipment
failure. Probably, the data generation purpose (or the objective of the search
for accident causes) is slightly different for them. While in most cases the disclosure of all
circumstances of an accident might be desirable, for insurance companies the construction of
the report might be driven by whether they have to pay or not. If this question is clearly
answered by a major contributing cause stated in the report, there is no need to spend more
money on extensive investigations. As with many data analysis tasks, the quality of the results
is highly influenced by the characteristics of the input.
6. Conclusions
In this work, we analysed the MATA-D dataset. In contrast to earlier works, we applied
association rule mining instead of clustering, in order to gain further insights. Moreover,
further investigations were performed for the data points with fewer than four contributing
factors, as they were not well represented by association rule mining.
On the coarse level, we first found strong correlations between failures in action and specific
cognitive functions. These were also shown to correlate with several temporary or
person-related errors. As these are mostly man-made faults, we recommend that stakeholders
assign crucial tasks not to single persons but to teams, providing supervision to minimise
negative effects of specific cognitive functions. Working as a team should also minimise the
temporary person-related problems found, although regular checks on the workers' condition
(whether they are subject to fatigue or stress, for example) are also desirable. Secondly, we
found training issues to be related to procedural and temporary problems. We recommend an
assessment of the training on critical procedures, where the people involved can gain practice
and experience. The development of means of feedback on complicated processes which
require a higher level of cognition, and on design shortcomings, would also be appropriate, in
order to simplify them. On the fine level, we saw wrong reasoning in correlation with
inadequate task allocation, insufficient knowledge and design failures. These issues could also
be handled by the previous recommendations.
Future analyses aiming at specific statistical reasons leading to accidents could also include
data on near accidents.
Final Remarks
Raphael Moura’s contribution to this work has been partially funded by CAPES (Proc.
no. 5959/13-6).
References
1. Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. Mining association rules between
sets of items in large databases. In ACM SIGMOD Record, volume 22, pages 207-216.
ACM, 1993.
2. CE Billings, JK Lauber, H Funkhouser, EG Lyman, and EM Huff. NASA Aviation Safety
Reporting System. 1976.
3. J Roger Bray and John T Curtis. An ordination of the upland forest communities of
southern Wisconsin. Ecological Monographs, 27(4):325-349, 1957.
4. Erik Hollnagel. Cognitive Reliability and Error Analysis Method. Oxford, UK: Elsevier
Science Ltd, 1998.
5. Leonard Kaufman and Peter J Rousseeuw. Finding groups in data: an introduction to
cluster analysis, volume 344. John Wiley & Sons, 2009.
6. Teuvo Kohonen. Self-Organizing Maps. Springer-Verlag New York, Inc., Secaucus, NJ, 2001.
7. R Moura, M Beer, E Patelli, J Lewis, and F Knoll. Human error analysis: Review of past
accidents and implications for improving robustness of system design. In T. Nowakowski
et al., editors, Proceedings of the 24th European Safety and Reliability Conference: 14-18
September 2014, Wroclaw, Poland, pages 1037–1046, London, 2014. Taylor & Francis
Group.
8. Raphael Moura, Michael Beer, John Lewis, and Edoardo Patelli. Learning from accidents:
analysis and representation of human errors in multi-attribute events. In T. Haukaas,
editor, Proceedings of the 12th International Conference on Applications of Statistics and
Probability in Civil Engineering (ICASP12), Vancouver, Canada, July 12-15 2015.
9. Raphael Moura, Christoph Doell, Michael Beer, and Rudolf Kruse. A clustering approach
to a major-accident data set: Analysis of key interactions to minimise human errors. To be
published in SSCI-CIES, 2015.
10. Scott Shappell, Cristy Detwiler, Kali Holcomb, Carla Hackworth, Albert Boquet, and
Douglas A Wiegmann. Human error and commercial aviation accidents: an analysis using
the human factors analysis and classification system. Human Factors: The Journal of the
Human Factors and Ergonomics Society, 49(2):227–242, 2007.
11. JC Williams. HEART – a proposed method for assessing and reducing human error. In 9th
Advances in Reliability Technology Symposium, University of Bradford, 1986.