November 3, 2015 3:53 RPS/Trim Size: 24cm x 17cm for Proceedings/Edited Book driver092
Analysis of a major-accident dataset by Association Rule Mining to
minimise unsafe interfaces
Christoph Doell¹, Pascal Held¹, Raphael Moura², Rudolf Kruse¹ and Michael Beer²
¹Faculty of Computer Science, Otto von Guericke University of Magdeburg,
Universitaetsplatz 2, 39106 Magdeburg, Germany.
E-mail: {christoph.doell,pascal.held,rudolf.kruse}@ovgu.de
²Institute for Risk and Uncertainty, University of Liverpool, Brodie Tower, Brownlow Street,
Liverpool L69 3GQ, UK.
E-mail: {rmoura,mbeer}@liverpool.ac.uk
Major accidents may cause severe damage to humans and the environment, and can lead to significant losses at a business and societal level. Understanding these complex multi-attribute events through the analysis of past accidents might therefore assist the search for strategies to improve the safety and design robustness of engineering systems.
To this end, we aim to explore potential relationships among contributing factors by assessing approximately 200 major industrial accidents from the Multi-attribute Technological Accidents Dataset (MATA-D) created by Moura et al.
Understanding this complex, high-dimensional accident data is the main purpose of this work. We apply association rule mining techniques and perform a point-failure analysis to gain further insight into the dataset. Subsequently, key similarities among the accidents' contributing factors are analysed in order to disclose relevant associations and to identify to what extent a limited number of driving forces might be generating undesirable events.
The results are regarded as additional indicators for reducing risky interfaces among contributing factors and for indicating further managerial actions to minimise accidents. Finally, conclusions on additional means to visualise and communicate risks to specific stakeholders are discussed.
Keywords: Association Rule Mining; MATA-D; major accidents; binary data.
1. Introduction
Recent major accidents in high-technology systems have triggered serious concerns about the safety level of complex industrial structures, raising reasonable doubts in a distressed and perplexed society. Neither the development of novel standards, procedures and safety assessments nor the role of regulatory bodies seems to have been able to cope with the intricate relationship between new technologies, organisational aspects and human factors.
Proc. of the 13th International Probabilistic Workshop (IPW 2015)
Edited by Edoardo Patelli & Ioannis Kougioumtzoglou
Copyright © 2015 IPW 2015 Organisers :: Published by Research Publishing
ISBN: 978-981-00-0000-0 — doi:10.3850/978-981-00-0000-0 092
The death toll from the latest widely known events, such as the Tianjin Port explosion in China (August 2015, 173 fatalities), the sinking of the Korean ferry MV Sewol (April 2014, 304 fatalities), the disappearance of Malaysian flight MH370 (March 2014, 239 fatalities), the Lac-Megantic train derailment and explosion in Quebec (July 2013, 47 fatalities) and the sinking of the cruise ship Costa Concordia in Italy (January 2012, 32 fatalities), has dealt a substantial blow to the public's confidence in the industry's capacity to conduct operations safely. Extensive investigation reports are usually produced after this type of catastrophic event to elucidate how several contributing factors combined to produce undesirable developments that led to the collapse of systems and structures originally designed to be highly reliable.
Therefore, there is a need to understand the complex interactions that occur during major events, in order to learn from previous accidents and to identify common patterns and relationships that give some insight into the genesis of major accidents. This work uses MATA-D (Multi-attribute Technological Accidents Dataset), a proprietary dataset of the University of Liverpool presented by Moura et al. [7].
The following section provides further information on the dataset, including the structuring method and the effects of the taxonomy used on the type of analysis performed here. Then, difficulties related to the interpretation of such heterogeneous and sparse data are discussed, independently of the underlying analysis. Earlier approaches, such as the application of self-organising maps and hierarchical clustering, are addressed together with their advantages, disadvantages and results. Later, the methods section explains the algorithms used in this work, i.e. Frequent Itemset Mining and Association Rule Mining, followed by a description of the experiments and their results, as well as some advice to assist stakeholders with risk reduction measures.
2. Data Description
Hollnagel developed a methodology for conducting Human Reliability Analysis, aimed both at predicting the performance of operators conducting tasks (to quantify possible outcomes) and at supporting the search for the causes of past events (i.e. investigating accidents) [4]. To support both the predictive and the retrospective use of the method, he developed, as part of CREAM (Cognitive Reliability and Error Analysis Method), a taxonomy containing 53 potential descriptions of erroneous human actions, cognitive functions, person-related factors, technology issues and organisational aspects. These descriptions are hierarchically grouped, at the highest level, into Human Error, Technological Error and Organisational Environment Error, which can be separated further as follows. Human Error contains the areas Action, Specific Cognitive Functions, Temporary Person-Related Functions and Permanent Person-Related Functions. Technology is split into Equipment, Procedures, Temporary Interface and Permanent Interface. The third group, Organisational Environment, contains Organisation, Training, Ambient Conditions and Working Conditions.
Moura et al. [7] give more details on the applied taxonomy and also examine other taxonomies, such as those contained in the Human Error Assessment and Reduction Technique [11] and the Human Factors Analysis and Classification System [10]. The generality of Hollnagel's classification system, as well as its hierarchical structure, favoured its use, without further adaptation, as the structural basis of MATA-D.
MATA-D contains data on major accidents that occurred in the past. It was generated by reviewing accident reports [7] and covers 216 major accidents. As four of the 53 attributes never occur, only 49 attributes have to be handled, so the data form a 216 × 49 matrix of Boolean values. Each value indicates whether or not a specific type of error occurred within the process that led to the accident.
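The Boolean representation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the record format (one set of attribute names per accident), the function name `build_matrix` and the toy attribute names are assumptions, not part of MATA-D or of the original study.

```python
def build_matrix(records, attributes):
    """Build a Boolean accident x attribute matrix, dropping never-occurring attributes.

    records: list of sets of attribute names, one set per accident (hypothetical format).
    attributes: full list of taxonomy attribute names (53 in the CREAM taxonomy).
    Returns (kept_attributes, matrix) where matrix[i][j] is True if accident i
    involves kept_attributes[j].
    """
    # Keep an attribute only if at least one accident involves it.
    kept = [a for a in attributes if any(a in r for r in records)]
    matrix = [[a in r for a in kept] for r in records]
    return kept, matrix


# Toy example: 3 accidents over 4 attributes, one of which never occurs.
attributes = ["design failure", "equipment failure", "wrong place", "temperature"]
records = [
    {"design failure", "equipment failure"},
    {"equipment failure"},
    {"design failure", "wrong place"},
]
kept, matrix = build_matrix(records, attributes)
print(kept)    # ['design failure', 'equipment failure', 'wrong place']
print(matrix)
```

In the same way, the four never-occurring attributes shrink the 53 taxonomy columns to the 49 columns handled here.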
Our universe of discourse consists of a set of accidents and their contributing factors. We have no data on near misses, nor on how often events occur relative to the total hours of operation. Hence, we have no information on how likely the attributes are to occur independently of an accident. Especially when dealing with rare events, this would be crucial from a frequentist point of view, because some potentially driving factors might occur often, even without causing an accident.
However, generating such a dataset is difficult, as the information is contained in human-readable documents that have to be analysed manually to create an entry for our database. For many areas, data on accidents or near accidents are not available, for several reasons. First, a reliable documentation of the accident is needed, which already limits the time span covered; consider disasters from long ago, such as the destruction of the German Zeppelin Hindenburg or the sinking of the Titanic. Even if data are recorded properly, they may still be classified, as for many military accidents. Companies might be unwilling to reveal their documents, as this might damage their reputation. This becomes even more difficult for near accidents, since only a few areas clearly identify near misses. In aviation, incidents, even when they do not lead to accidents, can be reported in detail and accessed in the NASA Aviation Safety Reporting System [2]. Future work might therefore include data on more accidents and also on near misses, to perform a wider analysis of the driving factors. For now, we analyse the major accident data given in MATA-D.
3. Related Work
Earlier work by Moura et al. [8] clustered the data using unsupervised learning techniques such as the Self-Organising Map (SOM) [6]. Self-organising maps are both a clustering method and a method for visualising high-dimensional data. They are based on a neural network whose neurons are arranged on a grid that forms the output space and whose weights are initialised with random values. Neighbouring neurons on the grid are connected by edges. Learning works as follows: for the current training sample, the algorithm finds the most similar neuron on the grid and adapts that neuron's weights to better match the sample. In addition, all connected neurons are adapted in the same way, but less strongly. Neurons farther away may also be slightly adapted, depending on the chosen parameters. If a later sample is similar to one already learned, it will probably affect a neuron close to the one found earlier. As a result, similar samples are grouped close to each other, while dissimilar samples iteratively end up farther apart. After all training examples have been learned, the (normally two-dimensional) grid is used directly as a visualisation.
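To make the learning rule described above concrete, here is a minimal pure-Python SOM sketch. It assumes a Gaussian neighbourhood function on the grid and a linearly decaying learning rate and radius; these are common choices but not necessarily the parameters used by Moura et al. [8]. The function names, grid size and toy data are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position whose weight vector is closest (squared Euclidean) to sample x."""
    return min(weights, key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)))


def train_som(samples, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small self-organising map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Random initial weights for every neuron on the rows x cols grid.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total, step = epochs * len(samples), 0
    for _ in range(epochs):
        for x in rng.sample(samples, len(samples)):   # one shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.01      # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                # Gaussian neighbourhood (an assumption): the best-matching neuron
                # moves most, its grid neighbours less, distant neurons hardly at all.
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Two hypothetical groups of accidents with disjoint contributing factors.
samples = [[1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0],
           [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1]]
som = train_som(samples)
for s in samples:
    print(s, "->", best_matching_unit(som, s))
```

After training, the two groups are expected to map to different regions of the grid, which is what makes the grid usable as a visualisation.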
As that work also used MATA-D, we can directly give a few examples of its findings. Five clusters were found, showing different relations between attributes, of which we discuss three in detail. Accidents with equipment failures formed a cluster with relations to only a few other attributes. Another cluster showed relations between design failures, insufficient skills, communication failures and specific execution factors such as wrong time and wrong type. A third cluster contained accidents with problems in quality control, task allocation, maintenance and design, as well as insufficient knowledge and missing information.
Another approach was hierarchical agglomerative clustering, as described in [5]. One critical aspect of cluster analysis is the choice of the similarity measure. For accidents given as Boolean vectors, similarity means that two accidents share many contributing factors and differ in only a few. The Bray-Curtis dissimilarity [3] fulfils this property. Hierarchical agglomerative clustering starts with every data point as its own cluster and successively merges the closest clusters. Depending on the chosen similarity and linkage method, different results can be obtained. The Bray-Curtis dissimilarity was applied to MATA-D with different linkage methods [9], generating several clusters. One showed communication issues and the highest fatality rate of all clusters; another grouped training aspects in combination with knowledge issues.
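A minimal sketch of the Bray-Curtis dissimilarity and of a naive agglomerative clustering loop. Average linkage is used here only as an illustrative choice (the cited work compared several linkage methods [9]); the function names and the four toy accident vectors are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum |u_i - v_i| / sum (u_i + v_i).

    For 0/1 vectors this equals (b + c) / (2a + b + c), where a counts shared
    factors and b, c count factors present in only one of the two accidents.
    """
    num = sum(abs(x - y) for x, y in zip(u, v))
    den = sum(x + y for x, y in zip(u, v))
    return num / den if den else 0.0


def agglomerate(vectors, n_clusters):
    """Naive agglomerative clustering with average linkage (illustrative choice)."""
    clusters = [[i] for i in range(len(vectors))]  # every accident starts alone

    def linkage(c1, c2):
        # Average Bray-Curtis dissimilarity over all cross-cluster pairs.
        return sum(bray_curtis(vectors[i], vectors[j])
                   for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a is unaffected
    return clusters


accidents = [[1, 1, 0, 0, 0], [1, 1, 1, 0, 0],
             [0, 0, 0, 1, 1], [0, 0, 1, 1, 1]]
print(bray_curtis(accidents[0], accidents[1]))  # 0.2
print(agglomerate(accidents, 2))               # [[0, 1], [2, 3]]
```

Swapping the `linkage` function (e.g. minimum or maximum instead of average) changes which clusters are merged, which is exactly the sensitivity discussed in the next paragraph.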
For both of these earlier approaches, the similarity measure between accidents can strongly influence the results, and which measure is appropriate is open to debate. To avoid this conflict, we apply a method that is based not on the (dis)similarity of accidents, but on relations between the underlying attributes.
4. Methodology
In market basket analysis, the data are typically items that are bought together. Each customer chooses items, puts them into a basket and buys them. Formally, each purchase is a transaction, represented as a binary vector that indicates which items were chosen. The set of all transactions is called the transaction database. Given these data, we are interested in association rules such as "90% of transactions that purchase bread and butter also purchase milk" [1]. One requirement for association rules is a high support, meaning that the described behaviour can be observed frequently within the transaction database. Let $A$ be the antecedent and $B$ the consequent of a rule. Then $\mathrm{support}(A) = P(A)$, where $P(A)$ denotes the empirical probability of the occurrence of $A$ in the transaction database, and the support of the rule is $P(A, B)$. Further properties of a rule are its confidence $P(B \mid A) = P(A, B)/P(A)$, the rate at which the consequent holds when the antecedent occurs (e.g. the 90% in the example above), and its lift $P(A, B)/\big(P(A)\,P(B)\big) = P(B \mid A)/P(B)$, which expresses the change in this rate relative to the baseline. Staying with the example of bread, butter and milk: if milk is bought in 5% of all transactions, but in 10% of the transactions in which bread and butter are bought, the lift of the rule is 2.0 (10% / 5%).
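The three measures can be computed directly from a list of transactions. This sketch represents each transaction as a Python set of item names (an assumption for readability, equivalent to the binary vector) and reproduces the bread, butter and milk example with hypothetical baskets.

```python
def support(itemset, transactions):
    """Empirical probability that a transaction contains every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """P(B | A) = P(A, B) / P(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """P(A, B) / (P(A) P(B)) = P(B | A) / P(B)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Hypothetical 40 baskets matching the example: milk in 5% of all baskets,
# bread and butter in 10 baskets, one of which (10%) also contains milk.
baskets = ([{"bread", "butter", "milk"}]
           + [{"bread", "butter"} for _ in range(9)]
           + [{"milk"}]
           + [{"eggs"} for _ in range(29)])
print(support({"milk"}, baskets))  # 0.05
print(confidence({"bread", "butter"}, {"milk"}, baskets))
print(lift({"bread", "butter"}, {"milk"}, baskets))
```

Up to floating-point rounding, the confidence is 0.1 and the lift 2.0, as in the example.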
We are interested in rules with a high support. This has the practical consequence that we do not need to examine every possible set of premises, but only those with high support, since an itemset can only reach a given support if all of its subsets do. We therefore apply frequent itemset mining as the initial step of our analysis, which yields the relative frequencies needed to generate the association rules.
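The paper does not state which frequent itemset mining algorithm was used. The sketch below uses the classic level-wise Apriori scheme, one standard choice, which exploits exactly the property that every subset of a frequent itemset is frequent; the toy transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} of all frequent itemsets."""
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while candidates:
        # Support counting for all candidate itemsets of size k.
        level = {}
        for c in candidates:
            supp = sum(1 for t in transactions if c <= t) / n
            if supp >= min_support:
                level[c] = supp
        frequent.update(level)
        k += 1
        # Join frequent itemsets of size k-1 into candidates of size k, keeping only
        # those whose (k-1)-subsets are all frequent (downward closure).
        candidates = set()
        for a, b in combinations(list(level), 2):
            u = a | b
            if len(u) == k and all(frozenset(sub) in level for sub in combinations(u, k - 1)):
                candidates.add(u)
    return frequent


toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, supp in sorted(apriori(toy, 0.4).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

Raising `min_support` prunes the search early: with a threshold of 0.5, the three-item set is never counted as frequent.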
Although the algorithms generate association rules automatically, the rules still have to be interpreted manually, as several properties are to be maximised: support, confidence, lift and the simplicity of the rule, where simplicity means the number of items in the antecedent. Whether adding another item to a rule yields enough gain in lift and confidence to justify a small decrease in support may be a matter of subjective preference. As we aim to communicate the most important rules to stakeholders, we only consider rules with up to three items in the antecedent.
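Given the supports of all frequent itemsets, rules with a single-item consequent and at most three antecedent items can be enumerated as sketched below. The function name, the tuple layout and the hand-made support values are hypothetical; restricting the consequent to one item mirrors the tables in this paper but is an assumption about the generation step.

```python
def generate_rules(supports, max_antecedent=3, min_confidence=0.75):
    """Derive rules A => b (single-item consequent) from itemset supports.

    supports: {frozenset: support}; assumed to contain every subset of each itemset
    it contains, as frequent itemset mining guarantees.
    Returns tuples (antecedent, consequent, supp_premise, supp_rule, conf, lift),
    sorted by decreasing lift.
    """
    rules = []
    for itemset, supp_rule in supports.items():
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # Skip empty premises and premises too complex to communicate.
            if not antecedent or len(antecedent) > max_antecedent:
                continue
            supp_premise = supports[antecedent]
            conf = supp_rule / supp_premise
            if conf >= min_confidence:
                lift = conf / supports[frozenset([consequent])]
                rules.append((antecedent, consequent, supp_premise, supp_rule, conf, lift))
    return sorted(rules, key=lambda r: r[5], reverse=True)


f = frozenset
supports = {f("a"): 0.5, f("b"): 0.6, f("c"): 0.4,
            f("ab"): 0.45, f("ac"): 0.2, f("bc"): 0.3, f("abc"): 0.2}
for ante, cons, sp, sr, conf, lift in generate_rules(supports, min_confidence=0.7):
    print(sorted(ante), "=>", cons, round(conf, 3), round(lift, 3))
```

Lowering `max_antecedent` to 1 drops the two-item premise {a, c} and keeps only the simplest rules, illustrating the trade-off between simplicity and lift described above.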
5. Experiments and Results
The dataset is binary, so we can interpret accidents as transactions and attributes as items, and apply association rule mining. We then determine the confident rules that have a simple structure, a high lift and sufficient support. As association rules can only yield insights into frequently occurring attributes, we finally take a closer look at the subset of accidents that occurred although only a few aspects went wrong, focusing on those with fewer than four contributing attributes.
MATA-D only contains information on accidents and not on near accidents, which makes the interpretation of rules difficult. A rule does not mean that its premise caused the consequent, which in turn caused the accident, but only that premise and consequent are statistically related. This suggests that avoiding at least one part of the premise may lower the chance of the consequent occurring. If the consequent can be prevented, there are at least two fewer contributing factors (one premise item and the consequent), which raises hope that the accident can be prevented, too.
Basic Exploration
As a first step of the analysis, we look at the 53 × 216 matrix and see that only 1416 of its 11448 values are ones, i.e. about 12%. This sparsely filled matrix means that, on average, an accident has only a few contributing factors (about 6.6). Figure 1 shows the number of accidents per attribute (left) and the number of attributes per accident (right). Four attributes occur more than 100 times, six attributes between 40 and 100 times, and the remaining 43 attributes fewer than 40 times each. Thus a few attributes occur very frequently, while others hardly occur at all. The ten most frequent factors, in decreasing order, are: design failure, inadequate quality control, equipment failure, inadequate task allocation, inadequate procedure, insufficient skills, maintenance failure, insufficient knowledge, wrong place and missing information.
Looking at the attributes per accident, only a few accidents show more than 15 contributing factors, and most accidents have at least 5. This indicates that association rule mining might reveal insights into these accidents.
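The counts behind Figure 1 amount to column and row sums of the Boolean matrix. A minimal sketch with a hypothetical toy matrix; the function names are illustrative, and the inclusive boundaries in `frequency_bands` are an assumption, since the exact treatment of the band limits is not stated.

```python
def exploration_summary(matrix):
    """Density, per-attribute counts and per-accident counts of a 0/1 accident x attribute matrix."""
    n_acc, n_attr = len(matrix), len(matrix[0])
    ones = sum(sum(row) for row in matrix)
    per_attribute = [sum(row[j] for row in matrix) for j in range(n_attr)]  # Fig. 1, left
    per_accident = [sum(row) for row in matrix]                            # Fig. 1, right
    return ones / (n_acc * n_attr), per_attribute, per_accident


def frequency_bands(per_attribute, low=40, high=100):
    """Number of attributes occurring more than `high` times, between `low` and `high`
    times (inclusive, an assumption), and fewer than `low` times."""
    above = sum(1 for c in per_attribute if c > high)
    middle = sum(1 for c in per_attribute if low <= c <= high)
    below = sum(1 for c in per_attribute if c < low)
    return above, middle, below


# Hypothetical 4 x 3 toy matrix (accidents x attributes).
toy = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 0, 1]]
density, per_attribute, per_accident = exploration_summary(toy)
print(density, per_attribute, per_accident)
# The figures reported for MATA-D: 1416 ones among 53 x 216 = 11448 entries.
print(round(1416 / 11448, 3), round(1416 / 216, 2))  # 0.124 6.56
```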
Still, about a quarter of all accidents are caused by fewer than four factors. As infrequent factors are spread over a wide variety of the possible attributes, a thorough exploration of the accidents caused by only a few point failures can give additional insight into whether some of these accidents are caused by infrequent factors.
Fig. 1. Distribution of accidents per attribute (left) and attributes per accident (right).
The CREAM taxonomy allows the same analysis to be performed on a coarser level. The results are depicted in Figure 2.
Fig. 2. Distribution of accidents per aggregated attribute (left) and attributes per accident (right).
Surprisingly, the label Organisation contributes to nearly every accident. After a gap, there are five more categories, each containing more than 100 accidents, and six categories with fewer than 50 accidents each. So, even on the higher level of the taxonomy, the data show heterogeneous behaviour. The distribution of attributes per accident is similar to the one in Figure 1, except that the high values are cut off, since only 13 of the original 53 dimensions (the theoretical maximum) remain.
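Moving to the coarser level amounts to a logical OR over the attributes of each group: an accident involves a group if it involves at least one of its attributes. In the sketch below, the function name and the attribute-to-group mapping are hypothetical; the mapping is a partial illustration rather than the authoritative CREAM assignment.

```python
def aggregate(records, group_of):
    """Map fine-grained attributes to coarse groups (logical OR within each group).

    records: list of sets of attribute names; group_of: {attribute: group}.
    """
    return [{group_of[a] for a in r} for r in records]


# Hypothetical, partial attribute-to-group mapping, for illustration only.
group_of = {
    "wrong place": "Action",
    "wrong type": "Action",
    "wrong reasoning": "Specific Cognitive Functions",
    "equipment failure": "Equipment",
    "inadequate procedure": "Procedure",
    "maintenance failure": "Organisation",
    "inadequate quality control": "Organisation",
}
records = [
    {"wrong place", "wrong type"},
    {"equipment failure", "maintenance failure", "inadequate quality control"},
    {"wrong reasoning", "inadequate procedure"},
]
grouped = aggregate(records, group_of)
print(grouped)
```

Because several fine attributes can collapse into one group, an accident never has more groups than attributes, which is why high values are cut off on the coarse level.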
Association Rule Generation on the lowest hierarchical level
We start by generating association rules on the dataset's lowest hierarchical level and extract those with a high lift, subject to a minimum support of the rule's antecedent of approximately 10% and a confidence of at least about 75%. These constraints are applied softly so as not to exclude potentially interesting rules lying close to these limits. Since the algorithm extracts rules purely on the basis of co-occurrence, we present only a selection of rules in tabular form, focusing on those with the highest values, those that yield a sufficient increase in lift with only a minor decrease in support, and those that deliver new combinations of attributes.
Fig. 3. Scattered exemplary Association Rules and their data-based limit.
Figure 3 portrays the natural conflict of interest when trying to maximise support, confidence and lift simultaneously. Support is plotted on the x-axis and lift on the y-axis, so the most desirable rules would appear in the upper right corner. Triangles mark rules that are listed in detail in the tables, while circles mark rules that were left out for not fulfilling at least one of the criteria of interest. The grey scale of each marker encodes the confidence value. Small markers denote rules with one item in the premise, which are simpler than those shown by the bigger markers, which contain two. The figure mainly shows that only a few rules lie in the upper right corner. The rules with one premise item are shown in detail in Table 1.
Table 1. Selection of rules with one item as antecedent and one item as consequent, ordered by lift.
Premise ⇒ Consequent | Support (premise) | Support (rule) | Confid. | Lift
Wrong Reasoning ⇒ Insufficient Knowledge | 0.1204 | 0.0972 | 0.8077 | 2.3576
Management Problem ⇒ Inadequate Quality Contr. | 0.1019 | 0.0880 | 0.8636 | 1.4574
Wrong Reasoning ⇒ Inadequate Task Alloc. | 0.1204 | 0.1019 | 0.8462 | 1.4505
Inadequate Procedure ⇒ Inadequate Task Alloc. | 0.4400 | 0.3657 | 0.8316 | 1.4256
Missing Info. ⇒ Inadequate Task Alloc. | 0.1991 | 0.1620 | 0.8140 | 1.3953
Incomplete Info. ⇒ Inadequate Task Alloc. | 0.1389 | 0.1111 | 0.8000 | 1.3714
Wrong Place ⇒ Inadequate Task Alloc. | 0.2685 | 0.2130 | 0.7931 | 1.3596
Maintenance Failure ⇒ Inadequate Quality Contr. | 0.3519 | 0.2778 | 0.7895 | 1.3322
Insufficient Knowledge ⇒ Inadequate Task Alloc. | 0.3426 | 0.2639 | 0.7703 | 1.3205
Insufficient Knowledge ⇒ Inadequate Quality Contr. | 0.3426 | 0.2593 | 0.7568 | 1.2770
Insufficient Knowledge ⇒ Design Failure | 0.3426 | 0.2731 | 0.7973 | 1.2390
Maintenance Failure ⇒ Design Failure | 0.3519 | 0.2731 | 0.7763 | 1.2064
Insufficient Skills ⇒ Design Failure | 0.3750 | 0.2870 | 0.7654 | 1.1894
Mainly, only three consequents fulfil the rule requirements: inadequate quality control, inadequate task allocation and design failure. In contrast, the premises are varied. Issues such as deficient knowledge and skills, the lack of important operational information, and failures to follow procedures adequately or to maintain regularly seem to play key roles in increasing the likelihood of a major accident.
The first rule states that wrong reasoning is strongly associated with insufficient knowledge, with a lift of 2.36. Further, insufficient knowledge appears as the premise of three rules, showing a strong relation to several consequents while having a large premise support of 34%. One of these three consequents is inadequate task allocation, which appears as a consequent five more times, with the premises wrong reasoning, missing information, incomplete information, wrong place and inadequate procedure. The last of these has a particularly high premise support of 0.44. These factors seem to be core problems when analysing major accidents.
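Because confidence and lift are defined in terms of supports, the columns of Table 1 can be cross-checked. The sketch below (the function name `check_rules` is hypothetical) verifies that each confidence matches support(rule)/support(premise) and derives the support each rule implies for its consequent as confidence/lift.

```python
from collections import defaultdict

# (premise, consequent, support(premise), support(rule), confidence, lift), from Table 1.
TABLE_1 = [
    ("Wrong Reasoning", "Insufficient Knowledge", 0.1204, 0.0972, 0.8077, 2.3576),
    ("Management Problem", "Inadequate Quality Contr.", 0.1019, 0.0880, 0.8636, 1.4574),
    ("Wrong Reasoning", "Inadequate Task Alloc.", 0.1204, 0.1019, 0.8462, 1.4505),
    ("Inadequate Procedure", "Inadequate Task Alloc.", 0.4400, 0.3657, 0.8316, 1.4256),
    ("Missing Info.", "Inadequate Task Alloc.", 0.1991, 0.1620, 0.8140, 1.3953),
    ("Incomplete Info.", "Inadequate Task Alloc.", 0.1389, 0.1111, 0.8000, 1.3714),
    ("Wrong Place", "Inadequate Task Alloc.", 0.2685, 0.2130, 0.7931, 1.3596),
    ("Maintenance Failure", "Inadequate Quality Contr.", 0.3519, 0.2778, 0.7895, 1.3322),
    ("Insufficient Knowledge", "Inadequate Task Alloc.", 0.3426, 0.2639, 0.7703, 1.3205),
    ("Insufficient Knowledge", "Inadequate Quality Contr.", 0.3426, 0.2593, 0.7568, 1.2770),
    ("Insufficient Knowledge", "Design Failure", 0.3426, 0.2731, 0.7973, 1.2390),
    ("Maintenance Failure", "Design Failure", 0.3519, 0.2731, 0.7763, 1.2064),
    ("Insufficient Skills", "Design Failure", 0.3750, 0.2870, 0.7654, 1.1894),
]


def check_rules(rows, tol=0.002):
    """Return rows whose confidence disagrees with support(rule)/support(premise),
    and the consequent supports implied by confidence / lift."""
    mismatches, implied = [], defaultdict(list)
    for premise, consequent, s_prem, s_rule, conf, lift in rows:
        if abs(s_rule / s_prem - conf) > tol:
            mismatches.append((premise, consequent))
        # lift = confidence / P(consequent)  =>  P(consequent) = confidence / lift
        implied[consequent].append(conf / lift)
    return mismatches, dict(implied)


mismatches, implied = check_rules(TABLE_1)
print(mismatches)
for consequent, values in implied.items():
    print(f"{consequent}: {min(values):.4f} .. {max(values):.4f}")
```

On these rows no mismatches are reported, and the implied consequent supports are nearly constant per consequent; the value implied for insufficient knowledge (about 0.3426) matches its premise support in the same table.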
The next step is to take a closer look at association rules with two premise items, as they might contain more interesting relations. A selection is shown in Table 2. These rules do not yield substantially better results and mainly involve the same attributes as the rules with a single premise item.
It may be surprising that equipment failure never appears as a sole premise in the rules, despite being one of the most frequent attributes. This can be interpreted as a sign that an equipment failure alone does not make an accident; rather, it is mostly its combination with further human errors, e.g. insufficient knowledge or maintenance failures, that forms the arrangement causing the accident.
Association Rule Generation on the high hierarchical level
Since Hollnagel's taxonomy allows association rule mining to be performed on a higher level of the hierarchy, the grouped attributes are more likely to reach the required support level, which may enable us to discover additional rules of interest. Rules generated this way should tend to be less specific and might have higher support values, as infrequent items can now enter the rules as part of a larger group.
Table 2. Selection of rules with two items as antecedent and one item as consequent, ordered by lift.
Premise ⇒ Consequent | Support (premise) | Support (rule) | Confid. | Lift
Wrong Place ∧ Inadequate Task Alloc. ⇒ Inadequate Procedure | 0.2130 | 0.1713 | 0.8043 | 1.8288
Wrong Place ∧ Maintenance Failure ⇒ Inadequate Task Alloc. | 0.1157 | 0.1111 | 0.9600 | 1.6457
Wrong Place ∧ Inadequate Procedure ⇒ Inadequate Task Alloc. | 0.1806 | 0.1713 | 0.9487 | 1.6264
Missing Info. ∧ Insufficient Skills ⇒ Inadequate Task Alloc. | 0.1111 | 0.1019 | 0.9167 | 1.5714
Inadequate Procedure ∧ Maintenance Failure ⇒ Inadequate Quality Contr. | 0.1944 | 0.1806 | 0.9286 | 1.5670
Missing Info. ∧ Insufficient Knowledge ⇒ Inadequate Task Alloc. | 0.1019 | 0.0926 | 0.9091 | 1.5584
Maintenance Failure ∧ Inadequate Task Alloc. ⇒ Inadequate Quality Contr. | 0.2315 | 0.2083 | 0.9000 | 1.5188
Inadequate Procedure ∧ Design Failure ⇒ Inadequate Task Alloc. | 0.3056 | 0.2639 | 0.8636 | 1.4805
Inadequate Procedure ∧ Inadequate Quality Contr. ⇒ Inadequate Task Alloc. | 0.2963 | 0.2546 | 0.8594 | 1.4732
Equipment Failure ∧ Insufficient Knowledge ⇒ Inadequate Quality Contr. | 0.1944 | 0.1574 | 0.8095 | 1.3661
Inadequate Procedure ∧ Insufficient Skills ⇒ Inadequate Task Alloc. | 0.2269 | 0.1806 | 0.7959 | 1.3644
Insufficient Skills ∧ Insufficient Knowledge ⇒ Design Failure | 0.1713 | 0.1481 | 0.8649 | 1.3440
Equipment Failure ∧ Maintenance Failure ⇒ Design Failure | 0.2454 | 0.2037 | 0.8302 | 1.2901
The scatter plot of the rules extracted on this grouped level is depicted in Figure 4. It shows a number of association rules with a lift close to one. By the definition of lift, both sides of such rules are nearly independent, so we disregard them. In addition, there are many rules with high lift and high confidence, a few of which even have high support.
Fig. 4. Scattered exemplary Association Rules on grouped items.
Table 3 shows the rules generated on the coarse level. The first two rules indicate a strong association between specific cognitive functions and action, given their very high values of support, lift and confidence. These are the rules with the highest support, lying far in the upper right corner of Figure 4. The following six rules show that further influences on these two very important factors are temporary person-related functions, working conditions, temporary interface problems, and equipment combined with communication issues. These appear as the circles with a lift close to two, so unfortunately they do not give many further insights. The remaining rules indicate problems concerning training. In particular, the last rule, with a premise support of nearly 44%, shows that procedural problems are related to training issues. As expected, the rules found here predominantly reflect the attributes with the highest support.
Table 3. Selection of rules on groups of items, ordered by lift.
Premise ⇒ Consequent | Support (premise) | Support (rule) | Confid. | Lift
Spec. Cogn. Func. ⇒ Action | 0.4676 | 0.4444 | 0.9505 | 1.9010
Action ⇒ Spec. Cogn. Func. | 0.5000 | 0.4444 | 0.8889 | 1.9010
Temp. Pers. Rel. Func. ⇒ Action | 0.1343 | 0.1250 | 0.9310 | 1.8621
Work. Cond. ⇒ Action | 0.1204 | 0.1111 | 0.9231 | 1.8462
Temp. Interface ⇒ Spec. Cogn. Func. | 0.1528 | 0.1204 | 0.7879 | 1.6850
Temp. Interface ⇒ Action | 0.1528 | 0.1250 | 0.8182 | 1.6364
Temp. Pers. Rel. Func. ⇒ Spec. Cogn. Func. | 0.1343 | 0.1019 | 0.7586 | 1.6224
Equipment ∧ Communication ⇒ Action | 0.1620 | 0.1296 | 0.8000 | 1.6000
Procedure ∧ Communication ⇒ Training | 0.1620 | 0.1389 | 0.8571 | 1.5690
Work. Cond. ⇒ Training | 0.1204 | 0.1019 | 0.8462 | 1.5489
Temp. Interface ⇒ Training | 0.1528 | 0.1204 | 0.7879 | 1.4422
Procedure ⇒ Training | 0.4398 | 0.3333 | 0.7579 | 1.3873
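The first two rows of Table 3 share the same lift but differ in confidence. This follows from the definitions: lift is symmetric in premise and consequent, whereas confidence divides by the support of the premise. A small check using the supports from Table 3 (the function name `rule_measures` is hypothetical):

```python
def rule_measures(s_a, s_b, s_ab):
    """Return confidence(A => B), confidence(B => A) and lift from supports."""
    conf_ab = s_ab / s_a          # P(B | A)
    conf_ba = s_ab / s_b          # P(A | B)
    lift = s_ab / (s_a * s_b)     # identical for both directions
    return conf_ab, conf_ba, lift


# Supports from the first two rows of Table 3:
# P(Spec. Cogn. Func.) = 0.4676, P(Action) = 0.5000, P(both) = 0.4444.
conf_ab, conf_ba, lift = rule_measures(0.4676, 0.5000, 0.4444)
print(round(conf_ab, 4), round(conf_ba, 4), round(lift, 4))
```

The computed values agree with the tabulated ones (0.9505, 0.8889 and 1.9010) up to the rounding of the published supports.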
Because association rule mining requires a minimum support, rarely occurring attributes were excluded a priori. Unfortunately, since the data vary widely, these infrequent factors cover a wide variety of the possible attributes. Conversely, analysing accidents caused by only a few attributes might give additional hints on potential driving factors since, for instance, some accidents may be caused solely by these rarely occurring attributes. Consequently, we take a detailed look at the accidents with only a few contributing factors, as those factors might be especially important.
Analysis of accidents with few contributing factors
From now on, we focus on the 58 accidents caused by three or fewer attributes. As they were hardly represented in the association rule mining, we examine them here in more detail.
Table 4. Few-point failures, ordered by the fraction of cases in which each attribute occurs.
One reason (14 accidents) | Ratio | Two reasons (25 accidents) | Ratio | Three reasons (19 accidents) | Ratio
Equip. Failure | 9/14 | Equip. Failure | 14/25 | Equip. Failure | 13/19
Inadeq. Task Alloc. | 2/14 | Design Failure | 11/25 | Inadeq. Qual. Contr. | 10/19
Inadeq. Qual. Contr. | 1/14 | Inadeq. Qual. Contr. | 5/25 | Design Failure | 8/19
Design Failure | 1/14 | Inadeq. Task Alloc. | 5/25 | Inadeq. Task Alloc. | 6/19
Adverse Ambient Cond. | 1/14 | Maint. Failure | 5/25 | Inadeq. Procedure | 3/19
 | | Adverse Ambient Cond. | 3/25 | Insuff. Knowl. | 3/19
 | | Inadeq. Procedure | 3/25 | Adverse Ambient Cond. | 2/19
 | | Insuff. Skills | 2/25 | Maint. Failure | 2/19
 | | Wrong Place | 1/25 | Missing Info. | 2/19
 | | Physio. Stress | 1/25 | Wrong Place | 1/19
 | | | | Insuff. Skills | 1/19
 | | | | Wrong Type | 1/19
 | | | | Priority Error | 1/19
 | | | | Decision Error | 1/19
 | | | | Observ. Missed | 1/19
 | | | | Mgmt. Problem | 1/19
 | | | | Temperature | 1/19
Table 4 illustrates that equipment, design and maintenance failures, as well as inadequate task allocation and quality control, are again the attributes contributing most to accidents caused by fewer than four attributes.
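The per-group ratios of Table 4 can be obtained by grouping accidents by their number of contributing factors and counting attribute occurrences within each group. A minimal sketch with hypothetical toy records and a hypothetical function name; the order of equally frequent attributes is not specified in the table and is arbitrary here.

```python
from collections import Counter


def few_point_failures(records, max_factors=3):
    """For k = 1..max_factors, list (attribute, "count/group size") among accidents
    with exactly k contributing factors, most frequent first."""
    tables = {}
    for k in range(1, max_factors + 1):
        group = [r for r in records if len(r) == k]
        counts = Counter(a for r in group for a in r)
        tables[k] = [(a, f"{n}/{len(group)}") for a, n in counts.most_common()]
    return tables


# Hypothetical toy records: one set of contributing factors per accident.
records = [
    {"equipment failure"},
    {"equipment failure"},
    {"design failure"},
    {"equipment failure", "design failure"},
    {"equipment failure", "design failure", "wrong place", "missing information"},
]
tables = few_point_failures(records)
for k, rows in tables.items():
    print(k, rows)
```

The accident with four factors falls outside every group, mirroring how only accidents with at most three contributing attributes enter Table 4.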
Looking again at the creation process of the data, we found that for all accidents with only one contributing factor, the data source was a report from an insurance company. For their purposes, one reason might suffice, especially a strong and clear one such as an equipment failure. The purpose of the data generation (or the objective of the search for accident causes) is probably somewhat different for them: while in most cases the disclosure of all circumstances of an accident is desirable, for insurance companies the report may be constructed around the question of whether they have to pay or not. If this question is clearly answered by a major contributing cause stated in the report, there is no need to spend more money on extensive investigations. As in many data analysis tasks, the quality of the results is strongly influenced by the characteristics of the input data.
6. Conclusions
In this work, we analysed the MATA-D dataset. In contrast to earlier work, we applied association rule mining instead of clustering in order to gain further insights. Moreover, further investigations were performed for the accidents with fewer than four contributing factors, as they were not well represented by association rule mining.
On the coarse level, we first found strong associations between failures in action and in specific cognitive functions. Both were also associated with several temporary and person-related errors. As these are mostly man-made faults, we recommend that stakeholders assign crucial tasks to teams rather than to single persons, providing supervision to minimise the negative effects of failures in specific cognitive functions. Working as a team should also reduce the temporary person-related problems found, although regular checks on the workers' condition (e.g. whether they are subject to fatigue or stress) are also desirable. Second, we found training issues to be related to procedural and temporary problems. We recommend an assessment of the training on critical procedures, in which the people involved can gain practice and experience. It would also be appropriate to develop means of feedback on complicated processes that require a higher level of cognition, and on design shortcomings, in order to simplify them. On the fine level, we found wrong reasoning to be associated with inadequate task allocation and insufficient knowledge, and insufficient knowledge in turn with design failures. These issues could also be addressed by the previous recommendations.
Future analyses aimed at identifying the statistical drivers of accidents could also include data on near accidents.
Final Remarks
Raphael Moura’s contribution to this work has been partially funded by CAPES (Proc.
no. 5959/13-6).
References
1. Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. Mining association rules between sets of items in large databases. In ACM SIGMOD Record, volume 22, pages 207–216. ACM, 1993.
2. CE Billings, JK Lauber, H Funkhouser, EG Lyman, and EM Huff. NASA Aviation Safety Reporting System. 1976.
3. J Roger Bray and John T Curtis. An ordination of the upland forest communities of southern Wisconsin. Ecological Monographs, 27(4):325–349, 1957.
4. Erik Hollnagel. Cognitive Reliability and Error Analysis Method (CREAM). Oxford, UK: Elsevier Science Ltd, 1998.
5. Leonard Kaufman and Peter J Rousseeuw. Finding groups in data: an introduction to cluster analysis, volume 344. John Wiley & Sons, 2009.
6. Teuvo Kohonen. Self-Organizing Maps. Springer-Verlag New York, Inc., Secaucus, NJ, 2001.
7. R Moura, M Beer, E Patelli, J Lewis, and F Knoll. Human error analysis: Review of past accidents and implications for improving robustness of system design. In T. Nowakowski et al., editor, Proceedings of the 24th European Safety and Reliability Conference: 14-18 September 2014, Wroclaw, Poland, pages 1037–1046, London, 2014. Taylor & Francis Group.
8. Raphael Moura, Michael Beer, John Lewis, and Edoardo Patelli. Learning from accidents:
analysis and representation of human errors in multi-attribute events. In T. Haukaas,
editor, Proceedings of the 12th International Conference on Applications of Statistics and
Probability in Civil Engineering (ICASP12), Vancouver, Canada, July 12-15 2015.
9. Raphael Moura, Christoph Doell, Michael Beer, and Rudolf Kruse. A clustering approach
to a major-accident data set: Analysis of key interactions to minimise human errors. To be
published in SSCI-CIES, 2015.
10. Scott Shappell, Cristy Detwiler, Kali Holcomb, Carla Hackworth, Albert Boquet, and
Douglas A Wiegmann. Human error and commercial aviation accidents: an analysis using
the human factors analysis and classification system. Human Factors: The Journal of the
Human Factors and Ergonomics Society, 49(2):227–242, 2007.
11. JC Williams. HEART - a proposed method for assessing and reducing human error. In 9th Advances in Reliability Technology Symposium, University of Bradford, 1986.