Learning and predicting operation strategies by
sequence mining and deep learning
Gyula Dorgo and Janos Abonyi∗
MTA-PE Lendulet Complex Systems Monitoring Research Group, Department of Process
Engineering, University of Pannonia, Egyetem str. 10, Veszprém, H-8200, Hungary
Abstract
The operators of chemical technologies are frequently faced with the problem
of determining optimal interventions. Our aim is to develop data-driven models
by exploring the consequential relationships in the alarm and event-log databases
of industrial systems. Our motivation is twofold: 1) to facilitate the work of the
operators by predicting future events and 2) to analyse how consistent the event
series is. The core idea is that machine learning algorithms can learn sequences
of events by exploring connected events in databases. First, frequent sequence
mining applications are utilised to determine how the event sequences evolve
during operation. Second, a sequence-to-sequence deep learning model is
proposed for their prediction. The long short-term memory (LSTM) unit-based
model is capable of evaluating rare operational situations and their consequential
events. The performance of this methodology is presented with regard to
the analysis of the alarm and event-log database of an industrial delayed coker
unit.
Keywords: alarm management, data mining, data preprocessing, deep
learning, LSTM
2010 MSC: 00-01, 99-00
∗Corresponding author
Preprint submitted to Journal of LaTeX Templates, June 12, 2019
1. Introduction
In modern, complex chemical production plants with an increasing
number of interlinked process units, process operators rely more and
more on automation systems to optimize production and extract important
control-related information. A crucial part of this control system is the alarm
and warning system, which alerts the operators in the event of any abnormalities.
According to alarm management guidelines, a single malfunction in
the production process should generate a single, actionable alarm message;
however, due to poor alarm management practices, faults often result in long
sequences of alarms and eventually suboptimal control of the system [1]. As
for highly complex event series, the task of the operators is to interpret the
load of alarms, identify problematic situations, determine the true root cause
of such events, qualify the effect of possible interventions and interact with the
process. Early intervention can minimize any losses resulting from malfunctions
[2]. Therefore, the forecast of discrete events is crucial as far as the handling of
critical situations is concerned [3]. The performance of the proposed data-driven
models and the goal-oriented preprocessing of the datasets is presented in terms of the
analysis of the alarm and event-log database of an industrial delayed coker unit.
In the present paper, the tasks and possibilities related to the analysis of industrial
alarm and event-log databases, aimed at assisting the work of the operators and
identifying operational practices and patterns, are defined. First of all, by analysing
the alarm patterns that occur, the spillover effect of malfunctions can be revealed,
and the hierarchical origin of alarm messages helps to determine the problematic
parts of the system that facilitate the spillover effect of a malfunction [4].
On the other hand, analysis of the operator actions enables frequently applied
operation strategies to be explored and the consistency of the work of the operators
to be determined, which in turn supports the definition of automation
logics [5]. Finally, the simultaneous analysis of alarms and operator actions
supports the exploration of the causal relationship between them in both directions:
as the number of alarm messages is usually significantly higher than the number
of operator actions, it can be determined what kind of alarm messages require
operator interventions, how the chosen action can be predicted and, moreover,
what the effect of the selected operator action on the alarm messages is. Two
different approaches are proposed, namely frequent sequence mining and deep
learning, for the exploration of these operational patterns and the quantification of
consequential relationships.
2. Motivation & contributions
The motivation of our methodology is based on the assumption that predictable
alarms do not contain any novel information: the state of the process
can already be characterized based on the previously registered signals, and this
information is sufficient to determine what operator actions are required in the
given situation. Therefore, what is needed to facilitate the
work of the operators are models that can extract useful information from past
sequences of events and support the prediction of future states of the technology.
As can be seen in the literature, sequence learning is a well-researched
and widespread technique in the data mining and machine learning
communities. Furthermore, the formulation of alarm management as a learning
problem is also not new; the publications of Shah, Chen et al. (e.g. [6]) and
Zhao et al. (e.g. [7]) provide a detailed picture of the existing approaches from
the past decades. However, to our knowledge, this is the first article
that connects these approaches and applies a deep learning model to the
learning of operational event sequences in order to solve industrial alarm management
problems and explore the potential of such techniques. Moreover, special emphasis
is given to the goal-oriented preprocessing of the alarm and event-log databases
for different analysis purposes.
Based on this, the contributions of the present paper are the following:
• The presented goal-oriented cleaning and restructuring of industrial alarm
and event-log databases make them suitable for the application of advanced data
mining and machine learning techniques. Since only the operational segments
in which the operators interact with the process are of interest, meaningful
events are extracted and event sequences containing them are defined for
further analysis. This contribution is presented in Section 4.1.
• Following the description of the applied frequent sequence mining algorithms
(Section 4.2), the probabilistic interpretation of the sequences is
presented in depth in Section 4.3.
• In Section 5.1, a novel methodology to predict events is proposed based
on the frequent sequences that occur during production, together with a
novel network-based visualisation technique.
• A sequence-to-sequence deep learning model for the prediction of operational
events is proposed. Given that the performance of such prediction
models can only be evaluated in light of their possible application,
novel set-based similarity and edit distance-based performance metrics
have been defined. These contributions are presented in Section 5.2.
The listed contributions are made following the review of the literature
in Section 3. Afterwards, as a proof of concept, the effectiveness of the described
methodology is evaluated in terms of the analysis of the alarm and event-log
database of an industrial delayed coker unit. The cleaning
process, the distribution of the events in the cleaned database and the resultant
sequences are described in Section 6.1. Since the future construction of a decision
support or automation solution is explicitly of interest, the frequent sequences
that start with alarm messages and end with operator actions are analysed
in Section 6.2. The result of this frequent sequence mining-based analysis
shows that the operator interventions can hardly be linked to the antecedent
alarm messages. Finally, how frequent sequences and sequence-to-sequence
deep learning models can be applied for the prediction of future events is discussed
in Sections 6.3 and 6.4, respectively.
The steps of the methodology are presented in Figure 1. First, the alarm
and event-log database is cleaned and restructured and the event sequences
for further analysis are defined. For the analysis of frequently occurring
operational sequences, frequent sequence mining-based applications are
utilized. In the final step, a deep learning-based sequence-to-sequence prediction
model is described and its performance evaluated in light of the results of
the frequent sequence mining algorithms.
Figure 1: The workflow of learning statistically significant operational sequences from the
alarm and event-log database of chemical production sites.
To stimulate further research, the resultant MATLAB and Python codes
of the proposed cleaning, sequence mining and deep learning algorithms are
publicly available on the website of the authors (www.abonyilab.com).
3. Related Work and Background
As the information inefficiency of the occurring alarms significantly overloads
the operators and reduces operator effectiveness [8], several, mainly conventional,
techniques have been utilised to reduce the number of alarms, such as alarm limit
deadbands [9], delay-timers [10] and filtering [11]. In the case of more advanced
techniques, due to the high number of interacting components at modern and
complex production sites, it is accepted that the generation of redundant and
co-occurring alarms is almost inevitable [12], and our aim is to detect and group
these redundant signals to reduce the information noise. Industrial
alarm and event-log databases support the detection of such operational rules.
As Hu et al. [13] and Dorgo et al. [5] highlighted in their previous works,
longer sequences of industrial alarm messages cannot be handled and analysed
by themselves, since the periodic interventions made by the operators constantly
change the underlying processes and thus the evolution path of the alarm
messages. However, these human factors may still quantify the effectiveness of
alarm management systems [14].
Several studies in the literature discuss the detection
of frequently occurring patterns in alarm data [6, 15, 16, 17, 18, 19, 20]. While
the works [16, 17, 18, 19] focus only on the alarm data of telecommunication
networks, the present study explicitly focuses on the alarm data of
industrial-scale chemical processes, as found in [20, 6]. Lai and Chen focus on the
mining of alarm flood datasets [15]. Moreover, only the works [15, 17, 18, 19]
seek to identify sequential patterns; the works [16] and [20] search for correlated
alarm tags, while Hu et al. only generate itemset patterns, where order and
correlations are not taken into consideration [6]. Formerly, a solution for the
mining of multi-temporal sequential alarm patterns was published [21]; this
algorithm was improved in [4] and is able to generate frequently occurring patterns
in large temporal event databases and determine not just the sequential
order of the events inside a sequence but their temporal relationships as well.
Even though recurrent neural networks (RNNs) are capable of learning the
probability distribution of a sequence using their internal states, long-term
dependencies significantly hinder the training of a traditional RNN [22]. Long
Short-Term Memory (LSTM) units were explicitly designed to learn long-range
dependencies over sequences [23]. In the present study, our aim is to draw up a
sequence-to-sequence prediction model. In order to ensure our model is capable
of handling input and output sequences of different lengths, without knowing
the distribution of sequence lengths or the relationship between the lengths of the
input and output sequences prior to the analysis, the most advanced sequence-to-sequence
deep learning neural network structure suitable for our concept was
selected [24]. The basic structure of the proposed model was presented in [25].
The prediction of discrete events, and especially alarms, is a highly researched
topic in the chemical industry. Without any claim of completeness, Zhu et al.
[26] proposed a dynamic alarm prediction methodology using an n-gram model,
while Lai et al. introduced a pattern matching-based alarm flood prediction methodology
[27]. In our previous works, we discussed how individual events can be predicted using
frequent pattern mining and Bayesian probabilistic measures [21]
and introduced the basic principles of sequence-to-sequence deep learning structures,
which are explained and elaborated on in detail in the present article [25].
This paper is an extension of our previous conference publication [25] with
an analysis of the possibilities of event prediction on real industrial datasets,
comparing the performance of the frequent sequence mining and deep learning
algorithms for event prediction, together with the introduction of a novel
visualisation technique for frequent sequence-based event prediction and novel
performance metrics for the evaluation of sequence-to-sequence deep learning
models.
4. Operational sequences of alarm and event-log databases
In the present section, first, the definition of the analysed events and their
organisation into event traces are presented in Section 4.1. Then the definitions
and terminology of frequent sequence mining are provided in Section 4.2.
The section concludes with a description of the probabilistic interpretation of
frequent sequences in Section 4.3.
4.1. Determination of operational sequences
The alarms and operator actions recorded in the alarm and event-log database
of chemical technologies can be treated as states of the technology. Theoretically,
each state (denoted by s) is either an alarm message or an operator action
from the set of alarm messages A = {a1, a2, ..., am} or the set of operator actions
O = {o1, o2, ..., on}. The time interval in which the given state occurs in the
technology is referred to as an event, denoted by ei and represented by a triplet
<s, sti, eti>, where sti denotes the start and eti the end time of the event;
the state of the technology is represented by either an alarm message
or an operator action (s ∈ S = {A, O}). It should be highlighted that each
state can occur in multiple events. Given that nowadays alarm management
systems do not assume that each alarm should be actionable, a methodology
for the reconstruction of alarm messages and operator actions of alarm and
event-log databases is briefly introduced, focusing on the primary goal of the
present study, namely to reveal the causal connections between alarm messages
(or certain situations represented by alarm sequences) and the operator actions
applied.
Fundamentally contrary to the assumption that each alarm should be actionable,
the number of operator actions in the preprocessed alarm and event-log
database is still significantly lower than the number of alarm messages. As the
main concern of our investigation is to reveal the antecedent alarm messages
that lead to specific operator actions, the notion of event sequences is defined,
namely the segments of operational periods in which the operators interact with
the process. A schematic representation of the definition of event sequences is
presented in Figure 2.
Figure 2: The segmentation of alarm and event-log databases for determining the causality
between alarm messages and operator actions. Time is represented by the horizontal axis,
while the operator actions and alarm messages are illustrated by the yellow and blue bars,
respectively.
The horizontal axis represents time, while the yellow and blue bars denote
the operator actions and alarm messages, respectively. The horizontal
length of the bars is proportional to the temporal length of the events: the
duration of alarm messages can be lengthy, while operator interventions are
instantaneous. The core idea of the segmentation process is based on the cognitive
model of the operators. Operator actions that follow each other more
closely than a defined operator action series window (dwo) are assumed to be
conducted based on the same intuition of the operator and are grouped into
an operator action series. This aggregation of operator actions facilitates
the analysis of the antecedent events of operator decisions. Since the determination
of what alarms lead to certain types of operator interventions is desirable,
a sequence window was defined before (dwab) and after (dwaa) the operator
action series based on the time constants of the processes. Only the alarms
that start before the operator action series within the dwab time window and end
after the start of the operator action series within the time window dwaa are taken
into consideration in the subsequent part of the analysis. Therefore, any alarm
message that did not require operator interaction is deleted from the database,
in a similar fashion to the alarm message marked by the top horizontal blue
bar in Figure 2. Hence, a sequence is a chronologically ordered list of states
Φ := s1 ⇒ s2 ⇒ ... ⇒ sn and consists of the alarms that precede the operator
action series (and those that end after it) and the operator action
series itself. The sequence window should cover the time needed for a fault to
develop and trigger the related alarm messages in order to capture the co-occurring
relationship with the consequential operator actions.
Using the described method, the industrial alarm and event-log database
is segmented into sequences. An exemplary sequence database is presented in
Table 1 with six sequences and the related sequence IDs (SIDs). The generated
sequence database is applied for the determination of frequent sequences as well
as closed maximal sequences and for the mining of sequential rules.
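The segmentation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Event` container, the gap test between action start times and the exact boundary conditions of the dwab/dwaa windows are our reading of the text.

```python
from dataclasses import dataclass

@dataclass
class Event:
    state: str    # an alarm ("a...") or an operator action ("o...") tag
    start: float  # st_i, the start time of the event
    end: float    # et_i, the end time of the event

def segment(events, dwo, dwab, dwaa):
    """Segment an event log into sequences: group operator actions that
    follow each other within dwo into series, then attach the alarms that
    start within dwab before a series and end after its start
    (no later than dwaa after it)."""
    actions = sorted([e for e in events if e.state.startswith("o")],
                     key=lambda e: e.start)
    alarms = [e for e in events if e.state.startswith("a")]
    # 1) operator action series: split where the gap between starts > dwo
    series, current = [], []
    for act in actions:
        if current and act.start - current[-1].start > dwo:
            series.append(current)
            current = []
        current.append(act)
    if current:
        series.append(current)
    # 2) attach the antecedent alarms and order each sequence in time
    sequences = []
    for s in series:
        t0 = s[0].start  # start of the operator action series
        related = [a for a in alarms
                   if t0 - dwab <= a.start <= t0 and t0 <= a.end <= t0 + dwaa]
        sequences.append([e.state for e in sorted(related + s,
                                                  key=lambda e: e.start)])
    return sequences
```

An alarm that neither precedes nor overlaps any operator action series (such as the one marked by the top blue bar in Figure 2) is simply dropped from the resulting sequences.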
4.2. Mining frequent sequences from the alarm and action sequences
In the following, the applied sequence mining-related definitions are explained
using simple didactic examples to facilitate understanding and the definition
of the problem.

Table 1: Exemplary sequences of alarm messages and operator actions

SID  States of the seq.
1    a1 ⇒ a2 ⇒ a4 ⇒ o1 ⇒ o2
2    a1 ⇒ a2 ⇒ o1
3    a2 ⇒ a3 ⇒ o3
4    a5 ⇒ o1
5    a1 ⇒ a2 ⇒ a4
6    a1 ⇒ a2 ⇒ a4 ⇒ o1
A sequence ΦA := sA,1 ⇒ sA,2 ⇒ ... ⇒ sA,n is said to occur in another
sequence ΦB := sB,1 ⇒ sB,2 ⇒ ... ⇒ sB,m if the states of ΦA are found in
the same order of occurrence in the sequence ΦB; however, they do not have
to follow each other in a strictly consecutive fashion (denoted as ΦA ⊑ ΦB). In
the present context, frequently occurring operational patterns represented by
alarm messages and operator actions within longer sequences are sought.
It is necessary to quantify the frequency of an operational sequence, and
for this purpose the support of a sequence is defined, a measure scaled
between 0 and 1 that characterises the frequency of certain alarm messages
and describes how frequently operator actions are applied during production.
The support of a sequence ΦA is defined as the number of sequences
in which the given sequence occurs (ΦA ⊑ Φ) divided by the total number of
sequences in the analysed database and is denoted by support(ΦA). As a result,
a sequence Φ is said to be frequent if support(Φ) ≥ minSupp, where minSupp
is a predefined threshold set by the user. The aim of frequent sequence mining
is to identify all the frequent sequences in the analysed database.
A sequence ΦA is referred to as closed if no other sequence ΦB exists
such that ΦA ⊑ ΦB and their supports are equal. A sequence ΦA is maximal if
it is not strictly included in another closed sequence (ΦA ⊑ ΦB). The maximal
closed sequences can concisely represent longer operational sequences, and any
of their subsequences are frequent as well.
Consider the exemplary sequences presented in Table 1 when minSupp = 0.5
(therefore, a frequent sequence has to be present in at least three out of the six
sequences of the sequence database in Table 1). The frequent sequences are
presented in Table 2 together with an assigned frequent sequence ID (FSID).
The sequences a2 and o1 are closed sequences (they are not present in any other
sequences with the same support values), while the sequences a1 ⇒ a2 ⇒ o1
and a1 ⇒ a2 ⇒ a4 are closed maximal sequences.
Table 2: Frequent sequences generated from the exemplary sequences of Table 1

FSID  Sequence         Support
1     a1               2/3
2     a2               5/6
3     a4               1/2
4     o1               2/3
5     a1 ⇒ a2          2/3
6     a1 ⇒ a2 ⇒ o1     1/2
7     a1 ⇒ a2 ⇒ a4     1/2
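The support values in Table 2 can be checked with a brute-force miner. The sketch below enumerates candidate subsequences directly; it is purely illustrative (CM-SPAM and VMSP obtain the same supports with far more efficient candidate pruning), and the helper names are our own.

```python
from itertools import combinations
from fractions import Fraction

def occurs_in(pattern, sequence):
    """Pattern occurs in sequence: same order, gaps allowed (ΦA ⊑ ΦB)."""
    it = iter(sequence)  # membership tests consume the iterator in order
    return all(state in it for state in pattern)

def support(pattern, database):
    """Fraction of database sequences that contain the pattern."""
    return Fraction(sum(occurs_in(pattern, s) for s in database),
                    len(database))

def frequent_sequences(database, min_supp):
    """All subsequences of the database sequences with support >= min_supp."""
    candidates = {sub for seq in database
                  for k in range(1, len(seq) + 1)
                  for sub in combinations(seq, k)}
    return {sub: support(sub, database) for sub in candidates
            if support(sub, database) >= min_supp}

# the sequence database of Table 1
db = [("a1", "a2", "a4", "o1", "o2"), ("a1", "a2", "o1"),
      ("a2", "a3", "o3"), ("a5", "o1"),
      ("a1", "a2", "a4"), ("a1", "a2", "a4", "o1")]
freq = frequent_sequences(db, Fraction(1, 2))
```

For instance, `freq[("a1", "a2", "o1")]` evaluates to 1/2, matching FSID 6 in Table 2.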
A sequence consisting of k + 1 states is referred to as a k-length sequence
(the number of transitions between states is k) or a (k + 1)-state sequence (the
number of states is k + 1) and is denoted by Φk.
According to this:

• a k = 0-length pattern is Φ0 := s0, s0 ∈ S (this trivial pattern with
only one state, e.g. "high column top temperature", is referred to as a
degenerated sequence);

• a k = 1-length pattern is formulated as Φ1 := (Φ0 ⇒ s1) := (s0 ⇒ s1);

• in general, a sequence of length k ≥ 2 is formulated as Φk :=
(Φk−1 ⇒ sk) := (s0 ⇒ s1 ⇒ s2 ⇒ ... ⇒ sk).
Therefore, the states of the technology are represented by the alarm messages
and operator actions as S = {A, O}, and the events that occur are organised
into sequences as described previously in Section 4.1. Our aim is to extract
useful information from these sequences in the form of frequently occurring
sequences and closed maximal sequences.
In the present paper, the frequent sequences were determined by the CM-SPAM
algorithm [28], while the closed maximal sequences were mined by the
VMSP algorithm [29]. Both algorithms apply the vertical database format,
which provides the advantage of generating patterns without performing costly
database scans and allows the algorithms to perform better on datasets with long
sequences. Moreover, the algorithms utilise a compact data structure, named
the Co-occurrence MAP (CMAP), to store the item co-occurrence information
and thereby effectively prune a large number of candidates while generating the
resultant frequent sequences. We have tried to apply the most suitable algorithms
for the mining of the sequence database; however, the chosen algorithm
does not influence the sequences found, only the speed and efficiency of the
mining.
4.3. Probabilistic measures of sequential rule mining
Lengthier causal rules can be defined if the occurrences of different
states are not independent; therefore, the probability of the occurrence of the
sequence Φk, P(Φk) := P(s0 ⇒ s1 ⇒ s2 ⇒ ... ⇒ sk), can be calculated by the chain
rule as follows:

P(Φk) = P(s1|s0) × P(s2|s0 ⇒ s1) × ... × P(sk|s0 ⇒ s1 ⇒ s2 ⇒ ... ⇒ sk−1)    (1)
Therefore, according to the chain rule, the probability of the occurrence of a
k-length sequence can be calculated from the probability of the occurrence of the
sequence that is one state shorter, namely the (k−1)-length sequence, should the
conditional probability P(sk|Φk−1) be known (where Φk−1 is the sub-pattern of
Φk, which can be unfolded as Φk−1 ⇒ sk):

P(sk|Φk−1) = P(Φk) / P(Φk−1) = supp(Φk) / supp(Φk−1)    (2)
It should be noted that this probability reflects how confident the next state
of the sequence is, knowing the previous k−1 states. Therefore, a confidence
measure can be defined:

conf(Φk) = supp(Φk) / supp(Φk−1) × conf(Φk−1),  if k > 0
conf(Φk) = 1,                                    if k = 0    (3)
As a result, two important probabilistic measures have been defined, both
of them scaled between 0 and 1. While the support measures the frequency
of the occurrence of a sequence of alarm messages and operator actions, the
confidence describes how reliable the given sequential rule is. It is important
to highlight that conf(Φk) describes the confidence of the sequence Φk, while
the confidence of the transition between states or longer sequences is denoted
as conf(Φk−1 ⇒ sk) = P(sk|Φk−1); thus the two quantities are not equal.
Similarly, the confidence of the transition between longer sequences can be
interpreted as well. A sequence Φk can be divided into an antecedent and a
consequential part as Φk = Φ′ ⇒ Φ′′, where Φ′ and Φ′′ are sequential sub-patterns
of Φk. The confidence of the transition between the antecedent (first
part of the temporal sequence, Φ′) and consequent sequences (second part of
the temporal sequence, Φ′′) can be expressed mathematically as conf(Φ′ ⇒ Φ′′)
and is basically the proportion of Φ′ sequential rules that continue as described
in Φ′′.
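Under the definitions above, the product in Eq. (3) telescopes, so conf(Φk) reduces to supp(Φk)/supp(Φ0). A small sketch over the Table 1 database illustrates both measures (the helper names are ours, not the authors'):

```python
from fractions import Fraction

def occurs_in(pattern, sequence):
    """Subsequence test with gaps allowed (ΦA ⊑ ΦB)."""
    it = iter(sequence)
    return all(state in it for state in pattern)

def supp(pattern, db):
    """Support: fraction of database sequences containing the pattern."""
    return Fraction(sum(occurs_in(pattern, s) for s in db), len(db))

def conf(pattern, db):
    """Eq. (3): conf(Φ0) = 1; conf(Φk) = supp(Φk)/supp(Φk-1) * conf(Φk-1)."""
    if len(pattern) <= 1:
        return Fraction(1)
    return supp(pattern, db) / supp(pattern[:-1], db) * conf(pattern[:-1], db)

def conf_transition(antecedent, consequent, db):
    """conf(Φ' ⇒ Φ'') = supp(Φ' ⇒ Φ'') / supp(Φ')."""
    return supp(antecedent + consequent, db) / supp(antecedent, db)

# the sequence database of Table 1
db = [("a1", "a2", "a4", "o1", "o2"), ("a1", "a2", "o1"),
      ("a2", "a3", "o3"), ("a5", "o1"),
      ("a1", "a2", "a4"), ("a1", "a2", "a4", "o1")]
```

For example, conf(a1 ⇒ a2 ⇒ o1) = (1/2)/(2/3) × (2/3)/(2/3) = 3/4, which here coincides with the transition confidence conf(a1 ⇒ a2 ⇒ o1 | a1 ⇒ a2) because supp(a1) = supp(a1 ⇒ a2).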
5. Prediction of future operational sequences300
It is supposed that the sequence of events Φ′ (present and past alarm messages
as well as operator actions) defines the state of the process, and it is
assumed that based on this information the sequence of future events in Φ′ ⇒ Φ′′
can be predicted. Therefore, models Φ′′ = f(Φ′) that efficiently handle this
sequence-to-sequence modelling problem are sought. First, it is shown how
frequent sequences assist in the construction of such models (Section 5.1) and
then a deep learning-based sequence-to-sequence learning model is proposed
(Section 5.2).
5.1. Prediction using frequent operational sequences
The core concept of our work is that the state of a chemical technology can
be represented by the previously registered events (alarm messages and operator
actions) and that this information is sufficient to predict future scenarios of the
operation or recommend operator interventions to optimally control production.
For example, given the presence of a k-length event sequence Φk in our technology,
it is possible to compare it with the previously generated frequent
sequences and search for the frequent sequences that contain the events that have
occurred (Φk ⊑ ΦFSID, where the frequent sequence is identified by its frequent sequence
ID). Among the frequent sequences identified, the one with the highest confidence
describes the most probable state of the process. The basis of the prediction
is then to split the frequent sequence into an antecedent part that contains the
past events (the events in Φk) and a consequential part, as ΦFSID = Φ′ ⇒ Φ′′.
The probability of the occurrence of the predicted future state is provided by the
confidence conf(Φ′ ⇒ Φ′′). Therefore, it is possible to make a prediction based
solely on the frequent sequences; in other words, only the frequently occurring
sequences can be used to predict future sequences with a relatively high
level of confidence. This "disadvantage" of the method can also be considered an
advantage, since the analysis of the frequent sequences provides information on
the predictability of the event sequence and the consistency of the operation.
Following this concept, the network presented in Figure 3 provides an
overview of the information revealed by the frequent sequences. The nodes
represent the frequent sequences of Table 2 in the form of their FSIDs, while the
directed edges show the possible continuations of the sequences. For
example, the sequence with FSID 1 (a1) can continue with state a2, forming
the sequence with FSID 5 (a1 ⇒ a2). From here, the alarm sequence can either end
in an operator action, forming the sequence FSID 6 (a1 ⇒ a2 ⇒ o1), or, if the
operators do not intervene in the process, alarm a4 might sound and the sequence
FSID 7 is formed (a1 ⇒ a2 ⇒ a4). The confidence of the transition between
the related sequences can be assigned to the edges of the network to show how
confident certain scenarios are.
Figure 3: The network of the frequent sequences of Table 2. The nodes represent a frequently
occurring operational sequence according to its frequent sequence ID (FSID), while the di-
rected edges show the possible continuation of the sequence. This representation provides
an easily interpretable visualisation for the analysis of the frequently occurring operational
pathways.
This representation shows how the online prediction process using frequent
sequences can be conducted. The incoming events form a sequence which represents
the current state of the technology. Although this sequence most probably
does not itself form a frequent sequence, the most similar frequent sequence, which
contains at least a part of the sequence of incoming events in the same order,
can usually be identified and used to determine the future operational pathway
with the highest confidence. Therefore, assuming a network similar to that
in Figure 3, first the node that represents the present state of the process most
accurately is determined, and then the future path with the highest confidence
is identified.
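This online matching step can be sketched as follows. For simplicity, the observed events are matched as a prefix of the stored frequent sequences (the paper's ⊑ matching is more general), and all names are our own:

```python
from fractions import Fraction

def occurs_in(pattern, sequence):
    """Subsequence test with gaps allowed (ΦA ⊑ ΦB)."""
    it = iter(sequence)
    return all(state in it for state in pattern)

def supp(pattern, db):
    """Support: fraction of database sequences containing the pattern."""
    return Fraction(sum(occurs_in(pattern, s) for s in db), len(db))

def predict(observed, frequent, db):
    """Return the continuation Φ'' of the frequent sequence Φ' ⇒ Φ''
    with the highest transition confidence supp(Φ)/supp(Φ')."""
    best, best_conf = None, Fraction(0)
    for fs in frequent:
        if len(fs) > len(observed) and fs[:len(observed)] == observed:
            c = supp(fs, db) / supp(observed, db)
            if c > best_conf:
                best, best_conf = fs[len(observed):], c
    return best, best_conf

# the sequence database of Table 1 and the frequent sequences of Table 2
db = [("a1", "a2", "a4", "o1", "o2"), ("a1", "a2", "o1"),
      ("a2", "a3", "o3"), ("a5", "o1"),
      ("a1", "a2", "a4"), ("a1", "a2", "a4", "o1")]
frequent = [("a1",), ("a2",), ("a4",), ("o1",), ("a1", "a2"),
            ("a1", "a2", "o1"), ("a1", "a2", "a4")]
continuation, c = predict(("a1",), frequent, db)
```

Having observed a1, the most confident continuation is a2 (FSID 1 to FSID 5 in Figure 3), since every database sequence containing a1 also continues with a2.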
Even though this process can be applied with a satisfactory level of prediction
accuracy in many cases, operational situations exist that have not occurred
in the past or can be considered a mixture of previous sequences.
Sequence-to-sequence deep learning models aim to reveal such connections
between the sequences that occur. In the following, how such a deep learning
model can be constructed and applied to predict future operational sequences
is described.
5.2. Encoder-decoder deep recurrent neural network-based sequence prediction
The most advanced approach to sequence-to-sequence learning is based on
deep learning neural networks. The construction and application of this recurrent
neural network-based model are not trivial, so in the following the proposed
goal-oriented model structure is introduced (depicted in Figure 4).
The input of the model: Figure 4 highlights the structure of the input
sequences. Firstly, an end-of-sequence (EOS) tag is appended to the sequence to
indicate the end of the event series. Secondly, in order to ensure fixed sequence
lengths (which is required by the model), the sequences are extended to
the length of the longest sequence by adding padding symbols (PAD) after the
EOS tag. Both the EOS and PAD tags are simply added to the end of the
sequences and handled similarly to the alarm messages and operator actions in
the subsequent steps. Finally, the order of the events in the input sequence is
reversed, since Sutskever et al. [24] determined that the prediction accuracy
significantly improves when the beginning of the input sequence is "closer" to
the beginning of the predicted sequence.
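The input preparation can be sketched in a few lines, applying the three steps in the order listed above (EOS, then PAD, then reversal, so the padding ends up at the front of the reversed sequence; this is one possible reading of the preprocessing):

```python
def prepare_input(seq, max_len):
    """Append EOS, pad to a fixed length, then reverse the sequence."""
    s = list(seq) + ["EOS"]            # 1) end-of-sequence tag
    s += ["PAD"] * (max_len - len(s))  # 2) fixed length via padding symbols
    return list(reversed(s))           # 3) reversed input order [24]
```

For example, `prepare_input(["a1", "a2", "o1"], 6)` yields `["PAD", "PAD", "EOS", "o1", "a2", "a1"]`, where `max_len` is the length of the longest sequence plus one for the EOS tag.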
Embedding layer: Even though the structure of the sequence of input
events has been described, the sequence of symbols still needs to be transformed into
mathematically manageable vectors. Therefore, the symbols of the events are
encoded as one-hot vectors: oht is a binary vector of length nd, where
only the one bit related to the encoded symbol is set and nd is the number of
one-hot encoded symbols. A detailed explanation and visualisation of the one-hot
encoding process can be found in [30]. The embedding layer performs a
linear transformation xt = Wemb oht, which maps the one-hot encoded vectors
into a lower (ne)-dimensional space of continuous values. Note that in Figure 4 the
embedded forms of the EOS and PAD symbols are denoted by the symbols
EOS' and PAD', respectively.

Figure 4: A schematic illustration of the proposed methodology. The encoder maps the input
sequence into a fixed-length vector representation. Using this vector as the initial state, the
decoder layer determines the next event with the highest level of probability according to the
argmax function of the dense layer. The StOS tag marks the start of a sequence, the EOS tag
indicates the end of a sequence, while the PAD symbol is added to the sequences to maintain
equal sequence length.
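The one-hot encoding and the transformation xt = Wemb oht amount to selecting a column of the embedding matrix. A minimal NumPy sketch follows; the symbol vocabulary and the random Wemb stand in for values that would be learned during training:

```python
import numpy as np

symbols = ["a1", "a2", "o1", "EOS", "PAD"]  # the n_d one-hot encoded symbols
n_d, n_e = len(symbols), 3                  # embedding dimension n_e < n_d
index = {s: i for i, s in enumerate(symbols)}

def one_hot(symbol):
    """Binary vector of length n_d with a single bit set for the symbol."""
    oh = np.zeros(n_d)
    oh[index[symbol]] = 1.0
    return oh

rng = np.random.default_rng(0)
W_emb = rng.normal(size=(n_e, n_d))  # learned jointly with the model in practice

def embed(symbol):
    """x_t = W_emb oh_t: map a symbol to an n_e-dimensional vector."""
    return W_emb @ one_hot(symbol)
```

Since multiplying by a one-hot vector just picks out the corresponding column, `embed("a2")` equals `W_emb[:, 1]`.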
Encoder and decoder layers: First, the encoder LSTM layer processes
the input sequence, but instead of calculating output values, it maps the embedded
input sequence into its internal states. These internal variables of the
encoder layer represent the current state of the technology and are used to
condition the LSTM units of the decoder layer, i.e. to transfer information
about what has happened previously in the process and what kind
of prediction the decoder layer should generate. The decoder layer is designed
to predict the next event of the predicted sequence iteratively, always applying
the previously predicted event as the input for the prediction of the next state;
the procedure is repeated until an EOS symbol is predicted or the maximum
sequence length is reached.
Dense layer: After the decoder layer maps the input event $\hat{x}_{\hat{t}}$ into a vector
of real values $\hat{h}_{\hat{t}} = \left\langle \hat{h}_1^{\hat{t}}, \ldots, \hat{h}_{n_U}^{\hat{t}} \right\rangle$, these values are used to calculate
the probabilities of occurrence of the events using the softmax activation
function of the dense layer in Figure 4,

$$P(\hat{e}_{\hat{t}+1} \mid \hat{x}_{\hat{t}}) = P(\hat{e}_{\hat{t}+1} \mid \hat{h}_{\hat{t}}) = \frac{\exp\left( (\hat{h}_{\hat{t}})^T w_{s,j} + b_j \right)}{\sum_{j=1}^{n_d} \exp\left( (\hat{h}_{\hat{t}})^T w_{s,j} + b_j \right)} \quad (4)$$

where $w_{s,j}$ represents the $j$-th column vector of the weight matrix $W_s$ of the output
dense layer of the network, and $b_j$ represents the degree of bias.
Training: The defined operational sequences are applied to train the deep
learning model. In order to train the model, the input data of both the encoder
and decoder layers must be encoded in the form of one-hot vectors. The input
dataset of the decoder layer is the one-hot vectorized form of the sequences that
are to be predicted. The target data of the decoder layer is identical to the
decoder input data, but shifted by one timestep, since our aim is to predict the
event $e_{t+1}$ from the event $e_t$. This training approach, in which the expected
output from the training dataset at the current timestep is applied as the input
in the next timestep rather than the prediction generated by the model, is
referred to as teacher forcing [31]. By using this technique, all of the layers (the
two embedding layers, the encoder, the decoder and the dense layer) are trained
simultaneously.
Prediction: Prior to the prediction, it is necessary to encode the sequence
that defines the state of the technology into the internal state vector
using the encoder layer. Then, the internal states of the encoder network are
transferred to the decoder layer. The prediction starts with the start-of-sequence
symbol (marked as StOS in Figure 4). The layer generates a prediction of the
next event, which is reintroduced into the input of the decoder layer and
applied as the input in the next time step. The generated events are always
appended to the predicted target sequence. This prediction process is repeated
until the layer generates the end-of-sequence symbol or reaches the previously
set limit of the length of the predicted target sequence.
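The iterative decoding loop described above can be sketched as follows; `decoder_step`, which stands in for one pass through the trained decoder and dense layers, is mocked here with a fixed transition table:

```python
# Greedy decoding sketch: the previous prediction is fed back in until the
# EOS symbol is generated or the maximum sequence length is reached.
STOS, EOS = "StOS", "EOS"

def greedy_decode(decoder_step, state, max_len=10):
    """Iteratively predict events, reusing each prediction as the next input."""
    sequence, prev = [], STOS
    for _ in range(max_len):
        event, state = decoder_step(prev, state)
        if event == EOS:
            break
        sequence.append(event)
        prev = event
    return sequence

# Mock decoder step: a deterministic next-event table that ignores the state.
table = {STOS: "e1", "e1": "e2", "e2": EOS}
def mock_step(prev, state):
    return table[prev], state

print(greedy_decode(mock_step, state=None))  # ['e1', 'e2']
```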
Evaluation: The evaluation of the model should be related to its intended
application. Since our focus is on the development of a recommendation (or
automation) system and, hence, the construction of an accurate prediction model,
three performance metrics have been identified. First of all, $P_1$ has been
identified as the percentage of sequences that include at least one well-predicted
event. For the mathematical formulation, $\Phi_n$ is the $n$th operational sequence in the
database that we aim to predict, while $\hat{\Phi}_n$ is our prediction. $N$ is the number of
sequences in the analysed database, the cardinality of a set is marked with $|*|$,
while the common elements in two sequences are marked as their intersection.
Therefore, $P_1$ is expressed as follows:

$$P_1 = \frac{\sum_{n=1}^{N} \left( |\Phi_n \cap \hat{\Phi}_n| \geq 1 \right)}{N} \quad (5)$$

Second, $P_\%$, a set-based similarity measure which describes the well-predicted
events as a percentage of the length of the target sequence, has been defined.
The events do not have to be in the order of occurrence; $P_\%$ measures how
accurately the types of events are predicted:

$$P_\% = \frac{\sum_{n=1}^{N} \frac{|\Phi_n \cap \hat{\Phi}_n|}{|\Phi_n|}}{N} \quad (6)$$

Finally, $P_{ED}$ was proposed, an edit distance-based similarity metric
that provides the edit distance between the true (target) and predicted sequence
as a percentage of the length of the longer of the two sequences. The edit
distance yields the minimum number of elements that must be inserted or skipped
in the compared sequences in order to make them identical. The edit distance of two
sequences is marked with $ED$, and Equation 7 mathematically describes the
$P_{ED}$ edit distance-based similarity metric:

$$P_{ED} = \frac{\sum_{n=1}^{N} \frac{ED(\Phi_n, \hat{\Phi}_n)}{\max\left(|\Phi_n|, |\hat{\Phi}_n|\right)}}{N} \quad (7)$$
6. Results
The performance of the proposed methods is presented in terms of the anal-
ysis of the alarm and event-log database of an industrial delayed coker unit
located in the Danube Refinery of the MOL Group [4]. The process flow dia-
gram of the analysed technology can be seen in Figure 5.
Figure 5: The process flow diagram of the analysed industrial delayed coker unit. The plant
is divided into two main parts: one for the production of coke and the other for its separation.
Although all the tag names and identifiers are masked due to confidentiality,
the hierarchical structure of the analysed process has been preserved in the
generation of the identifiers. The structure of the tags is depicted in Figure 6.
The last four digits of every tag indicate the sensor or actuator where the given
signal was recorded. Each of these sensors/actuators can be clearly assigned to
a unit, which is indicated by the middle three digits of the tag, and each of these
units can be assigned to a production unit in the technology, which is indicated
by the first two digits. The hierarchical structure of a chemical technology and
its role in alarm management was discussed in depth in [4]. To easily distinguish
between the alarm messages and operator actions, the first digit of every tag is
optional: it is one if the related tag is an alarm message and does not exist if
it is an operator action. For example, the tag shown in Figure 6 is an alarm
message (the alarm flag is 1) that occurred in production unit number 9, more
specifically in unit number 491 on sensor/actuator 10.
Figure 6: The structure of the applied tags. The last four digits indicate the sensor/actuator,
while the middle three and the first two digits represent the related unit and production
unit, respectively. The first digit of every tag is optional: it only exists and is set as one if the
tag is an alarm message.
Preprocessing of the datasets and analysis of the results were conducted in
MATLAB. The frequent sequences [28] and maximal frequent sequences [29]
were generated by the algorithms of the SPMF Java open-source data mining
library. The deep neural network was built and trained with Keras in Python
using TensorFlow backend with an Nvidia GeForce GTX 1060 6GB GPU and
the application of CUDA.
In the present section, first, the preprocessing of the analysed datasets and
the definition of operational sequences are presented in Section 6.1. This is
followed by the analysis of the frequent operational sequences which start with
alarm messages and end with an operator interaction in order to determine how
consistent the operator actions are based on the alarm messages (Section 6.2).
Finally, how the frequent sequences and sequence-to-sequence deep learning
models can be applied for event prediction is presented in Sections 6.3 and 6.4,
respectively.
6.1. Exploratory data analysis for the cleaning and restructuring of the alarm
and event-log database
Before the application of advanced analysis techniques, it is necessary to475
define the events and clean the analysed datasets thoughtfully. In support of
the definition of the applied parameters for this process, the following selected
parts of the exploratory data analysis are presented. The ratio of alarm messages
to operator actions after each clearing or merging step is presented in Figure 7.
The numbers are normalized according to the original number of events in the
database (in the absence of the suppressed and shelved alarm messages).
First of all, only the alarm messages and operator actions of the alarm and
event-log database are analysed. As can be seen in Figure 7, the number of
alarm messages is significantly higher than the number of operator actions,
therefore, the fundamental concept of alarm management philosophy, that each
alarm should be actionable, is not fulfilled.
Both in the case of alarm messages and operator actions, repeated events of
the same type are present. In the case of alarm messages, this is mainly due to
badly configured alarm levels, which cause alarm messages of the same type to
frequently sound and clear with short breaks between them. These alarms
are the so-called chattering alarms [1]. Similarly, the operators can change the
setpoint of different control units in multiple steps over a short time just
to reach the final setpoint value. In order to avoid this disturbing effect, a time
window for both the alarm messages and operator actions has been defined (both
set to 30 seconds); when the time interval between multiple events of the
same type is less than the defined time window, only the first occurrence
in the database is kept. As can be seen in the second column of Figure 7, this
significantly reduces the number of events in the analysed database; however,
according to the results of the subsequently applied techniques, the events became
much more interpretable.
Finally, extremely short alarm messages (shorter than 10 seconds) that pro-
vide the operators with insufficient time to respond and long-standing alarms
(longer than 1 hour) that are too long to informatively indicate the presence of
a malfunction are also cleared from the database.
All of the above-mentioned clearing and merging processes aim to achieve
an easily interpretable event-log database, which potentially holds the causal
connection between the alarm-alarm, alarm-action, action-action and action-
alarm event pairs, and is, therefore, suitable for the description of the cognitive
model of the operators.
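The cleaning steps above can be sketched as follows; the event representation (timestamp, tag, duration) and the example events are hypothetical:

```python
# Hypothetical cleaning sketch: merge repeated events of the same type within
# a 30 s window (chattering), then drop alarms shorter than 10 s or longer
# than 1 hour. Each event: (timestamp in seconds, tag, duration in seconds).
WINDOW, MIN_DUR, MAX_DUR = 30, 10, 3600

def clean(events):
    merged, last_seen = [], {}
    for t, tag, dur in sorted(events):
        if tag in last_seen and t - last_seen[tag] < WINDOW:
            last_seen[tag] = t          # chattering repeat: keep first occurrence only
            continue
        last_seen[tag] = t
        merged.append((t, tag, dur))
    # remove extremely short and long-standing events
    return [(t, tag, d) for t, tag, d in merged if MIN_DUR <= d <= MAX_DUR]

events = [(0, "A", 20), (15, "A", 20), (60, "B", 5), (120, "C", 7200)]
print(clean(events))  # [(0, 'A', 20)]
```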
The distribution of the resultant alarm messages and operator actions is
(Horizontal axis of Figure 7, the clearing/merging steps: 1. original, 2. merging of repeated
events, 3. removal of long-standing alarms, 4. removal of short alarms; vertical axis: percentage
of events, split into operator actions and alarms.)
Figure 7: The ratio of alarm messages and operator actions after each clearing or merging
step. The numbers are normalized by the original number of events in the database.
presented in descending order in Figure 8. The vertical axis shows the number
of events as a percentage of the sum of events in the analysed dataset. The
figure only presents the top 50 types of alarm messages and operator actions.
The long-tailed distribution shows the complexity of the event prediction task
since a considerable number of alarm messages and operator actions are present
with very few occurrences in the database.
The application of either frequent sequence mining algorithms or sequence-
to-sequence learning applications requires the definition of event sequences. The
process of defining event sequences is described thoroughly in Section 4.1. The
critical parameters that we need to choose carefully are the operator action
series window $dw_o$, and the sequence windows before ($dw_{ab}$) and after ($dw_{aa}$)
the operator action traces.
The operator action series window, which defines the operator actions that
were probably conducted due to the same intuition of the operator and, there-
Figure 8: The long-tailed distribution of the alarm messages (top) and operator actions (bot-
tom) as a percentage of the sum of events in the analysed dataset.
fore, provides the basis for the definition of the sequences, can be defined based
on the reaction time of chemical process operators. Therefore, the approximate
length of the operator action series window was set based on the work
of Buddaraju [32], who suggested a 20-second-long reaction time for chemical
process operators responding to certain situations. The sequence windows
before and after the operator action traces can be defined based on expert
knowledge of the approximate time constant of the process (therefore, how fast
it reacts to certain changes). Based on this, both the sequence windows before
and after the operator action traces were set to approximately 30 minutes. In
order to show how sensitive the number of sequences is to these time window
parameters, a sensitivity analysis was conducted. Assuming $dw_{ab} = 30$ mins
and $dw_{aa} = 30$ mins and changing $dw_o$ to be equal to 20, 30 and 40 seconds,
the number of generated sequences was 20330, 18501 and 16782, respectively.
Setting $dw_o$ to a constant 30 seconds and changing the $dw_{aa}$ and $dw_{ab}$
parameters equally to 25, 30 and 35 minutes, the number of generated sequences
was 18063, 18501 and 18792, respectively. Applying the above-described expert
knowledge and in view of the results of the sensitivity analysis, the further
part of the analysis was conducted with a 30-second-long operator action series
window ($dw_o = 30$ secs) and the time windows before and after the defined
series of operator actions were both set to 30 minutes ($dw_{ab} = 30$ mins and
$dw_{aa} = 30$ mins). The maximum numbers of alarm messages and operator actions
in a sequence were both set to 10; therefore, the maximum length of a
sequence was 20. Figure 9 shows the distribution of the number of alarm
messages and operator actions in the top and bottom parts of the figure, respectively.
The higher density of short alarm and operator action sequences is
clearly visible in the figure. Therefore, the figure indicates that usually shorter
alarm sequences are formed during the operations and even shorter operator
action sequences are applied to prevent such malfunctions.
Figure 9: The distribution of the number of alarm messages (top) and operator actions (bot-
tom) in the event sequences.
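The grouping of operator actions into action series using the $dw_o$ window can be sketched as follows (timestamps are illustrative):

```python
# Hypothetical sketch of forming operator action series: consecutive actions
# closer than dw_o seconds to each other are grouped into one series.
DW_O = 30  # seconds

def action_series(action_times):
    """Group sorted action timestamps into series separated by gaps > DW_O."""
    series, current = [], [action_times[0]]
    for t in action_times[1:]:
        if t - current[-1] <= DW_O:
            current.append(t)
        else:
            series.append(current)
            current = [t]
    series.append(current)
    return series

print(action_series([0, 10, 25, 120, 130, 600]))
# [[0, 10, 25], [120, 130], [600]]
```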
6.2. Frequent sequences of alarm messages and operator actions
The defined sequences can be conveniently analysed by frequent sequence
mining-based applications. First of all, which alarm messages lead to operator
interventions, what operator interventions are frequently applied, and how
consistent these operational pathways are were investigated. By analysing these
questions, our aim was to investigate the possibility of forming a recommendation
system or automation solutions to facilitate the work of the operators in
frequently occurring situations.
Figure 10 shows the two-event-long sequences that start with an alarm mes-
sage and end with an operator action. The colours show the support (therefore,
the frequency of occurrence) of these event pairs as a percentage. As can be
seen, very few sequences are present that show the strong causal relationship
between alarm messages and operator actions, even though the minimal support
was set at only 0.6 %. It is important to highlight that by analysing the longer
sequences that start with alarm messages and end with operator actions only
five more sequences were identified:
• 1080400873 ⇒ 1080400876 ⇒ 80150219
• 1094910010 ⇒ 1050820493 ⇒ 80150219
• 1094910010 ⇒ 1050820493 ⇒ 54870114
• 1094910010 ⇒ 1050820493 ⇒ 54870059
• 1094910010 ⇒ 1050820493 ⇒ 140080393
Figure 10 highlights how the causal relationship between alarm messages
and operator actions can be determined and visualised. As presented, most
of these frequent sequences are short and their confidence values are rather small
as well (all of them below 5 %!).
6.3. Prediction of operational pathways using frequent alarm sequences
This method of prediction using frequently occurring operational sequences
was conducted as described in Section 5.1 and Figure 11 is constructed analo-
(Axes of the Figure 10 heatmap: 32 operator action tags versus the alarm tags 1030780038,
1050820493, 1080150684, 1080400873, 1080400876 and 1094910010; the colour scale spans
supports of approximately 1-2 %.)
Figure 10: The heatmap of the two-event-long sequences that start with an alarm message
and end with an operator action. The colours show the support of the sequences.
gously to Figure 3. Therefore, every prediction can be interpreted as visiting
the related nodes of the network of frequent sequences. Each node represents
a frequent sequence mined using the CM-SPAM algorithm [28] with a minimal
support of 0.6 %, a frequent pattern length of between 1 and 100, and a gap of
10 (the specification of gaps between consecutive items in sequential patterns is
permitted: if the gap is set to 1, then no gap is allowed and each consecutive item
of a frequent sequence must appear consecutively in the original sequence as
well; if a gap of $N$ is permitted, then $N-1$ items are allowed between two
consecutive items of a frequent sequence). Full flexibility in the generation of
frequent sequences is permitted by a gap of 10, which is necessary since multiple
operators can function simultaneously on the same or different problems and
multiple alarms can occur in different parts of the technology. The directed
edges show the possible directions of the continuation of a sequence, while the
colours denote the type of the last event in the sequence: the green and red
colours stand for the operator actions of the units with tags 8 and 5, respectively,
while the black nodes represent the alarm messages at the end of the
sequence.
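The gap constraint can be illustrated with a small sketch (the symbols are illustrative and this is not the CM-SPAM implementation itself, only the matching rule it enforces):

```python
# With a gap of N, up to N-1 other events may occur between two consecutive
# items of a frequent sequence; with a gap of 1, the items must be adjacent.
def occurs_with_gap(pattern, sequence, gap):
    """Check whether `pattern` occurs in `sequence` under the gap constraint."""
    pos = None
    for item in pattern:
        start = 0 if pos is None else pos + 1
        end = len(sequence) if pos is None else min(pos + 1 + gap, len(sequence))
        # the next item must appear within `gap` positions of the previous match
        nxt = next((i for i in range(start, end) if sequence[i] == item), None)
        if nxt is None:
            return False
        pos = nxt
    return True

seq = ["a", "x", "b", "y", "c"]
print(occurs_with_gap(["a", "b", "c"], seq, gap=2))  # True: one event in between
print(occurs_with_gap(["a", "b", "c"], seq, gap=1))  # False: items not adjacent
```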
Two smaller parts of the network are enlarged in the figure in order to provide
a more insightful vision of the process. Sequence 66 indicates a single
alarm in the raw material system of the delayed coker unit. This alarm can be
followed by an alarm message in the coke tanks, the heating gas line, the product
system or the furnace, according to sequences 321-324, respectively. These
sequences all contain one additional alarm that sounded in the mentioned parts
of the process. On the other hand, sequence 127 indicates an alarm in the
blowdown system and it is usually handled by two types of operator actions, both
performed right there in the blowdown system (indicated by sequences 685 and
686). Moreover, the operators are provided with the confidence values of these
future alarm messages. However, these probability values are not published due
to confidentiality.
Figure 11: The network of frequent sequences. Each node represents a frequent sequence and
each directed edge shows a possible continuation for a given sequence. The colour of the node
shows the type of the last event of the sequence: the green and red nodes indicate that the
last event was an operator action in the units with tag 8 and 5, respectively, while the black
nodes indicate that the last event of the sequence was an alarm message.
It is important to highlight that even though the 0.6 % minimal support
can seem small, the minimal support reflects not the accuracy of the search
for sequences, but the complexity of the sequence mining and alarm management
problem. Moreover, approximately 18500 sequences are present in the mined
dataset; therefore, the 0.6 % minimal support means that the found frequent
sequences are present in more than 111 instances in the original dataset. As
this is a real and complex industrial problem and so many sequences exist in
the dataset, the minimal support can be as low as this; moreover, in such a
complex problem, it can still indicate a significant operational pattern.
6.4. Prediction of operational sequences using sequence-to-sequence deep learning
models
Even though Figure 10 shows that only a weak consequential follow-up connection
exists between alarm messages and operator actions, the performance of the
proposed sequence-to-sequence deep learning model was investigated. Three
different datasets were defined and, after training on the related dataset, the
performance of the deep learning model was evaluated. In the first dataset,
referred to as the "original" dataset, the alarms at the beginning and operator
actions at the end of each operational sequence were simply separated. In this
case, situations exist when the model has to predict the operator actions based on
very few (often just one) antecedent alarm messages. To avoid this situation and
ensure the model performs more accurate predictions, sequences which contain
at least three alarm messages at the beginning were selected; therefore, only
the fourth event (an operator action) has to be predicted first. This dataset is
indicated by the tag "AO4". However, based on the results of the frequent
sequence mining-based analysis, longer sequences are only formed in the case
of operator actions. Therefore, in the third dataset, our aim is to predict
operator actions from operator actions: sequences containing only operator
actions that are at least four actions long were separated, and an attempt was made
to predict the end of the sequence based on the first four actions. This dataset
is indicated by the tag "OO5".
Therefore, first, the three datasets were compared to see how well the deep
learning model performs on them. During the analysis, a 7-fold cross-validation
was applied with 20 % of the dataset as the test sample, while the embedding
dimension was set at 40 with 32 LSTM units in both the encoder and decoder
layers. The number of epochs was set at 500, with a batch size of 64, using the
RMSProp optimizer of Keras. The performance was evaluated in terms of the
percentage of predictions with at least one well-predicted event ($P_1$), the edit
distance ($P_{ED}$), as well as the set-based similarity ($P_\%$) of the true and predicted
sequences. In the case of the edit distance, the goal was to minimize it, while
the other two performance measures were to be maximized. The performance of
the trained models in terms of the different datasets is presented in Figure 12.
As can be seen, no significant difference exists between the performance on the
training and test datasets; therefore, the model does not seem to be overtrained.
The best performance is achieved on the learning of future operator actions from
past ones (dataset "OO5").
Figure 12: The performance of the trained models in the case of the three different training
sets (columns: the original, AO4 and OO5 datasets). In the case of the edit distance ($P_{ED}$) the
goal was to minimize it, while the other two performance measures were to be maximized (the
percentage of predictions with at least one well-predicted event ($P_1$) and the set-based
similarity of the predicted and true sequences ($P_\%$)).
Using the dataset "OO5", the best model parameters were determined by
sensitivity analysis. The size of the embedding dimension yields the size of the
vector-based representation of the different events. As can be seen in Figure 13,
the best performance was achieved using an embedding dimension of 60.
Figure 13: The performance of the model with different embedding dimensions. The performance
seems to increase with increasing embedding dimension.
With the embedding dimension set at 60, the sensitivity of the model to
the number of applied LSTM units in both the encoder and decoder layers was
examined. As can be seen in Figure 14, the performance does not increase with
more than 32 LSTM units.
Finally, the length of the training was examined by analysing the number of
applied epochs (basically, iterations over the training data). Figure 15 shows a
saturation in the performance with an increasing number of epochs. The final
model was trained using 500 epochs.
With the applied parameters it is possible to predict the end of the operator
action sequences based on the first few actions. The model is accurate, as in
more than 80 % of the sequences at least one well-predicted event is present,
and on average the set-based similarity of the predicted and actual sequences is
in excess of 80 % as well. The edit distance between these sequences is less than
20 %. The highest prediction accuracy being achieved using the dataset "OO5"
highlights that the operators can handle the alarm messages with different
operator actions; however, once an operator action sequence is started, it is quite
consistent and
Figure 14: The performance of the model with different numbers of LSTM units. The optimum
performance of the model occurs with 32 LSTM units.
predictable what interventions will be implemented. The length of the embedding
layer describes how the information is input into the model: the higher
the embedding dimension, the more parameters code the related event.
Therefore, a higher embedding dimension can usually enhance prediction
accuracy. Finally, the LSTM units mathematically formulate the long-term
dependencies between the sequence elements. Similarly to the embedding
dimension, if more LSTM units are present, the prediction accuracy is enhanced.
However, one must be careful not to overtrain the model: with the numerous
parameters of the LSTM units, this must be carefully checked during the validation
of the model. The prediction accuracy of the sequence-to-sequence deep learning
model is consistent with the number of frequent sequences identified as well as
their calculated confidence levels. However, the frequent sequence
mining-based prediction can only utilise situations that previously occurred in
production, while the deep learning-based approach can make predictions for
arbitrary operational sequences without the analysed sequence having occurred
previously during production. It is important to highlight that this
Figure 15: The performance of the model with different applied epochs during the training.
The performance of the model does not significantly improve when more than 500 epochs are
applied.
deep learning-based approach is a black box model; therefore, its exact decisions
are hardly interpretable, but it is highly effective in the construction of
hidden internal structures. Its working method is also different from the frequent
sequence mining-based approach, where similar sequences are searched for
prediction in a lookup table-based representation. Here, an internal knowledge
of the process is obtained; therefore, a potential advantage of the method can
be its ability to extrapolate, although this was not investigated in the present
article. Even so, the method performed well on sequences that were not
frequent, which also supports this theory.
7. Conclusions
The ever-increasing level of integration of chemical production systems over
recent decades has resulted in control operators becoming overloaded. A safe
and economically optimal operation requires the fast and accurate identification
of the root cause of operational problems and the determination of optimal op-
erator interventions based on the alarms and warning signals that occur. The
present paper provides just such a computational methodology based on the
analysis of industrial alarm and event-log databases. How the cleaning of such
databases should be conducted and the events organized to form informative
operational sequences was discussed. Based on the results of frequent sequence
mining, the rather inconsistent follow-up of alarm messages and operator actions
is presented: with a minimal support of only 0.6 %, only a few frequent
connections were identified between them and their confidence levels were low
as well. A novel methodology and an associated network-based visualisation
technique were proposed for how the alarm messages that occur represent the
states of the technology and can be applied to predict future events. Finally,
as LSTM-based recurrent neural networks were explicitly designed to capture
the long-term dependency of sequences, a sequence-to-sequence deep learning
model was proposed. Realising that the evaluation of the performance of these
sequence-to-sequence models requires goal-oriented metrics, novel performance
metrics based on the set-based similarity and edit distance of sequences were
proposed. The connection between alarm messages and operator actions was
shown to be weak using this model as well; however, the prediction of future
operator actions based on antecedent actions seems to be feasible. The prediction
accuracy of the deep learning model is in good agreement with the number
of frequent sequences identified and their calculated confidence levels. However,
the deep learning-based sequence-to-sequence model is able to make predictions
from arbitrary operational sequences and does not require the sequence to be
frequent. In our future work, a methodology for the prediction of rare operational
events is to be developed, together with the deep learning-based generation
of possible operational sequences with the related probability metrics.
Acknowledgements

This research was supported by the National Research, Development and
Innovation Office NKFIH, through the project OTKA-116674 (Process mining
and deep learning in the natural sciences and process development) and the
EFOP-3.6.1-16-2016-00015 Smart Specialization Strategy (S3) Comprehensive
Institutional Development Program. Gyula Dorgo was supported by the ÚNKP-
18-3 New National Excellence Program of the Ministry of Human Capacities.
Appendix
The structure of the LSTM units
Recurrent neural networks (RNNs) are designed to capture time-dependency
in sequential data [33]. One of the most popular forms of RNNs is the LSTM
(long short-term memory) unit [23], which was proposed to overcome the
difficulties of handling long-term dependencies [34].
The LSTM layer maps the sequence $x_k$ into $h_k$, the vector of the activities.
Figure 16 shows the structure of an LSTM unit. The key feature of LSTM units
is the cell state ($C_k^t$) [35], which is able to forward information to the next unit.
Therefore, an LSTM unit is able to interact with its neighbouring cells via gates,
either by adding information to or removing it from this memory flow.

The inputs of the LSTM unit are the activation of the previous cell $h_k^{t-1}$, the
sequence element $x_k^t$ and the cell state of the previous unit $C_k^{t-1}$. Then, first,
the forget gate $f_k^t$ determines how much information from the previous units
should be kept:

$$f_k^t = \sigma(W_f [h_k^{t-1}, x_k^t] + b_f) \quad (8)$$

where $b$ represents the bias vector of the neurons (here and in all of the following
equations as well) and $\sigma$ the applied sigmoid function.

The state and activity of the current LSTM unit will be updated using the
sequence element $x_k^t$ and the preceding cell activation $h_k^{t-1}$ by applying the
sigmoid function of the input gate. The whole process is illustrated in Figure
16.
Figure 16: The structure of a single long short-term memory unit. The input sequence element
($x_k^t$), as well as the activity ($h_k^{t-1}$) and cell state ($C_k^{t-1}$) values of the previous long short-
term memory unit, are modified by the use of the sigmoid ($\sigma$) and hyperbolic tangent ($\tanh$)
functions of the forget ($f_k^t$), input ($i_k^t$) and output ($o_k^t$) gates. The calculated activity ($h_k^t$)
and cell state ($C_k^t$) values are transferred to the next long short-term memory unit, while
the value of the activity is directly output as well. The MUX box in the figure indicates the
forming of one signal from the $h_k^{t-1}$ and $x_k^t$ expressions as in $[h_k^{t-1}, x_k^t]$.
$$i_k^t = \sigma(W_i [h_k^{t-1}, x_k^t] + b_i) \tag{9}$$

$$\tilde{C}_k^t = \tanh(W_c [h_k^{t-1}, x_k^t] + b_c) \tag{10}$$

The LSTM unit updates its old cell state $C_k^{t-1}$ using the forget gate $f_k^t$ and
the filtered input gate $i_k^t$:

$$C_k^t = f_k^t C_k^{t-1} + i_k^t \tilde{C}_k^t \tag{11}$$

Finally, the activity of the LSTM unit is calculated based on the cell-state
and output-gate signals:

$$o_k^t = \sigma(W_o [h_k^{t-1}, x_k^t] + b_o) \tag{12}$$

$$h_k^t = o_k^t \tanh(C_k^t) \tag{13}$$
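For a concrete illustration, Eqs. (8)–(13) can be sketched as a single forward step in NumPy. This is an illustrative sketch, not the implementation used in the study; the function and variable names (`lstm_step`, `W_f`, etc.) are our own, and the weight matrices are assumed to act on the concatenated vector $[h_k^{t-1}, x_k^t]$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM forward step following Eqs. (8)-(13)."""
    z = np.concatenate([h_prev, x_t])     # concatenation [h^{t-1}, x^t]
    f = sigmoid(W_f @ z + b_f)            # forget gate, Eq. (8)
    i = sigmoid(W_i @ z + b_i)            # input gate, Eq. (9)
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell state, Eq. (10)
    c = f * c_prev + i * c_tilde          # cell-state update, Eq. (11)
    o = sigmoid(W_o @ z + b_o)            # output gate, Eq. (12)
    h = o * np.tanh(c)                    # unit activity, Eq. (13)
    return h, c

# Tiny example with random weights: hidden size 3, input size 2
rng = np.random.default_rng(0)
n_h, n_x = 3, 2
W = lambda: rng.standard_normal((n_h, n_h + n_x))
b = lambda: np.zeros(n_h)
h, c = lstm_step(rng.standard_normal(n_x), np.zeros(n_h), np.zeros(n_h),
                 W(), W(), W(), W(), b(), b(), b(), b())
```

Because the output gate lies in $(0, 1)$ and $\tanh$ in $(-1, 1)$, the resulting activity $h_k^t$ is always bounded in $(-1, 1)$, which is what allows the cell state to carry information across many steps without the activations blowing up.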
References
[1] K. VanCamp, Alarm management by the numbers, Chemical Engineering
Essentials for the CPI Professional, Automation & Control.
[2] C. M. Burns, Towards proactive monitoring in the petrochemical
industry, Safety Science 44 (1) (2006) 27–36, Safety and Design.
doi:https://doi.org/10.1016/j.ssci.2005.09.004.
URL http://www.sciencedirect.com/science/article/pii/S0925753505000950
[3] M. Baptista, S. Sankararaman, I. P. de Medeiros, C. Nascimento,
H. Prendinger, E. M. Henriques, Forecasting fault events for predictive
maintenance using data-driven techniques and ARMA modeling, Computers
& Industrial Engineering 115 (2018) 41–53.
doi:https://doi.org/10.1016/j.cie.2017.10.033.
URL http://www.sciencedirect.com/science/article/pii/S036083521730520X
[4] G. Dorgo, K. Varga, J. Abonyi, Hierarchical frequent sequence mining algo-
rithm for the analysis of alarm cascades in chemical processes, IEEE Access
6 (2018) 50197–50216. doi:10.1109/ACCESS.2018.2868415.
[5] G. Dorgo, K. Varga, M. Haragovics, T. Szabo, J. Abonyi, Towards operator
4.0, increasing production efficiency and reducing operator workload by
process mining of alarm data, Chemical Engineering Transactions 70 (2018)
829–834. doi:10.3303/CET1870139.
URL https://www.aidic.it/cet/18/70/139.pdf
[6] W. Hu, T. Chen, S. L. Shah, Detection of frequent alarm patterns in
industrial alarm floods using itemset mining methods, IEEE Transactions on
Industrial Electronics 65 (9) (2018) 7290–7300. doi:10.1109/TIE.2018.2795573.
[7] J. Zhu, Y. Shu, J. Zhao, F. Yang, A dynamic alarm management
strategy for chemical process transitions, Journal of Loss Prevention
in the Process Industries 30 (2014) 207–218.
doi:https://doi.org/10.1016/j.jlp.2013.07.008.
URL http://www.sciencedirect.com/science/article/pii/S0950423013001393
[8] N. Dadashi, D. Golightly, S. Sharples, Seeing the woods for
the trees: The problem of information inefficiency and information
overload on operator performance, IFAC-PapersOnLine 49 (19) (2016)
603–608, 13th IFAC Symposium on Analysis, Design, and Evaluation
of Human-Machine Systems HMS 2016.
doi:https://doi.org/10.1016/j.ifacol.2016.10.628.
URL http://www.sciencedirect.com/science/article/pii/S2405896316322406
[9] N. A. Adnan, I. Izadi, T. Chen, On expected detection delays for
alarm systems with deadbands and delay-timers, Journal of Process
Control 21 (9) (2011) 1318–1331.
doi:https://doi.org/10.1016/j.jprocont.2011.06.019.
URL http://www.sciencedirect.com/science/article/pii/S0959152411001296
[10] Design and analysis of improved alarm delay-timers, IFAC-PapersOnLine
48 (8) (2015) 669–674, 9th IFAC Symposium on Advanced Control of
Chemical Processes ADCHEM 2015.
doi:https://doi.org/10.1016/j.ifacol.2015.09.045.
URL http://www.sciencedirect.com/science/article/pii/S240589631501126X
[11] I. Izadi, S. L. Shah, D. S. Shook, S. R. Kondaveeti, T. Chen, A
framework for optimal design of alarm systems, IFAC Proceedings
Volumes 42 (8) (2009) 651–656, 7th IFAC Symposium on
Fault Detection, Supervision and Safety of Technical Processes.
doi:https://doi.org/10.3182/20090630-4-ES-2003.00108.
URL http://www.sciencedirect.com/science/article/pii/S1474667016358517
[12] B. Mehta, Y. Reddy, Chapter 21 - Alarm management systems,
in: B. Mehta, Y. Reddy (Eds.), Industrial Process Automation
Systems, Butterworth-Heinemann, Oxford, 2015, pp. 569–582.
doi:http://dx.doi.org/10.1016/B978-0-12-800939-0.00021-8.
URL http://www.sciencedirect.com/science/article/pii/B9780128009390000218
[13] W. Hu, A. W. Al-Dabbagh, T. Chen, S. L. Shah, Process discovery
of operator actions in response to univariate alarms, IFAC-PapersOnLine
49 (7) (2016) 1026–1031, 11th IFAC Symposium on Dynamics and
Control of Process Systems Including Biosystems DYCOPS-CAB 2016.
doi:https://doi.org/10.1016/j.ifacol.2016.07.337.
URL http://www.sciencedirect.com/science/article/pii/S2405896316305444
[14] A. Adhitya, S. F. Cheng, Z. Lee, R. Srinivasan, Quantifying the
effectiveness of an alarm management system through human factors
studies, Computers & Chemical Engineering 67 (2014) 1–12.
doi:https://doi.org/10.1016/j.compchemeng.2014.03.013.
URL http://www.sciencedirect.com/science/article/pii/S0098135414000945
[15] S. Lai, T. Chen, A method for pattern mining in multiple alarm flood
sequences, Chemical Engineering Research and Design 117 (2017) 831–839.
doi:https://doi.org/10.1016/j.cherd.2015.06.019.
URL http://www.sciencedirect.com/science/article/pii/S0263876215002257
[16] T. Li, X. Li, Novel alarm correlation analysis system based on association
rules mining in telecommunication networks, Information Sciences 180 (16)
(2010) 2960–2978. doi:https://doi.org/10.1016/j.ins.2010.04.013.
URL http://www.sciencedirect.com/science/article/pii/S0020025510001696
[17] R. D. Gardner, D. A. Harle, Fault resolution and alarm correlation in
high-speed networks using database mining techniques, in: Proceedings
of ICICS, 1997 International Conference on Information, Communications
and Signal Processing, Vol. 3, 1997, pp. 1423–1427.
doi:10.1109/ICICS.1997.652226.
[18] J.-Z. Ouh, P.-H. Wu, M.-S. Chen, Experimental results on a constraint
based sequential pattern mining for telecommunication alarm data, in:
Proceedings of the Second International Conference on Web Information
Systems Engineering, Vol. 2, 2001, pp. 186–193. doi:10.1109/WISE.2001.996754.
[19] P.-H. Wu, W.-C. Peng, M.-S. Chen, Mining sequential alarm patterns in a
telecommunication database, in: W. Jonker (Ed.), Databases in Telecom-
munications II, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001, pp. 37–51.
[20] F. Yang, S. Shah, D. Xiao, T. Chen, Improved correlation analysis and
visualization of industrial alarm data, ISA Transactions 51 (4) (2012) 499–506.
doi:https://doi.org/10.1016/j.isatra.2012.03.005.
URL http://www.sciencedirect.com/science/article/pii/S0019057812000341
[21] G. Dorgo, J. Abonyi, Sequence mining based alarm suppression, IEEE
Access 6 (2018) 15365–15379. doi:10.1109/ACCESS.2018.2797247.
[22] K. Cho, B. van Merrienboer, Ç. Gülçehre, F. Bougares, H. Schwenk,
Y. Bengio, Learning phrase representations using RNN encoder-decoder for
statistical machine translation, CoRR abs/1406.1078. arXiv:1406.1078.
URL http://arxiv.org/abs/1406.1078
[23] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput.
9 (8) (1997) 1735–1780. doi:10.1162/neco.1997.9.8.1735.
URL http://dx.doi.org/10.1162/neco.1997.9.8.1735
[24] I. Sutskever, O. Vinyals, Q. V. Le, Sequence to sequence learning with
neural networks, CoRR abs/1409.3215. arXiv:1409.3215.
URL http://arxiv.org/abs/1409.3215
[25] G. Dorgo, P. Pigler, M. Haragovics, J. Abonyi, Learning operation
strategies from alarm management systems by temporal pattern mining
and deep learning, in: A. Friedl, J. J. Klemeš, S. Radl, P. S. Varbanov,
T. Wallek (Eds.), 28th European Symposium on Computer Aided Process
Engineering, Vol. 43 of Computer Aided Chemical Engineering, Elsevier,
2018, pp. 1003–1008. doi:https://doi.org/10.1016/B978-0-444-64235-6.50176-5.
URL http://www.sciencedirect.com/science/article/pii/B9780444642356501765
[26] J. Zhu, C. Wang, C. Li, X. Gao, J. Zhao, Dynamic alarm
prediction for critical alarms using a probabilistic model, Chinese
Journal of Chemical Engineering 24 (7) (2016) 881–885.
doi:https://doi.org/10.1016/j.cjche.2016.04.017.
URL http://www.sciencedirect.com/science/article/pii/S1004954116303044
[27] S. Lai, F. Yang, T. Chen, Online pattern matching and prediction of
incoming alarm floods, Journal of Process Control 56 (2017) 69–78.
doi:https://doi.org/10.1016/j.jprocont.2017.01.003.
URL http://www.sciencedirect.com/science/article/pii/S0959152417300100
[28] P. Fournier-Viger, A. Gomariz, M. Campos, R. Thomas, Fast vertical
mining of sequential patterns using co-occurrence information, in: V. S. Tseng,
T. B. Ho, Z.-H. Zhou, A. L. P. Chen, H.-Y. Kao (Eds.), Advances in Knowl-
edge Discovery and Data Mining, Springer International Publishing, Cham,
2014, pp. 40–52.
[29] P. Fournier-Viger, C.-W. Wu, A. Gomariz, V. S. Tseng, VMSP: Efficient
vertical mining of maximal sequential patterns, in: M. Sokolova, P. van
Beek (Eds.), Advances in Artificial Intelligence, Springer International
Publishing, Cham, 2014, pp. 83–94.
[30] G. Dorgo, P. Pigler, J. Abonyi, Understanding the importance of process
alarms based on the analysis of deep recurrent neural networks trained
for fault isolation, Journal of Chemometrics 32 (4) (2018) e3006.
doi:10.1002/cem.3006.
URL https://onlinelibrary.wiley.com/doi/abs/10.1002/cem.3006
[31] R. J. Williams, D. Zipser, A learning algorithm for continually running
fully recurrent neural networks, Neural Computation 1 (2) (1989) 270–280.
[32] D. Buddaraju, Performance of control room operators in alarm manage-
ment, Master’s thesis, Louisiana State University and Agricultural and
Mechanical College (2011).
[33] J. J. Hopfield, Neural networks and physical systems with emergent
collective computational abilities, Proceedings of the National Academy
of Sciences 79 (8) (1982) 2554–2558.
[34] S. Hochreiter, The vanishing gradient problem during learning recurrent
neural nets and problem solutions, International Journal of Uncertainty,
Fuzziness and Knowledge-Based Systems 6 (2) (1998) 107–116.