Event Log Sampling for Predictive Monitoring
Mohammadreza Fani Sani 1, Mozhgan Vazifehdoostirani 2,
Gyunam Park 1, Marco Pegoraro 1, Sebastiaan J. van Zelst 3,1, and
Wil M.P. van der Aalst 1,3
1Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Aachen, Germany
{fanisani, gnpark, pegoraro, s.j.v.zelst, wvdaalst}@pads.rwth-aachen.de
2Industrial Engineering and Innovation Science, Eindhoven University of Technology,
Eindhoven, the Netherlands
m.vazifehdoostirani@tue.nl
3Fraunhofer FIT, Birlinghoven Castle, Sankt Augustin, Germany
Abstract
Predictive process monitoring is a subeld of process mining that aims to estimate
case or event features for running process instances. Such predictions are of sig-
nicant interest to the process stakeholders. However, state-of-the-art methods
for predictive monitoring require the training of complex machine learning mod-
els, which is ofen inecient. This paper proposes an instance selection procedure
that allows sampling training process instances for prediction models. We show
that our sampling method allows for a signicant increase of training speed for
next activity prediction methods while maintaining reliable levels of prediction
accuracy.
Keywords: Process Mining · Predictive Monitoring · Sampling · Machine Learning · Deep Learning · Instance Selection.
Colophon
This work is licensed under a Creative Commons “Attribution-NonCommercial 4.0 In-
ternational” license.
© the authors. Some rights reserved.
This document is an Author Accepted Manuscript (AAM) corresponding to the following scholarly paper:
Fani Sani, Mohammadreza et al. “Event Log Sampling for Predictive Monitoring”. In: International Workshop on Leveraging Machine Learning in Process Mining (ML4PM). Springer, 2021
Please cite this document as shown above.
Publication chronology:
2021-08-19: abstract submitted to the International Workshop on Leveraging Machine Learning in Process Mining (ML4PM) 2021
2021-08-26: full text submitted to the International Workshop on Leveraging Machine Learning in Process Mining (ML4PM) 2021
2021-09-16: notification of acceptance
2021-09-30: camera-ready version submitted
2021-11-01: presented
2022-03-24: proceedings published
The published version referred to above is © Springer.
Correspondence to:
Mohammadreza Fani Sani, Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany
Email: fanisani@pads.rwth-aachen.de · ORCID: 0000-0003-3152-2103
Content: 17 pages, 1 figure, 4 tables, 34 references. Typeset with pdfLaTeX, Biber, and BibLaTeX.
Please do not print this document unless strictly necessary.
1 Introduction
As the environment surrounding business processes becomes more dynamic and com-
petitive, it becomes imperative to predict process behaviors and take proactive actions [2].
Predictive business process monitoring aims at predicting the behavior of business pro-
cesses, to mitigate the risk resulting from undesired behaviors in the process. For in-
stance, by predicting the next activities in the process, one can foresee the undesired ex-
ecution of activities, thus preventing possible risks resulting from it [5]. Moreover, by
predicting an expected high service time for an activity, one may bypass or add more resources for the activity [15]. Recent breakthroughs in machine learning have enabled the development of effective techniques for predictive business process monitoring. Specifically, techniques based on deep neural networks, e.g., Long Short-Term Memory (LSTM) networks, have shown high performance in different tasks [8]. Additionally, the emergence of ensemble learning methods leads to improvements in accuracy in different areas [4]. Particularly for predictive process monitoring, eXtreme Gradient Boosting (XGBoost) [7] has shown promising results, often outperforming other ensemble methods such as Random Forest or single regression trees [25, 28].
Indeed, machine learning algorithms suffer from expensive computational costs
in their training process [34]. In particular, machine learning algorithms based on neu-
ral networks and ensemble learning might require tuning their hyperparameters to be
able to provide acceptable accuracy. Such long training time limits the application of the
techniques considering the limitations in time and hardware [21]. This is particularly
relevant for predictive business process monitoring techniques. Business analysts need
to test the eciency and reliability of their conclusions via repeated training of difer-
ent prediction models with diferent parameters [15]. Moreover, the dynamic nature of
business processes requires new models adapting to new situations in short intervals.
Instance selection aims at reducing original datasets to a manageable volume to per-
form machine learning tasks, while the quality of the results (e.g., accuracy) is maintained
as if the original dataset was used [11]. Instance selection techniques are categorized into
two classes based on the way they select instances. First, some techniques select the in-
stances at the boundaries of classes. For instance, Decremental Reduction Optimization
Procedure (DROP) [32] selects instances using k-Nearest Neighbors by incrementally
discarding an instance if its neighbors are correctly classified without the instance. The
other techniques preserve the instances residing inside classes, e.g., Edited Nearest Neigh-
bor (ENN) [33] preserves instances by repeatedly discarding an instance if it does not
belong to the class of the majority of its neighbors.
Such techniques assume independence among instances [32]. However, in predic-
tive business process monitoring, training instances may be highly correlated [1], impeding the application of instance selection techniques. Such instances are computed
from event data that are recorded by the information system supporting business pro-
cesses [13]. The event data are correlated by the notion of case, e.g., patients in a hospital
or products in a factory. In this regard, we need new techniques for instance selection
applicable to event data.
In this work, we suggest an instance selection approach for predicting the next activ-
ity, one of the main applications of predictive business process monitoring. By consider-
ing the characteristics of the event data, the proposed approach samples event data such
that the training speed is improved while the accuracy of the resulting prediction model
is maintained. We have evaluated the proposed methods using two real-life datasets
and state-of-the-art techniques for predictive business process monitoring, including
LSTM [12] and XGBoost [7].
The remainder is organized as follows. We discuss the related work in Section 2. Next, we present the preliminaries in Section 3 and the proposed methods in Section 4. Afterward, Section 5 evaluates the proposed methods using real-life event data and Section 6 provides discussions. Finally, Section 7 concludes the paper.
2 Related Work
Predictive process monitoring is an exceedingly active field of research. At its core, the fundamental component of predictive monitoring is the abstraction technique it uses to obtain a fixed-length representation of the process component subject to the prediction (often, but not always, process traces). In the earlier approaches, the need for such abstraction was overcome through model-aware techniques, employing process models and replay techniques on partial traces to abstract a flat representation of event sequences. Such process models are mostly automatically discovered from a set of available complete traces, and require perfect fitness on training instances (and, seldom, also on unseen test instances). For instance, Van der Aalst et al. [2] proposed a time prediction framework based on replaying partial traces on a transition system, effectively clustering training instances by control-flow information. This framework has later been the basis for a prediction method by Polato et al. [20], where the transition system is annotated with an ensemble of SVR and Naïve Bayes classifiers, to perform a more accurate time estimation. A related approach, albeit more linked to the simulation domain and based on a Monte Carlo method, is the one proposed by Rogge-Solti and Weske [24], which maps partial process instances onto an enriched Petri net.
Recently, predictive process monitoring started to use a plethora of machine learn-
ing approaches, achieving varying degrees of success. For instance, Teinemaa et al. [27]
provided a framework to combine text mining methods with Random Forest and Logistic Regression. Senderovich et al. [25] studied the effect of using intra-case and inter-case features in predictive process monitoring and showed promising results for XGBoost
compared to other ensemble and linear methods. A comprehensive benchmark on using
classical machine learning approaches for outcome-oriented predictive process monitor-
ing tasks [28] has shown that XGBoost is the best-performing classifier among different machine learning approaches such as SVM, Decision Tree, Random Forest, and Logistic Regression.
More recent methods are model-unaware and perform based on a single and more complex machine learning model, instead of an ensemble. The LSTM network model has proven to be particularly effective for predictive monitoring [8, 26], since the recurrent architecture can natively support sequences of data of arbitrary length. It allows performing trace prediction while employing a fixed-length event abstraction, which can be based on control-flow alone [8, 26], or be data-aware [16], time-aware [17], text-aware [19], or model-aware [18].
A concept similar to the idea proposed in this paper, and of current interest in the field of machine learning, is dataset distillation: utilizing a dataset to obtain a smaller set of training instances that contain the same information (with respect to training a machine learning model) [31]. While this is not considered sampling, since some instances of the distilled dataset are created ex novo, it is an approach very similar to the one we illustrate in our paper. Moreover, some instance selection algorithms have recently been proposed to help process mining algorithms. For example, [10, 9] proposed to use instance selection techniques to improve the performance of process discovery and conformance checking procedures.
In this paper, we examine the underexplored topic of event data sampling and selection for predictive process monitoring, with the objective of assessing whether and to what extent prediction quality can be retained when we use subsets of the training data.
3 Preliminaries
In this section, we discuss some process mining concepts, such as event logs and sampling. In process mining, we use events to provide insights into the execution of business processes. Each event is related to a specific activity of the underlying process. Furthermore, we refer to a collection of events related to a specific process instance as a case. Both cases and events may have different attributes. An event log, which is a collection of events and cases, is defined as follows.
Definition 1 (Event Log). Let $\mathcal{E}$ be the universe of events, $\mathcal{C}$ be the universe of cases, $\mathcal{AT}$ be the universe of attributes, and $\mathcal{U}$ be the universe of attribute values. Moreover, let $C \subseteq \mathcal{C}$ be a non-empty set of cases, let $E \subseteq \mathcal{E}$ be a non-empty set of events, and let $AT \subseteq \mathcal{AT}$ be a set of attributes. We define $(C, E, \pi_C, \pi_E)$ as an event log, where $\pi_C : C \times AT \not\to \mathcal{U}$ and $\pi_E : E \times AT \not\to \mathcal{U}$. Any event in the event log has a case; therefore, $\nexists_{e \in E}\, \pi_E(e, \mathit{case}) \not\in C$ and $\bigcup_{e \in E} \{\pi_E(e, \mathit{case})\} = C$.
Furthermore, let $\mathcal{A} \subseteq \mathcal{U}$ be the universe of activities and let $\mathcal{V} \subseteq \mathcal{A}^*$ be the universe of sequences of activities. For any $e \in E$, $\pi_E(e, \mathit{activity}) \in \mathcal{A}$, which means that any event in the event log has an activity. Moreover, for any $c \in C$, $\pi_C(c, \mathit{variant}) \in \mathcal{A}^* \setminus \{\langle\rangle\}$, which means that any case in the event log has a variant.
Therefore, some attributes are mandatory: case and activity for events, and variant for cases. In some process mining applications, e.g., process discovery and conformance checking, only the variant information is considered. Therefore, event logs are considered as a multiset of sequences of activities. In the following, a simple event log is defined.
Definition 2 (Simple event log). Let $\mathcal{A}$ be the universe of activities and let the universe of multisets over a set $X$ be denoted by $\mathcal{B}(X)$. A simple event log is $L \in \mathcal{B}(\mathcal{A}^*)$. Moreover, let $\mathcal{EL}$ be the universe of event logs and let $EL = (C, E, \pi_C, \pi_E) \in \mathcal{EL}$ be an event log. We define the function $sl : \mathcal{EL} \to \mathcal{B}(\mathcal{A}^*)$ that returns the simple event log of an event log. The set of unique variants in the event log is denoted by $\overline{sl(EL)}$.
Therefore, $sl$ returns the multiset of variants in the event log. Note that the size of a simple event log equals the number of cases in the event log, i.e., $|sl(EL)| = |C|$.
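The notions above can be made concrete with a small sketch; this is not the paper's implementation, and the function name `simple_log` and the example cases are ours for illustration only.

```python
from collections import Counter

# A case is an ordered list of activities; the simple event log sl(EL) is
# the multiset of variants, i.e. the activity sequences of all cases.
def simple_log(cases):
    """cases: dict mapping a case id to its ordered list of activities.
    Returns the simple event log as a Counter (multiset of variants)."""
    return Counter(tuple(activities) for activities in cases.values())

cases = {
    "c1": ["a", "b", "c", "d"],
    "c2": ["a", "b", "c", "d"],
    "c3": ["a", "c", "d"],
}
L = simple_log(cases)
# The size of the simple event log equals the number of cases,
# and the support of L plays the role of the set of unique variants.
assert sum(L.values()) == len(cases)
assert set(L) == {("a", "b", "c", "d"), ("a", "c", "d")}
```

Here the multiset is represented by a `Counter`, so `sum(L.values())` corresponds to $|sl(EL)|$ and `set(L)` to the set of unique variants.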
In this paper, we use sampling techniques to reduce the size of event logs. An event
log sampling method is defined as follows.
Definition 3 (Event log sampling). Let $\mathcal{EL}$ be the universe of event logs and $\mathcal{A}$ be the universe of activities. Moreover, let $EL = (C, E, \pi_C, \pi_E) \in \mathcal{EL}$ be an event log. We define a function $\delta : \mathcal{EL} \to \mathcal{EL}$ that returns the sampled event log, where, if $(C', E', \pi'_C, \pi'_E) = \delta(EL)$, then $C' \subseteq C$, $E' \subseteq E$, $\pi'_E \subseteq \pi_E$, $\pi'_C \subseteq \pi_C$, and, consequently, $sl(\delta(EL)) \subseteq sl(EL)$. We say that $\delta$ is a variant-preserving sampling if $\overline{sl(\delta(EL))} = \overline{sl(EL)}$.
In other words, a sampling method is variant-preserving if and only if all the variants of the original event log are present in the sampled event log.
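With simple event logs represented as `Counter` multisets, the variant-preserving property amounts to a support check; the helper below is a hypothetical illustration, not part of the authors' ProM plug-in.

```python
from collections import Counter

# A sampling is variant-preserving when the sampled log keeps every unique
# variant of the original, even if with lower frequencies.
def is_variant_preserving(original, sampled):
    """original, sampled: Counter objects mapping variant -> frequency."""
    return set(sampled) == set(original)

original = Counter({("a", "b", "c"): 10, ("a", "c"): 1})
# Frequencies may shrink, as long as no variant disappears entirely.
assert is_variant_preserving(original, Counter({("a", "b", "c"): 2, ("a", "c"): 1}))
# Dropping the infrequent variant breaks the property.
assert not is_variant_preserving(original, Counter({("a", "b", "c"): 2}))
```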
To use machine learning methods for prediction, we usually need to transform each case into one or more features. A feature is defined as follows.
Definition 4 (Feature). Let $\mathcal{AT}$ be the universe of attributes, $\mathcal{U}$ be the universe of attribute values, and $\mathcal{C}$ be the universe of cases. Moreover, let $AT \subseteq \mathcal{AT}$ be a set of attributes. A feature is a relation between a sequence of attribute values for $AT$ and the target attribute value, i.e., $f \in (\mathcal{U}^{|AT|} \times \mathcal{U})$. We define $fe : \mathcal{C} \times \mathcal{EL} \to \mathcal{B}(\mathcal{U}^{|AT|} \times \mathcal{U})$ as a function that receives a case and an event log, and returns a multiset of features.
For the next activity prediction, i.e., our prediction goal, the target attribute value
should be an activity. Moreover, a case in the event log may have different features. For example, suppose that we only consider the activities. For the case $\langle a, b, c, d \rangle$, we may
Figure 1: A schematic view of the proposed sampling procedure
have $(\langle a\rangle, b)$, $(\langle a, b\rangle, c)$, and $(\langle a, b, c\rangle, d)$ as features. Furthermore, $\sum_{c \in C} fe(c, EL)$ are the corresponding features of event log $EL = (C, E, \pi_C, \pi_E)$ that could be given to different machine learning algorithms. For more details on how to extract features from event logs, please refer to [23].
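The prefix-based features of this example can be sketched as follows; the function name `prefix_features` is ours, and the sketch covers only the control-flow case where the activity sequence is the sole attribute.

```python
# Each proper prefix of a case is paired with the activity that follows it,
# yielding (prefix, next_activity) training instances.
def prefix_features(trace):
    """trace: ordered list of activities of one case."""
    return [(tuple(trace[:i]), trace[i]) for i in range(1, len(trace))]

# The case <a, b, c, d> yields exactly the three features from the text.
assert prefix_features(["a", "b", "c", "d"]) == [
    (("a",), "b"),
    (("a", "b"), "c"),
    (("a", "b", "c"), "d"),
]
```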
4 Proposed Sampling Methods
In this section, we propose an event log preprocessing procedure that helps prediction algorithms perform faster while maintaining reasonable accuracy. A schematic view of the proposed sampling approach is presented in Figure 1. We first need to traverse the event log and find the variants and the corresponding traces of each variant in the event log. Moreover, the distributions of different data attributes in each variant will be computed. Afterward, using different sorting and instance selection strategies, we are able to select some of the cases and return the sampled event log. In the following, each of these steps is explained in more detail.
1. Traversing the event log: In this step, the unique variants of the event log and the corresponding traces of each variant are determined. In other words, consider an event log $EL$ with $\overline{sl(EL)} = \{\sigma_1, \dots, \sigma_n\}$, where $n = |\overline{sl(EL)}|$; we aim to split $EL$ into $EL_1, \dots, EL_n$, where $EL_i$ only contains all the cases $C_i = \{c \in C \mid \pi_C(c, \mathit{variant}) = \sigma_i\}$ and $E_i = \{e \in E \mid \pi_E(e, \mathit{case}) \in C_i\}$. Obviously, $\bigcup_{1 \le i \le n} C_i = C$ and $\bigcap_{1 \le i \le n} C_i = \emptyset$.
2. Distribution Computation: In this step, for each variant of the event log, we compute the distribution of different data attributes $a \in AT$. It would be more practical if the interesting attributes are chosen by an expert. Both event and case attributes can be considered. A simple approach is to compute the frequency of categorical data values. For numerical data attributes, it is possible to consider the average or the median of the values for all cases of each variant.
3. Sorting the cases of each variant: In this step, we aim to sort the traces of each variant. We need to sort the traces to give a higher priority to those traces that can represent the variant better. One way is to sort the traces based on the frequency of the most frequently occurring data values of the variant. For example, we can give a higher priority to the traces that have the most frequent resources of each variant. It is also possible to sort the traces based on their arrival time, or randomly.
4. Returning sampled event logs: Finally, depending on the setting of the sampling function, we return some of the traces with the highest priority for each variant. The most important point about this step is to know how many traces of each variant should be selected. In the following, some possibilities are introduced.

• Unique selection: In this approach, we select only one trace with the highest priority per variant. In other words, suppose that $L' = sl(\delta(EL))$; then $\forall_{\sigma \in L'}\ L'(\sigma) = 1$. Therefore, using this approach we will have $|sl(\delta(EL))| = |\overline{sl(EL)}|$. It is expected that, using this approach, the distribution of the frequencies of variants will be changed and, consequently, the resulting prediction model will be less accurate.

• Logarithmic distribution: In this approach, we reduce the number of traces in each variant in a logarithmic way. If $L = sl(EL)$ and $L' = sl(\delta(EL))$, then $\forall_{\sigma \in L'}\ L'(\sigma) = [\log_k(L(\sigma))]$. Using this approach, the infrequent variants will not have any trace in the sampled event log. By using a higher $k$, the size of the sampled event log is reduced more.

• Division: This approach performs similarly to the previous one; however, instead of using a logarithmic scale, we apply the division operator. In this approach, $\forall_{\sigma \in L'}\ L'(\sigma) = \lceil \frac{L(\sigma)}{k} \rceil$. A higher $k$ results in fewer cases in the sampled event log. Note that, using this approach, all the variants have at least one trace in the sampled event log.

There is also the possibility to consider other selection methods. For example, we can select the traces completely randomly from the original event log.
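The three selection strategies of Step 4 can be sketched as follows, assuming simple event logs represented as `Counter` multisets; the function name is illustrative, rounding for the logarithmic strategy follows the formula $[\log_k(L(\sigma))]$, and division uses a ceiling.

```python
import math
from collections import Counter

# Compute how many traces of each variant survive sampling, per strategy.
def sample_sizes(variant_counts, strategy, k=2):
    """variant_counts: Counter mapping variant -> frequency L(sigma)."""
    sizes = {}
    for variant, freq in variant_counts.items():
        if strategy == "unique":
            n = 1                                        # one trace per variant
        elif strategy == "logarithmic":
            n = round(math.log(freq, k)) if freq > 1 else 0
        elif strategy == "division":
            n = math.ceil(freq / k)                      # at least one trace
        sizes[variant] = n
    return {v: n for v, n in sizes.items() if n > 0}

L = Counter({("a", "b"): 8, ("a", "c"): 2, ("a", "d"): 1})
assert sample_sizes(L, "unique") == {("a", "b"): 1, ("a", "c"): 1, ("a", "d"): 1}
assert sample_sizes(L, "division", k=2) == {("a", "b"): 4, ("a", "c"): 1, ("a", "d"): 1}
# With k = 2: log2(8) = 3, log2(2) = 1, and the infrequent variant
# ("a", "d") is dropped entirely, as described in the text.
assert sample_sizes(L, "logarithmic", k=2) == {("a", "b"): 3, ("a", "c"): 1}
```

Note how the division strategy keeps every variant (the ceiling guarantees at least one trace), while the logarithmic one discards variants of frequency 1.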
By choosing diferent data attributes in Step 2 and diferent sorting algorithms in
Step 3, we are able to lead the sampling of the method on which cases should be chosen.
Moreover, by choosing the type of distribution in Step 4, we determine how many cases
should be chosen. To compute how sampling method δreduces the size of the given
event log EL, we use the following equation:
RS=|sl(EL)|
|sl(δ(EL))|(1)
The higher RSvalue means, the sampling method reduces more the size of the training
log. By choosing diferent distribution methods and diferent k-values, we are able to
Table 1: Overview of the event logs that are used in the experiments. The accuracy and the required times (in seconds) of different prediction methods for these event logs are also presented.

Event Log     Cases   Activities  Variants  Attributes  FE Time  LSTM Train Time  LSTM Acc  XG Train Time  XG Acc
RTFM          150370  11          231       1           73649    3021             0.791     11372          0.814
BPIC-2012-W   9658    6           2643      2           1212     3344             0.68      2011           0.685
control the size of the sampled event log. It should be noted that the proposed method is applied only to the training event log. In other words, we do not sample event logs for development and test datasets.
5 Evaluation
In this section, we aim at designing some experiments to answer our research question, i.e., “Can we improve the computational performance of prediction methods by using sampled event logs, while maintaining a similar accuracy?”. It should be noted that the focus of the experiments is not on tuning the prediction models to obtain higher accuracy. Conversely, we aim to analyze the effect of using sampled event logs (instead of the whole datasets) on the required time and the accuracy of prediction models. In the following, we first explain the event logs that are used in the experiments. Afterward, we provide some information about the implementation of the sampling methods. Moreover, the experimental setting is discussed and, finally, we show the experimental results.
5.1 Event logs
To evaluate the proposed sampling procedure for prediction, we have used two event logs widely used in the literature. Some information about these event logs is presented in Table 1. In the RTFM event log, which corresponds to a road traffic management system, we have some highly frequent variants and several infrequent variants. Moreover, the number of activities in this event log is high. Some of these activities are infrequent, which makes this event log imbalanced. In the BPIC-2012-W event log, relating to a process of an insurance company, the average of variant frequencies is lower.
5.2 Implementation
We have developed the sampling methods as a plug-in in the ProM framework [30], ac-
cessible via https://svn.win.tue.nl/repos/prom/Packages/LogFiltering.
This plug-in takes an event log and returns k different train and test event logs in the
CSV format. Moreover, to train the prediction method, we have used XGBoost [7] and
LSTM [12] methods as they are widely used in the literature and outperformed their
counterparts. Our LSTM network consisted of an input layer, two LSTM layers with
dropout rates of 10%, and a dense output layer with the Softmax activation function. We used categorical cross-entropy to calculate the loss and adopted ADAM as the optimizer. We used gbtree with a max depth of 6 as the booster in our XGBoost model. Uniform distribution is used as the sampling method inside our XGBoost model. To avoid overfitting in both models, the training set is further divided into a 90% training set and a 10% validation set, to stop training once the model performance on the validation set stops improving. We used the same settings of both models for the original event logs and the sampled event logs. To access our implementations of these methods and the feature generation, please refer to https://github.com/gyunamister/pm-prediction/. For details of the feature generation and feature encoding steps, please refer to [18].
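As a minimal illustration of the one-hot encoding of activity sequences used as model input (not the authors' implementation; the alphabet, left-padding scheme, and names are our assumptions), a prefix can be encoded as a fixed-size 0/1 matrix:

```python
# Encode an activity prefix as a max_len x |alphabet| one-hot matrix,
# left-padded with all-zero rows, as commonly done for LSTM inputs.
def one_hot_prefix(prefix, alphabet, max_len):
    """prefix: list of activities; alphabet: ordered list of all activities."""
    index = {a: i for i, a in enumerate(alphabet)}
    rows = [[0] * len(alphabet) for _ in range(max_len)]
    offset = max_len - len(prefix)
    for pos, activity in enumerate(prefix):
        rows[offset + pos][index[activity]] = 1
    return rows

m = one_hot_prefix(["a", "b"], alphabet=["a", "b", "c"], max_len=4)
assert m[0] == [0, 0, 0] and m[1] == [0, 0, 0]   # padding rows
assert m[2] == [1, 0, 0] and m[3] == [0, 1, 0]   # "a", then "b"
```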
5.3 Evaluation setting
To sample the event logs, we use three distributions: the logarithmic distribution, division, and unique variants. For the logarithmic distribution method, we have used k-values of 2, 3, and 10 (i.e., log2, log3, and log10). For the division method, we have used 2, 5, and 10 (i.e., d2, d5, and d10). For each event log and for each sampling method, we have used a 5-fold cross-validation. Moreover, as the results of the experiments are non-deterministic, all the experiments have been repeated 5 times and the average values are reported.
Note that, for both training and evaluation phases, we have used the same settings
for extracting features and training prediction models. We used one-hot encoding to
encode the sequence of activities for both LSTM and XGBoost models. We ran the
experiment on a server with Intel Xeon CPU E7-4850 2.30GHz, and 512 GB of RAM.
In all the steps, one CPU thread has been used. We employed the Weighted Accuracy
metric [22] to compute how a prediction method performs for test data. To compare
the accuracy of the prediction methods, we use the relative accuracy, which is defined as
follows.
$$R_{Acc} = \frac{\text{Accuracy using the sampled training log}}{\text{Accuracy using the whole training log}} \qquad (2)$$

If $R_{Acc}$ is close to 1, it means that, using the sampled event logs, the prediction methods behave almost similarly to the case in which the whole data is used for training. Moreover, values higher than 1 indicate that the accuracy of the prediction methods has improved.

To compute the improvement in the performance of the training time, we use the following equations:

$$R_t = \frac{\text{Training time using the whole data}}{\text{Training time using the sampled data}} \qquad (3)$$

$$R_{FE} = \frac{\text{Feature extraction time using the whole data}}{\text{Feature extraction time using the sampled data}} \qquad (4)$$
For both equations, the resulting values indicate how many times using the sampled log is faster than using all the data.
5.4 Experimental results
Table 2 presents the reduction rate and the improvement in the feature extraction phase using different sampling methods. As expected, the highest reduction rate is for log10 (as it removes infrequent variants and keeps few traces of frequent variants), and, respectively, it has the biggest improvement in $R_{FE}$. Moreover, the lowest reduction is for d2, especially if there are lots of unique variants in the event log (i.e., for the RTFM event log). We expected smaller event logs to require less feature extraction time. However, the results indicate that the relationship is not linear: a higher reduction in the size of the sampled event log yields a much higher reduction in the feature extraction time.
In Tables 3 and 4, the results of the improvement in $R_t$ and $R_{Acc}$ are shown for the LSTM and XGBoost prediction methods. As expected, by using fewer cases in the training, the improvement in training time will be higher. Comparing the results in these two tables with the results in Table 2, it is interesting to see that in some cases, even with a high reduction rate, the accuracy of the trained prediction model is close to the case in which the whole training log is used. For example, using d10 for the RTFM event log, we have high accuracy for both prediction methods. In other words, we are able to improve the performance of the prediction procedure while the accuracy remains reasonable.

When using the LSTM prediction method for the RTFM event log, there are some cases where we even have an accuracy improvement. For example, using d3, there is a 0.4% improvement in the accuracy of the trained model. This is mainly because of the existence of highly frequent variants. These variants lead to imbalanced training logs and, consequently, the accuracy of the trained model will be lower for infrequent behaviors.
6 Discussion
The results indicate that we do not always have a typical trade-off between the accuracy of the trained model and the performance of the prediction procedure. In other words, there are some cases where the training process is much faster than the normal procedure, even though the trained model provides an almost similar accuracy. We did not provide the results for other metrics; however, there are similar patterns for weighted recall, precision, and F1-score. Thus, the proposed sampling methods can be used when we aim to apply hyperparameter optimization [3]. In this way, more settings can be analyzed in a limited time. Moreover, it is reasonable to use the proposed method when we aim to train an online prediction method, or to train on modest hardware such as cell phones.
Table 2: The reduction in the size of the training logs (i.e., R_S) and the improvement in the performance of the feature extraction part (i.e., R_FE) using different sampling methods.

Sampling Methods    d2           d3           d10          log2            log3             log10            unique
Event Log           R_S   R_FE   R_S   R_FE   R_S   R_FE   R_S    R_FE     R_S    R_FE      R_S    R_FE      R_S    R_FE
RTFM [14]           1.99  4.8    3.0   11.1   9.8   106.9  153.5  12527.6  236.3  23699.2   572.3  74912.8   285.1  24841.8
BPIC-2012-W [29]    1.22  1.37   1.41  1.80   1.66  2.51   6.06   22.41    9.05   37.67     28.50  208.32    1.73   2.36
Table 3: The accuracy and the improvement in the performance of prediction using different sampling methods for LSTM.

Sampling Methods    d2            d3            d10           log2          log3          log10         unique
Event Log           R_Acc  R_t    R_Acc  R_t    R_Acc  R_t    R_Acc  R_t    R_Acc  R_t    R_Acc  R_t    R_Acc  R_t
RTFM                1.001  2.0    1.004  2.9    0.990  9.0    0.716  26.7   0.724  33.0   0.767  41.8   0.631  29.1
BPIC-2012-W         1.000  1.4    0.985  1.3    0.938  1.3    0.977  4.7    0.970  5.8    0.876  11.9   0.996  1.6
Table 4: The accuracy and the improvement in the performance of prediction using different sampling methods for XGBoost.

Sampling Methods    d2            d3            d10           log2           log3           log10          unique
Event Log           R_Acc  R_t    R_Acc  R_t    R_Acc  R_t    R_Acc  R_t     R_Acc  R_t     R_Acc  R_t     R_Acc  R_t
RTFM                1.000  2.4    1.000  1.4    1.000  84.1   0.686  126.4   0.706  191.8   0.772  355.0   0.582  297.7
BPIC-2012-W         0.999  2.3    0.998  2.4    0.997  3.4    0.923  10.7    0.970  16.7    0.883  64.8    0.997  2.8
Another important outcome of the results is that, for different event logs, we should use different sampling methods to achieve the highest performance. For example, for the RTFM event log, as there are some highly frequent variants, the division distribution may be more useful. In other words, independently of the used prediction method, if we change the distribution of variants (e.g., using the unique distribution), it is expected that the accuracy will sharply decrease. However, for event logs with a more uniform distribution, we can use the logarithmic and unique distributions to sample event logs. The results indicate that the effect of the chosen distribution (i.e., unique, division, and logarithmic) is more important than the used k-value. Therefore, it would be valuable to investigate further the characteristics of the given event log and the suitable sampling parameters for such a distribution. For example, if most variants of a given event log are unique, the division and unique methods are not able to achieve a remarkable $R_S$ and, consequently, $R_{FE}$ and $R_t$ will be close to 1.
Moreover, the results have shown that, by oversampling the event logs, although we obtain a very big improvement in the performance of the prediction procedure, the accuracy of the trained model is significantly lower than the accuracy of the model that is trained on the whole event log. Therefore, we suggest gradually increasing (or decreasing) the size of the sampled event log in hyperparameter optimization scenarios.

By analyzing the results of common prediction methods, we have found that infrequent activities can be ignored under some hyperparameter settings. This is mainly because the event logs are imbalanced with respect to these infrequent activities. Using sampling methods that modify the distribution of the event logs, such as the unique method, can help the prediction methods to also consider these activities.
Finally, in real scenarios, the process can change for different reasons [6]. This
phenomenon is usually called concept dri. By considering the whole event log for train-
ing the prediction model, it is most probable that these changes are not considered in
the prediction. Using the proposed sampling procedure, and giving higher priorities to
newer traces, we are able to adapt to the changes faster, which may be critical for specic
applications.
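Giving higher priority to newer traces could, for instance, be implemented with geometrically decaying sampling weights. This is only one possible realization; the helper name and decay scheme are illustrative assumptions, not part of the evaluated methods.

```python
import random

def recency_biased_sample(traces, sample_size, decay=0.9, seed=42):
    """Hypothetical sketch: sample with weights that favour recent cases.

    Traces are assumed ordered by start time (oldest first); the newest
    trace gets weight 1 and each older trace decays geometrically.
    """
    n = len(traces)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    rng = random.Random(seed)
    # Weighted sampling with replacement; recent traces dominate the sample.
    return rng.choices(traces, weights=weights, k=sample_size)
```

With a decay of 0.9 over 100 traces, the older half of the log carries well under 1% of the total weight, so the trained model is effectively fitted to post-drift behavior.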
7 Conclusion
In this paper, we proposed to use subsets of event logs to train prediction models.
We proposed different sampling methods for next activity prediction. These methods
are implemented in the ProM framework. To evaluate the proposed methods, we
applied them to two real event logs and used two state-of-the-art prediction methods:
LSTM and XGBoost. The experimental results have shown that, using the proposed
methods, we are able to improve the performance of the next activity prediction
procedure while retaining an acceptable accuracy (in some experiments, the accuracy
even increased). However, there is a relation between the characteristics of an event
log and the parameters that are suitable for sampling it. The proposed methods can be
helpful in situations where we aim to train a model quickly, or in hyper-parameter
optimization scenarios. Moreover, in cases where the process can change over time, we
are able to adapt to the modified process more quickly using sampling methods.
To continue this research, we aim to extend the experiments to study the relationship
between event log characteristics and sampling parameters. Additionally, we plan to
provide sampling methods that help prediction methods to predict infrequent
activities, which could be more critical in the process. Finally, it is interesting to
further investigate the use of sampling methods for other prediction tasks, such as
last activity and remaining time prediction.
Acknowledgements
We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research
interactions.
References
[1] van der Aalst, Wil M. P. Process Mining - Data Science in Action, Second Edition.
Springer, 2016. isbn: 978-3-662-49850-7. doi:10.1007/978-3-662-49851-4.
[2] van der Aalst, Wil M. P., M. H. Schonenberg, and Minseok Song. “Time predic-
tion based on process mining”. In: Information Systems 36.2 (2011), pp. 450–475.
doi:10.1016/j.is.2010.09.001.
[3] Bergstra, James, Rémi Bardenet, Yoshua Bengio, et al. “Algorithms for Hyper-
Parameter Optimization”. In: Advances in Neural Information Processing Sys-
tems 24: 25th Annual Conference on Neural Information Processing Systems 2011.
Proceedings of a meeting held 12-14 December 2011, Granada, Spain. Ed. by Shawe-
Taylor, John, Richard S. Zemel, Peter L. Bartlett, et al. 2011, pp. 2546–2554. url:
https://proceedings.neurips.cc/paper/2011/hash/86e8f7ab32cfd12577bc2619bc635690-Abstract.html.
[4] Breiman, Leo. “Bagging Predictors”. In: Machine Learning 24.2 (1996), pp. 123–
140. doi:10.1007/BF00058655.
[5] Breuker, Dominic, Martin Matzner, Patrick Delfmann, et al. “Comprehensible
Predictive Models for Business Processes”. In: MIS Quarterly 40.4 (2016), pp. 1009–
1034. url:http://misq.org/comprehensible-predictive-models-
for-business-processes.html.
[6] Carmona, Josep and Ricard Gavaldà. “Online Techniques for Dealing with Con-
cept Drift in Process Mining”. In: Advances in Intelligent Data Analysis XI - 11th
International Symposium, IDA 2012, Helsinki, Finland, October 25-27, 2012. Pro-
ceedings. Ed. by Hollmén, Jaakko, Frank Klawonn, and Allan Tucker. Vol. 7619.
Lecture Notes in Computer Science. Springer, 2012, pp. 90–102. doi:10.1007/978-3-642-34156-4_10.
[7] Chen, Tianqi and Carlos Guestrin. “XGBoost: A Scalable Tree Boosting Sys-
tem”. In: Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-
17, 2016. Ed. by Krishnapuram, Balaji, Mohak Shah, Alexander J. Smola, et al.
ACM, 2016, pp. 785–794. doi:10.1145/2939672.2939785.
[8] Evermann, Joerg, Jana-Rebecca Rehse, and Peter Fettke. “Predicting process be-
haviour using deep learning”. In: Decision Support Systems 100 (2017), pp. 129–
140. doi:10.1016/j.dss.2017.04.003.
[9] Fani Sani, Mohammadreza, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst.
“Conformance Checking Approximation Using Subset Selection and Edit Dis-
tance”. In: Advanced Information Systems Engineering - 32nd International Con-
ference, CAiSE 2020, Grenoble, France, June 8-12, 2020, Proceedings. Ed. by Dust-
dar, Schahram, Eric Yu, Camille Salinesi, et al. Vol. 12127. Lecture Notes in Com-
puter Science. Springer, 2020, pp. 234–251. doi:10.1007/978-3-030-49435-3_15.
[10] Fani Sani, Mohammadreza, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst.
“The impact of biased sampling of event logs on the performance of process dis-
covery”. In: Computing 103.6 (2021), pp. 1085–1104. doi:10.1007/s00607-021-00910-4.
[11] García, Salvador, Julián Luengo, and Francisco Herrera. Data Preprocessing in
Data Mining. Vol. 72. Intelligent Systems Reference Library. Springer, 2015. isbn:
978-3-319-10246-7. doi:10.1007/978-3-319-10247-4.
[12] Huang, Zhiheng, Wei Xu, and Kai Yu. “Bidirectional LSTM-CRF Models for
Sequence Tagging”. In: CoRR abs/1508.01991 (2015). arXiv: 1508.01991.url:
http://arxiv.org/abs/1508.01991.
[13] de Leoni, Massimiliano, Wil M. P. van der Aalst, and Marcus Dees. “A general
process mining framework for correlating, predicting and clustering dynamic be-
havior based on event logs”. In: Information Systems 56 (2016), pp. 235–257. doi:
10.1016/j.is.2015.07.003.
[14] de Leoni, Massimiliano and Felix Mannhardt. “Road traffic fine management
process”. In: Eindhoven University of Technology. Dataset (2015).
[15] Márquez-Chamorro, Alfonso Eduardo, Manuel Resinas, and Antonio Ruiz-Cortés.
“Predictive Monitoring of Business Processes: A Survey”. In: IEEE Transactions
on Services Computing 11.6 (2018), pp. 962–977. doi:10.1109/TSC.2017.2772256.
[16] Navarin, Nicolò, Beatrice Vincenzi, Mirko Polato, et al. “LSTM networks for
data-aware remaining time prediction of business process instances”. In: 2017 IEEE
Symposium Series on Computational Intelligence, SSCI 2017, Honolulu, HI, USA,
November 27 - Dec. 1, 2017. IEEE, 2017, pp. 1–7. doi:10.1109/SSCI.2017.8285184.
[17] Nguyen, An, Srijeet Chatterjee, Sven Weinzierl, et al. “Time Matters: Time-Aware
LSTMs for Predictive Business Process Monitoring”. In: Process Mining Work-
shops - ICPM 2020 International Workshops, Padua, Italy, October 5-8, 2020, Re-
vised Selected Papers. Ed. by Leemans, Sander J. J. and Henrik Leopold. Vol. 406.
Lecture Notes in Business Information Processing. Springer, 2020, pp. 112–123.
doi:10.1007/978-3-030-72693-5_9.
[18] Park, Gyunam and Minseok Song. “Predicting performances in business pro-
cesses using deep neural networks”. In: Decision Support Systems 129 (2020). doi:
10.1016/j.dss.2019.113191.
[19] Pegoraro, Marco, Merih Seran Uysal, David Benedikt Georgi, et al. “Text-Aware
Predictive Monitoring of Business Processes”. In: 24th International Conference
on Business Information Systems, BIS 2021, Hannover, Germany, June 15-17, 2021.
Ed. by Abramowicz, Witold, Sören Auer, and Elzbieta Lewanska. TIB Open Pub-
lishing, 2021, pp. 221–232. doi:10.52825/bis.v1i.62.
[20] Polato, Mirko, Alessandro Sperduti, Andrea Burattin, et al. “Time and activity
sequence prediction of business process instances”. In: Computing 100.9 (2018),
pp. 1005–1031. doi:10.1007/s00607-018-0593-x.
[21] Pourghassemi, Behnam, Chenghao Zhang, Joo Hwan Lee, et al. “On the Limits
of Parallelizing Convolutional Neural Networks on GPUs”. In: SPAA ’20: 32nd
ACM Symposium on Parallelism in Algorithms and Architectures, Virtual Event,
USA, July 15-17, 2020. Ed. by Scheideler, Christian and Michael Spear. ACM,
2020, pp. 567–569. doi:10.1145/3350755.3400266.
[22] Powers, David M. W. “Evaluation: from precision, recall and F-measure to ROC,
informedness, markedness and correlation”. In: CoRR abs/2010.16061 (2020). arXiv:
2010.16061.url:https://arxiv.org/abs/2010.16061.
[23] Qafari, Mahnaz Sadat and Wil M. P. van der Aalst. “Root Cause Analysis in Pro-
cess Mining Using Structural Equation Models”. In: Business Process Manage-
ment Workshops - BPM 2020 International Workshops, Seville, Spain, Septem-
ber 13-18, 2020, Revised Selected Papers. Ed. by del-Río-Ortega, Adela, Henrik
Leopold, and Flávia Maria Santoro. Vol. 397. Lecture Notes in Business Infor-
mation Processing. Springer, 2020, pp. 155–167. doi:10.1007/978-3-030-66498-5_12.
[24] Rogge-Solti, Andreas and Mathias Weske. “Prediction of Remaining Service Exe-
cution Time Using Stochastic Petri Nets with Arbitrary Firing Delays”. In: Service-
Oriented Computing - 11th International Conference, ICSOC 2013, Berlin, Ger-
many, December 2-5, 2013, Proceedings. Ed. by Basu, Samik, Cesare Pautasso, Liang
Zhang, et al. Vol. 8274. Lecture Notes in Computer Science. Springer, 2013, pp. 389–
403. doi:10.1007/978-3-642-45005-1_27.
[25] Senderovich, Arik, Chiara Di Francescomarino, Chiara Ghidini, et al. “Intra and
Inter-case Features in Predictive Process Monitoring: A Tale of Two Dimensions”.
In: Business Process Management - 15th International Conference, BPM 2017,
Barcelona, Spain, September 10-15, 2017, Proceedings. Ed. by Carmona, Josep, Gre-
gor Engels, and Akhil Kumar. Vol. 10445. Lecture Notes in Computer Science.
Springer, 2017, pp. 306–323. doi:10.1007/978-3-319-65000-5_18.
[26] Tax, Niek, Ilya Verenich, Marcello La Rosa, et al. “Predictive Business Process
Monitoring with LSTM Neural Networks”. In: Advanced Information Systems
Engineering - 29th International Conference, CAiSE 2017, Essen, Germany, June
12-16, 2017, Proceedings. Ed. by Dubois, Eric and Klaus Pohl. Vol. 10253. Lecture
Notes in Computer Science. Springer, 2017, pp. 477–492. doi:10.1007/978-
3-319-59536-8_30.
[27] Teinemaa, Irene, Marlon Dumas, Fabrizio Maria Maggi, et al. “Predictive Busi-
ness Process Monitoring with Structured and Unstructured Data”. In: Business
Process Management - 14th International Conference, BPM 2016, Rio de Janeiro,
Brazil, September 18-22, 2016. Proceedings. Ed. by Rosa, Marcello La, Peter Loos,
and Oscar Pastor. Vol. 9850. Lecture Notes in Computer Science. Springer, 2016,
pp. 401–417. doi:10.1007/978-3-319-45348-4_23.
[28] Teinemaa, Irene, Marlon Dumas, Marcello La Rosa, et al. “Outcome-Oriented
Predictive Process Monitoring: Review and Benchmark”. In: ACM Transactions
on Knowledge Discovery from Data 13.2 (2019), 17:1–17:57. doi:10.1145/3301300.
[29] van Dongen, Boudewijn F. BPI Challenge 2012. 2012. doi:10.4121/UUID:3926DB30-F712-4394-AEBC-75976070E91F.
url: https://data.4tu.nl/repository/uuid:3926db30-f712-4394-aebc-75976070e91f.
[30] Verbeek, Eric, Joos C. A. M. Buijs, Boudewijn F. van Dongen, et al. “ProM 6:
The Process Mining Toolkit”. In: Proceedings of the Business Process Management
2010 Demonstration Track, Hoboken, NJ, USA, September 14-16, 2010. Ed. by
Rosa, Marcello La. Vol. 615. CEUR Workshop Proceedings. CEUR-WS.org, 2010.
url:http://ceur-ws.org/Vol-615/paper13.pdf.
[31] Wang, Tongzhou, Jun-Yan Zhu, Antonio Torralba, et al. “Dataset Distillation”.
In: CoRR abs/1811.10959 (2018). arXiv: 1811.10959.url:http://arxiv.
org/abs/1811.10959.
[32] Wilson, D. Randall and Tony R. Martinez. “Reduction Techniques for Instance-
Based Learning Algorithms”. In: Machine Learning 38.3 (2000), pp. 257–286.
doi:10.1023/A:1007626913721.
[33] Wilson, Dennis L. “Asymptotic Properties of Nearest Neighbor Rules Using Edited
Data”. In: IEEE Transactions on Systems, Man and Cybernetics 2.3 (1972), pp. 408–
421. doi:10.1109/TSMC.1972.4309137.
[34] Zhou, Lina, Shimei Pan, Jianwu Wang, et al. “Machine learning on big data: Op-
portunities and challenges”. In: Neurocomputing 237 (2017), pp. 350–361. doi:
10.1016/j.neucom.2017.01.026.