Event Log Sampling for Predictive Monitoring
Mohammadreza Fani Sani 1, Mozhgan Vazifehdoostirani 2,
Gyunam Park 1, Marco Pegoraro 1, Sebastiaan J. van Zelst 3,1, and
Wil M.P. van der Aalst 1,3
1Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Aachen, Germany
{fanisani, gnpark, pegoraro, s.j.v.zelst, wvdaalst}@pads.rwth-aachen.de
2Industrial Engineering and Innovation Science, Eindhoven University of Technology,
Eindhoven, the Netherlands
m.vazifehdoostirani@tue.nl
3Fraunhofer FIT, Birlinghoven Castle, Sankt Augustin, Germany
Abstract
Predictive process monitoring is a subfield of process mining that aims to estimate
case or event features for running process instances. Such predictions are of significant
interest to the process stakeholders. However, state-of-the-art methods for predictive
monitoring require the training of complex machine learning models, which is often
inefficient. This paper proposes an instance selection procedure that allows sampling
training process instances for prediction models. We show that our sampling method
enables a significant increase in training speed for next activity prediction methods
while maintaining reliable levels of prediction accuracy.
Keywords: Process Mining · Predictive Monitoring · Sampling · Machine Learning ·
Deep Learning · Instance Selection.
Colophon
This work is licensed under a Creative Commons “Attribution-NonCommercial 4.0 In-
ternational” license.
©the authors. Some rights reserved.
This document is an Author Accepted Manuscript (AAM) corresponding to the following scholarly paper:
Fani Sani, Mohammadreza et al. “Event Log Sampling for Predictive Monitoring”. In: International Workshop on Leveraging
Machine Learning in Process Mining (ML4PM). Springer, 2021
Please, cite this document as shown above.
Publication chronology:
• 2021-08-19: abstract submitted to the International Workshop on Leveraging Machine Learning in Process Mining (ML4PM) 2021
• 2021-08-26: full text submitted to the International Workshop on Leveraging Machine Learning in Process Mining (ML4PM) 2021
• 2021-09-16: notification of acceptance
•2021-09-30: camera-ready version submitted
•2021-11-01: presented
•2022-03-24: proceedings published
The published version referred to above is © Springer.
Correspondence to:
Mohammadreza Fani Sani, Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany
Email: fanisani@pads.rwth-aachen.de ·ORCID: 0000-0003-3152-2103
Content: 17 pages, 1 figure, 4 tables, 34 references. Typeset with pdfLaTeX, Biber, and BibLaTeX.
Please do not print this document unless strictly necessary.
M. Fani Sani et al. Event Log Sampling for Predictive Monitoring
1 Introduction
As the environment surrounding business processes becomes more dynamic and
competitive, it becomes imperative to predict process behaviors and take proactive
actions [2]. Predictive business process monitoring aims at predicting the behavior of
business processes, in order to mitigate the risk resulting from undesired behaviors in
the process. For instance, by predicting the next activities in the process, one can
foresee the undesired execution of activities, thus preventing possible risks resulting
from it [5]. Moreover, by predicting an expected high service time for an activity, one
may bypass the activity or add more resources for it [15]. Recent breakthroughs in
machine learning have enabled the development of effective techniques for predictive
business process monitoring. Specifically, techniques based on deep neural networks,
e.g., Long Short-Term Memory (LSTM) networks, have shown high performance in
different tasks [8]. Additionally, the emergence of ensemble learning methods has led
to accuracy improvements in different areas [4]. Particularly, for predictive process
monitoring, eXtreme Gradient Boosting (XGBoost) [7] has shown promising results,
often outperforming other ensemble methods such as Random Forest or single
regression trees [25, 28].
Indeed, machine learning algorithms suffer from expensive computational costs in
their training process [34]. In particular, machine learning algorithms based on neural
networks and ensemble learning may require tuning their hyperparameters to provide
acceptable accuracy. Such long training times limit the applicability of these
techniques, considering the limitations in time and hardware [21]. This is particularly
relevant for predictive business process monitoring techniques: business analysts need
to test the efficiency and reliability of their conclusions via repeated training of
different prediction models with different parameters [15]. Moreover, the dynamic
nature of business processes requires new models that adapt to new situations at short
intervals.
Instance selection aims at reducing original datasets to a manageable volume to
perform machine learning tasks, while the quality of the results (e.g., accuracy) is
maintained as if the original dataset were used [11]. Instance selection techniques are
categorized into two classes based on the way they select instances. First, some
techniques select the instances at the boundaries of classes. For instance, the
Decremental Reduction Optimization Procedure (DROP) [32] selects instances using
k-Nearest Neighbors by incrementally discarding an instance if its neighbors are
correctly classified without it. Other techniques preserve the instances residing inside
classes; e.g., Edited Nearest Neighbor (ENN) [33] preserves instances by repeatedly
discarding an instance if it does not belong to the class of the majority of its neighbors.
Such techniques assume independence among instances [32]. However, in predictive
business process monitoring training, instances may be highly correlated [1], impeding
the application of techniques for instance selection. Such instances are computed from
event data that are recorded by the information systems supporting business
processes [13]. The event data are correlated by the notion of case, e.g., patients in a
hospital or products in a factory. In this regard, we need new techniques for instance
selection applicable to event data.
In this work, we suggest an instance selection approach for predicting the next activ-
ity, one of the main applications of predictive business process monitoring. By consider-
ing the characteristics of the event data, the proposed approach samples event data such
that the training speed is improved while the accuracy of the resulting prediction model
is maintained. We have evaluated the proposed methods using two real-life datasets
and state-of-the-art techniques for predictive business process monitoring, including
LSTM [12] and XGBoost [7].
The remainder of this paper is organized as follows. We discuss the related work in
Section 2. Next, we present the preliminaries in Section 3 and the proposed methods
in Section 4. Afterward, Section 5 evaluates the proposed methods using real-life
event data and Section 6 provides discussions. Finally, Section 7 concludes the paper.
2 Related Work
Predictive process monitoring is an exceedingly active field of research. At its core, the
fundamental component of predictive monitoring is the abstraction technique it uses
to obtain a fixed-length representation of the process component subject to the
prediction (often, but not always, process traces). In the earlier approaches, the need
for such abstraction was overcome through model-aware techniques, employing
process models and replay techniques on partial traces to abstract a flat representation
of event sequences. Such process models are mostly automatically discovered from a
set of available complete traces, and require perfect fitness on training instances (and,
seldomly, also on unseen test instances). For instance, Van der Aalst et al. [2] proposed
a time prediction framework based on replaying partial traces on a transition system,
effectively clustering training instances by control-flow information. This framework
has later been the basis for a prediction method by Polato et al. [20], where the
transition system is annotated with an ensemble of SVR and Naïve Bayes classifiers,
to perform a more accurate time estimation. A related approach, albeit more linked to
the simulation domain and based on a Monte Carlo method, is the one proposed by
Rogge-Solti and Weske [24], which maps partial process instances onto an enriched
Petri net.
Recently, predictive process monitoring started to use a plethora of machine learning
approaches, achieving varying degrees of success. For instance, Teinemaa et al. [27]
provided a framework to combine text mining methods with Random Forest and
Logistic Regression. Senderovich et al. [25] studied the effect of using intra-case and
inter-case features in predictive process monitoring and showed promising results for
XGBoost compared to other ensemble and linear methods. A comprehensive
benchmark on using classical machine learning approaches for outcome-oriented
predictive process monitoring tasks [28] has shown that XGBoost is the
best-performing classifier among different machine learning approaches such as SVM,
Decision Tree, Random Forest, and Logistic Regression.
More recent methods are model-unaware and rely on a single, more complex machine
learning model instead of an ensemble. The LSTM network model has proven to be
particularly effective for predictive monitoring [8, 26], since the recurrent architecture
can natively support sequences of data of arbitrary length. It allows performing trace
prediction while employing a fixed-length event abstraction, which can be based on
control-flow alone [8, 26], or be data-aware [16], time-aware [17], text-aware [19], or
model-aware [18].
A concept similar to the idea proposed in this paper, and of current interest in the
field of machine learning, is dataset distillation: utilizing a dataset to obtain a smaller
set of training instances that contains the same information (with respect to training a
machine learning model) [31]. While this is not considered sampling, since some
instances of the distilled dataset are created ex novo, it is an approach very similar to
the one we illustrate in our paper. Moreover, some instance selection algorithms have
recently been proposed to aid process mining algorithms. For example, [10, 9]
proposed to use instance selection techniques to improve the performance of process
discovery and conformance checking procedures.
In this paper, we examine the underexplored topic of event data sampling and
selection for predictive process monitoring, with the objective of assessing whether,
and to what extent, prediction quality can be retained when we utilize subsets of the
training data.
3 Preliminaries
In this section, we discuss some process mining concepts such as event logs and
sampling. In process mining, we use events to provide insights into the execution of
business processes. Each event is related to a specific activity of the underlying
process. Furthermore, we refer to a collection of events related to a specific process
instance as a case. Both cases and events may have different attributes. An event log,
which is a collection of events and cases, is defined as follows.
Definition 1 (Event Log). Let $\mathcal{E}$ be the universe of events, $\mathcal{C}$ be the universe of cases,
$\mathcal{AT}$ be the universe of attributes, and $\mathcal{U}$ be the universe of attribute values. Moreover,
let $C \subseteq \mathcal{C}$ be a non-empty set of cases, let $E \subseteq \mathcal{E}$ be a non-empty set of events, and
let $AT \subseteq \mathcal{AT}$ be a set of attributes. We define $(C, E, \pi_C, \pi_E)$ as an event log, where
$\pi_C : C \times AT \nrightarrow \mathcal{U}$ and $\pi_E : E \times AT \nrightarrow \mathcal{U}$. Any event in the event log has a case;
therefore, $\nexists_{e \in E}\, \pi_E(e, \mathit{case}) \notin C$ and $\bigcup_{e \in E} \pi_E(e, \mathit{case}) = C$.
Furthermore, let $\mathcal{A} \subseteq \mathcal{U}$ be the universe of activities and let $\mathcal{V} \subseteq \mathcal{A}^{*}$ be the
universe of sequences of activities. For any $e \in E$, $\pi_E(e, \mathit{activity}) \in \mathcal{A}$, which means
that any event in the event log has an activity. Moreover, for any $c \in C$,
$\pi_C(c, \mathit{variant}) \in \mathcal{A}^{*} \setminus \{\langle\rangle\}$, which means that any case in the event log has a variant.
Therefore, case and activity are mandatory attributes for events, and variant is a
mandatory attribute for cases. In some process mining applications, e.g., process
discovery and conformance checking, only the variant information is considered. In
that case, event logs are treated as multisets of sequences of activities. In the
following, a simple event log is defined.
Definition 2 (Simple event log). Let $\mathcal{A}$ be the universe of activities and let the
universe of multisets over a set $X$ be denoted by $\mathcal{B}(X)$. A simple event log is $L \in \mathcal{B}(\mathcal{A}^{*})$.
Moreover, let $\mathcal{EL}$ be the universe of event logs and $EL = (C, E, \pi_C, \pi_E) \in \mathcal{EL}$ be an
event log. We define the function $sl : \mathcal{EL} \to \mathcal{B}(\{\pi_E(e, \mathit{activity}) \mid e \in E\}^{*})$ that returns
the simple event log of an event log. The set of unique variants in the event log is
denoted by $\overline{sl(EL)}$.
Therefore, $sl$ returns the multiset of variants in the event log. Note that the size of
a simple event log equals the number of cases in the event log, i.e., $|sl(EL)| = |C|$.
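As an illustration, the mapping from cases to the simple event log can be sketched in Python. This is a toy sketch with an assumed dictionary representation of cases (not the authors' ProM implementation):

```python
from collections import Counter

# Toy log: each case id maps to its variant, i.e., its sequence of activities.
cases = {
    "c1": ("a", "b", "c"),
    "c2": ("a", "b", "c"),
    "c3": ("a", "c"),
}

def simple_log(cases):
    """sl(EL): the multiset of variants of the event log."""
    return Counter(cases.values())

L = simple_log(cases)
assert sum(L.values()) == len(cases)  # |sl(EL)| = |C|
assert len(L) == 2                    # two unique variants in this toy log
```

Here `Counter` plays the role of the multiset $\mathcal{B}(\mathcal{A}^{*})$, and the set of its keys corresponds to $\overline{sl(EL)}$.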
In this paper, we use sampling techniques to reduce the size of event logs. An event
log sampling method is defined as follows.
Definition 3 (Event log sampling). Let $\mathcal{EL}$ be the universe of event logs and $\mathcal{A}$ be the
universe of activities. Moreover, let $EL = (C, E, \pi_C, \pi_E) \in \mathcal{EL}$ be an event log. We
define the function $\delta : \mathcal{EL} \to \mathcal{EL}$ that returns the sampled event log, where if
$(C', E', \pi'_C, \pi'_E) = \delta(EL)$, then $C' \subseteq C$, $E' \subseteq E$, $\pi'_E \subseteq \pi_E$, $\pi'_C \subseteq \pi_C$, and
consequently, $sl(\delta(EL)) \subseteq sl(EL)$. We say that $\delta$ is a variant-preserving sampling if
$\overline{sl(\delta(EL))} = \overline{sl(EL)}$.
In other words, a sampling method is variant-preserving if and only if all the variants
of the original event log are present in the sampled event log.
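The variant-preserving condition is straightforward to check programmatically; a minimal sketch, assuming variant multisets as in Definition 2:

```python
from collections import Counter

def is_variant_preserving(original, sampled):
    """True iff every variant of the original log still occurs in the sample,
    i.e., the sets (not multisets) of variants coincide."""
    return set(original) == set(sampled)

original = Counter({("a", "b"): 10, ("a", "c"): 1})
assert is_variant_preserving(original, Counter({("a", "b"): 2, ("a", "c"): 1}))
assert not is_variant_preserving(original, Counter({("a", "b"): 2}))
```

Note that only the presence of variants matters, not their frequencies: the sampled log may contain far fewer traces per variant and still be variant-preserving.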
To use machine learning methods for prediction, we usually need to transform each
case into one or more features. A feature is defined as follows.
Definition 4 (Feature). Let $\mathcal{AT}$ be the universe of attributes, $\mathcal{U}$ be the universe of
attribute values, and $\mathcal{C}$ be the universe of cases. Moreover, let $AT \subseteq \mathcal{AT}$ be a set of
attributes. A feature is a relation between a sequence of attribute values for $AT$ and the
target attribute value, i.e., $f \in (\mathcal{U}^{|AT|} \times \mathcal{U})$. We define $fe : \mathcal{C} \times \mathcal{EL} \to \mathcal{B}(\mathcal{U}^{|AT|} \times \mathcal{U})$
as a function that receives a case and an event log, and returns a multiset of features.
For the next activity prediction, i.e., our prediction goal, the target attribute value
should be an activity. Moreover, a case in the event log may have different features.
For example, suppose that we only consider the activities. For the case $\langle a, b, c, d \rangle$, we
may have $(\langle a \rangle, b)$, $(\langle a, b \rangle, c)$, and $(\langle a, b, c \rangle, d)$ as features. Furthermore,
$\sum_{c \in C} fe(c, EL)$ is the multiset of features corresponding to event log
$EL = (C, E, \pi_C, \pi_E)$ that can be given to different machine learning algorithms. For
more details on how to extract features from event logs, please refer to [23].

Figure 1: A schematic view of the proposed sampling procedure
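The feature extraction above can be sketched for a single case; this illustrative sketch considers activities only, as in the example:

```python
def prefix_features(trace):
    """All (prefix, next activity) pairs of a trace: for <a,b,c,d> this yields
    (<a>, b), (<a,b>, c), and (<a,b,c>, d)."""
    return [(tuple(trace[:i]), trace[i]) for i in range(1, len(trace))]

assert prefix_features(("a", "b", "c", "d")) == [
    (("a",), "b"),
    (("a", "b"), "c"),
    (("a", "b", "c"), "d"),
]
```

In the data-aware setting of Definition 4, each prefix would additionally carry the values of the chosen attributes $AT$ rather than the activity sequence alone.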
4 Proposed Sampling Methods
In this section, we propose an event log preprocessing procedure that helps prediction
algorithms to perform faster while maintaining reasonable accuracy. A schematic view
of the proposed sampling approach is presented in Figure 1. We first need to traverse
the event log and find the variants and the corresponding traces of each variant in the
event log. Moreover, the distributions of different data attributes in each variant are
computed. Afterward, using different sorting and instance selection strategies, we
select some of the cases and return the sampled event log. In the following, each of
these steps is explained in more detail.
1. Traversing the event log: In this step, the unique variants of the event log and
the corresponding traces of each variant are determined. In other words, consider
event log $EL$ with $\overline{sl(EL)} = \{\sigma_1, \dots, \sigma_n\}$ where $n = |\overline{sl(EL)}|$; we aim to
split $EL$ into $EL_1, \dots, EL_n$ where $EL_i$ only contains the cases of variant $\sigma_i$, i.e.,
$C_i = \{c \in C \mid \pi_C(c, \mathit{variant}) = \sigma_i\}$ and $E_i = \{e \in E \mid \pi_E(e, \mathit{case}) \in C_i\}$.
Obviously, $\bigcup_{1 \leq i \leq n} C_i = C$ and $\bigcap_{1 \leq i \leq n} C_i = \emptyset$.
2. Distribution computation: In this step, for each variant of the event log, we
compute the distribution of the different data attributes $a \in AT$. It is more
practical if the interesting attributes are chosen by an expert. Both event and case
attributes can be considered. A simple approach is to compute the frequency of
categorical data values. For numerical data attributes, it is possible to consider the
average or the median of the values over all cases of each variant.
3. Sorting the cases of each variant: In this step, we aim to sort the traces of each
variant. We sort the traces to give a higher priority to those traces that represent
the variant best. One way is to sort the traces based on how frequently they
contain the most common data values of the variant. For example, we can give a
higher priority to the traces that have the most frequent resources of the variant.
It is also possible to sort the traces based on their arrival time, or randomly.
4. Returning sample event logs: Finally, depending on the setting of the sampling
function, we return some of the traces with the highest priority for all variants.
The most important point about this step is to know how many traces of each
variant should be selected. In the following, some possibilities will be introduced.
• Unique selection: In this approach, we select only the one trace with the highest
priority. In other words, if $L' = sl(\delta(EL))$, then $\forall_{\sigma \in L'}\, L'(\sigma) = 1$.
Therefore, using this approach we will have $|sl(\delta(EL))| = |\overline{sl(EL)}|$. It is
expected that using this approach, the distribution of variant frequencies will
change, and consequently the resulting prediction model will be less accurate.
• Logarithmic distribution: In this approach, we reduce the number of traces of
each variant in a logarithmic way. If $L = sl(EL)$ and $L' = sl(\delta(EL))$, then
$\forall_{\sigma \in L'}\, L'(\sigma) = [\log_k(L(\sigma))]$. Using this approach, the infrequent variants
will not have any trace in the sampled event log. By using a higher $k$, the size
of the sampled event log is reduced more.
• Division: This approach performs similarly to the previous one; however,
instead of using a logarithmic scale, we apply the division operator. In this
approach, $\forall_{\sigma \in L'}\, L'(\sigma) = \lceil \frac{L(\sigma)}{k} \rceil$. A higher $k$ results in fewer cases in the
sampled event log. Note that, using this approach, all the variants have at least
one trace in the sampled event log.
There is also a possibility to consider other selection methods. For example, we
can select the traces completely randomly from the original event log.
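Steps 1–4 can be summarized in a short sketch. The snippet below is stdlib-only and illustrative (the actual implementation is a ProM plug-in), and the attribute-based sorting of Step 3 is reduced to keeping the original trace order:

```python
import math
from collections import defaultdict

def ilog(n, k):
    """Integer floor of log_k(n), computed exactly with integer arithmetic."""
    e = 0
    while k ** (e + 1) <= n:
        e += 1
    return e

def sample_log(cases, strategy="division", k=2):
    """cases: {case_id: variant}. Returns the ids of the retained cases."""
    # Step 1: traverse the log and group the cases of each variant.
    by_variant = defaultdict(list)
    for cid, variant in cases.items():
        by_variant[variant].append(cid)
    # Steps 2-3 (omitted here): compute attribute distributions per variant
    # and sort each list so the most representative traces come first.
    # Step 4: decide how many traces of each variant to keep.
    kept = []
    for cids in by_variant.values():
        n = len(cids)
        if strategy == "unique":
            keep = 1
        elif strategy == "logarithmic":
            keep = ilog(n, k)           # [log_k(n)]; drops infrequent variants
        else:                           # "division"
            keep = math.ceil(n / k)     # keeps at least one trace per variant
        kept.extend(cids[:keep])
    return kept

# 9 cases of variant <a,b> and 1 case of variant <a,c>:
cases = {f"c{i}": ("a", "b") for i in range(9)}
cases["c9"] = ("a", "c")
assert len(sample_log(cases, "unique")) == 2          # one trace per variant
assert len(sample_log(cases, "division", 3)) == 4     # ceil(9/3) + ceil(1/3)
assert len(sample_log(cases, "logarithmic", 3)) == 2  # [log3(9)] = 2, [log3(1)] = 0
```

The toy assertions illustrate the behaviors noted above: unique selection flattens the variant distribution, division retains every variant, and the logarithmic strategy drops the variants whose frequency is below $k$.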
By choosing different data attributes in Step 2 and different sorting strategies in
Step 3, we can steer which cases the sampling method chooses. Moreover, by choosing
the type of distribution in Step 4, we determine how many cases are chosen. To
compute how much a sampling method $\delta$ reduces the size of a given event log $EL$, we
use the following equation:

\[
R_S = \frac{|sl(EL)|}{|sl(\delta(EL))|} \tag{1}
\]

The higher the $R_S$ value, the more the sampling method reduces the size of the
training log. By choosing different distribution methods and different $k$-values, we are
able to
Table 1: Overview of the event logs that are used in the experiments. The accuracy and the required times
(in seconds) of different prediction methods for these event logs are also presented.

Event Log   | Cases  | Activities | Variants | Attributes | FE Time | LSTM Train Time | LSTM Acc | XG Train Time | XG Acc
RTFM        | 150370 | 11         | 231      | 1          | 73649   | 3021            | 0.791    | 11372         | 0.814
BPIC-2012-W | 9658   | 6          | 2643     | 2          | 1212    | 3344            | 0.68     | 2011          | 0.685
control the size of the sampled event log. It should be noted that the proposed method
is applied only to the training event log. In other words, we do not sample the event
logs used as development and test datasets.
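As a quick sanity check of Eq. (1), a hypothetical helper (not part of the plug-in) can compute the reduction rate from the case counts:

```python
def reduction_rate(n_cases_original, n_cases_sampled):
    """R_S of Eq. (1): |sl(EL)| / |sl(delta(EL))|, i.e., the ratio of the
    numbers of cases before and after sampling."""
    return n_cases_original / n_cases_sampled

assert reduction_rate(100, 50) == 2.0
# E.g., sampling the 150370 RTFM cases down to roughly 75563 cases would
# yield an R_S of about 1.99, matching the d2 entry reported in Table 2.
assert round(reduction_rate(150370, 75563), 2) == 1.99
```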
5 Evaluation
In this section, we design experiments to answer our research question, i.e., “Can we
improve the computational performance of prediction methods by using sampled
event logs, while maintaining a similar accuracy?”. It should be noted that the focus of
the experiments is not on tuning the prediction models for higher accuracy. Rather,
we aim to analyze the effect of using sampled event logs (instead of the whole
datasets) on the required time and the accuracy of the prediction models. In the
following, we first describe the event logs used in the experiments. Afterward, we
provide some information about the implementation of the sampling methods.
Moreover, the experimental setting is discussed and, finally, we show the experimental
results.
5.1 Event logs
To evaluate the proposed sampling procedure for prediction, we have used two event
logs widely used in the literature. Some information about these event logs is
presented in Table 1. In the RTFM event log, which corresponds to a road traffic
management system, we have some highly frequent variants and several infrequent
variants. Moreover, the number of activities in this event log is high. Some of these
activities are infrequent, which makes this event log imbalanced. In the BPIC-2012-W
event log, relating to a process of an insurance company, the average variant frequency
is lower.
5.2 Implementation
We have developed the sampling methods as a plug-in in the ProM framework [30],
accessible via https://svn.win.tue.nl/repos/prom/Packages/LogFiltering.
This plug-in takes an event log and returns k different train and test event logs in the
CSV format. Moreover, to train the prediction models, we have used the XGBoost [7]
and LSTM [12] methods, as they are widely used in the literature and have
outperformed their counterparts. Our LSTM network consists of an input layer, two
LSTM layers with dropout rates of 10%, and a dense output layer with the SoftMax
activation function. We used categorical cross-entropy to calculate the loss and
adopted ADAM as the optimizer. We used gbtree with a max depth of 6 as the booster
in our XGBoost model. A uniform distribution is used as the sampling method inside
our XGBoost model. To avoid overfitting in both models, the training set is further
divided into a 90% training set and a 10% validation set, in order to stop training once
the model performance on the validation set stops improving. We used the same
settings of both models for the original and the sampled event logs. To access our
implementation of these methods and the feature generation, please refer to
https://github.com/gyunamister/pm-prediction/. For details of the feature
generation and feature encoding steps, please refer to [18].
5.3 Evaluation setting
To sample the event logs, we use the three distributions introduced above, i.e.,
logarithmic, division, and unique. For the logarithmic method, we have used $k$-values
of 2, 3, and 10 (i.e., log2, log3, and log10). For the division method, we have used
$k$-values of 2, 3, and 10 (i.e., d2, d3, and d10). For each event log and each sampling
method, we have used 5-fold cross-validation. Moreover, as the results of the
experiments are non-deterministic, all the experiments have been repeated 5 times and
the average values are reported.
Note that, for both the training and evaluation phases, we have used the same settings
for extracting features and training the prediction models. We used one-hot encoding
to encode the sequence of activities for both the LSTM and XGBoost models. We ran
the experiments on a server with an Intel Xeon CPU E7-4850 2.30GHz and 512 GB of
RAM. In all the steps, one CPU thread has been used. We employed the Weighted
Accuracy metric [22] to compute how a prediction method performs on the test data.
To compare the accuracy of the prediction methods, we use the relative accuracy,
defined as follows:

\[
RAcc = \frac{\text{Accuracy using the sampled training log}}{\text{Accuracy using the whole training log}} \tag{2}
\]
If $RAcc$ is close to 1, the prediction method behaves almost the same with the sampled
event log as when the whole data is used for training. Moreover, values higher than 1
indicate that the accuracy of the prediction method has improved. To compute the
improvement in training-time performance, we use the following equations:

\[
R_t = \frac{\text{Training time using the whole data}}{\text{Training time using the sampled data}} \tag{3}
\]

\[
R_{FE} = \frac{\text{Feature extraction time using the whole data}}{\text{Feature extraction time using the sampled data}} \tag{4}
\]
For both equations, the resulting values indicate how many times faster the sampled
log is compared to using all the data.
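The ratios translate directly back into absolute times; a small illustrative computation using numbers reported in Table 1 and Table 3:

```python
def ratio(time_whole, time_sampled):
    """R_t (Eq. 3) and R_FE (Eq. 4) share the same shape: how many times
    faster the sampled log makes a phase compared to the whole log."""
    return time_whole / time_sampled

assert ratio(100.0, 25.0) == 4.0
# E.g., LSTM training on the whole RTFM log takes 3021 s (Table 1); an R_t
# of 9.0 (d10, Table 3) thus corresponds to roughly 3021 / 9.0, about 336 s.
assert round(3021 / 9.0) == 336
```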
5.4 Experimental results
Table 2 presents the reduction rate and the improvement in the feature extraction
phase using the different sampling methods. As expected, the highest reduction rate is
achieved by log10 (as it removes infrequent variants and keeps few traces of frequent
variants), and it accordingly yields the biggest improvement in $R_{FE}$. Moreover, the
lowest reduction is for d2, especially if there are many unique variants in the event log
(i.e., for the BPIC-2012-W event log). We expected smaller event logs to require less
feature extraction time. However, the results indicate that the relationship is not
linear: a greater reduction in the size of the sampled event log yields a much greater
reduction in the feature extraction time.
In Tables 3 and 4, the results of the improvements in $R_t$ and $RAcc$ are shown for the
LSTM and XGBoost prediction methods. As expected, using fewer cases in the
training leads to a higher training-time improvement. Comparing the results in these
two tables with the results in Table 2, it is interesting to see that in some cases, even
with a high reduction rate, the accuracy of the trained prediction model is close to the
case in which the whole training log is used. For example, using d10 for the RTFM
event log, we obtain high accuracy for both prediction methods. In other words, we
are able to improve the performance of the prediction procedure while the accuracy
remains reasonable.
When using the LSTM prediction method on the RTFM event log, there are some
cases where we even observe an accuracy improvement. For example, using d3, there is
a 0.4% improvement in the accuracy of the trained model. This is mainly because of
the existence of highly frequent variants. These variants lead to biased training logs,
and consequently the accuracy of the trained model will be lower for infrequent
behaviors.
6 Discussion
The results indicate that we do not always face the typical trade-off between the
accuracy of the trained model and the performance of the prediction procedure. In
other words, there are cases where the training process is much faster than the normal
procedure, even though the trained model provides almost the same accuracy. We did
not report the results for other metrics; however, similar patterns hold for weighted
recall, precision, and F1-score. Thus, the proposed sampling methods can be used
when we aim to apply hyperparameter optimization [3]: more settings can be analyzed
in a limited time. Moreover, it is reasonable to use the proposed method when we aim
to train an online prediction method, or to train on modest hardware such as cell
phones.
Table 2: The reduction in the size of the training logs (i.e., RS) and the improvement in the performance of
the feature extraction part (i.e., RFE) using different sampling methods.

                 |    d2      |    d3      |    d10      |      log2       |      log3       |      log10      |     unique
Event Log        | RS   RFE   | RS   RFE   | RS    RFE   | RS     RFE      | RS     RFE      | RS     RFE      | RS     RFE
RTFM [14]        | 1.99  4.8  | 3.0  11.1  | 9.8   106.9 | 153.5  12527.6  | 236.3  23699.2  | 572.3  74912.8  | 285.1  24841.8
BPIC-2012-W [29] | 1.22  1.37 | 1.41  1.80 | 1.66  2.51  | 6.06   22.41    | 9.05   37.67    | 28.50  208.32   | 1.73   2.36
Table 3: The accuracy and the improvement in the performance of prediction using different sampling
methods for LSTM.

            |     d2     |     d3     |    d10     |    log2    |    log3    |   log10    |   unique
Event Log   | RAcc  Rt   | RAcc  Rt   | RAcc  Rt   | RAcc  Rt   | RAcc  Rt   | RAcc  Rt   | RAcc  Rt
RTFM        | 1.001 2.0  | 1.004 2.9  | 0.990 9.0  | 0.716 26.7 | 0.724 33.0 | 0.767 41.8 | 0.631 29.1
BPIC-2012-W | 1.000 1.4  | 0.985 1.3  | 0.938 1.3  | 0.977 4.7  | 0.970 5.8  | 0.876 11.9 | 0.996 1.6
Table 4: The accuracy and the improvement in the performance of prediction using different sampling
methods for XGBoost.

            |     d2     |     d3     |     d10     |    log2     |    log3     |    log10    |   unique
Event Log   | RAcc  Rt   | RAcc  Rt   | RAcc  Rt    | RAcc  Rt    | RAcc  Rt    | RAcc  Rt    | RAcc  Rt
RTFM        | 1.000 2.4  | 1.000 1.4  | 1.000 84.1  | 0.686 126.4 | 0.706 191.8 | 0.772 355.0 | 0.582 297.7
BPIC-2012-W | 0.999 2.3  | 0.998 2.4  | 0.997 3.4   | 0.923 10.7  | 0.970 16.7  | 0.883 64.8  | 0.997 2.8
Another important outcome of the results is that, for different event logs, we should
use different sampling methods to achieve the highest performance. For example, for
the RTFM event log, as there are some highly frequent variants, the division
distribution may be more useful. In other words, independently of the prediction
method used, if we change the distribution of variants (e.g., using the unique
distribution), the accuracy is expected to decrease sharply. However, for event logs
with a more uniform distribution, we can use the logarithmic and unique distributions
to sample event logs. The results indicate that the effect of the chosen distribution
(i.e., unique, division, or logarithmic) is more important than that of the chosen
$k$-value. Therefore, it would be valuable to investigate further the characteristics of a
given event log and the sampling parameters suitable for it. For example, if most
variants of a given event log are unique, the division and unique methods cannot
achieve a remarkable $R_S$, and consequently $R_{FE}$ and $R_t$ will be close to 1.
Moreover, the results show that by sampling the event logs too aggressively, although
we obtain a very big improvement in the performance of the prediction procedure, the
accuracy of the trained model is significantly lower than the accuracy of the model
trained on the whole event log. Therefore, we suggest gradually increasing (or
decreasing) the size of the sampled event log in hyperparameter optimization
scenarios.
By analyzing the results obtained with common prediction methods, we have found
that infrequent activities can be ignored under some hyperparameter settings. This is
mainly because the event logs are imbalanced with respect to these infrequent
activities. Using sampling methods that modify the distribution of the event logs, such
as the unique method, can help the prediction methods to also take these activities
into account.
Finally, in real scenarios, the process can change over time for different reasons [6].
This phenomenon is usually called concept drift. By considering the whole event log
for training the prediction model, it is most probable that these changes are not
reflected in the prediction. Using the proposed sampling procedure, and giving higher
priorities to newer traces, we are able to adapt to such changes faster, which may be
critical for specific applications.
7 Conclusion
In this paper, we proposed using subsets of event logs to train prediction models. We
proposed different sampling methods for next activity prediction. These methods are
implemented in the ProM framework. To evaluate the proposed methods, we applied
them to two real event logs and used two state-of-the-art prediction methods: LSTM
and XGBoost. The experimental results have shown that, using the proposed methods,
we are able to improve the performance of the next activity prediction procedure
while retaining an acceptable accuracy (in some experiments, the accuracy even
increased). However, there is a relation between the characteristics of an event log and
the parameters that are suitable for sampling it. The proposed methods can be helpful
in situations where we aim to train a model quickly, or in hyperparameter
optimization scenarios. Moreover, in cases where the process can change over time,
sampling methods allow us to adapt to the modified process more quickly.
To continue this research, we aim to extend the experiments to study the relationship between event log characteristics and the sampling parameters. Additionally, we plan to provide sampling methods that help prediction methods to predict infrequent activities, which could be more critical in the process. Finally, it is interesting to further investigate the use of sampling methods for other prediction applications, such as last activity and remaining time prediction.
Acknowledgements
We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research interactions.
References
[1] van der Aalst, Wil M. P. Process Mining - Data Science in Action, Second Edition.
Springer, 2016. isbn: 978-3-662-49850-7. doi:10.1007/978-3-662-49851-
4.
[2] van der Aalst, Wil M. P., M. H. Schonenberg, and Minseok Song. “Time predic-
tion based on process mining”. In: Information Systems 36.2 (2011), pp. 450–475.
doi:10.1016/j.is.2010.09.001.
[3] Bergstra, James, Rémi Bardenet, Yoshua Bengio, et al. “Algorithms for Hyper-
Parameter Optimization”. In: Advances in Neural Information Processing Sys-
tems 24: 25th Annual Conference on Neural Information Processing Systems 2011.
Proceedings of a meeting held 12-14 December 2011, Granada, Spain. Ed. by Shawe-
Taylor, John, Richard S. Zemel, Peter L. Bartlett, et al. 2011, pp. 2546–2554. url:
https://proceedings.neurips.cc/paper/2011/hash/86e8f7ab32cfd12577bc2619bc635690-Abstract.html.
[4] Breiman, Leo. “Bagging Predictors”. In: Machine Learning 24.2 (1996), pp. 123–
140. doi:10.1007/BF00058655.
[5] Breuker, Dominic, Martin Matzner, Patrick Delfmann, et al. “Comprehensible
Predictive Models for Business Processes”. In: MIS Quarterly 40.4 (2016), pp. 1009–
1034. url:http://misq.org/comprehensible-predictive-models-
for-business-processes.html.
[6] Carmona, Josep and Ricard Gavaldà. “Online Techniques for Dealing with Con-
cept Drift in Process Mining”. In: Advances in Intelligent Data Analysis XI - 11th
International Symposium, IDA 2012, Helsinki, Finland, October 25-27, 2012. Pro-
ceedings. Ed. by Hollmén, Jaakko, Frank Klawonn, and Allan Tucker. Vol. 7619.
Lecture Notes in Computer Science. Springer, 2012, pp. 90–102. doi:10.1007/
978-3-642-34156-4_10.
[7] Chen, Tianqi and Carlos Guestrin. “XGBoost: A Scalable Tree Boosting Sys-
tem”. In: Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-
17, 2016. Ed. by Krishnapuram, Balaji, Mohak Shah, Alexander J. Smola, et al.
ACM, 2016, pp. 785–794. doi:10.1145/2939672.2939785.
[8] Evermann, Joerg, Jana-Rebecca Rehse, and Peter Fettke. “Predicting process be-
haviour using deep learning”. In: Decision Support Systems 100 (2017), pp. 129–
140. doi:10.1016/j.dss.2017.04.003.
[9] Fani Sani, Mohammadreza, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst.
“Conformance Checking Approximation Using Subset Selection and Edit Dis-
tance”. In: Advanced Information Systems Engineering - 32nd International Con-
ference, CAiSE 2020, Grenoble, France, June 8-12, 2020, Proceedings. Ed. by Dust-
dar, Schahram, Eric Yu, Camille Salinesi, et al. Vol. 12127. Lecture Notes in Com-
puter Science. Springer, 2020, pp. 234–251. doi:10 . 1007 / 978 - 3 - 030 -
49435-3_15.
[10] Fani Sani, Mohammadreza, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst.
“The impact of biased sampling of event logs on the performance of process dis-
covery”. In: Computing 103.6 (2021), pp. 1085–1104. doi:10. 1007 / s00607 -
021-00910-4.
[11] García, Salvador, Julián Luengo, and Francisco Herrera. Data Preprocessing in
Data Mining. Vol. 72. Intelligent Systems Reference Library. Springer, 2015. isbn:
978-3-319-10246-7. doi:10.1007/978-3-319-10247-4.
[12] Huang, Zhiheng, Wei Xu, and Kai Yu. “Bidirectional LSTM-CRF Models for
Sequence Tagging”. In: CoRR abs/1508.01991 (2015). arXiv: 1508.01991.url:
http://arxiv.org/abs/1508.01991.
[13] de Leoni, Massimiliano, Wil M. P. van der Aalst, and Marcus Dees. “A general
process mining framework for correlating, predicting and clustering dynamic be-
havior based on event logs”. In: Information Systems 56 (2016), pp. 235–257. doi:
10.1016/j.is.2015.07.003.
[14] de Leoni, Massimiliano and Felix Mannhardt. “Road traffic fine management
process”. In: Eindhoven University of Technology. Dataset (2015).
[15] Márquez-Chamorro, Alfonso Eduardo, Manuel Resinas, and Antonio Ruiz-Cortés.
“Predictive Monitoring of Business Processes: A Survey”. In: IEEE Transactions
on Services Computing 11.6 (2018), pp. 962–977. doi:10.1109 / TSC . 2017 .
2772256.
[16] Navarin, Nicolò, Beatrice Vincenzi, Mirko Polato, et al. “LSTM networks for
data-aware remaining time prediction of business process instances”. In: 2017 IEEE
Symposium Series on Computational Intelligence, SSCI 2017, Honolulu, HI, USA,
November 27 - Dec. 1, 2017. IEEE, 2017, pp. 1–7. doi:10.1109/SSCI.2017.
8285184.
[17] Nguyen, An, Srijeet Chatterjee, Sven Weinzierl, et al. “Time Matters: Time-Aware
LSTMs for Predictive Business Process Monitoring”. In: Process Mining Work-
shops - ICPM 2020 International Workshops, Padua, Italy, October 5-8, 2020, Re-
vised Selected Papers. Ed. by Leemans, Sander J. J. and Henrik Leopold. Vol. 406.
Lecture Notes in Business Information Processing. Springer, 2020, pp. 112–123.
doi:10.1007/978-3-030-72693-5_9.
[18] Park, Gyunam and Minseok Song. “Predicting performances in business pro-
cesses using deep neural networks”. In: Decision Support Systems 129 (2020). doi:
10.1016/j.dss.2019.113191.
[19] Pegoraro, Marco, Merih Seran Uysal, David Benedikt Georgi, et al. “Text-Aware
Predictive Monitoring of Business Processes”. In: 24th International Conference
on Business Information Systems, BIS 2021, Hannover, Germany, June 15-17, 2021.
Ed. by Abramowicz, Witold, Sören Auer, and Elzbieta Lewanska. TIB Open
Publishing, 2021, pp. 221–232. doi:10.52825/bis.v1i.62.
[20] Polato, Mirko, Alessandro Sperduti, Andrea Burattin, et al. “Time and activity
sequence prediction of business process instances”. In: Computing 100.9 (2018),
pp. 1005–1031. doi:10.1007/s00607-018-0593-x.
[21] Pourghassemi, Behnam, Chenghao Zhang, Joo Hwan Lee, et al. “On the Limits
of Parallelizing Convolutional Neural Networks on GPUs”. In: SPAA ’20: 32nd
ACM Symposium on Parallelism in Algorithms and Architectures, Virtual Event,
USA, July 15-17, 2020. Ed. by Scheideler, Christian and Michael Spear. ACM,
2020, pp. 567–569. doi:10.1145/3350755.3400266.
[22] Powers, David M. W. “Evaluation: from precision, recall and F-measure to ROC,
informedness, markedness and correlation”. In: CoRR abs/2010.16061 (2020). arXiv:
2010.16061.url:https://arxiv.org/abs/2010.16061.
[23] Qafari, Mahnaz Sadat and Wil M. P. van der Aalst. “Root Cause Analysis in Pro-
cess Mining Using Structural Equation Models”. In: Business Process Manage-
ment Workshops - BPM 2020 International Workshops, Seville, Spain, Septem-
ber 13-18, 2020, Revised Selected Papers. Ed. by del-Río-Ortega, Adela, Henrik
Leopold, and Flávia Maria Santoro. Vol. 397. Lecture Notes in Business Infor-
mation Processing. Springer, 2020, pp. 155–167. doi:10.1007/978-3-030-
66498-5_12.
[24] Rogge-Solti, Andreas and Mathias Weske. “Prediction of Remaining Service Exe-
cution Time Using Stochastic Petri Nets with Arbitrary Firing Delays”. In: Service-
Oriented Computing - 11th International Conference, ICSOC 2013, Berlin, Ger-
many, December 2-5, 2013, Proceedings. Ed. by Basu, Samik, Cesare Pautasso, Liang
Zhang, et al. Vol. 8274. Lecture Notes in Computer Science. Springer, 2013, pp. 389–
403. doi:10.1007/978-3-642-45005-1_27.
[25] Senderovich, Arik, Chiara Di Francescomarino, Chiara Ghidini, et al. “Intra and
Inter-case Features in Predictive Process Monitoring: A Tale of Two Dimensions”.
In: Business Process Management - 15th International Conference, BPM 2017,
Barcelona, Spain, September 10-15, 2017, Proceedings. Ed. by Carmona, Josep, Gre-
gor Engels, and Akhil Kumar. Vol. 10445. Lecture Notes in Computer Science.
Springer, 2017, pp. 306–323. doi:10.1007/978-3-319-65000-5_18.
[26] Tax, Niek, Ilya Verenich, Marcello La Rosa, et al. “Predictive Business Process
Monitoring with LSTM Neural Networks”. In: Advanced Information Systems
Engineering - 29th International Conference, CAiSE 2017, Essen, Germany, June
12-16, 2017, Proceedings. Ed. by Dubois, Eric and Klaus Pohl. Vol. 10253. Lecture
Notes in Computer Science. Springer, 2017, pp. 477–492. doi:10.1007/978-
3-319-59536-8_30.
[27] Teinemaa, Irene, Marlon Dumas, Fabrizio Maria Maggi, et al. “Predictive Busi-
ness Process Monitoring with Structured and Unstructured Data”. In: Business
Process Management - 14th International Conference, BPM 2016, Rio de Janeiro,
Brazil, September 18-22, 2016. Proceedings. Ed. by Rosa, Marcello La, Peter Loos,
and Oscar Pastor. Vol. 9850. Lecture Notes in Computer Science. Springer, 2016,
pp. 401–417. doi:10.1007/978-3-319-45348-4_23.
[28] Teinemaa, Irene, Marlon Dumas, Marcello La Rosa, et al. “Outcome-Oriented
Predictive Process Monitoring: Review and Benchmark”. In: ACM Transactions
on Knowledge Discovery from Data 13.2 (2019), 17:1–17:57. doi:10.1145/3301300.
[29] Van Dongen, Boudewijn F. BPI Challenge 2012. 2012. doi:10.4121/UUID:
3926DB30-F712-4394-AEBC-75976070E91F. url:https://data.4tu.nl
/repository/uuid:3926db30-f712-4394-aebc-75976070e91f.
[30] Verbeek, Eric, Joos C. A. M. Buijs, Boudewijn F. van Dongen, et al. “ProM 6:
The Process Mining Toolkit”. In: Proceedings of the Business Process Management
2010 Demonstration Track, Hoboken, NJ, USA, September 14-16, 2010. Ed. by
Rosa, Marcello La. Vol. 615. CEUR Workshop Proceedings. CEUR-WS.org, 2010.
url:http://ceur-ws.org/Vol-615/paper13.pdf.
[31] Wang, Tongzhou, Jun-Yan Zhu, Antonio Torralba, et al. “Dataset Distillation”.
In: CoRR abs/1811.10959 (2018). arXiv: 1811.10959.url:http://arxiv.
org/abs/1811.10959.
[32] Wilson, D. Randall and Tony R. Martinez. “Reduction Techniques for Instance-
Based Learning Algorithms”. In: Machine Learning 38.3 (2000), pp. 257–286.
doi:10.1023/A:1007626913721.
[33] Wilson, Dennis L. “Asymptotic Properties of Nearest Neighbor Rules Using Edited
Data”. In: IEEE Transactions on Systems, Man and Cybernetics 2.3 (1972), pp. 408–
421. doi:10.1109/TSMC.1972.4309137.
[34] Zhou, Lina, Shimei Pan, Jianwu Wang, et al. “Machine learning on big data: Op-
portunities and challenges”. In: Neurocomputing 237 (2017), pp. 350–361. doi:
10.1016/j.neucom.2017.01.026.