Performance-Preserving Event Log Sampling
for Predictive Monitoring
Mohammadreza Fani Sani 1, Mozhgan Vazifehdoostirani 2,
Gyunam Park 1, Marco Pegoraro 1, Sebastiaan J. van Zelst 3,1, and
Wil M.P. van der Aalst 1,3
1Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Aachen, Germany
{fanisani, gnpark, pegoraro, s.j.v.zelst, wvdaalst}@pads.rwth-aachen.de
2Industrial Engineering and Innovation Science, Eindhoven University of Technology,
Eindhoven, the Netherlands
m.vazifehdoostirani@tue.nl
3Fraunhofer FIT, Birlinghoven Castle, Sankt Augustin, Germany
Abstract
Predictive process monitoring is a subfield of process mining that aims to estimate
case or event features for running process instances. Such predictions are of significant
interest to the process stakeholders. However, most state-of-the-art methods for
predictive monitoring require the training of complex machine learning models, which
is often inefficient. Moreover, most of these methods require a hyper-parameter
optimization that involves several repetitions of the training process, which is not
feasible in many real-life applications. In this paper, we propose an instance selection
procedure that allows sampling training process instances for prediction models. We
show that our instance selection procedure allows for a significant increase in training
speed for next activity and remaining time prediction methods while maintaining
reliable levels of prediction accuracy.
Keywords: Process Mining · Predictive Monitoring · Sampling · Machine Learning ·
Deep Learning · Instance Selection.
Colophon
This work is licensed under a Creative Commons “Attribution-NonCommercial 4.0 In-
ternational” license.
© the authors. Some rights reserved.
This document is an Author Accepted Manuscript (AAM) corresponding to the following scholarly paper:
Fani Sani, Mohammadreza, Mozhgan Vazifehdoostirani, Gyunam Park, Marco Pegoraro, Sebastiaan J. van Zelst, and
Wil M. P. van der Aalst. “Performance-Preserving Event Log Sampling for Predictive Monitoring”. In: Journal of
Intelligent Information Systems (2023)
Please, cite this document as shown above.
Publication chronology:
•2022-03-09: full text submitted to the Springer Journal of Intelligent Information Systems
•2022-06-01: major revision requested
•2022-09-01: revised version submitted
•2022-11-10: minor revision requested
•2022-12-07: revised version submitted
•2022-12-29: notification of acceptance
•2023-03-06: published
The published version referred to above is © Springer.
Correspondence to:
Mohammadreza Fani Sani, Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany
Email: fanisani@pads.rwth-aachen.de ·ORCID: 0000-0003-3152-2103
Content: 28 pages, 1 figure, 13 tables, 46 references. Typeset with pdfLaTeX, Biber, and BibLaTeX.
Please do not print this document unless strictly necessary.
1 Introduction
The main goal of predictive process monitoring is to provide timely information by
predicting the behavior of business processes [3], enabling proactive actions to improve
the performance of the process [22]. It provides various kinds of predictive information,
such as the next activity performed in a process instance (e.g., for a patient or a
product) [6], its waiting time for an activity, or its remaining time to complete the
process [19]. For instance, by predicting a long waiting time for a patient's registration,
one can bypass the activity or add more resources to perform it.
A plethora of approaches have been proposed to support predictive process monitoring.
In particular, with the recent breakthroughs in machine learning, various machine
learning-based approaches have been developed [19]. The emergence of ensemble
learning methods has led to accuracy improvements in different areas [5]. eXtreme
Gradient Boosting (XGBoost) [9] has shown promising results, often outperforming other
ensemble methods such as Random Forest or using a single regression tree [35, 40].
Furthermore, techniques based on deep neural networks, e.g., Long Short-Term Memory
(LSTM) networks, have shown high performance in different predictive tasks [10].
However, machine learning-based techniques are computationally expensive due to
their training process [46]. Moreover, they often require exhaustive hyper-parameter
tuning to provide acceptable accuracy. Such limitations hinder the application of
machine learning-based techniques to real-life business processes, where new prediction
models are required at short intervals to adapt to changing business situations. Business
analysts need to test the efficiency and reliability of their conclusions via repeated
training of different prediction models with different parameters [19]. Such long
training times limit the applicability of these techniques given the limitations in time
and hardware [31].
In this regard, instance selection has been studied as a promising direction of research
for reducing original datasets to a manageable volume to perform machine learning tasks,
while the quality of the results (e.g., accuracy) is maintained as if the original dataset
were used [15]. Instance selection techniques are categorized into two classes based on
the way they select instances. First, some techniques select the instances at the
boundaries of classes. For instance, the Decremental Reduction Optimization Procedure
(DROP) [44] selects instances using k-Nearest Neighbors by iteratively discarding an
instance if its neighbors are correctly classified without it. Other techniques preserve
the instances residing inside classes, e.g., the Edited Nearest Neighbor (ENN) [45]
preserves instances by repeatedly discarding an instance if it does not belong to the
class of the majority of its neighbors.
However, existing instance selection techniques cannot be directly applied to the
training of predictive process monitoring models, since such techniques assume
independence among instances [44]. Instead, in predictive process monitoring, instances
are computed from event data recorded by the information systems supporting business
processes [17]. Thus, they are highly correlated [2] through the notion of a case, e.g.,
a manufactured product in a factory or a customer in a store.
In this work, we suggest an instance selection approach for predicting the next
activity, the remaining time, and the outcome of a process, which are the main
applications of predictive business process monitoring. By considering the
characteristics of the event data, the proposed approach samples the event data such
that the training speed is improved while the accuracy of the resulting prediction model
is maintained. We have evaluated the proposed methods using three real-life datasets and
state-of-the-art techniques for predictive business process monitoring, including
LSTM [16] and XGBoost [9].
This paper extends our earlier work presented in [11] along the following dimensions:
1) evaluating the applicability of the proposed approach, 2) enhancing the accessibility
of the work, and 3) extending the discussion of strengths and limitations. First, we
have evaluated the applicability of the proposed approach both task-wise and
domain-wise. For the task-wise evaluation, we have selected the three most well-known
predictive monitoring tasks (i.e., next activity, remaining time, and outcome
prediction) and evaluated the performance of the proposed approach on these different
tasks. For the domain-wise evaluation, we have evaluated the performance of our proposed
approach on real-life event logs from different domains, including the finance,
government, and healthcare domains. Second, we have extended the accessibility of the
proposed approach by implementing the proposed sampling methods on the Python platform
as well as the Java platform. Finally, we have extensively discussed the strengths and
limitations of the proposed approach, providing foundations for further research.
The remainder of this paper is organized as follows. We discuss the related work in
Section 2. Next, we present the preliminaries in Section 3 and the proposed methods in
Section 4. Afterward, Section 5 evaluates the proposed methods using real-life event
data, and Section 6 provides a discussion. Finally, Section 7 concludes the paper.
2 Related Work
This section presents the related work on predictive process monitoring, time
optimization, and instance sampling.
2.1 Predictive Process Monitoring
Predictive process monitoring is an exceedingly active field, both currently and
historically, thanks to the compatibility of the process sciences with other branches of
data science that include inference techniques, such as statistics and machine learning.
At its core, the fundamental component of many predictive monitoring approaches is the
abstraction
technique used to obtain a fixed-length representation of the process component subject
to the prediction (often, but not always, process traces). In earlier approaches, the
need for such an abstraction was met through model-aware techniques, employing process
models and replay techniques on partial traces to abstract a flat representation of
event sequences. Such process models are mostly discovered automatically from a set of
available complete traces, and require perfect fitness on training instances (and,
seldom, also on unseen test instances). For instance, van der Aalst et al. [3] proposed
a time prediction framework based on replaying partial traces on a transition system,
effectively clustering training instances by control-flow information. This framework
has later been the basis for a prediction method by Polato et al. [30], where the
transition system is annotated with an ensemble of SVR and Naïve Bayes classifiers to
perform a more accurate time estimation. Some more recent approaches split the
predictive contribution of process models and machine learning models in a
perspective-wise manner: for instance, Park et al. [24] obtain a representation of the
performance perspective using an annotated transition system, and design an ensemble
with a deep neural network to obtain the final predictive model. A related approach,
albeit more linked to the simulation domain and based on a Monte Carlo method, is the
one proposed by Rogge-Solti and Weske [34], which maps partial process instances onto an
enriched Petri net.
Recently, predictive process monitoring has started to use a plethora of machine
learning approaches, achieving varying degrees of success. For instance, Teinemaa et
al. [39] provided a framework to combine text mining methods with Random Forest and
Logistic Regression. Senderovich et al. [35] studied the effect of using intra-case and
inter-case features in predictive process monitoring and showed promising results for
XGBoost compared to other ensemble and linear methods. A comprehensive benchmark on
using classical machine learning approaches for outcome-oriented predictive process
monitoring tasks [40] has shown that XGBoost is the best-performing classifier among
different machine learning approaches such as SVM, Decision Tree, Random Forest, and
Logistic Regression.
More recent methods are model-unaware and rely on a single, more complex machine
learning model instead of an ensemble. In fact, this evolution of predictive monitoring
mimics the advancement in the accuracy of newer machine learning approaches,
specifically the numerous and sophisticated models based on deep neural networks that
have been developed in the last decade. The LSTM network model has proven to be
particularly effective for predictive monitoring [10, 38], since the recurrent
architecture can natively support sequences of data of arbitrary length. It allows
performing trace prediction while employing a fixed-length event abstraction, which can
be based on control-flow alone [10, 38], or be data-aware [20], time-aware [21],
text-aware [27], or model-aware [24]. Additionally, rather than leveraging control-flow
information for prediction, some recent research aims to use predictive monitoring to
reconstruct missing control-flow attributes such as labels [1] or case
identifiers [29, 28]. However, the
body of work currently standing in predictive process monitoring, regardless of the
architecture of the predictive model and/or the target of the prediction, does not
include strategies for performance-preserving sampling.
2.2 Time Optimization and Instance Sampling
The latest research developments in the field of predictive monitoring, much like both
this paper and the application of machine learning techniques in other domains, shift
the focus away from increasing the quality of predictions (with respect to a given error
metric or a given benchmark) and specialize in enriching the results of the prediction
with additional aspects or perspectives. Unlike the methods mentioned in the previous
paragraph, this is usually achieved through dedicated machine learning models, or
modifications thereof, rather than by designing specific event- or trace-level
abstractions. For instance, many scholars have attempted to make predictions more
transparent through the use of explainable machine learning
techniques [42, 37, 36, 14, 23]. More related to our present work, Pauwels and
Calders [26] propose a technique to avoid the time expenditure caused by retraining
machine learning models; this is necessary when the models are no longer representative,
for instance, when changes occur in the underlying process (caused, e.g., by concept
drift). While in this paper we focus on the data and propose a solution based on
sampling, Pauwels and Calders intervene on the model side, devising an incremental
training schema that accounts for new information in an efficient way.
Another concept similar to the idea proposed in this paper, and of current interest in
the field of machine learning, is dataset distillation: utilizing a dataset to obtain a
smaller set of training instances that contains the same information (with respect to
training a machine learning model) [8]. While this is not considered sampling, since
some instances of the distilled dataset are created ex novo, it is an approach very
similar to the one we illustrate in our paper.
The concept of instance sampling, or instance subset selection, is present in the
context of process mining at large, albeit the development of such techniques is very
recent. Some instance selection algorithms have been proposed to aid classical process
mining tasks. For example, [13] proposes to use instance selection techniques to improve
the performance of process discovery algorithms; in this context, the goal is to
automatically obtain a descriptive model of the process on the basis of the data
recorded about the historical executions of process instances, which is the same
starting point as the present work. Then, the work in [12] applies the same concept and
integrates the edit distance to obtain a fast technique to approximate the conformance
checking score of process traces: this consists of measuring the deviation between a
model, which often represents the normative or prescribed behavior of a process, and the
data, which represents the actual behavior of a process. This paper integrates the two
aforementioned works, extending
Table 1: Simple example of an event log. Rows capture events recorded in the context of the execution of
the process. An event describes at what point in time an activity was performed. Other data attributes may
be available as well.
Case-id  Event-id  Activity name       Starting time      Finishing time     ...
...      ...       ...                 ...                ...                ...
7        35        Register (a)        2021-01-02 12:23   2021-01-02 12:25   ...
7        36        Analyze Defect (b)  2021-01-02 12:30   2021-01-02 12:40   ...
7        37        Inform User (g)     2021-01-02 12:45   2021-01-02 12:47   ...
8        39        Register (a)        2021-01-02 12:23   2021-01-02 13:15   ...
7        40        Test Repair (e)     2021-01-02 13:05   2021-01-02 13:20   ...
7        41        Archive Repair (h)  2021-01-02 13:21   2021-01-02 13:22   ...
8        42        Analyze Defect (b)  2021-01-02 12:30   2021-01-02 13:30   ...
...      ...       ...                 ...                ...                ...
the effects of strategic sampling: while in [13] the sampling optimizes descriptive
modeling and in [12] it optimizes process diagnostics, in this work it aids predictive
modeling.
To the best of our knowledge, no work in the literature inspects the effects of building
a training set for predictive process monitoring through strategic process instance
selection, with the exception of our previous work [11], which we extend in this paper.
Here, we examine the underexplored topic of event data sampling and selection for
predictive process monitoring, with the objective of assessing whether and to what
extent prediction quality can be retained when we utilize subsets of the training data.
3 Preliminaries
In this section, some process mining concepts, such as event logs and sampling, are
discussed. In process mining, we use events to provide insights into the execution of
business processes. Event logs, i.e., collections of events representing the execution
of several instances of a process, are the starting point of process mining algorithms.
An example event log is shown in Table 1. Each row in the table corresponds to an event,
and each event is related to a specific activity of the underlying process. Furthermore,
we refer to the collection of events related to a specific process instance as a case
(represented by the Case-id column). Both cases and events may have different
attributes. An event log, i.e., a collection of events and cases, is defined as follows.
Definition 1 (Event Log). Let UE be the universe of events, UC be the universe of cases,
UAT be the universe of attributes, and U be the universe of attribute values. Moreover,
let C ⊆ UC be a non-empty set of cases and let E ⊆ UE be a non-empty set of events. We
define (C, E, πC, πE) as an event log, where πC : C × UAT ↛ U and πE : E × UAT ↛ U are
partial functions. Any event in the event log has a case, and thus,
∄e ∈ E (πE(e, case) ∉ C) and {πE(e, case) | e ∈ E} = C.

Let A ⊆ U be the universe of activities and V ⊆ A* be the universe of sequences of
activities. For any e ∈ E, πE(e, activity) ∈ A, which means that any event in the event
log has an activity. Moreover, for any c ∈ C, πC(c, variant) ∈ A* \ {⟨⟩}, which means
that any case in the event log has a variant.

Therefore, some attributes are mandatory: case and activity for events, and variant for
cases. For example, for the event with Event-id equal to 35 in Table 1,
πE(e, case) = 7 and πE(e, activity) = Register(a).
The variant of a case is its sequence of activities. For example, for case 7 in Table 1,
the variant is ⟨a, b, g, e, h⟩ (for simplicity, we denote each activity by a letter).
Variant information plays an important role: in some process mining applications, e.g.,
process discovery and conformance checking, only this information is considered. In this
regard, event logs can be viewed as multisets of sequences of activities. In the
following, a simple event log is defined.
Definition 2 (Simple event log). Let A be the universe of activities and let the
universe of multisets over a set X be denoted by B(X). A simple event log is L ∈ B(A*).
Moreover, let EL be the universe of event logs and EL = (C, E, πC, πE) ∈ EL be an event
log. We define the function sl : EL → B(A*) that returns the simple event log of an
event log, where sl(EL) = [σ^k | σ ∈ {πC(c, variant) | c ∈ C} ∧
k = |{c ∈ C | πC(c, variant) = σ}|]. The set of unique variants in the event log is
{πC(c, variant) | c ∈ C}.
Therefore, sl returns the multiset of variants in the event log. Note that the size of a
simple event log equals the number of cases in the event log, i.e., |sl(EL)| = |C|.
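To make the two definitions above concrete, the following minimal Python sketch derives
the simple event log, i.e., the multiset of variants, from a list of events. The tuple
layout and the single-letter activities are illustrative assumptions, not the authors'
data structures.

    from collections import Counter

    # (case-id, event-id, activity), assumed ordered by time within each case;
    # here: the events of cases 7 and 8 of Table 1, with activities as letters.
    events = [
        (7, 35, "a"), (7, 36, "b"), (7, 37, "g"), (7, 40, "e"), (7, 41, "h"),
        (8, 39, "a"), (8, 42, "b"),
    ]

    def simple_log(events):
        """Group events by case and return the multiset of variants, sl(EL)."""
        traces = {}
        for case_id, _, activity in events:
            traces.setdefault(case_id, []).append(activity)
        return Counter(tuple(trace) for trace in traces.values())

    print(simple_log(events))
    # Counter({('a', 'b', 'g', 'e', 'h'): 1, ('a', 'b'): 1})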
In this paper, we use sampling techniques to reduce the size of event logs. An event log
sampling method is defined as follows.
Definition 3 (Event log sampling). Let EL be the universe of event logs and A be the
universe of activities. Moreover, let EL = (C, E, πC, πE) ∈ EL be an event log. We
define the function δ : EL → EL that returns the sampled event log, where if
(C′, E′, π′C, π′E) = δ(EL), then C′ ⊆ C, E′ ⊆ E, π′E ⊆ πE, π′C ⊆ πC, and consequently
sl(δ(EL)) ⊆ sl(EL). We say that δ is a variant-preserving sampling if the set of unique
variants of δ(EL) equals the set of unique variants of EL.
In other words, a sampling method is variant-preserving if and only if all the variants
of the original event log are present in the sampled event log.
To use machine learning methods for prediction, we usually need to transform each case
into one or more features. A feature is defined as follows.
Figure 1: A schematic view of the proposed sampling procedure
Definition 4 (Feature). Let UAT be the universe of attributes, U be the universe of
attribute values, and UC be the universe of cases. Moreover, let AT ⊆ UAT be a set of
attributes. A feature is a relation between a sequence of attribute values for AT and
the target attribute value, i.e., f ∈ (U^|AT| × U). We define
fe : UC × EL → B(U^|AT| × U) as a function that receives a case and an event log and
returns a multiset of features.
For next and final activity prediction, the target attribute value is an activity.
However, for remaining time prediction, the target attribute value is a numerical value.
Moreover, a case in the event log may yield different features. For example, suppose
that we only consider the activities: for the case ⟨a, b, c, d⟩, we may have (⟨a⟩, b),
(⟨a, b⟩, c), and (⟨a, b, c⟩, d) as features. Furthermore, ∑_{c∈C} fe(c, EL) is the
multiset of features of event log EL = (C, E, πC, πE) that can be given to different
machine learning algorithms. For more details on how to extract features from event
logs, please refer to [33].
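As an illustration of the prefix-shaped features in the example above, the following
sketch enumerates the (partial trace, next activity) pairs of a single case. It is a
minimal reading of fe restricted to the activity attribute, not the exact feature
encoding used in the experiments.

    def prefix_features(variant):
        """Return (prefix, next-activity) pairs for next activity prediction."""
        return [(tuple(variant[:i]), variant[i]) for i in range(1, len(variant))]

    print(prefix_features(["a", "b", "c", "d"]))
    # [(('a',), 'b'), (('a', 'b'), 'c'), (('a', 'b', 'c'), 'd')]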
4 Proposed Sampling Methods
In this section, we propose an event log preprocessing procedure that helps prediction
algorithms perform faster while maintaining reasonable accuracy. A schematic view of the
proposed instance selection approach is presented in Figure 1. First, we traverse the
event log and find the variants and the corresponding traces of each variant in the
event log. Moreover, the distributions of different data attributes in each variant are
computed. Afterward, using different sorting and instance selection strategies, we
select some of the cases and return the sampled event log. In the following, each of
these steps is explained in more detail. To illustrate the steps, we use an example
event log with 10 cases, shown in Table 2.
1. Traversing the event log: In this step, the unique variants of the event log and
the corresponding traces of each variant are determined. In other words, consider an
event log EL whose set of unique variants is {σ1, . . . , σn}; we aim to split EL into
EL1, . . . , ELn, where ELi only contains the cases Ci = {c ∈ C | πC(c, variant) = σi}
and the events Ei = {e ∈ E | πE(e, case) ∈ Ci}. Obviously, ⋃_{1≤i≤n} Ci = C and
⋂_{1≤i≤n} Ci = ∅. For the event log presented in Table 2, we have n = 4 variants and
C1 = {c1, c3, c4, c9, c10}, C2 = {c2, c5, c8}, C3 = {c6}, and C4 = {c7}.
2. Computing distributions: In this step, for each variant of the event log, we compute
the distribution of the different data attributes a ∈ AT. It would be more practical if
the interesting attributes were chosen by an expert. Both event and case attributes can
be considered. A simple approach is to compute the frequency of categorical data values.
For numerical data attributes, it is possible to consider the average or the median of
the values over all cases of each variant. In the running example, C3 and C4 have only
one case per variant. However, for C1 and C2, the average Amount is 500 and 460,
respectively.
3. Sorting the cases of each variant: In this step, we aim to sort the traces of each
variant. We sort the traces to give higher priority to those traces that represent the
variant better. One way is to sort the traces based on the frequency of the most
frequent data values of the variant. For example, we can give higher priority to the
traces that contain the more frequent resources of each variant. For the event log
presented in Table 2, we do not need to prioritize the cases in C3 and C4. However, if
we sort the traces of each variant by the distance between their Amount value and the
variant's average, for C1 we obtain c3, c9, c4, c1, and c10; the order for C2 is c5, c2,
and c8. It is also possible to sort the traces based on their arrival time, or randomly.
4. Returning the sampled event log: Finally, depending on the setting of the sampling
function, we return some of the traces with the highest priority for each variant. The
most important point in this step is to decide how many traces of each variant should be
selected.
In the following, some possibilities are introduced.
•Unique selection: In this approach, we select only one trace with the highest
priority per variant. In other words, suppose that L′ = sl(δ(EL)); then
∀σ ∈ L′, L′(σ) = 1. Therefore, using this approach, the size of the sampled event log
equals the number of unique variants of the original log. It is expected that by using
this approach, the frequency distribution of variants will be changed and, consequently,
the resulting prediction model will be less accurate. By applying this sampling method
to the event log presented in Table 2, the sampled event log will have 4 traces, i.e.,
one trace per variant; the corresponding cases are C′ = {c3, c5, c6, c7}. For the
variants that have more than one trace, the trace with the highest priority is chosen
(the one whose Amount value is closest to the average Amount of its variant).
Table 2: An example event log with 10 traces and 4 variants. Each trace has two
attributes: Variant and Amount.

CaseID  Variant        Amount
c1      ⟨a, b, c, d⟩   100
c2      ⟨a, c, b, d⟩   720
c3      ⟨a, b, c, d⟩   400
c4      ⟨a, b, c, d⟩   800
c5      ⟨a, c, b, d⟩   600
c6      ⟨a, c, c, d⟩   750
c7      ⟨a, c, d⟩      170
c8      ⟨a, c, b, d⟩   60
c9      ⟨a, b, c, d⟩   260
c10     ⟨a, b, c, d⟩   940
•Logarithmic distribution: In this approach, we reduce the number of traces in each
variant in a logarithmic way. If L = sl(EL) and L′ = sl(δ(EL)), then
∀σ ∈ L′, L′(σ) = ⌈log_k(L(σ))⌉. Using this approach, the infrequent variants will not
have any trace in the sampled event log, and consequently it is not variant-preserving.
According to the above formula, using a higher base k for the logarithm reduces the size
of the sampled event log more. By using this sampling strategy with k equal to 3 on the
event log presented in Table 2, the cases selected in the sampled event log are
C′ = {c3, c9, c5} (e.g., ⌈log_3(5)⌉ = 2 traces for the first variant). Note that for the
infrequent variants, no trace is selected in the sampled event log.
•Division: This approach is similar to the previous one; however, instead of a
logarithmic scale, we apply the division operator. In this approach,
∀σ ∈ L′, L′(σ) = ⌈L(σ)/k⌉. A higher k results in fewer cases in the sampled event log.
Note that as the ceiling ⌈·⌉ is used in the above formula, with this approach every
variant keeps at least one trace in the sampled event log, and it is therefore
variant-preserving. By using this sampling strategy with k = 4 on the event log
presented in Table 2, the sampled event log will have 5 traces, i.e.,
C′ = {c3, c9, c5, c6, c7}.
It is also possible to consider other selection methods; for example, we can select the
traces completely at random from the original event log. A sketch of the three
strategies above follows.
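The following Python sketch illustrates the three selection strategies on the running
example of Table 2, assuming the cases of each variant are already priority-sorted
(Step 3). It is a minimal reading of Step 4, not the authors' ProM or Python
implementation; the ceiling rounding in the logarithmic strategy is an assumption chosen
to reproduce the running example above.

    import math

    def sample_cases(variant_to_cases, strategy="division", k=2):
        """variant_to_cases maps each variant to its priority-sorted cases."""
        sampled = []
        for cases in variant_to_cases.values():
            if strategy == "unique":
                n = 1                                   # one case per variant
            elif strategy == "logarithmic":
                n = math.ceil(math.log(len(cases), k))  # drops variants of frequency 1
            elif strategy == "division":
                n = math.ceil(len(cases) / k)           # keeps >= 1 case per variant
            else:
                raise ValueError(f"unknown strategy: {strategy}")
            sampled.extend(cases[:n])
        return sampled

    # Table 2, with the cases of each variant sorted as in Step 3:
    log = {
        ("a", "b", "c", "d"): ["c3", "c9", "c4", "c1", "c10"],
        ("a", "c", "b", "d"): ["c5", "c2", "c8"],
        ("a", "c", "c", "d"): ["c6"],
        ("a", "c", "d"):      ["c7"],
    }
    print(sample_cases(log, "unique"))            # ['c3', 'c5', 'c6', 'c7']
    print(sample_cases(log, "logarithmic", k=3))  # ['c3', 'c9', 'c5']
    print(sample_cases(log, "division", k=4))     # ['c3', 'c9', 'c5', 'c6', 'c7']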
By choosing different data attributes in Step 2 and different sorting strategies in
Step 3, we are able to steer which cases the sampling method chooses. Moreover, by
choosing the type of distribution in Step 4, we determine how many cases should be
chosen. To compute how much a sampling method δ reduces the size of a given event log
EL, we use the following equation:

R_S = |sl(EL)| / |sl(δ(EL))|

A higher R_S value means the sampling method reduces the size of the training log more.
By choosing different distribution methods and different k-values, we are able to
control the size of the sampled event log. It should be noted that the proposed method
is applied only to the training event log. In other words, we do not sample the event
logs used as development and test datasets.
5 Evaluation
In this section, we design experiments to answer the research question: "Is it possible
to improve the computational performance of prediction methods by using sampled event
logs, while maintaining a similar accuracy?" It should be noted that the focus of the
experiments is not on tuning the prediction models to achieve higher accuracy. Rather,
we aim to analyze the effect of using sampled event logs (instead of the whole datasets)
on the required time and the accuracy of the prediction models.
In the following, we first explain the evaluation settings and the event logs that are
used. Afterward, we provide some information about the implementation of the sampling
methods, and finally, we show the experimental results.
5.1 Evaluation Setting
In this section, we rst explain the prediction methods and parameters that are used in
the evaluation. Aferward, we discuss the evaluation metrics.
5.1.1 Evaluation Parameters
We have developed the sampling methods as a plug-in in the ProM framework [41],
accessible via https://svn.win.tue.nl/repos/prom/Packages/LogFiltering.
This plug-in takes an event log and returns k different train and test event logs in CSV
format. Moreover, we have also implemented the sampling methods in Python to have all
the evaluations in one workflow.
We have used two machine learning methods to train the prediction models, i.e., LSTM and
XGBoost. For predicting the next activity, our LSTM network consists of an input layer,
two LSTM layers with dropout rates of 10%, and a dense output layer with the Softmax
activation function. We used categorical cross-entropy to calculate the loss and adopted
Adam as the optimizer. We built the same LSTM architecture for remaining time
prediction, with some differences: we employed mean absolute error as the loss function
and Root Mean Squared Propagation (RMSprop) as the optimizer. We used gbtree with a max
depth of 6 as the booster in our XGBoost model for both the next activity and remaining
time prediction tasks. A uniform distribution is used as the sampling method inside our
XGBoost model. To avoid overfitting in both models, the training set is further divided
into a 90% training set and a 10% validation set, to stop training once the model
performance on the validation set stops improving. We used the same parameter settings
of both models for the original event logs and the sampled event logs. The
implementations of these methods are available at
https://github.com/gyunamister/pm-prediction/.
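For concreteness, the following Keras sketch mirrors the next activity architecture
described above: two LSTM layers with 10% dropout, a dense softmax output, categorical
cross-entropy loss, and the Adam optimizer. The layer width and the input dimensions are
illustrative assumptions; the exact configuration is in the linked repository.

    from tensorflow.keras.layers import Dense, Input, LSTM
    from tensorflow.keras.models import Sequential

    def build_next_activity_model(max_prefix_len, n_features, n_activities):
        """Two recurrent layers with 10% dropout, one softmax unit per activity."""
        model = Sequential([
            Input(shape=(max_prefix_len, n_features)),
            LSTM(100, return_sequences=True, dropout=0.1),
            LSTM(100, dropout=0.1),
            Dense(n_activities, activation="softmax"),
        ])
        model.compile(loss="categorical_crossentropy", optimizer="adam",
                      metrics=["accuracy"])
        return model

For the remaining time variant, the output would instead be a single linear unit
compiled with a mean absolute error loss and the RMSprop optimizer.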
To train the prediction models using machine learning methods, we extract features from
the event data. To this end, we use the most commonly used features for each prediction
task, in order to reduce the degrees of freedom in selecting relevant features. In other
words, we focus on comparing the performance of predictions between sampled and
non-sampled event data with a fixed feature space. For instance, for the next activity
prediction, we use the partial trace (i.e., the sequence of historical activities) of
cases and the temporal measures of each activity (e.g., sojourn time) with one-hot
encoding [25]. For the remaining time prediction, we use the partial trace of cases
along with case attributes (e.g., cost), resources, and temporal measures [43].
To sample the event logs, we use three distributions: log distribution, division, and
unique variants. For the log distribution method, we have used bases 2, 3, 5, and 10
(i.e., log2, log3, log5, and log10). For the division method, we have used divisors 2,
3, 5, and 10 (i.e., d2, d3, d5, and d10). For each event log and each sampling method,
we have used 5-fold cross-validation. This means we split the data into 5 groups; one of
the groups is used as the test event log, and the rest are merged as the training event
log. It should be noted that, for each event log, the splitting groups were the same for
all the prediction and sampling methods. Moreover, as the results of the experiments are
non-deterministic, all the experiments have been repeated 5 times, and the average
values are reported. Finally, to have a fair evaluation, one CPU thread has been used in
all the steps.
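The following sketch shows the shape of this protocol with scikit-learn's KFold: the
cases are split into 5 groups, one group serves as the test log, and only the merged
training groups would then be sampled. The case list and the sample_cases call are
placeholders, and the fixed random seed stands in for keeping the folds identical across
all prediction and sampling methods.

    from sklearn.model_selection import KFold

    case_ids = [f"c{i}" for i in range(1, 11)]  # illustrative case identifiers

    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    for train_idx, test_idx in kf.split(case_ids):
        train_cases = [case_ids[i] for i in train_idx]  # to be sampled
        test_cases = [case_ids[i] for i in test_idx]    # never sampled
        # sampled_train = sample_cases(group_by_variant(train_cases), "division", k=2)
        print(sorted(test_cases))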
5.1.2 Metrics
To evaluate the correctness of the prediction methods for predicting the next
activities, we consider two metrics, i.e., Accuracy and F1-score; the F1-score accounts
for imbalanced data [18]. For remaining time prediction, we consider the Mean Absolute
Error (MAE) and Root Mean Squared Error (RMSE) measures, as they were used in [43]; they
are computed as follows.
MAE = (1/n) ∑_{t=1}^{n} |e_t|

RMSE = √( (1/n) ∑_{t=1}^{n} e_t² )

In the above equations, e_t indicates the prediction error for the t-th instance of the
validation data. In other words, we aim to compute the absolute difference between the
predicted and real values (in days) of the remaining time. For both measures, a lower
value means higher accuracy. Note that, similar to [43], we considered seconds as the
time unit to compute these two metrics.
To evaluate how accurate the prediction methods are when using the sampled event logs,
we use relative metrics that compare them with the case where the whole event logs are
used, according to the following equations.

R_Acc = (Accuracy using the sampled training log) / (Accuracy using the whole training log)

R_F1 = (F1-score using the sampled training log) / (F1-score using the whole training log)

In both of the above equations, a value close to 1 means that, using the sampled event
logs, the prediction methods behave almost the same as in the case where the whole data
is used for training. Moreover, values higher than 1 indicate that the accuracy/F1-score
of the prediction methods has improved.
Unlike the previous metrics, for MAE and RMSE a higher value means the prediction model
is less accurate in predicting the remaining time. Therefore, we use the following
measures.

R_MAE = (MAE using the whole training log) / (MAE using the sampled training log)

R_RMSE = (RMSE using the whole training log) / (RMSE using the sampled training log)

In both of the above measures, a higher value means that the applied instance selection
method preserves more accuracy. If the values of the above measures are higher than 1,
the instance selection method improves the accuracy of the prediction models compared to
the case where the whole training data has been used.
To compute the improvement in the performance of feature extraction and training time,
we use the following equations.

R_FE = (Feature extraction time using the whole data) / (Feature extraction time using the sampled data)

R_t = (Training time using the whole data) / (Training time using the sampled data)

For both equations, the resulting values indicate how many times faster applying the
sampled log is compared to using all the data.
5.1.3 Event Logs
We have used three event logs widely used in the literature. In the BPIC-2012-W event
log, relating to a process of an insurance company, the average variant frequency is
low. In the RTFM event log, which corresponds to a road traffic fine management process,
we have some highly frequent variants and several infrequent variants; moreover, the
number of activities in this event log is high. In the Sepsis event log, relating to a
healthcare process, there are several variants, most of which are unique. Some of the
activities in the last two event logs are infrequent, which makes these event logs
imbalanced. Some information about these event logs and the results of using the
prediction methods on them is presented in Table 3. Note that the time-related features
in this table are in seconds.
According to Table 3, using the whole event data we usually obtain high accuracy for
next activity prediction. However, the F1-score is not as high, which is mainly because
the event logs are imbalanced (specifically RTFM and Sepsis). Moreover, the MAE and RMSE
values are very high, specifically for the RTFM event log; this is mainly because
process instances' durations are very long in this event log, and consequently, the e_t
values are higher. Finally, there is a direct relation between the size of the event
logs and the time required for extracting features and training the models.
5.2 Evaluation Results
Here, we provide the results of using sampled training event logs instead of whole
training event logs. First, Table 4 shows how the size of the training data is reduced
by sampling. As expected, the highest reduction occurs when log10 is used: with this
sampling, the size of the RTFM event log is reduced by a factor of more than 1000.
However,
Table 3: Event logs that are used in the evaluation and results of using them for the
next activity and remaining time prediction
Table 4: Reduction in size of training event logs (i.e., R_S) and the improvement in the
feature extraction process (i.e., R_FE) using different sampling methods
Table 5: R_Acc of different event logs when different sampling methods are used.
for the Sepsis event log, as most variants are unique, sampling the training event logs
using the division distribution could not result in a high R_S. Moreover, Table 4 shows
how sampling the event logs can reduce the time required to extract features from the
event data, i.e., R_FE. As expected, there is a correlation between the size reduction
of the sampled event logs and the improvement in R_FE.
In the following, we show how using the sampled event logs affects the next activity,
remaining time, and outcome prediction.
5.2.1 Next Activity Prediction
The accuracy of both the LSTM and XGBoost methods trained using sampled training data is
presented in Table 5. The results indicate that in most cases, when the division
sampling methods are used, we can achieve an accuracy similar to the case where the
whole training data is used, i.e., R_Acc close to 1. In some cases, such as using d2 for
the Sepsis event log and the LSTM method, the accuracy of the trained model is even
(slightly) improved. However, for RTFM, sampling with the logarithmic or unique
distribution strongly changes the frequency distribution of variants and consequently
causes a larger reduction in the accuracy of the prediction models. Moreover, the
accuracy reduction is higher when the XGBoost method is used. The results indicate that
by increasing the R_S value, we lose more information in the training event logs, and
consequently, the quality of the prediction models decreases.
In Table 6, the R_F1 of the trained models is depicted. The results again indicate that
using the sampling method with the division distribution in most cases leads to a
similar (and sometimes higher) F1-score.
Considering the results of these two tables, we find that, specifically when the LSTM
method is used, we obtain similar accuracy and F1-score. Moreover, for Sepsis and
BPIC-2012-W, whose variants have similar frequencies, keeping all the variants (i.e.,
Table 6: R_F1 of different event logs when different sampling methods are used.
Table 7: R_t of different event logs when different sampling methods are used.
using the division and unique distributions) can help the prediction method achieve
results similar to the case where the whole training data is used. However, for the RTFM
event log, which has some highly frequent variants, using the unique distribution
results in lower accuracy and F1-score.
Table 7 shows how much faster training is when using the sampled training data instead
of the whole event data. There is a direct relationship between the size reduction and
R_t (refer to the results in Table 4). However, in most cases, the performance
improvement is bigger for the XGBoost method. Considering the results in this table and
Table 6, we find that using the sampling methods, we are able to improve the performance
of the next activity prediction methods on the used event logs while providing similar
results. However, sampling too aggressively (e.g., applying log10 to RTFM) results in
lower accuracy.
5.2.2 Remaining Time Prediction
In Table 8 and Table 9, we show how the MAE and RMSE of the different remaining time
prediction methods change when the sampled event logs are used. The results indicate
that for the LSTM method, independent of the sampling method, for all event logs we are
able to provide predictions similar to the case where the whole event logs are used. It
seems that the settings used for training the prediction method are not good; in other
words, the trained model is not accurate enough. We repeated this experiment for LSTM
with several different parameters, but obtained almost the same results. This is mainly
caused by the challenging nature of the remaining time prediction task compared to
classification-based problems (such as next activity and outcome prediction). However,
by sampling the training event logs, we keep the quality of the prediction models.
For the XGBoost method, the results indicate that as long as we do not sample down to a
small number of traces (as happens, for example, with logarithmic sampling), we can have
high R_MAE and R_RMSE.
Table 8: R_MAE of different event logs when different sampling methods are used.
Table 9: R_RMSE of different event logs when different sampling methods are used.
In general, as the main attribute for next activity prediction is the sequence of
activities, that task is less sensitive to sampling. However, to predict the remaining
time, the other data attributes can be essential too. In other words, for remaining time
prediction, we need larger sampled event logs.
Table 10 shows how, by sampling event logs, we are able to reduce the required training
time and improve the performance of the remaining time prediction process. Considering
the results in Table 10 and Table 7, as expected, a higher R_S leads to a higher R_t
value.
5.3 Outcome Prediction
For outcome prediction, in order to facilitate comparison and remain consistent with
previous work on outcome prediction, we transform each event log into different event
logs [40]. For example, we transform the BPIC-2012 event log into BPIC-2012-Accepted,
BPIC-2012-Cancelled, and BPIC-2012-Declined.
The R_Acc and R_F1 of both the LSTM and XGBoost methods trained for outcome prediction
using sampled training data are presented in Table 11 and Table 12, respectively. The
results indicate that, in many cases, we are able to improve the accuracy of the outcome
prediction algorithms. Specifically, using the unique strategy for sampling the traces
of the BPIC-2012-Accepted event log leads to a considerable improvement for both the
LSTM and XGBoost methods.
Table 10: R_t of different event logs when different sampling methods are used.
Table 11: R_Acc of different event logs when different sampling methods are used for outcome prediction.
Table 12: R_F1 of different event logs when different sampling methods are used for outcome prediction.
Unlike the other two applications, i.e., next activity and remaining time prediction,
even when sampling some event logs aggressively, e.g., using log10, we can obtain
results similar to the cases in which the whole training event logs are used.
In Table 13, the R_t of the different sampling methods is shown. The performance
improvement is usually bigger for the LSTM method. There are several cases in which we
are not able to improve the performance of the prediction method (R_t values less
than 1). This happens mainly for the unique sampling method; one reason could be that by
removing the frequencies, the convergence time of the learning method is increased. When
the logarithmic method is used, the performance improvement is around 50 times, meaning
the training process using the sampled event log is 50 times faster than the case where
the whole training log is used.
6 Discussion
In this section, we discuss the results illustrated in the previous section. The results
indicate that we do not always face the typical trade-off between the accuracy of the
trained model and the performance of the prediction procedure. For example, for next
activity prediction, there are some cases where the training process is much faster
Table 13: R_t of different event logs when different sampling methods are used for outcome prediction.
than the normal procedure, even though the trained model provides almost similar or
higher accuracy and F1-score. Thus, the proposed instance selection procedure can be
applied when we aim to apply hyper-parameter optimization [4]; in this way, more
settings can be analyzed in a limited time. Moreover, it is reasonable to use the
proposed method when we aim to train an online prediction method, or to train on limited
hardware such as cell phones.
To achieve the highest performance improvement while keeping the trained model accurate
enough, different sampling methods should be used for different event logs. For example,
for the RTFM event log, as there are some highly frequent variants, the division
distribution may be more useful. In other words, independently of the prediction method
used, if we change the distribution of variants (e.g., using the unique distribution),
it is expected that the accuracy will sharply decrease. However, for event logs with a
more uniform distribution, we can use the unique distribution to sample event logs.
Furthermore, the results indicate that the effect of the chosen distribution (i.e.,
unique, division, or logarithmic) is more important than the k-value used. This is
mainly because the logarithmic distribution may remove some of the variants, and the
unique distribution changes the frequency distribution of variants. Therefore, it would
be interesting to investigate further the characteristics of the given event log and the
suitable sampling parameters for each distribution. For example, if most variants of a
given event log are unique (e.g., Sepsis), using the logarithmic distribution leads to a
remarkable R_S, and consequently R_FE and R_t will be very high; however, we will lose
most of the variants, and the trained model might make poor predictions.
By analyzing the results, we found that infrequent activities can be ignored under some
hyper-parameter settings. The significant difference between the F1-score and Accuracy
values in Table 3 indicates this problem too. Using sampling methods that modify the
distribution of the event logs, such as the unique method, can help the prediction
methods to also consider these activities. However, as these activities are infrequent,
improving their prediction would not have a large impact on the presented aggregated
F1-score value.
Finally, in real-life business scenarios, the process can change for different
reasons [7]. This phenomenon is usually called concept drift. By considering the whole
event log for training the prediction model, it is most probable that these changes are
not considered in the prediction. Using the proposed sampling procedure and giving
higher priority to newer traces, it is expected that we are able to adapt to changes
faster, which may be critical for specific applications.
Limitations
Comparing the results for next activity and remaining time prediction, we found that
predicting the remaining time of the process is more sensitive to instance selection. In
other words, this application requires more data to predict accurately. Thus, for cases
where the target attribute depends more on other data attributes than on the variant
information, we need to sample more data to capture the related information; otherwise,
the trained model might be inaccurate.
We also found a critical problem in predictive monitoring. In some cases, specifically
when using LSTM for predicting the remaining time, the accuracy of the predictions is
low. For next activity prediction, it is possible that the prediction models almost
ignore infrequent activities. In these cases, even if we use the training data for the
evaluation, we do not obtain acceptable results. This problem in machine learning is
called a high bias error [32]. In other words, the training is not effective even when
using the whole data, and we need to change the prediction method (or its parameters).
7 Conclusion
In this paper, we proposed an instance selection approach to improve the performance of
predictive business process monitoring methods. We suggested that it is possible to use
a sample of the training event data instead of the whole training event data. To
evaluate the proposed approach, we considered the main applications of predictive
business process monitoring, i.e., next activity, remaining time, and outcome
prediction. The results of applying the proposed approach on three real-life event logs
with two widely used machine learning methods, i.e., LSTM and XGBoost, indicate that in
most cases we are able to improve the performance of the predictive monitoring
algorithms while providing accuracy similar to the case where the whole training event
logs are used. However, with too aggressive sampling, the accuracy of the trained model
might be reduced. Moreover, we found that the remaining time prediction application is
more sensitive to sampling.
To continue this research, we aim to extend the experiments to study the relationship
between the event log characteristics and the sampling parameters. In other words, we
aim to help the end user adjust the sampling parameters based on the characteristics of
the given event log. Moreover, it would be worthwhile to investigate how the proposed
sampling procedure can be applied to streaming event data, which is potentially one of
the major advantages of the proposed method in a real-life setting. Finally, it is
interesting to investigate feature selection methods for improving the performance of
the predictive monitoring procedure; it is expected that, similar to process instance
sampling, feature selection methods are able to reduce the required training time.
Another important outcome of the results is that for different event logs, we should use
different sampling methods to achieve the highest performance. Considering the many
different machine learning methods and their parameters can increase the search space
and complexity for users. In other words, finding the right sampling setting may be
challenging in real scenarios. Therefore, it would be valuable to research the
relationship between the event log characteristics and the suitable sampling parameters
that can be used for preprocessing the training event log.
Acknowledgment
The authors would like to thank the Alexander von Humboldt (AvH) Stiftung for funding
this research.
Declarations
In this part, we provide some declarations about the conflict of interest, the code
availability, and the availability of the data used in this paper.
•Conflict of interest:
•Code availability: Our proposed methods are available at
https://svn.win.tue.nl/repos/prom/Packages/LogFiltering and
https://github.com/gyunamister/pm-prediction/. For a part of the experiments, we have
used the implementation available at https://github.com/verenich/time-prediction-benchmark.
•Data availability: We have applied our proposed approach to the following three
publicly available datasets (event logs).
–BPIC-2012, accessible via https://data.4tu.nl/articles/dataset/BPI_Challenge_2012/12689204.
–RTFM, accessible via https://data.4tu.nl/articles/dataset/Road_Traffic_Fine_Management_Process/12683249.
–Sepsis, accessible via https://data.4tu.nl/articles/dataset/Sepsis_Cases_-_Event_Log/12707639.
References
[1] van der Aa, Han, Adrian Rebmann, and Henrik Leopold. “Natural language-based
detection of semantic execution anomalies in event logs”. In: Information Systems 102
(2021), p. 101824. doi: 10.1016/j.is.2021.101824.
[2] van der Aalst, Wil M. P. Process Mining - Data Science in Action, Second Edition.
Springer, 2016. isbn: 978-3-662-49850-7. doi: 10.1007/978-3-662-49851-4.
[3] van der Aalst, Wil M. P., M. H. Schonenberg, and Minseok Song. “Time prediction
based on process mining”. In: Information Systems 36.2 (2011), pp. 450–475.
doi: 10.1016/j.is.2010.09.001.
[4] Bergstra, James, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. “Algorithms for
Hyper-Parameter Optimization”. In: Advances in Neural Information Processing Systems 24:
25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a
meeting held 12-14 December 2011, Granada, Spain. Ed. by Shawe-Taylor, John, Richard S.
Zemel, Peter L. Bartlett, Fernando C. N. Pereira, and Kilian Q. Weinberger. 2011,
pp. 2546–2554. url:
https://proceedings.neurips.cc/paper/2011/hash/86e8f7ab32cfd12577bc2619bc635690-Abstract.html.
[5] Breiman, Leo. “Bagging Predictors”. In: Machine Learning 24.2 (1996), pp. 123–140.
doi: 10.1007/BF00058655.
[6] Breuker, Dominic, Martin Matzner, Patrick Delfmann, and Jörg Becker. “Comprehensible
Predictive Models for Business Processes”. In: MIS Quarterly 40.4 (2016), pp. 1009–1034.
url: http://misq.org/comprehensible-predictive-models-for-business-processes.html.
[7] Carmona, Josep and Ricard Gavaldà. “Online Techniques for Dealing with Concept Drift
in Process Mining”. In: Advances in Intelligent Data Analysis XI - 11th International
Symposium, IDA 2012, Helsinki, Finland, October 25-27, 2012. Proceedings. Ed. by
Hollmén, Jaakko, Frank Klawonn, and Allan Tucker. Vol. 7619. Lecture Notes in Computer
Science. Springer, 2012, pp. 90–102. doi: 10.1007/978-3-642-34156-4_10.
[8] Cazenavette, George, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, and Jun-Yan
Zhu. “Dataset Distillation by Matching Training Trajectories”. In: IEEE/CVF Conference
on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24,
2022. IEEE, 2022, pp. 10708–10717. doi: 10.1109/CVPR52688.2022.01045.
[9] Chen, Tianqi and Carlos Guestrin. “XGBoost: A Scalable Tree Boosting System”. In:
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, San Francisco, CA, USA, August 13-17, 2016. Ed. by Krishnapuram, Balaji,
Mohak Shah, Alexander J. Smola, Charu C. Aggarwal, Dou Shen, and Rajeev Rastogi. ACM,
2016, pp. 785–794. doi: 10.1145/2939672.2939785.
[10] Evermann, Joerg, Jana-Rebecca Rehse, and Peter Fettke. “Predicting process
behaviour using deep learning”. In: Decision Support Systems 100 (2017), pp. 129–140.
doi: 10.1016/j.dss.2017.04.003.
[11] Fani Sani, Mohammadreza, Mozhgan Vazifehdoostirani, Gyunam Park, Marco Pegoraro,
Sebastiaan J. van Zelst, and Wil M. P. van der Aalst. “Event Log Sampling for Predictive
Monitoring”. In: Process Mining Workshops - ICPM 2021 International Workshops,
Eindhoven, The Netherlands, October 31 - November 4, 2021, Revised Selected Papers. Ed.
by Munoz-Gama, Jorge and Xixi Lu. Vol. 433. Lecture Notes in Business Information
Processing. Springer, 2021, pp. 154–166. doi: 10.1007/978-3-030-98581-3_12.
[12] Fani Sani, Mohammadreza, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst.
“Conformance Checking Approximation Using Subset Selection and Edit Dis-
tance”. In: Advanced Information Systems Engineering - 32nd International Con-
ference, CAiSE 2020, Grenoble, France, June 8-12, 2020, Proceedings. Ed. by Dust-
dar, Schahram, Eric Yu, Camille Salinesi, Dominique Rieu, and Vik Pant. Vol. 12127.
Lecture Notes in Computer Science. Springer, 2020, pp. 234–251. doi:10.1007/
978-3-030-49435-3_15.
[13] Fani Sani, Mohammadreza, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst. “The impact of biased sampling of event logs on the performance of process discovery”. In: Computing 103.6 (2021), pp. 1085–1104. doi:10.1007/s00607-021-00910-4.
[14] Galanti, Riccardo, Bernat Coma-Puig, Massimiliano de Leoni, Josep Carmona, and Nicolò Navarin. “Explainable Predictive Process Monitoring”. In: 2nd International Conference on Process Mining, ICPM 2020, Padua, Italy, October 4-9, 2020. Ed. by van Dongen, Boudewijn F., Marco Montali, and Moe Thandar Wynn. IEEE, 2020, pp. 1–8. doi:10.1109/ICPM49681.2020.00012.
[15] García, Salvador, Julián Luengo, and Francisco Herrera. Data Preprocessing in Data Mining. Vol. 72. Intelligent Systems Reference Library. Springer, 2015. isbn: 978-3-319-10246-7. doi:10.1007/978-3-319-10247-4.
[16] Huang, Zhiheng, Wei Xu, and Kai Yu. “Bidirectional LSTM-CRF Models for
Sequence Tagging”. In: CoRR abs/1508.01991 (2015). arXiv: 1508.01991.
[17] de Leoni, Massimiliano, Wil M. P. van der Aalst, and Marcus Dees. “A general
process mining framework for correlating, predicting and clustering dynamic be-
havior based on event logs”. In: Information Systems 56 (2016), pp. 235–257. doi:
10.1016/j.is.2015.07.003.
[18] Luque, Amalia, Alejandro Carrasco, Alejandro Martín, and Ana de las Heras. “The impact of class imbalance in classification performance metrics based on the binary confusion matrix”. In: Pattern Recognition 91 (2019), pp. 216–231. doi:10.1016/j.patcog.2019.02.023.
[19] Márquez-Chamorro, Alfonso Eduardo, Manuel Resinas, and Antonio Ruiz-Cortés. “Predictive Monitoring of Business Processes: A Survey”. In: IEEE Transactions on Services Computing 11.6 (2018), pp. 962–977. doi:10.1109/TSC.2017.2772256.
[20] Navarin, Nicolò, Beatrice Vincenzi, Mirko Polato, and Alessandro Sperduti. “LSTM networks for data-aware remaining time prediction of business process instances”. In: 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017, Honolulu, HI, USA, November 27 - Dec. 1, 2017. IEEE, 2017, pp. 1–7. doi:10.1109/SSCI.2017.8285184.
[21] Nguyen, An, Srijeet Chatterjee, Sven Weinzierl, Leo Schwinn, Martin Matzner, and Bjoern M. Eskofier. “Time Matters: Time-Aware LSTMs for Predictive Business Process Monitoring”. In: Process Mining Workshops - ICPM 2020 International Workshops, Padua, Italy, October 5-8, 2020, Revised Selected Papers. Ed. by Leemans, Sander J. J. and Henrik Leopold. Vol. 406. Springer, 2020, pp. 112–123. doi:10.1007/978-3-030-72693-5_9.
[22] Park, Gyunam and Wil M. P. van der Aalst. “Action-oriented process mining: bridging the gap between insights and actions”. In: Progress in Artificial Intelligence (2022). issn: 2192-6352, 2192-6360. doi:10.1007/s13748-022-00281-7.
[23] Park, Gyunam, Aaron Küsters, Mara Tews, Cameron Pitsch, Jonathan Schneider, and Wil M. P. van der Aalst. “Explainable Predictive Decision Mining for Operational Support”. In: CoRR abs/2210.16786 (2022). doi:10.48550/arXiv.2210.16786.
[24] Park, Gyunam and Minseok Song. “Predicting performances in business pro-
cesses using deep neural networks”. In: Decision Support Systems 129 (2020). doi:
10.1016/j.dss.2019.113191.
[25] Park, Gyunam and Minseok Song. “Prediction-based Resource Allocation using
LSTM and Minimum Cost and Maximum Flow Algorithm”. In: International
Conference on Process Mining, ICPM 2019, Aachen, Germany, June 24-26, 2019.
IEEE, 2019, pp. 121–128. doi:10.1109/ICPM.2019.00027.
[26] Pauwels, Stephen and Toon Calders. “Incremental Predictive Process Monitor-
ing: The Next Activity Case”. In: Business Process Management - 19th Interna-
tional Conference, BPM 2021, Rome, Italy, September 06-10, 2021, Proceedings.
Ed. by Polyvyanyy, Artem, Moe Thandar Wynn, Amy Van Looy, and Manfred
Reichert. Vol. 12875. Lecture Notes in Computer Science. Springer, 2021, pp. 123–
140. doi:10.1007/978-3-030-85469-0_10.
[27] Pegoraro, Marco, Merih Seran Uysal, David Benedikt Georgi, and Wil M. P. van der Aalst. “Text-Aware Predictive Monitoring of Business Processes”. In: 24th International Conference on Business Information Systems, BIS 2021, Hannover, Germany, June 15-17, 2021. Ed. by Abramowicz, Witold, Sören Auer, and Elzbieta Lewanska. 2021, pp. 221–232. doi:10.52825/bis.v1i.62.
[28] Pegoraro, Marco, Merih Seran Uysal, Tom-Hendrik Hülsmann, and Wil M. P. van der Aalst. “Resolving Uncertain Case Identifiers in Interaction Logs: A User Study”. In: CoRR abs/2212.00009 (2022). doi:10.48550/arXiv.2212.00009.
[29] Pegoraro, Marco, Merih Seran Uysal, Tom-Hendrik Hülsmann, and Wil M. P. van der Aalst. “Uncertain Case Identifiers in Process Mining: A User Study of the Event-Case Correlation Problem on Click Data”. In: Enterprise, Business-Process and Information Systems Modeling - 23rd International Conference, BPMDS 2022 and 27th International Conference, EMMSAD 2022, Held at CAiSE 2022, Leuven, Belgium, June 6-7, 2022, Proceedings. Ed. by Augusto, Adriano, Asif Gill, Dominik Bork, Selmin Nurcan, Iris Reinhartz-Berger, and Rainer Schmidt. Vol. 450. Lecture Notes in Business Information Processing. Springer, 2022, pp. 173–187. doi:10.1007/978-3-031-07475-2_12.
[30] Polato, Mirko, Alessandro Sperduti, Andrea Burattin, and Massimiliano de Leoni.
“Time and activity sequence prediction of business process instances”. In: Com-
puting 100.9 (2018), pp. 1005–1031. doi:10.1007/s00607-018-0593-x.
[31] Pourghassemi, Behnam, Chenghao Zhang, Joo Hwan Lee, and Aparna Chandramowlishwaran. “On the Limits of Parallelizing Convolutional Neural Networks on GPUs”. In: SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, Virtual Event, USA, July 15-17, 2020. Ed. by Scheideler, Christian and Michael Spear. ACM, 2020, pp. 567–569. doi:10.1145/3350755.3400266.
[32] van der Putten, Peter and Maarten van Someren. “A Bias-Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000”. In: Machine Learning 57.1-2 (2004), pp. 177–195. doi:10.1023/B:MACH.0000035476.95130.99.
[33] Qafari, Mahnaz Sadat and Wil M. P. van der Aalst. “Root Cause Analysis in Process Mining Using Structural Equation Models”. In: Business Process Management Workshops - BPM 2020 International Workshops, Seville, Spain, September 13-18, 2020, Revised Selected Papers. Ed. by del-Río-Ortega, Adela, Henrik Leopold, and Flávia Maria Santoro. Vol. 397. Lecture Notes in Business Information Processing. Springer, 2020, pp. 155–167. doi:10.1007/978-3-030-66498-5_12.
[34] Rogge-Solti, Andreas and Mathias Weske. “Prediction of Remaining Service Execution Time Using Stochastic Petri Nets with Arbitrary Firing Delays”. In: Service-Oriented Computing - 11th International Conference, ICSOC 2013, Berlin, Germany, December 2-5, 2013, Proceedings. Ed. by Basu, Samik, Cesare Pautasso, Liang Zhang, and Xiang Fu. Vol. 8274. Springer, 2013, pp. 389–403. doi:10.1007/978-3-642-45005-1_27.
[35] Senderovich, Arik, Chiara Di Francescomarino, Chiara Ghidini, Kerwin Jorbina, and Fabrizio Maria Maggi. “Intra and Inter-case Features in Predictive Process Monitoring: A Tale of Two Dimensions”. In: Business Process Management - 15th International Conference, BPM 2017, Barcelona, Spain, September 10-15, 2017, Proceedings. Ed. by Carmona, Josep, Gregor Engels, and Akhil Kumar. Vol. 10445. Lecture Notes in Computer Science. Springer, 2017, pp. 306–323. doi:10.1007/978-3-319-65000-5_18.
[36] Sindhgatta, Renuka, Catarina Moreira, Chun Ouyang, and Alistair Barros. “Exploring Interpretable Predictive Models for Business Processes”. In: Business Process Management - 18th International Conference, BPM 2020, Seville, Spain, September 13-18, 2020, Proceedings. Ed. by Fahland, Dirk, Chiara Ghidini, Jörg Becker, and Marlon Dumas. Vol. 12168. Lecture Notes in Computer Science. Springer, 2020, pp. 257–272. doi:10.1007/978-3-030-58666-9_15.
[37] Stierle, Matthias, Jens Brunk, Sven Weinzierl, Sandra Zilker, Martin Matzner, and Jörg Becker. “Bringing Light Into the Darkness - A Systematic Literature Review on Explainable Predictive Business Process Monitoring Techniques”. In: 28th European Conference on Information Systems - Liberty, Equality, and Fraternity in a Digitizing World, ECIS 2020, Marrakech, Morocco, June 15-17, 2020. Ed. by Rowe, Frantz, Redouane El Amrani, Moez Limayem, Sabine Matook, Christoph Rosenkranz, Edgar A. Whitley, and Ali El Quammah. 2021. url:https://aisel.aisnet.org/ecis2021_rip/8.
[38] Tax, Niek, Ilya Verenich, Marcello La Rosa, and Marlon Dumas. “Predictive Business Process Monitoring with LSTM Neural Networks”. In: Advanced Information Systems Engineering - 29th International Conference, CAiSE 2017, Essen, Germany, June 12-16, 2017, Proceedings. Ed. by Dubois, Eric and Klaus Pohl. Vol. 10253. Springer, 2017, pp. 477–492. doi:10.1007/978-3-319-59536-8_30.
[39] Teinemaa, Irene, Marlon Dumas, Fabrizio Maria Maggi, and Chiara Di Francesco-
marino. “Predictive Business Process Monitoring with Structured and Unstruc-
tured Data”. In: Business Process Management - 14th International Conference,
BPM 2016, Rio de Janeiro, Brazil, September 18-22, 2016. Proceedings. Ed. by
Rosa, Marcello La, Peter Loos, and Oscar Pastor. Vol. 9850. Lecture Notes in
Computer Science. Springer, 2016, pp. 401–417. doi:10.1007/978-3-319-45348-4_23.
[40] Teinemaa, Irene, Marlon Dumas, Marcello La Rosa, and Fabrizio Maria Maggi.
“Outcome-Oriented Predictive Process Monitoring: Review and Benchmark”.
In: ACM Transactions on Knowledge Discovery from Data 13.2 (2019), 17:1–17:57.
doi:10.1145/3301300.
[41] Verbeek, Eric, Joos C. A. M. Buijs, Boudewijn F. van Dongen, and Wil M. P. van
der Aalst. “ProM 6: The Process Mining Toolkit”. In: Proceedings of the Business
Process Management 2010 Demonstration Track, Hoboken, NJ, USA, September
14-16, 2010. Ed. by Rosa, Marcello La. Vol. 615. CEUR Workshop Proceedings.
CEUR-WS.org, 2010. url:http://ceur-ws.org/Vol-615/paper13.pdf.
[42] Verenich, Ilya. “Explainable Predictive Monitoring of Temporal Measures of Business Processes”. In: Proceedings of the Dissertation Award, Doctoral Consortium, and Demonstration Track at BPM 2019 co-located with 17th International Conference on Business Process Management, BPM 2019, Vienna, Austria, September 1-6, 2019. Ed. by Depaire, Benoît, Johannes De Smedt, Marlon Dumas, Dirk Fahland, Akhil Kumar, Henrik Leopold, Manfred Reichert, Stefanie Rinderle-Ma, Stefan Schulte, Stefan Seidel, and Wil M. P. van der Aalst. Vol. 2420. CEUR Workshop Proceedings. CEUR-WS.org, 2019, pp. 26–30. url:http://ceur-ws.org/Vol-2420/paperDA6.pdf.
[43] Verenich, Ilya, Marlon Dumas, Marcello La Rosa, Fabrizio Maria Maggi, and Irene Teinemaa. “Survey and Cross-benchmark Comparison of Remaining Time Prediction Methods in Business Process Monitoring”. In: ACM Transactions on Intelligent Systems and Technology 10.4 (2019), 34:1–34:34. doi:10.1145/3331449.
[44] Wilson, D. Randall and Tony R. Martinez. “Reduction Techniques for Instance-
Based Learning Algorithms”. In: Machine Learning 38.3 (2000), pp. 257–286.
doi:10.1023/A:1007626913721.
[45] Wilson, Dennis L. “Asymptotic Properties of Nearest Neighbor Rules Using
Edited Data”. In: IEEE Transactions on Systems, Man and Cybernetics 2.3 (1972),
pp. 408–421. doi:10.1109/TSMC.1972.4309137.
[46] Zhou, Lina, Shimei Pan, Jianwu Wang, and Athanasios V. Vasilakos. “Machine
learning on big data: Opportunities and challenges”. In: Neurocomputing 237
(2017), pp. 350–361. doi:10.1016/j.neucom.2017.01.026.