Performance-Preserving Event Log Sampling
for Predictive Monitoring
Mohammadreza Fani Sani 1, Mozhgan Vazifehdoostirani 2,
Gyunam Park 1, Marco Pegoraro 1, Sebastiaan J. van Zelst 3,1, and
Wil M.P. van der Aalst 1,3
1Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Aachen, Germany
{fanisani, gnpark, pegoraro, s.j.v.zelst, wvdaalst}@pads.rwth-aachen.de
2 Industrial Engineering and Innovation Science, Eindhoven University of Technology,
Eindhoven, the Netherlands
m.vazifehdoostirani@tue.nl
3Fraunhofer FIT, Birlinghoven Castle, Sankt Augustin, Germany
Abstract
Predictive process monitoring is a subfield of process mining that aims to estimate case or event features for running process instances. Such predictions are of significant interest to the process stakeholders. However, most of the state-of-the-art methods for predictive monitoring require the training of complex machine learning models, which is often inefficient. Moreover, most of these methods require hyper-parameter optimization, which entails several repetitions of the training process and is not feasible in many real-life applications. In this paper, we propose an instance selection procedure that allows sampling training process instances for prediction models. We show that our instance selection procedure allows for a significant increase in training speed for next activity and remaining time prediction methods while maintaining reliable levels of prediction accuracy.
Keywords: Process Mining · Predictive Monitoring · Sampling · Machine Learning · Deep Learning · Instance Selection.
Colophon
This work is licensed under a Creative Commons “Attribution-NonCommercial 4.0 International” license.
© the authors. Some rights reserved.
This document is an Author Accepted Manuscript (AAM) corresponding to the following scholarly paper:
Fani Sani, Mohammadreza, Mozhgan Vazifehdoostirani, Gyunam Park, Marco Pegoraro, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst. “Performance-Preserving Event Log Sampling for Predictive Monitoring”. In: Journal of Intelligent Information Systems (2023)
Please, cite this document as shown above.
Publication chronology:
2022-03-09: full text submitted to the Springer Journal of Intelligent Information Systems
2022-06-01: major revision requested
2022-09-01: revised version submitted
2022-11-10: minor revision requested
2022-12-07: revised version submitted
2022-12-29: notification of acceptance
2023-03-06: published
The published version referred to above is © Springer.
Correspondence to:
Mohammadreza Fani Sani, Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany
Email: fanisani@pads.rwth-aachen.de · ORCID: 0000-0003-3152-2103
Content: 28 pages, 1 figure, 13 tables, 46 references. Typeset with pdfLaTeX, Biber, and BibLaTeX.
Please do not print this document unless strictly necessary.
1 Introduction
The main goal of predictive process monitoring is to provide timely information by pre-
dicting the behavior of business processes [3] and enabling proactive actions to improve
the performance of the process [22]. It provides various kinds of predictive information, such as the next activity performed in a process instance (e.g., for a patient or a product) [6], its waiting time for an activity, and its remaining time to complete the process [19]. For instance, by predicting a long waiting time of a patient for registration, one can bypass the activity or add more resources to perform it.
A plethora of approaches have been proposed to support predictive process mon-
itoring. In particular, with the recent breakthroughs in machine learning, various ma-
chine learning-based approaches have been developed [19]. The emergence of ensemble learning methods has led to improvements in accuracy in different areas [5]. eXtreme Gradient Boosting (XGBoost) [9] has shown promising results, often outperforming other ensemble methods such as Random Forest or a single regression tree [35,40]. Furthermore, techniques based on deep neural networks, e.g., Long Short-Term Memory (LSTM) networks, have shown high performance in different predictive tasks [10].
However, machine learning-based techniques are computationally expensive due to their training process [46]. Moreover, they often require exhaustive hyper-parameter tuning to provide acceptable accuracy. Such limitations hinder the application of machine learning-based techniques to real-life business processes, where new prediction models are required at short intervals to adapt to changing business situations. Business analysts need to test the efficiency and reliability of their conclusions via repeated training of different prediction models with different parameters [19]. Such long training times limit the application of these techniques given constraints on time and hardware [31].
In this regard, instance selection has been studied as a promising direction of research
to reduce original datasets to a manageable volume to perform machine learning tasks,
while the quality of the results (e.g., accuracy) is maintained as if the original dataset was
used [15]. Instance selection techniques are categorized into two classes based on the way they select instances. First, some techniques select the instances at the boundaries of classes. For instance, the Decremental Reduction Optimization Procedure (DROP) [44] selects instances using k-Nearest Neighbors by incrementally discarding an instance if its neighbors are correctly classified without it. The other techniques preserve the instances residing inside classes; e.g., the Edited Nearest Neighbor (ENN) [45] preserves instances by repeatedly discarding an instance if it does not belong to the class of the majority of its neighbors.
However, existing instance selection techniques cannot be directly applied to training in predictive process monitoring, since such techniques assume independence among instances [44]. Instead, in predictive process monitoring, instances are computed from event data recorded by the information systems supporting business processes [17]. Thus, they are highly correlated [2] through the notion of case, e.g., a manufactured product in a factory or a customer in a store.
In this work, we suggest an instance selection approach for predicting the next activity, the remaining time, and the outcome of a process, which are the main applications of predictive business process monitoring. By considering the characteristics of the event data, the proposed approach samples event data such that the training speed is improved while the accuracy of the resulting prediction model is maintained. We have evaluated the proposed methods using three real-life datasets and state-of-the-art techniques for predictive business process monitoring, including LSTM [16] and XGBoost [9].
This paper extends our earlier work presented in [11] along the following dimensions: 1) evaluating the applicability of the proposed approach, 2) enhancing the accessibility of the work, and 3) extending the discussion of strengths and limitations. First, we have evaluated the applicability of the proposed approach both task-wise and domain-wise. For the task-wise evaluation, we have selected the three most well-known predictive monitoring tasks (i.e., next activity, remaining time, and outcome prediction) and evaluated the performance of the proposed approach on the different tasks. For the domain-wise evaluation, we have evaluated the performance of our proposed approach on real-life event logs from different domains, including the finance, government, and healthcare domains. Second, we have extended the accessibility of the proposed approach by implementing the proposed sampling methods on the Python platform as well as the Java platform. Finally, we have extensively discussed the strengths and limitations of the proposed approach, providing foundations for further research.
The remainder of this paper is organized as follows. We discuss the related work in Section 2. Next, we present the preliminaries in Section 3 and the proposed methods in Section 4. Afterward, Section 5 evaluates the proposed methods using real-life event data, and Section 6 provides a discussion. Finally, Section 7 concludes the paper.
2 Related Work
This section presents the related work on predictive process monitoring, time optimization, and instance sampling.
2.1 Predictive Process Monitoring
Predictive process monitoring is an exceedingly active field, both currently and historically, thanks to the compatibility of the process sciences with other branches of data science that include inference techniques, such as statistics and machine learning. At its core, the fundamental component of many predictive monitoring approaches is the abstraction technique used to obtain a fixed-length representation of the process component subject to the prediction (often, but not always, process traces). In the earlier approaches, the need for such an abstraction was met through model-aware techniques, employing process models and replay techniques on partial traces to abstract a flat representation of event sequences. Such process models are mostly discovered automatically from a set of available complete traces, and require perfect fitness on the training instances (and, seldom, also on unseen test instances). For instance, Van der Aalst et al. [3] proposed a time prediction framework based on replaying partial traces on a transition system, effectively clustering training instances by control-flow information. This framework has later been the basis for a prediction method by Polato et al. [30], where the transition system is annotated with an ensemble of SVR and Naïve Bayes classifiers to perform a more accurate time estimation. Some more recent approaches split the predictive contribution of process models and machine learning models in a perspective-wise manner: for instance, Park et al. [24] obtain a representation of the performance perspective using an annotated transition system, and design an ensemble with a deep neural network to obtain the final predictive model. A related approach, albeit more linked to the simulation domain and based on a Monte Carlo method, is the one proposed by Rogge-Solti and Weske [34], which maps partial process instances onto an enriched Petri net.
Recently, predictive process monitoring started to use a plethora of machine learning approaches, achieving varying degrees of success. For instance, Teinemaa et al. [39] provided a framework to combine text mining methods with Random Forest and Logistic Regression. Senderovich et al. [35] studied the effect of using intra-case and inter-case features in predictive process monitoring and showed promising results for XGBoost compared to other ensemble and linear methods. A comprehensive benchmark on using classical machine learning approaches for outcome-oriented predictive process monitoring tasks [40] has shown that XGBoost is the best-performing classifier among different machine learning approaches such as SVM, Decision Tree, Random Forest, and Logistic Regression.
More recent methods are model-unaware and rely on a single, more complex machine learning model instead of an ensemble. In fact, this evolution of predictive monitoring mimics the advancement in the accuracy of newer machine learning approaches, specifically the numerous and sophisticated models based on deep neural networks developed in the last decade. The LSTM network model has proven to be particularly effective for predictive monitoring [10,38], since its recurrent architecture can natively support sequences of data of arbitrary length. It allows performing trace prediction while employing a fixed-length event abstraction, which can be based on control-flow alone [10,38], or be data-aware [20], time-aware [21], text-aware [27], or model-aware [24]. Additionally, rather than leveraging control-flow information for prediction, some recent research aims to use predictive monitoring to reconstruct missing control-flow attributes such as labels [1] or case identifiers [29,28]. However, the current body of work in predictive process monitoring, regardless of the architecture of the predictive model and/or the target of the prediction, does not include strategies for performance-preserving sampling.
2.2 Time Optimization and Instance Sampling
The latest research developments in the field of predictive monitoring, much like both this paper and the application of machine learning techniques in other domains, shift their focus away from increasing the quality of prediction (with respect to a given error metric or a given benchmark) and specialize in enriching the results of the prediction with additional aspects or perspectives. Unlike the methods mentioned in the previous section, this is usually achieved through dedicated machine learning models—or modifications thereof—rather than by designing specific event- or trace-level abstractions. For instance, many scholars have attempted to make predictions more transparent through the use of explainable machine learning techniques [42,37,36,14,23]. More related to our present work, Pauwels and Calders [26] propose a technique to avoid the time expenditure caused by the retraining of machine learning models; retraining is necessary when the models are no longer representative—for instance, when changes occur in the underlying process (caused, e.g., by concept drift). While in this paper we focus on the data and propose a solution based on sampling, Pauwels and Calders intervene on the model side, devising an incremental training schema which accounts for new information in an efficient way.
Another concept similar to the idea proposed in this paper, and of current interest in the field of machine learning, is dataset distillation: utilizing a dataset to obtain a smaller set of training instances that contains the same information (with respect to training a machine learning model) [8]. While this is not considered sampling, since some instances of the distilled dataset are created ex novo, it is an approach very similar to the one we illustrate in our paper.
The concept of instance sampling, or instance subset selection, is present in the context of process mining at large, albeit the development of such techniques is very recent. Some instance selection algorithms have been proposed to aid classical process mining tasks. For example, [13] proposes to use instance selection techniques to improve the performance of process discovery algorithms; in this context, the goal is to automatically obtain a descriptive model of the process on the basis of the data recorded about the historical executions of process instances—the same starting point as the present work. Then, the work in [12] applies the same concept and integrates the edit distance to obtain a fast technique to approximate the conformance checking score of process traces: this consists of measuring the deviation between a model, which often represents the normative or prescribed behavior of a process, and the data, which represents the actual behavior of the process.
Table 1: Simple example of an event log. Rows capture events recorded in the context of the execution of the process. An event describes at what point in time an activity was performed. Other data attributes may be available as well.

Case-id   Event-id   Activity name        Starting time       Finishing time      ...
...       ...        ...                  ...                 ...                 ...
7         35         Register (a)         2021-01-02 12:23    2021-01-02 12:25    ...
7         36         Analyze Defect (b)   2021-01-02 12:30    2021-01-02 12:40    ...
7         37         Inform User (g)      2021-01-02 12:45    2021-01-02 12:47    ...
8         39         Register (a)         2021-01-02 12:23    2021-01-02 13:15    ...
7         40         Test Repair (e)      2021-01-02 13:05    2021-01-02 13:20    ...
7         41         Archive Repair (h)   2021-01-02 13:21    2021-01-02 13:22    ...
8         42         Analyze Defect (b)   2021-01-02 12:30    2021-01-02 13:30    ...
...       ...        ...                  ...                 ...                 ...
This paper integrates the two aforementioned works, extending the effects of strategic sampling: while in [13] the sampling optimizes descriptive modeling and in [12] it optimizes process diagnostics, in this work it aids predictive modeling. To the best of our knowledge, no work in the literature inspects the effects of building a training set for predictive process monitoring through strategic process instance selection, with the exception of our previous work [11], which we extend in this paper. Here, we examine the underexplored topic of event data sampling and selection for predictive process monitoring, with the objective of assessing if, and to what extent, prediction quality can be retained when we utilize subsets of the training data.
3 Preliminaries
In this section, we discuss some process mining concepts, such as event logs and sampling. In process mining, we use events to provide insights into the execution of business processes. Event logs, i.e., collections of events representing the execution of several instances of a process, are the starting point of process mining algorithms. An example event log is shown in Table 1. Each event, corresponding to a row in the table, relates to a specific activity of the underlying process. Furthermore, we refer to the collection of events related to a specific process instance as a case (represented by the Case-id column). Both cases and events may have different attributes. An event log, i.e., a collection of events and cases, is defined as follows.
Definition 1 (Event Log). Let $\mathcal{U}_E$ be the universe of events, $\mathcal{U}_C$ be the universe of cases, $\mathcal{U}_{AT}$ be the universe of attributes, and $\mathcal{U}$ be the universe of attribute values. Moreover, let $C \subseteq \mathcal{U}_C$ be a non-empty set of cases and let $E \subseteq \mathcal{U}_E$ be a non-empty set of events. We define $(C, E, \pi_C, \pi_E)$ as an event log, where $\pi_C : C \times \mathcal{U}_{AT} \nrightarrow \mathcal{U}$ and $\pi_E : E \times \mathcal{U}_{AT} \nrightarrow \mathcal{U}$. Any event in the event log has a case, and thus $\nexists_{e \in E} \, (\pi_E(e, \mathit{case}) \notin C)$ and $\bigcup_{e \in E} \{\pi_E(e, \mathit{case})\} = C$.

Let $\mathcal{A} \subseteq \mathcal{U}$ be the universe of activities and $\mathcal{A}^* \subseteq \mathcal{U}$ be the universe of sequences of activities. For any $e \in E$, $\pi_E(e, \mathit{activity}) \in \mathcal{A}$, which means that any event in the event log has an activity. Moreover, for any $c \in C$, $\pi_C(c, \mathit{variant}) \in \mathcal{A}^* \setminus \{\langle\rangle\}$, which means that any case in the event log has a variant.
Therefore, there are some mandatory attributes: case and activity for events, and variant for cases. For example, for the event with Event-id 35 in Table 1, $\pi_E(e, \mathit{case}) = 7$ and $\pi_E(e, \mathit{activity}) = \text{Register}(a)$.
A variant is the sequence of activities present in a case. For example, for case 7 in Table 1, the variant is $\langle a, b, g, e, h \rangle$ (for simplicity, we denote each activity by a letter). Variant information plays an important role in process mining; in some applications, e.g., process discovery and conformance checking, only this information is considered. In this regard, event logs can be viewed as multisets of sequences of activities. In the following, a simple event log is defined.
Definition 2 (Simple Event Log). Let $\mathcal{A}$ be the universe of activities, and let the universe of multisets over a set $X$ be denoted by $\mathcal{B}(X)$. A simple event log is $L \in \mathcal{B}(\mathcal{A}^*)$. Moreover, let $\mathcal{EL}$ be the universe of event logs and let $EL = (C, E, \pi_C, \pi_E) \in \mathcal{EL}$ be an event log. The function $sl : \mathcal{EL} \to \mathcal{B}(\mathcal{A}^*)$ returns the simple event log of an event log, where $sl(EL) = [\sigma^k \mid \sigma \in \{\pi_C(c, \mathit{variant}) \mid c \in C\} \wedge k = |\{c \in C \mid \pi_C(c, \mathit{variant}) = \sigma\}|]$. The set of unique variants in the event log is denoted by $\overline{sl(EL)} = \{\pi_C(c, \mathit{variant}) \mid c \in C\}$.
Therefore, $sl$ returns the multiset of variants of the event log. Note that the size of a simple event log equals the number of cases in the event log, i.e., $|sl(EL)| = |C|$.
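To make Definition 2 concrete, the following minimal Python sketch computes $sl(EL)$ from a flat list of events; the input format and all names are illustrative assumptions, not taken from our implementation.

    from collections import Counter, defaultdict

    def simple_log(events):
        """Return the multiset of variants sl(EL) as a Counter.

        events: (case_id, activity) pairs, assumed ordered by timestamp.
        """
        traces = defaultdict(list)
        for case_id, activity in events:  # group activities by case
            traces[case_id].append(activity)
        # every case contributes its variant (a tuple of activities) once
        return Counter(tuple(trace) for trace in traces.values())

    events = [(7, "a"), (7, "b"), (8, "a"), (7, "g"), (8, "b")]
    print(simple_log(events))  # Counter({('a', 'b', 'g'): 1, ('a', 'b'): 1})

Summing the counter values recovers $|sl(EL)| = |C|$, the number of cases.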
In this paper, we use sampling techniques to reduce the size of event logs. An event log sampling method is defined as follows.
Definition 3 (Event Log Sampling). Let $\mathcal{EL}$ be the universe of event logs and $\mathcal{A}$ be the universe of activities. Moreover, let $EL = (C, E, \pi_C, \pi_E) \in \mathcal{EL}$ be an event log. We define the function $\delta : \mathcal{EL} \to \mathcal{EL}$ that returns the sampled event log, where, if $(C', E', \pi'_C, \pi'_E) = \delta(EL)$, then $C' \subseteq C$, $E' \subseteq E$, $\pi'_E \subseteq \pi_E$, $\pi'_C \subseteq \pi_C$, and, consequently, $\overline{sl(\delta(EL))} \subseteq \overline{sl(EL)}$. We say that $\delta$ is a variant-preserving sampling if $\overline{sl(\delta(EL))} = \overline{sl(EL)}$.
In other words, a sampling method is variant-preserving if and only if all the variants of the original event log are present in the sampled event log.
To use machine learning methods for prediction, we usually need to transform each case into one or more features. A feature is defined as follows.
8 / 28
M. Fani Sani et al. Performance-Preserving Sampling for Predictive Monitoring
Figure 1: A schematic view of the proposed sampling procedure
Definition 4 (Feature). Let $\mathcal{U}_{AT}$ be the universe of attributes, $\mathcal{U}$ be the universe of attribute values, and $\mathcal{U}_C$ be the universe of cases. Moreover, let $AT \subseteq \mathcal{U}_{AT}$ be a set of attributes. A feature is a relation between a sequence of attribute values for $AT$ and a target attribute value, i.e., $f \in (\mathcal{U}^{|AT|} \times \mathcal{U})$. We define $fe : \mathcal{U}_C \times \mathcal{EL} \to \mathcal{B}(\mathcal{U}^{|AT|} \times \mathcal{U})$ as a function that receives a case and an event log, and returns a multiset of features.
For next and final activity prediction, the target attribute value is an activity, whereas for remaining time prediction, the target attribute value is numerical. Moreover, a case in the event log may yield several features. For example, suppose that we only consider the activities: for a case with variant $\langle a, b, c, d \rangle$, we may have $(\langle a \rangle, b)$, $(\langle a, b \rangle, c)$, and $(\langle a, b, c \rangle, d)$ as features. Furthermore, $\Sigma_{c \in C}\, fe(c, EL)$ yields the features of event log $EL = (C, E, \pi_C, \pi_E)$ that can be given to different machine learning algorithms. For more details on how to extract features from event logs, please refer to [33].
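To illustrate Definition 4 for next activity prediction, the sketch below turns the variant of a case into its (prefix, target) features; the function name fe mirrors the definition, and real encodings additionally include data attributes, as detailed in [33].

    def fe(variant):
        """Return the (prefix, next activity) features of a case's variant."""
        return [(tuple(variant[:i]), variant[i]) for i in range(1, len(variant))]

    print(fe(["a", "b", "c", "d"]))
    # [(('a',), 'b'), (('a', 'b'), 'c'), (('a', 'b', 'c'), 'd')]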
4 Proposed Sampling Methods
In this section, we propose an event log preprocessing procedure that helps prediction algorithms to perform faster while maintaining reasonable accuracy. A schematic view of the proposed instance selection approach is presented in Figure 1. First, we traverse the event log and find the variants and the corresponding traces of each variant. Moreover, the distributions of different data attributes in each variant are computed. Afterward, using different sorting and instance selection strategies, we select some of the cases and return the sampled event log. In the following, each of these steps is explained in more detail. To illustrate the steps, we use an example event log with 10 cases, shown in Table 2.
1. Traversing the event log: In this step, the unique variants of the event log and the corresponding traces of each variant are determined. In other words, considering an event log $EL$ with $\overline{sl(EL)} = \{\sigma_1, \dots, \sigma_n\}$, where $n = |\overline{sl(EL)}|$, we aim to split $EL$ into $EL_1, \dots, EL_n$, where $EL_i$ contains exactly the cases $C_i = \{c \in C \mid \pi_C(c, \mathit{variant}) = \sigma_i\}$ and the events $E_i = \{e \in E \mid \pi_E(e, \mathit{case}) \in C_i\}$. Obviously, $\bigcup_{1 \leq i \leq n} C_i = C$ and $\bigcap_{1 \leq i \leq n} C_i = \emptyset$. For the event log presented in Table 2, we have $n = 4$ variants, and $C_1 = \{c_1, c_3, c_4, c_9, c_{10}\}$, $C_2 = \{c_2, c_5, c_8\}$, $C_3 = \{c_6\}$, and $C_4 = \{c_7\}$.
2. Computing distributions: In this step, for each variant of the event log, we compute the distribution of the different data attributes $a \in AT$. It is more practical if the interesting attributes are chosen by an expert. Both event and case attributes can be considered. A simple approach is to compute the frequency of categorical data values. For numerical data attributes, it is possible to consider the average or the median of the values over all cases of each variant. In the running example, $C_3$ and $C_4$ contain only one case each. However, for $C_1$ and $C_2$, the average Amount is 500 and 460, respectively.
3. Sorting the cases of each variant: In this step, we aim to sort the traces of each variant, giving higher priority to those traces that represent the variant best. One option is to sort the traces based on how frequently they contain the most frequent data values of the variant. For example, we can give a higher priority to the traces that have the more frequent resources of each variant. For the event log presented in Table 2, we do not need to prioritize the cases in $C_3$ and $C_4$. However, if we sort the traces according to the distance between their Amount value and the average Amount of their variant, for $C_1$ we obtain $c_3, c_9, c_4, c_1, c_{10}$, and for $C_2$ the order is $c_5, c_2, c_8$. It is also possible to sort the traces based on their arrival time or randomly.
4. Returning sampled event logs: Finally, depending on the setting of the sampling function, we return some of the traces with the highest priority for each variant. The most important point in this step is to decide how many traces of each variant should be selected. In the following, some possibilities are introduced.
Unique selection: In this approach, we select only the one trace with the highest priority per variant. In other words, for $L' = sl(\delta(EL))$, $\forall_{\sigma \in L'}\; L'(\sigma) = 1$. Therefore, using this approach, we will have $|sl(\delta(EL))| = |\overline{sl(EL)}|$. It is expected that this approach changes the frequency distribution of variants and, consequently, that the resulting prediction model will be less accurate. By applying this sampling method to the event log presented in Table 2, the sampled event log will have 4 traces, i.e., one trace for each variant. The corresponding cases are $C' = \{c_3, c_5, c_6, c_7\}$. For the variants with more than one trace, the trace with the highest priority is chosen (the one whose Amount value is closest to the average Amount of its variant).
Table 2: An example event log with 10 traces and 4 variants. Each trace has two attributes: Variant and Amount.

CaseID   Variant          Amount
c1       ⟨a, b, c, d⟩     100
c2       ⟨a, c, b, d⟩     720
c3       ⟨a, b, c, d⟩     400
c4       ⟨a, b, c, d⟩     800
c5       ⟨a, c, b, d⟩     600
c6       ⟨a, c, c, d⟩     750
c7       ⟨a, c, d⟩        170
c8       ⟨a, c, b, d⟩     60
c9       ⟨a, b, c, d⟩     260
c10      ⟨a, b, c, d⟩     940
Logarithmic distribution: In this approach, we reduce the number of traces of each variant logarithmically. If $L = sl(EL)$ and $L' = sl(\delta(EL))$, then $\forall_{\sigma \in L'}\; L'(\sigma) = \lceil \log_k(L(\sigma)) \rceil$. Using this approach, the infrequent variants will not have any trace in the sampled event log, and, consequently, it is not variant-preserving. According to the above formula, using a higher logarithm base $k$ reduces the size of the sampled event log further. By using this sampling strategy with $k$ equal to 3 on the event log presented in Table 2, the cases selected for the sampled event log are $C' = \{c_3, c_9, c_5\}$. Note that for the infrequent variants, no trace is selected.
Division: This approach is similar to the previous one; however, instead of a logarithmic scale, we apply the division operator. In this approach, $\forall_{\sigma \in L'}\; L'(\sigma) = \lceil L(\sigma) / k \rceil$. A higher $k$ results in fewer cases in the sampled event log. Note that, because of the ceiling operator $\lceil\,\rceil$ in the above formula, every variant has at least one trace in the sampled event log, and the approach is therefore variant-preserving. By using this sampling strategy with $k = 4$ on the event log presented in Table 2, the sampled event log will have 5 traces, namely $C' = \{c_3, c_9, c_5, c_6, c_7\}$.
There is also the possibility to consider other selection methods. For example, we can select the traces completely randomly from the original event log. A code sketch of these selection strategies follows this list.
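The sketch below summarizes the selection strategies of Step 4 on the running example; it assumes the traces of each variant are already priority-sorted (Step 3), and all names are ours rather than those of the ProM or Python implementations.

    import math

    def sample(variant_cases, strategy="division", k=2):
        """variant_cases: dict mapping each variant to its priority-sorted case ids."""
        sampled = []
        for cases in variant_cases.values():
            if strategy == "unique":          # one representative per variant
                n = 1
            elif strategy == "logarithmic":   # infrequent variants may get no trace
                n = math.ceil(math.log(len(cases), k))
            else:                             # "division": variant-preserving
                n = math.ceil(len(cases) / k)
            sampled.extend(cases[:n])         # keep the n highest-priority cases
        return sampled

    log = {("a", "b", "c", "d"): ["c3", "c9", "c4", "c1", "c10"],  # Table 2, sorted per Step 3
           ("a", "c", "b", "d"): ["c5", "c2", "c8"],
           ("a", "c", "c", "d"): ["c6"],
           ("a", "c", "d"): ["c7"]}
    print(sample(log, "division", k=4))     # ['c3', 'c9', 'c5', 'c6', 'c7']
    print(sample(log, "logarithmic", k=3))  # ['c3', 'c9', 'c5']

On this example, the division strategy with $k = 4$ keeps 5 of the 10 cases, i.e., a reduction factor of $10/5 = 2$ in terms of the measure $R_S$ defined next.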
By choosing different data attributes in Step 2 and different sorting strategies in Step 3, we can steer which cases the sampling method chooses. Moreover, by choosing the type of distribution in Step 4, we determine how many cases are chosen. To quantify how much a sampling method $\delta$ reduces the size of a given event log $EL$, we use the following equation:

$$R_S = \frac{|sl(EL)|}{|sl(\delta(EL))|}$$

The higher the $R_S$ value, the more the sampling method reduces the size of the training log. By choosing different distribution methods and different $k$-values, we are able to control the size of the sampled event log. It should be noted that the proposed method is applied only to the training event log; in other words, we do not sample the event logs used as development and test datasets.
5 Evaluation
In this section, we design experiments to answer the research question: “Is it possible to improve the computational performance of prediction methods by using sampled event logs, while maintaining a similar accuracy?” It should be noted that the focus of the experiments is not on tuning the prediction models to achieve higher accuracy; rather, we aim to analyze the effect of using sampled event logs (instead of the whole datasets) on the required time and the accuracy of the prediction models.
In the following, we first explain the evaluation settings and the event logs that are used. Afterward, we provide some information about the implementation of the sampling methods, and, finally, we show the experimental results.
5.1 Evaluation Setting
In this section, we first explain the prediction methods and parameters that are used in the evaluation. Afterward, we discuss the evaluation metrics.
5.1.1 Evaluation Parameters
We have developed the sampling methods as a plug-in in the ProM framework [41], accessible via https://svn.win.tue.nl/repos/prom/Packages/LogFiltering. This plug-in takes an event log and returns k different train and test event logs in CSV format. Moreover, we have also implemented the sampling methods in Python to have all the evaluations in one workflow.
We have used two machine learning methods to train the prediction models, i.e., LSTM and XGBoost. For predicting the next activity, our LSTM network consists of an input layer, two LSTM layers with a dropout rate of 10%, and a dense output layer with the SoftMax activation function. We used categorical cross-entropy to calculate the loss and adopted ADAM as the optimizer. We built the same LSTM architecture for remaining time prediction with some differences: we employed mean absolute error as the loss function and Root Mean Squared Propagation (RMSprop) as the optimizer. We used gbtree with a maximum depth of 6 as the booster in our XGBoost model for both the next activity and remaining time prediction tasks. A uniform distribution is used as the sampling method inside our XGBoost model. To avoid overfitting in both models, the training set is further divided into a 90% training set and a 10% validation set, and training stops once the model performance on the validation set stops improving. We used the same parameter settings of both models for the original and the sampled event logs. The implementations of these methods are available at https://github.com/gyunamister/pm-prediction/.
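For concreteness, a minimal Keras sketch of the next activity network described above could look as follows; the hidden layer size and the input dimensions are illustrative assumptions, not the exact configuration used in our experiments.

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense, Input, LSTM

    def build_next_activity_model(max_len, n_features, n_activities):
        model = Sequential([
            Input(shape=(max_len, n_features)),
            LSTM(100, return_sequences=True, dropout=0.1),  # first LSTM layer, 10% dropout
            LSTM(100, dropout=0.1),                         # second LSTM layer, 10% dropout
            Dense(n_activities, activation="softmax"),      # probability of each next activity
        ])
        model.compile(loss="categorical_crossentropy", optimizer="adam")
        return model

A remaining time variant would analogously end in a single linear output unit and compile with loss="mean_absolute_error" and the RMSprop optimizer, as described above.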
To train the prediction models, we extract features from the event data. To this end, we use the most commonly used features for each prediction task in order to reduce the degrees of freedom in selecting relevant features. In other words, we focus on comparing the performance of predictions between sampled and non-sampled event data with a fixed feature space. For instance, for next activity prediction, we use the partial trace (i.e., the sequence of historical activities) of a case and the temporal measures of each activity (e.g., sojourn time) with one-hot encoding [25]. For remaining time prediction, we use the partial trace of a case along with case attributes (e.g., cost), resources, and temporal measures [43].
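As a simple illustration of this encoding, the sketch below one-hot encodes a partial trace over a fixed activity alphabet; actual feature pipelines [25,43] append further columns for temporal measures, resources, and case attributes.

    def one_hot_prefix(prefix, alphabet, max_len):
        """Encode a partial trace as a fixed-length 0/1 vector (padded to max_len)."""
        vec = []
        for i in range(max_len):
            for activity in alphabet:
                vec.append(1 if i < len(prefix) and prefix[i] == activity else 0)
        return vec

    print(one_hot_prefix(["a", "b"], ["a", "b", "c", "d"], max_len=3))
    # [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]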
To sample the event logs, we use the three distributions introduced above, i.e., the logarithmic distribution, division, and unique variants. For the logarithmic distribution method, we have used bases 2, 3, 5, and 10 (i.e., log2, log3, log5, and log10). For the division method, we have used k-values 2, 3, 5, and 10 (i.e., d2, d3, d5, and d10). For each event log and each sampling method, we have used 5-fold cross-validation, i.e., we split the data into 5 groups; one group is used as the test event log, and the rest are merged into the training event log. It should be noted that, for each event log, the splits were the same for all the prediction and sampling methods. Moreover, as the results of the experiments are non-deterministic, all the experiments have been repeated 5 times, and the average values are reported. Finally, to have a fair evaluation, a single CPU thread has been used in all steps.
5.1.2 Metrics
To evaluate the correctness of the prediction methods for predicting the next activities, we consider two metrics, i.e., Accuracy and F1-score; the F1-score accounts for imbalanced data [18]. For remaining time prediction, we consider the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE), as used in [43], which are computed as follows:

$$MAE = \frac{1}{n} \sum_{t=1}^{n} |e_t| \qquad\qquad RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} e_t^2}$$
In the above equations, $e_t$ indicates the prediction error for the $t$-th instance of the validation data, i.e., the difference between the predicted and the real value of the remaining time. For both measures, a lower value means higher accuracy. Note that, similar to [43], we use seconds as the time unit to compute these two metrics.
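Both measures are direct transcriptions of the formulas above; in NumPy:

    import numpy as np

    def mae(y_true, y_pred):
        """Mean absolute error of the remaining time predictions (in seconds)."""
        return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

    def rmse(y_true, y_pred):
        """Root mean squared error of the remaining time predictions (in seconds)."""
        return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))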
To evaluate how accurate the prediction methods are when using the sampled event logs, we use relative metrics that compare them against the case where the whole event log is used, according to the following equations:

$$R_{Acc} = \frac{\text{Accuracy using the sampled training log}}{\text{Accuracy using the whole training log}} \qquad R_{F1} = \frac{\text{F1-score using the sampled training log}}{\text{F1-score using the whole training log}}$$

In both equations, a value close to 1 means that, using the sampled event logs, the prediction methods behave almost the same as when the whole data is used for training. Moreover, values higher than 1 indicate that the accuracy/F1-score of the prediction methods has improved.
Unlike the previous metrics, for MAE and RMSE a higher value means the prediction model is less accurate in predicting the remaining time. Therefore, we use the following measures:

$$R_{MAE} = \frac{\text{MAE using the whole training log}}{\text{MAE using the sampled training log}} \qquad R_{RMSE} = \frac{\text{RMSE using the whole training log}}{\text{RMSE using the sampled training log}}$$

For both of the above measures, a higher value means that the applied instance selection method preserves accuracy better. If the values of the above measures are higher than 1, the instance selection method improves the accuracy of the prediction models compared to the case where the whole training data is used.
To compute the improvement in the performance of feature extraction and training time, we use the following equations:

$$R_{FE} = \frac{\text{Feature extraction time using the whole data}}{\text{Feature extraction time using the sampled data}} \qquad R_t = \frac{\text{Training time using the whole data}}{\text{Training time using the sampled data}}$$

For both equations, the resulting values indicate how many times faster using the sampled log is compared to using all data.
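All six relative metrics reduce to simple ratios, with the error-based ones inverted so that higher is always better; the helper below (with illustrative names) makes this explicit.

    def relative_scores(whole, sampled):
        """whole/sampled: dicts with keys accuracy, f1, mae, rmse, fe_time, train_time."""
        return {
            "R_Acc":  sampled["accuracy"] / whole["accuracy"],
            "R_F1":   sampled["f1"] / whole["f1"],
            "R_MAE":  whole["mae"] / sampled["mae"],      # inverted: lower MAE is better
            "R_RMSE": whole["rmse"] / sampled["rmse"],    # inverted: lower RMSE is better
            "R_FE":   whole["fe_time"] / sampled["fe_time"],
            "R_t":    whole["train_time"] / sampled["train_time"],
        }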
5.1.3 Event Logs
We have used three event logs widely used in the literature. In the BPIC-2012-W event log, relating to a loan application process of a Dutch financial institute, the average variant frequency is low. In the RTFM event log, which corresponds to a road traffic fine management process, we have some highly frequent variants and several infrequent variants; moreover, the number of activities in this event log is high. In the Sepsis event log, relating to a healthcare process, there are several variants, most of which are unique. Some of the activities in the last two event logs are infrequent, which makes these event logs imbalanced. Some information about these event logs and the results of using the prediction methods on them is presented in Table 3. Note that the time-related features in this table are in seconds.
According to Table 3, using the whole event data, we usually obtain high accuracy for next activity prediction. However, the F1-score is not as high, mainly because the event logs are imbalanced (specifically RTFM and Sepsis). Moreover, the MAE and RMSE values are very high, specifically for the RTFM event log; this is mainly because the durations of process instances are very long in this event log, and, consequently, the $e_t$ values are higher. Finally, there is a direct relation between the size of the event logs and the time required for extracting features and training the models.
5.2 Evaluation Results
Here, we provide the results of using sampled training event logs instead of the whole training event logs. First, Table 4 shows how the size of the training data is reduced by sampling. As expected, the highest reduction occurs when log10 is used: with this sampling, the size of the RTFM event log is reduced by more than a factor of 1000. However,
Table 3: Event logs that are used in the evaluation and results of using them for the next activity and remaining time prediction.
Table 4: Reduction in the size of the training event logs (i.e., $R_S$) and the improvement in the feature extraction process (i.e., $R_{FE}$) using different sampling methods.

Table 5: $R_{Acc}$ of different event logs when different sampling methods are used.
for the Sepsis event log, as most variants are unique, sampling the training event logs using the division distribution cannot result in a high $R_S$. Moreover, Table 4 shows how using the sampled event logs reduces the time required to extract features from the event data, i.e., $R_{FE}$. As expected, there is a correlation between the size reduction of the sampled event logs and the improvement in $R_{FE}$.
In the following, we show how using the sampled event logs affects next activity, remaining time, and outcome prediction.
5.2.1 Next Activity Prediction
The accuracy of both the LSTM and XGBoost methods trained using sampled training data is presented in Table 5. The results indicate that, in most cases where the division sampling methods are used, we achieve an accuracy similar to the case where the whole training data is used, i.e., $R_{Acc}$ close to 1. In some cases, such as using d2 for the Sepsis event log and the LSTM method, the accuracy of the trained model is even (slightly) improved. However, for RTFM, sampling with the logarithmic or unique distribution strongly changes the frequency distribution of variants and consequently causes a larger reduction in the accuracy of the prediction models. Moreover, the accuracy reduction is higher when the XGBoost method is used. The results indicate that, by increasing the $R_S$ value, we lose more information in the training event logs, and, consequently, the quality of the prediction models decreases.
In Table 6, the $R_{F1}$ of the trained models is depicted. The results again indicate that using the sampling method with the division distribution in most cases leads to a similar (and sometimes higher) F1-score.
By considering the results of these two tables, we find that, specifically when the LSTM method is used, we obtain similar accuracy and F1-score. Moreover, for Sepsis and BPIC-2012-W, whose variants have similar frequencies, keeping all the variants (i.e., using the division and unique distributions) helps the prediction method achieve results similar to the case where we use the whole training data. However, for the RTFM event log, which has some highly frequent variants, using the unique distribution results in lower accuracy and F1-score.

Table 6: $R_{F1}$ of different event logs when different sampling methods are used.

Table 7: $R_t$ of different event logs when different sampling methods are used.
Table 7 shows how much faster training is when using the sampled training data instead of the whole event data. There is a direct relationship between the size reduction and $R_t$ (refer to the results in Table 4). However, in most cases, the performance improvement is larger for the XGBoost method. Considering the results in this table and Table 6, we find that, using the sampling methods, we are able to improve the performance of next activity prediction on the used event logs while obtaining similar results. However, oversampling (e.g., applying log10 to RTFM) results in lower accuracy.
5.2.2 Remaining Time Prediction
Table 8 and Table 9 show how the MAE and RMSE of the remaining time prediction methods change when the sampled event logs are used. The results indicate that, for the LSTM method, independent of the sampling method, we are able to provide predictions similar to the case where the whole event logs are used for all event logs. It seems that the settings used for training the prediction method are not good; in other words, the trained model is not accurate enough. We repeated this experiment for LSTM with several different parameter settings, but obtained almost the same results. This is mainly caused by the challenging nature of the remaining time prediction task compared to classification-based problems (such as next activity and outcome prediction). However, by sampling the training event logs, we keep the quality of the prediction models.
For the XGBoost method, the results indicate that, if we do not reduce the training log to only a small number of traces (for example, by using logarithmic sampling), we can achieve high $R_{MAE}$ and $R_{RMSE}$.
Table 8: $R_{MAE}$ of different event logs when different sampling methods are used.

Table 9: $R_{RMSE}$ of different event logs when different sampling methods are used.
In general, as the main attribute for next activity prediction is the sequence of activities, this task is less sensitive to sampling. However, to predict the remaining time, the other data attributes can be essential too. In other words, for remaining time prediction, we need larger sampled event logs.
Table 10 shows how, by sampling event logs, we are able to reduce the required training time and improve the performance of the remaining time prediction process. Considering the results in Table 10 and Table 7, as expected, a higher $R_S$ leads to a higher $R_t$ value.
5.2.3 Outcome Prediction
For outcome prediction, in order to facilitate comparison and remain consistent with previous work on outcome prediction, we transform each event log into different event logs [40]. For example, we transform the BPIC-2012 event log into BPIC2012Accepted, BPIC2012Cancelled, and BPIC2012Declined.
The $R_{Acc}$ and $R_{F1}$ of both the LSTM and XGBoost methods trained for outcome prediction using sampled training data are presented in Table 11 and Table 12, respectively. The results indicate that, in many cases, we are able to improve the accuracy of the outcome prediction algorithms. Specifically, using the unique strategy for sampling the traces of the BPIC2012Accepted event log leads to considerable improvements for both the LSTM and XGBoost methods. Unlike the other two applications, i.e., next activity and remaining time prediction, even when oversampling some event logs, e.g., with log10, we obtain results similar to the cases in which the whole training event logs are used.

Table 10: $R_t$ of different event logs when different sampling methods are used.

Table 11: $R_{Acc}$ of different event logs when different sampling methods are used for outcome prediction.

Table 12: $R_{F1}$ of different event logs when different sampling methods are used for outcome prediction.
In Table 13, the $R_t$ of the different sampling methods is shown. The performance improvement is usually larger for the LSTM method. There are several cases in which we are not able to improve the performance of the prediction method ($R_t$ values less than 1); this happens mainly for the unique sampling method. One reason could be that, by removing the frequencies, the convergence time of the learning method increases. When the logarithmic method is used, the performance improvement is around a factor of 50, meaning that the training process using the sampled event log is 50 times faster than when the whole training log is used.

Table 13: $R_t$ of different event logs when different sampling methods are used for outcome prediction.
6 Discussion
In this section, we discuss the results illustrated in the previous section. The results indicate that we do not always face the typical trade-off between the accuracy of the trained model and the performance of the prediction procedure. For example, for next activity prediction, there are cases where the training process is much faster than the normal procedure, even though the trained model provides an almost similar or even higher accuracy and F1-score. Thus, the proposed instance selection procedure can be applied when we aim to perform hyper-parameter optimization [4]; in this way, more settings can be analyzed in a limited time. Moreover, it is reasonable to use the proposed method when we aim to train an online prediction method or to train on limited hardware such as cell phones.
To achieve the highest performance improvement while keeping the trained model accurate enough, different sampling methods should be used for different event logs. For example, for the RTFM event log—as there are some highly frequent variants—the division distribution may be more useful. In other words, independently of the prediction method used, if we change the distribution of variants (e.g., using the unique distribution), it is expected that the accuracy will decrease sharply. However, for event logs with a more uniform distribution, we can use the unique distribution to sample event logs. Furthermore, the results indicate that the effect of the chosen distribution (i.e., unique, division, or logarithmic) is more important than the chosen $k$-value. This is mainly because the logarithmic distribution may remove some of the variants, and the unique distribution changes the frequency distribution of variants. Therefore, it would be interesting to investigate further the characteristics of a given event log and the suitable sampling parameters for it. For example, if most variants of a given event log are unique (e.g., Sepsis), using the logarithmic distribution leads to a remarkable $R_S$, and, consequently, $R_{FE}$ and $R_t$ will be very high; however, we will lose most of the variants, and the trained model might make poor predictions.
By analyzing the results, we found that infrequent activities can be ignored under some hyper-parameter settings. The significant difference between the F1-score and Accuracy values in Table 3 indicates this problem too. Using sampling methods that modify the distribution of the event log, such as the unique method, can help the prediction methods to also consider these activities. However, as these activities are infrequent, improving their prediction does not strongly impact the reported aggregated F1-score.
Finally, in real-life business scenarios, the process can change for different reasons [7]; this phenomenon is usually called concept drift. When the whole event log is used for training the prediction model, it is probable that these changes are not reflected in the predictions. Using the proposed sampling procedure and giving higher priority to newer traces, we expect to be able to adapt to such changes faster, which may be critical for specific applications.
Limitations
Comparing the results for next activity and remaining time prediction, we found that predicting the remaining time of a process is more sensitive to instance selection; in other words, this application requires more data to make accurate predictions. Thus, for cases where the target attribute depends more on other data attributes than on variant information, we need to sample more data to capture the related information; otherwise, the trained model might be inaccurate.
We also found a critical problem in predictive monitoring. In some cases, specifically when using LSTM for predicting the remaining time, the accuracy of the predictions is low. For next activity prediction, it is possible that the prediction models almost ignore infrequent activities. In these cases, even if we use the training data for the evaluation, we do not obtain acceptable results. In machine learning, this problem is called a high bias error [32]. In other words, the training is not efficient even when the whole data is used, and we need to change the prediction method (or its parameters).
7 Conclusion
In this paper, we proposed an instance selection approach to improve the performance of predictive business process monitoring methods. We suggested that it is possible to use a sample of the training event data instead of the whole training event data. To evaluate the proposed approach, we considered the main applications of predictive business process monitoring, i.e., next activity, remaining time, and outcome prediction. The results of applying the proposed approach to three real-life event logs with two widely used machine learning methods, i.e., LSTM and XGBoost, indicate that, in most cases, we are able to improve the performance of predictive monitoring algorithms while providing accuracy similar to the case where the whole training event logs are used. However, with oversampling, the accuracy of the trained model might be reduced. Moreover, we found that the remaining time prediction application is more sensitive to sampling.
To continue this research, we aim to extend the experiments to study the relationship between event log characteristics and the sampling parameters. In other words, we aim to help the end user adjust the sampling parameters based on the characteristics of the given event log. Moreover, it would be valuable to investigate how the proposed sampling procedure can be applied to streaming event data, which is potentially one of the major advantages of the proposed method in a real-life setting. Finally, it is interesting to investigate feature selection methods for improving the performance of the predictive monitoring procedure. It is expected that, similar to process instance sampling, feature selection methods are able to reduce the required training time.
Another important outcome of the results is that, for different event logs, we should use different sampling methods to achieve the highest performance. Considering the many different machine learning methods and their parameters can increase the search space and complexity for users; in other words, finding the right sampling setting may be challenging in real scenarios. Therefore, it would be valuable to research the relationship between event log characteristics and suitable sampling parameters for preprocessing the training event log.
Acknowledgment
The authors would like to thank the Alexander von Humboldt (AvH) Stiftung for funding this research.
Declarations
In this part, we provide some declarations about the conflict of interest, the code availability, and the availability of the data used in this paper.
Conflict of interest:
Code availability: Our proposed methods are available at https://svn.win.tue.nl/repos/prom/Packages/LogFiltering and https://github.com/gyunamister/pm-prediction/. For a part of the experiments, we have used the implementation available at https://github.com/verenich/time-prediction-benchmark.
Data availability: We have applied our proposed approach to the following three publicly available datasets (event logs).
BPIC-2012, accessible via https://data.4tu.nl/articles/dataset/BPI_Challenge_2012/12689204.
RTFM, accessible via https://data.4tu.nl/articles/dataset/Road_Traffic_Fine_Management_Process/12683249.
Sepsis, accessible via https://data.4tu.nl/articles/dataset/Sepsis_Cases_-_Event_Log/12707639.
References
[1] van der Aa, Han, Adrian Rebmann, and Henrik Leopold. “Natural language-based detection of semantic execution anomalies in event logs”. In: Information Systems 102 (2021), p. 101824. doi: 10.1016/j.is.2021.101824.
[2] van der Aalst, Wil M. P. Process Mining - Data Science in Action, Second Edition. Springer, 2016. isbn: 978-3-662-49850-7. doi: 10.1007/978-3-662-49851-4.
[3] van der Aalst, Wil M. P., M. H. Schonenberg, and Minseok Song. “Time prediction based on process mining”. In: Information Systems 36.2 (2011), pp. 450–475. doi: 10.1016/j.is.2010.09.001.
[4] Bergstra, James, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. “Algorithms for Hyper-Parameter Optimization”. In: Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, Granada, Spain. Ed. by Shawe-Taylor, John, Richard S. Zemel, Peter L. Bartlett, Fernando C. N. Pereira, and Kilian Q. Weinberger. 2011, pp. 2546–2554. url: https://proceedings.neurips.cc/paper/2011/hash/86e8f7ab32cfd12577bc2619bc635690-Abstract.html.
[5] Breiman, Leo. “Bagging Predictors”. In: Machine Learning 24.2 (1996), pp. 123–140. doi: 10.1007/BF00058655.
[6] Breuker, Dominic, Martin Matzner, Patrick Delfmann, and J¨
org Becker. “Com-
prehensible Predictive Models for Business Processes”. In: MIS Quarterly 40.4
(2016), pp. 1009–1034. url:http://misq.org/comprehensible-predic
tive-models-for-business-processes.html.
[7] Carmona, Josep and Ricard Gavald`
a. “Online Techniques for Dealing with Con-
cept Drif in Process Mining”. In: Advances in Intelligent Data Analysis XI
- 11th International Symposium, IDA 2012, Helsinki, Finland, October 25-27,
2012. Proceedings. Ed. by Hollm´
en, Jaakko, Frank Klawonn, and Allan Tucker.
Vol. 7619. Lecture Notes in Computer Science. Springer, 2012, pp. 90–102. doi:
10.1007/978-3-642-34156-4_10.
[8] Cazenavette, George, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, and
Jun-Yan Zhu. “Dataset Distillation by Matching Training Trajectories”. In: IEEE/CVF
Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Or-
leans, LA, USA, June 18-24, 2022. IEEE, 2022, pp. 10708–10717. doi:10.1109/
CVPR52688.2022.01045.
[9] Chen, Tianqi and Carlos Guestrin. “XGBoost: A Scalable Tree Boosting Sys-
tem”. In: Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17,
2016. Ed. by Krishnapuram, Balaji, Mohak Shah, Alexander J. Smola, Charu C.
Aggarwal, Dou Shen, and Rajeev Rastogi. ACM, 2016, pp. 785–794. doi:10.1145/2939672.2939785.
[10] Evermann, Joerg, Jana-Rebecca Rehse, and Peter Fettke. “Predicting process be-
haviour using deep learning”. In: Decision Support Systems 100 (2017), pp. 129–
140. doi:10.1016/j.dss.2017.04.003.
[11] Fani Sani, Mohammadreza, Mozhgan Vazifehdoostirani, Gyunam Park, Marco
Pegoraro, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst. “Event Log Sam-
pling for Predictive Monitoring”. In: Process Mining Workshops - ICPM 2021
International Workshops, Eindhoven, The Netherlands, October 31 - November
4, 2021, Revised Selected Papers. Ed. by Munoz-Gama, Jorge and Xixi Lu. Vol. 433.
Lecture Notes in Business Information Processing. Springer, 2021, pp. 154–166.
doi:10.1007/978-3-030-98581-3_12.
[12] Fani Sani, Mohammadreza, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst.
“Conformance Checking Approximation Using Subset Selection and Edit Dis-
tance”. In: Advanced Information Systems Engineering - 32nd International Con-
ference, CAiSE 2020, Grenoble, France, June 8-12, 2020, Proceedings. Ed. by Dust-
dar, Schahram, Eric Yu, Camille Salinesi, Dominique Rieu, and Vik Pant. Vol. 12127.
Lecture Notes in Computer Science. Springer, 2020, pp. 234–251. doi:10.1007/
978-3-030-49435-3_15.
[13] Fani Sani, Mohammadreza, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst.
“The impact of biased sampling of event logs on the performance of process dis-
covery”. In: Computing 103.6 (2021), pp. 1085–1104. doi:10.1007/s00607-021-00910-4.
[14] Galanti, Riccardo, Bernat Coma-Puig, Massimiliano de Leoni, Josep Carmona, and Nicolò Navarin. “Explainable Predictive Process Monitoring”. In: 2nd International Conference on Process Mining, ICPM 2020, Padua, Italy, October 4-9, 2020. Ed. by van Dongen, Boudewijn F., Marco Montali, and Moe Thandar Wynn. IEEE, 2020, pp. 1–8. doi:10.1109/ICPM49681.2020.00012.
[15] García, Salvador, Julián Luengo, and Francisco Herrera. Data Preprocessing in Data Mining. Vol. 72. Intelligent Systems Reference Library. Springer, 2015. isbn: 978-3-319-10246-7. doi:10.1007/978-3-319-10247-4.
[16] Huang, Zhiheng, Wei Xu, and Kai Yu. “Bidirectional LSTM-CRF Models for
Sequence Tagging”. In: CoRR abs/1508.01991 (2015). arXiv: 1508.01991.
[17] de Leoni, Massimiliano, Wil M. P. van der Aalst, and Marcus Dees. “A general
process mining framework for correlating, predicting and clustering dynamic be-
havior based on event logs”. In: Information Systems 56 (2016), pp. 235–257. doi:
10.1016/j.is.2015.07.003.
[18] Luque, Amalia, Alejandro Carrasco, Alejandro Martín, and Ana de las Heras. “The impact of class imbalance in classification performance metrics based on the binary confusion matrix”. In: Pattern Recognition 91 (2019), pp. 216–231. doi:10.1016/j.patcog.2019.02.023.
[19] Márquez-Chamorro, Alfonso Eduardo, Manuel Resinas, and Antonio Ruiz-Cortés. “Predictive Monitoring of Business Processes: A Survey”. In: IEEE Transactions on Services Computing 11.6 (2018), pp. 962–977. doi:10.1109/TSC.2017.2772256.
[20] Navarin, Nicolò, Beatrice Vincenzi, Mirko Polato, and Alessandro Sperduti. “LSTM networks for data-aware remaining time prediction of business process instances”. In: 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017, Honolulu, HI, USA, November 27 - Dec. 1, 2017. IEEE, 2017, pp. 1–7. doi:10.1109/SSCI.2017.8285184.
[21] Nguyen, An, Srijeet Chatterjee, Sven Weinzierl, Leo Schwinn, Martin Matzner, and Bjoern M. Eskofier. “Time Matters: Time-Aware LSTMs for Predictive Business Process Monitoring”. In: Process Mining Workshops - ICPM 2020 International Workshops, Padua, Italy, October 5-8, 2020, Revised Selected Papers. Ed. by Leemans, Sander J. J. and Henrik Leopold. Vol. 406. Springer, 2020, pp. 112–123. doi:10.1007/978-3-030-72693-5_9.
[22] Park, Gyunam and Wil M. P. van der Aalst. “Action-oriented process mining:
bridging the gap between insights and actions”. In: Progress in Artificial Intelli-
gence (2022). issn: 2192-6352, 2192-6360. doi:10.1007/s13748-022-00281-
7.
[23] Park, Gyunam, Aaron Küsters, Mara Tews, Cameron Pitsch, Jonathan Schneider, and Wil M. P. van der Aalst. “Explainable Predictive Decision Mining for Operational Support”. In: CoRR abs/2210.16786 (2022). doi:10.48550/arXiv.2210.16786.
[24] Park, Gyunam and Minseok Song. “Predicting performances in business pro-
cesses using deep neural networks”. In: Decision Support Systems 129 (2020). doi:
10.1016/j.dss.2019.113191.
[25] Park, Gyunam and Minseok Song. “Prediction-based Resource Allocation using
LSTM and Minimum Cost and Maximum Flow Algorithm”. In: International
Conference on Process Mining, ICPM 2019, Aachen, Germany, June 24-26, 2019.
IEEE, 2019, pp. 121–128. doi:10.1109/ICPM.2019.00027.
[26] Pauwels, Stephen and Toon Calders. “Incremental Predictive Process Monitor-
ing: The Next Activity Case”. In: Business Process Management - 19th Interna-
tional Conference, BPM 2021, Rome, Italy, September 06-10, 2021, Proceedings.
Ed. by Polyvyanyy, Artem, Moe Thandar Wynn, Amy Van Looy, and Manfred
Reichert. Vol. 12875. Lecture Notes in Computer Science. Springer, 2021, pp. 123–
140. doi:10.1007/978-3-030-85469-0_10.
[27] Pegoraro, Marco, Merih Seran Uysal, David Benedikt Georgi, and Wil M. P. van der Aalst. “Text-Aware Predictive Monitoring of Business Processes”. In: 24th International Conference on Business Information Systems, BIS 2021, Hannover, Germany, June 15-17, 2021. Ed. by Abramowicz, Witold, Sören Auer, and Elzbieta Lewanska. 2021, pp. 221–232. doi:10.52825/bis.v1i.62.
[28] Pegoraro, Marco, Merih Seran Uysal, Tom-Hendrik Hülsmann, and Wil M. P. van der Aalst. “Resolving Uncertain Case Identifiers in Interaction Logs: A User Study”. In: CoRR abs/2212.00009 (2022). doi:10.48550/arXiv.2212.00009.
[29] Pegoraro, Marco, Merih Seran Uysal, Tom-Hendrik Hülsmann, and Wil M. P. van der Aalst. “Uncertain Case Identifiers in Process Mining: A User Study of the Event-Case Correlation Problem on Click Data”. In: Enterprise, Business-Process and Information Systems Modeling - 23rd International Conference, BPMDS 2022 and 27th International Conference, EMMSAD 2022, Held at CAiSE 2022, Leuven, Belgium, June 6-7, 2022, Proceedings. Ed. by Augusto, Adriano, Asif Gill, Dominik Bork, Selmin Nurcan, Iris Reinhartz-Berger, and Rainer Schmidt. Vol. 450. Lecture Notes in Business Information Processing. Springer, 2022, pp. 173–187. doi:10.1007/978-3-031-07475-2_12.
[30] Polato, Mirko, Alessandro Sperduti, Andrea Burattin, and Massimiliano de Leoni.
“Time and activity sequence prediction of business process instances”. In: Com-
puting 100.9 (2018), pp. 1005–1031. doi:10.1007/s00607-018-0593-x.
[31] Pourghassemi, Behnam, Chenghao Zhang, Joo Hwan Lee, and Aparna Chan-
dramowlishwaran. “On the Limits of Parallelizing Convolutional Neural Net-
works on GPUs”. In: SPAA ’20: 32nd ACM Symposium on Parallelism in Al-
gorithms and Architectures, Virtual Event, USA, July 15-17, 2020. Ed. by Schei-
deler, Christian and Michael Spear. ACM, 2020, pp. 567–569. doi:10.1145/
3350755.3400266.
[32] van der Putten, Peter and Maarten van Someren. “A Bias-Variance Analysis of a
Real World Learning Problem: The CoIL Challenge 2000”. In: Machine Learn-
ing 57.1-2 (2004), pp. 177–195. doi:10.1023/B:MACH.0000035476.95130.
99.
[33] Qafari, Mahnaz Sadat and Wil M. P. van der Aalst. “Root Cause Analysis in Process Mining Using Structural Equation Models”. In: Business Process Management Workshops - BPM 2020 International Workshops, Seville, Spain, September 13-18, 2020, Revised Selected Papers. Ed. by del-Río-Ortega, Adela, Henrik Leopold, and Flávia Maria Santoro. Vol. 397. Lecture Notes in Business Information Processing. Springer, 2020, pp. 155–167. doi:10.1007/978-3-030-66498-5_12.
[34] Rogge-Solti, Andreas and Mathias Weske. “Prediction of Remaining Service Exe-
cution Time Using Stochastic Petri Nets with Arbitrary Firing Delays”. In: Service-
Oriented Computing - 11th International Conference, ICSOC 2013, Berlin, Ger-
many, December 2-5, 2013, Proceedings. Ed. by Basu, Samik, Cesare Pautasso,
Liang Zhang, and Xiang Fu. Vol. 8274. Springer, 2013, pp. 389–403. doi:10.
1007/978-3-642-45005-1_27.
[35] Senderovich, Arik, Chiara Di Francescomarino, Chiara Ghidini, Kerwin Jorbina,
and Fabrizio Maria Maggi. “Intra and Inter-case Features in Predictive Process
Monitoring: A Tale of Two Dimensions”. In: Business Process Management - 15th
International Conference, BPM 2017, Barcelona, Spain, September 10-15, 2017,
Proceedings. Ed. by Carmona, Josep, Gregor Engels, and Akhil Kumar. Vol. 10445.
Lecture Notes in Computer Science. Springer, 2017, pp. 306–323. doi:10.1007/
978-3-319-65000-5_18.
[36] Sindhgatta, Renuka, Catarina Moreira, Chun Ouyang, and Alistair Barros. “Exploring Interpretable Predictive Models for Business Processes”. In: Business Process Management - 18th International Conference, BPM 2020, Seville, Spain, September 13-18, 2020, Proceedings. Ed. by Fahland, Dirk, Chiara Ghidini, Jörg Becker, and Marlon Dumas. Vol. 12168. Lecture Notes in Computer Science. Springer, 2020, pp. 257–272. doi:10.1007/978-3-030-58666-9_15.
[37] Stierle, Matthias, Jens Brunk, Sven Weinzierl, Sandra Zilker, Martin Matzner, and Jörg Becker. “Bringing Light Into the Darkness - A Systematic Literature Review on Explainable Predictive Business Process Monitoring Techniques”. In: 28th European Conference on Information Systems - Liberty, Equality, and Fraternity in a Digitizing World, ECIS 2020, Marrakech, Morocco, June 15-17, 2020. Ed. by Rowe, Frantz, Redouane El Amrani, Moez Limayem, Sabine Matook, Christoph Rosenkranz, Edgar A. Whitley, and Ali El Quammah. 2021. url:https://aisel.aisnet.org/ecis2021_rip/8.
[38] Tax, Niek, Ilya Verenich, Marcello La Rosa, and Marlon Dumas. “Predictive Busi-
ness Process Monitoring with LSTM Neural Networks”. In: Advanced Infor-
mation Systems Engineering - 29th International Conference, CAiSE 2017, Es-
sen, Germany, June 12-16, 2017, Proceedings. Ed. by Dubois, Eric and Klaus Pohl.
Vol. 10253. Springer, 2017, pp. 477–492. doi:10.1007/978-3-319-59536-
8_30.
[39] Teinemaa, Irene, Marlon Dumas, Fabrizio Maria Maggi, and Chiara Di Francesco-
marino. “Predictive Business Process Monitoring with Structured and Unstruc-
tured Data”. In: Business Process Management - 14th International Conference,
BPM 2016, Rio de Janeiro, Brazil, September 18-22, 2016. Proceedings. Ed. by
Rosa, Marcello La, Peter Loos, and Oscar Pastor. Vol. 9850. Lecture Notes in
Computer Science. Springer, 2016, pp. 401–417. doi:10.1007/978-3-319-45348-4_23.
[40] Teinemaa, Irene, Marlon Dumas, Marcello La Rosa, and Fabrizio Maria Maggi.
“Outcome-Oriented Predictive Process Monitoring: Review and Benchmark”.
In: ACM Transactions on Knowledge Discovery from Data 13.2 (2019), 17:1–17:57.
doi:10.1145/3301300.
[41] Verbeek, Eric, Joos C. A. M. Buijs, Boudewijn F. van Dongen, and Wil M. P. van der Aalst. “ProM 6: The Process Mining Toolkit”. In: Proceedings of the Business Process Management 2010 Demonstration Track, Hoboken, NJ, USA, September 14-16, 2010. Ed. by Rosa, Marcello La. Vol. 615. CEUR Workshop Proceedings. CEUR-WS.org, 2010. url:http://ceur-ws.org/Vol-615/paper13.pdf.
[42] Verenich, Ilya. “Explainable Predictive Monitoring of Temporal Measures of Business Processes”. In: Proceedings of the Dissertation Award, Doctoral Consortium, and Demonstration Track at BPM 2019 co-located with 17th International Conference on Business Process Management, BPM 2019, Vienna, Austria, September 1-6, 2019. Ed. by Depaire, Benoît, Johannes De Smedt, Marlon Dumas, Dirk Fahland, Akhil Kumar, Henrik Leopold, Manfred Reichert, Stefanie Rinderle-Ma, Stefan Schulte, Stefan Seidel, and Wil M. P. van der Aalst. Vol. 2420. CEUR Workshop Proceedings. CEUR-WS.org, 2019, pp. 26–30. url:http://ceur-ws.org/Vol-2420/paperDA6.pdf.
[43] Verenich, Ilya, Marlon Dumas, Marcello La Rosa, Fabrizio Maria Maggi, and Irene Teinemaa. “Survey and Cross-benchmark Comparison of Remaining Time Prediction Methods in Business Process Monitoring”. In: ACM Transactions on Intelligent Systems and Technology 10.4 (2019), 34:1–34:34. doi:10.1145/3331449.
[44] Wilson, D. Randall and Tony R. Martinez. “Reduction Techniques for Instance-
Based Learning Algorithms”. In: Machine Learning 38.3 (2000), pp. 257–286.
doi:10.1023/A:1007626913721.
[45] Wilson, Dennis L. “Asymptotic Properties of Nearest Neighbor Rules Using
Edited Data”. In: IEEE Transactions on Systems, Man and Cybernetics 2.3 (1972),
pp. 408–421. doi:10.1109/TSMC.1972.4309137.
[46] Zhou, Lina, Shimei Pan, Jianwu Wang, and Athanasios V. Vasilakos. “Machine
learning on big data: Opportunities and challenges”. In: Neurocomputing 237
(2017), pp. 350–361. doi:10.1016/j.neucom.2017.01.026.
Anomaly detection in process mining aims to recognize outlying or unexpected behavior in event logs for purposes such as the removal of noise and identification of conformance violations. Existing techniques for this task are primarily frequency-based, arguing that behavior is anomalous because it is uncommon. However, such techniques ignore the semantics of recorded events and, therefore, do not take the meaning of potential anomalies into consideration. In this work, we overcome this caveat and focus on the detection of anomalies from a semantic perspective, arguing that anomalies can be recognized when process behavior does not make sense. To achieve this, we propose an approach that exploits the natural language associated with events. Our key idea is to detect anomalous process behavior by identifying semantically inconsistent execution patterns. To detect such patterns, we first automatically extract business objects and actions from the textual labels of events. We then compare these against a process-independent knowledge base. By populating this knowledge base with patterns from various kinds of resources, our approach can be used in a range of contexts and domains. We demonstrate the capability of our approach to successfully detect semantic execution anomalies through an evaluation based on a set of real-world and synthetic event logs and show the complementary nature of semantics-based anomaly detection to existing frequency-based techniques.