White-Box Prediction of Process Performance Indicators via
Flow Analysis
Ilya Verenich
Queensland University of Technology
Brisbane, Australia
ilya.verenich@qut.edu.au
Hoang Nguyen
Queensland University of Technology
Brisbane, Australia
huanghuy.nguyen@hdr.qut.edu.au
Marcello La Rosa
Queensland University of Technology
Brisbane, Australia
m.larosa@qut.edu.au
Marlon Dumas
University of Tartu
Tartu, Estonia
marlon.dumas@ut.ee
ABSTRACT
Predictive business process monitoring methods exploit histori-
cal process execution logs to provide predictions about running
instances of a process, which enable process workers and man-
agers to preempt performance issues or compliance violations. A
number of approaches have been proposed to predict quantitative
process performance indicators, such as remaining cycle time, cost,
or probability of deadline violation. However, these approaches
adopt a black-box approach, insofar as they predict a single scalar
value without decomposing this prediction into more elementary
components. In this paper, we propose a white-box approach to
predict performance indicators of running process instances. The key idea is to first predict the performance indicator at the level of activities, and then to aggregate these predictions at the level of a process instance by means of flow analysis techniques. The paper specifically develops this idea in the context of predicting the remaining cycle time of ongoing process instances. The proposed approach has been evaluated on four real-life event logs and compared against several baselines.
CCS CONCEPTS
• Information systems → Information systems applications; Decision support systems;
KEYWORDS
Process Mining, Predictive Process Monitoring, Flow analysis
ACM Reference format:
Ilya Verenich, Hoang Nguyen, Marcello La Rosa, and Marlon Dumas. 2017. White-Box Prediction of Process Performance Indicators via Flow Analysis. In Proceedings of 2017 International Conference on Software and Systems Process, Paris, France, July 2017 (ICSSP'17), 10 pages. DOI: 10.1145/3084100.3084110

This author is also affiliated with the Institute of Computer Science, University of Tartu, Estonia.
This author is also affiliated with the Queensland University of Technology, Australia.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

ICSSP'17, Paris, France
© 2017 ACM. 978-1-4503-5270-3/17/07...$15.00
DOI: 10.1145/3084100.3084110
1 INTRODUCTION
Predictive business process monitoring techniques seek to determine the future state or properties of ongoing process instances based on models extracted from historical event logs. A wide range of predictive monitoring techniques have been proposed to predict, for example, compliance violations [13, 14], the next activity or the remaining sequence of activities of a process instance [8, 23], or quantitative process performance indicators, such as the remaining cycle time of a process instance [18, 19, 22]. These predictions can
be used to alert process workers to problematic process instances or
to support resource allocation decisions, e.g. to allocate additional
resources to instances that are at risk of a deadline violation.
This paper addresses the problem of predicting quantitative process performance indicators, with a specific focus on predicting
the remaining cycle time of ongoing process instances. Existing
approaches to this problem adopt a “black-box” approach by build-
ing stochastic models or regression models which, given a process
instance, predict the remaining execution time as a single scalar
value, without seeking to explain this prediction in terms of more
elementary components. Yet, quantitative performance indicators
such as cost or time are aggregations of corresponding performance
indicators of the activities composing the process. In particular, the
cycle time of a process instance is the sum of the cycle times of the activities performed in that process instance. In this respect,
existing techniques allow us to predict the aggregate value of a per-
formance indicator for a running process instance, but they do not
explain how each activity contributes to this aggregate prediction.
Motivated by this observation, this paper proposes a “white-box”
approach to predicting quantitative performance indicators of run-
ning process instances based on a general technique for quantitative
process analysis known as flow analysis. The idea of flow analysis is to estimate a quantitative performance indicator at the level of a process by aggregating the estimated values of this performance indicator at the level of the activities in the process, taking into account the control-flow relations between these activities. Accordingly, in order to predict the remaining cycle time of a process instance, we propose to first estimate the cycle time of each activity that might potentially be executed within this process instance, and then to aggregate these estimates using flow analysis.
In addition to providing predictions that can be traced down to
the level of individual activities, we show via an empirical evalua-
tion with real-life business process event logs, that the proposed
technique achieves comparable and sometimes higher prediction
accuracy relative to several state-of-the-art “black-box” baselines.
The remainder of the paper is structured as follows. Section 2
presents the related work on process prediction, with an emphasis
on the prediction of remaining time. Section 3 introduces the im-
portant concepts and notations used in the paper. Section 4 outlines
the details of the proposed approach. Next, Section 5 presents an
experimental evaluation of our approach and compares it with the
baseline techniques. Finally, Section 6 concludes the paper and
outlines future work directions.
2 RELATED WORK
A wide range of predictive business process monitoring problems
have been studied in previous work, including the prediction of
delays and deadline violations, remaining cycle time, outcome, and
future events of a running case.
The problem of predicting delays and deadline violations in business processes has been addressed by different authors. Pika et al. [16] propose a technique for predicting deadline violations by identifying process risk indicators that signal the possibility of a delay. Metzger et al. [15] present techniques for predicting "late show" events (i.e. delays between the expected and the actual time of arrival) in a freight transportation process by finding correlations between "late show" events and external variables related to weather conditions or road traffic. Finally, Senderovich et al. [20]
apply queue mining techniques to predict delays in case executions.
Another group of works addresses the prediction of the remaining cycle time of running cases. Van Dongen et al. [24] predict the remaining time by fitting non-parametric regression models based on the frequencies of activities within each case, their average durations, and case attributes. Van der Aalst et al. [22] propose a remaining time prediction method that constructs a transition system from the event log using set, bag, or sequence abstractions of observed events. Polato et al. [17] refine this method by proposing a data-aware transition system annotated with classifiers and regressors. Rogge-Solti and Weske [18, 19] model business processes as stochastic Petri nets and perform Monte Carlo simulation to predict the remaining time of a process instance. De Leoni et al. [5, 6] propose a general framework to predict various characteristics of running instances, including the remaining time, based on correlations with other characteristics and using decision and regression trees. The remaining time prediction problem has also been extensively studied in the context of software development processes. For example, Kikas et al. [10] predict issue resolution time in GitHub projects using static, dynamic and contextual features. In this paper, we show that the remaining cycle time of a process instance can be decomposed into a sum of the cycle times of the activities that are yet to be performed in that process instance. Thus, by estimating the cycle times of individual activities, we can estimate the entire remaining
time of a case.
Another category of techniques aims to predict the outcome of running cases. For example, Maggi et al. [14] propose a framework to predict the outcome of a case (normal vs. deviant) based on the sequence of activities executed in a given case and the values of the data attributes of the last executed activity in the case. This latter framework constructs a classifier on-the-fly (e.g. a decision tree or random forest) based on historical cases that are similar to the (incomplete) trace of a running case. Other approaches construct a collection of classifiers offline. For example, [13] construct one classifier for every possible prediction point (e.g. predicting the outcome after the first event, after the second one, and so on). Conforti et al. [4] apply a multi-classifier (decision trees) at each decision
point of the process, to predict the likelihood of various types of
risks, such as cost overruns and deadline violations.
A nal group of techniques aim to predict future event(s) of a
running case. Lakshmanan et al. [
12
] use Markov chains to estimate
the probability of future execution of a given task in a running case;
Breuker et al. [
3
] use probabilistic nite automata to predict the
next activity to be performed while Tax et al [
21
] predict the entire
continuation of a running case as well as timestamps of future
events using long short-term memory (LSTM) neural networks.
In this paper, we do not address the problems of case outcome
prediction and future events prediction, although our approach
could in principle be extended in these directions.
3 BACKGROUND
In this section, we introduce concepts used in later sections of this
paper.
3.1 Event logs, traces and sequences
For a given set $A$, $A^*$ denotes the set of all sequences over $A$, and $\sigma = \langle a_1, a_2, \ldots, a_n \rangle$ is a sequence of length $n$; $\langle \rangle$ is the empty sequence and $\sigma_1 \cdot \sigma_2$ is the concatenation of sequences $\sigma_1$ and $\sigma_2$. $hd^k(\sigma) = \langle a_1, a_2, \ldots, a_k \rangle$ is the prefix of length $k$ ($0 < k < n$) of sequence $\sigma$, and $tl^k(\sigma) = \langle a_{k+1}, \ldots, a_n \rangle$ is its suffix. For example, for a sequence $\sigma_1 = \langle a, b, c, d, e \rangle$, $hd^2(\sigma_1) = \langle a, b \rangle$ and $tl^2(\sigma_1) = \langle c, d, e \rangle$.
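For concreteness, the prefix and suffix operators can be implemented directly on Python lists; this is only an illustrative sketch, not part of the original formalization.

def hd(sigma, k):
    """Prefix of length k of a trace sigma, i.e. hd^k(sigma)."""
    return sigma[:k]

def tl(sigma, k):
    """Suffix of sigma obtained by dropping its first k events, i.e. tl^k(sigma)."""
    return sigma[k:]

sigma1 = ["a", "b", "c", "d", "e"]
assert hd(sigma1, 2) == ["a", "b"]
assert tl(sigma1, 2) == ["c", "d", "e"]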
Let $\mathcal{E}$ be the event universe, i.e., the set of all possible event identifiers, and $\mathcal{T}$ the time domain. We assume that events are characterized by various properties. One of these properties is the timestamp of an event¹, meaning that there is a function $\pi_T : \mathcal{E} \rightarrow \mathcal{T}$ that assigns timestamps to events. Other properties of an event include its activity, the resource performing the event, etc.

¹ Hereinafter, we refer to the event completion timestamp unless otherwise noted.
Denition 3.1 (Trace). Atrace is a nite non-empty sequence of
events
σ∈ E
such that each event appears only once and time
is non-decreasing, i.e., for 1
i<j≤ |σ|
:
σ(i),σ(j)
and
πT(σ(i)) πT(σ(j) )
. A trace in a log represents the execution of
one case.
Denition 3.2 (Event log). An event log is a set of events, each
linked to a particular trace and globally unique, i.e., the same event
cannot occur twice in a log.
3.2 Flow analysis
Flow analysis is a family of techniques that enables estimation of the overall performance of a process given knowledge about the performance of its activities. For example, using flow analysis one can calculate the average cycle time of an entire process if the average cycle time of each activity is known. Flow analysis can also be used to calculate the average cost of a process instance knowing the cost-per-execution of each activity, or to calculate the error rate of a process given the error rate of each activity [7]. The main advantage of flow analysis is that the estimation can be easily explained in terms of its elementary components.
Denition 3.3 (Cycle time of an activity). Acycle time of an activ-
ity
i
is the average time it takes between the moment the activity
is ready to be executed and the moment it completes. By “ready to
be executed” we mean that all activities upon which the activity
in question depends have completed. Formally, cycle time is the
dierence between the timestamp of the activity and the timestamp
of the previous activity. i.e.
πT(σ(i)) πT(σ(i
1
))
for 1
i≤ |σ|
.
Here, πT(σ(0)) denotes the start time of the case.
The cycle time of an activity includes the processing time of the activity, as well as all waiting time prior to the execution of the activity. Processing time refers to the time that actors spend doing actual work. On the other hand, waiting time is the portion of the cycle time where no work is being done to advance the process. This may include time spent in transferring information about the case between process participants, for example when documents are exchanged by post, as well as time when the case is waiting for an actor to process it. In many processes, the waiting time makes up a considerable proportion of the overall cycle time. This situation may, for example, happen when the work is performed in batches. In a process related to the approval of purchase requisitions at a company, the supervisor responsible for such approvals in a business unit might choose to batch all applications and check them only once at the start or the end of a working day [7].
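As an illustration of Definition 3.3, the sketch below derives activity cycle times from a trace of (activity, completion timestamp) pairs; the event layout and values are hypothetical and kept deliberately simple.

from datetime import datetime

def activity_cycle_times(trace, case_start):
    """Cycle time of each event = its completion timestamp minus the
    completion timestamp of the previous event (or the case start)."""
    cycle_times = []
    previous = case_start
    for activity, completed_at in trace:
        cycle_times.append((activity, (completed_at - previous).total_seconds()))
        previous = completed_at
    return cycle_times

ts = lambda s: datetime.strptime(s, "%H:%M:%S")
trace = [("A", ts("09:13:00")), ("B", ts("09:14:20")), ("D", ts("09:16:00"))]
print(activity_cycle_times(trace, case_start=ts("09:12:00")))
# [('A', 60.0), ('B', 80.0), ('D', 100.0)]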
To understand how flow analysis works, we start with an example of a process with sequential fragments of events as in Figure 1a. Each fragment has a single entry flow and a single exit flow and has a cycle time $T_i$. Since the fragments are performed one after the other, we can intuitively conclude that the cycle time $CT$ of a purely sequential process with $N$ fragments is the sum of the cycle times of each fragment [7]:

$CT = \sum_{i=1}^{N} T_i$    (1)

Let us consider a process model with a decision point between $N$ mutually exclusive fragments, represented by an XOR gateway (Figure 1b). In this case, the cycle time of the process model is

$CT = \sum_{i=1}^{N} p_i \cdot T_i$,    (2)

where $p_i$ denotes the branching probability, i.e. the frequency with which a given branch $i$ of the decision gateway is taken.

In the case of parallel (AND) gateways, where activities can be executed concurrently as in Figure 1c, the combined cycle time of multiple fragments is determined by the slowest of the fragments, that is:

$CT = \max_{i=1 \ldots N} T_i$    (3)

Another recurrent pattern is the one where a fragment of a process may be repeated multiple times, for instance because of a failed quality control. This situation is called rework and is illustrated in Figure 1d. The fragment is executed once. Next, it might be repeated each time with a probability $r$, referred to as the rework probability. The average number of times that the rework fragment is expected to be executed can be obtained via the geometric series [7], and the cycle time of the fragment in this case is:

$CT = \frac{T}{1 - r}$    (4)
Figure 1: Typical process model patterns: sequential (a), XOR-block (b), AND-block (c) and rework loop (d).
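The four patterns above compose recursively over a block-structured process tree. The following sketch is a minimal illustration of that aggregation, assuming a simple tuple-based tree representation that is not part of the paper's implementation.

def cycle_time(node):
    """Expected cycle time of a process-tree fragment.
    A node is either ('act', T) or (operator, children, params)."""
    kind = node[0]
    if kind == "act":                    # leaf activity with known cycle time T
        return node[1]
    if kind == "seq":                    # Eq. (1): sum of the fragments
        return sum(cycle_time(c) for c in node[1])
    if kind == "xor":                    # Eq. (2): probability-weighted sum of the branches
        children, probs = node[1], node[2]
        return sum(p * cycle_time(c) for c, p in zip(children, probs))
    if kind == "and":                    # Eq. (3): slowest branch dominates
        return max(cycle_time(c) for c in node[1])
    if kind == "loop":                   # Eq. (4): body repeated with rework probability r
        body, r = node[1], node[2]
        return cycle_time(body) / (1.0 - r)
    raise ValueError(f"unknown node type: {kind}")

# Hypothetical model: A ; (B;C || D) ; F ; xor(G with p=0.3, skip) ; loop(H, r=0.2)
model = ("seq", [
    ("act", 10.0),
    ("and", [("seq", [("act", 5.0), ("act", 4.0)]), ("act", 7.0)]),
    ("act", 2.0),
    ("xor", [("act", 6.0), ("act", 0.0)], [0.3, 0.7]),
    ("loop", ("act", 3.0), 0.2),
])
print(cycle_time(model))  # 10 + max(9, 7) + 2 + 0.3*6 + 3/0.8 = 26.55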
Besides cycle time, flow analysis can also be used to calculate other performance measures. For instance, assuming we know the average cost of each activity, we can calculate the cost of a process more or less in the same way as we calculate cycle time. In particular, the cost of a sequence of activities is the sum of the costs of these activities. The only difference between calculating cycle time and calculating cost relates to the treatment of AND-blocks. The cost of an AND-block such as the one shown in Figure 1c is not the maximum of the cost of the branches of the AND-block. Instead, the cost of such a block is the sum of the costs of the branches. This is because after the AND-split is traversed, every branch of the AND-block is executed, and therefore the costs of these branches add up [7].
In the case of block-structured process models that can be represented as a sequence of fragments with a single entry and a single exit, we can relate each fragment to one of the four described types and use the aforementioned equations to estimate the required performance measure. However, in the case of an unstructured process model, or if a model contains other modeling constructs besides AND and XOR gateways, the method for calculating performance measures becomes more complicated.
A major limitation of flow analysis is that it does not consider the fact that a process behaves differently depending on the load, i.e. the number of process instances that are running concurrently. For example, the cycle time of a process for handling insurance claims would be much longer if the insurance company were handling thousands of claims at once, due for example to a recent natural disaster, as compared to the case where the load is low and the company may be handling only a hundred claims at once. When the load increases and the number of process workers remains constant, the waiting times tend to increase. This phenomenon is referred to as resource contention. It occurs when there is more work to be done than resources available to perform the work. In such scenarios, some tasks will be in waiting mode until a required resource is freed up. Flow analysis does not take into account the effects of increased resource contention. Instead, the estimates obtained from flow analysis are only applicable if the level of resource contention is relatively stable over the long term.
4 APPROACH
In this section, we describe the proposed approach to predict the remaining time. We first provide an overview of the entire solution framework and then focus on the key parts of our approach.
4.1 Overview
Our approach exploits historical execution traces in order to discover a structured process model. Once the model has been discovered, we identify its set of activities and decision points and train two families of machine learning models: one to predict the cycle time of each activity, and the other to predict the branching probabilities of each decision point. To speed up the performance at runtime, these steps are performed offline (Figure 2).

At runtime, given an ongoing process instance, we align its partial trace with the discovered process model to determine the current state of the instance. Next, we traverse the process tree obtained from the model, starting from this state up to the process end, and deduce a formula for the remaining time using the rules described in Section 3.2. The formula includes the cycle times of activities and the branching probabilities of decision points that are reachable from the current execution state. These components are predicted using previously trained regression and classification models. Finally, we evaluate the formula and obtain the expected value of the remaining cycle time.
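The two phases can be summarized by the following skeleton. Every helper here (discover_model, replay, to_formula, and the formula object's evaluate method) is an illustrative placeholder for the components detailed in Sections 4.2 to 4.5, not an existing API.

def offline_phase(log, discover_model, train_regressor, train_classifier):
    """Discover a structured model from the log and fit one regressor per
    activity (cycle time) and one classifier per XOR gateway (branching)."""
    model = discover_model(log)
    regressors = {a: train_regressor(log, a) for a in model["activities"]}
    classifiers = {g: train_classifier(log, g) for g in model["xor_gateways"]}
    return model, regressors, classifiers

def online_phase(prefix, model, regressors, classifiers, replay, to_formula):
    """Align the running case with the model, derive the flow-analysis formula
    from the current marking, and evaluate it with predicted components."""
    marking = replay(model, prefix)
    formula = to_formula(model, marking)
    return formula.evaluate(
        cycle_time=lambda activity: regressors[activity].predict(prefix),
        branch_prob=lambda gateway: classifiers[gateway].predict_proba(prefix),
    )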
4.2 Discovering Process Models From Event
Logs
The proposed approach relies on a process model as input. However, since the model is not always known or might not conform to the real process, generally we need to discover the model from event logs. For that, we use a two-step automated process discovery technique proposed in [2] that has been shown to outperform traditional approaches with respect to a range of accuracy and complexity measures. The technique has been implemented as a standalone tool¹ as well as a ProM plugin, namely StructuredMiner.

Figure 2: Overview of the proposed approach.

The technique in [2] pursues a two-phase "discover and structure" approach. In the first phase, a model is discovered from the log using a heuristic process discovery method that has been shown to consistently produce accurate, but potentially unstructured or even unsound models. In the second phase, the discovered model is transformed into a sound and structured model by applying two techniques: a technique to maximally block-structure an acyclic process model and an extended version of a technique for block-structuring flowcharts. This approach has been shown to outperform traditional "discover structured" approaches with respect to a range of accuracy and complexity measures.

A structured model is internally represented as a process tree. A process tree is a tree where each leaf is labeled with an activity and each internal node is labeled with a control-flow operator: sequence, exclusive choice, non-exclusive choice, parallelism, or iteration.

¹ Available at http://apromore.org/platform/tools
4.3 Replaying Partial Traces on the Process
Model
For a given partial trace, to predict its remaining time, we need to determine the current state of the trace relative to the process model. For that, we map, or align, the trace to the process model using the technique described in [1], which is available as a plugin for the open-source process mining platform Apromore.

The technique treats a process model as a graph that is composed of activities as nodes and their order dependencies as arcs. A case replay can be seen as a series of coordinated moves, including those over the model activities and gateways and those over the trace events. In that sense, a case replay is also termed an alignment of a process model and a trace. Ideally, this alignment should result in as many matches between activity labels on the model and event labels in the trace as possible. However, in practice, the replay may choose to skip a number of activities or events in search of more matches in later moves. Moves on the model must observe the semantics of the underlying modeling language, which is usually expressed by the notion of tokens. For example, for a BPMN model, a move of an incoming token over an XOR split gateway will result in a single token produced on one of the gateway's outgoing branches, while a move over an AND split gateway will result in a separate token produced on each of the gateway's outgoing branches. The set of tokens located on a process model at a point in time is called a marking. On the other hand, a move in the trace is sequential over successive events of the trace ordered by timestamps, one after another. Thus, after every move, either on the model or in the trace, the alignment comes to a state consisting of the current marking of the model and the index of the current event in the trace.

In [1], cases are replayed using a heuristics-based backtracking algorithm that searches for the best alignment between the model and a partial trace. The algorithm can be illustrated by a traversal of a process tree starting from the root node, e.g. using depth-first search, where nodes represent partial candidate solution states (Figure 3). Here the state represents the aforementioned alignment state of the case replay. At each node, the algorithm checks whether the alignment state up to that node is good enough. If so, it generates a set of child nodes of that node and continues down that path; otherwise, it stops at that node, i.e. it prunes the branch under the node, and backtracks to the parent node to traverse other branches.
Figure 3: Backtracking algorithm (taken from [1]).
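The search can be sketched as a generic backtracking skeleton; this is only a schematic of the idea, not the actual algorithm of [1], and the state object with a score attribute, the good_enough test and the expand function are assumed abstractions.

def replay_backtracking(state, good_enough, expand, is_complete, best=None):
    """Schematic backtracking over alignment states: keep extending a candidate
    state while it still looks promising, otherwise prune and backtrack."""
    if not good_enough(state):
        return best                      # prune this branch of the search tree
    if is_complete(state):               # a full alignment: keep the best-scoring one
        return state if best is None or state.score > best.score else best
    for child in expand(state):          # one more move on the model or in the trace
        best = replay_backtracking(child, good_enough, expand, is_complete, best)
    return best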
4.4 Obtaining the flow analysis formulas
Having determined the current state of the case execution, we
traverse the process model starting from that state until the process
completion in order to obtain the flow analysis formulas.
As a running example, let us consider a simple process model
in Figure 4. Applying the flow analysis formulas described in Section 3.2, the average cycle time of this process can be decomposed as follows:

$CT = T_A + \max(T_B + T_C, T_D) + T_F + p_2 T_G + \frac{T_H}{1 - r}$    (5)

Note that one of the branches of gateway $x_{21}$ is empty and therefore does not contribute to the cycle time. Therefore, only the branch with probability $p_2$ is included in the equation.
The components of the formula – cycle times of individual activities and branching and rework probabilities – can be estimated as averages of their historical values. However, since we deal with ongoing process cases, we can use the information that is already available from the case prefix to predict the above components.

Suppose we have a partial trace $hd(\sigma) = \langle A, D, B \rangle$. Replaying this trace on the given model as described in Section 4.3, we find the current marking to be in the states $B$ and $D$ within the AND-block. Traversing the process model starting from these states until the process end, we obtain the following formula:

$CT_{rem} = \max(T_B + T_C, T_D) + T_F + p_2 T_G + \frac{T_H}{1 - r}$    (6)

Since activity $A$ has already been executed, it does not contribute to the remaining cycle time. Thus, it is not part of the formula. Furthermore, $B$ and $D$ have already been executed; however, since they form one of the terms of the formula in which $T_C$ is still unknown, they cannot be omitted, and their actual cycle times should be taken. All the other terms of the formula need to be predicted using the data from $hd(\sigma)$.

Similarly, if the current marking is inside an XOR block, its branching probabilities need not be predicted. Instead, the probability of the branch that has actually been taken is set to 1 while the other probabilities are set to 0.
A more complex situation arises when the current marking is
inside the rework loop. In this case, we “unfold” the loop as shown
in Figure 5. Specifically, we separate the already executed occurrences of the rework fragment from the potential future occurrences and take the former out of the loop. Let us consider a partial trace $hd(\sigma) = \langle A, D, B, C, F, G, H \rangle$. Since $H$ has occurred once, according to the process model (Figure 4), it may be repeated with probability $r$; otherwise, the rework loop is exited. To signal this choice, we take the first occurrence of $H$ out of the loop and place an XOR gateway after it. One of the branches will contain a rework loop of future events with the same probability $r$, while the other one will reflect the option to skip the loop altogether. Thus, the cycle time of the whole fragment can be decomposed as follows:

$CT_H = T_{H_0} + \frac{r \, T_H}{1 - r}$,    (7)

where $T_{H_0}$ refers to the cycle time of the already executed occurrence(s) of $H$; for this term we take the actual, observed cycle time rather than a predicted one.
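To make the derivation concrete, the sketch below evaluates the remaining-time formula of Eq. (6) with invented estimates; in practice the observed values of T_B and T_D would come from the already executed events, and the remaining terms from the predictive (or mean) models.

# Hypothetical estimates (in minutes); T_B and T_D are the observed actual values.
T_B, T_D = 1.5, 2.8                        # already executed, actual cycle times
T_C, T_F, T_G, T_H = 2.0, 0.5, 1.0, 0.8    # predicted cycle times
p2, r = 0.4, 0.25                          # predicted branching / rework probabilities

ct_rem = max(T_B + T_C, T_D) + T_F + p2 * T_G + T_H / (1 - r)   # Eq. (6)
print(ct_rem)   # max(3.5, 2.8) + 0.5 + 0.4 + 0.8/0.75 = 5.47 (approximately)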
4.5 Computing the remaining time
We can use the flow analysis formulas produced by the method described in Section 4.4 to compute the remaining cycle time of a case, given: (i) an estimate of the cycle time of each activity reachable from the current execution state; and (ii) an estimate of the branching probability of each flow stemming from a reachable XOR-split (herein called a reachable conditional flow). Given an execution state, these estimates can be obtained in several ways, including:

(1) By using the prediction models produced for each reachable activity and for each reachable conditional flow, taking into account only traces that reach the current execution state. We herein call this approach predictive flow analysis.

(2) By computing the mean cycle time of each reachable activity and the traversal frequency of each reachable conditional flow, again based only on the suffixes of traces that reach the execution state in question. We call this approach mean flow analysis.
Figure 4: Example process model. Highlighted is the current marking.

Figure 5: Unfolding the rework loop of F.
The rationale for the mean flow analysis is that the prefix size can have two opposite effects on prediction accuracy. If a prefix is too short, there might not be enough information in it to predict the cycle times of some activities and the branching probabilities of some gateways, especially those that are executed near the process end. On the other hand, if the prefix is long, for activities and gateways that are usually executed at the beginning of the process we will not have enough training data to fit the model. As an example, let us consider an activity that, according to the process model, usually occurs in the 4th or 5th position in the process, but in a few cases can occur in the 8th position. Then, to fit a model for a prefix length of 5, we can only use these few cases as training data, since for most other cases the activity will not occur after the 5th event. In cases where the accuracy of the produced predictive models is insufficient, we can then use the mean historical activity cycle times instead.
In order to make use of predictive models, we need to encode process execution traces in the form of feature vectors. In this paper, we use the index-based encoding described in [13], which concatenates the case attributes and, for each position in a trace, the event occurring in that position and the value of each event attribute in that position. This type of trace encoding is lossless and has been shown to achieve a relatively high accuracy and reliability when making early predictions of binary process properties [13, 25].
For each activity in the process model, to predict its cycle time, we train a regression model, while for predicting branching probabilities we fit a classification model for each corresponding XOR gateway. In the latter case, each branch of a gateway is assigned a class starting from 0, and the model makes predictions about the probability of each class. The predictive models are trained on prefixes $hd^k(\sigma)$ of all traces $\sigma$ in the training set, for $2 \le k < |\sigma|$. We do not train and make predictions after the first event, since for those prefixes there is not sufficient data available to base the predictions upon.
As an example, let us consider a snapshot of the log with one
completed case in Table 1 that corresponds to the process model
in Figure 4. e events are ordered according to their completion
timestamp.
Table 1: Extract of an event log.

Case ID  Channel  Age  Activity  Timestamp  Resource
1        Email    37   A         9:13:00    R03
1        Email    37   B         9:14:20    R12
1        Email    37   D         9:16:00    R07
1        Email    37   C         9:18:00    R03
1        Email    37   F         9:18:10    R21
1        Email    37   G         9:18:50    R12
1        Email    37   H         9:19:00    R12

(Channel and Age are case attributes; Activity, Timestamp and Resource are event attributes.)
To encode traces as feature vectors, we include both case attributes and event attributes. Thus, the first case in the log will be encoded as follows:

$\vec{X}$ = (Email, 37; A, B, D, C, F, G, H; 9:13:00, R03; 9:14:20, R12; 9:16:00, R07; 9:18:00, R03; 9:18:10, R21; 9:18:50, R12; 9:19:00, R12)
Now, to create the training set for $hd^k(\sigma)$, we cut the feature vectors to include the event attributes up to the $k$-th event, together with the case attributes (which are usually known from the beginning of the case). Furthermore, we add the value of the target variable $y$ to be learned. For example, if we are to predict the cycle time of activity $G$ for prefixes of length $k = 2$, the training sample based on the data extracted from the first case in Table 1 would be created as follows:

$D^2_G = (\vec{X}^2, y_G)$ = {Email, 37; A, B; 9:13:00, R03; 9:14:20, R12; 40}

Here 40 is the cycle time of $G$ for the first case, determined as the time difference (in seconds) between the completion timestamp of $G$ and the completion timestamp of the previous activity $F$. It should be noted that for a case that follows the upper branch of the gateway $x_{21}$, the process terminates after $F$; thus $G$ is never executed and its cycle time is undefined. Therefore, we exclude such cases from the training data. Conversely, if an activity occurs multiple times in a case, we take its average cycle time.
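The construction of such prefix-based training samples can be sketched as follows. The feature layout mirrors the index-based encoding above, the two historical cases and their values are invented, timestamps are represented as elapsed seconds for simplicity, and the random forest is only one possible choice of regressor.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def encode_prefix(case_attrs, events, k):
    """Index-based encoding of a prefix of length k: case attributes plus,
    for each of the first k positions, the activity and its event attributes."""
    row = dict(case_attrs)
    for i, (activity, resource, elapsed) in enumerate(events[:k], start=1):
        row[f"activity_{i}"] = activity
        row[f"resource_{i}"] = resource
        row[f"elapsed_{i}"] = elapsed          # seconds since the start of the case
    return row

# Two hypothetical historical cases: (case attributes, events, cycle time of G in seconds)
history = [
    ({"channel": "Email", "age": 37}, [("A", "R03", 0), ("B", "R12", 80)], 40),
    ({"channel": "Phone", "age": 52}, [("A", "R03", 0), ("D", "R07", 95)], 55),
]

X = pd.get_dummies(pd.DataFrame([encode_prefix(attrs, events, k=2)
                                 for attrs, events, _ in history]))
y = [target for _, _, target in history]

reg_G_k2 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)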
Similarly, if we are to predict the branching probabilities of gateway $x_{32}$ for prefixes of length $k = 2$, we would assign class 0 to the branch that leads to rework and class 1 to the other branch. Then, the first training sample would be:

$D^2_{x_{32}} = (\vec{X}^2, y_{x_{32}})$ = {Email, 37; A, B; 9:13:00, R03; 9:14:20, R12; 1}

Since $H$ is not repeated in the first case, we assign class 1 to the gateway. Evidently, the probability of class 0 would be equal to the rework probability $r$.
5 EVALUATION
In the following section, we empirically compare the predictive flow analysis and mean flow analysis approaches with each other and against baselines proposed in previous work. In particular, we seek to answer the following research questions:

RQ1. Do flow analysis-based techniques provide accurate predictions in comparison with state-of-the-art baselines?

RQ2. Do flow analysis-based techniques provide stable results at different stages of ongoing cases?

The first question focuses on the quality of the predictions, while the second one relates to the stability of the results at different stages of running cases. Next, we describe the experiments conducted to answer these research questions. The source code and supplementary material required to reproduce the experiments reported in this paper can be found at http://github.com/verenich/flow-analysis-predictions
5.1 Datasets
We conducted the experiments using four real-life event datasets.
Table 2 summarizes the basic characteristics of each dataset.
The first three datasets originate from the Business Process Intelligence Challenge (BPIC'12)¹ and contain data from the application procedure for financial products at a large financial institution. This process consists of three subprocesses: one that tracks the state of the application (BPIC'12 A), one that tracks the state of the offer (BPIC'12 O), and a third one that tracks the states of work items associated with the application (BPIC'12 W). For the latter subprocess, we retain only events of type complete. The fourth dataset is based on a log that contains events from the ticketing management process of the help desk of an Italian software company². Each case starts with the insertion of a new ticket into the ticketing management system and ends when the issue is resolved and the ticket is closed.

As mentioned in Section 3.2, the flow analysis technique cannot readily deal with unstructured models. Even though the tool described in Section 4.2 aims to mine maximally structured models, it does not always succeed in doing so. Specifically, it sometimes produces models with overlapping loops, which our current implementation is unable to deal with. One solution to this problem could be to simplify the process model by removing the transitions that cause overlapping loops. However, this may severely decrease the accuracy of the discovered model, which would, in turn, negatively affect the accuracy of the flow analysis-based predictions of remaining time. Hence, instead, we remove the cases that cause overlapping loops from the event log (up to 15% of cases in each log).

¹ doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f
² doi:10.17632/39bp3vv62t.1
Table 2: Summary of datasets.

Dataset     Number of cases  Number of activities  Number of variants  Mean events per case  Mean case duration (days)
BPIC'12 A   12,007           10                    10                  4.49                  7.5
BPIC'12 O   3,487            7                     6                   4.56                  15.1
BPIC'12 W   9,650            6                     2,263               7.50                  11.4
helpdesk    3,218            5                     8                   3.30                  7.3
5.2 Experimental setup
To assess the quality of the prediction of continuous variables, well-known error metrics are the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE) [9]. MAE is defined as the arithmetic mean of the absolute prediction errors, RMSE as the square root of the mean of the squared prediction errors, and MAPE as the average of the unsigned percentage errors. We observe that the value of the remaining time tends to vary greatly across cases, with values at different orders of magnitude, and RMSE would be very sensitive to such outliers. Furthermore, the remaining time can be very close to zero, especially near the end of the trace, so MAPE is skewed in such situations. Hence, we use MAE to measure the error in predicting the remaining time.
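For reference, MAE over a set of predictions is simply the mean of the absolute errors; the values below are made up for illustration.

def mae(y_true, y_pred):
    """Mean Absolute Error of remaining-time predictions (same unit as the input)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(mae([7.5, 3.0, 12.0], [6.0, 4.5, 10.0]))  # (1.5 + 1.5 + 2.0) / 3 = 1.67 days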
We employ several baselines to compare our approach against. Firstly, we use the transition system (TS) based method proposed by van der Aalst et al. [22], applying the set, bag and sequence abstractions. Secondly, we use the method proposed by Leontjeva et al. [13], who compared several types of business process sequence encodings for predicting a boolean case outcome. This method can be naturally adjusted to predict the remaining time by replacing the classification task with a regression task. For the purpose of this paper, we reproduce only two of the original encodings – index-based and frequency-based – as the others were shown to have either very similar or inferior performance. Next, we evaluate against the stochastic Petri net (SPN) based approach proposed by Rogge-Solti and Weske [18, 19]. Specifically, we use the method based on the constrained Petri net, as it was shown to have the lowest prediction error. However, their original approach makes predictions at fixed time points, regardless of the arriving events. To make the results comparable to our approach, we modify the method to make predictions after each arriving event. Finally, we use a combined estimator along the lines of [24], where the feature set includes the frequencies of activities within each case, their average durations, and the case attributes.
In our experiments, we order the cases in the logs based on the time at which the first event of each case occurred. Then, we split the logs into two parts. We use the first part (2/3 of the cases) as a training set, i.e. as historical data to train the predictive models. The remaining 1/3 of the cases are used to evaluate the accuracy of the predictions. Furthermore, we perform a five-fold cross-validation on the training set in order to select the optimal values of the training parameters, such as the number of trees and the number of variables at each split for a random forest model.
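A sketch of this temporal split and hyperparameter selection, assuming a case-level list of first-event timestamps and an already encoded feature matrix (all names are illustrative):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

def temporal_split(case_ids, first_event_times, train_fraction=2/3):
    """Order cases by the timestamp of their first event and split chronologically."""
    order = np.argsort(first_event_times)
    cut = int(len(order) * train_fraction)
    return [case_ids[i] for i in order[:cut]], [case_ids[i] for i in order[cut:]]

# Hyperparameter tuning with five-fold cross-validation on the training part.
param_grid = {"n_estimators": [100, 300], "max_features": ["sqrt", 0.5]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      cv=5, scoring="neg_mean_absolute_error")
# search.fit(X_train, y_train)   # X_train, y_train: encoded prefixes and remaining times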
5.3 Results
Table 3 summarizes the performance of the predictive and mean flow analysis techniques, as well as the baseline approaches, for each dataset. We make predictions for prefixes $hd^k(\sigma)$ of traces $\sigma$ in the test set starting from $k = 2$. However, since for very long prefixes there are not enough traces of that length, and the error measurements become unreliable, we stop the predictions after $k$ reaches the 70th-percentile length of the traces in the log, i.e. at least 70% of the traces in the log have a length smaller than $k$. Thus, since the BPIC'12 W log contains longer traces, the prefix sizes evaluated are higher for this log. Additionally, we report the average performance across all prefixes, weighted by the relative frequency of traces with that prefix length (i.e. longer prefixes get lower weights, since not all traces reach that length).
We observe that for most logs, the prediction accuracy of the flow analysis-based techniques is at least as good as that of the baselines. At the same time, for all logs except BPIC'12 O, mean flow analysis, on average, provides the best results among all the methods. Specifically, it outperforms the predictive flow analysis. The latter is due to the lack of data attributes in the event logs that would be able to accurately explain the variation in the cycle times of individual activities and the branching probabilities of each conditional flow. To further investigate this issue, for each activity in the BPIC'12 A and BPIC'12 O logs, we analyze the performance of the regressors trained to predict its cycle time and compare it with the constant regressor used in the mean flow analysis. In Table 4 we report the MAE of the cycle time predictions for each activity and each technique, as evaluated on the test set. Since for each prefix length we have a separate regressor, we report weighted average values, as in Table 3. In addition, we report the actual average cycle time values of each predicted activity based on the test set.
As can be seen from Table 4, in the BPIC'12 O log, prediction-based cycle times are more accurate than the constant ones for the longer activities, which make up the largest portion of the remaining cycle time. Furthermore, the difference between the two approaches is higher for BPIC'12 O. Hence, for this log, we can estimate the remaining time more accurately with the predictive flow analysis.

Another observation relates to the very low accuracy of the predictive flow analysis on the BPIC'12 W log. Having closely inspected this log, we found that it contains sequences of two or more events of the same activity in a row. In other words, activities are frequently reworked multiple times. As mentioned in Section 3.2, flow analysis techniques assume a constant rework probability $r$. However, in many real-life processes $r$ decreases after each execution of the rework loop, meaning that the rework becomes less and less likely. Thus, if $r$ is inaccurately predicted in predictive flow analysis, this error propagates further. To verify our hypothesis, we modify the log, keeping only the first occurrence of each repeated event in a sequence. To keep the remaining time calculations correct, we retain the last event of a case, even if it is a repeated event. Having run the experiments on the modified log (Table 5), we notice that predictive flow analysis becomes almost as accurate as mean flow analysis, thus confirming our hypothesis.

Summing up, the experiments suggest that flow analysis-based techniques provide relatively accurate estimates of the remaining cycle time across all logs. Thus, we can positively answer RQ1.
Our experiments also show that flow analysis-based techniques are able to provide relatively accurate predictions starting from the early stages of an ongoing case. The general trend is a stable reduction in MAE values as a case progresses. This is due to the increasing amount of attributes in the prefix on which to base the predictions. Furthermore, the actual remaining time naturally decreases at later stages of a case, and thus its prediction error also decreases. We can then provide a positive answer to RQ2.
Execution Times. The execution time of the proposed approach is composed of the execution times of the following components: (i) training the predictive models; (ii) replaying the partial traces on the process model (finding an alignment) and deriving the formulas; (iii) applying the models to predict the cycle times and branching probabilities and calculating the overall remaining time. For real-time prediction, it is crucial to output the results faster than the mean case arrival rate. Thus, we also measured the average runtime overhead of our approach. All experiments were conducted on a laptop with a 2.4 GHz Intel Core i5 processor and 8 GB of RAM. For a given prefix length $k$, training all the models takes between 20 and 200 seconds, depending on the prefix size and the number of models to train. Replaying the test traces takes between 5 and 45 seconds for a given prefix length. Finally, making the predictions takes less than 4 seconds per prefix length. This shows that our approach performs within reasonable bounds for most online applications.
5.4 Threats to Validity
The datasets used in this evaluation, except for BPIC'12 W, contain only the completion timestamps, but not the start timestamps. Thus, it is impossible to discern the actual processing time from the waiting time. The latter can have a significant impact on the overall cycle time, depending on the case arrival rate and the resource load. As these factors are not accounted for in the predictive models, their accuracy is rather low.
We reported the results with a single learning algorithm (random forest). With decision trees and gradient boosting, we obtained qualitatively the same results relative to the baselines. However, our approach is independent of the learning algorithm used. Thus, using a different algorithm does not in principle invalidate the results. That said, we acknowledge that the goodness of fit, as in any machine learning problem, depends on the particular classifier/regressor algorithm employed. Hence, it is important to test multiple algorithms for a given dataset, and to apply hyperparameter tuning, in order to choose the most adequate algorithm with the best configuration.
The proposed approach relies on the accuracy of the branching probability estimates provided by the classification model. It is known, however, that the likelihood probabilities produced by classification methods are not always reliable. Methods for estimating the reliability of such likelihood probabilities have been proposed in the machine learning literature [11]. A possible enhancement of the proposed approach would be to integrate heuristics that take into account such reliability estimates.
Table 3: MAE values (in days) for prefixes of different lengths. Columns 2-10 denote the prefix length; Avg is the weighted average over all evaluated prefix lengths.

BPIC'12 A
Method                          Avg    2      3      4      5
Predictive flow analysis        9.48   9.60   10.38  9.68   7.04
Mean flow analysis              8.32   7.89   9.62   8.81   6.88
TS set abstraction [22]         9.16   8.39   10.53  10.31  8.02
TS bag abstraction [22]         9.16   8.39   10.53  10.31  8.02
TS sequence abstraction [22]    9.16   8.39   10.53  10.31  8.02
Index-based encoding [13]       9.07   8.21   10.48  10.38  7.99
Frequency-based encoding [13]   9.16   8.40   10.52  10.28  8.02
Constrained SPN [19]            8.44   9.15   8.47   7.41   6.89
Combined estimator [24]         9.05   8.19   10.48  10.32  8.03

BPIC'12 O
Method                          Avg    2      3      4
Predictive flow analysis        5.96   7.46   6.40   2.55
Mean flow analysis              6.33   8.00   6.81   2.53
TS set abstraction [22]         6.05   8.03   6.81   2.54
TS bag abstraction [22]         6.05   8.03   6.81   2.54
TS sequence abstraction [22]    6.05   8.03   6.81   2.54
Index-based encoding [13]       6.36   8.06   6.82   2.52
Frequency-based encoding [13]   6.33   8.02   6.81   2.54
Constrained SPN [19]            5.49   6.46   6.46   2.42
Combined estimator [24]         6.34   8.04   6.80   2.52

BPIC'12 W
Method                          Avg    2      3      4      5      6      7      8      9      10
Predictive flow analysis        14.48  15.38  15.33  15.83  14.32  16.08  11.62  12.52  13.67  12.21
Mean flow analysis              7.35   8.58   8.01   7.49   7.20   6.87   6.70   6.61   6.36   6.21
TS set abstraction [22]         7.99   9.04   8.70   8.20   7.93   7.50   7.34   7.35   6.94   6.75
TS bag abstraction [22]         7.95   8.84   8.71   8.22   7.95   7.42   7.26   7.27   6.93   6.83
TS sequence abstraction [22]    7.91   8.84   8.70   8.22   7.91   7.40   7.21   7.22   6.84   6.74
Index-based encoding [13]       7.64   8.71   8.29   7.86   7.50   7.24   7.02   6.95   6.69   6.53
Frequency-based encoding [13]   7.79   8.77   8.64   8.19   7.93   7.40   7.20   7.24   6.85   6.66
Constrained SPN [19]            9.60   8.77   9.36   9.68   9.97   10.15  10.02  10.01  9.71   9.39
Combined estimator [24]         7.66   8.74   8.30   7.91   7.59   7.28   7.02   6.93   6.64   6.43

Helpdesk
Method                          Avg    2      3      4
Predictive flow analysis        5.97   5.24   9.36   2.76
Mean flow analysis              5.27   5.10   6.10   3.28
TS set abstraction [22]         5.52   5.44   5.92   5.14
TS bag abstraction [22]         5.59   5.49   6.15   3.08
TS sequence abstraction [22]    5.59   5.49   6.15   3.08
Index-based encoding [13]       5.58   5.39   6.54   3.26
Frequency-based encoding [13]   5.61   5.50   6.17   3.28
Constrained SPN [19]            5.54   5.34   6.53   4.29
Combined estimator [24]         5.54   5.39   6.34   3.27
6 CONCLUSION AND FUTURE WORK
The paper has put forward some potential benefits of a "white-box" approach to predicting quantitative process performance indicators. Rather than predicting single scalar indicators, we demonstrated how these indicators can be estimated as aggregations of the corresponding performance indicators of the activities composing the process. In this way, the predicted indicators become more explainable, as they are decomposed into elementary components. Thus, business analysts can pinpoint the bottlenecks in the process execution and provide better recommendations to keep the process compliant with the performance standards.

We implemented and evaluated two approaches: one where the formulas' components are predicted from the trace prefix based on models trained on historical completed traces, and another that instead uses constant values obtained from the historical averages of similar traces. We evaluated the approaches on the prediction of the remaining cycle time, one of the most common process performance indicators. The empirical evaluation has shown that the proposed techniques are, on average, able to yield more accurate predictions at different stages of running cases than the surveyed baselines.

We identified a limitation of flow analysis-based approaches when dealing with traces with rework loops, i.e. multiple occurrences of the same fragment of activities in a row. A direction for future work is to further investigate the factors affecting the performance of the proposed approaches in order to better understand their strengths and weaknesses.
Table 4: MAE of cycle time predictions of individual activities and their actual mean cycle times (in days).

BPIC'12 A
Activity         MAE (Predictive)  MAE (Mean)  Mean cycle time
A_CANCELLED      11.97             12.02       14.36
A_APPROVED       7.61              7.51        7.36
A_DECLINED       3.72              3.74        3.74
A_REGISTERED     5.92              5.96        3.70
A_ACTIVATED      4.46              4.47        2.88
A_ACCEPTED       0.43              0.78        0.76
A_PREACCEPTED    0.04              0.13        0.09
A_FINALIZED      0.01              0.01        0.01

BPIC'12 O
Activity         MAE (Predictive)  MAE (Mean)  Mean cycle time
O_CANCELLED      8.68              9.75        18.20
O_SENT_BACK      2.79              4.01        9.42
O_ACCEPTED       2.60              2.59        4.22
O_DECLINED       2.50              2.43        3.54
O_SENT           <0.01             <0.01       <0.01
Table 5: MAE values (in days) for prefixes of different lengths for the modified BPIC'12 W log with excluded event duplicates.

Method                          Avg    2      3      4      5
Predictive flow analysis        6.70   8.22   5.86   4.45   4.27
Mean flow analysis              6.15   7.69   5.14   3.88   4.14
TS set abstraction [22]         6.70   8.40   5.82   3.94   4.27
TS bag abstraction [22]         6.69   8.40   5.82   4.04   3.97
TS sequence abstraction [22]    6.69   8.40   5.82   4.04   3.97
Index-based encoding [13]       6.54   8.14   5.66   4.15   4.04
Frequency-based encoding [13]   6.71   8.44   5.83   4.03   4.00
Constrained SPN [19]            7.82   7.76   8.14   7.70   7.46
Combined estimator [24]         6.50   8.12   5.59   4.09   3.99
Furthermore, we plan to extend the proposed approaches so that they are able to deal with more complex models with overlapping loops, using structuring techniques such as the one proposed in [26]. With some modifications in the derivation of the flow analysis formulas, the proposed approaches can also be extended to predict other quantitative performance indicators. In future work, we aim to extend and evaluate the approaches to predict the process cost or error rate.
ACKNOWLEDGMENTS
This research is funded by the Australian Research Council under Grant No. DP150103356 and the Estonian Research Council under Grant No. IUT20-55.
REFERENCES
[1] Robert Andrews, Suriadi Suriadi, Moe Wynn, Arthur ter Hofstede, Hoang Nguyen, Anastasiia Pika, and Marcello La Rosa. 2016. Comparing static and dynamic aspects of patient flows via process model visualisations. Preprint available at https://eprints.qut.edu.au/102848/ (2016).
[2] Adriano Augusto, Raffaele Conforti, Marlon Dumas, Marcello La Rosa, and Giorgio Bruno. 2016. Automated Discovery of Structured Process Models: Discover Structured vs. Discover and Structure. In Conceptual Modeling - 35th International Conference, ER 2016. 313-329.
[3] Dominic Breuker, Martin Matzner, Patrick Delfmann, and Jörg Becker. 2016. Comprehensible predictive models for business processes. MIS Quarterly 40, 4 (2016), 1009-1034.
[4] Raffaele Conforti, Massimiliano de Leoni, Marcello La Rosa, Wil M. P. van der Aalst, and Arthur H. M. ter Hofstede. 2015. A recommendation system for predicting risks across multiple business process instances. Decision Support Systems 69 (2015), 1-19.
[5] Massimiliano de Leoni, Wil M. P. van der Aalst, and Marcus Dees. 2014. A General Framework for Correlating Business Process Characteristics. In BPM. 250-266.
[6] Massimiliano de Leoni, Wil M. P. van der Aalst, and Marcus Dees. 2016. A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs. Information Systems 56 (2016), 235-257.
[7] Marlon Dumas, Marcello La Rosa, Jan Mendling, and Hajo A. Reijers. 2013. Fundamentals of Business Process Management. Springer.
[8] Joerg Evermann, Jana-Rebecca Rehse, and Peter Fettke. 2016. A Deep Learning Approach for Predicting Process Behaviour at Runtime. In Proceedings of the 1st International Workshop on Runtime Analysis of Process-Aware Information Systems. Springer.
[9] Rob J. Hyndman and Anne B. Koehler. 2006. Another look at measures of forecast accuracy. International Journal of Forecasting 22, 4 (2006), 679-688.
[10] Riivo Kikas, Marlon Dumas, and Dietmar Pfahl. 2016. Using dynamic and contextual features to predict issue lifetime in GitHub projects. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR. 291-302. DOI: http://dx.doi.org/10.1145/2901739.2901751
[11] Meelis Kull and Peter A. Flach. 2014. Reliability Maps: A Tool to Enhance Probability Estimates and Improve Classification Accuracy. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014. 18-33.
[12] Geetika T. Lakshmanan, Davood Shamsi, Yurdaer N. Doganata, Merve Unuvar, and Rania Khalaf. 2015. A Markov prediction model for data-driven semi-structured business processes. Knowledge and Information Systems 42, 1 (2015), 97-126.
[13] Anna Leontjeva, Raffaele Conforti, Chiara Di Francescomarino, Marlon Dumas, and Fabrizio Maria Maggi. 2015. Complex Symbolic Sequence Encodings for Predictive Monitoring of Business Processes. In BPM. 297-313.
[14] Fabrizio Maria Maggi, Chiara Di Francescomarino, Marlon Dumas, and Chiara Ghidini. 2014. Predictive monitoring of business processes. In CAiSE. Springer, 457-472.
[15] Andreas Metzger, Rod Franklin, and Yagil Engel. 2012. Predictive monitoring of heterogeneous service-oriented business networks: The transport and logistics case. In 2012 Annual SRII Global Conference. IEEE, 313-322.
[16] Anastasiia Pika, Wil M. P. van der Aalst, Colin J. Fidge, Arthur H. M. ter Hofstede, and Moe T. Wynn. 2012. Predicting deadline transgressions using event logs. In BPM. Springer, 211-216.
[17] Mirko Polato, Alessandro Sperduti, Andrea Burattin, and Massimiliano de Leoni. 2014. Data-aware remaining time prediction of business process instances. In 2014 International Joint Conference on Neural Networks, IJCNN 2014. 816-823.
[18] Andreas Rogge-Solti and Mathias Weske. 2013. Prediction of remaining service execution time using stochastic Petri nets with arbitrary firing delays. In ICSOC. Springer, 389-403.
[19] Andreas Rogge-Solti and Mathias Weske. 2015. Prediction of business process durations using non-Markovian stochastic Petri nets. Information Systems 54 (2015), 1-14.
[20] Arik Senderovich, Matthias Weidlich, Avigdor Gal, and Avishai Mandelbaum. 2014. Queue Mining - Predicting Delays in Service Processes. In CAiSE. 42-57.
[21] Niek Tax, Ilya Verenich, Marcello La Rosa, and Marlon Dumas. 2017. Predictive business process monitoring with LSTM neural networks. In CAiSE. Springer, to appear.
[22] Wil M. P. van der Aalst, M. Helen Schonenberg, and Minseok Song. 2011. Time prediction based on process mining. Information Systems 36, 2 (2011), 450-475.
[23] Sjoerd van der Spoel, Maurice van Keulen, and Chintan Amrit. 2012. Process prediction in noisy data sets: a case study in a Dutch hospital. In International Symposium on Data-Driven Process Discovery and Analysis. Springer, 60-83.
[24] Boudewijn F. van Dongen, Ronald A. Crooy, and Wil M. P. van der Aalst. 2008. Cycle time prediction: when will this case finally be finished? In CoopIS. Springer, 319-336.
[25] Ilya Verenich, Marlon Dumas, Marcello La Rosa, Fabrizio Maria Maggi, and Chiara Di Francescomarino. 2016. Minimizing Overprocessing Waste in Business Processes via Predictive Activity Ordering. In CAiSE. 186-202.
[26] Yong Yang, Marlon Dumas, Luciano García-Bañuelos, Artem Polyvyanyy, and Liang Zhang. 2012. Generalized aggregate Quality of Service computation for composite services. Journal of Systems and Software 85, 8 (2012), 1818-1830.