Multi-Perspective Comparison of Business Process
Variants Based on Event Logs
Hoang Nguyen1, Marlon Dumas2, Marcello La Rosa3, and Arthur H.M. ter Hofstede1
¹ Queensland University of Technology, ² University of Tartu, ³ University of Melbourne
Abstract. A process variant represents a collection of cases with certain shared characteristics, e.g. cases that exhibit certain levels of performance. The comparison of business process variants based on event logs is a recurrent operation in the field of process mining. Existing approaches focus on comparing variants based on directly-follows relations such as "a task directly follows another one" or "a resource directly hands off to another resource". This paper presents a more general approach to log-based process variant comparison based on so-called perspective graphs. A perspective graph is a graph-based abstraction of an event log where a node represents any entity referred to in the log (e.g. task, resource, location) and an arc represents a relation between these entities within or across cases (e.g. directly-follows, co-occurs, hands-off to, works-together with). Statistically significant differences between two perspective graphs are captured in a so-called differential perspective graph, which allows us to compare two logs from any perspective. The paper illustrates the approach and compares it to an existing baseline using real-life event logs.
Keywords: Process mining, variant analysis, comparison, multi-perspective.
1 Introduction
The performance of a business process may vary over time, geographically, or across business units, products, or customer types. And even within a given time period, place, business unit, product, and customer type, there are usually performance variations between cases of a process. Some cases lead to a positive outcome (e.g. on-time completion), while others lead to negative outcomes. A typical question that arises in this setting is: "What differentiates the positive and the negative cases?", or more broadly: "What (statistically) significant differences exist between two variants of a process?"
Recently, several approaches for comparing variants of a process based on their event logs have been proposed [1-4]. Given two event logs L1 and L2 corresponding to two variants of a business process, these techniques allow us to identify characteristics that are commonly found in the cases in L1, but are rare or non-existent in L2 and vice versa. These approaches are restricted to identifying differences in the directly-follows relations, such as "task A always directly follows task B in one variant but never in the other" or "resource X often hands off work to resource Y in one variant but rarely in the other". However, events in a log may carry a richer set of attributes besides tasks and resources, e.g. customer attributes, location attributes, product-related attributes, etc. Differences between two event logs may be found along any of these attributes.
This paper presents an approach for comparing process variants from multiple perspectives corresponding to arbitrary sets of attributes. Specifically, the paper introduces a graph-based abstraction of an event log, namely a perspective graph, where a node represents any entity referenced in an attribute of the event log (task, resource, location, etc.) and an arc represents an arbitrary relation between entities (e.g. directly-follows,
co-occurs, hands-off to, works-together with, etc.) within or across cases. Statistically significant differences between two perspective graphs are captured in a so-called differential perspective graph, which allows a user to visually compare two event logs from any given perspective using either a graphical or a matrix representation.
The proposed approach has been implemented as a proof-of-concept prototype in
the ProM open-source process mining toolset. The paper illustrates the capabilities provided by differential perspective graphs using two real-life event logs, and compares them against an existing state-of-the-art approach for process variant analysis.
The paper is structured as follows. Section 2 discusses existing process variant analysis approaches. Section 3 presents the proposed approach, while Section 4 discusses its evaluation. Section 5 summarizes the contributions and outlines future work directions.
2 Related Work
Existing approaches to log-based process variant comparison can be classified into indicator-based, graph-based, and model-based. Indicator-based approaches extract performance indicators from two input logs and compare these indicators using visualization techniques (e.g. bar charts), e.g. risk indicators [5], performance indicators [6], and resource behavior indicators [7]. These approaches allow one to determine how two variants perform relative to each other on an aggregate basis or at a task or resource level. For example, these techniques allow us to determine which tasks have higher cycle time in one variant than in the other. However, they do not allow us to identify behavioral differences and their impact on performance.
Graph-based approaches rely on pairwise differencing of graphs, such as event structures [2], directly-follows graphs [8], or transition systems [4]. For example, the approach in [4] abstracts a process as a transition system where each state represents an equivalence class of trace prefixes, e.g. a state may represent all prefixes that coincide on their last n events. Each transition is labeled with an event label or event attribute value. Transition systems are contrasted and differences are visually highlighted. The approach in [4] is integrated with a Process Cube approach for log slicing and dicing to generate sublogs for comparison [9]. Our technique falls into this category. In particular, the technique in [4] is used as a baseline in our evaluation.
Other related techniques take as input process models and enrich them with performance measures extracted from event logs, such as occurrence frequency or cycle time [3]. These techniques assume that a process model is available, which captures the dominant behaviors of the process variants. In contrast, in this paper we assume that no models are given a priori. Instead, the comparison of variants is conducted on an exploratory basis, from multiple perspectives, and purely based on event logs.
3 Approach
Given two event logs as input, each representing a variant of the same business process, our approach mines two perspective graphs, one from each log. Next, it compares the two graphs and visualizes their statistically significant differences using a differential graph. In the remainder, we define all ingredients of our approach: event logs, process abstraction, perspective graphs, and differential graphs.
3.1 Event Logs and Process Abstraction
An event log consists of cases, where each case has a number of associated events. Cases and events can have various attributes. An example log is shown in Table 1. Rows are events and columns are event attributes. The format of event logs has been standardized in the eXtensible Event Stream (XES) standard by the IEEE CIS Task Force on Process Mining [10]. Table 1 provides an example of a log schema. Our technique can work with any log schema.
Let 𝒰 be a universe of values, ℰ a universe of events, and 𝒜 a universe of attribute names, where for each A ∈ 𝒜, A : ℰ → 𝒰.
Definition 1 (Event Log). An event log L over a schema S = {A1, A2, ..., An} ⊂ 𝒜 is a set of events, i.e. a subset of ℰ.
CaseID  EventID  Timestamp       Activity  Resource  Department  Location
c1      e1       01.10 10:00:00  a1        r1        d1          l1
c1      e2       02.10 10:00:00  a2        r2        d1          l1
c1      e3       03.10 10:00:00  a3        r3        d2          l1
c1      e4       04.10 10:00:00  a1        r3        d2          l2
c1      e5       05.10 10:00:00  a2        r1        d1          l2
c2      e6       02.10 10:00:00  a3        r1        d1          l1
c2      e7       04.10 10:00:00  a1        r2        d1          l2
c2      e8       06.10 10:00:00  a3        r4        d2          l2
c2      e9       08.10 10:00:00  a2        r2        d1          l1
Table 1. Event log example
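For illustration purposes (a sketch, not part of the paper's formalisation), the log of Table 1 can be represented as a list of event records, one dict per event:

```python
# The example event log of Table 1 as a list of event records.
# Timestamps are kept as day.month strings, exactly as in the table.
log = [
    {"CaseID": "c1", "EventID": "e1", "Timestamp": "01.10 10:00:00",
     "Activity": "a1", "Resource": "r1", "Department": "d1", "Location": "l1"},
    {"CaseID": "c1", "EventID": "e2", "Timestamp": "02.10 10:00:00",
     "Activity": "a2", "Resource": "r2", "Department": "d1", "Location": "l1"},
    {"CaseID": "c1", "EventID": "e3", "Timestamp": "03.10 10:00:00",
     "Activity": "a3", "Resource": "r3", "Department": "d2", "Location": "l1"},
    {"CaseID": "c1", "EventID": "e4", "Timestamp": "04.10 10:00:00",
     "Activity": "a1", "Resource": "r3", "Department": "d2", "Location": "l2"},
    {"CaseID": "c1", "EventID": "e5", "Timestamp": "05.10 10:00:00",
     "Activity": "a2", "Resource": "r1", "Department": "d1", "Location": "l2"},
    {"CaseID": "c2", "EventID": "e6", "Timestamp": "02.10 10:00:00",
     "Activity": "a3", "Resource": "r1", "Department": "d1", "Location": "l1"},
    {"CaseID": "c2", "EventID": "e7", "Timestamp": "04.10 10:00:00",
     "Activity": "a1", "Resource": "r2", "Department": "d1", "Location": "l2"},
    {"CaseID": "c2", "EventID": "e8", "Timestamp": "06.10 10:00:00",
     "Activity": "a3", "Resource": "r4", "Department": "d2", "Location": "l2"},
    {"CaseID": "c2", "EventID": "e9", "Timestamp": "08.10 10:00:00",
     "Activity": "a2", "Resource": "r2", "Department": "d1", "Location": "l1"},
]

# The events of one case, in timestamp order (the list is already ordered):
c1_events = [e for e in log if e["CaseID"] == "c1"]
print([e["EventID"] for e in c1_events])  # ['e1', 'e2', 'e3', 'e4', 'e5']
```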
Assume that events in a case occur sequentially as shown in Table 1. Based on the event sequence in each case, Fig. 1 shows an example of event clusterings in each case along the time line according to the Department and Location attributes. For example, in Fig. 1a, events e1 and e2 share the same department attribute d1, and the next occurring events e3 and e4 share the same department d2. This is followed by event e5, which occurs with department attribute d1 again. Fig. 1a and Fig. 1b are each seen as an abstraction of the process in Table 1. This method of process abstraction is formalized as follows.
[Figure: for each case, events are laid out along the time line with their resource and activity labels, and clustered into rows of fragments according to the abstraction attribute.]
Fig. 1. Abstraction by (a) Department, (b) Location
Let L be a log over schema S, CaseID the CaseID attribute of events, timestamp the timestamp attribute of events, CID_L the set of CaseIDs in L, and T ⊆ S a schema with T ≠ ∅ and timestamp ∉ T. We assume that all timestamps are different.
First, we define a number of relations between events as the basis for our later formalisation. Events can be related in terms of timestamp, CaseID, and other attributes. Given two events, first they can be ordered based on their timestamps, i.e. e1 < e2 iff timestamp(e1) < timestamp(e2). Second, they are case-related if they occur in the same case, i.e. e1 ≅ e2 iff CaseID(e1) = CaseID(e2). Finally, in terms of a schema T ⊆ S, they are T-equal if they share values for all attributes in T, i.e. e1 =_T e2 iff ∀t ∈ T [t(e1) = t(e2)]; otherwise, they are T-unequal, i.e. e1 ≠_T e2.
Based on the above relations, two events are case-ordered iff they are ordered and case-related, i.e. e1 ⋖ e2 iff e1 < e2 ∧ e1 ≅ e2. Further, two events are T-case-ordered iff they are case-ordered and T-unequal, or case-ordered, T-equal but separated in time from each other by another case-related but T-unequal event, i.e. e1 ⋖_T e2 iff (e1 ⋖ e2 ∧ e1 ≠_T e2) ∨ (e1 ⋖ e2 ∧ e1 =_T e2 ∧ ∃e3 ∈ L [e1 ⋖ e3 ∧ e3 ⋖ e2 ∧ e3 ≠_T e1]).
The T-case-ordered relation ⋖_T can be observed in Table 1. In case c1, in terms of schema T = {Department}, events e1 and e3 are T-case-ordered because they are ordered and T-unequal; events e1 and e5 are also T-case-ordered because they are ordered, T-equal, and there is an event, e.g. e3, that occurs between them in the same case but is T-unequal to them. The T-case-ordered relation ⋖_T forms a strict partial order over ℰ for each CaseID. This can be sketched briefly: T-case-ordered is an irreflexive relation, because e ⋖_T e implies e ⋖ e, hence timestamp(e) < timestamp(e), which is not possible. From the definition of T-case-ordered, it can be proved that T-case-ordered is a transitive relation by case distinction with four cases.
Given two case-related events and a schema T, if there exists no T-case-ordered relation between the two events, it means that they are T-equal and non-separable in time from each other by another case-related T-unequal event. In this situation, we say that they are T-equivalent, i.e. e1 ∼_T e2 iff e1 ≅ e2 ∧ ¬(e1 ⋖_T e2) ∧ ¬(e2 ⋖_T e1).
The T-equivalent relation ∼_T on the event set E_c = {e ∈ L | CaseID(e) = c} of a case c forms an equivalence relation, where the corresponding quotient set of E_c is E_c/∼_T. Two T-equivalent events are in the same equivalence class of E_c/∼_T. We refer to an equivalence class in E_c/∼_T as a fragment. Visually, a fragment is a row of events as shown in Fig. 1, e.g. in Fig. 1a, {e1, e2} is a fragment in case c1.
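As an illustration, fragment computation can be sketched as follows (the helper name is ours, not the authors' implementation; it relies on the observation that, within a case sorted by timestamp, a fragment is a maximal run of consecutive T-equal events):

```python
def fragments(case_events, T):
    """Split a case's events (sorted by timestamp) into fragments:
    maximal runs of consecutive events that agree on all attributes in T."""
    result = []
    for e in case_events:
        key = tuple(e[t] for t in T)
        if result and result[-1][0] == key:
            result[-1][1].append(e["EventID"])  # extend the current run
        else:
            result.append((key, [e["EventID"]]))  # start a new fragment
    return [frag for _, frag in result]

# Case c1 of Table 1, abstracted by Department (cf. Fig. 1a):
c1 = [
    {"EventID": "e1", "Department": "d1"},
    {"EventID": "e2", "Department": "d1"},
    {"EventID": "e3", "Department": "d2"},
    {"EventID": "e4", "Department": "d2"},
    {"EventID": "e5", "Department": "d1"},
]
print(fragments(c1, ["Department"]))  # [['e1', 'e2'], ['e3', 'e4'], ['e5']]
```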
Based on the notion of fragments, we now define process abstraction.
Definition 2 (Process Abstraction). Let L be a log, T ⊆ S a schema, and CID_L the set of CaseIDs in L. An abstraction over T from L is defined as A^L_T = ∪_{c ∈ CID_L} E_c/∼_T.
Let A^L_T be an abstraction over schema T from log L. From the definition, A^L_T is a set of fragments, where each fragment is a set of T-equivalent events. There is a follows relation between fragments F1, F2 ∈ A^L_T, i.e. F1 ⇝ F2 iff ∃e1 ∈ F1 ∃e2 ∈ F2 [e1 ⋖_T e2], and a directly-follows relation between fragments F1, F2 ∈ A^L_T, i.e. F1 → F2 iff F1 ⇝ F2 ∧ ∄F3 ∈ A^L_T [F1 ⇝ F3 ∧ F3 ⇝ F2].
3.2 Perspective Graphs
From a process abstraction as shown in Fig. 1, one can look at different relations between event attributes. For example, one can look at the co-occurrence of two attributes in the same fragment, e.g. two resources working in the same department or location. Alternatively, one can look at an inter-fragment relation where an attribute occurs in one fragment and the other attribute occurs in a directly following fragment, e.g. the flow from an activity performed in one department to another activity performed in the next department. More generally, instead of focusing on one attribute, one may focus on a number of attributes depending on the type of analysis, e.g. it may be a pair (resource, activity) representing a task assignment.
In order to represent different types of relations between event attributes, we propose two types of graphs, intra-fragment and inter-fragment, defined as follows.
Let A^L_T be an abstraction over schema T from log L. Let V ⊆ S; then π_V(e) denotes the projection of event e on the attributes in V, which is defined as {(v, v(e)) | v ∈ V}¹.
Definition 3 (Intra-Fragment Graph). Let A^L_T be an abstraction over schema T from log L and let U, V ⊆ S. An Intra-Fragment Graph IAG^{U,V}_T(L) is a node- and arc-weighted undirected graph G = (N, E, W_N, W_E), defined by:
N = {π_U(e) | e ∈ L} ∪ {π_V(e) | e ∈ L},
E = {{π_U(e), π_V(e′)} | e ∈ L ∧ e′ ∈ L ∧ e ∼_T e′ ∧ π_U(e) ≠ π_V(e′)},
W_N(n) = |{e ∈ L | π_U(e) = n ∨ π_V(e) = n}| for all n ∈ N,
W_E({n1, n2}) = |{e ∈ L | π_U(e) = n1 ∧ π_V(e) = n2}| + |{{e1, e2} | e1 ∈ L ∧ e2 ∈ L ∧ π_U(e1) = n1 ∧ π_V(e2) = n2 ∧ e1 ∼_T e2 ∧ e1 ≠ e2}| for all {n1, n2} ∈ E.
Intra-Fragment Graphs represent a co-occurrence relation between event attributes
as they co-occur in the same fragment. For example, it can be a task assignment relation
when a resource and an activity co-occur in an event, or a co-location relation when two
resources co-occur in the same location.
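A minimal sketch of deriving such a co-occurrence graph from precomputed fragments (the function name and the simplified weights are ours: node weights count π_U occurrences only, and unordered event pairs are counted in both directions, unlike the exact definition above):

```python
from collections import Counter

def intra_fragment_graph(frags, U, V):
    """Co-occurrence graph over fragments: nodes are pi_U / pi_V
    projections; an undirected edge links two distinct projections
    of events lying in the same fragment."""
    node_w, edge_w = Counter(), Counter()
    for frag in frags:
        for e in frag:
            node_w[tuple(e[a] for a in U)] += 1
        for e1 in frag:
            for e2 in frag:
                n1 = tuple(e1[a] for a in U)
                n2 = tuple(e2[a] for a in V)
                if n1 != n2:
                    edge_w[frozenset([n1, n2])] += 1  # once per order
    return node_w, edge_w

# Resources co-occurring in the same Department fragment (case c1, Fig. 1a):
frags = [
    [{"Resource": "r1"}, {"Resource": "r2"}],  # fragment {e1, e2} in d1
    [{"Resource": "r3"}, {"Resource": "r3"}],  # fragment {e3, e4} in d2
    [{"Resource": "r1"}],                      # fragment {e5} in d1
]
nodes, edges = intra_fragment_graph(frags, ["Resource"], ["Resource"])
print(edges[frozenset([("r1",), ("r2",)])])  # 2 (both orders of the e1-e2 pair)
```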
Let A^L_T be an abstraction over schema T from log L. Let φ, ϕ : A^L_T → ℰ such that for all F ∈ A^L_T, φ(F) ∈ ℰ and ϕ(F) ∈ ℰ. φ and ϕ are two choice functions on A^L_T, i.e. choice functions can be used to extract events with certain properties from fragments. We will not specify their semantics. Just as an example, φ could be chosen such that it returns the latest event in a fragment, and ϕ could be chosen such that it returns the earliest event in a fragment.
Given two choice functions φ and ϕ, two events are in an inter-fragment directly-follows relation, denoted e1 ↣_T e2, iff ∃F1, F2 ∈ A^L_T [F1 → F2 ∧ e1 = φ(F1) ∧ e2 = ϕ(F2)]. The set of all inter-fragment directly-follows event pairs in log L is E_T = {(e1, e2) ∈ ℰ × ℰ | e1 ↣_T e2}.
¹ (v, v(e)) is abbreviated to v(e) when it is clear from the context, for the purpose of readability.
Definition 4 (Inter-Fragment Graph). Let A^L_T be an abstraction over schema T from log L and let U, V ⊆ S. An Inter-Fragment Graph IEG^{U,V}_T(L) is a node- and arc-weighted directed graph G = (N, E, W_N, W_E), defined by:
N = {π_U(e) | ∃e′ ∈ ℰ: (e, e′) ∈ E_T} ∪ {π_V(e′) | ∃e ∈ ℰ: (e, e′) ∈ E_T},
E = {(π_U(e), π_V(e′)) | (e, e′) ∈ E_T},
W_N(n) = |{(e, e′) ∈ E_T | π_U(e) = n}| + |{(e, e′) ∈ E_T | π_V(e′) = n}| for all n ∈ N.
Let (n1, n2) ∈ E and E^{n1,n2}_T be the set of all inter-fragment directly-follows event pairs corresponding to (n1, n2), i.e. E^{n1,n2}_T = {(e1, e2) ∈ E_T | π_U(e1) = n1 ∧ π_V(e2) = n2}. Then W_E((n1, n2)) = |E^{n1,n2}_T| if frequency-based, or W_E((n1, n2)) = Σ_{(e1,e2) ∈ E^{n1,n2}_T} (timestamp(e2) − timestamp(e1)) / |E^{n1,n2}_T| if time-based.
Inter-Fragment Graphs represent a flow relation between event attributes. For ex-
ample, it can be a hand-over from a resource in one department to another resource in
a directly following department, or a flow from an activity executed in one location to
another activity executed in a directly following location.
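A frequency-based inter-fragment graph can be sketched as follows (names are ours; the sketch uses the fact that, within one case, fragments directly follow each other in sequence, and it defaults the choice functions to the last and first event of a fragment, one of the example choices mentioned above):

```python
from collections import Counter

def inter_fragment_graph(case_fragments, U, V,
                         phi=lambda F: F[-1], varphi=lambda F: F[0]):
    """Frequency-weighted inter-fragment edges: for each pair of
    consecutive fragments in a case, connect pi_U of the event chosen
    by phi in the source fragment to pi_V of the event chosen by
    varphi in the directly following fragment."""
    edge_w = Counter()
    for frags in case_fragments:          # fragments of one case, in order
        for F1, F2 in zip(frags, frags[1:]):
            e1, e2 = phi(F1), varphi(F2)
            edge_w[(tuple(e1[a] for a in U), tuple(e2[a] for a in V))] += 1
    return edge_w

# Department abstraction of case c1 (Fig. 1a), Activity as both node attributes:
c1_frags = [
    [{"Activity": "a1"}, {"Activity": "a2"}],  # {e1, e2} in d1
    [{"Activity": "a3"}, {"Activity": "a1"}],  # {e3, e4} in d2
    [{"Activity": "a2"}],                      # {e5} in d1
]
w = inter_fragment_graph([c1_frags], ["Activity"], ["Activity"])
print(w[(("a2",), ("a3",))])  # 1: a2 (last in the d1 run) flows to a3 (first in d2)
```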
3.3 Comparing Perspective Graphs and Visualizing Differences
In comparing two perspective graphs, common nodes and edges of the two graphs are compared in terms of their weights. Note that the weights defined in Section 3.2 are computed for the whole log. Instead of comparing graphs based on these weights, this paper looks for statistically significant differences by comparing sample populations of weights obtained from log observations.
Different techniques can be used to make observations of logs. Case-wise observations are made on cases in the log, i.e. weights of nodes and edges are computed from events in each case. Differences determined by the tests can be understood as differences between the two variants synthesized from all cases. Time-wise observation allows one to see differences between logs over time. This technique uses a sliding time window starting from the earliest event in each log. Observations are made on each window, i.e. weights of nodes and edges are computed from events occurring within each window.
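Time-wise observation can be sketched as follows (a simplification with our own names: non-overlapping windows of fixed size, numeric timestamps, and a count per window for one node key; the paper slides a window from the earliest event):

```python
def windowed_counts(events, key, start, window, n_windows):
    """Time-wise observation: tile fixed-size windows from `start`
    and count, per window, the events matching `key`. Returns one
    weight sample per window."""
    samples = []
    for i in range(n_windows):
        lo = start + i * window
        hi = lo + window
        samples.append(sum(1 for t, k in events if lo <= t < hi and k == key))
    return samples

# Events as (timestamp, node-key) pairs; a window of 3 "days":
events = [(0, "Accepted"), (1, "Accepted"), (2, "Queued"),
          (4, "Accepted"), (7, "Accepted"), (8, "Accepted")]
print(windowed_counts(events, "Accepted", 0, 3, 3))  # [2, 1, 2]
```

The resulting per-window samples from the two logs are what the statistical tests compare.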
[Figure: color scale over common language effect sizes P(X_A < X_B) and P(X_A > X_B), with thresholds at 0.6, 0.7 and 0.8, distinguishing common nodes/edges with differences, common nodes/edges with no differences, and uncommon nodes/edges that occur only in variant A or only in variant B.]
Fig. 2. Color scheme
The result of graph comparison is a differential graph containing common nodes and edges, and also uncommon nodes and edges that appear in one graph only. If nodes and edges are common with a statistically significant difference, their weight is the effect size of the difference. This paper chooses the common language effect size [11] due to its interpretability. For example, an effect size of 80% indicates that, given any random observations of the two variants, variant A has an 80% chance of having a higher mean weight than variant B. If nodes or edges are common without a statistically significant difference, their weight is simply zero. Lastly, if they are uncommon, their weight is the relative weight among all uncommon nodes or edges in the graph. Differential graphs are visualised in the form of matrices (nodes are row and column headers while edges are cells). Matrices can be symmetric (for undirected graphs) as shown in Fig. 5 or asymmetric (for directed graphs) as shown in Fig. 4. Nodes and edges are color-coded based on their weight and the color scheme shown in Fig. 2.
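The common language effect size can be estimated directly from two weight samples; a sketch (with ties counted as one half, this is the Mann-Whitney U statistic divided by the number of pairs):

```python
def common_language_effect_size(xs, ys):
    """P(X > Y) estimated over all pairs of observations, counting
    ties as 1/2 (McGraw & Wong's common language effect size)."""
    gt = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (gt + 0.5 * ties) / (len(xs) * len(ys))

# Two samples of node weights observed for variants A and B:
a = [5, 6, 7, 8]
b = [1, 2, 3, 6]
print(common_language_effect_size(a, b))  # 0.90625
```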
4 Evaluation
We implemented our approach as a ProM plugin named Multi-Perspective Process Comparator (MPC).² The plugin allows one to import two event logs in MXML or XES format as input, mine different perspective graphs and compare them to identify statistically significant differences. Using this implementation, we evaluated our approach on two real-life datasets and compared the results with the ProcessComparator (or PC) plugin in ProM [4].
² Executable and source code are available from
We looked at the public real-life event logs available in the 4TU Data Center and selected two representative datasets, namely BPIC13 and BPIC15. These two datasets come with business questions that entail variant comparison, which have been posed by the process stakeholders of these datasets as part of public contests on process mining. Due to space limits, we only report the results of our technique on the BPIC13 log, focusing on aspects where our technique improves over the baseline. A detailed evaluation is documented in a technical report [12].
BPIC13⁴ records cases of an IT incident handling process at Volvo Belgium. An IT ticket is raised for each incident to be investigated by various IT support teams. Teams are organized into technology-wide functions (org:role attribute), organization lines (organization involved attribute), and countries (resource country attribute). For our evaluation, we selected the following question from the description accompanying this dataset: "Where do the two IT organisations (A2 and C) differ?", where A2 and C are the main organization lines responsible for most of the IT tickets.
For each dataset and business question above, we compared process variants using three sub-questions. With reference to Fig. 1, these questions focus on two levels of granularity, event and fragment, and on time-wise differences.
Q1. What are the differences at the event level?
At the event level, we can look into either inter-event or intra-event relations, where each event is a fragment. Regarding the former, both PC and MPC can provide the same insight. Specifically, in the case of MPC, we can use the event ID attribute to create a process abstraction, then create an inter-fragment graph using the pair of event name and status attributes as nodes.
Fig. 3. Resource Country and Activity Status
However, regarding the intra-event relations, PC cannot provide a solution, while MPC can investigate these relations through intra-fragment graphs. For example, on the event-based abstraction, we can use the country attribute as one node and the activity status attribute as another node. The matrix in Fig. 3 reveals that the teams in Brazil, India and the USA in organization C choose the "Wait User" status for IT tickets more frequently than in organization A2. This is an operational concern, since IT staff can choose this status as an excuse to delay incident investigation.
Q2. What are the differences at the fragment level?
In the BPIC13 dataset, we can create fragments using the country attribute to look into how process activities are related between IT teams from different countries. In this aspect, PC aggregates the activity flow between fragments. For example, PC shows a flow from [Sweden] to [Poland] through the Accepted activity, meaning that this activity is performed by Sweden and then work is transferred to Poland. It may, however, consist of two possible flows: either Accepted by Sweden followed by activity Queued performed by Poland, or Accepted by Sweden followed by Completed performed by Poland.
⁴ doi:10.4121/uuid:500573e6-accc-4b0c-9576-aa5468b10cee
[Figure: differential matrix for BPIC13, inter-fragment graph, where each node combines status, country and event name.]
Fig. 4. Impact, Country, and Event Name
Similarly to PC, MPC can look into the same flow by first using the country attribute to create the process abstraction, then choosing the pair of country and event name attributes as one node and the country attribute as another node to create an inter-fragment graph. However, beyond that, MPC can elaborate the activity flow between countries by using other event attributes. Fig. 4 shows an example where we chose impact, country and event name to represent a node. From this figure, we can see that the process activity flow from Sweden to Poland through the "Accepted" activity is actually the control flow from "Medium Sweden Accepted" to "Medium Poland Accepted" (i.e. from "Accepted" to "Accepted" activities for medium-impact cases only).
[Figure: differential matrix for BPIC13, intra-fragment graph over Impact fragments, with (impact, status) nodes.]
Fig. 5. Medium Impact and Activity Status
Further, MPC can look into the differences between A2 and C within each fragment through intra-fragment graphs, while this is not possible with PC. For example, we use the impact attribute to create a process abstraction, and the pair of impact and activity status attributes as the node. The result is shown in Fig. 5 for medium-impact incidents. Remarkably, we can see that for medium-impact incidents, most activity statuses in A2 have approximately a 60-70% chance of occurring more frequently than in C. There are no significant differences between the two organization lines in high-impact incidents.
Q3. What are the time-wise differences as compared to case-wise differences?
So far, the evaluation only finds case-wise differences, i.e. differences synthesized from cases in A2 and C. Time-wise observations, however, are not available in PC.
[Figure: time-wise differential matrix for BPIC13.]
Fig. 6. Event Name and Activity Status
For MPC, we use time-wise observations with a sliding window set to three days, as most events in a case occur within a day. We use the event ID to create a process abstraction, and the pair of event name and activity status as node. In this case, the node (edge) weight captures the relative occurrence frequency in each window. The result is shown in Fig. 6. We can see that there are two remarkable differences between A2 and C over time, in the node "Accepted Wait-User" and the edge "Accepted In Progress" → "Accepted Wait-User". The difference magnitude is approximately 56%, i.e. there is a 56% probability that "Accepted Wait-User" in A2 has a lower frequency than in C. In MPC, clicking on the edge "Accepted In Progress" → "Accepted Wait-User" shows a detailed time series, which reveals that this difference mostly occurred between 21 Jan and 10 Mar 2012.
5 Conclusion
This paper contributes the notions of perspective graph and differential graph. A perspective graph is an abstraction of an event log in which nodes represent entities referenced by an event attribute or combination of attributes, and links refer to co-occurrence or directly-follows relations. Perspective graphs generalize directly-follows graphs and hand-off graphs, commonly supported by process mining tools. Differential perspective graphs allow us to compare two event logs (abstracted via perspective graphs) and to identify their statistically significant differences.
The example-based evaluation of differential perspective graphs on real-life logs shows that we can identify differences that are beyond the scope of the existing ProcessComparator approach, and that the matrix-based representation of differential perspective graphs provides a more compact representation for displaying such differences, compared to the node-link (graphical) representations used in process mining tools.
While the examples highlighted the possible advantages of the proposed approach, these need to be confirmed via a usability evaluation with end users, which is left as future work. Another future work avenue is to extend the approach in order to identify differences between variants that can be causally related to performance, e.g. structural or behavioral differences that can explain differences in cycle time between variants.
Acknowledgements. This research is partly funded by the Australian Research Council
(DP150103356) and the Estonian Research Council (grant IUT20-55).
References
1. H. Nguyen, M. Dumas, M. La Rosa, F. Maggi, and S. Suriadi. Mining business process deviance: a quest for accuracy. In Proc. of CoopIS. Springer, 2014.
2. N.R.T.P. van Beest, M. Dumas, L. García-Bañuelos, and M. La Rosa. Log delta analysis: interpretable differencing of business process event logs. In Proc. of BPM. Springer, 2015.
3. M. Wynn, E. Poppe, J. Xu, A.H.M. ter Hofstede, R. Brown, A. Pini, and W.M.P. van der
Aalst. ProcessProfiler3D: A visualisation framework for log-based process performance
comparison. DSS, 100:93–108, 2017.
4. A. Bolt, M. de Leoni, and W.M.P. van der Aalst. Process variant comparison: using event logs to detect differences in behavior and business rules. Inf. Syst., 2018.
5. A. Pika, W.M.P. van der Aalst, C. Fidge, A.H.M. ter Hofstede, and M. Wynn. Profiling event
logs to configure risk indicators for process delays. In Proc. of CAiSE. Springer, 2013.
6. J. Gulden. Visually comparing process dynamics with Rhythm-Eye views. In Proc. of BPM
Workshops. Springer, 2016.
7. A. Pika, M. Wynn, C. Fidge, A.H.M. ter Hofstede, M. Leyer, and W.M.P. van der Aalst. An extensible framework for analysing resource behaviour using event logs. In Proc. of CAiSE. Springer.
8. N. Ballambettu, M. Suresh, and R. Bose. Analyzing process variants to understand differences in key performance indices. In Proc. of CAiSE. Springer, 2017.
9. A. Bolt and W.M.P. van der Aalst. Multidimensional process mining using Process Cubes.
In Proc. of BMMDS/EMMSAD. Springer, 2015.
10. IEEE standard for eXtensible Event Stream (XES) for achieving interoperability in event logs and event streams. IEEE Std 1849-2016, pages 1-50, 2016.
11. K.O. McGraw and S.P. Wong. A common language effect size statistic. Psychological Bulletin, 111(2):361-365, 1992.
12. H. Nguyen, M. Dumas, M. La Rosa, and A.H.M. ter Hofstede. Multi-perspective comparison of business process variants based on event logs (extended paper). QUT ePrints Technical Report, Queensland University of Technology, 2018.
... Compared to other methods, relatively less research has been done on this type of methods due to the NP-completeness of graph matching [13]. Nguyen et al. [14] proposed a method that uses a perspective graph to compare process variants. The perspective graph contains information about entities and the relationship among entities. ...
Full-text available
With the application of numerous services or software, process mining has attracted more and more attention. However, concept drift may occur during process mining due to the instability of the process. Sudden and gradual drifts are considered to be two basic modes of change, that may always appear in independent or nested forms. Although the existing methods have studied the detection of two basic modes, they do not consider the nesting of two change modes. We identify the change mode that sudden drifts and gradual drifts do not appear independently as nested drifts. The current drift detection methods can only detect the drift modes that occur independently, but not suitable for nested drift detection. To fill this gap, this paper proposes a business concept drift detection and localization framework called BRDDL (Behavior Replacement-based Drift Detection and Localization) which can not only detect independent drifts such as sudden drifts and gradual drifts, but also detect nested drifts. Firstly, we propose an integrated drift point detection and localization method which can report the location of change points and return the changed behaviors (activity relationship pairs). On this basis, we propose a behavior replacement method by updating the changed traces to restore an unchanged sub log. Then we compare the behaviors in the updated traces with those in the associated unchanged traces to judge the type of drifts. The effectiveness of the method is verified by simulation experiments on the synthetic log.
... In this case, it can be considered correct due to the business goal defined for the business process that generates this event log. Figure 3 shows an analysis of the main process variants retrieved from the event log of the IoT Air Quality Monitor participant. Through the process variants, characteristics found in the traces of an event log can be identified [38], allowing one to know the different directly-follows relations, i.e., discovering a sequence of specific activities through the flow of the business process. In the first map of the process variant presented in Figure 3 (from top to bottom), it is observed that a sequence of SDILS-VSD activities, executed twice in a row, may correspond to a loop within the flow of the process. ...
Process mining is a novel alternative that uses event logs to discover, monitor, and improve real business processes through knowledge extraction. Event logs are a prerequisite for any process mining technique. The extraction of event data and the building of event logs is a complex and time-intensive process, with human participation at several stages of the procedure. In this paper, we propose a framework to semi-automatically build an event log based on the XES standard from relational databases. The framework comprises the stages of requirements identification, event log construction, and event log evaluation. In the first stage, the data is interpreted to identify the relationship between the columns and business process activities, and the business process entities are defined. In the second stage, the hierarchical structure of the event log is specified. Likewise, a formal rule set is defined to map the database columns to the attributes specified in the event log structure, enabling the extraction of attributes. This task is implemented through a correlation method at the case, event, and activity levels, enabling automatic event log generation. In the third stage, we validate the event log through statistical analysis and business process discovery. The former determines the complexity of the built event log using the average case time and the average number of events as metrics. The latter evaluates the discovered business process models through precision, coverage, and generalization metrics. The proposed approach was evaluated using the database of an autonomous Internet of Things (IoT) air quality monitoring system, reaching values of 1.0 in the precision and coverage metrics and between 0.980 and 0.991 in the generalization metric.
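The column-to-attribute mapping stage described above can be pictured with a small sketch, under assumptions of ours (the table, column layout, and attribute names below are hypothetical, borrowing the XES attribute keys `concept:name` and `time:timestamp`): each database row becomes an event, and rows are correlated into cases by a case-identifying column.

```python
from collections import defaultdict

# Hypothetical rows from a relational table: (case column, activity column, timestamp column).
rows = [
    ("sensor-1", "measure", "2021-05-01T10:00:00"),
    ("sensor-1", "transmit", "2021-05-01T10:00:05"),
    ("sensor-2", "measure", "2021-05-01T10:01:00"),
]

# Mapping rule set: which column index feeds which event-log attribute.
mapping = {"concept:name": 1, "time:timestamp": 2}

log = defaultdict(list)  # case identifier -> ordered list of events
for row in rows:
    event = {attr: row[col] for attr, col in mapping.items()}
    log[row[0]].append(event)
```

A real implementation would additionally serialise `log` into the hierarchical XES structure; the sketch only shows the correlation step.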
... Some studies used clustering-based techniques to find groups of traces sharing similar characteristics that can be generalised and employed to detect drifts [11,[19][20][21][22]. Other studies used graph-based analysis techniques [9,23,24] or model-to-log alignment [25,26]. ...
This paper presents a set of methods, jointly called PGraphD*, which includes two new methods (PGraphDD-QM and PGraphDD-SS) for drift detection and one new method (PGraphDL) for drift localisation in business processes. The methods are based on deep learning and graphs, with PGraphDD-QM and PGraphDD-SS employing a quality metric and a similarity score for detecting drifts, respectively. According to experimental results, PGraphDD-SS outperforms PGraphDD-QM in drift detection, achieving an accuracy score of 100% over the majority of synthetic logs and an accuracy score of 80% over a complex real-life log. Furthermore, PGraphDD-SS detects drifts with delays that are 59% shorter on average compared to the best performing state-of-the-art method.
The field of process mining focuses on distilling knowledge of the (historical) execution of a process based on the operational event data generated and stored during its execution. Most existing process mining techniques assume that the event data describe activity executions as degenerate time intervals, i.e., intervals of the form [t, t], yielding a strict total order on the observed activity instances. However, for various practical use cases, e.g., the logging of activity executions with a nonzero duration and uncertainty on the correctness of the recorded timestamps of the activity executions, assuming a partial order on the observed activity instances is more appropriate. Using partial orders to represent process executions, i.e., based on recorded event data, allows for new classes of process mining algorithms, i.e., aware of parallelism and robust to uncertainty. Yet, interestingly, only a limited number of studies consider using intermediate data abstractions that explicitly assume a partial order over a collection of observed activity instances. Considering recent developments in process mining, e.g., the prevalence of high-quality event data and techniques for event data abstraction, the need for algorithms designed to handle partially ordered event data is expected to grow in the upcoming years. Therefore, this paper presents a survey of process mining techniques that explicitly use partial orders to represent recorded process behavior. We performed a keyword search, followed by a snowball sampling strategy, yielding 68 relevant articles in the field. We observe a recent uptake in works covering partial-order-based process mining, e.g., due to the current trend of process mining based on uncertain event data. Furthermore, we outline promising novel research directions for the use of partial orders in the context of process mining algorithms.
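The shift from totally to partially ordered activity instances can be illustrated with a minimal sketch. One common convention, assumed here rather than prescribed by the survey, treats activity executions as time intervals and orders instance a before instance b only when a finishes before b starts; overlapping instances remain unordered, i.e. concurrent.

```python
def strictly_before(a, b):
    """a precedes b in the partial order iff a's interval ends before b's starts.
    Instances are (start, end) pairs."""
    return a[1] < b[0]

# x and y overlap, so they are concurrent; both end before z starts.
x, y, z = (0, 2), (1, 3), (4, 5)

assert not strictly_before(x, y) and not strictly_before(y, x)  # unordered pair
assert strictly_before(x, z) and strictly_before(y, z)
```

Degenerate intervals [t, t], as mentioned in the abstract, recover a strict total order whenever all timestamps differ.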
Process mining is a well-established discipline with applications in many industry sectors, including healthcare. To date, few publications have considered the context in which processes execute. Little consideration has been given as to how contextual data (exogenous data) can be practically included for process mining analysis, beyond including case or event attributes in a typical event log. We show that the combination of process data (endogenous) and exogenous data can generate insights not possible with standard process mining techniques. Our contributions are a framework for process mining with exogenous data and new analyses, where exogenous data and process behaviour are linked to process outcomes. Our new analyses visualise exogenous data, highlighting the trends and variations, to show where overlaps or distinctions exist between outcomes. We applied our analyses in a healthcare setting and show that clinicians could extract insights about differences in patients’ vital signs (exogenous data) relevant to clinical outcomes. We present two evaluations, using a publicly available data set, MIMIC–III, to demonstrate the applicability of our analysis. These evaluations show that process mining can integrate large amounts of physiologic data and interventions, with resulting discrimination and conversion to clinically interpretable information.
Discovering and analysing business processes are important tasks for organizations. Process mining bridges the gap between process management and data science by discovering process models from event logs derived from real-world data. Besides mandatory event attributes like case identifier, activity, and timestamp, additional event attributes can be present, such as human resources, costs, and laboratory values. These event attributes can be modified by multiple events in a trace, in which case they are classified as so-called dynamic event attributes. So far, the process behaviour of event attributes has been described in the form of read/write operations or object-lifecycle states; the behaviour of the actual values has not been considered yet. This paper introduces an approach that automatically detects changes in the actual values of dynamic event attributes, making it possible to identify changes between process activities, i.e. between events with the same activity name. This can help to confirm expected behaviour of dynamic event attributes, but also allows deriving novel insights by identifying unexpected changes. We applied the proposed technique to the MIMIC-IV real-world data set on hospitalizations in the US and evaluated the results together with a medical expert. The approach is implemented in Python with the help of the PM4Py framework. Keywords: Process mining, Change detection, Process enhancement
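Detecting value changes of a dynamic event attribute can be sketched by scanning a trace and reporting every event at which the attribute's value differs from the last observed value. This is an illustration of the general idea only; the attribute name and trace below are hypothetical, and the paper's actual technique builds on PM4Py.

```python
def attribute_changes(trace, attribute):
    """Yield (activity, old_value, new_value) whenever the attribute's value
    changes between consecutive events of a trace that carry it."""
    changes = []
    prev = None
    for event in trace:
        value = event.get(attribute)
        if prev is not None and value is not None and value != prev:
            changes.append((event["activity"], prev, value))
        if value is not None:
            prev = value
    return changes

# Hypothetical hospital trace with a laboratory value changing mid-stay.
trace = [{"activity": "admit", "creatinine": 1.0},
         {"activity": "treat", "creatinine": 1.4},
         {"activity": "discharge", "creatinine": 1.4}]
print(attribute_changes(trace, "creatinine"))
```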
Batch processing reduces processing time in a business process at the expense of increasing waiting time. If this trade-off between processing and waiting time is not analyzed, batch processing can, over time, evolve into a source of waste in a business process. Therefore, it is valuable to analyze batch processing activities to identify waiting time wastes. Identifying and analyzing such wastes present the analyst with improvement opportunities that, if addressed, can improve the cycle time efficiency (CTE) of a business process. In this paper, we propose an approach that, given a process execution event log, (1) identifies batch processing activities, (2) analyzes their inefficiencies caused by different types of waiting times to provide analysts with information on how to improve batch processing activities. More specifically, we conceptualize different waiting times caused by batch processing patterns and identify improvement opportunities based on the impact of each waiting time type on the CTE. Finally, we demonstrate the applicability of our approach to a real-life event log.
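The cycle time efficiency (CTE) mentioned above relates processing time to total cycle time. A minimal computation, assuming non-overlapping activity intervals of a single case and omitting the paper's breakdown into waiting-time types, could look like this:

```python
def cycle_time_efficiency(intervals):
    """intervals: non-overlapping (start, end) pairs of one case, sorted by start.
    Processing time is the time covered by the intervals; cycle time spans
    from the first start to the last end, so gaps count as waiting time."""
    processing = sum(end - start for start, end in intervals)
    cycle = intervals[-1][1] - intervals[0][0]
    return processing / cycle

# One case: 3 time units of work inside a 10-unit cycle.
print(cycle_time_efficiency([(0, 2), (9, 10)]))  # 0.3
```

Waiting time induced by batching would show up as such gaps between intervals, which is precisely what lowers the CTE.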
Business process simulation is a methodology that enables analysts to run the process in different scenarios, compare the performances and consequently provide indications of how to improve a business process. Process simulation requires one to provide a simulation model, which should accurately reflect reality to ensure the reliability of the simulation findings. This paper proposes a framework to assess the extent to which a simulation model reflects reality and to pinpoint how to reduce the distance. The starting point is a business simulation model, along with a real event log that records actual executions of the business process being simulated and analyzed. In a nutshell, the idea is to simulate the process, thus obtaining a simulation log, which is subsequently compared with the real event log. A decision tree is built using vectors of features that represent the behavioral characteristics of log traces. The tree aims to classify traces as belonging to the real or simulated event logs, and the discriminating features encode the difference between reality, represented in the real event log, and the simulation model, represented in the simulated event logs. These features provide actionable insights into how to repair simulation models so that they come closer to reality. The technique has been assessed on a real-life process for which the literature provides a real event log and a simulation model. The results of the evaluation show that our framework increases the accuracy of the given initial simulation model to better reflect reality.
Process mining is a discipline sitting between data mining and process science, whose goal is to provide theoretical methods and software tools to analyse process execution data, known as event logs. Although process mining was originally conceived to facilitate business process management activities, research studies have shown the benefit of leveraging process mining in healthcare contexts. However, applying process mining tools to analyse healthcare process execution data is not straightforward. In this paper, we show a methodology to: i) prepare general practice healthcare process data for conducting a process mining analysis; ii) select and apply suitable process mining solutions for successfully executing the analysis; and iii) extract valuable insights from the obtained results, alongside leads for traditional data mining analysis. By doing so, we identified two major challenges when using process mining solutions for analysing healthcare process data, and highlighted benefits and limitations of the state-of-the-art process mining techniques when dealing with highly variable processes and large data-sets. While we provide solutions to the identified challenges, the overarching goal of this study was to detect differences between the patients' health services utilization pattern observed in 2020 (during the COVID-19 pandemic and mandatory lock-downs) and the one observed in the prior four years, 2016 to 2019. By using a combination of process mining techniques and traditional data mining, we were able to demonstrate that vaccinations in Victoria did not drop drastically, as other interactions did. On the contrary, we observed a surge of influenza and pneumococcus vaccinations in 2020, as opposed to other research findings of similar studies conducted in different geographical areas.
Medical process trace classification exploits the activity sequences logged by a healthcare organization to classify the traces themselves on the basis of some performance properties; this information can be used for quality assessment. State-of-the-art process trace classification resorts to deep learning, a very powerful technique which, however, suffers from a lack of explainability. In this paper we aim at addressing this issue, motivated by a relevant application, i.e., the classification of process traces for quality assessment in stroke management. To this end we introduce the novel concept of trace saliency maps, an instrument able to highlight which trace activities are particularly significant for the classification task. Through trace saliency maps we justify the output of the deep learning architecture and make it more easily interpretable to medical users. The good results in our use case have shown the feasibility of the approach, and let us hypothesize that it might be translated to other application settings and to other black-box learners as well.
Service delivery organizations run similar processes across several clients. Process variants may manifest due to differences in the nature of clients, heterogeneity in the type of cases, etc. The organization's operational Key Performance Indicators (KPIs) may vary across these variants, e.g., KPIs for some variants may be better than those of others. There is a need to gain insights into such variance in performance and to seek opportunities to learn from well-performing process variants (e.g., to establish best practices and standardize processes) and apply these learnings to underperforming ones. In this paper, we present an approach to analyze two or more process variants, presented as annotated process maps. Our approach identifies and explains the key differences among these variants, manifested in both the control-flow (e.g., frequent paths) and performance (e.g., flow time, activity execution times) perspectives. The fragments within process variants where the key differences manifest are targets for process redesign and re-engineering. The proposed approach has been implemented as a plug-in in the process mining framework ProM and applied to real-life case studies.
To visualize information about process behavior over time, contemporary analysis tools typically use timeline-based visualizations. When an overview over a large range of process instances and possibly repetitive behavior is to be displayed, however, the timeline projection comes with several limitations. In this article, an alternative to the common timeline projection of process event data is elaborated, which allows projecting series of time-related events, and regularities therein, onto a circular structure. Especially for comparing process rhythms in multiple sets of event data, this visualization has advantages over timeline projections and provides more flexibility in configuration. A conceptual elaboration of the approach, together with a prototypical implementation, is presented in this paper.
An organisation can significantly improve its performance by observing how their business operations are currently being carried out. A great way to derive evidence-based process improvement insights is to compare the behaviour and performance of processes for different process cohorts by utilising the information recorded in event logs. A process cohort is a coherent group of process instances that has one or more shared characteristics. Such process performance comparisons can highlight positive or negative variations that can be evident in a particular cohort, thus enabling a tailored approach to process improvement. Although existing process mining techniques can be used to calculate various statistics from event logs for performance analysis, most techniques calculate and display the statistics for each cohort separately. Furthermore, the numerical statistics and simple visualisations may not be intuitive enough to allow users to compare the performance of various cohorts efficiently and effectively. We developed a novel visualisation framework for log-based process performance comparison to address these issues. It enables analysts to quickly identify the performance differences between cohorts. The framework supports the selection of cohorts and a three-dimensional visualisation to compare the cohorts using a variety of performance metrics. The approach has been implemented as a set of plug-ins within the open source process mining framework ProM and has been evaluated using two real-life data sets from the insurance domain to assess the usefulness of such a tool. This paper also derives a set of design principles from our approach which provide guidance for the development of new approaches to process cohort performance comparison.
Process mining techniques enable the analysis of processes using event data. For structured processes without too many variations, it is possible to show a relatively simple model and project performance and conformance information on it. However, if there are multiple classes of cases exhibiting markedly different behaviors, then the overall process will be too complex to interpret. Moreover, it will be impossible to see differences in performance and conformance for the different process variants. The different process variations should be analysed separately and compared to each other from different perspectives to obtain meaningful insights about the different behaviors embedded in the process. This paper formalizes the notion of process cubes where the event data is presented and organized using different dimensions. Each cell in the cube corresponds to a set of events which can be used as an input by any process mining technique. This notion is related to the well-known OLAP (Online Analytical Processing) data cubes, adapting the OLAP paradigm to event data through multidimensional process mining. This adaptation is far from trivial given the nature of event data which cannot be easily summarized or aggregated, conflicting with classical OLAP assumptions. For example, multidimensional process mining can be used to analyze the different versions of a sales processes, where each version can be defined according to different dimensions such as location or time, and then the different results can be compared. This new way of looking at processes may provide valuable insights for process optimization.
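The process cube idea can be pictured as indexing events by dimensions and materialising a cell as a sub-log. This is a toy sketch of ours; the dimension names and events below are hypothetical, and a real process cube also supports OLAP operations such as slice, dice, and roll-up.

```python
events = [
    {"case": 1, "activity": "order", "location": "Tartu", "year": 2015},
    {"case": 2, "activity": "order", "location": "Melbourne", "year": 2016},
    {"case": 3, "activity": "ship", "location": "Tartu", "year": 2016},
]

def cell(events, **dims):
    """Materialise a cube cell: the events matching every fixed dimension value.
    The resulting event set can feed any process mining technique."""
    return [e for e in events if all(e[d] == v for d, v in dims.items())]

sub_log = cell(events, location="Tartu", year=2015)
print([e["case"] for e in sub_log])  # [1]
```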
This paper addresses the problem of explaining behavioral differences between two business process event logs. The paper presents a method that, given two event logs, returns a set of statements in natural language capturing behavior that is present or frequent in one log, while absent or infrequent in the other. This log delta analysis method allows users to diagnose differences between normal and deviant executions of a process or between two versions or variants of a process. The method relies on a novel approach to losslessly encode an event log as an event structure, combined with a frequency-enhanced technique for differencing pairs of event structures. A validation of the proposed method shows that it accurately diagnoses typical change patterns and can explain differences between normal and deviant cases in a real-life log, more compactly and precisely than previously proposed methods.
This paper evaluates the suitability of sequence classification techniques for analyzing deviant business process executions based on event logs. Deviant process executions are those that deviate in a negative or positive way with respect to normative or desirable outcomes, such as executions that undershoot or exceed performance targets. We evaluate a range of features and classification methods based on their ability to accurately discriminate between normal and deviant executions. We also analyze the ability of the discovered rules to explain potential causes of observed deviances. The evaluation shows that feature types extracted using pattern mining techniques only slightly outperform those based on individual activity frequency. It also suggests that more complex feature types ought to be explored to achieve higher levels of accuracy.
This paper addresses the problem of comparing different variants of the same process. We aim to detect relevant differences between processes based on what was recorded in event logs. We use transition systems to model behavior and to highlight differences. Transition systems are annotated with measurements, used to compare the behavior in the different variants. Decision points in the transition system are also analyzed, and the business rules of the process variants in such points are compared. The results are visualized as colored transitions systems, where the colored states and transitions indicate the existence and magnitude of differences. The approach has been implemented in ProM, and the implementation is publicly available. The approach has been evaluated using real-life data sets. The results show how our technique is able to detect relevant differences that could not be captured using existing approaches. Moreover, the user is not overloaded with diagnostics on differences that are less significant.
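The annotated transition systems described above can be sketched minimally: states abstract trace prefixes (here, simply the last executed activity, which is one of several possible abstractions), and transitions carry frequency measurements on which two variants can then be compared. Names and data are illustrative, not from the paper.

```python
from collections import Counter

def transition_system(traces):
    """States are the last executed activity; arcs are annotated with frequencies."""
    arcs = Counter()
    for trace in traces:
        state = "start"
        for activity in trace:
            arcs[(state, activity)] += 1
            state = activity
    return arcs

variant_a = transition_system([["a", "b"], ["a", "b"]])
variant_b = transition_system([["a", "c"]])

# Per-arc frequency differences highlight where the variants diverge;
# Counter union collects every arc seen in either variant.
diff = {arc: variant_a[arc] - variant_b[arc] for arc in variant_a | variant_b}
```

The paper additionally weighs such differences by significance, so that the analyst is not flooded with minor deviations.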
Business processes depend on human resources and managers must regularly evaluate the performance of their employees based on a number of measures, some of which are subjective in nature. As modern organisations use information systems to automate their business processes and record information about processes’ executions in event logs, it now becomes possible to get objective information about resource behaviours by analysing data recorded in event logs. We present an extensible framework for extracting knowledge from event logs about the behaviour of a human resource and for analysing the dynamics of this behaviour over time. The framework is fully automated and implements a predefined set of behavioural indicators for human resources. It also provides a means for organisations to define their own behavioural indicators, using the conventional Structured Query Language, and a means to analyse the dynamics of these indicators. The framework’s applicability is demonstrated using an event log from a German bank.
Some of the shortcomings in interpretability and generalizability of the effect size statistics currently available to researchers can be overcome by a statistic that expresses how often a score sampled from one distribution will be greater than a score sampled from another distribution. The statistic, the common language effect size indicator, is easily calculated from sample means and variances (or from proportions in the case of nominal-level data). It can be used for expressing the effect observed in both independent and related sample designs and in both 2-group and n-group designs. Empirical tests show it to be robust to violations of the normality assumption, particularly when the variances in the 2 parent distributions are equal.
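For two independent normal distributions, the common language effect size is the probability that a score drawn from the first exceeds a score drawn from the second. Since the difference X - Y is itself normal, it can be computed from the sample means and standard deviations as CL = Phi((M1 - M2) / sqrt(s1^2 + s2^2)), where Phi is the standard normal CDF:

```python
from math import erf, sqrt

def common_language_effect_size(m1, s1, m2, s2):
    """P(X > Y) for independent X ~ N(m1, s1^2) and Y ~ N(m2, s2^2).
    X - Y is normal with mean m1 - m2 and variance s1^2 + s2^2."""
    z = (m1 - m2) / sqrt(s1 ** 2 + s2 ** 2)
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

# Identical distributions: either score is larger half the time.
print(common_language_effect_size(0, 1, 0, 1))  # 0.5
```

For example, with means one standard deviation apart and equal unit variances, a score from the first distribution exceeds one from the second about 76% of the time.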
A. Pika, W.M.P. van der Aalst, C. Fidge, A.H.M. ter Hofstede, and M. Wynn. Profiling event logs to configure risk indicators for process delays. In Proc. of CAiSE. Springer, 2013.