A Framework for Estimating Simplicity of Automatically Discovered Process Models Based on Structural and Behavioral Characteristics

Abstract. A plethora of algorithms for automatically discovering process models from event logs has emerged. The discovered models are used for analysis and come with a graphical flowchart-like representation that supports their comprehension by analysts. According to the Occam’s Razor principle, a model should encode the process behavior with as few constructs as possible, that is, it should not be overcomplicated without necessity. The simpler the graphical representation, the easier the described behavior can be understood by a stakeholder. Conversely, and intuitively, a complex representation should be harder to understand. Although various conformance checking techniques that relate the behavior of discovered models to the behavior recorded in event logs have been proposed, there are no methods for evaluating whether this behavior is represented in the simplest possible way. Existing techniques for measuring the simplicity of discovered models focus on their structural characteristics, such as size or density, and ignore the behavior these models encode. In this paper, we present a conceptual framework that can be instantiated into a concrete approach for estimating the simplicity of a model, considering the behavior the model describes, thus allowing a more holistic analysis. The reported evaluation over real-life event logs for several instantiations of the framework demonstrates its feasibility in practice.
Postprint, June 2020
Anna Kalenkova, Artem Polyvyanyy, and Marcello La Rosa
School of Computing and Information Systems
The University of Melbourne, Parkville, VIC, 3010, Australia
{anna.kalenkova;artem.polyvyanyy;marcello.larosa}@unimelb.edu.au
1 Introduction
Information systems keep records of the business processes they support in the form of
event logs. An event log is a collection of traces encoding timestamped actions undertaken
to execute the corresponding process. Thus, such logs contain valuable information
on how business processes are carried out in the real world. Process mining [1] aims
to exploit this historical information to understand, analyze, and ultimately improve
business processes. A core problem in process mining is that of automatically discov-
ering a process model from an event log. Such a model should faithfully encode the
process behavior captured in the log and, hence, meet a range of criteria. Specifically,
a discovered model should describe the traces recorded in the log (have good fitness),
not encode traces not present in the log (have good precision), capture possible traces
that may stem from the same process but are not present in the log (have good gener-
alization), and be “simple”. These quality measures for discovered process models are
studied within the conformance checking area of process mining. A good discovered
model is thus supposed to achieve a good balance between these criteria [3].
In [1], Van der Aalst suggests that process discovery should be guided by the Oc-
cam’s Razor principle [10,8], a problem-solving principle attributed to William of Ock-
ham. Accordingly, “one should not increase, beyond what is necessary, the number
of entities required to explain anything” [1]. In process mining specifically, a discovered
process model should only contain the necessary constructs. Various measures
for assessing whether a discovered model is simple have been proposed in the liter-
ature [9,15], such as the number of nodes and arcs, density, and diameter. However,
these measures address the number of constructs, i.e., the structure of the discovered
models, while ignoring what these constructs describe, i.e., the process behavior.
In this paper, we present a framework that considers the model’s structure and be-
havior to operationalize Occam’s Razor principle for measuring the simplicity of a pro-
cess model discovered from an event log. The framework comprises three components
that can selectively be configured: (i) a notion for measuring the structural complexity
of a process model, e.g., size or diameter; (ii) a notion for assessing the behavioral sim-
ilarity, or equivalence, of process models, e.g., trace equivalence, bisimulation, or en-
tropy; and (iii) the representation bias, i.e., a modeling language for describing models.
A configured framework results in an approach for estimating the simplicity of process
models. The obtained simplicity score establishes whether the behavior captured by the
model can be encoded in a structurally simpler model. To this end, the structure of the
model is related to the structures of other behaviorally similar process models.
To demonstrate these ideas, we instantiate the framework with the number of nodes
[9] and control flow complexity (cfc) [4] measures of structural complexity, topological
entropy [20] measure of behavioral similarity, and uniquely labeled block-structured
[19] process models captured in the Business Process Model and Notation (BPMN)
[18] as the representation bias. We then apply these framework instantiations to assess
the simplicity of the process models automatically discovered from event logs by the
Inductive miner algorithm [16]. This algorithm constructs process trees, which can then
be converted into uniquely labeled block-structured BPMN models [14].
Once the framework is configured, the next challenge is to obtain models of various
structures that specify behaviors similar to that captured by the given model, as these
are then used to establish and quantify the amount of unnecessary structural information
in the given model. To achieve completeness, one should aim to obtain all the similar
models, including the simplest ones. As an exhaustive approach for synthesizing all
such models is often unfeasible in practice, in this paper, we take an empirical approach
and synthesize random models that approximate those models with similar behavior. To
implement such approximations, we developed a tool that generates uniquely labeled
block-structured BPMN models randomly, or exhaustively for some restricted cases,
and measures their structural complexity and behavioral similarity.
The remainder of this paper is organized as follows. Section 2 discusses an example
that motivates the problem of ignoring the behavior when measuring the simplicity of
process models. Section 3 presents our framework for estimating the simplicity of the
discovered models. In Section 4, we instantiate the framework with concrete compo-
nents. Section 5 presents the results of an analysis of process models discovered from
real-life event logs using our framework instantiations. Section 6 concludes the paper.
2 Motivating Example
In this section, we show that the existing simplicity measures do not always follow the
Occam’s Razor principle. Consider the event log
\[
\begin{aligned}
L = \{\, & \langle \text{load page}, \text{fill name}, \text{fill passport}, \text{fill expire date} \rangle, \\
& \langle \text{load page}, \text{fill name}, \text{fill expire date}, \text{fill passport} \rangle, \\
& \langle \text{load page}, \text{fill passport}, \text{fill expire date} \rangle, \\
& \langle \text{load page}, \text{fill name}, \text{fill expire date} \rangle, \\
& \langle \text{load page}, \text{fill name}, \text{fill passport} \rangle, \\
& \langle \text{load page}, \text{fill name}, \text{fill passport}, \text{fill expire date}, \text{load page} \rangle, \\
& \langle \text{load page}, \text{fill name}, \text{fill expire date}, \text{fill passport}, \text{load page} \rangle, \\
& \langle \text{load page}, \text{fill name}, \text{load page} \rangle, \\
& \langle \text{fill name}, \text{fill expire date}, \text{fill passport} \rangle, \\
& \langle \text{fill passport}, \text{fill expire date} \rangle \,\}
\end{aligned}
\]
generated by a passport renewal information system.¹ The log contains ten traces, each
encoded as a sequence of events, or steps, taken by the users of the system. Usually, the
user loads the Web page and fills out relevant forms with details such as name, previous
passport number, and expiry date. Some steps in the traces may be skipped or repeated,
as this is common for real-world event data [13]. Fig. 1 and Fig. 2 present BPMN
models discovered from $L$ using, respectively, the Split miner [2] and Inductive miner
(with the noise threshold set to 0.2) [16] process discovery algorithms.
Fig. 1: A BPMN model discovered by Split miner from event log $L$.
It is evident from the figures that the models are different. First, they have different
structures. Fig. 1 shows an acyclic model with exclusive and parallel branches. In
contrast, the model in Fig. 2 only contains exclusive branches enclosed in a loop that
allows any of the steps to be skipped. Second, the models describe different collections of
traces. While the model in Fig. 1 describes three traces (viz. $\langle \text{load page} \rangle$,
$\langle \text{load page}, \text{fill name}, \text{fill passport}, \text{fill expire date} \rangle$, and
$\langle \text{load page}, \text{fill name}, \text{fill expire date}, \text{fill passport} \rangle$),
the model in Fig. 2 describes all the possible traces over the given steps,
i.e., all the possible sequences of the steps, including repetitions.
Fig. 2: A BPMN model discovered by Inductive miner from event log $L$.
The latter fact is also evident in the precision and recall values between the models
and log. The precision and recall values of the model in Fig. 1 are 0.852 and 0.672,
respectively, while the precision and recall values of the model in Fig. 2 are 0.342 and
1.0, respectively; the values were obtained using the entropy-based measures presented
in [20]. The values indicate, for instance, that the model in Fig. 2 is more permissive
(has lower precision), i.e., encodes more behavior not seen in the log than the model in
Fig. 1, and describes all the traces in the log (has perfect recall, or fitness, of 1.0); the
measures take values in the interval [0, 1], with larger values indicating better precision
and fitness.
¹ This simple example is inspired by a real-world event log analyzed in [13].
Fig. 3: A “flower” model.
To assess the simplicity of discovered process models, measures of their structural
complexity [4,9,15,17], such as the number of nodes and/or edges, density, depth,
coefficients of network connectivity, and control flow complexity, can be employed. If one
relies on the number of nodes to establish the simplicity of the two example models,
then one will arrive at the conclusion that they are equally simple, as both contain ten
nodes. This conclusion is, however, naïve for at least two reasons: (i) the two models use
ten nodes to encode different behaviors, and (ii) it may be unnecessary to use ten nodes
to encode the corresponding behaviors.
The model in Fig. 3 describes the same behavior as the model in Fig. 2 using eight
nodes. One can use different notions to establish similarity of the behaviors, including
exact (e.g., trace equivalence) or approximate (e.g., topological entropy). The models in
Figs. 2 and 3 are trace equivalent and specify the behaviors that have the (short-circuit)
topological entropy of 1.0 [20]. Intuitively, the entropy measures the “variety” of traces
of different lengths specified by the model. The more distinct traces of different lengths
the model describes, the closer the entropy is to 1.0. The entropy of the model in Fig. 1 is
0.185. There is no block-structured BPMN model with unique task labels that describes
the behavior with the entropy of 0.185 and uses fewer than ten nodes. Thus, we argue that
the model in Fig. 1 should be regarded as simpler than the model in Fig. 2.
3 A Framework for Estimating Simplicity of Process Models
In this section, we present our framework for estimating the simplicity of process mod-
els. The framework describes standard components that can be configured to result in
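a concrete measure of simplicity. The simplicity framework is a tuple $F = (\mathcal{M}, \mathcal{C}, \mathcal{B})$,
where $\mathcal{M}$ is a collection of process models, $\mathcal{C} : \mathcal{M} \to \mathbb{R}^{+}_{0}$ is a measure of structural
complexity, and $\mathcal{B} \subseteq (\mathcal{M} \times \mathcal{M})$ is a behavioral equivalence relation over $\mathcal{M}$.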
The process models are captured using some process modeling language (represen-
tation bias), e.g., finite state machines [11], Petri nets [21], or BPMN [18]. The mea-
sure of structural complexity is a function that maps the models onto non-negative real
numbers, with smaller assigned numbers indicating simpler models. For graph-based
models, this can be the number of nodes and edges, density, diameter, or some other
existing measure of simplicity used in process mining [17]. The behavioral equivalence
relation $\mathcal{B}$ must define an equivalence relation over $\mathcal{M}$, i.e., be reflexive, symmetric,
and transitive. For instance, $\mathcal{B}$ can be given by (weak or strong) bisimulation [12] or
trace equivalence [11] relation over models. Alternatively, equivalence classes of $\mathcal{B}$ can
be defined by models with the same or similar measure of behavioral complexity, e.g.,
(short-circuit) topological entropy [20].
Fig. 4: Behavioral classes of model equivalence: (a) an equivalence class; (b) similar equivalence classes.
Given a model $m_1 \in \mathcal{M}$, its behavioral equivalence class per relation $\mathcal{B}$ is the set
$M = \{m \in \mathcal{M} \mid (m, m_1) \in \mathcal{B}\}$, cf. Fig. 4a. If one knows a model $m^* \in M$ with the
lowest structural complexity in $M$, i.e., $\forall m \in M : \mathcal{C}(m^*) \le \mathcal{C}(m)$, then they can put
the simplicity of models in $M$ into the perspective of the simplicity of $m^*$. For instance,
one can use function $\mathit{sim}(m) = (\mathcal{C}(m^*) + 1) / (\mathcal{C}(m) + 1)$ to establish such a perspective.
Suppose that $\mathcal{M}$ is the set of all block-structured BPMN models with four uniquely
labeled tasks, $\mathcal{B}$ is the trace equivalence relation, and $\mathcal{C}$ is the measure of the number of
nodes in the models. Then, it holds that $\mathit{sim}(m_1) = 1.0$ and $\mathit{sim}(m_2) = 9/11 = 0.818$,
where $m_1$ and $m_2$ are the models from Fig. 1 and Fig. 2, respectively, indicating that $m_1$
is simpler than $m_2$. To obtain these simplicity values, we used our tool and generated
all the block-structured BPMN models over four uniquely labeled tasks, computed all
the behavioral equivalence classes over the generated models, and collected statistics
on the numbers of nodes in the models.
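The ratio above can be checked with a few lines of Python. This is only a minimal sketch and not the authors' tool: the function name `relative_simplicity` and the list-of-node-counts representation of an equivalence class are assumptions made for illustration. The node counts (ten for Fig. 2, eight for Fig. 3) are taken from Section 2.

```python
from fractions import Fraction


def relative_simplicity(complexity, class_complexities):
    """sim(m) = (C(m*) + 1) / (C(m) + 1), where m* is a structurally minimal
    model of the behavioral equivalence class that contains m."""
    best = min(class_complexities)
    if complexity < best:
        raise ValueError("the analyzed model must belong to the class")
    return Fraction(best + 1, complexity + 1)


# Node counts from the running example: the model in Fig. 2 uses ten nodes,
# and the trace-equivalent "flower" model in Fig. 3 uses eight.
flower_class = [10, 8]
sim_m2 = relative_simplicity(10, flower_class)
print(sim_m2, round(float(sim_m2), 3))  # 9/11 0.818
```

A model that is itself minimal in its class obtains the value 1, which matches $\mathit{sim}(m_1) = 1.0$ reported above.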
For some configurations of the framework, however, such exhaustive analysis may
prove intractable. For instance, the collection of models of interest may be infinite, or
finite but immense. Note that the number of block-structured BPMN models with four
uniquely labeled tasks is 2,211,840, and is 297,271,296 if one considers models with
five uniquely labeled tasks (the number of models grows exponentially with the number
of allowed labels). In such cases, we suggest grounding the analysis in a representative
subset $\mathcal{M}' \subset \mathcal{M}$ of the models.
Suppose that one analyzes model $m \in \mathcal{M}$ that has no other (or only a few) models in
its equivalence class $M$, refer to Fig. 4b. Then, model $m$ can be compared to models of
lowest structural complexities $m'^*$ and $m''^*$ from some other equivalence classes $M'$ and
$M''$ which contain models that describe the behaviors “similar” to the one captured by
$m$. To this end, one needs to establish a measure of “similarity” between the behavioral
equivalence classes of models.
In the next section, we exemplify the discussed concepts by presenting example
instantiations of the framework.
4 Framework Instantiations
In this section, we instantiate our framework $F = (\mathcal{M}, \mathcal{C}, \mathcal{B})$ for assessing the simplicity
of process models discovered from event logs and define the set of models ($\mathcal{M}$),
structural complexity ($\mathcal{C}$), and the behavioral equivalence relation ($\mathcal{B}$) as follows:
– $\mathcal{M}$ is a set of block-structured BPMN models with a fixed number of uniquely
  labeled tasks. BPMN is one of the most popular process modeling languages. Besides,
  block-structured uniquely labeled process models are discovered by Inductive
  miner, a widely used process discovery algorithm;
– $\mathcal{C}$ is either the number of nodes or the control flow complexity measure. These
  measures were selected among other simplicity measures because, as shown empirically
  in Section 5, there is a relation between these measures and the behavioral
  characteristics of process models; and
– $\mathcal{B}$ is the behavioral equivalence relation induced by the notion of (short-circuit)
  topological entropy [20]. The entropy measure is selected because it maps process
  models onto non-negative real numbers that reflect the complexity of the behaviors
  they describe; the greater the entropy, the more variability is present in the
  underlying behavior. Consequently, models from an equivalence class of $\mathcal{B}$ describe
  behaviors with the same (or very similar) entropy values.
For BPMN models, the problem of minimization is still open, and only some rules for the
local simplification of BPMN models exist [14,22]. Although NP-complete techniques [5]
for synthesizing Petri nets with minimal regions (corresponding to BPMN models [14] with
a minimal number of routing constructs) from sets of traces can be applied, there is no
general algorithm for finding a block-structured BPMN model that contains a minimal
possible number of nodes or has a minimal control flow complexity for a given process
behavior. In this case, it may be feasible to generate the set of all possible process models
for the given behavioral class (see the general description of this approach within our
framework in Section 3, Fig. 4a). However, due to the combinatorial explosion, the possible
number of block-structured BPMN models grows exponentially with the number of
tasks. While it is still possible to generate all block-structured BPMN models containing
four or fewer tasks, for larger numbers of tasks this problem is computationally expensive
and cannot be solved in a reasonable amount of time. In this work, we propose an approach
that approximates the exact solutions by comparing analyzed models with only
some randomly generated models that behave similarly. This approach implements a
general approximation idea proposed within our framework (Section 3, Fig. 4b). Section 4.1
introduces basic notions used to describe this approach. Section 4.2 describes
the proposed approach, discusses its parameters, and analyzes dependencies between
structural and behavioral characteristics of block-structured BPMN models.
4.1 Basic Notions
In this subsection, we define basic notions that are used later in this section.
Let $X$ be a finite set of elements. By $\langle x_1, x_2, \ldots, x_k \rangle$, where $x_1, x_2, \ldots, x_k \in X$,
$k \in \mathbb{N}_0$, we denote a finite sequence of elements over $X$ of length $k$. $X^*$ stands for the
set of all finite sequences over $X$, including the empty sequence of zero length.
Given two sequences $x = \langle x_1, x_2, \ldots, x_k \rangle$ and $y = \langle y_1, y_2, \ldots, y_m \rangle$, by $x \cdot y$ we
denote the concatenation of $x$ and $y$, i.e., the sequence obtained by appending $y$ to the end
of $x$: $x \cdot y = \langle x_1, x_2, \ldots, x_k, y_1, y_2, \ldots, y_m \rangle$.
An alphabet is a nonempty finite set. The elements of an alphabet are its labels. A
(formal) language $L$ over an alphabet $\Sigma$ is a (not necessarily finite) set of sequences
over $\Sigma$, i.e., $L \subseteq \Sigma^*$. Let $L_1$ and $L_2$ be two languages. Then, $L_1 \circ L_2$ is their
concatenation defined by $\{l_1 \cdot l_2 \mid l_1 \in L_1 \wedge l_2 \in L_2\}$. The language $L^*$ is defined as
$L^* = \bigcup_{n=0}^{\infty} L^n$, where $L^0 = \{\langle\rangle\}$ and $L^n = L^{n-1} \circ L$.
Fig. 5: Patterns of block-structured BPMN models: (a) initial pattern; (b) sequence pattern; (c) choice pattern; (d) parallel pattern; (e) loop pattern; (f) skip pattern.
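As an illustration of these definitions, the following minimal Python sketch represents sequences as tuples and finite languages as sets of tuples. The helper names `concat`, `power`, and `star_up_to` are assumptions introduced here; since $L^*$ is infinite for any language containing a nonempty sequence, only the finite union $L^0 \cup \ldots \cup L^k$ is computed.

```python
def concat(l1, l2):
    """L1 ∘ L2: every sequence of l1 followed by every sequence of l2."""
    return {x + y for x in l1 for y in l2}


def power(lang, n):
    """L^n, with L^0 = {⟨⟩} and L^n = L^(n-1) ∘ L."""
    result = {()}  # the language containing only the empty sequence
    for _ in range(n):
        result = concat(result, lang)
    return result


def star_up_to(lang, max_power):
    """Finite approximation of L*: the union of L^0, ..., L^max_power."""
    result = set()
    for n in range(max_power + 1):
        result |= power(lang, n)
    return result


L1 = {("a",), ("a", "b")}
L2 = {("c",)}
print(sorted(concat(L1, L2)))  # [('a', 'b', 'c'), ('a', 'c')]

# For L = {⟨a⟩, ⟨b⟩}, the union L^0 ∪ ... ∪ L^3 holds 1 + 2 + 4 + 8 sequences.
print(len(star_up_to({("a",), ("b",)}, 3)))  # 15
```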
Structural Representation. The class of process models considered in this work consists of
block-structured BPMN models, which are often used to represent processes
discovered from event logs; e.g., such models are discovered by the Inductive miner
algorithm [16].
Block-structured BPMN models are constructed from the following basic set of el-
ements: start and end events represented by circles with thin and thick borders respec-
tively and denoting beginning and termination of the process; tasks modeling atomic
process steps and depicted by rounded rectangles with labels; routing exclusive and par-
allel gateways modeling exclusive and parallel executions and presented by diamonds;
and control flow arcs that define the order in which elements are executed.
The investigated class of block-structured BPMN models consists of all and only
BPMN models that:
1) can be constructed starting from the initial model presented in Fig. 5a and induc-
tively replacing tasks with the patterns presented in Figs. 5b to 5f;
2) have uniquely labeled tasks, i.e., any two tasks have different labels;
3) are constructed such that only patterns other than loop are applied to the nested task
of the loop pattern, and only patterns other than skip and loop are applied to the
nested task of the skip pattern.
When constructing a model, the number of tasks increases if the patterns from
Figs. 5b to 5d are applied, the patterns from Figs. 5e and 5f can be applied no more than
twice in a row, and the pattern from Fig. 5a is applied only once. Hence, if we fix the number
of tasks (labels) in the investigated models, the overall set of these models is finite.
After constructing the collection of models, local minimization rules are applied [14].
These rules merge gateways without changing the model semantics. An example of local
reduction of gateways is presented in Fig. 6a; Fig. 6b illustrates the merging of loop and
skip constructs. For a detailed description of the local minimization rules, refer to [14].
Fig. 6: Examples of applying local minimization rules: (a) merging parallel gateways; (b) merging loop and skip constructs.
We focus on the following complexity measures of block-structured BPMN models:
(1) $\mathcal{C}_n$, the number of nodes (including start and end events, tasks, and gateways); and
(2) $\mathcal{C}_{cfc}$, the control flow complexity measure, which is defined as the sum of two
numbers: the number of all splitting parallel gateways and the total number of outgoing
control flows of all splitting exclusive (choice) gateways [4].
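Both measures can be computed directly from a graph encoding of a model. The sketch below assumes a hypothetical encoding that is not taken from the paper: `nodes` maps a node identifier to one of the kinds `"start"`, `"end"`, `"task"`, `"xor"`, or `"and"`, and `arcs` is a list of (source, target) pairs. A gateway is treated as splitting when it has at least two outgoing arcs.

```python
from collections import Counter


def node_count(nodes):
    """C_n: the number of nodes (events, tasks, and gateways)."""
    return len(nodes)


def control_flow_complexity(nodes, arcs):
    """C_cfc: number of splitting parallel gateways plus the total number of
    outgoing control flows of all splitting exclusive gateways."""
    out_degree = Counter(source for source, _ in arcs)
    cfc = 0
    for node, kind in nodes.items():
        if out_degree[node] < 2:
            continue  # joins, tasks, and events do not split the flow
        if kind == "and":
            cfc += 1
        elif kind == "xor":
            cfc += out_degree[node]
    return cfc


# An exclusive choice between tasks a and b (hypothetical example model).
nodes = {"start": "start", "split": "xor", "a": "task", "b": "task",
         "join": "xor", "end": "end"}
arcs = [("start", "split"), ("split", "a"), ("split", "b"),
        ("a", "join"), ("b", "join"), ("join", "end")]
print(node_count(nodes), control_flow_complexity(nodes, arcs))  # 6 2
```

Replacing both gateways with parallel ones keeps the node count at six but lowers the control flow complexity to one, since a splitting parallel gateway contributes one regardless of its number of branches.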
Sequences of labels are used to encode executions of business processes. The ordering
of tasks being executed defines the ordering of labels in a sequence. We say
that a process model encodes or accepts a formal language if and only if this language
contains all possible sequences of labels corresponding to the orderings of tasks
being executed within the model and only them. Fig. 7a presents an example of a
block-structured model $m_1$ that accepts language $L_1 = \{\langle a, b, c \rangle, \langle a, c, b \rangle, \langle b, a, c \rangle,
\langle b, c, a \rangle, \langle c, a, b \rangle, \langle c, b, a \rangle\}$. A block-structured BPMN model $m_2$ accepting an infinite
language of all sequences starting with $a$ over alphabet $\{a, b, c\}$ is presented in Fig. 7b.
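The two example languages can be enumerated with the Python standard library. This is a sketch under stated assumptions: the function names are hypothetical, $L_1$ is generated as the set of all orderings of the three labels (which coincides with the six sequences listed above), and the infinite language of $m_2$ is only enumerated for one fixed sequence length at a time.

```python
from itertools import permutations, product


def all_orderings(labels):
    """All orderings of the given labels, each used exactly once."""
    return set(permutations(labels))


def starting_with(first, alphabet, length):
    """All sequences of the given length over alphabet that start with first."""
    if length == 0:
        return set()
    return {(first,) + tail for tail in product(alphabet, repeat=length - 1)}


L1 = all_orderings("abc")
print(len(L1))  # 6

# The language of m2 contains 3^(n-1) sequences of each length n >= 1.
print([len(starting_with("a", "abc", n)) for n in range(1, 5)])  # [1, 3, 9, 27]
```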
Fig. 7: Examples of block-structured BPMN models: (a) block-structured BPMN model $m_1$; (b) block-structured BPMN model $m_2$.
Behavioral Representation. Next, we recall the notion of entropy, which is used for
the behavioral analysis of process models and event logs [20]. Let $\Sigma$ be an alphabet
and let $L \subseteq \Sigma^*$ be a language over this alphabet. We say that language $L$ is an irreducible
regular language if and only if it is accepted by a strongly connected automaton
(for details refer to [6]). Let $C_n(L)$, $n \in \mathbb{N}_0$, be the set of all sequences in $L$ of length $n$.
Then, the topological entropy, which estimates the cardinality of $L$ by measuring the ratio
of the number of distinct sequences in the language to the length of these sequences, is
defined as [6]:
\[
\mathit{ent}(L) = \limsup_{n \to \infty} \frac{\log |C_n(L)|}{n}. \tag{1}
\]
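Eq. (1) can be approximated numerically by counting the accepted sequences of a large fixed length. The sketch below is an illustration only and not the computation used in [20]: it assumes a deterministic automaton encoded as a hypothetical dictionary `state -> {label: next_state}`, uses base-2 logarithms (the text does not fix the base), and replaces the limit superior with the maximum over the last few lengths up to `max_len`.

```python
import math


def count_words(transitions, initial, accepting, max_len):
    """Return [|C_0(L)|, ..., |C_max_len(L)|] for a deterministic automaton."""
    paths = {initial: 1}  # number of label sequences that reach each state
    counts = []
    for _ in range(max_len + 1):
        counts.append(sum(k for state, k in paths.items() if state in accepting))
        step = {}
        for state, k in paths.items():
            for target in transitions.get(state, {}).values():
                step[target] = step.get(target, 0) + k
        paths = step
    return counts


def estimate_entropy(transitions, initial, accepting, max_len=2000, window=10):
    """Approximate Eq. (1) by max of log2|C_n(L)| / n over the last lengths."""
    counts = count_words(transitions, initial, accepting, max_len)
    tail = range(max_len - window + 1, max_len + 1)
    return max((math.log2(counts[n]) / n for n in tail if counts[n] > 0),
               default=0.0)


# All sequences over {a, b}: |C_n| = 2^n, so the estimate is 1 bit per step.
full = {0: {"a": 0, "b": 0}}
# Sequences over {a, b} without two consecutive b's (strongly connected).
no_bb = {0: {"a": 0, "b": 1}, 1: {"a": 0}}

print(estimate_entropy(full, 0, {0}))
print(estimate_entropy(no_bb, 0, {0, 1}))
```

For the second automaton, the number of sequences of length $n$ follows the Fibonacci numbers, so the estimate approaches the base-2 logarithm of the golden ratio (about 0.694) as `max_len` grows.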
The languages accepted by block-structured BPMN models are regular, because
they are also accepted by corresponding automata models [11]. But not all of them
are irreducible, so the standard topological entropy (Eq. (1)) cannot always be calculated.
To that end, in [20], it was proposed to construct an irreducible language
$(L \circ \{\langle\chi\rangle\})^* \circ L$, where $\chi \notin \Sigma$, for each language $L$, and use the so-called short-circuit
entropy $\mathit{ent}_\bullet(L) = \mathit{ent}((L \circ \{\langle\chi\rangle\})^* \circ L)$. Monotonicity of the short-circuit measure
follows immediately from the definition of the short-circuit topological entropy and
Lemma 4.7 in [20]:
Corollary 4.1 (Topological entropy). Let $L_1$ and $L_2$ be two regular languages.
1. If $L_1 = L_2$, then $\mathit{ent}_\bullet(L_1) = \mathit{ent}_\bullet(L_2)$;
2. If $L_1 \subset L_2$, then $\mathit{ent}_\bullet(L_1) < \mathit{ent}_\bullet(L_2)$.
Note that the converse does not hold in general, i.e., different languages can be represented
by the same entropy value. Although the language (trace) equivalence is stricter than
the entropy-based equivalence, in the next section, we show that entropy is still useful
for classifying the process behavior.
In this paper, we use the notion of normalized entropy. Suppose that $L$ is a language
over alphabet $\Sigma$; then the normalized entropy of $L$ is defined as
$\overline{\mathit{ent}}(L) = \mathit{ent}_\bullet(L) / \mathit{ent}_\bullet(\Sigma^*)$,
where $\Sigma^*$ is the language containing all words over alphabet $\Sigma$. The normalized entropy
value is bounded, because for any language $L$ it holds that $L \subseteq \Sigma^*$, and hence, by
Corollary 4.1, $\mathit{ent}_\bullet(L) \le \mathit{ent}_\bullet(\Sigma^*)$; consequently, $\overline{\mathit{ent}}(L) \in [0, 1]$. Obviously,
Corollary 4.1 can be formulated and applied to the normalized entropy measure, i.e., for two
languages $L_1$ and $L_2$ over alphabet $\Sigma$, if $L_1 = L_2$, then $\overline{\mathit{ent}}(L_1) = \overline{\mathit{ent}}(L_2)$, and if
$L_1 \subset L_2$, it holds that $\overline{\mathit{ent}}(L_1) < \overline{\mathit{ent}}(L_2)$.
We define the relation of behavioral equivalence $\mathcal{B}$ using the normalized entropy.
Let $\Sigma$ be an alphabet and let $L_1, L_2 \subseteq \Sigma^*$ be the languages accepted by models $m_1$ and
$m_2$, respectively; then $(m_1, m_2) \in \mathcal{B}$ if and only if $\overline{\mathit{ent}}(L_1) = \overline{\mathit{ent}}(L_2)$.
Normalized entropy not only allows us to define the notion of behavioral equivalence,
but also to formalize the notion of behavioral similarity. For a given parameter $\Delta$, we
say that two models $m_1$ and $m_2$ are behaviorally similar if and only if
$|\overline{\mathit{ent}}(L_1) - \overline{\mathit{ent}}(L_2)| < \Delta$, where $L_1$ and $L_2$ are the languages these models accept.
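Given precomputed normalized entropy values, the equivalence and similarity relations reduce to simple comparisons. The sketch below is illustrative: the function names and the dictionary of entropies are assumptions, and the values 0.185 and 1.0 are the entropies reported in Section 2 for the models in Figs. 1 to 3, taken here to be normalized.

```python
def behaviorally_equivalent(e1, e2):
    """(m1, m2) is in B iff the normalized entropies coincide."""
    return e1 == e2


def behaviorally_similar(e1, e2, delta):
    """m1 and m2 are similar iff their normalized entropies differ by < delta."""
    return abs(e1 - e2) < delta


def similar_models(entropies, name, delta):
    """Names of all other models that are behaviorally similar to `name`."""
    reference = entropies[name]
    return sorted(other for other, value in entropies.items()
                  if other != name and behaviorally_similar(reference, value, delta))


# Entropy values reported in Section 2 (assumed to be normalized).
entropies = {"fig1": 0.185, "fig2": 1.0, "fig3": 1.0}
print(similar_models(entropies, "fig2", 0.05))  # ['fig3']
print(similar_models(entropies, "fig1", 0.05))  # []
```

Exact equality of floating-point entropies is fragile in practice, so an implementation would likely round the values or rely on the $\Delta$-based similarity instead.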
4.2 Estimating Simplicity of Block-Structured BPMN Models
In this subsection, we devise a method for assessing the simplicity of uniquely labeled
block-structured BPMN models. As no analytical method for synthesizing a “minimal”
block-structured BPMN model in terms of number of nodes or control flow complexity
for a given behavior is known, and no computationally feasible approach for generating
all possible models with a given behavior exists, we propose an approach that investigates
the dependencies between the structural and behavioral model characteristics
empirically, and reuses these dependencies to measure the simplicity of models.
As the set of all models $\mathcal{M}$ cannot be exhaustively constructed, we generate its subset
$\mathcal{M}' \subset \mathcal{M}$ and relate analyzed models from $\mathcal{M}$ with behaviorally similar models
from $\mathcal{M}'$, producing an approximate solution. We then estimate the simplicity of the
given model by comparing its structural complexity with that of the simplest behaviorally
similar models, with the complexity of these models being in a certain interval
from “the best case” to “the worst case” complexity. In order to define this interval and
relate it to entropy values, we construct envelope functions $f^-$ and $f^+$ that approximate
“the best case” and “the worst case” structural complexity of the simplest process
models for a given entropy value. Below we give an approach for constructing these
envelope functions and define the simplicity measure that relates the model complexity
to an interval defined by these functions.
Fig. 8: Structural ($\mathcal{C}$) and behavioral ($E$) characteristics of process models: (a) for all
models in $\mathcal{M}'$ and (b) filtered models with upper and lower envelopes for $\Delta = 0.05$.
Let $E : \mathcal{M}' \to [0, 1]$ be a function that maps process models in $\mathcal{M}'$ onto the
corresponding normalized entropy values. Fig. 8a presents an example plot relating
structural characteristics $\mathcal{C}(m)$ and entropy $E(m)$ values for each process model
$m \in \mathcal{M}'$; the example is artificial and does not correspond to any concrete structural and
behavioral measures of process models. Once such data points are obtained, we filter
out all the models $m \in \mathcal{M}'$ such that $\exists m' \in \mathcal{M}' : E(m) = E(m') \wedge \mathcal{C}(m) > \mathcal{C}(m')$.
In other words, we filter out a model if its underlying behavior can be described by
a structurally simpler model. This means that only the structurally “simplest” models
remain. The process models that were filtered out are presented by gray dots in Fig. 8a.
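The filtering step keeps, for every entropy value, only the models of minimal structural complexity. A minimal Python sketch follows, assuming each model is summarized by a hypothetical `(entropy, complexity)` pair; the data points are artificial, as in Fig. 8a.

```python
def filter_simplest(points):
    """Drop every (entropy, complexity) point for which another point with the
    same entropy and a strictly lower complexity exists."""
    best = {}
    for entropy, complexity in points:
        if entropy not in best or complexity < best[entropy]:
            best[entropy] = complexity
    kept = [(e, c) for e, c in points if c == best[e]]
    return kept, best


# Artificial data points (entropy, complexity).
points = [(0.2, 6), (0.2, 8), (0.5, 7), (0.5, 7), (0.9, 10)]
kept, f = filter_simplest(points)
print(kept)  # [(0.2, 6), (0.5, 7), (0.5, 7), (0.9, 10)]
print(f)     # {0.2: 6, 0.5: 7, 0.9: 10}
```

Several models with the same entropy and the same minimal complexity may survive the filter, but they all map to a single data point, so the returned dictionary is a well-defined partial function from entropy to complexity.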
Once the set $\mathcal{M}''$ of the models remaining after filtering the models in $\mathcal{M}'$ is obtained,
it defines the partial function $f : [0, 1] \nrightarrow \mathbb{R}^{+}_{0}$, such that $f(e) = c$ if and only
if there exists a model $m \in \mathcal{M}''$, where $e = E(m)$ and $c = \mathcal{C}(m)$ (Fig. 8b). This function
relates the behavioral and structural characteristics of the remaining models. Then, we
construct envelope functions that define intervals of the structural complexities of the
remaining “simplest” models. The upper envelope $f^+ : [0, 1] \to \mathbb{R}^{+}_{0}$ is a function going
through the set of data points
$D^+ = \{(e, f(e)) \mid \forall e' \in \mathit{dom}(f) : (|e - e'| < \Delta \Rightarrow f(e) \ge f(e'))\}$,
where $\Delta$ is a parameter that defines classes of behaviorally similar
models. Less formally, $f^+$ goes through all the data points that are maximum in a
$\Delta$-size window. Similarly, the lower envelope is defined as a function $f^- : [0, 1] \to \mathbb{R}^{+}_{0}$
going through
$D^- = \{(e, f(e)) \mid \forall e' \in \mathit{dom}(f) : (|e - e'| < \Delta \Rightarrow f(e) \le f(e'))\}$. In
general, the envelope can be any smooth polynomial interpolation or a piecewise linear
function; the only restriction is that it goes through the corresponding $D^+$ or $D^-$ data points.
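The sets $D^+$ and $D^-$ and a piecewise linear envelope through them can be sketched as follows. The encoding of the partial function $f$ as a dictionary, the function names, and the clamping of the envelope outside the range of the data points are assumptions made for this illustration; the data are artificial.

```python
def envelope_points(f, delta):
    """Return (D+, D-): points that are maximal, respectively minimal, among
    all points whose entropy lies within a delta-size window around them."""
    upper, lower = [], []
    for e, c in sorted(f.items()):
        window = [f[other] for other in f if abs(e - other) < delta]
        if c >= max(window):
            upper.append((e, c))
        if c <= min(window):
            lower.append((e, c))
    return upper, lower


def piecewise_linear(points, e):
    """Evaluate the piecewise linear function through the sorted points at e,
    clamping to the first or last value outside the covered range."""
    if e <= points[0][0]:
        return points[0][1]
    for (e0, c0), (e1, c1) in zip(points, points[1:]):
        if e <= e1:
            return c0 + (c1 - c0) * (e - e0) / (e1 - e0)
    return points[-1][1]


# Artificial partial function f: entropy -> complexity of the simplest model.
f = {0.0: 4, 0.02: 6, 0.04: 5, 0.5: 8, 0.52: 7, 1.0: 10}
d_plus, d_minus = envelope_points(f, delta=0.05)
print(d_plus)   # [(0.02, 6), (0.5, 8), (1.0, 10)]
print(d_minus)  # [(0.0, 4), (0.52, 7), (1.0, 10)]
print(piecewise_linear(d_plus, 0.26))  # approximately 7.0
```

A point that is isolated within its window, such as the one at entropy 1.0 here, belongs to both sets, so the two envelopes meet there.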
Parameter $\Delta$ defines the measure of similarity between classes of behaviorally
equivalent models. In each case, $\Delta$ should be selected empirically. For instance, too
small a $\Delta$ will lead to local evaluations that may not be reliable because they do not take
into account global trends, while setting $\Delta$ too large results in a situation in which
entropy is not taken into account and our model is related to all other models from the set. In
Fig. 8b, the upper and lower envelopes were constructed for $\Delta = 0.05$.
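Using the upper and lower envelope functions, we can estimate the simplicity of process
models from $\mathcal{M}$. The simplicity measure is defined in Eq. (2), where $\mathit{sim}(m)$ is the
simplicity of model $m \in \mathcal{M}$ with an entropy value $e = E(m)$; $\alpha$ and $\beta$ are parameters,
such that $\alpha, \beta \in [0, 1]$ and $\alpha + \beta \le 1$.
\[
\mathit{sim}(m) =
\begin{cases}
\alpha \cdot \dfrac{f^+(e)}{\mathcal{C}(m)}, & \mathcal{C}(m) \ge f^+(e), \\[2ex]
\alpha + (1 - \alpha - \beta) \cdot \dfrac{f^+(e) - \mathcal{C}(m)}{f^+(e) - f^-(e)}, & f^-(e) \le \mathcal{C}(m) < f^+(e), \\[2ex]
1.0 - \beta \cdot \dfrac{\mathcal{C}(m)}{f^-(e)}, & \mathcal{C}(m) < f^-(e).
\end{cases}
\tag{2}
\]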
According to Eq. (2), $sim(m)$ is in the interval between zero and one. Parameters
$\alpha$ and $\beta$ are used to adjust the measure. Parameter $\alpha$ shows the level of confidence that
some complexity values can be above the upper envelope. If the complexity $\mathcal{C}(m)$ of
model $m$ is above the upper envelope $f^+(e)$, i.e., $m$ is more complex than “the worst case”
model, then $sim(m)$ is less than or equal to $\alpha$ and tends to zero as $\mathcal{C}(m)$ grows. If the
model complexity $\mathcal{C}(m)$ is between the envelopes $f^-(e)$ and $f^+(e)$, then
$sim(m) \in (\alpha, 1 - \beta]$ and the higher $\mathcal{C}(m)$ is (the closer the model is to “the worst case”), the
closer the simplicity value is to $\alpha$. Parameter $\beta$ shows the level of confidence that some
data points may be below the lower envelope. If it is guaranteed that there are no models
in $\mathcal{M}$ with data points below the lower envelope, it is feasible to set $\beta$ to zero. Otherwise,
if $\mathcal{C}(m)$ is lower than the lower envelope $f^-(e)$, then $sim(m)$ belongs to the interval
$(1 - \beta, 1]$ and tends to one as $\mathcal{C}(m)$ approaches zero.
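The three cases of Eq. (2) translate directly into code. The following Python sketch is only an illustration: the function name `simplicity` and the sample envelope values are hypothetical, and the envelope values $f^+(e)$ and $f^-(e)$ are assumed to be already evaluated at the entropy of the model. The degenerate case in which both the complexity and the upper envelope are zero is not covered by Eq. (2), so the sketch rejects it.

```python
def simplicity(c: float, f_plus: float, f_minus: float, alpha: float, beta: float) -> float:
    """Evaluate Eq. (2) for complexity c = C(m) and envelope values at e = E(m)."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0 and alpha + beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1] with alpha + beta <= 1")
    if f_minus > f_plus:
        raise ValueError("the lower envelope must not exceed the upper envelope")
    if c >= f_plus:
        # On or above the upper envelope: at most alpha, tends to zero as c grows.
        if c == 0:
            raise ValueError("Eq. (2) is undefined when C(m) and f+(e) are both zero")
        return alpha * f_plus / c
    if c >= f_minus:
        # Between the envelopes: linear interpolation between alpha and 1 - beta.
        return alpha + (1.0 - alpha - beta) * (f_plus - c) / (f_plus - f_minus)
    # Below the lower envelope: between 1 - beta and 1.
    return 1.0 - beta * c / f_minus


# Hypothetical envelope values f+(e) = 20 and f-(e) = 10, with alpha = 0.5, beta = 0.1.
for c in (25.0, 20.0, 15.0, 10.0, 5.0):
    print(c, simplicity(c, f_plus=20.0, f_minus=10.0, alpha=0.5, beta=0.1))
```

With these sample values, a model on the upper envelope scores $\alpha = 0.5$ and a model on the lower envelope scores $1 - \beta = 0.9$, which matches the interval bounds discussed above.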
Next, we apply the proposed approach to construct upper and lower envelope functions
for the number of nodes and control flow complexity measures of block-structured
BPMN models with a fixed number of uniquely labeled tasks. To analyze the relations
between the structural and behavioral characteristics, we generated all block-structured
BPMN models with three tasks. Fig. 9 contains plots with data points representing the
behavioral and structural characteristics of these process models. These data points,
as well as the upper and lower envelopes, are constructed using the general technique
described above with window parameter $\Delta = 0.01$. Additionally, to make the envelope
functions less detailed and reflect the main trends, we construct them only through
some of the data points from the $D^+$ and $D^-$ sets in such a way that the second derivative
of each envelope function is either non-negative or non-positive, i.e., envelope functions
are either convex or concave.
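One possible way to pick such a subset of data points for a concave upper envelope is to keep only the points on the upper convex hull of $D^+$. The Python sketch below uses the standard monotone chain construction for this purpose. This is an assumption made for illustration, since the text does not prescribe a particular selection procedure; the names `cross` and `concave_upper_envelope` and the sample points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def concave_upper_envelope(points: List[Point]) -> List[Point]:
    """Keep only the points that lie on the upper hull, scanned by increasing entropy.

    The piecewise linear function through the returned points is concave.
    """
    hull: List[Point] = []
    for p in sorted(points):
        # Drop the last kept point while it lies on or below the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical D+ points (entropy, complexity), not taken from the paper.
d_plus = [(0.0, 5.0), (0.2, 8.0), (0.4, 10.0), (0.5, 15.0), (0.7, 12.0), (1.0, 6.0)]
print(concave_upper_envelope(d_plus))
```

A convex lower envelope can be obtained in the same way by flipping the sign test, i.e., by dropping points that lie on or above the chord.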
In both cases, the upper envelope functions $f^+_n$ and $f^+_{cfc}$ grow monotonically, starting
at the sequential model (a sequence of three tasks) with the entropy of zero, reach the
maximum, and then drop to the “flower” model (see an example of a “flower” model
in Fig. 3) with the entropy of one. These results show that, as the number of nodes
(Fig. 9a) and complexity of the control flow (Fig. 9b) increase, the minimum possible
entropy values also increase. As the model structure becomes more complex, more
behavior is allowed and the lower bound of the entropy increases.
Fig. 9: Dependencies between entropy ($E$, horizontal axis) and structural complexity
(vertical axis) of block-structured BPMN models containing three tasks for the (a) number
of nodes ($\mathcal{C}_n$, with envelopes $f^+_n$ and $f^-_n$) and (b) control flow complexity
($\mathcal{C}_{cfc}$, with envelopes $f^+_{cfc}$ and $f^-_{cfc}$) structural complexity measures.
While the lower envelope for the number of nodes measure $f^-_n$ (Fig. 9a) is flat and
does not reveal any explicit dependency, the lower envelope for the control flow
complexity $f^-_{cfc}$ (Fig. 9b) grows. This can be explained by the fact that the number of nodes
is reduced during the local model minimization [14], while the control flow complexity
measure takes into account the number of outgoing arcs of exclusive splitting gateways
and is not affected by the local minimization.
The empirical results and observations presented in this section reveal the main de-
pendencies between the structural complexity and the behavioral complexity of block-
structured BPMN models and can be generalized for an arbitrary number of tasks. They
also show that this approach relies on the quality of the model generator, i.e., the set
of generated models should be dense enough to reveal these dependencies. In the next
section, we apply the proposed approach to assess the simplicity of process models
discovered from real world event logs.
5 Evaluation
In this section, we use the approach from Section 4 to evaluate the simplicity of process
models discovered by the Inductive miner (with noise threshold 0.2) from industrial Business
Process Intelligence Challenge (BPIC) event logs (available at
https://data.4tu.nl/repository/collection:event_logs_real) and an event log of a booking
flight system (BFS). Before the analysis, we filtered out infrequent events that appear
in fewer than 80% of traces using the “Filter Log Using Simple Heuristics” Process Mining
Framework (ProM) plug-in [7]; the filtered logs are available at
https://github.com/jbpt/codebase/tree/master/jbpt-pm/logs. The Inductive miner algorithm
discovers uniquely labeled block-structured BPMN models. As structural complexity measures,
we used the number of nodes and the control flow complexity (cfc).
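The event filtering step can be pictured with the following Python sketch. It is not the ProM plug-in and may differ from its exact semantics. It is a hypothetical re-implementation that assumes a log is a list of traces, each trace is a list of event labels, and an event “appears in a trace” if its label occurs there at least once; the function name `filter_infrequent_events` and the sample log are made up for illustration.

```python
from collections import Counter
from typing import List

Trace = List[str]


def filter_infrequent_events(log: List[Trace], min_trace_share: float) -> List[Trace]:
    """Remove events whose label occurs in fewer than min_trace_share of the traces."""
    # Count, for every label, the number of traces that contain it at least once.
    trace_counts = Counter(label for trace in log for label in set(trace))
    threshold = min_trace_share * len(log)
    frequent = {label for label, count in trace_counts.items() if count >= threshold}
    # Keep the order of the remaining events within each trace.
    return [[label for label in trace if label in frequent] for trace in log]


# Hypothetical log over labels a, b, c, d (not one of the evaluated logs).
log = [["a", "b", "c"], ["a", "b"], ["a", "c", "b"], ["b", "a"], ["a", "b", "d"]]
filtered = filter_infrequent_events(log, min_trace_share=0.8)
print(filtered)
```

In the sample log, labels a and b occur in all five traces and are kept, while c (two traces) and d (one trace) fall below the 80% share and are removed.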
To estimate the upper and lower bounds of the structural complexity of the models,
for each number of tasks (each size of the log alphabet), we generated 5,000 random
uniquely labeled block-structured BPMN models. For all the models, data points
representing the entropy and structural complexity of the models were constructed. We
then constructed the upper and lower envelopes using the window of size 0.01 (refer to
Section 4 for the details). The envelopes were constructed as piecewise linear functions
going through all the selected data points.
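A piecewise linear envelope through a set of selected data points can be evaluated at an arbitrary entropy value by linear interpolation between the two neighboring points. The Python sketch below shows one way to do this with the standard `bisect` module; the function name `piecewise_linear` and the sample points are hypothetical, and the sketch assumes that the envelope is only queried within the entropy range covered by the selected points.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def piecewise_linear(points: List[Point]) -> Callable[[float], float]:
    """Build a piecewise linear function going through the given (entropy, complexity) points."""
    pts = sorted(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]

    def envelope(e: float) -> float:
        if not xs[0] <= e <= xs[-1]:
            raise ValueError("entropy value outside the range covered by the data points")
        i = bisect_right(xs, e)  # index of the first point strictly to the right of e
        if i == len(xs):
            return ys[-1]  # e coincides with the right-most point
        x0, y0, x1, y1 = xs[i - 1], ys[i - 1], xs[i], ys[i]
        return y0 + (y1 - y0) * (e - x0) / (x1 - x0)

    return envelope


# Hypothetical selected data points, not taken from the paper.
f_plus = piecewise_linear([(0.0, 5.0), (0.5, 15.0), (1.0, 6.0)])
print(f_plus(0.25), f_plus(0.75))
```

The returned function passes through every selected data point, which is the only restriction that Section 4 places on an envelope.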
For both structural complexity measures, parameter $\alpha$, cf. Eq. (2), was set to 0.5,
i.e., models represented by the data points above the upper envelope are presumed to
have a simplicity value lower than 0.5. In turn, parameter $\beta$ was set to 0.0 and
0.1 for the number of nodes and cfc measures, respectively. In contrast to cfc, the lower
envelope for the number of nodes measure is defined as the minimal possible number of
nodes in any model, and this guarantees that there are no data points below it. The final
equations for the number of nodes and cfc simplicity measures are given below, where
$e = E(m)$ is the topological entropy of model $m$, $\mathcal{C}_n(m)$ is the number of nodes in $m$,
$\mathcal{C}_{cfc}(m)$ is the cfc complexity of $m$, $f^+_n$ and $f^+_{cfc}$ are the upper envelopes for the
number of nodes and cfc measures, and $f^-_n$ and $f^-_{cfc}$ are the corresponding lower envelopes.
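\[
sim_n(m) =
\begin{cases}
0.5 \cdot \dfrac{f^+_n(e)}{\mathcal{C}_n(m)}, & \mathcal{C}_n(m) \geq f^+_n(e), \\[2ex]
0.5 + 0.5 \cdot \dfrac{f^+_n(e) - \mathcal{C}_n(m)}{f^+_n(e) - f^-_n(e)}, & f^-_n(e) \leq \mathcal{C}_n(m) < f^+_n(e), \\[2ex]
1.0, & \mathcal{C}_n(m) < f^-_n(e).
\end{cases}
\tag{3}
\]
\[
sim_{cfc}(m) =
\begin{cases}
0.5 \cdot \dfrac{f^+_{cfc}(e)}{\mathcal{C}_{cfc}(m)}, & \mathcal{C}_{cfc}(m) \geq f^+_{cfc}(e), \\[2ex]
0.5 + 0.4 \cdot \dfrac{f^+_{cfc}(e) - \mathcal{C}_{cfc}(m)}{f^+_{cfc}(e) - f^-_{cfc}(e)}, & f^-_{cfc}(e) \leq \mathcal{C}_{cfc}(m) < f^+_{cfc}(e), \\[2ex]
1.0 - 0.1 \cdot \dfrac{\mathcal{C}_{cfc}(m)}{f^-_{cfc}(e)}, & \mathcal{C}_{cfc}(m) < f^-_{cfc}(e).
\end{cases}
\tag{4}
\]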
Table 1 presents the original and adjusted (proposed in this paper) simplicity measures,
induced by the number of nodes and cfc structural complexity measures, for the
process models discovered from the evaluated event logs. Models were discovered from
the filtered event logs and their sublogs that contain only traces appearing in the filtered
event logs at least two or four times. Model $m_8$ (refer to Fig. 11a) is an automatically
discovered process model with redundant nodes. The value of $sim_n(m_8)$ is less than 0.5
because the corresponding data point is above the upper envelope (Fig. 10a). Note that
the manually constructed “flower” model $m'_8$ (Fig. 11b) that accepts the same traces as
$m_8$ has better structural complexity and, consequently, the corresponding adjusted
simplicity measurements relate as follows: $sim_n(m'_8) = 1.0 > sim_n(m_8) = 0.485$, and
$sim_{cfc}(m'_8) = 0.890 > sim_{cfc}(m_8) = 0.723$. The difference between the $sim_{cfc}(m_8)$ and
$sim_{cfc}(m'_8)$ simplicity values is not as significant as the difference between $sim_n(m_8)$
and $sim_n(m'_8)$ because, although $m'_8$ has only two gateways, the total number of outgoing
sequence flows from these gateways is rather high.
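As a small consistency check, the reported values for $m_8$ can be used to back out the value of the upper envelope at its entropy. Since the data point of $m_8$ lies above the upper envelope, the first case of the number of nodes simplicity measure applies, so $sim_n(m_8) = 0.5 \cdot f^+_n(e) / \mathcal{C}_n(m_8)$. The Python sketch below inverts this relation using the values $\mathcal{C}_n(m_8) = 20$ and $sim_n(m_8) = 0.485$ from Table 1. The result is approximate because the reported simplicity is rounded to three decimals, and the variable names are hypothetical.

```python
alpha = 0.5        # parameter alpha used in the evaluation
c_n_m8 = 20        # number of nodes of model m8 (Table 1)
sim_n_m8 = 0.485   # reported simplicity of m8 (Table 1, rounded)

# First case of the measure: sim = alpha * f_plus / c, hence f_plus = sim * c / alpha.
f_plus_at_m8 = sim_n_m8 * c_n_m8 / alpha
print(round(f_plus_at_m8, 1))

# The first case only applies if the complexity is on or above the envelope.
print(c_n_m8 >= f_plus_at_m8)
```

Under these assumptions the upper envelope at the entropy of $m_8$ is roughly 19.4 nodes, slightly below the 20 nodes of $m_8$, which is consistent with the data point lying just above the envelope.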
Model  Event log   #Traces  #Labels  Entropy  C_n  sim_n  C_cfc  sim_cfc
m1     BPIC’2019     3,365        6    0.484   12  0.923      5    0.767
m2     BPIC’2019       614        6    0.333   12  0.887      5    0.697
m3     BPIC’2019       302        6    0.377   12  0.901      6    0.714
m4     BPIC’2018    15,536        8    0.800   25  0.684     20    0.690
m5     BPIC’2018     1,570        7    0.432   16  0.802     10    0.680
m6     BPIC’2018       618        7    0.638   18  0.813     15    0.676
m7     BFS             279        6    0.378   14  0.754      5    0.723
m8     BFS              70        6    0.847   20  0.485     13    0.723
m9     BFS              29        6    0.258   15  0.630      7    0.516
Table 1: Simplicity of uniquely labeled block-structured BPMN models discovered from
industrial event logs and their sublogs; the number of unique traces (#Traces), number of distinct
labels in the discovered models (#Labels) and their entropy values (Entropy) are specified.
Models $m_1$, $m_2$, and $m_3$ have the same number of nodes, but different entropy values.
Model $m_1$ is considered the simplest in terms of the number of nodes among the
three models because it is located further from the upper envelope than the other two
models. Hence, its $sim_n$ value is the highest. Models $m_1$ and $m_2$, which in addition
have the same control flow complexity values, are shown in Fig. 12a and Fig. 12b,
respectively. Model $m_1$ is considered simpler than model $m_2$ because it merely runs
all the tasks in parallel (allowing the “Record Service Entry Sheet” task to be skipped or
executed several times). In contrast, model $m_2$ adds additional constraints on the order
of tasks (leading to lower entropy) and, thus, should be easier to test. Note that $m_1$
models a more diverse behavior which, in the worst case, according to the upper envelope,
can be modeled with more nodes. These results demonstrate that when analyzing
the simplicity of a discovered process model, it is feasible and beneficial to consider
both the structural complexity of the model’s diagrammatic representation
and the variability/complexity of the behavior the model describes. This way, one
can adhere to the Occam’s Razor problem-solving principle.
Fig. 10: Structural complexity of block-structured BPMN models over six labels, plotted
against entropy $E(m)$ with data points for $m_1$, $m_2$, $m_3$, $m_7$, $m_8$, $m'_8$, and $m_9$:
(a) number of nodes $\mathcal{C}_n(m)$ with envelopes $f^+_n$ and $f^-_n$; (b) control flow complexity
(cfc) $\mathcal{C}_{cfc}(m)$ with envelopes $f^+_{cfc}$ and $f^-_{cfc}$.
Fig. 11: Models $m_8$ and $m'_8$ discovered from the BFS event log: (a) model $m_8$; (b) model $m'_8$.
Both diagrams are over the tasks surname-fill, name-fill, birthday-fill, docnum-fill,
docexpire-fill, and c_phone_num-fill.
Fig. 12: Models $m_1$ and $m_2$ discovered from the BPIC’2019 event log: (a) model $m_1$; (b) model $m_2$.
Both diagrams are over the tasks Create Purchase Order Item, Record Goods Receipt, Record
Invoice Receipt, Record Service Entry Sheet, Vendor creates invoice, and Clear Invoice.
6 Conclusion
This paper presents a framework that can be configured to result in a concrete approach
for measuring the simplicity of process models discovered from event logs. In con-
trast to the existing simplicity measures, our framework accounts for both a model’s
structure and behavior. In this paper, the framework was implemented for the class of
uniquely-labeled block-structured BPMN models using topological entropy as a mea-
sure of process model behavior. The experimental evaluation of process models dis-
covered from real-life event logs shows the approach’s ability to evaluate the quality
of discovered process models by relating their structural complexity to the structural
complexity of other process models that describe similar behaviors. Such analysis can
complement existing simplicity measurement techniques showing the relative aspects
of the structural complexity of the model.
We identify several research directions arising from this work. First, we acknowl-
edge that the proposed instantiation of the framework is approximate and depends on
the quality of the randomly generated models. The analysis of other structural com-
plexity measures as well as more sophisticated random model generation algorithms
can lead to a more precise approach. Second, the framework described in this paper
can be instantiated with other classes of process models to extend its applicability to
models discovered by a broader range of process discovery algorithms. Finally, we be-
lieve that this work can give valuable insights into the improvement of existing, and the
development of new process discovery algorithms.
Acknowledgments. This work was supported by the Australian Research Council Dis-
covery Project DP180102839. We sincerely thank the anonymous reviewers whose sug-
gestions helped us to improve this paper.
References
1. van der Aalst, W.: Data Science in Action, pp. 3–23. Springer Berlin Heidelberg (2016)
2. Augusto, A., Conforti, R., Dumas, M., La Rosa, M., Polyvyanyy, A.: Split miner: automated
discovery of accurate and simple business process models from event logs. Knowl. Inf. Syst.
59(2), 251–284 (2019). https://doi.org/10.1007/s10115-018-1214-x
3. Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: On the role of fitness, precision,
generalization and simplicity in process discovery. In: On the Move to Meaningful Internet
Systems: OTM 2012. pp. 305–322. Springer, Berlin (2012)
4. Cardoso, J.: How to Measure the Control-flow Complexity of Web processes and Workflows,
pp. 199–212 (01 2005)
5. Carmona, J., Cortadella, J., Kishinevsky, M.: A region-based algorithm for discovering petri
nets from event logs. In: Dumas, M., Reichert, M., Shan, M.C. (eds.) Business Process Man-
agement. pp. 358–373. Springer Berlin Heidelberg, Berlin, Heidelberg (2008)
6. Ceccherini-Silberstein, T., Machì, A., Scarabotti, F.: On the entropy of regular languages.
Theor. Comp. Sci. 307, 93–102 (2003)
7. van Dongen, B.F., de Medeiros, A.K.A., Verbeek, H.M.W., Weijters, A.J.M.M., van der
Aalst, W.M.P.: The ProM Framework: A New Era in Process Mining Tool Support. In: Ap-
plications and Theory of Petri Nets 2005. pp. 444–454. Springer Berlin Heidelberg (2005)
8. Garrett, A.J.M.: Ockham’s Razor, pp. 357–364. Springer Netherlands (1991)
9. Gruhn, V., Laue, R.: Complexity metrics for business process models. In: 9th International
Conference on Business Information Systems (BIS 2006). pp. 1–12 (2006)
10. Grünwald, P.D.: The Minimum Description Length Principle (Adaptive Computation and
Machine Learning). The MIT Press (2007)
11. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Languages, and
Computation (3rd Edition). Addison-Wesley Longman Publishing Co., Inc., USA (2006)
12. Jančar, P., Kučera, A., Mayr, R.: Deciding bisimulation-like equivalences with finite-state
processes. In: Automata, Languages and Programming. pp. 200–211. Springer (1998)
13. Kalenkova, A.A., Ageev, A.A., Lomazova, I.A., van der Aalst, W.M.P.: E-government ser-
vices: Comparing real and expected user behavior. In: Business Process Management Work-
shops. pp. 484–496. Springer, Cham (2018)
14. Kalenkova, A., Aalst, W., Lomazova, I., Rubin, V.: Process mining using BPMN: Relating
event logs and process models. Software and Systems Modeling 16, 1019–1048 (01 2017)
15. Kluza, K., Nalepa, G.J., Lisiecki, J.: Square Complexity Metrics for Business Process Mod-
els, pp. 89–107. Springer International Publishing, Cham (2014)
16. Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured process
models from event logs - a constructive approach. In: Application and Theory of Petri Nets
and Concurrency. pp. 311–329. Springer Berlin Heidelberg (2013)
17. Lieben, J., Jouck, T., Depaire, B., Jans, M.: An improved way for measuring simplicity dur-
ing process discovery. In: EOMAS. pp. 49–62. Springer (2018)
18. OMG: Business Process Model and Notation (BPMN), Version 2.0.2 (Dec 2013), http://
www.omg.org/spec/BPMN/2.0.2
19. Polyvyanyy, A.: Structuring Process Models. Ph.D. thesis, University of Potsdam (2012),
http://opus.kobv.de/ubp/volltexte/2012/5902/
20. Polyvyanyy, A., Solti, A., Weidlich, M., Ciccio, C.D., Mendling, J.: Monotone precision
and recall measures for comparing executions and specifications of dynamic systems. ACM
Trans. Softw. Eng. Methodol. 29(3) (Jun 2020). https://doi.org/10.1145/3387909
21. Reisig, W.: Understanding Petri Nets: Modeling Techniques, Analysis Methods, Case Stud-
ies. Springer Publishing Company, Incorporated (2013)
22. Wynn, M.T., Verbeek, H.M.W., van der Aalst, W.M.P., ter Hofstede, A.H.M., Edmond, D.:
Reduction rules for yawl workflows with cancellation regions and or-joins. Inf. Softw. Tech-
nol. 51(6), 1010–1020 (Jun 2009)
... Unlike enhancement techniques, estimators can potentially change control flows when producing a stochastic process model. Stochastic process models have a corresponding, emerging, set of stochastic process conformance measures [20,21,16]. Consequently, the algorithms and models presented here are evaluated, in Section 4, as stochastic process discovery algorithms, using stochastic process conformance measures. ...
... (3) Petri net entity count (places and transitions) and (4) edge count are used as structural simplicity measures, ensuring that conformance quality has not been achieved by sacrificing model simplicity and comprehensibility. Entity and arc counts have existing uses in process model evaluation [14,17], and were preferred here over behavioural simplicity measures [16], though these measures also have limitations, including specificity to Petri nets, and insensitivity to the stochastic perspective of GSPNs. ...
Conference Paper
Full-text available
Many algorithms now exist for discovering process models from event logs. These models usually describe a control flow and are intended for use by people in analysing and improving real-world organizational processes. The relative likelihood of choices made while following a process (i.e., its stochastic behaviour) is highly relevant information which few existing algorithms make available in their automatically discovered models. This can be addressed by automatically discovered stochastic process models. We introduce a framework for automatic discovery of stochastic process models, given a control-flow model and an event log. The framework introduces an estimator which takes a Petri net model and an event log as input, and outputs a Generalized Stochastic Petri net. We apply the framework, adding six new weight estimators, and a method for their evaluation. The algorithms have been implemented in the open-source process mining framework ProM. Using stochastic conformance measures, the resulting models have comparable conformance to existing approaches and are shown to be calculated more efficiently.
... Commonly, discovery results are evaluated by means of model-centric metrics like fitness, precision, generalization, and simplicity [9,15], which are e.g., computed via conformance checking [12,30] with the log that served as input to the discovery algorithm. Those metrics are valuable for assessing the reliability of discovery algorithms, and we want to complement them by expanding the evaluation perspective, as shown in Figure 1. ...
Chapter
Event logs have become a valuable information source for business process management, e.g., when analysts discover process models to inspect the process behavior and to infer actionable insights. To this end, analysts configure discovery pipelines in which logs are filtered, enriched, abstracted, and process models are derived. While pipeline operations are necessary to manage log imperfections and complexity, they might, however, influence the nature of the discovered process model and its properties. Ultimately, not considering this possibility can negatively affect downstream decision making. We hence propose a framework for assessing the consistency of model properties with respect to the pipeline operations and their parameters, and, if inconsistencies are present, for revealing which parameters contribute to them. Following recent literature on software engineering for machine learning, we refer to it as debugging. From evaluating our framework in a real-world analysis scenario based on complex event logs and third-party pipeline configurations, we see strong evidence towards it being a valuable addition to the process mining toolbox.
... Commonly, discovery results are evaluated by means of model-centric metrics like fitness, precision, generalization, and simplicity [9,15], which are e.g., computed via conformance checking [12,30] with the log that served as input to the discovery algorithm. Those metrics are valuable for assessing the reliability of discovery algorithms, and we want to complement them by expanding the evaluation perspective, as shown in Figure 1. ...
Preprint
Full-text available
Event logs have become a valuable information source for business process management, e.g., when analysts discover process models to inspect the process behavior and to infer actionable insights. To this end, analysts configure discovery pipelines in which logs are filtered, enriched, abstracted, and process models are derived. While pipeline operations are necessary to manage log imperfections and complexity, they might, however, influence the nature of the discovered process model and its properties. Ultimately, not considering this possibility can negatively affect downstream decision making. We hence propose a framework for assessing the consistency of model properties with respect to the pipeline operations and their parameters, and, if inconsistencies are present, for revealing which parameters contribute to them. Following recent literature on software engineering for machine learning, we refer to it as debugging. From evaluating our framework in a real-world analysis scenario based on complex event logs and third-party pipeline configurations, we see strong evidence towards it being a valuable addition to the process mining toolbox.
... Commonly, discovery results are evaluated by means of model-centric metrics like fitness, precision, generalization, and simplicity [9,15], which are e.g., computed via conformance checking [12,30] with the log that served as input to the discovery algorithm. Those metrics are valuable for assessing the reliability of discovery algorithms, and we want to complement them by expanding the evaluation perspective, as shown in Figure 1. ...
Conference Paper
Full-text available
Event logs have become a valuable information source for business process management, e.g., when analysts discover process models to inspect the process behavior and to infer actionable insights. To this end, analysts configure discovery pipelines in which logs are filtered, enriched, abstracted, and process models are derived. While pipeline operations are necessary to manage log imperfections and complexity, they might, however, influence the nature of the discovered process model and its properties. Ultimately, not considering this possibility can negatively affect downstream decision making. We hence propose a framework for assessing the consistency of model properties with respect to the pipeline operations and their parameters, and, if inconsistencies are present, for revealing which parameters contribute to them. Following recent literature on software engineering for machine learning, we refer to it as debugging. From evaluating our framework in a real-world analysis scenario based on complex event logs and third-party pipeline configurations, we see strong evidence towards it being a valuable addition to the process mining toolbox.
... Commonly, discovery results are evaluated by means of model-centric metrics like fitness, precision, generalization, and simplicity [9,15], which are e.g., computed via conformance checking [12,30] with the log that served as input to the discovery algorithm. Those metrics are valuable for assessing the reliability of discovery algorithms, and we want to complement them by expanding the evaluation perspective, as shown in Figure 1. ...
Conference Paper
Event logs have become a valuable information source for business process management, e.g., when analysts discover process models to inspect the process behavior and to infer actionable insights. To this end, analysts configure discovery pipelines in which logs are filtered, enriched, abstracted, and process models are derived. While pipeline operations are necessary to manage log imperfections and complexity, they might, however, influence the nature of the discovered process model and its properties. Ultimately, not considering this possibility can negatively affect downstream decision making. We hence propose a framework for assessing the consistency of model properties with respect to the pipeline operations and their parameters, and, if inconsistencies are present, for revealing which parameters contribute to them. Following recent literature on software engineering for machine learning, we refer to it as debugging. From evaluating our framework in a real-world analysis scenario based on complex event logs and third-party pipeline configurations, we see strong evidence towards it being a valuable addition to the process mining toolbox.
Chapter
A processProcess is a collection of actions that were already, are currently being, or must be taken in order to achieve a goal, where an actionAction is an atomic unit of work, for instance, a business activity or an instruction of a computer program. A process repositoryProcessrepository is an organized collection of models that describe processes, for example, a business process repository and a software repository. Process repositories without facilities for process queryingProcessquerying and process manipulationProcessmanipulation are like databases without Structured Query LanguageStructured Query Language (SQL)SQL Structured Query Language, that is, collections of elements without effective means for deriving value from them. Process Query Language (PQL) is a domain-specific programming language for managing processes described in models stored in process repositories. Process Query Language (PQL)PQL Process Query Language PQL can be used to query and manipulate process models based on possibly infinite collections of processes that they represent, including processes that support concurrent execution of actions. This chapter presents PQL, its current features, publicly available implementation, planned design and implementation activities, and open research problems associated with the design of the language.
Chapter
Due to growing digital opportunities, persistent legislative pressure, and recent challenges in the wake of the COVID-19 pandemic, public universities need to engage in digital innovation (DI). While society expects universities to lead DI efforts, the successful development and implementation of DIs, particularly in administration and management contexts, remains a challenge. In addition, research lacks knowledge on the DI process at public universities, while further understanding and guidance are needed. Against this backdrop, our study aims to enhance the understanding of the DI process at public universities by providing a structured overview of corresponding drivers and barriers through an exploratory single case study. We investigate the case of a German public university and draw from primary and secondary data of its DI process from the development of three specific digital process innovations. Building upon Business Process Management (BPM) as a theoretical lens to study the DI process, we present 13 drivers and 17 barriers structured along the DI actions and BPM core elements. We discuss corresponding findings and provide related practice recommendations for public universities that aim to engage in DI. In sum, our study contributes to the explanatory knowledge at the convergent interface between DI and BPM in the context of public universities.
Article
Full-text available
The behavioural comparison of systems is an important concern of software engineering research. For example, the areas of specification discovery and specification mining are concerned with measuring the consistency between a collection of execution traces and a program specification. This problem is also tackled in process mining with the help of measures that describe the quality of a process specification automatically discovered from execution logs. Though various measures have been proposed, it was recently demonstrated that they neither fulfil essential properties, such as monotonicity, nor can they handle infinite behaviour. In this article, we address this research problem by introducing a new framework for the definition of behavioural quotients. We prove that corresponding quotients guarantee desired properties that existing measures have failed to support. We demonstrate the application of the quotients for capturing precision and recall measures between a collection of recorded executions and a system specification. We use a prototypical implementation of these measures to contrast their monotonic assessment with measures that have been defined in prior research.
Article
Full-text available
The problem of automated discovery of process models from event logs has been intensively researched in the past two decades. Despite a rich field of proposals, state-of-the-art automated process discovery methods suffer from two recurrent deficiencies when applied to real-life logs: (i) they produce large and spaghetti-like models; and (ii) they produce models that either poorly fit the event log (low fitness) or over-generalize it (low precision). Striking a trade-off between these quality dimensions in a robust and scalable manner has proved elusive. This paper presents an automated process discovery method, namely Split Miner, which produces simple process models with low branching complexity and consistently high and balanced fitness and precision, while achieving considerably faster execution times than state-of-the-art methods, measured on a benchmark covering twelve real-life event logs. Split Miner combines a novel approach to filter the directly-follows graph induced by an event log, with an approach to identify combinations of split gateways that accurately capture the concurrency, conflict and causal relations between neighbors in the directly-follows graph. Split Miner is also the first automated process discovery method that is guaranteed to produce deadlock-free process models with concurrency, while not being restricted to producing block-structured process models.
Chapter
Full-text available
E-government web services are becoming increasingly popular among citizens of various countries. Usually, to receive a service, the user has to perform a sequence of steps. This sequence of steps forms a service rendering process. Using process mining techniques this process can be discovered from the information system’s event logs. A discovered process model of a real user behavior can assist in the analysis of service usability. Thus, for popular and well-designed services this process model will coincide with a reference process model of the expected user behavior. While for other services the observed real behavior and the modeled expected behavior can differ significantly. The main aim of this work is to suggest an approach for the comparison of process models and evaluate its applicability when applied to real-life e-government services.
Thesis
Full-text available
One can fairly adopt the ideas of Donald E. Knuth to conclude that process modeling is both a science and an art. Process modeling does have an aesthetic sense. Similar to composing an opera or writing a novel, process modeling is carried out by humans who undergo creative practices when engineering a process model. Therefore, the very same process can be modeled in a myriad of ways. Once modeled, processes can be analyzed by employing scientific methods. Usually, process models are formalized as directed graphs, with nodes representing tasks and decisions, and directed arcs describing temporal constraints between the nodes. Common process definition languages, such as Business Process Model and Notation (BPMN) and Event-driven Process Chain (EPC), allow process analysts to define models with arbitrarily complex topologies. The absence of structural constraints supports creativity and productivity, as there is no need to force ideas into a limited number of available structural patterns. Nevertheless, it is often preferable that models follow certain structural rules. A well-known structural property of process models is (well-)structuredness. A process model is (well-)structured if and only if every node with multiple outgoing arcs (a split) has a corresponding node with multiple incoming arcs (a join), and vice versa, such that the set of nodes between the split and the join induces a single-entry-single-exit (SESE) region; otherwise the process model is unstructured. The motivations for well-structured process models are manifold: (i) Well-structured process models are easier to lay out for visual representation as their formalizations are planar graphs. (ii) Well-structured process models are easier to comprehend by humans. (iii) Well-structured process models tend to have fewer errors than unstructured ones, and it is less probable to introduce new errors when modifying a well-structured process model. (iv) Well-structured process models are better suited for analysis, as many existing formal techniques are applicable only to well-structured process models. (v) Well-structured process models are better suited for efficient execution and optimization, e.g., when discovering independent regions of a process model that can be executed concurrently. Consequently, there are process modeling languages that encourage well-structured modeling, e.g., Business Process Execution Language (BPEL) and ADEPT. However, well-structured process modeling implies some limitations: (i) There exist processes that cannot be formalized as well-structured process models. (ii) There exist processes that, when formalized as well-structured process models, require a considerable duplication of modeling constructs. Rather than expecting well-structured modeling from the start, we advocate for the absence of structural constraints when modeling. Afterwards, automated methods can suggest, upon request and whenever possible, alternative formalizations that are "better" structured, preferably well-structured. In this thesis, we study the problem of automatically transforming process models into equivalent well-structured models. The developed transformations are performed under a strong notion of behavioral equivalence which preserves concurrency. The findings are implemented in a tool, which is publicly available.
Article
Complexity metrics for Business Process (BP) models are used to better understand and control the quality of the models, and thus to improve it. In this paper, we give an overview of the existing metrics for describing various aspects of BP models. We argue that the design process of BP models can be improved by the availability of metrics that are transparent and easy for designers to interpret. Therefore, we propose simple yet practical square metrics for describing the complexity of a BP model based on the Durfee and Perfect square concept. These metrics are easy to interpret and provide basic information about the structural complexity of the model. The proposed metrics are to be used with models built with Business Process Model and Notation (BPMN), which is currently the most widespread language used for BP modeling. Moreover, we present a set of BPMN models analyzed with our metrics. Finally, we introduce a tool implementing the discussed metrics. We compare the results to other important metrics, emphasizing the qualities of our approach.
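The abstract above does not spell out how the square metrics are computed. As a rough illustration only, the sketch below assumes they are defined by analogy with the h-index and g-index over the number of occurrences of each element type in a model: the Durfee-style value is the largest d such that d element types each occur at least d times, and the Perfect-square-style value is the largest p such that the p most frequent element types occur at least p² times in total. The function names and the sample counts are hypothetical and are not taken from the cited paper or its tool.

```python
def durfee_square(counts):
    """Largest d such that d element types each occur at least d times.

    Assumed h-index-style reading of the 'Durfee square' idea.
    """
    ranked = sorted(counts, reverse=True)
    d = 0
    for rank, occurrences in enumerate(ranked, start=1):
        if occurrences >= rank:
            d = rank
        else:
            break
    return d


def perfect_square(counts):
    """Largest p such that the p most frequent types occur >= p*p times in total.

    Assumed g-index-style reading of the 'Perfect square' idea.
    """
    ranked = sorted(counts, reverse=True)
    total = 0
    p = 0
    for rank, occurrences in enumerate(ranked, start=1):
        total += occurrences
        if total >= rank * rank:
            p = rank
        else:
            break
    return p


# Hypothetical occurrence counts per element type in one BPMN model.
element_counts = {
    "sequenceFlow": 9,
    "task": 6,
    "exclusiveGateway": 3,
    "endEvent": 2,
    "startEvent": 1,
}

dsm = durfee_square(element_counts.values())
psm = perfect_square(element_counts.values())
print(f"Durfee-style metric: {dsm}, Perfect-square-style metric: {psm}")
```

Both functions depend only on the multiset of counts, so two models with very different topologies can receive the same score; under this reading the metrics summarize the variety and volume of element types rather than control-flow structure.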
Article
Process-aware information systems (PAIS) are systems relying on processes, which involve human and software resources to achieve concrete goals. There is a need to develop approaches for modeling, analyzing, improving, and monitoring processes within PAIS. These approaches include process mining techniques used to discover process models from event logs, find log and model deviations, and analyze performance characteristics of processes. The representational bias (a way to model processes) plays an important role in process mining. The BPMN 2.0 (Business Process Model and Notation) standard is widely used and allows building conventional and understandable process models. In addition to the flat control flow perspective, subprocesses, data flows, and resources can be integrated within one BPMN diagram. This makes BPMN very attractive for both process miners and business users. In this paper, we describe and justify robust control flow conversion algorithms, which provide the basis for more advanced BPMN-based discovery and conformance checking algorithms. We believe that the results presented in this paper can be used for a wide variety of BPMN mining and conformance checking algorithms. We also provide metrics for the processes discovered before and after the conversion to BPMN structures. Cases for which conversion algorithms produce more compact or more involved BPMN models in comparison with the initial models are identified.
Chapter
In the domain of process discovery, there are four quality dimensions for evaluating process models, of which simplicity is one. Simplicity is often measured using the size of a process model, its structuredness, and its entropy. It is closely related to process model understandability. Researchers from the domain of business process management (BPM) have proposed several metrics for measuring process model understandability. Some of these understandability metrics focus on the control-flow perspective, which is important for evaluating models from process discovery algorithms. It is remarkable that more of these metrics are defined in the BPM literature than there are proposed simplicity metrics. To investigate whether the understandability metrics capture more understandability dimensions than the simplicity metrics, an exploratory factor analysis was conducted on 18 understandability metrics. A sample of 4450 BPMN models, both manually modelled and artificially generated, was used. Four dimensions were discovered: token behaviour complexity, node IO complexity, path complexity and degree of connectedness. The conclusion of this analysis is that process analysts should be aware that the measurement of simplicity does not capture all dimensions of the understandability of process models.
Chapter
In recent years, data science emerged as a new and important discipline. It can be viewed as an amalgamation of classical disciplines like statistics, data mining, databases, and distributed systems. Existing approaches need to be combined to turn abundantly available data into value for individuals, organizations, and society. Moreover, new challenges have emerged, not just in terms of size (“Big Data”) but also in terms of the questions to be answered. This book focuses on the analysis of behavior based on event data. Process mining techniques use event data to discover processes, check compliance, analyze bottlenecks, compare process variants, and suggest improvements. In later chapters, we will show that process mining provides powerful tools for today’s data scientist. However, before introducing the main topic of the book, we provide an overview of the data science discipline.