All content in this area was uploaded by Luise Pufahl on Jan 11, 2024
Discovery of Workflow Patterns - A Comparison
of Process Discovery Algorithms
Kerstin Andree1, Mai Hoang2, Felix Dannenberg2, Ingo Weber1, and Luise
Pufahl1[0000−0002−5182−2587]
1Technische Universität München, Heilbronn, Germany
firstname.lastname@tum.de
2Technische Universität Berlin, Berlin, Germany
mai.hoang@hotmail.de, felixd@win.tu-berlin.de
Abstract. Process mining provides a set of techniques and algorithms
to analyze, support, and improve business processes based on process
execution data. Process discovery aims at deducing a representative pro-
cess model of real-world execution. So far, process discovery algorithms
have been mainly compared regarding their output quality but not yet
with regard to their functional capabilities. The well-established workflow
control flow patterns describe process behavior imperatively. They were originally
used to compare modeling languages but, to date, not to compare discovery
algorithms. In this work, we analyze a representative set of process
discovery algorithms with regard to their coverage of 23 control flow
patterns. For this purpose, we implemented each workflow pattern as
an executable colored Petri net, simulated it, and ran various discovery
algorithms on the obtained event log. A comparison of the results shows
that the discovery algorithms mainly cover basic control flow patterns
and iterative structures, while multi-instance, state-based, and cancellation
patterns are only partially covered.
Keywords: Process Mining ·Discovery Algorithms ·Workflow Pat-
terns.
1 Introduction
Process mining is a data-driven approach that aims to extract insights and
knowledge from event logs, enabling organizations to understand, analyze, and
improve their business processes [26, Ch.2]. One of its main operations is pro-
cess discovery [26, Ch.6], which offers insights into the real-world execution of
business processes, e.g., the analysis of how students perform their studies [17].
Thus, several process discovery algorithms [7] have been introduced to automat-
ically identify and create process models based on process execution data in the
form of event logs. They provide a representation of how business processes are
executed within an organization that can be used for analysis, simulation, and
compliance checking [26, Ch.6].
So far, process discovery techniques have been compared regarding their out-
put quality, including accuracy, complexity, and soundness of the resulting pro-
cess models [7,10]. However, a structured comparison regarding their functional
capabilities, i.e., logic and semantics of business process models, is missing. Orig-
inally, the workflow patterns [4] were developed as a conceptual basis to assess
the relative strengths and weaknesses of process modeling approaches and busi-
ness process execution tools in five dimensions: control flow, data, resources,
exception handling, and imperfection. Based on the control flow patterns [22],
we set up an experimental study to compare and analyze the functional capa-
bilities of a set of process discovery algorithms. This best reflects the focus of
process discovery, namely to discover activities and their control flow relations.
For each pattern, we modeled one colored Petri Net [18] in CPN Tools [1]. Sim-
ulating the model generates an event log, which serves as an input for the set of
selected process discovery algorithms. The algorithms’ output is then compared
with the event log and original workflow model. With this study, we make the
following contributions:
– An experimental comparison of discovery algorithms with regard to their
functional capabilities;
– 23 implemented workflow patterns and their event logs;
– The state-of-the-art coverage of workflow patterns; and
– A discussion of future research options and improvement avenues for
discovery algorithms.
In the remainder of this paper, we will first introduce related work in Sec-
tion 2. The methodology of the coverage analysis is explained in Section 3 and its
results are presented in Section 4. Section 5 discusses the results in the context
of the current approaches for discovery algorithm improvement and addresses
the limitations of this work. Finally, Section 6 concludes.
2 Related Work
The comparison of discovery algorithms is essential as it supports selecting and
evaluating discovery techniques to gain valuable insights into process data. Thus,
literature already compares different process discovery algorithms [13,7]. Au-
gusto et al. [7] present a benchmark of 35 existing process discovery algorithms.
They focus on the quality attributes generalization, precision,
complexity, replay fitness, and soundness. They also compare basic semantic constructs,
such as XOR, OR, AND, and loops. In contrast, van Dongen et al. [13] provide a
comparison based on completeness, support of control-flow constructs,
abstraction, and tendency to over- and underfitting. However, neither approach
considers all control-flow patterns.
In 1998, Agrawal et al. [5] presented one of the first approaches and ideas to
mine process models from workflow logs. Their research highlights the impor-
tance of applying process mining techniques in the context of workflow patterns
in order to get better access to the workflow system. Based on this motivation,
the Workflow Miner was developed [5,14,15] that can discover workflow pat-
terns using algorithmic and statistical analysis techniques. However, this miner
has only been analyzed for the basic control-flow patterns such as Sequence,
Parallel Split, and Exclusive Choice.
Furthermore, the importance of workflow patterns is addressed by Cardoso [11].
He introduces a new approach to workflow analysis and evaluation of workflow
models by presenting metrics and mechanisms that define the complexity of
control-flow patterns to get an indication of the overall system complexity.
Related work shows that workflow patterns are an important tool to better
understand and discover processes. However, discovery algorithms have not yet
been compared on the basis of the complete set of control flow patterns, which
is why this type of coverage analysis is the subject of this paper.
3 Research Method
We perform an experimental coverage analysis to compare the process discov-
ery algorithms based on the workflow patterns. Each pattern is modeled and
simulated so that we can then execute the selected discovery techniques on it.
This section describes the research method of this work: First, the selection of
workflow patterns and discovery algorithms is explained and justified. Then, our
experimental setup is explained in detail.
3.1 Selection of Workflow Patterns
The workflow control flow patterns were defined to specify fundamental business
process behavior [4] imperatively. Originally, 20 control flow patterns were for-
mally described as a colored Petri net (CPN) [18], but the set was extended by
Russell et al. [22] to 42 patterns in 2006. This paper’s main focus is the original
20 workflow control flow patterns and the Structured Loop, the Recursion,
and the Explicit Termination from the extended set. In total, we selected 23
control-flow patterns for our analysis. They are listed in Table 2. The patterns
are grouped into seven categories, briefly revisited in this section. A detailed
description of the patterns is provided in [4,22].
Basic Control-Flow Patterns are considered the most essential ones, being
observed in various business processes: Sequence, Parallel Split, Synchronization,
Exclusive Choice, and Simple Merge. We include all of them in the analysis.
Advanced Branching and Synchronization Patterns represent more complex branch-
ing and merging behavior, including the patterns Multi-Choice and Structured
Synchronizing Merge (OR split/join), which can often be observed in real-
world processes as well. Many of these patterns are already challenging to dis-
cover. For example, distinguishing a Structured Discriminator from the Simple
Merge is quite challenging from a simple event log. Both merge multiple incom-
ing branches into a subsequent branch once an incoming branch completes. The
minor difference is that activities before a discriminator can happen without trig-
gering it again. In contrast, completing activities before a Simple Merge always
triggers the subsequent branch. Thus, we omit the majority of the extended
patterns from [22] in our analysis but include: Multi-Choice, Multi-Merge,
Structured Synchronizing Merge, and the Structured Discriminator.
Iteration Patterns include patterns on loops and repeating behavior. As loops are
supported by several process discovery algorithms [7], we include the Structured
Loop and Recursion.
Termination Patterns deal with the completion of workflow instances. A stan-
dard assumption of process discovery algorithms is that instances end after the
last event is present in each log trace. Hence, Implicit Termination is in-
herently supported. Furthermore, we also include the Explicit Termination
pattern due to its relevance for business analysts. Especially for business pro-
cesses, including parallel threads, this pattern allows for an abrupt termination
in case of reaching a specific process state.
Multiple Instance Patterns cover the occurrence of multiple activity instances.
Distinguishing multi-instance from iterative behavior is challenging, which is why
we decided to include the four basic multi-instance patterns without considering
the extended patterns such as Static Partial Join for Multiple Instances.
State-based Patterns consider the state of a process instance, which is often
defined based on its associated data, such as the case data or the output data
of previous tasks. Thus, discovery algorithms need to consider data from the
cases and events. In this work, we focus on the basic state-based patterns, e.g.,
Deferred Choice and Milestone, and omit extended ones such as the Critical
Section pattern. Finally, we select the patterns Cancel Task and Cancel Case
from the group of Cancellation Patterns.
3.2 Selection of Process Discovery Algorithms
The benchmark paper of Augusto et al. [7] provides a comprehensive overview
of existing process discovery algorithms based on structured literature analysis.
For this work, we selected seven out of the presented 35 algorithms, namely
the α-Miner [16], Inductive Miner [20], Evolutionary Tree Miner (ETM) [10],
Fodina Miner [9], Split Miner [28], Hybrid Integer Linear Programming (HILP)
Miner [30], and the BPMN Miner [12]. Details on the selected miners can be
found in the referenced research works. The selection offers a wide range of
different approaches and covers several modeling languages such as Petri nets,
process trees, causal nets, and BPMN, each being transferable to Petri nets to
warrant comparability to the implemented workflow patterns. Additionally, these
discovery algorithms are supported by the process mining framework ProM, its
plugins, or Java applications and are thus publicly accessible.
We want to highlight here that Simple Merge and the Multi-Merge are
often similarly represented by an XOR gateway in process models. However,
a relevant difference exists between them: whereas the Simple Merge assumes
that only one incoming path of multiple ones can be activated, the Multi-Merge
can also be triggered by multiple incoming paths resulting in multiple process
threads running independently. In most process modeling notations, particularly
Petri Nets and BPMN, this implies that, in the general case, proper termination
cannot be guaranteed following a Multi-Merge because it becomes unclear how
many tokens result from it. Hence, soundness is generally not given, and, thus, a
Multi-Merge cannot be discovered by algorithms that guarantee sound outputs.
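The difference between the two merges can be made concrete with a small token game. The following Python sketch is illustrative only: the net, its place and transition names, and the helper functions are assumptions and not one of the 23 CPN models. A parallel split activates both incoming branches, and each branch passes its token through the merge.

```python
from collections import Counter

# Hypothetical net (not one of the paper's CPN models): a parallel split feeds
# branches A and B, and both branches put their token into the place p_merge.
TRANSITIONS = {
    "split": (["p_start"], ["p_a", "p_b"]),
    "A": (["p_a"], ["p_merge"]),
    "B": (["p_b"], ["p_merge"]),
    "C": (["p_merge"], ["p_end"]),
}


def enabled(marking, name):
    """A transition is enabled if every input place holds enough tokens."""
    inputs, _ = TRANSITIONS[name]
    return all(marking[place] >= n for place, n in Counter(inputs).items())


def fire(marking, name):
    """Consume one token per input place and produce one per output place."""
    inputs, outputs = TRANSITIONS[name]
    new_marking = Counter(marking)
    new_marking.subtract(inputs)
    new_marking.update(outputs)
    return +new_marking  # unary plus drops places with zero tokens


def play(marking):
    """Fire enabled transitions in declaration order until none is enabled."""
    trace = []
    while True:
        candidates = [t for t in TRANSITIONS if enabled(marking, t)]
        if not candidates:
            return trace, marking
        marking = fire(marking, candidates[0])
        trace.append(candidates[0])


trace, final_marking = play(Counter({"p_start": 1}))
print(trace)
print(dict(final_marking))
```

In this run, C fires once per incoming token, so the final place ends up with two tokens. Under the Simple Merge assumption, only one of A or B would have been activated and a single token would reach the end.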
3.3 Approach
The goal of this work is an analysis of state-of-the-art discovery algorithms
regarding their coverage of workflow patterns. For this purpose, (1) each selected
workflow pattern was implemented as an executable colored Petri net (CPN) to
(2) generate an event log by simulation (cf. Fig. 1). Based on this generated
data, (3) different discovery algorithms were executed, and their outputs were
compared with the modeled workflow patterns to evaluate the coverage.
[Figure: three-step approach. (1) Modeling the workflow patterns using CPN Tools turns the theoretical set of workflow patterns into an executable set. (2) Simulation of the workflow patterns with CPN Tools and the ProM Import Framework generates an event log in MXML format. (3) Validation and comparison of results: the event log is passed to the Alpha, Inductive, HILP, ETM, Split, Fodina, and BPMN Miners (ProM), which yield the discovered models.]
Fig. 1: Approach
To implement the workflow patterns as CPNs and to generate an event log,
we use CPN Tools³ (v.4.0.1), a publicly available software tool for editing,
simulating, and analyzing (colored) Petri nets. We use the ProM Import Framework⁴
(v.7.0) to merge and convert the simulated event log data of a workflow pattern
into the MXML format to be used as an input for process mining tools (e.g.,
ProM). For process discovery, we use the framework ProM⁵ (v.6.12). Each of
the selected discovery algorithms presented above is supported by ProM so that
we can use the framework to run different discovery techniques. This section ex-
plains each of these steps. All of the implemented Petri nets, the simulated event
logs, and the outputs of each discovery algorithm are published via figshare⁶.
³ https://cpntools.org/
⁴ http://www.promtools.org/promimport/
⁵ https://promtools.org/prom-6-12/
⁶ https://figshare.com/s/40a65e1fdab01c58e3d1
Modeling To generate event log data in order to run several discovery algo-
rithms, we implemented each control-flow pattern as a colored Petri net (CPN)
using CPN Tools [1]. Thus, we can distinguish different cases, and each workflow
pattern is represented by exactly one executable CPN, resulting in 23 uniquely
identifiable CPN models. In the following, we exemplify the modeling of the pat-
terns on the Recursion shown in Fig. 2. Afterward, we describe the necessary
elements of the CPN model to simulate and log the pattern execution.
[Figure: CPN model of the Recursion pattern; the case generator and the logging functions are marked.]
Fig. 2: Recursion pattern modeled as a colored Petri net
Recursion describes the behavior of a task that is able to invoke itself during
its execution [22]. Thus, the requirements of the Recursion pattern are (1) one
activity invokes itself, (2) the execution of additional created activity instances
starts after the last activity has been executed, and (3) the process instance
terminates only after all activity instances are terminated successfully. In the
CPN shown in Fig. 2, transition t3 can invoke itself. As soon as t4 and t5 are
executed once, the process instance starts to continue with all created instances
of t3. To ensure proper termination, each child of t3 has its own id so that t6 is
only enabled when the initial instance of t3 has terminated. Note that t6 is not
part of the execution sequence we want to log.
In general, we strictly followed the described behavior of the patterns given
by [22]. For some patterns, we enriched the behavior to include several cases
of the pattern. For instance, we included a skipping and a redoing option to
implement the Cancel Task pattern to have different variants of the canceling
behavior. However, we restricted the cancellation of a task before its execution
to at most once to avoid loop behavior.
Additionally, we modeled the external behavior needed for the Deferred
Choice pattern as a separate transition that is only enabled when the process
waits for an external event. This transition then generates a random integer
between 1 and 10. Depending on the number (even or odd), one of two transitions
is enabled, representing external events. We changed the labeling accordingly to
differentiate between activity and event transitions.
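The external decision described above can be sketched in a few lines of Python. This is only an illustration of the even/odd rule, not the CPN implementation: the activity labels `t1` and `t2`, the event labels, and the mapping of even numbers to `event_a` are assumptions.

```python
import random


def external_event(rng):
    """Draw an integer between 1 and 10 and map its parity to an event.

    Which parity enables which event is an assumption of this sketch.
    """
    number = rng.randint(1, 10)  # both bounds are inclusive
    return "event_a" if number % 2 == 0 else "event_b"


def simulate_deferred_choice(num_cases, seed=0):
    """Return one trace per case: activity, external event, activity."""
    rng = random.Random(seed)
    return [["t1", external_event(rng), "t2"] for _ in range(num_cases)]


traces = simulate_deferred_choice(500)
print(traces[0])
```

Because five of the ten integers are even, both events are equally likely. In the resulting traces the choice looks exactly like an ordinary exclusive choice, since nothing in the control flow records that the decision came from outside the process.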
Simulation When simulating process models, multiple instances of the model
are executed, and their activities are logged as traces in the event log. To ensure
representative event log data for each pattern, we run 500 process instances per
CPN. An ID generator within each CPN increments the case ID and generates
tokens accordingly (see Fig. 2). CPN Tools is used for logging the execution of
each process instance. As the logging output of CPN Tools is incompatible with
ProM, we apply the approach of Alves de Medeiros and Günther [6] to create
S-MXML logs by enriching the CPNs with ML functions on the transitions
that represent process activities of the workflow pattern (cf. blue boxes in Fig. 2).
The ProM Import Framework bundles the output log files into a single MXML
file containing 500 cases for each workflow pattern. Note that not all transitions
of the CPN are logged, only the ones that are process activities. For example,
transition t6 in the CPN shown in Fig. 2 is not logged as it is a helper transition
and ensures process termination without being part of it.
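To give an impression of the resulting log structure, the following Python sketch simulates 500 cases of a simple Exclusive Choice and serializes them in an MXML-like layout. It is a stand-in for illustration and does not reproduce the CPN Tools and ProM Import Framework pipeline: the activity names, the process id, and the synthetic timestamps are assumptions, and only a minimal subset of MXML elements (WorkflowLog, Process, ProcessInstance, AuditTrailEntry) is written.

```python
import random
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta


def simulate_exclusive_choice(num_cases=500, seed=42):
    """Return {case_id: trace}; each case takes exactly one of two branches."""
    rng = random.Random(seed)
    log = {}
    for case_id in range(1, num_cases + 1):  # plays the role of the ID generator
        branch = rng.choice(["t2", "t3"])
        log[case_id] = ["t1", branch, "t4"]
    return log


def to_mxml(log, process_id="exclusive_choice"):
    """Serialize the log with one ProcessInstance per case."""
    root = ET.Element("WorkflowLog")
    process = ET.SubElement(root, "Process", id=process_id)
    clock = datetime(2024, 1, 1)  # synthetic timestamps, one minute apart
    for case_id, trace in log.items():
        instance = ET.SubElement(process, "ProcessInstance", id=str(case_id))
        for activity in trace:
            entry = ET.SubElement(instance, "AuditTrailEntry")
            ET.SubElement(entry, "WorkflowModelElement").text = activity
            ET.SubElement(entry, "EventType").text = "complete"
            ET.SubElement(entry, "Timestamp").text = clock.isoformat()
            clock += timedelta(minutes=1)
    return ET.tostring(root, encoding="unicode")


xml_text = to_mxml(simulate_exclusive_choice())
print(xml_text[:200])
```

Helper transitions such as t6 in Fig. 2 would simply never be appended to a trace in such a setup, which mirrors the rule that only process activities are logged.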
Validation The simulated event logs are used as input for each selected discov-
ery algorithm. We use ProM 6.12 to run the algorithms because of its large pool
of process discovery algorithms and compatibility with external plugins.
Table 1: Overview of the used process discovery plugins

| Algorithm | Plugin | In ProM | Configuration |
|---|---|---|---|
| α | Alpha Miner | yes | α++ Miner |
| HILP | ILP-Based Process Discovery | yes | Basic algorithm |
| Inductive | Mine Petri Net with Inductive Miner | yes | Inductive Miner - infrequent, noise threshold 0.2 |
| ETM | Mine Configured Process Tree with ETMc | yes | Default settings kept, number of generations set to 100 |
| | Convert Process Tree to Petri net (by Leemans) | yes | |
| Fodina | Mine Causal net with Fodina | yes | external dependencies required [8] |
| | Convert Causal Net to Reduced Petri Net | yes | |
| Split | Split Miner [2] | no | parallelism threshold = 0.1, percentile for frequency threshold = 0.4, boolean flag = false |
| | Convert BPMN diagram to Petri net | yes | default settings |
| BPMN | BPMN Miner [25] | no | Heuristic Miner (hm), pull-up rule flag p, force structuring flag f |
| | Convert BPMN diagram to Petri net | yes | default settings |
A general setup and configuration of input parameters are needed to run
discovery algorithms. Except for the Split Miner and the BPMN Miner, which
are standalone Java applications, and the Fodina Miner, which must be included
manually into ProM, all other algorithms can be found and executed via the
process mining tool. Table 1 shows an overview of the used plugins for the
selected algorithms and the configurations, respectively.
The discovered process models are compared as Petri nets to ensure better
comparability with the modeled workflow patterns. However, not all algorithms
discover a Petri net. For example, the Split Miner and BPMN Miner output
BPMN process models, the ETM outputs a process tree, and the
Fodina Miner outputs a causal net. Therefore, the converters in ProM are used
accordingly.
Fig. 3: Outputs of two discovery algorithms for the Recursion pattern: (a) Split Miner, (b) Fodina Miner
For each pattern and algorithm, we evaluate whether there is full coverage,
i.e., the pattern has been completely discovered; partial coverage, i.e., the
pattern is only covered under certain conditions; or no coverage. Fig. 3 shows two
discovered process models of the Recursion pattern. We observe that the process
model shown in Fig. 3a represents the pattern partially because the information
about how many instances of t3 are needed to continue in execution is not trans-
ferred to the last gateway. The Fodina Miner, however, is not able to detect the
pattern. The requirement that the execution of additional created activity in-
stances starts after the last activity has been executed is not true for the output
process model. Moreover, completing several instances of t3 is impossible.
4 Results
This section presents the coverage analysis results summarized in Table 2. The
table gives an overview of the coverage of 23 control-flow patterns by seven
discovery algorithms. We denote the successful coverage of a workflow pattern
by a checkmark (✓). A cross (×) indicates a failed coverage, whereas a circle
(○) indicates that the algorithm only partially discovered the pattern.
As shown in Table 2, all discovery algorithms have successfully discovered all
the basic control-flow patterns except the Simple Merge. Advanced branching
and synchronization patterns are only partially covered by the Inductive Miner,
the ETM Miner, the Split Miner, and the BPMN Miner.
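Table 2: Results of workflow patterns coverage by discovery algorithms
(✓ covered, ○ partially covered, × not covered)

| Discovery Algorithm | α++ | HILP⁷ | Inductive | ETM⁸ | Fodina | Split | BPMN |
|---|---|---|---|---|---|---|---|
| Output | PN⁹ | PN⁹ | PN⁹ | PT¹⁰ | CN¹¹ | BPMN | BPMN |
| Year of Release | 2004 | 2017 | 2013 | 2014 | 2017 | 2017 | 2016 |
| Basic Control-Flow | | | | | | | |
| Sequence | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Parallel Split | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Synchronization | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Exclusive Choice | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Simple Merge | ○ | ✓ | ✓ | ✓ | ○ | ○ | ○ |
| Advanced Branching + Sync. | | | | | | | |
| Multi-Merge | ○ | × | × | × | ○ | ○ | ○ |
| Multi-Choice | × | × | ✓ | ✓ | × | ✓ | ✓ |
| Structured Sync. Merge | × | × | ✓ | × | × | × | ✓ |
| Structured Discriminator | × | × | × | × | × | × | × |
| Iteration | | | | | | | |
| Arbitrary Cycles | × | ✓ | × | × | ✓ | ✓ | ✓ |
| Structured Loop | ○ | ✓ | ✓ | × | ✓ | ✓ | ✓ |
| Recursion | × | ○ | × | × | × | ○ | ○ |
| Termination | | | | | | | |
| Implicit Termination | ✓ | ✓ | ✓ | ✓ | ○ | ○ | ✓ |
| Explicit Termination | × | × | × | × | ✓ | × | × |
| Multiple Instances | | | | | | | |
| without Synchronization | × | × | × | × | × | × | × |
| with a Priori Design-Time | ○ | ○ | ○ | ✓ | ○ | ○ | × |
| with a Priori Run-Time | × | × | ○ | ○ | ○ | ○ | ○ |
| without a Priori Run-Time | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| State-Based Patterns | | | | | | | |
| Deferred Choice | ○ | ○ | ○ | ○ | × | ○ | ○ |
| Interleaved Parallel Routing | × | × | × | × | × | × | ✓ |
| Milestone (deadline) | × | × | × | × | × | × | ✓ |
| Cancellation | | | | | | | |
| Cancel Task | × | × | × | × | ✓ | × | ✓ |
| Cancel Case | × | ✓ | × | × | × | ○ | ✓ |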
⁷ Hybrid Integer Linear Programming
⁸ Evolutionary Tree Miner
⁹ Petri Net
¹⁰ Process Tree
¹¹ Causal Net
We observe a strong coverage for iterative patterns. Implicit Termination
is covered by almost all discovery algorithms, whereas Explicit Termination
is only covered by the Fodina Miner. A low coverage also applies to multiple
instance, state-based, and cancellation patterns. Complex workflow patterns
such as Cancel Task are only covered by the Fodina Miner and the BPMN Miner.
In the following, we present the results for each process discovery algorithm.
4.1 α-Miner
The α-Miner is one of the first process model discovery algorithms and, there-
fore, a pioneer in this area of research. Based on the observed directly-follows
relations in an event log, it detects basic ordering relations, i.e., the alpha rela-
tions, between the events, such as parallelism and exclusiveness. This behavior
can also be discovered in our coverage analysis. The algorithm covers all ba-
sic control-flow patterns. However, because the α-Miner cannot identify sound
workflow nets, it cannot differentiate between a Simple Merge (only one out of
multiple incoming paths can be triggered) and a Multi-Merge (multiple incom-
ing paths can be triggered). Advanced branching and synchronization patterns
were not discovered correctly since no ordering relations for these were defined
for the algorithm to detect them.
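The ordering relations mentioned above can be derived from the directly-follows pairs of a log. The Python sketch below computes such a footprint for a small example log. It covers only this first step of the basic α-algorithm, not the α++ variant used in the experiments; the function names and the example log are assumptions for illustration.

```python
from itertools import product


def directly_follows(log):
    """All pairs (a, b) where b occurs immediately after a in some trace."""
    return {(a, b) for trace in log for a, b in zip(trace, trace[1:])}


def footprint(log):
    """Classify every ordered pair of activities into one ordering relation."""
    df = directly_follows(log)
    activities = sorted({a for trace in log for a in trace})
    relations = {}
    for a, b in product(activities, repeat=2):
        if (a, b) in df and (b, a) in df:
            relations[(a, b)] = "||"  # parallelism: both orders were observed
        elif (a, b) in df:
            relations[(a, b)] = "->"  # causality: a is followed by b only
        elif (b, a) in df:
            relations[(a, b)] = "<-"  # reverse causality
        else:
            relations[(a, b)] = "#"   # exclusiveness: never adjacent
    return relations


# Hypothetical log: b and c are concurrent, e is an alternative to both.
example_log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]
fp = footprint(example_log)
print(fp[("b", "c")], fp[("a", "b")], fp[("b", "e")])
```

Only these pairwise relations are available to the algorithm, so behavior such as a Multi-Choice, where any subset of branches may run, has no relation of its own in the footprint.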
Furthermore, we observe that the algorithm did not perform well on the
Recursion pattern, the state-based, the multiple instance patterns (except for
the Multiple Instance Without a Priori Run-Time Knowledge pattern), and
the cancellation patterns. Multiple Instances with a priori Design-Time
Knowledge is partially covered. The algorithm discovers the multiple instance
activity but neglects the number of instances defined at design time. The dis-
covered Petri net allows for an arbitrary number of instances at run-time. We
explain these results with the fact that these patterns depend on the process-
ing of further information of the event logs in addition to the directly-follows
relations, and, thus, the α-algorithm cannot detect any of these patterns.
Deferred Choice is only partially covered by the α-Miner. The pattern con-
tains a decision from an external source that could not be identified. Instead,
simple exclusive choice behavior is discovered. Arbitrary Cycles is not discov-
ered correctly. The α++ Miner cannot identify the multiple entry points of the
arbitrary loop and fails in detecting the correct execution sequences. In contrast,
the Structured Loop is partially discovered. The entry and exit points of the
structured loop are correctly identified, but the conditional entry, as stated in
the workflow pattern, is not represented in the discovered model.
Fig. 4: Discovered process model by the HILP Miner of the Recursion pattern
4.2 Hybrid Integer Linear Programming (HILP) Miner
Similar to the α-Miner, the HILP-Miner [30] produces relaxed sound Petri nets
from event logs as input. An optimization technique is used to set the places for
a process model. The results show that the HILP-Miner is able to identify all
basic control-flow patterns. However, Multi-Merge is not covered since relaxed
soundness implies proper termination. Other advanced branching and
synchronization patterns are also not covered. We attribute this observation to the
HILP-Miner's focus on optimizing the distinction between exclusive and parallel
behavior. Moreover, the discovery technique does not cover all multiple instances
or state-based patterns. Similar to the α-Miner, only the Multiple Instance
Without a Priori Run-Time Knowledge pattern is fully discovered, whereas the
Multiple Instances with a Priori Design-Time Knowledge pattern was only
partially discovered because of missing information regarding the number of in-
stances. Regarding more complex patterns, the Cancel Case pattern is covered.
The discovered model for Recursion correctly represents the workflow pattern
but does not specify conditions for XOR splits, as shown in Fig. 4. Having the
discovered model as a basis, it is unclear when and how many invoked activity
instances are executed. Therefore, we indicate the discovery as partial.
From the state-based patterns, the HILP Miner only partially discovers the
Deferred Choice pattern. Similar to the α-Miner, the algorithm could not
identify that the decision was made externally. Nevertheless, the HILP-Miner
can detect Structured Loops and Arbitrary Cycles.
4.3 Inductive Miner
The Inductive Miner [20] focuses on the soundness, fitness, and rediscoverability
of the discovered process models. It outputs a Petri net.
All basic control-flow patterns and the Structured Loop pattern are covered.
The algorithm works with a divide-and-conquer approach with the
directly-follows relation used for partitioning. Exclusive choice, sequence, concurrent, and
looping relations are fundamental to this approach, which is why the algorithm
performs well for this group of patterns. However, Arbitrary Cycles could not
be identified because of the nature of block-structured process models. Cycles
with multiple entry points cannot be localized.
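One partitioning step of such a divide-and-conquer approach can be sketched as follows. The Python code below searches for an exclusive-choice cut by splitting the activities into groups that are never connected by a directly-follows pair. It is a simplified illustration under the assumption that connected components suffice for this cut; it does not reproduce the Inductive Miner plugin, its other cut types, or the infrequent-behavior filtering, and the function names are hypothetical.

```python
def directly_follows(log):
    """All pairs (a, b) where b occurs immediately after a in some trace."""
    return {(a, b) for trace in log for a, b in zip(trace, trace[1:])}


def exclusive_choice_cut(log):
    """Partition activities into connected components of the undirected
    directly-follows graph. More than one component suggests an XOR cut."""
    activities = {a for trace in log for a in trace}
    neighbours = {a: set() for a in activities}
    for a, b in directly_follows(log):
        neighbours[a].add(b)
        neighbours[b].add(a)

    components, seen = [], set()
    for start in sorted(activities):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search from the unvisited activity
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


split_log = [["a", "b"], ["c", "d"], ["a", "b"]]
single_log = [["a", "b", "c"], ["a", "c"]]
print(exclusive_choice_cut(split_log))
print(exclusive_choice_cut(single_log))
```

Each resulting group would then be mined recursively on its own sublog. Because every cut yields nested blocks, a cycle that can be entered at several points cannot be expressed as a single block, which matches the observation about Arbitrary Cycles.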
Advanced branching and synchronization patterns are partly discovered. There
is no dedicated split for the Multi-Choice and Structured Synchronizing
Merge patterns, but process trees themselves display these patterns with a
combination of XOR and AND splits and joins.
The Multi-Merge pattern is not discovered. This is due to the fact that the
resulting process model would not be sound anymore, which is not allowed by
the algorithm. Structured Discriminator could also not be detected because
it falls under infrequent behavior.
Similar to the HILP and α-Miner, the Inductive Miner can also detect the
Multiple Instance Without a Priori Run-Time Knowledge pattern and is
able to partially identify the two multiple instances patterns with a priori
Design-Time Knowledge and with a priori Run-Time, but fails in discovering
state-based and cancellation patterns. The Deferred Choice is partially covered.
4.4 Evolutionary Tree Miner (ETM)
The ETM algorithm [10] outputs process trees. This discovery method focuses
on emphasizing certain quality dimensions while using process trees ensures the
soundness of the resulting models.
The algorithm’s coverage is similar to the Inductive Miner. All basic control-
flow patterns and the Multi-Choice pattern could be identified. However, in
contrast to the Inductive Miner, the Structured Synchronizing Merge and the
Structured Loop patterns were not discovered by the algorithm. In comparison
to the other algorithms, it cannot discover any of the iteration patterns.
The Evolutionary Tree Miner is able to (partially) detect three of the multi-
ple instance patterns but has almost no coverage of the state-based, recursion,
or cancellation patterns. Compared to other algorithms, the ETM is able to
correctly identify the number of instances for the Multiple Instances with
a Priori Design-Time Knowledge. Instead of a loop, the algorithm detects the
correct number of instances that are put in sequence. However, it fails in de-
tecting the number of instances for the Multiple Instances with a Priori
Run-Time Knowledge pattern. The Deferred Choice pattern is partially cov-
ered for the same reason as explained for the previous algorithms. Cancellation
patterns are not covered because the discovered Petri nets still allow transitions
to be executed after the case was canceled.
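The difference between the two a priori patterns is visible in the log itself. The Python sketch below counts how often an activity occurs per trace: a constant count can be unrolled into a fixed sequence, whereas a varying count cannot. This is an illustration of the idea only and says nothing about how the ETM works internally; the logs and function names are assumptions.

```python
from collections import Counter


def instance_counts(log, activity):
    """Histogram of how many times the activity occurs per trace."""
    return Counter(trace.count(activity) for trace in log)


def fixed_instance_count(log, activity):
    """Return the number of instances if it is identical in all traces,
    otherwise None."""
    counts = instance_counts(log, activity)
    if len(counts) == 1:
        return next(iter(counts))
    return None


# Hypothetical logs: t2 is the multi-instance activity.
design_time_log = [["t1", "t2", "t2", "t2", "t3"]] * 3
run_time_log = [["t1", "t2", "t3"], ["t1", "t2", "t2", "t3"]]

print(fixed_instance_count(design_time_log, "t2"))
print(fixed_instance_count(run_time_log, "t2"))
```

With a constant count, a model containing the activity the right number of times in sequence replays every trace. When the count is only known at run time, it differs between cases, and a flat control-flow model can at best fall back to a loop.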
Moreover, the ETM does not consider any additional information in the event
logs and only tries to build process trees with each activity and does not, e.g.,
consider multiple instances of activities.
4.5 Fodina Miner
The Fodina Miner [8] first converts event logs to task logs, with additional in-
formation used to mine duplicates of activities. It discovers relationships using
a dependency graph, starting with length-one and length-two loops and ending
with split and join semantics. It has a dedicated step for long-distance depen-
dencies, unlike other discovery algorithms. The output is causal nets.
As shown in Table 2, the results reflect the algorithm’s functionalities. The
Miner is able to identify all basic control-flow patterns, Arbitrary Cycles, and
Structured Loop. Since the soundness of the output model is not guaranteed,
the algorithm cannot differentiate between Simple Merge and Multi-Merge.
All other advanced branching and synchronization patterns are not covered.
Nonetheless, iteration patterns except the Recursion pattern are discovered.
Surprisingly, the Fodina Miner cannot fully discover Implicit Termination.
Although there are no duplicates of tasks t2 and t4 in the simulated event log, the
algorithm identifies that these tasks are executed twice for each case. Similar
behavior can be observed for the Cancel Case pattern; looping behavior is
identified, which is not recorded in the event log. Regarding state-based patterns,
Fodina cannot identify any of them but performs well on the multiple instances
patterns, with the two patterns dealing with a priori knowledge being partially
covered. The discovered model does not indicate the number of instances,
allowing more activity instances to be instantiated than initially defined.
Fig. 5: Discovered process model by the Fodina Miner for Implicit Termination pattern
4.6 Split Miner
The Split Miner [2] also uses directly-follows relations, similar to the
α-Miner, the Inductive Miner, and the Fodina Miner. After some filtering, different
discovery steps are applied to the directly-follows graph, leading to a BPMN
model. It is able to identify all basic control-flow patterns. The discovery of the
Simple Merge pattern, however, is only covered for acyclic processes because
of soundness. For cyclic processes, the algorithm fails in differentiating between
Simple Merge and Multi-Merge. Furthermore, the Split Miner is able to detect
the Multi-Choice but fails in identifying the Structured Synchronizing Merge
pattern and the Structured Discriminator.
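The directly-follows relation that these miners start from can be illustrated with a minimal sketch. It is not the Split Miner implementation; the function name `directly_follows` and the three-trace log are hypothetical and serve only as an example.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        # Pair every activity with its direct successor in the same case.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical log: each trace is the ordered list of activities of one case.
log = [
    ["t1", "t2", "t3", "t4"],
    ["t1", "t3", "t2", "t4"],
    ["t1", "t2", "t3", "t4"],
]

dfg = directly_follows(log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

In this example both t2 -> t3 and t3 -> t2 are observed. Discovery algorithms based on directly-follows relations typically read such a pair as a hint that the two tasks run in parallel rather than in a fixed order.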
Looping behavior, such as the Arbitrary Cycles and Structured Loop patterns,
is covered by the discovery technique, whereas the Recursion pattern is
only partially covered due to missing gateway conditions. Regarding multiple
instances patterns, the Split Miner does not cover the pattern Multiple Instances
without Synchronization: the asynchronous and independent behavior of the
multiple activity instances relative to other parallel branches is not captured
in the model. Patterns dealing with a priori knowledge are partially covered
because the information on the number of instances was not discovered. The general
behavior, however, is correct, which is why the algorithm performs well on the
pattern without a priori run-time knowledge.
Moreover, the Split Miner partially covers the Deferred Choice pattern and
the Cancel Case pattern. The Deferred Choice pattern cannot be identified
completely because external events are not distinguished from lifecycle events of
activities. The quality of the discovered Cancel Case depends on how frequently
cancellation is observed at the different activities in a log. For example, the
Cancel Case is mostly identified correctly (cf. Fig. 6, blue markings indicate
correct cancellation behavior). However, because cancellation rarely occurs after
t3, the Split Miner abstracts from the process data and does not include the
cancellation option after t3 in the output model, as marked in red.
(a) Event log (b) Discovered process model by the Split Miner
Fig. 6: Event log and discovered model of the Cancel Case pattern
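The effect of observation frequency on the Cancel Case result can be reproduced with a small sketch. The relative-frequency threshold used here is an assumption chosen for illustration and is not the filtering rule the Split Miner actually applies. The log, the activity name `cancel`, and the parameter `min_share` are likewise hypothetical.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def filter_edges(dfg, min_share):
    """Keep an edge only if it carries at least min_share of the
    total outgoing frequency of its source activity (assumed rule)."""
    outgoing = Counter()
    for (a, _), n in dfg.items():
        outgoing[a] += n
    return {e: n for e, n in dfg.items() if n / outgoing[e[0]] >= min_share}


# Hypothetical Cancel Case log: cancellation is possible after every task,
# but it is observed only once after t3.
log = (
    [["t1", "t2", "t3", "t4"]] * 40
    + [["t1", "cancel"]] * 10
    + [["t1", "t2", "cancel"]] * 10
    + [["t1", "t2", "t3", "cancel"]] * 1
)

dfg = directly_follows(log)
kept = filter_edges(dfg, min_share=0.1)

for edge in sorted(dfg):
    status = "kept" if edge in kept else "filtered out"
    print(edge, dfg[edge], status)
```

With these numbers the edges t1 -> cancel and t2 -> cancel survive the filter, while t3 -> cancel is removed. The resulting graph therefore no longer offers cancellation after t3, even though the log contains it.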
4.7 BPMN Miner
The BPMN Miner [25] does not produce a flat process model and can therefore
detect many more patterns than most other discovery algorithms. Functionality-wise,
the BPMN Miner starts by extracting a hierarchy from the event logs
based on instance identifiers. Afterward, each subprocess is processed
independently with different algorithms to identify different event types. To detect
advanced patterns, the event logs must include specific information in a certain
key format as well as information on the event types. Thus, the BPMN Miner
makes stronger assumptions about the input log than the other miners.
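The idea of extracting a hierarchy from instance identifiers can be sketched as follows. This is a simplified illustration and not the BPMN Miner's procedure. The attribute names `case`, `activity`, and `sub_id`, as well as the function `split_hierarchy`, are assumptions made for this example.

```python
from collections import defaultdict


def split_hierarchy(events, sub_key="sub_id"):
    """Split a flat, time-ordered event list into parent traces and
    subprocess-instance traces, based on an instance identifier."""
    parent = defaultdict(list)
    children = defaultdict(list)
    for ev in events:
        if ev.get(sub_key) is None:
            # No instance identifier: the event belongs to the parent process.
            parent[ev["case"]].append(ev["activity"])
        else:
            # One trace per (case, subprocess instance) pair.
            children[(ev["case"], ev[sub_key])].append(ev["activity"])
    return dict(parent), dict(children)


# Hypothetical log: one order with two item instances handled in a subprocess.
events = [
    {"case": "c1", "activity": "receive order"},
    {"case": "c1", "activity": "pick item", "sub_id": "i1"},
    {"case": "c1", "activity": "pick item", "sub_id": "i2"},
    {"case": "c1", "activity": "pack item", "sub_id": "i1"},
    {"case": "c1", "activity": "pack item", "sub_id": "i2"},
    {"case": "c1", "activity": "ship order"},
]

parent, children = split_hierarchy(events)
print(parent)
print(children)
```

Each subprocess trace can then be mined on its own. The sketch also shows why such an approach needs the identifier attribute in the log: without `sub_id`, the interleaved item events are indistinguishable from a loop in a flat model.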
Fig. 7: Discovered process model by the BPMN Miner of the Cancel Case pattern
All of the basic control flow patterns and the Multi-Choice and Structured
Synchronizing Merge from the advanced branching and synchronization patterns
were discovered. The Structured Discriminator is not discovered since
the focus of the BPMN Miner is on multiple instances and boundary events,
and this pattern would need specific handling. General looping behavior,
in contrast, is covered by the algorithm: the Arbitrary Cycles and Structured
Loop patterns were discovered by the BPMN Miner. Considering the more complex
patterns, which the classical process mining techniques could not discover,
the BPMN Miner can detect one of the multiple instances patterns, namely
Multiple Instances without a priori Run-Time Knowledge. The patterns
Multiple Instances without Synchronization and with a priori Design-Time
Knowledge are not detected: the BPMN Miner fails to identify the multiple
instance activity, so the discovered model does not allow for multiple instance
behavior. With regard to cancellation patterns, this discovery technique
differs from all other discovery algorithms because it can identify both the
Cancel Case and the Cancel Task pattern. Fig. 7 shows the discovered process
model for the Cancel Case pattern with the blue boxes showing the correct
cancellation points after each activity and during the execution of t2.
5 Discussion
Our analysis shows that most existing discovery algorithms support only a
limited number of workflow patterns. This section discusses the results and
compares them to research works developed to support certain workflow patterns.
Almost all discovery algorithms detected the Simple Merge. Nevertheless, it
is not guaranteed that this pattern is always identified. Depending on the event
log and the algorithm’s functionality, the Multi-Merge pattern could be detected
instead (unless proper termination/soundness is guaranteed for the output models).
The challenge for a business analyst might be distinguishing these two in the
resulting process model, especially when they do not expect the presence of
Multi-Merge.
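The behavioral difference between the two merge patterns can be made concrete on trace level. The following sketch is an illustrative check under simplifying assumptions and is not part of the evaluation: the function `merge_behavior`, its classification rule, and the example logs are hypothetical. It relies on the pattern definitions, in which a Simple Merge expects exactly one active incoming branch, while a Multi-Merge triggers the subsequent task once per active branch.

```python
def merge_behavior(traces, branches, after_merge):
    """Classify the merge in front of `after_merge` from observed traces.

    Returns "simple merge" if no case activates more than one branch,
    "multi-merge" if some case activates several branches and
    `after_merge` runs once per activated branch, and "other" otherwise.
    """
    several_active = False
    for trace in traces:
        active = sum(trace.count(b) for b in branches)
        follow_ups = trace.count(after_merge)
        if active != follow_ups:
            # E.g. a synchronizing join: several branches, one follow-up.
            return "other"
        if active > 1:
            several_active = True
    return "multi-merge" if several_active else "simple merge"


# Hypothetical logs with branches b and c converging before task d.
xor_log = [["a", "b", "d"], ["a", "c", "d"]]
multi_log = [["a", "b", "d", "c", "d"], ["a", "c", "d", "b", "d"]]
and_log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]

print(merge_behavior(xor_log, ["b", "c"], "d"))
print(merge_behavior(multi_log, ["b", "c"], "d"))
print(merge_behavior(and_log, ["b", "c"], "d"))
```

A model without a soundness guarantee may allow the traces of `multi_log` even if the underlying process only produces those of `xor_log`, which is why the two patterns are easy to confuse in a discovered model.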
Analyzing workflow patterns, where case attributes or external factors deter-
mine business process routing, is an important and understudied area. Sarno et
al. [23] introduce process model modifications for multi-choice behavior to
enable decision mining for the Multi-Choice pattern in Petri nets. Leemans
et al. [19] extended discovery to Recursion and hierarchical behavior.
They compared process mining techniques as well as dynamic and statistical
analysis techniques and showed that these methods fail to identify
Recursion and hierarchy. Their approach extends the process tree representation
with a new tree operator to support both behaviors.
Mining multi-instantiation is difficult for traditional process mining approaches
because most (except for the BPMN Miner) produce flat process models. However,
to detect multiple instance patterns and the Interleaved Parallel Routing
pattern, sub-structures between events have to be detected. Thus, besides the
state-based patterns, the group of multiple instance patterns has the lowest
coverage. Weber et al. [27] propose a method to discover multi-instance
sub-processes, also with the help of additional annotations in event logs. Recently
developed hierarchical process discovery techniques [24,21] support the automatic
detection of generic sub-process structures.
Some control-flow patterns, like the Structured Discriminator, might not
be present in most real-life business processes. Therefore, it is reasonable to not
focus on full coverage of control-flow patterns but on a useful subset of patterns
that should be identified in future research. Some algorithms, e.g., the Split
Miner [2], are already documented as intentionally omitting the OR-split
and OR-join to generate a simpler process model for the business analyst. Another
strategy is to let the business analyst decide which patterns should be detected.
Based on these two observations, we believe that either enriched event logs with
additional information on sub-process structures, activities, and events or auto-
matic techniques, e.g., with the help of natural language processing, can help to
overcome current limitations in identifying advanced control-flow patterns.
Because existing discovery algorithms do not distinguish between activities
and external events¹² in the output process model, they cannot distinguish an
Exclusive Choice from a Deferred Choice. Conceptually speaking, activities
in a process take time to perform the corresponding work, while process events
instantly happen in the environment and are either produced or received by a
process [29]. In existing algorithms, the events stored in a log are assumed to
be life cycle transitions of activities [3]. Thus, process events are not actively
detected and distinguished from activity life cycle transitions.
Finally, we want to discuss some threats to the validity of our work. First,
we only compared a limited number of process discovery algorithms, omitting
the full range of techniques available. However, we aimed to include a repre-
sentative set of diverse techniques with existing implementations. Additionally,
simulating workflow patterns cannot fully replace real event logs from compre-
hensive workflow management systems. Our implemented patterns intentionally
exhibit certain behaviors more frequently than in reality, such as case cancel-
lations. For comparability reasons, we convert all discovered process models to
Petri nets, implying a loss of information since reducing BPMN constructs or
causal nets, for example, to silent transitions means losing relevant process be-
havior. Nonetheless, this paper introduces an approach for comparing discovery
algorithms using workflow patterns and provides a collection of implemented
colored Petri nets for 23 control-flow patterns and event logs. These resources
can serve as input for further research in process discovery.
6 Conclusion
This paper analyzes the coverage of existing discovery algorithms regarding
the control-flow patterns. In conclusion, most discovery techniques cover the
basic control-flow patterns well. Advanced control flow patterns are only par-
tially covered. The Multi-Choice pattern, for example, was discovered by four
out of seven selected process discovery algorithms. In contrast, the Structured
Discriminator remains undiscovered. While iteration patterns and Implicit
Termination are well covered, Explicit Termination and advanced patterns
are hardly covered at all. In our discussion, we argue that existing algorithms
also have the potential to extend their coverage by adding further heuristics or
ordering relations that are detected by the algorithms. Furthermore, hierarchical
discovery techniques, as developed recently, can also help to detect additional
patterns. The analysis in this work can stimulate the future development of
discovery algorithms, possibly targeted at clearer or broader coverage of patterns.
Furthermore, the comparison can be extended to more existing discovery
algorithms and more workflow patterns, including those concerning data and
resources. For 23 workflow patterns, we provide executable colored Petri nets
and the corresponding event logs, which can be used for further analysis.
12 Note the distinction here between process events and log events.
References
1. CPN Tools - a tool for editing, simulating, and analyzing colored Petri nets,
https://cpntools.org/
2. Augusto, A., Conforti, R., Dumas, M., Rosa, M.L.: Research lab Split Miner,
https://apromore.com/research-lab/
3. van der Aalst, W.M.P.: Process Mining - Discovery, Conformance and Enhance-
ment of Business Processes. Springer (2011)
4. van der Aalst, W.M.P., ter Hofstede, A.H.M., Kiepuszewski, B., Barros, A.P.:
Workflow patterns. Distributed Parallel Databases 14(1), 5–51 (2003)
5. Agrawal, R., Gunopulos, D., Leymann, F.: Mining process models from workflow
logs. In: Schek, H., Saltor, F., Ramos, I., Alonso, G. (eds.) Advances in Database
Technology - EDBT’98. LNCS, vol. 1377, pp. 469–483. Springer (1998)
6. Alves De Medeiros, A., Günther, C.: Process mining: Using CPN tools to create
test logs for mining algorithms. In: 6th Workshop and Tutorial on Practical Use
of Coloured Petri Nets and the CPN Tools (CPN ’05), Aarhus, Denmark.
pp. 177–190. DAIMI, University of Aarhus (2005)
7. Augusto, A., Conforti, R., Dumas, M., Rosa, M.L., Maggi, F.M., Marrella, A.,
Mecella, M., Soo, A.: Automated discovery of process models from event logs:
Review and benchmark. IEEE Trans. Knowl. Data Eng. 31(4), 686–705 (2019)
8. vanden Broucke, S.K.L.M., Weerdt, J.D.: Fodina: Robust and flexible process
discovery, http://www.processmining.be/fodina/
9. vanden Broucke, S.K.L.M., Weerdt, J.D.: Fodina: A robust and flexible heuristic
process discovery technique. Decis. Support Syst. 100, 109–118 (2017)
10. Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: Quality dimensions in
process discovery: The importance of fitness, precision, generalization and simplic-
ity. Int. J. Cooperative Inf. Syst. 23(1) (2014)
11. Cardoso, J.: Business process quality metrics: Log-based complexity of workflow
patterns. In: Meersman, R., Tari, Z. (eds.) On the Move to Meaningful Internet
Systems 2007: CoopIS, DOA, ODBASE, GADA, and IS, 2007, Vilamoura, Portu-
gal, November 25-30, 2007. LNCS, vol. 4803, pp. 427–434. Springer (2007)
12. Conforti, R., Dumas, M., García-Bañuelos, L., Rosa, M.L.: BPMN miner:
Automated discovery of BPMN process models with hierarchical structure. Inf. Syst.
56, 284–303 (2016)
13. van Dongen, B.F., de Medeiros, A.K.A., Wen, L.: Process mining: Overview and
outlook of Petri net discovery algorithms. Trans. Petri Nets Other Model. Concurr.
2, 225–242 (2009)
14. Gaaloul, W., Baïna, K., Godart, C.: Towards mining structural workflow patterns.
In: Andersen, K.V., Debenham, J.K., Wagner, R.R. (eds.) DEXA 2005, Copen-
hagen, Denmark, August 22-26, 2005. LNCS, vol. 3588, pp. 24–33. Springer (2005)
15. Gaaloul, W., Baïna, K., Godart, C.: A bottom-up workflow mining approach for
workflow applications analysis. In: Lee, J., Shim, J., Lee, S., Bussler, C., Shim,
S.S.Y. (eds.) DEECS 2006, San Francisco, CA, USA, June 26, 2006, Proceedings.
LNCS, vol. 4055, pp. 182–197. Springer (2006)
16. Günther, C.W., van der Aalst, W.M.P.: Fuzzy mining - adaptive process
simplification based on multi-perspective metrics. In: Alonso, G., Dadam, P., Rosemann,
M. (eds.) BPM 2007, Brisbane, Australia, September 24-28, 2007, Proceedings.
LNCS, vol. 4714, pp. 328–343. Springer (2007)
17. Hobeck, R., Pufahl, L., Weber, I.: Process mining on curriculum-based study data:
A case study at a German university. In: Montali, M., Senderovich, A., Weidlich, M.
(eds.) Process Mining Workshops - ICPM 2022 International Workshops, Bozen-
Bolzano, Italy, October 23-28, 2022. LNBIP, vol. 468, pp. 577–589. Springer (2022)
18. Jensen, K., Kristensen, L.M.: Colored Petri nets: a graphical language for formal
modeling and validation of concurrent systems. CACM 58(6), 61–70 (2015)
19. Leemans, M., van der Aalst, W.M.P., van den Brand, M.G.J.: Recursion aware
modeling and discovery for hierarchical software event log analysis. In: Oliveto,
R., Penta, M.D., Shepherd, D.C. (eds.) SANER 2018, Campobasso, Italy, March
20-23, 2018. pp. 185–196. IEEE Computer Society (2018)
20. Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured
process models from event logs containing infrequent behaviour. In: Lohmann, N.,
Song, M., Wohed, P. (eds.) BPM 2013 Workshops, Beijing, China, August 26, 2013.
LNBIP, vol. 171, pp. 66–78. Springer (2013)
21. Leemans, S.J.J., Goel, K., van Zelst, S.J.: Using multi-level information in hierar-
chical process mining: Balancing behavioural quality and model complexity. In: van
Dongen, B.F., Montali, M., Wynn, M.T. (eds.) ICPM 2020, Padua, Italy, October
4-9, 2020. pp. 137–144. IEEE (2020)
22. Russell, N., Ter Hofstede, A.H., Van Der Aalst, W.M., Mulyar, N.: Workflow
control-flow patterns: A revised view. BPM Center Report BPM-06-22,
BPMcenter.org (2006)
23. Sarno, R., Sari, P.L.I., Ginardi, H., Sunaryono, D., Mukhlash, I.: Decision min-
ing for multi choice workflow patterns. In: 2013 International Conference on Com-
puter, Control, Informatics and Its Applications, IC3INA 2013, Jakarta, Indonesia,
November 19-21, 2013. pp. 337–342. IEEE (2013)
24. Schuster, D., van Zelst, S.J., van der Aalst, W.M.P.: Incremental discovery of
hierarchical process models. In: Dalpiaz, F., Zdravkovic, J., Loucopoulos, P. (eds.)
RCIS 2020, Limassol, Cyprus, September 23-25, 2020. LNBIP, vol. 385, pp. 417–
433. Springer (2020)
25. S.E.R.G. at University of Tartu, the BPM Discipline at Queensland University of
Technology: BPMN Miner 2.0 - a tool for automated discovery of structured BPMN
models from event logs, https://sep.cs.ut.ee/Main/BPMNMiner/
26. Van Der Aalst, W.: Process mining: data science in action, vol. 2. Springer (2016)
27. Weber, I., Farshchi, M., Mendling, J., Schneider, J.: Mining processes with multi-
instantiation. In: Wainwright, R.L., Corchado, J.M., Bechini, A., Hong, J. (eds.)
Proceedings of the 30th Annual ACM Symposium on Applied Computing, Sala-
manca, Spain, April 13-17, 2015. pp. 1231–1237. ACM (2015)
28. Weijters, A.J.M.M., Ribeiro, J.T.S.: Flexible heuristics miner (FHM). In: Proceed-
ings of the IEEE Symposium on Computational Intelligence and Data Mining,
CIDM, April 11-15, 2011, Paris, France. pp. 310–317. IEEE (2011)
29. Weske, M.: Business Process Management - Concepts, Languages, Architectures,
Third Edition. Springer (2019)
30. van Zelst, S.J., van Dongen, B.F., van der Aalst, W.M.P., Verbeek, H.M.W.: Dis-
covering workflow nets using integer linear programming. Computing 100(5), 529–
556 (2018)