
A Method for Debugging Process Discovery Pipelines to Analyze the Consistency of Model Properties


EasyChair Preprint
June 17, 2021
Christopher Klinkmüller¹, Alexander Seeliger², Richard Müller³, Luise Pufahl⁴, and
Ingo Weber⁴
¹ CSIRO Data61, Sydney, Australia
² TU Darmstadt, Germany
³ Leipzig University, Germany
⁴ Chair of Software and Business Engineering,
Technische Universitaet Berlin, Germany
Abstract. Event logs have become a valuable information source for business
process management, e.g., when analysts discover process models to inspect the
process behavior and to infer actionable insights. To this end, analysts configure
discovery pipelines in which logs are filtered, enriched, abstracted, and process
models are derived. While pipeline operations are necessary to manage log
imperfections and complexity, they might, however, influence the nature of the
discovered process model and its properties. Ultimately, not considering this
possibility can negatively affect downstream decision making. We hence propose
a framework for assessing the consistency of model properties with respect to
the pipeline operations and their parameters, and, if inconsistencies are present,
for revealing which parameters contribute to them. Following recent literature
on software engineering for machine learning, we refer to it as debugging. From
evaluating our framework in a real-world analysis scenario based on complex
event logs and third-party pipeline configurations, we see strong evidence
towards it being a valuable addition to the process mining toolbox.
Keywords: Process Mining, Discovery, Uncertainty & Sensitivity Analysis
1 Introduction
Historic process information from event logs enables analysts to derive business process
insights using process mining [1]: process discovery [5,19] infers process models from
the recorded behavior, conformance checking [30,12] relates observed behavior to an
existing process model, process enhancement [2,6] repairs models or extends them e.g.,
with performance and resource information, and predictive process monitoring [22,17]
forecasts how process instances may unfold during execution.
The maturity of those techniques has led to an increasing adoption of process min-
ing in industry projects, where analysts often find answers to business problems through
a divide-and-conquer strategy by breaking down those problems into fine-grain infor-
mation needs [10]. Here, process discovery plays a crucial role, as analysts interpret the
properties of the discovered models to derive insights [32] that then serve as a foundation
for understanding related aspects [1,18]. If interpreted carelessly, process discovery
insights can hence negatively affect downstream analysis. Thus, evaluating insights
from mining, particularly discovery, should be a key activity in each project [10,25]
to confirm findings and to turn them into reliable and actionable insights [32]. Besides
verifying scripts or tool configurations, consulting domain experts, or investigating the
process environment, analysts can also perform data-driven evaluation [37].
Fig. 1: An extended perspective for the evaluation of process discovery results
(figure labels: Limitations of Data-Driven Evaluation for Process Discovery; Conventional Model
Quality Assessment; Break Down Model Properties; Investigate Parameter Effects)
Commonly, discovery results are evaluated by means of model-centric metrics like
fitness, precision, generalization, and simplicity [9,15], which are e.g., computed via
conformance checking [12,30] with the log that served as input to the discovery algo-
rithm. Those metrics are valuable for assessing the reliability of discovery algorithms,
and we want to complement them by expanding the evaluation perspective, as shown in
Figure 1. Analysts typically set up process discovery pipelines to transform logs before
discovering a model. While necessary to manage log imperfections and complexity,
such a pipeline potentially constrains the validity of the behavior covered by the discovered
model. Thus, we propose to examine how pipeline parameters affect properties
of the discovered process models at different granularity levels, because analysts often
focus on specific execution paths and patterns to break down the model topology [18].
To this end, we propose a method to investigate the consistency of model properties
by means of uncertainty and sensitivity analysis [36]. Our primary goal is to enable
what-if analyses in which the reliability of insights is assessed by examining relation-
ships between pipeline parameters and model properties. Yet, the method can also be
applied to guide the pipeline definition, or to generate insights from those relationships.
In more detail, we present a configurable framework to evaluate whether user-defined
model properties are consistent with results from varied configurations of a user-defined
pipeline and to quantify the contribution of individual pipeline parameters towards in-
consistencies. In doing so, we follow recent work in software engineering [3], which
defines a notion of debugging for machine-learning (ML) pipelines. As such, our pro-
posal can be understood as a method for debugging process discovery pipelines.
In the following, we discuss the problem in Section 2, relying on observations from a
competitive process analysis challenge and an illustrative analysis of a moderately complex
real-world dataset. We then outline the framework and demonstrate its application
using the same dataset in Section 3. In a separate experiment, we investigate our frame-
work in a realistic analysis setting based on another real-world dataset with high com-
plexity in Section 4. Here, we substantiate the utility of our framework by showing
that its output is founded in observations by external analysts and theory. The results
demonstrate that our debugging framework is a valuable addition to the process min-
ing toolbox: in addition to existing guidelines, patterns, and tools which we discuss in
Section 5, it enables analysts and their audiences to comprehend the degree to which
properties of discovered models are constrained by analytical decisions in a specific
context. Finally, we conclude the paper and discuss future directions in Section 6.
2 Basic Terminology & Problem Illustration
An event log L is a set of traces and each trace is an ordered sequence of events. Event
logs also contain features that describe properties of events and traces, such as case iden-
tifiers, event timestamps, or activity names. A process model P is a directed graph where
typed and labeled nodes represent activities, gateways, events, etc., whereas edges depict
the control flow. Finally, $\mathcal{L}$ and $\mathcal{P}$ denote the universes of event logs and process
models, respectively. Note that for the purposes of this paper this basic understanding is
sufficient. We hence omit formal definitions which are e.g., presented in [1, Ch. 3 & 5].
To analyze the process behavior captured in an event log, analysts often define pro-
cess discovery pipelines, either implicitly or explicitly. In this paper, we primarily focus
on pipelines that transform a single log into a single model. In the general case, however,
a process discovery pipeline can be viewed as a function $\delta: \mathcal{L}^{n_l} \times \mathcal{X}^{n_x} \to \mathcal{P}^{n_p}$
that takes $n_l$ event logs and a set of $n_x$ parameters from the universe of parameters $\mathcal{X}$
and returns $n_p$ process models. Pipelines are assembled by combining transformation
and discovery operators. Each operator can be configured via its own set of parameters,
all of which are included in the set of parameters that serves as input to the discovery
pipeline. Pipelines can be implemented as Python or R scripts based on packages like
dplyr, bupaR, pandas, and pm4py, or by incrementally executing tools or components,
like ProM plugins, but they often involve multiple tools and ad hoc scripts [18].
The reasons for analysts to apply discovery pipelines are twofold. On the one hand,
logs might contain imperfections, such as missing values or outlier behavior. To elim-
inate those imperfections, analysts filter traces or events, and manipulate features to
improve their quality or to enrich logs with data from other information sources. On the
other hand, log complexity typically poses a challenge in interpreting the data, when logs
contain drifts or describe a diverse range of activities or variants. In addition to filter-
ing cases and events, analysts commonly lift the level of abstraction by defining higher
level activities or sub-processes and by aggregating the events in the log accordingly.
Note that some operations are directly supported by discovery algorithms, e.g., the in-
ductive miner [19] can filter infrequent behavior, while directly-follows graph mining
techniques often allow analysts to filter paths and activities based on their frequencies.
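The frequency-based filtering of paths in a directly-follows graph can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the behavior of any particular tool: traces are assumed to be lists of activity names, and the function names `discover_dfg` and `filter_dfg` as well as the parameter `min_count` are hypothetical.

```python
from collections import Counter

def discover_dfg(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def filter_dfg(dfg, min_count):
    """Keep only directly-follows edges observed at least min_count times."""
    return {edge: n for edge, n in dfg.items() if n >= min_count}

# Toy log (made up for illustration): three traces over four activities.
traces = [
    ["register", "triage", "treat", "release"],
    ["register", "triage", "treat", "release"],
    ["register", "treat", "triage", "release"],
]
dfg = discover_dfg(traces)
frequent = filter_dfg(dfg, min_count=2)  # drops the edges of the rare third variant
```

Even in this toy setting, `min_count` acts as a pipeline parameter: changing it changes which paths the resulting graph shows.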
In this work, we postulate that the analytical decisions behind the pipeline config-
uration ultimately constrain the degree to which the behavior depicted in a discovered
process model can be generalized. Consider e.g., the following observations from the
business process intelligence challenge (BPIC), a competition that invites researchers,
students, and experts to submit analysis reports for real-world event logs. Table 1 contrasts
the complexity of the five event logs from BPIC 2015 with the distribution of
complexity of the discovered process models presented in the nine submissions. While
the event logs are highly complex with 350+ activities and 800+ variants, the majority
of the models contains between 6 and 40 activities. We could not reliably quantify the
number of model paths, but observed that the models only allowed for a fraction of
the log variants. Moreover, one report in fact included models discovered from the raw
logs, to demonstrate that it is impossible to interpret these models. While necessary to
manage the cognitive load, the transformations in the underlying pipelines can affect
the nature of the discovered model, even if they are less extensive, as illustrated below.
Table 1: Complexity of event logs and of discovered models in BPIC 2015
                                   Log 1    Log 2    Log 3    Log 4    Log 5
Log Complexity    # Events        52,217   44,354   59,681   47,293   59,083
                  # Activities       398      410      383      356      389
                  # Variants       1,170      828    1,349    1,049    1,153
Model Complexity  # Activities
We analyzed the Sepsis event log¹¹ which captures treatments of Sepsis patients in
a Dutch hospital [23]. Its complexity is moderate (1,050 cases, 15,214 events, 16 activities),
rendering it useful for illustration purposes. We used the default configuration
of the inductive miner [19] (infrequent variant, noise threshold = 0.2) to discover a
process model. Before discovery, we first filtered out short cases with an execution duration
smaller than minDuration, based on the common assumption that short cases represent
incomplete or outlier behavior. Next, we abstracted the log by aggregating activities related to
the release of patients. That is, if consolidate is set to true, all release-related events
are re-labeled and in each trace all but the last release-related event are removed. Note
that these transformations are not presented here as the ideal way to handle the log, but
merely for illustration purposes. We chose the transformations, as we observed that they
were commonly applied in submissions to different editions of the BPIC.
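The two transformations described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the log is assumed to be a dictionary that maps case identifiers to chronologically ordered `(activity, timestamp)` pairs, release-related activities are assumed to share the label prefix `Release`, and all function names and the consolidated label are hypothetical.

```python
from datetime import datetime, timedelta

RELEASE_PREFIX = "Release"  # assumption: release-related activities share this prefix

def filter_short_cases(log, min_duration):
    """Drop cases whose duration (last minus first timestamp) is below min_duration."""
    kept = {}
    for case_id, events in log.items():
        timestamps = [ts for _, ts in events]
        if max(timestamps) - min(timestamps) >= min_duration:
            kept[case_id] = events
    return kept

def consolidate_release(log, new_label="Release Patient"):
    """Re-label release-related events and keep only the last one per trace."""
    result = {}
    for case_id, events in log.items():
        release_positions = [i for i, (act, _) in enumerate(events)
                             if act.startswith(RELEASE_PREFIX)]
        last = release_positions[-1] if release_positions else None
        trace = []
        for i, (act, ts) in enumerate(events):
            if not act.startswith(RELEASE_PREFIX):
                trace.append((act, ts))
            elif i == last:
                trace.append((new_label, ts))
        result[case_id] = trace
    return result

def transform(log, min_duration, consolidate):
    """Apply both transformations; the discovery step itself is omitted here."""
    log = filter_short_cases(log, min_duration)
    return consolidate_release(log) if consolidate else log

# Made-up two-case log for illustration.
t0 = datetime(2021, 1, 1, 8, 0)
log = {
    "c1": [("ER Registration", t0), ("Release A", t0 + timedelta(hours=2)),
           ("Release B", t0 + timedelta(days=3))],
    "c2": [("ER Registration", t0), ("Release A", t0 + timedelta(minutes=10))],
}
transformed = transform(log, min_duration=timedelta(days=1), consolidate=True)
```

Both `min_duration` and `consolidate` are plain function arguments here, which makes it straightforward to re-run the same pipeline under varied configurations.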
By varying the two parameters, we obtained the four models shown in Figure 2. The
differences between the models demonstrate that discovery results can strongly depend
on a specific pipeline configuration and hence might be inconsistent with models discovered
using varied configurations. For instance, model 1 indicates that the registration
activities are executed in arbitrary order before all other activities; in models 2 and 3 they
are optional and parallel to the treatment activities; and in model 4 the registration activity
B requires the completion of the two remaining registration activities A and C.
Differences consequently also exist at the level of the model topology. Yet, the models
achieve similar fitness values. This shows that model-centric quality metrics may not
reflect how pipeline configurations impact properties of the discovered process models.
Fig. 2: Sepsis results for different pipeline configurations (fitness calculated with the
multi-perspective process explorer in ProM with the transformed event logs).
Panels: model 1 (avg. fitness: 90.6%), model 2 (avg. fitness: 92.9%), model 3 (avg. fitness: 93.3%),
model 4 (avg. fitness: 82.4%). Axis labels: 0 sec, 1 day. Activities shown: ER Registration,
ER Sepsis Triage, ER Triage, Admission NC, IV Liquid, IV Antibiotics, Admission IC,
Release A, Release B, Release C, Release D, Release E, Return ER, Release Patient.
¹¹ Cases - Event Log/12707639, accessed 2021-03-12
Table 2: Effects of the analyst’s awareness of result uncertainties (adapted from [31])
Awareness | Discovery result: no uncertainties | Discovery result: uncertainties
Aware     | trust in insight: high; decision making: unaffected | trust in insight: medium-low; decision making: largely unaffected
Mistaken  | trust in insight: medium-low; decision making: affected | trust in insight: high; decision making: severely affected
Unaware   | trust in insight: medium; decision making: unaffected | trust in insight: medium; decision making: severely affected
In summary, we demonstrated that, while configuring a discovery pipeline is nec-
essary to manage log imperfections and complexity, it might constrain the discovered
model, when varied pipeline configurations yield inconsistent outputs. This can ultimately
affect the certainty with which insights can be inferred from a discovered model.
Following the awareness classification from [31] (see Table 2), we argue that insight uncertainties
can impact the decision making that is based on the insights. In the presence
of uncertainties, the chance of error due to unjustified trust in the insights is high, when
analysts are unaware of or mistakenly assume the absence of uncertainties. But also
in the absence of uncertainties, decision making might be impaired when analysts unnecessarily
question the insight validity due to mistakenly assuming that uncertainties
exist. While in the remaining cases the decision making is usually not affected, analysts
(and their audiences) should ideally always be aware of the level of uncertainty that is
associated with the insights and of its root causes.
3 Debugging of Process Discovery Pipelines
The necessity to address log imperfections and complexity via pipeline operations can
result in uncertain insights and impaired decision making (see Section 2). Such uncer-
tainty can stem from stochastic operators, but most often is introduced by the pipeline
parameters. For example, while there might be a plausible range of threshold values for
a filter that removes outlier traces with short durations, the precise value can be uncer-
tain. Diagnosing such uncertainty by manually varying parameters and inspecting the
respective outputs is infeasible due to the number of configurations needed to obtain
reliable conclusions, especially when model and pipeline complexities, or parameter
interactions are present. Moreover, it is not transparent to the model audience. Hence,
to assist analysts in debugging their discovery pipelines, we pursue two objectives:
O1: Assess the consistency of model properties to unveil potential pipeline constraints.
O2: Quantify the influence of parameters to provide explanations for inconsistencies.
While our approach could be used to evaluate steps in pipelines generally, we designed
it with the purpose of allowing an analyst to achieve objectives O1 and O2 for a concrete
case. As such, the standard situation for applying our framework is: an analyst has created
a concrete pipeline with a concrete parameter configuration to generate a baseline
model. The analyst then investigates how the parameters influence the model properties
(i) to substantiate insights inferred from the baseline model, (ii) to iteratively construct
a reliable pipeline, or (iii) to generate insights from parameter/property relationships.
In all cases, the metrics are calculated relative to the properties of the baseline model.
Fig. 3: Framework for investigating property consistency in process discovery pipelines
(steps: Sample the Pipeline, with input $\{(X_i, P_i(X_i))\}_{i \le n_x}$; Measure the Consistency
for each Execution; Analyze the Consistency of the Pipeline)
To this end, one conceivable strategy is to instrument the pipeline and to track the
validity of model properties in all steps [45], i.e., in all intermediate logs and the discov-
ered process models. Yet, as this analysis only considers the current configuration, we
would not be able to measure the consistency of model properties with it, or to reason
about the general influence of parameters. Hence, we adopt uncertainty and sensitivity
analysis which provides means to quantify effects of varied pipeline configurations.
In this regard, a first option is a one-at-a-time design [36, pp. 66–69]. In such a design
we would examine both objectives by focusing on each parameter individually.
Given a parameter, we would repeatedly change its value and for each value execute
the pipeline without modifying any of the other parameters. Then, we would use the
generated outcomes to examine how variations in the parameter change the pipeline
outcome. While this is computationally efficient, the analytical results can be skewed in
the presence of parameter interactions [34]. Global sensitivity analysis overcomes this
limitation by studying the effects of simultaneous parameter changes. Here, variogram
analysis of response surfaces (VARS) [29] aims to reveal the spatial structure and variability
of model outputs. Essentially, VARS models the output space as a variogram
function that describes the degree to which model outcomes for a specific parameter
configuration $X$ depend on outcomes produced by configurations in the vicinity of $X$.
This variogram function is then used to examine properties of input-output relationships.
However, VARS does not provide clear indications for the importance of inputs
and thus should be used to complement variance-based sensitivity analysis [28].
We follow this argumentation and build our framework on the scheme for variance-based
sensitivity analysis from [35].
As shown in Figure 3, we first sample the pipeline (Section 3.1). That is, we execute
the user-defined pipeline $\delta: \mathcal{L}^{n_l} \times \mathcal{X}^{n_x} \to \mathcal{P}^{n_p}$ multiple times to generate process
models for different parameter configurations. Here, we consider event logs to be constants.
This effectively turns discovery pipelines into functions $\delta_X: \mathcal{X}^{n_x} \to \mathcal{P}^{n_p}$ that only
take parameters as input. To guide the exploration and the parameter sampling, analysts
must specify the relevant parameters and their probability measures $\{(X_i, P_i(X_i))\}_{i \le n_x}$.
Next, we measure the property consistency for each execution (Section 3.2), requiring
the analysts to manually determine the model properties for which they want to
measure the consistency, i.e., the degree to which a (set of) model(s) produced in a
single execution satisfies this property. In particular, the analyst must provide a set of
$n_m$ property consistency measurements $\{\mu_j\}_{j \le n_m}$ where each function $\mu_j: \mathcal{P}^{n'_{p,j}} \to [0,1]$
represents a specific property and returns the consistency for this property as observed
in a set of $n'_{p,j}$ process models: a value of 0 indicates total inconsistency, a value of 1
perfect consistency, and values in between degrees of consistency. Lastly, we analyze
the property consistency of the pipeline (Section 3.3): an uncertainty analysis assesses
the degree to which a model property changes when pipeline parameters vary (O1),
whereas sensitivity analysis quantifies the contribution of individual parameters to potential
inconsistencies (O2). Below, we describe each step using the Sepsis experiment
from Section 2 for illustration purposes.
3.1 Sampling the Pipeline
To explore the output of different pipeline configurations, we first create a $k \times n_x$ configuration
matrix $A$ which comprises the configurations for $k$ pipeline executions. Each
configuration contains $n_x$ values, one per relevant parameter $X_i$. We use the configurations
in $A$ to assess whether the pipeline yields inconsistencies (O1, see Section 3.3).
If there are inconsistencies and it must be analyzed how parameters contribute to them
(O2, see Section 3.3), then for each parameter $X_i$ we create an additional $k \times n_x$ configuration
matrix $AB_i$ by copying $A$ and varying the values in the $i$-th column which defines
the values for parameter $X_i$. Comparing the results obtained from the configurations in
$A$ and $AB_i$ allows us to quantify the influence of parameter $X_i$. Thus, when desired, O2
requires $k \times n_x$ additional pipeline executions, yielding a total of $k \times (n_x + 1)$ executions.
For a reliable analysis we need configurations that (i) sufficiently sample the entire
parameter space and (ii) systematically vary the parameter values. We achieve this
based on the procedure that yielded the best results in a comparative evaluation by
Saltelli et al. [35]. First, we use a low-discrepancy sequence to generate two temporary
$k \times n_x$ matrices $A^t$ and $B^t$ where each row is a point in the $n_x$-dimensional unit cube.
Low-discrepancy sequences ensure that the parameter space is evenly sampled. We here
use the Sobol’ sequence [39] which, in contrast to sequences like the Latin Hypercube
design, has the advantage that we do not necessarily need to fix the sample size, but
could in principle dynamically generate new configurations until the analysis results
converge. We use the Sobol’ sequence to generate a $k \times 2n_x$ matrix that is split in half to
obtain the temporary matrices $A^t$ and $B^t$ from the left and right half, respectively. While
we derive $A$ directly from $A^t$, we use $B^t$ to create the temporary matrices $\{AB^t_i\}_{i \le n_x}$ using
the radial sampling strategy [33]. That is, for each parameter $X_i$ we construct $AB^t_i$ by
copying $A^t$ and replacing the $i$-th column with the respective column from $B^t$. Lastly,
we obtain the configuration matrices ($A$ and $\{AB_i\}_{i \le n_x}$) by interpreting the values in the
temporary matrices as probabilities: for each parameter we convert each value $p$ in the
$i$-th columns of the temporary matrices to a parameter value $x$ for $X_i$ so that the respective
cumulative probability yields the probability $p$ for value $x$, i.e., $P_i(X_i \le x) = p$. The
final step is to execute the discovery pipeline for each configuration in $A$ to discover the
process models. The configurations from $\{AB_i\}_{i \le n_x}$ are only executed, if inconsistencies
exist for which the analyst wishes to inspect the influence of parameters.
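The construction of the temporary matrices can be sketched as follows. Note the swapped-in technique: to keep the example free of third-party dependencies, a Halton sequence is used as a stand-in low-discrepancy sequence instead of the Sobol’ sequence named in the text. The function names and the fixed list of primes are assumptions made for this illustration only.

```python
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # one base per dimension

def van_der_corput(index, base):
    """index-th element (index >= 1) of the van der Corput sequence in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result

def halton_points(k, dims):
    """k points in the dims-dimensional unit cube (Halton sequence, not Sobol')."""
    if dims > len(PRIMES):
        raise ValueError("this sketch supports at most %d dimensions" % len(PRIMES))
    return [[van_der_corput(l + 1, PRIMES[d]) for d in range(dims)] for l in range(k)]

def radial_design(k, n_x):
    """Return the temporary matrices A^t, B^t and the list [AB^t_1, ..., AB^t_nx]."""
    wide = halton_points(k, 2 * n_x)       # k x 2*n_x matrix
    a_t = [row[:n_x] for row in wide]      # left half
    b_t = [row[n_x:] for row in wide]      # right half
    ab_t = []
    for i in range(n_x):
        matrix = [row[:] for row in a_t]   # copy A^t ...
        for l in range(k):
            matrix[l][i] = b_t[l][i]       # ... and replace the i-th column
        ab_t.append(matrix)
    return a_t, b_t, ab_t

a_t, b_t, ab_t = radial_design(k=8, n_x=3)
```

Each row of `a_t` and of the matrices in `ab_t` is still a point in the unit cube; converting these values into actual parameter values is a separate step based on the cumulative probabilities.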
In our running example, the Sepsis experiment, we sample the pipeline for the parameters
minDuration, consolidate, and threshold, in this order of parameters.
We here also consider the threshold parameter, because in Section 2 it was set to 0.2
by default and might have influenced the results. For consolidate and threshold we
use uniform distributions over their entire domains ({false, true} and [0, 1]), whereas
for minDuration we use the empirical distribution of case durations in the log for
all values $\le$ 2 days. Setting minDuration to 2 days would exclude about 29% of the
cases, and hence we chose this value as an upper bound. Taking a concrete example
for a configuration, say the current configuration from $A^t$ or $AB^t_i$ is (0.7, 0.6, 0.3);
then our approach derives the following parameter values as per the above use of the
cumulative probabilities. The 70th percentile of the actual data for minDuration is
at 4h 10min, and therefore we get minDuration = 4h 10min. Since 0.6 > 0.5, we
get consolidate = true. For threshold, the uniform distribution equals the identity
function, hence threshold = 0.3. We set the sample size $k$ to 1,000, resulting in 1,000
executions for O1 and 3 × 1,000 = 3,000 executions for O2.
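The conversion of a unit-cube point such as (0.7, 0.6, 0.3) into parameter values can be sketched with inverse cumulative distribution functions. The case durations below are made up for illustration, so the resulting minDuration differs from the 4h 10min reported for the real log; the function names are hypothetical.

```python
import math
from datetime import timedelta

def empirical_quantile(sorted_values, p):
    """Smallest observed value x with empirical P(X <= x) >= p."""
    n = len(sorted_values)
    index = max(0, math.ceil(p * n) - 1)
    return sorted_values[min(index, n - 1)]

def to_configuration(point, sorted_durations):
    """Map a point from the unit cube to (minDuration, consolidate, threshold)."""
    p_duration, p_consolidate, p_threshold = point
    return {
        # empirical distribution of case durations (restricted to <= 2 days)
        "minDuration": empirical_quantile(sorted_durations, p_duration),
        # uniform over {false, true}: the upper half of [0, 1] maps to true
        "consolidate": p_consolidate > 0.5,
        # uniform over [0, 1]: the inverse CDF is the identity
        "threshold": p_threshold,
    }

# Hypothetical, sorted case durations (all at most 2 days).
durations = [timedelta(hours=h) for h in (0.5, 1, 2, 3, 4, 6, 12, 40)]
config = to_configuration((0.7, 0.6, 0.3), durations)
```

With these made-up durations, the first coordinate selects the sixth of the eight sorted values, while the second and third coordinates reproduce the values from the worked example in the text.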
3.2 Measuring the Property Consistency for a Single Execution
Within our framework, analysts can investigate the consistency of the model topology
and of fine-grained model properties like execution patterns and paths by defining
property consistency measurements $\mu: \mathcal{P}^{n'_P} \to [0,1]$. While analysts can provide any
measurement, we propose two specific measurements for single models ($n'_P = 1$). Both
functions rely on the causal behavioral profile [42] which captures behavioral relations
between a set of activities $T$ as observed in a set of executions $E$. The causal behavioral
profile is defined as $C_{T,E} = \{\rightsquigarrow, +, \parallel, \gg\}$ where activity pairs $(t_1, t_2) \in T \times T$ are
1. in strict order ($t_1 \rightsquigarrow t_2$), if in all executions with $t_1$ and $t_2$, $t_1$ occurs before $t_2$;
2. in interleaving order ($t_1 \parallel t_2$), if they can be executed in arbitrary order;
3. exclusive ($t_1 + t_2$), if they are never part of the same execution; and
4. co-occurring ($t_1 \gg t_2$), if the presence of $t_1$ implies the presence of $t_2$.
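The four relations can be derived from a set of traces as sketched below. The sketch assumes the common formulation via a weak order relation (a is weakly before b if a occurs somewhere before b in at least one execution); the function names and the string labels for the relations are hypothetical and chosen for illustration.

```python
from itertools import product

def weak_order(traces):
    """All pairs (a, b) such that a occurs somewhere before b in at least one trace."""
    pairs = set()
    for trace in traces:
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                pairs.add((a, b))
    return pairs

def behavioral_profile(traces):
    """Return (relation, cooccur): an order relation per pair and the co-occurrence pairs."""
    activities = sorted({a for trace in traces for a in trace})
    before = weak_order(traces)
    relation, cooccur = {}, set()
    for a, b in product(activities, repeat=2):
        a_before_b, b_before_a = (a, b) in before, (b, a) in before
        if a_before_b and b_before_a:
            relation[(a, b)] = "interleaving"
        elif a_before_b:
            relation[(a, b)] = "strict"
        elif b_before_a:
            relation[(a, b)] = "reverse strict"
        else:
            relation[(a, b)] = "exclusive"
        # a co-occurs with b if every trace containing a also contains b
        if all(b in trace for trace in traces if a in trace):
            cooccur.add((a, b))
    return relation, cooccur

traces = [["A", "B", "C"], ["A", "C", "B"], ["A", "D"]]
relation, cooccur = behavioral_profile(traces)
```

In this toy example, A is in strict order with B, B and C are interleaved, B and D are exclusive, and B co-occurs with A but not vice versa.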
We chose behavioral profiles as a foundation for the concrete consistency measure-
ments, as they have been applied for various tasks including process monitoring, com-
plex event processing, conformance checking, and most importantly model consistency
assessment [43]. Moreover, they can be computed from heterogeneous inputs. Con-
sidering that each trace represents an execution, they can straightforwardly be derived
from logs. An efficient computation for sound process models [42] derives the profile
from a tree representation of the process model. This computation can easily be adapted
for discovery algorithms that output process trees such as the inductive miner [19]. For
directly-follows graphs with a dedicated start and a dedicated end node, every path
from start to end is an execution. Besides these beneficial properties, behavioral pro-
files might however inaccurately represent behavioral relationships in some cases [27].
Hence, a comparative evaluation of consistency measures is required in future work.
The first type of measurement is the profile-based consistency $\mu_C: \mathcal{P} \to [0,1]$. It
requires the provision of a base profile $C_{T_b,E_b}$. Then, it applies the degree of consistency
metric from [41] to compute a consistency score for $C_{T_b,E_b}$ and a profile $C_{T_d,E_d}$ derived
from a discovered process model. This metric relies on an alignment of the activities
from $T_b$ and $T_d$. It hence allows us to compare profiles at the same and at different
levels of granularity. If two profiles are at the same level of granularity, all activities with
equal labels are aligned. Otherwise, the pipeline includes a log abstraction step in which
fine-grained activities are mapped to higher-level activities e.g., using manually defined
hierarchies or automated label comparison [16]. This mapping defines the alignment.
Based on the alignment, the first step is to determine the sets of aligned activities $T^a_b$ and
$T^a_d$ which contain all activities from $T_b$ and $T_d$ for which the other activity set contains
aligned activities. The metric then determines the count $\gamma$ of activity pairs in $T^a_b \times T^a_b$ and
$T^a_d \times T^a_d$ whose relations defined by $C_{T_b,E_b}$ and $C_{T_d,E_d}$ match the relations of the aligned
activity pairs from the other profile. The relations of two aligned activity pairs
$(t'_b, t''_b) \in T^a_b \times T^a_b$ and $(t'_d, t''_d) \in T^a_d \times T^a_d$ match, if both pairs are in strict order,
interleaving order or exclusive, and either both co-occur or both do not. If an activity pair $(t', t'')$ is aligned with
multiple pairs, then the relations of all these pairs must match the relations of $(t', t'')$.
Finally, $\gamma$ is divided by the number of aligned activity pairs $|T^a_b \times T^a_b| + |T^a_d \times T^a_d|$. In this
work, we primarily use the profile from the baseline model discovered with a specific
pipeline configuration to track the degree to which behavioral relations change when
parameters change. Similar to model-centric quality metrics [9], it is also conceivable
to check whether the discovered model accurately reflects the relations in a log, potentially
produced during pipeline execution.
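For the special case of two profiles at the same level of granularity, the profile-based consistency can be sketched as follows. This is a simplification of the metric described above and not a full implementation: activities are aligned by equal labels only, so each matching pair would be counted once per profile and the ratio reduces to the share of matching pairs among the aligned ones. The profile representation (a dictionary of order relations plus a set of co-occurring pairs), the return value for an empty alignment, and the function name are assumptions.

```python
from itertools import product

def profile_consistency(base, discovered):
    """Share of aligned activity pairs whose relations match in both profiles."""
    (rel_b, co_b), (rel_d, co_d) = base, discovered
    acts_b = {a for pair in rel_b for a in pair}
    acts_d = {a for pair in rel_d for a in pair}
    aligned = acts_b & acts_d              # identity alignment by label
    if not aligned:
        return 0.0                         # assumption: nothing aligned means inconsistent
    pairs = list(product(aligned, repeat=2))
    matches = sum(
        1 for pair in pairs
        if rel_b[pair] == rel_d[pair] and ((pair in co_b) == (pair in co_d))
    )
    return matches / len(pairs)

# Hypothetical profiles over the activities A and B.
co = {("A", "A"), ("B", "B"), ("A", "B"), ("B", "A")}
base = ({("A", "A"): "exclusive", ("B", "B"): "exclusive",
         ("A", "B"): "strict", ("B", "A"): "reverse strict"}, co)
discovered = ({("A", "A"): "exclusive", ("B", "B"): "exclusive",
               ("A", "B"): "interleaving", ("B", "A"): "interleaving"}, co)
score = profile_consistency(base, discovered)
```

Supporting different levels of granularity would additionally require an explicit mapping between fine-grained and higher-level activities, which this sketch leaves out.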
A breakdown of the model topology to investigate more fine-grained aspects can be
achieved by removing activities from the base profile to focus on certain activity sets.
Additionally, the rule-based consistency $\mu_R: \mathcal{P} \to \{0,1\}$ enables analysts to specify
arbitrary rules in terms of boolean expressions which define relations that need to hold
between specific activities, e.g., that an activity $\alpha$ must be in strict order with an activity
$\beta$. The function then returns a value of 1, if the profile derived from the discovered
model adheres to the rule and a value of 0 otherwise. Note that this is similar to the
use of declarative rules which are defined at the level of events and traces, whereas the
rule-based consistency relies on the more abstract level of the behavioral profile.
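A rule-based consistency measurement can be sketched as a boolean predicate over a behavioral profile. The profile representation (a dictionary of order relations plus a set of co-occurring pairs) and the helper names `strict_order`, `co_occurs`, and `all_of` are hypothetical and only serve this illustration.

```python
def strict_order(a, b):
    """Rule: activity a must be in strict order with activity b."""
    return lambda profile: profile[0].get((a, b)) == "strict"

def co_occurs(a, b):
    """Rule: the presence of activity a must imply the presence of activity b."""
    return lambda profile: (a, b) in profile[1]

def all_of(*rules):
    """Rule: every given rule must hold."""
    return lambda profile: all(rule(profile) for rule in rules)

def rule_consistency(profile, rule):
    """Return 1 if the profile adheres to the rule and 0 otherwise."""
    return 1 if rule(profile) else 0

# Hypothetical profile derived from a discovered model.
profile = (
    {("alpha", "beta"): "strict", ("beta", "alpha"): "reverse strict"},
    {("beta", "alpha")},
)
rule = all_of(strict_order("alpha", "beta"), co_occurs("beta", "alpha"))
result = rule_consistency(profile, rule)
```

Because rules are ordinary functions, arbitrary boolean combinations can be built in the same way, for instance a disjunction or a negation of existing rules.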
In the Sepsis example, we observed some inconsistencies at the model and at the
activity level. Here, we focus on three properties for which we analyze the pipeline con-
sistency below in Section 3.3. First, we use the profile-based consistency to evaluate the
model that we obtained, when setting minDuration to 2 days, consolidate to true,
and threshold to 0.2 (I1), see lower right corner of Figure 2. Additionally, we use
the rule-based consistency to diagnose specific inconsistencies that we observed when
varying the parameters in Figure 2. In particular, we check if the registration activities
A and C occur before all other activities (I2), and if the release activities generally occur
at the end of the process (I3). Note that we evaluate all three consistencies based on the
same set of configurations and discovered process models, respectively.
3.3 Analyzing the Property Consistency for the Pipeline
The last step conducts the analyses postulated by the two objectives. We first address
O1 and examine the uncertainty associated with model properties based on the provided
consistency measurements $\{\mu_j\}_{j \le n_m}$. To this end, we compose the discovery pipeline
$\delta_X: \mathcal{X}^{n_x} \to \mathcal{P}^{n_p}$ and each consistency measurement $\mu_j: \mathcal{P}^{n'_{p,j}} \to [0,1]$ to functions
$f_j = \mu_j \circ \delta_X$ that measure the property consistency for models produced by a given
pipeline configuration. This requires that the consistency functions take as many process
models as input as discovered by the pipeline in a single execution, i.e., $n_p = n'_{p,j}$.
For a measurement $\mu_j$, we first calculate the mean consistency
$\bar{f}_j = \frac{1}{k} \sum_{l=1}^{k} f_j(A)_l$
over all $k$ configurations from the configuration matrix $A$ (see Section 3.1). If the mean
consistency is equal or very close to 1 (or 0 respectively), we know that the respective
property is (not) free of constraints and hence generally (in-)valid. In all other cases,
there is uncertainty regarding the conditions that cause inconsistencies and we next
estimate the consistency variance
$\hat{V}(f_j) = \frac{1}{k} \sum_{l=1}^{k} \left( f_j(A)_l - \bar{f}_j \right)^2$. If the variance is close
to 0, we can infer that all pipeline configurations yield similar consistency values and
that there likely is a systematic difference between the property from the baseline model
and the properties of the pipeline output, generally. Such a difference can be explored
by comparing the originally discovered model to a few models generated with different
configurations. Here, the analyst can also resort to restricting the base profile or defining
rule-based consistencies, in order to investigate differences at a more fine-grained level.
Larger variance values indicate that varied pipeline configurations yield process
models with different levels of consistency. To analyze the influence of parameters
as per O2, we compute the total effect index $S^{T}_{i,j}$ for each parameter $X_i$ [13]. It measures
the contribution of parameter $X_i$ to the variance in the consistency measurement $\mu_j$
and considers all variance that is directly caused by $X_i$ and by interactions
with other parameters. As suggested in [35], we here use the estimator from [14]:
$S^{T}_{i,j} = \frac{1}{2k \, \hat{V}(f_j)} \sum_{l=1}^{k} \left( f_j(A)_l - f_j(A_B^{(i)})_l \right)^2$.
This estimator relies on the results of the configuration matrix $A_B^{(i)}$. The higher the value
of the index for a parameter, the more it contributes to the variance in the consistency
measurement. If the sum of the indexes is larger than 1
($\sum_{i=1}^{n_x} S^{T}_{i,j} > 1$), the parameters definitely interact.
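The three estimates used in this step (mean consistency, consistency variance, and the Jansen-style total effect index) can be sketched in a few lines of plain Python. The sketch rests on assumptions that are not taken from the paper: the variance uses a 1/k normalization, `toy_consistency` is a hypothetical stand-in for a composed function f_j, and `replace_column` builds a matrix in the spirit of A_B^(i) by copying A and taking column i from a second sample matrix B.

```python
import random
from statistics import fmean


def mean_consistency(f_a):
    """Mean of the consistency values f_j(A)_l over the k rows of A."""
    return fmean(f_a)


def consistency_variance(f_a):
    """Variance of the consistency values (1/k normalization is an assumption)."""
    mean = fmean(f_a)
    return sum((value - mean) ** 2 for value in f_a) / len(f_a)


def total_effect_index(f_a, f_ab_i):
    """Jansen-style estimate: (1 / 2k) * sum of squared differences over the variance."""
    if len(f_a) != len(f_ab_i):
        raise ValueError("both result vectors need one value per configuration")
    variance = consistency_variance(f_a)
    if variance == 0:
        raise ValueError("variance is zero, so the index is undefined")
    k = len(f_a)
    numerator = sum((a - ab) ** 2 for a, ab in zip(f_a, f_ab_i)) / (2 * k)
    return numerator / variance


def replace_column(a, b, i):
    """Copy of matrix a whose i-th column is taken from matrix b."""
    return [row_a[:i] + [row_b[i]] + row_a[i + 1:] for row_a, row_b in zip(a, b)]


def toy_consistency(config):
    """Hypothetical f_j: the consistency depends on the first parameter only."""
    return 1.0 if config[0] < 0.5 else 0.0


rng = random.Random(0)
k, n_x = 200, 2
A = [[rng.random() for _ in range(n_x)] for _ in range(k)]
B = [[rng.random() for _ in range(n_x)] for _ in range(k)]

f_a = [toy_consistency(row) for row in A]
indexes = [
    total_effect_index(f_a, [toy_consistency(row) for row in replace_column(A, B, i)])
    for i in range(n_x)
]
```

Because the toy function ignores the second parameter, resampling that column never changes the output and its index is exactly zero, whereas the index of the first parameter is positive.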
We conclude by analyzing the pipeline consistency for the Sepsis experiment, considering
the sampling configuration and properties from Sections 3.1 and 3.2. The mean
model consistency (I1) is $\bar{f}_1 = .57$ and for the two rule-based measures (I2, I3) we obtain
mean consistencies of $\bar{f}_2 = .08$ and $\bar{f}_3 = .21$. These low values are in line with our
observations from Figure 2, because they indicate that the behavioral relations in the
baseline model are associated with uncertainty, especially the relations of the registration
and release activities. The variances ($\hat{V}(f_1) = .06$, $\hat{V}(f_2) = .07$, $\hat{V}(f_3) = .16$) point
to non-systematic differences which are attributed to all parameters. That is because
all consistency/parameter combinations yield high total effect indexes on the interval
[.71, .92]. This implies that the handling of the log is not optimal and should be changed,
not least because the indexes reveal that there is significant parameter interaction.
4 Experiment
The primary objective of our experiment is to study whether the framework provides a
reliable foundation for investigating the effects of discovery pipeline operations on the
discovered model and its properties. In the following, we first outline and justify our
experimental design in Section 4.1. After that, we discuss our results in Section 4.2.
4.1 Experimental Design
Uncertainty and sensitivity analysis are mature techniques that have been studied
intensively, e.g., in [13,14,28,29,33,35,36], and hence provide a solid foundation for our
work. Software engineering for machine learning [3] is an emerging topic, and has not
yet been adopted for process mining (see Section 5). Hence, we validate our framework
using a single-case mechanism experiment, a suitable method for investigating
the application of existing technology to a new phenomenon [44, Ch. 18]. To mitigate
the effects of a limited external validity associated with such a design, i.e., the degree
to which the findings can be generalized, we attached great importance to strengthening
the ecological validity, i.e., the realism with which the setup resembles real-world
circumstances, and to minimizing the threat of experimenter bias. Moreover, to ensure
transparency and reproducibility, we followed open science principles by relying on
public data and by publishing our source code. In more detail, we decided to use the
BPIC 2015 dataset from Section 2. It is a highly complex (see Table 1), publicly available,
real-world dataset for which nine independent analysis reports were published.
The latter allows us to set up a representative discovery pipeline based on operations
commonly applied by external parties on this dataset. We merely use the reports to
guide the pipeline setup. It is not our intention to judge the analysts' practices, for
which an exact replication of a pipeline would be required (which is neither desired nor
feasible with the level of detail in the reports). The dataset contains five event logs from
applications for building permits in different Dutch municipalities. Hence, we can reuse
the sample pipeline to analyze our framework in (slightly) varied circumstances.
We first categorized the applied transformation operations from the reports and as-
sembled the three most common operations into the pipeline from the last row of Ta-
ble 3. First, the log preparation loads the log and performs computations that ease the
analysis. That is, the log specifies an activity code which is the activity identifier, but
also contains a sub-process identifier and an order index. As the sub-process identifier
is used for log consolidation, we extract it into a separate feature. Because events were
logged in batch with overlapping timestamps, we follow advice from the BPIC orga-
nizers and establish the execution order based on the order index. After that, we apply
a time window filter to remove traces that started or completed outside a window defined
by the pipeline parameters start date and end date. This operation addresses the drifts
in the log which impact the discovery, and we here consider a time window from sum-
mer 2013 to spring 2014 in which no drift occurred. If parameter activated is set to
true, we perform a consolidation in which we define the sub-process identifier as the
activity classifier. Further, in each trace we only keep the first and last sub-process event
and set the event lifecycle state to started for the first event, and retain completed
for the last. Next, a frequency filter can reduce the complexity of the discovered process
model by selecting events and traces based on the activity and variant frequency.
Lastly, we apply the infrequent lifecycle variant of the inductive miner [19] where the
noise threshold also allows for filtering behavior.
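The first two pipeline operations can be illustrated with a small sketch in plain Python. It is not the authors' code: the event representation (dictionaries with the hypothetical keys `activity`, `order_index`, and `timestamp`) is an assumption, and a trace's start and completion are taken as its earliest and latest timestamp.

```python
from datetime import datetime


def order_by_index(trace):
    """Establish the execution order from the order index instead of the timestamps."""
    return sorted(trace, key=lambda event: event["order_index"])


def time_window_filter(log, start_date, end_date):
    """Keep only traces that started and completed inside [start_date, end_date]."""
    kept = []
    for trace in log:
        if not trace:
            continue
        timestamps = [event["timestamp"] for event in trace]
        if start_date <= min(timestamps) and max(timestamps) <= end_date:
            kept.append(trace)
    return kept


log = [
    [  # batch-logged events with identical timestamps, inside the window
        {"activity": "a", "order_index": 2, "timestamp": datetime(2013, 7, 1)},
        {"activity": "b", "order_index": 1, "timestamp": datetime(2013, 7, 1)},
    ],
    [  # started before the window and is therefore removed
        {"activity": "a", "order_index": 1, "timestamp": datetime(2013, 3, 1)},
        {"activity": "c", "order_index": 2, "timestamp": datetime(2013, 9, 1)},
    ],
]

prepared = [order_by_index(trace) for trace in log]
filtered = time_window_filter(prepared, datetime(2013, 6, 1), datetime(2014, 4, 30))
```

Only the first trace survives the filter, and its events are ordered as b, a according to the order index.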
To systematically study the effects of combining different operations, we vary the
subset of relevant parameters from the above six parameters, and set the remaining
parameters to default values. The relevance of parameters for the variants and their
probability measures are summarized in Table 3. V1 establishes a baseline in which we
only vary the parameters of the time window filter. Here, we expect that the absence of
drifts in the considered period (summer 2013 to spring 2014) guarantees a consistent
discovery for slightly varied start and end dates. To study the impact of model
consolidation, V2 additionally considers the activated parameter. Here, we expect
that the information loss which is inadvertently linked to abstraction leads to a drop
in the consistency, but that the discovered models are largely consistent, as we rely on
a clearly defined process hierarchy. In V3 and V4, we add different ways of behavior
filtering to V1: while both variants utilize the activity frequency, V3 additionally
combines it with the variant frequency and V4 with the noise threshold. We
hypothesize that these filters interact with the time window filter, which influences the
frequencies in the intermediate log. Finally, in V5 all parameters are relevant.

Table 3: Pipeline specification for the experiment including the parameters' empirical (emp)
or uniform (uni) distributions; their relevance (rel) for the variants V1–V5 where a default
value is provided for irrelevant parameters (0, 1, f = false, t = true); and the parameter
values that were used to generate baseline profiles for different consistency measurements.

                | Probabilities          | Variants                | Baseline Profile Generation
Parameter       | Type  From    To       | V1   V2   V3   V4   V5  | norm  abst  simp  mod   comp
start date      | emp   1/5/13  30/6/13  | always relevant         | always set to 1/6/13
end date        | emp   1/4/13  31/5/14  | always relevant         | always set to 30/4/14
activated       | uni   f       t        | f    rel  f    f    rel | f     t     f     f     f
activity freq.  | uni   0       1        | 1    1    rel  rel  rel | 1     1     .2    .35   .5
variant freq.   | uni   0       1        | 1    1    rel  1    rel | always 1
threshold       | uni   0       1        | 0    0    0    rel  rel | always 0
Pipeline (last row, labels as recovered from the diagram): BPIC 2015 Pipeline: Log Preparation;
Time Window Filter (start date, end date); Frequency Filter (activity freq., variant freq.);
Inductive Miner
To investigate the pipeline consistency, we focus on the overall model consistency
using the profile-based consistency. In this regard, different baseline models and thus
base profiles emulate different degrees of complexity of discovered models (see Table 3).
All profiles are derived from the log for the default time window. The normative
(norm) profile has the highest complexity. It is discovered directly from the default time
window log and used for all variants. For V2, we also use an abstract (abst) profile
obtained by activating the consolidation. Lastly, for V3 and V4 we aimed to replicate
different model complexities in line with the model complexities found in the reports
(Table 1). We generate the simple (simp), moderate (mod), and complex (comp) profiles
by varying the activity frequency to obtain models with 10, 20 and 35 of the
most frequent activities. We did not use the variant frequency or noise threshold,
as their effects on the model complexity differed across the five logs. Yet, the
profile-based consistency still allows us to assess their influence on the discovery results.
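The idea of restricting a log to its most frequent activities before discovering a baseline model can be sketched as follows. This is a simplification under stated assumptions: traces are plain lists of activity labels, and the hypothetical parameter `n` is an absolute number of activities, whereas the pipeline parameter in Table 3 is a relative frequency between 0 and 1.

```python
from collections import Counter


def keep_most_frequent_activities(log, n):
    """Drop all events whose activity is not among the n most frequent activities."""
    counts = Counter(activity for trace in log for activity in trace)
    frequent = {activity for activity, _ in counts.most_common(n)}
    filtered = [[activity for activity in trace if activity in frequent] for trace in log]
    # Traces that lose all of their events are removed from the log.
    return [trace for trace in filtered if trace]


log = [["a", "b", "c"], ["a", "b", "d"], ["a", "b"], ["a"], ["d"]]
simple_log = keep_most_frequent_activities(log, 2)
```

With `n = 2`, only the activities a and b remain and the last trace disappears because it contains neither of them.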
4.2 Results
Fig. 4: Mean consistencies (dot) and variances (error bars) for pipeline variants (one panel
per log, Municipality 1 to Municipality 5; horizontal axis from 0.0 to 1.0)

In the analysis, we considered a sample size of k = 1,000 for all combinations of
pipeline variants and consistency measurements. To ensure that this sample size yields
reliable results, we first investigated the convergence of the mean consistencies,
variances, and total effect indexes. That is, we computed the values that we obtain for these
measures for sample sizes less than 1,000 and observed that for sample sizes larger than
500, all measures yield values that are very close to the respective values obtained for
k = 1,000 on all five logs for all variant/measurement combinations. While this ensures
the reliability of our experiment, it also demonstrates that measuring the convergence
of the values is a strategy to control the number of pipeline executions in real-world
situations. We did not investigate the run-time performance explicitly, but observed that
the inductive miner accounted for a large part of the execution time and that its
performance depended (unsurprisingly) on the complexity of the input log. To compute all
metrics per variant and dataset on a customary laptop (Processor: i5-8350U 1.70 GHz;
RAM: 16 GB) using parallel execution, we observed execution times between one and
two hours for V1, but below 5 min for V3–V5, due to complexity reductions in the
intermediary logs. Note that this is only a rough indication of the run-time performance,
for which we leave deeper investigation and optimization to future work.
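The convergence check described in this paragraph can be sketched for the mean consistency as follows. The function name, the candidate sample sizes, and the tolerance are hypothetical choices for illustration; the same pattern would apply to the variance and the total effect indexes.

```python
from statistics import fmean


def smallest_converged_size(values, sizes, tolerance):
    """Smallest sample size from which every larger prefix stays within tolerance
    of the estimate obtained on the full sample; None if no such size exists."""
    reference = fmean(values)
    converged = None
    for size in sorted(sizes):
        if abs(fmean(values[:size]) - reference) <= tolerance:
            if converged is None:
                converged = size
        else:
            converged = None  # a later prefix drifted away again
    return converged


# Toy consistency values in the order in which configurations were executed.
values = [0.0] * 10 + [1.0] * 10 + [0.5] * 80
converged_size = smallest_converged_size(values, [10, 20, 50, 100], tolerance=0.01)
```

In this toy series the prefix of size 10 has a mean of 0.0, while the prefixes of size 20, 50, and 100 all have a mean of 0.5, so the check reports 20 as the smallest sufficient sample size.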
We first investigate the uncertainty for each variant and consistency combination,
see Figure 4. A first observation is that the consistency of the normative model is very
high ($\bar{f}_j > .9$) for V1. This is in line with our expectations, as we knew from the reports
that the considered period does not contain drifts. Slight variations in the model can
be attributed to a few outlier cases that might occur around the default start and end
date. For V2 we also confirm our expectations, as the model consistency drops ($\bar{f}_j > .7$)
due to some information loss caused by the consolidation, but is still high. Note that
this holds for the abstracted and the normative model, indicating that log abstraction
is a reliable means for complexity management. Lastly, the variants that apply filtering
(V3–V5) yield very low consistency measures ($\bar{f}_j < .5$). While we expected some
interaction with other parameters, we were surprised by the magnitude of the effect of this
interaction. However, this observation is in line with guidelines from [11] that call for
carefully applying random subset selection, as it (in contrast to strategic selection, like
the time window filter) can affect the quality of the discovered model. We consider the
filter parameters from V3–V5 to fall in this category, as it is hard for analysts to predict
the effects of certain value combinations. Moreover, the negative effects pertain to all base
profiles, which shows that the filters affect a large range of the relations and that a broad
range of possible behavior can be generated by modifying the respective parameters.
Overall, the coherence of our expectations and existing guidelines with the experiment
results substantiates the reliability of the consistency measurement.
For the sensitivity analysis, we focused on the three variants with filtering (V3–V5)
and the normative base profile, which overall yielded the largest variance across all
logs. The total effect indexes for all parameters per variant and log are shown in Figure 5
where higher values for a parameter indicate a stronger contribution of this parameter
to the variance. In line with the uncertainty analysis for the variants, the total effect
indexes show the frequency and threshold parameters to contribute the most to the
uncertainty in the model topology. This provides evidence towards the utility of the
sensitivity analysis: an analyst can determine the most influential parameters without
manually inspecting possible parameter or pipeline variations. Another interesting finding
is that the time window filter and consolidation parameters, which without filtering
only slightly impacted the consistency, have a stronger influence in variants V3–V5. This
demonstrates that analysts need to carefully assemble discovery pipelines and cannot
assume that a 'stable' operation can be straightforwardly reused in other contexts.

Fig. 5: Total effect indexes for the normative profile and variants V3–V5 (one panel per log,
Municipality 1 to Municipality 5; horizontal axis from 0.0 to 1.0; one row per pipeline
parameter)
5 Related Work
Research has studied issues related to data quality and quantity, in order to ensure that
high quality process models can be obtained from event logs. Classifications of data
quality issues [8] and data quality patterns for event logs [40] allow for systematic
cleaning of event logs to increase process mining result quality. Fitness, precision,
generalization, and simplicity have been adopted as metrics to evaluate the quality of a
process model based on the event log that served as the input for a process discovery
algorithm [1]. Conformance checking provides further details about if and how an
event log deviates from a process model for qualitative evaluation [12,30]. Also, methods
have been proposed to balance the behavioral quality of a discovered process model
with its complexity, in order to facilitate human inspection. For example, in [20] event
attributes are used to generate hierarchical process models that better represent different
levels of process granularity. A statistical pre-processing framework for event logs that
reduces the amount of data needed to produce high quality process models is presented
in [7]. Similarly, the influence of subset selection on the model quality was examined
in [11] where it was shown that, in contrast to random-based selection, strategic subset
selection increases the model quality. The taxonomy of log and model uncertainty
from [26] considers issues like incorrectness, coarseness, and ambiguity, and allows for
obtaining upper and lower uncertainty bounds for conformance checking.
Related work also proposed approaches for automatically extracting and evaluating
process discovery insights. An automatic approach that compares different process variants
with the goal to obtain valuable insights is introduced in [6]. In more detail, the best
and worst-performing variants with respect to a set of key performance indicators are
determined and their differences are presented to the analyst. ProcessExplorer [38]
automatically computes potential subsets of cases and evaluates the interestingness based
on statistical differences between insights from the subsets and from the entire event
log. Leemans et al. [21] introduce an automatic extraction approach to obtain cohorts
from event logs via trace attribute analysis. The authors measure the stochastic distance
between trace attribute cohorts to identify their influence on the process model behavior.
Complementary to these techniques, patterns, and guidelines, our consistency framework
enables analysts to explicate, in a concrete context, how the decisions that underlie
the configuration of a discovery pipeline, including its log transformations and
discovery algorithms, affect model properties at different granularity levels.
6 Conclusion
In this work we presented a first framework for debugging of process discovery pipelines.
We demonstrated the potential effects of pipeline operations on the discovered models
and discussed the implications for downstream decision making. Next, we proposed a
debugging framework which relies on uncertainty and sensitivity analysis, in order to
assist analysts in assessing the consistency of their insights and to quantify the contribu-
tion of pipeline parameters to potential inconsistencies. In an experiment on real-world
event logs, we assessed the utility of our framework and found that the uncertainties
and explanations delivered by the framework were well-grounded.
As mentioned in Section 3.2, comparative evaluations of consistency measures are
required to improve the framework's applicability. Beyond that, research opportunities
ensue specifically regarding its usability, computational performance, and broader
application and evaluation. Usability topics comprise suitable user interfaces for tools,
but also the generalization towards other process mining methods including declarative
process mining; support for determining relevant parameters (e.g., via screening [36])
and their probability distributions; and means to diagnose and break down inconsistencies.
Moreover, repeatedly executing a pipeline for different configurations can be
time-consuming. While screening methods can help to reduce the number of relevant
parameters, integrated uncertainty propagation [24] or emulators [36] might speed up
the analysis. Lastly, applying the framework to a larger set of real-world scenarios could
potentially reveal and confirm (anti-)patterns for process mining pipelines [40].
In general, we believe that applying software engineering practices, as proposed in
the context of machine learning [3], is relevant for process mining as well. While tradi-
tionally process mining techniques have been made available via visual idioms which
combine visual representations and user interaction techniques, packages like BupaR
and pm4py have brought process mining to open data processing environments like R,
Python, Apache Spark, etc. This enables a paradigm shift towards script-based analy-
sis, where the ability to seamlessly integrate data processing, data mining, and machine
learning techniques and tools can ease the definition, execution, documentation, and
sharing of process mining pipelines, and reduce their fragmentation. In this regard, chal-
lenges from machine learning include testing, experiment management, transparency,
and troubleshooting [4]. Empirical studies into the practices of process analysts, such
as [18], can help to refine those challenges in the context of process mining.
References
1. van der Aalst, W.: Process Mining: Data Science in Action. Springer (2016)
2. Adriansyah, A., Buijs, J.C.A.M.: Mining process performance from event logs. In: BPM
Workshops. pp. 217–218 (2013)
3. Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Kamar, E., Nagappan, N., Nushi, B.,
Zimmermann, T.: Software engineering for machine learning: A case study. In: ICSE SEIP.
pp. 291–300 (2019)
4. Arpteg, A., Brinne, B., Crnkovic-Friis, L., Bosch, J.: Software engineering challenges of
deep learning. In: SEAA. pp. 50–59 (2018)
5. Augusto, A., Conforti, R., Dumas, M., La Rosa, M., Polyvyanyy, A.: Split miner: automated
discovery of accurate and simple business process models from event logs. Knowl Inf Syst
59, 251–284 (2019)
6. Ballambettu, N.P., Suresh, M.A., Bose, R.P.J.C.: Analyzing process variants to understand
differences in key performance indices. In: CAISE. pp. 298–313 (2017)
7. Bauer, M., Senderovich, A., Gal, A., Grunske, L., Weidlich, M.: How much event data is
enough? a statistical framework for process discovery. In: CAISE. pp. 239–256 (2018)
8. Bose, R.P.J.C., Mans, R.S., Van Der Aalst, W.M.P.: Wanna improve process mining results?
In: IEEE SSCI. pp. 127–134 (2013)
9. Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: Quality dimensions in process
discovery: The importance of fitness, precision, generalization and simplicity. Int J Coop Inf
Syst 23(01), 1440001 (2014)
10. van Eck, M.L., Lu, X., Leemans, S.J.J., van der Aalst, W.M.P.: PM2: A process mining
project methodology. In: CAISE. pp. 297–313 (2015)
11. Fani Sani, M., van Zelst, S.J., van der Aalst, W.M.P.: The Impact of Event Log Subset Selec-
tion on the Performance of Process Discovery Algorithms. In: ADBIS. pp. 391–404 (2019)
12. García-Bañuelos, L., van Beest, N.R.T.P., Dumas, M., Rosa, M.L., Mertens, W.: Complete
and interpretable conformance checking of business processes. IEEE Trans Softw Eng 44(3),
262–290 (2018)
13. Homma, T., Saltelli, A.: Importance measures in global sensitivity analysis of nonlinear mod-
els. Reliab Eng Syst Saf 52(1), 1–17 (1996)
14. Jansen, M.J.W.: Analysis of variance designs for model output. Comput Phys Commun
117(1), 35–43 (1999)
15. Kalenkova, A., Polyvyanyy, A., La Rosa, M.: A framework for estimating simplicity of au-
tomatically discovered process models based on structural and behavioral characteristics. In:
BPM. pp. 129–146 (2020)
16. Klinkmüller, C., Weber, I.: Every apprentice needs a master: Feedback-based effectiveness
improvements for process model matching. Inf Syst 95, 101612 (2021)
17. Klinkmüller, C., van Beest, N.R.T.P., Weber, I.: Towards reliable predictive process
monitoring. In: CAISE Forum. pp. 163–181 (2018)
18. Klinkmüller, C., Müller, R., Weber, I.: Mining process mining practices: An exploratory
characterization of information needs in process analytics. In: BPM. pp. 322–337 (2019)
19. Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured process
models from event logs - a constructive approach. In: Petri Nets. pp. 311–329 (2013)
20. Leemans, S.J.J., Goel, K., Van Zelst, S.J.: Using multi-level information in hierarchical pro-
cess mining: Balancing behavioural quality and model complexity. In: ICPM. pp. 137–144
21. Leemans, S.J.J., Shabaninejad, S., Goel, K., Khosravi, H., Sadiq, S., Wynn, M.T.: Identifying
Cohorts: Recommending Drill-Downs Based on Differences in Behaviour for Process
Mining. In: ER. pp. 92–102 (2020)
22. Maggi, F.M., Di Francescomarino, C., Dumas, M., Ghidini, C.: Predictive monitoring of
business processes. In: CAISE. pp. 457–472 (2014)
23. Mannhardt, F., Blinde, D.: Analyzing the trajectories of patients with sepsis using process
mining. In: BPMDS. pp. 72–80 (2017)
24. Manousakis, I., Goiri, Í., Bianchini, R., Rigo, S., Nguyen, T.D.: Uncertainty propagation
in data processing systems. In: ACM SoCC. pp. 95–106 (2018)
25. Mariscal, G., Marbán, Ó., Fernández, C.: A survey of data mining and knowledge discovery
process models and methodologies. Knowl Eng Rev 25(2), 137–166 (2010)
26. Pegoraro, M., van der Aalst, W.M.P.: Mining uncertain event data in process mining. In:
ICPM. pp. 89–96 (2019)
27. Polyvyanyy, A., Armas-Cervantes, A., Dumas, M., García-Bañuelos, L.: On the expressive
power of behavioral profiles. Formal Aspects of Computing 28(4), 597–613 (2016)
28. Puy, A., Lo Piano, S., Saltelli, A.: Is VARS more intuitive and efficient than Sobol' indices?
Environ Model Softw 137, 104960 (2021)
29. Razavi, S., Gupta, H.V.: A new framework for comprehensive, robust, and efficient global
sensitivity analysis: 1. theory. Water Resour Res 52(1), 423–439 (2016)
30. Rozinat, A., van der Aalst, W.M.P.: Conformance checking of processes based on monitoring
real behavior. Inf Syst 33(1), 64–95 (2008)
31. Sacha, D., Senaratne, H., Kwon, B.C., Ellis, G., Keim, D.A.: The role of uncertainty, aware-
ness, and trust in visual analytics. IEEE Trans Vis Comput Graph 22(1), 240–249 (2016)
32. Sacha, D., Stoffel, A., Stoffel, F., Kwon, B.C., Ellis, G., Keim, D.A.: Knowledge generation
model for visual analytics. IEEE Trans Vis Comput Graph 20(12), 1604–1613 (2014)
33. Saltelli, A.: Making best use of model evaluations to compute sensitivity indices. Comput
Phys Commun 145(2), 280–297 (2002)
34. Saltelli, A., Aleksankina, K., Becker, W., Fennell, P., Ferretti, F., Holst, N., Li, S., Wu, Q.:
Why so many published sensitivity analyses are false: A systematic review of sensitivity
analysis practices. Environ Model Softw 114, 29–39 (2019)
35. Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., Tarantola, S.: Variance based
sensitivity analysis of model output. design and estimator for the total sensitivity index. Com-
put Phys Commun 181(2), 259–270 (2010)
36. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M.,
Tarantola, S.: Global Sensitivity Analysis. The Primer. Wiley (2008)
37. Sargent, R.G.: Verification and validation of simulation models. J Simul pp. 12–24 (2013)
38. Seeliger, A., Sánchez Guinea, A., Nolle, T., Mühlhäuser, M.: ProcessExplorer: Intelligent
process mining guidance. In: BPM (2019)
39. Sobol, I.M.: Uniformly distributed sequences with an additional uniform property. USSR
Comput Math Math Phys 16(5), 236–242 (1976)
40. Suriadi, S., Andrews, R., ter Hofstede, A.H.M., Wynn, M.T.: Event log imperfection patterns
for process mining: Towards a systematic approach to cleaning event logs. Inf Syst 64, 132–
150 (2017)
41. Weidlich, M., Mendling, J., Weske, M.: Efficient consistency measurement based on
behavioral profiles of process models. IEEE Trans Softw Eng 37(3), 410–429 (2011)
42. Weidlich, M., Polyvyanyy, A., Mendling, J., Weske, M.: Efficient computation of causal
behavioural profiles using structural decomposition. In: Petri Nets. pp. 63–83 (2010)
43. Weidlich, M., Polyvyanyy, A., Mendling, J., Weske, M.: Causal behavioural profiles - efficient
computation, applications, and evaluation. Fundam Inf 113(3–4), 399–435 (2011)
44. Wieringa, R.J.: Design Science Methodology for Information Systems and Software Engi-
neering. Springer (2014)
45. Yang, K., Huang, B., Stoyanovich, J., Schelter, S.: Fairness-aware instrumentation of prepro-
cessing pipelines for machine learning. In: HILDA (2020)
ResearchGate has not been able to resolve any citations for this publication.
Full-text available
The Variogram Analysis of Response Surfaces (VARS) has been proposed by Razavi and Gupta as a new comprehensive framework in sensitivity analysis. According to these authors, VARS provides a more intuitive notion of sensitivity and it is much more computationally efficient than Sobol’ indices. Here we review these arguments and critically compare the performance of VARS-TO, for total-order index, against the total-order Jansen estimator. We argue that, unlike classic variance-based methods, VARS lacks a clear definition of what an “important” factor is, and we show that the alleged computational superiority of VARS does not withstand scrutiny. We conclude that while VARS enriches the spectrum of existing methods for sensitivity analysis, especially for a diagnostic use of mathematical models, it complements rather than replaces classic estimators used in variance-based sensitivity analysis.
Full-text available
A plethora of algorithms for automatically discovering process models from event logs has emerged. The discovered models are used for analysis and come with a graphical flowchart-like representation that supports their comprehension by analysts. According to the Occam’s Razor principle, a model should encode the process behavior with as few constructs as possible, that is, it should not be overcomplicated without necessity. The simpler the graphical representation, the easier the described behavior can be understood by a stakeholder. Conversely, and intuitively, a complex representation should be harder to understand. Although various conformance checking techniques that relate the behavior of discovered models to the behavior recorded in event logs have been proposed, there are no methods for evaluating whether this behavior is represented in the simplest possible way. Existing techniques for measuring the simplicity of discovered models focus on their structural characteristics such as size or density, and ignore the behavior these models encoded. In this paper, we present a conceptual framework that can be instantiated into a concrete approach for estimating the simplicity of a model, considering the behavior the model describes, thus allowing a more holistic analysis. The reported evaluation over real-life event logs for several instantiations of the framework demonstrates its feasibility in practice.
Full-text available
Process models are a central element of modern business process management technology. When adopting such technology, organizations inevitably establish process model collections which, depending on the degree of adoption, can reach sizes of thousands of models. Process model matching techniques are intended to assist experts in the management of such large collections, e.g., in querying the collections and in comparing process models. Yet, as demonstrated in comparative evaluations, existing techniques struggle to achieve high effectiveness on real-world datasets, limiting their practical applicability. This is partly due to these techniques being fully automated and relying on universal knowledge bases that insufficiently represent the domain semantics of model collections. To increase effectiveness and to progress on the path to practical applicability, we pursue the idea of integrating expert feedback into the matching process, so as to continuously update the knowledge base and achieve a better domain adaptation. In particular, we present ADBOT, a matching technique that relies on expert feedback in terms of corrected matching results. Our contributions are twofold. First, we introduce different strategies to utilize expert feedback in the matching process and to improve its effectiveness. Second, we provide heuristics for guiding experts through a model collection, intended to reduce the amount of collected feedback while still maximizing the gains of learning from it. Based on five separate real-world datasets, we provide empirical evidence towards the feasibility of our matcher. In the experiments, ADBOT (i) achieves high F-measures of up to 0.90, (ii) improves the effectiveness of baseline matchers by up to 88%, (iii) yields high recall values due to the detection of correspondences that automated matchers fail to detect, and (iv) still increases effectiveness when the feedback contains errors. We also discuss evidence that substantiates ADBOT’s individual components, among other things demonstrating that the guidance heuristics can maximize effectiveness while minimizing human effort.
Process discovery algorithms automatically discover process models on the basis of event data captured during the execution of business processes. These algorithms tend to use all of the event data to discover a process model. When dealing with large event logs, this is no longer feasible on standard hardware in limited time. A straightforward approach to overcome this problem is to down-size the event data by means of sampling. However, little research has been conducted on selecting the right sample, given the available time and the characteristics of the event data. This paper evaluates the performance of various subset selection methods on real event data. The proposed methods have been implemented in both the ProM and the RapidProM platforms. Our experiments show that it is possible to speed up discovery considerably using ranking-based strategies. Furthermore, the results show that biased selection of process instances yields process models of higher quality than random selection.
Many business process management activities benefit from the investigation of event data. Thus, research, foremost in the field of process mining, has focused on developing appropriate analysis techniques, visual idioms, methodologies, and tools. Despite the enormous effort, the analysis process itself can still be fragmented and inconvenient: analysts often apply various tools and ad-hoc scripts to satisfy information needs. Therefore, our goal is to better understand the specific information needs of process analysts. To this end, we characterize and examine domain problems, data, analysis methods, and visualization techniques associated with visual representations in 71 analysis reports. We focus on the representations, as they are of central importance for understanding and conveying information derived from event data. Our contribution lies in the explication of the current state of practice, enabling the evaluation of existing as well as the creation of new approaches and tools against the background of actual, practical needs.
Conference Paper
Nowadays, more and more process data are automatically recorded by information systems and made available in the form of event logs. Process mining techniques enable process-centric analysis of data, including automatically discovering process models and checking whether event data conform to a certain model. In this paper, we analyze the previously unexplored setting of uncertain event logs: logs where quantified uncertainty is recorded together with the corresponding data. We define a taxonomy of uncertain event logs and models, and we examine the challenges that uncertainty poses for process discovery and conformance checking. Finally, we show how upper and lower bounds for conformance can be obtained by aligning an uncertain trace onto a regular process model.
Conference Paper
Surprisingly promising results have been achieved by deep learning (DL) systems in recent years. Many of these achievements have been reached in academic settings, or by large technology companies with highly skilled research groups and advanced supporting infrastructure. For companies without large research groups or advanced infrastructure, building high-quality production-ready systems with DL components has proven challenging. There is a clear lack of well-functioning tools and best practices for building DL systems. It is the goal of this research to identify what the main challenges are, by applying an interpretive research approach in close collaboration with companies of varying size and type. A set of seven projects has been selected to describe the potential of this new technology and to identify the associated main challenges. A set of 12 main challenges has been identified and categorized into the three areas of development, production, and organizational challenges. Furthermore, a mapping between the challenges and the projects is defined, together with selected motivating descriptions of how and why the challenges apply to specific projects. Compared to other areas such as software engineering or database technologies, it is clear that DL is still rather immature and in need of further work to facilitate development of high-quality systems. The challenges identified in this paper can be used to guide future research by the software engineering and DL communities. Together, we could enable a large number of companies to start taking advantage of the high potential of the DL technology.
Conference Paper
Large amounts of data are collected in event logs from information systems, reflecting the actual execution of business processes. Due to high competitive pressure in the market, organizations are particularly interested in optimizing their processes. Process mining enables the extraction of valuable knowledge from event logs, such as deviations, bottlenecks, and anomalies. Due to the increase of process complexity in flexible environments, visual exploration is becoming increasingly challenging. In this paper, we propose ProcessExplorer, an interactive process mining approach to enable fast data analysis and exploration. ProcessExplorer takes an event log as input to automatically suggest subsets of similar process behavior, evaluate each subset, generate interesting insights, and suggest the subsets with the most interesting characteristics. We implemented our approach in an interactive visual exploration system, which we use as part of a user study conducted to evaluate our approach. Our results show that ProcessExplorer can be successfully applied to analyze and explore real-life data sets efficiently.