Probability Estimation of Uncertain Process Trace Realizations

Marco Pegoraro¹, Bianka Bakullari¹, Merih Seran Uysal¹, and Wil M.P. van der Aalst¹

¹Chair of Process and Data Science (PADS), Department of Computer Science, RWTH Aachen University, Aachen, Germany

{pegoraro, bianka.bakullari, uysal, vwdaalst}@pads.rwth-aachen.de

Abstract

Process mining is a scientific discipline that analyzes event data, often collected in databases called event logs. Recently, uncertain event logs have become of interest, which contain non-deterministic and stochastic event attributes that may represent many possible real-life scenarios. In this paper, we present a method to reliably estimate the probability of each of such scenarios, allowing their analysis. Experiments show that the probabilities calculated with our method closely match the true chances of occurrence of specific outcomes, enabling more trustworthy analyses on uncertain data.

Keywords: Process Mining · Uncertain Data · Partial Order.

Colophon

This work is licensed under a Creative Commons “Attribution-NonCommercial 4.0 International” license.

© the authors. Some rights reserved.

This document is an Author Accepted Manuscript (AAM) corresponding to the following scholarly paper:

Pegoraro, Marco et al. “Probability Estimation of Uncertain Process Trace Realizations”. In: International Workshop

on Event Data and Behavioral Analytics (EdbA). Springer, 2021

Please, cite this document as shown above.

Publication chronology:

• 2021-06-15: abstract submitted to the International Conference on Process Mining (ICPM) 2021, main track

• 2021-07-01: full text submitted to the International Conference on Process Mining (ICPM) 2021, main track

• 2021-08-16: notification of rejection

• 2021-08-17: abstract submitted to the International Workshop on Event Data and Behavioral Analytics (EdbA) 2021

• 2021-08-20: full text submitted to the International Workshop on Event Data and Behavioral Analytics (EdbA) 2021

• 2021-09-16: notification of acceptance

• 2021-09-22: camera-ready version submitted

• 2021-11-01: presented

• 2022-03-24: proceedings published

The published version referred above is © Springer.

Correspondence to:

Marco Pegoraro, Chair of Process and Data Science (PADS), Department of Computer Science,

RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany

Website: http://mpegoraro.net/ · Email: pegoraro@pads.rwth-aachen.de · ORCID: 0000-0002-8997-7517

Content: 16 pages, 7 figures, 4 tables, 11 references. Typeset with pdfLaTeX, Biber, and BibLaTeX.

Please do not print this document unless strictly necessary.

M. Pegoraro et al. Probability Estimation of Uncertain Trace Realizations

1 Introduction

Process mining is a discipline that focuses on extracting insights about processes in a

data-driven manner. For instance, on the basis of the recorded information on historical

process executions, process mining makes it possible to automatically extract a model of the

behavior of process instances, or to measure the compliance of the process data with a

prescribed normative model of the process. In process mining, the central focus is on the

event log, a collection of data that tracks past process instances. Every activity performed

in a process is recorded in the event log, together with information such as the corre-

sponding process case and the timestamp of the activity, in a sequence of events called a

trace.

Recently, research on novel forms of event data has garnered the attention of the scientific community. Among these are uncertain event logs, which contain data affected by imprecision [8]. This data contains meta-information describing the nature and extent of the uncertainty. Such meta-information can be obtained from the inherent precision with which the data has been recorded (e.g., timestamps only indicating the date have a possible “true value” range of 24 hours), from the precision of the tools involved in supporting the process (e.g., the absolute error of sensors), or from the domain knowledge provided by a process expert. An uncertain trace corresponds to multiple possible real-life scenarios, each of which might have very diverse implications on features of cases such as compliance to a model. It is then important to be able to assess the risk of occurrence of specific outcomes of uncertain traces, which makes it possible to estimate the impact of such traces on indicators such as cost and conformance.

In this paper, we present a method to obtain a complete probability distribution

over the possible instantiations of uncertain attributes in a trace. As a possible example

of application, we frame our results in the context of conformance checking, and show

the impact of assessing probability estimates for uncertain traces on insights about the

compliance of an uncertain trace to a process model. We validate our method with experiments

based on a Monte Carlo simulation, which shows that the probability estimates

are reliable and reflect the true chances of occurrence of a specific outcome.

The remainder of the paper is structured as follows. Section 2 examines relevant related work. Section 3 illustrates a motivating running example for our technique. Section 4 presents preliminary definitions of different types of uncertainty in process mining. Section 5 illustrates a method for computing probabilities of realizations for uncertain process traces. Section 6 validates our method through experimental results. Finally, Section 7 concludes the paper.


2 Related Work

The analysis of uncertain data in process mining is a very recent research direction. The specific formulation and definition of uncertain data utilized in this paper was introduced in 2019 [8], in the context of an analysis approach that computes bounds for the conformance score of uncertain traces through alignments [5]. Subsequently, that work has been extended with an inductive mining approach for process discovery over uncertainty [10] and a taxonomy of different types of uncertain data, with their characteristics [9].

Uncertain data, as formulated in our present and previous work, is closely related to a

considerably more studied data anomaly in process mining: partially ordered event data.

In fact, uncertain data as described here is a generalization of partially ordered traces. Lu

et al. [7] proposed a conformance checking approach based on alignments to measure

conformance of partially ordered traces. More recently, Van der Aa et al. [1] illustrated a

method for inferring a linear extension, i.e., a compliant total order, of events in partially

ordered traces, based on examples of correct orderings extracted from other traces in the

log. Busany et al. [4] estimated probabilities for partially ordered events in IoT event

streams.

An associated topic, which draws from disciplines such as pattern and sequence min-

ing and is antithetical to the analysis of partially ordered data, is the inference of partial

orders from fully sequential data as a way to model its behavior. This goes under the

name of episode mining, which can be performed with many techniques both on batched

data and with online streams of events [11,6,2].

In this paper, we present a method to estimate the likelihood of any scenario in

an uncertain setting, which covers partially ordered traces as well as other types of un-

certainty illustrated in the taxonomy [9]. Furthermore, we will cover both the non-

deterministic case (strong uncertainty) and the probabilistic case (weak uncertainty).

3 Running Example

In this section, we will provide a running example of an uncertain process instance related to a sample process. We will then apply our probability estimation method to this uncertain trace, to illustrate its operation. The example we analyze here is a simplified generalization of a remote credit card fraud investigation process. This process is visualized by the Petri net in Figure 1.

Firstly, the credit card owner alerts the credit card company of a possibly fraudulent transaction. The customer may either notify the company by calling their hotline (alert hotline) or arrange an urgent meeting with personnel of the bank that issued the credit card (alert bank). In both scenarios, the customer's credit is frozen (freeze credit) to prevent further fraud. All information provided by the customer about the transaction is summarized when filing the formal report (file report). As a next step, the credit card company tries to contact the merchant that charged the credit card. If this happens (contact merchant), the credit card company clarifies whether there has been just a mistake on the merchant’s side (e.g., the merchant charging without delivering a product, or a billing mistake). In such cases, the customer gets a refund from the merchant and the case is closed. Another outcome might be the discovery of a friendly fraud, which is when a cardholder makes a purchase and then disputes it as fraud even though it was not. If contacting the merchant is impossible, a fraud investigation is initiated. In this case, fraud investigators will usually start with the transaction data and look for timestamps, geolocation, IP addresses, and other elements that can be used to prove whether or not the cardholder was involved in the transaction. The outcome might be either friendly fraud or true fraud. True fraud can also happen when both the merchant and the cardholder are affected by the fraud. In this case, the cardholder receives a refund from the credit institute (activity refund credit institute) and the case is closed.

Figure 1: A Petri net model of the credit card fraud investigation process. This net allows for 10 possible traces.

Note that for simplicity, we have used single letters to represent the activity labels in the Petri net transitions. Some possible traces in this process are for example: $\langle h, c, r, m, u \rangle$, $\langle b, c, r, m, f \rangle$, $\langle h, c, r, i, f \rangle$ and $\langle b, c, r, i, t, v \rangle$.

Suppose that the credit card company wants to perform conformance checking to

identify deviant process instances. However, some traces in the information system of

the company are affected by uncertainty, such as the one in Table 1.

Suppose that in the first half of October 2020, the company was implementing a new system for automatic event data generation. During this time, the event data regarding the credit card fraud investigation process often had to be inserted manually by the employees. Such manual recordings were subject to inaccuracies, leading to imprecise or missing data affecting the cases during this period. The process instance from Table 1 is one of the affected instances. Here, events $e_2$, $e_3$, $e_5$, $e_6$ are uncertain. The timestamp of event $e_2$ is not precise enough, so the possible timestamp lies between 06-10-2020 00:00 and 06-10-2020 23:59. Event $e_3$ has happened some time between 20:00 on October 5th and 10:00 on October 6th. Event $e_5$ has two possible activity labels: $f$ with probability 0.3 and $t$ with probability 0.7. Refunding the customer (event $e_6$) has been recorded in the system, but the customer has not received the money yet, which is why the event is indeterminate: this is indicated with a question mark (?) in the rightmost column, which marks an event that has been recorded, but for which it is unclear whether it actually occurred in reality.

Table 1: Example of an uncertain case from the credit card fraud investigation process.

| Case ID | Event ID | Activity | Timestamp | Ind. |
|---|---|---|---|---|
| 5167 | $e_1$ | $h$ (alert hotline) | 05-10-2020 23:00 | |
| 5167 | $e_2$ | $c$ (freeze credit) | 06-10-2020 | |
| 5167 | $e_3$ | $r$ (file report) | U(05-10-2020 20:00, 06-10-2020 10:00) | |
| 5167 | $e_4$ | $i$ (fraud investigation) | 09-10-2020 10:00 | |
| 5167 | $e_5$ | {$f$: 0.3 (friendly fraud), $t$: 0.7 (true fraud)} | 14-10-2020 09:00 | |
| 5167 | $e_6$ | $v$ (refund credit institute) | 15-10-2020 10:00 | ? |

The credit card company is interested in understanding if and how the data in this uncertain trace conforms with the normative process model, and the extent of the actual compliance risk; they are specifically interested in knowing whether a severely non-compliant scenario is highly likely. In the remainder of the paper, we will describe a method able to estimate the probability of all possible outcome scenarios.

4 Preliminaries

Let us now present some preliminary definitions regarding uncertain event data.

Definition 1 (Uncertain attributes). Let $U$ be the universe of attribute domains, and the set $D \in U$ be an attribute domain. Any $D \in U$ is a discrete set or a totally ordered set. A strongly uncertain attribute of domain $D$ is a subset $d_S \subseteq D$ if $D$ is a discrete set, and it is a closed interval $d_S = [d_{min}, d_{max}]$ with $d_{min} \in D$ and $d_{max} \in D$ otherwise. We denote with $S_D$ the set of all such strongly uncertain attributes of domain $D$. A weakly uncertain attribute $f_D$ of domain $D$ is a function $f_D \colon D \not\to [0, 1]$ such that $0 < \sum_{x \in D} f_D(x) \le 1$ if $D$ is finite, $0 < \int_{-\infty}^{\infty} f_D(x)\,dx \le 1$ otherwise. We denote with $W_D$ the set of all such weakly uncertain attributes of domain $D$. We collectively denote with $U_D = S_D \cup W_D$ the set of uncertain attributes of domain $D$.

It is easy to see how a “certain” attribute $x$, with a value not affected by any uncertainty, can be represented through the definitions in use here: if its domain is discrete, it can be represented with the singleton $\{x\}$; otherwise, it can be represented with the degenerate interval $[x, x]$.

Definition 2 (Uncertain events). Let $U_I$ be the universe of event identifiers. Let $U_C$ be the universe of case identifiers. Let $A \in U$ be the discrete domain of all the activity identifiers. Let $T \in U$ be the totally ordered domain of all the timestamp identifiers. Let $O = \{?\} \in U$, where the “?” symbol is a placeholder denoting event indeterminacy. The universe of uncertain events is denoted with $E = U_I \times U_C \times U_A \times U_T \times U_O$.

The activity label, timestamp and indeterminacy attribute values of an uncertain event are drawn from $U_A$, $U_T$ and $U_O$; in accordance with Definition 1, each of these attributes can be strongly uncertain (set of possible values or interval) or weakly uncertain (probability distribution). The indeterminacy domain is defined on a single element “?”: thus, strongly uncertain indeterminacy may be $\{?\}$ (indeterminate event) or $\emptyset$ (no indeterminacy). In weakly uncertain indeterminacy, the “?” element is associated to a probability value.

Definition 3 (Projection functions). For an uncertain event $e = (i, c, a, t, o) \in E$, we define the following projection functions: $\pi_a(e) = a$, $\pi_t(e) = t$, $\pi_o(e) = o$. We define $\pi_a^{set}(e) = a$ if $a$ is strongly uncertain, and $\pi_a^{set}(e) = \{x \in U_A \mid f_A(x) > 0\}$ with $a = f_A$ otherwise. If the timestamp $t = [t_{min}, t_{max}]$ is strongly uncertain, we define $\pi_{t_{min}}(e) = t_{min}$ and $\pi_{t_{max}}(e) = t_{max}$. If the timestamp $t = f_T$ is weakly uncertain, we define $\pi_{t_{min}}(e) = \operatorname{argmin}_x (f_T(x) > 0)$ and $\pi_{t_{max}}(e) = \operatorname{argmax}_x (f_T(x) > 0)$.

Definition 4 (Uncertain traces and logs). $\tau \subset E$ is an uncertain trace if all the event identifiers in $\tau$ are unique and all events in $\tau$ share the same case identifier $c \in U_C$. $\mathcal{T}$ denotes the universe of uncertain traces. $L \subset \mathcal{T}$ is an uncertain log if all the event identifiers in $L$ are unique.

Definition 5 (Realizations of uncertain traces). Let $e, e' \in E$ be two uncertain events. $\prec_E$ is a strict partial order defined on the universe of strongly uncertain events $E$ as $e \prec_E e' \Leftrightarrow \pi_{t_{max}}(e) < \pi_{t_{min}}(e')$. Let $\tau \in \mathcal{T}$ be an uncertain trace. The sequence $\rho = \langle e_1, e_2, \dots, e_n \rangle \in E^*$, with $n \le |\tau|$, is an order-realization of $\tau$ if there exists a total function $f \colon \{1, 2, \dots, n\} \to \tau$ such that:

• for all $1 \le i < j \le n$ we have that $\rho[j] \nprec_E \rho[i]$,

• for all $e \in \tau$ with $\pi_o(e) = \emptyset$ there exists $1 \le i \le n$ such that $f(i) = e$.

We denote with $R_O(\tau)$ the set of all such order-realizations of the trace $\tau$.

Given an order-realization $\rho = \langle e_1, e_2, \dots, e_n \rangle \in R_O(\tau)$, the sequence $\sigma \in U_A^*$ is a realization of $\rho$ if $\sigma \in \{\langle a_1, a_2, \dots, a_n \rangle \mid \forall_{1 \le i \le n}\ a_i \in \pi_a^{set}(e_i)\}$. We denote with $R_A(\rho) \subseteq U_A^*$ the set of all such realizations of the order-realization $\rho$. We denote with $R(\tau) \subseteq U_A^*$ the union of the realizations obtainable from all the order-realizations of $\tau$: $R(\tau) = \bigcup_{\rho \in R_O(\tau)} R_A(\rho)$. We will say that an order-realization $\rho \in R_O(\tau)$ enables a sequence $\sigma \in U_A^*$ if $\sigma \in R_A(\rho)$.
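As an illustration of Definition 5, the following minimal sketch enumerates order-realizations and realizations of the running example by brute force. It is not the behavior-net construction from the literature: the tuple layout, the helper names (`ts`, `precedes`, `order_realizations`, `realizations`) and the encoding of the date-only timestamp of $e_2$ as the interval 00:00 to 23:59 are assumptions made only for this example.

```python
from datetime import datetime
from itertools import combinations, permutations, product


def ts(text):
    """Parse a timestamp written in the format used in Table 1."""
    return datetime.strptime(text, "%d-%m-%Y %H:%M")


# Hypothetical encoding of an event:
# (event id, possible activity labels, earliest timestamp, latest timestamp, indeterminate?)
trace = [
    ("e1", ["h"], ts("05-10-2020 23:00"), ts("05-10-2020 23:00"), False),
    ("e2", ["c"], ts("06-10-2020 00:00"), ts("06-10-2020 23:59"), False),
    ("e3", ["r"], ts("05-10-2020 20:00"), ts("06-10-2020 10:00"), False),
    ("e4", ["i"], ts("09-10-2020 10:00"), ts("09-10-2020 10:00"), False),
    ("e5", ["f", "t"], ts("14-10-2020 09:00"), ts("14-10-2020 09:00"), False),
    ("e6", ["v"], ts("15-10-2020 10:00"), ts("15-10-2020 10:00"), True),
]


def precedes(e, f):
    """Strict partial order: e comes before f if e surely ends before f can start."""
    return e[3] < f[2]


def order_realizations(events):
    """All sequences that keep every determinate event and respect the partial order."""
    mandatory = [e for e in events if not e[4]]
    optional = [e for e in events if e[4]]
    result = []
    for k in range(len(optional) + 1):
        for kept in combinations(optional, k):
            for perm in permutations(mandatory + list(kept)):
                # No later element of the sequence may strictly precede an earlier one.
                if all(not precedes(perm[j], perm[i])
                       for i in range(len(perm))
                       for j in range(i + 1, len(perm))):
                    result.append(perm)
    return result


def realizations(rho):
    """All activity sequences enabled by the order-realization rho."""
    return [tuple(labels) for labels in product(*(e[1] for e in rho))]


rhos = order_realizations(trace)
sigmas = {sigma for rho in rhos for sigma in realizations(rho)}
print(len(rhos), len(sigmas))
```

The exhaustive enumeration over permutations is only practical for short traces; it is meant to make the definition concrete, not to be efficient.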

Detailing an algorithm to generate all realizations of an uncertain trace is beyond

the scope of this paper. The literature illustrates a conformance checking method over

uncertain data which employs a behavior net, a Petri net able to replay all and only the realizations of an uncertain trace [8]. Exhaustively exploring all complete firing sequences of a behavior net, e.g., through its reachability graph, provides all realizations of the corresponding uncertain trace.

Given the above formalization, we can now define more clearly the research question that we are investigating in this paper. Given an uncertain trace $\tau \in \mathcal{T}$ and one of its realizations $\sigma \in R(\tau)$, our goal is to obtain a procedure to reliably compute $P(\sigma \mid \tau)$, the probability of $\sigma$ given that we observe $\tau$. In other words, provided that $\sigma$ corresponds to a scenario (i.e., a realization) for the uncertain trace $\tau$, we are interested in calculating the probability that $\sigma$ is the scenario that actually occurred in reality and caused the recording of the uncertain trace $\tau$ in the event log. In the next section, we will illustrate how to calculate such probabilities of uncertain trace realizations.

5 Method

Before we show how we can obtain probability estimates for all realizations of an uncer-

tain trace, it is important to state an assumption: the information on uncertainty related

to a particular attribute in some event is independent of the possible values of the same

attribute present in other events, and it is independent of the uncertainty information

on other attributes of the same event. Note that in the examples of uncertainty sources

given in Section 1 (data coarseness and sensor errors), this independence assumption often holds.

Additionally, we need to consider the fact that strongly uncertain attributes do not come with known probability values: their description only specifies the values that attributes might acquire, but not the likelihood of each possible value. As a consequence, estimating probability for specific realizations in a strongly uncertain environment is only possible with a-priori assumptions on how probability distributes among the attribute values. At times, it might be possible to assume the distribution in an informed way, for instance on the basis of features of the information system hosting the data, of the sensors recording events and attributes, or of other tools involved in the management of the process.

In case no indication is present, a reasonable assumption, which we will hold for the remainder of the paper, is that any possible value of a strongly uncertain attribute is equally likely. Formally, with $e = (i, c, a, t, o) \in E$ let $\tau_s \colon E \to E$ be a function such that $\tau_s(e) = (i, c, a', t', o')$, where $a' = \{(x, \frac{1}{|\pi_a^{set}(e)|}) \mid x \in \pi_a^{set}(e)\}$ if $a \in S_A$ and $a' = a$ otherwise; $t' = U(\pi_{t_{min}}(e), \pi_{t_{max}}(e))$ if $t \in S_T$ and $t' = t$ otherwise; $o' = 0.5$ if $o = \{?\}$ and $o' = o$ otherwise.
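The mapping $\tau_s$ can be sketched in a few lines. The representation below is an assumption made for illustration only: a strongly uncertain activity is a Python set, a strongly uncertain timestamp is a pair `(t_min, t_max)`, a uniform distribution is the tagged triple `("uniform", t_min, t_max)`, and indeterminacy is either the set `{"?"}` or a dictionary with a probability for `"?"`.

```python
def tau_s(event):
    """Replace strongly uncertain attributes with uniform (weakly uncertain) ones."""
    i, c, a, t, o = event
    if isinstance(a, (set, frozenset)):
        # Every possible activity label becomes equally likely.
        a = {label: 1 / len(a) for label in a}
    if isinstance(t, tuple) and len(t) == 2:
        # A closed interval becomes a uniform distribution over that interval.
        t = ("uniform", t[0], t[1])
    if o == {"?"}:
        # Occurrence and non-occurrence become equally likely.
        o = {"?": 0.5}
    return (i, c, a, t, o)


strong_event = ("e5", "5167", {"f", "t"}, (0.0, 10.0), {"?"})
print(tau_s(strong_event))
```

Attributes that are already weakly uncertain, or certain, pass through unchanged.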

First, observe that the probability $P(\sigma \mid \tau)$ that an activity sequence $\sigma \in U_A^*$ is indeed a realization of the trace $\tau \in \mathcal{T}$, and thus $\sigma \in R(\tau)$, increases with the number of order-realizations enabling it. Furthermore, for each such order-realization, one can construct a probability function $P_O(\rho \mid \tau)$ reflecting the likelihood of the sequence $\rho$ itself given the trace $\tau$, and a probability function $P_A(\sigma \mid \rho)$ reflecting the likelihood that the realization corresponding to $\rho$ is indeed $\sigma$. The value of $P_O(\rho \mid \tau)$ is affected by the uncertainty information in timestamps and indeterminate events, while the value of $P_A(\sigma \mid \rho)$ is aggregated from the uncertainty information in the activity labels.

Given a realization $\sigma$ of an uncertain process instance and the set of its enablers, its probability is computed as follows:

$$P(\sigma \mid \tau) = \sum_{\rho \in E^*} P_O(\rho \mid \tau) \cdot P_A(\sigma \mid \rho)$$

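Note that, if $\rho$ does not enable $\sigma$, $P_A(\sigma \mid \rho) = 0$. For any uncertain trace $\tau \in \mathcal{T}$, it holds that $\sum_{\sigma \in R(\tau)} P(\sigma \mid \tau) = 1$, since both $P_O(\cdot)$ and $P_A(\cdot)$ are each constructed to be (independent) probability distributions.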

We will now compute $P_A(\sigma \mid \rho)$ using the information on the activity label uncertainty. Let us write $f_A^e$ as a shorthand for $\pi_a(e)$. If there is uncertainty in activities, then for each event $e \in \rho$ and activity label $a \in \pi_a^{set}(e)$, the probability that $e$ executes $a$ is given by $f_A^e(a)$. Thus, for every $\rho = \langle e_1, \dots, e_n \rangle \in R_O(\tau)$ and $\sigma = \langle a_1, \dots, a_n \rangle \in R_A(\rho)$, the value $P_A$ can be aggregated from these distributions in the following way:

$$P_A(\sigma \mid \rho) = \prod_{i=1}^{n} f_A^{e_i}(a_i)$$
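The product above translates directly into code. In this sketch, an order-realization is represented (as an assumption for illustration) by the list of label distributions $f_A^{e_i}$ of its events, each given as a dictionary; the function name `p_activity` is hypothetical.

```python
from math import prod


def p_activity(sigma, rho_label_distributions):
    """P_A(sigma | rho): product of the probabilities of the labels chosen for each event."""
    if len(sigma) != len(rho_label_distributions):
        return 0.0  # rho cannot enable a sequence of a different length
    # A label outside the support of an event contributes probability 0.
    return prod(dist.get(label, 0.0)
                for label, dist in zip(sigma, rho_label_distributions))


# Order-realization <e1, e2, e3, e4, e5, e6> of the trace in Table 1:
# only e5 has more than one possible label.
rho_1 = [{"h": 1.0}, {"c": 1.0}, {"r": 1.0}, {"i": 1.0}, {"f": 0.3, "t": 0.7}, {"v": 1.0}]

print(p_activity(["h", "c", "r", "i", "f", "v"], rho_1))
print(p_activity(["h", "c", "r", "i", "t", "v"], rho_1))
```

Returning 0 for sequences that are not enabled matches the remark that $P_A(\sigma \mid \rho) = 0$ whenever $\rho$ does not enable $\sigma$.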

Through the value of $P_A$, we can assess the likelihood that any given order-realization executes a particular realization. The next step is to estimate the probability of each order-realization $\rho$ from the set $R_O(\tau)$. The probability of observing $\rho$ needs to be aggregated from two values: the probability that the corresponding set of events appears in the given particular order, which is determined by the timestamp intervals and, if applicable, the distributions over them; and the probability that the order-realization contains the corresponding specific set of events, which is determined by the uncertainty information on the indeterminacy. Multiplying these two values to yield a probability estimate for the order-realization reflects our independence assumption. Let us firstly focus on uncertainty on timestamps, which causes the events to be partially ordered.

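We will write $f_T^e(t)$ as a shorthand for $\pi_t(e)(t)$. For every event $e$, the value of $f_T^e(t)$ yields the probability that event $e$ happened on timestamp $t$. This value is always 0 for all $t < \pi_{t_{min}}(e)$ and $t > \pi_{t_{max}}(e)$ (see $\pi_{t_{min}}$ and $\pi_{t_{max}}$ in Definition 3). Given the continuous domain of timestamps, $P_O(\cdot)$ is assessed by using integrals. For a trace $\tau \in \mathcal{T}$ and an order-realization $\rho = \langle e_1, \dots, e_n \rangle \in R_O(\tau)$, let $a_i = \pi_{t_{min}}(e_i)$ and $b_i = \pi_{t_{max}}(e_i)$ for all $1 \le i \le n$. Then, we define:

$$\begin{aligned}
I(\rho) &= \int_{a_1}^{\min\{b_1, \dots, b_n\}} f_T^{e_1}(x_1) \int_{\max\{a_2, x_1\}}^{\min\{b_2, \dots, b_n\}} f_T^{e_2}(x_2) \cdots \int_{\max\{a_i, x_{i-1}\}}^{\min\{b_i, \dots, b_n\}} f_T^{e_i}(x_i) \cdots \int_{\max\{a_n, x_{n-1}\}}^{b_n} f_T^{e_n}(x_n) \, dx_n \dots dx_1 \\
&= \int_{a_1}^{\min\{b_1, \dots, b_n\}} \int_{\max\{a_2, x_1\}}^{\min\{b_2, \dots, b_n\}} \cdots \int_{\max\{a_i, x_{i-1}\}}^{\min\{b_i, \dots, b_n\}} \cdots \int_{\max\{a_n, x_{n-1}\}}^{b_n} \prod_{i=1}^{n} f_T^{e_i}(x_i) \, dx_n \dots dx_1
\end{aligned}$$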
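The value $I(\rho)$ is the probability that independently drawn timestamps fall in the order given by $\rho$. Instead of evaluating the nested integrals, the sketch below approximates this probability by sampling; this is a swapped-in Monte Carlo estimate, not the integration itself. It assumes uniform timestamp distributions, and the intervals (in hours) and the function name `estimate_order_probability` are hypothetical values chosen for illustration.

```python
import random


def estimate_order_probability(intervals, order, runs=100_000, seed=0):
    """Monte Carlo estimate of I(rho) for independent, uniformly distributed timestamps.

    intervals: dict mapping each event to its (t_min, t_max) interval.
    order: tuple of events, the candidate order-realization rho.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Draw one concrete timestamp per event.
        sample = {e: rng.uniform(lo, hi) for e, (lo, hi) in intervals.items()}
        # Count the draw if the timestamps appear in the order given by rho.
        if all(sample[order[k]] <= sample[order[k + 1]]
               for k in range(len(order) - 1)):
            hits += 1
    return hits / runs


# Hypothetical trace: e2 and e3 share the same interval, e1 and e4 do not overlap anything.
intervals = {"e1": (0.0, 1.0), "e2": (2.0, 6.0), "e3": (2.0, 6.0), "e4": (7.0, 8.0)}
estimate = estimate_order_probability(intervals, ("e1", "e2", "e3", "e4"))
print(estimate)
```

Because $e_2$ and $e_3$ have identical intervals in this hypothetical setup, both of their relative orders are equally likely, so the estimate should be close to 0.5.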

This chain of integrals allows us to compute the probability of a specific order among all the events in an uncertain trace. Now, to compute the probability of each order-realization while accounting for indeterminate events, we combine both the probability of the events having appeared in a particular order and the probability that the sequence contains exactly those events. For simplicity, we will use a function that acquires the value 0 if an event is not indeterminate. Let us define $f_O^e \colon O \to [0, 1]$ such that $f_O^e(?) = \pi_o(e)(?)$ if $\pi_o(e) \ne \emptyset$ and $f_O^e(?) = 0$ otherwise, so that a determinate event contributes a factor $1 - f_O^e(?) = 1$ when it appears in the sequence. More precisely, given $\tau \in \mathcal{T}$ and $\rho \in R_O(\tau)$, we compute:

$$P_O(\rho \mid \tau) = I(\rho) \cdot \prod_{\substack{e \in \tau \\ e \in \rho}} (1 - f_O^e(?)) \cdot \prod_{\substack{e \in \tau \\ e \notin \rho}} f_O^e(?)$$

We now have at our disposal all the necessary tools to compute a probability dis-

tribution over the trace realizations of any uncertain process instance in any possible

uncertainty scenario. Let us then apply this method to compute the probabilities of all

realizations of the trace $\tau$ in Table 1, and to analyze its conformance to the process in

Figure 1.

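Each order-realization of $\tau$ enables two realizations, because event $e_5$ has two possible activity labels. Since for events $e \in \tau \setminus \{e_5\}$ we have $f_A^e$ equal to 1 for their corresponding unique activity label, the probability that an order-realization $\rho \in R_O(\tau)$ has some realization $\sigma \in R_A(\rho)$ only depends on whether the trace $\sigma$ contains activity $f$ or $t$. Thus, for traces $\sigma_1', \sigma_2', \sigma_3', \sigma_4', \sigma_5', \sigma_6'$ and their unique enabling sequences, we always have $P_A(\sigma_i' \mid \rho_i) = f_A^{e_5}(f) = 0.3$, where $i \in \{1, \dots, 6\}$. Similarly, for traces $\sigma_1'', \sigma_2'', \sigma_3'', \sigma_4'', \sigma_5'', \sigma_6''$ and their unique enabling sequences, we always have $P_A(\sigma_i'' \mid \rho_i) = f_A^{e_5}(t) = 0.7$, where $i \in \{1, \dots, 6\}$. Next, we calculate the $P_O(\cdot)$ values for the 6 possible order-realizations in $R_O(\tau)$, which are displayed in Table 2.

Table 2: The possible order-realizations of the process instance from Table 1 and their probabilities.

| Order-realization $\rho$ | $I(\rho)$ | $P_O(\rho)$ |
|---|---|---|
| $\rho_1$: $\langle e_1, e_2, e_3, e_4, e_5, e_6 \rangle$ | 0.149 | 0.074 |
| $\rho_2$: $\langle e_1, e_3, e_2, e_4, e_5, e_6 \rangle$ | 0.780 | 0.390 |
| $\rho_3$: $\langle e_3, e_1, e_2, e_4, e_5, e_6 \rangle$ | 0.072 | 0.036 |
| $\rho_4$: $\langle e_1, e_2, e_3, e_4, e_5 \rangle$ | 0.149 | 0.074 |
| $\rho_5$: $\langle e_1, e_3, e_2, e_4, e_5 \rangle$ | 0.780 | 0.390 |
| $\rho_6$: $\langle e_3, e_1, e_2, e_4, e_5 \rangle$ | 0.072 | 0.036 |

Table 3: The set of possible realizations of the example from Table 1, their enablers, their probabilities, and their conformance scores. The conformance score is equal to the cost of the optimal alignment between the trace and the Petri net in Figure 1.

| Realization $\sigma$ | $\rho$ | $P(\sigma \mid \tau)$ | conf |
|---|---|---|---|
| $\sigma_1'$: $\langle h, c, r, i, f, v \rangle$ | $\rho_1$ | $P_O(\rho_1) \cdot P_A(\sigma_1' \mid \rho_1) = 0.022$ | 1 |
| $\sigma_1''$: $\langle h, c, r, i, t, v \rangle$ | $\rho_1$ | $P_O(\rho_1) \cdot P_A(\sigma_1'' \mid \rho_1) = 0.052$ | 0 |
| $\sigma_2'$: $\langle h, r, c, i, f, v \rangle$ | $\rho_2$ | $P_O(\rho_2) \cdot P_A(\sigma_2' \mid \rho_2) = 0.117$ | 3 |
| $\sigma_2''$: $\langle h, r, c, i, t, v \rangle$ | $\rho_2$ | $P_O(\rho_2) \cdot P_A(\sigma_2'' \mid \rho_2) = 0.273$ | 2 |
| $\sigma_3'$: $\langle r, h, c, i, f, v \rangle$ | $\rho_3$ | $P_O(\rho_3) \cdot P_A(\sigma_3' \mid \rho_3) = 0.011$ | 3 |
| $\sigma_3''$: $\langle r, h, c, i, t, v \rangle$ | $\rho_3$ | $P_O(\rho_3) \cdot P_A(\sigma_3'' \mid \rho_3) = 0.025$ | 2 |
| $\sigma_4'$: $\langle h, c, r, i, f \rangle$ | $\rho_4$ | $P_O(\rho_4) \cdot P_A(\sigma_4' \mid \rho_4) = 0.022$ | 0 |
| $\sigma_4''$: $\langle h, c, r, i, t \rangle$ | $\rho_4$ | $P_O(\rho_4) \cdot P_A(\sigma_4'' \mid \rho_4) = 0.052$ | 1 |
| $\sigma_5'$: $\langle h, r, c, i, f \rangle$ | $\rho_5$ | $P_O(\rho_5) \cdot P_A(\sigma_5' \mid \rho_5) = 0.117$ | 2 |
| $\sigma_5''$: $\langle h, r, c, i, t \rangle$ | $\rho_5$ | $P_O(\rho_5) \cdot P_A(\sigma_5'' \mid \rho_5) = 0.273$ | 3 |
| $\sigma_6'$: $\langle r, h, c, i, f \rangle$ | $\rho_6$ | $P_O(\rho_6) \cdot P_A(\sigma_6' \mid \rho_6) = 0.011$ | 2 |
| $\sigma_6''$: $\langle r, h, c, i, t \rangle$ | $\rho_6$ | $P_O(\rho_6) \cdot P_A(\sigma_6'' \mid \rho_6) = 0.025$ | 3 |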

One can notice that the $I$ values only depend on the ordering of the first three events, which are also the only ones with overlapping timestamps. Since the indeterminate event $e_6$ does not overlap with any other event, pairs of sequences where the first three events have the same order also have the same probability. This reflects our assumption that the occurrence and non-occurrence of $e_6$ are both equally possible. Table 3 displays the calculations for the computation of the $P(\sigma \mid \tau)$ values for all realizations. Now we can compute the expected conformance score for the uncertain process instance $\tau = \{e_1, \dots, e_6\}$. We can do so by computing alignments [5] for each realization of $\tau$:

$$\begin{aligned}
conf(\tau) &= \sum_{\sigma \in R(\tau)} P(\sigma \mid \tau) \cdot conf(\sigma, M) \\
&= 0.022 \cdot 1 + 0.052 \cdot 0 + 0.117 \cdot 3 + 0.273 \cdot 2 + 0.011 \cdot 3 + 0.025 \cdot 2 \\
&\quad + 0.022 \cdot 0 + 0.052 \cdot 1 + 0.117 \cdot 2 + 0.273 \cdot 3 + 0.011 \cdot 2 + 0.025 \cdot 3 \\
&= 2.204.
\end{aligned}$$

Given the information on uncertainty available for the trace, this conformance score

is a more realistic estimate of the real conformance score compared to taking the best,

worst or average scores with values 0, 3 and 1.75 respectively.
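The expected conformance score can be checked with a few lines of code. The sketch below only re-adds the values listed in Table 3; the variable names are hypothetical, and the alignment costs are taken from the table rather than computed.

```python
# (P(sigma | tau), alignment cost) for the 12 realizations, in the order of Table 3.
table_3 = [
    (0.022, 1), (0.052, 0), (0.117, 3), (0.273, 2), (0.011, 3), (0.025, 2),
    (0.022, 0), (0.052, 1), (0.117, 2), (0.273, 3), (0.011, 2), (0.025, 3),
]

# Probability-weighted average of the conformance costs.
expected_conf = sum(p * cost for p, cost in table_3)
# Sanity check: the realization probabilities should add up to 1.
total_probability = sum(p for p, _ in table_3)

print(round(expected_conf, 3), round(total_probability, 3))
```

The probabilities add up to 1, as expected for a distribution over all realizations, and the weighted sum reproduces the score of 2.204.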


[Figure 2 node labels: $e_1$: $a$; $e_2$: $b$: 0.9, $c$: 0.1; $e_3$: $d$, ?: 0.8; $e_4$: $e$]

Figure 2: The behavior graph of the uncertain trace considered as example for validation.

Figure 3: The behavior net obtained from the behavior graph in Figure 2.

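Table 4: The set of realizations of the trace from Figure 2, their enablers, and their probabilities.

| Realization $\sigma$ | $\rho$ | $P(\sigma \mid \tau)$ |
|---|---|---|
| $\sigma_1$: $\langle a, b, e \rangle$ | $\rho_1$: $\langle e_1, e_2, e_4 \rangle$ | $P_O(\rho_1) \cdot P_A(\sigma_1 \mid \rho_1) = 0.8 \cdot 0.9 = 0.72$ |
| $\sigma_2$: $\langle a, b, d, e \rangle$ | $\rho_2$: $\langle e_1, e_2, e_3, e_4 \rangle$ | $P_O(\rho_2) \cdot P_A(\sigma_2 \mid \rho_2) = (0.5 \cdot 0.2) \cdot 0.9 = 0.09$ |
| $\sigma_3$: $\langle a, d, b, e \rangle$ | $\rho_3$: $\langle e_1, e_3, e_2, e_4 \rangle$ | $P_O(\rho_3) \cdot P_A(\sigma_3 \mid \rho_3) = (0.5 \cdot 0.2) \cdot 0.9 = 0.09$ |
| $\sigma_4$: $\langle a, c, e \rangle$ | $\rho_4$: $\langle e_1, e_2, e_4 \rangle$ | $P_O(\rho_4) \cdot P_A(\sigma_4 \mid \rho_4) = 0.8 \cdot 0.1 = 0.08$ |
| $\sigma_5$: $\langle a, c, d, e \rangle$ | $\rho_5$: $\langle e_1, e_2, e_3, e_4 \rangle$ | $P_O(\rho_5) \cdot P_A(\sigma_5 \mid \rho_5) = (0.5 \cdot 0.2) \cdot 0.1 = 0.01$ |
| $\sigma_6$: $\langle a, d, c, e \rangle$ | $\rho_6$: $\langle e_1, e_3, e_2, e_4 \rangle$ | $P_O(\rho_6) \cdot P_A(\sigma_6 \mid \rho_6) = (0.5 \cdot 0.2) \cdot 0.1 = 0.01$ |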

6 Validation of Probability Estimates

In this section, we compute the probability estimates for the realizations of an uncertain trace, and then show a validation of those estimates by Monte Carlo simulation on the behavior net of the trace. The process instance of our example has strong uncertainty in timestamps and weak uncertainty in activities and indeterminacy. It consists of 4 events: $e_1$, $e_2$, $e_3$ and $e_4$, where $e_2$ and $e_3$ have overlapping timestamps. Event $e_2$ executes $b$ (resp., $c$) with probability 0.9 (resp., 0.1). There is a probability of 0.8 that $e_3$ did not occur, i.e., $f_O^{e_3}(?) = 0.8$. Figure 2 shows the corresponding behavior graph, an uncertain event data visualization that represents the time relationships between events with a directed acyclic graph [8]. Lastly, Table 4 lists all the possible realizations, their probabilities, and the order-realizations enabling them.
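The probabilities of the six realizations of this example can be recomputed as a minimal sketch. It assumes the values used in Table 4 and Figure 2: $I(\rho) = 0.5$ for each of the two relative orders of the overlapping events $e_2$ and $e_3$, and a probability of 0.8 for the "?" of $e_3$. The dictionary layout and variable names are hypothetical.

```python
skip_e3 = 0.8                      # f_O(?) of e3: probability that e3 did not occur
labels_e2 = {"b": 0.9, "c": 0.1}   # f_A of e2

# Activity template of each order-realization ("e2" is a placeholder for the label of e2),
# mapped to (I(rho), whether e3 appears in the sequence).
order_realizations = {
    ("a", "e2", "e"): (1.0, False),
    ("a", "e2", "d", "e"): (0.5, True),
    ("a", "d", "e2", "e"): (0.5, True),
}

probabilities = {}
for template, (i_rho, has_e3) in order_realizations.items():
    # P_O = I(rho) times the probability of exactly this set of events.
    p_order = i_rho * ((1 - skip_e3) if has_e3 else skip_e3)
    for label, p_label in labels_e2.items():
        sigma = tuple(label if x == "e2" else x for x in template)
        # P(sigma | tau) = P_O(rho | tau) * P_A(sigma | rho); each sigma has one enabler here.
        probabilities[sigma] = p_order * p_label

for sigma, p in probabilities.items():
    print(sigma, round(p, 2))
```

In this example every realization has a single enabling order-realization, so the sum over enablers reduces to one term.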

We now validate our obtained probability estimates quantitatively by means of a Monte Carlo simulation approach. First, we construct the behavior net [9] corresponding to the uncertain process instance, which is shown in Figure 3. The set of replayable traces in this behavior net is exactly the set of realizations for the uncertain instance. Then, we simulate realizations on the behavior net, dividing the accumulated count of each realization by the number of runs, and compare those values to our probability estimates. Here, we use the stochastic simulator of the PM4Py library [3]. In every step of the simulation, the stochastic simulator chooses one enabled transition to fire according to a stochastic map, assigning a weight to each transition in the Petri net (here, the behavior net).

To simulate uncertainty in activities, events and timestamps, we do the following: possible activities executed by the same event, which appear in an XOR-split in the behavior net, are weighted so as to reflect the probability values of the activity labels. Indeterminacy is equivalently modeled as an XOR-choice between a visible transition and a silent one in the behavior net, so as to model a “skip”. If there are two or more possible activities for an indeterminate event, then the sum of the weights of the visible transitions in relation to the weight of the silent transition should be the same as in the distribution given in the event type uncertainty information. Whenever there are events with overlapping timestamps, these appear in an AND-split in the behavior net. The (enabled) path of the AND-split which is taken first signals which event is executed at that moment.

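Let $bn(\tau) = (P, T)$ be the behavior net of trace $\tau$. Let $(e, a) \in T$ be a visible transition related to some event $e \in \tau$. We weight $(e, a)$ the following way:

$$weight((e, a)) = \begin{cases} f_A^e(a) & \text{if } \pi_o(e) = \emptyset, \\ (1 - f_O^e(?)) \cdot f_A^e(a) & \text{otherwise.} \end{cases}$$

If $e \in \tau$ is an indeterminate event, then $weight((e, \tau)) = f_O^e(?)$, where the second component $\tau$ of $(e, \tau)$ is the label of the silent transition of $e$.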
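The weight assignment can be written as a small helper. This is a sketch under stated assumptions: the function name `transition_weights` is hypothetical, label distributions are dictionaries, and the silent transition is represented by the key `None`; it does not use any PM4Py data structure.

```python
def transition_weights(label_probs, skip_prob=None):
    """Weights of the behavior-net transitions that belong to one event.

    label_probs: f_A of the event, as a dict mapping label to probability.
    skip_prob: f_O(?) of an indeterminate event, or None for a determinate one.
    """
    if skip_prob is None:
        # Determinate event: the visible transitions carry the label probabilities.
        return dict(label_probs)
    # Indeterminate event: scale the visible transitions by the occurrence probability.
    weights = {label: (1 - skip_prob) * p for label, p in label_probs.items()}
    weights[None] = skip_prob  # None stands for the silent ("skip") transition
    return weights


print(transition_weights({"b": 0.9, "c": 0.1}))
print(transition_weights({"d": 1.0}, skip_prob=0.8))
```

In both cases the weights of all transitions of an event add up to 1, so they can be read directly as firing probabilities within the XOR construct of that event.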

Note that according to the weight assignment function, if $e$ is determinate, then $\sum_{a \in \pi_a^{set}(e)} weight((e, a)) = 1$. Otherwise, $\sum_{a \in \pi_a^{set}(e)} weight((e, a)) = 1 - f_O^e(?) = 1 - weight((e, \tau))$. By construction of the behavior net, any transition related to an event in $\tau$ can only fire in accordance with the partial order of uncertain timestamps. Additionally, all transitions representing events with overlapping timestamps appear in an AND construct. By definition of our weight function, whenever the transitions of some $e \in \tau$ are enabled (in an XOR construct), the probability of firing one of them is $1/k$, where $k$ is the number of events from $\tau$ for which none of the corresponding transitions have fired yet. This way, there is always a uniform distribution over the set of enabled transitions representing overlapping events. Assigning the weights according to this distribution allows us to decorate the behavior net with probabilities that reflect the chances of occurrence of every possible value in uncertain attributes.

Applying the stochastic simulator $n$ times yields $n$ realizations. For each of the 6 possible realizations of the uncertain process instance, we obtain a probability measurement by dividing its simulated frequency by $n$. Figures 4 through 7 show how, for greater $n$, this measurement converges to the probability estimates shown in Table 4, which were computed with our method.
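The frequency-based measurement can be sketched in Python as follows. This is not the stochastic simulator used in the experiments: instead of building a behavior net, the sketch samples realizations directly from a small, hypothetical uncertain trace. The `EVENTS` structure, its probability values, and the function names are illustrative assumptions, and the sketch assumes that the next event is drawn uniformly among the events currently enabled by the partial order.

```python
import random
from collections import Counter

# Hypothetical uncertain trace (illustrative values only).
# "acts": distribution over activity labels; "p_absent": probability that the
# event did not occur; "preds": events that must have fired before this one.
EVENTS = {
    "e1": {"acts": {"a": 1.0}, "p_absent": 0.0, "preds": set()},
    "e2": {"acts": {"b": 0.9, "c": 0.1}, "p_absent": 0.0, "preds": {"e1"}},
    "e3": {"acts": {"d": 1.0}, "p_absent": 0.8, "preds": {"e1"}},
    "e4": {"acts": {"e": 1.0}, "p_absent": 0.0, "preds": {"e2", "e3"}},
}


def sample_realization(events, rng):
    """Draw one realization (a tuple of activity labels) from the uncertain trace."""
    fired, trace = set(), []
    while len(fired) < len(events):
        # Events whose predecessors have all fired are enabled.
        enabled = sorted(e for e, d in events.items()
                         if e not in fired and d["preds"] <= fired)
        event = rng.choice(enabled)  # uniform choice among overlapping events
        data = events[event]
        # Visible transitions are scaled by (1 - p_absent); None plays the silent one.
        labels = list(data["acts"]) + [None]
        weights = [(1.0 - data["p_absent"]) * p for p in data["acts"].values()]
        weights.append(data["p_absent"])
        label = rng.choices(labels, weights=weights)[0]
        if label is not None:
            trace.append(label)
        fired.add(event)
    return tuple(trace)


def estimate_probabilities(events, n, seed=0):
    """Relative frequency of each realization over n simulated runs."""
    rng = random.Random(seed)
    counts = Counter(sample_realization(events, rng) for _ in range(n))
    return {trace: count / n for trace, count in counts.items()}


if __name__ == "__main__":
    for trace, freq in sorted(estimate_probabilities(EVENTS, n=1000).items()):
        print(trace, freq)
```

With these illustrative values, the exact probability of ⟨a, b, e⟩ is 0.9 · 0.8 = 0.72, so the relative frequency of that realization should approach 0.72 as `n` grows.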

To conclude, the Monte Carlo simulation shows that our estimated probabilities for

realizations match their relative frequencies when one simulates the behavior net of the

corresponding uncertain trace.

Figure 4: Plot showing how the frequency of trace ⟨a, b, e⟩ converges to the expected value of 0.72 over 1000 runs.

Figure 5: Plot showing how the frequency of trace ⟨a, b, d, e⟩ converges to the expected value of 0.09 over 1000 runs.

Figure 6: Plot showing how the frequency of trace ⟨a, d, b, e⟩ converges to the expected value of 0.09 over 1000 runs.

Figure 7: Plot showing how the frequency of trace ⟨a, c, e⟩ converges to the expected value of 0.08 over 1000 runs.

7 Conclusion

Uncertain traces inherently contain behavior, allowing for many realizations; these, in turn, correspond to diverse possible real-life scenarios that may have different consequences for the management and governance of a process. In this paper, we presented a method to quantify the probability of each realization of an uncertain trace. This enables process analysts to weigh the impact of specific insights gathered with uncertainty-aware process mining techniques, such as conformance checking using alignments. As a consequence, information from process analysis techniques can be associated with a quantification of risk or opportunity for specific scenarios, making them more trustworthy.

Multiple avenues for future work on this topic are possible. These include inferring probabilities for uncertain traces from sections of the log not affected by uncertainty, adopting certain traces or fragments of traces as ground truth. Moreover, inferring probabilities by examining evidence against a ground truth can also be achieved with a normative model that includes information concerning the probability of error or noise in specific parts of the process.

Acknowledgements

We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research interactions.
