Probabilistic and Non-Deterministic Event
Data in Process Mining: Embedding
Uncertainty in Process Analysis Techniques
Marco Pegoraro¹
¹Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Aachen, Germany
pegoraro@pads.rwth-aachen.de
Abstract
Process mining is a subfield of process science that analyzes event data collected
in databases called event logs. Recently, novel types of event data have become
of interest due to the wide industrial application of process mining analyses. In
this paper, we examine uncertain event data. Such data contain meta-attributes
describing the amount of imprecision tied to attributes recorded in an event
log. We provide examples of uncertain event data, present the state of the art
regarding uncertainty in process mining, and illustrate open challenges related to
this research direction.
Keywords: Process Mining · Process Science · Event Data · Probabilistic Data · Non-Deterministic Data.
Colophon
This work is licensed under a Creative Commons “Attribution-NonCommercial 4.0 International” license.
© the authors. Some rights reserved.
This document is an Author Accepted Manuscript (AAM) corresponding to the following scholarly paper:
Pegoraro, Marco. “Probabilistic and Non-Deterministic Event Data in Process Mining: Embedding Uncertainty in
Process Analysis Techniques”. In: Proceedings of the Doctoral Consortium Papers Presented at the 34th International
Conference on Advanced Information Systems Engineering (CAiSE 2022). Ed. by Van Looy, Amy, Barbara Weber, and
Michael Rosemann. CEUR Workshop Proceedings. CEUR-WS.org, 2022
Please cite this document as shown above.
Publication chronology:
• 2022-03-15: full text submitted to the International Conference on Advanced Information Systems Engineering (CAiSE) 2022, main track
• 2022-04-11: notification of acceptance
• 2022-04-25: camera-ready version submitted
The published version referred to above is © CEUR-WS.org.
Correspondence to:
Marco Pegoraro, Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany
Website: http://mpegoraro.net/ · Email: pegoraro@pads.rwth-aachen.de · ORCID: 0000-0002-8997-7517
Content: 15 pages, 4 figures, 4 tables, 16 references. Typeset with pdfLaTeX, Biber, and BibLaTeX.
Please do not print this document unless strictly necessary.
M. Pegoraro Embedding Uncertainty in Process Analysis Techniques
1 Introduction
Process mining is a rapidly growing subfield of data science that aims to automatically
analyze event data through a collection of techniques, including the extraction of a process
model from a log of historical process executions, the assessment of the conformance
and deviations between observed and expected behavior, and the measurement of metrics
and indicators over event data and process models.
The widespread adoption of process mining in the last decades has increased the demand
for domain-specific process analysis techniques, for instance, techniques to analyze
less traditional types of event data. In this paper, we describe novel types of event
data, collectively referred to as uncertain event data [7]. Such data contain meta-attributes
describing and quantifying the amount of imprecision tied to attributes recorded in
an event log. The uncertainty tied to an event attribute may indicate its
possible values, or also provide a probability distribution over such values.
The aim of this research direction is to formally illustrate and classify different types
of uncertain event data, and to develop ad-hoc process mining techniques able to natively
function with uncertain event data.
The remainder of the paper is structured as follows. Section 2 shows examples of uncertain
event data. Section 3 discusses some possible sources of uncertainty in recorded
event data. Section 4 explores related concepts in process mining and neighboring disciplines.
Then, Section 5 lays out the research methodology and describes the state of
the art. Section 6 describes some open challenges in the field of uncertainty in process
mining. Finally, Section 7 concludes the paper.
2 Uncertainty in Event Data
To visualize more clearly the structure of the attributes in uncertain events, let
us consider the following process instance, a simplified version of anomalies that
actually occur, e.g., in healthcare processes.
An elderly patient enrolls in a clinical trial for an experimental treatment against
myeloproliferative neoplasms, a class of blood cancers. The enrollment includes a lab
exam and a visit with a specialist; then, the treatment can begin. The lab exam, performed
on the 8th of July, finds a low level of platelets in the blood of the patient, a
condition known as thrombocytopenia (TP). During the visit on the 10th of July, the
patient reports an episode of night sweats on the night of the 5th of July, prior to the
lab exam. The medic notes this but also hypothesizes that it might not be a symptom,
since it can be caused either by the condition or by external factors (such as very warm
weather). The medic also reads the medical records of the patient and sees that, shortly
prior to the lab exam, the patient was undergoing a heparin treatment (a blood-thinning
medication) to prevent blood clots. The thrombocytopenia detected by the lab exam
can then be either primary (caused by the blood cancer) or secondary (caused by other
factors, such as a concomitant condition). Finally, the medic finds an enlargement of the
spleen in the patient (splenomegaly). It is unclear when this condition developed: it
might have appeared at any moment prior to that point. These events are recorded in
the hospital's information system as the trace shown in Table 1.

Table 1: The strongly uncertain trace of an example healthcare process. The timestamp column shows
only the day of the month.

Case ID   Event ID   Timestamp   Activity                Indeterminacy
ID192     e1         5           NightSweats             ?
ID192     e2         8           {PrTP, SecTP}
ID192     e3         4–10        Splenomeg

Table 2: A trace where uncertain event attributes are labeled with probabilities (weak uncertainty).

Case ID   Event ID   Timestamp   Activity                Indeterminacy
ID348     e4         5           NightSweats             ?: 25%
ID348     e5         8           PrTP: 90%, SecTP: 10%
ID348     e6         N(7, 1)     Splenomeg
Such a scenario, in which no probabilities are known, is called strong uncertainty. In the
trace of Table 1, the rightmost column refers to event indeterminacy: event e1 has been recorded,
but it might not have occurred in reality, and is therefore marked with a “?” symbol. Event e2 has
more than one possible activity label, either PrTP or SecTP. Lastly, event e3 has an uncertain
timestamp, and might have happened at any point in time between the 4th and the
10th of July.
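To make strong uncertainty concrete, the following minimal Python sketch enumerates every activity sequence that the trace of Table 1 could correspond to in reality. It assumes a simple encoding of our own, not the XES extension of [9]: each event carries a tuple of possible labels, a timestamp interval given as a pair of days, and an indeterminacy flag. Two events may appear in either order only when their intervals overlap, and an indeterminate event may be absent.

```python
from itertools import permutations, product

# Hypothetical encoding of Table 1 (an assumption of this sketch):
# (event id, possible activity labels, (earliest day, latest day), indeterminate?)
TRACE_TABLE_1 = [
    ("e1", ("NightSweats",), (5, 5), True),
    ("e2", ("PrTP", "SecTP"), (8, 8), False),
    ("e3", ("Splenomeg",), (4, 10), False),
]


def admissible(order):
    """False if some later event's interval ends strictly before an earlier event's starts."""
    for i, (_, _, (start_i, _), _) in enumerate(order):
        for _, _, (_, end_j), _ in order[i + 1:]:
            if end_j < start_i:
                return False
    return True


def realizations(trace):
    """Set of activity sequences compatible with a strongly uncertain trace."""
    result = set()
    for order in permutations(trace):
        if not admissible(order):
            continue
        # Each event contributes one of its labels; an indeterminate event may also be absent (None).
        choices = [labels + ((None,) if indet else ()) for _, labels, _, indet in order]
        for combo in product(*choices):
            result.add(tuple(label for label in combo if label is not None))
    return result


if __name__ == "__main__":
    for seq in sorted(realizations(TRACE_TABLE_1)):
        print(seq)
```

For the three events of Table 1 this yields ten distinct sequences. Explicit enumeration grows quickly with the amount of uncertainty, which is why graph-based representations are preferred in the techniques discussed later.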
Uncertain events may also have probability values associated with them, a scenario
defined as weak uncertainty (Table 2). In the example described above, suppose the medic
estimates that there is a high chance (90%) that the thrombocytopenia is primary (caused
by the cancer). Furthermore, if the splenomegaly is suspected to have developed three
days prior to the visit, which takes place on the 10th of July, the timestamp of the
splenomegaly event (e6 in Table 2) may be described through a Gaussian distribution with
µ = 7, written N(7, 1) in Table 2. Lastly, the probability that the night sweats event
(e4 in Table 2) has been recorded but did not occur in reality may be known (for example,
it may be 25%).
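With weak uncertainty, each realization of a trace also receives a probability. The Python sketch below computes in closed form the probability that the trace of Table 2 is realized as ⟨NightSweats, Splenomeg, PrTP⟩. It rests on simplifying assumptions of ours: the three uncertain attributes are independent, “?: 25%” is the probability that e4 did not occur, N(7, 1) has mean 7 and standard deviation 1 (in days), and e4 and e5 keep their fixed timestamps 5 and 8.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


# Assumption: the uncertain attributes of Table 2 are mutually independent.
P_E4_OCCURS = 1.0 - 0.25      # e4 (NightSweats) is indeterminate with "?: 25%"
P_E5_PRIMARY = 0.90           # e5 is labeled PrTP with probability 90%
MU_E6, SIGMA_E6 = 7.0, 1.0    # e6 (Splenomeg) has timestamp N(7, 1)

# <NightSweats, Splenomeg, PrTP> requires e4 to occur, e5 to be PrTP,
# and e6 to fall between day 5 (e4) and day 8 (e5).
p_between = normal_cdf(8.0, MU_E6, SIGMA_E6) - normal_cdf(5.0, MU_E6, SIGMA_E6)
p_realization = P_E4_OCCURS * P_E5_PRIMARY * p_between

print(f"P(e6 between day 5 and day 8) = {p_between:.4f}")
print(f"P(<NightSweats, Splenomeg, PrTP>) = {p_realization:.4f}")
```

Under these assumptions the probability is about 0.55. The estimation of realization probabilities for uncertain traces is treated in [8].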
Uncertain data as described here can be represented through an extension of the XES
standard [9], and can thus be imported and exported by any tool that supports XES.
3 Sources of Uncertainty
In this section, we examine some possible sources of uncertain event data. This is
intended neither as an exhaustive list nor as a proper taxonomy, but rather as a collection
of motivating situations that are not uncommon in the analysis of event data; in fact,
many are documented in the literature.
Causes of uncertainty can be epistemic, that is, caused by a loss of information or
knowledge at some stage of the data recording process, or aleatoric, that is, intrinsic
to the process itself. This distinction, strongly emphasized in other fields such as
statistics and machine learning, is very important for interpreting the results of
process mining analyses, especially regarding the process improvements prompted by
the analysis.
Data Coarseness. Limitations in the precision available to record an event attribute
can generate uncertainty. In process mining, this is often the case with timestamps,
the attribute we normally rely on to determine a total ordering between events.
In some event logs, however, the timestamps of different events in the same process trace
coincide because of the coarseness of data recording (e.g., when only the day is recorded
but not the time, all events that happened on the same day have the same timestamp).
This is a source of partially ordered event data, a type of uncertain data, and is
well documented in process mining research [6].
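The effect of coarse timestamps can be illustrated with a small Python sketch that derives the precedence relation between events when only the day is recorded: two events are ordered only if their days differ, while events sharing a day stay unordered, so the trace becomes a partial order. The event identifiers, activity names, and dates are hypothetical.

```python
from datetime import date

# Hypothetical trace with day-level timestamps (the time of day was not recorded).
events = [
    ("e1", "Register", date(2022, 7, 4)),
    ("e2", "CheckDocs", date(2022, 7, 5)),
    ("e3", "CheckCredit", date(2022, 7, 5)),
    ("e4", "Decide", date(2022, 7, 6)),
]


def precedence(events):
    """Pairs (a, b) such that a certainly happened before b."""
    return {
        (a_id, b_id)
        for a_id, _, a_day in events
        for b_id, _, b_day in events
        if a_day < b_day
    }


order = precedence(events)
# Pairs of distinct events whose relative order cannot be determined.
unordered = {
    frozenset((a_id, b_id))
    for a_id, _, _ in events
    for b_id, _, _ in events
    if a_id != b_id and (a_id, b_id) not in order and (b_id, a_id) not in order
}
print(sorted(order))
print(unordered)
```

Here e2 and e3 share a day, so the sketch reports them as unordered while every other pair keeps a definite order.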
Accuracy of Textual Information. In many processes, activities and other event
attributes are recorded by humans. In such cases, the activity identifier is often a
natural-language description, which may describe imprecisely what actually happened. For
instance, in the uncertain trace of Table 1, the activity label uncertainty of event e2 might
have been caused by the activity being recorded simply as “TP”. This is also a known
anomaly in process mining; some approaches to repair it exist, and are based on merging
similar labels through NLP methods [1].
Accuracy of Data Detection/Repair Methods. In some cases, events are not
recorded as they happen, but are rather detected from an unstructured source. An example
of this is the detection of events from video feeds using, e.g., deep learning [3,5].
Neural networks can predict the occurrence of an event, describing it as a probability
distribution over the possible classes (here, activity labels). This generates probabilistic
information about events, which fits the framework described in this paper.
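As a sketch of how such detectors could feed this framework, the Python snippet below turns raw classifier scores (logits) for one detected event into a probability distribution over activity labels with a softmax. It optionally drops labels below a threshold and renormalizes the rest. The labels, scores, threshold, and the pruning step itself are hypothetical illustration choices; the works cited above may proceed differently.

```python
from math import exp


def to_uncertain_activity(logits, threshold=0.0):
    """Turn classifier logits {label: score} into a label distribution for one event.

    Labels whose probability is below `threshold` are dropped and the remaining
    probabilities renormalized (a hypothetical pruning step).
    """
    top = max(logits.values())  # shift by the maximum for numerical stability
    weights = {label: exp(score - top) for label, score in logits.items()}
    total = sum(weights.values())
    probs = {label: w / total for label, w in weights.items()}
    kept = {label: p for label, p in probs.items() if p >= threshold}
    if not kept:
        raise ValueError("threshold removes every label")
    norm = sum(kept.values())
    return {label: p / norm for label, p in kept.items()}


# Hypothetical scores produced by a video-based activity recognizer for one event.
scores = {"PickItem": 2.0, "PackItem": 1.0, "ScanItem": -1.0}
print(to_uncertain_activity(scores))
print(to_uncertain_activity(scores, threshold=0.1))
```

The resulting dictionary has the same shape as the weakly uncertain activity attribute of event e5 in Table 2.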
4 Related Research
Techniques to deal with anomalies and noise in data are present in all branches of data
science, from statistics, to machine learning, to process mining itself. Often, the
focus is either on filtering out anomalous data [16] and analyzing the remaining dataset,
or on repairing anomalous attributes by predicting or heuristically inferring their
correct value.
The meta-information describing uncertainty opens a third possibility: the
development of analysis techniques able to operate on uncertain data as-is. In the context
of standard tabular data, this is the research domain of probabilistic databases [15].
Specifically, an approach that lies at the intersection of probabilistic databases and event
data analysis is frequent itemset mining, where the goal is to find sets of items that
frequently appear together across a collection of transactions (whose items might be
events). There exist approaches to solve this problem for probabilistic data, such as the
U-Apriori algorithm [2].
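In U-Apriori [2], the support of an itemset over probabilistic transactions is replaced by its expected support. Assuming that item occurrences are independent, each transaction contributes the product of the existence probabilities of the itemset's items. The minimal Python sketch below computes that quantity on a hypothetical database. Candidate generation and pruning of the full algorithm are omitted.

```python
from math import prod

# Hypothetical probabilistic transactions: item -> existence probability.
transactions = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.4, "b": 1.0, "c": 0.2},
    {"b": 0.7},
]


def expected_support(itemset, transactions):
    """Sum over transactions of the product of the items' existence probabilities.

    An item missing from a transaction has probability 0, so that transaction contributes 0.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in transactions)


print(expected_support({"a", "b"}, transactions))  # 0.9*0.5 + 0.4*1.0 + 0.0
print(expected_support({"b"}, transactions))       # 0.5 + 1.0 + 0.7
```

An itemset is then reported as frequent when its expected support reaches a user-defined minimum.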
The concept of uncertainty as quantifiable imprecision of data is also of great relevance
in the field of machine learning [4], where very recent research aims to detect
possible uncertainties in data, quantify them, and classify them as epistemic or aleatoric.
The topic of uncertainty in process mining as defined in this paper is novel, and, to
the best of our knowledge, no techniques able to manage uncertainty had been described in
the literature before the start of the doctoral program described in this paper. In the next
section, we describe the principle guiding our research on uncertain data, and present
examples of problems solved by applying process mining techniques to uncertain data.
5 Research Methodology
The premises set out in Sections 2 and 3, together with the analysis of the literature,
led us to formulate, among others, the following research questions:
RQ1: How can we adapt conformance checking to be able to deal with uncertain event
data?
RQ2: How can we adapt process discovery to be able to deal with uncertain event data?
RQ3: How can we embed the mathematical formulation of uncertain event data to
obtain uncertain logs from information systems?
RQ4: How can we manage the high complexity tied to all possible scenarios described
by an uncertain trace?
In the following Subsections 5.1 and 5.2 we will describe the methodology utilized
to research RQ1 and RQ2, respectively. RQ3 and RQ4 entail challenges that are still
completely open, and we comment on them in Section 6.
Uncertain event data can be considered noise. Filtering or repairing noisy data in a
pre-processing step is standard practice both in process mining and in data science at large.
In our research, the leading principle is the opposite: retain all data, and exploit the
quantification of uncertainty to analyze it in a trustworthy way. We shift the resolution
of uncertainty from the data side to the algorithm side. Such practice avoids information
loss and unlocks new insights.
Let us see this principle in action on two of the primary process mining analyses:
conformance checking (RQ1) and process discovery (RQ2).
[Figure 1: Petri net with transitions t1 to t6; the visible activity labels are NightSweats, Splenomeg, PrTP, and Adm.]
Figure 1: A normative model for the healthcare process case in the running example. The initial marking is
displayed; the gray “token slot” represents the final marking.
5.1 Conformance Checking
Conformance checking is one of the main tasks in process mining, and consists of measuring
the deviation between process execution data and a reference model. This is particularly
useful for organizations, since it enables them to compare historical process data
against a normative model created by process experts and to identify anomalies in their
operations.
Let us assume that we have access to a normative model for the disease of the patient
in the running example, shown in Figure 1.
This model essentially states that the disease is characterized by the occurrence of
night sweats and splenomegaly in the patient, which may happen concurrently and
should then be followed by primary thrombocytopenia. We would like to measure the
conformance between the trace in Table 1 and this normative model. A very popular
conformance checking technique works via the computation of alignments. Through
this technique, we are able to identify the deviations in the execution of a process, in the
form of behavior happening in the model but not in the trace, and behavior happening
in the trace but not in the model. These deviations are identified and used to compute a
conformance score between the trace and the process model.
The classical formulation of alignments is not applicable to an uncertain trace. In fact,
depending on the instantiation of the uncertain attributes of events, like the timestamp of
e3 in the trace, the order of events may differ, and so may the conformance score. However,
we can look at the best- and worst-case scenarios: the instantiations of the attributes of
the trace that entail the minimum and maximum number of deviations with respect
to the reference model. In our example, two possible outcomes for the sample trace are
⟨NightSweats, Splenomeg, PrTP, Adm⟩ and ⟨SecTP, Splenomeg, Adm⟩; both represent
sequences of events that might have happened in reality, but their conformance scores
are very different. The alignment of the first trace against the reference model can be seen
in Table 3, while the alignment of the second trace can be seen in Table 4. These two outcomes
of the uncertain trace in Table 1 represent, respectively, the minimum and maximum
amount of deviation possible with respect to the reference model, and thus define the
lower and upper bounds of the conformance score.
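The arithmetic behind these bounds can be reproduced with a small Python sketch under two assumptions of ours. First, the model of Figure 1 is represented by its finite language, which we take to be the two sequences in which NightSweats and Splenomeg occur in either order, followed by PrTP and Adm (silent transitions cost nothing). Second, every log move or model move costs 1 and synchronous moves cost 0. Under these costs, the optimal alignment cost of a trace against one model sequence equals the sum of their lengths minus twice their longest common subsequence (LCS), minimized over the model language. The sketch only evaluates the two instantiations named above; a complete computation ranges over all realizations, for example via the behavior net of [13].

```python
# Assumed language of the model in Figure 1 (silent transitions omitted).
MODEL_LANGUAGE = [
    ("NightSweats", "Splenomeg", "PrTP", "Adm"),
    ("Splenomeg", "NightSweats", "PrTP", "Adm"),
]


def lcs_length(x, y):
    """Length of the longest common subsequence of two sequences (dynamic programming)."""
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, a in enumerate(x):
        for j, b in enumerate(y):
            if a == b:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    return table[-1][-1]


def alignment_cost(trace, language=MODEL_LANGUAGE):
    """Optimal alignment cost with unit log/model moves against a finite model language."""
    return min(len(trace) + len(run) - 2 * lcs_length(trace, run) for run in language)


# The two instantiations of the uncertain trace discussed in the text.
best = ("NightSweats", "Splenomeg", "PrTP", "Adm")
worst = ("SecTP", "Splenomeg", "Adm")
costs = [alignment_cost(best), alignment_cost(worst)]
print("deviation cost bounds:", min(costs), max(costs))
```

The cost of 3 for the second instantiation corresponds to one log move (SecTP) and two model moves (NightSweats and PrTP).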
Table 3: An optimal alignment for ⟨NightSweats, Splenomeg, PrTP, Adm⟩, one of the possible instantiations
of the trace in Table 1, against the model in Figure 1. This alignment has a deviation cost of 0, and
corresponds to the best-case scenario for conformance between the process model and the uncertain trace.

Trace:        ≫    NightSweats   Splenomeg   ≫    PrTP   Adm
Model:        τ    NightSweats   Splenomeg   τ    PrTP   Adm
Transition:   t1   t2            t3          t4   t5     t6
Table 4: An optimal alignment for ⟨SecTP, Splenomeg, Adm⟩, one of the possible instantiations of the trace
in Table 1, against the model in Figure 1. This alignment has a deviation cost of 3, caused by 2 moves on model
and 1 move on log, and corresponds to the worst-case scenario for conformance between the process model
and the uncertain trace.

Trace:        SecTP   ≫    ≫             Splenomeg   ≫    ≫      Adm
Model:        ≫       τ    NightSweats   Splenomeg   τ    PrTP   Adm
Transition:           t1   t2            t3          t4   t5     t6
It is possible to find bounds for the conformance score of an uncertain trace and a
reference process model with an extension of the alignment technique [10]. To find
such bounds, it is necessary to build a Petri net able to simulate all possible behaviors
in the uncertain trace, called the behavior net [13]. The behavior net of the trace in Table 1
is shown in Figure 2.
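The ordering information on which such constructions build can be sketched in Python. For the events of Table 1, we draw an edge from one event to another when the first event's timestamp interval ends strictly before the second one starts. We then keep only the edges not implied by transitivity (a transitive reduction). This is our simplified reading of the behavior graph of [12], on which the behavior net is based; the sketch ignores activity labels and indeterminacy, which the behavior net in Figure 2 encodes with additional transitions.

```python
# Events of Table 1 with timestamp intervals (day of month); labels and indeterminacy omitted.
INTERVALS = {"e1": (5, 5), "e2": (8, 8), "e3": (4, 10)}


def precedes(a, b, intervals):
    """a certainly precedes b when a's interval ends strictly before b's interval starts."""
    return intervals[a][1] < intervals[b][0]


def ordering_graph(intervals):
    """Transitive reduction of the certain-precedence relation between events."""
    edges = {(a, b) for a in intervals for b in intervals if precedes(a, b, intervals)}
    # Drop (a, b) whenever some c satisfies a -> c and c -> b.
    return {
        (a, b)
        for a, b in edges
        if not any((a, c) in edges and (c, b) in edges for c in intervals)
    }


print(ordering_graph(INTERVALS))
```

Only e1 → e2 survives: the interval of e3 overlaps both other events, so e3 remains unordered with respect to them, matching its separate branch in Figure 2.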
The alignments in Tables 3 and 4 show how we can obtain actionable insights from
process mining over uncertain data. In some applications, it is reasonable and appropriate
to remove uncertain data from an event log via filtering, and then compute log-level
aggregate information, such as the total number of deviations or the average number of
deviations per trace, using the remaining certain data. Even in processes where this is
possible, doing so prevents the important process mining task of case diagnostics. Conversely,
uncertain alignments not only provide best- and worst-case scenarios for a trace, but also
identify the specific deviations affecting both scenarios. For instance, the alignments
of the running example can be implemented in a system that warns the medics that the
patient might have been affected by a secondary thrombocytopenia not explained by the
model of the disease. Since the model indicates that the disease should develop primary
thrombocytopenia as a symptom, this patient is at risk of both types of platelet deficit
simultaneously, which is a serious condition. The medics can then intervene to avoid this
complication, and perform more exams to ascertain the cause of the patient's
thrombocytopenia.
[Figure 2: behavior net with places named after pairs of events, such as (start, e1), (e1, e2), (e2, e4), (start, e3), (e3, e4), and (e4, end), and transitions such as (e1, NightSweats), (e1, τ), (e2, PrTP), (e2, SecTP), (e3, Splenomeg), and (e4, Adm).]
Figure 2: The behavior net [12] representing the behavior of the uncertain trace in Table 1. The initial
marking is displayed; the gray “token slot” represents the final marking. This artifact is necessary to perform
conformance checking between uncertain traces and a reference model.
5.2 Process Discovery
Process discovery is another main objective in process mining, and involves automatically
creating a process model from event data. Many process discovery algorithms rely on
the concept of directly-follows relationships between activities to gather clues on how to
structure the process model. Uncertain Directly-Follows Graphs (UDFGs) enable the
representation of directly-follows relationships in an uncertain event log; they consist of
directed graphs where the activity labels appearing in the event log constitute the nodes,
and the edges are decorated with information on the minimum and maximum frequency
observable for the directly-follows relation between pairs of activities.
Let us examine an example of UDFG. To build a meaningful example, we
need to introduce an entire uncertain event log; since the full table notation for uncertain
traces becomes cumbersome for entire logs, let us use a simplified shorthand notation.
In a trace, we represent an uncertain event with multiple possible activity labels by listing
all the associated labels between curly braces.
When two events have mutually overlapping timestamps, we write their activity labels
between square brackets, and we indicate indeterminate events by overlining them.
For instance, the trace ⟨a̅, {b, c}, [d, e]⟩ contains 4 events, of which the first
is an indeterminate event with activity label a, the second is an uncertain event that can
have either b or c as activity label, and the last two events have an interval as timestamp
(and the two ranges overlap). Let us consider the following event log:
⟨a, b, e, f, g, h⟩⁸⁰, ⟨a, {b, c}, [e, f], g, i⟩¹⁵, ⟨a, {b, c, d}, [e, f], g, j⟩⁵.
For each pair of activities, we can count the minimum and maximum occurrences
of a directly-follows relationship that can be observed in the log. The resulting UDFG
is shown in Figure 3.
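The minimum and maximum counts can be reproduced by brute force. The Python sketch below encodes each trace of the shorthand notation as a sequence of blocks, where a block lists events with mutually overlapping timestamps whose order is therefore free. It enumerates all realizations of each trace and, for every pair of activities, takes the minimum and maximum number of directly-follows occurrences over those realizations, weighted by the trace frequency. This enumeration is only for illustration: it grows exponentially with the amount of uncertainty and is not meant to reproduce the algorithm of [11]. The example encodes every event of the log as determinate (an assumption of this sketch), so arcs involving indeterminate events may differ from Figure 3.

```python
from collections import Counter
from itertools import permutations, product


def ev(*labels, indeterminate=False):
    """An uncertain event: its possible activity labels plus an indeterminacy flag."""
    return (labels, indeterminate)


def realizations(trace):
    """All activity sequences of a trace given as a list of blocks of overlapping events."""
    per_block = []
    for block in trace:
        options = set()
        for order in permutations(block):
            # None stands for "this indeterminate event did not occur".
            choices = [labels + ((None,) if indet else ()) for labels, indet in order]
            for combo in product(*choices):
                options.add(tuple(label for label in combo if label is not None))
        per_block.append(options)
    return {sum(parts, ()) for parts in product(*per_block)}


def udfg(log):
    """Map each activity pair to (min, max) directly-follows occurrences over the log."""
    bounds = {}
    for trace, freq in log:
        counts = [Counter(zip(seq, seq[1:])) for seq in realizations(trace)]
        for pair in set().union(*counts):
            low = min(c[pair] for c in counts) * freq
            high = max(c[pair] for c in counts) * freq
            old_low, old_high = bounds.get(pair, (0, 0))
            bounds[pair] = (old_low + low, old_high + high)
    return bounds


# The example log; every event is encoded as determinate (assumption of this sketch).
LOG = [
    ([[ev("a")], [ev("b")], [ev("e")], [ev("f")], [ev("g")], [ev("h")]], 80),
    ([[ev("a")], [ev("b", "c")], [ev("e"), ev("f")], [ev("g")], [ev("i")]], 15),
    ([[ev("a")], [ev("b", "c", "d")], [ev("e"), ev("f")], [ev("g")], [ev("j")]], 5),
]

if __name__ == "__main__":
    for pair, (low, high) in sorted(udfg(LOG).items()):
        print(f"{pair[0]} -> {pair[1]}: [{low}, {high}]")
```

For instance, a → b gets [80, 100]: it holds in every realization of the 80 certain traces, but only in some realizations of the other 20.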
[Figure 3: UDFG whose nodes are the activities a to j; each arc carries a [min, max] frequency interval.]
Figure 3: The Uncertain Directly-Follows Graph (UDFG) computed based on the uncertain event log
⟨a, b, e, f, g, h⟩⁸⁰, ⟨a, {b, c}, [e, f], g, i⟩¹⁵, ⟨a, {b, c, d}, [e, f], g, j⟩⁵. The arcs are labeled with the minimum
and maximum number of directly-follows relationships observable between activities in the corresponding
traces. The construction of this object is necessary to perform automatic process discovery over uncertain
event data.

This graph can then be used to discover process models of uncertain logs via process
discovery methods based on directly-follows relationships. In a previous work, we
illustrated this principle by applying it to the inductive miner, a popular discovery
algorithm [11]; the edges of the UDFG can be filtered using the information on the labels, in
such a way that the final model can represent all possible behavior in the uncertain log,
or only a part of it. Figure 4 shows some process models obtained through inductive mining
of the UDFG, together with a description of how each model relates to the original uncertain
log. Notice that none of the three models in the figure could be obtained by filtering out
the traces with uncertainty from the log, which would discard useful information
from the event log.
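One simple way to filter a UDFG, given here as a hypothetical rule that is not necessarily the one used in [11], keeps an arc only if its lower bound (or, alternatively, its upper bound) reaches a chosen threshold. The Python sketch below applies the rule to a few arcs whose bounds follow from the example log. A positive lower bound keeps only relations observed in every realization, in the spirit of Figure 4a, while a positive upper bound keeps every possible relation, in the spirit of Figure 4c. The filtered arcs would then be handed to a directly-follows-based discovery algorithm such as the inductive miner, which is not shown.

```python
# A few arcs of the example UDFG with their (min, max) directly-follows counts.
UDFG_ARCS = {
    ("a", "b"): (80, 100),
    ("a", "c"): (0, 20),
    ("a", "d"): (0, 5),
    ("e", "f"): (80, 100),
    ("f", "e"): (0, 20),
    ("g", "h"): (80, 80),
    ("g", "i"): (15, 15),
}


def filter_arcs(arcs, threshold, use_upper_bound):
    """Keep arcs whose min (certain) or max (possible) count is at least `threshold`.

    Hypothetical filtering rule for illustration only.
    """
    index = 1 if use_upper_bound else 0
    return {arc for arc, bounds in arcs.items() if bounds[index] >= threshold}


certain_only = filter_arcs(UDFG_ARCS, threshold=1, use_upper_bound=False)
all_possible = filter_arcs(UDFG_ARCS, threshold=1, use_upper_bound=True)
print(sorted(certain_only))
print(sorted(all_possible - certain_only))
```

Intermediate thresholds on the upper bound would keep only the more frequently possible relations, which is one way to obtain a compromise model such as the one in Figure 4b.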
The process mining techniques described here are available in a Python library built
on the PM4Py framework [14].
[Figure 4: three process models discovered from the UDFG, shown as panels (a) to (c).]
(a) A process model that can only replay the relationships appearing in the certain parts
of the traces in the uncertain log. Here, the information carried by uncertainty has been
excluded completely.
(b) A process model that can replay some, but not all, of the relationships appearing in
the uncertain parts of the traces in the uncertain log. This process model mediates between
representing only certain observations and representing all the possible behavior
in the process.
(c) A process model that can replay all possible configurations of certain and uncertain
traces in the uncertain log. This process model has the highest possible replay fitness,
but is also very likely to contain some noisy or otherwise unwanted behavior.
Figure 4: Three different process models for the uncertain event log ⟨a, b, e, f, g, h⟩⁸⁰,
⟨a, {b, c}, [e, f], g, i⟩¹⁵, ⟨a, {b, c, d}, [e, f], g, j⟩⁵, obtained through inductive mining over an uncertain
directly-follows graph. Different filtering parameters for the UDFG yield models with distinct
features.
6 Open Challenges
The examples in the previous section illustrate viable solutions to typical process
mining problems in the uncertain case; however, many technical challenges remain
open.
A prominent problem is data sourcing (RQ3). At present, no information
system natively supports the quantification of uncertainty; thus, examples of uncertain
logs come from pre-processing steps that label data as uncertain based on domain
knowledge provided by process experts. This step needs to be automated, for instance by
intervening directly in the data recording process. Uncertainty-aware information systems
would not only enable the full automation of process mining techniques over uncertain
data, but would also benefit general data mining techniques, which would gain
an additional measure of reliability.
Retaining all information from uncertain traces has the drawback that the set of possible
behaviors is subject to a combinatorial explosion (RQ4). While techniques to fully
describe all behaviors and their probabilities exist [8], this comes at the cost of high
(sometimes exponential) computational complexity. In existing techniques, this has been
mitigated by representing uncertain traces as graphs (e.g., the behavior net) and designing
algorithms able to work on graphs as inputs. However, this is ineffective for some
applications, such as measuring classic model/log metrics in process mining like fitness
and precision. We might overcome this problem by switching to approximate techniques,
which allow trading off speed against accuracy in a controlled manner.
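As an illustration of such an approximate route, the Python sketch below uses Monte Carlo sampling to estimate the probability that the weakly uncertain trace of Table 2 is realized as ⟨NightSweats, Splenomeg, PrTP⟩. It assumes independent attributes, reads “?: 25%” as the probability that e4 did not occur, and reads N(7, 1) as mean 7 and standard deviation 1. Under these assumptions the exact value is about 0.55, and the number of samples controls the trade-off: the standard error of the estimate shrinks with the square root of the sample size. This is a generic sampling sketch, not the method of [8].

```python
import random


def sample_sequence(rng):
    """Draw one realization of the trace in Table 2 (assumes independent attributes)."""
    events = []
    if rng.random() >= 0.25:  # e4 (NightSweats) occurs with probability 75%
        events.append((5.0, "NightSweats"))
    label_e5 = "PrTP" if rng.random() < 0.90 else "SecTP"
    events.append((8.0, label_e5))
    events.append((rng.gauss(7.0, 1.0), "Splenomeg"))  # e6 timestamp ~ N(7, 1)
    events.sort()  # order events by timestamp
    return tuple(label for _, label in events)


def estimate(target, n_samples, seed=0):
    """Fraction of sampled realizations equal to `target`."""
    rng = random.Random(seed)
    hits = sum(sample_sequence(rng) == target for _ in range(n_samples))
    return hits / n_samples


target = ("NightSweats", "Splenomeg", "PrTP")
for n in (1_000, 100_000):
    print(n, estimate(target, n))
```

With a fixed seed the estimates are reproducible; increasing the sample size trades running time for a tighter estimate.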
7 Conclusion
The research field of process mining on uncertain event data, while in its infancy, has
proven useful in solving real-life problems that can appear on uncertain data and that
require dedicated techniques. Such techniques do not filter out or repair the uncertain
attributes in event logs, but rather use extended versions of known process mining
algorithms to obtain an uncertainty-aware solution, that is, a solution that explains
uncertainty as an intrinsic part of the process.
In pursuing this line of research, we aim to create a comprehensive set of techniques
that make it possible to carry out the most typical process mining tasks on data with
quantified uncertainty. Our future work will be guided by the open challenges described
here, which, once solved, will enable a rich array of analysis techniques on uncertain data.
Acknowledgements
I am very grateful to Prof. Wil van der Aalst, who advises my doctoral studies, and to Dr.
Merih Seran Uysal, who supervises me in researching this topic. I thank the Alexander
von Humboldt (AvH) Stiftung for supporting my research interactions.
References
[1] van der Aa, Han, Adrian Rebmann, and Henrik Leopold. “Natural language-
based detection of semantic execution anomalies in event logs”. In: Information
Systems 102 (2021), p. 101824. doi:10.1016/j.is.2021.101824.
[2] Chui, Chun Kit, Ben Kao, and Edward Hung. “Mining Frequent Itemsets from Uncertain Data”. In: Advances in Knowledge Discovery and Data Mining, 11th Pacific-Asia Conference, PAKDD 2007, Nanjing, China, May 22-25, 2007, Proceedings. Ed. by Zhou, Zhi-Hua, Hang Li, and Qiang Yang. Vol. 4426. Lecture Notes in Computer Science. Springer, 2007, pp. 47–58. doi:10.1007/978-3-540-71701-0_8.
[3] Cohen, Izack and Avigdor Gal. “Uncertain Process Data with Probabilistic Knowledge: Problem Characterization and Challenges”. In: Proceedings of the International Workshop on BPM Problems to Solve Before We Die (PROBLEMS 2021) co-located with the 19th International Conference on Business Process Management (BPM 2021), Rome, Italy, September 6-10, 2021. Ed. by Beerepoot, Iris, Claudio Di Ciccio, Andrea Marrella, et al. Vol. 2938. CEUR Workshop Proceedings. CEUR-WS.org, 2021, pp. 51–56. url:http://ceur-ws.org/Vol-2938/paper-PROBLEMS-51.pdf.
[4] Hüllermeier, Eyke and Willem Waegeman. “Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods”. In: Machine Learning 110.3 (2021), pp. 457–506. doi:10.1007/s10994-021-05946-3.
[5] Lepsien, Arvid, Jan Bosselmann, Andreas Melfsen, et al. “Process Mining on Video Data”. In: Proceedings of the 14th Central European Workshop on Services and their Composition (ZEUS 2022), Bamberg, Germany, February 24-25, 2022. Ed. by Manner, Johannes, Daniel Lübke, Stephan Haarmann, et al. Vol. 3113. CEUR Workshop Proceedings. CEUR-WS.org, 2022, pp. 56–62. url:http://ceur-ws.org/Vol-3113/paper9.pdf.
[6] Lu, Xixi, Dirk Fahland, and Wil M. P. van der Aalst. “Conformance Checking Based on Partially Ordered Event Data”. In: Business Process Management Workshops - BPM 2014 International Workshops, Eindhoven, The Netherlands, September 7-8, 2014, Revised Papers. Ed. by Fournier, Fabiana and Jan Mendling. Vol. 202. Lecture Notes in Business Information Processing. Springer, 2014, pp. 75–88. doi:10.1007/978-3-319-15895-2_7.
[7] Pegoraro, Marco and Wil M. P. van der Aalst. “Mining Uncertain Event Data in Process Mining”. In: International Conference on Process Mining, ICPM 2019, Aachen, Germany, June 24-26, 2019. IEEE, 2019, pp. 89–96. doi:10.1109/ICPM.2019.00023.
[8] Pegoraro, Marco, Bianka Bakullari, Merih Seran Uysal, et al. “Probability Estimation of Uncertain Process Trace Realizations”. In: Process Mining Workshops - ICPM 2021 International Workshops, Eindhoven, The Netherlands, October 31 - November 4, 2021, Revised Selected Papers. Ed. by Munoz-Gama, Jorge and Xixi Lu. Vol. 433. Lecture Notes in Business Information Processing. Springer, 2021, pp. 21–33. doi:10.1007/978-3-030-98581-3_2.
[9] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “An XES Extension for Uncertain Event Data”. In: Proceedings of the Best Dissertation Award, Doctoral Consortium, and Demonstration & Resources Track at BPM 2021 co-located with 19th International Conference on Business Process Management (BPM 2021), Rome, Italy, September 6th to 10th, 2021. Ed. by van der Aalst, Wil M. P., Remco M. Dijkman, Akhil Kumar, et al. Vol. 2973. CEUR Workshop Proceedings. CEUR-WS.org, 2021, pp. 116–120. url:http://ceur-ws.org/Vol-2973/paper_273.pdf.
[10] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Conformance
checking over uncertain event data”. In: Information Systems 102 (2021), p. 101810.
doi:10.1016/j.is.2021.101810.
[11] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Discovering Process Models from Uncertain Event Data”. In: Business Process Management Workshops - BPM 2019 International Workshops, Vienna, Austria, September 1-6, 2019, Revised Selected Papers. Ed. by Di Francescomarino, Chiara, Remco M. Dijkman, and Uwe Zdun. Vol. 362. Lecture Notes in Business Information Processing. Springer, 2019, pp. 238–249. doi:10.1007/978-3-030-37453-2_20.
[12] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Efficient Construction of Behavior Graphs for Uncertain Event Data”. In: Business Information Systems - 23rd International Conference, BIS 2020, Colorado Springs, CO, USA, June 8-10, 2020, Proceedings. Ed. by Abramowicz, Witold and Gary Klein. Vol. 389. Lecture Notes in Business Information Processing. Springer, 2020, pp. 76–88. doi:10.1007/978-3-030-53337-3_6.
[13] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Efficient Time and Space Representation of Uncertain Event Data”. In: Algorithms 13.11 (2020), p. 285. doi:10.3390/a13110285.
[14] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “PROVED: A Tool for Graph Representation and Analysis of Uncertain Event Data”. In: Application and Theory of Petri Nets and Concurrency - 42nd International Conference, PETRI NETS 2021, Virtual Event, June 23-25, 2021, Proceedings. Ed. by Buchs, Didier and Josep Carmona. Vol. 12734. Lecture Notes in Computer Science. Springer, 2021, pp. 476–486. doi:10.1007/978-3-030-76983-3_24.
[15] Suciu, Dan, Dan Olteanu, Christopher Ré, et al. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011. doi:10.2200/S00362ED1V01Y201105DTM016.