Probabilistic and Non-Deterministic Event
Data in Process Mining: Embedding
Uncertainty in Process Analysis Techniques
Marco Pegoraro¹
¹ Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Aachen, Germany
pegoraro@pads.rwth-aachen.de
Abstract
Process mining is a subeld of process science that analyzes event data collected
in databases called event logs. Recently, novel types of event data have become
of interest due to the wide industrial application of process mining analyses. In
this paper, we examine uncertain event data. Such data contain meta-attributes
describing the amount of imprecision tied with attributes recorded in an event
log. We provide examples of uncertain event data, present the state of the art in
regard of uncertainty in process mining, and illustrate open challenges related to
this research direction.
Keywords: Process Mining · Process Science · Event Data · Probabilistic Data · Non-Deterministic Data.
Colophon
This work is licensed under a Creative Commons “Attribution-NonCommercial 4.0 International” license.
© the authors. Some rights reserved.
This document is an Author Accepted Manuscript (AAM) corresponding to the following scholarly paper:
Pegoraro, Marco. “Probabilistic and Non-Deterministic Event Data in Process Mining: Embedding Uncertainty in
Process Analysis Techniques”. In: Proceedings of the Doctoral Consortium Papers Presented at the 34th International
Conference on Advanced Information Systems Engineering (CAiSE 2022). Ed. by Van Looy, Amy, Barbara Weber, and
Michael Rosemann. CEUR Workshop Proceedings. CEUR-WS.org, 2022
Please, cite this document as shown above.
Publication chronology:
2022-03-15: full text submitted to the International Conference on Advanced Information Systems Engineering (CAiSE) 2022, main track
2022-04-11: notification of acceptance
2022-04-25: camera-ready version submitted
The published version referred to above is © CEUR-WS.org.
Correspondence to:
Marco Pegoraro, Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany
Website: http://mpegoraro.net/ · Email: pegoraro@pads.rwth-aachen.de · ORCID: 0000-0002-8997-7517
Content: 15 pages, 4 figures, 4 tables, 16 references. Typeset with pdfLaTeX, Biber, and BibLaTeX.
Please do not print this document unless strictly necessary.
M. Pegoraro Embedding Uncertainty in Process Analysis Techniques
1 Introduction
Process mining is a rapidly growing subfield of data science that aims to automatically
analyze event data through a collection of techniques, including the extraction of a
process model from a log of historical process executions, the assessment of the conformance
and deviations between observed and expected behavior, and the measurement of
metrics and indicators over event data and process models.
The widespread adoption of process mining in the last decades has increased the
demand for domain-specific process analysis techniques, for instance techniques to
analyze less traditional types of event data. In this paper, we describe novel types of event
data, collectively referred to as uncertain event data [7]. Such data contain meta-attributes
describing and quantifying the amount of imprecision tied to attributes recorded in
an event log. The uncertainty tied to an event attribute might indicate its possible
values, or additionally provide a probability distribution over such values.
The aim of this research direction is to formally illustrate and classify different types
of uncertain event data, and to develop ad-hoc process mining techniques able to natively
function with uncertain event data.
The remainder of the paper is structured as follows. Section 2 shows examples of
uncertain event data. Section 3 discusses some possible sources of uncertainty in recorded
event data. Section 4 explores related concepts in process mining and neighboring
disciplines. Then, Section 5 lays out the research methodology and describes the state of
the art. Section 6 describes some open challenges in the field of uncertainty in process
mining. Finally, Section 7 concludes the paper.
2 Uncertainty in Event Data
To visualize the structure of the attributes in uncertain events more clearly, let
us consider the following process instance, which is a simplified version of anomalies
that actually occur, e.g., in the processes of the healthcare domain.
An elderly patient enrolls in a clinical trial for an experimental treatment against
myeloproliferative neoplasms, a class of blood cancers. This enrollment includes a lab
exam and a visit with a specialist; then, the treatment can begin. The lab exam,
performed on the 8th of July, finds a low level of platelets in the blood of the patient, a
condition known as thrombocytopenia (TP). During the visit on the 10th of July, the
patient reports an episode of night sweats on the night of the 5th of July, prior to the
lab exam. The medic notes this but also hypothesizes that it might not be a symptom,
since it can be caused either by the condition or by external factors (such as very warm
weather). The medic also reads the medical records of the patient and sees that, shortly
prior to the lab exam, the patient was undergoing a heparin treatment (a blood-thinning
medication) to prevent blood clots. The thrombocytopenia, detected by the lab exam,
can then be either primary (caused by the blood cancer) or secondary (caused by other
factors, such as a concomitant condition). Finally, the medic finds an enlargement of the
spleen in the patient (splenomegaly). It is unclear when this condition has developed: it
might have appeared at any moment prior to that point. These events are collected and
recorded in the trace shown in Table 1 within the hospital’s information system.
Table 1: The strongly uncertain trace of an example healthcare process. The Timestamp column shows
only the day of the month.
Case ID   Event ID   Timestamp   Activity       Indeterminacy
ID192     e1         5           NightSweats    ?
ID192     e2         8           PrTP, SecTP
ID192     e3         4–10        Splenomeg
Table 2: A trace where uncertain event attributes are labeled with probabilities (weak uncertainty).
Case ID   Event ID   Timestamp   Activity                 Indeterminacy
ID348     e4         5           NightSweats              ? : 25%
ID348     e5         8           PrTP: 90%, SecTP: 10%
ID348     e6         N(7,1)      Splenomeg
Such a scenario, in which no probabilities are known, is called strong uncertainty. In the
trace of Table 1, the rightmost column refers to event indeterminacy: e1 has been recorded,
but it might not have occurred in reality, and is marked with a “?” symbol. Event e2 has
more than one possible activity label, either PrTP or SecTP. Lastly, event e3 has an
uncertain timestamp, and might have happened at any point in time between the 4th and
10th of July.
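To make the structure of Table 1 concrete, the following is a minimal Python sketch of a strongly uncertain trace. The class name UncertainEvent and its fields (labels, t_min, t_max, indeterminate) are illustrative assumptions, not the schema of the XES extension [9] or of any existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainEvent:
    """A strongly uncertain event (hypothetical structure, for illustration only)."""
    event_id: str
    labels: frozenset            # set of possible activity labels
    t_min: int                   # earliest possible timestamp (day of the month)
    t_max: int                   # latest possible timestamp (day of the month)
    indeterminate: bool = False  # True if the event may not have occurred ("?")

    def is_certain(self) -> bool:
        # An event is certain when no attribute carries uncertainty.
        return (len(self.labels) == 1
                and self.t_min == self.t_max
                and not self.indeterminate)


# The trace of Table 1 (case ID192).
trace_id192 = [
    UncertainEvent("e1", frozenset({"NightSweats"}), 5, 5, indeterminate=True),
    UncertainEvent("e2", frozenset({"PrTP", "SecTP"}), 8, 8),
    UncertainEvent("e3", frozenset({"Splenomeg"}), 4, 10),
]

# Events that carry at least one kind of uncertainty.
uncertain_ids = [e.event_id for e in trace_id192 if not e.is_certain()]
```

In this encoding, each of the three events of Table 1 is uncertain for a different reason: indeterminacy, activity label, and timestamp, respectively.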
Uncertain events may also have probability values associated with them, a scenario
defined as weak uncertainty (Table 2). In the example described above, suppose the medic
estimates that there is a high chance (90%) that the thrombocytopenia is primary (caused
by the cancer). Furthermore, if the splenomegaly is suspected to have developed three
days prior to the visit, which takes place on the 10th of July, the timestamp of event e3
may be described through a Gaussian curve with µ = 7. Lastly, the probability that the
event e1 has been recorded but did not occur in reality may be known (for example, it
may be 25%).
Uncertain data as described here can be represented, imported, analyzed and
exported in all tools supporting the XES standard [9].
3 Sources of Uncertainty
In this section, we examine some possible sources of uncertain event data. This is
intended neither as an exhaustive list nor as a proper taxonomy, but rather as a collection
of motivating situations that are not uncommon in the analysis of event data. In fact, many are
documented in the literature.
It is important to notice that some causes of uncertainty are epistemic, that is, caused
by a loss of information or knowledge at some stage of the data recording process, while
others are aleatoric, that is, intrinsic to the process itself. This distinction, strongly
underlined in other fields such as statistics and machine learning, is very important for
interpreting the results of process mining analyses, especially with regard to process
improvements prompted by the analysis.
Data Coarseness. Limitations in the precision available to record an event
attribute can generate uncertainty. In process mining, this is often the case with
timestamps, the attribute we normally rely on to determine a total ordering between events.
In some event logs, however, timestamps of different events in the same process trace
coincide because of the coarseness of data recording (e.g., when only the day is recorded
but not the time, causing all events that happened on the same day to have the same
timestamp). This is a source of partially ordered event data, a type of uncertain data, and is
well documented in process mining research [6].
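The effect of coarse timestamps can be illustrated with a short Python sketch. The event names and dates below are hypothetical; the point is only that truncating timestamps to the day turns a total order into a partial order.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events with the full timestamps at which they actually happened.
events = [
    ("register", datetime(2022, 7, 5, 9, 30)),
    ("lab_exam", datetime(2022, 7, 5, 14, 0)),
    ("visit", datetime(2022, 7, 6, 10, 15)),
]

# Recording only the day collapses same-day events onto one timestamp.
by_day = defaultdict(list)
for activity, timestamp in events:
    by_day[timestamp.date()].append(activity)

# Groups are ordered by day, but the events inside a group are unordered:
# the labels are sorted alphabetically only to make the output deterministic.
partial_order = [sorted(by_day[day]) for day in sorted(by_day)]
```

Here "register" and "lab_exam" end up in the same group: from the recorded data alone, either order between them is possible.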
Accuracy of Textual Information. In many processes, activities and other event
attributes are recorded by humans. In such cases, the activity identifier is often described
in natural language, which may be imprecise in describing what actually happened. For
instance, in the uncertain trace of Table 1, the activity label uncertainty of event e2 might
have been caused by the activity being recorded simply as “TP”. This is also a known
anomaly in process mining; some approaches to repair it exist, and are based on merging
similar labels through NLP methods [1].
Accuracy of Data Detection/Repair Methods. In some cases, events are not
recorded as they happen, but are rather detected from an unstructured source. An
example of this is detecting events from video feeds using, e.g., deep learning [3, 5].
Neural networks are able to predict the occurrence of an event, describing it as a probability
distribution over the possible classes (here, activity labels). This generates probabilistic
information about events which fits the framework described in this paper.
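As a minimal sketch of this last source, the raw scores of a classifier can be normalized into a probability distribution over activity labels, which has the same shape as the weakly uncertain activity attribute of Table 2. The labels and scores below are hypothetical, and softmax normalization is one common choice rather than the method of [3] or [5].

```python
import math


def softmax(scores):
    """Normalize raw classifier scores into a probability distribution."""
    highest = max(scores.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    exps = {label: math.exp(s - highest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Hypothetical raw scores produced by a classifier for one detected event.
scores = {"pick_item": 2.0, "pack_item": 0.0, "idle": -1.0}

# Weakly uncertain activity attribute: label -> probability.
activity_distribution = softmax(scores)
most_likely_label = max(activity_distribution, key=activity_distribution.get)
```

Keeping the whole distribution, rather than only the most likely label, is what preserves the uncertainty for later analysis.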
4 Related Research
Techniques to deal with anomalies and noise in data are present in all branches of data
science, from statistics, to machine learning, to process mining itself. Often, the
focus is either on filtering anomalous data [16] and analyzing the remaining dataset, or on
repairing anomalous attributes by predicting or heuristically inferring their correct value.
The meta-information describing uncertainty opens a third possibility, which is the
development of analysis techniques able to operate on uncertain data as-is. In the
context of standard tabular data, this is the research domain of probabilistic databases [15].
Specifically, an approach that lies at the intersection of probabilistic databases and event
data analysis is frequent itemset mining, where the goal is to identify frequently appearing
clusters of objects across sets of items (which might be events). There exist approaches
to solve this problem for probabilistic data, such as the U-Apriori algorithm [2].
The concept of uncertainty as quantifiable imprecision of data is also of great
relevance in the field of machine learning [4], and very recent research aims to detect
possible uncertainties in data, quantify them, and classify them as epistemic or aleatoric.
The topic of uncertainty in process mining as defined in this paper is novel and, to
the best of our knowledge, no techniques able to manage uncertainty were described in
the literature before the start of the doctoral program described in this paper. In the next
section, we will describe the research principle that guides our research on uncertain data, and
examples of problems solved by process mining techniques applied to uncertain data.
5 Research Methodology
The premises set out in Sections 2 and 3, together with the analysis of the literature,
led us to formulate, among others, the following research questions:
RQ1: How can we adapt conformance checking to be able to deal with uncertain event
data?
RQ2: How can we adapt process discovery to be able to deal with uncertain event data?
RQ3: How can we embed the mathematical formulation of uncertain event data to
obtain uncertain logs from information systems?
RQ4: How can we manage the high complexity tied with all possible scenarios described
by an uncertain trace?
In the following Subsections 5.1 and 5.2 we will describe the methodology utilized
to research RQ1 and RQ2, respectively. RQ3 and RQ4 entail challenges that are still
completely open, and we comment on them in Section 6.
Uncertain event data can be considered noise. Filtering or repairing noisy data in a
pre-processing step is standard practice both in process mining and data science at large.
In our research, the leading principle is the opposite: retain all data, and exploit the
quantification of uncertainty to analyze it in a trustworthy way. We shift the resolution
of uncertainty from the data side to the algorithm side. Such practice avoids information
loss and unlocks new insights.
Let us see this principle in action on two of the primary process mining analyses:
conformance checking (RQ1) and process discovery (RQ2).
[Figure 1 (diagram omitted): Petri net with transitions t1 to t6 and activity labels NightSweats, Splenomeg, PrTP, Adm.]
Figure 1: A normative model for the healthcare process case in the running example. The initial marking is
displayed; the gray “token slot” represents the final marking.
5.1 Conformance Checking
Conformance checking is one of the main tasks in process mining, and consists of
measuring the deviation between process execution data and a reference model. This is
particularly useful for organizations, since it enables them to compare historical process data
against a normative model created by process experts and to identify anomalies in their
operations.
Let us assume that we have access to a normative model for the disease of the patient
in the running example, shown in Figure 1.
This model essentially states that the disease is characterized by the occurrence of
night sweats and splenomegaly in the patient, which may happen concurrently, and
should then be followed by primary thrombocytopenia. We would like to measure the
conformance between the trace in Table 1 and this normative model. A very popular
conformance checking technique works via the computation of alignments. Through
this technique, we are able to identify the deviations in the execution of a process, in the
form of behavior happening in the model but not in the trace, and behavior happening
in the trace but not in the model. These deviations are identified and used to compute a
conformance score between the trace and the process model.
The formulation of alignments is not applicable to an uncertain trace. In fact,
depending on the instantiation of the uncertain attributes of events (like the timestamp of
e3 in the trace), the order of events may differ, and so may the conformance score.
However, we can look at the best- and worst-case scenarios: the instantiations of the attributes of
the trace that entail the minimum and maximum number of deviations with respect
to the reference model. In our example, two possible outcomes for the sample trace are
⟨NightSweats, Splenomeg, PrTP, Adm⟩ and ⟨SecTP, Splenomeg, Adm⟩; both represent
a sequence of events that might have happened in reality, but their conformance scores
are very different. The alignment of the first trace against the reference model can be seen
in Table 3, while the alignment of the second trace can be seen in Table 4. These two
outcomes of the uncertain trace in Table 1 represent, respectively, the minimum and
maximum amount of deviation possible with respect to the reference model, and thus define
a lower and an upper bound for the conformance score.
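For a trace this small, the two bounds can be checked by brute force. The Python sketch below enumerates the possible outcomes of the uncertain trace and scores each one against the visible traces of the model. It rests on several assumptions that are not stated in the text: MODEL_TRACES is a reading of Figure 1 (night sweats and splenomegaly in either order, then PrTP, then Adm); a final certain Adm event is appended, as in the two outcomes above; and every move on log or move on a visible model transition costs 1, while silent transitions cost 0. It is a naive enumeration, not the technique of [10].

```python
from itertools import product

# Visible traces accepted by the model, as read from Figure 1 (assumption).
MODEL_TRACES = [
    ("NightSweats", "Splenomeg", "PrTP", "Adm"),
    ("Splenomeg", "NightSweats", "PrTP", "Adm"),
]


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def deviation_cost(trace):
    """Unit-cost alignment: unmatched trace events plus unmatched model steps."""
    return min(len(trace) + len(m) - 2 * lcs_length(trace, m)
               for m in MODEL_TRACES)


def realizations():
    """All outcomes of the trace in Table 1, followed by a certain Adm event."""
    choices = product((True, False), ("PrTP", "SecTP"),
                      ("before_e1", "between", "after_e2"))
    for occurred, tp_label, position in choices:
        first = ["NightSweats"] if occurred else []
        if position == "before_e1":
            seq = ["Splenomeg"] + first + [tp_label]
        elif position == "between":
            seq = first + ["Splenomeg", tp_label]
        else:
            seq = first + [tp_label, "Splenomeg"]
        yield tuple(seq + ["Adm"])


costs = {r: deviation_cost(r) for r in set(realizations())}
lower_bound, upper_bound = min(costs.values()), max(costs.values())
```

Under these assumptions, the bounds agree with the deviation costs reported for Tables 3 and 4. The number of outcomes grows quickly with the size of the trace, which is why enumeration is only viable for toy examples.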
Table 3: An optimal alignment for ⟨NightSweats, Splenomeg, PrTP, Adm⟩, one of the possible
instantiations of the trace in Table 1, against the model in Figure 1. This alignment has a deviation cost of 0, and
corresponds to the best-case scenario for conformance between the process model and the uncertain trace.
Trace:        ≫    NightSweats   Splenomeg   ≫    PrTP   Adm
Model:        τ    NightSweats   Splenomeg   τ    PrTP   Adm
Transition:   t1   t2            t3          t4   t5     t6
Table 4: An optimal alignment for ⟨SecTP, Splenomeg, Adm⟩, one of the possible instantiations of the trace
in Table 1, against the model in Figure 1. This alignment has a deviation cost of 3, caused by 2 moves on model
and 1 move on log, and corresponds to the worst-case scenario for conformance between the process model
and the uncertain trace.
Trace:        ≫    ≫             SecTP   Splenomeg   ≫    ≫      Adm
Model:        τ    NightSweats   ≫       Splenomeg   τ    PrTP   Adm
Transition:   t1   t2                    t3          t4   t5     t6
It is possible to nd bounds for the conformance score of an uncertain trace and a
reference process model with an extension of the alignment technique [10]. In order to
nd such bounds, it is necessary to build a Petri net able to simulate all possible behaviors
in the uncertain trace, called the behavior net [13]. The behavior net of the trace in Table 1
is shown in Figure 2.
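The ordering information that such a net has to encode can be sketched as a graph over the events. The Python fragment below assumes that one event precedes another only when its latest possible timestamp is strictly earlier than the other's earliest possible timestamp, and then removes the edges implied by transitivity. The day assigned to e4 (Adm) is an assumed value, since this event is not listed in Table 1. This is a naive illustration of the precedence structure, not the construction algorithm of [12] or [13], and it does not build the Petri net itself.

```python
# Possible timestamp range (t_min, t_max) of each event, as day of the month.
# The value for e4 (Adm) is an assumption: any day after the 10th would do.
intervals = {"e1": (5, 5), "e2": (8, 8), "e3": (4, 10), "e4": (11, 11)}

# e precedes f when e certainly happened before f.
precedes = {
    (e, f)
    for e in intervals
    for f in intervals
    if intervals[e][1] < intervals[f][0]
}

# Transitive reduction: drop (e, f) when some g lies in between.
behavior_graph = {
    (e, f)
    for (e, f) in precedes
    if not any((e, g) in precedes and (g, f) in precedes for g in intervals)
}
```

Event e3 is connected neither to e1 nor to e2: its timestamp range overlaps with both, so any order among them is possible.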
The alignments in Tables 3 and 4 show how we can get actionable insights from
process mining over uncertain data. In some applications it is reasonable and appropriate
to remove uncertain data from an event log via filtering, and then compute log-level
aggregate information (such as the total number of deviations, or the average deviations per
trace) using the remaining certain data. Even in processes where this is possible, doing
so prevents the important process mining task of case diagnostics. Conversely, uncertain
alignments not only provide best- and worst-case scenarios for a trace, but also
identify the specific deviations affecting both scenarios. For instance, the alignments
of the running example can be implemented in a system that warns the medics that the
patient might have been affected by a secondary thrombocytopenia not explained by the
model of the disease. Since the model indicates that the disease should develop primary
thrombocytopenia as a symptom, this patient is at risk of both types of platelet deficit
simultaneously, which is a serious condition. The medics can then intervene to avoid this
complication, and perform more exams to ascertain the cause of the patient’s
thrombocytopenia.
[Figure 2 (diagram omitted): Petri net with places (start, e1), (e1, e2), (e2, e4), (start, e3), (e3, e4), (e4, end)
and transitions (e1, NightSweats), (e1, τ), (e2, PrTP), (e2, SecTP), (e3, Splenomeg), (e4, Adm).]
Figure 2: The behavior net [12] representing the behavior of the uncertain trace in Table 1. The initial
marking is displayed; the gray “token slot” represents the final marking. This artifact is necessary to perform
conformance checking between uncertain traces and a reference model.
5.2 Process Discovery
Process discovery is another main objective in process mining, and involves automatically
creating a process model from event data. Many process discovery algorithms rely on
the concept of directly-follows relationships between activities to gather clues on how to
structure the process model. Uncertain Directly-Follows Graphs (UDFGs) enable the
representation of directly-follows relationships in an uncertain event log; they are
directed graphs whose nodes are the activity labels appearing in the event log, and
whose edges are decorated with information on the minimum and maximum frequency
observable for the directly-follows relation between pairs of activities.
Let us examine an example of a UDFG. In order to build a significant example, we
need to introduce an entire uncertain event log; since the full table notation for uncertain
traces becomes cumbersome for entire logs, let us use a simplified shorthand notation.
In a trace, we represent an uncertain event with multiple possible activity labels by listing
all the associated labels between curly braces.
When two events have mutually overlapping timestamps, we write their activity
labels between square brackets, and we indicate indeterminate events by overlining them.
For instance, the trace ⟨ā, {b, c}, [d, e]⟩ is a trace containing 4 events, of which the first
is an indeterminate event with activity label a, the second is an uncertain event that can
have either b or c as activity label, and the last two events have an interval as timestamp
(and the two ranges overlap). Let us consider the following event log:
⟨a, b, e, f, g, h⟩⁸⁰, ⟨a, {b, c}, [e, f], g, i⟩¹⁵, ⟨a, {b, c, d}, [e, f], g, j⟩⁵.
For each pair of activities, we can count the minimum and maximum occurrences
of a directly-follows relationship that can be observed in the log. The resulting UDFG
is shown in Figure 3.
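The counting step can be sketched in Python by brute force: for each trace, enumerate its possible outcomes, take the minimum and maximum number of times each pair of activities occurs in direct succession, and sum these values over the log weighted by trace multiplicity. The encoding of traces as lists of groups, the helper names ev, realizations and udfg, and the enumeration strategy are all illustrative assumptions; this is not the algorithm of [11], and indeterminate events are not modeled, so the result is not guaranteed to match every label in the figure.

```python
from collections import Counter
from itertools import permutations, product


def ev(*labels):
    """An event, given as the set of its possible activity labels."""
    return frozenset(labels)


# Each trace is a list of groups; events in the same group have mutually
# overlapping timestamps. The second element is the trace multiplicity.
LOG = [
    ([[ev("a")], [ev("b")], [ev("e")], [ev("f")], [ev("g")], [ev("h")]], 80),
    ([[ev("a")], [ev("b", "c")], [ev("e"), ev("f")], [ev("g")], [ev("i")]], 15),
    ([[ev("a")], [ev("b", "c", "d")], [ev("e"), ev("f")], [ev("g")], [ev("j")]], 5),
]


def realizations(trace):
    """Yield every sequence of labels that the uncertain trace may stand for."""
    group_orders = [list(permutations(group)) for group in trace]
    for ordering in product(*group_orders):
        events = [e for group in ordering for e in group]
        for labels in product(*[sorted(e) for e in events]):
            yield labels


def udfg(log):
    """Map each pair of activities to (min, max) directly-follows counts."""
    bounds = {}
    for trace, multiplicity in log:
        counts = [Counter(zip(r, r[1:])) for r in realizations(trace)]
        for pair in set().union(*counts):
            low = min(c[pair] for c in counts) * multiplicity
            high = max(c[pair] for c in counts) * multiplicity
            old_low, old_high = bounds.get(pair, (0, 0))
            bounds[pair] = (old_low + low, old_high + high)
    return bounds


bounds = udfg(LOG)
```

For example, the pair (a, b) is certain in the first 80 traces and possible in the remaining 20, so its interval is [80, 100]; the pair (a, c) never occurs with certainty, so its interval is [0, 20].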
This graph can then be used to discover process models of uncertain logs via
process discovery methods based on directly-follows relationships. In a previous work we
illustrated this principle by applying it to the inductive miner, a popular discovery
algorithm [11]; the edges of the UDFG can be filtered using the information on the labels, in
such a way that the final model can represent all possible behavior in the uncertain log,
or only a part. Figure 4 shows some process models obtained through inductive mining
of the UDFG, as well as a description of how each model relates to the original
uncertain log. Notice that none of the three models in the figure can be obtained by filtering out
the traces with uncertainty from the log; this would radically remove useful information
from the event log.
[Figure 3 (diagram omitted): directed graph over the activities a to j, in which each arc is labeled with a
[minimum, maximum] interval such as [80, 100] or [0, 20].]
Figure 3: The Uncertain Directly-Follows Graph (UDFG) computed based on the uncertain event log
⟨a, b, e, f, g, h⟩⁸⁰, ⟨a, {b, c}, [e, f], g, i⟩¹⁵, ⟨a, {b, c, d}, [e, f], g, j⟩⁵. The arcs are labeled with the minimum
and maximum number of directly-follows relationships observable between activities in the corresponding
trace. The construction of this object is necessary to perform automatic process discovery over uncertain
event data.
The process mining techniques described here are available in a Python library built
on the PM4Py framework [14].
[Figure 4 (diagrams omitted): three process models, described in panels (a) to (c).]
(a) A process model that can only replay the relationships appearing in the certain parts
of the traces in the uncertain log. Here, information from uncertainty has been
excluded completely.
(b) A process model that can replay some, but not all, of the relationships appearing in
the uncertain parts of the traces in the uncertain log. This process model mediates
between representing only certain observations and representing all the possible behavior
in the process.
(c) A process model that can replay all possible configurations of certain and uncertain
traces in the uncertain log. This process model has the highest possible replay fitness,
but is also very likely to contain some noisy or otherwise unwanted behavior.
Figure 4: Three different process models for the uncertain event log ⟨a, b, e, f, g, h⟩⁸⁰,
⟨a, {b, c}, [e, f], g, i⟩¹⁵, ⟨a, {b, c, d}, [e, f], g, j⟩⁵ obtained through inductive mining over an
uncertain directly-follows graph. The different filtering parameters for the UDFG yield models with distinct
features.
6 Open Challenges
The examples in the previous section show some viable solutions to typical
process mining problems in the uncertain case; however, many technical challenges remain
open.
A prominent problem is data sourcing (RQ3). At present, no information
system natively supports the quantification of uncertainty; thus, examples of
uncertain logs come from pre-processing steps that label data as uncertain based on domain
knowledge provided by process experts. This needs to be automated, for instance by
intervening directly on the process of data recording. Uncertainty-aware information systems
would not only enable the full automation of techniques for process mining over
uncertainty, but also more reliably support general data mining techniques, which would gain
an additional measure of reliability.
Retaining all information from uncertain traces has the drawback that the possible
behaviors are subject to a combinatorial explosion (RQ4). While techniques to fully
describe all behavior and the related probabilities exist [8], this comes at the cost of high
(sometimes exponential) computational complexity. In existing techniques, this has been
mitigated by representing uncertain traces as graphs (e.g., the behavior net), and
designing algorithms able to work on graphs as inputs. However, this is ineffective for some
applications, such as measuring classic model/log metrics in process mining like fitness
and precision. We might overcome this problem by switching to approximate
techniques, which allow trading off speed and accuracy in a controlled manner.
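The scale of this combinatorial explosion can be illustrated with a short calculation. The Python sketch below gives an upper bound on the number of outcomes of a trace in which all events have mutually overlapping timestamps; the function name and its parameters are hypothetical, and the bound ignores the fact that distinct choices may produce the same sequence.

```python
import math


def count_realizations(n_overlapping, labels_per_event, n_indeterminate):
    """Upper bound on the outcomes of a trace with mutually overlapping events.

    n_overlapping:     events whose timestamp ranges all overlap each other
    labels_per_event:  possible activity labels of each event
    n_indeterminate:   events that may or may not have occurred
    """
    orderings = math.factorial(n_overlapping)        # every order is possible
    label_choices = labels_per_event ** n_overlapping
    presence_choices = 2 ** n_indeterminate
    return orderings * label_choices * presence_choices


small = count_realizations(3, 2, 0)
large = count_realizations(10, 2, 0)
```

With three overlapping events and two labels each, there are at most 48 outcomes; with ten such events the bound already exceeds three billion, which makes exhaustive enumeration impractical.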
7 Conclusion
The research eld of process mining on uncertain event data, while at its infancy, has
proven useful in solving real-life problems that can appear on uncertain data and that
require dedicated techniques. Such techniques do not lter out or repair the uncertain
attributes in event logs, but rather use extended versions of known process mining algo-
rithms to obtain an uncertainty-aware solution—a solution that explains uncertainty as
intrinsic part of the process.
In pursuing this line of research, we aim to create a comprehensive set of techniques
that make it possible to carry out the most typical process mining tasks on data with quantified
uncertainty. Our future work will be guided by the open challenges described here
which, once solved, will enable a rich array of analysis techniques on uncertain data.
Acknowledgements
I am very grateful to Prof. Wil van der Aalst, who advises my doctoral studies, and to Dr.
Merih Seran Uysal, who supervises me in researching this topic. I thank the Alexander
von Humboldt (AvH) Stiftung for supporting my research interactions.
References
[1] van der Aa, Han, Adrian Rebmann, and Henrik Leopold. “Natural language-
based detection of semantic execution anomalies in event logs”. In: Information
Systems 102 (2021), p. 101824. doi:10.1016/j.is.2021.101824.
[2] Chui, Chun Kit, Ben Kao, and Edward Hung. “Mining Frequent Itemsets from
Uncertain Data”. In: Advances in Knowledge Discovery and Data Mining, 11th
Pacific-Asia Conference, PAKDD 2007, Nanjing, China, May 22-25, 2007,
Proceedings. Ed. by Zhou, Zhi-Hua, Hang Li, and Qiang Yang. Vol. 4426. Lecture
Notes in Computer Science. Springer, 2007, pp. 47–58. doi:10.1007/978-3-540-71701-0_8.
[3] Cohen, Izack and Avigdor Gal. “Uncertain Process Data with Probabilistic
Knowledge: Problem Characterization and Challenges”. In: Proceedings of the
International Workshop on BPM Problems to Solve Before We Die (PROBLEMS 2021)
co-located with the 19th International Conference on Business Process Management
(BPM 2021), Rome, Italy, September 6-10, 2021. Ed. by Beerepoot, Iris, Claudio Di
Ciccio, Andrea Marrella, et al. Vol. 2938. CEUR Workshop Proceedings.
CEUR-WS.org, 2021, pp. 51–56. url:http://ceur-ws.org/Vol-2938/paper-PROBLEMS-51.pdf.
[4] Hüllermeier, Eyke and Willem Waegeman. “Aleatoric and epistemic uncertainty
in machine learning: an introduction to concepts and methods”. In: Machine
Learning 110.3 (2021), pp. 457–506. doi:10.1007/s10994-021-05946-3.
[5] Lepsien, Arvid, Jan Bosselmann, Andreas Melfsen, et al. “Process Mining on Video
Data”. In: Proceedings of the 14th Central European Workshop on Services and
their Composition (ZEUS 2022), Bamberg, Germany, February 24-25, 2022. Ed.
by Manner, Johannes, Daniel Lübke, Stephan Haarmann, et al. Vol. 3113. CEUR
Workshop Proceedings. CEUR-WS.org, 2022, pp. 56–62.
url:http://ceur-ws.org/Vol-3113/paper9.pdf.
[6] Lu, Xixi, Dirk Fahland, and Wil M. P. van der Aalst. “Conformance Checking
Based on Partially Ordered Event Data”. In: Business Process Management
Workshops - BPM 2014 International Workshops, Eindhoven, The Netherlands,
September 7-8, 2014, Revised Papers. Ed. by Fournier, Fabiana and Jan Mendling. Vol. 202.
Lecture Notes in Business Information Processing. Springer, 2014, pp. 75–88.
doi:10.1007/978-3-319-15895-2_7.
[7] Pegoraro, Marco and Wil M. P. van der Aalst. “Mining Uncertain Event Data in
Process Mining”. In: International Conference on Process Mining, ICPM 2019,
Aachen, Germany, June 24-26, 2019. IEEE, 2019, pp. 89–96.
doi:10.1109/ICPM.2019.00023.
[8] Pegoraro, Marco, Bianka Bakullari, Merih Seran Uysal, et al. “Probability Estima-
tion of Uncertain Process Trace Realizations”. In: Process Mining Workshops -
ICPM 2021 International Workshops, Eindhoven, The Netherlands, October 31 -
November 4, 2021, Revised Selected Papers. Ed. by Munoz-Gama, Jorge and Xixi
13 / 15
M. Pegoraro Embedding Uncertainty in Process Analysis Techniques
Lu. Vol. 433. Lecture Notes in Business Information Processing. Springer, 2021,
pp. 21–33. doi:10.1007/978-3-030-98581-3_2.
[9] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “An XES Exten-
sion for Uncertain Event Data”. In: Proceedings of the Best Dissertation Award,
Doctoral Consortium, and Demonstration & Resources Track at BPM 2021 co-
located with 19th International Conference on Business Process Management (BPM
2021), Rome, Italy, September 6th to 10th, 2021. Ed. by van der Aalst, Wil M. P.,
Remco M. Dijkman, Akhil Kumar, et al. Vol. 2973. CEUR Workshop Proceed-
ings. CEUR-WS.org, 2021, pp. 116–120. url:http :/ /ceur - ws .org / Vol-
2973/paper_273.pdf.
[10] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Conformance
checking over uncertain event data”. In: Information Systems 102 (2021), p. 101810.
doi:10.1016/j.is.2021.101810.
[11] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Discovering
Process Models from Uncertain Event Data”. In: Business Process Management
Workshops - BPM 2019 International Workshops, Vienna, Austria, September 1-6,
2019, Revised Selected Papers. Ed. by Di Francescomarino, Chiara, Remco M. Dijk-
man, and Uwe Zdun. Vol. 362. Lecture Notes in Business Information Processing.
Springer, 2019, pp. 238–249. doi:10.1007/978-3-030-37453-2_20.
[12] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Ecient Con-
struction of Behavior Graphs for Uncertain Event Data”. In: Business Information
Systems - 23rd International Conference, BIS 2020, Colorado Springs, CO, USA,
June 8-10, 2020, Proceedings. Ed. by Abramowicz, Witold and Gary Klein. Vol. 389.
Lecture Notes in Business Information Processing. Springer, 2020, pp. 76–88.
doi:10.1007/978-3-030-53337-3_6.
[13] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Efficient Time
and Space Representation of Uncertain Event Data”. In: Algorithms 13.11 (2020),
p. 285. doi:10.3390/a13110285.
[14] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “PROVED: A
Tool for Graph Representation and Analysis of Uncertain Event Data”. In: Ap-
plication and Theory of Petri Nets and Concurrency - 42nd International Con-
ference, PETRI NETS 2021, Virtual Event, June 23-25, 2021, Proceedings. Ed. by
Buchs, Didier and Josep Carmona. Vol. 12734. Lecture Notes in Computer Sci-
ence. Springer, 2021, pp. 476–486. doi:10.1007/978-3-030-76983-3_24.
[15] Suciu, Dan, Dan Olteanu, Christopher Ré, et al. Probabilistic Databases. Synthesis
Lectures on Data Management. Morgan & Claypool Publishers, 2011.
doi:10.2200/S00362ED1V01Y201105DTM016.
[16] Wang, Hongzhi, Mohamed Jaward Bah, and Mohamed Hammad. “Progress in
Outlier Detection Techniques: A Survey”. In: IEEE Access 7 (2019), pp. 107964–
108000. doi:10.1109/ACCESS.2019.2932769.