Process Mining on Uncertain Event Data
Marco Pegoraro 1
1Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Aachen, Germany
pegoraro@pads.rwth-aachen.de
Abstract
With the widespread adoption of process mining in organizations, the field of
process science is seeing an increase in the demand for ad-hoc analysis techniques of
non-standard event data. An example of such data are uncertain event data: events
characterized by a described and quantified attribute imprecision. This paper
outlines a research project aimed at developing process mining techniques able to
extract insights from uncertain data. We set the basis for this research topic,
recapitulate the available literature, and define a future outlook.
Keywords: Process Mining · Uncertain Data · Partial Order.
Colophon
This work is licensed under a Creative Commons “Attribution-NonCommercial 4.0 In-
ternational” license.
© the author. Some rights reserved.
This document is an Author Accepted Manuscript (AAM) corresponding to the following scholarly paper:
Pegoraro, Marco. “Process Mining on Uncertain Event Data”. In: International Conference on Process Mining ICPM
2021, Doctoral Consortium and Tool Demonstration Track, Eindhoven, the Netherlands, October 31–November 4, 2021.
Ed. by Jans, Mieke et al. CEUR-WS.org, 2021
Please, cite this document as shown above.
Publication chronology:
• 2021-08-20: full text submitted to the International Conference on Process Mining (ICPM) 2021, Doctoral Consortium
• 2021-10-08: notification of acceptance
• 2021-10-13: camera-ready version submitted
• 2021-10-31: presented
• 2022-03-01: proceedings published
Correspondence to:
Marco Pegoraro, Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany
Website: http://mpegoraro.net/ · Email: pegoraro@pads.rwth-aachen.de · ORCID: 0000-0002-8997-7517
Content: 8 pages, 1 figure, 3 tables, 9 references. Typeset with pdfLaTeX, Biber, and BibLaTeX.
Please do not print this document unless strictly necessary.
M. Pegoraro Process Mining on Uncertain Event Data
1 Introduction
Since its inception, process mining has proved its value in commercial applications.
An ever-increasing number of success stories has led to a vast demand for the
most diverse process analysis techniques, often customized to meet the needs of specific
domains. Among these, novel techniques have been introduced to mine non-standard
types of data.
This paper presents a research direction aimed at mining one such anomalous
(i.e., uncommon) type of event data: uncertain data. Such data is associated with a degree
of imprecision that affects event attributes, which is described and quantified through
sets of possible attribute labels, intervals of possible values, or probability distributions.
The remainder of the paper is structured as follows. Section 2 illustrates with
examples the structure of uncertain event data. Section 3 shows the research principles
regarding process mining on uncertain data, and reports recent results on the topic.
Finally, Section 4 outlines open challenges, outlook, and future perspectives of this line of
research.
2 Uncertain Data
To visualize the structure of the attributes in uncertain events more clearly, let us
consider the following process instance, which is a simplified version of anomalies that
actually occur, e.g., in processes of the healthcare domain. An elderly patient enrolls
in a clinical trial for an experimental treatment against myeloproliferative neoplasms, a
class of blood cancers. This enrollment includes a lab exam and a visit with a specialist;
then, the treatment can begin. The lab exam, performed on the 8th of July, finds a low
level of platelets in the blood of the patient, a condition known as thrombocytopenia
(TP). During the visit on the 10th of July, the patient reports an episode of night sweats
on the night of the 5th of July, prior to the lab exam. The medic notes this but also hy-
pothesizes that it might not be a symptom, since it can be caused either by the condition
or by external factors (such as very warm weather). The medic also reads the medical
records of the patient and sees that, shortly prior to the lab exam, the patient was under-
going a heparin treatment (a blood-thinning medication) to prevent blood clots. The
thrombocytopenia, detected by the lab exam, can then be either primary (caused by the
blood cancer) or secondary (caused by other factors, such as a concomitant condition).
Finally, the medic finds an enlargement of the spleen in the patient (splenomegaly). It
is unclear when this condition has developed: it might have appeared at any moment
prior to that point. These events are collected and recorded in the trace shown in Table 1
within the hospital’s information system.

Table 1: The strongly uncertain trace of an example healthcare process. The timestamp column shows
only the day of the month.

Case ID   Event ID   Timestamp   Activity       Indeterminacy
ID192     e1         5           NightSweats    ?
ID192     e2         8           PrTP, SecTP
ID192     e3         4–10        Splenomeg

Such a scenario, with no known probability, is known as strong uncertainty. In this
trace, the rightmost column refers to event indeterminacy: in this case, e1 has been recorded,
but it might not have occurred in reality, and is marked with a “?” symbol. Event e2 has
more than one possible activity label, either PrTP or SecTP. Lastly, event e3 has an
uncertain timestamp, and might have happened at any point in time between the 4th and
10th of July.
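
To make the structure of such a trace concrete, the following is a minimal sketch of how the trace of Table 1 could be held in memory. The class name, field names, and the helper `uncertain_attributes` are illustrative assumptions and are not taken from any existing tool; certain timestamps are encoded as degenerate intervals.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class UncertainEvent:
    """A strongly uncertain event; all names here are illustrative."""
    case_id: str
    event_id: str
    activities: FrozenSet[str]   # set of possible activity labels
    timestamp: Tuple[int, int]   # (earliest, latest) day of the month
    indeterminate: bool = False  # True when the event is marked with "?"


def uncertain_attributes(event: UncertainEvent) -> List[str]:
    """List which attributes of the event carry uncertainty."""
    found = []
    earliest, latest = event.timestamp
    if earliest < latest:
        found.append("timestamp")
    if len(event.activities) > 1:
        found.append("activity")
    if event.indeterminate:
        found.append("indeterminacy")
    return found


# The trace of Table 1 (case ID192).
trace = [
    UncertainEvent("ID192", "e1", frozenset({"NightSweats"}), (5, 5), True),
    UncertainEvent("ID192", "e2", frozenset({"PrTP", "SecTP"}), (8, 8)),
    UncertainEvent("ID192", "e3", frozenset({"Splenomeg"}), (4, 10)),
]

for event in trace:
    print(event.event_id, uncertain_attributes(event))
```

Encoding every attribute as a set or an interval, even when it is certain, keeps certain and uncertain events in one uniform representation.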
Uncertain events may also have probability values associated with them, a scenario
defined as weak uncertainty (Table 2). In the example described above, suppose the medic
estimates that there is a high chance (90%) that the thrombocytopenia is primary (caused
by the cancer). Furthermore, if the splenomegaly is suspected to have developed three
days prior to the visit, which takes place on the 10th of July, the timestamp of event e3
may be described through a Gaussian curve with µ = 7. Lastly, the probability that the
event e1 has been recorded but did not occur in reality may be known (for example, it
may be 25%).

Table 2: A trace where uncertain event attributes are labeled with probabilities (weak uncertainty).

Case ID   Event ID   Timestamp   Activity                Indeterminacy
ID348     e4         5           NightSweats             ?: 25%
ID348     e5         8           PrTP: 90%, SecTP: 10%
ID348     e6         N(7, 1)     Splenomeg

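
As a worked illustration of how the probabilities in Table 2 can be combined, the sketch below computes the chance that e6 falls before, between, or after the two certain timestamps, and then the probability of one complete scenario. It assumes that the uncertain attributes are mutually independent and that the certain timestamps are exact points; both are simplifying assumptions of this sketch, not the estimation method of [3]. The variable names are illustrative.

```python
from statistics import NormalDist

# Values taken from Table 2; treating them as independent is an assumption.
p_e4_occurred = 1 - 0.25             # "?: 25%" is the chance e4 did not occur
p_label = {"PrTP": 0.90, "SecTP": 0.10}
e6_time = NormalDist(mu=7, sigma=1)  # N(7, 1): variance 1 and sigma 1 coincide
day_e4, day_e5 = 5, 8                # certain timestamps, treated as points

# Probability of each position of e6 relative to e4 and e5.
p_e6_first = e6_time.cdf(day_e4)
p_e6_between = e6_time.cdf(day_e5) - e6_time.cdf(day_e4)
p_e6_last = 1 - e6_time.cdf(day_e5)

# One scenario: e4 occurred, e5 is PrTP, and e6 lies between e4 and e5.
p_scenario = p_e4_occurred * p_label["PrTP"] * p_e6_between

print(round(p_e6_first, 4), round(p_e6_between, 4), round(p_e6_last, 4))
print(round(p_scenario, 4))
```

Under these assumptions, the ordering in which e6 lies between e4 and e5 carries most of the probability mass (about 0.82), since the mean of the Gaussian sits between the two certain timestamps.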
Table 3 summarizes the types of uncertain data that are the subject of our research.
3 Research Approach
We will now illustrate the guiding principles of our research plans, through a series of
assertions.
Assertion 1 (Uncertainty is not noise). Uncertain data contain information and
value. We do not aim to analyze the data beyond the uncertainty, but the data within the
uncertainty.

Table 3: The four different types of uncertainty subject of this research project.

                  Weak uncertainty                    Strong uncertainty
                  (stochastic)                        (non-deterministic)
Discrete data     Discrete probability distribution   Set of possible values
                  [example plot omitted]              {x1, x2, x3, ...} ⊆ X
Continuous data   Probability density function        Interval
                  [example plot omitted]              {x ∈ R | a ≤ x ≤ b}

Assertion 2 (Uncertainty should not be filtered or repaired). To extract information
from uncertainty itself, existing approaches to filter or repair data are not applicable:
information from uncertainty must be accounted for, and not altered.
Assertion 3 (Uncertainty is behavior). The many possible values for event attributes
entail numerous possible scenarios for the control-flow perspective of an uncertain trace,
which can be represented as behavior. To fully analyze uncertain process instances, it is
necessary to account for such behavior.
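
To illustrate Assertion 3, the sketch below enumerates the possible scenarios of the strongly uncertain trace of Table 1 by brute force: every indeterminate event may be kept or dropped, every activity label may be chosen, and every ordering compatible with the timestamp intervals is allowed. The tuple layout and function names are assumptions made for this illustration, and exhaustive enumeration is only practical for very short traces.

```python
from itertools import permutations, product

# (event id, possible labels, (earliest day, latest day), indeterminate?)
TRACE = [
    ("e1", ("NightSweats",), (5, 5), True),
    ("e2", ("PrTP", "SecTP"), (8, 8), False),
    ("e3", ("Splenomeg",), (4, 10), False),
]


def order_is_feasible(events):
    """Check that timestamps can be picked so the events occur in this order."""
    current = float("-inf")
    for _, _, (earliest, latest), _ in events:
        current = max(current, earliest)  # earliest time this event can take
        if current > latest:
            return False
    return True


def realizations(trace):
    """Return the set of activity sequences the uncertain trace may stand for."""
    result = set()
    # Indeterminate events may be present or absent; the others are present.
    presence_options = [(True, False) if ev[3] else (True,) for ev in trace]
    for presence in product(*presence_options):
        kept = [ev for ev, keep in zip(trace, presence) if keep]
        for order in permutations(kept):
            if not order_is_feasible(order):
                continue
            # Pick one activity label per event in this ordering.
            for labels in product(*(ev[1] for ev in order)):
                result.add(labels)
    return result


for sequence in sorted(realizations(TRACE)):
    print(sequence)
```

Even this three-event trace expands into several distinct activity sequences, which is why compact representations of this behavior are preferable to explicit enumeration.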
The fundamental technique that enables the analysis of uncertain traces is their rep-
resentation as dynamic objects that incorporate the intrinsic behavior of uncertain traces,
such as graphs or Petri nets (behavior graphs or behavior nets [2], respectively). This leads
to the schematic visible in Figure 1.
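
A graph representation of this kind can be sketched as follows, under the assumption that an edge is drawn from one event to another whenever the first certainly precedes the second (its latest possible timestamp is earlier than the other's earliest possible timestamp), and that redundant edges are then removed by transitive reduction. This is a simplified reading for illustration only; see [2] for the formal definition and [6, 7] for efficient constructions. The function name and input layout are hypothetical, and indeterminacy and activity labels are ignored here.

```python
from typing import Dict, Set, Tuple

Interval = Tuple[float, float]  # (earliest, latest) possible timestamp


def behavior_graph(times: Dict[str, Interval]) -> Set[Tuple[str, str]]:
    """Edges of a precedence graph over uncertain timestamps (naive sketch)."""
    # a -> b whenever a certainly happens before b.
    precedes = {
        (a, b)
        for a, (_, a_latest) in times.items()
        for b, (b_earliest, _) in times.items()
        if a != b and a_latest < b_earliest
    }
    # "Certainly precedes" is transitive, so an edge is redundant exactly
    # when some intermediate event lies between its endpoints.
    return {
        (a, c)
        for (a, c) in precedes
        if not any((a, b) in precedes and (b, c) in precedes for b in times)
    }


# Timestamps of the trace in Table 1.
table1 = {"e1": (5, 5), "e2": (8, 8), "e3": (4, 10)}
print(sorted(behavior_graph(table1)))
```

For the trace of Table 1 the only edge connects e1 to e2, while e3 remains unconnected because its interval overlaps both other events; the absence of a path between two events is what encodes the uncertainty on their order.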
A number of mining techniques for uncertain event data are now present in the
literature. A taxonomy of uncertain event data is available [2], as well as a method to reliably
compute the probability associated with each real-life scenario in an uncertain trace [3].
There exist approaches for conformance checking [4] and process discovery [5] over
strongly uncertain event data. The key phase in uncertain data analysis, building the graph
representation, has been optimized through efficient algorithms [6, 7]. Such techniques
are available in the PROVED toolset [8], which employs an ad-hoc extension of the XES
standard to represent uncertain data [9]. A real-life source of uncertain data, convolutional
neural network sensing in video feeds of processes, has been described, as well as
an additional taxonomy also involving process models [1].

[Figure 1 diagram omitted. Elements: Agents, Process, Domain knowledge, Raw data, Uncertain
event log, Graph representation, Process model. Labeled steps: Records, Abstracts, Process
discovery, Conformance checking.]
Figure 1: The overall schema for process mining over uncertainty.

4 Open Challenges and Conclusion
The field of process mining over uncertain data is still in its infancy. While some
techniques to perform discovery and conformance checking over uncertainty do exist, the
weakly uncertain case is still unexplored. The principle of the four quality metrics of logs
and processes (fitness, precision, generalization, simplicity), a cornerstone of process mining,
needs to be (re)developed in the context of uncertain data.
By analyzing uncertain event data without discarding any of the attributes in
an uncertain event log, this research direction unlocks the extraction of process information
that was formerly inaccessible. Insights from process mining analyses can, as a consequence,
maintain quantified guarantees of reliability and accuracy even in the presence of data
affected by uncertainty.
Acknowledgments
I am very grateful to Prof. Wil van der Aalst, who advises my doctoral studies, and to
Merih Seran Uysal, who supervises me in researching this topic. I thank the Alexander
von Humboldt (AvH) Stiftung for supporting my research interactions.
References
[1] Cohen, Izack and Avigdor Gal. “Uncertain Process Data with Probabilistic
Knowledge: Problem Characterization and Challenges”. In: CoRR/abs (2021).
arXiv: 2106.03324. url:https://arxiv.org/abs/2106.03324.
[2] Pegoraro, Marco and Wil M. P. van der Aalst. “Mining Uncertain Event Data in
Process Mining”. In: International Conference on Process Mining, ICPM 2019,
Aachen, Germany, June 24-26, 2019. IEEE, 2019, pp. 89–96.
doi:10.1109/ICPM.2019.00023.
[3] Pegoraro, Marco, Bianka Bakullari, Merih Seran Uysal, et al. “Probability Esti-
mation of Uncertain Process Trace Realizations”. In: International Workshop on
Event Data and Behavioral Analytics (EdbA). Springer. 2021.
[4] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Conformance
checking over uncertain event data”. In: Information Systems 102 (2021), p. 101810.
doi:10.1016/j.is.2021.101810.
[5] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Discovering
Process Models from Uncertain Event Data”. In: Business Process Management
Workshops - BPM 2019 International Workshops, Vienna, Austria, September 1-6,
2019, Revised Selected Papers. Ed. by Francescomarino, Chiara Di, Remco M. Dijk-
man, and Uwe Zdun. Vol. 362. Lecture Notes in Business Information Processing.
Springer, 2019, pp. 238–249. doi:10.1007/978-3-030-37453-2_20.
[6] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Efficient
Construction of Behavior Graphs for Uncertain Event Data”. In: Business Information
Systems - 23rd International Conference, BIS 2020, Colorado Springs, CO, USA,
June 8-10, 2020, Proceedings. Ed. by Abramowicz, Witold and Gary Klein. Vol. 389.
Lecture Notes in Business Information Processing. Springer, 2020, pp. 76–88. doi:
10.1007/978-3-030-53337-3_6.
[7] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Efficient Time
and Space Representation of Uncertain Event Data”. In: Algorithms 13.11 (2020),
p. 285. doi:10.3390/a13110285.
[8] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “PROVED: A
Tool for Graph Representation and Analysis of Uncertain Event Data”. In: Appli-
cation and Theory of Petri Nets and Concurrency - 42nd International Conference,
PETRI NETS 2021, Virtual Event, June 23-25, 2021, Proceedings. Ed. by Buchs, Di-
dier and Josep Carmona. Vol. 12734. Lecture Notes in Computer Science. Springer,
2021, pp. 476–486. doi:10.1007/978-3-030-76983-3_24.
[9] Pegoraro, Marco, Merih Seran Uysal, and Wil M.P. van der Aalst. “An XES Ex-
tension for Uncertain Event Data”. In: Proceedings of the Demonstration & Re-
sources Track, Best BPM Dissertation Award, and Doctoral Consortium at BPM
2021 co-located with the 19th International Conference on Business Process Manage-
ment, BPM 2021, Rome, Italy, September 6-10, 2021. Ed. by van der Aalst, Wil
M. P., Remco Dijkman, Akhil Kumar, et al. CEUR-WS.org, 2021, pp. 116–120.
url:http://ceur-ws.org/Vol-2973/#paper_273.