Process Mining on Uncertain Event Data

Marco Pegoraro¹
¹Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Aachen, Germany
pegoraro@pads.rwth-aachen.de
Abstract
With the widespread adoption of process mining in organizations, the field of process
science is seeing an increase in the demand for ad-hoc analysis techniques of
non-standard event data. An example of such data are uncertain event data: events
characterized by a described and quantified attribute imprecision. This paper outlines
a research project aimed at developing process mining techniques able to extract
insights from uncertain data. We set the basis for this research topic, recapitulate
the available literature, and define a future outlook.
Keywords: Process Mining · Uncertain Data · Partial Order.
Colophon
This work is licensed under a Creative Commons “Attribution-NonCommercial 4.0
International” license.
© the author. Some rights reserved.
This document is an Author Accepted Manuscript (AAM) corresponding to the following scholarly paper:
Pegoraro, Marco. “Process Mining on Uncertain Event Data”. In: International Conference on Process Mining ICPM
2021, Doctoral Consortium and Tool Demonstration Track, Eindhoven, the Netherlands, October 31–November 4, 2021.
Ed. by Jans, Mieke et al. CEUR-WS.org, 2021
Please, cite this document as shown above.
Publication chronology:
2021-08-20: full text submitted to the International Conference on Process Mining (ICPM) 2021, Doctoral Consortium
2021-10-08: notification of acceptance
2021-10-13: camera-ready version submitted
2021-10-31: presented
2022-03-01: proceedings published
Correspondence to:
Marco Pegoraro, Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany
Website: http://mpegoraro.net/ · Email: pegoraro@pads.rwth-aachen.de · ORCID: 0000-0002-8997-7517
Content: 8 pages, 1 figure, 3 tables, 9 references. Typeset with pdfLaTeX, Biber, and BibLaTeX.
Please do not print this document unless strictly necessary.
M. Pegoraro Process Mining on Uncertain Event Data
1 Introduction
Since its inception, process mining has proved its value in commercial applications.
An ever-increasing number of success stories has led to strong demand for diverse
process analysis techniques, often customized to meet the needs of specific domains.
Among these, novel techniques have been introduced to mine non-standard types of data.
This paper presents a research direction aimed at mining one such anomalous (i.e.,
uncommon) type of event data: uncertain data. Such data are associated with a degree
of imprecision that affects event attributes, which is described and quantified through
sets of possible attribute labels, intervals of possible values, or probability distributions.
The remainder of the paper is structured as follows. Section 2 illustrates the structure
of uncertain event data with examples. Section 3 presents the research principles for
process mining on uncertain data and reports recent results on the topic. Finally,
Section 4 outlines open challenges, outlook, and future perspectives of this line of
research.
2 Uncertain Data
In order to more clearly visualize the structure of the attributes in uncertain events, let us
consider the following process instance, which is a simplified version of actually occurring
anomalies, e.g., in the processes of the healthcare domain. An elderly patient enrolls
in a clinical trial for an experimental treatment against myeloproliferative neoplasms, a
class of blood cancers. This enrollment includes a lab exam and a visit with a specialist;
then, the treatment can begin. The lab exam, performed on the 8th of July, finds a low
level of platelets in the blood of the patient, a condition known as thrombocytopenia
(TP). During the visit on the 10th of July, the patient reports an episode of night sweats
on the night of the 5th of July, prior to the lab exam. The medic notes this but also
hypothesizes that it might not be a symptom, since it can be caused either by the condition
or by external factors (such as very warm weather). The medic also reads the medical
records of the patient and sees that, shortly prior to the lab exam, the patient was
undergoing a heparin treatment (a blood-thinning medication) to prevent blood clots. The
thrombocytopenia, detected by the lab exam, can then be either primary (caused by the
blood cancer) or secondary (caused by other factors, such as a concomitant condition).
Finally, the medic finds an enlargement of the spleen in the patient (splenomegaly). It
is unclear when this condition has developed: it might have appeared at any moment
prior to that point. These events are collected and recorded in the trace shown in Table 1
within the hospital’s information system.
Table 1: The strongly uncertain trace of an example healthcare process. The Timestamp column shows
only the day of the month.
Case ID   Event ID   Timestamp   Activity       Indeterminacy
ID192     e1         5           NightSweats    ?
ID192     e2         8           PrTP, SecTP
ID192     e3         4–10        Splenomeg
Such a scenario, with no known probability, is known as strong uncertainty. In this
trace, the rightmost column refers to event indeterminacy: in this case, e1 has been recorded,
but it might not have occurred in reality, and is marked with a “?” symbol. Event e2 has
more than one possible activity label, either PrTP or SecTP. Lastly, event e3 has an
uncertain timestamp, and might have happened at any point in time between the 4th and
the 10th of July.
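
To make the structure of Table 1 concrete, the following is a minimal sketch of how such a strongly uncertain trace could be stored and expanded into the activity sequences it may stand for. The names `UncertainEvent`, `is_feasible`, and `realizations` are illustrative and are not taken from any of the cited tools. The sketch also assumes that two events sharing the same timestamp may occur in either order, which the text above does not specify.

```python
from dataclasses import dataclass
from itertools import permutations, product


@dataclass(frozen=True)
class UncertainEvent:
    event_id: str
    t_min: int                   # earliest possible timestamp (day of month)
    t_max: int                   # latest possible timestamp (day of month)
    activities: frozenset        # set of possible activity labels
    indeterminate: bool = False  # True if the event is marked with "?"


def is_feasible(order):
    """Check that timestamps can be picked so the events occur in this order.

    Assumption: ties are allowed, i.e. equal timestamps fit either order.
    """
    current = float("-inf")
    for event in order:
        current = max(current, event.t_min)
        if current > event.t_max:
            return False
    return True


def realizations(trace):
    """Return every activity sequence the uncertain trace may represent."""
    result = set()
    # An indeterminate event may either have happened or not.
    keep_options = [(True, False) if e.indeterminate else (True,) for e in trace]
    for keep in product(*keep_options):
        kept = [e for e, k in zip(trace, keep) if k]
        for order in permutations(kept):
            if not is_feasible(order):
                continue
            # Each event contributes one of its possible activity labels.
            for labels in product(*(sorted(e.activities) for e in order)):
                result.add(labels)
    return result


# The trace of Table 1.
TRACE_ID192 = [
    UncertainEvent("e1", 5, 5, frozenset({"NightSweats"}), indeterminate=True),
    UncertainEvent("e2", 8, 8, frozenset({"PrTP", "SecTP"})),
    UncertainEvent("e3", 4, 10, frozenset({"Splenomeg"})),
]

for sequence in sorted(realizations(TRACE_ID192)):
    print(sequence)
```

Under these assumptions the trace expands into ten sequences: three feasible positions of e3 times two labels of e2 when e1 occurred, and two positions times two labels when it did not. Brute-force enumeration of this kind grows quickly with the trace length, so it is only meant to illustrate the data, not to serve as an analysis method.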
Uncertain events may also have probability values associated with them, a scenario
defined as weak uncertainty (Table 2). In the example described above, suppose the medic
estimates that there is a high chance (90%) that the thrombocytopenia is primary (caused
by the cancer). Furthermore, if the splenomegaly is suspected to have developed three
days prior to the visit, which takes place on the 10th of July, the timestamp of event e3
may be described through a Gaussian curve with µ = 7. Lastly, the probability that the
event e1 has been recorded but did not occur in reality may be known (for example, it
may be 25%).
Table 2: A trace where uncertain event attributes are labeled with probabilities (weak uncertainty).
Case ID   Event ID   Timestamp   Activity                 Indeterminacy
ID348     e4         5           NightSweats              ?: 25%
ID348     e5         8           PrTP: 90%, SecTP: 10%
ID348     e6         N(7,1)      Splenomeg
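
As a rough illustration of how the probabilities in Table 2 combine, the sketch below computes the chance of one specific scenario: night sweats occurred, the thrombocytopenia is primary, and the splenomegaly developed between day 5 and day 8. It assumes that the three uncertain attributes are statistically independent, which is a simplification made here and not a claim of the paper. The function name `scenario_probability` is hypothetical. Since the second parameter of N(7,1) equals 1, it does not matter whether it denotes the variance or the standard deviation.

```python
from statistics import NormalDist

# Values read from Table 2.
P_E4_DID_NOT_OCCUR = 0.25                      # "?: 25%"
P_ACTIVITY_E5 = {"PrTP": 0.90, "SecTP": 0.10}  # activity label probabilities
T_E4, T_E5 = 5, 8                              # certain timestamps (day of month)
T_E6 = NormalDist(mu=7, sigma=1)               # uncertain timestamp of e6


def scenario_probability(e4_occurs, e5_label, e6_window):
    """Probability of one scenario, assuming independent attributes.

    e6_window is a (low, high) pair of days between which e6 takes place.
    """
    p_e4 = (1 - P_E4_DID_NOT_OCCUR) if e4_occurs else P_E4_DID_NOT_OCCUR
    low, high = e6_window
    p_e6 = T_E6.cdf(high) - T_E6.cdf(low)
    return p_e4 * P_ACTIVITY_E5[e5_label] * p_e6


# Scenario <NightSweats, Splenomeg, PrTP>: e6 falls between e4 and e5.
p = scenario_probability(True, "PrTP", (T_E4, T_E5))
print(f"{p:.2f}")  # roughly 0.55
```

A dedicated method for estimating such scenario probabilities is the subject of [3]; the product above is only the simplest possible stand-in for it.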
Table 3 summarizes the types of uncertain data that are the subject of our research.
3 Research Approach
We will now illustrate the guiding principles of our research plans, through a series of
assertions.
Assertion 1 (Uncertainty is not noise). Uncertain data contain information and
value. We do not aim to analyze the data beyond the uncertainty, but the data within the
uncertainty.
Table 3: The four different types of uncertainty subject of this research project.
                  Weak uncertainty                    Strong uncertainty
                  (stochastic)                        (non-deterministic)
Discrete data     Discrete probability distribution   Set of possible values {x1, x2, x3, ...} ⊆ X
Continuous data   Probability density function        Interval {x ∈ ℝ | a ≤ x ≤ b}
Assertion 2 (Uncertainty should not be filtered or repaired). To extract information
from uncertainty itself, existing approaches to filter or repair data are not applicable:
information from uncertainty must be accounted for, and not altered.
Assertion 3 (Uncertainty is behavior). The many possible values for event attributes
entail numerous possible scenarios for the control-flow perspective of an uncertain trace,
which can be represented as behavior. To fully analyze uncertain process instances, it is
necessary to account for such behavior.
The fundamental technique that enables the analysis of uncertain traces is their
representation as dynamic objects that incorporate the intrinsic behavior of uncertain traces,
such as graphs or Petri nets (behavior graphs or behavior nets [2], respectively). This leads
to the schematic visible in Figure 1.
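
The graph representation mentioned above can be sketched in a few lines. The example below assumes that a behavior graph connects two events when one certainly precedes the other and no third event certainly lies between them, i.e., that it is the transitive reduction of the "certainly precedes" relation. This reading, the strict comparison of timestamps, and the function name `behavior_graph` are assumptions of this sketch; the precise definition is given in [2].

```python
def behavior_graph(events):
    """Build the edge set of a behavior graph.

    events maps an event identifier to its (t_min, t_max) timestamp interval.
    Returns the set of (a, b) pairs where a immediately and certainly
    precedes b. Assumption: precedence is strict (t_max of a < t_min of b).
    """
    def precedes(a, b):
        return events[a][1] < events[b][0]

    edges = set()
    for a in events:
        for b in events:
            if a == b or not precedes(a, b):
                continue
            # Drop the edge if some other event certainly lies between a and b.
            has_middle = any(
                precedes(a, c) and precedes(c, b)
                for c in events
                if c not in (a, b)
            )
            if not has_middle:
                edges.add((a, b))
    return edges


# Timestamp intervals of the trace in Table 1 (day of month).
table1_events = {"e1": (5, 5), "e2": (8, 8), "e3": (4, 10)}
print(behavior_graph(table1_events))  # {('e1', 'e2')}
```

In this example e3 has no incident edges because its interval overlaps both other events, so it may occur before, between, or after them. This quadratic-per-pair construction is deliberately naive; more efficient constructions are the topic of [6, 7]. Activity labels and the "?" marker would be stored as node attributes and are left out here.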
A number of mining techniques for uncertain event data are now available in the
literature. A taxonomy of uncertain event data is available [2], as well as a method to reliably
compute the probability associated with each real-life scenario in an uncertain trace [3].
There exist approaches for conformance checking [4] and process discovery [5] over
strongly uncertain event data. The key phase in uncertain data analysis, building the graph
representation, has been optimized through efficient algorithms [6, 7]. Such techniques
are available in the PROVED toolset [8], which employs an ad-hoc extension of the XES
standard to represent uncertain data [9]. A real-life source of uncertain data, convolutional
neural network sensing in video feeds of processes, has been described, as well as
an additional taxonomy also involving process models [1].
Figure 1: The overall schema for process mining over uncertainty. (Diagram elements: agents, process,
domain knowledge, raw data, uncertain event log, graph representation, process model. Diagram steps:
records, abstracts, process discovery, conformance checking.)
4 Open Challenges and Conclusion
The field of process mining over uncertain data is still in its infancy. While some
techniques to perform discovery and conformance checking over uncertainty do exist, the
weakly uncertain case is still unexplored. The principle of the four quality metrics of logs
and processes (fitness, precision, generalization, simplicity), a cornerstone of process mining,
needs to be (re)developed in the context of uncertain data.
By analyzing uncertain event data without discarding any of the attributes in
an uncertain event log, this research direction unlocks the extraction of process
information that was formerly inaccessible. Insights from process mining analyses can, as a
consequence, maintain quantified guarantees of reliability and accuracy even in the presence
of data affected by uncertainty.
Acknowledgments
I am very grateful to Prof. Wil van der Aalst, who advises my doctoral studies, and to
Merih Seran Uysal, who supervises me in researching this topic. I thank the Alexander
von Humboldt (AvH) Stiftung for supporting my research interactions.
References
[1] Cohen, Izack and Avigdor Gal. “Uncertain Process Data with Probabilistic Knowledge:
Problem Characterization and Challenges”. In: CoRR abs/2106.03324 (2021).
arXiv: 2106.03324. url: https://arxiv.org/abs/2106.03324.
[2] Pegoraro, Marco and Wil M. P. van der Aalst. “Mining Uncertain Event Data in
Process Mining”. In: International Conference on Process Mining, ICPM 2019,
Aachen, Germany, June 24-26, 2019. IEEE, 2019, pp. 89–96.
doi: 10.1109/ICPM.2019.00023.
[3] Pegoraro, Marco, Bianka Bakullari, Merih Seran Uysal, et al. “Probability
Estimation of Uncertain Process Trace Realizations”. In: International Workshop on
Event Data and Behavioral Analytics (EdbA). Springer. 2021.
[4] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Conformance
checking over uncertain event data”. In: Information Systems 102 (2021), p. 101810.
doi: 10.1016/j.is.2021.101810.
[5] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Discovering
Process Models from Uncertain Event Data”. In: Business Process Management
Workshops - BPM 2019 International Workshops, Vienna, Austria, September 1-6,
2019, Revised Selected Papers. Ed. by Di Francescomarino, Chiara, Remco M. Dijkman,
and Uwe Zdun. Vol. 362. Lecture Notes in Business Information Processing.
Springer, 2019, pp. 238–249. doi: 10.1007/978-3-030-37453-2_20.
[6] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Efficient
Construction of Behavior Graphs for Uncertain Event Data”. In: Business Information
Systems - 23rd International Conference, BIS 2020, Colorado Springs, CO, USA,
June 8-10, 2020, Proceedings. Ed. by Abramowicz, Witold and Gary Klein. Vol. 389.
Lecture Notes in Business Information Processing. Springer, 2020, pp. 76–88.
doi: 10.1007/978-3-030-53337-3_6.
[7] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Efficient Time
and Space Representation of Uncertain Event Data”. In: Algorithms 13.11 (2020),
p. 285. doi: 10.3390/a13110285.
[8] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “PROVED: A
Tool for Graph Representation and Analysis of Uncertain Event Data”. In: Application
and Theory of Petri Nets and Concurrency - 42nd International Conference,
PETRI NETS 2021, Virtual Event, June 23-25, 2021, Proceedings. Ed. by Buchs, Didier
and Josep Carmona. Vol. 12734. Lecture Notes in Computer Science. Springer,
2021, pp. 476–486. doi: 10.1007/978-3-030-76983-3_24.
[9] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “An XES
Extension for Uncertain Event Data”. In: Proceedings of the Demonstration &
Resources Track, Best BPM Dissertation Award, and Doctoral Consortium at BPM
2021 co-located with the 19th International Conference on Business Process
Management, BPM 2021, Rome, Italy, September 6-10, 2021. Ed. by van der Aalst, Wil
M. P., Remco Dijkman, Akhil Kumar, et al. CEUR-WS.org, 2021, pp. 116–120.
url: http://ceur-ws.org/Vol-2973/#paper_273.