PROVED: A Tool for Graph Representation and Analysis of Uncertain Event Data

Marco Pegoraro 1, Merih Seran Uysal 1, and Wil M.P. van der Aalst 1
1Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Aachen, Germany
{pegoraro, uysal, wvdaalst}@pads.rwth-aachen.de
Abstract
The discipline of process mining aims to study processes in a data-driven manner by analyzing historical process executions, often employing Petri nets. Event data, extracted from information systems (e.g., SAP), serve as the starting point for process mining. Recently, novel types of event data have gathered interest among the process mining community, including uncertain event data. Uncertain events, process traces and logs contain attributes that are characterized by quantified imprecisions, e.g., a set of possible attribute values. The PROVED tool helps to explore, navigate and analyze such uncertain event data by abstracting the uncertain information using behavior graphs and nets, which have Petri net semantics. Based on these constructs, the tool enables discovery and conformance checking.
Keywords: Process Mining · Uncertain Data · Partial Order · Petri Net Tool.
Colophon
This work is licensed under a Creative Commons “Attribution-NonCommercial 4.0 International” license.
© the authors. Some rights reserved.
This document is an Author Accepted Manuscript (AAM) corresponding to the following scholarly paper:
Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “PROVED: A Tool for Graph Representation and Analysis of Uncertain Event Data”. In: Application and Theory of Petri Nets and Concurrency - 42nd International Conference, PETRI NETS 2021, Virtual Event, June 23-25, 2021, Proceedings. Ed. by Buchs, Didier and Josep Carmona. Vol. 12734. Lecture Notes in Computer Science. Springer, 2021, pp. 476–486. doi:10.1007/978-3-030-76983-3_24
Please, cite this document as shown above.
Publication chronology:
2021-01-10: abstract submitted to the International Conference on Application and Theory of Petri Nets and Concurrency (PetriNets) 2021, tool track
2021-02-03: full text submitted to the International Conference on Application and Theory of Petri Nets and Concurrency (PetriNets) 2021, tool track
2021-03-05: notication of acceptance
2021-03-12: camera-ready version submitted
2021-06-16: proceeding published
2021-06-24: presented
The published version referred above is ©Springer.
Correspondence to:
Marco Pegoraro, Chair of Process and Data Science (PADS), Department of Computer Science,
RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany
Website: http://mpegoraro.net/ · Email: pegoraro@pads.rwth-aachen.de · ORCID: 0000-0002-8997-7517
Content: 15 pages, 6 figures, 1 table, 16 references. Typeset with pdfLaTeX, Biber, and BibLaTeX.
Please do not print this document unless strictly necessary.
M. Pegoraro et al. PROVED: A Tool for Analysis of Uncertain Event Data
1 Introduction
Process mining is a branch of process sciences that analyzes processes starting from a log of execution data [2]. From an event log of the process, it is possible to automatically discover a model that describes the flow of a case in the process, or to measure the deviations between a normative model and the log.
The primary enabler of process mining analyses is the control-flow perspective of event data, which has been extensively investigated and utilized by researchers in this domain.
Modern information systems supporting processes can enable the extraction of more data perspectives: for instance, it is often possible to retrieve (and thus analyze) additional event attributes, such as the agent (resource) associated with the event, or the cost of a specific activity instance.
Collected event data can be subject to errors, imprecisions and anomalies; as a consequence, they can be affected by uncertainty. Uncertainty can be caused by many factors, such as sensitivity of sensors, human error, limitations of information systems, or failure of recording systems. The type of uncertainty we consider here is quantified: the event log includes some meta-attributes that describe the uncertainty affecting the event. For instance, the activity label of an event can be unknown, but we might have access to a set of possible activity labels for the event. In this case, in addition to the usual attributes constituting the event in the log, we have a meta-attribute containing a set of activity labels associated with the event. In principle, such meta-attributes can be natively supported by the information system; however, they are usually inferred after the extraction of the event log, in a pre-processing step to be undertaken before the analysis. Often, this pre-processing step necessitates domain knowledge to define, identify, and quantify different types of uncertainty in the event log.
In an event log, regular traces provide a static description of the events that occurred during the completion of a case in the process. Conversely, uncertain process traces contain behavior, and describe a number of possible scenarios that might have occurred in reality. Only one of these scenarios actually took place. It is possible to represent this inherent behavior of uncertain traces with graphical constructs, which are built from the data available in the event log. Some applications of process mining to uncertain data require a model with execution semantics, so as to be able to execute all and only the possible real-life scenarios described by the uncertain attributes in the log. To this end, Petri nets are the model of choice, thanks to their ability to compactly represent complex constructs like exclusive choice, possibility of skipping activities, and most importantly, concurrency.
Process mining using uncertain event data is an emerging topic with only a few recent papers. The topic was first introduced in [10] and subsequently extended in [11]: here, the authors provide a taxonomy and a classification of the possible types of uncertainty that can appear in event data. Furthermore, they propose an approach to obtain upper and lower bounds for the conformance score between uncertain process traces and a normative process model represented by a Petri net.
An additional application of process mining algorithms for uncertain event logs relates to the domain of process discovery. Here, the uncertain log is mined for possible directly-follows relationships between activities: the result, an Uncertain Directly-Follows Graph (UDFG), expresses the minimum and maximum possible strength of the relationship between pairs of activities. In turn, this can be exploited to perform process discovery with established discovery techniques. For instance, the inductive miner algorithm can, given the UDFG and some filtering parameters, automatically discover a model of the process which also embeds information about the uncertain behavior [12].
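To give an intuition of what the minimum and maximum values of a UDFG edge mean, the following is a minimal brute-force sketch on a single toy trace. It assumes that the "strength" of a relationship is read as the number of times one activity directly follows another, minimized and maximized over all scenarios compatible with the trace. The tuple encoding, the toy trace and the function names are hypothetical; this is not the algorithm of [12], nor the PROVED implementation.

```python
from collections import Counter
from itertools import combinations, permutations, product

# Hypothetical toy trace:
# (possible labels, earliest time, latest time, indeterminate?)
TRACE = [
    (("s",), 0, 0, False),
    (("a",), 1, 1, False),
    (("b",), 2, 2, False),
    (("c",), 2, 3, False),  # overlaps with "b": their order is unknown
]


def realizations(trace):
    """Enumerate every activity sequence compatible with the uncertain trace."""
    optional = [e for e in trace if e[3]]
    mandatory = [e for e in trace if not e[3]]
    result = set()
    # Choose which indeterminate events actually happened.
    for k in range(len(optional) + 1):
        for kept in combinations(optional, k):
            for order in permutations(mandatory + list(kept)):
                # Discard orderings where a later event certainly came first.
                if any(order[j][2] < order[i][1]
                       for i in range(len(order))
                       for j in range(i + 1, len(order))):
                    continue
                # Choose one label for each event.
                result.update(product(*(e[0] for e in order)))
    return result


def udfg(trace):
    """Map each activity pair to its (min, max) directly-follows count."""
    counts = [Counter(zip(r, r[1:])) for r in realizations(trace)]
    pairs = set().union(*counts)
    return {p: (min(c[p] for c in counts), max(c[p] for c in counts))
            for p in pairs}


for pair, bounds in sorted(udfg(TRACE).items()):
    print(pair, bounds)
```

In this toy trace, "s" is always directly followed by "a", so that pair has bounds (1, 1), while every pair involving "b" and "c" has bounds (0, 1), since it only appears in some scenarios. The enumeration is exponential in the trace length and is meant only to illustrate the concept.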
While the technological sector of process mining software has been flourishing in recent years, no existing tool, to the best of our knowledge, can analyze or handle event data with uncertainty. In this paper, we present a novel tool based on Petri nets, which is capable of performing process mining analyses on uncertain event logs. The PROVED (PRocess mining OVer uncErtain Data) software [15] is able to leverage uncertain mining techniques to deliver insights on the process without the need to discard the information affected by uncertainty; on the contrary, uncertainty is exploited to obtain a more precise picture of all the possible behavior of the process. PROVED utilizes Petri nets as a means to model uncertain behavior in a trace, associating every possible scenario with a complete firing sequence. This enables the analysis of uncertain event data.
The remainder of the paper is structured as follows: Section 2 provides an overview of the relevant literature on process mining over uncertainty. Section 3 presents the concept of uncertain event data with examples. Section 4 illustrates the architectural structure of the PROVED tool. Section 5 demonstrates some uses of the tool. Lastly, Section 6 concludes the paper.
2 Related Work
The problem of modeling systems containing or representing uncertain behavior is well-investigated and has many established research results. Systems where specific components are associated with time intervals can, for instance, be modeled with time Petri nets [4]. Large systems with more complex timed inter-operations between components can be represented by interval-timed coloured Petri nets [1]. Probabilistic effects can be modeled and simulated in a system by formalisms such as generalized stochastic Petri nets [9]. It is important to notice, however, that the focus of process mining over uncertain event data is different: the aim is not to simulate the uncertain behavior in a model, but rather to perform data-driven analyses, some results of which can be represented by (regular) Petri nets.
The PROVED tool contains the implementation of existing techniques for process mining over uncertain event data. In this paper, we will show the capabilities of PROVED in performing the analysis presented in the literature mentioned above. In terms of tool functionalities, constructing a Petri net based on the description of specific behavior, known as synthesis in Petri net research, has some precedents: for instance, from transition systems [6] in the context of process discovery. More relevantly for this paper, the VipTool [3] allows synthesizing Petri nets based on partially ordered objects. While partial order between events is in itself a kind of uncertainty and a consequence of the presence of uncertain timestamps, in this tool paper we extend Petri net synthesis to additional types of uncertainty, and we add process mining functionalities.
3 Preliminary Concepts
The motivating problem behind the PROVED tool is the analysis of uncertain event
data. Let us give an example of a process instance generating uncertain data.
An elderly patient enrolls in a clinical trial for an experimental treatment against myeloproliferative neoplasms, a class of blood cancers. The enrollment in this trial includes a lab exam and a visit with a specialist; then, the treatment can begin. The lab exam, performed on the 8th of July, finds a low level of platelets in the blood of the patient, a condition known as thrombocytopenia (TP). At the visit, on the 10th of July, the patient self-reports an episode of night sweats on the night of the 5th of July, prior to the lab exam: the medic notes this, but also hypothesizes that it might not be a symptom, since it can be caused not by the condition but by external factors (such as very warm weather). The medic also reads the medical records of the patient and sees that, shortly prior to the lab exam, the patient was undergoing a heparin treatment (a blood-thinning medication) to prevent blood clots. The thrombocytopenia found with the lab exam can then be primary (caused by the blood cancer) or secondary (caused by other factors, such as a drug). Finally, the medic finds an enlargement of the spleen in the patient (splenomegaly). It is unclear when this condition has developed: it might have appeared at any moment prior to that point. The medic decides to admit the patient to the clinical trial, starting on the 12th of July.
These events are collected and recorded in the information system of the hospital as the trace shown in Table 1. Uncertain activities are indicated as a set of possibilities. Uncertain timestamps are denoted as intervals. Some events are indicated with a “?” in the rightmost column; these so-called indeterminate events have been recorded, but it is unclear if they actually happened in reality. Regular (i.e., non-indeterminate) events are marked with “!”. For the sake of readability, the timestamp field only indicates the day of the month.
Table 1: The uncertain trace of an instance of a healthcare process used as a running example. For the sake of clarity, we have further simplified the notation in the timestamps column, by showing only the day of the month.
Case ID   Event ID   Timestamp   Activity        Indet. event
ID192     e1         5           NightSweats     ?
ID192     e2         8           {PrTP, SecTP}   !
ID192     e3         [4, 10]     Splenomeg       !
ID192     e4         12          Adm             !
Throughout the paper, we will utilize the trace of Table 1 as a running example to showcase the functionalities of the PROVED tool.
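To make the notation of Table 1 concrete, the running example can be written down as plain Python data. This is only an illustrative sketch: the `UncertainEvent` class and its field names are hypothetical and are not the representation used inside PROVED, which stores uncertain logs as PM4Py log objects.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class UncertainEvent:
    """Hypothetical container for one row of Table 1 (not the PROVED format)."""
    event_id: str
    activities: FrozenSet[str]   # set of possible activity labels
    timestamp: Tuple[int, int]   # (earliest, latest) day of the month
    indeterminate: bool          # True for "?", False for "!"

    @property
    def is_uncertain(self) -> bool:
        # An event is uncertain if at least one of its attributes is uncertain.
        t_min, t_max = self.timestamp
        return len(self.activities) > 1 or t_min < t_max or self.indeterminate


# The trace of case ID192, transcribed from Table 1.
trace = [
    UncertainEvent("e1", frozenset({"NightSweats"}), (5, 5), True),
    UncertainEvent("e2", frozenset({"PrTP", "SecTP"}), (8, 8), False),
    UncertainEvent("e3", frozenset({"Splenomeg"}), (4, 10), False),
    UncertainEvent("e4", frozenset({"Adm"}), (12, 12), False),
]

uncertain_ids = [e.event_id for e in trace if e.is_uncertain]
print(uncertain_ids)  # ['e1', 'e2', 'e3']
```

Each of the first three events carries a different kind of uncertainty (indeterminacy, activity label, timestamp), while the admission event e4 is fully certain.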
4 Architecture
This section provides an overview of the architecture of the PROVED tool, as well as a presentation of the libraries and existing software that are used in the tool as dependencies.
Our tool has two distinct parts: a library (implemented in the PROVED Python package) and a user interface that allows operating the functions in the library in a graphical, non-programmatic way.
The library is written in the Python programming language (compatible with versions 3.6.x through 3.8.x), and is distributed through the Python package manager pip [14]. Notable software dependencies include:
- PM4Py [5]: a process mining library for Python. PM4Py is able to provide many classical process mining functionalities needed for PROVED, including importing/exporting of logs and models, management of log objects, and conformance checking through alignments. Notice that PM4Py also provides functions to represent and manage Petri nets.
- NetworkX [8]: this library provides a set of graph algorithms for Python. It is used for the management of graph objects in PROVED.
- Graphviz [7]: this library adds visualization functionalities for graphs to PROVED, and is used to visualize directed graphs and Petri nets.
The aforementioned libraries enable the management, analysis and visualization of uncertain event data, and support the mining techniques of the PROVED toolset illustrated here. An uncertain log in PROVED is a log object of the PM4Py library; here, we will list only the novel functionalities introduced in PROVED, while omitting existing features inherited from PM4Py, such as importing/exporting and attribute manipulation.
Figure 1: The behavior graph of the trace in Table 1. All the nodes in the graph are connected based on precedence relationships. Pairs of nodes for which the order is certain are connected by a path in the graph; pairs of nodes for which the order is unknown are pairwise unreachable.
Figure 2: The behavior net corresponding to the uncertain trace in Table 1. The labels above the transitions show the corresponding uncertain event. The initial marking is displayed; the gray “token slot” represents the final marking. This net is able to replay all and only the sequences of activities that might have happened in reality.
4.1 Artifacts
As mentioned earlier, uncertain data contain behavior and, thus, dedicated constructs are necessary to enable process mining analysis. In the PROVED tool, the subpackage proved.artifacts contains the models and construction methods of such constructs. Two fundamental artifacts for uncertain data representation are available:
- proved.artifacts.behavior_graph: here are collected the PROVED functionalities related to the behavior graph of an uncertain trace. Behavior graphs are directed acyclic graphs that capture the variability caused by uncertain timestamps in the trace, and represent the partial order relationships between events. The behavior graph of the trace in Table 1 is shown in Figure 1. The PROVED library can build behavior graphs efficiently (in quadratic time with respect to the number of events) by using an algorithm described in [13].
- proved.artifacts.behavior_net: this subpackage includes all the functionalities necessary to create and utilize behavior nets, which are acyclic Petri nets that can replay all possible sequences of activities (called realizations) contained in the uncertain trace. Behavior nets allow simulating all “possible worlds” described by an uncertain trace, and are crucial for tasks such as computing conformance scores between uncertain traces and a normative model. The construction technique for behavior nets is detailed in [11].
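The idea behind a behavior graph can be sketched in a few lines of plain Python. The snippet below is a naive illustration, assuming that an event certainly precedes another when its latest possible timestamp is strictly smaller than the earliest possible timestamp of the other, and that the graph keeps only the non-redundant edges (the transitive reduction). It is not the quadratic-time algorithm of [13] and does not use the PROVED API; the dictionary encoding of Table 1 and the function name are hypothetical.

```python
from itertools import permutations

# Hypothetical encoding of Table 1: event id -> (earliest day, latest day).
intervals = {"e1": (5, 5), "e2": (8, 8), "e3": (4, 10), "e4": (12, 12)}


def behavior_graph_edges(intervals):
    """Return the edges of the transitively reduced precedence relation."""
    # a certainly precedes b if a's latest time is before b's earliest time.
    precedes = {
        (a, b)
        for a, b in permutations(intervals, 2)
        if intervals[a][1] < intervals[b][0]
    }
    # Drop every edge (a, c) implied by a pair (a, b), (b, c). The relation on
    # intervals is already transitive, so a single pass over it is enough.
    return {
        (a, c)
        for (a, c) in precedes
        if not any((a, b) in precedes and (b, c) in precedes for b in intervals)
    }


edges = behavior_graph_edges(intervals)
print(sorted(edges))  # [('e1', 'e2'), ('e2', 'e4'), ('e3', 'e4')]
```

In the result, e3 is connected to neither e1 nor e2: its timestamp interval overlaps with theirs, so the relative order of these events is unknown.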
4.2 Algorithms
The algorithms contained in the PROVED tool are categorized into three subpackages:
- proved.algorithms.conformance: this subpackage contains all the functionalities related to measuring conformance between uncertain data and a normative Petri net employing the alignment technique [10, 11]. It includes functions to compute upper and lower bounds for the conformance score through exhaustive alignment of the realizations of an uncertain trace, and an optimized technique to efficiently compute the lower bound.
- proved.algorithms.discovery: this subpackage contains the functionalities needed to perform process discovery over uncertain event logs. It offers functionalities to compute a UDFG, a graph representing an extension of the concept of directly-follows relationship on uncertain data; this construct can be utilized to perform inductive mining [12].
- proved.algorithms.simulation: this subpackage contains some utility functions to simulate uncertainty within an existing event log. It is possible to add separately the different kinds of uncertainty described in the taxonomy of [11], while fine-tuning the dictionary of activity labels to sample and the amplitude of time intervals for timestamps.
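As an illustration of the last item, the following sketch adds synthetic uncertainty to a list of certain events, with one probability per kind of uncertainty, a pool of activity labels to sample from, and an amplitude for the time intervals. The function name, its parameters and the tuple encoding of events are all hypothetical: this is a sketch of the idea under these assumptions, not the interface of proved.algorithms.simulation.

```python
import random


def add_uncertainty(events, label_pool, amplitude, rng,
                    p_activity=0.0, p_timestamp=0.0, p_indeterminate=0.0):
    """Return a copy of `events` with synthetic uncertainty added.

    Each event is (set of labels, (earliest, latest), indeterminate?).
    The three probabilities control the three kinds of uncertainty separately.
    """
    noisy = []
    for labels, (t_min, t_max), indeterminate in events:
        labels = set(labels)
        if rng.random() < p_activity:
            # Uncertain activity: add one alternative label from the pool.
            alternatives = sorted(set(label_pool) - labels)
            if alternatives:
                labels.add(rng.choice(alternatives))
        if rng.random() < p_timestamp:
            # Uncertain timestamp: widen the timestamp into an interval.
            t_min, t_max = t_min - amplitude, t_max + amplitude
        if rng.random() < p_indeterminate:
            # Indeterminate event: it may not have happened at all.
            indeterminate = True
        noisy.append((labels, (t_min, t_max), indeterminate))
    return noisy


certain_log = [({"a"}, (3, 3), False), ({"b"}, (5, 5), False)]
rng = random.Random(42)  # fixed seed only to make runs repeatable
noisy_log = add_uncertainty(certain_log, ["a", "b", "c"], amplitude=2, rng=rng,
                            p_timestamp=1.0)
print(noisy_log)
```

Here only timestamp uncertainty is switched on, so every timestamp becomes an interval of half-width 2 while labels and indeterminacy are left untouched. Keeping the three probabilities independent makes it possible to study each kind of uncertainty in isolation.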
4.3 Interface
Some of the functionalities of the PROVED tool are also supported by a graphical user interface. The PROVED interface is web-based, utilizing the Django framework in Python for the back end, and the Bootstrap framework in JavaScript and HTML for the front end. The user interface includes the PROVED library as a dependency, and is, thus, completely decoupled from the logic and algorithms in it. We will illustrate some parts of the user interface in the next section.
5 Usage
In this section, we will outline how to install and use our tool. Firstly, let us focus on the programmatic usage of the Python library.
The full source code for PROVED can be found on the GitHub project page¹. Once Python is installed on the system, PROVED is available through the pip package manager for Python, and can be installed with the terminal command pip install proved, which will also install all the necessary dependencies.
¹ Available at https://github.com/proved-py/proved-core/
Thanks to the import and export functionalities inherited from PM4Py, which has full XES [16] certification, it is possible to start the analysis of uncertain logs easily and compactly. Let us examine the following example:
from pm4py.objects.log.importer.xes import importer as x_importer
from proved.artifacts import behavior_graph, behavior_net

# Import the uncertain event log (XES format) and select its first trace
uncertain_log = x_importer.apply('uncertain_event_log.xes')
uncertain_trace = uncertain_log[0]
# Build the behavior graph of the trace, then the behavior net from the graph
beh_graph = behavior_graph.BehaviorGraph(uncertain_trace)
beh_net = behavior_net.BehaviorNet(beh_graph)
In this code snippet, an uncertain event log is imported, then the first trace of the log is selected, and the behavior graph and behavior net of the trace are obtained. Nodes and connections of behavior graphs and nets can be explored using the NetworkX functionalities and the PM4Py functionalities. We can also visualize both objects with Graphviz, obtaining graphics akin to the ones in Figures 1 and 2.
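from pm4py.objects.petri.importer import importer as p_importer
from proved.algorithms.conformance.alignments import alignment_bounds_su_log

# Import the reference Petri net with its initial and final markings
net, i_mark, f_mark = p_importer.apply('model.pnml')

# One pair of alignments (lower bound, upper bound) per trace in the log
alignments = alignment_bounds_su_log(uncertain_log, net, i_mark, f_mark)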
In the snippet given above, we can see the code that computes upper and lower bounds for the conformance score of all the traces in the uncertain log against a reference model that we import, utilizing the technique of alignments [11]. For each trace in the log, a pair of alignment objects is computed: the first one corresponds to an alignment with a cost equal to the lower bound for conformance cost, while the second object is an alignment with the maximum possible conformance cost. The object alignments is a list with one such pair for each trace in the log.
Let us now see some visual examples of the usage of the PROVED tool user interface². The graphical tool can be executed in a local environment by starting the Django server in a terminal with the command python manage.py runserver.
Upon opening the tool and loading an uncertain event log, we are presented with a
dashboard that summarizes the main information regarding the event log, as shown in
Figure 3.
In the center panel of the dashboard, we can see statistics regarding the uncertain log. On the top left, we find basic statistics such as the size of the log in the number of events and traces, the average trace length, and the number of uncertain variants. Note that the classical definition of variant is inconsistent in uncertain event logs; rather, uncertain variants group together traces which have mutually isomorphic behavior graphs [11]. We can also find pie charts indicating the percentage of uncertain events in the log (events with at least one uncertain attribute) and the percentage of uncertain traces in the log (traces with at least one uncertain event).
² Available at https://github.com/proved-py/proved-app/
Figure 3: The dashboard of the PROVED user interface. This screen contains general information regarding an uncertain event log, including the list of uncertain variants, the number of instances of each activity label (minimum and maximum), and statistics regarding the frequency of uncertain events and uncertain traces in the log.
On the bottom, a table reports the number of occurrences of each activity label in the event log. Because of uncertainty on activity labels and indeterminate events, there is a minimum and a maximum number of occurrences of a specific activity label. The table reports both figures. There are two other tables in the dashboard, the Start Activities table and the End Activities table. Both are akin to the activity table depicted, but separately list activity labels appearing in the first or last event in a trace.
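The minimum and maximum counts in the activity table can be illustrated with a short sketch on the trace of Table 1. It assumes that an event contributes to the minimum count of a label only if it is not indeterminate and has that label as its only possible activity, and that it contributes to the maximum count of every label it might carry. The tuple encoding is hypothetical and the snippet does not reproduce the code of the PROVED interface.

```python
from collections import Counter

# Hypothetical encoding of Table 1: (possible labels, indeterminate?)
events = [
    ({"NightSweats"}, True),
    ({"PrTP", "SecTP"}, False),
    ({"Splenomeg"}, False),
    ({"Adm"}, False),
]

min_count, max_count = Counter(), Counter()
for labels, indeterminate in events:
    for label in labels:
        # Every event that may carry the label counts towards the maximum.
        max_count[label] += 1
        # Only events that certainly occurred with exactly this label
        # count towards the minimum.
        if len(labels) == 1 and not indeterminate:
            min_count[label] += 1

for label in sorted(max_count):
    print(label, min_count[label], max_count[label])
```

For this trace, Adm and Splenomeg occur exactly once, while NightSweats, PrTP and SecTP each occur between zero and one times.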
Upon clicking on one of the uncertain variants listed on the left, the user can access the graphical representation of the variant. It is possible to visualize both the behavior graph and the behavior net: the former is depicted in Figure 4. The figure specifically shows information related to the trace depicted in Table 1.
Next to the variant menu on the left, we now have a trace menu, listing all the traces belonging to that uncertain variant. Clicking on a specific trace, the user is presented with data related to it, including a tabular view of the trace similar to that of Table 1, and a Gantt diagram representation of the trace. Similarly to the behavior graph, the Gantt diagram shows time information in a graphical manner; but, instead of showing the precedence relationship between events, it represents the time intervals to scale on an absolute time axis. This visualization is presented in Figure 5.
Figure 4: The uncertain variant page of the PROVED tool, showing information regarding the variant obtained from the trace in Table 1. For a variant in an uncertain log, this page lists the traces belonging to that variant, and displays the graphical representations for that variant: behavior graph and behavior net (the latter is not displayed, but can be accessed through the tab on the top).
The interface allows the user to explore the features of an uncertain log, to “drill down” to variants, traces, events and single attributes, and to visualize the uncertain data in a graphical manner without the need to resort to coding in Python.
Lastly, the menu on the left also allows for loading a Petri net, and obtaining alignments on uncertain event data.
As shown above, every uncertain trace can be represented by a behavior net. A conformance score can be computed between such behavior nets and a normative process model also represented by a Petri net: Figure 6 illustrates the results of such an alignment. For a given behavior net, two alignments are provided, together with their respective costs: one showing a best-case scenario, and the other showing a worst-case scenario. This enables diagnostics on uncertain event data.
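The best-case and worst-case scenarios can be illustrated by brute force on the trace of Table 1: enumerate every realization, compute a cost for each against the model, and take the minimum and the maximum. The sketch below makes several simplifying assumptions that are not part of PROVED: the normative model is hypothetical and given directly as a finite set of allowed runs rather than as a Petri net, and the alignment cost is approximated by counting unit-cost moves on log and moves on model via the longest common subsequence.

```python
from itertools import combinations, permutations, product

# Hypothetical encoding of Table 1:
# (possible labels, earliest day, latest day, indeterminate?)
TRACE = [
    (("NightSweats",), 5, 5, True),
    (("PrTP", "SecTP"), 8, 8, False),
    (("Splenomeg",), 4, 10, False),
    (("Adm",), 12, 12, False),
]

# Hypothetical normative model, given directly as its finite set of runs:
# optional night sweats, then either kind of TP, splenomegaly, admission.
MODEL = {
    prefix + (tp, "Splenomeg", "Adm")
    for prefix in ((), ("NightSweats",))
    for tp in ("PrTP", "SecTP")
}


def realizations(trace):
    """Enumerate every activity sequence compatible with the uncertain trace."""
    optional = [e for e in trace if e[3]]
    mandatory = [e for e in trace if not e[3]]
    result = set()
    for k in range(len(optional) + 1):
        for kept in combinations(optional, k):
            for order in permutations(mandatory + list(kept)):
                # Discard orderings where a later event certainly came first.
                if any(order[j][2] < order[i][1]
                       for i in range(len(order))
                       for j in range(i + 1, len(order))):
                    continue
                result.update(product(*(e[0] for e in order)))
    return result


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            table[i + 1][j + 1] = (table[i][j] + 1 if x == y
                                   else max(table[i][j + 1], table[i + 1][j]))
    return table[-1][-1]


def cost(sequence, model_runs):
    """Moves on log plus moves on model against the closest model run."""
    return min(len(sequence) + len(run) - 2 * lcs_length(sequence, run)
               for run in model_runs)


costs = [cost(r, MODEL) for r in realizations(TRACE)]
lower_bound, upper_bound = min(costs), max(costs)
print(lower_bound, upper_bound)  # 0 2
```

With this stand-in model, the trace has ten realizations; the best one fits perfectly, while the worst ones (where splenomegaly is observed before the thrombocytopenia) need one move on log and one move on model. The enumeration grows exponentially with the number of uncertain events, which is why working on the behavior net, as PROVED does, is preferable in practice.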
6 Conclusions
In many real-world scenarios, the applicability of process mining techniques is severely limited by data quality problems. In some situations, the anomalies that cause an erroneous recording of data in an information system can be translated into uncertainty, which is described through meta-attributes included in the log itself. Such uncertain event logs can still be analyzed and mined, thanks to specialized process mining techniques. The PROVED tool is Python-based software that enables such analysis. It provides capabilities for importing and exporting uncertain event data in the XES format, for obtaining graphical representations of data that can capture the behavior generated by uncertain attributes, and for computing upper and lower bounds for conformance between uncertain process traces and a normative model in the form of a Petri net.
Figure 5: Visualization dedicated to a specific trace in the PROVED tool, showing information related to the trace in Table 1. It is possible to see details on each event and on the uncertainty that might affect them, as well as a visualization showing the time relationship between uncertain events to scale.
Figure 6: Visualization of alignments of the uncertain trace in Table 1 and a normative process model. In this case, the optimal alignment in the best-case scenario perfectly fits the model, while in the worst-case scenario we have an alignment cost of 2, caused by one move on model and one move on log.
Future work on the tool includes the definition of a formal XES language extension with dedicated tags for uncertainty meta-attributes, the further development of front-end functionalities to include more process mining capabilities, and more interactive objects in the user interface. Moreover, the research effort on uncertainty affecting the data perspective of processes can be integrated with the model perspective, blending uncertainty research with formalisms such as stochastic Petri nets.
Acknowledgements
We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research interactions.
References
[1] van der Aalst, Wil M. P. “Interval Timed Coloured Petri Nets and their Analy-
sis”. In: Application and Theory of Petri Nets 1993, 14th International Confer-
ence, Chicago, Illinois, USA, June 21-25, 1993, Proceedings. Ed. by Marsan, Marco
Ajmone. Vol. 691. Lecture Notes in Computer Science. Springer, 1993, pp. 453–
472. doi:10.1007/3-540-56863-8_61.
[2] van der Aalst, Wil M. P. Process Mining - Data Science in Action, Second Edition.
Springer, 2016. isbn: 978-3-662-49850-7. doi:10.1007/978-3- 662-49851-
4.
[3] Bergenthum, Robin, J¨
org Desel, Robert Lorenz, et al. “Synthesis of Petri Nets
from Scenarios with VipTool”. In: Applications and Theory of Petri Nets, 29th
International Conference, PETRI NETS 2008, Xi’an, China, June 23-27, 2008.
Proceedings. Ed. by van Hee, Kees M. and R ¨
udiger Valk. Vol. 5062. Lecture Notes
in Computer Science. Springer, 2008, pp. 388–398. doi:10 . 1007 / 978 - 3 -
540-68746-7_25.
[4] Berthomieu, Bernard and Michel Diaz. “Modeling and Verication of Time De-
pendent Systems Using Time Petri Nets”. In: IEEE Transactions on Soware En-
gineering 17.3 (1991), pp. 259–273. doi:10.1109/32.75415.
[5] Berti, Alessandro, Sebastiaan J. van Zelst, and Wil M. P. van der Aalst. “Process
Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Sci-
ence”. In: ICPM Demo Track (CEUR 2374). 2019, pp. 13–16. url:http : / /
ceur-ws.org/Vol-2374/paper4.pdf.
13 / 15
M. Pegoraro et al. PROVED: A Tool for Analysis of Uncertain Event Data
[6] Carmona, Josep, Jordi Cortadella, and Michael Kishinevsky. “Genet: A Tool for
the Synthesis and Mining of Petri Nets”. In: Ninth International Conference on
Application of Concurrency to System Design, ACSD 2009, Augsburg, Germany,
1-3 July 2009. IEEE Computer Society, 2009, pp. 181–185. doi:10.1109/ACSD.
2009.6.
[7] Ellson, John, Emden R. Gansner, Elefherios Koutsoos, et al. “Graphviz - Open
Source Graph Drawing Tools”. In: Graph Drawing, 9th International Sympo-
sium, GD 2001 Vienna, Austria, September 23-26, 2001, Revised Papers. Ed. by
Mutzel, Petra, Michael J¨
unger, and Sebastian Leipert. Vol. 2265. Lecture Notes
in Computer Science. Springer, 2001, pp. 483–484. doi:10 . 1007 / 3 - 540 -
45848-4_57.
[8] Hagberg, Aric, Pieter Swart, and Daniel S Chult. Exploring network structure, dy-
namics, and function using NetworkX. Tech. rep. Los Alamos National Lab.(LANL),
Los Alamos, NM (United States), 2008. url:https : / / www . osti . gov /
servlets/purl/960616.
[9] Marsan, Marco Ajmone, Gianfranco Balbo, Gianni Conte, et al. “Modelling with
Generalized Stochastic Petri Nets”. In: SIGMETRICS Performance Evaluation
Review 26.2 (1998), p. 2. doi:10.1145/288197.581193.
[10] Pegoraro, Marco and Wil M. P. van der Aalst. “Mining Uncertain Event Data in
Process Mining”. In: International Conference on Process Mining, ICPM 2019,
Aachen, Germany, June 24-26, 2019. IEEE, 2019, pp. 89–96. doi:10 . 1109 /
ICPM.2019.00023.
[11] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Conformance
Checking over Uncertain Event Data”. In: Information Systems (2021). doi:10.
1016/j.is.2021.101810.
[12] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Discovering
Process Models from Uncertain Event Data”. In: Business Process Management
Workshops - BPM 2019 International Workshops, Vienna, Austria, September 1-
6, 2019, Revised Selected Papers. Ed. by Di Francescomarino, Chiara, Remco M.
Dijkman, and Uwe Zdun. Vol. 362. Lecture Notes in Business Information Pro-
cessing. Springer, 2019, pp. 238–249. doi:10 .1007 / 978- 3 - 030- 37453 -
2_20.
[13] Pegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. “Efficient Time
and Space Representation of Uncertain Event Data”. In: Algorithms 13.11 (2020),
p. 285. doi:10.3390/a13110285.
[14] pip - PyPi. https://pypi.org/project/pip/. accessed: 2020-02-03.
[15] The PROVED project on GitHub. https://github.com/proved-py/.
accessed: 2021-02-03.
[16] Verbeek, H. M. W., Joos C. A. M. Buijs, Boudewijn F. van Dongen, et al. “XES,
XESame, and ProM 6”. In: Information Systems Evolution - CAiSE Forum 2010,
Hammamet, Tunisia, June 7-9, 2010, Selected Extended Papers. Ed. by Soffer, Pnina
and Erik Proper. Vol. 72. Lecture Notes in Business Information Processing.
Springer, 2010, pp. 60–75. doi:10.1007/978-3-642-17722-4_5.