The ProM Framework:
A New Era in Process Mining Tool Support
B.F. van Dongen, A.K.A. de Medeiros, H.M.W. Verbeek, A.J.M.M. Weijters,
and W.M.P. van der Aalst
Department of Technology Management, Eindhoven University of Technology,
P.O. Box 513, NL-5600 MB, Eindhoven,
The Netherlands
b.f.v.dongen@tue.nl
Abstract. Under the umbrella of buzzwords such as “Business Activity Monitoring” (BAM) and “Business Process Intelligence” (BPI), both academic tools (e.g., EMiT, Little Thumb, InWoLvE, Process Miner, and MinSoN) and commercial tools (e.g., ARIS PPM, HP BPI, and ILOG JViews) have been developed. The goal of these tools is to extract knowledge from event logs (e.g., transaction logs in an ERP system or audit trails in a WFM system), i.e., to do process mining. Unfortunately, these tools use different formats for reading/storing log files and present their results in different ways. This makes it difficult to apply different tools to the same data set and to compare the mining results. Furthermore, some of these tools implement concepts that could be very useful in other tools, but it is often difficult to combine tools. As a result, researchers working on new process mining techniques are forced to build a mining infrastructure from scratch or to test their techniques in an isolated way, disconnected from any practical application. To overcome these kinds of problems, we have developed the ProM framework, i.e., a “pluggable” environment for process mining. The framework is flexible with respect to the input and output formats, and is also open enough to allow for the easy reuse of code during the implementation of new process mining ideas. This paper introduces the ProM framework and gives an overview of the plug-ins that have been developed.
1 Introduction
The research domain of process mining is relatively new. A complete overview of recent process mining research is beyond the scope of this paper. Therefore, we limit ourselves to a brief introduction to the topic and refer to [3, 4] and the http://www.processmining.org web page for a more complete overview.
The goal of process mining is to extract information about processes from
transaction logs. It assumes that it is possible to record events such that (i) each
event refers to an activity (i.e., a well-defined step in the process), (ii) each event
refers to a case (i.e., a process instance), (iii) each event can have a performer
also referred to as originator (the actor executing or initiating the activity), and
(iv) events have a timestamp and are totally ordered. Table 1 shows an example
G. Ciardo and P. Darondeau (Eds.): ICATPN 2005, LNCS 3536, pp. 444–454, 2005.
© Springer-Verlag Berlin Heidelberg 2005
Table 1. An event log (audit trail)

case id | activity id | originator
case 1  | activity A  | John
case 2  | activity A  | John
case 3  | activity A  | Sue
case 3  | activity B  | Carol
case 1  | activity B  | Mike
case 1  | activity C  | John
case 2  | activity C  | Mike
case 4  | activity A  | Sue
case 2  | activity B  | John
case 2  | activity D  | Pete
case 5  | activity A  | Sue
case 4  | activity C  | Carol
case 1  | activity D  | Pete
case 3  | activity C  | Sue
case 3  | activity D  | Pete
case 4  | activity B  | Sue
case 5  | activity E  | Clare
case 5  | activity D  | Clare
case 4  | activity D  | Pete
of a log involving 19 events, 5 activities, and 6 originators. In addition to the
information shown in this table, some event logs contain more information on
the case itself, i.e., data elements referring to properties of the case. For example,
the case handling system FLOWer logs every modification of some data element.
Event logs such as the one shown in Table 1 are used as the starting point
for mining. We distinguish three different perspectives: (1) the process perspective, (2) the organizational perspective, and (3) the case perspective. The process perspective focuses on the control-flow, i.e., the ordering of activities, as shown in Figure 1(a). The goal of mining this perspective is to find a good characterization of all possible paths, e.g., expressed in terms of a Petri net [15] or an Event-driven Process Chain (EPC) [11, 12]. The organizational perspective focuses on the originator field, i.e., which performers are involved and how they are related. The goal is either to structure the organization by classifying people in terms of roles and organizational units (Figure 1(b)) or to show the relations between individual performers (i.e., to build a social network as described in [2] and the references there, and as shown in Figure 1(c)). The case perspective focuses on properties of cases. Cases can be characterized by their path in the process or by the originators working on a case. However, cases can also be characterized by the values of the corresponding data elements. For example, if a case represents a replenishment order, it is interesting to know the supplier or the number of products ordered.
Orthogonal to the three perspectives (process, organization, and case), the
result of a mining effort may refer to logical issues and/or performance issues.
For example, process mining can focus on performance issues such as flow time,
the utilization of performers or execution frequencies.
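As a minimal sketch (ours, not part of the framework itself), the raw material for each of the three perspectives can be derived from an event log such as Table 1 in a few lines of code; the triple-based log encoding below is a hypothetical simplification of the actual log format.

```python
from collections import Counter, defaultdict

# A fragment of the log of Table 1 as (case, activity, originator) triples.
log = [
    ("case 1", "A", "John"), ("case 1", "B", "Mike"),
    ("case 1", "C", "John"), ("case 1", "D", "Pete"),
    ("case 2", "A", "John"), ("case 2", "C", "Mike"),
    ("case 2", "B", "John"), ("case 2", "D", "Pete"),
]

# Process perspective: which activities directly follow each other per case.
traces = defaultdict(list)
for case, activity, _ in log:
    traces[case].append(activity)
succession = Counter(
    (a, b) for trace in traces.values() for a, b in zip(trace, trace[1:])
)

# Organizational perspective: who performs which activities.
work = Counter((originator, activity) for _, activity, originator in log)

# Case perspective: properties of each case, here simply its path.
paths = {case: "->".join(trace) for case, trace in traces.items()}
```

Everything else (Petri net discovery, role mining, sociograms) is built on top of such slices of the same data.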
After developing ad hoc tools for the mining of the process perspective (e.g.,
EMiT [1] and Little Thumb [17]) and other ad hoc tools (e.g., MinSoN [2]) for
the other mining perspectives we started the design of a flexible framework in
which different algorithms for each of the perspectives can be plugged in.
Fig. 1. Some mining results for the process perspective (a) and organizational (b and c) perspective based on the event log shown in Table 1:
(a) The control-flow structure expressed in terms of a Petri net (activities A–E with an AND-split and an AND-join).
(b) The organizational structure expressed in terms of an activity-role-performer diagram (John, Sue, Mike, Carol, Pete, and Clare grouped into roles X, Y, and Z).
(c) A sociogram based on transfer of work.
2 Architecture
As indicated in the introduction, the basis for all process mining techniques is a process log. Such a log is a file generated by some information system, containing information about the execution of a process. Since each information system has its own format for storing log files, we have developed a generic XML format in which the ProM framework stores logs. This format is based on a thorough comparison of the input needs of various existing (ad-hoc) process mining tools and the information typically contained in an audit trail or transaction log of some complex information system (e.g., an ERP or a WFM system).
Another important feature of the ProM framework is that it allows for interaction between a large number of so-called plug-ins. A plug-in is basically the implementation of an algorithm that is of some use in the process mining area, where the implementation agrees with the framework. Such plug-ins can be added to the framework with relative ease: once a plug-in is ready, it can be added to the framework by adding its name to an ini-file. Note that there is no need to modify the ProM framework (e.g., by recompiling the code) when adding new plug-ins, i.e., it is a truly “pluggable” environment. This is in contrast to open-source initiatives such as the data mining software Weka¹.
In Figure 2, we show an overview of the framework that we developed. It
explains the relations between the framework, the process log format, and the
plug-ins. As Figure 2 shows, the ProM framework can read files in the XML format through the Log filter component. This component is able to deal with large data sets and sorts the events within a case by their timestamps before
¹ Weka is available from http://www.cs.waikato.ac.nz/ml/weka/
Fig. 2. Overview of the ProM framework. The figure shows the Log filter reading the XML log, the five kinds of plug-ins (Mining, Import, Export, Analysis, and Conversion), the Visualisation engine with the user interface, the supported source systems (e.g., Staffware, FLOWer, SAP, InConcert), and the supported model formats (e.g., Heuristic Net, Aris Graph Format (Aris AML Format), PNML, TPN, NetMiner file, Agna file, Aris PPM Instances, DOT, Comma Separated Values).
the actual mining starts. (If no timestamps are present, the order in the XML file is preserved.) Through the Import plug-ins, a wide variety of models can be loaded, ranging from Petri nets to logical formulas. The Mining plug-ins do the actual mining, and the result is stored in memory and shown in a window on the ProM desktop. The framework allows plug-ins to operate on each other's results in a standardized way. Typically, a mining result contains some kind of visualization, e.g., a Petri net [15], an EPC [12], or a social network [2], and allows for further analysis or conversion. The Analysis plug-ins take a mining result and analyze it, e.g., by calculating a place invariant for a resulting Petri net. The Conversion plug-ins take a mining result and transform it into another format, e.g., transforming an EPC into a Petri net. In the remainder of this section, we describe both the process log format and the plug-ins.
2.1 Process Log Format
Figure 3(a) visualizes the XML schema that specifies the process log format. The root element is a WorkflowLog element. (The name “workflow log” was chosen for backwards compatibility; we prefer to speak of a process log.) The WorkflowLog element contains (in the given order) an optional Data element, an optional Source element, and a number of Process elements. A Data element allows for storing arbitrary textual data and contains a list of Attribute elements. A Source element can be used to store information about the information system the log originated from. A Process element refers to a specific process in an information system. Since most information systems typically control several processes, multiple Process elements may exist in a log file. A ProcessInstance is an instance of the process, i.e., a case. An AuditTrailEntry may refer to an activity (WorkflowModelElement), an event type (EventType), a timestamp (Timestamp), and the person that executed the activity (Originator).
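A minimal instance of this format can be sketched and parsed as follows; the element names follow the schema described above, while the concrete attributes (program, id) and the timestamp format are illustrative assumptions rather than the normative schema.

```python
import xml.etree.ElementTree as ET

# A minimal log in the spirit of the described schema; the attribute
# names used here are assumptions made for illustration only.
xml_log = """
<WorkflowLog>
  <Source program="some WFM system"/>
  <Process id="p1">
    <ProcessInstance id="case 1">
      <AuditTrailEntry>
        <WorkflowModelElement>activity A</WorkflowModelElement>
        <EventType>complete</EventType>
        <Timestamp>2004-11-12T10:02:00</Timestamp>
        <Originator>John</Originator>
      </AuditTrailEntry>
    </ProcessInstance>
  </Process>
</WorkflowLog>
"""

root = ET.fromstring(xml_log)
# Flatten the log into (case, activity, event type, originator) tuples.
events = [
    (pi.get("id"),
     entry.findtext("WorkflowModelElement"),
     entry.findtext("EventType"),
     entry.findtext("Originator"))
    for pi in root.iter("ProcessInstance")
    for entry in pi.iter("AuditTrailEntry")
]
```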
Fig. 3. Process log XML format (a) and transactional model for EventType (b). The transactional model covers the event types schedule, assign, reassign, start, suspend, resume, complete, autoskip, manualskip, withdraw, ate_abort, and pi_abort.
As will be clear from what was mentioned earlier, a log file typically contains information about events that took place in a system. Such events typically refer to a case and a specific activity within that case. Examples of such events are:
- The activity send message is now ready to be executed.
- The activity wait for incoming transmission has not been started for three weeks.
- The case with ID 203453 was aborted.
In order to be able to talk about these events in a standard way, we developed a transactional model that shows the events that we assume can appear in a log. Again, this model is based on an analysis of the different types of logs in real-life systems (e.g., Staffware, SAP, FLOWer, etc.). Figure 3(b) shows the transactional model.
When an activity is created, it is either scheduled or skipped automatically (autoskip). Scheduling an activity means that the control over that activity is put into the information system. The information system can now assign this activity to a certain person or group of persons. It is possible to reassign an assigned activity to another person or group of persons. This can be done by the system, or by a user. A user can start working on an activity that was assigned to him, or some user can decide to withdraw the activity or skip it manually (manualskip), which can even happen before the activity was assigned. The main difference between a withdrawal and a manual skip is that after the manual skip the activity has been executed correctly, while after a withdrawal it has not. The user that started an activity can suspend and resume the activity several times, but in the end he either has to complete or abort (ate_abort) it. Note that the activity can be aborted (pi_abort) at any point during its life cycle.
We do not claim that we have captured all possible behavior of all systems.
However, we have verified our transactional model against several commercial
systems and they all seem to fit nicely. Nonetheless, in the XML format, we
allow for other event types to be defined on the fly.
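The transactional model can be sketched as a small state machine. The event names below come from Figure 3(b), but the state names and the encoding are our own illustrative assumptions, not part of the framework.

```python
# A hypothetical encoding of the transactional model: (state, event) -> state.
TRANSITIONS = {
    ("created", "schedule"): "scheduled",
    ("created", "autoskip"): "skipped",
    ("scheduled", "assign"): "assigned",
    ("scheduled", "manualskip"): "skipped",
    ("scheduled", "withdraw"): "withdrawn",
    ("assigned", "reassign"): "assigned",
    ("assigned", "start"): "running",
    ("assigned", "manualskip"): "skipped",
    ("assigned", "withdraw"): "withdrawn",
    ("running", "suspend"): "suspended",
    ("suspended", "resume"): "running",
    ("running", "complete"): "completed",
    ("running", "ate_abort"): "aborted",
}

def replay(events, state="created"):
    """Replay a sequence of event types; pi_abort may occur at any time."""
    for event in events:
        if event == "pi_abort":
            return "aborted"
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {state!r}")
        state = TRANSITIONS[(state, event)]
    return state
```

Replaying a log against such a table is a simple way to check whether the events recorded for one activity instance are consistent with the model.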
2.2 Plug-ins
In this section, we provide an overview of the plug-ins as currently implemented
in the context of the ProM framework. For more technical documentation and
scientific publications, we refer to our website http://www.processmining.org.
As shown in Figure 2 there are five kinds of plug-ins:
Mining plug-ins which implement some mining algorithm, e.g., mining algo-
rithms that construct a Petri net based on some event log.
Export plug-ins which implement some “save as” functionality for some ob-
jects (such as graphs). For example, there are plug-ins to save EPCs, Petri
nets (e.g., in PNML format [7]), spreadsheets, etc.
Import plug-ins which implement an “open” functionality for exported ob-
jects, e.g., load instance-EPCs from ARIS PPM.
Analysis plug-ins which typically implement some property analysis on some
mining result. For example, for Petri nets there is a plug-in which constructs
place invariants, transition invariants, and a coverability graph. However,
there are also analysis plug-ins to compare a log and a model (i.e., confor-
mance testing) or a log and an LTL formula.
Conversion plug-ins which implement conversions between different data for-
mats, e.g., from EPCs to Petri nets.
The current version of the framework contains a large set of plug-ins. A detailed
description of these plug-ins is beyond the scope of this paper. Currently, there
are nine export plug-ins, four import plug-ins, seven analysis plug-ins, and three
conversion plug-ins. Therefore, we only mention some of the available mining
plug-ins. For each of the three perspectives which were mentioned in the intro-
duction, there are different mining plug-ins.
For the process perspective, four plug-ins are available:
α-algorithm which implements the α-algorithm [5] and its extensions as devel-
oped by the authors. The α-algorithm constructs a Petri net which models
the process recorded in the log.
Tsinghua-α algorithm which uses timestamps in the log files to construct a Petri net. It is related to the α-algorithm, but uses a different approach. Details can be found in [18]. It is interesting to note that this mining plug-in was the first plug-in developed by researchers outside of our research group. Researchers from Tsinghua University in China (Jianmin Wang and Wen Lijie) were able to develop and integrate this plug-in without any help from us or changes to the framework.
Genetic algorithm which uses genetic algorithms to tackle possible noise in
the log file as described in [13]. Its output format is a heuristics net (which
can be converted into an EPC or a Petri net).
Multi-phase mining which implements a series of process mining algorithms
that use instance graphs (comparable to runs) as an intermediate format.
The two-phase approach resembles the aggregation process in Aris PPM.
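As a sketch of the common first step of such control-flow mining algorithms, the basic ordering relations used by the α-algorithm [5] (direct succession, causality, parallelism, and unrelatedness) can be derived from the traces. This covers only the relation-extraction step, not the full net construction; the function is our illustration.

```python
from itertools import product

def alpha_relations(traces):
    """Derive the basic ordering relations of the alpha-algorithm:
    '->' causality, '<-' reverse causality, '||' parallel, '#' unrelated."""
    activities = {a for t in traces for a in t}
    # Direct succession: a is immediately followed by b in some trace.
    succ = {(a, b) for t in traces for a, b in zip(t, t[1:])}
    rel = {}
    for a, b in product(activities, repeat=2):
        if (a, b) in succ and (b, a) not in succ:
            rel[(a, b)] = "->"
        elif (a, b) in succ and (b, a) in succ:
            rel[(a, b)] = "||"
        elif (a, b) not in succ and (b, a) in succ:
            rel[(a, b)] = "<-"
        else:
            rel[(a, b)] = "#"
    return rel

# Two traces in which B and C occur in both orders, so B || C.
rel = alpha_relations([["A", "B", "C", "D"], ["A", "C", "B", "D"]])
```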
For the organizational perspective, one plug-in is available:
Social network miner which uses the log file to determine a social network of
people [2]. It requires the log file to contain the Originator element.
Finally, for the case perspective, one plug-in is available as well:
Case data extraction which can be used for interfacing with a number of standard knowledge discovery tools, e.g., Viscovery and SPSS AnswerTree.
Sometimes a collection of plug-ins is needed to achieve the desired functionality.
An example is the LTL-checker which checks whether logs satisfy some Linear
Temporal Logic (LTL) formula. For example, the LTL-checker can be used to
check the “four eyes” principle, i.e., two activities within the same case should
not be executed by the same person to avoid possible fraud. The LTL-checker
combines a mining plug-in (to get the log), an import plug-in (to load the file with
predefined LTL formulas), and an analysis plug-in (to do the actual checking).
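A simplified version of such a four-eyes check can be sketched directly on a triple-based log; the function below is our hypothetical illustration of the idea, not the actual LTL-checker.

```python
def violates_four_eyes(log, act1, act2):
    """Return the cases in which act1 and act2 share a performer.
    log: iterable of (case, activity, originator) triples."""
    performers = {}
    for case, activity, originator in log:
        performers.setdefault(case, {}).setdefault(activity, set()).add(originator)
    return [
        case for case, acts in performers.items()
        if acts.get(act1, set()) & acts.get(act2, set())
    ]

# Hypothetical example: Sue both approves and pays in case 2.
log = [
    ("case 1", "approve", "John"), ("case 1", "pay", "Pete"),
    ("case 2", "approve", "Sue"), ("case 2", "pay", "Sue"),
]
offending = violates_four_eyes(log, "approve", "pay")
```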
3 User Interface
Since the ProM framework contains a large number of plug-ins, it is impossible
to discuss them all in detail. Therefore, we only present some screenshots of a
few plug-ins that we applied to the example of Table 1. In Figure 4, we show the
result of applying the α-mining plug-in to the example. The default settings of
the plug-in were used, and the result is a Petri net that is behaviorally equivalent
to the one presented in Figure 1. In Figure 5, we show the result of the social
network mining plug-in. We used the handover of work setting, considering only
direct succession, to generate this figure. Comparing it to Figure 1(c) shows that
the result is an isomorphic graph (i.e., the result is the same).
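The handover-of-work metric with direct succession, as used for this figure, can be sketched as follows (a simplified illustration of ours, assuming a triple-based log and ignoring timestamps and event types):

```python
from collections import Counter, defaultdict

def handover_of_work(log):
    """Count direct handovers: who passed work to whom within a case.
    log: iterable of (case, activity, originator) triples, ordered per case."""
    per_case = defaultdict(list)
    for case, _, originator in log:
        per_case[case].append(originator)
    return Counter(
        (a, b) for people in per_case.values() for a, b in zip(people, people[1:])
    )

# The first case of Table 1: John -> Mike -> John -> Pete.
log = [
    ("case 1", "A", "John"), ("case 1", "B", "Mike"),
    ("case 1", "C", "John"), ("case 1", "D", "Pete"),
]
handovers = handover_of_work(log)
```

The resulting counts are the weighted edges of a sociogram such as Figure 1(c).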
Petri nets are not the only modelling language supported by the framework. In addition, we have built-in support for EPCs (Event-driven Process Chains).
In Figure 6, we show the result of the multi-phase mining plug-in. The result is
an aggregated EPC describing the behavior of all cases. Note that it allows for
more behavior than the Petri net, since the connectors are of the type logical or.
In Figure 7 we show the user interface of the analysis plug-in that can be used
for the verification of EPCs.
In this section, we have shown some screenshots to provide an impression of the framework. We would like to stress that we only showed a few of the many plug-ins that are available. We would also like to point out that most plug-ins allow for user interaction. The latter is important because process mining is often an interactive process in which human interpretation matters and additional knowledge can be used to improve the mining result.
Fig. 4. The α-mining plug-in
Fig. 5. The social network mining plug-in
Fig. 6. The discovered EPC
Fig. 7. Analyzing the EPC for correctness
4 Related Work
Process mining can be seen as a tool in the context of Business Activity Mon-
itoring (BAM) and Business (Process) Intelligence (BPI). In [9] a BPI toolset on top of HP's Process Manager is described. This toolset includes a so-called “BPI Process Mining Engine”. However, this engine does not provide any techniques as discussed before. Instead, it uses generic mining tools such as SAS
Enterprise Miner for the generation of decision trees relating attributes of cases
to information about execution paths (e.g., duration). In [14] the PISA tool is
described which can be used to extract performance metrics from workflow logs.
Similar diagnostics are provided by the ARIS Process Performance Manager
(PPM) [11]. The latter tool is commercially available and a customized version
of PPM is the Staffware Process Monitor (SPM) [16] which is tailored towards
mining Staffware logs.²
² Note that the ProM Framework interfaces with Staffware, SPM, ARIS Toolset, and ARIS PPM.
Given the many papers on mining the process perspective it is not possible to
give a complete overview. Instead we refer to [3, 5]. Historically, Cook et al. [8]
and Agrawal et al. [6] started to work on the problem addressed in this paper.
Herbst et al. [10] took an alternative approach which allows for dealing with
duplicate activities. The authors of this paper have been involved in different
variants of the so-called α-algorithm [1, 5, 17]. Each of the approaches has its
pros and its cons. Most approaches that are able to discover concurrency have
problems dealing with issues such as duplicate activities, hidden activities, non-
free-choice constructs, noise, and incompleteness.
The ProM framework subsumes process mining tools like EMiT [1], Little Thumb [17], and MinSoN [2]. Most of these tools had their own format to
store log files in, and had their own limitations. The tool EMiT for example
was unable to deal with log files of more than 1000 cases. To be able to use
all these tools together in an interactive way, we developed the ProM frame-
work, which can be seen as a successor of all these tools. The framework allows
researchers to seamlessly combine their own algorithms with algorithms from
other people. Furthermore, using the framework allows you to interface with
many existing tools, both commercial and public. These tools include: the Aris
Toolset, Aris PPM, Woflan, The Petri net kernel, Netminer, Agna, Dot, Viscov-
ery, etc.
5 Conclusion
The ProM framework integrates the functionality of several existing process
mining tools and provides many additional process mining plug-ins. The ProM
framework supports multiple formats and multiple languages, e.g., Petri nets,
EPCs, Social Networks, etc. The plug-ins can be used in several ways and
combined to be applied in real-life situations. We encourage developers and researchers to use the ProM framework for implementing new ideas. Adding a new plug-in is easy: it suffices to add a few lines to the configuration files, and no changes to the code are necessary, i.e., new mining plug-ins can be added without recompiling the source code. Experiences with adding the Tsinghua-α plug-in and the Social network miner show that this is indeed rather straightforward.
Acknowledgements
The authors would like to thank all people that have been involved in the de-
velopment and implementation of the ProM framework. In particular we would
like to thank Minseok Song, Jianmin Wang and Wen Lijie for their contribu-
tions. Furthermore, we would like to thank IDS Scheer for providing us with
Aris PPM and the Aris toolset. Last, but certainly not least, we would like to
thank Peter van den Brand for doing the major part of the implementation work
for us.
References
1. W.M.P. van der Aalst and B.F. van Dongen. Discovering Workflow Performance
Models from Timed Logs. In Y. Han, S. Tai, and D. Wikarski, editors, International
Conference on Engineering and Deployment of Cooperative Information Systems
(EDCIS 2002), volume 2480 of Lecture Notes in Computer Science, pages 45–63.
Springer-Verlag, Berlin, 2002.
2. W.M.P. van der Aalst and M. Song. Mining Social Networks: Uncovering interac-
tion patterns in business processes. In J. Desel, B. Pernici, and M. Weske, editors,
International Conference on Business Process Management (BPM 2004), volume
3080 of Lecture Notes in Computer Science, pages 244–260. Springer-Verlag, Berlin,
2004.
3. W.M.P. van der Aalst, B.F. van Dongen, J. Herbst, L. Maruster, G. Schimm, and
A.J.M.M. Weijters. Workflow Mining: A Survey of Issues and Approaches. Data
and Knowledge Engineering, 47(2):237–267, 2003.
4. W.M.P. van der Aalst and A.J.M.M. Weijters, editors. Process Mining, Special Issue of Computers in Industry, Volume 53, Number 3. Elsevier Science Publishers,
Amsterdam, 2004.
5. W.M.P. van der Aalst, A.J.M.M. Weijters, and L. Maruster. Workflow Mining:
Discovering Process Models from Event Logs. IEEE Transactions on Knowledge
and Data Engineering, 16(9):1128–1142, 2004.
6. R. Agrawal, D. Gunopulos, and F. Leymann. Mining Process Models from Work-
flow Logs. In Sixth International Conference on Extending Database Technology,
pages 469–483, 1998.
7. J. Billington et al. The Petri Net Markup Language: Concepts, Technology, and Tools. In W.M.P. van der Aalst and E. Best, editors, Application and Theory of Petri Nets 2003, volume 2679 of Lecture Notes in Computer Science, pages 483–506. Springer-Verlag, Berlin, 2003.
8. J.E. Cook and A.L. Wolf. Discovering Models of Software Processes from Event-
Based Data. ACM Transactions on Software Engineering and Methodology,
7(3):215–249, 1998.
9. D. Grigori, F. Casati, U. Dayal, and M.C. Shan. Improving Business Process Qual-
ity through Exception Understanding, Prediction, and Prevention. In P. Apers,
P. Atzeni, S. Ceri, S. Paraboschi, K. Ramamohanarao, and R. Snodgrass, editors, Proceedings of the 27th International Conference on Very Large Data Bases (VLDB'01), pages 159–168. Morgan Kaufmann, 2001.
10. J. Herbst. A Machine Learning Approach to Workflow Management. In Proceedings
11th European Conference on Machine Learning, volume 1810 of Lecture Notes in
Computer Science, pages 183–194. Springer-Verlag, Berlin, 2000.
11. IDS Scheer. ARIS Process Performance Manager (ARIS PPM): Measure, Ana-
lyze and Optimize Your Business Process Performance (whitepaper). IDS Scheer, Saarbruecken, Germany, http://www.ids-scheer.com, 2002.
12. G. Keller and T. Teufel. SAP R/3 Process Oriented Implementation. Addison-
Wesley, Reading MA, 1998.
13. A.K.A. de Medeiros, A.J.M.M. Weijters, and W.M.P. van der Aalst. Using Ge-
netic Algorithms to Mine Process Models: Representation, Operators and Results.
BETA Working Paper Series, WP 124, Eindhoven University of Technology, Eind-
hoven, 2004.
14. M. zur Mühlen and M. Rosemann. Workflow-based Process Monitoring and Controlling - Technical and Organizational Issues. In R. Sprague, editor, Proceedings of the 33rd Hawaii International Conference on System Science (HICSS-33), pages 1–10. IEEE Computer Society Press, Los Alamitos, California, 2000.
15. W. Reisig and G. Rozenberg, editors. Lectures on Petri Nets I: Basic Models,
volume 1491 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1998.
16. Staffware. Staffware Process Monitor (SPM). http://www.staffware.com, 2002.
17. A.J.M.M. Weijters and W.M.P. van der Aalst. Rediscovering Workflow Models
from Event-Based Data using Little Thumb. Integrated Computer-Aided Engi-
neering, 10(2):151–162, 2003.
18. L. Wen, J. Wang, W.M.P. van der Aalst, Z. Wang, and J. Sun. A Novel Approach
for Process Mining Based on Event Types. BETA Working Paper Series, WP 118,
Eindhoven University of Technology, Eindhoven, 2004.
... Fig. 2), the mentioned documentation processes were selected as the focus of our research on using process mining to examine and identify the user or user groups responsible for changes in master data. Our application of the process mining methodology is summarized in three steps according to van Dongen et al. (2005). ...
... In step 2, the event logs were analyzed by applying process discovery and conformance checking methods with the process mining solution ProM [28]. In step 3, the discovered process model was evaluated using the fitness, precision, and generalizability measures proposed by van Dongen et al. (2005). Finally, in step 4, these data were anonymized using a common and established technical protection measure within the ProM program package, specifically including all resource names (proper names, usernames, user abbreviations, logins), case IDs, and roughly detailed timestamps (months and years). ...
Chapter
Robotic Process Automation (RPA) is an emerging automation technology that creates software (SW) robots to partially or fully automate rule-based and repetitive tasks (aka routines) previously performed by human users in their applications’ user interfaces (UIs). Successful usage of RPA requires strong support by skilled human experts, from the detection of the routines to be automated to the development of the executable scripts required to enact SW robots. In this paper, we discuss how process mining can be leveraged to minimize the manual and time-consuming steps required for the creation of SW robots, enabling new levels of automation and support for RPA. We first present a reference data model that can be used for a standardized specification of UI logs recording the interactions between workers and SW applications to enable interoperability among different tools. Then, we introduce a pipeline of processing steps that enable us to (1) semi-automatically discover the anatomy of a routine directly from the UI logs, and (2) automatically develop executable scripts for performing SW robots at run-time. We show how this pipeline can be effectively enacted by researchers/practitioners through the SmartRPA tool.KeywordsRobotic Process AutomationProcess miningUser Interface (UI) logsReference data model for UI logsSegmentationAutomated generation of SW robots from UI LogsSmartRPA
... Its output reveals qualitative dynamics by offering the ability to assess the content of the patterns discovered. Software support for process mining is included in the ProM framework (Van Dongen et al., 2005) providing guidance and tool support for process analysis. ...
Chapter
Teams are at the core of every organisation, composed of individuals who continuously collaborate, exchange knowledge and ideas, and constantly learn from one another through formal or informal learning experiences. Team learning is therefore a continuously changing phenomenon that develops and evolves over time as teams interact. In this chapter, we aim to promote the investigation of team learning as a temporal phenomenon, and suggest that its temporality can be captured through team interaction dynamics, defined as continuously changing patterns of micro-behaviours that emerge and evolve as teams operate. We set three key steps for initiating and leading research that captures temporality: (a) identifying the interaction dynamics of interest, (b) figuring out the best way to collect and code these, and finally (c) choosing an analysis technique that helps capture continuously and sequentially unfolding patterns. We offer some ‘food for thought’ on interaction dynamics that relate to team learning and the added value of investigating them, and present some existing data collection and coding methods. We finally propose a framework for choosing an appropriate analysis technique based on the dynamic output that each analysis generates.KeywordsTeam learningTemporal phenomenonInteraction dynamicsPatternsAnalysis techniquesEmergent states
... Second characteristic Models based on this proposal should be replicable and reproducible for any Health Technology Assessment, using algorithms and tools specifically tailored to this purpose. Most of the related works used PROM [36] -an open-source framework provided by the University of Eindhover [37]. The software was created to support various techniques and algorithms to experiment with the process mining technique within the academic community. ...
Article
Objective: Propose a process mining-based method for Health Technology Assessment. Methods: Articles dealing with prior studies in Health Technology Assessment using Process Mining were identified. Five research questions were defined to investigate these studies and present important points and desirable characteristics to be addressed in a proposal. The method was defined with five steps and was submitted to a case study for evaluation. Results: The literature search identified six main characteristics. As a result, the proposed five-step method was applied to the radical prostatectomy surgical procedure, comparing the robot-assisted technique and laparoscopy. Conclusion: This article demonstrated the creation of an efficient method and its replicability for other health technologies, coupled with the specialists' positive assessment of the comprehensibility of the discovered patterns and their correlation with clinical protocols and guidelines.
... Furthermore, DPNs allow the global management of variables, which allows a more realistic representation of the domain regarding data management. The DPN was modeled in ProM [56] considering 7PMG [29]. An extract of the DPN model is shown in Figure 5. ...
Article
Full-text available
In the context of improving clinical treatments and certifying clinics, guideline-compliant care has become more important. However, verifying the compliance of treatment procedures with Clinical Guidelines remains difficult, as guidelines are mostly available in non-computer-interpretable form and previous computer-interpretable approaches neglect the process perspective with its potential to gain medical insight. In this paper, we present our transformation framework CGK4PM, which addresses the procedural nature of treatment processes and which guides the transformation of explicit and implicit clinical guideline knowledge into process models. The procedural representation enables the use of process mining techniques such as conformance checking to verify guideline compliance and the opportunity to gain insights from complex clinical treatment processes. In collaboration with physicians from Münster University Hospital, the practical applicability of the framework is demonstrated in a case study by transforming the guideline for the treatment of malignant melanoma. The case study findings demonstrate the need for structured and guided transformation and highlight the difficulties in developing a guideline-based process model.
Chapter
Over the past decade, process mining has emerged as a new area of research focused on analyzing end-to-end processes through the use of event data and novel techniques for process discovery and conformance testing. While the benefits of process mining are widely recognized scientifically, research has increasingly addressed privacy concerns regarding the use of personal data and sensitive information that requires protection and compliance with data protection regulations. However, the privacy debate is currently answered exclusively by technical safeguards that lead to the anonymization of process data. This research analyzes the real-world utility of these process data anonymization techniques and evaluates their suitability for privacy protection. To this end, we use process mining in a case study to investigate how responsible users and specific user groups can be identified despite the technical anonymization of process mining data. Keywords: Process mining, Privacy measures, Healthcare sector, Hospital information system
Chapter
Interest in stochastic models for business processes has been revived in a recent series of studies on uncertainty in process models and event logs, with corresponding process mining techniques. In this context, variants of stochastic labelled Petri nets, that is, with duplicate labels and silent transitions, have been employed as a reference model. Reasoning on the stochastic, finite-length behaviours induced by such nets is consequently central to solving a variety of model-driven and data-driven analysis tasks, but this is challenging due to the interplay of uncertainty and the potentially infinite set of traces (including silent transitions) induced by the net. This explains why reasoning has been conducted in an approximated way, or by imposing restrictions on the model. The goal of this paper is to provide a deeper understanding of such nets, showing how reasoning can be properly conducted by leveraging solid techniques from qualitative model checking of Markov chains, paired with automata-based techniques to suitably handle silent transitions. We exploit this connection to solve three central problems: computing the probability of reaching a particular final marking; computing the probability of a trace, or that a temporal property, specified as a finite-state automaton, is satisfied by the net; and checking whether the net stochastically conforms to a probabilistic Declare model. The different techniques have all been implemented in a proof-of-concept prototype.
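For the restricted case without silent transitions, the reachability probability described above reduces to a standard absorption-probability computation on a finite Markov chain. A minimal sketch (toy chain and hypothetical state names, using plain value iteration rather than the model-checking machinery the paper employs):

```python
def reach_probability(transitions, target, absorbing, iters=1000):
    """Probability of eventually reaching `target` in a finite Markov chain.
    Value iteration on: p(target) = 1, p(other absorbing state) = 0,
    p(s) = sum over successors t of P(s, t) * p(t) otherwise."""
    states = set(transitions) | {target} | set(absorbing)
    p = {s: (1.0 if s == target else 0.0) for s in states}
    for _ in range(iters):
        for s in transitions:
            if s != target and s not in absorbing:
                p[s] = sum(prob * p[t] for t, prob in transitions[s].items())
    return p

# Toy chain: from s0 we either die (prob 0.5) or move on to s1,
# which reaches the goal marking with prob 0.8 or loops back.
chain = {"s0": {"s1": 0.5, "dead": 0.5},
         "s1": {"goal": 0.8, "s0": 0.2}}
p = reach_probability(chain, "goal", {"goal", "dead"})
# p["s0"] solves p0 = 0.5 * (0.8 + 0.2 * p0), i.e. p0 = 4/9
```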
Preprint
Full-text available
As an emerging technology in the era of Industry 4.0, digital twin is gaining unprecedented attention because of its promise to further optimize process design, quality control, health monitoring, decision and policy making, and more, by comprehensively modeling the physical world as a group of interconnected digital models. In a two-part series of papers, we examine the fundamental role of different modeling techniques, twinning enabling technologies, and uncertainty quantification and optimization methods commonly used in digital twins. This second paper presents a literature review of key enabling technologies of digital twins, with an emphasis on uncertainty quantification, optimization methods, open source datasets and tools, major findings, challenges, and future directions. Discussions focus on current methods of uncertainty quantification and optimization and how they are applied in different dimensions of a digital twin. Additionally, this paper presents a case study where a battery digital twin is constructed and tested to illustrate some of the modeling and twinning methods reviewed in this two-part review. Code and preprocessed data for generating all the results and figures presented in the case study are available on GitHub.
Thesis
Full-text available
This thesis aims to develop an approach that allows generating realistic event logs based on Data Petri nets (DPN) that can be used as a substitute for missing real data when applying AI methods. Based on a requirement analysis, a literature review of already existing methods for synthetic data generation based on process models is conducted, and the token-based simulation method is chosen. Finally, the approach is developed, implemented, and evaluated. The approach was implemented in the tool called DALG: The Data Aware Event Log Generator. During the evaluation, it was found that it can generate event logs that conform to both the control-flow and the data perspective of a DPN. However, achieving the generation of realistic event logs could not be reached. It was found that it is difficult to describe processes accurately enough in DPNs to generate realistic data since they lack the necessary expressiveness. Nonetheless, the approach shows promise for generating realistic event logs. However, further research regarding the problems uncovered in this thesis is necessary to improve the realism of the synthetic data. The main contributions of this thesis are the identification of challenges that occur when trying to generate realistic data based on Data Petri nets, solutions for several of the uncovered problems, and the tool DALG.
Article
Process mining is the discipline of analyzing and improving processes based on recorded behaviour known as an event log. Real-life event logs contain noise, infrequent behaviors, and numerous concurrent activities, so the process models generated by process discovery algorithms can be inefficient and complex. Because of these shortcomings, current process discovery algorithms often fail to adequately pre-process the data and describe real-life phenomena. Existing process mining algorithms are limited by their built-in filtering, parameters, and pre-defined features. It is critical to use a high-quality event log to generate a robust process model; however, pre-processing an event log is often a cumbersome and challenging procedure. In this paper, we propose a novel pre-processing step aimed at obtaining a superior-quality event log from a set of raw data and, consequently, a better-performing process model. The proposed approach concatenates events that hold concurrent relations based on a probability algorithm, producing simpler and more accurate process models. This pre-processing step is based on the probability derived from the frequency of concurrent events. The performance of the approach is evaluated on 18 publicly available real-life benchmark datasets. We show that the proposed pre-processing framework significantly reduces the complexity of the process model and improves the model's F-Measure.
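The probability algorithm itself is the paper's own contribution; a minimal sketch of the underlying idea — scoring an activity pair as likely concurrent when both orders occur with balanced frequency in the log — might look like:

```python
from collections import Counter

def concurrency_probability(traces):
    """Estimate, per unordered activity pair (a, b) with a < b, how likely the
    pair is concurrent: both orders must be observed, and the score is higher
    the more balanced the two direct-succession counts are."""
    follows = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] += 1
    probs = {}
    for (a, b), ab in follows.items():
        if a < b:  # emit one entry per unordered pair
            ba = follows.get((b, a), 0)
            total = ab + ba
            probs[(a, b)] = 2 * min(ab, ba) / total if total else 0.0
    return probs

# "a" before "b" twice, "b" before "a" once: both orders observed,
# so the pair is scored as likely concurrent (2 * 1 / 3).
probs = concurrency_probability([["a", "b"], ["b", "a"], ["a", "b"]])
```

Pairs above some threshold would then be concatenated into a single event before discovery, which is the pre-processing move the abstract describes.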
Article
Full-text available
Contemporary workflow management systems are driven by explicit process models, i.e., a completely specified workflow design is required in order to enact a given workflow process. Creating a workflow design is a complicated time-consuming process and typically, there are discrepancies between the actual workflow processes and the processes as perceived by the management. Therefore, we propose a technique for rediscovering workflow models. This technique uses workflow logs to discover the workflow process as it is actually being executed. The workflow log contains information about events taking place. We assume that these events are totally ordered and each event refers to one task being executed for a single case. This information can easily be extracted from transactional information systems (e.g., Enterprise Resource Planning systems such as SAP and Baan). The rediscovering technique proposed in this paper can deal with noise and can also be used to validate workflow processes by uncovering and measuring the discrepancies between prescriptive models and actual process executions.
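The rediscovery technique builds on ordering relations extracted from a totally ordered log. A minimal sketch (hypothetical trace data; the actual technique, the α-algorithm, goes on to construct a full workflow net from these relations) of how direct succession, causality, and parallelism can be derived:

```python
def ordering_relations(traces):
    """Derive the basic log-ordering relations the alpha-algorithm builds on.
    Each trace is a sequence of task names, assumed totally ordered."""
    direct = set()  # (a, b): a is directly followed by b in some trace
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            direct.add((a, b))
    # a -> b (causal): a directly precedes b, but never the reverse
    causal = {(a, b) for (a, b) in direct if (b, a) not in direct}
    # a || b (parallel): both orders observed in the log
    parallel = {(a, b) for (a, b) in direct if (b, a) in direct}
    return direct, causal, parallel

traces = [["register", "check", "pay"],
          ["register", "pay", "check"]]
direct, causal, parallel = ordering_relations(traces)
# "register" causally precedes both others; "check" and "pay" are parallel
```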
Conference Paper
Full-text available
Workflow management systems enable the exact and timely analysis of automated business processes through the analysis of logged audit trail data. Within the research project CONGO, we develop a process analysis tool (PISA) that can be employed to analyze the audit trail data of different workflow management systems in conjunction with target data from business process modeling tools. A working prototype has been completed that integrates data of the ARIS Toolset and IBM MQSeries Workflow. The analysis focuses on three different perspectives: processes and functions, involved resources, and process objects. We outline the economic aspects of workflow based process monitoring and control and the current state of the art in monitoring facilities provided by current workflow management systems and existing standards. After a discussion of the three evaluation perspectives, sample evaluation methods for each perspective are discussed. The concept and architecture of PISA are described and implementation issues are outlined before an outlook on further research is given.
Conference Paper
Full-text available
Contemporary workflow management systems are driven by explicit process models, i.e., a completely specified workflow design is required in order to enact a given workflow process. Creating a workflow design is a complicated time-consuming process and typically there are discrepancies between the actual workflow processes and the processes as perceived by the management. Therefore, we propose a technique for process mining. This technique uses workflow logs to discover the workflow process as it is actually being executed. The process mining technique proposed in this paper can deal with noise and can also be used to validate workflow processes by uncovering and measuring the discrepancies between prescriptive models and actual process executions.
Article
Full-text available
The topic of process mining has attracted the attention of both researchers and tool vendors in the Business Process Management (BPM) space. The goal of process mining is to discover process models from event logs, i.e., events logged by some information system are used to extract information about activities and their causal relations. Several algorithms have been proposed for process mining. Many of these algorithms cannot deal with concurrency. Other typical problems are the presence of duplicate activities, hidden activities, non-free-choice constructs, etc. In addition, real-life logs contain noise (e.g., exceptions or incorrectly logged events) and are typically incomplete (i.e., the event logs contain only a fragment of all possible behaviors). To tackle these problems we propose a completely new approach based on genetic algorithms. In this paper, we present a new process representation, a fitness measure and the genetic operators used in a genetic algorithm to mine process models. Our focus is on the use of the genetic algorithm for mining noisy event logs. Additionally, in the appendix we elaborate on the relation between Petri nets and this representation and show that genetic algorithms can be used to discover Petri net models from event logs.
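At the heart of such a genetic miner is a fitness measure scoring how well a candidate model replays the log. A deliberately simplified sketch (toy edge-set model representation and greedy sequential replay, not the causal-matrix encoding and token semantics the paper defines):

```python
def completeness_fitness(model_edges, start, traces):
    """Toy fitness: fraction of logged events the candidate model can replay.
    model_edges is a set of allowed direct successions (a, b); replay is
    greedy and purely sequential, a deliberate simplification."""
    parsed = total = 0
    for trace in traces:
        prev = start
        for task in trace:
            total += 1
            if (prev, task) in model_edges:
                parsed += 1
                prev = task  # only advance when the step is allowed
    return parsed / total if total else 0.0

# Candidate model start -> a -> b -> c replays 4 of the 5 logged events:
# the step a -> c in the second trace is not allowed by the model.
edges = {("start", "a"), ("a", "b"), ("b", "c")}
fitness = completeness_fitness(edges, "start", [["a", "b", "c"], ["a", "c"]])
```

A genetic miner would combine a measure like this with penalties for extra behavior, then apply crossover and mutation to a population of candidate models.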
Chapter
Full-text available
Contemporary workflow management systems are driven by explicit process models, i.e., a completely specified workflow design is required in order to enact a given workflow process. Creating a workflow design is a complicated time-consuming process and typically there are discrepancies between the actual workflow processes and the processes as perceived by the management. Therefore, we have developed techniques for discovering workflow models. Starting point for such techniques are so-called “workflow logs” containing information about the workflow process as it is actually being executed. In this paper, we extend our existing mining technique α [4] to incorporate time. We assume that events in workflow logs bear timestamps. This information is used to attribute timing such as queue times to the discovered workflow model. The approach is based on Petri nets and timing information is attached to places. This paper also presents our workflow-mining tool EMiT. This tool translates the workflow log of several commercial systems (e.g., Staffware) to an independent XML format. Based on this format the tool mines for causal relations and produces a graphical workflow model expressed in terms of Petri nets.
Article
Full-text available
Many of today’s information systems are driven by explicit process models. Workflow management systems, but also ERP, CRM, SCM, and B2B, are configured on the basis of a workflow model specifying the order in which tasks need to be executed. Creating a workflow design is a complicated time-consuming process and typically there are discrepancies between the actual workflow processes and the processes as perceived by the management. To support the design of workflows, we propose the use of workflow mining. Starting point for workflow mining is a so-called “workflow log” containing information about the workflow process as it is actually being executed. In this paper, we introduce the concept of workflow mining and present a common format for workflow logs. Then we discuss the most challenging problems and present some of the workflow mining approaches available today.
Conference Paper
Full-text available
Increasingly, information systems log historic information in a systematic way. Workflow management systems, but also ERP, CRM, SCM, and B2B systems, often provide a so-called "event log", i.e., a log recording the execution of activities. Unfortunately, the information in these event logs is rarely used to analyze the underlying processes. Process mining aims at improving this by providing techniques and tools for discovering process, control, data, organizational, and social structures from event logs. This paper focuses on mining social networks. This is possible because event logs typically record information about the users executing the activities recorded in the log. To do this we combine concepts from workflow management and social network analysis. This paper introduces the approach, defines metrics, and presents a tool to mine social networks from event logs.
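One metric central to such social network mining is handover of work: how often one performer is directly followed by another within the same case. A minimal sketch with a hypothetical two-case log (the actual tool supports further metrics, such as subcontracting and working together):

```python
from collections import defaultdict

def handover_of_work(cases):
    """Count how often user u hands work over to user v, i.e. u performs an
    activity directly followed by one performed by v in the same case.
    cases maps a case id to its ordered list of (activity, performer) events."""
    handovers = defaultdict(int)
    for events in cases.values():
        for (_, u), (_, v) in zip(events, events[1:]):
            handovers[(u, v)] += 1
    return dict(handovers)

log = {"case1": [("register", "ann"), ("check", "bob"), ("pay", "ann")],
       "case2": [("register", "ann"), ("check", "carol")]}
network = handover_of_work(log)
# edges ann->bob, bob->ann, ann->carol, each observed once
```

The resulting weighted edges form the sociogram on which standard social network analysis (centrality, clustering) can then be applied.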