Generating Synthetic Sensor Event Logs
for Process Mining
Yorck Zisgen [0000-0002-9646-2829], Dominik Janssen [0000-0003-0218-8628], and
Agnes Koschmider [0000-0001-8206-7636]
Group Process Analytics
Kiel University, Germany
yzi|dominik.janssen|ak@informatik.uni-kiel.de
Abstract. Process mining has gained significant practical usefulness in
diverse domains. The input of process mining is an event log, tracking
the execution of activities that can be mapped onto a business process.
Thus, the availability and quality of event logs significantly impact the
process mining result. The use of process mining in novel use cases or ex-
perimental settings is often hampered because no appropriate event logs
are available. This paper presents a tool to generate synthetic (sensor)
event logs. Compared to existing synthetic log generator tools, the IoT
process log generator produces data in a non-deterministic way. Users
can add noise in a controlled manner and might enhance the processes
with IoT data. In this way, the tool allows generating synthetic data for
IoT environments that can be individually configured. Our tool makes
a contribution towards an increased use of process mining in settings
relying on (IoT) sensor event data.
Keywords: Internet of Things · Event Log Simulation · Synthetic Data
· Business Process Simulation · Process Mining.
1 Introduction
Process Mining and Internet-of-Things (IoT) can significantly benefit from each
other because IoT environments produce the large quantity of data that process
mining methods require for accurate process analysis [11]. In turn, process mining
provides the insights to understand the IoT enhanced processes in a controlled
way. However, quality issues of the high volume of IoT data (like missing or
incomplete data entries) hamper the direct applicability of process mining on
IoT data. Additionally, sensor event data is at a much lower level of semantics,
and the data does not directly relate to high-level business process concepts as
required for process mining.
To allow working on IoT data of different data quality that can be used by
process mining, this paper presents the IoT process log generator. The tool gen-
erates both synthetic event logs and synthetic IoT sensor event logs. For this
purpose, a simulation engine with an end-user front end has been implemented.
Users model processes and can configure the process models in terms of dura-
tion, frequency of process activities, or noise. Optionally they can specify an
IoT environment that is mapped to the process model. In this way, event logs
for varying IoT environments (with different sensor or failure types) can be pro-
duced and used for experimental purposes. For instance, motion sensors with
discrete ON and OFF states or temperature sensors with continuous values can
be configured. A simulation can also be used to answer whether upgrading a
facility with IoT sensors will justify the incurred cost. It can also
help to reveal bottlenecks in production capacity by showing potential conges-
tion in the simulation. Generally, it has been shown that synthetic data not only
provide a substitution for real data [3, 16], but can even enhance insight into
domain-specific research [19]. Thus, we are convinced that our IoT process log
generator will fuel the application of process mining in use cases where data
accessibility is challenging and data quality also hampers data analysis.
This paper is structured as follows. Section 2 summarizes related works.
Section 3 presents the general architecture of our tool, while Section 4 provides an
in-depth look into the implementation. Section 5 demonstrates the applicability
of our tool on two use cases. Finally, the paper concludes with a summary and
a discussion of future tasks.
2 Related Work
The following streams of research are related to the IoT process log generator:
(1) IoT log generators and (2) event log generators.
Generally, the available works are either limited to a certain sensor type [9,
15], are restricted to a particular application domain [2, 15], or only provide a
collection of IoT simulation approaches [1, 4, 18]. For instance, the approaches
presented in [9, 15] are limited to GPS or signal strength sensor types and do not
allow additional sensor types to be added. In contrast, our IoT process log generator
supports additional sensor types such as motion, light, temperature, counter, or
on/off sensors.
Diverse application domains have been tackled by IoT log generators like
mobile devices, wireless sensor networks or cyberphysical systems. For instance,
Kertesz et al. [13] propose a simulator for the cloud communication of mobile
IoT devices' sensor data. Papadopoulos et al. [15] addressed signal strength of
wireless sensor networks. Ramprasad et al. [17] propose a simulator for virtual
IoT architectures called EMU-IoT, in which an end-to-end evaluation of an IoT
network can be simulated. Giménez et al. [9] varied positional data to
test collision anticipation of vehicles in a warehouse. Ahmad et al. [2] proposed a
simulation architecture to conduct research on communication in real-time IoT
environments. Thus, available IoT simulators are commonly designed around a
narrow application field. They do not allow for usage in more diverse settings,
unlike the IoT process log generator, which can run simulations such as visitor
count monitoring, smart home activities, procedures in a smart factory, or
hospital processes.
Synthetic event logs might also be generated with CPN Tools [12], ProM [20],
or WoPeD [7], which are common tools for process modeling or process mining
respectively. Also, a log generator for declarative process models has been suggested
[5]. However, these tools only create deterministic event logs that directly
result from the behavior of the modeled process (i.e., no frequency of trace occurrence
can be specified). Furthermore, ProM allows noise and outliers to be added to
the event log. However, a recent analysis of these noise-adding plug-ins showed
that the available tools neither filter nor add noise appropriately [14].
In contrast, our IoT process log generator makes it possible to introduce different
noise types, while still providing a corresponding noise-free log as ground truth
for comparison.
To sum up, the available IoT log generators are restricted to specific sensor
types or application domains, while our tool is not restricted to any specific
sensor type or domain. It also generates data in a non-deterministic way, can
add noise in a controlled way and it can be expanded to include other sensor or
noise types.
3 Architecture
This section presents the architecture of the IoT process log generator. Fig. 1
shows the architectural design. The tool is platform-independent, is designed for
single-user settings, and was developed in Python version 3.9, using NumPy [10]
as an external library. The tool can be accessed publicly via a browser.
[Fig. 1: Architecture of the IoT Process Log Generator. Frontend: IoT Modeler, Process Modeler, Settings Configuration. Application: Event Log Generator, Sensor Event Log Generator. Output: CSV. Arrow labels: export, map & extract.]
Users can model processes with Petri nets in the online modeler or import .PNML files.
Additionally, they can manage simulation settings, such as the duration of activities
and the simulated time range, and add different types and quantities of noise. They can
use the IoT modeler to configure an IoT environment (i.e., specify sensor
types and failure probabilities). Subsequently, two different types of event logs
can be generated at the application layer, either a conventional event log with
time-stamps and activities or an IoT sensor event log with additional sensor
information. The settings configuration and the designed IoT environment are
used as input in a way that each process activity is mapped onto locations in
the IoT environment. Finally, the information generated by the application layer
is processed into an event log that can be read on-screen or downloaded as a
.TXT or .CSV file. The next section explains the conceptual design of
the IoT process log generator in detail.
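To make the settings described in this section more concrete, the following Python sketch groups them into one configuration object: the simulated time range, the number of runs, activity durations, noise types and rate, and sensors with failure probabilities. It is an illustration only and not the tool's actual data model; all class and field names (SimulationConfig, ActivitySetting, SensorSetting) and the example values for times, locations, and sensor identifiers are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional, Tuple


@dataclass
class ActivitySetting:
    # Hypothetical: duration bounds in minutes for one process activity.
    min_minutes: float
    max_minutes: float
    location: Optional[str] = None  # optional mapping onto the IoT environment


@dataclass
class SensorSetting:
    # Hypothetical: one sensor placed at a location of the IoT environment.
    kind: str  # e.g. "motion" (discrete) or "temperature" (continuous)
    location: str
    failure_probability: float = 0.0


@dataclass
class SimulationConfig:
    start: datetime
    end: datetime
    runs: int
    noise_rate: float = 0.0
    noise_types: Tuple[str, ...] = ()
    activities: Dict[str, ActivitySetting] = field(default_factory=dict)
    sensors: Dict[str, SensorSetting] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if the configuration is inconsistent."""
        if self.start >= self.end:
            raise ValueError("simulated time range must have start < end")
        if not 0.0 <= self.noise_rate <= 1.0:
            raise ValueError("noise_rate must lie in [0, 1]")
        for name, activity in self.activities.items():
            if activity.min_minutes > activity.max_minutes:
                raise ValueError(f"activity {name!r}: min duration exceeds max")
        for name, sensor in self.sensors.items():
            if not 0.0 <= sensor.failure_probability <= 1.0:
                raise ValueError(f"sensor {name!r}: invalid failure probability")


# Example values are assumptions chosen for illustration.
config = SimulationConfig(
    start=datetime(2022, 2, 24, 8, 0),
    end=datetime(2022, 2, 24, 14, 0),
    runs=10,
    noise_rate=0.1,
    noise_types=("Event Lost", "Event Twice"),
    activities={"Register": ActivitySetting(10, 45, location="reception")},
    sensors={"S1": SensorSetting("motion", "reception", failure_probability=0.05)},
)
config.validate()
```

Validating the configuration before a run keeps errors such as an inverted time range out of the generated log.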
4 Implementation
Figure 2 extends Figure 1 and shows the information flow within our IoT process log generator.
[Fig. 2: Information Flow. Numbered steps 1 to 12 connect the Front End, the Simulation Engine, the Workflow Net (Place, Transition), Location, Path, Sensor, and the Event Log.]
[Fig. 3: UML Class Diagram Extract. Packages: Frontend, Application (Processing Engine, IoT Engine), Output. Classes: Configuration Manager, Simulation Engine, Output Manager, Message Object, Location, Edge, WorkflowNet, Node {abstract}, Place, Transition, Arc, Sensor {abstract}.]
First, the modeled processes, the configuration settings, and
the desired output format are forwarded to the simulation engine (see 1, 2, 3).
The simulation engine creates process instances from the business process(es)
(4). Those process instances (consisting of places and transitions) are processed
independently of one another. Internally, they follow the execution rules of Petri
nets (5,6). Transitions represent activities, which are mapped to locations in an
IoT environment (8) and activate sensors (7). Movements within the environ-
ment (9) can also activate sensors (10). Sensor readings (continuous or discrete)
then adhere to the IoT settings and finally transmit their information, times-
tamps, and optionally noise, to the sensor event log (11). If a user only aims for
an event log, the IoT modeler is skipped and the event log is directly generated
(12). Figure 3 shows the technical design of the IoT process log generator in
terms of a UML class diagram.
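The execution rules of Petri nets mentioned above (steps 5 and 6) can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the tool's simulation engine: the toy net, the place names, the duration bounds, and the function names are hypothetical, and it uses the standard-library random module rather than NumPy to stay self-contained. A transition is enabled when all of its input places hold a token; firing it consumes those tokens, produces tokens on the output places, and writes one event to the log.

```python
import random
from datetime import datetime, timedelta

# Hypothetical toy workflow net: transition -> (input places, output places).
# "Blood Test" and "Visit" run in parallel after "Hospitalize".
NET = {
    "Register": (["start"], ["p1"]),
    "Hospitalize": (["p1"], ["p2", "p3"]),
    "Blood Test": (["p2"], ["p4"]),
    "Visit": (["p3"], ["p5"]),
    "Discharge": (["p4", "p5"], ["end"]),
}
# Assumed (min, max) duration in minutes for every activity.
DURATION_MINUTES = {name: (10, 60) for name in NET}


def enabled(net, marking):
    """Return transitions whose input places all hold at least one token."""
    return [
        t for t, (inputs, _) in net.items()
        if all(marking.get(p, 0) > 0 for p in inputs)
    ]


def simulate_case(net, case_id, start, rng, max_steps=100):
    """Play the token game for one process instance and return its event log."""
    marking = {"start": 1}
    clock, log = start, []
    for _ in range(max_steps):
        candidates = enabled(net, marking)
        if not candidates:
            break  # no transition can fire: the instance is finished
        transition = rng.choice(candidates)  # non-deterministic pick
        inputs, outputs = net[transition]
        for p in inputs:
            marking[p] -= 1
        for p in outputs:
            marking[p] = marking.get(p, 0) + 1
        log.append((case_id, clock, transition))
        low, high = DURATION_MINUTES[transition]
        clock += timedelta(minutes=rng.uniform(low, high))
    return log


log = simulate_case(NET, 846, datetime(2022, 2, 24, 8, 23), random.Random(7))
```

Because the enabled transition is picked at random, repeated runs can log Blood Test and Visit in either order, which is the interleaving behavior a parallel branch should show.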
Our work distinguishes itself from a processing script by avoiding deterministic
behavior when exclusive choices or parallel activities are modeled.
Furthermore, activity durations and sensor values are determined by drawing a
random value from a given interval according to a probability distribution. As a consequence,
all possible execution sequences of a business process can occur
in the event log. Activity duration is randomly set to a value within the range
specified for this activity. Sensor activation can be set to happen at any time
during the activity, in either a given or random order. Furthermore, users can
specify probabilities for each choice to allow for a 'standard case' execution
and an 'outlier case' execution.
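The two sources of randomness described above, weighted exclusive choices and durations drawn from an interval, might be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the branch labels, the probabilities, and the two offered distributions are hypothetical, and the standard-library random module stands in for whatever sampling the tool performs internally.

```python
import random


def choose_branch(branches, rng):
    """Pick one branch of an exclusive choice according to its probability.

    `branches` maps a branch name to its probability (hypothetical format).
    """
    names = list(branches)
    weights = [branches[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def sample_duration(low, high, rng, distribution="uniform"):
    """Draw an activity duration from the interval [low, high]."""
    if distribution == "uniform":
        return rng.uniform(low, high)
    if distribution == "triangular":
        # Mode defaults to the midpoint of the interval.
        return rng.triangular(low, high)
    raise ValueError(f"unknown distribution: {distribution}")


rng = random.Random(1)
# A 'standard case' taken most of the time and a rare 'outlier case'.
picks = [choose_branch({"standard": 0.9, "outlier": 0.1}, rng) for _ in range(1000)]
durations = [sample_duration(5, 20, rng, "triangular") for _ in range(100)]
```

Assigning a small probability to one branch is enough to produce occasional outlier traces without changing the process model itself.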
Figure 4 shows the GUI of our tool as an example. The user has modeled a Petri
net referring to a hospital process (see Figures 4a and 4b). The generator will
simulate the modeled process ten times within a six-hour time frame. Noise with
an occurrence rate of 50.7% is added to the output. According to the selection,
the following types of noise will be included: dropping events, duplicating events
(i.e., event twice), and assigning a wrong event value (Figure 4c).
[Fig. 4: Screenshots of Web Interface. (a) Petri net Wizard; (b) Settings: Activity; (c) Settings: IoT-Simulation]
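The noise settings described for Figure 4c (an occurrence rate plus a selection of noise types) could be applied to a clean log along the following lines. This is a sketch, not the tool's implementation: the function name, the tuple layout of an event, and the decision to drop lost events entirely are assumptions, while the noise labels are taken from Table 1. The clean log is left untouched so that it remains available as ground truth.

```python
import random


def add_noise(clean_log, noise_rate, noise_types, rng, activities):
    """Return a noisy copy of `clean_log`; the clean log itself is not modified.

    Each event is a (case_id, timestamp, activity) tuple. Every output event
    carries a fourth field naming the injected noise type, or None if clean.
    """
    noisy = []
    for case_id, timestamp, activity in clean_log:
        if not noise_types or rng.random() >= noise_rate:
            noisy.append((case_id, timestamp, activity, None))
            continue
        kind = rng.choice(noise_types)
        if kind == "Event Lost":
            continue  # drop the event
        if kind == "Event Twice":
            noisy.extend([(case_id, timestamp, activity, kind)] * 2)
        elif kind == "Wrong Event":
            others = [a for a in activities if a != activity]
            wrong = rng.choice(others) if others else activity
            noisy.append((case_id, timestamp, wrong, kind))
        else:
            raise ValueError(f"unknown noise type: {kind}")
    return noisy


clean = [
    (846, "2022-02-24 08:23", "Register"),
    (846, "2022-02-24 09:07", "Hospitalize"),
    (846, "2022-02-24 12:18", "Visit"),
]
names = ["Register", "Hospitalize", "Visit", "Discharge"]
noisy = add_noise(clean, 0.5, ["Event Lost", "Event Twice", "Wrong Event"],
                  random.Random(3), names)
```

Keeping the noise label next to each event makes it straightforward to check afterwards whether a filtering technique removed the right entries.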
5 Exemplary Application of the IoT Process Log
Generator
In this section, we demonstrate the usefulness of our tool using two examples:
hospital processes and smart homes. To generate an event log for a hospital
process, we took the process described in Elkoumy et al. [8]. We translated the
BPMN process to a Petri net, configured the settings of the process (i.e., used
different activity durations, added noise), and simulated the process. Based on
the simulation and the user’s configuration, an event log has been generated
as shown in Table 1. The left-hand side of the table shows the clean event log
(ground truth), while the right-hand side of the table includes noise according
to the user specifications.
(a) Event Log Clean

Case ID  Date        Time   Activity
846      2022-02-24  08:23  Register
846      2022-02-24  09:07  Hospitalize
846      2022-02-24  10:46  Blood Test
846      2022-02-24  11:18  Blood Test
846      2022-02-24  12:18  Visit
846      2022-02-24  13:12  Discharge

(b) Event Log with added Noise

Case ID  Date        Time   Activity     Noise Type
846      2022-02-24  20:23  Register     Wrong Time
846      2022-02-24  09:07  Hospitalize  Event Twice
846      2022-02-24  09:07  Hospitalize  Event Twice
846      2022-24-02  10:46  Blood Test   Wrong Date
-        -           -      -            Event Lost
846      2022-02-24  12:18  Visit        Multi Recordings
846      2022-02-24  12:21  Visit        Multi Recordings
846      2022-02-24  12:22  Visit        Multi Recordings
846      2022-02-24  13:12  Register     Wrong Event

Table 1: Synthetic Hospital Event Logs
The second example is related to smart homes. For this purpose, we took a
sensor event log from the literature [6] observing activities of smart home inhabitants
with various sensors. We designed an IoT environment with the IoT modeler,
added different sensor types, and simulated daily routines like cooking, cleaning
up, or making breakfast. Additionally, we added noise to the configuration.
Tables 2a and 2b show the results.
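How a single sensor event of the kind listed in Table 2 might be produced can be sketched as follows. The sketch assumes two sensor kinds, a discrete one with On/Off states and a continuous one with a numeric value range, plus a failure probability after which no event is emitted. The function name, the parameter names, and the value range are hypothetical and do not describe the tool's actual sensor classes.

```python
import random


def read_sensor(sensor_id, kind, timestamp, rng,
                failure_probability=0.0, value_range=(0.0, 100.0)):
    """Return one (sensor_id, timestamp, value) event, or None on sensor failure."""
    if rng.random() < failure_probability:
        return None  # a failed sensor emits nothing
    if kind == "discrete":
        value = rng.choice(["On", "Off"])
    elif kind == "continuous":
        value = round(rng.uniform(*value_range), 2)
    else:
        raise ValueError(f"unknown sensor kind: {kind}")
    return (sensor_id, timestamp, value)


rng = random.Random(5)
motion = read_sensor("S1", "discrete", "2022-03-04 08:13", rng)
temperature = read_sensor("F26", "continuous", "2022-03-04 09:33", rng)
```

A failure probability of 0 never suppresses an event and a probability of 1 always does, which gives two easy boundary cases for testing a configuration.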
6 Summary and Future Work
This paper presented the IoT process log generator, a tool to generate synthetic
(sensor) event logs. Our generator not only creates event logs usable as ground
truth but also allows an adjustable degree of erroneous entries (noise) to be
added, enabling work on imperfect and, therefore, more realistic data. In
this way, the synthetic event logs might be used to validate process discovery
algorithms, increase the quality of event logs, and also pave the way for novel
use cases based on (IoT) sensor event data.
Currently, our generator is capable of generating event logs for single-user
settings. We plan to extend the IoT modeler with a multi-agent capability and
role-based task simulation. This extension will enable the assignment of resources
and individual availabilities to specific process activities. In future iterations, we
will allow more process modeling notations, increase the number of output for-
mats, and enhance the process visualization. The upcoming versions will include
BPMN 2.0 as an alternative input format. We plan to use random seeds for the
non-deterministic parts of the generator to ensure the reproducibility of experiments.
Besides .CSV and .TXT as output formats, we plan to output the
event logs as .XES files. To enhance the visualization of the process models, we
plan to include a 3D modeler that allows users to visually create a realistic 3D
environment and to augment the process model. A corresponding 3D plug-in has
already been presented [21].

(a) Sensor Event Log Clean

Sensor ID  Date        Time   Value
S1         2022-03-04  08:13  Off
S2         2022-03-04  08:17  On
S3         2022-03-04  08:25  Off
S4         2022-03-24  08:36  On
S5         2022-03-04  08:58  Off
F26        2022-03-04  09:33  96.22
S6         2022-03-04  09:42  On
F27        2022-03-04  09:56  0.493

(b) Sensor Event Log with added Noise

Sensor ID  Date        Time   Value  Noise Type
S1         2022-03-04  20:13  Off    Wrong Time
S2         2022-03-04  08:17  On     Event Twice
S2         2022-03-04  08:17  On     Event Twice
-          -           -      -      Event Lost
S4         2022-03-24  08:36  On     Multi Recordings
S4         2022-03-24  08:36  On     Multi Recordings
S4         2022-03-24  08:37  On     Multi Recordings
S5         2022-04-03  08:58  Off    Wrong Date
S2         2022-03-04  09:33  On     Wrong Sensor
S6         2022-03-04  09:42  Off    Wrong Status
F27        2022-03-04  09:56  0.557  Wrong Value

Table 2: Synthetic Sensor Event Logs
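The planned use of random seeds for reproducibility, mentioned in the future work above, could take roughly the following shape. This sketch is speculative and only illustrates the principle: the function name and parameters are hypothetical, and a dedicated standard-library generator is seeded per run so that the same seed yields the same draws.

```python
import random


def simulate_durations(seed, n=5, low=10.0, high=60.0):
    """Draw `n` activity durations from a generator seeded with `seed`."""
    rng = random.Random(seed)  # dedicated generator; global state stays untouched
    return [rng.uniform(low, high) for _ in range(n)]


first = simulate_durations(42)
second = simulate_durations(42)
other = simulate_durations(43)
```

Storing the seed together with the exported log would let another researcher regenerate exactly the same synthetic data.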
References
1. Ahmad, S., Malik, S., Kim, D.H.: Comparative Analysis of Simulation Tools with
Visualization based on Realtime Task Scheduling Algorithms for IoT Embedded
Applications. International Journal of Grid and Distributed Computing 11, 1–10
(Feb 2018). https://doi.org/10.14257/ijgdc.2018.11.2.01
2. Ahmad, S., Malik, S., Ullah, I., Park, D.H., Kim, K., Kim, D.: Towards
the Design of a Formal Verification and Evaluation Tool of Real-Time
Tasks Scheduling of IoT Applications. Sustainability 11(1), 204 (Jan 2019).
https://doi.org/10.3390/su11010204
3. Chen, J., Chun, D., Patel, M., Chiang, E., James, J.: The validity of synthetic
clinical data: a validation study of a leading synthetic data generator (Synthea)
using clinical quality measures. BMC Medical Informatics and Decision Making
19(1), 44 (Mar 2019). https://doi.org/10.1186/s12911-019-0793-0
4. Chernyshev, M., Baig, Z., Bello, O., Zeadally, S.: Internet of Things (IoT): Re-
search, Simulators, and Testbeds. IEEE Internet of Things Journal 5(3), 1637–1647
(Jun 2018). https://doi.org/10.1109/JIOT.2017.2786639
5. Ciccio, C.D., Bernardi, M.L., Cimitile, M., Maggi, F.M.: Generating event logs
through the simulation of declare models. In: Workshop on Enterprise and Orga-
nizational Modeling and Simulation. pp. 20–36. Springer (2015)
6. Cook, D., Schmitter-Edgecombe, M.: Assessing the Quality of Activities in a
Smart Environment. Methods of information in medicine 48, 480–5 (Jun 2009).
https://doi.org/10.3414/ME0592
7. Eckleder, A., Freytag, T.: WoPeD - a tool for teaching, analyzing and visualizing
workflow nets. Petri Net Newsletter 75, 3–8 (2008)
8. Elkoumy, G., Fahrenkrog-Petersen, S.A., Sani, M.F., Koschmider, A., Mannhardt,
F., von Voigt, S.N., Rafiei, M., von Waldthausen, L.: Privacy and Confidentiality
in Process Mining: Threats and Research Challenges. ACM 13(1), 1–17 (Mar
2022). https://doi.org/10.1145/3468877, arXiv: 2106.00388
9. Gimenez, P., Molina, B., Palau, C.E., Esteve, M.: SWE Simulation and
Testing for the IoT. In: 2013 IEEE International Conference on Sys-
tems, Man, and Cybernetics. pp. 356–361. IEEE, Manchester (Oct 2013).
https://doi.org/10.1109/SMC.2013.67
10. Harris, C.R., Millman, K.J., Oliphant, T.E.: Array programming with NumPy.
Nature 585(7825), 357–362 (Sep 2020). https://doi.org/10.1038/s41586-020-2649-2
11. Janiesch, C., Koschmider, A., Mecella, M., Weber, B., Burattin, A., Di Ciccio, C.,
Fortino, G., Gal, A., Kannengiesser, U., Leotta, F., Mannhardt, F., Marrella, A.,
Mendling, J., Oberweis, A., Reichert, M., Rinderle-Ma, S., Serral, E., Song, W.,
Su, J., Zhang, L.: The internet of things meets business process management: A
manifesto. IEEE Systems, Man, and Cybernetics Magazine 6, 34–44 (Oct 2020).
https://doi.org/10.1109/MSMC.2020.3003135
12. Jensen, K., Kristensen, L.M., Wells, L.: Coloured petri nets and cpn tools for
modelling and validation of concurrent systems. International Journal on Software
Tools for Technology Transfer 9(3), 213–254 (2007)
13. Kertesz, A., Pflanzner, T., Gyimothy, T.: A Mobile IoT Device Simulator for
IoT-Fog-Cloud Systems. Journal of Grid Computing 17, 529–551 (Sep 2019).
https://doi.org/10.1007/s10723-018-9468-9
14. Koschmider, A., Kaczmarek, K., Krause, M., van Zelst, S.J.: Demystifying noise
and outliers in event logs: Review and future directions. In: Business Process Man-
agement Workshops. Lecture Notes in Business Information Processing, vol. 436,
pp. 123–135. Springer (2021)
15. Papadopoulos, G.Z., Beaudaux, J., Gallais, A., Noël, T., Schreiner,
G.: Adding value to WSN simulation using the IoT-LAB experimen-
tal platform. In: 2013 IEEE 9th WiMob. pp. 485–490 (Oct 2013).
https://doi.org/10.1109/WiMOB.2013.6673403, ISSN: 2160-4894
16. Patki, N., Wedge, R., Veeramachaneni, K.: The Synthetic Data Vault. In: 2016
IEEE International Conference on Data Science and Advanced Analytics (DSAA).
pp. 399–410 (Oct 2016). https://doi.org/10.1109/DSAA.2016.49
17. Ramprasad, B., Fokaefs, M., Mukherjee, J., Litoiu, M.: EMU-IoT - A Virtual In-
ternet of Things Lab. In: 2019 IEEE International Conference on Autonomic Com-
puting (ICAC). pp. 73–83 (Jun 2019). https://doi.org/10.1109/ICAC.2019.00019
18. Sharif, M., Sadeghi-Niaraki, A.: Ubiquitous sensor network simulation and emula-
tion environments: A survey. Journal of Network and Computer Applications 93,
150–181 (Sep 2017). https://doi.org/10.1016/j.jnca.2017.05.009
19. Tremblay, J., Prakash, A., Acuna, D., Brophy, M., Jampani, V., Anil, C., To,
T., Cameracci, E., Boochoon, S., Birchfield, S.: Training Deep Networks With
Synthetic Data: Bridging the Reality Gap by Domain Randomization. In: 2018
IEEE/CVF CVPRW. pp. 969–977 (2018)
20. Van Dongen, B.F., de Medeiros, A.K.A., Verbeek, H., Weijters, A., van Der Aalst,
W.M.: The prom framework: A new era in process mining tool support. In: Inter-
national conference on application and theory of petri nets. pp. 444–454. Springer
(2005)
21. Wetzel, M., Koschmider, A.: Entwicklung einer VR-Umgebung zur Ex-
ploration von Process-Mining. HMD Prax. Wirtsch. 59(1), 37–53 (2022).
https://doi.org/10.1365/s40702-021-00827-8
... Furthermore, detailed methodological guidance and tool support are necessary to facilitate the operationalisation of process-data governance. Eventually, synthetic data might also be beneficial for this purpose [26,66]. Generally, it has been shown that synthetic data not only provides a substitution for real data, but can even enhance insight into domain-specific research. ...
Article
Full-text available
Since its emergence over two decades ago, process mining has flourished as a discipline, with numerous contributions to its theory, widespread practical applications, and mature support by commercial tooling environments. However, its potential for significant organisational impact is hampered by poor quality event data. Process mining starts with the acquisition and preparation of event data coming from different data sources. These are then transformed into event logs, consisting of process execution traces including multiple events. In real-life scenarios, event logs suffer from significant data quality problems, which must be recognised and effectively resolved for obtaining meaningful insights from process mining analysis. Despite its importance, the topic of data quality in process mining has received limited attention. In this paper, we discuss the emerging challenges related to process-data quality from both a research and practical point of view. Additionally, we present a corresponding research agenda with key research directions.
... BPM research artifacts can then be modified to achieve appropriate analysis results with this more complex IoT-data. In current research, some event logs for PM in smart environments have been proposed (e. g., [7,8,9]). Even though these provide a good basis for research, they are mostly synthetically generated, as acquiring real-world data for research purposes can be very difficult [3,10]. ...
Preprint
Full-text available
Modern technologies such as the Internet of Things (IoT) are becoming increasingly important in various domains, including Business Process Management (BPM) research. One main research area in BPM is process mining, which can be used to analyze event logs, e.g., for checking the conformance of running processes. However, there are only a few IoT-based event logs available for research purposes. Some of them are artificially generated and the problem occurs that they do not always completely reflect the actual physical properties of smart environments. In this paper, we present an IoT-enriched XES event log that is generated by a physical smart factory. For this purpose, we create the SensorStream XES extension for representing IoT-data in event logs. Finally, we present some preliminary analysis and properties of the log.
Conference Paper
This work addresses the challenge of data scarcity in process mining by proposing the creation of synthetic training data using generative models. A comparative analysis is conducted between a Long Short-Term Memory (LSTM) model and the Generative Adversarial Network (GAN) model, using two distinct datasets. Multiple evaluation methods are employed to compare the results from the two models based on: precision, fidelity, diversity, and novelty. Results indicate that while LSTM accurately reproduces the initial data structure, GAN introduces more variability, offering a wider range of training scenarios. This highlights the potential of GAN-generated data to enhance the effectiveness and reliability of machine learning-based process mining tools.
Article
Full-text available
The continuous evolution of digital technologies applied to the more traditional world of industrial automation led to Industry 4.0, which envisions production processes subject to continuous monitoring and able to dynamically respond to changes that can affect the production at any stage (resilient factory). The concept of agility, which is a core element of Industry 4.0, is defined as the ability to quickly react to breaks and quickly adapt to changes. Accurate approaches should be implemented aiming at managing, optimizing and improving production processes. In this vision paper, we show how process management (BPM) can benefit from the availability of raw data from the industrial internet of things to obtain agile processes by using a top-down approach based on automated synthesis and a bottom-up approach based on mining.
Chapter
Process mining has shown that it provides valuable insights in terms of uncovering bottlenecks and inefficiencies in processes or identifying tasks for automation. However, process mining techniques expect structured input data that is at a high (business) level of abstraction. Recently, the benefits of process mining for unstructured data which is at a much lower level of abstraction have been demonstrated, e.g., for IoT data or time series data. It can be expected that the demand for methods efficiently processing these kinds of data for process mining will continuously increase. Hence, in this paper, we present an approach that allows the translation of video data into higher-level, discrete event data, thus enabling existing process mining techniques to work on data tracked in videos. Particularly, we used a combination of object tracking, spatio-temporal action detection, and techniques for raising the abstraction level of events. The evaluation results show that meaningful event logs can be extracted from an unlabeled video dataset, validating both the implementation and the feasibility of our approach.KeywordsProcess miningEvent log extractionUnstructured dataActivity recognition
Chapter
The combination of machine learning techniques with process analytics like process mining might significantly elevate novel insights into time-series data collections that are predominantly used in disciplines like life and natural science. To efficiently analyse time-series data by process mining requires bridging several challenges. For instance, time-series data need to be processed and represented in a useable form to turn into information. This paper provides: (1) A structured approach to map time-series data on control-flow patterns that we annotated for our purpose. (2) Based on the simulation of the patterns it is possible to generate synthetic data in varying quality, which is again a crucial step for accurate results from machine learning techniques. In this way, our approach contributes understanding novel insights in terms of causal-effects in time-series data, which could not be answered by traditional approaches used in the disciplines.Keywordstime-series datacontrol-flow patternsprocess mining
Article
Full-text available
Zusammenfassung VR-Umgebungen werden bereits in zahlreichen Anwendungsszenarien erfolgreich zur Visualisierung von Daten mit dem Ziel beispielsweise der Lernprozessunterstützung eingesetzt. Dieser Beitrag stellt eine VR-basierte Umgebung für Process-Mining mit dem Ziel der Prozessanalyseunterstützung vor. Die VR-basierte Umgebung ermöglicht, Prozessdaten aus Quellsystemen dynamisch anzubinden und zu laden und diese als ein dreidimensionales Prozessmodell zu visualisieren und zu analysieren. Die VR-Umgebung wurde systematisch basierend auf einer Anforderungsanalyse konzipiert, die aus kommerziellen (2D) Process-Mining Werkzeugen und verwandten Arbeiten aus der Literatur abgeleitet wurde. Bei der Implementierung der Umgebung wurde Wert auf Erweiterbarkeit und Offenheit der Umgebung gelegt. Im Gegensatz zu einer zweidimensionalen Visualisierung ermöglicht die VR-Umgebung für Process-Mining eine verbesserte Exploration der Zusammenhänge zwischen Prozessen und Daten (z. B. Prozesskennzahlen wie Ressourcenauslastung, Durchlaufzeit, Prozessabweichungen).
Article
Full-text available
The Internet of Things (IoT) refers to a network of connected devices that collects and exchanges data through the Internet. These things can be artificial or natural and interact as autonomous agents that form a complex system. In turn, business process management (BPM) was established to analyze, discover, design, implement, execute, monitor, and evolve collaborative business processes within and across organizations. While the IoT and BPM have been regarded as separate topics in research and in practice, we strongly believe that, on the one hand, the management of IoT applications will greatly benefit from BPM concepts, methods, and technologies. On the other hand, the IoT poses challenges that will require enhancements and extensions of the current state of the art in the BPM field. In this article, we question the extent to which these two paradigms can be combined, and we discuss emerging challenges and intersections from a research and practitioner's point of view in terms of complex software systems development.
Article
Full-text available
Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It has an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves1 and in the first imaging of a black hole2. Here we review how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data. NumPy is the foundation upon which the scientific Python ecosystem is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Owing to its central position in the ecosystem, NumPy increasingly acts as an interoperability layer between such array computation libraries and, together with its application programming interface (API), provides a flexible framework to support the next decade of scientific and industrial analysis.
Article
Full-text available
Background: Clinical data synthesis aims at generating realistic data for healthcare research, system implementation and training. It protects patient confidentiality, deepens our understanding of the complexity in healthcare, and is a promising tool for situations where real-world data is difficult to obtain or unnecessary. However, its validity has not been fully examined, and no previous study has validated it from the perspective of healthcare quality, a critical aspect of a healthcare system. This study fills this gap by calculating clinical quality measures using synthetic data. Methods: We examined Synthea, an open-source, well-documented synthetic data generator that incorporates the key advancements in this emerging technique. We selected a representative 1.2-million Massachusetts patient cohort generated by Synthea. Four quality measures, Colorectal Cancer Screening, Chronic Obstructive Pulmonary Disease (COPD) 30-Day Mortality, Rate of Complications after Hip/Knee Replacement, and Controlling High Blood Pressure, were selected based on clinical significance. Calculated rates were then compared with publicly reported rates based on real-world data from Massachusetts and the United States. Results: Of the total Synthea Massachusetts population (n = 1,193,439), 394,476 were eligible for the “colorectal cancer screening” quality measure, and 248,433 (63%) were considered compliant, compared with publicly reported Massachusetts and national rates of 77.3% and 69.8%, respectively. Of the 409 eligible patients, 0.7% died within 30 days after COPD exacerbation, versus 7% reported in Massachusetts and 8% nationally. Using an expanded logic, this rate increased to 5.7%. No Synthea residents had complications after Hip/Knee Replacement (Massachusetts: 2.9%, national: 2.8%) or had their blood pressure controlled after being diagnosed with hypertension (Massachusetts: 74.52%, national: 69.7%). Results show that Synthea is quite reliable in modeling demographics and probabilities of services being offered in an average healthcare setting. However, its capabilities to model heterogeneous health outcomes post services are limited. Conclusions: Synthea and other synthetic patient generators do not currently model deviations in care or the outcomes that may result from them. To output a more realistic data set, we propose that synthetic data generators should consider important quality measures in their logic and model when clinicians may deviate from standard practice.
Article
Real-Time Internet of Things (RT-IoT) is a newer technology paradigm envisioned as a global inter-networking of devices and physical things enabling real-time communication over the Internet. Research in Edge Computing and 5G technology is making way for the realisation of future IoT applications. In RT-IoT, tasks are performed in real time to remotely control and automate various jobs; missing a deadline may therefore lead to hazardous situations in many cases. For instance, in safety-critical and mission-critical IoT systems, a missed task could lead to loss of human life. Consequently, these systems must be simulated, and tasks should only be deployed in a real scenario if the deadline is guaranteed to be met. Numerous simulation tools have been proposed for traditional real-time systems using desktop technologies, but these relatively older tools do not adapt to the new constraints imposed by the IoT paradigm. In this paper, we design and implement a novel cloud-based architecture for the formal verification of IoT jobs and provide a simulation environment for a typical RT-IoT application where the feasibility of real-time remote tasks is perceived. The proposed tool is, to the best of our knowledge, the first of its kind to support not only the feasibility analysis of real-time tasks but also to provide a real environment in which it formally monitors and evaluates different IoT tasks from anywhere. Furthermore, it will also act as a centralised server for evaluating and tracking the real-time scheduled jobs in a smart space. The novelty of the platform is supported by a comparative analysis with state-of-the-art solutions against attributes that are vital for open-source tools in general and for IoT in particular.
Article
Privacy and confidentiality are very important prerequisites for applying process mining to comply with regulations and keep company secrets. This article provides a foundation for future research on privacy-preserving and confidential process mining techniques. Main threats are identified and related to a motivating application scenario in a hospital context as well as to the current body of work on privacy and confidentiality in process mining. A newly developed conceptual model structures the discussion that existing techniques leave room for improvement. This results in a number of important research challenges that should be addressed by future process mining research.
Chapter
Various process mining techniques exist, e.g., techniques that automatically discover a descriptive model of the execution of a process, based on event data. Whereas the premise of process mining is clear, as witnessed by the tremendous growth of the field, data quality issues often hamper the direct applicability of process mining techniques. Several authors have studied data quality issues in process mining, yet these works primarily propose data pre-processing techniques. An overarching study of the nature of data quality issues, the types of available techniques, and the general possibilities of (semi-)automated outlier/noise detection methods is missing. Therefore, in this paper, we propose a first attempt to structure and study the field of outlier/noise detection in process mining and understand to what degree knowledge on noise and outliers from other domains could advance the process mining field. We do so by answering three central research questions, covering various aspects related to (semi-)automated outlier/noise detection.