Generating Synthetic Sensor Event Logs
for Process Mining
Yorck Zisgen [0000-0002-9646-2829], Dominik Janssen [0000-0003-0218-8628], and
Agnes Koschmider [0000-0001-8206-7636]
Group Process Analytics
Kiel University, Germany
yzi|dominik.janssen|ak@informatik.uni-kiel.de
Abstract. Process mining has gained significant practical relevance in diverse domains. The input of process mining is an event log that tracks the execution of activities, which can be mapped onto a business process. Thus, the availability and quality of event logs significantly impact the process mining result. The use of process mining in novel use cases or experimental settings is often hampered because no appropriate event logs are available. This paper presents a tool to generate synthetic (sensor) event logs. Compared to existing synthetic log generators, the IoT process log generator produces data in a non-deterministic way. Users can add noise in a controlled manner and can enrich the processes with IoT data. In this way, the tool allows generating synthetic data for individually configurable IoT environments. Our tool thus contributes towards an increased use of process mining in settings relying on (IoT) sensor event data.
Keywords: Internet of Things · Event Log Simulation · Synthetic Data · Business Process Simulation · Process Mining.
1 Introduction
Process mining and the Internet of Things (IoT) can significantly benefit from each other because IoT environments produce the large quantities of data that process mining methods require for accurate process analysis [11]. In turn, process mining provides the insights needed to understand IoT-enhanced processes in a controlled way. However, quality issues in high-volume IoT data (like missing or incomplete data entries) hamper the direct applicability of process mining to IoT data. Additionally, sensor event data is at a much lower semantic level and does not directly relate to the high-level business process concepts required for process mining.
To enable process mining on IoT data of varying quality, this paper presents the IoT process log generator. The tool generates both synthetic event logs and synthetic IoT sensor event logs. For this purpose, a simulation engine with an end-user front end has been implemented. Users model processes and can configure the process models in terms of duration, frequency of process activities, or noise. Optionally, they can specify an IoT environment that is mapped to the process model. In this way, event logs for varying IoT environments (with different sensor or failure types) can be produced and used for experimental purposes. For instance, motion sensors with discrete ON and OFF states or temperature sensors with continuous values can be configured. A simulation can also help answer whether upgrading a facility with IoT sensors would justify the incurred cost, or reveal bottlenecks in production capacity by showing potential congestion. More generally, it has been shown that synthetic data can not only substitute for real data [3, 16], but can even enhance insight into domain-specific research [19]. Thus, we expect our IoT process log generator to fuel the application of process mining in use cases where data accessibility is challenging and data quality hampers data analysis.
This paper is structured as follows. Section 2 summarizes related work. Section 3 presents the general architecture of our tool, while Section 4 provides an in-depth look into the implementation. Section 5 demonstrates the applicability of our tool on two use cases. Finally, Section 6 concludes the paper with a summary and a discussion of future work.
2 Related Work
The following streams of research are related to the IoT process log generator:
(1) IoT log generators and (2) event log generators.
Generally, the available works are either limited to a certain sensor type [9, 15], restricted to a particular application domain [2, 15], or only provide a collection of IoT simulation approaches [1, 4, 18]. For instance, the approaches presented in [9, 15] are limited to GPS or signal strength sensor types and do not allow adding further sensor types. In contrast, our IoT process log generator supports additional sensor types such as motion, light, temperature, counter, or on/off sensors.
IoT log generators have tackled diverse application domains such as mobile devices, wireless sensor networks, or cyber-physical systems. For instance, Kertesz et al. [13] propose a simulator for the cloud communication of mobile IoT devices' sensor data. Papadopoulos et al. [15] addressed the signal strength of wireless sensor networks. Ramprasad et al. [17] propose a simulator for virtual IoT architectures called EMU-IoT, in which an IoT network can be evaluated end to end. Giménez et al. [9] used changing positional data to test collision anticipation of vehicles in a warehouse. Ahmad et al. [2] proposed a simulation architecture for research on communication in real-time IoT environments. Thus, available IoT simulators are commonly designed around a narrow application field. Unlike the IoT process log generator, they do not support more diverse settings; our tool can run simulations such as monitoring the number of visitors, smart home activities, procedures in a smart factory, or hospital processes.
Synthetic event logs can also be generated with CPN Tools [12], ProM [20], or WoPeD [7], which are common tools for process modeling and process mining. Also, a log generator for declarative process models has been suggested [5]. However, these tools only create deterministic event logs that directly result from the behavior of the modeled process (i.e., no frequency of trace occurrence can be specified). Furthermore, ProM allows adding noise and outliers to the event log. However, a recent analysis of these noise-adding plug-ins showed that the available tools neither appropriately filter nor add noise [14]. In contrast, our IoT process log generator makes it possible to introduce different noise types while still providing a corresponding noise-free log as ground truth for comparison.
To sum up, the available IoT log generators are restricted to specific sensor types or application domains, while our tool is not restricted to any specific sensor type or domain. It also generates data in a non-deterministic way, can add noise in a controlled way, and can be extended to include other sensor or noise types.
3 Architecture
This section presents the architecture of the IoT process log generator, whose design is shown in Fig. 1. The tool is platform-independent, is designed for single-user settings, and was developed in Python 3.9, using NumPy [10] as an external library. The tool can be accessed publicly via a browser. Users can model processes as Petri nets in the online modeler or import .PNML files. Additionally, they can manage simulation settings, such as the duration of activities and the simulated time range, and add different types and quantities of noise. They can use the IoT modeler to configure an IoT environment (i.e., specify sensor types and failure probabilities). Subsequently, two different types of event logs can be generated at the application layer: either a conventional event log with timestamps and activities or an IoT sensor event log with additional sensor information. The settings configuration and the designed IoT environment are used as input such that each process activity is mapped onto locations in the IoT environment. Finally, the information generated by the application layer is processed into an event log that can be read on-screen or downloaded as a .TXT or .CSV file. The next section explains the implementation of the IoT process log generator in detail.

Fig. 1: Architecture of the IoT Process Log Generator (Frontend: Process Modeler, IoT Modeler, Settings Configuration; Application: Event Log Generator, Sensor Event Log Generator; Output: CSV)
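The export step can be illustrated with Python's standard `csv` module. The following minimal sketch writes a conventional event log with the columns used in Table 1; the `Event` record type and the `write_event_log_csv` helper are hypothetical names, and the tool's actual CSV layout may differ.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One row of a conventional event log (hypothetical record type)."""
    case_id: int
    timestamp: datetime
    activity: str


def write_event_log_csv(events, stream):
    """Write events to a CSV stream, ordered by case and then by timestamp."""
    writer = csv.writer(stream)
    writer.writerow(["Case ID", "Date", "Time", "Activity"])
    for e in sorted(events, key=lambda e: (e.case_id, e.timestamp)):
        writer.writerow([e.case_id,
                         e.timestamp.strftime("%Y-%m-%d"),
                         e.timestamp.strftime("%H:%M"),
                         e.activity])


log = [
    Event(846, datetime(2022, 2, 24, 9, 7), "Hospitalize"),
    Event(846, datetime(2022, 2, 24, 8, 23), "Register"),
]
buffer = io.StringIO()
write_event_log_csv(log, buffer)
print(buffer.getvalue())
```

When writing to a file instead of a `StringIO` buffer, the file should be opened with `newline=""`, as the `csv` module documentation recommends.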
4 Implementation
Figure 2 extends Figure 1 and shows the information flow within our IoT process log generator. First, the modeled processes, the configuration settings, and the desired output format are forwarded to the simulation engine (see 1, 2, 3). The simulation engine creates process instances from the business process(es) (4). These process instances (consisting of places and transitions) are processed independently of one another. Internally, they follow the execution rules of Petri nets (5, 6). Transitions represent activities, which are mapped to locations in an IoT environment (8) and activate sensors (7). Movements within the environment (9) can also activate sensors (10). Sensor readings (continuous or discrete) are then generated according to the IoT settings, and their values and timestamps, optionally with noise, are written to the sensor event log (11). If a user only needs an event log, the IoT modeler is skipped and the event log is generated directly (12). Figure 3 shows the technical design of the IoT process log generator as a UML class diagram.

Fig. 2: Information Flow (Front End; Simulation Engine with a workflow net of places and transitions; sensors, locations, and paths; Event Log; numbered steps 1 to 12)

Fig. 3: UML Class Diagram Extract (components include Frontend, Application, Processing Engine, IoT Engine, and Output; classes include Configuration Manager, Simulation Engine, Output Manager, Message Object, WorkflowNet, Node {abstract}, Place, Transition, Arc, Sensor {abstract}, Location, and Edge)
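The Petri net execution rules of steps (5) and (6) can be sketched as a token game. A transition is enabled when each of its input places holds a token; firing it consumes one token per input place and produces one per output place; when several transitions are enabled, one is chosen at random, which makes the generated traces non-deterministic. The small hospital-like net (including the optional repetition of "Blood Test") and the function names are hypothetical, and the sketch ignores arc weights and timing.

```python
import random
from collections import Counter

# Hypothetical workflow net: transition -> (input places, output places).
# In place p2, "Blood Test" (loops back to p2) and "Visit" compete: an exclusive choice.
NET = {
    "Register": (["start"], ["p1"]),
    "Hospitalize": (["p1"], ["p2"]),
    "Blood Test": (["p2"], ["p2"]),
    "Visit": (["p2"], ["p3"]),
    "Discharge": (["p3"], ["end"]),
}


def enabled(marking, net):
    """Return the transitions whose input places all hold at least one token."""
    return [t for t, (inputs, _) in net.items()
            if all(marking[p] >= 1 for p in inputs)]


def fire(marking, net, transition):
    """Return a new marking: one token consumed per input, one produced per output."""
    inputs, outputs = net[transition]
    new_marking = marking.copy()
    new_marking.subtract(inputs)
    new_marking.update(outputs)
    return new_marking


def simulate_trace(net, rng, max_steps=50):
    """Play the token game from the initial marking until nothing is enabled."""
    marking = Counter({"start": 1})
    trace = []
    for _ in range(max_steps):               # bound guards against endless loops
        candidates = enabled(marking, net)
        if not candidates:
            break
        transition = rng.choice(candidates)  # non-deterministic choice
        marking = fire(marking, net, transition)
        trace.append(transition)
    return trace, marking


rng = random.Random(42)  # seeded only to make the example repeatable
trace, final_marking = simulate_trace(NET, rng)
print(trace)
```

Here the choice between "Blood Test" and "Visit" is uniform, so the number of repeated blood tests varies from run to run; weighting the candidates would be a straightforward extension.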
Unlike a deterministic processing script, our generator resolves exclusive choices and parallel activities non-deterministically. Furthermore, activity durations and sensor values are determined by drawing a random value within a given interval according to a probability distribution. For instance, an activity's duration is set to a random value within the range specified for this activity. As a consequence, all possible execution sequences of a business process can appear in the event log. Sensor activation can be set to happen at any time during the execution of an activity, in either a given or a random order. Furthermore, users can specify a probability for each choice to allow for a 'standard case execution' and an 'outlier case execution'.
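A minimal sketch of this randomization, assuming uniformly distributed activity durations in minutes and a single weighted exclusive choice. It uses NumPy's `Generator` API (`numpy.random.default_rng`, `Generator.choice`, `Generator.uniform`), in line with the tool's NumPy dependency; the activities, duration ranges, and the 90/10 split between standard and outlier case execution are hypothetical values, not the tool's configuration.

```python
from datetime import datetime, timedelta

import numpy as np

# Hypothetical duration ranges (min, max) in minutes per activity.
DURATIONS = {
    "Register": (5, 20),
    "Triage": (10, 30),
    "Treat": (30, 90),       # standard case
    "Escalate": (60, 180),   # outlier case
    "Discharge": (5, 15),
}

# Hypothetical exclusive choice after "Triage" with branch probabilities.
BRANCHES = {"Treat": 0.9, "Escalate": 0.1}


def sample_case(case_id, start, rng):
    """Return (case_id, timestamp, activity) rows for one process instance."""
    branch = str(rng.choice(list(BRANCHES), p=list(BRANCHES.values())))
    rows, clock = [], start
    for activity in ("Register", "Triage", branch, "Discharge"):
        rows.append((case_id, clock, activity))  # timestamp = activity start
        low, high = DURATIONS[activity]
        clock += timedelta(minutes=float(rng.uniform(low, high)))
    return rows


rng = np.random.default_rng(seed=7)  # seeded only to make the example repeatable
cases = [sample_case(i, datetime(2022, 2, 24, 8, 0), rng) for i in range(1000)]
outlier_share = sum(rows[2][2] == "Escalate" for rows in cases) / len(cases)
print(f"share of outlier cases: {outlier_share:.3f}")  # expected to be near 0.1
```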
Figure 4 shows exemplary screenshots of the GUI of our tool. The user has modeled a Petri net referring to a hospital process (see Figures 4a and 4b). The generator will simulate the modeled process ten times within a six-hour time frame. Noise with an occurrence rate of 50.7% is added to the output. According to the selection, the following types of noise will be included: dropping events, duplicating events (i.e., recording an event twice), and assigning a wrong event value (Figure 4c).
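The three noise types selected above can be sketched as a post-processing step over the clean log, which keeps the noise-free log available as ground truth. In this hypothetical `add_noise` function, each event is affected with probability `rate` (0.507 for the 50.7% setting) and one of the selected noise types is then drawn uniformly; the uniform draw and the choice of a different activity for a wrong event value are assumptions, since the paper does not specify them. Other noise types shown in Table 1 (e.g., wrong time or multiple recordings) are not covered.

```python
import random

NOISE_TYPES = ("Event Lost", "Event Twice", "Wrong Event")


def add_noise(clean_log, rate, activities, rng, noise_types=NOISE_TYPES):
    """Return a noisy copy of clean_log; clean_log itself is not modified.

    clean_log holds (case_id, timestamp, activity) tuples. The result holds
    (case_id, timestamp, activity, noise_type) tuples, where noise_type is ""
    for untouched events. Lost events are omitted from the result.
    """
    noisy = []
    for case_id, ts, activity in clean_log:
        if rng.random() >= rate:             # event stays clean
            noisy.append((case_id, ts, activity, ""))
            continue
        kind = rng.choice(noise_types)       # noise type drawn uniformly
        if kind == "Event Twice":
            noisy.extend([(case_id, ts, activity, kind)] * 2)
        elif kind == "Wrong Event":
            # Assumes at least two distinct activities are available.
            others = [a for a in activities if a != activity]
            noisy.append((case_id, ts, rng.choice(others), kind))
        elif kind != "Event Lost":
            raise ValueError(f"unknown noise type: {kind}")
        # "Event Lost": nothing is appended, so the event is dropped.
    return noisy


clean = [(846, "08:23", "Register"), (846, "09:07", "Hospitalize"),
         (846, "10:46", "Blood Test"), (846, "12:18", "Visit"),
         (846, "13:12", "Discharge")]
activities = sorted({a for _, _, a in clean})
noisy = add_noise(clean, rate=0.507, activities=activities, rng=random.Random(1))
for row in noisy:
    print(row)
```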
Fig. 4: Screenshots of the Web Interface: (a) Petri net Wizard, (b) Settings: Activity, (c) Settings: IoT-Simulation
5 Exemplary Application of the IoT Process Log Generator
In this section, we demonstrate the usefulness of our tool using two examples: hospital processes and smart homes. To generate an event log for a hospital process, we took the process described by Elkoumy et al. [8]. We translated the BPMN process into a Petri net, configured the settings of the process (i.e., set different activity durations and added noise), and simulated the process. The event log generated from the simulation and the user's configuration is shown in Table 1. Table 1a shows the clean event log (ground truth), while Table 1b includes noise according to the user specifications.
(a) Event Log Clean
Case ID  Date        Time   Activity
846      2022-02-24  08:23  Register
846      2022-02-24  09:07  Hospitalize
846      2022-02-24  10:46  Blood Test
846      2022-02-24  11:18  Blood Test
846      2022-02-24  12:18  Visit
846      2022-02-24  13:12  Discharge

(b) Event Log with added Noise
Case ID  Date        Time   Activity     Noise Type
846      2022-02-24  20:23  Register     Wrong Time
846      2022-02-24  09:07  Hospitalize  Event Twice
846      2022-02-24  09:07  Hospitalize  Event Twice
846      2022-24-02  10:46  Blood Test   Wrong Date
-        -           -      -            Event Lost
846      2022-02-24  12:18  Visit        Multi Recordings
846      2022-02-24  12:21  Visit        Multi Recordings
846      2022-02-24  12:22  Visit        Multi Recordings
846      2022-02-24  13:12  Register     Wrong Event

Table 1: Synthetic Hospital Event Logs
The second example relates to smart homes. For this purpose, we took a sensor event log from the literature [6] that observes activities of smart home inhabitants with various sensors. We designed an IoT environment with the IoT modeler, added different sensor types, and simulated daily routines like cooking, cleaning up, or making breakfast. Additionally, we added noise to the configuration. Tables 2a and 2b show the results.
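To illustrate how activities mapped to locations can trigger sensors with discrete or continuous readings, the following sketch generates sensor events in the style of Table 2. The location map, the activity-to-location mapping, the sensor IDs (S* for ON/OFF motion-like sensors, F* for float-valued sensors such as temperature), and the value range are hypothetical; sensor failures, which the IoT modeler can configure, are not modeled here.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BinarySensor:
    """Motion-like sensor: reports On when an activity starts, Off when it ends."""
    sensor_id: str

    def readings(self, start, end, rng):
        return [(self.sensor_id, start, "On"), (self.sensor_id, end, "Off")]


@dataclass
class ContinuousSensor:
    """Temperature-like sensor: one reading at a random time during the activity."""
    sensor_id: str
    low: float
    high: float

    def readings(self, start, end, rng):
        offset = rng.uniform(0, (end - start).total_seconds())
        value = round(rng.uniform(self.low, self.high), 2)
        return [(self.sensor_id, start + timedelta(seconds=offset), value)]


# Hypothetical IoT environment: location -> sensors installed there.
LOCATIONS = {
    "kitchen": [BinarySensor("S1"), ContinuousSensor("F26", 20.0, 100.0)],
    "bathroom": [BinarySensor("S2")],
}
# Hypothetical mapping of activities (transitions) to locations.
ACTIVITY_LOCATION = {"Cooking": "kitchen", "Cleaning up": "bathroom"}


def sensor_events(activity, start, end, rng):
    """Collect the readings of all sensors at the activity's location, in time order."""
    events = []
    for sensor in LOCATIONS[ACTIVITY_LOCATION[activity]]:
        events.extend(sensor.readings(start, end, rng))
    return sorted(events, key=lambda e: e[1])


rng = random.Random(0)  # seeded only to make the example repeatable
start = datetime(2022, 3, 4, 8, 13)
for row in sensor_events("Cooking", start, start + timedelta(minutes=30), rng):
    print(row)
```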
6 Summary and Future Work
This paper presented the IoT process log generator, a tool to generate synthetic (sensor) event logs. Our generator not only creates event logs usable as ground truth but also allows adding an adjustable degree of erroneous entries (noise) to enable working on imperfect and, therefore, more realistic data. In this way, the synthetic event logs might be used to validate process discovery algorithms, increase the quality of event logs, and pave the way for novel use cases based on (IoT) sensor event data.
(a) Sensor Event Log Clean
Sensor ID  Date        Time   Value
S1         2022-03-04  08:13  Off
S2         2022-03-04  08:17  On
S3         2022-03-04  08:25  Off
S4         2022-03-24  08:36  On
S5         2022-03-04  08:58  Off
F26        2022-03-04  09:33  96.22
S6         2022-03-04  09:42  On
F27        2022-03-04  09:56  0.493

(b) Sensor Event Log with added Noise
Sensor ID  Date        Time   Value  Noise Type
S1         2022-03-04  20:13  Off    Wrong Time
S2         2022-03-04  08:17  On     Event Twice
S2         2022-03-04  08:17  On     Event Twice
-          -           -      -      Event Lost
S4         2022-03-24  08:36  On     Multi Recordings
S4         2022-03-24  08:36  On     Multi Recordings
S4         2022-03-24  08:37  On     Multi Recordings
S5         2022-04-03  08:58  Off    Wrong Date
S2         2022-03-04  09:33  On     Wrong Sensor
S6         2022-03-04  09:42  Off    Wrong Status
F27        2022-03-04  09:56  0.557  Wrong Value

Table 2: Synthetic Sensor Event Logs

Currently, our generator is capable of generating event logs for single-user settings. We plan to extend the IoT modeler with a multi-agent capability and role-based task simulation. This extension will enable the assignment of resources and individual availabilities to specific process activities. In future iterations, we will support more process modeling notations, increase the number of output formats, and enhance the process visualization. Upcoming versions will include BPMN 2.0 as an alternative input format. We plan to use random seeds for the non-deterministic parts of the generator to ensure the reproducibility of experiments. Besides .CSV and .TXT, we plan to offer .XES as an output format. To enhance the visualization of the process models, we plan to include a 3D modeler that allows users to visually create a realistic 3D environment and to augment the process model. A corresponding 3D plug-in has already been presented [21].
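Since XES output is planned, the following sketch shows one way a conventional event log could be serialized to XES with Python's standard `xml.etree.ElementTree`. It uses the `concept:name` and `time:timestamp` attributes from the XES Concept and Time extensions; the `to_xes` function name, the `xes.version` value, and the minimal header (no classifiers or global attributes) are assumptions, and a complete exporter would declare more of the metadata defined in IEEE 1849 (XES).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime, timezone

XES_NS = "http://www.xes-standard.org/"


def to_xes(events):
    """Serialize (case_id, timestamp, activity) rows into an XES string."""
    log = ET.Element("log", {"xes.version": "1.0", "xmlns": XES_NS})
    for name, prefix in (("Concept", "concept"), ("Time", "time")):
        ET.SubElement(log, "extension", {
            "name": name, "prefix": prefix,
            "uri": f"{XES_NS}{prefix}.xesext"})
    traces = defaultdict(list)          # group events by case
    for case_id, ts, activity in events:
        traces[case_id].append((ts, activity))
    for case_id, rows in traces.items():
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": str(case_id)})
        for ts, activity in sorted(rows):   # chronological order within a trace
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
            ET.SubElement(event, "date", {"key": "time:timestamp",
                                          "value": ts.isoformat()})
    return ET.tostring(log, encoding="unicode")


events = [
    (846, datetime(2022, 2, 24, 9, 7, tzinfo=timezone.utc), "Hospitalize"),
    (846, datetime(2022, 2, 24, 8, 23, tzinfo=timezone.utc), "Register"),
]
print(to_xes(events))
```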
References
1. Ahmad, S., Malik, S., Kim, D.H.: Comparative Analysis of Simulation Tools with
Visualization based on Realtime Task Scheduling Algorithms for IoT Embedded
Applications. International Journal of Grid and Distributed Computing 11, 1–10
(Feb 2018). https://doi.org/10.14257/ijgdc.2018.11.2.01
2. Ahmad, S., Malik, S., Ullah, I., Park, D.H., Kim, K., Kim, D.: Towards
the Design of a Formal Verification and Evaluation Tool of Real-Time
Tasks Scheduling of IoT Applications. Sustainability 11(1), 204 (Jan 2019).
https://doi.org/10.3390/su11010204
3. Chen, J., Chun, D., Patel, M., Chiang, E., James, J.: The validity of synthetic
clinical data: a validation study of a leading synthetic data generator (Synthea)
using clinical quality measures. BMC Medical Informatics and Decision Making
19(1), 44 (Mar 2019). https://doi.org/10.1186/s12911-019-0793-0
4. Chernyshev, M., Baig, Z., Bello, O., Zeadally, S.: Internet of Things (IoT): Research, Simulators, and Testbeds. IEEE Internet of Things Journal 5(3), 1637–1647 (Jun 2018). https://doi.org/10.1109/JIOT.2017.2786639
5. Di Ciccio, C., Bernardi, M.L., Cimitile, M., Maggi, F.M.: Generating event logs through the simulation of Declare models. In: Workshop on Enterprise and Organizational Modeling and Simulation. pp. 20–36. Springer (2015)
6. Cook, D., Schmitter-Edgecombe, M.: Assessing the Quality of Activities in a Smart Environment. Methods of Information in Medicine 48, 480–485 (Jun 2009). https://doi.org/10.3414/ME0592
7. Eckleder, A., Freytag, T.: WoPeD: A tool for teaching, analyzing and visualizing workflow nets. Petri Net Newsletter 75, 3–8 (2008)
8. Elkoumy, G., Fahrenkrog-Petersen, S.A., Sani, M.F., Koschmider, A., Mannhardt,
F., von Voigt, S.N., Rafiei, M., von Waldthausen, L.: Privacy and Confidentiality
in Process Mining – Threats and Research Challenges. ACM 13(1), 1–17 (Mar
2022). https://doi.org/10.1145/3468877, arXiv: 2106.00388
9. Giménez, P., Molina, B., Palau, C.E., Esteve, M.: SWE Simulation and Testing for the IoT. In: 2013 IEEE International Conference on Systems, Man, and Cybernetics. pp. 356–361. IEEE, Manchester (Oct 2013). https://doi.org/10.1109/SMC.2013.67
10. Harris, C.R., Millman, K.J., Oliphant, T.E.: Array programming with NumPy.
Nature 585(7825), 357–362 (Sep 2020). https://doi.org/10.1038/s41586-020-2649-2
11. Janiesch, C., Koschmider, A., Mecella, M., Weber, B., Burattin, A., Di Ciccio, C.,
Fortino, G., Gal, A., Kannengiesser, U., Leotta, F., Mannhardt, F., Marrella, A.,
Mendling, J., Oberweis, A., Reichert, M., Rinderle-Ma, S., Serral, E., Song, W.,
Su, J., Zhang, L.: The internet of things meets business process management: A
manifesto. IEEE Systems, Man, and Cybernetics Magazine 6, 34–44 (Oct 2020).
https://doi.org/10.1109/MSMC.2020.3003135
12. Jensen, K., Kristensen, L.M., Wells, L.: Coloured Petri Nets and CPN Tools for
modelling and validation of concurrent systems. International Journal on Software
Tools for Technology Transfer 9(3), 213–254 (2007)
13. Kertesz, A., Pflanzner, T., Gyimothy, T.: A Mobile IoT Device Simulator for
IoT-Fog-Cloud Systems. Journal of Grid Computing 17(3), 529–551 (Sep 2019). https://doi.org/10.1007/s10723-018-9468-9
14. Koschmider, A., Kaczmarek, K., Krause, M., van Zelst, S.J.: Demystifying noise and outliers in event logs: Review and future directions. In: Business Process Management Workshops. Lecture Notes in Business Information Processing, vol. 436, pp. 123–135. Springer (2021)
15. Papadopoulos, G.Z., Beaudaux, J., Gallais, A., Noël, T., Schreiner, G.: Adding value to WSN simulation using the IoT-LAB experimental platform. In: 2013 IEEE 9th WiMob. pp. 485–490 (Oct 2013). https://doi.org/10.1109/WiMOB.2013.6673403
16. Patki, N., Wedge, R., Veeramachaneni, K.: The Synthetic Data Vault. In: 2016
IEEE International Conference on Data Science and Advanced Analytics (DSAA).
pp. 399–410 (Oct 2016). https://doi.org/10.1109/DSAA.2016.49
17. Ramprasad, B., Fokaefs, M., Mukherjee, J., Litoiu, M.: EMU-IoT - A Virtual Internet of Things Lab. In: 2019 IEEE International Conference on Autonomic Computing (ICAC). pp. 73–83 (Jun 2019). https://doi.org/10.1109/ICAC.2019.00019
18. Sharif, M., Sadeghi-Niaraki, A.: Ubiquitous sensor network simulation and emulation environments: A survey. Journal of Network and Computer Applications 93, 150–181 (Sep 2017). https://doi.org/10.1016/j.jnca.2017.05.009
19. Tremblay, J., Prakash, A., Acuna, D., Brophy, M., Jampani, V., Anil, C., To,
T., Cameracci, E., Boochoon, S., Birchfield, S.: Training Deep Networks With
Synthetic Data: Bridging the Reality Gap by Domain Randomization. In: 2018
IEEE/CVF CVPRW. pp. 969–977 (2018)
20. Van Dongen, B.F., de Medeiros, A.K.A., Verbeek, H., Weijters, A., van der Aalst, W.M.: The ProM framework: A new era in process mining tool support. In: International Conference on Application and Theory of Petri Nets. pp. 444–454. Springer (2005)
21. Wetzel, M., Koschmider, A.: Entwicklung einer VR-Umgebung zur Exploration von Process-Mining. HMD Prax. Wirtsch. 59(1), 37–53 (2022). https://doi.org/10.1365/s40702-021-00827-8