Process mining has gained significant practical usefulness in diverse domains. The input of process mining is an event log, tracking the execution of activities that can be mapped onto a business process. Thus, the availability and quality of event logs significantly impact the process mining result. The use of process mining in novel use cases or experimental settings is often hampered because no appropriate event logs are available. This paper presents a tool to generate synthetic (sensor) event logs. Compared to existing synthetic log generator tools, the IoT process log generator produces data in a non-deterministic way. Users can add noise in a controlled manner and can enhance the processes with IoT data. In this way, the tool allows generating synthetic data for IoT environments that can be individually configured. Our tool contributes towards an increased use of process mining in settings relying on (IoT) sensor event data.
Keywords: Internet of things · Event log simulation · Synthetic data · Business process simulation · Process mining
Generating Synthetic Sensor Event Logs
for Process Mining
Yorck Zisgen [0000-0002-9646-2829], Dominik Janssen [0000-0003-0218-8628], and
Agnes Koschmider [0000-0001-8206-7636]
Group Process Analytics
Kiel University, Germany
1 Introduction
Process Mining and Internet-of-Things (IoT) can significantly benefit from each
other because IoT environments produce the large quantity of data that process
mining methods require for accurate process analysis [11]. In turn, process mining
provides the insights needed to understand IoT-enhanced processes in a controlled
way. However, quality issues of the high volume of IoT data (like missing or
incomplete data entries) hamper the direct applicability of process mining on
IoT data. Additionally, sensor event data is at a much lower level of semantics,
and the data does not directly relate to high-level business process concepts as
required for process mining.
To allow working on IoT data of different data quality that can be used by
process mining, this paper presents the IoT process log generator. The tool gen-
erates both synthetic event logs and synthetic IoT sensor event logs. For this
purpose, a simulation engine with an end-user front end has been implemented.
Users model processes and can configure the process models in terms of dura-
tion, frequency of process activities, or noise. Optionally they can specify an
IoT environment that is mapped to the process model. In this way, event logs
for varying IoT environments (with different sensor or failure types) can be pro-
duced and used for experimental purposes. For instance, motion sensors with
discrete ON and OFF states or temperature sensors with continuous values can
be configured. In addition, a simulation can help to answer whether
upgrading a facility with IoT sensors will justify the incurred cost. It can also
help to reveal bottlenecks in production capacity by showing potential
congestion in the simulation. Generally, it has been shown that synthetic data not only
provide a substitute for real data [3, 16], but can even enhance insight into
domain-specific research [19]. Thus, we are convinced that our IoT process log
generator will fuel the application of process mining in use cases where data
accessibility is challenging and data quality also hampers data analysis.
This paper is structured as follows. Section 2 summarizes related works.
Section 3 presents the general architecture of our tool, while Section 4 provides an
in-depth look into the implementation. Section 5 demonstrates the applicability
of our tool on two use cases. Finally, the paper concludes with a summary and
a discussion of future tasks.
2 Related Work
The following streams of research are related to the IoT process log generator:
(1) IoT log generators and (2) event log generators.
Generally, the available works are either limited to a certain sensor type [9,
15], restricted to a particular application domain [2, 15], or only provide a
collection of IoT simulation approaches [1, 4, 18]. For instance, the approaches
presented in [9, 15] are limited to GPS or signal strength sensor types and do not
allow additional sensor types to be added. In contrast, our IoT process log generator
supports additional sensor types such as motion, light, temperature, counter, or
on/off sensors.
IoT log generators have addressed diverse application domains such as
mobile devices, wireless sensor networks, and cyber-physical systems. For instance,
Kertesz et al. [13] propose a simulator for the cloud communication of mobile
IoT devices' sensor data. Papadopoulos et al. [15] address the signal strength of
wireless sensor networks. Ramprasad et al. [17] propose a simulator for virtual
IoT architectures called EMU-IoT, in which an end-to-end evaluation of an IoT
network can be simulated. Giménez et al. [9] vary positional data to
test collision anticipation of vehicles in a warehouse. Ahmad et al. [2] propose a
simulation architecture for research on communication in real-time IoT
environments. Thus, available IoT simulators are commonly designed around a
narrow application field. Unlike the IoT process log generator, they cannot be
used in more diverse settings. Our tool supports simulations such as visitor
count monitoring, smart home activities, procedures in a smart factory, or
hospital processes.
Synthetic event logs can also be generated with CPN Tools [12], ProM [20],
or WoPeD [7], which are common tools for process modeling or process mining,
respectively. A log generator for declarative process models has also been
suggested [5]. However, these tools only create deterministic event logs that directly
result from the behavior of the modeled process (i.e., no frequency of trace
occurrence can be specified). ProM additionally allows adding noise and outliers to
the event log. However, a recent analysis of these noise-adding plug-ins showed
that available noise filtering tools neither filter nor add noise appropriately [14].
In contrast, our IoT process log generator makes it possible to introduce different
noise types, while still providing a corresponding noise-free log as ground truth
for comparison.
To sum up, the available IoT log generators are restricted to specific sensor
types or application domains, while our tool is not restricted to any specific
sensor type or domain. It also generates data in a non-deterministic way, can
add noise in a controlled way and it can be expanded to include other sensor or
noise types.
3 Architecture
This section presents the architecture of the IoT process log generator. Fig. 1
shows the architectural design. The tool is platform-independent, is designed for
single-user settings, and was developed in Python version 3.9, using NumPy [10]
as an external library. The tool can be accessed publicly via a browser. Users
can model processes with Petri nets in the online modeler or import .PNML files.
Additionally, they can manage simulation settings, such as the duration of
activities and the simulated time range, and add different types and quantities
of noise. They can use the IoT modeler to configure an IoT environment (i.e.,
specify sensor types and failure probabilities). Subsequently, two different
types of event logs can be generated at the application layer: either a
conventional event log with timestamps and activities or an IoT sensor event log
with additional sensor information. The settings configuration and the designed
IoT environment are used as input such that each process activity is mapped onto
a location in the IoT environment. Finally, the information generated by the
application layer is processed into an event log that can be read on-screen or
downloaded as a .TXT or .CSV file. The next section explains the conceptual
design of the IoT process log generator in detail.

[Fig. 1: Architecture of the IoT Process Log Generator (labels: Modeler, Settings, map & extract, Log Generator, Sensor Event Log Generator)]
4 Implementation
Figure 2 extends Figure 1 and shows the information flow within our IoT
process log generator. First, the modeled processes, the configuration settings, and
the desired output format are forwarded to the simulation engine (see 1, 2, 3).
The simulation engine creates process instances from the business process(es)
(4). Those process instances (consisting of places and transitions) are processed
independently of one another. Internally, they follow the execution rules of Petri
nets (5, 6). Transitions represent activities, which are mapped to locations in an
IoT environment (8) and activate sensors (7). Movements within the
environment (9) can also activate sensors (10). Sensor readings (continuous or discrete)
then adhere to the IoT settings and finally transmit their information,
timestamps, and optionally noise to the sensor event log (11). If a user only aims for
an event log, the IoT modeler is skipped and the event log is generated directly
(12). Figure 3 shows the technical design of the IoT process log generator in
terms of a UML class diagram.

[Fig. 2: Information Flow (labels: Front End, Simulation Engine, Workflow Net, Event Log; numbered steps 1 to 12)]

[Fig. 3: UML Class Diagram Extract (labels: Location, Edge, Manager, Message)]
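
The execution rules referred to in steps (5, 6) can be illustrated with a minimal sketch of the Petri net token game. It is not the tool's actual code: the function names, the dictionary-based net encoding, and the small hospital-like net are assumptions made purely for illustration, and the uniform random pick among enabled transitions is one possible way to avoid a fixed execution order.

```python
import random

def enabled(marking, transitions):
    """Return the names of transitions whose input places all hold a token."""
    return [name for name, (inputs, _) in transitions.items()
            if all(marking.get(place, 0) >= 1 for place in inputs)]

def fire(marking, transitions, name):
    """Consume one token per input place and produce one per output place."""
    inputs, outputs = transitions[name]
    new_marking = dict(marking)
    for place in inputs:
        new_marking[place] -= 1
    for place in outputs:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking

def simulate_instance(transitions, initial_marking, rng, max_steps=100):
    """Play the token game once, picking uniformly among enabled transitions."""
    marking = dict(initial_marking)
    trace = []
    for _ in range(max_steps):
        candidates = enabled(marking, transitions)
        if not candidates:
            break  # no transition is enabled: the instance has finished
        chosen = rng.choice(candidates)
        marking = fire(marking, transitions, chosen)
        trace.append(chosen)
    return trace, marking

# Hypothetical workflow net: Register, then either Blood Test or Visit, then Discharge.
# Each transition maps to (input places, output places).
TRANSITIONS = {
    "Register": (["start"], ["p1"]),
    "Blood Test": (["p1"], ["p2"]),
    "Visit": (["p1"], ["p2"]),
    "Discharge": (["p2"], ["end"]),
}

trace, final_marking = simulate_instance(TRANSITIONS, {"start": 1}, random.Random(42))
print(trace)
```

Running several instances with different random generators yields traces that differ in the exclusive choice, which is the behavior a purely scripted log would not show.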
Our work distinguishes itself from a processing script by avoiding
deterministic behavior when exclusive choices or parallel activities are modeled.
Furthermore, activity durations and sensor values are determined by drawing a
random value from the interval specified for the activity or sensor, according
to a probability distribution. As a consequence, all possible execution
sequences of a business process can appear in the event log. Sensor activation
can be set to happen at any time during the activity, in either a given or
random order. Furthermore, users can specify probabilities for each choice to
allow for a 'standard case' execution and an 'outlier case' execution.
Figure 4 shows an example of the tool's GUI. The user has modeled a Petri
net representing a hospital process (see Figures 4a and 4b). The generator will
simulate the modeled process ten times within a six-hour time frame. Noise with
an occurrence rate of 50.7% is added to the output. According to the selection,
the following types of noise will be included: dropping events, duplicating events
(i.e., event twice), and assigning a wrong event value (Figure 4c).
[Fig. 4: Screenshots of Web Interface. (a) Petri net Wizard, (b) Settings: Activity, (c) Settings: IoT-Simulation]
5 Exemplary Application of the IoT Process Log Generator
In this section, we demonstrate the usefulness of our tool using two examples:
hospital processes and smart homes. To generate an event log for a hospital
process, we took the process described in Elkoumy et al. [8]. We translated the
BPMN process to a Petri net, configured the settings of the process (i.e., used
different activity durations, added noise), and simulated the process. Based on
the simulation and the user’s configuration, an event log has been generated
as shown in Table 1. The left-hand side of the table shows the clean event log
(ground truth), while the right-hand side of the table includes noise according
to the user specifications.
Log - Clean
ID   Date        Time   Activity
846  2022-02-24  08:23  Register
846  2022-02-24  09:07  Hospitalize
846  2022-02-24  10:46  Blood Test
846  2022-02-24  11:18  Blood Test
846  2022-02-24  12:18  Visit
846  2022-02-24  13:12  Discharge
(a) Event Log Clean

Log - Noise
ID   Date        Time   Activity     Noise Type
846  2022-02-24  20:23  Register     Wrong Time
846  2022-02-24  09:07  Hospitalize  Event Twice
846  2022-02-24  09:07  Hospitalize  Event Twice
846  2022-24-02  10:46  Blood Test   Wrong Date
-    -           -      -            Event Lost
846  2022-02-24  12:18  Visit        Multi Recordings
846  2022-02-24  12:21  Visit        Multi Recordings
846  2022-02-24  12:22  Visit        Multi Recordings
846  2022-02-24  13:12  Register     Wrong Event
(b) Event Log with added Noise

Table 1: Synthetic Hospital Event Logs
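
The relation between the clean and the noisy log in Table 1 can be sketched as follows. This is an illustrative assumption of how three of the noise types (Event Lost, Event Twice, Wrong Event) might be injected, not the tool's implementation; the field names and the `add_noise` helper are hypothetical. The clean log is left untouched so that it remains available as ground truth.

```python
import random

# Clean log taken from Table 1a; the dictionary field names are hypothetical.
CLEAN_LOG = [
    {"id": 846, "timestamp": "2022-02-24 08:23", "activity": "Register"},
    {"id": 846, "timestamp": "2022-02-24 09:07", "activity": "Hospitalize"},
    {"id": 846, "timestamp": "2022-02-24 10:46", "activity": "Blood Test"},
    {"id": 846, "timestamp": "2022-02-24 11:18", "activity": "Blood Test"},
    {"id": 846, "timestamp": "2022-02-24 12:18", "activity": "Visit"},
    {"id": 846, "timestamp": "2022-02-24 13:12", "activity": "Discharge"},
]

def add_noise(clean_log, noise_rate, rng):
    """Return a noisy copy; each event is disturbed with probability noise_rate."""
    activities = sorted({e["activity"] for e in clean_log})
    noisy = []
    for event in clean_log:
        if rng.random() >= noise_rate:
            noisy.append({**event, "noise": None})  # event stays as recorded
            continue
        kind = rng.choice(["Event Lost", "Event Twice", "Wrong Event"])
        if kind == "Event Lost":
            continue  # drop the event entirely
        if kind == "Event Twice":
            noisy.append({**event, "noise": kind})
            noisy.append({**event, "noise": kind})
        else:
            # Replace the activity label with a different one from the log.
            others = [a for a in activities if a != event["activity"]]
            noisy.append({**event, "activity": rng.choice(others), "noise": kind})
    return noisy

noisy_log = add_noise(CLEAN_LOG, 0.5, random.Random(3))
for event in noisy_log:
    print(event)
```

Keeping the noise type as a separate field mirrors the "Noise Type" column of Table 1b and makes it possible to evaluate a noise filter against the clean log.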
The second example is related to smart homes. For this purpose, we took a
sensor event log from the literature [6] that records activities of smart home inhabitants
with various sensors. We designed an IoT environment with the IoT modeler,
added different sensor types, and simulated daily routines like cooking,
cleaning up, or making breakfast. Additionally, we added noise to the configuration.
Tables 2a and 2b show the results.
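
The mapping from activities to locations and sensors used in such a smart home setup can be sketched as below. The environment layout, sensor identifiers, value range, and the `sensor_events` helper are all hypothetical and serve only to illustrate how discrete (On/Off) and continuous readings might be emitted at a random time within an activity.

```python
import random
from datetime import datetime, timedelta

# Hypothetical IoT environment: activity -> location, location -> sensors.
ACTIVITY_LOCATION = {
    "Cooking": "kitchen",
    "Making breakfast": "kitchen",
    "Cleaning up": "living_room",
}
SENSORS = {
    "kitchen": [("S1", "discrete"), ("F26", "continuous")],
    "living_room": [("S2", "discrete")],
}

def read_sensor(kind, rng):
    """Return an On/Off state for discrete sensors, else a value in [0, 100]."""
    if kind == "discrete":
        return rng.choice(["On", "Off"])
    return round(rng.uniform(0.0, 100.0), 2)  # assumed value range

def sensor_events(activity, start, duration_minutes, rng):
    """Emit one reading per sensor at the activity's location, each at a
    random time within the activity, sorted by timestamp."""
    location = ACTIVITY_LOCATION[activity]
    events = []
    for sensor_id, kind in SENSORS[location]:
        offset = timedelta(minutes=rng.uniform(0, duration_minutes))
        events.append((sensor_id, start + offset, read_sensor(kind, rng)))
    return sorted(events, key=lambda e: e[1])

start = datetime(2022, 3, 4, 8, 0)
events = sensor_events("Cooking", start, 30, random.Random(3))
for sensor_id, timestamp, value in events:
    print(sensor_id, timestamp.strftime("%Y-%m-%d %H:%M"), value)
```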
6 Summary and Future Work
This paper presented the IoT process log generator, a tool to generate synthetic
(sensor) event logs. Our generator not only creates event logs usable as ground
truth but also allows adding an adjustable degree of erroneous entries
(noise) to enable working on imperfect and, therefore, more realistic data. In
this way, the synthetic event logs might be used to validate process discovery
algorithms, increase the quality of event logs, and also pave the way for novel
use cases based on (IoT) sensor event data.
As of now, our generator is capable of generating event logs for single-user
settings. We plan to extend the IoT modeler with a multi-agent capability and
role-based task simulation. This extension will enable the assignment of resources
and individual availabilities to specific process activities. In future iterations, we
will allow more process modeling notations, increase the number of output
formats, and enhance the process visualization. The upcoming versions will include
BPMN 2.0 as an alternative input format. We plan to use random seeds for the
non-deterministic parts of the generator to ensure the reproducibility of
experiments. Besides .CSV and .TXT as output formats, it is planned to output the
event logs as .XES files. To enhance the visualization of the process models, we
plan to include a 3D modeler that allows users to visually create a realistic 3D
environment and to augment the process model. A corresponding 3D plug-in has
already been presented [21].

Log - Clean
ID   Date        Time   Value
S1   2022-03-04  08:13  Off
S2   2022-03-04  08:17  On
S3   2022-03-04  08:25  Off
S4   2022-03-24  08:36  On
S5   2022-03-04  08:58  Off
F26  2022-03-04  09:33  96.22
S6   2022-03-04  09:42  On
F27  2022-03-04  09:56  0.493
(a) Sensor Event Log Clean

Log - Noise
ID   Date        Time   Value  Noise Type
S1   2022-03-04  20:13  Off    Wrong Time
S2   2022-03-04  08:17  On     Event Twice
S2   2022-03-04  08:17  On     Event Twice
-    -           -      -      Event Lost
S4   2022-03-24  08:36  On     Multi Recordings
S4   2022-03-24  08:36  On     Multi Recordings
S4   2022-03-24  08:37  On     Multi Recordings
S5   2022-04-03  08:58  Off    Wrong Date
S2   2022-03-04  09:33  On     Wrong Sensor
S6   2022-03-04  09:42  Off    Wrong Status
F27  2022-03-04  09:56  0.557  Wrong Value
(b) Sensor Event Log with added Noise

Table 2: Synthetic Sensor Event Logs
