Data Set for MobIS Challenge 2019
Content uploaded by Martin Scheid
... To substantiate our discussion, we present empirical evidence generated in a setting that is representative of the typical evaluation setup used in the field. We employ five commonly used event logs (Helpdesk, BPIC12, BPIC13 Incidents, BPIC17 Offer, and MobIS [12]) and generate six splits for each log: five in which we randomly allocate traces so that 80% of them form the training set and 20% the test set, and one in which the split is time-based so that the 20% of traces with the most recent start timestamps end up in the test set. We then generate n−1 prefix-label pairs (x, y) from each trace of length n, with prefix lengths p ∈ [1, n−1], and calculate prediction accuracy as the percentage of prefixes in the test set for which the correct next-activity label was predicted, i.e., ŷ = y. ...
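The evaluation setup described in this excerpt can be sketched in a few lines; the function names and the bigram baseline below are illustrative assumptions, not taken from the paper, but the "rather trivial prediction approach" mentioned in the abstract could look like this:

```python
from collections import Counter

def prefix_label_pairs(trace):
    """Generate the n-1 (prefix, next-activity) pairs for a trace of length n."""
    return [(tuple(trace[:p]), trace[p]) for p in range(1, len(trace))]

def accuracy(test_traces, predict):
    """Share of test-set prefixes whose predicted next activity equals the true one."""
    pairs = [pair for t in test_traces for pair in prefix_label_pairs(t)]
    correct = sum(1 for x, y in pairs if predict(x) == y)
    return correct / len(pairs)

def fit_bigram_baseline(train_traces):
    """Trivial baseline (our assumption): predict the activity that most often
    follows the prefix's last activity in the training traces."""
    follower = {}
    for t in train_traces:
        for a, b in zip(t, t[1:]):
            follower.setdefault(a, Counter())[b] += 1
    def predict(prefix):
        counts = follower.get(prefix[-1])
        return counts.most_common(1)[0][0] if counts else None
    return predict
```

Such a baseline has no notion of generalization beyond observed bigrams, which is exactly why its competitive accuracy on leaked examples is a warning sign.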
... In certain scenarios, context attributes such as the involved resources, timestamps, or costs carry important information for determining the continuation of a process instance [2,12]. Considering this contextual information is an important capability when dealing with event logs, and it distinguishes next-step prediction from other sequential prediction tasks. ...
Next activity prediction aims to forecast the future behavior of running process instances. Recent publications in this field predominantly employ deep learning techniques and evaluate their prediction performance using publicly available event logs. This paper presents empirical evidence that calls into question the effectiveness of these current evaluation approaches. We show that there is an enormous amount of example leakage in all of the commonly used event logs, so that rather trivial prediction approaches perform almost as well as ones that leverage deep learning. We further argue that designing robust evaluations requires a more profound conceptual engagement with the topic of next-activity prediction, and specifically with the notion of generalization to new data. To this end, we present various prediction scenarios that necessitate different types of generalization to guide future research.
The inductive miner (IM) is guaranteed to return structured process models, but the process behaviours that process trees can represent are limited. Loops in process trees can only be exited after the execution of the “body” part. However, in some cases, it is possible to break a loop structure in the “redo” part. This paper proposes an extension to the process tree notation and to the IM to discover and represent break behaviours. We present a case study using a healthcare event log to explore the treatment pathways of Acute Coronary Syndrome (ACS) patients, especially discharge behaviours from the ICU, to demonstrate the usability of the proposed approach in real life. We find that treatment pathways in the ICU are routine behaviour, while discharges from the ICU are break behaviours. The results show that we can successfully discover break behaviours and obtain structured and understandable process models with satisfactory fitness, precision, and simplicity.
Event logs capture information about executed activities. However, they do not capture information about activities that could have been performed, i.e., activities that were enabled during a process. Event logs containing information on enabled activities are called translucent event logs. Although it is possible to extract translucent event logs from a running information system, such logs are rarely stored. To increase the availability of translucent event logs, we propose two techniques. The first technique records the system’s states as snapshots. These snapshots are stored and linked to events. A user labels patterns that describe parts of the system’s state. By matching patterns with snapshots, we can add information about enabled activities. We apply our technique in a small setting to demonstrate its applicability. The second technique uses a process model to add information concerning enabled activities to an existing traditional event log. Data containing enabled activities are valuable for process discovery. Using the information on enabled activities, we can discover more correct models.
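The first technique in the abstract above, matching user-labeled patterns against state snapshots to annotate enabled activities, can be sketched as follows. The set-based representation of snapshots and patterns is our own assumption for illustration; the paper's actual encoding may differ.

```python
def enabled_activities(snapshot, patterns):
    """Return the activities whose required state pattern holds in the snapshot.

    `snapshot` is the set of state facts recorded at the time of an event;
    `patterns` maps each activity to the set of facts under which the user
    labeled it as enabled. An activity is enabled if its pattern is a subset
    of the snapshot.
    """
    return {act for act, required in patterns.items() if required <= snapshot}

def translucent_log(events, patterns):
    """Attach an enabled-activity set to each (activity, snapshot) event,
    yielding a translucent version of the log."""
    return [(act, enabled_activities(snap, patterns)) for act, snap in events]
```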
IoT devices supporting business processes (BPs) in sectors like manufacturing, logistics, or healthcare collect data on the execution of the processes. In recent years, there has been a growing awareness of the opportunity to use the data these devices generate for process mining (PM) by deriving an event log from a sensor log via event abstraction techniques. However, IoT data are often affected by data quality issues (e.g., noise, outliers) which, if not addressed at the preprocessing stage, will be amplified by event abstraction and result in quality issues in the event log (e.g., incorrect events), greatly hampering PM results. In this paper, we review the literature on PM with IoT data to find the most frequent data quality issues mentioned in the literature. Based on this, we then derive six patterns of poor sensor data quality that cause event log quality issues and propose solutions to avoid or solve them.
Human behavior can be represented in the form of a process. Existing process modeling notations, however, are unable to faithfully represent such very flexible and unstructured processes. Additional, non-process-aware perspectives should be considered in the representation. Control-flow and data dimensions should be combined to build a robust model that can be used for analysis purposes. This paper proposes a new hybrid model in which these dimensions are combined. An enriched conformance checking approach is described, based on the alignment of imperative and declarative process models, which also supports data dimensions from a statistical viewpoint.
The object-centric event log is a format for properly organizing information from different views of a business process into an event log. The novelty of this format is the association of events with objects, which allows different notions of cases to be analyzed. The addition of new features has brought an increase in complexity. Clustering analysis can ease this complexity by enabling the analysis to be guided by process behaviour profiles. However, identifying which features describe the singularity of each profile is a challenge. In this paper, we present an exploratory study in which we mine frequent patterns on top of clustering analysis as a mechanism for profile characterization. In our study, clustering analysis is applied in a trace clustering fashion over a vector representation of a flattened event log extracted from an object-centric event log, using a unique case notion. Then, frequent patterns are discovered in the event sublogs associated with the clusters and organized according to the original object-centric event log. The results obtained in preliminary experiments show that association rules reveal more evident behaviours in certain profiles. Although the process underlying each cluster may contain the same elements (activities and transitions), the behaviour trends show that the relationships between these elements differ. The observations depicted in our analysis open the way to searching for subtler knowledge about the business process under scrutiny.
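The "frequent patterns on top of clustering" step can be illustrated with a minimal sketch: given the sublog of traces assigned to one cluster, count activity pairs that co-occur in a sufficient share of traces. This is a deliberately simplified stand-in (frequent pairs rather than general itemsets or association rules), and all names are our own.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(sublog, min_support):
    """Frequent activity pairs in one cluster's sublog (illustrative only).

    `sublog` is a list of traces (activity sequences). The support of a pair
    is the fraction of traces in which both activities occur at least once.
    Returns each pair meeting `min_support`, mapped to its support.
    """
    counts = Counter()
    for trace in sublog:
        # Deduplicate and sort so each unordered pair is counted once per trace.
        for pair in combinations(sorted(set(trace)), 2):
            counts[pair] += 1
    n = len(sublog)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

Comparing the surviving pairs across clusters is one simple way to surface the profile-specific behaviour trends the study describes.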
Process mining is a family of techniques that support the analysis of operational processes based on event logs. Among the existing event log formats, the IEEE standard eXtensible Event Stream (XES) is the most widely adopted. In XES, each event must be related to a single case object, which may lead to convergence and divergence problems. To solve such issues, object-centric approaches become promising, where objects are the central notion and one event may refer to multiple objects. In particular, the Object-Centric Event Log (OCEL) standard has been proposed recently. However, the crucial problem of extracting such logs from external sources is still largely unexplored. In this paper, we try to fill this gap by leveraging the Virtual Knowledge Graph (VKG) approach to access data in relational databases. We have implemented this approach in our system, extending it to support both the XES and OCEL standards. We have carried out an experiment over the Dolibarr system. The evaluation results confirm that the approach can effectively extract logs and that its performance is scalable.
Predictive process monitoring techniques leverage machine learning (ML) to predict future characteristics of a case, such as the process outcome or the remaining run time. Available techniques employ various models and different types of input data to produce accurate predictions. However, from a practical perspective, explainability is another important requirement besides accuracy since predictive process monitoring techniques frequently support decision-making in critical domains. Techniques from the area of explainable artificial intelligence (XAI) aim to provide this capability and create transparency and interpretability for black-box ML models. While several explainable predictive process monitoring techniques exist, none of them leverages textual data. This is surprising since textual data can provide a rich context to a process that numerical features cannot capture. Recognizing this, we use this paper to investigate how the combination of textual and non-textual data can be used for explainable predictive process monitoring and analyze how the incorporation of textual data affects both the predictions and the explainability. Our experiments show that using textual data requires more computation time but can lead to a notable improvement in prediction quality with comparable results for explainability.
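The combination of textual and non-textual case data described above amounts to concatenating two feature views per case. A minimal sketch, assuming a plain bag-of-words encoding over a fixed vocabulary (the paper itself may use richer text representations):

```python
def featurize(cases, vocabulary):
    """Combine numeric case attributes with a bag-of-words view of free text.

    `cases` is a list of (numeric_attributes, text) pairs; `vocabulary` is a
    fixed list of words to count. Both are illustrative assumptions. Each
    output row concatenates the numeric attributes with the word counts, so
    any downstream classifier sees both views, and per-feature attribution
    methods can assign importance to individual words as well as numbers.
    """
    rows = []
    for numeric, text in cases:
        counts = [text.lower().split().count(w) for w in vocabulary]
        rows.append(list(numeric) + counts)
    return rows
```

Keeping the text features as explicit per-word columns is what makes the resulting predictions explainable at the word level, at the cost of the extra computation the abstract mentions.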
Aggregation of event data is a key operation in process mining for revealing behavioral features of processes for analysis. It has primarily been studied over sequences of events in event logs. The data model of event knowledge graphs enables new analysis questions requiring new forms of aggregation. We focus on analyzing task executions in event knowledge graphs. We show that existing aggregation operations are inadequate and propose new aggregation operations, formulated as query operators over labeled property graphs. We show on the BPIC’17 dataset that the new aggregation operations allow gaining new insights into differences in task executions, actor behavior, and work division.
In recent years, a number of studies have experimented with applying process mining (PM) techniques to smart spaces data. The general goal has been to automatically model human routines as if they were business processes. However, applying process-oriented techniques to smart spaces data comes with its own set of challenges. This paper surveys existing approaches that apply PM to smart spaces and analyses how they deal with the following challenges identified in the literature: choosing a modelling formalism for human behaviour; bridging the abstraction gap between sensor logs and event logs; and segmenting logs into traces. The added value of this article lies in providing the research community with a common ground for some important challenges that exist in this field and their respective solutions, and in assisting further research efforts by outlining opportunities for future work.