Chapter
PDF available

A Probabilistic Approach to Event-Case Correlation for Process Mining


Abstract and Figures

Process mining aims to understand the actual behavior and performance of business processes from event logs recorded by IT systems. A key requirement is that every event in the log must be associated with a unique case identifier (e.g., the order ID in an order-to-cash process). In reality, however, this case ID may not always be present, especially when logs are acquired from different systems or when such systems have not been explicitly designed to offer process-tracking capabilities. Existing techniques for correlating events have worked with assumptions to make the problem tractable: some assume the generative processes to be acyclic while others require heuristic information or user input. In this paper, we lift these assumptions by presenting a novel technique called EC-SA based on probabilistic optimization. Given as input a sequence of timestamped events (the log without case IDs) and a process model describing the underlying business process, our approach returns an event log in which every event is mapped to a case identifier. The approach minimises the misalignment between the generated log and the input process model, and the variance between activity durations across cases. The experiments conducted on a variety of real-life datasets show the advantages of our approach over the state of the art.
... In previous work [18], we introduced a probabilistic optimization technique called EC-SA (Events Correlation by Simulated Annealing), which is based on a simulated annealing heuristic approach. EC-SA addresses the correlation problem as a multi-level optimization problem. ...
... Also, it is computationally inefficient due to the breadth-first search approach. In a previous paper [18], we proposed the Event Correlation by Simulated Annealing (EC-SA) approach, which uses the event names and timestamps in addition to the process model. EC-SA addresses the correlation problem as a multi-level optimization problem, as it searches for the nearest-optimal correlated log considering both the fitness with an input process model and the activities' timed behavior within the log. ...
... We implemented a prototype tool for EC-SA-Data. 3 Using this tool, we conducted three experiments to evaluate the accuracy and time performance of our approach, and compared the results with EC-SA [18] used as a baseline. Figure 14 illustrates our evaluation process. ...
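The excerpts above summarize EC-SA's multi-level objective: alignment of the correlated log with a process model, plus low variance of activity durations across cases. A minimal sketch of that idea follows; the mini-log, the allowed directly-follows relation, and the single-event-reassignment neighborhood are invented for illustration and are not the authors' implementation.

```python
import math
import random
import statistics

# Hypothetical mini-log: (activity, timestamp) events without case IDs.
EVENTS = [("A", 0), ("A", 1), ("B", 2), ("B", 3), ("C", 4), ("C", 5)]

# Directly-follows pairs permitted by an assumed model A -> B -> C.
ALLOWED = {("A", "B"), ("B", "C")}

def cost(assignment, events=EVENTS, n_cases=2):
    """Two-part objective: model misalignment plus variance of gaps between
    consecutive events of the same case (a stand-in for duration variance)."""
    traces = {c: [] for c in range(n_cases)}
    for (act, ts), case in zip(events, assignment):
        traces[case].append((act, ts))
    misalignment, gaps = 0, []
    for trace in traces.values():
        for (a1, t1), (a2, t2) in zip(trace, trace[1:]):
            if (a1, a2) not in ALLOWED:
                misalignment += 1
            gaps.append(t2 - t1)
    variance = statistics.pvariance(gaps) if len(gaps) > 1 else 0.0
    return misalignment + variance

def anneal(events=EVENTS, n_cases=2, steps=2000, temp=2.0, cooling=0.995, seed=1):
    """Simulated annealing over event-to-case assignments."""
    rng = random.Random(seed)
    current = [rng.randrange(n_cases) for _ in events]
    best = current[:]
    for _ in range(steps):
        candidate = current[:]
        # Neighborhood move: reassign one random event to a random case.
        candidate[rng.randrange(len(candidate))] = rng.randrange(n_cases)
        delta = cost(candidate) - cost(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
        if cost(current) < cost(best):
            best = current[:]
        temp = max(temp * cooling, 1e-6)
    return best
```

On this toy instance the search converges to assignments that interleave two perfectly aligned cases (A, B, C with uniform gaps), driving both cost terms toward zero.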
Preprint
Process mining supports the analysis of the actual behavior and performance of business processes using event logs, such as sales transactions recorded by an ERP system. An essential requirement is that every event in the log must be associated with a unique case identifier (e.g., the order ID of an order-to-cash process). In reality, however, this case identifier may not always be present, especially when logs are acquired from different systems or extracted from non-process-aware information systems. In such settings, the event log needs to be pre-processed by grouping events into cases -- an operation known as event correlation. Existing techniques for correlating events have worked with assumptions to make the problem tractable: some assume the generative processes to be acyclic, while others require heuristic information or user input. Moreover, they abstract the log to activities and timestamps, and miss the opportunity to use data attributes. In this paper, we lift these assumptions and propose a new technique called EC-SA-Data based on probabilistic optimization. The technique takes as inputs a sequence of timestamped events (the log without case IDs), a process model describing the underlying business process, and constraints over the event attributes. Our approach returns an event log in which every event is associated with a case identifier. The technique allows users to incorporate rules on process knowledge and data constraints flexibly. The approach minimizes the misalignment between the generated log and the input process model and the variance between activity durations across cases, and maximizes the support of the given data constraints over the correlated log. Our experiments with various real-life datasets show the advantages of our approach over the state of the art.
... They can also deduce missing information (e.g. case ID) [7, 8,9,10,11,12,13,14]. ...
... Correlation in an offline setting has received attention from the community, e.g. [7, 8,13]. However, these approaches cannot be applied in an online (streaming) setting due to their need for the complete event log. ...
... This output can be filtered with an optional user-specified threshold. It is possible to keep only the correlated event with the highest probability [13]. However, this might not always guarantee a one-to-one mapping, as two correlations might have the same probability. ...
Article
Process mining is a sub-field of data mining that focuses on analyzing timestamped and partially ordered data, commonly called event logs. To apply process mining techniques, each event is required to have at least three attributes: case ID, task ID/name, and timestamp. Thus, any missing information needs to be supplied first. Traditionally, events collected from different sources are manually correlated. While this might be acceptable in an offline setting, it is infeasible in an online setting. Recently, several use cases have emerged that call for applying process mining in an online setting. In such scenarios, a stream of high-speed and high-volume events flows continuously, e.g. in IoT applications, with stringent latency requirements for gaining insights into the ongoing process. Thus, event correlation must be automated and occur as the data is being received. We introduce an approach that correlates unlabeled events received on a stream. Given a set of start activities, our approach correlates unlabeled events to a case identifier. Our approach is probabilistic: a single uncorrelated event can be assigned to zero or more case identifiers with different probabilities. Moreover, our approach is flexible: the user can supply domain knowledge in the form of constraints that reduce the correlation space, even while the application is running. We realize our approach using complex event processing (CEP) technologies and implemented a prototype on top of Esper, a state-of-the-art industrial CEP engine. The experimental evaluation shows that our approach outperforms the baseline approaches in throughput and latency, and that, on real-life logs, its accuracy can compete with the baseline approaches.
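The streaming behavior this abstract describes (start activities open cases, constraints prune candidate cases, and an event may map to several cases with probabilities) can be sketched roughly as follows. The START_ACTIVITIES and FOLLOWS constraints are invented for illustration; this is not the paper's Esper-based implementation.

```python
class StreamCorrelator:
    """Toy probabilistic event-case correlator for a stream."""

    # Assumed domain knowledge: start activities and allowed successors.
    START_ACTIVITIES = {"A"}
    FOLLOWS = {"A": {"B"}, "B": {"C"}, "C": set()}

    def __init__(self):
        self.next_id = 0
        self.last = {}  # open case id -> last seen activity

    def on_event(self, activity):
        """Return {case_id: probability} for one uncorrelated event."""
        if activity in self.START_ACTIVITIES:
            cid, self.next_id = self.next_id, self.next_id + 1
            self.last[cid] = activity
            return {cid: 1.0}
        # Constraints reduce the correlation space to feasible open cases.
        candidates = [c for c, act in self.last.items()
                      if activity in self.FOLLOWS.get(act, set())]
        if not candidates:
            return {}  # event cannot be correlated under the constraints
        # Uniform probability over feasible cases; commit the state update to
        # the lowest case id (ties can break a one-to-one mapping, as noted).
        self.last[min(candidates)] = activity
        return {c: 1.0 / len(candidates) for c in candidates}
```

When two open cases are equally plausible, the event is reported against both with probability 0.5 each, mirroring the zero-or-more assignment described in the abstract.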
... It is worth noticing that although the classification of cases and variants was illustrated with only two routines (interleaving or not), the classification is defined in a generic way and applies to any number of routines. (A comparison table covering Agostinelli et al. [3], Baier et al. [7], Bayomie et al. [8], Bosco et al. [9], Kumar et al. [18], Leno et al. [20], Liu [23], Fazzinga et al. [12], Ferreira et al. [13], Mannhardt et al. [24], Mȃruşter et al. [27], and Srivastava et al. [29] appears here, followed by Section 5, "Assessing the Segmentation Approaches".) ...
... Even if more focused on traditional business processes in BPM rather than on RPA routines, Bayomie et al. [8] address the problem of correlating uncorrelated event logs in process mining, in which they assume the model of the routine is known. Since event logs can store traces of one process model only, this technique is able to achieve Case 1.1 only. ...
... In the field of process discovery, Mȃruşter et al. [27] propose an empirical method for inducing rule sets from event logs containing executions of one process only. Therefore, as in [8], this method is able to achieve Case 1.1 only, making the technique ineffective in the presence of interleaved and shared user actions. A more robust approach, developed by Fazzinga et al. [12], employs predefined behavioural models to establish which process activities belong to which process model. ...
Chapter
Full-text available
Robotic Process Automation (RPA) is an emerging technology that allows organizations to automate intensive repetitive tasks (or simply routines) previously performed by a human user on the User Interface (UI) of web or desktop applications. RPA tools are able to capture in dedicated UI logs the execution of several routines and then emulate their enactment in place of the user by means of a software (SW) robot. A UI log can record information about many routines, whose actions are mixed in some order that reflects the particular order of their execution by the user, making their automated identification far from being trivial. The issue to automatically understand which user actions contribute to a specific routine inside the UI log is also known as segmentation. In this paper, we leverage a concrete use case to explore the issue of segmentation of UI logs, identifying all its potential variants and presenting an up-to-date overview that discusses to what extent such variants are supported by existing literature approaches. Moreover, we offer points of reference for future research based on the findings of this paper.
... The problem of assigning a case identifier to events in a log is a long-standing challenge in the process mining community [5], and is known by multiple names in the literature, including the event-case correlation problem [3] and the case notion discovery problem [13]. Event logs whose events are missing the case identifier attribute are usually referred to as unlabeled event logs [5]. ...
... This comes at the cost of a slow computing time. An improvement of the aforementioned method [3] employs simulated annealing to select an optimal case notion; while still very computationally heavy, this method delivers high-quality case attribution results. ...
Conference Paper
Full-text available
Among the many sources of event data available today, a prominent one is user interaction data. User activity may be recorded during the use of an application or website, resulting in a type of user interaction data often called click data. An obstacle to the analysis of click data using process mining is the lack of a case identifier in the data. In this paper, we show a case and user study for event-case correlation on click data, in the context of user interaction events from a mobility sharing company. To reconstruct the case notion of the process, we apply a novel method to aggregate user interaction data in separate user sessions—interpreted as cases—based on neural networks. To validate our findings, we qualitatively discuss the impact of process mining analyses on the resulting well-formed event log through interviews with process experts.
... There have been various works on finding correlations between process events [11]. They have focused on different tasks such as identifying process instances, i.e. cases, for event log generation [12], also considering an object-centric perspective [13] and middleware [14], for discovering a process model [15], or for enriching an event log with sensor data [16]. ...
Conference Paper
Process mining techniques provide process analysts with insights into interesting patterns of a business process. Current techniques have by and large focused on the explanation of behavior, partly with the help of features that relate to multiple perspectives beyond pure control flow. However, techniques that provide insights into the connection between data elements of related events have been missing so far. Such connections are relevant for several analysis tasks such as event correlation, resource allocation, or log partitioning. In this paper, we propose a multi-perspective mining technique for discovering data connections. More specifically, we adapt concepts from association rule mining to extract connections between a sequence of events and behavioral attributes of related data objects and contextual features. Our technique was evaluated using real-world events, supporting the usefulness of the mined association rules.
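The association-rule adaptation this abstract describes can be illustrated with a toy one-antecedent miner over event attributes; the records, attribute names, and thresholds below are invented, not the paper's technique.

```python
from collections import Counter
from itertools import combinations

# Hypothetical event records: attributes of events and related data objects.
RECORDS = [
    {"activity": "ship", "region": "EU", "express": "yes"},
    {"activity": "ship", "region": "EU", "express": "yes"},
    {"activity": "ship", "region": "US", "express": "yes"},
    {"activity": "store", "region": "EU", "express": "no"},
]

def mine_rules(records, min_support=0.5, min_confidence=0.9):
    """One-antecedent rules (attr=val) => (attr=val), scored with the classic
    support and confidence measures from association rule mining."""
    n = len(records)
    item_count, pair_count = Counter(), Counter()
    for record in records:
        items = sorted(record.items())       # items are (attribute, value)
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    rules = []
    for (a, b), c in pair_count.items():
        for lhs, rhs in ((a, b), (b, a)):    # try the rule in both directions
            support, confidence = c / n, c / item_count[lhs]
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

On these records, only the two directions of the connection between activity=ship and express=yes clear both thresholds; weaker co-occurrences such as region=EU are filtered out by the confidence bound.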
... Such a case notion correlates events of a process instance and represents them as a single sequence, e.g., a sequence of events of a patient. However, in real-life business processes supported by ERP systems such as SAP and Oracle, multiple objects (i.e., multiple sequences of events) exist in a process instance [2,7] and they share events (i.e., sequences are overlapping). Fig. 1(a) shows a process instance in a simple blood test process as multiple overlapping sequences. ...
Preprint
Full-text available
Performance analysis in process mining aims to provide insights on the performance of a business process by using a process model as a formal representation of the process. Such insights are reliably interpreted by process analysts in the context of a model with formal semantics. Existing techniques for performance analysis assume that a single case notion exists in a business process (e.g., a patient in healthcare process). However, in reality, different objects might interact (e.g., order, item, delivery, and invoice in an O2C process). In such a setting, traditional techniques may yield misleading or even incorrect insights on performance metrics such as waiting time. More importantly, by considering the interaction between objects, we can define object-centric performance metrics such as synchronization time, pooling time, and lagging time. In this work, we propose a novel approach to performance analysis considering multiple case notions by using object-centric Petri nets as formal representations of business processes. The proposed approach correctly computes existing performance metrics, while supporting the derivation of newly-introduced object-centric performance metrics. We have implemented the approach as a web application and conducted a case study based on a real-life loan application process.
... Nevertheless, it still requires a process model and heuristic data as input. In [6] the authors propose another approach called EC-SA, which is based on probabilistic optimization. The approach requires as input the unlabeled event log and the process model. ...
Chapter
Full-text available
Event logs can be analyzed using various process mining techniques (e.g., process discovery) to obtain valuable information about the actual behavior of business process executions. Typically, these techniques rely on the presence of a case identifier linking events to process instances. However, if the process involves information systems that do not record events in a process-oriented manner, a clear case identifier may be missing, resulting in an unlabeled event log. While some approaches already address the challenge of inferring case identifiers for unlabeled event logs, most of them provide limited support for cyclic behavior without additional inputs. This paper proposes a three-step approach to correlate events with case identifiers for unlabeled event logs originating from processes with cyclic behavior. While evaluating the accuracy of our approach with two real-world event logs (MIMIC-IV and Road Traffic Fine Management), we show that our approach, compared to the existing ones, detects cyclic behavior and correlates events closer to the original process instances without additional inputs.
Chapter
Robotic Process Automation (RPA) is an emerging technology for automating tasks using bots that can mimic human actions on computer systems. Most existing research focuses on the earlier phases of RPA implementations, e.g. the discovery of tasks that are suitable for automation. To detect exceptions and explore opportunities for bot and process redesign, historical data from RPA-enabled processes in the form of bot logs or process logs can be utilized. However, the isolated use of bot logs or process logs provides only limited insights and not a good understanding of an overall process. Therefore, we develop an approach that merges bot logs with process logs for process mining. A merged log enables an integrated view on the role and effects of bots in an RPA-enabled process. We first develop an integrated data model describing the structure and relation of bots and business processes. We then specify and instantiate a ‘bot log parser’ translating bot logs of three leading RPA vendors into the XES format. Further, we develop the ‘log merger’ functionality that merges bot logs with logs of the underlying business processes. We further introduce process mining measures allowing the analysis of a merged log.
Article
Full-text available
The problem of automated discovery of process models from event logs has been intensively researched in the past two decades. Despite a rich field of proposals, state-of-the-art automated process discovery methods suffer from two recurrent deficiencies when applied to real-life logs: (i) they produce large and spaghetti-like models; and (ii) they produce models that either poorly fit the event log (low fitness) or over-generalize it (low precision). Striking a trade-off between these quality dimensions in a robust and scalable manner has proved elusive. This paper presents an automated process discovery method, namely Split Miner, which produces simple process models with low branching complexity and consistently high and balanced fitness and precision, while achieving considerably faster execution times than state-of-the-art methods, measured on a benchmark covering twelve real-life event logs. Split Miner combines a novel approach to filter the directly-follows graph induced by an event log, with an approach to identify combinations of split gateways that accurately capture the concurrency, conflict and causal relations between neighbors in the directly-follows graph. Split Miner is also the first automated process discovery method that is guaranteed to produce deadlock-free process models with concurrency, while not being restricted to producing block-structured process models.
Article
Full-text available
The domains of complex event processing (CEP) and business process management (BPM) have different origins but for many aspects draw on similar concepts. While specific combinations of BPM and CEP have attracted research attention, resulting in solutions to specific problems, we attempt to take a broad view at the opportunities and challenges involved. We first illustrate these by a detailed example from the logistics domain. We then propose a mapping of this area into four quadrants – two quadrants drawing from CEP to create or extend process models and two quadrants starting from a process model to address how it can guide CEP. Existing literature is reviewed and specific challenges and opportunities are indicated for each of these quadrants. Based on this mapping, we identify challenges and opportunities that recur across quadrants and can be considered as the core issues of this combination. We suggest that addressing these issues in a generic manner would form a sound basis for future applications and advance this area significantly.
Conference Paper
Full-text available
Nowadays, many business processes once intra-organizational are becoming inter-organizational. Thus, being able to monitor how such processes are performed, including portions carried out by service providers, is paramount. Yet, traditional process monitoring techniques present some shortcomings when dealing with inter-organizational processes. In particular, they require human operators to notify when business activities are performed, and to stop the process when it is not executed as expected. In this paper, we address these issues by proposing an artifact-driven monitoring service, capable of autonomously and continuously monitoring inter-organizational processes. To do so, this service relies on the state of the artifacts (i.e., physical entities) participating in the process, represented using the E-GSM notation. A working prototype of this service is presented and validated using real-world processes and data from the logistics domain.
Article
Full-text available
Nowadays, business processes are increasingly supported by IT services that produce massive amounts of event data during the execution of a process. These event data can be used to analyze the process using process mining techniques to discover the real process, measure conformance to a given process model, or to enhance existing models with performance information. Mapping the produced events to activities of a given process model is essential for conformance checking, annotation and understanding of process mining results. In order to accomplish this mapping with low manual effort, we developed a semi-automatic approach that maps events to activities using insights from behavioral analysis and label analysis. The approach extracts Declare constraints from both the log and the model to build matching constraints to efficiently reduce the number of possible mappings. These mappings are further reduced using techniques from natural language processing, which allow for a matching based on labels and external knowledge sources. The evaluation with synthetic and real-life data demonstrates the effectiveness of the approach and its robustness toward non-conforming execution logs.
Article
Full-text available
In the era of "big data" one of the key challenges is to analyze large amounts of data collected in meaningful and scalable ways. The field of process mining is concerned with the analysis of data of a particular nature, namely data that results from the execution of business processes. The analysis of such data can be negatively influenced by the presence of outliers, which reflect infrequent behavior or "noise". In process discovery, where the objective is to automatically extract a process model from the data, this may result in rarely travelled pathways that clutter the process model. This paper presents an automated technique for the removal of infrequent behavior from event logs. The proposed technique is evaluated in detail, and it is shown that its application in conjunction with certain existing process discovery algorithms significantly improves the quality of the discovered process models, and that it scales well to large datasets.
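The kind of infrequent-behavior filtering this abstract describes can be illustrated with a minimal directly-follows filter; the log and threshold below are hypothetical, and this is a drastic simplification of the paper's technique.

```python
from collections import Counter

# Hypothetical log: nine clean traces and one noisy trace that skips "B".
LOG = [["A", "B", "C"]] * 9 + [["A", "C"]]

def filtered_dfg(log, threshold=0.2):
    """Build a directly-follows graph from the log and prune arcs whose
    frequency falls below `threshold` times the most frequent arc."""
    arcs = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            arcs[(a, b)] += 1
    cutoff = threshold * max(arcs.values())
    return {arc: count for arc, count in arcs.items() if count >= cutoff}
```

Here the noisy arc A -> C occurs once against nine occurrences of A -> B and B -> C, so it falls below the relative cutoff and is removed before discovery would run.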
Article
Process discovery algorithms aim to capture process models from event logs. These algorithms have been designed for logs in which the events that belong to the same case are related to each other — and to that case — by means of a unique case identifier. However, in service-oriented systems, these case identifiers are rarely stored beyond request-response pairs, which makes it hard to relate events that belong to the same case. This is known as the correlation challenge. This paper addresses the correlation challenge by introducing a technique, called the correlation miner, that facilitates discovery of business process models when events are not associated with a case identifier. It extends previous work on the correlation miner, by not only enabling the discovery of the process model, but also detecting which events belong to the same case. Experiments performed on both synthetic and real-world event logs show the applicability of the correlation miner. The resulting technique enables us to observe a service-oriented system and determine — with high accuracy — which request-response pairs sent by different communicating parties are related to each other.
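A drastically simplified illustration of relating request-response pairs without case identifiers is greedy FIFO matching per operation; the event format below is invented and this is not the correlation miner's actual algorithm.

```python
from collections import defaultdict, deque

def pair_request_response(events):
    """events: (kind, operation, timestamp) tuples with kind in {'req', 'res'}.
    Match each response to the earliest unmatched request of the same
    operation -- a naive stand-in for the correlation miner's matching."""
    pending = defaultdict(deque)  # operation -> queue of request timestamps
    pairs = []
    for kind, op, ts in sorted(events, key=lambda e: e[2]):
        if kind == "req":
            pending[op].append(ts)
        elif pending[op]:
            pairs.append((op, pending[op].popleft(), ts))
    return pairs
```

The real problem is harder than this sketch suggests: concurrent conversations mean several pairings are consistent with the timestamps, which is why the correlation miner reasons over the whole log rather than greedily.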
Article
Process mining methods allow analysts to exploit logs of historical executions of business processes in order to extract insights regarding the actual performance of these processes. One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes as input an event log, and produces as output a business process model that captures the control-flow relations between tasks that are observed in or implied by the event log. Several dozen automated process discovery methods have been proposed in the past two decades, striking different trade-offs between scalability, accuracy and complexity of the resulting models. So far, automated process discovery methods have been evaluated in an ad hoc manner, with different authors employing different datasets, experimental setups, evaluation measures and baselines, often leading to incomparable conclusions and sometimes unreproducible results due to the use of non-publicly available datasets. In this setting, this article provides a systematic review of automated process discovery methods and a systematic comparative evaluation of existing implementations of these methods using an open-source benchmark covering nine publicly available real-life event logs and eight quality metrics. The review and evaluation results highlight gaps and unexplored trade-offs in the field, including the lack of scalability of several proposals and a strong divergence in the performance of different methods with respect to different quality metrics. The proposed benchmark allows researchers to empirically compare new automated process discovery methods against existing ones in a unified setting.
Book
This is the second edition of Wil van der Aalst’s seminal book on process mining, which now discusses the field also in the broader context of data science and big data approaches. It includes several additions and updates, e.g. on inductive mining techniques, the notion of alignments, a considerably expanded section on software tools and a completely new chapter of process mining in the large. It is self-contained, while at the same time covering the entire process-mining spectrum from process discovery to predictive analytics. After a general introduction to data science and process mining in Part I, Part II provides the basics of business process modeling and data mining necessary to understand the remainder of the book. Next, Part III focuses on process discovery as the most important process mining task, while Part IV moves beyond discovering the control flow of processes, highlighting conformance checking, and organizational and time perspectives. Part V offers a guide to successfully applying process mining in practice, including an introduction to the widely used open-source tool ProM and several commercial products. Lastly, Part VI takes a step back, reflecting on the material presented and the key open challenges. Overall, this book provides a comprehensive overview of the state of the art in process mining. It is intended for business process analysts, business consultants, process managers, graduate students, and BPM researchers.