If you want to read the PDF, try requesting it from the authors.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

... While this paper focuses on event log quality issues, it should be recognized that ILS data can also suffer from data quality issues. Gal et al. [10] discuss ILS data quality challenges in queue mining, which is a subfield of process mining. In particular, they highlight the absence of a case identifier for some instances, the difficulty to determine the start and end point of activity execution, and issues related to reaching an appropriate level of data granularity for the analysis. ...
... It should be noted that Gal et al. [10] and other research using ILS data for process mining use ILS data in isolation. This paper advocates the use of both ILS and HIS data. ...
Conference Paper
Hospitals are becoming more and more aware of the need to manage their business processes. In this respect, process mining is increasingly used to gain insight in healthcare processes, requiring the analysis of event logs originating from the hospital information system. Process mining research mainly focuses on the development of new techniques or the application of existing methods, but the quality of all analyses ultimately depends on the quality of the event log. However, limited research has been done on the improvement of data quality in the process mining field, which is the topic of this paper. In particular, this paper discusses, from a conceptual angle, the opportunities that indoor location system data provides to tackle event log data quality issues. Moreover, the paper reflects upon the associated challenges. In this way, it provides the conceptualization for a new area of research, focusing on the systematic integration of an event log with indoor location system data.
... The business process management research community acknowledges [34,35] the importance of high quality event logs to enable high quality process analysis. The quality of process data becomes particularly relevant with the recent trend towards responsible data science, i.e., developing data science approaches, such as process mining techniques, that guarantee fairness, accuracy, transparency and confidentiality [36,37,38]. ...
Article
The efficiency and effectiveness of business processes are usually evaluated by Process Performance Indicators (PPIs), which are computed using process event logs. PPIs can be insightful only when they are measurable, i.e., reliable. This paper proposes to define PPI measurability on the basis of the quality of the data in the process logs. Then, based on this definition, a framework for PPI measurability assessment and improvement is presented. For the assessment, we propose novel definitions of PPI accuracy, completeness, consistency, timeliness and volume that contextualise the traditional definitions in the data quality literature to the case of process logs. For the improvement, we define a set of guidelines for improving the measurability of a PPI. These guidelines may concern improving existing event logs, for instance through data imputation, implementation or enhancement of the process monitoring systems, or updating the PPI definitions. A case study in a large-sized institution is discussed to show the feasibility and the practical value of the proposed framework.
... Common queue mining methods assume that one observes a complete activity life cycle, including arrival times (when an activity is scheduled), activity start times, and completion times. However, arrival times are often not observed in the data [5], in particular for processes in which a manual service is provided, such as patient treatments in hospitals. The reason being that data is often recorded only for the service providers, not the requesters. ...
Conference Paper
Process mining uses computational techniques for process-oriented data analysis. The use of poor quality input data will lead to unreliable analysis outcomes (garbage in - garbage out), as it does for other types of data analysis. Among the key inputs to process mining analyses are activity labels in event logs which represent tasks that have been performed. Activity labels are not immune from data quality issues. Fixing them is an important but challenging endeavour, which may require domain knowledge and can be computationally expensive. In this paper we propose to tackle this challenge from a novel angle by using a gamified crowdsourcing approach to the detection and repair of problematic activity labels, namely those with identical semantics but different syntax. Evaluation of the prototype with users and a real-life log showed promising results in terms of quality improvements achieved.
Conference Paper
Full-text available
Event logs are invaluable sources about the actual execution of processes. Most of process mining and postmortem analysis techniques depend on logs. All these techniques require the existence of the case ID to correlate the events. Real life logs are rarely originating from a centrally orchestrated process execution. Hence, case ID is missing, known as unlabeled logs. Correlating unla-beled events is a challenging problem that has received little attention in literature. Moreover, the few approaches addressing this challenge support acyclic business processes only. In this paper, we build on our previous work and propose an approach to deduce case ID for unlabeled event logs produced from cyclic business processes. As a result, a set of ranked labeled logs are generated. We evaluate our approach using real life logs.
Article
Full-text available
Research on sensor-based activity recognition has, recently, made significant progress and is attracting growing attention in a number of disciplines and application domains. However, there is a lack of high-level overview on this topic that can inform related communities of the research state of the art. In this paper, we present a comprehensive survey to examine the development and current status of various aspects of sensor-based activity recognition. We first discuss the general rationale and distinctions of vision-based and sensor-based activity recognition. Then, we review the major approaches and methods associated with sensor-based activity monitoring, modeling, and recognition from which strengths and weaknesses of those approaches are highlighted. We make a primary distinction in this paper between data-driven and knowledge-driven approaches, and use this distinction to structure our survey. We also discuss some promising directions for future research.
Conference Paper
Full-text available
The analysis of business processes is often challenging not only because of intricate dependencies between process activities but also because of various sources of faults within the activities. The automated detection of potential business process anomalies could immensely help business analysts and other process participants detect and understand the causes of process errors. This work focuses on temporal anomalies, i.e., anomalies concerning the runtime of activities within a process. To detect such anomalies, we propose a Bayesian model that can be automatically inferred form the Petri net representation of a business process. Probabilistic inference on the above model allows the detection of non-obvious and interdependent temporal anomalies.
Article
Full-text available
Today's organizations require techniques for automated transformation of their large data volumes into operational knowledge. This requirement may be addressed by using event recognition systems that detect events/activities of special significance within an organization, given streams of ‘low-level’ information that is very difficult to be utilized by humans. Consider, for example, the recognition of attacks on nodes of a computer network given the Transmission Control Protocol/Internet Protocol messages, the recognition of suspicious trader behaviour given the transactions in a financial market and the recognition of whale songs given a symbolic representation of whale sounds. Various event recognition systems have been proposed in the literature. Recognition systems with a logic-based representation of event structures, in particular, have been attracting considerable attention, because, among others, they exhibit a formal, declarative semantics, they have proven to be efficient and scalable and they are supported by machine learning tools automating the construction and refinement of event structures. In this paper, we review representative approaches of logic-based event recognition and discuss open research issues of this field. We illustrate the reviewed approaches with the use of a real-world case study: event recognition for city transport management.
Conference Paper
Full-text available
Process discovery is the problem of, given a log of observed behaviour, finding a process model that ‘best’ describes this behaviour. A large variety of process discovery algorithms has been proposed. However, no existing algorithm guarantees to return a fitting model (i.e., able to reproduce all observed behaviour) that is sound (free of deadlocks and other anomalies) in finite time. We present an extensible framework to discover from any given log a set of block-structured process models that are sound and fit the observed behaviour. In addition we characterise the minimal information required in the log to rediscover a particular process model. We then provide a polynomial-time algorithm for discovering a sound, fitting, block-structured model from any given log; we give sufficient conditions on the log for which our algorithm returns a model that is language-equivalent to the process model underlying the log, including unseen behaviour. The technique is implemented in a prototypical tool.
Conference Paper
Full-text available
Process mining techniques are able to extract knowledge from event logs commonly available in today’s information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events are being recorded, thus, providing detailed information about the history of processes. On the other hand, there is a need to improve and support business processes in competitive and rapidly changing environments. This manifesto is created by the IEEE Task Force on Process Mining and aims to promote the topic of process mining. Moreover, by defining a set of guiding principles and listing important challenges, this manifesto hopes to serve as a guide for software developers, scientists, consultants, business managers, and end-users. The goal is to increase the maturity of process mining as a new tool to improve the (re)design, control, and support of operational business processes.
Article
Full-text available
to difierentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the efiectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the difierent existing techniques in that category are variants of the basic tech- nique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the difierent directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.
Book
This is the second edition of Wil van der Aalst’s seminal book on process mining, which now discusses the field also in the broader context of data science and big data approaches. It includes several additions and updates, e.g. on inductive mining techniques, the notion of alignments, a considerably expanded section on software tools and a completely new chapter of process mining in the large. It is self-contained, while at the same time covering the entire process-mining spectrum from process discovery to predictive analytics. After a general introduction to data science and process mining in Part I, Part II provides the basics of business process modeling and data mining necessary to understand the remainder of the book. Next, Part III focuses on process discovery as the most important process mining task, while Part IV moves beyond discovering the control flow of processes, highlighting conformance checking, and organizational and time perspectives. Part V offers a guide to successfully applying process mining in practice, including an introduction to the widely used open-source tool ProM and several commercial products. Lastly, Part VI takes a step back, reflecting on the material presented and the key open challenges. Overall, this book provides a comprehensive overview of the state of the art in process mining. It is intended for business process analysts, business consultants, process managers, graduate students, and BPM researchers.
Conference Paper
Detecting and measuring resource queues is central to business process optimization. Queue mining techniques allow for the identification of bottlenecks and other process inefficiencies, based on event data. This work focuses on the discovery of resource queues. In particular, we investigate the impact of available information in an event log on the ability to accurately discover queue lengths, i.e. the number of cases waiting for an activity. Full queueing information, i.e. timestamps of enqueueing and exiting the queue, makes queue discovery trivial. However, often we see only the completions of activities. Therefore, we focus our analysis on logs with partial information, such as missing enqueueing times or missing both enqueueing and service start times. The proposed discovery algorithms handle concurrency and make use of statistical methods for discovering queues under this uncertainty. We evaluate the techniques using real-life event logs. A thorough analysis of the empirical results provides insights into the influence of information levels in the log on the accuracy of the measurements.
Conference Paper
Process mining is a rapidly developing field that aims at automated modeling of business processes based on data coming from event logs. In recent years, advances in tracking technologies, e.g., Real-Time Locating Systems (RTLS), put forward the ability to log business process events as location sensor data. To apply process mining techniques to such sensor data, one needs to overcome an abstraction gap, because location data recordings do not relate to the process directly. In this work, we solve the problem of mapping sensor data to event logs based on process knowledge. Specifically, we propose interactions as an intermediate knowledge layer between the sensor data and the event log. We solve the mapping problem via optimal matching between interactions and process instances. An empirical evaluation of our approach shows its feasibility and provides insights into the relation between ambiguities and deviations from process knowledge, and accuracy of the resulting event log.
Conference Paper
Process mining techniques analyze processes based on event data. A crucial assumption for process analysis is that events correspond to occurrences of meaningful activities. Often, low-level events recorded by information systems do not directly correspond to these. Abstraction methods, which provide a mapping from the recorded events to activities recognizable by process workers, are needed. Existing supervised abstraction methods require a full model of the entire process as input and cannot handle noise. This paper proposes a supervised abstraction method based on behavioral activity patterns that capture domain knowledge on the relation between activities and events. Through an alignment between the activity patterns and the low-level event logs an abstracted event log is obtained. Events in the abstracted event log correspond to instantiations of recognizable activities. The method is evaluated with domain experts of a Norwegian hospital using an event log from their digital whiteboard system. The evaluation shows that state-of-the art process mining methods provide valuable insights on the usage of the system when using the abstracted event log, but fail when using the original lower level event log.
Article
Service processes, for example in transportation, telecommunications or the health sector, are the backbone of today's economies. Conceptual models of service processes enable operational analysis that supports, e.g., resource provisioning or delay prediction. In the presence of event logs containing recorded traces of process execution, such operational models can be mined automatically.In this work, we target the analysis of resource-driven, scheduled processes based on event logs. We focus on processes for which there exists a pre-defined assignment of activity instances to resources that execute activities. Specifically, we approach the questions of conformance checking (how to assess the conformance of the schedule and the actual process execution) and performance improvement (how to improve the operational process performance). The first question is addressed based on a queueing network for both the schedule and the actual process execution. Based on these models, we detect operational deviations and then apply statistical inference and similarity measures to validate the scheduling assumptions, thereby identifying root-causes for these deviations. These results are the starting point for our technique to improve the operational performance. It suggests adaptations of the scheduling policy of the service process to decrease the tardiness (non-punctuality) and lower the flow time. We demonstrate the value of our approach based on a real-world dataset comprising clinical pathways of an outpatient clinic that have been recorded by a real-time location system (RTLS). Our results indicate that the presented technique enables localization of operational bottlenecks along with their root-causes, while our improvement technique yields a decrease in median tardiness and flow time by more than 20%.
Article
Urban mobility impacts urban life to a great extent. To enhance urban mobility, much research was invested in traveling time prediction: given an origin and destination, provide a passenger with an accurate estimation of how long a journey lasts. In this work, we investigate a novel combination of methods from Queueing Theory and Machine Learning in the prediction process. We propose a prediction engine that, given a scheduled bus journey (route) and a 'source/destination' pair, provides an estimate for the traveling time, while considering both historical data and real-time streams of information that are transmitted by buses. We propose a model that uses natural segmentation of the data according to bus stops and a set of predictors, some use learning while others are learning-free, to compute traveling time. Our empirical evaluation, using bus data that comes from the bus network in the city of Dublin, demonstrates that the snapshot principle, taken from Queueing Theory, works well yet suffers from outliers. To overcome the outliers problem, we use Machine Learning techniques as a regulator that assists in identifying outliers and propose prediction based on historical data.
Conference Paper
Process Mining techniques have been gaining attention, especially as concerns the discovery of predictive process models. Traditionally focused on workflows, they usually assume that process tasks are clearly specified, and referred to in the logs. This limits however their application to many real-life BPM environments (e.g. issue tracking systems) where the traced events do not match any predefined task, but yet keep lots of context data. In order to make the usage of predictive process mining to such logs more effective and easier, we devise a new approach, combining the discovery of different execution scenarios with the automatic abstraction of log events. The approach has been integrated in a prototype system, supporting the discovery, evaluation and reuse of predictive process models. Tests on real-life data show that the approach achieves compelling prediction accuracy w.r.t. state-of-the-art methods, and finds interesting activities’ and process variants’ descriptions.
Article
Companies increasingly adopt process-aware information systems (PAISs) to model, execute, monitor, and evolve their business processes. Though the handling of temporal constraints (e.g., deadlines or time lags between activities) is crucial for the proper support of business processes, existing PAISs vary significantly regarding the support of the temporal perspective. Both the formal specification and the operational support of temporal constraints constitute fundamental challenges in this context. In previous work, we introduced process time patterns, which facilitate the comparison and evaluation of PAISs in respect to their support of the temporal perspective. Furthermore, we provided empirical evidence for these time patterns. To avoid ambiguities and to ease the use as well as the implementation of the time patterns, this paper formally defines their semantics. To additionally foster the use of the patterns for a wide range of process modeling languages and to enable pattern integration with existing PAISs, the proposed semantics are expressed independently of a particular process meta model. Altogether, the presented pattern formalization will be fundamental for introducing the temporal perspective in PAISs.
Article
Information systems have been widely adopted to support service processes in various domains, e.g., in the telecommunication, finance, and health sectors. Information recorded by systems during the operation of these processes provide an angle for operational process analysis, commonly referred to as process mining. In this work, we establish a queueing perspective in process mining to address the online delay prediction problem, which refers to the time that the execution of an activity for a running instance of a service process is delayed due to queueing effects. We present predictors that treat queues as first-class citizens and either enhance existing regression-based techniques for process mining or are directly grounded in queueing theory. In particular, our predictors target multi-class service processes, in which requests are classified by a type that influences their processing. Further, we introduce queue mining techniques that derive the predictors from event logs recorded by an information system during process execution. Our evaluation based on large real-world datasets, from the telecommunications and financial sectors, shows that our techniques yield accurate online predictions of case delay and drastically improve over predictors neglecting the queueing perspective.
Article
While the maturity of process mining algorithms increases and more process mining tools enter the market, process mining projects still face the problem of different levels of abstraction when comparing events with modeled business activities. Current approaches for event log abstraction try to abstract from the events in an automated way that does not capture the required domain knowledge to fit business activities. This can lead to misinterpretation of discovered process models. We developed an approach that aims to abstract an event log to the same abstraction level that is needed by the business. We use domain knowledge extracted from existing process documentation to semi-automatically match events and activities. Our abstraction approach is able to deal with n:m relations between events and activities and also supports concurrency. We evaluated our approach in two case studies with a German IT outsourcing company.
Article
• Contains additional discussion and examples on left truncation as well as material on more general censoring and truncation patterns. • Introduces the martingale and counting process formulation swil lbe in a new chapter. • Develops multivariate failure time data in a separate chapter and extends the material on Markov and semi Markov formulations. • Presents new examples and applications of data analysis.
Conference Paper
Existing process mining techniques are able to discover process models from event logs where each event is known to have been produced by a given process instance. In this paper we remove this restriction and address the problem of discovering the process model when the event log is provided as an unlabelled stream of events. Using a probabilistic approach, it is possible to estimate the model by means of an iterative Expectaction-Maximization procedure. The same procedure can be used to find the case id in unlabelled event logs. A series of experiments show how the proposed technique performs under varying conditions and in the presence of certain workflow patterns. Results are presented for a running example based on a technical support process.
Article
The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied on generic or manually tuned distance metrics for estimating the similarity of potential duplicates. In this paper, we present a framework for improving duplicate detection using trainable measures of textual similarity. We propose to employ learnable text distance functions for each database field, and show that such measures are capable of adapting to the specific notion of similarity that is appropriate for the field's domain. We present two learnable text similarity measures suitable for this task: an extended variant of learnable string edit distance, and a novel vector-space based measure that employs a Support Vector Machine (SVM) for training. Experimental results on a range of datasets show that our framework can improve duplicate detection accuracy over traditional techniques.
Discovering Queues from Event Logs with Varying Levels of Information
  • Arik Senderovich
  • J J Sander
  • Leemans
  • M P Wil
  • Van Der Aalst
Arik Senderovich, Sander J. J. Leemans, Shahar Harel, Avigdor Gal, Avishai Mandelbaum, and Wil M. P. van der Aalst. 2015. Discovering Queues from Event Logs with Varying Levels of Information. In BPM Workshops 2015, Revised Papers. 154-166.
François Portet, and Georgios Paliouras
  • Alexander Artikis
  • Anastasios Skarlatidis
Alexander Artikis, Anastasios Skarlatidis, François Portet, and Georgios Paliouras. 2012. Logic-based event recognition. Knowledge Eng. Review 27, 4 (2012), 469-506.