Chapter

Process Minding: Closing the Big Data Gap

Abstract

The discipline of process mining was inaugurated in the BPM community. It flourished in a world of small(er) data, with roots in the communities of software engineering and databases and applications mainly in organizational and management settings. The introduction of big data, with its volume, velocity, variety, and veracity, and the big strides in data science research and practice pose new challenges to this research field. The paper positions process mining along the modern data life cycle, highlighting the challenges and suggesting directions in which data science disciplines (e.g., machine learning) may interact with a renewed process mining agenda.

... It could even reach beyond the mere process view and, for example, support theorizing about organizational change [11]. 1. Process mining as the bridge between process science and data science [33] Process mining originated in the BPM community and flourished first in a world of rather small data [9]. Only in the last decade have huge data pools representing process events emerged, and their exploitation now requires data science approaches and techniques, while the objective of using these and the subsequent calibration and control remain related to process science. ...
... Success in process mining projects usually requires collaboration of experts from both fields [9]. In that sense it is an interdisciplinary endeavor, indicating that distinct ingredients from two different paradigms are essential, and there is no sign of disciplinary merging and community building beyond the establishment as a subset of both fields. ...
Chapter
As citizen orientation and public value creation are more in the focus, how do we set priorities for the upcoming digital transformation in the public sector? Distinguishing data science and process science as paradigms that promote different directions for the transformation, this research seeks to improve the transparency of how IT-related decisions are directing projects and resources and thus promoting directions of public value production and delivery. Digital government research along this line may help constituents, IT experts and other stakeholders to engage in the needed discourse about the (not) wanted future of government performance and related technology usage.
Preprint
The paper describes a model of subjective goal-oriented semantics extending the standard "view-from-nowhere" approach. Generalization is achieved by using a spherical vector structure essentially supplementing the classical bit with a circular dimension, organizing contexts according to their subjective causal ordering. This structure, known in quantum theory as a qubit, is shown to be a universal representation of contextual-situated meaning at the core of human cognition. The subjective semantic dimension, inferred from fundamental oscillation dynamics, is discretized to six process-stage prototypes expressed in common language. The predicted process-semantic map of natural language terms is confirmed by the open-source word2vec data.
This book constitutes the proceedings of the 20th IFIP WG 8.5 International Conference on Electronic Government, EGOV 2021, held in Granada, Spain, in September 2021, in conjunction with the IFIP WG 8.5 IFIP International Conference on Electronic Participation (ePart 2021) and the International Conference for E-Democracy and Open Government Conference (CeDEM 2021). The 23 full papers presented were carefully reviewed and selected from 63 submissions. The papers are clustered under the following topical sections: digital transformation; digital services and open government; open data: social and technical perspectives; smart cities; and data analytics, decision making, and artificial intelligence. Chapters "Perceived and Actual Lock-in Effects Amongst Swedish Public Sector Organisations when Using a SaaS Solution" and "Ronda: Real-time Data Provision, Processing and Publication for Open Data" are available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.
Article
Time prediction is an essential component of decision making in various Artificial Intelligence application areas, including transportation systems, healthcare, and manufacturing. Predictions are required for efficient resource allocation and scheduling, optimized routing, and temporal action planning. In this work, we focus on time prediction in congested systems, where entities share scarce resources. To achieve accurate and explainable time prediction in this setting, features describing system congestion (e.g., workload and resource availability) must be considered. These features are typically gathered using process knowledge (i.e., insights on the interplay of a system’s entities). Such knowledge is expensive to gather and may be completely unavailable. In order to automatically extract such features from data without prior process knowledge, we propose the model of congestion graphs, which are grounded in queueing theory. We show how congestion graphs are mined from raw event data using queueing-theory-based assumptions on the information contained in these logs. We evaluate our approach on two real-world datasets from healthcare systems where scarce resources prevail: an emergency department and an outpatient cancer clinic. Our experimental results show that using automatic generation of congestion features, we get an up to 23% improvement in terms of relative error in time prediction, compared to common baseline methods. We also detail how congestion graphs can be used to explain delays in the system.
Conference Paper
In the last decade, with the availability of large datasets and more computing power, machine learning systems have achieved (super)human performance in a wide variety of tasks. Examples of this rapid development can be seen in image recognition, speech analysis, strategic game planning and many more. The problem with many state-of-the-art models is a lack of transparency and interpretability. The lack thereof is a major drawback in many applications, e.g. healthcare and finance, where the rationale for a model's decision is a requirement for trust. In the light of these issues, explainable artificial intelligence (XAI) has become an area of interest in the research community. This paper summarizes recent developments in XAI in supervised learning, starts a discussion on its connection with artificial general intelligence, and gives proposals for further research directions.
Article
The evolution of data accumulation, management, analytics, and visualization has led to the coining of the term big data, which challenges the task of data integration. This task, common to any matching problem in computer science, involves generating alignments between structured data in an automated fashion. Historically, set-based measures, based upon binary similarity matrices (match/non-match), have dominated evaluation practices of matching tasks. However, in the presence of big data, such measures no longer suffice. In this work, we propose evaluation methods for non-binary matrices as well. Non-binary evaluation is formally defined together with several new, non-binary measures using a vector space representation of the matching outcome. We provide empirical analyses of the usefulness of non-binary evaluation and show its superiority over its binary counterparts in several problem domains.
Conference Paper
Process mining offers a variety of techniques for analyzing process execution event logs. Although process discovery algorithms construct end-to-end process models, they often have difficulties dealing with the complexity of real-life event logs. Discovered models may contain either complex or over-generalized fragments, the interpretation of which is difficult, and can result in misleading insights. Detecting and visualizing behavioral patterns instead of creating model structures can reduce complexity and give more accurate insights into recorded behaviors. Unsupervised detection techniques, based on statistical properties of the log only, generate a multitude of patterns and lack domain context. Supervised pattern detection requires a domain expert to specify patterns manually and lacks the event log context. In this paper, we reconcile supervised and unsupervised pattern detection. We visualize the log and help users extract patterns of interest from the log or obtain patterns through unsupervised learning automatically. Pattern matches are visualized in the context of the event log (also showing concurrency and additional contextual information). Earlier patterns can be extended or modified based on the insights. This enables an interactive and iterative approach to identify complex and concrete behavioral patterns in event logs. We implemented our approach in the ProM framework and evaluated the tool using both the BPI Challenge 2012 log of a loan application process and an insurance claims log from a major Australian insurance company.
Article
The aim of process discovery, originating from the area of process mining, is to discover a process model based on business process execution data. A majority of process discovery techniques rely on an event log as an input. An event log is a static source of historical data capturing the execution of a business process. In this paper we focus on process discovery relying on online streams of business process execution events. Learning process models from event streams poses both challenges and opportunities, i.e. we need to handle unlimited amounts of data using finite memory and, preferably, constant time. We propose a generic architecture that allows for adopting several classes of existing process discovery techniques in the context of event streams. Moreover, we provide several instantiations of the architecture, accompanied by implementations in the process mining tool-kit ProM (http://promtools.org). Using these instantiations, we evaluate several dimensions of stream-based process discovery. The evaluation shows that the proposed architecture allows us to lift process discovery to the streaming domain.
Article
Although most business processes change over time, contemporary process mining techniques tend to analyze these processes as if they are in a steady state. Processes may change suddenly or gradually. The drift may be periodic (e.g., because of seasonal influences) or one-of-a-kind (e.g., the effects of new legislation). For process management, it is crucial to discover and understand such concept drifts in processes. This paper presents a generic framework and specific techniques to detect when a process changes and to localize the parts of the process that have changed. Different features are proposed to characterize relationships among activities. These features are used to discover differences between successive populations. The approach has been implemented as a plug-in of the ProM process mining framework and has been evaluated using both simulated event data exhibiting controlled concept drifts and real-life event data from a Dutch municipality.
Article
Process mining techniques use event data to discover process models, to check the conformance of predefined process models, and to extend such models with information about bottlenecks, decisions, and resource usage. These techniques are driven by observed events rather than hand-made models. Event logs are used to learn and enrich process models. By replaying history using the model, it is possible to establish a precise relationship between events and model elements. This relationship can be used to check conformance and to analyze performance. For example, it is possible to diagnose deviations from the modeled behavior. The severity of each deviation can be quantified. Moreover, the relationship established during replay and the timestamps in the event log can be combined to show bottlenecks. These examples illustrate the importance of maintaining a proper alignment between event log and process model. Therefore, we elaborate on the realization of such alignments and their application to conformance checking and performance analysis. © 2012 Wiley Periodicals, Inc.
Article
Process mining techniques have proven to be a valuable tool for analyzing the execution of business processes. They rely on logs that identify events at an activity level, i.e., most process mining techniques assume that the information system explicitly supports the notion of activities/tasks. This is often not the case and only low-level events are being supported and logged. For example, users may provide different pieces of data which together constitute a single activity. The technique introduced in this paper uses clustering algorithms to derive activity logs from lower-level data modification logs, as produced by virtually every information system. This approach was implemented in the context of the ProM framework and its goal is to widen the scope of processes that can be analyzed using existing process mining techniques.
Conference Paper
Modern search engines have to be fast to satisfy users, so there are hard back-end latency requirements. The set of features useful for search ranking functions, though, continues to grow, making feature computation a latency bottleneck. As a result, not all available features can be used for ranking, and in fact, much of the time, only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. To this end, we explore different feature selection methods using boosted regression trees, including both greedy approaches (selecting the features with highest relative importance as computed by boosted trees; discounting importance by feature similarity) and a randomized approach. We evaluate and compare these approaches using data from a commercial search engine. The experimental results show that the proposed randomized feature selection with feature-importance-based backward elimination outperforms greedy approaches and achieves a comparable relevance with 30 features to a full-feature model trained with 419 features and the same modeling parameters.
Conference Paper
The goal of performance analysis of business processes is to gain insights into operational processes, for the purpose of optimizing them. To intuitively show which parts of the process might be improved, performance analysis results can be projected onto process models. This way, bottlenecks can quickly be identified and resolved. Unfortunately, for many operational processes, good models that describe the process accurately and intuitively are unavailable. Process mining, or more precisely, process discovery, aims at deriving such models from events logged by information systems. However, many mining techniques assume that all events in an event log are logged at the same level of abstraction, which in practice is often not the case. Furthermore, many mining algorithms produce results that are hard for process specialists to understand. In this paper, we propose a simple clustering algorithm to derive a model from an event log, such that this model only contains a limited set of nodes and edges. Each node represents a set of activities performed in the process, but many nodes can refer to many activities and vice versa. Using the discovered model, which represents the process at a potentially high level of abstraction, we present two different ways to project performance information onto it. Using these performance projections, process owners can gain insights into the process under consideration in an intuitive way. To validate our approach, we apply our work to a real-life case from a Dutch municipality.
Book
This book introduces readers to the field of conformance checking as a whole and outlines the fundamental relation between modelled and recorded behaviour. Conformance checking interrelates the modelled and recorded behaviour of a given process and provides techniques and methods for comparing and analysing observed instances of a process in the presence of a model, independent of the model’s origin. Its goal is to provide an overview of the essential techniques and methods in this field at an intuitive level, together with precise formalisations of its underlying principles. The book is divided into three parts, that are meant to cover different perspectives of the field of conformance checking. Part I presents a comprehensive yet accessible overview of the essential concepts used to interrelate modelled and recorded behaviour. It also serves as a reference for assessing how conformance checking efforts could be applied in specific domains. Next, Part II provides readers with detailed insights into algorithms for conformance checking, including the most commonly used formal notions and their instantiation for specific analysis questions. Lastly, Part III highlights applications that help to make sense of conformance checking results, thereby providing a necessary next step to increase the value of a given process model. They help to interpret the outcomes of conformance checking and incorporate them by means of enhancement and repair techniques. Providing the core building blocks of conformance checking and describing its main applications, this book mainly addresses students specializing in business process management, researchers entering process mining and conformance checking for the first time, and advanced professionals whose work involves process evaluation, modelling and optimization.
Article
Discovery plays a key role in data-driven analysis of business processes. The vast majority of contemporary discovery algorithms aims at the identification of control-flow constructs. The increase in data richness, however, enables discovery that incorporates the context of process execution beyond the control-flow perspective. A “control-flow first” approach, where context data serves for refinement and annotation, is limited and fails to detect fundamental changes in the control-flow that depend on context data. In this work, we thus propose a novel approach for combining the control-flow and data perspectives under a single roof by extending inductive process discovery. Our approach provides criteria under which context data, handled through unsupervised learning, take priority over control-flow in guiding process discovery. The resulting model is a process tree, in which some operators carry data semantics instead of control-flow semantics. We show that the proposed approach produces trees that are context consistent, deterministic, complete, and can be explainable without a major quality reduction. We evaluate the approach using synthetic and real-world datasets, showing that the resulting models are superior to state-of-the-art discovery methods in terms of measures based on multi-perspective alignments.
Article
Operational process models such as generalised stochastic Petri nets (GSPNs) are useful when answering performance questions about business processes (e.g. ‘how long will it take for a case to finish?’). Recently, methods for process mining have been developed to discover and enrich operational models based on a log of recorded executions of processes, which enables evidence-based process analysis. To avoid a bias due to infrequent execution paths, discovery algorithms strive for a balance between over-fitting and under-fitting regarding the originating log. However, state-of-the-art discovery algorithms address this balance solely for the control-flow dimension, neglecting the impact of their design choices in terms of performance measures. In this work, we thus offer a technique for controlled performance-driven model reduction of GSPNs, using structural simplification rules, namely foldings. We propose a set of foldings that aggregate or eliminate performance information. We further prove the soundness of these foldings in terms of stability preservation and provide bounds on the error that they introduce with respect to the original model. Furthermore, we show how to find an optimal sequence of simplification rules, such that their application yields a minimal model under a given error budget for performance estimation. We evaluate the approach with two real-world datasets from the healthcare and telecommunication domains, showing that model simplification indeed enables a controlled reduction of model size, while preserving performance metrics with respect to the original model. Moreover, we show that aggregation dominates elimination when abstracting performance models by preventing under-fitting due to information loss.
Article
Process mining methods allow analysts to exploit logs of historical executions of business processes in order to extract insights regarding the actual performance of these processes. One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes as input an event log, and produces as output a business process model that captures the control-flow relations between tasks that are observed in or implied by the event log. Several dozen automated process discovery methods have been proposed in the past two decades, striking different trade-offs between scalability, accuracy and complexity of the resulting models. So far, automated process discovery methods have been evaluated in an ad hoc manner, with different authors employing different datasets, experimental setups, evaluation measures and baselines, often leading to incomparable conclusions and sometimes unreproducible results due to the use of non-publicly available datasets. In this setting, this article provides a systematic review of automated process discovery methods and a systematic comparative evaluation of existing implementations of these methods using an open-source benchmark covering nine publicly available real-life event logs and eight quality metrics. The review and evaluation results highlight gaps and unexplored trade-offs in the field, including the lack of scalability of several proposals in the field and a strong divergence in the performance of different methods with respect to different quality metrics. The proposed benchmark allows researchers to empirically compare new automated process discovery methods against existing ones in a unified setting.
Conference Paper
While models and event logs are readily available in modern organizations, their quality can seldom be trusted. Raw event recordings are often noisy, incomplete, and contain erroneous recordings. The quality of process models, both conceptual and data-driven, heavily depends on the inputs and parameters that shape these models, such as domain expertise of the modelers and the quality of execution data. The mentioned quality issues are specifically a challenge for conformance checking. Conformance checking is the process mining task that aims at coping with low model or log quality by comparing the model against the corresponding log, or vice versa. The prevalent assumption in the literature is that at least one of the two can be fully trusted. In this work, we propose a generalized conformance checking framework that caters for the common case in which one fully trusts neither the log nor the model. In our experiments we show that our proposed framework balances the trust in model and log as a generalization of state-of-the-art conformance checking techniques.
Conference Paper
Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
Conference Paper
Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust in a model. Trust is fundamental if one plans to take action based on a prediction, or when choosing whether or not to deploy a new model. Such understanding further provides insights into the model, which can be used to turn an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We further propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). The usefulness of explanations is shown via novel experiments, both simulated and with human subjects. Our explanations empower users in various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and detecting why a classifier should not be trusted.
Conference Paper
Process mining is a rapidly developing field that aims at automated modeling of business processes based on data coming from event logs. In recent years, advances in tracking technologies, e.g., Real-Time Locating Systems (RTLS), put forward the ability to log business process events as location sensor data. To apply process mining techniques to such sensor data, one needs to overcome an abstraction gap, because location data recordings do not relate to the process directly. In this work, we solve the problem of mapping sensor data to event logs based on process knowledge. Specifically, we propose interactions as an intermediate knowledge layer between the sensor data and the event log. We solve the mapping problem via optimal matching between interactions and process instances. An empirical evaluation of our approach shows its feasibility and provides insights into the relation between ambiguities and deviations from process knowledge, and accuracy of the resulting event log.
Chapter
In recent years, data science emerged as a new and important discipline. It can be viewed as an amalgamation of classical disciplines like statistics, data mining, databases, and distributed systems. Existing approaches need to be combined to turn abundantly available data into value for individuals, organizations, and society. Moreover, new challenges have emerged, not just in terms of size (“Big Data”) but also in terms of the questions to be answered. This book focuses on the analysis of behavior based on event data. Process mining techniques use event data to discover processes, check compliance, analyze bottlenecks, compare process variants, and suggest improvements. In later chapters, we will show that process mining provides powerful tools for today’s data scientist. However, before introducing the main topic of the book, we provide an overview of the data science discipline.
Conference Paper
The performance of scheduled business processes is of central importance for services and manufacturing systems. However, current techniques for performance analysis do not take both queueing semantics and the process perspective into account. In this work, we address this gap by developing a novel method for utilizing rich process logs to analyze performance of scheduled processes. The proposed method combines simulation, queueing analytics, and statistical methods. At the heart of our approach is the discovery of an individual-case model from data, based on an extension of the Colored Petri Nets formalism. The resulting model can be simulated to answer performance queries, yet it is computationally inefficient. To reduce the computational cost, the discovered model is projected into Queueing Networks, a formalism that enables efficient performance analytics. The projection is facilitated by a sequence of folding operations that alter the structure and dynamics of the Petri Net model. We evaluate the approach with a real-world dataset from Dana-Farber Cancer Institute, a large outpatient cancer hospital in the United States.
Conference Paper
Process mining techniques analyze processes based on event data. A crucial assumption for process analysis is that events correspond to occurrences of meaningful activities. Often, low-level events recorded by information systems do not directly correspond to these. Abstraction methods, which provide a mapping from the recorded events to activities recognizable by process workers, are needed. Existing supervised abstraction methods require a full model of the entire process as input and cannot handle noise. This paper proposes a supervised abstraction method based on behavioral activity patterns that capture domain knowledge on the relation between activities and events. Through an alignment between the activity patterns and the low-level event logs, an abstracted event log is obtained. Events in the abstracted event log correspond to instantiations of recognizable activities. The method is evaluated with domain experts of a Norwegian hospital using an event log from their digital whiteboard system. The evaluation shows that state-of-the-art process mining methods provide valuable insights on the usage of the system when using the abstracted event log, but fail when using the original lower-level event log.
Conference Paper
The visualization of models is essential for user-friendly human-machine interactions during Process Mining. A simple graphical representation contributes to give intuitive information about the behavior of a system. However, complex systems cannot always be represented with succinct models that can be easily visualized. Quality-preserving model simplifications can be of paramount importance to alleviate the complexity of finding useful and attractive visualizations. This paper presents a collection of log-based techniques to simplify process models. The techniques trade off visual-friendly properties with quality metrics related to logs, such as fitness and precision, to avoid degrading the resulting model. The algorithms, either cast as optimization problems or heuristically guided, find simplified versions of the initial process model, and can be applied in the final stage of the process mining life-cycle, between the discovery of a process model and the deployment to the final user. A tool has been developed and tested on large logs, producing simplified process models that are one order of magnitude smaller while keeping fitness and precision under reasonable margins.
Conference Paper
Information systems have been widely adopted to support service processes in various domains, e.g., in the telecommunication, finance, and health sectors. Recently, work on process mining showed how management of these processes, and engineering of supporting systems, can be guided by models extracted from the event logs that are recorded during process operation. In this work, we establish a queueing perspective in operational process mining. We propose to consider queues as first-class citizens and use queueing theory as a basis for queue mining techniques. To demonstrate the value of queue mining, we revisit the specific operational problem of online delay prediction: using event data, we show that queue mining yields accurate online predictions of case delay.
Article
Urban mobility impacts urban life to a great extent. To enhance urban mobility, much research was invested in traveling time prediction: given an origin and destination, provide a passenger with an accurate estimation of how long a journey lasts. In this work, we investigate a novel combination of methods from Queueing Theory and Machine Learning in the prediction process. We propose a prediction engine that, given a scheduled bus journey (route) and a 'source/destination' pair, provides an estimate for the traveling time, while considering both historical data and real-time streams of information that are transmitted by buses. We propose a model that uses natural segmentation of the data according to bus stops and a set of predictors, some use learning while others are learning-free, to compute traveling time. Our empirical evaluation, using bus data that comes from the bus network in the city of Dublin, demonstrates that the snapshot principle, taken from Queueing Theory, works well yet suffers from outliers. To overcome the outliers problem, we use Machine Learning techniques as a regulator that assists in identifying outliers and propose prediction based on historical data.
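The snapshot principle mentioned above can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation, so this assumes the common reading: the predicted time for each route segment is the time taken by the most recent bus that completed that segment. The record layout, the function name `snapshot_predict`, and the segment identifiers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (segment_id, entry_time, exit_time), times in seconds.
Traversal = Tuple[str, float, float]


def snapshot_predict(route: List[str], history: List[Traversal], now: float) -> float:
    """Predict the travel time of a route as the sum, over its segments, of the
    travel time of the most recent bus that finished each segment before `now`."""
    latest: Dict[str, Tuple[float, float]] = {}  # segment -> (exit_time, duration)
    for segment, entry, exit_ in history:
        # Only traversals already completed at prediction time are observable.
        if exit_ <= now and (segment not in latest or exit_ > latest[segment][0]):
            latest[segment] = (exit_, exit_ - entry)
    missing = [s for s in route if s not in latest]
    if missing:
        raise ValueError(f"no completed traversal observed for segments: {missing}")
    return sum(latest[s][1] for s in route)


history = [
    ("A-B", 0.0, 100.0),
    ("A-B", 200.0, 350.0),  # most recent completed traversal of A-B
    ("B-C", 100.0, 160.0),  # most recent completed traversal of B-C
    ("B-C", 400.0, 500.0),  # not yet completed at now=380, so ignored
]
estimate = snapshot_predict(["A-B", "B-C"], history, now=380.0)
```

Because the estimate relies on a single recent observation per segment, one unusually slow or fast bus shifts the prediction directly, which is consistent with the sensitivity to outliers that the abstract reports.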
Book
Googling the term “Business Process Management” in May 2008 yields some 6.4 million hits, the great majority of which (based on sampling) seem to concern the so-called BPM software systems. This is ironic and unfortunate, because in fact IT in general, and such BPM systems in particular, is at most a peripheral aspect of Business Process Management. In fact, Business Process Management (BPM) is a comprehensive system for managing and transforming organizational operations, based on what is arguably the first set of new ideas about organizational performance since the Industrial Revolution.
Conference Paper
Modern information systems that support complex business processes generally maintain significant amounts of process execution data, particularly records of events corresponding to the execution of activities (event logs). In this paper, we present an approach to analyze such event logs in order to predictively monitor business constraints during business process execution. At any point during an execution of a process, the user can define business constraints in the form of linear temporal logic rules. When an activity is being executed, the framework identifies input data values that are more (or less) likely to lead to the achievement of each business constraint. Unlike reactive compliance monitoring approaches that detect violations only after they have occurred, our predictive monitoring approach provides early advice so that users can steer ongoing process executions towards the achievement of business constraints. In other words, violations are predicted (and potentially prevented) rather than merely detected. The approach has been implemented in the ProM process mining toolset and validated on a real-life log pertaining to the treatment of cancer patients in a large hospital.
Article
Modern information systems that support complex business processes generally maintain significant amounts of process execution data, particularly records of events corresponding to the execution of activities (event logs). In this paper, we present an approach to analyze such event logs in order to predictively monitor business goals during business process execution. At any point during an execution of a process, the user can define business goals in the form of linear temporal logic rules. When an activity is being executed, the framework identifies input data values that are more (or less) likely to lead to the achievement of each business goal. Unlike reactive compliance monitoring approaches that detect violations only after they have occurred, our predictive monitoring approach provides early advice so that users can steer ongoing process executions towards the achievement of business goals. In other words, violations are predicted (and potentially prevented) rather than merely detected. The approach has been implemented in the ProM process mining toolset and validated on a real-life log pertaining to the treatment of cancer patients in a large hospital.
Article
Similarity is an important and widely used concept. Previous definitions of similarity are tied to a particular application or a form of knowledge representation. We present an information-theoretic definition of similarity that is applicable as long as there is a probabilistic model. We demonstrate how our definition can be used to measure the similarity in a number of different domains.
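The abstract above does not state the definition itself. As a hedged illustration, one widely cited information-theoretic form measures similarity as the ratio between the information in what two objects have in common and the information needed to describe them fully: sim(A, B) = log P(common(A, B)) / log P(description(A, B)). The sketch below assumes that form; the function name and the example probabilities are hypothetical, and the probabilities would come from whatever probabilistic model is available.

```python
import math


def information_theoretic_similarity(p_common: float, p_description: float) -> float:
    """Ratio of the information content of the commonality to the information
    content of the full description (assumed form, not taken from the abstract).

    p_common:      probability of the statement describing what A and B share
    p_description: probability of the statement fully describing A and B
    """
    # The full description is at least as specific as the commonality,
    # so it can be no more probable; it must also carry some information.
    if not (0.0 < p_description <= p_common <= 1.0) or p_description == 1.0:
        raise ValueError("need 0 < p_description <= p_common <= 1 and p_description < 1")
    return math.log(p_common) / math.log(p_description)


# Hypothetical probabilities: the shared part carries half the information
# (2 bits) of the full description (4 bits).
half = information_theoretic_similarity(0.25, 0.0625)
identical = information_theoretic_similarity(0.1, 0.1)   # everything is shared
unrelated = information_theoretic_similarity(1.0, 0.1)   # nothing informative is shared
```

Under this form the value ranges from 0, when the commonality carries no information, to 1, when the commonality already is the full description.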
Dealing with concept drifts in process mining
  • RJC Bose
  • WM Van Der Aalst
  • I Žliobaitė
  • M Pechenizkiy
  • M Dumas
Ensemble-based prediction of business processes bottlenecks with recurrent concept drifts
  • Y Spenrath
  • M Hassani
Spenrath, Y., Hassani, M.: Ensemble-based prediction of business processes bottlenecks with recurrent concept drifts. In: EDBT/ICDT Workshops. (2019)
Queue Mining: Service Perspectives in Process Mining
  • A Senderovich
Senderovich, A.: Queue Mining: Service Perspectives in Process Mining. PhD dissertation, Technion-Israel Institute of Technology (2017)