Conference Paper

P³-Folder: Optimal Model Simplification for Improving Accuracy in Process Performance Prediction


Abstract

Operational process models such as generalised stochastic Petri nets (GSPNs) are useful when answering performance queries on business processes (e.g. ‘how long will it take for a case to finish?’). Recently, methods for process mining have been developed to discover and enrich operational models based on a log of recorded executions of processes, which enables evidence-based process analysis. To avoid a bias due to infrequent execution paths, discovery algorithms strive for a balance between over-fitting and under-fitting regarding the originating log. However, state-of-the-art discovery algorithms address this balance solely for the control-flow dimension, neglecting possible over-fitting in terms of performance annotations. In this work, we thus offer a technique for performance-driven model reduction of GSPNs, using structural simplification rules. Each rule induces an error in performance estimates with respect to the original model. However, we show that this error is bounded and that the reduction in model parameters incurred by the simplification rules increases the accuracy of process performance prediction. We further show how to find an optimal sequence of applying simplification rules to obtain a minimal model under a given error budget for the performance estimates. We evaluate the approach with a real-world case in the healthcare domain, showing that model simplification indeed yields significant improvements in time prediction accuracy.
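The optimal-sequence search admits a compact illustration. Under the simplifying assumption that each candidate folding contributes an independent, additive error bound and a fixed parameter reduction, selecting foldings under an error budget reduces to a knapsack-style problem; the paper itself solves a richer integer linear program over rule sequences, so the Python sketch below, with hypothetical rule names, only illustrates the trade-off.

# Hypothetical sketch: pick a subset of candidate foldings whose summed
# error bounds stay within the budget while the number of removed model
# parameters is maximised (a 0/1 knapsack; the paper's ILP over rule
# sequences is richer than this).
from typing import List, Tuple

def select_foldings(candidates: List[Tuple[str, int, float]],
                    error_budget: float, step: float = 0.01) -> List[str]:
    """candidates: (rule_name, params_removed, error_bound) triples."""
    slots = int(round(error_budget / step)) + 1
    best = [0] * slots            # best[b]: max params removed with error <= b*step
    chosen = [[] for _ in range(slots)]
    for name, gain, err in candidates:
        cost = int(round(err / step))
        for b in range(slots - 1, cost - 1, -1):
            if best[b - cost] + gain > best[b]:
                best[b] = best[b - cost] + gain
                chosen[b] = chosen[b - cost] + [name]
    return chosen[-1]

print(select_foldings([("fold_seq_AB", 3, 0.02),
                       ("fold_choice_CD", 2, 0.05),
                       ("eliminate_E", 4, 0.06)], error_budget=0.08))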


... Defining such a model manually would require deep knowledge of the process and a high level of expertise, especially in scenarios where the process is not enacted and traced according to a given well-defined workflow schema. This explains the many efforts that have been devoted to extracting such a model automatically from historical traces [13][14][15][16][17][18][19][20][21], based on suitable prediction-oriented inductive learning methods. ...
... R1: Combining forecasts on ongoing and future process instances, reusing process/time-series prediction methods. The performance outcomes of each ongoing process instance can be forecast by leveraging, as a base prediction technique, one of the many that have been developed in the active field of performance-oriented predictive process monitoring [13][14][15][16][17][18][19][20][21]. As there is no consensus on which of these solutions is the best one, for the sake of versatility and extensibility it is convenient to define an A-PPI prediction approach parametrically with respect to its underlying single-instance prediction technique. ...
... In particular, as to the fundamental requirement R1, all the approaches to the prediction of (numerical) performance measures developed in the field of predictive process monitoring [13][14][15][16][17][18][19][20][21] provide forecasts for single process instances, but cannot infer anything about the process instances of the current window starting after the current checkpoint. On the other hand, the approaches that only forecast aggregate performances over future time ranges based on time-series predictors [23,24] assume that all the past values of the target time series (i.e. the series of aggregate performance values computed for all windows/slots of the process) are known with a sufficient level of certainty; however, this assumption does not hold in our setting, where the process instances of a window/slot may terminate well after the end of the window/slot. ...
Article
Monitoring the performance of a business process is a key issue in many organizations, especially when the process must comply with predefined performance constraints. In such a case, empowering the monitoring system with prediction capabilities would allow us to anticipate a constraint violation and possibly trigger corrective measures to prevent it. Although the problem of making run-time predictions for a process based on pre-mortem log data is an active research topic in Process Mining, current predictive monitoring approaches in this field only support predictions at the level of a single process instance, whereas process performance constraints are often defined in an aggregated form, according to predefined time windows. Moreover, most of these approaches cannot work well on the traces of a lowly-structured business process when these traces do not refer to well-defined process tasks/activities. For such a challenging setting, we define an approach to the problem of predicting whether the process instances of a given (unfinished) time window will violate an aggregate performance requirement. The approach mainly relies on inducing and integrating two complementary predictive models: (1) a clustering-based predictor for estimating the outcome of each ongoing process instance, and (2) a time-series predictor for estimating the performance outcome of “future” process instances that will fall in the window after the moment when the prediction is being made (i.e. instances, not started yet, that will start by the end of the window). Both models are expected to benefit from the availability of aggregate context data regarding the environment that surrounds the process. This discovery approach is conceived as the core of an advanced performance monitoring system, for which an event-based conceptual architecture is proposed here. Tests on real-life event data confirmed the validity of our approach in terms of accuracy, robustness, scalability, and usability.
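At prediction time, the two complementary models are combined over a window: per-instance forecasts cover the ongoing cases, and the time-series model covers cases expected to start before the window closes. A minimal Python sketch of this combination; the interfaces and the stand-in regressor are illustrative, not the paper's.

# Sketch (hypothetical interfaces) of the two-model idea: the aggregate
# outcome of a time window combines per-instance predictions for ongoing
# cases with a time-series forecast for cases not yet started.
def predict_window_aggregate(ongoing_cases, predict_case, ts_forecast_future):
    """ongoing_cases: partial traces; predict_case: per-instance regressor;
    ts_forecast_future: (n_future, mean_outcome) from a time-series model."""
    ongoing_preds = [predict_case(trace) for trace in ongoing_cases]
    n_future, future_mean = ts_forecast_future
    total = sum(ongoing_preds) + n_future * future_mean
    count = len(ongoing_preds) + n_future
    return total / count if count else float("nan")

# Toy usage: mean cycle time of a window with 3 ongoing and ~2 future cases.
print(predict_window_aggregate(
    [["a", "b"], ["a"], ["a", "b", "c"]],
    predict_case=lambda t: 10.0 - 2.0 * len(t),   # stand-in regressor
    ts_forecast_future=(2, 7.5)))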
... In [50], the authors propose simplifying Petri net (PN) models to improve performance prediction. Their method, named P³-Folder, applies a set of structural simplification rules and uses Integer Linear Programming (ILP) to select an optimal sequence of them. ...
... A higher number of data events for the encoding implies a decrease in the efficiency of the system. To solve this problem, a variation in the number of events for the prediction is considered in several works, e.g. ...

Data sets used by the surveyed approaches (Name | Size | Refs | Availability):
<<unknown>> | 1,000 traces | [30] | n/a
Dutch municipality | 796 cases | [61] | n/a
Manufacturing process | 1,030 instances | [25] | n/a
Transshipment system | 5,336 traces | [20], [2], [3], [21], [9] | n/a
Logistic provider | 10,000 traces, 784 cases | [26], [45] | n/a
CARGO 2000 system | 3,942 instances | [36], [37] | yes
ACMEBOT process | 5,000 traces | [29] | n/a
BPI Challenge 2011 (Dutch Academic Hospital) | 1,100 cases | [32], [31], [22], [64], [18] | yes
BPI Challenge 2012 (Dutch financial bank) | 13,087 traces | [41], [4], [46], [5], [55] | yes
BPI Challenge 2015 (Dutch municipalities) | 1,199 cases | [17], [18] | yes
BPI Challenge 2013 (Volvo IT incidents) | - | [5], [34] | yes
<<unknown>> | 3,777 instances | [1] | n/a
Air traffic information | 119 event logs | [6] | n/a
Sending for credit collection | 1,500 and 5,000 traces | [43] | n/a
Insurance company (claim handling process) | 1,065 traces | [11] | n/a
Israeli bank call center | 7,000 traces | [52] | n/a
Road fines log | 7,300 traces | [44] | n/a
US Hospital | - | [50] | n/a
Personal loan process | 9,350 instances | [12] | n/a
Marketing campaign process | - | [59] | n/a
LtC process | - | [56] | n/a
... The different data sets, quality metrics and input features employed hinder the comparison. Although the majority of methods achieve a reasonable prediction performance, i.e. precision and accuracy rates higher than 70%, the lack of comparison prevents the determination of which of the proposals obtains the best global performance rates. The second issue is the absence of available software for the proposals; in consequence, users cannot test the validity of these methods using different data sets. ...

Types of predicted values and corresponding references:
Remaining time | [45], [61], [46], [2], [20], [43], [39], [55], [63], [64], [50], [9], [44], [59], [52]
Risk probability | [11], [37], [41], [42], [26], [6], [12], [56]
Any value of indicator | [31], [3], [2], [25], [36], [29], [30], [17], [58], [34]
LTL rule | [32], [22], [18]
Aggregate metrics | [21], [29]
Next event | [1], [4], [5], [55], [63], [64], [44], [59], [28]
Article
Full-text available
Nowadays, process mining is becoming a growing area of interest in business process management (BPM). Process mining consists in the extraction of information from the event logs of a business process. From this information, we can discover process models, monitor and improve our processes. One of the applications of process mining, is the predictive monitoring of business process. The aim of these techniques is the prediction of quantifiable metrics of a running process instance with the generation of predictive models. The most representative approaches for the runtime prediction of business process are summarized in this paper. The different types of computational predictive methods, such as statistical techniques or machine learning approaches, and certain aspects as the type of predicted outcomes and quality evaluation metrics, have been considered for the categorization of these methods. This paper also includes a summary of the basic concepts, as well as a global overview of the process predictive monitoring area, that can be used to support future efforts of researchers and practitioners in this research field.
... Under these conditions, the full range of PM techniques can be exploited, possibly combined in a pipeline-like fashion (according to the "Extended L* lifecycle" model [89]): from the induction and validation of a high-quality control-flow model to the enrichment of this model with stochastic modeling capabilities [76,78], up to the exploitation of these predictive capabilities for run-time support. ...
... The third stage, named Create control-flow model and connect event log, is devoted to obtaining a control-flow model that explains the input log accurately, by using process discovery and conformance checking techniques. This control-flow model is then enriched (through extension methods) with additional perspectives and/or predictive capabilities [76,78] in the fourth stage, named Create integrated process model. The final Operational Support stage consists in performing operational-support tasks on pre-mortem traces, based on a prediction-augmented process model. ...
Article
Full-text available
The ever-increasing attention of process mining (PM) research to the logs of low structured processes and of non-process-aware systems (e.g., ERP, IoT systems) poses a number of challenges. Indeed, in such cases, the risk of obtaining low-quality results is rather high, and great effort is needed to carry out a PM project, most of which is usually spent in trying different ways to select and prepare the input data for PM tasks. Two general AI-based strategies are discussed in this paper, which can improve and ease the execution of PM tasks in such settings: (a) using explicit domain knowledge and (b) exploiting auxiliary AI tasks. After introducing some specific data quality issues that complicate the application of PM techniques in the above-mentioned settings, the paper illustrates these two strategies and the results of a systematic review of relevant literature on the topic. Finally, the paper presents a taxonomical scheme of the works reviewed and discusses some major trends, open issues and opportunities in this field of research.
... Recently proposed discovery algorithms attempt to balance between over-fitting and under-fitting in the control-flow dimension [7,8,9,10]. Yet, the question of how to balance over-fitting and under-fitting in terms of performance annotations of operational models has received little attention in the literature so far [11]. ...
... The paper builds upon our earlier work [11] and provides three main contributions: ...
Article
Operational process models such as generalised stochastic Petri nets (GSPNs) are useful when answering performance questions about business processes (e.g. ‘how long will it take for a case to finish?’). Recently, methods for process mining have been developed to discover and enrich operational models based on a log of recorded executions of processes, which enables evidence-based process analysis. To avoid a bias due to infrequent execution paths, discovery algorithms strive for a balance between over-fitting and under-fitting regarding the originating log. However, state-of-the-art discovery algorithms address this balance solely for the control-flow dimension, neglecting the impact of their design choices in terms of performance measures. In this work, we thus offer a technique for controlled performance-driven model reduction of GSPNs, using structural simplification rules, namely foldings. We propose a set of foldings that aggregate or eliminate performance information. We further prove the soundness of these foldings in terms of stability preservation and provide bounds on the error that they introduce with respect to the original model. Furthermore, we show how to find an optimal sequence of simplification rules, such that their application yields a minimal model under a given error budget for performance estimation. We evaluate the approach with two real-world datasets from the healthcare and telecommunication domains, showing that model simplification indeed enables a controlled reduction of model size, while preserving performance metrics with respect to the original model. Moreover, we show that aggregation dominates elimination when abstracting performance models by preventing under-fitting due to information loss.
... Probabilistic variant mining relates to recent attempts to handle noise in process discovery, based either on event log filtering [10,13,23] or on model abstraction [24,25]. While these works rely on deterministic reasoning, requiring a user to decide on the over- or under-representation of process variants in the model, the TNR enables a probabilistic framework, which better reflects process heterogeneity in the discovered model. ...
Conference Paper
Analysing performance of business processes is an important vehicle to improve their operation. Specifically, an accurate assessment of sojourn times and remaining times enables bottleneck analysis and resource planning. Recently, methods to create respective performance models from event logs have been proposed. These works are severely limited, though: They either consider control-flow and performance information separately, or rely on an ad-hoc selection of temporal relations between events. In this paper, we introduce the Temporal Network Representation (TNR) of a log, based on Allen’s interval algebra, as a complete temporal representation of a log, which enables simultaneous discovery of control-flow and performance information. We demonstrate the usefulness of the TNR for detecting (unrecorded) delays and for probabilistic mining of variants when modelling the performance of a process. In order to compare different models from the performance perspective, we develop a framework for measuring performance fitness. Under this framework, we provide guarantees that TNR-based process discovery dominates existing techniques in measuring performance characteristics of a process. To illustrate the practical value of the TNR, we evaluate the approach against three real-life datasets. Our experiments show that the TNR yields an improvement in performance fitness over state-of-the-art algorithms.
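Allen's interval algebra, on which the TNR builds, classifies how two activity intervals relate in time. A minimal Python sketch covering a few of the thirteen relations (the TNR itself records the full algebra over all event pairs; the function name is illustrative):

# Simplified classifier for the Allen relation between two executed
# activities, given their start/end timestamps.
def allen_relation(a_start, a_end, b_start, b_end):
    if a_end < b_start:   return "before"
    if a_end == b_start:  return "meets"
    if a_start == b_start and a_end == b_end: return "equal"
    if b_start < a_start and a_end < b_end:   return "during"
    if a_start < b_start and b_end < a_end:   return "contains"
    if a_start < b_start < a_end:             return "overlaps"
    return "other"

print(allen_relation(0, 5, 5, 9))   # meets
print(allen_relation(1, 3, 0, 10))  # during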
Article
Business process management has been widely adopted by many organisations, resulting in the accumulation of large collections of process models. The majority of these models are rather large and complex. Even though such models constitute a great source of knowledge, they cannot be easily understood by all process stakeholders. Process model abstraction techniques have proven effective in generating easy-to-comprehend, high-level views on business process models; thus, such techniques change the way that detailed process models may be utilized within an organization. Although much attention has been given to abstracting the activities of process models, to the best of our knowledge there are no research works that deliver abstract process model views by considering as candidates for abstraction not only activities but also other process model elements. In this paper, we present an abstraction approach that simplifies existing process models by focusing not only on the abstraction of activities, but also on the abstraction of data, roles, messages and artifacts. The proposed approach exploits both model structure and element properties, and it is grounded on a set of abstraction rules. A prototype tool has been implemented as a proof of concept; this tool has been used for validating the proposed approach by automatically applying the suggested abstraction rules to different sets of real-world process models. A number of process stakeholders have also been involved in this validation. Thus, it is empirically shown that the presented work is an effective process model abstraction method that increases the usability of complex business process models, as it enables their rapid comprehension by process stakeholders.
Article
Predictive process monitoring has recently gained traction in academia and is also maturing in companies. However, with the growing body of research, it might be daunting for companies to navigate this domain and determine, given certain data, what can be predicted and which methods to use. The main objective of this paper is to develop a value-driven framework for classifying existing work on predictive process monitoring. This objective is achieved by systematically identifying, categorizing, and analyzing existing approaches for predictive process monitoring. The review is then used to develop a value-driven framework that can support organizations in navigating the predictive process monitoring field and help them find value and exploit the opportunities enabled by these analysis techniques.
Conference Paper
Full-text available
Given an event log describing observed behaviour, process discovery aims to find a process model that ‘best’ describes this behaviour. A large variety of process discovery algorithms has been proposed. However, no existing algorithm returns a sound model in all cases (free of deadlocks and other anomalies), handles infrequent behaviour well and finishes quickly. We present a technique able to cope with infrequent behaviour and large event logs, while ensuring soundness. The technique has been implemented in ProM and we compare the technique with existing approaches in terms of quality and performance.
Conference Paper
Full-text available
The aim of process discovery is to discover a process model based on business process execution data recorded in an event log. One of several existing process discovery techniques is the ILP-based process discovery algorithm. The algorithm is able to unravel complex process structures and provides formal guarantees w.r.t. the model discovered, e.g., the algorithm guarantees that a discovered model describes all behavior present in the event log. Unfortunately, the algorithm is unable to cope with exceptional behavior present in event logs. As a result, the application of ILP-based process discovery techniques in everyday process discovery practice is limited. This paper addresses this problem by proposing a filtering technique tailored towards ILP-based process discovery. The technique helps to produce process models that are less over-fitting w.r.t. the event log, more understandable, and more adequate in capturing the dominant behavior present in the event log. The technique is implemented in the ProM framework.
Article
Full-text available
Process mining can be viewed as the missing link between model-based process analysis and data-oriented analysis techniques. The lion's share of process mining research has been focusing on process discovery (creating process models from raw data) and replay techniques to check conformance and analyze bottlenecks. These techniques have helped organizations to address compliance and performance problems. However, for a more refined analysis, it is essential to correlate different process characteristics. For example, do deviations from the normative process cause additional delays and costs? Are rejected cases handled differently in the initial phases of the process? What is the influence of a doctor's experience on the treatment process? These and other questions may involve process characteristics related to different perspectives (control-flow, data-flow, time, organization, cost, compliance, etc.). Specific questions (e.g., predicting the remaining processing time) have been investigated before, but a generic approach was missing thus far. The proposed framework unifies a number of approaches for correlation analysis proposed in the literature, providing a general solution that can perform those analyses and many more. The approach has been implemented in ProM and combines process and data mining techniques. In this paper, we also demonstrate the applicability using a case study conducted with the UWV (Employee Insurance Agency), one of the largest “administrative factories” in The Netherlands.
Conference Paper
Full-text available
This paper addresses the problem of predicting the outcome of an ongoing case of a business process based on event logs. In this setting, the outcome of a case may refer, for example, to the achievement of a performance objective or the fulfillment of a compliance rule upon completion of the case. Given a log consisting of traces of completed cases, given a trace of an ongoing case, and given two or more possible outcomes (e.g., a positive and a negative outcome), the paper addresses the problem of determining the most likely outcome for the case in question. Previous approaches to this problem are largely based on simple symbolic sequence classification, meaning that they extract features from traces seen as sequences of event labels, and use these features to construct a classifier for runtime prediction. In doing so, these approaches ignore the data payload associated to each event. This paper approaches the problem from a different angle by treating traces as complex symbolic sequences, that is, sequences of events each carrying a data payload. In this context, the paper outlines different feature encodings of complex symbolic sequences and compares their predictive accuracy on real-life business process event logs.
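The difference between simple and complex symbolic sequences becomes concrete when encoding a running trace for a classifier. A minimal Python sketch of an index-based encoding that keeps each event's data payload (field names are illustrative, not the paper's exact scheme):

# Sketch: encode a trace prefix position by position, keeping both the
# event label and its data payload, padded to a fixed schema length.
def index_encode(prefix, payload_keys, max_len):
    row = {}
    for i in range(max_len):
        if i < len(prefix):
            event = prefix[i]
            row[f"event_{i}"] = event["activity"]
            for k in payload_keys:
                row[f"{k}_{i}"] = event.get(k)
        else:  # pad shorter prefixes so all vectors share one schema
            row[f"event_{i}"] = "PAD"
            for k in payload_keys:
                row[f"{k}_{i}"] = None
    return row

trace = [{"activity": "register", "amount": 120},
         {"activity": "check", "amount": 120}]
print(index_encode(trace, payload_keys=["amount"], max_len=3))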
Conference Paper
Full-text available
Companies realize their services by business processes to stay competitive in a dynamic market environment. In particular, they track the current state of the process to detect undesired deviations, to provide customers with predicted remaining durations, and to improve the ability to schedule resources accordingly. In this setting, we propose an approach to predict remaining process execution time, taking into account the time passed since the last observed event. While existing approaches update predictions only upon event arrival and subtract elapsed time from the latest predictions, our method also considers expected events that have not yet occurred, resulting in better prediction quality. Moreover, the prediction approach is based on the Petri net formalism and is able to model concurrency appropriately. We present the algorithm and its implementation in ProM and compare its predictive performance to state-of-the-art approaches in simulated experiments and in an industry case study.
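The key idea of conditioning on elapsed time, rather than merely subtracting it, can be shown with a one-function sketch: given historical durations for the awaited event, predict the conditional mean E[T - t | T > t]. This Python toy ignores the paper's Petri-net machinery:

# Illustrative only: condition the remaining-time estimate on the time
# already elapsed, using historical durations that are still possible.
def remaining_given_elapsed(durations, elapsed):
    still_possible = [d for d in durations if d > elapsed]
    if not still_possible:             # elapsed beyond all observations
        return 0.0
    return sum(d - elapsed for d in still_possible) / len(still_possible)

history = [3.0, 5.0, 8.0, 13.0]        # observed inter-event times
print(remaining_given_elapsed(history, elapsed=0.0))  # 7.25
print(remaining_given_elapsed(history, elapsed=6.0))  # mean(2, 7) = 4.5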
Conference Paper
Full-text available
Capturing the performance of a system or business process as accurately as possible is important, as models enriched with performance information provide valuable input for analysis, operational support, and prediction. Due to their convenient computational properties, memoryless models such as exponentially distributed stochastic Petri nets have earned much attention in research and industry. However, there are cases when the memoryless property is clearly not able to capture process behavior, e.g., when dealing with fixed time-outs. We want to allow models to have generally distributed durations to be able to capture the behavior of the environment and resources as accurately as possible. For these more expressive process models, the execution policy has to be specified in more detail. In this paper, we present and evaluate process discovery algorithms for each of the execution policies. The introduced approach uses raw event execution data to discover various classes of stochastic Petri nets. The algorithms are based on the notion of alignments and have been implemented as a plug-in in the process mining framework ProM.
Conference Paper
Full-text available
Process discovery algorithms typically aim at discovering process models from event logs that best describe the recorded behavior. Often, the quality of a process discovery algorithm is measured by quantifying to what extent the resulting model can reproduce the behavior in the log, i.e. replay fitness. At the same time, there are many other metrics that compare a model with recorded behavior in terms of the precision of the model and the extent to which the model generalizes the behavior in the log. Furthermore, several metrics exist to measure the complexity of a model irrespective of the log. In this paper, we show that existing process discovery algorithms typically consider at most two out of the four main quality dimensions: replay fitness, precision, generalization and simplicity. Moreover, existing approaches cannot steer the discovery process based on user-defined weights for the four quality dimensions. This paper also presents the ETM algorithm, which allows the user to seamlessly steer the discovery process based on preferences with respect to the four quality dimensions. We show that all dimensions are important for process discovery. However, it only makes sense to consider precision, generalization and simplicity if the replay fitness is acceptable.
Book
Full-text available
Differences in features supported by the various contemporary commercial workflow management systems point to different insights of suitability and different levels of expressive power. The challenge, which we undertake in this paper, is to systematically address workflow requirements, from basic to complex. Many of the more complex requirements identified recur quite frequently in the analysis phases of workflow projects; however, their implementation is uncertain in current products. Requirements for workflow languages are indicated through workflow patterns. In this context, patterns address business requirements in an imperative workflow style expression, but are removed from specific workflow languages. The paper describes a number of workflow patterns addressing what we believe to be comprehensive workflow functionality. These patterns provide the basis for an in-depth comparison of a number of commercially available workflow management systems. As such, this paper can be seen as the academic response to evaluations made by prestigious consulting companies. Typically, these evaluations hardly consider the workflow modeling language and routing capabilities, and focus more on the purely technical and commercial aspects.
Conference Paper
Full-text available
A business process is often modeled using some kind of a directed flow graph, which we call a workflow graph. The Refined Process Structure Tree (RPST) [1] is a technique for workflow graph parsing, i.e., for discovering the structure of a workflow graph, which has various applications. In this paper, we provide two improvements to the RPST. First, we propose an alternative way to compute the RPST that is simpler than the one developed originally [1]. In particular, the computation reduces to constructing the tree of the triconnected components of a workflow graph in the special case when every node has at most one incoming or at most one outgoing edge. Such graphs occur frequently in applications. Secondly, we extend the applicability of the RPST. Originally, the RPST was applicable only to graphs with a single source and single sink such that the completed version of the graph is biconnected. We lift both restrictions. Therefore, the RPST is then applicable to arbitrary directed graphs such that every node is on a path from some source to some sink. This includes graphs with multiple sources and/or sinks and disconnected graphs.
Article
Full-text available
Process mining includes the automated discovery of processes from event logs. Based on observed events (e.g., activities being executed or messages being exchanged) a process model is constructed. One of the essential problems in process mining is that one cannot assume to have seen all possible behavior. At best, one has seen a representative subset. Therefore, classical synthesis techniques are not suitable as they aim at finding a model that is able to exactly reproduce the log. Existing process mining techniques try to avoid such “overfitting” by generalizing the model to allow for more behavior. This generalization is often driven by the representation language and very crude assumptions about completeness. As a result, parts of the model are “overfitting” (allow only for what has actually been observed) while other parts may be “underfitting” (allow for much more behavior without strong support for it). None of the existing techniques enables the user to control the balance between “overfitting” and “underfitting”. To address this, we propose a two-step approach. First, using a configurable approach, a transition system is constructed. Then, using the “theory of regions”, the model is synthesized. The approach has been implemented in the context of ProM and overcomes many of the limitations of traditional approaches.
Book
Full-text available
An exploration of concepts and techniques necessary for diagnosing and correcting the problems that create queues.
Article
Full-text available
Process mining is a tool to extract non-trivial and useful information from process execution logs. These so-called event logs (also called audit trails, or transaction logs) are the starting point for various discovery and analysis techniques that help to gain insight into certain characteristics of the process. In this paper we use a combination of process mining techniques to discover multiple perspectives (namely, the control-flow, data, performance, and resource perspective) of the process from historic data, and we integrate them into a comprehensive simulation model. This simulation model is represented as a colored Petri net (CPN) and can be used to analyze the process, e.g., evaluate the performance of different alternative designs. The discovery of simulation models is explained using a running example. Moreover, the approach has been applied in two case studies; the workflows in two different municipalities in the Netherlands have been analyzed using a combination of process mining and simulation. Furthermore, the quality of the CPN models generated for the running example and the two case studies has been evaluated by comparing the original logs with the logs of the generated models.
Conference Paper
Full-text available
Process Mining is a technique for extracting process models from execution logs. This is particularly useful in situations where people have an idealized view of reality. Real-life processes turn out to be less structured than people tend to believe. Unfortunately, traditional process mining approaches have problems dealing with unstructured processes. The discovered models are often "spaghetti-like", showing all details without distinguishing what is important and what is not. This paper proposes a new process mining approach to overcome this problem. The approach is configurable and allows for different faithfully simplified views of a particular process. To do this, the concept of a roadmap is used as a metaphor. Just like different roadmaps provide suitable abstractions of reality, process models should provide meaningful abstractions of operational processes encountered in domains ranging from healthcare and logistics to web services and public administration.
Conference Paper
Full-text available
This paper presents a new net-reduction methodology to facilitate the analysis of large workflow models. We propose an enhanced algorithm based on reducible subnet identification which preserves both soundness and completion time distribution. Moreover, we outline an approach to model the dynamic behavior of business processes by exploiting the power of a class of non-Markovian stochastic Petri net models.
Article
Full-text available
Process mining allows for the automated discovery of process models from event logs. These models provide insights and enable various types of model-based analysis. This paper demonstrates that the discovered process models can be extended with information to predict the completion time of running instances. There are many scenarios where it is useful to have reliable time predictions. For example, when a customer phones her insurance company for information about her insurance claim, she can be given an estimate for the remaining processing time. In order to do this, we provide a configurable approach to construct a process model, augment this model with time information learned from earlier instances, and use this to predict e.g. the completion time. To provide meaningful time predictions we use a configurable set of abstractions that allow for a good balance between "overfitting" and "underfitting". The approach has been implemented in ProM and through several experiments using real-life event logs we demonstrate its applicability.
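The configurable approach can be miniaturized as follows: abstract each prefix to a state, annotate states with historically observed remaining times, and predict by averaging. A toy Python version using a set abstraction (one possible abstraction; the paper's set of abstractions is configurable and richer):

# Toy annotated-state predictor: states are sets of activities seen so far,
# annotated with remaining times observed in completed traces.
from collections import defaultdict

def train(log):  # log: list of [(activity, timestamp), ...] per completed case
    annot = defaultdict(list)
    for trace in log:
        end = trace[-1][1]
        seen = set()
        for activity, ts in trace:
            seen.add(activity)
            annot[frozenset(seen)].append(end - ts)
    return {s: sum(v) / len(v) for s, v in annot.items()}

model = train([[("a", 0), ("b", 4), ("c", 9)],
               [("a", 0), ("b", 6), ("c", 10)]])
print(model[frozenset({"a", "b"})])   # avg remaining time after seeing {a, b}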
Article
Business Process Management (BPM) is the art and science of how work should be performed in an organization in order to ensure consistent outputs and to take advantage of improvement opportunities, e.g. reducing costs, execution times or error rates. Importantly, BPM is not about improving the way individual activities are performed, but rather about managing entire chains of events, activities and decisions that ultimately produce added value for an organization and its customers. This textbook encompasses the entire BPM lifecycle, from process identification to process monitoring, covering along the way process modelling, analysis, redesign and automation. Concepts, methods and tools from business management, computer science and industrial engineering are blended into one comprehensive and inter-disciplinary approach. The presentation is illustrated using the BPMN industry standard defined by the Object Management Group and widely endorsed by practitioners and vendors worldwide. In addition to explaining the relevant conceptual background, the book provides dozens of examples, more than 100 hands-on exercises – many with solutions – as well as numerous suggestions for further reading. The textbook is the result of many years of combined teaching experience of the authors, both at the undergraduate and graduate levels as well as in the context of professional training. Students and professionals from both business management and computer science will benefit from the step-by-step style of the textbook and its focus on fundamental concepts and proven methods. Lecturers will appreciate the class-tested format and the additional teaching material available on the accompanying website fundamentals-of-bpm.org.
Conference Paper
The performance of scheduled business processes is of central importance for services and manufacturing systems. However, current techniques for performance analysis do not take both queueing semantics and the process perspective into account. In this work, we address this gap by developing a novel method for utilizing rich process logs to analyze performance of scheduled processes. The proposed method combines simulation, queueing analytics, and statistical methods. At the heart of our approach is the discovery of an individual-case model from data, based on an extension of the Colored Petri Nets formalism. The resulting model can be simulated to answer performance queries, yet its simulation is computationally inefficient. To reduce the computational cost, the discovered model is projected into Queueing Networks, a formalism that enables efficient performance analytics. The projection is facilitated by a sequence of folding operations that alter the structure and dynamics of the Petri Net model. We evaluate the approach with a real-world dataset from Dana-Farber Cancer Institute, a large outpatient cancer hospital in the United States.
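Once the model has been projected into a queueing network, closed-form analytics can replace costly simulation. As an illustration of the kind of computation this enables (a standard textbook formula, not the paper's specific projection), the expected waiting time at an M/M/c station via Erlang C, in Python:

# Standard Erlang-C computation: expected wait in queue for an M/M/c
# station with Poisson arrivals and exponential service.
from math import factorial

def mmc_expected_wait(arrival_rate, service_rate, c):
    rho = arrival_rate / (c * service_rate)
    assert rho < 1, "queue must be stable"
    a = arrival_rate / service_rate
    p0 = 1.0 / (sum(a**k / factorial(k) for k in range(c))
                + a**c / (factorial(c) * (1 - rho)))
    erlang_c = (a**c / (factorial(c) * (1 - rho))) * p0   # P(wait > 0)
    return erlang_c / (c * service_rate - arrival_rate)

print(mmc_expected_wait(arrival_rate=8.0, service_rate=3.0, c=4))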
Conference Paper
Information systems have been widely adopted to support service processes in various domains, e.g., in the telecommunication, finance, and health sectors. Recently, work on process mining showed how management of these processes, and engineering of supporting systems, can be guided by models extracted from the event logs that are recorded during process operation. In this work, we establish a queueing perspective in operational process mining. We propose to consider queues as first-class citizens and use queueing theory as a basis for queue mining techniques. To demonstrate the value of queue mining, we revisit the specific operational problem of online delay prediction: using event data, we show that queue mining yields accurate online predictions of case delay.
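A concrete example of treating queues as first-class citizens is the snapshot principle used in queue mining: predict the delay of a newly arriving case from the delay of the last case that entered service (LES). A minimal Python sketch; the data layout is illustrative, not the paper's:

# LES snapshot predictor: the predicted delay of a new arrival is the
# observed delay of the last case to have entered service.
def les_delay_prediction(queue_events):
    """queue_events: list of (case_id, enqueue_time, service_start or None)."""
    started = [(q, s) for _, q, s in queue_events if s is not None]
    if not started:
        return None
    last_enqueue, last_start = max(started, key=lambda x: x[1])
    return last_start - last_enqueue   # delay of the LES customer

events = [("c1", 0.0, 2.0), ("c2", 1.0, 5.0), ("c3", 4.0, None)]
print(les_delay_prediction(events))    # c2 entered service last -> 4.0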
Article
In this paper we present a novel approach to specify and analyze complex systems using product-form models. The main strengths of this approach are its high modularity and its ability to deal with a very large class of product-form models. This has been possible because the product-form analysis is based on two properties that are formulated at a very low level, i.e., the Markov implies Markov property and the Reversed Compound Agent Theorem. We propose a unifying framework for combining product-form models defined in terms of different formalisms and we give the conditions that allow the composition to be in product-form. The semantics of their combination is formally defined because the various sub-models are transformed into GSPNs with an equivalent underlying process. In particular, we illustrate with several examples that we can perform analysis of models with non-linear traffic equations, including those with some components being G-queues, product-form stochastic Petri nets, or multi-class queueing stations.
Article
Information systems have been widely adopted to support service processes in various domains, e.g., in the telecommunication, finance, and health sectors. Information recorded by systems during the operation of these processes provide an angle for operational process analysis, commonly referred to as process mining. In this work, we establish a queueing perspective in process mining to address the online delay prediction problem, which refers to the time that the execution of an activity for a running instance of a service process is delayed due to queueing effects. We present predictors that treat queues as first-class citizens and either enhance existing regression-based techniques for process mining or are directly grounded in queueing theory. In particular, our predictors target multi-class service processes, in which requests are classified by a type that influences their processing. Further, we introduce queue mining techniques that derive the predictors from event logs recorded by an information system during process execution. Our evaluation based on large real-world datasets, from the telecommunications and financial sectors, shows that our techniques yield accurate online predictions of case delay and drastically improve over predictors neglecting the queueing perspective.
Article
Exploring and understanding large business process models are important tasks in the context of business process management. In recent years, several techniques have been proposed for the abstraction of business processes. Automated abstraction techniques have been devised for verifying correctness and consistency of process models and for providing customised process views for business process analysts. Yet a goal-focused and semantic-based approach to generate purposeful abstraction of business processes is an open issue. We propose an approach for configuration of process abstractions with respect to a specific abstraction goal expressed as constraints on the correspondence relation between concrete and abstract process and process transformation operators. Our framework goes beyond simple structural aggregation and leverages domain-specific properties, taxonomies, meronymy, and flow criteria to generate a hierarchy of abstract process models. We outline the constraint-based framework, describe how rewriting-based abstraction mechanisms are embedded with consistency criteria guiding the search for abstractions, and show how notions of behaviour consistency can be utilised to obtain abstractions that conform to behavioural process inheritance criteria.
Article
Process models discovered from a process log using process mining tend to be complex and have problems balancing between overfitting and underfitting. An overfitting model allows for too little behavior as it just permits the traces in the log and no other trace. An underfitting model allows for too much behavior as it permits traces that are significantly different from the behavior seen in the log. This paper presents a post-processing approach to simplify discovered process models while controlling the balance between overfitting and underfitting. The discovered process model, expressed in terms of a Petri net, is unfolded into a branching process using the event log. Subsequently, the resulting branching process is folded into a simpler process model capturing the desired behavior.
Article
This paper describes the Queueing Network Analyzer (QNA), a software package developed at Bell Laboratories to calculate approximate congestion measures for a network of queues. The first version of QNA analyzes open networks of multiserver nodes with the first-come, first-served discipline and no capacity constraints. An important feature is that the external arrival processes need not be Poisson and the service-time distributions need not be exponential. Treating other kinds of variability is important. For example, with packet-switched communication networks we need to describe the congestion resulting from bursty traffic and the nearly constant service times of packets. The general approach in QNA is to approximately characterize the arrival processes by two or three parameters and then analyze the individual nodes separately. The first version of QNA uses two parameters to characterize the arrival processes and service times, one to describe the rate and the other to describe the variability. The nodes are then analyzed as standard GI/G/m queues partially characterized by the first two moments of the interarrival-time and service-time distributions. Congestion measures for the network as a whole are obtained by assuming as an approximation that the nodes are stochastically independent given the approximate flow parameters.
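The two-moment philosophy underlying QNA can be illustrated with the classical heavy-traffic approximation for the mean wait at a single GI/G/1 station (a well-known relative of QNA's network equations, not the package's exact formulas):

$$ W_q \approx \frac{\rho}{1-\rho}\cdot\frac{c_a^2 + c_s^2}{2}\cdot \mathbb{E}[S], $$

where $\rho$ is the utilisation, $\mathbb{E}[S]$ the mean service time, and $c_a^2$, $c_s^2$ the squared coefficients of variation of the interarrival and service times. QNA propagates exactly such rate and variability parameters through the network and then analyzes each node separately.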
Article
The aim of this paper is to present a methodology for the performance evaluation of a workflow net, a well-known subclass of Petri nets that specifies the real-time behavior of a business process, by considering first the case where the number of resources is limited, which requires the use of queueing systems theory, and secondly the case where the number of resources is unlimited, for which a method based on stochastic Petri net theory is implemented. This type of estimation can assist workflow designers in delivering efficient support to business process models.
Conference Paper
Petri nets are useful for modelling complex concurrent systems. While modelling using Petri nets focusses on local states and actions, the analysis methods are concerned with global states and their transitions. Unfortunately, generation of the complete state space suffers from the well-known state space explosion problem. This paper presents a method to overcome the state-space explosion problem for a class of Generalised Stochastic Petri Nets (GSPNs). Large complex GSPN models are transformed into smaller, less complex ones with smaller state spaces than the original models. This transformation is called aggregation. The aim of aggregation is to reduce the state space while preserving the desired behaviour of the original model. In this paper we investigate the aggregation of GSPNs preserving time-dependent behaviour by using recent [5,6] and newly developed transformation rules. These rules are used to merge several single timed transitions into one merged transition. The firing rate of the merged transition turns out to be dependent on the marking of the net. Besides the introduction of a new method for the aggregation of exponential transitions with fixed firing rates, new formulae to aggregate transitions with marking-dependent firing rates are presented. Successive aggregation becomes possible to transform very complex models into models in which either a closed-form computation of the stationary state distribution is available or which have a very small state space. A prototype implementation is used to demonstrate both the drastically reduced state space for suitable models and the general limits of the method.
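For the simplest such rule, merging a purely sequential chain of exponential transitions with rates $\lambda_1,\dots,\lambda_n$, matching the mean delay of the chain gives the merged transition the harmonic rate (a standard illustration, not necessarily the paper's exact rule):

$$ \lambda_{\mathrm{merged}} = \Bigl(\sum_{i=1}^{n} \frac{1}{\lambda_i}\Bigr)^{-1}. $$

The exact sojourn time of the chain is hypoexponential rather than exponential, so this is a mean-preserving approximation; the marking-dependent formulae developed in the paper refine such rules.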
Book
The more challenging case of transient analysis of Markov chains is investigated in Chapter 5. The chapter introduces symbolic solutions in simple cases such as small or very regular state spaces. In general, numerical techniques are more suitable and are therefore covered in detail. Uniformization and some variants thereof are introduced as the method of choice for transient analysis in most cases. Particular emphasis is given to stiffness tolerant uniformization that can be of practical relevance in many modeling scenarios where relatively rare and fast events occur concurrently. As an alternative a method for aggregation/disaggregation of stiff Markov chains is introduced for a computation of approximate transient state probabilities. The method is based on a distinction of fast recurrent and fast transient sets of states that can be aggregated with relatively small error. All steps are illustrated by a detailed example model of server breakdown and repair. In addition to numerical results an error analysis is provided as well.
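Uniformization, the chapter's method of choice, computes transient probabilities of a CTMC with generator $Q$ by randomizing over a Poisson number of jumps of a discrete-time chain:

$$ \pi(t) = \sum_{k=0}^{\infty} e^{-\Lambda t}\,\frac{(\Lambda t)^k}{k!}\;\pi(0)\,P^k, \qquad P = I + Q/\Lambda,\quad \Lambda \ge \max_i \lvert q_{ii}\rvert, $$

with the series truncated once the Poisson weights become negligible. Stiffness manifests as a large $\Lambda t$, which is what the stiffness-tolerant variants discussed in the chapter address.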
Article
The discipline of business process management aims at capturing, understanding, and improving work in organizations by using process models as central artifacts. Since business-oriented tasks require different information from such models to be highlighted, a range of abstraction techniques has been developed over the past years to manipulate overly detailed models. At this point, a clear understanding of what distinguishes these techniques and how they address real world use cases has not yet been established. In this paper we systematically develop, classify, and consolidate the use cases for business process model abstraction and present a case study to illustrate the value of this technique. The catalog of use cases that we present is based on a thorough evaluation of the state of the art, as well as on our cooperation with end users in the health insurance sector. It has been subsequently validated by experts from the consultancy and tool vendor domains. Based on our findings, we evaluate how the existing business process model abstraction approaches support the discovered use cases and reveal which areas are not adequately covered, as such providing an agenda for further research in this area.
Article
From the Publisher: This book presents a unified theory of Generalized Stochastic Petri Nets (GSPNs) together with a set of illustrative examples from different application fields. The continuing success of GSPNs and the increasing interest in using them as a modelling paradigm for the quantitative analysis of distributed systems suggested the preparation of this volume with the intent of providing newcomers to the field with a useful tool for their first approach. Readers will find a clear and informal explanation of the concepts followed by formal definitions when necessary or helpful. The largest section of the book however is devoted to showing how this methodology can be applied in a range of domains.
Book
Stochastic processes are necessary ingredients for building models of a wide variety of phenomena exhibiting time varying randomness. This text offers easy access to this fundamental topic for many students of applied sciences at many levels. It includes examples, exercises, applications, and computational procedures. It is uniquely useful for beginners and non-beginners in the field. No knowledge of measure theory is presumed.
Conference Paper
Performance analysis of Petri net models is limited by state explosion in the underlying Markovian model. To overcome this problem, an iterative approximate technique is obtained, using a number of auxiliary models, each of much lower state complexity. It is demonstrated on a substantial model which represents a parallel implementation of two layers of protocols for data communications. The model represents ten separate software tasks and their interactions via rendezvous, and is based on a testbed implementation in the laboratory. Submodels can be constructed in various ways, and this is illustrated with four different decompositions. Their state space complexity, solution time and solution accuracy are evaluated.
Conference Paper
The authors present a decomposition approach for the solution of large stochastic Petri nets (SPNs). The overall model consists of a set of submodels whose interactions are described by an import graph. Each node of the graph corresponds to a parametrized SPN submodel and an arc from submodel A to submodel B corresponds to a parameter value that B must receive from A. The quantities exchanged between submodels are based on only three primitives. The import graph is normally cyclic, so the solution method is based on fixed-point iteration. The authors apply their technique to the analysis of a flexible manufacturing system.
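The import-graph scheme can be sketched generically: solve each submodel with the parameters currently imported from the others and iterate until the exchanged quantities stabilise. A toy Python fixed-point loop with hypothetical interfaces:

# Sketch of fixed-point iteration over mutually dependent submodels;
# each "solver" maps the current parameter vector to its exported value.
def solve_by_fixed_point(submodels, init_params, tol=1e-6, max_iter=100):
    """submodels: dict name -> function(params) -> exported value."""
    params = dict(init_params)
    for _ in range(max_iter):
        new = {name: solve(params) for name, solve in submodels.items()}
        if all(abs(new[k] - params[k]) < tol for k in new):
            return new
        params = new
    return params

# Toy example with two mutually dependent "submodels" (a cyclic graph).
print(solve_by_fixed_point(
    {"A": lambda p: 0.5 * p["B"] + 1.0,    # A's export depends on B's
     "B": lambda p: 0.3 * p["A"] + 2.0},   # and vice versa
    init_params={"A": 0.0, "B": 0.0}))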
Article
A technique is presented whereby queueing network models and generalized stochastic Petri nets are combined in such a way as to exploit the best features of both modeling techniques. The resulting hierarchical modeling approach is useful in the solution of complex models of system behavior. The authors have chosen two examples from the recent literature to illustrate the power and scope of this technique. They also demonstrate how folding of the generalized stochastic Petri net models for these two examples is useful in obtaining efficiently solvable, approximate models (bounding models).
Performance Petri net analysis of communications protocol software by delay-equivalent aggregation. In: Petri Nets and Performance Models
  • C. M. Woodside
  • Y. Li
A decomposition approach for stochastic Petri net models. In: Petri Nets and Performance Models
  • G. Ciardo
  • K. S. Trivedi
The critical-item, upper bounds, and a branch-and-bound algorithm for the tree knapsack problem
  • D. X. Shaw
  • G. Cho