Conference Paper

Mining Reference Process Models from Large Instance Data


Abstract

Reference models provide generic blueprints of process models that are common in a certain industry. When designing a reference model, stakeholders have to cope with the so-called ‘dilemma of reference modeling’, viz., balancing generality against market specificity. In principle, the more details a reference model contains, the fewer situations it applies to. To overcome this dilemma, the contribution at hand presents a novel approach to mining a reference model hierarchy from large instance-level data such as execution logs. It combines an execution-semantic technique for reference model development with a hierarchical-agglomerative cluster analysis and ideas from Process Mining. The result is a reference model hierarchy, where the lower a model is located, the smaller its scope and the higher its level of detail. The approach is implemented as a proof of concept and applied in an extensive case study, using the data from the 2015 BPI Challenge.
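As a rough illustration of the hierarchical-agglomerative clustering step, the sketch below merges traces bottom-up under a Jaccard distance over activity sets with average linkage. The distance measure, linkage, and stopping threshold are illustrative assumptions for this sketch, not the paper's actual choices.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Distance between two traces based on their activity sets."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)

def agglomerative_clusters(traces, threshold):
    """Greedy average-linkage agglomerative clustering of traces.

    Repeatedly merges the two closest clusters until the smallest
    inter-cluster distance exceeds `threshold`. Returns a list of
    clusters, each a list of trace indices.
    """
    clusters = [[i] for i in range(len(traces))]

    def avg_linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(jaccard_distance(traces[i], traces[j])
                   for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        d, a, b = min(
            (avg_linkage(c1, c2), i, j)
            for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)
        )
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# A toy log: two permit-handling variants and two ordering variants.
log = [
    ["register", "check", "approve", "archive"],
    ["register", "check", "reject", "archive"],
    ["order", "ship", "invoice"],
    ["order", "ship", "pay", "invoice"],
]
print(agglomerative_clusters(log, threshold=0.5))  # → [[0, 1], [2, 3]]
```

Recording the merge order (rather than stopping at a threshold) yields the full dendrogram, i.e. the hierarchy from which models of differing generality can be mined.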


... Compared to, e.g., trace clustering, where a few thousand clustering objects are still computationally feasible [15], activity clustering typically involves no more than a hundred objects, so computation times should not become a problem. The result of a clustering is an activity hierarchy. ...
... After obtaining the cluster hierarchy, we mine a reference model component for each identified activity set. To this end, we use our RMM-2 approach for reference model mining based on execution semantics, adapted to work with process traces instead of process models [15]. It analyzes the represented process semantics in terms of behavioral profiles and computes a reference model subsuming the specified behavior. ...
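The behavioral-profile relations mentioned here can be sketched minimally as follows: for each pair of activities, derive whether one strictly precedes the other, whether they interleave, or whether they never co-occur. This is a simplified reading of behavioral profiles for illustration, not the RMM-2 implementation.

```python
from itertools import combinations

def behavioral_profile(traces):
    """Derive behavioral-profile relations between activity pairs.

    A pair (a, b) is in strict order if a occurs before b in some trace
    but b never occurs before a, in interleaving order if both directions
    occur, and exclusive if the two never co-occur in any trace.
    """
    before = set()       # (a, b) if a occurs before b in some trace
    activities = set()
    for trace in traces:
        activities.update(trace)
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                before.add((a, b))

    profile = {}
    for a, b in combinations(sorted(activities), 2):
        if (a, b) in before and (b, a) in before:
            profile[(a, b)] = "interleaving"
        elif (a, b) in before:
            profile[(a, b)] = "strict order"
        elif (b, a) in before:
            profile[(a, b)] = "reverse strict order"
        else:
            profile[(a, b)] = "exclusive"
    return profile

log = [["a", "b", "c"], ["a", "c", "b"], ["a", "d"]]
profile = behavioral_profile(log)
print(profile[("a", "b")])  # → strict order
print(profile[("b", "c")])  # → interleaving
print(profile[("b", "d")])  # → exclusive
```

A reference model subsuming the specified behavior must then admit exactly the orderings these relations allow, e.g. a parallel split for interleaving pairs and alternative branches for exclusive ones.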
... Our own work on mining reference process models from large instance data proceeds in a similar way, applying a clustering approach to the traces of an execution log. The objective of this approach is to determine reference models depicting the whole process execution, but with differing degrees of generality or domain specificity [15], as opposed to reference components depicting process fragments. For this purpose, the event log is divided and clustered horizontally along trace similarity instead of vertically according to activity proximity, as we do here. ...
Chapter
Reference models are special conceptual models that are reused for the design of other conceptual models. They confront stakeholders with the dilemma of balancing the size of a model against its reuse frequency. The larger a reference model is, the better it applies to a specific situation, but the less often these situations occur. This is particularly important when mining a reference model from large process logs, as this often produces complex and unstructured models. To address this dilemma, we present a new approach for mining reference model components by vertically dividing complex process traces and hierarchically clustering activities based on their proximity in the log. We construct a hierarchy of subprocesses, where the lower a component is placed, the smaller and more structured it is. The approach is implemented as a proof of concept and evaluated using the data from the 2017 BPI Challenge.
Article
Business process management often uses reference models to improve processes or as a starting point when creating individual process models. The current academic literature offers primarily deductive methods with which to develop these reference models, although some methods develop reference models inductively from a set of individual process models, focusing on deriving and representing common practices. However, there is no inductive method with which to detect best practices and represent them in a reference model. This paper addresses this research gap by proposing a method by which to develop reference process models that represent best practices in public administrations semi-automatically and inductively. The method uses a merged model that retains the structure of the source models while detecting their common parts. It identifies best practices using query constructs and ranking criteria to group the source models’ elements and to evaluate these groups. We provide a conceptualization of the method and demonstrate its functionality using an artificial example. We describe our implementation of the method in a software prototype and report on its evaluation in a workshop with domain and method experts who applied the method to real-world process models.
Chapter
Enterprise architectures (EA) help organizations to analyze interrelations among their strategy, business processes, application landscape and information structures. Such ambitious endeavors can be supported by using reference models for EA. In recent years, research has increasingly been investigating inductive methods for reference model development. However, the characteristics of EA models have not yet been considered in this context. We therefore aim to adapt existing inductive approaches to the domain of reference enterprise architecture development. Using design science research, our work contributes to the reference modeling community with (i) a comparative analysis of inductive reference modeling methods regarding their applicability to EA models, (ii) the refinement of an identified approach to reference EA design and (iii) its application in a use case.
Conference Paper
Full-text available
Real-life business processes are complex and show a high degree of variability. Additionally, due to changing conditions and circumstances, these processes continuously evolve over time. For example, in the healthcare domain, advances in medicine trigger changes in diagnoses and treatment processes. Besides changes over time, case data (e.g. treating physician, patient age) also influence how processes are executed. Existing process mining techniques assume processes to be static and therefore are less suited for the analysis of contemporary flexible business processes. This paper presents a novel comparative trace clustering approach that is able to expose changes in behavior. Valuable insights can be gained and process improvements can be made by finding those points in time where behavior changed and the reasons why. Evaluation on real-life event data shows our technique can provide these insights.
Conference Paper
Full-text available
Reference models are a cost- and time-saving approach for the development of new models. As inductive strategies are capable of automatically deriving a potential reference process model from a collection of existing process models, they have gained attention in current research. A number of promising approaches can be found in recent publications. However, all existing methods rely on graph-based similarity measures to identify commonalities between input models. Since behaviourally similar process models can have different graphical structures, those approaches are unable to find certain commonalities. To overcome these shortcomings, we propose a new approach to inductive reference model development based on an execution-semantic similarity measure. Since a naïve solution to the intuitive idea does not yield productive results, the proposed approach is rather elaborate. By capturing the commonalities of the input models in a behavioural profile, we are able to derive a reference model subsuming the input models' semantics instead of their structure. In our contribution, this approach is outlined, implemented and evaluated in three different scenarios. As the evaluations show, it is capable of handling complex process models and of overcoming most restrictions that structural approaches pose. Thus, it introduces a new level of flexibility and applicability to inductive reference modelling.
Chapter
Full-text available
Automated process discovery techniques aim at extracting models from information system logs in order to shed light on the business processes supported by these systems. Existing techniques in this space are effective when applied to relatively small or regular logs, but otherwise generate large and spaghetti-like models. In previous work, trace clustering has been applied in an attempt to reduce the size and complexity of automatically discovered process models. The idea is to split the log into clusters and to discover one model per cluster. The result is a collection of process models – each one representing a variant of the business process – as opposed to an all-encompassing model. Still, models produced in this way may exhibit unacceptably high complexity. In this setting, this paper presents a two-way divide-and-conquer process discovery technique, wherein the discovered process models are split on the one hand by variants and on the other hand hierarchically by means of subprocess extraction. The proposed technique allows users to set a desired bound for the complexity of the produced models. Experiments on real-life logs show that the technique produces collections of models that are up to 64% smaller than those extracted under the same complexity bounds by applying existing trace clustering techniques.
Chapter
Full-text available
Existing process mining techniques are able to discover a specific process model for a given event log. In this paper, we aim to discover a configurable process model from a collection of event logs, i.e., the model should describe a family of process variants rather than one specific process. Consider for example the handling of building permits in different municipalities. Instead of discovering a process model per municipality, we want to discover one configurable process model showing commonalities and differences among the different variants. Although there are various techniques that merge individual process models into a configurable process model, there are no techniques that construct a configurable process model based on a collection of event logs. By extending our ETM genetic algorithm, we propose and compare four novel approaches to learn configurable process models from collections of event logs. We evaluate these four approaches using both a running example and a collection of real event logs.
Conference Paper
Full-text available
The application of process mining and analysis techniques to the process logs of information systems often leads to highly complex results, e.g. in terms of a high number of elements in the mined model. Thus, clustering the corresponding log files is essential for an expedient analysis. Against that background, many clustering techniques have been developed in recent years, but it remains unclear how well they perform in particular application scenarios. Therefore, the paper at hand aims at analyzing and comparing the capabilities of existing clustering techniques with regard to different objectives. As a result, some techniques are more suitable for handling particular scenarios than others, and there are also general challenges in their application, which should be addressed in future work.
Article
Full-text available
Automated process discovery techniques aim at extracting process models from information system logs. Existing techniques in this space are effective when applied to relatively small or regular logs, but generate spaghetti-like and sometimes inaccurate models when confronted with logs with high variability. In previous work, trace clustering has been applied in an attempt to reduce the size and complexity of automatically discovered process models. The idea is to split the log into clusters and to discover one model per cluster. This leads to a collection of process models – each one representing a variant of the business process – as opposed to an all-encompassing model. Still, models produced in this way may exhibit unacceptably high complexity and low fitness. In this setting, this paper presents a two-way divide-and-conquer process discovery technique, wherein the discovered process models are split on the one hand by variants and on the other hand hierarchically using subprocess extraction. Splitting is performed in a controlled manner in order to achieve user-defined complexity or fitness thresholds. Experiments on real-life logs show that the technique produces collections of models substantially smaller than those extracted by applying existing trace clustering techniques, while allowing the user to control the fitness of the resulting models.
Article
Full-text available
With the design of reference models, an increase in the efficiency of information systems engineering is intended. This is expected to be achieved by reusing information models. Current research focuses mainly on configuration as one principle for reusing artefacts. According to this principle, all variants of a model are incorporated in the reference model facilitating adaptations by choices. In practice however, situations arise whereby various requirements to a model are unforeseen: Either results are inappropriate or costs of design are exploding. This paper introduces additional design principles that aim towards giving more flexibility to both the design and application of reference models.
Chapter
Full-text available
Reference models have to be adapted to fit the application situation at hand. To reduce the adaptation effort, the concept of configurative reference modeling represents a promising approach. Nevertheless, since not every requirement of possible reference model users can be anticipated by the reference model developer, further model adaptations have to be performed. To support reference model users in decreasing their adaptation effort through greater methodological support, we propose to integrate generic model adaptation techniques with configurative reference modeling. Our paper presents recommendations for the construction of modeling languages that realize an integration of configurative and generic reference modeling.
Conference Paper
Full-text available
Process mining techniques attempt to extract non-trivial and useful information from event logs recorded by information systems. For example, there are many process mining techniques to automatically discover a process model based on some event log. Most of these algorithms perform well on structured processes with little disturbances. However, in reality it is difficult to determine the scope of a process and typically there are all kinds of disturbances. As a result, process mining techniques produce spaghetti-like models that are difficult to read and that attempt to merge unrelated cases. To address these problems, we use an approach where the event log is clustered iteratively such that each of the resulting clusters corresponds to a coherent set of cases that can be adequately represented by a process model. The approach allows for different clustering and process discovery algorithms. In this paper, we provide a particular clustering algorithm that avoids over-generalization and a process discovery algorithm that is much more robust than the algorithms described in literature [1]. The whole approach has been implemented in ProM.
Conference Paper
Full-text available
Process models can be seen as “maps” describing the operational processes of organizations. Traditional process discovery algorithms have problems dealing with fine-grained event logs and less-structured processes. The discovered models (i.e., “maps”) are spaghetti-like and are difficult to comprehend or even misleading. One of the reasons for this can be attributed to the fact that the discovered models are flat (without any hierarchy). In this paper, we demonstrate the discovery of hierarchical process models using a set of interrelated plugins implemented in ProM. The hierarchy is enabled through the automated discovery of abstractions (of activities) with domain significance.
Conference Paper
Full-text available
Reference process models are templates for common processes run by many corporations. However, the individual needs among organizations regarding the execution of these processes usually vary. A process model can address these variations through control-flow choices. Thus, it can integrate the different process variants into one model. Through configuration parameters, a configurable reference model enables corporations to derive their individual process variant from such an integrated model. While this simplifies the adaptation process for the reference model user, the construction of a configurable model integrating several process variants is far more complex than the creation of a traditional reference model depicting a single best-practice variant. In this paper we therefore recommend the use of process mining techniques on log files of existing, well-running IT systems to help the reference model provider in creating such integrated process models. Afterwards, the same log files are used to derive suggestions for common configurations that can serve as starting points for individual configurations.
Conference Paper
Full-text available
Process mining has proven to be a valuable tool for analyzing operational process executions based on event logs. Existing techniques perform well on structured processes, but still have problems discovering and visualizing less structured ones. Unfortunately, process mining is most interesting in domains requiring flexibility. A typical example would be the treatment process in a hospital, where it is vital that people can deviate to deal with changing circumstances. Here it is useful to provide insights into the actual processes, but at the same time there is a lot of diversity, leading to complex models that are difficult to interpret. This paper presents an approach using trace clustering, i.e., the event log is split into homogeneous subsets and for each subset a process model is created. We demonstrate that our approach, based on log profiles, can improve process mining results in real flexible environments. To illustrate this we present a real-life case study.
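The log profiles used for trace clustering in such approaches can be sketched minimally: each trace is mapped to an activity-frequency vector, and a vector distance then drives the clustering. The profile definition (raw activity counts) and the Euclidean distance are illustrative assumptions for this sketch; real profiles may also cover transitions, originators, or case data.

```python
from collections import Counter
from math import sqrt

def activity_profile(trace, alphabet):
    """Represent a trace as a vector of activity frequencies (a 'log profile')."""
    counts = Counter(trace)
    return [counts[a] for a in alphabet]

def euclidean(u, v):
    """Euclidean distance between two equal-length profile vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

log = [["a", "b", "b", "c"], ["a", "b", "c"], ["x", "y"]]
alphabet = sorted({a for t in log for a in t})   # ['a', 'b', 'c', 'x', 'y']
profiles = [activity_profile(t, alphabet) for t in log]

print(euclidean(profiles[0], profiles[1]))  # traces 0 and 1 are close → 1.0
print(euclidean(profiles[0], profiles[2]))  # trace 2 is far from both
```

Any standard clustering algorithm over these vectors (k-means, agglomerative, etc.) then yields homogeneous sub-logs, each of which can be mined into its own, simpler model.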
Article
Full-text available
Two paradigms characterize much of the research in the Information Systems discipline: behavioral science and design science. The behavioral-science paradigm seeks to develop and verify theories that explain or predict human or organizational behavior. The design-science paradigm seeks to extend the boundaries of human and organizational capabilities by creating new and innovative artifacts. Both paradigms are foundational to the IS discipline, positioned as it is at the confluence of people, organizations, and technology. Our objective is to describe the performance of design-science research in Information Systems via a concise conceptual framework and clear guidelines for understanding, executing, and evaluating the research. In the design-science paradigm, knowledge and understanding of a problem domain and its solution are achieved in the building and application of the designed artifact. Three recent exemplars in the research literature are used to demonstrate the application of these guidelines. We conclude with an analysis of the challenges of performing high-quality design-science research in the context of the broader IS community.
Article
Full-text available
Process mining techniques have recently received notable attention in the literature for their ability to assist in the (re)design of complex processes by automatically discovering models that explain the events registered in some log traces provided as input. Following this line of research, the paper investigates an extension of such basic approaches, where the identification of different variants of the process is explicitly accounted for, based on the clustering of log traces. Indeed, modeling each group of similar executions with a different schema allows us to single out "conformant" models, which, specifically, minimize the number of modeled enactments that are extraneous to the process semantics. Therefore, a novel process mining framework is introduced and some relevant computational issues are studied in depth. As finding an exact solution to such an enhanced process mining problem is proven to require high computational costs in most practical cases, a greedy approach is devised. This is founded on an iterative, hierarchical refinement of the process model, where, at each step, traces sharing similar behavior patterns are clustered together and equipped with a specialized schema. The algorithm guarantees that each refinement leads to an increasingly sound model, thus attaining a monotonic search. Experimental results evidence the validity of the approach with respect to both effectiveness and scalability.
Article
Full-text available
Database design commonly assumes, explicitly or implicitly, that instances must belong to classes. This can be termed the assumption of inherent classification. We argue that the extent and complexity of problems in schema integration, schema evolution, and interoperability are, to a large extent, consequences of inherent classification. Furthermore, we make the case that the assumption of inherent classification violates philosophical and cognitive guidelines on classification and is, therefore, inappropriate in view of the role of data modeling in representing knowledge about application domains. As an alternative, we propose a layered appro...
Article
Conceptual models play an increasingly important role in all phases of the information systems life cycle. For instance, they are used for business engineering, information systems development and customizing of enterprise resource planning (ERP) systems. Despite conceptual modeling being a vital instrument for developing information systems, the modeling process is often resource-consuming and faulty. As a way to overcome these failures and to improve the development of enterprise-specific models, the concept of reference modeling has been introduced. A reference model is a conceptual framework and may be used as a blueprint for information systems development. In this chapter, we seek to motivate research on reference modeling and introduce the chapters of this book on using reference models for business systems analysis. Our discussion is based on a framework for research on reference modeling that consists of four elements: reference modeling languages, reference modeling methods, reference models and reference modeling context. Each element of the framework is discussed with respect to prior research, the contributions of chapters in this book and future research opportunities.
Conference Paper
Most Process Mining techniques assume business processes remain steady through time, when in fact their underlying design could evolve over time. Discovery algorithms should be able to automatically find the different versions of a process, providing independent models to describe each of them. In this article, we present an approach that uses the starting time of each process instance as an additional feature to those considered in traditional clustering approaches. By combining control-flow and time features, the clusters formed share both a structural similarity and a temporal proximity. Hence, the process model generated for each cluster should represent a different version of the analyzed business process. A synthetic example set was used for testing, showing that the new approach outperforms the basic approach. Although further testing with real data is required, these results motivate us to pursue this line of research further.
Article
During the last decade a new generation of process-aware information systems has emerged, which enables process model configurations at buildtime as well as process instance changes during runtime. Respective adaptations result in a large number of process model variants that were derived from the same process model, but slightly differ in structure. Generally, such model variants are expensive to configure and maintain. This paper introduces two different scenarios for learning from process model adaptations and for discovering a reference model out of which the variants can be configured with minimum effort. The first scenario presumes a reference process model and a collection of related process model variants. The goal is to evolve the reference process model such that it structurally fits the given variant models better. The second scenario comprises a collection of process variants, while the original reference model is unknown; i.e., the goal is to "merge" these variants into a reference process model. We suggest two algorithms that are applicable in both scenarios, but which have their pros and cons. We systematically compare the two algorithms and contrast them with conventional process mining techniques. Our comparison results indicate good performance of both algorithms. Further, they confirm that specific techniques are needed for learning from past process adaptations. Finally, we present a case study in the automotive industry in which we applied our algorithms.
Article
Organizations are subject to constant evolution and must systematically analyze and design the impact of change to implement it consistently across all organizational domains. A thorough understanding of all relevant business-related artifacts as well as their relationships is a prerequisite to achieve this. For many organizations, business architecture management is a means to ensure the correct and up-to-date documentation of these artifacts. One challenge of business architecture management is the development of a company-specific business architecture meta model. Two directions of existing work provide partial solutions: (1) generic (meta) modeling methods and (2) business architecture meta models and languages. We argue that these two approaches complement each other and should be applied in an integrated way. The goal of this contribution is to propose such an integrated approach to business architecture engineering. The development of this approach follows the design research process and is based on experiences gained in three industrial business architecture engineering projects.
Antunes, G., Bakhshandelh, M., Borbinha, J., Cardoso, J., Dadashnia, S., et al.: The process matching contest 2015. In: Kolb, J., Leopold, H., Mendling, J. (eds.) Proceedings of the 6th International Workshop on Enterprise Modelling and Information Systems Architectures (EMISA-15), September 3-4, Innsbruck, Austria. Köllen Druck+Verlag GmbH, Bonn, September 2015

van Dongen, B.: BPI Challenge 2015 (2015). http://dx.doi.org/10.4121/uuid:31a308ef-c844-48da-948c-305d167a0ec1

Kurpjuweit, S., Winter, R.: Concern-oriented business architecture engineering. In: Proceedings of the 2009 ACM Symposium on Applied Computing, pp. 265-272. ACM (2009)