
Optimization-Based Business Process Model Matching


Abstract

The rapid increase in the generation of business process models in industry has raised the demand for process model matching approaches. In this paper, we introduce a novel optimization-based business process model matching approach which can flexibly incorporate both the behavioral and the label information of processes to identify correspondences between activities. Given two business process models, we achieve our goal by defining an integer linear program which maximizes the label similarities among process activities and the behavioral similarity between the process models. A predefined weighting parameter lets the user determine the relative importance of the local label-based similarities and the global behavioral similarity of the models. Moreover, an extensive experimental evaluation on three real-world datasets demonstrates the high accuracy of our proposal, which outperforms the state of the art.
24th International Conference on Business Information Systems (BIS 2021)
Big Data
https://doi.org/10.52825/bis.v1i.60
© Authors. This work is licensed under a Creative Commons Attribution 4.0 International License
Published: 02 July 2021
Merih Seran Uysal¹ (https://orcid.org/0000-0003-1115-6601), Dominik Hüser¹, and Wil M.P. van der Aalst¹ (https://orcid.org/0000-0002-0955-6940)
¹ Process and Data Science Chair, RWTH Aachen University, Aachen, Germany
{uysal,wvdaalst}@pads.rwth-aachen.de dominik.hueser@rwth-aachen.de
Keywords: Process Model Matching, Optimization Problem, Integer Linear Programming, Behavioral Similarity
1 Introduction
The ubiquity of advanced digital capabilities enables organizations to generate and store process models which capture the essential activities of their business processes in various domains, e.g., finance, logistics, and production [1, 14, 18]. The resulting increase in the uptake of business process model repositories calls for techniques in various fields, e.g. storage of process models, management of repositories, process querying, and process model matching.
Process model matching is the task of finding correspondences between the activities of two given process models. In particular, for the very large process model repositories of organizations, process model matching techniques are essential for identifying similar models, merging them, and eliminating redundancies, which reduces storage and processing costs and increases efficiency.
Most existing model matching techniques use activity labels and process structures to determine matchings in model repositories [12]. However, incorporating the behavior of the underlying process models is indispensable for process matching. Unlike label-based and structure-based approaches, behavioral process model matching takes the order of the activities in the models into consideration to attain a more reliable and accurate matching.
Uysal et al. | Bus. Inf. Sys. 1 (2021) "BIS 2021"
In this paper, we introduce a novel business process model matching approach, Optimization-based Process Model Matching (OPTIMA), which matches the individual components of two given process models to each other by incorporating both the label and the behavioral information of the process models. Our proposal formulates an optimization problem which maximizes the activity label similarities at an individual, local level and, simultaneously, the behavioral similarity of the given processes at a global level by using their relational profiles [19, 21, 22, 24]. Owing to the flexibility of our approach, the user can set the importance (i.e. weighting) of the behavioral information relative to the label information of the process model components. Furthermore, our approach is completely independent of a prior matching of activity labels, which is a competitive advantage over some existing approaches. Moreover, our extensive comparative experimental evaluation on three real-world datasets shows the competitiveness of our proposal against existing techniques; in particular, it outperforms the state of the art in terms of f-score.
Our paper is structured as follows: Section 2 gives an overview of related work on business process model matching. Section 3 presents the preliminaries, including fundamental information about Petri nets, relational profiles, and the similarity functions we define. In Section 4, we introduce our proposal, Optimization-based Process Model Matching (OPTIMA), followed by Section 5, which presents the extensive experimental results. Section 6 concludes the paper and outlines future work.
2 Related Work
Business process model matching is a challenging research area with numerous attempts to provide effective and accurate techniques. Process model matching describes the task of finding corresponding transitions in two given process models. Its roots stem from process model similarity [4, 6, 7, 16, 17] and ontology matching [8], which rely on structural and label comparison of processes [2, 5]. Researchers have primarily developed label-based matching techniques, which assess the similarity of activity labels in process models. A well-known label-based approach, the basic Bag of Words (BoW) matching technique [11], first determines the pairwise bag-of-words similarity between the labels of transitions. A word similarity function such as Levenshtein [23] or Lin [13] is used to compute all pairwise similarity scores, and the highest scores determine the matching.
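The basic Bag of Words label similarity described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [11]. Labels are tokenized by lowercasing and splitting on non-letters. Each word is paired with its most similar word in the other bag, and both directions are averaged. Lin similarity requires a WordNet-based information-content model, so the sketch substitutes a normalized Levenshtein similarity. All function names are hypothetical.

```python
import re

def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]

def word_sim(w1: str, w2: str) -> float:
    """Normalized Levenshtein similarity in [0, 1]; a stand-in for Lin [13]."""
    longest = max(len(w1), len(w2))
    return 1.0 if longest == 0 else 1.0 - levenshtein(w1, w2) / longest

def bag_of_words(label: str) -> set:
    """Hypothetical tokenizer: lowercase the label and split on non-letters."""
    return {w for w in re.split(r"[^a-z]+", label.lower()) if w}

def bow_label_sim(l1: str, l2: str) -> float:
    """Pair each word with its best match in the other bag, sum both
    directions, and divide by the total number of words."""
    b1, b2 = bag_of_words(l1), bag_of_words(l2)
    if not b1 or not b2:
        return 0.0
    best_12 = sum(max(word_sim(u, v) for v in b2) for u in b1)
    best_21 = sum(max(word_sim(u, v) for u in b1) for v in b2)
    return (best_12 + best_21) / (len(b1) + len(b2))

def bow_match(labels1, labels2, threshold=0.7):
    """Hypothetical matcher: keep every label pair at or above the threshold."""
    return [(a, b) for a in labels1 for b in labels2
            if bow_label_sim(a, b) >= threshold]
```

Because the labels are compared as bags, word order is ignored. For example, "Check application" and "application check" obtain a similarity of 1.0.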
In contrast to ontology-based and label-based matching, behavioral approaches exploit the additional behavioral information of process models, which cannot be captured by considering only labels or process structures. Based on this observation, researchers have developed approaches that consider the behavioral information of process models. The authors of [12] propose a behavioral model matching approach which considers both label-based similarities and behavioral relations. After determining the semantic similarity of label components, match constraints are derived from the behavioral profiles [22] of the process models. These constraints are used in a matching formalized as an optimization problem, which is solved by Markov Logic Network inference. Another model matching approach, proposed in [3], is based on quantitative bisimulation. First, the process models are converted into labeled transition systems. Then, the degree of simulation is computed, and finally a linear program corresponding to the overall bisimulation result is solved. We refer to [16, 17] for a more comprehensive study of model matching approaches.
Since our proposal can incorporate both the activity label and the behavioral information of two given processes, balanced by a parameter, we use the Bag of Words approach, a label-based method, as the baseline for label-based matching in our evaluation.
3 Preliminaries
For investigating process model matching, we first use Petri nets and workflow nets as our formal grounding [1]. Then, we define the relational profile, a compact behavioral representation of a Petri net. Last, we present similarity functions and define the relation type similarity required by our proposal.
3.1 Petri Nets
Originally introduced by Carl Adam Petri [15], Petri nets are among the most widely used process modeling languages. They support modeling concurrency as well as the effective analysis of process models. Below, we present the definitions of Petri nets, labeled Petri nets, and workflow nets, as well as related terms based on [1], which serve as the foundation of our paper.
Definition 1 (Petri net) A Petri net is defined as a triplet $N = (P, T, F)$ where $P$ is a finite set of places, $T$ is a finite set of transitions such that $P \cap T = \emptyset$, and $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation denoting a set of directed arcs. A marked Petri net is defined as a pair $(N, M)$ where $N$ is a Petri net and $M \in \mathbb{B}(P)$ is a multiset over $P$ denoting the marking of the net.
Definition 2 (Labeled Petri net) Let $\mathcal{A}$ denote the universe of activity labels. A labeled Petri net is a tuple $N = (P, T, F, A, \lambda)$ where $(P, T, F)$ is a Petri net, $A \subseteq \mathcal{A}$ is the set of activity labels, and $\lambda: T \to A$ is the labeling function.
For transitions which are not observable, we use the label $\tau$, i.e. a transition $t$ with $\lambda(t) = \tau$ is unobservable and is referred to as silent or invisible. Furthermore, elements of $P \cup T$ are referred to as nodes. For any $x \in P \cup T$, the pre-set (a.k.a. input set) of $x$, denoted $\bullet x$, is the set of nodes with a directed arc to $x$, i.e. $\bullet x = \{y \mid (y, x) \in F\}$. The post-set of $x$, denoted $x \bullet$, is the set of nodes with a directed arc from $x$, i.e. $x \bullet = \{y \mid (x, y) \in F\}$.
A marked, labeled Petri net is referred to as a labeled Petri net system, denoted $S = (N, M_0)$, where $N = (P, T, F, A, \lambda)$ is a labeled Petri net and $M_0 \in \mathbb{B}(P)$ is a multiset over the places $P$ denoting the initial marking. We let $\mathcal{N}$ denote the universe of marked labeled Petri nets.
By convention, for any labeled Petri net system $S = (N, M)$ with $N = (P, T, F, A, \lambda)$, we let $\mathcal{T}$ denote the universe of transitions. We let $T^v(S) := \{t \in T \mid \lambda(t) \neq \tau\}$ be the set of non-silent (a.k.a. visible) transitions in $S$. For the sake of simplicity, $T^v(S)$ is written as $T^v_S$ in the remainder of the paper where convenient.
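Given a labeled Petri net system $(N, M)$ with $N = (P, T, F, A, \lambda)$, a transition $t \in T$ is enabled in marking $M$, denoted $(N, M)[t\rangle$, iff $\bullet t \le M$. The firing rule ${}[\,\rangle \subseteq \mathcal{N} \times \mathcal{T} \times \mathcal{N}$ is the smallest relation satisfying, for any $(N, M) \in \mathcal{N}$ and any $t \in \mathcal{T}$: $(N, M)[t\rangle \Rightarrow (N, M)[t\rangle(N, (M \setminus \bullet t) \uplus t \bullet)$.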
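For a given labeled Petri net system $(N, M_0)$, a sequence $\sigma = \langle t_1, \ldots, t_n \rangle \in \mathcal{T}^*$ with $n \in \mathbb{N}$ is called a firing sequence of $(N, M_0)$ iff there exist markings $M_1, \ldots, M_n$ such that the following holds for all $i$ with $0 \le i < n$. First, $t_{i+1}$ is enabled in marking $M_i$, i.e. $(N, M_i)[t_{i+1}\rangle$. Second, firing $t_{i+1}$ results in the marking $M_{i+1}$, i.e. $(N, M_i)[t_{i+1}\rangle(N, M_{i+1})$.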
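The enabling and firing rules above can be illustrated with a minimal sketch. It represents markings as multisets (Python's `collections.Counter`) and a net as a dictionary from transitions to their pre- and post-sets. This encoding and the function names are assumptions made for illustration. The example net is hypothetical rather than the net of Figure 1, and labels are ignored.

```python
from collections import Counter

# Hypothetical net (not the net of Figure 1): transition -> (pre-set, post-set).
# t1 opens two concurrent branches, t2 and t3, which t4 joins again.
NET = {
    "t1": (["p1"], ["p2", "p3"]),
    "t2": (["p2"], ["p4"]),
    "t3": (["p3"], ["p5"]),
    "t4": (["p4", "p5"], ["p6"]),
}

def enabled(net, marking: Counter, t: str) -> bool:
    """(N, M)[t> iff the pre-set of t is contained in M (multiset inclusion)."""
    return all(marking[p] >= n for p, n in Counter(net[t][0]).items())

def fire(net, marking: Counter, t: str) -> Counter:
    """Successor marking: (M minus pre-set of t) plus post-set of t."""
    if not enabled(net, marking, t):
        raise ValueError(f"{t} is not enabled")
    successor = marking - Counter(net[t][0])  # exact, since t is enabled
    successor.update(net[t][1])               # multiset sum with the post-set
    return successor

def is_firing_sequence(net, m0: Counter, sigma) -> bool:
    """True iff each t_(i+1) is enabled in M_i; fires the transitions in turn."""
    marking = Counter(m0)
    for t in sigma:
        if not enabled(net, marking, t):
            return False
        marking = fire(net, marking, t)
    return True
```

Starting from the initial marking `Counter({"p1": 1})`, the sequence t1, t3, t2, t4 is a firing sequence. In contrast, t1, t2, t4 is not, because t4 still lacks the token in p5.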
Workflow nets, a subclass of Petri nets, are highly relevant for business process modeling because they naturally represent the life-cycle of the cases of the underlying process [1]. The formal definition of a workflow net is given below.
Definition 3 (Workflow net) Given an identifier $\bar{t} \notin P \cup T$, a labeled Petri net $N = (P, T, F, A, \lambda)$ is called a workflow net (WF-net) iff the following three conditions hold:
(1) $P$ contains a source place $i \in P$ (a.k.a. input place) such that $\bullet i = \emptyset$.
(2) $P$ contains a sink place $o \in P$ (a.k.a. output place) such that $o \bullet = \emptyset$.
(3) Its short-circuited net $\bar{N} = (P, T \cup \{\bar{t}\}, F \cup \{(o, \bar{t}), (\bar{t}, i)\}, A \cup \{\tau\}, \lambda \cup \{(\bar{t}, \tau)\})$ is strongly connected, i.e. there is a directed path between any pair of nodes in $\bar{N}$.
[Figure 1: example workflow net with places p1 to p5 and transitions t1 to t5.]
Figure 1. An example workflow net. The notation $p_i$ denotes the i-th place and $t_j$ denotes the j-th transition. The places $p_1$ and $p_5$ are the input (a.k.a. source) place and the output (a.k.a. sink) place, respectively.
WF-nets can describe processes with errors, such as deadlocks, activities that can never become active, or intermediate transitions that are still enabled after the process has terminated. We therefore use the soundness criterion commonly applied in the literature [20]. Let $N = (P, T, F, A, \lambda)$ be a workflow net with an input place $i \in P$ and an output place $o \in P$, and let $[N, [i]\rangle$ denote the set of markings reachable from $[i]$. Then $N$ is called sound iff the following four conditions hold:
(1) Safeness: $(N, [i])$ is safe, i.e. places cannot hold multiple tokens at the same time.
(2) Proper completion: for any marking $M \in [N, [i]\rangle$, $o \in M \Rightarrow M = [o]$.
(3) Option to complete: for any marking $M \in [N, [i]\rangle$, $[o] \in [N, M\rangle$.
(4) Absence of dead parts: for any transition $t \in T$, there is a firing sequence enabling $t$, i.e. $(N, [i])$ contains no dead transitions.
Furthermore, a Petri net is free-choice if any two transitions sharing an input place have identical input sets, i.e. for all transitions $t_1, t_2 \in T$: $\bullet t_1 \cap \bullet t_2 \neq \emptyset \Rightarrow \bullet t_1 = \bullet t_2$. Figure 1 shows an example workflow net.
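Unlike soundness, which requires state-space analysis and is not covered here, the free-choice property is purely structural. It can therefore be checked directly on the flow relation. The following minimal sketch assumes that the flow relation is given as a set of (source, target) arcs. The function names are illustrative.

```python
from itertools import combinations

def preset(flow, node):
    """Pre-set of a node: every y with an arc (y, node) in the flow relation."""
    return frozenset(y for (y, x) in flow if x == node)

def is_free_choice(transitions, flow) -> bool:
    """Free choice: transitions sharing an input place have identical pre-sets."""
    for t1, t2 in combinations(transitions, 2):
        pre1, pre2 = preset(flow, t1), preset(flow, t2)
        if pre1 & pre2 and pre1 != pre2:  # shared input place, different pre-sets
            return False
    return True
```

For example, two transitions a and b that both consume only from p1 form a free choice. If b additionally consumes from p2, the net is no longer free-choice.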
3.2 Relational Profiles
A compact behavioral representation of a Petri net requires a structure which captures the relationships among its transitions. Below, we present the definition of the relational profile.
Definition 4 (Relational profile) Let $N = (P, T, F, A, \lambda)$ be a sound free-choice workflow net and $S = (N, M_0)$ the corresponding workflow net system. A relational profile $R_S = (\Psi, \Omega)$ of $S$ is a tuple comprising a set $\Psi$ of relation types and an assignment relation $\Omega \subseteq T \times T \times \Psi$ which assigns relation types to pairs of transitions. A transition $s \in T$ is in relation $R \in \Psi$ with a transition $t \in T$, denoted $s \, R \, t$, iff $(s, t, R) \in \Omega$. $R_S$ is called a mutually exclusive relational profile if the following holds for all transitions $s, t \in T$ and all relation types $R_1, R_2 \in \Psi$ with $R_1 \neq R_2$: $(s, t, R_1) \in \Omega \Rightarrow (s, t, R_2) \notin \Omega$.
Since our proposal OPTIMA requires that profiles assign at most one relation type per pair of transitions, we only consider relational profiles with this property. In line with [21], we refer to them as mutually exclusive profiles.
Example. Consider a relational profile $R_S = (\Psi, \Omega)$ of the workflow net in Figure 1 with two relation types: the eventually-follows relation $\gg \subseteq T \times T$ and the directly-follows relation $> \subseteq T \times T$, i.e. $\Psi = \{\gg, >\}$. A pair $(t_i, t_j)$ is in the eventually-follows relation if there exists a firing sequence which fires $t_i$ before $t_j$. In contrast, $(t_i, t_j)$ is in the directly-follows relation if there exists a firing sequence in which $t_j$ is fired after $t_i$ without any visible transition in between. In Figure 1, $t_1 \gg t_4$ holds, but $t_1$ is not directly followed by $t_4$, i.e. $t_1 \not> t_4$. Thus, $(t_1, t_4, \gg) \in \Omega$ and $(t_1, t_4, >) \notin \Omega$. Moreover, $R_S$ is not a mutually exclusive relational profile, since a pair of transitions can be in more than one relation, e.g. $(t_1, t_2, \gg) \in \Omega$ and $(t_1, t_2, >) \in \Omega$.
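To make the example concrete, the following sketch derives the directly-follows and eventually-follows relations from a finite set of visible firing sequences. It then tests whether the resulting profile is mutually exclusive. The sketch rests on several assumptions. It works on an explicitly given, finite list of sequences rather than on the possibly infinite behavior of a net. The strings `">"` and `">>"` encode the relation types $>$ and $\gg$. The two sequences in the usage note are hypothetical and were chosen to reproduce the pairs discussed above; they are not derived from Figure 1.

```python
from itertools import combinations

def relations(sequences):
    """Directly-follows and eventually-follows pairs over visible transitions."""
    df, ef = set(), set()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):                  # consecutive positions
            df.add((a, b))
        for i, j in combinations(range(len(seq)), 2):  # all positions i < j
            ef.add((seq[i], seq[j]))
    return df, ef

def profile(sequences):
    """Assignment relation Omega as a set of (s, t, relation type) triples."""
    df, ef = relations(sequences)
    return {(s, t, ">") for s, t in df} | {(s, t, ">>") for s, t in ef}

def is_mutually_exclusive(omega) -> bool:
    """True iff no transition pair is assigned more than one relation type."""
    seen = set()
    for s, t, _ in omega:
        if (s, t) in seen:
            return False
        seen.add((s, t))
    return True
```

With the hypothetical sequences `[["t1", "t2", "t4"], ["t1", "t3", "t5"]]`, the result contains `("t1", "t4", ">>")` but not `("t1", "t4", ">")`. It contains both `("t1", "t2", ">")` and `("t1", "t2", ">>")`, so the profile is not mutually exclusive.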
3.3 Similarity Functions
Having defined relational profiles, we now turn to computing the similarity between the relational profiles of two given Petri nets. We assume that mutually exclusive profiles represent the behavior of Petri nets. Under this assumption, we define a function for the similarity of two given relation types, such as the directly-follows and the eventually-follows relation. This similarity corresponds to the behavioral similarity of two transition pairs exhibiting those relation types. In this paper, we consider the α-relational profile (αP) [19], the Behavioral Profile (BP) [21, 22], and the BP+ profile (BPP) [24], and define the relation type similarity function for these profiles. Note that this similarity function is not limited to these profiles and can be extended to further profiles where necessary.
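Definition 5 (Relation type similarity) Let $S_1$ and $S_2$ be sound and free-choice WF-net systems with relational profiles $R_{S_1} = (\Psi, \Omega_1)$ and $R_{S_2} = (\Psi, \Omega_2)$ of type $\mathcal{R} \in \{BP, \alpha P, BPP\}$. The relation type similarity $sim_{\mathcal{R}}: \Psi \times \Psi \to [0,1]$ of two relation types $R_1, R_2 \in \Psi$ is defined by:

$$
sim_{\mathcal{R}}(R_1, R_2) =
\begin{cases}
[R_1 = R_2] & \text{if } \mathcal{R} \in \{BP, \alpha P\} \\
w_{R_1, R_2} & \text{if } \mathcal{R} = BPP
\end{cases}
$$

where the similarity values $w_{R_1, R_2}$ of BP+ relation types stem from [24] (Table 2).
The indicator function $[\alpha] \in \{0,1\}$ returns 1 if and only if the statement $\alpha$ is true. In the definition above, it returns 1 if and only if the two relation types are equal.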
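Definition 5 can be sketched as a small function. The string encodings of the profile types (`"BP"`, `"aP"` for αP, `"BPP"`) and of relation types are assumptions. The BP+ weights $w_{R_1,R_2}$ are deliberately not reproduced here. The caller must supply them, transcribed from [24] (Table 2), as a dictionary keyed by relation-type pairs.

```python
def relation_type_similarity(profile_type, r1, r2, bpp_weights=None):
    """sim_R(R1, R2) following Definition 5.

    BP and alphaP ("aP"): indicator [R1 = R2].
    BPP: weight w_{R1,R2} looked up in a caller-supplied table; no BP+
    weights are hard-coded in this sketch.
    """
    if profile_type in {"BP", "aP"}:
        return 1.0 if r1 == r2 else 0.0
    if profile_type == "BPP":
        if bpp_weights is None:
            raise ValueError("BP+ requires a weight table w[(R1, R2)]")
        return bpp_weights[(r1, r2)]
    raise ValueError(f"unknown profile type: {profile_type}")
```

Any dictionary passed as `bpp_weights` in experiments should be filled from [24]. Placeholder values used only for testing carry no meaning.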
Analogously, the label-based similarity function $sim_L: \mathcal{T} \times \mathcal{T} \to [0,1]$ computes the similarity of two given transitions based on their labels. It is used in the next section.
4 Optimization-based Process Model Matching
In this section, we propose our novel approach, Optimization-based Process Model Matching (OPTIMA), which takes both local and global information of the underlying process models into consideration. This is achieved by using the activity labels as well as the behavioral information of both process models.
Our approach is formulated as an optimization problem which maximizes the label similarities at an individual, local level and, simultaneously, the behavioral similarity of both processes at a global level by using their relational profiles. To this end, we define an integer linear program, i.e. an optimization problem with a linear objective function, linear constraints, and integer variables [10].
To provide flexibility for the user, e.g. a process owner or domain expert, we introduce a weighting parameter $w$. It determines how much importance is attached to label information and to behavioral information, in line with the user's intention. Moreover, our proposal is fully independent of a prior matching of transition labels, which is an important competitive advantage over some existing approaches.
For the sake of simplicity, the notations $T^v(S_1)$ and $T^v(S_2)$ are replaced by $T^v_1$ and $T^v_2$ in the remainder of our paper, where required. Below, we first give the formal definition of our novel approach OPTIMA and then elaborate on its constraints:
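Definition 6 (Optimization-based Model Matching) Let the following be given:
- two sound free-choice WF-net systems $S_1 = (N_1, M^{S_1}_0)$ and $S_2 = (N_2, M^{S_2}_0)$ with $N_1 = (P_1, T_1, F_1, A_1, \lambda_1)$ and $N_2 = (P_2, T_2, F_2, A_2, \lambda_2)$,
- mutually exclusive relational profiles $R_{S_1} = (\Psi, \Omega_1)$ of $S_1$ and $R_{S_2} = (\Psi, \Omega_2)$ of $S_2$,
- a relation type similarity $sim_{\mathcal{R}}: \Psi \times \Psi \to [0,1]$ of the profile type $\mathcal{R}$, and
- a label-based similarity function $sim_L: \mathcal{T} \times \mathcal{T} \to [0,1]$.

The Optimization-based Model Matching (OPTIMA) $M \subseteq T^v_1 \times T^v_2$ is derived from the optimal solution of the following problem:

$$
\max \; w \sum_{s_1, s_2 \in T^v_1} \sum_{t_1, t_2 \in T^v_2} \frac{1}{m^2} \, y_{s_1,s_2,t_1,t_2} \, sim_{\mathcal{R}}\big(R^{S_1}_{s_1,s_2}, R^{S_2}_{t_1,t_2}\big) + (1 - w) \sum_{s \in T^v_1} \sum_{t \in T^v_2} \frac{1}{m} \, x_{s,t} \, sim_L(s, t)
$$

subject to

$$
\begin{aligned}
& \sum_{s \in T^v_1} x_{s,t} \le 1 && \forall t \in T^v_2 && \text{(1)} \\
& \sum_{t \in T^v_2} x_{s,t} \le 1 && \forall s \in T^v_1 && \text{(2)} \\
& 2 \, y_{s_1,s_2,t_1,t_2} \le x_{s_1,t_1} + x_{s_2,t_2} && \forall s_1, s_2 \in T^v_1, \; t_1, t_2 \in T^v_2 && \text{(3)} \\
& x_{s,t} \in \{0,1\} && \forall s \in T^v_1, \; t \in T^v_2 && \text{(4)} \\
& y_{s_1,s_2,t_1,t_2} \in \{0,1\} && \forall s_1, s_2 \in T^v_1, \; t_1, t_2 \in T^v_2 && \text{(5)}
\end{aligned}
$$

where $w \in [0,1]$ denotes the weighting parameter and $m = \min\{|T^v_1|, |T^v_2|\}$.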
The maximum number of simple correspondences between the two nets $N_1$ and $N_2$, i.e. matchings of single transitions, is $m := \min\{|T^v_1|, |T^v_2|\}$. By Constraint (4), for transitions $s \in T^v_1$ and $t \in T^v_2$, the binary variable $x_{s,t}$ indicates whether $s$ is matched to $t$ ($x_{s,t} = 1$) or not ($x_{s,t} = 0$). Constraints (1) and (2) ensure that every transition of one WF-net is matched to at most one transition of the other WF-net.
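The decision variable $y$ combines the information of two $x$ variables. For visible transitions $s_1, s_2 \in T^v_1$ and $t_1, t_2 \in T^v_2$, the binary variable $y_{s_1,s_2,t_1,t_2}$ (Constraint (5)) indicates whether $s_1$ is matched to $t_1$ and, simultaneously, $s_2$ is matched to $t_2$ ($y_{s_1,s_2,t_1,t_2} = 1$) or not ($y_{s_1,s_2,t_1,t_2} = 0$). Constraint (3) links the variables $x_{s_1,t_1}$, $x_{s_2,t_2}$, and $y_{s_1,s_2,t_1,t_2}$: $y_{s_1,s_2,t_1,t_2}$ can only be 1 if $x_{s_1,t_1} = x_{s_2,t_2} = 1$. Conversely, if $x_{s_1,t_1} = x_{s_2,t_2} = 1$, the maximization sets $y_{s_1,s_2,t_1,t_2} = 1$ whenever this increases the objective, i.e. whenever $w > 0$ and the corresponding relation type similarity is positive. Otherwise, the value of $y_{s_1,s_2,t_1,t_2}$ does not affect the objective.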
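This behavior of Constraint (3) can be checked exhaustively for binary variables. The sketch below is our own illustration and not part of the original formulation. It enumerates all assignments of $x_{s_1,t_1}$ and $x_{s_2,t_2}$. For each assignment, it computes the largest $y$ that satisfies $2y \le x_{s_1,t_1} + x_{s_2,t_2}$. The resulting table equals the logical AND of the two $x$ values, which is the value a maximization selects when $y$ has a positive objective coefficient.

```python
from itertools import product

def max_feasible_y(x1: int, x2: int) -> int:
    """Largest y in {0, 1} satisfying Constraint (3): 2*y <= x1 + x2."""
    return max(y for y in (0, 1) if 2 * y <= x1 + x2)

# Exhaustive table over all binary assignments of the two x variables.
TABLE = {(x1, x2): max_feasible_y(x1, x2)
         for x1, x2 in product((0, 1), repeat=2)}
```

Only the assignment $(1, 1)$ admits $y = 1$, so the linear constraint acts as a conjunction without requiring a product of variables in the objective.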
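The objective function consists of two summands. The first maximizes the behavioral similarity of both WF-nets based on their relational profiles. The second maximizes the average label similarity between matched transitions. Here, $R^{S_1}_{s_1,s_2}$ and $R^{S_2}_{t_1,t_2}$ denote the relation types that the profiles of $S_1$ and $S_2$ assign to the transition pairs $(s_1, s_2)$ and $(t_1, t_2)$. Mutual exclusiveness guarantees that there is at most one such type per pair. The behavioral summand is normalized by $m^2$, the maximum number of pairs of simple correspondences, and the label summand by $m$, the maximum number of simple correspondences. Hence both summands, and thus their combination weighted by $w$, yield an objective value in $[0.0, 1.0]$.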
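For very small nets, the optimization problem can also be solved by exhaustive enumeration instead of an ILP solver, which makes the objective easy to inspect. The sketch below is not the authors' Gurobi-based implementation. It enumerates all partial one-to-one matchings (Constraints (1), (2), and (4)). It sets each $y$ to 1 exactly for ordered pairs of correspondences in the matching (Constraint (3) and maximization) and evaluates the normalized objective. The input encoding is hypothetical: dictionaries map transition pairs to their unique relation type, and callbacks provide the similarities. The enumeration grows factorially, so it only serves as a reference for toy inputs.

```python
from itertools import combinations, permutations

def partial_matchings(ts1, ts2):
    """All one-to-one partial matchings (Constraints (1), (2), (4))."""
    for k in range(min(len(ts1), len(ts2)) + 1):
        for left in combinations(ts1, k):
            for right in permutations(ts2, k):
                yield tuple(zip(left, right))

def objective(matching, rel1, rel2, sim_r, sim_l, w, m):
    """Normalized objective; y = 1 for every ordered pair of correspondences
    (s1, t1), (s2, t2) in the matching, including s1 == s2."""
    behavioral = sum(sim_r(rel1[(s1, s2)], rel2[(t1, t2)])
                     for s1, t1 in matching for s2, t2 in matching)
    label = sum(sim_l(s, t) for s, t in matching)
    return w * behavioral / m ** 2 + (1 - w) * label / m

def optima_brute_force(ts1, ts2, rel1, rel2, sim_r, sim_l, w):
    """Exhaustive reference solver for tiny inputs (factorial running time)."""
    m = min(len(ts1), len(ts2))
    if m == 0:
        return set(), 0.0
    best = max(partial_matchings(ts1, ts2),
               key=lambda mt: objective(mt, rel1, rel2, sim_r, sim_l, w, m))
    return set(best), objective(best, rel1, rel2, sim_r, sim_l, w, m)

# Hypothetical toy input: labels favour a-x and b-y, whereas behaviour
# (a before b in net 1, y before x in net 2) favours a-y and b-x.
REL1 = {("a", "a"): "+", ("b", "b"): "+", ("a", "b"): "->", ("b", "a"): "<-"}
REL2 = {("x", "x"): "+", ("y", "y"): "+", ("y", "x"): "->", ("x", "y"): "<-"}
LABEL = {("a", "x"): 0.6, ("a", "y"): 0.5, ("b", "x"): 0.5, ("b", "y"): 0.6}

def sim_l(s, t):
    return LABEL[(s, t)]

def sim_r(r1, r2):
    return 1.0 if r1 == r2 else 0.0  # indicator, as for BP in Definition 5
```

On this toy input, $w = 0$ selects {(a, x), (b, y)} with objective 0.6. Both $w = 0.5$ (0.75 versus 0.55) and $w = 1$ (1.0) select the behavior-consistent matching {(a, y), (b, x)}. This illustrates how the weighting parameter shifts the result between label and behavior.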
5 Experiments
In this section, we first describe the experimental setup, the datasets, and the process model matching approaches used in our evaluation. Then, we present the evaluation results.
5.1 Experimental Setup
5.1.1 System setup.
The programs are implemented in Java 8. The experiments are conducted on 2 × Intel Xeon Gold 5115 CPUs (10 cores and 20 threads each, 2.40 GHz) with a total of 512 GB DDR4-2400 RAM and 15 × 400-AXQU 960 GB SSDs, running Ubuntu Linux 18.04. For our proposed OPTIMA approach, we use Gurobi 8.0.1 [9] and adopt the Petri net and behavioral profile implementation of the jBPT¹ library. The implementation used for the evaluation results presented in this paper is publicly available².
5.1.2 Datasets.
We use three real-world datasets from the Process Model Matching Contest 2015 [2]:
- University Admission Processes (abbreviated University) comprises 36 model pairs derived from 9 models representing the application procedure for students at nine universities in Germany.
- Birth Registration Processes (Birth) consists of 36 model pairs derived from 9 models representing the birth registration processes of Germany, Russia, South Africa, and the Netherlands.
- Asset Management Processes (Asset) includes 36 model pairs derived from 72 models of an SAP Reference Model Collection covering the fields of finance and accounting.
The University and Asset datasets originally contain process models in BPMN and EPML format, respectively. These models are first converted to Petri nets (PNML format), so that the process model matching approaches, including our proposal, can be evaluated.

¹ https://github.com/jbpt/codebase
² https://github.com/domhues/ilp-matcher

| Characteristics | University | Birth | Asset |
|---|---|---|---|
| Before conversion to PNML | | | |
| # model pairs | 36* | 36 | 36* |
| # transitions (min) | 12* | 9 | 1* |
| # transitions (max) | 45* | 25 | 43* |
| # transitions (avg) | 24.2* | 19.3 | 18.6* |
| After conversion to PNML | | | |
| # model pairs | 21 | - | 17 |
| # non-silent transitions (min) | 16 | - | 1 |
| # non-silent transitions (max) | 32 | - | 21 |
| # non-silent transitions (avg) | 24.1 | - | 6.4 |
| Gold standard | | | |
| % matched non-silent transitions | 25.35% | 65.95% | 84.86% |
| % unmatched non-silent transitions | 74.65% | 34.05% | 15.14% |
| % simple correspondences | 83.3% | 14.0% | 22.0% |
| % complex correspondences | 16.7% | 86.0% | 78.0% |
| % trivial correspondences | 33.3% | 4.0% | 18.3% |

Table 1. Characteristic information of the datasets Birth, University, and Asset (values are adopted from [2]).
The model pairs in the datasets Birth, University, and Asset are associated with a gold standard. This gold standard represents the ground truth, i.e. the optimal matching of the process model pairs. It was derived manually using human expert knowledge and comprises simple (1:1) and complex (1:n) correspondences.
Table 1 presents key characteristics of the three datasets. As mentioned above, the process models in the University and Asset datasets are in BPMN and EPML format, respectively, and are converted to Petri nets (PNML format). This conversion reduces the number of model pairs available for the matching evaluation, from 36 to 21 for University and from 36 to 17 for Asset. The reason is that some converted models are no longer free-choice, and relational profiles cannot be applied to them. Since the Birth dataset already comprises process models in PNML format, no conversion is needed, i.e. all 36 process model pairs remain for the experimental evaluation. In addition, the University dataset has the highest average number of non-silent transitions (24.1), while the Asset dataset has the lowest (6.4). Furthermore, the smallest Asset model includes only 1 non-silent transition, while the University dataset has the highest minimum number of non-silent transitions (16).
The bottom part of Table 1 reports the gold standard's percentages of matched and unmatched transitions and of simple and complex correspondences. For the University dataset, 25.35% of the non-silent transitions are mapped by the gold standard, while 74.65% remain unmatched. Furthermore, the gold standard of the University dataset contains a high percentage of simple correspondences (83.3%), while that of the Asset dataset contains a high percentage of complex correspondences (78%). Note that the percentages of simple and complex correspondences sum to 100%, so a low percentage of simple correspondences implies a high percentage of complex correspondences.
5.1.3 Approaches.
As described above, our proposal OPTIMA uses both label-based and behavioral information of process models, balanced by a weighting parameter $w$. We vary $w \in \{0.0, 0.1, \ldots, 1.0\}$ and use the basic Bag of Words label similarity function with the Lin word similarity function. Furthermore, we consider the following relational profiles of process models: the α-relational profile (αP) [19], the Behavioral Profile (BP) [21, 22], and the BP+ profile (BPP) [24]. Note that evaluating and comparing different label similarity functions is beyond the scope of this work.
To provide a fair empirical study, we consider existing process model matching approaches which also take the behavioral information of process models into consideration. First, we use the Markov Logic Network model matching approach [12] with two different label similarity functions. The first is the Refactored label similarity function, as proposed by its authors (MarkovR). The second is the basic Bag of Words label similarity function with the Lin word similarity function [13] (MarkovB). This way, the results can be compared with those of OPTIMA. Second, we use the bisimulation-based model matching approach (Bisim) [3] with the basic Bag of Words label similarity function and the Lin word similarity function, so that its results are comparable with those of our proposal.
Note that setting $w = 0$ reduces OPTIMA to purely label-based process model matching, while setting $w = 1$ makes our proposal use only behavioral information. We therefore also consider a baseline from the class of label-based process model matching approaches, corresponding to the former setting. This baseline is the Bag of Words process model matching approach [11] (BoW) with the Lin word similarity function and a threshold value of 0.7, as suggested by its authors.
We compare the computed matching results, i.e. the correspondences found, against the gold standard created by the authors of [2]. In this way, each activity pair is classified as true-positive ($TP$), true-negative ($TN$), false-positive ($FP$), or false-negative ($FN$). Based on this classification, we calculate precision $= TP/(TP + FP)$, recall $= TP/(TP + FN)$, and f-score $= (2 \times \text{precision} \times \text{recall})/(\text{precision} + \text{recall})$.
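The three measures can be computed directly from sets of correspondences, as in the following minimal sketch. It assumes that both the found matching and the gold standard are given as sets of (activity, activity) pairs. A complex 1:n correspondence is then represented by several pairs, which is an encoding choice of this sketch. True negatives do not enter any of the three measures. Returning 0.0 for empty denominators is likewise an assumed convention.

```python
def prf(found: set, gold: set):
    """Precision, recall and f-score of found correspondences vs. a gold standard."""
    tp = len(found & gold)   # found and in the gold standard
    fp = len(found - gold)   # found but not in the gold standard
    fn = len(gold - found)   # in the gold standard but not found
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For example, with two of three found pairs in a gold standard of three pairs, precision, recall, and f-score all equal 2/3.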
To obtain comparable and fair evaluation results, we aggregate precision, recall, and f-score in two ways, following [2]. The macro evaluation averages the precision, recall, and f-score values over all test cases. The micro evaluation first sums up all true/false positives and true/false negatives over all test cases and then computes precision, recall, and f-score once from these totals. Furthermore, Bisim, MarkovR, MarkovB, and our proposal OPTIMA have parameters whose values are not necessarily prescribed by their authors. We therefore report the results obtained with the parameter values which lead to the highest micro f-score, as in [2]. Due to space limitations, we focus on comparing the precision, recall, and f-score of our proposal and the aforementioned approaches.
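The difference between the two aggregations, and the selection of the parameter setting with the highest micro f-score, can be sketched as follows. The sketch assumes that each test case (model pair) is summarized by its counts $(TP, FP, FN)$. All names are hypothetical, and returning 0.0 for empty denominators is an assumed convention.

```python
def scores(tp: int, fp: int, fn: int):
    """Precision, recall and f-score from counts (0.0 for empty denominators)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def macro(counts):
    """Average the per-test-case precision, recall and f-score values."""
    per_case = [scores(*c) for c in counts]
    return tuple(sum(s[k] for s in per_case) / len(per_case) for k in range(3))

def micro(counts):
    """Sum TP, FP and FN over all test cases, then compute the scores once."""
    tp, fp, fn = (sum(c[k] for c in counts) for k in range(3))
    return scores(tp, fp, fn)

def best_by_micro_f(runs):
    """runs: {parameter setting: per-case counts}; highest micro f-score wins."""
    return max(runs, key=lambda setting: micro(runs[setting])[2])
```

For two test cases with counts (9, 1, 0) and (0, 1, 1), the macro precision is 0.45, because the poor second case weighs as much as the first. The micro precision is 9/11 ≈ 0.82, because the larger first case dominates the summed counts.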
5.2 Experimental Results
Having described the experimental setup, we now present our evaluation results. Figure 2 shows the precision, recall, and f-score results of our proposal OPTIMA and four state-of-the-art approaches under micro and macro aggregation. Inspecting the plots, we note that the micro and macro aggregated results of all approaches show a similar tendency across the real-world datasets (Birth, Asset, and University), as anticipated.
As stated before, we present the results attained by applying the parameters resulting in
the highest micro f-score values, as proposed in [2]. The results of our approach OPTIMA are
obtained by determining the best values attained with the BP+ profile (BPP) with the weighting
parameter w= 0.4on the Birth dataset, with the BP+ profile (BP P ) with the weighting param-
68
Uysal et al. | Bus. Inf. Sys. 1 (2021) "BIS 2021"
0
0.2
0.4
0.6
0.8
1
Birth Asset University
precision value (macro)
dataset
BoW OPTIMA MarkovR MarkovB Bisim
(a) Precision results obtained by macro aggregation.
0
0.2
0.4
0.6
0.8
1
Birth Asset University
precision value (micro)
dataset
BoW OPTIMA MarkovR MarkovB Bisim
(b) Precision results obtained by micro aggregation.
0
0.2
0.4
0.6
0.8
Birth Asset University
recall value (macro)
dataset
BoW OPTIMA MarkovR MarkovB Bisim
(c) Recall results obtained by macro aggregation.
0
0.2
0.4
0.6
0.8
Birth Asset University
recall value (micro)
dataset
BoW OPTIMA MarkovR MarkovB Bisim
(d) Recall results obtained by micro aggregation.
0
0.2
0.4
0.6
0.8
1
Birth Asset University
fscore value (macro)
dataset
BoW OPTIMA MarkovR MarkovB Bisim
(e) F-score results obtained by macro aggregation.
0
0.2
0.4
0.6
0.8
1
Birth Asset University
fscore value (micro)
dataset
BoW OPTIMA MarkovR MarkovB Bisim
(f) F-score results obtained by micro aggregation.
Figure 2. Micro and macro aggregation results of precision, recall, and f-score measures
obtained on three real-world datasets, i.e. Birth Registration Processes (Birth), Asset Man-
agement Processes (Asset), and University Admission Processes (University) stemming from
Process Model Matching Contest 2015 [2]. For a fair comparison, we utilize (i)Bag of Words
process model matching approach [11] (BoW ) indicating a baseline for a label-based approach
(ii) Markov Logic Network model matching approach [12] with two variations MarkovR and
MarkovB utilizing Refactored and Bag of Words label similarity functions (iii) Bisimulation-based
model matching approach [3] (Bisim). Considering both label and behavior information of pro-
cess models and being independent of the application of a prior matching of activity labels, our
proposal OPTIMA outperforms all approaches regarding micro and macro f-score results on all
real-world datasets.
eter w = 0.5 on the University dataset, and with the Behavioral Profile (BP) with the weighting parameter w = 0.2 on the Asset dataset. Furthermore, for the bisimulation-based approach Bisim, a skip penalty of 0.7 for Birth and Asset and a skip penalty of 0.9 for University lead to the highest values. In addition, the best values for the Markov-based approaches MarkovB and MarkovR are attained with the constraint weight 0.001 for Birth and University, and with the constraint weight 0.01 for the Asset dataset. Finally, for the Bag of Words process model matching approach (BoW), which serves as a baseline for label-based matching techniques, we use the threshold of 0.7 suggested by the authors in [11].
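The BoW baseline compares activity labels as bags of words and accepts every activity pair whose similarity reaches a threshold. The sketch below is a simplified illustration under stated assumptions, not the implementation from [11]: labels are tokenized by lower-casing and splitting on non-letters, the word similarity is exact string equality (a placeholder for the richer word-level measures used in [11]), a label score averages each word's best match in the other bag (an assumed aggregation), and the names `tokens`, `word_sim`, `bow_similarity`, and `bow_match` are hypothetical. Only the threshold of 0.7 comes from the text.

```python
import re


def tokens(label: str) -> set[str]:
    """Lower-cased bag (here: set) of words in an activity label."""
    return set(re.findall(r"[a-z]+", label.lower()))


def word_sim(w1: str, w2: str) -> float:
    """Placeholder word similarity: exact match only (an assumption)."""
    return 1.0 if w1 == w2 else 0.0


def bow_similarity(label_a: str, label_b: str) -> float:
    """Average, over the words of both bags, of each word's best match in the other bag."""
    bag_a, bag_b = tokens(label_a), tokens(label_b)
    if not bag_a or not bag_b:
        return 0.0
    best_a = [max(word_sim(w, v) for v in bag_b) for w in bag_a]
    best_b = [max(word_sim(v, w) for w in bag_a) for v in bag_b]
    return (sum(best_a) + sum(best_b)) / (len(bag_a) + len(bag_b))


def bow_match(acts_a: list[str], acts_b: list[str],
              threshold: float = 0.7) -> list[tuple[str, str]]:
    """Keep every activity pair whose label similarity reaches the threshold.

    Pairs are selected independently, so one activity may be matched to several
    others, i.e. complex (1:n) correspondences can arise.
    """
    return [(a, b) for a in acts_a for b in acts_b if bow_similarity(a, b) >= threshold]


# Hypothetical activity labels.
print(bow_match(["Check application"],
                ["Check application documents", "Check application form", "Send letter"]))
```

Here "Check application" reaches a similarity of 0.8 with both longer labels and 0.0 with "Send letter", so the sketch returns two correspondences for one activity, a simple form of the complex correspondences that threshold-based label matching can produce.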
The results summarized in Figure 2(a)-2(b) show that the Bisim approach outperforms the other approaches in precision on the Birth and Asset datasets. Furthermore, OPTIMA achieves the highest precision on the University dataset, while the Markov-based approaches show the lowest precision. The slightly higher precision of Bisim compared with OPTIMA can be explained by the fact that quantitative simulation offers higher or comparable expressiveness for model matching compared with relational profiles. We also observe that BoW, although only a baseline label-based approach, achieves much higher precision than MarkovB and MarkovR on Birth and Asset, whose gold standards (i.e., ground truth) contain complex correspondences. This can be explained by the fact that BoW successfully detects complex correspondences, while the Markov-based approaches find only a smaller portion of them on these datasets.
The results presented in Figure 2(c)-2(d) confirm that our proposal OPTIMA outperforms existing approaches in recall on the Birth dataset. In addition, OPTIMA achieves recall comparable to that of MarkovR and MarkovB on all datasets, which can be explained by the fact that both the Markov-based approaches and our proposal detect simple correspondences successfully. BoW also shows comparable recall on all datasets, outperforming Bisim. An interesting observation is the considerably poor recall of Bisim on all datasets, which suggests that the applied quantitative simulation technique detects only a small fraction of the relevant correspondences.
A closer examination of macro and micro aggregation shows that the macro-aggregated precision, recall, and f-score values on the University dataset are lower than the micro-aggregated ones. This suggests that some model pairs in University perform relatively poorly on all three measures. Because every model pair contributes equally to the macro scores, these pairs directly lower the macro-aggregated values. In contrast, micro aggregation pools the correspondences of all model pairs, so the poor performance of a few particular pairs has a much smaller influence on the micro-aggregated results.
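The difference between the two aggregation schemes can be illustrated with a small sketch. It assumes the common definitions: macro aggregation averages the per-pair scores, whereas micro aggregation sums true positives, false positives, and false negatives over all model pairs before computing the measures. Undefined ratios are set to 0, and the macro f-score is taken as the mean of the per-pair f-scores; the contest [2] may use slightly different conventions. The class and function names and all counts are hypothetical, not data from Figure 2.

```python
from dataclasses import dataclass


@dataclass
class PairCounts:
    """Correspondence counts for one model pair (hypothetical structure)."""
    tp: int  # found correspondences that are in the gold standard
    fp: int  # found correspondences that are not in the gold standard
    fn: int  # gold-standard correspondences that were not found


def scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and f-score; undefined ratios are set to 0.0 (assumption)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def macro(pairs: list[PairCounts]) -> tuple[float, float, float]:
    """Average the per-pair scores: every model pair has the same weight."""
    per_pair = [scores(c.tp, c.fp, c.fn) for c in pairs]
    p, r, f = (sum(col) / len(per_pair) for col in zip(*per_pair))
    return p, r, f


def micro(pairs: list[PairCounts]) -> tuple[float, float, float]:
    """Pool the counts over all pairs first: every correspondence has the same weight."""
    return scores(sum(c.tp for c in pairs), sum(c.fp for c in pairs), sum(c.fn for c in pairs))


# Hypothetical counts: two larger, well-matched pairs and one small pair with no correct match.
pairs = [PairCounts(9, 1, 1), PairCounts(18, 2, 2), PairCounts(0, 2, 2)]
print("macro:", macro(pairs))
print("micro:", micro(pairs))
```

With these counts, macro aggregation yields about 0.6 for all three measures, while micro aggregation yields 27/32 ≈ 0.84, because the small, poorly matched pair weighs as much as each large pair in the macro average but contributes only a few correspondences to the pooled counts.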
As presented in Figure 2(e)-2(f), OPTIMA considerably outperforms all state-of-the-art matching approaches in both micro- and macro-aggregated f-score on all three real-world datasets. The reason is that our approach achieves high precision and high recall at the same time, while each of the other approaches falls short in either precision or recall.
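The f-score is the harmonic mean of precision and recall, so it is high only when both are high. A minimal sketch with hypothetical values (not read from Figure 2):

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision/recall profiles.
imbalanced = f_score(0.9, 0.3)  # high precision, low recall
balanced = f_score(0.6, 0.6)    # moderate but balanced
print(f"imbalanced: {imbalanced:.2f}, balanced: {balanced:.2f}")
```

Both profiles have the same arithmetic mean (0.6), yet the sketch prints `imbalanced: 0.45, balanced: 0.60`: a matcher that is strong in only one of the two measures is penalized, which is why balanced precision and recall translate into the higher f-scores discussed above.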
6 Conclusion
One of the major challenges in today's organizations is the ever-increasing number of business process models, which creates the need for novel, effective process model matching techniques for large process model repositories. Addressing this need, this paper introduces a novel business process model matching approach, Optimization-based Process Model Matching (OPTIMA), which matches individual components of two given process models to each other by incorporating both label and behavioral information of the process models. We present an optimization problem that maximizes the activity label similarities at a local level and the behavioral similarity of the given processes at a global level by leveraging relational profiles. Being fully independent of any prior matching of activity labels, our proposal is highly competitive with existing techniques and, in particular, outperforms the state of the art in terms of accuracy.
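To make the interplay of the local and global terms concrete, the following toy sketch scores candidate one-to-one matchings by a weighted sum of label similarities and of agreements between relational profiles, and finds the best matching by exhaustive search. It is not the paper's integer linear program: the exhaustive search replaces an ILP solver, a strict-order-only profile of sequential models stands in for richer relational profiles such as BP, the placement of the weight w on the behavioral term is an assumption, and all names and similarity values are hypothetical.

```python
from itertools import permutations

Pair = tuple[str, str]


def strict_order_profile(sequence: list[str]) -> dict[Pair, str]:
    """Toy relational profile of a purely sequential model: '->' if x precedes y, else '<-'."""
    return {
        (x, y): "->" if i < j else "<-"
        for i, x in enumerate(sequence)
        for j, y in enumerate(sequence)
        if i != j
    }


def best_matching(acts_a, acts_b, label_sim, rel_a, rel_b, w=0.5):
    """Exhaustively search partial one-to-one matchings (feasible for toy sizes only).

    score = (1 - w) * sum of label similarities of the matched pairs
          + w * number of ordered pairs of correspondences whose relations agree
    """
    best_pairs: list[Pair] = []
    best_score = float("-inf")
    # Padding with None lets an activity of model A stay unmatched.
    candidates = list(acts_b) + [None] * len(acts_a)
    for assignment in permutations(candidates, len(acts_a)):
        pairs = [(a, b) for a, b in zip(acts_a, assignment) if b is not None]
        local = sum(label_sim.get(p, 0.0) for p in pairs)
        agreements = 0
        for a1, b1 in pairs:
            for a2, b2 in pairs:
                if a1 != a2:
                    relation = rel_a.get((a1, a2))
                    if relation is not None and relation == rel_b.get((b1, b2)):
                        agreements += 1
        score = (1 - w) * local + w * agreements
        if score > best_score:
            best_pairs, best_score = pairs, score
    return best_pairs, best_score


# Hypothetical sequential models and label similarities.
model_a = ["receive request", "check request", "send reply"]
model_b = ["get order", "verify order", "notify customer"]
sims = {
    ("receive request", "get order"): 0.4,
    ("check request", "verify order"): 0.5,
    ("send reply", "notify customer"): 0.3,
    ("check request", "notify customer"): 0.35,
}
pairs, score = best_matching(model_a, model_b, sims,
                             strict_order_profile(model_a), strict_order_profile(model_b), w=0.5)
print(pairs, score)
```

The order-preserving matching wins here because it agrees on all six ordered relation pairs and also has the highest total label similarity (score ≈ 3.6). The search grows factorially with model size, which is why a solver-based formulation matters for realistic models.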
An interesting direction for future work concerns the analysis of complex correspondences, which could shed light on further matching strategies. Furthermore, we intend to evaluate our approach on additional real-world datasets to gain more insights, and to examine its execution time in order to assess its efficiency. Reducing the number of variables in the maximization problem to attain higher efficiency is a further topic for future examination.
Acknowledgments
We would like to thank the Alexander von Humboldt (AvH) Stiftung for supporting our research.
References
[1] van der Aalst, W.M.P.: Process Mining: Data Science in Action. Springer, Heidelberg, 2nd edn. (2016)
[2] Antunes, G., Bakhshandeh, M., Borbinha, J., Cardoso, J., Dadashnia, S., Francescomarino, C.D., Dragoni, M., Fettke, P., Gal, A., Ghidini, C., Hake, P., Khiat, A., Klinkmüller, C., Kuss, E., Leopold, H., Loos, P., Meilicke, C., Niesen, T., Pesquita, C., Peus, T., Schoknecht, A., Sheetrit, E., Sonntag, A., Stuckenschmidt, H., Thaler, T., Weber, I., Weidlich, M.: The Process Model Matching Contest 2015. In: EMISA'15: International Workshop on Enterprise Modelling and Information Systems Architecture. pp. 127–155. GI, Innsbruck, Austria (Sep 2015)
[3] Becker, J., Breuker, D., Delfmann, P., Dietrich, H.A., Steinhorst, M.: Identifying Business Process Activity Mappings by Optimizing Behavioral Similarity. In: AMCIS. vol. 1, Paper 21 (2012)
[4] Becker, M., Laue, R.: A Comparative Survey of Business Process Similarity Measures. Computers in Industry 63(2), 148–167 (2012)
[5] Cayoglu, U., Dijkman, R., Dumas, M., Fettke, P., García-Bañuelos, L., Hake, P., Klinkmüller, C., Leopold, H., Ludwig, A., Loos, P., Mendling, J., Oberweis, A., Schoknecht, A., Sheetrit, E., Thaler, T., Ullrich, M., Weber, I., Weidlich, M.: Report: The Process Model Matching Contest 2013. In: Lohmann, N., Song, M., Wohed, P. (eds.) Business Process Management Workshops. pp. 442–463. Springer International Publishing, Cham (2014)
[6] Dijkman, R.M., Dumas, M., van Dongen, B.F., Käärik, R., Mendling, J.: Similarity of Business Process Models: Metrics and Evaluation. Inf. Syst. 36(2), 498–516 (2011)
[7] Dumas, M., García-Bañuelos, L., Dijkman, R.M.: Similarity Search of Business Process Models. IEEE Data Eng. Bull. 32(3), 23–28 (2009)
[8] Euzenat, J., Shvaiko, P.: Ontology Matching. Springer Publishing Company, Incorporated, 2nd edn. (2013)
[9] Gurobi Optimization LLC: Gurobi Optimizer Reference Manual (2019), http://www.gurobi.com
[10] Hillier, F., Lieberman, G.: Introduction to Linear Programming. McGraw-Hill (1990)
[11] Klinkmüller, C., Weber, I., Mendling, J., Leopold, H., Ludwig, A.: Increasing Recall of Process Model Matching by Improved Activity Label Matching. In: Business Process Management. pp. 211–218. Springer Berlin Heidelberg (2013)
[12] Leopold, H., Niepert, M., Weidlich, M., Mendling, J., Dijkman, R., Stuckenschmidt, H.: Probabilistic Optimization of Semantic Process Model Matching. In: Business Process Management. pp. 319–334. Springer Berlin Heidelberg (2012)
[13] Lin, D.: An Information-theoretic Definition of Similarity. In: Proc. of the 15th International Conference on Machine Learning. vol. 98, pp. 296–304. Morgan Kaufmann (1998)
[14] Pegoraro, M., Uysal, M.S., van der Aalst, W.M.P.: Discovering Process Models from Uncertain Event Data. In: Business Process Management Workshops. pp. 238–249. Springer International Publishing, Cham (2019)
[15] Petri, C.A.: Kommunikation mit Automaten. Schriften des Rheinisch-Westfälischen Institutes für Instrumentelle Mathematik an der Universität Bonn, Technische Hochschule, Darmstadt. (1962), https://books.google.de/books?id=NCZMvAEACAAJ
[16] Schoknecht, A., Thaler, T., Fettke, P., Oberweis, A., Laue, R.: Similarity of Business Process Models – A State-of-the-Art Analysis. ACM Comput. Surv. 50(4), 52:1–52:33 (Aug 2017)
[17] Thaler, T., Schoknecht, A., Fettke, P., Oberweis, A., Laue, R.: A Comparative Analysis of Business Process Model Similarity Measures. In: Business Process Management Workshops. pp. 310–322. Springer International Publishing, Cham (2017)
[18] Uysal, M.S., van Zelst, S.J., Brockhoff, T., Ghahfarokhi, A.F., Pourbafrani, M., Schumacher, R., Junglas, S., Schuh, G., van der Aalst, W.M.: Process Mining for Production Processes in the Automotive Industry. In: Industry Forum at BPM 2020 co-located with 18th International Conference on Business Process Management (BPM 2020), Sevilla, Spain (2020)
[19] van der Aalst, W., Weijters, T., Maruster, L.: Workflow Mining: Discovering Process Models from Event Logs. IEEE Transactions on Knowledge and Data Engineering 16(9), 1128–1142 (Sep 2004)
[20] van der Aalst, W.: The Application of Petri-nets to Workflow Management. Journal of Circuits, Systems and Computers 8(1), 21–66 (1998)
[21] Weidlich, M., Mendling, J., Weske, M.: Efficient Consistency Measurement Based on Behavioral Profiles of Process Models. IEEE Transactions on Software Engineering 37(3), 410–429 (May 2011)
[22] Weidlich, M., Mendling, J., Weske, M.: Computation of Behavioural Profiles of Process Models. Business Process Technology, Hasso Plattner Institute for IT-Systems Engineering. Potsdam (2009)
[23] Weigel, A., Fein, F.: Normalizing the Weighted Edit Distance. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing. vol. 2, pp. 399–402 (Oct 1994)
[24] Wen, L., Song, J., Wang, J., Kumar, A.: BP+: An Improved Behavioral Profile Metric for Process Models. https://www.researchgate.net/publication/286932844 (2015), accessed: 01.02.2021