
The Process Model Matching Contest 2013
Ugur Cayoglu1, Remco Dijkman2, Marlon Dumas3, Peter Fettke4,
Luciano García-Bañuelos3, Philip Hake4, Christopher Klinkmüller5,
Henrik Leopold6, André Ludwig5, Peter Loos4, Jan Mendling7,
Andreas Oberweis1, Andreas Schoknecht1, Eitam Sheetrit8, Tom Thaler4,
Meike Ullrich1, Ingo Weber9,10, and Matthias Weidlich8
1Karlsruhe Institute of Technology (KIT), Germany
Institute of Applied Informatics and Formal Description Methods (AIFB)
ugur.cayoglu@kit.edu
2Eindhoven University of Technology, The Netherlands
r.m.dijkman@tue.nl
3University of Tartu, Estonia
marlon.dumas|luciano.garcia@ut.ee
4Institute for Information Systems (IWi) at DFKI and Saarland University, Germany
Tom.Thaler|Philip.Hake|Peter.Fettke|Peter.Loos@iwi.dfki.de
5Information Systems Institute, University of Leipzig, Germany
klinkmueller|ludwig@wifa.uni-leipzig.de
6Humboldt-Universität zu Berlin, Germany
henrik.leopold@wiwi.hu-berlin.de
7Wirtschaftsuniversität Wien, Austria
jan.mendling@wu.ac.at
8Technion - Israel Institute of Technology, Israel
eitams|weidlich@tx.technion.ac.il
9Software Systems Research Group, NICTA, Sydney, Australia
ingo.weber@nicta.com.au
10 School of Computer Science & Engineering, University of New South Wales
Abstract. Process model matching refers to the creation of correspondences between activities of process models. Applications of process model matching are manifold, ranging from model validation and the harmonization of process variants to the effective management of process model collections. Recently, this demand led to the development of different techniques for process model matching. Yet, these techniques are heuristics and, thus, their results are inherently uncertain and need to be evaluated on a common basis. Currently, however, the BPM community lacks established data sets and frameworks for evaluation. The Process Model Matching Contest 2013 aimed at addressing the need for effective evaluation by defining process model matching problems over published data sets. This paper summarizes the setup and the results of the contest. Besides a description of the contest matching problems, the paper comprises short descriptions of all matching techniques that have been submitted for participation. In addition, we present and discuss the evaluation results and outline directions for future work in this field of research.
Key words: Process matching, model alignment, contest, matching evaluation
1 Introduction
Business process models allow for managing the lifecycle of a business process, from its identification through its analysis, design, implementation, and monitoring [1].
A process model captures the activities of a business process along with their
execution dependencies. Process model matching is concerned with supporting
the creation of an alignment between process models, i.e., the identification of
correspondences between their activities.
In recent years, many techniques building on process model matching have been proposed. Examples include techniques for the validation of a technical implementation of a business process against a business-centered specification model [2], delta-analysis of process implementations and a reference model [3], harmonization of process variants [4, 5], process model search [6, 7, 8], and clone detection [9]. Inspired by the field of schema matching and ontology alignment, cf. [10, 11], this demand led to the development of different techniques for process model matching. Yet, these techniques are heuristics and, thus, their results are inherently uncertain and need to be evaluated on a common basis. Currently, the BPM community lacks established data sets and frameworks for evaluation.
In this paper, we report on the setup and results of the Process Model Matching
Contest 2013. It was organized as part of the 4th International Workshop on
Process Model Collections: Management and Reuse (PMC-RM 13) that took
place on August 26, 2013, at the 11th International Conference on Business
Process Management in Beijing, China. The Contest Co-Chairs were Henrik
Leopold and Matthias Weidlich.
The Process Model Matching Contest (PMMC) 2013 addresses the need for
effective evaluation of process model matching techniques. The main goal of the
PMMC is the comparative analysis of the results of different techniques. By doing
so, it further aims at providing an angle to assess strengths and weaknesses of
particular techniques and at outlining directions for improving process model
matching. Inspired by the Ontology Alignment Evaluation Initiative (OAEI)1,
the PMMC was organized as a controlled, experimental evaluation. Two process
model matching problems were defined and published with respective data sets.
Then, participants were asked to send in their result files with the identified
correspondences along with a short description of the matching technique. The
evaluation of these results was conducted by the Contest Co-Chairs.
There have been seven submissions to the contest, covering diverse techniques for addressing the problem of process model matching. All submissions provided reasonable results and could, therefore, be included in the evaluation and this paper.
paper. For each submitted matching technique, this paper contains an overview
of the matching approach, details on the specific techniques applied, and pointers
to related implementations and evaluations.
We are glad that the contest attracted interest and submissions from a variety
of research groups. We would like to thank all of them for their participation.
1http://oaei.ontologymatching.org
The remainder of this paper is structured as follows. The next section gives
details on the process model matching problems of the PMMC 2013. Section 3
features the short descriptions of the submitted matching approaches. Section 4
presents the evaluation results. Based on these results, Section 5 outlines directions
for future work in process model matching before Section 6 concludes the paper.
2 Data Sets
The contest includes two sets of process model matching problems:
University Admission Processes (UA): This set contains process models representing the admission processes of nine German universities. All models contain English text only. The models have been created by different modelers using varying terminology and capturing activities at different levels of granularity. All models are available as Petri nets in the PNML format and shall be matched pairwise. Further, for eight out of the 36 model pairs, we also provide a gold standard alignment for initial evaluation.
Birth Registration Processes (BR): This set comprises nine models of birth registration processes in Germany, Russia, South Africa, and the Netherlands. Four models were created by graduate students at the HU Berlin and five of the models stem from a process analysis in Dutch municipalities. Again, all models contain only English text, are available as Petri nets in the PNML format, and shall be matched pairwise to obtain 36 alignments.
Table 1. Characteristics of Test Data Sets
Characteristic UA BR
No. of labeled Transitions (min) 11 9
No. of labeled Transitions (max) 44 25
No. of labeled Transitions (avg) 22 17.9
No. of 1:1 Correspondences (total) 345 348
No. of 1:1 Correspondences (avg) 9.6 9.7
No. of 1:n Correspondences (total) 83 171
No. of 1:n Correspondences (avg) 2.3 4.75
Table 1 gives an overview of the main characteristics of the two data sets. In
addition to the minimum, maximum, and average number of labeled transitions
per model, it shows the total and average number of simple and complex correspondences. The numbers show that the two model sets differ particularly with regard to the number of complex correspondences. While the admission models contain an average of only 2.3 complex correspondences per model, the birth certificate models contain 4.75. Consequently, we expect the birth certificate set to be the more challenging sample.
3 Matching Approaches
In this section, we give an overview of the participating process model matching
approaches. In total, seven matching techniques participated in the process model
matching contest. Table 2 gives an overview of the participating approaches and
the respective authors. In the following subsections, we provide a brief technical
overview of each matching approach.
Table 2. Overview of Participating Approaches

1. Triple-S: A Matching Approach for Petri Nets on Syntactic, Semantic and Structural Level (Cayoglu, Oberweis, Schoknecht, Ullrich)
2. Business Process Graph Matching (Dijkman, Dumas, García-Bañuelos)
3. RefMod-Mine/NSCM - N-Ary Semantic Cluster Matching (Thaler, Hake, Fettke, Loos)
4. RefMod-Mine/ESGM - Extended Semantic Greedy Matching (Hake, Thaler, Fettke, Loos)
5. Bag-of-Words Similarity with Label Pruning (Klinkmüller, Weber, Mendling, Leopold, Ludwig)
6. PMLM - Process Matching Using Positional Language Models (Weidlich, Sheetrit)
7. The ICoP Framework: Identification of Correspondences between Process Models (Weidlich, Dijkman, Mendling)
3.1 Triple-S: A Matching Approach for Petri Nets on Syntactic,
Semantic and Structural Level
Overview. So far, only a handful of contributions have been made to the problem of process model matching. The Triple-S matching approach adheres to the KISS principle by avoiding complexity and keeping it simple and stupid. It combines similarity scores of independent levels as the basis for a well-founded decision about matching transition pairs of different process models. The following three levels and scores are considered:
Syntactic level - SIM_syn(a, b): For the syntactic analysis of transition labels we perform two preprocessing steps: (1) tokenization and (2) stop word elimination. The actual analysis is based on the calculation of Levenshtein distances between each combination of tokens (i.e. words) from the labels of transitions a and b. The final syntactic score is the minimum distance over all tokens divided by the number of tokens, i.e. the minimum average distance between each token.
Semantic level - SIM_sem(a, b): Prior to analysis, we perform the same preprocessing steps as mentioned above. Subsequently, we apply the approach of Wu & Palmer [12] to calculate the semantic similarity between each token of the labels of transitions a and b based on the path length between the corresponding concepts. The final semantic score is the maximum average similarity, i.e. it is calculated in an analogous manner to the final syntactic score.
Structural level - SIM_struc(a, b): Here, we investigate the similarity of transitions a and b through a comparison of (i) the ratio of their in- and outgoing arcs and (ii) their relative position in the complete net.
These three scores are combined into the final score SIM_total(a, b), which represents the matching degree between two transitions a and b from different process models. It is calculated according to the following formula:

SIM_total(a, b) = ω1 · SIM_syn(a, b) + ω2 · SIM_sem(a, b) + ω3 · SIM_struc(a, b)
The three parameters ω1, ω2 and ω3 define the weight of each similarity level. A threshold value θ is used to determine whether transitions actually match, i.e. iff SIM_total(a, b) ≥ θ, two transitions positively match.
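As an illustration, the weighted combination and thresholding step can be sketched as follows (a minimal sketch, not the authors' implementation; the three level scores are assumed to be precomputed values in [0, 1]):

```python
# Sketch of the Triple-S score combination. The three level scores are
# stand-ins here; Triple-S derives them from Levenshtein distances,
# Wu & Palmer similarity, and net structure, respectively.

def sim_total(sim_syn, sim_sem, sim_struc, w=(0.45, 0.3, 0.25)):
    """Weighted combination of the three similarity levels."""
    return w[0] * sim_syn + w[1] * sim_sem + w[2] * sim_struc

def is_match(sim_syn, sim_sem, sim_struc, theta=0.6):
    """Two transitions positively match iff the combined score reaches theta."""
    return sim_total(sim_syn, sim_sem, sim_struc) >= theta
```

With the contest configuration (ω1 = 0.45, ω2 = 0.3, ω3 = 0.25, θ = 0.6), two transitions that agree on the syntactic and semantic levels can still match even with a structural score of zero, since 0.45 + 0.3 = 0.75 ≥ 0.6.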
Specific Techniques. Compared to [13], the Triple-S approach makes several adjustments. Firstly, stop words are eliminated and the Levenshtein distance is calculated on the level of single tokens instead of complete sentences. Secondly, for the semantic level an established NLP approach is introduced. Finally, on the structural level Triple-S performs contextual analysis by investigating local similarity only.
Implementation. The Triple-S approach has been implemented using Java. For the calculation of the semantic score with the approach of Wu & Palmer, the WS4J Java API2 has been used to query Princeton's English WordNet 3.0 lexical database [14]. Relative positions of transitions are calculated using the implementation of Dijkstra's algorithm by Vogella3. The code can be obtained from http://code.google.com/p/bpmodelmatching/wiki/Download?tm=4 under the GNU GPL v3 license.
Evaluations. During our experiments we tried to approximate optimal results based on the gold standard examples. For the contest, we have used the following values: ω1 = 0.45, ω2 = 0.3, ω3 = 0.25 and θ = 0.6. The Triple-S approach is currently developed as part of the ongoing SemReuse research project addressing business process model reuse. This contest on business process similarity presents a welcome opportunity for first experiments. We are planning to refine the current measures for the individual levels, especially on the semantic and structural level, and to improve the detection of 1:n matches.
2https://code.google.com/p/ws4j/
3http://www.vogella.com/articles/JavaAlgorithmsDijkstra/article.html
Acknowledgement. This work has been developed with the support of the DFG (German Research Foundation) under the project SemReuse OB 97/9-1.
3.2 Business Process Graph Matching
Overview. Business process graph matching works by considering a business process as a labeled graph, wherein nodes correspond to tasks, events or gateways, and edges capture the flow of control between nodes in the process. Nodes are generally assumed to have a label, although gateways may be unlabeled.
Graph matching aims at computing a mapping between the nodes in the input graphs. In its most common form, the mapping relates one node in a graph to at most one node in the other graph (partial injective mapping). The mapping induces a distance between the two graphs, which is usually calculated by adding the following components:
the number of inserted nodes: nodes that appear in one graph, but not in
the other (i.e.: nodes that are not part of the mapping);
the sum of the distances between nodes that are part of the mapping based
on their labels (e.g.: the nodes labeled ‘receive request’ and ‘receiving request’
are closer than the nodes labeled ‘receive request’ and ‘reject request’); and
the number of inserted edges: edges that appear in one graph, but not in the
other.
The goal of a typical graph matching algorithm is to find the mapping with the smallest possible distance, also called the graph-edit distance [15]. This is a computationally complex problem because of the size of the space of possible mappings that must be explored. Thus, in practice, some pruning technique must be employed.
Specific Techniques. Graph matching algorithms can primarily be varied with respect to two points. The first variation point is the metric that is used for computing the weight of mapped nodes. The second variation point is the algorithm that is used to explore the space of possible mappings.
The two main classes of metrics to compute the weight of mapped nodes are
syntactic metrics and semantic metrics. Syntactic metrics look at the label as a
string of characters. For example, a typical syntactic metric between two labels
is string-edit distance, which is the minimum number of character insertions,
deletions and substitutions that must be performed to transform one string into
another. Semantic metrics treat the label as a list or bag of words. A typical
semantic similarity metric is based on matching the words of two given labels and
defining a distance based on this matching. Words that are closer semantically
(e.g. they are synonyms or share a hypernym) are more likely to be matched.
The number of matches found and the strength of the matches then determines
the similarity between the labels. Additional tweaks may be applied to deal with
unlabeled nodes such as gateways.
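For reference, the string-edit (Levenshtein) distance described above is computed by the standard dynamic program; a minimal sketch:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to transform string a into string b."""
    prev = list(range(len(b) + 1))  # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / match
        prev = cur
    return prev[len(b)]
```

For the labels from the example above, `edit_distance("receive request", "receiving request")` is 3 (substitute one character and insert two), which is far smaller than the distance to "reject request".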
Several algorithms can be envisioned to explore the space of possible mappings between two business process graphs. One is a greedy algorithm that, in each iteration, adds a mapping between two nodes that decreases the distance the most, until no such mapping can be found anymore. Another is based on a search of the space of mappings using the so-called A-star heuristic. We have investigated these alternatives in a number of papers [16, 17].
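The greedy variant can be sketched as follows (a simplification: candidate pairs are ranked once by a pairwise node score instead of recomputing the graph-edit distance delta in each iteration; the score function is an assumed stand-in for the label-based node similarity):

```python
def greedy_mapping(nodes1, nodes2, score):
    """Greedily build a partial injective mapping: repeatedly take the
    best-scoring unmatched node pair until no pair improves the result."""
    candidates = [(score(a, b), a, b) for a in nodes1 for b in nodes2]
    candidates.sort(key=lambda t: t[0], reverse=True)
    mapping, used1, used2 = {}, set(), set()
    for s, a, b in candidates:
        if s <= 0:            # no remaining pair improves the mapping
            break
        if a not in used1 and b not in used2:
            mapping[a] = b    # each node is mapped at most once
            used1.add(a)
            used2.add(b)
    return mapping
```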
Implementation. The graph matching approach to business process matching has been implemented both as part of the ICoP framework [18] and as part of version 5 of the ProM tool4. ProM is open source. ICoP is available on request. The tool uses WordNet to compute the semantic weights of node mappings.
Evaluations. We have evaluated several graph matching techniques on a collection of models extracted from the SAP R/3 reference model. The extracted collection consists of 100 so-called “document” models that simulate a repository of process models, and 10 so-called “query” models that simulate business process graphs that a user would be looking for. The goal is, given a query model, to rank the document models according to their similarity to the query model.

In this experiment, the aim was to test how closely different techniques correspond to the “perceived similarity” of models as determined by a gold standard. The gold standard was constructed by asking a number of individuals to rate the similarity between pairs of process models in the collection (query model, document model) on a scale of 1 to 7.
In this respect, we found that a technique based on A-star achieves a higher
mean average precision, which is a measure of ranking accuracy commonly used
in information retrieval. The greedy algorithm comes relatively close to the A-star
algorithm, while being faster.
3.3 RefMod-Mine/NSCM - N-Ary Semantic Cluster Matching
Overview. The approach for clustering business process model nodes consists of four components which are executed sequentially. First of all, it conducts a semantic error detection (1), where modeling defects are identified and automatically handled. After that, it uses all models as input for an n-ary cluster matcher (2), which uses a semantic similarity measure (3) for pairwise node comparison. As a result of the cluster matching, we get a set of clusters containing nodes of all considered models, which are then extracted into binary complex matchings (4).
Specific Techniques. Semantic error detection. While analyzing different business process models, we recognized the existence of model failures which lead to a misinterpretation of nodes during process matching. Against that background, the main function of the semantic error detection is the identification of wrongly modeled transition nodes. Since the algorithm as well as the gold standard only matches transitions, this functionality checks whether the label suggests a node being a place or confirms it being a transition. To this end, the form and order of nouns and verbs in a label are analyzed, which limits the applicability to English-language models. The identified transitions are marked as ignore and will not be considered in the following matching components.

4http://www.processmining.org
N-ary cluster matching. In contrast to existing matching techniques, the authors use an n-ary clustering instead of a binary matching. The nodes of all models are pairwise compared using a semantic similarity measure. Since the cluster algorithm is agglomerative [1], it starts with clusters of size 1 (= transitions) and consolidates two transitions into a cluster if their similarity value passes a user-defined threshold. If two nodes are clustered and both are already part of different clusters, the two clusters are merged. Thus, the resulting clusters are hard and not fuzzy [19].
Semantic similarity measure. The similarity measure used consists of three phases. The first phase splits node labels L into single words w_i^L, so that split(L) = {w_1^L, ..., w_n^L}. Stop words, like the, is, at, as well as waste characters like additional spaces, are removed. The second phase computes the Porter stem [20] stem(w_i^L) for each word and compares the stem sets of both labels. The number of stem matchings is divided by the sum of all words:

sim(L1, L2) = |{stem(w_1^L1), ..., stem(w_n^L1)} ∩ {stem(w_1^L2), ..., stem(w_m^L2)}| / (|split(L1)| + |split(L2)|)

If the resulting similarity value passes a user-defined threshold, the third phase checks the labels for antonyms using the lexical database WordNet [21] and for the occurrence of negation words like not. That phase thus decides whether the similarity is 0 or sim(L1, L2).
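The first two phases can be sketched as follows (a minimal sketch: toy_stem is a crude stand-in for the Porter stemmer, and the stop-word list is illustrative only):

```python
STOP_WORDS = {"the", "is", "at", "a", "an", "of", "to"}  # illustrative list

def toy_stem(word: str) -> str:
    """Crude stand-in for the Porter stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ment", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def split_label(label: str):
    """Phase 1: tokenize, lowercase, drop stop words and extra spaces."""
    return [w for w in label.lower().split() if w and w not in STOP_WORDS]

def sim(label1: str, label2: str) -> float:
    """Phase 2: shared stems divided by the total number of words."""
    w1, w2 = split_label(label1), split_label(label2)
    if not (w1 or w2):
        return 0.0
    shared = {toy_stem(w) for w in w1} & {toy_stem(w) for w in w2}
    return len(shared) / (len(w1) + len(w2))
```

Note that with this denominator the score of two identical labels is 0.5, so the user-defined threshold has to be chosen accordingly (the authors report 60% of the observed maximum range in their evaluation).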
Binary matching extraction. The last component extracts binary matchings from the node clusters calculated by the n-ary cluster matcher. For each model pair, all clusters are scanned for the occurrence of nodes of both models. The node set of the first model contained in a cluster is then matched to the node set of the second model. Thus, the component returns a binary complex (N:M) matching for each model pair.
Implementation. The technique has been implemented in the form of a PHP command line tool and can be publicly checked out5. In addition to the n-ary semantic cluster matching and other matching techniques, the research prototype is able to calculate node and process similarities from recent literature as well as to analyze models and matchings.
Evaluations. To evaluate the approach, the authors analyzed the precision and recall values for the delivered admission models with the corresponding gold standard. After adjusting the algorithm, the results led to a precision of 67% and a recall of 34%. Thereby, the threshold for semantic similarity was set to 60%.
3.4 RefMod-Mine/ESGM - Extended Semantic Greedy Matching
Overview. In a first attempt at dealing with the matching problem, a greedy matching [17] was implemented and evaluated based on precision and recall.
5https://refmodmine.googlecode.com/svn
Though a considerably high precision is achieved by this approach, only a low degree of recall is reached due to the neglect of potential complex matches. To attain a higher recall and meet the demands of complex matches, the approach is extended.

The approach introduced here matches business process models pairwise based on the similarities of the process models' transitions. The result of the matching algorithm is a set of complex (N:M) transition matches between two process models. The matching is subdivided into three steps.
In the first step, a pre-processing of data is applied to the models. The second
step consists in computing the similarity of all potential 1:1 transition matches of
two models using a word matching technique. In a final step, a heuristic grouping
of similar transitions from step 2 is conducted.
Specific Techniques. Pre-processing. While evaluating precision and recall, the authors noticed that some transitions which seemed to represent process events rather than activities had not been matched with regard to the gold standard. Hence, one step of the pre-processing is a heuristic filter which excludes such transitions from further matching steps.

Moreover, the labels of the transitions are split up into word sets according to split characters like whitespace or hyphen. After all non-word characters6 have been removed from the word sets, stop words like to, the, and is are removed from the word sets.
Word Matching. Unlike most approaches, the computation of the transitions' similarity is accomplished by applying the greedy matching technique [17] for business process models to transition labels. Therefore, at first, the similarity of the words of two labels is determined.

The computation of the similarity score sim_w of two words is based on dictionary lookups and a syntactic similarity measure [16]. In case the words represent synonyms, or there exists a nominalization of one word that is synonymic to the other or vice versa, they receive a similarity score of 1. If the words or their nominalizations are considered antonyms, a similarity score of -1 is returned; otherwise they receive a syntactic similarity score based on Levenshtein's edit distance.
Let L be the label of a transition T that belongs to a process model M, and let W be the set of words of a label L. sim_w(w1, w2) denotes the similarity of two words w1 ∈ W1 and w2 ∈ W2. Furthermore, let MW : W1 → W2 be a partial injective mapping on the word sets W1, W2. Then sim_L(L1, L2) denotes the similarity of two labels L1, L2:

sim_L(L1, L2) = ( Σ_{(w1,w2) ∈ MW} sim_w(w1, w2) ) / max(|W1|, |W2|)    (1)
Heuristic Grouping. The subsequent grouping of transitions consists in adding all pairs which do not fall below a predefined similarity threshold t to the result. The following rules depict the heuristic grouping technique.
6http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
Let G be a set of transitions representing a group of transitions. Given a pair of transitions (T1, T2) which satisfies the threshold criterion (sim_L(L1, L2) ≥ t), a new group G = {T1, T2} is added to the result set if neither T1 nor T2 belongs to any group. In case only one transition, either T1 or T2, is not represented in any group, this transition is added to the group the other transition belongs to. If T1 belongs to group Gi and T2 to group Gj, the groups Gi, Gj are replaced by the new group Gn = Gi ∪ Gj.
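These rules amount to incrementally merging overlapping pairs into groups; a minimal sketch (transitions as strings, sim standing in for the label similarity sim_L):

```python
def group_transitions(pairs, sim, t=0.65):
    """Merge transition pairs whose label similarity reaches threshold t
    into groups, following the three grouping rules above."""
    groups = []   # list of sets of transitions
    index = {}    # transition -> the group (set) it belongs to
    for t1, t2 in pairs:
        if sim(t1, t2) < t:
            continue                      # pair falls below the threshold
        g1, g2 = index.get(t1), index.get(t2)
        if g1 is None and g2 is None:     # rule 1: open a new group
            g = {t1, t2}
            groups.append(g)
            index[t1] = index[t2] = g
        elif g1 is None:                  # rule 2: join the existing group
            g2.add(t1)
            index[t1] = g2
        elif g2 is None:
            g1.add(t2)
            index[t2] = g1
        elif g1 is not g2:                # rule 3: Gn = Gi ∪ Gj
            g1 |= g2
            groups.remove(g2)
            for x in g2:
                index[x] = g1
    return groups
```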
Implementation. The matching approach is implemented in Java (JRE 6) and is embedded in RefMod-Mine, which is a tool set dedicated to the mining of reference models. The computation of the label similarity largely relies on dictionary lookups. The underlying dictionary is the WordNet [21] database (v3.0), and it is accessed via the RiTa.WordNet Java/JavaScript API7, which is free and open-source, licensed under the GPL.
Evaluations. The approach has been evaluated based on the partial gold standard provided. Here, the threshold for the grouping was set to 65%.
3.5 Bag-of-Words Similarity with Label Pruning
Overview. The approach to process model matching discussed here is a subset of our previous paper [22]. While we explored various options before, herein we focus on the matching strategy that provided the most significant increase in match quality in our experiments. This technique solely considers activity labels, disregarding other information present in the process models such as events, process structure, or behavior.
In a nutshell, the approach computes label similarity by (i) treating each label as a bag of words (a multi-set of words), (ii) applying word stemming (to transform, e.g., “evaluating” into “evaluate”) for better comparability, (iii) computing the similarity scores as per Levenshtein [23] and Lin [24] for each pair of words, (iv) pruning the multi-sets for both activity labels under comparison to be equal in the number of words, (v) computing an overall matching score for each activity pair, and (vi) selecting all activity pairs whose score is above a given threshold.
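Steps (i)-(vi) can be sketched as follows (a simplification: a single syntactic word similarity via difflib stands in for the combined Levenshtein and Lin scores, stemming is omitted, and pruning is realized by letting each word of the smaller bag keep only its best-scoring partner from the larger one):

```python
from difflib import SequenceMatcher

def word_sim(w1: str, w2: str) -> float:
    """Stand-in word similarity; the actual approach combines a
    Levenshtein-based [23] and a Lin [24] score per word pair."""
    return SequenceMatcher(None, w1, w2).ratio()

def bow_similarity(label1: str, label2: str) -> float:
    """Bag-of-words similarity with pruning: the larger bag is effectively
    reduced to the size of the smaller one by keeping, for each word of
    the smaller bag, only its best-scoring partner."""
    bag1, bag2 = label1.lower().split(), label2.lower().split()
    small, large = sorted((bag1, bag2), key=len)
    if not small:
        return 0.0
    best = [max(word_sim(w, v) for v in large) for w in small]
    return sum(best) / len(small)
```

For the example above, "rank case" vs. "rank application on scale of 1 to 10" still scores reasonably well, because the words of the longer label that have no counterpart in the shorter one are pruned rather than dragging the average down.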
Specific Techniques. For a detailed introduction of the overall approach we refer the reader to [22]. In the following we explain specific aspects of it and the configuration used in this paper.
One characteristic of the bag-of-words similarity is that it neglects the grammatical structure of the label. This is in contrast to [25], where the individual words of the labels are assigned types, and words are only compared if they belong to the same type. The rationale for neglecting label structure is that the brevity of labels makes it hard to deduce information like word forms. In this way, the bag-of-words similarity aims to offer a means to find matches like “reject applicant” vs. “send letter of rejection”.
7RiTa.WordNet, http://www.rednoise.org/rita/wordnet/documentation/
Furthermore, in case the two bags-of-words under comparison are different
in size, the larger one is pruned to the size of the smaller one. Therefore, words
with a small similarity score are removed from the larger set. This is done to
better capture activity labels with a strong difference in specificity. For instance,
“rank case” vs. “rank application on scale of 1 to 10” may have a very low average
word similarity as the second label also contains information about a condition
not present in the first label.
Finally, the decision to rely on a syntactical (Levenshtein) as well as a semantic
(Lin) word similarity notion tries to lessen the weaknesses of both notions. While
syntactical notions cannot account for a strong conceptual similarity of two words,
a semantic notion struggles when spelling errors are present. However, there are
still cases where this combination struggles.
Implementation. The technique is implemented in Java and is part of the Process Model Matching Tools for Java (jpmmt) project, which aims at providing algorithms and measures for process model matching. The project is publicly available8 under the terms of the MIT License9.
Evaluations. In [22] we evaluated various configurations of the bag-of-words similarity with label pruning and its basic variant, the bag-of-words similarity. These configurations included different pruning criteria and word similarity functions. In order to achieve comparability, we used the data set from [25], which includes the university admission processes and the corresponding matching standard that is also part of the data set of this matching contest. The evaluation showed that the technique has the potential to increase the recall of process model matching compared to results yielded by the approaches introduced in [18, 25].
Furthermore, we applied the technique in the context of Business Process Querying (BPQ). In [26] an approach to BPQ is presented that decomposes a BPMN-Q query [27] into a set of sub-queries. For these sub-queries, corresponding process model fragments are determined within a process collection. Finally, these fragments are aggregated in order to provide a list of process model fragments that provide answers to the whole query. Our technique constitutes the basis for an extension of this approach: instead of relying on 1:1 matches for the activities in the query, this assumption is relaxed and more complex matching constellations are allowed. An evaluation which also relies on the university admission processes shows that the technique in combination with the approach from [26] yields promising results. However, the size of the collection and queries is relatively small, and further experiments need to be conducted.
3.6 PMLM - Process Matching Using Positional Language Model
Overview. This matching technique is tailored towards process models that feature textual descriptions of activities; it is introduced in detail in [28]. Using ideas from language modeling in Information Retrieval, the approach leverages those descriptions to identify correspondences between activities. More precisely, we combine two different streams of work on probabilistic language modeling. First, we adopt passage-based modeling such that activities are passages of a document representing a process model. Second, we consider structural features of process models by positional language modeling. Using those probabilistic language models, we create a similarity matrix between the activities and derive correspondences using second line matching.

8http://code.google.com/p/jpmmt/
9http://opensource.org/licenses/mit-license.php
Specific Techniques
Activities as Passages. Let $\mathcal{T}$ be a corpus of terms. For a process model $P$, we create a document $d = \langle T_1, \ldots, T_n \rangle$ as a sequence of length $n \in \mathbb{N}$ of passages, where each passage is a set of terms $d(i) = T \subseteq \mathcal{T}$, $1 \leq i \leq n$. The set $d(i) = T$ comprises all terms that occur in the label or description of the activity at position $i$. The length of $d$ is denoted by $|d|$. We denote by $\mathcal{D}$ a set of processes, represented as documents.

Our model is built on a cardinality function $c : (\mathcal{T} \times \mathcal{D} \times \mathbb{N}) \to \{0, 1\}$, such that $c(t, d, i) = 1$ if $t \in T = d(i)$ (term $t$ occurs in the $i$-th passage of $d$) and $c(t, d, i) = 0$ otherwise. To realize term propagation to close-by positions, a proximity-based density function $k : (\mathbb{N} \times \mathbb{N}) \to [0, 1]$ is used to assign a discounting factor to pairs of positions. Then, $k(i, j)$ represents how much of the occurrence of a term at position $j$ is propagated to position $i$. We rely on the Gaussian kernel $k_g(i, j) = e^{-((i-j)^2)/(2\sigma^2)}$, defined with a spread parameter $\sigma \in \mathbb{R}^+$ [29]. Adapting function $c$ with term propagation, we obtain a function $c' : (\mathcal{T} \times \mathcal{D} \times \mathbb{N}) \to [0, 1]$, such that $c'(t, d, i) = \sum_{j=1}^{n} c(t, d, j) \cdot k_g(i, j)$. Then, our positional, passage-based language model $p(t \mid d, i)$ captures the probability of term $t$ occurring in the $i$-th passage of document $d$ ($\mu \in \mathbb{R}$, $\mu > 0$, is a weighting factor):

$$p_\mu(t \mid d, i) = \frac{c'(t, d, i) + \mu \cdot p(t \mid d)}{\sum_{t' \in \mathcal{T}} c'(t', d, i) + \mu} \quad (2)$$
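To make the model concrete, the following is a minimal Python sketch of Eq. (2) for a toy document of two passages. The parameter values ($\sigma = 1$, $\mu = 1$) and the background model $p(t \mid d)$ (relative frequency over all term occurrences) are illustrative assumptions, not the contest configuration.

```python
import math

# Toy sketch of the positional, passage-based language model of Eq. (2).
# The spread sigma, the weight mu, and the background model p(t|d)
# are illustrative assumptions.

def gaussian_kernel(i, j, sigma=1.0):
    # k_g(i, j): share of a term occurrence at position j propagated to i
    return math.exp(-((i - j) ** 2) / (2 * sigma ** 2))

def positional_model(doc, vocab, i, mu=1.0, sigma=1.0):
    """doc: list of term sets (one per activity passage); i is 1-based."""
    total_terms = sum(len(passage) for passage in doc)
    def c_prime(t):
        # c'(t, d, i) = sum_j c(t, d, j) * k_g(i, j)
        return sum(gaussian_kernel(i, j + 1, sigma)
                   for j, passage in enumerate(doc) if t in passage)
    def background(t):
        # assumed p(t|d): fraction of the document's term occurrences that are t
        return sum(1 for passage in doc if t in passage) / total_terms
    denom = sum(c_prime(t) for t in vocab) + mu
    return {t: (c_prime(t) + mu * background(t)) / denom for t in vocab}

doc = [{"check", "application"}, {"send", "letter"}]
vocab = {"check", "application", "send", "letter"}
model = positional_model(doc, vocab, i=1)
```

Note that, by construction, the smoothed values form a probability distribution over the vocabulary, and terms at or near the queried position receive higher probability mass.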
Derivation of Passage Positions. To instantiate the positional language model
for process models, we need to specify how to order the passages in the document
to represent the order of activities in a process. In this matching contest, we
chose to use a breadth-first traversal over the process model graph, starting from an initial activity that creates the process instance (we insert a dummy node connected to all initial activities if needed).
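The passage ordering described above can be sketched as follows; the edge-list graph representation and the dummy-node name are hypothetical.

```python
from collections import deque

# Hypothetical sketch of deriving passage positions by breadth-first
# traversal, using a dummy source connected to all initial activities.
def passage_order(edges, initial_activities):
    graph = {}
    for src, tgt in edges:
        graph.setdefault(src, []).append(tgt)
    graph["_start"] = list(initial_activities)  # dummy node
    order, seen = [], {"_start"}
    queue = deque(["_start"])
    while queue:
        node = queue.popleft()
        if node != "_start":
            order.append(node)  # index in 'order' = passage position
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

order = passage_order([("a", "b"), ("a", "c"), ("b", "d")], ["a"])
```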
Similarity of Language Models. Using the language models, we measure the
similarity for document positions and, thus, activities of the process models,
with the Jensen-Shannon divergence (JSD) [30]. Let $p_\mu(t \mid d, i)$ and $p_\mu(t \mid d', j)$ be the smoothed language models of two process model documents. Then, the probabilistic divergence of position $i$ in $d$ with position $j$ in $d'$ is:

$$jsd(d, d', i, j) = \frac{1}{2} \sum_{t \in \mathcal{T}} p_\mu(t \mid d, i) \lg \frac{p_\mu(t \mid d, i)}{p_+(t)} + \frac{1}{2} \sum_{t \in \mathcal{T}} p_\mu(t \mid d', j) \lg \frac{p_\mu(t \mid d', j)}{p_+(t)}$$

$$\text{with } p_+(t) = \frac{1}{2}\left(p_\mu(t \mid d, i) + p_\mu(t \mid d', j)\right) \quad (3)$$

The Process Model Matching Contest 2013 13

When using the binary logarithm, the JSD is bound to the unit interval $[0, 1]$, so that $sim(d, d', i, j) = 1 - jsd(d, d', i, j)$ can be used as a similarity measure.
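The JSD-based similarity of Eq. (3) can be sketched as follows, with the two language models given as term-to-probability dictionaries:

```python
import math

# Sketch of the JSD-based similarity of Eq. (3): with the binary
# logarithm the divergence lies in [0, 1], so sim = 1 - jsd.
def jsd(p, q):
    divergence = 0.0
    for t in set(p) | set(q):
        m = 0.5 * (p.get(t, 0.0) + q.get(t, 0.0))  # p_+(t)
        if p.get(t, 0.0) > 0:
            divergence += 0.5 * p[t] * math.log2(p[t] / m)
        if q.get(t, 0.0) > 0:
            divergence += 0.5 * q[t] * math.log2(q[t] / m)
    return divergence

def sim(p, q):
    return 1.0 - jsd(p, q)

identical = sim({"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5})
disjoint = sim({"a": 1.0}, {"b": 1.0})
```

Identical distributions yield similarity 1, while distributions over disjoint terms yield similarity 0, the two extremes of the unit interval.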
Derivation of Correspondences. Finally, we derive correspondences from a
similarity matrix over activities, which is known as second line matching. Here,
we rely on two strategies, i.e., dominants and top-k, see [31]. The former selects pairs of activities that share the maximum similarity value in their row and column of the similarity matrix. The latter selects, for each activity in one model, the $k$ activities of the other process that have the highest similarity values.
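The two second-line matching strategies can be sketched over a small similarity matrix (rows for the activities of one model, columns for the other; the values are hypothetical):

```python
# Sketch of the two second-line matching strategies over a similarity matrix.
def dominants(matrix):
    # pairs whose similarity is maximal in both their row and their column
    pairs = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            column = [matrix[r][j] for r in range(len(matrix))]
            if value == max(row) and value == max(column):
                pairs.append((i, j))
    return pairs

def top_k(matrix, k):
    # for each row activity, the k column activities with highest similarity
    return {i: sorted(range(len(row)), key=lambda j: -row[j])[:k]
            for i, row in enumerate(matrix)}

m = [[0.9, 0.2, 0.1],
     [0.3, 0.8, 0.4]]
```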
Implementation
The application was built in C# and uses the Lemur Toolkit for stemming terms and for calculating the probability of each term to be relevant given a certain passage and position in a document. In our implementation, we first read the XML files representing the process models, transform each element into an object according to its type (transition, place, or arc), and order the transitions. In the first phase, we create an ordered document containing only the activity labels (with no propagation), create a similarity matrix using the Lemur Toolkit, and find correspondences using the dominants approach. In the second phase, we create another ordered document with activity labels, descriptions, and term propagation, create a similarity matrix using the Lemur Toolkit, and find correspondences using the top-3 approach. Finally, we choose matches according to the dominants result and add the selected top-3 candidates if their similarity score is no less than 80% of the highest similarity value in their row.

The implementation is still in the development stage, so for the time being it is not available for public use.
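The final selection step described above might look roughly as follows; the matrix and the precomputed dominant and top-3 inputs are hypothetical, and this is a sketch rather than the authors' C# code:

```python
# Sketch of the final selection: keep the dominant matches and add
# top-3 candidates scoring at least 80% of their row maximum.
def select_matches(matrix, dominant_pairs, top3, threshold=0.8):
    chosen = set(dominant_pairs)
    for i, row in enumerate(matrix):
        best = max(row)
        for j in top3.get(i, []):
            if row[j] >= threshold * best:
                chosen.add((i, j))
    return chosen

m = [[0.9, 0.8, 0.1]]
result = select_matches(m, [(0, 0)], {0: [0, 1, 2]})
```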
Evaluations
We conducted experiments with several real-world model collections. First, we used models from the Bank of Northeast of Brazil (BNB) that capture business processes on three levels: business perspective, technical perspective, or executable process specification, also used in [2]. Second, we used models from an electronics company and from municipalities in the Netherlands, described and used for evaluation in [32]. All sets include textual annotations for at least some of the activities. Our results indicate that this matching technique is geared towards high recall, increasing it up to a factor of 5 over existing work [28]. While average precision is rather low, we observe k-precision values above 60%; k-precision extends precision to top-k lists, counting a match as correct if the top-k list contains a correct pair. Hence, correct correspondences can be extracted by an expert with reasonable effort, thereby supporting semi-automated matching.
3.7 The ICoP Framework: Identification of Correspondences
between Process Models
Overview
The ICoP framework [32] aims at solving the problem of matching process models with a particular focus on complex correspondences that are defined between sets of activities instead of single activities. Towards this end, the framework proposes an architecture and a set of re-usable components for assembling concrete matchers.
The ICoP architecture defines process model matching as a multi-step ap-
proach involving four different types of components.
Searchers
try to cope with the combinatorial challenges induced by potentially complex correspondences by applying heuristics to search the space of possible matches. Here, different strategies are applied, first for grouping activities and, second, for assessing the similarity of these groups of activities. Searchers return a set of candidate correspondences with assigned confidence scores.
Boosters
aggregate candidate correspondences and adapt their scores. On the one hand, the multiset of matches returned by the searchers is aggregated to obtain a set of candidate correspondences. On the other hand, scores are adapted, e.g., based on subsumption of candidate correspondences.
Selectors
build up the actual final set of correspondences from the set of candidate
correspondences, by selecting the best candidates that are non-overlapping in
their sets of activities. Here, selection is guided by the scores of the candidates
as well as an evaluation score computed by an evaluator (see below). Selection of correspondences is done iteratively. Yet, exhaustive search for the best selection is typically not possible, so that a greedy strategy or an approach with a certain lookahead is followed.
Evaluators
assign a score to a set of correspondences. Computation of this score
is typically based on the original process models, such that the consistency
of certain structural or behavioural properties of the process models under
the given correspondences is assessed.
In addition to this architecture, the ICoP framework provides different imple-
mentations of these four components that may be used to assemble matchers.
Examples include searchers that rely on vector space scoring, different aggrega-
tion boosters, evaluators based on the graph edit distance, and selectors that
implement different strategies for combining scores of individual candidates and
the evaluation scores for sets of correspondences.
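The four-component pipeline could be wired together roughly as follows; all component implementations here are placeholders, and the greedy selector only illustrates the non-overlap constraint described above.

```python
# Minimal sketch of the ICoP multi-step architecture: searchers propose
# scored candidate correspondences (between sets of activities), boosters
# adapt the candidate set, and a selector picks non-overlapping winners.
def assemble_matcher(searchers, boosters, selector):
    def match(model1, model2):
        candidates = []
        for search in searchers:
            candidates.extend(search(model1, model2))  # (group1, group2, score)
        for boost in boosters:
            candidates = boost(candidates)
        return selector(candidates)
    return match

def greedy_selector(candidates):
    # pick best-scoring candidates whose activity sets do not overlap
    used1, used2, result = set(), set(), []
    for g1, g2, score in sorted(candidates, key=lambda c: -c[2]):
        if not (set(g1) & used1) and not (set(g2) & used2):
            result.append((g1, g2, score))
            used1.update(g1)
            used2.update(g2)
    return result

cands = [(("a",), ("x",), 0.9), (("a", "b"), ("y",), 0.7), (("c",), ("y",), 0.6)]
matcher = assemble_matcher([lambda m1, m2: cands], [], greedy_selector)
picked = matcher(None, None)
```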
Specific Techniques
We want to highlight two specific techniques that are
used in components of the ICoP framework:
Virtual Document Searchers. Searchers implement heuristics to first group
activities in either process model and then assess the similarity of these groups to
derive candidate correspondences. Given a set of activities groups in either model
(e.g., derived based on proximity in terms of graph distance or by structural
decomposition), searchers in the ICoP framework exploit virtual documents for
similarity assessment. Here, the notion of a virtual document is inspired by work on ontology alignment [33], where a virtual document of a node consists of all textual information in an ontology that is related to that node. Then, two virtual documents are scored based on their cosine similarity in a vector space that is spanned by the terms that appear in the documents. In the ICoP searchers, a
virtual document for a group of activities consists of the terms of the activity
label and any additional textual information related to the activity, such as an
activity description, data input and output artefacts, and names and descriptions
of related roles and information systems. Combined with common techniques
from information retrieval, e.g., stop-word filtering and term-frequency based
weighting, this technique provides a means to consider not only activity labels, but
a broad spectrum of textual information related to an activity for the matching.
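A minimal sketch of scoring two virtual documents by cosine similarity follows; raw term counts stand in for the term-frequency weighting and stop-word filtering mentioned above, and the term bags are hypothetical.

```python
import math
from collections import Counter

# Sketch of scoring two virtual documents (term bags gathered from labels
# and related textual information) by cosine similarity.
def cosine(terms1, terms2):
    v1, v2 = Counter(terms1), Counter(terms2)
    dot = sum(v1[t] * v2[t] for t in set(v1) & set(v2))
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0

doc1 = ["check", "application", "documents"]
doc2 = ["check", "documents", "completeness"]
score = cosine(doc1, doc2)
```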
Execution Semantics Evaluator. An evaluator scores a set of correspondences,
typically based on the original process models. The ICoP framework defines an
evaluator that exploits the execution semantics of the process models for scoring
a set of correspondences. To this end, it relies on the relations of the behavioural profile of a process model, cf. [34]. Such a profile abstracts the trace semantics of a process by a set of binary behavioural relations defined over its activities: two activities are ordered (if one can occur before the other but not vice versa), exclusive (if they cannot occur jointly in an execution sequence), or interleaved (if they can occur in either order). This information is used for assigning a score to a set of correspondences by checking, for each pair of activities of distinct correspondences in one model, whether their behavioural relation is mirrored by
pairs and all investigated pairs provides us with a score that captures the extent
to which the behavioural characteristics of one model are preserved in the other
model under the given correspondences.
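The consistency ratio computed by the evaluator can be sketched as follows, assuming precomputed behavioural profiles given as relation dictionaries and, for simplicity, 1:1 correspondences:

```python
# Sketch of the execution-semantics evaluator: score a correspondence set
# by the ratio of activity pairs whose behavioural-profile relation
# ('order', 'exclusive', 'interleaved') is mirrored in the other model.
def profile_consistency(rel1, rel2, mapping):
    checked = consistent = 0
    activities = list(mapping)
    for idx, a in enumerate(activities):
        for b in activities[idx + 1:]:
            checked += 1
            if rel1[(a, b)] == rel2[(mapping[a], mapping[b])]:
                consistent += 1
    return consistent / checked if checked else 1.0

rel1 = {("a", "b"): "order", ("a", "c"): "exclusive", ("b", "c"): "order"}
rel2 = {("x", "y"): "order", ("x", "z"): "exclusive", ("y", "z"): "interleaved"}
score = profile_consistency(rel1, rel2, {"a": "x", "b": "y", "c": "z"})
```

In the example, two of the three activity pairs agree on their relation, so the score is 2/3.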
Implementation
The ICoP framework has been implemented in Java and is available upon request from the authors of [32]. Currently, process models are expected to be given as Petri nets in the PNML format.
A major revision of the framework is under way. By building upon the jBPT library [35], this new implementation will support a broader class of process model descriptions and serialization formats.
Evaluations
The ICoP framework has been designed with a particular focus on the identification of complex correspondences. An evaluation of the framework can be found in [32]. It illustrates that the ICoP architecture allows for the creation of matchers that find a significant share of complex correspondences.
creation of matchers that find a significant share of complex correspondences.
Yet, it also shows that a certain homogeneity of the process model vocabulary is
required for the identification of complex correspondences.
4 Results
For assessing the submitted process model matching techniques, we compare the
computed matches against a manually created gold standard. Using the gold
standard, we classify each computed activity match as either true-positive (TP),
true-negative (TN), false-positive (FP), or false-negative (FN). Based on this classification, we calculate the precision (TP/(TP+FP)), the recall (TP/(TP+FN)), and the f-measure, which is the harmonic mean of precision and recall (2·precision·recall/(precision+recall)). Table 3 gives an overview of the results for
the university admission data set and Table 4 presents the results for the birth
certificate data set. For getting a better understanding of the result details, we
report the average (AVG) and the standard deviation (STD) for each metric.
The highest value for each metric is marked using bold font.
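Computed from sets of activity matches, the three metrics amount to the following sketch (the gold-standard pairs are hypothetical):

```python
# Sketch of the contest's evaluation metrics, computed from a proposed
# match set against a gold standard of activity pairs.
def evaluate(proposed, gold):
    tp = len(proposed & gold)   # true positives
    fp = len(proposed - gold)   # false positives
    fn = len(gold - proposed)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

gold = {("a", "x"), ("b", "y"), ("c", "z")}
proposed = {("a", "x"), ("b", "y"), ("d", "w")}
precision, recall, f_measure = evaluate(proposed, gold)
```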
Table 3. Results of University Admission Matching
Precision Recall F-Measure
No. Approach AVG STD AVG STD AVG STD
1 Triple-S 0.31 0.19 0.36 0.26 0.33 0.12
2 BP Graph Matching 0.60 0.45 0.19 0.30 0.29 0.29
3 RefMod-Mine/NSCM 0.37 0.22 0.39 0.27 0.38 0.19
4 RefMod-Mine/ESGM 0.16 0.26 0.12 0.21 0.14 0.17
5 Bag-of-Words Similarity 0.56 0.23 0.32 0.28 0.41 0.20
6 PMLM 0.12 0.05 0.58 0.20 0.20 0.08
7 ICoP 0.36 0.24 0.37 0.26 0.36 0.23
Table 4. Results of Birth Certificate Matching
Precision Recall F-Measure
No. Approach AVG STD AVG STD AVG STD
1 Triple-S 0.19 0.21 0.25 0.33 0.22 0.23
2 BP Graph Matching 0.55 0.48 0.19 0.28 0.28 0.30
3 RefMod-Mine/NSCM 0.68 0.19 0.33 0.22 0.45 0.18
4 RefMod-Mine/ESGM 0.25 0.28 0.18 0.26 0.21 0.23
5 Bag-of-Words Similarity 0.29 0.35 0.22 0.30 0.25 0.31
6 PMLM 0.19 0.09 0.60 0.20 0.29 0.12
7 ICoP 0.42 0.27 0.28 0.23 0.33 0.24
From the results presented in Table 3 and Table 4, we can draw the following conclusions. Most importantly, it has to be noted that there is no clear winner. As the employed data sets differ with respect to characteristics such as the number of complex correspondences and the linguistic consistency, different capabilities are required to come up with a good matching result. Apparently, no technique
can perfectly deal with both data sets. However, there are a couple of interesting
observations.
Focussing on the f-measure, the bag-of-words similarity approach yields the
best result for the university admission set (0.41) and the RefMod-Mine/NSCM
approach yields the best result for the birth certificate set (0.45). However, it
should be noted that the RefMod-Mine/NSCM approach is quite close to the
f-measure of the bag-of-words similarity approach for the university admission
set (0.38) while the bag-of-words approach has a rather average result quality
for the birth certificate models (0.25). Interestingly, the best f-measure is not
necessarily associated with the best recall and precision. The PMLM approach
yields the best recall (0.60 and 0.58) for both sets. Nevertheless, due to its rather
low precision, it only yields average f-measures. The opposite situation can be observed for the BP Graph Matching approach. While it has rather low recall values, it yields top precision values (0.60 and 0.55). Apparently, the trade-off
[Figure: f-measure per matching pair; series AVG and Max across all techniques]
Fig. 1. Detailed Results of Admission Data Set
[Figure: f-measure per matching pair; series AVG and Max across all techniques]
Fig. 2. Detailed Results of Birth Certificate Data Set
between precision and recall is still a major issue in the context of process model
matching.
Looking at the standard deviation, we can see that many approaches suffer
from quite unstable results. A detailed consideration of the results for individual
model pairs reveals that there are some model pairs that are matched well, while
others represent a considerable challenge for all participating techniques. Figure 1
and Figure 2 illustrate this fact by showing the average and maximum f-measure
among all techniques for each matching pair. In the admission set, we observe particularly high results for the pairs 1, 7, 14, 17, 19, and 28. The pairs 25 and 36 apparently represent complex matching problems. For the birth certificate data
set, we observe a quite similar constellation. While the techniques yield good
results for the pairs 31, 32, and 34, they fail to adequately match the pairs 10
and 15. Having a closer look into these extreme cases, we can identify two main
characteristics that influence the result quality for a matching pair: the similarity
of labels and the number of complex matches.
The more similar the labels of the matching pair, the better the matching result. By contrast, if many business objects are different or even missing, the identification of the matches may represent a serious challenge. As an example, consider the match between Checking if complete and Check documents. Here, the rather unspecific verb check is the only connection between the labels. The second characteristic indicating the hardness of the matching challenge is the number of complex matches. As such matches often require a semantic grouping of activities, their identification is a complicated and error-prone task. The identification of complex matches is often further aggravated by the fact that the connection between actions and business objects is hard to detect. As an example, consider the complex match between the activity Clarify name and the activities Consult mother and Consult father. Taking a standard semantic similarity measure such as the Lin metric, the similarity between these labels is close to zero. In order to adequately address such problems, more sophisticated approaches are required.
Besides this comparative discussion, the obtained precision and recall values
indicate that matching techniques cannot yet be expected to provide an out-of-
the-box solution for fully automated matching. However, the detailed analysis
of individual model pairs reveals that very good results can be obtained for a
certain setting. Also, the variability of the techniques in terms of their preference
for either precision or recall outlines potential for further improvements.
5 Future Directions
Based on the results and the observations from the Process Model Matching
Contest 2013, we use this section to outline major directions for future work
in the field of process model matching. In particular, we discuss strategies to
address the overall matching result quality, the need for addressing semantics,
the applicability of push-button approaches, and the question of how process
model matching can be evaluated.
The results from this contest highlight that the overall result quality still needs to be improved. Still, the differences between the employed data sets also indicate
that many techniques can properly match a particular set of models. This raises
the question whether appropriate matchers can be automatically selected based
on the matching problem at hand. This, however, requires a precise understanding
of the capabilities of the different matchers and an accurate selection algorithm.
A promising strategy to address this problem might be the incorporation of
prediction techniques as they have been recently proposed in the area of schema
matching [36]. If the quality of the result of a matching technique can be predicted
based on certain characteristics of the model or the match constellation, the best
matching technique can be selected in an automated fashion. In this context, it
could be also a promising strategy to determine a set of matchers that jointly
address the given matching problem.
The detailed consideration of the matching results revealed that particular
semantic relationships are hard to detect. Hence, we are convinced that semantic
technologies need to be explored in more detail. While it turned out to be helpful
to differentiate between label components such as action and business object, the
simple comparison with semantic similarity measures is not sufficient. In order
to detect more complex semantic relationships, it might be necessary to include
ontologies or additional information such as textual descriptions of the models.
Most of the currently existing process model matching techniques represent push-button approaches that compute results without any user interaction. Instead, matching shall be considered as an iterative process that includes feedback cycles with human experts, a process known as reconciliation in data integration [37, 38].
Given the general complexity of the matching task, such a semi-automated
technique could still provide significant support to the user. By collecting feedback
from the user, important decisions during the construction of a matching can be
validated, leading to a better overall result.
So far, many matching techniques evaluate the result quality using precision,
recall, and f-measure. However, considering the complexity of the matching
setting, it can be doubted that these simplistic metrics are appropriate. In many
cases, a match constellation is not necessarily true or false and the decision is
even hard for humans. Against this background, it might be worthwhile to pursue different evaluation strategies, such as non-binary evaluation [39]. Also, one shall
consider the actual benefit achieved by (semi-) automated matching. However,
measuring the post-match effort turned out to be challenging and is also not
well understood for related matching problems [40]. Further work is needed to
understand how tool-supported matching compares to manual matching in terms
of time and quality.
Altogether, it must be stated that there are many directions for future re-
search. Many of them are concerned with improving existing techniques. However,
acknowledging that process model matching is not a simple task with a single
correct result, it is also important to focus on alternative evaluation strategies.
6 Conclusion
In this paper, we reported on the setup and the results of the Process Model
Matching Contest 2013. This contest addressed the need for effective evaluation
of process model matching techniques. We provided two different process model
matching problems and received automatically generated results of 7 different
techniques. The evaluation of the results showed that there is no clear winner of the contest, since no approach yielded the best performance for both data sets.
We learned that there is still a huge trade-off between precision and recall, and
that semantic and complex correspondences represent considerable challenges.
For future work, we highlighted that it is important to further improve the
result quality achieved by the matching techniques. This may be accomplished
by automatically selecting the best matcher based on the matching problem at
hand, by exploiting semantics in a more elaborated way, or by incorporating
user feedback. Further, we emphasized the importance of proper evaluation. As
precision, recall, and f-measure are overly simplistic and only allow matches to be true or false, it might be worthwhile to consider alternative evaluation strategies.
This may, for instance, include the comparison of a matching technique with a
human matching in terms of time and quality.
References
1.
Dumas, M., La Rosa, M., Mendling, J., Reijers, H.: Fundamentals of Business
Process Management. Springer (2012)
2.
Branco, M.C., Troya, J., Czarnecki, K., Küster, J.M., Völzer, H.: Matching business
process workflows across abstraction levels. In France, R.B., Kazmeier, J., Breu, R.,
Atkinson, C., eds.: MoDELS. Volume 7590 of Lecture Notes in Computer Science.,
Springer (2012) 626–641
3.
Küster, J.M., Koehler, J., Ryndina, K.: Improving business process models with
reference models in business-driven development. In Eder, J., Dustdar, S., eds.: Busi-
ness Process Management Workshops. Volume 4103 of Lecture Notes in Computer
Science., Springer (2006) 35–44
4.
Weidlich, M., Mendling, J., Weske, M.: A foundational approach for managing
process variability. In Mouratidis, H., Rolland, C., eds.: CAiSE. Volume 6741 of
Lecture Notes in Computer Science., Springer (2011) 267–282
5.
La Rosa, M., Dumas, M., Uba, R., Dijkman, R.: Business process model merging:
An approach to business process consolidation. ACM Trans. Softw. Eng. Methodol.
22(2) (2013) 11:1–11:42
6.
Dumas, M., García-Bañuelos, L., Dijkman, R.M.: Similarity search of business
process models. IEEE Data Eng. Bull. 32(3) (2009) 23–28
7.
Kunze, M., Weidlich, M., Weske, M.: Behavioral similarity - a proper metric. In
Rinderle-Ma, S., Toumani, F., Wolf, K., eds.: BPM. Volume 6896 of Lecture Notes
in Computer Science., Springer (2011) 166–181
8.
Jin, T., Wang, J., Rosa, M.L., ter Hofstede, A.H., Wen, L.: Efficient querying of
large process model repositories. Computers in Industry 64(1) (2013)
9.
Ekanayake, C.C., Dumas, M., García-Bañuelos, L., Rosa, M.L., ter Hofstede, A.H.M.:
Approximate clone detection in repositories of business process models. In Barros,
A.P., Gal, A., Kindler, E., eds.: BPM. Volume 7481 of Lecture Notes in Computer
Science., Springer (2012) 302–318
10. Euzenat, J., Shvaiko, P.: Ontology matching. Springer, Heidelberg (DE) (2007)
11.
Bellahsene, Z., Bonifati, A., Rahm, E., eds.: Schema Matching and Mapping.
Springer (2011)
12.
Wu, Z., Palmer, M.: Verbs semantics and lexical selection. In: Proceedings of
the 32nd annual meeting on Association for Computational Linguistics. ACL ’94,
Stroudsburg, PA, USA, Association for Computational Linguistics (1994) 133–138
13.
Ehrig, M., Koschmider, A., Oberweis, A.: Measuring similarity between semantic
business process models. In: Proceedings of the 4th Asia-Pacific conference on
Conceptual modelling. Volume 67., Australian Computer Science Communications
(2007) 71–80
14.
Miller, G., Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press,
Cambridge, MA (1998)
15.
Bunke, H.: On a relation between graph edit distance and maximum common
subgraph. Pattern Recognition Letters 18(8) (1997) 689 – 694
16.
Dijkman, R.M., Dumas, M., van Dongen, B.F., Käärik, R., Mendling, J.: Similarity of Business Process Models: Metrics and Evaluation. Information Systems 36(2) (2011) 498–516
17. Dijkman, R.M., Dumas, M., García-Bañuelos, L.: Graph matching algorithms for
business process model similarity search. In: Proceedings of the 7th International
Conference on Business Process Management. BPM ’09, Berlin, Heidelberg, Springer-
Verlag (2009) 48–63
18.
Weidlich, M., Dijkman, R.M., Mendling, J.: The ICoP framework: Identification
of correspondences between process models. In Pernici, B., ed.: Proceedings of
the 22nd international conference on Advanced Information Systems Engineering.
Volume 6051 of LNCS., Springer (2010) 483–498
19.
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput.
Surv. 31(3) (1999) 264–323
20.
Porter, M.F.: Readings in information retrieval. Morgan Kaufmann Publishers Inc.,
San Francisco, CA, USA (1997) 313–316
21.
Miller, G.A.: WordNet: a Lexical Database for English. Communications of the
ACM 38(11) (1995) 39–41
22.
Klinkmüller, C., Weber, I., Mendling, J., Leopold, H., Ludwig, A.: Increasing
recall of process model matching by improved activity label matching. In: BPM.
(2013)
23.
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and
reversals. Soviet Physics Doklady 10(8) (1966) 707–710
24.
Lin, D.: An information-theoretic definition of similarity. In: In Proceedings of
the 15th International Conference on Machine Learning, Morgan Kaufmann (1998)
296–304
25.
Leopold, H., Smirnov, S., Mendling, J.: On the refactoring of activity labels in
business process models. Information Systems 37(5) (2012) 443–459
26.
Sakr, S., Awad, A., Kunze, M.: Querying process models repositories by aggregated
graph search. In Rosa, M., Soffer, P., eds.: Business Process Management Workshops.
Volume 132 of Lecture Notes in Business Information Processing. Springer Berlin
Heidelberg (2013) 573–585
27.
Awad, A.: BPMN-Q: A language to query business processes. In Reichert, M.,
Strecker, S., Turowski, K., eds.: EMISA. Volume P-119 of LNI., St. Goar, Germany,
GI (2007) 115–128
28.
Weidlich, M., Sheetrit, E., Branco, M., Gal, A.: Matching business process models
using positional language models. In: 32nd International Conference on Conceptual
Modeling, ER 2013, Hong Kong (2013)
29.
Lv, Y., Zhai, C.: Positional language models for information retrieval. In Allan, J.,
Aslam, J.A., Sanderson, M., Zhai, C., Zobel, J., eds.: SIGIR, ACM (2009) 299–306
30.
Lin, J.: Divergence measures based on the shannon entropy. IEEE Transactions on
Information Theory 37(1) (1991) 145–151
31.
Gal, A., Sagi, T.: Tuning the ensemble selection process of schema matchers. Inf.
Syst. 35(8) (2010) 845–859
32.
Weidlich, M., Dijkman, R.M., Mendling, J.: The ICoP framework: Identification of
correspondences between process models. In Pernici, B., ed.: CAiSE. Volume 6051
of Lecture Notes in Computer Science., Springer (2010) 483–498
33.
Qu, Y., Hu, W., Cheng, G.: Constructing virtual documents for ontology matching.
In Carr, L., Roure, D.D., Iyengar, A., Goble, C.A., Dahlin, M., eds.: WWW, ACM
(2006) 23–31
34.
Weidlich, M., Mendling, J., Weske, M.: Efficient consistency measurement based on behavioral profiles of process models. IEEE Trans. Software Eng. 37(3) (2011) 410–429
35.
Polyvyanyy, A., Weidlich, M.: Towards a compendium of process technologies - the jBPT library for process model analysis. In Deneckère, R., Proper, H.A., eds.: CAiSE Forum. Volume 998 of CEUR Workshop Proceedings, CEUR-WS.org (2013) 106–113
36.
Sagi, T., Gal, A.: Schema matching prediction with applications to data source
discovery and dynamic ensembling. The VLDB Journal (2013) To appear.
37.
Belhajjame, K., Paton, N.W., Fernandes, A.A.A., Hedeler, C., Embury, S.M.: User
feedback as a first class citizen in information integration systems. In: CIDR,
www.cidrdb.org (2011) 175–183
38.
Nguyen, Q.V.H., Wijaya, T.K., Miklos, Z., Aberer, K., Levy, E., Shafran, V., Gal,
A., Weidlich, M.: Minimizing human effort in reconciling match networks. In:
Proceedings of the 32nd International Conference on Conceptual Modeling (ER).
(2013) To appear.
39.
Sagi, T., Gal, A.: Non-binary evaluation for schema matching. In Atzeni, P.,
Cheung, D.W., Ram, S., eds.: ER. Volume 7532 of Lecture Notes in Computer
Science., Springer (2012) 477–486
40.
Duchateau, F., Bellahsene, Z., Coletta, R.: Matching and alignment: What is the
cost of user post-match effort? - (short paper). In Meersman, R., Dillon, T.S.,
Herrero, P., Kumar, A., Reichert, M., Qing, L., Ooi, B.C., Damiani, E., Schmidt,
D.C., White, J., Hauswirth, M., Hitzler, P., Mohania, M.K., eds.: OTM Conferences
(1). Volume 7044 of Lecture Notes in Computer Science., Springer (2011) 421–428
... For example, process model matching algorithms are designed to automatically identify activities that represent similar functionality. Despite the substantial attention that process model matching received, existing solution approaches have not yet yielded satisfactory and practically usable performance, as prominently demonstrated in the process model matching contests (Cayoglu et al., 2013;Antunes et al., 2015). This is a direct result of the lack of objectivity: matching approaches rely on general-purpose, off-the-shelf knowledge bases and techniques, but need to interpret less objective process models with, e.g., heterogeneous terminology (i.e., differences in labeling styles, domain terminology, etc., as observed by Klinkmüller et al., 2013) and different structures that express similar control flow relationships (Klinkmüller and Weber, 2017). ...
Article
Full-text available
When planning future research activities, it may be tempting for researchers to stick to incremental extensions of their current work. Yet there is also merit in realizing the grand challenges in one’s field. This paper presents an overview of the nine major research problems for the Business Process Management discipline. These challenges have been collected by an open call to the community, discussed and refined in a workshop setting, and described here in detail, including a motivation why these problems are worth investigating. This overview may serve the purpose of inspiring both novice and advanced scholars who are interested in radical new ideas for the analysis, design, and management of work processes using information technology.
... Process Model Matching (PMM) refers to the identification of correspondences between elements of two process models [19]. The PMM problem gained attention in 2013 when the first PMM Contest was organized to effectively evaluate the matching techniques [20]. Seven matching techniques participated in the contest and the results of all the techniques were released. ...
Article
Full-text available
COVID-19 has imposed unprecedented restrictions on society, which has compelled organizations to work ambidextrously. Consequently, organizations need to continuously monitor the performance of their business processes and improve them. To facilitate that, this study has put forth the idea of augmenting business process models with end-user feedback and proposed a machine-learning-based approach (AugProMo) to automatically identify correspondences between end-user feedback and elements of process models. In particular, we have generated three valuable resources: process models, a feedback corpus, and gold standard benchmark correspondences. Furthermore, 2880 experiments are performed to identify the most effective combination of word embeddings, feature vectors, data balancing, and machine learning techniques. The study concludes that the proposed approach is effective for augmenting business process models with end-user feedback.
... It is conveniently possible to collect feedback about university processes, since the end-users of these processes (students) are readily available for providing feedback. The university case studies have been used for other BPM tasks, such as Process Model Matching Contests 2013 [17], Process Model Matching Contest 2015 [19], and Process Model Matching at OAEI 2017 [20]. ...
Conference Paper
Full-text available
Business Process Management (BPM) is an established discipline that uses business processes for organizing the operations of an enterprise. The enterprises that embrace BPM continuously analyze their processes and improve them to achieve a competitive edge. Consequently, a plethora of studies have developed contrasting approaches to analyze business processes. These approaches vary from examining event logs of process-aware information systems to employing data warehousing technology for analyzing the execution logs of business processes. In contrast to these classical approaches, this work proposes to combine two prominent domains, BPM and Natural Language Processing, for analyzing business processes. In particular, this study proposes to perform sentiment analysis of end-user feedback on business processes to assess the satisfaction level of end-users. More specifically, firstly, a structured approach is used to develop a corpus of over 7000 user-feedback sentences. Secondly, these feedback sentences are annotated at three levels of classification: the first level determines the relevance of a sentence to the process; the second level classifies the relevant sentences across four process performance dimensions (time, cost, quality, and flexibility); and the third level classifies the sentences into positive, negative, or neutral sentiments. Finally, 78 experiments are performed to determine the effectiveness of six supervised learning techniques and one state-of-the-art deep learning technique for the automatic classification of user feedback sentences at the three levels of classification. The results show that the deep learning technique is the most effective for the classification tasks.
... They can stem from real-world domains or artificial simulation. Among others, such benchmark data sets are available for part-of-speech tagging [Paul and Baker 1992], image recognition [Fei-Fei et al. 2006], ontology matching [Algergawy et al. 2019], process model matching [Cayoglu et al. 2013], and vehicle routing problems [Defryn et al. 2016]. Benchmark data sets not only facilitate comparative evaluation. ...
Preprint
Full-text available
There is an ongoing debate in computer science about how algorithms should best be studied. Some scholars have argued that experimental evaluations should be conducted, others emphasize the benefits of formal analysis. We believe that this debate is less a question of either-or, because both views can be integrated into an overarching framework. It is the ambition of this paper to develop such a framework of algorithm engineering with a theoretical foundation in the philosophy of science. We take the empirical nature of algorithm engineering as a starting point. Our theoretical framework builds on three areas discussed in the philosophy of science: ontology, epistemology and methodology. In essence, ontology describes algorithm engineering as being concerned with algorithmic problems, algorithmic tasks, algorithm designs and algorithm implementations. Epistemology describes the body of knowledge of algorithm engineering as a collection of prescriptive and descriptive knowledge, residing in World 3 of Popper's Three Worlds model. Methodology refers to the steps by which we can systematically enhance our knowledge of specific algorithms. In this context, we identified seven validity concerns and discuss how researchers can respond to falsification. Our framework has important implications for researching algorithms in various areas of computer science.
... Business process model matching has been a challenging research area where there have been numerous attempts to provide effective and accurate techniques. Process model matching describes the task of finding corresponding transitions in two given process models, whose roots stem from process model similarity [4,6,7,16,17] and ontology matching [8], relying on structural and label comparison of processes [2,5]. Researchers have primarily developed label-based matching techniques which assess the similarity of activity labels in process models. ...
Article
Full-text available
The rapid increase in the generation of business process models in industry has raised the demand for the development of process model matching approaches. In this paper, we introduce a novel optimization-based business process model matching approach which can flexibly incorporate both the behavioral and label information of processes for the identification of correspondences between activities. Given two business process models, we achieve our goal by defining an integer linear program which maximizes the label similarities among process activities and the behavioral similarity between the process models. Our approach enables the user to determine the importance of the local label-based similarities and the global behavioral similarity of the models through a predefined weighting parameter, allowing for flexibility. Moreover, extensive experimental evaluation performed on three real-world datasets demonstrates the high accuracy of our proposal, outperforming the state of the art.
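The weighted combination of label and behavioral similarity described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: for tiny models the optimal 1:1 alignment can be found by exhaustive search instead of an ILP solver, and all similarity scores below are invented.

```python
from itertools import permutations

def match(label_sim, behav_sim, alpha=0.7):
    """Find the 1:1 activity alignment maximizing
    alpha * label similarity + (1 - alpha) * behavioral similarity.
    label_sim/behav_sim: n x n score matrices (rows = model A, cols = model B)."""
    n = len(label_sim)
    combined = [[alpha * label_sim[i][j] + (1 - alpha) * behav_sim[i][j]
                 for j in range(n)] for i in range(n)]
    best_score, best_map = -1.0, None
    for perm in permutations(range(n)):  # exhaustive stand-in for the ILP
        score = sum(combined[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_map = score, list(enumerate(perm))
    return best_map, best_score

# Hypothetical similarity scores between two 3-activity models
L = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.3, 0.7]]
B = [[0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.2, 0.1, 0.6]]
pairs, score = match(L, B)
```

The weighting parameter `alpha` plays the role the abstract describes: `alpha=1.0` matches purely on labels, `alpha=0.0` purely on behavior.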
Chapter
This chapter elaborates on the concepts of designing, implementing, and executing process choreographies and follows a model-driven, top-down approach, starting from the conceptual design of process choreography models in BPMN, over their verification using Workflow Nets, to their execution design through correlation mechanisms. This chapter also discusses the bottom-up design of process orchestrations and choreographies using process mining techniques.
Chapter
Business process models play an important role in today’s organizations and are stored in model repositories. Organizations need to handle hundreds or even thousands of process models within their repositories, which serve as a knowledge base for business process management. Similarity measures can detect similarities between business process models and consequently play an important role in the management of business processes. Existing research mostly relies on syntactic similarities between activity labels and deals with mappings of type 1:1. Semantic similarities remain difficult to detect, and this problem is accentuated when dealing with mappings of type n:m and with large models. In this paper, we present a solution for detecting similarities between business process models that takes semantics into account. We use a genetic algorithm, a well-known metaheuristic, to find a good enough mapping between two process models. Keywords: Business process models, Similarity measures, Semantics, Genetic algorithm, Matching
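A genetic search over activity mappings, as described in this abstract, can be sketched in a few lines. This is a toy version under simplifying assumptions: it restricts itself to 1:1 mappings, uses an invented similarity matrix in place of semantic similarity, and uses swap mutation with elitist selection as the only genetic operators.

```python
import random

def ga_match(sim, generations=200, pop_size=20, seed=7):
    """Toy genetic algorithm searching for a 1:1 activity mapping that
    maximizes total similarity. sim: n x n similarity matrix."""
    rng = random.Random(seed)
    n = len(sim)

    def fitness(perm):
        return sum(sim[i][perm[i]] for i in range(n))

    # Initial population: random permutations (each encodes a 1:1 mapping)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]        # elitism: keep the best half
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.randrange(n), rng.randrange(n)
            child[i], child[j] = child[j], child[i]  # swap mutation keeps 1:1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Invented label similarities between two 4-activity models
S = [[0.9, 0.2, 0.1, 0.0],
     [0.1, 0.8, 0.2, 0.1],
     [0.0, 0.1, 0.9, 0.2],
     [0.1, 0.0, 0.2, 0.8]]
best = ga_match(S)
```

A real n:m matcher would need a richer chromosome encoding (sets of correspondences rather than permutations) and a crossover operator, but the fitness-driven search loop stays the same.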
Chapter
The International Conference on Business Process Management (BPM) is a conference series with some remarkable successes over the last 20 years. In this paper, we discuss how neighboring fields have made progress. A key observation is the co-evolution of the problem and solution spaces: methodological innovations yield substantive advancements and, in turn, substantive findings help to improve methods. We discuss implications of this observation for business process science.KeywordsBusiness process classificationSubstantive knowledgeMethodological knowledgeTypes of business processesCo-evolution of problem and solution spaces
Chapter
Organizations store hundreds or even thousands of models nowadays in business process model repositories. This makes sophisticated operations, like conformance checking or duplicate detection, hard to conduct without automated support. Therefore, querying methods are used to support such tasks. This chapter reports on an evaluation of six techniques for similarity-based search of process models. Five of these approaches are based on Process Model Matching using various aspects of process models for similarity calculation. The sixth approach, however, is based on a technique from Information Retrieval and considers process models as text documents. All the techniques are compared regarding different measures from Information Retrieval. The results show the best performance for the non-matching-based technique, especially when a matching between models is difficult to determine.
Conference Paper
Full-text available
This paper presents the idea of a compendium of process technologies, i.e., a concise but comprehensive collection of techniques for process model analysis that support research on the design, execution, and evaluation of processes. The idea originated from observations on the evolution of process-related research disciplines. Based on these observations, we derive design goals for a compendium. Then, we present the jBPT library, which addresses these goals by means of an implementation of common analysis techniques in an open source codebase.
Article
Full-text available
Empirical evidence shows that repositories of business process models used in industrial practice contain significant amounts of duplication. This duplication arises for example when the repository covers multiple variants of the same processes or due to copy-pasting. Previous work has addressed the problem of efficiently retrieving exact clones that can be refactored into shared subprocess models. This paper studies the broader problem of approximate clone detection in process models. The paper proposes techniques for detecting clusters of approximate clones based on two well-known clustering algorithms: DBSCAN and Hierarchical Agglomerative Clustering (HAC). The paper also defines a measure of standardizability of an approximate clone cluster, meaning the potential benefit of replacing the approximate clones with a single standardized subprocess. Experiments show that both techniques, in conjunction with the proposed standardizability measure, accurately retrieve clusters of approximate clones that originate from copy-pasting followed by independent modifications to the copied fragments. Additional experiments show that both techniques produce clusters that match those produced by human subjects and that are perceived to be standardizable.
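The density-based clustering step used for approximate clone detection can be illustrated with a minimal DBSCAN over a precomputed distance matrix. This is a self-contained sketch: the distances below are invented, and real approximate-clone detection would derive them from a similarity or graph-edit distance between process fragments.

```python
def dbscan(dist, eps, min_pts):
    """Toy DBSCAN on a symmetric distance matrix; returns cluster labels,
    with -1 marking noise points."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neigh = [q for q in range(n) if dist[p][q] <= eps]
        if len(neigh) < min_pts:
            labels[p] = -1                     # provisionally noise
            continue
        cluster += 1                           # p is a core point: new cluster
        labels[p] = cluster
        seeds = [q for q in neigh if q != p]
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster            # noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neigh = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neigh) >= min_pts:        # q is core: expand further
                seeds.extend(r for r in q_neigh if labels[r] is None)
    return labels

# Invented pairwise distances between five model fragments:
# fragments 0-2 are near-duplicates, 3-4 are unrelated.
D = [[0.0, 0.1, 0.2, 0.9, 0.8],
     [0.1, 0.0, 0.15, 0.85, 0.9],
     [0.2, 0.15, 0.0, 0.95, 0.9],
     [0.9, 0.85, 0.95, 0.0, 0.7],
     [0.8, 0.9, 0.9, 0.7, 0.0]]
labels = dbscan(D, eps=0.3, min_pts=2)
```

Fragments 0-2 end up in one cluster of approximate clones, while 3 and 4 are reported as noise; the paper's standardizability measure would then score that cluster for refactoring potential.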
Conference Paper
Full-text available
Generating new knowledge from scientific databases, fusing product information of business companies, or computing the overlap between various data collections are a few examples of applications that require data integration. A crucial step during this integration process is the discovery of correspondences between the data sources, and the evaluation of their quality. For this purpose, a few measures such as precision and recall have been designed. However, these measures do not evaluate the user post-match effort that matching approaches aim at reducing. Furthermore, the overall metric suffers from a major drawback. Thus, we present in this paper two measures to compute this user effort during the post-match phase, taking into account both the correction of discovered correspondences and the manual search for missing ones. A set of experiments with three matching tools, including a comparison with the overall measure, highlights the benefits of our metrics. We also show that time performance during matching is not significant w.r.t. time performance during post-matching.
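The standard match-quality measures this abstract builds on are easy to state in code. Below is a minimal sketch of precision, recall, and the "overall" measure (commonly defined as recall × (2 − 1/precision), which penalizes the manual work of removing wrong and adding missing correspondences); the example correspondences are invented.

```python
def precision_recall_overall(found, gold):
    """Quality of a computed set of correspondences against a gold standard."""
    found, gold = set(found), set(gold)
    tp = len(found & gold)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(gold) if gold else 0.0
    # 'Overall' estimates post-match effort: 1.0 means no manual work,
    # values <= 0 mean correcting the result costs more than matching by hand.
    overall = recall * (2 - 1 / precision) if precision > 0 else 0.0
    return precision, recall, overall

gold = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
found = {("a1", "b1"), ("a2", "b2"), ("a2", "b3")}  # 2 correct, 1 wrong
p, r, o = precision_recall_overall(found, gold)
```

The paper's argument is that such set-based measures miss the asymmetry between correcting wrong correspondences and searching for missing ones, which its two proposed effort measures account for separately.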
Conference Paper
Business operations are often documented by business process models. Use cases such as system validation and process harmonization require the identification of correspondences between activities, which is supported by matching techniques that cope with textual heterogeneity and differences in model granularity. In this paper, we present a matching technique that is tailored towards models featuring textual descriptions of activities. We exploit these descriptions using ideas from language modelling. Experiments with real-world process models reveal that our technique increases recall by up to a factor of five, largely without compromising precision, compared to existing approaches.
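How textual activity descriptions can feed a matcher is illustrated below with a simple cosine similarity over bag-of-words vectors. This is a stand-in for the language-modelling technique the abstract refers to, not its implementation, and the descriptions are invented.

```python
import math
from collections import Counter

def bow_cosine(text_a, text_b):
    """Cosine similarity of term-frequency vectors of two descriptions."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Two hypothetical activity descriptions from different models
sim = bow_cosine("check the invoice against the purchase order",
                 "verify invoice data against the order")
```

Even this crude measure scores the two descriptions as fairly similar despite different activity labels; a language-model-based matcher additionally smooths over vocabulary mismatches such as "check" vs. "verify".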
Conference Paper
Schema and ontology matching is a process of establishing correspondences between schema attributes and ontology concepts, for the purpose of data integration. Various commercial and academic tools have been developed to support this task. These tools provide impressive results on some datasets. However, as the matching is inherently uncertain, the developed heuristic techniques give rise to results that are not completely correct. In practice, post-matching human expert effort is needed to obtain a correct set of correspondences. We study this post-matching phase with the goal of reducing the costly human effort. We formally model this human-assisted phase and introduce a process of matching reconciliation that incrementally leads to identifying the correct correspondences. We achieve the goal of reducing the involved human effort by exploiting a network of schemas that are matched against each other. We express the fundamental matching constraints present in the network in a declarative formalism, Answer Set Programming, which in turn enables reasoning about the necessary user input. We demonstrate empirically that our reasoning and heuristic techniques can indeed substantially reduce the necessary human involvement.