SAP Signavio Academic Models: A Large Process Model Dataset

Diana Sola¹,² [0000-0001-5688-1730], Christian Warmuth¹ [0000-0003-0125-1824],
Bernhard Schäfer¹,² [0000-0003-4364-0086], Peyman Badakhshan¹ [0000-0002-7627-7618],
Jana-Rebecca Rehse² [0000-0001-5707-6944], and Timotheus Kampik¹ [0000-0002-6458-2252]
¹ SAP Signavio, Berlin, Germany
² University of Mannheim, Mannheim, Germany
{diana.sola,christian.warmuth,bernhard.schaefer,
peyman.badakhshan,timotheus.kampik}@sap.com, rehse@uni-mannheim.de
Abstract. In this paper, we introduce the SAP Signavio Academic Mod-
els (SAP-SAM) dataset, a collection of hundreds of thousands of busi-
ness models, mainly process models in BPMN notation. The model col-
lection is a subset of the models that were created over the course of
roughly a decade on academic.signavio.com, a free-of-charge software-
as-a-service platform that researchers, teachers, and students can use to
create business (process) models. We provide a preliminary analysis of
the model collection, as well as recommendations on how to work with it.
In addition, we discuss potential use cases and limitations of the model
collection from academic and industry perspectives.
Keywords: Process Models · Data Set · Model Collection
1 Introduction
Process models depict how organizations conduct their operations. They represent
the basis for understanding, analyzing, redesigning, and automating processes
along the business process management (BPM) lifecycle [9]. Accordingly, many
organizations possess large repositories of process models [11]. Having access to
such repositories would be tremendously beneficial for developing and testing
algorithms in the area of BPM, e.g., for process model querying [19] or reference
model mining [20]. Also, the growing interest in applying machine learning in the
BPM field, e.g., for process model matching [1], process model abstraction [27],
or process modeling assistance [24], underlines the relevance of large model
collections that can, for example, serve as training datasets.
However, researchers rarely have access to large collections of models from
practice. Such models can contain sensitive information about the organization’s
internal operations. Legal aspects and the fear of losing competitive advantage
thus discourage companies from publishing their business (process) models [25].
This inherent dilemma has so far largely prevented the publication of large-scale
model collections for research, as they are common in related research fields [25].
arXiv:2208.12223v1 [cs.OH] 24 Aug 2022
In this paper, we introduce SAP Signavio Academic Models (SAP-SAM), a
model collection that consists of hundreds of thousands of process and business
models in different notations. We provide a basic overview of datasets related
to SAP-SAM, as well as its origin and structure. Subsequently, we present
selected properties and use cases of SAP-SAM. Finally, we discuss limitations of
the dataset along with recommendations on how to work with it.
2 Related Datasets
Compared to SAP-SAM, existing process model collections are rather small.
The hdBPMN [21] dataset, for example, contains 704 BPMN 2.0 models. This
collection has the special feature that the models are hand-drawn and can be
parsed as BPMN 2.0 XML. Another example is RePROSitory [5] (Repository of
open PROcess models and logS), which is an open collection of business process
models and logs, meaning that users can contribute to the repository by uploading
their own data. At the time of writing, RePROSitory also contains around 700
models. Some models included in SAP-SAM have already been published [28].
However, the previously published dataset contains only 29,810 models that were
collected over a shorter period of time.
In the process mining community, the BPI challenge datasets, e.g., the BPI
challenge 2020 [8], have become important benchmarks. Unlike SAP-SAM, these
datasets consist of event logs from practice. Therefore, the applications of the
BPI challenge datasets only partially overlap with those of SAP-SAM.
3 Origins & Structure of SAP-SAM
SAP-SAM contains 1,021,471 process and business models that were created using
the software-as-a-service platform of the SAP Signavio Academic Initiative³
(SAP-SAI), roughly from 2011 to 2021⁴. Most models are in Business Process Model
and Notation (BPMN 2.0⁵). SAP-SAI allows academic researchers, teachers, and
students to create, execute, and analyze process models, as well as related
business models, e.g., of business decisions. The usage of SAP-SAI is restricted
to non-commercial research and education. Upon registration, users consent that
the models they create can be made available for research purposes, either
anonymized or non-anonymized. SAP-SAM contains those models for which users have
consented to non-anonymized sharing. Still, anonymization scripts were run to
post-process the models, in particular to remove email addresses, student
registration numbers, and, to the extent possible, names.

³ See: signavio.com/bpm-academic-initiative/ (accessed on 2022-07-25)
⁴ The total number includes vendor-provided example models, which are automatically
  added to newly created workspaces (process repositories that users register). About
  470,000 models in the dataset bear the name of an example model, but this can only
  be a rough estimate of the number of example models in the dataset.
⁵ Technically, the latest version of BPMN is, at the time of writing, BPMN 2.0.2.
  However, little has changed between 2.0 and 2.0.2. We assume that the informal
  cross-vendor alignment efforts of the OMG BPMN Model Interchange Working Group are
  more substantial than formal progress between minor versions. In the following, we
  therefore use BPMN 2.0 to refer to any version from 2.0 to 2.0.2.

The models in SAP-SAM were created between July 2011 and September 2021
(inclusive) by a total of 72,996 users, based on a count of distinct user IDs that
are associated with the creation or revision of a model. The models were
extracted from the MySQL database of SAP-SAI and are in SAP Signavio’s
proprietary JSON-based data format. The BPMN models are conceptually
BPMN-2.0-standard-compliant, i.e., individual models can be converted to BPMN 2.0
XML using the built-in functionality of SAP-SAI. Decision Model and Notation
(DMN) models can be exported analogously. The dataset contains models in the
following notations:
- Business Process Model and Notation (BPMN): BPMN is a standardized notation
  for modeling business processes [15]. SAP-SAM distinguishes between BPMN
  process models, collaboration models, and choreography models, and among BPMN
  process models between BPMN 1.1 and BPMN 2.0 models.
- Decision Model and Notation (DMN): DMN is a standardized notation for
  modeling business decisions, complementing BPMN [17].
- Case Management Model and Notation (CMMN): CMMN is an attempt to supplement
  BPMN and DMN with a notation that focuses on agility and autonomy [16].
- Event-driven Process Chain (EPC): EPC [22] is a process modeling notation
  that enjoyed substantial popularity before the advent of BPMN.
- Unified Modeling Language (UML): UML is a modeling language used to describe
  software (and other) systems. In SAP-SAM, UML models are subdivided into class
  diagrams and use case diagrams.
- Value Chain: A value chain is an informal notation for sketching high-level
  end-to-end processes and process frameworks.
- ArchiMate: ArchiMate is a notation for the integrated modeling of information
  technology and business perspectives on large organizations [13].
- Organization Chart: Organization charts are tree-like models of
  organizational hierarchies.
- Fundamental Modeling Concepts (FMC) Block Diagram: FMC block diagrams support
  the modeling of software and IT system architectures.
- (Colored) Petri Net: Petri nets [18] are a popular mathematical modeling
  language for distributed systems and a crucial preliminary for many formal
  foundations of BPM. In SAP-SAM, colored Petri nets [12] are considered a
  separate notation.
- Journey Map: Journey maps model the customer’s perspective on an
  organization’s business processes.
- Yet Another Workflow Language (YAWL): YAWL is a language for modeling the
  control flow logic of workflows [26].
- jBPM: jBPM models allowed for the visualization of business process models
  that could be executed by the jBPM business process execution engine before
  the BPMN 2.0 XML serialization format existed. However, recent versions of
  jBPM rely on BPMN 2.0-based models.
- Process Documentation Template: Process documentation templates support the
  generation of comprehensive PDF-based process documentation reports. These
  templates are technically a model notation, although they may practically be
  considered a reporting tool instead.
- XForms: XForms is a (dated) standard for modeling form-based graphical user
  interfaces [2].
- Chen Notation: Chen notation diagrams [3] allow for the creation of
  entity-relationship models.
SAP-SAM is available at https://zenodo.org/record/7012043. Its license
supports non-commercial use for research purposes, e.g., usage for the evaluation
of academic research artifacts, such as algorithms and related software artifacts.
4 Properties of SAP-SAM
SAP-SAM comprises models in different modeling notations and languages, as
well as of varying complexity. In this section, we provide an overview of selected
properties of SAP-SAM. The source code that we used to examine the properties
is available at https://github.com/signavio/sap-sam.
Modeling notations. Figure 1 depicts the number of models in different
notations in the dataset, as well as the corresponding percentages (in
parentheses). We aggregate notations that are each used for fewer than 100 models
into Other: Process Documentation Template (86 models), jBPM 4 (76 models),
XForms (20 models), and Chen Notation (3 models). The primarily used modeling
notation is BPMN 2.0, which confirms that it is the de-facto standard for
modeling business processes [4]. Therefore, we will focus on BPMN 2.0 models
as we examine further properties.
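The aggregation described above can be reproduced with a few lines of Python. This is a minimal sketch rather than the published analysis code: the counts are the ones reported in Figure 1 and in the text, while the names `aggregate_rare` and `with_percentages` and the `threshold` parameter are illustrative assumptions.

```python
from collections import Counter

# Model counts per notation as reported in Fig. 1 and in the text.
NOTATION_COUNTS = {
    "BPMN 2.0": 618807, "Value Chain": 194078, "DMN 1.0": 98286,
    "EPC": 32369, "BPMN 1.1": 15643, "UML 2.2 Class": 14953,
    "Petri Net": 11207, "ArchiMate 2.1": 10956, "UML Use Case": 10228,
    "Organigram": 4568, "BPMN 2.0 Choreography": 4096,
    "BPMN 2.0 Conversation": 2788, "FMC Block Diagram": 1398,
    "CMMN 1.0": 999, "CPN": 385, "Journey Map": 287, "YAWL 2.2": 238,
    "Process Documentation Template": 86, "jBPM 4": 76,
    "XForms": 20, "Chen Notation": 3,
}


def aggregate_rare(counts, threshold=100, other_label="Other"):
    """Merge all notations with fewer than `threshold` models into one bucket."""
    result = Counter()
    for notation, n in counts.items():
        result[notation if n >= threshold else other_label] += n
    return dict(result)


def with_percentages(counts):
    """Return (notation, count, percentage) rows sorted by descending count."""
    total = sum(counts.values())
    rows = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, n, round(100 * n / total, 2)) for name, n in rows]


AGGREGATED = aggregate_rare(NOTATION_COUNTS)
ROWS = with_percentages(AGGREGATED)

for name, n, pct in ROWS:
    print(f"{name:<24}{n:>8} ({pct:.2f} %)")
```

The four rare notations add up to the 185 models of the Other bucket, and the per-notation counts add up to the 1,021,471 models of the dataset.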
Languages. Since SAP-SAI can be used by academic researchers, teachers and
students all over the world, the models in SAP-SAM are created using different
languages. For example, SAP-SAM includes BPMN 2.0 models in 41 different
languages. Figure 2 shows the ten most frequently used languages for BPMN 2.0
models. Note that the vendor-provided example models, which are added to
newly created workspaces, exist in English, German, and French. When a SAP-
SAI workspace is created, the example models added to it are in German or
French if the language configured upon creation is German or French, respec-
tively; otherwise, the example models are in English. This contributes to the fact
that more than half of the BPMN 2.0 models (57.43 %) are in English.
Notation                 No. of models (percentage)
BPMN 2.0                 618807 (60.58 %)
Value Chain              194078 (19.00 %)
DMN 1.0                   98286 (9.62 %)
EPC                       32369 (3.17 %)
BPMN 1.1                  15643 (1.53 %)
UML 2.2 Class             14953 (1.46 %)
Petri Net                 11207 (1.10 %)
ArchiMate 2.1             10956 (1.07 %)
UML Use Case              10228 (1.00 %)
Organigram                 4568 (0.45 %)
BPMN 2.0 Choreography      4096 (0.40 %)
BPMN 2.0 Conversation      2788 (0.27 %)
FMC Block Diagram          1398 (0.14 %)
CMMN 1.0                    999 (0.10 %)
CPN                         385 (0.04 %)
Journey Map                 287 (0.03 %)
YAWL 2.2                    238 (0.02 %)
Other                       185 (0.02 %)
Fig. 1. Usage of different modeling notations.

Language     No. of models (percentage)
English      347730 (57.43 %)
German       158956 (26.25 %)
Italian       19307 (3.19 %)
Spanish       17282 (2.85 %)
French         9459 (1.56 %)
Croatian       8345 (1.38 %)
Portuguese     6698 (1.11 %)
Estonian       4874 (0.80 %)
Dutch          4845 (0.80 %)
Slovenian      4133 (0.68 %)
Fig. 2. Usage of different languages for BPMN 2.0 models.

Elements. Figure 3 illustrates the occurrence frequency of different element
types in the BPMN 2.0 models of SAP-SAM. The element types are not equally
distributed, which confirms the findings of prior research [14]. The number of
models that contain at least one instance of a particular element type is much
higher for some types, e.g., sequence flow (98.88 %) or task (98.11 %), than for
others, e.g., collapsed subprocess (25.23 %) or start message event (25.42 %).
Note that Figure 3 only includes element types that are used in at least 10 % of
the BPMN 2.0 models. More than 30 element types are used by fewer than 1 % of the
models. On average, a BPMN 2.0 model in SAP-SAM contains 11.3 different element
types (median: 11) and 46.7 elements, i.e., instances of element types
(median: 40).
Table 1 shows the number of elements per model by type. For a compact rep-
resentation, we aggregate similar element types by arranging them into groups.
On average, connecting objects, which include associations and flows, make up
the largest proportion of the elements in a model (mean: 23.1, median: 20).
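The element statistics reported in this section (share of models containing an element type, number of distinct types per model, number of elements per model) can be computed along the following lines. This is a sketch under a simplifying assumption: each model is represented as a plain list of element-type names, one entry per element instance. That representation and the function name `element_type_statistics` are hypothetical and do not reflect the SAP Signavio JSON format.

```python
from collections import Counter
from statistics import mean, median


def element_type_statistics(models):
    """Summarize element usage over a list of models.

    Each model is a list of element-type names, one entry per element
    instance (a simplified, hypothetical representation).
    """
    n_models = len(models)
    containing = Counter()  # number of models with at least one instance of a type
    for elements in models:
        containing.update(set(elements))
    share = {t: 100 * c / n_models for t, c in containing.items()}
    distinct_types = [len(set(m)) for m in models]
    sizes = [len(m) for m in models]
    return {
        "share_of_models_pct": share,
        "types_per_model": (mean(distinct_types), median(distinct_types)),
        "elements_per_model": (mean(sizes), median(sizes)),
    }


# Toy input, not taken from the dataset.
toy_models = [
    ["Start none event", "Task", "Task", "Sequence flow", "Sequence flow",
     "End none event"],
    ["Task", "Sequence flow", "Exclusive databased gateway", "Task"],
    ["Task"],
]
STATS = element_type_statistics(toy_models)
```

Counting each type once per model (via `set`) is what distinguishes the "share of models" figure from a raw count of element instances.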
Element type                          No. of models (percentage)
Sequence flow                         598712 (98.88 %)
Task                                  594055 (98.11 %)
End none event                        558940 (92.31 %)
Lane                                  502013 (82.91 %)
Pool                                  501858 (82.88 %)
Start none event                      470583 (77.72 %)
Exclusive databased gateway           464579 (76.73 %)
Message flow                          330932 (54.65 %)
Association unidirectional            287010 (47.40 %)
Collapsed pool                        240177 (39.67 %)
Association undirected                235286 (38.86 %)
Intermediate message event catching   232430 (38.39 %)
Data object                           223692 (36.94 %)
Parallel gateway                      175277 (28.95 %)
IT system                             175230 (28.94 %)
Intermediate timer event              171562 (28.33 %)
Start message event                   153929 (25.42 %)
Collapsed subprocess                  152786 (25.23 %)
Eventbased gateway                    128083 (21.15 %)
Intermediate message event throwing    88572 (14.63 %)
Text annotation                        66940 (11.06 %)
Fig. 3. Occurrence frequency of different BPMN 2.0 element types.

Labels. All elements of a BPMN 2.0 model can be labeled by the modeler, which
results in a total of 2,820,531 distinct labels for the 28,293,762 elements of
all BPMN 2.0 models in SAP-SAM. Figure 4 depicts the distribution of label usage
frequencies. We sorted the labels based on their absolute usage frequency in
descending order and aggregated them in bins of size 10,000 to visualize the
unevenness of the distribution. The first bin (leftmost bar in the chart)
therefore contains the 10,000 most frequently used labels for the elements in the
BPMN 2.0 models. Overall, 53.9 % of all elements in the BPMN 2.0 models are
labeled with these first 10,000 labels. On the other hand, the long-tail
distribution indicates that many of the labels are used for only one element of
all BPMN 2.0 models. More precisely, 1,829,891 (64.9 %) of the labels are used
only once. The unevenness of the label usage distribution can again partly be
explained by the vendor-provided examples in the dataset: The labels of the
example processes appear very frequently in the dataset.
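The binning procedure behind the label analysis can be sketched as follows. The sketch assumes that each bar of Figure 4 shows the summed usage frequency of the labels in its bin, and that the input is a flat list with one label string per element; both are assumptions made for illustration, and `label_usage_summary` is a hypothetical helper name.

```python
from collections import Counter


def label_usage_summary(labels, bin_size=10_000):
    """Rank distinct labels by usage frequency and aggregate them into bins.

    `labels` holds one label string per element (hypothetical flat input).
    """
    freq = Counter(labels)
    # Usage frequencies of the distinct labels, most frequent first.
    ranked = sorted(freq.values(), reverse=True)
    # Summed usage frequency inside each bin of `bin_size` distinct labels.
    bins = [sum(ranked[i:i + bin_size]) for i in range(0, len(ranked), bin_size)]
    total = len(labels)
    singletons = sum(1 for f in ranked if f == 1)
    return {
        "distinct_labels": len(ranked),
        "bins": bins,
        "first_bin_share_pct": 100 * bins[0] / total if total else 0.0,
        "singleton_share_pct": 100 * singletons / len(ranked) if ranked else 0.0,
    }


# Toy input with a small bin size so that more than one bin is produced.
toy_labels = (["Check order"] * 5 + ["Ship goods"] * 3
              + ["Send invoice", "Archive file"])
SUMMARY = label_usage_summary(toy_labels, bin_size=2)
```

On the dataset itself, `first_bin_share_pct` would correspond to the reported 53.9 % and `singleton_share_pct` to the reported 64.9 %, provided that the same counting conventions are used.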
5 SAP Signavio Academic Models Applications
As pointed out above, large process model collections like SAP-SAM are a valu-
able and critical resource for BPM research. Process models from practice codify
organizational knowledge about business processes and methodical knowledge
about modeling practices. Both types of knowledge can be used by research, for
example, for deriving recommendations for the design of future models. In addi-
tion, large process model collections are required for evaluating newly developed
BPM algorithms and techniques regarding their applicability in practice.
To illustrate the potential value of SAP-SAM for the BPM community, the
following list describes some application scenarios that we consider to be partic-
ularly relevant. It is neither prescriptive nor comprehensive; researchers can use
SAP-SAM for many other purposes.
Table 1. Statistics of the number of elements per BPMN 2.0 model by type (grouped).
Element type group   Mean   Std  Min  25%  50%  75%    Max
Activities            8.6   8.4    0    4    7   10  1,543
Events                5.2   5.1    0    2    5    6    157
Gateways              3.7   4.4    0    2    3    4    303
Connecting Objects   23.1  21.8    0   14   20   25  2,066
Swimlanes             3.8   2.6    0    3    4    5    227
Data Elements         1.3   3.4    0    0    0    2    266
Artifacts             0.9   4.0    0    0    0    1    529

Fig. 4. Distribution of the label usage frequency in BPMN 2.0 models. Each bar
represents a bin of 10,000 distinct labels. (Bar chart; x-axis: bins of 10,000
labels; y-axis: usage frequency, log scale.)
Knowledge Generation. Process models depict business processes, codifying
knowledge about the operations within organizations. This knowledge can be
extracted and generalized to a broader context. Hence, SAP-SAM can be con-
sidered as a knowledge base to generate new insights into the contents and the
practice of organizational modeling. Example applications include:
- Reference model mining [20]: Reference models provide a generic template for
  the design of new processes in a certain industry. They can be mined by
  merging commonalities between existing processes from different contexts into
  a new model that abstracts from their specific features. By applying this
  technique to subsets of similar models from SAP-SAM, we can mine new reference
  models for process landscapes or individual processes, including, e.g., the
  organizational perspective. Similarly, we could identify, analyze, and compare
  different variants of the same process.
- Identifying modeling patterns [10]: Process model patterns provide proven
  solutions to recurring problems in process modeling. They can help in
  streamlining the modeling process and standardizing the use of modeling
  concepts. A dataset like SAP-SAM, which contains process models from many
  different modelers, provides an empirical foundation both for finding new
  modeling patterns and for validating existing ones. This also extends to
  process model antipatterns, i.e., patterns that should be avoided, as well as
  modeling guidelines and conventions.
Modeling Assistance. The modeling knowledge that is codified in SAP-SAM
can also be used for automated assistance functions in modeling tools. Such
assistance functions support modelers in creating or updating process models,
accelerating and facilitating the modeling process. However, many assistance
functions are based on machine learning techniques and therefore require a large
set of training data to generate useful results. With the large number of
modeling structures and labels it contains, SAP-SAM offers a substantial training set, for
example, for the following applications:
- Process model auto-completion [23]: By providing recommendations on possible
  next modeling steps, process model auto-completion can speed up modeling and
  facilitate consistency of the terms and modeling patterns that are used by an
  organization. Besides structural next element type recommendations, text label
  suggestions or even recommendations of entire process segments are possible.
  SAP-SAM can be used to train machine learning models for these purposes.
- Automated abstraction techniques [27]: One important function of BPM is
  process model abstraction, i.e., the aggregation of model elements into less
  complex, higher-level structures to enable a better understanding of the
  overall process. Such an aggregation entails the identification and assignment
  of higher-level categories to groups of process elements. SAP-SAM can provide
  the necessary training data for an NLP-based automated abstraction.
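To make the auto-completion scenario above more tangible, the following sketch recommends next activity labels from simple successor frequencies. It is a deliberately naive frequency baseline and not the rule-based or knowledge-graph-based methods of [23, 24]. The input format, a list of (source label, target label) pairs per model, is a hypothetical simplification of the sequence flows in a model, and all function names are illustrative.

```python
from collections import Counter, defaultdict


def train_successor_counts(models):
    """Count how often one label directly follows another via a sequence flow.

    Each model is a list of (source_label, target_label) pairs, a simplified
    and hypothetical stand-in for the sequence flows of a process model.
    """
    successors = defaultdict(Counter)
    for flows in models:
        for source, target in flows:
            successors[source][target] += 1
    return successors


def recommend_next(successors, current_label, k=3):
    """Return up to k labels that most often followed `current_label`."""
    counts = successors.get(current_label, Counter())
    return [label for label, _ in counts.most_common(k)]


# Toy training data, not taken from the dataset.
toy_flows = [
    [("Receive order", "Check stock"), ("Check stock", "Ship goods")],
    [("Receive order", "Check stock"), ("Check stock", "Reject order")],
    [("Receive order", "Check credit")],
]
SUCCESSORS = train_successor_counts(toy_flows)
SUGGESTIONS = recommend_next(SUCCESSORS, "Receive order", k=2)
```

A baseline of this kind is sensitive to the duplicated example models in the dataset, so it is best applied after the filtering steps recommended in Section 6.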
Evaluation. Managing large repositories of process models is a key application
of BPM [7]. Researchers have developed many different approaches to assist or-
ganizations with this task. To make these approaches as productive as possible,
they need to be tested on datasets that are comparable to those within orga-
nizations. Since SAP-SAM goes well beyond the size of related datasets, it can
be used for large-scale evaluations of existing process management approaches
on data from practice. Examples of these approaches include process model
querying [19], process model matching [1], and process model similarity [6].
6 Limitations and Recommendations for Usage
As explained in the previous section, SAP-SAM can be used by the academic
community to test and evaluate a plethora of tools and algorithms that address a
wide range of process querying and business process analytics use cases. However,
in the context of any evaluation, the limitations of the dataset need to be taken
into account. Considering the nature of SAP-SAM as a model collection that has
been generated by academic researchers, teachers, and students, the following
limitations must be considered:
- Many models in SAP-SAM exist multiple times, either as direct duplicates
  (copies) or as very similar versions. This includes vendor-provided example
  models or standard academic examples that are frequently used in academic
  teaching and research. The existence of these models can be used to evaluate
  variant identification and fuzzy matching approaches in process querying, but
  it negatively affects the diversity, i.e., the breadth of the dataset.
- Many models may be of low technical quality, in particular the models that
  are created by “process modeling beginners”, i.e., early-stage students, for
  learning purposes. Although it can be interesting to analyze the mistakes or
  antipatterns in such models, flawed models can, for example, be problematic
  when using the dataset for generating modeling recommendations based on
  machine learning. Also, the mistakes that students make are most likely not
  representative of mistakes made by process modeling practitioners.
- Because many of the models have most likely been created for teaching,
  learning, or demonstration purposes, they presumably present a simplistic
  perspective on business processes. Even when assuming that all researchers,
  teachers, and students are skilled process modelers⁶ and have a precise
  understanding of the underlying processes when modeling, the purpose of their
  models is typically fundamentally different from the purpose of industry
  process models. Whereas academic models often emphasize technical precision
  and correctness, industry models usually focus on a particular business goal,
  such as the facilitation of stakeholder alignment.
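The first limitation, direct duplicates, can be addressed with a content fingerprint. The sketch below assumes that each model has been parsed from JSON into nested Python dictionaries and lists, and that copies differ only in generated identifiers stored under a key named `id`. That key name is a hypothetical placeholder and needs to be adapted to the actual SAP Signavio JSON structure. Very similar (but not identical) versions are not detected this way and require a model similarity measure such as those in [6].

```python
import hashlib
import json


def strip_keys(value, ignore_keys):
    """Recursively drop keys (e.g., generated IDs) that differ between copies."""
    if isinstance(value, dict):
        return {k: strip_keys(v, ignore_keys)
                for k, v in value.items() if k not in ignore_keys}
    if isinstance(value, list):
        return [strip_keys(v, ignore_keys) for v in value]
    return value


def fingerprint(model, ignore_keys=("id",)):
    """Hash a canonical JSON serialization of the model content."""
    canonical = json.dumps(strip_keys(model, set(ignore_keys)),
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drop_exact_duplicates(models, ignore_keys=("id",)):
    """Keep the first model for every distinct fingerprint."""
    seen, unique = set(), []
    for model in models:
        fp = fingerprint(model, ignore_keys)
        if fp not in seen:
            seen.add(fp)
            unique.append(model)
    return unique


# Toy models: `b` is a copy of `a` with a different identifier.
a = {"id": "m1", "elements": [{"type": "Task", "label": "Check order"}]}
b = {"id": "m2", "elements": [{"type": "Task", "label": "Check order"}]}
c = {"id": "m3", "elements": [{"type": "Task", "label": "Ship goods"}]}
UNIQUE = drop_exact_duplicates([a, b, c])
```

Because list order is preserved in the canonical form, two copies whose elements are stored in a different order are treated as distinct; sorting element lists before hashing would be a possible refinement.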
Let us note that this list may not be exhaustive; in particular, limitations
that depend on a particular use case or evaluation scenario need to be identified
by researchers who will use this dataset. Still, it is also worth highlighting that
the rather “messy” nature of the model collection reflects the reality of industrial
data science challenges, in which a sufficiently large amount of high-quality data
(or models) is typically not straightforwardly available [11]; instead, substantial
efforts need to be made to separate the wheat from the chaff, or to isolate use-
cases in which the flaws in the data do not have an adverse effect on business
value, or any other undesirable organizational or societal implications.
When using SAP-SAM for academic research purposes, it typically makes
sense to filter it, i.e., to reduce it to a subset of models that satisfy desirable
properties. Here, we provide some recommendations to help with this step.
- It typically makes sense to filter out the vendor-provided example models
  that are created by the SAP-SAI system upon workspace creation.
- For many use cases, researchers may want to filter out process models that
  contain a very small or a very large number of elements. As can be expected
  for BPMN 2.0 models and is shown in Figure 5, the number of nodes and the
  number of edges in a model are highly correlated. Hence, it is sufficient to
  filter according to the number of nodes. There is no need to additionally
  filter according to the number of edges.
- Similarly, researchers may want to filter out process models where the
  element labels have an average length of less than, for example, three
  characters to ensure that only models with useful labels are included.

⁶ Considering the previous point, this holds even when focusing on the subset of
  the model collection that contains only models carefully created by skilled
  advanced students, teachers, and researchers.

Fig. 5. Correlation of the number of nodes and edges in BPMN 2.0 models.
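The three filtering recommendations can be combined into a single predicate, sketched below. The model representation (a dictionary with the keys `name`, `n_nodes`, and `labels`), the set of example model names, and the node-count thresholds are all hypothetical; only the label-length threshold of three characters is taken from the text. The small `pearson` helper shows how the node/edge correlation that justifies filtering by nodes only could be checked on a sample.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def keep_model(model, example_names, min_nodes=5, max_nodes=100,
               min_avg_label_len=3):
    """Decide whether a model passes the recommended filters.

    `model` is a hypothetical dict with the keys 'name', 'n_nodes', 'labels'.
    The node thresholds are illustrative defaults, not values from the paper.
    """
    if model["name"] in example_names:
        return False  # vendor-provided example model
    if not (min_nodes <= model["n_nodes"] <= max_nodes):
        return False  # very small or very large model
    labels = [label for label in model["labels"] if label]
    if not labels or mean(len(label) for label in labels) < min_avg_label_len:
        return False  # labels too short to be useful
    return True


# Toy models, not taken from the dataset.
toy_models = [
    {"name": "Vendor example", "n_nodes": 12,
     "labels": ["Check order", "Ship goods"]},
    {"name": "Loan application", "n_nodes": 14,
     "labels": ["Assess risk", "Approve loan"]},
    {"name": "Tiny", "n_nodes": 2, "labels": ["Do something"]},
    {"name": "Unlabeled", "n_nodes": 9, "labels": ["a", "b", ""]},
]
KEPT = [m["name"] for m in toy_models if keep_model(m, {"Vendor example"})]
```

Matching example models by name is only approximate, since users may rename or modify examples; combining the name filter with duplicate detection can catch more of them.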
Let us again highlight that example code that demonstrates how the dataset
can be queried, as well as the code for the analysis in this paper, is available at
https://github.com/signavio/sap-sam.
7 Conclusion
In this paper, we have presented the SAP-SAM dataset of process and other
business models. We are confident that SAP-SAM is, by far, the largest publicly
available collection of business process models. Hence, despite the limitation
that it contains “academic” models created by researchers, teachers, and students
rather than by process management professionals, it can serve as an excellent
basis for developing and evaluating tools and algorithms for process model
querying and analysis.
In the future, SAP-SAM can potentially be augmented by including the fol-
lowing additional data objects:
- Business objects/dictionary entries: In addition to models, SAP-SAI supports
  the creation of business objects, so-called dictionary entries. These objects
  represent, for example, organizational roles, documents, or IT systems and can
  be linked to models to then be re-used across a process landscape that
  comprises many models. Dictionary entries facilitate process landscape
  maintenance, as well as reporting.
- Standard-conformant XML serializations: The models in the SAP-SAM dataset are
  serialized using a non-standardized JSON format that i) supports a
  generalization of modeling notations and ii) is more convenient to use than
  XML-based serializations within the JavaScript-based front-ends of modern web
  applications. However, proprietary components exist that can, in the case of
  BPMN, DMN, and CMMN models, generate XML serializations which are compliant
  with the corresponding Object Management Group standards. Adding these XML
  serializations to the dataset can facilitate academic use, as many open-source
  and prototypical software tools support the open standards.
- PNG or SVG image representations: Similarly, to allow for a more
  straightforward visualization of models, PNG and SVG representations of the
  SAP-SAM models can be generated and included.
References
1. Antunes, G., Bakhshandeh, M., Borbinha, J., Cardoso, J., Dadashnia, S., et al.:
The process model matching contest 2015. In: Enterprise Modelling and Information
Systems Architectures. pp. 127–155. Köllen (2015)
2. Bruchez, E., Couthures, A., Pemberton, S., Van den Bleeken, N.: XForms 2.0.
W3C Working Draft, World Wide Web Consortium (W3C) (2012),
https://www.w3.org/TR/xforms20/
3. Chen, P.P.S.: The entity-relationship model—toward a unified view of data. ACM
Trans. Database Syst. 1(1), 9–36 (1976)
4. Chinosi, M., Trombetta, A.: BPMN: An introduction to the standard. Computer
Standards & Interfaces 34(1), 124–134 (2012)
5. Corradini, F., Fornari, F., Polini, A., Re, B., Tiezzi, F.: RePROSitory: a repository
platform for sharing business process models. In: BPM (PhD/Demos). pp. 149–153
(2019)
6. Dijkman, R., Dumas, M., Van Dongen, B., Käärik, R., Mendling, J.: Similarity
of business process models: Metrics and evaluation. Information Systems 36(2),
498–516 (2011)
7. Dijkman, R., La Rosa, M., Reijers, H.: Managing large collections of business
process models-current techniques and challenges. Computers in Industry 63(2),
91–97 (2012)
8. van Dongen, B.: BPI Challenge 2020: Domestic declarations (2020).
https://doi.org/10.4121/UUID:3F422315-ED9D-4882-891F-E180B5B4FEB5
9. Dumas, M., La Rosa, M., Mendling, J., Reijers, H.A.: Fundamentals of Business
Process Management. Springer, Berlin (2013)
10. Fellmann, M., Koschmider, A., Laue, R., Schoknecht, A., Vetter, A.: Business pro-
cess model patterns: state-of-the-art, research classification and taxonomy. Busi-
ness Process Management Journal (2018)
11. Houy, C., Fettke, P., Loos, P., van der Aalst, W.M., Krogstie, J.: Business process
management in the large. Business & Information Systems Engineering 3(6), 385–
388 (2011)
12. Jensen, K.: Coloured Petri nets. In: Petri nets: central models and their properties,
pp. 248–299. Springer (1987)
13. Lankhorst, M.M., Proper, H.A., Jonkers, H.: The architecture of the ArchiMate
language. In: Enterprise, Business-Process and Information Systems Modeling. pp.
367–380. Springer (2009)
14. Muehlen, M.z., Recker, J.: How much language is enough? Theoretical and
practical use of the business process modeling notation. In: Seminal Contributions to
Information Systems Engineering, pp. 429–443. Springer (2013)
15. OMG: Business Process Model and Notation (BPMN), Version 2.0.2 (2013),
http://www.omg.org/spec/BPMN/2.0.2
16. OMG: Case Management Model and Notation (CMMN), Version 1.1 (2016),
http://www.omg.org/spec/CMMN/1.1
17. OMG: Decision Model and Notation (DMN), Version 1.3 (2021),
http://www.omg.org/spec/DMN/1.3
18. Petri, C.A.: Kommunikation mit Automaten. Westfäl. Inst. f. Instrumentelle
Mathematik an der Univ. Bonn (1962)
19. Polyvyanyy, A.: Process querying: Methods, techniques, and applications. In:
Polyvyanyy, A. (ed.) Process Querying Methods, pp. 511–524. Springer (2022)
20. Rehse, J.R., Fettke, P., Loos, P.: A graph-theoretic method for the inductive devel-
opment of reference process models. Software & Systems Modeling 16(3), 833–873
(2017)
21. Schäfer, B., van der Aa, H., Leopold, H., Stuckenschmidt, H.: Sketch2BPMN:
Automatic recognition of hand-drawn BPMN models. In: Advanced Information Systems
Engineering. Springer (2021)
22. Scheer, A.W., Thomas, O., Adam, O.: Process Modeling using Event-Driven Pro-
cess Chains, chap. 6, pp. 119–145. John Wiley & Sons, Ltd (2005)
23. Sola, D., Aa, H.v.d., Meilicke, C., Stuckenschmidt, H.: Exploiting label semantics
for rule-based activity recommendation in business process modeling. Information
Systems 108(Article 102049) (2022)
24. Sola, D., Meilicke, C., van der Aa, H.: On the use of knowledge graph completion
methods for activity recommendation in business process modeling. In: Business
Process Management Workshops. p. 5. Springer (2021)
25. Thaler, T., Walter, J., Ardalani, P., Fettke, P., Loos, P.: The need for process
model corpora. In: FMI 2014. p. 14 (2014)
26. van der Aalst, W., ter Hofstede, A.: YAWL: yet another workflow language.
Information Systems 30(4), 245–275 (2005)
27. Wang, N., Sun, S., OuYang, D.: Business process modeling abstraction based on
semi-supervised clustering analysis. Business & Information Systems Engineering
60(6), 525–542 (2018)
28. Weske, M., Decker, G., Dumas, M., La Rosa, M., Mendling, J., Reijers, H.A.:
Model Collection of the Business Process Management Academic Initiative (2020),
https://doi.org/10.5281/zenodo.3758705