Checking Process Compliance against Natural Language Specifications using Behavioral Spaces

Han van der Aa (a,*), Henrik Leopold (a), Hajo A. Reijers (a,b)
(a) Department of Computer Sciences, VU University Amsterdam, Faculty of Sciences,
De Boelelaan 1081, 1081HV Amsterdam, The Netherlands
(b) Department of Mathematics and Computer Science, Eindhoven University of Technology,
PO Box 513, 5600MB Eindhoven, The Netherlands
Abstract
Textual process descriptions are widely used in organizations since they can be created and understood
by virtually everyone. Because of their widespread use, they also provide a valuable source for process
analysis, such as compliance checking. However, the inherent ambiguity of natural language impedes the
automated analysis of textual process descriptions. While human readers can use their context knowledge
to correctly understand statements with multiple possible interpretations, automated tools currently have to
make assumptions about their correct meaning. As a result, compliance-checking techniques are prone to
draw incorrect conclusions about the proper execution of a process. To provide a comprehensive solution
to these reasoning problems, we use this paper to introduce the concept of a behavioral space as a means
to deal with behavioral ambiguity in textual process descriptions. A behavioral space captures all possible
interpretations of a textual process description in a systematic manner. Thus, it avoids the problem of
focusing on a single, possibly incorrect interpretation. We use a quantitative evaluation with a set of 47
textual process descriptions to demonstrate the usefulness of a behavioral space for compliance checking in
the context of ambiguous texts.
Keywords: business process analysis, compliance checking, natural language processing, ambiguity
* Corresponding author. Phone number: +31 20 59 87788
Email addresses: j.h.vander.aa@vu.nl (Han van der Aa), h.leopold@vu.nl (Henrik Leopold), h.a.reijers@vu.nl
(Hajo A. Reijers)
Preprint submitted to Information Systems July 19, 2017
1. Introduction
Non-compliance represents a risk for many organizations. According to a recent study by Thomson
Reuters, non-compliance may even represent a possible cause of bankruptcy, also for the so-called “be-
hemoths” in the financial sector [1]. Recognizing the risk that is associated with non-compliance, orga-
nizations in a wide range of domains are stepping up their spending in order to ensure their compliance
with laws, regulations, and procedures. In this context, automated compliance checking techniques play a
crucial role thanks to their ability to automatically identify compliance violations [2, 3]. For this reason,
numerous approaches have been developed to perform this task (cf. [4, 5, 6, 7]). What these compliance-
checking techniques have in common is that they rely on a structured specification of allowed behavior, for
example in the form of process models or business rules. As a result, these techniques ignore the wealth
of information that is contained in less structured forms of process documentation, such as textual process
descriptions [8].
While the relevance and widespread use of text documents as a source for process analysis has been
emphasized in various contexts [9, 10, 11, 12], the inherent ambiguity of natural language presents a con-
siderable challenge to compliance-checking techniques. For example, a simple natural language statement
such as “in parallel to the latter steps” leaves room for interpretation. Due to this statement’s ambiguity, it
is generally impossible to infer with certainty whether “latter” refers to the preceding two, three, or even
more activities mentioned in the textual description. In prior work, text-to-process model generation tech-
niques have circumvented this problem by introducing interpretation heuristics [9, 13, 14]. In this way, these
techniques obtain a single process-oriented interpretation of the text, in spite of the presence of ambiguous
sentences. This interpretation, however, contains assumptions on the correct interpretation of essentially
undecidable ambiguity issues. So, there is always the risk that the derived interpretation conflicts with the
proper way to execute the process. As a result, the focus on a single, assumed interpretation can lead to
incorrect and, thus, untrustworthy compliance-checking results.
To provide a rigorous solution for the reasoning problems caused by ambiguous natural language state-
ments, we introduce a novel concept which we refer to as a behavioral space. A behavioral space precisely
captures all possible behavioral interpretations of a textual process description. The behavioral space clearly
defines which behavior is within and which behavior is outside any reasonable bounds of interpretation. By
using behavioral spaces for compliance checking, we avoid the need to impose assumptions on the correct
interpretations of ambiguous natural language texts. Therefore, compliance checks based on behavioral
spaces provide trustworthy results: They avoid the risks associated with the selection of incorrect interpre-
tations.
The remainder of the paper is structured as follows. Section 2 motivates the problem of reasoning under
behavioral ambiguity in textual process descriptions. In Section 3, we introduce the notion of a behavioral
space to capture behavioral ambiguity. Section 4 describes how a behavioral space can be generated from
a textual process description. We show how to perform compliance checks using a behavioral space in
Section 5. Then, Section 6 introduces a semi-automated pruning technique that can be used to effectively
reduce the uncertainty in compliance-checking results. In Section 7, we demonstrate the usefulness of
behavioral spaces and our proposed pruning technique through a quantitative evaluation using real-world
data. Section 8 discusses streams of related work. Finally, we conclude the paper and discuss directions for
future research in Section 9.
2. Behavioral Ambiguity in Textual Process Descriptions
In this section, we illustrate the problem associated with compliance checking of process behavior
against textual process descriptions. The key challenge in this context is the inherent ambiguity of natural
language. Ambiguity in natural language refers to a type of uncertainty in which several interpretations of
the same text are plausible. For example, the sentence “I saw a man on the hill with a telescope” can have
at least five plausible interpretations. These interpretations vary, among other things, on who is on the hill (I or
the man) and on the possessor or the location of the telescope (I, the man, or it is on the hill). In certain
situations, the correct interpretation of an ambiguous statement can be clear from the context in which it is
used, whereas in other situations even context cannot help to resolve ambiguity.
The goal of compliance checking is to determine if some observed behavior (i.e. a sequence of per-
formed activities) conforms to the allowed behavior described by a process specification, i.e. a textual pro-
cess description. Therefore, in a compliance-checking context we are particularly concerned with ambiguity
related to the allowed process behavior described in a text, which we shall refer to as behavioral ambiguity.
Behavioral ambiguity occurs when statements about the relations that exist between process steps can
be interpreted in different ways. We illustrate the problem of behavioral ambiguity through the simplified
description of a claims handling process, as presented in Figure 1. The description uses typical patterns to
describe ordering relations, as observed in process descriptions obtained from practice and research [9].
After a claim is received, a claim officer reviews the request and records the claim information. The
claim officer then validates the claim documents before writing a settlement recommendation. A senior
officer then checks this recommendation. The senior officer can request further information from the
claimant, or reject or accept the claim. In the former case, the previous steps must be repeated once
the requested information arrives. If a claim is rejected, the claim is archived and the process finishes.
If a claim is accepted, the claim officer calculates the payable amount. Afterwards, the claims officer
records the settlement information and archives the claim. In the meantime, the financial department
takes care of the payment.
Figure 1: Exemplary description of a claims handling process.
At first glance, the description from Figure 1 may appear to be clear. However, on closer inspection, it
turns out that the description does not provide conclusive answers to several questions regarding the proper
execution of the described process. For instance:
Q1. Is it allowed that the claims officer records the claim information before reviewing the request?
Q2. Does it suffice for the claim officer to rewrite the settlement recommendation in case additional information
has been requested?
Q3. Can the financial department start paying the claimant while the settlement information is still being
recorded?
Based on the information provided in the textual description, these questions are not clearly decidable.
This lack of decidability results from two forms of behavioral ambiguity: type ambiguity and scope ambigu-
ity. Type ambiguity occurs when a textual description does not clearly specify the type of order relationship
between two activities. For instance, the relation between the “review request” and “record claim infor-
mation” activities in the first sentence is unclear. The term “and” simply does not allow us to determine
whether these activities must be executed sequentially or whether they can be executed in an arbitrary order
(Q1). Scope ambiguity occurs when statements in a textual description underspecify to which activity or
activities they precisely refer. This type of ambiguity particularly relates to repetitions and parallelism. For
instance, the statement “the previous steps must be repeated” does not clearly specify which activities must
be performed again (Q2). Similarly, the expression “in the meantime” does not define when the financial
department can start performing its activities (Q3).
As a result of such ambiguities, there are different views on how to properly carry out the described
process. When deriving a single structured interpretation from a textual process description, as done by
process model generation techniques (cf. [9, 13, 14]), there is always the risk that a derived interpretation
conflicts with the proper way to execute the process. The focus on a single interpretation can, therefore,
lead to wrong conclusions when reasoning about a business process. This can, for instance, result in a
loss of efficiency by not allowing for parallel execution where possible (Q3). Furthermore, it can result in
non-compliance with regulations, for example, by failing to impose necessary ordering restrictions (Q1) or
by not repeating all of the required steps when dealing with the receipt of new claim information (Q2).
To avoid the problems associated with using an assumed interpretation, automated reasoning techniques
should take into account all reasonable interpretations of a textual process description. Therefore, we use
this paper to introduce the concept of a behavioral space. A behavioral space allows us to capture the full
range of possible semantics that can be conveyed by textual descriptions in a structured manner. As such, it
provides the basis to correctly reason about compliance to described processes.
3. Capturing Behavioral Ambiguity using Behavioral Spaces
In this section, we introduce and define the concept of a behavioral space. This concept provides the
foundation to reason about behavioral compliance for ambiguous process descriptions. The general idea
underlying behavioral spaces is to represent the causes and effects of behavioral ambiguity in a structured
manner. In Section 3.1, we first consider how to capture the various possible interpretations of a single
ambiguous behavioral statement. Then, Section 3.2 defines the concept of a behavioral space as a means to
capture all possible ways to interpret an entire textual process description.
3.1. Behavioral Statement Interpretations
Textual process descriptions consist of statements, i.e. sentences or parts of sentences, that describe
ordering relations between activities. We shall refer to such statements as behavioral statements. Each be-
havioral statement describes a single type of relation that holds between two or more activities. For example,
the statement “After a claim is received, a claims officer reviews the request” describes a sequential relation
between “receive claim” and “review request”. These relations described in behavioral statements can be
translated into a structured notation. In this paper, we employ the behavioral profile relations from [5] for
this purpose. This choice is based on two main criteria. First, behavioral profile relations allow for an
intuitive representation of the concepts we introduce in the remainder of this paper. Second, the relations
can be used for computationally efficient compliance checks. This feature is important because the computational
complexity of compliance checks increases with the number of interpretations of a textual process
description.
The behavioral profile relations capture the ordering restrictions that are in effect between pairs of
activities. Four different behavioral profile relations can exist for an activity pair $(a_i, a_j)$. The strict order
relation $a_i \rightsquigarrow a_j$ is used to express that activity $a_i$ cannot be executed after the execution of activity $a_j$. The
reverse strict order relation $a_i \rightsquigarrow^{-1} a_j$ indicates the opposite restriction, namely that $a_j$ cannot be executed
after the execution of $a_i$.¹ The exclusiveness relation $a_i + a_j$ denotes that either activity $a_i$ or activity $a_j$, but not both, can
be executed in a single process instance. Finally, the interleaving order relation $a_i \parallel a_j$ states that $a_i$ and $a_j$
can be executed in an arbitrary order. Using the activity identifiers specified in Table 1, this results in
$a_1 \rightsquigarrow a_2$ as the relation that holds between receive claim and review request.
Table 1: Activities in the running example
ID    | Activity                        | ID       | Activity
$a_1$ | Receive claim                   | $a_8$    | Reject claim
$a_2$ | Review request                  | $a_9$    | Accept claim
$a_3$ | Record claim information        | $a_{10}$ | Receive requested information
$a_4$ | Validate documents              | $a_{11}$ | Calculate payable amount
$a_5$ | Write settlement recommendation | $a_{12}$ | Record settlement information
$a_6$ | Check recommendation            | $a_{13}$ | Archive claim
$a_7$ | Request further information     | $a_{14}$ | Arrange payment
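The four relations can be represented compactly in code. The following is a minimal sketch, not the representation used in [5]: the names `Relation` and `flip` and the ASCII stand-ins for the relation symbols are assumptions made for illustration. It encodes that the reverse strict order for a pair is the strict order of the swapped pair, and it treats exclusiveness and interleaving order as symmetric.

```python
from enum import Enum


class Relation(Enum):
    """The four behavioral profile relations for an activity pair (a_i, a_j)."""
    STRICT_ORDER = "~>"             # a_i cannot be executed after a_j
    REVERSE_STRICT_ORDER = "~>-1"   # a_j cannot be executed after a_i
    EXCLUSIVE = "+"                 # a_i and a_j never occur in the same instance
    INTERLEAVING = "||"             # a_i and a_j may occur in an arbitrary order


def flip(relation: Relation) -> Relation:
    """Return the relation for (a_j, a_i), given the relation for (a_i, a_j)."""
    if relation is Relation.STRICT_ORDER:
        return Relation.REVERSE_STRICT_ORDER
    if relation is Relation.REVERSE_STRICT_ORDER:
        return Relation.STRICT_ORDER
    return relation  # exclusiveness and interleaving order are symmetric


# Excerpt of the activity identifiers from Table 1.
ACTIVITIES = {"a1": "Receive claim", "a2": "Review request"}

# "After a claim is received, a claims officer reviews the request".
profile = {("a1", "a2"): Relation.STRICT_ORDER}
profile[("a2", "a1")] = flip(profile[("a1", "a2")])
```

Storing both orientations of a pair makes later lookups independent of the order in which two activities are queried.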
Behavioral ambiguity in a textual process description occurs when behavioral statements can be interpreted
in different manners. We refer to these statements as ambiguous behavioral statements. These
statements result in conflicting activity relations. For instance, the statement “a claim officer reviews the
request and records the claim information” results in two interpretations. It is unclear whether this statement
implies a strict order or an interleaving order between the two described activities. This leads to two
different possible behavioral relations, namely $a_2 \rightsquigarrow a_3$ and $a_2 \parallel a_3$.
¹ Note that the reverse strict order relation $a_i \rightsquigarrow^{-1} a_j$ holds if and only if $a_j \rightsquigarrow a_i$.
In the remainder, we use the term statement interpretation to refer to the various behavioral relations that
are associated with a single interpretation of a behavioral statement. Unambiguous behavioral statements
result in a single statement interpretation, whereas ambiguous statements lead to multiple interpretations.
Definition 1 captures this notion.
Definition 1 (Statement Interpretations). Let $s$ be a behavioral statement in the set of behavioral statements
$S_T$ of a textual process description $T$, $A_T$ the set of activities described in $T$, and $R = \{\rightsquigarrow, \rightsquigarrow^{-1}, +, \parallel\}$ the
set of behavioral profile relations. We define $\Gamma_s$ as a set consisting of one or more statement interpretations
for the statement $s$. Each statement interpretation $\gamma \in \Gamma_s$ captures the behavioral relations that follow from a
possible interpretation of $s$, which is defined as a partial function $\gamma : A_T \times A_T \nrightarrow R$ that assigns a behavioral
profile relation from $R$ to a pair of activities from $A_T$, if any.
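To make Definition 1 concrete, a statement interpretation can be modelled as a dictionary, which behaves like a partial function from activity pairs to relations. This is only an illustrative sketch: the type aliases, the relation labels, and the helper `is_ambiguous` are assumed names rather than part of the definition.

```python
from typing import Dict, List, Tuple

ActivityPair = Tuple[str, str]
# A statement interpretation (gamma): a partial function from pairs to relations.
StatementInterpretation = Dict[ActivityPair, str]

STRICT_ORDER = "strict order"
INTERLEAVING = "interleaving order"

# Gamma_s for "After a claim is received, a claims officer reviews the request":
# the statement is unambiguous, so there is exactly one interpretation.
gamma_s1: List[StatementInterpretation] = [
    {("a1", "a2"): STRICT_ORDER},
]

# Gamma_s for "a claim officer reviews the request and records the claim
# information": "and" allows a strict order or an interleaving order.
gamma_s2: List[StatementInterpretation] = [
    {("a2", "a3"): STRICT_ORDER},
    {("a2", "a3"): INTERLEAVING},
]


def is_ambiguous(interpretations: List[StatementInterpretation]) -> bool:
    """A statement is ambiguous when it has more than one interpretation."""
    return len(interpretations) > 1
```

Pairs that a statement says nothing about are simply absent from the dictionary, which mirrors the "if any" clause of the definition.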
Next, we discuss how these statement interpretations serve as a basis for the establishment of behavioral
spaces, which consist of interpretations of an entire textual process description.
3.2. Behavioral Spaces
Using the interpretations of individual statements as a basis, we can construct views on the full process
behavior described in a textual description. We refer to a specific view as a process interpretation. A process
interpretation for a text $T$ follows from the selection of a single statement interpretation for each statement
$s$ in the set of behavioral statements $S_T$. We define a process interpretation as given in Definition 2.
Definition 2 (Process Interpretation). Let $T$ be a textual process description, $A_T$ the set of activities described
in $T$, $S_T$ the set of behavioral statements in $T$ with $\Gamma_s$ the set of statement interpretations of a
statement $s \in S_T$, and $R = \{\rightsquigarrow, \rightsquigarrow^{-1}, +, \parallel\}$ the set of behavioral profile relations. We then define a process
interpretation as a tuple $P = (\mathcal{I}, BP)$, with:
• $\mathcal{I}$: a complete set of interpretations consisting of a single statement interpretation $\gamma \in \Gamma_s$ for each
statement $s \in S_T$; formally, the following constraint holds: $\forall s \in S_T : |\mathcal{I} \cap \Gamma_s| = 1$;
• $BP : A_T \times A_T \to R$: a function that assigns a behavioral profile relation from $R$ to each pair of
activities from $A_T$.
Note that in Definition 2 any activity relation that follows from a function $\gamma \in \mathcal{I}$ is part of $BP$,
but that the reverse does not necessarily hold. $BP$ defines a complete behavioral profile based on the interpretations
included in $\mathcal{I}$, which includes relations that follow from the transitivity of the strict order and
interleaving order relations [15]. For instance, let $\gamma_1 \in \mathcal{I}$ be a statement interpretation that establishes the
relation $a \rightsquigarrow b$ and let $\gamma_2 \in \mathcal{I}$ be a statement interpretation that establishes $b \rightsquigarrow c$. Then, $BP$ will include
the relation $a \rightsquigarrow c$ that follows due to transitivity,² even though this relation is not part of any statement
interpretation in $\mathcal{I}$. Despite the overlap between them, we include $\mathcal{I}$ alongside $BP$ in Definition 2. This
is done in order to preserve traceability between the obtained process behavior described by $BP$ and the
statement interpretations in $\mathcal{I}$ that provided the foundation for $BP$. As such, we can use this traceability to
provide more useful diagnostic results when performing compliance checks.
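The transitivity step can be sketched as a small fixed-point computation. The function below is an assumption-laden illustration and not the construction from [15]: it only handles the strict order relation, uses the string "~>" as a stand-in for the relation symbol, and does not fill in the remaining pairs needed for a complete profile. It does respect the restriction that a derived relation is only added when no relation between the two activities is already known.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def close_strict_order(relations: Dict[Pair, str]) -> Dict[Pair, str]:
    """Add a ~> c whenever a ~> b and b ~> c hold and (a, c) is still unrelated."""
    closed = dict(relations)
    changed = True
    while changed:  # repeat until no new relation can be derived
        changed = False
        strict = [pair for pair, rel in closed.items() if rel == "~>"]
        for a, b in strict:
            for b2, c in strict:
                known = (a, c) in closed or (c, a) in closed
                if b == b2 and a != c and not known:
                    closed[(a, c)] = "~>"
                    changed = True
    return closed


# gamma_1 establishes a ~> b, gamma_2 establishes b ~> c.
derived = close_strict_order({("a", "b"): "~>", ("b", "c"): "~>"})

# If an interpretation already relates a and c, that relation is kept.
kept = close_strict_order({("a", "b"): "~>", ("b", "c"): "~>", ("a", "c"): "||"})
```

In the first call the relation between a and c is derived; in the second call the interleaving order that was stated explicitly is left untouched.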
A textual process description without ambiguity has exactly one process interpretation. For a textual process
description $T$ with behavioral ambiguity, the set of possible process interpretations follows naturally as the
set of possible combinations of statement interpretations for the ambiguous statements. For example, a text
with two ambiguous behavioral statements $s_i$ and $s_j$, with, respectively, two and three different statement
interpretations, will yield a set of $2 \times 3 = 6$ process interpretations. The concept of a behavioral space
captures this spectrum of possible interpretations for a single process. Figure 2 visualizes this, by depicting
a three-dimensional view on the behavioral relations that exist between the activities in an ambiguous textual
description. Definition 3 provides the formal definition of a behavioral space.
Definition 3 (Behavioral Space). Let $T$ be a textual process description, $A_T$ the set of activities described
in $T$, and $S_T$ the set of behavioral statements in $T$. We define a behavioral space $BS_T$ as a set of process
interpretations of the textual process description $T$.
In Section 4, we consider how to automatically generate behavioral spaces from textual process descriptions.
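The combinatorial nature of a behavioral space can be illustrated with a Cartesian product over the statement interpretations. The sketch below is hypothetical: the function name and the placeholder interpretation labels are assumptions, and it only builds the selection $\mathcal{I}$ of each process interpretation, not the derived behavioral profile $BP$.

```python
from itertools import product
from typing import Dict, Iterator, List


def process_interpretations(
    statement_interpretations: Dict[str, List[str]],
) -> Iterator[Dict[str, str]]:
    """Yield every selection of exactly one interpretation per statement."""
    statement_ids = sorted(statement_interpretations)
    options = (statement_interpretations[s] for s in statement_ids)
    for combination in product(*options):
        yield dict(zip(statement_ids, combination))


# Two ambiguous statements (two and three interpretations) plus one
# unambiguous statement, which contributes a single interpretation.
gammas = {
    "s_i": ["i1", "i2"],
    "s_j": ["j1", "j2", "j3"],
    "s_k": ["k1"],
}
space = list(process_interpretations(gammas))
```

Here `len(space)` equals 2 × 3 × 1 = 6, which matches the count in the example above; unambiguous statements do not increase the size of the space.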
4. Constructing Behavioral Spaces
This section describes an approach to automatically generate a behavioral space from a textual process
description. The approach, visualized in Figure 3, consists of three steps. First, a textual process description
$T$ is parsed in order to identify and analyze the set of behavioral statements $S_T$. Second, the proposed
approach generates behavioral interpretations for each statement in $S_T$. Third and lastly, the different statement
interpretations are combined into a collection of process interpretations that, together, comprise the
behavioral space $BS_T$.
² Note that this only applies if no interpretation $\gamma \in \mathcal{I}$ denotes a (different) relation between activities $a$ and $c$.
Figure 2: A behavioral space as a collection of $m$ process interpretations ($P_1, \ldots, P_m$), each shown as a matrix of behavioral profile relations over the activities $a_1, \ldots, a_n$
In the remainder of this section, we present the details on each of these three steps. Given the focus on
behavioral ambiguity of this paper, we mainly describe those aspects of the generation procedure that are
specific to the consideration of ambiguity. Existing text-to-process model generation approaches, cf. [9],
already address most challenges related to the parsing of textual descriptions (step 1) and related to the
extraction of behavioral relations for unambiguous behavioral statements (part of step 2).
Figure 3: Steps involved to construct a behavioral space from a textual description (textual process description → 1. Parse textual description → 2. Compute interpretations per statement → 3. Generate behavioral space → behavioral space)
4.1. Parsing Textual Process Descriptions
The first step in the approach is to parse a textual process description T. The goal of this parsing step is
to identify the set of behavioral statements $S_T$ and to extract behavioral information from these statements.
Each behavioral statement in a text describes a single type of relation that holds between a number of
activities. Therefore, a sentence in a textual description can contain more than a single behavioral statement.
For example, the sentence “After a claim is received, a claim officer reviews the request and records the
claim information”, contains two separate behavioral statements. One statement describes the strict order
relation between “claim is received” and “reviews the request”. The other describes an ambiguous relation
between “reviews the request” and “records the claim information”.
For each behavioral statement in $S_T$, the parsing step extracts the parts of the statement that refer
to process concepts and their inter-relations. As such, the parser identifies references to three semantic
components that together comprise a behavioral relation of the form: <source, relation type, target>. These
components are a source reference, a relation type reference, and a target reference. We speak about
references, rather than process concepts here, because textual descriptions often use different ways to refer
to the same concept. For instance, “previous activity” provides a reference to some aforementioned activity;
it is not a process concept itself. To clarify the parsing results obtained in this manner, we present the
parsing results for three statements in Table 2. We will use these examples in the remainder of this section
to further illustrate the algorithm that generates statement interpretations.
Table 2: Exemplary outcomes of behavioral statement parsing
ID    | Statement                                                              | Source ref.                 | Relation ref. | Target ref.
$s_1$ | After a claim is received, a claim officer reviews the request         | claim is received ($a_1$)   | after         | reviews the request ($a_2$)
$s_2$ | a claim officer reviews the request and records the claim information  | reviews the request ($a_2$) | and           | records the claim information ($a_3$)
$s_3$ | In the meantime, the financial department takes care of the payment    | $\emptyset$                 | meantime      | takes care of payment ($a_{14}$)
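The parsing results in Table 2 map naturally onto a small record type. The following sketch is illustrative only: the class `ParsedStatement` and its field names are assumptions, and an empty source reference is modelled as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ParsedStatement:
    """One parsed behavioral statement: <source, relation type, target>."""
    statement_id: str
    text: str
    source_ref: Optional[str]  # None models an empty source reference
    relation_ref: str
    target_ref: str


# The three rows of Table 2.
TABLE_2: List[ParsedStatement] = [
    ParsedStatement(
        "s1",
        "After a claim is received, a claim officer reviews the request",
        "claim is received", "after", "reviews the request",
    ),
    ParsedStatement(
        "s2",
        "a claim officer reviews the request and records the claim information",
        "reviews the request", "and", "records the claim information",
    ),
    ParsedStatement(
        "s3",
        "In the meantime, the financial department takes care of the payment",
        None, "meantime", "takes care of payment",
    ),
]

# Statements whose source reference could not be extracted from the text.
missing_source = [s.statement_id for s in TABLE_2 if s.source_ref is None]
```

Keeping references as plain strings, rather than activity identifiers, reflects the distinction between a textual reference and the process concept it resolves to.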
The techniques necessary to perform this parsing step are the same as those used by existing text-to-model
generation approaches. These approaches use a combination of standard natural language processing
(NLP) tools and heuristic-based techniques. NLP tools, such as the Stanford Parser [16], are used to
identify the grammatical structure of sentences. This structure is important to extract the business object on
which an action is being performed and to identify the actor who is performing the action. For example,
the “claims officer” (actor) “reviews” (action) “the request” (business object). While the NLP tools mainly
provide general techniques to analyze individual sentences and, thereby, extract activities, more tailored
techniques are necessary to extract the ordering relations that exist between activities. For this purpose,
model generation approaches employ heuristic-based techniques that recognize typical patterns used to express
ordering relations. Such patterns include sequences like “After activity i, do activity j” or choices like
“If condition, then activity i, else activity j”. Since these techniques are extensively described in related
work, we do not elaborate on them. The interested reader may consult the work by Friedrich et al. [9] for a
detailed description of a state-of-the-art parsing technique.
4.2. Computing Statement Interpretations
In the second step, the proposed approach constructs interpretations for each behavioral statement in
$S_T$. Algorithm 1 formalizes this part of the approach. The algorithm takes as input a parsed behavioral
statement $s \in S_T$, in which the semantic components have been identified. Then, it generates one or more
statement interpretations, depending on the presence and the type of ambiguity in $s$. Given a statement
$s \in S_T$, we distinguish three cases for this: (i) $s$ is unambiguous, (ii) $s$ contains type ambiguity, or (iii)
$s$ contains scope ambiguity. To which of these three cases a statement belongs depends on the ability to
resolve the textual references with certainty. The following sections describe each of these cases in detail.
Algorithm 1 Computing interpretations for a behavioral statement
 1: function computeStatementInterpretations(ParsedStatement s)
 2:     Set interpretations = new Set();
 3:     if s.isUnambiguous() then                        ▷ Statement is not ambiguous
 4:         Set sourceActivities = s.getSourceActivities();
 5:         Set targetActivities = s.getTargetActivities();
 6:         RelationType r = s.getRelationType();
 7:         interpretations.add(createInterpretation(sourceActivities, r, targetActivities));
 8:     if s.hasTypeAmbiguity() then                     ▷ Relation type is ambiguous
 9:         Set sourceActivities = s.getSourceActivities();
10:         Set targetActivities = s.getTargetActivities();
11:         for RelationType r ∈ RelationTypes.get(s.getRelationReference()) do
12:             interpretations.add(createInterpretation(sourceActivities, r, targetActivities));
13:     if s.hasScopeAmbiguity() then                    ▷ Scope of source reference is ambiguous
14:         Set targetActivities = s.getTargetActivities();
15:         RelationType r = s.getRelationType();
16:         Set sourceActivities1 = getPrecedingActivitiesWithSameResource(s);
17:         interpretations.add(createInterpretation(sourceActivities1, r, targetActivities));
18:         Set sourceActivities2 = getPrecedingActivitiesWithSameObject(s);
19:         interpretations.add(createInterpretation(sourceActivities2, r, targetActivities));
20:         Set sourceActivities3 = getActivitiesFromPrecedingControlFlowBlock(s);
21:         interpretations.add(createInterpretation(sourceActivities3, r, targetActivities));
22:     return interpretations;
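Algorithm 1 can be approximated in a few lines of Python. The sketch below is not the authors' implementation. It assumes a hypothetical `ParsedStatement` container in which the candidate source sets for a scope-ambiguous statement have already been computed (standing in for the three helper functions in lines 16, 18, and 20), it assumes each statement falls into exactly one of the three cases, and the candidate sets used for the third example statement are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Interpretation = Dict[Tuple[str, str], str]


@dataclass
class ParsedStatement:
    """Hypothetical container for the output of the parsing step."""
    targets: FrozenSet[str]
    relation_types: List[str]              # several entries: type ambiguity
    sources: FrozenSet[str] = frozenset()  # empty when the scope is ambiguous
    scope_candidates: List[FrozenSet[str]] = field(default_factory=list)


def create_interpretation(sources, relation, targets) -> Interpretation:
    """Relate every source activity to every target activity."""
    return {(s, t): relation for s in sorted(sources) for t in sorted(targets)}


def compute_statement_interpretations(s: ParsedStatement) -> List[Interpretation]:
    interpretations: List[Interpretation] = []

    def add(interpretation: Interpretation) -> None:
        if interpretation not in interpretations:  # mimic set semantics
            interpretations.append(interpretation)

    if s.scope_candidates:                 # scope ambiguity (lines 13-21)
        for sources in s.scope_candidates:
            add(create_interpretation(sources, s.relation_types[0], s.targets))
    elif len(s.relation_types) > 1:        # type ambiguity (lines 8-12)
        for relation in s.relation_types:
            add(create_interpretation(s.sources, relation, s.targets))
    else:                                  # unambiguous (lines 3-7)
        add(create_interpretation(s.sources, s.relation_types[0], s.targets))
    return interpretations


s1 = ParsedStatement(frozenset({"a2"}), ["~>"], sources=frozenset({"a1"}))
s2 = ParsedStatement(frozenset({"a3"}), ["~>", "||"], sources=frozenset({"a2"}))
s3 = ParsedStatement(
    frozenset({"a14"}), ["||"],
    # Invented candidate scopes, for illustration only.
    scope_candidates=[frozenset({"a13"}), frozenset({"a12", "a13"})],
)
results = [compute_statement_interpretations(s) for s in (s1, s2, s3)]
```

Duplicate interpretations are dropped on insertion, so two scope heuristics that select the same source activities contribute only one interpretation, as a set-based collection would.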
4.2.1. Unambiguous Behavioral Statements
Generating a statement interpretation for an unambiguous statement is relatively straightforward: all references
in the statement can be resolved in a single and unambiguous manner. For these cases, we can
generate a single statement interpretation by simply resolving the references and constructing a behavioral
relation. For example, for statement $s_1$, we can directly generate the relation $a_1 \rightsquigarrow a_2$. Algorithm 1
describes in lines 3–7 the creation of an interpretation for unambiguous statements. Since there is no ambiguity,
we can automatically extract the set of source activities, target activities, and the relation type from
the parsed statement.
4.2.2. Statements with Type Ambiguity
A behavioral statement with type ambiguity describes a relation among a specific set of activities, but
does not clearly define the type of relationship. Such statements can be identified because they have an
ambiguous way to refer to the relation type, i.e. the relation reference is ambiguous. In line 8, Algorithm 1
checks if statement $s$ has type ambiguity by determining if the relation reference (relationRef) is a known
ambiguous type indicator. In the context of this work, we focus on two ambiguous type indicators, namely
the terms “and” and “or”. It is important to distinguish between cases where relationRef = “and” and cases
where relationRef = “and, then” or similar. In the latter cases, the relation type is not ambiguous, since
a sequential relation is clearly specified. Although statements with unclear relation types are ambiguous,
we can generate a set of statement interpretations that accurately captures the different possible
interpretations.
If a statement indeed suffers from type ambiguity, the algorithm continues by resolving the unambiguous
source and target activity references (lines 9–10). Afterwards, the algorithm generates a single statement
interpretation for each applicable relation type (lines 11–12). For example, for statement $s_2$ from Table 2, we
generate one statement interpretation that contains a sequential relation and one that contains an interleaving
order relation between activities $a_2$ and $a_3$.
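The lookup of applicable relation types for a relation reference can be sketched as follows. This is a hypothetical illustration: the table `AMBIGUOUS_TYPE_INDICATORS`, the helper names, and the normalization rule are assumptions. Only the indicator “and” is included, because its two interpretations are spelled out above; the relation types associated with “or” are not listed in this section and are therefore left out.

```python
import re
from typing import Dict, List, Tuple

STRICT_ORDER = "strict order"
INTERLEAVING = "interleaving order"

# Known ambiguous type indicators and the relation types they may denote.
AMBIGUOUS_TYPE_INDICATORS: Dict[str, Tuple[str, ...]] = {
    "and": (STRICT_ORDER, INTERLEAVING),
}

# References that look similar but clearly specify a sequential relation.
UNAMBIGUOUS_SEQUENCE = {"and then"}


def normalize(relation_ref: str) -> str:
    """Lowercase the reference and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z]+", " ", relation_ref.lower()).strip()


def relation_types_for(relation_ref: str) -> List[str]:
    """Return all relation types that the relation reference may denote."""
    key = normalize(relation_ref)
    if key in UNAMBIGUOUS_SEQUENCE:
        return [STRICT_ORDER]
    if key in AMBIGUOUS_TYPE_INDICATORS:
        return list(AMBIGUOUS_TYPE_INDICATORS[key])
    raise KeyError(f"unknown relation reference: {relation_ref!r}")


types_for_and = relation_types_for("and")
types_for_and_then = relation_types_for("and, then")
```

Checking the unambiguous variants before the ambiguous indicators keeps “and, then” from being treated as a plain “and”.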
4.2.3. Statements with Scope Ambiguity
Dealing with behavioral statements with scope ambiguity represents the most complex case of the three.
These statements describe the existence of a relation, but do not specify between which activities this relationship
holds. In practice, such ambiguity occurs with respect to the set of source activities to which a
behavioral statement refers. For example, the statement $s_3$, “In the meantime, the financial department takes
care of the payment”, does not specify to what activities “meantime” refers. Consequently, the source reference
is empty ($\emptyset$), as indicated in Table 2. Statements with scope ambiguity typically occur in the context of
parallelism and loops because such behavioral patterns can be associated with textual references to groups
of activities.
To automatically identify cases with scope ambiguity, we analyzed how existing techniques for the
generation of process models from textual descriptions handle parallelism and loops. These techniques
generally use heuristics to identify and analyze behavioral statements in a text. They build on predefined
sets of indicators that pinpoint the different types of relations, e.g. “while” and “in the meantime” for parallel
or interleaving order relations and “is repeated” for backward loops. Specifically, we analyzed the set of
indicators used by the state-of-the-art technique from [9]. In this analysis, we isolated a subset of the
parallelism indicators employed by the technique that typically result in statements with scope ambiguity.
These indicators are presented in Table 3. What these indicators have in common is that their usage does
not allow for the specification of a scope of the statement, i.e. these indicators cannot be associated with
a source reference at all. Consider, for example, the sentence “In the meantime, the financial department
takes care of the payment”. It is syntactically not possible to specify to which set of activities a statement
with “in the meantime” refers. By contrast, if we replace this construct with an indicator from the
unambiguous class in Table 3, this problem can be avoided. For example, the indicator “while” can be used
to specify the scope of the statement, e.g. “While the claim is being archived”. By observing the usage of
such ambiguous indicators, we can identify statements with scope ambiguity (line 13 of Algorithm 1) and
generate interpretations for them (lines 14–21).
Table 3: Parallel indicators used in [9] and their classification
Class        Contents
Unambiguous  while, as well as, in parallel to
Ambiguous    meanwhile, concurrently, meantime, in the meantime
Although statements with scope ambiguity are highly problematic since they refer to an unknown set
of activities, we can still identify a set of possible meanings for them. In particular, we can utilize the
knowledge that statements such as “the previous steps” and “in the meantime” refer to distinct parts of
a process. This means that the set of activities to which these statements refer cannot be any arbitrary
combination of activities. The activities in the set must rather have a certain commonality, such as a set of
activities that are all executed by the same person.
For this reason, we generate interpretations for statements with scope ambiguity based on sets of
activities that have a certain common attribute. In particular, given a textual process description, we can
identify sets of subsequently described activities that are (i) performed by the same resource, (ii) performed
on or with the same (business) object, or (iii) are part of the same control-flow construct (e.g. a choice in
the process). Based on this, we obtain different interpretations for the statement s3. In this statement, “in
the meantime” can refer to different moments when the financial department can start taking care of the
payment. This results in three possible statement interpretations, each with a different commonality:
1. Common resource: While the claims officer, the last described resource, is performing its tasks, after
a senior claims officer has accepted the claim, i.e. {a11, a12, a13};
2. Common object: While activities are being performed on the last mentioned business object (the
claim), i.e. {a13};
3. Last control-flow construct: While the last mentioned activity before the statement is being executed,
i.e. {a13}.
These three possibilities result in three sets of relations that can follow from the same behavioral
statement. Lines 14–21 describe the generation of a statement interpretation for each of the three sets of
activities.
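The three commonalities can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the procedure of Algorithm 1: the `Activity` record, the helper `trailing_group`, the activity identifier `a10`, and all resource, object, and construct labels are hypothetical values chosen so that the output matches the three sets listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    resource: str   # who performs the activity (assumed annotation)
    obj: str        # business object it operates on (assumed annotation)
    construct: str  # identifier of the enclosing control-flow construct (assumed)


def trailing_group(activities, key):
    """Longest run at the end of the list whose members share key(activity)."""
    if not activities:
        return []
    value = key(activities[-1])
    group = []
    for act in reversed(activities):
        if key(act) != value:
            break
        group.append(act.name)
    return list(reversed(group))


def scope_interpretations(preceding):
    """Candidate source sets for a statement with scope ambiguity."""
    return {
        "common resource": trailing_group(preceding, lambda a: a.resource),
        "common object": trailing_group(preceding, lambda a: a.obj),
        "last control-flow construct": trailing_group(preceding, lambda a: a.construct),
    }


# Activities described before statement s3; all attribute values are hypothetical.
preceding = [
    Activity("a10", "senior claims officer", "claim", "c1"),
    Activity("a11", "claims officer", "settlement", "c2"),
    Activity("a12", "claims officer", "settlement", "c3"),
    Activity("a13", "claims officer", "claim", "c4"),
]
candidates = scope_interpretations(preceding)
```

Each candidate set would then serve as the source reference of one statement interpretation for s3.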
4.3. Generating Behavioral Interpretations
Based on the statement interpretations extracted from ambiguous and unambiguous behavioral
statements, we can generate a set of process interpretations, i.e. the behavioral space, for the entire textual
description. Algorithm 2 formalizes this generation step.
Algorithm 2 Generate a behavioral space from a textual process description
 1: function constructBehavioralSpace(Text text)
 2:   BehavioralSpace behavioralSpace = new BehavioralSpace();
 3:   List processInterpretations = new List();
 4:   processInterpretations.add(new ProcessInterpretation());
 5:   List newProcessInterpretations = new List();
 6:   for Statement s ∈ text.getStatements() do
 7:     List interpretations = computeStatementInterpretations(s);
 8:     for Interpretation i ∈ interpretations do
 9:       for ProcessInterpretation pi ∈ processInterpretations do
10:         ProcessInterpretation newPI = pi.copy();
11:         newPI.add(i);
12:         newProcessInterpretations.add(newPI);
13:     processInterpretations = newProcessInterpretations.copy();
14:     newProcessInterpretations.clear();
15:   for ProcessInterpretation pi ∈ processInterpretations do
16:     computeFullBehavioralProfile(pi);
17:     if pi.getBehavioralProfile().isConsistent() then
18:       behavioralSpace.add(pi);
19:   return behavioralSpace;
The set of process interpretations included in a behavioral space should contain all possible
combinations of statement interpretations for the behavioral statements. Lines 3–14 in Algorithm 2 describe the
creation of these combinations. The underlying idea is that all existing text interpretations, starting from
an empty list (line 3), are incrementally extended with a single statement interpretation (lines 8–11). This
ensures that each possible combination of statement interpretations is included in the list. For example, as
considered in the previous section, the claims handling process contains three ambiguous statements with,
respectively, two, three, and two possible interpretations. This results in a total number of 12 (2 × 3 × 2)
possible combinations of the statement interpretations and, thus, of 12 interpretations in a behavioral space
BS_T. After all combinations have been generated, we compute a full behavioral profile for each of the
interpretations and add them to the behavioral space (lines 15–18).
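The combination step amounts to a Cartesian product over the per-statement interpretation lists. The following Python sketch illustrates only that step and assumes placeholder labels (`s_a`, `s_b`, `s_c` and their interpretation names are hypothetical); it omits the profile computation and the consistency filter of lines 15–18.

```python
from itertools import product


def process_interpretations(statement_interpretations):
    """Combine one interpretation per statement into every possible process interpretation."""
    return [list(combo) for combo in product(*statement_interpretations)]


# Three ambiguous statements with 2, 3 and 2 interpretations (placeholder labels).
per_statement = [
    ["s_a:1", "s_a:2"],
    ["s_b:1", "s_b:2", "s_b:3"],
    ["s_c:1", "s_c:2"],
]
combos = process_interpretations(per_statement)
print(len(combos))  # 12
```

Unambiguous statements contribute a list with a single interpretation and therefore do not increase the number of combinations.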
To compute the complete behavioral profile for a process interpretation (line 16), we exploit the
transitivity of the strict order and interleaving order relations [15]. In this way, we can obtain relations beyond
those pair-wise relations that we extracted from a textual description. For example, if a text specifies that
activity ai is followed by aj and aj is followed by ak, i.e. ai ⇝ aj and aj ⇝ ak, then ai is also followed by
ak, i.e. ai ⇝ ak. After constructing this profile, the algorithm checks the internal consistency of the
behavioral profile with a technique defined in [15]. We only include mutually consistent process interpretations
in the behavioral space (lines 17–18). For example, we exclude obviously incorrect interpretations in which
one statement interpretation yields the relation ai ⇝ aj and another the relation ai + aj.
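A minimal Python sketch of these two steps is given below. It covers only the transitive closure of strict order pairs, and the function `is_consistent` is a simplified stand-in written for this example; it is not the consistency technique defined in [15].

```python
def close_strict_order(pairs):
    """Transitive closure of strict order pairs (x, y), read as 'x is followed by y'."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_consistent(strict, exclusive):
    """Simplified check: no pair is ordered in both directions or ordered and exclusive."""
    for (a, b) in strict:
        if (b, a) in strict:
            return False
        if (a, b) in exclusive or (b, a) in exclusive:
            return False
    return True


closure = close_strict_order({("ai", "aj"), ("aj", "ak")})
print(sorted(closure))  # [('ai', 'aj'), ('ai', 'ak'), ('aj', 'ak')]
```

An interpretation that contains both ai ⇝ aj and ai + aj is rejected by `is_consistent`, in line with the exclusion example above.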
Once the process interpretations have been obtained in this manner, the construction of the behavioral
space is complete. In Section 5, we illustrate the usefulness of these spaces for compliance checking.
5. Compliance Checking using Behavioral Spaces
By capturing behavioral ambiguity in a structured manner, behavioral spaces allow us to reason about
process compliance without the need to arbitrarily settle ambiguity. In this section, we demonstrate this
by showing how to perform a compliance check of an execution trace against a behavioral space generated
from a textual process description. The goal of compliance checking is to determine whether the behavior
captured in an execution trace is allowed by the behavioral specification of a business process [4]. The key
difference between traditional compliance checking and compliance checking using behavioral spaces lies
in the potential outcomes of a check. In traditional compliance checking, a trace is either compliant or
non-compliant with a business process. Due to the behavioral ambiguity captured in behavioral spaces, a
trace can be compliant or non-compliant, but also potentially compliant with a behavioral space. The latter
outcome occurs for traces that comply with one or more process interpretations in a behavioral space, but
not with all of them.
5.1. Process Interpretation Compliance
Compliance checking of a trace t against a behavioral space BS_T builds on the compliance checking
of t against individual process interpretations of the behavioral space. This is similar to the compliance
check of a trace and a behavioral profile, as obtained from a process model (see [17]). This check builds
on a comparison of the behavioral profile of a trace BP_t to the behavioral profile relations of a process
interpretation P = (I, BP).
The behavioral profile BP_t captures the strict order, exclusiveness, and interleaving order relations for
the set of activities A_t in a trace t. Given an activity pair (ai, aj) ∈ (A_t × A_t), BP_t contains the strict order
relation ai ⇝_t aj iff at least one occurrence of activity ai precedes an occurrence of activity aj in t, and no
occurrence of aj precedes an occurrence of ai in t. BP_t contains the interleaving order relation ai || aj iff at
least one occurrence of ai precedes an occurrence of aj in t, and at least one occurrence of aj precedes an
occurrence of ai in t.
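The two conditions above translate directly into a comparison of occurrence positions. The Python sketch below is one possible reading: the string labels `"->"`, `"<-"`, `"||"` and `"+"` are notation chosen for this example, and the handling of a pair in which neither activity precedes the other (which can only arise for an activity paired with itself when it occurs once) as exclusiveness is an assumption.

```python
from itertools import product


def trace_profile(trace):
    """Map each ordered activity pair (x, y) of a trace to '->', '<-', '||' or '+'."""
    positions = {}
    for idx, act in enumerate(trace):
        positions.setdefault(act, []).append(idx)

    profile = {}
    for x, y in product(positions, repeat=2):
        # Some occurrence of x precedes some occurrence of y (and vice versa).
        x_before_y = min(positions[x]) < max(positions[y])
        y_before_x = min(positions[y]) < max(positions[x])
        if x_before_y and y_before_x:
            profile[(x, y)] = "||"   # interleaving order
        elif x_before_y:
            profile[(x, y)] = "->"   # strict order
        elif y_before_x:
            profile[(x, y)] = "<-"   # reverse strict order
        else:
            profile[(x, y)] = "+"    # assumed: no ordering observed
    return profile


profile = trace_profile(["a1", "a3", "a2", "a4", "a5"])
print(profile[("a3", "a2")])  # ->
```

A repeated activity, as in the trace a, b, a, yields an interleaving order relation between a and b.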
Given a behavioral profile of a trace BP_t and a process interpretation P ∈ BS_T, we can determine if t is
compliant with P = (I, BP) by checking if the relations in BP_t do not violate the behavioral relations in BP.
Specifically, t is compliant with P if all relations in BP_t are subsumed by the relations in BP. A behavioral
profile relation R ∈ ℛ is subsumed by relation R' ∈ ℛ if the relations are equal, i.e. R = R', or if R' restricts
less behavior than R. Definition 4 formally defines the notion of subsumption according to [17].
Definition 4 (Subsumption Predicate). Given two behavioral relations R, R' ∈ {⇝, ⇝⁻¹, +, ||}, the
subsumption predicate S(R, R') is satisfied iff (R ∈ {⇝, ⇝⁻¹} ∧ R' = +) or R = R' or R = ||.
Based on the notion of subsumption, we define compliance between a trace and a process interpretation
in Definition 5.
Definition 5 (Trace to Process Interpretation Compliance). Let t be an event trace with an activity set A_t
and P = (I, BP) a process interpretation in the behavioral space BS_T with an activity set A_T, such that
A_t ⊆ A_T. Then, the compliance predicate compl(t, P) is satisfied if for each activity pair (x, y) ∈ (A_t × A_t)
the relation R_t in BP_t of the pair (x, y) is subsumed by the relation R_P of the pair (x, y) in BP, i.e. S(R_P, R_t).
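Definitions 4 and 5 can be sketched in Python as shown below. The sketch assumes that profiles are given as dictionaries from activity pairs to the labels `"->"`, `"<-"`, `"+"` and `"||"`; these labels, the function names, and the two hand-written toy profiles for activities a2 and a3 are assumptions made for illustration.

```python
def subsumes(r_model, r_trace):
    """Subsumption predicate S(R, R') with R from the interpretation and R' from the trace."""
    return (
        (r_model in ("->", "<-") and r_trace == "+")
        or r_model == r_trace
        or r_model == "||"
    )


def complies(trace_profile, model_profile):
    """compl(t, P): every trace relation is subsumed by the model relation of the same pair."""
    return all(
        subsumes(model_profile[pair], rel) for pair, rel in trace_profile.items()
    )


# Toy profiles restricted to the pair a2, a3 (hand-written for this example).
strict_model = {("a2", "a3"): "->", ("a3", "a2"): "<-"}
interleaving_model = {("a2", "a3"): "||", ("a3", "a2"): "||"}
a2_first = {("a2", "a3"): "->", ("a3", "a2"): "<-"}
a3_first = {("a2", "a3"): "<-", ("a3", "a2"): "->"}

print(complies(a2_first, strict_model))        # True
print(complies(a3_first, strict_model))        # False
print(complies(a3_first, interleaving_model))  # True
```

The lookup `model_profile[pair]` relies on the condition A_t ⊆ A_T from Definition 5; a trace with activities unknown to the interpretation would raise a `KeyError` in this sketch.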
Next, we describe how to determine the compliance of a trace to a behavioral space based on
compliance checks between a trace and the space’s individual process interpretations.
5.2. Behavioral Space Compliance
To determine the level of compliance between a trace and a behavioral space, we consider the number
of process interpretations with which a trace complies. In particular, we quantify the support of a behavioral
space BS_T for a trace t as the ratio between the number of interpretations to which t is compliant and the
total number of interpretations in BS_T:
\[ \mathit{supp}(t, BS_T) = \frac{|\{P \in BS_T \mid \mathit{compl}(t, P)\}|}{|BS_T|} \tag{1} \]
The support metric quantifies the fraction of interpretations that allow for a trace to occur. A support
value of 1.0 indicates that a trace is without any doubt compliant with the behavioral space, i.e. independent
of the chosen interpretation. A support of 0.0 shows that there is no interpretation under which a trace
complies with the behavioral space. Therefore, it can be said with certainty that the trace is non-compliant
with BS_T. Finally, any trace t with a support value 0.0 < supp(t, BS_T) < 1.0 is potentially compliant
with BS_T. This implies that there are certain interpretations of the textual description with which the trace
complies. To illustrate the usefulness of the support metric, consider the following three partial execution
traces of the running example:
Trace t1 = ⟨a1, a2, a3, a4, a5⟩;
Trace t2 = ⟨a1, a3, a2, a4, a5⟩;
Trace t3 = ⟨a11, a14, a12, a13⟩.
The difference between the traces t1 and t2 is that, in t1, activity a2 occurs before a3, whereas these are
executed in reverse order in t2, i.e. a2 ⇝ a3 ∈ BP_t1 and a3 ⇝ a2 ∈ BP_t2. Furthermore, recall that the
behavioral relation between these two activities is given by the ambiguous behavioral statement s2. Depending on
the interpretation of s2, there either exists a strict order or an interleaving order relation between a2 and a3,
i.e. R(a2, a3) = {⇝, ||}. The relation a2 ⇝_t1 a3 from t1 is subsumed by both possible interpretations included
in the behavioral space, since S(⇝, ⇝) and S(||, ⇝) are both satisfied. Therefore, t1 is compliant with
all interpretations in the behavioral space and, thus, has a support value of 1.0. For trace t2 we observe a
different situation. While a3 ⇝_t2 a2 in trace t2 is subsumed by relation a2 || a3, this relation is not subsumed
by a2 ⇝ a3. Therefore, t2 does not comply with half of the process interpretations in the behavioral space.
This results in supp(t2, BS_T) = 0.5.
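The support values of t1 and t2 can be reproduced with a short Python sketch. To keep it small, the behavioral space is reduced to the two interpretations of s2, and `complies_s2` is a toy compliance check that only looks at the order of a2 and a3; both simplifications, as well as all function names, are assumptions made for this example.

```python
def support(trace, space, complies):
    """Fraction of process interpretations in `space` with which `trace` complies."""
    if not space:
        raise ValueError("behavioral space must contain at least one interpretation")
    return sum(1 for interp in space if complies(trace, interp)) / len(space)


def classify(supp):
    """Map a support value to one of the three compliance outcomes."""
    if supp == 1.0:
        return "compliant"
    if supp == 0.0:
        return "non-compliant"
    return "potentially compliant"


def complies_s2(trace, interp):
    # Toy check: interleaving allows any order, strict order requires a2 before a3.
    if interp == "interleaving":
        return True
    return trace.index("a2") < trace.index("a3")


space = ["strict_order", "interleaving"]
t1 = ["a1", "a2", "a3", "a4", "a5"]
t2 = ["a1", "a3", "a2", "a4", "a5"]

print(support(t1, space, complies_s2))  # 1.0
print(support(t2, space, complies_s2))  # 0.5
```

In a fuller implementation, the `complies` argument would be the compliance predicate of Definition 5 applied to complete behavioral profiles.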
Aside from providing information on the process interpretations with which a trace complies, behavioral
spaces allow us to obtain further diagnostic information from this compliance check. In particular, we can
gain insights into the conditions under which a trace is compliant with a process description. For example,
we can learn under which interpretations of the statement s3, “In the meantime, the financial department
takes care of the payment”, trace t3 is compliant. In t3, the financial department pays the settlement amount
(a14) before the claims officer records the settlement information (a12). This complies with one of the two
interpretations of statement s3 and, therefore, results in a support value of 0.5. Furthermore, we know that
this trace is compliant if and only if “in the meantime” means “while the claims officer is performing its
tasks” and not “while the claims officer is archiving the claim”. Such diagnostic information can be useful
when interpreting the support values for a trace or when aiming to resolve the ambiguity contained in a
textual description.
6. Pruning Behavioral Spaces based on Information Gain
This section demonstrates how the behavioral space of a textual process description can be reduced to
provide more accurate compliance-checking results. We refer to this act as pruning a behavioral space. In
particular, we present a technique that supports users in resolving behavioral ambiguity in an efficient
manner. The technique identifies those ambiguous behavioral statements in a textual description that lead to the
most uncertainty in compliance-checking results. To achieve this, we define a metric that captures the
information gain that can be achieved by resolving the ambiguity in behavioral statements. This metric quantifies
the amount of uncertainty in compliance-checking results caused by a particular ambiguous behavioral
statement. Therefore, information gain illustrates how much compliance-checking uncertainty can be removed
by resolving the ambiguity in a particular statement. It serves a similar purpose as the information-gain
metric used in the context of decision trees to quantify reductions in information entropy (cf. [18, 19]).
Before introducing the information-gain metric, we first consider how compliance-checking uncertainty can
be reduced by resolving ambiguous statements and, thereby, pruning a behavioral space.
6.1. Compliance-Checking Uncertainty
In Section 5, we demonstrated that behavioral spaces allow for reasoning about compliance without the
need to resolve behavioral ambiguity. This is achieved by introducing the notion of potential compliance,
which captures situations where a trace is compliant to some process interpretations, but non-compliant
to others. Such a classification provides valuable information, especially in the context of the diagnostic
information that can be associated with it, i.e. for which statement interpretations a trace is compliant or
non-compliant. Still, such cases represent a form of unclarity in the compliance-checking results, because
it is not known if a trace is actually compliant or not. We will refer to this state as compliance-checking
uncertainty.
The accuracy of compliance-checking results can be improved by reducing the level of
compliance-checking uncertainty. This can be achieved by resolving the cause of ambiguity in ambiguous statements,
for instance, by replacing an ambiguous type indicator such as “and” with either “and, then” or with “and,
meanwhile”. In these cases, there are fewer potentially compliant traces and more traces for which it can be
stated with certainty that they are compliant or not. Behavioral spaces represent a powerful tool to support
users in this endeavor. First, behavioral spaces support the improvement of compliance-checking accuracy
by providing insights into the causes (i.e. the ambiguous statements) and the effects (i.e. the different
interpretations) of behavioral ambiguity. For instance, the behavioral space shows us that statement s2 from
the running example is ambiguous. Therefore, it is clear that if a user decides to resolve this ambiguity by
selecting the correct interpretation of s2, we reduce the overall compliance-checking uncertainty. Second,
behavioral spaces can support users even further by letting them focus their resolution efforts on the
ambiguous statements that are the greatest causes of compliance-checking uncertainty. We achieve this with
the information-gain metric that we introduce next.
6.2. Information Gain
We introduce information gain (IG) as a metric that describes how much compliance-checking
uncertainty can be resolved by selecting a single interpretation for an ambiguous statement. A proper
quantification for this gain is to consider for how many traces the interpretations of a single ambiguous statement
disagree about their compliance. By resolving the ambiguity in statements of which the interpretations
disagree about the largest number of traces, a maximum of compliance-checking uncertainty can be removed.
We define IG for a set of statement interpretations Γ and a set of traces (i.e. a log) L in Equation 2.
\[ IG(\Gamma, L) = \left| \bigcup_{\gamma \in \Gamma} C_L(\gamma) \setminus \bigcap_{\gamma \in \Gamma} C_L(\gamma) \right| \tag{2} \]
In this equation, we use C_L(γ) to refer to the set of traces from L that are compliant to the behavioral
relations that comprise a statement interpretation γ. IG(Γ, L) specifies the size of the set of traces that
are compliant to at least one interpretation, but also non-compliant to at least one interpretation. This is
computed by taking all traces that are allowed according to at least one interpretation in Γ, i.e. the union of
all sets C_L(γ) for γ ∈ Γ, minus those traces that are allowed by all interpretations in Γ, i.e. the intersection
of these sets.
The metric IG can be applied in two different ways, depending on the availability of an event log. If an
event log is not available, a log L_G can be generated that contains all traces that are potentially compliant
to the behavioral space BS. In this case, IG(Γ, L_G) can be used to identify those phrases that lead to the
biggest potential reduction in ambiguity. However, if an event log L_R related to the process is already
available, the information-gain metric can be used to compute the information gain in the context of truly
observed behavior. In this case, IG(Γ, L_R) represents the gain in ambiguity specific to the event log L_R.
To illustrate the usage of IG, consider the statements s2 and s3 used throughout this paper (introduced
in Table 2) and a log L with (partial) execution traces:
t1 = ⟨a2, a3, a12, a13, a14⟩
t2 = ⟨a2, a3, a14, a12, a13⟩
t3 = ⟨a2, a3, a14, a12, a13⟩
t4 = ⟨a3, a2, a14, a13, a12⟩
Recall that statement s2 has two interpretations, with the following sets of behavioral relations:
{a2 ⇝ a3} and {a2 || a3}. It can be easily observed that these interpretations disagree about any trace in which a3 occurs
before a2, i.e. of which the behavioral profile contains a3 ⇝_t a2. Trace t4 is the only trace in L for which
this is the case. Therefore, IG(Γ_s2, L) = 1. To compute IG for statement s3, it suffices to consider the
most restrictive and most flexible interpretations of the statement. The most restrictive interpretation states
that a14 can only be executed in parallel to, i.e. possibly before, activity a13. By contrast, the most flexible
interpretation of s3 specifies that a14 is in an interleaving order with a11, a12, and a13. This means that
from log L, only trace t1 is compliant to the former interpretation, whereas all four traces are compliant to
the latter interpretation of s3. Therefore, three traces in L are in dispute by the interpretations in Γ_s3, i.e.
IG(Γ_s3, L) = 3. From this, it can be concluded that the resolution of ambiguity in statement s3 has a greater
impact on the ambiguity in log L than the resolution of s2.
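Equation 2 and the worked example can be checked with a few lines of Python. In this sketch, the sets C_L(γ) are written down by hand from the discussion above rather than computed from behavioral profiles, and the function name `information_gain` is an assumption made for this example.

```python
def information_gain(compliant_sets):
    """Size of (union minus intersection) over the sets C_L(gamma), one per interpretation."""
    sets = [set(s) for s in compliant_sets]
    if not sets:
        raise ValueError("at least one statement interpretation is required")
    return len(set.union(*sets) - set.intersection(*sets))


# s2: the strict order interpretation excludes t4, the interleaving one allows all traces.
ig_s2 = information_gain([{"t1", "t2", "t3"}, {"t1", "t2", "t3", "t4"}])

# s3: most restrictive interpretation (only t1) versus most flexible (all four traces).
ig_s3 = information_gain([{"t1"}, {"t1", "t2", "t3", "t4"}])

print(ig_s2, ig_s3)  # 1 3
```

Ranking the statements by these values places s3 before s2, which matches the conclusion drawn above.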
By computing IG for all ambiguous statements in a behavioral space and resolving the statements with
the highest information gain, users can efficiently reduce the level of compliance-checking uncertainty. As
such, the behavioral space will be pruned by removing process interpretations that are not compliant with
the resolved ambiguity. This greatly enhances the efficient usage of the notion of behavioral spaces for
compliance checking.
7. Evaluation
In this section, we evaluate the usefulness of behavioral spaces for compliance checking in the context
of ambiguous textual process descriptions. For this purpose, we conduct a two-stage evaluation. First, we
assess the impact that the consideration of behavioral spaces has on compliance-checking results. In
particular, we compare the compliance-checking results obtained by using behavioral spaces to two alternative
ways of dealing with behavioral ambiguity. Second, we demonstrate the effectiveness of the proposed IG
metric for the reduction of uncertainty in compliance-checking results.
In the remainder, Section 7.1 first introduces the test collection used in both parts of the evaluation.
Then, Sections 7.2 and 7.3 respectively describe the evaluation of the compliance-checking results and of
the IG metric. Finally, we discuss limitations of the evaluation results in Section 7.4.
7.1. Test Collection
To perform our evaluation, we reuse the collection of textual process descriptions from the text-to-model
generation approach by Friedrich et al. [9]. The collection contains 47 process descriptions from various
industrial and scholarly sources. Table 4 gives an overview of the characteristics of the test collection.
Table 4: Overview of the test collection
ID  Source                   Type           PD  S     L
1   HU Berlin                Academic        4  10.0  18.1
2   TU Berlin                Academic        2  34.0  21.2
3   QUT                      Academic        8   6.1  18.3
4   TU Eindhoven             Academic        1  40.0  18.5
5   Vendor Tutorials         Industry        4   9.0  18.2
6   inubit AG                Industry        4  11.5  18.4
7   BPM Practitioners        Industry        1   7.0   9.7
8   BPMN Practice Handbook   Textbook        3   4.7  17.0
9   BPMN Guide               Textbook        6   7.0  20.8
10  Federal Network Agency   Public Sector  14   6.4  20.0
    Total                                   47   9.2  17.2
Legend: PD = Number of process descriptions per source, S = Average number of sentences,
L = Average number of words per sentence
The data from Table 4 illustrate that the included process descriptions differ greatly in size. The average
number of sentences ranges from 4.7 to 34.0. The longest process description contains a total of 40
sentences. Furthermore, the descriptions differ in the average length of the sentences. While the BPM
Practitioners source contains process descriptions with rather short sentences (9.7 words), the process descriptions
from the TU Berlin source contain relatively long sentences (21.2 words). Lastly, the process descriptions
differ in terms of how explicitly and unambiguously they describe the process behavior. Among others, this
results from the variety of authors that created the textual descriptions. Hence, we believe that the collection
is well-suited for achieving a reasonably high external validity of the results.
7.2. Compliance Evaluation
To demonstrate the usefulness of behavioral spaces for compliance checking, we compare the compliance-
checking results obtained by using behavioral spaces to two alternative ways of dealing with behavioral
ambiguity. These two alternatives are: (i) imposing assumptions on the correct interpretation of behavioral
statements, and (ii) ignoring ambiguous statements because they cannot be resolved. The goal of this part
of the evaluation is to show that behavioral spaces provide a much more reasonable view on the process
behavior allowed by a textual process description, when compared to the two alternatives. Section 7.2.1
describes the details of the setup used for this step. In Section 7.2.2, we present and discuss the results.
7.2.1. Setup
To conduct the evaluation, we implemented a prototype to generate behavioral spaces from textual
process descriptions. To achieve this, we build on the state-of-the-art text-to-process model generation
approach by Friedrich et al. [9]. In particular, our Java prototype uses a library that is part of the
RefMod-Miner³, which implements the process model generation approach in a stand-alone tool. We use the library
to automatically identify activities and extract behavioral profile relations that exist between activities.
We compare the compliance-checking results obtained by using behavioral spaces to two alternative
ways of dealing with behavioral ambiguity. The first alternative reflects the possibility to deal with
behavioral ambiguity by imposing assumptions on the correct interpretation of a text, i.e. by selecting a single
interpretation for each ambiguous behavioral statement. Second, it is possible to deal with behavioral
ambiguity by ignoring all ambiguous statements and, thus, only focusing on the behavioral relations that can
be extracted with certainty from a text. Given these alternatives to behavioral spaces, we generate three
behavioral models (BMs) for each of the 47 textual process descriptions as follows:
1. Fully interpreted behavioral profile (BP_full): This behavioral model reflects an approach that
imposes assumptions on the correct interpretation of ambiguous statements. To obtain BP_full, we
generate a process model by using the text-to-model generation approach from [9] and, subsequently,
extracting a behavioral profile from this model;
2. Minimally restricted behavioral profile (BP_min): This behavioral model reflects an approach in
which ambiguous statements are fully ignored. The resulting behavioral profile only captures the
behavioral relations that can be extracted with certainty from the textual process description. To
obtain BP_min, we remove all behavioral profile relations from BP_full that were extracted from the
analysis of ambiguous behavioral statements;
3. Behavioral space (BS): The behavioral space generated for the textual description in accordance
with the interpretation generation method described in Section 4.
³ http://refmod-miner.dfki.de
We conduct our evaluation by comparing the sizes of the sets of traces that are (potentially) compliant
with the three behavioral models, in accordance with the definitions provided in Section 5.⁴ Using C(BM)
to refer to the collection of traces that are compliant or potentially compliant to a behavioral model BM, we
quantify the size differences using the following two metrics:
\[ R_1 = \frac{|C(BS)|}{|C(BP_{min})|} \tag{3} \]
\[ R_2 = \frac{|C(BP_{full})|}{|C(BS)|} \tag{4} \]
R1 quantifies the ratio between the number of traces allowed by a behavioral space and a minimally
restricted behavioral profile. This measure reflects how much behavior that certainly does not comply with
the text T is allowed when ambiguous statements are ignored. R2 quantifies the ratio between the number of traces
allowed by a behavioral space and those allowed by a fully interpreted behavioral profile. This measure
reflects how much behavior that is possibly compliant to the text T is marked as non-compliant by an approach that
imposes assumptions on ambiguous statements.
7.2.2. Results
Table 5 summarizes the evaluation results. The first interesting thing to note is how common textual
process descriptions with behavioral ambiguity are. In total, 32 of the 47 textual process descriptions (68%)
contained one or more ambiguous phrases. The majority of these cases, 28 in total, included just phrases
with type ambiguity. Four cases contain statements with scope ambiguity, 3 of which also contain behavioral
statements with type ambiguity.
⁴ For processes that contain loops, we only include traces with at most one repetition.
Table 5: Evaluation results
Collection            P   S_type  S_scope  A     |PI|  R1      R2
Only type ambiguity   28  64      0        19.6  11.0  100.0%  37.8%
With scope ambiguity   4  13      4        24.0  76.5   16.4%   0.5%
Total                 32  77      4        20.2  19.1   89.5%  33.7%
Legend: P = number of processes, S_type = statements with type ambiguity,
S_scope = statements with scope ambiguity, A = extracted activities per
process (avg.), |PI| = interpretations per behavioral space.
For processes with just type ambiguity in their descriptions, there is a clear difference between the
behavior allowed by fully interpreted behavioral profiles C(BP_full) and the behavior allowed by behavioral
spaces C(BS). As indicated by metric R2, the fully interpreted behavioral profiles allow for only 37.8%
of the behavior allowed by the behavioral space. For the remaining 62.2% of the traces, we cannot state
with certainty that they do not comply with the process described in the text. This difference results from
ordering restrictions that the text-to-model generation algorithm imposes on activities, even when these
ordering restrictions may not exist. Behavioral spaces do not impose such restrictions and, thus, mark traces
that exhibit such execution flexibility as potentially compliant. This consideration of the cases with type
ambiguity already illustrates the impact of assumptions on compliance checking. Nevertheless, this impact
is much more severe for textual process descriptions that also contain statements with scope ambiguity.
Figure 4: Visualization of three sets of compliant traces for cases with scope ambiguity (legend: C(BP_full), C(BS), C(BP_min)).
The behavioral models for the 4 cases with scope ambiguity show much larger differences among the
behavior they allow. We visualize the relative sizes of the three sets of compliant traces in Figure 4. The
light-gray area denotes the set of traces compliant with BP_min, i.e. the set of traces that remain when treating
ambiguous statements as undecidable. The behavior allowed by the behavioral space, represented by the
dark-gray area, is considerably smaller, as also indicated by the R1 score of 16.4%. This number reveals
that 83.6% of the traces in C(BP_min) are not compliant with any reasonable interpretation of the statements
with scope ambiguity. Figure 4 also shows the considerable impact that the usage of single interpretations
has on the number of compliant traces. The tiny size of the black area in the figure and the R2 score of
0.5% both indicate that, for the cases with scope ambiguity, the fully interpreted behavioral profiles allow
for only a very small fraction of the behavior that is (potentially) compliant to a behavioral space. Again,
the remaining 99.5% represent traces that do not necessarily conflict with behavior specified in a textual
process description.
The evaluation results show the impact both of ignoring ambiguous statements and of imposing single
interpretations on them. As visualized by Figure 4, behavioral spaces provide a balance between behavioral
models that are too loosely restricted and those that are too strictly restricted. In summary, behavioral spaces
exclude a large number of nonsensical traces, which can be excluded by generating proper interpretations for
ambiguous statements. Still, they allow for many more traces than the restricted models that are obtained by imposing
assumptions on the ambiguous statements in textual descriptions.
7.3. Pruning Evaluation
In the second part of the evaluation, we set out to demonstrate how effective the proposed pruning
technique is at reducing uncertainty in compliance-checking results. Specifically, we assess how quickly
compliance uncertainty can be reduced when we employ the IG metric, introduced in Section 6, to select
ambiguous phrases. As a benchmark, we compare the results obtained in this manner to a random selection
mechanism. Section 7.3.1 describes the setup of this part of the evaluation, followed by a presentation of
the results in Section 7.3.2.
7.3.1. Setup
To evaluate the eectiveness of the IG metric, we make use of the behavioral spaces generated for the
textual descriptions in the previous part of the evaluation. Specifically, we select the behavioral spaces for
the 17 textual descriptions with more than one ambiguous behavioral statement. For these cases it is relevant
to determine which ambiguous phrase should be resolved first. To compute values for the IG metric, we
generate a log L_t for each textual description T that contains all traces that are potentially compliant to the
behavioral space BS.
In this evaluation, we are interested in how much we can reduce compliance-checking uncertainty by
using IG to select the ambiguous phrases we resolve first. We quantify this reduction by comparing the
number of potentially compliant traces that remain after resolving the ambiguity in a statement to the original
number of potentially compliant traces. Again, we use C(BS) to denote the set of potentially compliant
traces for a given behavioral space BS. Then, we compute the fraction of compliance uncertainty that remains
after resolving k ambiguous statements as given by Equation 5. Here, BS_k represents the behavioral
space that remains after resolving k ambiguous statements. BS_0 represents the behavioral space for which
no ambiguity is resolved.
\[
U(BS, k) = \frac{|C(BS_k)|}{|C(BS_0)|} \tag{5}
\]
We compute the value U(BS, k) for each k ∈ {1, ..., n}, where n is the number of ambiguous statements in
BS. As a benchmark, we compare these values against the uncertainty that remains after randomly selecting
ambiguous statements to resolve. In particular, we compute the average value of U(BS, k) for each of the
n! different orders in which ambiguous statements can potentially be selected. As such, this benchmark
mimics the situation in which people blindly select ambiguous statements to resolve.
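The computation of U(BS, k) and of the random benchmark can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the evaluation: the names `uncertainty`, `random_benchmark`, `c0`, and `kept` are hypothetical, and resolving a statement is modeled simply as intersecting the set of potentially compliant traces with the traces that remain compliant under the chosen interpretation. Note that Figure 5 plots the uncertainty resolved, which corresponds to 1 - U(BS, k).

```python
from fractions import Fraction
from itertools import permutations


def uncertainty(c0, kept, order, k):
    """U(BS, k): fraction of potentially compliant traces that remain after
    resolving the first k statements of `order` (assumed set-based model)."""
    remaining = set(c0)
    for statement in order[:k]:
        remaining &= kept[statement]
    return Fraction(len(remaining), len(c0))


def random_benchmark(c0, kept, k):
    """Average of U(BS, k) over all n! orders in which the n ambiguous
    statements can be resolved."""
    orders = list(permutations(kept))
    return sum(uncertainty(c0, kept, order, k) for order in orders) / len(orders)


# Hypothetical toy data: 8 potentially compliant traces, 3 ambiguous statements.
c0 = set(range(8))
kept = {
    "s1": {0, 1, 2, 3},  # resolving s1 keeps half of the traces
    "s2": {0, 1, 4, 5},  # resolving s2 keeps half of the traces
    "s3": {0},           # resolving s3 keeps a single trace
}

print(random_benchmark(c0, kept, 1))                  # 3/8
print(uncertainty(c0, kept, ("s3", "s1", "s2"), 1))   # 1/8
```

In this toy setting, resolving s3 first leaves 1/8 of the traces, whereas a blindly chosen first statement leaves 3/8 on average, which mirrors the kind of gap between informed and random selection that the benchmark is meant to expose.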
7.3.2. Results
Figure 5 visualizes the evaluation results. The curves represent the average reduction in uncertainty over
the 17 relevant cases. The figure shows considerable differences between the reduction obtained by using
the information gain metric versus the reductions obtained through random selection.
When interpreting these results, it is important to recognize that the minimum uncertainty reduction per
statement observed in the test collection is 50.0%. This quantity represents the amount of uncertainty that is
removed when resolving the most simple form of an ambiguous statement: a statement with type ambiguity
between just two activities. Therefore, the reduction of 63.7% that is obtained by randomly resolving a
single ambiguous statement is only 13.7 percentage points higher than the minimum improvement per statement. In
comparison, usage of the IG metric results in a removal of 79.7% of the compliance-checking uncertainty
in the first step. This reflects an improvement of 29.7 percentage points above the minimum improvement
per statement. Therefore, the information gain metric leads to an improvement that is more than twice as
high as that of random selection.
[Figure 5 plot: "Uncertainty resolved" (50% to 100%) against "# of ambiguous statements resolved" (1 to 5+), with one curve for max. IG and one for random selection.]
Figure 5: Visualization of three sets of compliant traces for cases with scope ambiguity.
The 79.7% reduction in uncertainty demonstrates that, in many textual descriptions, a single statement
has a much bigger impact on the ambiguity than the others. These statements are typically the ones with
scope ambiguity. This is the case because scope ambiguity results in a high number of potentially compliant
traces, usually caused by interpretations with a considerable number of activities with interleaving order
relations. This is, for instance, shown for statement s3 considered in the running example of this paper.
Similarly, statements with type ambiguity among more than two activities result in an increased number of
interleaving order relations compared to statements with ambiguity between just two activities. Therefore,
these statements result in a high number of potentially compliant traces. By resolving these ambiguous
statements first, the information gain metric can be used to quickly resolve the vast majority of compliance-
checking uncertainty.
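The effect of interleaving order relations on the number of potentially compliant traces can be illustrated with a small counting sketch. It rests on simplifying assumptions that are not part of the paper: every activity occurs exactly once per trace, and an interleaving relation is modeled as the absence of an ordering constraint. The function name `count_orderings` and the activity labels are hypothetical.

```python
from itertools import permutations


def count_orderings(activities, strict_order):
    """Count the orderings of `activities` in which a precedes b for every
    pair (a, b) in `strict_order`. Pairs without a constraint may interleave."""
    count = 0
    for perm in permutations(activities):
        position = {activity: index for index, activity in enumerate(perm)}
        if all(position[a] < position[b] for a, b in strict_order):
            count += 1
    return count


activities = ["a", "b", "c", "d"]
fully_ordered = [("a", "b"), ("b", "c"), ("c", "d")]
# b and c may interleave between a and d.
two_interleaved = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

print(count_orderings(activities, fully_ordered))    # 1
print(count_orderings(activities, two_interleaved))  # 2
print(count_orderings(activities, []))               # 24
```

With no ordering constraints at all, the count grows factorially in the number of activities, which is why resolving a statement that leaves many activities interleaved removes a large share of the potentially compliant traces.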
7.4. Limitations
Our evaluation demonstrates the usefulness of behavioral spaces for reasoning about process compliance
in the context of textual process descriptions. However, the evaluation results should be considered against
the background of some limitations. In particular, we are able to identify limitations related to the behavioral
space generation approach, to the compliance-checking technique, and limitations related to the evaluation.
7.4.1. Generation Approach
Limitations related to the generation approach concern two facets. First, it has to be considered that
the natural language processing techniques on which we build our approach are not fully accurate. The
model generation approach from [9] is heuristics-based, which means it does not cover all possible linguistic
patterns that can be used to express behavioral relations. For instance, the approach cannot handle constructs
corresponding to OR-gateways in process models, such as “At least two of the following three activities
should be performed”. However, since this approach represents the state of the art, it does provide an
accurate reflection of the quality of model generation approaches. Furthermore, it is important to stress
that our generation approach is largely independent of the underlying model generation approach. As long
as this approach can provide information regarding source, relation type, and target references, behavioral
spaces can be generated according to the algorithms detailed in Section 4.1.
Second, it has to be taken into account that we generate interpretations for statements with scope am-
biguity based on three commonalities. While these commonalities represent the plausible options in the
context of process descriptions, it is possible that, in certain situations, other properties are necessary to
capture the set of activities to which a statement refers. Still, the behavioral space will identify such state-
ments as ambiguous and help users to analyze the impact of this ambiguity on compliance checks.
7.4.2. Compliance Checking
A point of consideration regarding our compliance-checking technique is that we perform compliance
checks based on behavioral profiles. The expressive power of these relations has been shown to be less
than that of process modeling notations such as Petri nets, which are used by certain other compliance-
checking techniques [4]. Because of this lack of expressive power, behavioral profiles abstract from certain
process behavior. As a result, compliance checking based on behavioral profiles is less restrictive than
compliance checking techniques based on Petri nets. [20] provides a detailed overview of the restrictions
on expressiveness. Nevertheless, we have selected behavioral profiles because they enable highly efficient
compliance checks, which is an important prerequisite given the numerous possible process interpretations
that can exist in behavioral spaces. Furthermore, by using behavioral predicates to express behavioral
relations, we are able to combine the implications of different statement interpretations in an intuitive and
safe manner. Lastly, it is important to stress that the notion of a behavioral space as a basis for compliance
checking is independent of the underlying compliance measures. As long as behavioral spaces are used to
capture the various interpretations of ambiguous process descriptions, the same reasoning can be directly
transferred to other notions of process compliance.
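The independence between the behavioral space and the underlying compliance measure can be sketched by passing the compliance check in as a parameter. The sketch below is an illustration under assumptions, not the technique evaluated in this paper: `complies_with_profile` is a deliberately simplified profile check that assumes each activity occurs at most once per trace, and the three-way outcome returned by the hypothetical function `classify` is an assumed reading of how results over several interpretations are combined.

```python
def complies_with_profile(trace, strict_order, exclusive):
    """Simplified profile check (assumes each activity occurs at most once).

    strict_order: pairs (a, b) meaning b must not occur before a.
    exclusive:    pairs (a, b) meaning a and b must not occur together.
    Pairs in neither set are treated as interleaving (any order is allowed).
    """
    position = {activity: index for index, activity in enumerate(trace)}
    for a, b in strict_order:
        if a in position and b in position and position[b] < position[a]:
            return False
    for a, b in exclusive:
        if a in position and b in position:
            return False
    return True


def classify(trace, interpretations, complies=complies_with_profile):
    """Combine the results of a pluggable compliance check over all
    interpretations (assumed three-way outcome)."""
    results = [complies(trace, *interpretation) for interpretation in interpretations]
    if all(results):
        return "compliant"
    if any(results):
        return "potentially compliant"
    return "non-compliant"


# Two hypothetical interpretations of the same text, each given as
# (strict_order, exclusive).
interpretations = [
    ({("a", "b"), ("b", "c")}, set()),  # a, then b, then c
    ({("a", "b"), ("a", "c")}, set()),  # a first; b and c interleave
]

print(classify(["a", "b", "c"], interpretations))  # compliant
print(classify(["a", "c", "b"], interpretations))  # potentially compliant
print(classify(["b", "a", "c"], interpretations))  # non-compliant
```

Replacing the `complies` argument with, for example, an alignment-based check would leave `classify` unchanged, which is the sense in which the reasoning over interpretations carries over to other compliance notions.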
7.4.3. Evaluation
As for limitations related to the evaluation, we would like to point out that the presented quantitative
results are bound to the specifics of the textual process descriptions included in the test collection. The
employed data collection does not form a statistically representative sample. In fact, the creation of such
a sample is hardly feasible since natural language offers such a high degree of freedom. However, the
data set is composed of several heterogeneous sources and contains a considerable degree of ambiguity.
Furthermore, this data set has been previously used to evaluate the process model generation approach of
Friedrich et al. [9]. Therefore, we are confident that our evaluation shows a realistic picture of the impact
of behavioral ambiguity in practical settings.
8. Related Work
The work presented in this paper primarily relates to three major research streams: compliance check-
ing, the analysis of textual process descriptions, and the representation of data uncertainty. In the remainder
of this section, we describe each of these related areas.
8.1. Compliance Checking
Process compliance checking or conformance checking involves the comparison of different behavioral
specifications. These techniques are applied in various application scenarios, including process query-
ing [6], legal compliance [21], and auditing [2]. Most compliance checking techniques focus on the com-
parison of observed process behavior, as captured in execution traces, to a process specification. A plethora
of techniques exist for this purpose (cf. [4, 22, 23, 24]). In this paper, we used techniques that perform com-
pliance checks against behavioral profiles, introduced in [5]. The advantage of these techniques is their high
computational efficiency. Therefore, they represent an attractive choice for compliance checking against
behavioral spaces, which can contain a vast number of different behavioral interpretations. However, other
commonly used techniques compare execution traces to process models based on so-called alignments.
These techniques (cf. [4, 23]) provide different diagnostic information than compliance checks based on
behavioral profiles. Furthermore, alignment-based checks can be considered to be more accurate in some
situations, because behavioral profile relations abstract from certain details of process behavior. It is
important to emphasize at this point that the notion of a behavioral space, which consists of different process
interpretations, can be easily translated to other compliance checking notions, such as alignment-based
checks.
The aforementioned compliance checking techniques all rely on a structured specification of the be-
havior allowed for a process, e.g. in the form of process models. In earlier work, we introduced the first
compliance checking approach that works with textual process descriptions [25], a less structured repre-
sentation format. The current manuscript extends the earlier conference paper in three ways. First, we
provided detailed descriptions of the algorithms and dictionaries used to automatically generate behavioral
spaces from textual process descriptions. Second, we introduced a method to reduce the level of uncertainty
in compliance checking results by supporting users in the efficient resolution of ambiguous parts of a
textual description, a so-called pruning method. Lastly, we have extended the evaluation to demonstrate the
usefulness of our pruning method.
8.2. Analysis of Textual Process Descriptions
The majority of works that consider the analysis of textual artifacts related to business processes focus
on the automated derivation of process models from them. Respective techniques have been designed for
textual process descriptions [9, 26], group stories [13], use case descriptions [14], and textual methodolo-
gies [27]. From these, the text-to-model generation technique from Friedrich et al. [9] is typically recog-
nized as the state of the art [28]. Therefore, we used it as a basis for our own prototype and as a benchmark
for our evaluation. Though none of these existing works mentions the problem of behavioral ambiguity ex-
plicitly, all techniques impose assumptions on the interpretation of ambiguous behavioral statements. This
results in a single interpretation, i.e. a process model, for any given text. However, this comes with the great
disadvantage that the behavior allowed by this representation is much stricter than the behavior specified
in the textual description. Our earlier works on the comparison of textual process descriptions to process
models [29, 30] face similar issues when reasoning about the consistency of the two artifacts.
8.3. Representing Data Uncertainty
Similar to behavioral ambiguity in textual process descriptions, ambiguous or uncertain data is also
present in other application contexts. In these cases, uncertainty can be caused by, among others, data
randomness, incompleteness, and limitations of measuring equipment [31]. This has created a need for
algorithms and applications for uncertain data management [32]. As a result, the modeling of uncertain
data has been studied extensively (cf. [33, 34, 35, 36]). Our notion of a behavioral space builds on concepts
related to those used in uncertain data models. For instance, similar to the behavioral interpretations cap-
tured in a behavioral space, the model presented by Das Sarma et al. [36] uses a set of possible instances
to represent the spectrum of possible interpretations for an uncertain relation. Furthermore, the model de-
scribed in [33] uses conditions to capture dependencies between uncertain values. This notion has the same
result as the sets of behavioral relations that we derive from uncertain behavioral statements and convert
into different behavioral interpretations. Still, the technical aspects and application contexts of these
uncertain data models, which mostly relate to querying and data integration [32], differ considerably from the
process-oriented view of behavioral spaces.
9. Conclusions
In this paper, we introduced the concept of a behavioral space to deal with the ambiguity in textual
process descriptions. A behavioral space captures all possible interpretations of a textual process descrip-
tion. In this way, it avoids the issue of focusing on a single process-oriented interpretation of a text. We
demonstrated that a behavioral space is a useful concept for reasoning about a process described by a text.
In particular, we used a quantitative evaluation with a set of 47 textual process descriptions to illustrate
that a behavioral space strikes a reasonable balance between ignoring ambiguous statements and imposing
fixed interpretations on them. Furthermore, we demonstrated the usefulness of a semi-automated pruning
technique to quickly reduce the level of uncertainty remaining in compliance-checking results.
While we defined the behavioral space concept based on textual process descriptions, we would like to
point out that its use is not limited to pure text. A behavioral space can also help to capture the full behavior
of other types of ambiguous process representations. Consider, for instance, process models containing
activities that describe several streams of actions by using ambiguous behavioral statements such as “and”.
It has been found that such non-atomic activities can result in different interpretations of how to properly
execute the process [37]. A behavioral space would also be useful for application scenarios beyond com-
pliance checking. In future work, we set out to explore these usage scenarios of behavioral spaces in more
detail.
Bibliography
[1] S. English, S. Hammond, The rising costs of non-compliance: from the end of a career to the end of a firm, Thomson Reuters,
2014.
[2] R. Accorsi, T. Stocker, On the exploitation of process mining for security audits: the conformance checking case, in: Pro-
ceedings of the 27th Annual ACM Symposium on Applied Computing, ACM, 2012, pp. 1709–1716.
[3] W. M. van Aalst, K. M. van Hee, J. M. van Werf, M. Verdonk, Auditing 2.0: using process mining to support tomorrow’s
auditor, Computer 43 (3) (2010) 90–93.
[4] W. Van der Aalst, A. Adriansyah, B. van Dongen, Replaying history on process models for conformance checking and
performance analysis, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2 (2) (2012) 182–192.
[5] M. Weidlich, J. Mendling, M. Weske, Efficient consistency measurement based on behavioral profiles of process models,
IEEE Transactions on Software Engineering 37 (3) (2011) 410–429.
[6] A. Awad, G. Decker, M. Weske, Efficient compliance checking using BPMN-Q and temporal logic, in: International Conference
on Business Process Management, Springer, 2008, pp. 326–341.
[7] F. Chesani, P. Mello, M. Montali, F. Riguzzi, M. Sebastianis, S. Storari, Checking compliance of execution traces to business
rules, in: International Conference on Business Process Management, Springer, 2008, pp. 134–145.
[8] H. Van der Aa, H. Leopold, F. Mannhardt, H. A. Reijers, On the fragmentation of process information: Challenges, solutions,
and outlook, in: Enterprise, Business-Process and Information Systems Modeling, Springer, 2015, pp. 3–18.
[9] F. Friedrich, J. Mendling, F. Puhlmann, Process model generation from natural language text, in: Advanced Information
Systems Engineering, Springer, 2011, pp. 482–496.
[10] H. Leopold, J. Mendling, A. Polyvyanyy, Supporting process model validation through natural language generation, IEEE
Transactions on Software Engineering 40 (8) (2014) 818–840.
[11] M. Selway, G. Grossmann, W. Mayer, M. Stumptner, Formalising natural language specifications using a cognitive linguis-
tic/configuration based approach, Information Systems 54 (2015) 191–208.
[12] H. Leopold, H. van der Aa, F. Pittke, M. Raffel, J. Mendling, H. A. Reijers, Integrating textual and model-based process
descriptions for comprehensive process search, in: International Workshop on Business Process Modeling, Development and
Support, Springer International Publishing, 2016, pp. 51–65.
[13] J. C. de Gonçalves, F. M. Santoro, F. A. Baiao, Business process mining from group stories, in: Computer Supported
Cooperative Work in Design, 2009. CSCWD 2009. 13th International Conference on, IEEE, 2009, pp. 161–166.
[14] A. Sinha, A. Paradkar, Use cases to process specifications in Business Process Modeling Notation, in: IEEE International
Conference on Web Services, 2010, pp. 473–480.
[15] S. Smirnov, M. Weidlich, J. Mendling, Business process model abstraction based on behavioral profiles, in: Service-Oriented
Computing, Springer, 2010, pp. 1–16.
[16] D. Klein, C. D. Manning, Accurate unlexicalized parsing, in: Proceedings of the 41st Annual Meeting of the ACL-Volume 1,
ACL, 2003, pp. 423–430.
[17] M. Weidlich, A. Polyvyanyy, N. Desai, J. Mendling, M. Weske, Process compliance analysis based on behavioural profiles,
Information Systems 36 (7) (2011) 1009–1025.
[18] J. R. Quinlan, Induction of decision trees, Machine learning 1 (1) (1986) 81–106.
[19] S. R. Safavian, D. Landgrebe, A survey of decision tree classifier methodology, IEEE Trans. Systems, Man, & Cybernetics.
[20] A. Polyvyanyy, A. Armas-Cervantes, M. Dumas, L. García-Bañuelos, On the expressive power of behavioral profiles, Formal
Aspects of Computing 28 (4) (2016) 597–613.
[21] S. Sadiq, G. Governatori, K. Namiri, Modeling control objectives for business process compliance, in: International confer-
ence on business process management, Springer, 2007, pp. 149–164.
[22] E. Ramezani, D. Fahland, W. M. P. van der Aalst, Where did i misbehave? diagnostic information in compliance checking,
in: International conference on business process management, Springer, 2012, pp. 262–278.
[23] A. Adriansyah, B. van Dongen, W. van der Aalst, Conformance checking using cost-based fitness analysis, in: Enterprise
Distributed Object Computing Conference (EDOC), 2011 15th IEEE International, IEEE, 2011, pp. 55–64.
[24] J. Munoz-Gama, J. Carmona, W. M. P. Van Der Aalst, Single-entry single-exit decomposed conformance checking, Informa-
tion Systems 46 (2014) 102–122.
[25] H. van der Aa, H. Leopold, H. A. Reijers, Dealing with behavioral ambiguity in textual process descriptions, in: International
Conference on Business Process Management, Springer International Publishing, 2016, pp. 271–288.
[26] A. Ghose, G. Koliadis, A. Chueng, Process discovery from model and text artefacts, in: Services, 2007 IEEE Congress on,
IEEE, 2007, pp. 167–174.
[27] E. Viorica Epure, P. Martin-Rodilla, C. Hug, R. Deneckere, C. Salinesi, Automatic process model discovery from textual
methodologies, in: Research Challenges in Information Science (RCIS), 2015 IEEE 9th International Conference on, IEEE,
2015, pp. 19–30.
[28] M. Riefer, S. F. Ternis, T. Thaler, Mining process models from natural language text: A state-of-the-art analysis, in:
Multikonferenz Wirtschaftsinformatik (MKWI-16), March 9-11, Illmenau, Germany, Universität Illmenau, 2016.
[29] H. Van der Aa, H. Leopold, H. A. Reijers, Detecting inconsistencies between process models and textual descriptions, in:
Business Process Management, Springer, 2015, pp. 90–105.
[30] H. van der Aa, H. Leopold, H. A. Reijers, Comparing textual descriptions to process models – the automatic detection of
inconsistencies, Information Systems, 2016 (in press).
[31] J. Pei, B. Jiang, X. Lin, Y. Yuan, Probabilistic skylines on uncertain data, in: Proceedings of the 33rd international conference
on Very large data bases, 2007, pp. 15–26.
[32] C. C. Aggarwal, P. S. Yu, A survey of uncertain data algorithms and applications, Knowledge and Data Engineering, IEEE
Transactions on 21 (5) (2009) 609–623.
[33] S. Abiteboul, P. Kanellakis, G. Grahne, On the representation and querying of sets of possible worlds, Vol. 16, ACM, 1987.
[34] T. Imieliński, W. Lipski Jr, Incomplete information in relational databases, Journal of the ACM (JACM) 31 (4) (1984) 761–791.
[35] L. Peng, Y. Diao, Supporting data uncertainty in array databases, in: ACM SIGMOD International Conference on Manage-
ment of Data, ACM, 2015, pp. 545–560.
[36] A. D. Sarma, O. Benjelloun, A. Halevy, J. Widom, Working models for uncertain data, in: 22nd International Conference on
Data Engineering, IEEE, 2006, pp. 7–7.
[37] F. Pittke, H. Leopold, J. Mendling, When language meets language: Anti patterns resulting from mixing natural and modeling
language, in: Business Process Management Workshops, Springer, 2014, pp. 118–129.
... Some of these studies analyze the labeling style of process elements to identify incorrect labeling styles [12], whereas others develop techniques to automatically generate textual process descriptions [13]. Similarly, work has been done to automatically generate process models from textual process descriptions [14], as well as on checking process compliance of process models against textual descriptions [15] and the automatic detection of inconsistencies between the two descriptions [16]. Furthermore, studies have been conducted on employing NLP based techniques for finding relevant process models from a collection of process model using their textual descriptions [17] [18]. ...
Article
Full-text available
COVID-19 has imposed unprecedented restrictions on the society which has compelled the organizations to work ambidextrously. Consequently, the organizations need to go continuously monitor the performance of their business process and improve them. To facilitate that, this study has put-forth the idea of augmenting business process models with end-user feedback and proposed a machine learning based approach (AugProMo) to automatically identify correspondences between end-user feedback and elements of process models. In particular, we have generated three valuable resources, process models, feedback corpus and gold standard benchmark correspondences. Furthermore, 2880 experiments are performed to identify the most effective combination of word embeddings, feature vectors, data balancing and machine learning techniques. The study concludes that the proposed approach is effective for augmenting business process models with end-user feedback.
... Some of these studies analyze the labeling style of process elements to identify incorrect labeling styles [13], whereas others develop techniques to automatically generate textual process descriptions [14]. Similarly, work has been done to automatically generate process models from textual process descriptions [15], as well as on checking process compliance of process models against textual descriptions [16] and the automatic detection of inconsistencies between the two descriptions [17]. Furthermore, studies have been conducted on employing NLP based techniques for finding relevant process models from a collection of process model using their textual descriptions [18] [19]. ...
Article
Full-text available
COVID-19 has imposed unprecedented restrictions on the society which has compelled the organizations to work ambidextrously. Consequently, the organizations need to continuously monitor the performance of their business process and improve them. To facilitate that, this study has put-forth the idea of augmenting business process models with end-user feedback and proposed a machine learning based approach (AugProMo) to automatically identify correspondences between end-user feedback and elements of process models. In particular, we have generated three valuable resources, process models, feedback corpus and gold standard benchmark correspondences. Furthermore, 2880 experiments are performed to identify the most effective combination of word embeddings, feature vectors, data balancing and machine learning techniques. The study concludes that the proposed approach is effective for augmenting business process models with end-user feedback.
... Research in conceptual modeling has already begun leveraging AI. These efforts have focused on using AI-based techniques to design conceptual models from narratives and use cases [7], [12], or assessing and predicting model quality [14], [15], among others [5], [6], [8], [16]. We suggest to expand these efforts into a full-fledged integration of AI into conceptual modeling, leading to a proposal for developing conceptual modeling systems. ...
Conference Paper
Full-text available
Although conceptual modeling has been integral to information systems development and use, much of its potential remains underutilized. This is evidenced by the lack of a broad adoption of modeling concepts beyond traditional database design and process modeling applications. In this paper, we propose a fundamentally new perspective on conceptual modeling that integrates artificial intelligence (AI) components with conceptual modeling. This perspective enables us to go beyond passive conceptual modeling representations, such as diagrams, to design conceptual modeling systems that have the capability to learn and evolve.
... Remaining in the field of work of process and text alignment, Han van der Aa et. al. proposed an approach to identify inconsistencies between process models and their textual descriptions [61] and an approach that used behavioral spaces to capture all the possible interpretations of textual process descriptions in a systematic manner for compliance checking [62]. Our work does not deal with compliance checking and does not consider inconsistencies between the models and the descriptions. ...
Article
Full-text available
Context Traceability Links Recovery has been a topic of interest for many years, resulting in techniques that perform traceability based on the linguistic clues of the software artifacts under study. However, BPMN models tend to present an overall lack of linguistic clues when compared to code-based artifacts or code generation models. Hence, TLR becomes a harder task when performed among requirements and BPMN models. Objective This paper proposes a novel approach, called METRA, that leverages the execution traces of BPMN to expand the BPMN models. The expansion of the BPMN models enhances their linguistic clues, bridging the language between BPMN models and other software artifacts, and improving the TLR process between requirements and BPMN models. Methods The proposed approach is evaluated through a real-world industrial case study, comparing its outcomes against two state-of-the-art baselines, TLR and LORE. The paper also evaluates the combination of METRA with LORE against the rest of the approaches, including standalone METRA. The evaluation process generates a report of measurements (precision, recall, f-measure, and MCC), over which a statistical analysis is conducted. Results Results show that approaches based on METRA maintain the excellent precision results obtained by baseline approaches (74.2% for METRA, 78.8% for METRA+LORE), whilst also improving the recall results from the unacceptable values obtained by the baselines to good values (72.4% for METRA, 73.9% for METRA+LORE). Moreover, according to the statistical analysis, the differences in the results obtained by the evaluated approaches are statistically significant. Conclusions This paper opens a novel field of work in TLR by analyzing the improvement of the TLR process through the inclusion of linguistic clues present in execution traces, and discusses ideas for further research that can delve into this promising direction explored by our work.
... In particular, it may be grounded in other sets of behavioural rules, such as those presented in [13], [14], which are then instantiated for the original log to capture the semantics of the underlying process. Moreover, rules may also originate from other sources, such as textual documents [15]. However, deriving the rules from the original log ensures that trace variants in the original log are more likely to be preserved. ...
Preprint
Privacy-preserving process mining enables the analysis of business processes using event logs, while giving guarantees on the protection of sensitive information on process stakeholders. To this end, existing approaches add noise to the results of queries that extract properties of an event log, such as the frequency distribution of trace variants, for analysis.Noise insertion neglects the semantics of the process, though, and may generate traces not present in the original log. This is problematic. It lowers the utility of the published data and makes noise easily identifiable, as some traces will violate well-known semantic constraints.In this paper, we therefore argue for privacy preservation that incorporates a process semantics. For common trace-variant queries, we show how, based on the exponential mechanism, semantic constraints are incorporated to ensure differential privacy of the query result. Experiments demonstrate that our semantics-aware anonymization yields event logs of significantly higher utility than existing approaches.
Chapter
With the widespread popularity of smart devices in people’s daily lives, people hope to communicate with devices through a more humane interactive way. The natural language communication simulation computing system can provide technical support for the interactivity of smart devices. Genetic algorithm (GA) is a novel intelligent algorithm that has been studied hot in recent years. Therefore, this article will start the design research of the natural language communication simulation calculation (NLCSC) system model based on GA. This paper combines the advantages of BP neural network and GA, and proposes an improved adaptive genetic algorithm. This article designs the NLCSC system model in detail, including the architecture of the input layer, convolutional layer, and pooling layer. This article compares with the calculation model of BP neural network by experiment, it is found that the average calculation error of BP algorithm is 1.46%, while the average calculation error of GA-BP calculation model is 0.243%. This data verifies that the GA-based NLCSC system model has a high calculation accuracy.KeywordsNatural languageBP neural networkGenetic algorithmSystem model
Preprint
Full-text available
Automatic Process Discovery aims at developing algorithmic methodologies for the extraction and elicitation of process models as described in data. While Process Discovery from event-log data is a well established area, that has already moved from research to concrete adoption in a mature manner, Process Discovery from text is still a research area at an early stage of development, which rarely scales to real world documents. In this paper we analyze, in a comparative manner, reference state-of-the-art literature, especially for what concerns the techniques used, the process elements extracted and the evaluations performed. As a result of the analysis we discuss important limitations that hamper the exploitation of recent Natural Language Processing techniques in this field and we discuss fundamental limitations and challenges for the future concerning the datasets, the techniques, the experimental evaluations, and the pipelines currently adopted and to be developed in the future.
Article
Purpose - This study aims to draw the attention of business process management (BPM) research and practice to the textual data generated in the processes and the potential of meaningful insights extraction. The authors apply standard natural language processing (NLP) approaches to gain valuable knowledge in the form of the business process (BP) complexity concept suggested in the study. It is built on the objective, subjective and meta-knowledge extracted from the BP textual data and encompasses semantics, syntax and stylistics. As a result, the authors aim to create awareness about the cognitive, attention and reading efforts forming the textual data-based BP complexity. The concept serves as a basis for the development of various decision-support solutions for BP workers. Design/methodology/approach - The starting point is an investigation of the complexity concept in the BPM literature to develop an understanding of the related complexity research and to put the textual data-based BP complexity in its context. Afterward, utilizing the linguistic foundations and the theory of situation awareness (SA), the concept is empirically developed and evaluated in a real-world application case using qualitative interview-based and quantitative data-based methods. Findings - In the practical, real-world application, the authors confirmed that BP textual data could be used to predict BP complexity from the semantic, syntactic and stylistic viewpoints. The authors were able to prove the value of this knowledge about the BP complexity formed based on the (1) professional contextual experience of the BP worker enriched by the awareness of cognitive efforts required for BP execution (objective knowledge), (2) business emotions enriched by attention efforts (subjective knowledge) and (3) quality of the text, i.e. professionalism, expertise and stress level of the text author, enriched by reading efforts (meta-knowledge).
In particular, the BP complexity concept has been applied to an industrial example of Information Technology Infrastructure Library (ITIL) change management (CHM) Information Technology (IT) ticket processing. The authors used IT ticket texts from two samples of 28,157 and 4,625 tickets as the basis for the analysis. The authors evaluated the concept with the help of manually labeled tickets and a rule-based approach using historical ticket execution data. Having a recommendation character, the results proved useful in creating awareness regarding cognitive, attention and reading efforts for ITIL CHM BP workers coordinating the IT ticket processing. Originality/value - While aiming to draw attention to those valuable insights inherent in BP textual data, the authors propose an unconventional approach to BP complexity definition through the lens of textual data. Hereby, the authors address the challenges specified by BPM researchers, i.e. focus on semantics in the development of vocabularies and organization- and sector-specific adaptation of standard NLP techniques.
Conference Paper
Textual process descriptions are widely used in organizations since they can be created and understood by virtually everyone. The inherent ambiguity of natural language, however, impedes the automated analysis of textual process descriptions. While human readers can use their context knowledge to correctly understand statements with multiple possible interpretations, automated analysis techniques currently have to make assumptions about the correct meaning. As a result, automated analysis techniques are prone to draw incorrect conclusions about the correct execution of a process. To overcome this issue, we introduce the concept of a behavioral space as a means to deal with behavioral ambiguity in textual process descriptions. A behavioral space captures all possible interpretations of a textual process description in a systematic manner. Thus, it avoids the problem of focusing on a single interpretation. We use a compliance checking scenario and a quantitative evaluation with a set of 47 textual process descriptions to demonstrate the usefulness of a behavioral space for reasoning about a process described by a text. Our evaluation demonstrates that a behavioral space strikes a balance between ignoring ambiguous statements and imposing fixed interpretations on them.
Conference Paper
Documenting business processes using process models is common practice in many organizations. However, not all process information is best captured in process models. Hence, many organizations complement these models with textual descriptions that specify additional details. The problem with this supplementary use of textual descriptions is that existing techniques for automatically searching process repositories are limited to process models. They are not capable of taking the information from textual descriptions into account and, therefore, provide incomplete search results. In this paper, we address this problem and propose a technique that is capable of searching textual as well as model-based process descriptions. It automatically extracts process information from both description types and stores it in a unified data format. An evaluation with a large Austrian bank demonstrates that the additional consideration of textual descriptions allows us to identify more relevant processes from a repository.
Conference Paper
Workflow projects are time-consuming processes. They include the knowledge extraction and the creation of process models. The necessary information is often available as textual resources. Therefore, process model mining from natural language text has been a research area of growing interest. This paper gives an overview of the current state-of-the-art in text-to-model mining. For this purpose, different approaches focusing on business process models are presented, analyzed and compared against each other on a theoretical and technical level. The resulting overview covers both advantages and disadvantages of current techniques. This should establish a sturdy basis on which further research can be conducted.
Conference Paper
Business process modeling has become an integral part of many organizations for documenting and redesigning complex organizational operations. However, the increasing size of process model repositories calls for automated quality assurance techniques. While many aspects such as formal and structural problems are well understood, there is only a limited understanding of semantic issues caused by natural language. One particularly severe problem arises when modelers employ natural language for expressing control-flow constructs such as gateways or loops. This may not only negatively affect the understandability of process models, but also the performance of analysis tools, which typically assume that process model elements do not encode control-flow-related information in natural language. In this paper, we aim at increasing the current understanding of mixing natural and modeling language and therefore exploratively investigate three process model collections from practice. As a result, we identify a set of nine anti-patterns for mixing natural and modeling language.
Conference Paper
An organization's knowledge on its business processes represents valuable corporate knowledge because it can be used to enhance the performance of these processes. In many organizations, documentation of process knowledge is scattered around various process information sources. Such information fragmentation poses considerable problems if, for example, stakeholders wish to develop a comprehensive understanding of their operations. The existence of efficient techniques to combine and integrate process information from different sources can therefore provide much value to an organization. In this work, we identify the general challenges that must be overcome to develop such techniques. This paper illustrates how these challenges should be and, to some extent, are being met in research. Based on these insights, we present three main frontiers that must be further expanded to successfully counter the fragmentation of process information in organizations.
Article
We represent a set of possible worlds using an incomplete information database. The representation techniques that we study form a hierarchy, which generalizes relations of constants. This hierarchy ranges from the very simple Codd-table (i.e., a relation of constants and distinct variables called nulls, which stand for values present but unknown) to much more complex mechanisms involving views on conditioned-tables (i.e., queries on Codd-tables together with conditions). The views we consider are the queries that have polynomial data-complexity on complete information databases. Our conditions are conjunctions of equalities and inequalities. (1) We provide matching upper and lower bounds on the data-complexity of testing containment, membership, and uniqueness for sets of possible worlds, and we fully classify these problems with respect to our representation hierarchy. The most surprising result in this classification is that it is complete in Π₂ᵖ to decide whether a set of possible worlds represented by a Codd-table is a subset of a set of possible worlds represented by a Codd-table with one conjunction of inequalities. (2) We investigate the data-complexity of querying incomplete information databases. We examine both asking for certain facts and for possible facts. Our approach is algebraic but our bounds also apply to logical databases. We show that asking for a certain fact is coNP-complete, even for a fixed first order query on a Codd-table. We thus strengthen a lower bound of [16], who showed that this holds for a Codd-table with a conjunction of inequalities. For each fixed positive existential query we present a polynomial algorithm solving the bounded possible fact problem of this query on conditioned-tables. We show that our approach is, in a sense, the best possible, by deriving two NP-completeness lower bounds for the bounded possible fact problem when the fixed query contains either negation or recursion.
Article
An incomplete information relational database combines two types of information about the real world modeled by the database: (a) the information represented by tables with null values (“value not known”) allowed as entries, and (b) the data dependencies, which are known to be satisfied in the real world. We view the well-known chase procedure as a process which transforms type (b) information into an “equivalent” type (a) form. Assuming that the data dependencies are arbitrary implicational dependencies, we show that this transformation is not quite equivalent, but the corruption of information introduced cannot be discovered if the query language uses the operations of projection, positive selection (i.e. no negation in selection condition), union, natural join and renaming of attributes. This result can also be interpreted as a new important property of the chase. The influence of so-called view dependencies on the table with null values is also examined.
Article
Many organizations maintain textual process descriptions alongside graphical process models. The purpose is to make process information accessible to various stakeholders, including those who are not familiar with reading and interpreting the complex execution logic of process models. Despite this merit, there is a clear risk that model and text become misaligned when changes are not applied to both descriptions consistently. For organizations with hundreds of different processes, the effort required to identify and clear up such conflicts is considerable. To support organizations in keeping their process descriptions consistent, we present an approach to automatically identify inconsistencies between a process model and a corresponding textual description. Our approach detects cases where the two process representations describe activities in different orders and detects process model activities not contained in the textual description. A quantitative evaluation with 53 real-life model-text pairs demonstrates that our approach accurately identifies inconsistencies between model and text.
Article
Behavioral profiles have been proposed as a behavioral abstraction of dynamic systems, specifically in the context of business process modeling. A behavioral profile can be seen as a complete graph over a set of task labels, where each edge is annotated with one relation from a given set of binary behavioral relations. Since their introduction, behavioral profiles were argued to provide a convenient way for comparing pairs of process models with respect to their behavior or computing behavioral similarity between process models. Still, as of today, there is little understanding of the expressive power of behavioral profiles. Via counter-examples, several authors have shown that behavioral profiles over various sets of behavioral relations cannot distinguish certain systems up to trace equivalence, even for restricted classes of systems represented as safe workflow nets. This paper studies the expressive power of behavioral profiles from two angles. Firstly, the paper investigates the expressive power of behavioral profiles and systems captured as acyclic workflow nets. It is shown that for unlabeled acyclic workflow net systems, behavioral profiles over a simple set of behavioral relations are expressive up to configuration equivalence. When systems are labeled, this result does not hold for any of several previously proposed sets of behavioral relations. Secondly, the paper compares the expressive power of behavioral profiles and regular languages. It is shown that for any set of behavioral relations, behavioral profiles are strictly less expressive than regular languages, entailing that behavioral profiles cannot be used to decide trace equivalence of finite automata and thus Petri nets.
Conference Paper
Uncertain data management has become crucial to scientific applications. Recently, array databases have gained popularity for scientific data processing due to performance benefits. In this paper, we address uncertain data management in array databases, which may involve both value uncertainty within individual tuples and position uncertainty regarding where a tuple should belong in an array given uncertain dimension attributes. Our work defines the formal semantics of array operations under both value and position uncertainty. To address the new challenge raised by position uncertainty, we propose a suite of storage and evaluation strategies for array operations, with a focus on a new scheme that bounds the overhead of querying by strategically treating tuples with large variances via replication in storage. Results from real datasets show that for common workloads, our best-performing techniques outperform alternative methods based on state-of-the-art indexes by 1.7x to 4.3x for the Subarray operation and 1 to 2 orders of magnitude for Structure-Join, at only a small storage cost.