1041-4347 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation
information: DOI 10.1109/TKDE.2015.2421331, IEEE Transactions on Knowledge and Data Engineering
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 5, MAY 2015 1
Safe Distribution and Parallel Execution of
Data-centric Workflows over the
Publish/Subscribe Abstraction
Mohammad Sadoghi, Martin Jergler, Hans-Arno Jacobsen, Richard Hull, Roman Vaculín
Abstract
—In this work, we develop an approach for the safe distribution and parallel execution of data-centric workflows over the
publish/subscribe abstraction. In essence, we design a unique representation of data-centric workflows, specifically designed to exploit
the loosely coupled and distributed nature of publish/subscribe systems. Furthermore, we argue for the practicality and expressiveness
of our approach by mapping a standard and industry-strength data-centric workflow model, namely, IBM Business Artifacts with
Guard-Stage-Milestone (GSM), into the publish/subscribe abstraction. In short, the contributions of this work are three-fold: (1) mapping
of data-centric workflows into publish/subscribe to achieve distributed and parallel execution; (2) detailed theoretical analysis of the
mapping; and (3) formulation of the complexity of the optimal workflow distribution over the publish/subscribe abstraction as an NP-hard
problem.
Index Terms—Data-centric workflows, publish/subscribe, workflow distribution, business artifacts, case-management
1 INTRODUCTION
Typically, workflows support globally distributed business pro-
cesses (BPs) involving data and participants from disparate
geographical locations and organizations. At the same time, the
vast majority of workflow and business process management
(BPM) systems are either centralized in nature, relying on
centralized processing of associated data, or support only rather
restricted forms of distributed execution without considering data
appropriately [5], [13], [7]. In such environments, for instance,
in global corporations, it is not uncommon that large amounts
of data need to be regularly moved across the globe resulting
in decreased business efficiency. Furthermore, compliance with
legal regulations, such as those on the privacy of business-relevant data, or with other
constraints that are imposed by individual organizations or even
governments is hard to address. For example, the eighth Data
Protection Principle of the Data Protection Act (DPA) in the
United Kingdom requires that personal data (e.g., customer infor-
mation) must not be transferred outside the European Economic
Area unless the country or territory to which the data are to be
transferred provides an adequate level of protection for personal
data [14]. In summary, a major shortcoming of current workflow
systems is their neglect of distributed execution that adheres
to the actual geographical needs (i.e., locality of data), the
workflow scale (e.g., the number of tasks and/or instances), and
compliance with regulations or constraints [17], [12].
• Throughout 2010-2015, this work has been supported by an IBM Faculty Award, an NSERC Discovery Grant, and an Alexander von Humboldt Award. From 2010-2012, M. Sadoghi was with the Middleware Systems Research Group at the University of Toronto.
• M. Sadoghi, R. Hull, and R. Vaculín are with IBM T.J. Watson Research Center, Yorktown Heights, USA.
• M. Jergler is with Technische Universität München, Germany.
In recent years, there has been a growing interest in frameworks
for specifying and deploying workflows that combine
both data and process as first-class citizens [2], [24], [26], [30],
[12], [25], [18]. Data-centric workflows have a potential to
address the problem described above. Process and associated
data are tightly coupled in a sense that both are expressed in
a single model without giving explicit favor to one of them.
This simplifies workflow distribution according to geographical,
organizational, and legal constraints as only a single model
needs to be distributed. In this paper, we consider one such
data-centric BPM approach called Business Artifacts (BA) [6],
[9], [24] and a recent meta-model for modeling business artifacts
called Guard-Stage-Milestone (GSM) [15], [12], [16]. We focus
on how business processes specified in GSM can be distributed
and executed on a massively parallel infrastructure employing
the publish/subscribe (pub/sub) abstraction. Due to recent trends
towards ad-hoc and adaptable workflows (e.g., the recent Case
Management standard [20], [21], [25], which was significantly
influenced by GSM), we believe that the loosely coupled nature
of pub/sub systems provides a convenient substrate for workflow
execution. Adaptations like the addition or removal of individual
tasks, users, and constraints can be accomplished during runtime
by (un-)subscribing to events that drive the execution.
The ultimate goal of distributing a data-centric workflow is
to achieve an effective grouping of workflow components such
as flow activities and associated data fragments, respecting a set
of constraints such as the infrastructure topology, geographical
constraints, or pricing factors, while minimizing communication
or data transport costs. This work provides the foundation for
developing a mapping from data-centric workflow primitives
to publish/subscribe primitives, while maintaining an equivalent
operational semantics. This foundation can be applied to
identify an optimal workflow distribution that conforms to
given constraints. Moreover, the pub/sub nature decouples
individual workflow components and thus facilitates their ability
for migration to enable effective scalability of the system.
In the artifact-centric paradigm, BPs are modeled as
interactions of key business-relevant, conceptual entities called
Business Artifacts (or “artifacts,” for short). Artifacts are
modeled using an information model, which includes attributes for
storing all business-relevant information about the artifact, and
a lifecycle model, which represents the possible ways the artifact
might evolve. The artifact approach typically yields a high-level
factoring of BPs into a handful of interacting artifact types.
The recently introduced data-centric workflow model known
as the Business Artifacts with Guard-Stage-Milestone Lifecycles
meta-model [15], [16], [12] provides a declarative approach for
specifying artifact lifecycles. GSM supports parallelism and
modularity, with an operational semantics based on a variant
of Event-Condition-Action (ECA) rules. There are four key
elements in the GSM meta-model: (a) the information model; (b)
milestones, which correspond to business-relevant operational
objectives that are achieved (and possibly invalidated) based on
triggering events and/or conditions over the information models
of BA instances; (c) stages, which correspond to clusters of
activity intended to achieve milestones; and (d) guards, which
control when stages are opened or closed, respectively. Multiple
stages of an artifact instance may be active at the same time,
which enables parallelism. Hierarchical structuring of the stages
supports a rich form of modularity.
The operational semantics of GSM is characterized by how
a single “incoming external event” is incorporated into the
current “snapshot” of the information model of a GSM-based
system [12]. This semantics extends the well-known Event-
Condition-Action (ECA) rule paradigm. It is centered around
business steps (or B-steps, for short) that focus on the full impact
of incorporating incoming external events. In particular, the focus
is on what milestones (i.e., goals or objectives) are achieved
or invalidated and what stages (i.e., tasks) are opened and
closed, as a result of the incoming event. Changes in milestone
and stage status are treated as internal “status events” and can
trigger further status changes in the B-step. Intuitively, a B-step
corresponds to the smallest unit of business-relevant change
that can occur to a data-centric workflow. In this paper, we rely
on the incremental operational semantics introduced in [12],
which resembles the incremental application of ECA-like rules
providing a natural and direct approach for its implementation.
Starting with an information model and a set of data-centric
workflow primitives (based on a set of acyclic ECA-style rules)
that rely on an incremental operational semantics, we develop
a complete mapping of data-centric workflows into the pub/sub
abstraction. We enable this workflow transformation by redefin-
ing and formalizing key pub/sub constructs such as subscriptions
and publications together with their matching conditions, as well
as consumption and notification policies. As a result, once a data-
centric workflow is transformed into the pub/sub abstraction, it
seamlessly inherits the distributed and loosely-coupled benefits
of pub/sub. Altogether, we make the following contributions:
1) Formalization of data-centric workflows and a suitable pub/sub abstraction (Sec. 3-4).
2) Mapping of data-centric workflows into the pub/sub abstraction to achieve distributed and parallel execution (Sec. 5-6).
3) Detailed theoretical analysis of the mapping (Sec. 7).
4) Complexity analysis for optimal workflow distribution over pub/sub (Sec. 8).
2 RELATED WORK
Our work is based on the data-centric business artifacts
paradigm [24], [6], [9], with the GSM meta-model being
a natural evolution from the earlier practical artifact
meta-models [10], [28], but using a declarative basis and supporting
modularity and parallelism within artifact instances. The existing
work on GSM operational semantics already addresses some form
of parallelism [29] but does not consider the distributed execution
of business artifacts [12]. Recently, different data-centric
approaches have been proposed, including the FlexConnect
meta-model [26], in which processes are organized as interacting
business objects, the Case Management paradigm [30], [20],
[25], [21], and the AXML Artifact Model [2], [?], which is
based on a declarative form of artifacts using Active XML
as a basis [1]. Another object-aware framework that aims at
unifying process and data is PHILharmonicFlows [18]. Here,
workflows are modeled as micro processes that represent the
data and behavior of individual objects and macro processes that
represent the interactions among such objects.
There exists a body of work focused on various aspects of
distributed workflow execution. For instance, [5] has a similar
goal as our work but is applied to an inherently activity-centric
workflow model, in which data is only considered as input and
output of flow activities (dataflow) and no data-centric execution
is supported. This is also true for [4], in which scheduling of
workflows in self-organizing wireless networks is addressed to
respect resource allocation constraints and dynamic topology
changes, and for [27], [19], which use pub/sub techniques to
implement some of the BPM execution aspects.
Distributed workflow execution was studied in the 1990s
to also address scalability, fault resilience, and enterprise-wide
workflow management [3], [32], [22]. A detailed design of a
distributed workflow management system was proposed in [3].
The work bears similarity with our approach in that a business
process is fully distributed among a set of nodes. However, the
distribution architectures differ fundamentally. In our approach,
a content-based message routing substrate naturally enables
decoupling, dynamic reconfiguration, system monitoring, and
run-time control. This is not addressed in the earlier work.
A behavior-preserving transformation of a centralized activity
chart, representing a workflow, into an equivalent partitioned one
is described in [22] and realized in the MENTOR system [32].
MENTOR is inspired by compiler-based techniques, including
control flow and data flow analysis, in order to parallelize the
business process [23]. However, these approaches are
complementary to our work since we operate with the original business
process model without analyzing the process. An advantage
of executing an unmodified process is that dynamic changes to
the executing business process instances are possible, as their
structure remains unchanged from the original specification.
Finally, an approach to integrate existing business processes as
part of a larger workflow is presented in [8]. The authors define
event points in business processes where events can be received
or sent. Events are filtered, correlated, and dispatched using a
centralized pub/sub model. The interaction of existing business
processes is synchronized by event communication. This is
similar to our work in terms of allowing business processes to
publish and subscribe. In our approach, activities in a business
process are decoupled, and the communication between them
is performed in a content-based pub/sub broker network.
3 DATA-CENTRIC WORKFLOWS
We begin by describing the concepts behind GSM for modeling
and executing data-centric workflows. Then we give a concrete
example of a BP represented in GSM, which will serve as a
running example throughout the remainder.
3.1 Overview of GSM Schema
A GSM schema describing a workflow is defined as a set of
artifact types with lifecycle, denoted by $\mathcal{A}$, where each $A \in \mathcal{A}$ is
defined by a six-tuple:
$$A = \langle x, Att, Typ, Stg, Mst, Lcyc \rangle.$$
In essence, a GSM workflow schema (or model) can succinctly
be described as the grouping of business processes into an
artifact type $A$ that corresponds to an actual business entity
within an organization. Each artifact is comprised of a set of
goal-oriented work items with lifecycles, in which a work item is
modeled as stages ($Stg$) and goals are referred to as milestones
($Mst$). In addition, each artifact may have many instances ($x$),
i.e., workflow instances or enactments over a globally shared
information model in order to store relevant data, e.g., a set of
data and status attributes ($Att$) and their associated data types
($Typ$). Moreover, the lifecycle schema, i.e., the blueprint for the
artifact’s evolution through its various stages, is given by:
$$Lcyc = \langle Substage, Tasks, Owns, Guards, Ach, Inv \rangle.$$
The lifecycle of each stage captures the hierarchy of its substages
($Substage$), encapsulation of a task within each (sub)stage
($Tasks$), information about stage nesting ($Owns$), conditions
for enabling (sub)stages ($Guards$), conditions for determining
the successful completion of (sub)stages ($Ach$), and conditions
for disabling (sub)stages ($Inv$). Roughly speaking, the GSM
schema defines a workflow through the lens of a stage, guards
for entering a stage, and milestones for leaving a stage.
A key primitive GSM construct, in addition to guard, stage,
and milestone, is the sentry, which in fact is the building block
of guards and milestones. A sentry is a Boolean formula of
type $\chi(x)$ that consists of two parts: the (triggering) event $\xi(x)$,
which is a Boolean formula to test the type of an incoming
external event, and a condition $\varphi(x)$, which is a Boolean formula
defined over a subset of status attributes. A sentry may have
three different forms: (i) on $\xi(x)$ if $\varphi(x)$; (ii) on $\xi(x)$;
and (iii) if $\varphi(x)$.
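The three sentry forms can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `Sentry`, its field names, and the dictionary-based snapshot are hypothetical and not part of GSM; the two example sentries mirror guards g1 and g3 of the running example.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Hypothetical snapshot: attribute name -> current value.
Snapshot = Mapping[str, object]


@dataclass(frozen=True)
class Sentry:
    """Illustrative sentry: 'on event if condition', 'on event', or 'if condition'."""
    event_type: Optional[str] = None                        # the xi(x) part
    condition: Optional[Callable[[Snapshot], bool]] = None  # the phi(x) part

    def holds(self, snapshot: Snapshot) -> bool:
        # A part that is absent is treated as trivially satisfied (assumption).
        event_ok = (self.event_type is None
                    or snapshot.get("latestIncEvent") == self.event_type)
        cond_ok = self.condition is None or self.condition(snapshot)
        return event_ok and cond_ok


# Form (ii): event only.
g1 = Sentry(event_type="R:NewOrder")
# Form (iii): condition only, over status attributes.
g3 = Sentry(condition=lambda s: bool(s["RA:ap"] and s["ECR:ev"] and not s["ED:cp"]))
```

Form (i) is obtained by setting both fields, in which case `holds` requires the event test and the condition to be satisfied together.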
With respect to GSM execution, we focus on the incremental
formulation of the GSM operational semantics: a variation
of incremental firing of Event-Condition-Action (ECA) rules,
known as Prerequisite-Antecedent-Consequent (PAC) rules. PAC
rules differ from traditional ECA rules in that they also
incorporate a temporal aspect, i.e., the prerequisite, allowing
for conditions on prior system states. The set of PAC rules
can be derived in polynomial time from a GSM schema [12].
More importantly, the order of PAC rule firing is defined by the
generalized notion of the Polarized Dependency Graph (PDG).
The PDG imposes a topological sort order on PAC rule firing,
essentially a PAC rule stratification, in which no cyclic relation
among PAC rules is allowed, which requires the PDG to
be acyclic. The PDG-imposed order on rule firing guarantees the
uniqueness and the termination properties in the context of defining
the smallest logical unit of work, i.e., a B-step, as the
well-formedness of a finite set of PAC rules within the B-step [12].
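The role of acyclicity can be illustrated with a small Python sketch that derives one admissible firing order from a dependency graph and rejects cyclic graphs. The graph fragment, its edges, and the function name `firing_order` are assumptions made for this example; they are not the PDG construction of [12].

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical PDG fragment: node -> set of nodes that must be processed first.
# The edges are illustrative only and are not copied from the paper's PDG.
pdg = {
    "+g8": {"+ED:cp"},
    "+PE": {"+g8"},
    "-PE:pp": {"+PE", "-ED:cp"},
}


def firing_order(graph):
    """Return one valid topological order, or raise ValueError if the graph is cyclic."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("PDG must be acyclic") from exc


order = firing_order(pdg)
```

Any order returned this way places a node after all nodes it depends on, which is the property the stratified rule firing relies on.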
The incremental formulation of GSM (in turn, the execution
of PAC rules in the prescribed order of the PDG) is driven and
initiated upon receiving an external event from the environment.
The set of all relevant PAC rules is executed in response
to the external event; the firing of PAC rules is sequenced
to form an atomic step. The semantics of such a B-step with
respect to the overall GSM system state snapshot (i.e., the
instantiation of the information model) is summarized using a
5-tuple $\langle \Sigma, e, t, \Sigma', Gen \rangle$, where $\Sigma$ is the current system snapshot
of the GSM instance prior to consuming the external event $e$,
$\Sigma'$ is the new snapshot of the system after firing all relevant PAC
rules that are triggered directly or indirectly by the external event
$e$, and $Gen$ is a set of generated immutable events as a result of
1-way and 2-way service calls that may be encapsulated in a task.
DEFINITION 1. An immutable event is a static instantiation of
an event schema such that all its attribute values are predefined
and not changed at runtime.
A task itself is encapsulated in a stage. Thus, the B-step is
formalized with respect to the sequence of PAC rule firings such
that $\Sigma = \Sigma_0, \Sigma_1, \Sigma_2, \cdots, \Sigma_n = \Sigma'$ (where $\Sigma_0 \neq \Sigma_1$). Thus, after
applying the $i$-th PAC rule, according to the order imposed by
the PDG, the state advances from $\Sigma_i$ to $\Sigma_{i+1}$, which is also
referred to as a micro-B-step in GSM.
The key properties surrounding B-steps are that a B-step
$\langle \Sigma, e, t, \Sigma', Gen \rangle$ always terminates and ends in a unique state
$\Sigma'$, where $\Sigma \neq \Sigma'$. We refer to these as uniqueness properties
of B-steps [12]. They are achieved in part by requiring that
each attribute in $Att$ of the GSM schema changes at most once as a result
of PAC rules firing within the context of a single B-step (i.e.,
toggle-once property), which implies that a change cannot be
undone, and in part by executing all relevant PAC rules (cf. footnote 1) whose
consequents are reachable in the PDG and in an order
imposed by the PDG, namely, visiting every reachable node in
the PDG using a strata-based, breadth-first graph traversal.
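A B-step with the toggle-once restriction can be sketched as follows. This is a simplified illustration, not the GSM engine: the `PacRule` class, the `b_step` function, and the three example rules are hypothetical, the rules are assumed to be already sorted in PDG order, and the prerequisite is assumed to be checked against the snapshot before the step while the antecedent sees the state and toggles accumulated so far.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]
Toggles = Dict[str, bool]  # attribute -> value it toggled to within this B-step


@dataclass(frozen=True)
class PacRule:
    """Hypothetical PAC rule over Boolean status attributes."""
    prerequisite: Callable[[State], bool]         # evaluated on the pre-step snapshot
    antecedent: Callable[[State, Toggles], bool]  # evaluated on the in-progress state
    attribute: str                                # attribute set by the consequent
    value: bool                                   # value assigned by the consequent


def b_step(snapshot: State, event: str, rules: List[PacRule]) -> State:
    """Apply the rules in the given order; each attribute toggles at most once."""
    before = dict(snapshot)
    current: State = {**snapshot, "latestIncEvent": event}
    toggled: Toggles = {}
    for rule in rules:
        if rule.attribute in toggled:
            continue  # toggle-once: a change made in this B-step is never undone
        if not (rule.prerequisite(before) and rule.antecedent(current, toggled)):
            continue
        if current.get(rule.attribute) != rule.value:
            current[rule.attribute] = rule.value  # one micro-B-step
            toggled[rule.attribute] = rule.value
    return current


# Illustrative cascade loosely modeled on the running example (simplified).
rules = [
    PacRule(lambda s: bool(s["RA:ap"]),
            lambda s, t: s["latestIncEvent"] == "R:CustomerChange", "RA:ap", False),
    PacRule(lambda s: bool(s["ED:cp"]), lambda s, t: t.get("RA:ap") is False, "ED:cp", False),
    PacRule(lambda s: bool(s["PE:pp"]), lambda s, t: t.get("ED:cp") is False, "PE:pp", False),
]
start = {"RA:ap": True, "ED:cp": True, "PE:pp": True}
after = b_step(start, "R:CustomerChange", rules)
```

In this sketch a single external event triggers a chain of internal status changes, and the returned dictionary plays the role of the snapshot reached at the end of the B-step.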
The GSM schema consists of six distinct types of PAC rules
that are also described in Appendix B: PAC-1 for achieving
guards; PAC-2 for achieving milestones; PAC-3 for invalidating
a milestone once its stage is opened; PAC-4 for invalidating
a milestone once its invalidating sentry is achieved; PAC-5 for
closing a stage when one of its milestones is achieved; and
PAC-6 for closing a substage when its parent stage is closed.
Footnote 1. If the PAC rule’s prerequisite and antecedent are satisfied and its consequent
is applied, the current state of a GSM instance changes.
[Fig. 1. “Design-to-order” business process. The lifecycle model shows the stages Requirements Approval (RA), Engineering Design (ED), and Legal Reviewing (LR) with its sub-stages Evaluating Country Restrictions (ECR) and Preparing Export Documents (PE), the guards g1 to g9, and the milestones Requirements Approved (RA:ap), Design Completed (ED:cp), Design Suspended (ED:sp), Legal Review Completed (LR:cp), Evaluated (ECR:ev), Export Docs Prepared (PE:pp), and Preparation Suspended (PE:sp). The information model shows the data attributes (design, customer, requirements, latestIncEvent, ...) and the status attributes for stages (open/closed) and milestones.]
The GSM execution model assumes a global, external event
queue, and the current GSM operational semantics is serialized
w.r.t. the external event queue. In this work, we also rely on a
global event queue to orchestrate concurrent B-step executions
such that the event queue behaves as a pseudo-global clock.
However, we interleave and pipeline the processing of multiple
B-steps over a loosely coupled, distributed pub/sub infrastructure,
in which each B-step is associated with a different external
event. We achieve this distributed and parallel execution while
guaranteeing an identical behavior as if B-steps were processed
centrally and in the sequential order of the global event queue.
For enhanced comprehensibility, we focus on the core aspects
of GSM workflows (i.e., the data and the lifecycle) to describe
our mapping. Therefore, we formalize a GSM workflow schema,
$\Gamma$, as follows:
$$\Gamma = \langle I, R \rangle,$$
where $I$ is the workflow information model that consists of a
set of ordered $\langle attr, datatype \rangle$-pairs and distinguishes between
data attributes ($I_d$), i.e., application data, and status attributes
($I_s$) (describing the state of the workflow within its lifecycle).
The number of status attributes is finite and bounded by the
schema. $R$ is a set of acyclic PAC rules representing the lifecycle.
The operational semantics of $\Gamma$ follows the general notion of
incremental operational semantics [12].
3.2 Example of Data-centric Workflow in GSM
The example depicted in Figure 1 (cf. [12]) represents a
data-centric GSM model for a product design process on behalf
of an external customer. It is structured into various stages (i.e.,
rounded rectangles) describing the individual task definitions.
Guards are denoted by diamonds and milestones by circles.
Upon a customer order, i.e., an external event of type
R:NewOrder, a new workflow is instantiated and the corresponding
product requirements are approved. Once the requirements
have been approved (i.e., an external event of type
T:RequirementsApproval, which indicates that the clerk
in charge has finished the task, has been received), the actual
engineering stage is opened. In case the customer decides to change
the requirements afterwards (i.e., R:CustomerChange), the
design stage is suspended and the requirements are approved
again. The legal reviewing of the order is encapsulated in a separate
stage comprising two sub-stages. While country restrictions
can be evaluated in parallel with the approval of requirements, the
preparation of the export documents requires a completed design.
Furthermore, preparation is suspended and country restrictions
are re-evaluated if requirements change. The whole process is
accomplished once the export documents are prepared.
[Fig. 2. Polarized dependency graph (PDG) for BP. The graph contains a node with positive polarity (+) for each stage, guard, and milestone of the running example, and a node with negative polarity (-) for each stage and milestone.]
GUARD  SENTRY
g1     latestIncEvent = “R:NewOrder”
g2     latestIncEvent = “R:CustomerChange”
g3     RA:ap ∧ ECR:ev ∧ ¬ED:cp
g4     latestIncEvent = “R:ResumeDesign”
g5     latestIncEvent = “R:NewOrder”
g6     latestIncEvent = “R:RedoExportDocs”
g7     ¬ECR:ev
g8     ⊕ED:cp
g9     latestIncEvent = “R:RedoExportDocs”
TABLE 1
Sentries associated with guards.
MILESTONE  TYPE  SENTRY
RA:ap      Ach   latestIncEvent = “T:RequirementsApproval”
ED:cp      Ach   latestIncEvent = “T:EngineeringDesign”
           Inv   ⊖RA:ap
ED:sp      Ach   ⊖RA:ap
ECR:ev     Ach   latestIncEvent = “T:EvalCountryRestrictions”
PE:pp      Ach   latestIncEvent = “T:PreparingExportDocs”
           Inv   ⊖ED:cp
PE:sp      Ach   ⊖ED:cp
LR:cp      Ach   ⊕PE:pp
           Inv   ⊖PE:pp
TABLE 2
Sentries associated with milestones.
The above behavior is implicitly described by the sentries associated
with guards (cf. Table 1) and milestones (cf. Table 2) of
the GSM schema. The triggering events in the sentry definitions
can be either external or internal events. Conceptually, external
events are further divided into request-events invoking a task
(indicated with an “R” in the example) and task-termination-events
starting with a “T”. Internal events represent status attribute
updates in the information model, whereby ⊕ indicates that an
attribute toggled to true and ⊖ indicates that it toggled to false.
For example, guard g1 is achieved if the latest
incoming external event was of type R:NewOrder,
which corresponds to a new customer request. Similarly,
milestone ExportDocsPrepared is invalidated if an
internal status-update event notified the invalidation of milestone
DesignCompleted (i.e., ED:cp, for short).
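The notion of internal status events can be made concrete with a small Python sketch that compares two snapshots of the status attributes and reports which attributes toggled. The function name `status_events` and the `+`/`-` string prefixes (standing for a toggle to true and a toggle to false, respectively) are assumptions of this example, as is the choice to treat an attribute missing from the earlier snapshot as false.

```python
from typing import Dict, List


def status_events(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Return '+attr' or '-attr' for every status attribute whose value changed."""
    events = []
    for attr in sorted(after):
        old = before.get(attr, False)  # assumption: unseen attributes start as false
        new = after[attr]
        if old != new:
            events.append(("+" if new else "-") + attr)
    return events


# Invalidation of ED:cp followed by invalidation of PE:pp, as in the example above.
events = status_events(
    {"ED:cp": True, "PE:pp": True, "LR": True},
    {"ED:cp": False, "PE:pp": False, "LR": True},
)
```

For the two snapshots shown, the result contains one negative event for ED:cp and one for PE:pp, while the unchanged stage LR produces no event.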
The PAC rules for this workflow are derived from the
GSM model according to the six rule templates described in
Appendix B (cf. also [12]). Altogether, this comprises a set of 41
rules that are depicted in Appendix D. An excerpt of three PAC
rules, which will be relevant for a subsequent running example of
the mapping, is depicted in Table 3. The order of PAC rule firing
for the GSM operational semantics is described by the PDG
depicted in Figure 2. It has been established according to the
construction algorithm described in Appendix C (cf. also [12]).
In the rest of the paper we exploit this example to illustrate
our GSM-to-pub/sub mapping. In particular, we show the
construction of two distinct subscriptions capturing the semantics
of (1) the invalidation of milestone PE:pp (based on the rules
depicted in Table 3) and (2) maintaining a consistent view on
status attribute ExportDocsPrepared (i.e., PE:pp).
NO  PREREQUISITE  ANTECEDENT                                 CONSEQUENT
1   PE:pp         ⊖ED:cp                                     ⊖PE:pp
2   PE:pp         ⊕ED:cp ∧ LR                                ⊖PE:pp
3   PE:pp         latestIncEvent = “R:RedoExportDocs” ∧ LR   ⊖PE:pp
TABLE 3
Excerpt of PAC rules for “Design-to-order” BP.
4 PUBLISH/SUBSCRIBE SCHEMA
In this section, we present the formalization of the
pub/sub abstraction that is needed to subsequently prove the
correctness of our mapping from data-centric workflows to
pub/sub. At the core of the pub/sub abstraction lies a set of
publications ($\mathcal{P}$) and subscriptions ($\mathcal{S}$). Each publication, $P \in \mathcal{P}$,
is defined as follows:
$$P = \langle E \rangle,$$
where $E$ defines the publication’s event schema that consists of
a set of ordered $\langle attr, datatype \rangle$-pairs. Events are instances of
this schema and defined as sets of ordered $\langle attr, value \rangle$-pairs,
where $value$ is an instance of the $datatype$ specified in $E$. Over
time, a publisher continuously produces events that conform to
its event schema. Each subscription, $S \in \mathcal{S}$, is defined as follows:
$$S = \langle D, \Phi(\rho_k), \delta(\rho_k), N(\rho_k), \Psi(\rho_k) \rangle \qquad (1)$$
where $\rho_k = \langle e, t, x \rangle$, with event type $e$, logical event time $t$,
and subscription instance $x$ ($x$ is a context variable essentially
identifying a concrete workflow instance).
$D$ is the data model. It describes the internal state of a
subscription and its unique key is formed by the triplet $\rho_k$.
$$D = \langle e, t, x, onE_1, \cdots, onE_m, d_1, \cdots, d_n, s_1, \cdots, s_p, visited \rangle$$
For every toggling status attribute appearing in antecedents of
PAC rules there is a column $onE_i$ in $D$. Moreover, there is a
column for every data attribute $d_i$ (i.e., application data) and
every status attribute $s_j$ (i.e., internal workflow state) appearing
in logical expressions of PAC rules. The tuple with key $\rho_k$
maintains these values as the result of receiving events associated
with $\rho_k$. The final column $visited$ indicates whether or not all
the values in this tuple have stabilized, i.e., no longer change
as a result of external event $\rho_k$. Setting $visited$ to true in the
tuple with key $\rho_k$ implies that this tuple is now a read-only tuple
and any notification (event generation) associated with $S$ for
event $\rho_k$ has been completed. A read-only tuple is retained for
maintaining an execution history and for enabling parallel and
distributed processing of PAC rules. We define the domain
for status attributes as $DOM_{status} = \{true, false, \emptyset\}$, i.e., the
Boolean constants together with a special symbol $\emptyset$. The domain
for status changes is defined as $DOM_{toggling} = \{\langle Boolean \times Boolean \rangle, \emptyset\}$,
i.e., all possible transitions for the status attribute
together with $\emptyset$. Similarly, the domain for data attributes
is defined as $DOM_{data} = \{String \cup Number \cup \emptyset\}$, i.e.,
any string or number together with $\emptyset$. The special symbol $\emptyset$
indicates that the current value is unstable, i.e., the attribute has
not been updated in the context of the external event $\rho_k$, while
all other domain values are considered as stable.
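One tuple of the data model can be sketched in Python as follows. The sketch is illustrative only: the class `SubscriptionTuple`, the sentinel `UNSTABLE`, and the column names in the usage lines are hypothetical, and the code models just two aspects described above, namely that every column starts out unstable and that the tuple becomes read-only once `visited` is set.

```python
from typing import Dict, Iterable, Tuple


class _Unstable:
    """Sentinel for the special symbol: value not yet updated for this event."""

    def __repr__(self) -> str:
        return "UNSTABLE"


UNSTABLE = _Unstable()


class SubscriptionTuple:
    """Hypothetical row of the data model, keyed by (event type, time, instance)."""

    def __init__(self, event: str, time: int, instance: str, columns: Iterable[str]):
        self.key: Tuple[str, int, str] = (event, time, instance)
        self.values: Dict[str, object] = {c: UNSTABLE for c in columns}
        self.visited = False

    def update(self, column: str, value: object) -> None:
        if self.visited:
            raise RuntimeError("tuple is read-only once visited is set")
        if column not in self.values:
            raise KeyError(column)
        self.values[column] = value

    def is_stable(self, column: str) -> bool:
        return self.values[column] is not UNSTABLE

    def mark_visited(self) -> None:
        self.visited = True


row = SubscriptionTuple("R:RedoExportDocs", 7, "x1", ["onED:cp", "LR", "PE:pp"])
row.update("LR", True)
row.update("onED:cp", (True, False))  # a status change stored as a transition pair
```

Keeping a distinct sentinel object, rather than reusing `None`, leaves room for a separate null value for non-Boolean attributes.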
$\Phi(\rho_k)$ is the subscription’s matching condition. It is
a disjunction over $\phi_i(\rho_k) \in \Phi(\rho_k)$, where each $\phi_i(\rho_k)$ is a
condition, that is, a logical formula representing the antecedent of
a PAC rule, that is instantiated and correlated with each external
event $\rho_k$. This condition is expressed over the condition language
$L$ that is a subset of First-Order Logic (FOL) supporting
scalar values; binary relations, i.e., logical operators ($\vee, \wedge, \rightarrow$) and
relational operators ($<, \leq, =, \neq, \geq, >$); the unary relation $\neg$;
and quantification over subscription instances $\rho_k$, i.e., $\forall$ and $\exists$.
The quantification domain for $\rho$ is totally ordered by time $t$ and
instance $x$. Furthermore, we define the following functions.
1) $\tau_k(attr, \rho_k)$, or simply, $\tau(attr)$, which returns the current
value of the attribute $attr$ w.r.t. $\rho_k$ in $D$.
2) $\tau_{k-1}(attr, \rho_k)$, which returns the last value of the attribute
$attr$ w.r.t. $\rho_k$ for $k > 2$ in $D$; otherwise it returns $False$
for Boolean attributes, and a default or a null value ($\bot$)
for non-Boolean attributes.
Finally, we resort to three-valued logic, with three possible values (i.e., $true$, $false$, $unknown$), where $unknown$ is the interpretation of the unstable value ($\emptyset$). We do not consider the null value ($\bot$) as unstable and we do not permit the null value for Boolean variables. We define the evaluation of any logical binary or unary operator involving $unknown$ as $unknown$, whereas we rely on traditional two-valued logic when no $unknown$ value is present. Also, when dealing with different system snapshots $\Sigma$ and the snapshot is not clear from the context, we extend the definition of $\tau$ to include $\Sigma$ as an input parameter: $\tau_k(\Sigma, attr, \rho_k)$ or $\tau(\Sigma, attr)$.
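The evaluation rule for $unknown$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: encoding $unknown$ as `None` and the helper names `tri_not`, `tri_and`, `tri_or`, and `tri_implies` are choices made here, not part of the formal model.

```python
from typing import Optional

# Assumption: None encodes the unstable value (interpreted as "unknown").
Tri = Optional[bool]


def tri_not(a: Tri) -> Tri:
    """Unary negation; unknown stays unknown."""
    return None if a is None else (not a)


def tri_and(a: Tri, b: Tri) -> Tri:
    """Conjunction; any unknown operand makes the result unknown."""
    if a is None or b is None:
        return None
    return a and b


def tri_or(a: Tri, b: Tri) -> Tri:
    """Disjunction; any unknown operand makes the result unknown."""
    if a is None or b is None:
        return None
    return a or b


def tri_implies(a: Tri, b: Tri) -> Tri:
    """Implication; falls back to two-valued logic when both are known."""
    if a is None or b is None:
        return None
    return (not a) or b
```

Under this rule `tri_and(False, None)` is unknown, whereas Kleene's strong three-valued logic would yield false; the stricter rule ensures that a condition is never decided before all of its inputs have stabilized.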
$\delta$ is the subscription's consumption policy, which describes how the internal state of a subscription changes after consuming an event (cf. Section 6).
$N(\rho_k)$ is the subscription's notification policy, which is also a disjunction over $\nu_i(\rho_k) \in N(\rho_k)$ and is instantiated and correlated with each external event $\rho_k$. The notification consists of the notification schema, which describes the content of the event(s) (its payload), and a set of conditions $\nu_i(\rho_k)$ that dictate how the content of the event is generated.
$\Psi(\rho_k)$ defines the relationship between a subscription's condition $\Phi$ and a subscription's notification policy $N$ and is represented as a set of ordered pairs $\langle \phi \in \Phi, \nu \in N \rangle$, where each $\phi_i$ is associated with the corresponding $\nu_i$; that is, when the matching condition $\phi_i$ is satisfied, the notification condition $\nu_i$ is evaluated:
\[ \Psi_s(\rho_k) = \bigcup_{\phi_i \in \Phi} \langle \phi_i(\rho_k), \nu_i(\rho_k) \rangle. \]
An instance of the subscription $S$ consists of an internal state $\Sigma^S_j$ over the data model $D$. The internal state of a subscription can only be changed upon receiving (consuming) an external event or generating an event (notification). In general, the internal state together with an event shapes the subscription operational semantics $O_S$ (a.k.a. the matching semantics), which is summarized as a 6-tuple:
\[ O_S = \langle \Sigma^S_j, e, t, x, \Sigma^S_{j+1}, Gen \rangle. \]
1) $\Sigma^S_j$ is the current internal state of the subscription $S$.
2) $e$ is an occurrence of an external immutable event.
3) $t$ is the logical time, which is greater than all logical timestamps occurring in $\Sigma^S_j$.
4) $x$ is a variable that ranges over the IDs of instances of $S$. This is referred to as the context variable of $S$.
5) $\Sigma^S_{j+1}$ is the internal state after consuming event $e$.
6) $Gen$ is the set of generated immutable event occurrences (generated by the notification policy) in reaction to the external event $\rho_k$.
Consequently, the operational semantics for a subscription, $O_S$, is formally defined as follows.
DEFINITION 2. Given a subscription $S$ with internal state $\Sigma^S_j$ and an external event $e$ at time $t$ for the instance $x$ (denoted by $\rho$) of $\Sigma^S_j$, the subscription $S$ examines $e$ and either accepts $e$ and makes a transition $\Sigma^S_j \xmapsto{e,t,x} \Sigma^S_{j+1}$, or rejects $e$ if neither the consumption policy nor the notification policy defines a state change for $e$.
We formally define the pub/sub schema $\Pi$ as follows:
\[ \Pi = \langle \mathcal{P}, \mathcal{S}, \mathcal{E}, \mathcal{C} \rangle \]
1) $\mathcal{P}$ is a set of publications.
2) $\mathcal{S}$ is a set of subscriptions.
3) $\mathcal{E}$ is the event schema that captures both the publications' event schemas and the subscriptions' notification schemas.
4) $\mathcal{C}$ is the communication state, which maintains for each external event its type ($e$), the logical time of its occurrence ($t$), the subscription instance that processed it ($x$), and the subscription type $S$, formalized as $\mathcal{C} = \langle e, t, x, S \rangle$.
Without loss of generality, if there is more than a single publisher of external events, our formal pub/sub model assumes the following two properties: i) each subscription instantaneously examines the external event $e$ according to the subscription operational semantics; ii) at any instant in time, only a single subscription is examining $e$ in $\Pi$. These assumptions simplify the correctness proof for our mapping (cf. Section 7), as external events are inspected in the same sequential order by all subscriptions. We refer to this as the pseudo-serializable execution property of $\Gamma$ (see footnote 2).
An instance of the pub/sub schema $\Pi$ is defined as a sequence of global snapshots $\Sigma_1 \cdots \Sigma_k$ over a discrete time space $t$, where $\Sigma_i = \{\Sigma^C_k, \Sigma^S_j\}$, $\Sigma^C_k$ is the communication state at time $t$ over $\Pi$, and $\Sigma^S_j$ is the internal state of each subscription instance of $S$. $\Pi$'s operational semantics is summarized as follows:
\[ O_\Pi = \langle \Sigma_k, e, t, x, \Sigma_{k+1} \rangle. \]
Here, $\Sigma_k$ is the current snapshot of $\Pi$, and $e$ is an occurrence of an external event that is pending, implying that there is at least one instance $x$ of at least one subscription $S$ that has not yet examined $e$ at logical time $t$. $\Sigma_{k+1}$ is the new global snapshot of $\Pi$.
[Footnote 2: The practical implication of these assumptions is that, with multiple external event publishers, all external events must be serialized. This requires a synchronization mechanism between external event publishers in order to generate a total order over a discrete time space $t$. A solution to this problem in content-based pub/sub systems is presented in [33].]
Formally, the operational semantics of the pub/sub model, $O_\Pi$, is defined as follows:
DEFINITION 3. Given a pending event $\rho_k = (e, t, x)$ and a subscription $S$ that has yet to examine $\rho_k$ and has the current state $\Sigma^S_j$, the global snapshot advances instantaneously from $\Sigma_k \xmapsto{e,t,x} \Sigma_{k+1}$, namely,
1) The communication state transitions from $\Sigma^C_k \xmapsto{e,t,x} \Sigma^C_{k+1}$, i.e., event $e$ was sent to $S$ for instance $x$.
2) The subscription $S$ examines the external event $e$ in accordance with $O_S$; hence, $S$ either accepts $e$ and transitions from $\Sigma^S_j \xmapsto{e,t,x} \Sigma^S_{j+1}$ or rejects $e$.
We define a valid execution sequence over $\Pi$ as one that corresponds to a pseudo-serializable execution such that, at any instant in time, $\Pi$ transitions only once from state $\Sigma^C_k \xmapsto{e,t,x} \Sigma^C_{k+1}$, and only a single instance of subscription $S$ receives an event $e$ and transitions from $\Sigma^S_j \xmapsto{e,t,x} \Sigma^S_{j+1}$ (if necessary). Notably, at any instant in time, many subscriptions (or many instances of a single subscription) may be waiting to receive the event $e$; however, the pseudo-serializable execution property does not impose any restriction on the order in which subscriptions (or instances of a subscription) must receive the event $e$. Therefore, any non-deterministic selection of subscriptions (or instances) that results in an instantaneous examination of event $e$ at time $t$ by a single subscription instance $x$ suffices. Most importantly, this pseudo-serialization requirement can be dropped when there is a single publisher of external events (cf. assumptions on the formal pub/sub model).
An event is pending only if at least one subscription instance has not examined it yet, and (in theory) every subscription instance must examine every event exactly once. Therefore, from the communication state $\mathcal{C}$, it can be inferred which events have been processed for which instances of subscription $S$ and which events are pending for which instances of $S$.
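How pending events follow from the communication state can be illustrated with a small sketch. It assumes, purely for illustration, that the communication state is held as a set of $\langle e, t, x, S \rangle$ records and that the sets of known events and subscription instances are available; the function name `pending` and the sample identifiers are hypothetical.

```python
from typing import Set, Tuple

Event = Tuple[str, int]             # (event type e, logical time t)
Instance = Tuple[str, str]          # (subscription type S, instance id x)
Record = Tuple[str, int, str, str]  # (e, t, x, S), mirroring C = <e, t, x, S>


def pending(events: Set[Event],
            instances: Set[Instance],
            comm_state: Set[Record]) -> Set[Record]:
    """Return every (e, t, x, S) combination not yet recorded as examined."""
    return {
        (e, t, x, s)
        for (e, t) in events
        for (s, x) in instances
        if (e, t, x, s) not in comm_state
    }


# Hypothetical sample data: one event, two subscription instances.
events = {("R:Request", 1)}
instances = {("S_source", "x1"), ("S_sink", "x1")}
comm_state = {("R:Request", 1, "x1", "S_source")}
still_pending = pending(events, instances, comm_state)
```

In this sample, the event is still pending for the `S_sink` instance because only the `S_source` instance has a record in the communication state.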
Finally, in general, with more than one publisher of external events, any valid implementation of $\Pi$ must guarantee the pseudo-serializable execution property.
5 WORKFLOW MAPPING OVERVIEW
Given a data-centric workflow schema $\Gamma = \langle \mathcal{I}, \mathcal{R} \rangle$, we construct a pub/sub schema $\Pi = \langle \mathcal{P}, \mathcal{S}, \mathcal{E}, \mathcal{C} \rangle$ by applying a mapping function $M$ such that $M: \Gamma \longrightarrow \Pi$. The set $\mathcal{P}$ in our mapping consists of a single publisher, which simply publishes the external events coming from the environment. However, constructing the set of necessary subscriptions, $\mathcal{S}$, is more subtle and is primarily derived from the set of PAC rules and the PDG for a given schema $\Gamma$. In addition, we require a set of subscriptions for bookkeeping purposes such as updating data and status attributes and determining the start and the end of a B-step.
We define subscriptions both for processing relevant PAC rules and for maintaining the current values of status and data attributes. In general, two classes of subscriptions arise: (1) Application-specific subscriptions, which capture the core of the workflow operational semantics, encoding both the PAC rule semantics and the PDG topological sort order semantics. (2) Generic subscriptions, which implement a bookkeeping mechanism to provide a consistent view of the data with an implicit locking mechanism. This mechanism maintains multiple versions of values for all attributes from the information model and applies updates in a deterministic order dictated by the order of external events. Hence, there is one version for each $\rho_k$, i.e., for each B-step. These two classes of subscriptions also incorporate the time semantics of the workflow schema, which is based on the external event received from the single publisher in our pub/sub formulation. Therefore, subscriptions are event-relativized in the sense that each subscription evaluates its conditions, implements its consumption policy, and sends its notification in the context of each external event in isolation, which forms a B-step.
Fig. 3. High-level illustration of subscription flow (figure not reproduced; it shows the environment, the bookkeeping subscriptions $S_{source}$, $S_{sink}$, $S_{s_i}$, $S_{s_j}$, $S_{s_k}$, $S_{s_l}$, and the application-specific subscriptions $S_{\oplus s_i}$, $S_{\ominus s_i}$, $S_{\oplus s_j}$, $S_{\ominus s_j}$, $S_{\oplus s_k}$, $S_{\oplus s_l}$).
In our mapping, $M: \Gamma \longrightarrow \Pi$, we require the following set of subscriptions for key workflow operations: [$S_{\oplus s}$] and [$S_{\ominus s}$] for satisfying/falsifying or validating/invalidating status attributes $s$; [$S_s$] for updating the status attribute $s$; [$S_d$] for updating the data attribute $d$; [$S_{source}$] for identifying the start of a B-step; and [$S_{sink}$] for identifying the end of a B-step, where the polarity, $\oplus$ or $\ominus$, denotes a positive or a negative change in status attributes. Next, we provide a high-level overview of each subscription.
The high-level representation of, and interaction among, subscriptions (represented as ovals) is depicted in Figure 3. The directed, solid arrows indicate the flow of events among subscriptions and the (bright-colored) directed, dashed arrows indicate events received from and sent to the environment, while the (black) dashed lines are bookkeeping messages for maintaining a consistent view of the attributes. For readability, the figure does not show that there must be an arrow from every node to the node $S_{sink}$ to determine the end of a B-step. The precise meaning of the arrows becomes evident in Section 6, after formally defining each subscription.
[$S_{\oplus s}$], [$S_{\ominus s}$]: For each status attribute $s$ in the information model $\mathcal{I}$ of $\Gamma$, we add the subscription $S_{\oplus s}$ for validating the attribute $s$ and the subscription $S_{\ominus s}$ for invalidating $s$. The subscription's condition $\Phi$ is derived from the PAC rules' prerequisite and antecedent conditions. Hence, $\Phi$ is an application-specific condition.
[$S_s$]: For each status attribute $s$ in $\mathcal{I}$, we add the subscription $S_s$ that listens to updates (i.e., notifications of $S_{\oplus s}$ and $S_{\ominus s}$) for $s$. Hence, the condition $\Phi$ of $S_s$ is a generic condition.
[$S_d$]: For each data attribute $d$ in $\mathcal{I}$, we add the subscription $S_d$ that listens to updates on $d$ at the outset of the B-step. Hence, the condition $\Phi$ of $S_d$ is also a generic condition.
[$S_{source}$, $S_{sink}$]: To identify the beginning and the end of a B-step, we add the source subscription $S_{source}$ and the sink subscription $S_{sink}$, respectively. All of these subscriptions are intended for bookkeeping purposes. Thus, their conditions are also generic.
6 MAPPING FORMALIZATION
The subscription plays a central role in formulating the mapping of a data-centric workflow schema, $\Gamma = \langle \mathcal{I}, \mathcal{R} \rangle$, into the pub/sub abstraction given by $\Pi = \langle \mathcal{P}, \mathcal{S}, \mathcal{E}, \mathcal{C} \rangle$. We formalize the semantics of a subscription $S \in \mathcal{S}$ as described in Equation 1, where its condition $\Phi(\rho_k)$, consumption policy $\delta(\rho_k)$, and notification policy $N(\rho_k)$ are instantiated and associated with an external event $\rho_k$, and $\Psi(\rho_k)$ interrelates each condition with its corresponding notification.
6.1 Matching and Notification Policies
In this section, we start by providing a detailed account of
the mapping of the workflow’s application-specific semantics,
namely, encoding of PAC rules and the PDG topological sort
order, into a set of subscriptions. In addition, we provide the
foundation for a mapping that emulates the generic-execution
semantics including the necessary bookkeeping mechanism as
a set of subscriptions. We guarantee that the workflow correctly
executes by ensuring data consistency and the B-step semantics.
6.1.1 PAC Rules and PDG Mapping
We first define the $\Gamma$ application-specific conditions for subscriptions $S \in \mathcal{S}$. Each logical formula $\phi_i(\rho_k) \in \Phi$ is defined as follows:
\[ \phi_i(\rho_k) = \overbrace{\psi_{i,PDG}(\rho_k)}^{\text{PDG Predecessors}} \land \overbrace{\psi_{i,PseudoClock}(\rho_k)}^{\text{Event-based Pseudo Clock}} \tag{2} \]
Here, $\psi_{i,PDG}$ is the PDG predecessor component, that is, a logical formula that encodes the PDG topological sort order, i.e., $\psi_{i,PDG}$ evaluates to true when all variables in $D$ have stabilized (cf. detailed descriptions in Sections 6.1.1.1 and 6.1.1.2). $\psi_{i,PseudoClock}$ is a logical formula that enforces that subscriptions are processed in the order of external events, i.e., it guarantees event-order serialization (cf. detailed description in Section 6.1.1.3). The second component of $\Psi$, the notification expression $\nu_i(\rho_k) \in N$, is defined as follows:
νi(ρk)=
γρk,Svisited
sρkif ψi,SAT (ρk)
γρk,Svisited
sρkif ∀νi∈N,¬(ψi,SAT (ρk))
WAIT if ∃φi∈Ψi,¬(φi)
(3)
Here,
Svisited
sρk
is an event that indicates that the subscription
S
was successfully visited for the external event
ρk
, i.e., (partially)
completed as defined in Section 6.2. And,
γρk
is an event that
represents the consequent of the PAC rule indicating a change
(either a positive or a negative) to a particular status attribute
s∈ I
(
γ=s
), while
γ
indicates no change to status attribute
s
. In addition, each event
γρk
and
γρk
contains the current value
of the status attribute
s
in the context of the external event
ρk
.
ψi,SAT(ρk)
is a logical formula derived from a PAC rule’s pre-
requisite
π
and antecedent
α
and
WAIT
is an indicator that implies
that not all subscription conditions (
φi
) have been satisfied. The
notification policy is explained in detail in Section 6.1.1.4)
6.1.1.1 Application-specific Condition: To construct the application-specific condition, we adapt the PDG construction algorithm [15], which operates on the set of PAC rules $\mathcal{R}$. Suppose each PAC rule has the form $\langle \pi, \alpha, \gamma \rangle$, which stands for prerequisite, antecedent, and consequent, respectively. Then, the antecedent $\alpha$ of a PAC rule is constructed according to the template "on $\xi(x)$ if $\varphi(x)$", where each expression $expr \in \xi(x)$ is of the form on-event $onEventType$ or $\odot s$, where $onEventType$ indicates requiring an external event of the type given by $onEventType$ and $\odot$ means waiting for a positive, $\oplus$, or negative, $\ominus$, change in status attribute $s$. Similarly, every expression $expr \in \gamma$ also follows the form $\odot s$. However, an expression $expr \in \varphi(x)$ has the form $s$, which simply indicates a stable value for status attribute $s$. A value is stable if it no longer changes in the current B-step.
We first collapse instances of PAC rules, $R_i \in \mathcal{R}$, that have identical $\pi$ and $\gamma$ into a super PAC rule given by $\langle \pi, A, \gamma \rangle$, where $A = (\bigvee_{\alpha \in R_i} \alpha)$. In general, PAC rules share identical $\pi$ and $\gamma$ because, for a given status attribute $s$, there exist multiple rules that satisfy or falsify it. The notion of a super PAC rule simplifies the mapping. Therefore, the (super) PAC rule $R$ is mapped to $S_{\odot s}$, where $\odot s \in \gamma$.
Example 1. In our example BP, there are three PAC rules that share the common prerequisite $\pi = \text{PE:pp}$ and consequent $\gamma = \ominus\text{PE:pp}$ (cf. Table 3). These PAC rules represent the incoming edges in the PDG for node $(-, \text{PE:pp})$ (cf. Figure 2). Hence, they can be collapsed into the super PAC rule $PAC_{\ominus\text{PE:pp}}$, which represents the invalidation of milestone PE:pp.
$\pi$: PE:pp
$A$: ($\ominus$ED:cp) $\lor$ ($\oplus$ED:cp $\land$ LR) $\lor$ (latestIncEvent = "R:RedoExportDocs" $\land$ LR)
$\gamma$: $\ominus$PE:pp
The super PAC rule is mapped to subscription $S_{\ominus\text{PE:pp}}$.
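The collapsing step can be sketched as a grouping of rules by their (prerequisite, consequent) pair. The following is a minimal illustration, not the authors' implementation: rules are represented as plain string triples, the `+`/`-` prefixes are an ASCII stand-in for the polarity, and reading the first antecedent of the example as a negative change of ED:cp is an interpretation made here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PacRule = Tuple[str, str, str]  # (prerequisite, antecedent, consequent)


def collapse(rules: Iterable[PacRule]) -> Dict[Tuple[str, str], List[str]]:
    """Group antecedents of rules sharing the same prerequisite and consequent.

    Each resulting entry stands for one super PAC rule whose antecedent is
    the disjunction of the collected antecedents.
    """
    grouped: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for prerequisite, antecedent, consequent in rules:
        grouped[(prerequisite, consequent)].append(antecedent)
    return dict(grouped)


# The three rules of Example 1, written in the assumed ASCII encoding.
rules = [
    ("PE:pp", "-ED:cp", "-PE:pp"),
    ("PE:pp", "+ED:cp and LR", "-PE:pp"),
    ("PE:pp", 'latestIncEvent = "R:RedoExportDocs" and LR', "-PE:pp"),
]
super_rules = collapse(rules)
```

Here `super_rules` holds a single entry keyed by `("PE:pp", "-PE:pp")` with three antecedents, corresponding to the three disjuncts of the super PAC rule.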
The relation $\Psi_s \in S_{\odot s}$ (an application-specific condition and notification) is constructed through various mapping stages, which are described next.
Each PAC rule is used to construct the subscription's matching condition, $\langle \pi, A, \gamma \rangle \in \mathcal{R} \longrightarrow \Phi \in S_{\odot s}$. More specifically, we derive each $\phi_i \in \Phi$ based on the PAC rule as follows:
\[ M_\Phi: \alpha_i \in A \longrightarrow \phi_i \in \Phi. \tag{4} \]
Intuitively speaking, the antecedent of a PAC rule ($\alpha$) forms the matching condition ($\Phi$). In the case of a super PAC rule, each antecedent of the original PAC rules that are collapsed within this super PAC rule ($\alpha_i \in A$) forms a single component of the matching condition ($\phi_i \in \Phi$).
6.1.1.2 PDG Predecessors: The key component of $\phi_i$, denoted by $\psi_{i,PDG} \in \phi_i$, is at the core of the subscription mapping and incorporates the notion of PDG predecessors, an integral part of encoding the PDG topological sort order semantics of $\Gamma$ into $\Pi$. This mapping stage is represented by:
\[ M_{\psi_{i,PDG}}: \alpha_i \in A \longrightarrow \psi_{i,PDG} \in \phi_i. \tag{5} \]
Thus, for each $\phi_i \in \Phi$, we construct $\psi_{i,PDG} \in \phi_i$ from the corresponding $\alpha_i \in R$. The actual definition of $\psi_{i,PDG}$ is derived by adapting the PDG construction algorithm, which examines the antecedent component of each PAC rule to identify the set of status attributes whose values must stabilize before firing a PAC rule, i.e., before evaluating the subscription. We formally define $\psi_{i,PDG}$ as a set of on-events that listen for a positive or negative change in the variables appearing in the PAC rule's antecedent, which encodes the three different forms a sentry can have:
\[ \psi_{i,PDG}(\rho_k) = \bigwedge_{\odot s \in \xi(x)} \tau_k(on{\odot}s, \rho_k) \land \bigwedge_{s \in \varphi(x)} \tau_k(on.s, \rho_k), \tag{6} \]
where $on{\odot}s$ refers to events that announce a change or no change to $s$ (i.e., the on $\xi(x)$ component in antecedent $\alpha$), and $on.s$ refers to an event that holds the current value of $s$ (i.e., the if $\varphi(x)$ component in $\alpha$).
6.1.1.3 Event-based Pseudo Clock: The second component of $\phi_i$ is $\psi_{PseudoClock}$, which enforces that each subscription is processed, namely, its condition $\phi_i$ is satisfied, in the order in which external events arrive. Therefore, external events act as a pseudo clock. The operation of this pseudo clock is defined by a logical formula as follows:
\[ \psi_{PseudoClock}(\rho_k) = \big( \nexists \rho_j,\ \rho_j \in \Sigma^S,\ \lnot(\tau_j(isVisited, \rho_j)) \land \tau_j(eventTime, \rho_j) < \tau_k(eventTime, \rho_k) \land \tau_j(subInstance, \rho_j) = \tau_k(subInstance, \rho_k) \big). \tag{7} \]
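The pseudo clock of Equation 7 can be read as a simple check over the rows of a subscription's internal state: the condition holds only if no earlier, still unvisited row exists for the same subscription instance. The sketch below assumes, for illustration only, that the internal state is a list of `Row` records; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    """One tuple of the internal state, i.e., one external event rho_j."""
    event_time: int
    sub_instance: str
    is_visited: bool


def pseudo_clock(state: List[Row], current: Row) -> bool:
    """True iff no unvisited row with an earlier time exists for the same instance."""
    return not any(
        (not row.is_visited)
        and row.event_time < current.event_time
        and row.sub_instance == current.sub_instance
        for row in state
    )
```

A row belonging to a different subscription instance never blocks the current one, which is what allows instances to progress independently.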
Example 1 (cont.). As $PAC_{\ominus\text{PE:pp}}$ originally comprises three individual PAC rules, the subscription condition $\Phi$ contains three disjuncts representing the original antecedents (i.e., $\phi_1$, $\phi_2$, and $\phi_3$), which results in the following PDG predecessor components:
\[ \begin{aligned} \psi_{1,PDG}(\rho_k) &= \tau_k(\text{on}{\odot}\text{ED:cp}, \rho_k) \\ \psi_{2,PDG}(\rho_k) &= \tau_k(\text{on}{\odot}\text{ED:cp}, \rho_k) \land \tau_k(\text{on.LR}) \\ \psi_{3,PDG}(\rho_k) &= \tau_k(\text{LR}). \end{aligned} \]
Note that the external request event in PAC Rule 3 (i.e., R:RedoExportDocs) is not evaluated within the matching condition (here $\psi_{3,PDG}(\rho_k)$) but later on within the notification condition. The data model $D$ for this subscription has the following columns:
e | t | x | on$\odot$ED:cp | ED:cp | PE:pp | LR | visited
The complete subscription condition $\Phi$ is then:
\[ \Phi = \phi_1 \lor \phi_2 \lor \phi_3 = (\psi_{1,PDG}(\rho_k) \lor \psi_{2,PDG}(\rho_k) \lor \psi_{3,PDG}(\rho_k)) \land \psi_{PseudoClock}(\rho_k) \]
6.1.1.4 Application-specific Notification: Once the PDG requirement (i.e., $\psi_{i,PDG} \in \phi_i$) for a subscription instance is satisfied, namely, all variables in $\alpha$ have stabilized, and all prior external events have been processed (i.e., $\psi_{PseudoClock} \in \phi_i$), the corresponding notification of $(\phi_i, \nu_i)$ is triggered. Each $\nu_i \in N$ is partially derived from the corresponding PAC rule of the super PAC rule $\langle \pi, \alpha_i, \gamma \rangle$ in accordance with Equation 3:
\[ M_N: (\pi, \alpha_i \in A, \gamma) \in \mathcal{R} \longrightarrow \nu_i \in N. \tag{8} \]
The key component of $\nu_i$ is a logical formula $\psi_{i,SAT}$, which describes the behavior of the notification policy. Before giving the definition of $\psi_{i,SAT}$, we must re-write $\pi$ and $\alpha_i$. This re-writing is necessary for abiding by the workflow semantics, in which each variable in $\pi$ must use its value from the last completed B-step (if any), while each variable in $\alpha_i$ must use its most recent value. Thus, we re-write $\pi$, which consists of only Boolean variables, as follows:
\[ M_\pi: s_i \in \pi \longrightarrow \tau_{k-1}(s_i, \rho_k). \tag{9} \]
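Similarly, we re-write $\alpha$, which consists of both status and data attributes, based on the most recent values as follows ($M_{\alpha_i}$ consists of three stages of re-writing given by $M_{\varphi \in \alpha_i}$, $M^1_{\xi \in \alpha_i}$, and $M^2_{\xi \in \alpha_i}$):
\[ \begin{aligned} M_{\varphi \in \alpha_i} &: a_i \in \varphi \longrightarrow \tau_k(a_i, \rho_k), \\ M^1_{\xi \in \alpha_i} &: \odot s \in \xi \longrightarrow \tau_k(on{\odot}s, \rho_k) = \hat{\odot}, \\ M^2_{\xi \in \alpha_i} &: onEvent \in \xi \longrightarrow \tau_k(eventType, \rho_k) = onEvent, \end{aligned} \tag{10} \]
where $\hat{\odot} \in Boolean \times Boolean$ refers to the type of transition for status attribute $s$ as indicated by $\odot s$. The mapping of a PAC rule to $\psi_{i,SAT}$ is expressed as
\[ M_{\psi_{i,SAT}}: (\pi, \alpha_i) \in \mathcal{R} \longrightarrow \psi_{i,SAT} \in \phi_i, \tag{11} \]
where $\psi_{i,SAT}$ is simply derived by the conjunction of the re-written $\pi$ and $\alpha$:
\[ \psi_{i,SAT}(\rho_k) = M_\pi \land M_{\alpha_i}. \tag{12} \]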
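Example 1 (cont.). Based on the super PAC rule, $PAC_{\ominus\text{PE:pp}}$, we now derive the notification condition for the example in a similar fashion:
\[ \begin{aligned} \psi_{1,SAT}(\rho_k) &= \tau_{k-1}(\text{PE:pp}, \rho_k) \land \tau_k(\text{on}{\odot}\text{ED:cp}, \rho_k) = (true, false) \\ \psi_{2,SAT}(\rho_k) &= \tau_{k-1}(\text{PE:pp}, \rho_k) \land \tau_k(\text{LR}, \rho_k) \land \tau_k(\text{on}{\odot}\text{ED:cp}, \rho_k) = (false, true) \\ \psi_{3,SAT}(\rho_k) &= \tau_{k-1}(\text{PE:pp}, \rho_k) \land \tau_k(\text{LR}, \rho_k) \land \tau_k(eventType, \rho_k) = \text{'R:RedoExportDocs'} \end{aligned} \]
The components of the notification condition $N$ (i.e., $\nu_1$, $\nu_2$, and $\nu_3$) can then be derived from Equation 3. We show this for $\nu_1$ as follows:
\[ \nu_1(\rho_k) = \begin{cases} \ominus\text{PE:pp}_{\rho_k},\ S^{visited}_{\ominus\text{PE:pp}\,\rho_k} & \text{if } \psi_{1,SAT}(\rho_k) \\ \overline{\ominus\text{PE:pp}}_{\rho_k},\ S^{visited}_{\ominus\text{PE:pp}\,\rho_k} & \text{if } \forall \nu_i, \lnot(\psi_{i,SAT}(\rho_k)) \\ WAIT & \text{if } \exists \phi_i \in \Psi_i, \lnot(\phi_i) \end{cases} \]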
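To make the three satisfaction formulas of the example concrete, the following sketch evaluates them for given attribute values. It is a simplification under stated assumptions: all inputs are taken to be stable (so plain two-valued logic applies), the toggling value of ED:cp is passed as an `(old, new)` pair or `None` when no change was reported, and the function name `psi_sat` is hypothetical.

```python
from typing import Optional, Tuple

Toggle = Optional[Tuple[bool, bool]]  # (old, new) transition, or None


def psi_sat(prev_pe_pp: bool,
            toggle_ed_cp: Toggle,
            lr: bool,
            event_type: str) -> Tuple[bool, bool, bool]:
    """Evaluate the three satisfaction formulas of the example.

    prev_pe_pp is the value of PE:pp from the last completed B-step;
    the remaining arguments are values in the current B-step.
    """
    psi1 = prev_pe_pp and toggle_ed_cp == (True, False)
    psi2 = prev_pe_pp and lr and toggle_ed_cp == (False, True)
    psi3 = prev_pe_pp and lr and event_type == "R:RedoExportDocs"
    return psi1, psi2, psi3
```

If any of the three values is true, the first case of the notification applies and the invalidation of PE:pp is announced; if all are false, the no-change event is sent instead.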
6.1.2 Data Consistency & Semantics Simulation
Having demonstrated the mapping that translates PAC rules into a set of subscriptions, we next derive the subscriptions required for bookkeeping (described in Sections 6.1.2.1 and 6.1.2.2) and for the execution of $\Gamma$ (described in Section 6.1.2.3). Given the relation $\Psi_{i,s}(\rho_k) = \langle \phi_i(\rho_k), \nu_i(\rho_k) \rangle$, the generic condition $\phi_i$ is defined by:
\[ \phi_i(\rho_k) = \tau_k(a, \rho_k), \tag{13} \]
where $\phi_i$ essentially captures the interest in any attempt to alter the value of attribute $a$. On the other hand, the notification policy $\nu_i$ is expressed as follows:
\[ \nu_i = a_{\rho_k},\ S^{visited}_{a\,\rho_k}, \tag{14} \]
where $a_{\rho_k}$ broadcasts the current value of attribute $a$ and $S^{visited}_{a\,\rho_k}$ indicates that the bookkeeping subscription for $a$ was visited for external event $\rho_k$.
6.1.2.1 Status Attribute Consistency: We start with the workflow's data consistency requirement, which ensures a consistent view of status attributes. We must ensure that when a status attribute changes, no race condition for updating the value arises and that every interested subscription has the most up-to-date values for its status attributes. To achieve data consistency, for every status attribute $s$ in the information model of $\Gamma$, we add a generic condition to the subscription $S_s$, which acts as a single gateway for changing the value of $s$ and subsequently broadcasting the final stable value of $s$ to all interested subscriptions. The relation $\Psi_s \in S_s$ is given by:
\[ \phi_s(\rho_k) = \tau_k(on{\odot}s, \rho_k) \tag{15} \]
\[ \nu_s(\rho_k) = \begin{cases} \tau_k(s, \rho_k) \Leftarrow True,\ S^{visited}_{s\,\rho_k} & \text{if } \tau_k(on{\odot}s, \rho_k) = (false, true) \\ \tau_k(s, \rho_k) \Leftarrow False,\ S^{visited}_{s\,\rho_k} & \text{if } \tau_k(on{\odot}s, \rho_k) = (true, false) \\ \tau_k(s, \rho_k) \Leftarrow \tau_{k-1}(s, \rho_k),\ S^{visited}_{s\,\rho_k} & \text{otherwise,} \end{cases} \]
where $\Leftarrow$ indicates assignment of the value on the right-hand side to the variable on the left-hand side.
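The three cases of the status gateway's notification can be sketched as a small function that derives the stable value of a status attribute for the current B-step. This is an illustrative reading, with the toggling value assumed to be an `(old, new)` pair or `None` when no change was announced; the function name `stable_status` is hypothetical, and the generation of the visited event is omitted.

```python
from typing import Optional, Tuple


def stable_status(toggle: Optional[Tuple[bool, bool]], previous: bool) -> bool:
    """Return the stable value of a status attribute for the current B-step.

    toggle   -- announced transition (old, new), or None for "no change"
    previous -- value from the last completed B-step
    """
    if toggle == (False, True):
        return True
    if toggle == (True, False):
        return False
    # No (or no effective) change announced: carry the previous value over.
    return previous
```

Routing every update through one such function per attribute mirrors the single-gateway idea: only one place decides the final value, so concurrent writers cannot race.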
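Example 2. We now show the subscription for capturing updates on status attribute PE:pp. The subscription condition $\Phi$ is given by:
\[ \phi_{\text{PE:pp}}(\rho_k) = \tau_k(\text{on}{\odot}\text{PE:pp}, \rho_k) \]
The notification condition $N = \nu_{\text{PE:pp}}(\rho_k)$ is given by:
\[ \nu_{\text{PE:pp}}(\rho_k) = \begin{cases} \tau_k(\text{PE:pp}, \rho_k) \Leftarrow True,\ S^{visited}_{\text{PE:pp}\,\rho_k} & \text{if } \tau_k(\text{on}{\odot}\text{PE:pp}, \rho_k) = (false, true) \\ \tau_k(\text{PE:pp}, \rho_k) \Leftarrow False,\ S^{visited}_{\text{PE:pp}\,\rho_k} & \text{if } \tau_k(\text{on}{\odot}\text{PE:pp}, \rho_k) = (true, false) \\ \tau_k(\text{PE:pp}, \rho_k) \Leftarrow \tau_{k-1}(\text{PE:pp}, \rho_k),\ S^{visited}_{\text{PE:pp}\,\rho_k} & \text{otherwise} \end{cases} \]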
6.1.2.2 Data Attribute Consistency: Likewise, we construct a set of subscriptions that listen to events containing values for each data attribute. Upon consuming an external event, if the value in the event payload is different from the current value, then the subscription $S_d$ generates the value, derived from the change or no-change events, accordingly, as follows:
\[ \phi_d(\rho_k) = \tau_k(on\Delta e_{\rho_k}, \rho_k) \tag{16} \]
\[ \nu_d(\rho_k) = \begin{cases} \tau_k(d, \rho_k) \Leftarrow d,\ S^{visited}_{d\,\rho_k} & \text{if } d \in \Delta e_{\rho_k} \land \tau_k(d, \rho_k) \ne \tau_{k-1}(d, \rho_k) \\ \tau_k(d, \rho_k) \Leftarrow \tau_{k-1}(d, \rho_k),\ S^{visited}_{d\,\rho_k} & \text{otherwise,} \end{cases} \]
where $\Delta e_{\rho_k}$ summarizes the data attributes appearing in $e$.
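The data attribute rule can be sketched analogously: a data attribute takes the value carried by the external event only if the event mentions it and the value differs from the previous one. The representation of the event payload as a dictionary and the name `stable_data` are assumptions made for this illustration.

```python
from typing import Any, Dict


def stable_data(name: str, payload: Dict[str, Any], previous: Any) -> Any:
    """Return the value of data attribute `name` for the current B-step.

    payload  -- data attributes carried by the external event
    previous -- value of the attribute from the previous B-step
    """
    if name in payload and payload[name] != previous:
        return payload[name]
    # Attribute absent from the event, or unchanged: keep the previous value.
    return previous
```

For instance, an event carrying `{"quantity": 5}` updates a previous value of 3 to 5, while an event that does not mention `quantity` leaves it at 3.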
6.1.2.3 B-Step Simulation: Finally, in the workflow execution, it is crucial to identify the start and end of a completed B-step. Therefore, we first focus on the start of a new B-step, which is achieved through subscription $S_{source}$. The source subscription $S_{source}$ has a special property because it is the only subscription that waits to receive external events $e$ from the environment. Every incoming external event in turn establishes the start of a new B-step (referred to as the B-step deterministic-initiation property). Acting as a single gateway, $S_{source}$ assigns increasing timestamps $t$ to all incoming $e$ and thereby imposes a total order on all external events. Therefore, $S_{source}$ sends the events $S^{visited}_{source\,\rho_k}$ and $e_{\rho_k}$, which are understood by all subscriptions, where the event's type, time, and intended subscription instances are summarized in $\rho_k$. Hence, the relation $\Psi_{source} = (\phi(\rho_k), \nu(\rho_k))$ is given as follows:
\[ \begin{aligned} \phi_{source}(\rho_k) &= e \\ \nu_{source}(\rho_k) &= S^{visited}_{source\,\rho_k},\ e_{\rho_k},\ \Delta e_{\rho_k} \end{aligned} \tag{17} \]
In order to guarantee the B-step deterministic-initiation property, we must add $onS^{visited}_{source\,\rho_k}$ to all subscriptions whose $\phi_i \in \Phi$ are empty. In the same spirit, the end of a B-step is determined by introducing $S_{sink}$, which subscribes to every subscription involved in order to establish the end of a B-step (referred to as the B-step deterministic-completion property). Hence, $\Psi_{sink}(\rho_k)$ is given by:
\[ \begin{aligned} \phi_{sink}(\rho_k) &= \bigwedge_{S_i \in \mathcal{S}'} \tau_k(onS^{visited}_{i\,\rho_k}, \rho_k) \\ \nu_{sink}(\rho_k) &= S^{visited}_{sink\,\rho_k}, \end{aligned} \tag{18} \]
where $\mathcal{S}' = \mathcal{S} \setminus S_{sink}$.
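The sink condition amounts to a set inclusion: the B-step for a given external event ends once a visited event has been received from every subscription other than the sink itself. The sketch below assumes subscriptions are identified by string names; the names and the function `sink_ready` are hypothetical.

```python
from typing import Set


def sink_ready(subscriptions: Set[str],
               visited: Set[str],
               sink: str = "S_sink") -> bool:
    """True iff every subscription except the sink reported 'visited'."""
    required = subscriptions - {sink}
    return required <= visited


# Hypothetical set of subscriptions for a tiny schema.
all_subs = {"S_source", "S_+s", "S_-s", "S_s", "S_sink"}
```

With `all_subs` as above, `sink_ready(all_subs, {"S_source", "S_+s"})` is false, and it becomes true only once `S_-s` and `S_s` have reported as well.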
6.2 Consumption Policy
The subscriptions' conditions and notification policies define a design-time specification of the workflow semantics under our pub/sub formulation. In contrast, the consumption policy specifies how to update the internal state of each subscription, $\Sigma$, at runtime. The consumption policy is tightly bound to the subscription operational semantics, denoted by $O_S = (\Sigma^S_j, e, t, x, \Sigma^S_{j+1}, Gen)$. To precisely model the consumption policy w.r.t. $O_S$, we discuss the subscription's evolution as it goes through the various stages of its lifecycle within a B-step: initiation, modification, completion, satisfaction, generation, and termination. Each stage and its interaction with other stages is defined next and illustrated in Figure 4.
STAGE 1. Subscription initiation occurs for the event associated with $\rho_k$ (within the $k$th B-step) when the subscription first receives the event, either directly (the event $e_{\rho_k}$) or indirectly (such as status or data attribute updates in the context of $\rho_k$). Then, $eventType$, $eventTime$, and $subscriptionInstance$ are populated based on $\rho_k$ and $isVisited$ is set to false, while the rest of its attributes in $D$ are set to $\emptyset$. However, if the subscription instance $x \in \rho_k$ does not exist in $\Sigma^S$, then as part of the initialization (and creation of the new instance), all status attributes are set to false and all data attributes are set to their default values.
STAGE 2. Subscription modification occurs for the event associated with $\rho_k$ (within the $k$th B-step) after the subscription has been initiated (before or after a subscription's partial completion), when the internal state of the subscription is updated and it transitions according to the subscription operational semantics:
\[ O_S = \Sigma^S_j \xmapsto{E_{\rho_k} \rightarrow (e,t,x)} \Sigma^S_{j+1}. \]
The internal state of the subscription changes by at most one single attribute in $D$ and is characterized by the following assignment:
\[ (\forall (a_i, value) \in E_{\rho_k}, a_i \in D) \rightarrow \tau_k(a_i, \rho_k) \Leftarrow value \]
Fig. 4. Consumption policy state transition (figure not reproduced; it shows Stages 1 to 6: Initiation, Modification, (Partial) Completion, Satisfaction, Generation, and Termination).
STAGE 3. Subscription (partial) completion occurs for the event associated with $\rho_k$ (within the $k$th B-step) after the subscription has been initiated, when at least one of the subscription's $\phi_i(\rho_k) \in \Psi_s(\rho_k)$ has evaluated to true. If all $\phi_i(\rho_k)$ have evaluated to true, the subscription is considered completed, whereas if only some of the $\phi_i(\rho_k)$ have evaluated to true, the subscription is considered partially completed. Explicitly considering a stage for partial completion allows the pub/sub system to evaluate the notification policy (Stage 4) and generate events (Stage 5) before the subscription is completed. Hence, a tuple $\langle \phi_i(\rho_k), \nu_i(\rho_k) \rangle \in \Psi_S(\rho_k)$ might completely evaluate to true and the corresponding notifications be generated even if there exist conditions $\phi_j \in \Phi$ that have not yet evaluated to true. This behavior improves parallelism in execution and is indicated by the dashed lines in Figure 4.
STAGE 4. Subscription satisfaction occurs for the event associated with $\rho_k$ (within the $k$th B-step) after the subscription has been (partially) completed, i.e., when $\phi_i(\rho_k) \in \Psi(\rho_k)$ has evaluated to true, and the subscription's corresponding notification policy, $\nu_i(\rho_k)$, evaluates to true.
STAGE 5. Subscription generation occurs for the event associated with $\rho_k$ (within the $k$th B-step) after the subscription is satisfied, when the subscription's relevant events are generated according to $\nu_i(\rho_k)$.
STAGE 6. Subscription termination occurs for the event $\rho_k$ (within the $k$th B-step) after all events have been generated by the subscription and the attribute $\tau_k(isVisited, \rho_k)$ is set to true. Once $isVisited$ is set to true, the tuple associated with $\rho_k$ becomes read-only.
7 WORKFLOW MAPPING ANALYSIS
In this section, we show that under the incremental formulation (sequential execution), the data-centric workflow schema $\Gamma$ is equivalent to the pub/sub schema $\Pi$ (distributed execution), expressed as $M: \Gamma \longrightarrow \Pi$. Before establishing the correctness and equivalence of the $\Gamma$ and $\Pi$ schemas, we define a set of preliminary concepts. For the proofs of these preliminaries and of the overall equivalence of $\Gamma$ and $\Pi$, we refer the reader to Appendix A.
As described in Section 3, the incremental operational semantics of $\Gamma$ is defined as the 5-tuple $(\Sigma, e, t, \Sigma', \mathit{Gen})$, and the $\Gamma$ system snapshot transition, denoted by $\Sigma \overset{e}{\longmapsto} \Sigma'$, is defined
as the smallest logical business step (B-step), which consists of the sequential firing of PAC rules. The B-step in its expanded form is given by
$$\Sigma = \Sigma_0, \Sigma_1, \Sigma_2, \cdots, \Sigma_n = \Sigma',$$
where $\Sigma_0 \neq \Sigma_1$ (due to updating data attributes based on the external incoming event $e$) and each $\Sigma_i$ is referred to as a micro-B-step. Thus, the $i$th micro-B-step corresponds to the firing of the $i$th PAC rule. Furthermore, based on the $\Gamma$ semantics, each PAC rule firing results in the change of exactly one status attribute, and the value of each status attribute changes at most once within a B-step (the toggle-once property). Consequently, each PAC rule is fired at most once within a B-step.
First, we provide a formalization for the set of status attributes that are changed within a B-step in reaction to an external event $e$. Essentially, this set can be derived from the event-relativized-PDG.
DEFINITION 4. The event-relativized-PDG for an external event $e$, denoted by $e\mathrm{PDG} = (V_e, E_e)$, is a subgraph of the PDG that includes all PAC rules, and their ordering, that are triggered in reaction to $e$:
$$\mathrm{PDG}(V, E) \supseteq e\mathrm{PDG} = \{(V_e, E_e) \mid V_e \subseteq V, E_e \subseteq E\}$$
DEFINITION 5. Given $e\mathrm{PDG} = (V_e, E_e)$ for external events of type $e$, the event-relativized status attribute set for $e$, denoted by $I_s^e$, contains all status attributes that occur in nodes $V_e$ of $e\mathrm{PDG}$, i.e., all status attributes that are changed within the B-step.
$$I_s^e = \{s \mid s \in V_e, (V_e, E_e) = e\mathrm{PDG}\}$$
Thus, the set of status attributes that are not changed within the B-step is given by $\overline{I_s^e} = I_s \setminus I_s^e$.
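To make Definitions 4 and 5 concrete, the following Python sketch derives an event-relativized subgraph and the two status attribute sets. It rests on assumptions that are not stated in the text: the PDG is given as an adjacency dictionary in which every node stands for exactly one status attribute, the nodes directly triggered by $e$ are known, and the ePDG is taken to be the part of the PDG reachable from those nodes. The function name `event_relativized_pdg` and the example graph are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def event_relativized_pdg(
    pdg: Dict[str, List[str]], triggered: Iterable[str]
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (V_e, E_e): nodes reachable from the triggered nodes and their edges."""
    v_e = set(triggered)
    queue = deque(v_e)
    while queue:                       # breadth-first traversal of the PDG
        u = queue.popleft()
        for v in pdg.get(u, []):
            if v not in v_e:
                v_e.add(v)
                queue.append(v)
    # Every successor of a reachable node is reachable, so these edges lie within V_e.
    e_e = {(u, v) for u in v_e for v in pdg.get(u, [])}
    return v_e, e_e


# Hypothetical PDG over six status attributes; event e triggers the rule for s1.
pdg = {"s1": ["s2", "s3"], "s2": ["s4"], "s3": ["s4"], "s4": [],
       "s5": ["s6"], "s6": []}
all_status = set(pdg)                     # I_s
v_e, e_e = event_relativized_pdg(pdg, ["s1"])
changed = v_e                             # I_s^e (attributes that may change)
unchanged = all_status - changed          # complement of I_s^e within I_s
```

Here `changed` is an over-approximation in the sense that a reachable rule need not actually fire; the sketch only bounds which attributes can change within the B-step.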
We now formalize the changes in value of a status attribute
over the notion of stable attribute values as follows.
DEFINITION 6. A status attribute $s \in I_s$ is called stable, denoted by $\dot{s}_\Sigma$, within a B-step caused by $e$ iff $s$ is within the set of attributes that are not changed in reaction to $e$, or $s$ is in the event-relativized status attribute set for $e$ and has changed its value in the context of $e$.
$$\dot{s}_{\Sigma_i} = \begin{cases} \tau(\Sigma_{i-1}, s) \neq \tau(\Sigma'_i, s), & \text{if } s \in I_s^e \\ \top, & \text{if } s \in \overline{I_s^e} \end{cases}$$
DEFINITION 7. We refer to the initial and final system snapshots of a B-step as complete system snapshots, denoted by $\Sigma$ (or $\Sigma_0$) and $\Sigma'$ (or $\Sigma_n$), if all status attributes are stable.
$$\forall s \in I_s, \dot{s}_\Sigma$$
DEFINITION 8. We refer to an intermediate system snapshot within a B-step as a partial system snapshot, denoted by $\Sigma_i$, $0 < i < n$, i.e., if not all status attributes are stable.
$$\exists s \in I_s, \neg\dot{s}_\Sigma$$
Finally, we emphasize that the incremental formulation of the execution follows a sequential and central execution, in which the $\Gamma$ semantics for the B-step execution is defined as an atomic step and each B-step consists of a finite number of micro-B-steps. Therefore, we define the concept of time in terms of a B-step such that system time advances from $t_i$ to $t_{i+1}$ only after processing the $i$th event ($e_i$), i.e., after the completion of the $i$th B-step. In addition, external events are processed in the order in which they arrive (the in-order processing of external events).
LEMMA 1. The $\Gamma$ incremental semantics guarantees the in-order processing of external events (when all events are published from a single source). Hence, the B-step execution (i.e., PAC rule firing) follows the event-order serialization (cf. Proof 2 in Appendix A).
Similar to the B-step event-order serialization in the $\Gamma$ semantics, the micro-B-steps within a B-step also follow a strict order, which is imposed by the topological sort order of the PDG (the PDG-based serialization of micro-B-steps).
DEFINITION 9. The $\Gamma$ incremental semantics guarantees the PDG-based serialization of micro-B-steps [12].
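The interplay of PDG-based serialization and the toggle-once property can be sketched as a small sequential executor. This is a simplified illustration, not the GSM semantics itself: each rule is reduced to a single guard, a target status attribute, and a new value (collapsing the prerequisite and antecedent of a PAC rule into one predicate), the input snapshot is assumed to already reflect the external event, and toggle-once is enforced explicitly rather than following from rule design. The names `run_b_step`, `rules`, and `preds` are hypothetical; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple

Snapshot = Dict[str, object]
# (guard, status attribute that is changed, new value)
Rule = Tuple[Callable[[Snapshot], bool], str, object]


def run_b_step(snapshot: Snapshot, rules: Dict[str, Rule],
               preds: Dict[str, Set[str]]) -> List[Snapshot]:
    """Fire rules in a topological order of the PDG; return all micro-B-step snapshots."""
    current = dict(snapshot)
    trace = [dict(current)]
    toggled: Set[str] = set()
    for name in TopologicalSorter(preds).static_order():
        guard, attr, value = rules[name]
        if attr in toggled:
            continue                      # toggle-once: attribute already changed
        if guard(current) and current.get(attr) != value:
            current[attr] = value         # exactly one status attribute changes
            toggled.add(attr)
            trace.append(dict(current))   # one micro-B-step per rule firing
    return trace


# Hypothetical rules: r1 -> r2 -> r3, where r3 would toggle s1 a second time.
rules: Dict[str, Rule] = {
    "r1": (lambda s: s["x"] > 0, "s1", True),
    "r2": (lambda s: bool(s["s1"]), "s2", True),
    "r3": (lambda s: bool(s["s2"]), "s1", False),
}
preds = {"r1": set(), "r2": {"r1"}, "r3": {"r2"}}
trace = run_b_step({"x": 1, "s1": False, "s2": False}, rules, preds)
```

In this example, `r3` is skipped because `s1` has already changed once within the B-step, so the trace contains the initial snapshot plus two micro-B-steps.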
Next, we show how the operational semantics of $\Gamma$ is also guaranteed in our pub/sub formulation. As provided in Section 4, the operational semantics of the pub/sub schema $\Pi$ is also formalized as a sequence of changes in a system snapshot, denoted by $\Sigma_i \overset{e,t,x}{\longmapsto} \Sigma_{i+1}$, implying that a single subscriber received and accepted event $e$.
LEMMA 2. The pub/sub operational semantics guarantees
in-order delivery of events between any pair of publisher and
subscriber (cf. Proof 3 in Appendix A).
COROLLARY 1. As a consequence of Lemma 2, the mapping $M$ under our pub/sub operational semantics guarantees in-order processing of external events (when all events are published from a single source).
Furthermore, our subscription mapping for PAC rules in the $\Gamma$ schema processes events with respect to the order of external events (published from a single source in both the $\Gamma$ and $\Pi$ schemas). This mapping also introduces the notion of an event-based pseudo-clock (Section 6) in order to achieve event-order serialization.
LEMMA 3. The mapping $M$ under the pub/sub operational semantics guarantees execution of subscriptions based on event-order serialization (cf. Proof 4 in Appendix A).
LEMMA 4. The mapping $M$ under our pub/sub operational semantics guarantees the PDG-based serialization of subscriptions (cf. Proof 5 in Appendix A).
With respect to the B-step execution, we also prove that the
pub/sub semantics satisfy the toggle-once property.
LEMMA 5. The mapping $M$ guarantees the toggle-once property of a B-step (cf. Proof 6 in Appendix A).
To prove the correctness of the overall execution of the pub/sub workflow formulation, we introduce the notion of a reachable system snapshot: the state of the system after executing a set of external events. The correctness of our model after processing a set of external events is then determined by comparing the information model (captured by the system snapshot) of the $\Gamma$ and $\Pi$ schemas. If the two snapshots are identical, then our workflow-to-pub/sub mapping is correct; otherwise, it is incorrect.
To compare $\Gamma$ and $\Pi$ system snapshots, denoted by $\Sigma^\Gamma$ and $\Sigma^\Pi$, respectively, we introduce two levels of equivalence, namely weak and strong equivalence. Without loss of generality, we
make the following simplification in the internal data model of a system snapshot for both $\Sigma^\Gamma$ and $\Sigma^\Pi$: we conceptualize $\Sigma^\Gamma$ and $\Sigma^\Pi$ as simply a collection of all data and status attributes given in the $\Gamma$ information model. In addition, in $\Sigma^\Pi$, we also employ a versioning mechanism for storing this collection, in which the version is advanced with respect to external events. Hence, through versioning in $\Sigma^\Pi$, values of data and status attributes are retained separately for each external event, while in $\Sigma^\Gamma$, only the latest version of the data and status attribute values is maintained.
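DEFINITION 10. The (partial) system snapshots $\Sigma^\Gamma$ and $\Sigma^\Pi$ are weakly equivalent up to event $e_i$, denoted by $\Sigma^\Gamma \Leftrightarrow_w \Sigma^\Pi$, iff the values of stable status attributes in both $\Sigma^\Gamma$ and $\Sigma^\Pi$ are equal.
$$\forall s \in \Sigma^\Gamma,\ \dot{s}^{\Gamma}_{\Sigma} \wedge \dot{s}^{\Pi}_{\Sigma} \rightarrow \tau(\Sigma^\Gamma, s) = \tau_i(\Sigma^\Pi, s)$$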
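DEFINITION 11. The (complete) system snapshots $\Sigma^\Gamma$ and $\Sigma^\Pi$ are strongly equivalent up to event $e_i$, denoted by $\Sigma^\Gamma \Leftrightarrow_s \Sigma^\Pi$, iff all status attributes in both $\Sigma^\Gamma$ and $\Sigma^\Pi$ are stable and equal.
$$\forall s \in \Sigma^\Gamma,\ \dot{s}^{\Gamma}_{\Sigma} \wedge \dot{s}^{\Pi}_{\Sigma} \wedge \tau(\Sigma^\Gamma, s) = \tau_i(\Sigma^\Pi, s)$$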
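The two equivalence notions differ only in whether stability is a precondition or a requirement, which the following Python sketch makes explicit. It is an illustration under assumptions: snapshots are modeled as dictionaries from status attribute names to values, the versioned $\Sigma^\Pi$ is modeled as a dictionary from event index to such a snapshot (so that reading version $i$ plays the role of $\tau_i$), and the sets of stable attributes are passed in explicitly. All function and variable names are hypothetical.

```python
from typing import Dict, Set

Snapshot = Dict[str, object]


def weakly_equivalent(gamma: Snapshot, pi_version: Snapshot,
                      stable_gamma: Set[str], stable_pi: Set[str]) -> bool:
    """Stable in both snapshots IMPLIES equal values."""
    return all(
        gamma[s] == pi_version.get(s)
        for s in gamma
        if s in stable_gamma and s in stable_pi
    )


def strongly_equivalent(gamma: Snapshot, pi_version: Snapshot,
                        stable_gamma: Set[str], stable_pi: Set[str]) -> bool:
    """Every attribute is stable in both snapshots AND has equal values."""
    return all(
        s in stable_gamma and s in stable_pi and gamma[s] == pi_version.get(s)
        for s in gamma
    )


# Gamma has finished the B-step for event 0; Pi has stabilized only s1 so far.
gamma = {"s1": True, "s2": False}
pi_versions = {0: {"s1": True, "s2": True}}
weak_mid = weakly_equivalent(gamma, pi_versions[0], {"s1", "s2"}, {"s1"})
strong_mid = strongly_equivalent(gamma, pi_versions[0], {"s1", "s2"}, {"s1"})

# Later, Pi also stabilizes s2 for event 0.
pi_versions[0]["s2"] = False
strong_end = strongly_equivalent(gamma, pi_versions[0], {"s1", "s2"}, {"s1", "s2"})
```

The intermediate state is weakly but not strongly equivalent, because `s2` is not yet stable on the pub/sub side; once it stabilizes with the same value, strong equivalence holds.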
LEMMA 6. Any reachable system snapshots $\Sigma^\Gamma_{e_i}$ and $\Sigma^\Pi_{e_i}$ for event $e_i$ are weakly equivalent (cf. Proof 7 in Appendix A).
LEMMA 7. The time complexity of the mapping $M: \Gamma \longrightarrow \Pi$ is linear w.r.t. the number of PAC rules and the size of the $\Gamma$ schema (cf. Proof 8 in Appendix A).
LEMMA 8. The number of subscriptions generated by the mapping $M: \Gamma \longrightarrow \Pi$ is linear w.r.t. the information model $I$ of the schema $\Gamma$ (cf. Proof 9 in Appendix A).
We are now in a position to prove the correctness of our mapping from the workflow formulation $\Gamma$ into the pub/sub abstraction $\Pi$.
THEOREM 1. The schema $\Gamma$ under the incremental formulation is equivalent to $\Pi$ in terms of the B-step operational semantics, which establishes the correctness of our mapping $M: \Gamma \longrightarrow \Pi$ (cf. Proof 10 in Appendix A).
7.1 Overhead in B-Step Execution
The number of B-steps that are executed in order to process the whole workflow depends on the characteristics of the application and is consequently not bounded by the workflow schema. To quantify the communication costs of our mapping, we focus on the number of events generated within a single B-step, which depends on the number of PAC rules (i.e., PDG nodes) that are fired as a result of an external event $e$ arriving (i.e., the ePDG).
In our mapping, subscriptions capture (1) PAC rule firing and (2) the bookkeeping mechanism. Once a subscription evaluates to true, events are generated and sent to other interested subscriptions (notification). Regarding (1), the number of events corresponds to all edges that are traversed in the ePDG (one event for each edge to trigger the next PAC rule) plus the number of nodes visited (one event for each state change is sent to the bookkeeping subscription). As the ePDG is acyclic, the upper bound for generated events representing PAC rule firings w.r.t. the ePDG is in
$$O\big(\overbrace{|E_e|}^{\text{PAC rule fire}} + \overbrace{|V_e|}^{\text{bookkeeping}}\big).$$
Fig. 5. Illustration of subscription assignment (P-sites P1, P2, P3; assigned subscriptions S1, S2, S3; unassigned subscriptions S4, S5, ..., Sn; edges depict each subscription's event flow).
For (2), the estimate of generated events is based on the same argument as for (1). A bookkeeping subscription propagates the new value of a status attribute (i.e., creates an event) every time it receives an update for this attribute. Within a B-step, the number of status changes is bounded by the ePDG, i.e., by the number of nodes that are possibly visited. Consequently, the upper bound for bookkeeping events, i.e., the cost of maintaining data consistency, w.r.t. the PDG, is $O(|V_e|)$. Altogether, the number of events that are generated for executing a B-step is in $O(|e\mathrm{PDG}|)$.
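As a rough worked count, the argument above can be turned into a small Python helper. The text only gives asymptotic bounds; the constants used here (exactly one event per traversed edge, one state-change event per visited node, and one propagated event per bookkeeping update) are assumptions made for illustration, and the function name `b_step_event_bound` is hypothetical.

```python
from typing import Set, Tuple


def b_step_event_bound(v_e: Set[str], e_e: Set[Tuple[str, str]]) -> int:
    """Upper bound on events in one B-step under the one-event-per-item assumption."""
    rule_firing = len(e_e)      # (1) one event per ePDG edge triggers the next rule
    state_changes = len(v_e)    # (1) one update per visited node goes to bookkeeping
    propagation = len(v_e)      # (2) bookkeeping propagates each received update
    return rule_firing + state_changes + propagation


# Hypothetical ePDG with four nodes and four edges.
nodes = {"s1", "s2", "s3", "s4"}
edges = {("s1", "s2"), ("s1", "s3"), ("s2", "s4"), ("s3", "s4")}
bound = b_step_event_bound(nodes, edges)   # 4 + 4 + 4 = 12
```

The result grows linearly with $|V_e| + |E_e|$, which is consistent with the $O(|e\mathrm{PDG}|)$ bound stated above.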
8 FOUNDATION FOR DISTRIBUTION
The mapping of the data-centric workflow schema to the pub/sub schema serves to enable the robust distribution and parallel execution of each workflow element. In general, workflow elements comprise the individual tasks, the transitions among them, their input/output parameters, and user roles. More specifically, in a data-centric workflow, rules are the fundamental element and capture both task invocations and transitions, together with their relevant I/O parameters. In this sense, workflow distribution can be seen as the grouping of a set of rules, or of their mapped subscription counterparts, over a loosely coupled and distributed system. What remains unaddressed by the mapping is how to determine the actual grouping of these workflow elements in the various processing sites within the pub/sub system. Another important property of workflow element grouping lies in the ability to easily move subscriptions among pub/sub processing nodes in order to achieve higher-level functionalities such as load balancing, replication, and availability.
One can imagine two extreme possibilities of grouping: either every group consists of a single subscription (where each subscription is derived as shown in Sections 5 and 6), or all subscriptions are placed into a single processing site. The former approach achieves the highest level of parallelism (in the sense of distributed execution), but suffers substantially from the increased event traffic needed to coordinate and share data among elements across various processing sites. The latter approach becomes sequential (in the sense of centralized execution), but requires no event traffic for interactions among the various elements.
Our goal is to lay a foundation that enables us to study the
distribution of the workflow at various granularity levels in
order to minimize an objective function, e.g., network traffic,
while satisfying additional real-world (hard) constraints, e.g.,
compliance requirements: enforcing parts of a workflow to be
completed in a particular geographical region, requiring that
data must reside in a particular region, or following a licensing
model that charges for shipping data which indirectly forces the
execution to be as close as possible to the data. We formulate
the workflow distribution in terms of a portable execution unit
that can be carried out in a single processing site (P-site). Thus, a P-site is a processing site deployed at a given geographical location that is responsible for executing a set of elements of a given workflow such that the P-site minimizes the objective function and satisfies a given set of hard constraints.
We formally define the workflow distribution problem as follows: given a set of P-sites $P_i$ and a partial assignment of a subset of subscriptions to each of the P-sites (hard constraints), determine a complete assignment such that the network traffic among the set of P-sites is minimized. Furthermore, we require that P-sites are disjoint, namely, a single subscription cannot be assigned to more than one P-site. The solution to our problem is a complete subscription-to-P-site assignment. Clearly, any arbitrary assignment that extends the partial assignment is a solution, but not necessarily one that minimizes the objective function; hence, it is not necessarily an optimal solution. Figure 5 illustrates an instance of our assignment problem, in which we have three P-sites, $P_1$, $P_2$, and $P_3$, each of which is assigned one subscription, and a set of unassigned subscriptions $S_4 \cdots S_n$. Without loss of generality, we also combine subscriptions with polarity (positive and negative) into a single subscription, i.e., $S = \{S^{\oplus}, S^{\ominus}\}$.
This assignment problem is formulated as an undirected weighted graph $G = (E, V)$, where each subscription $S_i$ is represented by a vertex $v_i$, and there is an edge between two vertices $v_i$ and $v_j$ iff the subscription $S_j$ is interested in events generated by subscription $S_i$ or vice versa. Also, we have a set of colors, $C = \{c_1, \cdots, c_k\}$, where each $c_i$ corresponds to the P-site $P_i$. Consequently, the partial coloration (i.e., partial assignment) of the subset of vertices in $G$, $V' \subseteq V$, is given by the mapping function:
$$\chi: V' \longrightarrow C. \quad (19)$$
Moreover, we need a cost function to capture the communication cost between two subscriptions (relative to the size of data and protocol messages). Thus, each edge $(v_i, v_j)$ of the graph reflects the communication cost flowing between $v_i$ and $v_j$. The cost of data flow is given by:
$$C_\Delta: E(G) \longrightarrow \mathbb{R}^+. \quad (20)$$
Likewise, the protocol cost is given by:
$$C_\pi: E(G) \longrightarrow \mathbb{R}^+. \quad (21)$$
Finally, the total cost is given by the function $C_z$:
$$C_z = (C_\Delta + C_\pi)(E(G)), \quad (22)$$
which is simply computed by summing the data and protocol costs. Therefore, under this formulation, the objective of our assignment problem is to provide a complete coloration of graph $G$ while minimizing $C_z$:
$$\hat{\chi}: V(G) \longrightarrow C, \quad (23)$$
where $\hat{\chi}(v) = \chi(v)$ for all $v \in V'$, such that the sum of the weights, given by $C_z$, of all edges whose endpoints are not of the same color is minimized. Essentially, the complete graph coloration results in a complete assignment, which in turn partitions the graph into $k$ disjoint sets of vertices such that each set is assigned to a P-site.
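The objective can be illustrated with a short Python sketch that evaluates the cost of a complete coloration and, for a tiny instance, finds the best completion of a partial coloration by exhaustive search. The exhaustive search is only a baseline for illustration (it is exponential in the number of unassigned subscriptions) and is not a method proposed in the text. The function names, the edge costs, and the instance itself are hypothetical; each edge carries a pair of (data cost, protocol cost).

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Costs = Dict[Edge, Tuple[float, float]]   # edge -> (data cost, protocol cost)


def cut_cost(edges: Costs, coloring: Dict[str, str]) -> float:
    """Sum data + protocol cost over edges whose endpoints lie in different P-sites."""
    return sum(d + p for (u, v), (d, p) in edges.items()
               if coloring[u] != coloring[v])


def best_completion(vertices: List[str], edges: Costs, partial: Dict[str, str],
                    colors: List[str]) -> Tuple[float, Dict[str, str]]:
    """Try every assignment of the unassigned vertices; keep the cheapest one."""
    free = [v for v in vertices if v not in partial]
    best_cost, best_coloring = float("inf"), dict(partial)
    for choice in product(colors, repeat=len(free)):
        coloring = {**partial, **dict(zip(free, choice))}   # extends the partial coloring
        cost = cut_cost(edges, coloring)
        if cost < best_cost:
            best_cost, best_coloring = cost, coloring
    return best_cost, best_coloring


# Three P-sites with one pre-assigned subscription each; S4 and S5 are unassigned.
vertices = ["S1", "S2", "S3", "S4", "S5"]
edges: Costs = {("S1", "S4"): (3, 1), ("S2", "S4"): (1, 0), ("S3", "S5"): (2, 1),
                ("S4", "S5"): (1, 1), ("S2", "S5"): (1, 0)}
partial = {"S1": "P1", "S2": "P2", "S3": "P3"}
cost, coloring = best_completion(vertices, edges, partial, ["P1", "P2", "P3"])
```

For this instance the cheapest completion places `S4` with `S1` and `S5` with `S3`, leaving a total inter-site cost of 4.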
The intractability of our graph coloration formulation is shown
by reducing the well-known multiway cut (a.k.a., multiterminal
cut) problem [11], [31] to the graph coloration.
DEFINITION 12. Given an undirected weighted graph $G = (E, V)$ and a set of terminals $S = \{s_1, \cdots, s_k\} \subseteq V$, a multiway cut is defined as a set of edges whose removal disconnects the terminals from each other. The multiway cut problem asks for a minimum weight edge set whose removal disconnects the terminals.
The problem of computing the minimum weight multiway cut is NP-hard for any fixed $k > 2$ [11]. For $k = 2$, the problem is tractable and can be solved optimally using the standard max-flow, min-cut algorithm. Furthermore, for $k \geq 3$, there exists a greedy algorithm with a $2 - \frac{2}{k}$ approximation ratio [31]. This greedy algorithm [31] consists of two phases:
1) For each $i = 1 \cdots k$, compute a minimum weight isolating cut $I_i$ for $s_i$. This cut is computed optimally using the max-flow algorithm by constructing a new instance of the min-weight cut problem that consists of only two terminals, namely, $s_i$ and $S - \{s_i\}$.
2) Discard the maximum weight cut $I_j$ and output the union of the rest, denoted by:
$$I = \bigcup_{i = 1 \cdots k,\ i \neq j} I_i.$$
$I$ disconnects any pair of terminals; hence, it is a multiway cut.
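The two phases above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the max-flow step uses a plain Edmonds-Karp routine written for this example, the other terminals are merged by connecting them to a hypothetical super-sink named `_sink` with infinite capacity, and the graph, weights, and function names are all made up for the example.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]
SINK = "_sink"   # hypothetical super-sink; assumed not to clash with a vertex name


def source_side(cap, s: str, t: str) -> Set[str]:
    """Edmonds-Karp max-flow; return vertices reachable from s in the final residual graph."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue:                                  # BFS over positive residual edges
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)                        # no augmenting path left
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push


def greedy_multiway_cut(weights: Dict[Edge, float],
                        terminals: List[str]) -> Tuple[Set[Edge], float]:
    def cut_weight(cut: Set[Edge]) -> float:
        return sum(weights[e] for e in cut)

    cuts = []
    for s in terminals:                               # Phase 1: isolating cuts
        cap = defaultdict(lambda: defaultdict(float))
        for (u, v), w in weights.items():             # undirected edge = two arcs
            cap[u][v] += w
            cap[v][u] += w
        for t in terminals:
            if t != s:
                cap[t][SINK] = float("inf")           # merge the other terminals
        side = source_side(cap, s, SINK)
        cuts.append({e for e in weights if (e[0] in side) != (e[1] in side)})
    cuts.remove(max(cuts, key=cut_weight))            # Phase 2: drop the heaviest cut
    result = set().union(*cuts)
    return result, cut_weight(result)


weights = {("S1", "S4"): 4, ("S2", "S4"): 1, ("S3", "S5"): 2,
           ("S4", "S5"): 2, ("S2", "S5"): 1}
cut, total = greedy_multiway_cut(weights, ["S1", "S2", "S3"])
```

On this small instance the isolating cuts have weights 3, 2, and 2; discarding the heaviest leaves a multiway cut of weight 4, after which `S2` and `S3` are each isolated and `S1` keeps `S4` and `S5`.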
The multiway cut problem can be reduced to our graph
coloration problem. We transform the multiway cut problem by
assigning each terminal vertex to a different color (i.e., partial
color assignment), and we assign a color to each non-terminal
vertex (i.e., complete color assignment), where the color is
chosen from the set of colors used for the terminal vertices,
such that we minimize the edge cut between vertices of different
colors. Hence, there exists a polynomial reduction of the
classical multiway cut problem to our graph coloration problem
(i.e., colored multiway cut).
THEOREM 2. The colored multiway cut problem is NP-hard and can be solved within a $2 - \frac{2}{k}$ approximation.
PROOF 1. The proof follows from reducing the known multiway cut problem to the colored multiway cut problem.
In summary, we formalized the general workflow distribution problem over the pub/sub abstraction as the colored multiway cut problem. We showed that colored multiway cut is intractable, but that there exists a constant-factor approximation for solving it. From a theoretical perspective, it is interesting to employ a more complex communication cost function in the workflow distribution, one that collapses all edges leaving the subscription $S_i$ toward interested subscriptions residing in the same remote P-site, because it is sufficient to transmit a message once from $S_i$ to each P-site that hosts interested subscriptions. The collapsing of edges and the extension of the problem to a directed graph, instead of an undirected graph, lead to new challenges for future research. However, these new restrictions do not affect the hardness of the problem, namely, the problem remains intractable.
9 CONCLUSIONS
In this work, we developed the theoretical foundation for the safe
distribution and the parallel execution of data-centric workflows
over the decoupled and distributed pub/sub abstraction. To this
end, we made the following contributions: we developed a
polynomial-time mapping of data-centric workflows into the
pub/sub abstraction to achieve distributed and parallel workflow
execution; we proved the correctness of our mapping through
an equivalence of reachable system snapshots; and we proved
the hardness of the optimal workflow distribution problem over
the pub/sub abstraction, and, finally, we employed a greedy
algorithm with a constant factor approximation for this problem.
Moreover, we presented the foundation for a reactive pub/sub
middleware by extending the matching condition for an ordering
property and introducing a notification mechanism.
REFERENCES
[1]
S. Abiteboul, O. Benjelloun, and T. Milo. The Active XML Project: An
Overview. VLDB J., 17(5):1019–1040, 2008.
[2]
S. Abiteboul, P. Bourhis, A. Galland, and B. Marinoiu. The AXML
Artifact Model. In TIME, pages 11–17, 2009.
[3]
G. Alonso, G. Alonso, C. Mohan, D. Agrawal, A. E. Abbadi, R. Gunthor,
D. Agrawal, A. E. Abbadi, and M. Kamath. Exotica/FMQM: A Persistent
Message-Based Architecture for Distributed Workflow Management. In
IFIP, pages 1–18, 1995.
[4]
A. Avanes and J. C. Freytag. Adaptive Workflow Scheduling under
Resource Allocation Constraints and Network Dynamics. PVLDB,
1(2):1631–1637, 2008.
[5]
T. Bauer and P. Dadam. Efficient Distributed Workflow Management
Based on Variable Server Assignments. In CAiSE, pages 94–109, 2000.
[6]
K. Bhattacharya, N. S. Caswell, S. Kumaran, A. Nigam, and F. Y.
Wu. Artifact-centered operational modeling: Lessons from customer
engagements. IBM Systems Journal, 46(4):703–721, 2007.
[7]
D. Calvanese, G. De Giacomo, and M. Montali. Foundations of data-aware
process analysis: a database theory perspective. In PODS, pages 1–12,
2013.
[8]
F. Casati and A. Discenza. Modeling and Managing Interactions among
Business Processes. Journal of Systems Integration, 10(2):145–168, 2001.
[9]
T. Chao, D. Cohn, A. Flatgard, S. Hahn, M. Linehan, P. Nandi, A. Nigam,
F. Pinel, J. Vergo, and F. Wu. Artifact-Based Transformation of IBM
Global Financing. In BPM, volume 5701 of Lecture Notes in Computer
Science, pages 261–277. Springer Berlin Heidelberg, 2009.
[10]
D. Cohn, P. Dhoolia, F. T. Heath, F. Pinel, and J. Vergo. Siena: From
PowerPoint to Web App in 5 Minutes. In ICSOC, pages 722–723, 2008.
[11]
E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and
M. Yannakakis. The Complexity of Multiway Cuts (Extended Abstract).
In STOC, pages 241–251, 1992.
[12]
E. Damaggio, R. Hull, and R. Vaculín. On the Equivalence of Incremental
and Fixpoint Semantics for Business Artifacts with Guard-Stage-Milestone
Lifecycles. Information Systems, 38(4):561–584, 2013.
[13]
M. Dumas. On the Convergence of Data and Process Engineering. In
ADBIS, pages 19–26, 2011.
[14] Great Britain. Data Protection Act. 1998.
[15]
R. Hull, E. Damaggio, F. Fournier, M. Gupta, F. T. Heath, S. Hobson,
M. H. Linehan, S. Maradugu, A. Nigam, P. Sukaviriya, and R. Vaculín.
Introducing the Guard-Stage-Milestone Approach for Specifying Business
Entity Lifecycles. In WS-FM, pages 1–24, 2010.
[16]
R. Hull, E. Damaggio, R. D. Masellis, F. Fournier, M. Gupta, F. T. Heath,
S. Hobson, M. H. Linehan, S. Maradugu, A. Nigam, P. N. Sukaviriya, and
R. Vaculín. Business Artifacts with Guard-Stage-Milestone Lifecycles:
Managing Artifact Interactions with Conditions and Events. In DEBS,
pages 51–62, 2011.
[17]
R. Hull and J. Su. NSF Workshop on Data-Centric Workflows (2009).
http://dcw2009.cs.ucsb.edu/report.pdf.
[18]
V. Künzle and M. Reichert. PHILharmonicFlows: Towards a Framework
for Object-aware Process Management. Journal of Software Maintenance,
23(4):205–244, 2011.
[19]
G. Li, V. Muthusamy, and H. Jacobsen. A Distributed Service-oriented
Architecture for Business Process Execution. TWEB, 4(1), 2010.
[20]
H. D. Man. An Approach to Case Based Management Case Management:
Cordys Approach, 2009.
[21]
M. Marin, R. Hull, and R. Vaculín. Data Centric BPM and the Emerging
Case Management Standard: A Short Survey. In BPM, pages 24–30, 2012.
[22]
P. Muth, D. Wodtke, J. Weißenfels, A. K. Dittrich, and G. Weikum. From
Centralized Workflow Specification to Distributed Workflow Execution.
J. Intell. Inf. Syst., 10(2):159–184, 1998.
[23]
M. G. Nanda, S. Chandra, and V. Sarkar. Decentralizing Execution of
Composite Web Services. In OOPSLA, pages 170–187, 2004.
[24]
A. Nigam and N. S. Caswell. Business Artifacts: An Approach to
Operational Specification. IBM Systems Journal, 42(3):428–445, 2003.
[25]
OMG. Case Management Model and Notation (CMMN).
http://www.omg.org/spec/CMMN/1.0, May 2014.
[26]
G. Redding, M. Dumas, A. H. M. ter Hofstede, and A. Iordachescu.
Modelling Flexible Processes with Business Objects. In CEC, pages
41–48, 2009.
[27]
C. Schuler, H. Schuldt, and H. Schek. Supporting Reliable Transactional
Business Processes by Publish/Subscribe Techniques. In TES, pages
118–131, 2001.
[28]
J. K. Strosnider, P. Nandi, S. Kumaran, S. P. Ghosh, and A. Arsnajani.
Model-driven Synthesis of SOA Solutions. IBM Systems Journal,
47(3):415–432, 2008.
[29]
Y. Sun, R. Hull, and R. Vaculín. Parallel Processing for Business Artifacts
with Declarative Lifecycles. In OTM, pages 433–443, 2012.
[30]
W. M. P. van der Aalst, M. Weske, and D. Grünbauer. Case handling: a new
paradigm for business process support. Data Knowl. Eng., 53(2):129–162,
2005.
[31]
V. V. Vazirani. Approximation Algorithms. Springer-Verlag New York,
Inc., New York, NY, USA, 2001.
[32]
D. Wodtke, J. Weißenfels, G. Weikum, and A. K. Dittrich. The Mentor
Project: Steps Toward Enterprise-Wide Workflow Management. In ICDE,
pages 556–565, 1996.
[33]
K. Zhang, V. Muthusamy, and H. Jacobsen. Total Order in Content-Based
Publish/Subscribe Systems. In ICDCS, pages 335–344, 2012.
Mohammad Sadoghi
joined IBM T.J. Watson
in 2012. He received his Ph.D from the
Computer Science department at the University
of Toronto in 2013. Mohammad was the
recipient of the Ontario Graduate Scholarship
(2006-2007) and the NSERC Canada
Graduate Scholarship (2007-2008, 2009-2011).
Mohammad’s research focuses on high-
performance big data analytics in the context
of designing novel data structures and (parallel)
algorithms and utilizing modern hardware
advances, especially multi-core computing, hardware accelerators (e.g.,
FPGA/GPU), and solid state devices (e.g., flash and phase change
memory). Lastly, he is interested in rethinking database system design
for future hardware by reshaping the transaction and storage model.
Martin Jergler
is a doctoral candidate at the Chair for Application and Middleware Systems at Technische Universität München, Germany.
He received his B.Sc. (2009) in Internet
Computing and M.Sc. (2012) in Computer
Science from the Department of Informatics
and Mathematics at the University of Passau,
Germany. His current research interests revolve
around distributed data management and
event-processing. These include pub/sub
middleware, service-oriented architectures,
data-centric workflows and case management.
Hans-Arno Jacobsen
is a professor of Computer Engineering and Computer Science and directs the activities of the Middleware Systems Research Group. He conducts research at the intersection of distributed systems and data management, with particular focus on middleware systems, event processing, and cyber-physical systems (e.g., smart power grids). After studying and completing his Ph.D. in Germany, France, and the U.S., he engaged in post-doctoral research at INRIA near Paris before moving to the University of Toronto in 2001. In 2011, he was awarded the Alexander von Humboldt-Professorship to engage in research at the Technische Universität München, Germany.
Richard Hull
received his Ph.D. in Mathematics
from the University of California, Berkeley.
Currently, he is a Senior Researcher at IBM
Research, T.J. Watson. In the past years he
worked on foundations and systems issues
in databases, workflow, and web services.
Specific directions have included integrity
constraints, semantic database models, query
languages, database programming languages,
converged services, personalization, and
semantic web services. He is co-author of
the book Foundations of Databases (Addison-Wesley, 1995), and is
(co-)author of over 100 refereed journal, conference, and workshop
articles. He has been an ACM Fellow since 2007 and a Bell Labs Fellow since 2005.
Roman Vaculín
received his Ph.D. in Computer
Science from the Charles University in Prague
and is currently a Research Staff Member at
IBM Research, T.J. Watson. Previously, he
worked as a researcher at the Institute of
Computer Science, Academy of Sciences of
the Czech Republic, as a postdoctoral research
fellow at the Agent Technology Center, Czech
Technical University, and as a Fulbright Scholar
at Carnegie Mellon University. His research
interests are in service oriented systems,
services research, business processes, and distributed systems.
APPENDIX A
EXTENDED MAPPING ANALYSIS
LEMMA 1. The $\Gamma$ incremental semantics guarantees the in-order processing of external events (when all events are published from a single source). Hence, the B-step execution (i.e., PAC rule firing) follows the event-order serialization.
PROOF 2 (proof of Lemma 1). The proof follows directly from the $\Gamma$ incremental semantics, in which external events are consumed in order and each consumed event (potentially) triggers a B-step that is executed atomically [12].
LEMMA 2. The pub/sub operational semantics guarantees in-order delivery of events between any pair of publisher and subscriber.
PROOF 3 (proof of Lemma 2). The properties are direct consequences of our pub/sub definition in Section 4 (cf. Definition 2).
LEMMA 3. The mapping $M$ under the pub/sub operational semantics guarantees execution of subscriptions based on event-order serialization.
PROOF 4 (proof of Lemma 3). This follows from the subscription condition $\psi_{i,\mathit{PseudoClock}}$, which enforces that subscriptions are processed based on the order of external events. The condition $\psi_{i,\mathit{PseudoClock}}$ ensures that a notification for event $e_i$ is generated only if all notifications for events $e_0 \cdots e_{i-1}$ have already been generated.
LEMMA 4. The mapping $M$ under our pub/sub operational semantics guarantees the PDG-based serialization of subscriptions.
PROOF 5 (proof of Lemma 4). The PDG-based serialization of a micro-B-step (single PAC rule) and a subscription (super PAC rule) is satisfied in the pub/sub semantics because the topological sort order of the PDG is directly encoded in the subscription's condition ($\psi_{\mathit{PDG}}$), which enforces that a subscription is evaluated only after all attributes have stabilized.
LEMMA 5. The mapping $M$ guarantees the toggle-once property of a B-step.
PROOF 6 (proof of Lemma 5). The toggle-once property of a B-step is achieved by the PAC rule design, namely, by the relationship between a PAC rule's prerequisite ($\pi$) and consequent ($\gamma$): roughly speaking, $\pi \approx \neg\gamma$, and $\pi$ is always evaluated w.r.t. a system snapshot taken at the outset of a B-step, after consuming the external event. This relation is also encoded in our subscription definition given by $M_{\pi}$.
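The role of the outset snapshot can be made concrete with a small sketch. The Python code below is a simplified illustration under stated assumptions, not the paper's mapping: the function `run_b_step` and the rule encoding `(attribute, new_value, antecedent)` are hypothetical, and the prerequisite is taken to be exactly the negation of the consequent.

```python
def run_b_step(snapshot, rules):
    """Fire simplified PAC-style rules once, in the given order."""
    outset = dict(snapshot)    # frozen snapshot taken at the start of the B-step
    current = dict(snapshot)   # working snapshot updated by rule firings
    fired = []
    for attr, new_value, antecedent in rules:
        # Prerequisite (pi ~ not gamma) is checked against the OUTSET snapshot,
        # so an attribute cannot be toggled back within the same B-step.
        prerequisite = outset[attr] != new_value
        if prerequisite and antecedent(current):
            current[attr] = new_value
            fired.append((attr, new_value))
    return current, fired


rules = [
    ("S", True, lambda cur: True),       # open stage S
    ("m", True, lambda cur: cur["S"]),   # achieve milestone m once S is open
    ("S", False, lambda cur: cur["m"]),  # close S once m is achieved
]
result, fired = run_b_step({"S": False, "m": False}, rules)
```

In this run the third rule does not fire: its antecedent holds, but its prerequisite fails because `S` was closed at the outset, so `S` toggles only once in the B-step.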
LEMMA 6. Any reachable system snapshots $\Sigma^{\Gamma}_{e_i}$ and $\Sigma^{\Pi}_{e_i}$ for event $e_i$ are weakly equivalent.
PROOF 7 (proof of Lemma 6). The only means to change data and status attributes are external events and the firing of PAC rules, respectively. Data attributes are changed through external events; since both the $\Gamma$ and $\Pi$ semantics follow event-based serialization, changes to data attributes must be consistent under both formulations. The status attributes are changed through PAC rules fired within the scope of each external event; again, we showed that under both the $\Gamma$ and $\Pi$ formulations, PAC rules follow PDG-based serialization. Moreover, the toggle-once property can be emulated under our pub/sub formulation. The toggle-once property is essential to avoid the infinite firing of PAC rules within a B-step, thus achieving a finite number of micro-B-steps in a B-step. As desired, both $\Gamma$ and $\Pi$ result in the firing of PAC rules and of the corresponding subscriptions derived from these PAC rules in an identical order. Hence, the values of the status attributes are also guaranteed to be identical.
Moreover, the pub/sub operational semantics enables the concurrent processing of external events in accordance with the PDG topological sort order. Suppose the topological sort consists of a number of levels, where each level is associated with a set of PAC rules, i.e., subscriptions. As the external event $e_i$ propagates through each level, all status attributes associated with visited levels are stabilized and remain unaffected by the execution of subsequent levels of PAC rules. Therefore, provided that attribute versioning is in place, the new external event $e_{i+1}$ can process the subscriptions that fall in levels $l_1$ to $l_{j-1}$ while $e_i$ is processing level $l_j$. It follows by induction that as soon as one level of the topological sort order has been processed by one event, that level is ready to accept the subsequent event. Hence, our pub/sub operational semantics is capable of processing many events in parallel while satisfying the event-based and PDG-based serialization requirements.
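The level-wise pipelining argument can be visualized with a toy schedule. The Python sketch below is illustrative only: the function `pipeline_schedule` is a hypothetical helper, and it assumes that every level takes one time step and that attribute versioning is in place, so that each event trails its predecessor by exactly one level.

```python
def pipeline_schedule(num_events, num_levels):
    """Return {time_step: [(event, level), ...]} for a level-by-level pipeline."""
    schedule = {}
    for event in range(num_events):
        for level in range(num_levels):
            step = event + level   # e_{i+1} enters a level one step after e_i
            schedule.setdefault(step, []).append((event, level))
    return schedule


schedule = pipeline_schedule(num_events=3, num_levels=4)
total_steps = len(schedule)        # pipelined: num_events + num_levels - 1
sequential_steps = 3 * 4           # one event at a time: num_events * num_levels
```

At every time step each level is occupied by at most one event, and events enter every level in their original order, so both serialization requirements are respected while several events are in flight.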
LEMMA 7. The time complexity of the mapping $M: \Gamma \longrightarrow \Pi$ is linear w.r.t. the number of PAC rules and the size of the $\Gamma$ schema.
PROOF 8 (proof of Lemma 7). We construct a set of application-
specific and generic conditions by iterating over each PAC rule
exactly once. In addition, we construct a generic condition for
every status attribute in Γ.
LEMMA 8. The number of subscriptions generated by the mapping $M: \Gamma \longrightarrow \Pi$ is linear w.r.t. the information model $I$ of the schema $\Gamma$.
PROOF 9 (proof of Lemma 8). For every status attribute $s \in S \subseteq I$, the mapping $M$ generates three subscriptions (i.e., $S_{\oplus s}$, $S_{\ominus s}$, and $S_{s}$). For each data attribute $d \in D \subset I$, there is a single subscription $S_{d}$. In addition, the mapping produces the two generic subscriptions $S_{source}$ and $S_{sink}$. Altogether, this results in $3 \cdot |S| + |D| + 2$, i.e., $O(I)$ subscriptions.
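As a quick sanity check of the count, the formula can be evaluated for the example process. In the Python sketch below, the twelve status attributes are the stages and milestones that appear in the PAC rules of Appendix D; the two data attribute names are purely hypothetical placeholders, since the data attributes are not listed in this appendix.

```python
def subscription_count(status_attributes, data_attributes):
    """Three subscriptions per status attribute, one per data attribute, two generic."""
    return 3 * len(status_attributes) + len(data_attributes) + 2


status_attributes = [
    "RA", "ED", "LR", "ECR", "PE",                    # stages
    "RA:ap", "ED:cp", "ED:sp", "ECR:ev",              # milestones
    "PE:pp", "PE:sp", "LR:cp",
]
data_attributes = ["orderId", "customer"]             # hypothetical placeholders

count = subscription_count(status_attributes, data_attributes)
```

With 12 status attributes and 2 assumed data attributes, the mapping would produce $3 \cdot 12 + 2 + 2 = 40$ subscriptions.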
THEOREM 1. The schema $\Gamma$ under the incremental formulation is equivalent to $\Pi$ in terms of the B-step operational semantics, which establishes the correctness of our mapping $M: \Gamma \longrightarrow \Pi$.
PROOF 10 (proof of Theorem 1). The necessary steps in proving the correctness of the workflow mapping in terms of the system snapshot reachability condition are summarized as follows: (1) Event-based serialization of B-steps (firing all relevant PAC rules) and execution of subscriptions (a super PAC rule) (cf. Lemmas 1, 3). (2) PDG-based serialization of a micro-B-step (firing a single PAC rule) and execution of a subscription (a super PAC rule) (cf. Lemma 4). (3) The toggle-once property of a B-step is satisfied in our pub/sub semantics (cf. Lemma 5). (4) Weak equivalence of any reachable partial system snapshot $\Sigma^{\Gamma}_{e_i}$ and $\Sigma^{\Pi}_{e_i}$ for any event $e_i$ (cf. Lemma 6). (5) Strong equivalence of any reachable complete system snapshot $\Sigma^{\Gamma}_{e_i}$ and $\Sigma^{\Pi}_{e_i}$ for any event $e_i$ (cf. Lemma 6).
APPENDIX B
TEMPLATES FOR PAC RULES
The six templates for deriving the PAC rules from a GSM model
are depicted in Table 4.
| Rule | Basis | Prerequisite | Antecedent | Consequent |
|---|---|---|---|---|
| PAC-1 | Guard: for each stage $S$, for each guard $\varphi$ of $S$. (Include term active $S'$ if $S'$ is parent of $S$.) | $\neg S$ | $\varphi \wedge S'$ | $\oplus S$ |
| PAC-2 | Milestone achiever: for each milestone $m$ of stage $S$ with achieving sentry $\varphi$. | $S$ | $\varphi$ | $\oplus m$ |
| PAC-3 | Milestone invalidator: for each milestone $m$ of stage $S$ with invalidating sentry $\varphi$. | $m$ | $\varphi$ | $\ominus m$ |
| PAC-4 | Guard invalidating milestone: for each guard $\varphi$ of a stage $S$, for each milestone $m$ not occurring in a top-level conjunct $\neg m$. (Include term $S'$ if $S'$ is parent of $S$.) | $m$ | $\varphi \wedge S'$ | $\ominus m$ |
| PAC-5 | For each milestone $m$ of a stage $S$. | $S$ | $\oplus m$ | $\ominus S$ |
| PAC-6 | For each stage $S$ child of $S'$. | $S$ | $\ominus S'$ | $\ominus S$ |

TABLE 4
Templates for PAC rules associated with a GSM model $\Gamma$. Table adopted from [12].
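To illustrate how the two purely structural templates are instantiated, the following Python sketch generates PAC-5 and PAC-6 rules for the example process. It is a rough illustration under assumptions: the function names `pac5` and `pac6` and the tuple encoding `(prerequisite, antecedent, consequent)` are hypothetical, the stage hierarchy and milestones are read off the rules listed in Appendix D, and the ASCII prefixes `+` and `-` stand for the two status-change polarities.

```python
# Milestones per stage and parent relation, as they appear in the Appendix D rules.
stages = {
    "RA": ["ap"],
    "ED": ["cp", "sp"],
    "LR": ["cp"],
    "ECR": ["ev"],
    "PE": ["pp", "sp"],
}
parent = {"ECR": "LR", "PE": "LR"}


def pac5(stages):
    """PAC-5: an open stage S closes when one of its milestones m is achieved."""
    return [(s, f"+{s}:{m}", f"-{s}") for s, ms in stages.items() for m in ms]


def pac6(parent):
    """PAC-6: an open child stage S closes when its parent S' closes."""
    return [(child, f"-{par}", f"-{child}") for child, par in parent.items()]


pac5_rules = pac5(stages)
pac6_rules = pac6(parent)
```

For this model the sketch yields seven PAC-5 rules and two PAC-6 rules, which matches the number of rules listed for these two templates in Appendix D.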
APPENDIX C
POLARIZED DEPENDENCY GRAPH
The following algorithm for constructing the polarized dependency graph (PDG) is adopted from [12] and included here to make this paper self-contained.
The polarized dependency graph for a GSM model $\Gamma = (I, R)$, denoted $PDG(\Gamma)$, is constructed as follows: For each status attribute $s$ in $\Gamma$, there are two nodes $\langle +, s \rangle$ and $\langle -, s \rangle$. For each stage $S$ and each of its guards $\varphi$, there is a node $\langle +, S.\varphi \rangle$. For the description of the edges of $PDG(\Gamma)$, the antecedent $\alpha$ of a PAC rule is written as $\tau \wedge \gamma$, where $\tau$ is either empty, or has the form $\mathit{latestIncEvent} = E$ for some external event type $E$, or has the form $\odot s$, i.e., a status event, for some status attribute $s$, where $\odot \in \{\oplus, \ominus\}$, and $\gamma$ contains no external event types or status events.
1) For each PAC-1 rule $\langle \neg s, \tau \wedge \gamma, \oplus s \rangle \in R$:
   • If $\hat{a}$ is a toggling status attribute occurring in $\tau$, include a directed edge $(\hat{a}, +S.\varphi)$.
   • If status attribute $a$ occurs in $\gamma$, include two directed edges $(+a, +S.\varphi)$ and $(-a, +S.\varphi)$.
2) For each guard $\varphi$ of stage $S$ represented by status attribute $s$:
   • Add edge $(+S.\varphi, +s)$.
   • For each milestone $m$ owned by $S$ that does not occur in a top-level conjunct of form $\neg m$ in $\gamma$, add the edge $(+S.\varphi, -m)$.
3) For each PAC rule $\langle \pi, \tau \wedge \gamma, \odot s \rangle \in R$ from templates PAC-2 or PAC-3 (with polarity $\odot \in \{\oplus, \ominus\}$):
   • If $\hat{a}$ is a toggling status attribute occurring in $\tau$, include a directed edge $(\hat{a}, \odot s)$.
   • If status attribute $a$ occurs in $\gamma$, include two edges $(+a, \odot s)$ and $(-a, \odot s)$.
4) For each PAC-5 rule $\langle s, \oplus m, \ominus s \rangle \in R$, where $s$ represents a stage $S$, add edge $(+m, -s)$.
5) For each PAC-6 rule $\langle s, \ominus s', \ominus s \rangle \in R$, where $s'$ represents the stage $S'$ that is the parent of the stage $S$ represented by $s$, add edge $(-s', -s)$.
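The construction steps above can be sketched in a few lines of Python. This is a simplified illustration under assumptions rather than the algorithm of [12] verbatim: the `Rule` record, the function `build_pdg`, and its parameters `milestones_of` and `negated_in_guard` are hypothetical, polarized nodes are encoded as pairs such as `("+", "S")`, and `graphlib.TopologicalSorter` (Python 3.9 or later) is used only to show one valid serialization order.

```python
from collections import namedtuple
from graphlib import TopologicalSorter

# tau: None, an external event name (str), or a status event such as ("+", "m").
# gamma: status attributes read by the condition part of the antecedent.
# consequent: polarized attribute such as ("+", "S"); guard: guard label for PAC-1.
Rule = namedtuple("Rule", "template tau gamma consequent guard")


def build_pdg(rules, milestones_of, negated_in_guard):
    """Return the set of PDG edges (source_node, target_node)."""
    edges = set()
    for r in rules:
        polarity, attr = r.consequent
        if r.template == 1:
            target = ("+", f"{attr}.{r.guard}")          # guard node +S.phi
            edges.add((target, ("+", attr)))             # step 2: (+S.phi, +s)
            for m in milestones_of.get(attr, []):
                if m not in negated_in_guard.get((attr, r.guard), set()):
                    edges.add((target, ("-", m)))        # step 2: (+S.phi, -m)
        elif r.template in (2, 3):
            target = (polarity, attr)
        elif r.template == 5:
            edges.add((("+", r.tau[1]), ("-", attr)))    # step 4: (+m, -s)
            continue
        elif r.template == 6:
            edges.add((("-", r.tau[1]), ("-", attr)))    # step 5: (-s', -s)
            continue
        else:
            continue                                     # PAC-4 adds no edges here
        if isinstance(r.tau, tuple):                     # steps 1 and 3: status event
            edges.add((r.tau, target))
        for a in r.gamma:                                # steps 1 and 3: attributes in gamma
            edges.add((("+", a), target))
            edges.add((("-", a), target))
    return edges


rules = [
    Rule(1, "E", [], ("+", "S"), "g1"),        # guard g1 of S, triggered by event E
    Rule(2, "T", [], ("+", "m"), None),        # milestone m achieved on event T
    Rule(5, ("+", "m"), [], ("-", "S"), None), # S closes when m is achieved
]
edges = build_pdg(rules, milestones_of={"S": ["m"]}, negated_in_guard={})

# Derive one topological order of the nodes (predecessors map for graphlib).
preds = {}
for src, dst in edges:
    preds.setdefault(dst, set()).add(src)
    preds.setdefault(src, set())
order = list(TopologicalSorter(preds).static_order())
```

Any order produced this way places the guard node before the stage opening and the milestone achievement before the stage closing, which is the serialization that the subscription conditions are meant to enforce.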
APPENDIX D
PAC RULES FOR EXAMPLE PROCESS
Table 5 shows all PAC rules that can be derived from the
example process given in Section 3.
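| No | Template | Prerequisite | Antecedent | Consequent |
|---|---|---|---|---|
| 1 | PAC-1 | $\neg$RA | latestIncEvent = "R:NewOrder" | $\oplus$RA |
| 2 | PAC-1 | $\neg$RA | latestIncEvent = "R:CustomerChange" | $\oplus$RA |
| 3 | PAC-1 | $\neg$ED | RA:ap $\wedge$ ECR:ev $\wedge$ $\neg$ED:cp | $\oplus$ED |
| 4 | PAC-1 | $\neg$ED | latestIncEvent = "R:ResumeEngineeringDesign" | $\oplus$ED |
| 5 | PAC-1 | $\neg$LR | latestIncEvent = "R:NewOrder" | $\oplus$LR |
| 6 | PAC-1 | $\neg$LR | latestIncEvent = "R:RedoExportDocs" | $\oplus$LR |
| 7 | PAC-1 | $\neg$ECR | $\neg$ECR:ev $\wedge$ LR | $\oplus$ECR |
| 8 | PAC-1 | $\neg$PE | $\oplus$ED:cp $\wedge$ LR | $\oplus$PE |
| 9 | PAC-1 | $\neg$PE | latestIncEvent = "R:RedoExportDocs" $\wedge$ LR | $\oplus$PE |
| 10 | PAC-2 | RA | latestIncEvent = "T:RequirementsApproval" | $\oplus$RA:ap |
| 11 | PAC-2 | ED | latestIncEvent = "T:EngineeringDesign" | $\oplus$ED:cp |
| 12 | PAC-2 | ED | RA:ap | $\oplus$ED:sp |
| 13 | PAC-2 | ECR | latestIncEvent = "T:EvalCountryRestrictions" | $\oplus$ECR:ev |
| 14 | PAC-2 | PE | latestIncEvent = "T:PreparingExportDocs" | $\oplus$PE:pp |
| 15 | PAC-2 | PE | ED:cp | $\oplus$PE:sp |
| 16 | PAC-2 | LR | $\oplus$PE:pp | $\oplus$LR:cp |
| 17 | PAC-3 | ED:cp | RA:ap | $\ominus$ED:cp |
| 18 | PAC-3 | PE:pp | ED:cp | $\ominus$PE:pp |
| 19 | PAC-3 | LR:cp | PE:pp | $\ominus$LR:cp |
| 20 | PAC-4 | RA:ap | latestIncEvent = "R:NewOrder" | $\ominus$RA:ap |
| 21 | PAC-4 | RA:ap | latestIncEvent = "R:CustomerChange" | $\ominus$RA:ap |
| 22 | PAC-4 | ED:cp | RA:ap $\wedge$ ECR:ev $\wedge$ $\neg$ED:cp | $\ominus$ED:cp |
| 23 | PAC-4 | ED:sp | RA:ap $\wedge$ ECR:ev $\wedge$ $\neg$ED:cp | $\ominus$ED:sp |
| 24 | PAC-4 | ED:cp | latestIncEvent = "R:ResumeEngineeringDesign" | $\ominus$ED:cp |
| 25 | PAC-4 | ED:sp | latestIncEvent = "R:ResumeEngineeringDesign" | $\ominus$ED:sp |
| 26 | PAC-4 | LR:cp | latestIncEvent = "R:NewOrder" | $\ominus$LR:cp |
| 27 | PAC-4 | LR:cp | latestIncEvent = "R:RedoExportDocs" | $\ominus$LR:cp |
| 28 | PAC-4 | ECR:ev | $\neg$ECR:ev $\wedge$ LR | $\ominus$ECR:ev |
| 29 | PAC-4 | PE:pp | $\oplus$ED:cp $\wedge$ LR | $\ominus$PE:pp |
| 30 | PAC-4 | PE:sp | $\oplus$ED:cp $\wedge$ LR | $\ominus$PE:sp |
| 31 | PAC-4 | PE:pp | latestIncEvent = "R:RedoExportDocs" $\wedge$ LR | $\ominus$PE:pp |
| 32 | PAC-4 | PE:sp | latestIncEvent = "R:RedoExportDocs" $\wedge$ LR | $\ominus$PE:sp |
| 33 | PAC-5 | RA | $\oplus$RA:ap | $\ominus$RA |
| 34 | PAC-5 | ED | $\oplus$ED:cp | $\ominus$ED |
| 35 | PAC-5 | ED | $\oplus$ED:sp | $\ominus$ED |
| 36 | PAC-5 | LR | $\oplus$LR:cp | $\ominus$LR |
| 37 | PAC-5 | ECR | $\oplus$ECR:ev | $\ominus$ECR |
| 38 | PAC-5 | PE | $\oplus$PE:pp | $\ominus$PE |
| 39 | PAC-5 | PE | $\oplus$PE:sp | $\ominus$PE |
| 40 | PAC-6 | ECR | $\ominus$LR | $\ominus$ECR |
| 41 | PAC-6 | PE | $\ominus$LR | $\ominus$PE |

TABLE 5
Complete set of PAC rules for the "Design-to-order" workflow.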