LEAD: A Formal Specification For Event Processing

Anas Al Bassit
anas.albassit@euranova.eu
EURA NOVA
Brussels, Belgium
Sabri Skhiri
sabri.skhiri@euranova.eu
EURA NOVA
Brussels, Belgium
Hejer Ammar
hejer.ammar@euranova.eu
EURA NOVA TUNISIA
Tunis, TUNISIA
ABSTRACT
Processing event streams is an increasingly important area for
modern businesses aiming to detect and efficiently react to critical
situations in near real-time. The need to govern the behavior of
systems where such streams exist has led to the development of
numerous Complex Event Processing (CEP) engines, capable of
detecting patterns and analyzing event streams. Although current
CEP systems provide real-time analysis foundations for a variety of
applications, several challenges arise due to languages’ limitations
and imprecise semantics, as well as the lack of power to handle big
data requirements. In this paper, we discuss such systems, analyzing
some of the most sensitive issues in this domain. Further in this
context, we present our contributions expressed in LEAD, a formal
specication for processing complex events. LEAD provides an
algebra that consists of a set of operators for constructing complex
events (patterns), temporally restricting the construction process
and choosing among several selection and consumption policies.
We show how to build LEAD rules to demonstrate the expressive
power of our approach. Furthermore, we introduce a novel approach
of interpreting these rules into a logical execution plan, built with
temporal prioritized colored petri nets.
CCS CONCEPTS
• Theory of computation → Modal and temporal logics; Algebraic
language theory; • Computer systems organization → Real-time system
specification; • Software and its engineering → Syntax; Semantics;
Domain specific languages; Application specific development environments.
KEYWORDS
complex event processing, cep, pattern logic, pattern recognition,
timed petri nets, prioritized petri nets
ACM Reference Format:
Anas Al Bassit, Sabri Skhiri, and Hejer Ammar. 2019. LEAD: A Formal
Specification For Event Processing. In DEBS ’19: The 13th ACM International
Conference on Distributed and Event-based Systems (DEBS ’19), June 24–28,
2019, Darmstadt, Germany. ACM, New York, NY, USA, 12 pages.
https://doi.org/10.1145/3328905.3329501
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for prot or commercial advantage and that copies bear this notice and the full citation
on the rst page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specic permission
and/or a fee. Request permissions from permissions@acm.org.
DEBS ’19, June 24–28, 2019, Darmstadt, Germany
©2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6794-3/19/06...$15.00
https://doi.org/10.1145/3328905.3329501
1 INTRODUCTION
Complex Event Processing (CEP), the field concerned with detecting
and reacting in a timely manner to complex situations, is widely
deployed in many domains, including business activity monitoring [39],
security monitoring [4], object tracking and intrusion detection in
networks [6], risk prediction [29], traffic congestion detection [40],
surveillance [32] and RFID processing [45, 46, 48].
The need for CEP was recognized early by both industry and
academia, which has led to the development of numerous systems and
specifications; notable ones include CEL [24], Siddhi [41], Esper [17],
Cayuga [16], TESLA/T-Rex [13, 14] and Flink CEP [8] (for more examples,
refer to the survey [15]). Although these systems differ in their
architectures, query languages, execution and data models, the
functionality they provide is mostly similar to that of the first CEP
engine, Rapide [34, 35]. In other words, existing systems rarely focus
on expanding the original concepts of CEP, which eventually leads to
products with fairly similar features but different sets of limitations.
In this paper we present LEAD, a formal specification for
Live Event Analysis and Detection. LEAD defines a set of operators
to ingest non-decomposable (atomic) events and combine them in
order to detect more complex events (patterns). Through this work
we aim to increase the expressiveness of CEP languages while
avoiding the pitfalls of ambiguous semantics, recurrence and
oversimplification. The contributions covered in the upcoming
sections can be summarized as follows:
• A pattern algebra that extends the common set of operators
in CEP and defines them formally using TRIO [23, 36], a
logic-based specification language augmented with temporal
features;
• A rule grammar that, using our pattern algebra, allows users
to obtain different kinds of actions, depending on the
characteristics of a matched pattern;
• A novel logical execution plan, based on a combination of
timed colored petri nets with aging tokens [44] and prioritized
petri nets [5], that we believe will facilitate the deployment of
this plan in the future.
The rest of the paper is organized as follows: in the next section,
we discuss the motivation behind LEAD and the limitations of other
approaches; in Section 3 we present the LEAD pattern model in detail,
providing the semantics of all valid operators, while in Section 4 we
show how to create LEAD rules. Subsequently, Section 5 demonstrates
how LEAD rules can be translated into petri nets for efficient pattern
detection. Next, we position our work with respect to the related work
in Section 6. Finally, we conclude our work and discuss future plans in
Section 7.
2 MOTIVATION
In this section we discuss what we see as the major challenges facing
the CEP domain, and we position our work in the large space of
frameworks and specifications. The current CEP challenges can be
grouped into two classes: technical challenges and logical challenges.
Technical challenges relate mainly to the performance, maintainability
and scalability of the event processing engine. As performance
requirements become more critical, most of the attention goes towards
improving and maintaining the streaming engine, as well as optimizing
and distributing the physical plan, which altogether requires
significant resources. One improvement in this regard is to leverage a
full stream processing engine to do the work, which is achievable by
implementing CEP operators over distributed stream processing
frameworks like Apache Spark [42] or Apache Flink [19], thereby
reducing the technical effort to a minimum once the operators have been
mapped. As a matter of fact, Apache Flink already has its own CEP
module [8] working on top of the streaming framework, but it is still
limited in terms of the features it offers.
Another class of issues is what we call logical challenges, which
include the complications related to CEP query languages, particularly
their semantics and expressive power. Several issues appear in this
category, such as the lack of formalism in multiple CEP frameworks
[1, 17, 34, 35], which leads to semantic ambiguities, and the absence
of some operators like negations [16], repetitions [13, 16, 26, 47],
sequencing [12, 34, 35], and programmable selection & consumption
policies [34, 35, 43, 47], which decreases the expressiveness of a CEP
query language and naturally affects its user-friendliness. Yet, there
is hardly a use-case that cannot be implemented using at least several
CEP query languages. This can be partially linked to the fact that
most CEP users and language creators have an SQL-like mindset, where a
typical problem is formulated to have several streams as inputs and at
most one action as an output. But what happens if a certain problem
requires multiple outputs? Or, adding more complexity, how should the
dependencies between different outputs be handled? To clarify the
intent behind these questions, let us discuss the following example:
Example 1. Consider a scenario of a product roll-up tracking
problem, where a mobile gaming company wants to profile its
applications in order to infer knowledge about the users and the
game usage, and apply this knowledge to improve sales. To model
this scenario, we assume the following four streams: installations,
accesses, artifacts bought and shares; and the following four actions
for each user and game, within the first 3 days from installation:
(i) Game Success for a gamer (S)
The user has played the game at least 5 times, shared it with
at least 2 friends and bought at least 2 artifacts.
(ii) Game Middle-success for a gamer (M)
The user has played at least 3 times, and not (S).
(iii) Game Middle-success & Leaving (L)
The user has played between 3 and 5 times, and did not
share the game or buy any artifacts. Additionally, they did
not connect within 2 days after the last access.
(iv) Game Failure (F)
The user has played fewer than 3 times, and did not share or
buy any artifacts.
By analyzing this use-case we realize that:
• Case (M) depends on the absence of case (S);
• No decision can be made before at least 3 days have passed,
unless (S) is matched;
• Knowing that a user has played a game between 3 and 5
times, and did not share the game in 3 days, is not enough to
decide whether the output should be (M) or (L);
• These four cases do not cover the whole space of options;
• The four cases are similar patterns with different conditions;
• To formulate these patterns, a CEP language must support
repetitions (collect events from a stream), negations (check
for the absence of accesses), sequencing (start collecting
accesses after an installation) and temporal boundaries (a
3-day interval).
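The inter-dependencies between the four cases can be made concrete with a minimal Python sketch. It is only an illustration and not part of LEAD: the `Usage` record, its field names and the order in which the cases are tested are assumptions introduced here, and the counters are assumed to have been aggregated already over the first 3 days after installation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    """Hypothetical per-user, per-game counters for the first 3 days."""
    plays: int
    shares: int
    purchases: int
    idle_after_last_access: bool  # no access within 2 days after the last one


def classify(u: Usage) -> Optional[str]:
    """Return 'S', 'L', 'M', 'F', or None when no case of Example 1 applies."""
    if u.plays >= 5 and u.shares >= 2 and u.purchases >= 2:
        return "S"
    no_engagement = u.shares == 0 and u.purchases == 0
    # Assumption: (L) is tested before (M), since every (L) user also fits (M).
    if 3 <= u.plays <= 5 and no_engagement and u.idle_after_last_access:
        return "L"
    if u.plays >= 3:
        return "M"
    if no_engagement:
        return "F"
    # Fewer than 3 plays but some share or purchase: not covered by the cases.
    return None
```

The final `return None` reflects the observation that the four cases do not cover the whole space of options.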
To the best of our knowledge, if obtaining four distinct types
of actions is required, there is no CEP framework capable of
formulating this problem with fewer than four rules, although the
patterns are similar to each other and have inter-dependencies. The
fact that such a problem cannot be formulated using only one rule
reduces the expressiveness of a query language. Additionally, if the
CEP framework on which we intend to execute these multiple rules does
not optimize its execution plans by merging identical parts, as done
in Cayuga [27], the same events will be processed multiple times,
despite the fact that there is at most one possible output.
The aforementioned issues are the motivation behind this work.
Even though we do not focus directly on the technical type of
problems, we keep it in mind while describing our computational
model in Section 5. Moreover, we increase the expressiveness with
a decent set of operators and features, while trying to preserve
user-friendliness, in addition to avoiding ambiguous semantics by
providing formal definitions for our operators.
3 PATTERN MODEL
3.1 TRIO Overview
TRIO [23, 36] is a first-order logical language designed to support
formal specifications of real-time systems by introducing temporal
operators. It provides the tools for modeling properties whose values
may change over time. TRIO can be thought of as an extension of
classical temporal logic [31, 37], as it introduces a precise
quantitative view of time. Each TRIO formula is relative to the current
time instant, which is not mentioned explicitly and can always be
assumed. TRIO’s alphabet includes sets of names for variables,
functions and predicates. It also defines a fixed set of operator
symbols, which are divided into propositional symbols ($\land$,
$\neg$), the universal quantifier $\forall$, and the temporal operators
Futr and Past. $Futr(A, t)$ and $Past(A, t)$ mean that a formula $A$
holds at an instant that is $t$ time units in the future or in the
past, respectively, relative to the current time. TRIO’s variables,
predicates and functions are divided into time-dependent and
time-independent. Furthermore, every variable, predicate or function
has its own type, which defines the set of values that can be assumed,
returned or taken as arguments. The syntax of TRIO is as follows:
• Every variable is a term;
• Every $n$-ary function applied to $n$ terms is a term;
[Figure 1: Sketch of three streams $S_A$, $S_B$ and $S_C$ over 8 instants of time. $S_A$ carries the events $a_0$ to $a_2$, $S_B$ carries $b_0$ to $b_3$ and $S_C$ carries $c_0$ to $c_3$.]
Table 1: Matches relations of two patterns, one sequence-based and one conjunction-based

Pattern: $\varphi = A \to B \to C$
Matches relation: $M_{\varphi}$ = {(1.1,2,4), (1.1,2,5), (1.1,2,7), (1.1,3,4), (1.1,3,5),
(1.1,3,7), (1.1,5,5), (1.1,5,7), (1.2,2,4), (1.2,2,5), (1.2,2,7),
(1.2,3,4), (1.2,3,5), (1.2,3,7), (1.2,5,5), (1.2,5,7), (4,5,5), (4,5,7)}

Pattern: $\varphi' = A \land B \land A$
Matches relation: $M_{\varphi'}$ = {(1.1,0,1.2), (1.1,2,1.2), (1.1,3,1.2), (1.1,5,1.2), (1.1,0,4),
(1.1,2,4), (1.1,3,4), (1.1,5,4), (1.2,0,4), (1.2,2,4), (1.2,3,4), (1.2,5,4)}
• Every $n$-ary predicate applied to $n$ terms of the appropriate
types is a formula;
• If $A$ and $B$ are formulas, $\neg A$ and $A \land B$ are formulas;
• If $A$ is a formula and $x$ is a time-independent variable,
$\forall x\, A$ is a formula;
• If $A$ is a formula and $t$ is a term of the temporal type, then
$Futr(A, t)$ and $Past(A, t)$ are formulas.
The temporal domain in TRIO is numeric with no further constraints,
i.e. it can be either discrete or continuous. Provided there is a
temporal means of ordering events, LEAD does not affect this property.
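To illustrate how the two temporal operators shift the instant at which a formula is evaluated, the following minimal Python sketch models a time-dependent formula as a function from an instant to a truth value. It assumes a discrete (integer) time domain for simplicity, although TRIO itself also allows continuous time; the names `futr`, `past`, `neg`, `conj` and the `alarm` predicate are hypothetical helpers introduced only for this sketch.

```python
from typing import Callable

# A time-dependent formula maps the current instant to a truth value.
Formula = Callable[[int], bool]


def futr(a: Formula, t: int) -> Formula:
    """Futr(A, t): A holds t time units after the current instant."""
    return lambda now: a(now + t)


def past(a: Formula, t: int) -> Formula:
    """Past(A, t): A holds t time units before the current instant."""
    return lambda now: a(now - t)


def neg(a: Formula) -> Formula:
    return lambda now: not a(now)


def conj(a: Formula, b: Formula) -> Formula:
    return lambda now: a(now) and b(now)


# Hypothetical predicate that holds exactly at instants 3 and 7.
alarm_instants = {3, 7}
alarm: Formula = lambda now: now in alarm_instants

# Evaluated at instant 5: the alarm held 2 units ago and will hold in 2 units.
both = conj(past(alarm, 2), futr(alarm, 2))
```

Every formula built this way stays relative to the instant passed in at evaluation time, which mirrors the implicit current instant of TRIO.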
3.2 LEAD Basic Definitions
Let $A$ be a set of attribute names and let $E$ be a set of type names,
where each $e \in E$ is associated with a structure that is a finite
subset of $A$, and finally let $S$ be a set of streams. A stream in LEAD
is an unbounded sequence of event instances ($s = r_0 r_1 r_2 r_3 \ldots$).
Events are notifications generated by event producers, where every
producer typically generates events of the same event type
$Type(s) = Type(r_i) = e \in E$ and associates them with an event time
$time(r_i) \in T$, where $T$ is the time domain. For example, the Access
event in Example 1 can be represented as follows:
Access(GID: 123, UID: 321): 999
where Access is the event type, GID and UID are attributes
of primitive types and 999 is the event time. In LEAD, $T$ is also
the baseline event type, where a time event can be given as an
absolute point in time, e.g., Mon, 24 Sep 2018 17:22:06 GMT, a
relative point in time, e.g., access.EventTime + 3 days, or a time
interval: [access.EventTime, (access.EventTime + 3 hours)).
A pattern (or a complex event) $\varphi$ is an interesting situation
that can be modeled using a subset of event types $E_\varphi \subseteq E$ and
a collection of CEP operators, such as conjunction, sequence and
negation. We call atomic every pattern $\varphi$ that consists of a single
event type instance in its definition, and we define a predicate
$Atomic(\varphi)$ to indicate whether $\varphi$ is atomic or not.
Given a pattern $\varphi$ defined over $n$ streams, we call $M_\varphi$ the
matches relation of the pattern $\varphi$, i.e. the set of all event
instances that fit the pattern $\varphi$, regardless of the technical
specification of the language. Each relation $M_\varphi$ is associated
with a schema $S(M_\varphi)$ that consists of a heading, paired with a set
of constraints defined in terms of that heading. A heading normally has
$O$ attributes, where $O$ is the size of the bag of event type instances
in the pattern definition. All attributes in the heading of $M_\varphi$
have their types in the time domain, e.g. an ordered set of timestamps,
intervals or atomic timestamps, so that each tuple $m_i$ in $M_\varphi$
represents the position (a combination of event time and arrival order)
of the instances of all the event types that participate in and satisfy
the pattern $\varphi$. Finally, we define a function $pos()$ that, given an
event instance, returns its position, i.e. its event time (also
retrieved by $time()$) and its arrival time, which allows simultaneous
events coming from different sources and belonging to the same stream
to be ordered on the logical level.
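As an illustration of the event model and of the role of $pos()$, the following minimal Python sketch represents an event instance and orders two simultaneous Access events by their arrival order. The `Event` class and its field names are assumptions made for this sketch only, not part of the LEAD specification.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Event:
    """Hypothetical event instance: type name, event time, arrival order, attributes."""
    type: str
    time: int
    arrival: int
    attrs: Dict[str, Any] = field(default_factory=dict)


def pos(e: Event) -> Tuple[int, int]:
    """Position of an event: its event time paired with its arrival order."""
    return (e.time, e.arrival)


# Two Access events sharing the event time 999 but arriving one after the other.
first_access = Event("Access", 999, 1, {"GID": 123, "UID": 321})
second_access = Event("Access", 999, 2, {"GID": 123, "UID": 654})

# Sorting by position resolves the tie on event time through arrival order.
ordered = sorted([second_access, first_access], key=pos)
```

Because positions are plain tuples, they compare lexicographically: first by event time, then by arrival order.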
Example 2. Consider the three streams shown in Figure 1, and
two patterns $\varphi$ and $\varphi'$ defined as sequence and conjunction
patterns over these streams, respectively, and let us see the structure
and set of constraints of their matches relations (see also Table 1):
$\varphi = A \to B \to C$,
$M_{\varphi} = (pos(a_j), pos(b_k), pos(c_p))$ where $j, k, p \in \mathbb{N}$ and
$time(a_j) \le time(b_k) \le time(c_p)$
$\varphi' = A \land B \land A$,
$M_{\varphi'} = (pos(a_j), pos(b_k), pos(a_q))$ where $j, k, q \in \mathbb{N}$ and
$pos(a_q) > pos(a_j)$
Going one step further, we can construct a new pattern $\varphi''$ out
of $\varphi$ and $\varphi'$ by stating a relation between the latter two
patterns (also called sub-patterns in this context) by means of a CEP
operator $op$: $\varphi'' = \varphi \; op \; \varphi'$, and
$M_{\varphi''} = M_{\varphi} \bowtie M_{\varphi'}$, where $\bowtie$ is
the Cartesian product of the two relations with respect to the
constraints defined over their headings and the CEP operator, and
$S(M_{\varphi''}) = S(M_{\varphi}) \oplus S(M_{\varphi'})$, where $\oplus$
concatenates the headings of the two schemas.
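The two matches relations of Example 2 can be enumerated with a few lines of Python. This is only a sketch: the event positions are read off the tuples of Table 1 (so "1.1" becomes the pair (1, 1)), and an arrival order of 1 is assumed for every event without a simultaneous sibling.

```python
from itertools import product

# Positions are (event time, arrival order) pairs taken from Table 1.
# Assumption: arrival order 1 for events that have no simultaneous sibling.
A = [(1, 1), (1, 2), (4, 1)]
B = [(0, 1), (2, 1), (3, 1), (5, 1)]
C = [(4, 1), (5, 1), (7, 1)]


def time(position):
    """Event time component of a position."""
    return position[0]


# Sequence A -> B -> C: event times must be non-decreasing along the sequence.
M_seq = [(a, b, c) for a, b, c in product(A, B, C)
         if time(a) <= time(b) <= time(c)]

# Conjunction of A, B and A: the second A needs a strictly later position.
M_conj = [(a, b, a2) for a, b, a2 in product(A, B, A) if a2 > a]

print(len(M_seq), len(M_conj))  # 18 12
```

The sizes agree with Table 1, which lists 18 tuples for the sequence pattern and 12 for the conjunction pattern.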
3.3 Pattern Logic
A pattern $\varphi$ consists of a bag $b$ of event types, where $b$ is a
2-tuple $(E_\varphi, f)$ and $f: E_\varphi \to \mathbb{N}_{\ge 1}$ gives the
multiplicity, that is, the number of occurrences of an event type
$e \in E_\varphi$ in this given bag. Moreover, let $P$ be the set of all
predicates defined over the bag of event types $b$, i.e. a pattern; the
arity of a predicate $p \in P$ is therefore the cardinality of $b$.
Besides the time-dependent and time-independent predicates defined by
TRIO, LEAD adds another classification of predicates in $P$:
• Simple predicates, where all event types in the predicate
have $f(e) = 1$;
• Complex predicates, where at least one event type in the
predicate has $f(e) > 1$.
We call $W(P)$ the conjunctive normal form of the predicates in $P$.
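The bag of event types and the simple/complex distinction can be sketched in Python with a multiset. The helper names below are hypothetical, and treating a predicate simply as the set of event types it mentions is an assumption made for illustration.

```python
from collections import Counter


def bag(event_types):
    """Bag b = (E, f): Counter keys play the role of E, counts the role of f."""
    return Counter(event_types)


def is_complex(predicate_types, b):
    """A predicate is complex when some event type it mentions has f(e) > 1."""
    return any(b[e] > 1 for e in predicate_types)


# Bag of the conjunction pattern over A, B and A from Table 1.
b = bag(["A", "B", "A"])
cardinality = sum(b.values())  # arity of predicates defined over this bag
```

Here `b["A"]` is 2, so any predicate mentioning A is complex, while a predicate over B alone is simple.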
Filtering. Pattern instances can be filtered using formulas over the
set of time-independent predicates $P_i \subseteq P$, i.e. the result of
evaluating them on the same pattern instance does not change over time.
Written as FILTER, filtering is the first pattern operator presented in
this work:
\[ \varphi = \varphi \; \mathit{FILTER}(w), \quad \text{where } w \in W(P_i) \tag{1} \]
Renaming. In complex formulas it becomes impossible
to differentiate between multiple instances of an event type or a
sub-pattern used to construct a single pattern, so at least the
duplicated instances must be renamed explicitly. Renaming is done
by means of variables. Let $X$ be the set of all variables, which can
be used in the assignment operator AS with the following grammar
expression:
\[ \varphi = \varphi \; \mathit{AS} \; x, \quad \text{where } x \in X \tag{2} \]
Uniqueness. In LEAD, atomic events of the same type can occur
at the same time (event time). However, events with the same event
time are ordered by their arrival time. Thus, to guarantee the
uniqueness of event instances in one pattern $\varphi$, it is enough to
use the 2-tuple $(e, m)$ as a unique ID, where $e \in E_\varphi$ and
$m \in M_\varphi$.
Match predicate. Once uniqueness has been introduced, we are
able to formally define the matching of events using the
time-dependent predicate $Match(e, m)$, which evaluates to true at the
time instant when an event of type $e$ having a position $m$ occurs,
and false at every other instant. This predicate is captured by the
following formula:
\[ \forall e \in E, \exists m \in M, \forall t \, (Match(e, m) \leftrightarrow Past(Match(e, m), t) \land \neg Futr(Match(e, m), t)) \tag{3} \]
This predicate can be generalized to matching a pattern $\varphi$, which
is implicitly proven recursively once the rest of the operators have
been presented in the upcoming sections:
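\[ \forall E_\varphi \subseteq E, \exists m \in M_\varphi, \forall t \, (Match(\varphi, m) \leftrightarrow Past(Match(\varphi, m), t) \land \neg Futr(Match(\varphi, m), t)) \tag{4} \]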
Match enabling and life-cycle. We say that the $Match$ predicate is
enabled if we are looking for the pattern $\varphi$, otherwise it is
disabled. The $Match$ predicate indicates that a pattern $\varphi$ is
matched at the position $m$. However, it does not show for how long the
matching procedure had been enabled before a match was found.
Therefore, we mark the starting point of the enabling interval by the
predicate $In$, where $In(\varphi)$ indicates when recognizing $\varphi$ was
enabled, and $In(\varphi, m)$ indicates when recognizing this specific
match instance was started. To better understand the behavior of the
latter predicate, let us look at the following TRIO formula:
\[ \forall E_\varphi \subseteq E, \exists t \, [Match(\varphi, m) \leftrightarrow (In(\varphi, m) \lor Past(In(\varphi, m), t))] \tag{5} \]
In this TRIO expression, $t$ represents the time required to match
the $m$ instance of the event $e$. In other words, it took an interval
of length $t$ to find the match at the position $m$. Moreover, we call
the interval $[time(In(e, m)), time(Match(e, m))]$ the life-cycle of the
2-tuple $(e, m)$. Throughout the rest of this work, $In$ is omitted
from LEAD formulas written in TRIO when it is implicit, and shown
otherwise.
In the next subsections we present the set of operators sup-
ported in LEAD. We start with the core operators, then move to the
temporal constraints operators and nish with the selection and
consumption policies operators.
3.3.1 Core Operators. In our context, core operators are the basic
operators that can be used to construct more complex patterns out
of simpler ones. The set of core operators is given by the following
grammar:
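\[ \varphi = \varphi \land \varphi \mid \varphi \lor \varphi \mid \varphi \to \varphi \mid \neg\varphi \mid \varphi^{+} \]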
For each operator in this grammar we provide a translation into
TRIO formulas in order to dene its behavior formally and to avoid
any ambiguity while executing these operators (in Section 5).
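Conjunction. The first core operator to present is conjunction,
which is matched when two sub-patterns $\varphi_1$ and $\varphi_2$ are
matched in any arbitrary order at two possibly different times.
\[
\begin{aligned}
\land(\varphi_1, \varphi_2) =_{def}\ & \forall E_{\varphi_1}, E_{\varphi_2} \subseteq E,\ \exists m_1 \in M_{\varphi_1},\ \exists m_2 \in M_{\varphi_2} \\
& \{ Match(\varphi_1 \land \varphi_2, m_1 \bowtie m_2) \leftrightarrow \\
& [(Match(\varphi_2, m_2) \land Match(\varphi_1, m_1)) \lor \\
& \exists t_1 (Match(\varphi_2, m_2) \land Past(Match(\varphi_1, m_1), t_1)) \lor \\
& \exists t_2 (Match(\varphi_1, m_1) \land Past(Match(\varphi_2, m_2), t_2))] \}
\end{aligned}
\tag{6}
\]
Generalization of this operator (by replacing $\exists m \in M$ with
$\forall m \in M$) can be thought of as the Cartesian product of the
matches relations.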
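Dis-junction. In contrast to conjunction, dis-junction is
matched when at least one of the sub-patterns $\varphi_1$ or $\varphi_2$ is
matched at a single point in time.
\[
\begin{aligned}
\lor(\varphi_1, \varphi_2) =_{def}\ & \forall E_{\varphi_1}, E_{\varphi_2} \subseteq E,\ Match(\varphi_1 \lor \varphi_2, m_1 \bowtie m_2) \leftrightarrow \\
& \{ \exists m_1 \in M_{\varphi_1} Match(\varphi_1, m_1) \lor \exists m_2 \in M_{\varphi_2} Match(\varphi_2, m_2) \}
\end{aligned}
\tag{7}
\]
Generalization of this pattern operator is the outer join of the
matches relations, where event time represents the join key.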
Sequence. A sequence-based pattern is matched when two sub-patterns
$\varphi_1$ and $\varphi_2$ are matched in order. Bear in mind that, in our
context, the matching process of $\varphi_2$ can be enabled only when
$\varphi_1$ has a match, which is controlled by the enabling predicate of
$\varphi_2$.
\[
\begin{aligned}
\to(\varphi_1, \varphi_2) =_{def}\ & \forall E_{\varphi_1}, E_{\varphi_2} \subseteq E,\ \exists m_1 \in M_{\varphi_1},\ \exists m_2 \in M_{\varphi_2} \\
& \{ Match(\varphi_1 \to \varphi_2, m_1 \bowtie m_2) \leftrightarrow \\
& [(Match(\varphi_1, m_1) \land In(\varphi_2, m_2) \land Match(\varphi_2, m_2)) \lor \\
& \exists t_1 > 0 \, ((Past(Match(\varphi_1, m_1), t_1)) \land \\
& Past(In(\varphi_2, m_2), t_1)) \land Match(\varphi_2, m_2)] \}
\end{aligned}
\tag{8}
\]
Looking at formula (8), one can notice that the sequence operator
plays a staging role in our specification. In other words, this
operator splits the matching process into stages, where the next stage
starts only when the previous one is matched.
Negation. This operator represents the absence of a pattern,
and hence introduces additional complexity to the matching formula
in order to decide whether there is a match or not. The matching
formula of the negation operator is written as follows:
$\neg\exists m \in M_\varphi (Match(\varphi, m)) \equiv Match(\varphi, \emptyset)$,
which is ambiguous in its current form unless we bound it, because it
is always valid except when an event instance occurs. The bounding is
done by means of time-dependent termination predicates
$P_d \subseteq P$; thus, we need to reformulate the matching formula to
include these predicates as follows:
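\[
\begin{aligned}
\neg(\varphi, w_{nt}) =_{def}\ & \forall w_{nt} \in W(P_d),\ \forall E_\varphi \subseteq E,\ \neg\exists m \in M_\varphi \\
& \{ (w_{nt} \land In(\varphi) \land Match(\varphi, m)) \lor \neg\exists t_1 < t \\
& (w_{nt} \land Past(In(\varphi), t) \land Past(Match(\varphi, m), t_1)) \}
\end{aligned}
\tag{9}
\]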
In formula (9), $In(\varphi)$ is the predicate that defines when we
started looking for the absence of $\varphi$. For instance, in the case of
$\varphi_1 \to \neg\varphi_2$, $Match(\varphi_1, m_1) \equiv In(\varphi_2)$.
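The need for a bound can be seen in a small Python sketch of an absence check. It is an illustration only: the half-open window and the function name `absent` are assumptions, with the window start standing in for the instant at which $In$ holds and the window end for the instant at which the termination predicate becomes true.

```python
def absent(event_times, start, end):
    """True when no event time falls inside the half-open window [start, end).

    Assumption: `start` plays the role of the enabling instant and `end`
    the instant at which the termination predicate becomes true.
    """
    return not any(start <= t < end for t in event_times)


# Access times of one user, e.g. in hours since installation.
accesses = [1, 5, 9]

quiet_after_5 = absent(accesses, 6, 9)    # nothing strictly between 6 and 9
quiet_from_5 = absent(accesses, 5, 9)     # the access at 5 breaks the absence
```

Without the upper bound `end`, the check could never return a definite answer on an unbounded stream, which is the ambiguity noted for the unbounded negation formula.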
Repetition. A repetition can be thought of as a concatenation, or a
collection, of events, atomic or complex, called items, which all have
the same type, which we call the internal pattern. Similar to
negations, repetitions require a set of termination predicates.
However, they also need invalidation predicates to terminate with a
mismatch, and acceptance conditions to accept a new item into the list
of collected items. The domain of termination predicates is the whole
set $P$, while invalidation and acceptance predicates are
time-independent predicates $P_i \subseteq P$. In order to write the
matching formula of the repetition operator, we still need to define
the time-independent function $check((\varphi, m), w)$ that, given a
2-tuple $(\varphi, m)$, which represents a unique match, and a collection
of predicates $w \in W(P_i)$, returns the result of evaluating the
latter over the former. Now we have sufficient tools to express the
matching formula:
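\[
\begin{aligned}
+(\varphi, w_{acc}, w_{rt}, w_{in}) =_{def}\ & \forall w_{rt} \in W(P),\ \forall w_{acc}, w_{in} \in W(P_i),\ \forall E_\varphi \subseteq E,\ \exists M \subseteq M_{\varphi^+},\ \exists t \\
& \{ Match(\varphi^+, \textstyle\bigcup_{i \in \{1, \ldots, |\varphi^+|\}} m_i = M, w_{acc}, w_{rt}, w_{in}) \leftrightarrow \\
& [Past(In(\varphi), t) \land w_{rt} \land \neg w_{in} \land \forall m_i \in M, \exists t_1 < t \\
& (Past(Match(\varphi, m_i), t_1) \land check((\varphi, m_i), w_{acc}))] \}
\end{aligned}
\tag{10}
\]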
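The interplay of the acceptance, termination and invalidation predicates can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the LEAD execution model: the function `collect` and its callback signatures are hypothetical, and termination is assumed to be triggered by an incoming item rather than by an independent time event.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def collect(
    items: Iterable[T],
    accept: Callable[[T], bool],
    terminate: Callable[[T, List[T]], bool],
    invalidate: Callable[[List[T]], bool],
) -> Optional[List[T]]:
    """Collect accepted items until termination; None signals a mismatch."""
    collected: List[T] = []
    for item in items:
        if terminate(item, collected):
            # On termination the repetition matches unless it is invalidated.
            return None if invalidate(collected) else collected
        if accept(item):
            collected.append(item)
    return None  # the termination predicate never held


# Access times in hours since installation; stop at the 72-hour boundary,
# accept every access and invalidate when fewer than 3 were collected.
def run(access_hours):
    return collect(
        access_hours,
        accept=lambda hour: True,
        terminate=lambda hour, seen: hour >= 72,
        invalidate=lambda seen: len(seen) < 3,
    )
```

With these hypothetical predicates, `run([1, 2, 5, 80])` yields the three accesses made within the first 72 hours, while `run([1, 80])` terminates with a mismatch.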
Sub-context. Similar or simplified versions of the operators described
above can be found in most of the currently used CEP frameworks.
However, in LEAD we try to increase the expressive power
of CEP languages by introducing an extension of the repetition
operator that allows a subset of the collected items to enable new
matching predicates; we call these new predicates "branches".
Given a repetition pattern $\varphi^+$ and one matches list $M$, where for
every $m \in M$ there is an item $(\varphi, m)$, a branch can be considered
a match enabling predicate $In(\varphi_1)$ for a new pattern that is
initiated by the matching of $(\varphi, m)$. Furthermore, as branches work
concurrently with each other and with the main context, there must
be a way to terminate or disable them with a termination clause
$w_{st} \in W(P_i)$. This behavior can be formalized as follows:
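\[
\begin{aligned}
& \forall (\varphi, m) \in (\varphi^+, \textstyle\bigcup_{i \in \{1, \ldots, |\varphi^+|\}} m_i = M),\ \forall w_{st} \in W(P_i) \\
& (Match(\varphi, m) \land In(\varphi_1) \land \neg w_{st})
\end{aligned}
\tag{11}
\]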
The output of a sub-context operator that starts multiple branches
is a set of matches that is returned to the main context.
3.3.2 Temporal Constraints Operators. These operators are used
mainly to provide temporal boundaries for other operators, especially
those that cannot be matched otherwise (negation) or
can only be matched partially (repetition). Furthermore, they add
validity ranges and provide the ability to temporally constrain
sub-patterns in a pattern definition. The set of temporal constraints
supported in our specification is given by the following grammar:
\[ \varphi = Wait(\varphi, t) \mid Within(\varphi, t, \varphi) \]
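Wait. This operator requires the match of a pattern $\varphi$, and a
waiting time of $t$ instants, before emitting the match, where
$t \in T$ is a point in time relative to the one when the operator was
enabled, i.e. its $In$ predicate. This operator is formalized as follows:
\[
\begin{aligned}
Wait(\varphi, t) =_{def}\ & \forall E_{\varphi_1} \subseteq E,\ \exists m_1 \in M_{\varphi_1},\ \exists t \\
& \{ Match(Wait(\varphi_1, t), m_1) \leftrightarrow \\
& [Past(In(\varphi), t) \land (Match(\varphi_1, m_1) \lor \\
& (Past(Match(\varphi_1, m_1), t_1) \land 0 < t_1 < t))] \}
\end{aligned}
\tag{12}
\]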
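Within. This operator is matched when a pattern $\varphi_1$ is matched
after another pattern $\varphi_2$ within less than $t$ instants of time:
\[
\begin{aligned}
Within(\varphi_2, t, \varphi_1) =_{def}\ & \forall E_{\varphi_1}, E_{\varphi_2} \subseteq E,\ \exists m_1 \in M_{\varphi_1},\ \exists m_2 \in M_{\varphi_2},\ \exists t \\
& \{ Match(Within(\varphi_2, t, \varphi_1), m_1 \bowtie m_2) \leftrightarrow \\
& [Past(Match(\varphi_1, m_1), t_1) \land \\
& Match(\varphi_2, m_2) \land 0 < t_1 < t] \}
\end{aligned}
\tag{13}
\]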
Remark 1. The aforementioned temporal constraint operators
can be expressed using time events and core operators, e.g. the $Wait$
operator is the conjunction between a time event $(t \in T)$ match and
a pattern $(\varphi)$ match. Nevertheless, for the sake of simplicity and
user-friendliness, we have introduced $Wait$ and $Within$ as explicit
operators with predefined behaviors.
3.3.3 Selection & Consumption Policies. The formulas presented
so far cannot provide unambiguous answers in situations
where multiple matches exist: which one should be picked? Or
how many? If multiple matches can be selected, is it
possible to use (or consume) an event instance more than once?
Answering such questions is the motive behind supporting what
are called selection and consumption policies. These policies can
be traced back to when pattern matching frameworks started to
appear, long before the problem was properly addressed by
D. Zimmer in [49]. For example, in some active databases, a form
of such policies occurred as event validity intervals in
SAMOS [21], or as operators like Periodic in Snoop [10]. However,
with the rise of the streaming era, managing memory and producing
the set of matches most relevant to users’ expectations became
more critical. Therefore, the ability to answer the aforementioned
questions as expressively as possible is a cornerstone of any
modern DSMS/CEP, as manifested in works like Amit [1],
Sase+ [26] and TESLA [13], among others.
In LEAD, we support a set of selection and consumption operators
that is categorized into three distinct levels, which we detail
in the next paragraphs, together with examples of how to
use these policies to extract different sets of matches from the
pattern $\varphi$ defined in Table 1.
Basic Selection Policies. The range of these operators is all patterns
that include core and temporal constraint operators. In particular,
we define three operators at this level:
\[ Sel(\varphi) = first(\varphi) \mid last(\varphi) \mid adj(\varphi) \]
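• first is the default selection policy in LEAD, in which we
select the first occurrence of an event type or a pattern to be
a representative until it is consumed or the query is finalized.
Its behavior is formalized as follows:
\[
\begin{aligned}
first(\varphi) =_{def}\ & \forall E_\varphi \subseteq E,\ \exists m_1 \in M_\varphi \, \{ [In(\varphi) \land Match(\varphi, m_1)] \lor \\
& \exists t \, [Past(In(\varphi), t) \land Match(\varphi, m_1) \land \\
& \neg\exists m_2 \in M_\varphi (Past(Match(\varphi, m_2), t_1) \land t_1 < t)] \}
\end{aligned}
\tag{14}
\]
An example of how first works on pattern $\varphi$ from Table 1:
$\varphi_1 = A \to B \to C = first(A) \to first(B) \to first(C)$
$M(\varphi_1) = \{(1.1, 2, 4)\}$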
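• last, on the other hand, means that the last element is selected
and therefore needs a temporal boundary defined by
the interval $[In(\varphi), w_{lt})$, where $w_{lt}$ is either a match of the
next element in a sequence pattern, or a temporal constraint.
\[
\begin{aligned}
last(\varphi, w_{lt}) =_{def}\ & \forall E_\varphi \subseteq E,\ \exists m_1 \in M_\varphi, \{ [w_{lt} \land Match(\varphi, m_1)] \lor \\
& \exists t \, [w_{lt} \land Past(Match(\varphi, m_1), t) \land \neg\exists m_2 \in M_\varphi, \\
& (Past(Match(\varphi, m_2), t_1) \land t_1 < t)] \}
\end{aligned}
\tag{15}
\]
An example of how last works:
$\varphi_2 = A \to last(B) \to C$
$M(\varphi_2) = \{(1.1, 3, 4)\}$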
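• And finally, adj (adjacent), which, given a binary operator
pattern, i.e. conjunction or sequence, written as
$\varphi = \varphi_1 \; op \; \varphi_2$, returns the first match with the
minimum time difference between $\varphi_1$ and $\varphi_2$.
\[
\begin{aligned}
adj(\land(\varphi_1, \varphi_2)) =_{def}\ & \forall E_{\varphi_1}, E_{\varphi_2} \subseteq E,\ \exists m_1 \in M_{\varphi_1},\ \exists m_2 \in M_{\varphi_2} \\
& \{ Match(last(\varphi_1) \to first(\varphi_2), m_1 \bowtie m_2) \lor \\
& Match(last(\varphi_2) \to first(\varphi_1), m_1 \bowtie m_2) \}
\end{aligned}
\tag{16}
\]
An example of using the adj policy:
$\varphi_3 = adj(A \to B \to C) = last(A \to B) \to first(C)$
$M(\varphi_3) = \{(1.2, 3, 4)\}$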
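The first and last examples above can be reproduced with a short Python sketch over the positions of Table 1. It is a simplified reading of the policies and not the formal TRIO semantics: positions are (event time, arrival order) pairs, each stage is assumed to be enabled at the event time of the previous stage's match, and the boundary of last(B) is assumed to be the first C found by the first policy. All function names are hypothetical.

```python
# Positions (event time, arrival order) as they appear in Table 1.
A = [(1, 1), (1, 2), (4, 1)]
B = [(0, 1), (2, 1), (3, 1), (5, 1)]
C = [(4, 1), (5, 1), (7, 1)]


def first_at_or_after(events, t):
    """Earliest position whose event time is at least t, or None."""
    candidates = [p for p in sorted(events) if p[0] >= t]
    return candidates[0] if candidates else None


def last_between(events, lo, hi):
    """Latest position whose event time lies in [lo, hi], or None."""
    candidates = [p for p in sorted(events) if lo <= p[0] <= hi]
    return candidates[-1] if candidates else None


def first_sequence(a_events, b_events, c_events):
    """first(A) -> first(B) -> first(C), each stage enabled by the previous one."""
    a = first_at_or_after(a_events, float("-inf"))
    b = first_at_or_after(b_events, a[0]) if a else None
    c = first_at_or_after(c_events, b[0]) if b else None
    return (a, b, c) if c else None


def last_b_sequence(a_events, b_events, c_events):
    """A -> last(B) -> C, with last(B) bounded by the first matching C (assumption)."""
    match = first_sequence(a_events, b_events, c_events)
    if match is None:
        return None
    a, _, c = match
    return (a, last_between(b_events, a[0], c[0]), c)
```

Under these assumptions, `first_sequence(A, B, C)` gives the positions written as (1.1, 2, 4) in the text, and `last_b_sequence(A, B, C)` gives (1.1, 3, 4).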
Mixed Policies. A pattern $\varphi$, defined with the operators presented
in this work so far, guarantees at most one match. With the use
of selection policies, users can decide some of the characteristics of
the selected match, but only if multiple options were available at some
point in time. Consequently, there was no need to introduce the
concept of consumption, as event instances could never be used
more than once for a single pattern definition. However, to support
acquiring multiple matches of $\varphi$, we developed two policies that
mix the concepts of selection and consumption policies.
SelCon()=every() | all ()consume(i),i
every selects all matches of a pattern such that their life-cycles do not overlap. For example, let us assume that m1 and m2 are the only two possible matches for a pattern φ, where m1 < m2. For the operator every to allow both matches, the life-cycle of m1 must take place before or meet the life-cycle of m2, according to Allen's temporal logic [18]. This policy can be described using TRIO as follows:
every()=d e f
EE,MM{mM(Match(,m)) ↔
¬m1M(Past(Match(,m1),t1) ∨ P ast (I n(,m1),t1)
t1∈ [time(I n(,m)) t ime(Match(,m))) )} (17)
Some examples of how every works:
φ4 = every(A → B → C)
M(φ4) = {(1.1, 2, 4), (4, 5, 5)}
φ5 = A → every(B → C)
M(φ5) = {(1.1, 3, 4), (1.1, 5, 5)}
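The non-overlap requirement of every can be sketched in Python as a filter over candidate matches. This is a simplified illustration, assuming each match is reduced to the (start, end) interval of its life-cycle and that matches are considered in the order in which they complete; the function name every and the sample intervals are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of a match's life-cycle


def every(matches: List[Interval]) -> List[Interval]:
    """Keep matches whose life-cycles are 'before' or 'meet' one another."""
    selected: List[Interval] = []
    # Consider matches in completion order, earliest start first on ties.
    for start, end in sorted(matches, key=lambda m: (m[1], m[0])):
        # Accept only if the previous accepted match ended no later than
        # this one starts (Allen's 'before' or 'meets').
        if not selected or selected[-1][1] <= start:
            selected.append((start, end))
    return selected


candidates = [(1.1, 4.0), (1.2, 4.0), (4.0, 5.0)]
print(every(candidates))
```

Here the second candidate overlaps the first and is dropped, while the third one starts exactly when the first ends and is therefore kept.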
all ... consume is another mixed policy. all on its own selects all possible matches for a pattern φ, that is, the whole matches relation M. consume, on the other hand, indicates the part(s) of φ that cannot be used (or consumed) more than once.
all ()consume(1)=de f
E,E1E,1,mM
{Match(m,) ∧ m1M1(m1m
¬m2M(Past(Match(m2,),t) ∧ m1m2))}
(18)
An example of how all...consume works:
LEAD: A Formal Specification For Event Processing DEBS ’19, June 24–28, 2019, Darmstadt, Germany
Table 2: Mapping table between the algebraic operators and their corresponding syntax in the query grammar. p denotes patterns, sp sub-patterns, rp repetition patterns, bx boolean expressions, tf time frame expressions and n natural numbers.

Algebraic Operator          Grammar Syntax
conjunction                 p1 and p2
disjunction                 p1 or p2
negation                    not p
repetition                  collect(p) [accept(bx1)] [terminate(bx2)] [invalidate(bx3)]
within                      p1 within tf p2
first                       first(p)
max                         max(rp)
repetition & every          collect(every p) ...
adjacent                    adj[acent](p)
all                         all(p)
every                       every(p)
sub-context                 sub-context(p ⇒ rp [.range(n1, n2)] [.filter(bx1)] {sub-query}) [terminate(bx2)]
wait                        Wait(p, tf)
last                        last(p)
min                         min(rp)
repetition & consumption    collect(p) consume(sp) ...
φ8 = all(A → B → C within 1s from B) consume(A)
M(φ8) = {(1.1, 2, 4), (1.2, 2, 4), (4, 5, 5)}
Remark 2. all and every are the only operators that cannot be nested, since they consume events differently.
Repetition Selection Policies. The default selection policy for repetitions is first, as for any other operator, which means that the first φ+ to be matched, i.e. terminated, will be emitted. But what happens when multiple repetition instances are terminated at the same time? Which collection of items should be chosen? By default, the collection with the longest life-cycle is emitted. In order to customize this behavior, we introduce two new selection policies that apply only to repetitions:
Sel_r(φ+) = Max(φ, w_agg) | Min(φ, w_agg),
w_agg ∈ W(P_aggregate) ∧ P_aggregate ⊆ P_i
where P_aggregate represents the set of aggregate functions that we seek to maximize or minimize, i.e. Sum, Count, Average, etc.
Going back to the definition of repetitions in the previous section, it is still unclear how items in M_φ ⊆ M_φ+ are selected and consumed, which is a natural effect of allowing operators to be nested. By default, a repetition selects all matched items during its life-cycle with a zero-consumption policy, in a similar manner to the selection policy all. To extend this behavior, we allow repetitions to apply either consume or every to the internal pattern φ.
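As an illustration of the Max and Min policies, the following Python sketch picks one collection among several repetition instances that terminated at the same time. It assumes each collection is reduced to the list of values of the aggregated attribute; the function name select_collection and the sample data are hypothetical and not part of LEAD.

```python
from statistics import mean
from typing import Callable, List, Sequence

Collection = Sequence[float]  # values of the aggregated attribute


def select_collection(collections: List[Collection],
                      aggregate: Callable[[Collection], float],
                      policy: str) -> Collection:
    """Return the collection that maximizes or minimizes the aggregate."""
    if policy not in ("Max", "Min"):
        raise ValueError("policy must be 'Max' or 'Min'")
    chooser = max if policy == "Max" else min
    # On ties, max/min return the first collection in the given order.
    return chooser(collections, key=aggregate)


terminated = [[1.0, 2.0, 3.0], [10.0], [4.0, 4.0]]
print(select_collection(terminated, sum, "Max"))   # largest Sum
print(select_collection(terminated, len, "Max"))   # largest Count
print(select_collection(terminated, mean, "Min"))  # smallest Average
```

The aggregate is passed in as a plain function, so Sum, Count and Average correspond here to sum, len and statistics.mean.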
Remark 3. A repetition pattern φ+ must have a termination clause, as we discovered in equation (10). The predicates in this termination clause can be time-independent predicates, defined over the attributes of the collected items and the new ones to be checked, and would normally include comparison operators (<, >, ==, ≠, ≤, ≥), boolean functions, and aggregation boolean functions. Only when an explicit time-independent termination clause is absent must there be a time-dependent termination predicate. In equations (9, 10, 15) this predicate can be either the match of the pattern that follows the repetition in a sequence pattern, or a temporal constraint operator.
4 LEAD RULES’ GRAMMAR
Each LEAD rule has a structure that reflects the following grammar
expression template:
FROM <streams>
[DEFINE <event types |event instances>]
[ENRICH <event types>]
MATCH <pattern expression>
[PARTITION BY <attributes |window>]
EMIT <actions |complex emit>
The FROM clause identifies the set of streams that contribute to the rule, while DEFINE optionally defines custom event instances or custom event types to be used internally in the rule expression, and ENRICH enriches one or more event types with new attributes in order to produce them later on as actions. The MATCH statement is where a pattern is defined, according to the algebraic definitions in the previous sections and the mapping shown in Table 2. PARTITION BY expresses LEAD's support for partitioned rules based on the distinct values of event attributes and/or a time or event-count frame (window). Finally, EMIT gives LEAD the power either to produce a list of actions if the specified pattern was matched, or to apply further checks (complex emit) to decide which action or list of actions shall be produced, depending on internal values in the matched pattern. This is done by nesting two checking strategies, as shown in the following regular expressions:
<complex emit> → FIRST <check clause> |
                 ANY <check clause>
<check clause> → <complex emit> |
                 (<where clause> actions)+
<where clause> → WHERE conditions
Checking Strategies. FIRST checks the WHERE clauses one by one in their written order and produces the action list associated only with the first matched clause. ANY, on the other hand, checks all its children clauses and produces the action lists of all the matched conditions.
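The two checking strategies can be sketched in Python as follows. This is a minimal illustration that assumes a flat (non-nested) list of WHERE clauses; the names check_first and check_any, the dictionary-based match and the action names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Match = Dict[str, int]
# A WHERE clause: a condition over the match and its associated action list.
Clause = Tuple[Callable[[Match], bool], List[str]]


def check_first(clauses: List[Clause], match: Match) -> List[str]:
    """FIRST: actions of the first clause, in written order, that holds."""
    for condition, actions in clauses:
        if condition(match):
            return list(actions)
    return []


def check_any(clauses: List[Clause], match: Match) -> List[str]:
    """ANY: actions of every clause whose condition holds."""
    emitted: List[str] = []
    for condition, actions in clauses:
        if condition(match):
            emitted.extend(actions)
    return emitted


clauses: List[Clause] = [
    (lambda m: m["acs"] >= 5, ["Success"]),
    (lambda m: m["acs"] >= 3, ["Middle_Success"]),
    (lambda m: m["acs"] <= 2, ["Failure"]),
]
print(check_first(clauses, {"acs": 5}))
print(check_any(clauses, {"acs": 5}))
```

For a match with five accesses, FIRST stops at the first satisfied clause, while ANY also emits the action of the second clause because its condition holds as well.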
DEBS ’19, June 24–28, 2019, Darmstadt, Germany Al Bassit, Skhiri and Ammar
FROM Installations AS _in, Accesses AS _ac, ArtifactsBought AS _ab, Shares AS _sh
DEFINE TimeEvent tc(_in.event_time, _in.event_time + 3 days)
EventType leaving(BOOLEAN leaving(FALSE))
MATCH _in Followed By (collect(_ac) terminate (!tc or count()==6) AS acs
and collect(_ab) terminate (!tc or count()==2) AS abs
and collect(_sh) terminate (!tc or count()==2) AS shs)
Subcontext (ac ==> acs.RANGE(3, 5) (MATCH (not _ac Within 2 days) Emit Event leaving(TRUE)))
terminate(abs.count()>0 or shs.count()>0) AS ls
PARTITION BY _in.uid, _in.gid
CHECK FIRST
WHERE (count(acs)>=5 and count(abs)==2 and count(shs)==2) Emit Event Success(gid)
WHERE (count(acs)>=3)
CHECK FIRST
WHERE (AT LEAST 1 (ls.event_time > _in.event_time + 3 days) and count(abs)==0 and count(shs)==0)
Emit Event Middle_Success_Leaving(gid)
WHERE (TRUE) Emit Event Middle_Success(gid) END
WHERE (count(acs) <= 2 and count(abs)==0 and count(shs)==0) Emit Event Failure(gid) END
Figure 2: A query solving Example 1 using LEAD syntax
Repetition Quantifiers. So far we have discussed how time-independent predicates are used to filter individual events. However, in some scenarios, in a WHERE clause or a sub-context operator in particular, one might need to check a list of events that is the natural outcome of a repetition operation. To ease such scenarios, we introduce a set of quantifiers to be used with this kind of list: ALL, ANY, AT LEAST v, AT MOST v, BETWEEN v1 AND v2, where v represents variables with natural numeric values. For example, let us consider a list of events (l) whose events are of the type e ∈ E, which has a schema of one attribute a ∈ A of a numeric type. Then we can quantify (l) using a quantifier as follows: ALL (l.a == 5), or AT LEAST 3 (l.a == 5), etc.
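One possible reading of these quantifiers is sketched below in Python, under the assumption that each quantifier counts the list items satisfying a predicate. The function names are illustrative and not part of the LEAD grammar.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def count_matching(items: Iterable[T], pred: Callable[[T], bool]) -> int:
    """Number of items for which the predicate holds."""
    return sum(1 for item in items if pred(item))


def q_all(items, pred) -> bool:
    return all(pred(item) for item in items)


def q_any(items, pred) -> bool:
    return any(pred(item) for item in items)


def at_least(v: int, items, pred) -> bool:
    return count_matching(items, pred) >= v


def at_most(v: int, items, pred) -> bool:
    return count_matching(items, pred) <= v


def between(v1: int, v2: int, items, pred) -> bool:
    return v1 <= count_matching(items, pred) <= v2


# Values of attribute a for a hypothetical list l of collected events.
l_a = [5, 5, 3, 5]
print(q_all(l_a, lambda a: a == 5))        # ALL (l.a == 5)
print(at_least(3, l_a, lambda a: a == 5))  # AT LEAST 3 (l.a == 5)
```

Whether BETWEEN is inclusive on both ends is an assumption of this sketch.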
Syntactic Restrictions. Some of the operators defined in Section 3.3 were associated with termination predicates in order to bound their behavior. The range of event types that these predicates can use was mentioned in the operators' definitions. Nonetheless, it is worth insisting on the following points:
• All time-independent predicates are bound to the attributes of the patterns they govern;
• An exception to the previous point is the termination clause of a sub-context, which is bound to the attributes of the patterns in the main MATCH clause.
After explaining the semantics behind LEAD's rules, we are ready to model the use-case discussed in Example 1. It is evident from Figure 2 that this type of problem can be naturally solved using one rule. We start by selecting the streams from which we expect to get event instances; then, in the DEFINE clause, we define two internal elements: a time interval event tc that has a validity range of 3 days and starts after every detected installation, and an event type leaving that consists of one boolean attribute with false as a default value. The MATCH clause defines the main pattern, that is, the sequence consisting of an installation followed by collecting events from the Accesses, ArtifactsBought and Shares streams. Their termination expressions are similar, as they all match if they reach a certain count or 3 days have already passed. The sub-context operator is used to search for the case (L), where we check for the absence of accesses in two consecutive days, and it is disabled whenever there is a share or an artifact is bought. The PARTITION BY clause partitions the rule per user and game, which ensures that the rule is executed only once per partition. After detecting the pattern, it is time to decide which action needs to be emitted. As we learned, CHECK FIRST looks for the first WHERE clause to be matched, and only then emits the corresponding action and stops, or goes deeper in the nested hierarchy. An interesting observation about this query is that the conditions in the WHERE clauses are deferred in this scenario, but this behavior can be altered based on the implementation of the CEP engine that follows our specification and the query optimizations it applies, which is out of the scope of our paper.
5 LOGICAL EXECUTION PLAN
In this section, we introduce the last piece of our contribution, a formal logical execution plan for evaluating LEAD rules. In particular, we follow an aging tokens prioritized colored petri nets (APCPN) approach for evaluation. A colored petri net is a directed bipartite graph whose two classes of nodes are called places and transitions, together with colored (typed) tokens that flow between these nodes. In a CEP context, CPNs serve as complex event definitions and the flowing tokens represent event occurrences, thus allowing a clear distinction between event definitions and instances. Since an event occurs at a certain time, its age can be used to order or invalidate tokens. To eliminate the non-deterministic behavior in CPNs, we extend the original model with priorities whenever a choice has to be made. Formally, an APCPN is defined as a tuple N = {Σ, P, I, IC, OC, TT, π, IT, G, r0}:
• Σ is a finite set of types (colours), Σ ⊆ E ∪ [n], n ∈ ℕ;
• P ≡ [p1, p2, ..., p|P|] is a finite set of places, which can be either stateless, i.e. they pass tokens between transitions, or stateful, i.e. they preserve tokens in ordered structures;
• I is a finite set of transitions, I ∩ P = ∅. Transitions are either temporal guards, consumers or intermediate transitions;
• IC ⊆ (P × I) is a finite non-empty set of input arcs (i.e., place → transition);
• OC ⊆ (I × P) is a finite non-empty set of output arcs;
• TT: P → Σ is a color function, where each place p ∈ P has a single type that belongs to Σ, and all the tokens on this place must be of the same type;
• π: IC → ℕ0 is a priority function, where ∀(p, i), (p′, i) ∈ IC {[π((p, i)) = π((p′, i))] ↔ [p = p′]};
Figure 3: LEAD rule template interpreted in APCPN model (labels: producers Stream(1) ... Stream(n) for the FROM statement, MATCH statement, sink with H and L arcs, EMIT statement with Check FIRST and Check ANY)
• IT: I → ℝ is a time expression function;
• G: I → boolean is a guard function that maps each transition i ∈ I to a boolean expression over all the incoming arcs IC(i) ⊆ IC;
• r0 ∈ R is an initial marking from the set of all markings R.
With APCPNs defined, we show in Figure 3 how to interpret LEAD rules into a petri net model. Every place is associated with one type, which is the union of all its input token types. Exceptions to this rule are the producers representing streams, where tokens of atomic types are generated. Therefore, producers are only of atomic types, and sinks are of the complex event type that is defined in the MATCH statement. In the next subsection, we show that the process of translating our algebra into petri nets is straightforward.
5.1 Pattern Interpretation
The rst construction we present is sources. A source (Figure 4)
represents a stream instance, as a single stream can be used multiple
times in a pattern. For a source to pass event instances to the
following elements through the high priority arc
H
, it has to be
enabled (
In
predicate). Otherwise, incoming events will be directed
to the garbage consumer, that is the low priority arc L.
Figure 4: Source representation and its compact version
A Match predicate is translated into a source and a blue stateless place, while a Past(Match) predicate is translated into a source and a red stateful place. Tokens on red places are consumed once or more, based on filters and selection & consumption policies, whereas tokens on blue places are consumed immediately. Therefore, Conjunction and Disjunction can be translated naturally, as depicted in Figure 5.
Figure 5: From left to right: Disjunction and Conjunction representations
More complex core operators, like repetitions and sequences, are illustrated in Figure 6. The initial marking of a repetition includes an empty list. Every incoming token will be added to all the currently available lists, generating at least one list and at most double the number of incoming lists, if the system that implements our specification allows item skipping in repetitions. Moreover, the predicates W_acc, W_rt and W_in are evaluated on transitions to decide whether to merge a new item, pass a list to the next parts of the petri net, or get rid of a list with a garbage consumer, respectively. On the right half of the figure, we see that B, according to its TRIO formula in Section 3.3, is enabled only when A is matched. When B is eventually matched, A → B fires once or multiple times, depending on selection & consumption policies.
Figure 6: From left to right: Repetition and Sequence representations
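The growth of the lists held by a repetition can be illustrated with a short Python sketch. It assumes the simplest reading of the description above: without item skipping, every list is extended by the incoming token; with item skipping, each list is kept both with and without the token, so the number of lists doubles. The function name accept and the flag allow_skipping are hypothetical, and the predicates W_acc, W_rt and W_in are left out.

```python
from typing import List


def accept(lists: List[List[str]], token: str,
           allow_skipping: bool) -> List[List[str]]:
    """Add an incoming token to all currently available lists."""
    extended = [current + [token] for current in lists]
    if allow_skipping:
        # Keep the originals too: each list may skip the new item.
        return lists + extended
    return extended


marking = [[]]  # the initial marking of a repetition: one empty list
for token in ["a1", "a2"]:
    marking = accept(marking, token, allow_skipping=True)
print(marking)
```

After two tokens with skipping enabled, the marking holds four lists, which shows why an implementation would need termination and invalidation predicates to keep the state bounded.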
So far, we have encountered stateless and stateful places, and explained how priority works (H for high and L for low). We have also seen intermediate and garbage consumer transitions. To understand the temporal features of APCPNs, we introduce the concept of aging events. Each event has two types of age: a global age, which starts when an event is first detected, and a local one that can be assigned by transitions when needed. Aging events are of great importance to our proposed execution plan: they allow us to define temporal constraints in APCPNs, where temporal guard transitions consume tokens when they reach a certain age. In Figure 7, we demonstrate representations for the Wait and Within operators. Wait starts by enabling the source A and a time event that will become valid in 30 seconds, represented as a token that will fire when its age reaches 30. According to its TRIO formula, after 30 seconds have passed, and only if there is at least one A match, Wait will match. Otherwise, the time event will fire the garbage consumer, which will disable the A source. On the other half, Within is translated into an ordered conjunction with a time limitation (compare with conjunction in Figure 5). This representation reflects the fact that we are looking for events B that were matched before A. A B event will wait for no more than 10 seconds, and A events will be garbage collected if the conjunction is not enabled.
Figure 7: From left to right: Wait(A, 30s) and A within 10s from B
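The role of token ages in Wait can be sketched in Python as follows. This is a simplified illustration, not the APCPN semantics: the Token class, the function wait_outcome and the use of plain numbers of seconds as timestamps are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    payload: str
    born: float  # time at which the token was created, in seconds

    def age(self, now: float) -> float:
        return now - self.born


def wait_outcome(timer: Token, a_tokens: List[Token],
                 limit: float, now: float) -> Optional[bool]:
    """Outcome of Wait(A, limit) at time `now`.

    Returns None while the time token is still too young, True if at least
    one A token arrived while Wait was enabled, and False otherwise (the
    case in which the garbage consumer would disable the A source).
    """
    if timer.age(now) < limit:
        return None  # the temporal guard cannot fire yet
    deadline = timer.born + limit
    return any(timer.born <= t.born <= deadline for t in a_tokens)


timer = Token("T", born=0.0)
arrivals = [Token("A", born=12.0)]
print(wait_outcome(timer, arrivals, limit=30.0, now=10.0))
print(wait_outcome(timer, arrivals, limit=30.0, now=30.0))
```

The guard is evaluated against the age of the time token only; the A tokens merely need to have arrived within the enabled interval.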
In Figure 8, we show how to translate negations. Negations must have temporal boundaries. The representation on the left side of the figure is for the pattern (¬B within 10s from A), which is again an ordered conjunction, where we are looking for an A event followed by the absence of B for 10 seconds. The representation of the right pattern (¬A within 10s) shows that we do not need to relate a negation to an actual event.
Figure 8: From left to right: ¬B within 10s from A and ¬A within 10s
In Figure 9, we present the key elements of a sub-context. This operator is triggered whenever its parent repetition accepts a new item (W_acc). Since a sub-context operator produces multiple branches that work independently, it is necessary to make sure there are no branches left before we terminate the execution. This is ensured by the active branches counter. Moreover, we differentiate between two states: sub-context enabled, and sub-context disabled. The former refers to the possibility to accept more branches and/or terminate with a match, while the latter means the sub-context is not valid anymore. In order to retrieve the outcome of the state S+, a sub-context must be enabled, the count of active branches must be zero, and the repetition that feeds the operator must have been terminated. Otherwise, the sub-context remains disabled and all active tokens inside will be garbage consumed.
Figure 9: Sub-context concise representation
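The three conditions for retrieving the outcome of a sub-context can be captured in a small Python sketch. The class SubContextState and its method names are hypothetical; the sketch only tracks the enabled flag, the active branches counter and the termination of the feeding repetition.

```python
from dataclasses import dataclass


@dataclass
class SubContextState:
    enabled: bool = True
    active_branches: int = 0
    repetition_terminated: bool = False

    def open_branch(self) -> None:
        """Called when the parent repetition accepts a new item."""
        if self.enabled:
            self.active_branches += 1

    def close_branch(self) -> None:
        """Called when a branch finishes its independent evaluation."""
        if self.active_branches > 0:
            self.active_branches -= 1

    def can_emit(self) -> bool:
        """Outcome is retrievable only if all three conditions hold."""
        return (self.enabled
                and self.active_branches == 0
                and self.repetition_terminated)


state = SubContextState()
state.open_branch()
state.repetition_terminated = True
print(state.can_emit())  # one branch is still active
state.close_branch()
print(state.can_emit())
```

Keeping an explicit counter mirrors the active branches counter described above: termination is postponed until the last independent branch has finished.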
We conclude this section with remarks about the hidden parts of APCPNs. First, selection & consumption policies control the order of tokens in their places, and how many times a token can be consumed in stateful places. Second, every place, explicit or hidden (a place on an arc between two transitions), is connected to a temporal garbage consumer transition, which ensures that tokens will always be consumed under any circumstances, satisfying the liveness and boundedness properties of petri nets.
6 RELATED WORK
The main differences between the LEAD approach and the CEP state of the art lie in: (1) our usage of first-order logic to formalize the pattern matching operators, (2) the usage of petri nets rather than automata, and finally, (3) the separation of our specification into two distinct levels: the pattern algebra and the rule grammar. In this section, we compare LEAD with the state of the art along four dimensions: the query model and the underlying data model, the pattern representation, the formal definition of the pattern operators, and finally a comparison on Example 1.
Query Model. The CEP query model can be classified into three main categories: (1) DAG (Directed Acyclic Graph), (2) CQL (Continuous Query Language), (3) time points correlated by temporal constraints. The first two categories apply operations on streams to define thresholds that can fire actions. For example, if more than 10 people enter this room within 10 minutes, raise an alarm. The last category, however, expresses a pattern as a set of event occurrences (time points), linked by temporal constraints. For instance, if Jimmy enters the room followed by Peter at least 10 seconds after Jimmy, and if Cathy enters at least 10 seconds after Jimmy, but without Peter going out of this room in the meantime, then fire some action.
Aurora and Borealis [9], NiagaraCQ [11], STREAM [2], and even later Spade [22] used a DAG of operations applied on streams to represent the pattern. TelegraphCQ [30], DataCell [33], Esper [17] and RuleCore [38] are CEP engines using CQL to express a pattern. They adopt a similar approach, expressing operations on a stream, where a threshold is defined, as a pattern. In this case, the query is expressed by an extension of SQL. We believe that CQL or a DAG is not well suited to represent a complex situation, that is, by definition, a set of events occurring within temporal constraints. Additionally, even when such systems support selection & consumption policies, they are too difficult to express and understand.
Finally, T-Rex [14], CEL [24] and the Orange CRS [25] use temporal constraints between events to express a complex situation. In these systems a pattern is defined by a root event, linked to other events and constrained by temporal structures. We relate LEAD to such CEP systems, with one key difference: LEAD provides the ability to define sub-contexts, where an event participating in the pattern can, in turn, become the root event of a sub-pattern, enabling multiple patterns to be matched concurrently to solve more complex situations. In addition, LEAD supports multiple outputs, and provides a way to choose between multiple actions.
The pattern representation and processing architecture. In most CEP systems, the way to represent a pattern is tightly linked to the query model. The query is translated to a physical execution plan, where the pattern is detected. As stated previously, Aurora and Borealis [9], NiagaraCQ [11], STREAM [2], and Spade [22] represented CEP patterns using Event Stream Processing, with events flowing from one operator to the others. As a result, the pattern is expressed as a DAG and the execution model is the DAG itself. However, at that time, stream processing technologies were not distributed and were not able to handle distributed states, as modern stateful stream processors do [19, 28, 42]. Going further, the synapse in STREAM [2] consists of states shared by operators for further processing. This kind of architecture made its distribution difficult. LEAD differs from these CEP systems by (1) the expressiveness of the patterns given by our operators, (2) the formal definition of these operators and (3) the usage of petri nets as the logical execution plan. We believe that petri nets provide a direct mapping to modern stateful stream processors, such as Spark [42], Flink [19] and Kafka Streams [28]. TelegraphCQ [30] and later systems such as DataCell [33] employed a different approach: they built a CEP engine on top of a relational engine optimized for this purpose. The same three differences apply here as well.
On the other hand, T-Rex [14], Siddhi [41] and Sase [26, 47] started investigating how automata can be used to describe patterns. In LEAD we chose colored temporal petri nets over automata. Our intuition is that petri nets are better adapted for translating patterns in modern stateful stream processing. There are many inherent challenges in distributing an automaton, such as the distribution of the state, the linkage of the state transitions, and the management of the different instances representing the partial matches. As an example, Flink CEP [8] generates a state machine that is later contained in one heavy stream operator. This approach does not leverage the processing distribution, the operator placement optimization and the efficient multiple state management of modern stream processors. Petri nets, in contrast, although developed quite a long time ago, use a formalism close to the stream processing frameworks and should facilitate the translation. This claim will be validated in the continuation of this work.
Detecting complex events using petri nets is not unnatural, as different variations of colored petri nets (CPNs) were adopted over the last couple of decades for event detection purposes, in systems like SAMOS [20, 21] and MEdit4CEP-CPN [7], among many others. Compared to these systems, LEAD employs aging tokens to enable temporal constraints and enriches the set of operators in CEP. Moreover, the usage of petri nets is confined to the logical level, sparing users from dealing with them directly.
The formal definition of the pattern matching operators. Recent years have revealed some remarkable specifications in the domain of CEP, of which we distinguish TESLA [13] and CEL [24]. Like these works, we seek a CEP language with clear semantics, by formalizing the behavior of our algebraic operators. However, we differ by separating our specification into two distinct levels, namely pattern algebra and rule grammar, and by LEAD's ability to express concurrent sub-rules, which gives our specification an advantage over previous works. The use of a logical tool to represent our operators does not position our approach among the logic-based languages. On the contrary, since we interpret the rules into a petri nets logical plan, LEAD is close to automata-based CEP frameworks (for more information about the different approaches, refer to [3]).
Example 1 comparison. The main advantages of LEAD are its expressiveness (sub-context) and the ability to model complicated scenarios using a minimal number of queries. Knowing that fewer queries do not necessarily provide a better user experience, LEAD provides these features in addition to the common set of operators in CEP, thus giving more sensible options. In comparison, according to our calculations, modeling Example 1 would take us 756 rules in Sase+ [26] (due to the absence of sequences) and no less than 15 rules in Esper [17] and TESLA [13].
7 CONCLUSION
In this paper, we presented LEAD. We started by formally defining a set of pattern operators, showing the expressive power they hold. Then, we introduced the way to utilize these operators in LEAD rules, and finally, how APCPNs can be used to represent these rules. A product roll-up example was used along the way to show the limitations of other CEP languages and the advantages LEAD brings, particularly at the expressiveness level. Next steps in our research include further investigating our beliefs about the advantages of APCPNs, followed by leveraging a distributed stateful stream processing framework, such as Apache Flink [19], to implement LEAD rules. Additionally, we want to define a complete compilation strategy for generating algebraic constructions using the APCPNs from the abstract syntax tree of statements. Finally, we aim at discussing and implementing query optimizations on both logical and physical levels, and demonstrating the power of our approach by benchmarking the performance of LEAD CEP.
ACKNOWLEDGMENTS
The elaboration of this scientific paper was supported by the Ministry of Economy, Industry, Research, Innovation, IT, Employment and Education of the Region of Wallonia (Belgium), by the funding of the industrial research project Jericho (convention no. 7717).
REFERENCES
[1] Asaf Adi and Opher Etzion. 2004. Amit - The situation manager. VLDB Journal 13, 2 (2004), 177–203. https://doi.org/10.1007/s00778-003-0108-y
[2] Arvind Arasu, Brian Babcock, Shivnath Babu, Mayur Datar, Keith Ito, Itaru Nishizawa, Justin Rosenstein, and Jennifer Widom. 2003. STREAM: The Stanford Stream Data Manager (Demonstration Description). In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD ’03). ACM, New York, NY, USA, 665–665. https://doi.org/10.1145/872757.872854
[3]
Alexander Artikis, Alessandro Margara, Martin Ugarte, Stijn Vansummeren, and
Matthias Weidlich. 2017. Complex Event Recognition Languages: Tutorial. In
Proceedings of the 11th ACM International Conference on Distributed and Event-
based Systems (DEBS ’17). ACM, New York, NY, USA, 7–10. https://doi.org/10.
1145/3093742.3095106
[4]
Lars Baumgärtner, Christian Strack, Bastian Ho, Marc Seidemann, Bernhard
Seeger, and Bernd Freisleben. 2015. Complex Event Processing for Reactive
Security Monitoring in Virtualized Computer Systems. In Proceedings of the 9th
ACM International Conference on Distributed Event-Based Systems (DEBS ’15).
ACM, New York, NY, USA, 22–33. https://doi.org/10.1145/2675743.2771829
[5]
Eike Best and Maciej Koutny. 1992. Petri net semantics of priority systems.
Theoretical Computer Science 96, 1 (1992), 175 – 215. https://doi.org/10.1016/
0304-3975(92)90184- H
[6]
R. Bhargavi, V. Vaidehi, P. T. V. Bhuvaneswari, P. Balamuralidhar, and M. G.
Chandra. 2010. Complex Event Processing for object tracking and intrusion
detection in Wireless Sensor Networks. In 2010 11th International Conference on
Control Automation Robotics Vision. 848–853. https://doi.org/10.1109/ICARCV.
2010.5707288
[7] Juan Boubeta-Puig, Gregorio DÃŋaz, Hermenegilda MaciÃă, ValentÃŋn Valero,
and Guadalupe Ortiz. 2019. MEdit4CEP-CPN: An approach for complex event
processing modeling by prioritized colored petri nets. Information Systems 81
(2019), 267 – 289. https://doi.org/10.1016/j.is.2017.11.005
[8]
Flink CEP. 2018. Complex Event Processing (CEP) with Apache Flink. Retrieved
February 5, 2019 from https://ink.apache.org/news/2016/04/06/cep-monitoring.
html
[9]
Uğur Çetintemel, Daniel Abadi, Yanif Ahmad, Hari Balakrishnan, Magdalena
Balazinska, Mitch Cherniack, Jeong-Hyon Hwang, Samuel Madden, Anurag
Maskey, Alexander Rasin, Esther Ryvkina, Mike Stonebraker, Nesime Tatbul,
Ying Xing, and Stan Zdonik. 2016. The Aurora and Borealis Stream Processing
Engines. Springer Berlin Heidelberg, Berlin, Heidelberg, 337–359. https://doi.
org/10.1007/978-3- 540-28608- 0_17
[10]
Sharma Chakravarthy and Deepak Mishra. 1994. Snoop: An expressive event
specication language for active databases. Data & Knowledge Engineering 14, 1
(1994), 1–26. https://doi.org/10.1016/0169-023X(94)90006- X
[11]
Jianjun Chen, David J. DeWitt, Feng Tian, and Yuan Wang. 2000. NiagaraCQ: A
Scalable Continuous Query System for Internet Databases. SIGMOD Rec. 29, 2
(May 2000), 379–390. https://doi.org/10.1145/335191.335432
[12]
Gianpaolo Cugola and Alessandro Margara. 2009. RACED: An Adaptive Mid-
dleware for Complex Event Detection. In Proceedings of the 8th International
Workshop on Adaptive and Reective MIddleware (ARM ’09). ACM, New York, NY,
USA, Article 5, 6 pages. https://doi.org/10.1145/1658185.1658190
[13]
Gianpaolo Cugola and Alessandro Margara. 2010. TESLA: A Formally Dened
Event Specication Language. In Proceedings of the Fourth ACM International
Conference on Distributed Event-Based Systems (DEBS ’10). ACM, New York, NY,
USA, 50–61. https://doi.org/10.1145/1827418.1827427
[14]
Gianpaolo Cugola and Alessandro Margara. 2012. Complex Event Processing
with T-REX. J. Syst. Softw. 85, 8 (Aug. 2012), 1709–1728. https://doi.org/10.1016/
j.jss.2012.03.056
[15]
Gianpaolo Cugola and Alessandro Margara. 2012. Processing ows of information.
Comput. Surveys 44, 3 (2012), 1–62. https://doi.org/10.1145/2187671.2187677
[16]
Alan Demers, Johannes Gehrke, Mingsheng Hong, Mirek Riedewald, and Walker
White. 2005. A general algebra and implementation for monitoring event streams.
(2005).
[17]
EsperTech. 2006. Esper enterprise edition website. Retrieved February 5, 2019
from http://www.espertech.com/
[18]
James F. Allen and George Ferguson. 1994. Actions and Events in Interval
Temporal Logic. J. Log. Comput. 4 (10 1994), 531–579. https://doi.org/10.1093/
logcom/4.5.531
[19]
Apache Flink. 2018. Stateful Computations over Data Streams. Retrieved
February 5, 2019 from https://ink.apache.org/
[20]
S. Gatziu and K. R. Dittrich. 1994. Detecting composite events in active database
systems using Petri nets. In Proceedings of IEEE International Workshop on Research
Issues in Data Engineering: Active Databases Systems. 2–9. https://doi.org/10.
1109/RIDE.1994.282859
[21]
Stella Gatziu, Andreas Geppert, and Klaus R. Dittrich. 1995. The SAMOS Active
DBMS Prototype. SIGMOD Rec. 24, 2 (May 1995), 480–. https://doi.org/10.1145/
568271.223893
[22]
Bugra Gedik, Henrique Andrade, Kun-Lung Wu, Philip S. Yu, and Myungcheol
Doo. 2008. SPADE: The System S Declarative Stream Processing Engine. In
Proceedings of the 2008 ACM SIGMOD International Conference on Management of
Data (SIGMOD ’08). ACM, New York, NY, USA, 1123–1134.
https://doi.org/10.1145/1376616.1376729
[23]
C. Ghezzi, D. Mandrioli, and A. Morzenti. 1990. TRIO: A Logic Language for
Executable Specications of Real-time Systems. J. Syst. Softw. 12, 2 (May 1990),
107–123. https://doi.org/10.1016/0164-1212(90)90074- V
[24]
Alejandro Grez, Cristian Riveros, and Martín Ugarte. 2017. Foundations of
Complex Event Processing. CoRR abs/1709.05369 (2017).
[25]
Bruno Guerraz and Christophe Dousson. 2004. Chronicles Construction Starting
from the Fault Model of the System to Diagnose. International Workshop on
Principles of Diagnosis 15 (1 2004), 267 – 289.
[26]
D. Gyllstrom, J. Agrawal, Y. Diao, and N. Immerman. 2008. On Supporting Kleene
Closure over Event Streams. In 2008 IEEE 24th International Conference on Data
Engineering. 1391–1393. https://doi.org/10.1109/ICDE.2008.4497566
[27]
Mingsheng Hong, Mirek Riedewald, Christoph Koch, Johannes Gehrke, and
Alan Demers. 2009. Rule-based Multi-query Optimization. In Proceedings of the
12th International Conference on Extending Database Technology: Advances in
Database Technology (EDBT ’09). ACM, New York, NY, USA, 120–131.
https://doi.org/10.1145/1516360.1516376
[28]
Apache Kafka. 2015. Kafka Streams. Retrieved February 19, 2019 from
https://kafka.apache.org/documentation/streams/
[29]
Y. Kim and C. Jeong. 2014. Risk Prediction System Based on Risk Prediction
Model with Complex Event Processing: Risk Prediction in Real Time on Complex
Event Processing Engine. In 2014 IEEE Fourth International Conference on Big
Data and Cloud Computing. 711–715. https://doi.org/10.1109/BDCloud.2014.43
[30]
Sailesh Krishnamurthy, Sirish Chandrasekaran, Owen Cooper, Amol Deshpande,
Michael J. Franklin, Joseph M. Hellerstein, Wei Hong, Samuel R. Madden, Fred
Reiss, and Mehul A. Shah. 2003. TelegraphCQ: An Architectural Status Report.
IEEE Data Engineering Bulletin 26 (2003).
[31]
Fred Kröger. 1987. Temporal Logic of Programs. EATCS Monographs on
Theoretical Computer Science, Vol. 8. Springer.
https://doi.org/10.1007/978-3-642-71549-5
[32]
R. Lee, C. Yu, M. Liang, and M. Feng. 2009. An Approach to Children Surveillance
with Sensor-Based Signals Using Complex Event Processing. In 2009 IEEE
International Conference on e-Business Engineering. 596–601.
https://doi.org/10.1109/ICEBE.2009.94
[33]
Erietta Liarou, Romulo Goncalves, and Stratos Idreos. 2009. Exploiting the power
of relational databases for ecient stream processing. ERCIM News 2009 (2009).
[34]
David C. Luckham. 1996. Rapide: A Language and Toolset for Simulation of
Distributed Systems by Partial Orderings of Events. In Princeton University.
[35]
David C. Luckham and J. Vera. 1995. An event-based architecture definition
language. IEEE Transactions on Software Engineering 21, 9 (Sept 1995), 717–734.
https://doi.org/10.1109/32.464548
[36]
Angelo Morzenti, Dino Mandrioli, and Carlo Ghezzi. 1992. A Model Parametric
Real-time Logic. ACM Trans. Program. Lang. Syst. 14, 4 (Oct. 1992), 521–573.
https://doi.org/10.1145/133233.129397
[37]
Amir Pnueli. 1979. The Temporal Semantics of Concurrent Programs. In Proceedings
of the International Symposium on Semantics of Concurrent Computation.
Springer-Verlag, London, UK, 1–20.
http://dl.acm.org/citation.cfm?id=647172.716123
[38]
RuleCore. 2015. Complex Event Pattern Detector. Retrieved February 19, 2019
from https://sourceforge.net/projects/rulecore/
[39]
Werner Schmidt. 2013. Business Activity Monitoring (BAM). Springer-Verlag
London. 229–242 pages.
[40]
P. G. Shinde and M. M. Dongre. 2017. Traffic congestion detection with
complex event processing in VANET. In 2017 Fourteenth International Conference
on Wireless and Optical Communications Networks (WOCN). 1–5.
https://doi.org/10.1109/WOCN.2017.8065852
[41]
Siddhi. 2016. A stream and complex event processor.
https://wso2.github.io/siddhi/ Accessed: 2018-10-22.
[42]
Apache Spark. 2018. Unied Analytics Engine for Big Data. Retrieved February
5, 2019 from https://spark.apache.org/
[43]
TIBCO. 2013. TIBCO BusinessEvents: Event Stream Processing software.
Retrieved February 5, 2019 from
https://docs.tibco.com/products/tibco-businessevents-event-stream-processing-5-4-0
[44]
V. Volovoi. 2004. Modeling of system reliability Petri nets with aging tokens.
Reliability Engineering & System Safety 84, 2 (2004), 149–161.
https://doi.org/10.1016/j.ress.2003.10.013
[45]
G. Wang and G. Jin. 2008. Research and Design of RFID Data Processing Model
Based on Complex Event Processing. In 2008 International Conference on Computer
Science and Software Engineering, Vol. 5. 1396–1399.
https://doi.org/10.1109/CSSE.2008.253
[46]
Y. Wang and S. Yang. 2010. High-performance complex event processing for
large-scale RFID applications. In 2010 2nd International Conference on Signal Processing
Systems, Vol. 1. V1–127–V1–131. https://doi.org/10.1109/ICSPS.2010.5555586
[47]
Eugene Wu, Yanlei Diao, and Shariq Rizvi. 2006. High-performance Complex
Event Processing over Streams. In Proceedings of the 2006 ACM SIGMOD International
Conference on Management of Data (SIGMOD ’06). ACM, New York, NY,
USA, 407–418. https://doi.org/10.1145/1142473.1142520
[48]
Wen Yao, Chao Hsien Chu, and Zang Li. 2011. Leveraging complex event processing
for smart hospitals using RFID. J. Netw. Comput. Appl. 34, 3 (2011), 799–810.
https://doi.org/10.1016/j.jnca.2010.04.020
[49]
Detlef Zimmer. 1999. On the Semantics of Complex Events in Active Database
Management Systems. In Proceedings of the 15th International Conference on Data
Engineering (ICDE ’99). IEEE Computer Society, Washington, DC, USA, 392–.
http://dl.acm.org/citation.cfm?id=846218.847253