Adaptive Energy-aware Scheduling of Dynamic Event Analytics
across Edge and Cloud Resources
Rajrup Ghosh*, Siva Prakash Reddy Komma and Yogesh Simmhan
Computational and Data Sciences, Indian Institute of Science, Bangalore, India
Email: rajrup.withbestwishes@gmail.com, sivaprakash@iisc.ac.in, simmhan@iisc.ac.in
*This work was done as a part of research at the Indian Institute of Science, Bangalore. The author is currently affiliated with Samsung R&D Institute, Bangalore.
Abstract—The growing deployment of sensors as part of
Internet of Things (IoT) is generating thousands of event
streams. Complex Event Processing (CEP) queries offer a useful
paradigm for rapid decision-making over such data sources.
While often centralized in the Cloud, the deployment of capable
edge devices on the field motivates the need for cooperative
event analytics that span Edge and Cloud computing. Here,
we identify a novel problem of query placement on edge
and Cloud resources for dynamically arriving and departing
analytic dataflows. We define this as an optimization problem
to minimize the total makespan for all event analytics, while
meeting energy and compute constraints of the resources. We
propose 4 adaptive heuristics and 3 rebalancing strategies for such dynamic dataflows, and validate them using detailed simulations for 100–1,000 edge devices and VMs. The results show that our heuristics offer O(seconds) planning time, give a valid and high-quality solution in all cases, and reduce the number of query migrations. Furthermore, the rebalance strategies, when applied with these heuristics, significantly reduce the makespan, by around 20–25%.
1. Introduction
Internet of Things (IoT) is a distributed systems
paradigm where sensors and actuators are connected through
communication channels with the wider Internet to help
observe and control large and complex physical systems.
IoT manifests itself in various application domains, such as
Industrial IoT [1] and Smart utilities. E.g., a city utility may
monitor the power grid and water network to identify power
outages and water leaks or to predict near-term demand,
based on real-time consumer and network observations [2].
Event analytics form a key aspect of this “smartness”, and translate the observations from numerous sensors into actionable intelligence [3]. Complex Event Processing
(CEP) is one form of event analytics where continuous
queries are defined over one or more event streams to detect
patterns of interest [4]. Such patterns can include filters on
properties in the events, aggregation over a time or count
window or events, or a sequence of events that match a
trend. A CEP query is similar to an SQL query defined over
unbounded and transient stream(s) of tuples, and that gener-
ates one or more output event streams. CEP query engines
such as Apache Edgent [5] and WSO2 Siddhi [6], perform
these queries with low latency and have been applied to IoT
domains for online decision-making [3], [7]. The queries
themselves can be chained and composed into arbitrary
dataflows, each of which we refer to as an event analytic, to
capture complex situations and their event-driven responses.
Event analytics are typically performed centrally on
Cloud resources. However, IoT deployments offer captive
access to gateway devices with non-trivial computing ca-
pacity [8], [9]. E.g., a Raspberry Pi 2 device, often used
as an IoT gateway, has a 4-core ARM 64-bit CPU and sells for US$ 35, but has about one-third the compute capacity for CEP analytics of a 4-core Intel Xeon CPU on a Microsoft Azure VM that costs ≈ US$ 150/month [10]. An IoT in-
frastructure may have hundreds of such “free” edge devices
available.
Besides the cost benefits, edge devices also have lower
network latency from the sensor event source to the device,
in comparison to the Cloud data center [11]. At the same
time, Cloud resources are more reliable, and have seemingly
infinite on-demand capacity for a fee. Edges suffer from
limited resource capacity on individual devices, and often
have energy constraints, e.g., based on the recharge cycle
for a solar-powered Pi. As a result, using edge-computing
as a first class resource to complement Cloud computing is
essential for IoT applications [12].
Problem. Given an IoT deployment of sensors and edge
devices, we have a set of users and applications who wish
to perform event analytics over the sensor event streams
using the edge and Cloud resources that are available. These
event analytics themselves are transient, being registered and
active for a certain period of time before user deregister them
once their interest passes. As a result, we have a dynamic
situation where the analytics arrive and depart, will be
sharing the same set of edge and Cloud resources, and need
to perform their CEP queries with low latency. We propose
adaptive scheduling strategies to place the CEP queries
from the dynamic analytic dataflows onto the distributed
edge and Cloud resources to minimize the total makespan
for all dataflows, while addressing the compute and net-
work constraints of the resources, the energy constraints
of the edge devices, and interference between queries on
the same resource. This problem is motivated by real-world concerns in IoT deployments for smart water and power management [2], [13], with city-scale applications as well [14].
In our prior work [10], we have considered static
scheduling of a single CEP dataflow on a set of edge
and Cloud resources while meeting the energy and com-
pute constraints, which was solved using a Genetic Al-
gorithm (GA) meta-heuristic. Here, we consider multiple
analytic dataflows that arrive and depart the system, propose
novel heuristics and rebalancing strategies for the deployed
dataflows, and evaluate these strategies against the earlier
GA approach.
We make the following specific contributions here:
1) We motivate the need to schedule event analytics on
edge and Cloud resources (§2), and formalize the op-
timization problem and constraints for adaptive place-
ment of dataflows that dynamically arrive/depart (§3).
2) We propose novel heuristics and rebalancing strategies
to solve the above problem, besides extending a prior
GA-based approach for a dynamic scenario (§4).
3) We validate the scheduling strategies for their quality (latency and stabilization time) and performance (time complexity) using 39 real CEP dataflows on 100–1,000 edge devices and VMs, with arrival and departure modeled as random walk and Poisson distributions (§5).
In addition, we also review related literature (§6) and
summarize our conclusions (§7).
2. Motivation
Campus and community scale IoT testbeds, as well as
a few city-scale deployments, are coming online [2], [14],
[15]. One such example is the Smart Campus Project [13]
which is deploying an IoT fabric of sensors, actuators
and gateway devices across the IISc university campus in
Bangalore. This will help understand practical and research
challenges on Smart City platforms [16]. The campus is spread across 1.6 km² with over 10,000 students, staff and faculty, and 50 buildings, which is representative of a small
township. The deployment initially targets smart water man-
agement, with water level and quality sensors and flowme-
ters deployed, along with actuators to control pumping and
valve operations. Event streams from hundreds of sensor
observations flow through Raspberry Pi gateway devices that
sit between the sensors, connected through ZigBee/LoRA
wireless protocols, and the campus backbone. Event analyt-
ics drive decision-making, such as turning pumps on and off when the water level is low or high, and sending alerts when water quality drops or leakages in the network are detected.
Numerous CEP queries help with online processing of
these real-time streams to detect such situations of interest.
E.g., the following queries over water quality and level sensors, sampled every 5 mins, detect when the chlorine level drops below a safety threshold [17] and when the water level drops rapidly over 15 mins, indicating leakage, and report the average sliding usage over 60 mins. They are described using the Siddhi CEP engine’s query model, and represent the filter, sequence, and aggregate query types.
◦ FROM qltyStm[clMgL < 0.2]
  SELECT clMgL INSERT INTO qltyAlertStm;
◦ FROM EVERY l1=lvlStm, l2=lvlStm[l1.htCm - l2.htCm > 25],
       l3=lvlStm[l2.htCm - l3.htCm > 25]
  SELECT l1.htCm, l3.htCm INSERT INTO leakAlertStm;
◦ FROM lvlStm #window.length(12)
  SELECT avg(htCm) as avgHt INSERT INTO avgLvlStm;
CEP engines like Siddhi allow composition of queries
as a Directed Acyclic Graph (DAG), where the output
stream from one query feeds in as the input stream for
one or more queries. Such dataflows help utility managers
design meaningful event analytics from modular queries.
Many such event analytics will be active at a time. Some
will be persistent to drive dashboards and alerts, while
others are exploratory or personalized for individual build-
ings or analysts. These event analytic dataflows will be
registered/deregistered with our IoT platform for deploy-
ment onto available resources for execution on the rele-
vant streams. Reducing the end-to-end latency for a given
dataflow is one of the Quality of Service (QoS) metrics.
Besides the 10–1000s of such Pi’s active in a typical campus setup, we also access pay-as-you-go Cloud VMs at Microsoft’s Azure data center in Singapore, where the backend services run. The Pi’s are powered by solar panels and rechargeable batteries. While the Pi’s can be used for performing CEP queries within the compute, memory and network constraints of each device, we should also ensure that the battery of these gateways does not fully drain, which would cause even the basic sensing capability to be lost.
3. Optimization Problem for Query Placement
Here, we formally define the problem of scheduling CEP
queries present in analytic dataflows that dynamically arrive
and depart, onto edge and Cloud resources. This has the
goal of reducing the overall latency of the running dataflows,
while meeting compute, network and energy constraints. In
§3.1, we introduce and reuse notations from our earlier work
on static placement [10], and use these in §3.2 onward to
formalize the dynamic variant of the optimization problem.
3.1. Preliminaries: Single Analytic Dataflow
The analytic dataflow is represented as a Directed Acyclic Graph (DAG) of vertices and edges: G = ⟨V, E⟩, where V = {v_i} is the set of CEP queries that are the vertices of the DAG, and E = {⟨v_i, v_j⟩} is the set of event streams that pass the output from v_i to the input of v_j [10]. Source queries (v_i ∈ V^SRC) serve as dummy inputs to the DAG, representing the source sensor streams, and do not have any predecessor queries, while sink queries (v_i ∈ V^SNK) are the final outputs from the analytic and do not have successors. We assume that the output events of a query are duplicated across all outgoing edges from that vertex, and the inputs for a query from multiple incoming edges are interleaved.
A path p_i ∈ P of length n in the graph G is a connected sequence of n edges with n+1 vertices, starting at a source query and ending at a sink query. P is the set of all paths in the DAG.
Stream rate is the number of events passing per unit time on a stream [10]. The input stream rate, Ω^in, to a dataflow is the sum of the output stream rates from all source queries in the DAG, while the output stream rate, Ω^out, for the dataflow is the sum of the output stream rates from the sink queries. A selectivity function σ(v_i) gives the average number of output events expected for each input event processed by a query. The input stream rate, ω^in_i, for a vertex v_i is the sum of the stream rates on all its incoming edges, and its output stream rate is the product of its incoming stream rate and its selectivity, ω^out_i = ω^in_i × σ(v_i). We can then recursively compute the input and output rates for downstream queries v_j, and the output rate for the entire DAG. For simplicity, if the output stream rate for all source queries v_k ∈ V^SRC is uniform, we have ω^out_k = Ω^in / |V^SRC|. The selectivity for the whole dataflow is σ(G) = Ω^out / Ω^in.
We consider two classes of computing resources – R^E for edge devices and R^C for Cloud VMs – with the set of all computing resources available in the IoT fabric given by R = {r_k} = R^E ∪ R^C, where R^E ∩ R^C = ∅ [10]. A CEP query in a dataflow executes on a single resource r_k, and a resource mapping function, µ : V → R, indicates this.
Compute latency (or latency), denoted by λ^k_i, is the time taken to process one event by a query v_i on an exclusive resource r_k. This will depend on both the query type as well as the resource type. If the size of an event that is emitted by the query on its outgoing edge(s) is δ_i, and the network latency and network bandwidth between two resources r_m and r_n are given by l_{m,n} and β_{m,n}, respectively, then the end-to-end latency along a path p ∈ P for a given resource mapping µ for the DAG is defined as:

$$L_p = \sum_{\substack{\langle v_i, v_j \rangle \in p \\ (v_i, r_m) \in \mu \\ (v_j, r_n) \in \mu}} \left( \lambda^m_i + l_{m,n} + \frac{\delta_i}{\beta_{m,n}} \right)$$

The maximum over the end-to-end latency along all paths p ∈ P gives us the makespan for the DAG for the given mapping, with the maximal path called the critical path:

$$L_G = \max_{\forall p \in P} \left( L_p \right)$$
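For a candidate mapping µ, L_G can be evaluated directly by enumerating all source-to-sink paths. A minimal Python sketch (our illustration, assuming dicts for λ, l, β and δ keyed as shown):

def path_latency(path, mu, lam, net_lat, net_bw, size):
    """End-to-end latency L_p along one path [v0, v1, ...] of queries."""
    total = 0.0
    for vi, vj in zip(path, path[1:]):  # each edge <vi, vj> on the path
        rm, rn = mu[vi], mu[vj]
        total += lam[(vi, rm)] + net_lat[(rm, rn)] + size[vi] / net_bw[(rm, rn)]
    return total

def makespan(paths, mu, lam, net_lat, net_bw, size):
    """L_G: maximum L_p over all paths; the maximizing path is critical."""
    return max(path_latency(p, mu, lam, net_lat, net_bw, size) for p in paths)

Path enumeration is tractable for the small DAGs considered here; larger or denser DAGs would warrant a dynamic-programming traversal instead.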
3.2. Preliminaries: Dynamic Analytic Dataflows
We extend the single static dataflow above to a dynamic situation where analytics arrive and depart the system over time. We represent the set of active dataflows at logical time t as G_t = {G_0, G_1, ..., G_n}, where G_i = ⟨V_i, E_i⟩ is one active dataflow with CEP queries v_{i,k} ∈ V_i. They have corresponding input and output stream rates of Ω^in_i and Ω^out_i, respectively. Their selectivities are denoted as σ(G_t) = {σ(G_0), σ(G_1), ..., σ(G_n)}, where σ(G_i) = Ω^out_i / Ω^in_i. When these dataflows are mapped to a set of edge and Cloud resources R, their set of mapping functions at time t is represented as M_t = {µ^t_0, µ^t_1, ..., µ^t_n}, where µ_i : V_i → R.
The system of analytic dataflows deployed on the set of IoT resources at time t may undergo a change at each subsequent time interval. We define a control interval, θ, as the time period at which dataflows can arrive or depart. This is the interval at which (re)mapping decisions should be made. At each control interval, we consider three possible activities by the users: a new DAG arrives, an existing DAG departs, or no change happens, and we take appropriate scheduling actions. For convenience, we increment the logical time t in units of θ so that t+1 indicates the next control interval. Here, we assume that at each control interval, only a single dataflow may arrive or depart, and that the dataflows’ input rates are stable; these assumptions can be generalized in future work.
As the analytic dataflows arrive and depart, it is necessary to deploy new queries onto available resources, or stop old queries and release resources. Additionally, remapping queries of other active dataflows may be needed to ensure that their performance is not affected. Let the mapping functions for the active dataflows G_t at time t be the set M_t. If a new DAG arrives or an existing one leaves, the set M_{t+1} will have a new mapping added, an old mapping removed, and/or reconfigurations of mappings of dataflows that stay active. The time taken to find the new mappings for the active DAGs at time t is the schedule planning time, φ_t.
A reconfiguration of the mapping for a dataflow that continues to be active after a control interval will cause a change in its prior mapping of queries to resources. We define a binary function ρ^t(v_{k,i}) for a vertex v_{k,i} ∈ V_k in DAG G_k at time t to capture the occurrence of a reconfiguration:

$$\rho^t(v_{k,i}) = \begin{cases} 1 & \langle v_{k,i}, r_p \rangle \in \mu^t_k,\ \langle v_{k,i}, r_q \rangle \in \mu^{t+1}_k,\ r_p \neq r_q \\ 0 & \text{otherwise} \end{cases}$$

Migration time is a constant time η taken for moving a CEP query v_{k,i} from resource r_p to r_q upon reconfiguration. The total number of migrations at control interval t is given by:

$$\rho^t = \sum_{\forall v_{k,i} \in V_k,\ G_k \in G_t} \rho^t(v_{k,i})$$
When a query of a DAG G_k is migrated, the input stream for the query is buffered for later downstream processing after the schedule has been enacted. Given an input stream rate of ω^in_{k,i} for a query v_{k,i}, the number of events that will be buffered in a queue during a control interval action is:

$$q_{k,i} = \omega^{in}_{k,i} \times \eta \quad \text{such that } v_{k,i} \in V_k,\ \rho^t(v_{k,i}) = 1$$

After the schedule has been enacted, these events buffered during the migration have to be processed along with events that continue to arrive on the input streams to the DAG. The amount of time required for a query v_{k,i} to process and drain all buffered events, and catch up to a stable rate, is called the stabilization time ψ_{k,i}. If the latency for processing an event for a vertex v_{k,i} mapped on a new resource r_p is λ^p_{k,i}, the stabilization time for that vertex can be calculated as:

$$\psi_{k,i} = \frac{q_{k,i}}{\frac{1}{\lambda^p_{k,i}} - \omega^{in}_{k,i}}$$

which is the buffered queue size divided by the difference between the supported input rate and the current input rate. The total stabilization time ψ for a collection of dataflows G after reconfigurations at time t is given by:

$$\psi^t = \max_{\forall k,\ \forall v_{k,i} \in V_k,\ \rho^t(v_{k,i}) = 1} \psi_{k,i}$$
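As an illustrative calculation (the numbers are ours, not from the paper): if a migrated query receives ω^in_{k,i} = 50 e/sec and η = 1 sec, then q_{k,i} = 50 events are buffered during the migration; if the new resource supports 1/λ^p_{k,i} = 80 e/sec, the backlog drains at 80 − 50 = 30 e/sec, giving ψ_{k,i} = 50/30 ≈ 1.7 secs.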
3.3. Query Placement Constraints
Based on the earlier motivating scenario, we define
several constraints to be met when deciding the placement
of queries to edge and Cloud resources. These are similar to
our earlier work on static query placement [10], but modified
for the dynamic scenario that we now consider.
Constraint 1. Source queries in all DAGs must be placed on edges, while the sink queries must be placed on the Cloud:

$$\forall \mu^t_k,\ \langle v_{k,i}, r_p \rangle \in \mu^t_k : \quad v_{k,i} \in V^{SRC}_k \implies r_p \in R^E, \qquad v_{k,i} \in V^{SNK}_k \implies r_p \in R^C$$
Event analytics operate on streams sourced from the edge
but often have costly control decisions occurring in the
Cloud. Hence, the dataflow should consume events from
the edge and deliver results to the Cloud. Thus, the decision
making responsiveness will depend on the end-to-end la-
tency across the edge and Cloud network. This forces source
queries to be co-located on edge devices that generate the
input stream(s), and sink queries to be placed on VMs.
Constraint 2. Given an input rate ω^in_{k,i} to query v_{k,i} of DAG G_k, the query must not overwhelm the compute capacity if exclusively mapped to a resource r_p:

$$\omega^{in}_{k,i} < \frac{1}{\lambda^p_{k,i}} \quad \forall v_{k,i} \in V_k$$

If multiple queries from one or more DAGs run on the same resource r_p, then the input rate ω^in_{k,i} on a vertex v_{k,i} that the resource can handle is limited by:

$$\omega^{in}_{k,i} < \frac{1}{\Big( \sum_{\mu^t_k} \sum_{\substack{(v_{k,j}, r_p) \in \mu^t_k \\ v_{k,j} \notin V^{SRC}_k}} \lambda^p_{k,j} \Big) \big( 1 + \pi(m) \big)}, \quad m = \sum_{\mu^t_k} \big| \{ v_{k,j} : (v_{k,j}, r_p) \in \mu^t_k \} \big|, \quad \forall v_{k,i} \in V_k,\ v_{k,i} \notin V^{SRC}_k$$

The maximum input rate that a resource r_p can handle when exclusively running a query v_{k,i} is the inverse of its latency, 1/λ^p_{k,i}, and for multiple queries it is the inverse of the sum of their latencies, 1/Σλ^p_{k,i}. However, there is likely to be additional overhead in the latter case due to interference between the concurrent queries. If m queries are running on a resource r_p, let π(m) denote this parallelism overhead, which is obtained empirically. Hence, we should only place a query on a resource if it will not receive an input rate greater than this upper-bound throughput.
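A sketch of this feasibility test in Python (the helper name and dict-based inputs are our assumptions; π is the empirically fitted overhead function):

def rate_feasible(query, resource, placed, lam, in_rate, pi):
    """Constraint 2: can `resource` host `query` alongside the non-source
    queries already placed on it? lam[(q, r)] is the per-event latency of
    q on r, in_rate[q] its input rate, pi(m) the parallelism overhead."""
    queries = placed + [query]
    lat_sum = sum(lam[(q, resource)] for q in queries)
    max_rate = 1.0 / (lat_sum * (1 + pi(len(queries))))  # shared upper bound
    # every query on the resource, existing and new, must stay under it
    return all(in_rate[q] < max_rate for q in queries)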
Lastly, edge resources on the field are often powered by rechargeable batteries that are connected to renewables like solar panels to reduce their maintenance. Let C_p, in mAh, be the power capacity of a fully-charged battery for the edge device r_p ∈ R^E. Let the base load (instantaneous current) drawn by the edge device when no queries are running be κ_p, in mA. Let ε^p_{k,i} be the incremental power, in mAh beyond κ_p, drawn on the edge resource by a query v_{k,i} to process a single input event. Let the recharge interval to fully recharge this battery be τ_p, in seconds, say through solar generation or by replacing the battery.
Constraint 3. The queries running on an edge device r_p should not fully drain its battery capacity within the recharge time interval τ_p:

$$\tau_p \times \kappa_p + \sum_{\mu^t_k} \sum_{\substack{(v_{k,i}, r_p) \in \mu^t_k \\ v_{k,i} \notin V^{SRC}_k,\ r_p \notin R^C}} \left( \omega^{in}_{k,i} \times \varepsilon^p_{k,i} \right) \leq C_p$$
The first term is the base load that drains the edge resource,
even when inactive, during the recharge interval. The second
term is the incremental power for processing events by all
queries mapped to that edge at their respective input rates.
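A minimal sketch of this battery check (under our reconstruction above; the symbol ε^p_{k,i} and the helper names are ours):

def energy_feasible(queries, tau_p, kappa_p, cap_p, in_rate, eps):
    """Constraint 3: the non-source queries mapped to edge r_p must not
    drain its battery within one recharge interval. tau_p: recharge
    interval (secs); kappa_p: base load (mA); cap_p: capacity (mAh);
    eps[q]: incremental power to process one event of q on r_p."""
    drain = tau_p * kappa_p + sum(in_rate[q] * eps[q] for q in queries)
    return drain <= cap_p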
3.4. Optimization Problem
Consider a set of DAGs G_t = {G_0, G_1, ..., G_n} that have been scheduled at time t on a set of edge and Cloud resources R, denoted by a set of resource mappings M_t = {µ^t_0, µ^t_1, ..., µ^t_n}, without violating the above three constraints. When a DAG arrives or leaves the system at time t+1, the primary objective is to find a new mapping set M_{t+1} that meets Constraints 1, 2 and 3, while minimizing the sum of the makespans for all the DAGs G_{t+1}, given by:

$$\widehat{L}_{G_{t+1}} = \sum_{G_i \in G_{t+1}} \min_{\forall \mu^{t+1}_i : V_i \to R} L_{G_i}$$

Secondary objectives to this optimization problem are to minimize the schedule planning time φ_{t+1} at the control interval t+1, the total number of migrations performed, ρ_{t+1}, and the total stabilization time ψ_{t+1}.
4. Adaptive Placement Strategies
The solution to the above optimization problem is NP-complete, as optimal DAG scheduling in general is NP-complete [18]. As DAGs arrive and depart, placing them on or removing them from existing resources will affect the latency and the constraints of the other DAGs that are collocated on the same resource(s). As a result, an optimal placement that minimizes the sum of makespans of all active DAGs may require all the queries of all active DAGs to be rescheduled. The time complexity of a brute-force solution at a given control interval t is exponential in the number of queries, O((|V_t| + |E_t|) × |R|^n), where n = |V_t|. This takes days to solve optimally even for a single dataflow with 14 queries on, say, 50 resources [10]. Hence we explore heuristics that offer a reasonable quality solution to minimize the makespan sum while guaranteeing that the constraints are met.
In our proposed approach, we perform several actions at
each control interval. When a dataflow arrives, we need to
determine which available resources to place its queries on, while reducing its end-to-end latency. We should meet its
constraints, and also ensure that the constraints of existing
dataflows on those resources continue to be met. We propose
a novel dataflow scheduling heuristic for performing this in §4.1, and further extend a prior GA-based meta-heuristic to this dynamic scenario in §4.2. When a dataflow departs, we need to stop its CEP queries and reclaim its resources; this does not violate the constraints of the existing dataflows.
Once the dataflow in question has been mapped/un-
mapped within the constraints, we next check if the sum of
dataflow makespans can be improved by reconfiguring the
dataflow queries through migrations, while also minimizing
the number of migrations which will affect the stabilization
time. We propose several rebalancing strategies for these
selective migrations in §4.3. Both the dataflow schedul-
ing heuristic and the rebalance strategies contribute to the
schedule planning time, which must be reduced so that these
online algorithms complete within each control interval.
4.1. Topological Set Ordering (TopSet) Heuristic
The makespan for each event analytic dataflow that ar-
rives is determined by its critical path. Topological sorting is
frequently used for DAG scheduling [18], where the queries
in the DAG are traversed in a BFS order starting from the
source tasks and scheduled on the most suitable available
resource, in that order. Others use a rank based approach
that assigns a priority for each query in the DAG based on
its presence in the critical path from the source to the sink
tasks [19]. However, these are designed for batch workflows,
rather than streaming dataflows, and for scheduling a single
workflow rather than dynamic dataflows. We adapt these
techniques for our heuristic by extending the topological
DAG ordering with a local ranking at each level.
When a DAG G_k arrives, our TopSet heuristic traverses it in topological order, which ensures that all parent (upstream)
queries are visited before their child queries. Further, we
rank the children of a parent so that they are visited in de-
creasing order of query latency. Such an ordering is obtained
by finding the topological set ordering. When performing
the multi-source BFS traversal, instead of appending a child
to the topological list, we merge all children at the same
depth for a parent into one set. This traversal will return a
list of query sets, with each set having sibling queries and
with the previous set in the list referring to the parent set.
Formally, a topological list of sets S_k = [S_i] for the traversal of DAG G_k = ⟨V_k, E_k⟩ is defined as a recurrence:

$$S_0 = \{ v \in V_k \mid \forall u \in V_k,\ \langle u, v \rangle \notin E_k \}$$

$$S_{i+1} = \Big\{ v \in V_k \setminus \bigcup_{j=0}^{i} S_j \;\Big|\; \forall u \in V_k : \langle u, v \rangle \in E_k \implies u \in \bigcup_{j=0}^{i} S_j \Big\}$$

where S_0 is the source set containing all source queries.
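This level-set construction is effectively a multi-source BFS over the DAG; a Python sketch (our illustration):

def topological_sets(vertices, edges):
    """Return the topological list of sets [S0, S1, ...]: S0 holds the
    source queries, and each later set holds the queries all of whose
    parents appear in earlier sets."""
    parents = {v: set() for v in vertices}
    for u, v in edges:
        parents[v].add(u)
    done, sets = set(), []
    while len(done) < len(vertices):
        level = {v for v in vertices
                 if v not in done and parents[v] <= done}
        if not level:
            raise ValueError("graph has a cycle; not a DAG")
        sets.append(level)
        done |= level
    return sets

E.g., topological_sets(['a','b','c','d'], [('a','b'), ('a','c'), ('b','d'), ('c','d')]) returns [{'a'}, {'b','c'}, {'d'}], with the siblings 'b' and 'c' merged into one set.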
Given S_k for the incoming DAG, we visit each set in the list sequentially, and within each set visit the queries in decreasing order of critical latency to consider them for placement on the available resources. The capacity of the resources R is a function of the concurrent queries from existing dataflows running on each. When a query v_{k,i} is visited, we evaluate its quality on each resource r_p ∈ R by calculating the critical path length from the source queries to this query:

$$L_i = \max_{\substack{\forall v_{k,j} \in parents(v_{k,i}) \\ (v_{k,j}, r_q) \in \mu_k}} \left( \lambda^p_{k,i} + l_{q,p} + \frac{\delta_j}{\beta_{q,p}} \right)$$
When we place the query vk,i on a resource rpwhich
already has other upstream queries from the dataflow placed
in it, there will be an impact on the latencies of the previous
queries due to interference. We optionally assign a penalty
on the current query’s latency for that resource. This penalty
is the sum of the increases in the critical path lengths L_j for all queries v_{k,j} placed on the resource r_p, relative to their previously estimated critical path lengths. We term this variant TopSet/P.
We consider a resource as a valid mapping for a visited
query only if it does not violate the three constraints, either
for this query or for prior queries placed on it. Among
the valid resources, we select the one with smallest critical
path latency for mapping this query, and add it to the
mapping function for the DAG, µ_k. We expect a query to be more likely to be placed on an edge resource until the latency/capacity cost of the edge surpasses the network latency from the edge to the Cloud. Once an edge violates the constraints, queries are likely to move to Cloud resources.
TopSet only schedules the queries of the arriving DAG on the available resources. It does not migrate queries in existing dataflows. It also does not handle DAG removals explicitly,
and we just remove the queries of the departing DAG from
the resources they were placed on. This may improve the
performance of remaining DAGs, but this improvement may
be sub-optimal. Later, we discuss rebalancing strategies to
globally reconfigure queries across all active dataflows.
4.2. Extensions to GA Meta-heuristics
In our prior work, we reduced the single dataflow
scheduling problem to a Genetic Algorithm (GA) formu-
lation [10]. That approach models a chromosome as the
mapping function µ, with a length n that matches the number of queries in the dataflow and each basepair having a value from [0, |R|−1]. The GA algorithm generates a
random population of chromosomes, and uses crossovers
and mutations to create a new generation of population [20].
In each population, we penalize chromosomes that violate
any constraint. We keep track of the chromosome which
gives the best makespan among all valid solutions seen. The
algorithm terminates after a fixed number of generations, or
if no improvement is seen in the past 50% of generations.
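To make the encoding concrete, a simplified sketch of the chromosome and its fitness (our rendering; makespan_fn and constraints_ok stand in for the definitions of §3, and the crossover/mutation loop follows standard GA practice [20]):

import random

def random_chromosome(num_queries, num_resources):
    """One chromosome: each basepair maps a query to a resource index."""
    return [random.randrange(num_resources) for _ in range(num_queries)]

def fitness(chrom, queries, resources, makespan_fn, constraints_ok):
    """Lower is better: the makespan of the implied mapping, heavily
    penalized if any of Constraints 1-3 is violated."""
    mapping = {q: resources[g] for q, g in zip(queries, chrom)}
    penalty = 0.0 if constraints_ok(mapping) else 1e9  # invalidate violators
    return makespan_fn(mapping) + penalty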
We propose two extensions to this GA algorithm to
support the current problem of having multiple dataflows
arriving and departing dynamically from the system.
GA-Incremental (GAI). When a new DAG arrives, we
use GA to schedule its queries on the available resources,
after reducing their capacities based on the previous queries
running on them. We obtain the latency λ_k, energy ε_k and throughput rate ω^in_k supported by the resources from earlier
deployments to drive this GA placement. We run GA on
the DAG with these updated resource capacities. If the GA
converges to a valid placement, we deploy the DAG on the
resources returned by the chromosome mapping function. If
the GA does not converge due to constraint violations, we
discard the DAG and report an error. The resource mappings
for prior dataflows are retained, and only queries in the
arriving DAG are incrementally scheduled. As for TopSet,
GAI does not handle DAG departures other than reclaiming
those resource capacities for future DAG arrivals.
This approach is simple, reusing an earlier algorithm for a static scenario, and requires no reconfiguration of existing dataflows. However, GA, while faster than a brute-force approach, still takes longer to converge than our TopSet
heuristic which has a bounded time. GA cannot guarantee an
optimal solution either, as with similar meta-heuristics [20].
So it is ill-suited for online scheduling of dynamic DAGs.
GA-Global (GAG). Another variant of GA addresses the
potential sub-optimal global state of dataflows after a single
dataflow has been placed or an existing dataflow unmapped
by TopSet or GAI. When a DAG arrives or departs the
system at time t, the set of active DAGs G_t changes. GAG considers this entire set of dataflows for ab initio placement, irrespective of their prior placement on resources. We translate the DAGs in G_t into a single global DAG by connecting
sinks to a dummy sink. GA is then run on this global DAG
to place all queries in all active DAGs on the set of all
resources whose full capacities are available.
This is expected to give a better solution for the optimization problem than GAI, since all DAGs (and not just the most recently arrived DAG) are considered for placement.
However, such an approach results in high migration costs
since the placement for all queries, both new and previously
placed, will change at each control interval. This will also
have a larger schedule planning time relative to GAI.
4.3. Rebalancing of Placements
It is common for DAG scheduling heuristics to start with
an approximate placement and then perform a constrained
search to incrementally improve this solution [21]. We pro-
pose rebalancing strategies that start from a valid schedule from one of the above heuristics, and then improve upon its solution to reduce the overall makespan of the dataflows.
These specifically look at the queries on the critical path of
different dataflows and try to migrate these queries to better
resources, provided that no constraints are violated.
Vertex Rebalancing. In this strategy, the query having
the highest latency along the critical path of a DAG is
migrated to a resource which has a higher compute capacity,
thereby reducing the critical path length of the DAG.

[Figure 1: DAG size distributions for workload generation]

When a vertex rebalance is performed, the DAGs currently present in the collection G_t at time t are sorted in decreasing order
of their critical path latency. Each DAG is visited in the
sorted order obtained, and the query with highest latency
along its critical path is chosen for rebalance. This query is
moved to a resource which reduces the objective value of
the optimization problem. Such a relaxation of the costliest
query in the critical path of each active DAG results in a maximum of |G_t| migrations.
Edge Rebalance. Besides query latency due to the compute
capacity of a resource, network latency forms the other
major factor in the critical path. This will also be affected by
the network latencies between different resources deployed
in a wide area network. Edge rebalance also sorts the DAGs
in decreasing order of critical path latencies. In each path,
it identifies the edge with the highest network cost (latency
and bandwidth) for rebalance. We then test if moving the
upstream query of this critical edge to the same resource
as the downstream query will improve the makespan, or if
moving the downstream query to the upstream resource will
help. We pick the operation that offers the better improve-
ment. A maximum of |G_t| migrations may be performed.
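A sketch of a single vertex-rebalance pass (our rendering; critical_path, objective and constraints_ok are hypothetical helpers implementing the definitions of §3):

def vertex_rebalance(dags, mapping, resources, lam, critical_path,
                     objective, constraints_ok):
    """Visit DAGs in decreasing order of critical path latency and move
    the costliest query on each critical path to the valid resource that
    most lowers the overall objective (at most |G_t| migrations)."""
    migrations = 0
    order = sorted(dags, key=lambda d: critical_path(d, mapping)[1],
                   reverse=True)
    for dag in order:
        path, _ = critical_path(dag, mapping)  # (queries on path, latency)
        v = max(path, key=lambda q: lam[(q, mapping[q])])  # costliest query
        best_r, best_obj = mapping[v], objective(mapping)
        for r in resources:
            trial = dict(mapping)
            trial[v] = r
            if constraints_ok(trial) and objective(trial) < best_obj:
                best_r, best_obj = r, objective(trial)
        if best_r != mapping[v]:
            mapping[v] = best_r  # migrate the query to the better resource
            migrations += 1
    return mapping, migrations

The edge-rebalance pass is analogous, except that it selects the costliest network edge on the critical path and tries collocating its two endpoint queries.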
The code repository is available online at https://github.com/dream-lab/ec-sched.
5. Results
We perform a realistic simulation study to evaluate the
dynamic dataflow scheduling strategies. These use detailed
traces from real-world micro and application benchmarks on
the Campus IoT deployment, on edge and Cloud resources.
System Setup. We use the Raspberry Pi 2 Model B v1.1 as our edge device, with a 900 MHz 4-core ARM A7 CPU, 1 GB RAM and a 100 Mbps NIC. A Microsoft Azure D2 VM in the Southeast Asia data center serves as our Cloud resource, with a 2.2 GHz 2-core Intel Xeon E5 CPU, 7 GB RAM and a Gigabit NIC. Both resource types run Linux and WSO2’s Siddhi CEP engine within a JRE [6]. The edge devices are connected by the Gigabit Ethernet campus network, and access the Azure data center through the public Internet. In our IoT deployment for the simulation, we used two resource setups: small, with 96 Pi devices and 4 Azure VMs for a total of 100 resources, and large, with 960 Pi devices and 40 Azure VMs for a total of 1,000 resources.
[Figure 2: U_t across time for the workloads: (a) Random Walk, Û = 2.0 ± 0; (b) Random Walk, Û = 2.0 ± 0.5; (c) Random Walk, Û = 2.0 ± 1.0; (d) Poisson, λ = 12. Small has 100 devices/100 intervals and Large has 1,000 devices/400 intervals.]
This captures campus-scale and township-scale IoT deployments, and bounds the operational costs for the pay-as-you-go Cloud VMs.
We identify 21 different CEP query types that span filter, sequence, pattern, sliding window aggregate and batch window aggregate, with different configurations, such as sequence length, window length and selectivity. These
queries are individually micro-benchmarked on the Pi and
the Azure VM to measure various coefficients used in
§3, such as their peak compute throughput, parallelism
overhead, base load and incremental power consumption,
etc. We also make detailed network latency and bandwidth
measurements between pairs of edge devices, and the edge
and Cloud VM. During the simulation runs, we sample
values from these distributions to capture the variability in
performance of even identical resources when operating in
the field. For brevity, we refer the readers to the detailed
benchmark measurements from earlier [10].
Simulation Workloads. We generate 39 different event analytic dataflows using the Random Task and Resource Graph (RTRG) utility [22], with 4–50 vertices each and a fan-out of up to 5 edges. We uniformly sample and assign one of the benchmarked CEP queries to each vertex. We use a constant cumulative input rate of 100 e/sec for each dataflow, though the input rate per source query can range from 25–100 e/sec. We ensure coverage of query types (21), selectivities (0.01–458), output rates (1–11,457 e/sec), source queries (1–4), sink queries (1–3), etc., and make sure there is a feasible valid placement for the dataflow on the available resources [10].
We simulate the dynamic arrival and departure of DAGs
at each control interval using a Random Walk (RW) and a
Poisson distribution [23]. In the Poisson model, we alternate
between adding and removing a DAG after initially adding
16 and 70 DAGs for Small and Large resource setups, re-
spectively. In the RW model, we use the cumulative resource
utilization to decide whether to add or remove a DAG. The
utilization of the resources R, with DAGs G_i = ⟨V_i, E_i⟩ ∈ G_t active at time t, is:

$$U_t = \frac{\sum_{i=1}^{|G_t|} |V_i|}{|R^E| + |R^C|}$$

We define a utilization threshold Û = U ± u. We continue adding DAGs at each control interval t+1 while U_t < (U + u), and switch to removing a DAG while U_t > (U − u). This oscillates DAG addition and removal within an upper and lower bound. We use U = 2, i.e., an average of 2 queries per resource, and 3 values of u = 0, 0.5 and 1.0, which allows an average deviation of ±u queries from 2.
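The add/remove decision of the RW model is a simple hysteresis loop over the utilization; a sketch (our illustration):

def random_walk_step(util, adding, U=2.0, u=0.5):
    """Keep adding one DAG per control interval while U_t < U+u; once the
    upper bound is crossed, keep removing until U_t < U-u, so the
    utilization oscillates within the threshold band U +/- u."""
    if adding and util >= U + u:
        adding = False  # switch to removing DAGs
    elif not adding and util <= U - u:
        adding = True   # switch back to adding DAGs
    return adding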
Once we have decided to add or remove a DAG at a
control interval, the size of the DAG is determined from
a probability distribution shown in Fig. 1. For RW, the
chance that a DAG is selected is inversely proportional to
the number of queries it has, while for Poisson, it follows
a Poisson distribution with λ= 12 queries, which is close
to the median DAG size. Given the DAG size, we choose a
specific DAG to add by uniformly selecting from the differ-
ent variants of this DAG size in the pool of 39 dataflows; or
we choose a specific DAG to remove by linearly searching
through the active DAGs till one with this size is located,
and otherwise, sample again from the distribution till a size
matches an active DAG. We run this simulator for 100
and 400 control intervals for the Small and Large resource
setups, respectively. Fig. 2 shows the utilization of the generated workloads across the control intervals as DAGs are added or removed. This variation is smoother for the large setup than for the small one, since the relative change in utilization due to adding/removing a single DAG is smaller.
Quality of Overall Makespan. Fig. 3 shows the sum of the end-to-end latencies of all active DAGs (Y-axis) at each control interval (X-axis) for the 4 workloads using the small (top row) and large (bottom row) setups, when not performing rebalance. Besides our TopSet and TopSet/P heuristics, and the GAI and GAG meta-heuristics, we also test two naïve baseline placement strategies, Edge-Only and Cloud-Only. In the Edge-Only baseline, we consider an infinite availability of edge resources with no network overhead among them; thus all non-sink queries are placed on exclusive edges and the network latency from edge to Cloud VM is paid only once. In the Cloud-Only baseline, we place all non-source queries on exclusive Cloud VMs, again paying network latency only once from edge to Cloud. Neither considers constraint violations. While infeasible, these baselines offer a weak lower bound for the makespan when not limited by resources and network latency.
In all the cases, we see that the Cloud-Only baseline has the least total makespan summed across all active DAGs, since the compute speed of the Cloud resource is the fastest. Edge-Only is the next best, and indicates the impact of the slower ARM CPU relative to the Xeon CPU for running queries. Neither takes network time into account, and hence both have a better makespan than all the other heuristics.
[Figure 3: Sum of DAG makespans, L̂_{G_{t+1}}, at each control interval for placement strategies without rebalancing, for the (i) Small and (ii) Large setups: (a) Random Walk, Û = 2 ± 0; (b) Random Walk, Û = 2 ± 0.5; (c) Random Walk, Û = 2 ± 1.0; (d) Poisson, λ = 12.]
As expected, the large setup has a 10× larger makespan (≈ 5 secs) than the small one, due to 10× more active vertices for the RW workloads. For the Poisson workload on the large setup, the makespan is only 6× larger; this is because small DAGs have a relatively higher probability of being added than under the RW workloads, and thus contribute less to the makespan.
For both setups, GAI performs better than GAG, although we would expect the global scheduling algorithm across all DAGs to out-perform the incremental algorithm that considers just the new DAG. Upon examination, there are a large number of queries being scheduled by GAG in both setups, especially as the utilization peaks. This causes the search space to explode, resulting in a sub-optimal solution. For the large setup, this space explosion is seen for both GAI and GAG, and these strategies fail to return any solution beyond ≈ 100 intervals (‘?’ in Fig. 3(ii)). GAG fails to return a solution at an earlier control interval than GAI. Tuning the GA meta-parameters did not help either.
We see that our TopSet and TopSet/P heuristics have
total makespan values that match closely with the better
of GAI or GAG, and they are able to find a solution in all
cases. This exhibits the robustness of our heuristics. We also
notice that TopSet/P is marginally but consistently better
than TopSet. The latter only considers the current query
being visited in the topological set traversal to find the cost
of mapping to a resource, while the former also estimates the
side-effect of the mapping on other queries with a penalty
function. We see the benefit of that optimization here.
We measure the impact of the rebalance strategies that complement the heuristics after their initial placement. Fig. 4 plots the relative improvement in makespan when performing rebalance after the arrival and departure of DAGs, compared to without; a positive value indicates a reduction in the total makespan. Rebalance is only relevant to TopSet, TopSet/P and GAI, since GAG does a global DAG placement. The GAI values are less relevant for the large setup due to its few valid solutions. While vertex rebalance appears to have only a small impact, sometimes even doing worse, edge rebalance has significant benefits, reducing the total makespan by up to 20% for the small setup and 25% for the large setup. This confirms that the network plays a greater role in the DAG latency in such distributed setups. Interestingly, for the Poisson and RW 2 ± 0.5 workloads, combining vertex and edge rebalance is better than edge alone, even though vertex rebalance by itself offers limited benefits. While all three heuristics see benefits from rebalance, GAI for the small setup appears to improve more frequently as it starts from a better solution base.
Runtime Performance. The secondary measures of the
solution quality have to do with reducing the schedule
planning time, number of migrations and stabilization time.
Fig. 5 shows the schedule planning time taken by the four heuristic algorithms upon DAG arrival (and removal too, for GAG) at a control interval. The Y-axis is in log scale. We see that the GA-based meta-heuristics take much more time to converge to a placement than our TopSet strategies. In fact, our heuristics give a placement mapping within 1 sec in all cases for all workloads, barring some outliers. GAG and GAI take 10^4× and 10^3× longer, ranging from under a minute for GAI to multiple hours for GAG. We also see (not shown) that there is a strong correlation between the number of queries being scheduled and the time taken for the schedule planning, which is also correlated with the utilization. As a result, TopSet and TopSet/P are not just robust in always giving a valid solution and matching the GA’s quality when it returns a solution, but they also do so in sub-second time. This makes them well-suited for short control intervals where one can expect 1–10 DAG changes every second.
The GA solutions are not feasible for large setups, and only
GAI is practically usable for small setups. Failure of such
meta-heuristics to scale with parameter space in terms of
solution quality and run-time performance is evident in [24].
We also report the number of migrations performed by the strategies for the different workloads of the small setup in Fig. 6; we omit the plots for the large setup due to space limits.

[Figure 4: Violin plot of the reduction in the sum of makespans with different rebalance strategies, relative to no rebalance (Fig. 3), for the (i) Small and (ii) Large setups: (a) Random Walk, Û = 2 ± 0; (b) Random Walk, Û = 2 ± 0.5; (c) Random Walk, Û = 2 ± 1.0; (d) Poisson, λ = 12.]

[Figure 5: Violin plot of the schedule planning time, φ_t, for each workload and strategy: (a) Small setup; (b) Large setup.]

We see that vertex rebalancing causes fewer migrations compared to edge rebalancing. While good, this has
the consequence of having minimal impact on improving the makespan after rebalancing. Edge rebalancing, on the other hand, has more migrations but also a better makespan reduction. Further, the median number of migrations in all cases is zero, indicating that migrations are infrequent and that rebalance is able to cause an improvement in < 50% of the schedules. For the large setup, we see up to 140 migrations take place, compared to a peak of about 25 for the small setup; more resources imply a larger solution space for improvements. While not shown, GAG causes most queries in the global set of DAGs to migrate each time.
The migrations also impact the stabilization time, and in particular, the peak number of migrations on a single resource can cause queries on it to buffer longer and take longer to drain. Fig. 7 shows the average time taken for all queries to drain their buffers and reach a steady state of event stream execution for the small setup; the plot for the large setup is skipped for brevity. Here, we assume that the cost of each migration (and hence the buffering time) is η = 1 sec, but this may vary based on the CEP engine used and its bootstrap overheads. Since migrations are infrequent, the median stabilization time tends to 0. We normally find the stabilization time to be in the sub-second range when non-zero, with some peaks reaching ≈ 20 secs. We also see that, typically, the stabilization time for GAI is smaller than for TopSet and TopSet/P, indicating that fewer peak migrations happen on a resource. For the large setup, we report that Edge and Vertex+Edge rebalance require a larger stabilization time for all workloads, compared to Vertex rebalance. The stabilization time also increases by ≈ 10×, taking up to 8 secs, correlated with the 10× larger number of resources relative to the small setup.
6. Related Work
Edge computing is gaining increasing attention [12],
and generalizes prior work on Wireless Sensor Networks
(WSN) [25], Cloudlets, and Mobile Cloud [11]. Constrained
scheduling of applications to meet QoS and dynamic re-
source management across edge and cloud has gained at-
tention [26]. Serendipity [27] opportunistically exploits mo-
bile resources within communication range for offloading
parallel tasks of a program from a device, with mobile
devices joining and leaving the system. Nebula [28] fo-
cuses on pushing data-intensive compute to geographically
distributed edge devices, with localized optimisations on location-aware data and computation placement, replication, and recovery. Others examine a related problem of schedul-
ing dynamic independent tasks from Cloudlets to mobile ad
hoc Clouds [29], with heuristics that are validated through
simulation. They use both user-centric and system-centric
metrics like average makespan, waiting time, slowdown, etc.
We focus on edge devices that are part of the infrastructure
where availability is not a concern but energy and compute
constraints exist. We also support dynamic arrival and exit
of dataflows, which are more complex than tasks, and for a
streaming scenario.
Wide area distributed query processing has been exam-
ined for WSN. There, constrained motes collocated with
sensors partition a query across the edge devices for online
processing [25]. Some of these look into stream process-
ing across nodes with varying event rates and network
performance, where the schedule has to be dynamically
changed [30].

[Figure 6: Number of migrations, ρ_t, required for different placement strategies with rebalancing, for the Small setup: (a) Random Walk, Û = 2 ± 0; (b) Random Walk, Û = 2 ± 0.5; (c) Random Walk, Û = 2 ± 1.0; (d) Poisson, λ = 12.]

[Figure 7: Average total stabilization time, ψ_{t+1}, over 100 control intervals for different placement strategies (η = 1 sec): (a) RW, 2 ± 0; (b) RW, 2 ± 0.5; (c) RW, 2 ± 1.0; (d) Poisson.]

Data partitioning and selective replication are also used, along with temporal information in prior
workloads to predict future query patterns and reduce the
query span [31]. We consider dynamism in application
arrival but not event rates or resource availability. While
issues of energy limits and network dynamism do exist
in WSN, current edge devices have superior performance,
and are complemented by Cloud resources for cooperative
scheduling rather than an edge-only approach. We also see
a richer set of dataflow applications and fast rates with IoT.
Big data platforms like Storm [32] and Flink are de-
signed for streaming applications deployed on Cloud re-
sources. Resource aware scheduling of such continuous
dataflows considers some dynamism in application struc-
ture [33]. CEP dataflows are a specialization of such fast
data applications, but with standard query models rather than
arbitrary user logic. As a result, our CEP dataflow scheduler
has better awareness of the resource needs, such as energy
and compute per event, compared to opaque user tasks.
Scheduling of scientific workflows on Cloud resources has been well studied using elastic resource allocation strategies [34]. Often they require multi-objective optimisa-
tion goals to meet the requirements of the workflows [35].
While event analytic dataflows have a DAG model similar to workflows, they process data continuously as a stream rather than as a batch, and hence all their queries are active all the time. The edge devices we consider also have constraints, like energy, that are based on deployment parameters such as battery capacity. These open up novel scheduling problems beyond workflows that we tackle.
DAG scheduling is a well-studied problem with a num-
ber of heuristics that have been proposed to solve this NP-
complete problem [18], [21], [36]. Makespan is a common optimization criterion, with critical-path based approaches of-
ten considered as a form of list scheduling. Meta-heuristics
like GA and ACO are also often used [20]. Placement of
large-scale distributed applications on multiple clouds using
heuristics and meta-heuristics have been well studied [24],
[37]. We leverage some of these common strategies to solve
an interesting and practical IoT scheduling problem across
edge and Cloud.
7. Conclusions
We have proposed a distinctive problem of scheduling
dynamic event analytic dataflows on edge and Cloud com-
puting resources to support the emerging needs of IoT and
smart city applications. We define the optimization problem
for query placement with energy and compute constraints,
customized to the unique needs of IoT resource deploy-
ments. We have proposed the TopSet heuristic based on a
topological set ordering, along with a variant that considers
side-effects on other dataflows. We also extend two prior GA
heuristics from a static to a dynamic dataflow scenario. Re-
balance strategies further improve upon the initial placement
decision. Our detailed simulations using real-world traces
of queries and resources show that TopSet/P with Edge and Vertex rebalance is fast (sub-second), consistently offers a valid solution, has a makespan that out-performs the alternatives in most cases, and has a mean stabilization time of < 8 sec even with 1,000 devices. GA-based solutions fail for larger setups, and are slow as well.
References
[1] J. Bloem et al., “The fourth industrial revolution,” Things Tighten, 2014.
[2] Y. Simmhan, V. Prasanna, S. Aman, A. Kumbhare, R. Liu, S. Stevens,
and Q. Zhao, “Cloud-based software platform for big data analytics
in smart grids,” IEEE/AIP CiSE, 2013.
[3] M. Strohbach, H. Ziekow, V. Gazis, and N. Akiva, “Towards a big data
analytics framework for iot and smart city applications,” in Modeling
and processing for next-generation big-data technologies, 2015.
[4] G. Cugola and A. Margara, “Processing flows of information: From
data stream to complex event processing,” ACM CSUR, 2012.
[5] “Apache Edgent, v1.1.0,” http://edgent.apache.org/.
[6] S. Suhothayan, K. Gajasinghe, I. Loku Narangoda, S. Chaturanga,
S. Perera, and V. Nanayakkara, “Siddhi: A second look at complex
event processing architectures,” in ACM Gateway Comp. Env., 2011.
[7] Z. Jerzak and H. Ziekow, “The debs 2014 grand challenge,” in ACM
DEBS, 2014.
[8] P. Varshney and Y. Simmhan, “Demystifying fog computing: Charac-
terizing architectures, applications and abstractions,” in IEEE ICFEC,
2017.
[9] A. Kumar, S. Goyal, and M. Varma, “Resource-efficient machine
learning in 2 kb ram for the internet of things,” in International
Conference on Machine Learning, 2017, pp. 1935–1944.
[10] R. Ghosh and Y. Simmhan, “Distributed scheduling of event
analytics across edge and cloud,” ACM Transactions on Cyber-
Physical Systems (TCPS), to Appear. [Online]. Available: http:
//arxiv.org/abs/1608.01537
[11] M. Satyanarayanan, R. Schuster, M. Ebling, G. Fettweis, H. Flinck,
K. Joshi, and K. Sabnani, “An open ecosystem for mobile-cloud
convergence,” IEEE Comm. Magazine, 2015.
[12] P. Garcia Lopez, A. Montresor, D. Epema, A. Datta, T. Higashino,
A. Iamnitchi, M. Barcellos, P. Felber, and E. Riviere, “Edge-centric
computing: Vision and challenges,” ACM Comp. Comm. Rev., 2015.
[13] Smartx, “IISc Smart Campus: Closing the loop from Network to
Knowledge,” 2016. [Online]. Available: http://smartx.cds.iisc.ac.in
[14] B. Amrutur, et al., “An Open Smart City IoT Test Bed: Street Light
Poles as Smart City Spines,” in ACM/IEEE IoTDI, 2017.
[15] M. Yannuzzi, F. van Lingen, A. Jain, O. L. Parellada, M. M. Flores,
D. Carrera, J. L. Perez, D. Montero, P. Chacin, A. Corsaro, and
A. Olive, “A new era for cities with fog computing,” IEEE Internet
Computing, 2017.
[16] P. Misra, Y. Simmhan, and J. Warrior, “Towards a practical architec-
ture for internet of things: An india-centric view,” IEEE Internet of
Things Newsletter, pp. 1–2, 2015.
[17] “Measuring chlorine levels in water supplies,” World Health Organi-
zation, Tech. Rep., 2013.
[18] Y.-K. Kwok and I. Ahmad, “Static scheduling algorithms for allocat-
ing directed task graphs to multiprocessors,” ACM CSUR, 1999.
[19] H. Zhao and R. Sakellariou, “Scheduling multiple dags onto hetero-
geneous systems,” in IPDPS, 2006.
[20] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution
Programs. London, UK: Springer-Verlag, 1996.
[21] M.-Y. Wu, W. Shu, and J. Gu, “Efficient local search for DAG scheduling,” IEEE Transactions on Parallel and Distributed Systems, 2001.
[22] R. A. Shafik, B. M. Al-Hashimi, and K. Chakrabarty, “Soft error-
aware design optimization of low power and time-constrained em-
bedded systems,” in DATE, 2010.
[23] A. Arampatzis and J. Kamps, “A study of query length,” in ACM
SIGIR, 2008.
[24] P. Silva, C. Perez, and F. Desprez, “Efficient heuristics for placing
large-scale distributed applications on multiple clouds,” in 2016 16th
IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing (CCGrid), May 2016, pp. 483–492.
[25] U. Srivastava, K. Munagala, and J. Widom, “Operator placement for
in-network stream query processing,” in ACM PODS, 2005.
[26] S. Shekhar and A. Gokhale, “Dynamic resource management across
cloud-edge resources for performance-sensitive applications,” in 2017
17th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing (CCGRID), May 2017, pp. 707–710.
[27] C. Shi, V. Lakafosis, M. H. Ammar, and E. W. Zegura, “Serendipity:
Enabling remote computing among intermittently connected mobile
devices,” in ACM MobiHoc, 2012.
[28] M. Ryden, K. Oh, A. Chandra, and J. Weissman, “Nebula: Distributed
edge cloud for data intensive computing,” in 2014 IEEE International
Conference on Cloud Engineering, March 2014, pp. 57–66.
[29] B. Li, Y. Pei, H. Wu, and B. Shen, “Heuristics to allocate high-
performance cloudlets for computation offloading in mobile ad hoc
clouds,” J. Supercomput., 2015.
[30] J. H. Hwang, U. Cetintemel, and S. Zdonik, “Fast and highly-available
stream processing over wide area networks,” in IEEE ICDE, 2008.
[31] A. Turk, R. O. Selvitopi, H. Ferhatosmanoglu, and C. Aykanat, “Tem-
poral workload-aware replicated partitioning for social networks,”
IEEE Transactions on Knowledge and Data Engineering, 2014.
[32] B. Peng, M. Hosseini, Z. Hong, R. Farivar, and R. Campbell, “R-
storm: Resource-aware scheduling in storm,” in Middleware, 2015.
[33] C. Wickramaarachchi and Y. Simmhan, “Continuous dataflow update
strategies for mission-critical applications,” in IEEE eScience, 2013.
[34] R. F. d. Silva, W. Chen, G. Juve, K. Vahi, and E. Deelman, “Com-
munity resources for enabling research in distributed scientific work-
flows,” in IEEE e-Science, 2014.
[35] H. A. Nguyen, Z. van Iperen, S. Raghunath, D. Abramson, T. Kipouros, and S. Somasekharan, “Multi-objective optimisation in scientific workflow,” Procedia Computer Science, vol. 108, pp. 1443–1452, 2017 (ICCS 2017, Zurich, Switzerland).
[36] A. Malik, C. Walker, M. O’Sullivan, and O. Sinnen, “Satisfiability modulo theory (SMT) formulation for optimal scheduling of task graphs with communication delay,” Computers & Operations Research, vol. 89, pp. 113–126, 2018.
[37] L.-C. Canon, E. Jeannot, R. Sakellariou, and W. Zheng, Compara-
tive Evaluation Of The Robustness Of DAG Scheduling Heuristics.
Boston, MA: Springer US, 2008, pp. 73–84.