Elastic Scaling of a High-Throughput
Content-Based Publish/Subscribe Engine
Thomas Heinze∗, André Martin†, Marcelo Pasin‡, Raphaël Barazzutti‡,
Pascal Felber‡, Zbigniew Jerzak∗, Emanuel Onica‡ and Etienne Rivière‡
∗SAP AG, Dresden, Germany; †Technische Universität Dresden, Germany; ‡Université de Neuchâtel, Switzerland
Abstract—Publish/subscribe (pub/sub) infrastructures run-
ning as a service on cloud environments offer simplicity and
ﬂexibility for composing distributed applications. Provisioning
them appropriately is however challenging. The amount of stored
subscriptions and incoming publications varies over time, and the
computational cost depends on the nature of the applications and
in particular on the ﬁltering operation they require (e.g., content-
based vs. topic-based, encrypted vs. non-encrypted ﬁltering). The
ability to elastically adapt the amount of resources required
to sustain given throughput and delay requirements is key to
achieving cost-effectiveness for a pub/sub service running in
a cloud environment. In this paper, we present the design
and evaluation of an elastic content-based pub/sub system: E-
STREAMHUB. Speciﬁc contributions of this paper include: (1) a
mechanism for dynamic scaling, both out and in, of stateful
and stateless pub/sub operators, (2) a local and global elasticity
policy enforcer maintaining high system utilization and stable
end-to-end latencies, and (3) an evaluation using real-world tick
workload from the Frankfurt Stock Exchange and encrypted subscriptions.

I. INTRODUCTION
Cloud computing provides processing infrastructures, plat-
forms, or software solutions as a service. One of the key
promises of cloud computing is its support for elasticity,
i.e., the ability to dynamically adapt the amount of resources
allocated for supporting a service. Thanks to elasticity, cloud
users no longer need to statically provision their infrastructure.
Instead, resources can be added (scale out) or removed (scale
in) on-demand in response to variations in the experienced
load. This allows a service to accommodate unpredictable
growth or decay in popularity.
To support elasticity, an application or service deployed on
a cloud must provide three key features. First, it should be
intrinsically scalable: the addition or removal of resources must
result in an increase, respectively decrease, of the processing
capacity of the service. Second, it should support the dynamic
allocation of resources. This allows modifying at runtime the
processing capacity of a service without halting or restarting
that service. Finally, it should provide decision support, in
order to drive the dynamic allocation based on the experienced
workload and according to elasticity rules. These rules trigger
the allocation of new resources and the deallocation and
reallocation of existing resources based on the observed load.
Publish/subscribe (pub/sub) is a decoupled communication
paradigm that is particularly well suited for cloud-based
deployments. In pub/sub systems, information producers,
or publishers, send data to the pub/sub service in the form
of publications. The pub/sub service is then in charge of dis-
patching these publications to all the parties, called subscribers,
that previously expressed interest in them by the means of
subscriptions. In the topic-based model, publications
and subscriptions are associated with pre-defined topics. Content-
based pub/sub supports subscriptions that express interest as
ﬁlters on the content of publications, typically by means of
predicates over the attributes associated to each publication.
We consider the more general content-based model
in this paper.
The load experienced by a pub/sub system running as
a service is by nature difficult to predict. The number of
subscribers and the volume of publications can vary significantly
over time. A typical example is stock market monitoring:
stock exchanges (publishers) send real-time stock ticks or
quotes, while clients (subscribers) register their interests in
ticks and/or quotes related to their investment strategies, e.g.,
when the price for a given stock goes over or under a speciﬁc
threshold. In such a setting, the volume of publications depends
on the activity of the stock exchanges, which have speciﬁc
periods of activity every work day and are closed on week-
ends. Figure 1 shows a typical tick load recorded on the
18th of November 2011 at the Frankfurt Stock Exchange. The
volume sharply rises when trading opens at 9:00 and rapidly
declines after the market closes at 17:30. Static provisioning of
cloud resources for a pub/sub system supporting the peak load
of this application would be cost-ineffective. Conversely, the
ability to elastically scale the amount of resources allows a
better utilization of the infrastructure, translating into better
return-on-investment for the service provider.
In this paper we present the design, imple-
mentation, and evaluation of an elastic pub/sub middleware
service supporting high-throughput and low-delay content-
based filtering. Our system, named E-STREAMHUB, extends
a scalable but static pub/sub engine, STREAMHUB, with
Fig. 1. Typical volume of ticks (Frankfurt Stock Exchange, Nov. 18, 2011).
comprehensive elasticity support. It supports arbitrary types of
ﬁltering operations, including encrypted ﬁltering for privacy
preservation in public cloud settings. It leverages a stream
processing engine, STREAMMINE3G, and relies on a set
of stream processing operators, each implementing a speciﬁc
aspect of the pub/sub service, allowing the system to scale
horizontally and vertically with the number of allocated nodes.
In this work, we make the following contributions: (1) sup-
port for slicing of the state and computations of a high-
performance and massively parallel pub/sub engine; (2) mi-
gration mechanisms for dynamically allocating resources to
each operator of a pub/sub engine with low impact on delays
and throughput; (3) decision support for elasticity and resource
allocation based on system observation and enforcement of
global and local elasticity policies; and (4) implementation of
these mechanisms in E-STREAMHUB and their evaluation on a
private cloud of 240 cores, using real load evolution traces from
the Frankfurt Stock Exchange. Our evaluation shows that E-STREAMHUB
can adapt the number of machines used to the actual workload experienced
by the pub/sub service, while maintaining continuous operation
and with minimal impact on notification delays.
II. RELATED WORK
In this section we present an overview of related work with
speciﬁc focus on elastic platforms, event processing systems,
and pub/sub middleware.
A. Elastic Platforms and Infrastructures
Elastic infrastructures and platforms as a service (IaaS
and PaaS) support ad hoc addition and removal of virtual
machines (VMs) to/from a virtual environment (VE). A typical
example is the Amazon EC2 Auto Scaling service. It relies
on basic elasticity policies by setting simple thresholds on
resource utilization. The elastic site mechanism for Nimbus
and EC2 clouds proposes more elaborate policies, for workloads
composed of independent tasks under batch scheduling .
AutoScale  is an example of an adaptive IaaS elasticity
policy, where the goal is to maintain the minimum amount of
spare capacity to handle sudden load increases while keeping
the total number of servers as low as possible.
All elastic scaling solutions at the IaaS level require that
the VMs composing the VE, as well as the jobs that run on
these VMs, are stateless or independent. These requirements are
not fulﬁlled by the nodes of a pub/sub system, which require
application-level elasticity mechanisms. Elastic applications
interact with elastic IaaS using the VM allocation and de-
allocation APIs of the IaaS elasticity manager.
B. Elastic Complex Event Processing
Both complex event processing systems (CEP) and stream
processing engines (SPE) process queries composed of directed
acyclic graphs of stateful or stateless operators processing data
in the form of event streams.
Fernandez et al. propose application-level scale out
for stateful operators, integrated with passive replication
(checkpointing). The integration is made possible by explicit
operator state management, an approach similar to that of E-STREAMHUB.
The operator state is dynamically split to form
new partitions deployed over multiple machines when scaling
up, potentially requiring application support for partitioning.
E-STREAMHUB supports full elasticity, i.e., scale out and in,
while optimizing the overall system utilization, and without
requiring specific application support. Its underlying runtime
engine can support both passive and active replication.
In the context of IBM's System S, Schneider et al.
propose an extension of the SPADE domain-specific language
to support parallel operators on a single host. E-STREAMHUB
supports elastic parallelization within a single host as well as
elastic distribution across multiple hosts.
ElasticStream proposes to outsource parts of a System S
SPE deployment to a public cloud. It periodically adjusts the
deployment based on decisions from a linear programming
formulation of the allocation problem, where the goal is to
minimize the monetary cost of the cloud usage. E-STREAMHUB
elastically scales based on the current state of the system,
immediately addressing load variations.
StreamCloud monitors the average load per cluster in
order to detect under- and overload situations. When an overload
is detected, StreamCloud triggers pair-wise re-balancing
between the most and the least loaded hosts. E-STREAMHUB elasticity
policies share this goal of rebalancing the load, but with the
additional goals of minimizing the number of migrations and
supporting migrations with minimal interruption of the flow.
C. Elastic Publish/Subscribe
Topic-based pub/sub engines used in the industry, such
as Apache Kafka or Hadoop HedWig, typically
support incremental scalability. This refers to the ability to
add or remove servers in order to achieve linear scalability
in throughput or number of topics. None of these systems
supports automated addition and removal of servers based on
the experienced workload, and they therefore cannot be classified
as elastic. E-STREAMHUB supports incremental scalability
and elasticity, and considers the more general and challenging
content-based model.
EQS is a message queue architecture that can be
classified as topic-based pub/sub. Elasticity in EQS is achieved
by monitoring the load on each topic and migrating topics
between hosts. This imposes a potential cost in terms of
service interruption. In contrast to EQS, E-STREAMHUB targets
content-based filtering, where the destination of events can only
be determined at runtime. It also migrates parts of the load to
new hosts when overload is detected, but with the objective of
minimizing service interruption thanks to an application-level
migration mechanism.
Hoffert et al. consider the problem of predicting the
load requirements for an adaptive topic-based pub/sub service
running in a cloud environment. Using supervised machine
learning, the approach is able to dynamically configure the
service (e.g., the transport protocols used) for different QoS
requirements, but does not consider the dynamic addition or
removal of hosts for supporting the service.
BlueDove targets attribute-based pub/sub deployments
on public cloud services. The message ﬂow in BlueDove is
strictly dependent on the attribute-based ﬁltering model used
for subscriptions: the attribute space is split in regions, and
subscriptions and publications are dispatched to the matching
servers that are in charge of an overlapping region. This
disallows the use of ﬁltering schemes that are not based on low-
dimensionality attribute spaces or that do not allow examining
the content of the subscriptions at the server side, such as
with encrypted filtering. Finally, BlueDove supports only scale
out; therefore, unlike E-STREAMHUB, it cannot be described
as fully elastic. Fang et al. follow a similar approach to
BlueDove and likewise only scale out.
III. THE STREAMHUB SCALABLE PUB/SUB ENGINE
We present in this section the high-level architecture of
the pub/sub engine that we extend for elasticity support,
STREAMHUB. Its components are shown in Figure 2. The
design of STREAMHUB targets high-throughput filtering and
notification pub/sub. It also aims for low and stable latencies.
For any given publication, the delay for notifying users shall
not vary more than a small fraction of its average value.
The matching of incoming publications against stored sub-
scriptions is performed in STREAMHUB by means of external
ﬁltering libraries. Deploying a content-based pub/sub system in
a shared (public) cloud infrastructure poses security challenges.
The values of the publications’ attributes and the subscriptions’
predicates may reveal conﬁdential information that must be
protected from potential attackers. Encrypted filtering
techniques allow matching encrypted publications
against encrypted subscriptions, without revealing their original
content. In the context of this paper and in our evaluation, we
focus on computationally intensive encrypted ﬁltering using
the ASPE algorithm. Due to the independence of the
architecture from the ﬁltering scheme, our results would apply
to any other scheme.
Most previous distributed content-based pub/sub
approaches were based on an overlay of brokers, where
each broker would implement all operations of the pub/sub
service, and brokers would have to maintain routing tables
between them, whose efficiency and construction cost directly
depend on the subscription model and workload. STREAMHUB
departs from the use of broker overlays and adopts a tiered
architecture with dedicated components for each operation.
This yields better performance and allows the system to
scale independently of the ﬁltering scheme(s) being used.
STREAMHUB splits the pub/sub service into three fundamental
operations. Each of these operations is mapped to an operator
spanning over an arbitrary number of processing entities, called slices.
The ﬁrst operation is subscription partitioning. It is sup-
ported by an Access Point (AP) operator and splits the
workload of subscriptions into non-overlapping sets, one for
each slice of the next operator, using modulo hashing on
subscription identifiers. This introduces data parallelism and
allows processing of several incoming publications in parallel
against all subscriptions. Incoming publications are broadcast
to reach all slices of the next operator.
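As an illustration, the routing performed by AP slices can be sketched as follows. This is a minimal Python sketch with hypothetical names and slice count; the actual system implements this logic inside STREAMMINE3G operators:

```python
# Sketch of the AP routing described above (illustrative names).
# Subscriptions are partitioned across M slices by modulo hashing on
# their identifier, introducing data parallelism; publications are
# broadcast to all M slices so they can be matched against every
# stored subscription.

NUM_M_SLICES = 16  # hypothetical slice count for the M operator

def route_subscription(subscription_id: int) -> list[int]:
    # A subscription is stored by exactly one M slice.
    return [subscription_id % NUM_M_SLICES]

def route_publication(publication_id: int) -> list[int]:
    # A publication is broadcast to every M slice.
    return list(range(NUM_M_SLICES))
```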
The second operation is publication ﬁltering. Each slice
of the corresponding Matching (M) operator is associated
with an instance of the ﬁltering library that stores incoming
subscriptions in the state associated with the slice. Upon
receiving a publication, the library returns the list of subscribers
that have registered a matching subscription. The publication
and this list are forwarded to the next operator. Note that
there might be several M operators, one per ﬁltering scheme
supported by the platform (e.g., encrypted and non-encrypted).
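The per-slice matching behavior described above can be sketched as follows. The predicate functions stand in for the filtering library, whose real interface is scheme-specific (e.g., encrypted ASPE matching); all names are illustrative:

```python
# Sketch of a Matching (M) operator slice: each slice holds its own
# partition of the subscriptions and, for each publication, returns
# the list of subscribers whose subscription matches. Illustrative
# only; the real slice delegates matching to a filtering library.

class MatchingSlice:
    def __init__(self):
        self.subscriptions = {}  # subscriber id -> predicate function

    def on_subscription(self, subscriber, predicate):
        # Store the subscription in the state associated with the slice.
        self.subscriptions[subscriber] = predicate

    def on_publication(self, publication):
        # Return the subscribers with a matching subscription; this
        # list is forwarded to the next (EP) operator.
        return [s for s, pred in self.subscriptions.items()
                if pred(publication)]
```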
Finally, the third operation is publication dispatching. It
is supported by an Exit Point (EP) operator whose goal is to
collect and combine the lists of matching subscribers sent by
each of the M operator slices, for each publication. These lists
are dispatched to EPs using modulo hashing on publication
identiﬁers. As a result, the lists for a particular publication
will all reach the same EP operator slice. Once all lists are
collected by that slice, a notiﬁcation message is prepared and
sent to interested subscribers through the connection point(s).
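A sketch of this combination logic at an EP slice follows (hypothetical names; in the real system this is an operator slice whose transient state holds the pending partial lists):

```python
# Sketch of the Exit Point (EP) logic: partial lists of matching
# subscribers, one per M operator slice, are combined per publication.
# A notification is emitted once all M slices have reported.

NUM_M_SLICES = 16   # hypothetical counts
NUM_EP_SLICES = 8

def ep_slice_for(publication_id: int) -> int:
    # Modulo hashing ensures all partial lists for a publication
    # reach the same EP slice.
    return publication_id % NUM_EP_SLICES

class ExitPointSlice:
    def __init__(self):
        self.pending = {}  # publication id -> list of partial match lists

    def on_partial_list(self, pub_id, subscribers):
        parts = self.pending.setdefault(pub_id, [])
        parts.append(subscribers)
        if len(parts) == NUM_M_SLICES:
            # All M slices reported: build the combined notification list.
            del self.pending[pub_id]
            return sorted(s for part in parts for s in part)
        return None  # still waiting for other M slices
```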
STREAMHUB leverages the runtime support of a stream-based
processing engine, STREAMMINE3G. Events flow
through a directed acyclic graph (DAG) of operators. All slices
of an operator use the same processing code for handling
incoming events. Synchronized access to operator state can
take place within a given slice by using read (R) and read/write
(R/W) locks. A slice has no access to the state of other slices,
even within the same operator. Each physical node (host)
supports vertical scalability for locally-deployed operator slices
by using a thread pool whose size is adapted to the number of
available cores. Each slice can be supported by multiple cores
operating in parallel when the processing is stateless or requires
only an R lock. STREAMMINE3G supports dependability
through passive or active slice replication. This
requires no modiﬁcation to the application but the evaluation
of these mechanisms is out of the scope of the present paper.
The original STREAMHUB design assumes a static config-
uration: each operator slice is allocated to a single host and
the number of slices for each operator must be determined
manually based on the expected load for each of the pub/sub
operations. Such manual provisioning is a challenging task as
the workload is not necessarily known in advance. Even if
the expected throughput could be estimated in certain cases,
the intrinsic characteristics of its impact on the load each
operator will have to support are hard to determine. Over-
provisioning the operators is not desirable as the number of
stored subscriptions and the rate of incoming publications will
vary over time, thus leading to poor cost-effectiveness.
While scalability is a primary and necessary aspect of
such a middleware solution, it is not sufﬁcient per se. One
also needs the ability to dynamically adapt the number of
hosts allocated to each of the operators, based on the observed
load on their slices. We present in the remaining of this
paper the mechanisms (Section IV) and policies (Section V)
required to provide elasticity within the pub/sub engine. These
solutions provide minimal service interruption: they maintain a
high filtering throughput and low delays even under dynamic
reconfiguration.
IV. ELASTICITY MECHANISMS
Elasticity in E-STREAMHUB is supported by migrating
operator slices across a varying number of hosts. The total
number of active slices used by a single operator across all
these hosts is ﬁxed: we thus perform static partitioning of the
operator state. Dynamically splitting and coalescing the state
Fig. 2. Example of a STREAMHUB engine deployed on a public cloud. Each operator is supported by 6 slices. Events flow from left to right.
Fig. 3. The five main steps of a slice migration between two hosts.
of an operator requires either help from the application or a
restrictive storage model.
In the remainder of this section we describe the mechanisms
for operator slice migration in STREAMMINE3G and the E-STREAMHUB
manager component that orchestrates elasticity
and system conﬁguration.
A. Slice Migration with Low Impact on Delays
Migration of slices is supported at the level of STREAM-
MINE3G. The requirement is that the interruption of service
must be as short as possible. We minimize the delay introduced
in the processing of events during migrations by using slice
duplication and in-memory logging/buffering of events.
The migration mechanism is illustrated by Figure 3. The
slice to migrate initially runs on host 1 (step 1). The
E-STREAMHUB manager (described later in this section)
orchestrates the migration of that slice to host 2 when requested
by the E-STREAMHUB elasticity enforcer (described in the next
section). The STREAMMINE3G runtime creates a new slice on
host 2, which is initially inactive. The operator DAG is rewired
in order to duplicate all incoming events for that slice (step 2).
Events still reach the original slice on host 1, where they are
processed normally, but they also reach the new slice on host 2,
where they are queued for later processing. There is one queue
per originating slice of the previous operator. The copy of the
state takes place when all queues contain events with sequence
numbers that are lower than or equal to those of events already
processed on host 1 for the same source. Before copying the
state, processing is stopped on host 1 (step 3). The state is associated
with a timestamp vector. Processing resumes on host 2 using
events following the migrated state's timestamp vector, filtering
out obsolete events and preventing duplicate processing (step 4).
Finally, the original slice on host 1 is removed (step 5).
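The gating step that filters obsolete events after the state copy can be sketched as follows (illustrative names and structures; the actual mechanism operates inside STREAMMINE3G):

```python
# Sketch of the duplicate-suppression step of a slice migration:
# the new slice buffers incoming events per source, and after the
# state copy it only replays events newer than the migrated state's
# timestamp vector. Illustrative only.

def events_to_replay(queues, timestamp_vector):
    """queues: source id -> list of (sequence number, event) buffered
    on the new host. timestamp_vector: source id -> last sequence
    number processed on the old host before its state was copied."""
    replay = []
    for source, events in queues.items():
        last_seen = timestamp_vector.get(source, -1)
        for seq, event in events:
            if seq > last_seen:  # filter obsolete (already processed) events
                replay.append((source, seq, event))
    return sorted(replay)
```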
The slice migration time, and hence the duration of the
service interruption, depends on the state size. AP operator
slices are stateless and there is no copy phase, therefore limited
latency is expected during a slice migration. M operator slices
have a persistent state consisting of the stored subscriptions.
The migration delay is expected to mostly depend on this
state size. EP operator slices also maintain a state, but it is
transient and expected to be small: it consists of the lists of
matching subscriber identiﬁers for publications being processed.
Therefore, migrating EP operator slices is expected to have a
small impact on the notiﬁcation delay.
B. The E-STREAMHUB Manager
The interaction of the components of the system is illustrated
in Figure 4. The manager is in charge of the system conﬁg-
uration. It orchestrates the migrations according to the ﬂow
of operations described above, and updates the conﬁguration
accordingly. Migrations are not initiated by the manager itself
but requested by the elasticity enforcer component, described
in the next section. The manager collects probes from all
participating hosts via heartbeats. Probes indicate for each slice
the CPU utilization, memory utilization, and network usage.
They are then aggregated on a per-slice and per-host basis, and
sent to the elasticity enforcer in order to trigger changes to the
slice placement according to the input elasticity policies.
The operation of E-STREAMHUB requires that all hosts
supporting the system share a common conﬁguration. At the
Fig. 4. Interaction of the E-STREAMHUB components. Each host runs an
instance of the STREAMMINE3G runtime (SM3G) and several operator slices.
The manager collects probes about the utilization of the hosts for each individual
slice and maintains the configuration in ZooKeeper. Probes are dispatched to
the elasticity enforcer, which takes as input a set of elasticity policies and
issues migration requests to the manager.
application level, the static conﬁguration allows, for instance,
EP operator slices to know the amount of M operator slices from
which they must await matching subscriber lists. At the runtime
level, the conﬁguration is dynamic and includes the location of
the slices on the hosts, which is updated upon migrations.
The orchestration of the migration and the update of the
conﬁguration must be reliable to tolerate, in particular, a failure
of the master. To this end, the shared configuration and migration
orchestration leverage an instance of the ZooKeeper
coordination kernel. ZooKeeper maintains a simple filesystem-
like hierarchy of shared objects used to reliably store the
conﬁguration. At the core of ZooKeeper, a reliable atomic
broadcast with ordering guarantees on multiple support servers
can tolerate failures and maintain conﬁguration availability. The
whole state of the manager is stored in ZooKeeper. This allows
the manager to be easily restarted in case of failure.
V. ELASTICITY ENFORCER AND POLICY
The elasticity enforcer decides on the placement of operator
slices on hosts by requesting migrations to the manager (e.g.,
migration of M:3 from host 2 to host 3 in Figure 4). It bases
its decisions on input elasticity policies and on probes from
the manager. The goal of the enforcer is to match policy
requirements while minimizing the number and cost of slice
migrations, and thus the resulting service degradation.
The E-STREAMHUB elasticity policy combines global and
local rules. Rules are evaluated as soon as a set of probes for all
slices has been received since their last evaluation. Violations
of global rules cause the system to scale in or out, by adding
or removing hosts. Scaling out implies moving slices from
overloaded hosts to newly added ones. Scaling in implies re-
dispatching slices from the least loaded hosts to other hosts,
and releasing the former. Local rules violations result in a
re-allocation of slices among existing hosts. Global rules have
the highest priority: local rules are only evaluated when no
global rule is violated.
The global rule used in our evaluation states that the average
CPU load for existing hosts must remain in the [30%:70%]
range. E-STREAMHUB scales out if the average load exceeds
70% for at least 30 seconds and scales in if it drops below 30%
for the same duration. The local elasticity policy states that
the individual CPU utilization for a given host shall remain in
the [20%:80%] range, also measured over a 30-second period.
Finally, the policy sets a target (ideal) average CPU utilization
Fig. 5. Example of a slice placement decision.
of 50% and specifies a grace period of at least 30 seconds
between migration requests in order to reduce the probability
of oscillations.
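The rule evaluation described above can be sketched as follows. The thresholds are those of the policy; the function name and return values are illustrative:

```python
# Sketch of the global/local elasticity rule evaluation. Global rules
# take priority: local rules are only evaluated when no global rule
# is violated. Thresholds match the policy in the text; names are
# illustrative.

GLOBAL_RANGE = (0.30, 0.70)   # average CPU load across all hosts
LOCAL_RANGE = (0.20, 0.80)    # per-host CPU load
TARGET = 0.50                 # ideal average utilization

def evaluate(host_loads):
    """host_loads: per-host CPU utilization in [0, 1], averaged over
    the last 30 seconds (the rule evaluation window)."""
    avg = sum(host_loads) / len(host_loads)
    if avg > GLOBAL_RANGE[1]:
        return "scale-out"
    if avg < GLOBAL_RANGE[0]:
        return "scale-in"
    # No global violation: evaluate the local per-host rules.
    if any(l < LOCAL_RANGE[0] or l > LOCAL_RANGE[1] for l in host_loads):
        return "rebalance"
    return "steady"
```

In the real system a 30-second grace period between migration requests additionally damps oscillations; the sketch omits this timing aspect.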
The elasticity enforcer executes a two-step resolution
algorithm when a global or local rule is violated. First, it
decides on the set of slices to migrate. Second, it decides on a
new placement (destination hosts) for these slices.
We start by describing the case of a scale out operation, upon
a violation of the 70% average CPU utilization threshold of the
global policy. The slice selection step consists in identifying a
set of slices, whose summed CPU utilization is larger than, or
equal to, the absolute difference between the current utilization
and the average utilization target (50% in our case), starting
from overloaded hosts. This corresponds to the subset sum
problem, which is solved using dynamic programming
with pseudo-polynomial complexity. This returns a set of valid
solutions, among which we select the one incurring the minimal
amount of state transfer (as reported by memory utilization
probes). This allows us to minimize the cost and duration of
migrations and to reduce service degradation.
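This selection step can be sketched with a subset-sum-style dynamic program over the CPU deficit, minimizing the state (memory) to transfer. The formulation is illustrative; the actual E-STREAMHUB implementation may differ in details such as tie-breaking among equal-cost solutions:

```python
# Sketch of the slice selection step: choose a set of slices whose
# summed CPU utilization covers the deficit (current minus target
# utilization) while minimizing the total memory (state) to migrate.
# Pseudo-polynomial dynamic programming, as for the subset sum problem.

def select_slices(slices, deficit):
    """slices: list of (name, cpu, mem), cpu in integer percentage
    points. Returns the names of a minimal-memory set of slices whose
    summed CPU is at least `deficit`, or None if impossible."""
    INF = float("inf")
    # best[c] = (memory cost, chosen slices) to cover c points of CPU,
    # capped at `deficit`.
    best = [(0, [])] + [(INF, None)] * deficit
    for name, cpu, mem in slices:
        for c in range(deficit, -1, -1):  # descending: each slice used once
            cost, chosen = best[c]
            if cost == INF:
                continue
            nc = min(deficit, c + cpu)
            if cost + mem < best[nc][0]:
                best[nc] = (cost + mem, chosen + [name])
    return best[deficit][1]
```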
The second step is the placement of the slices selected
for migration. The enforcer uses a First Fit bin packing
approach. Hosts are modeled as bins with capacities
reﬂecting the available CPU resources. Each slice is modeled as
an item with weight equal to the CPU usage of the slice. A slice
can only be assigned to a host when its load is smaller than the
remaining CPU capacity of the host. The bin packing approach
starts from the current placement of slices without the slices
selected for migration. It greedily assigns migrated slices to
hosts, in decreasing order of CPU utilization. Furthermore, the
enforcer automatically derives allocation decisions for new hosts
if the spare system capacity is not sufﬁcient to accommodate a
migrating slice without violating a local rule. The cost of the
second step is linear in the number of hosts and slices in the
system.
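A minimal sketch of this First Fit placement follows. Host and slice names are hypothetical; loads are CPU fractions, and the 80% bound stands in for the local rule:

```python
# Sketch of the First Fit placement step: migrated slices are
# assigned, in decreasing order of CPU usage, to the first host with
# enough spare capacity; a new host is allocated when none fits.
# Illustrative names; dict insertion order makes "first" deterministic.

def place_slices(hosts, slices, local_max=0.80):
    """hosts: host name -> current CPU load; slices: list of
    (name, load) to place. Returns (placement, updated host loads)."""
    placement = {}
    hosts = dict(hosts)  # work on a copy
    for name, load in sorted(slices, key=lambda s: s[1], reverse=True):
        target = next((h for h, used in hosts.items()
                       if used + load <= local_max), None)
        if target is None:
            # Spare capacity insufficient: allocate a new host.
            target = f"new-host-{len(hosts)}"
            hosts[target] = 0.0
        hosts[target] += load
        placement[name] = target
    return placement, hosts
```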
Figure 5 illustrates slice placement. The system starts with
an average load above the 70% threshold, violating the global rule
and requiring the system to scale out. In order to reach a target
utilization that is less than or equal to 50% for all hosts, the
ﬁrst step selects slices AP:1 and AP:2 for host 1 and slices
EP:1 and EP:2 for host 2, among possible sets, as these have
the lowest memory usage. The second step identiﬁes that a
new host is required for a placement that does not break the
50% max utilization rule. All selected slices are migrated to
that new host.
When a global rule violation requires scaling in, the first step
is to calculate the number of hosts to release as the difference
between the current number of hosts and the minimal number of
hosts for which the average load remains equal to or higher than 50%.
In the second step, the enforcer marks the least loaded host
for release. Slices from that host are reassigned onto already
running hosts and the host is released when it becomes empty.
This procedure is repeated until the required number of hosts
has been released.
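One reading of this sizing rule can be sketched as follows (illustrative; the 50% value is the target average utilization from the policy, and the actual release order, least loaded first, is handled separately):

```python
import math

# Sketch of the scale-in sizing step: keep the fewest hosts that can
# absorb the total load at the 50% target average utilization, and
# release the rest (least loaded first in the real system).

def hosts_to_release(host_loads, target=0.50):
    """host_loads: per-host CPU utilization in [0, 1]. Returns how
    many hosts can be released."""
    total = sum(host_loads)
    needed = max(1, math.ceil(total / target))
    return max(0, len(host_loads) - needed)
```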
Local policy enforcement uses the same two steps, except
that only slices from the underloaded or overloaded host are
considered by the bin packing algorithm.
VI. EVALUATION
We present in this section the experimental evaluation of
E-STREAMHUB. We ﬁrst evaluate the baseline performance
without elasticity support. Then, we measure the cost of
migrations and their impact on notiﬁcation delays. We ﬁnally
observe the behavior of E-STREAMHUB in terms of hardware
resource utilization and notiﬁcation delays, under synthetic and
trace-based load evolution patterns.
A. Experimental Setup
We deploy E-STREAMHUB in a private cloud of 30 hosts
running Debian Linux 6.0.7 Squeeze (kernel 2.6.32-48). Each
host features two quad-core Xeon E5405 2.0 GHz processors
and 8 GB of RAM. Hosts are interconnected by a 1 Gbps
network.
We use a ﬁxed number of 8 slices for the AP and EP
operators, and 16 slices for the M operator. The ZooKeeper
service, the elasticity enforcer, and the E-STREAMHUB manager
run on separate hosts. In addition to the AP, M and EP operators,
we deploy on 4 dedicated hosts two convenience operators, the
source and the sink, each with 4 slices. The source operator
pushes subscriptions and publications to the system from pre-
encrypted events stored on disk. The sink operator receives
the notiﬁcations. We place the source and sink operator slices
two-by-two onto the same nodes. We measure the notiﬁcation
delays between one source operator slice and the sink operator
slices on the same host, to avoid the effect of potential clock
drifts on delay measurements. The number of notiﬁcations is
large enough for the measured average and standard deviation
to be statistically signiﬁcant.
Our evaluation focuses on the support of elastic pub/sub
as a service running on an untrusted public cloud. We therefore
use the ASPE encrypted filtering scheme, and workloads
of pre-encrypted subscriptions and publications. While the
performance of plain-text filtering may depend on the characteristics
of the workload, such as the possibility to leverage
containment between subscriptions, encrypted filtering
such as with ASPE requires filtering each incoming publication
against all stored subscriptions. The cost of each individual
filtering operation is quadratic in the number of attributes.
The static performance of E-STREAMHUB is
thus impacted by two factors. First, the number of attributes
impacts the cost of each individual filtering operation at the
M operator level. We use an ASPE schema with a fixed number of attributes.
Second, the matching rate is the probability that a publication
matches each subscription. It impacts the number of
notifications an incoming publication generates, and therefore
the load at the EP operator level. We generate subscriptions
Fig. 6. Performance of static E-STREAMHUB (100 K subscriptions): maximal throughput per configuration (top) and distribution of delays (Max/75th/50th/25th/Min percentiles) under half of the maximal throughput (bottom), for varying numbers of hosts attributed to the operators (AP|M|EP).
with an average matching rate of 1%. Unless explicitly noted,
we use a workload of 100,000 subscriptions. Each publication
thus generates an average of 1,000 notiﬁcations. Since we use
encrypted ﬁltering, where each publication has to be matched
against each one of the stored subscriptions, our experiments
are independent of the nature of the workload as there is no
support for containment in the ASPE scheme. We therefore do
not need to use traces and rely on synthetic subscriptions and
publications.
Our experiments always begin with a subscription storage
phase where subscriptions are stored in the system prior to
sending any publication. Then, the rate of publications is
either the maximal one supported by the system (for the
baseline evaluation in a non-elastic setting), based on synthetic
descriptions of the rate evolution, or based on the tick trace
from the Frankfurt stock exchange described in our introduction.
C. Baseline STREAMHUB Performance
We start by evaluating the baseline performance of E-STREAMHUB with 100 K subscriptions. We consider static configurations of 2 to 12 hosts dedicated to the E-STREAMHUB
engine and supporting AP, M, and EP operator slices. We use twice as many hosts for the M operator as for each of the two others. With 8 hosts, we deploy AP slices on 2 hosts (4 processors, 16 cores), M slices on 4 hosts (32 cores), and EP slices on 2 hosts (16 cores). Slices from two different operators can be placed on the same host (e.g., with 2 hosts, one host runs all AP and EP operator slices). This placement is not guaranteed to be optimal in terms of CPU utilization of the hosts, but allows us to simply illustrate linear scaling.
Figure 6 (top) presents the maximal throughput supported
TABLE I. Migration times at 100 publications/s, with 12.5 K or 50 K subscriptions stored per M operator slice (for a total of 100 K and 500 K subscriptions, respectively).

             AP       M (12.5 K)   M (50 K)   EP
average      232 ms   1.497 s      2.533 s    275 ms
std. dev.    31 ms    354 ms       1.557 s    52 ms
by each configuration, before events start accumulating at the input of the AP operator. The throughput scales linearly, as expected: 12 hosts support a flow of 422 publications/second, corresponding to 42.2 million encrypted filtering operations and 422,000 notifications sent per second.
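As a sanity check, these figures follow directly from the workload parameters (a quick back-of-the-envelope sketch; the constants are the ones reported above, and the variable names are ours):

```python
# Back-of-the-envelope load for the 12-host configuration: with encrypted
# filtering, every publication is matched against all stored subscriptions,
# and 1% of them match on average.
publications_per_s = 422
stored_subscriptions = 100_000
matching_rate = 0.01

filter_ops_per_s = publications_per_s * stored_subscriptions
notifications_per_s = round(filter_ops_per_s * matching_rate)

print(filter_ops_per_s)     # 42200000 filtering operations per second
print(notifications_per_s)  # 422000 notifications per second
```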
We evaluate the delays by submitting to each conﬁguration
an incoming publication rate of half the maximal throughput,
corresponding to the target load set in our elasticity policy.
Figure 6 (bottom) presents the evolution of delays between the
sending of each publication by the source operator and the
reception of the last notiﬁcation by the sink operator. For each
conﬁguration, stacked percentiles are represented as shades of
grey. For instance, with 12 hosts, the minimal delay is 55 ms,
while for 75% of the publications the notiﬁcation is received
by the last notiﬁed subscriber in less than 247 ms.
D. Operator Slice Migration Performance
We now proceed to the evaluation of the performance of
slice migration. Table I presents the average and standard
deviation of the migration time for the three operators, over 25
migrations. We consider larger slices: 4, 8, and 4 slices for the AP, M, and EP operators, respectively, deployed on 2, 4, and 2
hosts as the initial placement. Each migration picks a random
slice of the corresponding operator and migrates it to another
host. The system is under a constant flow of 100 publications/s,
slightly less than half its maximal ﬁltering throughput (Figure 6),
and we consider the storage of 100 K subscriptions (12.5 K
per M operator slice) and 500 K subscriptions (50 K per M
operator slice). Migration times are very small for the AP
operator, which is stateless, and for the EP operator, which
has a very small state. The standard deviations are also small
compared to the average. Migrations of M operator slices
are more costly, as expected. The state size impacts the migration time, but delays remain reasonable even for the very large number of subscriptions considered. We deliberately evaluated small deployments (with 8 M operator slices on 4 hosts) and large numbers of subscriptions (up to 500 K) to show that even disproportionately large migrations take place within a few seconds.
Our next experiment evaluates the impact of migrations
on the experienced delays. We use the same conﬁguration as
previously, with 100 K stored subscriptions. Figure 7 indicates the delay average, deviation, and min/max values, and marks the times when we perform migrations: for two AP operator slices consecutively, then for two M operator slices consecutively, and finally for one of the EP operator slices. We
observe that the notiﬁcation delay increases from the steady
state of 500 ms to a maximal value of less than two seconds
after migrations take place, while the average notiﬁcation delay
remains below a second for the most part.
Fig. 7. Impact of migrations on delay (min, average, max). Migrations performed: AP (1), AP (2), M (1), M (2), EP.
E. Elastic Scaling under Varying Workloads
We now evaluate the complete E-STREAMHUB with elasticity support. We start with a synthetic benchmark. Results are
presented by Figure 8. The ﬁrst plot depicts the load evolution.
We start with a single host that initially runs all 32 slices (AP
and EP: 8 slices, M: 16 slices). The system is initially fed
with 100 K encrypted subscriptions. We gradually increase the
rate of encrypted publications until we reach 350 publications per second. After a period at this stable rate, we decrease the rate gradually back to no activity.
We observe the number of hosts used by the system, as
dynamically provisioned by the enforcer, in the second plot.
The minimum, average, and maximum CPU load observed at these hosts are presented in the third plot. Finally, the average delays and standard deviations are shown in the fourth plot. For all plots, we present averages, standard deviations, and minimum or maximum values observed over periods of 30 seconds.
We clearly observe that the system meets the expectations
set by the elasticity policy. The number of hosts gradually increases towards 15 hosts. This corresponds to the expected range of static performance (note that Figure 6 presents the maximal achievable throughput with more subscriptions, hence the deviation). As the publication rate decreases, hosts are
released and we return to the initial conﬁguration. The elasticity
policy enforcement and corresponding 246 migrations during
the experiment result in the load of the hosts remaining within
an envelope of 40% to 70%, the average being close to the 50%
target. Finally, we observe that the delays remain small despite
migrations, which confirms the ability of the E-STREAMHUB approach to achieve elastic scale out and scale in with minimal degradation of the service performance. The initial migration from one to two hosts is the one that impacts the delay most. Indeed, the relative increase in load per host is larger in this case, roughly half of the operator slices have to be migrated to the new host, and the delays at the three operators may add up when all three types are migrated in a single operation.
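The enforcer's behavior in this experiment can be illustrated with a minimal threshold-based decision rule. This is a hedged sketch only: the thresholds mirror the 40% to 70% envelope and the 50% target discussed above, but the function and its names are illustrative assumptions, not E-STREAMHUB's actual policy code.

```python
# Illustrative threshold-based elasticity decision (assumed sketch, not the
# actual E-STREAMHUB enforcer). Loads are per-host CPU utilizations in [0, 1].
SCALE_OUT_THRESHOLD = 0.70  # above the envelope: add a host
SCALE_IN_THRESHOLD = 0.40   # below the envelope: release a host

def scaling_decision(host_loads):
    """Return 'scale-out', 'scale-in', or 'steady' for the given host loads."""
    average_load = sum(host_loads) / len(host_loads)
    if average_load > SCALE_OUT_THRESHOLD:
        return "scale-out"  # provision a host and migrate slices onto it
    if average_load < SCALE_IN_THRESHOLD and len(host_loads) > 1:
        return "scale-in"   # migrate slices off the least-loaded host, then release it
    return "steady"

print(scaling_decision([0.80, 0.90]))  # scale-out
print(scaling_decision([0.20, 0.30]))  # scale-in
print(scaling_decision([0.50, 0.55]))  # steady
```

A real enforcer would additionally smooth load measurements over time and weigh the cost of the migrations each decision triggers, as the text describes.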
Our next and final experiment uses the same presentation as the previous one, but instead of using a synthetic publication
workload, we replay the trace-based publication activity from
the Frankfurt stock exchange described in our introduction and
in Figure 1. Results are presented with the same format as
for Figure 8. The publication rate from the source operator is
based on the stock exchange ticks, while the set of 100,000
Fig. 8. Elastic scaling under a steadily increasing and decreasing synthetic workload and 20 K encrypted subscriptions: publication rate (publications/second), number of hosts, host CPU load (min/average/max), and average delays (seconds).
subscriptions is fixed. We consider a shorter time frame for the experiment. We speed up the tick trace so that one hour in the original trace corresponds to 3 minutes in the experiment (for a total duration of 40 minutes). We also reduce the maximum number of publications per second to account for the limited size of our experimentation cluster: the flow is scaled down from 1,200 on Figure 1 to 190 publications per second (for a peak load of 19 million evaluations and 19,000 notifications sent per second, 1,800 seconds after the beginning of the experiment). We observe that the number of machines
for supporting the workload ranges from 1 to 8. The system reacts to changes in load, in particular at the beginning of the day and for the spike observed in the afternoon, while scaling down to 3 nodes for the lower load in the evening. We observe, as with the previous experiment, that the load of all machines remains within the requested envelope. The effect of a scale-out operation (e.g., around 500 seconds) is to reduce the average and maximum load, but it may also leave some machines under-used: the subsequent scale-in operation chooses the underloaded machine as a candidate for release, raising the minimal load observed. The
Fig. 9. Elastic scaling under a load trace from the Frankfurt stock exchange: publication rate (publications/second), host CPU load (min/average/max), and average delays (seconds).
average notification delay remains below one second for the entire duration of the experiment despite migrations. This final trace-based experiment conveys the ability of E-STREAMHUB to quickly adapt to the experienced load while keeping notification delays small and stable.
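The trace rescaling used in this last experiment can be sketched as follows (an illustration of the described transformation, not the authors' experimental harness; the function name and trace format are assumptions):

```python
# Rescale a tick trace for replay: one original hour maps to 3 experiment
# minutes, and the peak rate is scaled from 1,200 down to 190 publications
# per second to fit the experimentation cluster.
def rescale_trace(trace):
    """trace: list of (seconds_since_start, publications_per_second) samples."""
    time_factor = (60 * 60) / (3 * 60)  # one hour -> 3 minutes
    peak_original, peak_target = 1200, 190
    return [(t / time_factor, r * peak_target / peak_original) for t, r in trace]

# The original peak, one hour into the trace, lands 180 seconds into the replay.
print(rescale_trace([(3600, 1200)]))  # [(180.0, 190.0)]
```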
We presented the design, implementation, and evaluation of an elastic content-based pub/sub engine, E-STREAMHUB.
Pub/sub middleware eases the composition and integration
of multiple applications, services and users across different
administrative domains. Pub/sub is particularly suited as a
cloud-provided service. The load experienced by such a service
heavily depends on the application, the ﬁltering scheme(s) used,
and ﬂuctuations of the service popularity. The unpredictability
of the load poses a challenge for the appropriate dimensioning
of the support infrastructure and for its evolution over time.
We addressed this problem by describing mechanisms and
techniques for building an elastic pub/sub engine that auto-
matically scales out and in based on observation of the load
experienced by the system. Elasticity takes place independently for each of the three operators forming the engine, allowing the system to adapt to the nature of the workload. Elastic scaling takes place
through migrations of operator slices with minimal service
degradation. The elasticity enforcer matches elasticity policies
while reducing the cost and number of necessary migrations.
Our evaluation using synthetic and trace-driven benchmarks
indicates that E-STREAMHUB is able to react to dynamic
changes in load, automatically adding and removing hosts
as required by the elasticity policy.
REFERENCES
M. Altinel and M. J. Franklin. Efficient filtering of XML documents for selective dissemination of information. In VLDB, 2000.
Michael Armbrust, Armando Fox, Rean Grifﬁth, Anthony D. Joseph,
Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel
Rabkin, Ion Stoica, and Matei Zaharia. A view of cloud computing.
Communications of the ACM, 53(4):50–58, April 2010.
R. Barazzutti, P. Felber, C. Fetzer, E. Onica, M. Pasin, J.-F. Pineau, E. Rivière, and S. Weigert. StreamHub: A massively parallel architecture for high-performance content-based publish/subscribe. In DEBS, 2013.
R. Barazzutti, P. Felber, H. Mercier, E. Onica, and E. Rivière. Thrifty privacy: Efficient support for privacy-preserving publish/subscribe. In DEBS, 2012.
A. Brito, A. Martin, T. Knauth, S. Creutz, D. Becker de Brum,
S. Weigert, and C. Fetzer. Scalable and low-latency data processing
with StreamMapReduce. In CloudCom, 2011.
A. Carzaniga, D. S. Rosenblum, and A. L. Wolf. Design and evaluation
of a wide-area event notification service. ACM TOCS, 2001.
C.-Y. Chan, P. Felber, M. Garofalakis, and R. Rastogi. Efﬁcient ﬁltering
of XML documents with XPath expressions. VLDB Journal, 11, 2002.
Sunoh Choi, Gabriel Ghinita, and Elisa Bertino. A privacy-enhancing
content-based publish/subscribe system using scalar product preserving
transformations. In Database and Expert Systems Applications, volume
6261 of Lecture Notes in Computer Science, pages 368–384, 2010.
E.G. Coffman Jr, M.R. Garey, and D.S. Johnson. Approximation
algorithms for bin packing: A survey. In Approximation algorithms for
NP-hard problems, pages 46–93. PWS Publishing Co., 1996.
Patrick Th. Eugster, Pascal Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. The many faces of publish/subscribe. ACM Comput. Surv., 35(2):114–131, 2003.
Wenjing Fang, Beihong Jin, Biao Zhang, Yuwei Yang, and Ziyuan Qin.
Design and evaluation of a pub/sub service in the cloud. In Proceedings
of the 2011 International Conference on Cloud and Service Computing,
CSC '11, pages 32–39, Washington, DC, USA, 2011. IEEE Computer Society.
Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki,
and Peter Pietzuch. Integrating scale out and fault tolerance in
stream processing using operator state management. In Proc. of ACM
International Conference on Management of Data, SIGMOD, New York,
NY, USA, 2013.
Anshul Gandhi, Mor Harchol-Balter, Ram Raghunathan, and Michael A.
Kozuch. Autoscale: Dynamic, robust capacity management for multi-tier
data centers. ACM Trans. Comput. Syst., 30(4):14:1–14:26, November 2012.
Vincenzo Gulisano, Ricardo Jimenez-Peris, Marta Patino-Martinez,
Claudio Soriente, and Patrick Valduriez. StreamCloud: An elastic and
scalable data streaming system. IEEE Transactions on Parallel and
Distributed Systems, 23(12):2351–2365, 2012.
Joe Hoffert, Douglas C. Schmidt, and Aniruddha Gokhale. Adapting
distributed real-time and embedded pub/sub middleware for cloud
computing environments. In Proceedings of the ACM/IFIP/USENIX
11th International Conference on Middleware, Middleware ’10, pages
21–41, Berlin, Heidelberg, 2010. Springer-Verlag.
Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin
Reed. ZooKeeper: Wait-free coordination for internet-scale systems.
In Proceedings of the 2010 USENIX technical conference, USENIX
ATC’10, pages 11–11, Berkeley, CA, USA, 2010. USENIX Association.
Atsushi Ishii and Toyotaro Suzumura. Elastic stream computing with
clouds. In Cloud Computing (CLOUD), 2011 IEEE International
Conference on, pages 195–202. IEEE, 2011.
Jay Kreps, Neha Narkhede, and Jun Rao. Kafka: a distributed messaging
system for log processing. In Proc. of the 6th International Workshop
on Networking Meets Databases, volume NetDB, Athens, Greece, 2011.
Ming Li, Fan Ye, Minkyong Kim, Han Chen, and Hui Lei. A scalable
and elastic publish/subscribe service. In IEEE International Parallel & Distributed Processing Symposium, IPDPS, 2011.
Distributed Processing Symposium, IPDPS, 2011.
Paul Marshall, Kate Keahey, and Tim Freeman. Elastic site: Using
clouds to elastically extend site resources. In Proceedings of the 2010
10th IEEE/ACM International Conference on Cluster, Cloud and Grid
Computing, CCGRID ’10, pages 43–52, Washington, DC, USA, 2010.
IEEE Computer Society.
Silvano Martello and Paolo Toth. Knapsack problems: algorithms and
computer implementations. John Wiley & Sons, Inc., 1990.
A. Martin, C. Fetzer, and A. Brito. Active replication at (almost) no
cost. In SRDS, 2011.
A. Martin, T. Knauth, S. Creutz, D. Becker de Brum, S. Weigert, A. Brito,
and C. Fetzer. Low-overhead fault tolerance for high-throughput data
processing systems. In ICDCS, 2011.
Peter Mell and Timothy Grance. The NIST deﬁnition of cloud computing.
NIST special publication, 800(145):7, 2011.
Scott Schneider, Henrique Andrade, Buğra Gedik, Alain Biem, and
Kun-Lung Wu. Elastic scaling of data parallel operators in stream
processing. In Parallel & Distributed Processing, 2009. IPDPS 2009.
IEEE International Symposium on, pages 1–12. IEEE, 2009.
Juraj Somorovsky, Mario Heiderich, Meiko Jensen, Jörg Schwenk, Nils
Gruschka, and Luigi Lo Iacono. All your clouds are belong to us:
security analysis of cloud management interfaces. In Proceedings of
the 3rd ACM workshop on Cloud computing security workshop, CCSW
’11, pages 3–14, New York, NY, USA, 2011. ACM.
Nam-Luc Tran, Sabri Skhiri, and Esteban Zimányi. EQS: An elastic
and scalable message queue for the cloud. In Proceedings of the 2011
IEEE Third International Conference on Cloud Computing Technology
and Science, CLOUDCOM ’11, pages 391–398, Washington, DC, USA,
2011. IEEE Computer Society.
Kun-Lung Wu, Kirsten W. Hildrum, Wei Fan, Philip S. Yu, Charu C. Aggarwal, David A. George, Buğra Gedik, Eric Bouillet, Xiaohui Gu,
Gang Luo, and Haixun Wang. Challenges and experience in prototyping
a multi-modal stream analytic and monitoring application on System S.
In Proceedings of the 33rd international conference on Very large data
bases, VLDB ’07, pages 1185–1196. VLDB Endowment, 2007.