The PADRES Publish/Subscribe
Hans-Arno Jacobsen, Alex Cheung, Guoli Li, Balasubramaneyam Maniymaran,
Vinod Muthusamy, Reza Sherafat Kazemzadeh
Middleware Systems Research Group, University of Toronto, Canada
Publish/Subscribe, Content-based Routing, Composite Subscription, Historic Data Access, Load
This chapter introduces PADRES, the publish/subscribe model with the capability to correlate
events, uniformly access data produced in the past and future, balance the traffic load among
brokers, and handle network failures. The new model can filter, aggregate, correlate and project
any combination of historic and future data. A flexible architecture is proposed consisting of
distributed and replicated data repositories that can be provisioned in ways to trade off availability,
storage overhead, query overhead, query delay, load distribution, parallelism, redundancy and
locality. This chapter gives a detailed overview of the PADRES content-based publish/subscribe
system. Several applications are presented in detail that can benefit from the content-based nature
of the publish/subscribe paradigm and take advantage of its scalability and robustness features.
The publish/subscribe paradigm provides a simple and effective method for disseminating data
while maintaining a clean decoupling of data sources and sinks (Cugola, 2001; Fabret, 2001;
Castro, 2002; Fiege, 2002; Carzaniga, 2003; Eugster, 2003; Li, 2005; Ostrowski, 2006; Rose, 2007).
This decoupling can enable the design of large, distributed, and loosely coupled systems that
interoperate through simple publish and subscribe invocations. While there are many applications
such as information dissemination (Liu, 2004; Nayate, 2004; Liu, 2005) based on group commu-
nication (Birman, 1999) and topic-based publish/subscribe protocols (Castro, 2002; Ostrowski,
2006), a large variety of emerging applications benefit from the expressiveness, filtering, distrib-
uted event correlation, and complex event processing capabilities of content-based pub-
lish/subscribe systems. These applications include RSS feed filtering (Rose, 2007), stock-market
monitoring engines (Tock, 2005), system and network management and monitoring (Mukherjee,
1994; Fawcett, 1999), algorithmic trading with complex event processing (Keonig, 2007),
business process management and execution (Schuler, 2001; Andrews, 2003), business activity
monitoring (Fawcett, 1999), workflow management (Cugola, 2001), and service discovery (Hu,
Typically, a distributed content-based publish/subscribe system is built as an
application-level overlay of content-based publish/subscribe brokers, with publishing data sources and
1 This paper will be published as a chapter in “Handbook of research on advanced distributed event-based systems,
publish/subscribe and message filtering technologies”.
subscribing data sinks connecting to the broker overlay as clients. In a content-based pub-
lish/subscribe system, message routing decisions are not based on destination IP-addresses but on
the content of messages and the locations of data sinks that have expressed an interest in that
To make the publish/subscribe paradigm a viable solution for the above applications, addi-
tional features must be added. This includes support for composite subscriptions to model and
detect composite events, and to enable event correlation and in-network event filtering to reduce
the amount of data transferred across the network.
Furthermore, the publish/subscribe substrate that carries and delivers messages must be robust
against non-uniform workloads, node failures, and network congestion. In PADRES2, robustness
is achieved by supporting alternate message routing paths, load balancing techniques to distribute
load, and fault resilience techniques to react to broker failures.
It is also essential for a publish/subscribe system to provide tools to perform monitoring, de-
ployment, and management tasks. Monitoring is required throughout the system to oversee the
actual message routing, the operation of content-based brokers, and the interaction of applications
via the publish/subscribe substrate. Deployment support is required to bring up large broker fed-
erations, orchestrate composite applications, support composition of services and business proc-
esses, and to conduct controlled experiments. Management support is required to inspect and
control live brokers.
This chapter presents the PADRES content-based publish/subscribe system developed by the
Middleware Systems Research Group at the University of Toronto. The PADRES system incor-
porates many unique features that address the above concerns and thereby enable a broad class of
applications. The remainder of this chapter begins with a description of the PADRES language
model, network architecture and routing protocol in Section 1.2. This is followed, in Section 1.3,
by an outline of the PADRES load balancing capabilities whereby the system can automatically
relocate subscribers in order to avoid processing or routing hotspots among the network of
brokers. Section 1.4 then addresses failure resilience describing how the PADRES routing proto-
cols are able to guarantee message delivery despite a configurable number of concurrent
crash-stop node failures. Some of the PADRES distributed management features are presented in
Section 1.5, including topology monitoring and deployment tools. Next, Section 1.6 discusses a
wide variety of applications and illustrates how the features of the PADRES system enable or
support the development of these applications. Finally, a survey of related publish/subscribe pro-
jects and the contributions of the PADRES project are presented in Section 1.7, followed by some
concluding remarks in Section 1.8.
1.2 MESSAGE ROUTING
All interactions in the PADRES distributed content-based publish/subscribe system are
performed by routing four types of messages: advertisements, subscriptions, publications, and
notifications. This section outlines the format of each of these messages, then describes
how these messages are routed in the PADRES network.
1.2.1 Language model
2 The project name PADRES is an acronym that was initially composed of letters (mostly first
letters of first names) of the initial group of researchers working on the project. Over time, the
acronym was also used synonymously as a name, simply written Padres. Both forms are correct.
Also, various re-interpretations of the acronym have been published, such as Publish/subscribe
Applied to Distributed REsource Scheduling, PAdres is Distributed REsource Scheduling, etc.
The PADRES language model is based on the traditional [attribute, operator, value] predicates
used in several other content-based publish/subscribe systems (Opyrchal, 2000; Carzaniga, 2001;
Cugola, 2001; Fabret, 2001; Mühl, 2002; Bittner, 2007). In PADRES, each message consists of a
message header and a message body. The header includes a unique message identifier, the mes-
sage type (publication, advertisement, subscription, or notification), the last and next hops of the
message, and a timestamp that records when the message was generated. The content and for-
mats of each message type are detailed below.
Data producers, or publishers, encapsulate their data in publication messages which consist of a
comma-separated set of [attribute, value] pairs. Each publication message includes a mandatory
tuple describing the class of the message. The class attribute provides a guaranteed selective
predicate for matching, similar to the topic in topic-based publish/subscribe systems.3 A
publication that conveys information about a stock listing may look as follows:
P: [class, ‘STOCK’], [symbol, ‘YHOO’], [open, 25.2], [high, 43.0],
[low, 24.5], [close, 33.0], [volume, 170300], [date, ‘12-Apr-96’]
A publication is allowed to traverse the system only if there are data sinks, or subscribers,
who are interested in the data. Subscribers indicate their interest using subscription messages
which are detailed below. If there are no interested subscribers, the publication is dropped. A
publication may also contain an optional payload, which is a blob of binary data. The payload is
delivered to subscribers, but cannot be referenced in a subscription constraint.
Before a publisher can issue publications, it must supply a template that specifies constraints on
the publications it will produce. These templates are expressed via advertisement messages. In a
sense, an advertisement is analogous to a database schema or a programming language type, and
can specify the type and ranges for each attribute as shown in the following example:
A: [class, eq4, ‘STOCK’], [symbol, isPresent, @STRING], [open, >, 0.0],
[high, >, 0.0], [low, >, 0.0], [close, >, 0.0], [volume, >, 0],
[date, isPresent, @DATE]
The above advertisement indicates that the publisher will publish only STOCK data with any
symbol. The isPresent operator allows an attribute to have any value in the domain of the
An advertisement is said to induce publications: the attribute set of an induced publication is a
subset of attributes defined in the associated advertisement, and the values of each attribute in an
induced publication must satisfy the predicate constraint defined in the advertisement. Note that a
publisher may only issue publications that are induced by an advertisement it has sent. Two
possible publications P1 and P2 induced by the above advertisement are listed below, while P3 is
not induced by the advertisement due to the extra attribute company.
P1: [class, ‘STOCK’], [symbol, ‘YHOO’], [open, 25.25],
[high, 43.00], [low, 24.50]
P2: [class, ‘STOCK’], [symbol, ‘IBM’], [open, 45.25]
P3: [class, ‘STOCK’], [symbol, ‘IBM’], [company, ‘IBM’]
3 The PADRES language is nevertheless fully content-based and supports a rich predicate language.
4 Operator ‘eq’ is used for String type values and ‘=’ is used for Integer and float type values.
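The notion of an induced publication can be illustrated with a short sketch. The Python below is only an illustration and not PADRES code: the dictionary encoding, the OPS table, and the function name is_induced are assumptions made for this example, and isPresent is simplified to accept any value without checking the declared type (@STRING, @DATE).

```python
import operator

# Hypothetical encoding: an advertisement maps attribute -> (operator, operand).
OPS = {
    "eq": operator.eq, "=": operator.eq,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "isPresent": lambda value, operand: True,  # simplification: no type check
}

def is_induced(publication, advertisement):
    """True if every attribute of the publication is declared in the
    advertisement and its value satisfies the advertised predicate."""
    for attr, value in publication.items():
        if attr not in advertisement:
            return False  # extra attribute, e.g. 'company' in P3
        op, operand = advertisement[attr]
        if not OPS[op](value, operand):
            return False
    return True

adv = {
    "class": ("eq", "STOCK"), "symbol": ("isPresent", None),
    "open": (">", 0.0), "high": (">", 0.0), "low": (">", 0.0),
    "close": (">", 0.0), "volume": (">", 0), "date": ("isPresent", None),
}
p1 = {"class": "STOCK", "symbol": "YHOO", "open": 25.25, "high": 43.00, "low": 24.50}
p2 = {"class": "STOCK", "symbol": "IBM", "open": 45.25}
p3 = {"class": "STOCK", "symbol": "IBM", "company": "IBM"}

print(is_induced(p1, adv), is_induced(p2, adv), is_induced(p3, adv))
```

Under these assumptions the sketch prints True True False, mirroring the statement that P1 and P2 are induced while P3 is rejected because of the extra attribute company.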
Subscribers express their interests in receiving publication messages by issuing subscriptions
which specify predicate constraints on matching publications. PADRES not only allows
subscribers to subscribe to individual publications, but also allows correlations or joins across
multiple publications. Subscriptions are classified into atomic and composite subscriptions.
An atomic subscription is a conjunction of predicates. For example, below is a subscription for
Yahoo stock quotes.
S: [class, eq, ‘STOCK’], [symbol, eq, ‘YHOO’],
[open, isPresent, @FLOAT]
The commas between predicates indicate the conjunction relation. Similar to publications, each
subscription message has a mandatory predicate specifying the class of the message, with the
remaining predicates specifying constraints on other attributes.
A publication is said to match a subscription, if all predicates in the subscription are satisfied
by some [attribute, value] pair in the publication. For instance, the above subscription is matched
by publications of all YHOO stock quotes with an open value. A subscription is said to cover
another subscription, if and only if any publication that matches the latter also matches the former.
That is, the set of publications matching the covering subscription is a superset of those matching
the covered subscription.
Composite subscriptions consist of atomic subscriptions linked by logical or temporal
operators, and can be used to express interest in composite events. A composite subscription is
matched only after all component atomic subscriptions are satisfied. For example, the following
subscription detects when Yahoo’s stock opens at less than 22, and Microsoft’s at greater than 31.
Parentheses are used to specify the precedence of operators.
CS: ([class, eq, ‘STOCK’], [symbol, eq, ‘YHOO’], [open, <, 22.0]) &&
([class, eq, ‘STOCK’], [symbol, eq, ‘MSFT’], [open, >, 31.0])
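As an illustration of atomic matching and of the && composite shown here, the following Python sketch keeps one slot of partial matching state per component subscription. It is not the PADRES matching engine: the tuple encoding of predicates, the class name AndComposite, and the choice to remember only the most recent match per component (without resetting after detection) are assumptions made for brevity.

```python
import operator

OPS = {
    "eq": operator.eq, "=": operator.eq,
    "<": operator.lt, ">": operator.gt,
    "isPresent": lambda value, operand: True,
}

def matches(publication, subscription):
    """Atomic match: every predicate (attr, op, operand) must be satisfied
    by the corresponding attribute of the publication."""
    for attr, op, operand in subscription:
        if attr not in publication or not OPS[op](publication[attr], operand):
            return False
    return True

class AndComposite:
    """Composite subscription 's1 && s2' (hypothetical helper class)."""

    def __init__(self, left, right):
        self.parts = [left, right]
        self.matched = [None, None]  # partial matching state

    def on_publication(self, publication):
        """Record the publication for each part it matches; return the pair
        of matching publications once both parts are satisfied."""
        for i, part in enumerate(self.parts):
            if matches(publication, part):
                self.matched[i] = publication
        if all(m is not None for m in self.matched):
            return tuple(self.matched)
        return None

cs = AndComposite(
    [("class", "eq", "STOCK"), ("symbol", "eq", "YHOO"), ("open", "<", 22.0)],
    [("class", "eq", "STOCK"), ("symbol", "eq", "MSFT"), ("open", ">", 31.0)],
)
first = cs.on_publication({"class": "STOCK", "symbol": "YHOO", "open": 21.5})
second = cs.on_publication({"class": "STOCK", "symbol": "MSFT", "open": 32.0})
print(first)               # None: only the YHOO part has matched so far
print(second is not None)  # True: both parts are now satisfied
```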
Moreover, unlike the traditional publish/subscribe model, PADRES can deliver not only those
publications produced after a subscription has been issued, but also those published before a
subscription was issued. That is, PADRES realizes a publish/subscribe model to query both the
future and the past (Li, 2007; Li, 2008). In this model, data from the past can be correlated with
data from the future. Composite subscriptions that allow correlations across publications continue
to work with future data, and also with any combination of historic and future data. In that sense,
subscriptions can be classified into future subscriptions, historic subscriptions, and hybrids of the two.
For example, the following subscription is satisfied if during the period Aug. 12 to Aug. 24,
2008, MSFT’s opening price was higher than the current YHOO opening price. The variable $X
correlates the opening price in the two stock quotes. This is an example of a hybrid subscription.
CS: ([class, eq, ‘STOCK’], [symbol, eq, ‘YHOO’], [open, eq, $X]) &&
([class, eq, ‘STOCK’], [symbol, eq, ‘MSFT’], [open, >, $X],
[_start_time, eq, ‘12-Aug-08’], [_end_time, eq, ‘24-Aug-08’])
PADRES also provides an SQL-like language called PSQL (PADRES SQL) (Li, 2008), which
has the same expressiveness as described above and allows users to uniformly access data
produced in the past and future. The PSQL language supports the ability to specify the notification
semantics, and it can filter, aggregate, correlate, and project any combination of historic and future
data as described below.
In PSQL, subscribers issue SQL-like SELECT statements to query both historic and future
publications. Within a SELECT statement, the SELECT clause specifies the set of attributes or
aggregation functions to include in the notifications of matching publications, the WHERE clause
indicates the predicate constraints to apply to matching publications, and the optional FROM and
HAVING clauses help express joins and aggregations.
SELECT [ attr | function ], ...
[FROM src, ...]
WHERE attr op val, ...
[HAVING function, ...]
The above composite subscription is translated as follows in PSQL.
SELECT src1.class, src1.symbol, src1.open, src2.symbol,
FROM src1, src2
WHERE src1.class eq ‘STOCK’,
src2.class eq ‘STOCK’,
src1.symbol eq ‘YHOO’,
src2.symbol eq ‘MSFT’,
src1.open < src2.open,
src2.start_time eq ‘12-Aug-08’,
src2.end_time eq ‘24-Aug-08’
Notice that the reserved start_time and end_time attributes can be used to express time
constraints in order to query for publications from the past, the future, or both. The sources in
the FROM clause specify that two different publications are required to satisfy this query, and are
subsequently used to qualify the WHERE constraints. The two publications may come from
different publishers and conform to different schemas (i.e., advertisements).
The HAVING clause is used to specify constraints across a set of matching publications. The
functions AVG(attr, N), MAX(attr, N), and MIN(attr, N) compute the relevant aggregation across
the values of attribute attr in a window of N matching publications. The window may either slide
over matching publications or be reset when the HAVING constraints are satisfied. The following
subscription returns all publications about YHOO stock quotes in a window of 10 publications
whose average price exceeds $20.
SELECT class, symbol, price
WHERE class eq ‘STOCK’, symbol eq ‘YHOO’
HAVING AVG(price, 10) > 20.00
For more information about PSQL, please refer to the technical report (Li, 2008).
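The windowed HAVING constraint can be sketched as follows. This is a minimal illustration of the sliding variant only, written under stated assumptions: the class name WindowAvg and its methods are hypothetical, the constraint is evaluated only once the window holds N matching publications, and the alternative policy of resetting the window after the constraint is satisfied is not shown.

```python
from collections import deque

class WindowAvg:
    """Sliding-window sketch of 'HAVING AVG(attr, n) > threshold'."""

    def __init__(self, attr, n, threshold):
        self.attr, self.n, self.threshold = attr, n, threshold
        self.window = deque(maxlen=n)  # the oldest entry is evicted automatically

    def on_match(self, publication):
        """Feed one publication that already satisfied the WHERE clause.
        Return the publications in the window if the constraint holds."""
        self.window.append(publication)
        if len(self.window) < self.n:
            return None  # assumption: wait until the window is full
        avg = sum(p[self.attr] for p in self.window) / self.n
        return list(self.window) if avg > self.threshold else None

having = WindowAvg("price", 3, 20.00)
for price in (18.0, 19.0, 26.0, 10.0):
    result = having.on_match({"class": "STOCK", "symbol": "YHOO", "price": price})
    print(price, None if result is None else [p["price"] for p in result])
```

With a window of three publications, the constraint is first satisfied on the third quote (average 21.0) and is no longer satisfied once the window slides to 19.0, 26.0, 10.0.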
When a publication matches a subscription at a broker, a notification message is generated and
further forwarded into the broker network until delivered to subscribers. Notification semantics
do not constrain notification results, but transform them. Recall that notifications may include a
subset of attributes in matching publications indicated in the SELECT clause in PSQL. Most ex-
isting publish/subscribe systems use matching publication messages as notifications whereas
PSQL supports projections and aggregations over matching publications. This simplifies the noti-
fications delivered to subscribers and reduces overhead by eliminating unnecessary information.
1.2.2 Broker network and broker architecture
Figure 1. Broker network
Figure 2. Router architecture
As shown in Figure 1, a deployed PADRES system consists of a set of brokers connected in an overlay
which forms the basis for message routing. Each PADRES broker acts as a content-based router
that matches and routes publish/subscribe messages. A broker is only aware of its neighbors
(those located within one hop), and it stores this information in its Overlay Routing Table (ORT).
Clients connect to brokers using various binding interfaces such as Java Remote Method Invocation
(RMI) and Java Message Service (JMS).
Publishers and subscribers are clients to the overlay. A publisher client must first issue an ad-
vertisement before it publishes, and the advertisement is flooded to all brokers in the overlay
network. These advertisements are stored at each broker in a Subscription Routing Table (SRT)
which is essentially a list of [advertisement, last hop] tuples.
A subscriber may subscribe at any time, and subscriptions are routed based on the information
in the SRT. If a subscription intersects an advertisement in the SRT, it is forwarded to the last hop
broker the advertisement came from. A subscription is routed hop-by-hop in this way until it
reaches the publisher who sent the matching advertisement. Subscriptions are used to construct
the Publication Routing Table (PRT). Similar to the SRT, the PRT is a list of [subscription, last
hop] tuples, and is used to route publications.
If a publication matches a subscription in the PRT, it is forwarded to the last hop broker of that
subscription until it reaches the subscriber that sent the subscription. Figure 1 shows an example
PADRES overlay and the SRT and the PRT at one of the brokers. In the figure, in Step 1 an ad-
vertisement is published at broker B1. A matching subscription enters through broker B2 in Step
2 and since the subscription overlaps the advertisement at broker B3, it is sent to broker B1. In
Step 3 a publication is routed to broker B2 along the path established by the subscription.
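The three steps of this example can be traced with a small in-memory sketch. It assumes an acyclic overlay with synchronous method calls in place of network messages, and it reduces predicates to equality tests (None in an advertisement stands for "any value"). The classes Broker and Client and the helpers link, matches, and intersects are hypothetical names chosen for this illustration, not the PADRES API.

```python
def matches(pub, sub):
    """A publication matches if it satisfies every (equality) predicate."""
    return all(pub.get(attr) == value for attr, value in sub.items())

def intersects(sub, adv):
    """A subscription intersects an advertisement if each constrained
    attribute is advertised and the advertised value admits it."""
    return all(attr in adv and adv[attr] in (None, value)
               for attr, value in sub.items())

class Client:
    """Publisher or subscriber endpoint; it only records deliveries."""
    def __init__(self, name):
        self.name, self.received = name, []

class Broker:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # adjacent brokers (the role of the ORT)
        self.srt = []        # [advertisement, last hop] tuples
        self.prt = []        # [subscription, last hop] tuples

    def advertise(self, adv, last_hop):
        self.srt.append((adv, last_hop))
        for broker in self.neighbors:  # flood through the overlay
            if broker is not last_hop:
                broker.advertise(adv, self)

    def subscribe(self, sub, last_hop):
        self.prt.append((sub, last_hop))
        hops = {id(h): h for adv, h in self.srt if intersects(sub, adv)}
        for hop in hops.values():  # follow advertisements backwards
            if isinstance(hop, Broker) and hop is not last_hop:
                hop.subscribe(sub, self)

    def publish(self, pub, last_hop):
        hops = {id(h): h for sub, h in self.prt if matches(pub, sub)}
        for hop in hops.values():  # follow subscriptions backwards
            if hop is last_hop:
                continue
            if isinstance(hop, Broker):
                hop.publish(pub, self)
            else:
                hop.received.append(pub)

def link(a, b):
    a.neighbors.append(b)
    b.neighbors.append(a)

b1, b2, b3 = Broker("B1"), Broker("B2"), Broker("B3")
link(b1, b3)
link(b3, b2)
publisher, subscriber = Client("publisher"), Client("subscriber")
b1.advertise({"class": "STOCK", "symbol": None}, publisher)      # Step 1
b2.subscribe({"class": "STOCK", "symbol": "YHOO"}, subscriber)   # Step 2
b1.publish({"class": "STOCK", "symbol": "YHOO"}, publisher)      # Step 3
b1.publish({"class": "STOCK", "symbol": "IBM"}, publisher)       # no subscriber: dropped
print(subscriber.received)
```

In this sketch only the YHOO publication reaches the subscriber; the IBM publication finds no matching entry in the PRT of B1 and is dropped at the first broker.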
Each broker consists of an input queue, a router, and a set of output queues, as shown in Fig-
ure 2. A message first goes into the input queue. The router takes the message from the input
queue, matches it against existing messages according to the message type, and puts it in the
proper output queue(s), each of which corresponds to a destination. Additional components provide
advanced features. For example, the controller provides an interface for a system administrator to
manipulate a broker (e.g., to shut it down, or to inject a message into it); the monitor maintains
statistical information about the broker (e.g., the incoming message rate, the average queueing
time and the matching time); the load balancer triggers offload algorithms to balance the traffic
among brokers when a broker becomes overloaded (e.g., the incoming message rate exceeds a
certain threshold); and the failure detector triggers the fault-tolerance procedure when a failure is
detected in order to reconstruct new forwarding paths for messages and ensure timely delivery of
publications in the presence of failures.
Figure 3. Rete network
PADRES brokers use an efficient Rete-based pattern matching algorithm (Forgy, 1982) to
perform publish/subscribe content-based matching. Subscriptions are organized in a Rete network
as shown in Figure 3. Each rectangle node in the Rete network corresponds to a predicate and
carries out simple conditional tests to match attributes against constant values. Each oval node
performs a join between different atomic subscriptions and thus corresponds to composite sub-
scriptions. These oval nodes maintain the partial matching states for composite subscriptions. A
path from the root node to a terminal node (a double-lined rectangle) represents a subscription.
The Rete matching engine performs efficient content-based matching by reducing or eliminating
certain types of redundancy through the use of node sharing. Partial matching states stored in the
join nodes allow the matching engine to avoid a complete re-evaluation of all atomic subscrip-
tions each time new publications are inserted into the matching engine. Experiments show that it
takes only 4.5 ms to match a publication against 200,000 subscriptions, which is nearly 20 times
faster than the predicate counting algorithm (Ashayer, 2002). Moreover, the detection time does
not increase with the number of subscriptions, but is affected by the number of matched publica-
tions. That is, the more publications that match a subscription, the longer it takes the matching
engine to process the subscription. This indicates that the Rete approach is suitable for large-scale
publish/subscribe systems and can process a large number of publication and subscription mes-
sages efficiently. Also, the Rete-based matching engine naturally supports composite subscription
1.2.3 Content-based routing protocols
Instead of address-based routing, PADRES uses content-based routing, where a publication is
routed towards the interested subscribers without knowing where subscribers are and how many
of them exist. The content-based address of a subscriber is the set of subscriptions issued by the
subscriber. This provides a decoupling between the publishers and subscribers.
PADRES provides many content-based routing optimizations to improve efficiency and ro-
bustness of message delivery, including covering-based routing, adaptive content-based routing in
cyclic overlays, and routing protocols for composite subscriptions.
Covering and merging based routing
In content-based publish/subscribe systems, subscribers may issue similar subscriptions. The goal
of covering-based routing is to guarantee a compact routing table without information loss,
thereby avoiding the propagation of redundant messages, reducing the size of the routing
tables, and improving the performance of the matching algorithm.
When a broker receives a new subscription from a neighbor, it performs the following steps to
determine how to forward it. First, it searches the routing table to determine if the subscription is
covered by some existing subscription from the same neighbor. If it is, the new subscription can
be safely discarded: it is neither inserted into the routing table nor forwarded
further. If the new subscription is not covered by any existing subscriptions, the broker checks if
it covers any existing subscriptions. If so, the covered subscriptions should be removed.
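The covering check and the table update described above can be sketched for the special case where every predicate is a closed range. The encoding (attribute mapped to a (low, high) pair, with an equality written as a range of width zero) and the function names covers and insert_subscription are assumptions for this example; PADRES supports a richer predicate language than this sketch handles.

```python
def covers(s1, s2):
    """True if every publication matching s2 also matches s1: each attribute
    constrained by s1 must be constrained by s2 to a sub-range."""
    for attr, (lo1, hi1) in s1.items():
        if attr not in s2:
            return False
        lo2, hi2 = s2[attr]
        if lo2 < lo1 or hi2 > hi1:
            return False
    return True

def insert_subscription(table, neighbor, new_sub):
    """Update the per-neighbor routing table; return True if new_sub was
    inserted (and should be forwarded), False if it was covered."""
    subs = table.setdefault(neighbor, [])
    if any(covers(old, new_sub) for old in subs):
        return False  # covered: neither stored nor forwarded
    # Remove existing subscriptions that the new one covers.
    subs[:] = [old for old in subs if not covers(new_sub, old)]
    subs.append(new_sub)
    return True

narrow = {"class": ("STOCK", "STOCK"), "open": (0.0, 20.0)}
wide = {"class": ("STOCK", "STOCK"), "open": (0.0, 50.0)}
table = {}
print(insert_subscription(table, "B2", narrow))  # True
print(insert_subscription(table, "B2", wide))    # True, and narrow is removed
print(insert_subscription(table, "B2", narrow))  # False, covered by wide
print(table["B2"] == [wide])                     # True
```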
Subscriptions with no covering relations but which have significant overlap with one another
can be merged into a new subscription, thus creating even more concise routing tables. There are
two kinds of mergers: if the publication set of the merged subscription is exactly equal to the un-
ion of the publication sets of the original subscriptions, the merger is said to be perfect; otherwise,
if the merged subscription’s publication set is a superset of the union, it is an imperfect merger.
Imperfect merging can reduce the number of subscriptions but may allow false positives, that is,
publications that match the merged subscription but not any of the original subscriptions. These
false positives are eventually filtered out in the network, and subscribers will not receive any false
positives, but they do contribute to increased message propagation. However, by selectively and
strategically employing subscription merging, the matching efficiency of the publish/subscribe
system can be further improved. For additional information, please refer to (Li, 2005; Li, 2008).
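The difference between perfect and imperfect mergers can be made concrete with range predicates. The sketch below is an illustration under several assumptions: subscriptions constrain the same attributes with closed ranges over continuous domains, neither subscription covers the other, and the merger is taken to be the bounding box of the two. The function name merge and this merging rule are choices made for the example, not the merging algorithm used in PADRES.

```python
def merge(s1, s2):
    """Merge two range subscriptions into their bounding box.
    Returns (merged, perfect), where perfect is True when the merged
    subscription matches exactly the union of the originals."""
    if s1.keys() != s2.keys():
        raise ValueError("sketch only merges subscriptions over the same attributes")
    merged = {a: (min(s1[a][0], s2[a][0]), max(s1[a][1], s2[a][1])) for a in s1}
    differing = [a for a in s1 if s1[a] != s2[a]]
    if not differing:
        return merged, True
    if len(differing) == 1:
        (lo1, hi1), (lo2, hi2) = s1[differing[0]], s2[differing[0]]
        # Overlapping or touching ranges leave no gap inside the bounding box.
        return merged, (lo1 <= hi2 and lo2 <= hi1)
    return merged, False  # assuming no covering relation between s1 and s2

a = {"open": (0.0, 20.0), "volume": (0, 1000)}
b = {"open": (10.0, 30.0), "volume": (0, 1000)}
c = {"open": (40.0, 60.0), "volume": (0, 1000)}
print(merge(a, b))  # perfect: open ranges overlap
print(merge(a, c))  # imperfect: open = 30.0 would be a false positive
```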
Adaptive content-based routing for general overlays
The standard content-based routing protocol is based on an acyclic broker overlay network. With
only one path between any pair of brokers or clients, content-based routing is greatly simplified.
However, an acyclic overlay offers limited flexibility to accommodate changing network condi-
tions, is not robust with respect to broker failures, and introduces complexities for supporting
other protocols such as failure recovery.
We propose a TID-based content-based routing protocol (Li, 2008) for cyclic overlays to
eliminate the above limitations. In the TID-based routing, each advertisement is assigned a
unique tree identifier (TID) within the broker network. When a broker receives a subscription
from a subscriber, the subscription is bound with the TIDs of its matching advertisements. A
subscription with a bound TID value only propagates along the corresponding advertisement tree.
Subscriptions set up paths for routing publications. When a broker receives a publication, it is
assigned an identifier equal to the TID of its matching advertisement. From this point on, the
publication is propagated along the paths set up by matching subscriptions with the same TID
without matching the content of the publication at each broker. This is referred to as fixed publication routing.
Alternative paths for publication routing are maintained in PRTs as subscription routing paths
with different TIDs and destinations. More alternate paths are available if publishers’ advertise-
ment spaces overlap or subscribers are interested in similar publications, which is often the case
for many applications with long-tailed workloads. Our approach exploits the multiple paths
available at the subscription level: the dynamic publication routing (DPR) algorithm balances
publication traffic among these alternate paths and thereby provides more robust message delivery.
Figure 4. Higher publication rate (number of notifications over time; publication rate = 2400 msg/min; connection degree = 4; fixed vs. dynamic publication routing)
We observe in our experiments that an increase in the publication rate causes the fixed routing
approach to suffer worse notification delays. For instance, in Figure 4, when the publication rate
is increased to 2400 msg/min, the fixed algorithm becomes overloaded with messages queueing
up at brokers along the routing path, whereas the dynamic routing algorithm continues to operate
by offloading the high workload across alternate paths. The results suggest that dynamic routing
is more stable and capable of handling heavier workloads, especially in a well connected network.
Composite subscription routing
Composite events are detected by the broker network in a distributed manner. In topology-based
composite subscription routing (Li, 2005), a composite subscription is routed as a unit towards
potential publishers until it reaches a broker B at which the potential data sources are located in
different directions in the overlay network. The composite subscription is then split at broker B,
which is called the join point broker. Each component subscription is routed to potential publish-
ers separately. Later, matching publications are routed back to the join point broker for it to detect
the composite event. Notice that topology-based routing assumes an acyclic overlay and does not
consider dynamic network conditions.
In a general (cyclic) broker overlay, multiple paths exist between subscribers and publishers,
and topology-based composite subscription routing does not necessarily result in the most effi-
cient use of network resources. For example, composite event detection would be less costly if
the detection is close to publishers with a higher publishing rate, and in a cyclic overlay, more
alternative locations for composite event detection may be available. The overall savings are sig-
nificant if the imbalance in detecting composite events at different locations is large. PADRES
includes a dynamic composite subscription routing (DCSR) algorithm (Li, 2008) that selects op-
timal join point brokers to minimize the network traffic and matching delay while correctly de-
tecting composite events in a cyclic broker overlay. The DCSR algorithm determines how a
composite subscription should be split and routed based on the cost model discussed below.
A broker routing a composite subscription makes local optimal decisions based on the knowl-
edge available to itself and its neighbors. The cost function captures the use of resources such as
memory, CPU, and communication. Suppose a composite subscription CS is split at broker B.
The total routing cost (TRC) of CS is

$$TRC(CS) = RC_{B}(CS) + \sum_{i=1}^{n} RC_{B_i}(CS_{B_i})$$

and includes the routing cost of CS at broker B, denoted as $RC_{B}(CS)$, and those neighbors
where publications contributing to CS may come from, denoted as $RC_{B_i}(CS_{B_i})$. $CS_{B_i}$
denotes the part of CS routed to broker $B_i$, and may be an atomic or composite subscription.
The cost of a composite subscription CS at a broker includes not only the time needed to
match publications (from n neighbors) against CS, but also the time these publications spend in
the input queue of the broker, and the time that matching results (to m neighbors) spend in the
output queues. This cost is modeled as

$$RC_{B}(CS) = \sum_{i=1}^{n} T_{in}\,|P(CS_{B_i})| + \sum_{i=1}^{n} T_{m}\,|P(CS_{B_i})| + \sum_{i=1}^{m} T_{out_i}\,|P(CS)|$$

where $T_{m}$ is the average matching time at a broker, and $T_{in}$ and $T_{out_i}$ are the average times
messages spend in the input queue and in the output queue to the i-th neighbor, respectively.
|P(S)| is the cardinality of subscription S, which is the number of matching publications per unit
time. To compute the cost at a neighbor, brokers periodically exchange information such as
$T_{in}$ and $T_{m}$. This information is incorporated into an M/M/1 queueing model to estimate
queueing times at neighbor brokers as a result of the additional traffic attracted by splitting a
composite subscription there.
Evaluations of the DCSR algorithm were conducted on the PlanetLab wide-area network with
a 30-broker topology. The metrics measured include the bandwidth of certain brokers located on
the composite subscription routing path. In Figure 5, the solid bars represent the number of
outgoing messages at a broker, and the hatched bars are the number of incoming messages that are
not forwarded. Note that the sum of the solid and hatched bars represents the total number of
incoming messages at a broker. Three routing algorithms are compared: simple routing, in which
composite subscriptions are split into atomic parts at the first broker, topology-based composite
subscription routing, and the DCSR algorithm. The topology-based routing imposes less traffic
than simple routing by moving the join point into the network and the DCSR algorithm further
reduces traffic by moving the join point closer towards congested publishers as indicated by the
cost model. In the scenario in Figure 5, compared to simple routing, the DCSR algorithm reduces
the traffic at Brokers 1B by 79.5%, a reduction that is also enjoyed by all brokers downstream
of the join point.
Figure 5. Composite subscription traffic
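To make the cost model of the DCSR algorithm more tangible, the sketch below combines the standard M/M/1 mean waiting time with a routing cost built from input-queue, matching, and output-queue times. The function names, the parameter names, and the exact way the terms are combined in routing_cost are assumptions derived from the textual description of the model, not the PADRES implementation; rates are in messages per second and times in seconds.

```python
def mm1_queue_wait(arrival_rate, service_rate):
    """Mean time a message waits in an M/M/1 queue before service:
    W_q = lambda / (mu * (mu - lambda)). Unbounded if the queue is unstable."""
    if arrival_rate >= service_rate:
        return float("inf")
    return arrival_rate / (service_rate * (service_rate - arrival_rate))

def routing_cost(t_in, t_m, t_out, incoming_rates, result_rate):
    """Cost of a (part of a) composite subscription at one broker.
    incoming_rates: publication rates arriving from each of the n neighbors.
    t_out: average output-queue time towards each of the m neighbors.
    result_rate: rate of matching results produced at this broker."""
    inbound = sum((t_in + t_m) * rate for rate in incoming_rates)
    outbound = sum(t * result_rate for t in t_out)
    return inbound + outbound

def total_routing_cost(local_cost, neighbor_costs):
    """Cost at the splitting broker plus the costs at its neighbors."""
    return local_cost + sum(neighbor_costs)

# A neighbor currently receives 40 msg/s and can serve 100 msg/s; splitting
# the composite subscription there would attract 20 msg/s of extra traffic.
t_in_neighbor = mm1_queue_wait(40 + 20, 100)
neighbor_cost = routing_cost(t_in_neighbor, 0.005, [0.002], [20], 5)
local_cost = routing_cost(0.010, 0.005, [0.002, 0.003], [5], 2)
print(t_in_neighbor)
print(total_routing_cost(local_cost, [neighbor_cost]))
```

A broker could evaluate such a total for each candidate split and keep the cheapest one, which is the kind of local decision the cost model is meant to support.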
1.3 Historic Data Access
Figure 6. Historic data access architecture
PADRES allows subscribers to access both future and historic data with a single interface as de-
scribed in Section 1.2.1. The system architecture, shown in Figure 6, consists of a traditional dis-
tributed publish/subscribe overlay network of brokers and clients. Subscriptions for future publi-
cations are routed and handled as usual (Opyrchal, 2000; Carzaniga, 2001). To support historic
subscriptions, databases are attached to a subset of brokers as shown in Figure 6. The databases
are provisioned to sink a specified subset of publications, and to later respond to queries. The set
of possible publications, as determined by the advertisements in the system, is partitioned and
these partitions assigned to the databases. A partition may be assigned to multiple databases to
achieve replication, and multiple partitions may be assigned to the same database if database
consolidation is desired. Partition assignments can be modified at any time, and replicas will
synchronize among themselves. The only constraint is that each partition be assigned to at least
one database so no publications are lost. Partitioning algorithms as well as partition selection
and assignment policies are described in (Li, 2008). Subscriptions can be atomic, expressing
constraints on single publications, or composite, expressing correlation constraints over multiple
publications. We describe their routing under the extended publish/subscribe model.
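The partition placement rules above can be illustrated with a minimal sketch. It assumes partitions and databases are identified by plain strings and that the assignment is held as a mapping from database to the set of partitions it stores; the function names are hypothetical and are not part of PADRES.

```python
from typing import Dict, Set


def unassigned_partitions(partitions: Set[str],
                          assignment: Dict[str, Set[str]]) -> Set[str]:
    """Return the partitions that no database stores.

    A non-empty result violates the constraint that every partition
    is assigned to at least one database.
    """
    covered: Set[str] = set()
    for stored in assignment.values():
        covered |= stored
    return partitions - covered


def replicas(partition: str, assignment: Dict[str, Set[str]]) -> Set[str]:
    """Return every database holding the partition (more than one means replication)."""
    return {db for db, stored in assignment.items() if partition in stored}


# Hypothetical layout: P2 is replicated, db1 consolidates P1 and P2, P3 is missing.
layout = {"db1": {"P1", "P2"}, "db2": {"P2"}}
missing = unassigned_partitions({"P1", "P2", "P3"}, layout)
```

In this layout the check reports P3 as unassigned, so publications falling into P3 would be lost until some database is given that partition.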
1.3.1 Atomic Subscription Routing
When a broker receives an atomic subscription, it checks the start_time and end_time attributes.
A future subscription is forwarded to potential publishers using standard publish/subscribe
routing (Opyrchal, 2000; Carzaniga, 2001). A hybrid subscription is split into future and
historic parts, with the historic subscription routed to potential databases as described below.
For historic subscriptions, a broker determines the set of advertisements that overlap the
subscription, and for each partition, selects the database with the minimum routing delay. The
subscription is forwarded to only one database per partition to avoid duplicate results. When a
database receives a historic subscription, it evaluates it as a database query, and publishes the
results as publications to be routed back to the subscriber. Upon receiving an END publication
after the final result, the subscriber’s host broker unsubscribes the historic subscription. This
broker also unsubscribes future subscriptions whose end_time has expired.
1.3.2 Adaptive Routing
Topology-based composite subscription routing (Li, 2005) evaluates correlation constraints in the
network where the paths from the publishers to the subscriber merge. If a composite subscription
correlates a historic data source and a publisher, where the former produces more publications,
correlation detection would save network traffic if moved closer to the database, thereby filtering
potentially unnecessary historic publications earlier in the network. Based on this observation, the
DCSR algorithm we discussed in Section 1.2.3 can be applied here. The WHERE clause constraints
of a composite subscription can be represented as a tree where the internal nodes are operators,
leaf nodes are atomic subscriptions, and the root node represents the composite subscription.
A composite subscription example is represented as the tree in Figure 6. The recursive
DCSR algorithm (Li, 2008) computes the destination of each node in the tree to determine how to
split and route the subscription. The algorithm traverses the tree as follows: if the root of the tree
is a leaf, that is, an atomic subscription, the atomic subscription's next hop is assigned to the root.
Otherwise, the algorithm processes the left and right children's destination trees separately. If
the two children have the same destination, the root node is assigned this destination, and the
composite subscription is routed to the next hop as a whole. If the children have different
destinations, the algorithm estimates the total routing cost for potential candidate brokers, and the
minimum cost destination is assigned to the root. If the root’s destination is the current broker, the
composite subscription is split here, and the current broker is the join point and performs the
composite detection. The algorithm assigns destinations to the tree nodes bottom up.
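The bottom-up destination assignment can be sketched as follows. This is an illustration under stated assumptions, not the DCSR implementation from (Li, 2008): the tree is taken to be binary, the candidate brokers are assumed to be the current broker plus the two children's destinations, and `cost` is a hypothetical stand-in for the cost model of Section 1.2.3.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """Node of the WHERE-clause tree; a leaf is an atomic subscription."""
    next_hop: Optional[str] = None     # known for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    destination: Optional[str] = None  # filled in by assign_destinations


def assign_destinations(node: Node, current_broker: str,
                        cost: Callable[[str, Node], float]) -> str:
    """Assign a destination to every node, children before parents."""
    if node.left is None and node.right is None:
        # Leaf: the atomic subscription's next hop is its destination.
        node.destination = node.next_hop
        return node.destination
    left_dest = assign_destinations(node.left, current_broker, cost)
    right_dest = assign_destinations(node.right, current_broker, cost)
    if left_dest == right_dest:
        # Both parts travel the same way: forward the subtree as a whole.
        node.destination = left_dest
    else:
        # Assumed candidate set; the cheapest candidate becomes the destination.
        candidates = [current_broker, left_dest, right_dest]
        node.destination = min(candidates, key=lambda b: cost(b, node))
    return node.destination


# Hypothetical per-broker costs standing in for the real cost model.
costs = {"B1": 5.0, "B2": 2.0, "B3": 7.0}
root = Node(left=Node(next_hop="B2"), right=Node(next_hop="B3"))
join = assign_destinations(root, "B1", lambda broker, _node: costs[broker])
```

If the returned destination equals the current broker, the subscription would be split there and that broker would act as the join point; otherwise the subtree is forwarded toward the chosen destination.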
When network conditions change, join points may no longer be optimal and should be recom-
puted. A join point broker periodically evaluates the cost model, and upon finding a broker able
to perform detection cheaper than itself, initiates a join point movement. The state transfer from
the original join point to the new one includes routing path information and partial matching
states. Each part of the composite subscription should be routed to the proper destinations so
routing information is consistent. Publications that partially match composite subscriptions
stored at the join point broker must be delivered to the new join point. For a more detailed
description of the historic data access function, please refer to our technical report (Li, 2008).
1.4 Load Balancing
In a distributed publish/subscribe system, geographically dispersed brokers may suffer from un-
even load distributions due to different population densities, interests, and usage patterns of
end-users. A typical scenario is an enterprise-scale deployment consisting of a dozen brokers lo-
cated at different world-wide branches of an international corporation, where the broker network
provides a communication service for hundreds of publishers and thousands of subscribers. It is
conceivable that the concentration of business operations and departments, and thus
publish/subscribe clients and messages, is orders of magnitude higher at the corporate headquarters
than at the subsidiary locations. Such hotspots at the headquarters can overload the broker there in
two ways. First, the broker can be overloaded if the incoming message rate exceeds the maximum
processing or matching rate of the broker’s matching engine. Because the
matching rate is inversely proportional to the number of subscriptions in the matching engine, this
effect is exacerbated if the number of subscribers is large (Fabret, 2001). Second, overload can
also occur if the output transmission rate exceeds the total available output bandwidth. In both
cases, messages waiting to be processed accumulate in the broker’s input queues, resulting in
increasingly higher processing and delivery delays. Worse yet, the broker may crash when it runs
out of memory from queueing too many messages.
The matching rate and both the incoming and outgoing message rates determine the load of a
broker. In turn, these factors depend on the number and nature of subscriptions that the broker
services. Thus, load balancing is possible by offloading specific subscribers from more heavily
loaded to less heavily loaded brokers. The PADRES system supports this capability using load
estimation methodologies, a load balancing framework, and three offload algorithms (Cheung, 2006).
Figure 7. PEER architecture
The load balancing framework consists of the PADRES Efficient Event Routing (PEER) ar-
chitecture, a distributed load exchange protocol called PADRES Information Exchange (PIE), and
detection and mediation mechanisms at the local and global load balancing tiers. The PEER ar-
chitecture organizes brokers into a hierarchical structure as shown in Figure 7. Brokers with more
than one neighboring broker are referred to as cluster-head brokers, while brokers with only one
neighbor are referred to as edge brokers. A cluster-head broker with its connected set of edge
brokers, if any, forms a cluster. Publishers are serviced by cluster-head brokers, while subscribers
are serviced by edge brokers. Load balancing is possible by moving subscribers among edge brokers
of the same or a different cluster. With PIE, edge brokers within a cluster exchange load
information by publishing and subscribing to PIE messages of a certain cluster ID. For example, a
subscription to PIE messages from cluster C01 is [class, eq, ‘LOCAL_PIE’],
[cluster, eq, ‘C01’]. The detector invokes load balancing if it detects overload or if the
load of the local broker exceeds that of another broker by a threshold. Load is characterized by
three load metrics. First, the input utilization ratio ($I_r$) captures the broker’s input load and is
calculated as:
$$I_r = \frac{i_r}{m_r}$$
where $i_r$ is the rate of incoming publications and $m_r$ is the maximum message match rate,
calculated by taking the inverse of the matching delay. Second, the output utilization ratio ($O_r$)
captures the output load and is calculated as:
$$O_r = \frac{t_{busy}}{t_{window}} \times \frac{b_{rx}}{b_{tx}}$$
where $t_{window}$ is the monitoring time window, $t_{busy}$ is the amount of time spent sending
messages within $t_{window}$, $b_{rx}$ represents the messages (in bytes) put into the output queue
during $t_{window}$, and $b_{tx}$ represents the messages (in bytes) removed from the output queue and
sent successfully during $t_{window}$. A utilization value greater than 1.0 indicates overload. Third,
the matching delay captures the average amount of time to match a publication message.
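As a worked illustration of the three load metrics, the sketch below assumes that the input utilization ratio is the incoming rate divided by the maximum match rate, and that the output utilization ratio is the busy fraction of the window scaled by the ratio of bytes queued to bytes sent. The function names and the threshold comparison in `should_balance` are hypothetical; the chapter does not specify how the detector compares two brokers' loads.

```python
def input_utilization(incoming_rate: float, matching_delay: float) -> float:
    """Incoming publication rate divided by the maximum match rate.

    The maximum match rate is the inverse of the matching delay (seconds).
    """
    max_match_rate = 1.0 / matching_delay
    return incoming_rate / max_match_rate


def output_utilization(t_busy: float, t_window: float,
                       bytes_queued: float, bytes_sent: float) -> float:
    """Busy fraction of the window scaled by queued-to-sent bytes."""
    if bytes_sent == 0:
        # Nothing left the queue: saturated if anything was queued.
        return float("inf") if bytes_queued > 0 else 0.0
    return (t_busy / t_window) * (bytes_queued / bytes_sent)


def should_balance(local_load: float, remote_load: float,
                   threshold: float) -> bool:
    """Assumed detector rule: overload, or a load gap above the threshold."""
    return local_load > 1.0 or (local_load - remote_load) > threshold


# 50 publications/s against a 10 ms matching delay (100 matches/s at most).
i_r = input_utilization(50.0, 0.01)
# Link busy for 2 s of a 4 s window; 300 bytes queued, 200 bytes sent.
o_r = output_utilization(2.0, 4.0, 300.0, 200.0)
```

With these numbers the input ratio is about 0.5 and the output ratio is 0.75, so neither metric alone signals overload; load balancing would only be triggered by a sufficiently large gap to another broker.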
The core of the load estimation is the PADRES Real-time Event to Subscription Spectrum
(PRESS), which uses an efficient bit vector approach to estimate the input and output publication
loads of all subscriptions at the local broker. Together with locally subscribing to the
load-accepting broker’s covering subscription set, PRESS can estimate the amount of input and
output load introduced at the load-accepting broker for all subscriptions at the offloading broker.
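The chapter does not describe the PRESS data structure beyond calling it a bit vector approach. One plausible reading, offered purely as an assumption and not as the actual PRESS design, is that each subscription keeps one bit per recently observed publication, set when that publication matched. The sketch below uses Python integers as bit vectors; all names are hypothetical.

```python
from typing import Iterable


def matched_rate(bits: int, window_seconds: float) -> float:
    """Publications per second matched by one subscription.

    Bit i of `bits` is 1 if the i-th publication in the window matched.
    """
    return bin(bits).count("1") / window_seconds


def combined_rate(bit_vectors: Iterable[int], window_seconds: float) -> float:
    """Rate of distinct publications matched by any subscription in the set.

    OR-ing the vectors counts a publication once even if several
    subscriptions match it, which is what an input load estimate needs.
    """
    union = 0
    for bits in bit_vectors:
        union |= bits
    return bin(union).count("1") / window_seconds


# Two subscriptions observed over a 4-second window of four publications.
sub_a, sub_b = 0b1100, 0b0110
rate_a = matched_rate(sub_a, 4.0)
rate_both = combined_rate([sub_a, sub_b], 4.0)
```

Under this reading, moving both subscriptions to another broker would add 0.75 publications per second of input load there and not the sum of the two individual rates, because one publication matches both.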
Each of the three offload algorithms is designed to load balance on one load metric of the
broker by selecting the appropriate subscribers to offload based on their profiled load
characteristics. At the same time, the subscriptions that each offload algorithm picks minimize the
impact on the other load metrics to avoid instability. For example, the match offload algorithm
offloads subscriptions with the minimal traffic, and the output offload algorithm first offloads the
highest traffic subscriptions that are covered by the load-accepting broker’s subscription(s).
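The two selection policies mentioned above can be sketched as orderings over profiled subscriptions. This is a simplified illustration: the `Profile` fields are hypothetical, and the ordering of uncovered subscriptions in the output case is an assumption, since the chapter only states which subscriptions are offloaded first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    """Hypothetical per-subscription profile used for offload decisions."""
    sub_id: str
    traffic: float   # estimated publication traffic of the subscription
    covered: bool    # covered by the load-accepting broker's subscriptions?


def match_offload_order(profiles: List[Profile]) -> List[Profile]:
    """Lowest-traffic subscriptions first, to relieve matching load while
    adding little input and output load at the load-accepting broker."""
    return sorted(profiles, key=lambda p: p.traffic)


def output_offload_order(profiles: List[Profile]) -> List[Profile]:
    """Covered subscriptions first, highest traffic first within each group."""
    return sorted(profiles, key=lambda p: (not p.covered, -p.traffic))


profiles = [
    Profile("s1", traffic=5.0, covered=False),
    Profile("s2", traffic=9.0, covered=True),
    Profile("s3", traffic=1.0, covered=True),
]
match_ids = [p.sub_id for p in match_offload_order(profiles)]
output_ids = [p.sub_id for p in output_offload_order(profiles)]
```

Covered subscriptions are preferred in the output case because the load-accepting broker already receives the publications they match, so moving them shifts output load without adding input load.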
This solution inherits all of the most desirable properties that make a load balancing algorithm
flexible. PIE contributes to the distributed and dynamic nature of the load balancing solution by
allowing each broker to invoke load balancing whenever necessary. Adaptiveness is provided by
the three offload algorithms, each of which load balances on a distinct performance metric. The local mediator
gives transparency to the subscribers throughout the offload process. Finally, load estimation
with PRESS allows the offload algorithms to account for broker and subscription heterogeneity.