
Scalable and Adaptable Distributed Stream Processing


Abstract

In this paper we introduce a new architectural design of a large scale distributed stream processing system. The system adopts a two layer architecture. Based on the locality and the natural administrative dependency of the processors, the processors are naturally partitioned into multiple independent entities. Processors within each entity compose the first layer while all the entities comprise the second one. Tightly coupled cooperation is employed within each entity due to the availability of central administration and their close locality. On the contrary, the entities cooperate with each other in a loosely coupled way. Challenges are identified in each layer of the architecture and techniques are proposed to solve them.
Yongluan Zhou
National University of Singapore
1. Introduction
Emerging applications, such as stock tickers, sports tickers, financial monitoring and network management, have fueled much research interest in designing stream processing engines [1, 5, 10]. These systems support complex continuous queries over push-based data streams. In applications such as financial market monitoring, which potentially have a large number of clients, we envision that many business entities will provide stream processing services for a huge number of clients. One example
of such kind of entity is Instead of only
providing stock quotes, it can evaluate user specified queries
and deliver the results in real time. Each such entity installs and runs its own stream processing engine. To enhance scalability, each entity can employ a cluster of processors. These processors are typically interconnected by a fast local network, are under a central administration, and are expected to employ the same processing model.
A more ambitious service is to integrate the processing power and capabilities of the different entities to provide a central access portal to all the clients. We assume that the participating entities cooperate based on business agreements and that they have an incentive to process the queries assigned to them. For example, an entity can be paid based on the length of time during which it executes the queries. We also assume there is a known global schema of the data. Each participating entity only needs to install a wrapper which is responsible for cooperating with other entities. Because different entities may employ different processing engines, they may have different data models, processing models and user interfaces. Moreover, the entities are not expected to surrender their administration. Furthermore, the entities are typically interconnected by a widely distributed network. This brings different requirements to the architectural design. Instead of being tightly coupled like the intra-entity processors, the inter-entity cooperation is preferred to be loosely coupled due to the lack of central administration.

Figure 1. Two layer network
Given the above observations, we can see that the network is naturally partitioned into two layers: the intra-entity layer and the inter-entity layer. This structure is illustrated in Figure 1, where the rectangles denote the entities and the circles denote the processors within the entities. The triangles represent the stream sources. This thesis aims at designing a scalable and adaptable distributed stream processing system under the above architecture. The properties of the system include the following:
• The inter-entity cooperation is loosely coupled. Each entity can operate its own processing engine and process the queries assigned to it without much interaction with other entities.
• The task of disseminating streams from the sources to the widely distributed entities is emphasized in our system. This differs from existing work, which mainly focuses on the query processing task. Adaptive data dissemination techniques are applied to efficiently disseminate the data streams from the sources to the widely distributed entities.
• Queries are adaptively allocated to the entities to minimize the communication cost of data stream dissemination and to share the load among the entities.
• Within each entity, we assume there is a central administration over the whole entity. The processors of an entity are expected to employ the same processing engine and hence share a uniform data model, processing model and user interface. As such, we propose that the processors cooperate in a more closely coupled way to enhance efficiency. The proposed techniques are platform independent and hence can be applied to any existing processing engine. Note that this architecture is only one of many possibilities. An entity is not forced to employ any specific architecture, because the entities cooperate in a loosely coupled manner.
• A platform independent intra-entity dynamic operator placement scheme is proposed. Queries assigned to an entity are dynamically partitioned into multiple fragments, which are distributed to the processors for processing.
• An adaptive distributed operator ordering architecture is proposed to adaptively change the ordering of operators that are distributed to multiple processors within an entity.
The rest of this paper is organized as follows. Section 2 discusses the degree of coupling in distributed stream processing systems. The architectural designs of the inter-entity layer and the intra-entity layer are presented in Section 3 and Section 4, respectively. Section 5 presents the related work and Section 6 concludes the paper.
2. The Degree of Coupling
Although distributed stream processing has attracted significant research attention recently, we are not aware of any explicit discussion of the degree of coupling for such a system. In general, the coupling of a distributed system is related to the degree of cooperation between the distributed nodes. We focus on the cooperation in the two main services provided by the system: stream transfer and query processing. The stream transfer service disseminates the data streams to the diverse processing nodes, while the query processing service evaluates the submitted queries over the streaming data.
Table 1 presents the different types of cooperation in these two services. Here, we use a coarse-grained categorization. Inside each category, the adoption of different techniques would also affect the degree of coupling. To provide the stream transfer service, the distributed nodes in a non-cooperated system can connect to the sources directly and rely on the sources to transfer data to them. Alternatively, to enhance the communication efficiency, the nodes can be organized into an overlay network and cooperate to transfer the streams among themselves. For the query processing service, the processing nodes can be isolated and process their queries independently. To address the problem of load imbalance, the nodes can cooperate to share their processing loads. We further subdivide this cooperation into two subcategories: load sharing at the query level and load sharing at the operator (or finer) level. In the former, load is distributed in units of whole queries, while in the latter it is distributed in units of operators (or partitions of an operator). Generally speaking, the system is more tightly coupled in the latter case, because it requires the processing nodes to adopt the same data model and processing model.

Stream transfer | Query processing: non-cooperated | Query processing: load sharing, query level | Query processing: load sharing, op or finer
non-cooperated | all single-site engines | unexplored | [9, 11, 6]
cooperated | [13] | Section 3 | [2, 7, 8], Section 4
Table 1. Different degrees of cooperation
The main advantage of a loosely coupled system is the ease of deployment. With looser coupling, changes at one node have less impact on the others. This results in many desirable system characteristics. For example, we can adopt different stream processing engines at different nodes and upgrade the engine at one node without synchronizing with the others. On the other hand, tighter cooperation can achieve higher efficiency, for example through the load sharing and cooperative stream transfer mentioned above. Hence, when designing the architecture of the system, the degree of coupling should be carefully chosen. We choose different degrees of coupling in different layers of our system. Details are given in the following sections.
3. Inter-entity Layer
In the inter-entity layer, because the entities are under independent administrations, loosely coupled cooperation is preferred. First, entities may join or leave at any time, which is beyond the system's control even in the absence of failures. Second, different processing engines might be installed in different entities because of different business decisions. These engines may have different data models and processing models. Thus, many tightly coupled cooperation techniques cannot be directly applied. For instance, dynamically distributing the operators of a query to multiple entities violates the loose coupling property. Moreover, this may not even be feasible: e.g., moving a window join operator from the STREAM system to a TelegraphCQ system is hard to implement, because it relies on a special data structure, the "synopsis", implemented in STREAM, which is manipulated not only by the join operator itself but also by other operators before or after the join operator. Furthermore, even if the entities use the same engine, one may upgrade its engine without informing the others. This would also cause problems unless forward and backward compatibility is implemented. As a practical solution, we choose to operate at the query level, i.e., a query is processed within a single entity. In the following subsections, we present how the entities cooperate to disseminate the data streams and distribute the user queries.
3.1. Data Stream Dissemination
Given that queries are allocated to the various entities, the streams have to be transferred to the entities to feed the queries. A straightforward approach is to let the source nodes feed the entities directly. However, relying solely on the sources to transfer data does not scale with the number of entities. To address this issue, we allow the entities to cooperate with each other in transferring data streams rather than relying only on the sources. The entities are organized into multiple hierarchical tree structures, an organization that is widely adopted in large scale data dissemination systems. Each parent entity in a tree is responsible for transferring the upstream data to its children. Hence each entity only needs to transfer streams to a limited number of entities. The shapes of these trees have a significant impact on the dissemination efficiency, which deserves further study. The second issue is how each entity forwards data to its children. The simplest approach is to forward all the received data to its children. This incurs a lot of unnecessary data transfer if a child does not require all the data. Due to the large volume and continuity of streaming data, minimizing the communication cost in the system is critical. We allow each entity to express its data requirements, which are used to perform early filtering and transformation at its ancestors. This raises the issues of how to represent the data interest of the different queries and how to efficiently compute the aggregate data interest of different queries.
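The early filtering idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that a data interest is a set of closed ranges over a single numeric attribute; the names `Entity`, `merge_intervals` and `aggregated_interest` are hypothetical and not part of the proposed system. Each entity aggregates the interests of its subtree, and a parent forwards a tuple to a child only if the tuple falls inside that child's aggregated interest.

```python
from dataclasses import dataclass, field

Interval = tuple[float, float]  # assumed interest model: closed range [low, high]


def merge_intervals(intervals: list[Interval]) -> list[Interval]:
    """Union of closed intervals, returned sorted and non-overlapping."""
    merged: list[Interval] = []
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


@dataclass
class Entity:
    name: str
    local_interest: list[Interval] = field(default_factory=list)
    children: list["Entity"] = field(default_factory=list)
    received: list[float] = field(default_factory=list)

    def aggregated_interest(self) -> list[Interval]:
        """Interest of this entity's own queries plus all of its descendants."""
        combined = list(self.local_interest)
        for child in self.children:
            combined.extend(child.aggregated_interest())
        return merge_intervals(combined)

    def deliver(self, value: float) -> None:
        """Record a tuple, then forward it only to children whose subtree wants it."""
        self.received.append(value)
        for child in self.children:
            # Recomputed on every tuple for brevity; a real system would cache this.
            if any(lo <= value <= hi for lo, hi in child.aggregated_interest()):
                child.deliver(value)


leaf_a = Entity("A", local_interest=[(0, 10)])
leaf_b = Entity("B", local_interest=[(50, 60)])
middle = Entity("M", local_interest=[(5, 20)], children=[leaf_a, leaf_b])
root = Entity("R", children=[middle])

for price in (3, 15, 30, 55):
    root.deliver(price)
```

In this toy tree the value 30 is dropped at the root because no entity below it is interested, and entity M receives the value 3 only because it has to relay it to its child A.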
3.2. Query Distribution
The above discussion assumes that the queries are al-
ready allocated to the entities. To distribute queries, we
should consider the following issues.
3.2.1. Coordinator Tree Construction. Queries in our application may arrive very quickly. We refer to them as query streams. The query allocation algorithm should scale to fast query streams. To address this problem, we adopt a hierarchical coordinator-based approach. The coordinators are organized as a hierarchical tree. Queries are distributed level by level down the tree. An internal coordinator distributes queries to its child coordinators. The queries are finally distributed to the entities by the leaf coordinators. A higher level coordinator distributes queries based on coarser information. In this paper, we adapt the distributed mechanism proposed in [3] to dynamically construct a hierarchical tree of coordinators. The mechanism tries to maintain a tree with the following properties: (1) the size of the cluster in each level is between $k$ and $3k - 1$ (except at the root and the second-to-root levels, where the size is less than $3k - 1$); (2) the parent of a cluster is its geographical center. The tree is constructed incrementally and dynamically.
1. When a new node requests to join the network, its request is first directed to the root coordinator. For each node that receives a join request, if it is a leaf coordinator, it adds the joining node as its child. Otherwise, it identifies the child coordinator closest to the joining node and directs the request to that child.
2. If a node leaves the network, a message is sent to its parent and children (if any). If it is a coordinator, a new parent is selected among its remaining children. Furthermore, heartbeat messages are sent periodically between the parent and children to detect any node failure.
3. If a coordinator finds that the number of its children exceeds $3k - 1$, it partitions the cluster into two clusters, each of size at least $\lfloor 3k/2 \rfloor$, such that the radii of the two clusters are minimized. The centers of the two clusters are selected as the two new parents.
4. If the number of children of a coordinator x falls below $k$, it sends a merge request to its closest sibling y. The sibling adds all the children of x to its own children.
5. Periodically, a new parent is selected if the current parent is no longer the center of its cluster.
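Two of the maintenance steps above, join routing (step 1) and cluster splitting (step 3), can be sketched as follows. This is only an illustration under stated assumptions: nodes are placed at hypothetical 2-D coordinates, distance is Euclidean, and `split_cluster` uses a simple farthest-pair heuristic rather than the exact radius-minimizing split of the mechanism in [3]. The names `Coordinator`, `center` and `split_cluster` are invented for this example.

```python
import math
from dataclasses import dataclass, field

Point = tuple[float, float]  # hypothetical 2-D network coordinates


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def center(points: list[Point]) -> Point:
    """Member whose largest distance to any other member is smallest."""
    return min(points, key=lambda c: max(dist(c, p) for p in points))


@dataclass
class Coordinator:
    position: Point
    children: list["Coordinator"] = field(default_factory=list)
    members: list[Point] = field(default_factory=list)  # used by leaf coordinators

    def join(self, node: Point) -> "Coordinator":
        """Route a join request down the tree; return the leaf that accepted it."""
        if not self.children:  # leaf coordinator: adopt the joining node
            self.members.append(node)
            return self
        closest = min(self.children, key=lambda c: dist(c.position, node))
        return closest.join(node)


def split_cluster(points: list[Point], k: int) -> tuple[list[Point], list[Point]]:
    """Heuristic split into two clusters of at least floor(3k/2) members each."""
    if len(points) < 3 * k:
        raise ValueError("split only when the cluster size exceeds 3k - 1")
    # Seed the two new clusters with the pair of members that are farthest apart.
    seed_a, seed_b = max(
        ((p, q) for p in points for q in points), key=lambda pq: dist(*pq)
    )
    # Members relatively closer to seed_a come first; cutting in the middle
    # guarantees both halves have at least floor(3k/2) members.
    ordered = sorted(points, key=lambda p: dist(p, seed_a) - dist(p, seed_b))
    cut = len(points) // 2
    return ordered[:cut], ordered[cut:]


west = Coordinator(position=(0.0, 0.0))
east = Coordinator(position=(10.0, 10.0))
root = Coordinator(position=(5.0, 5.0), children=[west, east])
accepted_by = root.join((1.0, 1.0))

crowded = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]  # 3k members, k = 2
left, right = split_cluster(crowded, k=2)
new_parents = (center(left), center(right))
```

The join request for the node at (1, 1) ends at the western leaf coordinator, and the crowded cluster is split into its two natural groups, whose centers become the candidate new parents.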
3.2.2. Load Distribution. To achieve maximum system utilization and minimum processing latencies, load balancing among the processors is desirable. Moreover, the data interest of different queries may significantly overlap. The transfer of the overlapping data can be shared among the queries to reduce the communication cost. This suggests that queries with similar data interest should be allocated to entities that are close to each other in the dissemination trees. To address both concerns, we model the query distribution problem as a graph partitioning problem. Each vertex in the query graph corresponds to a query and there is an edge between two vertices if their data interests overlap. A vertex is weighted by the workload incurred by the query and an edge is weighted by the estimated arrival rate (bytes/second) of the data of interest to both end vertices (queries). The problem is modeled as follows: given a graph G = (V, E) and the weights on the vertices and edges, dynamically partition V into k disjoint partitions such that each partition has a specified total vertex weight and the weighted edge cut, i.e. the total weight of the edges connecting vertices in different partitions, is minimized.
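The two quantities in this formulation, the per-partition load and the weighted edge cut, can be computed as in the following sketch. The query names and all weights are hypothetical values chosen for illustration, and `partition_loads` and `edge_cut` are illustrative helper names.

```python
def partition_loads(vertex_weight: dict, assignment: dict, k: int) -> list:
    """Total vertex weight (workload) placed on each of the k partitions."""
    loads = [0.0] * k
    for query, weight in vertex_weight.items():
        loads[assignment[query]] += weight
    return loads


def edge_cut(edge_weight: dict, assignment: dict) -> float:
    """Total weight of edges whose end vertices lie in different partitions."""
    return sum(
        weight
        for (u, v), weight in edge_weight.items()
        if assignment[u] != assignment[v]
    )


# Hypothetical query graph: vertex weight = workload,
# edge weight = rate (bytes/second) of data both queries are interested in.
vertex_weight = {"q1": 2.0, "q2": 2.0, "q3": 2.0, "q4": 2.0}
edge_weight = {
    ("q1", "q2"): 5.0,
    ("q2", "q3"): 1.0,
    ("q3", "q4"): 4.0,
    ("q1", "q4"): 2.0,
}

plan_a = {"q1": 0, "q2": 0, "q3": 1, "q4": 1}
plan_b = {"q1": 0, "q3": 0, "q2": 1, "q4": 1}

loads_a, cut_a = partition_loads(vertex_weight, plan_a, 2), edge_cut(edge_weight, plan_a)
loads_b, cut_b = partition_loads(vertex_weight, plan_b, 2), edge_cut(edge_weight, plan_b)
```

Both plans balance the load equally, but plan_a keeps the heavily overlapping queries together and therefore has a much smaller edge cut, which corresponds to less duplicate data being transferred.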
Figure 2 illustrates an example query graph that comprises 5 queries. The weights of the vertices and edges are drawn around them. If, for example, we have to allocate the queries to two entities, we can consider two plans: (a) allocate Q and Q to one entity and the rest to another; (b) allocate Q and Q to one entity and the others to another. Both plans can achieve load balance. However, plan (b) has better communication efficiency, as only 3 (bytes/second) of duplicate data are transferred to both nodes, while in plan (a) 8 (bytes/second) of duplicate data are transferred. Note that only considering allocating similar queries together may not result in good performance. As can be seen from the above example, Q and Q are not similar in their data interest but allocating them together results in a better scheme.

Figure 2. Query Graph
At runtime, the system is subject to changes, e.g., changes in the data interest of the queries and in the load they incur, and the arrival or departure of queries. Hence, the query graph should be adaptively repartitioned if a better solution can be found. One approach is to repartition the query graph from scratch. This may result in a near-optimal partitioning, but at the cost of a long decision making time and a large number of query movements. Another approach is to move some vertices from the overloaded partitions to underloaded partitions without considering the overlap in data interest. This approach can achieve a short query migration time and decision making time. However, communication efficiency might be unsatisfactory due to the large overlap of data interest between different partitions. Hence a desirable approach should be able to achieve a trade-off between these two extremes.
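One possible middle ground can be sketched as a greedy incremental repartitioning. This is an illustrative heuristic under stated assumptions, not the algorithm of the proposed system: queries are moved one at a time from the most loaded to the least loaded partition, and among the eligible queries the one whose migration increases the weighted edge cut the least is chosen. The function names and the example weights are hypothetical.

```python
def cut_delta(query, src, dst, edge_weight, assignment):
    """Change in weighted edge cut if `query` moves from partition src to dst."""
    delta = 0.0
    for (u, v), weight in edge_weight.items():
        if query not in (u, v):
            continue
        other = v if u == query else u
        if assignment[other] == src:
            delta += weight  # edge was internal to src and becomes cut
        elif assignment[other] == dst:
            delta -= weight  # edge was cut and becomes internal to dst
    return delta


def rebalance(vertex_weight, edge_weight, assignment, k, tolerance):
    """Greedy repartitioning; returns the new assignment and the moved queries."""
    assignment = dict(assignment)  # leave the caller's assignment untouched
    moves = []
    while True:
        loads = [0.0] * k
        for query, weight in vertex_weight.items():
            loads[assignment[query]] += weight
        src = max(range(k), key=loads.__getitem__)
        dst = min(range(k), key=loads.__getitem__)
        gap = loads[src] - loads[dst]
        if gap <= tolerance:
            return assignment, moves
        # Only moves that strictly reduce the imbalance are eligible, which
        # guarantees that the loop terminates.
        candidates = [
            q for q, part in assignment.items()
            if part == src and 0 < vertex_weight[q] < gap
        ]
        if not candidates:
            return assignment, moves
        best = min(
            candidates,
            key=lambda q: cut_delta(q, src, dst, edge_weight, assignment),
        )
        assignment[best] = dst
        moves.append(best)


vertex_weight = {"q1": 2.0, "q2": 2.0, "q3": 2.0, "q4": 2.0}
edge_weight = {
    ("q1", "q2"): 5.0,
    ("q2", "q3"): 5.0,
    ("q3", "q4"): 4.0,
    ("q1", "q4"): 1.0,
}
assignment = {"q1": 0, "q2": 0, "q3": 0, "q4": 1}  # partition 0 is overloaded

new_assignment, moves = rebalance(vertex_weight, edge_weight, assignment, k=2, tolerance=1.0)
```

Here a single migration restores the balance, and the query that shares the most data with the underloaded partition is the one selected, so the edge cut grows by only 1 byte/second instead of 4 or 10.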
4. Intra-entity Layer
Within each entity, processors are under a central administration and interconnected by a fast local network, which eases the employment of tightly coupled techniques. Because we assume entities may adopt different processing engines, the techniques proposed should be independent of the engines actually employed.
The first problem is how to receive the data fed by the upstream entity and forward them to the downstream entities. Relying on a single processor to receive all the streams is not scalable. Hence, we designate a processor as the delegation processor of each data stream that is sent to the entity. The delegation processor is responsible for routing the stream to other processors in the same entity as well as for transferring the stream to the child entities. Figure 3 shows an overview of the structure of an entity. Given a delegation scheme, there are two important query optimization issues: operator placement and operator ordering.
Figure 3. The structure of an entity
4.1. Operator Placement
To achieve better performance, we adopt a finer grained load distribution scheme than that used in the inter-entity layer. A single query is dynamically partitioned into multiple query fragments, which are distributed to the processors for processing so as to minimize the delay of query results. On closer inspection, the delay $d_k$ of a query includes the time used in evaluating the query (denoted as $p_k$), the time spent waiting for processing, and the time spent in transfer over the network connections. For a specific processing model and a particular operator ordering, we regard the evaluation time as the inherent complexity of the query. Since different queries may have different inherent complexities, the value of $d_k$ cannot correctly reflect the relative performance of different queries. For example, a query may experience a long delay simply because its evaluation time is long. However, in a multi-query and multi-user environment, we wish to compare the relative performance of different queries. Hence we propose a new metric, the Performance Ratio (PR), to incorporate the inherent complexity of a query. The PR of a query is defined as $PR_k = d_k / p_k$. Our objective is to minimize the worst relative performance among all the queries, i.e. $PR_{\max} = \max_{1 \le k \le Q} PR_k$, where $Q$ is the total number of queries.
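A small numeric sketch may help to show why a ratio is more informative than the raw delay. It assumes that the PR of a query is its delay divided by its evaluation time; the query names and the millisecond values are hypothetical.

```python
def performance_ratio(delay: float, evaluation_time: float) -> float:
    """Assumed PR definition: end-to-end delay relative to inherent evaluation time."""
    if evaluation_time <= 0:
        raise ValueError("evaluation time must be positive")
    return delay / evaluation_time


# Hypothetical measurements: (delay, evaluation time), both in milliseconds.
measurements = {
    "q1": (40.0, 10.0),  # long delay, but the query is inherently expensive
    "q2": (6.0, 1.0),    # short delay, but mostly waiting and transfer time
}

ratios = {name: performance_ratio(d, p) for name, (d, p) in measurements.items()}
worst_query = max(ratios, key=ratios.get)
worst_ratio = ratios[worst_query]
```

Although q1 has the longer absolute delay, q2 has the worse relative performance, so an optimizer minimizing the maximum PR would try to improve q2 first.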
We identified several heuristics to achieve the above goal. First, an arriving tuple has to wait for processing while a processor is busy, and the length of the busy period of a processor depends on the workload imposed upon it. Hence, to minimize the waiting time, load balancing among the processors is desirable. Second, the communication delay of a tuple is equal to the product of the transfer latency and the number of times the tuple is transferred over the network. This suggests the second heuristic: distribute the operators of a query to a restricted number of nodes so that the communication overhead of the query is limited. We call the maximum of this number the distribution limit of that query. Third, the bandwidth of each processor is limited and we have to avoid high communication traffic. Hence the third heuristic is to minimize the communication traffic subject to the first two heuristics.
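The first two heuristics can be combined in a simple greedy sketch. It is an illustration only, not the placement scheme of the system: it ignores the third heuristic (communication traffic) and the stream delegation constraints, and the function name, operator names and load values are hypothetical.

```python
def place_query(operators: dict, loads: dict, limit: int) -> dict:
    """Place the operators of one query on at most `limit` processors.

    operators: operator name -> estimated load of that operator
    loads:     processor name -> current load (updated in place)
    """
    # Heuristic 2: restrict the query to the `limit` least loaded processors.
    allowed = sorted(loads, key=loads.get)[:limit]
    placement = {}
    # Heuristic 1: balance load by placing the heaviest operator first, each
    # on the currently least loaded processor among the allowed ones.
    for op, cost in sorted(operators.items(), key=lambda item: -item[1]):
        target = min(allowed, key=loads.get)
        loads[target] += cost
        placement[op] = target
    return placement


processor_loads = {"p1": 5.0, "p2": 1.0, "p3": 2.0, "p4": 9.0}
query_operators = {"filter": 1.0, "join": 4.0, "aggregate": 2.0}

placement = place_query(query_operators, processor_loads, limit=2)
```

With a distribution limit of 2, the three operators end up on the two least loaded processors only, which bounds the number of network hops a tuple of this query can take.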
4.2. Operator Ordering
To address the operator ordering problem, distributed plan adaptation techniques [12] can be adapted to our context. In [12], we proposed a distributed query processing architecture which facilitates the adaptation of operator orderings. However, it relies on a specific processing engine: TelegraphCQ. In this paper, we extend it to a platform independent scheme. An Adaptation Module (AM) intercepts the input and output streams of the processing engine. Given an operator placement, the output stream of a processor may need to be processed by multiple processors, possibly in different orders. A set of candidate downstream processors is generated when a query fragment is (re)placed onto a processor. The AM continuously collects statistics about these candidate processors, such as their workload, the selectivities of the query fragments and the bandwidth usage. Based on these statistics, the AM adaptively chooses the immediate downstream processor for each output tuple.
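The routing decision of the AM can be illustrated with a minimal sketch. The selection rule used here is the classical rank metric for ordering commutative filters, (selectivity - 1) / cost, which is swapped in as a stand-in because the text does not spell out the actual policy; the class name, field names and statistics are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateStats:
    processor: str
    selectivity: float     # fraction of tuples the remote fragment passes on
    cost_per_tuple: float  # assumed estimate of processing plus queueing time (ms)


def rank(candidate: CandidateStats) -> float:
    """Classical filter-ordering rank: more selective and cheaper is lower."""
    return (candidate.selectivity - 1.0) / candidate.cost_per_tuple


def choose_downstream(candidates: list) -> str:
    """Pick the immediate downstream processor for the next output tuple."""
    return min(candidates, key=rank).processor


candidates = [
    CandidateStats("A", selectivity=0.8, cost_per_tuple=1.0),
    CandidateStats("B", selectivity=0.2, cost_per_tuple=2.0),
    CandidateStats("C", selectivity=0.5, cost_per_tuple=5.0),
]
first_choice = choose_downstream(candidates)

# Later statistics show that processor B has become overloaded.
candidates[1].cost_per_tuple = 10.0
second_choice = choose_downstream(candidates)
```

As the collected statistics change, the same tuple stream is redirected: B is preferred while it is cheap and highly selective, and A takes over once B's per-tuple cost rises.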
5. Related Work
We categorize existing work on stream processing systems based on the types of cooperation techniques adopted, which are listed in Table 1.
Early work on stream processing focused on building single-site stream processing engines [5, 4, 10]. Recently, attention has been paid to distributed stream processing. Flux [9] and Borealis [11] adopt similar system architectures and assumptions. A cluster of processors is employed to enhance the scalability of the processing engine. The network connections are assumed to be very fast and hence the communication cost is ignored. The techniques focus on sharing the load between the processors at a fine granularity. However, the streams are disseminated directly from the sources to the processors. Furthermore, the load distribution problems in these two pieces of work are essentially partitioning problems, as all the processors are identical in terms of the assignment of operator/stream partitions. In contrast, our intra-entity operator placement problem is an assignment problem (due to the stream delegation scheme), which requires different solutions.
Project Medusa [6] also proposed an architecture to integrate multiple administratively independent entities. In its load distribution algorithm, it is assumed that the entities employ the same type of processing engine so that the operators of a query can be distributed to multiple entities for processing. Furthermore, its architecture does not address the problem of transferring the data streams to a large number of processors and relies solely on the sources to disseminate the streams. The work in [2], [7] and [8] proposed techniques for in-network stream processing. Query operators are allocated along the path from the sources to the clients to enhance the communication efficiency. The major goal of the load sharing is to minimize the communication cost rather than to balance load. Furthermore, they assume an overlay mesh already exists and focus on finding where to place the query operators along the data transfer path. In contrast, [13] focuses on the problem of constructing an overlay structure to transfer streaming data to a large number of processing servers without considering load sharing.
6. Conclusion
In this paper, we reexamined the problem of designing a scalable distributed stream processing system. A new architectural design is presented. The system is composed of two layers: the inter-entity layer and the intra-entity layer. A few performance issues in the new system structure are identified, and initial solutions to these issues are presented. We are now working on integrating the system with various single-site processing engines and then plan to deploy it in a real network environment.
References
[1] D. J. Abadi et al. The design of the Borealis stream processing engine. In CIDR, 2005.
[2] Y. Ahmad and U. Çetintemel. Networked query processing for distributed stream-based applications. In VLDB, 2004.
[3] S. Banerjee et al. Scalable application layer multicast. In SIGCOMM, 2002.
[4] D. Carney et al. Monitoring streams: A new class of data management applications. In VLDB, 2002.
[5] S. Chandrasekaran et al. TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR, 2003.
[6] M. Cherniack et al. Scalable distributed stream processing. In CIDR, 2003.
[7] V. Kumar, B. F. Cooper, Z. Cai, G. Eisenhauer, and K. Schwan. Resource-aware distributed stream management using dynamic overlays. In ICDCS, 2005.
[8] P. Pietzuch et al. Network-aware operator placement for stream-processing systems. In ICDE, 2006.
[9] M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, and M. J. Franklin. Flux: An adaptive partitioning operator for continuous query systems. In ICDE, 2003.
[10] The STREAM Group. STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin.
[11] Y. Xing, S. B. Zdonik, and J.-H. Hwang. Dynamic load distribution in the Borealis stream processor. In ICDE, pages 791–802, 2005.
[12] Y. Zhou, B. C. Ooi, K.-L. Tan, and W. H. Tok. An adaptable distributed query processing architecture. Data Knowl. Eng., 53(3):283–309, 2005.
[13] Y. Zhou, B. C. Ooi, K.-L. Tan, and F. Yu. Adaptive reorganization of coherency-preserving dissemination tree for streaming data. In ICDE, 2006.
Participatory Sensing is a new computing paradigm that aims to turn personal mobile devices into advanced mobile sensing networks. For popular applications, we can expect a huge number of users to both contribute with sensor data and request information from the system. In such scenario, scalability of data processing becomes a major issue. In this paper, we present a system for supporting participatory sensing applications that leverages cluster or cloud infrastructures to provide a scalable data processing infrastructure. We propose and evaluate three strategies for data processing in this architecture.
The ability to process large numbers of continuous data streams in a near-real-time fashion has become a crucial prerequisite for many scientific and industrial use cases in recent years. While the individual data streams are usually trivial to process, their aggregated data volumes easily exceed the scalability of traditional stream processing systems. At the same time, massively-parallel data processing systems like MapReduce or Dryad currently enjoy a tremendous popularity for data-intensive applications and have proven to scale to large numbers of nodes. Many of these systems also provide streaming capabilities. However, unlike traditional stream processors, these systems have disregarded QoS requirements of prospective stream processing applications so far. In this paper we address this gap. First, we analyze common design principles of today’s parallel data processing frameworks and identify those principles that provide degrees of freedom in trading off the QoS goals latency and throughput. Second, we propose a highly distributed scheme which allows these frameworks to detect violations of user-defined QoS constraints and optimize the job execution without manual interaction. As a proof of concept, we implemented our approach for our massively-parallel data processing framework Nephele and evaluated its effectiveness through a comparison with Hadoop Online. For an example streaming application from the multimedia domain running on a cluster of 200 nodes, our approach improves the processing latency by a factor of at least 13 while preserving high data throughput when needed.
Full-text available
This experience paper summarizes the key lessons we learned throughout the design and implementation of the Aurora stream-processing engine. For the past 2 years, we have built five stream-based applications using Aurora. We first describe in detail these applications and their implementation in Aurora. We then reflect on the design of Aurora based on this experience. Finally, we discuss our initial ideas on a follow-on project, called Borealis, whose goal is to eliminate the limitations of Aurora as well as to address new key challenges and applications in the stream-processing domain.
Discovering icebergs in distributed streams of data is an important problem for a number of applications in networking and databases. While previous work has concentrated on measuring these icebergs in the non-distributed streaming case or in the non-streaming distributed case, we present a general framework that allows for distributed processing across multiple streams of data. We compare several of the state-of-the-art streaming algorithms for estimating local elephants in the individual streams. However, since an iceberg may be hidden by being distributed across many different streams, we add a sampling component to handle such cases. We provide a novel taxonomy of current sketches and perform a thorough analysis of the strengths and weaknesses of each scheme under various QoS metrics, using both real and synthetic Internet trace data. We summarize their performance and discuss the implications for the future design of sketches. KeywordsIceberg–Sketch–Sampling
Scheduling a streaming application on high-performance computing (HPC) resources has to be sensitive to the computation and communication needs of each stage of the application dataflow graph to ensure QoS criteria such as latency and throughput. Since the grid has evolved out of traditional high-performance computing, the tools available for scheduling are more appropriate for batch-oriented applications. Our scheduler, called Streamline, considers the dynamic nature of the grid and runs periodically to adapt scheduling decisions using application requirements (per-stage computation and communication needs), application constraints (such as co-location of stages), and resource availability. The performance of Streamline is compared with an Optimal placement, Simulated Annealing (SA) approximations, and E-Condor, a streaming grid scheduler built using Condor. For kernels of streaming applications, we show that Streamline performs close to the Optimal and SA algorithms, and an order of magnitude better than E-Condor under non-uniform load conditions. We also conduct scalability studies showing the advantage of Streamline over other approaches. Furthermore, we implement Streamline on Planetlab as a grid service and demonstrate that it performs close to SA algorithm under dynamic resource conditions.
Full-text available
Traditionally, distributed query optimization techniques generate static query plans at compile time. However, the optimality of these plans depends on many parameters (such as the selectivities of operations, the transmission speeds and workloads of servers) that are not only difficult to estimate but are also often unpredictable and fluctuant at runtime. As the query processor cannot dynamically adjust the plans at runtime, the system performance is often less than satisfactory. In this paper, we introduce a new highly adaptive distributed query processing architecture. Our architecture can quickly detect fluctuations in selectivities of operations, as well as transmission speeds and workloads of servers, and accordingly change the operation order of a distributed query plan during execution. We have implemented a prototype based on the Telegraph system [Telegragraph project. Available from <>]. Our experimental study shows that our mechanism can adapt itself to the changes in the environment and hence approach to an optimal plan during execution.
Stream processing fits a large class of new applications for which conventional DBMSs fall short. Because many stream-oriented systems are inherently geographically distributed and because distribution offers scalable load management and higher availability, future stream processing systems will operate in a distributed fashion. They will run across the Internet on computers typically owned by multiple cooperating administrative domains. This paper describes the architectural challenges facing the design of large-scale distributed stream processing systems, and discusses novel approaches for addressing load management, high availability, and federated operation issues. We describe two stream processing systems, Aurora* and Medusa, which are being designed to explore complementary solutions to these challenges. This paper discusses the architectural issues facing the design of large-scale distributed stream processing systems. We begin in Section 2 with a brief description of our centralized stream processing system, Aurora (4). We then discuss two complementary efforts to extend Aurora to a distributed environment: Aurora* and Medusa. Aurora* assumes an environment in which all nodes fall under a single administrative domain. Medusa provides the infrastructure to support federated operation of nodes across administrative boundaries. After describing the architectures of these two systems in Section 3, we consider three design challenges common to both: infrastructures and protocols supporting communication amongst nodes (Section 4), load sharing in response to variable network conditions (Section 5), and high availability in the presence of failures (Section 6). We also discuss high-level policy specifications employed by the two systems in Section 7. For all of these issues, we believe that the push-based nature of stream-based applications not only raises new challenges but also offers the possibility of new domain-specific solutions.
Borealis is a second-generation distributed stream processing engine that is being developed at Brandeis University, Brown University, and MIT. Borealis inherits core stream processing functionality from Aurora [14] and distribution functionality from Medusa [51]. Borealis modifies and extends both systems in non-trivial and critical ways to provide advanced capabilities that are commonly required by newly-emerging stream processing applications. In this paper, we outline the basic design and functionality of Borealis. Through sample real-world applications, we motivate the need for dynamically revising query results and modifying query specifications. We then describe how Borealis addresses these challenges through an innovative set of features, including revision records, time travel, and control lines. Finally, we present a highly flexible and scalable QoS-based optimization model that operates across server and sensor networks and a new fault-tolerance model with flexible consistency-availability trade-offs.
This paper investigates the benefits of network awareness when processing queries in widely-distributed environments such as the Internet. We present algorithms that leverage knowledge of network characteristics (e.g., topology, bandwidth, etc.) when deciding on the network locations where the query operators are executed. Using a detailed emulation study based on realistic network models, we analyse and experimentally evaluate the proposed approaches for distributed stream processing. Our results quantify the significant benefits of the network-aware approaches and reveal the fundamental trade-off between bandwidth efficiency and result latency that arises in networked query processing.
We consider distributed applications that continuously stream data across the network, where data needs to be aggregated and processed to produce a 'useful' stream of updates. Centralized approaches to performing data aggregation suffer from high communication overheads, lack of scalability, and unpredictably high processing workloads at central servers. This paper describes a scalable and efficient solution to distributed stream management based on (1) resource-awareness, which is middleware-level knowledge of underlying network and processing resources; (2) overlay-based in-network data aggregation; and (3) high-level programming constructs to describe data-flow graphs for composing useful streams. Technical contributions include a novel algorithm based on resource-aware network partitioning to support dynamic deployment of data-flow graph components across the network, where efficiency of the deployed overlay is maintained by making use of partition-level resource-awareness. Contributions also include efficient middleware-based support for component deployment, utilizing runtime code generation rather than interpretation techniques, thereby addressing both high performance and resource-constrained applications. Finally, simulation experiments and benchmarks attained with actual operational data corroborate this paper's claims.
This paper introduces monitoring applications, which we will show differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS that is currently under construction at Brandeis University, Brown University, and M.I.T. We describe the basic system architecture, a stream-oriented set of operators, optimization tactics, and support for real-time operation.
The education industry has a very poor record of productivity gains. In this brief article, I outline some of the ways the teaching of a college course in database systems could be made more efficient, and staff time used more productively. These ideas ...
The long-running nature of continuous queries poses new scalability challenges for dataflow processing. CQ systems execute pipelined dataflows that may be shared across multiple queries. The scalability of these dataflows is limited by their constituent, stateful operators - e.g. windowed joins or grouping operators. To scale such operators, a natural solution is to partition them across a shared-nothing platform. But in the CQ context, traditional, static techniques for partitioned parallelism can exhibit detrimental imbalances as workload and runtime conditions evolve. Long-running CQ dataflows must continue to function robustly in the face of these imbalances. To address this challenge, we introduce a dataflow operator called flux that encapsulates adaptive state partitioning and dataflow routing. Flux is placed between producer-consumer stages in a dataflow pipeline to repartition stateful operators while the pipeline is still executing. We present the flux architecture, along with repartitioning policies that can be used for CQ operators under shifting processing and memory loads. We show that the flux mechanism and these policies can provide several factors improvement in throughput and orders of magnitude improvement in average latency over the static case.
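The repartitioning idea in the abstract above can be illustrated with a minimal Python sketch. It is not the Flux operator's actual design or API: the class `BucketRouter`, its methods, the use of fixed hash buckets, and the "move the bucket that best narrows the load gap" policy are all assumptions made for illustration. State lives in one process here, so "moving" a bucket only reassigns its owner; a real system would ship the state between nodes.

```python
from collections import defaultdict


class BucketRouter:
    """Hypothetical router: keys hash to buckets, buckets are owned by consumers."""

    def __init__(self, num_consumers, num_buckets=16):
        self.num_consumers = num_consumers
        self.num_buckets = num_buckets
        # Initial static partitioning: buckets assigned round-robin.
        self.owner = [b % num_consumers for b in range(num_buckets)]
        # Per-bucket operator state (here: a running count per key).
        self.state = [defaultdict(int) for _ in range(num_buckets)]
        # Tuples seen per bucket, used as the load estimate.
        self.bucket_load = [0] * num_buckets

    def route(self, key):
        """Update the bucket's state and return the consumer that owns it."""
        b = hash(key) % self.num_buckets
        self.state[b][key] += 1
        self.bucket_load[b] += 1
        return self.owner[b]

    def consumer_loads(self):
        loads = [0] * self.num_consumers
        for b, c in enumerate(self.owner):
            loads[c] += self.bucket_load[b]
        return loads

    def rebalance(self):
        """Move one bucket from the most to the least loaded consumer.

        Returns (bucket, src, dst), or None if no move narrows the gap.
        """
        loads = self.consumer_loads()
        src = max(range(self.num_consumers), key=loads.__getitem__)
        dst = min(range(self.num_consumers), key=loads.__getitem__)
        gap = loads[src] - loads[dst]
        # Only buckets with 0 < load < gap strictly shrink the imbalance.
        candidates = [
            b for b in range(self.num_buckets)
            if self.owner[b] == src and 0 < self.bucket_load[b] < gap
        ]
        if not candidates:
            return None
        # After the move the gap becomes |gap - 2 * load|; pick the smallest.
        best = min(candidates, key=lambda b: abs(gap - 2 * self.bucket_load[b]))
        self.owner[best] = dst  # state stays keyed by bucket, so it moves too
        return best, src, dst
```

Calling `rebalance()` periodically while tuples keep flowing through `route()` mimics, in a very simplified way, repartitioning a stateful operator while the pipeline is still executing. Using many more buckets than consumers is the assumed design choice that lets load shift in small steps.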