Data Stream Processing (DSP) has emerged over the years as the reference paradigm for the analysis of continuous and fast information flows, which often must be processed with low latency to extract insights and knowledge from raw data. Dealing with unbounded data flows, DSP applications are typically long-running and are thus likely to experience varying workloads and working conditions over time. To keep a consistent service level in the face of such variability, a lot of effort has been spent studying strategies for run-time adaptation of DSP systems and applications. In this survey, we review the most relevant approaches from the literature, presenting a taxonomy to characterize the state of the art along several key dimensions. Our analysis allows us to identify current research trends as well as open challenges that will motivate further investigations in this field.

Stream processing applications handle unbounded and continuous flows of data items which are generated from multiple geographically distributed sources. Two approaches are commonly used for processing: Cloud-based analytics and Edge analytics. The first one routes the whole data set to the Cloud, incurring significant costs and late results from the high latency networks that are traversed. The latter can give timely results but forces users to manually define which part of the computation should be executed on Edge and to interconnect it with the remaining part executed in the Cloud, leading to sub-optimal placements. In this paper, we introduce Planner, a middleware for uniform and transparent stream processing across Edge and Cloud. Planner automatically selects which parts of the execution graph will be executed at the Edge in order to minimize the network cost. Real-world micro-benchmarks show that Planner reduces the network usage by 40% and the makespan (end-to-end processing time) by 15% compared to state-of-the-art.

Internet of Things (IoT) domains generate large volumes of high velocity event streams from sensors, which need to be analyzed with low latency to drive intelligent decisions. Big Data platforms for Complex Event Processing (CEP) enable such analytics. Traditionally, limited analytics are performed on the gateway edge device, or comprehensive analytics are performed on Cloud Virtual Machines (VM) across all sensor streams. Leveraging the growing prevalence of captive edge resources in combination with Cloud VMs can offer better performance, flexibility and monetary costs. Here, we propose an approach to schedule an event analytics application composed as a Directed Acyclic Graph (DAG) of CEP queries across a collection of edge and Cloud resources. The goal of this optimization problem is to map the queries to the resources such that the end-to-end latency for the DAG is minimized, while also ensuring that a resource's compute and energy capacity are not saturated. We propose a brute-force optimal algorithm (BF) and a Genetic Algorithm (GA) meta-heuristic to solve this problem. We perform comprehensive real-world benchmarks on the compute, network and energy capacity of edge and Cloud resources for over 17 CEP query configurations. These results are used to define a realistic simulation study that validates the BF and GA solutions for a wide variety of over 45 DAGs. Our results show that the GA approach comes within 99% of the optimal BF solution, maps DAGs with 4 - 50 queries within 0.001 - 25 secs (unlike BF which takes hours for > 12 queries), and in fewer than 10% of the experiments is unable to offer a feasible solution.

Many applications in domains such as telecommunications, network security, and large-scale sensor networks require online processing of continuous data flows. They produce very high loads that require aggregating the processing capacity of many nodes. Current Stream Processing Engines do not scale with the input load due to single-node bottlenecks. Additionally, they are based on static configurations that lead to either under- or overprovisioning. In this paper, we present StreamCloud, a scalable and elastic stream processing engine for processing large data stream volumes. StreamCloud uses a novel parallelization technique that splits queries into subqueries that are allocated to independent sets of nodes in a way that minimizes the distribution overhead. Its elastic protocols exhibit low intrusiveness, enabling effective adjustment of resources to the incoming load. Elasticity is combined with dynamic load balancing to minimize the computational resources used. The paper presents the system design, implementation, and a thorough evaluation of the scalability and elasticity of the fully implemented system.

Many systems for big data analytics employ a data flow abstraction to define parallel data processing tasks. In this setting, custom operations expressed as user-defined functions are very common. We address the problem of performing data flow optimization at this level of abstraction, where the semantics of operators are not known. Traditionally, query optimization is applied to queries with known algebraic semantics. In this work, we find that a handful of properties, rather than a full algebraic specification, suffice to establish reordering conditions for data processing operators. We show that these properties can be accurately estimated for black box operators by statically analyzing the general-purpose code of their user-defined functions.
We design and implement an optimizer for parallel data flows that does not assume knowledge of semantics or algebraic properties of operators. Our evaluation confirms that the optimizer can apply common rewritings such as selection reordering, bushy join-order enumeration, and limited forms of aggregation push-down, hence yielding similar rewriting power as modern relational DBMS optimizers. Moreover, it can optimize the operator order of nonrelational data flows, a unique feature among today's systems.

Many emerging on-line data analysis applications require applying continuous query operations such as correlation, aggregation, and filtering to data streams in real time. Distributed stream processing systems allow in-network stream processing to achieve better scalability and quality-of-service (QoS) provision. In this paper we present Synergy, a distributed stream processing middleware that provides sharing-aware component composition. Synergy enables efficient reuse of both data streams and processing components, while composing distributed stream processing applications with QoS demands. Synergy provides a set of fully distributed algorithms to discover and evaluate the reusability of available data streams and processing components when instantiating new stream applications. For QoS provision, Synergy performs QoS impact projection to examine whether the shared processing can cause QoS violations on currently running applications. We have implemented a prototype of the Synergy middleware and evaluated its performance on both PlanetLab and simulation testbeds. The experimental results show that Synergy can achieve much better resource utilization and QoS provision than previously proposed schemes, by judiciously sharing streams and processing components during application composition.

The graph partitioning problem is that of dividing the vertices of a graph into sets of specified sizes such that few edges cross between sets. This NP-complete problem arises in many important scientific and engineering problems. Prominent examples include the decomposition of data structures for parallel computation, the placement of circuit elements and the ordering of sparse matrix computations. We present a multilevel algorithm for graph partitioning in which the graph is approximated by a sequence of increasingly smaller graphs. The smallest graph is then partitioned using a spectral method, and this partition is propagated back through the hierarchy of graphs. A variant of the Kernighan-Lin algorithm is applied periodically to refine the partition. The entire algorithm can be implemented to execute in time proportional to the size of the original graph. Experiments indicate that, relative to other advanced methods, the multilevel algorithm produces high quality partitions at low cost.
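The step of approximating a graph by a smaller one can be illustrated with a single coarsening pass. The following is a minimal sketch, not the authors' implementation: it assumes the smaller graph is obtained by contracting a greedy maximal matching and that heavier edges are visited first (the abstract does not specify either choice). The function name `coarsen` and the edge-dictionary representation are hypothetical.

```python
def coarsen(edges, n):
    """Contract a greedy maximal matching of an undirected weighted graph.

    edges: dict mapping (u, v) with u < v to a positive weight.
    n:     number of vertices, labelled 0..n-1.
    Returns (coarse_id, coarse_edges, coarse_n).
    """
    # Greedy maximal matching; visiting heavier edges first is an assumption.
    match = {}
    for (u, v) in sorted(edges, key=edges.get, reverse=True):
        if u not in match and v not in match:
            match[u] = v
            match[v] = u

    # Each matched pair (or unmatched vertex) becomes one coarse vertex.
    coarse_id = {}
    coarse_n = 0
    for v in range(n):
        if v in coarse_id:
            continue
        coarse_id[v] = coarse_n
        if v in match:
            coarse_id[match[v]] = coarse_n
        coarse_n += 1

    # Edges inside a contracted pair vanish; parallel edges are merged by
    # summing weights, so cut sizes are preserved on the coarse graph.
    coarse_edges = {}
    for (u, v), w in edges.items():
        cu, cv = coarse_id[u], coarse_id[v]
        if cu != cv:
            key = (min(cu, cv), max(cu, cv))
            coarse_edges[key] = coarse_edges.get(key, 0) + w
    return coarse_id, coarse_edges, coarse_n


# Path 0 - 1 - 2 - 3 with a light middle edge.
path = {(0, 1): 5, (1, 2): 1, (2, 3): 5}
coarse_id, coarse_edges, coarse_n = coarsen(path, 4)
```

Applying `coarsen` repeatedly would yield the sequence of increasingly smaller graphs; partitioning the smallest one and projecting the result back through `coarse_id` corresponds to the propagation step described above.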

Exploiting on-the-fly computation, Data Stream Processing (DSP) applications are widely used to process unbounded streams of data and extract valuable information in a near real-time fashion. As such, they enable the development of new intelligent and pervasive services that can improve our everyday life. To keep up with the high volume of daily produced data, the operators that compose a DSP application can be replicated and placed on multiple, possibly distributed, computing nodes, so as to process the incoming data flow in parallel. Moreover, to better exploit the abundance of diffused computational resources (e.g., Fog computing), recent trends investigate the possibility of decentralizing the DSP application placement.
In this paper, we present and evaluate a general formulation of the optimal DSP replication and placement (ODRP) as an integer linear programming problem, which takes into account the heterogeneity of application requirements and infrastructural resources. We integrate ODRP as a prototype scheduler in the Apache Storm DSP framework. Using the DEBS 2015 Grand Challenge as a benchmark application, we show the benefits of a joint optimization of operator replication and placement and how ODRP can optimize different QoS metrics, namely response time, internode traffic, cost, availability, and a combination thereof.

Streaming applications transform possibly infinite streams of data and often have both high throughput and low latency requirements. They are comprised of operator graphs that produce and consume data tuples. The streaming programming model naturally exposes task and pipeline parallelism, enabling it to exploit parallel systems of all kinds, including large clusters. However, it does not naturally expose data parallelism, which must instead be extracted from streaming applications. This paper presents a compiler and runtime system that automatically extract data parallelism for distributed stream processing. Our approach guarantees safety, even in the presence of stateful, selective, and user-defined operators. When constructing parallel regions, the compiler ensures safety by considering an operator's selectivity, state, partitioning, and dependencies on other operators in the graph. The distributed runtime system ensures that tuples always exit parallel regions in the same order they would without data parallelism, using the most efficient strategy as identified by the compiler. Our experiments using 100 cores across 14 machines show linear scalability for standard parallel regions, and near linear scalability when tuples are shuffled across parallel regions.
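The ordering guarantee at the exit of a parallel region can be illustrated with a small reorder buffer. This is a sketch under assumptions, not the paper's runtime: it assumes every tuple is tagged with a consecutive sequence number before entering the region, which is one common way to restore order. The class name `OrderedMerger` and its `push` method are hypothetical.

```python
import heapq


class OrderedMerger:
    """Releases items in sequence-number order, buffering early arrivals."""

    def __init__(self):
        self._next = 0    # sequence number expected next
        self._heap = []   # min-heap of (seq, item) pairs waiting for release

    def push(self, seq, item):
        """Accept one item from any replica; return the items now releasable."""
        heapq.heappush(self._heap, (seq, item))
        released = []
        # Drain the heap as long as its smallest entry is the expected one.
        while self._heap and self._heap[0][0] == self._next:
            released.append(heapq.heappop(self._heap)[1])
            self._next += 1
        return released


merger = OrderedMerger()
first = merger.push(1, "b")    # arrives early, must wait for seq 0
second = merger.push(0, "a")   # unblocks both buffered items
third = merger.push(2, "c")
```

The buffer grows only while a slower replica holds back an earlier tuple, which is why a strategy choice per region, as the abstract describes, can matter for latency.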

Various research communities have independently arrived at stream processing as a programming model for efficient and parallel computing. These communities include digital signal processing, databases, operating systems, and complex event processing. Since each community faces applications with challenging performance requirements, each of them has developed some of the same optimizations, but often with conflicting terminology and unstated assumptions. This article presents a survey of optimizations for stream processing. It is aimed both at users who need to understand and guide the system’s optimizer and at implementers who need to make engineering tradeoffs. To consolidate terminology, this article is organized as a catalog, in a style similar to catalogs of design patterns or refactorings. To make assumptions explicit and help understand tradeoffs, each optimization is presented with its safety constraints (when does it preserve correctness?) and a profitability experiment (when does it improve performance?). We hope that this survey will help future streaming system builders to stand on the shoulders of giants from not just their own community.

The k-cut problem is to find a partition of an edge weighted graph into k nonempty components, such that the total edge weight between components is minimum. This problem is NP-complete for an arbitrary k and its version involving fixing a vertex in each component is NP-hard even for k = 3. We present a polynomial algorithm for k fixed, that runs in $O(n^{k^2/2 - 3k/2 + 4}\, T(n, m))$ steps, where $T(n, m)$ is the running time required to find the minimum (s, t)-cut on a graph with n vertices and m edges.

Let G be an acyclic directed graph with weights and values assigned to its vertices. In the partially ordered knapsack problem the authors wish to find a maximum-valued subset of vertices whose total weight does not exceed a given knapsack capacity, and which contains every predecessor of a vertex if it contains the vertex itself. Consideration is given to the special case where G is an out-tree. Even though this special case is still NP-complete, the authors observe how dynamic programming techniques can be used to construct pseudopolynomial time optimization algorithms and fully polynomial time approximation schemes for it. In particular, it is shown that a nonstandard approach called "left-right" dynamic programming is better suited for this problem than the standard "bottom-up" approach, and the authors show how this "left-right" approach can also be adapted to the case of in-trees and to a related tree partitioning problem arising in integrated circuit design.

This paper describes an algorithm for partitioning a graph that is in the form of a tree. The algorithm has a growth in computation time and storage requirements that is directly proportional to the number of nodes in the tree. Several applications of the algorithm are briefly described. In particular it is shown that the tree partitioning problem frequently arises in the allocation of computer information to blocks of storage. Also, a heuristic method of partitioning a general graph based on this algorithm is suggested.

To use their pool of resources efficiently, distributed stream-processing systems push query operators to nodes within the network. Currently, these operators, ranging from simple filters to custom business logic, are placed manually at intermediate nodes along the transmission path to meet application-specific performance goals. Determining placement locations is challenging because network and node conditions change over time and because streams may interact with each other, opening venues for reuse and repositioning of operators. This paper describes a stream-based overlay network (SBON), a layer between a stream-processing system and the physical network that manages operator placement for stream-processing systems. Our design is based on a cost space, an abstract representation of the network and on-going streams, which permits decentralized, large-scale multi-query optimization decisions. We present an evaluation of the SBON approach through simulation, experiments on PlanetLab, and an integration with Borealis, an existing stream-processing engine. Our results show that an SBON consistently improves network utilization, provides low stream latency, and enables dynamic optimization at low engineering cost.

Given a rooted tree with a positive weight associated with every node, a linear algorithm is presented that partitions the tree into a minimum number of subtrees such that the sum of node weights in no subtree exceeds a prespecified value k.
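A greedy bottom-up strategy for this problem can be sketched as follows: process nodes from the leaves upward, and whenever a node together with the remaining weight of its children exceeds k, cut off the heaviest children first. This sketch is an illustration, not the paper's algorithm as published: it sorts the children, so it runs in O(n log n) rather than linear time, and the names `partition_tree`, `children`, and `weight` are hypothetical. It assumes no single node weighs more than k.

```python
def partition_tree(children, weight, root, k):
    """Return the nodes whose edge to their parent is cut.

    children: dict mapping a node to the list of its children.
    weight:   dict mapping a node to its positive weight (each <= k).
    The number of subtrees in the partition is len(result) + 1.
    """
    if any(w > k for w in weight.values()):
        raise ValueError("a single node exceeds the bound k")

    # Iterative traversal: every parent is recorded before its descendants,
    # so the reversed order visits children before their parent.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    residual = {}   # weight still attached to each node after its cuts
    cuts = []
    for v in reversed(order):
        kids = sorted(children.get(v, []),
                      key=lambda c: residual[c], reverse=True)
        total = weight[v] + sum(residual[c] for c in kids)
        for c in kids:              # detach the heaviest children first
            if total <= k:
                break
            total -= residual[c]
            cuts.append(c)
        residual[v] = total
    return cuts


tree = {0: [1, 2, 3]}
node_weight = {0: 2, 1: 3, 2: 4, 3: 1}
cuts = partition_tree(tree, node_weight, root=0, k=5)
```

In the example the root keeps only its lightest child, giving three subtrees with weights 4, 3, and 3, all within the bound of 5.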

In this paper we present and study a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, find a k-way partitioning of the smaller graph, and then uncoarsen and refine it to construct a k-way partitioning for the original graph. These algorithms compute a k-way partitioning of a graph G = (V, E) in O(|E|) time, which is faster by a factor of O(log k) than previously proposed multilevel recursive bisection algorithms. A key contribution of our work is in finding a high quality and computationally inexpensive refinement algorithm that can improve upon an initial k-way partitioning. We also study the effectiveness of the overall scheme for a variety of coarsening schemes. We present experimental results on a large number of graphs arising in various domains including finite element methods, linear programming, VLSI, and transportation. Our experiments show that this new scheme produces partitions that are of comparable or better q...

We consider the problem of partitioning the nodes of a graph with costs on its edges into subsets of given sizes so as to minimize the sum of the costs on all edges cut. This problem arises in several physical situations—for example, in assigning the components of electronic circuits to circuit boards to minimize the number of connections between boards.
This paper presents a heuristic method for partitioning arbitrary graphs which is both effective in finding optimal partitions, and fast enough to be practical in solving large problems.
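The objective described here, and the quantity a swap-based heuristic needs to evaluate, can be made concrete with a short sketch. It uses the gain formula commonly associated with pairwise-exchange partitioning, D(a) + D(b) - 2·c(a, b), where D(v) is the external minus internal edge cost of v; the abstract itself does not state this formula, and the function names and the edge-dictionary representation are hypothetical.

```python
def cut_cost(weights, side):
    """Total cost of edges whose endpoints lie in different subsets."""
    return sum(w for (u, v), w in weights.items() if side[u] != side[v])


def d_value(weights, side, v):
    """External minus internal edge cost of vertex v."""
    external = internal = 0
    for (a, b), w in weights.items():
        if v in (a, b):
            other = b if a == v else a
            if side[other] != side[v]:
                external += w
            else:
                internal += w
    return external - internal


def swap_gain(weights, side, a, b):
    """Reduction in cut cost if a and b (in different subsets) trade places."""
    c_ab = weights.get((min(a, b), max(a, b)), 0)
    return d_value(weights, side, a) + d_value(weights, side, b) - 2 * c_ab


# Edge costs keyed by (u, v) with u < v; vertices 0 and 2 start together.
weights = {(0, 1): 3, (2, 3): 3, (0, 2): 1, (1, 3): 1}
side = {0: 0, 2: 0, 1: 1, 3: 1}
gain = swap_gain(weights, side, 2, 1)
```

Because swapping one vertex from each subset keeps the subset sizes fixed, evaluating candidate pairs by this gain respects the "subsets of given sizes" constraint while lowering the cut cost.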

An iterative mincut heuristic for partitioning networks is presented whose worst case computation time, per pass, grows linearly with the size of the network. In practice, only a very small number of passes are typically needed, leading to a fast approximation algorithm for mincut partitioning. To deal with cells of various sizes, the algorithm progresses by moving one cell at a time between the blocks of the partition while maintaining a desired balance based on the size of the blocks rather than the number of cells per block. Efficient data structures are used to avoid unnecessary searching for the best cell to move and to minimize unnecessary updating of cells affected by each move.

In this paper we present a parallel formulation of a multilevel k-way graph partitioning algorithm. The multilevel k-way partitioning algorithm reduces the size of the graph by collapsing vertices and edges (coarsening phase), finds a k-way partition of the smaller graph, and then it constructs a k-way partition for the original graph by projecting and refining the partition to successively finer graphs (uncoarsening phase). A key innovative feature of our parallel formulation is that it utilizes graph coloring to effectively parallelize both the coarsening and the refinement during the uncoarsening phase. Our algorithm is able to achieve a high degree of concurrency, while maintaining the high quality partitions produced by the serial algorithm.

The graph partitioning problem is that of dividing the vertices of a graph into sets of specified sizes such that few edges cross between sets. This NP-complete problem arises in many important scientific and engineering problems. Prominent examples include the mapping of parallel computations, the laying out of circuits and the ordering of sparse matrix computations. We present a multilevel algorithm for graph partitioning in which the graph is approximated by a sequence of increasingly smaller graphs. The smallest graph is then partitioned using a spectral method, and this partition is propagated back through the hierarchy of graphs. A variant of the Kernighan-Lin algorithm is applied periodically to refine the partition. For important classes of graphs, the entire algorithm can be implemented to execute in time proportional to the size of the original graph. Experiments indicate that, relative to other advanced methods, the new multilevel algorithm produces high quality...

Recent Advances in Graph Partitioning

- Aydın Buluç
- Henning Meyerhenke
- Ilya Safro
- Peter Sanders
- Christian Schulz

Graph partitioning and constructing optimal decision trees are polynomial complete problems

- Laurent Hyafil
- Ronald L. Rivest