Query Optimization over Distributed Data Stream
ABSTRACT Recent research efforts in the fields of data stream processing show the increasing importance of processing data streams, e.g., in the e-science domain. Together with the advent of peer-to-peer (P2P) networks and grid computing, this leads to the necessity of developing new techniques for distributing and processing continuous queries over data streams in such networks. These systems often have to process multiple similar but different continuous aggregation queries simultaneously. Since executing each query separately can lead to significant scalability and performance problems, it is vital to share resources by exploiting similarities in the queries. The challenge is to identify overlapping computations that may not be obvious in the queries themselves. In this paper, we propose a novel algorithmic solution for problem of finding the minimum number of queries in such a distributed-streams setting, in order to optimize the communicate cost across the network. The experiment result show that our approach gives us as much as magnitude performance improvement over the no-share settings.