Conference Paper

Approximating Aggregation Queries in Peer-to-Peer Networks

UC Riverside
DOI: 10.1109/ICDE.2006.23
Conference: Proceedings of the 22nd International Conference on Data Engineering (ICDE '06), 2006
Source: DBLP

ABSTRACT Peer-to-peer databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. The problem of answering large-scale, ad-hoc analysis queries (e.g., aggregation queries) on these databases poses unique challenges. Exact solutions can be time-consuming and difficult to implement given the distributed and dynamic nature of peer-to-peer databases. In this paper we present novel sampling-based techniques for approximate answering of ad-hoc aggregation queries in such databases. Computing a high-quality random sample of the database efficiently in the P2P environment is complicated by several factors: the data is distributed (usually in uneven quantities) across many peers, within each peer the data is often highly correlated, and moreover, even collecting a random sample of the peers is difficult to accomplish. To counter these problems, we have developed an adaptive two-phase sampling approach, based on random walks of the P2P graph as well as block-level sampling techniques. We present extensive experimental evaluations to demonstrate the feasibility of our proposed solution.
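The abstract names two ingredients, random walks over the P2P graph and sub-sampling of each visited peer's data, without giving the algorithm itself. The following is only a minimal sketch of how those ingredients could combine to estimate a global AVG, and it is not the paper's method. It assumes an undirected, connected overlay, where a simple random walk visits peers in proportion to their degree, so each visit is weighted by 1/degree. The toy `graph` and `data`, the function names, and the `jump` and `block_size` parameters are all hypothetical.

```python
import random


def random_walk_peers(graph, start, num_samples, jump, rng):
    """Walk the overlay graph and keep every `jump`-th visited peer."""
    peers, current = [], start
    while len(peers) < num_samples:
        for _ in range(jump):
            current = rng.choice(graph[current])  # move to a random neighbour
        peers.append(current)
    return peers


def local_estimate(tuples, block_size, rng):
    """Estimate a peer's local SUM from a small random subset of its tuples."""
    if len(tuples) <= block_size:
        return float(sum(tuples))
    sample = rng.sample(tuples, block_size)
    return len(tuples) * sum(sample) / block_size  # scale the subset up


def estimate_avg(graph, data, start, num_samples, jump=3, block_size=2, seed=0):
    """Ratio estimate of the global AVG over all tuples held by all peers."""
    rng = random.Random(seed)
    weighted_sum = 0.0
    weighted_count = 0.0
    for peer in random_walk_peers(graph, start, num_samples, jump, rng):
        # The walk favours high-degree peers, so down-weight by 1/degree.
        weight = 1.0 / len(graph[peer])
        weighted_sum += weight * local_estimate(data[peer], block_size, rng)
        weighted_count += weight * len(data[peer])
    return weighted_sum / weighted_count


# Hypothetical toy overlay (undirected adjacency lists) and per-peer data.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["a", "c", "e"],
    "e": ["d"],
}
data = {
    "a": [1, 2, 3, 4],
    "b": [10, 10],
    "c": [5],
    "d": [2, 2, 2, 2, 2, 2],
    "e": [7, 9, 11],
}

true_avg = sum(sum(v) for v in data.values()) / sum(len(v) for v in data.values())
print("true AVG:", true_avg)
print("estimated AVG:", estimate_avg(graph, data, start="a", num_samples=2000))
```

Using a ratio of two degree-weighted totals avoids needing the global edge count, which a single peer would not normally know. An adaptive scheme such as the two-phase approach the abstract mentions would presumably also decide how many peers to visit, which this sketch leaves as a fixed parameter.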

  • ABSTRACT: In recent years, the proliferation of mobile networking and the increasing capabilities of smartphone devices have led to the development of the "Community-based Participatory Sensing" approach, where users participate in data collection and sharing in a wide range of application areas such as entertainment, transportation and environmental monitoring. This paper develops a participatory sensing system whose sampling mechanism aims to stimulate user participation in dynamic groups that provide services and are compensated for them. Users participate in the community by sensing and sharing streams of events. The system then uses the sampling mechanism to select a subset of events that preserves the characteristics of the stream data and provides the highest "information gain" to the system, given the budget and resource constraints. Our experimental results show that our approach is practical and efficient and performs well.
  • ABSTRACT: Current peer-to-peer (P2P) content distribution systems are based on simple on-demand content discovery techniques. They can be improved by adding two capabilities: a mechanism through which peers can register with the network so that they are continuously informed of new data items, and a means for peers to advertise their contents. Existing systems based on unstructured overlays require complex indexing and routing schemes, which make the network less flexible for transient peers. For these applications, we study the alternative continuous query paradigm, which provides a best-effort service. We present a scalable and effective middleware called CQUOS for supporting continuous queries in unstructured overlay networks. CQUOS preserves the simplicity and flexibility of the unstructured P2P network. It relies on two techniques: a cluster-resilient random walk algorithm, which propagates the queries to various regions of the network, and a dynamic probability-based query registration scheme, which ensures that the registrations are well distributed in the overlay. This paper studies the properties of our algorithms through theoretical analysis.
  • ABSTRACT: Estimating the global data distribution in Peer-to-Peer (P2P) networks is an important issue and has not yet been well addressed. It can benefit many P2P applications, such as load balancing analysis, query processing, and data mining. In this paper, we propose a novel algorithm based on compact multi-dimensional histogram information that achieves high estimation accuracy at low estimation cost. The data distribution is maintained in a multi-dimensional histogram that is spread among peers without overlap, and each part is further condensed into a set of discrete cosine transform coefficients. By exchanging information, each peer can hierarchically accumulate this compact information into the entire histogram and thereby estimate the global data density accurately and efficiently. Algorithms for hierarchically accumulating discrete cosine transform coefficients are introduced, and the density estimation error is examined, with detailed theoretical analysis and proof. Our extensive performance study confirms the effectiveness and efficiency of our methods for density estimation in dynamic P2P networks.
    Distributed and Parallel Databases - DPD. 01/2009; 26:261-289.
