Conference Paper

Approximating Aggregation Queries in Peer-to-Peer Networks

UC Riverside
DOI: 10.1109/ICDE.2006.23
Conference: Proceedings of the 22nd International Conference on Data Engineering (ICDE '06), 2006
Source: DBLP


Peer-to-peer databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. The problem of answering large-scale, ad-hoc analysis queries (e.g., aggregation queries) on these databases poses unique challenges. Exact solutions can be time-consuming and difficult to implement given the distributed and dynamic nature of peer-to-peer databases. In this paper we present novel sampling-based techniques for approximate answering of ad-hoc aggregation queries in such databases. Computing a high-quality random sample of the database efficiently in the P2P environment is complicated by several factors: the data is distributed (usually in uneven quantities) across many peers, within each peer the data is often highly correlated, and moreover, even collecting a random sample of the peers is difficult to accomplish. To counter these problems, we have developed an adaptive two-phase sampling approach, based on random walks of the P2P graph as well as block-level sampling techniques. We present extensive experimental evaluations to demonstrate the feasibility of our proposed solution.
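The two ingredients named in the abstract, a random walk over the peer graph and sampling within each visited peer, can be illustrated with a minimal sketch. It is not the authors' adaptive algorithm: it assumes a connected, non-bipartite, undirected overlay, assumes the total degree 2|E| is known, uses a fixed (non-adaptive) sample size, and replaces block-level sampling with a plain random subsample of each peer's tuples. All function and parameter names (`random_walk_peers`, `local_estimate`, `estimate_count`, `jump`, `sub_sample_size`) are hypothetical. A simple random walk visits peer v with stationary probability deg(v)/(2|E|), so each local estimate is divided by that probability to correct for the bias toward well-connected peers.

```python
import random


def random_walk_peers(graph, start, num_samples, jump, rng):
    """Walk the overlay and keep every `jump`-th visited peer.

    `graph` maps each peer to a list of its neighbours (undirected).
    """
    selected = []
    current = start
    while len(selected) < num_samples:
        for _ in range(jump):
            current = rng.choice(graph[current])
        selected.append(current)
    return selected


def local_estimate(tuples, predicate, sub_sample_size, rng):
    """Estimate a peer's local COUNT from a random subsample of its tuples."""
    if not tuples:
        return 0.0
    t = min(sub_sample_size, len(tuples))
    sub = rng.sample(tuples, t)
    hits = sum(1 for x in sub if predicate(x))
    # Scale the subsample count up to the peer's full local data size.
    return hits * len(tuples) / t


def estimate_count(graph, data, predicate, start,
                   num_samples, jump, sub_sample_size, seed=0):
    """Estimate the global COUNT of tuples that satisfy `predicate`."""
    rng = random.Random(seed)
    # Assumption of this sketch: the sum of all degrees (2|E|) is known.
    two_e = sum(len(nbrs) for nbrs in graph.values())
    peers = random_walk_peers(graph, start, num_samples, jump, rng)
    total = 0.0
    for p in peers:
        pi = len(graph[p]) / two_e  # stationary probability of peer p
        total += local_estimate(data[p], predicate, sub_sample_size, rng) / pi
    return total / len(peers)


# Small illustrative overlay (contains a triangle, so it is non-bipartite).
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
data = {"a": [1, 5, 9], "b": [2, 8], "c": [3, 4, 7, 10], "d": [6]}
approx = estimate_count(graph, data, lambda x: x > 5, start="a",
                        num_samples=50, jump=4, sub_sample_size=2)
```

The `jump` parameter spaces out the recorded peers so that consecutive samples are less correlated; how large it must be depends on how quickly the walk mixes on the given graph.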


Available from: Benjamin Arai, Nov 19, 2014
    • "They have shown that if certain structural elements of the network are known then a sample is selected from a stationary distribution with a high probability. [19] maintains histograms to estimate workload on peers in the structured P2P networks by random walk, and [2] gives one paradigm on evaluating aggregation query in unstructured P2P networks by invoking two random walks. However, they are hard to be deployed to estimate data distributions. "
    ABSTRACT: Estimating the global data distribution in Peer-to-Peer (P2P) networks is an important issue and has yet to be well addressed. It can benefit many P2P applications, such as load balancing analysis, query processing, and data mining. Inspired by the inversion method for random variate generation, in this paper we present a novel model named distribution-free data density estimation for dynamic ring-based P2P networks to achieve high estimation accuracy with low estimation cost regardless of distribution models of the underlying data. It generates random samples for any arbitrary distribution by sampling the global cumulative distribution function and is free from sampling bias. In P2P networks, the key idea for distribution-free estimation is to sample a small subset of peers for estimating the global data distribution over the data domain. We introduce algorithms for computing and sampling the global cumulative distribution function, from which the global data distribution is estimated, together with a detailed theoretical analysis. Our extensive performance study confirms the effectiveness and efficiency of our methods in ring-based P2P networks.
    01/2012; DOI:10.1109/ICDE.2012.19
    • "When cost is more important than accuracy, approximate query processing can be employed. A sampling-based approach to aggregation query processing is proposed in [1]. "
    ABSTRACT: Peer-to-peer database systems (P2PDBs) aim at providing database services with node autonomy, high availability and loose coupling between participating nodes by building the DBMS on top of a peer-to-peer network. A key feature of current peer-to-peer systems is resilience to churn in the overlay network layer. A major challenge in P2PDBs is to provide similar robustness in the data and query processing layer. In this paper we describe in particular how aggregation queries in P2PDBs can be handled in order to reduce the impact of churn on the accuracy of results. We perform a formal study of data loss and accuracy of such queries, and describe new approaches that increase the accuracy of aggregation queries in P2PDBs under churn.
    12th International Database Engineering and Applications Symposium (IDEAS 2008), September 10-12, 2008, Coimbra, Portugal; 01/2008
    • "Several authors have looked into approximation-type queries for P2P networks, including using random walks over the web in [18], and aggregations over unstructured P2P networks as in [2]. Alternative gossip-style techniques of computing aggregates have been suggested by [20], but require participation of every node in the system. "
    ABSTRACT: Peer-to-Peer networks have become very popular on the Internet, with millions of peers all over the world sharing large volumes of data. In the assistive healthcare sector, it is likely that P2P networks will develop that interconnect and allow the controlled sharing of patient databases of various hospitals, clinics, and research laboratories. However, the sheer scale of these networks has made it difficult to gather statistics that could be used for building new features. In this paper, we present a technique to obtain estimations of the number of distinct values matching a query on the network. We evaluate the technique experimentally and provide a set of results that demonstrate its effectiveness, as well as its flexibility in supporting a variety of queries and applications.
    Proceedings of the 1st ACM International Conference on Pervasive Technologies Related to Assistive Environments, PETRA 2008, Athens, Greece, July 16-18, 2008; 01/2008
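
The first citing abstract above builds on the inversion method for random variate generation. As a generic illustration of that method only (not of that paper's P2P algorithms), the sketch below builds an empirical cumulative distribution function from a value histogram and draws samples by inverting it: pick u uniformly in [0, 1) and return the smallest value whose cumulative probability is at least u. The histogram contents and the names `build_cdf`, `inverse_cdf`, and `draw` are assumptions made for the example.

```python
import bisect
import random


def build_cdf(histogram):
    """Turn a list of (value, count) pairs, sorted by value, into a CDF.

    Returns two parallel lists: the values and their cumulative probabilities.
    """
    total = sum(count for _, count in histogram)
    values, cumulative = [], []
    running = 0
    for value, count in histogram:
        running += count
        values.append(value)
        cumulative.append(running / total)
    return values, cumulative


def inverse_cdf(values, cumulative, u):
    """Return the smallest value v whose cumulative probability is >= u."""
    i = bisect.bisect_left(cumulative, u)
    # Guard against floating-point round-off in the last cumulative entry.
    return values[min(i, len(values) - 1)]


def draw(values, cumulative, n, rng):
    """Draw n samples by inverting the CDF at uniform random points."""
    return [inverse_cdf(values, cumulative, rng.random()) for _ in range(n)]


# Hypothetical histogram: value 10 seen once, value 20 seen three times.
values, cumulative = build_cdf([(10, 1), (20, 3)])
samples = draw(values, cumulative, 1000, random.Random(0))
```

Over many draws, each value should appear roughly in proportion to its count in the histogram, which is the property the inversion method is meant to provide.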