Article

Efficient Approximate Query Processing in Peer-to-Peer Networks

University of California, Riverside
IEEE Transactions on Knowledge and Data Engineering (impact factor: 1.66). 08/2007; 19(7):919-933. DOI:10.1109/TKDE.2007.1064
Source: IEEE Xplore

ABSTRACT Peer-to-peer (P2P) databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. Answering large-scale ad hoc analysis queries (for example, aggregation queries) on these databases poses unique challenges. Exact solutions can be time-consuming and difficult to implement, given the distributed and dynamic nature of P2P databases. In this paper, we present novel sampling-based techniques for approximate answering of ad hoc aggregation queries in such databases. Computing a high-quality random sample of the database efficiently in the P2P environment is complicated by several factors: the data is distributed (usually in uneven quantities) across many peers; within each peer, the data is often highly correlated; and even collecting a random sample of the peers is difficult to accomplish. To counter these problems, we have developed an adaptive two-phase sampling approach based on random walks of the P2P graph, as well as block-level sampling techniques. We present extensive experimental evaluations to demonstrate the feasibility of our proposed solution.
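
As a rough illustration of the random-walk idea mentioned in the abstract, the following Python sketch selects peers by walking the overlay graph and reweights each peer's local aggregate by its visit probability. It is not the paper's algorithm. The adjacency-dict graph, the `jump` parameter, the function names, the assumption that the total degree of the graph is known, and the Hansen-Hurwitz style reweighting are all assumptions made for illustration; the adaptive second phase and block-level sampling are omitted.

```python
import random


def random_walk_peers(graph, start, num_samples, jump, rng):
    """Walk the overlay and keep every `jump`-th visited peer.

    `graph` maps each peer id to a list of neighbour ids (undirected,
    connected). `jump` is a hypothetical parameter: taking several hops
    between selections reduces correlation between consecutive picks.
    """
    current = start
    selected = []
    while len(selected) < num_samples:
        for _ in range(jump):
            current = rng.choice(graph[current])
        selected.append(current)
    return selected


def estimate_sum(graph, local_sum, start, num_samples=200, jump=10, seed=0):
    """Estimate the network-wide SUM from a random-walk sample of peers.

    A simple random walk on a connected, non-bipartite undirected graph
    visits peer v with stationary probability deg(v) / (2 * |E|).
    Dividing each sampled peer's local aggregate by that probability and
    averaging corrects for the bias towards well-connected peers.
    Assumption: the total degree 2 * |E| is known or estimated separately.
    """
    rng = random.Random(seed)
    total_degree = sum(len(neighbours) for neighbours in graph.values())
    peers = random_walk_peers(graph, start, num_samples, jump, rng)
    weighted = 0.0
    for peer in peers:
        visit_prob = len(graph[peer]) / total_degree
        weighted += local_sum[peer] / visit_prob
    return weighted / len(peers)
```

Calling `estimate_sum(graph, local_sum, start=some_peer)` with `local_sum[p]` set to each peer's locally computed aggregate returns an approximate global sum. The estimate is only as good as the walk's mixing, so `jump` and `num_samples` would need tuning for a real topology.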

Keywords

ad hoc aggregation queries
adaptive two-phase sampling approach
aggregation queries
block-level sampling techniques
difficult
digital media
documents
dynamic nature
Exact solutions
high-quality random sample
large-scale ad hoc analysis queries
P2P databases
P2P environment
P2P graph
Peer-to-peer
prevalent
proposed solution
random sample
uneven quantities
unique challenges

B. Arai