Conference Paper

A Multi-attribute Data Structure with Parallel Bloom Filters for Network Services.

DOI: 10.1007/11945918_30
Conference: High Performance Computing - HiPC 2006, 13th International Conference, Bangalore, India, December 18-21, 2006, Proceedings
Source: DBLP

ABSTRACT: A Bloom filter has been widely utilized to represent a set of items because it is a simple, space-efficient randomized data structure. In this paper, we propose a new structure, based on Bloom filters, to support the representation of items with multiple attributes. The structure is composed of Parallel Bloom Filters (PBF) and a hash table to support the accurate and efficient representation and query of items. The PBF is a counter-based matrix and consists of multiple submatrices. Each submatrix can store one attribute of an item. The hash table, as an auxiliary structure, captures a verification value of an item, which reflects the inherent dependency among all attributes of that item. Because correctly querying an item with multiple attributes becomes complicated, we use a two-step verification process to confirm the presence of a particular item and reduce the false positive probability.
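The structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `ParallelBloomFilters`, the SHA-256 based hashing, the default sizes, and the way the verification value is derived (a digest of all attributes joined together) are assumptions made only for illustration, since the abstract does not specify them.

```python
import hashlib


class ParallelBloomFilters:
    """Sketch: one counting Bloom filter per attribute plus a verification table."""

    def __init__(self, num_attributes, num_counters=1024, num_hashes=4):
        self.p = num_attributes
        self.m = num_counters
        self.k = num_hashes
        # One counter array (a "submatrix") per attribute.
        self.counters = [[0] * self.m for _ in range(self.p)]
        # Auxiliary hash table: verification value -> number of stored items.
        self.verification = {}

    def _positions(self, attr_index, value):
        # Assumption: k positions derived from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{attr_index}:{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def _verification_value(self, item):
        # Assumption: the value ties all attributes together via one digest.
        joined = "\x1f".join(str(v) for v in item)
        return hashlib.sha256(joined.encode()).hexdigest()

    def insert(self, item):
        if len(item) != self.p:
            raise ValueError("item must have one value per attribute")
        for a, value in enumerate(item):
            for pos in self._positions(a, value):
                self.counters[a][pos] += 1
        v = self._verification_value(item)
        self.verification[v] = self.verification.get(v, 0) + 1

    def attributes_present(self, item):
        # Step 1: every attribute must hit nonzero counters in its own submatrix.
        return len(item) == self.p and all(
            self.counters[a][pos] > 0
            for a, value in enumerate(item)
            for pos in self._positions(a, value)
        )

    def query(self, item):
        # Step 2: the combination of attributes must also match a stored
        # verification value, which filters out mixed-attribute false positives.
        return (self.attributes_present(item)
                and self._verification_value(item) in self.verification)

    def remove(self, item):
        # Counters make deletion possible; only remove items that pass the query.
        if not self.query(item):
            return False
        for a, value in enumerate(item):
            for pos in self._positions(a, value):
                self.counters[a][pos] -= 1
        v = self._verification_value(item)
        self.verification[v] -= 1
        if self.verification[v] == 0:
            del self.verification[v]
        return True


pbf = ParallelBloomFilters(num_attributes=3)
pbf.insert(("10.0.0.1", 80, "tcp"))
pbf.insert(("10.0.0.2", 443, "udp"))
mixed = ("10.0.0.1", 443, "tcp")  # each attribute was stored, but never together
print(pbf.attributes_present(mixed), pbf.query(mixed))
```

In this sketch the item `mixed` passes the per-attribute check because each of its values was inserted as part of some item, and it is rejected only by the second step. That is the kind of cross-item false positive the verification value is meant to catch.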

  • ABSTRACT: Existing routing protocols in Wireless Sensor Networks (WSNs) are either query-flooding based or event-flooding based. Query flooding can find desired events quickly but is costly because many query messages are generated, whereas event flooding employs precise routing hints to route queries, which reduces query messages at the expense of heavy routing overhead (specifically, keeping precise routing hints for many events is expensive). Bloom filters have been used in database applications, web caching, and searching in peer-to-peer networks. In this paper, we propose a routing protocol for WSNs based on the Scope Decay Bloom Filter (SDBF), which detects and corrects the decaying bits of the SDBF using a Hamming code. In SDBF, each node maintains some probabilistic hints about events and utilizes these hints to route queries intelligently. SDBF greatly reduces the amortized network traffic without compromising the query success rate and achieves higher energy efficiency.
  • ABSTRACT: Bloom filters have been extensively applied in many network functions. Their performance is judged by three criteria: query overhead, space requirement, and false positive ratio. Due to wide applicability, any improvement to the performance of Bloom filters can potentially have a broad impact in many areas of networking research. In this paper, we study Bloom-1, a data structure that performs membership check in one memory access, which compares favorably with the k memory accesses of a standard Bloom filter. We also generalize Bloom-1 to Bloom-g and Bloom-Q, allowing performance tradeoff between membership query overhead and false positive ratio. We thoroughly examine the variants in this family of filters, and show that they can be configured to outperform the Bloom filters with a smaller number of memory accesses, a smaller or equal number of hash bits, and a smaller or comparable false positive ratio in practical scenarios. We also perform experiments based on a real traffic trace to support our filter design.
    IEEE Transactions on Parallel and Distributed Systems 01/2014; 25(1):93-103. DOI:10.1109/TPDS.2013.46
  • ABSTRACT: The unparalleled growth and popularity of the Internet, coupled with the advent of diverse modern applications such as search engines, on-line transactions, and climate warning systems, has led to an unprecedented expansion in the volume of data stored world-wide. Efficient storage, management, and processing of such massive amounts of data has emerged as a central theme of research in this direction. Detecting and removing redundancies and duplicates in real time from such multi-trillion record sets, to bolster resource and compute efficiency, constitutes a challenging area of study. The infeasibility of storing the entire data from potentially unbounded data streams, combined with the need for precise elimination of duplicates, calls for intelligent approximate duplicate detection algorithms. The literature hosts numerous works based on the well-known probabilistic bitmap structure, the Bloom Filter, and its variants. In this paper we propose a novel data structure, the Streaming Quotient Filter (SQF), for efficient detection and removal of duplicates in data streams. SQF intelligently stores the signatures of elements arriving on a data stream and, along with an eviction policy, provides near-zero false positive and false negative rates. We show that the near-optimal performance of SQF is achieved with a very low memory requirement, making it ideal for real-time, memory-efficient de-duplication applications with extremely low tolerance for false positives and false negatives. We present a detailed theoretical analysis of the working of SQF, providing a guarantee on its performance. Empirically, we compare SQF to alternative methods and show that the proposed method is superior to existing solutions in terms of memory and accuracy. We also discuss Dynamic SQF for evolving streams and the parallel implementation of SQF.