Conference Paper

A Multi-attribute Data Structure with Parallel Bloom Filters for Network Services

DOI: 10.1007/11945918_30 Conference: High Performance Computing - HiPC 2006, 13th International Conference, Bangalore, India, December 18-21, 2006, Proceedings
Source: DBLP


A Bloom filter is widely used to represent a set of items because it is a simple, space-efficient, randomized data structure. In this paper, we propose a new structure, based on Bloom filters, to represent items with multiple attributes. The structure is composed of Parallel Bloom Filters (PBF) and a hash table, which together support accurate and efficient representation and querying of items. The PBF is a counter-based matrix consisting of multiple submatrices, each of which stores one attribute of an item. The hash table, as an auxiliary structure, captures a verification value for each item, which reflects the inherent dependency among all of the item's attributes. Because correctly querying an item with multiple attributes is nontrivial, we use a two-step verification process to confirm the presence of a particular item and thereby reduce the false-positive probability.
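
The paper's exact construction (counter widths, number of hash functions, definition of the verification value) is given in the full text; the following Python sketch only illustrates the idea under assumed parameters: one counter array per attribute, SHA-256-derived hash positions, and a digest over the concatenated attributes as the verification value. All names and parameters here are illustrative, not the paper's notation.

```python
import hashlib

def _hashes(value, k, m):
    """Derive k bucket indices in [0, m) for a string value (illustrative hashing)."""
    return [int(hashlib.sha256(f"{i}:{value}".encode()).hexdigest(), 16) % m
            for i in range(k)]

class ParallelBloomFilters:
    """Hedged sketch: one counter-based Bloom filter per attribute plus a
    hash table of per-item verification values (a digest over all attributes)."""

    def __init__(self, num_attributes, m=1024, k=4):
        self.m, self.k = m, k
        # One counter array ("submatrix") per attribute.
        self.counters = [[0] * m for _ in range(num_attributes)]
        # Auxiliary hash table: verification value -> reference count.
        self.verify_table = {}

    def _verification_value(self, attributes):
        # Assumption: a single digest over the concatenated attributes stands in
        # for the paper's verification value capturing their joint dependency.
        return hashlib.sha256("|".join(attributes).encode()).hexdigest()

    def insert(self, attributes):
        for filt, attr in zip(self.counters, attributes):
            for idx in _hashes(attr, self.k, self.m):
                filt[idx] += 1          # counters (not bits) allow later deletion
        v = self._verification_value(attributes)
        self.verify_table[v] = self.verify_table.get(v, 0) + 1

    def query(self, attributes):
        # Step 1: every attribute must be reported present by its own filter.
        for filt, attr in zip(self.counters, attributes):
            if any(filt[idx] == 0 for idx in _hashes(attr, self.k, self.m)):
                return False
        # Step 2: the verification value must confirm the attributes belong to
        # the same item, filtering out combinations drawn from different items.
        return self._verification_value(attributes) in self.verify_table

    def delete(self, attributes):
        if not self.query(attributes):
            return False
        for filt, attr in zip(self.counters, attributes):
            for idx in _hashes(attr, self.k, self.m):
                filt[idx] -= 1
        v = self._verification_value(attributes)
        self.verify_table[v] -= 1
        if self.verify_table[v] == 0:
            del self.verify_table[v]
        return True
```

Under these assumptions, attributes drawn from two different items can pass the per-attribute filters (step 1) yet fail the verification check (step 2):

```python
pbf = ParallelBloomFilters(num_attributes=2)
pbf.insert(["alice", "reader"])
pbf.insert(["bob", "writer"])
print(pbf.query(["alice", "reader"]))   # True
# "alice" and "writer" are each present in their own filter, but the pair was
# never inserted together, so step 2 rejects it:
print(pbf.query(["alice", "writer"]))   # False
```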

  • Source
    • "[32] discusses the use of Bloom Filters to speed-up the name-to-location resolution process in large distributed systems. Parallel versions of Bloom Filters were also proposed for multi-core applications [13] [22] [27]. A related problem of computing the number of distinct elements in a data stream was studied in [20]. "
    ABSTRACT: The unparalleled growth and popularity of the Internet, coupled with the advent of diverse modern applications such as search engines, on-line transactions, climate warning systems, etc., has led to an unprecedented expansion in the volume of data stored world-wide. Efficient storage, management, and processing of such massive amounts of data has emerged as a central theme of research in this direction. Detecting and removing redundancies and duplicates in real time from such multi-trillion-record sets, to bolster resource and compute efficiency, constitutes a challenging area of study. The infeasibility of storing the entire data from potentially unbounded data streams, together with the need for precise elimination of duplicates, calls for intelligent approximate duplicate-detection algorithms. The literature hosts numerous works based on the well-known probabilistic bitmap structure, the Bloom Filter, and its variants. In this paper we propose a novel data structure, the Streaming Quotient Filter (SQF), for efficient detection and removal of duplicates in data streams. SQF intelligently stores the signatures of elements arriving on a data stream and, along with an eviction policy, provides near-zero false positive and false negative rates. We show that the near-optimal performance of SQF is achieved with a very low memory requirement, making it ideal for real-time, memory-efficient de-duplication applications with extremely low false positive and false negative tolerance. We present a detailed theoretical analysis of the working of SQF, providing a guarantee on its performance. Empirically, we compare SQF to alternate methods and show that the proposed method is superior in terms of memory and accuracy compared to existing solutions. We also discuss Dynamic SQF for evolving streams and the parallel implementation of SQF.
    Proceedings of the VLDB Endowment 06/2013; 6(8):589-600.
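
The internals of SQF (signature layout, eviction policy) are specified in the cited paper; for orientation, the sketch below shows only the classic Bloom-filter baseline for streaming duplicate detection that SQF and its variants refine. The class name, stream, and parameters are illustrative assumptions.

```python
import hashlib

class StreamDeduplicator:
    """Baseline sketch: a plain Bloom filter flags likely duplicates in a stream.
    SQF replaces this with compact signatures plus an eviction policy (see the
    cited paper); this only shows the baseline behaviour it improves on."""

    def __init__(self, m=1 << 20, k=4):
        self.bits = bytearray(m // 8)   # m bits packed into bytes
        self.m, self.k = m, k

    def _positions(self, item):
        return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % self.m
                for i in range(self.k)]

    def seen_before(self, item):
        """Return True if item is (probably) a duplicate, then record it."""
        pos = self._positions(item)
        duplicate = all((self.bits[p // 8] >> (p % 8)) & 1 for p in pos)
        for p in pos:
            self.bits[p // 8] |= 1 << (p % 8)
        return duplicate

records = ["r1", "r2", "r1", "r3", "r2"]
dedup = StreamDeduplicator()
unique = [r for r in records if not dedup.seen_before(r)]
print(unique)   # ['r1', 'r2', 'r3'], with a small false-positive probability
```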
  • Source
    • "Bloom Filters have been applied even to network related applications such as finding heavy flows for stochastically fair blue queue management [26], packet classification [27], per-flow state management and longest prefix matching [28]. Multiple Bloom Filters in conjunction with hash tables have been studied to represent items with multiple attributes accurately and efficiently with low false positive rates [29]. Bloomjoin used for distributed joins have also been extended to minimize network usage for query execution based on database statistics. "
    ABSTRACT: Applications involving telecommunication call data records, web pages, online transactions, medical records, stock markets, climate warning systems, etc., necessitate efficient management and processing of massive amounts of data from diverse sources. De-duplication, or intelligent compression, in streaming scenarios, i.e., the approximate identification and elimination of duplicates from such unbounded data streams, is an even greater challenge given the real-time nature of data arrival. Stable Bloom Filters (SBF) address this problem to a certain extent. In this work, we present several novel algorithms for the problem of approximate detection of duplicates in data streams. We propose the Reservoir Sampling based Bloom Filter (RSBF), combining the working principles of reservoir sampling and Bloom Filters, and present variants of the novel Biased Sampling based Bloom Filter (BSBF) based on biased-sampling concepts. We also propose a randomized, load-balanced variant of the sampling Bloom Filter approach to efficiently tackle duplicate detection. This work thus provides a generic framework for de-duplication using Bloom Filters. Using detailed theoretical analysis, we prove analytical bounds on the false positive rate, false negative rate, and convergence rate of the proposed structures, and we show that our models clearly outperform existing methods. We also present an empirical analysis of the structures using real-world datasets (3 million records) and synthetic datasets (1 billion records) capturing various input distributions.
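
How RSBF and BSBF integrate sampling decisions into the filter is defined in the cited work. As a reminder of only the sampling half of that principle, this is the classic Algorithm R reservoir sampler; the function name and parameters are illustrative.

```python
import random

def reservoir_sample(stream, s):
    """Classic Algorithm R: keep a uniform random sample of s items from a
    stream of unknown length. RSBF/BSBF apply this kind of sampling decision
    when updating their filters; the integration itself is in the cited paper."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < s:
            reservoir.append(item)
        else:
            j = random.randint(0, i)      # uniform over [0, i], inclusive
            if j < s:
                reservoir[j] = item       # replace with probability s / (i + 1)
    return reservoir

print(reservoir_sample(range(1000), 10))  # 10 items, each kept with equal probability
```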
  • Source
    • "Removing an element from this simple Bloom filter is impossible. The element maps to k bits, and although setting any one of these k bits to zero suffices to remove it, this has the side effect of removing any other elements that map on-to that bit, and we have no way of determining whether any such elements have been added [21]. Such removal would introduce a possibility for false negatives, which are not allowed. "
    ABSTRACT: In WSNs, existing routing protocols are based on query flooding or event flooding. Query flooding can find desired events quickly but is costly because many query messages are generated, while event flooding employs precise routing hints to route queries, which reduces query messages at the expense of heavy routing overhead (in particular, keeping precise routing hints for many events is expensive). Bloom filters have been used in database applications, web caching, and searching in peer-to-peer networks. In this paper, we propose a routing protocol for Wireless Sensor Networks (WSNs) based on a Scope Decay Bloom Filter (SDBF) that detects and corrects decaying bits of the SDBF using Hamming codes. In SDBF, each node maintains probabilistic hints about events and uses these hints to route queries intelligently. SDBF greatly reduces the amortized network traffic without compromising the query success rate and achieves higher energy efficiency.
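
The quotation in the third source above notes that elements cannot be deleted from a plain Bloom filter without risking false negatives. The standard remedy, and the reason the PBF described at the top of this page is counter-based, is a counting Bloom filter. The sketch below illustrates that standard idea only; m, k, and the SHA-256-based hashing are assumptions, not parameters taken from any of the cited papers.

```python
import hashlib

class CountingBloomFilter:
    """Minimal sketch of the standard counting Bloom filter: each bit of a plain
    Bloom filter becomes a small counter, so removing one item no longer erases
    the evidence for other items hashing to the same positions."""

    def __init__(self, m=4096, k=3):
        self.counters = [0] * m
        self.m, self.k = m, k

    def _positions(self, item):
        return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % self.m
                for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.counters[p] += 1

    def contains(self, item):
        return all(self.counters[p] > 0 for p in self._positions(item))

    def remove(self, item):
        # Only decrement if the item appears present; decrementing counters for
        # an absent item would corrupt the structure.
        if self.contains(item):
            for p in self._positions(item):
                self.counters[p] -= 1

cbf = CountingBloomFilter()
cbf.add("flow-a"); cbf.add("flow-b")
cbf.remove("flow-a")
print(cbf.contains("flow-b"))   # still True: no false negative introduced
```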