Conference Paper

A Multi-attribute Data Structure with Parallel Bloom Filters for Network Services.

DOI: 10.1007/11945918_30 Conference: High Performance Computing - HiPC 2006, 13th International Conference, Bangalore, India, December 18-21, 2006, Proceedings
Source: DBLP

ABSTRACT: A Bloom filter has been widely utilized to represent a set of items because it is a simple, space-efficient randomized data structure. In this paper, we propose a new structure, based on Bloom filters, to support the representation of items with multiple attributes. The structure is composed of Parallel Bloom Filters (PBF) and a hash table to support the accurate and efficient representation and query of items. The PBF is a counter-based matrix and consists of multiple submatrices. Each submatrix can store one attribute of an item. The hash table, as an auxiliary structure, captures a verification value of an item, which can reflect the inherent dependency of all attributes for the item. Because the correct query of an item with multiple attributes becomes complicated, we use a two-step verification process to confirm the presence of a particular item and thereby reduce the false positive probability.
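The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the hash functions or the verification value are computed, so the salted SHA-256 hashing, the digest-based verification value, and the names `ParallelBloomFilters`, `column_indexes`, `attributes_present`, and `query` are all assumptions made for illustration.

```python
import hashlib


def column_indexes(value, k, m, salt):
    """Map a string to k counter positions in [0, m) using salted SHA-256."""
    positions = []
    for i in range(k):
        digest = hashlib.sha256(f"{salt}:{i}:{value}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


class ParallelBloomFilters:
    """Illustrative PBF: one counter submatrix per attribute plus a hash table."""

    def __init__(self, num_attributes, m=1024, k=4):
        self.num_attributes, self.m, self.k = num_attributes, m, k
        # counters[j] is the submatrix that stores attribute j of every item.
        self.counters = [[0] * m for _ in range(num_attributes)]
        # Auxiliary hash table: verification value -> number of items stored.
        self.verification = {}

    def _verification_value(self, item):
        # Assumption: a digest over all attributes, in order, ties them together.
        return hashlib.sha256("\x1f".join(item).encode()).hexdigest()

    def add(self, item):
        if len(item) != self.num_attributes:
            raise ValueError("unexpected number of attributes")
        for j, attr in enumerate(item):
            for idx in column_indexes(attr, self.k, self.m, salt=j):
                self.counters[j][idx] += 1
        value = self._verification_value(item)
        self.verification[value] = self.verification.get(value, 0) + 1

    def attributes_present(self, item):
        """Step 1: each attribute must appear in its own submatrix."""
        if len(item) != self.num_attributes:
            return False
        return all(
            self.counters[j][idx] > 0
            for j, attr in enumerate(item)
            for idx in column_indexes(attr, self.k, self.m, salt=j)
        )

    def query(self, item):
        """Step 2 runs only if step 1 passes: check the verification value."""
        return (self.attributes_present(item)
                and self._verification_value(item) in self.verification)


pbf = ParallelBloomFilters(num_attributes=3)
pbf.add(("10.0.0.1", "80", "tcp"))
pbf.add(("10.0.0.2", "443", "udp"))
# Every attribute below was stored, but never together in one item.
mixed = ("10.0.0.1", "443", "tcp")
print(pbf.attributes_present(mixed), pbf.query(mixed))
```

In this sketch the `mixed` tuple passes the per-attribute check, because each of its attributes was inserted by some item, and is rejected only by the verification lookup. That is the kind of cross-item false positive the two-step process in the abstract is meant to reduce.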

  • ABSTRACT: This paper presents a scalable and adaptive decentralized metadata lookup scheme for ultra large-scale file systems (petabytes or even exabytes). Our scheme logically organizes metadata servers (MDSs) into a multi-layered query hierarchy and exploits grouped Bloom filters to efficiently route metadata requests to the desired MDSs through the hierarchy. This metadata lookup scheme can be performed at network or memory speed. An effective workload balance algorithm is also developed in this paper for server reconfigurations. The scheme is evaluated through extensive trace-driven simulations and a prototype implementation in Linux. This scheme can significantly improve metadata management scalability and query efficiency in ultra large-scale storage systems.
    International Journal of Latest Trends in Engineering and Technology (IJLTET). 07/2013; Vol. 2(Issue 4):295-300.
  • Source
    ABSTRACT: The unparalleled growth and popularity of the Internet, coupled with the advent of diverse modern applications such as search engines, on-line transactions, climate warning systems, etc., has led to an unprecedented expansion in the volume of data stored world-wide. Efficient storage, management, and processing of such massive amounts of data has emerged as a central theme of research in this direction. Detection and removal of redundancies and duplicates in real-time from such multi-trillion record sets, to bolster resource and compute efficiency, constitutes a challenging area of study. The infeasibility of storing the entire data from potentially unbounded data streams, together with the need for precise elimination of duplicates, calls for intelligent approximate duplicate detection algorithms. The literature hosts numerous works based on the well-known probabilistic bitmap structure, the Bloom Filter, and its variants. In this paper we propose a novel data structure, Streaming Quotient Filter (SQF), for efficient detection and removal of duplicates in data streams. SQF intelligently stores the signatures of elements arriving on a data stream and, along with an eviction policy, provides near zero false positive and false negative rates. We show that the near optimal performance of SQF is achieved with a very low memory requirement, making it ideal for real-time memory-efficient de-duplication applications with an extremely low tolerance for false positives and false negatives. We present a detailed theoretical analysis of the working of SQF, providing a guarantee on its performance. Empirically, we compare SQF to alternate methods and show that the proposed method is superior in terms of memory and accuracy compared to the existing solutions. We also discuss Dynamic SQF for evolving streams and the parallel implementation of SQF.
    Proceedings of the VLDB Endowment. 06/2013; 6(8):589-600.
  • Source
    ABSTRACT: This chapter discusses the false positive and false negative rates of Bloom filters in a distributed environment. A Bloom filter (BF) is a space-efficient data structure that supports probabilistic membership queries. In distributed systems, a Bloom filter is often used to summarize local services or objects, and this Bloom filter is replicated to remote hosts. This allows remote hosts to perform fast membership queries without contacting the original host. However, when the services or objects change, the remote Bloom filter replica may become stale. This chapter analyzes the impact of staleness on the false positive and false negative rates of membership queries on a Bloom filter replica. An efficient update control mechanism is then proposed, based on the analytical results, to minimize the updating overhead. This chapter validates the analytical models and the update control mechanism through simulation experiments.
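The staleness effect described in the last abstract can be illustrated with a small Python sketch. It only demonstrates the phenomenon and is not the chapter's analytical model or update control mechanism. The `BloomFilter` class, the `summarize` helper, the salted SHA-256 hashing, and the service names are assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Plain (non-counting) Bloom filter over strings."""

    def __init__(self, m=256, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _indexes(self, item):
        # k bit positions derived from salted SHA-256 (an assumed hash family).
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = True

    def __contains__(self, item):
        return all(self.bits[idx] for idx in self._indexes(item))


def summarize(services):
    """Build a filter that summarizes a set of service names."""
    bf = BloomFilter()
    for name in services:
        bf.add(name)
    return bf


# The host summarizes its services and ships the filter to a remote host.
services = {"dns", "http"}
replica = summarize(services)

# The host then changes its service set; the remote replica is not refreshed.
services = (services - {"dns"}) | {"ftp"}
fresh = summarize(services)

for name in ("dns", "ftp"):
    print(name, "actual:", name in services,
          "stale replica:", name in replica,
          "fresh filter:", name in fresh)
```

The stale replica still reports the removed service `dns` as present, which is a false positive caused by staleness. It will normally report the newly added `ftp` as absent, which is a false negative that a fresh filter cannot produce, although a chance collision of bit positions could mask it.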