
A Multi-attribute Data Structure with Parallel Bloom Filters for Network Services⋆

Yu Hua1,2 and Bin Xiao1

1Department of Computing

Hong Kong Polytechnic University, Kowloon, Hong Kong

{csyhua, csbxiao}@comp.polyu.edu.hk

2School of Computer Science and Technology

Huazhong University of Science and Technology, Wuhan, China

Abstract. A Bloom filter has been widely utilized to represent a set of

items because it is a simple space-efficient randomized data structure. In

this paper, we propose a new structure to support the representation of

items with multiple attributes based on Bloom filters. The structure is

composed of Parallel Bloom Filters (PBF) and a hash table to support

the accurate and efficient representation and query of items. The PBF is

a counter-based matrix and consists of multiple submatrixes. Each sub-

matrix can store one attribute of an item. The hash table as an auxiliary

structure captures a verification value of an item, which can reflect the

inherent dependency of all attributes for the item. Because correctly
querying an item with multiple attributes is complicated, we use a
two-step verification process to confirm the presence of a particular item
and to reduce the false positive probability.

1 Introduction

A standard Bloom filter can represent a set of items as a bit array using several

independent hash functions and support the query of items [1]. Using a Bloom

filter to represent a set, one can query whether an item is a member of the set

by consulting the Bloom filter instead of the set itself. The price of this
compact representation is a small probability of false positives in membership
queries. However, the space savings often outweigh this drawback when the false
positive probability is sufficiently low. Bloom filters are therefore widely
used in practice when space is at a premium.

Many variants of the standard Bloom filter have been proposed for various
purposes, such as counting Bloom filters [2], compressed Bloom filters [3],
hierarchical Bloom filters [4], space-code Bloom filters [5] and spectral
Bloom filters [6]. Counting Bloom filters replace the array of bits with an
array of counters, each of which counts the number of items hashed to its
location. It is very useful

⋆This work is partially supported by HK RGC CERG B-Q827 and POLYU A-

PA2F, and by the National Basic Research 973 Program of China under Grant

2004CB318201.


to apply counting Bloom filters to support the deletion operation and handle a

set that is changing over time.
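The deletion support mentioned above can be illustrated with a minimal sketch of a counting Bloom filter. The class name, the parameters `m` and `k`, and the salted SHA-256 hash family are assumptions made for this illustration only; they are not taken from [2] or from this paper.

```python
import hashlib

class CountingBloomFilter:
    """Illustrative counting Bloom filter: m counters, k salted hash functions."""

    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item):
        # Salting SHA-256 with j stands in for k independent hash functions.
        return [
            int.from_bytes(hashlib.sha256(f"{j}:{item}".encode()).digest()[:8], "big") % self.m
            for j in range(self.k)
        ]

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        # Decrement only if the item appears to be present, so no counter goes negative.
        if item in self:
            for pos in self._positions(item):
                self.counters[pos] -= 1

    def __contains__(self, item):
        return all(self.counters[pos] > 0 for pos in self._positions(item))

cbf = CountingBloomFilter()
cbf.add("flow-1")
print("flow-1" in cbf)   # True
cbf.remove("flow-1")
print("flow-1" in cbf)   # False: every counter is back to zero
```

A plain bit array could not undo the insertion, because clearing a bit might also erase evidence of other items that hashed to the same position.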

With the rapid development of network services, queries based on multiple
attributes of an item have become increasingly attractive. However, not much
work has been done in this area. Previous work mainly focused on representing
a set of items with a single attribute, and it cannot be used to represent
items with multiple attributes accurately. Because one item has multiple
attributes, the inherent dependency among them can be lost if we simply store
the attributes in different places by computing hash functions independently.
A simple expansion of the standard Bloom filter has no functional unit to
record the dependency among multiple attributes, so query operations can often
receive wrong answers. The loss of dependency information among the attributes
of an item greatly increases the false positive probability. Thus, we need to
develop a new structure for the representation of items with multiple
attributes.
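The loss of dependency described above can be reproduced in a few lines. In this sketch, exact Python sets stand in for the per-attribute filters, so the wrong answer is caused purely by storing attributes independently and not by hash collisions. The item values and the name `naive_query` are hypothetical.

```python
# Two stored items, each with two attributes (illustrative values).
stored_items = [("alice", "10.0.0.1"), ("bob", "10.0.0.2")]

# One independent structure per attribute; exact sets replace Bloom filters here.
attr_sets = [set(), set()]
for item in stored_items:
    for i, value in enumerate(item):
        attr_sets[i].add(value)

def naive_query(item):
    # Checks each attribute separately, with no record of which attributes belong together.
    return all(value in attr_sets[i] for i, value in enumerate(item))

mixed = ("alice", "10.0.0.2")       # attributes taken from two different items
print(naive_query(mixed))           # True, although this item was never stored
```

Every attribute of the mixed query exists somewhere in the set, so a per-attribute check alone cannot reject it.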

In this paper, we make the following main contributions. First, we propose
a new Bloom filter structure that supports the representation of items with
multiple attributes and keeps the false positive probability of membership
queries at a very low level. The new structure is composed of Parallel Bloom
Filters (PBF) and a hash table to support the accurate and efficient
representation and query of items. The PBF is a counter-based matrix and
consists of multiple submatrixes. Each submatrix can store one attribute of an
item. The hash table captures a verification value of an item, which reflects
the inherent dependency of all attributes of that item. We generate the
verification values by an attenuated method, which greatly reduces the
probability of collisions between items. Second, we present a two-step
verification process to confirm the presence of a particular item. Because
multiple attributes make correct queries more complicated, verification in the
PBF alone is insufficient to distinguish the attributes of one item from those
of another. Verification in the hash table complements this process and leads
to accurate query results. Third, the PBF keeps a counter in each entry so
that it can support the full set of data operations, namely adding, querying
and removing items, and these operations retain O(1) computational complexity.
We also study the false positive probability and algebra operations through
mathematical analysis and experiments. Finally, we show through theoretical
analysis and simulations that the new Bloom filter structure and the proposed
algorithms for data operations represent items with multiple attributes
efficiently and accurately while yielding a sufficiently small false positive
probability.

The rest of the paper is organized as follows. Section 2 introduces the related

work. Section 3 presents the new Bloom filter structure, which is composed of the

PBF and hash table. Section 4 illustrates the operations of adding, querying and

removing items. In Section 5, we present the corresponding algebra operations.

Section 6 provides the performance evaluation and Section 7 concludes our paper.


2 Related Work

A Bloom filter can support membership queries [7], [8] because it is a simple,
space-efficient data structure for representing a set, and Bloom filters have
been broadly applied in network-related applications. They are used to find
heavy flows for the stochastic fair blue queue management scheme [9] and to
summarize contents to help global collaboration [10]. Bloom filters also
provide a useful tool for network routing tasks such as route lookup [11],
packet classification [12], per-flow state management and longest prefix
matching [13].

There is a great deal of room to develop variants or extensions of Bloom
filters for specific applications. When space is an issue, a Bloom filter can
be an excellent alternative to keeping an explicit list. In [14], the authors
designed a data structure called the exponentially decaying Bloom filter
(EDBF), which encodes probabilistic routing tables in a highly compressed
manner and allows for efficient aggregation and propagation.

In addition, network applications have a strong need for well-engineered
hash-based data structures that achieve faster lookup speeds with better
worst-case performance in practice. From the engineering perspective, the
authors of [15] extended the multiple-hashing Bloom filter by using a small
amount of multi-port on-chip memory, which supports better throughput for
router applications based on hash tables.

Because Bloom filters play an essential role in network services, extending
their structure is a well-researched topic. While some approaches exist in the
literature, most work emphasizes improvements to the Bloom filters themselves.
The authors of [16] suggested multi-dimension dynamic Bloom filters (MDDBF)
to support representation and membership queries over multiple attribute
dimensions. Their basic idea was to represent a dynamic set A with a dynamic
s×m bit matrix that consists of s standard Bloom filters. However, the MDDBF
lacks a process for verifying the inherent dependency among the multiple
attributes of an item, which may increase the false positive probability.

3 Analytical Model

In this section, we will introduce a novel structure, which is composed of PBF

and a hash table, to represent items of p attributes. The hash table stores the

verification values of items and we provide an improved method for generating

the verification values.

3.1 Proposed Structure

Figure 1 shows the proposed structure based on the counting Bloom filters. The

whole structure includes two parts: PBF and a hash table. PBF and the hash

table are used to store multiple attributes and the verification values of items,

respectively. PBF uses the counting Bloom filters [2] to support the deletion


[Figure 1: Each attribute a_i (i = 1, ..., p) of item a is hashed by
H[i][1](a_i), ..., H[i][q](a_i) into its own submatrix of q arrays with m
counters each; the p submatrixes together form the Parallel Bloom Filters.
Each submatrix yields v_i = F(*), and the sum of v_1, ..., v_p is stored as
V_a in the Hash Table.]

Fig. 1. The proposed structure based on counting Bloom filters.

operation and can be viewed as a matrix, which consists of p parallel
submatrixes in order to represent p attributes. A submatrix is composed of q
parallel arrays and can be used to represent one attribute. An array consists
of m counters and is related to one hash function, so the q parallel arrays
correspond to q hash functions. Assume that $a_i$ is the ith attribute of item
a. We use $H[i][j](a_i)$ ($1 \le i \le p$, $1 \le j \le q$) to represent the
hash value computed by the jth hash function for the ith attribute of item a.
Thus, each submatrix has q × m counters, and the PBF, composed of p
submatrixes, utilizes p × q × m counters to store the items with p attributes.
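The layout just described can be sketched as a nested array of counters. The hash family `H` below (salted SHA-256), the helper names, and the example item are assumptions for illustration; the paper does not specify concrete hash functions here. Indices are 0-based in the code, whereas the text counts from 1.

```python
import hashlib

def H(i, j, attribute, m):
    """Hypothetical stand-in for H[i][j](a_i): maps an attribute to one of m counters."""
    digest = hashlib.sha256(f"{i}:{j}:{attribute}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m

def make_pbf(p, q, m):
    # counters[i][j] is the j-th array (m counters) of the i-th submatrix.
    return [[[0] * m for _ in range(q)] for _ in range(p)]

def touched_counters(item, q, m):
    # One (submatrix, array, position) triple per attribute and hash function.
    return [(i, j, H(i, j, a_i, m)) for i, a_i in enumerate(item) for j in range(q)]

p, q, m = 3, 4, 32
counters = make_pbf(p, q, m)
item = ("10.0.0.1", 80, "tcp")          # illustrative item with p = 3 attributes
for i, j, pos in touched_counters(item, q, m):
    counters[i][j][pos] += 1            # counting-Bloom-filter style increment

total = sum(len(array) for sub in counters for array in sub)
increments = sum(c for sub in counters for array in sub for c in array)
print(total)        # 384 = p * q * m counters in the whole PBF
print(increments)   # 12 = p * q counters touched by one item
```

Storing one item touches exactly one counter in each of the p × q arrays, independent of how many items are already stored.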

The hash table contains the verification values, which can be used to verify

the inherent dependency among different attributes from one item. We measure

the verification values as a function of the hash values. Let
$v_i = F(H[i][j](a_i))$ be the verification value of the ith attribute of
item a. The verification value of item a can be computed by
$V_a = \sum_{i=1}^{p} v_i$, which can be inserted into the hash table for
future dependency tests.

3.2 Role of Hash Table

The fundamental role of the hash table is to verify the inherent dependency

of all attributes for an item and avoid the query collision. The main reason

for the query collision in terms of multiple attributes is that the dependency

among multiple attributes is lost after we insert p attributes into p independent

submatrixes, respectively. Then, the PBF only knows the existence of attributes

and cannot determine whether those attributes belong to one item. Meanwhile,

the verification based on PBF itself is not enough to distinguish attributes from

one item to another. Therefore, the hash table can be used to confirm whether

the queried multiple attributes belong to one item.


Thus, a query receives the answer True only if it passes a two-step
verification process. First, we check whether the queried attributes exist in
the PBF. Second, we verify whether these attributes belong to a single item
based on the verification value in the hash table.
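The two steps can be sketched as follows. To keep the example checkable by hand, it uses a toy arithmetic function `toy_hash` in place of real hash functions, integer attributes, and small parameters; all of these are assumptions for illustration. The function F is taken to be the weighted sum that Section 3.3 goes on to define, and the code is not the authors' algorithm.

```python
from fractions import Fraction

P, Q, M = 2, 2, 17   # p attributes, q hash functions, m counters per array

def toy_hash(i, j, a):
    # Toy, hand-checkable stand-in for H[i][j](a_i); NOT a real hash function.
    return (a * (2 * j + 3) + i) % M

def hashes(item):
    return [[toy_hash(i, j, a) for j in range(Q)] for i, a in enumerate(item)]

def verification_value(hs):
    # Sum of H[i][j](a_i) / 2^j with 1-based j, kept exact with Fraction.
    return sum(Fraction(h, 2 ** (j + 1)) for row in hs for j, h in enumerate(row))

pbf = [[[0] * M for _ in range(Q)] for _ in range(P)]
hash_table = set()

def add(item):
    hs = hashes(item)
    for i, row in enumerate(hs):
        for j, pos in enumerate(row):
            pbf[i][j][pos] += 1
    hash_table.add(verification_value(hs))

def attributes_in_pbf(item):
    return all(pbf[i][j][pos] > 0
               for i, row in enumerate(hashes(item)) for j, pos in enumerate(row))

def query(item):
    # Step 1: every attribute must be present in its own submatrix.
    if not attributes_in_pbf(item):
        return False
    # Step 2: the attributes must jointly match a stored verification value.
    return verification_value(hashes(item)) in hash_table

add((1, 2))
add((3, 4))
print(attributes_in_pbf((1, 4)), query((1, 4)))   # True False
print(query((1, 2)), query((3, 4)))               # True True
```

The mixed query (1, 4) passes the first step because both attributes were stored, but its verification value 41/4 matches neither stored value (9 and 63/4), so the second step rejects it.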

3.3 Verification Value

Traditionally, the hash values computed by hash functions are only used to

update the location counters in the counting Bloom filters. In the proposed

structure, we utilize the hash values to generate the verification values, which

can stand for existing items.

The basic method of generating the verification value is to add all the hash
values and store their sum in the hash table. For example, the value of
variable $v_i$ is $v_i = F(H[i][j](a_i)) = \sum_{j=1}^{q} H[i][j](a_i)$ for
the ith attribute of item a. In this case, the function F is a sum operation.
Then, the verification value of item a is
$V_a = \sum_{i=1}^{p} \sum_{j=1}^{q} H[i][j](a_i)$. Thus, $V_a$ can be inserted
into the hash table and stands for an existing item a. However, in the basic
method, the values computed by different hash functions may be the same, and
their sums may be the same, too. Thus, different items might hold the same
verification value in the hash table, which leads to a verification collision.

The improved method utilizes the sequential information of hash functions
to distinguish the verification values of different items. We allocate
different weights to sequential hash functions in order to reflect the
difference among hash functions. As for the ith attribute of item a, the value
from the jth hash function in the ith submatrix is defined as
$\frac{H[i][j](a_i)}{2^j}$, which is similar to the idea of the attenuated
Bloom filters [17]. In attenuated Bloom filters, higher filter levels are
attenuated with respect to earlier filter levels, and the structure is a lossy
distributed index. Therefore, as for the item a, the verification value of the
ith attribute is defined as
$v_i = F(H[i][j](a_i)) = \sum_{j=1}^{q} \frac{H[i][j](a_i)}{2^j}$. The
verification value of item a is
$V_a = \sum_{i=1}^{p} \sum_{j=1}^{q} \frac{H[i][j](a_i)}{2^j}$. This
verification value of item a can be inserted into the hash table.
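The difference between the basic and the improved method can be checked on a small example. The hash values below are hand-picked for illustration rather than produced by real hash functions, and the function names are hypothetical. Each inner list holds the q hash values of one attribute.

```python
from fractions import Fraction

def basic_value(hash_values):
    """V_a as a plain sum over all hash values H[i][j](a_i)."""
    return sum(h for row in hash_values for h in row)

def attenuated_value(hash_values):
    """V_a with the j-th hash value (1-based j) weighted by 1 / 2^j."""
    return sum(Fraction(h, 2 ** j)
               for row in hash_values
               for j, h in enumerate(row, start=1))

# Two items with p = 2 attributes and q = 2 hash functions (illustrative values).
item_x = [[3, 5], [7, 1]]
item_y = [[5, 3], [1, 7]]

print(basic_value(item_x), basic_value(item_y))            # 16 16
print(attenuated_value(item_x), attenuated_value(item_y))  # 13/2 11/2
```

The plain sums collide, while the weighted sums differ because the same values arrive from different hash functions. Under this reading of the formula, the weights depend only on j, so the method lowers the chance of a collision without ruling it out; for example, hash values [2, 0] and [0, 4] for a single attribute both give 1.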

4 Operations on Data Structure

Given a certain item a, it has p attributes and each attribute can be represented

using q hash functions as shown in Figure 1. We denote its verification value by

$V_a$, which is initialized to zero. Meanwhile, we can implement the corresponding

operations, such as adding, querying and removing items, with a complexity of

O(1) in the parallel Bloom filters and the hash table.

4.1 Adding Items

Figure 2 presents the algorithm of adding items in the proposed structure. We

need to compute the hash values of multiple attributes by hash functions and