A hybrid searching scheme in unstructured P2P networks.
A Hybrid Searching Scheme in Unstructured P2P Networks∗
Xiuqi Li and Jie Wu
Department of Computer Science and Engineering
Florida Atlantic University
Boca Raton, FL 33431
{xli, jie}@cse.fau.edu
Abstract
The existing searching schemes in Peer-to-Peer (P2P)
networks are either forwarding-based or non-forwarding-based.
In forwarding-based schemes, queries are forwarded
from the querying source to the query destination
nodes. These schemes offer low state maintenance.
However, querying sources do not entirely have control
over query processing. In non-forwarding based methods,
queries are not forwarded, and the querying source directly
probes its neighbors for the desired files. Non-forwarding
searching provides querying sources flexible control over
the searching process at the cost of high state maintenance.
In this paper, we seek to combine the powers of both forwarding
and non-forwarding searching schemes. We propose
an approach where the querying source directly probes
its own extended neighbors and forwards the query to a
subset of its extended neighbors and guides these neighbors
to probe their own extended neighbors on its behalf.
Our approach can adapt query processing to the popular-
ity of the sought files without having to maintain a large set
of neighbors because its neighbors’ neighbors are also in
the searching scope due to the 1-hop forwarding inherent
in our approach. It achieves a higher query efficiency than
the forwarding scheme and a better success rate than the
non-forwarding approach. To the best of our knowledge,
the work in this paper is the first one to combine forwarding
and non-forwarding P2P searching schemes. Experimental
results demonstrate the effectiveness of our approach.
1. Introduction
Peer-to-Peer (P2P) networks have been widely used for
information sharing. In such systems, all nodes play equal
roles and the need for expensive servers is eliminated. P2P
∗This work was supported in part by NSF grants CCR 9900646, CCR
0329741, ANI 0073736, and EIA 0130806.
networks are overlay networks, where each overlay link
is actually a sequence of links in the underlying network.
P2P networks are self-organized, distributed, and decentral-
ized. In addition, they can gather and harness the tremen-
dous computation and storage resources on computers in
the entire network. P2P networks can be classified as un-
structured, loosely structured, and highly structured based
on the control over data location and network topology [7].
In this paper, we are concerned with unstructured P2Ps be-
cause they are the most widely used systems in practice. In
such systems, no rule exists that defines where data is stored
and the network topology is arbitrary.
Searching is one of the most important operations in
P2P networks. Most existing P2P searching techniques are
based on forwarding [7]. In such schemes, a query is for-
warded on the overlay from the querying source toward the
querying destinations where the desired data items are lo-
cated. The query forwarding stops when the termination
condition is satisfied. Forwarding schemes offer low state
maintenance. Each node only needs to keep a small number
of neighbors. However, the querying source has no control
over query processing. Once the query is forwarded, the
querying source has no influence on the number of nodes
that receive the query and in which order these nodes re-
ceive the query. Too many nodes are searched for popular
data items while not enough nodes are examined for rare
ones. Therefore, the forwarding-based approach does not
offer query flexibility and has low query efficiency.
Recently, non-forwarding schemes were proposed in [2]
[12]. In these approaches, queries are not forwarded. In-
stead, the querying source directly probes its neighbors for
the data items it desires. Thus the querying source has full
control over query processing. The extent of a search is de-
termined by the querying source. For popular items, only
a small number of nodes need to be searched. For rare
items, a large number of nodes are queried. No resource
is wasted to search for popular items. However, to find
rare items, each node has to maintain (dynamically recruit)
a large number of living neighbors because it relies solely
on its own neighbors for finding a data item. The system
must either incur a large overhead to keep a large number
of neighbors alive or, with a low state maintenance overhead,
leave queries unsatisfied because the number of living
neighbors that a node is aware of is not enough for finding
rare items.
In this paper, we seek to combine these two schemes
to get their advantages while lowering their disadvantages.
Our goals are to advocate the integration of both schemes,
to explore different methods for integration, and to evalu-
ate the integrated schemes. We propose an approach that is
a unification of direct query probing and guided 1-hop for-
warding. Given a query, the querying source directly probes
its own extended neighbors for the desired files and for-
wards the query to a selected number of neighbors. These
neighbors will probe their own extended neighbors on be-
half of and under the guidance of the querying source and
will not forward the query further. When the query termi-
nation condition is satisfied, the querying source terminates
its own probing and the probing of its neighbors.
The main contributions of this paper are the following:
• We identify the necessity to integrate both the forward-
ing schemes and non-forwarding schemes into one ap-
proach.
• We devise a hybrid approach that combines both the
forwarding and non-forwarding schemes. This hybrid
approach achieves query flexibility, query efficiency,
and query satisfaction without a large state mainte-
nance overhead. To the best of our knowledge, this
work is the first one to combine both schemes.
• We investigate different design tradeoffs in integrating
the forwarding and non-forwarding approaches. These
choices include constant integration and adaptive in-
tegration. We point out their pros and cons and offer
some practical advice in applying them to real world
systems.
• We put forward two new policies for recruiting new
neighbors, called Most Files Shared in Neighborhood
(MFSN) and Most Query Results in Neighborhood
(MQRN). Nodes with more files and more past
query results in their neighborhoods are recruited first.
• We evaluate our hybrid approach against both the for-
warding schemes and non-forwarding schemes and
demonstrate the performance improvement in our hy-
brid approach through simulations.
This paper is organized as follows. In Section 2, the forwarding
and non-forwarding searching schemes in unstructured
P2P networks are reviewed. In Section 3, the proposed
hybrid approach is overviewed and contrasted with the forwarding
and non-forwarding schemes. In Section 4, the details
of the hybrid approach, such as action queue computation,
different integration design choices including constant
integration and adaptive integration, and state maintenance,
are discussed. In Section 5, the experimental setup
and results are described. At the end, our work is summarized
and a future plan is identified.
2 Related work
Most searching schemes in unstructured P2P networks
are forwarding-based, including iterative deepening [11],
local indices [11], k-walker random walk [8], modified
random BFS [6], two-level k-walker random walk [5], directed
BFS [11], intelligent search [6], routing indices based
search [3], adaptive probabilistic search [9], and dominating
set based search [13]. These schemes are different varia-
tions of flooding used in Gnutella [1]. They can be classi-
fied as deterministic or probabilistic [7].
In contrast, there are only two non-forwarding schemes
for searching unstructured P2Ps in the research literature.
The non-forwarding concept was first proposed in GUESS
[2]. In this approach, each node fully controls the entire process
of its own queries. Each node directly probes its own
neighbors in a sequential order until the query is satisfied or
until all neighbors have been probed. The query fails in the
latter case. Each node uses a link cache to keep information
about its neighbors, which includes the IP, the time stamp,
the number of files shared, and the number of results from
the most recent query. There is one entry for each neigh-
bor in the link cache. These link cache entries are refreshed
through periodic pings. In addition, to add new neighbors
into the link cache, each node also requests that its neigh-
bors select a certain number of their own link cache entries
and return them in the pongs during the periodic pings.
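The sequential probing described above can be sketched as follows. This is a minimal illustration, not the GUESS implementation: the names `LinkCacheEntry` and `sequential_probe`, the `has_result` callback that stands in for a network probe, and the `wanted` threshold are all assumptions introduced here. The fields of the entry follow the four items listed in the text.

```python
from dataclasses import dataclass


@dataclass
class LinkCacheEntry:
    ip: str            # neighbor address
    timestamp: float   # time stamp kept for the neighbor
    num_files: int     # number of files shared by the neighbor
    num_results: int   # number of results from the most recent query


def sequential_probe(link_cache, has_result, wanted=1):
    """Probe neighbors one at a time, in cache order.

    Stops when `wanted` results are found (query satisfied) or when every
    neighbor has been probed (query fails). `has_result(ip)` is a
    hypothetical stand-in for sending a probe and receiving a reply.
    Returns (results, number_of_probes, satisfied).
    """
    results, probes = [], 0
    for entry in link_cache:
        probes += 1
        if has_result(entry.ip):
            results.append(entry.ip)
            if len(results) >= wanted:
                return results, probes, True
    return results, probes, False


cache = [
    LinkCacheEntry("10.0.0.1", 1.0, 5, 0),
    LinkCacheEntry("10.0.0.2", 2.0, 9, 1),
    LinkCacheEntry("10.0.0.3", 3.0, 2, 0),
]
outcome = sequential_probe(cache, lambda ip: ip == "10.0.0.2")
```

The order in which the cache is traversed is exactly what a QueryProbe-style policy would control; here the cache is simply scanned in list order.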
Because of the overhead of link cache maintenance, the
link cache size cannot be too large. To compensate for this
limitation, when a neighbor is probed during the processing
of a query, it also returns some of its own link cache entries
in a separate query pong message. These link cache entries
are stored in another cache, called the query cache. Each entry
in the query cache has content similar to that of a link
cache entry. Some entries in the query cache may be moved to
the link cache. However, the entries in the query cache are
not maintained.
The performance of GUESS is improved in [12], which
emphasizes the impacts of different design choices, called
policies, in non-forwarding schemes. The policies are clas-
sified into five types: QueryProbe, QueryPong, PingProbe,
PingPong, and CacheReplacement. For each policy type,
many specific policies may be adopted. Five common policies,
which include random (RAN), most recently used
(MRU), least recently used (LRU), most files shared (MFS),
and most results (MR), are proposed for these policy types.
Figure 1. The three types of P2P searches. (a) forwarding
based. (b) non-forwarding based. (c) hybrid. Legend: querying
source; node probed only; node forwarded only; probed and
forwarded node; forwarded and forwarding node.
3 Outline of the hybrid search
Figure 1 illustrates the differences between the three
types of searching approaches, forwarding based, non-
forwarding based, and hybrid. In the figure, a node’s chil-
dren refer to some or all its neighbors in the P2P overlay.
Forwarding-based searching can be regarded as a D-level
tree rooted at the querying source as shown in Figure 1(a).
D refers to the maximum TTL value. The querying source,
denoted by a triangle, checks its local datastore and for-
wards the query to its children nodes. These children, de-
noted by solid squares, look up their local datastores and
forward the query to their own children. This process con-
tinues until the search terminates successfully at a leaf node
that is not at Level-D or the search fails at a leaf node that is
at Level-D. It is observed that once the query is forwarded,
the querying source cannot control how the nodes on this
tree process the query. Each node just needs to maintain a
small number of neighbors because nodes within D hops of
the querying source are potentially in the searching scope.
Non-forwarding based searching is shown in Figure 1(b).
It is a 1-level tree rooted at the querying source. The query-
ing source directly probes its child nodes for the desired
files. These children only search their local datastores and
do not send the query further. The querying source termi-
nates the search when the query is satisfied or when all its
neighbors are probed. Only the querying source and its di-
rect neighbors are involved in the processing of a particu-
lar query. Therefore, each node must maintain a sufficient
number of live neighbors. These neighbors are dynamically
recruited and updated via periodical ping-probes and ping-
pongs.
Figure 2. An example of action queue computation. (a)
the querying source A and its 2-hop neighborhood, and the
file distribution. (b) the computed action queue (the intermediate
and final results). Legend: P: Probe only; F: Forward
only; PF: Probe&Forward.
Intermediate AQ (head to tail): P_B1, P_B4, F_B3, F_B4, F_B2, P_B3, F_B1, P_B2.
Final AQ (head to tail): P_B1, P_B4, PF_B3, F_B4, PF_B2, F_B1.
The hybrid searching is illustrated in Figure 1(c). It is
a 2-level tree rooted at the querying source. The query-
ing source directly probes the nodes at Level-1 of the tree.
In the meantime, it also forwards the query to the internal
nodes at Level-1 and guides these nodes to probe the
nodes at Level-2 on its behalf. The querying source ter-
minates the search when the query is satisfied or when all
its neighbors and its neighbors’ neighbors are probed. The
maximum searching scope for a query in this approach is
the 2-hop neighborhood of the querying source.
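The difference in maximum searching scope between the three approaches can be illustrated with a small breadth-first traversal. This is a sketch under stated assumptions: the overlay below and the helper `nodes_within` are hypothetical, and D = 3 is an arbitrary choice for the forwarding case.

```python
from collections import deque


def nodes_within(adj, source, max_hops):
    """Return the nodes within max_hops overlay hops of source (source excluded)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen - {source}


# Hypothetical overlay: A is the querying source.
overlay = {
    "A": ["B1", "B2"],
    "B1": ["A", "C1", "C2"],
    "B2": ["A", "C3"],
    "C1": ["B1", "D1"],
    "C2": ["B1"],
    "C3": ["B2"],
    "D1": ["C1"],
}

non_forwarding_scope = nodes_within(overlay, "A", 1)  # 1-level tree
hybrid_scope = nodes_within(overlay, "A", 2)          # 2-level tree
forwarding_scope = nodes_within(overlay, "A", 3)      # D-level tree, D = 3 assumed
```

With the same neighbor set per node, the hybrid scope already covers the neighbors' neighbors, which is why a node can keep fewer neighbors than in the non-forwarding approach.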
Like the non-forwarding approach, a node in the hybrid
approach maintains an extended neighbor set and dynam-
ically recruits and updates this neighbor set via periodic
ping-probes and ping-pongs. However, the hybrid approach
can achieve the same or higher query satisfaction with fewer
neighbors per node. Compared to the forwarding-based ap-
proach, the querying source in the hybrid approach can con-
trol the extent of the searching.
To combine the forwarding and non-forwarding schemes
smoothly, the hybrid search is implemented as follows. It
considers three types of actions: probing only, forwarding
only, and probing and forwarding. Probing only means that the
querying source probes its neighbors and these neighbors
look up their local datastores. Forwarding only means that
the querying source does not probe its neighbors but guides
its neighbors to probe their own neighbors on its behalf.
Probing and forwarding means the combination of the first
two actions.
When processing a query, the querying source first ranks
these three types of actions if performed on all its neighbors
and organizes these actions into an action queue. Two
examples of action queues are shown in Figure 2(b). The
final AQ (action queue) contains six actions listed in
descending order of their ranks: probe node B1, probe node
B4, probe and forward to node B3, forward to node B4,
probe and forward to node B2, and forward to node B1.
The querying source then takes the actions in this queue in order.
It can take actions at a constant rate of k1 actions at
once, which is called constant integration. It can also take
actions at a variable rate depending on the rareness of the
sought files, which is referred to as adaptive integration.
The querying source terminates the entire searching process
when the query is satisfied or when all actions in the queue
have been taken.
The action ranking considers both the costs and gains of
actions. The cost of an action is the time (in terms of the
number of overlay hops) it takes for that action to be completed.
The gain of an action is the estimated probability that
the action returns query results, which is determined
by the system policies. These policies can also be used by
the neighbors of the querying source for probing their own
neighbors on behalf of the querying source.
To keep information about neighbors, each node actively
maintains a link cache. There is one entry per neighbor.
These entries are periodically updated (deleting dead entries,
replacing existing entries using new entries) according
to system policies. We propose two new policies, Most Files
Shared in Neighborhood (MFSN) and Most Query Results
in Neighborhood (MQRN).
4 The hybrid search
The hybrid search involves the querying source and its
neighbors. The processing at these nodes is shown in Algorithm 1
and Algorithm 2. Given a query q, the querying
source s first computes the action queue AQ based on the
discussion in section 4.1. If constant integration is adopted,
s takes the first k1 actions in AQ at the same time, where k1 is
a system parameter. P, F, or PF messages are sent to
the intended neighbors according to the action types. When
a neighbor v receives a P or PF message, it looks up its datastore and
returns the query results if there are any. When v receives an F
or PF message, it probes its own neighbors on behalf of s
with k2 neighbors per probe, where k2 is also a system parameter.
If s receives any query result from a neighbor v, s stores that
result. If adaptive integration is employed, s follows the detailed
algorithm in section 4.2. When q is satisfied, s stops
its own probing and the probing performed by its neighbors
on its behalf.
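The constant-integration case described above can be sketched as a small synchronous simulation. This is a simplification under stated assumptions, not the authors' implementation: message passing is replaced by direct function calls, a forwarded neighbor finishes its probing within the round in which it is contacted, and the names `neighbor_probe`, `hybrid_search_constant`, `holders` (the set of nodes holding a matching file), and `wanted` are hypothetical.

```python
def neighbor_probe(v, neighbors_of, holders, k2, needed):
    """Neighbor v probes its own neighbors on behalf of s, k2 nodes per probe.

    Stops early once `needed` matching nodes are found.
    """
    hits = []
    nbrs = neighbors_of.get(v, [])
    for i in range(0, len(nbrs), k2):
        hits += [u for u in nbrs[i:i + k2] if u in holders]
        if len(hits) >= needed:
            break
    return hits


def hybrid_search_constant(aq, neighbors_of, holders, k1, k2, wanted=1):
    """Take k1 actions per round from the action queue until q is satisfied.

    `aq` is a list of (kind, node) pairs with kind in {"P", "F", "PF"}.
    Returns (results, rounds_used).
    """
    results, rounds = [], 0
    aq = list(aq)
    while len(results) < wanted and aq:
        batch, aq = aq[:k1], aq[k1:]
        rounds += 1
        for kind, v in batch:
            # P and PF: v looks up its own datastore.
            if kind in ("P", "PF") and v in holders and v not in results:
                results.append(v)
            # F and PF: v probes its neighbors on behalf of s.
            if kind in ("F", "PF"):
                needed = max(1, wanted - len(results))
                for u in neighbor_probe(v, neighbors_of, holders, k2, needed):
                    if u not in results:
                        results.append(u)
    return results, rounds


# Final action queue from Figure 2(b); the 2-hop neighbors are hypothetical.
final_aq = [("P", "B1"), ("P", "B4"), ("PF", "B3"),
            ("F", "B4"), ("PF", "B2"), ("F", "B1")]
two_hop = {"B3": ["C1", "C2"], "B4": ["C3", "C4", "C5", "C6"]}
found = hybrid_search_constant(final_aq, two_hop, {"C2"}, k1=2, k2=2)
```

In this toy run the first round (two probes) finds nothing, and the second round reaches the file through the forwarding part of the PF action on B3.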
Algorithm 1 The hybrid search at the querying source s
Compute the action queue AQ for the query q based on the description in section 4.1;
if the integration design is constant then
    while q is not satisfied AND AQ is not empty do
        remove the first k1 actions from AQ and store them in the array ACTk;
        for i = 0 to k1 − 1 do
            if ACTk[i] is ProbeOnly then
                send P message to the intended node;
            else if ACTk[i] is ForwardOnly then
                send F message to the intended node;
                add this node to the set FWDed;
            else
                send PF message to the intended node;
                add this node to the set FWDed;
            end if
        end for
        if s receives query results from a neighbor v then
            store the query results in the array QRes;
            if v has probed all its neighbors then
                remove v from the set FWDed;
            end if
        end if
    end while
else
    call the algorithm adaptive integration search in section 4.2;
end if
if q is satisfied then
    order each node in FWDed to stop probing on behalf of s;
end if

Algorithm 2 The hybrid search at the querying source s’s neighbor v
if v receives a P message then
    v checks its local datastore and returns a query result to s if the result is found;
else if v receives an F message then
    v probes its own neighbors on behalf of s at the rate of k2 nodes per probe;
else
    v checks its local datastore and returns a query result to s if the result is found;
    v probes its own neighbors on behalf of s at the rate of k2 nodes per probe;
end if

4.1 Action queue computation
The action queue is computed based on the gain/cost ratios
of the actions if they are performed on the querying
source’s neighbors. We intend to use the number of query
results per hop as the gain/cost ratio. The cost of an action
is the time (in terms of the number of overlay hops) taken
for that action to be completed. The gain of an action is
the estimated probability of that action for returning query
results. This probability is computed based on the system
policy on estimating nodes’ query-answering ability. Possi-
ble policies are random (RAN), most recently used (MRU),
most files (MF), and most query results (MR). The action
queue computation algorithm varies according to the cho-
sen system policy.
If the system policy is random, the action queue is a ran-
dom sequence of ProbeOnly actions on all neighbors of
the querying source s followed by a random sequence of
ForwardOnly actions on those neighbors. If the system
policy is most recently used, the action queue is a sequence
of ProbeOnly actions on s’s neighbors, followed by a sequence
of ForwardOnly actions on those neighbors. Both
sequences are sorted in descending order of the timestamp
at which s last interacted with these neighbors, regardless
of which party initiated the interaction. To reduce the query
traffic, no Probe&Forward action is involved in these two
policies.
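The two simple policies above can be sketched as follows. The function names, the `(kind, node)` tuple representation, and the `last_seen` mapping from neighbor to interaction timestamp are assumptions made for illustration only.

```python
import random


def action_queue_random(neighbors, rng=None):
    """RAN policy: random ProbeOnly sequence, then random ForwardOnly sequence."""
    rng = rng or random.Random()
    probes, forwards = list(neighbors), list(neighbors)
    rng.shuffle(probes)
    rng.shuffle(forwards)
    return [("P", v) for v in probes] + [("F", v) for v in forwards]


def action_queue_mru(last_seen):
    """MRU policy: both sequences sorted by descending interaction timestamp."""
    order = sorted(last_seen, key=last_seen.get, reverse=True)
    return [("P", v) for v in order] + [("F", v) for v in order]


mru_queue = action_queue_mru({"B1": 10.0, "B2": 30.0, "B3": 20.0})
ran_queue = action_queue_random(["B1", "B2", "B3"], random.Random(0))
```

In both queues every ProbeOnly action precedes every ForwardOnly action, so no Probe&Forward action can arise.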
If the system policy is most files, the action queue is
computed according to Algorithm 3. The gain/cost ratio of
a ProbeOnly action on a neighbor v, denoted by PGCR_v,
is computed using the following formula. NumF_v represents
the gain of the action. It is the number of files on node
v. The denominator 2 is the cost of this action, namely 2 overlay hops.
$$PGCR_v = \frac{NumF_v}{2}$$
The gain/cost ratio of a ForwardOnly action on a
neighbor v, denoted by FGCR_v, is calculated according
to the following formula. NB_v refers to the set of neighbors
of node v. NumF_u refers to the number of files on u.
d_v represents the degree of node v. k2 is the system parameter
mentioned earlier. The gain of this action is the total
number of files on v’s neighbors. The cost of this action is
the denominator, where 1 means that it takes one hop for the
querying source s to send an F message to v, 2d_v/k2 represents
the time taken for v to finish probing all its neighbors
at the rate of k2 nodes per probe, d_v/k2 denotes the time
taken for v to return all query results found on its neighbors
to s, and γ refers to the penalty weighting factor because
probing and forwarding are considered together in action
ranking.
$$FGCR_v = \frac{\sum_{u \in NB_v} NumF_u}{\gamma\,(1 + 2d_v/k_2 + d_v/k_2)}$$
If the system policy is most query results, the action
queue computation is similar to that of most files. The only
difference is that the numbers of files on nodes u and v are
replaced by the numbers of query results for the most recent
query on u and v, respectively.

Algorithm 3 The action queue computation at the querying
source s for policies MF and MR
compute the gain/cost ratios of the actions ProbeOnly and ForwardOnly if performed on each neighbor v;
sort these actions in the descending order of their gain/cost ratios and store the result in the linked list AQ;
if a node v exists such that the action ForwardOnly to v precedes the action Probe v Only in AQ then
    replace the action ForwardOnly to v by Probe v and Forward to v;
    remove the action Probe v Only from AQ;
end if
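The two gain/cost ratios can be written as small functions. This is a sketch only: the function names are hypothetical, and the degree d_v is taken to be the length of the list of neighbor counts passed in. The sample values are those of node B4 in the worked example of this section (80 files on B4; 60, 100, 80, and 80 files on its neighbors; k2 = 2; γ = 2).

```python
def probe_gcr(gain_v):
    """PGCR_v: gain on node v divided by the probe cost of 2 overlay hops."""
    return gain_v / 2


def forward_gcr(neighbor_gains, k2, gamma):
    """FGCR_v: total gain on v's neighbors over gamma * (1 + 2*d_v/k2 + d_v/k2)."""
    d_v = len(neighbor_gains)  # degree of v (assumed equal to the list length)
    cost = gamma * (1 + 2 * d_v / k2 + d_v / k2)
    return sum(neighbor_gains) / cost


# Under the most files policy the gains are file counts; under the most query
# results policy they would be result counts for the most recent query.
pgcr_b4 = probe_gcr(80)
fgcr_b4 = forward_gcr([60, 100, 80, 80], k2=2, gamma=2)
```

For B4 the probe ratio is 40 and the forward ratio is 320/14, which rounds to 23.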
An example of action queue computation is shown in
Figure 2 and Table 1. Suppose that the querying source
A, its neighbors B1, B2, B3, B4, and its neighbors’ neighbors
are as shown in Figure 2(a). The number
next to each node refers to the number of files on that
node. Assume that the system policy for estimating nodes’
query-answering ability is most files, k2 = 2, and γ = 2.
We first consider the ProbeOnly and ForwardOnly actions
if performed on each neighbor of A. The gain/cost
ratios of these actions are illustrated in Table 1. Take
node B4 as an example. The gain/cost ratio of the action
Probe B4 only is 80/2 = 40. The gain/cost ratio of the
action Forward to B4 only is
$$\frac{60 + 100 + 80 + 80}{2\,\left(1 + \frac{2 \cdot 4}{2} + \frac{4}{2}\right)} \approx 23.$$
Then we sort these actions in the descending order of their
gain/cost ratios and get the intermediate action queue as
shown in Figure 2(b). Because the Forward to B3 only
(F_B3) action appears before the Probe B3 only (P_B3) action
in the intermediate AQ, they are combined into one action,
Probe B3 and Forward to B3 (PF_B3). Similarly,
the actions F_B2 and P_B2 are combined into the action
PF_B2. The final AQ is shown in Figure 2(b).
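The sort-and-merge procedure of Algorithm 3 can be sketched as below. Two points are assumptions rather than statements from the text: the merge is applied to every neighbor whose ForwardOnly action precedes its ProbeOnly action (as the worked example does for both B3 and B2), and all ratios other than the two B4 values (40 and 320/14) are hypothetical numbers chosen only so that the ordering matches the example. The function name and tuple representation are also hypothetical.

```python
def build_action_queue(probe_ratio, forward_ratio):
    """Sort P and F actions by gain/cost ratio, then merge F-before-P pairs into PF.

    Both dicts map each neighbor to a ratio and must share the same keys.
    Returns (intermediate_queue, final_queue) as lists of (kind, node).
    """
    actions = [("P", v, r) for v, r in probe_ratio.items()]
    actions += [("F", v, r) for v, r in forward_ratio.items()]
    actions.sort(key=lambda a: a[2], reverse=True)
    intermediate = [(kind, v) for kind, v, _ in actions]

    pos = {action: i for i, action in enumerate(intermediate)}
    final = []
    for kind, v in intermediate:
        forward_first = pos[("F", v)] < pos[("P", v)]
        if kind == "F" and forward_first:
            final.append(("PF", v))   # replace ForwardOnly by Probe&Forward
        elif kind == "P" and forward_first:
            continue                   # drop the now redundant ProbeOnly
        else:
            final.append((kind, v))
    return intermediate, final


# B4 values come from the example; the other ratios are illustrative only.
probe = {"B1": 50.0, "B2": 5.0, "B3": 15.0, "B4": 40.0}
forward = {"B1": 10.0, "B2": 20.0, "B3": 30.0, "B4": 320 / 14}
intermediate_aq, final_aq = build_action_queue(probe, forward)
```

With these inputs the final queue contains the six actions listed in Section 3: probe B1, probe B4, probe and forward to B3, forward to B4, probe and forward to B2, and forward to B1.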
4.2 Integration design
We consider two ways to integrate forwarding and probing:
constant integration and adaptive integration. In constant
integration, the querying source s takes actions in the
action queue at a constant speed (k1 actions each time, where
k1 is determined experimentally). In adaptive integration, s
adjusts its action-taking progress according to the rareness
of the sought files. The rarer the files, the more progressive the search. There