
Towards an immunity based distributed algorithm to detect harmful files shared in P2P networks


Abstract

Owing to their free and self-organized nature, Peer-to-Peer (P2P) file sharing networks have become one of the major transmission channels for harmful content, such as child pornography and abuse videos. Traditional monitoring techniques deploy powerful centralized servers at gateways to analyse and filter P2P traffic. However, the immense number of documents shared and transferred in P2P networks makes these techniques costly and inefficient. To address this problem, we develop iDetect, a distributed harmful content detection algorithm inspired by the clonal selection mechanism of immune systems. Analogous to B lymphocytes secreting antibodies against antigens in the human body, the P2P clients deployed with iDetect cooperate to detect harmful content in a distributed and self-organized manner. We build a probability model of the detection procedure to establish the performance of iDetect theoretically, and we conduct simulations to compare iDetect with traditional centralized filtering algorithms. The theoretical analysis and experimental results show that iDetect is efficient, effective, self-optimized and scalable in locating the clients sharing harmful content in P2P networks.
Peer-to-Peer Netw. Appl.
DOI 10.1007/s12083-013-0221-7
Jianming Lv · Zhiwen Yu · Tieying Zhang
Received: 17 September 2012 / Accepted: 14 June 2013
© Springer Science+Business Media New York 2013
J. Lv · Z. Yu
School of Computer Science and Engineering,
South China University of Technology,
Guangzhou, 510006 China
e-mail: jmlv@scut.edu.cn
Z. Yu
e-mail: zhwyu@scut.edu.cn
T. Zhang
Institute of Computing Technology,
Chinese Academy of Sciences,
Beijing, China
e-mail: zhangtieying@ict.ac.cn
Keywords Immunity · Peer-to-peer · File sharing · Clonal selection
1 Introduction
Many studies [1–3] show that a large amount of harmful and illegal content, such as child pornography and abuse videos, is shared and exchanged in public P2P file sharing networks. Because such content is so easily accessed in the open P2P infrastructure, it has had dangerous side effects on a significant proportion of young Internet users.
Some recent studies [4–7] present filtering methods to block harmful content transferred in P2P networks. All of them require deploying powerful centralized servers at gateways to inspect incoming and outgoing P2P traffic. With millions of clients sharing billions of videos, images, audio files and text documents in the network, inspecting all of this content is prohibitively expensive. The frequent updating of shared files also makes it difficult to find the harmful ones in time.
To overcome this limitation, we present iDetect, a distributed detection algorithm inspired by the clonal selection mechanism of immune systems. We consider the harmful content shared in P2P networks as analogous to antigens in the human body, and model the detection tasks performed by each client as antibodies. By adopting iDetect, the clients in a P2P network can cooperate to detect harmful content, much as B lymphocytes secrete antibodies against antigens.
Compared with the traditional centralized filtering methods [4–7], iDetect achieves the following advantages:
1) The detection of shared content is self-organized among a huge number of clients working simultaneously in a distributed manner, which is much more cost-effective and robust than the centralized filtering methods. When 10 % of the clients of a P2P network with 10,000 on-line users take part in the distributed detection, each of them only needs to detect 27 files on average before all harmful clients in the network are found. In the same amount of time and in the same P2P network, the centralized filtering algorithms can only detect 2 % of the harmful clients, even when using a powerful server whose computing ability is 10 times that of a normal client.
2) Benefiting from the cloning and mutation mechanisms, the iDetect algorithm lets the antibodies on each client evolve so that detection rapidly focuses on harmful content providers. Experiments show that once a harmful file is found, several more are detected within a short period of time. This locality phenomenon accounts for the high efficiency of iDetect.
3) The iDetect algorithm is highly scalable. It requires almost the same time to locate all harmful content providers while the system scale grows exponentially. In contrast, in the centralized filtering algorithms, the time increases linearly with the system scale.
4) A probability model of iDetect is built to analyse the relationship between the detection efficiency and system configurations such as the ratio of clients providing harmful content, the ratio of harmful files on each harmful client, the maximum clone rate and the generation size. Comprehensive simulations are also conducted to verify the theoretical analysis; they show that a higher ratio of harmful clients and harmful files in the network may lead to higher detection efficiency. Moreover, the maximum clone rate and the generation size are two important tunable parameters for achieving faster retrieval of harmful files.
This paper extends a 6-page short conference paper [37]. Beyond the original version, this paper builds a probability model to establish the performance of iDetect theoretically, discusses the implementation issues of iDetect in real P2P networks, and adds further well-designed experiments to validate the efficiency and effectiveness of iDetect.
The remainder of the paper is organized as follows. Section 2 introduces the related works. Section 3 provides a brief discussion of the clonal selection algorithm. Section 4 describes the model of iDetect. Section 5 analyses the performance theoretically. Section 6 discusses several implementation issues. Section 7 evaluates the performance of our approach through simulation experiments. Section 8 concludes the paper and describes possible future works.
2 Related work
The topic of harmful content detection has been extensively discussed in the field of Internet web filtering. Some detection algorithms [13, 14] have been presented to analyse the content of web pages and perform online filtering of pornographic material. M. Friedman et al. [36] present a clustering methodology to detect anomalous documents downloaded from the Internet by users. Chen et al. [23] provide an overview of the state of the art in web filtering deployment. All of these algorithms are implemented on centralized servers to detect and monitor illegal web sites.
With the exponential growth of P2P file sharing networks [10–12], a substantial amount of harmful documents is transferred through them. A P2P network is a distributed architecture in which all clients are connected and cooperate in an equally privileged manner. Not only can each client share documents with others, but it can also search for and download its favourite content from others.
From a certain perspective, harmful content detection in a P2P network amounts to searching for the harmful content shared within it. Many techniques [27–34] have been presented to search for specific content in P2P networks. Specifically, S. Ratnasamy et al. [27] and I. Stoica et al. [28] present structured P2P overlays to facilitate efficient key-based search, in which each shared file is indexed by keys and the search is also based on those keys. C. Tang et al. [29] and P. Reynolds et al. [30] introduce information retrieval mechanisms to search the content of text documents shared in P2P networks. W. Müller et al. [31] and D. Picard et al. [32] present intelligent algorithms to search for images in P2P networks. Moreover, the studies [33–35] use heuristic algorithms based on the shared content and the search history of each client to improve search performance. The search patterns of the above techniques are document keywords or reference images, which are used to find the most similar items shared in the network. In the harmful content scenario of this paper, it is hard to define such search patterns because of the variety and unpredictability of harmful content.
More closely related detection algorithms [4–7] present centralized infrastructures to analyse and block harmful content shared in P2P networks. Specifically, Nam et al. [4] propose a P2P traffic sensor to filter adult videos, images and text documents in P2P networks. The sensor is responsible for identifying and analysing P2P traffic and blocking the transmission of harmful information. Liu et al. [5] present an algorithm to filter illegal files by collecting the signatures of transferred files. Once the signature of a transferred file matches one in the database of harmful file signatures, the system performs filtering immediately. A novel solution is presented by Lee et al. [6], which introduces the idea of a honeypot into P2P networks: fake P2P service farms are deployed and used to monitor and trace users who spread illegal or harmful files. Cruz et al. [7] design a tool to detect child pornography in P2P networks without violating the privacy of legal users. The system enables its operators to use pedophilic files, gathered by other means beyond the scope of the system, as search patterns to find the P2P clients sharing these illicit materials.
Unlike the centralized infrastructures proposed in the related works above, the iDetect algorithm is based on the theory of Artificial Immune Systems (AIS) and runs on the P2P clients in a fully distributed way. According to [20, 22], “Artificial Immune Systems (AIS) are adaptive systems, inspired by theoretical immunology and observed immune functions, principles and models, which are applied to problem solving.” Many anomaly detection algorithms [15–21] based on AIS have been presented recently. Different from the harmful content detection presented in this paper, anomaly detection is mainly deployed to detect intrusions against a computer system; AIS is applied in these algorithms to model the normal state of the system and detect anomalous actions.
3 The clonal selection algorithm
The clonal selection theory was first proposed by Burnet [8]. It describes how animal bodies respond to invading antigens by producing antibodies through B lymphocytes. Antibodies are proteins that can recognize invading antigens, which are viewed as foreign material in the body. The effectiveness of an antibody in recognizing an antigen is defined as the affinity of the antibody. B lymphocytes secreting higher-affinity antibodies are selected to be cloned more, while those secreting low-affinity antibodies are eventually eliminated. In addition, a process called affinity maturation keeps the antibodies mutating at a certain rate, which maintains the diversity of the generated antibodies. B lymphocytes can also differentiate into long-lived memory B cells that secrete high-affinity antibodies. This learning mechanism enables the immune system to respond very quickly to recurring antigens.
Based on the clonal selection theory, De Castro presents the clonal selection algorithm [9], whose procedure is illustrated in Fig. 1. Given a collection of antibodies $I$ in step (1), the top $n$ antibodies with the highest affinity are first selected as $I_n$ in step (2); $I_n$ is called a generation of antibodies. Then the antibodies in $I_n$ are cloned in step (3) and mutated in step (4) according to their affinity. In step (5), the antibodies with the highest affinity are re-selected and added back into the collection. Finally, in step (6), a set $I_d$ of novel antibodies is selected to replace the antibodies with low affinity.
Fig. 1 The clonal selection algorithm
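To make steps (1) to (6) concrete, the following Python sketch runs a minimal clonal selection loop on a toy problem. The objective (maximizing the number of 1-bits in a bit string), the parameter values and the clone/mutation schedules are illustrative assumptions rather than settings from [9]; the sketch only mirrors the select, clone, mutate, re-select and replace structure of Fig. 1.

```python
import random
from typing import List, Tuple

def affinity(ab: List[int]) -> int:
    """Toy affinity (illustrative assumption): the number of 1-bits."""
    return sum(ab)

def mutate(ab: List[int], rate: float, rng: random.Random) -> List[int]:
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in ab]

def clonal_selection(length: int = 20, pop_size: int = 10, n: int = 4,
                     d: int = 2, max_clones: int = 5, generations: int = 50,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    rng = random.Random(seed)

    def random_ab() -> List[int]:
        return [rng.randint(0, 1) for _ in range(length)]

    I = [random_ab() for _ in range(pop_size)]            # step (1)
    history = []                                          # best affinity per generation
    for _ in range(generations):
        I.sort(key=affinity, reverse=True)
        I_n = I[:n]                                       # step (2): a generation
        clones = []
        for rank, ab in enumerate(I_n):
            copies = max(1, max_clones - rank)            # step (3): more clones for higher affinity
            rate = 0.05 * (rank + 1)                      # step (4): less mutation for higher affinity
            clones += [mutate(ab, rate, rng) for _ in range(copies)]
        clones.sort(key=affinity, reverse=True)
        # step (5): re-select the best clones and add them back into the collection
        I = sorted(I + clones[:n], key=affinity, reverse=True)[:pop_size]
        I[-d:] = [random_ab() for _ in range(d)]          # step (6): replace the d worst
        history.append(max(affinity(ab) for ab in I))
    return max(I, key=affinity), history

best, history = clonal_selection()
print(affinity(best), history[:5])
```

Because the best antibody is never among the $d$ replaced ones, the best affinity in the collection can only stay the same or improve from one generation to the next.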
4 Model
4.1 System overview
In a P2P file sharing network, as illustrated in Fig. 2, each client can not only share files with other clients but also search for and download favourite files from others [10–12]. Without any constraints on file sharing, it is very easy to propagate harmful multimedia documents in the network.
We present the iDetect algorithm to detect and filter harmful content, as illustrated in Fig. 3. Assume that a portion of the clients in the P2P network install the iDetect plug-in module, which can be distributed by the P2P software company or a third-party supervisory service. Each iDetect module works together with the original P2P client to detect the content shared by other clients in the network. In this distributed way, the tremendous centralized detection cost is saved and replaced by lightweight detection tasks run on a large number of clients. Once a client finds a harmful file, it reports the result to the centralized mediator server. The mediator server then sends a warning to the harmful file releaser and prevents it from accessing the P2P network until the harmful file is removed.
Fig. 2 P2P file sharing network
Fig. 3 Distributed detection in iDetect
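The reporting path can be sketched as a small mediator that records reports and blocks a client until every reported file has been removed. The paper does not specify the mediator's interface, so the class and method names below (`Mediator`, `report`, `file_removed`, `may_access`) are hypothetical, and "sending a warning" is modelled simply as appending to a list.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

class Mediator:
    """Hypothetical mediator server: a client is blocked while it still
    shares any file that an iDetect client has reported as harmful."""

    def __init__(self) -> None:
        self.flagged: DefaultDict[str, Set[str]] = defaultdict(set)  # client -> reported files
        self.warnings: List[Tuple[str, str]] = []                   # warnings sent to releasers

    def report(self, client: str, file_id: str) -> None:
        """Called by an iDetect client that has found a harmful file."""
        if file_id not in self.flagged[client]:
            self.flagged[client].add(file_id)
            self.warnings.append((client, file_id))  # warn the releaser once per file

    def file_removed(self, client: str, file_id: str) -> None:
        """Called once the releaser has removed the reported file."""
        self.flagged[client].discard(file_id)

    def may_access(self, client: str) -> bool:
        """Access is restored only when no reported file remains."""
        return not self.flagged.get(client)

m = Mediator()
m.report("P4", "F5")
print(m.may_access("P4"))   # False until F5 is removed
m.file_removed("P4", "F5")
print(m.may_access("P4"))   # True
```

Deduplicating reports per (client, file) pair keeps repeated detections of the same file by many iDetect clients from generating repeated warnings.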
4.2 iDetect model
The distributed detection algorithm should achieve the following goals:
G1: Harmful files should be detected as soon as possible in order to reduce the side effects caused by the propagation of harmful content.
G2: It should be guaranteed that all clients sharing harmful content are detected.
G3: Once a client has detected a file, it should memorize the detection result, so that when it encounters the same file on another client, it can judge whether the file is harmful much more quickly.
To achieve the above goals, we propose the iDetect model, based on the clonal selection algorithm [9], to schedule the detection tasks among clients. Just as B cells in the immune system generate antibodies to recognize invading antigens, the clients installing the iDetect module execute detection tasks to find harmful files shared in the network. We borrow some concepts from immune systems to describe the iDetect algorithm, as listed in Table 1.
Table 1 Definitions in iDetect
Concept | Definition in immune systems | Definition in iDetect
Antibody | Protein generated by a B cell to recognize an antigen | A task executed by a client to detect a file
Antigen | Foreign material stimulating an immune response of the body | A harmful file shared in the network
Affinity | Effectiveness of an antibody in recognizing antigens | Effectiveness of a detection task in finding harmful files
Cloning | Generating a copy of an antibody | Generating a copy of a detection task
Mutation | A change of an antibody | A change of a detection task
Specifically, each harmful file shared by a client is defined as an antigen (Ag), and a task to detect a file shared by some client is defined as an antibody (Ab). Each antibody is coded as a tuple $Ab_i=(P_j,F_k)$, which indicates a task to detect the file $F_k$ shared by the client $P_j$ ($1\le j\le N$, $1\le k\le\Gamma$). Here $N$ is the total number of clients in the P2P network and $\Gamma$ is the number of files shared by the client $P_j$. Each client installing iDetect keeps generating and executing antibodies to detect the files shared by other P2P clients.
The affinity of an antibody is defined as follows to measure how effective a detection task is at finding harmful files:
$$f(Ab_i)=\begin{cases}0, & \text{if } F_k \text{ is harmless}\\ sh(F_k), & \text{if } F_k \text{ is harmful}\end{cases}\qquad(1)$$
Here $sh(F_k)$ denotes the number of clients sharing $F_k$. If $F_k$ is harmful, the affinity of $Ab_i$ is $sh(F_k)$, which reflects the popularity of the harmful file. Otherwise, if $F_k$ is harmless, the affinity is 0, which means the task is ineffective in finding any harmful file.
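Eq. 1 translates directly into code. In the sketch below, the dictionary `network` (client id to the set of shared file ids) and the set `harmful` are hypothetical stand-ins for knowledge that a real client only obtains by browsing peers and running detections; the `Antibody` tuple mirrors $Ab_i=(P_j,F_k)$.

```python
from typing import Dict, NamedTuple, Set

class Antibody(NamedTuple):
    """Ab_i = (P_j, F_k): a task to detect file F_k shared by client P_j."""
    client: str
    file: str

def sh(network: Dict[str, Set[str]], file_id: str) -> int:
    """sh(F_k): the number of clients sharing file_id."""
    return sum(file_id in files for files in network.values())

def affinity(ab: Antibody, network: Dict[str, Set[str]], harmful: Set[str]) -> int:
    """Eq. 1: sh(F_k) if F_k is harmful, otherwise 0."""
    return sh(network, ab.file) if ab.file in harmful else 0

# Hypothetical network: client id -> ids of the files it shares.
network = {"P2": {"F4", "F5"}, "P3": {"F7"}, "P4": {"F5", "F10"}}
harmful = {"F4", "F5", "F10"}
print(affinity(Antibody("P4", "F5"), network, harmful))  # 2: F5 is harmful and shared by P2 and P4
print(affinity(Antibody("P3", "F7"), network, harmful))  # 0: F7 is harmless
```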
As in the clonal selection algorithm, our method adopts cloning and mutation operations. Cloning an antibody means generating a copy of the detection task run on a client. The number of copies of an antibody is defined as its clone rate, which is calculated as follows:
$$\delta(Ab_i)=\begin{cases}\alpha, & f(Ab_i)>0\\ 1, & f(Ab_i)=0\end{cases}\qquad(\alpha>0)\qquad(2)$$
Here $\alpha$ is the maximum clone rate. Equation (2) implies that antibodies that have found a harmful file (positive affinity) are cloned into more copies to accelerate the detection of correlated harmful files.
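The cloning step of Eq. 2 can be sketched as follows, assuming (as a number of copies requires) that $\alpha$ is a positive integer. The example reproduces iteration 1 of Fig. 4b, where the harmless antibody keeps one copy and the harmful one is duplicated; the affinity value 2 passed for $(P_4,F_5)$ is illustrative, since any positive affinity yields $\alpha$ copies.

```python
from typing import List, Sequence, Tuple

Antibody = Tuple[str, str]   # (client, file)

def clone_rate(aff: float, alpha: int) -> int:
    """Eq. 2: alpha copies if the antibody is matched (affinity > 0), else 1."""
    if alpha < 1:
        raise ValueError("alpha is assumed to be a positive integer")
    return alpha if aff > 0 else 1

def clone(antibodies: Sequence[Antibody], affinities: Sequence[float],
          alpha: int) -> List[Antibody]:
    """Cloning step: repeat each antibody clone_rate(...) times."""
    return [ab for ab, aff in zip(antibodies, affinities)
            for _ in range(clone_rate(aff, alpha))]

# Iteration 1 of Fig. 4b with alpha = 2: F7 is harmless, F5 is harmful.
print(clone([("P3", "F7"), ("P4", "F5")], [0, 2], alpha=2))
# [('P3', 'F7'), ('P4', 'F5'), ('P4', 'F5')]
```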
On the other hand, mutating an antibody means changing the task to detect another file. For any antibody $Ab_i=(P_j,F_k)$, three kinds of mutation operations are defined as follows:
Client mutation: the client $P_j$ is changed to a different client $P'_j$ sharing $F_k$, and the file $F_k$ is changed to any other undetected file $F'_k$ on $P'_j$. This means detecting another client that shares the same file $F_k$.
File mutation: the file $F_k$ is changed to a different file shared by the client $P_j$. This means detecting another file on the same client $P_j$.
All mutation: the client $P_j$ is changed to any other client $P'_j$ in the P2P network, and the file $F_k$ is changed to any file $F'_k$ shared by $P'_j$. This means detecting a different file shared by any other client.
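The three operators can be sketched over a hypothetical in-memory view of the network. The names `client_mutation`, `file_mutation` and `all_mutation`, the `detected` set of already-examined (client, file) pairs, and the choice to return `None` when no candidate exists are our assumptions; the paper does not say what happens in that case.

```python
import random
from typing import Dict, Optional, Set, Tuple

Antibody = Tuple[str, str]        # (client P_j, file F_k)
Network = Dict[str, Set[str]]     # client id -> ids of the files it shares

def client_mutation(ab: Antibody, net: Network, detected: Set[Antibody],
                    rng: random.Random) -> Optional[Antibody]:
    """Move to another client P'_j sharing F_k and pick an undetected file on it."""
    client, file_id = ab
    options = [(c, f) for c in sorted(net) if c != client and file_id in net[c]
               for f in sorted(net[c]) if (c, f) not in detected]
    return rng.choice(options) if options else None

def file_mutation(ab: Antibody, net: Network,
                  rng: random.Random) -> Optional[Antibody]:
    """Keep client P_j and switch to a different file it shares."""
    client, file_id = ab
    options = [(client, f) for f in sorted(net[client]) if f != file_id]
    return rng.choice(options) if options else None

def all_mutation(ab: Antibody, net: Network, rng: random.Random) -> Antibody:
    """Switch to any file shared by any other client (assumes one exists)."""
    client, _ = ab
    options = [(c, f) for c in sorted(net) if c != client for f in sorted(net[c])]
    return rng.choice(options)

net = {"P2": {"F4", "F5"}, "P3": {"F7"}, "P4": {"F5", "F10"}, "P5": {"F13"}}
rng = random.Random(0)
print(client_mutation(("P4", "F5"), net, {("P2", "F5")}, rng))  # ('P2', 'F4')
print(file_mutation(("P4", "F5"), net, rng))                    # ('P4', 'F10')
```

In this toy layout each of the first two calls has exactly one candidate, which is why the results match the Client and File mutations of $(P_4,F_5)$ described for Fig. 4b.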
Each antibody takes exactly one of the mutations mentioned above. Specifically, the probability of performing Client mutation is defined as follows:
$$\chi(Ab_i)=\frac{1}{2}\left(1-e^{-f(Ab_i)}\right)\qquad(3)$$
The probability of performing File mutation is equal to that of Client mutation:
$$\gamma(Ab_i)=\chi(Ab_i)\qquad(4)$$
The probability of performing All mutation is defined as follows:
$$\lambda(Ab_i)=e^{-f(Ab_i)}\qquad(5)$$
Since $\chi(Ab_i)+\gamma(Ab_i)+\lambda(Ab_i)=1$, the three probabilities form a distribution over the mutation types. According to Eq. 1, antibodies with higher affinity correspond to harmful files shared by more clients. By Eqs. 3 and 4, these antibodies perform File mutation and Client mutation with higher probability.
File mutation changes the antibody to detect another file shared by the same client. Because users tend to share multiple files on a topic, a harmful client usually shares more than one harmful file. With File mutation, detection can be guided to quickly cover the other harmful files shared on the same client.
Client mutation detects another client sharing the same harmful file. Once a harmful file is detected, Client mutation helps to find more harmful files in a broader scope. By combining File mutation with Client mutation, the hunt for harmful files can be accelerated.
Unlike the two mutation operations above, which change an antibody to a correlated one, All mutation changes the antibody to a totally different one. All mutation guarantees the diversity of antibodies and makes the scope of detection as large as possible.
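The choice among the three mutations (Eqs. 3 to 5) can be sketched as a draw from a three-way distribution. The helper names are ours; `random.choices` accepts zero weights as long as their total is positive, which is the case here because $\lambda(Ab_i)>0$ for any finite affinity.

```python
import math
import random
from typing import Tuple

def mutation_probabilities(aff: float) -> Tuple[float, float, float]:
    """Return (chi, gamma, lambda) from Eqs. 3-5 for an antibody's affinity."""
    lam = math.exp(-aff)          # Eq. 5: All mutation
    chi = 0.5 * (1.0 - lam)       # Eq. 3: Client mutation
    gamma = chi                   # Eq. 4: File mutation
    return chi, gamma, lam

def pick_mutation(aff: float, rng: random.Random) -> str:
    """Draw one mutation type; chi + gamma + lambda = 1 for any affinity."""
    chi, gamma, lam = mutation_probabilities(aff)
    return rng.choices(["client", "file", "all"], weights=[chi, gamma, lam])[0]

for aff in (0, 1, 2, 5):
    chi, gamma, lam = mutation_probabilities(aff)
    print(f"f={aff}: client={chi:.3f} file={gamma:.3f} all={lam:.3f}")
```

A mismatched antibody (affinity 0) always takes All mutation, whereas an antibody whose harmful file is shared by five clients takes All mutation with probability $e^{-5}\approx 0.007$ and otherwise stays close to the harmful content.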
4.3 The iDetect algorithm
As illustrated in Algorithm 1, the iDetect algorithm is designed to schedule and execute the detection tasks on each client. To aid understanding, a simple example of a P2P file sharing network with five peers is shown in Fig. 4 to illustrate how antibodies evolve on each iDetect client. Specifically, the algorithm contains the following main steps:
1) Initializing the antibodies. Each client installing iDetect randomly selects $n$ other clients from the P2P network and requests the lists of their shared file names. From each list, the client picks one file to construct an antibody. Figure 4b shows the evolution of the antibodies on the peer $P_1$. $P_1$ randomly selects two clients, $P_3$ and $P_4$, from the network, and constructs two antibodies, $(P_3,F_7)$ and $(P_4,F_5)$. Here $F_7$ is a file randomly selected from $P_3$, and $F_5$ is randomly selected from $P_4$.
2) Detection. For each antibody, the client inspects the content of the corresponding file and decides whether it is harmful. In a real implementation, the detection techniques presented in [4, 25] can be adopted here. As shown in Fig. 4b, $P_1$ has two antibodies, $(P_3,F_7)$ and $(P_4,F_5)$, at the beginning of iteration 1. $P_1$ then detects the files $F_7$ on $P_3$ and $F_5$ on $P_4$, and decides that $F_5$ is harmful.
3) Memorization. Once a client detects a harmful file, it memorizes the file name and MD5 signature. When the client encounters the same file shared by other clients, it can quickly judge that the file is harmful.
4) Selection. As in the clonal selection algorithm, the $n$ antibodies with the highest affinity are selected to undergo the subsequent cloning and mutation operations. As illustrated in Fig. 4b, $P_1$ has only two antibodies, $(P_3,F_7)$ and $(P_4,F_5)$, at the beginning of iteration 1, so both of them are selected for the following cloning operation.
5) Cloning. Antibodies with higher affinity are cloned into more copies according to the clone rate function $\delta(Ab_i)$ defined in Eq. 2. In Fig. 4b, $P_1$ calculates the clone rates of the two antibodies $(P_3,F_7)$ and $(P_4,F_5)$. Because $F_7$ is harmless, the clone rate of $(P_3,F_7)$ is 1 according to Eq. 2, and only one copy is generated. On the other hand, the clone rate of $(P_4,F_5)$ is $\alpha=2$, because $F_5$ is harmful. Thus, $(P_4,F_5)$ is duplicated.
6) Mutation. In this step, each antibody performs one of the following mutation operations: Client mutation, File mutation or All mutation. The probabilities of performing these operations are defined in Eqs. 3, 4 and 5. Figure 4b shows an example of the mutation. When $P_1$ is in the mutation stage of iteration 1, the antibody $(P_3,F_7)$ performs All mutation and is changed to $(P_5,F_{13})$. The first copy of $(P_4,F_5)$ performs File mutation and is changed to $(P_4,F_{10})$. The other copy of $(P_4,F_5)$ performs Client mutation and is changed to $(P_2,F_4)$.
Fig. 4 An example of the iDetect algorithm. (a) A P2P file sharing network containing five clients, $P_1$ to $P_5$, which share the files $F_1$ to $F_{13}$. Among these files, $F_4$, $F_5$ and $F_{10}$ are harmful and the others are harmless. $P_1$ is deployed with iDetect. The parameters are set as follows: $n=2$, $\alpha=2$. (b) The evolution of the antibodies on $P_1$ while running the iDetect algorithm. Each iteration contains the following main stages: detection, selection, cloning and mutation.
The iDetect algorithm repeats the above steps while the client is on-line. As shown in Fig. 4b, after finishing the mutation in iteration 1, $P_1$ goes on to run iteration 2. The files in all three antibodies, $(P_5,F_{13})$, $(P_4,F_{10})$ and $(P_2,F_4)$, are detected. According to Eq. 1, the affinities of $(P_5,F_{13})$, $(P_4,F_{10})$ and $(P_2,F_4)$ are 0, 1 and 2, respectively. At the selection stage, $(P_4,F_{10})$ and $(P_2,F_4)$ are selected because they have the highest affinity. Each of them is then duplicated in the cloning stage. In the mutation stage, the four resulting antibodies are mutated into different ones.
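Putting steps 1) to 6) together, one client's loop can be sketched as below. This is not the authors' implementation: the global `network` dictionary and the `is_harmful` callback are hypothetical stand-ins for peer browsing and a real content detector, $sh(F_k)$ is read directly from `network` for brevity, and the fallback to an arbitrary file when a mutation has no candidate is our assumption.

```python
import math
import random
from typing import Callable, Dict, List, Set, Tuple

Antibody = Tuple[str, str]           # (client, file)
Network = Dict[str, Set[str]]        # client id -> ids of shared files

class IDetectClient:
    """Single-client sketch of the iDetect loop (hypothetical stand-ins:
    `network` for peer browsing, `is_harmful` for a content detector)."""

    def __init__(self, me: str, network: Network, is_harmful: Callable[[str], bool],
                 n: int = 2, alpha: int = 2, seed: int = 0):
        self.me, self.net, self.is_harmful = me, network, is_harmful
        self.n, self.alpha = n, alpha
        self.rng = random.Random(seed)
        self.memory: Set[str] = set()         # step 3: files memorized as harmful
        self.detected: Set[Antibody] = set()  # (client, file) pairs already examined
        self.found: Set[Antibody] = set()     # harmful (client, file) pairs found
        peers = sorted(c for c in network if c != me)
        # Step 1: n random clients, one random shared file from each.
        self.antibodies: List[Antibody] = [
            (c, self.rng.choice(sorted(network[c])))
            for c in self.rng.sample(peers, min(n, len(peers)))]

    def affinity(self, ab: Antibody) -> int:
        """Eq. 1; sh(F_k) is read from the global network for brevity."""
        f = ab[1]
        if f not in self.memory:
            return 0
        return sum(f in files for files in self.net.values())

    def detect(self, ab: Antibody) -> None:
        """Steps 2-3: detect the file, reusing memorized results."""
        self.detected.add(ab)
        if ab[1] in self.memory or self.is_harmful(ab[1]):
            self.memory.add(ab[1])
            self.found.add(ab)

    def mutate(self, ab: Antibody) -> Antibody:
        """Step 6: choose a mutation type with Eqs. 3-5, then apply it."""
        client, f = ab
        lam = math.exp(-self.affinity(ab))
        kind = self.rng.choices(["client", "file", "all"],
                                weights=[(1 - lam) / 2, (1 - lam) / 2, lam])[0]
        others = [c for c in sorted(self.net) if c not in (client, self.me)]
        if kind == "client":
            options = [(c, g) for c in others if f in self.net[c]
                       for g in sorted(self.net[c]) if (c, g) not in self.detected]
        elif kind == "file":
            options = [(client, g) for g in sorted(self.net[client]) if g != f]
        else:
            options = [(c, g) for c in others for g in sorted(self.net[c])]
        if not options:  # assumption: fall back to any file on any other peer
            options = [(c, g) for c in sorted(self.net) if c != self.me
                       for g in sorted(self.net[c])]
        return self.rng.choice(options)

    def iterate(self) -> None:
        """One iteration: detection, selection, cloning and mutation."""
        for ab in self.antibodies:
            self.detect(ab)
        ranked = sorted(self.antibodies, key=self.affinity, reverse=True)
        generation = ranked[: self.n]                         # step 4
        clones = [ab for ab in generation                     # step 5 (Eq. 2)
                  for _ in range(self.alpha if self.affinity(ab) > 0 else 1)]
        self.antibodies = [self.mutate(ab) for ab in clones]  # step 6

# Hypothetical file layout in the spirit of Fig. 4a (harmful: F4, F5, F10).
network = {"P1": {"F1", "F2"}, "P2": {"F3", "F4", "F5"},
           "P3": {"F6", "F7", "F8"}, "P4": {"F5", "F9", "F10"},
           "P5": {"F4", "F11", "F12", "F13"}}
p1 = IDetectClient("P1", network, {"F4", "F5", "F10"}.__contains__, n=2, alpha=2)
for _ in range(20):
    p1.iterate()
print(sorted(p1.found))
```

Each iteration keeps at most $n$ antibodies in the generation and at most $\alpha n$ after cloning, which is the quantity bounded in Theorem 1.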
5 Theoretical analysis
In this section, we analyse the detection efficiency of the iDetect algorithm. We denote the total number of clients as $N$ and the ratio of harmful clients in the network as $\eta$ ($0<\eta<1$). The average number of files shared by each client is $\Gamma$. For each harmful client, the average ratio of harmful files shared on the client is $\rho$ ($0<\rho<1$). Moreover, $\omega N$ ($0<\omega<1$) clients install the iDetect module. The average time to detect each file is $t$.
For clarity, the notation used in this section is summarized in Table 2.
We present the following Theorem 1 on the complexity of Algorithm 1.
Theorem 1 The total number of antibodies created by a client running iDetect is $O(\alpha n r)$, and the time consumed by the client to perform detection is $O(\alpha n r t)$, where $n$ is the generation size, $r$ is the number of iterations run on the client and $t$ is the average time to detect each file.
Proof At the beginning of each iteration of the algorithm, the $n$ antibodies with the highest affinity are selected from the candidates. At the cloning stage of the iteration, each of these $n$ antibodies, $Ab_i$, is cloned into $\delta(Ab_i)$ copies. According to Eq. 2, $\delta(Ab_i)\le\alpha$ (taking $\alpha\ge 1$, since a clone rate is a number of copies). Thus, at most $\alpha n$ antibodies are created by a client in one iteration, and after running $r$ iterations, the total number of antibodies created by a client is $O(\alpha n r)$. Each antibody created by the client corresponds to one detected file, so the detection time consumed by the client is $O(\alpha n r t)$.
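As a worked instance of Theorem 1, take the parameters of the example in Fig. 4 ($n=2$, $\alpha=2$) and, purely for illustration, $r=10$ iterations:

$$\alpha n = 2\times 2 = 4\ \text{antibodies per iteration},\qquad \alpha n r = 4\times 10 = 40\ \text{antibodies},\qquad \alpha n r t = 40t\ \text{detection time}.$$

The per-iteration bound of four antibodies matches iteration 2 of Fig. 4b, where the two selected antibodies are each duplicated into four.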
Theorem 1 shows that the detection time on each client grows linearly with the number of iterations, which can be tuned to control the resources each client spends on detection. To infer how many iterations are needed to detect the harmful files shared in the network, we present the following Theorem 2.
Table 2 Notation in Algorithm 1
Symbol | Description
$\alpha$ | Maximum clone rate
$n$ | Generation size: in each iteration, the top $n$ antibodies with the highest affinity are selected to construct the antibody generation
$N$ | Total number of clients
$\Gamma$ | Number of files shared by a client
$\eta$ | Ratio of harmful clients
$\rho$ | Ratio of harmful files on a harmful client
$\omega$ | Ratio of clients installing iDetect
$t$ | Average time to detect each file
$r$ | Number of iterations run on a client
Theorem 2 If $n<N$, then for any harmful file $F_x$ shared by any client $P_j$ in the network, the expected number of iterations $R$ that must run before $F_x$ is detected satisfies:
$$R\le\left(1-e^{-\omega n\Gamma^{-1}}\right)^{-1}\qquad(6)$$
Proof To prove the theorem, we build a probability model of the evolutionary procedure of the iDetect algorithm, as shown in Fig. 5. During each evolutionary iteration, each antibody evolves through four main stages: initialization, selection, cloning and mutation. An antibody is called matched if its detected file is harmful; otherwise, it is called mismatched.
At the initialization stage, $n$ random antibodies are constructed, corresponding to detection tasks that scan $n$ files randomly selected from the network. At the selection stage, the $n$ antibodies with the highest affinity are selected. At the cloning stage, each matched antibody is cloned into $\alpha$ copies, while the mismatched ones remain unchanged. At the mutation stage, each antibody performs one of the mutations, Client mutation, File mutation or All mutation, with the corresponding probabilities $\chi(Ab_i)$, $\gamma(Ab_i)$ and $\lambda(Ab_i)$ defined in Eqs. 3, 4 and 5. Specifically, the mismatched antibodies only perform All mutation: their affinity is 0, so $\lambda(Ab_i)=e^{0}=1$ according to Eq. 5.
Any antibody that takes Client mutation or File mutation is changed to detect another file on a harmful client. The probability that this file is harmful is $\rho$, the ratio of harmful files on a harmful client, so the mutated antibody is matched with probability $\rho$. On the other hand, any antibody that takes All mutation is changed to detect any other file on any client. Because the ratio of harmful clients in the network is $\eta$, the mutated antibody is matched with probability $\eta\rho$.
Given any harmful file $F_x$ shared by any client $P_j$ in the network, we denote by $p_{i,k}(Ab_x)$ the probability that the antibody $Ab_x=(P_j,F_x)$ is created by the iDetect client $P_i$ in the $k$-th iteration ($k\ge 0$). Here the 0-th iteration means the initialization stage before the first iteration. At this stage, $n$ files are randomly selected to initialize the corresponding $n$ antibodies, so the probability of $P_i$ generating $Ab_x$ in the 0-th iteration is:
$$p_{i,0}(Ab_x)=n(N\Gamma)^{-1}\qquad(7)$$
For the $k$-th iteration ($k>0$) run on $P_i$, because Cloning, Client mutation and File mutation generate a matched antibody with a higher probability than random selection, we have:
$$p_{i,k}(Ab_x)\ge p_{i,0}(Ab_x)=n(N\Gamma)^{-1}\qquad(8)$$
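We denote by $p_k(Ab_x)$ the probability that $Ab_x$ is generated in the $k$-th iteration by any client. Since $\omega N$ clients install iDetect in the network, from Eq. 8 we have:
$$p_k(Ab_x)=1-\prod_{i\in D}\left(1-p_{i,k}(Ab_x)\right)\ge 1-\left(1-n(N\Gamma)^{-1}\right)^{\omega N}=1-\left(1-c/N\right)^{\omega N}\qquad(9)$$
Here $D=\{i\mid 1\le i\le N \text{ and } P_i \text{ installs iDetect}\}$ and $c=n\Gamma^{-1}$. Because $n<N$ and $\Gamma\ge 1$, we have $c<N$. Let $g(N)=1-(1-c/N)^{\omega N}$. The derivative of $g(N)$ is:
$$g'(N)=\omega\left(g(N)-1\right)\left(\ln\left(1-c/N\right)+\frac{c}{N-c}\right)\qquad(10)$$
Given $0<c<N$, it can be proved that $g'(N)<0$: the factor $g(N)-1=-(1-c/N)^{\omega N}$ is negative, while $\ln(1-x)+x/(1-x)$ with $x=c/N\in(0,1)$ is positive, since it vanishes at $x=0$ and its derivative $x/(1-x)^2$ is positive. On the other hand, $\lim_{N\to\infty}g(N)=1-e^{-\omega c}$, so, $g$ being decreasing, $g(N)\ge 1-e^{-\omega c}$. Combined with Eq. 9, we have:
$$p_k(Ab_x)\ge g(N)\ge 1-e^{-\omega c}=1-e^{-\omega n\Gamma^{-1}}\qquad(11)$$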
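Thus, the expected number of iterations needed before $Ab_x$ is generated is:
$$\begin{aligned}R&=1+\left(1-p_1(Ab_x)\right)+\left(1-p_1(Ab_x)\right)\left(1-p_2(Ab_x)\right)+\left(1-p_1(Ab_x)\right)\left(1-p_2(Ab_x)\right)\left(1-p_3(Ab_x)\right)+\cdots\\&=1+\sum_{k=1}^{\infty}\prod_{j=1}^{k}\left(1-p_j(Ab_x)\right)\\&\le 1+\sum_{k=1}^{\infty}\prod_{j=1}^{k}e^{-\omega n\Gamma^{-1}}\\&=1+\sum_{k=1}^{\infty}\left(e^{-\omega n\Gamma^{-1}}\right)^{k}\\&=\left(1-e^{-\omega n\Gamma^{-1}}\right)^{-1}\end{aligned}\qquad(12)$$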
Once the antibody $Ab_x=(P_j,F_x)$ is generated by some client, the file $F_x$ shared on $P_j$ is detected in the same iteration. Thus, the expected number of iterations needed before $F_x$ is detected is also $R$, and Theorem 2 holds.
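The bound of Theorem 2 can be checked numerically. The sketch below evaluates $g(N)$ from Eq. 9, the right-hand side of Eq. 6, and the expected number of iterations when every iteration succeeds with probability exactly $g(N)$, the worst case permitted by Eq. 11 (for a constant success probability $p$, the series in Eq. 12 sums to $1/p$). The parameter values are arbitrary illustrations, not the settings of the experiments in Section 7.

```python
import math

def g(N: int, n: int, Gamma: float, omega: float) -> float:
    """g(N) = 1 - (1 - c/N)^(omega*N) with c = n / Gamma (Eq. 9)."""
    c = n / Gamma
    return 1.0 - (1.0 - c / N) ** (omega * N)

def r_bound(n: int, Gamma: float, omega: float) -> float:
    """Right-hand side of Eq. 6: 1 / (1 - exp(-omega * n / Gamma))."""
    return 1.0 / (1.0 - math.exp(-omega * n / Gamma))

def r_worst_case(N: int, n: int, Gamma: float, omega: float) -> float:
    """Expected iterations if every iteration succeeds with probability g(N):
    1 + sum_k (1 - g)^k = 1 / g for a constant success probability."""
    return 1.0 / g(N, n, Gamma, omega)

n, Gamma, omega = 10, 100, 0.1      # illustrative values only
for N in (100, 1_000, 10_000, 100_000):
    print(N, round(g(N, n, Gamma, omega), 6),
          round(r_worst_case(N, n, Gamma, omega), 2), round(r_bound(n, Gamma, omega), 2))
```

The printed $g(N)$ decreases with $N$ toward $1-e^{-\omega n\Gamma^{-1}}$, so the worst-case expectation approaches, but never exceeds, the bound of Eq. 6.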
Combining Theorems 1 and 2, we can prove the following Theorem 3 about the detection efficiency of iDetect.
Theorem 3 If $n<N$, then for any harmful file $F_x$ shared by any client $P_j$ in the network, the expected time to run iDetect before $F_x$ is detected is:
$$T_F=O\left(\alpha n t\left(1-e^{-\omega n\Gamma^{-1}}\right)^{-1}\right)\qquad(13)$$
Fig. 5 The evolution of an antibody in the iDetect algorithm
Proof According to Theorem 2, the expected number of iterations to run iDetect before $F_x$ is detected is:
$$R\le\left(1-e^{-\omega n\Gamma^{-1}}\right)^{-1}$$
According to Theorem 1, the time needed to run $R$ iterations is $O(\alpha n t R)$. Thus, the expected time to run iDetect before the file $F_x$ is detected is:
$$T_F=O\left(\alpha n t\left(1-e^{-\omega n\Gamma^{-1}}\right)^{-1}\right)$$
Theorem 3 allows us to compare the efficiency of iDetect with that of the centralized filtering algorithms [4–6]. In the centralized filtering algorithms, all shared files transferred in the network are randomly sampled and inspected by the servers, which decide whether the files are harmful. For any harmful file $F_x$ shared by any client $P_j$ in the network, the time to run these algorithms before $F_x$ is detected is $T=O(tN\Gamma)$, which grows in proportion to the network scale $N$. By contrast, Theorem 3 shows that the detection time $T_F$ of iDetect does not depend on $N$, but on $\omega$, the ratio of clients installing iDetect. This means that iDetect can be much more efficient than centralized filtering algorithms in large-scale networks, provided the ratio $\omega$ is kept at a certain level. This is verified by the simulations in Section 7.2.
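The contrast between the two growth rates can be sketched by evaluating both expressions with the constants hidden by $O(\cdot)$ dropped. Because those constants differ between the two approaches, only the trend is meaningful here: $T_F$ stays flat as $N$ grows, while $T$ grows linearly. The parameter values are arbitrary illustrations.

```python
import math

def t_idetect(n: int, alpha: int, t: float, Gamma: float, omega: float) -> float:
    """Eq. 13 with the O(.) constant dropped; note that N does not appear."""
    return alpha * n * t / (1.0 - math.exp(-omega * n / Gamma))

def t_centralized(N: int, Gamma: float, t: float) -> float:
    """T = t * N * Gamma for centralized random sampling, O(.) constant dropped."""
    return t * N * Gamma

n, alpha, t, Gamma, omega = 10, 2, 1.0, 100, 0.1   # illustrative values only
for N in (1_000, 10_000, 100_000, 1_000_000):
    print(N, round(t_idetect(n, alpha, t, Gamma, omega)), t_centralized(N, Gamma, t))
```

Raising $\omega$ shrinks $T_F$, which is the lever the text refers to when it asks that the ratio of clients installing iDetect be kept at a certain level.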
6 Implementation issues
In the above sections, we mainly focused on scheduling detection tasks on the clients, leaving the implementation details of the whole system to this section.
6.1 Communication of clients
As mentioned in Algorithm 1, the communication procedures required by a client installing iDetect are summarized below, together with implementation details.
1) Joining and leaving the network: The clients installing iDetect conform to the P2P file sharing protocol and join or leave the network like other file sharing clients. Because each client installing iDetect evolves its antibodies and executes the detection tasks independently, no additional maintenance operations are required to synchronize detection across clients.
2) Randomly selecting a client in the network: Most of
the file sharing networks support retrieving a list of
on-line clients in the network. For example, in BitTor-
rent [11] and Emule [12], the Distributed Hash Table
(DHT) is implemented to support distributed searching
of a numeric key and return the peers whose identities
are close to the key. The iDetect Module can invoke the
DHT API to search for a random key to get a list of
on-line clients in the network and select one randomly
from the list.
3) Randomly selecting a file shared by a client: The iDe-
tect module can invokes the peer browsing function
supported by most P2P file sharing clients, which is
used to view the sharing files ofa client. The implemen-
tation of the function is defined in the P2P file sharing
protocols, such as Emule [12] and Gnutella [10].
4) Detecting a file: While the iDetect module needs to
detect a file shared by another client, it conforms the
P2P file sharing protocol to transfer file blocks.
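Steps 2 to 4 above could be wired together roughly as follows. This is a hedged sketch, not the authors' implementation. `P2PClientAPI`, `dht_lookup`, `browse_shared_files` and `fetch_blocks` are hypothetical names standing in for a client's own DHT search, peer-browsing and block-download functions. They are not the API of Emule, BitTorrent or any real client. `classify` stands in for a content classifier such as [4, 25]. `KEY_BITS` is an assumed key width. The sketch covers only the purely random choice of peer and file.

```python
import random
from typing import Callable, List, Optional, Protocol, Sequence, Tuple


class P2PClientAPI(Protocol):
    """Hypothetical facade over a file sharing client's existing functions."""

    def dht_lookup(self, key: int) -> List[str]: ...
    def browse_shared_files(self, peer: str) -> List[str]: ...
    def fetch_blocks(self, peer: str, filename: str, block_ids: List[int]) -> List[bytes]: ...


KEY_BITS = 128  # assumed key width; use the ID length of the target DHT


def random_online_peer(api: P2PClientAPI, rng: random.Random) -> Optional[str]:
    """Step 2: search the DHT for a random key, then pick one returned peer."""
    peers = api.dht_lookup(rng.getrandbits(KEY_BITS))
    return rng.choice(peers) if peers else None


def random_shared_file(api: P2PClientAPI, peer: str, rng: random.Random) -> Optional[str]:
    """Step 3: browse the peer's shared files and pick one at random."""
    files = api.browse_shared_files(peer)
    return rng.choice(files) if files else None


def detect_random_file(
    api: P2PClientAPI,
    classify: Callable[[str, List[bytes]], bool],
    rng: random.Random,
    block_ids: Sequence[int] = (0,),
) -> Optional[Tuple[str, str, bool]]:
    """Steps 2-4: choose a peer and a file, download blocks, classify them."""
    peer = random_online_peer(api, rng)
    if peer is None:
        return None
    filename = random_shared_file(api, peer, rng)
    if filename is None:
        return None
    blocks = api.fetch_blocks(peer, filename, list(block_ids))  # step 4
    return peer, filename, classify(filename, blocks)
```

Because each call uses only the client's own protocol operations, no coordination with other detecting clients is needed. This matches step 1.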
6.2 Deployment issues
In this section, we discuss the deployment of iDetect by answering some likely questions:
(1) How can iDetect be deployed in a real P2P network? It seems hard to persuade users to install a content detection module, which may prevent them from using P2P sharing freely.
(2) Why not just make every client scan only itself for harmful content?
(3) Is the bandwidth used to download and detect content too high for an individual client, affecting the user experience?
(4) How accurately can a file be judged harmful when it is detected?
(5) How can the sharers of harmful content be banned once they have been detected?
Question (1) can be answered by the theoretical analysis in Section 5. We only require a portion of the clients in the network to install iDetect, and Theorem 3 shows that a higher ratio ($\omega$) of clients installing iDetect leads to faster detection of harmful content. The supervisory organization may cooperate with the P2P service providers to distribute a new version of the client with iDetect embedded. Since most legitimate users do not want harmful content to spread in the network, they may adopt the new version and take part in the detection.
The solution proposed in question (2) is to let each client scan only its own shared files instead of detecting other clients as iDetect does. This seems easier to implement than iDetect, but it faces obstacles in real deployment. Most popular P2P sharing protocols, such as Emule, BitTorrent and Gnutella, are public and usually have multiple compatible client programs. For example, Emule has more than 20 different but compatible client programs [38]. Users may join the network with any of these clients, so it is hard to force every client to install the iDetect module for self-detection.
As raised in question (3), the bandwidth consumed in detecting media content may be high and could be a problem for individual users. This can be addressed by using the block transfer mechanism of P2P networks. To save bandwidth, a detecting client need only download a few randomly selected blocks of a media file and inspect the frame images in those samples. Moreover, some files can be judged harmful from their names alone. For these files, there is no need to download any content to make the decision.
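The bandwidth-saving strategy above (decide from the name when possible, otherwise inspect a few randomly chosen blocks) can be sketched as follows. All names are hypothetical. `blocklist` holds placeholder terms, not a real keyword list. `classify_block` stands in for a frame classifier such as [4, 25]. Treating a single positive block as sufficient is our own simplifying assumption.

```python
import random
from typing import Callable, Iterable, List


def sample_block_ids(num_blocks: int, k: int, rng: random.Random) -> List[int]:
    """Pick up to k distinct block indices uniformly at random."""
    return sorted(rng.sample(range(num_blocks), min(k, num_blocks)))


def name_is_flagged(filename: str, blocklist: Iterable[str]) -> bool:
    """Name-only check: True if any blocklisted term occurs in the file name."""
    lowered = filename.lower()
    return any(term.lower() in lowered for term in blocklist)


def judge_file(
    filename: str,
    num_blocks: int,
    fetch_block: Callable[[int], bytes],
    classify_block: Callable[[bytes], bool],
    blocklist: Iterable[str],
    k: int,
    rng: random.Random,
) -> bool:
    """Decide from the name if possible; otherwise inspect k sampled blocks."""
    if name_is_flagged(filename, blocklist):
        return True  # no content download needed
    for block_id in sample_block_ids(num_blocks, k, rng):
        if classify_block(fetch_block(block_id)):
            return True  # one positive sample is treated as enough (assumption)
    return False
```

With this design, the download cost per file is at most `k` blocks, whatever the file size. It is zero when the name alone decides.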
The accuracy of judging a file, as raised in question (4), is determined by the adopted classification method, such as the algorithms in [4, 25]. Specifically, the detection algorithm in [25] has a false positive rate of 6.0 % and a false negative rate of 1.2 %. The algorithm in [4] has an accuracy of 93 %, as reported in [4]. Both algorithms perform well in detecting harmful content.
Question (5) is beyond the scope of this paper. As soon as a client detects a harmful file, it may report the result to a centralized mediator server maintained by a third-party supervisory organization or by the P2P service providers themselves. Further filtering actions [46] can then be taken to block the transfer of illegal content. Alternatively, the clients installing iDetect can cooperate as in [26] to stop the searching and downloading of the files concerned.
7 Experimentation and results
7.1 Experiments setup
We set up experiments to test the performance of iDetect. We simulate a P2P file sharing network with 10,000 clients. The file sharing scenario is configured according to a measurement study [24] of a real P2P file sharing network. In that study, 25 % of clients share nothing, 7 % of peers share more than 1000 files, and the other peers share 100 files or fewer. To simplify the experiments without loss of generality, 25 % of the clients in the simulation share nothing, 7 % share 1000 files, and the rest share 100 files.
We also simulate the harmful content shared in the network. Unless otherwise specified, 1 % of the clients are harmful, and 50 % of the files they share are harmful. Each harmful file is shared by 5 clients in the network on average. The parameters configured in this section are summarized in Table 3.
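A toy network with the configuration described above can be generated as follows. This is a sketch under our own assumptions, not the authors' simulator. Harmful clients are drawn only from clients that share something. Harmful files are drawn from a pool sized so that each is shared by about 5 clients. Benign files are unique to each client. `build_network` and `Client` are hypothetical names.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Client:
    cid: int
    files: List[str] = field(default_factory=list)
    harmful: bool = False


def build_network(N=10_000, eta=0.01, rho=0.5, copies=5, seed=0) -> List[Client]:
    """Toy network following the Section 7.1 setup (assumptions noted)."""
    rng = random.Random(seed)
    n_empty, n_heavy = int(0.25 * N), int(0.07 * N)
    sizes = [0] * n_empty + [1000] * n_heavy + [100] * (N - n_empty - n_heavy)
    rng.shuffle(sizes)
    # Assumption: harmful clients are drawn from clients that share something.
    sharers = [i for i, s in enumerate(sizes) if s > 0]
    harmful = set(rng.sample(sharers, int(eta * N)))
    # Size the pool of distinct harmful files so each is shared ~`copies` times.
    slots = sum(int(rho * sizes[i]) for i in harmful)
    pool = max(1, slots // copies)
    clients = []
    for cid, size in enumerate(sizes):
        c = Client(cid, harmful=cid in harmful)
        n_bad = min(int(rho * size), pool) if c.harmful else 0
        c.files = [f"harmful-{h}" for h in rng.sample(range(pool), n_bad)]
        c.files += [f"benign-{cid}-{k}" for k in range(size - n_bad)]
        clients.append(c)
    return clients
```

Under these defaults the network has 2,500 empty clients, 700 clients with 1000 files and 100 harmful clients.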
We compare iDetect with the traditional centralized filtering methods in the following experiments. In the iDetect scenario, 10 % of the clients in the network run the iDetect algorithm and detect files independently. In the centralized filtering scenario, a powerful centralized server is dedicated to randomly collecting and detecting the files shared by the clients in the network.

Table 3 The parameters configured in the experiments
Symbol   Description                                   Default value
α        Maximum clone rate                            4
n        Generation size                               10
N        Total number of clients                       10,000
Γ        Number of files shared by a client            100–1000
η        Ratio of harmful clients                      1 %
ρ        Ratio of harmful files on a harmful client    50 %
ω        Ratio of clients installing iDetect           10 %
7.2 Detection efficiency
Figure 6 shows what percentage of harmful files is detected after a given total number of files has been scanned. When 100,000 files have been scanned by iDetect, almost 100 % of the harmful files are found. This is much more efficient than the centralized filtering algorithm, which finds fewer than 10 % of the harmful files after scanning the same 100,000 files.
Specifically, to examine the contribution of the three types of mutation in the iDetect algorithm, we also construct two variants of the algorithm:
All m+File m: The All mutation and the File mutation are combined, while the Client mutation is disabled. The probability of the File mutation is defined as $\chi(Ab_i) = 1 - \exp(-f(Ab_i))$, and the probability of the All mutation as $\lambda(Ab_i) = \exp(-f(Ab_i))$. In this case, once a harmful file is detected, the detection focuses on the sharer of that file.
All m: Only the All mutation is adopted. In other words, the probability of the All mutation is $\lambda(Ab_i) = 1$. In this case, each client installing iDetect randomly selects files shared by others to detect.
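The mutation choice in the two ablation variants can be sketched as follows. The variable `affinity` plays the role of $f(Ab_i)$ and is assumed to be non-negative. The function names are hypothetical. In All m+File m, a higher affinity shifts probability from the All mutation toward the File mutation. Assuming affinity rises when an antibody finds harmful files, detection then concentrates on the sharer of those files.

```python
import math
import random
from typing import Dict


def mutation_probs(variant: str, affinity: float) -> Dict[str, float]:
    """Probabilities of each mutation type in the two ablation variants.

    `affinity` plays the role of f(Ab_i) and is assumed to be >= 0.
    """
    if variant == "All m":
        return {"All": 1.0, "File": 0.0}          # lambda(Ab_i) = 1
    if variant == "All m+File m":
        lam = math.exp(-affinity)                 # lambda(Ab_i) = exp(-f(Ab_i))
        return {"All": lam, "File": 1.0 - lam}    # chi(Ab_i) = 1 - exp(-f(Ab_i))
    raise ValueError(f"unknown variant: {variant!r}")


def choose_mutation(variant: str, affinity: float, rng: random.Random) -> str:
    """Draw one mutation type according to mutation_probs."""
    return "File" if rng.random() < mutation_probs(variant, affinity)["File"] else "All"
```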
Figure 6 shows that the efficiency of All m is close to that of the centralized filtering algorithm, because both detect files in a similar randomized manner. All m+File m captures harmful files much more efficiently, owing to the File mutation, which focuses the detection on harmful clients. The iDetect algorithm combining all three mutations is the most efficient. This demonstrates the effect of the Client mutation, which helps cover all harmful clients quickly by redirecting detection toward the different clients that share the same harmful file.
Fig. 6 Harmful file detection rate vs. total number of scanned files
Fig. 7 The function $h(F_i)$ of the $i$-th detected file $F_i$ during the detection procedure. If $F_i$ is harmful, $h(F_i)$ is 1; otherwise $h(F_i)$ is 0. (a) The detection procedure of a client installing iDetect. (b) The detection procedure in the centralized filtering system
Furthermore, to explain why iDetect locates harmful files more efficiently than the centralized filtering methods, we illustrate the details of the detection procedure in Fig. 7. The figure shows the $h(\cdot)$ function (defined in Eq. 1) for the first 200 files detected in each system. In the iDetect case, the dense lines indicate that once a harmful file is found, several more are detected within a short period. This locality is what makes iDetect so efficient. In contrast, in the centralized filtering scenario, the sparse lines show that harmful files are detected randomly and independently.
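One simple way to quantify the clustering described above is to compare the overall hit rate of an $h$ sequence with the rate of harmful files found immediately after a harmful hit. This is an estimate of $P(h(F_{i+1}) = 1 \mid h(F_i) = 1)$. The metric is our own illustration, not one used in the paper, and the sequences below are made-up toy data, not the data behind Fig. 7.

```python
from typing import Sequence


def hit_rate(h: Sequence[int]) -> float:
    """Overall fraction of detected files that were harmful."""
    return sum(h) / len(h) if h else 0.0


def follow_up_rate(h: Sequence[int]) -> float:
    """Fraction of detections made right after a harmful hit that are
    also harmful: an estimate of P(h(F_{i+1}) = 1 | h(F_i) = 1)."""
    after_hit = [h[i + 1] for i in range(len(h) - 1) if h[i] == 1]
    return sum(after_hit) / len(after_hit) if after_hit else 0.0


clustered = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]  # toy data, not from Fig. 7
scattered = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]  # toy data, not from Fig. 7
print(hit_rate(clustered), follow_up_rate(clustered))  # 0.4 0.75
print(hit_rate(scattered), follow_up_rate(scattered))  # 0.4 0.0
```

Both toy sequences have the same hit rate. A follow-up rate well above the hit rate would be consistent with the dense pattern in Fig. 7(a). A follow-up rate near the hit rate would fit the independent pattern in Fig. 7(b).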
In real deployment of the detection algorithms, what matters most is how soon all of the harmful clients can be detected. To measure the time needed to finish detection, we assume that every client takes the same time to detect a file, defined as 1 unit time. Because the servers in centralized filtering algorithms are usually much more powerful than normal clients, we assume that the server takes 1/10 unit time to detect a file. During detection, once a harmful file on a client is detected, that client is considered a harmful client.
Figure 8 shows how many harmful clients are detected over time. As in Fig. 6, iDetect is much more efficient than the other algorithms: it takes only 27 unit time to detect all harmful clients. In other words, all of the harmful clients are found by the time each client installing iDetect has detected 27 files on average. In the same time, the centralized filtering algorithm detects only 2 % of the harmful clients. We also test the efficiency of iDetect under an extreme condition, named One-iDetect in Fig. 8, where only one client installs the iDetect module. Even One-iDetect performs much better than the centralized filtering method. This result suggests that iDetect could also improve centralized detection if the servers ran iDetect instead of randomized detection.
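Under the timing model just described, the 27-unit-time result can be put into file counts with simple arithmetic. The sketch assumes the iDetect clients work fully in parallel and the single centralized server works sequentially at 10 files per unit time. The constant and function names are hypothetical.

```python
N = 10_000        # total clients (Table 3)
OMEGA_PCT = 10    # percentage of clients running iDetect (Table 3)
CLIENT_RATE = 1   # files per unit time for a normal client (1 unit per file)
SERVER_RATE = 10  # files per unit time for the server (1/10 unit per file)


def files_scanned(elapsed_units: int, workers: int, rate: int) -> int:
    """Files detected within `elapsed_units` by `workers` running in parallel."""
    return elapsed_units * workers * rate


detectors = N * OMEGA_PCT // 100                  # 1000 iDetect clients
print(files_scanned(27, detectors, CLIENT_RATE))  # 27000
print(files_scanned(27, 1, SERVER_RATE))          # 270
```

In 27 unit time, the 1000 iDetect clients examine 100 times as many files as the single server.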
7.3 Scalability
In this section, we conduct experiments to test the scalability of iDetect in different network environments. First, we test how the performance of the algorithm changes as the system scale increases exponentially. Figure 9 illustrates how the time to detect all harmful clients is affected by the system scale. It shows that iDetect scales well: its detection efficiency remains steady as the system scale increases. This result verifies Theorem 3. In contrast, for the centralized filtering algorithm, the time increases linearly with the system scale. The larger the system, the more iDetect outperforms the centralized filtering algorithm. For a file sharing network with one million peers, the centralized filtering algorithm takes 31,519 times as long as iDetect to finish detection when the server can detect a file in 1/10 unit time. Furthermore, a server that detects files as fast as iDetect would have to be 315,190 times quicker than a PC installing iDetect and would also consume 315,190 times the bandwidth. Thus, deploying iDetect to perform distributed and intelligent detection saves the large computing and bandwidth cost of maintaining powerful centralized servers.
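The 315,190 figure follows from the 31,519 ratio. The sketch assumes that a server's detection time scales inversely with its speed. If the current server needs 1/10 unit per file and still takes 31,519 times longer than iDetect, a matching server must handle one file in $(1/10)/31519$ unit time, which is 315,190 times faster than a client's 1 unit per file. Exact fractions are used so that no floating-point rounding creeps in. `required_speedup` is a hypothetical name.

```python
from fractions import Fraction


def required_speedup(time_ratio: int, server_time_per_file: Fraction,
                     client_time_per_file: Fraction = Fraction(1)) -> Fraction:
    """Speed-up over one client that a server would need to finish as fast
    as iDetect, assuming detection time scales inversely with server speed."""
    target_time_per_file = server_time_per_file / time_ratio
    return client_time_per_file / target_time_per_file


print(required_speedup(31_519, Fraction(1, 10)))  # 315190
```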
We also test the performance of iDetect as the density of harmful files in the system increases. Figure 10a illustrates how the ratio of harmful files on each harmful client affects detection efficiency. Detection is much quicker when the ratio is higher. This can be roughly explained by the mutation procedure of iDetect. A higher ratio of harmful files on a harmful client makes the client more likely to share the same harmful files with other harmful clients. Accordingly, the Client mutation, which redirects an antibody toward the clients sharing the same harmful files, generates an antibody that includes such a client with higher probability. Thus, harmful clients are detected with higher probability. Figure 10b illustrates how performance changes as the harmful client ratio $\eta$ increases from 1 % to 32 %. The time to detect all harmful clients grows linearly with the density of harmful clients.
Fig. 8 Harmful client detection rate vs. time
Fig. 9 The time to detect all harmful clients at different system scales
Furthermore, we test how the ratio of clients installing iDetect affects detection efficiency. Figure 11 shows that the more clients take part in distributed detection, the faster the harmful clients are found. This result agrees with Theorem 3, which shows that the detection time $T_F$ is a decreasing function of $\omega$, the ratio of clients installing iDetect. Moreover, when all clients in the network install iDetect, detection is fastest and its efficiency is close to that of the self-detection method, where each client detects only itself.
7.4 Sensitivity of the parameters
To choose reasonable parameter values for real deployment, we test the sensitivity of two parameters of iDetect: $\alpha$ and $n$. As illustrated in Table 2, $\alpha$ is the maximum clone rate and $n$ is the generation size.
Figure 12 shows the performance of iDetect with different combinations of $\alpha$ and $n$. Detection finishes sooner when $\alpha$ is larger, which can be explained by the following chain reaction. When $\alpha$ is larger, the antibodies that detect files on harmful clients are cloned more in each generation. This makes the File mutation and Client mutation occur with higher probability, which speeds up the capture of harmful clients sharing the same harmful files. As a result, the time to detect all harmful clients decreases.
Fig. 10 The detection efficiency for different densities of harmful files. (a) Detection efficiency vs. the ratio of harmful files on each harmful client. (b) Detection efficiency vs. the ratio of harmful clients in the system
We can also infer from Fig. 12 that once $\alpha$ exceeds a threshold of about 6, it no longer affects detection noticeably. This suggests that in real deployment it is reasonable to set $\alpha \ge 6$.
Figure 12 also shows how the generation size $n$ affects detection. For the same $\alpha$, a larger $n$ leads to worse performance. Increasing $n$ gives antibodies with low affinity too many opportunities to be cloned and mutated, so the clients waste more time detecting harmless files. This suggests setting $n$ as low as possible. From Fig. 12, $n = 2$ is the most reasonable setting when $\alpha \ge 6$. In other words, only the two antibodies with the highest affinity are selected to take part in the evolution of each iteration.
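The roles of $n$ and $\alpha$ in one selection-and-cloning step might look like the following sketch. The clone rule is our own simplifying assumption for illustration: each of the top-$n$ antibodies receives between 1 and $\alpha$ clones in proportion to its affinity relative to the best one. The actual rule is the one defined by the iDetect algorithm itself. `select_and_clone` and the `(client, file)` antibody representation are hypothetical.

```python
import math
from typing import List, Tuple

Antibody = Tuple[str, str]  # hypothetical (client id, file id) detection target


def select_and_clone(population: List[Tuple[Antibody, float]],
                     n: int, alpha: int) -> List[Antibody]:
    """Keep the n antibodies with the highest affinity and clone each of
    them 1..alpha times in proportion to its affinity relative to the
    best one (assumed rule, for illustration only)."""
    top = sorted(population, key=lambda pair: pair[1], reverse=True)[:n]
    if not top:
        return []
    best = top[0][1]
    clones: List[Antibody] = []
    for antibody, affinity in top:
        k = 1 if best <= 0 else max(1, math.ceil(alpha * affinity / best))
        clones.extend([antibody] * k)
    return clones
```

In this sketch, a larger `alpha` gives high-affinity antibodies more clones. A larger `n` admits lower-affinity antibodies into the next generation, which mirrors the trade-off discussed above.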
Fig. 11 The detection efficiency for different ratios of clients installing the iDetect module
Fig. 12 The detection efficiency of iDetect with different configurations of the clone rate $\alpha$ and the generation size $n$
8 Conclusion
In this paper, we investigate how to detect harmful content shared in P2P networks efficiently and effectively. Our major contribution is iDetect, a distributed detection algorithm based on the clonal selection mechanism of immune systems. The detection tasks are executed in parallel on the P2P clients and intelligently focus on the harmful clients, which locates harmful files efficiently. To analyze the performance of the algorithm theoretically, we build a probability model of the evolution of the antibodies. We also conduct experiments to compare iDetect with the centralized filtering algorithms. The theoretical analysis and experiments show the following results:
1) The iDetect algorithm is much more efficient than the centralized filtering algorithms. When 10 % of the clients in the P2P network take part in distributed detection, each one needs to detect only 27 files on average before all harmful clients in the network are found. In the same time, the centralized filtering algorithms can detect only 2 % of the harmful clients, even when using a powerful server with 10 times the computing ability of a normal client.
2) The iDetect algorithm is highly scalable. Its detection efficiency remains almost unchanged as the system scale grows exponentially.
3) The density of harmful files shared in the network affects the performance of iDetect. The more harmful files each harmful client shares, the easier it is for the algorithm to detect the harmful clients in the network.
4) The maximum clone rate $\alpha$ and the generation size $n$ are two tunable parameters of iDetect. Increasing $\alpha$ accelerates the evolution of antibodies and thus the retrieval of harmful files, while decreasing $n$ reduces the chance of detecting harmless files. The experimental results suggest setting $\alpha$ to at least 6 and $n$ to 2.
Acknowledgments The work described in this paper was supported by grants from the National Natural Science Foundation of China (Project No. 61070090, 61003174, 60973083 and 61170080), the grant from the Comprehensive Strategic Cooperation Project of Guangdong Province and Chinese Academy of Sciences (Project No. 2012B090400016), the grant from the Technology Planning Project of Guangdong Province (Project No. 2012A011100005), and the grant from the Fundamental Research Funds for the Central Universities (Project No. 2011ZM0069).
References
1. Fournier R, Cholez T, Latapy M, Magnien C, Chrisment I,
Daniloff I, Festor O (2012) Comparing paedophile activity in
different P2P systems. arXiv preprint arXiv:1206.4167
2. Wolak J, Finkelhor D, Mitchell K (2011) Child pornography pos-
sessors: trends in offender and case characteristics. Sexual Abuse
23(1):22–42
3. Hughes D, Walkerdine J, Coulson G, Gibson S (2006) Peer-to-
Peer: is deviant behavior the norm on P2P file-sharing networks?
IEEE Distrib Syst Online 7(2):1–7
4. Nam T, Lee H, Jeong C, Han C (2005) A harmful content pro-
tection in peer-to-peer networks. Artif Intell Simul 3397:617–
626
5. Liu J, Ning L, Xue Y, Wang D (2006) PIFF: an intelligent file
filtering mechanism for peer-to-peer network. In: Proceedings 2nd
IEEE symposium dependable, autonomic and secure computing,
DASC’06. Indianapolis, pp 308–314
6. Lee H, Nam T (2007) P2P honeypot to prevent illegal or harm-
ful contents from spreading in P2P network. In: Proceedings
9th international conference advanced communication technology,
ICACT’07. Gangwon-Do, pp 497–501
7. Cruz IP, Aller CF, Garcia SS, Gallardo JC (2010) A careful design
for a tool to detect child pornography in P2P networks. In: Pro-
ceedings IEEE international symposium technology and society,
ISTAS’10. Wollongong, pp 227–233
8. Burnet FM (1959) The clonal selection theory of acquired immu-
nity. Cambridge University Press, Cambridge
9. de Castro LN, Von Zuben FJ (2002) Learning and optimization
using the clonal selection principle. IEEE Trans Evol Comput
6(3):239–251
10. http://en.wikipedia.org/wiki/Gnutella, 2010
11. www.bittorrent.com/, 2010
12. www.emule.org/, 2010
13. Hammami M, Chahir Y, Chen L (2006) WebGuard: a web filter-
ing engine combining textual, structural, and visual content-based
analysis. IEEE Trans Knowl Data Eng 18(2):272–284
14. Lee PY, Hui SC, Fong ACM (2005) An intelligent categorization
engine for bilingual web content filtering. IEEE Trans Multimedia
7(6):1183–1190
15. Harmer PK, Williams PD, Gunsch GH, Lamont GB (2002) An
artificial immune system architecture for computer security appli-
cations. IEEE Trans Evol Comput 6(3):252–280
16. Dasgupta D, Gonzalez F (2002) An immunity-based technique to
characterize intrusions in computer networks. IEEE Trans Evol
Comput 6(3):281–291
17. Gonzalez F, Dasgupta D, Kozma R (2002) Combining negative
selection and classification techniques for anomaly detection. In:
Proceedings IEEE congress evolutionary computation, CEC’02.
Honolulu, pp 705–710
18. Dasgupta D, Majumdar NS (2002) Anomaly detection in multi-
dimensional data using negative selection algorithm. In: Proceed-
ings IEEE congress evolutionary computation CEC’02. Honolulu,
pp 1039–1044
19. Anchor KP, Williams PD, Gunsch GH, Lamont GB (2002) The
computer defense immune system: current and future research in
intrusion detection. In: Proceedings IEEE congress evolutionary
computation, CEC’02. Honolulu, pp 1027–1032
20. de Castro LN, Timmis J (2002) Artificial immune systems: a new
computational intelligence approach. Springer-Verlag, Berlin
21. Powers S, He J (2008) A hybrid artificial immune system and
self organising map for network intrusion detection. Inf Sci
178(15):3024–3042
22. Freitas AA, Timmis J (2007) Revisiting the foundations of artifi-
cial immune systems for data mining. IEEE Trans Evol Comput
11(4):521–540
23. Chen TM, Wang V (2010) Web filtering and censoring. Computer
43(3):94–97
24. Saroiu S, Gummadi PK, Gribble SD (2002) A measurement study
of peer-to-peer file sharing systems. In: Proceedings multimedia
computing and networking, MMCN’02, San Jose
25. Hu W, Wu O, Chen Z, Fu Z, Maybank S (2007) Recognition of
pornographic web pages by classifying texts and images. IEEE
Trans Pattern Anal Mach Intell 29(6):1019–1034
26. Singh A, Ngan T, Druschel P, Wallach D (2006) Eclipse
attacks on overlay networks: threats and defenses. In: Proceed-
ings IEEE international conference computer communications,
INFOCOM’06. Barcelona, pp 1–12
27. Ratnasamy S, Francis P, Handley M, Karp R, Shenker S (2001)
A scalable content-addressable network. In: Proceedings ACM
SIGCOMM conference, SIGCOMM’01. San Diego, pp 161–172
28. Stoica I, Morris R, Karger D, Kaashoek F, Balakrishnan H
(2001) Chord: a scalable peer-to-peer lookup service for internet applications. In: Proceedings ACM SIGCOMM conference,
SIGCOMM’01. San Diego, pp 149–160
29. Tang C, Xu Z, Dwarkadas S (2001) Peer-to-peer information
retrieval using self-organizing semantic overlay networks. In:
Proceedings ACM SIGCOMM conference, SIGCOMM’01. San
Diego, pp 175–186
30. Reynolds P, Vahdat A (2003) Efficient peer-to-peer keyword
searching. In: Proceedings ACM/IFIP/USENIX international mid-
dleware conference, middleware’03. Rio de Janeiro, pp 21–40
31. Muller W, Boykin PO, Roychowdhury VP, Sarshar N (2008)
Comparison of image similarity queries in P2P systems. Comput
Commun 31(2):375–386
32. Picard D, Revel A, Cord M (2010) An application of swarm intel-
ligence to distributed image retrieval. Inf Sci. In Press, Corrected
Proof, Available online
33. Yang B, Garcia-Molina H (2002) Improving search in peer-to-peer
networks. In: Proceedings international conference distributed
computing systems, ICDCS’02. Vienna, pp 5–14
34. Crespo A, Garcia-Molina H (2002) Routing indices for peer-to-
peer systems. In: Proceedings international conference distributed
computing systems, ICDCS’02. Vienna, pp 23–32
35. Kalogeraki V, Gunopulos D, Zeinalipour-Yazti D (2002) A local
search mechanism for peer-to-peer networks. In: Proceedings
international conference information and knowledge management,
CIKM’02. McLean, pp 300–307
36. Friedman M, Last M, Makover Y, Kandel A (2007) Anomaly
detection in web documents using crisp and fuzzy-based cosine
clustering methodology. Inf Sci 177(2):467–475
37. Lv J, Yu Z, Zhang T (2011) iDetect: an immunity based algo-
rithm to detect harmful content shared in peer-to-peer networks.
In: Proceedings international conference machine learning and
cybernetics, ICMLC’11. Guilin, pp 926–931
38. http://www.emule-mods.de/?mods=start, 2012
Jianming Lv received the BS degree in Computer Science from Sun Yat-sen University, China, in 2002, and the PhD degree from the Institute of Computing Technology, Chinese Academy of Sciences University, in 2008. He is currently a lecturer at South China University of Technology. His research interests include peer-to-peer computing, security and privacy.
Zhiwen Yu is a professor in
the School of Computer Sci-
ence and Engineering, South
China University of Technol-
ogy, Guangzhou, China. He
received the B.Sc. and M.Phil.
degrees from the Sun Yat-Sen
University in China in 2001
and 2004 respectively, and the
Ph.D. degree in Computer Sci-
ence from the City Univer-
sity of Hong Kong, in 2008.
He held a research position at the Hong Kong Polytechnic University. His research interests include bioinformatics, machine learning, pattern recognition, multimedia, intelligent computing and data mining. He has published more than 70 technical articles in refereed journals and conference proceedings in the areas of bioinformatics, artificial intelligence, pattern recognition and multimedia.
Tieying Zhang is currently
an assistant professor at the
Institute of Computing Tech-
nology, Chinese Academy of
Sciences. His research inter-
ests include computer net-
works, distributed computing,
peer-to-peer systems, multi-
media networking, and net-
work security. He has pub-
lished over 20 technical papers
and book chapters in the above
areas. He is a member of IEEE
and ACM.