Towards an immunity based distributed algorithm to detect harmful files shared in P2P networks

July 2013
Peer-to-Peer Networking and Applications 8(1)

DOI:10.1007/s12083-013-0221-7

Authors:

Jianming Lv

South China University of Technology

Zhiwen Yu

South China University of Technology

Tieying Zhang

Chinese Academy of Sciences

Owing to their free and self-organized nature, Peer-to-Peer (P2P) file sharing networks have become one of the major transmission channels for harmful content, such as child pornography and abuse videos. Traditional monitoring techniques deploy powerful centralized servers at gateways to analyse and filter the P2P traffic. However, the immense number of documents shared and transferred in P2P networks makes these techniques costly and inefficient. To address this problem, we develop iDetect, a distributed harmful content detection algorithm inspired by the clonal selection mechanism of immune systems. Analogous to B lymphocytes secreting antibodies against antigens in the human body, the clients in a P2P network deployed with iDetect cooperate to detect harmful content in a distributed and self-organized manner. We build a probability model of the detection procedure to analyse the performance of iDetect theoretically. We also conduct simulations to compare iDetect with traditional centralized filtering algorithms. The theoretical analysis and experimental results show that iDetect is efficient, effective, self-optimized and scalable in locating the clients sharing harmful content in P2P networks.

Peer-to-Peer Netw. Appl.

Received: 17 September 2012 / Accepted: 14 June 2013

J. Lv · Z. Yu
School of Computer Science and Engineering,
South China University of Technology,
Guangzhou, 510006 China
e-mail: jmlv@scut.edu.cn

Z. Yu
e-mail: zhwyu@scut.edu.cn

T. Zhang
Institute of Computing Technology,
Chinese Academy of Sciences,
Beijing, China
e-mail: zhangtieying@ict.ac.cn

Keywords Immunity · Peer-to-peer · File sharing · Clonal selection

1 Introduction

Many studies [1–3] show that a huge amount of harmful and illegal content, such as child pornography and abuse videos, is shared and exchanged in public P2P file sharing networks. This content is so easily accessed in the open P2P infrastructure that it has had dangerous side effects on a significant proportion of young Internet users.

Recent research [4–7] presents filtering methods to block harmful content transferred in P2P networks. All of these methods require deploying powerful centralized servers at gateways to inspect the incoming and outgoing P2P traffic. With millions of clients sharing billions of videos, images, audio files and text documents in the network, inspecting all of the content is very costly. The frequent updating of the shared files also makes it difficult to find the harmful ones in time.

To overcome this limitation, we present iDetect, a distributed detection algorithm inspired by the clonal selection mechanism of immune systems. We consider the harmful content shared in P2P networks analogous to antigens in the human body, and model the detection tasks performed by each client as antibodies. By adopting iDetect, the clients in the P2P network can cooperate to detect harmful content, similar to the way B lymphocytes secrete antibodies against antigens. Compared with the traditional centralized filtering methods [4–7], iDetect has the following advantages:

1) The detection of shared content is self-organized by a huge number of clients working concurrently in a distributed manner, which is much more cost-effective and robust than centralized filtering. When 10 % of the clients in a P2P network with 10,000 on-line users take part in the distributed detection, each one needs to detect only 27 files on average before all harmful clients in the network are found. In the same time and in the same P2P network, the centralized filtering algorithms can detect only 2 % of the harmful clients, even with a powerful server whose computing ability is 10 times that of a normal client.

2) Benefiting from the cloning and mutation mechanisms, the iDetect algorithm makes the antibodies on each client evolve so that detection rapidly focuses on harmful content providers. Experiments show that, once a harmful file is found, several more are detected within a short period of time. This locality is what makes iDetect highly efficient.

3) The iDetect algorithm is highly scalable. The time it needs to locate all harmful content providers stays almost constant even when the system scale grows exponentially, whereas the time needed by the centralized filtering algorithms increases linearly with the system scale.

4) A probability model of iDetect is built to analyse the relationship between the detection efficiency and system parameters such as the ratio of clients providing harmful content, the ratio of harmful files on each harmful client, the maximum clone rate and the generation size. Comprehensive simulations are also conducted to verify the theoretical analysis. They show that a higher ratio of harmful clients and harmful files in the network may lead to more efficient detection. Moreover, the maximum clone rate and the generation size are two important tunable parameters for achieving faster retrieval of harmful files.

This paper is based on a short six-page conference paper [37]. Beyond the original version, this paper builds a probability model to analyse the performance of iDetect theoretically, discusses the implementation issues of iDetect in real P2P networks, and adds further experiments to validate the efficiency and effectiveness of iDetect.

The remainder of the paper is organized as follows. Section 2 introduces the related work. Section 3 provides a brief discussion of the clonal selection algorithm. Section 4 describes the iDetect model. Section 5 analyses the performance theoretically. Section 6 discusses several implementation issues. Section 7 evaluates the performance of our approach through simulation experiments. Section 8 concludes the paper and describes possible future work.

2 Related work

Harmful content detection has been discussed extensively in the field of Internet web filtering. Some detection algorithms [13, 14] analyse the content of web pages and perform online filtering of pornographic material. M. Friedman et al. [36] present a clustering methodology to detect anomalous documents downloaded from the Internet by users. Chen et al. [23] provide an overview of the state of the art in the deployment of web filtering. All of these algorithms are implemented on centralized servers to detect and monitor illegal web sites.

With the exponential growth in the scale of P2P file sharing networks [10–12], a substantial number of harmful documents are transferred in P2P networks. A P2P network is a distributed architecture in which all clients are connected and cooperate in an equally privileged manner. Each client can not only share documents with others, but also search for and download its favourite content from others.

From a certain perspective, harmful content detection in a P2P network amounts to searching for the harmful content shared within it. Many techniques [27–34] have been presented to search for specific content in P2P networks. Specifically, S. Ratnasamy et al. [27] and I. Stoica et al. [28] present structured P2P overlays to facilitate efficient key-based search, in which each shared file is indexed by some keys and the search is also based on the keys. C. Tang et al. [29] and P. Reynolds et al. [30] introduce information retrieval mechanisms to search the content of text documents shared in the P2P network. W. Müller et al. [31] and D. Picard et al. [32] present intelligent algorithms to search for images in the P2P network. Moreover, the research in [33–35] uses heuristic algorithms based on the shared content and the search history of each client to improve the search performance. The search patterns of the above techniques are document keywords or reference images, which are used to find the most similar items shared in the network. In the harmful content scenario of this paper, it is hard to define such search patterns because of the variety and unpredictability of harmful content.

More closely related detection algorithms [4–7] present centralized infrastructures to analyse and block the harmful content shared in P2P networks. Specifically, Nam et al. [4] propose a P2P traffic sensor to filter adult videos, images and text documents in P2P networks. The sensor is responsible for identifying and analysing the P2P traffic and blocking the transmission of harmful information. Liu et al. [5] present an algorithm to filter illegal files by collecting the signatures of transferred files. Once the signature of a transferred file matches one in the database of harmful file signatures, the system performs filtering immediately. A novel solution is presented by Lee et al. [6], which introduces the idea of the honeypot into P2P networks: fake P2P service farms are deployed to monitor and trace users who spread illegal or harmful files. Cruz et al. [7] design a tool to detect child pornography in P2P networks without violating the privacy of legal users. The system enables its operators to use pedophilic files, which are gathered by means beyond the scope of the system, as search patterns to find the P2P clients sharing these illicit materials.

Compared with the centralized infrastructures proposed in the related work above, the iDetect algorithm is based on the theory of Artificial Immune Systems (AIS) and is run by the P2P clients in a fully distributed way. According to [20, 22], “Artificial Immune Systems (AIS) are adaptive systems, inspired by theoretical immunology and observed immune functions, principles and models, which are applied to problem solving.” Many anomaly detection algorithms [15–21] based on AIS have been presented recently. Unlike the harmful content detection presented in this paper, anomaly detection is mainly deployed to detect intrusions against a computer system. AIS is applied in these algorithms to model the normal state of the system and detect anomalous actions.

3 The clonal selection algorithm

The clonal selection theory was first presented by Burnet [8]. It describes how animal bodies respond to invading antigens by producing antibodies through B lymphocytes. Antibodies are proteins that can recognize the invading antigens, which are viewed as material foreign to the body. The effectiveness of an antibody in recognizing an antigen is defined as the affinity of the antibody. The B lymphocytes secreting higher-affinity antibodies are selected to be cloned more, and the ones secreting low-affinity antibodies are eventually eliminated. In addition, a process called affinity maturation keeps the antibodies mutating at a certain rate, which maintains the diversity of the generated antibodies. B lymphocytes can also differentiate into long-lived B memory cells secreting high-affinity antibodies. This learning mechanism enables the immune system to respond very quickly to recurring antigens.

Based on the clonal selection theory, De Castro presents the clonal selection algorithm [9]. The procedure is illustrated in Fig. 1. Given a collection of antibodies $I$ in step (1), the top $n$ antibodies with the highest affinity are first selected as $I_n$ in step (2). $I_n$ is called a generation of antibodies in the algorithm. Then, the antibodies in $I_n$ are cloned in step (3) and mutated in step (4) according to their affinity. In step (5), the antibodies with the highest affinity are re-selected and added back into the collection. Finally, in step (6), $I_d$ novel antibodies are selected to replace the antibodies with low affinity.

Fig. 1 The clonal selection algorithm
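The six steps of Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from [9]: the rank-based clone counts, the uniform mutation strength, the helper names and the toy problem (numbers attracted to a hidden target value) are all hypothetical choices made here for readability.

```python
import random


def clonal_selection_step(population, affinity, mutate, new_antibody,
                          n, d, max_clones, rng):
    """Run steps (2)-(6) of Fig. 1 once and return the new collection."""
    # Step (2): pick the n antibodies with the highest affinity (I_n).
    generation = sorted(population, key=affinity, reverse=True)[:n]

    # Step (3): clone; in this sketch the best-ranked antibody gets the
    # most copies (an assumed, simplified clone-count rule).
    clones = []
    for rank, antibody in enumerate(generation):
        copies = max(1, max_clones - rank)
        clones.extend([antibody] * copies)

    # Step (4): mutate every clone (uniform strength, a simplification).
    mutated = [mutate(antibody, rng) for antibody in clones]

    # Step (5): re-select the best mutants and add them back, keeping the
    # collection at its original size.
    best_mutants = sorted(mutated, key=affinity, reverse=True)[:n]
    merged = sorted(population + best_mutants, key=affinity, reverse=True)
    merged = merged[:len(population)]

    # Step (6): replace the d lowest-affinity antibodies with novel ones.
    survivors = merged[:len(merged) - d]
    return survivors + [new_antibody(rng) for _ in range(d)]


# Toy problem (assumption): an antibody is a number, and its affinity is
# higher the closer it lies to a hidden target value.
TARGET = 3.0


def toy_affinity(x):
    return -abs(x - TARGET)


def toy_mutate(x, rng):
    return x + rng.gauss(0.0, 0.5)


def toy_new_antibody(rng):
    return rng.uniform(-10.0, 10.0)


rng = random.Random(0)
population = [toy_new_antibody(rng) for _ in range(10)]
initial_best = max(map(toy_affinity, population))
for _ in range(30):
    population = clonal_selection_step(
        population, toy_affinity, toy_mutate, toy_new_antibody,
        n=4, d=2, max_clones=3, rng=rng)
final_best = max(map(toy_affinity, population))
print(initial_best, final_best)
```

Because the best antibody always survives steps (5) and (6) in this sketch, the best affinity in the collection never decreases from one iteration to the next.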

4 Model

4.1 System overview

In a P2P file sharing network, as illustrated in Fig. 2, each client can not only share files with other clients, but also search for and download its favourite files from others [10–12]. Without any constraint on file sharing, it is very easy to propagate harmful multimedia documents in the network.

We present the iDetect algorithm to detect and filter harmful content, as illustrated in Fig. 3. Assume that a portion of the clients in the P2P network install the iDetect plug-in module, which can be distributed by the P2P software company or a third-party supervisory service. Each iDetect module works together with the original P2P client to detect the shared content of other clients in the network. In this distributed way, the tremendous cost of centralized detection is replaced by lightweight detection tasks run on a large number of clients. As soon as a client finds a harmful file, it reports the result to the centralized mediator server. The mediator server then sends a warning to the releaser of the harmful file and prevents it from accessing the P2P network until the harmful file is removed.

Fig. 2 P2P file sharing network

Fig. 3 Distributed detection in iDetect

4.2 iDetect model

The distributed detection algorithm should achieve the following goals:

G1: Harmful files should be detected as soon as possible, in order to reduce the side effects caused by the propagation of harmful content.

G2: It should be guaranteed that all clients sharing harmful content are detected.

G3: Once a client has detected a file, it can memorize the detection result. When it encounters the same file on another client, it can judge whether the file is harmful much more quickly.

To achieve the above goals, we propose the iDetect model, which is based on the clonal selection algorithm [9], to schedule the detection tasks among clients. Similar to immune systems, where B cells generate antibodies to recognize invading antigens, the clients installing the iDetect module execute detection tasks to find harmful files shared in the network. We borrow some concepts from immune systems to describe the iDetect algorithm, as listed in Table 1.

Table 1 Definitions in iDetect

| Concept | Definition in immune systems | Definition in iDetect |
|---|---|---|
| Antibody | Protein generated by a B cell to recognize an antigen | A task executed by a client to detect a file |
| Antigen | Foreign material stimulating an immune response of the body | A harmful file shared in the network |
| Affinity | Effectiveness of an antibody in recognizing antigens | Effectiveness of a detection task in finding harmful files |
| Cloning | Generating a copy of an antibody | Generating a copy of a detection task |
| Mutation | A change of an antibody | A change of a detection task |

Specifically, each harmful file shared by a client is defined as an antigen (Ag). A task to detect a file shared by some client is defined as an antibody (Ab). Each antibody is coded as a tuple $Ab_i = (P_j, F_k)$, which indicates a task to detect the file $F_k$ shared by the client $P_j$ ($1 \le j \le N$, $1 \le k \le \Gamma$). Here $N$ is the total number of clients in the P2P network, and $\Gamma$ is the number of files shared by the client $P_j$. Each client installing iDetect keeps generating and executing antibodies to detect files shared by other P2P clients.

The affinity of an antibody measures how effective a detection task is at finding harmful files. It is defined as follows:

$$
f(Ab_i) =
\begin{cases}
0 & \text{if } F_k \text{ is harmless} \\
sh(F_k) & \text{if } F_k \text{ is harmful}
\end{cases}
\tag{1}
$$

Here $sh(F_k)$ is the number of clients sharing $F_k$. If $F_k$ is harmful, the affinity of $Ab_i$ is $sh(F_k)$, which indicates the popularity of the harmful file. Otherwise, if $F_k$ is harmless, the affinity is 0, which means that the task did not find any harmful file.

As in the clonal selection algorithm, cloning and mutation operations are adopted in our method. Cloning an antibody means generating a copy of the detection task run on a client. The number of copies of an antibody is defined as the clone rate, which is calculated as follows:

$$
\delta(Ab_i) =
\begin{cases}
\alpha & \text{if } f(Ab_i) > 0 \\
1 & \text{if } f(Ab_i) = 0
\end{cases}
\qquad (\alpha > 0)
\tag{2}
$$

Here $\alpha$ is the maximum clone rate. Equation (2) implies that antibodies with positive affinity, i.e. those that have found a harmful file, are cloned into more copies to accelerate the detection of correlated harmful files.
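Equations (1) and (2) translate directly into two small functions. The sketch below assumes that a client already knows which files are harmful and how many clients share each file; the names `affinity`, `clone_rate`, `harmful_files` and `sharers` are illustrative, and the sharing table is hypothetical apart from the harmful set and the sharer counts of F4 and F10, which follow the example of Fig. 4.

```python
def affinity(file_id, harmful_files, sharers):
    """Eq. (1): 0 for a harmless file, sh(F_k) for a harmful one."""
    if file_id not in harmful_files:
        return 0
    return len(sharers[file_id])


def clone_rate(affinity_value, alpha):
    """Eq. (2): alpha copies for a matched antibody, one copy otherwise."""
    return alpha if affinity_value > 0 else 1


# Hypothetical sharing table: which clients share which file.
harmful_files = {"F4", "F5", "F10"}
sharers = {
    "F4": {"P2", "P5"},
    "F5": {"P2", "P4"},
    "F7": {"P3"},
    "F10": {"P4"},
    "F13": {"P5"},
}

for file_id in ("F7", "F10", "F4"):
    f = affinity(file_id, harmful_files, sharers)
    print(file_id, f, clone_rate(f, alpha=2))
```

With $\alpha = 2$, the harmless file F7 keeps a single copy of its detection task, while the tasks for the harmful files F10 and F4 are duplicated.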

On the other hand, mutating an antibody means changing the task to detect another file. For any antibody $Ab_i = (P_j, F_k)$, three kinds of mutation operations on $Ab_i$ are defined as follows:

– Client mutation: the client $P_j$ is changed to a different client $P_j'$ sharing $F_k$, and the file $F_k$ is changed to any other undetected file $F_k'$ on $P_j'$. This means detecting another client that shares the same file $F_k$.

– File mutation: the file $F_k$ is changed to a different file shared by the client $P_j$. This means detecting another file on the same client $P_j$.

– All mutation: the client $P_j$ is changed to any other client $P_j'$ in the P2P network, and the file $F_k$ is changed to any file $F_k'$ shared by $P_j'$. This means detecting a different file shared by any other client.

Each antibody undergoes exactly one of the mutations above. The probability of performing the Client mutation is defined as follows:

$$
\chi(Ab_i) = \frac{1}{2}\left(1 - \exp(-f(Ab_i))\right)
\tag{3}
$$

The probability of performing the File mutation is equal to that of the Client mutation:

$$
\gamma(Ab_i) = \chi(Ab_i)
\tag{4}
$$

The probability of the All mutation is defined as follows:

$$
\lambda(Ab_i) = \exp(-f(Ab_i))
\tag{5}
$$

According to Eq. 1, the antibodies with higher affinity correspond to harmful files shared by more clients. These antibodies perform the File mutation and the Client mutation with higher probability according to Eqs. 3 and 4.
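The three probabilities of Eqs. 3, 4 and 5 always sum to 1, since $2 \cdot \frac{1}{2}(1 - e^{-f}) + e^{-f} = 1$, so they define a valid choice among the three operations. The following sketch evaluates them for a few affinity values and samples one mutation kind; the function names and the string labels are illustrative assumptions.

```python
import math
import random


def mutation_probabilities(f):
    """Return (client, file, all) mutation probabilities for affinity f >= 0."""
    chi = 0.5 * (1.0 - math.exp(-f))  # Eq. (3): Client mutation
    gamma = chi                       # Eq. (4): File mutation
    lam = math.exp(-f)                # Eq. (5): All mutation
    return chi, gamma, lam


def choose_mutation(f, rng):
    """Sample one mutation kind according to Eqs. (3)-(5)."""
    chi, gamma, _ = mutation_probabilities(f)
    u = rng.random()
    if u < chi:
        return "client"
    if u < chi + gamma:
        return "file"
    return "all"


for f in (0, 1, 2, 5):
    chi, gamma, lam = mutation_probabilities(f)
    print(f, round(chi, 3), round(gamma, 3), round(lam, 3))
```

For a mismatched antibody ($f = 0$) the All mutation is chosen with probability 1, and as the affinity grows the choice shifts almost entirely to the Client and File mutations.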

The File mutation changes the antibody to detect another file shared by the same client. Because users tend to share multiple files on a topic, a harmful client usually shares more than one harmful file. With the File mutation, the detection can be guided to cover other harmful files shared on the same client quickly.

The Client mutation detects another client sharing the same harmful file. Once a harmful file is detected, the Client mutation helps to find more harmful files over a broader scope. By combining the File mutation with the Client mutation, the hunt for harmful files can be accelerated.

Unlike the above two mutation operations, which change an antibody into a correlated one, the All mutation changes the antibody into a totally different one. The All mutation guarantees the diversity of antibodies and makes the scope of detection as large as possible.
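The three mutation operations can be illustrated on a small in-memory model of the network. In the sketch below, the `network` dictionary, the set of already inspected file names and the convention of returning `None` when no valid target exists are assumptions made for illustration; the file layout is hypothetical and only loosely modelled on the example of Fig. 4.

```python
import random


def client_mutation(antibody, network, detected, rng):
    """Move to another client sharing the same file, then pick a file there."""
    client, file_id = antibody
    others = sorted(c for c in network
                    if c != client and file_id in network[c])
    if not others:
        return None
    new_client = rng.choice(others)
    # Only files this client has not inspected yet are eligible.
    candidates = sorted(f for f in network[new_client]
                        if f != file_id and f not in detected)
    return (new_client, rng.choice(candidates)) if candidates else None


def file_mutation(antibody, network, rng):
    """Stay on the same client and pick a different file."""
    client, file_id = antibody
    candidates = sorted(f for f in network[client] if f != file_id)
    return (client, rng.choice(candidates)) if candidates else None


def all_mutation(antibody, network, rng):
    """Jump to any other client and any file it shares."""
    client, _ = antibody
    others = sorted(c for c in network if c != client and network[c])
    if not others:
        return None
    new_client = rng.choice(others)
    return (new_client, rng.choice(sorted(network[new_client])))


# Hypothetical layout: client name -> set of shared file names.
network = {
    "P2": {"F4", "F5"},
    "P3": {"F7", "F8"},
    "P4": {"F5", "F10"},
    "P5": {"F4", "F13"},
}
rng = random.Random(0)
detected = {"F5", "F7"}  # files already inspected in iteration 1

print(client_mutation(("P4", "F5"), network, detected, rng))
print(file_mutation(("P4", "F5"), network, rng))
print(all_mutation(("P3", "F7"), network, rng))
```

In this layout the Client and File mutations of $(P_4, F_5)$ each have a single valid target, so they reproduce the transitions to $(P_2, F_4)$ and $(P_4, F_{10})$ described for Fig. 4b; a caller could fall back to the All mutation whenever one of the other two returns `None`.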

4.3 The iDetect algorithm

As illustrated in Algorithm 1, the iDetect algorithm schedules and executes the detection tasks on each client. For a better understanding of the algorithm, a simple example of a P2P file sharing network with five peers is shown in Fig. 4 to illustrate how antibodies evolve on each iDetect client. Specifically, the algorithm contains the following main steps:

1) Initializing the antibodies. Each client installing iDetect randomly selects $n$ other clients from the P2P network and requests the name lists of their shared files. From each list, the client picks one file to construct an antibody. Figure 4b shows the evolution of the antibodies on the peer $P_1$. $P_1$ randomly selects two clients, $P_3$ and $P_4$, from the network and constructs two antibodies, $(P_3, F_7)$ and $(P_4, F_5)$. Here $F_7$ is a file randomly selected from $P_3$, and $F_5$ is randomly selected from $P_4$.

2) Detection. For each antibody, the client inspects the content of the corresponding file and decides whether it is harmful. In a real implementation, the detection techniques presented in [4, 25] can be adopted. As shown in Fig. 4b, $P_1$ has two antibodies, $(P_3, F_7)$ and $(P_4, F_5)$, at the beginning of iteration 1. $P_1$ then inspects the files $F_7$ on $P_3$ and $F_5$ on $P_4$, and decides that $F_5$ is harmful.

3) Memorization. Once a harmful file is detected by a client, its file name and MD5 signature are memorized. When the client encounters the same file shared by other clients, it can judge that the file is harmful in a short time.

4) Selection. As in the clonal selection algorithm, the $n$ antibodies with the highest affinity are selected to perform the subsequent cloning and mutation operations. As illustrated in Fig. 4b, $P_1$ has only two antibodies, $(P_3, F_7)$ and $(P_4, F_5)$, at the beginning of iteration 1, so both of them are selected for the following cloning operations.

5) Cloning. The antibodies with higher affinity are cloned into more copies according to the clone rate function $\delta(Ab_i)$ defined in Eq. 2. In Fig. 4b, $P_1$ calculates the clone rate of the two antibodies $(P_3, F_7)$ and $(P_4, F_5)$. Because $F_7$ is harmless, the clone rate of $(P_3, F_7)$ is 1 according to Eq. 2 and only one copy is generated. On the other hand, the clone rate of $(P_4, F_5)$ is 2, because $F_5$ is harmful. Thus, $(P_4, F_5)$ is duplicated.

6) Mutation. In this step, each antibody performs one of the following mutation operations: Client mutation, File mutation or All mutation. The probabilities of performing these operations are defined in Eqs. 3, 4 and 5. Figure 4b shows an example of the mutation. When $P_1$ is in the mutation stage of iteration 1, the antibody $(P_3, F_7)$ performs All mutation and is changed to $(P_5, F_{13})$. The first $(P_4, F_5)$ performs File mutation and is changed to $(P_4, F_{10})$. The other $(P_4, F_5)$ performs Client mutation and is changed to $(P_2, F_4)$.

Fig. 4 An example of the iDetect algorithm. a A P2P file sharing network containing five clients $P_1$–$P_5$, which share the files $F_1$–$F_{13}$. Among these files, $F_4$, $F_5$ and $F_{10}$ are harmful and the others are harmless. $P_1$ is deployed with iDetect. The parameters are set as follows: $n = 2$, $\alpha = 2$. b The evolution of the antibodies on $P_1$ while running the iDetect algorithm. Each iteration contains the following main stages: detection, selection, cloning and mutation

The iDetect algorithm repeats the above steps while the client is on-line. As shown in Fig. 4b, after finishing the mutation in iteration 1, $P_1$ goes on to run iteration 2. All the files in the three antibodies, $(P_5, F_{13})$, $(P_4, F_{10})$ and $(P_2, F_4)$, are inspected. The affinities of $(P_5, F_{13})$, $(P_4, F_{10})$ and $(P_2, F_4)$ are 0, 1 and 2 respectively according to Eq. 1. At the selection stage, $(P_4, F_{10})$ and $(P_2, F_4)$ are selected, because they have the highest affinity. Then each of the two antibodies is duplicated in the cloning stage. In the mutation stage, the four resulting antibodies are mutated into different ones.
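The memorization, selection and cloning stages can be sketched as follows. This is an illustrative fragment rather than Algorithm 1 itself: the `DetectionMemory` class, the `select_and_clone` helper and the byte-matching stand-in for a real content detector are assumptions. The affinities are the ones quoted above for iteration 2 of Fig. 4b.

```python
import hashlib


class DetectionMemory:
    """Remember harmful files by MD5 so repeats skip the costly detector."""

    def __init__(self, detector):
        self._detector = detector   # costly content check (hypothetical)
        self._harmful = {}          # MD5 hex digest -> file name
        self.detector_calls = 0

    def is_harmful(self, name, content):
        digest = hashlib.md5(content).hexdigest()
        if digest in self._harmful:     # seen before: answer from memory
            return True
        self.detector_calls += 1
        if self._detector(content):
            self._harmful[digest] = name
            return True
        return False


def select_and_clone(antibodies, affinity_of, n, alpha):
    """Selection and cloning stages: keep the top n, then apply Eq. (2)."""
    selected = sorted(antibodies, key=affinity_of, reverse=True)[:n]
    clones = []
    for antibody in selected:
        copies = alpha if affinity_of(antibody) > 0 else 1
        clones.extend([antibody] * copies)
    return clones


# Iteration 2 of the Fig. 4 walk-through: affinities 0, 1 and 2.
affinities = {("P5", "F13"): 0, ("P4", "F10"): 1, ("P2", "F4"): 2}
clones = select_and_clone(list(affinities), affinities.get, n=2, alpha=2)
print(clones)

memory = DetectionMemory(detector=lambda content: b"harmful" in content)
memory.is_harmful("F5", b"harmful payload")  # first sighting, on one client
memory.is_harmful("F5", b"harmful payload")  # same bytes on another client
print(memory.detector_calls)
```

With $n = 2$ and $\alpha = 2$ the two matched antibodies are each duplicated, giving the four antibodies that enter the mutation stage, and the second sighting of the same bytes is answered from memory without running the detector again.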

5 Theoretical analysis

In this section, we analyse the detection efficiency of the iDetect algorithm. We denote the total number of clients by $N$, and the ratio of harmful clients in the network by $\eta$ ($0 < \eta < 1$). The average number of files shared by each client is $\Gamma$. For each harmful client, the average ratio of harmful files shared on the client is $\rho$ ($0 < \rho < 1$). Moreover, $\omega N$ ($0 < \omega < 1$) clients install the iDetect module. The time to detect each file is $t$ on average.

For clarity, the notation used in this section is listed in Table 2.

We present the following Theorem 1 on the complexity of Algorithm 1.

Theorem 1 The total number of antibodies created by a client running iDetect is $O(\alpha n r)$, and the time consumed by the client to perform detection is $O(\alpha n r t)$, where $n$ is the generation size, $r$ is the number of iterations run on the client and $t$ is the average time to detect each file.

Proof At the beginning of each iteration of the algorithm, the $n$ antibodies with the highest affinity are selected from the candidates. At the cloning stage of the iteration, each of the $n$ antibodies, $Ab_i$, is cloned into $\delta(Ab_i)$ copies. According to Eq. 2, $\delta(Ab_i) \le \alpha$. Thus, at most $\alpha n$ antibodies are created by a client in one iteration. After running $r$ iterations, the total number of antibodies created by a client is $O(\alpha n r)$. For each antibody created by the client, one corresponding file is detected. Thus the detection time consumed by the client is $O(\alpha n r t)$.

Theorem 1 shows that the time spent on detection by each client is proportional to the number of iterations, which can be tuned to control the resources each client devotes to detection. To infer how many iterations are needed to detect the harmful files shared in the network, we present the following Theorem 2.

Table 2 Notation in Algorithm 1

| Symbol | Description |
|---|---|
| $\alpha$ | Maximum clone rate |
| $n$ | Generation size: in each iteration, the top $n$ antibodies with the highest affinity are selected to construct the antibody generation |
| $N$ | Total number of clients |
| $\Gamma$ | Number of files shared by a client |
| $\eta$ | Ratio of harmful clients |
| $\rho$ | Ratio of harmful files on a harmful client |
| $\omega$ | Ratio of clients installing iDetect |
| $t$ | Average time to detect each file |
| $r$ | Number of iterations run on a client |

Theorem 2 If $n < N$, for any harmful file $F_x$ shared by any client $P_j$ in the network, the expected number of iterations that need to run before the file $F_x$ is detected satisfies:

$$
R \le \left(1 - e^{-\omega n \Gamma^{-1}}\right)^{-1}
\tag{6}
$$

Proof To prove the theorem, we build a probability model of the evolutionary procedure in the iDetect algorithm, as shown in Fig. 5. During each evolutionary iteration, each antibody evolves through four main stages: initialization, selection, cloning and mutation. An antibody is called matched if its detected file is harmful. Otherwise, it is called mismatched.

At the initialization stage, $n$ random antibodies are constructed, which correspond to detection tasks that scan $n$ files randomly selected from the network. At the selection stage, the $n$ antibodies with the highest affinity are selected. At the cloning stage, each matched antibody is cloned into $\alpha$ copies, while the mismatched ones remain unchanged. At the mutation stage, each antibody performs one of the mutations: Client mutation, File mutation or All mutation, with the corresponding probabilities $\chi(Ab_i)$, $\gamma(Ab_i)$ and $\lambda(Ab_i)$ as defined in Eqs. 3, 4 and 5. In particular, the mismatched antibodies perform only All mutation, because their affinity is 0 and the probability $\lambda(Ab_i)$ is therefore 1 according to Eq. 5.

Each antibody taking Client mutation or File mutation is changed to detect another file on a harmful client. The probability that the detected file is harmful is $\rho$, the ratio of harmful files on a harmful client. Thus the mutated antibody is matched with probability $\rho$. On the other hand, any antibody taking All mutation is changed to detect any other file on any client. Because the ratio of harmful clients in the network is $\eta$, the probability that the mutated antibody is matched is $\eta\rho$.

Given any harmful file $F_x$ shared by any client $P_j$ in the network, the probability that the antibody $Ab_x = (P_j, F_x)$ is created by the iDetect client $P_i$ in iteration No. $k$ ($k \ge 0$) is denoted by $\phi_{i,k}(Ab_x)$. Here iteration No. 0 means the initialization stage before iteration No. 1. At this stage, $n$ files are randomly selected to initialize the corresponding $n$ antibodies, so the probability that $P_i$ generates $Ab_x$ in iteration No. 0 is as follows:

$$
\phi_{i,0}(Ab_x) = n (N\Gamma)^{-1}
\tag{7}
$$

For iteration No. $k$ ($k > 0$) run on $P_i$, because Cloning, Client mutation and File mutation yield a higher probability of generating a matched antibody than random selection does, we have:

$$
\phi_{i,k}(Ab_x) \ge \phi_{i,0}(Ab_x) = n (N\Gamma)^{-1}
\tag{8}
$$

We denote the probability that $Ab_x$ is generated in iteration No. $k$ by any client by $\phi_k(Ab_x)$. Since there are $\omega N$ clients installing iDetect in the network, from Eq. 8 we have:

$$
\phi_k(Ab_x) = 1 - \prod_{i \in D} \left(1 - \phi_{i,k}(Ab_x)\right)
\ge 1 - \left(1 - n (N\Gamma)^{-1}\right)^{\omega N}
= 1 - (1 - c/N)^{\omega N}
\tag{9}
$$

Here $D = \{\, i \mid 1 \le i \le N, \text{ and } P_i \text{ installs iDetect} \,\}$ and $c = n\Gamma^{-1}$. Because $n < N$ and $\Gamma \ge 1$, we have $c < N$. Let $g(N) = 1 - (1 - c/N)^{\omega N}$. The derivative of $g(N)$ is as follows:

$$
g'(N) = \omega \, (g(N) - 1) \left(\ln(1 - c/N) + c/(N - c)\right)
\tag{10}
$$

Given $0 < c < N$, it can be proved that $g'(N) < 0$. On the other hand, $\lim_{N \to \infty} g(N) = 1 - e^{-\omega c}$, so $g(N) \ge 1 - e^{-\omega c}$. Combined with (9) we have:

$$
\phi_k(Ab_x) \ge g(N) \ge 1 - e^{-\omega c} = 1 - e^{-\omega n \Gamma^{-1}}
\tag{11}
$$

Thus, the expected number of iterations that need to run before $Ab_x$ is generated is:

$$
\begin{aligned}
R &= 1 + (1 - \phi_1(Ab_x)) + (1 - \phi_1(Ab_x))(1 - \phi_2(Ab_x)) \\
  &\quad + (1 - \phi_1(Ab_x))(1 - \phi_2(Ab_x))(1 - \phi_3(Ab_x)) + \dots \\
  &= 1 + \sum_{k=1}^{\infty} \prod_{j=1}^{k} \left(1 - \phi_j(Ab_x)\right) \\
  &\le 1 + \sum_{k=1}^{\infty} \prod_{j=1}^{k} e^{-\omega n \Gamma^{-1}} \\
  &= 1 + \sum_{k=1}^{\infty} \left(e^{-\omega n \Gamma^{-1}}\right)^{k} \\
  &= \left(1 - e^{-\omega n \Gamma^{-1}}\right)^{-1}
\end{aligned}
\tag{12}
$$

Once the antibody $Ab_x = (P_j, F_x)$ is generated by some client, the file $F_x$ shared on $P_j$ is detected in the same iteration. Thus, the expected number of iterations that need to run before $F_x$ is detected is also equal to $R$, and Theorem 2 holds.
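Two steps of the proof can be checked numerically: that $g(N)$ decreases towards its limit $1 - e^{-\omega c}$, and that the geometric series in Eq. 12 sums to the closed form of Eq. 6. The parameter values in this sketch are illustrative assumptions, not values taken from the experiments, and the function names are hypothetical.

```python
import math


def g(N, n, files_per_client, omega):
    """Lower bound of Eq. (9): 1 - (1 - c/N)^(omega*N) with c = n / Gamma."""
    c = n / files_per_client
    return 1.0 - (1.0 - c / N) ** (omega * N)


def iteration_bound(n, files_per_client, omega):
    """Right-hand side of Eq. (6)."""
    return 1.0 / (1.0 - math.exp(-omega * n / files_per_client))


n, files_per_client, omega = 10, 100, 0.1   # illustrative values only
limit = 1.0 - math.exp(-omega * n / files_per_client)
values = [g(N, n, files_per_client, omega) for N in (100, 1_000, 10_000)]
print(values, limit)

# Partial sums of 1 + sum_k q^k approach the closed form of Eq. (12).
q = math.exp(-omega * n / files_per_client)
partial = sum(q ** k for k in range(5_000))
print(partial, iteration_bound(n, files_per_client, omega))
```

The printed values of $g(N)$ shrink as $N$ grows but stay above the limit, which is the property used to obtain Eq. 11.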

By combining Theorems 1 and 2, we can prove the following Theorem 3 on the detection efficiency of iDetect.

Theorem 3 If $n < N$, for any harmful file $F_x$ shared by any client $P_j$ in the network, the expected time to run iDetect before the file $F_x$ is detected is:

$$
T_F = O\left(\alpha n t \left(1 - e^{-\omega n \Gamma^{-1}}\right)^{-1}\right)
\tag{13}
$$

Proof According to Theorem 2, the expected number of iterations of iDetect before $F_x$ is detected satisfies:

$$
R \le \left(1 - e^{-\omega n \Gamma^{-1}}\right)^{-1}
$$

According to Theorem 1, the time needed to run $R$ iterations is $O(\alpha n t R)$. Thus, the expected time to run iDetect before the file $F_x$ is detected is:

$$
T_F = O\left(\alpha n t \left(1 - e^{-\omega n \Gamma^{-1}}\right)^{-1}\right)
$$

Fig. 5 The evolution of an antibody in the iDetect algorithm

From Theorem 3, we can compare the efficiency of iDetect with that of the centralized filtering algorithms [4–6]. In the centralized filtering algorithms, all shared files transferred in the network are randomly sampled and inspected by the servers, which decide whether the files are harmful. For any harmful file $F_x$ shared by any client $P_j$ in the network, the time to run the algorithms before $F_x$ gets detected is $T = O(tN\Gamma)$, which grows in proportion to the network scale $N$. From Theorem 3, we can infer that the detection time $T_F$ of iDetect does not depend on $N$, but depends on $\omega$, the ratio of clients installing iDetect. This means that iDetect can be much more efficient than centralized filtering algorithms in large-scale networks, provided that the ratio $\omega$ is kept at a certain level. This is verified by the simulations in Section 7.2.
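To make the comparison concrete, the growth terms of the two estimates can be evaluated side by side. This is a rough sketch only: the constants hidden by the O(·) notation are ignored, t is taken here to be the time to inspect one file, and the parameter values are illustrative assumptions.

```python
import math

def idetect_iterations_bound(omega, n, gamma):
    """Upper bound on the expected number of iterations (Theorem 2)."""
    return 1.0 / (1.0 - math.exp(-omega * n / gamma))

def idetect_time(alpha, n, t, omega, gamma):
    """Growth term of Theorem 3; note that N does not appear."""
    return alpha * n * t * idetect_iterations_bound(omega, n, gamma)

def centralized_time(t, N, gamma):
    """Growth term of the centralized estimate O(t * N * Gamma)."""
    return t * N * gamma

# Illustrative values (assumptions): alpha=4, n=10, t=1, omega=0.1, Gamma=100.
alpha, n, t, omega, gamma = 4, 10, 1.0, 0.1, 100
for N in (10_000, 100_000, 1_000_000):
    print(N, idetect_time(alpha, n, t, omega, gamma), centralized_time(t, N, gamma))
```

The iDetect term stays fixed as N grows and shrinks when ω is raised, while the centralized term grows linearly in N. The absolute numbers are not comparable to measured times, since both expressions are asymptotic.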

6 Implementation issues

In the sections above, we mainly focused on the scheduling of detection tasks on the clients, leaving the details of the implementation of the whole system to this section.

6.1 Communication of clients

As mentioned in Algorithm 1, a client installing iDetect requires several communication procedures. They are summarized below, together with implementation details.

1) Joining and leaving the network: The clients installing iDetect conform to the P2P file sharing protocol and join or leave the network like other file sharing clients. Because each client installing iDetect independently evolves its antibodies and executes the detection tasks, no additional maintenance operations are required to synchronize the detection among the clients.

2) Randomly selecting a client in the network: Most file sharing networks support retrieving a list of on-line clients in the network. For example, in BitTorrent [11] and Emule [12], a Distributed Hash Table (DHT) is implemented to support distributed searching for a numeric key and to return the peers whose identities are close to the key. The iDetect module can invoke the DHT API to search for a random key, obtain a list of on-line clients in the network, and select one randomly from the list.

3) Randomly selecting a file shared by a client: The iDetect module can invoke the peer browsing function supported by most P2P file sharing clients, which is used to view the files shared by a client. The implementation of this function is defined in the P2P file sharing protocols, such as Emule [12] and Gnutella [10].

4) Detecting a file: When the iDetect module needs to inspect a file shared by another client, it follows the P2P file sharing protocol to transfer file blocks.
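Steps 2) and 3) above can be sketched as follows. The `P2PClient` interface, its method names and the 160-bit key width are hypothetical and serve only as an illustration; they do not correspond to the API of any real BitTorrent, Emule or Gnutella client.

```python
import random
from typing import Optional, Protocol, Sequence, Tuple

class P2PClient(Protocol):
    """Hypothetical facade over a file sharing client (not a real library API)."""
    def dht_lookup(self, key: int) -> Sequence[str]: ...
    def browse_peer(self, peer_id: str) -> Sequence[str]: ...

KEY_BITS = 160  # assumed key width; a real DHT defines its own key space

def pick_random_target(client: P2PClient, rng: random.Random) -> Optional[Tuple[str, str]]:
    """Return a random (peer, file) pair, or None if nothing is available."""
    peers = client.dht_lookup(rng.getrandbits(KEY_BITS))  # step 2: look up a random key
    if not peers:
        return None
    peer = rng.choice(list(peers))
    files = client.browse_peer(peer)                      # step 3: browse the peer's shares
    if not files:
        return None
    return peer, rng.choice(list(files))

class FakeClient:
    """In-memory stand-in used only to exercise the sketch."""
    def __init__(self, shares):
        self.shares = shares
    def dht_lookup(self, key):
        return list(self.shares)
    def browse_peer(self, peer_id):
        return self.shares[peer_id]
```

A peer that shares nothing simply yields `None`, so the caller can retry with a fresh random key in the next round.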


6.2 Deployment issues

In this section, we discuss the deployment issues of iDetect by answering some questions that are likely to be asked:

(1) How can iDetect be deployed in a real P2P network? It seems hard to persuade users to install a content detection module, which may prevent them from using P2P sharing freely.

(2) Why not just make every client scan only itself for harmful content?

(3) Is the bandwidth used to download and inspect the content too high for an individual client, and does it affect the user experience?

(4) What is the accuracy of judging whether a file is harmful when inspecting it?

(5) How can the sharers of harmful content be banned after they have been detected?

Question (1) can be answered by the theoretical analysis in Section 5. We only require a portion of the clients in the network to install iDetect. Theorem 3 shows that a higher ratio ($\omega$) of clients installing iDetect leads to faster detection of harmful contents. The supervisory organization may cooperate with the P2P service providers to distribute to the public a new version of the client with iDetect embedded. Since most legitimate users are unwilling to let harmful content spread in the network, they may adopt the new version and take part in the detection.

The solution mentioned in question (2) is to let each client scan only its local shared files instead of inspecting other clients as iDetect does. This seems easier to implement than iDetect, but faces obstacles in a real deployment. Most of the popular P2P sharing protocols, such as Emule, BitTorrent and Gnutella, are public and usually have multiple compatible client implementations. For example, Emule has more than 20 different but compatible clients [38]. Users may use any of these clients to join the network. Thus it is hard to force all the clients to install the iDetect module to perform self-detection.

As mentioned in question (3), the potentially high bandwidth consumed to inspect media content may be a problem for an individual user. This can be addressed by using the block transfer mechanism of P2P networks. To save bandwidth, a detecting client can download only some randomly selected blocks of a media file and inspect the frame images within these samples. Moreover, some files can be judged harmful from their names alone. For these files, there is no need to download the content to make the decision.
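The block sampling idea can be sketched as below. The function names and the sample size are hypothetical; how many blocks are enough for a reliable judgement depends on the classifier and is not addressed here.

```python
import random

def sample_block_indices(num_blocks, sample_size, rng=None):
    """Choose which blocks of a file to download for inspection.

    Returns at most `sample_size` distinct block indices in ascending order.
    """
    rng = rng or random.Random()
    k = min(sample_size, num_blocks)
    return sorted(rng.sample(range(num_blocks), k))

def sampled_fraction(num_blocks, sample_size):
    """Fraction of the file that is actually transferred."""
    if num_blocks == 0:
        return 0.0
    return min(sample_size, num_blocks) / num_blocks
```

For a file split into 1000 blocks, sampling 20 of them transfers 2 % of the file, which is where the bandwidth saving comes from.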

The accuracy of judging a file, as raised in question (4), is determined by the adopted classification method, such as the algorithms in [4, 25]. Specifically, the detection algorithm of [25] has a false positive ratio of 6.0 % and a false negative ratio of 1.2 %, while the algorithm of [4] is reported to have an accuracy of 93 % [4]. Both of these algorithms perform well in detecting harmful content.

The answer to question (5) is outside the scope of this paper. As soon as a client detects a harmful file, it may report the result to a centralized mediator server, which is maintained by a third-party supervisory organization or by the P2P service providers themselves. Further filtering actions [4–6] can then be taken to block the transfer of illegal contents. Alternatively, the clients installing iDetect can cooperate as in [26] to stop the searching and downloading of the files concerned.

7 Experimentation and results

7.1 Experimental setup

We set up experiments to test the performance of iDetect. We simulate a P2P file sharing network with 10,000 clients. The file sharing scenario is configured according to the measurement study [24] of real P2P file sharing networks, in which 25 % of the clients share nothing, 7 % of the peers share more than 1000 files, and the other peers share 100 files or fewer. To simplify the experiments without loss of generality, we let 25 % of the clients in the simulation share nothing, 7 % share 1000 files and the others share 100 files.

We also simulate the harmful contents shared in the network. Unless otherwise specified, 1 % of the clients are harmful and 50 % of the files they share are harmful. Each harmful file is shared by 5 clients in the network on average. The parameters configured in this section are summarized in Table 3.

We compare iDetect with the traditional centralized filtering methods in the following experiments. In the iDetect scenario, 10 % of the clients in the network run the iDetect algorithm and inspect files independently. In the centralized filtering scenario, a powerful centralized server is dedicated to randomly collecting and inspecting the files shared by the clients in the network.

Table 3 The parameters configured in the experiments

Symbol  Description                                  Default value
α       Maximum clone rate                           4
n       Generation size                              10
N       Total number of clients                      10,000
Γ       The number of files shared by a client       100 to 1000
η       Ratio of harmful clients                     1 %
ρ       Ratio of harmful files on a harmful client   50 %
ω       Ratio of clients installing iDetect          10 %
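The configuration described in this section can be written down as a small setup routine. This is a sketch under stated assumptions, not the authors' simulator: the class and function names are hypothetical, and drawing the harmful clients uniformly from the clients that share at least one file is an assumption, since the text does not say how they are chosen.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SimConfig:
    alpha: int = 4        # maximum clone rate
    n: int = 10           # generation size
    N: int = 10_000       # total number of clients
    eta: float = 0.01     # ratio of harmful clients
    rho: float = 0.50     # ratio of harmful files on a harmful client
    omega: float = 0.10   # ratio of clients installing iDetect

def assign_share_sizes(cfg, rng):
    """25 % share nothing, 7 % share 1000 files, the rest share 100 files."""
    n_none = round(0.25 * cfg.N)
    n_large = round(0.07 * cfg.N)
    sizes = [0] * n_none + [1000] * n_large + [100] * (cfg.N - n_none - n_large)
    rng.shuffle(sizes)
    return sizes

def assign_roles(cfg, sizes, rng):
    """Pick harmful clients (assumed: among sharers) and iDetect clients."""
    sharers = [i for i, s in enumerate(sizes) if s > 0]
    harmful = set(rng.sample(sharers, round(cfg.eta * cfg.N)))
    detectors = set(rng.sample(range(cfg.N), round(cfg.omega * cfg.N)))
    return harmful, detectors
```

With the default values this yields 2500 clients sharing nothing, 700 sharing 1000 files, 6800 sharing 100 files, 100 harmful clients and 1000 clients running iDetect.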

7.2 Detection efficiency

Figure 6 shows the percentage of harmful files detected as a function of the total number of files scanned. When 100,000 files have been scanned by the iDetect algorithm, almost 100 % of the harmful files have been found. This is much more efficient than the centralized filtering algorithms, which find less than 10 % of the harmful files after scanning 100,000 files in total.

Specifically, to examine the contribution of the three types of mutation in the iDetect algorithm, we also construct two other variants of the algorithm:

All_m+File_m: The All mutation and the File mutation are used together, while the Client mutation is disabled. The probability of the File mutation is defined as $\chi(Ab_i) = 1 - \exp(-f(Ab_i))$, while the probability of the All mutation is defined as $\lambda(Ab_i) = \exp(-f(Ab_i))$. In this case, once a harmful file is detected, the detection focuses on the sharer of this file.

All_m: Only the All mutation is adopted in the algorithm. In other words, the probability of the All mutation is defined as $\lambda(Ab_i) = 1$. In this case, each client installing iDetect randomly selects the files shared by others to detect.
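The mutation choice in the two variants can be sketched as follows. The function names and the variant labels used as strings are hypothetical, and the affinity f(Ab_i) is assumed to be a non-negative number supplied by the caller.

```python
import math
import random

def file_mutation_prob(affinity):
    """chi(Ab_i) = 1 - exp(-f(Ab_i))."""
    return 1.0 - math.exp(-affinity)

def all_mutation_prob(affinity):
    """lambda(Ab_i) = exp(-f(Ab_i))."""
    return math.exp(-affinity)

def choose_mutation(affinity, variant, rng):
    """Return 'all' or 'file' for the two variants described above."""
    if variant == "All_m":
        return "all"  # lambda(Ab_i) = 1: always pick a random client and file
    if variant == "All_m+File_m":
        # The two probabilities sum to 1, so a single draw decides.
        return "file" if rng.random() < file_mutation_prob(affinity) else "all"
    raise ValueError(f"unknown variant: {variant}")
```

An antibody with zero affinity always falls back to the All mutation, while a high affinity makes the File mutation almost certain, which is what keeps the detection on a client once a harmful file has been found there.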

Figure 6 shows that the efficiency of All_m is close to that of the centralized filtering algorithm, because both detect in a similar randomized manner. All_m+File_m is much more efficient at capturing the harmful files. This is due to the File mutation, which focuses the detection on the harmful clients. The iDetect algorithm, which combines all three mutations, is the most efficient. This demonstrates the effect of the Client mutation, which helps to cover all harmful clients quickly by mutating the detection towards different clients sharing the same harmful file.

Fig. 6 Harmful files detection rate vs total number of scanned files

Fig. 7 The function $h(F_i)$ of the $i$-th detected file $F_i$ during the detection procedure. If $F_i$ is harmful, $h(F_i)$ is 1, otherwise $h(F_i)$ is 0. a The detection procedure of a client installing iDetect. b The detection procedure in the centralized filtering system

Furthermore, to explain why iDetect is more efficient than the centralized filtering methods at locating harmful files, we illustrate the details of the detection procedure in Fig. 7. The figure shows the $h(\cdot)$ function (defined in Eq. 1) for the first 200 files detected in each system. In the iDetect case, the dense lines indicate that, once a harmful file is found, several more are detected within a short period. This locality explains the high efficiency of iDetect. In contrast, in the centralized filtering scenario, the sparse lines show that the harmful files are detected randomly and independently.

In a real deployment of the detection algorithms, what matters most is how soon all of the harmful clients can be detected. In order to measure the time to finish the detection, we assume that every client takes the same time to inspect a file, defined as 1 unit time. Considering that the servers in centralized filtering algorithms are usually much more powerful than normal clients, we assume that the server takes 1/10 unit time to inspect a file. During the detection, once a harmful file on a client is detected, this client is considered a harmful client.

Figure 8 shows how many harmful clients are detected as time goes on. As in Fig. 6, iDetect is much more efficient than the other algorithms: it takes only 27 unit time to finish detecting all harmful clients. This means that all of the harmful clients have been found by the time each client installing iDetect has inspected 27 files on average. In the same time, the centralized filtering algorithm detects only 2 % of the harmful clients. We also test the efficiency of iDetect under an extreme condition, named One-iDetect in Fig. 8, where only one client installs the iDetect module. One-iDetect is also much better than the centralized filtering method. This result suggests that iDetect can also improve the performance of centralized detection if the servers run iDetect instead of randomized detection.
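The time model above implies a simple count of how many files each scheme inspects within a given period. The sketch below only restates that arithmetic for the default setting (N = 10,000, ω = 10 %, server ten times faster than a client); the function names are hypothetical.

```python
def files_scanned_by_idetect(N, omega, units):
    """Each detecting client inspects one file per unit time, all in parallel."""
    return round(N * omega) * units

def files_scanned_by_server(units, server_files_per_unit=10):
    """A server needing 1/10 unit time per file inspects 10 files per unit time."""
    return units * server_files_per_unit

total_idetect = files_scanned_by_idetect(10_000, 0.10, 27)  # 27000
total_server = files_scanned_by_server(27)                  # 270
```

Within 27 unit time the 1000 detecting clients inspect 27,000 files in total, whereas the single server inspects 270, so part of the gap in Fig. 8 comes from parallelism and part from the focused search of iDetect.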

7.3 Scalability

In this section, we conduct experiments to test the scalability of the iDetect algorithm in different network environments.

First of all, we test how the performance of the algorithm changes as the system scale increases exponentially. Figure 9 illustrates how the time to detect all harmful clients is affected by the increase of the system scale. It shows that the iDetect algorithm is highly scalable, since its detection efficiency stays steady as the system scale increases. This result is consistent with Theorem 3. In contrast, the time taken by the centralized filtering algorithm increases linearly with the system scale. The larger the system, the more iDetect outperforms the centralized filtering algorithm. For a file sharing network with one million peers, the centralized filtering algorithm takes 31,519 times as long as iDetect to finish the detection, when the server can inspect a file within 1/10 unit time. Furthermore, if we plan to deploy a more powerful server to detect files as fast as iDetect, the server would have to be 315,190 times faster than a PC installing iDetect, and would consume that many times the bandwidth. Thus, the huge computing and bandwidth cost of maintaining powerful centralized servers can be saved by deploying iDetect to perform distributed and intelligent detection.
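The relation between the two figures quoted above can be written out explicitly. This is a short worked step, assuming that the centralized detection time scales linearly with the server's per-file inspection time $\tau$ (a symbol introduced here only for illustration):

```latex
\frac{T_{\text{centralized}}(\tau = 1/10)}{T_{\text{iDetect}}} = 31{,}519
\quad\Longrightarrow\quad
\tau^{*} = \frac{1/10}{31{,}519} = \frac{1}{315{,}190}
```

Under that assumption, a server matching iDetect would need to inspect a file in 1/315,190 unit time, that is, 315,190 times faster than a client that needs 1 unit time per file.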

We also test the performance of the iDetect algorithm as the density of harmful files in the system increases. Figure 10a illustrates how the ratio of harmful files on each harmful client affects the detection efficiency. It shows that the detection is much quicker when the ratio is higher. This can be explained roughly by the mutation procedure of the iDetect algorithm. A higher ratio of harmful files on a harmful client makes the client more likely to share the same harmful files as other harmful clients. Accordingly, the antibody including the client is generated with higher probability by the Client mutation, which mutates the antibody towards the clients sharing the same harmful files. Thus the harmful clients are detected with higher probability. On the other hand, Fig. 10b illustrates how the performance changes as the harmful client ratio $\eta$ increases from 1 % to 32 %. It shows that the time to finish detecting all harmful clients grows linearly with the density of the harmful clients.

Fig. 8 Harmful clients detection rate vs time

Fig. 9 The time to detect all harmful clients in different system scales

Furthermore, we test how the ratio of clients installing iDetect affects the detection efficiency. Figure 11 shows that the more clients take part in the distributed detection, the faster the harmful clients are found. This result is consistent with Theorem 3, which shows that the detection time $T_F$ is a decreasing function of $\omega$, the ratio of clients installing iDetect. Moreover, when all the clients in the network install iDetect, the detection is fastest and the efficiency is close to that of the self-detecting method in which each client just inspects itself.

7.4 Sensitivity of the parameters

In order to choose reasonable parameters of the iDetect algorithm for real deployment, we test the sensitivity of two parameters: $\alpha$ and $n$. As listed in Table 2, $\alpha$ is the maximum clone rate and $n$ is the generation size.

Figure 12 shows the performance of iDetect with different combinations of $\alpha$ and $n$. We can see that the detection finishes sooner when $\alpha$ is larger. This can be explained by the following chain reaction. When $\alpha$ is larger, the antibodies that detect files on harmful clients are cloned more in each generation. This makes the File mutation and the Client mutation happen with higher probability, which speeds up the capture of harmful clients sharing the same harmful files. As a result, the time to detect all harmful clients decreases.

Fig. 10 The detection efficiency in different density of harmful files. a The detection efficiency vs different ratio of harmful files on each harmful client. b The detection efficiency vs different ratio of harmful clients in the system

We can also infer from Fig. 12 that once $\alpha$ exceeds a threshold of about 6, further increases in $\alpha$ have no obvious effect on the detection. This indicates that in a real deployment it is reasonable to set $\alpha \ge 6$.

Figure 12 also shows how the generation size $n$ affects the detection. Given the same configuration of $\alpha$, a larger $n$ leads to worse performance. Increasing $n$ gives antibodies with low affinity too much opportunity to be cloned and mutated, which makes the clients waste more time inspecting harmless files. This suggests that we should set $n$ as low as possible. From Fig. 12 we can see that 2 is the most reasonable setting for $n$ when $\alpha \ge 6$. This means that only the two antibodies with the highest affinity are selected to join the evolution in each iteration.

Fig. 11 The detection efficiency in different ratio of the clients installing the iDetect module

Fig. 12 The detection efficiency of iDetect with different configurations of the clone rate $\alpha$ and the generation size $n$
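The roles of n and α in one iteration can be illustrated with a small sketch. Selecting the n antibodies with the highest affinity follows the description above; the rule used for the number of clones (linear in relative affinity and capped at α) is an assumption made only for this illustration and is not taken from the algorithm's definition. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Antibody:
    client: str      # P_j, the client to inspect
    file: str        # F_x, the file to inspect on that client
    affinity: float  # f(Ab), assumed non-negative

def select_generation(antibodies, n):
    """Keep the n antibodies with the highest affinity."""
    return sorted(antibodies, key=lambda ab: ab.affinity, reverse=True)[:n]

def clone_count(ab, best_affinity, alpha):
    """ASSUMPTION: clones scale with relative affinity, between 1 and alpha."""
    if best_affinity <= 0:
        return 1
    return max(1, min(alpha, round(alpha * ab.affinity / best_affinity)))
```

With n = 2, only the two best antibodies survive to be cloned, so low-affinity antibodies that point at harmless files are dropped early; a larger α gives the surviving antibodies more clones and therefore more chances to mutate.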

8 Conclusion

In this paper, we investigate the problem of how to detect harmful contents shared in P2P networks efficiently and effectively. Our major contribution is iDetect, a distributed detection algorithm based on the clonal selection mechanism of immune systems. The detection tasks are executed in parallel on the P2P clients and intelligently focus on the harmful clients to locate the harmful files efficiently. To analyse the performance of the algorithm theoretically, we build a probability model of the evolution of the antibodies. We also conduct experiments to compare iDetect with the centralized filtering algorithms. The theoretical analysis and the experiments show the following results:

1) The iDetect algorithm is much more efficient than the centralized filtering algorithms. When 10 % of the clients of the P2P network take part in the distributed detection, each one only needs to inspect 27 files on average before all harmful clients in the network are found. In the same time, the centralized filtering algorithms can detect only 2 % of the harmful clients, even with a powerful server whose computing ability is 10 times that of normal clients.

2) The iDetect algorithm is highly scalable. The detection efficiency stays almost unchanged while the system scale grows exponentially.

3) The density of the harmful files shared in the network affects the performance of the iDetect algorithm. The more harmful files are shared on each harmful client, the easier it is for the algorithm to detect the harmful clients in the network.

4) The maximum clone rate $\alpha$ and the generation size $n$ are two tunable parameters of the iDetect algorithm. Increasing $\alpha$ accelerates the evolution of antibodies and thus the retrieval of harmful files, while decreasing $n$ reduces the time spent inspecting harmless files. The experimental results suggest setting $\alpha$ to at least 6 and $n$ to 2.

Acknowledgments The work described in this paper was supported by grants from the National Natural Science Foundation of China (Project No. 61070090, 61003174, 60973083 and 61170080), the grant from the Comprehensive Strategic Cooperation Project of Guangdong Province and Chinese Academy of Sciences (Project No. 2012B090400016), the grant from the Technology Planning Project of Guangdong Province (Project No. 2012A011100005), and the grant from the Fundamental Research Funds for the Central Universities (Project No. 2011ZM0069).

References

1. Fournier R, Cholez T, Latapy M, Magnien C, Chrisment I,

Daniloff I, Festor O (2012) Comparing paedophile activity in

different P2P systems. Arxiv eprint, arXiv:1206.4167

2. Wolak J, Finkelhor D, Mitchell K (2011) Child pornography pos-

sessors: trends in offender and case characteristics. Sexual Abuse

23(1):22–42

3. Hughes D, Walkerdine J, Coulson G, Gibson S (2006) Peer-to-

Peer: is deviant behavior the norm on P2P file-sharing networks?

IEEE Distrib Syst Online 7(2):1–7

4. Nam T, Lee H, Jeong C, Han C (2005) A harmful content pro-

tection in peer-to-peer networks. Artif Intell Simul 3397:617–

626

5. Liu J, Ning L, Xue Y, Wang D (2006) PIFF: an intelligent file

filtering mechanism for peer-to-peer network. In: Proceedings 2nd

IEEE symposium dependable, autonomic and secure computing,

DASC’06. Indianapolis, pp 308–314

6. Lee H, Nam T (2007) P2P honeypot to prevent illegal or harm-

ful contents from spreading in P2P network. In: Proceedings

9th international conference advanced communication technology,

ICACT’07. Gangwon-Do, pp 497–501

7. Cruz IP, Aller CF, Garcia SS, Gallardo JC (2010) A careful design

for a tool to detect child pornography in P2P networks. In: Pro-

ceedings IEEE international symposium technology and society,

ISTAS’10. Wollongong, pp 227–233

8. Burnet FM (1959) The clonal selection theory of acquired immu-

nity. Cambridge University Press, Cambridge

9. de Castro LN, Von Zuben FJ (2002) Learning and optimization

using the clonal selection principle. IEEE Trans Evol Comput

6(3):239–251

10. http://en.wikipedia.org/wiki/Gnutella, 2010

11. www.bittorrent.com/, 2010

12. www.emule.org/, 2010

13. Hammami M, Chahir Y, Chen L (2006) WebGuard: a web filter-

ing engine combining textual, structural, and visual content-based

analysis. IEEE Trans Knowl Data Eng 18(2):272–284

14. Lee PY, Hui SC, Fong ACM (2005) An intelligent categorization

engine for bilingual web content filtering. IEEE Trans Multimedia

7(6):1183–1190

15. Harmer PK, Williams PD, Gunsch GH, Lamont GB (2002) An

artificial immune system architecture for computer security appli-

cations. IEEE Trans Evol Comput 6(3):252–280

16. Dasgupta D, Gonzalez F (2002) An immunity-based technique to

characterize intrusions in computer networks. IEEE Trans Evol

Comput 6(3):281–291

17. Gonzales F, Dasgupta D, Kozma R (2002) Combining negative

selection and classification techniques for anomaly detection. In:

Proceedings IEEE congress evolutionary computation, CEC’02.

Honolulu, pp 705–710

18. Dasgupta D, Majumdar NS (2002) Anomaly detection in multi-

dimensional data using negative selection algorithm. In: Proceed-

ings IEEE congress evolutionary computation CEC’02. Honolulu,

pp 1039–1044

19. Anchor KP, Williams PD, Gunsch GH, Lamont GB (2002) The

computer defense immune system: current and future research in

intrusion detection. In: Proceedings IEEE congress evolutionary

computation, CEC’02. Honolulu, pp 1027–1032

20. de Castro LN, Timmis J (2002) Artificial immune systems: a new

computational intelligence approach. Springer-Verlag, Berlin

21. Powers S, He J (2008) A hybrid artificial immune system and

self organising map for network intrusion detection. Inf Sci

178(15):3024–3042

22. Freitas AA, Timmis J (2007) Revisiting the foundations of artifi-

cial immune systems for data mining. IEEE Trans Evol Comput

11(4):521–540

23. Chen TM, Wang V (2010) Web filtering and censoring. Computer

43(3):94–97

24. Saroiu S, Gummadi PK, Gribble SD (2002) A measurement study of peer-to-peer file sharing systems. In: Proceedings multimedia computing and networking, MMCN’02, San Jose

25. Hu W, Wu O, Chen Z, Fu Z, Maybank S (2007) Recognition of

pornographic web pages by classifying texts and images. IEEE

Trans Pattern Anal Mach Intell 29(6):1019–1034

26. Singh A, Ngan T, Druschel P, Wallach D (2006) Eclipse

attacks on overlay networks: threats and defenses. In: Proceed-

ings IEEE international conference computer communications,

INFOCOM’06. Barcelona, pp 1–12

27. Ratnasamy S, Francis P, Handley M, Karp R, Shenker S (2001)

A scalable content-addressable network. In: Proceedings ACM

SIGCOMM conference, SIGCOMM’01. San Diego, pp 161–172

28. Stoica I, Morris R, Karger D, Kaashoek F, Balakrishnan H (2001) Chord: a scalable peer-to-peer lookup service for internet applications. In: Proceedings ACM SIGCOMM conference, SIGCOMM’01. San Diego, pp 149–160

29. Tang C, Xu Z, Dwarkadas S (2001) Peer-to-peer information

retrieval using self-organizing semantic overlay networks. In:

Proceedings ACM SIGCOMM conference, SIGCOMM’01. San

Diego, pp 175–186

30. Reynolds P, Vahdat A (2003) Efficient peer-to-peer keyword

searching. In: Proceedings ACM/IFIP/USENIX international mid-

dleware conference, middleware’03. Rio de Janeiro, pp 21–40


31. Muller W, Boykin PO, Roychowdhury VP, Sarshar N (2008)

Comparison of image similarity queries in P2P systems. Comput

Commun 31(2):375–386

32. Picard D, Revel A, Cord M (2010) An application of swarm intel-

ligence to distributed image retrieval. Inf Sci. In Press, Corrected

Proof, Available online

33. Yang B, Garcia-Molina H (2002) Improving search in peer-to-peer

networks. In: Proceedings international conference distributed

computing systems, ICDCS’02. Vienna, pp 5–14

34. Crespo A, Garcia-Molina H (2002) Routing indices for peer-to-

peer systems. In: Proceedings international conference distributed

computing systems, ICDCS’02. Vienna, pp 23–32

35. Kalogeraki V, Gunopulos D, Zeinalipour-Yazti D (2002) A local

search mechanism for peer-to-peer networks. In: Proceedings

international conference information and knowledge management,

CIKM’02. McLean, pp 300–307

36. Friedman M, Last M, Makover Y, Kandel A (2007) Anomaly detection in web documents using crisp and fuzzy-based cosine clustering methodology. Inf Sci 177(2):467–475

37. Lv J, Yu Z, Zhang T (2011) iDetect: an immunity based algo-

rithm to detect harmful content shared in peer-to-peer networks.

In: Proceedings international conference machine learning and

cybernetics, ICMLC’11. Guilin, pp 926–931

38. http://www.emule-mods.de/?mods=start, 2012

Jianming Lv received the BS degree in Computer Science from Sun Yat-Sen University, China, in 2002, and the PhD degree from Institute of Computing Technology, Chinese Academy of Sciences University in 2008. He is currently a lecturer in South China University of Technology. His research interests include peer-to-peer computing, security and privacy.

Zhiwen Yu is a professor in the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. He received the B.Sc. and M.Phil. degrees from the Sun Yat-Sen University in China in 2001 and 2004 respectively, and the Ph.D. degree in Computer Science from the City University of Hong Kong, in 2008. He holds a research position in Hong Kong Polytechnic University. His research interests include bioinformatics, machine learning, pattern recognition, multimedia, intelligent computing and data mining. He has published more than 70 technical articles in refereed journals and conference proceedings in the areas of bioinformatics, artificial intelligence, pattern recognition and multimedia.

Tieying Zhang is currently an assistant professor at the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include computer networks, distributed computing, peer-to-peer systems, multimedia networking, and network security. He has published over 20 technical papers and book chapters in the above areas. He is a member of IEEE and ACM.

ResearchGate has not been able to resolve any citations for this publication.

A careful design for a tool to detect child pornography in P2P networks

Article

Full-text available

Jun 2010

This paper addresses the social problem of child pornography on peer-to-peer (P2P) networks on the Internet and presents an automated system with effective computer and telematic tools for seeking out and identifying data exchanges with pedophilic content on the Internet. The paper analyzes the social and legal context in which the system must operate and describes the processes by which the system respects the rights of the persons investigated and prevents these tools from being used to establish processes of surveillance and attacks on the privacy of Internet users.

Comparing Pedophile Activity in Different P2P Systems

Article

Full-text available

Jun 2012

Peer-to-peer (P2P) systems are widely used to exchange content over the Internet. Knowledge on paedophile activity in such networks remains limited while it has important social consequences. Moreover, though there are different P2P systems in use, previous academic works on this topic focused on one system at a time and their results are not directly comparable. We design a methodology for comparing \kad and \edonkey, two P2P systems among the most prominent ones and with different anonymity levels. We monitor two \edonkey servers and the \kad network during several days and record hundreds of thousands of keyword-based queries. We detect paedophile-related queries with a previously validated tool and we propose, for the first time, a large-scale comparison of paedophile activity in two different P2P systems. We conclude that there are significantly fewer paedophile queries in \kad than in \edonkey (approximately 0.09% \vs 0.25%).

P2P Honeypot to Prevent Illegal or Harmful Contents From Spreading in P2P network

Conference Paper

Full-text available

Mar 2007

In this paper we propose the P2P Honeypot that prevents illegal or harmful files from spreading in P2P network. We apply the idea of Honeypot to P2P network. We build fake P2P service farms and monitor and trace users who spread or gain illegal or harmful files in P2P network. If we can apply this system widely, we can expect illegal or harmful files to be wiped out in P2P network.

Web Filtering and Censoring

Article

Full-text available

Apr 2010

Information on the Web is not as uncontrolled as it may appear and for controlling them, web filtering and censoring is essential. Web filters differ in complexity, granularity, accuracy, location, and transparency. The simplest Web filters depend on blacklists of IP addresses. The main advantage of blacklists is speed, essentially a fast table lookup. Speed allows Web filtering at choke points in the network where traffic is aggregated such as gateways between neighboring national networks. IP and URL blacklists can be deployed at proxy-based filters. A proxy-based filter checks the IP addresses or URLs in all Web requests against a blacklist. If it detects a blacklisted IP address or URL, the proxy filter can return a blockpage with an error message or explanation that the content was blocked. Content filters typically use machine learning or AI techniques to classify webpages into a set of predefined categories. Intelligent content filters examine various elements of a webpage for classification, including the metadata, links, text, images, and scripts.

The Clonal Selection Theory of Immunity

Book

Jan 1959

FM Burnet

PIFF: An Intelligent File Filtering Mechanism for Peer-to-Peer Network

Conference Paper

Sep 2006

The distribution of copyright content over peer-to-peer (P2P) networks facilitates misuse of protected digital property and severely violates the rights of creators and owners. Besides, the anonymity and other features of latest P2P applications make law enforcement almost impractical. Unfortunately, current technology efforts attempting to block unauthorized files depend on manually predefined keywords which can be easily modified by the application, and they fail to distinguish between authorized and unauthorized files. In this context, we present PIFF, an intelligent file filtering mechanism for peer-to-peer network, which effectively blocks illegal file distribution based on file signatures automatically generated and deployed by the system. With a hierarchy infrastructure and decision algorithm, our solution discovers and identifies the file signatures much faster than manual analysis, and spreads it among system nodes to perform filtering immediately. Experiment results suggest that compared with unprotected environment, PIFF can reduce over 85% peers for unauthorized files without any affect on the distribution of authorized files.

Artificial immune systems: A new computational intelligence approach

Book

Jun 2002

An Application of Swarm Intelligence to Distributed Image Retrieval

Article

Jan 2010
INFORM SCIENCES

In this article, we introduce an application of swarm intelligence to visual information retrieval distributed over networks. Based on the relevance feedback scheme, we use ant-like agents to crawl the network and to retrieve relevant images. Agents' movements are influenced by markers stored on the hosts. These markers are reinforced to match the distribution of relevant images over the network. We tackle the use of the information gathered during previous search sessions. In order to match the different categories available on the network, we use several markers. Sessions searching for the same category will thus use the same markers. The system involves three learning problems: the selection of relevant markers regarding the searched category, the reinforcement of these markers, and the learning of the relevance function. All of these problems are based on the relevance feedback loop. We test our system on a custom network hosting images taken from the well-known TrecVid dataset. Our system shows a high improvement over classical content-based image retrieval systems, which do not use information from previous sessions.

A Hybrid Artificial Immune System and Self Organising Map for Network Intrusion Detection

Article

Aug 2008
INFORM SCIENCES

Network intrusion detection is the problem of detecting unauthorised use of, or access to, computer systems over a network. Two broad approaches exist to tackle this problem: anomaly detection and misuse detection. An anomaly detection system is trained only on examples of normal connections, and thus has the potential to detect novel attacks. However, many anomaly detection systems simply report the anomalous activity, rather than analysing it further in order to report higher-level information that is of more use to a security officer. On the other hand, misuse detection systems recognise known attack patterns, thereby allowing them to provide more detailed information about an intrusion. However, such systems cannot detect novel attacks. A hybrid system is presented in this paper with the aim of combining the advantages of both approaches. Specifically, anomalous network connections are initially detected using an artificial immune system. Connections that are flagged as anomalous are then categorised using a Kohonen Self Organising Map, allowing higher-level information, in the form of cluster membership, to be extracted. Experimental results on the KDD 1999 Cup dataset show a low false positive rate and a detection and classification rate for Denial-of-Service and User-to-Root attacks that is higher than those in a sample of other works.

A local search mechanism for peer-to-peer networks.

Conference Paper