On Permutation Masks in Hamming Negative Selection

Thomas Stibor (1), Jonathan Timmis (2), and Claudia Eckert (1)
(1) Department of Computer Science, Darmstadt University of Technology
{stibor, eckert}@sec.informatik.tu-darmstadt.de
(2) Departments of Electronics and Computer Science, University of York, Heslington, York
jtimmis@cs.york.ac.uk
Abstract. Permutation masks were proposed for reducing the number of holes in Hamming negative selection when applying the r-contiguous or r-chunk matching rule. Here, we show that (randomly determined) permutation masks re-arrange the semantic representation of the underlying data and therefore shatter self regions. As a consequence, detectors do not cover areas around self regions; instead, they cover randomly distributed elements across the space. In addition, we observe that the resulting holes occur in regions where no self regions should occur.
1 Introduction
Negative selection has been applied extensively to anomaly detection problems [1,2,3,4]. Anomaly detection, also termed one-class classification, can be considered a type of pattern classification problem in which one tries to describe a single class of objects and distinguish it from all other possible objects. More formally, one-class classification is the problem of generating decision boundaries that successfully distinguish between the normal and the anomalous class. Hamming negative selection is an immune-inspired technique for one-class classification problems. Recent results, however, have revealed several problems concerning the algorithmic complexity of generating detectors [5,6,7] and the choice of a proper matching threshold that allows correct generalization regions to be generated [8]. In this paper we investigate an extended technique for Hamming negative selection: permutation masks. Permutation masks are immunologically motivated by lymphocyte diversity. Lymphocyte diversity is an important property of the immune system, as it enables a lymphocyte to react to many substances, i.e. it induces diversity and generalization. This kind of generalization process inspired Hofmeyr [3,9] to propose a similar counterpart for use in Hamming negative selection. Hofmeyr introduced permutation masks in order to reduce the number of undetectable elements. It was argued that varying the representation by means of permutation masks could be useful for covering the non-self space efficiently (see Fig. 1).
H. Bersini and J. Carneiro (Eds.): ICARIS 2006, LNCS 4163, pp. 122–135, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Fig. 1. Visualized concept of varying representations by means of permutation masks
to reduce the number of undetectable elements. The light gray shaded area in the
middle represents the self regions (normal class in terms of anomaly detection). The
dark gray shaded shapes represent areas which are covered by detectors with varying
representations. The white area represents the non-self space (anomalous class in terms
of anomaly detection). This figure is taken from [9].
In the following two sections we briefly introduce the standard negative selection inspired anomaly detection technique.
2 Artificial Immune System
An artificial immune system (AIS) [10] is a paradigm inspired by the immune system and is used for solving computational and information processing problems. An AIS can be described, and developed, using a framework [10] which contains the following basic elements:
- A representation for the artificial immune elements.
- A set of functions which quantify the interactions of the artificial immune elements (affinity).
- A set of algorithms which are based on observed immune principles and methods.
This 3-step abstraction (representation, affinity, algorithm) for using the AIS framework is discussed in the following sections.
2.1 Hamming Shape-Space
The notion of shape-space was introduced by Perelson and Oster [11] and allows a quantitative affinity description between immune components known as antibodies and antigens. More precisely, a shape-space is a metric space with an associated distance (affinity) function.
The Hamming shape-space $U_l^{\Sigma}$ is built from all elements of length $l$ over a finite alphabet $\Sigma$.
Example 1.
$$
\Sigma = \{0,1\}:\quad
\underbrace{\begin{matrix}
000 \ldots 000 \\
000 \ldots 001 \\
\vdots \\
111 \ldots 111
\end{matrix}}_{l}
\qquad\qquad
\Sigma = \{A, C, G, T\}:\quad
\underbrace{\begin{matrix}
AAA \ldots AAA \\
AAA \ldots AAC \\
\vdots \\
TTT \ldots TTT
\end{matrix}}_{l}
$$
In Example 1, two Hamming shape-spaces for different alphabets and alphabet sizes are presented. On the left, a Hamming shape-space defined over the binary alphabet, with elements of length $l$, is shown. On the right, a Hamming shape-space defined over the alphabet of DNA bases (Adenine, Cytosine, Guanine, Thymine) is presented.
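As a small illustration of the definition above, a Hamming shape-space can be enumerated directly for short lengths. The following is a minimal sketch; the function name `hamming_shape_space` is a hypothetical helper chosen for this example and does not appear in the paper.

```python
from itertools import product

def hamming_shape_space(alphabet, length):
    """Yield every element of the given length over the alphabet, in lexicographic order."""
    for symbols in product(alphabet, repeat=length):
        yield "".join(symbols)

# Binary alphabet, l = 3: 2^3 = 8 elements, from 000 to 111.
binary_space = list(hamming_shape_space("01", 3))
# DNA alphabet, l = 2: 4^2 = 16 elements, from AA to TT.
dna_space = list(hamming_shape_space("ACGT", 2))

print(binary_space)
print(len(dna_space))
```

The size of the space is $|\Sigma|^l$, so exhaustive enumeration is only practical for small $l$.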
2.2 R-Contiguous and R-Chunk Matching
A formal description of antigen-antibody interactions requires not only a representation (encoding), but also appropriate affinity functions. Percus et al. [12] proposed the r-contiguous matching rule for abstracting the affinity an antibody needs to recognize an antigen.
Definition 1. An element $e \in U_l^{\Sigma}$ with $e = e_1 e_2 \ldots e_l$ and a detector $d \in U_l^{\Sigma}$ with $d = d_1 d_2 \ldots d_l$ match with the r-contiguous rule if a position $p$ exists where $e_i = d_i$ for $i = p, \ldots, p + r - 1$, $p \le l - r + 1$.
Informally, two elements of the same length match if at least $r$ contiguous characters are identical.
An additional rule, which subsumes (see footnote 1) the r-contiguous rule, is the r-chunk matching rule [13].
Definition 2. An element $e \in U_l^{\Sigma}$ with $e = e_1 e_2 \ldots e_l$ and a detector $d \in \mathbb{N} \times D_r^{\Sigma}$ with $d = (p \mid d_1 d_2 \ldots d_r)$, for $r \le l$, $p \le l - r + 1$, match with the r-chunk rule if a position $p$ exists where $e_i = d_i$ for $i = p, \ldots, p + r - 1$.
Informally, element $e$ and detector $d$ match if a position $p$ exists where all characters of $e$ and $d$ are identical over a sequence of length $r$.
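The two matching rules can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and positions are 0-based here (the definitions above count from 1).

```python
def r_contiguous_match(element, detector, r):
    """True if element and detector agree on at least r contiguous positions."""
    if len(element) != len(detector):
        raise ValueError("element and detector must have the same length")
    run = 0  # length of the current run of identical characters
    for e, d in zip(element, detector):
        run = run + 1 if e == d else 0
        if run >= r:
            return True
    return False

def r_chunk_match(element, position, chunk):
    """True if the chunk equals the window of element that starts at position (0-based)."""
    return element[position:position + len(chunk)] == chunk

print(r_contiguous_match("00010", "00001", 3))  # True: the first three characters agree
print(r_contiguous_match("01101", "00001", 3))  # False: the longest common run has length 2
print(r_chunk_match("01011", 2, "011"))         # True: the window at position 2 is 011
```

An r-chunk detector only constrains one window, which is why a single r-contiguous detector of length $l$ corresponds to a set of $l - r + 1$ r-chunk detectors.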
We use the term subsume because any r-contiguous detector can be represented as a set of r-chunk detectors. This implies that any set of elements from $U_l^{\Sigma}$ that can be recognized with a set of r-contiguous detectors can also be recognized with some set of r-chunk detectors. Surprisingly, the converse is not true, i.e. there exists a set of elements from $U_l^{\Sigma}$ that can be recognized with a set of r-chunk detectors but not with any set of r-contiguous detectors. We demonstrate this on an example; a formal approach is provided in [14].
Footnote 1: To subsume: to include within a larger entity.
Example 2. Given a Hamming shape-space $U_5^{\{0,1\}}$, a set $S = \{01011, 01100, 01110, 10010, 10100, 11100\}$ of self elements and a detector length $r = 3$.
All possible generable r-contiguous detectors for the complementary space $U_5^{\{0,1\}} \setminus S$ are $D_{r\text{-contiguous}} = \{00000, 00001, 00111, 11000, 11001\}$.
All possible generable r-chunk detectors are
$D_{r\text{-chunk}} = \{0|000, 0|001, 0|110, 1|000, 1|011, 1|100, 2|000, 2|001, 2|101, 2|111\}$.
The set $D_{r\text{-contiguous}}$ recognizes the elements
$P_1 = U_5^{\{0,1\}} \setminus (S \cup \{01010, 01101, 10011, 10101, 11101, 11110\})$,
whereas the set $D_{r\text{-chunk}}$ recognizes the elements
$P_2 = U_5^{\{0,1\}} \setminus (S \cup \{10011, 01010, 11110\})$. Hence $|P_1| \le |P_2|$.
Example 2 shows that the set of r-chunk detectors $D_{r\text{-chunk}}$ recognizes more elements of $U_5^{\{0,1\}}$ than the set of r-contiguous detectors $D_{r\text{-contiguous}}$, and therefore the r-chunk matching rule subsumes the r-contiguous rule.
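Example 2 is small enough to check by exhaustive enumeration. The sketch below recomputes both detector sets and the elements they recognize; the variable names are chosen for this illustration, and chunk positions are 0-based to match the detector notation used in the example.

```python
from itertools import product

L, R = 5, 3
SELF = {"01011", "01100", "01110", "10010", "10100", "11100"}
UNIVERSE = {"".join(bits) for bits in product("01", repeat=L)}
POSITIONS = range(L - R + 1)
# Windows of length R that occur in some self element, per start position.
SELF_WINDOWS = [{s[p:p + R] for s in SELF} for p in POSITIONS]

# r-contiguous detectors: full-length strings sharing no window with any self element.
d_contiguous = {d for d in UNIVERSE
                if all(d[p:p + R] not in SELF_WINDOWS[p] for p in POSITIONS)}

# r-chunk detectors: (position, chunk) pairs whose chunk is absent from self at that position.
all_chunks = {"".join(bits) for bits in product("01", repeat=R)}
d_chunk = {(p, c) for p in POSITIONS for c in all_chunks - SELF_WINDOWS[p]}

# Elements recognized by each detector set.
p1 = {e for e in UNIVERSE
      if any(e[p:p + R] == d[p:p + R] for d in d_contiguous for p in POSITIONS)}
p2 = {e for e in UNIVERSE if any(e[p:p + R] == c for p, c in d_chunk)}

print(sorted(d_contiguous))          # ['00000', '00001', '00111', '11000', '11001']
print(len(p1), len(p2))              # 20 23
print(sorted(UNIVERSE - SELF - p2))  # ['01010', '10011', '11110']
```

The three elements printed last are the holes that remain under the r-chunk rule; the r-contiguous rule leaves three further elements undetected.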
3 Hamming Negative Selection
Forrest et al. [1] proposed a generic (see footnote 2) negative selection algorithm for detecting changes in data streams. The shape-space $U = S_{seen} \cup S_{unseen} \cup N$ is partitioned into training data $S_{seen}$ and testing data $(S_{seen} \cup S_{unseen} \cup N)$. The basic idea is to generate a number of detectors for the complementary space $U \setminus S_{seen}$ and then to apply these detectors to classify new (unseen) data as self (no data manipulation) or non-self (data manipulation).
Algorithm 1. Generic Negative Selection Algorithm
input: $S_{seen}$ = set of seen self elements
output: $D$ = set of generated detectors
begin
1. Define self as a set $S_{seen}$ of elements in shape-space $U$.
2. Generate a set $D$ of detectors, such that each fails to match any element in $S_{seen}$.
3. Monitor (seen and unseen) data $\delta \subseteq U$ by continually matching the detectors in $D$ against $\delta$.
end
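The three steps of Algorithm 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: detectors are drawn at random and censored against self (one common way to realize step 2, which the algorithm itself leaves open), the r-contiguous rule is used for matching, and all names and parameter values are hypothetical.

```python
import random

def matches(element, detector, r):
    """r-contiguous rule: at least r contiguous identical characters."""
    run = 0
    for e, d in zip(element, detector):
        run = run + 1 if e == d else 0
        if run >= r:
            return True
    return False

def generate_detectors(self_seen, length, r, n_candidates, rng):
    """Step 2: keep random candidates that fail to match every seen self element."""
    detectors = set()
    for _ in range(n_candidates):
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(s, candidate, r) for s in self_seen):
            detectors.add(candidate)
    return detectors

def monitor(data, detectors, r):
    """Step 3: label an element True (non-self) if any detector matches it."""
    return {x: any(matches(x, d, r) for d in detectors) for x in data}

self_seen = {"0001", "1000"}                      # step 1: define self
detectors = generate_detectors(self_seen, 4, 3, 200, random.Random(0))
labels = monitor(["0001", "1000", "0000", "1001", "1111"], detectors, 3)
print(labels)
```

Whatever detectors the random draw produces, the two self elements are never flagged, and neither are 0000 and 1001, because no generable detector matches them.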
The generic negative selection algorithm can be used with arbitrary shape-spaces and affinity functions. In this paper, we focus on Hamming negative selection, i.e. the negative selection algorithm which operates on Hamming shape-space and employs the r-chunk matching rule and permutation masks.
Footnote 2: Applicable to arbitrary shape-spaces.
3.1 Holes as Generalization Regions
The r-contiguous and r-chunk matching rules induce undetectable elements, termed holes (see Fig. 2). In general, all matching rules which match over a certain element length induce holes. This statement is theoretically investigated in [15,14] and empirically explored (see footnote 3) in [16]. Holes are some (see footnote 4) elements from $U \setminus S_{seen}$, i.e. elements not seen during the training phase. For these elements, no detectors can be generated and therefore they cannot be recognized and classified as non-self elements. However, the term holes is not an accurate expression, as holes are necessary to generalize beyond the training set. A detector set which generalizes well ensures that seen and unseen self elements are not recognized by any detector, whereas all other elements are recognized by detectors and classified as non-self. Hence, holes must represent unseen self elements; or in other words, holes must represent generalization regions in the shape-space $U_l^{\Sigma}$.
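[Figure 2: the self elements 1000 and 0001 are split into overlapping windows of length r = 3 (100, 000 and 000, 001); recombining the windows yields {0001, 1001} = {s1, h1} and {1000, 0000} = {s2, h2}.]
Fig. 2. Self elements $s_1 = 0001$ and $s_2 = 1000$ induce holes $h_1, h_2$, i.e. elements which are not detectable with r-contiguous and r-chunk matching rules for $r = 3$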
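For the r-chunk rule, the holes of Fig. 2 can be computed directly: an element is undetectable exactly when each of its length-r windows also occurs in some self element at the same position, since no detector can then be generated for any of its windows. A minimal sketch (function names are hypothetical):

```python
from itertools import product

def undetectable(self_set, length, r, alphabet="01"):
    """Elements whose every window of length r occurs in some self element at the same position."""
    positions = range(length - r + 1)
    self_windows = [{s[p:p + r] for s in self_set} for p in positions]
    space = ("".join(t) for t in product(alphabet, repeat=length))
    return {e for e in space
            if all(e[p:p + r] in self_windows[p] for p in positions)}

def holes(self_set, length, r):
    """Undetectable elements that are not self elements."""
    return undetectable(self_set, length, r) - set(self_set)

print(sorted(holes({"0001", "1000"}, 4, 3)))  # ['0000', '1001']
```

The result reproduces $h_1 = 1001$ and $h_2 = 0000$ for $r = 3$.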
4 Permutation Masks
Permutation masks were proposed by Hofmeyr [3,9] for reducing the number of holes. A permutation mask is a bijective mapping $\pi$ that specifies a reordering for all elements $a_i \in U_l^{\Sigma}$, i.e. $a_1 \to \pi(a_1), a_2 \to \pi(a_2), \ldots, a_{|\Sigma|^l} \to \pi(a_{|\Sigma|^l})$.
More formally, a permutation $\pi \in S_n$, where $n \in \mathbb{N}$, can be written as a $2 \times n$ matrix, where the first row contains the elements $a_1, a_2, \ldots, a_n$ and the second row the new arrangement $\pi(a_1), \pi(a_2), \ldots, \pi(a_n)$, i.e.
$$
\begin{pmatrix}
a_1 & a_2 & \ldots & a_n \\
\pi(a_1) & \pi(a_2) & \ldots & \pi(a_n)
\end{pmatrix}
$$
For the sake of simplicity we will use the equivalent cycle notation [17] to specify a permutation. A permutation in cycle notation can be written as $(b_1\, b_2 \ldots b_n)$ and means $b_1$ becomes $b_2$, ..., $b_{n-1}$ becomes $b_n$, and $b_n$ becomes $b_1$. In addition, this notation allows the identity and non-cyclic mappings; for instance, $(b_1)(b_2\, b_3)(b_4)$ means $b_1 \to b_1$, $b_2 \to b_3$, $b_3 \to b_2$ and $b_4 \to b_4$.
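Cycle notation translates directly into an explicit mapping. The sketch below is illustrative only; `cycles_to_mapping` is a hypothetical helper name, and cycles are given as tuples.

```python
def cycles_to_mapping(cycles):
    """Map every entry to its successor within its cycle; the last entry wraps to the first."""
    mapping = {}
    for cycle in cycles:
        for k, b in enumerate(cycle):
            mapping[b] = cycle[(k + 1) % len(cycle)]
    return mapping

# The example from the text with b_i = i: (1)(2 3)(4)
print(cycles_to_mapping([(1,), (2, 3), (4,)]))  # {1: 1, 2: 3, 3: 2, 4: 4}
```

A one-element cycle maps its entry to itself, which is how the notation expresses fixed points.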
Footnote 3: Hamming, r-contiguous, r-chunk and Rogers & Tanimoto matching rule.
Footnote 4: The number of holes is controlled by the matching threshold $r$.
4.1 Permutation Masks for Inducing Other Holes
As explained above, a permutation mask is a bijective mapping and can therefore increase or reduce the number of holes. There also exist permutation masks which neither increase nor reduce the number of holes; the simplest example is the identity permutation mask.
To reduce the number of holes, $\pi$ must be chosen appropriately, and a certain number of detectors must be generable.
Reconsider the self elements $s_1 = 0001$, $s_2 = 1000$ in Fig. 2. One can see that the elements $h_1 = 1001$ and $h_2 = 0000$ are not detectable by the r-contiguous and r-chunk matching rules. However, after applying the permutation mask $\pi_0 = (1\,2\,4\,3)$, i.e.
$$\pi_0(s_1) = 0010, \quad \pi_0(s_2) = 0100,$$
one can verify (see Fig. 3) that the holes $h_1, h_2$ are eliminated.
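[Figure 3: the permuted self elements π0(1000) = 0100 and π0(0001) = 0010 are split into overlapping windows of length r = 3 (010, 100 and 001, 010); recombining the windows yields only {0010} = {π0(s1)} and {0100} = {π0(s2)}.]
Fig. 3. The permuted self elements $\pi_0(s_1)$ and $\pi_0(s_2)$ induce no holes under the r-contiguous and r-chunk matching rules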
However, it is also clear that (1 2 4 3), (2 4 3 1), (4 3 1 2) and (3 1 2 4) represent the same permutation, namely cyclic rotations of the cycle $\pi_0 = (1\,2\,4\,3)$. Specifically, all cyclic rotations of an arbitrarily selected $\pi$ lead, in terms of r-chunk and r-contiguous matching, to the same holes.
On the other hand, there exist permutation masks which do not reduce holes, i.e. $\pi(s_i) = s_j$ for $i \ne j$ and self elements $s_1, s_2, \ldots, s_{|S|}$. An example is the permutation $\pi_1 = (1\,4)(2)(3)$, as $\pi_1(s_1) = s_2$ and $\pi_1(s_2) = s_1$.
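The effect of $\pi_0$ and $\pi_1$ on the two self elements can be checked mechanically. The sketch below assumes the convention implied by $\pi_0(0001) = 0010$: the character at position $i$ moves to the position that follows $i$ in its cycle (positions counted from 1). Holes are computed for the r-chunk rule; all function names are hypothetical.

```python
from itertools import product

def apply_mask(cycles, s):
    """Move the character at position i (1-based) to the successor of i in its cycle."""
    target = {i: i for i in range(1, len(s) + 1)}
    for cycle in cycles:
        for k, i in enumerate(cycle):
            target[i] = cycle[(k + 1) % len(cycle)]
    out = [""] * len(s)
    for i, ch in enumerate(s, start=1):
        out[target[i] - 1] = ch
    return "".join(out)

def holes(self_set, length, r):
    """Non-self elements whose every window of length r occurs in self at the same position."""
    positions = range(length - r + 1)
    windows = [{s[p:p + r] for s in self_set} for p in positions]
    space = ("".join(t) for t in product("01", repeat=length))
    return {e for e in space
            if all(e[p:p + r] in windows[p] for p in positions)} - set(self_set)

self_set = {"0001", "1000"}
pi0 = [(1, 2, 4, 3)]
pi1 = [(1, 4), (2,), (3,)]
masked0 = {apply_mask(pi0, s) for s in self_set}
masked1 = {apply_mask(pi1, s) for s in self_set}
print(sorted(masked0), sorted(holes(masked0, 4, 3)))  # ['0010', '0100'] []
print(sorted(masked1), sorted(holes(masked1, 4, 3)))  # ['0001', '1000'] ['0000', '1001']
```

Writing the cycle as (2 4 3 1) instead of (1 2 4 3) yields the same mapping and therefore the same result, while $\pi_1$ merely swaps the two self elements and leaves both holes in place.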
Furthermore, as mentioned above, a permutation mask can also increase the number of holes. In the experiments presented below, this is illustrated for instance in figures 5(c) and 5(d) (see footnote 5).
5 Permutation Masks Experiments in Hamming Negative
Selection
In [18,8], results were presented which demonstrated the relationship between the matching threshold $r$ and generalization regions when the r-chunk matching rule is applied in Hamming negative selection. Recall that, as holes are not detectable by any detector, holes must represent unseen self elements, or in other words, generalization regions. In the following experiment we investigate how randomly determined permutation masks influence the occurrence of holes (generalization regions). More specifically, we empirically explore whether holes occur in suitable generalization regions when a randomly determined permutation mask is applied. Finally, we explore empirically whether randomly determined permutation masks reduce the number of holes.
Footnote 5: With and without permutation mask.
Stibor et al. [8] have shown in prior experiments that the matching threshold $r$ is a crucial parameter and is inextricably linked to the input data being analyzed. However, permutation masks were not considered in [8]. In order to study the impact of permutation masks on generalization regions, and to obtain results comparable to previously performed experiments [8], we utilize the same mapping function and data set. Furthermore, we explore the impact of permutation masks on an additional data set (see Fig. 4).
5.1 Experiments Settings
The first self data set contains 1000 Gaussian ($\mu = 0.5$, $\sigma = 0.1$) generated points $p = (x, y) \in [0,1]^2$. Each point $p$ is mapped to a binary string
$$\underbrace{b_1, b_2, \ldots, b_8}_{b_x},\ \underbrace{b_9, b_{10}, \ldots, b_{16}}_{b_y},$$
where the first 8 bits encode the integer x-value $i_x := \lceil 255 \cdot x + 0.5 \rceil$ and the last 8 bits the integer y-value $i_y := \lceil 255 \cdot y + 0.5 \rceil$, i.e.
$$[0,1]^2 \to (i_x, i_y) \in [1, \ldots, 256 \times 1, \ldots, 256] \to (b_x, b_y) \in U_8^{\{0,1\}} \times U_8^{\{0,1\}}$$
This mapping is proposed in [18] and also utilized in [8]; it allows a straightforward visualization of real-valued encoded points in Hamming negative selection.
The second data set (termed banana data set) is depicted in Fig. 4 and is a commonly used benchmark for anomaly detection problems [19]. The banana data set is taken from [20] and consists of 5300 points in total. These points are partitioned into two different classes: $C^+$, which represents points inside the "banana" shape, and $C^-$, which contains points outside the "banana" shape. In this experiment we have taken points from $C^+$ only, to simulate one self region (similar to Fig. 1). More specifically, we have normalized all points from $C^+$ to the unit square $[0,1]^2$ with the min-max method. We then sampled 1000 random points from $C^+$ and mapped those sampled points to bit-strings of length 16.
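The mapping from points in $[0,1]^2$ to 16-bit strings might be implemented along the following lines. Two assumptions are made here and are not taken from the paper: the rounding is a ceiling (which matches the stated integer range 1, ..., 256), and the integer $i$ is stored as $i - 1$ so that it fits into 8 bits. The function names are hypothetical.

```python
import math

def encode_coordinate(v):
    """Encode a coordinate v in [0, 1] as an 8-bit string."""
    if not 0.0 <= v <= 1.0:
        raise ValueError("coordinate must lie in [0, 1]")
    i = math.ceil(255 * v + 0.5)   # assumption: ceiling, so i lies in 1..256
    return format(i - 1, "08b")    # assumption: store i - 1, which fits into 8 bits

def encode_point(x, y):
    """Concatenate the 8-bit codes of x and y into one 16-bit string."""
    return encode_coordinate(x) + encode_coordinate(y)

print(encode_point(0.0, 1.0))  # 0000000011111111
```

With this layout the first 8 bits carry only x information and the last 8 bits only y information, which is the semantic structure a permutation mask over all 16 positions can break up.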
As the r-chunk matching rule subsumes the r-contiguous rule, i.e. it recognizes at least as many elements as the r-contiguous matching rule (see Section 2.2), we have performed all experiments with the r-chunk matching rule. Furthermore, as proposed in [3,9], we have randomly determined permutation masks $\pi \in S_{16}$.
5.2 Experimental Results
Experimental results are presented in Figs. 5, 6, 7 and 8. The black points represent the 1000 sampled self elements, the white points are holes, and the grey points represent areas which are covered by r-chunk detectors. It is not surprising that for both data sets, holes occur as they should in generalization regions when $8 \le r \le 10$. This phenomenon is discussed and explained in [8]. To summarize the results from [8], a detector matching length which is not at least as long as the semantic representation of the underlying data (in this case 8 bits for the $x$ and $y$ coordinates) results in incorrect generalization regions.
[Figure 4: scatter plot of the banana data set; both axes, X and Y, range from 0.0 to 1.0.]
Fig. 4. Banana data set (points from class $C^+$), min-max normalized to $[0,1]^2$. In a perfect case (error-free detection), the r-chunk detectors should cover regions outside the "banana" shape. The region within the "banana" shape is the generalization region and should consist of undetectable elements, i.e. holes and self elements.
What is more interesting, though, is the observation that a (randomly determined) permutation mask shatters the semantic representation of the underlying data (see Figs. 5-8 (b,d,f,h,j,l,n,p,r,t)), and therefore holes are randomly distributed across the space instead of being concentrated inside or close to self regions. This observation also means that detectors do not cover areas around the self regions; instead, they recognize elements which are also randomly distributed across the space. Furthermore, one can see that the number of holes when applying permutation masks (see Figs. 5-8 (b,d,f,h,j,l,n,p,r,t)) is in some cases significantly higher than without permutation masks (see Figs. 5-8 (a,c,e,g,i,k,m,o,q,s)). This could be explained by the previous observation that permutation masks distort the underlying data and therefore shatter self regions. As a consequence, the underlying data is transformed into a collection of random chunks. For randomly determined self elements, Stibor et al. [6] showed that the number of holes increases exponentially for $r := l \to 0$.
Of course, this shattering effect is linked very strongly to the mapping function employed. However, it is clear that each permutation mask, except the identity permutation, semantically distorts the data (more or less). Furthermore, we believe that finding a permutation mask which does not significantly distort the semantic representation of the data may be computationally intractable (see footnote 6).
Footnote 6: In the worst case, one has to check all $n!$ permutations of $S_n$.
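The shattering effect can be made concrete by pushing a few neighbouring points through a randomly drawn mask and reading the result back as coordinates. This is only a sketch: the seed, the sample points and the helper names are arbitrary choices made for this illustration, and positions are 0-based.

```python
import random

def random_mask(n, rng):
    """Draw a permutation of the positions 0..n-1 uniformly at random."""
    mask = list(range(n))
    rng.shuffle(mask)
    return mask

def apply_mask(mask, bits):
    """Move the character at position i to position mask[i]."""
    out = [""] * len(bits)
    for i, ch in enumerate(bits):
        out[mask[i]] = ch
    return "".join(out)

def decode(bits):
    """Read a 16-bit string as two 8-bit integers (x, y)."""
    return int(bits[:8], 2), int(bits[8:], 2)

rng = random.Random(2006)          # arbitrary seed, for repeatability only
mask = random_mask(16, rng)
cluster = [(120, 130), (121, 130), (122, 131)]   # three neighbouring integer points
for x, y in cluster:
    bits = format(x, "08b") + format(y, "08b")
    print((x, y), "->", decode(apply_mask(mask, bits)))
```

Because the mask mixes high-order and low-order bits of both coordinates, points that differ by one unit before masking will typically end up far apart afterwards. The mask itself is a bijection, so no information is lost; only the neighbourhood structure changes.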
Fig. 5. A visualized simulation run, with 1000 random (self) points generated by a Gaussian distribution with mean $\mu = 0.5$ and variance $\sigma = 0.1$. The grey shaded area is covered by the generated r-chunk detectors, the white areas are holes. The black points are self elements. The captions which include a "π" are simulation results with the randomly determined permutation mask $\pi \in S_{16}$. Panels (a), (c), (e), (g), (i), (k), (m), (o), (q), (s) show r = 2, 3, ..., 11 without a permutation mask; panels (b), (d), (f), (h), (j), (l), (n), (p), (r), (t) show the same values of r with π.
Fig. 6. An additional visualized simulation run, with 1000 random (self) points generated by a Gaussian distribution with mean $\mu = 0.5$ and variance $\sigma = 0.1$. The grey shaded area is covered by the generated r-chunk detectors, the white areas are holes. The black points are self elements. The captions which include a "π" are simulation results with the randomly determined permutation mask $\pi \in S_{16}$. Panels (a), (c), (e), (g), (i), (k), (m), (o), (q), (s) show r = 2, 3, ..., 11 without a permutation mask; panels (b), (d), (f), (h), (j), (l), (n), (p), (r), (t) show the same values of r with π.
Fig. 7. A visualized simulation run, with 1000 randomly sampled (self) points from the banana data set. The grey shaded area is covered by the generated r-chunk detectors, the white areas are holes. The black points are self elements. The captions which include a "π" are simulation results with the randomly determined permutation mask $\pi \in S_{16}$. Panels (a), (c), (e), (g), (i), (k), (m), (o), (q), (s) show r = 2, 3, ..., 11 without a permutation mask; panels (b), (d), (f), (h), (j), (l), (n), (p), (r), (t) show the same values of r with π.
Fig. 8. An additional visualized simulation run, with 1000 randomly sampled (self) points from the banana data set. The grey shaded area is covered by the generated r-chunk detectors, the white areas are holes. The black points are self elements. The captions which include a "π" are simulation results with the randomly determined permutation mask $\pi \in S_{16}$. Panels (a), (c), (e), (g), (i), (k), (m), (o), (q), (s) show r = 2, 3, ..., 11 without a permutation mask; panels (b), (d), (f), (h), (j), (l), (n), (p), (r), (t) show the same values of r with π.
In order to obtain representative results, we performed 50 simulation runs,
each with a randomly determined permutation mask for both data sets. Due
to the lack of space to present all 50 simulation runs, we have selected two
simulation results at random for each data set (see Fig. 5,6,7,8). The remaining
simulation results are closely comparable to results in figures (5,6,7,8).
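A single simulation run of this kind can be approximated with the sketch below, which counts r-chunk holes for Gaussian self points with and without a random mask. It is not the authors' code and is not expected to reproduce the figures: the seed, the choice of r values, the clamping of samples that fall outside $[0,1]$, the ceiling-based encoding stored as $i - 1$, and the reading of 0.1 as a standard deviation are all assumptions of this illustration.

```python
import math
import random

LENGTH = 16

def encode(x, y):
    """Map a point to a 16-bit string, 8 bits per coordinate."""
    def to_bits(v):
        i = math.ceil(255 * v + 0.5)                    # assumption: ceiling rounding
        return format(min(max(i, 1), 256) - 1, "08b")   # assumption: clamp, store i - 1
    return to_bits(x) + to_bits(y)

def apply_mask(mask, bits):
    """Move the character at position i (0-based) to position mask[i]."""
    out = [""] * len(bits)
    for i, ch in enumerate(bits):
        out[mask[i]] = ch
    return "".join(out)

def count_holes(self_set, r):
    """Count non-self elements whose every window of length r occurs in self at that position."""
    positions = range(LENGTH - r + 1)
    windows = [{s[p:p + r] for s in self_set} for p in positions]
    undetectable = 0
    for value in range(2 ** LENGTH):
        e = format(value, "016b")
        if all(e[p:p + r] in windows[p] for p in positions):
            undetectable += 1
    return undetectable - len(self_set)   # self elements are always undetectable

rng = random.Random(0)                    # arbitrary seed
points = [(rng.gauss(0.5, 0.1), rng.gauss(0.5, 0.1)) for _ in range(1000)]
self_plain = {encode(x, y) for x, y in points}
mask = list(range(LENGTH))
rng.shuffle(mask)                         # randomly determined mask from S_16
self_masked = {apply_mask(mask, s) for s in self_plain}

for r in (4, 8, 10):
    print(r, count_holes(self_plain, r), count_holes(self_masked, r))
```

Each printed row gives r, the number of holes without the mask, and the number of holes with it. Where the holes lie, which is the paper's main point, would additionally require decoding the undetectable elements back to coordinates and plotting them.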
6 Conclusion
Lymphocyte diversity is an important property of the immune system for recognizing a huge number of diverse substances. This property has been abstracted in terms of permutation masks in the Hamming negative selection detection technique. In this paper we have shown that (randomly determined) permutation masks in Hamming negative selection distort the semantic meaning of the underlying data (the shape of the distribution) and as a consequence shatter self regions. Furthermore, the distorted data is transformed into a collection of random chunks. Hence, detectors do not cover areas around the self regions; instead, they are randomly distributed across the space. Moreover, the resulting holes (the generalization) occur in regions where no self regions should occur. Additionally, we believe that it is computationally infeasible to find permutation masks which correctly capture the semantic representation of the data, if one exists at all. We conclude that the use of permutation masks casts doubt on the appropriateness of abstracting diversity in Hamming negative selection.
References
1. Forrest, S., Perelson, A.S., Allen, L., Cherukuri, R.: Self-nonself discrimination in a computer. In: Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, IEEE Computer Society Press (1994)
2. Dasgupta, D., Forrest, S.: Novelty detection in time series data using ideas from immunology. In: Proceedings of the 5th International Conference on Intelligent Systems. (1996)
3. Hofmeyr, S.A.: An Immunological Model of Distributed Detection and its Application to Computer Security. PhD thesis, University of New Mexico (1999)
4. Singh, S.: Anomaly detection using negative selection based on the r-contiguous matching rule. In: Proceedings of the 1st International Conference on Artificial Immune Systems (ICARIS), University of Kent at Canterbury Printing Unit (2002) 99–106
5. Kim, J., Bentley, P.J.: An evaluation of negative selection in an artificial immune system for network intrusion detection. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO-2001. (2001) 1330–1337
6. Stibor, T., Timmis, J., Eckert, C.: On the appropriateness of negative selection defined over Hamming shape-space as a network intrusion detection system. In: Congress On Evolutionary Computation – CEC 2005, IEEE Press (2005) 995–1002
7. Stibor, T., Timmis, J., Eckert, C.: The link between r-contiguous detectors and k-CNF satisfiability. In: Congress On Evolutionary Computation – CEC 2006, IEEE Press (2006 (to appear))
8. Stibor, T., Timmis, J., Eckert, C.: Generalization regions in Hamming negative selection. In: Intelligent Information Processing and Web Mining. Advances in Soft Computing, Springer-Verlag (2006) 447–456
9. Hofmeyr, S., Forrest, S.: Architecture for an artificial immune system. Evolutionary Computation 8 (2000) 443–473
10. de Castro, L.N., Timmis, J.: Artificial Immune Systems: A New Computational Intelligence Approach. Springer Verlag (2002)
11. Perelson, A.S., Oster, G.: Theoretical studies of clonal selection: minimal antibody repertoire size and reliability of self-nonself discrimination. In: J. Theor. Biol. Volume 81. (1979) 645–670
12. Percus, J.K., Percus, O.E., Perelson, A.S.: Predicting the size of the T-cell receptor and antibody combining region from consideration of efficient self-nonself discrimination. Proceedings of National Academy of Sciences USA 90 (1993) 1691–1695
13. Balthrop, J., Esponda, F., Forrest, S., Glickman, M.: Coverage and generalization in an artificial immune system. In: GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, New York, Morgan Kaufmann Publishers (2002) 3–10
14. Esponda, F., Forrest, S., Helman, P.: A formal framework for positive and negative detection schemes. IEEE Transactions on Systems, Man and Cybernetics Part B: Cybernetics 34 (2004) 357–373
15. D'haeseleer, P., Forrest, S., Helman, P.: An immunological approach to change detection: algorithms, analysis, and implications. In: Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy, IEEE Computer Society Press (1996) 110–119
16. González, F., Dasgupta, D., Niño, L.F.: A randomized real-valued negative selection algorithm. In: Proceedings of the 2nd International Conference on Artificial Immune Systems (ICARIS). Volume 2787 of Lecture Notes in Computer Science, Edinburgh, UK, Springer-Verlag (2003) 261–272
17. Knuth, D.E.: The Art of Computer Programming. Third edn. Volume 1. Addison-Wesley (2002)
18. González, F., Dasgupta, D., Gómez, J.: The effect of binary matching rules in negative selection. In: Genetic and Evolutionary Computation – GECCO-2003. Volume 2723 of Lecture Notes in Computer Science, Chicago, Springer-Verlag (2003) 195–206
19. Tax, D.M.J.: One-class classification. PhD thesis, Technische Universiteit Delft (2001)
20. Rätsch, G.: Benchmark repository (1998) http://ida.first.fraunhofer.de/projects/bench/benchmarks.htm.
... In additional work, Stibor et al. [45,46] argued that holes in anomaly detection with binary negative selection algorithm are necessary to generalize beyond the training data set. Holes must represent unseen self elements (or generation regions) to ensure that seen and unseen self elements are not recognized by any detector. ...
... They found that an r-chunk length which does not properly capture the semantic representation of the input data will result in an incorrect generalization and further concluded that a suitable r-chunk length does not exist for input data with element of different length. In [46], they conducted some experiments to investigating how randomly determined permutation masks will influence the occurrence of holes. They observed that holes when applying a randomly determined permutation mask are randomly distributed across the space instead of being concentrated inside or close to self regions because a randomly determined permutation mask shatters the semantical representation of the underlying data. ...
Article
The immune system is a remarkable information processing and self learning system that offers inspiration to build artificial immune system (AIS). The field of AIS has obtained a significant degree of success as a branch of Computational Intelligence since it emerged in the 1990s. This paper surveys the major works in the AIS field, in particular, it explores up-to-date advances in applied AIS during the last few years. This survey has revealed that recent research is centered on four major AIS algorithms: (1) negative selection algorithms; (2) artificial immune networks; (3) clonal selection algorithms; (4) Danger Theory and dendritic cell algorithms. However, other aspects of the biological immune system are motivating computer scientists and engineers to develop new models and problem solving methods. Though an extensive amount of AIS applications has been developed, the success of these applications is still limited by the lack of any exemplars that really stand out as killer AIS applications.
... Grounded Hamming memories ) output a ground state if the input does not match any element in the fundamental set with a small enough distance. Cellular Hamming memories (Coppin, 2004, Chapter 18), (Abraham, 2005, Pipe and Carse, 2007, Sahba et al., 2005, Stelzer et al., 2007, Wang and Mendel, 1992, swarm intelligence (Berger et al., 2006, (Coppin, 2004, Chapter 19), and artificial immune systems (de Abreu et al., 2006, de Abreu and Mostardinha, 2009, de Castro and Timmis, 2002, Elberfeld and Textor, 2009, Greensmith et al., 2005, Hart and Davoudani, 2009, Hofmeyr and Forrest, 1999, McEwan and Hart, 2009, Pagnoni and Visconti, 2005, Stibor et al., 2006, Timmis et al., 2008. The following sections present existing hardware architectures (Section 2.2) and then discuss the implications (Section 2.3) of using these architectures to implement the categories of AI algorithms described above. ...
... highlighted many more shortcomings of the algorithm revolving around shape space coverage, computational complexity and the inefficiency of random detector generation [130,132,133]. Despite the extent of the work demonstrating the problems with negative selection Elberfeld and ...
... These empirical studies suggest that 47 negative selection might not be a suitable algorithm for use in computer security, with these notions 48 confirmed by the theoretical work performed by Stibor et al. [69]. Further analysis performed by the 49 same authors has given insights into the theoretical reasons for negative selection's problems [70], with 50 more evidence presented recently by Stibor et al. in [71] of how to overcome these problems remains at the forefront of AIS research, focussing on the 3 incorporation of more advanced immunology. 4 An interdisciplinary approach is presented by Aickelin et al. [1] , developed in 2003 through the Dan- 5 ger Project. Aickelin et al. believe that some of the problems shown with negative selection approaches 6 can be attributed to its biological naivety. ...
Article
Full-text available
The dendritic cell algorithm (DCA) is an immune-inspired algorithm, developed for the purpose of anomaly detection. The algorithm performs multi-sensor data fusion and correlation which results in a ‘context aware’ detection system. Previous applications of the DCA have included the detection of potentially malicious port scanning activity, where it has produced high rates of true positives and low rates of false positives. In this work we aim to compare the performance of the DCA and of a self-organizing map (SOM) when applied to the detection of SYN port scans, through experimental analysis. A SOM is an ideal candidate for comparison as it shares similarities with the DCA in terms of the data fusion method employed. It is shown that the results of the two systems are comparable, and both produce false positives for the same processes. This shows that the DCA can produce anomaly detection results to the same standard as an established technique.
Chapter
We introduce Artificial Immune Systems by emphasizing on their ability to provide an alternative machine learning paradigm. The relevant bibliographical survey is utilized to extract the formal definition of Artificial Immune Systems and identify their primary application domains, which include: Clustering and Classification, Anomaly Detection/Intrusion Detection, Optimization, Automatic Control, Bioinformatics, Information Retrieval and Data Mining, User Modeling/Personalized Recommendation and Image Processing. Special attention is paid on analyzing the Shape-Space Model which provides the necessary mathematical formalism for the transition from the field of Biology to the field of Information Technology. This chapter focuses on the development of alternative machine learning algorithms based on Immune Network Theory, the Clonal Selection Principle and the Theory of Negative Selection. The proposed machine learning algorithms relate specifically to the problems of: Data Clustering, Pattern Classification and One-Class Classification.
Article
The Negative Selection Algorithm developed by Forrest et al. was inspired by the way in which T-cell lymphocytes mature within the thymus before being released into the blood system. The mature T-cell lymphocytes exhibit an interesting characteristic, in that they are only activated by non-self cells that invade the human body. The Negative Selection Algorithm utilises an affinity matching function to ascertain whether the affinity between a newly generated (NSA) T-cell lymphocyte and a self-cell is less than a particular threshold; that is, whether the T-cell lymphocyte is activated by the self-cell. T-cell lymphocytes not activated by self-cells become mature T-cell lymphocytes. A new affinity matching function termed the feature-detection rule is introduced in this paper. The feature-detection rule utilises the interrelationship between both adjacent and non-adjacent features of a particular problem domain to determine whether an antigen is activated by an artificial lymphocyte. The performance of the feature-detection rule is contrasted with traditional affinity matching functions, currently employed within Negative Selection Algorithms, most notably the r-chunks rule (which subsumes the r-contiguous bits rule) and the Hamming distance rule. This paper shows that the feature-detection rule greatly improves the detection rates and false alarm rates exhibited by the NSA (utilising the r-chunks and Hamming distance rule) in addition to refuting the way in which permutation masks are currently being applied in artificial immune systems.
Article
The human immune system has numerous properties that make it ripe for exploitation in the computational domain, such as robustness and fault tolerance, and many different algorithms, collectively termed Artificial Immune Systems (AIS), have been inspired by it. Two generations of AIS are currently in use, with the first generation relying on simplified immune models and the second generation utilising interdisciplinary collaboration to develop a deeper understanding of the immune system and hence produce more complex models. Both generations of algorithms have been successfully applied to a variety of problems, including anomaly detection, pattern recognition, optimisation and robotics. In this chapter an overview of AIS is presented, its evolution is discussed, and it is shown that the diversification of the field is linked to the diversity of the immune system itself, leading to a number of algorithms as opposed to one archetypal system. Two case studies are also presented to help provide insight into the mechanisms of AIS; these are the idiotypic network approach and the Dendritic Cell Algorithm.
Article
In this paper, we show that the accuracy of Bio-inspired classifiers can be dramatically improved if they operate on intelligent features. We propose a novel set of intelligent features for the well-known problem of malware portscan detection. We compare the performance of three well-known Artificial Immune System (AIS) based classifiers operating on the proposed intelligent features: Real Valued Negative Selection (RVNS), with both constant and variable sized detectors, termed C-detector and V-detector, based on the adaptive immune system, and Dendritic Cell Algorithm (DCA) based on the innate immune system. To empirically evaluate the improvements provided by the intelligent features, we use three network traffic datasets collected at different points in the network. For unbiased performance comparison, we also include a machine learning algorithm, Support Vector Machine (SVM), and two state-of-the-art statistical malware detectors, Rate Limiting (RL) and Maximum Entropy (ME). To the best of our knowledge, this is the first study in which C-detector, V-detector and DCA are not only compared with each other but also with several other classifiers on multiple real-world datasets. The experimental results indicate that our proposed features significantly improve the tp rate and the fp rate of C-detector, V-detector and DCA.
Conference Paper
We present new results on a distributable change-detection method inspired by the natural immune system. A weakness in the original algorithm was the exponential cost of generating detectors. Two detector-generating algorithms are introduced which run in linear time. The algorithms are analyzed, heuristics are given for setting parameters based on the analysis, and the presence of holes in detector space is examined. The analysis provides a basis for assessing the practicality of the algorithms in specific settings, and some of the implications are discussed.
Conference Paper
In the context of generating detectors using the r-contiguous matching rule, questions have been raised about the efficiency of the process. We show that the problem of generating r-contiguous detectors can be transformed into a k-CNF satisfiability problem. This insight allows for a wider understanding of the problem of generating r-contiguous detectors. Moreover, we apply this result to consider questions relating to the complexity of generating detectors, and when detectors are generable.
Article
A series of observations of a system over time is often used to characterize its normal behaviour. The problem of anomaly detection is that of finding deviations in the characteristics of the system. Anomaly detection algorithms inspired by the negative selection mechanism of the natural immune system have been proposed. This paper presents results obtained by employing an efficient negative selection algorithm based on the r-contiguous matching rule to detect anomaly in various forms of data. The algorithm presented is an extension of an existing detector generating algorithm to deal with m-ary alphabet strings. Results are obtained for three cases - assembler instructions, system calls and simulated time series. Finally, conclusions of the study are presented and future direction of the work, currently in progress, is indicated.
Chapter
Negative selection is an immune-inspired algorithm which is typically applied to anomaly detection problems. We present an empirical investigation of the generalization capability of the Hamming negative selection, when combined with the r-chunk affinity metric. Our investigations reveal that when using the r-chunk metric, the length r is a crucial parameter and is inextricably linked to the input data being analyzed. Moreover, we propose that input data with different characteristics, i.e. different positional biases, can result in an incorrect generalization effect.
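Several of the abstracts above refer to the r-contiguous and r-chunk matching rules without spelling them out. The following is a minimal sketch of both rules and of a brute-force r-chunk detector generation step, assuming binary strings of fixed length. The function names (r_contiguous_match, r_chunk_match, generate_r_chunk_detectors, is_anomalous) are hypothetical and chosen for illustration only; the exhaustive enumeration is not the linear-time generation scheme mentioned in the listed work.

```python
from itertools import product


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def r_chunk_match(s: str, pos: int, chunk: str) -> bool:
    """True if the r-chunk detector (pos, chunk) matches s at window pos."""
    return s[pos:pos + len(chunk)] == chunk


def generate_r_chunk_detectors(self_set, length, r):
    """Brute force (illustrative assumption): keep every (pos, chunk)
    pair that matches no self string."""
    detectors = []
    for pos in range(length - r + 1):
        # Chunks seen in self strings at this window must not become detectors.
        self_chunks = {s[pos:pos + r] for s in self_set}
        for bits in product("01", repeat=r):
            chunk = "".join(bits)
            if chunk not in self_chunks:
                detectors.append((pos, chunk))
    return detectors


def is_anomalous(s, detectors):
    """A string is flagged as non-self if any detector matches it."""
    return any(r_chunk_match(s, pos, chunk) for pos, chunk in detectors)


# Toy self set, chosen arbitrarily for illustration.
self_set = {"0000", "0011"}
detectors = generate_r_chunk_detectors(self_set, length=4, r=2)
```

With this toy self set, no self string is matched by any generated detector, while a string such as "1111" is flagged. Strings that are not in the self set but are matched by no detector are the holes discussed in the abstracts.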