
Negative selection is an immune-inspired algorithm which is typically applied to anomaly detection problems. We present an empirical investigation of the generalization capability of Hamming negative selection when combined with the r-chunk affinity metric. Our investigations reveal that when using the r-chunk metric, the length r is a crucial parameter and is inextricably linked to the input data being analyzed. Moreover, we propose that input data with different characteristics, i.e. different positional biases, can result in an incorrect generalization effect.


... Hamming negative selection is an immune-inspired technique for one-class classification problems. Recent results, however, have revealed several problems concerning the algorithmic complexity of generating detectors [5,6,7] and determining the proper matching threshold to allow for the generation of correct generalization regions [8]. In this paper we investigate an extended technique for Hamming negative selection: permutation masks. ...

... In [18,8] results were presented which demonstrated the coherence between the matching threshold r and generalization regions when the r-chunk matching rule in Hamming negative selection is applied. Recall, as holes are not detectable by any detector, holes must represent unseen self elements, or in other words holes must represent generalization regions. ...

... Finally, we explore empirically whether randomly determined permutation masks reduce the number of holes. Stibor et al. [8] have shown in prior experiments that the matching threshold r is a crucial parameter and is inextricably linked to the input data being analyzed. However, permutation masks were not considered in [8]. ...

Permutation masks were proposed for reducing the number of holes in Hamming negative selection when applying the r-contiguous or r-chunk matching rule. Here, we show that (randomly determined) permutation masks re-arrange the semantic representation of the underlying data and therefore shatter self-regions. As a consequence, detectors do not cover areas around self regions; instead they cover randomly distributed elements across the space. In addition, we observe that the resulting holes occur in regions where no self regions should actually occur.

... In additional work, Stibor et al. [45,46] argued that holes in anomaly detection with the binary negative selection algorithm are necessary to generalize beyond the training data set. Holes must represent unseen self elements (or generalization regions) to ensure that seen and unseen self elements are not recognized by any detector. ...

... Holes must represent unseen self elements (or generalization regions) to ensure that seen and unseen self elements are not recognized by any detector. In [45], they explored the generalization capability of Hamming negative selection when using the r-chunk length r. They found that an r-chunk length which does not properly capture the semantic representation of the input data will result in an incorrect generalization, and further concluded that a suitable r-chunk length does not exist for input data with elements of different lengths. ...

The immune system is a remarkable information processing and self-learning system that offers inspiration for building artificial immune systems (AIS). The field of AIS has obtained a significant degree of success as a branch of Computational Intelligence since it emerged in the 1990s. This paper surveys the major works in the AIS field; in particular, it explores up-to-date advances in applied AIS during the last few years. This survey has revealed that recent research is centered on four major AIS algorithms: (1) negative selection algorithms; (2) artificial immune networks; (3) clonal selection algorithms; (4) Danger Theory and dendritic cell algorithms. However, other aspects of the biological immune system are motivating computer scientists and engineers to develop new models and problem solving methods. Though an extensive number of AIS applications has been developed, the success of these applications is still limited by the lack of any exemplars that really stand out as killer AIS applications.

... Dilger [8] investigated metric properties of some affinity functions (Hamming and r-contiguous) and showed that not all metric properties are satisfied. González et al. [11] and Stibor et al. [15, 17] showed that the generalization capability of some affinity ...

Affinity functions are the core components in negative selection for discriminating self from non-self. It has been shown that affinity functions such as the r-contiguous distance and the Hamming distance are of limited applicability for discrimination problems such as anomaly detection. We propose to model self as a discrete probability distribution specified by finite mixtures of multivariate Bernoulli distributions. As a by-product one also obtains information about non-self and is hence able to discriminate probabilistically between self and non-self. We underpin our proposal with a comparative study between the two affinity functions and the probabilistic discrimination.
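The probabilistic alternative described above can be sketched in code. The EM update, the component count K, and the toy "self" data below are illustrative assumptions, not the authors' exact estimator:

```python
# Sketch: model "self" as a finite mixture of multivariate Bernoulli
# distributions, fit with a few EM iterations (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def em_bernoulli_mixture(X, K=2, iters=50):
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                   # mixing weights
    mu = rng.uniform(0.25, 0.75, size=(K, d))  # per-bit probabilities
    for _ in range(iters):
        # E-step: responsibilities r[i, k] proportional to pi_k * P(x_i | mu_k)
        logp = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and bit probabilities
        Nk = r.sum(axis=0)
        pi = Nk / n
        mu = np.clip((r.T @ X) / Nk[:, None], 1e-3, 1 - 1e-3)
    return pi, mu

def loglik(x, pi, mu):
    """Log-probability of one bit string under the fitted mixture."""
    p = pi * np.prod(mu**x * (1 - mu)**(1 - x), axis=1)
    return np.log(p.sum())

# Toy self data: noisy copies of two prototype bit strings (assumption).
proto = np.array([[1, 1, 0, 0, 1, 0], [1, 0, 1, 1, 0, 0]])
X = np.vstack([((rng.random((200, 6)) < 0.9) == p) for p in proto]).astype(float)
pi, mu = em_bernoulli_mixture(X)

# Self samples score higher on average than uniformly random strings.
rand = (rng.random((200, 6)) < 0.5).astype(float)
mean_self = np.mean([loglik(x, pi, mu) for x in X])
mean_rand = np.mean([loglik(x, pi, mu) for x in rand])
assert mean_self > mean_rand
```

The final comparison is the discrimination step: a likelihood threshold between the two means would separate self from non-self with explicit probabilities, rather than via a binary matching rule.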

Glossary
Definition of the Subject
Introduction
What Is an Artificial Immune System?
Current Artificial Immune Systems Biology and Basic Algorithms
Alternative Immunological Theories for AIS
Emerging Methodologies in AIS
Future Directions
Bibliography

The problem of generating r-contiguous detectors in negative selection can be transformed into the problem of finding satisfying assignments for a Boolean formula in k-CNF. Knowing this crucial fact enables us to explore the computational complexity and the feasibility of finding detectors with respect to the number of self bit strings |S|, the bit string length l and the matching length r. It turns out that finding detectors is hardest in the phase transition region, which is characterized by certain combinations of the parameters |S|, l and r. This insight is derived by investigating the r-contiguous matching probability in a random search approach and by using the equivalent k-CNF problem formulation.
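The r-contiguous rule underlying this abstract can be stated directly in code. The exhaustive search below (feasible only for tiny l) is an illustrative sketch of detector generation, not the paper's k-CNF-based procedure; the self set S is an assumption:

```python
# Sketch of the r-contiguous matching rule: a detector d matches a
# string x iff they agree in at least r contiguous positions.
from itertools import product

def r_contiguous_match(d, x, r):
    """True iff bit strings d and x agree in >= r contiguous positions."""
    run = 0
    for a, b in zip(d, x):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False

def generate_detectors(self_set, l, r):
    """Brute force over all 2^l candidates: keep those matching no self string."""
    return [c for c in product("01", repeat=l)
            if not any(r_contiguous_match(c, s, r) for s in self_set)]

S = ["0110", "1010"]                                  # toy self set (assumption)
detectors = generate_detectors([tuple(s) for s in S], l=4, r=3)
assert detectors                                       # some detectors exist
assert all(not r_contiguous_match(d, tuple(s), 3) for d in detectors for s in S)
```

The k-CNF view replaces this exponential enumeration: each length-r window of each self string yields a clause forbidding full agreement in that window, and a detector is exactly a satisfying assignment.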

Negative selection and the associated r-contiguous matching rule is a popular immune-inspired method for anomaly detection problems. In recent years, however, problems such as scalability and a high false positive rate have been empirically observed. In this article, negative selection and the associated r-contiguous matching rule are investigated from a pattern classification perspective. This includes insights into the generalization capability of negative selection and the computational complexity of finding r-contiguous detectors.

The use of artificial immune systems in intrusion detection is an appealing concept for two reasons. Firstly, the human immune
system provides the human body with a high level of protection from invading pathogens, in a robust, self-organised and distributed
manner. Secondly, current techniques used in computer security are not able to cope with the dynamic and increasingly complex
nature of computer systems and their security. It is hoped that biologically inspired approaches in this area, including the
use of immune-based systems will be able to meet this challenge. Here we collate the algorithms used, the development of the systems and the outcomes of their implementation. This article provides an introduction to and review of the key developments within this field, in addition to making suggestions for future research.
Keywords: Artificial immune systems, intrusion detection systems, literature review

The negative selection algorithm is one of the most widely used techniques in the field of artificial immune systems. It is primarily used to detect changes in data/behavior patterns by generating detectors in the complementary space (from given normal samples). The negative selection algorithm generally uses binary matching rules to generate detectors. The purpose of the paper is to show that the low-level representation of binary matching rules is unable to capture the structure of some problem spaces. The paper compares some of the binary matching rules reported in the literature and studies how they behave in a simple two-dimensional real-valued space. In particular, we study the detection accuracy and the areas covered by sets of detectors generated using the negative selection algorithm.
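The representation problem this paper studies is visible even in one dimension: under a fixed-point binary encoding, two neighboring real values can be maximally far apart in Hamming distance. The 4-bit encoding below is an illustrative assumption:

```python
# Illustration: a low-level binary encoding can destroy proximity
# structure in real-valued space.
def encode(v, bits=4):
    """Map v in [0, 1) to its fixed-point binary representation."""
    return format(int(v * 2**bits), f"0{bits}b")

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

a, b = encode(0.49), encode(0.51)   # close neighbours in real space
assert a == "0111" and b == "1000"
assert hamming(a, b) == 4           # maximal Hamming distance at 4 bits
```

A binary matching rule applied to such encodings therefore "sees" these two samples as opposites, which is one way the encoded space can fail to reflect the geometry of the original two-dimensional problem.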

In anomaly detection, the normal behavior of a process is characterized by a model, and deviations from the model are called anomalies. In behavior-based approaches to anomaly detection, the model of normal behavior is constructed from an observed sample of normally occurring patterns. Models of normal behavior can represent either the set of allowed patterns (positive detection) or the set of anomalous patterns (negative detection). A formal framework is given for analyzing the tradeoffs between positive and negative detection schemes in terms of the number of detectors needed to maximize coverage. For realistically sized problems, the universe of possible patterns is too large to represent exactly (in either the positive or negative scheme). Partial matching rules generalize the set of allowable (or unallowable) patterns, and the choice of matching rule affects the tradeoff between positive and negative detection. A new match rule is introduced, called r-chunks, and the generalizations induced by different partial matching rules are characterized in terms of the crossover closure. Permutations of the representation can be used to achieve more precise discrimination between normal and anomalous patterns. Quantitative results are given for the recognition ability of contiguous-bits matching together with permutations.
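The r-chunk rule introduced above can be sketched compactly: a detector is a pair (position i, substring c of length r) and matches x iff x[i:i+r] == c. The toy self set and parameters below are assumptions for illustration, and permutations/crossover closure are omitted:

```python
# Minimal sketch of r-chunk detector generation via negative selection.
from itertools import product

def r_chunk_detectors(self_set, l, r):
    """All (position, chunk) pairs occurring in no self string at that position."""
    dets = []
    for i in range(l - r + 1):
        seen = {s[i:i + r] for s in self_set}          # chunks that are "self"
        dets += [(i, "".join(c)) for c in product("01", repeat=r)
                 if "".join(c) not in seen]
    return dets

def matches(det, x):
    i, c = det
    return x[i:i + len(c)] == c

S = ["0110", "1010"]                    # toy self set (assumption)
dets = r_chunk_detectors(S, l=4, r=2)
assert all(not matches(d, s) for d in dets for s in S)   # self never matched
assert any(matches(d, "1111") for d in dets)             # an anomaly is caught
```

Because each detector constrains only one window, the union of self chunks generalizes to the crossover closure of the training set: any string assembled window-by-window from self chunks goes undetected.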

Artificial immune systems have become popular in recent years as a new approach for intrusion detection systems. Indeed, the (natural) immune system applies very effective mechanisms to protect the body against foreign intruders. We present empirical and theoretical arguments that the artificial immune system negative selection principle, which is primarily used for network intrusion detection systems, has been copied too naively and is neither appropriate nor applicable for network intrusion detection systems.

Since their development, AIS have been used for a number of machine learning tasks, including that of classification. Within the literature, there appears to be a lack of appreciation for the possible bias in the selection of various representations and affinity measures that may be introduced when employing AIS in classification tasks. Problems are then compounded when the inductive bias of algorithms is not taken into account when applying seemingly generic AIS algorithms to specific application domains. This paper is an attempt at highlighting some of these issues. Using the example of classification, this paper explains the potential pitfalls in representation selection and the use of various affinity measures. Additionally, attention is given to the use of negative selection in classification, and it is argued that this may not be an appropriate algorithm for such a task. This paper then presents ideas on avoiding unnecessary mistakes in the choice and design of AIS algorithms and, ultimately, the delivered solutions.

Best known in our circles for his key role in the renaissance of low-density parity-check (LDPC) codes, David MacKay has written an ambitious and original textbook. Almost every area within the purview of these TRANSACTIONS can be found in this book: data compression algorithms, error-correcting codes, Shannon theory, statistical inference, constrained codes, classification, and neural networks. The required mathematical level is rather minimal beyond a modicum of familiarity with probability. The author favors exposition by example, there are few formal proofs, and chapters come in mostly self-contained morsels richly illustrated with all sorts of carefully executed graphics. With its breadth, accessibility, and handsome design, this book should prove to be quite popular. Highly recommended as a primer for students with no background in coding theory, the chapters on error-correcting codes are an excellent brief introduction to the elements of modern sparse-graph codes: LDPC, turbo, repeat-accumulate, and fountain codes are described clearly and succinctly. As a result of the author's research in the field, the nine chapters on neural networks receive the deepest and most cohesive treatment in the book. Under the umbrella title of Probability and Inference we find a medley of chapters encompassing topics as varied as the Viterbi algorithm and the forward-backward algorithm, Monte Carlo simulation, independent component analysis, clustering, Ising models, the saddle-point approximation, and a sampling of decision theory topics. The chapters on data compression offer good coverage of Huffman and arithmetic codes, and we are rewarded with material not usually encountered in information theory textbooks, such as hash codes and efficient representation of integers.

The expositions of the memoryless source coding theorem and of the achievability part of the memoryless channel coding theorem stick closely to the standard treatment in (1), with a certain tendency to oversimplify. For example, the source coding theorem is verbalized as: "N i.i.d. random variables each with entropy H(X) can be compressed into more than N H(X) bits with negligible risk of information loss, as N → ∞; conversely, if they are compressed into fewer than N H(X) bits it is virtually certain that information will be lost." Although no treatment of rate-distortion theory is offered, the author gives a brief sketch of the achievability of rates above capacity at a given bit-error rate, and the details of the converse proof of that limit are left as an exercise. Neither Fano's inequality nor an operational definition of capacity put in an appearance. Perhaps his quest for originality is what accounts for MacKay's proclivity to fail to call a spade a spade. Almost-lossless data compression is called "lossy compression;" a vanilla-flavored binary hypoth-
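The source coding theorem quoted in the review can be made concrete with a small calculation. The Bernoulli source and the parameters below are illustrative assumptions:

```python
# Concrete instance of the source coding bound: N outcomes of a
# Bernoulli(p) source compress to about N * H2(p) bits as N grows.
import math

def h2(p):
    """Binary entropy in bits per symbol."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

N, p = 10_000, 0.1
bound = N * h2(p)            # asymptotically achievable length in bits
assert 4680 < bound < 4700   # H2(0.1) is roughly 0.469 bits/symbol
```

So 10,000 highly biased coin flips need only about 4,690 bits rather than 10,000, and, per the converse, any scheme using materially fewer bits must lose information.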

Viewing the immune system as a molecular recognition device designed to identify "foreign shapes", we estimate the probability that an immune system with NAb monospecific antibodies in its repertoire can recognize a random foreign antigen. Furthermore, we estimate the improvement in recognition if antibodies are multispecific rather than monospecific. From our probabilistic model we conclude: (1) clonal selection is feasible, i.e. with a finite number of antibodies an animal can recognize an effectively infinite number of antigens; (2) there should not be great differences in the specificities of antibody molecules among different species; (3) the region of a foreign molecule recognized by an antibody must be severely limited in extent; (4) the probability of recognizing a foreign molecule, P, increases with the antibody repertoire size NAb; however, below a certain value of NAb the immune system would be very ineffectual, while beyond some high value of NAb further increases in NAb yield diminishingly small increases in P; (5) multispecificity is equivalent to a modest increase (probably less than 10) in the antibody repertoire size NAb, but this increase can substantially improve the probability of an immune system recognizing a foreign molecule. Besides recognizing foreign molecules, the immune system must distinguish them from self molecules. Using the mathematical theory of reliability we argue that multisite recognition is a more reliable method of distinguishing between molecules than single site recognition. This may have been an important evolutionary consideration in the selection of weak non-covalent interactions as the basis of antigen-antibody bonds.
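Conclusions (1) and (4) are often formalized with a simple independence model: if each of the NAb antibodies independently recognizes a random antigen with small probability p, then P = 1 − (1 − p)^NAb. The value of p below is an assumed toy number, not a figure from the paper:

```python
# Recognition probability under an independence assumption:
# P = 1 - (1 - p)^n_ab, which saturates for large repertoires.
p = 1e-5   # per-antibody recognition probability (toy assumption)

def recognition_prob(n_ab, p=p):
    return 1 - (1 - p) ** n_ab

small = recognition_prob(10**4)   # tiny repertoire: nearly ineffectual
large = recognition_prob(10**6)   # finite repertoire, near-certain recognition
huge = recognition_prob(10**7)    # 10x more antibodies buys almost nothing

assert small < 0.1
assert large > 0.999
assert huge - large < large - small   # diminishing returns
```

The saturation of P captures both halves of conclusion (4): below some NAb the system barely works, while far above it extra antibodies add almost no recognition ability.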

The problem of protecting computer systems can be viewed generally as the problem of learning to distinguish self from other. We describe a method for change detection which is based on the generation of T cells in the immune system. Mathematical analysis reveals computational costs of the system, and preliminary experiments illustrate how the method might be applied to the problem of computer viruses.

1 Introduction

The problem of ensuring the security of computer systems includes such activities as detecting unauthorized use of computer facilities, guaranteeing the integrity of data files, and preventing the spread of computer viruses. In this paper, we view these protection problems as instances of the more general problem of distinguishing self (legitimate users, uncorrupted data, etc.) from other (unauthorized users, viruses, etc.). We introduce a change-detection algorithm that is based on the way that natural immune systems distinguish self from other. Mathematical analysis ...
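The censoring-and-monitoring scheme this paper describes can be sketched in a few lines. The matching rule, thresholds, and self set below are illustrative assumptions rather than the authors' exact parameters:

```python
# Sketch of negative-selection change detection: random candidate
# detectors that match any self string are discarded (censoring);
# the survivors then monitor new strings for change.
import random

random.seed(1)
L, R, N_CAND = 8, 6, 200   # string length, match threshold, candidates

def match(a, b, r=R):
    """Toy rule: two strings match if they agree in >= r positions."""
    return sum(x == y for x, y in zip(a, b)) >= r

self_set = ["10110010", "10110011", "00110010"]   # toy self set (assumption)

def rand_string():
    return "".join(random.choice("01") for _ in range(L))

# Censoring phase: keep only detectors that match no self string.
detectors = [d for d in (rand_string() for _ in range(N_CAND))
             if not any(match(d, s) for s in self_set)]

def changed(sample):
    """Monitoring phase: any detector firing signals a change."""
    return any(match(d, sample) for d in detectors)

assert detectors                                # censoring left some survivors
assert not any(changed(s) for s in self_set)    # self never triggers an alarm
```

By construction the false-positive rate on the training self set is zero; the paper's mathematical analysis concerns how many random candidates must be drawn for the surviving detectors to cover the non-self space adequately.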