Chapter

Generalization Regions in Hamming Negative Selection


Abstract

Negative selection is an immune-inspired algorithm which is typically applied to anomaly detection problems. We present an empirical investigation of the generalization capability of Hamming negative selection when combined with the r-chunk affinity metric. Our investigations reveal that when using the r-chunk metric, the length r is a crucial parameter and is inextricably linked to the input data being analyzed. Moreover, we propose that input data with different characteristics, i.e. different positional biases, can result in an incorrect generalization effect.
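As a rough illustration of the matching rule discussed in the abstract (a minimal sketch; the function name and toy data are assumptions, not taken from the paper), an r-chunk detector is a pair consisting of a position and a bit string of length r, and it matches a sample exactly when the sample contains that chunk at that position:

# Minimal sketch of the r-chunk matching rule (illustrative only).
def r_chunk_match(s: str, position: int, chunk: str) -> bool:
    # Return True if the detector (position, chunk) matches bit string s.
    r = len(chunk)
    return s[position:position + r] == chunk

s = "010110"                       # toy bit string (assumed example)
print(r_chunk_match(s, 1, "101"))  # True: s[1:4] == "101"
print(r_chunk_match(s, 2, "000"))  # False

The choice of r controls how much a detector generalizes: a smaller r makes each detector match more strings, which is why the abstract ties r so closely to the structure of the input data.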


... Hamming negative selection is an immune-inspired technique for one-class classification problems. Recent results, however, have revealed several problems concerning the algorithmic complexity of generating detectors [5,6,7] and the determination of a proper matching threshold that allows correct generalization regions to be formed [8]. In this paper we investigate an extended technique for Hamming negative selection: permutation masks. ...
... In [18,8] results were presented which demonstrated the connection between the matching threshold r and generalization regions when the r-chunk matching rule is applied in Hamming negative selection. Recall that, since holes are not detectable by any detector, holes must represent unseen self elements; in other words, holes must represent generalization regions. ...
... Finally, we explore empirically whether randomly determined permutation masks reduce the number of holes. Stibor et al. [8] have shown in prior experiments that the matching threshold r is a crucial parameter and is inextricably linked to the input data being analyzed. However, permutation masks were not considered in [8]. ...
Conference Paper
Full-text available
Permutation masks were proposed for reducing the number of holes in Hamming negative selection when applying the r-contiguous or r-chunk matching rule. Here, we show that (randomly determined) permutation masks re-arrange the semantic representation of the underlying data and therefore shatter self regions. As a consequence, detectors do not cover areas around self regions; instead, they cover randomly distributed elements across the space. In addition, we observe that the resulting holes occur in regions where no self regions should actually occur.
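As a rough sketch of what a permutation mask does (illustrative only; the helper name and toy data are assumptions, not taken from the paper), the mask is simply a fixed re-ordering of the bit positions applied to every string before matching:

import random

def apply_permutation(s: str, mask: list) -> str:
    # Re-order the bits of s according to the permutation mask.
    return "".join(s[i] for i in mask)

random.seed(0)
s = "010110"
mask = list(range(len(s)))
random.shuffle(mask)                     # a randomly determined permutation mask
print(mask, apply_permutation(s, mask))

Because the same random re-ordering is applied to self strings and detectors alike, strings that were neighbours before permutation can end up far apart afterwards, which is the shattering of self regions described above.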
... In additional work, Stibor et al. [45,46] argued that holes in anomaly detection with the binary negative selection algorithm are necessary to generalize beyond the training data set. Holes must represent unseen self elements (or generalization regions) to ensure that seen and unseen self elements are not recognized by any detector. ...
... Holes must represent unseen self elements (or generalization regions) to ensure that seen and unseen self elements are not recognized by any detector. In [45], they explored the generalization capability of Hamming negative selection when using the r-chunk length r. They found that an r-chunk length which does not properly capture the semantic representation of the input data will result in incorrect generalization, and further concluded that a suitable r-chunk length does not exist for input data with elements of differing lengths. ...
Article
The immune system is a remarkable information-processing and self-learning system that offers inspiration for building artificial immune systems (AIS). The field of AIS has achieved a significant degree of success as a branch of Computational Intelligence since it emerged in the 1990s. This paper surveys the major works in the AIS field; in particular, it explores up-to-date advances in applied AIS during the last few years. This survey reveals that recent research is centered on four major AIS algorithms: (1) negative selection algorithms; (2) artificial immune networks; (3) clonal selection algorithms; (4) Danger Theory and dendritic cell algorithms. However, other aspects of the biological immune system are motivating computer scientists and engineers to develop new models and problem-solving methods. Though an extensive number of AIS applications have been developed, the success of these applications is still limited by the lack of any exemplars that really stand out as killer AIS applications.
... Dilger [8] investigated metric properties of some affinity functions (Hamming and r-contiguous) and showed that not all metric properties are satisfied. González et al. [11] and Stibor et al. [15, 17] showed that the generalization capability of some affinity ...
Conference Paper
Affinity functions are the core components in negative selection for discriminating self from non-self. It has been shown that affinity functions such as the r-contiguous distance and the Hamming distance are of limited applicability for discrimination problems such as anomaly detection. We propose to model self as a discrete probability distribution specified by finite mixtures of multivariate Bernoulli distributions. As a by-product, one also obtains information about non-self and is hence able to discriminate self from non-self probabilistically. We underpin our proposal with a comparative study between the two affinity functions and the probabilistic discrimination.
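A minimal sketch of the kind of probabilistic model proposed (an assumption about its simplest form; the parameter values are toy numbers, not those of the paper): self is modelled by a mixture of multivariate Bernoulli components, and a bit string is scored by its likelihood under that mixture:

import numpy as np

def bernoulli_mixture_likelihood(x, weights, probs):
    # Likelihood of binary vector x under a K-component Bernoulli mixture.
    # weights: shape (K,), mixture weights summing to 1
    # probs:   shape (K, D), per-component Bernoulli parameters
    x = np.asarray(x, dtype=float)
    comp = np.prod(probs ** x * (1.0 - probs) ** (1.0 - x), axis=1)
    return float(np.dot(weights, comp))

weights = np.array([0.6, 0.4])                        # toy mixture weights
probs = np.array([[0.9, 0.8, 0.1], [0.2, 0.3, 0.7]])  # toy component parameters
threshold = 0.05                                      # assumed decision threshold
print(bernoulli_mixture_likelihood([1, 1, 0], weights, probs) >= threshold)

Strings with likelihood above the threshold would be treated as self and those below it as non-self; in practice the mixture parameters would be fitted to the self sample, e.g. with EM.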
Chapter
Contents: Glossary; Definition of the Subject; Introduction; What Is an Artificial Immune System?; Current Artificial Immune Systems; Biology and Basic Algorithms; Alternative Immunological Theories for AIS; Emerging Methodologies in AIS; Future Directions; Bibliography.
Article
The problem of generating r-contiguous detectors in negative selection can be transformed into the problem of finding satisfying assignments for a Boolean formula in k-CNF. Knowing this crucial fact enables us to explore the computational complexity and the feasibility of finding detectors with respect to the number of self bit strings |S|, the bit string length l and the matching length r. It turns out that finding detectors is hardest in the phase transition region, which is characterized by certain combinations of the parameters |S|, l and r. This insight is derived by investigating the r-contiguous matching probability in a random search approach and by using the equivalent k-CNF problem formulation.
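For context, the r-contiguous matching probability referred to here is commonly approximated (following Percus et al., under an independence assumption) by

P_M \approx m^{-r}\left(\frac{(l-r)(m-1)}{m} + 1\right),

which for a binary alphabet (m = 2) reduces to P_M \approx 2^{-r}\left(\frac{l-r}{2} + 1\right), where l is the bit string length and r the matching length.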
Article
Negative selection and the associated r-contiguous matching rule are a popular immune-inspired method for anomaly detection problems. In recent years, however, problems such as poor scalability and high false positive rates have been observed empirically. In this article, negative selection and the associated r-contiguous matching rule are investigated from a pattern classification perspective. This includes insights into the generalization capability of negative selection and the computational complexity of finding r-contiguous detectors.
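For reference, the r-contiguous rule discussed here can be sketched as follows (a minimal sketch; the function name and toy strings are assumptions): a detector matches a string of equal length if the two agree in at least r contiguous positions:

def r_contiguous_match(s: str, d: str, r: int) -> bool:
    # True if s and d (equal length) agree in at least r contiguous positions.
    run = 0
    for a, b in zip(s, d):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False

print(r_contiguous_match("010110", "110111", 3))  # True: positions 1-3 agree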
Conference Paper
Full-text available
The use of artificial immune systems in intrusion detection is an appealing concept for two reasons. Firstly, the human immune system provides the human body with a high level of protection from invading pathogens, in a robust, self-organised and distributed manner. Secondly, current techniques used in computer security are not able to cope with the dynamic and increasingly complex nature of computer systems and their security. It is hoped that biologically inspired approaches in this area, including the use of immune-based systems, will be able to meet this challenge. Here we collate the algorithms used, the development of the systems and the outcomes of their implementation. The paper provides an introduction and review of the key developments within this field, in addition to making suggestions for future research. Keywords: artificial immune systems, intrusion detection systems, literature review.
Conference Paper
Full-text available
The negative selection algorithm is one of the most widely used techniques in the field of artificial immune systems. It is primarily used to detect changes in data/behavior patterns by generating detectors in the complementary space (from given normal samples). The negative selection algorithm generally uses binary matching rules to generate detectors. The purpose of the paper is to show that the low-level representation of binary matching rules is unable to capture the structure of some problem spaces. The paper compares some of the binary matching rules reported in the literature and studies how they behave in a simple two-dimensional real-valued space. In particular, we study the detection accuracy and the areas covered by sets of detectors generated using the negative selection algorithm.
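As a rough illustration of how a two-dimensional real-valued point can be exposed to binary matching rules (a hypothetical encoding; the 8-bits-per-dimension discretization is an assumption, not necessarily the scheme used in the paper):

def encode_2d(x: float, y: float, bits: int = 8) -> str:
    # Discretize each coordinate in [0, 1) to `bits` bits and concatenate.
    def enc(v: float) -> str:
        level = min(int(v * (1 << bits)), (1 << bits) - 1)
        return format(level, "0{}b".format(bits))
    return enc(x) + enc(y)

print(encode_2d(0.25, 0.75))  # 16-bit string representing the point (0.25, 0.75)

Low-level encodings of this kind are exactly where the paper argues that binary matching rules can fail to reflect proximity in the original real-valued space.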
Article
Full-text available
In anomaly detection, the normal behavior of a process is characterized by a model, and deviations from the model are called anomalies. In behavior-based approaches to anomaly detection, the model of normal behavior is constructed from an observed sample of normally occurring patterns. Models of normal behavior can represent either the set of allowed patterns (positive detection) or the set of anomalous patterns (negative detection). A formal framework is given for analyzing the tradeoffs between positive and negative detection schemes in terms of the number of detectors needed to maximize coverage. For realistically sized problems, the universe of possible patterns is too large to represent exactly (in either the positive or negative scheme). Partial matching rules generalize the set of allowable (or unallowable) patterns, and the choice of matching rule affects the tradeoff between positive and negative detection. A new match rule is introduced, called r-chunks, and the generalizations induced by different partial matching rules are characterized in terms of the crossover closure. Permutations of the representation can be used to achieve more precise discrimination between normal and anomalous patterns. Quantitative results are given for the recognition ability of contiguous-bits matching together with permutations.
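A minimal sketch of generating r-chunk detectors by negative selection (illustrative only; the function name and toy self set are assumptions): every (position, chunk) pair that never occurs in the self sample becomes a detector:

from itertools import product

def generate_r_chunk_detectors(self_set, l, r):
    # Return all (position, chunk) detectors that match no string in self_set.
    detectors = []
    for p in range(l - r + 1):
        seen = {s[p:p + r] for s in self_set}
        for bits in product("01", repeat=r):
            chunk = "".join(bits)
            if chunk not in seen:
                detectors.append((p, chunk))
    return detectors

print(generate_r_chunk_detectors({"0101", "0111"}, l=4, r=2))

The strings matched by no detector are the holes; characterizing which strings fall into them (via the crossover closure) is what the abstract refers to.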
Conference Paper
Full-text available
Artificial immune systems have become popular in recent years as a new approach for intrusion detection systems. Indeed, the (natural) immune system applies very effective mechanisms to protect the body against foreign intruders. We present empirical and theoretical arguments that the artificial immune system negative selection principle, which is primarily used for network intrusion detection systems, has been copied too naively and is neither appropriate nor applicable for network intrusion detection systems.
Conference Paper
Since their development, AIS have been used for a number of machine learning tasks, including classification. Within the literature, there appears to be a lack of appreciation for the possible bias that may be introduced in the selection of various representations and affinity measures when employing AIS in classification tasks. Problems are then compounded when the inductive bias of algorithms is not taken into account when applying seemingly generic AIS algorithms to specific application domains. This paper is an attempt at highlighting some of these issues. Using the example of classification, this paper explains the potential pitfalls in representation selection and the use of various affinity measures. Additionally, attention is given to the use of negative selection in classification, and it is argued that this may not be an appropriate algorithm for such a task. This paper then presents ideas on avoiding unnecessary mistakes in the choice and design of AIS algorithms and the solutions ultimately delivered.
Book
Best known in our circles for his key role in the renaissance of low-density parity-check (LDPC) codes, David MacKay has written an ambitious and original textbook. Almost every area within the purview of these TRANSACTIONS can be found in this book: data compression algorithms, error-correcting codes, Shannon theory, statistical inference, constrained codes, classification, and neural networks. The required mathematical level is rather minimal beyond a modicum of familiarity with probability. The author favors exposition by example, there are few formal proofs, and chapters come in mostly self-contained morsels richly illustrated with all sorts of carefully executed graphics. With its breadth, accessibility, and handsome design, this book should prove to be quite popular. Highly recommended as a primer for students with no background in coding theory, the set of chapters on error-correcting codes is an excellent brief introduction to the elements of modern sparse-graph codes: LDPC, turbo, repeat-accumulate, and fountain codes are described clearly and succinctly. As a result of the author's research in the field, the nine chapters on neural networks receive the deepest and most cohesive treatment in the book. Under the umbrella title of Probability and Inference we find a medley of chapters encompassing topics as varied as the Viterbi algorithm and the forward-backward algorithm, Monte Carlo simulation, independent component analysis, clustering, Ising models, the saddle-point approximation, and a sampling of decision theory topics. The chapters on data compression offer a good coverage of Huffman and arithmetic codes, and we are rewarded with material not usually encountered in information theory textbooks such as hash codes and efficient representation of integers. The expositions of the memoryless source coding theorem and of the achievability part of the memoryless channel coding theorem stick closely to the standard treatment in (1), with a certain tendency to oversimplify. For example, the source coding theorem is verbalized as: "N i.i.d. random variables each with entropy H(X) can be compressed into more than N H(X) bits with negligible risk of information loss, as N → ∞; conversely if they are compressed into fewer than N H(X) bits it is virtually certain that information will be lost." Although no treatment of rate-distortion theory is offered, the author gives a brief sketch of the achievability of rate ... with bit-error rate ..., and the details of the converse proof of that limit are left as an exercise. Neither Fano's inequality nor an operational definition of capacity puts in an appearance. Perhaps his quest for originality is what accounts for MacKay's proclivity to fail to call a spade a spade. Almost-lossless data compression is called "lossy compression;" a vanilla-flavored binary hypoth...
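For reference, the quoted statement can be made precise through the asymptotic equipartition property (a standard formulation, not specific to this review): for i.i.d. X_1, ..., X_N with entropy H(X),

\frac{1}{N}\log_2 \frac{1}{P(X_1,\dots,X_N)} \;\longrightarrow\; H(X) \quad \text{in probability as } N \to \infty,

so roughly 2^{N H(X)} typical sequences carry almost all the probability: about N H(X) bits per block suffice, while using noticeably fewer bits almost surely loses information.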
Article
Viewing the immune system as a molecular recognition device designed to identify “foreign shapes”, we estimate the probability that an immune system with N_Ab monospecific antibodies in its repertoire can recognize a random foreign antigen. Furthermore, we estimate the improvement in recognition if antibodies are multispecific rather than monospecific. From our probabilistic model we conclude: (1) clonal selection is feasible, i.e. with a finite number of antibodies an animal can recognize an effectively infinite number of antigens; (2) there should not be great differences in the specificities of antibody molecules among different species; (3) the region of a foreign molecule recognized by an antibody must be severely limited in extent; (4) the probability of recognizing a foreign molecule, P, increases with the antibody repertoire size N_Ab; however, below a certain value of N_Ab the immune system would be very ineffectual, while beyond some high value of N_Ab further increases in N_Ab yield diminishingly small increases in P; (5) multispecificity is equivalent to a modest increase (probably less than 10) in the antibody repertoire size N_Ab, but this increase can substantially improve the probability of an immune system recognizing a foreign molecule. Besides recognizing foreign molecules, the immune system must distinguish them from self molecules. Using the mathematical theory of reliability we argue that multisite recognition is a more reliable method of distinguishing between molecules than single-site recognition. This may have been an important evolutionary consideration in the selection of weak non-covalent interactions as the basis of antigen-antibody bonds.
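As a rough sketch of the type of estimate involved (an assumed simplest form of such a model, not a quotation from the paper): if each of the N_Ab antibodies independently recognizes a random antigen with probability p, then

P \;=\; 1 - (1 - p)^{N_{Ab}} \;\approx\; 1 - e^{-p N_{Ab}},

which grows with N_Ab but with diminishing returns for large N_Ab, in line with points (4) and (5) above.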
Article
The problem of protecting computer systems can be viewed generally as the problem of learning to distinguish self from other. We describe a method for change detection which is based on the generation of T cells in the immune system. Mathematical analysis reveals the computational costs of the system, and preliminary experiments illustrate how the method might be applied to the problem of computer viruses.

1 Introduction. The problem of ensuring the security of computer systems includes such activities as detecting unauthorized use of computer facilities, guaranteeing the integrity of data files, and preventing the spread of computer viruses. In this paper, we view these protection problems as instances of the more general problem of distinguishing self (legitimate users, uncorrupted data, etc.) from other (unauthorized users, viruses, etc.). We introduce a change-detection algorithm that is based on the way that natural immune systems distinguish self from other. Mathematical analysis ...
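A minimal sketch of the censoring step of the change-detection method described above (illustrative only; the r-contiguous rule, parameter values and toy data are assumptions): candidate detectors are generated at random and kept only if they match no self string, and the survivors then monitor for changes:

import random

def r_contiguous_match(a, b, r):
    # True if a and b (equal length) agree in at least r contiguous positions.
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def negative_selection(self_set, l, r, n_detectors, seed=0):
    rng = random.Random(seed)
    detectors = []
    while len(detectors) < n_detectors:
        candidate = "".join(rng.choice("01") for _ in range(l))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)       # candidate survives censoring
    return detectors

self_set = {"0101010101", "0011001100"}
detectors = negative_selection(self_set, l=10, r=7, n_detectors=5)
anomalous = "1110111011"
print(any(r_contiguous_match(anomalous, d, 7) for d in detectors))  # True flags a change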