
Intrusion Detection Based on Genetic Algorithm and Bayesian Networks

  • Director, IIIT Kottayam, Kerala, India Institute of National Importance

Abstract

This paper presents a general overview of Intrusion Detection Systems (IDS) and the methods used in them, with brief notes on design principles and major trends. Artificial intelligence techniques such as fuzzy logic and genetic algorithms are widely used in this area. We focus on the genetic algorithm technique and how it can be used in Intrusion Detection Systems, giving some examples of systems and experiments proposed in this field. The purpose of this paper is to give a clear understanding of the use of Genetic Algorithms and Bayesian Networks in IDS.
National Conference on “Emerging Trends in Electronics Engineering & Computing” (E3C 2010)
Ms. Nivedita Naidu
Department of Computer Science & Engineering,
G.H. Raisoni College of
Professor & Head,
Department of Computer Science &
G.H. Raisoni College of
General Terms : Algorithms, Performance, Security
Keywords: Intrusion Detection System, Genetic
Because of increased network connectivity, computer systems are becoming increasingly vulnerable to attack. These attacks often exploit flaws in either the operating system or application programs. The general goal of such attacks is to subvert the traditional security mechanisms on the systems and execute operations in excess of the intruder's authorization. These operations could include reading protected or private data, or simply doing malicious damage to the system or user files.
The degree of protection from such malicious actions depends on the amount of time and effort spent building and maintaining the system's security defenses. By building complex tools that continually monitor and report activities, a system security operator can catch potentially malicious activities as they occur. However, building and maintaining such a monitoring system is expensive in both time and money. The monitoring also imposes a performance penalty on the system being protected, which users may object to.
This paper proposes a mechanism for building such a
monitoring system which does not involve a significant
process operates independently of the other agents, but
they all cooperate in monitoring the system. This
approach has significant advantages in terms of overhead,
scalability and flexibility.
An intrusion can be defined as [1]: any set of actions that attempt to compromise the integrity, confidentiality, or availability of a resource. Intrusions can be categorized into two main classes:
Misuse intrusions: These are well-defined attacks on known weak points of a system. They can be spotted by watching for certain actions being performed on certain
Anomaly intrusions: These are based on observations of deviations from normal system usage patterns. They are caught by examining log messages resulting from system calls. This can be done using a pattern matching approach such as in [3].
Intrusion Detection: An intrusion detection system (IDS) must [2] identify, preferably in real time, unauthorized use, misuse, and abuse of computer systems. An intrusion detection system does not attempt to stop an intrusion as it occurs. Its role is to alert a system security officer that a potential security violation is occurring. As such, it is a reactive, rather than proactive, form of system defense. An intrusion detection system can be either host-based or network-based; often a system is a hybrid of the two approaches. A host-based system monitors all the activity on a single host computer and ensures that no user operations violate the site security policy. A network-based system monitors on a net-wide basis: it considers actions occurring on the network and analyzes whether they constitute potential security violations.
Network-Based Intrusion Detection Systems:
A network-based intrusion detection system (NIDS) is an ID system that uses the traffic on its network segment as a data source. Implementation requires:
• The network interface card is placed in promiscuous
mode to capture all network traffic that crosses its
network segment; and
J D College of Engineering, Nagpur(M.S.) 127
• A sensor, which monitors packets traveling on that
network segment.
The objective is to determine whether the packet flow matches a known signature. Three types of signatures are particularly important:
1. String signatures, which look for a text string that indicates a possible attack;
2. Port signatures, which watch for connection attempts to well-known, frequently attacked ports; and
3. Header signatures, which watch for dangerous or illogical combinations in packet headers.
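The three signature types listed above can be illustrated with a minimal sketch. The `Packet` type, the example strings, the watched ports, and the SYN+FIN check are all assumptions chosen for illustration; they are not taken from any particular NIDS rule base.

```python
from dataclasses import dataclass

# Hypothetical signature data; a real NIDS would load these from its rule base.
STRING_SIGNATURES = [b"/etc/passwd", b"cmd.exe"]
WATCHED_PORTS = {21, 23, 139}  # ports treated as frequently attacked in this sketch


@dataclass
class Packet:
    dst_port: int
    tcp_flags: frozenset  # e.g. frozenset({"SYN", "FIN"})
    payload: bytes


def match_signatures(pkt):
    """Return the names of the signature types that the packet triggers."""
    hits = []
    # String signature: a suspicious byte string appears in the payload.
    if any(sig in pkt.payload for sig in STRING_SIGNATURES):
        hits.append("string")
    # Port signature: connection attempt to a watched port.
    if pkt.dst_port in WATCHED_PORTS:
        hits.append("port")
    # Header signature: SYN and FIN set together is an illogical combination.
    if {"SYN", "FIN"} <= pkt.tcp_flags:
        hits.append("header")
    return hits
```

A packet may trigger several signature types at once, so the function returns a list rather than a single verdict.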
“A Genetic Algorithm (GA) is a programming technique that mimics biological evolution as a problem-solving strategy.” [2] It is based on Darwin's principle of evolution and survival of the fittest, and optimizes a population of candidate solutions towards a predefined fitness.
A GA uses a chromosome-like data structure and evolves the chromosomes using selection, recombination, and mutation operators. The following figure shows the structure of a simple genetic algorithm. It starts with a randomly generated initial population, then evaluates and evolves it through selection, recombination, and mutation. Finally, the best individual (chromosome) is picked out as the final result once the optimization meets its target.
Figure 1: Application to Intrusion Detection Problem
A GA has some common elements and parameters which should be defined:
Fitness Function: According to [2], “The fitness function is defined as a function which scales the value individual relative to the rest of population.” It identifies the best candidate solutions among those in the population.
GA Operators: According to the figure above, selection, mutation, and crossover are the most important parts of the algorithm, as they participate in the generation of each population.
Selection is the phase where individuals with better fitness are selected; individuals with poorer fitness are discarded.
Crossover is a process where randomly selected pairs of individuals exchange parts with each other, until a completely new population has been generated.
Mutation flips some bits in an individual; since any bit may be flipped, the resulting change is hard to predict.
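The operators described above can be sketched on plain bit strings. This is a toy illustration only: the fitness function (counting 1-bits), tournament selection, single-point crossover, and all parameter values are assumptions, not the configuration used in the systems discussed in this paper.

```python
import random


def fitness(bits):
    # Toy objective (an assumption for this sketch): the number of 1-bits.
    return sum(bits)


def select(population, rng):
    # Binary tournament: the fitter of two randomly drawn individuals survives.
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def crossover(p1, p2, rng):
    # Single-point crossover: the two parents swap tails after a random cut.
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(bits, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(pop_size=20, length=16, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        next_gen = []
        while len(next_gen) < pop_size:
            c1, c2 = crossover(select(population, rng), select(population, rng), rng)
            next_gen += [mutate(c1, rate, rng), mutate(c2, rate, rng)]
        population = next_gen[:pop_size]
    # The best individual of the final population is returned as the result.
    return max(population, key=fitness)
```

Calling `evolve()` returns one bit list of the requested length; a fixed seed keeps the run repeatable.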
In [8], a genetic algorithm is presented which contains a training process. This algorithm is designed to apply a set of classification rules according to the input data given. It follows the simple flow of genetic algorithms presented in Figure 1 [8][16][2].
Applying genetic algorithm to intrusion detection seems
to be a promising area. We discuss the motivation and
implementation details in this section.
4.1 Overview:
Genetic algorithms can be used to evolve simple rules for network traffic. These rules are used to differentiate normal network connections from anomalous connections, that is, events with some probability of being intrusions. The rules stored in the rule base usually take the following form:
if { condition } then { act }
Algorithm: Rule set generation using genetic algorithm.
Input: Network audit data, number of generations, and population size.
Output: A set of classification rules.
1. Initialize the population
2. W1 = 0.2, W2 = 0.8, T = 0.5
3. N = total number of records in the training set
4. For each chromosome in the population
5.   A = 0, AB = 0
6.   For each record in the training set
7.     If the record matches the chromosome
8.       AB = AB + 1
9.     End if
10.    If the record matches only the “condition”
11.      A = A + 1
12.    End if
13.  End for
14.  Fitness = W1 * AB / N + W2 * AB / A
15.  If Fitness > T
16.    Select the chromosome into new population
17.  End if
18. End for
19. For each chromosome in the new population
20.   Apply crossover operator to the chromosome
21.   Apply mutation operator to the chromosome
22. End for
23. If number of generations is not reached, goto line 4.
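The fitness computation in lines 5 to 14 of the listing can be sketched as follows. The record layout (dictionaries with a `label` field) and the helper names are hypothetical. The sketch also assumes that A counts every record matching the rule's condition and AB counts those that additionally carry the rule's outcome, so that AB / A stays between 0 and 1; the listing's wording for A may be read differently.

```python
W1, W2, T = 0.2, 0.8, 0.5  # weights and threshold from line 2 of the listing


def matches(condition, record):
    # A record satisfies the condition when every listed field is equal.
    return all(record.get(key) == value for key, value in condition.items())


def rule_fitness(condition, outcome, records):
    """Fitness = W1 * AB / N + W2 * AB / A, as in line 14 of the listing.

    Assumption: A counts all records matching the condition; AB counts those
    that also have the rule's outcome as their label.
    """
    n = len(records)
    a = sum(1 for r in records if matches(condition, r))
    ab = sum(1 for r in records if matches(condition, r) and r["label"] == outcome)
    if n == 0 or a == 0:
        return 0.0  # the rule matches nothing; avoid division by zero
    return W1 * ab / n + W2 * ab / a


# Hypothetical training records for illustration only.
records = [
    {"dst_port": 21, "label": "attack"},
    {"dst_port": 21, "label": "attack"},
    {"dst_port": 21, "label": "normal"},
    {"dst_port": 80, "label": "normal"},
]
keep = rule_fitness({"dst_port": 21}, "attack", records) > T
```

Here the first term rewards rules that cover many records and the second rewards rules that are rarely wrong when they fire; with W2 larger than W1, accuracy dominates coverage.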
For example, a rule can be defined as:
if {the connection has following information: source IP address; destination IP address; destination port number: 21; connection time: 10.1 seconds } then {stop the connection}
This rule can be explained as follows: if there exists a network connection request with the given source IP address, destination IP address, destination port number 21, and connection time 10.1 seconds, then stop this connection establishment. This is because the source IP address is recognized by the IDS as one of the blacklisted IP addresses; therefore, any service request initiated from it is rejected. The final goal of applying GA is to generate rules that match only the anomalous connections. These rules are tested on historical connections and are used to filter new connections to find suspicious network traffic.
In this implementation, the network traffic used for GA is a pre-classified data set that differentiates normal network connections from anomalous ones. This data set is gathered using network sniffers (programs that record network traffic without doing anything harmful) such as Tcpdump or Snort. The data set is manually classified based on experts’ knowledge and is used for fitness evaluation during the execution of GA. By starting GA with only a small set of randomly generated rules, we can generate a larger data set that contains rules for IDS. These rules are “good enough” solutions for GA and can be used for filtering new network traffic.
4.2 Data Representation
In order to fully exploit the suspicious level, we need to
examine all fields related with a specific network
connection. For simplicity, we only consider some
obvious attributes for each connection. The definition of
rules (for TCP/IP protocols) is shown in Table 1.
Table 1. Rule definition for connection and range of values of each field
Attribute | Range of Values | Example Values | Descriptions
Source IP Address | to | d1.0b.**.** | A subnet with IP address to
Destination IP Address | to | | A subnet with IP address to
Source Port Number | 0 to 65535 | 42335 |
Destination Port Number | 0 to 65535 | 00080 | Destination port number indicates this is a http
Duration | 0 to 99999999 | 00000482 | Duration of the connection is 482 seconds
State | 1 to 20 | 11 | The connection is terminated by the originator for internal use
Protocol | 1 to 9 | 2 | The protocol for this connection is TCP
Number of Bytes Sent by Originator | 0 to 99999999 | 0000007320 | The originator sends 7320 bytes of data
Number of Bytes Sent by Responder | 0 to 99999999 | 0000038891 | The responder sends 38891 bytes of data
if {the connection has following information: source IP address 209.11.??.??; destination IP address: 130.18.176+?.??; source port number: 42335; destination port number: 80; connection time: 482 seconds; the connection is stopped by the originator; the protocol used is TCP; the originator sent 7320 bytes of data; and the responder sent 38891 bytes of data } then {stop the connection}
We can convert the above example into the chromosome
form, as described below.
(d, 1, 0, b, -1, -1, -1, -1, 8, 2, 1, 2,b, -1, -1, -1, 4, 2, 3, 3,
5, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 4, 8, 2, 1, 1, 2, 0, 0, 0, 0, 0, 0,
7, 3, 2, 0, 0, 0, 0, 0, 0, 3, 8, 8, 9, 1)
Altogether there are fifty-seven genes in each
chromosome. For simplicity, we use hexadecimal
representations for the IP addresses.
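The conversion from the example rule to its fifty-seven genes can be sketched as below. The field names, the fixed-width string layout, and the `encode` helper are assumptions made for illustration; only the gene order and the resulting values follow the chromosome shown above.

```python
# Field widths in genes, in the order used by the example rule (they sum to 57).
FIELD_WIDTHS = [
    ("src_ip", 8), ("dst_ip", 8), ("src_port", 5), ("dst_port", 5),
    ("duration", 8), ("state", 2), ("protocol", 1),
    ("bytes_orig", 10), ("bytes_resp", 10),
]


def encode(rule):
    """Flatten a rule (dict of fixed-width strings) into a list of genes.

    Decimal digits become ints, hexadecimal letters stay as characters,
    and the wild cards '?' and '*' become -1.
    """
    genes = []
    for name, width in FIELD_WIDTHS:
        text = rule[name]
        if len(text) != width:
            raise ValueError(f"{name} needs {width} characters, got {len(text)}")
        for ch in text:
            if ch in "?*":
                genes.append(-1)
            elif ch.isdigit():
                genes.append(int(ch))
            else:
                genes.append(ch.lower())
    return genes


example = {
    "src_ip": "d10b????",   # 209.11.??.?? written in hexadecimal
    "dst_ip": "8212b???",   # 130.18.176+?.?? written in hexadecimal
    "src_port": "42335", "dst_port": "00080", "duration": "00000482",
    "state": "11", "protocol": "2",
    "bytes_orig": "0000007320", "bytes_resp": "0000038891",
}
chromosome = encode(example)
```

Keeping every field at a fixed width means a given gene position always refers to the same digit of the same field, which is what allows crossover between two chromosomes to produce a well-formed rule.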
The rule can be explained as follows: if a network connection has source IP address 209.11.??.??, destination IP address 130.18.176.??, source port number 42335, destination port number 80, and duration 482 seconds, ends with state 11 (the connection is terminated by the originator), and uses protocol type 2 (TCP), and the originator sends 7320 bytes of data while the responder sends 38891 bytes of data, then this is suspicious behavior and can be identified as a potential intrusion. The actual validity of this rule is examined by matching it against the historical data set, which comprises connections marked as either anomalous or normal. If the rule finds an anomalous behavior, a bonus is given to the current chromosome. If the rule matches a normal connection, a penalty is applied to the chromosome. Clearly, no single rule can separate all anomalous connections from normal connections; the population needs to evolve to find the optimal rule set.
In the example shown in Table 1, some wild cards (the ‘*’ character and the ‘?’ character) are used, and the corresponding genes within the chromosome are shown as -1. These wild cards represent a range of values, which is useful when representing a network block (a range of IP addresses or port numbers) in a rule. Once this spatial information is included in the rules, the capability of the IDS can be greatly improved, as an intrusion may be initiated from many different locations. Including the duration of a network connection in the chromosome incorporates temporal information about network connections. The maximum value of the duration is 99999999 seconds, which is more than three years. This is helpful for identifying intrusions because complex intrusions may span hours, days, or even months.
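Wild-card matching and the bonus/penalty idea described above can be sketched as follows. The function names, the shape of the history list, and the bonus and penalty sizes are placeholders chosen for illustration, not values from this paper.

```python
WILDCARD = -1


def rule_matches(chromosome, connection):
    """True when every non-wildcard gene equals the connection's gene."""
    return len(chromosome) == len(connection) and all(
        gene == WILDCARD or gene == value
        for gene, value in zip(chromosome, connection)
    )


def score(chromosome, history, bonus=1, penalty=1):
    """Reward matches on anomalous connections, punish matches on normal ones.

    `history` is a list of (genes, is_anomalous) pairs taken from a
    pre-classified data set.
    """
    total = 0
    for genes, is_anomalous in history:
        if rule_matches(chromosome, genes):
            total += bonus if is_anomalous else -penalty
    return total
```

Short gene lists are enough to try this out: the rule `["d", 1, -1, -1]` matches any connection whose first two genes are `"d"` and `1`, whatever the last two are.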
The genetic algorithm starts with a population of randomly selected rules. The population evolves through the crossover and mutation operators. Because of the evaluation function, succeeding populations are biased toward rules that match intrusive connections. When the algorithm stops, rules are selected and added to the IDS rule base.
Figure 2. Architecture of applying GA into Intrusion
4.3 Parameters in Genetic Algorithm
There are many parameters to consider for the application
of GA.
Evaluation function
The evaluation function is one of the most important parameters in a genetic algorithm. It is calculated in the following steps. First, the overall outcome is calculated: for each field of the connection, whether it matches the pre-classified data set is recorded as a Matched value of either 1 or 0, which is then multiplied by the weight of that field.
$$\text{outcome} = \sum_{i=1}^{57} \text{Matched}_i \times \text{Weight}_i$$
The order of weight values in the function is shown in Figure 3. The weights are grouped by the fields of the connection record as reported by network sniffers; therefore, all genes representing the destination IP address field have the same weight. The actual values can be fine-tuned at execution time. The order reflects the importance of the different fields in TCP/IP packets, which makes the scheme straightforward and intuitive. The destination IP address is the target of an intrusion, while the source IP address is the originator of the intrusion; these are the most important pieces of information needed to capture an intrusion. The destination port number indicates which application on the target system is being addressed (for example, FTP service usually runs on port 21).
Figure 3. Order of weights for fields in the evaluation
The absolute difference between the outcome of the chromosome and the actual suspicious level is then computed using the following equation. The suspicious level is a threshold that indicates the extent to which two network connections are considered a “match.” The actual value of the suspicious level reflects observations from historical data.
$$\Delta = |\text{outcome} - \text{suspicious level}|$$
Once a mismatch happens, the penalty value is computed using the absolute difference. The ranking in the equation indicates whether or not an intrusion is easy to identify.
$$\text{penalty} = \frac{\Delta \times \text{ranking}}{100}$$
The fitness of a chromosome is computed using the above penalty:
$$\text{fitness} = 1 - \text{penalty}$$
The fitness value lies between 0 and 1. By defining the evaluation function in this way, we have incorporated both the temporal and spatial information needed for identification of network intrusions.
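The evaluation steps above (outcome, absolute difference, penalty, fitness) can be chained in a short sketch. The function name, the per-gene weight list, and the treatment of a wild-card gene (-1) as a match are assumptions; the paper leaves the weight values to be tuned at execution time.

```python
def evaluate(chromosome, connection, weights, suspicious_level, ranking):
    """Return (outcome, delta, penalty, fitness) for one chromosome.

    outcome = sum of weights[i] over matching genes (Matched is 1 or 0)
    delta   = |outcome - suspicious_level|
    penalty = delta * ranking / 100
    fitness = 1 - penalty
    """
    # Assumption: a wild-card gene (-1) counts as a match.
    matched = [
        1 if gene == -1 or gene == value else 0
        for gene, value in zip(chromosome, connection)
    ]
    outcome = sum(m * w for m, w in zip(matched, weights))
    delta = abs(outcome - suspicious_level)
    penalty = delta * ranking / 100
    return outcome, delta, penalty, 1 - penalty
```

For instance, with the illustrative three-gene inputs `evaluate([1, -1, 3], [1, 7, 4], [0.5, 0.3, 0.2], 0.9, 50)`, the first two genes match, so the outcome is 0.8, the difference from the suspicious level is 0.1, and the fitness is about 0.95.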
A Bayesian network is a graphical modeling tool used to model decision problems containing uncertainty. It is a directed acyclic graph where each node represents a discrete random variable of interest. Each node contains the states of the random variable that it represents and a conditional probability table (CPT), which gives the conditional probabilities of this variable given the realization of other connected variables, based upon Bayes rule:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
The CPT of a node contains probabilities of the node
being in a specific state given the states of its parents. The
parent child relationship between nodes in a Bayesian
network indicates the direction of causality between the
corresponding variables. That is, the variable represented
by the child node is causally dependent on the ones
represented by its parents [18].
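How a CPT and Bayes rule combine can be shown on the smallest possible network, a single parent and child. The two-node structure (Intrusion as parent of Alert) and every probability below are hypothetical values for illustration; they are not taken from the networks in Figures 4 to 6.

```python
# Hypothetical two-node network: Intrusion -> Alert. All numbers are illustrative.
P_INTRUSION = 0.01                     # prior P(Intrusion = true)
CPT_ALERT = {True: 0.90, False: 0.05}  # P(Alert = true | Intrusion)


def posterior_intrusion(alert_observed=True):
    """P(Intrusion = true | Alert) by Bayes rule over the two-node network."""
    def likelihood(intrusion):
        p = CPT_ALERT[intrusion]
        return p if alert_observed else 1.0 - p

    joint_true = likelihood(True) * P_INTRUSION
    joint_false = likelihood(False) * (1.0 - P_INTRUSION)
    # The denominator is P(Alert), obtained by summing over both parent states.
    return joint_true / (joint_true + joint_false)
```

With these made-up numbers, observing an alert raises the probability of an intrusion from 1% to roughly 15%, which shows why a rare event still has a modest posterior even with a fairly reliable detector.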
1) Intrusion Detection Interface: Figure 4 shows the Bayesian network built by AGENT ID1. For every new connection, AGENT ID1 uses its Bayesian network to decide whether an intrusion is present and of which type.
Figure 4. Intrusion Detection Interface
2) Alerts Classification Interface: Figure 5 shows the Bayesian network built by the IPA for alert classification. The IPA receives alert messages sent by intrusion detection agents about the detected intrusions and uses its Bayesian network to determine the hyper-alerts corresponding to these alerts.
Figure 5. Alerts Classification Interface
3) Attack Plans Prediction Interface: Figure 6 shows the Bayesian network built by the IPA for attack plan prediction. The IPA uses its Bayesian network to determine the possible attacks that will follow the detected intrusions.
Figure 6. Intrusion Prediction Interface
In this paper, we discussed a methodology for applying genetic algorithms to network intrusion detection. A brief overview of Intrusion Detection Systems (IDS), genetic algorithms, and related detection techniques was given, the system architecture was introduced, and the factors affecting the GA were addressed in detail. This implementation of the genetic algorithm is unique in that it considers both temporal and spatial information of network connections during the encoding of the problem; therefore, it should be more helpful for identifying anomalous network behaviors.
Applied to intrusion detection, our system helps identify both normal and abnormal connections at considerable rates. We also presented an approach to identify attack plans and predict upcoming attacks. We developed a Bayesian network based system to correlate attack scenarios based on their
Our system demonstrates high performance when
intrusions, correlating and predicting attacks. This is due to the use of Bayesian networks and of the genetic algorithm within Bayesian networks, which is especially useful when dealing with missing information.
[1] R. Heady, G. Luger, A. Maccabe, and M. Servilla. The architecture of a network level intrusion detection system. Technical Report, University of New Mexico, Department of Computer Science, August 1990.
[2] Bobor, V. Efficient Intrusion Detection System Architecture Based on Neural Networks and Genetic Algorithms. Department of Computer and Systems Sciences, Stockholm University / Royal Institute of Technology, KTH/DSV, 2006.
[3] Faraoun, K. M., and A. Boukelif. Genetic Programming Approach for Multi-Category Pattern Classification Applied to Network Intrusions Detection. International Journal of Computational Intelligence, Vol. 3, No. 1, 2006, pp. 79-90.
[4] Zhang, J., and M. Zulkernine. Anomaly Based Network Intrusion Detection with Unsupervised Outlier Detection. Symposium on Network Security and Information Assurance, Proc. of the IEEE International Conference on Communications (ICC), June 2006, Istanbul, Turkey.
[5] Kabiri, P., and Ali A. Ghorbani. Research on Intrusion Detection and Response: a Survey. International Journal of Network Security, The Intelligent & Adaptive Systems Group (IAS), Vol. 1, No. 2, 4 July 2005, pp. 84-102.
[6] Diaz-Gomez, P. A., and D. F. Hougen. Improved Offline Intrusion Detection using a genetic algorithm. Proceedings of the Seventh International Conference on Enterprise Information Systems, 2005, Miami, USA.
[7] Gong, R. H., M. Zulkernine, and P. Abolmaesumi. A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection. Proceedings of Sixth IEEE ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD), May 2005,
[8] Stein, G., B. Chen, A. S. Wu, and Kien A. Hua. Decision tree classifier for network intrusion detection with GA-based feature selection. In Proceedings of the 43rd ACM Southeast Conference, March 18-20, 2005, Kennesaw, GA.
[9] Folino, G., C. Pizzuti, and G. Spezzano. GP Ensemble for Distributed Intrusion Detection Systems. International Conference on Advances in Pattern Recognition, ICAPR05, August 22-25, 2005, Bath, UK.
[10] De Boer, P., and Martin Pels. Host-Based Intrusion Detection Systems. Technical Report 1.10, Faculty of Science, Informatics Institute, University of Amsterdam,
[11] Yao, J. T., S. L. Zhao, and L. V. Saxton. A study on fuzzy intrusion detection. Proceedings of SPIE Vol. 5812, Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, 28 March - 1 April 2005, Orlando, Florida, USA.
[12] Bradford, P. G., and N. Hu. A Layered Approach to Insider Threat Detection and Proactive Forensics. 21st Annual Computer Security Applications Conference, Applied Computer Security Associates (ACSA), December 5-9, 2005, Tucson, Arizona.
[13] Brugger, S. T. Data Mining Methods for Network Intrusion Detection. Terry Brugger's Homepage. 9 June 2004. University of California, Davis. 6 Oct. 2006.
[14] Li, W. Using Genetic Algorithm for Network Intrusion Detection. Proceedings of the United States Department of Energy Cyber Security Group 2004 Training Conference, May 24-27, 2004, Kansas City, Kansas, USA.
[15] Marczyk, A. "Genetic Algorithms and Evolutionary Computation." The Talk.Origins Archive. 23 Apr. 2004. 7 Oct. 2006.
[16] Song, D. A Linear Genetic Programming Approach to Intrusion Detection. Master Degree for Computer Sciences, Genetic and Evolutionary Computation, GECCO 2003.
[17] Smith, L. S. An Introduction to Neural Networks. Professor Leslie S. Smith, Centre for Cognitive and Computational Neuroscience. 2 Apr. 2003. 7 Oct. 2006.
[18] Coull, S., Joel Branch, Boleslaw Szymanski, and Eric Breimer. Intrusion Detection: a Bioinformatics Approach. Proceedings of the 19th Annual Computer Security Applications Conference, Dec. 2003, Las Vegas,
[19] Gorodetsky, V., I. Kotenko, and O. Karsaev. Multi-agent Technologies for Computer Network Security: Attack Simulation, Intrusion Detection and Intrusion Detection Learning. International Journal of Computer Systems Science and Engineering, Vol. 18, No. 4, July 2003, pp. 191-200.
[20] Gomez, J., and D. Dasgupta. Evolving Fuzzy Classifiers for Intrusion Detection. Proceedings of the 2002 IEEE Workshop on Information Assurance, United States Military Academy, June 2001, West Point, NY.