National Conference on “Emerging Trends in Electronics Engineering & Computing” (E3C 2010)
Intrusion Detection Based on Genetic Algorithm and Bayesian Networks
Ms. Nivedita Naidu
Student, M.Tech, III Sem,
Department of Computer Science & Engineering,
G.H. Raisoni College of Engineering, Nagpur,
Maharashtra, India.
naidu_nivedita@rediffmail.com
Dr. R. V. Dharaskar
Professor & Head,
Department of Computer Science & Engineering,
G.H. Raisoni College of Engineering, Nagpur,
Maharashtra, India.
ABSTRACT
This paper presents a general overview of Intrusion
Detection Systems (IDS) and the methods used in them,
with brief notes on the design principles and the major
trends. Artificial intelligence techniques such as fuzzy
logic and genetic algorithms are widely used in this area.
We focus on the genetic algorithm technique and how it
can be used in Intrusion Detection Systems, giving some
examples of systems and experiments proposed in this
field.
The purpose of this paper is to give a clear understanding
of the use of Genetic Algorithms and Bayesian Networks
in IDS.
General Terms: Algorithms, Performance, Security
Keywords: Intrusion Detection System, Genetic
Algorithm.
1. INTRODUCTION
Because of increased network connectivity, computer
systems are becoming increasingly vulnerable to attack.
These attacks often exploit flaws in either the operating
system or application programs. The general goal of such
attacks is to subvert the traditional security mechanisms
on the systems and execute operations in excess of the
intruder's authorization. These operations could include
reading protected or private data or simply doing
malicious damage to the system or user files.
The degree of protection from such malicious actions
depends on the amount of time and effort spent building
and maintaining the system's security defenses. By
building complex tools, which continually monitor and
report activities, a system security operator can catch
potentially malicious activities as they occur. However,
this involves a large expense in terms of time and money
in both building and maintaining such a monitoring
system. The monitoring will also impose a performance
penalty on the system being protected - something which
the users may object to.
This paper proposes a mechanism for building such a
monitoring system which does not involve a significant
[...] process operates independently of the other agents,
but they all cooperate in monitoring the system. This
approach has significant advantages in terms of overhead,
scalability and flexibility.
2. INTRUSIONS AND INTRUSION DETECTION
An intrusion can be defined as [1] any set of actions that
attempt to compromise the integrity, confidentiality or
availability of a resource. Intrusions can be categorized
into two main classes:
Misuse intrusions: They are well defined attacks on
known weak points of a system. They can be spotted by
watching for certain actions being performed on certain
objects.
Anomaly intrusions: These are based on observations of
deviations from normal system usage patterns. They are
caught by examining log messages resulting from system
calls. This can be done using a pattern matching approach
such as in [3].
Intrusion Detection: An intrusion detection system
(IDS) must [2] identify, preferably in real time,
unauthorized use, misuse, and abuse of computer systems.
An intrusion detection system does not attempt to stop an
intrusion as it occurs. Its role is to alert a system security
officer that a potential security violation is occurring. As
such it is a reactive, rather than proactive, form of system
defense. An intrusion detection system can be either
host-based or network-based; often a system is a hybrid
of the two approaches. A host-based system monitors all
the activity on a single host computer and ensures that no
user operations violate the site security policy. A
network-based system monitors on a net-wide basis: it
considers actions occurring on the network and analyzes
whether they constitute potential security violations.
Network-Based Intrusion Detection Systems:
A network-based intrusion detection system (NIDS) is an
ID system that uses the traffic on its network segment as
a data source. Implementation requires:
• The network interface card is placed in promiscuous
mode to capture all network traffic that crosses its
network segment; and
J D College of Engineering, Nagpur(M.S.) 127
• A sensor, which monitors packets traveling on that
network segment.
The objective is to determine whether the packet flow
matches a known signature. Three types of signature are
particularly important:
1. String signatures, which look for a text string that
indicates a possible attack;
2. Port signatures, which watch for connection attempts to
well-known, frequently attacked ports; and
3. Header signatures, which watch for dangerous or
illogical combinations in packet headers.
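The three signature types can be illustrated with a minimal sketch. The packet layout (a plain dict), the watched ports, and the payload pattern are assumptions made for this example; they are not taken from any particular NIDS.

```python
# Hypothetical packet representation: a plain dict, not a real capture format.
WATCHED_PORTS = {21, 23, 25}            # assumed "frequently attacked" ports
SUSPICIOUS_STRINGS = [b"/etc/passwd"]   # assumed example payload pattern


def string_signature(packet):
    """True if the payload contains a text string that suggests an attack."""
    return any(s in packet["payload"] for s in SUSPICIOUS_STRINGS)


def port_signature(packet):
    """True if the connection attempt targets a watched port."""
    return packet["dst_port"] in WATCHED_PORTS


def header_signature(packet):
    """True for an illogical header combination: SYN and FIN set together."""
    flags = packet["tcp_flags"]
    return "SYN" in flags and "FIN" in flags


def matched_signatures(packet):
    """Return the names of all signature types that the packet triggers."""
    checks = {
        "string": string_signature,
        "port": port_signature,
        "header": header_signature,
    }
    return [name for name, check in checks.items() if check(packet)]
```

A packet carrying the assumed payload pattern with both SYN and FIN set would trigger the string and header checks, while a plain connection attempt to port 21 would trigger only the port check.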
3. GENETIC ALGORITHMS
“A Genetic Algorithm (GA) is a programming technique
that mimics biological evolution as a problem-solving
strategy.” [2] It is based on Darwin’s principle of
evolution and survival of the fittest, and it optimizes a
population of candidate solutions towards a predefined
fitness. [16][18]
A GA uses a chromosome-like data structure and evolves
the chromosomes using selection, recombination, and
mutation operators. Figure 1 shows the structure of a
simple genetic algorithm: it starts with a randomly
generated initial population, which is then evaluated and
evolved through selection, recombination, and mutation.
Finally, the best individual (chromosome) is picked out as
the final result once the optimization meets its target.
[16][18][20][3]
Figure 1: Application to Intrusion Detection Problem
A GA has some common elements and parameters which
should be defined:
Fitness Function: According to [2], “The fitness function
is defined as a function which scales the value individual
relative to the rest of population.” It is used to identify
the best solutions among the candidates in the population.
GA Operators: As Figure 1 shows, selection, mutation and
crossover are the most influential parts of the algorithm,
as they participate in the generation of each population.
Selection is the phase where individuals with better
fitness are selected; the others are discarded.
Crossover is a process where randomly selected pairs of
individuals exchange parts of their chromosomes with
each other, until a completely new population has been
generated.
Mutation flips some bits in an individual. Since any bit
may be flipped, the resulting change is hard to predict.
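The three operators can be sketched on bit-string chromosomes as follows. This is a generic illustration, not the implementation of any cited system: truncation selection, single-point crossover, the parameter values, and the toy fitness (the number of 1 bits) are all assumptions chosen for brevity.

```python
import random


def select(population, fitness, keep):
    """Keep the `keep` fittest individuals (truncation selection)."""
    return sorted(population, key=fitness, reverse=True)[:keep]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the tails of two equal-length parents."""
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def evolve(fitness, length=16, size=20, generations=40, rate=0.02, seed=0):
    """Random initial population, then repeated select/crossover/mutate."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(size)]
    for _ in range(generations):
        parents = select(population, fitness, size // 2)
        children = []
        while len(children) < size:
            a, b = rng.sample(parents, 2)
            for child in crossover(a, b, rng):
                children.append(mutate(child, rate, rng))
        population = children[:size]
    return max(population, key=fitness)


best = evolve(fitness=sum)  # toy fitness: count of 1 bits
```

Other selection schemes (roulette wheel, tournament) fit the same structure; only `select` would change.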
4. DETECTION ALGORITHM OVERVIEW
In [8], a genetic algorithm has been presented which
contains a training process. This algorithm is designed to
apply a set of classification rules according to the input
data given. It follows the simple flow of genetic
algorithms presented in Figure 1. [8][16][2]
Applying genetic algorithms to intrusion detection seems
to be a promising area. We discuss the motivation and
implementation details in this section.
4.1 Overview:
Genetic algorithms can be used to evolve simple rules for
network traffic. These rules are used to differentiate
normal network connections from anomalous
connections, that is, events that are likely to be
intrusions. The rules stored in the rule base are usually in
the following form:
if { condition } then { act }
Algorithm: Rule set generation using genetic algorithm.
Input: Network audit data, number of generations, and
population size.
Output: A set of classification rules.
1. Initialize the population
2. W1 = 0.2, W2 = 0.8, T = 0.5
3. N = total number of records in the training set
4. For each chromosome in the population
5.     A = 0, AB = 0
6.     For each record in the training set
7.         If the record matches the chromosome
8.             AB = AB + 1
9.         End if
10.        If the record matches only the “condition” part
11.            A = A + 1
12.        End if
13.    End for
14.    Fitness = W1 * AB / N + W2 * AB / A
15.    If Fitness > T
16.        Select the chromosome into new population
17.    End if
18. End for
19. For each chromosome in the new population
20.     Apply crossover operator to the chromosome
21.     Apply mutation operator to the chromosome
22. End for
23. If number of generations is not reached, go to line 4.
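The fitness and selection steps (lines 4 to 18) can be sketched in Python as below. The rule and record layouts, the field names, and the use of None as a wildcard are assumptions of this sketch. It also reads A as the number of records that satisfy the condition part regardless of outcome, so that AB / A acts as a confidence term, and it returns 0 when A is 0 to avoid dividing by zero; the listing does not spell out either point. Crossover and mutation (lines 19 to 22) are omitted.

```python
W1, W2, T = 0.2, 0.8, 0.5  # weights and threshold from step 2


def condition_matches(rule, record):
    """True if every non-wildcard condition field equals the record's field.

    A value of None in the rule acts as a wildcard (sketch assumption).
    """
    return all(value is None or record[field] == value
               for field, value in rule["condition"].items())


def fitness(rule, records):
    """Fitness = W1 * AB / N + W2 * AB / A (step 14)."""
    n = len(records)
    a = sum(condition_matches(rule, r) for r in records)
    ab = sum(condition_matches(rule, r) and r["label"] == rule["outcome"]
             for r in records)
    if a == 0:  # the rule matches no record: avoid division by zero
        return 0.0
    return W1 * ab / n + W2 * ab / a


def select_rules(population, records, threshold=T):
    """Steps 15-17: keep the rules whose fitness exceeds the threshold."""
    return [rule for rule in population if fitness(rule, records) > threshold]


# Hypothetical training records and candidate rules.
records = [
    {"dst_port": 21, "protocol": "tcp", "label": "attack"},
    {"dst_port": 21, "protocol": "tcp", "label": "attack"},
    {"dst_port": 80, "protocol": "tcp", "label": "normal"},
    {"dst_port": 21, "protocol": "tcp", "label": "normal"},
]
rule_ftp = {"condition": {"dst_port": 21, "protocol": None}, "outcome": "attack"}
rule_http = {"condition": {"dst_port": 80, "protocol": None}, "outcome": "attack"}
selected = select_rules([rule_ftp, rule_http], records)
```

With these made-up records, `rule_ftp` has N = 4, A = 3 and AB = 2, giving a fitness of about 0.63, so it passes the threshold; `rule_http` has AB = 0 and is dropped.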
For example, a rule can be defined as:
if {the connection has following information: source IP
address 124.12.5.18; destination IP address:
130.18.206.55; destination port number: 21;
connection time: 10.1 seconds } then {stop the
connection}
This rule can be explained as follows: if there exists a
network connection request with the source IP address
124.12.5.18, destination IP address 130.18.206.55,
destination port number 21, and connection time 10.1
seconds, then stop this connection establishment. This is
because the IP address 124.12.5.18 is recognized by the
IDS as one of the blacklisted IP addresses; therefore, any
service request initiated from it is rejected. The final goal
of applying GA is to generate rules that match only the
anomalous connections. These rules are tested on
historical connections and are used to filter new
connections to find suspicious network traffic.
In this implementation, the network traffic used for GA is
a pre-classified data set that differentiates normal network
connections from anomalous ones. This data set is
gathered using network sniffers (programs that record
network traffic without doing anything harmful) such as
Tcpdump (http://www.tcpdump.com) or Snort
(http://www.snort.com). The data set is manually
classified based on experts’ knowledge and is used for the
fitness evaluation during the execution of GA. By
starting GA with only a small set of randomly generated
rules, we can generate a larger set of rules for the IDS.
These rules are “good enough” solutions for GA
and can be used for filtering new network traffic.
4.2 Data Representation
To assess fully how suspicious a connection is, we need to
examine all fields related to that network connection. For
simplicity, we only consider some obvious attributes for
each connection. The definition of rules (for TCP/IP
protocols) is shown in Table 1.
Table 1. Rule definition for connection and range of values of each field
Attribute | Range of Values | Example Values | Descriptions
Source IP Address | 0.0.0.0 to 255.255.255.255 | d1.0b.**.** (209.11.??.??) | A subnet with IP address 209.11.0.0 to 209.11.255.255
Destination IP Address | 0.0.0.0 to 255.255.255.255 | 82.12.b*.** (130.18.176+?.??) | A subnet with IP address 130.18.176.0 to 130.18.255.255
Source Port Number | 0 to 65535 | 42335 | Source port number of the connection
Destination Port Number | 0 to 65535 | 00080 | Destination port number indicates this is a http service
Duration | 0 to 99999999 | 00000482 | Duration of the connection is 482 seconds
State | 1 to 20 | 11 | The connection is terminated by the originator for internal use
Protocol | 1 to 9 | 2 | The protocol for this connection is TCP
Number of Bytes Sent by Originator | 0 to 9999999999 | 0000007320 | The originator sends 7320 bytes of data
Number of Bytes Sent by Responder | 0 to 9999999999 | 0000038891 | The responder sends 38891 bytes of data
The example values in Table 1 describe the following rule:
if {the connection has following information: source IP
address 209.11.??.??; destination IP address:
130.18.176+?.??; source port number: 42335; destination
port number: 80; connection time: 482 seconds; the
connection is stopped by the originator; the protocol used
is TCP; the originator sent 7320 bytes of data; and the
responder sent 38891 bytes of data } then {stop the
connection}.
We can convert the above example into the chromosome
form, as described below.
(d, 1, 0, b, -1, -1, -1, -1, 8, 2, 1, 2, b, -1, -1, -1, 4, 2, 3, 3,
5, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 4, 8, 2, 1, 1, 2, 0, 0, 0, 0, 0, 0,
7, 3, 2, 0, 0, 0, 0, 0, 0, 3, 8, 8, 9, 1)
Altogether there are fifty-seven genes in each
chromosome. For simplicity, we use hexadecimal
representations for the IP addresses.
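The encoding can be sketched as follows. The field widths (8 hexadecimal digits per IP address, 5 digits per port, 8 for duration, 2 for state, 1 for protocol, 10 for each byte count) are inferred from the example chromosome, since they add up to 57 genes. The function names are hypothetical, and storing hexadecimal digits as integers (d as 13, b as 11) is a choice made for this sketch.

```python
WILDCARD = -1  # gene value used for '*' positions in the chromosome


def encode_ip(hex_pattern):
    """Encode a dotted hex pattern such as 'd1.0b.**.**' as 8 genes."""
    genes = []
    for octet in hex_pattern.split("."):
        if len(octet) != 2:
            raise ValueError("each octet needs two hex digits or wildcards")
        for ch in octet:
            genes.append(WILDCARD if ch == "*" else int(ch, 16))
    return genes


def encode_number(value, width):
    """Encode a non-negative integer as `width` decimal-digit genes."""
    digits = str(value).zfill(width)
    if len(digits) > width:
        raise ValueError("value does not fit in the field")
    return [int(d) for d in digits]


def encode_rule(src_ip, dst_ip, src_port, dst_port, duration,
                state, protocol, bytes_orig, bytes_resp):
    """Concatenate all fields into one 57-gene chromosome."""
    return (encode_ip(src_ip) + encode_ip(dst_ip)
            + encode_number(src_port, 5) + encode_number(dst_port, 5)
            + encode_number(duration, 8) + encode_number(state, 2)
            + encode_number(protocol, 1)
            + encode_number(bytes_orig, 10) + encode_number(bytes_resp, 10))


chromosome = encode_rule("d1.0b.**.**", "82.12.b*.**", 42335, 80,
                         482, 11, 2, 7320, 38891)
```

Applied to the example rule, this yields 57 genes whose values agree with the chromosome listed above once d and b are read as 13 and 11.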
The rule can be explained as follows: if a network
connection has source IP address 209.11.??.??
(209.11.0.0 ~ 209.11.255.255), destination IP address
130.18.176+?.?? (130.18.176.0 ~ 130.18.255.255), source
port number 42335, destination port number 80, duration
time 482 seconds, ends with state 11 (the connection
terminated by the originator), uses protocol type 2 (TCP),
the originator sends 7320 bytes of data, and the
responder sends 38891 bytes of data, then this is
suspicious behavior and can be identified as a potential
intrusion. The actual validity of this rule is examined
by matching it against the historical data set, which is
comprised of connections marked as either anomalous or
normal. If the rule finds an anomalous behavior, a bonus
is given to the current chromosome. If the rule matches a
normal connection, a penalty is applied to the
chromosome. Clearly, no single rule can separate all
anomalous connections from normal connections; the
population needs to evolve to find the optimal rule set.
In the example shown in Table 1, some wild cards (the
‘*’ character and the ‘?’ character) are used, and the
corresponding genes within the chromosome are shown
as -1. A wild card represents a range of values rather
than one specific value, which is useful when
representing a network block (a range of IP addresses or
port numbers) in a rule. Once this spatial information is
included in the rules, the capability of the IDS can be
greatly improved, as an intrusion may be initiated from
many different locations. The inclusion of the duration
time of a network connection in the chromosome ensures
incorporation of temporal information for network
connections. The maximum value of duration time is
99999999 seconds, which is more than three years. This is
helpful for identifying intrusions because complex
intrusions may span hours, days, or even months.
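Wildcard matching can be illustrated with a short sketch that compares a rule against a record gene by gene. Genes are stored as integers here (hexadecimal digits as values 0 to 15), which is an assumption of this example, as are the function names and the two sample addresses.

```python
WILDCARD = -1


def gene_matches(rule_gene, record_gene):
    """A wildcard gene matches any value; otherwise values must be equal."""
    return rule_gene == WILDCARD or rule_gene == record_gene


def rule_matches(rule, record):
    """True if the gene sequences have equal length and every gene matches."""
    return len(rule) == len(record) and all(
        gene_matches(g, r) for g, r in zip(rule, record))


# Source-IP part of the example rule: d1.0b.**.** (209.11.??.??)
rule_src = [0xD, 0x1, 0x0, 0xB, WILDCARD, WILDCARD, WILDCARD, WILDCARD]
inside = [0xD, 0x1, 0x0, 0xB, 0x2, 0x5, 0xF, 0xE]    # 209.11.37.254
outside = [0xD, 0x1, 0x0, 0xC, 0x2, 0x5, 0xF, 0xE]   # 209.12.37.254
```

Here `rule_matches(rule_src, inside)` holds because the last four genes are wildcards, while `outside` fails on the fourth gene.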
The genetic algorithm starts with a population of
randomly selected rules. The population evolves through
the crossover and mutation operators. Because of the
evaluation function, succeeding populations are biased
toward rules that match intrusive connections. When the
algorithm stops, rules are selected and added to the IDS
rule base.
Figure 2. Architecture of applying GA into Intrusion
Detection
4.3 Parameters in Genetic Algorithm
There are many parameters to consider for the application
of GA.
Evaluation function
The evaluation function is one of the most important
parameters in a genetic algorithm. It is calculated in the
following steps. First, the overall outcome is computed:
each field of the connection is compared with the
pre-classified data set, giving a Matched value of either 1
or 0, which is multiplied by the weight of that field, and
the products are summed.
$$\text{outcome} = \sum_{i} \text{Matched}_i \times \text{Weight}_i$$
The order of weight values in the function is shown in
Figure 3. The weights are grouped according to the
different fields in the connection record as reported by
network sniffers; therefore, all genes representing the
destination IP address field have the same weight. The
actual values can be finely tuned at execution time. The
basic idea behind this order is the importance of different
fields in TCP/IP packets. This scheme is straightforward
and intuitive. The destination IP address is the target of
an intrusion while the source IP address is the originator
of the intrusion. These are the most important pieces of
information needed to capture an intrusion. The
destination port number indicates the application that the
target system is running (for example, FTP service
usually runs on port 21).
Figure 3. Order of weights for fields in the evaluation
function
The absolute difference between the outcome of the
chromosome and the actual suspicious level is then
computed using the following equation. The suspicious
level is a threshold that indicates the extent to which two
network connections are considered a “match.” The actual
value of the suspicious level reflects observations from
historical data.
$$\Delta = |\text{outcome} - \text{suspicious level}|$$
Once a mismatch happens, the penalty value is computed
using the absolute difference. The ranking in the equation
indicates whether or not an intrusion is easy to identify.
$$\text{penalty} = \frac{\Delta \times \text{ranking}}{100}$$
The fitness of a chromosome is computed using the above
penalty:
$$\text{fitness} = 1 - \text{penalty}$$
The fitness value ranges between 0 and 1. By defining the
evaluation function in this way, we have incorporated both
temporal and spatial information needed for identification
of network intrusions.
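The three equations can be combined into a small sketch. The weights, suspicious level, and ranking used in the example are made-up values, and treating a wildcard gene (-1) as a match is an assumption. Note that the fitness stays within 0 to 1 only when Δ × ranking does not exceed 100, so the scales of the weights and the ranking have to be chosen accordingly.

```python
WILDCARD = -1


def outcome(rule, record, weights):
    """Sum of Weight_i over the genes where Matched_i is 1."""
    return sum(w for g, r, w in zip(rule, record, weights)
               if g == WILDCARD or g == r)


def fitness(rule, record, weights, suspicious_level, ranking):
    """fitness = 1 - penalty, where penalty = delta * ranking / 100."""
    delta = abs(outcome(rule, record, weights) - suspicious_level)
    penalty = delta * ranking / 100
    return 1 - penalty


# Hypothetical 4-gene example: three genes match (one via wildcard).
rule = [1, 2, WILDCARD, 4]
record = [1, 2, 9, 5]
weights = [4, 3, 2, 1]
score = fitness(rule, record, weights, suspicious_level=5, ranking=10)
```

In this example the outcome is 4 + 3 + 2 = 9, Δ is 4, the penalty is 0.4, and the fitness is approximately 0.6.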
5. BAYESIAN NETWORKS
A Bayesian network is a graphical modeling tool used to
model decision problems containing uncertainty. It is a
directed acyclic graph where each node represents a
discrete random variable of interest. Each node contains
the states of the random variable that it represents and a
conditional probability table (CPT), which gives the
conditional probabilities of this variable given the
realization of other connected variables, based upon
Bayes' rule:
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$
The CPT of a node contains probabilities of the node
being in a specific state given the states of its parents. The
parent-child relationship between nodes in a Bayesian
network indicates the direction of causality between the
corresponding variables. That is, the variable represented
by the child node is causally dependent on the ones
represented by its parents [18].
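As a worked illustration of Bayes' rule on the smallest possible network, consider a parent node for the connection state and a child node for an alert. The two-node structure and all probabilities below are hypothetical values chosen for the example; they are not taken from the system described in this paper.

```python
# Prior over the parent node (connection state): hypothetical values.
prior = {"intrusion": 0.01, "normal": 0.99}

# CPT of the child node: P(alert | state), hypothetical values.
cpt_alert = {"intrusion": 0.90, "normal": 0.05}


def posterior(prior, likelihood):
    """Apply Bayes' rule: P(state | alert) for every state."""
    evidence = sum(prior[s] * likelihood[s] for s in prior)  # P(alert)
    return {s: prior[s] * likelihood[s] / evidence for s in prior}


belief = posterior(prior, cpt_alert)
```

With these numbers P(alert) = 0.9 × 0.01 + 0.05 × 0.99 = 0.0585, so P(intrusion | alert) is about 0.154: even a fairly reliable alert leaves the intrusion hypothesis unlikely when intrusions are rare.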
5.1 System Implementation
1) Intrusion Detection Interface: Figure 4 shows the
Bayesian network built by AGENT ID1. For every new
connection, AGENT ID1 uses its Bayesian network to
decide whether an intrusion has occurred and, if so, its
type.
Figure 4. Intrusion Detection Interface
2) Alerts Classification Interface: Figure 5 shows the
Bayesian network built by the IPA for alerts classification.
The IPA receives alert messages sent by intrusion
detection agents about the detected intrusions. The IPA
uses its Bayesian network to determine the hyper-alerts
corresponding to these alerts.
Figure 5. Alerts Classification Interface
3) Attack Plans Prediction Interface: Figure 6 shows
the Bayesian network built by the IPA for attack plans
prediction. The IPA uses its Bayesian network to
determine the eventual attacks that will follow the
detected intrusions.
Figure 6. Intrusion Prediction Interface
6. CONCLUSION AND FUTURE WORK
In this paper, we discussed a methodology for applying
genetic algorithms to network intrusion detection. A brief
overview of Intrusion Detection Systems (IDS), genetic
algorithms, and related detection techniques was given.
The system architecture was also introduced, and factors
affecting the GA were addressed in detail. This
implementation of the genetic algorithm considers both
temporal and spatial information of network connections
during the encoding of the problem; therefore, it should
be more helpful for identification of anomalous network
behavior.
Applied to intrusion detection, our system helps detect
both normal and abnormal connections at considerable
rates. In addition, we presented an approach to identify
attack plans and predict upcoming attacks. We developed
a Bayesian network based system to correlate attack
scenarios based on their relationships.
Our system demonstrates high performance when
detecting intrusions and when correlating and predicting
attacks. This is due to the use of Bayesian networks, and
of genetic algorithms within Bayesian networks, which is
especially useful when dealing with missing information.
7. REFERENCES
[1] R. Heady, G. Luger, A. Maccabe, M. Servilla. The
architecture of a network level intrusion detection system.
Technical Report, University of New Mexico,
Department of Computer Science, August 1990.
[2] Bobor, V. Efficient Intrusion Detection System
Architecture Based on Neural Networks and Genetic
Algorithms. Department of Computer and Systems
Sciences, Stockholm University / Royal Institute of
Technology, KTH/DSV, 2006.
[3] Faraoun, K. M., and A. Boukelif. Genetic
Programming Approach for Multi-Category Pattern
Classification Applied to Network Intrusions Detection.
International Journal of Computational Intelligence, Vol.
3, No. 1, 2006, pp. 79-90.
[4] Zhang, J., and M. Zulkernine. Anomaly Based Network
Intrusion Detection with Unsupervised Outlier Detection.
Symposium on Network Security and Information
Assurance, Proc. of the IEEE International Conference on
Communications (ICC), June 2006, Istanbul, Turkey.
[5] Kabiri, P., and Ali A. Ghorbani. Research on Intrusion
Detection and Response: a Survey. International Journal
of Network Security, The Intelligent & Adaptive Systems
Group (IAS), Vol. 1, No. 2, 4 July 2005, pp. 84-102.
[6] Diaz-Gomez, P. A., and D. F. Hougen. Improved
Offline Intrusion Detection using a genetic algorithm.
Proceedings of the Seventh International Conference on
Enterprise Information Systems, 2005, Miami, USA.
[7] Gong, R. H., M. Zulkernine, and P. Abolmaesumi. A
Software Implementation of a Genetic Algorithm Based
Approach to Network Intrusion Detection. Proceedings of
Sixth IEEE ACIS International Conference on Software
Engineering, Artificial Intelligence, Networking, and
Parallel/Distributed Computing (SNPD), May 2005,
Maryland, USA.
[8] Stein, G., B. Chen, A. S. Wu, and Kien A. Hua.
Decision tree classifier for network intrusion detection
with GA-based feature selection. In the Proceedings of
the 43rd ACM Southeast Conference, March 18-20, 2005,
Kennesaw, GA.
[9] Folino, G., C. Pizzuti, and G. Spezzano. GP Ensemble
for Distributed Intrusion Detection Systems. International
Conference on Advances in Pattern Recognition,
ICAPR05, August 22-25, 2005, Bath, UK.
[10] De Boer, P., and Martin Pels. Host-Based Intrusion
Detection Systems. Technical Report: 1.10, Faculty of
Science, Informatics Institute, University of Amsterdam,
2005.
[11] Yao, J. T., S. L. Zhao, and L. V. Saxton. A study on
fuzzy intrusion detection. Proceedings of SPIE Vol.
5812, Data Mining, Intrusion Detection, Information
Assurance, And Data Networks Security, 28 March - 1
April 2005, Orlando, Florida, USA.
[12] Bradford, P. G., and N. Hu. A Layered Approach to
Insider Threat Detection and Proactive Forensics. 21st
Annual Computer Security Applications Conference,
Applied Computer Security Associates
(ACSA), December 5-9, 2005, Tucson, Arizona.
[13] Brugger, S. T. Data Mining Methods for Network
Intrusion Detection. Terry Brugger's Homepage. 9 June
2004. University of California, Davis. 6 Oct. 2006.
[14] Li, W. Using Genetic Algorithm for Network
Intrusion Detection. Proceedings of the United States
Department of Energy Cyber Security Group 2004
Training Conference, May 24-27, 2004, Kansas City,
Kansas, USA.
[15] Marczyk, A. "Genetic Algorithms and Evolutionary
Computation." The Talk.Origins Archive. 23 Apr. 2004.
7 Oct. 2006.
[16] Song, D. A Linear Genetic Programming Approach
to Intrusion Detection. Master Degree for Computer
Sciences, Genetic and Evolutionary Computation -
GECCO 2003.
[17] Smith, L. S. An Introduction to Neural Networks.
Professor Leslie S. Smith, Centre for Cognitive and
Computational Neuroscience. 2 Apr. 2003. 7 Oct. 2006.
[18] Coull, S., Joel Branch, Boleslaw Szymanski, and
Eric Breimer. Intrusion Detection: a Bioinformatics
Approach. Proceedings of the 19th Annual Computer
Security Applications Conference, Dec. 2003, Las Vegas,
Nevada.
[19] Gorodetsky, V., I. Kotenko, and O. Karsaev. Multi-agent
Technologies for Computer Network Security:
Attack Simulation, Intrusion Detection and Intrusion
Detection Learning. International Journal of Computer
Systems Science and Engineering, Vol. 18, No. 4, July
2003, pp. 191-200.
[20] Gomez, J., and D. Dasgupta. Evolving Fuzzy Classifiers for
Intrusion Detection. Proceedings of the 2002 IEEE Workshop
on Information Assurance, United States Military Academy,
June 2001, West Point, NY.