Feature Set Reduction for Automatic Network
Intrusion Detection with Machine Learning
Algorithms
R.C. Staudemeyer
Max-Born-Institute for Nonlinear Optics and Short Pulse Spectroscopy
Max-Born-Str. 2 A, 12489 Berlin, GERMANY
Tel: +49-30-6392-1542, Fax: +49-30-6392-1549
e-mail: rcs@mbi-berlin.de
Prof. C.W. Omlin
Middle East Technical University Northern Cyprus, Campus Kalkanli
Guzelyurt, KKTC, Mersin 10, TURKEY
email: omlin@metu.edu.tr
Abstract—Selecting a minimum set of core features for automatic network intrusion detection with a variety of machine learning algorithms is a challenging problem. In this paper we propose a minimum feature set which can easily be extracted from network traffic. We compare decision tree, neural network, naive Bayes and Bayesian network classifiers on the KDDCup99 datasets. We show that, with feature selection and preprocessing, comparable classification performance is achievable with a significant reduction in training time.
Index Terms—network intrusion detection, feature selection,
feature reduction, machine learning, decision trees, neural net-
works, naive bayes, bayesian networks
I. INTRODUCTION
Huge amounts of data are transmitted over computer networks nowadays; any attempt to perform real-time traffic analysis on continuous streams of data necessitates a careful selection of the information to be extracted. Traffic analysis for the purpose of network intrusion detection is no exception. Communication sessions between hosts can be characterized by so-called connection records. Every connection record contains a number of features uniquely identifying the connection. Some features, such as the duration of the connection, the bytes transferred in each direction or the TCP/UDP ports used for communication, can be extracted easily. Other, more complex features which include information from the application layer and require packet inspection are more difficult to obtain.
The diversity and sophistication of threats to modern IT systems has been increasing; these threats call for novel security mechanisms capable of coping with sophisticated attacks. Merely patching vulnerabilities in a system is no longer a viable solution. Intrusion detection systems aiming at identifying various kinds of malicious activities are thus becoming vital in order to safeguard networks against attacks.
Classical signature-based approaches do not provide suit-
able solutions for the detection of novel attacks. Machine
learning methods offer alternative approaches which are able
to learn from monitored network data how to differentiate
between normal and anomalous traffic. In particular, they hold
the promise of providing a solution that can detect possible
attacks in real-time so that countermeasures may be taken in
a timely manner.
The purpose of this work was to investigate the application
of standard machine learning methods to network intrusion
detection; in particular, we were interested in the effects of
a reduced feature set on the network intrusion detection per-
formance. We applied decision trees, naive Bayesian learning,
Bayesian networks and the neural network backpropagation al-
gorithm to the publicly available DARPA/KDDCup99 dataset.
It consists of connection records with 41 features whose relevance for intrusion detection is not clear. We report ex-
periments with different subsets of these features; in particular,
we present a subset of 11 features whose performance with
the standard machine learning algorithms is comparable to the
performance with the full feature set. The majority of selected
features are basic features which are easily extracted from a
network stream.
II. RELATED WORK
An excellent introduction into various areas of Machine
Learning is provided by Mitchell (1997) [1]. Basic concepts of
Machine Learning and their application to the field of network
intrusion detection are summarized in Maloof (2006) [2] and
Vemuri (2006) [3].
Machine Learning techniques have been applied to network
intrusion detection for some time. Usually, the aim is the
automatic generation of rules in order to classify network con-
nections. Sinclair et al. (1999) [4] proposed the use of genetic
algorithms and decision trees for the automatic generation of
such rules. Peddabachigari et al. (2005) [5] investigated and
evaluated decision trees and support vector machines. The
results showed that decision trees perform slightly better if
the dataset is small.
Kruegel et al. (2003) [6] proposed an event classification
scheme that is based on Bayesian networks. The scheme
significantly reduces the number of false alarms in comparison
to threshold-based systems.
Debar et al. (1992) [7] and Cannady (1998) [8] suggested
the use of neural networks as components of intrusion de-
tection systems. Mukkamala et al. (2003) [9] studied the application of artificial neural networks and support vector machines in network intrusion detection; their results showed that an ensemble of artificial neural networks and support vector machines delivers superior detection performance compared to single neural networks. Zhang et al. (2001) [10] applied and compared perceptron, backpropagation, perceptron-backpropagation-hybrid, fuzzy ARTMAP, and radial basis function neural networks for statistical anomaly detection on four different scenario data sets. Their results showed that backpropagation and perceptron-backpropagation-hybrid nets outperformed the other methods. Bivens et al. (2002) [11]
further illustrated that neural networks can efficiently be used
in network intrusion detection. The authors used classifying self-organizing maps for data clustering and multi-layer perceptron neural networks for classification. They trained their
system to detect denial of service attacks, distributed denial
of service attacks, and portscans. Shah et al. (2004) [12]
used artificial neural networks and fuzzy inference systems
to design an intrusion detection system. Their hybrid system
combining fuzzy logic with neural networks outperformed neural networks alone.
The required infrastructure to capture, prepare and analyze
large quantities of network data is defined by Brodie et
al. (2005) [13]. Sabhnani et al. (2003) [14] evaluated the
performance of a comprehensive set of pattern recognition
and machine learning algorithms on a selection of attacks in
the KDDCup99 intrusion detection dataset. Sung et al. (2003)
extracted a reduced dataset with comparable performance from
the 1998 DARPA/KDDCup99 datasets by deleting one feature
at a time. They applied neural networks and support vector
machines.
In [15] Kayacik et al. (2005) investigated the relevance of
all features provided in the KDDCup99 intrusion detection
dataset to substantiate the performance of machine learning
based detectors trained on KDDCup99 training data.
Chebrolu et al. [16] identified important input features to
build computationally efficient and effective intrusion detec-
tion systems. They investigated the performance of Bayesian
networks and classification and regression trees and suggested
a hybrid model. They concluded that the reduction to relevant
dataset features can improve performance of machine learning
algorithms. Chen et al. (2005) [17] presented a flexible neural
tree model for intrusion detection systems with a focus on
improving the intrusion detection performance by reducing
the input features. Lee et al. (2006) [18] presented a novel
feature selection method based on genetic optimization. The
performance of the proposed approach was contrasted against
the performance of the naive Bayesian classifier. The pro-
posed approach was especially effective in detecting unknown
attacks.
III. MACHINE LEARNING BACKGROUND
A. Decision Trees
Decision Tree learning is one of the most common machine
learning methods. Learned functions are usually represented in
the form of a tree-like structure representing a set of decisions,
which can be translated into if-then rules. Depending on
the algorithm used, the representation may be binary or
multibranched.
Nodes in a decision tree represent some attribute of an instance, and branches descending from a node correspond to possible attribute values. Leaves represent possible values of the target variable given the path starting from the root node and ending at the observed leaf. To classify an item, the tree is traversed from the root to a leaf. At every node, an attribute is tested, and based on the outcome the corresponding branch is followed. This procedure continues until a leaf is reached [1].
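As an illustration of this traversal, the following sketch uses scikit-learn's DecisionTreeClassifier as a stand-in for the C4.5/J48 implementation employed later (a different tree induction algorithm) and trains it on a handful of made-up connection records; the learned tree is printed as nested if-then rules.

```python
# Illustrative sketch only: scikit-learn's CART-style DecisionTreeClassifier is
# used as a stand-in for Weka's J48 (C4.5); records and labels are made up.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy connection records: [duration, src_bytes, dst_bytes]
X = [[0, 105, 146], [0, 239, 486], [2, 0, 0], [1, 0, 0], [0, 235, 1337]]
y = ["normal", "normal", "probe", "probe", "normal"]

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# The learned tree reads as nested if-then rules ...
print(export_text(tree, feature_names=["duration", "src_bytes", "dst_bytes"]))
# ... and classification follows one path from the root to a leaf.
print(tree.predict([[0, 200, 300]]))
```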
B. Neural Networks
Artificial Neural Networks are inspired by biological learn-
ing systems and loosely model their basic functions. They
consist of a densely interconnected group of simple neuron-
like threshold switching units. Each unit takes a number of
real-valued inputs and produces a single real-valued output.
Based on the connectivity between the threshold units and
element parameters these networks can model a complex
global behavior.
In feed-forward neural networks, sets of neurons are orga-
nized in layers where each neuron computes a weighted sum
of its inputs. Input neurons take signals from the environment
and output neurons present signals to the environment. Neu-
rons which are not directly connected to the environment but
are connected to other neurons are called hidden neurons.
The most common neural network learning technique is
the error backpropagation algorithm. It uses gradient descent
to learn the weights in multilayer networks. It works in
small iterative steps starting backwards from the output layer
towards the input layer. A requirement is that the activation function of the neuron be differentiable [1].
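The following minimal sketch illustrates the gradient descent weight updates of backpropagation for a tiny 2-2-1 feed-forward network with sigmoid units; all values are arbitrary and a squared-error loss is assumed, so it should be read as a conceptual illustration rather than the network configuration used in the experiments.

```python
# Minimal backpropagation sketch for a 2-2-1 feed-forward network with
# sigmoid units; purely illustrative, all values are arbitrary.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 2)), rng.normal(size=(1, 2))   # weight matrices
x, t = np.array([0.3, -0.7]), np.array([1.0])               # input and target
lr = 0.5                                                     # learning rate

for _ in range(100):
    h = sigmoid(W1 @ x)                           # forward pass: hidden layer
    y = sigmoid(W2 @ h)                           # forward pass: output layer
    delta_out = (y - t) * y * (1 - y)             # error term at the output
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # error propagated backwards
    W2 -= lr * np.outer(delta_out, h)             # gradient descent updates
    W1 -= lr * np.outer(delta_hid, x)
```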
C. Bayesian Learning
The naive Bayes is a simple probabilistic classifier. It
assumes that the effect of a variable value on a given class is
independent of the values of other variables. This assumption
is called class conditional independence.
The naive Bayesian classifier is based on Bayes’ theorem
which provides a way to calculate the posterior probability
from the prior probability.
The algorithm stores the prior probability of each class and the conditional probability of each attribute value given that class. During the learning phase, it estimates these probabilities from examples by simply counting frequencies of occurrence. The prior probability is the proportion of examples belonging to each class. The conditional probability is the frequency with which attribute values occur in the given class.
During an observation, the algorithm operates under the
assumption that attributes are conditionally independent. The
algorithm uses Bayes’ theorem to calculate the posterior
probability of each class. It returns the class label with the
highest probability as the decision.
Despite its simplicity and the assumptions made, this algorithm can often outperform more sophisticated classification methods. Its performance and applicability are comparable to those of decision trees and neural networks [1].
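The frequency-counting view of naive Bayes described above can be sketched in a few lines; the toy records and attribute names below are invented purely for illustration.

```python
# Naive Bayes sketch for nominal attributes: probabilities are estimated by
# counting frequencies; the class with the highest posterior wins.
from collections import Counter, defaultdict

records = [({"protocol": "tcp", "flag": "SF"}, "normal"),
           ({"protocol": "tcp", "flag": "S0"}, "dos"),
           ({"protocol": "icmp", "flag": "SF"}, "dos"),
           ({"protocol": "tcp", "flag": "SF"}, "normal")]

prior = Counter(label for _, label in records)        # class counts
cond = defaultdict(Counter)                           # (class, attr) -> value counts
for features, label in records:
    for attr, value in features.items():
        cond[(label, attr)][value] += 1

def classify(features):
    scores = {}
    for label, n in prior.items():
        p = n / len(records)                          # prior P(class)
        for attr, value in features.items():
            p *= cond[(label, attr)][value] / n       # P(value | class)
        scores[label] = p
    return max(scores, key=scores.get)

print(classify({"protocol": "tcp", "flag": "S0"}))    # -> 'dos'
```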
D. Bayesian Networks
Bayesian networks are another statistical classifier. They are
drawn as a directed acyclic graph where every node represents
an attribute and the edges describe the relations between them.
Every node contains a conditional probability table which
defines the probability distribution. It is used to predict the
class probabilities for every given instance. The probability of
each feature value depends on the values of the attributes of
the parent nodes. Nodes without parents have an unconditional
probability distribution.
Learning of Bayesian networks is basically a search through
the space of all possible networks.
The main advantage of Bayesian networks over naive Bayes is that they are less constraining. They are easy for humans to interpret. The provided estimates can be ranked, which allows costs to be minimized [1], [19].
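As a sketch of how conditional probability tables are used for prediction, the tiny hand-specified network below lets the 'flag' attribute depend on both the class and the 'protocol' attribute, a dependency naive Bayes cannot express; all probabilities are made up for illustration.

```python
# Sketch of prediction with a tiny hand-specified Bayesian network:
# class -> protocol, and flag depends on both class and protocol.
p_class = {"normal": 0.8, "dos": 0.2}                     # node without parents
p_protocol = {("normal", "tcp"): 0.7, ("normal", "icmp"): 0.3,
              ("dos", "tcp"): 0.4, ("dos", "icmp"): 0.6}  # CPT of 'protocol'
p_flag = {("normal", "tcp", "SF"): 0.95, ("normal", "tcp", "S0"): 0.05,
          ("normal", "icmp", "SF"): 0.80, ("normal", "icmp", "S0"): 0.20,
          ("dos", "tcp", "SF"): 0.30, ("dos", "tcp", "S0"): 0.70,
          ("dos", "icmp", "SF"): 0.10, ("dos", "icmp", "S0"): 0.90}  # CPT of 'flag'

def class_posterior(protocol, flag):
    joint = {c: p_class[c] * p_protocol[(c, protocol)] * p_flag[(c, protocol, flag)]
             for c in p_class}                            # P(class, evidence)
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}       # normalise over classes

print(class_posterior("icmp", "S0"))      # 'dos' becomes the more probable class
```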
IV. THE DATA
The choice of training data available for machine learning in
the field of network intrusion detection systems is very limited.
Among the few available datasets, the most comprehensive and widely used are the DARPA datasets. They are freely available from the Information Systems Technology Group (IST) of the MIT Lincoln Laboratory¹.
The tcpdump data provided by the 1998 DARPA Intrusion Detection Evaluation was further processed and used for the 1999 KDDCup contest at the Fifth International Conference on Knowledge Discovery and Data Mining. The learning task of this competition was to classify the preprocessed connection records as either normal traffic or one of the four given attack categories ('dos', 'probe', 'r2l', 'u2r').
The seven weeks of network traffic collected in four gigabytes of compressed raw tcpdump files from the DARPA training data were preprocessed into five million labeled and categorized connection records of approximately 100 bytes each; the two weeks of test data were processed into two million unlabeled connection records. Preprocessing of
the DARPA data for the 1999 KDDCup contest was done with
the MADAMID framework and is described in Lee (1999)
[20], Lee (2000) [21]. The KDDCup99 datasets are available
from the UCI KDD Archive as the 1999 KDDCup Dataset
[22].
A connection record summarizes the packets of a communi-
cation session between a connection initiator with a specified
source IP address and a destination IP address over a pair
of TCP/UDP ports. The labeled connection records in the training set are categorized as normal or as one of 22 types of attacks. As far as we know, the KDDCup99 dataset is the
only publicly available dataset with fully labeled connection
records. Training and test sets have different probability dis-
tributions.
Each connection record contains 41 input features grouped
into basic features and higher-level features. The basic features
are directly extracted or derived from the header information
of IP packets and TCP/UDP segments in the tcpdump files of
each session (basic features 1-9 in table I). This was done by using a modified version of the freely available 'Bro Intrusion Detection System'². Each connection record was produced when either the connection was terminated or Bro was closed. The 'listfiles' for tcpdump from the DARPA training data were used to label the connection records.
¹ DARPA Intrusion Detection Evaluation. Information Systems Technology Group (IST), MIT Lincoln Laboratory. Web site: http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/index.html
² http://bro-ids.org/
The so-called 'content-based' higher-level features use domain knowledge to look specifically for attacks in the actual data of the segments recorded in the tcpdump files. These address 'r2l' and 'u2r' attacks which sometimes require only a single connection or which are without any prominent sequential patterns. Typical features include the number of failed login attempts or whether root access was obtained during the session (features 10-22 in table I).
Furthermore, there are 'time-based' and 'connection-based' derived features to address 'dos' and 'probe' attacks. 'Time-based' features examine connections within a time window of two seconds and provide statistics about these. To provide statistical information about attacks extending beyond a two-second time window, such as slow probing attacks, 'connection-based' features use a connection window of 100 connections. Both are further split into 'same host' features, which provide statistics about connections with the same destination host, and 'same service' features, which examine only connections with the same service (features 23-41 in table I).
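As a rough sketch of how such derived features can be computed, the snippet below derives 'count', 'srv_count' and 'serror_rate' for a single connection from the connections observed in the preceding two seconds; the record fields are hypothetical simplifications of the actual KDD preprocessing pipeline.

```python
# Sketch of deriving the time-based features 'count', 'srv_count' and
# 'serror_rate' from the connections seen in the preceding two seconds.
def time_based_features(current, history, window=2.0):
    recent = [c for c in history if current["time"] - c["time"] <= window]
    same_host = [c for c in recent if c["dst_host"] == current["dst_host"]]
    same_srv = [c for c in recent if c["service"] == current["service"]]
    count, srv_count = len(same_host), len(same_srv)
    serror_rate = (sum(c["flag"] == "S0" for c in same_host) / count
                   if count else 0.0)                 # fraction of SYN errors
    return {"count": count, "srv_count": srv_count, "serror_rate": serror_rate}

history = [{"time": 0.1, "dst_host": "10.0.0.5", "service": "http", "flag": "S0"},
           {"time": 0.9, "dst_host": "10.0.0.5", "service": "http", "flag": "SF"}]
current = {"time": 1.5, "dst_host": "10.0.0.5", "service": "http", "flag": "S0"}
print(time_based_features(current, history))
```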
The KDDCup99 competition provides the training and testing datasets in a full and a so-called '10%' subset version. The '10%' subset was created due to the huge number of connection records present in the full set; some 'dos' attacks have millions of records. For this reason, not all of these connection records were selected. Furthermore, only connections within a time window of five minutes before and after the entire duration of an attack were added to the '10%' datasets. To achieve approximately the same distribution of intrusions and normal traffic as in the original DARPA dataset, a selected set of sequences with 'normal' connections was also left in the '10%' dataset.
The full training dataset contains 4,898,431 records and the '10%' subset contains 494,021 records. Both contain 22 different attack types, listed in the order in which they were used during the 1998 DARPA experiments.
The full test set with 2,984,154 records is only available unlabeled; but a 311,029-record subset is provided both as unlabeled and labeled test data. It is specified as the '10% corrected' subset with a different distribution and additional attacks not part of the training set.
For the KDDCup99 competition, the '10%' subset was intended for training. The '10% corrected' subset containing 37 different attacks can be used for performance testing. Out of 24 submitted entries, the first three places of the original KDDCup99 challenge used variants of decision trees. Elkan (2000) [23] summarizes the results of the winning entries of the KDDCup99 challenge.
V. FEATURE SET REDUCTION
From the perspective of data mining, feature set reduction
aims to find the set of core features which best classifies
the presented data. Some features may contain redundant
information, while others may contain information suggest-
ing false correlations; both can hinder correct classification.
Additionally, unnecessary features add to computation time.
From the perspective of network intrusion detection systems, there are strong reasons to reduce the number of collected features and to choose features which can easily be extracted out of a high-speed data stream. This is especially true since connections in today's local area networks forward packets at rates of tens of gigabits per second and millions of frames per second.
TABLE I
KDDCUP99 FEATURES

Nr | name | category | attr. type | description | feature sets: 12 / 17 (Che04) / 11 (NEW)
1 | duration | basic | numeric | duration of the connection in seconds | X X
2 | protocol_type | basic | nominal | connection protocol (tcp, udp, icmp) | X X
3 | service | basic | nominal | destination port mapped to service | X X X
4 | flag | basic | nominal | normal or error status flag of the connection |
5 | src_bytes | basic | numeric | number of data bytes from source to destination | X X X
6 | dst_bytes | basic | numeric | bytes from destination to source | X X
7 | land | basic* | nominal | 1 if connection is from/to the same host/port; 0 otherwise | X
8 | wrong_fragment | basic* | numeric | number of wrong fragments (values 0, 1, 3) | X X
9 | urgent | basic* | numeric | number of urgent packets |
10 | hot | content* | numeric | number of hot indicators |
11 | num_failed_logins | content | numeric | number of failed login attempts | X
12 | logged_in | content | nominal | 1 if successfully logged in; 0 otherwise | X X
13 | num_compromised | content | numeric | number of compromised conditions |
14 | root_shell | content | nominal | 1 if root shell is obtained; 0 otherwise | X
15 | su_attempted | content | numeric | 1 if su root command attempted; 0 otherwise |
16 | num_root | content | numeric | number of root accesses |
17 | num_file_creations | content | numeric | number of file creation operations | X
18 | num_shells | content | numeric | number of shell prompts |
19 | num_access_files | content | numeric | number of operations on access control files |
20 | num_outbound_cmds | content | numeric | number of outbound commands in an ftp session |
21 | is_hot_login | content | nominal | 1 if the login belongs to the hot list |
22 | is_guest_login | content | nominal | 1 if the login is a guest login | X
23 | count | time-based | numeric | number of connections to the same host as the current connection in the past two seconds | X X
24 | srv_count | time-based | numeric | number of connections to the same service as the current connection in the past two seconds | X X
25 | serror_rate | time-based | numeric | % of connections that have SYN errors | X X X
26 | srv_serror_rate | time-based | numeric | % of connections that have SYN errors | X
27 | rerror_rate | time-based | numeric | % of connections that have REJ errors |
28 | srv_rerror_rate | time-based | numeric | % of connections that have REJ errors | X
29 | same_srv_rate | time-based | numeric | % of connections to the same service |
30 | diff_srv_rate | time-based | numeric | % of connections to different services | X
31 | srv_diff_host_rate | time-based | numeric | % of connections to different hosts | X
32 | dst_host_count | host-based | numeric | count of connections having the same destination host | X X
33 | dst_host_srv_count | host-based | numeric | count of connections having the same destination host and using the same service | X X
34 | dst_host_same_srv_rate | host-based | numeric | % of connections having the same destination port and using the same service |
35 | dst_host_diff_srv_rate | host-based | numeric | % of different services on the current host | X X
36 | dst_host_same_src_port_rate | host-based | numeric | % of connections to the current host having the same source port | X
37 | dst_host_srv_diff_host_rate | host-based | numeric | % of connections to the same service coming from different hosts |
38 | dst_host_serror_rate | host-based | numeric | % of connections to the current host that have an S0 error |
39 | dst_host_srv_serror_rate | host-based | numeric | % of connections to the current host and specified service that have an S0 error |
40 | dst_host_rerror_rate | host-based | numeric | % of connections to the current host that have an RST error | X
41 | dst_host_srv_rerror_rate | host-based | numeric | % of connections to the current host and specified service that have an RST error |
42 | connection_type | | nominal | class label | X X X
* = feature provided by the 'Bro Intrusion Detection System'
For feature reduction we used a custom training set with 10,422 instances. This new dataset was sampled from up to the first 1,000 examples of each of the 23 traffic types contained in the full dataset and then randomised. We then classified the traffic into one of five types ('normal', 'dos', 'probe', 'r2l', 'u2r'). Feature selection was done by examining the J48 decision tree after every run, removing and/or adding individual features and groups of features. Features close to the root of the tree were considered more important than features close to the leaves. Features extracted easily from network data were preferred to features requiring domain knowledge or detailed traffic data analysis. The classification and runtime performance of naive Bayes, Bayesian networks and backpropagation neural networks was also observed in every run. The resulting set of 11 selected features consists of six basic features and five higher-level features. They are described in table I.
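The selection itself was done by hand from the Weka J48 output; as a rough approximation of the ranking step, one could order features by how strongly a learned tree relies on them, for example with impurity-based importances in scikit-learn (an assumption for illustration, not the exact procedure used).

```python
# Rough approximation of the ranking step: order features by how strongly a
# learned decision tree relies on them. scikit-learn's impurity-based
# importances stand in for the manual inspection of the Weka J48 tree; the
# toy records below are invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rank_features(X, y, feature_names):
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    order = np.argsort(tree.feature_importances_)[::-1]
    return [(feature_names[i], float(tree.feature_importances_[i])) for i in order]

X = [[0, 181, 5450, 9], [0, 239, 486, 19], [2, 0, 0, 255], [1, 0, 0, 255]]
y = ["normal", "normal", "probe", "probe"]
names = ["duration", "src_bytes", "dst_bytes", "dst_host_count"]

# Highly ranked features are candidates to keep; the selection is then refined
# by hand, re-running the classifiers after every change as described above.
print(rank_features(X, y, names))
```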
VI. EXPERIMENTS
We used the Weka data mining suite, which provides a large number of different machine learning algorithms³. For feature reduction we applied the C4.5 decision tree algorithm (in Weka specified as J48), standard backpropagation with a multilayer feed-forward network (in Weka, MLP), naive Bayes and Bayesian networks to the DARPA/KDDCup99 training data using 10-fold cross-validation.
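A minimal sketch of this evaluation setup with roughly corresponding scikit-learn classifiers is given below; DecisionTreeClassifier, GaussianNB and MLPClassifier stand in for Weka's J48, naive Bayes and MLP (Weka's BayesNet has no direct counterpart here), and X and y are assumed to hold the preprocessed connection records and their labels.

```python
# Sketch of the evaluation setup with roughly corresponding scikit-learn
# classifiers; X and y are assumed to hold the preprocessed records and labels.
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def evaluate(X, y):
    classifiers = {
        "J48-like decision tree": DecisionTreeClassifier(),
        "naive Bayes": GaussianNB(),
        "MLP": MLPClassifier(max_iter=500),
    }
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=10)   # 10-fold cross-validation
        print(f"{name}: accuracy {scores.mean():.4f} +/- {scores.std():.4f}")
```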
We did an extended series of experiments with the aim of extracting a reduced feature set with only few, if any, content features. This resulted in our 11-feature set described in section V.
In a next step, we investigated optimizing the preprocessing of the selected features to further increase performance.
³ http://www.cs.waikato.ac.nz/ml/weka/ (09/05/08)
Fig. 1. Comparison of the total number of incorrectly classified instances (false positives + false negatives) using the four dataset variants: 41 features, 41 features preprocessed, 11 features and 11 features preprocessed.
Like most neural network implementations, Weka's MLP implementation of the backpropagation algorithm requires floating point numbers as input, preferably in the range [-1, 1]. The KDDCup99 dataset contains continuous/numeric and nominal/discrete features preprocessed in very different ways.
We wrote our own custom preprocessing scripts. First, we reset all outlier values to estimated maximum threshold values specified individually for every processed feature. Maximum threshold values were set based on expert knowledge. Next, we applied the natural logarithm to selected continuous features with strongly biased distributions. Then we scaled the values of continuous features to the range [-1, 1] with a precision of 10^-6.
We encoded binary values as {-1, 1}. For discrete features with three or fewer distinct values, we used effects coding. For features with a greater number of distinct values, we sorted the values using least-first ranking and scaled the score values to the range [-1, 1]. We removed features with non-changing values in the test set (e.g. num_outbound_cmds, is_host_login). Finally, we mapped all attacks to one of the five attack types (normal, dos, probe, r2l, u2r).
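A condensed sketch of these preprocessing steps for the continuous features and of the class mapping is given below; the clipping thresholds and the (truncated) attack-to-class mapping are placeholders, not the values actually used in our experiments.

```python
# Condensed sketch of the preprocessing of continuous features and the class
# mapping; thresholds and the attack-to-class excerpt are placeholders only.
import math

MAX_THRESHOLD = {"duration": 60_000, "src_bytes": 1_000_000}   # placeholder caps
LOG_FEATURES = {"duration", "src_bytes", "dst_bytes"}           # skewed features
ATTACK_CLASS = {"smurf": "dos", "neptune": "dos", "ipsweep": "probe",
                "guess_passwd": "r2l", "rootkit": "u2r"}        # excerpt only

def preprocess(record, minimum, maximum):
    """Clip outliers, log-transform skewed features and scale to [-1, 1].

    'minimum' and 'maximum' hold per-feature bounds of the already clipped and
    log-transformed training values."""
    out = {}
    for name, value in record.items():
        value = min(value, MAX_THRESHOLD.get(name, value))      # clip outliers
        if name in LOG_FEATURES:
            value = math.log(1 + value)                          # compress skew
        lo, hi = minimum[name], maximum[name]
        scaled = 2 * (value - lo) / (hi - lo) - 1 if hi > lo else 0.0
        out[name] = round(scaled, 6)                             # precision 10^-6
    return out

def attack_to_class(label):
    # Map each of the 22 attack names to one of the five traffic classes.
    return "normal" if label == "normal" else ATTACK_CLASS[label]
```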
For a performance comparison with the results of the KDDCup99 competition, we ran experiments using the original '10% training set' and the '10% test set'. The results are shown in table II. Figure 1 additionally shows the different numbers of misclassifications by the machine learning algorithms applied to all four variants of the '10% training set' (41/11 features, original/preprocessed).
In terms of total accuracy, we find that decision trees, Bayesian networks and neural networks are able to maintain their performance after preprocessing and feature reduction. The correct categorization of normal traffic remains stable for them as well. Naive Bayes is the only classifier which loses performance noticeably.
An investigation into the true positive rate and the false positive rate per attack traffic class reveals more interesting details.
For the J48 decision tree, the detection of network probes decreases but the false alarm rate remains stable. The detection of u2r attacks increases.
For Bayesian networks, the detection of probe attacks decreases slightly, but this comes with an improved false alarm rate. Detection of r2l and u2r attacks also decreases; the false alarm rate for u2r attacks is reduced.
The MLP neural network improves the false alarm rate on the detection of dos attacks. The detection of probe attacks improves with a decrease of the false alarm rate. Detection of r2l attacks decreases and detection of u2r attacks increases.
We note that, due to the few examples of u2r attacks, the very low false alarm rates are not very meaningful and are difficult to compare.
Processing time improved for all four classifiers. Most notable is the reduction in processing time for the neural network, which dropped by 97.5% in total.
Table III shows the confusion matrix of the result with the highest accuracy of a classifier trained with the feature-reduced and preprocessed '10%' training set. The result is very close to the winning entry of the KDDCup99 competition.
TABLE III
CONFUSION MATRIX OF THE TRAINED NEURAL NETWORK (rows: actual class, columns: predicted class)

actual \ predicted | 0 normal | 1 probe | 2 dos | 3 u2r | 4 r2l | %correct
0 normal | 60250 | 236 | 100 | 3 | 4 | 99.43%
1 probe | 794 | 3136 | 235 | 1 | 0 | 75.28%
2 dos | 6228 | 527 | 223098 | 0 | 0 | 97.06%
3 u2r | 61 | 0 | 1 | 8 | 0 | 11.43%
4 r2l | 15610 | 82 | 4 | 92 | 559 | 3.42%
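The '%correct' column follows directly from the row counts of table III; the short snippet below recomputes it.

```python
# Recomputing the '%correct' column of table III from the row counts.
confusion = {                      # rows: actual class, columns: predicted class
    "normal": [60250, 236, 100, 3, 4],
    "probe":  [794, 3136, 235, 1, 0],
    "dos":    [6228, 527, 223098, 0, 0],
    "u2r":    [61, 0, 1, 8, 0],
    "r2l":    [15610, 82, 4, 92, 559],
}
order = ["normal", "probe", "dos", "u2r", "r2l"]
for i, cls in enumerate(order):
    row = confusion[cls]
    print(f"{cls}: {100 * row[i] / sum(row):.2f}% correct")   # e.g. r2l -> 3.42%
```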
Experiments were performed on a Dual-Core AMD Opteron@2.22GHz with 32GB memory and a 64-bit Debian GNU/Linux operating system.
VII. CONCLUSIONS
We have applied machine learning algorithms including decision trees, naive Bayes classifiers, Bayesian networks and neural networks to the KDDCup99 dataset for network intrusion detection. Our results show that a large number of features are in fact redundant or at least unimportant for the majority of attacks. We were able to drastically reduce the number of features from initially 41 down to 11 core features. Furthermore, we could significantly decrease the classification time.
The naive Bayes classifier is not well suited for this learning task. It shows poor performance for all traffic types. Bayesian networks show strengths in the classification of network probes but suffer from high false alarm rates in general. J48 decision trees and MLP neural networks show good performance for this type of dataset. Decision trees show strengths in the detection of rare r2l and u2r attacks.
The slight decrease in the detection of dos attacks and network probes does little harm. Due to the large number of connections initiated in series by these attacks, a detection rate of 80% is still acceptable.
The first six of the selected core features (1, 2, 3, 5, 6, 8) are basic features which can be extracted from network traffic with very little overhead. The remaining features (25, 33, 35, 36, 40) are time-based and host-based traffic features. We were able to dismiss all 'content-based' features, which are much more complex to extract.
Further research might reveal that some of the remaining traffic features can also be dismissed by using machine learning algorithms which are able to extract time series information.
TABLE II
PERFORMANCE ON THE ORIGINAL '10%' TRAINING SET WITH THE CORRECTED TEST SET USING TRAFFIC TYPE CLASSIFICATION

feat. set | classifier | time* | accuracy | normal TPR/FPR | dos TPR/FPR | probe TPR/FPR | r2l TPR/FPR | u2r TPR/FPR
41  | J48        | 6m     | 92.5759% | 0.995/0.089 | 0.973/0.003 | 0.747/0.002 | 0.058/0     | 0.086/0
11p | J48        | 1m     | 92.2618% | 0.995/0.092 | 0.97/0.003  | 0.665/0.002 | 0.057/0     | 0.143/0
41  | naiveBayes | 1m     | 78.1795% | 0.944/0.085 | 0.792/0.018 | 0.895/0.136 | 0.006/0.001 | 0.7/0.011
11p | naiveBayes | 7s     | 77.3902% | 0.895/0.076 | 0.792/0.116 | 0.72/0.133  | 0.085/0.003 | 0.1/0.001
41  | BayesNet   | 6m     | 91.1892% | 0.99/0.084  | 0.95/0.002  | 0.836/0.014 | 0.101/0.001 | 0.629/0.005
11p | BayesNet   | 24s    | 91.3285% | 0.988/0.089 | 0.957/0.003 | 0.804/0.01  | 0.053/0.002 | 0.471/0.002
41  | MLP        | 28h53m | 92.3657% | 0.984/0.09  | 0.973/0.011 | 0.725/0.001 | 0.056/0     | 0.086/0
11p | MLP        | 43m    | 92.2908% | 0.994/0.091 | 0.971/0.004 | 0.753/0.003 | 0.034/0     | 0.114/0

* = performed on a Dual-Core AMD Opteron@2.22GHz with 32GB memory
REFERENCES
[1] T. Mitchell, Machine Learning. McGraw Hill, 1997.
[2] M. A. Maloof, "Some basic concepts of machine learning and data mining," in M. A. Maloof (Ed.), Machine Learning and Data Mining for Computer Security. Springer, 2006.
[3] V. R. Vemuri, Relevance of Machine Learning. Auerbach Publications, 2006.
[4] C. Sinclair, L. Pierce, and S. Matzner, "An application of machine learning to network intrusion detection," in Proceedings of the 15th Annual Computer Security Applications Conference, 1999.
[5] S. Peddabachigari, A. Abraham, and J. Thomas, "Intrusion detection systems using decision trees and support vector machines," International Journal of Applied Science and Computations, USA, vol. 11, no. 3, pp. 118–134, 2004.
[6] C. Kruegel, D. Mutz, W. Robertson, and F. Valeur, "Bayesian event classification for intrusion detection," in Proceedings of the 19th Annual Computer Security Applications Conference (ACSAC 2003), 2003.
[7] M. Debar, D. Becke, and A. Siboni, "A neural network component for an intrusion detection system," in Proceedings of the IEEE Computer Society Symposium on Research in Security and Privacy, 1992.
[8] J. Cannady, "Artificial neural networks for misuse detection," in Proceedings of the 1998 National Information Systems Security Conference (NISSC'98), 1998, pp. 443–456.
[9] S. Mukkamala, A. H. Sung, and A. Abraham, "Intrusion detection using ensemble of soft computing paradigms," in Advances in Soft Computing. Springer Verlag, 2003, pp. 239–248.
[10] Z. Zhang, J. Li, C. N. Manikopoulos, J. Jorgenson, and J. Ucles, "Neural networks in statistical anomaly intrusion detection," in Advances in Neural Networks and Applications, N. Mastorakis (Ed.). World Scientific and Engineering Society Press, 2001, pp. 333–338.
[11] A. Bivens, M. Embrechts, C. Palagiri, R. Smith, and B. Szymanski, "Network-based intrusion detection using neural networks," in Intelligent Engineering Systems through Artificial Neural Networks, vol. 12, Proc. ANNIE 2002 Conference, 2002.
[12] K. Shah, N. Dave, S. Chavan, S. Mukherjee, A. Abraham, and S. Sanyal, "Adaptive neuro-fuzzy intrusion detection system," in IEEE International Conference on Information Technology: Coding and Computing (ITCC'04), vol. 1, 2004, pp. 70–74.
[13] M. Brodie, M. Mei, D. George, and S. Ma, "Using data mining for intrusion detection," in Kantardzic and Zurada (Eds.), Next Generation of Data Mining Applications. Wiley-Interscience, 2005.
[14] M. Sabhnani and G. Serpen, "Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context," in Proceedings of the International Conference on Machine Learning: Models, Technologies, and Applications, 2003, pp. 209–215.
[15] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, "Selecting features for intrusion detection: A feature relevance analysis on KDD 99 intrusion detection datasets," in Proceedings of PST 2005, 2005.
[16] S. Chebrolu, A. Abraham, and J. Thomas, "Feature deduction and ensemble design of intrusion detection systems," Computers and Security. Elsevier Science, 2005.
[17] Y. Chen and A. Abraham, "Feature selection and intrusion detection using hybrid flexible neural tree," in Second IEEE International Symposium on Neural Networks (ISNN 2005), Lecture Notes in Computer Science. Springer Verlag, Germany, 2005.
[18] C. H. Lee, S. W. Shin, and J. W. Chung, "Network intrusion detection through genetic feature selection," in Proceedings of the Seventh ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD'06), 2006.
[19] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Morgan Kaufmann, 2005.
[20] W. Lee, "A data mining framework for constructing features and models for intrusion detection systems," PhD thesis, 1999.
[21] W. Lee and S. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261, 2000.
[22] S. Hettich and S. Bay, "The UCI KDD Archive," http://kdd.ics.uci.edu (last access: 08.05.2009). Department of Information and Computer Sciences, University of California, Irvine, 1999.
[23] C. Elkan, "Results of the KDD'99 classifier learning," SIGKDD Explorations: Newsletter of the ACM Special Interest Group on Knowledge Discovery and Data Mining, vol. 1, no. 2, pp. 63–64, 2000.
Ralf Staudemeyer is employed as a network administrator at the Max-
Born-Institute in Berlin, Germany. He is a PhD candidate at the University
of the Western Cape, South Africa. His interests are currently in the fields of
machine learning and computer network intrusion detection.