ScienceDirect
Available online at www.sciencedirect.com
Procedia Computer Science 132 (2018) 928–936
1877-0509 © 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/3.0/)
Peer-review under responsibility of the scientific committee of the International Conference on Computational Intelligence and Data Science
(ICCIDS 2018).
10.1016/j.procs.2018.05.108
*Corresponding Author: aditya.chellam2015@vit.ac.in
International Conference on Computational Intelligence and Data Science (ICCIDS 2018)
Intrusion Detection in Computer Networks using Lazy Learning Algorithm
Aditya Chellam (a), Ramanathan L (a), Ramani S (a)
(a) School of Computer Science and Engineering, VIT, Vellore 632014, India
Abstract
Intrusion Detection Systems (IDS) are used in computer networks to safeguard the integrity and confidentiality of sensitive data. In recent years, network traffic has grown large enough to fall within the big data domain. Current machine-learning techniques used in IDS are largely based on eager learning paradigms, which lose efficiency by trying to generalize the training data before receiving queries, thereby incurring overhead for trivial computations. This paper proposes the use of lazy learning methodologies to improve the overall performance of IDS. A novel heuristic weight-based indexing technique is used to overcome the high search complexity inherent in lazy learning. IBk and LWL, two popular lazy learning algorithms, are applied to the NSL-KDD dataset to simulate a real-world-like scenario, and their relative performance is compared with that of hw-IBk. The results indicate that lazy algorithms are a viable solution for real-world network intrusion detection.
Keywords: Lazy Learning; Intrusion Detection System; Machine Learning; IBk; kNN
1. Introduction
The predominant strategy for monitoring systems for malicious activity or policy violations is the use of an Intrusion Detection System (IDS). Any detected violation is typically either reported to an administrator or collected centrally using a Security Information and Event Management (SIEM) system. A SIEM system combines outputs from multiple sources and uses alarm filtering techniques to decide the validity of a detected attack.

Network Intrusion Detection Systems (NIDS) are placed at strategic points within the network to monitor traffic between all nodes on the network. An NIDS analyses the traffic passing over the entire subnet and matches it against a library of known attacks. Once an attack is recognized, or abnormal behaviour is detected, an alert can be sent to the administrator. An example would be installing an NIDS on the subnet where the firewalls are located, in order to check whether someone is attempting to break into the firewall. Ideally, one would scan all inbound and outbound traffic; however, doing so may create a bottleneck that would impair the overall speed of the network. OPNET and NetSim are commonly used tools for simulating network intrusion detection systems. NIDS are also capable of comparing the signatures of similar packets in order to link and drop harmful detected packets whose signatures match the records in the NIDS.

NIDS can be classified into two subgroups based on the interactivity of the system, namely offline and online NIDS. Offline NIDS detect attacks by passing the data through a set of procedures [6]. Online NIDS inspect Ethernet packets and apply rules to detect attacks.
2. Data Mining in Computer Networks
Data mining techniques for intrusion detection are chiefly based on the following:
Frequent pattern mining
Classification
Clustering
Mining data streams
Data mining in the network security context is defined as the non-trivial process of identifying valid and useful information by characterizing the underlying patterns in network data. Machine-learning-based data mining techniques are widely applicable to detecting underlying patterns in network traffic data. In supervised learning, accurate models are learned from previous intrusion logs. Alternatively, in unsupervised learning, suspicious activities are detected and subsequently identified.
2.1. Lazy Learning Algorithms
IBk Classifier. IBk is a K-nearest-neighbour classifier: predictions are made based on the relative distances between the query and the instances of each class. No fixed value of K suits all domains, so the algorithm can use cross-validation to pick an appropriate value of K.
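As a minimal sketch of this idea (not Weka's IBk implementation), the following pure-Python code classifies a query by majority vote among its K nearest stored instances and picks K by leave-one-out cross-validation. The function names, the candidate values of K and the toy data are illustrative assumptions.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train, query, k):
    """Majority vote among the k training instances closest to `query`.

    `train` is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def choose_k(train, candidates=(1, 3, 5)):
    """Pick the candidate k with the best leave-one-out accuracy."""
    def loo_accuracy(k):
        hits = 0
        for i, (features, label) in enumerate(train):
            rest = train[:i] + train[i + 1:]   # hold out instance i
            hits += knn_predict(rest, features, k) == label
        return hits / len(train)
    return max(candidates, key=loo_accuracy)

# Toy two-feature data (hypothetical, not taken from NSL-KDD).
toy = [((0.0, 0.1), "normal"), ((0.2, 0.0), "normal"), ((0.1, 0.3), "normal"),
       ((1.0, 1.1), "anomaly"), ((0.9, 1.0), "anomaly"), ((1.2, 0.8), "anomaly")]
best_k = choose_k(toy)
prediction = knn_predict(toy, (0.95, 0.9), best_k)
```

Note that no model is built before the query arrives: all work happens inside `knn_predict`, which is what makes the method lazy and also what makes its search cost grow with the number of stored instances.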
LWL Classifier. Locally Weighted Learning (LWL) is also known as memory-based learning, instance-based learning or lazy learning, and is closely related to kernel density estimation, similarity searching and case-based reasoning. LWL is simple yet appealing, both intuitively and statistically. To predict what will happen in the future, one reaches into a database of all previous experiences, retrieves some similar experiences, combines them (perhaps by a weighted average that weights more similar experiences more strongly) and uses the combination to make a prediction, perform a regression, or carry out many other more sophisticated operations. The algorithm is highly flexible and provides an accurate model in the long run.
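The weighted-average form of this idea can be sketched as follows. This is an illustration only, assuming a Gaussian kernel and a hypothetical `bandwidth` parameter; it is not the LWL implementation shipped with Weka, and the stored experiences are made-up values.

```python
import math

def locally_weighted_predict(train, query, bandwidth=0.5):
    """Kernel-weighted average of stored targets around `query`.

    `train` is a list of (features, target) pairs with numeric targets.
    Closer experiences receive larger weights (assumed Gaussian kernel).
    """
    num = 0.0
    den = 0.0
    for features, target in train:
        dist_sq = sum((a - b) ** 2 for a, b in zip(features, query))
        weight = math.exp(-dist_sq / (2.0 * bandwidth ** 2))
        num += weight * target
        den += weight
    return num / den

# Hypothetical past experiences: target 0.0 = normal, 1.0 = anomaly.
history = [((0.0, 0.0), 0.0), ((0.1, 0.2), 0.0),
           ((1.0, 1.0), 1.0), ((0.9, 1.1), 1.0)]
score_near_anomalies = locally_weighted_predict(history, (1.0, 0.9))
score_near_normals = locally_weighted_predict(history, (0.05, 0.1))
```

A score above 0.5 would be read as "anomaly" in this toy setting; the threshold is likewise an assumption of the sketch.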
2.2. Advantage of using lazy learning
The main advantage of a lazy learning method, such as case-based reasoning, is that the target function is approximated locally. Because the target function is approximated locally for each query to the system, lazy learning systems can solve multiple problems simultaneously and deal successfully with changes in the problem domain.
2.3. Simulation of real world network
The NSL-KDD dataset is used for this study as it resolves some of the inherent problems of the KDD'99 dataset. Although this newer version of the KDD dataset still suffers from some of the problems discussed by McHugh and may not be a perfect representative of existing real networks, the lack of public datasets for network-based IDSs means it can still be applied as an effective benchmark dataset to help researchers compare different intrusion detection methods [2].
Also, the NSL-KDD contains a reasonable number of records [2]. This makes it feasible to run experiments on the complete set without the need to randomly select a small portion. Consequently, the evaluation results of different research works are expected to be consistent and comparable. The salient features of the NSL-KDD that make it more desirable than its predecessors are as follows. The number of selected records from each difficulty level is inversely proportional to the percentage of records in the original KDD dataset. As a result, the classification rates of distinct machine learning methods vary over a wider range, which makes it more effective for an accurate evaluation of different learning techniques. Both the training and test sets contain a reasonable number of instances, so experiments can be run on the entire set.
3. Literature Survey
Various Machine Learning (ML) algorithms were surveyed to determine the optimum data mining solution for detecting intrusions in computer networks. The surveyed work is listed in tabular form below.
Table 1. Literature Survey Table

| Sr. No. | Author Name | Domain Addressed | Description | Algorithm Used | Advantage |
|---|---|---|---|---|---|
| 1 | David A Cieslak et al. | Imbalance in Network Intrusion Datasets | Actual Notre Dame traffic analysed to detect imbalance in real-time network intrusions. Using ROC analysis, it is shown that oversampling by artificial generation of the minority (intrusion) class outperforms oversampling by replication and RIPPER's loss ratio method [6]. | RIPPER rule learning; ROC used for analysis | Clustering-based approach is more suitable for intrusion detection and can deliver added improvement over artificial generation of instances alone. |
| 2 | Wei Wang and Roberto Battiti | Network Intrusion Detection | Normal behaviour is profiled from normal data for anomaly detection, and models of each type of attack are built from attack data for intrusion identification [7]. | Principal Component Analysis; proposed profiling algorithm | Accurate identification and computationally efficient model for real-time intrusion identification. |
| 3 | Jiong Zhang et al. | Network Intrusion Detection | KDD'99 experiment to detect network intrusions using the Random Forest algorithm. The proposed model improves the detection performance of current Network Intrusion Detection Systems (NIDS) [4]. | Random Forest algorithm | Can detect unknown intrusions with a low false positive rate; overcomes shortcomings of anomaly and misuse detection. |
| 4 | Steven Noel and Sushil Jajodia | Complex Network Attacks | Graph-based technique elucidates multiple-step attacks by matching rows and columns of the clustered adjacency matrix, permitting attack influence/responses to be identified and prioritized based on the number of attack steps to victim machines, and allows attack origins to be determined [3]. | Adjacency matrix clustering | Places intrusion checkpoints in the context of vulnerability-based attack graphs, making false alarms apparent and making inference of missed detections possible. |
| 5 | Corvera S. et al. | Anomaly Detection | Data mining technique used to cluster networks to detect anomalies using kNN-based learning. | k-NN algorithm for anomaly detection | Efficient and effective anomaly detection in networks. |
| 6 | Wenke Lee et al. | Building IDS models | Auditing programs are used to extract an extensive set of features describing each node in the system. Data mining programs are used to accurately learn rules capturing the behaviour of intrusions and normal events [1]. | Meta classification; RIPPER used for anomaly detection; Bro engine used for packet filtering and reassembling | Proposed model shows best detection in U2R and PROBING attacks. |
| 7 | M. Mazhar Rathore, Anand Paul et al. | Real-Time Intrusion Detection for high-speed networks | Hadoop-based IDS for high-speed real-time intrusion detection. Nine best parameters are selected for intruder flow classification using FSR and BER, as well as by analysing the DARPA datasets [16]. | REPTree and J48 algorithms | Proposed model has better efficiency and accuracy than existing models and is capable of handling big data. |
| 8 | Mahsa Bataghva Shahbaz et al. | Efficiency Enhancement of Feature Selection in IDS | Highly dimensional NSL-KDD dataset experimented on for feature extraction and selection to improve accuracy in IDS [15]. | J48 classifier | Enhances performance through reduction of complexity and acceleration of the detection process. |
| 9 | Farid Lawan Bello et al. | Analysis and Evaluation of Hybrid IDS | Different IDS classifier models analysed based on detection strategies, calling for a hybrid model to overcome limitations [14]. | Support Vector Machine (SVM) algorithm; clustering based on Self-Organizing Ant Colony Networks | Hybrid model enables detection of multilevel classes of attacks with low classifier training time. |
Table 1 (continued). Literature Survey Table

| Sr. No. | Author Name | Domain Addressed | Description | Algorithm Used | Advantage |
|---|---|---|---|---|---|
| 10 | Ma Xiao-li et al. | Data mining in computer network security | KDD-CUP 2002 dataset used to test Artificial Immune System based classification for improved accuracy in intrusion detection. | Artificial Immune System algorithm; developed further on Neural Network and SVM classifier | Accelerates network intrusion detection; reduces non-response rates and gives a more reliable security model. |
| 11 | Subaira A. S. et al. | Improving Classification Efficiency in IDS | Elucidates data mining as an efficient technique for intrusion detection to determine key components from big data in networks [12]. | SVM, decision tree algorithms, Neural Network, Bayesian Classifier, K-Nearest Neighbour, Fuzzy Logic and Genetic Algorithm | Reduces the strain of manual compilation of normal and abnormal behaviour patterns. |
| 12 | Kailas Elekar et al. | Data mining in Intrusion Detection | Network traffic from the KDD CUP dataset is scrutinized and monitored for detecting security faults using rule-based data mining algorithms [13]. | Rule-based data mining algorithms: OneR, PART, ZeroR, Decision Table, JRip | Significantly better performance by the PART classifier in overall intrusion detection classification. |
| 13 | Ali Sharifi Boroujerdi et al. | DDoS Attack Detection | Ensemble of Sugeno-type adaptive neuro-fuzzy classifiers proposed for DDoS intrusion detection using Marliboost. Model performance evaluated on the basis of detection accuracy and false positive alarms [9]. | Fuzzy neural network with Marliboost for boosting | Proposed classifier combination has improved detection accuracy to 96%. |
| 14 | Zeon Trevor Fernando et al. | Network Attacks Identification | Experimental analysis carried out on the KDD99 dataset; each feature is selected using an integrated mechanism to identify attacks in the dataset [8][11]. | J48 decision tree and Self-Organizing Map (SOM) | Increases overall classification accuracy by reducing the dataset to a prioritized subset. |
| 15 | Manas Ranjan Patra and Ashalata Panigrahi | Enhancing Performance of IDS | Soft computing techniques used on the NSL-KDD dataset to assess the performance of each procedure and determine the most efficient solution for enhanced accuracy in intrusion detection [10]. | Radial Basis Function Network (RBFN), Self-Organizing Map (SOM), Support Vector Machine (SVM), back propagation, and J48 classifier | Improved efficiency in classifying network intrusion data into normal and abnormal data. |
| 16 | V. K. Pachghare and Parag Kulkarni | Pattern Based Network Security [13] | Highly imbalanced KDD-cup'99 dataset used as a base to detect patterns using J48 graft for improved performance in intrusion detection [8]. | J48 Graft algorithm (decision tree) and SVM classifier | J48 Graft tree determined to perform best for pattern classification in IDS. |
4. Proposed Work
In the current k-NN algorithm, the existing nodes are partitioned into classes, and the result of applying the classifier is membership of one of the classes.
k defines the number of neighbours under consideration.
When k = 1, every training vector defines a region in space, defining a Voronoi partition of the space.
     (1)
where R_i is the radial distance of neighbour i from the node.
The Euclidean distance measure is used to calculate the distance between the node and its nearest neighbours,
\[ d(p, q) = \sqrt{\sum_{j=1}^{n} (q_j - p_j)^2} \tag{2} \]
where d(p, q) is the distance between the nodes p and q, and p_j and q_j are their values for the j-th of n attributes.
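As a quick worked instance of the Euclidean distance measure, assume two hypothetical nodes with two attributes each, p = (1, 2) and q = (4, 6). These values are illustrative and are not drawn from NSL-KDD.

```latex
\[
d(p, q) = \sqrt{(4 - 1)^2 + (6 - 2)^2} = \sqrt{9 + 16} = \sqrt{25} = 5
\]
```

Every query requires one such computation per stored node, which is the source of the search cost discussed in this section.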
Based on the distance vector, the k instances are ranked by Bayesian probability, where the notations have their standard meanings,
\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{3} \]
Alternatively, a sequential heuristic rank can also be assigned. The major drawback of this approach is the high search complexity that results from the Euclidean distance measurements. To overcome this drawback, only those computations that are absolutely necessary for an accurate measure should be performed. Thus, each node n_i is associated with an appropriate fractional weight w_i. All weights are initially the same, indicating equal initial weightage of all the nodes.
Furthermore, in order to limit the error in measurements, a constraint on the initial weight assignment is imposed: the sum of the weights of the k neighbours of an instance is 1.
\[ \sum_{i=1}^{k} w_i = 1 \tag{4} \]
Each neighbour of n is assigned an initial value of 1/k. The weights are then updated based on the importance of each neighbour in determining the class of the new node. Heuristic ranks, assigned on the basis of probabilistic significance, are used as the metric for weight updates.
(5)
The heuristic ranks can be determined by Maryam Kuhkan's updated measure of classification (from David Aha's model) [3][5].

Table 2. Weight Change Characteristics

| Difference | Correct classification | Incorrect classification |
|---|---|---|
| Little | Unchanged | Much decrease |
| Much | Unchanged | Little decrease |
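One way to read Table 2 in code is sketched below. It assumes the table means: a correct classification leaves the weight unchanged, an incorrect classification with a little difference decreases it by much, and an incorrect classification with a much difference decreases it by little. The multiplicative form and the step sizes `small_step` and `large_step` are hypothetical choices, since the source does not state the magnitude of the decrease. The weights are renormalised afterwards so that they continue to sum to 1.

```python
def update_weights(weights, outcomes, small_step=0.05, large_step=0.20):
    """Apply the Table 2 rules to neighbour weights, then renormalise.

    `outcomes` holds one (difference, correct) pair per neighbour, where
    `difference` is "little" or "much" and `correct` is a bool.
    The step sizes are assumed values, not taken from the paper.
    """
    updated = []
    for weight, (difference, correct) in zip(weights, outcomes):
        if correct:
            factor = 1.0                  # correct: unchanged
        elif difference == "little":
            factor = 1.0 - large_step     # incorrect, little: much decrease
        else:
            factor = 1.0 - small_step     # incorrect, much: little decrease
        updated.append(weight * factor)
    total = sum(updated)
    return [w / total for w in updated]   # keep the sum-to-one constraint

k = 4
initial = [1.0 / k] * k                   # every neighbour starts at 1/k
outcomes = [("little", True), ("little", False), ("much", False), ("much", True)]
weights = update_weights(initial, outcomes)
```

After the update, the neighbour that misclassified despite a little difference carries the smallest weight, which is the ordering that the ranking step relies on.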
Experimental testing showed that the complexity could be reduced further by
considering only the (k/2) + 1 most significant neighbours, ranked in descending order of their updated weights.
Thus, the new distance measure is,
 

 (6)
The resultant vector is the list of distance measures from the node to its neighbours. Thus, the search complexity is
significantly reduced.
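The selection of the (k/2) + 1 most significant neighbours can be sketched as follows. This is only an illustration under stated assumptions: the helper name `most_significant` is hypothetical, the neighbour weights are made up, and integer division is assumed when k is odd.

```python
def most_significant(neighbours, k):
    """Keep the (k // 2) + 1 neighbours with the largest weights.

    `neighbours` is a list of (node_id, weight) pairs of length k.
    Integer division for odd k is an assumption of this sketch.
    """
    keep = k // 2 + 1
    ranked = sorted(neighbours, key=lambda item: item[1], reverse=True)
    return ranked[:keep]


# Made-up weights for five neighbours (they sum to 1).
neighbours = [("n1", 0.10), ("n2", 0.30), ("n3", 0.05), ("n4", 0.25), ("n5", 0.30)]
kept = most_significant(neighbours, k=len(neighbours))
print(kept)  # [('n2', 0.3), ('n5', 0.3), ('n4', 0.25)]
```

With k = 5, only three distance evaluations remain for the final decision instead of five.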
5. Implementation
The Weka 3.8 tool was used to implement the various lazy learning algorithms. The NSL-KDD dataset, comprising
22544 instances and 42 attributes, was used as a representative dataset for real-world-like network traffic.
The classifiers were evaluated using the 10-fold cross-validation test option. A NetBeans framework was designed to
incorporate the modifications for the novel distance vector measurements. The experimental implementation and
observed results are reported in the following tables.
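For readers unfamiliar with the evaluation protocol, the way 10-fold cross-validation divides the 22544 instances can be sketched in Python. This is a plain, unshuffled split written for illustration; it is not Weka's own procedure, and the function name `k_fold_indices` is an assumption.

```python
def k_fold_indices(n_instances, n_folds=10):
    """Yield (train_idx, test_idx) pairs for plain k-fold cross-validation."""
    indices = list(range(n_instances))
    base, extra = divmod(n_instances, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional instance each.
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(22544, n_folds=10))
print([len(test) for _, test in folds])
```

Each instance is used for testing exactly once and for training in the other nine folds, so the reported metrics are averaged over the whole dataset.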
6. Result
IBk
Class wise accuracy
Table 3. Performance Metrics of IBk
TP Rate   FP Rate   Precision   F-measure   ROC Area   PRC Area   Class
0.910     0.033     0.901       0.913       0.937      0.892      normal
0.936     0.049     0.945       0.933       0.937      0.953      anomaly
0.923     0.041     0.923       0.923       0.937      0.922      (weighted avg.)
Modified IBk
Class wise accuracy
Table 4. Performance Metrics of HW-IBk
TP Rate   FP Rate   Precision   F-measure   ROC Area   PRC Area   Class
0.969     0.019     0.974       0.972       0.976      0.962      normal
0.981     0.031     0.977       0.979       0.976      0.972      anomaly
0.976     0.026     0.976       0.976       0.976      0.967      (weighted avg.)
LWL
Class wise accuracy
Table 5. Performance Metrics of LWL
TP Rate   FP Rate   Precision   F-measure   ROC Area   PRC Area   Class
0.878     0.071     0.903       0.890       0.968      0.967      normal
0.929     0.122     0.910       0.919       0.968      0.973      anomaly
0.907     0.100     0.907       0.907       0.968      0.970      (weighted avg.)
Fig. 1. Comparison of performance of lazy-learning classifiers on NSL-KDD
Thus, the overall accuracy has improved by nearly 4%, and the reduced search complexity means that
results are also available faster. Excluding the less significant terms also prunes out noise-introducing nodes during
classification.
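The per-class columns in the result tables (TP rate, FP rate, precision and F-measure) follow the standard confusion-matrix definitions, which can be written out as a short Python sketch. The counts used below are hypothetical and are not taken from the experiments; the function name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard per-class metrics from confusion-matrix counts."""
    tp_rate = tp / (tp + fn)        # also called recall or detection rate
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for one class, chosen only to make the arithmetic visible.
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The F-measure is the harmonic mean of precision and TP rate, so it is high only when both are high.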
7. Conclusion
This paper elucidates the advantages of lazy learning in IDS. Lazy learning improves the efficiency of the NIDS
by eliminating the overheads, incurred before any query is received, that are inherent in the eager learning algorithms popularly in use today.
Further, an improvement of the k-nearest neighbour algorithm has been proposed to reduce the search complexity
using a heuristic weight-based indexing system. The results indicate that the hw-IBk algorithm is a
practical and viable solution for intrusion detection in data streams, with greater accuracy than other machine
learning algorithms currently deployed. Additionally, the IBk algorithm has been compared with another lazy
learning algorithm, LWL, in order to compare and contrast their performance on the NSL-KDD network traffic
dataset. The time taken to detect intrusions is significantly reduced, and the proportion of correctly
classified instances is relatively higher (~97.59%). Thus, with a significant increase in the speed of
computation, network intrusions can be detected faster without any loss of accuracy, which aids threat
identification in real-time network systems.
[Chart for Fig. 1: "Performance Comparison Chart", showing correctly and incorrectly classified instances (axis from 86% to 100%) for IBk, HW-IBk and LWL.]
Acknowledgements
I would like to thank all the people who motivated and helped me throughout my project, especially my
colleagues who, by exchanging their own thoughts and providing valuable input, made it possible to complete the
paper with accurate information.
References
[1] Lee, W., Stolfo, S.J. and Mok, K.W. (1999) “A data mining framework for building intrusion detection models.” in Security and Privacy, 1999. Proceedings of the 1999 IEEE Symposium: 120–132.
[2] McHugh, J. (2000) “Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory.” ACM Transactions on Information and System Security (TISSEC), 3(4): 262–294.
[3] Noel, S. and Jajodia, S. (2005) “Understanding Complex Network Attack Graphs through Clustered Adjacency Matrices.” in Proceedings of the 21st Annual Computer Security Applications Conference: 160–169.
[4] Zhang, J. and Zulkernine, M. (2006) “A Hybrid Network Intrusion Detection Technique Using Random Forests.” in Proceedings of the First International Conference on Availability, Reliability and Security: 262–269.
[5] Kuhkan, M. (2006) “A Method to Improve accuracy of k-NN algorithm.” IJCEIT 8(6): 90–95.
[6] Cieslak, D. A., Chawla, N. V. and Striegel, A. (2006) “Combating imbalance in network intrusion datasets.” in GrC: 732–737.
[7] Wang, W. and Battiti, R. (2006) “Identifying Intrusions in Computer Networks with Principal Component Analysis.” in Proceedings of the First International Conference on Availability, Reliability and Security: 270–279.
[8] Pachghare, V.K. and Kulkarni, P. (2011) “Pattern based network security using decision trees and support vector machine.” in Electronics Computer Technology (ICECT), 2011 3rd International Conference on, 5(1): 254–257.
[9] Boroujerdi, A. S. and Ayat, S. (2013) “A robust ensemble of neuro-fuzzy classifiers for DDoS attack detection.” in Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference: 484–487.
[10] Patra, M. R. and Panigrahi, A. (2013) “Enhancing Performance of Intrusion Detection through Soft Computing Techniques.” in Computational and Business Intelligence (ISCBI), 2013 International Symposium: 44–48.
[11] Fernando, Z. T., Thaseen, I. S. and Kumar, C. A. (2014) “Network attacks identification using consistency based feature selection and self-organizing maps.” in Networks & Soft Computing (ICNSC), 2014 First International Conference: 162–166.
[12] Subaira, A. S. and Anitha, P. (2014) “Efficient classification mechanism for network intrusion detection system based on data mining techniques: a survey.” in Intelligent Systems and Control (ISCO), 2014 IEEE 8th International Conference: 274–280.
[13] Elekar, K., Waghmare, M. M. and Priyadarshi, A. (2015) “Use of rule base data mining algorithm for intrusion detection.” in Pervasive Computing (ICPC), 2015 International Conference: 1–5.
[14] Bello, F. L. and Ravulakollu, K. (2015) “Analysis and evaluation of hybrid intrusion detection system models.” in Computers, Communications, and Systems (ICCCS), International Conference: 93–97.
[15] Shahbaz, M. B., Wang, X., Behnad, A. and Samarabandu, J. (2016) “On efficiency enhancement of the correlation-based feature selection for intrusion detection systems.” in Information Technology, Electronics and Mobile Communication Conference (IEMCON), 2016 IEEE 7th Annual: 1–7.
[16] Rathore, M.M., Paul, A., Ahmad, A., Rho, S., Imran, M. and Guizani, M. (2016) “Hadoop Based Real-Time Intrusion Detection for High-Speed Networks.” in Global Communications Conference (GLOBECOM), 2016 IEEE: 1–6.
... It is worth mentioning that SSPLR method behaves as an optimization technique by iterative procedure, where the computational costs can be computed by OðN Â PÞ for each iteration. Here, N is the total numbers of training records and P is [1,2,3,6,7,9], [11,14,17,18,22], [23,25,26,29,30,31], [32,35,36,39,40] 95.89 93.65 92.15 NSL-KDD (U2R) [1,2,3,4,5,6,7,8,9], [12,13,14,17,18,20,21,22], [23,24,25,26,27,28,29,30,31], [32,33,34,35,36,39,40] 95.91 90.24 87.83 NSL-KDD'99 (R2L) [1,2,3,4,5,6,7,8,9,], [10,12,13,17,18,19,20,21,22], [22,23,25,26,27,28,29,30], [32,33,34,35,36,38,39,40,41] 95.78 91.51 87.84 the total number of features. The convergence includes various aspects such as step size, data set, parameter settings, etc. Whereas, in most of the cases SSPLR convergence rate is very fast especially in first numerous repetitions. ...
... It is worth mentioning that SSPLR method behaves as an optimization technique by iterative procedure, where the computational costs can be computed by OðN Â PÞ for each iteration. Here, N is the total numbers of training records and P is [1,2,3,6,7,9], [11,14,17,18,22], [23,25,26,29,30,31], [32,35,36,39,40] 95.89 93.65 92.15 NSL-KDD (U2R) [1,2,3,4,5,6,7,8,9], [12,13,14,17,18,20,21,22], [23,24,25,26,27,28,29,30,31], [32,33,34,35,36,39,40] 95.91 90.24 87.83 NSL-KDD'99 (R2L) [1,2,3,4,5,6,7,8,9,], [10,12,13,17,18,19,20,21,22], [22,23,25,26,27,28,29,30], [32,33,34,35,36,38,39,40,41] 95.78 91.51 87.84 the total number of features. The convergence includes various aspects such as step size, data set, parameter settings, etc. Whereas, in most of the cases SSPLR convergence rate is very fast especially in first numerous repetitions. ...
... It is worth mentioning that SSPLR method behaves as an optimization technique by iterative procedure, where the computational costs can be computed by OðN Â PÞ for each iteration. Here, N is the total numbers of training records and P is [1,2,3,6,7,9], [11,14,17,18,22], [23,25,26,29,30,31], [32,35,36,39,40] 95.89 93.65 92.15 NSL-KDD (U2R) [1,2,3,4,5,6,7,8,9], [12,13,14,17,18,20,21,22], [23,24,25,26,27,28,29,30,31], [32,33,34,35,36,39,40] 95.91 90.24 87.83 NSL-KDD'99 (R2L) [1,2,3,4,5,6,7,8,9,], [10,12,13,17,18,19,20,21,22], [22,23,25,26,27,28,29,30], [32,33,34,35,36,38,39,40,41] 95.78 91.51 87.84 the total number of features. The convergence includes various aspects such as step size, data set, parameter settings, etc. Whereas, in most of the cases SSPLR convergence rate is very fast especially in first numerous repetitions. ...
Article
Full-text available
With the rapid advancement in technology, network systems are becoming prone to more sophisticated types of intrusions. However, machine learning (ML) based strategies are among the most efficient and popular methods to identify the network intrusions or attacks. In this study, we examined the important and discriminative features, in order to recognize the various attacks by applying the Structural Sparse Logistic Regression (SSPLR) and Support Vector Machine (SVMs) methods. The SVMs are standard ML-based techniques, which provide the reasonable performance, however, they have few shortcomings, such as, interpretability and huge computational cost. On the other hand, the sparse modeling (SSPLR) is considered as the advanced method for the data examination and processing through regularization. The structural sparse modeling can be used to simultaneously select the distinct features or the group of discriminative features from the repository of the data set to determine the coefficient of the linear classifier, where, prior information of the feature’s structure can be mapped on various sparsity-inducing regularizations. In this way, the particular group of features yielded by the most significant network attacks are selected and potentially identified. The experiments and discussion, show that the proposed techniques have improved performance compared to the most state-of-the-art techniques, used for the Intrusion Detection System (IDS).
... It reduces the amount of data that should be preserved for historical comparisons of network activity and produces more meaningful data to anomaly detection. Besides, Data Mining can be divide into the following (Gyanchandani et al., 2012;Chellam et al., 2018): ...
Article
Full-text available
Currently, the Intrusion Detection System (IDS) is attracting both the commercial companies and the research community as IDS is playing an increasingly important role in most network systems to detect and block possible attacks. This paper aims to present a general view of the state-of-the-art of the IDS, based on a proposed taxonomy so that the researchers can quickly become familiar with the essential aspects of the intrusion detection techniques. The taxonomy includes reclassifying IDSs according to multiple bases, e.g., IDS' data source, the detection method, and others. Also, comparisons among different detection approaches and various data collection techniques are tabulated. Besides, the paper exhibits the taxonomy of anomaly-based IDSs classifying the promising techniques and summarizing the merits of the most recent anomaly-based techniques as a table. Furthermore, it deliberates the IDSs through various applications, like data centers, backbone, Fog, and Cloud Computing, and IoT models, in addition to the most common prevailing models. Subsequently, multiple metrics and the datasets frequently used to assess the IDS are described concisely. Finally, the requirements and challenges of contemporary IDSs are mapped.
... At present, the uses of artificial intelligence have become relevant and applicable to almost any sector, such as chemical synthesis [47], stock market prediction [48], media recommendation systems [49], medical diagnosis [50], cybersecurity [51], and environmental monitoring and research [17,52,53]. The use of an algorithm to train and make a prediction without being explicitly programmed is known as machine learning (ML). ...
Article
Background In this study, the adsorption of methylene blue (MB) dye using an aquatic plant, Azolla pinnata (AP) was modelled using several various supervised machine learning (ML) algorithms, aiming to accurately predict the adsorption capacity under various experimental conditions. Methods The ML algorithms used in this study are the artificial neural network (ANN), random forests (RF), support vector regression (SVR), and instance-based learner (IbK). The SVR algorithm was trained using three kernels: radial basis function (RBF), Pearson VII universal kernel (PUK), and polynomial kernel (PolyK). The experimental data (adsorbent dosage, pH, ionic strength, initial dye concentration, and contact time) served as input for training the algorithms and with the adsorption capacity as the output. The performance of the algorithms was optimised based on the values of correlation coefficient (R) and fine-tuned using several error functions (e.g. mean absolute error, root mean square error, and non-linear chi-squared). Findings The best performing ML algorithm in this study is SVR-RBF which achieves the highest value in R (0.994) and has the lowest error.
... The advantage of rule-based N-IDS is better known for attack detection. [3]. Therefore, we propose a smart N-IDS which can capture network traffic, analyze, and detect network anomalies automatically. ...
Article
Today's Internet and enterprise networks are so popular as they can easily provide multimedia and ecommerce services to millions of users over the Internet in our daily lives. Since then, security has been a challenging problem in the Internet's world. That issue is called Cyberwar, in which attackers can aim or raise Distributed Denial of Service (DDoS) to others to take down the operation of enterprises Intranet. Therefore, the need of applying an Intrusion Detection System (IDS) is very important to enterprise networks. In this paper, we propose a smarter solution to detect network anomalies in Cyberwar using Stacking techniques in which we apply three popular machine learning models: k-nearest neighbor algorithm (KNN), Adaptive Boosting (AdaBoost), and Random Decision Forests (RandomForest). Our proposed scheme uses the Logistic Regression method to automatically search for better parameters to the Stacking model. We do the performance evaluation of our proposed scheme on the latest data set NSLKDD 2019 dataset. We also compare the achieved results with individual machine learning models to show that our proposed model achieves much higher accuracy than previous works.
... Network intrusion detection systems (NIDS) are extensively investigated in the literature to protect the seemingly vulnerable networks from external and internal intruders [2]. NIDSs have been in use since 1980's after Dorothy Denning [3] delineated that intrusion detection systems are critical to maintain the confidentiality, integrity and availability of computer resources [4,5]. Prolific approaches exist in the field of network intrusion detection that have met with different scales of success. ...
Article
Full-text available
Due to the emerging technological advances, cyber-attacks continue to hamper information systems. The changing dimensionality of cyber threat landscape compel security experts to devise novel approaches to address the problem of network intrusion detection. Machine learning algorithms are extensively used to detect intrusions by dint of their remarkable predictive power. This work presents an ensemble approach for network intrusion detection using a concept called Stacking. As per the popular no free lunch theorem of machine learning, employing single classifier for a problem at hand may not be ideal to achieve generalization. Therefore, the proposed work on network intrusion detection emphasizes upon a combinative approach to improve performance. A robust processing paradigm called Graphlab Create, capable of upholding massive data has been used to implement the proposed methodology. Two benchmark datasets like UNSW NB-15 and UGR’ 16 datasets are considered to demonstrate the validity of predictions. Empirical investigation has illustrated that the performance of the proposed approach has been reasonably good. The contribution of the proposed approach lies in its finesse to generate fewer misclassifications pertaining to various attack vectors considered in the study.
... In the era of Internet and unlimited access of information, network security becomes one of the most important aspect to look into in order to keep confidential data and information from unauthorized third party access [1,2]. Network Intrusion Detection System (NIDS) is an important field of research since it deals with many possibilities and aspects in the real-time application especially in terms of network security. ...
Article
Full-text available
Developing a better intrusion detection systems (IDS) has attracted many researchers in the area of computer network for the past decades. In this paper, Genetic Algorithm (GA) is proposed as a tool that capable to identify harmful type of connections in a computer network. Different features of connection data such as duration and types of connection in network were analyzed to generate a set of classification rule. For this project, standard benchmark dataset known as KDD Cup 99 was investigated and utilized to study the effectiveness of the proposed method on this problem domain. The rules comprise of eight variables that were simulated during the training process to detect any malicious connection that can lead to a network intrusion. With good performance in detecting bad connections, this method can be applied in intrusion detection system to identify attack thus improving the security features of a computer network.
Chapter
Network-based information transmission has brought huge convenience for users in terms of the ease of use. However, the increased transaction not only lures fraudsters but also makes attack detection with a complicated process and mandates scalable models that can handle big data. This paper presents a network intrusion detection model that uses a hybrid ensemble model to identify intrusions in network transmission process. This work proposes the cross-bagging-based stacked ensemble model, which is a two-layered prediction mechanism used for operating on the complex network data. The first layer that contains a modified bagging mechanism is called the cross-bagging, where the results are passed to the second layer for final prediction. Experiments were conducted by using the NSL-KDD dataset. The usage of ensemble modelling enables the model to be effectively parallelized and also ensures high scalability of the model. This ensures effective prediction even on data with large volumes and high velocity. Comparisons with recent models show high performance for the proposed model.
Chapter
The coal gasification process is one of the most convenient and clean coal technologies that convert coal into electricity, syngas, and other energy products. Thus, it is essential to estimate the outcomes of this process to obtain the optimum amount of product. Therefore, the main effort of this study is to evaluate the capability of various machine learning (ML) methods to predict gasification process output variables such as the product gas generation and product gas heating value. For this purpose, various regression models were created by using different ML algorithms such as Sequential Minimal Optimization Regression, Gaussian Process Regression, Lazy K-Star, Lazy IBk, Alternating Model Tree, Random Forest, and M5Rules. Coal properties such as fixed carbon, volatile matter, and mineral matter content and gasification process parameters, such as air feed per kg of coal, steam feed per kg of coal, and bed temperature were used as input parameters. The performances of the models were evaluated using various well-known statistical measures such as coefficient of determination (R2), the mean absolute error (MAE), root mean square error (RMSE), relative absolute error (RAE in %), and root relative squared error (RRSE in %). In the test dataset, the Random Forest model achieved the best results for both outputs with R2 = 0.9730, MAE = 0.0338, RMSE = 0.0451, RAE = 15.7148%, and RRSE = 19.1181% values for the prediction of the heating value of the product gas and R2 = 0.9928, MAE = 0.0214, RMSE = 0.0258, RAE = 8.8001%, and RRSE = 9.1592% values for prediction of the product gas generation.
Conference Paper
Full-text available
ABSTRACT COMPUTER NETWORK INTRUSION CLASSIFICATION BASED ON MACHINE LEARNING. Intrusion profiles to computer network are more and more complex with increasingly use of information and communication technology. A system with classifying intrusion capabilities become important to apply to handle intrusion which can compromise the computer network. This paper proposes intrusion classification based on machine learning with decision tree and naive bayes, including accuracy, true positif (TP) rate and false positif (FP) rate to evaluate the result. This proposal produce accuracy around 90% with decision tree and 71% with naive bayes against intrusion data set UNSW-NB15. PENDAHULUAN Aplikasi dari teknologi informasi dan komunikasi telah sangat banyak digunakan, mulai dari sistem keuangan [1], sistem informasi kesehatan [2] hingga sistem fisik yang terkait dengan pengamatan lingkungan [3-4], hingga sistem keamanan [5] dan keselamatan [6]. Kondisi ini sebanding dengan peningkatan ancaman yang ditimbulkannya, baik jumlah maupun jenisnya [7]. Untuk melindungi sistem dari ancaman tersebut, diperlukan sebuah sistem pendeteksi intrusi untuk mencegah risiko yang lebih besar. Intrusi adalah upaya untuk masuk ke dalam sistem jaringan komputer secara tidak sah dan tidak wajar dengan target mempengaruhi aspek kerahasiaan, integritas serta ketersediaan data dan layanan. Sistem deteksi intrusi (IDS) merupakan sebuah program yang berusaha untuk mendapatkan indikasi bahwa komputer dalam jaringan telah terkompromi oleh intrusi secara akurat dan dengan jumlah false alarm yang rendah [8]. Sistem ini adalah inti dari subyek keamanan siber dan masih aktif dikembangkan [9]. Sejumlah sistem deteksi intrusi berbasis pembelajaran mesin telah dikembangkan dengan beragam pendekatan, perangkat lunak bantu serta benchmark data set. Secara ringkas, perbandingan sistem-sistem tersebut disajikan dalam .
Conference Paper
Full-text available
An approach to combating network intrusion is the development of systems applying machine learning and data mining techniques. Many IDS (Intrusion Detection Systems) suffer from a high rate of false alarms and missed intrusions. We want to be able to improve the intrusion detection rate at a reduced false positive rate. The focus of this paper is rule- learning, using RIPPER, on highly imbalanced intrusion datasets with an objective to improve the true positive rate (intrusions) without significantly increasing the false positives. We chose RIPPER to induce the comprehensibility in the model as is required by humans using the system. To counter imbalance in data, we implement a combination of oversampling (both by replication and synthetic generation) and undersampling techniques. We also propose a clustering based methodology for oversampling by generating synthetic instances. We evaluate our approaches on two intrusion datasets — destination and actual packets based — constructed from actual Notre Dame traffic, giving a flavor of real-world data with its idiosyncracies. Using ROC analysis, we show that oversampling by synthetic genera- tion of minority (intrusion) class outperforms oversampling by replication and RIPPER's loss ratio to inherently counter class imbalance. Additionaly, we establish that our clustering based approach is more suitable for the detecting intrusions.
Conference Paper
Full-text available
We apply adjacency matrix clustering to network attack graphs for attack correlation, prediction, and hypothesizing. We self-multiply the clustered adjacency matrices to show attacker reachability across the network for a given number of attack steps, culminating in transitive closure for attack prediction over all possible number of steps. This reachability analysis provides a concise summary of the impact of network configuration changes on the attack graph. Using our framework, we also place intrusion alarms in the context of vulnerability-based attack graphs, so that false alarms become apparent and missed detections can be inferred. We introduce a graphical technique that shows multiple-step attacks by matching rows and columns of the clustered adjacency matrix. This allows attack impact/responses to be identified and prioritized according to the number of attack steps to victim machines, and allows attack origins to be determined. Our techniques have quadratic complexity in the size of the attack graph.
Conference Paper
Full-text available
Most current anomaly intrusion detection systems (IDSs) detect computer network behavior as normal or abnormal but cannot identify the type of attacks. Moreover, most current intrusion detection methods cannot process large amounts of audit data for real-time operation. In this paper, we propose a novel method for intrusion identification in computer networks based on principal component analysis (PCA). Each network connection is transformed into an input data vector. PCA is employed to reduce the dimensionality of the data vectors and identification is handled in a low dimensional space with high efficiency and low use of system resources. The normal behavior is profiled based on normal data for anomaly detection and models of each type of attack are built based on attack data for intrusion identification. The distance between a vector and its reconstruction onto those reduced subspaces representing the different types of attacks and normal activities is used for identification. The method is tested with network data from MIT Lincoln labs for the 1998 DARPA intrusion detection evaluation program and testing results show that the model is promising in terms of identification accuracy and computational efficiency for real-time intrusion identification.
Conference Paper
Full-text available
There is often the need to update an installed intrusion detection system (IDS) due to new attack methods or upgraded computing environments. Since many current IDSs are constructed by manual encoding of expert knowledge, changes to IDSs are expensive and slow. We describe a data mining framework for adaptively building Intrusion Detection (ID) models. The central idea is to utilize auditing programs to extract an extensive set of features that describe each network connection or host session, and apply data mining programs to learn rules that accurately capture the behavior of intrusions and normal activities. These rules can then be used for misuse detection and anomaly detection. New detection models are incorporated into an existing IDS through a meta-learning (or co-operative learning) process, which produces a meta detection model that combines evidence from multiple models. We discuss the strengths of our data mining programs, namely, classification, meta-learning, association rules, and frequent episodes. We report on the results of applying these programs to the extensively gathered network audit data for the 1998 DARPA Intrusion Detection Evaluation Program
Conference Paper
Anomaly detection has become one of the major areas of research with the tremendous development of computer networks. Any intrusion detection model should be able to visualize high-dimensional data with a high processing rate and accurate detection. Integrated intrusion detection models combine the advantages of a low false positive rate and a shorter detection time. Hence, this paper proposes an anomaly detection model that deploys consistency-based feature selection, a J48 decision tree, and a self-organizing map (SOM). Experimental analysis has been carried out on the KDD99 data set, and each of the features selected using the integrated mechanism has been able to identify the attacks in the data set.
Article
In 1998 and again in 1999, the Lincoln Laboratory of MIT conducted a comparative evaluation of intrusion detection systems (IDSs) developed under DARPA funding. While this evaluation represents a significant and monumental undertaking, there are a number of issues associated with its design and execution that remain unsettled. Some methodologies used in the evaluation are questionable and may have biased its results. One problem is that the evaluators have published relatively little concerning some of the more critical aspects of their work, such as validation of their test data. The appropriateness of the evaluation techniques used needs further investigation. The purpose of this article is to attempt to identify the shortcomings of the Lincoln Lab effort in the hope that future efforts of this kind will be placed on a sounder footing. Some of the problems that the article points out might well be resolved if the evaluators were to publish a detailed description of their procedures and the rationale that led to their adoption, but other problems would clearly remain. Categories and Subject Descriptors: K.6.5 (Management of Computing and Information
Conference Paper
Intrusion detection is important in network security. Most current network intrusion detection systems (NIDSs) employ either misuse detection or anomaly detection. However, misuse detection cannot detect unknown intrusions, and anomaly detection usually has a high false positive rate. To overcome the limitations of both techniques, we incorporate both anomaly and misuse detection into the NIDS. In this paper, we present our framework of the hybrid system. The system combines the misuse detection and anomaly detection components, in which the random forests algorithm is applied. We discuss the advantages of the framework and also report our experimental results over the KDD'99 dataset. The results show that the proposed approach can improve on the detection performance of NIDSs that use only anomaly detection or only misuse detection.
A Method to Improve accuracy of k-NN algorithm
  • Maryam Kuhkan
Maryam Kuhkan. (2006) "A Method to Improve accuracy of k-NN algorithm", IJCEIT 8(6): 90-95.
Pattern based network security using decision trees and support vector machine
  • V K Pachghare
  • P Kulkarni
Pachghare, V.K. and Kulkarni, P. (2011) "Pattern based network security using decision trees and support vector machine." In Electronics Computer Technology (ICECT), 2011 3rd International Conference on 5(1): 254-257.