978-1-4577-1343-9/12/$26.00 ©2016 IEEE
A Majority Voting Technique for Wireless Intrusion
Detection Systems
Bandar Alotaibi and Khaled Elleithy
Computer Science and Engineering Department
University of Bridgeport
Bridgeport, CT 06604
balotaib@my.bridgeport.edu, elleithy@bridgeport.edu
Abstract—This article aims to build a misuse Wireless Local Area
Network Intrusion Detection System (WIDS), and to discover
the important fields in the WLAN MAC-layer frame that
differentiate attackers from legitimate devices. We tested
several machine-learning algorithms, and found some promising
ones to improve the accuracy and computation time on a public
dataset. The best performing algorithms that we found are Extra
Trees, Random Forests, and Bagging. We then used a majority
voting technique to vote on these algorithms. The Bagging
classifier and our customized voting technique have good results
(about 96.25% and 96.32% respectively) when tested on all the
features. We also used a data-mining technique based on the Extra
Trees ensemble method to find the most important features in the
public Aegean WiFi Intrusion Dataset (AWID). After
selecting the 20 most important features, Extra Trees and our
voting technique were the best performing classifiers in terms of
accuracy (96.31% and 96.32% respectively).
Keywords: WLAN; IDS; Data mining; Machine
learning; Attacks
I. INTRODUCTION
Wireless networks have dominated in recent years over the
wired networks that have been dominant for decades.
Nowadays, Wireless Local Area Networks (WLANs) are the
first choice for local area connectivity because of the mobility
and the low cost that they provide. Unfortunately, the mobility
and the low cost do not come free; they come with debatable
security. Some researchers suggest enhancing the security of
WLANs, but this requires either modification of existing
standards/protocols, or updates to existing wireless devices
such as Access Points (APs). External solutions that do not
require modification to standards and protocols such as
Intrusion Detection Systems (IDSs) have gained attention for
decades because of the immediate response to threats and the
possibility of eliminating intruders. Some IDSs are based
on predetermined signatures of familiar attacks, which are
stored in a database. The monitored frames are compared
with these signatures, and if a match is found, a
notification is raised immediately. On the other hand, data
mining or machine learning IDSs have an advantage because
they do not require predefined static signatures of known
attacks; detection can be learned automatically through
classification or clustering algorithms.
There are two types of IEEE 802.11 networks:
Infrastructure mode and Ad-hoc mode. In the Infrastructure
mode, the AP is the coordinator that manages the wireless users
and connects them to the wired side of the network. The
wireless users can connect to each other directly in the Ad-hoc
mode without the AP. This research concerns only the
Infrastructure mode, because the experiments in the data-set
that we use are conducted using that mode.
Upon the release of the first version of the 802.11 standard,
security methods were included to allow secure communication
between communicating parties. Wired Equivalent Privacy
(WEP) was adopted from the wired networks and found to be
unsuitable for wireless networks. Many weaknesses were found
which were related to availability and confidentiality of the
shared key, in particular. WiFi Protected Access (WPA) and
WPA2 were ratified to address the confidentiality weaknesses
found in WEP. WPA has been found to be
robust in comparison to WEP. However, like WEP, WPA and
WPA2 are vulnerable to Denial of Service (DoS) attacks,
which makes availability questionable. Moreover, with today's
computation power and the availability of cluster computing,
brute-forcing WPA passwords has become practical. For
instance, CloudCracker [1] is capable of trying 300,000,000
WPA passwords in less than 30 minutes.
There is a wide range of security measures in use, such as
encryption mechanisms, authentication methods, and access
control techniques, but many intrusions remain undetected.
Thus, there is a demand to automate the monitoring of WLAN
activities to detect intrusions. There are two known intrusion
detection methods: anomaly detection and misuse detection.
Anomaly detection identifies attacks through deviations from
normal behavior by the devices that generate these attacks.
Misuse detection recognizes suspicious activities by matching
them against patterns of previously known attacks. Anomaly
detection techniques are more likely to detect unknown
intrusions, but have a high false positive rate. On the other
hand, misuse detection techniques have a low false positive
rate, but unknown attacks could remain undetected. Several
IDSs are rule-based, in which case system performance
depends on the security experts who build the rules. Considering
the vast amount of WLAN traffic, building rules can be slow
and expensive. The rules have to be modified manually, and
applying new rules is a hard and time-consuming task. To
overcome these limitations, data-mining or
machine-learning techniques are used to discover important
patterns in large data sets. They can build intrusion patterns,
which can be used for misuse detection based on
classification, and can build profiles of normal behavior to
detect intrusions by anomaly detection. This paper
proposes a new misuse-detection framework based on machine
learning algorithms and a voting technique.
A. Frame Types in WLANs
There are three types of frames in WLANs: management,
control, and data.
a) Management Frames: Responsible for establishing
connection of wireless users and maintaining connection with
the AP. There are several management frame sub-types, each
having different responsibilities, such as authentication,
deauthentication, association request, association response,
disassociation, probe request, probe response, beacon, re-
association request, and re-association response.
b) Control Frames: Responsible for controlling the
WLAN medium to deliver data frames reliably from the
wireless users to the AP, and from the AP to the wireless
users. There are several control frame sub-types such as
Request to Send (RTS), Clear to Send (CTS),
Acknowledgment, and Power Save Poll. RTS and CTS are
exchanged by the communicating parties prior to sending the
data frame, which reduces the possibility of collisions caused
by a hidden terminal.
c) Data Frames: Responsible for transferring the actual
information from the upper layers. There are some sub-types
of data frames such as frames having quality of service
enhancements, sent on a contention based service, or frames
carrying more data.
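As an illustration of how the three frame types and their sub-types can be told apart, the following minimal sketch decodes the type and sub-type bits of the 16-bit Frame Control field. The bit positions and sub-type codes follow the 802.11 frame format; the function name, the lookup tables, and the convention that the field is passed as an integer read in little-endian order are assumptions made for this example and are not part of the proposed framework.

```python
# Illustrative sketch only: names and tables below are hypothetical helpers.
FRAME_TYPES = {0: "management", 1: "control", 2: "data"}

# A few of the sub-types named in the text, keyed by (type, sub-type).
SUBTYPES = {
    (0, 0): "association request",
    (0, 1): "association response",
    (0, 4): "probe request",
    (0, 5): "probe response",
    (0, 8): "beacon",
    (0, 10): "disassociation",
    (0, 11): "authentication",
    (0, 12): "deauthentication",
    (1, 10): "power save poll",
    (1, 11): "request to send",
    (1, 12): "clear to send",
    (1, 13): "acknowledgment",
    (2, 0): "data",
    (2, 8): "QoS data",
}


def parse_frame_control(fc: int):
    """Return (type name, sub-type name) for a Frame Control value.

    fc is the 16-bit field as an integer, e.g.
    int.from_bytes(frame[:2], "little").
    """
    frame_type = (fc >> 2) & 0b11    # bits 2-3 hold the frame type
    subtype = (fc >> 4) & 0b1111     # bits 4-7 hold the sub-type
    type_name = FRAME_TYPES.get(frame_type, "reserved")
    subtype_name = SUBTYPES.get((frame_type, subtype), f"sub-type {subtype}")
    return type_name, subtype_name


print(parse_frame_control(0x0080))  # a beacon frame
print(parse_frame_control(0x00B4))  # an RTS frame
```

Sub-types missing from the small table are reported by number, so the sketch does not fail on frames it does not know.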
The contributions of this research can be summarized as
follows:
1) We propose a new WLAN misuse Intrusion Detection
framework based on majority voting.
2) We apply a feature selection technique based on the
Extra Trees ensemble method to improve the accuracy
and, more importantly, to expedite detection time.
II. RELATED WORK
The authors of [2] used several light machine-learning
algorithms that could classify the four classes that they studied
for one of the reduced data-sets. The best performing classifier
was J48, with an accuracy of 96.19%, when using the full set of 156
features. This algorithm takes about 3921.68 seconds. The
authors then reduced the dimensionality of the data-set and
picked the best 20 features to improve accuracy and reduce
time. They were able to increase the accuracy of the best
performing algorithm to 96.2574% and decrease the time of
that algorithm by 568.92 seconds.
III. PROPOSED SOLUTION
The proposed framework (shown in Figure 1) uses several
machine-learning algorithms to build the patterns of both the
normal behavior and intrusions. In the offline stage the patterns
of the intrusions are built. In the online stage the intrusions are
classified based on their types. Prior to training, the framework
applies a feature selection capability to choose important
features and discard unwanted features. The training includes
algorithms that are fed into majority voting for robustness and
to improve performance. These algorithms are Extra Trees with
20 trees, Random Forests with 20 trees, and Bagging with 10
Decision Trees. After majority voting, the patterns are built by
the matching builder for normal samples and intrusions. Once
the builder creates the patterns, the patterns can be serialized
and fed into the detection capability. In the online stage, the
network traces are pre-processed using the features that have
been selected by the feature selection capability. After pre-
processing, the frames are fed into the detection utility for
online detection. The detection utility decides whether the
frame is suspicious or not. If it finds it a suspicious frame, the
alert is triggered.
Figure 1. The proposed framework. (Block labels: Dataset, Feature
Selection Capability, Bagging, Random Forests, Extra Trees, Voting,
Building Patterns, Serialization, Monitoring, Extraction +
Normalization, Misuse Detection, Notification. Stage labels:
Offline, Online.)
A. Bagging
The Tree Bagging algorithm was created by Leo Breiman
in 1996 [3]. The Bagging ensemble method consists of
a predetermined number of classification trees that can be
grown in parallel. These trees are grown from bootstrap
replications of the training set. The
randomization of the cut-points is accomplished implicitly
through the bootstrap re-sampling.
B. Random Forests
The Random Forests classifier was also introduced by
Breiman, in 2001 [4]. The Random Forests ensemble method
is constructed using collections of weakly-correlated decision
trees. A bootstrap sample of the training set is used to train
each tree in the forest. The best split is chosen at each node
from a random subset of the features. This procedure
makes each tree rely on a different mix of features from the
training samples, which helps reduce the statistical
correlation among the trees.
C. Extra Trees
Extra Trees was created by Geurts et al. in 2006 [5]. The
ensemble method uses the top-down procedure to construct
an ensemble of unpruned decision trees. The cut-points are
drawn fully at random for the candidate features, and the best
of these random splits is used to split the node. The Extra
Trees algorithm grows the trees using the entire learning
sample instead of a bootstrap replication.
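The three ensemble methods differ mainly in where the randomness enters. The following minimal sketch, which is not the implementation used in the experiments, isolates the three sources of randomness described above: the bootstrap replication (Bagging and Random Forests), the random feature subset examined at a node (Random Forests), and the random cut-point (Extra Trees). The function names, the toy data, and the subset size of 12 out of 156 features are assumptions chosen for illustration.

```python
import random


def bootstrap_sample(rows, rng):
    """Bagging / Random Forests: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, k, rng):
    """Random Forests: indices of the k candidate features for one node."""
    return rng.sample(range(n_features), k)


def random_cut_point(values, rng):
    """Extra Trees: a cut-point drawn uniformly between the smallest and
    largest value that a feature takes in the node's samples."""
    return rng.uniform(min(values), max(values))


rng = random.Random(0)             # fixed seed so the sketch is repeatable
rows = list(range(10))             # stand-in for ten training samples
sample = bootstrap_sample(rows, rng)
subset = random_feature_subset(156, 12, rng)   # 12 is a hypothetical choice
cut = random_cut_point([0.2, 1.5, 3.0], rng)   # toy feature values
print(sample, subset, cut)
```

A bootstrap replication usually repeats some rows and omits others, which is why Bagging needs no explicit cut-point randomization, whereas Extra Trees keeps every row and randomizes the cut-point directly.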
D. Majority Voting
Majority Voting is one of the most popular voting methods
along with Plurality Voting and Weighted Voting [6].
Majority Voting has been used by several researchers utilizing
the base classifiers to obtain better results. Combining
several classifiers has some advantages, such as increased
robustness, better accuracy, and stronger
generalization [7], [8], [9], [10]. Each base classifier
casts a vote for one class, and the final class label is
the one that receives more than half of the votes. If no
class label receives more than half of the votes, the
majority vote technique makes no prediction (i.e., a rejection
option is given), or the prediction of one of the base
classifiers is explicitly selected. In this article, we first
selected the best performing classifiers to get strong
generalization. Then, we used a majority voting technique to
get better accuracy.
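The voting rule described above can be sketched in a few lines. This is a minimal illustration, assuming each base classifier outputs one class label per frame; the function name and the `fallback` parameter are hypothetical and do not come from the framework's implementation.

```python
from collections import Counter


def majority_vote(votes, fallback=None):
    """Return the label that receives more than half of the votes.

    If no label has a strict majority, return `fallback`. Leaving it as
    None models the rejection option; passing the prediction of one base
    classifier models the explicit-selection option.
    """
    if not votes:
        return fallback
    label, count = Counter(votes).most_common(1)[0]
    if count > len(votes) / 2:
        return label
    return fallback


# Three base classifiers, two of which agree.
print(majority_vote(["normal", "flooding", "normal"]))        # normal
# Three-way disagreement: rejected.
print(majority_vote(["normal", "flooding", "injection"]))     # None
# Same disagreement, resolved by trusting the first classifier.
votes = ["impersonation", "flooding", "injection"]
print(majority_vote(votes, fallback=votes[0]))                # impersonation
```

With three base classifiers and four classes, a tie is possible only when all three disagree, so the fallback is needed in relatively few cases.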
E. Feature Selection
Some of the frame fields are not necessary for
distinguishing between the legitimate devices’ traces and the
attackers’ traces. Extracting unwanted features adds overhead
and might not improve the performance. Feature selection is a
valuable step in building IDSs, especially
machine-learning-based IDSs. The number of header features is
well-defined since it depends on the frame header, but many
other features can be added artificially to the frame metadata
when capturing the frames. However, only some frame fields are
crucial to detecting the intruders. Some machine learning
algorithms are highly sensitive to the number of features.
Choosing the significant features increases the performance of
the IDS and decreases time. Some researchers reported that
choosing the suitable features is difficult and time consuming.
The usual, error-prone way of choosing the right features is to
let a security expert decide which features are important. A
better way is to use a data mining approach to discover
important patterns in large data sets. It can build intrusion
patterns, which can be used for misuse detection techniques
based on classification, or can build profiles of normal
behavior to detect intrusions by anomaly-detection techniques.
Some information might obstruct the classification task,
especially in classification problems that consist of many
different and connected correlations. Spurious
interrelationships between features can degrade the detection
performance. Some features might be needless or redundant.
Furthermore, reducing the number of features could improve
computation time and the performance of the WIDS. It is
impossible for a human to discover the complex correlations
that exist between features. Feature selection is critical for real
time prediction performed by the WIDS, so reducing the
features is recommended. Reduction could be done using data
filtering with system expert supervision, or by data mining
techniques. The former might ignore useful data, so it has to
be done with caution.
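Once an ensemble such as Extra Trees has assigned an importance score to every feature, the reduction step itself is simple: rank the features by score and keep the top k. The sketch below shows only this ranking and projection step. The feature names and scores are placeholders, not values from the experiments, and the function names are hypothetical.

```python
def select_top_k(importances, k):
    """Return the names of the k features with the largest importance
    scores, ordered from most to least important."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def project(record, selected):
    """Keep only the selected features of one frame record (a dict)."""
    return {name: record[name] for name in selected}


# Placeholder scores for five made-up features (NOT results from the paper).
scores = {"f0": 0.05, "f1": 0.40, "f2": 0.00, "f3": 0.30, "f4": 0.25}
top3 = select_top_k(scores, 3)
print(top3)

frame = {"f0": 7, "f1": 1, "f2": 0, "f3": 54, "f4": 2}
print(project(frame, top3))
```

The same `project` step would be applied to the training set in the offline stage and to each captured frame in the online stage, so that both stages see an identical feature set.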
IV. RESULTS AND DISCUSSION
The only public data-set for WLANs that we know of was
introduced in [2]. The data-set has four parts:
two reduced data-sets for research on Wireless
Intrusion Detection Systems (WIDSs), and two full data sets
for big-data research. The two reduced data-sets consist of
four classes and fifteen classes, respectively. The four classes
are the categories that the launched attacks belong to
(flooding, injection, and impersonation) plus the normal class,
while the other reduced data-set is labeled with the names of the
launched attacks and the normal class. The number of training
samples in each reduced data-set is 1,795,575, and the number
of test samples is 575,643. The number of features is 156,
representing the WLAN frame fields along with physical layer
meta-data.
A. 802.11 Attacks
The attacks launched by the authors who published
the data-set were based on WEP, but most of the attacks share
the same characteristics under other security mechanisms. In this
subsection we explain the classes that are used in the
reduced data-set and how the 20 features were selected by the
data mining technique.
a) Injection Attacks: flood the wireless network with
encrypted data frames smaller in size than the normal frame.
ARP injection is an attack that the attacker
launches to speed up the process of collecting Initialization
Vectors (IVs) from the targeted wireless device or AP. Some
penetration testing tools (such as Aireplay) that are used to
launch these attacks reuse the same IV values, which cannot occur
under normal conditions. Also, the DS status flag is always set
to 1 for all the frames sent during ARP injection. Another
important attack is fragmentation, in which the attacker injects
small fragmented data frames. This attack usually takes about a
second if successful. Some of the penetration testing tools that
launch these attacks use a static invalid Destination Address.
The DS status flag is always set to 1, the frame length is small
but is not fixed, and the frames have out-of-order sequence
numbers.
b) Flooding Attacks: usually generate an increase in the
number of frames in a WLAN, management frames in
particular. However, it is not always valid to consider an
increased number of management frames as an indication of a
flooding attack; it could be an indication of a malfunction of
a certain device. Although the attacker can masquerade as a
legitimate device, it is much harder to hide the increase in
management frames produced by flooding attacks. For
example, a de-authentication attack is launched by some tools
using the same reason code, and has out-of-order sequence
numbers. Also, some tools, such as MDK3, that attackers use
to launch authentication flooding and beacon flooding attacks
use a sequence number that is always set to 0. Tools such as
Metasploit, used to launch probe response flooding attacks, use
a random sender address, which could have a valid 24-bit
number that identifies the vendor uniquely. This is known as the
Organizationally Unique Identifier (OUI).
c) Impersonation Attacks: masquerade as a legitimate device
in a WLAN by changing one or more of its characteristics.
The Evil-twin AP is one example, where the attacker
changes the MAC address and Service Set Identifier (SSID) of
the device to be the same as the MAC address and SSID of an
existing AP. Such attacks are always preceded by
de-authentication attacks, targeting wireless devices that are
connected to the targeted AP, to force them to reconnect
to the fake AP. This attack is launched by tools like Airbase,
which sends broadcast beacon frames with a fixed frame
length. Furthermore, in all impersonation attacks, the Received
Signal Strength (RSS) of the attacker is different from the
legitimate device RSS if there is a significant distance between
the two devices.
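The OUI mentioned under flooding attacks is simply the first 24 bits (three octets) of the 48-bit MAC address. The following minimal sketch extracts it and checks it against a set of known vendor prefixes. The function names, the example address, and the contents of the vendor set are assumptions for illustration. As noted above, a random sender address can still carry a valid OUI, so such a check is at most a weak hint and not a detection rule.

```python
def oui(mac: str) -> str:
    """Return the first three octets of a MAC address as upper-case hex."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(octet.upper().zfill(2) for octet in octets[:3])


def has_known_oui(mac: str, known_ouis) -> bool:
    """True if the address prefix appears in a set of vendor OUIs."""
    return oui(mac) in known_ouis


# Placeholder address and vendor list, invented for this example.
known = {"00:1A:2B"}
print(oui("00:1a:2b:3c:4d:5e"))
print(has_known_oui("00:1a:2b:3c:4d:5e", known))
print(has_known_oui("de:ad:be:ef:00:01", known))
```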
The best machine learning algorithms that we used in our
experiments are Decision Trees, Extra Trees, and Random
Forests. Decision Trees is not stable: we ran the test several
times and it gave different results every time. The three
classifiers did not achieve better results than the J48 classifier
that the authors of [2] used in their experiments. We decided to
use the Bagging classifier with a small number of Decision Trees
as base estimators to be more robust while keeping time low.
The Bagging classifier yields slightly better results and has
better timing. We then used the voting classifier that combines
Extra Trees with 20 trees, Random Forests with 20 trees, and the
Bagging classifier with 10 Decision Trees as base estimators, and
got better results and reduced time.
B. Bagging
We used the Decision Tree [11] introduced by Breiman et al. as
a base estimator to build the Bagging method. Ten trees were
used to minimize cost. Table I shows the confusion matrix of
the Bagging method. Among the three tested classifiers, it is the
most accurate classifier for the hardest class, which is the
impersonation class. On this class it is also slightly better
than our voting classifier (1471 versus 1470 occurrences
classified correctly). Bagging and Extra Trees are better than
the rest of the classifiers (including the voting technique) at
classifying the injection class: each classified 16680
occurrences correctly (i.e., misclassified only 2 occurrences).
Bagging is expensive in terms of time (about 154 seconds) in
comparison to the Random Forests and
Extra Trees ensemble methods. The overall accuracy of the
Bagging method is 96.25%, as shown in Table II. The accuracy
did not change when we used the reduced features, but the time
decreased to about 35.7 seconds, as shown in Table III.
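As a worked check, the overall accuracy can be recomputed from the Bagging confusion matrix in Table I. The sketch below assumes that each row of the table is the actual class and each column the predicted class (the usual reading of a "classified as" layout); this reading is an assumption, although the row sums then add up to the 575,643 test samples stated earlier. The helper names are hypothetical.

```python
LABELS = ["Normal", "Flooding", "Injection", "Impersonation"]

# Table I: rows = actual class, columns = predicted class (assumed reading).
BAGGING = [
    [530383, 343, 0, 59],
    [2585, 5512, 0, 0],
    [2, 0, 16680, 0],
    [18606, 2, 0, 1471],
]


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_class(matrix, labels):
    """Per-class share of actual samples that were classified correctly."""
    return {label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)}


print(f"samples  = {sum(sum(row) for row in BAGGING)}")   # 575643
print(f"accuracy = {accuracy(BAGGING):.4f}")              # 0.9625
for label, value in recall_per_class(BAGGING, LABELS).items():
    print(f"{label:<14}{value:.4f}")
```

The per-class figures make the remark about the impersonation class concrete: under this reading only 1471 of its 20,079 test samples are recognized, while the overall accuracy stays high because the normal class dominates the test set.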
C. Random Forests
We used 20 trees to build the ensemble because Random
Forests is lighter than the Bagging method. With the entire
feature set, the accuracy of Random Forests is the worst among
the tested methods (about 95.89%, as shown in
Table II). However, it is the best method for classifying
the flooding class. The training time is second only to the
Extra Trees classifier: about 22.4 seconds when using the entire
feature set, and 9.95 seconds when using the reduced feature set.
It is the algorithm that benefited most from reducing the
feature set in terms of accuracy, which rose from 95.89% to
96.31% after we applied the feature-selection technique.
TABLE I. BAGGING CONFUSION MATRIX
Normal    Flooding   Injection   Impersonation   Classified as
530383    343        0           59              Normal
2585      5512       0           0               Flooding
2         0          16680       0               Injection
18606     2          0           1471            Impersonation
TABLE II. ALL FEATURES RESULTS
                 Accuracy (%)   Time (s)
Extra Trees      96.06          18.1
Random Forests   95.89          22.4
Bagging          96.25          154
Voting           96.32          390
TABLE III. 20 FEATURES RESULTS
                 Accuracy (%)   Precision   Recall   Time (s)
Extra Trees      96.31          0.96        0.96     8.03
Random Forests   96.31          0.96        0.96     9.95
Bagging          96.25          0.98        0.96     35.7
Voting           96.32          0.96        0.96     107
TABLE IV. RANDOM FORESTS CONFUSION MATRIX
Normal    Flooding   Injection   Impersonation   Classified as
530775    6          0           4               Normal
2536      5561       0           0               Flooding
41        0          16641       0               Injection
18645     0          0           1434            Impersonation
D. Extra Trees
We also used 20 trees to build the ensemble of Extra Trees.
The overall accuracy of Extra Trees is 96.06% when we used
the whole feature set. It improved to 96.31% when we applied
the feature selection capability. Extra Trees has the best time
among the tested algorithms: about 18.1 seconds
when using the whole feature set and 8.03 seconds when we
applied the reduced feature set. Like the Bagging method, Extra
Trees classified 16680 occurrences of the injection class
correctly (as shown in Table V).
TABLE V. EXTRA TREES CONFUSION MATRIX
Normal    Flooding   Injection   Impersonation   Classified as
530773    2          0           10              Normal
2601      5496       0           0               Flooding
2         0          16680       0               Injection
18619     0          0           1460            Impersonation
E. Majority Voting
Majority Voting relies on the base classifiers. We chose
light classifiers to get better results and to be able to detect
intrusions in real time. As expected, it is the best method in
terms of accuracy (about 96.32%) when using the whole
feature set. It is expensive in time, however: about 390
seconds. It is the best method to classify the normal class. As
shown in Table VI, the
method correctly classified all normal occurrences as normal
(i.e., there is no false positive at all). It also maintained its
accuracy, remaining the best method in terms of accuracy when we
reduced the feature set. Time decreased significantly when we
reduced the feature set, from about 390 seconds with the full
feature set to about 107 seconds.
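The effect of feature reduction on time can be summarized as a speedup factor for each classifier, using only the times reported in Tables II and III. The short sketch below performs this arithmetic; the variable names are hypothetical and no new measurements are introduced.

```python
# (seconds with all 156 features, seconds with the 20 selected features),
# copied from Tables II and III.
TIMES = {
    "Extra Trees": (18.1, 8.03),
    "Random Forests": (22.4, 9.95),
    "Bagging": (154.0, 35.7),
    "Voting": (390.0, 107.0),
}

speedup = {name: full / reduced for name, (full, reduced) in TIMES.items()}

for name, factor in speedup.items():
    print(f"{name:<15} {factor:.2f}x faster with 20 features")
```

By this measure every classifier runs more than twice as fast on the reduced feature set, with Bagging and Voting gaining the most.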
TABLE VI. MAJORITY VOTING CONFUSION MATRIX
Normal    Flooding   Injection   Impersonation   Classified as
530778    0          0           0               Normal
2589      5508       0           0               Flooding
5         0          16677       0               Injection
18609     0          0           1470            Impersonation
F. Most Important 20 Features
Figure 2 shows the most important features selected by the
Extra Trees ensemble method. The 20 most important features
that have been selected are as follows:
1) Destination Address (DA) is the final destination of the
data frame.
2) Sub-type is the field in the frame control header that
identifies the purpose of the frame within its type. For
instance, if the type of the frame is control, the sub-type
field could be one of the possible sub-types such as CTS, RTS,
Ack and so on.
3) Seq: every 802.11 frame has a sequence number, except
control frames. The sequence number is incremented by one
from 0 to 4,095 every consecutive frame.
4) Transmitter Address (TA) is the address of the device that
transmits the frame, which is either the original sender of
the frame (i.e., the wireless user) or the
intermediate device that forwards the frame to the final
destination (i.e., the AP).
5) Duration field identifies the time required to transmit
the frame in microseconds.
6) Receiver Address (RA) is the first device that receives
the data frame. It could be the AP in the path to the final
destination or the device that receives the frame which is the
final destination.
7) Type.cck (Complementary Code Keying) is a
modulation scheme that is adopted to achieve high data rates.
8) fc.ds is the distribution system status field that indicates
which direction the frame is going to.
9) Pwrmgt indicates if the station is either going to change
its status to power save mode or can receive frames.
10) frame-len indicates the length of the frame on the wire.
11) Datarate specifies the supported data rate.
12) wep.icv (Integrity Check Value) is a 4-byte value that is
calculated over the frame and attached to it.
13) reason c: there are some reasons to be indicated when
sending a deauthentication frame such as station is leaving or
disassociated due to inactivity.
14) wep.iv (WEP Initialization Vector) is a 24-bit value that
is sent in the clear. It is different for each encrypted frame and
is concatenated with the fixed root key.
15) Type has to be one of data, control, or management.
16) Bssid is the MAC address of the AP.
17) Source Address (SA) is the address of the frame originator.
18) RSS is the Received Signal Strength (RSS) of the sender
measured at the receiver.
19) Protected indicates the encryption method that is used
by the WLAN network.
20) wep.key is the Wired Equivalent Privacy (WEP) key, a
hexadecimal number that encrypts messages between a group of
connected devices in a WLAN. WEP supports two key sizes:
40 bits and 104 bits.
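Several of the attack traits described earlier (out-of-order sequence numbers, a sequence number fixed at 0) are deviations from the normal behavior of the Seq field in item 3, which counts from 0 to 4,095 and then wraps around. The following minimal sketch counts such deviations in a list of sequence numbers. It assumes the numbers come from a single transmitter and that the monitor captured every frame with no retransmissions; both are simplifying assumptions, and the function names are hypothetical.

```python
SEQ_MODULUS = 4096  # sequence numbers run from 0 to 4,095, then wrap


def next_seq(seq: int) -> int:
    """Sequence number expected on the frame that follows `seq`."""
    return (seq + 1) % SEQ_MODULUS


def count_out_of_order(seqs) -> int:
    """Count frames whose number is not the successor of the previous one."""
    return sum(1 for prev, cur in zip(seqs, seqs[1:]) if cur != next_seq(prev))


print(count_out_of_order([4094, 4095, 0, 1]))  # 0: wrap-around is in order
print(count_out_of_order([0, 0, 0, 0]))        # 3: number stuck at 0
print(count_out_of_order([5, 9, 10]))          # 1: one jump
```

In live traffic, lost or retransmitted frames also produce gaps, so a count of this kind is better treated as one input feature for the classifiers than as an alarm on its own.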
G. Data-set Limitations
1) The data-set applies only to the WEP encryption method, and
some of the features are WEP-dependent. The majority of the
attacks in the data-set can be applied to other security
standards (such as WPA, WPA2, and the 802.11w
amendment), but some of them are WEP-specific.
2) Most of the attacks are launched by specific
penetration testing tools to build patterns of the
intrusions. Attackers might use different existing or
customized tools to exploit some of the holes and
bypass the IDS.
3) The data-set does not consider the mobility of the attacker.
Figure 2. The most important 20 features
V. CONCLUSION
We improved the accuracy and the time on the AWID data-set
using a classifier that votes on the output of three carefully
picked classifiers: Extra Trees, Random Forests, and
Bagging with ten Decision Trees as base estimators. This
performs well in both accuracy and time. The best performing
classifier is the voting classifier, which reached an accuracy of
96.32% in about 390 seconds when we used all the features.
We also used a data mining technique based on the Extra Trees
ensemble method to choose the best 20 features to decrease
time and improve accuracy of the best performing classifiers.
With these features we maintained the same accuracy and reduced
the time to about 107 seconds.
REFERENCES
[1] Thoughtcrime Labs. CloudCracker. Feb. 2016. URL:
https://www.cloudcracker.com.
[2] Kolias, C., Kambourakis, G., Stavrou, A., & Gritzalis, S. Intrusion
Detection in 802.11 Networks: Empirical Evaluation of Threats and a
Public Dataset.
[3] Breiman, L. (1996). Bagging predictors. Machine learning, 24(2), 123-
140.
[4] Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.
[5] Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized
trees. Machine learning, 63(1), 3-42.
[6] Zhou, Z. H. (2012). Ensemble methods: foundations and algorithms.
CRC Press.
[7] Chebrolu, S., Abraham, A., & Thomas, J. P. (2005). Feature deduction
and ensemble design of intrusion detection systems. Computers &
Security, 24(4), 295-307.
[8] Peddabachigari, S., Abraham, A., Grosan, C., & Thomas, J. (2007).
Modeling intrusion detection system using hybrid intelligent
systems. Journal of network and computer applications, 30(1), 114-132.
[9] Zainal, A., Maarof, M. A., & Shamsuddin, S. M. (2009). Ensemble
classifiers for network intrusion detection system. Journal of
Information Assurance and Security, 4(3), 217-225.
[10] Oza, N. C., & Tumer, K. (2008). Classifier ensembles: Select real-world
applications. Information Fusion, 9(1), 4-20.
[11] Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984).
Classification and regression trees. CRC press.
... Machine learning techniques have been used in IDS design [9]- [12]. In [9] the authors propose a misuse intrusion detection framework for a wireless local area network based on majority voting that differentiates between attacker and legitimate node patterns by examining mac-layer frames. ...
... Machine learning techniques have been used in IDS design [9]- [12]. In [9] the authors propose a misuse intrusion detection framework for a wireless local area network based on majority voting that differentiates between attacker and legitimate node patterns by examining mac-layer frames. Their system uses several machine learning techniques where the best performing classifiers are chosen to get strong generalization. ...
... They perform security analysis by considering false positive rate and detection rate. These works cited above [9]- [12] all utilize machine learning techniques for providing host-level intrusion detection. Our work is different in that we aim to provide system-level intrusion detection in the form of IDS voting by which each voting node selected (which can be good or bad) reports its host-level intrusion detection outcome as input. ...
Article
Full-text available
In this paper, we develop a methodology to capture and analyze the interplay of attack-defense strategies for intrusion detection in an autonomous distributed Internet of Things (IoT) system. In our formulation, every node must participate in lightweight intrusion detection of a neighbor target node. Consequently, every good node would play a set of defense strategies to faithfully defend the system while every bad node would play a set of attack strategies for achieving their own goals. We develop an analytical model based on Stochastic Petri Net (SPN) modeling techniques. Our methodology allows the optimal defense strategies to be played by good nodes to maximize the system lifetime when given a set of parameter values characterizing the distributed IoT system operational environment. We conduct a detailed performance evaluation based on an experiment dataset deriving from a reference autonomous distributed IoT system comprising 128 sensor-carrying mobile nodes and show how IDS defense mechanisms can counter malicious attack mechanisms under the ADIoTS system while considering multiple failure conditions.
... Com base no método de aprendizado de máquina chamado Extra Tree, os autores selecionam 20 atributos, os quais são ordenados por nível de importância. Tal seleção é referenciada também no trabalho [15], do mesmo autor, onde é alcançada uma acurácia de 96.32%. Para a avaliação, os autores propõem o uso de um conjunto de classificadores, baseando-se em votação majoritária entre os mesmos. ...
... Nesse contexto, ao passo que a seleção de atributos para ataques convencionais é amplamente discutida na literatura[12][13][14], a análise do impacto dos atributos capazes de identificar ataques atuais, tais como os que envolvem dispositivos da Internet das Coisas, ainda é pouco explorada pela comunidade acadêmica. Existem, contudo, algumas propostas que consideram a seleção de diferentes conjuntos de atributos para a detecção de intrusões por meio da análise de atributos coletados de redes Wi-Fi[2][6][7][8][15]. Não há, porém, uma consolidação na seleção dos atributos a serem analisados, de modo a explorar o máximo desempenho dos IDSs. ...
Conference Paper
Full-text available
Resumo-Os Sistemas de Detecção de Intrusão (IDS) utilizam o mecanismo de seleção de atributos durante o processo de classificação de ameaças ou eventos de intrusão. Uma seleção adequada permite que o IDS processe somente os atributos relevantes para a classificação. Atualmente, com bilhões de novos dispositivos e objetos ingressando na Internet das Coisas (IoT), o papel da seleção de atributos ganha maior relevância devido as restrições de recursos impostas nesse ambiente. Este trabalho investiga diferentes conjuntos de atributos propostos na literatura para detectar ataques de personificação, quando se falsifica entidades legítimas da rede. A avaliação de desempenho do IDS considera os requisitos impostos pela IoT, enfatizando o papel da seleção de atributos. Os resultados indicam uma variação de até 49,99% na acurácia para os diferentes conjuntos de atributos, mesmo com a escolha do melhor classificador para cada conjunto. Adicionalmente, uma oscilação de até 85,43% foi observada no tempo de processamento. A melhor acurácia obtida foi de 99,99%, com uma redução de até 65,04% do tempo necessário para processamento.
... The study concluded that random tree classifier was the winner under various performance metrics.
[38,39,69,75,83,90] 6
Not mentioned [28,125,127,144,157] 5
Weighted sum voting [70,133] 2
Majority voting [43,55,56,68,70,77,84,86,91,92,101,106,109,113,117,135,140,143] 18
Maximum probability voting [68,106,135] 3
Product probability voting [68,135] 2
Sum probability voting [76] 1
Minimum probability voting [68,145] 2
Median probability voting [106,145] 2
Bayesian [98] 1 ...
... The proposed method improved classification accuracy when compared to other classifiers. Alotaibi and Elleithy [117] built a misuse wireless local area network IDS. The proposed method used a majority voting to vote the class predictions of extra trees, RF, and bagging. ...
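The majority vote over the class predictions of Extra Trees, Random Forest, and Bagging mentioned in the snippet above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `majority_vote`, the tie-breaking rule (prefer the label from the earliest-listed classifier), and the sample labels are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` is a list with one inner list per classifier; every
    inner list holds that classifier's predicted label for each sample.
    Ties are broken in favour of the earliest-listed classifier
    (an assumption; the source does not state a tie-breaking rule).
    """
    if not predictions:
        raise ValueError("at least one classifier is required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same samples")

    voted = []
    for labels in zip(*predictions):  # labels for one sample, in classifier order
        counts = Counter(labels)
        top = max(counts.values())
        # First label (in classifier order) that reaches the top count.
        voted.append(next(label for label in labels if counts[label] == top))
    return voted


# Hypothetical predictions from three already-trained classifiers.
extra_trees = ["normal", "flooding", "injection", "normal"]
random_forest = ["normal", "flooding", "normal", "impersonation"]
bagging = ["normal", "normal", "injection", "flooding"]

result = majority_vote([extra_trees, random_forest, bagging])
print(result)
```

In the last sample all three classifiers disagree, so the outcome depends entirely on the assumed tie-breaking rule; a real system might instead fall back to the most accurate base classifier.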
Article
Intrusion detection systems (IDSs) are intrinsically linked to a comprehensive set of cyberattack prevention instruments. To achieve a higher detection rate, an improved detection framework is sought after, particularly when utilizing ensemble learners. Designing an ensemble involves two main challenges: the choice of available base classifiers and the choice of combiner methods. This paper gives an overview of how ensemble learners are exploited in IDSs by means of a systematic mapping study. We collected and analyzed 124 prominent publications from the existing literature. The selected publications were then mapped into several categories such as years of publication, publication venues, datasets used, ensemble methods, and IDS techniques. Furthermore, this study reports and analyzes an empirical investigation of a new classifier ensemble approach, called stack of ensemble (SoE), for anomaly-based IDS. The SoE is an ensemble classifier that adopts a parallel architecture to combine three individual ensemble learners (random forest, gradient boosting machine, and extreme gradient boosting machine) in a homogeneous manner. The performance differences among classification algorithms are statistically examined in terms of their Matthews correlation coefficients, accuracies, false positive rates, and area under the ROC curve. Our study fills the gap in the current literature concerning an up-to-date systematic mapping study, as well as an extensive empirical evaluation of recent advances in ensemble learning techniques applied to IDSs.
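For reference, the Matthews correlation coefficient named among the metrics above has the following standard binary-classification form. The abstract does not spell it out, so the symbols TP, TN, FP, and FN (true/false positives and negatives) are introduced here as assumptions.

```latex
\mathrm{MCC} =
  \frac{TP \cdot TN - FP \cdot FN}
       {\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it a common choice for imbalanced intrusion datasets.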
... But in our approach, both considerations are well addressed: with an optimal feature subset, a commendable model accuracy was also obtained.
[22] Neural networks 11
Alotaibi et al. [23] Voting 154
Thing et al. [24] Deep learning 154 ...
Article
Digitization has given us a huge amount of data that includes sensitive information. Enterprises are trying hard to secure the data in terms of confidentiality, integrity, and authenticity. One means used by various organizations to prevent unauthorized access is the intrusion detection system. This area remains active in research, as the number of anomalies caused by intruders is growing exponentially on a regular basis. This demands effective algorithms and techniques that can detect, and make better decisions about, increasingly recent attacks. A few machine-learning-based approaches exist in the literature that can be improved to reduce false alarms. We have carried out extensive experimentation on the AWID dataset to obtain better results on DoS attacks. We have used an embedded ridge-based reduction approach and an ensemble classifier that gave us 99.94% accuracy.
... Alotaibi et al. [33] established a voting-based IDS. For each time period's node in the IoT system, one of the three pre-trained intrusion detection algorithms was selected for active intrusion detection. ...
Article
With the advent of the “Internet plus” era, the Internet of Things (IoT) is gradually penetrating into various fields, and the number of IoT devices is growing explosively. The age of the “Internet of Everything” is coming. The integration and diversification of IoT terminals and applications make the IoT more vulnerable to various intrusion attacks. Therefore, it is particularly important to design an intrusion detection model that guarantees the security, integrity, and reliability of the IoT. Traditional intrusion detection technology suffers from a low detection rate and poor scalability, and so cannot adapt to the complex and changeable IoT environment. In this paper, we propose a particle swarm optimization-based gradient descent model (PSO-LightGBM) for intrusion detection. In this method, PSO-LightGBM extracts the features of the data and feeds them into a one-class SVM (OCSVM) to discover and identify malicious data. The UNSW-NB15 dataset is used to verify the intrusion detection model. The experimental results show that the proposed model is very robust in detecting both normal and various malicious data, especially small-sample classes such as Backdoor, Shellcode, and Worms.
Chapter
The exponential increase in technology-enabled devices is creating new horizons for research and development. The telecom revolution has taken a massive step in providing services through the 5G network, which has brought inter-device communication to new heights of information exchange. 5G networks underpin recent advances in the field of the Internet of Things. Inter-device communication through the network is prone to attacks in many ways. The intrusion detection system (IDS) has played a crucial role in avoiding and detecting attacks at a basic level. An IDS can be active or passive. It uses various methods to detect suspicious activities, such as signature-based or anomaly-based detection, and operates at different levels, namely network intrusion detection and host intrusion detection. The machine learning domain has opened up various dynamic ways to handle network intruders. This chapter sheds light on different machine learning algorithms and models, both supervised and unsupervised, capable of improving how intruders in communication networks are detected and handled. The chapter first gives a brief introduction to IoT and 5G technologies. It then describes various ML algorithms and IDS systems. The chapter highlights various works carried out in the field through a thorough literature review. It also discusses existing and forecasted challenges for inter-device communication through 5G for IoT.
Article
Anomaly‐based intrusion detection plays a crucial role in providing security to computer networks. However, some concerns still exist about the sustainability and feasibility of existing approaches over modern networks. Specifically, these concerns relate to the growing number of features collected from a network, which makes it difficult to achieve high detection accuracy. This article presents a novel and efficient feature reduction scheme for fault‐based intrusion detection that efficiently handles the large feature set from a network. Monarch butterfly optimization is used with a new correlation‐based fitness function to reduce features. A twin support vector machine is used for classification in the proposed scheme. The proposed method is analyzed and evaluated with the NSL‐KDD, CICIDS2017, AWID network security, and standard fault network datasets. The obtained results are verified using standard performance measures, that is, overall detection accuracy, F‐measure, FPR, and AUC score, along with time complexity. The results show the superiority and stability of the proposed scheme over the existing techniques. A novel feature reduction technique for efficient anomaly‐based intrusion detection in wireless networks is proposed. The proposed feature reduction utilizes the linear correlation coefficient as the fitness function.
Chapter
The primary challenge in choosing the right electrification approach across the globe is understanding the local energy resource potential. In this paper, the result of a solar resource potential assessment of East Gojjam (EG) Zone, Ethiopia is presented. The solar insolation, an important parameter in designing and planning solar photovoltaic systems, at four meteorological stations of EG (viz. Debre Markos, Debrewerq, Mota and Yetnora) is estimated from sunshine hours and extraterrestrial radiation. Bright sunshine hour data covering eleven years were collected from the National Meteorological Agency Bahir Dar Branch. These data are prepared and used to estimate the solar insolation using the well-known linear Ångström-Prescott (A-P) model. The site-specific A-P model is adopted by using regression coefficients, ‘a’ and ‘b’, which are obtained from well-known empirical formulas. The empirical formulas were validated using the measured data from other sites in the region. The annual mean daily solar insolation (kWh/m²/day) for Debre Markos, Debrewerq, Mota, and Yetnora is estimated to be 5.47, 7.05, 6.11, and 6.16, respectively. According to the monthly solar insolation profile, EG receives the highest and lowest solar insolation in April at Debrewerq and July at Debre Markos, respectively. The solar insolation profile at Debre Markos shows significant variability, while Debrewerq receives more uniform solar radiation throughout the year; therefore, the latter site is the most suitable for solar photovoltaic energy investments, with the highest and most uniform clearness index profile throughout the year.
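The linear A-P model referred to above is usually written in the following standard form. Only the regression coefficients a and b are named in the abstract; the symbols H, H_0, n, and N are assumptions introduced here for illustration.

```latex
\frac{H}{H_0} = a + b\,\frac{n}{N}
```

Here H is the daily global solar radiation on a horizontal surface, H_0 the extraterrestrial radiation, n the measured hours of bright sunshine, and N the maximum possible sunshine duration (day length). The ratio H/H_0 is the clearness index mentioned at the end of the abstract.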
Article
Vehicular networks are susceptible to various attacks from malicious nodes within a network. A collaborative misbehavior detection system can be used to detect these attacks. However, in such a system, an attacker may send false feedback, which affects the detection accuracy. A trust model can be used to encourage vehicles to send true feedback, but an attacker can take advantage of reputation update methods that are too weak or too strong. A dynamic trust model can address this. In this paper, we propose a deep reinforcement learning based dynamic reputation update. In the proposed method, feedback from vehicles is combined in vehicular edge computing (VEC) servers using Dempster-Shafer theory, and the results are used to predict the average number of true messages. The VEC server then uses deep reinforcement learning to determine the optimum reputation update policy for encouraging vehicles to send true feedback. In addition, through extensive simulations, we show that the proposed dynamic reputation policy yields a higher average number of true feedback messages than the existing reputation update policy.
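Combining feedback with Dempster-Shafer theory, as the abstract above describes, typically relies on Dempster's rule of combination. The sketch below shows that rule for two mass functions over a two-element frame. It is an illustration under stated assumptions, not the paper's method: the frame {"true", "false"}, the function name `dempster_combine`, and the mass values are all hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass;
    the masses in each function are expected to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += mass_a * mass_b
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the remaining masses sum to 1.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


# Hypothetical frame: was the reported message true or false?
TRUE = frozenset({"true"})
FALSE = frozenset({"false"})
EITHER = TRUE | FALSE  # mass left uncommitted (ignorance)

# Hypothetical feedback from two vehicles about the same message.
report_a = {TRUE: 0.6, FALSE: 0.1, EITHER: 0.3}
report_b = {TRUE: 0.5, FALSE: 0.2, EITHER: 0.3}

fused = dempster_combine(report_a, report_b)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

When the two reports largely agree, the fused belief in "true" ends up higher than in either individual report, which is the behaviour a server aggregating feedback would want.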
Article
Network intrusion detection remains a challenging research area as it involves learning from large-scale imbalanced multiclass datasets. While machine learning algorithms have been widely used for network intrusion detection, most standard techniques cannot achieve consistent good performance across multiple classes. In this paper we proposed a novel ensemble system based on the modified adaptive boosting with area under the curve (M-AdaBoost-A) algorithm to detect network intrusions more effectively. We combined multiple M-AdaBoost-A-based classifiers into an ensemble by employing various strategies, including particle swarm optimization. To the best of our knowledge, this study is the first to utilize the M-AdaBoost-A algorithm, which incorporates the area under the curve into the boosting process for addressing class imbalance in network intrusion detection. Compared with existing standard techniques, our proposed ensemble system achieved superior performance across multiple classes in both 802.11 wireless intrusion detection and traditional enterprise intrusion detection.
Article
WiFi has become the de facto wireless technology for achieving short- to medium-range device connectivity. While early attempts to secure this technology have been proved inadequate in several respects, the current more robust security amendments will inevitably get outperformed in the future, too. In any case, several security vulnerabilities have been spotted in virtually any version of the protocol rendering the integration of external protection mechanisms a necessity. In this context, the contribution of this paper is multifold. First, it gathers, categorizes, thoroughly evaluates the most popular attacks on 802.11 and analyzes their signatures. Second, it offers a publicly available dataset containing a rich blend of normal and attack traffic against 802.11 networks. A quite extensive first-hand evaluation of this dataset using several machine learning algorithms and data features is also provided. Given that to the best of our knowledge the literature lacks such a rich and well-tailored dataset, it is anticipated that the results of the work at hand will offer a solid basis for intrusion detection in the current as well as next-generation wireless networks.
Article
Two of the major challenges in designing anomaly intrusion detection are to maximize detection accuracy and to minimize the false alarm rate. To address these challenges, this paper proposes an ensemble of one-class classifiers, each adopting a different learning paradigm. The techniques deployed in this ensemble model are Linear Genetic Programming (LGP), Adaptive Neural Fuzzy Inference System (ANFIS), and Random Forest (RF). The strengths of the individual models were evaluated and an ensemble rule was formulated. Prior to classification, a 2-tier feature selection process was performed to expedite the detection process. Empirical results show an improvement in detection accuracy for all classes of network traffic: Normal, Probe, DoS, U2R, and R2L. Random Forest, an ensemble learning technique that generates many classification trees and aggregates the individual results, was also able to address the imbalanced dataset problem that many machine learning techniques fail to address sufficiently.
Book
An up-to-date, self-contained introduction to a state-of-the-art machine learning approach, Ensemble Methods: Foundations and Algorithms shows how these accurate methods are used in real-world tasks. It gives you the necessary groundwork to carry out further research in this evolving field. After presenting background and terminology, the book covers the main algorithms and theories, including Boosting, Bagging, Random Forest, averaging and voting schemes, the Stacking method, mixture of experts, and diversity measures. It also discusses multiclass extension, noise tolerance, error-ambiguity and bias-variance decompositions, and recent progress in information theoretic diversity. Moving on to more advanced topics, the author explains how to achieve better performance through ensemble pruning and how to generate better clustering results by combining multiple clusterings. In addition, he describes developments of ensemble methods in semi-supervised learning, active learning, cost-sensitive learning, class-imbalance learning, and comprehensibility enhancement.
Article
The process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusion is known as intrusion detection, and a system that performs it is an intrusion detection system (IDS). This paper presents two hybrid approaches for modeling IDS. Decision trees (DT) and support vector machines (SVM) are combined as a hierarchical hybrid intelligent system model (DT–SVM) and as an ensemble approach combining the base classifiers. The hybrid intrusion detection model combines the individual base classifiers and other hybrid machine learning paradigms to maximize detection accuracy and minimize computational complexity. Empirical results illustrate that the proposed hybrid systems provide more accurate intrusion detection systems.