Anomaly-Based Intrusion Detection Using Extreme Learning Machine and Aggregation of Network Traffic Statistics in Probability Space

Authors:
  • Nokia Bell Labs, Espoo, Finland

Abstract

With the increased use of network communication, the risk of information being compromised has grown immensely. Intrusions have become more sophisticated, and few methods remain effective while network behavior constantly changes. This paper proposes an intrusion detection system that combines modeling of the distributions of network statistics with an Extreme Learning Machine (ELM) to achieve high intrusion detection rates. The proposed model aggregates network traffic at the IP subnetwork level and collects distributions of statistics for the IPv4 addresses most frequently encountered as destinations. The resulting probability distributions are learned by the ELM. The model is evaluated on the ISCX-IDS 2012 dataset, which was collected using a real-time testbed, and is compared against leading approaches on the same dataset. Experimental results show that the presented method achieves an average detection rate of 91% and a misclassification rate of 9%. They also show that our methods significantly improve on the performance of the simple ELM, although with a trade-off between performance and time complexity. Furthermore, our methods perform well compared with the few other state-of-the-art approaches evaluated on the ISCX-IDS 2012 dataset.
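The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the /24 prefix length, the choice of bytes per flow as the statistic, the bin edges, the (destination, bytes) record format, and the names subnet_of and destination_distributions are all assumptions made for illustration. The sketch groups flows by destination subnetwork, keeps the most frequent destinations, and turns each group into a normalized histogram, which is the kind of probability-space feature a classifier such as an ELM could then be trained on.

```python
from bisect import bisect_right
from collections import defaultdict
from ipaddress import ip_network

# Assumed, illustrative settings (not taken from the paper).
PREFIX_LEN = 24                    # aggregate destinations into /24 subnetworks
BIN_EDGES = (0, 100, 1000, 10000)  # lower edges of byte-count bins; last bin is open-ended


def subnet_of(address, prefix_len=PREFIX_LEN):
    """Return the subnetwork (as a string) that contains an IPv4 address."""
    return str(ip_network(f"{address}/{prefix_len}", strict=False))


def destination_distributions(flows, top_k=2, bin_edges=BIN_EDGES):
    """Build a normalized histogram of bytes per flow for each of the
    top_k most frequent destination subnetworks.

    flows: iterable of (dst_ip, n_bytes) pairs, with n_bytes >= 0
           (a hypothetical record format).
    """
    by_subnet = defaultdict(list)
    for dst_ip, n_bytes in flows:
        by_subnet[subnet_of(dst_ip)].append(n_bytes)

    # Keep only the destinations that appear in the most flows.
    frequent = sorted(by_subnet, key=lambda s: len(by_subnet[s]), reverse=True)[:top_k]

    distributions = {}
    for subnet in frequent:
        counts = [0] * len(bin_edges)
        for n_bytes in by_subnet[subnet]:
            # Index of the bin whose lower edge is the largest one <= n_bytes.
            counts[bisect_right(bin_edges, n_bytes) - 1] += 1
        total = len(by_subnet[subnet])
        distributions[subnet] = [c / total for c in counts]
    return distributions


# Toy flow records, invented for this example.
flows = [
    ("10.0.0.5", 50), ("10.0.0.9", 150), ("10.0.0.5", 2000), ("10.0.0.7", 60),
    ("192.168.1.2", 500), ("192.168.1.3", 700),
    ("172.16.0.1", 40),
]
dists = destination_distributions(flows)
for subnet, probabilities in dists.items():
    print(subnet, probabilities)
```

Each histogram sums to one, so destinations with very different traffic volumes become comparable inputs. In practice the statistic, the binning, and the number of retained destinations would have to follow the choices made in the full paper.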
Cognitive Computation (2018) 10:848–863
https://doi.org/10.1007/s12559-018-9564-y
Buse Gul Atli¹ · Yoan Miche² · Aapo Kalliola² · Ian Oliver² · Silke Holtmanns² · Amaury Lendasse³
Received: 1 November 2017 / Accepted: 22 May 2018 / Published online: 5 June 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018
Keywords: Intrusion detection · Network behavior analysis · Probability density function · Hierarchical clustering · Extreme learning machine
Introduction
In recent years, advances in networking technology,
especially cloud services and the Internet of Things (IoT),
have created new businesses and connected the world by
converting it into a massive information system. This has
also drawn the attention of hackers, since more and more
personal and private information is stored on hosting
devices [6]. Security practices have therefore been the
focus of intense research, driven by the requirement for a
safe, secure environment.
Yoan Miche
yoan.miche@nokia-bell-labs.com
¹ Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland
² Nokia Bell Labs, Espoo, Finland
³ The University of Iowa, Iowa City, IA 52242, USA
Network behavior analysis (NBA) and intrusion detection
systems (IDS) play an important role in cybersecurity.
They act as additional defense layers that monitor the
network and detect intrusions when user identification and
authentication mechanisms fail to do so. Intrusion detection
systems recognize malicious activities and report them
by triggering an alert or logging the results [4].
Anomaly-based intrusion detection systems analyze network
events and capture security problems by finding unusual
activities that do not conform to the normal baseline. To
support anomaly detection systems, NBA tools are deployed
to capture, aggregate, and compare different network
behaviors [38].
Anomaly-based intrusion detection has been the focus
of intense research in recent years [24, 30]. Despite the
significant number of existing studies in this area, more
research is needed because attacks evolve continuously.
To address this, a practical intrusion detection system
should be able to update itself to detect novel and
stealthier attacks, and to handle large amounts of
streaming data [11, 37].