Article

Abstract

Classification-oriented Machine Learning methods are a precious tool, in modern Intrusion Detection Systems (IDSs), for discriminating between suspected intrusion attacks and normal behaviors. Many recent proposals in this field leveraged Deep Neural Network (DNN) methods, capable of learning effective hierarchical data representations automatically. However, many of these solutions were validated on data featuring stationary distributions and/or large amounts of training examples. By contrast, in real IDS applications different kinds of attack tend to occur over time, and only a small fraction of the data instances is labeled (usually with far fewer examples of attacks than of normal behavior). A novel ensemble-based Deep Learning framework is proposed here that tries to face the challenging issues above. Basically, the non-stationary nature of IDS log data is handled by maintaining an ensemble consisting of a number of specialized base DNN classifiers, trained on disjoint chunks of the data instances' stream, plus a combiner model (reasoning on both the base classifiers' predictions and the original instance features). In order to learn deep base classifiers effectively from small training samples, an ad-hoc shared DNN architecture is adopted, featuring a combination of dropout capabilities and skip connections, along with a cost-sensitive loss (for dealing with unbalanced data). Test results, conducted on two benchmark IDS datasets and involving several competitors, confirmed the effectiveness of our proposal (in terms of both classification accuracy and robustness to data scarcity), and allowed us to evaluate different ensemble combination schemes.
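To make the architectural ingredients concrete, here is a minimal sketch, in PyTorch, of a base classifier in the spirit described above: a small residual MLP with dropout, trained with a class-weighted (cost-sensitive) loss. All layer sizes, names and hyper-parameters are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch (not the authors' code): a residual MLP with dropout,
# trained with a cost-sensitive (class-weighted) loss against imbalance.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim: int, p_drop: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        # Skip connection: output = f(x) + x
        return torch.relu(self.net(x) + x)

class BaseDNN(nn.Module):
    def __init__(self, n_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.inp = nn.Linear(n_features, hidden)
        self.blocks = nn.Sequential(ResidualBlock(hidden), ResidualBlock(hidden))
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):
        return self.out(self.blocks(torch.relu(self.inp(x))))

# Cost-sensitive loss: weight each class inversely to its frequency,
# so rare attack classes contribute more to the gradient.
def class_weights(y: torch.Tensor, n_classes: int) -> torch.Tensor:
    counts = torch.bincount(y, minlength=n_classes).float().clamp(min=1)
    return counts.sum() / (n_classes * counts)

y = torch.randint(0, 2, (512,))            # toy labels, class 1 = attack
x = torch.randn(512, 20)                   # toy feature vectors
model = BaseDNN(n_features=20, n_classes=2)
loss_fn = nn.CrossEntropyLoss(weight=class_weights(y, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                         # a few toy epochs
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```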




... To face the challenges above and try to overcome the limitations of some state-of-the-art approaches in the current literature, we propose an unsupervised, DL-based model for detecting DoS and DDoS attacks in a NIDS scenario. Our solution encompasses two main steps: (i) an ad-hoc preprocessing phase in which features with a certain degree of correlation are removed, and several additional features are created by applying non-linear functions to the original input (similarly to what was proposed in [12]); and (ii) an unsupervised learning phase in which a hybrid architecture combining Sparse AEs and U-Net-like models [13] (from now on referred to as Sparse U-Net) is trained against legitimate traffic only and then used to reveal the presence of abnormal (possibly attack-related) behaviors. In particular, following the intuition of [14], a data augmentation strategy is exploited in our neural architecture to mitigate the risk of overfitting and yield more reliable models. ...
... Finally, the Additional Feature Generation component is based on the idea proposed in [12] of enriching the original input vector with a fixed number of additional features. The additional features are computed by applying several distinct non-linear functions to each original feature. ...
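As an illustration of this idea, the sketch below appends non-linear transforms of every original feature to the input vector. The concrete function set (square root, square, log) is an assumption made here for illustration; the excerpt does not list the functions actually used in [12].

```python
# Illustrative sketch of "additional feature generation": enrich each input
# vector with non-linear transforms of every original feature.
import numpy as np

def extend_features(X: np.ndarray) -> np.ndarray:
    """Append sqrt(|x|), x^2 and log(1+|x|) for each original column."""
    funcs = (lambda v: np.sqrt(np.abs(v)),
             np.square,
             lambda v: np.log1p(np.abs(v)))
    extra = [f(X) for f in funcs]
    return np.concatenate([X] + extra, axis=1)

X = np.random.randn(4, 5)          # 4 toy flows, 5 original features
X_ext = extend_features(X)
print(X_ext.shape)                 # (4, 20): 5 original + 3 x 5 derived
```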
... In future works, we plan to investigate the usage of semi-supervised architectures to take advantage of both the better capability of unsupervised techniques in recognizing zero-day attacks and the higher accuracy achieved by supervised methods. Moreover, since computer networks are evolving environments in which concept drift and data shift phenomena are expected to occur, we are considering the possibility of integrating our approach into an incremental ensemble learning scheme similar to that proposed in [12]. ...
Conference Paper
In the last few years, we have experienced exponential growth in the number of cyber-attacks performed against companies and organizations. In particular, because of their ability to mask themselves as legitimate traffic, DoS and DDoS have become two of the most common kinds of attacks on computer networks. Modern Intrusion Detection Systems (IDSs) represent a precious tool to mitigate the risk of unauthorized network access, as they allow for accurately discriminating between benign and malicious traffic. Among the plethora of approaches proposed in the literature for detecting network intrusions, Deep Learning (DL)-based IDSs have proven to be an effective solution because of their ability to analyze low-level data (e.g., flow and packet traffic) directly. However, many current solutions require large amounts of labeled data to yield reliable models. Unfortunately, in real scenarios, only small portions of the data carry label information, due to the cost of manual labeling conducted by human experts. Labels can even be completely missing for some reason (e.g., privacy concerns). To cope with the lack of labeled data, we propose an unsupervised DL-based intrusion detection methodology, combining an ad-hoc preprocessing procedure on input data with a sparse U-Net-like autoencoder architecture. The experimentation on an IDS benchmark dataset substantiates our approach's ability to recognize malicious behaviors correctly.
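A minimal sketch of the general recipe this abstract describes is given below: an autoencoder with a sparsity penalty, trained on benign traffic only, with anomalies flagged by reconstruction error. This is not the paper's Sparse U-Net; sizes, penalty weight and threshold are illustrative assumptions.

```python
# Sketch (assumptions, not the paper's architecture): sparse autoencoder
# trained on benign-only traffic; flows whose reconstruction error exceeds
# a quantile threshold are flagged as anomalous.
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, n_features: int, code: int = 8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                 nn.Linear(32, code), nn.ReLU())
        self.dec = nn.Sequential(nn.Linear(code, 32), nn.ReLU(),
                                 nn.Linear(32, n_features))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

benign = torch.randn(1024, 20)             # toy benign-only training set
model, l1 = SparseAE(20), 1e-4             # l1: sparsity penalty weight
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):
    recon, z = model(benign)
    # Reconstruction loss plus an L1 penalty on the code (the "sparse" part).
    loss = nn.functional.mse_loss(recon, benign) + l1 * z.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    recon, _ = model(benign)
    err = ((recon - benign) ** 2).mean(dim=1)
    threshold = torch.quantile(err, 0.99)  # tolerate ~1% false alarms

def is_attack(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        recon, _ = model(x)
        return ((recon - x) ** 2).mean(dim=1) > threshold
```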
... A BiGAN-inspired model combined with a custom loss function is adopted for identifying intrusions in computer networks. In [14], the authors propose a supervised incremental Deep Learning scheme to cope with concept drifts and data shifts: a number of Residual Neural Networks (ResNets) are trained against disjoint data chunks gathered in different time windows. Then, the individual models are combined into an ensemble model, which is further fine-tuned on a subset of data extracted from each data chunk. ...
... Then, the individual models are combined into an ensemble model, which is further fine-tuned on a subset of data extracted from each data chunk. As shown in [20], the performance of different ML-based detection methods (including [14]) can be further improved by embedding them in an Active Learning scheme. A different solution to the problem proposes the usage of the Federated Learning framework: the underlying idea of this approach is to distribute the computation of the model among different nodes, which are the owners of the data. ...
Chapter
In recent times, Machine Learning has played an important role in developing novel advanced tools for threat detection and mitigation. Intrusion Detection, Misinformation, Malware, and Fraud Detection are just some examples of cybersecurity fields in which Machine Learning techniques are used to reveal the presence of malicious behaviors. However, Out-of-Distribution data, i.e., a potential distribution gap between the training and test sets, can heavily affect the performance of traditional Machine Learning based methods. Indeed, these methods could fail to identify out-of-distribution samples as possible threats, so devising robust approaches to cope with this issue is a crucial and relevant challenge for mitigating the risk of undetected attacks. Moreover, a recently emerging line of research proposes the use of generative models to yield synthetic, likely examples to feed the learning algorithms. In this work, we first survey recent Machine Learning and Deep Learning based solutions to both problems, i.e., outlier detection and generation; then we illustrate the main cybersecurity application scenarios in which these approaches have been adopted successfully. Keywords: Threat Detection, Outlier Generation, Deep Learning, Ensemble Learning, Generative Model, Autoencoder
... Folino et al. [18] created a novel ensemble-based deep learning framework for analyzing non-stationary data like intrusion detection system logs. When employing ensemble learners, the capacity to construct a superior detection structure is essential in order to achieve a higher detection rate; when putting together an ensemble, one of the most difficult problems is choosing from the available base classifiers and combiners. ...
... University of Sindh Journal of Information and Communication Technology (USJICT) Vol. 6(4), pp. 8–18 ...
Article
Full-text available
This paper explores distributed denial of service (DDoS) attacks, their current threat level, and intrusion detection systems (IDSs), one of the key techniques for mitigating them. It focuses on the problems and issues that IDSs encounter while detecting DDoS attacks, as well as the difficulties and obstacles they face nowadays when integrating with artificial intelligence systems. These ID systems enable the automatic and real-time identification of harmful threats. However, the network requires a highly sophisticated security solution due to the frequency with which malicious threats emerge and change. A significant amount of research is required to create an intelligent and trustworthy identification system; for research purposes, numerous ID datasets are freely accessible. Due to the ever-evolving attack detection mechanisms and the complexity of malicious attacks, publicly available ID datasets must be updated frequently. A Convolutional Neural Network (CNN) was trained using four distinct training algorithms and tested on the CICDDoS2019 dataset, which contains the most recent DDoS attack types. According to the analysis, the "Gradient Descent with Momentum Backpropagation" algorithm could be trained quickly. Network data attacks were correctly detected 93.1 percent of the time. The results indicate that the Convolutional Neural Network can successfully support the detection of DDoS attacks in intrusion detection systems (IDSs), as evidenced by the high accuracy values obtained.
... Those ML algorithms are typically binary classifiers, which aim at distinguishing between normal and attack-related behavior by processing feature values. This has proven to be very effective for detecting a wide variety of attacks, and over the last two decades it has originated a huge number of research papers and industrial applications [42], [43], [44], [45], [46], [47], [48] that have the potential to improve the security attributes of ICT systems. However, researchers and practitioners have to craft intrusion detectors for specific systems, network interfaces and attack models, to name a few constraints. ...
... Their performance is then evaluated and compared against potential competitors, and then the detection system is deployed and put into operation. This is a consolidated flow that has been proven effective in many studies [42], [44], [45], [46], [47], [48]. ...
Chapter
Full-text available
Exercising Machine Learning (ML) algorithms to detect intrusions is nowadays the de-facto standard for data-driven detection tasks. This activity requires the expertise of the researchers, practitioners, or employees of companies, who also have to gather labeled data to learn and evaluate the model that will then be deployed into a specific system. Reducing the expertise and time required to craft intrusion detectors is a tough challenge, which in turn would have an enormous beneficial impact in the domain. This paper conducts an exploratory study that aims at understanding to what extent it is possible to build an intrusion detector that is general enough to be learned once and then applied to different systems with minimal to no effort. Therefore, we recap the issues that may prevent building general detectors and propose software architectures that have the potential to overcome them. Then, we perform an experimental evaluation using several binary ML classifiers and a total of 16 feature learners on 4 public attack datasets. Results show that a model learned on a dataset or a system does not generalize well as-is to other datasets or systems, showing poor detection performance. Instead, building a unique model that is then tailored to a specific dataset or system may achieve good classification performance, requiring less data and far less expertise from the final user. Keywords: Intrusion detection, General model, Transferability, Machine learning, Feature learning
... In addition, many researchers have recently used advanced boosting technology to process long-tailed data, combining it with Deep Neural Network (DNN) architectures to improve classification by taking advantage of the good generalization performance of advanced boosting. Folino et al. [20] adopted an ad-hoc shared DNN architecture, featuring a combination of dropout capabilities and skip connections, along with a cost-sensitive loss, to efficiently learn deep base classifiers from minority-class samples. This was also the first attempt to combine a block-based learning scheme with DNN ensemble techniques to handle long-tailed classification tasks, and the experimental results confirm the feasibility and application prospects of the method. Bedi et al. [21] proposed an improved Siam-IDS (I-SiamIDS) algorithm-level method, using a collection of binary eXtreme Gradient Boosting, Siamese Neural Network and Deep Neural Network models to improve the system's effectiveness in detecting intrusion attacks in an unbalanced network environment, but its computational time overhead is very large. ...
... The deep learning-based techniques in the literature [18,19] reconstruct the data well but do not introduce new features for the data, which is the biggest difference from GANs. In the literature [20][21][22], the methods based on advanced boosting and DNNs introduce the idea of modularization, and the scope of changes to the model structure is small; however, because they combine multiple models, they often increase the computational cost exponentially. In the literature [23][24][25][26][27][28][29][30], the methods based on GANs introduce new features while adding minority-class samples, which effectively improves the classification effect. ...
Article
Full-text available
With the rapid development and application of the mobile Internet, it is necessary to analyze and classify mobile traffic to meet the needs of users. Because some application data is difficult to collect, mobile traffic data exhibits a long-tailed distribution, resulting in a decrease in classification accuracy. In addition, the original GAN is difficult to train and is prone to "mode collapse". Therefore, this paper introduces a self-attention mechanism and gradient normalization into the auxiliary classifier generative adversarial network to form the SA-ACGAN-GN model, which addresses the long-tailed distribution and training stability problems of mobile traffic data. The method first converts the traffic into images; second, to improve the quality of the generated images, the self-attention mechanism is introduced into the ACGAN model to obtain the global geometric features of the images; finally, a gradient normalization strategy is added to SA-ACGAN to further improve the data augmentation effect and the training stability. Cross-validation experiments show that, using the same classifier, the proposed SA-ACGAN-GN algorithm achieves the best precision (93.8%) among the compared algorithms; after adding gradient normalization, the classification loss decreases rapidly during training and the loss curve fluctuates less, indicating that the proposed method not only effectively mitigates the long-tail problem of the dataset, but also enhances the stability of model training.
... The designed novel FLbHTDN has five layers: input layer, hidden layer, classification layer, parameter tuning phase, and output layer, as described in fig. 3. The proposed FLbHTDN approach has been designed based on the frog leaping algorithm and the deep neural model (DNM) (Folino et al. 2021). The fitness of the frog is updated in all layers to tune the parameters for reaching better results. ...
... The performance of the developed novel FLbHTDN method has been analyzed against other existing works, namely the Deep Belief Model (DBM) (Wang et al. 2021), Recurrent Neural Model (RNM) (Imrana 2021), Deep Neural Model (DNM) (Folino et al. 2021), and Convolutional Neural Model (CNM) (Mendonça 2021). The existing models were implemented on the same Java platform, and the results are discussed as follows. ...
Preprint
Full-text available
Nowadays, Internet-of-Things (IoT) facilities are used worldwide in all digital applications. Hence, maintaining the security of the IoT communication system is crucial for the IoT to advance further. However, harmful attacks can destroy security and degrade the IoT communication channel by flooding network traffic and causing system shutdowns and collapses. The present work introduces a novel Frog Leap-based Hyper-parameter Tuned Deep Neural (FLbHTDN) model to overcome these issues and detect intrusion in the IoT communication paradigm. The dataset called Nsl-Kdd has been utilized to validate the proposed model. Initially, a preprocessing step removes errors from the training dataset. Subsequently, the features present in the dataset are tracked, and the malicious features are extracted and classified into specific attack classes. The designed model is executed on the Java platform, and the improvement offered by the developed technique has been validated through comparative analysis. The proposed FLbHTDN approach obtained the best attack prediction score in less time than the compared models.
... For the purpose of illustration, we assume that the TDS layer includes EBIDS (Ensemble Based IDS) [37], a ML-based Intrusion Detection technique adopting specialized ensembles of classification models to identify undetected attacks by analyzing traffic flow statistics extracted from network logs. Here, we use pcap format to share network flow information, but the proposed security event object, described in the previous section, is flexible and allows for supporting data shared in other formats. ...
... , TDS_N. These instances are initialized by using the same parameters described in [37]. For the architecture of the base model, (i) the Extended Input layer produces the transformations √ ...
Article
Sharing threat events and Indicators of Compromise (IoCs) enables quick and crucial decision making relative to effective countermeasures against cyberattacks. However, current threat information sharing solutions do not allow easy communication and knowledge sharing among threat detection systems (in particular Intrusion Detection Systems (IDSs)) exploiting Machine Learning (ML) techniques. Moreover, the interaction with the expert, who represents an important component for gathering verified and reliable input data for the ML algorithms, is weakly supported. To address all these issues, ORISHA, a platform for ORchestrated Information SHaring and Awareness enabling cooperation among threat detection systems and other information awareness components, is proposed here. ORISHA is backed by a distributed Threat Intelligence Platform based on a network of interconnected Malware Information Sharing Platform instances, which enables communication with several Threat Detection layers belonging to different organizations. Within this ecosystem, Threat Detection Systems mutually benefit from sharing knowledge that allows them to refine their underlying predictive accuracy. Uncertain cases, i.e. examples with low anomaly scores, are proposed to the expert, who acts as an oracle in an Active Learning scheme. By interfacing with a honeynet, ORISHA allows for enriching the knowledge base with further positive attack instances and thus yielding robust detection models. Experimentation conducted on a well-known Intrusion Detection benchmark demonstrates the validity of the proposed architecture.
... , D_{i−k}, respectively), which are fed with data instances gathered in different temporal intervals. Specifically, the incremental learning process adopted in our solution is loosely inspired by the work (Folino et al., 2021) proposing an ensemble-based deep learning approach trained on disjoint data chunks. Differently from this work, where each model is trained independently, in our solution the model M_i is trained (i.e., fine-tuned) from the weights of the model M_{i−1} and the sample D_i. ...
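A compact sketch of this fine-tuning step, using hypothetical names (model M_{i-1}, chunk D_i) and PyTorch as an assumed framework, might look as follows:

```python
# Sketch of the incremental scheme described above (hypothetical names):
# model M_i starts from the weights of M_{i-1} and is fine-tuned on chunk
# D_i, instead of being trained independently as in Folino et al. (2021).
import copy
import torch

def finetune_next(model_prev, chunk_x, chunk_y, loss_fn, lr=1e-4, epochs=3):
    model_i = copy.deepcopy(model_prev)    # inherit M_{i-1}'s weights
    # Small learning rate: adapt to the new chunk without forgetting.
    opt = torch.optim.Adam(model_i.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model_i(chunk_x), chunk_y).backward()
        opt.step()
    return model_i
```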
Article
Full-text available
Modern IoT ecosystems are the preferred target of threat actors wanting to incorporate resource-constrained devices within a botnet or leak sensitive information. A major research effort is then devoted to create countermeasures for mitigating attacks, for instance, hardware-level verification mechanisms or effective network intrusion detection frameworks. Unfortunately, advanced malware is often endowed with the ability of cloaking communications within network traffic, e.g., to orchestrate compromised IoT nodes or exfiltrate data without being noticed. Therefore, this paper showcases how different autoencoder-based architectures can spot the presence of malicious communications hidden in conversations, especially in the TTL of IPv4 traffic. To conduct tests, this work considers IoT traffic traces gathered in a real setting and the presence of an attacker deploying two hiding schemes (i.e., naive and “elusive” approaches). Collected results showcase the effectiveness of our method as well as the feasibility of deploying autoencoders in production-quality IoT settings.
... Wang Zhendong et al. [4] proposed an IGWO-BP neural network detection model to address the problem that error backpropagation neural networks are initialized with random values and tend to fall into local optima during training. Folino F. et al. [5] proposed a new deep learning framework based on ensemble learning in an attempt to solve the problem of data imbalance. ...
Article
Full-text available
Using deep learning and machine learning techniques for network intrusion detection is of great significance for enhancing the defense capability of network security systems. Generative adversarial networks produce samples that approximately follow the input data distribution while remaining randomly distributed within a bounded interval. Motivated by this property, and in response to the insufficient classification performance and missed detections caused by imbalances in the categories and quantities of network intrusion traffic, and given that existing classification algorithms for unbalanced traffic data still leave room for improvement, this paper proposes a network intrusion detection strategy based on auxiliary classifier generative adversarial networks. Data expansion experiments are conducted with the intrusion detection dataset NSL-KDD. The data are classified into twenty-three categories before and after the expansion by binary classification validation. The results show that expanding the generated samples for unbalanced network traffic data significantly improves the subsequent recognition. Finally, five classification performance index verification experiments are conducted. The results prove that the proposed strategy performs better in accuracy, precision, recall and F-value, and is capable of obtaining a large number of features from limited samples and inferring the complete data distribution from fewer features. The model as a whole has stronger generalization ability and defense effect.
... A total of 98% of these datasets are regarded as normal, whereas the remaining 2% are categorized as attacks [11]. Folino et al. [12] suggested a novel deep-learning model based on ensemble learning for interpreting non-stationary datasets such as IDS logs. Being able to construct a better detection system is desirable, especially when utilizing ensemble classifiers. ...
Article
Full-text available
Attacks on networks are currently the most pressing issue confronting modern society. Network risks affect all networks, from small to large. An intrusion detection system must be present for detecting and mitigating hostile attacks inside networks. Machine Learning and Deep Learning are currently used in several sectors, particularly information security, to design efficient intrusion detection systems. These systems can quickly and accurately identify threats. However, because malicious threats emerge and evolve regularly, networks need an advanced security solution. Hence, building an intrusion detection system that is both effective and intelligent is one of the most prominent research issues. There are several public datasets available for research on intrusion detection. Because of the complexity of attacks and the continually evolving attack detection methods, publicly available intrusion databases must be updated frequently. A convolutional recurrent neural network is employed in this study to construct a deep-learning-based hybrid intrusion detection system that detects attacks over a network. To boost the efficiency and predictive ability of the intrusion detection system, the convolutional neural network performs the convolution to collect local features, while a deep-layered recurrent neural network extracts the features in the proposed Hybrid Deep-Learning-Based Network Intrusion Detection System (HDLNIDS). Experiments are conducted using publicly accessible benchmark CICIDS-2018 data to determine the effectiveness of the proposed system. The findings of the research demonstrate that the proposed HDLNIDS outperforms current intrusion detection approaches with an average accuracy of 98.90% in detecting malicious attacks.
... The F-score is twice the product of precision and recall divided by their sum. The equation for the F1-score is presented in Equation (16). ...
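For reference, the standard definition the excerpt paraphrases is:

```latex
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
```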
Article
Full-text available
Concept drift (CD) in data streaming scenarios such as networking intrusion detection systems (IDS) refers to the change in the statistical distribution of the data over time. There are five principal variants related to CD: incremental, gradual, recurrent, sudden, and blip. Genetic programming combiner (GPC) classification is an effective core candidate for data stream classification for IDS. However, its basic structure relies on the usage of traditional static machine learning models that receive one-time training, limiting its ability to handle CD. To address this issue, we propose an extended variant of the GPC using three main components. First, we replace existing classifiers with alternatives: online sequential extreme learning machine (OSELM), feature adaptive OSELM (FA-OSELM), and knowledge preservation OSELM (KP-OSELM). Second, we add two new components to the GPC, specifically, a data balancing and a classifier update. Third, the coordination between the sub-models produces three novel variants of the GPC: GPC-KOS for KP-OSELM; GPC-FOS for FA-OSELM; and GPC-OS for OSELM. This article presents the first data stream-based classification framework that provides novel strategies for handling CD variants. The experimental results demonstrate that both GPC-KOS and GPC-FOS outperform the traditional GPC and other state-of-the-art methods, and the transfer learning and memory features contribute to the effective handling of most types of CD. Moreover, the application of our incremental variants to real-world datasets (KDD Cup ‘99, CICIDS-2017, CSE-CIC-IDS-2018, and ISCX ‘12) demonstrates improved performance (GPC-FOS in connection with CSE-CIC-IDS-2018 and CICIDS-2017; GPC-KOS in connection with ISCX2012 and KDD Cup ‘99), with maximum accuracy rates of 100% and 98% by GPC-KOS and GPC-FOS, respectively. Additionally, our GPC variants do not show superior performance in handling blip drift.
... Another chunk-based learning scheme is presented in [21]. The authors use disjoint time-delimited chunks of the training dataset for training a series of DNN classifiers. ...
Preprint
Full-text available
The network security analyzers use intrusion detection systems (IDSes) to distinguish malicious traffic from benign ones. The deep learning-based IDSes are proposed to auto-extract high-level features and eliminate the time-consuming and costly signature extraction process. However, this new generation of IDSes still suffers from a number of challenges. One of the main issues of an IDS is facing traffic concept drift which manifests itself as new (i.e., zero-day) attacks, in addition to the changing behavior of benign users/applications. Furthermore, a practical DL-based IDS needs to be conformed to a distributed architecture to handle big data challenges. We propose a framework for adapting DL-based models to the changing attack/benign traffic behaviors, considering a more practical scenario (i.e., online adaptable IDSes). This framework employs continual deep anomaly detectors in addition to the federated learning approach to solve the above-mentioned challenges. Furthermore, the proposed framework implements sequential packet labeling for each flow, which provides an attack probability score for the flow by gradually observing each flow packet and updating its estimation. We evaluate the proposed framework by employing different deep models (including CNN-based and LSTM-based) over the CIC-IDS2017 and CSE-CIC-IDS2018 datasets. Through extensive evaluations and experiments, we show that the proposed distributed framework is well adapted to the traffic concept drift. More precisely, our results indicate that the CNN-based models are well suited for continually adapting to the traffic concept drift (i.e., achieving an average detection rate of above 95% while needing just 128 new flows for the updating phase), and the LSTM-based models are a good candidate for sequential packet labeling in practical online IDSes (i.e., detecting intrusions by just observing their first 15 packets).
... Also, Folino et al. [16] combined four base DNN classifiers, trained on disjoint chunks of the data instances' stream, with a meta-classifier that uses both the base classifiers' predictions and the original instance features for training and for the final prediction task. Experimental results on two datasets showed that the proposed ensemble model is robust and scalable enough to act as a methodological basis for intelligent systems analyzing streaming IDS data. ...
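A compact analogue of such a combiner, built with scikit-learn rather than the authors' framework, is stacking with passthrough enabled, so the meta-learner sees both the base predictions and the raw features:

```python
# Not the paper's exact setup: StackingClassifier with passthrough=True feeds
# the meta-learner the base classifiers' predictions concatenated with the
# original input features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)    # imbalanced toy IDS data
base = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                          random_state=0)),
]
combiner = StackingClassifier(
    estimators=base,
    final_estimator=LogisticRegression(max_iter=1000),
    passthrough=True)        # original features also reach the combiner
combiner.fit(X, y)
print(combiner.score(X, y))
```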
... On the other hand, deep neural models that currently yield accurate decisions in several cybersecurity domains (e.g., [4]- [7]), perform as black-box models, while easier-to-explain models are becoming increasingly desirable in several domains. Several eXplainable Artificial Intelligence (XAI) techniques [8] have been recently explored to produce explanations of decisions of deep neural models also in cybersecurity applications [9]. ...
... On the other hand, the alternative is to gain a great deal more knowledge about a single non-ensemble system. An ensemble system can be more efficient in terms of total accuracy improvement by spreading the same increase in computing, storage, or communication resources among two or more methods, rather than increasing the resource usage of a single method (Folino et al., 2021). ...
Article
Full-text available
In this paper, we have looked at how easy it is for users in an organisation to be given different roles, as well as how important it is to make sure that their tasks are performed well using predictive analytical tools. To this end, an ensemble of classification and regression trees linked to a Neural Network was adopted for evaluating the effectiveness of role-based tasks associated with an organizational unit. A Human Resource Management System was designed and developed to obtain comprehensive information about employees' performance levels, as well as to ascertain their capabilities, skills, the tasks they perform and how they perform them. Datasets were drawn from evaluations of the system and used for machine learning evaluation. Linear regression models, decision trees, and Genetic Algorithms have proven to be good at prediction in all cases. In this way, the research findings highlight the need to ensure that users' tasks are done in a timely way, as well as to enhance an organization's ability to assign individual duties.
... Despite having a high computational complexity, ensemble-based techniques offer a high level of accuracy compared to the base models. An ensemble-based DNN framework for the continuous analysis of intrusion detection data, with the ability to learn hierarchical data representations automatically, has been proposed in Folino et al. (2021). Here, for a log stream of an intrusion detection system, an ensemble is maintained that contains classifiers trained on disjoint chunks of the data instance stream, plus a combiner model. ...
Preprint
Full-text available
Wireless Sensor Networks (WSNs) are a promising technology with enormous applications in almost every walk of life. One of the crucial applications of WSNs is intrusion detection and surveillance at border areas and in defense establishments. Border areas stretch for hundreds to thousands of miles; hence, it is not possible to patrol the entire border region. As a result, an enemy may enter from any point in the absence of surveillance and cause the loss of lives or destroy military establishments. WSNs can be a feasible solution for the problem of intrusion detection and surveillance at border areas. Detection of an enemy at border areas and nearby critical areas such as military cantonments is a time-sensitive task, as a delay of a few seconds may have disastrous consequences. Therefore, it becomes imperative to design systems that are able to identify and detect the enemy as soon as it comes within range of the deployed system. In this paper, we have proposed a deep learning architecture based on a fully connected feed-forward Artificial Neural Network (ANN) for the accurate prediction of the number of k-barriers for fast intrusion detection and prevention. We have trained and evaluated the feed-forward ANN model using four potential features, namely the area of the circular region, the sensing range of sensors, the transmission range of sensors, and the number of sensors, for Gaussian and uniform sensor distributions. These features are extracted through Monte Carlo simulation. In doing so, we found that the model accurately predicts the number of k-barriers for both Gaussian and uniform sensor distributions, with a correlation coefficient (R = 0.78) and Root Mean Square Error (RMSE = 41.15) for the former, and R = 0.79 and RMSE = 48.36 for the latter. Further, the proposed approach outperforms the other benchmark algorithms in terms of accuracy and computational time complexity.
... As a kind of combinational optimization learning method, ensemble learning can efficiently solve practical application problems [15]. Related studies have shown that simply training several neural networks and integrating their predictions can significantly improve the performance of neural networks [16][17]. However, ensemble learning has seldom been applied to diagnose lung-related diseases. ...
Article
Deep learning based analyses of computed tomography (CT) images contribute to the automated diagnosis of COVID-19, and ensemble learning may commonly provide a better solution. Here, we propose an ensemble learning method that integrates several component neural networks to jointly diagnose COVID-19. Two ensemble strategies are considered: combining the output scores of all component models with weights adjusted adaptively by cost-function back propagation, and a voting strategy. A database containing 8,347 CT slices of COVID-19, common pneumonia and normal subjects was used as the training and testing sets. Results show that the novel method can reach a high accuracy of 99.37% (recall: 0.9981, precision: 0.9893), an increase of about 7% in comparison to single-component models. The average test accuracy is 95.62% (recall: 0.9587, precision: 0.9559), a corresponding increase of 5.2%. Compared with several of the latest deep learning models on the identical test set, our method achieved an accuracy improvement of up to 10.88%. The proposed method may be a promising solution for the diagnosis of COVID-19.
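The score-combination strategy can be illustrated with a toy sketch in which each component model's class scores are fused with weights derived from validation accuracy; this is a simplified stand-in for the paper's back-propagation-based weight adaptation.

```python
# Toy sketch of score-level ensembling with validation-accuracy-derived
# weights (a stand-in for the paper's adaptive weight adjustment).
import numpy as np

def weighted_ensemble(score_list, val_accuracies):
    w = np.exp(np.asarray(val_accuracies))
    w = w / w.sum()                        # softmax-style normalization
    return sum(wi * s for wi, s in zip(w, score_list))

# Three component models' softmax outputs for 2 samples, 3 classes:
scores = [np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]),
          np.array([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2]]),
          np.array([[0.5, 0.4, 0.1], [0.1, 0.7, 0.2]])]
fused = weighted_ensemble(scores, val_accuracies=[0.95, 0.91, 0.88])
print(fused.argmax(axis=1))                # ensemble predictions
```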
... Intrusion detection is an important problem for both physical space [20][21][22] and cyberspace [23,24], for safety and security reasons. There are numerous previous studies attempting to address the difficulties of railway clearance intrusion detection. ...
Article
Full-text available
The efficiency and the effectiveness of railway intrusion detection are crucial to the safety of railway transportation. Most current methods of railway intrusion detection or obstacle detection are inappropriate for large-scale applications due to their high cost or limited coverage. In this study, we present a fast and low-cost solution to intrusion detection of high-speed railways. As the solution to heavy computational burdens in the current convolutional-neural-network-based detection methods, the proposed method is mainly a novel neural network based on the SSD framework, which includes a feature extractor using an improved MobileNet and a lightweight and efficient feature fusion module. In addition, aiming to improve the detection accuracy of small objects, the feature map weights are introduced through convolution operation to fuse features at different scales. TensorRT is employed to optimize and deploy the proposed network in the low-cost embedded GPU platform, NVIDIA Jetson TX2, to enhance the efficiency. The experimental results show that the proposed methods achieved 89% mAP on the railway intrusion detection dataset, and the average processing time for a single frame was 38.6 ms on the Jetson TX2 module, which satisfies the need of real-time processing.
... • Raw traffic — network traffic observed at an observation point, such as a line to which the probe is attached, an Ethernet-based LAN, or the ports of a switch or router [10]. • Flow (also called traffic flow (e.g., [10]), network connection (e.g., [11]), or internet stream (e.g., [12])) — raw network traffic grouped according to the same properties, usually the 5-tuple: source and destination IP address, source and destination port number, and type of service. • Session — a bi-directional flow. ...
Article
Full-text available
The enormous growth of services and data transmitted over the internet, the bloodstream of modern civilization, has caused a remarkable increase in cyber attack threats. This fact has forced the development of methods of preventing attacks. Among them, an important and constantly growing role is that of machine learning (ML) approaches. Convolutional neural networks (CNN) belong to the hottest ML techniques that have gained popularity, thanks to the rapid growth of computing power available. Thus, it is no wonder that these techniques have started to also be applied in the network traffic classification domain. This has resulted in a constant increase in the number of scientific papers describing various approaches to CNN-based traffic analysis. This paper is a survey of them, prepared with particular emphasis on a crucial but often disregarded aspect of this topic—the data transformation schemes. Their importance is a consequence of the fact that network traffic data and machine learning data have totally different structures. The former is a time series of values—consecutive bytes of the datastream. The latter, in turn, are one-, two- or even three-dimensional data samples of fixed lengths/sizes. In this paper, we introduce a taxonomy of data transformation schemes. Next, we use this categorization to describe various CNN-based analytical approaches found in the literature.
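A typical transformation of the kind the survey categorizes maps a raw byte stream onto a fixed-size 2-D sample. The sketch below (all sizes are arbitrary choices, not taken from a specific surveyed paper) pads or truncates a flow's bytes and reshapes them into a grayscale "image" a CNN can consume.

```python
# Illustrative byte-stream-to-image transformation for CNN-based traffic
# classification: truncate/zero-pad to a fixed length, reshape, normalize.
import numpy as np

def bytes_to_image(payload: bytes, side: int = 28) -> np.ndarray:
    n = side * side
    buf = np.frombuffer(payload[:n], dtype=np.uint8)
    buf = np.pad(buf, (0, n - buf.size))   # zero-pad short flows
    return (buf.reshape(side, side) / 255.0).astype(np.float32)

img = bytes_to_image(b"\x16\x03\x01" * 300)   # toy TLS-like byte stream
print(img.shape, img.min(), img.max())        # (28, 28), values in [0, 1]
```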
... They found that XGBoost outperforms SVM-, NB-, and RF-based IDSs. The authors of [27] and [28] used deep-learning-based NIDS models, which are more resource-consuming and complex network intrusion detection systems. ...
... Misuse intrusion detection is also called feature-based intrusion detection. The basic principle is to collect a large number of network intrusion characteristics and establish a network intrusion signature database by building a misuse detection model [6]. The detection process can be simply understood as comparing the status of the monitored network data with the established network intrusion signature database to determine whether the current network behavior is abnormal. ...
Article
Full-text available
Intrusion Detection System (IDS) is an important part of ensuring network security. When the system faces network attacks, it can identify the source of threats in a timely and accurate manner and adjust strategies to prevent hackers from intruding. An efficient IDS can identify external threats well, but traditional IDS has poor performance and low recognition accuracy. To improve the detection rate and accuracy of IDS, this paper proposes a novel ACGA-BPNN method based on an adaptive clonal genetic algorithm (ACGA) and a backpropagation neural network (BPNN). ACGA-BPNN is simulated on the KDD-CUP’99 and UNSW-NB15 data sets. The simulation results indicate that, in contrast to the methods based on simulated annealing (SA) and the genetic algorithm (GA), the detection rate and accuracy of ACGA-BPNN are much higher than those of GA-BPNN and SA-BPNN. In the classification results on KDD-CUP’99, the classification accuracy of ACGA-BPNN is 11% higher than GA-BPNN and 24.2% higher than SA-BPNN, and the F-score reaches 99.0%. In addition, ACGA-BPNN has good global search ability, and its convergence speed is higher than that of GA-BPNN and SA-BPNN. Furthermore, ACGA-BPNN significantly improves the overall detection performance of IDS.
... For instance, in the work of Diro and Chilamkurti (2018), the authors use multi-layer feedforward neural networks for the classification task, while Wu and Li (2021) used the ensemble power of random forests (Kam, 1995) and neural networks for feature selection to improve the performance of the model. Ensemble methods are also used in other works such as Folino et al. (2021) or Jayanthi et al. (2021). However, as pointed out by other research studies (Dang, 2019), ensemble methods might improve the performance of the models, but not significantly. ...
Article
Purpose This study aims to explain the state-of-the-art machine learning models used for the intrusion detection problem in a human-understandable way, and to study the relationship between the explainability and the performance of the models. Design/methodology/approach The authors study a recent intrusion data set collected from real-world scenarios and use state-of-the-art machine learning algorithms to detect the intrusion. The authors apply several novel techniques to explain the models, then evaluate the explanations manually. The authors then compare the performance of the models after and before explainability-based feature selection. Findings The authors confirm the hypothesis above and claim that by enforcing explainability, the model becomes more robust, requires less computational power and achieves better predictive performance. Originality/value The authors draw their conclusions based on their own research and experimental work.
... Furthermore, ensemble ML algorithms, such as extremely randomized trees (ET), random forest (RF), and Voting, are used to increase the performance of these machine learning models. Folino et al. [10] developed a novel ensemble-based deep learning framework for the analysis of non-stationary data, such as those that typically occur in IDS logs. The ability to design a better detection system is desired to achieve a higher detection rate, particularly when using ensemble learners. ...
Article
Full-text available
Nowadays, network attacks are the most crucial problem of modern society. All networks, from small to large, are vulnerable to network threats. An intrusion detection (ID) system is critical for mitigating and identifying malicious threats in networks. Currently, deep learning (DL) and machine learning (ML) are being applied in different domains, especially information security, for developing effective ID systems. These ID systems are capable of detecting malicious threats automatically and on time. However, malicious threats occur and change continuously, so the network requires a very advanced security solution. Thus, creating an effective and smart ID system is a massive research problem. Various ID datasets are publicly available for ID research. Due to the complex nature of malicious attacks and constantly changing attack detection mechanisms, publicly available ID datasets must be modified systematically on a regular basis. So, in this paper, a convolutional recurrent neural network (CRNN) is used to create a DL-based hybrid ID framework that predicts and classifies malicious cyberattacks in the network. In the HCRNNIDS, the convolutional neural network (CNN) performs convolution to capture local features, and the recurrent neural network (RNN) captures temporal features to improve the ID system's performance and prediction. To assess the efficacy of the hybrid convolutional recurrent neural network intrusion detection system (HCRNNIDS), experiments were done on publicly available ID data, specifically the modern and realistic CSE-CIC-IDS2018 data. The simulation outcomes prove that the proposed HCRNNIDS substantially outperforms current ID methodologies, attaining a high malicious attack detection rate accuracy of up to 97.75% for CSE-CIC-IDS2018 data with 10-fold cross-validation.
Article
The rapid growth of Internet of Things (IoT) applications has raised concerns about the security of IoT communication systems, particularly due to a surge in malicious attacks leading to network disruptions and system failures. This study introduces a novel solution, the Hyper-Parameter Optimized Progressive Neural Network (HOPNET) model, designed to effectively detect intrusions in IoT communication networks. Validation using the Nsl-Kdd dataset involves meticulous data preprocessing for error rectification and feature extraction across diverse attack categories. Implemented on the Java platform, the HOPNET model undergoes comprehensive evaluation through comparative analysis with established intrusion detection methods. Results demonstrate the superiority of the HOPNET model, with improved attack prediction scores and significantly reduced processing times, highlighting the importance of advanced intrusion detection methods for enhancing IoT communication security. The HOPNET model contributes by establishing robust defense against evolving cyber threats, ensuring a safer IoT ecosystem, and paving the way for proactive security measures as the IoT landscape continues to evolve.
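The underlying hyper-parameter tuning idea can be illustrated under plain assumptions, with random search standing in for the metaheuristic actually used by HOPNET; all parameter ranges below are illustrative, not taken from the paper.

```python
# Generic hyper-parameter search sketch (random search replaces the paper's
# metaheuristic optimizer; ranges are illustrative assumptions).
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=30, random_state=0)
search = RandomizedSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_distributions={
        "hidden_layer_sizes": [(32,), (64,), (64, 32)],
        "alpha": loguniform(1e-5, 1e-2),            # L2 penalty
        "learning_rate_init": loguniform(1e-4, 1e-2),
    },
    n_iter=10, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
```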
Article
Rapid technological advances and network progress have occurred in recent decades, as has the global growth of services via the Internet. Consequently, piracy has become more prevalent, and many modern systems have been infiltrated, making it vital to build information security tools to identify new threats. An intrusion detection system (IDS) is a critical information security technology that detects network fluctuations with the help of machine learning (ML) and deep learning (DL) approaches. However, conventional techniques can be less effective in dealing with advanced attacks. So, this paper proposes an efficient DL approach for network intrusion detection (NID) using an optimal weight-based deep neural network (OWDNN). The network traffic data was initially collected from three openly available datasets: NSL-KDD, CSE-CIC-IDS2018 and UNSW-NB15. Then preprocessing was carried out on the collected data based on missing-values imputation, one-hot encoding, and normalization. After that, the data under-sampling process is performed using the butterfly-optimized k-means clustering (BOKMC) algorithm to balance the unbalanced dataset. The relevant features from the balanced dataset are selected using the inception version 3 with multi-head attention (IV3MHA) mechanism to reduce the computation burden of the classifier. After that, the dimensionality of the selected features is reduced based on principal component analysis (PCA). Finally, the classification is done using OWDNN, which classifies the network traffic as normal or anomalous. Experiments on NSL-KDD, CSE-CIC-IDS2018 and UNSW-NB15 datasets show that the OWDNN performs better than the other ID methods.
Article
The mission of an intrusion detection system (IDS) is to monitor network activities and assess whether or not they are malevolent. Specifically, anomaly-based IDS can discover irregular activities by discriminating between normal and anomalous deviations. Nonetheless, existing strategies for detecting anomalies generally rely on single classification models that are still incapable of reducing the false alarm rate and increasing the detection rate. This study introduces a dual ensemble model by combining two existing ensemble techniques, such as bagging and gradient boosting decision tree (GBDT). Multiple dual ensemble schemes involving various fine-tuned GBDT algorithms such as gradient boosting machine (GBM), LightGBM, CatBoost, and XGBoost, are extensively appraised using multiple publicly available data sets, such as NSL-KDD, UNSW-NB15, and HIKARI-2021. The results indicate that the proposed technique is a reasonable solution for the anomaly-based IDS task. Furthermore, we demonstrate that the combination of Bagging and GBM is superior to all alternative combination schemes. In addition, the proposed dual ensemble (e.g., Bagging-GBM) is considerably more competitive than similar techniques reported in the current literature.
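A minimal rendition of this dual-ensemble idea is shown below, with scikit-learn's GradientBoostingClassifier standing in for the fine-tuned GBM/LightGBM/CatBoost/XGBoost variants evaluated in the paper.

```python
# Sketch of a "dual ensemble": Bagging wrapped around a gradient-boosted
# model (a stand-in for the paper's Bagging-GBM combination).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
dual = BaggingClassifier(
    # Note: this keyword is 'base_estimator' in scikit-learn before 1.2.
    estimator=GradientBoostingClassifier(n_estimators=100),
    n_estimators=10,          # 10 boosted models on bootstrap samples
    random_state=0)
dual.fit(X, y)
print(dual.score(X, y))
```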
Article
Malicious traffic detection is one of the most important parts of cyber security. The approaches of using the flow as the detection object are recognized as effective. Benefitting from the development of deep learning techniques, raw traffic can be directly used as a feature to detect malicious traffic. Most existing work usually converts raw traffic into images or long sequences to express a flow and then uses deep learning technology to extract features and classify them, but the generated features contain much redundant or even useless information, especially for encrypted traffic. The packet header field contains most of the packet characteristics except the payload content, and it is also an important element of the flow. In this paper, we only use the fields of the packet header in the raw traffic to construct the characteristic representation of the traffic and propose a novel flow-vector generation approach for malicious traffic detection. The preprocessed header fields are embedded as field vectors, and then a two-layer attention network is used to progressively generate the packet vectors and the flow vector containing context information. The flow vector is regarded as the abstraction of the raw traffic and is used to classify. The experiment results illustrate that the accuracy rate can reach up to 99.48% in the binary classification task and the average of AUC-ROC can reach 0.9988 in the multi-classification task.
Article
The goal of this systematic and broad survey is to present and discuss the main challenges that are posed by the implementation of Artificial Intelligence and Machine Learning in the form of Artificial Neural Networks in Cybersecurity, specifically in Intrusion Detection Systems. Based on the results of the state-of-the-art analysis with a number of bibliographic methods, as well as their own implementations, the authors provide a survey of the answers to the posed problems as well as effective, experimentally-found solutions to those key issues. The issues include hyperparameter tuning, dataset balancing, increasing the effectiveness of an ANN, securing the networks from adversarial attacks, and a range of non-technical challenges of applying ANNs for IDS, such as societal, ethical and legal dilemmas, and the question of explainability. Thus, it is a systematic review and a summary of the body of knowledge amassed around implementations of Artificial Neural Networks in Network Intrusion Detection, guided by an actual, real-world implementation.
Article
Nowadays, several kinds of attacks exist in cyberspace, and hence comprehensive research has been carried out to overcome these threats. One method to provide security in WSNs (Wireless Sensor Networks) is the Intrusion Detection System. However, the detection of unknown attacks remains a major challenge for intrusion detection systems. Hence, the usage of deep learning methodologies remains an active area in cyber security. However, prevailing deep learning algorithms possess limitations such as comparatively low accuracy and heavy dependence on manual feature selection. These problems have been analysed and, correspondingly, the present study proposes an enhanced empirical-based component analysis to select relevant features. This proposed feature selection model integrates the advantages of both empirical mode decomposition and principal component analysis to retain most of the relevant features. The classification of the attack node with the selected features was performed with LSTM (Long Short-Term Memory). The proposed framework was validated on the NSL-KDD, CICIDS 2017, UNSW NB 2015, and KDD99 datasets and compared with state-of-the-art methods. The comparative analysis with the prevailing methods proved the effectiveness of the presented system in terms of performance metrics such as accuracy, F1 score, recall, FPR, FAR, etc.
Article
Network Intrusion Detection (NID) systems are one of the most powerful forms of defense for protecting public and private networks. Most of the prominent methods applied to NID problems consist of Deep Learning methods that have achieved outstanding accuracy performance. However, even though they are effective, these systems are still too complex to interpret and explain. In recent years this lack of interpretability and explainability has begun to be a major drawback of deep neural models, even in NID applications. With the aim of filling this gap, we propose ROULETTE: a method based on a new neural model with attention for an accurate, explainable multi-class classification of network traffic data. In particular, attention is coupled with a multi-output Deep Learning strategy that helps to discriminate better between network intrusion categories. We report the results of extensive experimentation on two benchmark datasets, namely NSL-KDD and UNSW-NB15, which show the beneficial effects of the proposed attention mechanism and multi-output learning strategy on both the accuracy and explainability of the decisions made by the method.
Article
Internet of things (IoT) security is a prerequisite for the rapid development of the IoT to enhance human well-being. Machine learning-based intrusion detection systems (IDS) have good protection capabilities. However, it is difficult to identify attack information in massive amounts of data, which leads to inefficient model detection when samples of certain types of attacks are insufficient. In this regard, this paper fuses deep learning methods and statistical ideas to address the problem of minority-sample attack detection, and proposes an intrusion detection method for the IoT based on an Improved Conditional Variational Autoencoder (ICVAE) and the Borderline Synthetic Minority Oversampling Technique (BSM), called ICVAE-BSM. An auxiliary network is introduced into the Conditional Variational Autoencoder (CVAE) to adjust the output probability distribution of the encoder and learn the posterior distribution of each class of samples, so that samples of the same class are concentrated, and samples of different classes are scattered, in the latent space. Then, based on BSM, borderline latent variables are adaptively synthesized in the latent space of the ICVAE, and the newly synthesized latent variables are fed to the ICVAE's decoder to generate representative new samples that balance the data set. Finally, the output of the encoder is connected to a Softmax classifier, which is fine-tuned on a mixture of original and generated samples to enhance its generalization ability for intrusion detection on minority samples. We use the NSL-KDD, CIC-IDS2017 and CSE-CIC-IDS2018 data sets to simulate and evaluate the model; the experimental results show that the proposed method can more effectively improve the accuracy of IoT attack detection under the condition of unbalanced samples.
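The ICVAE itself is not reproduced here; as a hedged sketch of the oversampling step only, the snippet below applies Borderline-SMOTE (via imbalanced-learn) to placeholder latent codes, standing in for the encoder outputs that the paper balances before decoding.

```python
import numpy as np
from imblearn.over_sampling import BorderlineSMOTE

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 16))        # placeholder latent codes from an encoder
y = np.array([0] * 950 + [1] * 50)     # heavily imbalanced class labels

# Synthesize minority latent vectors near the class border.
Z_bal, y_bal = BorderlineSMOTE(random_state=0).fit_resample(Z, y)
print(np.bincount(y_bal))              # classes are now balanced

# In the paper, the synthetic latent vectors would next be passed through the
# ICVAE decoder to produce full, representative minority-attack samples.
```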
Article
Full-text available
Intrusion detection tools have largely benefitted from the usage of supervised classification methods developed in the field of data mining. However, the data produced by modern system/network logs pose many problems, such as the streaming and non-stationary nature of such data, their volume and velocity, and the presence of imbalanced classes. Classifier ensembles look like a valid solution for this scenario, owing to their flexibility and scalability. In particular, data-driven schemes for combining the predictions of multiple classifiers have been shown to be superior to traditional fixed aggregation criteria (e.g., predictions' averaging and weighted voting). In intrusion detection settings, however, such schemes must be devised in an efficient way, since (part of) the ensemble may need to be re-trained frequently. A novel ensemble-based framework is proposed here for online intrusion detection, where the ensemble is updated through an incremental stream-oriented learning scheme whenever concept drifts are detected. Differently from mainstream ensemble-based approaches in the field, our proposal relies on deriving, through an efficient genetic programming (GP) method, an expressive kind of combiner function defined in terms of (non-trainable) aggregation functions. This approach is supported by a system architecture, which integrates different kinds of functionalities, ranging from drift detection, to the induction and replacement of base classifiers, up to the distributed computation of GP-based combiners. Experiments on both artificial and real-life datasets confirmed the validity of the approach.
Article
Full-text available
Intrusion detection system (IDS) is one of the most extensively used techniques in a network topology to safeguard the integrity and availability of sensitive assets in the protected systems. Although many supervised and unsupervised learning approaches from the field of machine learning have been used to increase the efficacy of IDSs, it is still a problem for existing intrusion detection algorithms to achieve good performance. First, large amounts of redundant and irrelevant data in high-dimensional datasets interfere with the classification process of an IDS. Second, an individual classifier may not perform well in the detection of each type of attack. Third, many models are built for stale datasets, making them less adaptable to novel attacks. Thus, we propose a new intrusion detection framework in this paper, based on feature selection and ensemble learning techniques. In the first step, a heuristic algorithm called CFS-BA is proposed for dimensionality reduction, which selects the optimal subset based on the correlation between features. Then, we introduce an ensemble approach that combines the C4.5, Random Forest (RF), and Forest by Penalizing Attributes (Forest PA) algorithms. Finally, a voting technique is used to combine the probability distributions of the base learners for attack recognition. The experimental results, using the NSL-KDD, AWID, and CIC-IDS2017 datasets, reveal that the proposed CFS-BA-Ensemble method is able to exhibit better performance than other related and state-of-the-art approaches under several metrics.
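A minimal scikit-learn rendition of the voting stage is sketched below. Forest PA has no scikit-learn implementation, so ExtraTreesClassifier is used as a stand-in, and an entropy-based decision tree approximates C4.5; the CFS-BA feature selection step is assumed to have been applied to the inputs beforehand.

```python
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.tree import DecisionTreeClassifier

ensemble = VotingClassifier(
    estimators=[
        ("c45", DecisionTreeClassifier(criterion="entropy")),  # C4.5-like tree
        ("rf", RandomForestClassifier(n_estimators=100)),
        ("fpa", ExtraTreesClassifier(n_estimators=100)),       # Forest PA stand-in
    ],
    voting="soft",  # average the base learners' probability distributions
)
# ensemble.fit(X_train_selected, y_train)
# y_pred = ensemble.predict(X_test_selected)
```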
Article
Full-text available
In recent years, advanced threat attacks have been increasing, but the traditional network intrusion detection system based on feature filtering has drawbacks that make it difficult to find new attacks in time. This paper takes the NSL-KDD data set as the research object, analyses the latest progress and open problems in the field of intrusion detection technology, and proposes an adaptive ensemble learning model. By adjusting the proportion of training data and setting up multiple decision trees, we construct a MultiTree algorithm. In order to improve the overall detection effect, we choose several base classifiers, including decision tree, random forest, kNN, and DNN, and design an ensemble adaptive voting algorithm. We use NSL-KDD Test+ to verify our approach: the accuracy of the MultiTree algorithm is 84.2%, while the final accuracy of the adaptive voting algorithm reaches 85.2%. Comparison with other published results shows that our ensemble model effectively improves detection accuracy. In addition, through the analysis of the data, it is found that the quality of the data features is an important factor in determining the detection effect. In the future, we should optimize the feature selection and preprocessing of intrusion detection data to achieve better results.
Article
Full-text available
Many Intrusion Detection Systems (IDSs) have been proposed in the current decade. To support the evaluation of IDS effectiveness, the Canadian Institute for Cybersecurity presented a state-of-the-art dataset named CICIDS2017, consisting of the latest threats and features. The dataset has drawn the attention of many researchers, as it represents threats not addressed by older datasets. While undertaking experimental research on CICIDS2017, we found that the dataset has a few major shortcomings. These issues are sufficient to bias the detection engine of any typical IDS. This paper explores the detailed characteristics of the CICIDS2017 dataset and outlines the issues inherent to it. Finally, it also presents a combined dataset that eliminates these issues for better classification and detection by any future intrusion detection engine.
Article
Full-text available
Identification of network attacks is a matter of great concern for network operators due to the extensive number of vulnerabilities in computer systems and the creativity of attackers. Anomaly-based Intrusion Detection Systems (IDSs) present a significant opportunity to identify possible incidents, log information and report attempts. However, these systems suffer from a low detection accuracy rate when the network environment or services change. To overcome this problem, we present a deep neural network architecture based on a combination of a stacked denoising autoencoder and a softmax classifier. Our architecture can extract important features from data and learn a model for detecting abnormal behaviors. The model is trained locally to denoise corrupted versions of its inputs by stacking layers of denoising autoencoders, in order to achieve reliable intrusion detection. Experimental results on the real KDD-CUP'99 dataset show that our architecture outperformed shallow learning architectures and other deep neural network architectures.
Article
Full-text available
Humans and animals have the ability to continually acquire and fine-tune knowledge throughout their lifespan. This ability, referred to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms that together contribute to the development and specialization of our sensorimotor skills as well as to long-term memory consolidation and retrieval. Consequently, lifelong learning capabilities are crucial for computational learning systems and autonomous agents interacting in the real world and processing continuous streams of information. However, lifelong learning remains a long-standing challenge for machine learning and neural network models since the continual acquisition of incrementally available information from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback also for state-of-the-art deep and shallow neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which the number of tasks is not known a priori and the information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic forgetting. Although significant advances have been made in domain-specific learning with neural networks, extensive research efforts are required for the development of robust lifelong learning on autonomous agents and robots. We discuss well-established and emerging research motivated by lifelong learning factors in biological systems such as neurosynaptic plasticity, critical developmental stages, multi-task transfer learning, intrinsically motivated exploration, and crossmodal learning.
Article
Full-text available
The evaluation of algorithms and techniques to implement intrusion detection systems relies heavily on the existence of well-designed datasets. In recent years, much effort has been devoted to building such datasets. Yet there is still room for improvement. In this paper, a comprehensive review of existing datasets is first carried out, with emphasis on their main shortcomings. Then, we present a new dataset that is built with real traffic and up-to-date attacks. The main advantage of this dataset over previous ones is its usefulness for evaluating IDSs that consider long-term evolution and traffic periodicity. Models that consider differences between daytime/night or weekdays/weekends can also be trained and evaluated with it. We discuss all the requirements for a modern IDS evaluation dataset and analyze how the one presented here meets the different needs.
Article
Full-text available
A model of an intrusion-detection system capable of detecting attacks in computer networks is described. The model is based on a deep learning approach for learning the best features of network connections, with a Memetic algorithm as the final classifier for detecting abnormal traffic. One of the problems in intrusion detection systems is the large scale of the feature space, which makes typical data mining methods ineffective in this area. Deep learning algorithms have succeeded in image and video mining, domains that likewise involve high-dimensional features, so it seems possible to use them to address the large feature space of intrusion detection systems. The model offered in this paper uses deep learning to detect the best features. A Memetic algorithm is then used to produce a final classifier that works well in multi-density environments. We used the NSL-KDD and KDD99 datasets to evaluate our model; our findings showed a 98.11% detection rate. The NSL-KDD evaluation shows the proposed model succeeded in classifying 92.72% of the R2L attack group.
Article
Full-text available
Ensemble-based methods are among the most widely used techniques for data stream classification. Their popularity is attributable to their good performance in comparison to strong single learners while being relatively easy to deploy in real-world applications. Ensemble algorithms are especially useful for data stream learning as they can be integrated with drift detection algorithms and incorporate dynamic updates, such as selective removal or addition of classifiers. This work proposes a taxonomy for data stream ensemble learning as derived from reviewing over 60 algorithms. Important aspects such as combination, diversity, and dynamic updates, are thoroughly discussed. Additional contributions include a listing of popular open-source tools and a discussion about current data stream research challenges and how they relate to ensemble learning (big data streams, concept evolution, feature drifts, temporal dependencies, and others).
Article
Full-text available
In this work, we introduce a novel interpretation of residual networks showing they are exponential ensembles. This observation is supported by a large-scale lesion study that demonstrates they behave just like ensembles at test time. Subsequently, we perform an analysis showing these ensembles mostly consist of networks that are each relatively shallow. For example, contrary to our expectations, most of the gradient in a residual network with 110 layers comes from an ensemble of very short networks, i.e., only 10-34 layers deep. This suggests that in addition to describing neural networks in terms of width and depth, there is a third dimension: multiplicity, the size of the implicit ensemble. Ultimately, residual networks do not resolve the vanishing gradient problem by preserving gradient flow throughout the entire depth of the network - rather, they avoid the problem simply by ensembling many short networks together. This insight reveals that depth is still an open research question and invites the exploration of the related notion of multiplicity.
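For concreteness, here is a minimal fully-connected residual block in PyTorch; the identity shortcut is exactly what creates the short gradient paths the paper analyzes. The layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        # Skip-connection: gradients can bypass self.body entirely, so a stack
        # of n such blocks implicitly ensembles up to 2^n paths of varying depth.
        return torch.relu(x + self.body(x))

x = torch.randn(8, 32)
print(ResidualBlock(32)(x).shape)  # torch.Size([8, 32])
```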
Conference Paper
Full-text available
A Network Intrusion Detection System (NIDS) helps system administrators to detect network security breaches in their organizations. However, many challenges arise while developing a flexible and efficient NIDS for unforeseen and unpredictable attacks. We propose a deep learning based approach for developing such an efficient and flexible NIDS. We use Self-taught Learning (STL), a deep learning based technique, on NSL-KDD, a benchmark dataset for network intrusion. We present the performance of our approach and compare it with a few previous works. Compared metrics include accuracy, precision, recall, and f-measure values.
Article
Full-text available
The prevalence of mobile phones, the internet-of-things technology, and networks of sensors has led to an enormous and ever increasing amount of data that are now more commonly available in a streaming fashion [1]-[5]. Often, it is assumed - either implicitly or explicitly - that the process generating such a stream of data is stationary, that is, the data are drawn from a fixed, albeit unknown probability distribution. In many real-world scenarios, however, such an assumption is simply not true, and the underlying process generating the data stream is characterized by an intrinsic nonstationary (or evolving or drifting) phenomenon. The nonstationarity can be due, for example, to seasonality or periodicity effects, changes in the users' habits or preferences, hardware or software faults affecting a cyber-physical system, thermal drifts or aging effects in sensors. In such nonstationary environments, where the probabilistic properties of the data change over time, a non-adaptive model trained under the false stationarity assumption is bound to become obsolete in time, and perform sub-optimally at best, or fail catastrophically at worst.
Article
Full-text available
Anomaly detection in communication networks provides the basis for the uncovering of novel attacks, misconfigurations and network failures. Resource constraints for data storage, transmission and processing make it beneficial to restrict input data to features that are (a) highly relevant for the detection task and (b) easily derivable from network observations without expensive operations. Removing strongly correlated, redundant and irrelevant features also improves the detection quality for many algorithms that are based on learning techniques. In this paper we address the feature selection problem for network traffic based anomaly detection. We propose a multi-stage feature selection method using filters and stepwise regression wrappers. Our analysis is based on 41 widely-adopted traffic features that are present in several commonly used traffic data sets. With our combined feature selection method we could reduce the original feature vectors from 41 to only 16 features. We tested our results with five fundamentally different classifiers, observing no significant reduction of the detection performance. In order to quantify the practical benefits of our results, we analyzed the costs for generating individual features from standard IP Flow Information Export records, available at many routers. We show that we can eliminate 13 very costly features and thus reduce the computational effort for on-line feature generation from live traffic observations at network nodes.
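The filter stage can be illustrated with a few lines of pandas; the sketch below drops one feature from every strongly correlated pair, under the assumption that the wrapper (stepwise regression) stage follows separately. The threshold and file name are illustrative.

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Remove one feature from each pair with |Pearson r| above threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# flows = pd.read_csv("flows.csv")        # e.g., the 41 standard traffic features
# reduced = drop_correlated(flows, 0.9)   # cheaper, less redundant feature set
```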
Conference Paper
Full-text available
Transfer Learning is a paradigm in machine learning in which a target problem is solved by reusing, with minor modifications, what was learned on a different but related source problem. In this paper we propose a novel feature transference approach, especially useful when the source and the target problems are drawn from different distributions. We use deep neural networks to transfer either low-, middle- or higher-layer features from a machine trained in either an unsupervised or a supervised way. Applying this feature transference approach to Convolutional Neural Networks and Stacked Denoising Autoencoders on four different datasets, we achieve a lower classification error rate, with a significant reduction in computation time, using lower-layer features trained in a supervised way and higher-layer features trained in an unsupervised way when classifying images from an uppercase and lowercase letters dataset.
Article
Full-text available
Mixture of experts (ME) is one of the most popular and interesting combining methods, with great potential to improve performance in machine learning. ME is established on the divide-and-conquer principle, in which the problem space is divided between a few neural network experts, supervised by a gating network. In earlier works on ME, different strategies were developed to divide the problem space between the experts. To survey and analyse these methods more clearly, we present a categorisation of the ME literature based on this difference. Various ME implementations are classified into two groups, according to the partitioning strategies used and both how and when the gating network is involved in the partitioning and combining procedures. In the first group, the conventional ME and its extensions stochastically partition the problem space into a number of subspaces using a specially employed error function, and experts become specialised in each subspace. In the second group, the problem space is explicitly partitioned by a clustering method before the experts' training process starts, and each expert is then assigned to one of these subspaces. Based on the implicit problem-space partitioning using a tacit competitive process between the experts, we call the first group the mixture of implicitly localised experts (MILE), and the second group is called the mixture of explicitly localised experts (MELE), as it uses pre-specified clusters. The properties of both groups are investigated in comparison with each other. The investigation of MILE versus MELE, discussing the advantages and disadvantages of each group, showed that the two approaches have complementary features. Moreover, the features of the ME method are compared with other popular combining methods, including boosting and negative correlation learning methods. As the investigated methods have complementary strengths and limitations, previous studies that attempted to combine their features in integrated approaches are reviewed and, moreover, some suggestions are proposed for future research directions.
Article
Full-text available
We introduce an ensemble of classifiers-based approach for incremental learning of concept drift, characterized by nonstationary environments (NSEs), where the underlying data distributions change over time. The proposed algorithm, named Learn++.NSE, learns from consecutive batches of data without making any assumptions on the nature or rate of drift; it can learn from such environments that experience constant or variable rate of drift, addition or deletion of concept classes, as well as cyclical drift. The algorithm learns incrementally, as other members of the Learn++ family of algorithms, that is, without requiring access to previously seen data. Learn++.NSE trains one new classifier for each batch of data it receives, and combines these classifiers using a dynamically weighted majority voting. The novelty of the approach is in determining the voting weights, based on each classifier's time-adjusted accuracy on current and past environments. This approach allows the algorithm to recognize, and act accordingly to, the changes in underlying data distributions, as well as to a possible reoccurrence of an earlier distribution. We evaluate the algorithm on several synthetic datasets designed to simulate a variety of nonstationary environments, as well as a real-world weather prediction dataset. Comparisons with several other approaches are also included. Results indicate that Learn++.NSE can track the changing environments very closely, regardless of the type of concept drift. To allow future use, comparison and benchmarking by interested researchers, we also release our data used in this paper.
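The combination rule can be pictured with a much-simplified sketch: each base classifier votes with a weight derived from its recent accuracy, so experts that track the current distribution dominate. The time-adjusted, sigmoid-averaged error weighting of the actual Learn++.NSE algorithm is omitted here.

```python
import numpy as np

def weighted_majority(predictions, recent_accuracies, n_classes):
    """predictions: (n_classifiers, n_samples) array of hard labels."""
    eps = 1e-6
    errors = 1.0 - np.asarray(recent_accuracies)
    weights = np.log((1.0 - errors + eps) / (errors + eps))  # low error -> big vote
    votes = np.zeros((predictions.shape[1], n_classes))
    for preds, w in zip(predictions, weights):
        votes[np.arange(preds.size), preds] += max(w, 0.0)   # ignore bad experts
    return votes.argmax(axis=1)

preds = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 1],
                  [1, 1, 1, 0]])
print(weighted_majority(preds, [0.9, 0.6, 0.8], n_classes=2))  # [0 1 1 0]
```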
Article
Full-text available
This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation's crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory.
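scikit-learn ships a direct implementation of this scheme; the toy configuration below shows the essential structure, with arbitrary level-0 learners and a logistic-regression combiner trained on their out-of-fold guesses.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),  # learns to correct the level-0 biases
    cv=5,  # level-1 inputs are cross-validated (out-of-fold) predictions
)
# stack.fit(X_train, y_train)
# stack.predict(X_test)
```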
Conference Paper
Full-text available
We investigate potential simulation artifacts and their effects on the evaluation of network anomaly detection systems in the 1999 DARPA/MIT Lincoln Laboratory off-line intrusion detection evaluation data set. A statistical comparison of the simulated background and training traffic with real traffic collected from a university departmental server suggests the presence of artifacts that could allow a network anomaly detection system to detect some novel intrusions based on idiosyncrasies of the underlying implementation of the simulation, with an artificially low false alarm rate. The evaluation problem can be mitigated by mixing real traffic into the simulation. We compare five anomaly detection algorithms on simulated and mixed traffic. On mixed traffic they detect fewer attacks, but the explanations for these detections are more plausible.
Article
In this work we propose a new deep learning based approach for online classification on streams of high-dimensional data. While requiring very little historical data storage, our approach is able to alleviate catastrophic forgetting in the scenario of continual learning with no assumption on the stationarity of the data in the stream. To make up for the absence of historical data, we propose a new generative autoencoder endowed with an auxiliary loss function that ensures fast task-sensitive convergence. To evaluate our approach we perform experiments on two well-known image datasets, MNIST and LSUN, in a continuous streaming mode. We extend the experiments to a large multi-class synthetic dataset that allows to check the performance of our method in more challenging settings with up to 1000 distinct classes. Our approach is able to perform classification on dynamic data streams with an accuracy close to the results obtained in the offline classification setup where all the data are available for the full duration of training. In addition, we demonstrate the ability of our method to adapt to unseen data classes and new instances of already known data categories, while avoiding catastrophic forgetting of previously acquired knowledge.
Article
The use of deep learning models for the network intrusion detection task has been an active area of research in cybersecurity. Although several excellent surveys cover the growing body of research on this topic, the literature lacks an objective comparison of the different deep learning models within a controlled environment, especially on recent intrusion detection datasets. In this paper, we first introduce a taxonomy of deep learning models in intrusion detection and summarize the research papers on this topic. Then we train and evaluate four key deep learning models - feed-forward neural network, autoencoder, deep belief network and long short-term memory network - for the intrusion classification task on two legacy datasets (KDD 99, NSL-KDD) and two modern datasets (CIC-IDS2017, CIC-IDS2018). Our results suggest that deep feed-forward neural networks yield desirable evaluation metrics on all four datasets in terms of accuracy, F1-score and training and inference time. The results also indicate that two popular semi-supervised learning models, autoencoders and deep belief networks, do not perform better than supervised feed-forward neural networks. The implementation and the complete set of results have been released for future use by the research community. Finally, we discuss the issues in the research literature that were revealed in the survey and suggest several potential future directions for research in machine learning methods for intrusion detection.
Article
It is clear that the learning speed of feedforward neural networks is in general far slower than required, and this has been a major bottleneck in their applications for past decades. Two key reasons behind this may be: (1) slow gradient-based learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden layer feedforward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. The experimental results based on a few artificial and real benchmark function approximation and classification problems, including very large complex applications, show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.
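The whole algorithm fits in a few lines of NumPy, which is part of its appeal; the sketch below uses a tanh hidden layer and one-hot targets, both arbitrary choices.

```python
import numpy as np

def elm_train(X, Y, n_hidden=100, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random, never-trained weights
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y                 # analytic output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta             # scores; take argmax for labels

X = np.random.rand(200, 10)
Y = np.eye(2)[np.random.randint(0, 2, 200)]      # one-hot class targets
W, b, beta = elm_train(X, Y)
print(elm_predict(X, W, b, beta).shape)          # (200, 2)
```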
Article
Network traffic anomaly detection is an important technique for ensuring network security. However, there are usually three problems with existing machine learning based anomaly detection algorithms. First, most of the models are built for stale data sets, making them less adaptable in real-world environments. Second, most of the anomaly detection algorithms lack the ability to learn new models as the attack environment changes. Third, from the perspective of data multi-dimensionality, a single detection algorithm has a performance ceiling and cannot be well adapted to the needs of a complex network attack environment. Thus, we propose a new anomaly detection framework based on the organic integration of multiple deep learning techniques. First, we use the Damped Incremental Statistics algorithm to extract features from network traffic. Second, we train an Autoencoder with a small amount of labeled data. Third, we use the Autoencoder to assign an abnormality score to network traffic. Fourth, the data labeled with abnormality scores are used to train an LSTM. Finally, a weighted method is used to obtain the final abnormality score. The experimental results show that our HELAD algorithm has better adaptability and accuracy than other state-of-the-art algorithms.
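Only the autoencoder-scoring step lends itself to a short sketch: an autoencoder trained on (mostly) benign flows assigns each record a reconstruction-error score, which HELAD then uses to label data for the downstream LSTM. Sizes and names below are illustrative.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, dim, bottleneck=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, bottleneck))
        self.dec = nn.Sequential(nn.Linear(bottleneck, 32), nn.ReLU(), nn.Linear(32, dim))

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_scores(model, x):
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)  # high error => likely anomalous

x = torch.randn(16, 40)              # 16 flows, 40 extracted features each
scores = anomaly_scores(AE(40), x)   # untrained here; train on benign traffic first
print(scores.shape)                  # torch.Size([16])
```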
Article
The massive growth of data transmitted through a variety of devices and communication protocols has raised serious security concerns, which has increased the importance of developing advanced intrusion detection systems (IDSs). Deep learning is an advanced branch of machine learning, composed of multiple layers of neurons that represent the learning process. Deep learning can cope with large-scale data and has shown success in different fields. Therefore, researchers have paid more attention to investigating deep learning for intrusion detection. This survey comprehensively reviews and compares the key previous deep learning-focused cybersecurity surveys. Through an extensive review, this survey provides a novel fine-grained taxonomy that categorizes the current state-of-the-art deep learning-based IDSs with respect to different facets, including input data, detection, deployment, and evaluation strategies. Each facet is further classified according to different criteria. This survey also compares and discusses the related experimental solutions proposed as deep learning-based IDSs. By analysing the experimental studies, this survey discusses the role of deep learning in intrusion detection, the impact of intrusion detection datasets, and the efficiency and effectiveness of the proposed approaches. The findings demonstrate that further effort is required to improve the current state of the art. Finally, open research challenges are identified, and future research directions for deep learning-based IDSs are recommended.
Article
Cutting edge Deep Learning (DL) techniques have been widely applied to areas like image processing and speech recognition so far. Likewise, some DL work has been done in the area of cybersecurity. In this survey, we focus on recent DL approaches that have been proposed in the area of cybersecurity, namely intrusion detection, malware detection, phishing/spam detection, and website defacement detection. First, preliminary definitions of popular DL models and algorithms are described. Then, a general DL framework for cybersecurity applications is proposed and explained based on the four major modules it consists of. Afterward, related papers are summarized and analyzed with regard to the focus area, methodology, model applicability, and feature granularity. Finally, concluding remarks and future work are discussed including the possible research topics that can be taken into consideration to enhance various cybersecurity applications using DL models.
Chapter
In a fast-growing digital era, the increase in the number of devices connected to the internet has raised many security issues. A variety of systems are available in the IT sector to provide security, and the Intrusion Detection System is one of them. The design of an efficient intrusion detection system is an open problem for the research community. In this paper, various machine learning algorithms have been used for detecting different types of Denial-of-Service attack. The performance of the models has been measured on the basis of binary and multi-class classification. Furthermore, a parameter tuning algorithm has been discussed. On the basis of performance parameters, XGBoost performs efficiently and robustly in finding intrusions. The proposed method, XGBoost, has been compared with other classifiers such as AdaBoost, Naïve Bayes, Multi-Layer Perceptron (MLP) and K-Nearest Neighbour (KNN) on network traffic recently captured by the Canadian Institute for Cybersecurity (CIC). In this research, average class error and overall error have been calculated for the multi-class classification problem.
Chapter
The chapter is devoted to illustrating the basic principles and the current results that characterize research on Deep Learning. The term refers to the theory and practice of devising and training complex neural networks for supervised and unsupervised tasks. Within the chapter, we illustrate the basic principle underlying a single neural unit and show how these units can be combined to realize a complex network. We discuss the basic algorithms for training a network and the recent advances proposed in the literature for scaling up training to deep architectures. The chapter concludes with an overview of the most successful deep architectures proposed in the literature, both for supervised and unsupervised learning.
Article
Network intrusion detection systems (NIDSs) play a crucial role in defending computer networks. However, there are concerns regarding the feasibility and sustainability of current approaches when faced with the demands of modern networks. More specifically, these concerns relate to the increasing levels of required human interaction and the decreasing levels of detection accuracy. This paper presents a novel deep learning technique for intrusion detection, which addresses these concerns. We detail our proposed nonsymmetric deep autoencoder (NDAE) for unsupervised feature learning. Furthermore, we also propose our novel deep learning classification model constructed using stacked NDAEs. Our proposed classifier has been implemented in graphics processing unit (GPU)-enabled TensorFlow and evaluated using the benchmark KDD Cup ’99 and NSL-KDD datasets. Promising results have been obtained from our model thus far, demonstrating improvements over existing approaches and the strong potential for use in modern NIDSs.
Conference Paper
Modern intrusion detection systems must handle many complicated issues in real-time, as they have to cope with a real data stream; indeed, for the task of classification, the classes are typically unbalanced and, in addition, the systems have to cope with distributed attacks and must react quickly to changes in the data. Data mining techniques and, in particular, ensembles of classifiers make it possible to combine different classifiers that together provide complementary information and can be built in an incremental way. This paper introduces the architecture of a distributed intrusion detection framework and, in particular, its detector module based on a meta-ensemble, which is used to cope with the problem of detecting intrusions, where typically the number of attacks is far smaller than the number of normal connections. To this aim, we explore the usage of ensembles specialized to detect particular types of attack or normal connections, and Genetic Programming is adopted to generate a non-trainable function to combine each specialized ensemble. Non-trainable functions can be evolved without any extra phase of training and, therefore, they are particularly apt to handle concept drifts, even in the case of real-time constraints. Preliminary experiments, conducted on the well-known KDD dataset and on a more up-to-date dataset, ISCX IDS, show the effectiveness of the approach.
Article
In many applications of information systems learning algorithms have to act in dynamic environments where data are collected in the form of transient data streams. Compared to static data mining, processing streams imposes new computational requirements for algorithms to incrementally process incoming examples while using limited memory and time. Furthermore, due to the non-stationary characteristics of streaming data, prediction models are often also required to adapt to concept drifts. Out of several new proposed stream algorithms, ensembles play an important role, in particular for non-stationary environments. This paper surveys research on ensembles for data stream classification as well as regression tasks. Besides presenting a comprehensive spectrum of ensemble approaches for data streams, we also discuss advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs. The paper concludes with a discussion of open research problems and lines of future research.
Conference Paper
Recently, deep learning has gained prominence due to the potential it holds for machine learning. For this reason, deep learning techniques have been applied in many fields, such as pattern recognition and classification. Intrusion detection analyses data gathered by monitoring security events to assess the state of a network. Many traditional machine learning methods have been applied to intrusion detection, but detection performance and accuracy still need to be improved. This paper discusses different methods that were used to classify network traffic. We applied these methods to an open data set and experimented with them to find the best approach to intrusion detection.
Article
Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent.
Article
While both cost-sensitive learning and online learning have been studied separately, these two issues have seldom been addressed simultaneously. Yet there are many applications where both aspects are important. This paper investigates a class of algorithmic approaches suitable for online cost-sensitive learning, designed for such problems. The basic idea is to leverage existing methods for online ensemble algorithms, and combine these with batch mode methods for cost-sensitive bagging/boosting algorithms. Within this framework, we describe several theoretically sound online cost-sensitive bagging and online cost-sensitive boosting algorithms, and show that the convergence of the proposed algorithms is guaranteed under certain conditions. We then present extensive experimental results on benchmark datasets to compare the performance of the various proposed approaches.
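The core idea, penalizing errors on the rare class more heavily, can be shown with a per-class weighted loss; the sketch below is a minimal PyTorch rendition with an assumed 10:1 cost ratio, not the paper's bagging/boosting algorithms.

```python
import torch
import torch.nn as nn

# Assumed costs: errors on class 1 (attack) are 10x as expensive as on class 0.
class_weights = torch.tensor([1.0, 10.0])
loss_fn = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(8, 2, requires_grad=True)
labels = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1])   # imbalanced mini-batch
loss = loss_fn(logits, labels)  # mistakes on the two attack samples dominate
loss.backward()
```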
Article
Modern network intrusion detection systems must be able to handle large and fast-changing data, often also taking into account real-time requirements. Ensemble-based data mining algorithms and their distributed implementations are a promising approach to these issues. Therefore, this work presents the current state of the art of the ensemble-based methods used in modern intrusion detection systems, with particular attention to distributed approaches and implementations. This review also considers supervised and unsupervised data mining algorithms, which are more suitable for environments that require the analysis of data streams in real time. Sharing knowledge across multiple nodes is another key point in designing appropriate NIDSs and, for this reason, collaborative IDSs were also included in this work. Finally, we discuss some open issues and lessons learned from this review, which can help researchers to design more efficient NIDSs.
Conference Paper
The ability to learn incrementally from streaming data, either in an online or batch setting, is of crucial importance for a prediction algorithm to learn from environments that generate vast amounts of data, where it is impractical or simply unfeasible to store all historical data. On the other hand, learning from streaming data becomes increasingly difficult when the probability distribution generating the data stream evolves over time, which renders the classification model generated from previously seen data suboptimal or potentially useless. Ensemble systems that employ multiple classifiers may be used to mitigate this effect, but even in such cases some classifiers (experts) become less knowledgeable for predicting on different domains than others as the distribution drifts. Further complications arise when labeled data from the prediction (target) domain are not immediately available, causing prediction on the target domain to yield sub-optimal results. In this work, we provide upper bounds on the loss, which hold with high probability, of a multiple expert system trained in such a nonstationary environment with verification latency. Furthermore, we show why a single model selection strategy can lead to undesirable results when learning in such nonstationary streaming settings. We present our analytical results with experiments on simulated as well as real-world data sets, comparing several different ensemble approaches to a single model.
Article
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
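In modern frameworks the technique is a one-line layer; the sketch below shows its typical placement between a linear layer and its nonlinearity, with arbitrary sizes.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(40, 64),
    nn.BatchNorm1d(64),  # normalize each feature over the mini-batch, then scale/shift
    nn.ReLU(),
    nn.Linear(64, 2),
)
x = torch.randn(32, 40)  # a mini-batch of 32 records
print(net(x).shape)      # torch.Size([32, 2])
```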
Conference Paper
Multiple expert systems (MES) have been widely used in machine learning because of their inherent ability to decrease variance and improve generalization performance by receiving advice from more than one expert. However, a typical MES explicitly assumes that training and testing data are independent and identically distributed (iid), which, unfortunately, is often violated in practice when the probability distribution generating the data changes with time. One of the key aspects of any MES algorithm deployed in such environments is the decision rule used to combine the decisions of the experts. Many MES algorithms choose adaptive weighting schemes that adjust the weights of a classifier based on its recent loss, or use an average of the experts' probabilities. However, in a stochastic setting where the loss of an expert is uncertain at a future point in time, which combiner method is the most reliable? In this work, we show that non-uniformly weighted experts can provide a stable upper bound on loss compared to techniques such as a follow-the-leader or uniform methodology. Several well-studied MES approaches are tested on a variety of real-world data sets to support and demonstrate the theory.
Article
Network anomaly detection is an important and dynamic research area. Many network intrusion detection methods and systems (NIDS) have been proposed in the literature. In this paper, we provide a structured and comprehensive overview of various facets of network anomaly detection so that a researcher can become quickly familiar with every aspect of network anomaly detection. We present attacks normally encountered by network intrusion detection systems. We categorize existing network anomaly detection methods and systems based on the underlying computational techniques used. Within this framework, we briefly describe and compare a large number of network anomaly detection methods and systems. In addition, we also discuss tools that can be used by network defenders and datasets that researchers in network anomaly detection can use. We also highlight research directions in network anomaly detection.
Article
In network intrusion detection, anomaly-based approaches in particular suffer from difficulties in accurate evaluation, comparison, and deployment, which originate from the scarcity of adequate datasets. Many such datasets are internal and cannot be shared due to privacy issues; others are heavily anonymized and do not reflect current trends, or they lack certain statistical characteristics. These deficiencies are the primary reasons why a perfect dataset is yet to exist. Thus, researchers must resort to datasets that are often suboptimal. As network behaviors and patterns change and intrusions evolve, it has very much become necessary to move away from static and one-time datasets toward more dynamically generated datasets which not only reflect the traffic compositions and intrusions of their time, but are also modifiable, extensible, and reproducible. In this paper, a systematic approach to generating the required datasets is introduced to address this need. The underlying notion is based on the concept of profiles, which contain detailed descriptions of intrusions and abstract distribution models for applications, protocols, or lower-level network entities. Real traces are analyzed to create profiles for agents that generate real traffic for HTTP, SMTP, SSH, IMAP, POP3, and FTP. In this regard, a set of guidelines is established to outline valid datasets, which set the basis for generating profiles. These guidelines are vital for the effectiveness of the dataset in terms of realism, evaluation capabilities, total capture, completeness, and malicious activity. The profiles are then employed in an experiment to generate the desired dataset in a testbed environment. Various multi-stage attack scenarios were subsequently carried out to supply the anomalous portion of the dataset. The intent of this dataset is to assist various researchers in acquiring datasets of this kind for testing, evaluation, and comparison purposes, through sharing the generated datasets and profiles.
Conference Paper
Extensive use of computer networks and online electronic data, together with a high demand for security, has called for reliable intrusion detection systems. A repertoire of different classifiers has been proposed for this problem over the last decade. In this paper we propose a combining classification approach for intrusion detection. Outputs of four base classifiers (ANN, SVM, kNN and decision trees) are fused using three combination strategies: majority voting, Bayesian averaging and a belief measure. Our results support the superiority of the proposed approach compared with single classifiers for the problem of intrusion detection.
Conference Paper
A hierarchical classification framework is proposed for discriminating rare classes in imprecise domains, characterized by rarity (of both classes and cases), noise and low class separability. The devised framework couples the rules of a rule-based classifier with as many local probabilistic generative models. These are trained over the coverage of the corresponding rules to better catch those globally rare cases/classes that become less rare in the coverage. Two novel schemes for tightly integrating rule-based and probabilistic classification are introduced, which classify unlabeled cases by considering multiple classifier rules as well as their local probabilistic counterparts. An intensive evaluation shows that the proposed framework is competitive and often superior in accuracy w.r.t. established competitors, while outperforming them in dealing with rare classes.
Article
In 1998 and again in 1999, the Lincoln Laboratory of MIT conducted a comparative evaluation of intrusion detection systems (IDSs) developed under DARPA funding. While this evaluation represents a significant and monumental undertaking, there are a number of issues associated with its design and execution that remain unsettled. Some methodologies used in the evaluation are questionable and may have biased its results. One problem is that the evaluators have published relatively little concerning some of the more critical aspects of their work, such as validation of their test data. The appropriateness of the evaluation techniques used needs further investigation. The purpose of this article is to attempt to identify the shortcomings of the Lincoln Lab effort in the hope that future efforts of this kind will be placed on a sounder footing. Some of the problems that the article points out might well be resolved if the evaluators were to publish a detailed description of their procedures and the rationale that led to their adoption, but other problems would clearly remain.
Article
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
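Both recommended tests are available in SciPy; the snippet below runs them on made-up accuracy figures for three classifiers over six datasets.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Illustrative accuracies of three classifiers on six datasets.
acc_a = np.array([0.91, 0.87, 0.95, 0.90, 0.88, 0.93])
acc_b = np.array([0.89, 0.85, 0.96, 0.87, 0.86, 0.92])
acc_c = np.array([0.90, 0.84, 0.94, 0.88, 0.85, 0.91])

print(wilcoxon(acc_a, acc_b))                  # two classifiers, paired by dataset
print(friedmanchisquare(acc_a, acc_b, acc_c))  # three or more classifiers
```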
Article
The objective of this paper is to construct a lightweight Intrusion Detection System (IDS) aimed at detecting anomalies in networks. The crucial part of building a lightweight IDS lies in the preprocessing of network data, the identification of important features, and the design of an efficient learning algorithm that classifies normal and anomalous patterns. Therefore, in this work the design of an IDS is investigated from these three perspectives. The goals of this paper are (i) removing redundant instances that cause the learning algorithm to be biased, (ii) identifying a suitable subset of features by employing a wrapper-based feature selection algorithm, and (iii) realizing the proposed IDS with a neurotree to achieve better detection accuracy. The lightweight IDS has been developed using a wrapper-based feature selection algorithm that maximizes the specificity and sensitivity of the IDS, as well as by employing a neural ensemble decision tree iterative procedure to evolve optimal features. An extensive experimental evaluation of the proposed approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree and Representative Tree, is introduced to perform the detection of anomalous network patterns.