Conference Paper

Towards a Generic Feature-Selection Measure for Intrusion Detection.

DOI: 10.1109/ICPR.2010.378
Conference: 20th International Conference on Pattern Recognition, ICPR 2010, Istanbul, Turkey, 23-26 August 2010
Source: DBLP

ABSTRACT The performance of a pattern recognition system depends strongly on the feature-selection method employed. We perform an in-depth analysis of two main measures used in the filter model: the correlation-feature-selection (CFS) measure and the minimal-redundancy-maximal-relevance (mRMR) measure. We show that these measures can be fused and generalized into a generic feature-selection (GeFS) measure. Furthermore, we propose a new feature-selection method that ensures globally optimal feature sets. The new approach is based on solving a mixed 0-1 linear programming (M01LP) problem using the branch-and-bound algorithm. In this M01LP problem, the number of constraints and variables is linear (O(n)) in the number n of features in the full set. To evaluate the quality of our GeFS measure, we chose the design of an intrusion detection system (IDS) as a possible application. Experimental results obtained on the KDD Cup '99 test data set for IDS show that the GeFS measure removes 93% of irrelevant and redundant features from the original data set while maintaining or even improving classification accuracy.
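The two filter measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly cited textbook forms of the CFS merit and of the mRMR difference criterion, which the abstract itself does not spell out. The function names, the toy relevance and redundancy values, and the brute-force search are hypothetical and for illustration only; in particular, the exhaustive search is a stand-in for, not an implementation of, the M01LP and branch-and-bound method the paper proposes.

```python
import math
from itertools import combinations


def cfs_merit(subset, rel, red):
    """CFS merit: k * mean(r_cf) / sqrt(k + k*(k-1) * mean(r_ff)).

    rel[i] is a non-negative feature-class correlation, red[i][j] a
    non-negative feature-feature correlation (assumed inputs).
    """
    k = len(subset)
    if k == 0:
        return 0.0
    mean_rcf = sum(rel[i] for i in subset) / k
    pairs = list(combinations(subset, 2))
    mean_rff = sum(red[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_rcf / math.sqrt(k + k * (k - 1) * mean_rff)


def mrmr_score(subset, rel, red):
    """mRMR difference form: mean relevance minus mean redundancy.

    Redundancy is averaged over all |S|^2 ordered pairs, diagonal included.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    mean_rel = sum(rel[i] for i in subset) / k
    mean_red = sum(red[i][j] for i in subset for j in subset) / (k * k)
    return mean_rel - mean_red


def best_subset_bruteforce(n, score):
    """Exhaustively score every non-empty subset of n features.

    Exponential in n; usable only for tiny n. NOT the paper's method.
    """
    candidates = (
        c for size in range(1, n + 1) for c in combinations(range(n), size)
    )
    return max(candidates, key=score)


# Hypothetical toy data: three features, the first two mutually correlated.
relevance = [0.8, 0.6, 0.1]
redundancy = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

best = best_subset_bruteforce(3, lambda s: cfs_merit(s, relevance, redundancy))
print("best subset under CFS merit:", best)
```

Both scores reward subsets whose features relate strongly to the class and weakly to each other; they differ in how the two terms are combined (a ratio for CFS, a difference for mRMR).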


Available from: Katrin Franke, Jun 03, 2015
  • ABSTRACT: Anomaly detection in communication networks provides the basis for uncovering novel attacks, misconfigurations and network failures. Resource constraints for data storage, transmission and processing make it beneficial to restrict input data to features that are (a) highly relevant for the detection task and (b) easily derivable from network observations without expensive operations. Removing strongly correlated, redundant and irrelevant features also improves the detection quality of many algorithms that are based on learning techniques. In this paper we address the feature selection problem for network-traffic-based anomaly detection. We propose a multi-stage feature selection method using filters and stepwise regression wrappers. Our analysis is based on 41 widely adopted traffic features that are presented in several commonly used traffic data sets. With our combined feature selection method we could reduce the original feature vectors from 41 to only 16 features. We tested our results with five fundamentally different classifiers, observing no significant reduction in detection performance. In order to quantify the practical benefits of our results, we analyzed the costs of generating individual features from standard IP Flow Information Export records, available at many routers. We show that we can eliminate 13 very costly features and thus reduce the computational effort for on-line feature generation from live traffic observations at network nodes.
    Machine Learning 12/2014; DOI:10.1007/s10994-014-5473-9 · 1.69 Impact Factor
  • ABSTRACT: In this paper, a hybrid classifier using fuzzy clustering and several neural networks is proposed. Using the fuzzy C-means algorithm, training samples are clustered, and inappropriate data are detected, moved to a separate dataset (Removed-Dataset), and used differently in the classification phase. In addition, using the membership degrees of samples to the clusters, the class of each sample is changed to a fuzzy class. Thus, for example, in the KDD Cup '99 dataset, every sample has 5 membership degrees, one for each of the classes DoS, Probe, Normal, U2R, and R2L. The neural networks are then trained on the new labels, and the hybrid classifier is created using a combination of regression and classification methods. To classify the outlier data, a fuzzy ARTMAP neural network is employed as part of the hybrid classifier. The proposed method is evaluated on the KDD Cup '99 dataset for intrusion detection and on the Cambridge datasets for traffic classification. Our experimental results indicate that the proposed system outperforms previous work in terms of precision, recall and F-value, as well as detection rate and false alarm rate. ROC curve analysis also shows that the proposed hybrid classifier performs better than well-known non-hybrid classifiers.
    11/2014; 6(11).
  • ABSTRACT: Web Application Firewalls (WAFs) analyze HTTP traffic in order to protect Web applications from attacks. To be effective, WAFs need to analyze the payload of the packets. One of the techniques used for intrusion detection is to extract features from the payload by means of n-grams. An n-gram is a subsequence of n items from a given sequence. The number of possible n-grams over bytes is 256 to the nth power. Since this number grows exponentially with n, the curse of dimensionality and computational complexity problems arise. In this paper we propose to apply feature selection in order to reduce the number of features extracted by n-grams and thus to improve the effectiveness of WAFs. We conduct experiments on our own HTTP data set. After extracting n-grams from this data set, we apply the Generic-Feature-Selection (GeFS) measure for intrusion detection [5] to select important features. We use four different classifiers to test the detection accuracy before and after feature selection. The experiments show that we can remove more than 95% of irrelevant and redundant features from the original data set (and thus improve the performance by more than 80% on average), while reducing the accuracy of WAFs only slightly (by less than 6%).
    2011 3rd International Workshop on Security and Communication Networks (IWSCN); 05/2011
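
The n-gram feature extraction described in the last abstract above can be sketched in a few lines of Python. This is only an illustration under assumed conventions (overlapping byte-level n-grams counted per payload); the function names and the sample payload are hypothetical and are not taken from the cited work.

```python
from collections import Counter


def byte_ngrams(payload: bytes, n: int) -> Counter:
    """Count every overlapping run of n consecutive bytes in the payload."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A payload shorter than n yields an empty range and thus no n-grams.
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))


def feature_space_size(n: int) -> int:
    """Number of distinct byte n-grams that can occur: 256 ** n."""
    return 256 ** n


# Hypothetical HTTP payload fragment, for illustration only.
counts = byte_ngrams(b"GET /index.html HTTP/1.1", 2)
print(len(counts), "distinct 2-grams observed out of", feature_space_size(2))
```

Even for n = 2 the feature space already has 65,536 dimensions, most of them unobserved in any single payload, which is why a feature-selection step is attractive here.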