An Efficient Fuzzy Clustering-Based Approach for Intrusion Detection

CEUR Workshop Proceedings 10/2011; 789.
Source: arXiv


The need to detect sophisticated cyber attacks more accurately poses a great
challenge not only to the research community but also to corporations. Many
approaches have been proposed to cope with this threat. Among them, data
mining has made remarkable contributions to the intrusion detection problem.
However, the generalization ability of data mining-based methods remains
limited, and hence detecting sophisticated attacks remains a difficult task.
In this vein, we present a novel method based on both clustering and
classification for developing an efficient intrusion detection system (IDS).
The key idea is to take useful information extracted from fuzzy clustering
into account when building an IDS. To this end, we first present the
cornerstones for constructing additional cluster features for a training set.
Then, we propose an algorithm to generate an IDS based on these cluster
features and the original input features. Finally, we show experimentally
that our method outperforms several well-known methods.
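
The abstract does not spell out how the cluster features are built, so the
following is only a minimal sketch of the general idea, not the authors'
algorithm. It assumes standard fuzzy c-means with fuzzifier m = 2 and uses the
membership degrees of each sample as the additional cluster features. The
function names (fuzzy_c_means, memberships, add_cluster_features) and the toy
data are hypothetical, and the classifier trained on the augmented features is
left out.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Fuzzy membership of every sample in every cluster (rows sum to 1)."""
    # Distance from each sample to each center, floored to avoid division by zero.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means (an assumption; the abstract names no variant)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Each center is the membership-weighted mean of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def add_cluster_features(X, centers, m=2.0):
    """Append the membership degrees to the original input features."""
    return np.hstack([X, memberships(X, centers, m)])

# Toy data standing in for a training set: two well-separated groups.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                     rng.normal(4.0, 0.5, (50, 2))])

centers, U = fuzzy_c_means(X_train, n_clusters=2)
X_aug = add_cluster_features(X_train, centers)
print(X_aug.shape)  # (100, 4): 2 original features + 2 cluster features
```

Under these assumptions, unseen records are augmented with the same centers
(add_cluster_features(X_test, centers)) before being passed to whatever
classifier was trained on X_aug.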



Available from: Jérôme Darmont, Jan 23, 2014
  • ABSTRACT: The dramatic proliferation of sophisticated cyber attacks, in conjunction with the ever-growing use of Internet-based services and applications, is becoming a great concern for any organization. Among the many efficient security solutions proposed in the literature to deal with this evolving threat, ensemble approaches, a particular family of data mining methods, have proven very successful in designing high-performance intrusion detection systems (IDSs) that rest on the combination of multiple classifiers. However, the strength of ensemble systems depends heavily on the methods used to generate and combine the individual classifiers (ensemble members). In this vein, we propose a novel design method to generate a robust ensemble-based IDS. In our approach, individual classifiers are built using both the input feature space and additional features derived from k-means clustering. In addition, the ensemble combination is calculated based on the classification ability of the individual classifiers on different local data regions defined by k-means clustering. Experimental results show that our solution is superior to several state-of-the-art methods.
    15th International Database Engineering and Applications Symposium (IDEAS 11), Lisbon, Portugal; 09/2011
  • ABSTRACT: Cancer has been identified as the leading cause of death. It is predicted that around 20-26 million people will be diagnosed with cancer by 2020. With this alarming rate, there is an urgent need for a more effective methodology to understand, prevent and cure cancer. Microarray technology provides a useful basis for achieving this ultimate goal. In cancer research in particular, it has become almost routine to create gene expression profiles, which can separate patients into good and poor prognosis groups and identify possible tumor subtypes. Such a classification or predictive model offers a useful tool for individualized treatment of disease. However, the accuracy of existing classifiers has been constrained by the curse of dimensionality typically observed in microarray data. In addition to gene selection, one may transform the original data into another representation, where only key gene components are included. Unlike conventional transformation-based techniques found in the literature, this paper presents a novel method that makes use of cluster ensembles, specifically the summarizing information matrix, as the transformed data for the subsequent classification step. Among different state-of-the-art methods, the link-based cluster ensemble approach (LCE) provides a highly accurate clustering, and is thus employed here. The performance of this transformation model is evaluated on published microarray datasets using C4.5, in comparison with benchmark techniques. The findings suggest that the new model can improve classification accuracy over the original data and performs better than the other transformation methods investigated in the empirical study.
    Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics; 10/2013
  • ABSTRACT: Increasing student retention has been a common goal of many academic institutions, especially at the university level. The negative effects of student attrition are evident to students, parents, the university and society as a whole. First-year students are at the greatest risk of dropping out or not completing their degree on time. With this insight, a number of data mining methods have been developed for early detection of students at risk of dropout, and hence the immediate application of assistive measures. Compared to western countries, this subject has attracted only a few studies in Thai universities, with educational data mining being limited to the use of conventional classification models. This paper presents the most recent investigation of student dropout at Mae Fah Luang University, Thailand, and the novel reuse of a link-based cluster ensemble as a data transformation framework for more accurate prediction. The empirical study on a mixed-type data collection covering students’ demographic details, academic performance and enrollment records suggests that the proposed approach is usually more effective than several benchmark transformation techniques across different classifiers.
    International Journal of Machine Learning and Cybernetics 02/2015; DOI:10.1007/s13042-015-0341-x