Conference Paper

An Evaluation on KNN-SVM Algorithm for Detection and Prediction of DDoS Attack

adfa, p. 34, 2011.
© Springer-Verlag Berlin Heidelberg 2011
Ahmad Riza’ain Yusof(1,2), Nur Izura Udzir(1), Ali Selamat(2)
(1) Faculty of Computer Science and Information Technology,
Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
{izura@upm.edu.my}
(2) Faculty of Computing, Universiti Teknologi Malaysia,
UTM Johor Bahru, 81310 Johor Bahru, Malaysia
{rizaain, aselamat}@utm.my
Abstract. The damage caused by DDoS attacks has been increasing year by year. As communication technology advances, these attacks also evolve, and they have become more complex and harder to detect through the use of flash-crowd agents, slow-rate attacks, and amplification attacks that exploit vulnerabilities in DNS servers. Fast detection of DDoS attacks, quick response mechanisms, and proper mitigation are therefore essential for any organization. We investigate DDoS attacks and analyze each of their phases in detail, using machine learning techniques to classify the network status. In this paper, we propose a hybrid KNN-SVM method for classifying, detecting, and predicting DDoS attacks. The simulation results show that each phase of the attack scenario is well partitioned and that precursors of a DDoS attack, as well as the attack itself, can be detected.
Keywords: Distributed denial of service (DDoS), Machine learning classifiers, Security, Intrusion detection, Prediction, Support vector machine (SVM), k-nearest neighbor (KNN), KNN-SVM
1 Introduction
Computer security usually involves three aspects: integrity, confidentiality, and availability. Security threats fall into three categories: breach of confidentiality, failure of authenticity, and unauthorized denial of service [1]. Distributed Denial of Service (DDoS) attacks have become a major problem and one of the latest threats to users, organizations, and the infrastructure of the Internet. In a DDoS attack, the attacker attempts to disrupt a target by flooding it with illegitimate packets, exhausting its resources and overwhelming it so that legitimate requests cannot get through. This trend is reflected in the Arbor security reports for 2005-2010 [8].
This paper analyzes current research challenges in DDoS detection by evaluating machine learning algorithms for detecting and predicting DDoS attacks, covering feature extraction, classification, and clustering. Various hybrid approaches are also employed. The evaluation results illustrate that these research challenges are well suited to machine learning techniques.
This paper is organized as follows. Section 2 reviews related studies, giving an overview of machine learning techniques and briefly describing a number of related techniques for intrusion detection. Section 3 describes the proposed work, including the classifiers, the feature extraction, and the machine learning algorithms evaluated. Section 4 presents the experimental results, and Section 5 concludes the paper.
2 Related Study
Recently, many reports have shown DDoS attacks targeting commercial and government websites [4]. As DDoS attack techniques have advanced, detection research has also evolved, and various methods have been proposed to counter DDoS attacks. DDoS attack detection can be classified into anomaly-based, congestion-based, and other approaches [3]. A network traffic controller using machine learning (ML) techniques was proposed in 1990, aiming to maximize call completion in a circuit-switched telecommunications network [1]. This was one of the works that marked the expansion of ML techniques into the telecommunications networking field. In 1994, ML was first used for Internet flow classification in the context of intrusion detection, which became the starting point for much of the subsequent work on ML techniques for Internet traffic classification.
Gavrilis et al. [4] used an RBF-NN detector, which is a two-layer radial basis function neural network. It estimates the frequencies of nine packet parameters and, based on these frequencies, classifies traffic into the attack or normal class. However, this study does not consider IP spoofing, which is one of the clearest indicators of a DDoS attack. For UDP-type attacks, the detection efficiency is lower than for TCP-type attacks and is particularly low at the beginning of an attack. Defining the k-means centers that minimize the quantization error is also a difficult task.
The hybrid technique proposed by Ming-Yang Su et al. [5] weights the features of DDoS attacks and analyzes the relationship between detection performance and the number of features. The study proposed a genetic algorithm combined with KNN (k-nearest neighbor) for feature selection and weighting. All 35 initial features were weighted in the training phase, and the top-ranked ones were selected to implement a Network Intrusion Detection System (NIDS) for testing. For fast detection, all features are extracted from packet headers, including the IP, TCP, UDP, ICMP, ARP, and IGMP headers. Following the Genetic Algorithm (GA) framework, the proposed NIDS is described in three parts: the features considered in the study; the chromosome encoding and the fitness function; and the selection, crossover, and mutation operators of the GA. Suresh [10] also evaluated machine learning techniques for DDoS detection and found that fuzzy c-means clustering gives better classification and is faster than the other algorithms.
3 Proposed Work
In this section, we describe the methods used in this work for detecting and predicting DDoS attacks: k-nearest neighbor (KNN) and support vector machine (SVM), combined as KNN-SVM.
3.1 Support Vector Machine (SVM)
Support Vector Machines are among the most common and popular machine learning methods for classification and regression tasks [16]. In this method, a set of training examples is given, each marked as belonging to one of two categories. The SVM algorithm then builds a model that predicts whether a new example falls into one category or the other, by finding the hyperplane that separates the two categories with the maximum margin.
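As a rough illustration of the maximum-margin idea, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method in plain Python. It is not the WEKA SVM used in the experiments; the function names `train_linear_svm` and `predict_svm`, the regularization constant `lam`, and the two-feature toy records are hypothetical choices made for this example.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Train a linear SVM with the Pegasos stochastic sub-gradient method.

    Minimizes lam/2 * ||w||^2 + mean hinge loss max(0, 1 - y * w.x).
    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the bias is learned as an ordinary
    weight (a simplification: the bias is then also regularized).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the regularizer: shrink all weights ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... plus the hinge-loss sub-gradient when the margin is below 1.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict_svm(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical, already-normalized two-feature records:
# -1 = normal traffic, +1 = DDoS traffic.
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.9, 0.8), (0.8, 0.95), (0.85, 0.7)]
y = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(X, y)
```

Calling `predict_svm(w, (0.95, 0.9))` places a new record on the DDoS side of the learned hyperplane. Classes that are not linearly separable would need a kernelized SVM such as the one offered by WEKA.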
3.2 K-Nearest Neighbor (k-NN)
The k-NN algorithm has been shown to be very effective in a variety of problem domains, including text categorization [17]. It determines the class label of a test example from the k training examples closest to it. In text categorization, the similarity score between each neighbor document and the test document is used as the weight of that neighbor's category. Fig. 1 illustrates how the classifier uses the distances between a test example and its neighbors.
Fig. 1. A k-nearest neighbor (KNN) classifier [16]
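The similarity-weighted voting described above can be sketched as follows. The paper does not name the similarity measure, so this sketch assumes cosine similarity (common in text categorization); the function names and the four training vectors are hypothetical.

```python
import math
from collections import defaultdict

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def weighted_knn_predict(train_X, train_y, query, k=3):
    """Similarity-weighted k-NN: each of the k training examples most similar
    to the query adds its similarity score to its own class; the class with
    the highest total wins."""
    scored = [(cosine_similarity(x, query), label) for x, label in zip(train_X, train_y)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    totals = defaultdict(float)
    for similarity, label in scored[:k]:
        totals[label] += similarity
    return max(totals, key=totals.get)

# Hypothetical count-style feature vectors for four labelled training examples.
train_X = [(3, 0, 1), (2, 0, 0), (0, 3, 1), (0, 2, 2)]
train_y = ["normal", "normal", "attack", "attack"]
```

Weighting by similarity means a very close neighbor counts for more than a distant one, which reduces the influence of the choice of k compared with plain majority voting.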
3.3 Feature Extraction
We studied various types of DDoS attacks to select the traffic parameters that change abnormally during such attacks. All features are ranked by information gain to identify the most relevant ones, and eight features are extracted from both datasets. Feature selection and extraction can improve the accuracy of many machine learning models, and feature selection has also been shown to be important for ranking [10]. Information gain is used to measure the importance of each feature. The information gain of an attribute X with respect to the class Y is the reduction in uncertainty about the value of Y after observing the values of X. The uncertainty about the value of Y is measured by its entropy, defined as
$$H(Y) = -\sum_{i} P(y_i)\,\log_2 P(y_i) \qquad (1)$$

where P(y_i) is the prior probability of each value y_i of Y. The uncertainty about the value of Y after observing the values of X is given by the conditional entropy of Y given X, defined as

$$H(Y \mid X) = -\sum_{j} P(x_j) \sum_{i} P(y_i \mid x_j)\,\log_2 P(y_i \mid x_j) \qquad (2)$$

where P(y_i | x_j) is the posterior probability of Y given the values of X. The information gain is thus defined as

$$IG(Y \mid X) = H(Y) - H(Y \mid X) \qquad (3)$$
Table 1. List of extracted features

| Info. gain rank | Feature no. | Feature | Description |
|---|---|---|---|
| 1 | 5 | Src bytes | Number of data bytes from source to destination |
| 2 | 23 | Count | Number of connections to the same host as the current connection in the past two seconds |
| 3 | 3 | Service | Network service on the destination, e.g., http, telnet, etc. |
| 4 | 24 | Srv count | Number of connections to the same service as the current connection in the past two seconds |
| 5 | 36 | Dst host same src port rate | Percentage of connections to the current host having the same source port |
| 6 | 2 | Protocol type | Connection protocol (TCP, UDP, ICMP) |
| 7 | 33 | Dst host srv count | Count of connections having the same destination host and using the same service |
| 8 | 35 | Dst host diff srv rate | Percentage of different services on the current host |
By calculating information gain, the attributes can be ranked according to their correlation with the class, and the most important attributes can then be selected based on this ranking. Based on the result, the eight features listed in Table 1 are selected for the detection of DDoS attacks.
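A minimal sketch of Eqs. (1)-(3) and of ranking attributes by information gain is shown below. It assumes the features have already been discretized (continuous features such as Src bytes would need binning first, which the paper does not describe); the column names are real KDD99 feature names, but the six records are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Eq. (1): H(Y) = -sum_i P(y_i) log2 P(y_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature, labels):
    """Eq. (2), rewritten as H(Y|X) = sum_j P(x_j) * H(Y | X = x_j)."""
    n = len(labels)
    total = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        total += (count / n) * entropy(subset)
    return total

def information_gain(feature, labels):
    """Eq. (3): IG(Y|X) = H(Y) - H(Y|X)."""
    return entropy(labels) - conditional_entropy(feature, labels)

def rank_features(columns, labels):
    """Return feature names sorted by decreasing information gain."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)

# Hypothetical, already-discretized records (not taken from KDD99).
labels = ["attack", "attack", "attack", "normal", "normal", "normal"]
columns = {
    "flag": ["SF", "S0", "SF", "S0", "SF", "S0"],
    "service": ["ecr_i", "ecr_i", "http", "http", "http", "ftp"],
    "protocol_type": ["icmp", "icmp", "icmp", "tcp", "tcp", "udp"],
}
ranking = rank_features(columns, labels)
```

Here `protocol_type` separates the classes perfectly and receives the full 1 bit of gain, so it is ranked first, followed by `service` and `flag`.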
3.4 Machine Learning Algorithms
In this part, we briefly describe the machine learning algorithms used in our experiments.
3.4.1 Naive Bayes
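The naïve Bayes classifier is a simple probabilistic classifier. According to Livadas et al. [12], a widely used framework for classification is provided by a simple theorem of probability known as Bayes' rule, Bayes' theorem, or Bayes' formula:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

where C denotes a class and X the observed feature vector. Naïve Bayes additionally assumes that the features are conditionally independent given the class, so that P(X | C) is the product of the per-feature likelihoods.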
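A minimal sketch of a categorical naïve Bayes classifier built on Bayes' rule is given below. The Laplace (add-one) smoothing, the class and method names, and the toy records are assumptions for illustration, not the WEKA implementation used in the experiments.

```python
from collections import Counter, defaultdict

class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-one) smoothing.

    predict() returns argmax_C P(C) * prod_k P(x_k | C): Bayes' rule with the
    class-independent denominator P(X) dropped and the naive assumption that
    features are conditionally independent given the class.
    """

    def fit(self, X, y):
        self.n = len(y)
        self.class_counts = Counter(y)
        # counts[c][k][v]: number of class-c records whose k-th feature equals v
        self.counts = defaultdict(lambda: defaultdict(Counter))
        # values[k]: distinct values seen for feature k (used in smoothing)
        self.values = defaultdict(set)
        for row, label in zip(X, y):
            for k, v in enumerate(row):
                self.counts[label][k][v] += 1
                self.values[k].add(v)
        return self

    def predict(self, row):
        best_class, best_score = None, -1.0
        for c, c_count in self.class_counts.items():
            score = c_count / self.n  # prior P(C = c)
            for k, v in enumerate(row):
                # Laplace-smoothed likelihood P(x_k = v | C = c)
                score *= (self.counts[c][k][v] + 1) / (c_count + len(self.values[k]))
            if score > best_score:
                best_class, best_score = c, score
        return best_class

# Hypothetical (protocol_type, service) records.
X = [("icmp", "ecr_i"), ("icmp", "ecr_i"), ("tcp", "http"),
     ("tcp", "http"), ("udp", "domain_u"), ("tcp", "ftp")]
y = ["attack", "attack", "normal", "normal", "normal", "normal"]
model = CategoricalNaiveBayes().fit(X, y)
```

With many features, summing log-probabilities instead of multiplying probabilities avoids numerical underflow; the smoothing keeps an unseen value from forcing a class score to zero.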
3.4.2 C4.5
Among classification algorithms, the C4.5 system of Quinlan [13] is the result of machine learning research that traces back to the ID3 system [14], which tries to find small decision trees.
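C4.5 chooses the attribute to split on by the gain ratio, i.e. the information gain of Eq. (3) divided by the split information of the attribute, which penalizes attributes with many distinct values. The sketch below shows only this split criterion for categorical attributes (no tree growing, continuous-attribute handling, or pruning); the attribute names and records are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """C4.5 split criterion: information gain / split information."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    gain = entropy(labels) - conditional
    split_info = entropy(feature)  # entropy of the partition induced by the feature
    return gain / split_info if split_info > 0 else 0.0

def best_split(columns, labels):
    """Name of the attribute with the highest gain ratio."""
    return max(columns, key=lambda name: gain_ratio(columns[name], labels))

labels = ["attack", "attack", "attack", "normal", "normal", "normal"]
columns = {
    # A record identifier: unique per row, so its plain information gain is 1 bit.
    "record_id": ["r1", "r2", "r3", "r4", "r5", "r6"],
    # Also separates the classes perfectly (1 bit of gain) with only three values.
    "protocol_type": ["icmp", "icmp", "icmp", "tcp", "tcp", "udp"],
}
chosen = best_split(columns, labels)
```

Both attributes have an information gain of 1 bit, so plain information gain (as in ID3) cannot tell them apart, while the gain ratio prefers `protocol_type` because `record_id` fragments the data into single-record subsets.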
3.4.3 K-Means Clustering
K-means, or hard c-means, clustering is a partitioning method that treats observations of the data as objects with locations and distances between the input data points. It partitions the objects into K mutually exclusive clusters such that objects within each cluster are as close as possible to each other and as far as possible from objects in other clusters [15].
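The partitioning described above is usually computed with Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below assumes squared Euclidean distance and a deterministic farthest-first initialization (chosen here for reproducibility; random or k-means++ initialization is more common); the data points are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))

def kmeans(points, k, iterations=100):
    """Hard (Lloyd's) k-means: alternate assignment and centroid-update steps."""
    # Farthest-first initialization: start from the first point, then keep
    # adding the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, [nearest(p, centroids) for p in points]

points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
centroids, labels = kmeans(points, k=2)
```

Each point ends up in exactly one cluster, which is the "hard" assignment that distinguishes k-means from the fuzzy c-means method described in Section 3.4.5.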
3.4.4 K-NN Classifier
The k-NN algorithm is a similarity-based learning algorithm and is known to
be highly effective in various problem domains, including classification problems.
Given a test element dt, the k-NN algorithm finds its k-nearest neighbors among the
training elements, which form the neighborhood of dt. Majority voting among the ele-
ments in the neighborhood is used to decide the class for dt.
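The majority-voting rule above can be sketched in a few lines. Euclidean distance, `k = 3`, and the toy records are assumptions for illustration; with real KDD99 features the attributes would normally be scaled first so that large-valued features such as Src bytes do not dominate the distance.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, d_t, k=3):
    """Majority-vote k-NN: take the k training elements closest to the test
    element d_t (Euclidean distance) and return their most common class."""
    neighborhood = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], d_t))[:k]
    votes = Counter(label for _, label in neighborhood)
    return votes.most_common(1)[0][0]

# Hypothetical normalized two-feature records.
train_X = [(0.05, 0.1), (0.1, 0.05), (0.2, 0.15), (0.9, 0.95), (0.85, 0.8), (0.95, 0.9)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]
```

An odd k avoids ties in two-class problems such as attack versus normal traffic.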
3.4.5 FCM Clustering
Fuzzy c-means (FCM) is a method of clustering which allows one piece of data to
belong to two or more clusters. This method (developed by Dunn in 1973 and improved
by Bezdek in 1981) is frequently used in pattern recognition. It is based on minimiza-
tion of the following objective function:
$$J_m(U, v) = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m}\,\lVert x_i - v_j \rVert^{2} \qquad (4)$$

where m is any real number greater than 1, u_ij is the degree of membership of x_i in cluster j, x_i is the i-th of the N d-dimensional measured data points, v_j is the d-dimensional center of cluster j (one of C clusters), and ||*|| is any norm expressing the similarity between a measured data point and a center.
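FCM minimizes Eq. (4) by alternating two standard updates (Bezdek's algorithm): memberships u_ij = 1 / Σ_k (||x_i - v_j|| / ||x_i - v_k||)^(2/(m-1)) and centers v_j = Σ_i u_ij^m x_i / Σ_i u_ij^m. The sketch below implements these updates with m = 2, Euclidean distance, and a deterministic farthest-first initialization; the function names, the stopping tolerance, and the data points are assumptions for illustration.

```python
import math

def update_memberships(points, centers, m):
    """u_ij = 1 / sum_k (||x_i - v_j|| / ||x_i - v_k||)^(2/(m-1))."""
    U = []
    for x in points:
        d = [math.dist(x, v) for v in centers]
        if min(d) == 0.0:
            # x coincides with a center: share full membership among such centers.
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            U.append([h / sum(hits) for h in hits])
        else:
            p = 2.0 / (m - 1.0)
            U.append([1.0 / sum((dj / dk) ** p for dk in d) for dj in d])
    return U

def update_centers(points, U, m):
    """v_j = sum_i u_ij^m x_i / sum_i u_ij^m."""
    centers = []
    for j in range(len(U[0])):
        weights = [row[j] ** m for row in U]
        total = sum(weights)
        centers.append([sum(w * x[dim] for w, x in zip(weights, points)) / total
                        for dim in range(len(points[0]))])
    return centers

def objective(points, U, centers, m):
    """Eq. (4): J_m(U, v) = sum_i sum_j u_ij^m ||x_i - v_j||^2."""
    return sum(U[i][j] ** m * math.dist(x, v) ** 2
               for i, x in enumerate(points) for j, v in enumerate(centers))

def fcm(points, c, m=2.0, iterations=100, tol=1e-6):
    # Farthest-first initial centers (a deterministic choice for this sketch).
    centers = [list(points[0])]
    while len(centers) < c:
        centers.append(list(max(points, key=lambda p: min(math.dist(p, v) for v in centers))))
    for _ in range(iterations):
        U = update_memberships(points, centers, m)
        new_centers = update_centers(points, U, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, update_memberships(points, centers, m)

points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
centers, U = fcm(points, c=2)
hard_labels = [max(range(len(row)), key=row.__getitem__) for row in U]
J = objective(points, U, centers, 2.0)
```

Unlike k-means, every point keeps a membership degree in both clusters; `hard_labels` takes the cluster with the largest membership when a crisp decision (attack or normal) is needed.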
4 Experimental Results
The KDD99 dataset [17] is used in the experiments as the source of attack traffic. Classification of attack and normal traffic is performed using WEKA. Table 2 shows the dataset, including the normal and attack traffic. Table 3 shows the correct classification rate and the attack detection time of each method. Table 4 shows the F-measure details, and Fig. 2 shows the evaluation results as ROC curves for the selected machine learning techniques.
4.1 Performance Evaluation Criteria
Two criteria are chosen for evaluating the performance of the classifiers: True Positive Rate (TPR) and False Positive Rate (FPR).
$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN} \qquad (5)$$
In formula (5), TP (true positives), FN (false negatives), FP (false positives), and TN (true negatives) are as defined in [9]. TPR describes the sensitivity of the classifier, while FPR gives the rate of false alarms. Plotting TPR against FPR yields a Receiver Operating Characteristic (ROC) curve, a tool that originates from signal detection theory.
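The criteria of Eq. (5) and the ROC construction can be computed as follows. The paper does not define the F-measure, so the sketch assumes the usual F1 score (harmonic mean of precision and TPR); the function names are hypothetical. With the Fuzzy C Mean counts from Table 4 (TP = 298, FP = 2, TN = 270, FN = 3) it gives TPR ≈ 0.990, FPR ≈ 0.0074, and F ≈ 0.992.

```python
def classification_rates(tp, fp, tn, fn):
    """TPR and FPR as in Eq. (5), plus precision and the F-measure (F1)."""
    tpr = tp / (tp + fn)        # sensitivity (recall)
    fpr = fp / (fp + tn)        # false-alarm rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    return {"TPR": tpr, "FPR": fpr, "precision": precision, "F": f_measure}

def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering a decision threshold through the
    classifier scores one example at a time (label 1 = attack, 0 = normal).
    Assumes distinct scores; tied scores would need to be grouped."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

# Counts from the Fuzzy C Mean row of Table 4.
fcm_rates = classification_rates(tp=298, fp=2, tn=270, fn=3)
```

A single confusion matrix gives one (FPR, TPR) point; the full ROC curve requires per-record scores from the classifier, as in `roc_points`.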
Fig. 1. Attack traffic trace at 11.30 a.m. [17]

Table 2. Sample collected

| Network data | Data type | Total number of records |
|---|---|---|
| Trained | Full set data | 494,021 |
| | Normal | 97,277 |
| | DDoS attack | 391,458 |

Table 3. Classification results

| Method | Correct classification (%) | Detection time (s) |
|---|---|---|
| SVM | 96.4 | 0.23 |
| KNN | 96.6 | 0.26 |
| Decision Tree | 95.6 | 0.25 |
| K-Mean | 96.7 | 0.20 |
| Naive Bayesian | 92.9 | 0.52 |
| Fuzzy C Mean | 98.7 | 0.15 |

Table 4. F-measure details of classifiers

| Method | TP | FP | TN | FN | F-measure |
|---|---|---|---|---|---|
| SVM | 281 | 18 | 253 | 20 | 0.96 |
| KNN | 280 | 20 | 243 | 30 | 0.97 |
| Decision Tree | 277 | 22 | 218 | 55 | 0.96 |
| K-Mean | 285 | 15 | 273 | 0 | 0.97 |
| Naive Bayesian | 292 | 10 | 256 | 17 | 0.97 |
| Fuzzy C Mean | 298 | 2 | 270 | 3 | 0.99 |
5 Conclusion
The dataset was evaluated using machine learning algorithms to detect DDoS attacks effectively. The KDD99 dataset was used as the attack data, and relevant features were selected based on information gain ranking. The experimental results show that fuzzy c-means clustering gives the best classification and is faster than the other algorithms.
6 Acknowledgement
The authors would like to thank the anonymous reviewers for their constructive comments and valuable suggestions. The authors also thank Universiti Teknologi Malaysia (UTM) for its support under Research University Grant Vot-02G31, and the Ministry of Higher Education Malaysia (MOHE) for its support under the Fundamental Research Grant Scheme (FRGS Vot-4F551), which made the completion of this research possible.
References
1. Silver, B.: Netman: A learning network traffic controller. In: Proc. Third International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems. Association for Computing Machinery (1990)
2. Witten, I., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd edn. Morgan Kaufmann Publishers (2005)
3. Ferguson, P., Senie, D.: Network ingress filtering: Defeating denial of service attacks which employ IP source address spoofing. RFC 2267 (January 1998)
4. Gavrilis, D., Dermatas, E.: Real-time detection of distributed denial-of-service attacks using RBF networks and statistical features. Computer Networks 48(2), 235–245 (2005). doi:10.1016/j.comnet.2004.08.014
5. Lee, K., Kim, J., Kwon, K.H., Han, Y., Kim, S.: DDoS attack detection method using cluster analysis. Expert Systems with Applications 34, 1659–1665 (2008)
6. Panda, M., Patra, M.R.: Evaluating machine learning algorithms for detecting network intrusions. International Journal of Recent Trends in Engineering 1(1), 472–477 (2009)
7. Arbor Networks: Annual report (2015), http://www.arbornetworks.com/resources/annual-security-report (accessed January 2016)
8. Geng, X., Liu, T., Qin, T., Li, H.: Feature selection for ranking. In: Proc. 30th Annual International ACM SIGIR Conference, pp. 407–414 (2007)
9. Suresh, M., Anitha, R.: Evaluating machine learning algorithms for detecting DDoS attacks. In: 4th International Conference, CNSA 2011, Chennai, India, pp. 441–452 (2011). doi:10.1007/978-3-642-22540-6_42
10. Livadas, C., Walsh, R., Lapsley, D., Strayer, W.T.: Using machine learning techniques to identify botnet traffic. In: Proc. 31st IEEE Conference on Local Computer Networks, pp. 967–974 (2006). doi:10.1109/LCN.2006.322210
11. Quinlan, J.R.: Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4, 77–90 (1996). doi:10.1613/jair.279
12. Quinlan, J.R.: Induction of decision trees. Machine Learning 1(1), 81–106 (1986)
13. Ghosh, S., Dubey, S.: Comparative analysis of K-means and fuzzy C-means algorithms. IJACSA 4(4), 35–39 (2013). doi:10.14569/IJACSA.2013.040406
14. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)
15. Guo, G., Wang, H., Bell, D., Bi, Y., Greer, K.: Using kNN model-based approach for automatic text categorization. Soft Computing 10(5), 423–430 (2006)
16. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD CUP 99 data set. In: IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA 2009), pp. 1–6 (2009)
17. CAIDA: The CAIDA UCSD "DDoS Attack 2007" dataset, http://www.caida.org/data/passive/ddos-20070804_dataset.xml (2013)
... We got very few research papers where a machine learning algorithm was used to train a model for the detection of drive-by downloads, man-in-the-middle, and Malware attacks [102], [64], [44], [16], [79] 90.0 Naive Bayes [102], [93], [44], [5], [88], [116] 84.6 Random Forest [8], [93], [44], [36], [5], [88] 93.34 Decision Tress [8], [102], [44], [36], [16], [106] 96 XGBoost [8], [36], [70], [25], [90] 96.2 KNN [102], [36], [16], [32], [115] 96.5 [21], [85], [114], [58], [75], [6] 80.431 SVM [21], [85], [58], [75], [14], [6] 89.429 Random Forest [21], [85], [114], [58], [75], [13] 97.065 Decision Tree [85], [58], [75], [14], [6], [86] 95.248 Logistic Regression [21], [85], [75], [17], [76], [101] 92.589 KNN [21], [85], [58], [75], [6], [11] 90.479 or any of their combination. The few research papers where machine learning algorithms were used to train models for the detection of drive-by download or man-in-the-middle were so few that comparing the result with the performance of machine learning algorithms for the detection of other categories of cyberattack will lead to severe bias on the result by tilting it against the performance of ML models in the detection of other categories of cyberattacks. ...
Conference Paper
Full-text available
In this research, we analyzed the suitability of each of the current state-of-the-art machine learning models for various cyberattack detection from the past 5 years with a major emphasis on the most recent works for comparative study to identify the knowledge gap where work is still needed to be done with regard to detection of each category of cyberattack. We also reviewed the suitability, efficiency and limitations of recent research on state-of-the-art classifiers and novel frameworks in the detection of different cyberattacks. Our result shows the need for; further research and exploration on machine learning approach for the detection of drive-by download attacks, an investigation into the mix performance of Naive Bayes to identify possible research direction on improvement to existing state-of-the-art Naive Bayes classifier, we also identify that current machine learning approach to the detection of SQLi attack cannot detect an already compromised database with SQLi attack signifying another possible future research direction.
... Authors Mean Score Naive Bayes [27], [28], [29], [30], [31], [32], [33] 90.4 SVM [28], [29], [34], [30], [32], [35], [33] 87.63 Random Forest [27], [28], [32], [35], [36], [37], [33] 93.71 Logistic Regression [27], [30], [35], [38], [37], [33], [39] 89.68 KNN [32], [37], [33], [40], [41], [39], [42] 87.2 Decision Tree [35], [29], [36], [33], [40], [41], [43] 90.04 [44], [45], [46], [47], [48], [49], [50] 90.0 Naive Bayes [44], [51], [46], [52], [53], [54], [47] 84.6 Random Forest [55], [51], [46], [56], [52], [53], [57] 93.34 Decision Tress [55], [44], [46], [56], [47], [49], [58] 96 XGBoost [55], [56], [50], [57], [59], [60] 96.2 KNN [44], [56], [47], [61], [62], [63], [58] 96.5 Naive Bayes [23], [64], [65], [66], [24], [67], [68], [69], [70], [71], [72], [73], [74], [75], [76] 80.431 SVM [23], [64], [66], [24], [77], [67], [78], [79], [80], [81], [82], [75] 89.429 Random Forest [23], [64], [65], [66], [24], [83], [77], [67], [80], [84], [78], [85], [82] 97.065 Decision Tree [64], [66], [24], [77], [67], [78], [80], [86], [82], [70], [71], [75] 95.248 Logistic Regression [23], [64], [24], [79], [80], [87], [85], [82], [88], [69], [70] 92.589 KNN [23], [64], [66], [24], [67], [84], [79], [80], [87], [85], [89], [82], [71], [75] 90.479 1. Insufficient research on the capability of machine learning algorithms for the detection of drive-by downloads, man-in-the-middle, and Malware attacks. ...
Preprint
Full-text available
To secure computers and information systems from attackers taking advantage of vulnerabilities in the system to commit cybercrime, several methods have been proposed for real-time detection of vulnerabilities to improve security around information systems. Of all the proposed methods, machine learning had been the most effective method in securing a system with capabilities ranging from early detection of software vulnerabilities to real-time detection of ongoing compromise in a system. As there are different types of cyberattacks, each of the existing state-of-the-art machine learning models depends on different algorithms for training which also impact their suitability for detection of a particular type of cyberattack. In this research, we analyzed each of the current state-of-theart machine learning models for different types of cyberattack detection from the past 10 years with a major emphasis on the most recent works for comparative study to identify the knowledge gap where work is still needed to be done with regard to detection of each category of cyberattack
... Denial-of-service (DoS) attacks [9] involve resource stacking to make a system inaccessible to service requests [3]. As with DDoS attacks [18] [19], these attacks are launched from a huge number of infected and controlled host devices. DDoS [26][27]assaults use botnets to fully disable a website or online service. ...
Article
The rapid evolution of communication networks, particularly Software-Defined Networking (SDN) and next-generation communication infrastructures, has introduced new challenges in securing these dynamic and complex environments. Among the most persistent threats are Distributed Denial of Service (DDoS) attacks, which can disrupt critical services and inflict severe economic and operational damages. To combat these threats, novel and adaptive DDoS detection mechanisms are crucial. This paper proposes a Bayesian Regularization (BR) optimization-based approach for DDoS detection in SDN and next-generation communication networks. Bayesian Regularization is a statistical technique that combines the strength of Bayesian analysis with optimization methodologies, enabling the model to adapt to changing network conditions and attack strategies. This approach leverages the inherent advantages of SDN, such as centralized control and real-time network monitoring, to enhance the accuracy and timeliness of DDoS detection.
... The decision boundary is maximized; when new data points arrive based on the nature of the point, it categorizes the data into the clusters previously formed. Thus, the SVM algorithm can successfully differentiate the exact nature of the flow of traffic in both normal and DDoS scenarios [39]. ...
Article
Full-text available
The development of smart network infrastructure of the Internet of Things (IoT) faces the immense threat of sophisticated Distributed Denial-of-Services (DDoS) security attacks. The existing network security solutions of enterprise networks are significantly expensive and unscalable for IoT. The integration of recently developed Software Defined Networking (SDN) reduces a significant amount of computational overhead for IoT network devices and enables additional security measurements. At the prelude stage of SDN-enabled IoT network infrastructure, the sampling based security approach currently results in low accuracy and low DDoS attack detection. In this paper, we propose an Adaptive Machine Learning based SDN-enabled Distributed Denial-of-Services attacks Detection and Mitigation (AMLSDM) framework. The proposed AMLSDM framework develops an SDN-enabled security mechanism for IoT devices with the support of an adaptive machine learning classification model to achieve the successful detection and mitigation of DDoS attacks. The proposed framework utilizes machine learning algorithms in an adaptive multilayered feed-forwarding scheme to successfully detect the DDoS attacks by examining the static features of the inspected network traffic. In the proposed adaptive multilayered feed-forwarding framework, the first layer utilizes Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), k-Nearest Neighbor (kNN), and Logistic Regression (LR) classifiers to build a model for detecting DDoS attacks from the training and testing environment-specific datasets. The output of the first layer passes to an Ensemble Voting (EV) algorithm, which accumulates the performance of the first layer classifiers. In the third layer, the adaptive frameworks measures the real-time live network traffic to detect the DDoS attacks in the network traffic. 
The proposed framework utilizes a remote SDN controller to mitigate the detected DDoS attacks over Open Flow (OF) switches and reconfigures the network resources for legitimate network hosts. The experimental results show the better performance of the proposed framework as compared to existing state-of-the art solutions in terms of higher accuracy of DDoS detection and low false alarm rate.
... Existing solutions for ISPs to defend against DDoS attacks involving statistical and traditional machine learning-based approaches [11,12,13] can be easily overwhelmed with current rates of DDoS traffic. Other approaches involving SDNs within ISPs network [14,15] suffer from scalability issues that come with SDN along with the threat that SDN itself could be an easy target of DDoS attacks. ...
Article
The growing number of IoT edge devices have inflicted a change in the cyber-attack space. The DDoS attacks, in particular, have significantly increased in magnitude and intensity. Of the existing DDoS solutions, while the destination-based defense mechanisms incur high false positives due to the seemingly legitimate nature of the attack traffic, defense mechanisms implemented at the source alone do not suffice due to the lack of visibility into ongoing DDoS attacks. This paper proposes a distributed DDoS detection and mitigation framework, SmartDefense, based on edge computing approaches towards detecting and mitigating DDoS attacks at and near the source. By mitigating the DDoS attacks near the source, SmartDefense significantly reduces unnecessary bandwidth otherwise consumed by DDoS traffic going from residential edge networks to the ISP edge network. Furthermore, SmartDefense demonstrates how ISPs can detect botnet devices in their customer’s network by having smart edge devices pass attributes that are processed by the botnet detection engine at the provider’s edge. The evaluation of this work shows that SmartDefense can improve the detection and mitigation rate, with over 90% of DDoS traffic caught at the source and over 97.5% of remaining DDoS traffic caught at the provider’s edge. Our experiments also demonstrate how using a botnet detection engine can further reduce the DDoS traffic by up to 51.95% by facilitating ISPs to detect bot devices in their customers’ edge network.
Chapter
The distributed denial-of-service (DDOS) exploit is one of the most catastrophic assaults on the Internet, disrupting the performance of critical administrations offered by numerous organizations. These attacks have become increasingly complicated, and their number has been steadily increasing, making it harder to detect and respond to such assaults As a result, a sharp security system (IDS) is necessary to detect and control any unexpected system traffic behavior. In a DDOS Assaults, the intruder delivers a stream of packets to the server while exploiting known or unknown flaws and vulnerabilities.KeywordsDDOS assaultsNetwork securityDecision treeNaïve BayesSVMNeural networkFuzzy logicLearning techniques
Article
Full-text available
In the arena of software, data mining technology has been considered as useful means for identifying patterns and trends of large volume of data. This approach is basically used to extract the unknown pattern from the large set of data for business as well as real time applications. It is a computational intelligence discipline which has emerged as a valuable tool for data analysis, new knowledge discovery and autonomous decision making. The raw, unlabeled data from the large volume of dataset can be classified initially in an unsupervised fashion by using cluster analysis i.e. clustering the assignment of a set of observations into clusters so that observations in the same cluster may be in some sense be treated as similar. The outcome of the clustering process and efficiency of its domain application are generally determined through algorithms. There are various algorithms which are used to solve this problem. In this research work two important clustering algorithms namely centroid based K-Means and representative object based FCM (Fuzzy C-Means) clustering algorithms are compared. These algorithms are applied and performance is evaluated on the basis of the efficiency of clustering output. The numbers of data points as well as the number of clusters are the factors upon which the behaviour patterns of both the algorithms are analyzed. FCM produces close results to K-Means clustering but it still requires more computation time than K-Means clustering.
Article
Full-text available
An investigation is conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbors (kNN) classifier and the Rocchio classifier. After identifying the weakness and strength of each technique, a new classifier called the kNN model-based classifier (kNN Model) is proposed. It combines the strength of both kNN and Rocchio. A text categorization prototype, which implements kNN Model along with kNN and Rocchio, is described. An experimental evaluation of different methods is carried out on two common document corpora: the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the proposed kNN model-based method outperforms the kNN and Rocchio classifiers, and is therefore a good alternative for kNN and Rocchio in some application areas.
Article
Full-text available
With recent advances in network based technology and increased dependability of our everyday life on this technology, assuring reliable operation of network based system is very important. Signature based intrusion detection systems cannot detect new attacks. These systems are the most used and developed ones. Current anomaly based intrusion detection systems are also unable to detect all kinds of new attacks because they are designed to restricted applications on limited environments. It is important problems to increase the detection rates and reduce the false positive rates in network intrusion detection systems (NIDS). In this paper, we propose machine learning algorithms such as Random Forest and AdaBoost, along with Naïve Bayes, to build an efficient intrusion detection model. We also report our experimental results over KDDCup'1999 datasets. The results shows that the choice of any data mining algorithm is a compromise among the time taken to build the model, detection rate and low false alarm rate .
Conference Paper
Recently, as the serious damage caused by DDoS attacks increases, rapid detection of the attack and proper response mechanisms have become urgent. Signature-based DDoS detection systems cannot detect new attacks. Current anomaly-based detection systems are also unable to detect all kinds of new attacks, because they are designed for restricted applications in limited environments. Moreover, existing security mechanisms do not provide effective defense against these attacks, or their defense capability is limited to specific DDoS attacks. It is necessary to analyze the fundamental features of DDoS attacks because these attacks can easily vary the port/protocol or operation method used. Much research has also been done on detecting these attacks using machine learning techniques, but which features are relevant and which technique is best suited to attack detection remain open questions. In this paper, we use the chi-square and information gain feature selection mechanisms to select the important attributes. With the selected attributes, various machine learning models, such as Naive Bayes, C4.5, SVM, KNN, K-means and Fuzzy c-means clustering, are developed for efficient detection of DDoS attacks. Our experimental results show that Fuzzy c-means clustering gives better accuracy in identifying the attacks.
Keywords: Classifier, Naive Bayes, SVM, C4.5, K-NN, K-means, Fuzzy c-means
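Chi-square feature selection scores each attribute by how far its joint counts with the class label deviate from what independence would predict. Higher scores indicate more class-relevant attributes. A minimal sketch for categorical attributes, assuming the traffic features are already discretized; the feature names and toy values in the example are hypothetical, and the abstract does not give the paper's exact scoring setup. Information gain would rank the same columns using entropies instead.

```python
from collections import Counter


def chi_square(feature_values, labels):
    """Pearson chi-square statistic between one categorical feature and the class."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    f_counts = Counter(feature_values)
    c_counts = Counter(labels)
    stat = 0.0
    for f, fc in f_counts.items():
        for c, cc in c_counts.items():
            expected = fc * cc / n  # count expected if feature and class were independent
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


def rank_features(rows, labels, names):
    """Sort (hypothetical) feature names by chi-square score, most relevant first."""
    scores = {name: chi_square([r[i] for r in rows], labels)
              for i, name in enumerate(names)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In a toy set where the protocol value perfectly separates DDoS from normal records and a TCP flag value is evenly spread across both classes, the protocol column scores 4.0 and the flag column 0.0, so only the protocol would be kept.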
Conference Paper
To date, techniques to counter cyber-attacks have predominantly been reactive; they focus on monitoring network traffic, detecting anomalies and cyber-attack traffic patterns, and, a posteriori, combating the cyber-attacks and mitigating their effects. Contrary to such approaches, we advocate proactively detecting and identifying botnets before they are used as part of a cyber-attack (Strayer et al., 2006). In this paper, we present our work on using machine learning-based classification techniques to identify the command and control (C2) traffic of IRC-based botnets: compromised hosts that are collectively commanded using Internet relay chat (IRC). We split this task into two stages: (I) distinguishing between IRC and non-IRC traffic, and (II) distinguishing between botnet and real IRC traffic. For stage I, we compare the performance of J48, naive Bayes, and Bayesian network classifiers, identify the features that achieve good overall classification accuracy, and determine the classification sensitivity to the training set size. While sensitive to the training data and the attributes used to characterize communication flows, machine learning-based classifiers show promise in identifying IRC traffic. Using classification in stage II is trickier, since accurately labeling IRC traffic as botnet and non-botnet is challenging. We are currently exploring labeling flows as suspicious and non-suspicious based on telltales of hosts being compromised.
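The two-stage split can be pictured as a cascade: a stage I classifier separates IRC from non-IRC flows, and a stage II classifier is applied only to flows labeled IRC. Below is a minimal sketch using a categorical naive Bayes, one of the classifier families the abstract compares, with add-one smoothing. The discretized flow features (port, packet-size bucket, timing pattern), the class names and the helper names are all hypothetical, chosen only to illustrate the cascade.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes over discrete features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.classes = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(Counter)  # (class, feature index) -> value counts
        self.values = defaultdict(set)      # feature index -> values seen in training
        for row, y in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[(y, i)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, row):
        best, best_score = None, -math.inf
        for y, cy in self.classes.items():
            score = math.log(cy / self.n)  # log prior
            for i, v in enumerate(row):
                k = len(self.values[i])
                # Smoothed log-likelihood of value v for feature i given class y.
                score += math.log((self.counts[(y, i)][v] + 1) / (cy + k))
            if score > best_score:
                best, best_score = y, score
        return best


def two_stage(stage1, stage2, flow):
    """Stage I: IRC vs non-IRC; stage II runs only on flows labeled IRC."""
    if stage1.predict(flow) != "irc":
        return "non-irc"
    return stage2.predict(flow)
```

Cascading keeps stage II focused on the harder botnet-versus-real-IRC decision, although any stage I error propagates, which is consistent with the abstract's point that labeling data for stage II is the harder part.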
Book
Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Article
Recent occurrences of various Denial of Service (DoS) attacks that have employed forged source addresses have proven to be a troublesome issue for Internet Service Providers and the Internet community overall. This paper discusses a simple, effective and straightforward method for using ingress traffic filtering to prevent DoS attacks that use forged IP addresses from being propagated from 'behind' an Internet Service Provider's (ISP) aggregation point.
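The core of ingress filtering is a single check at the ISP's edge: a packet arriving from a downstream customer link is forwarded only if its source address falls inside a prefix legitimately assigned to that customer. A minimal sketch using Python's standard ipaddress module; the prefixes below are hypothetical documentation ranges, and a real deployment applies the rule in router ACLs rather than in application code.

```python
import ipaddress

# Hypothetical prefixes assigned to the customer network behind this ingress link.
ALLOWED_PREFIXES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def permit_ingress(src_ip, allowed=ALLOWED_PREFIXES):
    """Forward only if the packet's source address lies in an assigned prefix."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in allowed)
```

A packet from 203.0.113.7 is forwarded, while one claiming to come from 192.0.2.1, or from 198.51.100.200 (outside the /25), is dropped as a likely forged source address.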
Article
The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
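ID3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain, i.e. the largest reduction in class entropy. A compact sketch of that selection rule and the recursive construction for categorical attributes, assuming records are dicts. It omits the noise handling and other refinements the abstract mentions, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    if len(set(labels)) == 1:
        return labels[0]                              # pure node
    if not attrs:
        return Counter(labels).most_common(1)[0][0]   # no attributes left: majority class
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree
```

If one attribute splits the records into pure subsets, its gain equals the full label entropy and it becomes the root, with no further splits needed.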