Naive Bayes vs Decision Trees in Intrusion Detection Systems

Nahla Ben Amor
Institute Supérieur de Gestion
41 Avenue de la liberté
2000 Le Bardo, Tunisie
nahla.benamor@gmx.fr

Salem Benferhat
CRIL - CNRS
Université d’Artois
Rue Jean Souvraz
Lens, Cedex, France
benferhat@cril.univ-artois.fr

Zied Elouedi
Institute Supérieur de Gestion
41 Avenue de la liberté
2000 Le Bardo, Tunisie
zied.elouedi@gmx.fr

ABSTRACT

Bayes networks are powerful tools for decision and reasoning under uncertainty. A very simple form of Bayes networks, called naive Bayes, is particularly efficient for inference tasks. However, naive Bayes networks are based on a very strong independence assumption. This paper offers an experimental study of the use of naive Bayes in intrusion detection. We show that, despite their simple structure, naive Bayes networks provide very competitive results. The experimental study is done on the KDD’99 intrusion data sets. We consider three levels of attack granularity, depending on whether we deal with all individual attacks, group them into four main categories, or focus only on normal and abnormal behaviours. In all experiments, we compare the performance of naive Bayes networks with that of a well-known machine learning technique, the decision tree. Moreover, we compare the performance of Bayes nets with the best existing results obtained on KDD’99.

1. INTRODUCTION

Intrusion detection in the context of information systems deals with detecting attempts to compromise the security of a computer network resource. There are two general approaches to intrusion detection [1]:

Anomaly detection: The idea is that each user has a certain profile within the system that does not change much over time. Any significant deviation from it is then considered an anomaly. Examples of anomaly approaches are NIDES [9] and EMERALD [11].

Misuse detection: The idea is that any intrusion can be described by its signature, characterized by the values of its features. Systems based on this approach use different models, such as state transition analysis, e.g. STAT [8], or a more formal pattern classification, e.g. IDIOT [7] and SNORT [16].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

SAC’04, March 14-17, 2004, Nicosia, Cyprus.

Copyright 2004 ACM 1-58113-812-1/03/2004 ...$5.00.

Recently, Valdes [14] proposed a hybrid approach to intrusion detection based on Bayes networks. Bayes networks [5, 10] are tools for reasoning with uncertain information in the framework of probability theory. They use directed acyclic graphs to represent causal relations, and conditional probabilities (of each node given its parents) to express the uncertainty of these relations. Valdes [14] uses a simple form of Bayes networks, called naive Bayes, composed of two levels: one root node, which represents a session class (normal and different kinds of attacks), and several leaf nodes, each of which contains a feature of a connection.

Naive Bayes networks have several advantages due to their simple structure. In particular, their construction is very simple. Moreover, inference (classification) is achieved in linear time (while inference in Bayes networks with a general structure is known to be NP-hard [3]). Finally, the construction of naive Bayes is incremental, in the sense that the model can easily be updated (namely, it is always easy to take new cases into account). However, naive Bayes networks make a strong independence assumption: features are independent in the context of a session class. Such an assumption is not always true, and may have a negative influence on the inferred results.

The aim of this paper is to provide experimental results showing that naive Bayes networks, with their simple structure and despite their strong assumptions, can be very competitive. The experimental results obtained in this paper use the KDD’99 [15] intrusion data sets. Different experiments are performed according to three levels of attack granularity, depending on whether we deal with all individual attacks, group them into four main categories, or focus only on normal and abnormal behaviours.

In order to evaluate the performance of naive Bayes, we compare, on the same data, the results given by naive Bayes with those of decision trees [12], one of the well-known learning methods. We show that naive Bayes networks are really very competitive, and that the performance difference with respect to decision trees is not significant. Moreover, from a computational point of view, naive Bayes networks are more efficient both in the learning and in the classification tasks. Indeed, the construction of naive Bayes is linear, while the construction of optimal decision trees is in general an NP-complete problem [4]. We also show that naive Bayes networks give very good results with respect to the winning strategy on the KDD’99 data sets [15].

2004 ACM Symposium on Applied Computing

Section 2 describes the basics of decision trees and naive Bayes. Section 3 presents the KDD’99 data set. Section 4 presents the different study cases handled in this paper. Section 5 focuses on all attacks with the decision tree and naive Bayes approaches. Section 6 focuses on the four categories of attacks (i.e. DOS, R2L, U2R, PROBING). Section 7 handles the case of normal and abnormal connections. Finally, Section 8 summarizes the main results and concludes the paper.

2. BRIEF REFRESHER ON DECISION TREES AND NAIVE BAYES

2.1 Decision trees

Decision trees are among the well-known machine learning techniques. A decision tree is composed of three basic elements:

- A decision node specifying a test attribute.

- An edge or a branch corresponding to one of the possible attribute values, i.e. one of the test attribute outcomes.

- A leaf, also called an answer node, which contains the class to which the object belongs.

In decision trees, two major phases should be ensured:

1. Building the tree. Based on a given training set, a decision tree is built. This consists of selecting, for each decision node, the ‘appropriate’ test attribute and of defining the class labeling each leaf.

2. Classification. In order to classify a new instance, we start at the root of the decision tree and test the attribute specified by this node. The result of this test determines which branch to follow down the tree, according to the attribute value of the given instance. This process is repeated until a leaf is encountered. The instance is then assigned the class that labels the reached leaf.
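The classification phase described above can be sketched in a few lines of Python. This is only an illustration: the `Leaf` and `DecisionNode` types, the toy tree and its attribute values are hypothetical and are not the trees induced in the experiments.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Answer node: holds the class assigned to instances that reach it."""
    label: str


@dataclass
class DecisionNode:
    """Decision node: tests one attribute and has one branch per value."""
    attribute: str
    branches: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[Leaf, DecisionNode]


def classify(node: Node, instance: Dict[str, str]) -> str:
    """Walk from the root to a leaf, following the instance's attribute values."""
    while isinstance(node, DecisionNode):
        value = instance[node.attribute]   # outcome of the test at this node
        node = node.branches[value]        # move down the matching branch
    return node.label


# Hypothetical toy tree (attribute names borrowed from KDD'99 features).
toy_tree = DecisionNode("protocol_type", {
    "icmp": Leaf("attack"),
    "udp": Leaf("normal"),
    "tcp": DecisionNode("flag", {"SF": Leaf("normal"), "REJ": Leaf("attack")}),
})

prediction = classify(toy_tree, {"protocol_type": "tcp", "flag": "REJ"})
```

The loop makes the cost of classification visible: it is bounded by the depth of the tree, since each iteration consumes one test attribute.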

Several algorithms have been developed for constructing decision trees and using them for the classification task. The ID3 and C4.5 algorithms developed by Quinlan [12] are probably the most popular ones. We can also mention the CART algorithm of Breiman et al. [2]. The majority of these algorithms use a top-down strategy, i.e. from the root to the leaves. This procedure requires the following generic parameters:

- The attribute selection measure, which takes into account the discriminative power of each attribute over classes in order to choose the ‘best’ one as the root of the (sub) decision tree. In other words, this measure should consider the ability of each attribute $A_k$ to determine the classes of the training objects. Many attribute selection measures have been proposed in the literature [2, 12]. We mention the gain ratio, used within the C4.5 algorithm [12] and based on the Shannon entropy, where for an attribute $A_k$ and a set of objects $T$, it is defined as follows:

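$$\mathrm{Gain}(T, A_k) = \mathrm{Info}(T) - \mathrm{Info}_{A_k}(T) \tag{1}$$

where

$$\mathrm{Info}(T) = -\sum_{i=1}^{n} \frac{\mathrm{freq}(c_i, T)}{|T|} \log_2 \frac{\mathrm{freq}(c_i, T)}{|T|} \tag{2}$$

$$\mathrm{Info}_{A_k}(T) = \sum_{a_k \in D(A_k)} \frac{|T^{A_k}_{a_k}|}{|T|} \, \mathrm{Info}(T^{A_k}_{a_k}) \tag{3}$$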

and $\mathrm{freq}(c_i, T)$ denotes the number of objects in the set $T$ belonging to the class $c_i$, and $T^{A_k}_{a_k}$ is the subset of objects for which the attribute $A_k$ has the value $a_k$ (belonging to the domain of $A_k$, denoted $D(A_k)$).

Then, $\mathrm{SplitInfo}(T, A_k)$ is defined as the information content of the attribute $A_k$ itself [12]:

$$\mathrm{SplitInfo}(T, A_k) = -\sum_{a_k \in D(A_k)} \frac{|T^{A_k}_{a_k}|}{|T|} \log_2 \frac{|T^{A_k}_{a_k}|}{|T|} \tag{4}$$

So, the gain ratio is the information gain calibrated by Split Info:

$$\mathrm{GainRatio}(T, A_k) = \frac{\mathrm{Gain}(T, A_k)}{\mathrm{SplitInfo}(T, A_k)} \tag{5}$$

- The partitioning strategy, whose objective is to divide the current training set according to the selected test attribute.

- The stopping criteria, which deal with the condition(s) for stopping the growth of a part of the decision tree (or even of the whole decision tree). In other words, they determine whether or not a training subset will be further divided.
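As an illustration of the attribute selection measure, the gain ratio of equations (1) to (5) can be computed for discrete attributes as follows. This is a minimal sketch, not the C4.5 implementation: the function names, the tiny data set, and the convention of returning 0 when Split Info is zero are assumptions made for the example.

```python
from collections import Counter
from math import log2


def info(labels):
    """Shannon entropy of the class distribution in a set of objects (eq. 2)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def partition(rows, labels, attribute):
    """Group the class labels by the value taken by `attribute`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    return subsets


def gain_ratio(rows, labels, attribute):
    """Information gain (eq. 1 and 3) divided by Split Info (eq. 4), as in eq. 5."""
    total = len(labels)
    subsets = partition(rows, labels, attribute)
    info_after = sum(len(s) / total * info(s) for s in subsets.values())
    gain = info(labels) - info_after
    split_info = -sum(len(s) / total * log2(len(s) / total)
                      for s in subsets.values())
    # Assumed convention: an attribute with a single value gets ratio 0.
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: `flag` separates the classes, `protocol_type` does not.
rows = [
    {"flag": "SF", "protocol_type": "tcp"},
    {"flag": "SF", "protocol_type": "tcp"},
    {"flag": "REJ", "protocol_type": "tcp"},
    {"flag": "REJ", "protocol_type": "tcp"},
]
labels = ["normal", "normal", "attack", "attack"]

ratio_flag = gain_ratio(rows, labels, "flag")
ratio_protocol = gain_ratio(rows, labels, "protocol_type")
```

On this toy set, `flag` splits the objects into two pure subsets and is therefore preferred over `protocol_type`, which takes a single value and carries no information about the class.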

2.2 Naive Bayes

Bayes networks are one of the most widely used graphical models to represent and handle uncertain information [5, 10]. Bayes networks are specified by two components:

- A graphical component composed of a directed acyclic graph (DAG), where vertices represent events and edges represent relations between events.

- A numerical component consisting of a quantification of the different links in the DAG by a conditional probability distribution of each node in the context of its parents.

Naive Bayes networks are very simple Bayes networks composed of DAGs with only one root node (called the parent), representing the unobserved node, and several children, corresponding to observed nodes, with the strong assumption of independence among child nodes in the context of their parent.

Classification is performed by considering the parent node to be a hidden variable stating to which class each object in the testing set should belong, while the child nodes represent the different attributes describing this object.

Hence, in the presence of a training set, we only need to compute the conditional probabilities, since the structure is unique.

Once the network is quantified, it is possible to classify any new object, given its attribute values, using Bayes’ rule, expressed by:

$$P(c_i \mid A) = \frac{P(A \mid c_i) \cdot P(c_i)}{P(A)} \tag{6}$$

where $c_i$ is a possible value of the session class and $A$ is the total evidence on the attribute nodes. The evidence $A$ can be split into pieces of evidence, say $a_1, a_2, \ldots, a_n$, relative to the attributes $A_1, A_2, \ldots, A_n$, respectively. Since naive Bayes networks work under the assumption that these attributes are independent (given the parent node $C$), their combined probability is obtained as follows:

$$P(c_i \mid A) = \frac{P(a_1 \mid c_i) \cdot P(a_2 \mid c_i) \cdot \ldots \cdot P(a_n \mid c_i) \cdot P(c_i)}{P(A)} \tag{7}$$

Note that there is no need to explicitly compute the denominator $P(A)$, since it is determined by the normalization condition.
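The following sketch shows how equation (7) can be applied to discrete attributes, with the conditional probabilities estimated as relative frequencies and $P(A)$ obtained by normalization. The function names and the five training connections are hypothetical; no smoothing is applied, and continuous attributes are not handled here.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and, per (class, attribute), value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[(label, attribute)][value] += 1
    return class_counts, value_counts


def posterior(model, instance):
    """Return P(c_i | A) for every class, following equation (7)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total                      # prior P(c_i)
        for attribute, value in instance.items():
            # P(a_j | c_i) estimated as a relative frequency within class c_i
            score *= value_counts[(c, attribute)][value] / n_c
        scores[c] = score
    evidence = sum(scores.values())              # plays the role of P(A)
    if evidence == 0:
        return scores                            # no class explains the evidence
    return {c: s / evidence for c, s in scores.items()}


# Hypothetical training connections described by two discrete attributes.
rows = [
    {"flag": "SF", "protocol_type": "tcp"},
    {"flag": "SF", "protocol_type": "udp"},
    {"flag": "SF", "protocol_type": "tcp"},
    {"flag": "REJ", "protocol_type": "tcp"},
    {"flag": "SF", "protocol_type": "tcp"},
]
labels = ["normal", "normal", "normal", "attack", "attack"]

model = train(rows, labels)
post = posterior(model, {"flag": "SF", "protocol_type": "tcp"})
predicted = max(post, key=post.get)
```

Training is a single pass over the data and classification is one multiplication per attribute and class, which reflects the linear cost mentioned in the introduction.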

3. DESCRIPTION OF KDD’99 DATA SET

The data used in this paper are those proposed in KDD’99 for intrusion detection [15], which are generally used for benchmarking intrusion detection problems. An environment was set up to collect raw TCP/IP dump data from a host located on a simulated military network. Each TCP/IP connection is described by 41 discrete and continuous features (e.g. duration, protocol type, flag, etc.) and labeled either as normal or as an attack, with exactly one specific attack type (e.g. Smurf, Perl, etc.). Attacks fall into four main categories:

- Denial of Service Attacks (DOS), in which an attacker overwhelms the victim host with a huge number of requests.

- User to Root Attacks (U2R), in which an attacker tries to get the access rights from a normal host in order, for instance, to gain root access to the system.

- Remote to User Attacks (R2L), in which the intruder tries to exploit the system vulnerabilities in order to control the remote machine through the network as a local user.

- Probing, in which an attacker attempts to gather useful information about machines and services available on the network in order to look for exploits.

4. DIFFERENT EXPERIMENTAL STUDY CASES

We handle 10% of the whole KDD’99 dataset, corresponding to 494019 training connections and 311029 testing connections. The experiments performed in this paper assume that the connections to classify are known with certainty, which is not always the case in real TCP/IP traffic. Namely, the testing set used corresponds to offline traffic. The strategy behind the experiments presented in this paper is based on the following points:

- Three levels of attack granularity: we focus on three cases, which handle:

• Whole-attacks: all attack classes present in the KDD dataset (see [15]) in addition to the normal situation.

• Five-classes: the normal class and the four attack categories (i.e. DOS, R2L, U2R, Probing). Note that there are 19.65% (resp. 79.07%, 0.23%, 0.22%, 0.83%) of normal (resp. DOS, R2L, U2R, Probing) training connections and 19.48% (resp. 73.90%, 5.21%, 0.07%, 1.34%) of normal (resp. DOS, R2L, U2R, Probing) testing connections.

• Two-classes: i.e. Normal and Abnormal, obtained by grouping all attacks in the same class (i.e. Abnormal).

- Gathering attacks: in the five-class and two-class cases, there are two strategies, gathering attacks either before or after classification:

• Gathering before classification: the idea is to slightly modify the dataset by grouping attacks belonging to the same attack category (i.e. DOS, R2L, U2R or Probing), or by grouping them in a unique class, i.e. abnormal (if we are only interested in normal and abnormal connections).

• Gathering after classification:

- For the five classes, the training set remains unchanged. However, each connection classified into one of the 38 attacks is assigned to the one of the four categories it belongs to.

- For the two classes, there are two strategies: either we do not modify the training set and each connection classified into one of the 38 attacks is simply labeled as abnormal, or we first modify the training set by gathering attacks into four categories, and then each connection classified into one of these categories is labeled as abnormal.
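The difference between the two gathering strategies can be sketched as follows. The attack-to-category mapping lists only a handful of KDD’99 attack names, and the classifier is a deliberately trivial lookup on a single hypothetical attribute; both are assumptions for illustration and stand in for the decision tree or naive Bayes learner.

```python
from collections import Counter, defaultdict

# Partial mapping from elementary attacks to the four categories (illustrative).
CATEGORY = {
    "normal": "normal",
    "smurf": "DOS",
    "neptune": "DOS",
    "guess_passwd": "R2L",
    "perl": "U2R",
    "portsweep": "Probing",
}


def to_category(labels):
    """Gather elementary attack labels into the five classes."""
    return [CATEGORY[label] for label in labels]


def to_two_classes(labels):
    """Gather any labels (attacks or categories) into normal / abnormal."""
    return ["normal" if label == "normal" else "abnormal" for label in labels]


def fit_lookup(values, labels):
    """Toy stand-in classifier: most frequent label seen for each attribute value."""
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1
    table = {v: c.most_common(1)[0][0] for v, c in counts.items()}
    return lambda new_values: [table[v] for v in new_values]


# Hypothetical data: one discrete attribute per connection.
train_values = ["ecr_i", "ecr_i", "private", "http", "telnet"]
train_labels = ["smurf", "smurf", "neptune", "normal", "guess_passwd"]
test_values = ["ecr_i", "http", "telnet"]

# Gathering before classification: relabel the training set, then learn.
before = fit_lookup(train_values, to_category(train_labels))(test_values)

# Gathering after classification: learn on elementary attacks, then relabel.
after = to_category(fit_lookup(train_values, train_labels)(test_values))

# Two-class case obtained from the five-class predictions.
two_classes = to_two_classes(after)
```

With this toy classifier both strategies give the same predictions. With a real learner they need not coincide, because gathering before classification changes the class distributions the model is trained on.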

In each of the studied cases, the evaluation of classification efficiency is based on the Percent of Correct Classification (PCC) of the instances belonging to the testing set.

Lastly, note that for continuous variables there are different ways to construct naive Bayes. We have analyzed two cases: normal distributions and kernel density estimation. We have noticed that kernel density estimation generally gives better PCCs than normal distributions. This result is not surprising, since it has been shown in [6] that kernel density estimates are usually better than Gaussian distributions for naive Bayes classifiers. In the following, we only give the results of the best strategy, namely kernel density estimation.
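To make the two options for continuous attributes concrete, the sketch below estimates a class-conditional density first with a single normal distribution and then with a Gaussian kernel density estimate. The sample values and the bandwidth of 0.5 are hypothetical choices for the example; the paper does not state which bandwidth was used.

```python
from math import exp, pi, sqrt
from statistics import mean, pstdev


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def gaussian_estimate(samples):
    """Fit one normal distribution to all samples of a class."""
    mu, sigma = mean(samples), pstdev(samples)
    return lambda x: normal_pdf(x, mu, sigma)


def kernel_estimate(samples, bandwidth):
    """Average of Gaussian kernels centred on each training sample."""
    return lambda x: sum(normal_pdf(x, s, bandwidth) for s in samples) / len(samples)


# Hypothetical attribute values for one class, with two clearly separated modes.
durations = [0.0, 0.1, 0.2, 9.8, 10.0, 10.1]

g = gaussian_estimate(durations)
k = kernel_estimate(durations, bandwidth=0.5)

# Density near a mode (x = 10) and in the empty region between modes (x = 5).
near_mode = (g(10.0), k(10.0))
between_modes = (g(5.0), k(5.0))
```

On such bimodal data the single normal distribution puts most of its mass between the two modes, where no sample lies, while the kernel estimate follows the samples. This is one intuition for why kernel estimation can help naive Bayes on attributes that are far from normally distributed.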

5. FOCUSING ON ALL ATTACKS

This section presents experimental results where we evaluate the ability of naive Bayes and decision trees to detect each elementary attack. The results of these experiments are summarized in Table 1. It shows that both decision trees and naive Bayes fit the training set almost perfectly, which means that the latter is coherent, i.e., almost all training instances characterized by the same attribute values belong to the same class. Both techniques also perform well in the classification phase, with a slight advantage for decision trees.

Table 1: PCCs in the whole-attack case

                 Training set   Testing set
Decision tree    99.99%         91.41%
Naive Bayes      99.23%         91.20%

6. FOCUSING ON THE FOUR CATEGORIES OF ATTACKS

This section presents experimental results where we are only interested in knowing to which category (normal, DOS, R2L, U2R, Probing) a given connection belongs. For these experiments, we have grouped attacks belonging to the same category together. This is done either before or after classification. For the latter, we use the results of the whole-attack case by summing the number of occurrences relative to each attack category (i.e. DOS, R2L, U2R, Probing). Table 2 gives the PCCs relative to these two experiments (see footnote 1).

Table 2: PCCs relative to five classes

                 Training set       Testing set
Decision tree    99.99% (99.99%)    92.28% (91.81%)
Naive Bayes      99.16% (99.28%)    91.47% (92.10%)

Similarly to the whole-attack case, the PCCs of decision trees slightly exceed those of naive Bayes in both the learning and classification phases.

Note that gathering attacks before or after classification has no significant influence on decision trees. However, with naive Bayes it is slightly better to gather results after classification rather than before it.

Table 3 presents confusion matrices. A confusion matrix is a two-dimensional table with a row and a column for each class; each element of the matrix shows the number of test examples for which the actual class is the row and the predicted class is the column, here expressed as a percentage of the row total. It shows that Normal, DOS and Probing connections are well classified by the two techniques. This is not the case for R2L and U2R connections, which are mostly misclassified.
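A confusion matrix of this kind, and the PCC derived from it, can be computed as sketched below. The function names and the five example connections are hypothetical; the sketch only illustrates the definitions and does not reproduce the reported figures.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """matrix[i][j] = number of examples of actual class i predicted as class j."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def row_percentages(matrix):
    """Express each row as percentages of the number of examples of that class."""
    return [[100.0 * n / sum(row) if sum(row) else 0.0 for n in row]
            for row in matrix]


def pcc(matrix):
    """Percent of Correct Classification: diagonal count over total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Hypothetical test connections with their actual and predicted classes.
classes = ["normal", "DOS"]
actual = ["normal", "normal", "DOS", "DOS", "DOS"]
predicted = ["normal", "DOS", "DOS", "DOS", "normal"]

matrix = confusion_matrix(actual, predicted, classes)
percentages = row_percentages(matrix)
overall = pcc(matrix)
```

Because each row is normalized separately, a rare class can be almost entirely misclassified while the overall PCC stays high, which is the situation observed for R2L and U2R.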

Table 3: Confusion matrices relative to five classes

Decision tree
Actual → predicted    Normal             DOS                R2L              U2R               Probing
Normal (60593)        99.50% (99.43%)    0.13% (0.14%)      0.01% (0.02%)    0.01% (0.02%)     0.36% (0.39%)
DOS (229853)          2.76% (2.94%)      97.24% (96.57%)    0.00% (0.10%)    0.00% (0.00%)     0.00% (0.39%)
R2L (16189)           96.55% (75.77%)    0.02% (2.79%)      0.52% (0.45%)    0.15% (4.27%)     2.76% (16.71%)
U2R (228)             79.82% (23.25%)    2.63% (0.00%)      1.75% (5.26%)    7.89% (13.60%)    7.89% (57.89%)
Probing (4166)        19.54% (15.22%)    5.16% (6.67%)      0.34% (0.19%)    0.00% (0.00%)     74.96% (77.92%)
PCC: 92.06% (92.80%)

Naive Bayes
Actual → predicted    Normal             DOS                R2L              U2R               Probing
Normal (60593)        96.64% (97.68%)    2.78% (1.29%)      0.23% (0.30%)    0.11% (0.15%)     0.25% (0.58%)
DOS (229853)          3.09% (2.75%)      96.38% (96.65%)    0.10% (0.01%)    0.00% (0.00%)     0.43% (0.58%)
R2L (16189)           85.09% (88.80%)    4.84% (0.01%)      7.11% (8.66%)    2.92% (1.54%)     0.04% (0.99%)
U2R (228)             54.39% (76.32%)    0.00% (0.00%)      8.33% (3.95%)    11.84% (10.96%)   25.44% (8.77%)
Probing (4166)        14.19% (10.80%)    3.36% (0.38%)      3.79% (0.38%)    0.48% (0.10%)     78.18% (88.33%)
PCC: 91.47% (92.10%)

Regarding decision trees, this behaviour does not reflect the optimistic results obtained on the training data, where 82.69% (resp. 98.93%) of U2R (resp. R2L) connections are well classified. This is due to the fact that the proportions of U2R and R2L attacks in the training set are very low (0.22% for U2R and 0.23% for R2L).

Footnote 1: Values between parentheses are relative to gathering the whole-attack results into five classes after classification.

In fact, with decision trees, when a class is represented by a low number of training instances, learning for this class is weak, which leads to the misclassification of testing connections that really belong to it. Hence, we can have new testing instances that really belong to U2R and R2L attacks but are characterized by attribute values which deviate from those characterizing these two classes in the training set. These instances have not been learned in the construction phase, and the class they are assigned by the induced tree is generally wrong.

A thorough analysis of the sub-attacks pertaining to the U2R class confirms this remark since, for instance, the "Httptunnel" attack, which is massively present in the testing set (158 connections out of 228 U2R attacks), is characterized by the value REJ for the attribute flag, but it never appears in the training set. The same explanation holds for naive Bayes, since in the learning phase the conditional probability of REJ in the context of U2R will be equal to zero (i.e. P(REJ | U2R) = 0). Thus, testing connections that actually belong to U2R but present the value REJ in the attribute flag will be misclassified.
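The effect of a zero conditional probability on the naive Bayes product can be seen with a short calculation. The priors are the training proportions quoted in the text (0.22% for U2R, 19.65% for normal); the other likelihood values are invented for the example and are not taken from the experiments.

```python
def joint_score(prior, likelihoods):
    """Unnormalized naive Bayes score: P(c) times the product of P(a_j | c)."""
    score = prior
    for p in likelihoods:
        score *= p
    return score


# Hypothetical likelihoods for three attributes; the middle one is the flag.
# P(REJ | U2R) = 0 because REJ never occurs with U2R in the training set.
u2r_score = joint_score(prior=0.0022, likelihoods=[0.9, 0.0, 0.8])

# Any class with a non-zero estimate for REJ obtains a strictly positive score.
normal_score = joint_score(prior=0.1965, likelihoods=[0.1, 0.02, 0.3])
```

A single zero factor cancels the whole product, so the U2R class cannot be selected for such a connection, however well the remaining attributes match it.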

7. FOCUSING ON NORMAL AND ABNORMAL CONNECTIONS

In this section, we focus on normal behaviour; namely, we are only interested in knowing whether a given connection is normal or not. For this purpose, we have studied confusion matrices and PCC values by distinguishing normal connections from abnormal ones. We first consider the case where we gather all attacks before classification, then the case where gathering is made after classification, using the results of the whole-attack case and those of the five-class case. The results are summarized in Table 4 (see footnote 2).

Table 4: PCCs relative to the normal and abnormal connections

                 Training set                Testing set
Decision tree    99.99% (99.99%, 99.99%)     93.02% (93.55%, 92.52%)
Naive Bayes      98.71% (99.31%, 99.24%)     91.45% (92.69%, 92.40%)

As with the previous experiments, Table 4 shows that the gap between decision trees and naive Bayes is insignificant, since the PCC is almost the same for the two approaches and is very good. More precisely, the confusion matrix presented in Table 5 shows that in all kinds of experiments, decision trees are slightly better than naive Bayes except in the case where gathering is made before classification.

Footnote 2: Values between parentheses are relative to gathering the whole-attack and five-class results into two classes after classification.

Table 5: Confusion matrix relative to the normal and abnormal classes

Decision tree
Actual → predicted    Normal                      Abnormal
Normal (60593)        99.39% (99.43%, 98.50%)     0.61% (0.57%, 0.50%)
Abnormal (250436)     8.53% (7.87%, 9.17%)        91.47% (92.13%, 90.83%)
PCC: 93.02% (93.55%, 92.52%)

Naive Bayes
Actual → predicted    Normal                      Abnormal
Normal (60593)        98.59% (97.68%, 96.64%)     1.41% (2.32%, 3.36%)
Abnormal (250436)     10.28% (8.52%, 8.62%)       89.72% (91.47%, 91.38%)
PCC: 91.45% (92.69%, 92.40%)

8. RELATED WORK AND CONCLUDING DISCUSSIONS

The main observations are the following:

According to Table 6, dealing with all attacks, five classes or only two classes does not considerably affect the classification quality of either decision trees or naive Bayes. Namely, the difference between the strategies remains small, on the order of 1 to 2 percentage points.

Table 6: Summary table: PCCs on the testing set

                 Whole attacks   Five classes        Normal and abnormal
Decision tree    91.69%          92.06% (92.8%)      93.02% (93.55%, 92.52%)
Naive Bayes      91.20%          91.47% (92.10%)     91.45% (92.69%, 92.40%)
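As a consistency check, the overall PCCs of the two-class case can be recovered from the per-class rates of Table 5, by weighting each rate by the number of test connections of its class. The sketch below uses only figures printed in Table 5 (gathering before classification); the function name is an assumption.

```python
# Number of normal and abnormal test connections, as given in Table 5.
N_NORMAL, N_ABNORMAL = 60593, 250436


def overall_pcc(normal_rate, abnormal_rate):
    """Weighted average of the per-class correct-classification rates, in percent."""
    correct = N_NORMAL * normal_rate + N_ABNORMAL * abnormal_rate
    return 100.0 * correct / (N_NORMAL + N_ABNORMAL)


# Diagonal entries of Table 5: 99.39% / 91.47% for decision trees,
# 98.59% / 89.72% for naive Bayes.
decision_tree_pcc = overall_pcc(0.9939, 0.9147)
naive_bayes_pcc = overall_pcc(0.9859, 0.8972)
```

The two values agree, up to rounding of the table entries, with the reported PCCs of 93.02% and 91.45%. Since abnormal connections make up about four fifths of the testing set, the overall PCC is dominated by the rate obtained on the abnormal class.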

Table 7 shows that both techniques presented in this paper are competitive with the winning strategy in KDD’99 [15], which is based on a mixture of bagging and boosting of decision trees [13] (they also share its failure to correctly classify R2L and U2R connections). It shows that, even though we have not used these two options, there are cases where the results of decision trees and naive Bayes are equal to or slightly better than those of the winning strategy. Indeed, according to Table 7, there are strategies where decision trees reach equal or better PCCs on the Normal, DOS and U2R categories, while naive Bayes performs better on R2L and Probing. These results are obtained by applying different strategies, such as gathering attacks before or after classification and discretizing some continuous attributes. The complementarity between decision trees and naive Bayes could be exploited to develop a meta-classifier that provides good results in each category of attacks. This is left for future work.

Table 7: Comparisons between the winning strategy, decision trees and naive Bayes

           Winning strategy   Decision trees   Naive Bayes
Normal     99.50%             99.50%           97.68%
DOS        97.10%             97.24%           96.65%
R2L        8.40%              0.52%            8.66%
U2R        13.2%              13.60%           11.84%
Probing    83.3%              77.92%           88.33%

Of course, it is clear that globally decision trees give slightly better results. However, from a computational point of view, the construction of naive Bayes is much faster than that of decision trees. For instance, with a Pentium III at 700 MHz, it is impossible to construct the tree (with a training set of 494019 instances), while with naive Bayes it only took a few minutes. When the construction of decision trees is possible, learning and classifying with naive Bayes is generally 7 times faster than learning and classifying with decision trees.

9. ACKNOWLEDGMENTS

This work is supported by a French national project RNTL (Réseau National des Technologies Logicielles), DICO (Détection d’Intrusions COopérative). The authors would like to thank Frédéric Cuppens for his useful comments.

10. REFERENCES

[1] Axelsson, S.: Intrusion detection systems: a survey and taxonomy. Technical report 99-15, March 2000.

[2] Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J.: Classification and regression trees. Monterey, CA: Wadsworth & Brooks, 1984.

[3] Cooper, G. F.: Computational complexity of probabilistic inference using Bayes belief networks. Artificial Intelligence, Vol. 42, 393-405, 1990.

[4] Hyafil, L., Rivest, R. L.: Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15-17, 1976.

[5] Jensen, F. V.: Introduction to Bayesian networks. UCL Press, 1996.

[6] John, G.: Enhancements to the Data Mining Process. PhD thesis, Stanford University, 1997.

[7] Kumar, S., Spafford, E. H.: A software architecture to support misuse intrusion detection. In proceedings of the 18th National Information Security Conference, 194-204, 1995.

[8] Ilgun, K., Kemmerer, R. A., Porras, P. A.: State transition: A rule-based intrusion detection approach. IEEE Transactions on Software Engineering, 21(3), 181-199, 1995.

[9] Lunt, T.: Detecting intruders in computer systems. In proceedings of the Sixth Annual Symposium and Technical Displays on Physical and Electronic Security, 1993.

[10] Pearl, J.: Probabilistic Reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, Los Altos, CA, 1988.

[11] Porras, P. A., Neumann, P. G.: EMERALD: Event monitoring enabling responses to anomalous live disturbances. In proceedings of the 20th National Information Systems Security Conference, Baltimore, Maryland, USA, NIST, 353-365, 1997.

[12] Quinlan, J. R.: C4.5, Programs for machine learning. Morgan Kaufmann, San Mateo, CA, 1993.

[13] Quinlan, J. R.: Bagging, boosting, and C4.5. Proceedings of the thirteenth national conference on AI, Vol. 1, 725-730, 1997.

[14] Valdes, A., Skinner, K.: Adaptive Model-based Monitoring for Cyber Attack Detection. In proceedings of Recent Advances in Intrusion Detection (RAID 2000), Toulouse, France, 80-92, 2000.

[15] http://kdd.ccs.uci.edu/databases/kddcup99/task.html

[16] Marty, R.: Snort the open source network IDS, http://www.snort.org/, 2001.
