
Outlier Detection with One-Class Classifiers from ML and KDD

Jeroen H.M. Janssens, Ildiko Flesch, and Eric O. Postma

Tilburg centre for Creative Computing

Tilburg University

Tilburg, The Netherlands

Email: jeroen@jeroenjanssens.com, ildiko.flesch@gmail.com, eric.postma@gmail.com

Abstract—The problem of outlier detection is well studied in the fields of Machine Learning (ML) and Knowledge Discovery in Databases (KDD). Both fields have their own methods and evaluation procedures. In ML, Support Vector Machines and Parzen Windows are well-known methods that can be used for outlier detection. In KDD, the heuristic local-density estimation methods LOF and LOCI are generally considered to be superior outlier-detection methods. Hitherto, the performances of these ML and KDD methods have not been compared. This paper formalizes LOF and LOCI in the ML framework of one-class classification and performs a comparative evaluation of the ML and KDD outlier-detection methods on real-world datasets. Experimental results show that LOF and SVDD are the two best-performing methods. It is concluded that both fields offer outlier-detection methods that are competitive in performance and that bridging the gap between both fields may facilitate the development of outlier-detection methods.

Keywords—one-class classification; outlier detection; local density estimation

I. INTRODUCTION

There is a growing interest in the automatic detection of abnormal or suspicious patterns in large data volumes, for instance to detect terrorist activity, illegal financial transactions, or potentially dangerous situations in industrial processes. This interest is reflected in the development and evaluation of outlier-detection methods [1], [2], [3], [4]. In recent years, outlier-detection methods have been proposed in two related fields: Knowledge Discovery in Databases (KDD) and Machine Learning (ML). Although both fields overlap considerably in their objectives and subject of study, there appears to be some separation in the study of outlier-detection methods. In the KDD field, the Local Outlier Factor (LOF) method [3] and the Local Correlation Integral (LOCI) method [4] are the two main methods for outlier detection. Like most methods from KDD, LOF and LOCI are designed to process large volumes of data [5]. In the ML field, outlier detection is generally based on data-description methods inspired by k-Nearest Neighbors (KNNDD), Parzen Windows (PWDD), and Support Vector Machines (SVDD), where DD stands for data description [1], [2]. These methods originate from statistics and pattern recognition and have a solid theoretical foundation [6], [7].

Interestingly, each field evaluates its outlier-detection methods largely in isolation from the other. In the KDD field, LOF and LOCI are rarely compared to ML methods such as KNNDD, PWDD, and SVDD [3], [4], and in the ML field, LOF and LOCI are seldom mentioned. As a case in point, Hodge and Austin's review of outlier-detection methods [8] does not mention LOF and LOCI at all, while a recent anomaly-detection survey [9] compares these methods on a conceptual level only. The aim of this paper is to treat outlier-detection methods from both fields on an equal footing by framing them in a common methodological framework and by performing a comparative evaluation. To the best of our knowledge, this is the first time that outlier-detection methods from the fields of KDD and ML are evaluated and compared in a statistically valid way.¹

To this end we adopt the one-class classification framework [1]. The framework allows outlier-detection methods to be evaluated using the well-known AUC performance measure [11], and to be compared using statistically founded comparison tests such as the Friedman test [12] and the post-hoc Nemenyi test [13].

The outlier-detection methods whose performances are compared are LOF and LOCI from the field of KDD, and KNNDD, PWDD, and SVDD from the field of ML. In this paper, LOF and LOCI are reformulated in terms of the one-class classification framework. The ML methods have already been formulated in these terms by De Ridder et al. [14] and Tax [1].

The remainder of the paper is organized as follows. Section II briefly presents the one-class classification framework. In Sections III and IV we introduce the KDD and ML outlier-detection methods, respectively, and explain how they compute a measure of outlierness. We describe the set-up of our experiments in Section V and their results in Section VI. Section VII discusses the results in terms of three observations. Finally, Section VIII concludes by stating that the fields of KDD and ML have outlier-detection methods that are competitive in performance and deserve treatment on an equal footing.

¹Hido et al. recently compared LOF, SVDD, and several other outlier-detection methods [10]. Unfortunately, their study is flawed because neither an independent test set nor a proper evaluation procedure was used.

II. ONE-CLASS CLASSIFICATION FRAMEWORK

In the one-class classification framework, outlier detection is formalized in terms of objects and labels as follows. Let $X = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^d$, be the object space and let $Y$ be the corresponding label space. A dataset $D$ is a sequence of object-label pairs, i.e., $D = \{(x_1, y_1), \ldots, (x_n, y_n)\} \subseteq X \times Y$.

In one-class classification, only example objects from a single class, the target class, are used to train a classifier. This makes a one-class classifier particularly useful for outlier detection [15]. A one-class classifier $f$ classifies a new object $x_i$ either as belonging to the target class or to the outlier class.

An object is classified as an outlier when it is very 'dissimilar' from the given target objects. To this end, one-class classifiers generally consist of two components: a dissimilarity measure $\delta$ and a threshold $\theta$ [1]. A new object $x_i$ is accepted as a target object when the dissimilarity value $\delta$ is less than or equal to the threshold $\theta$; otherwise it is rejected as an outlier object:

$$f(x_i) = \begin{cases} \text{target} & \text{if } \delta(x_i, D_{\text{train}}) \leq \theta, \\ \text{outlier} & \text{if } \delta(x_i, D_{\text{train}}) > \theta, \end{cases} \qquad (1)$$

where $D_{\text{train}}$, the training set, is a subset of dataset $D$.
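As a minimal illustration of this decision rule, the sketch below applies the threshold of Equation 1 to an arbitrary dissimilarity measure. The nearest-neighbor $\delta$ used in the example is a hypothetical placeholder, not one of the methods compared in this paper.

```python
import numpy as np

def one_class_classify(x_i, D_train, delta, theta):
    """Equation 1: accept x_i as a target when its dissimilarity to
    the training set is at most theta, otherwise reject it as an outlier."""
    return "target" if delta(x_i, D_train) <= theta else "outlier"

# Placeholder dissimilarity: distance to the nearest training object.
delta_nn = lambda x, D: np.min(np.linalg.norm(D - x, axis=1))

D_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(one_class_classify(np.array([0.4, 0.4]), D_train, delta_nn, theta=1.0))  # target
print(one_class_classify(np.array([5.0, 5.0]), D_train, delta_nn, theta=1.0))  # outlier
```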

Each method presented in this paper computes the dissimilarity measure in a different way, which, together with the dataset at hand, determines the optimal threshold. In our experiments, the methods are evaluated over the complete range of thresholds using the AUC performance measure [11].

III. KDD OUTLIER-DETECTION METHODS

In this section we describe two popular outlier-detection methods from the field of KDD, namely the Local Outlier Factor method (LOF) [3] and the Local Correlation Integral method (LOCI) [4]. Both methods are based on local densities, meaning that they consider an object to be an outlier when its surrounding space contains relatively few objects (i.e., when the data density in that part of the data space is relatively low).

We frame the KDD outlier-detection methods LOF and LOCI in the one-class classification framework [16] by letting them compute a dissimilarity measure $\delta$ in three steps: (i) constructing a neighborhood around $x_i$, (ii) estimating the density of the neighborhood, and (iii) comparing this density with the neighborhood densities of the neighboring objects. Subsections III-A and III-B explain how the three steps are implemented in LOF and LOCI, respectively.

A. Local Outlier Factor

The first KDD method we describe is the heuristic Local Outlier Factor method (LOF) [3]. The user needs to specify one parameter, $k$, which represents the number of neighbors constituting the neighborhood used for assessing the local density.

To construct the neighborhood of an object $x_i$, LOF defines the neighborhood border distance $d_{\text{border}}$ of $x_i$ as the Euclidean distance $d$ from $x_i$ to its $k$-th nearest neighbor $\text{NN}(x_i, k)$:

$$d_{\text{border}}(x_i, k) = d(x_i, \text{NN}(x_i, k)).$$

Then, a neighborhood $N(x_i, k)$ is constructed, containing all objects $x_j$ whose Euclidean distance to $x_i$ is not greater than the neighborhood border distance $d_{\text{border}}$:

$$N(x_i, k) = \{x_j \in D_{\text{train}} \setminus \{x_i\} \mid d(x_i, x_j) \leq d_{\text{border}}(x_i, k)\}.$$

To estimate the density of the constructed neighborhood, the reachability distance is introduced. Intuitively, this distance is defined to ensure that a minimal distance between the two objects $x_i$ and $x_j$ is maintained, by 'keeping' object $x_i$ outside the neighborhood of object $x_j$. The use of the reachability distance causes a smoothing effect whose strength depends on the parameter $k$. The reachability distance $d_{\text{reach}}$ is formally given by:

$$d_{\text{reach}}(x_i, x_j, k) = \max\{d_{\text{border}}(x_j, k),\, d(x_j, x_i)\}.$$

It should be noted that the reachability distance $d_{\text{reach}}$ is an asymmetric measure.

The neighborhood density $\rho$ of object $x_i$ depends on the number of objects in the neighborhood, $|N(x_i, k)|$, and on their reachability distances. It is defined as:

$$\rho(x_i, k) = \frac{|N(x_i, k)|}{\sum_{x_j \in N(x_i, k)} d_{\text{reach}}(x_i, x_j, k)}.$$

Objects $x_j$ in the neighborhood that are further away from object $x_i$ have a smaller impact on the neighborhood density $\rho(x_i, k)$.

In the third step, the neighborhood density $\rho$ of object $x_i$ is compared with those of its surrounding neighborhoods. The comparison results in a dissimilarity measure $\delta_{\text{LOF}}$ and requires the neighborhood densities $\rho(x_j, k)$ of the objects $x_j$ that are inside the neighborhood of $x_i$. The dissimilarity measure $\delta_{\text{LOF}}$ is formally defined as:

$$\delta_{\text{LOF}}(x_i, k, D_{\text{train}}) = \frac{1}{|N(x_i, k)|} \sum_{x_j \in N(x_i, k)} \frac{\rho(x_j, k)}{\rho(x_i, k)}.$$

An object that lies deep inside a cluster gets a local outlier factor of around 1 because its neighborhood density equals that of its neighbors. An object that lies outside a cluster has a relatively low neighborhood density and gets a higher local outlier factor.
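The three steps can be transcribed directly into Python. The following is a minimal $O(n^2)$ sketch of $\delta_{\text{LOF}}$ for a single test object; production implementations index the data for speed, and exact duplicates of the query are treated as the query itself here.

```python
import numpy as np

def lof_dissimilarity(x_i, D_train, k):
    """delta_LOF(x_i, k, D_train): a direct transcription of Section III-A."""
    def dists(x):
        d = np.linalg.norm(D_train - x, axis=1)
        return np.where(d == 0, np.inf, d)  # drop zero distances (the object itself)

    def d_border(x):                        # distance to the k-th nearest neighbor
        return np.sort(dists(x))[k - 1]

    def neighborhood(x):                    # indices of the objects in N(x, k)
        return np.where(dists(x) <= d_border(x))[0]

    def rho(x):                             # neighborhood density
        nbrs = neighborhood(x)
        d_reach = np.maximum([d_border(D_train[j]) for j in nbrs],
                             np.linalg.norm(D_train[nbrs] - x, axis=1))
        return len(nbrs) / d_reach.sum()

    nbrs = neighborhood(x_i)                # mean density ratio over the neighbors
    return np.mean([rho(D_train[j]) for j in nbrs]) / rho(x_i)

rng = np.random.default_rng(0)
cluster = rng.normal(0, 1, size=(50, 2))
print(lof_dissimilarity(np.array([0.0, 0.0]), cluster, k=5))  # ~1: inlier
print(lof_dissimilarity(np.array([8.0, 8.0]), cluster, k=5))  # >>1: outlier
```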

B. Local Correlation Integral

The Local Correlation Integral method (LOCI) [4] was proposed as an improvement over LOF. More specifically, its authors state that the choice of the neighborhood size $k$ in LOF is non-trivial and may lead to erroneous outlier detections. LOCI is claimed to improve on LOF because it considers the local density at multiple scales or levels of granularity. LOCI achieves this by iteratively performing the three steps, each time using a neighborhood of increasing radius $r \in \mathbb{R}^+$. We denote the set of relevant radii as $R$.

Another difference with LOF is that LOCI defines two neighborhoods for an object $x_i$: (i) the extended neighborhood, $N_{\text{ext}}$, and (ii) the local neighborhood, $N_{\text{loc}}$. The extended neighborhood of an object $x_i$ contains all objects $x_j$ that are within radius $r$ from $x_i$:

$$N_{\text{ext}}(x_i, r) = \{x_j \in D_{\text{train}} \mid d(x_j, x_i) \leq r\} \cup \{x_i\},$$

and the (smaller) local neighborhood contains all objects that are within radius $\alpha r$ from object $x_i$:

$$N_{\text{loc}}(x_i, r, \alpha) = \{x_j \in D_{\text{train}} \mid d(x_j, x_i) \leq \alpha r\} \cup \{x_i\},$$

where $\alpha \in (0, 1]$ defines the ratio between the two neighborhoods.

In LOCI, the density of the local neighborhood of an object $x_i$ is denoted by $\rho(x_i, \alpha r)$ and is defined as $|N_{\text{loc}}(x_i, r, \alpha)|$.

The extended neighborhood of an object $x_i$ has a density $\hat{\rho}(x_i, r, \alpha)$, which is defined as the average density of the local neighborhoods of all objects in the extended neighborhood of object $x_i$. In formal terms:

$$\hat{\rho}(x_i, r, \alpha) = \frac{\sum_{x_j \in N_{\text{ext}}(x_i, r)} \rho(x_j, \alpha r)}{|N_{\text{ext}}(x_i, r)|}.$$

The local neighborhood density of object $x_i$ is compared to the extended neighborhood density by means of the multi-granularity deviation factor (MDEF):

$$\text{MDEF}(x_i, r, \alpha) = 1 - \frac{\rho(x_i, \alpha r)}{\hat{\rho}(x_i, r, \alpha)}.$$

An object that lies deep inside a cluster has a local neighborhood density equal to that of its neighbors and therefore gets an MDEF value of around 0. The MDEF value approaches 1 the further an object lies outside a cluster.

To determine whether an object is an outlier, LOCI introduces the normalized MDEF:

$$\sigma_{\text{MDEF}}(x_i, r, \alpha) = \frac{\sigma_{\hat{\rho}}(x_i, r, \alpha)}{\hat{\rho}(x_i, r, \alpha)},$$

where $\sigma_{\hat{\rho}}(x_i, r, \alpha)$ is the standard deviation of all $\rho(x_j, \alpha r)$ in $N_{\text{ext}}(x_i, r)$. The normalized MDEF becomes smaller when the local neighborhoods have the same density. Intuitively, this causes a cluster of uniformly distributed objects to have a tighter decision boundary than, for example, a Gaussian-distributed cluster.

We define the dissimilarity measure $\delta_{\text{LOCI}}$ as the maximum ratio of MDEF to $\sigma_{\text{MDEF}}$ over all radii $r \in R$:

$$\delta_{\text{LOCI}}(x_i, \alpha, D_{\text{train}}) = \max_{r \in R} \frac{\text{MDEF}(x_i, r, \alpha)}{\sigma_{\text{MDEF}}(x_i, r, \alpha)}.$$
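A compact sketch of $\delta_{\text{LOCI}}$ follows. Two simplifications are assumptions of this sketch, not of the method: the set of radii $R$ is user-supplied with a coarse percentile-based default (the LOCI paper [4] derives $R$ from the data), and the local-neighborhood counts pool the test object with the training objects.

```python
import numpy as np

def loci_dissimilarity(x_i, D_train, alpha=0.5, radii=None):
    """delta_LOCI(x_i, alpha, D_train): a sketch of Section III-B."""
    X = np.vstack([D_train, x_i])                 # pool x_i with the training objects
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    if radii is None:                             # assumption: a coarse default for R
        radii = np.percentile(D[D > 0], [25, 50, 75, 100])
    i = len(X) - 1                                # index of x_i
    scores = []
    for r in radii:
        n_ext = np.where(D[i] <= r)[0]            # N_ext(x_i, r), includes x_i itself
        rho_loc = (D[n_ext] <= alpha * r).sum(axis=1)  # |N_loc| of each member
        rho_hat = rho_loc.mean()                  # average local-neighborhood density
        sigma = rho_loc.std()                     # sigma_rho-hat
        if sigma == 0:                            # equal densities: no deviation
            continue
        mdef = 1 - (D[i] <= alpha * r).sum() / rho_hat
        scores.append(mdef / (sigma / rho_hat))   # MDEF / sigma_MDEF
    return max(scores) if scores else 0.0

rng = np.random.default_rng(1)
cluster = rng.normal(0, 1, size=(60, 2))
print(loci_dissimilarity(np.array([0.0, 0.0]), cluster))  # small: inlier
print(loci_dissimilarity(np.array([7.0, 7.0]), cluster))  # large: outlier
```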

IV. ML OUTLIER-DETECTION METHODS

In this section we briefly discuss the outlier-detection methods from the field of Machine Learning. The methods k-Nearest Neighbor Data Description, Parzen Windows Data Description, and Support Vector Domain Description are explained in Sections IV-A, IV-B, and IV-C, respectively.

A. k-Nearest Neighbor Data Description

The k-Nearest Neighbor Data Description method (KNNDD) was proposed by De Ridder et al. [14]. The dissimilarity measure computed by KNNDD is simply the ratio of two distances. The first is the distance between the test object $x_i$ and its $k$-th nearest neighbor in the training set, $\text{NN}(x_i, k)$. The second is the distance between that $k$-th nearest training object and its own $k$-th nearest neighbor. Formally:

$$\delta_{\text{KNNDD}}(x_i, k, D_{\text{train}}) = \frac{d(x_i, \text{NN}(x_i, k))}{d(\text{NN}(x_i, k), \text{NN}(\text{NN}(x_i, k), k))}.$$

KNNDD is similar to LOF and LOCI in the sense that it samples the density locally. The main difference with LOF and LOCI is that KNNDD is much simpler.
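A direct transcription in Python, assuming Euclidean distances and a test object that is not itself part of the training set:

```python
import numpy as np

def knndd_dissimilarity(x_i, D_train, k):
    """delta_KNNDD: distance from x_i to its k-th nearest training
    object, divided by that object's own k-th nearest-neighbor distance."""
    d_i = np.linalg.norm(D_train - x_i, axis=1)
    nn_i = np.argsort(d_i)[k - 1]                      # index of NN(x_i, k)
    d_nn = np.sort(np.linalg.norm(D_train - D_train[nn_i], axis=1))
    return d_i[nn_i] / d_nn[k]                         # d_nn[0] is the object itself

rng = np.random.default_rng(2)
cluster = rng.normal(0, 1, size=(50, 2))
print(knndd_dissimilarity(np.array([0.0, 0.0]), cluster, k=5))  # ~1: inlier
print(knndd_dissimilarity(np.array([6.0, 6.0]), cluster, k=5))  # >>1: outlier
```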

B. Parzen Windows Data Description

The second ML method is the Parzen Windows Data Description (PWDD), which is based on Parzen Windows [6]. PWDD estimates the probability density function of the target class:

$$\delta_{\text{PWDD}}(x_i, h, D_{\text{train}}) = \frac{1}{Nh} \sum_{j=1}^{N} K\!\left(\frac{x_i - x_j}{h}\right),$$

where $N$ is $|D_{\text{train}}|$, $h$ is a smoothing parameter, and $K$ typically is a Gaussian kernel:

$$K(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}.$$

The parameter $h$ is optimised using a leave-one-out maximum-likelihood estimation [14]. Since the dissimilarity measure $\delta_{\text{PWDD}}$ is a probability density and not a distance, the threshold function (cf. Equation 1) for PWDD becomes:

$$f_{\text{PWDD}}(x_i) = \begin{cases} \text{target} & \text{if } \delta_{\text{PWDD}}(x_i, h, D_{\text{train}}) \geq \theta, \\ \text{outlier} & \text{if } \delta_{\text{PWDD}}(x_i, h, D_{\text{train}}) < \theta, \end{cases}$$

such that an object with too low a probability of being a target object is rejected as an outlier. It should be noted that PWDD estimates the density globally, while LOF, LOCI, and KNNDD estimate local densities.
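A sketch of the Parzen estimate and its inverted threshold follows. The formula above is written for the scalar case; the multivariate Gaussian form with normalization $1/(N h^d)$ used below is an assumption about the exact variant (cf. [14]).

```python
import numpy as np

def pwdd_dissimilarity(x_i, D_train, h):
    """delta_PWDD: Parzen-window density estimate at x_i with a
    multivariate Gaussian kernel and a single bandwidth h (an assumption)."""
    N, d = D_train.shape
    u = (x_i - D_train) / h
    k = np.exp(-0.5 * (u ** 2).sum(axis=1)) / (2 * np.pi) ** (d / 2)
    return k.sum() / (N * h ** d)

def pwdd_classify(x_i, D_train, h, theta):
    # Inverted threshold: a LOW density means rejection as an outlier.
    return "target" if pwdd_dissimilarity(x_i, D_train, h) >= theta else "outlier"

rng = np.random.default_rng(3)
cluster = rng.normal(0, 1, size=(200, 2))
print(pwdd_dissimilarity(np.array([0.0, 0.0]), cluster, h=0.5))  # high density
print(pwdd_dissimilarity(np.array([5.0, 5.0]), cluster, h=0.5))  # near zero
```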

Table I: Summary of the features of the KDD methods (LOF, LOCI) and ML methods (KNNDD, PWDD, SVDD) used in our experiments.

Feature                    LOF  LOCI  KNNDD  PWDD  SVDD
Estimates local density     ✔    ✔     ✔
Estimates global density                       ✔
Domain based                                         ✔

C. Support Vector Domain Description

The third method is the Support Vector Domain Description (SVDD) [2]. We confine ourselves to a brief description of this kernel-based data-description method; the interested reader is referred to [1], [2] for a full description.

SVDD is a domain-based outlier-detection method inspired by Support Vector Machines [17]. Unlike LOF, LOCI, and PWDD, SVDD does not estimate the data density directly. Instead, it finds an optimal boundary around the target class by fitting a non-linearly transformed hypersphere of minimal volume using the kernel trick, such that the hypersphere encloses most of the target objects. The optimal boundary is found using quadratic programming, where only distant target objects are allowed to fall outside the boundary [17], [7]. The dissimilarity measure $\delta_{\text{SVDD}}$ is defined as the distance between object $x_i$ and the target boundary. In our experiments we employ a Gaussian kernel whose width, $s$, is found as described in [2].
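A full SVDD implementation requires solving the quadratic program of [2]. As an illustrative stand-in, the sketch below uses scikit-learn's OneClassSVM; with a Gaussian (RBF) kernel its boundary is closely related to SVDD's (the two formulations are commonly considered equivalent for this kernel). The parameter values are illustrative only, not those selected in our experiments.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(4)
targets = rng.normal(0, 1, size=(100, 2))  # training set: target objects only

# nu bounds the fraction of targets left outside the boundary;
# gamma plays the role of the Gaussian width s (roughly gamma ~ 1 / (2 s^2)).
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(targets)

# The negated decision function acts as delta_SVDD: it grows as an
# object moves further outside the fitted boundary.
tests = np.array([[0.0, 0.0], [5.0, 5.0]])
print(-clf.decision_function(tests))  # negative inside the boundary, positive outside
```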

V. EXPERIMENTAL SET-UP

This section describes the set-up of the experiments in which we evaluate and compare the performances of LOF, LOCI, KNNDD, PWDD, and SVDD. For clarity, we have summarized the features of these methods in Table I. In Section V-A we explain how we prepare the multi-class real-world datasets that we use for our one-class experiments. The evaluation involves the calculation of the weighted AUC, which is described in Section V-B. The statistical Friedman and Nemenyi tests, which we use to compare the methods, are presented in Section V-C.

A. Datasets

In order to evaluate the methods on a wide variety of datasets (i.e., varying in size, dimensionality, and class-volume overlap), we use 24 real-world multi-class datasets from the UCI Machine Learning Repository² [19], redefined as one-class classification datasets by David Tax (http://ict.ewi.tudelft.nl/~davidt): Arrhythmia, Balance-scale, Biomed, Breast, Cancer wpbc, Colon, Delft Pump, Diabetes, Ecoli, Glass Building, Heart, Hepatitis, Housing, Imports, Ionosphere, Iris, Liver, Sonar, Spectf, Survival, Vehicle, Vowel, Waveform, and Wine.

²Except the Delft Pump dataset, which is taken from Ypma [18].

From each multi-class dataset containing a set of classes $C$, $|C|$ one-class datasets are constructed by relabelling one class as the target class and the remaining $|C| - 1$ classes as the outlier class; this is done for each class separately.
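A sketch of this relabelling, assuming the class labels are given as an array:

```python
import numpy as np

def one_class_datasets(X, y):
    """Yield |C| one-class datasets from a multi-class dataset: each
    class in turn becomes the target class, all others the outliers."""
    for c in np.unique(y):
        yield c, X, np.where(y == c, "target", "outlier")

X = np.arange(12).reshape(6, 2)          # toy 3-class dataset
y = np.array([0, 0, 1, 1, 2, 2])
for c, X_oc, y_oc in one_class_datasets(X, y):
    print(c, y_oc.tolist())
```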

B. Evaluation

We apply the following procedure to evaluate a method on a one-class dataset. An independent test set containing 20% of the dataset is reserved. With the remaining 80%, a 5-fold cross-validation procedure is applied to optimise the parameters (i.e., $k = 1, 2, \ldots, 50$ for LOF and KNNDD, $\alpha = 0.1, 0.2, \ldots, 1.0$ for LOCI, $h$ for PWDD, and $s$ for SVDD). Each method is trained with the parameter value yielding the best performance, and its AUC performance is evaluated on the independent test set. This procedure is repeated 5 times.
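This protocol can be sketched schematically in Python. The helper `fit_score` is hypothetical: it is assumed to train a method on target objects only (with parameter `p`) and return outlier scores for test objects. Labels are assumed to be 1 for target and 0 for outlier, and details the paper leaves open (e.g., stratification of the split) are assumptions here.

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import roc_auc_score

def evaluate(X, y, fit_score, param_grid, repeats=5):
    """Schematic protocol of Section V-B. Assumes y holds 1 for target
    and 0 for outlier objects; fit_score(X_targets, X_test, p) returns
    outlier scores (higher = more outlying); every fold has both classes."""
    aucs = []
    for r in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=r)
        best_p, best_auc = None, -np.inf
        for p in param_grid:                        # 5-fold CV on the 80%
            scores = []
            for tr, va in KFold(5, shuffle=True, random_state=r).split(X_tr):
                tgt = X_tr[tr][y_tr[tr] == 1]       # train on target objects only
                s = fit_score(tgt, X_tr[va], p)
                scores.append(roc_auc_score(y_tr[va] == 0, s))
            if np.mean(scores) > best_auc:
                best_p, best_auc = p, np.mean(scores)
        s = fit_score(X_tr[y_tr == 1], X_te, best_p)   # retrain with the best p
        aucs.append(roc_auc_score(y_te == 0, s))       # AUC on the 20% holdout
    return float(np.mean(aucs))
```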

We report the performances of the methods on each entire multi-class dataset. To this end, the AUCs of the $|C|$ one-class datasets (cf. Section V-A) are averaged, where each one-class dataset is weighted according to the prevalence, $p(c_i)$, of the target class $c_i$:

$$\text{AUC}_{\text{weighted}} = \sum_{c_i \in C} \text{AUC}(c_i) \times p(c_i),$$

where $p(c_i)$ is defined as the ratio of the number of target objects to the total number of objects in the dataset. The use of a weighted average prevents one-class datasets containing few target objects from dominating the results [20].
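As a small worked example of the weighted average, assuming per-class AUCs and class counts are given:

```python
import numpy as np

def weighted_auc(aucs, class_counts):
    """AUC_weighted of Section V-B: per-class AUCs weighted by the
    prevalence p(c_i) of each target class."""
    counts = np.asarray(class_counts, dtype=float)
    return float(np.dot(aucs, counts / counts.sum()))

# Three one-class datasets derived from a 3-class problem:
# 0.90 * 0.5 + 0.80 * 0.3 + 0.60 * 0.2 = 0.81
print(weighted_auc([0.90, 0.80, 0.60], [50, 30, 20]))
```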

C. Comparison

Following Demšar [21], we adopt the statistical Friedman test and the post-hoc Nemenyi test for the comparison of multiple methods on multiple datasets. The Friedman test [12] is used to investigate whether there is a significant difference between the performances of the methods. The Friedman test first ranks the methods for each dataset, where the best-performing method is assigned the rank of 1, the second best the rank of 2, and so forth. Then it checks whether the measured average ranks $R_j$ are significantly different from the mean rank, which is 3 in our case.

Iman and Davenport proposed the $F_F$ statistic, which is less conservative than the Friedman statistic [22]:

$$F_F = \frac{(N - 1)\,\chi^2_F}{N(m - 1) - \chi^2_F},$$

where $N$ is the number of datasets (i.e., 24), $m$ is the number of methods (i.e., 5), and $\chi^2_F$ is the Friedman statistic:

$$\chi^2_F = \frac{12N}{m(m + 1)} \left[ \sum_j R_j^2 - \frac{m(m + 1)^2}{4} \right].$$

The $F_F$ statistic is distributed according to the $F$-distribution with $m - 1$ and $(m - 1)(N - 1)$ degrees of freedom.

Table II: The weighted AUC performance in percentages obtained by the Machine Learning and Knowledge Discovery outlier-detection methods on 24 real-world datasets. The corresponding average rank for each method is reported below.

Dataset LOF LOCI KNNDD PWDD SVDD

Arrhythmia 62.87 56.70 56.50 50.00 61.76

Balance-scale 94.66 91.26 93.50 94.18 95.80

Biomed 78.61 85.04 88.29 72.99 88.20

Breast 96.81 98.31 96.68 72.24 97.75

Cancer wpbc 62.57 58.55 61.37 56.56 60.59

Colon 73.47 42.08 75.53 50.00 70.18

Delft Pump 94.93 87.27 93.13 92.97 94.43

Diabetes 68.24 64.86 65.03 63.14 67.98

Ecoli 96.74 96.10 96.53 93.09 93.39

Glass Building 81.47 75.79 78.93 77.39 77.32

Heart 61.82 60.88 60.53 56.86 62.05

Hepatitis 66.53 62.05 64.72 58.73 60.89

Housing 62.86 63.75 64.56 61.66 64.24

Imports 80.49 74.24 80.24 80.09 82.11

Ionosphere 74.28 68.51 75.47 71.14 81.38

Iris 98.28 98.23 98.49 98.26 99.20

Liver 59.57 59.10 55.53 53.46 59.15

Sonar 76.89 66.71 74.22 74.45 75.77

Spectf 60.20 56.11 52.81 51.57 78.92

Survival 67.02 64.18 66.64 62.30 68.12

Vehicle 75.68 74.99 79.15 78.81 81.37

Vowel 97.89 95.22 99.49 99.56 99.28

Waveform 90.26 89.17 89.85 86.89 90.36

Wine 88.42 87.68 89.43 84.20 86.94

Average rank 2.083 3.917 2.625 4.292 2.083

When there is a significant difference, we proceed with the post-hoc Nemenyi test [13], which checks for each pair of methods whether there is a significant difference in performance. The performances of two methods are significantly different when the difference between their average ranks is greater than or equal to the critical difference:

$$CD = q_\alpha \sqrt{\frac{m(m + 1)}{6N}},$$

where $q_\alpha$ is the Studentised range statistic divided by $\sqrt{2}$. In our case, $CD = 1.245$ for $\alpha = 0.05$.
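The test statistics and the Nemenyi critical difference can be computed directly; the sketch below uses the conventional $q_{0.05}$ value for five methods. Note that the average ranks in Table II are rounded, so recomputing from them does not exactly reproduce the $\chi^2_F$ reported in Section VI.

```python
import numpy as np
from scipy.stats import f as f_dist

def friedman_nemenyi(avg_ranks, N, q_alpha=2.728):
    """Friedman / Iman-Davenport statistics and the Nemenyi critical
    difference of Section V-C. q_alpha = 2.728 is the alpha = 0.05
    Studentised range value (already divided by sqrt(2)) for m = 5."""
    R = np.asarray(avg_ranks, dtype=float)
    m = len(R)
    chi2_F = 12 * N / (m * (m + 1)) * ((R ** 2).sum() - m * (m + 1) ** 2 / 4)
    F_F = (N - 1) * chi2_F / (N * (m - 1) - chi2_F)
    f_crit = f_dist.ppf(0.95, m - 1, (m - 1) * (N - 1))  # critical F value
    cd = q_alpha * np.sqrt(m * (m + 1) / (6 * N))        # Nemenyi CD
    return chi2_F, F_F, f_crit, cd

# Average ranks from Table II (LOF, LOCI, KNNDD, PWDD, SVDD), N = 24.
# f_crit ~ 2.47 and cd ~ 1.245 match Section VI; chi2_F and F_F computed
# from these rounded ranks come out lower than the exact values there.
print(friedman_nemenyi([2.083, 3.917, 2.625, 4.292, 2.083], N=24))
```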

VI. RESULTS

Table II presents the weighted AUC performances of each method on the 24 real-world datasets.

The average ranks of the methods are shown at the bottom of the table. On these 24 real-world datasets, SVDD and LOF perform best, both with an average rank of 2.083. With an average rank of 2.625, KNNDD performs surprisingly well. LOCI and PWDD perform the worst, with average ranks of 3.917 and 4.292, respectively. Interestingly, SVDD seems to perform well on those datasets where LOF performs worse, and vice versa. Apparently, both methods are complementary with respect to the nature of the dataset at hand.

[Figure 1: Comparison of all methods against each other with the Nemenyi test. Groups of methods that are not significantly different (at p = 0.05) are connected.]

To see whether there is a significant difference between these average ranks, we calculate the Friedman statistic, $\chi^2_F = 48.53$, which results in an $F_F$ statistic of $F_F = 23.52$. With five methods and 24 datasets, $F_F$ is distributed according to the $F$-distribution with $5 - 1 = 4$ and $(5 - 1) \times (24 - 1) = 92$ degrees of freedom. The critical value of $F(4, 92)$ for $\alpha = 0.05$ is 2.471, so we reject the null hypothesis, which states that all methods perform equally.

We continue with the Nemenyi test, for which the critical difference $CD$, for $\alpha = 0.05$, is 1.245. We identify two groups of methods: the performances of LOCI and PWDD are significantly worse than those of KNNDD, LOF, and SVDD.

Figure 1 graphically displays the result of the Nemenyi test in a so-called critical-difference diagram. Groups of methods that are not significantly different (at $p = 0.05$) are connected. The diagram reveals that, in terms of performance, the examined methods fall into two clusters. The cluster of best-performing methods consists of SVDD, LOF, and KNNDD; the other cluster contains PWDD and LOCI.

VII. DISCUSSION

We have evaluated five outlier-detection methods from the fields of Machine Learning and Knowledge Discovery in Databases on a collection of real-world datasets. The performances of the methods have been statistically compared using the Friedman and Nemenyi tests. From the experimental results we make three main observations. Below, we describe each observation and provide possible reasons for it.

A. Observation 1: Local density estimates outperform global density estimates

The first observation is that PWDD performs significantly worse than LOF. This is to be expected, since PWDD performs a global density estimate. Such an estimate becomes problematic when there are large differences in density across the data: objects in sparse clusters will be erroneously classified as outliers.

LOF and LOCI overcome this problem by performing an additional step. Instead of using the density estimate itself as a dissimilarity measure, they compare the density locally with that of the neighborhood. This produces an estimate that is both relative and local, and enables LOF and LOCI to cope with different densities across different subspaces. For LOF, the local density estimate results in a better performance. For LOCI, however, this is not the case. Possible reasons are discussed in the second observation below.

B. Observation 2: LOF outperforms LOCI

The second observation is that LOCI is outperformed by both LOF and KNNDD. This is unexpected, because LOCI, just like LOF, considers local densities. Moreover, LOCI performs a multi-scale analysis of the dataset, whereas LOF does not.

We provide two possible reasons for the relatively weak performance of LOCI. The first is that LOF considers three consecutive neighborhoods to compute the dissimilarity measure, whereas LOCI considers only two. The three-fold density analysis of LOF is more profound than the two-fold analysis of LOCI and therefore yields a better estimate of the data density.

The second possible reason is that LOCI constructs a neighborhood with a given radius, and not with a given number of objects. For small radii, the extended neighborhood may contain only one object, implying that there may be no deviation in the density and that outliers might be missed at a small scale. LOF and KNNDD, on the other hand, do not suffer from this because both methods construct a neighborhood with a given number of objects.

C. Observation 3: Domain-based and density-based methods are competitive

The third observation is that the domain-based method (SVDD) and the best density-based method (LOF) are competitive in performance.

To obtain good estimates, density-based methods require large datasets, especially when the object space is of high dimensionality. This implies that on sparsely sampled datasets, density-based methods may fail to detect outliers [23]. SVDD describes only the domain of the target class in the object space (i.e., it defines a closed boundary around the target class) and does not estimate the complete data density. As a consequence, SVDD is less sensitive to inaccurate sampling and better able to deal with small sample sizes [1].

VIII. CONCLUSION

This paper evaluates and compares outlier-detection methods from the fields of Knowledge Discovery in Databases (KDD) and Machine Learning (ML). The KDD methods LOF and LOCI and the ML methods KNNDD, PWDD, and SVDD have been framed in the one-class classification framework, to allow for an evaluation using the AUC performance measure and a statistical comparison using the Friedman and Nemenyi tests.

In our experimental comparison, we have determined that the best-performing methods are KNNDD, LOF, and SVDD. These outlier-detection methods originate from the fields of KDD and ML. Our findings indicate that methods developed in both fields are competitive and deserve treatment on an equal footing.

Framing KDD methods in ML-based frameworks such as the one-class classification framework facilitates the comparison of methods across fields and may lead to novel methods that combine ideas from both fields. For instance, our results suggest that it may be worthwhile to develop outlier-detection methods that combine elements of domain-based and local density-based methods.

We conclude that the fields of KDD and ML offer outlier-detection methods that are competitive in performance and that bridging the gap between both fields may facilitate the development of outlier-detection methods. We identify two directions for future research.

The first direction is to investigate the complementarity of LOF and SVDD with respect to the nature of the dataset. The relative strengths of both methods appear to depend on the characteristics of the dataset; therefore, it is useful to investigate which dataset characteristics determine the performances of LOF and SVDD.

The second direction is to combine the best of both fields, for example by extending LOF with a kernel-based domain description.

ACKNOWLEDGMENT

This work has been carried out as part of the Poseidon project under the responsibility of the Embedded Systems Institute (ESI), Eindhoven, The Netherlands. This project is partially supported by the Dutch Ministry of Economic Affairs under the BSIK03021 program. The authors would like to thank the anonymous reviewers and Hans Hiemstra for their critical and constructive comments and suggestions.

REFERENCES

[1] D. Tax, “One-class classification: Concept-learning in the absence of counter-examples,” Ph.D. dissertation, Delft University of Technology, Delft, The Netherlands, June 2001.

[2] D. Tax and R. Duin, “Support vector domain description,” Pattern Recognition Letters, vol. 20, no. 11–13, pp. 1191–1199, 1999.

[3] M. Breunig, H. Kriegel, R. Ng, and J. Sander, “LOF: Identifying density-based local outliers,” ACM SIGMOD Record, vol. 29, no. 2, pp. 93–104, 2000.

[4] S. Papadimitriou, H. Kitagawa, P. Gibbons, and C. Faloutsos, “LOCI: Fast outlier detection using the local correlation integral,” in Proceedings of the 19th International Conference on Data Engineering, Bangalore, India, March 2003, pp. 315–326.

[5] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “Knowledge discovery and data mining: Towards a unifying framework,” in Knowledge Discovery and Data Mining, 1996, pp. 82–88.

[6] E. Parzen, “On estimation of a probability density function and mode,” The Annals of Mathematical Statistics, pp. 1065–1076, 1962.

[7] B. Schölkopf and A. Smola, Learning with Kernels. Cambridge, MA: MIT Press, 2002.

[8] V. Hodge and J. Austin, “A survey of outlier detection methodologies,” Artificial Intelligence Review, vol. 22, no. 2, pp. 85–126, October 2004.

[9] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, 2009.

[10] S. Hido, Y. Tsuboi, H. Kashima, M. Sugiyama, and T. Kanamori, “Inlier-based outlier detection via direct density ratio estimation,” in Eighth IEEE International Conference on Data Mining (ICDM ’08), 2008, pp. 223–232.

[11] A. Bradley, “The use of the area under the ROC curve in the evaluation of machine learning algorithms,” Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.

[12] M. Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis of variance,” Journal of the American Statistical Association, pp. 675–701, 1937.

[13] P. Nemenyi, “Distribution-free multiple comparisons,” Ph.D. dissertation, Princeton University, 1963.

[14] D. de Ridder, D. Tax, and R. Duin, “An experimental comparison of one-class classification methods,” in Proceedings of the Fourth Annual Conference of the Advanced School for Computing and Imaging. Delft, The Netherlands: ASCI, June 1998, pp. 213–218.

[15] N. Japkowicz, “Concept-learning in the absence of counter-examples: An autoassociation-based approach to classification,” Ph.D. dissertation, Rutgers University, New Brunswick, NJ, October 1999.

[16] J. Janssens and E. Postma, “One-class classification with LOF and LOCI: An empirical comparison,” in Proceedings of the 18th Annual Belgian-Dutch Conference on Machine Learning, Tilburg, The Netherlands, May 2009, pp. 56–64.

[17] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.

[18] A. Ypma, “Learning methods for machine vibration analysis and health monitoring,” Ph.D. dissertation, Delft University, 2001.

[19] A. Asuncion and D. Newman. (2007) UCI machine learning repository. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html

[20] K. Hempstalk and E. Frank, “Discriminating against new classes: One-class versus multi-class classification,” in Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand. Springer, 2008.

[21] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” The Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.

[22] R. Iman and J. Davenport, “Approximations of the critical region of the Friedman statistic,” in Annual Meeting of the American Statistical Association, vol. 12, 1979.

[23] C. Aggarwal and P. Yu, “Outlier detection for high dimensional data,” ACM SIGMOD Record, vol. 30, no. 2, pp. 37–46, 2001.