International Journal of Computer Applications (0975 8887)
Volume 79 No.2, October 2013
Machine Learning Techniques for Anomaly Detection:
An Overview
Salima Omar
Universiti Teknologi Malaysia
Faculty of Computing
Asri Ngadi
Universiti Teknologi Malaysia
Faculty of Computing
Hamid H. Jebur
Universiti Teknologi Malaysia
Faculty of Computing
ABSTRACT
Intrusion detection has gained broad attention and become a fertile field of research, and it remains a subject of widespread interest. The intrusion detection community still confronts difficult problems even after many years of research. Reducing the large number of false alerts raised while detecting unknown attack patterns remains an unresolved problem. However, several recent research results have shown that there are potential solutions to this problem. Anomaly detection is a key issue in intrusion detection, in which perturbations of normal behavior indicate the presence of intended or unintended induced attacks, faults, defects and other events. This paper presents an overview of research directions for applying supervised and unsupervised methods to the problem of anomaly detection. The references cited cover the major theoretical issues and guide the researcher toward interesting research directions.
Keywords
Supervised Machine Learning, Unsupervised Machine
Learning, Network Intrusion Detection.
1. INTRODUCTION
Intrusion detection has been studied for approximately 20 years. Intrusions are activities that violate the security policy of an information system, and intrusion detection is the process of identifying intrusions. Intrusion detection is based on the assumption that the behavior of an intruder differs significantly from legitimate behavior, which makes it possible to detect many unauthorized activities.
Intrusion detection systems (IDSs) are usually used together with other protection systems, such as access control and authentication, as a second line of defense for information systems. Several reasons make intrusion detection an important part of the overall defense system. First, many traditional systems and applications were built and developed without taking security seriously into account. Second, computer systems and applications may have flaws or bugs in their design that intruders can use to attack them. Therefore, preventive techniques may not be as effective as anticipated.
Despite their importance, IDSs are not a replacement for preventive security mechanisms; they complement the other protective mechanisms to enhance the security of the system. IDSs alone cannot offer sufficient protection for information systems, so they should be used with other preventive security mechanisms as part of a total protective system [59]. Intrusion detection systems are classified as signature detection systems or anomaly detection systems. A signature detection system identifies traffic or application data patterns assumed to be malicious, while an anomaly detection system compares activities against a "normal" baseline. Both signature detection and anomaly detection systems have advantages and drawbacks. The primary advantage of signature detection is that it detects known attacks fairly reliably. Anomaly detection systems have two main advantages over signature-based intrusion detection systems. The first is their ability to detect unknown attacks, because they model the normal operation of a system and detect deviations from this model. The second is that the normal activity profiles can be customized for every system, application and network, which makes it harder for an attacker to know which activities can be carried out without being detected. However, the anomaly detection approach also has drawbacks, such as system complexity, a high false alarm rate and the difficulty of determining which event triggered an alarm. These are some of the many technical challenges that have to be handled before anomaly detection systems can be widely adopted.
This paper presents an overview of research directions for
applying supervised and unsupervised methods for managing
the problem of anomaly detection. The rest of this paper is
organized as follows. In Section 2, the general architecture of
anomaly intrusion detection systems and detailed discussions
on the supervised and unsupervised techniques used in
anomaly detection are described. Finally, the conclusion of
this paper is presented in section 3.
2. ANOMALY DETECTION TECHNIQUES
The general architecture of all anomaly-based network intrusion detection system (A-NIDS) methods is similar. According to [12] and [13], they generally consist of the following basic modules or stages (Fig. 1): parameterization, training and detection. Parameterization includes collecting raw data from a monitored environment. The raw data should be representative of the system to be modeled (e.g., packet data from a network). The training stage seeks to model the system using manual or automatic methods. For the client-server architecture, the server is a host that keeps waiting for incoming connections. When a connection is established between client and server, the server instantiates a socket, which is used to instantiate a handler object that runs on a separate thread. These handlers are kept in a collection object.
The behaviors represented in the model differ based on the technique used. Detection compares the model generated in the training stage with the selected portion of parameterized data. Threshold criteria are selected to determine anomalous data instances [13].
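The three stages can be illustrated with a deliberately simple statistical model. The sketch below is only an assumption about what a minimal pipeline might look like, not the method of [12] or [13]: the "model" is the mean and standard deviation of one numeric feature, and the threshold criterion is a z-score cutoff. The function names and the sample values are hypothetical.

```python
import statistics

def train_baseline(normal_values):
    """Training stage: model normal behavior as (mean, standard deviation)."""
    return statistics.mean(normal_values), statistics.pstdev(normal_values)

def detect(values, model, threshold=3.0):
    """Detection stage: flag values whose z-score exceeds the threshold."""
    mean, std = model
    scale = std if std > 0 else 1.0  # avoid division by zero for a constant baseline
    return [abs(v - mean) / scale > threshold for v in values]

# Parameterization stage (assumed): one feature, e.g. connections per second.
baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
model = train_baseline(baseline)
flags = detect([10, 50, 12], model)
```

Raising the threshold lowers the false alarm rate at the cost of missing smaller deviations, which is the trade-off the threshold criteria have to settle.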
Fig. 1: Generic A-NIDS functional architecture (blocks: monitored environment, parameterization, training, model, detection, intrusion report)
Machine learning can build the required model automatically from given training data. A motivation for this approach is that the necessary training data are available, or can at least be obtained more easily than the model can be defined manually. With the increasing complexity and number of different attacks, machine learning techniques that allow an anomaly detection system (ADS) to be constructed and maintained with less human intervention appear to be the only practical approach to achieving the next generation of intrusion detection systems.
Machine learning techniques for intrusion detection build the model automatically from a training data set. The data set contains data instances, each described by a set of attributes (features) and an associated label. The attributes can be of different types, such as categorical or continuous.
The nature of the attributes determines which anomaly detection techniques are applicable. The labels associated with data instances usually take binary values, i.e. normal and anomalous. On the other hand, some researchers have used attack types such as DoS, U2R, R2L and Probe rather than a single anomalous label. This labeling provides more information about the types of anomalies. Anomaly detection techniques include supervised techniques and unsupervised techniques (Fig. 2) [20, 55].
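Fig. 2: Anomaly detection techniques. Supervised anomaly detection: k-NN, BN, NN, DT, SVM. Unsupervised anomaly detection: clustering techniques (K-means, EM, UNC, FCM, SOM).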
2.1 Supervised Anomaly Detection
Supervised methods (also known as classification methods) require a labeled training set containing both normal and anomalous samples to construct the predictive model. Theoretically, supervised methods provide a better detection rate than semi-supervised and unsupervised methods, since they have access to more information. However, some technical issues make these methods less accurate than they are supposed to be. The first issue is the shortage of training data sets that cover all areas. Moreover, obtaining accurate labels is a challenge, and the training sets usually contain some noise that results in higher false alarm rates. The most common supervised algorithms are supervised neural networks, support vector machines (SVM), k-nearest neighbors, Bayesian networks and decision trees [60].
2.1.1 K-Nearest Neighbor (k-NN)
K-nearest neighbor (k-NN) is one of the simplest and most conventional nonparametric techniques for classifying samples [4], [32]. It computes the distances between an unlabeled point and the points in the training set, and then assigns the unlabeled point to the class most common among its k nearest neighbors. When building a k-NN classifier, k is an important parameter, and different values of k can lead to different performance. If k is very large, the neighbors used for prediction will require a long classification time and affect the prediction accuracy.
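The voting rule described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in any of the cited works; the Euclidean metric, the function name `knn_predict` and the toy data are assumptions.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption here; any other metric could be substituted.
    """
    # Sort the training samples by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature records; the values are made up for illustration.
train = [
    ((1.0, 1.0), "normal"), ((1.2, 0.9), "normal"), ((0.8, 1.1), "normal"),
    ((5.0, 5.2), "attack"), ((5.1, 4.8), "attack"), ((4.9, 5.0), "attack"),
]
label_a = knn_predict(train, (1.1, 1.0), k=3)
label_b = knn_predict(train, (5.0, 5.0), k=3)
```

Every prediction scans the whole training set, which is why storage and classification time grow with the data and with k.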
Shailendra and Sanjay [51] introduced a hybrid approach to feature selection that consists of two phases, a filter and a wrapper. The filter phase selects the features with the highest information gain and feeds them to the wrapper phase, which outputs the final feature subset. The final feature subset is then input to a k-nearest neighbor classifier to classify attacks. The effectiveness of this algorithm was demonstrated on the DARPA KDDCUP99 cyber-attack data set. Ming Y. [33] suggested a genetic algorithm combined with k-NN for feature selection and weighting. All 35 initial features were weighted in the training phase, and those with the highest weights were selected for testing. Many DoS attacks were used to evaluate the system.
2.1.2 Bayesian Network (BN)
Heckerman [17] defined a Bayesian network as follows: "A Bayesian Network (BN) is a model that encodes probabilistic relationships among variables of interest. This technique is generally used for intrusion detection in combination with statistical schemes. It has several advantages, including the capability of encoding interdependencies between variables and of predicting events, as well as the ability to incorporate both prior knowledge and data."
Johansen and Lee [22] stated that a BN system provides a proper mathematical foundation that makes an apparently difficult problem straightforward. They proposed that a BN-based IDS should distinguish attacks from normal network activity by comparing metrics of each network traffic sample. Moore and Zuev [35] used a supervised Naive Bayes classifier with 248 flow features, such as packet length and inter-arrival times in addition to numerous features derived from the TCP header, to differentiate between types of application. Correlation-based feature selection was used to identify the stronger features, and it indicated that only a small subset of fewer than 20 features is needed for accurate classification.
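A Naive Bayes classifier over continuous flow features can be sketched as below. This is an illustrative Gaussian variant under the usual conditional-independence assumption, not the exact classifier of [35]; the feature choice (mean packet length, mean inter-arrival time), the class names and all values are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for `row`."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Made-up flows: (mean packet length in bytes, mean inter-arrival time in s).
X = [(100, 0.10), (120, 0.20), (110, 0.15), (1400, 1.0), (1500, 1.2), (1450, 1.1)]
y = ["web", "web", "web", "bulk", "bulk", "bulk"]
nb_model = fit_gaussian_nb(X, y)
```

Working with log-probabilities avoids numerical underflow when many features are multiplied together.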
2.1.3 Supervised Neural Network (NN)
Neural networks (NNs) learn to predict the behavior of the different users and daemons in a system. If properly designed and implemented, NNs can address many of the problems encountered by rule-based approaches. The main advantages of NNs are their tolerance of imprecise data and uncertain information, and their ability to infer solutions from data without prior knowledge of the regularities in the data. This, in combination with their ability to generalize from learning data, has made them a suitable approach to intrusion detection (ID). To apply this approach to ID, data representing attacks and non-attacks have to be presented to the NN so that the network coefficients are adjusted automatically during the training phase [27]. Multilayer perceptron (MLP) and radial basis function (RBF) networks are the most commonly used supervised neural networks.
Multilayer Perceptron (MLP). A single-layer perceptron can only classify linearly separable sets of instances. If a straight line or plane can be drawn to separate the input instances into their correct categories, the input instances are linearly separable and the perceptron will find the solution. If the instances are not linearly separable, learning will never reach a point where all instances are classified properly. Multilayer perceptrons (artificial neural networks) were created to solve this problem [47].
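The separability limit is easy to reproduce with the classic perceptron learning rule. The sketch below is illustrative only; the function name, learning rate and epoch limit are assumptions. It converges on the linearly separable AND function and never converges on XOR, which is not linearly separable.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule on (inputs, target) pairs with targets in {0, 1}.

    Returns (weights, bias, converged), where `converged` is True if a full
    pass over the data produced no misclassification.
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            delta = target - out
            if delta != 0:
                errors += 1
                # Move the decision boundary toward the misclassified sample.
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
        if errors == 0:
            return w, b, True
    return w, b, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
and_result = train_perceptron(AND)
xor_result = train_perceptron(XOR)
```

Adding a hidden layer, as in an MLP, is what allows functions like XOR to be represented.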
Several studies have implemented an IDS using an MLP that can distinguish normal connections from attacks, as in [54] and [48]. These were implemented using MLPs with three and four layers. Moradi and Zulkernine [36] and Mohammed et al. [34] used a three-layer MLP (two hidden layers) not only to separate normal connections from attacks but also to identify the attack type. Yao et al. [57] proposed a hybrid MLP/CNN neural network, constructed to improve the detection rate of time-delayed attacks. While obtaining a detection rate for real-time attacks similar to that of the MLP, the proposed approach can also detect time-delayed attacks efficiently by using chaotic neurons.
Radial Basis Function Neural Networks (RBF). RBF networks are another common type of feed-forward neural network. Since they perform classification by measuring distances between inputs and the centers of the RBF hidden neurons, RBF networks train much faster than networks trained with time-consuming back propagation, and they are most suitable for problems with a large sample size [6]. Studies such as Hofmann et al. [18], Liu et al. [31] and Rapaka [45] employed RBFs to learn multiple local clusters for well-known attacks and for normal events. Besides acting as a classifier, the RBF network has also been used to fuse the results of multiple classifiers [6], where it outperformed five different decision fusion functions, such as Dempster-Shafer combination and weighted majority vote. Jiang et al. [21] introduced an approach that combines misuse and anomaly detection in a hierarchical RBF network. In the first layer, an RBF anomaly detector decides whether an event is normal or anomalous. Anomalous events then pass through a chain of RBF misuse detectors, where each detector detects a specific type of attack. Anomalous events that are not classified by any misuse detector are saved in a database. Once enough anomalous events have been collected, they are clustered into groups by a C-means clustering algorithm; each group is used to train a new RBF misuse detector, which is added to the misuse detector chain. In this manner, all intrusion events are detected and labeled automatically.
2.1.4 Decision Tree (DT)
Quinlan [43] defined Decision Trees as “powerful and
common tools for classification and prediction. A decision
tree is a tree that has three main components: nodes, arcs and
leaves. Each node is labeled with a feature attribute, which is
most informative among the attributes not yet considered in
the path from the root. Each arc out of a node is labeled with a
feature value for the node’s feature, and each leaf is labeled
with a category or class. A decision tree can then be used to
classify a data point by starting at the root of the tree and
moving through it until a leaf node is reached. The leaf node
provides the classification of the data point. ID3 and C4.5
developed by Quinlan are the most common implementations
of the Decision Tree. ”
Peddabachigari et al. [41] proposed decision trees (DT) and support vector machines (SVM) as intrusion detection models. They also designed a hybrid DT-SVM model and an ensemble approach with the decision tree, SVM and DT-SVM models as base classifiers. Joong et al. [23] generated decision trees for DoS, R2L, U2R and Scan attacks. The ID3 algorithm was used as the learning algorithm to generate the decision trees automatically.
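ID3 picks the "most informative" attribute at each node by information gain, i.e. the reduction in label entropy obtained by splitting on that attribute. The sketch below shows only this selection step under that standard definition; the attribute names (`protocol`, `flag`) and records are hypothetical, and a full tree builder would apply the same choice recursively to each subset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting `rows` on `feature`."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def best_feature(rows, labels, features):
    """Attribute that ID3 would place at the current node."""
    return max(features, key=lambda f: information_gain(rows, labels, f))

# Made-up connection records for illustration.
rows = [
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "udp", "flag": "S0"},
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "udp", "flag": "SF"},
]
labels = ["attack", "attack", "normal", "normal"]
root = best_feature(rows, labels, ["protocol", "flag"])
```

In this toy data `flag` separates the classes perfectly while `protocol` carries no information, so `flag` is chosen for the root.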
2.1.5 Support Vector Machine (SVM)
Support vector machines (SVM) were proposed by Vapnik [56]. An SVM first maps the input vector into a higher-dimensional feature space and then obtains the optimal separating hyperplane in that feature space. Moreover, the decision boundary, i.e. the separating hyperplane, is determined by the support vectors rather than by the whole set of training samples, and is therefore extremely robust to outliers. In particular, an SVM classifier is designed for binary classification, that is, to separate a set of training vectors that belong to two different classes. Note that the support vectors are the training samples close to the decision boundary. The SVM also provides a user-specified parameter called the penalty factor, which allows users to trade off the number of misclassified samples against the width of the decision boundary.
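The role of the penalty factor can be seen in a linear soft-margin SVM trained on the hinge loss. The following is a minimal sketch that minimizes 0.5*||w||^2 + C * sum(hinge) by batch subgradient descent; it is not a kernel SVM and not the solver used in the cited works. The learning rate, epoch count and toy data are assumptions.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Linear soft-margin SVM via subgradient descent; labels must be +1 or -1.

    C is the penalty factor: larger values punish margin violations more,
    smaller values favor a wider margin.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j, xj in enumerate(xi):
                    grad_w[j] -= C * yi * xj
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Made-up, linearly separable samples: +1 = normal, -1 = attack.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
svm_w, svm_b = train_linear_svm(X, y)
```

Only samples with margin below 1 influence an update, which mirrors the statement that the boundary is determined by the samples close to it.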
Mukkamala et al. [37] applied kernel classifiers and classifier design methods to network anomaly detection problems. They evaluated the impact of kernel type and parameter values on the accuracy with which a support vector machine (SVM) performs intrusion classification. Jun et al. [25] proposed a PSO-SVM model for intrusion detection, in which standard particle swarm optimization (PSO) is used to determine the free parameters of the support vector machine and binary PSO is used to obtain the optimum feature subset when building the intrusion detection system. Paulo et al. [40] proposed an intrusion detection system model based on the behavior of network traffic, through the analysis and classification of messages. Two artificial intelligence techniques, the Kohonen neural network (KNN) and the support vector machine (SVM), are applied to detect anomalies.
2.2 Unsupervised Anomaly Detection Techniques
These techniques do not need labeled training data. Instead, they are based on two basic assumptions. First, they presume that most network connections are normal traffic and only a very small percentage of traffic is abnormal. Second, they expect malicious traffic to be statistically different from normal traffic. Under these two assumptions, groups of similar instances that appear frequently are assumed to be normal traffic, while infrequent instances that differ considerably from the majority of the instances are regarded as malicious [7]. The most common unsupervised algorithms are K-means, self-organizing maps (SOM), C-means, the Expectation-Maximization meta algorithm (EM), adaptive resonance theory (ART), Unsupervised Niche Clustering (UNC) and the one-class support vector machine.
2.2.1 Clustering Techniques
As described by Rawat [45] and many others, clustering techniques work by grouping the observed data into clusters according to a given similarity or distance measure. There are at least two approaches to clustering-based anomaly detection. In the first approach, the anomaly detection model is trained using unlabeled data that consist of both normal and attack traffic. In the second approach, the model is trained using only normal data, and a profile of normal activity is created. The idea behind the first approach is that anomalous or attack data form a small percentage of the total data. If this assumption holds, anomalies and attacks can be detected based on cluster sizes: large clusters correspond to normal data, and the rest of the data points, which are outliers, correspond to attacks.
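Once a clustering algorithm has assigned every instance to a cluster, the size-based rule of the first approach reduces to a few lines. The sketch below assumes the cluster assignments already exist; the function name and the `min_fraction` cutoff are hypothetical choices, not values taken from the cited works.

```python
from collections import Counter

def flag_small_clusters(assignments, min_fraction=0.1):
    """Mark instances that fall in clusters holding less than `min_fraction`
    of all data as anomalous (True); instances in large clusters are normal."""
    counts = Counter(assignments)
    n = len(assignments)
    small = {cluster for cluster, size in counts.items() if size / n < min_fraction}
    return [cluster in small for cluster in assignments]

# Made-up assignments: cluster 0 holds 18 of 20 instances, clusters 1 and 2 one each.
assignments = [0] * 18 + [1, 2]
anomalous = flag_small_clusters(assignments)
```

The rule only works while attacks really are rare; a flood of attack traffic could itself form the largest cluster and be labeled normal.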
2.2.1.1 Unsupervised Neural Network
The two typical unsupervised neural networks are self-organizing maps and adaptive resonance theory. They use similarity to group objects. They are adequate for intrusion detection tasks where normal behavior is densely concentrated around one or two centers, while anomalous behavior and intrusions spread in the space outside the normal clusters.
The self-organizing map (SOM) is trained by an unsupervised competitive learning algorithm [26]. The aim of the SOM is to reduce the dimensionality of the data for visualization. That is, SOM outputs are clustered in a low-dimensional (usually 2D or 3D) grid. It usually consists of an input layer and the Kohonen layer, which is designed as a two-dimensional arrangement of neurons that maps an n-dimensional input to two dimensions. Kohonen's SOM associates each input vector with a representative output. For each training case, the network finds the nearest node, i.e. the neuron with minimum distance to the input (the winning node), and moves it toward that training case. In this way the SOM maps similar input vectors onto the same or similar output units of the two-dimensional map, so the output units self-organize into an ordered map in which units with similar weights end up close to each other after training.
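A single training step of a SOM can be sketched as follows. This is an illustrative version that uses a Gaussian neighborhood on the grid, a common choice but an assumption here; the function name `som_step`, the learning rate `lr` and the width `sigma` are hypothetical, and in practice both parameters are decreased over the course of training.

```python
import math

def som_step(weights, x, lr, sigma):
    """One SOM update. `weights` maps grid coordinates (i, j) to weight vectors.

    Finds the winning node (closest weight vector to x), then pulls the winner
    and, more weakly, its grid neighbors toward x. Returns the winner.
    """
    winner = min(weights, key=lambda node: math.dist(weights[node], x))
    for node, w in weights.items():
        grid_d2 = (node[0] - winner[0]) ** 2 + (node[1] - winner[1]) ** 2
        influence = math.exp(-grid_d2 / (2 * sigma ** 2))  # 1.0 for the winner
        for k in range(len(w)):
            w[k] += lr * influence * (x[k] - w[k])
    return winner

# A tiny 1x2 map with hand-picked initial weights, for illustration only.
weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}
x = (0.2, 0.0)
winner = som_step(weights, x, lr=0.5, sigma=1.0)
```

Because neighbors of the winner are also moved, nearby grid units end up with similar weights, which produces the ordered map.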
SOMs are the most popular neural networks trained for anomaly detection tasks. For example, Kayacik et al. [28] created a three-layer hierarchy. In the first layer, an individual SOM is associated with each basic TCP feature. The second layer integrates the views provided by the first-level SOMs into a single view of the problem. The final layer is built for those neurons that win for both attack and normal behaviors. Oh and Chae [39] proposed a real-time intrusion detection system based on SOM that groups similar data and visualizes their clusters. The system labels the map produced by the SOM using correlations between features. Jun et al. [24] introduced a methodology for analyzing the feature attributes of network traffic flows with some new techniques, including a novel quantization model of TCP states. Together with data preprocessing, the authors constructed an anomaly detection algorithm based on SOFM and applied the detection framework to the DARPA Intrusion Detection Evaluation Data.
Adaptive Resonance Theory (ART). Adaptive resonance theory embraces a series of neural network models that perform unsupervised or supervised learning, pattern recognition and prediction. Unsupervised learning models include ART-1, ART-2, ART-3 and Fuzzy ART. Various supervised networks are named with the suffix "MAP", such as ARTMAP, Fuzzy ARTMAP and Gaussian ARTMAP. Amini et al. [1] compared the performance of ART-1 (accepting binary inputs) and ART-2 (accepting continuous inputs) on KDD99 data. Liao et al. [29] deployed Fuzzy ART in an adaptive learning framework that is suitable for dynamically changing environments. Changes in normal behavior are accommodated efficiently, while anomalous activities can still be identified.
2.2.1.2 K-Means
The K-means algorithm is a traditional clustering algorithm. It divides the data into k clusters so that data within the same cluster are similar, while data in different clusters have low similarity. The algorithm first selects k data points at random as the initial cluster centers. Each remaining data point is added to the cluster with the highest similarity, according to its distance to the cluster center, and the center of each cluster is then recalculated. This process is repeated until the cluster centers no longer change, at which point the data are divided into k clusters. Unfortunately, K-means clustering is sensitive to outliers, and the set of objects closest to a centroid may be empty, in which case that centroid cannot be updated [16].
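The loop described above can be sketched as follows. This is a minimal illustration with Euclidean distance; the optional `init` argument and the choice to keep the old centroid when a cluster becomes empty are assumptions made for the example, not part of the cited description.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0, init=None):
    """Plain K-means. Returns (centers, clusters) with clusters as point lists."""
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centers = [list(c) for c in start]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: recompute each center as the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[i])  # empty cluster: keep old center
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return centers, clusters

# Made-up two-dimensional points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2, init=[(0, 0), (10, 10)])
```

With random initialization the result can depend on the seed, which is one reason the improved variants discussed next try to choose the initial centers more carefully.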
The authors of [30] proposed K-means algorithms for anomaly detection. First, a method to reduce the noise and isolated points in the data set was put forward. By dividing and merging clusters and using the density radius of a hypersphere, an algorithm to calculate the number of cluster centroids was given. With this more accurate method of finding the k cluster centers, an anomaly detection model was presented that achieves a better detection effect. Cuixiao et al. [7] proposed a mixed intrusion detection system (IDS) model. Data are first examined by the misuse detection module, and abnormal data are then detected by the anomaly detection module. In this model, an unsupervised clustering method is used to build the anomaly detection module. The algorithm is an improved version of the K-means clustering algorithm, and it was demonstrated to have a high detection rate in the anomaly detection module.
2.2.1.3 Fuzzy C-Means (FCM)
Fuzzy C-means is a clustering method that allows one piece of data to belong to two or more clusters. It was developed by Dunn [9] and later improved by Bezdek [3]. It is used in applications for which hard classification of data is not meaningful or is difficult to achieve (e.g., pattern recognition). The C-means algorithm is similar to K-means, except that the membership of each point is defined by a fuzzy function, and all points contribute to the relocation of a cluster centroid according to their fuzzy membership in that cluster.
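The two alternating updates of fuzzy C-means can be written down directly from the standard formulation: memberships depend on relative distances to all centers, and each center is the mean of all points weighted by membership raised to the fuzzifier m. The sketch below assumes Euclidean distance and m = 2; the function names and data are hypothetical.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """u[i][j] = degree (0..1) to which point i belongs to cluster j."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            u.append([h / sum(hits) for h in hits])
            continue
        u.append([1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists])
    return u

def fcm_update_centers(points, u, m=2.0):
    """Every point contributes to every center, weighted by membership**m."""
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(len(points[0]))
        ])
    return centers

# One-dimensional made-up data with two initial centers.
points = [(0.0,), (1.0,), (4.0,)]
u = fcm_memberships(points, centers=[(0.0,), (4.0,)])
new_centers = fcm_update_centers(points, u)
```

The point at 1.0 is three times closer to the first center than to the second, so with m = 2 it receives memberships of 0.9 and 0.1 rather than a hard assignment.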
Shingo et al. [52] proposed an approach called FC-ANN, based on ANN and fuzzy clustering, to help IDSs achieve a higher detection rate, a lower false positive rate and stronger stability. Yu and Jian [58] proposed an approach that integrates several soft computing techniques to build a hierarchical neuro-fuzzy inference intrusion detection system. In this approach, a principal component analysis neural network is used to reduce the dimensionality of the feature space. The preprocessed data are then clustered by an enhanced fuzzy C-means clustering algorithm to extract and manage fuzzy rules. Another fuzzy approach to unsupervised clustering is presented by Shah et al. [50], who employed Fuzzy C-Medoids (FCMdd) to cluster streams of system calls, low-level kernel data and network data.
2.2.1.4 Unsupervised Niche Clustering (UNC)
Unsupervised Niche Clustering (UNC) is a robust clustering algorithm that uses an evolutionary algorithm with a niching strategy (Nasraoui et al. [38]). The evolutionary algorithm helps to find clusters using a robust density fitness function, while the niching technique allows it to create and maintain the niches (candidate clusters). Since UNC is based on genetic optimization, it is much less susceptible to suboptimal solutions than traditional techniques. The main advantages of the algorithm are its ability to handle noise and to determine the number of clusters automatically.
Elizabeth et al. [10] combined UNC with fuzzy set theory for anomaly detection and applied it to network intrusion detection. They associated with each cluster generated by UNC a membership function that follows a Gaussian shape, using the evolved cluster center and radius. These cluster membership functions define the level of normalcy of a data sample.
2.2.1.5 Expectation-Maximization Meta Algorithm (EM)
EM is another soft clustering method, based on the Expectation-Maximization meta algorithm of Dempster et al. [8]. Expectation-Maximization is an algorithm for finding maximum likelihood estimates of parameters in probabilistic models. The EM clustering algorithm alternates between an expectation (E) step, which computes the expected likelihood using the current model parameters (as if they were known), and a maximization (M) step, which computes the maximum likelihood estimates of the model parameters. The new estimates of the model parameters are then used in the expectation step of the next iteration.
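For a one-dimensional Gaussian mixture, the alternation between the two steps can be sketched as below. This is an illustrative implementation of the textbook updates, not the algorithm of any cited work; the initial parameters, the variance floor and the data are assumptions.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations=20, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture; returns (means, variances, weights)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each point,
        # computed with the current parameters.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M-step: re-estimate parameters from the responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # guard against collapse to zero
            weights[j] = nj / len(data)
    return means, variances, weights

# Made-up measurements drawn around two levels, 0 and 5.
data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
gm_means, gm_vars, gm_weights = em_gmm_1d(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
```

Unlike K-means, each point keeps a fractional responsibility for every component, which is what makes EM a soft clustering method.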
Hajji [15] used Gaussian mixture models to characterize utilization measurements. Model parameters are estimated using the Expectation-Maximization (EM) algorithm, and the anomalies detected correspond to network failure events. Animesh and Jung [2] proposed an anomaly detection scheme called SCAN to address the threats posed by network-based denial-of-service attacks in high-speed networks. The noteworthy features of SCAN include: (a) it rationally samples the incoming network traffic to reduce the amount of audit data while retaining the intrinsic characteristics of the network traffic itself; (b) it computes the missing elements of the sampled audit data using an enhanced clustering algorithm based on Expectation-Maximization (EM); and (c) it improves the convergence speed of the clustering process by employing Bloom filters and data summaries.
2.2.2 One-Class Support Vector Machine (OCSVM)
The one-class support vector machine is a specialized variant of the support vector machine that is geared toward anomaly detection. The one-class SVM differs from the generic SVM in that the resulting quadratic optimization problem includes an allowance for a small, predefined percentage of outliers, which makes it suitable for anomaly detection. These outliers lie between the origin and the optimal separating hyperplane. All the remaining data fall on the opposite side of the optimal separating hyperplane and belong to a single nominal class, hence the term "one-class" SVM. The SVM outputs a score that represents the distance from the data point being tested to the optimal hyperplane. Positive values of the one-class SVM output represent normal behavior (with higher values representing greater normality), and negative values represent abnormal behavior (with lower values representing greater abnormality) [42].
Eskin et al. [11] and Honig et al. [19] used an SVM in addition to their clustering methods for unsupervised learning. The SVM algorithm had to be modified slightly to work in the unsupervised learning domain. Once modified, it performed better than both of their clustering methods.
Shon and Moon [53] suggested a new SVM approach, named Enhanced SVM, which merges the soft-margin SVM method and the one-class SVM in order to provide unsupervised learning with a low false alarm rate, similar to that of a supervised SVM approach. Rui et al. [46] proposed a method for network anomaly detection based on the one-class support vector machine (OCSVM). The method contains two main steps. The first is detector training, in which the training data set is used to generate the OCSVM detector, which learns the nominal profile of the data. The second is to detect the anomalies in the performance data with the trained detector.
2.3 Anomaly Detection Algorithms Comparison
Various unsupervised anomaly detection algorithms have been applied to intrusion detection to enhance the performance of IDSs at all levels, such as clustering, feature selection and classification. Based on the previous description of the different anomaly detection algorithms, Table 1 compares the most common ones and summarizes the pros and cons of each.
the weaknesses of knowledge base detection techniques.
Anomaly detection comprises supervised techniques and
unsupervised techniques. Many algorithms were used to
achieve good results for these techniques. This paper presents an overview of machine learning techniques for anomaly detection. The experiments demonstrated that the supervised learning methods significantly outperform the unsupervised ones if the test data contain no unknown attacks. Among the supervised methods, the best performance is achieved by the non-linear methods, such as SVM, the multi-layer perceptron and the rule-based methods. Unsupervised techniques such as K-means, SOM and the one-class SVM achieved better performance than the other techniques, although they differ in their ability to detect all attack classes efficiently.
Table 1: Pros and Cons of Anomaly Detection Techniques

Technique: K-Nearest Neighbor
Pros: Very easy to understand when there are few predictor variables. Useful for building models that involve non-standard data types, such as text.
Cons: Have large storage requirements. Sensitive to the choice of the similarity function that is used to compare instances. Lack a principled way to choose k, except through cross-validation or similar. Computationally-expensive technique.

Technique: Neural Network
Pros: A neural network can perform tasks that a linear program cannot. When an element of the neural network fails, it can continue without any problem with their parallel nature. A neural network learns and does not need to be reprogrammed. It can be implemented in any application.
Cons: The neural network needs training to operate. The architecture of a neural network is different from the architecture of microprocessors therefore needs to be emulated. Requires high processing time for large neural networks.

Technique: Decision Tree
Pros: Simple to understand and interpret. Requires little data preparation. Able to handle both numerical and categorical data. Uses a white box model. Possible to validate a model using statistical tests. Robust. Perform well with large data in a short time.
Cons: The problem of learning an optimal decision tree is known to be NP-complete under several aspects of optimality and even for simple concepts. Decision-tree learners create over-complex trees that do not generalize the data well. There are concepts that are hard to learn because decision trees do not express them easily.

Technique: Support Vector Machine
Pros: Find the optimal separation hyper plane. Can deal with very high dimensional data. Some kernels have infinite Vapnik-Chervonenkis dimension, which means that they can learn very elaborate concepts. Usually work very well.
Cons: Require both positive and negative examples. Need to select a good kernel function. Require lots of memory and CPU time. There are some numerical stability problems in solving the constraint QP.

Technique: Self-organizing map
Pros: Simple and easy-to-understand algorithm that works. A topological clustering unsupervised algorithm that works with nonlinear data set. The excellent capability to visualize high-dimensional data onto 1 or 2 dimensional space makes it unique especially for dimensionality reduction.
Cons: Time consuming algorithm.

Technique: K-means
Pros: Low complexity.
Cons: Necessity of specifying k. Sensitive to noise and outlier data points. Clusters are sensitive to initial assignment of centroids.

Technique: Fuzzy C-means
Pros: Allows a data point to be in multiple clusters. A more natural representation of the behavior of genes.
Cons: Need to define c, the clusters number. Need to determine membership cutoff value. Clusters are sensitive to initial assignment of centroids.

Technique: Expectation-Maximization Meta
Pros: Can easily change the model to adapt to a different distribution of data sets. Parameters number does not increase with the training data increasing.
Cons: Slow convergence in some cases.
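Table 1 lists sensitivity to the initial assignment of centroids as a drawback of K-means. The following minimal sketch illustrates that point with a plain one-dimensional version of Lloyd's algorithm run from two different starting points. The function name `kmeans_1d`, the data points and both sets of initial centroids are hypothetical and chosen only for illustration.

```python
def kmeans_1d(points, centroids, iters=20):
    """Plain Lloyd's algorithm in one dimension, started from the given centroids."""
    centroids = list(centroids)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    # Sum of squared distances from each point to its nearest centroid.
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return sorted(centroids), sse


# Hypothetical data with three well-separated groups.
points = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0, 21.0, 22.0, 23.0]

good_centroids, good_sse = kmeans_1d(points, [2.0, 12.0, 22.0])  # one start per group
bad_centroids, bad_sse = kmeans_1d(points, [1.0, 2.0, 3.0])      # all starts in one group

print(good_centroids, good_sse)
print(bad_centroids, bad_sse)
```

With these points, the first start recovers the three groups, while the second settles in a poorer local optimum with a much higher squared error. A common mitigation is to run the algorithm from several random starts and keep the result with the lowest error.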
3. CONCLUSION
Machine learning techniques have received considerable
attention among intrusion detection researchers as a way to
address the weaknesses of knowledge-based detection techniques.
Anomaly detection comprises supervised and unsupervised
techniques, and many algorithms have been used to achieve good
results with both. This paper presents an overview of machine
learning techniques for anomaly detection. The experiments
demonstrated that supervised learning methods significantly
outperform unsupervised ones if the test data contains no
unknown attacks. Among the supervised methods, the best
performance is achieved by the non-linear methods, such as
SVM, multi-layer perceptron and the rule-based methods.
Unsupervised techniques such as K-Means, SOM and one-class
SVM achieved better performance than the other techniques,
although they differ in their ability to detect all attack
classes efficiently.
4. REFERENCES
[1] Amini and Jalili. 2004. Network-based intrusion detection using unsupervised adaptive resonance theory. In Proceedings of the 4th Conference on Engineering of Intelligent Systems (EIS’04).
[2] Animesh, P. and Jung, M. 2007. Network Anomaly Detection with Incomplete Audit Data. Elsevier Science, 12 February 2007, pp. 5-35.
[3] Bezdek, J. 1981. Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, Norwell, MA, USA.
[4] Bishop, C. 1995. Neural networks for pattern recognition. England, Oxford University.
[5] Bouzida, F., Cuppens, B. and Gombault, S. 2004. Efficient intrusion detection using principal component analysis. In Proceedings of the 3ème Conférence sur la Sécurité et Architectures Réseaux (SAR).
[6] Chan, F., Yeung, S. and Tsang, S. 2005. Comparison of different fusion approaches for network intrusion detection using an ensemble of RBFNN. In Proceedings of 2005 International Conference on Machine Learning and Cybernetics.
[7] Guobing, Z., Cuixia, Z. and Shanshan, S. 2009. A Mixed Unsupervised Clustering-based Intrusion Detection Model. Third International Conference on Genetic and Evolutionary Computing.
[8] Dempster, A., Laird, N. and Rubin, D. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc., Vol. 39, pp. 1-38.
[9] Dunn, J. 1973. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, Vol. 3(3), pp. 32-57.
[10] Lizabeth, L., Olfa, N. and Jonatan, G. 2007. Anomaly detection based on unsupervised niche clustering with application to network intrusion detection. Proceedings of the IEEE Conference on Evolutionary Computation.
[11] Eskin, E., Arnold, A., Prerau, M., Portnoy, L. and Stolfo, S. A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. In D. Barbará and S. Jajodia (Eds.), Data Mining for Security Applications. Boston: Kluwer Academic Publishers.
[12] Estevez, J., Garcia, P. and Diaz, J. 2004. Anomaly detection methods in wired networks: a survey and taxonomy. Computer Networks, Vol. 27, No. 16, pp. 1569-84.
[13] Garcia, T., Diaz, V., Macia, F. and Vazquez. 2009. Anomaly-based network intrusion detection. Computers and Security, Vol. 28, pp. 18-28.
[14] Gilles, C., Melanie, H. and Christian, P. 2004. One-class support vector machines with a conformal kernel: A case study in handling class imbalance. In Structural, Syntactic and Statistical Pattern Recognition, pp. 850-858.
[15] Hajji, H. 2005. Statistical Analysis of Network Traffic for Adaptive Faults Detection. IEEE Trans. Neural Networks, Vol. 16, No. 5, pp. 1053-1063.
[16] Han, J. and Kamber, M. 2001. Data Mining: Concepts and Techniques (1st Ed.). Morgan Kaufmann Publishers.
[17] Heckerman. 1995. A tutorial on Learning with Bayesian Networks. Technical report, Microsoft Research, MSRTR, Vol. 6.
[18] Hofmann, A., Schmitz, C. and Sick, B. 2003. Rule extraction from neural networks for intrusion detection in computer networks. In IEEE International Conference on Systems, Man and Cybernetics.
[19] Honig, A. 2002. Adaptive model generation: An architecture for the deployment of data mining based intrusion detection systems. In D. Barbará and S. Jajodia (Eds.), Data Mining for Security Applications. Boston: Kluwer Academic Publishers, May 2002.
[20] Jain, A., Murty, M. and Flynn, P. 1999. Data clustering: A review. ACM Computing Surveys, Vol. 31, No. 3, pp. 264-323.
[21] Jiang, J., Zhang, C. and Kame, M. 2003. RBF-based real-time hierarchical intrusion detection systems. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’03).
[22] Johansen, K. and Lee. CS424 network security: Bayesian Network Intrusion Detection (BINDS). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.83.8479
[23] Joong, L., Jong, H., Seon, G. and Tai, M. 2008. Effective Value of Decision Tree with KDD99 Intrusion Detection Datasets for Intrusion Detection System. ICACT, pp. 17-20.
[24] Jun, Z., Ming, H. and Hong, Z. 2004. A new Method of Data Preprocessing and Anomaly Detection. Proc. of Third Inter. Conf. on Machine Learning and Cybernetics.
[25] Jun, W., Xu, H., Rong, R. and Tai-hang, L. 2009. A Real Time Intrusion Detection System Based on PSO-SVM. Proceedings of the International Workshop on Information Security and Application (IWISA).
[26] Kohonen. 1995. Self-Organizing Map. Springer, Berlin.
[27] Kumar, G., Kumar, K. and Sachdeva, M. 2010. The use of artificial intelligence based techniques for intrusion detection: a review.
[28] Kayacik, G., Zincir, H. and Heywood, M. 2003. On the Capability of an SOM Based Intrusion Detection System. Proc. IEEE, IJCNN.
[29] Liao, Y., Vemuri, R. and Pasos, A. 2007. Adaptive anomaly detection with evolving connectionist systems. Journal of Network and Computer Applications, Vol. 30, No. 1, pp. 60-80.
[30] Li, H. 2010. Research and Implementation of an Anomaly Detection Model Based on Clustering Analysis. International Symposium on Intelligent Information Processing and Trusted Computing.
[31] Liu, Z., Florez, C. and Bridges, S. 2002. A comparison of input representations in neural networks: a case study in intrusion detection. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’02).
[32] Manocha, S. and Girolami, M. 2007. An empirical analysis of the probabilistic K-nearest Neighbor Classifier. Pattern Recognition Letters, Vol. 28, pp. 1818-1824.
[33] Ming, Y. 2011. Real Time Anomaly Detection Systems for Denial of Service Attacks by Weighted k-Nearest-Neighbor Classifiers. Expert Systems with Applications, Vol. 38, pp. 3492-3498.
[34] Mohammed, S., Marwa, S., Mohammed, Imane, S. 2007. Artificial Neural Networks Architecture for Intrusion Detection Systems and Classification of Attacks. Cairo University, Egypt.
[35] Moore, D. 2005. Internet Traffic Classification Using Bayesian Analysis Techniques. In Proceedings of ACM SIGMETRICS.
[36] Moradi and Zulkernine. 2004. A Neural Network Based System for Intrusion Detection and Classification of Attacks. IEEE International Conference on Advances in Intelligent Systems-Theory and Applications, Luxembourg: Kirchberg.
[37] Mukkamala, S., Sung, A. and Ribeiro, B. 2005. Model Selection for Kernel Based Intrusion Detection Systems. Proceedings of International Conference on Adaptive and Natural Computing Algorithm.
[38] Nasraoui, O., Leon, E. and Krishnapuram, R. 2005. Unsupervised Niche Clustering: Discovering an Unknown Number of Clusters in Noisy Data Sets. In Ghosh, A. and Jain, L. (Eds.), Evolutionary Computation in Data Mining. Springer Berlin Heidelberg.
[39] Oh and Chae. 2008. Real Time Intrusion Detection System Based on Self-Organized Maps and Feature Correlations. Proceedings of the Third International Conference on Convergence and Hybrid Information.
[40] Paulo, M., Vinicius, M. and Joni. 2010. Octopus-IIDS: An Anomaly Based Intelligent Intrusion Detection System. Proceedings of Computers and Communications (ISCC).
[41] Peddabachigari, S., Abraham, A., Grosan, C. and Thomas, J. 2007. Modeling Intrusion Detection System using Hybrid Intelligent Systems. J. Netw. Comput. Appl., Vol. 30, No. 1, pp. 114-132.
[42] Gilles, C., Melanie, H. and Christian, P. 2004. One-Class Support Vector Machines with a Conformal Kernel: A Case Study in Handling Class Imbalance. In Structural, Syntactic and Statistical Pattern Recognition.
[43] Quinlan, J. 1993. C4.5: Programs for Machine Learning. Los Altos, CA: Morgan Kaufmann.
[44] Rapaka, A., Novokhodko, A. and Wunsch, D. 2003. Intrusion detection using radial basis function network on sequence of system calls. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’03).
[45] Rawat, S. 2005. Efficient Data Mining Algorithms for Intrusion Detection. In Proceedings of the 4th Conference on Engineering of Intelligent Systems (EIS’04).
[46] Rui, Z., Shaoyan, Z., Yang, L. and Jianmin, J. 2008. Network Anomaly Detection Using One Class Support Vector Machine. Proceedings of the International Multi Conference of Engineers and Computer Scientists.
[47] Rumelhart, D., Hinton, G. and Williams, R. 1986. Learning internal representations by error propagation. In Rumelhart, D., McClelland, J. L. et al. (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA, Vol. 1, pp. 318-362.
[48] Sahar, S., Hashem, M. and Taymoor, M. 2010. Intrusion Detection using Multi-Stage Neural Network. International Journal of Computer Science and Information Security, Vol. 8, No. 4, pp. 14-20.
[49] Santanu, D., Ashok, S. and Aditi, C. 2007. Classification of Damage Signatures in Composite Plates using One-Class SVM’s. In Proceedings of the IEEE Aerospace Conference, Big Sky, MO.
[50] Shah, H., Undercoffer, J. and Joshi, A. 2003. Fuzzy Clustering for Intrusion Detection. The 12th IEEE International Conference on Fuzzy Systems.
[51] Shailendra and Sanjay. 2009. An ensemble approach for feature selection of Cyber Attack Dataset. International Journal of Computer Science and Information Security P12-(IJCSIS), Vol. 6, No. 2.
[52] Shingo, M., Ci, C., Nannan, L., Kaoru, S. and Kotaro, H. An Intrusion Detection Model Based on Fuzzy Class Association Rule Mining Using Genetic Network Programming. IEEE Transactions on Systems, Part C, Vol. 41, pp. 130-139.
[53] Shon and Moon. 2007. A hybrid Machine Learning Approach to Network Anomaly Detection. Inf. Sci., Vol. 177, No. 18, pp. 3799-3821.
[54] Srinivas, M. 2002. Intrusion Detection using Neural Networks and Support Vector Machine. Proceedings of the IEEE International HI.
[55] Theodoridis, S. and Koutroumbas. 2006. Pattern Recognition (3rd Ed.). USA: Academic Press.
[56] Vapnik, V. 1998. Statistical Learning Theory. Wiley, New York.
[57] Yao, Y., Wei, Y., Gao, F. and Yu, G. 2006. Anomaly Intrusion Detection Approach Using Hybrid MLP/CNN Neural Network. Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications.
[58] Yu, Z. and Jian, F. 2009. Intrusion Detection Model Based on Hierarchical Fuzzy Inference System. Second International Conference on Information and Computing Science (ICIC).
[59] Peng, N. and Sushil, J. 2003. Intrusion Detection Techniques. In H. Bidgoli (Ed.), The Internet Encyclopedia. John Wiley & Sons.
[60] Ghorbani, Wei and Tavallaee. 2010. Theoretical Foundation of Detection Network Intrusion Detection and Prevention. Concepts and Techniques, Advances in Information Security. Springer Science, Vol. 47, pp. 47-114.
IJCATM : www.ijcaonline.org
... The problem of distinguishing anomalous from regular events for unlabeled datasets is a challenging task, which is even harder when considering that such a dataset is usually highly imbalanced [11,12]. Several methods exist to solve this problem, involving, for example, various clustering methods, self-organizing maps (SOMs), adaptive resonance theory (ART), artificial neural networks, one-class support vector machine (OCSVM), and many others [13][14][15][16][17][18][19]. In general, machine learning techniques (for a survey, see, for instance, [17,20,21]) promise to be at least complementary (or even superior) to traditional rule-based and human-defined methods [22][23][24]. ...
... Several methods exist to solve this problem, involving, for example, various clustering methods, self-organizing maps (SOMs), adaptive resonance theory (ART), artificial neural networks, one-class support vector machine (OCSVM), and many others [13][14][15][16][17][18][19]. In general, machine learning techniques (for a survey, see, for instance, [17,20,21]) promise to be at least complementary (or even superior) to traditional rule-based and human-defined methods [22][23][24]. The latter ones, because of attacks having no constant patterns and a rapidly changing behavior over time, can make them cumbersome, quickly obsolete, and therefore, unsustainable [24]. ...
Article
Full-text available
The number of fraud occurrences in electronic banking is rising each year. Experts in the field of cybercrime are continuously monitoring and verifying network infrastructure and transaction systems. Dedicated threat response teams (CSIRTs) are used by organizations to ensure security and stop cyber attacks. Financial institutions are well aware of this and have increased funding for CSIRTs and antifraud software. If the company has a rule-based antifraud system, the CSIRT can examine fraud cases and create rules to counter the threat. If not, they can attempt to analyze Internet traffic down to the packet level and look for anomalies before adding network rules to proxy or firewall servers to mitigate the threat. However, this does not always solve the issues, because transactions occasionally receive a “gray” rating. Nevertheless, the bank is unable to approve every gray transaction because the number of call center employees is insufficient to make this possible. In this study, we designed a machine-learning-based rating system that provides early warnings against financial fraud. We present the system architecture together with the new ML-based scoring extension, which examines customer logins from the banking transaction system. The suggested method enhances the organization’s rule-based fraud prevention system. Because they occur immediately after the client identification and authorization process, the system can quickly identify gray operations. The suggested method reduces the amount of successful fraud and improves call center queue administration.
... Here, statistical conventional methods such as Seasonal Auto-regressive Integrated Moving Average (SARIMA) and Seasonal Trend Decomposition (STL) are generally used for periodic and stationary time series [27,19]. Conventional ML methods in anomaly detection are broadly researched area [28,29]. Some conventional ML approaches, e.g., clusterbased methods, have mostly the same challenges, i.e., abnormal activities are modeled in cluster-based approaches. ...
Article
Full-text available
Accurate and automated anomaly detection in time series data sets has an increasingly important role in a wide range of applications. Inspired by coding in the cortical networks of the brain, here we introduce a novel approach for high performance real-time anomaly detection. Cortical coding method is adaptive and dynamic, consisting of self-organized networks. In the cortical coding network introduced herein, the morphological structuring is driven by a brain inspired feature extraction strategy that aims the minimization of the signal energy dissipation while increasing the information entropy of the system. We combine the cortical coding network with transform coding and multi resolution analysis for anomaly detection. As we demonstrate here, the new coding methodology provides high computational efficiency in addition to scalability with respect to target accuracy compared to the traditional clustering algorithms. A wide variety of data sets are used to demonstrate time series anomaly detection performance. In a preliminary work presented here, we detected 77.6% of the present anomalies correctly, using the same hyperparameters for every stage of the method. The results are compared with several clustering algorithms such as K-means and its variants mini-batch K-means, sequential K-means and finally with hierarchical agglomerative clustering. Additionally, the performance of all the clustering methods are compared by memorizing all input data set without performing any clustering. The cortical coding method has shown the best performance compared to the other methods. From the results achieved so far, it appears that there is still a significant room for improvement of the success rate by, specifically, performing hyperparameter and filter optimization according to the characteristics of data sets and using a more advanced fusion model at the output layer. 
Low time and space complexity, high generalization performance, suitability to real-time anomaly detection, and in-memory processing compatibility are distinct advantages of the cortical coding method in a variety of anomaly detection problems, such as predictive maintenance, cybersecurity, telemedicine, risk management, and transportation safety.
... As a separate class is not defined, the learning image aims to use only the normal class or train only the bad class. However, it is recommended to train only one class of images to be learned, [11]. ...
Article
To solve the problem of high-wage employment and unemployment that is constantly occurring in industrial sites, we designed a real-time anomaly detection system based on YOLOv4 to automate the detection of defective products at actual manufacturing sites. This contributes to reducing labor costs and increasing work efficiency in the field. It also contributes to manufacturing data collection and smart factory system construction by utilizing the established system.
... However, this is unrealistic in the presence of ever-growing attacks. Non-linear methods, such as support vector machines, multilayer perceptrons, and heuristic-based detections, tend to outperform traditional learning techniques [1]. Modern applications require rigorous analysis of test data to filter out the outliers and anomalies present in order to maintain the system's reliability. ...
Article
Full-text available
With the significant growth of the cyber environment over recent years, defensive mechanisms against adversaries have become an important step in maintaining online safety. The adaptive defense mechanism is an evolving approach that, when combined with nature-inspired algorithms, allows users to effectively run a series of artificial intelligence-driven tests on their customized networks to detect normal and under attack behavior of the nodes or machines attached to the network. This includes a detailed analysis of the difference in the throughput, end-to-end delay, and packet delivery ratio of the nodes before and after an attack. In this paper, we compare the behavior and fitness of the nodes when nodes under a simulated attack are altered, aiding several nature-inspired cyber security-based adaptive defense mechanism approaches and achieving clear experimental results. The simulation results show the effectiveness of the fitness of the nodes and their differences through a specially crafted metric value defined using the network performance statistics and the actual throughput difference of the attacked node before and after the attack.
... Anomaly detection is an area of broad scope that can be used in many fields of knowledge, such as fraud detection [16][17][18], image processing [19], healthcare [20,21], equipment failure detection [22,23], or consumption [9,24]. Anomaly detection problems are generally divided into three main categories [15,25]: (i) Supervised; (ii) Semi-supervised; or (iii) Unsupervised detection. Supervised detection is characterized by problems wherein data are classified, i.e., anomalies and normal data are previously known. ...
Article
Full-text available
Buildings are responsible for a high percentage of global energy consumption, and thus, the improvement of their efficiency can positively impact not only the costs to the companies they house, but also at a global level. One way to reduce that impact is to constantly monitor the consumption levels of these buildings and to quickly act when unjustified levels are detected. Currently, a variety of sensor networks can be deployed to constantly monitor many variables associated with these buildings, including distinct types of meters, air temperature, solar radiation, etc. However, as consumption is highly dependent on occupancy and environmental variables, the identification of anomalous consumption levels is a challenging task. This study focuses on the implementation of an intelligent system, capable of performing the early detection of anomalous sequences of values in consumption time series applied to distinct hotel unit meters. The development of the system was performed in several steps, which resulted in the implementation of several modules. An initial (i) Exploratory Data Analysis (EDA) phase was made to analyze the data, including the consumption datasets of electricity, water, and gas, obtained over several years. The results of the EDA were used to implement a (ii) data correction module, capable of dealing with the transmission losses and erroneous values identified during the EDA’s phase. Then, a (iii) comparative study was performed between a machine learning (ML) algorithm and a deep learning (DL) one, respectively, the isolation forest (IF) and a variational autoencoder (VAE). The study was made, taking into consideration a (iv) proposed performance metric for anomaly detection algorithms in unsupervised time series, also considering computational requirements and adaptability to different types of data. 
(v) The results show that the IF algorithm is a better solution for the presented problem, since it is easily adaptable to different sources of data, to different combinations of features, and has lower computational complexity. This allows its deployment without major computational requirements, high knowledge, and data history, whilst also being less prone to problems with missing data. As a global outcome, an architecture of a platform is proposed that encompasses the mentioned modules. The platform represents a running system, performing continuous detection and quickly alerting hotel managers about possible anomalous consumption levels, allowing them to take more timely measures to investigate and solve the associated causes.
... An another approach is to perform unsupervised clustering with machine learning algorithms. Common algorithms from this field are k nearest neighbour, Connectivity-Based Outlier Factor (COF) [17], One-Class Support Vector Machine [18] and neural networks based solutions such as the Self-organizing map (SOM) [19] and the Adaptive Resonance Theory (ART) [20]. 2. Model the normal and abnormal behaviour. ...
Conference Paper
Full-text available
Due to the popularity of cloud services it is unavoidable, that not just legitimate, but fraudulent registrations will happen. For a service with good reputation it is essential to prevent fraud users. A common way is to filter these cases during the registration process by analysts. This paper presents a novel decision support system, that can recognise anomalous behavioural patterns and classify accounts based on the available data thus implementing an automated fraud prevention system. The process uses both supervised and unsupervised approaches, thus avoiding errors due to inaccurate labeling. As a supervised machine learning algorithm random forest classifier and logistic regression, as an unsupervised auto encoder is used. The developed flow gives a recommendation to the analyst whether a new user is potentially fraud or not and provides feedback on the accuracy of analysts’ work based on the results of the unsupervised approach. The newly developed process is able to supervise the decisions made by analysts thus improving the labeling process. The main goal of this paper is to present a new, more deterministic labeling workflow with the ability to provide feedback so it can improve the correctness of the training data set.
Chapter
In this paper we propose a resumable interruption framework for robotic applications which allows to “filter” misrecognition signals after their occurrence. Handling misrecognition is essential for deploying reactive systems into the real world, since being over-reactive to detection errors can lead to livelocks and stagnation. For example, constantly interrupting and resuming a picking task due to misrecognition can make the robot alternate between pre-grasping and grasping motions, without ever achieving the task. Our solution is based on resumable interruptions, continuing interrupted procedures from the exact preemption point if similar execution requests are received shortly after a cancellation order. This acts as a post-facto misrecognition filter, which stabilizes execution and ensures task completion. Compared with standard filtering, the post-facto approach allows to deliver signals faster and recover from misrecognition longer. The proposed system is verified through real robot experiments in dynamic and static environments. KeywordsRoboticsReactive roboticsMisrecognition handlingRobot manipulationDynamic environments
Chapter
Businesses sometimes make fraudulent financial reporting by taking into account the interests of the business and applying misleading accounting practices. Fraudulent financial reporting harms investors, business owners, business employees, government, creditor financial institutions, and other businesses that have commercial relations with the business. In order to prevent such damages, enterprises are audited by independent audit institutions. However, this audit is a difficult, time-consuming and costly process that requires reviewing all business records. In particular, the process has a proportionately increasing difficulty with the increasing number of business movements subject to audit. In this process, the knowledge and experience of the audit firm and the audit team play a significant role in the detection of irregularities by the company. The audit firm directs the audit process and completes the report in line with the findings obtained from the sample by sampling the business records based on its own knowledge and experience. In fact, the audit firm also accepts the audit risk with the report it prepares. In this study, it is aimed to develop a fuzzy logic based application that will facilitate the audit process of the audit firm and assist the auditor in detecting fraudulent transactions, and the success rate was achieved as 92% using the Fuzzy C-Means (FCM) method in cases of an anomaly by analyzing past data. KeywordsFraudFraud detectionCreative accountingFuzzy logic
Article
Problem. Video surveillance is the monitoring of various objects by means of video cameras, that is, optical-electronic and microprocessor devices designed for visual control of the environment, with the aim of protecting the life, activity and property of people. Monitored processes and objects include, for example, cars moving at an intersection, on a street or on a country road, a road surface whose condition and quality are being checked, or the security system of an infrastructure object. Goal. The purpose of the study is to analyze the technical composition of systems for detecting anomalies in video from surveillance cameras and to give a comparative review of computational methods for processing the results of this observation. To achieve this goal, it is necessary to survey the literature, that is, articles in scientific journals, conference reports, articles on non-thematic web portals, monographs and textbooks whose titles suggest that they contain information useful for this research. Methodology. As part of the research task, we are interested in the technologies, systems and methods that have been proposed and developed for obtaining, processing and analyzing video sequences and images, including machine vision tasks, image classification, object and anomaly detection, image segmentation, etc. Results. As a result of this research, the following was done: 1) An overview of the main modern systems for detecting anomalies in video sequences from surveillance cameras was conducted. It was concluded that the differences between these anomaly detection systems are due to the choice of methods for processing video information. 2) An analysis of methods for detecting anomalies in video sequences from surveillance cameras was carried out. For this purpose, a classification of modern methods for detecting anomalies in video sequences was developed, and the basics of the theory of deep neural networks were considered in terms of their applicability to the classification, localization, segmentation, detection, identification and tracking of objects in video sequences from surveillance cameras. Originality. An overview of the main modern systems for detecting anomalies in video sequences from surveillance cameras was conducted. It was concluded that the differences between these systems are determined by the choice of methods for processing video information. An analysis of the methods for detecting anomalies in video sequences from surveillance cameras was carried out. Practical value. The developed information system is already used to provide students of all educational institutions of Ukraine of the III level of accreditation with information about our university, the specialties it offers and the corresponding professions, open days, preparatory courses and much more.
Article
Most current intrusion detection systems are either signature based or machine learning based. Despite the number of machine learning algorithms applied to the KDD 99 cup, none of them has introduced a pre-model to reduce the huge quantity of information present in the different KDD 99 datasets. We introduce a method that is applied to the datasets before running any of the machine learning algorithms used in the KDD 99 intrusion detection cup. This method enables us to significantly reduce the quantity of information in the datasets without loss of information. Our method is based on Principal Component Analysis (PCA). It works by projecting data elements onto a feature space, which is actually a vector space ℝ^d, that spans the significant variations among known data elements. We present the two well-known algorithms we work with, decision trees and nearest neighbor, and we show how our approach eases the decision process. We rely on experiments performed on network records from the KDD 99 dataset, first by applying these two algorithms directly to the raw data, and second after projecting the datasets onto the new feature space.
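The projection step described in this abstract can be illustrated with a minimal sketch. It assumes NumPy, synthetic two-class data standing in for KDD 99 records, and a 1-nearest-neighbor rule; the function names (`pca_fit`, `pca_project`, `nearest_neighbor_predict`) and the choice d = 2 are illustrative assumptions, not the authors' implementation. Note that in general, keeping only d components discards the variance along the dropped directions.

```python
import numpy as np

def pca_fit(X, d):
    """Return the feature mean and the top-d principal directions of X (rows = records)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:d]

def pca_project(X, mean, components):
    """Project records onto the d-dimensional feature space."""
    return (X - mean) @ components.T

def nearest_neighbor_predict(train_Z, train_y, test_Z):
    """Label each test point with the label of its closest training point."""
    # Squared Euclidean distance between every test point and every training point.
    dists = ((test_Z[:, None, :] - train_Z[None, :, :]) ** 2).sum(axis=2)
    return train_y[dists.argmin(axis=1)]

# Synthetic stand-in data (assumption): 10 features, two well-separated classes.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(50, 10))
attack = rng.normal(4.0, 1.0, size=(50, 10))
X = np.vstack([normal, attack])
y = np.array([0] * 50 + [1] * 50)

mean, components = pca_fit(X, d=2)
Z = pca_project(X, mean, components)          # 100 records, now 2 features each

queries = np.vstack([np.zeros(10), np.full(10, 4.0)])
pred = nearest_neighbor_predict(Z, y, pca_project(queries, mean, components))
```

The classifier here only ever sees the projected records `Z`, which is the point of the pre-model: the distance computation runs in d dimensions instead of the original feature count.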
Chapter
In this final chapter we consider several fuzzy algorithms that effect partitions of feature space ℝ^p, enabling classification of unlabeled (future) observations, based on the decision functions which characterize the classifier. S25 describes the general problem in terms of a canonical classifier, and briefly discusses Bayesian statistical decision theory. In S26 estimation of the parameters of a mixed multivariate normal distribution via statistical (maximum likelihood) and fuzzy (c-means) methods is illustrated. Both methods generate very similar estimates of the optimal Bayesian classifier. S27 considers the utilization of the prototypical means generated by (A11.1) for characterization of a (single) nearest prototype classifier, and compares its empirical performance to the well-known k-nearest-neighbor family of deterministic classifiers. In S28, an implicit classifier design based on Ruspini’s algorithm is discussed and exemplified.
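The idea of using c-means prototypes for a nearest prototype classifier can be sketched as follows. This is a generic textbook form of fuzzy c-means written with NumPy, not necessarily identical to the chapter's algorithm (A11.1); the fuzzifier m = 2, the fixed iteration count, the random initialization and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Generic fuzzy c-means: returns prototypes V (c x p) and memberships U (n x c)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Prototype update: membership-weighted means of the data.
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distances from every point to every prototype (floored to avoid division by zero).
        dist = np.maximum(np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2), 1e-12)
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)), rows normalized.
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return V, U

def nearest_prototype(X, V):
    """Assign each observation to the index of its closest prototype."""
    dist = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
    return dist.argmin(axis=1)

# Synthetic stand-in data (assumption): two compact clusters in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 2)),
               rng.normal(5.0, 0.5, size=(30, 2))])

V, U = fuzzy_c_means(X, c=2)
labels = nearest_prototype(X, V)   # hard labels from the fuzzy prototypes
```

Unlike a k-nearest-neighbor rule, which keeps the whole training set, the nearest prototype rule stores only the c prototype vectors, so classifying a new observation costs c distance computations.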
Article
We examine a graphical representation of uncertain knowledge called a Bayesian network. The representation is easy to construct and interpret, yet has formal probabilistic semantics making it suitable for statistical manipulation. We show how we can use the representation to learn new knowledge by combining domain knowledge with statistical data. 1 Introduction Many techniques for learning rely heavily on data. In contrast, the knowledge encoded in expert systems usually comes solely from an expert. In this paper, we examine a knowledge representation, called a Bayesian network, that lets us have the best of both worlds. Namely, the representation allows us to learn new knowledge by combining expert domain knowledge and statistical data. A Bayesian network is a graphical representation of uncertain knowledge that most people find easy to construct and interpret. In addition, the representation has formal probabilistic semantics, making it suitable for statistical manipulation (Howard,...
Article
Despite the advances made over the last 20 years, anomaly detection in network behavior is still an immature technology, as the shortage of commercial tools corroborates. Nevertheless, the benefits that could be obtained from a better understanding of the problem itself, as well as from the improvement of these mechanisms, especially in network security, justify the demand for more research efforts in this direction. This article presents a survey of current anomaly detection methods for network intrusion detection in classical wired environments. After introducing the problem and explaining its interest, a taxonomy of current solutions is presented. The outlined scheme allows us to systematically classify current detection methods as well as to study the different facets of the problem. The more relevant paradigms are subsequently discussed and illustrated through several case studies of selected systems developed in the field. The problems addressed by each of them, as well as their weakest points, are thus explained. Finally, this work concludes with an analysis of the problems that still remain open. Based on this discussion, some research lines are identified.