Conference Paper

# One Pass Outlier Detection for Streaming Categorical Data

Conference: IDAM 2013

- Citations (10)
- Cited In (0)

**ABSTRACT:** An outlier in a dataset is an observation or a point that is considerably dissimilar to or inconsistent with the remainder of the data. Detection of such outliers is important for many applications and has recently attracted much attention in the data mining research community. In this paper, we present a new method to detect outliers by discovering frequent patterns (or frequent itemsets) from the data set. The outliers are defined as the data transactions that contain less frequent patterns in their itemsets. We define a measure called FPOF (Frequent Pattern Outlier Factor) to detect the outlier transactions and propose the FindFPOF algorithm to discover outliers. The experimental results have shown that our approach outperformed the existing methods on identifying interesting outliers.

Comput. Sci. Inf. Syst. 01/2005; 2:103-118.
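The abstract above names the FPOF measure but does not state its formula. As a rough illustration only, the sketch below assumes one plausible reading: a transaction's score is the summed support of the frequent itemsets it contains, divided by the total number of frequent itemsets, so that low scores flag likely outliers. The function names `frequent_itemsets` and `fpof_scores`, the brute-force enumeration, and the `min_support` and `max_len` parameters are all hypothetical and are not taken from the paper's FindFPOF algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=2):
    """Brute-force enumeration of itemsets (up to max_len items) whose
    support is at least min_support. Returns {frozenset: support}.
    Assumption: a real miner such as Apriori would replace this step."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # sorted so each itemset has one canonical tuple
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {
        frozenset(combo): count / n
        for combo, count in counts.items()
        if count / n >= min_support
    }


def fpof_scores(transactions, min_support=0.5, max_len=2):
    """Assumed FPOF-style score: sum of supports of the frequent itemsets
    contained in a transaction, divided by the number of frequent itemsets.
    Lower scores suggest outliers."""
    patterns = frequent_itemsets(transactions, min_support, max_len)
    if not patterns:
        return [0.0] * len(transactions)
    scores = []
    for t in transactions:
        items = set(t)
        covered = sum(sup for p, sup in patterns.items() if p <= items)
        scores.append(covered / len(patterns))
    return scores


# Toy categorical records encoded as attribute=value items (made-up data).
records = [
    {"color=red", "shape=round"},
    {"color=red", "shape=round"},
    {"color=red", "shape=round"},
    {"color=blue", "shape=square"},
]
scores = fpof_scores(records, min_support=0.5)
outlier_index = min(range(len(scores)), key=scores.__getitem__)
```

In this toy data the last record shares no frequent itemset with the others, so it receives the lowest score. The exhaustive enumeration is only practical for small `max_len`; it stands in for whatever frequent-pattern miner an actual implementation would use.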

**ABSTRACT:** The task of outlier detection is to find small groups of data objects that are exceptional when compared with the remaining large amount of data. Detection of such outliers is important for many applications such as fraud detection and customer migration. Most existing methods are designed for numeric data. They encounter problems in real-life applications that contain categorical data. In this paper, we formally define the problem of outlier detection in categorical data as an optimization problem from a global viewpoint. Moreover, we present a local-search heuristic based algorithm for efficiently finding feasible solutions. Experimental results on real datasets and large synthetic datasets demonstrate the superiority of our model and algorithm.

04/2005

##### Conference Paper: HOT: Hypergraph-Based Outlier Test for Categorical Data.

**ABSTRACT:** As a widely used data mining technique, outlier detection is a process which aims to find anomalies with good explanations. Most existing methods are designed for numeric data. However, they will meet problems in real-life applications, which always contain categorical data. In this paper, we introduce a novel outlier mining method based on a hypergraph model for categorical data. Since hypergraphs precisely capture the distribution characteristics in data subspaces, this method is effective in identifying anomalies in dense subspaces and presents good interpretations for the local outlierness. By selecting the most relevant subspaces, the problem of the "curse of dimensionality" in very large databases can also be ameliorated. Furthermore, the connectivity property is used to replace the distance metrics, so that distance-based computation is no longer needed, which enhances the robustness for handling missing-value data. The fact that connectivity computation facilitates the aggregation operations supported by most SQL-compatible database systems makes the mining process much more efficient. Finally, we give experiments and analysis which show that our method can find outliers in categorical data with good performance and quality.

Advances in Knowledge Discovery and Data Mining, 7th Pacific-Asia Conference, PAKDD 2003, Seoul, Korea, April 30 - May 2, 2003, Proceedings; 01/2003
