Conference Proceeding

Statistical selection of relevant subspace projections for outlier ranking

Karlsruhe Inst. of Technol. (KIT), Karlsruhe, Germany
05/2011; DOI:10.1109/ICDE.2011.5767916 In proceedings of: 2011 IEEE 27th International Conference on Data Engineering (ICDE)
Source: DBLP

ABSTRACT Outlier mining is an important data analysis task that distinguishes exceptional outliers from regular objects. For outlier mining in the full data space, there are well-established methods that successfully measure the degree of deviation for outlier ranking. However, in recent applications, traditional outlier mining approaches miss outliers that are hidden in subspace projections. In particular, outlier ranking approaches that measure deviation on all available attributes miss outliers that deviate from their local neighborhood only in subsets of the attributes. In this work, we propose a novel outlier ranking based on an object's deviation in a statistically selected set of relevant subspace projections. This ensures that objects deviating in multiple relevant subspaces are found, while irrelevant projections showing no clear contrast between outliers and the residual objects are excluded. Thus, we tackle the general challenges of detecting outliers hidden in subspaces of the data. We provide a selection of subspaces with high contrast and propose a novel ranking based on an adaptive degree of deviation in arbitrary subspaces. In thorough experiments on real and synthetic data, we show that our approach outperforms competing outlier ranking approaches by detecting outliers in arbitrary subspace projections.
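
The idea of an outlier that is visible only in a subspace projection can be illustrated with a small sketch. This is not the authors' algorithm: the statistical subspace selection and the adaptive deviation measure from the paper are not reproduced. The sketch assumes a plain k-nearest-neighbour distance as the deviation score and a hand-picked subspace, and the names knn_score and rank_outliers are hypothetical.

```python
import math
import random


def knn_score(data, idx, dims, k=5):
    """Mean distance from object idx to its k nearest neighbours,
    measured only on the attributes listed in dims."""
    p = data[idx]
    dists = sorted(
        math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))
        for j, q in enumerate(data)
        if j != idx
    )
    return sum(dists[:k]) / k


def rank_outliers(data, subspaces, k=5):
    """Return object indices sorted by their average kNN score over the
    given subspaces, most deviating object first."""
    scores = [
        sum(knn_score(data, i, dims, k) for dims in subspaces) / len(subspaces)
        for i in range(len(data))
    ]
    return sorted(range(len(data)), key=lambda i: scores[i], reverse=True)


rng = random.Random(0)

# Synthetic data (an assumption for illustration): attributes 0 and 1 are
# strongly correlated, attributes 2..5 are uniform noise.
data = []
for _ in range(100):
    x = rng.random()
    data.append([x, x + rng.gauss(0, 0.02)] + [rng.random() for _ in range(4)])

# Object 100 has ordinary values on every single attribute but breaks the
# correlation between attributes 0 and 1.
data.append([0.2, 0.8] + [rng.random() for _ in range(4)])

# Hand-picked subspace (0, 1); the paper selects subspaces statistically.
subspace_ranking = rank_outliers(data, [(0, 1)])
full_ranking = rank_outliers(data, [tuple(range(6))])

print("position in subspace (0, 1):", subspace_ranking.index(100))
print("position in full space:", full_ranking.index(100))
```

In the subspace (0, 1) the planted object is far from the diagonal on which all other objects lie, so it receives the highest score. In the full space the noise attributes add to every pairwise distance, which weakens the contrast between this object and the rest.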

Keywords

approach outperforms
arbitrary subspace projections
arbitrary subspaces
clear contrast
data analysis task
detecting outliers
distinguish exceptional outliers
full data space
irrelevant projections
multiple relevant subspaces
novel outlier ranking
objects deviation
Outlier mining
outlier ranking approaches
outliers deviating
regular objects
relevant subspace projections
residual objects
subspace projections
synthetic data