Conference Proceeding

LDC: enabling search by partial distance in a hyper-dimensional space

Shannon Lab., AT&T Labs Res., Basking Ridge, NJ, USA
DOI: 10.1109/ICDE.2004.1319980; ISBN: 0-7695-2065-0; pp. 6-17. In: Proceedings of the 20th International Conference on Data Engineering, 2004
Source: IEEE Xplore

ABSTRACT Recent advances in research fields such as multimedia and bioinformatics have brought about a new generation of hyper-dimensional databases, which can contain hundreds or even thousands of dimensions. Such databases pose significant problems for existing high-dimensional indexing techniques, which were developed for databases with (commonly) fewer than a hundred dimensions. To support efficient querying and retrieval on hyper-dimensional databases, we propose a methodology called local digital coding (LDC), which supports k-nearest neighbors (KNN) queries on hyper-dimensional databases and can co-exist with ubiquitous indices such as B+-trees. LDC extracts a simple bitmap representation, called a digital code (DC), for each point in the database. Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the DC and carrying out subsequent comparisons on that subset. In doing so, the expensive operations involved in computing L-norm distance functions between hyper-dimensional data can be avoided. Extensive experiments show that our methodology offers significant performance advantages over existing indexing methods on both real-life and synthetic hyper-dimensional datasets.
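The pruning idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify how a digital code is built or how bits are selected, so everything below is an assumption made for illustration. Here each point gets one bit per dimension recording which side of a reference point (`center`, hypothetical) it lies on, and the function names `digital_code`, `lower_bound_sq`, and `knn` are likewise hypothetical. When a point and the query lie on opposite sides of the center in some dimension, they are at least as far apart in that dimension as the query is from the center, which yields a lower bound on the squared Euclidean distance from any subset of bits.

```python
import heapq
import random


def digital_code(point, center):
    """Assumed encoding: bit i is 1 if point[i] >= center[i], else 0."""
    return [1 if p >= c else 0 for p, c in zip(point, center)]


def lower_bound_sq(query_code, gaps_sq, code, dims):
    """Lower bound on the squared L2 distance, using only the bits in `dims`.

    If the point and the query are on opposite sides of the center in
    dimension i, they differ by at least |query[i] - center[i]| there.
    Dimensions outside `dims` contribute 0, so the bound stays valid.
    """
    return sum(gaps_sq[i] for i in dims if code[i] != query_code[i])


def knn(points, codes, center, query, k, num_bits):
    """Return the k nearest (squared distance, index) pairs and the number
    of full distance computations that pruning did not avoid."""
    query_code = digital_code(query, center)
    gaps_sq = [(q - c) ** 2 for q, c in zip(query, center)]
    # Pick, per query, the dimensions where the query is farthest from the
    # center: these bits give the largest possible contribution to the bound.
    dims = sorted(range(len(query)), key=lambda i: gaps_sq[i], reverse=True)
    dims = dims[:num_bits]

    heap = []  # max-heap on distance, stored as (-dist_sq, index)
    full_computations = 0
    for idx, (point, code) in enumerate(zip(points, codes)):
        if len(heap) == k:
            kth_best = -heap[0][0]
            if lower_bound_sq(query_code, gaps_sq, code, dims) >= kth_best:
                continue  # cannot beat the current k-th neighbor: skip
        full_computations += 1
        dist_sq = sum((a - b) ** 2 for a, b in zip(point, query))
        if len(heap) < k:
            heapq.heappush(heap, (-dist_sq, idx))
        elif dist_sq < -heap[0][0]:
            heapq.heapreplace(heap, (-dist_sq, idx))
    return sorted((-neg, idx) for neg, idx in heap), full_computations


if __name__ == "__main__":
    random.seed(0)
    dim = 64
    data = [[random.random() for _ in range(dim)] for _ in range(300)]
    center = [sum(p[i] for p in data) / len(data) for i in range(dim)]
    codes = [digital_code(p, center) for p in data]
    query = [random.random() for _ in range(dim)]
    neighbors, full = knn(data, codes, center, query, k=5, num_bits=16)
    print(neighbors, full)
```

Because the bound never exceeds the true distance, a skipped point can never be closer than the current k-th neighbor, so the result matches a brute-force scan (up to ties). How much work is saved depends on the data and on `num_bits`; the sketch makes no claim about the gains reported in the paper.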

Keywords

digital code (DC)
existing indexing methods
expensive operations
high-dimensional indexing techniques
hundred dimensions
hyper-dimensional data
hyper-dimensional databases
indexing databases
L-norm distance functions
local digital coding
real life
Recent advances
research fields
significant performance advantages
significant problems
simple bitmap representation
subsequent comparisons
support efficient querying
synthetic hyper-dimensional datasets
ubiquitous indices

N. Koudas