Access Structures for Advanced Similarity Search in Metric Spaces.
ABSTRACT Similarity retrieval is an important paradigm for searching in environments where exact match has little meaning. Moreover, in order to enlarge the set of data types for which similarity search can be performed efficiently, the notion of a mathematical metric space provides a useful abstraction for similarity. In this paper we consider the problem of organizing and searching large data sets from arbitrary metric spaces, and we discuss a novel access structure for similarity search in metric data, called D-Index. D-Index combines a novel clustering technique with the pivot-based distance searching strategy to speed up the execution of similarity range and nearest neighbor queries for large files whose objects are stored on disk. Moreover, we propose an extension of this access structure (eD-Index) which is able to deal with the problem of similarity self join. Though this approach is not able to eliminate the intrinsic quadratic complexity of similarity joins, significant performance improvements are confirmed by experiments.
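As an illustration of the pivot-based strategy mentioned in the abstract, the following is a minimal sketch of generic pivot filtering for a range query. It is not the D-Index itself: the clustering technique and disk organization are not reproduced, and the names `PivotTable`, `range_query`, and `euclidean` are hypothetical, chosen only for this example. The sketch relies solely on the triangle inequality, which holds in any metric space: if |d(q, p) - d(o, p)| > r for some pivot p, then d(q, o) > r, so the object o can be discarded without computing d(q, o).

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, used here only as an example metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class PivotTable:
    """Hypothetical helper: stores precomputed object-to-pivot distances."""

    def __init__(
        self,
        objects: Sequence[Point],
        pivots: Sequence[Point],
        dist: Callable[[Point, Point], float],
    ) -> None:
        self.objects = list(objects)
        self.pivots = list(pivots)
        self.dist = dist
        # One row per object, one column per pivot.
        self.table = [[dist(o, p) for p in self.pivots] for o in self.objects]

    def range_query(self, q: Point, radius: float) -> Tuple[List[Point], int]:
        """Return objects within `radius` of `q` and the number of
        object distances that actually had to be computed."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        result: List[Point] = []
        computed = 0
        for obj, row in zip(self.objects, self.table):
            # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o).
            # If the lower bound exceeds the radius, skip the object.
            if any(abs(dq - do) > radius for dq, do in zip(q_to_pivots, row)):
                continue
            computed += 1
            if self.dist(q, obj) <= radius:
                result.append(obj)
        return result, computed


objects = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
index = PivotTable(objects, pivots=[(0.0, 0.0)], dist=euclidean)
hits, computed = index.range_query((0.5, 0.0), radius=1.0)
```

In this toy data set the two distant objects are excluded by the pivot test alone, so only two of the four object distances are evaluated. The filter never discards a qualifying object, because it applies a lower bound on the true distance.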
Available from: Pavel Zezula, May 29, 2015
Available from: Vlastislav Dohnal