Conference Paper

Similarity Grid for Searching in Metric Spaces.

DOI: 10.1007/11549819_3
Conference: Peer-to-Peer, Grid, and Service-Orientation in Digital Library Architectures, 6th Thematic Workshop of the EU Network of Excellence DELOS, Cagliari, Italy, June 24-25, 2004, Revised Selected Papers
Source: DBLP

ABSTRACT Similarity search in metric spaces is an important paradigm for content-based retrieval in many applications. Existing
centralized search structures can speed up retrieval, but they do not scale to large volumes of data because their response
time increases linearly with the size of the searched file. The proposed GHT* index is a scalable, distributed structure.
By exploiting parallelism in a dynamic network of computers, the GHT* achieves practically constant search time for similarity
range queries in data sets of arbitrary size. The structure also scales well with respect to the growing volume of retrieved
data. Moreover, the small amount of routing information replicated on each server grows logarithmically. At the same time,
the potential for interquery parallelism increases as the data sets grow, because the relative number of servers used
by individual queries decreases. All these properties are verified by experiments on a prototype system using real-life
data sets.

  • ABSTRACT: Metric space is a universal and versatile model of similarity that can be applied in various areas of non-text information retrieval. However, a general, efficient, and scalable solution for metric data management remains an open research challenge. In this work, we try to take an important step towards a management system that can scale to data collections of billions of objects. We propose a distributed index structure for similarity data management, called the Metric Index (M-Index), which can answer queries in either a precise or an approximate manner. This technique can take advantage of any distributed hash table that supports interval queries and use it as an underlying index. We have performed numerous experiments to test various settings of the M-Index structure, and we have demonstrated its usability by developing a full-featured, publicly available Web application.
    Information Processing & Management 01/2012; 48(5):855-872. · 0.82 Impact Factor
  • ABSTRACT: We propose a novel approach to the approximate nearest neighbor search problem in arbitrary metric spaces. The distinctive feature of our approach is that it incrementally builds a non-hierarchical distributed structure for given metric space data, with complexity that scales logarithmically with the size of the structure and with probabilistic nearest neighbor queries of adjustable accuracy. The structure is based on a small world graph whose vertices correspond to the stored elements and whose edges are links between them, with the greedy algorithm as the base search algorithm. Both the search and the addition algorithms require only local information from the structure. Simulations on data in Euclidean space show that the structure built with the proposed algorithm has navigable small world properties, with logarithmic search complexity at fixed accuracy and weak (power-law) scalability with the dimensionality of the stored data.
    Proceedings of the 5th international conference on Similarity Search and Applications; 08/2012
  • ABSTRACT: In general metric spaces, one of the most widely used indexing techniques is to partition the objects using pivot elements. The efficiency of the partitioning depends on the selection of an appropriate set of pivot elements. This paper presents several methods for improving the quality of partitioning in the GHT structure with respect to the balancing factor. The main goal of the investigation is to determine the conditions under which the cost of distance computations can be reduced. Tests show that the proposed methods work better than the usual random and incremental pivot search methods.
    Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition; 07/2012
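
The pivot-based partitioning and range queries mentioned in the abstracts above can be illustrated with a minimal Python sketch. It assumes the standard generalized-hyperplane rule (each object goes to the side of its closer pivot), which none of the abstracts spells out. The function names, the `balance` measure (the smaller side's share of all objects), and the Euclidean distance are illustrative assumptions, not definitions taken from any of the listed papers.

```python
import math


def euclidean(a, b):
    # Any metric can be substituted; Euclidean distance is only an example.
    return math.dist(a, b)


def hyperplane_partition(objects, p1, p2, dist):
    # Assign each object to the side of the pivot it is closer to.
    left, right = [], []
    for o in objects:
        (left if dist(o, p1) <= dist(o, p2) else right).append(o)
    return left, right


def balance(left, right):
    # Hypothetical balance measure: 0.5 means a perfectly even split.
    total = len(left) + len(right)
    return min(len(left), len(right)) / total if total else 0.0


def range_query(q, r, p1, p2, left, right, dist):
    # Return all objects within distance r of q, skipping a side when the
    # triangle inequality shows that it cannot contain any result.
    d1, d2 = dist(q, p1), dist(q, p2)
    hits = []
    if d1 - r <= d2 + r:  # left side may contain results
        hits += [o for o in left if dist(q, o) <= r]
    if d2 - r <= d1 + r:  # right side may contain results
        hits += [o for o in right if dist(q, o) <= r]
    return hits


points = [(0, 0), (1, 0), (2, 0), (8, 0), (9, 0), (10, 0)]
p1, p2 = (0, 0), (10, 0)
left, right = hyperplane_partition(points, p1, p2, euclidean)
print(balance(left, right))
print(range_query((1, 0), 1.5, p1, p2, left, right, euclidean))
```

In this sketch, a well-balanced pair of pivots lets a query that lies far from the dividing hyperplane skip roughly half of the distance computations, which is one way to read the link between the balancing factor and the cost of distance computations.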
