Conference Paper

Similarity Grid for Searching in Metric Spaces

DOI: 10.1007/11549819_3
Conference: Peer-to-Peer, Grid, and Service-Orientation in Digital Library Architectures, 6th Thematic Workshop of the EU Network of Excellence DELOS, Cagliari, Italy, June 24-25, 2004, Revised Selected Papers
Source: DBLP


Similarity search in metric spaces is an important paradigm for content-based retrieval in many applications. Existing
centralized search structures can speed up retrieval, but they do not scale to large volumes of data because their response
time increases linearly with the size of the searched file. The proposed GHT* index is a scalable, distributed structure.
By exploiting parallelism in a dynamic network of computers, the GHT* achieves practically constant search time for similarity
range queries in data sets of arbitrary size. The structure also scales well with respect to the growing volume of retrieved
data. Moreover, the small amount of routing information replicated on each server grows only logarithmically. At the same time,
the potential for interquery parallelism increases as data sets grow, because the relative number of servers utilized
by individual queries decreases. All these properties are verified by experiments on a prototype system using real-life
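
The abstract does not spell out how a similarity range query is evaluated. As a rough single-machine illustration only, the sketch below assumes generalized hyperplane partitioning (which the name GHT suggests): each inner node holds two pivots, objects go to the side of the closer pivot, and the triangle inequality decides which sides a query ball can reach. The names `build` and `range_query`, the naive pivot choice, and the use of Euclidean distance as the metric are all assumptions made for illustration; none of the distribution, routing, or replication of the GHT* is modeled.

```python
from math import dist  # Euclidean distance stands in for an arbitrary metric


def build(objects, leaf_size=2):
    """Recursively partition objects by two pivots (hypothetical helper)."""
    if len(objects) <= leaf_size:
        return {"bucket": list(objects)}
    p1, p2 = objects[0], objects[1]  # naive pivot choice, illustration only
    rest = objects[2:]
    left = [o for o in rest if dist(o, p1) <= dist(o, p2)]
    right = [o for o in rest if dist(o, p1) > dist(o, p2)]
    return {"p1": p1, "p2": p2,
            "left": build(left, leaf_size),
            "right": build(right, leaf_size)}


def range_query(node, q, r):
    """Return every stored object within distance r of q."""
    if "bucket" in node:
        return [o for o in node["bucket"] if dist(o, q) <= r]
    d1, d2 = dist(q, node["p1"]), dist(q, node["p2"])
    # The pivots are stored objects too, so test them directly.
    hits = [p for p, d in ((node["p1"], d1), (node["p2"], d2)) if d <= r]
    # By the triangle inequality, the left side can hold a match
    # only if d1 - r <= d2 + r; symmetrically for the right side.
    if d1 - r <= d2 + r:
        hits += range_query(node["left"], q, r)
    if d1 + r >= d2 - r:
        hits += range_query(node["right"], q, r)
    return hits
```

For a small radius, usually only one of the two conditions holds at a node, so a whole subtree is skipped. In a distributed setting, that would presumably correspond to servers that a query never needs to contact.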



Available from: Pavel Zezula
  • Source
    • "However, if the data distribution changes, these systems must be reorganized. In [26] each peer maintains an instance of an Address Search Tree, which reduces the system scalability. "
    ABSTRACT: As one of the most important technologies for implementing large-scale distributed systems, peer-to-peer (P2P) computing has attracted much attention in both research and industrial communities for its advantages, such as high availability, high performance, and high flexibility with respect to network dynamics. However, multidimensional data indexing remains a big challenge for P2P computing because of the inefficiency in search and network maintenance caused by complicated existing index structures, which greatly limits the scalability of applications and the dimensionality of the data to be indexed. We propose SDI (Swift tree structure for multidimensional Data Indexing), a swift index scheme with a simple tree structure for multidimensional data indexing in large-scale distributed systems. While keeping query efficiency at O(log N) in terms of routing hops, SDI has extremely low maintenance costs, which is shown through theoretical analysis. Furthermore, SDI overcomes the root-bottleneck problem found in most other tree-based distributed indexing systems. An extensive empirical study verifies the superiority of SDI in both query and maintenance performance.
    Full-text · Article · Jan 2009 · Future Generation Computer Systems
  • Source
    • "Though there are many techniques proposed in literature [8] [12] [4], only the distributed versions can successfully deal with the scalability problem. For a metric space, there are several techniques [4] [10] [18] [5] able to deliver answers to similarity queries in P2P environment. Basically, they all follow the schema depicted by Figure 1. "
    ABSTRACT: Current information systems are required to process complex digital objects, which are typically characterized by multiple descriptors. Since the values of many descriptors belong to non-sortable domains, they are effectively comparable only by some measure of similarity. Moreover, scalability is very important in the current digital-explosion age. Therefore, we propose a distributed extension of the well-known threshold algorithm for the peer-to-peer paradigm. The technique makes it possible to answer similarity queries that combine multiple similarity measures, and due to its peer-to-peer nature it is highly scalable. We also explore possibilities of approximate evaluation strategies, where some relevant results can be lost in favor of increasing the efficiency by an order of magnitude. To reveal the strengths and weaknesses of our approach, we have experimented with a 1.6 million image database from Flickr, comparing the content of the images by five similarity measures from the MPEG-7 standard. To the best of our knowledge, the experience with such a huge real-life dataset is quite unique.
    Full-text · Conference Paper · May 2008
  • Source
    • "Most of the recent effort in the field of distributed data structures has focused on the vector-based approach [20] [8] [12] [2] [6] [1]. As far as we know, the only metric-based distributed structure published are the GHT index [4] [3] and, very recently, MCAN [11]. "
    ABSTRACT: The need for retrieval based not on attribute values but on the data content itself has recently led to the rise of metric-based similarity search. The computational complexity of such retrieval and the large volumes of processed data call for distributed processing, which makes scalability achievable. In this paper, we propose M-Chord, a distributed data structure for metric-based similarity search. The structure takes advantage of the idea of the vector index method iDistance in order to transform the issue of similarity searching into the problem of interval search in one dimension. The proposed peer-to-peer organization, based on the Chord protocol, distributes the storage space and parallelizes the execution of similarity queries. Promising features of the structure are validated by experiments on the prototype implementation and two real-life datasets.
    Full-text · Conference Paper · Jan 2006
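
The M-Chord abstract above mentions turning similarity search into interval search in one dimension. A minimal single-machine sketch of an iDistance-style mapping follows: each object is assigned to its closest pivot and gets the key `i * C + d(pivot_i, object)`, so a range query becomes one key interval per pivot. The constant `C`, the function names, the fixed pivots, and the Euclidean metric are assumptions for illustration. A sorted list stands in for the key space that M-Chord would distribute over Chord peers, and this is not the paper's actual implementation.

```python
from bisect import bisect_left, bisect_right
from math import dist  # Euclidean distance stands in for an arbitrary metric

C = 1000.0  # assumed separation constant; must exceed every possible distance


def idistance_key(o, pivots):
    """Map an object to a one-dimensional key via its closest pivot."""
    i = min(range(len(pivots)), key=lambda j: dist(o, pivots[j]))
    return i * C + dist(o, pivots[i])


def build_index(objects, pivots):
    """Sorted (key, object) pairs; a stand-in for a distributed key space."""
    return sorted((idistance_key(o, pivots), o) for o in objects)


def interval_range_query(index, pivots, q, r):
    """Answer a range query by scanning one key interval per pivot."""
    keys = [k for k, _ in index]
    hits = []
    for i, p in enumerate(pivots):
        d = dist(q, p)
        # Objects of cluster i within r of q have pivot distance in [d - r, d + r].
        start = bisect_left(keys, i * C + max(d - r, 0.0))
        end = min(bisect_right(keys, i * C + d + r),
                  bisect_left(keys, (i + 1) * C))  # stay inside cluster i
        for _, o in index[start:end]:
            if dist(o, q) <= r:  # filter out false positives in the interval
                hits.append(o)
    return hits
```

The interval test only narrows the candidates. The final distance check is still needed because objects at a similar distance from the pivot may lie far from the query.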