SAPIR: Scalable and Distributed Image Searching
Fabrizio Falchi, Mouna Kacimi, Yosi Mass, Fausto Rabitti, Pavel Zezula§
ISTI-CNR - Pisa, Italy - Email: {fabrizio.falchi|fausto.rabitti}@isti.cnr.it
Max-Planck-Institut für Informatik - Saarbrücken, Germany - Email: mkacimi@mpi-inf.mpg.de
IBM Haifa Research Lab - Haifa, Israel - Email: yosimass@il.ibm.com
§Masaryk University - Brno, Czech Republic - Email: zezula@fi.muni.cz
Abstract—In this paper we present a scalable and distributed
system for image retrieval based on visual features and annotated
text. This system is the core of the SAPIR project. Its architecture
makes use of Peer-to-Peer networks to achieve scalability and
efficiency, allowing the management of huge amounts of data.
For the presented demo we use 10 million images and
accompanying text (tags, comments, etc.) taken from Flickr. Through
the web interface it is possible to efficiently perform
content-based similarity search, as well as traditional text search on the
metadata annotated by the Flickr community. Fast complex query
processing is also possible, combining visual features and text.
We show that the combination of content-based and text search
on a large scale can dramatically improve the capability of a
multimedia search system to answer the users' needs and that
the Peer-to-Peer based architecture can cope with the scalability
issues (the response time obtained for this demo over 10 million
images is always below 500 milliseconds).
Index Terms—Peer-to-Peer, metric spaces, distributed,
scalability, MPEG-7, similarity search.
I. INTRODUCTION
Non-text data, such as images, music, animations, and
videos, are nowadays a large component of the Web. However,
web tools for image searching (such as the ones
provided by Google, Yahoo!, or MSN Live Search) simply
index the text associated with the image. Web search is
dominated today by text-only indexes enriched by page rank
algorithms; thus, search for audio-visual content is limited
to associated text and metadata annotations.
Image indexing methods based on content-based analysis or
pattern matching (which typically analyze the characteristics
of images, i.e., features, such as colors and shapes) are usually
not exploited at all. In fact, for this kind of data the
appropriate search methods are based on similarity paradigms that
typically exploit range queries and nearest neighbor queries.
These queries are computationally more intensive than text
search, because conventional inverted indexes used for text
are not applicable for such data.
The European project SAPIR (Search on Audio-visual
content using Peer-to-peer Information Retrieval)¹ aims at
breaking this technological barrier by developing a
large-scale, distributed Peer-to-Peer infrastructure that will make it
possible to search for audio-visual content by querying the
specific characteristics (i.e., features) of the content. SAPIR’s
goal is to establish a giant Peer-to-Peer network, where users
are peers that produce audiovisual content using multiple
devices (e.g., cell phones) and service providers will use
more powerful peers that maintain indexes and provide search
capabilities.
¹ http://www.sapir.eu/
“A picture is worth a thousand words”, so using an image
taken by a cell phone to find information about, e.g., a
monument we bump into, or singing a melody as a search hint for
a full song, combined with optional metadata annotations and
user and social networking context, will provide the next level
of search capabilities and precision of retrieved results.
II. SAPIR ARCHITECTURE
Although many similarity search approaches have been
proposed, the most generic one considers the mathematical
metric space as a suitable abstraction of similarity [1]. The
simple but powerful concept of the metric space consists of a
domain of objects and a metric distance function that measures
the proximity of pairs of objects. A distance, to be a metric,
must satisfy a set of simple constraints, the most important of
which is the triangle inequality.
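As an illustration of these constraints, the following minimal sketch checks the usual metric postulates (non-negativity, identity, symmetry, triangle inequality) on a small sample of objects. The Euclidean distance on tuples and the helper name `check_metric_postulates` are assumptions made for the example; they are not taken from the SAPIR code base.

```python
import itertools
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(x: Point, y: Point) -> float:
    """Euclidean distance, used here only as an example of a metric."""
    return math.dist(x, y)


def check_metric_postulates(
    objects: Sequence[Point],
    d: Callable[[Point, Point], float],
    eps: float = 1e-9,
) -> bool:
    """Return True if d behaves like a metric on the given sample.

    Hypothetical helper: a passing check on a sample does not prove that
    d is a metric on the whole domain, it can only reveal violations.
    """
    for x, y in itertools.product(objects, repeat=2):
        if d(x, y) < 0:                        # non-negativity
            return False
        if abs(d(x, y) - d(y, x)) > eps:       # symmetry
            return False
        if (d(x, y) <= eps) != (x == y):       # identity
            return False
    for x, y, z in itertools.product(objects, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + eps:  # triangle inequality
            return False
    return True


sample = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(check_metric_postulates(sample, euclidean))
```

The squared Euclidean distance is a convenient counterexample: on the same sample it violates the triangle inequality, so the check returns False for it.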
The metric space approach has proved to be very
important for building efficient indexes for content-based
similarity searching. A survey of existing approaches for
centralized structures (e.g., M-tree) can be found in [1].
However, searching on the level of features exhibits linear
scalability with respect to the size of the searched data. The reason
is that for this kind of data the appropriate search methods
are based on similarity paradigms that typically exploit range
queries and nearest neighbor queries, which are computationally
very intensive because conventional inverted indexes used for
text are not applicable for such data.
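The two query types, and the way the triangle inequality lets an index skip distance computations, can be sketched as follows. This is a minimal illustration under stated assumptions: the Euclidean distance, the single precomputed pivot, and all function names are hypothetical and do not reproduce any of the index structures surveyed in [1].

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def range_scan(data: Sequence[Point], q: Point, r: float) -> List[Point]:
    """Range query by linear scan: one distance computation per object."""
    return [o for o in data if math.dist(q, o) <= r]


def knn_scan(data: Sequence[Point], q: Point, k: int) -> List[Point]:
    """k-nearest-neighbor query by linear scan."""
    return heapq.nsmallest(k, data, key=lambda o: math.dist(q, o))


def range_with_pivot(
    data: Sequence[Point],
    pivot_dists: Sequence[float],  # pivot_dists[i] = d(data[i], pivot), precomputed
    pivot: Point,
    q: Point,
    r: float,
) -> Tuple[List[Point], int]:
    """Range query that discards objects using the triangle inequality.

    Since |d(q, p) - d(o, p)| <= d(q, o), an object whose lower bound
    exceeds r cannot be in the result and its distance is never computed.
    Returns the result and the number of distance computations performed.
    """
    dqp = math.dist(q, pivot)
    computed = 1
    result = []
    for o, dop in zip(data, pivot_dists):
        if abs(dqp - dop) > r:
            continue
        computed += 1
        if math.dist(q, o) <= r:
            result.append(o)
    return result, computed


data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (2.0, 2.0), (9.0, 9.0)]
pivot = (10.0, 10.0)
pivot_dists = [math.dist(o, pivot) for o in data]
query = (0.0, 0.0)

print(range_scan(data, query, 1.5))
print(knn_scan(data, query, 2))
print(range_with_pivot(data, pivot_dists, pivot, query, 1.5))
```

The linear scan always computes one distance per stored object, which is the linear scalability mentioned above; the pivot variant returns the same answer while computing fewer distances on this sample.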
Very recently, scalable and distributed index structures based
on Peer-to-Peer networks have also been proposed for similarity
searching in metric spaces and are used in the context of
the SAPIR project, namely GHT* [2], VPT* [3], MCAN [4], and
M-Chord [5] (see [3]). These index structures have been proved
to provide scalability for similarity search by adding resources
as the dataset grows. Peer-to-Peer architectures are a convenient
approach, and a common characteristic of all these existing
approaches is the autonomy of the peers, with no need for
central coordination or flooding strategies. Since there are no
bottlenecks, the structures are scalable, and high performance is
achieved through parallel query execution on individual peers.
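The idea of parallel query execution on individual peers can be sketched as below. This is only an illustration: peers are modeled as in-memory partitions, threads stand in for network calls, and the set of peers to contact is assumed to have already been determined by the routing of the index structure, which is not shown. None of the names come from GHT*, VPT*, MCAN or M-Chord.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Scored = Tuple[float, Point]


def local_knn(partition: Sequence[Point], q: Point, k: int) -> List[Scored]:
    """Each peer answers the k-NN query over its own data only."""
    return heapq.nsmallest(k, ((math.dist(q, o), o) for o in partition))


def distributed_knn(peers: Sequence[Sequence[Point]], q: Point, k: int) -> List[Scored]:
    """Run the query on all contacted peers in parallel and merge the answers.

    Assumption: `peers` holds only the peers selected by the (omitted)
    routing step; threads simulate independent peers.
    """
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        partial_results = list(pool.map(lambda p: local_knn(p, q, k), peers))
    # The global top-k is contained in the union of the local top-k lists.
    return heapq.nsmallest(k, (pair for part in partial_results for pair in part))


peers = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(1.0, 0.0), (9.0, 9.0)],
    [(2.0, 2.0)],
]
print(distributed_knn(peers, (0.0, 0.0), 2))
```

Because every peer returns at most k candidates, the merge step handles a small amount of data regardless of how large the individual partitions are.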
In SAPIR, text will also be indexed using a Peer-to-Peer
architecture called MINERVA [6]. In MINERVA each peer is
considered autonomous and has its own local search engine
with a crawler and a local index. By posting meta-information
into the Peer-to-Peer network, the peers share their local
indexes. This meta-information contains compact statistics and
quality-of-service information, and effectively forms a global
directory. The Peer-to-Peer engine uses the global directory
to identify candidate peers that are most likely to provide
good query results. More information about MINERVA can
be found in [6].
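The role of the global directory can be illustrated with a toy model. In this sketch the posted meta-information is reduced to per-term document counts and peers are ranked by the sum of these counts over the query terms; both the `GlobalDirectory` class and this scoring rule are assumptions for illustration and are not MINERVA's actual statistics or ranking, which are described in [6].

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class GlobalDirectory:
    """Toy directory: term -> {peer id -> number of local documents}."""

    def __init__(self) -> None:
        self._stats: Dict[str, Dict[str, int]] = defaultdict(dict)

    def post(self, peer_id: str, term_counts: Dict[str, int]) -> None:
        """A peer publishes compact statistics about its local index."""
        for term, count in term_counts.items():
            self._stats[term][peer_id] = count

    def select_peers(self, query_terms: Iterable[str], max_peers: int) -> List[str]:
        """Rank peers by summed document counts (hypothetical scoring rule)."""
        scores: Dict[str, int] = defaultdict(int)
        for term in query_terms:
            for peer_id, count in self._stats.get(term, {}).items():
                scores[peer_id] += count
        ranked = sorted(scores, key=lambda p: (-scores[p], p))
        return ranked[:max_peers]


directory = GlobalDirectory()
directory.post("peer-1", {"sunset": 120, "beach": 30})
directory.post("peer-2", {"sunset": 10, "tower": 200})
directory.post("peer-3", {"beach": 90})

print(directory.select_peers(["sunset", "beach"], max_peers=2))
```

Only the selected peers would then receive the query, so the local indexes themselves never need to be centralized.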
An IR-style query language for multimedia content-based
retrieval has been developed for SAPIR. It exploits the XML
representation of MPEG-7 and is an extension of the XML
Fragments query language, which was originally designed as a
Query-By-Example language for text-only XML collections. Detailed
information can be found in [7].
In SAPIR it is also possible to perform complex similarity
search combining result lists obtained using distinct features,
GPS information and text. To this aim, state-of-the-art
algorithms for combining results are used (e.g., [8]). For
multi-feature indexing SAPIR makes use of MUFIN (Multi-feature
Indexing Network), which is built over the MESSIF²
(Metric Similarity Search Implementation Framework)
architecture [9].
The web user interface used for this demo is derived from
the one we developed for a photo album application built upon
MILOS [10] (a centralized Multimedia Content Management
System).
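One well-known algorithm from [8] for combining ranked result lists is the Threshold Algorithm. The sketch below is a simplified version under stated assumptions: each list maps an object identifier to a score where higher is better, objects missing from a list score 0 there, and the aggregation function is monotone (a plain sum by default). The text above does not say which algorithm of [8] SAPIR uses or how scores are normalized, so the function name and data layout are hypothetical.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def threshold_top_k(
    scores: Sequence[Dict[str, float]],
    k: int,
    aggregate: Callable[[Iterable[float]], float] = sum,
) -> List[Tuple[str, float]]:
    """Simplified Threshold Algorithm over several scored result lists."""
    # Sorted access: each list ordered by decreasing score.
    sorted_lists = [sorted(s.items(), key=lambda kv: -kv[1]) for s in scores]
    seen: Dict[str, float] = {}
    max_depth = max(len(lst) for lst in sorted_lists)

    for depth in range(max_depth):
        last_seen = []
        for lst in sorted_lists:
            if depth < len(lst):
                obj, score = lst[depth]
                last_seen.append(score)
                if obj not in seen:
                    # Random access: fetch the object's score in every list.
                    seen[obj] = aggregate(s.get(obj, 0.0) for s in scores)
            else:
                last_seen.append(0.0)  # exhausted list: unseen objects score 0
        # No unseen object can have an aggregate score above this threshold.
        threshold = aggregate(last_seen)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) >= k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


visual = {"a": 0.9, "b": 0.8, "c": 0.1}  # e.g. similarity on a visual feature
text = {"b": 0.9, "c": 0.7, "a": 0.2}    # e.g. text relevance
print(threshold_top_k([visual, text], k=2))
```

The early-termination test is what makes the approach attractive for combined queries: the lists are read only as deep as needed to guarantee the top k.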
III. DATASET
For the presented demo the dataset consists of 10 million
images taken from Flickr³. Each image has metadata
information annotated by the user community (e.g., tags, location,
comments, etc.).
To perform content-based image retrieval we make use of 5
MPEG-7 Visual Descriptors: ScalableColor, ColorStructure,
DominantColor, EdgeHistogram and HomogeneousTexture.
The extraction of these features typically
requires 1 to 2 seconds per image on a current standard PC.
Thus, to process tens of millions of images a distributed
environment was required. We decided to use a Grid infrastructure. In
particular we used the DILIGENT⁴ (A testbed DIgital Library
Infrastructure on Grid ENabled Technology) project, which
delivers a Grid production infrastructure shared by a large
number of European organisations on the EGEE⁵ (Enabling
Grids for E-sciencE) project.
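The need for a distributed environment follows from simple arithmetic on the extraction time quoted above. The sketch below computes how long a single PC would need for the 10 million images of the demo, and how that changes with a number of parallel workers; the worker count of 100 and the assumption of perfect speed-up are illustrative only and do not describe the actual grid.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def single_pc_days(num_images: int, seconds_per_image: float) -> float:
    """Days of sequential processing on one machine."""
    return num_images * seconds_per_image / SECONDS_PER_DAY


def parallel_days(num_images: int, seconds_per_image: float, workers: int) -> float:
    """Days needed with `workers` machines, assuming ideal speed-up."""
    return single_pc_days(num_images, seconds_per_image) / workers


demo_images = 10_000_000
for seconds in (1.0, 2.0):
    sequential = single_pc_days(demo_images, seconds)
    distributed = parallel_days(demo_images, seconds, workers=100)  # hypothetical
    print(f"{seconds:.0f} s/image: {sequential:.0f} days on one PC, "
          f"{distributed:.1f} days on 100 workers")
```

At 1 to 2 seconds per image, the demo collection alone would keep a single PC busy for roughly four to eight months.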
Overall, 44 thousand jobs were successfully executed on
the grid, processing around 37 million images. This generated
approximately 112 million text and image objects (4.55 TB
of data) that contain more than 150 million extracted features.
The target for the project is 100 million images. More detailed
information (including job distribution per site) can be found
in [11].
It is so far the largest test bed of multimedia content (it
will grow up to 100 million Flickr images), available not
only inside the SAPIR project but also, in the near future, to the
research community.
² http://lsd.fi.muni.cz/trac/messif
³ http://www.flickr.com
⁴ http://www.diligentproject.org
⁵ http://www.eu-egee.org
ACKNOWLEDGMENTS
This work has been partially supported by the SAPIR
(Search In Audio Visual Content Using Peer-to-Peer IR)
project, funded by the European Commission under IST FP6
(Sixth Framework Programme, Contract no. 45128).
The development and preparation of the demo has involved
a large number of people from several SAPIR project partners.
In particular we would like to mention: Michal Shmueli-
Scheuer and Benjamin Sznajder from IBM Haifa Research
Lab; Paolo Bolettieri, Claudio Gennaro, Claudio Lucchese,
Matteo Mordacchini, Raffaele Perego and Tommaso Piccioli
from ISTI-CNR; Michal Batko, Vlastislav Dohnal, David
Novak and Jan Sedmidubsky from Masaryk University; Tom
Crecelius from Max-Planck-Institut für Informatik.
REFERENCES
[1] P. Zezula, G. Amato, V. Dohnal, and M. Batko, Similarity Search. The
Metric Space Approach, ser. Advances in Database Systems. Springer
Science + Business Media, Inc., 2006, vol. 32.
[2] M. Batko, C. Gennaro, and P. Zezula, “Similarity grid for searching in
metric spaces.” in Peer-to-Peer, Grid, and Service-Orientation in Digital
Library Architectures. 6th Thematic Workshop of the EU Network of
Excellence DELOS, Revised Selected Papers, ser. LNCS, vol. 3664.
Springer-Verlag Berlin Heidelberg, 2004, pp. 25–44.
[3] M. Batko, D. Novak, F. Falchi, and P. Zezula, “On scalability of the
similarity search in the world of peers,” in InfoScale ’06: Proceedings of
the 1st international conference on Scalable information systems. New
York, NY, USA: ACM Press, 2006, p. 20.
[4] F. Falchi, C. Gennaro, and P. Zezula, “A content-addressable network
for similarity search in metric spaces,” in DBISP2P ’05: Proceedings of
the 2nd International Workshop on Databases, Information Systems
and Peer-to-Peer Computing, Trondheim, Norway, ser. Lecture Notes in
Computer Science, vol. 4125. Springer, 2005, pp. 98–110.
[5] D. Novak and P. Zezula, “M-chord: a scalable distributed similarity
search structure,” in InfoScale ’06: Proceedings of the 1st international
conference on Scalable information systems. New York, NY, USA:
ACM Press, 2006, p. 19.
[6] M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer,
“MINERVA: Collaborative P2P Search,” in VLDB ’05: Proceedings of
the 31st international conference on Very large data bases. VLDB
Endowment, 2005, pp. 1263–1266.
[7] J. Mamou, Y. Mass, M. Shmueli-Scheuer, and B. Sznajder, “Query
language for multimedia content,” in Proceedings of the Multimedia
Information Retrieval workshop held in conjunction with ACM SIGIR
Conference - 27 July 2007, Amsterdam, 2007.
[8] R. Fagin, A. Lotem, and M. Naor, “Optimal Aggregation Algorithms
for Middleware,” CoRR, vol. cs.DB/0204046, 2002. [Online]. Available:
http://arxiv.org/abs/cs.DB/0204046
[9] M. Batko, D. Novak, and P. Zezula, “Messif: Metric similarity search
implementation framework,” in DELOS Conference 2007: Working
Notes, Pisa, 13-14 February 2007. Information Society Technologies,
2007, pp. 11–23.
[10] G. Amato, P. Bolettieri, F. Debole, F. Falchi, F. Rabitti, and P. Savino,
“Using milos to build a multimedia digital library application: The
photobook experience.” in Research and Advanced Technology for
Digital Libraries, ECDL 2006, ser. LNCS, vol. 4172. Springer-Verlag
Berlin Heidelberg, 2006, pp. 379–390.
[11] “Diligent data challenges,” 2007, [Online;
accessed 12-October-2007]. [Online]. Available:
https://twiki.cern.ch/twiki/bin/view/DILIGENT/DiligentFlickrDC