Adaptive Committees of Feature-specific Classifiers for
Image Classification
Tiziano Fagni, Fabrizio Falchi and Fabrizio Sebastiani
Istituto di Scienza e Tecnologia dell’Informazione
Consiglio Nazionale delle Ricerche
Via Giuseppe Moruzzi 1 – 56124 Pisa, Italy
Abstract. We present a system for image classification based on an adaptive
committee of five classifiers, each specialized in classifying images based on a
single MPEG-7 feature. We test four different ways to set up such a committee,
and obtain substantial accuracy improvements with respect to a baseline in which
a single classifier, working on all five features at the same time, is employed.
1 Introduction
An automated classification system is normally specified by two essential
components. The first is a scheme for internally representing the data items that are the
objects of classification; this representation scheme, which is usually vectorial in nature,
must be such that a suitable notion of similarity (or closeness) between the representations
of two data items can be defined. Here, “suitable” means that similar representations
must be attributed to data items that are perceived to be similar. If so, a classifier
may identify, within the space of all the representations of the data items, a limited region
of space where the objects belonging to a given class lie; here, the assumption of
course is that data items that belong to the same class are “similar”. The second component
is a learning device that takes as input the representations of training data items
and generates a classifier from them.
In this work we address single-label image classification, i.e., the problem of setting
up an automated system that classifies an image into exactly one from a predefined set
of classes. Image classification has a long history (see e.g., [1]), most of which has
produced systems that conform to the pattern described at the beginning of this section.
In this paper we take a detour from this tradition, and describe an image classifi-
cation system that makes use not of a single representation, but of five different ones
for the same data item; these representations are based on five different descriptors, or
“features”, from the MPEG-7 standard, each analyzing an image from a different point
of view. As a learning device we use a “committee” of five feature-specific classifiers,
i.e., an appropriately combined set of classifiers each based on the representation of the
image specific to a single MPEG-7 feature. The committees that we use are adaptive,
in the sense that, for each image to be classified, they dynamically decide which among
the five classifiers should be entrusted with the classification decision, or decide whose
decisions should be trusted more. We study experimentally four different techniques of
combining the decisions of the five individual classifiers, using a dataset consisting of
photographs of stone slabs classified into different types of stone.
As a technique for generating the individual members of the classifier committee
we use distance-weighted k nearest neighbours, a well-known example-based learning
technique. Technically, this method does not require a vectorial representation of
data items to be defined, since it simply requires that, given two data items, a distance
between them is defined. In the discussion that follows this will allow us to abstract
away from the details of the representation specified by the MPEG-7 standard, and sim-
ply specify our methods in terms of distance functions between data items. This is not
problematic, since distance functions both for the individual MPEG-7 features and for
the image as a whole have already been studied and defined in the literature.
Since distance computation is so fundamental to our methods, we have also studied
how to compute distances between data items efficiently, and have implemented an
efficient system that makes use of metric data structures explicitly devised for “nearest
neighbour search”.
The rest of the paper is organized as follows. Section 2 describes in detail the learning
algorithms, while Section 3 discusses how we have implemented these
learning algorithms efficiently by resorting to metric data structures. In Section 4 we
describe our experiments and discuss the conclusions that can be drawn from them.
2 Automatic Image Classification by means of Adaptive,
Feature-specific Committees
Given a set of documents $D$ and a predefined set of classes (also known as labels, or
categories) $C = \{c_1, \ldots, c_m\}$, single-label (aka 1-of-$m$, or multiclass) document
classification (SLC) is the task of automatically building a single-label document classifier,
i.e., a function $\hat{\Phi}$ that predicts, for any $d_i \in D$, the correct class $c_j \in C$ to which $d_i$
belongs. More formally, the task is that of approximating, or estimating, an unknown
target function $\Phi : D \to C$, that describes how documents ought to be classified, by
means of a function $\hat{\Phi} : D \to C$, called the classifier, such that $\Phi$ and $\hat{\Phi}$ “coincide as
much as possible”¹.
The solutions we will give to this task will be based on automatically generating
the classifiers $\hat{\Phi}$ by supervised learning. This will require as input a set of documents
that are manually labelled according to the classes in $C$, i.e., such that for each
document $d_i$ in this set the value of the function $\Phi(d_i)$ is known. In the experiments we
present in Section 4 this set will be partitioned into two subsets $Tr$ (the training set)
and $Te$ (the test set), with $Tr \cap Te = \emptyset$; $Tr$ will be used in order to generate the
classifiers $\hat{\Phi}$ by means of supervised learning methods, while $Te$ will be used in order
to test the effectiveness (i.e., accuracy) of the generated classifiers.
1Consistently with most mathematical literature we use the caret symbol (ˆ) to indicate estima-
2.1 Image Classifiers as Committees of Single-feature Classifiers
The image classifier $\hat{\Phi} : D \to C$ that we will generate will actually consist of a classifier
committee (aka classifier ensemble), i.e., of a tuple $\hat{\Phi} = (\hat{\Phi}_1, \ldots, \hat{\Phi}_n)$ of classifiers,
where each classifier $\hat{\Phi}_s$ is specialized in analyzing the image from the point of view
of a single feature $f_s \in F$, where $F$ is a set of image features. For instance, a classifier
$\hat{\Phi}_{colour}$ will be set up that classifies the image only according to its distribution of
colours, and a further classifier $\hat{\Phi}_{texture}$ will be set up that classifies the image according
to texture considerations. As image features we will use five visual “descriptors” as
defined in the MPEG-7 standard², each of them characterizing a particular visual aspect
of the image. These five descriptors are Colour Layout (CL – information about the spatial
layout of colour images), Colour Structure (CS – information about colour content
and its spatial arrangement), Edge Histogram (EH – information about the spatial distribution
of five types of edges), Homogeneous Texture (HT – texture-related properties of
the image), and Scalable Colour (SC – a colour histogram in the HSV colour space)³.
The “aggregate” classifier $\hat{\Phi}$ takes its classification decision by combining the decisions
returned by the feature-specific classifiers $\hat{\Phi}_s$ by means of an adaptive combination
rule, i.e., a combination rule that pays particular attention to those $\hat{\Phi}_s$'s that are
expected to perform more accurately on the particular image that needs to be classified.
This is advantageous, since different features could be the most revealing for classifying
different types of images; e.g., for correctly recognizing that an image belongs to
class $c'$ the Homogeneous Texture feature might be more important than Colour Layout,
while the contrary might happen for class $c''$. In the techniques that we have used in this
work, whether and how much a given feature is effective for classifying a given document
is automatically detected, and automatically brought to bear in the classification
decision.
For implementing the classifier committee, i.e., for combining appropriately the
outputs of the $\hat{\Phi}_s$'s, we will experiment with four different techniques. In the rest of
Section 2.1 we will describe these techniques, while in Section 2.2 we will describe how to
generate the individual members of these committees.
Dynamic Classifier Selection. The first technique we test is dynamic classifier selection
(DCS) [2–4]. This technique consists in
1. identifying the set
$$\chi_w(d_i) = \arg\min^{w}_{d_p \in Tr} \delta(d_i, d_p) \qquad (1)$$
of the $w$ training examples closest to the test document $d_i$, where $\delta(d', d'')$ is a
(global) measure of distance to be discussed in more detail in Section 3;
2. attributing to each feature-specific classifier $\hat{\Phi}_s$ a score $g(\hat{\Phi}_s, \chi_w(d_i))$ that measures
how well it classifies the examples in $\chi_w(d_i)$; see below for details;
3. adopting the decision of the classifier with the highest score; i.e., $\hat{\Phi}(d_i) = \hat{\Phi}_t(d_i)$,
where $\hat{\Phi}_t = \arg\max_{\hat{\Phi}_s \in \hat{\Phi}} g(\hat{\Phi}_s, \chi_w(d_i))$.
² International Organization for Standardization, Information technology - Multimedia content
description interfaces, Standard ISO/IEC 15938, 2002.
³ For definitions of these MPEG-7 visual descriptors see: International Organization for Standardization,
Information technology - Multimedia content description interfaces - Part 3: Visual,
Standard ISO/IEC 15938-3, 2002.
This technique is based on the intuition that similar documents are handled best by
similar techniques, and that we should thus trust the classifier which has proven to
behave best on documents similar to the one we need to classify.
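We compute the score from Step 2 as
$$g(\hat{\Phi}_s, \chi_w(d_i)) = \sum_{d_p \in \chi_w(d_i)} (1 - \delta(d_i, d_p)) \cdot [\hat{\Phi}_s(d_p) = \Phi(d_p)] \qquad (2)$$
where $[\alpha]$ is an indicator function, i.e.,
$$[\alpha] = \begin{cases} +1 & \text{if } \alpha = \mathit{True} \\ -1 & \text{if } \alpha = \mathit{False} \end{cases}$$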
Equation 2 thus encodes the intuition that the more examples in $\chi_w(d_i)$ are correctly
classified by $\hat{\Phi}_s$ (i.e., are such that $\hat{\Phi}_s(d_p) = \Phi(d_p)$), and the closer they are to $d_i$ (i.e.,
the lower $\delta(d_i, d_p)$ is), the better $\hat{\Phi}_s$ may be expected to behave in classifying $d_i$.
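The DCS procedure (Steps 1 to 3 and Equation 2) can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names, the representation of training examples as (item, label) pairs, and the toy one-dimensional data are assumptions made here, and distances are assumed to lie in [0, 1].

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, str]            # (item, true label); toy representation
Classifier = Callable[[float], str]    # maps an item to a predicted label
Distance = Callable[[float, float], float]


def closest_examples(item: float, training: List[Example],
                     distance: Distance, w: int) -> List[Example]:
    """Step 1 / Equation 1: the w training examples closest to the item."""
    return sorted(training, key=lambda ex: distance(item, ex[0]))[:w]


def dcs_score(classifier: Classifier, item: float,
              neighbours: List[Example], distance: Distance) -> float:
    """Equation 2: each neighbour adds (1 - distance) if the classifier
    labels it correctly and subtracts (1 - distance) otherwise."""
    total = 0.0
    for neighbour, true_label in neighbours:
        sign = 1 if classifier(neighbour) == true_label else -1
        total += (1.0 - distance(item, neighbour)) * sign
    return total


def dcs_classify(classifiers: Dict[str, Classifier], item: float,
                 training: List[Example], distance: Distance,
                 w: int) -> Tuple[str, str]:
    """Steps 2 and 3: score every classifier, then adopt the decision of
    the highest-scoring one. Returns (classifier name, decision)."""
    neighbours = closest_examples(item, training, distance, w)
    best = max(
        classifiers,
        key=lambda name: dcs_score(classifiers[name], item, neighbours, distance),
    )
    return best, classifiers[best](item)


# Toy data (hypothetical): items are numbers in [0, 1].
training = [(0.1, "a"), (0.2, "a"), (0.8, "b"), (0.9, "b")]
absolute_difference = lambda x, y: abs(x - y)
classifiers = {
    "threshold": lambda x: "a" if x < 0.5 else "b",
    "always_b": lambda x: "b",
}
print(dcs_classify(classifiers, 0.15, training, absolute_difference, w=2))
```

With this toy data the "threshold" classifier is selected for the item 0.15, since it labels both of the two closest training examples correctly while "always_b" mislabels both.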
Weighted Majority Vote. The second technique we test is weighted majority vote
(WMV), a technique similar in spirit to the “adaptive classifier combination” technique
of [3]. WMV is different from DCS in that, while DCS eventually trusts a single feature-specific
classifier (namely, the one that has proven to behave best on documents similar
to the test document), thus completely disregarding the decisions of all the other classifiers,
WMV uses a weighted majority vote of the decisions of all the feature-specific
classifiers $\hat{\Phi}_s$, with weights proportional to how well each $\hat{\Phi}_s$ has proven to behave
on documents similar to the test document. This technique is thus identical to DCS
except that Step 3 is replaced by the following two steps:
3. for each class $c_j \in C$, all evidence in favour of the fact that $c_j$ is the correct class of
$d_i$ is gathered by summing the $g(\hat{\Phi}_s, \chi_w(d_i))$ scores of the classifiers that believe
this fact to be true; i.e.,
$$z(d_i, c_j) = \sum_{\hat{\Phi}_s \,:\, \hat{\Phi}_s(d_i) = c_j} g(\hat{\Phi}_s, \chi_w(d_i)) \qquad (3)$$
4. the class that obtains the maximum $z(d_i, c_j)$ score is chosen, i.e.,
$$\hat{\Phi}(d_i) = \arg\max_{c_j \in C} z(d_i, c_j) \qquad (4)$$
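Steps 3 and 4 of WMV (Equations 3 and 4) can be illustrated with a short Python sketch. It assumes the per-classifier scores from Step 2 have already been computed; the function name, the score values and the class names below are hypothetical and chosen only to show how the vote can differ from the DCS decision.

```python
from typing import Dict, Iterable


def weighted_majority_vote(scores: Dict[str, float],
                           decisions: Dict[str, str],
                           classes: Iterable[str]) -> str:
    """Equation 3: sum, for each class, the scores of the classifiers that
    voted for it. Equation 4: return the class with the largest sum."""
    evidence = {c: 0.0 for c in classes}   # classes with no votes keep 0
    for name, predicted in decisions.items():
        evidence[predicted] += scores[name]
    return max(evidence, key=evidence.get)


# Hypothetical Step 2 scores and decisions of the five feature-specific classifiers.
scores = {"CL": 0.4, "CS": 1.2, "EH": 1.5, "HT": -0.3, "SC": 0.6}
decisions = {"CL": "marble", "CS": "marble", "EH": "granite",
             "HT": "granite", "SC": "marble"}
classes = ["granite", "marble", "slate"]

wmv_decision = weighted_majority_vote(scores, decisions, classes)
dcs_decision = decisions[max(scores, key=scores.get)]  # DCS trusts the top scorer only
print(wmv_decision, dcs_decision)
```

In this made-up case the three colour-based classifiers jointly outvote the single best-scoring classifier (EH), so WMV and DCS return different classes.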
Confidence-rated Dynamic Classifier Selection. The third technique we test is confidence-rated
dynamic classifier selection (CRDCS), a variant of DCS in which the confidence
with which a given classifier has classified a document is also taken into account.
From now on we will indeed assume that, given a test document $d_i$, a given feature-specific
classifier $\hat{\Phi}_s$ returns both a class $c_j \in C$ to which it believes $d_i$ to belong and a
numerical value $\nu(\hat{\Phi}_s, d_i)$ that represents the confidence that $\hat{\Phi}_s$ has in its decision (high
values of $\nu$ correspond to high confidence). In Section 2.2 we will see this to be true
of the feature-specific classifiers we generate in our experiments. Note also that, with
respect to the “standard” version of DCS described in Section 2.1, this “confidence-
aware” variant is more in line with the developments in computational learning theory
of the last 10 years, since confidence is closely related to the notion of “margin”, which
plays a key role in learning frameworks based on structural risk minimization, such as
kernel machines and boosting [5].
The intuition behind the use of these confidence values is that a classifier that has
made a correct decision with high confidence should be preferred to one which has
made the same correct decision but with a lower degree of confidence; and a classifier
that has taken a wrong decision with high confidence should be trusted even less than a
classifier that has taken the same wrong decision but with a lower confidence.
CRDCS is thus the same as DCS in Section 2.1, except for the computation of
the $g(\hat{\Phi}_s, \chi_w(d_i))$ score in Step 2, which now becomes confidence-sensitive. In CRDCS
Equation (2) thus becomes
$$g(\hat{\Phi}_s, \chi_w(d_i)) = \sum_{d_p \in \chi_w(d_i)} (1 - \delta(d_i, d_p)) \cdot [\hat{\Phi}_s(d_p) = \Phi(d_p)] \cdot \nu(\hat{\Phi}_s, d_p) \qquad (5)$$
Therefore, a classifier $\hat{\Phi}_s$ may be expected to perform accurately on an example $d_i$ when
many examples in $\chi_w(d_i)$ are correctly classified by $\hat{\Phi}_s$, when these are close to $d_i$, and
when these correct classifications have been reached with high confidence.
Steps 1 and 3 from Section 2.1 remain unchanged.
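The confidence-sensitive score of Equation 5 can be sketched as below. The sketch assumes that each classifier returns a (class, confidence) pair, as stated above; the function name, the toy classifiers and their confidence values are hypothetical.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, str]                              # (item, true label)
RatedClassifier = Callable[[float], Tuple[str, float]]   # item -> (class, confidence)


def confidence_rated_score(classifier: RatedClassifier, item: float,
                           neighbours: List[Example],
                           distance: Callable[[float, float], float]) -> float:
    """Equation 5: like Equation 2, but every term is also multiplied by the
    confidence the classifier had when labelling that neighbour."""
    total = 0.0
    for neighbour, true_label in neighbours:
        predicted, confidence = classifier(neighbour)
        sign = 1 if predicted == true_label else -1
        total += (1.0 - distance(item, neighbour)) * sign * confidence
    return total


# Hypothetical neighbourhood of the test item 0.15 and four toy classifiers.
neighbours = [(0.1, "a"), (0.2, "a")]
dist = lambda x, y: abs(x - y)
sure_right = lambda x: ("a", 0.9)
unsure_right = lambda x: ("a", 0.2)
unsure_wrong = lambda x: ("b", 0.2)
sure_wrong = lambda x: ("b", 0.9)

ranking = [confidence_rated_score(c, 0.15, neighbours, dist)
           for c in (sure_right, unsure_right, unsure_wrong, sure_wrong)]
print(ranking)
```

The four scores come out in decreasing order, which matches the stated intuition: a confident correct classifier is preferred to a hesitant correct one, and a confident wrong classifier is trusted least.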
Confidence-rated Weighted Majority Vote. The fourth technique we test, confidence-rated
weighted majority vote (CRWMV), stands to WMV as CRDCS stands to DCS;
that is, it consists of a version of WMV in which confidence considerations, as described for
CRDCS, are taken into account. CRWMV has thus the same form as WMV; the
only difference is that the $g(\hat{\Phi}_s, \chi_w(d_i))$ score of Step 2 is obtained through Equation
(5), which takes into account the confidence with which the $\hat{\Phi}_s$ classifiers have classified
the training examples in $\chi_w(d_i)$, instead of Equation (2), which does not. Steps 1, 3 and
4 from Section 2.1 remain unchanged.
2.2 Generating the Individual Classifiers
Each individual classifier $\hat{\Phi}_s$ (i.e., each member of the various committees described in
Section 2.1) is generated by means of the well-known (single-label, distance-weighted)
$k$ nearest neighbours ($k$-NN) technique. This technique consists in the following steps;
for a test document $d_i$,
1. (similarly to Equation 1) identify the set
$$\chi_k(d_i) = \arg\min^{k}_{d_p \in Tr} \delta_s(d_i, d_p) \qquad (6)$$
of the $k$ training examples closest to the test document $d_i$, where $\delta_s(d', d'')$ is a
distance measure between documents in which only aspects specific to feature $f_s$
are taken into consideration, and $k$ is an integer parameter;
2. for each class $c_j \in C$, gather the evidence $q(d_i, c_j)$ in favour of $c_j$ by summing the
complements of the distances between $d_i$ and the documents in $\chi_k(d_i)$ that belong
to $c_j$; i.e.,
$$q(d_i, c_j) = \sum_{d_p \in \chi_k(d_i) \,:\, \Phi(d_p) = c_j} (1 - \delta_s(d_i, d_p)) \qquad (7)$$
3. pick the class that maximizes this evidence, i.e.,
$$\hat{\Phi}_s(d_i) = \arg\max_{c_j \in C} q(d_i, c_j) \qquad (8)$$
Standard forms of distance-weighted $k$-NN do not usually output a value of confidence
in their decision. We make up for this by adding a further step to the process:
4. set the value of confidence in this decision to
$$\nu(\hat{\Phi}_s, d_i) = q(d_i, \hat{\Phi}_s(d_i)) - \frac{\sum_{c_j \neq \hat{\Phi}_s(d_i)} q(d_i, c_j)}{m - 1}$$
That is, the confidence in the decision taken is defined as the strength of evidence in
favour of the chosen class minus the average strength of evidence in favour of all the
remaining classes.
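The four steps above can be put together in a short Python sketch of a distance-weighted k-NN classifier that also returns a confidence value. It is an illustration under assumptions made here (a hypothetical function name, toy one-dimensional items, distances in [0, 1], at least two classes, and the average in the confidence taken over the m - 1 classes other than the chosen one), not the authors' code.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, str]   # (item, true label); toy representation


def knn_classify(item: float, training: List[Example],
                 distance: Callable[[float, float], float],
                 classes: List[str], k: int) -> Tuple[str, float]:
    """Return (chosen class, confidence) for a test item."""
    # Step 1: the k training examples closest to the item.
    neighbours = sorted(training, key=lambda ex: distance(item, ex[0]))[:k]
    # Step 2: per-class evidence, summing the complements of the distances.
    evidence = {c: 0.0 for c in classes}
    for neighbour, label in neighbours:
        evidence[label] += 1.0 - distance(item, neighbour)
    # Step 3: the class with the largest evidence.
    chosen = max(classes, key=lambda c: evidence[c])
    # Step 4: evidence for the chosen class minus the average evidence
    # for all the remaining classes.
    remaining = [evidence[c] for c in classes if c != chosen]
    confidence = evidence[chosen] - sum(remaining) / len(remaining)
    return chosen, confidence


# Toy data (hypothetical).
training = [(0.1, "a"), (0.2, "a"), (0.3, "b"), (0.9, "b"), (0.95, "c")]
label, confidence = knn_classify(0.15, training, lambda x, y: abs(x - y),
                                 classes=["a", "b", "c"], k=3)
print(label, round(confidence, 3))
```

For the item 0.15 the three nearest neighbours are 0.1 and 0.2 (class "a") and 0.3 (class "b"), so class "a" collects roughly 1.9 units of evidence against 0.85 for "b" and none for "c".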
Distance-weighted $k$-NN classifiers have several advantages over classifiers generated
by means of other learning methods:
- Very good effectiveness, as shown in several text classification experiments [6–9];
  this effectiveness is often due to their natural ability to deal with non-linearly
  separable classes;
- The fact that they scale extremely well (better than SVMs) to very high numbers
  of classes [9]. In fact, computing the $|Tr|$ distance scores and sorting them in
  descending order (as from Step 1) needs to be performed only once, irrespectively of
  the number $m$ of classes involved; this means that distance-weighted $k$-NN scales
  (wildly) sublinearly with the number of classes involved, while learning methods
  that generate linear classifiers scale linearly, since none of the computation needed
  for generating a single classifier $\hat{\Phi}'$ can be reused for the generation of another
  classifier $\hat{\Phi}''$, even if the same training set $Tr$ is involved.
- The fact that they are parametric in the distance function they use. This allows the
  use of distance measures customized to the specific type of data involved, which
  turns out to be extremely useful in our case.
3 Efficient Implementation of Nearest Neighbour Search by
Metric Data Structures
In order to speed up the computations of our classifiers we have focused on implementing
nearest neighbour search efficiently; this operation can be defined as
finding, within a set of objects, the $k$ objects closest to a given target object, given a
suitable notion of distance. The reason we have focused on speeding up this operation
is that
1. it accounts for most of the computation involved in classifying objects through the
$k$-NN method of Section 2.2; Step 1 of this method requires nearest neighbour
search;
2. it also accounts for most of the computation involved in combining base classifiers
through each of the four methods of Section 2.1; Step 1 of each of these four
methods also requires nearest neighbour search.
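For reference, the operation itself can be written as a plain linear scan, as in the Python sketch below. This is only the unoptimized baseline that a metric index is meant to improve on; it is not an M-tree, and the function name and toy data are assumptions made here.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def k_nearest(target: T, objects: Iterable[T],
              distance: Callable[[T, T], float], k: int) -> List[T]:
    """Return the k objects closest to the target, nearest first.
    Every call computes one distance per stored object."""
    return heapq.nsmallest(k, objects, key=lambda obj: distance(target, obj))


# Toy usage with integers and absolute difference as the distance.
closest = k_nearest(5, [1, 4, 6, 10, 7], lambda a, b: abs(a - b), k=2)
print(closest)
```

Because the scan evaluates the distance to every stored object on every query, its cost grows linearly with the size of the training set, which is the cost that index structures for metric spaces aim to reduce.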
Efficient implementation of nearest neighbour search requires data structures in secondary
storage that are explicitly devised for this task [10–12]. As such a data structure
we have used an M-tree [13]⁴, a data structure explicitly devised for speeding up nearest
neighbour search in metric spaces, i.e., sets in which a distance function is defined
between their members that is a metric⁵. We have been able to use M-trees precisely
because
- as the five feature-specific distance functions $\delta_s$ of Equation 6, we have chosen the
  distance measures recommended by the MPEG group (see [14] for details), which
  are indeed metrics;
- as the global distance function $\delta$ of Equation 1 we have chosen a linear combination
  of the previously mentioned five $\delta_s$ functions, which, having positive weights, is also a metric.
  As the linear combination weights $w_s$ we have simply adopted the weights derived
  from the study presented in [15], i.e., $w(CL) = .007$, $w(CS) = .261$, $w(EH) = .348$,
  $w(HT) = .043$, $w(SC) = .174$.
Note that, in reality, the $\delta_s$ functions from [14] that we have adopted do not range
over $[0, 1]$, but over five different intervals $[0, \alpha_s]$; in order to have them all range over
$[0, 1]$ we have multiplied all distances by the normalization weights $z(CL) = .174$,
$z(CS) = .075$, $z(EH) = .059$, $z(HT) = .020$, $z(SC) = .001$.
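The way the global distance is assembled from the five feature-specific distances can be sketched as follows. The weights and normalization factors are the ones quoted in the text; the function name and the raw distance values in the usage example are hypothetical, and the sketch assumes that each raw distance is first multiplied by its normalization factor and then by its combination weight.

```python
from typing import Dict

# Combination weights w(s) and normalization factors z(s) as quoted in the text.
WEIGHTS: Dict[str, float] = {"CL": 0.007, "CS": 0.261, "EH": 0.348,
                             "HT": 0.043, "SC": 0.174}
NORMALIZERS: Dict[str, float] = {"CL": 0.174, "CS": 0.075, "EH": 0.059,
                                 "HT": 0.020, "SC": 0.001}


def global_distance(raw_distances: Dict[str, float]) -> float:
    """Weighted sum of the normalized feature-specific distances.
    raw_distances maps each feature code to its unnormalized distance."""
    return sum(
        WEIGHTS[f] * NORMALIZERS[f] * raw_distances[f] for f in WEIGHTS
    )


# Hypothetical raw distances between two images, one per MPEG-7 feature.
raw = {"CL": 2.0, "CS": 8.0, "EH": 10.0, "HT": 25.0, "SC": 400.0}
print(global_distance(raw))
```

Since every term is a positive multiple of a metric, the resulting function keeps the metric properties that an M-tree relies on.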
4 Experiments
The dataset that we have used for our experiments (here called the Stone dataset) is
a set of 2,597 photographs of stone slabs, subdivided into 37 classes representing
different types of stone⁶. The dataset was randomly split into a training set, containing
approximately 30% of the entire dataset, and a test set, consisting of the remaining
70%. For each photograph an internal representation in terms of MPEG-7 features was
generated and stored in an M-tree.
⁴ We have used the publicly available Java implementation of M-trees developed at Masaryk
University, Brno; see
⁵ A metric is a distance function $\delta$ on a set of objects $X$ such that, for any $x_1, x_2, x_3 \in X$, it is
true that (a) $\delta(x_1, x_2) \geq 0$ (non-negativity); (b) $\delta(x_1, x_2) = 0$ if and only if $x_1 = x_2$ (identity
of indiscernibles); (c) $\delta(x_1, x_2) = \delta(x_2, x_1)$ (symmetry); (d) $\delta(x_1, x_3) \leq \delta(x_1, x_2) +
\delta(x_2, x_3)$ (triangle inequality).
⁶ The dataset was provided by the Metro S.p.A. Marmi e Graniti company (see, and was generated during their routine production process,
according to which slabs are first cut from stone blocks, and then photographed in order to
be listed in online catalogues that group together stone slabs produced by different companies.
As a measure of effectiveness we have used error rate (noted $E$), i.e., the proportion
of test documents that have been assigned to a wrong class.
As a baseline, we have used a “multi-feature” version of the distance-weighted $k$-NN
technique of Section 2.2, i.e., one in which the distance function $\delta$ mentioned at
the end of Section 3, and resulting from a linear combination of the five feature-specific
$\delta_s$ functions, is used in place of $\delta_s$ in Equation 6. For completeness we also report five
other baselines, obtained in a way similar to the one above but each using a single
feature-specific distance function $\delta_s$. In these baselines and in the experiments involving our
adaptive classifiers the $k$ parameter has been fixed to 30, since this value has proved to be
the best choice in previous experiments involving the same technique [7,8]. The $w$
parameter of the four adaptive committees has been set to 5, which is the value that
had performed best in previous experiments we had run on a different dataset. In future
experiments we plan to optimize these parameters more carefully by cross-validation.
The results of our experiments are reported in Table 1. From this table we may
notice that all four committees (2nd row, 2nd to 5th cells) bring about a reduction
of error rate of between 23.6% and 39.7% with respect to the baseline (2nd row, 1st cell).
The best performer proves to be the confidence-rated dynamic classifier selection method
of Section 2.1, with a reduction in error rate of 39.7% with respect to the baseline. This is noteworthy,
since both this method and the baseline use the same information, and only combine
it in different ways. The results also show that confidence-rated methods (CRDCS and
CRWMV) are not uniformly superior to methods (DCS and WMV) which do not make
use of confidence values. They also show that dynamic classifier selection methods
(DCS and CRDCS) are clearly superior to weighted majority voting methods (WMV
and CRWMV).
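The percentages quoted here are relative reductions in error rate. As a worked check using the values of Table 1 for the baseline ($E = 0.297$) and for CRDCS ($E = 0.179$):

```latex
\[
\frac{E_{\mathrm{baseline}} - E_{\mathrm{CRDCS}}}{E_{\mathrm{baseline}}}
  = \frac{0.297 - 0.179}{0.297}
  = \frac{0.118}{0.297}
  \approx 0.397
\]
```

The same computation applied to the other three committees yields the remaining percentages reported in the table.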
This latter result might be explained by the fact that, out of five features, three (CS,
CL, SC) are based on colour, and are thus not completely independent from each other;
if, for a given test image, colour considerations are not relevant for picking the correct
class, it may be difficult to ignore them anyway, since they are brought to bear three
times in the linear combination. In this case, DCS and CRDCS are more capable of
ignoring colour considerations, since they will likely entrust either the EH- or the HT-based
classifier with taking the final classification decision.
The same result also seems to suggest that, for any image, there tends to be a single
feature that alone is able to determine the correct class of the image, but this feature is
not always the same, and sharply differs across categories. For instance, the SC feature
is the best performer, among the single-feature classifiers (1st row), on test images
belonging to class GIALLO VENEZIANO (E=.11), where it largely outperforms the
EH feature (E=.55), but the contrary happens for class ANTIQUE BROWN, where
EH (E=.01) largely outperforms SC (E=.22). That no single feature alone is a solution
for all situations is also witnessed by the fact that all single-feature classifiers (1st row)
are, across the entire dataset, largely outperformed by both the baseline classifier and
all the adaptive committees. This fact confirms that splitting the image representation
into independent feature-specific representations on which feature-specific classifiers
operate is a good idea.
Table 1. Error rates of the techniques as tested on the Stone dataset; percentages indicate
decrease in error rate with respect to the baseline. The first five results are relative to the five
feature-specific baselines. Boldface indicates the best performer.

Feature-specific baselines:
0.479      0.318            0.479            0.410            0.419

Baseline   DCS              CRDCS            WMV              CRWMV
0.297      0.183 (-38.4%)   0.179 (-39.7%)   0.225 (-24.2%)   0.227 (-23.6%)
Acknowledgments
This work has been partially supported by Project “Networked Peers for Business”
(NeP4B), funded by the Italian Ministry of University and Research (MIUR) under
the “Fondo per gli Investimenti della Ricerca di Base” (FIRB) funding scheme. We
thank Gianluca Fabrizi and Metro S.p.A. Marmi e Graniti for making the Stone dataset
available. Thanks also to Claudio Gennaro and Fausto Rabitti for useful discussions.
References
1. Lu, D., Weng, Q.: A survey of image classification methods and techniques for improving
classification performance. International Journal of Remote Sensing 28(5) (2007) 823-870
2. Giacinto, G., Roli, F.: Adaptive selection of image classifiers. In: Proceedings of the 9th
International Conference on Image Analysis and Processing (ICIAP’97), Firenze, IT (1997)
3. Li, Y.H., Jain, A.K.: Classification of text documents. The Computer Journal 41(8) (1998)
4. Woods, K., Kegelmeyer Jr, W., Bowyer, K.: Combination of multiple classifiers using local
accuracy estimates. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(4) (1997)
5. Schapire, R.E., Singer, Y.: Improved boosting using confidence-rated predictions. Machine
Learning 37(3) (1999) 297-336
6. Joachims, T.: Text categorization with support vector machines: Learning with many relevant
features. In: Proceedings of the 10th European Conference on Machine Learning
(ECML’98), Chemnitz, DE (1998) 137-142
7. Yang, Y.: An evaluation of statistical approaches to text categorization. Information Retrieval
1(1/2) (1999) 69-90
8. Yang, Y., Liu, X.: A re-examination of text categorization methods. In: Proceedings of the
22nd ACM International Conference on Research and Development in Information Retrieval
(SIGIR’99), Berkeley, US (1999) 42-49
9. Yang, Y., Zhang, J., Kisiel, B.: A scalability analysis of classifiers in text categorization. In:
Proceedings of the 26th ACM International Conference on Research and Development in
Information Retrieval (SIGIR’03), Toronto, CA (2003) 96-103
10. Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM
Computing Surveys 33(3) (2001) 273-321
11. Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann,
San Francisco, US (2006)
12. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach.
Springer Verlag, Heidelberg, DE (2006)
13. Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search
in metric spaces. In: Proceedings of the 23rd International Conference on Very Large Data
Bases (VLDB ’97), Athens, GR (1997) 426-435
14. Manjunath, B., Salembier, P., Sikora, T., eds.: Introduction to MPEG-7: Multimedia Content
Description Interface. John Wiley & Sons, New York, US (2002)
15. Amato, G., Falchi, F., Gennaro, C., Rabitti, F., Savino, P., Stanchev, P.: Improving image
similarity search effectiveness in a multimedia content management system. In: Proceedings
of the 10th International Workshop on Multimedia Information System (MIS’04), College
Park, US (2004) 139-146