Rosso Tiziano: a system for user-centered exploration
and discovery in large image information bases
Giovanni Maria Sacco
Dipartimento di Informatica, Università di Torino, Corso Svizzera 185,
10149 Torino, Italy
Abstract. Retrieval in image information bases has been traditionally addressed
by two different and unreconciled approaches: the first one uses normal query
methods on metadata or on a textual description of each item. The second one
works on primitive multimedia features (such as color, texture, etc.) and tries to
find items that are similar to a specific selected item. Neither of these
approaches supports the most common end-user task: the exploration of an
information base in order to find the “right” items. This paper describes a
prototype system based on dynamic taxonomies, a model for the intelligent
exploration of heterogeneous information bases, and shows how the system
implements a new access paradigm supporting guided exploration and
discovery and the seamless integration of access through metadata with
methods based on primitive multimedia features. Example interactions are
discussed, as well as the major implications of this approach.
1. Introduction
Current research on image retrieval focuses on two different and unreconciled
approaches for accessing multimedia databases: the metadata approach as opposed to
the content-based approach. In the metadata approach, each image is described by
metadata. Metadata types range from a set of keywords or a textual description, to
standardized structures for metadata attributes and their relationships like the MPEG-7
standard [8], to ontologies [20]. While some metadata (e.g. image format) can be
automatically derived from the image itself, the vast majority of them are manually
entered. Once the items in the collection are described by metadata, the type and
actual content of the item itself become irrelevant for browsing and retrieval, and are
only needed when the item itself has to be “shown” to the user. From this point of view, it
is intuitively appealing and straightforward to use techniques such as database queries
on structured metadata or text retrieval queries on metadata consisting of a textual
description of the multimedia item.
The content-based approach (CBIR, content-based image retrieval) describes the
image through primitive features such as color, texture, shape, etc. which are
automatically extracted from images. Retrieval is based on similarity among images:
the user provides an item (selected through metadata or, in some cases, at random) and
requests similar ones.
Both approaches suffer from significant drawbacks. The metadata approach relies
on descriptions that are known to be inaccurate, heavily dependent on the specific
human classifier, ambiguous, etc. These problems can be alleviated by using
ontological metadata rather than plain text descriptions, but a level of subjectivity
remains. In addition, semantic annotation is costly, especially for large, existing
collections.
CBIR originated as an attempt to overcome these problems, by stressing the
automatic extraction of “descriptions” from the image itself. This process is
inexpensive and parallelization can overcome any capacity problems. In addition, the
characterization is completely objective, and does not suffer from the inconsistencies
caused by human classification. However, despite significant improvements over the
years, the accuracy of CBIR systems is admittedly less than satisfactory. The main
reason for this is the semantic gap or “… the lack of coincidence between the
information that one can extract from the visual data and the interpretation that the
same data have for the user in a given situation" [21]. We believe that no CBIR
system will be able to reconstruct all relevant information in all situations: in many
practical cases, the information is just not there. As an example, a photo with a
mountain scene is similar to many other mountain scenes, but it can come from
different (mountain) countries. If the photo chosen as the cover for a leaflet by the
Austrian Tourist Office turns out to be from France or Italy, a less than enthusiastic
client can be expected.
In considering user access, it is obvious that the dichotomy between the two
approaches (metadata vs. primitive features) forces the user to use different access
strategies that only depend on the type of feature (conceptual or primitive) he is
considering. Most importantly, from the user point of view, none of these approaches
really supports an exploratory access to the image collection, which we believe to be
the most common access paradigm. Consider our user looking for the cover photo for
the Austrian Tourist Office. She would probably like to find out which types of photos
on Austria the information base stores: e.g. mountains, towns, museums, etc. Once she
focused on the most interesting type (according to her current requirements), say
mountains, she might be interested in knowing that there are winter, spring, etc. photos
on mountains, or there are photos with a specific dominant, etc.
In short, the user needs to explore the information base. We define exploration as
an iterative activity in which the user must be able to:
1. have a systematic summary S of the entire collection C
2. freely focus on any subset F ⊆ C of the entire collection and have a systematic
summary S′ of F
3. repeat 2 by setting additional, secondary foci until the number of selected items is
sufficiently small for manual inspection.
The major difference between exploration and plain browsing rests on systematic
summaries that provide concise descriptions of content that would otherwise require a
time-consuming scan of the actual items. This implies that some sort of conceptual
organization exists and that the exploration system is able to summarize any subset of
the collection based on such organization. Closely linked to our definition of
exploration is the notion of discovery: the user exploring the information base will
often discover new and unexpected relationships among concepts. We want to stress
here that exploration is not an additional, desirable feature of a multimedia
information retrieval system. On the contrary, we believe that, in most practical cases,
retrieval without exploration is just a trial-and-error task, with no guarantee of the
quality of the user’s final choice.
The access paradigm supported by most current image retrieval systems is quite
different. Systems based on metadata use access techniques such as queries on
structured data or text retrieval techniques that work on precise specifications and do
not support exploration because they produce flat result lists. The inadequacy of these
tools for exploration has been discussed in the literature [14, 4]. CBIR systems use
information retrieval techniques that are centered on retrieval by similarity. This type
of access affords a very simple and intuitive user interaction, but offers no clue on
what the information base contains. Systems that organize images through hierarchical
clustering, e.g. [22], do offer an initial systematic summary of the collection, but do
not account for the iterative refinement required by our working definition of
exploration. From this point of view, they are similar to traditional taxonomies that
offer conceptual but static summaries.
There are very few exceptions. Sacco [14] considered the application of dynamic
taxonomies, a knowledge management model that supports exploration (and forms the
backbone of the present approach) to multimedia databases described by conceptual
metadata. The same model was used by Hearst et al. [4] to build a prototype system,
Flamenco, that was successfully applied to a rather large image collection [24]. These
works rely on conceptual metadata features and ignore primitive multimedia features.
From another perspective, El Niño, a prototype system by Santini and Jain [19],
focuses on browsing based on primitive features and textual descriptions. As we
commented before, browsing is quite different from our definition of exploration. El
Niño, in fact, works on a multi-feature weighted similarity measure and relies on a
visual relevance feedback interface to modify these weights and try to match the user
notion of similarity.
The approach we describe here extends the initial ideas reported in [15] and
proposes a significant change in access paradigms, based on dynamic taxonomies. A
system was implemented in order to make all the implications of the non-traditional
design directions we are taking concrete. The prototype system discussed here has
several goals. First, we provide a single, coherent framework that seamlessly
integrates access by metadata and access by primitive features: it considerably
simplifies user access, and each access method reinforces the effectiveness of the
other one. Second, this framework is used to support the exploration of image
collections, so that both metadata and primitive features can be used to express
interest foci, and at the same time to systematically summarize them, in order to guide
the user towards his goal. A number of problems that are relevant in this context are
discussed in the following. Finally, such a prototype system, in which different
primitive features approaches can be easily integrated and compared, can provide an
excellent test bed for the future evaluation of different strategies, integrated access and
human factors in general. At the present stage, we report the first informal findings of
users experimenting with the system.
The information base used in the examples below consists of 251 images of five
masters of the Italian Renaissance: Piero della Francesca, Masaccio, Antonello da
Messina, Paolo Uccello and Raphael. Each work was thoroughly classified according
to a number of topics that include, among others, the painter name, the type of
painting (single, polyptic, etc.), the technique used (oil, tempera, etc.), the period in
which it was painted, current locations, the themes (religious painting, portrait), etc.
Unlike other test image databases, which usually exhibit widely different
images, the images in the sample collection are relatively similar and therefore harder
to characterize. In addition, the collection is a good representative of one of the most
important applications of image retrieval: museum and art collections. The sample
infobase is available on-line and is managed by the
Universal Knowledge Processor [5].
Although we focus here on describing image information bases through primitive
features and metadata, the dynamic taxonomy approach can be used in a number of
variations, by considering metadata only or primitive features only, or by integrating
traditional CBIR retrieval by similarity with a dynamic taxonomy metadata
description in order to clarify contexts for similar objects.
2. Dynamic Taxonomies Reviewed
Dynamic taxonomies ([13, 14], also known as faceted search) are a general
knowledge management model for complex, heterogeneous information bases. The
model has an extremely wide application range [18] that includes, among others, electronic
commerce, e-government, human resource management and medical guidelines and
diagnosis. The intension of a dynamic taxonomy is a taxonomy designed by an expert,
i.e. a concept hierarchy going from the most general to the most specific concepts. A
dynamic taxonomy does not require any other relationships in addition to
subsumptions (e.g., IS-A and PART-OF relationships) and directed acyclic graph
taxonomies modeling multiple inheritance are supported but rarely required.
Dynamic taxonomies work on conceptual descriptions of items, so that
heterogeneous items of any type and format can be managed in a single, coherent
framework. In the extension, items can be freely classified under several topics at any
level of abstraction (i.e. at any level in the conceptual tree). This multidimensional
classification is a generalization of the monodimensional classification scheme used in
conventional taxonomies and models common real-life situations. First, an item is
very rarely classified under a single topic, because items are very often about different
concepts. Second, items to be classified usually have different independent features
(e.g. Time, Location, etc.), each of which can be described by an independent
taxonomy. These features are often called perspectives or facets.
By defining concepts in terms of instances rather than properties, a concept C is
just a label that identifies all the items classified under C. Because of the subsumption
relationship between a concept and its descendants, the items classified under C
(items(C)) are all those items in the deep extension of C, i.e. the set of items identified
by C includes the shallow extension of C (all the items directly classified under C)
union the deep extension of C’s sons. By construction, the shallow and the deep
extension for a terminal concept are the same.
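The shallow/deep extension machinery can be sketched in a few lines of Python; the taxonomy representation (a child map plus a map of directly classified items) and all names are hypothetical, not the system's actual data structures:

```python
def deep_extension(concept, children, shallow):
    """items(C): the shallow extension of C union the deep
    extensions of C's sons."""
    items = set(shallow.get(concept, ()))
    for child in children.get(concept, ()):
        items |= deep_extension(child, children, shallow)
    return items

# Toy taxonomy: Rome is a son of Italy; items classified directly under each.
children = {"Italy": ["Rome"]}
shallow = {"Italy": {"img1"}, "Rome": {"img2", "img3"}}

# Terminal concept: shallow and deep extensions coincide.
assert deep_extension("Rome", children, shallow) == {"img2", "img3"}
# Inner concept: the deep extension also gathers the sons' items.
assert deep_extension("Italy", children, shallow) == {"img1", "img2", "img3"}
```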
There are two important consequences of our approach. First, since concepts
identify sets of items, logical operations on concepts can be performed by the
corresponding set operations on their extension. Second, dynamic taxonomies can
infer all the concepts related to a given concept C, on the basis of empirical evidence:
these concepts represent the conceptual summary of C. Concept relationships other
than IS-A are inferred through the extension only, according to the following
extensional inference rule: two concepts A and B are related if there is at least one
item d in the infobase which is classified at the same time under A (or under one of
A’s descendants) and under B (or under one of B’s descendants). For example, an
unnamed relationship between Raphael and Rome can be inferred if an item classified
under Raphael and Rome exists in the infobase. At the same time, since Rome is a
descendant of Italy, also a relationship between Raphael and Italy can be inferred.
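The extensional inference rule reduces to a set-intersection test once deep extensions are available; a minimal sketch on the Raphael/Rome example (toy data, hypothetical item names):

```python
# Deep extensions items(C), precomputed as plain sets: one painting is
# classified under both Raphael and Rome, and Rome is a descendant of Italy.
items = {
    "Raphael": {"school_of_athens"},
    "Rome":    {"school_of_athens"},
    "Italy":   {"school_of_athens", "battle_of_san_romano"},
}

def related(a, b):
    """Extensional inference rule: A and B are related iff at least
    one item lies in both deep extensions."""
    return not items[a].isdisjoint(items[b])

assert related("Raphael", "Rome")   # unnamed relationship inferred
assert related("Raphael", "Italy")  # inherited, since Rome IS-A Italy
```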
The extensional inference rule can be easily extended to cover the relationship
between a given concept C and a concept expressed by an arbitrary subset S of the
universe: C is related to S if there is at least one item d in S which is also in items(C).
In this way, the extensional inference rule can produce conceptual summaries not only
for base concepts, but also for any logical combination of concepts and, most
importantly, access through dynamic taxonomies can be easily combined with other
retrieval methods such as information retrieval, etc.
Information Access through Dynamic Taxonomies
The user is initially presented with a tree representation of the initial taxonomy for
the entire infobase. In general, each concept label has also a count of all the items
classified under it (i.e. the cardinality of items(C) for all C’s), because this statistical
information is important to guide the search. The initial user focus F is the universe
(i.e. all the items in the infobase). In the simplest case, the user can then select a
concept C in the taxonomy and zoom over it. The zoom operation changes the current
state in two ways. First, the current focus F is refined by intersecting it with C (i.e.,
with items(C)); items not in the focus are discarded. Second, the tree representation of
the taxonomy is modified in order to summarize the new focus. All and only the
concepts related to F are retained and the count for each retained concept C’ is
updated to reflect the number of items in the focus F that are classified under C’. The
reduced taxonomy is a conceptual summary of the set of documents identified by F,
exactly in the same way as the original taxonomy was a conceptual summary of the
universe. In fact, the term dynamic taxonomy is used to indicate that the taxonomy can
dynamically adapt to the subset of the universe on which the user is focusing, whereas
traditional, static taxonomies can only describe the entire universe.
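The zoom operation described above can be sketched as follows, with concepts again represented by their deep extensions as plain sets (toy data; the real system's structures differ):

```python
def zoom(focus, concept_items, concept):
    """Zoom on `concept`: refine the focus by intersecting it with
    items(concept), then recompute the conceptual summary, i.e. the
    per-concept item counts over the new focus; concepts no longer
    related to the focus (count 0) are pruned from the reduced taxonomy."""
    new_focus = focus & concept_items[concept]
    summary = {c: len(new_focus & its)
               for c, its in concept_items.items()
               if new_focus & its}
    return new_focus, summary

# Toy infobase: three paintings classified under painter and theme facets.
concept_items = {
    "Raphael": {1, 2}, "Masaccio": {3},
    "Portrait": {1, 3}, "Religious": {2},
}
focus, summary = zoom({1, 2, 3}, concept_items, "Raphael")
assert focus == {1, 2}
# Masaccio is pruned; the counts describe the new focus only.
assert summary == {"Raphael": 2, "Portrait": 1, "Religious": 1}
```

Iterating `zoom` on the returned focus implements the thinning process described in the next paragraph.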
The retrieval process is an iterative thinning of the information base: the user
selects a focus, which restricts the information base by discarding all the items not in
the current focus. Only the concepts used to classify the items in the focus, and their
ancestors, are retained. These concepts, which summarize the current focus, are those
and only those concepts that can be used for further refinements. From the human
computer interaction point of view, the user is effectively guided to reach his goal, by
a clear and consistent listing of all possible alternatives.
The advantages of dynamic taxonomies over traditional methods are dramatic in
terms of convergence of exploratory patterns and in terms of human factors. Three
zoom operations on terminal concepts are sufficient to reduce a ten million item
information base to an average ten items [17]. Dynamic taxonomies require a very
light theoretical background: namely, the concept of a subject index (i.e. the
taxonomic organization) and the zoom operation, which seems to be very quickly
understood by end-users. Hearst et al. [4] and Yee et al. [24] conducted usability tests
on a corpus of art images described by metadata only, showing a significantly better
recall than access through text retrieval and, perhaps more importantly, the feeling that
one has actually considered all the alternatives in reaching a result.
3. Combining Conceptual Access with Primitive Multimedia Features
Access by primitive multimedia features is usually based on clustering: documents
are grouped on the basis of the values of one or more features, say color, according to
a measure of similarity between any two documents. The goal is to create clusters in
such a way that the similarity between any two documents in a cluster C is higher than
the similarity between any document in C and any document not in C. Techniques of
this type derive from the ample body of research on clustering for textual information
bases. Generally a vector space model is used, in which a document is represented by
an n-dimensional vector x=(x1, …, xn). The similarity between any two documents can
then be computed as the distance d(x, y) between the two vectors that represent the
documents; the cosine of the angle between the two vectors is generally used, but
other measures, such as Jaccard’s coefficient, can be used. This geometric
interpretation has an additional benefit, in that a cluster of documents can be
represented by its centroid, which is either the barycenter of the cluster or the
document closest to it.
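The vector-space machinery just described (cosine similarity, cluster centroid as barycenter) can be sketched in pure Python, purely for illustration:

```python
import math

def cosine(x, y):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) *
                  math.sqrt(sum(b * b for b in y)))

def centroid(cluster):
    """Barycenter of a cluster of equal-length feature vectors."""
    n = len(cluster)
    return [sum(v[i] for v in cluster) / n for i in range(len(cluster[0]))]

assert cosine([1, 0], [1, 0]) == 1.0   # identical direction
assert cosine([1, 0], [0, 1]) == 0.0   # orthogonal vectors
assert centroid([[0, 0], [2, 2]]) == [1.0, 1.0]
```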
If we ignore the geometric interpretation of clusters, a cluster merely identifies a set
of items that are grouped together by some affinity. This definition is, for all access
purposes, equivalent to the definition of a concept in a dynamic taxonomy. In fact, in a
dynamic taxonomy, a concept denotes a set of documents classified under it, rather
than a set of properties that instances of a concept must satisfy. Consequently, a rather
straightforward strategy to integrate clusters in a dynamic taxonomy is by adding a
top-most concept (or “facet”) for each clustering scheme, with its actual clusters being
the sons of this concept. For instance, if a clustering scheme by dominant color exists,
a facet labeled “dominant color” will be added at the top-most level. Its sons will be
all the clusters grouping items by dominant color similarity.
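This integration strategy is structurally trivial, which is part of its appeal; a minimal sketch with a taxonomy held as a facet-to-sons map (toy data, hypothetical labels):

```python
# Existing conceptual facets of the taxonomy (toy data).
taxonomy = {"Topics": ["Mountains", "Towns", "Museums"]}

def add_cluster_facet(taxonomy, facet_label, clusters):
    """Mount a clustering scheme as a new top-most facet whose sons
    are the clusters themselves."""
    taxonomy[facet_label] = list(clusters)
    return taxonomy

add_cluster_facet(taxonomy, "Dominant color", ["blue cluster", "red cluster"])
assert taxonomy["Dominant color"] == ["blue cluster", "red cluster"]
assert "Topics" in taxonomy  # existing conceptual facets are untouched
```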
There are two obvious problems. The first one is how these clusters are to be
labeled in such a way that their content is easily identifiable and understandable:
labeling strategies are discussed in the following. The second problem is that, in most
situations, the number of clusters for each feature will be large and difficult to
represent in the interface and to manipulate by users: hierarchical clustering schemes
can greatly simplify the effectiveness of interaction.
In the approach presented here, methods based on primitive features benefit from
dynamic taxonomies in two basic ways. First, combinations of concepts can be used to
supply a conceptual context to such methods, and consequently reduce noise. Such a
context “is essential for determining the meaning of an image and for judging image
similarity” [19]. Alternatively, when the user starts from retrieval based on primitive
features, dynamic taxonomies can be used to quickly summarize the result according
to the original conceptual taxonomy, thus increasing the precision of the result. This is
especially important in order to quickly correlate primitive features with metadata.
As an example, consider a user starting from a primitive feature such as a blue
dominant. Focusing on it, the system will show in the conceptual summary that images
with a blue dominant include skies, sea, lakes, cars, etc. Conversely, a user focusing
on skies will find that images of skies may have a blue dominant, but also a yellow
one, a red one, etc. In both cases, the conceptual summary indicates which conceptual
contexts are available to further refine user access.
As we mentioned before, the integration of metadata access with access by
primitive features boosts the effectiveness of each type of access. Very simple
features, such as dominant color, are probably not sufficient per se to provide an
effective access to a large image base: however, when combined with other features,
they can provide a useful discrimination. On the other hand, the availability of
primitive features makes it useless to manually enter a large number of metadata
descriptions, and as we show below, even a simple and small metadata structure can
produce significant benefits in access and exploration.
In fact, an extremely important point in the application of dynamic taxonomies to
image (and, in general, multimedia) information bases is the dramatic convergence
dynamic taxonomies offer. Following the analysis reported in [17], assume that we
organize the taxonomy for an image base containing D documents as a faceted
taxonomy with j facets. This means that the taxonomy is composed by j independent
sub-taxonomies, e.g. topics, dominant colors, luminance, etc. For the sake of
simplicity, assume that the set of terminal concepts T in the dynamic taxonomy is
partitioned into j subsets of the same size. Under this assumption, each partition has
T/j terminal concepts. Assume now that each image d is classified under one
and only one leaf concept C in each partition, and that the probability of classifying d
under C is uniform for each partition. The average number of images to be manually
scanned, R (i.e. the cardinality of the current focus F), after k zoom operations on
terminal concepts is

R = D (j/T)^k,   k = 0, 1, 2, ..., j     (1)
Assume now an image base of ten million items, and a taxonomy of 1000 terminal
concepts organized in 10 facets, each having 100 terminal concepts. In the case of
primitive features, terminal concepts correspond to feature values (e.g. a specific level
of luminance). According to (1), on the average, one zoom operation produces a focus
with 100,000 images, two zooms produce a focus consisting of 1,000 images, and
three zooms select 10 images.
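These convergence figures can be checked numerically; the sketch below assumes the focus-size formula R = D(j/T)^k implied by the example (each zoom on a terminal concept divides the focus by the T/j terminals of one facet):

```python
def focus_size(D, T, j, k):
    """Average focus size after k zooms on terminal concepts:
    R = D * (j/T)**k, i.e. each zoom divides the focus by T/j."""
    return D / (T / j) ** k

D, T, j = 10_000_000, 1000, 10        # ten million images, 10 facets of 100 terminals
assert focus_size(D, T, j, 1) == 100_000
assert focus_size(D, T, j, 2) == 1_000
assert focus_size(D, T, j, 3) == 10
```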
This analysis yields two main indications. First, we do not need many features
as long as they are sufficiently uncorrelated [17] and even low-selectivity features can
be used in practice. Second, the upward scalability of dynamic taxonomies is
dramatic, even with compact taxonomies. In the example, a maximum of 10 zoom
operations can be performed: they are sufficient to produce a focus of 10 documents
for an image base consisting of 10^21 images.
4. Monodimensional vs. Multidimensional Clustering
Most current research accounts for the diversity of features by computing an overall
similarity distance for the images in the database by linearly combining the similarity
distance of each individual feature, such as primitive multimedia features (e.g. color)
or conceptual features, such as a painter name. Clustering groups documents together
according to a single similarity measure, and consequently each item belongs to one
and only one cluster. We call this type of clustering a monodimensional clustering by
analogy with classification theory. If a hierarchical clustering scheme is used,
monodimensional clustering is similar to a traditional, monodimensional taxonomy.
Alternatively, each feature can be considered independently: each feature will
result, in general, into a different clustering scheme because, for instance, two items
similar by texture may have different colors. In this case, an item will belong to
different clusters. We call this multidimensional clustering¹.
Here we criticize the use of monodimensional clustering schemes by comparing
them to multidimensional clustering schemes. By switching from monodimensional
clustering to multidimensional clustering on F features, the cost of clustering increases
by a factor F, because all the items have to be fully clustered for each feature.
However, monodimensional clustering suffers from two fundamental problems:
1. the notion of similarity is inaccurate and ineffective in monodimensional
clustering and classification, and
2. an exponential growth in the number of clusters is required if the number of items
in a cluster is to be kept at a reasonably low level.
In monodimensional clustering, a given multimedia item is represented by a point
in a multidimensional space, computed as the weighted combination of the item’s
primitive multimedia features. Similarity between any two items is then computed as
the inverse of their distance in this space and depends on the weights used to combine
primitive features. Consider two primitive features such as color and texture. Different
users will generally use different weights to combine these features, according to their
interests: user A may be more interested in color than in texture, whereas user B could
be more interested in texture than in color. Different weights on these features imply
different similarity functions, so that items a and b can be similar for user A and quite
different for user B. However, since a single, predefined similarity function is used for
clustering, the resulting clustering scheme only accommodates those users whose
notion of similarity matches the predefined function. In order to accommodate other
users, in this same framework, clustering should be performed dynamically on a
similarity function given by the user himself. However, this is not feasible for two
reasons:
1. cost, as the dynamic reclustering of large information bases requires substantial
resources and time. Reclustering is required in El Niño [19]. A similar approach
in textual databases, Scatter-Gather [1], is criticized in [14];
2. human factors, because it is unlikely that the average, unskilled user would be
able to understand the effects of weights and hence come up with appropriate
weights for different features. A similar problem occurs in product selection in
e-commerce, and is discussed in [16].

¹ Clustering theory is indeed defined in a multidimensional space, but in classic
clustering an item only belongs to a single cluster, whereas clustering schemes based
on multidimensional classification can place the same item in different clusters,
according to different similarity criteria.
Even validation of results can be seriously biased by a monodimensional clustering.
In fact, weighting coefficients that combine feature similarity play a fundamental part
on clustering and access by similarity, so that it may become difficult to discriminate
between the actual advantages offered by specific features and the random effects of a
specific weighting scheme.
In addition, a hierarchical monodimensional clustering scheme is analogous to a
traditional monodimensional taxonomy, and the analysis reported in [17] applies. In
particular, the maximum resolution of a hierarchical clustering scheme with T terminal
clusters is MR = 1/T. Compare this resolution with the maximum resolution of a
multidimensional scheme with the same number T of terminal clusters organized
according to j facets, which is MR = 1/(T/j)^j. The reducing power of a multidimensional
clustering scheme is therefore T^(j-1)/j^j times higher than that of the corresponding
monodimensional scheme.
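Using the earlier example's figures (T = 1000 terminal clusters organized in j = 10 facets), the resolution gain of the multidimensional scheme can be checked with exact integer arithmetic:

```python
T, j = 1000, 10

# Reducing power = 1/MR: how finely each scheme can partition the base.
multi = (T // j) ** j   # multidimensional: (T/j)^j = 100^10 = 10^20
mono = T                # monodimensional: T = 10^3

gain = multi // mono
assert gain == T ** (j - 1) // j ** j == 10 ** 17
```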
From another perspective, note that the average number of images to be manually
scanned is R = D/T. Therefore, in order to have a reasonable result size, say
R ≤ 10, we need a number of terminal clusters that is just one order of magnitude less
than the image collection size. On the one hand, such a high number of clusters is
difficult to manage and access by end-users. On the other hand, there is no
guarantee that such a fine subdivision can be produced.
Therefore, the advantages of multidimensional clustering over monodimensional
scheme are analogous to those that we discussed for multidimensional vs.
monodimensional taxonomies: namely, a dramatically better scalability and the ability
to correlate different features. As an example, we can zoom on a specific texture
cluster, and immediately see all the color clusters for that texture. In addition, custom
similarities can be easily expressed.
In summary, we believe that multidimensional clustering strategies deserve close
attention, because of their faster convergence and, most importantly, because they
present visual similarities according to different perspectives (color, luminance, etc.),
and allow for a more articulated exploration of the information base.
5. Representing Primitive Features and Clusters
In addition to metadata, each image in the sample collection is automatically
described by a number of independent primitive features. These include:
1. average image brightness
2. average image saturation
3. HSV histogram
4. clustering on average color in CIE L*a*b color space on a 4x4 grid
Each image was first reduced to a 250x250 size, while preserving the aspect ratio.
For all features except the last, images were converted to the HSV color space, which
is perceptually more accurate than the RGB color space. We record the image color
histogram, using the color reduction proposed by Lei et al. [7]. A Gaussian blur is
applied to the image before conversion to the HSV color space, in order to avoid
noise, and HSV colors are mapped to 36 bins based on the perceptual characteristics
of the color itself.
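A minimal sketch of the HSV-based features, assuming pixels are plain (r, g, b) tuples in [0, 255]; the flat 10-degree hue binning below is only a stand-in for the perceptual 36-bin mapping of Lei et al. [7], and the Gaussian-blur preprocessing step is omitted:

```python
import colorsys

def hsv_stats(pixels):
    """Average brightness (V), average saturation (S), and a coarse
    36-bin hue histogram for a list of (r, g, b) pixels."""
    hist = [0] * 36
    v_sum = s_sum = 0.0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        v_sum += v
        s_sum += s
        hist[min(int(h * 36), 35)] += 1   # flat hue quantization, not [7]
    n = len(pixels)
    return v_sum / n, s_sum / n, hist

bright, sat, hist = hsv_stats([(255, 0, 0), (0, 0, 255)])
assert bright == 1.0 and sat == 1.0  # both pixels fully bright and saturated
assert hist[0] == 1                  # pure red falls in the first hue bin
assert sum(hist) == 2
```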
The last primitive feature records average colors. Each image axis is subdivided
into 4 intervals, giving 16 areas: the average color of each area is computed. The
aspect ratio is not preserved. Here, the image is first converted to the CIE L*a*b color
space, because linear color combinations in this space are perceptually more accurate
than in other spaces. Each image is then represented by a vector on 16 dimensions and
clustering is applied, based on the cosine measure. The entire collection was clustered
into 10 clusters, by using an agglomerative complete-link clustering algorithm [6]. We
predefined 10 clusters (see Fig. 1), a number chosen according to the guidelines from
[14]: a larger collection would require a hierarchical clustering scheme.
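The clustering step above can be illustrated with a naive pure-Python agglomerative complete-link procedure on cosine distance; this O(n³) version is only a sketch of the technique (the system used the algorithm of [6]), with toy 2-D vectors standing in for the 16-dimensional grid vectors:

```python
import math

def cosine_dist(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = (math.sqrt(sum(a * a for a in x)) *
            math.sqrt(sum(b * b for b in y)))
    return 1 - dot / norm

def complete_link(vectors, k):
    """Agglomerative complete-link clustering: repeatedly merge the two
    clusters whose *farthest* members are closest, until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(cosine_dist(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

# Four toy "average color" vectors forming two natural groups.
vecs = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
clusters = complete_link(vecs, 2)
assert sorted(map(sorted, clusters)) == [[0, 1], [2, 3]]
```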
Since the number and variety of primitive features that can be used to describe
images is very large, we do not strive for exhaustive coverage, and important
features such as textures, regions, shapes, and salient points [23] are not considered
here. Instead, we focus on simple features and problems common to significant classes
of features.
Although the discussion in the previous sections focused on clustering, many
primitive features can be represented by a priori fixed hierarchical organizations of
the feature space. As an example, monodimensional image data such as average or
dominant color, overall luminance, etc., can be easily represented by a facet with k
sons, each representing a possible value. The label of each son can usually be
provided by the value (color, luminance level, etc.) itself. In the example, we
subdivided both average brightness and saturation in 10 intervals (figure 2). Although
a much finer subdivision is supported by measurements, finer subdivisions would be
useless from the perceptual point of view.
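Such a monodimensional facet can be sketched as a simple interval mapping. The 10 intervals match the subdivision described above; the percentage labels are illustrative assumptions, not the system's actual labels.

```python
# Sketch of a monodimensional facet: average brightness in [0, 1] is
# mapped to one of k = 10 intervals, each a "son" of the facet.
# The interval labels are illustrative assumptions.
def brightness_facet(avg_brightness, k=10):
    """Return (bin index, interval label) for a brightness value."""
    bin_idx = min(int(avg_brightness * k), k - 1)
    lo, hi = bin_idx * 100 // k, (bin_idx + 1) * 100 // k
    return bin_idx, f"{lo}%-{hi}%"

# A painting with 15% average brightness falls under the 10%-20% son;
# the "dark paintings" of Section 6 are those in the first two bins.
```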
A similar organization applies also to a number of bidimensional features, such as
color histograms [10]. The color histogram facet can be represented by a 2-level
hierarchy, where colors are enumerated on the first level (immediate sons of the facet).
Each color is further characterized, at the next level, by its normalized count,
organized as ranges of values. The structure reported in the figures is slightly more
complex than normal histograms. Here, in fact, the highest level reports principal
colors (8 shades of gray and the 8 main colors, red to purple); at the lower level, each
main color is subdivided into 4 actual colors, by using different luminance and
saturation (figure 3).
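The two-level facet can be sketched as a function mapping one histogram entry to its path in the taxonomy. The count-range boundaries below are assumptions for illustration; the system's actual ranges are not specified here.

```python
# Sketch of the two-level histogram facet: each histogram entry is filed
# under its color (level 1) and its normalized-count range (level 2).
# The range boundaries are illustrative assumptions.
def histogram_facet_path(color_name, normalized_count,
                         bounds=(0.05, 0.25, 0.50, 1.00)):
    """Return the (color, count-range) taxonomy path for one entry."""
    lo = 0.0
    for hi in bounds:
        if normalized_count <= hi:
            return color_name, f"{lo:.2f}-{hi:.2f}"
        lo = hi
    return color_name, f">{bounds[-1]:.2f}"

# e.g. an image whose pixels are 30% orange would be filed under
# Histogram > orange > 0.25-0.50
```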
More complex image representations such as spatial sampling of average color,
salient points, etc. are not easily represented in this way. In these cases, hierarchical
clustering is required, and the problem of conveying the “meaning” of each cluster has
to be solved. In traditional clustering, clusters are usually labeled by their centroid:
either the cluster barycenter or the item closest to it. Even with text-based clustering,
the meaning of the cluster is often so unclear as to pose serious cognitive challenges to
the user [14]. With image representations, these problems are considerably worse,
because the image features used by clustering might be difficult to understand. In our
example, we labeled clusters by their stylized barycenter rather than by the image
closest to the barycenter, because a specific image might not convey the rationale used
for clustering. For instance, a tondo (i.e. a circular painting) is rather uncommon,
since the usual format is rectangular. If a tondo is used to label a cluster, users are
likely to assume that the cluster contains tondos, whatever the correct interpretation
might be.
6. Examples of Exploration
Figures 4 to 9 report three different exploratory sessions that show the significant
advantages of the current approach. In the first session, the user starts her exploration
from a primitive feature, the average image brightness, and selects dark paintings
(figure 4), i.e. paintings with a brightness of 20% or less. After the zoom operation,
the reduced taxonomy in figure 5 indicates that only Antonello da Messina and
Raphael painted dark paintings, and that almost all such paintings are portraits. If the
user displays these portraits, she will notice that they have a black background. To see
why no other painters produced dark paintings, she can expand the Technique topic,
which shows that these two painters are the only masters in the collection who used oil
painting. All the other masters used tempera, a technique that uses water and egg yolk
(instead of oil) as a binding medium; tempera paintings tend to be much brighter.
In the second session, exploration starts from metadata: Painter>Masaccio and
then Theme>Sacred are zoomed upon (figure 6). The result is then summarized
according to the HSV histogram, and paintings that have an orange-ish color are
displayed (figure 7). Almost all the sacred paintings by Masaccio
fall in this category: these are paintings with a golden background (the so-called fondo
oro) that is typical of Italian sacred paintings of the fifteenth century.
Finally, in the third and last session, clustering on a 4x4 grid is used to explore
Portraits by Antonello da Messina (figure 8). The clusters in the reduced
taxonomy show that most portraits fall in a single cluster, which means they are visually very
similar. In fact, almost all the portraits by Antonello have a very dark background (see
the first session) and the face covers most of the painting (figure 9).
These three simple and quick sessions show how the information base can be effortlessly
explored, gaining insights into its contents: in fact, we discovered relationships and
features of the image collection that no other access method would have made apparent.
7. Conclusions and Future Research
A unified intelligent browsing system for complex multimedia information bases
was presented. Dynamic taxonomies allow users to explore complex and large
multimedia information bases in a totally assisted way without unnecessary
constraints or asymmetries in search paths. They effectively integrate primitive
multimedia features and conceptual metadata features in a unique, integrated visual
framework, in which both features can be used to focus on and to conceptually
summarize portions of the infobase.
The shift in the access paradigm has important implications on how primitive
features are selected and used in the present context. Traditional CBIR systems strive
to capture image “semantics” through a mix of high quality features. Here, instead, it
is much more important that the features used are easily understood by users and that
they capture image characteristics that are useful for exploration. In fact, we
hypothesize that an array of simple features, such as the ones we used in the examples
above, may be adequate, because of the very quick convergence of dynamic
taxonomies, even for very large and more complex image collections. It is tempting to
call RAIF (redundant array of inexpensive features) such an approach and to see a
strong analogy with RAID techniques in disk technology. In both cases, the intelligent
combination of very simple items (features in the present context, disks in RAID)
produces a holistic result that is much better than the original components.
Although formal usability studies are required, the first informal tests on people
reasonably familiar with the paintings in the information base show some interesting
trends. First, most explorations start from metadata with primitive features used in
later stages in order to find visual similarities among paintings characterized by
semantic metadata descriptions. If confirmed by formal experiments, this would
indicate that both access by metadata and by primitive features are required and must
be dealt with in a uniform way. In addition, feature-only CBIR systems that do not
support metadata access would not seem to match user interactions and requirements.
Second, users found the ability to see images clustered according to different and
independent visual features quite important for exploring the information base and for
discovering effective visual similarities. Again, if confirmed, this would make a
case for multidimensional clustering and for simple, easy-to-understand primitive features.
References
1. Cutting DR, Karger DR, Pedersen JO, Tukey JW, Scatter/Gather: a cluster-based
approach to browsing large document collections. ACM SIGIR 1992, 318-329
2. Datta R, Li J, Wang JZ, Content-Based Image Retrieval - Approaches and Trends
of the New Age, ACM MIR 2005
3. Gärdenfors P (2000) Conceptual Spaces – The Geometry of Thought, MIT Press,
Cambridge, MA, USA
4. Hearst M, et al. Finding the Flow in Web Site Search, Comm. of the ACM, 45(9),
2002, 42-49
5. Knowledge Processors (1999) The Universal Knowledge Processor,
6. Jain AK, Murty MN, and Flynn PJ, Data clustering: a review. ACM Comput.
Surv. 31, 3, 1999, 264-323
7. Lei Z, Fuzong L, Bo Z, A CBIR method based on color-spatial feature,
TENCON 99, Proc of the IEEE Region 10 Conference, 1999, 166-169
8. Martinez J (ed.), MPEG 7 – Overview, ISO/IEC JTC1/SC29/WG11 N4980, 2002
9. McDonald S and Tait J, Search strategies in content-based image retrieval. ACM
SIGIR 2003, 80-87
10. Niblack W, Barber R, et al., The QBIC Project: Querying Images By Content
Using Color, Texture, and Shape, SPIE Vol. 1908, 1993, 173-181
11. Ranganathan SR The Colon Classification. New Jersey: Rutgers University Press,
Volume 4, 1965
12. Sanderson M, Croft B, Deriving concept hierarchies from text. ACM SIGIR 1999
13. Sacco GM, Navigating the CD-ROM, Proc. Int. Conf. Business of CD-ROM,
14. Sacco GM, Dynamic Taxonomies: A Model for Large Information Bases, IEEE
Transactions on Knowledge and Data Engineering, 12:3, 2000
15. Sacco GM, Uniform access to multimedia information bases through dynamic
taxonomies, IEEE 6th Int. Symp. on Multimedia Software Engineering, 2004
16. Sacco GM, The intelligent e-store: easy interactive product selection and
comparison, 7th IEEE Conf on E-Commerce Technology, IEEE CEC'05, 2005
17. Sacco GM, Analysis and Validation of Information Access through Mono,
Multidimensional and Dynamic Taxonomies, FQAS 2006, 7th Int. Conf. on
Flexible Query Answering Systems, Springer LNAI 4027, 2006
18. Sacco GM, Some Research Results in Dynamic Taxonomy and Faceted Search
Systems, SIGIR 2006 Workshop on Faceted Search, 2006
19. Santini S, Jain R, Integrated Browsing and Querying for Image Databases. IEEE
MultiMedia 7, 3, 2000, 26-39
20. Schreiber AT, Dubbeldam B, Wielemaker J, Wielinga BJ, Ontology-based photo
annotation, IEEE Intelligent Systems, 2001
21. Smeulders A, et al. Content-based image retrieval at the end of early years. IEEE
Trans on Pattern Analysis and Machine Intelligence, 22(12), 2000, 1349-1380
22. Stan D, A New Approach for Exploration of Image Databases, Proceedings of the
Grace Hopper Celebration of Women in Computing 2002
23. Tian Q, et al, Image retrieval using wavelet-based salient points, Journal of
Electronic Imaging 10(4), 2001, 835-849
24. Yee K, Swearingen K, Li K, and Hearst M. Faceted metadata for image search
and browsing. In Proc ACM SIGCHI Conf CHI '03, 2003
Fig. 1. Multidimensional primitive features: clustering of average color on a 4x4 grid. Clusters are labeled by their barycenter
Fig. 2. Monodimensional primitive features: average brightness and average saturation
Fig. 3. Bidimensional primitive features: reduced HSV histogram
Fig. 4. Zooming on dark paintings
Fig. 5. Exploring dark paintings: only Raphael and Antonello have dark items, and almost all are portraits. Dark portraits are expanded
Fig. 6. Zooming on Sacred paintings, after a zoom on Masaccio
Fig. 7. Histogram summary of Masaccio’s sacred paintings: paintings with orange-ish colors are displayed
Fig. 8. Zooming on Portraits, after a zoom on Antonello da Messina
Fig. 9. Cluster summary of Antonello’s portraits: displaying the selected cluster
In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical (`Give me other images that contain a tumor with a texture like this one'), photo-journalism (`Give me images that have blue at the top and red at the bottom'), and many others in art, fashion, cataloging, retailing, and industry. Key issues include derivation and computation of attributes of images and objects that provide useful query functionality, retrieval methods based on similarity as opposed to exact match, query by image example or user drawn image, the user interfaces, query refinement and navigation, high dimensional database indexing, and automatic and semi-automatic database population. We currently have a prototype system written in X/Motif and C running on an RS/6000 that allows a variety of queries, and a test database of over 1000 images and 1000 objects populated from commercially available photo clip art images. In this paper we present the main algorithms for color texture, shape and sketch query that we use, show example query results, and discuss future directions.© (1993) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.