Computer Science, RMIT University,
A major hurdle in practical Content-Based Image Retrieval (CBIR) is conveying the user's information need to
the system. One common method of query specification is
to express the query using one or more example images.
In this work, we consider whether using more examples improves the effectiveness of meeting a user's information need. We show that using multiple examples
improves retrieval effectiveness over
single-example queries, but that further improvements from
using more than two examples may not justify the added cost.
Keywords Image retrieval, multimedia databases, multi-
ple queries, recall and precision.
Without pictorial material, less value is placed on published material, be it a magazine article, reference software, or an instruction manual. Unfortunately, although
we have more digitised images than ever, finding images that meet an
information need is becoming more difficult;
searching through large numbers of images is not a trivial task.
There are two well-known methods of finding an image
in an image database. One is to manually associate a textual annotation with each image before the image is added
to the database: these captions are indexed, and at query
time, they are searched. This method suffers from a lack of
scalability, and the caption is implicitly linked to the annotator's abstract of the image, not the image content itself. The
second method is to use machine vision techniques to automatically recognise pre-defined objects. This approach breaks down
when applied to unconstrained domains, where the images
are not limited to a fixed number of known categories.
A practical alternative to these traditional approaches is
Content-Based Image Retrieval (CBIR). In CBIR, the system produces and stores a summary of each image in the
collection, usually as it is added to the database, by extracting feature data from it. Similar data extracted from
the user's query is compared with this stored data, and a list
of images is presented to the user, sorted by increasing statistical difference to the query. The most common features
used in CBIR are those based on colour, texture, and shape.
It seems intuitive that if we are presenting a sample image as the query, using more example images should lead
to better retrieval results. For example, if we present the
system with an image of a red rose, it is possible that a large
number of the top-ranking results will be red objects that
are not flowers. However, if we present the system with
three images of red roses, we might speculate that the system may extract more features common to red roses and more effectively present red roses as high-ranking answers.
We have experimented with multiple-image querying in
CBIR on a medium-size image collection using different methods of
calculating the distance between images. We have found
that multiple-image querying with two examples improves
retrieval effectiveness over single-image querying for closely-matching answers. Importantly,
adding more than two images to the query produces only
modest further improvement.
Content-Based Image Retrieval (CBIR) allows users to
pose image queries to a database of images, with the goal
of retrieving relevant images that satisfy the user's information need. In a similar way to Information Retrieval (IR)
practice, likely relevance is approximated by statistical similarity, where images returned have the highest estimated
statistical likelihood of being perceived as relevant to the
query [15,20]. An answer to an image query is usually an
ordered list of images.
A CBIR system stores summaries of each image in the
collection, as well as the images themselves. The summaries are usually a representation of one or more features
extracted from the images stored in the database. When
searching for matching images, the same features are extracted from queries. The query features are then statistically compared with the stored feature data, and a list of
images is presented to the user, sorted by similarity. The
most common features used in CBIR are those based on
colour, texture, and shape.
Several existing CBIR systems allow the user to present
a single example image as a query. Among these systems are QBIC [4,5], Virage, and VisualSEEk [19]. One system supports a multiple-example query
paradigm, where a user can select small tiles of colour and
texture to build up an example mosaic for the query.
The CHITRA system supports multiple-example queries
[9], where the user can select any number of example
images to pose as a query. Analysis of the effectiveness of
multiple-example querying, as embodied in the CHITRA
system, is the subject of this paper.
Careful choice of the features used in a CBIR system
is crucial to system effectiveness. Indeed, the combination of
several features to represent an image is also important. The
possible features that can be used fall into three primary
categories: colour, texture, and shape.
Colour features are commonly used to abstract or summarise images. Among colour spaces, the simplest and
best-known is Red, Green, Blue (RGB); this is largely
due to its direct relation to the method of image representation in computer monitors. However, while it is conceptually simple, there is no simple mapping between RGB and
human perception of colour.
Human perception of colour is more complex than simple RGB, since we deal with around a dozen colours, both
when observing sights and discussing them [3]. This observation permits classification of colours into a small number of perceptual categories.
An ideal colour space feature would accurately map to
human perception and allow us to accurately estimate how
different two images are in terms of colour. Such a colour
space has several characteristics. First, it will be linear in
terms of human perception; a unit change in the value of
one of the colour space components will be equally perceptible across the range of values of this component. Second,
it will separate the brightness component from chromaticity
to avoid the effects of varying lighting conditions on perceived colour. Last, a good colour space avoids combinations of opposing colours; some colours (white and black,
red and green, and blue and yellow) are diametrically opposed in terms of human perception, and we never talk of a
"reddish-green" or a "bluish-yellow".
The Munsell colour space is widely acknowledged to
be the closest to human perception, but has the disadvantages of being non-linear and difficult to transform from
other colour spaces. An approximate, fuzzy version of the
Munsell colour space has been developed for application to CBIR.
The International Commission on Illumination (CIE)
has progressively developed several colour spaces that are
practical for CBIR. In particular, the L*a*b* (LAB) and
L*u*v* (LUV) colour spaces have been designed to better
match the characteristics of human perception, while remaining mathematically tractable. They have similar
characteristics, and largely coexist through lack of agreement between their developers.
The YUV colour space separates the luminance component (Y) from chromaticity; the U and V components are often subsampled at half the rate of that for luminance, since the human
eye is not as sensitive to colour variations as it is to variations in luminance.
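The luminance/chrominance separation described above can be sketched with the standard YUV transform. The coefficients below are the common ITU-R BT.601 luma weights; the function name is illustrative, not from the paper:

```python
# Sketch: converting an RGB value to YUV, separating luminance (Y)
# from the chrominance components (U, V). Coefficients follow the
# standard BT.601 definition; the function name is illustrative.

def rgb_to_yuv(r, g, b):
    """Map RGB components in [0, 1] to a (Y, U, V) triple."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    u = 0.492 * (b - y)                    # blue-difference chrominance
    v = 0.877 * (r - y)                    # red-difference chrominance
    return y, u, v
```

For a pure grey such as (1, 1, 1), the chrominance components U and V are zero, which is what makes subsampling them relatively harmless to perceived quality.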
A second feature class used in CBIR is texture. While
it is not as effective as colour in most cases, it does provide added discrimination where colour alone does not suffice [14]. We do not discuss texture features in detail here.
The third feature class used in CBIR is shape. Images
may be partitioned into regions, and the shape, colour, and
texture of these regions can be used for retrieval. Shape
features are mostly derived from the moments of the image
regions. As with texture, we do not discuss shape-based
CBIR in detail here.
We have compared nine feature spaces using single-image queries with the techniques described later in this paper. In the interest of brevity, results for six of the feature spaces are not reported here. The three best-performing spaces, LAB, LUV, and YUV, were retained
for continued investigation. Results with these schemes using single and multiple-example querying are reported below.
The calculation of statistical similarity between a query
feature and the features of each database image requires a
distance measure. Two common methods for
calculating the distance between two images are the Manhattan (sum of absolute differences) and the Euclidean (square root of the
sum of squared differences) distance measures. Other distance measures have been developed, each with its own advantages. In addition to comparing the three feature spaces, we compare the Manhattan and Euclidean distance
measures in this paper.
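The two distance measures can be sketched directly over image feature vectors (for example, colour histograms); the function names are illustrative:

```python
# Sketch: the two distance measures compared in the paper, applied
# to feature vectors of equal length. Names are illustrative.
import math

def manhattan(p, q):
    """Sum of absolute component differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Square root of the sum of squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

For the vectors (0, 0) and (3, 4), the Manhattan distance is 7 while the Euclidean distance is 5; the two measures can therefore rank candidate images differently.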
Multiple Example Queries
Most CBIR systems only allow single-example
queries; that is, database images are retrieved after comparison to a single query image. The results are returned in
ranked order of increasing distance from the example image
in one or more feature spaces.
In this paper, we investigate whether providing more example images as a query can improve the effectiveness of
CBIR. Multiple-example querying provides users with an
additional, alternative querying mode permitting expression
of different features of an information need. For example, a
user who wishes to find images of red roses may not be concerned whether the images returned show a bunch of roses,
a flowerbed of roses, or a single rose. Accordingly, the user
may present three images as a single query that illustrates
these different groupings of red roses and conveys a broader information need.
When a user cannot find a single image that expresses
an information need, multiple-image querying provides a
powerful alternative. Consider a case where the user wishes
to retrieve images of a red rose but does not have a representative query image. In this case, the user may select two
images that together convey the concept: a white rose and a
red carnation. While the results are unlikely to solely contain red roses (we would also expect to see white carnations, white roses, and red carnations), in this case multiple-example querying allows querying for a concept that no single available image represents.
In our comparison of possible multiple-example querying schemes, we have restricted ourselves to multiple-example, single-feature queries; that is, distance measurement is based on a single feature space. We additionally restrict the comparison to three colour features and two distance measures.
There are two possible approaches to multiple-example
querying. First, when multiple query images are presented
we can combine the image features to form a composite
feature, and execute a single query. Second, when mul-
tiple query images are presented we can execute multiple
queries-one database query per query image-and com-
bine the ranked answer lists. Composites of these ap-
proaches are also possible. We investigate the latter ap-
proach of combining ranked result sets in this paper.
To illustrate multiple-example querying, consider Figure 1, which shows two example images as points in a two-dimensional feature space, along with three
candidate database images. Which of the three candidate
images, A, B, and C, best matches a query represented
by Examples 1 and 2? Three simple approaches would be
to choose:
Image A, since it is close to one of the examples;
the image that is least distant from either example; or
the image that has the smallest total distance from both examples.
Combining distances in multiple-image querying is a difficult problem. In this paper, we
consider three approaches: the sum, minimum, and maximum combining functions. These determine the distance of a particular collection image from the specified multiple example
images to be the sum, the minimum, and the maximum of
the individual distances respectively.
To process a two-example query, the distance of the candidate image to each example is calculated. Then, the combining function is applied to reduce the multiple distances
to a single aggregate value.
When this has been performed for all images in the collection, the user is presented with a list of images arranged in order of increasing aggregate value. For
the example in Figure 1, each combining function can select a different image as the best match.
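The evaluation loop just described can be sketched as follows. Here `distance` stands for any single-image measure (such as the Euclidean distance over a colour feature); the names are illustrative and not the CHITRA interface:

```python
# Sketch: ranking a collection under a multiple-example query by
# reducing each image's per-example distances with a combining
# function. Names are illustrative.

def rank(collection, examples, distance, combine):
    """Return (image, aggregate) pairs sorted by increasing aggregate."""
    scored = []
    for image in collection:
        per_example = [distance(image, ex) for ex in examples]
        scored.append((image, combine(per_example)))
    return sorted(scored, key=lambda pair: pair[1])

# The three combining functions considered in the paper map directly
# onto Python built-ins:
combiners = {"sum": sum, "minimum": min, "maximum": max}
```

With one-dimensional "images" 0, 5, and 10 and examples 2 and 8, the sum combiner ranks 5 first (total distance 6), while the minimum combiner prefers 0 or 10 (each at distance 2 from one example), illustrating how the choice of combining function changes the answer list.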
Do More Examples Help?
In our experiments, we used a collection of one thousand assorted images. These images are categorised into
one of ten concepts: Buildings, Fish, Flowerbeds, Flowers, Greenbeds, Mountains, People, Plants, Sea, and Sunsets. The number of images for each concept varied from
35 (for Mountains) to 112 (for Sunsets). We selected and
removed 21 images at random from each concept to form a
query set of 210 images, leaving 790 images in the database
as a test set.
To experiment with multiple-example querying, we partitioned our 210-image query set. First, we began by selecting one image from each of the 10 concept sets of 21
images each. This is a single-example query, and the retrieval performance with this query is recorded; we discuss
performance measurement below. Second, another example was extracted from each concept and paired with each
of our single examples, forming a two-example query. Last,
another example can be extracted from each concept and
triples made from each pair. For each query concept set
of 21 images, we can produce either 10 independent two-example queries or 7 independent three-example queries.
This process of extracting images and producing independent sets can be generalised for four-example and larger
queries, although the number of independent queries is dramatically reduced.
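The partitioning into independent queries can be sketched as a simple chunking of each concept's query images into disjoint groups; the helper name is illustrative:

```python
# Sketch: forming independent k-example queries from one concept's
# query images by splitting them into disjoint groups of k (any
# remainder images are dropped). The name is illustrative.

def independent_queries(images, k):
    """Split the image list into disjoint groups of k."""
    return [images[i:i + k] for i in range(0, len(images) - k + 1, k)]
```

With 21 query images per concept, this yields 10 independent two-example queries or 7 independent three-example queries, matching the counts above.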
To measure retrieval effectiveness, we use recall-precision, as is common in information retrieval. Recall-precision measurement requires that each image in the
database can be classified as either "relevant" or "not relevant" to each query that is posed. In most practical applications, this assessment is impractical, since it is not feasible
to assess each image in the database for relevance to each
query. However, by making a simplistic assumption that
only the images that are members of a concept are relevant
answers to a query extracted from that concept, we can approximate recall-precision measurement. For example, for
a fish query, only fish concept images are deemed relevant
and all images from other concepts are judged as irrelevant
answers. This somewhat restrictive assumption allows the
practical calculation of effectiveness performance values.
Precision measures the fraction of the images retrieved at a particular point in the ranking that are relevant, that is:

    Precision = Concept images retrieved / Total images retrieved

Recall, in contrast, measures the fraction of the relevant answers that have been retrieved at a particular point, or:

    Recall = Concept images retrieved / Total concept images
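These two measures can be sketched directly, under the paper's assumption that exactly the images of the query's concept are relevant; the names are illustrative:

```python
# Sketch: precision and recall after retrieving the top-n ranked
# images, where `relevant` is the set of concept image ids.
# Names are illustrative.

def precision_recall(ranked, relevant, n):
    """Return (precision, recall) for the top-n of a ranked list."""
    retrieved = ranked[:n]
    hits = sum(1 for image in retrieved if image in relevant)
    precision = hits / len(retrieved)  # concept images / images retrieved
    recall = hits / len(relevant)      # concept images / total concept images
    return precision, recall
```

For example, if two of the top four answers belong to the query concept but we cut off after two answers containing one hit, precision is 0.5 while recall depends only on the total number of concept images.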
Conventionally, precision is reported at a set of standard recall values.
Interpolated precision values are often used when discussing average effectiveness. The interpolated precision
at a given recall value R is the maximum actual precision
that appears at any recall value equal to or greater than R.
Interpolated values are also commonly calculated at 10%
increments of recall. For example, the interpolated precision at 0%
recall is the highest precision obtained at any recall value.
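Interpolated precision at 10% recall increments, following the definition above, can be sketched as; the function name is illustrative:

```python
# Sketch: interpolated precision at the standard 10% recall
# increments. For each recall level R, take the maximum actual
# precision observed at any recall >= R. The name is illustrative.

def interpolate(points):
    """points: (recall, precision) pairs observed down a ranked list."""
    levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]
```

Note that the value at 0% recall is simply the highest precision observed anywhere in the ranking, as stated above.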
Table 1 shows the precision at four recall levels (0%, 10%, 20%, and 30%) using single-example queries and the
three feature spaces. We also show the results of
two-example queries with the three combining functions
(minimum, maximum, and sum) and precision values relative to the one-example figure shown.
With two examples and the sum combining function,
there are significant improvements in retrieval performance.
For the best-performing colour space, with the Euclidean distance
and the sum combining function, adding a second example image to a query improves precision by between 8.7%
and 20.3% at low recall levels, and we have calculated high statistical confidence levels for these improvements. At higher recall levels, gains in
precision are more modest; however, it is frequently argued
that users are more concerned with high precision at lower
recall levels.
The improvement in effectiveness with the other feature spaces is less than that with the best-performing space; although one scheme is
less effective as a single-example scheme, with two examples
it is more effective than any other single or multiple-example scheme. Interestingly, the sum combining function is the only consistently effective scheme, suggesting
that images that are close to both examples in the query are
more likely to be relevant than images that are close to only
one of the examples.
Figure 2 shows the effect of adding a third example to
each of our 70 independent two-example queries. The improvement shown is much less striking than that of two-example queries over one-example queries. The same trend
continues as more examples are added to each independent
query, as shown in Figure 3. This figure shows the results of a much smaller experiment, with different distance
functions and combining measures; however, we have observed the same trend in increasing the number of examples with all such variations in query parameters. Because
of the smaller query set, the confidence in these
results is much lower and they are indicative only of a
performance trend. We conclude that adding a second example can significantly improve retrieval effectiveness but
that adding more examples offers little additional improvement.
In the results presented so far, we have compared different feature spaces and combining functions. Figure 4
shows a comparison of the Manhattan and Euclidean distance measurement schemes using the HSV feature space
and a sum combining function. Using the Manhattan distance improves effectiveness by around 2%-3% over the
Euclidean distance.
We also experimented with the perceptual colour
categories, but obtained poor results that we do not report
in detail here. We believe that partitioning colours in this
manner is not particularly effective from a retrieval perspective. We also used a 48-dimensional Gabor texture vector. While not performing well individually, we empirically observed that it did prove useful in special cases for
differentiating images with no identifying colour. For example, we noted that in our experiments, texture produced
effective results for the Buildings concept, where the colour
features performed poorly.
Table 1. Average retrieval effectiveness of one-image and two-image queries. Precision is shown at
four levels of recall for the three feature spaces. For two-image queries,
the minimum, maximum, and sum combining functions are shown. The Euclidean distance
function is used for similarity calculation.

    Recall  Feature  One Image  Sum    Minimum  Maximum
                     31.8       +14.2  -10.3    -3.8
In this paper, we have examined whether presenting
multiple image queries to a content-based image retrieval
system improves the retrieval effectiveness over single-image querying. We have shown that for selected parameters, using more example images improves retrieval performance, and that two-example queries with selected parameters
perform markedly better than single-example queries. We have also found that using more
than three examples offers little additional improvement.
In future work, we aim to develop heuristic methods to
capture the user's requirements. If a system is presented
with two different examples, one of a red rose and another
of a yellow daffodil, should the system judge we are interested in only red or yellow objects, red or yellow flowers, or
any flower? Most humans would not agree on any
one solution to this problem, and it is probable that iterative
feedback methods must be incorporated into the system to
continually re-evaluate the user's opinion of the results.
Content-based image retrieval is becoming more impor-
tant with the increasing size and prevalence of image repos-
itories. We have shown that multiple-example querying can
improve retrieval effectiveness in searching these databases
and better meet user information needs.
This work was supported by the Australian Research
Council and the Multimedia Database Systems group at
RMIT University. We thank Surya Nepal for his contribution to the CHITRA project.
We also express our appreciation to the anonymous referees
for their helpful comments.
B. Berlin and P. Kay. Basic Color Terms: Their Universality and Evolution. Univ. of Cal. Press, Berkeley, California, USA, 1969.
…-based image querying. In IEEE Workshop on Content-Based Access of Image and Video Libraries.
C. Carson and V. E. Ogle. Storage and retrieval of feature data for a very large online image collection. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.
C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, and W. Equitz. Efficient and effective querying by image content. Journal of Intelligent Information Systems, 3(4):231-262, July 1994.
M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by image and video content: The QBIC system. IEEE Computer, 28(9), 1995.
Figure 2. The performance of three examples, using the sum combining function and Euclidean distance, is not markedly better than that of two-example querying.
Figure 3. Increasing the number of query examples using the Manhattan distance and the sum combining function.
A. Gupta. Visual information retrieval: A Virage perspective. Technical report, Virage Inc., 9605 Scranton Road, San Diego, CA 92121, 1997.
…toolbox for navigating large image databases. In Proc. IEEE International Conference on Image Processing, pages 568-571, Santa Barbara, California, October 1997.
B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), 1996.
…generalized test bed for … Information Resources Management Association.
…V. Ramakrishna, and … image data modelling. In Proc. Australasian Database Conference, Australia, 2-3 February 1998.
International Commission on Illumination. International Lighting Vocabulary.
C. Poynton. Frequently asked questions about color, 1997.
Digital Image Processing. Wiley, New York, second edition, 1999.
…Lin. Gabor histogram feature for content-based image retrieval. In Proc. Digital Image Computing.
Figure 4. The Manhattan distance measure performs better than the Euclidean distance.
G. Salton. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, Massachusetts, 1989.
S. Santini and R. Jain. Similarity matching. In Proceedings of the Second Asian Conference on Computer Vision, pages 571-580, Singapore, December 1995.
…Hepplewhite, and J. Stonham. Fuzzy colour … content-based image retrieval. Technical report, Electrical and Electronic Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK.
…Perceptual colour and texture queries using stackable mosaics. In Proceedings of the International Conference on Multimedia Computing and Systems.
J. R. Smith and S.-F. Chang. VisualSEEk: A fully automated content-based image query system. In Proc. ACM Multimedia 96, Boston, MA, November 1996.
I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers, Los Altos, CA.