Are Two Pictures Better Than One?

S. M. M. Tahaghoghi    James A. Thom    Hugh E. Williams
Department of Computer Science, RMIT University,
GPO Box 2476K, Melbourne 3001, Australia
E-mail: {stahagho, jat, hugh}@cs.rmit.edu.au
Abstract

A major hurdle in practical Content-Based Image Retrieval (CBIR) is conveying the user's information need to the system. One common method of query specification is to express the query using one or more example images. In this work, we consider whether using more examples improves the effectiveness of CBIR in meeting a user's information need. We show that using multiple examples improves retrieval effectiveness by around 9%-20% over single-example queries, but that further improvements from using more than two examples may not justify the added processing required.

Keywords: Image retrieval, multimedia databases, multiple queries, recall and precision.
1. Introduction

Without pictorial material, less value is placed on published material, be it a magazine article, reference software, or an instruction manual. Unfortunately, although we have more digitised images than ever, finding an image that meets an information need is becoming more difficult; searching through large numbers of images is not a trivial task.
There are two well-known methods of finding an image in an image database. One is to manually associate a textual annotation with each image before the image is added to the database: these captions are indexed, and at query time, they are searched. This method suffers from a lack of scalability, and the caption is implicitly linked to the annotator's abstract of the image, not the image content itself. The second method is to use machine vision techniques to automatically recognise pre-defined objects. This method falters when applied to unconstrained domains, where the images are not limited to a fixed number of known categories.
A practical alternative to these traditional approaches is Content-Based Image Retrieval (CBIR). In CBIR, the system produces and stores a summary of each image in the collection, usually as it is added to the database, by extracting feature data from it. Similar data extracted from the user's query is compared with this stored data, and a list of images is presented to the user, sorted by increasing statistical difference from the query. The most common features used in CBIR are those based on colour, texture, and shape.
It seems intuitive that if we are presenting a sample image as the query, using more example images should lead to better retrieval results. For example, if we present the system with an image of a red rose, it is possible that a large number of the top-ranking results will be red objects that are not flowers. However, if we present the system with three images of red roses, we might speculate that the system may extract more features of red roses and more effectively present red roses as high-ranking answers.
We have experimented with multiple-image querying in CBIR on a medium-size image collection using different features, methods of combining results, and techniques for calculating the distance between images. We have found that multiple-image querying with two examples improves retrieval effectiveness by around 9%-20% over single-image querying for closely-matching answers. Importantly, adding more than two images to the query produces only modest further improvement.
2. Background

Content-Based Image Retrieval (CBIR) allows users to pose image queries to a database of images, with the goal of retrieving relevant images that satisfy the user's information need. In a similar way to Information Retrieval (IR) practice, likely relevance is approximated by statistical similarity, where the images returned have the highest estimated statistical likelihood of being perceived as relevant to the query [15, 20]. An answer to an image query is usually an ordered list of images.
A CBIR system stores summaries of each image in the collection, as well as the images themselves. The summaries are usually a representation of one or more features extracted from the images stored in the database. When searching for matching images, the same features are extracted from queries. The query features are then statistically compared with the stored feature data, and a list of images is presented to the user, sorted by similarity. The most common features used in CBIR are those based on colour, texture, and shape.
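As a rough illustration of this pipeline, the sketch below extracts a very coarse RGB colour histogram as a stand-in feature and ranks stored images by distance from the query feature. The paper does not prescribe a particular feature or representation; the function names and the histogram feature here are illustrative assumptions only.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Coarse RGB histogram: a hypothetical stand-in for the colour features discussed below.

    `pixels` is an iterable of (r, g, b) tuples with components in [0, 255].
    """
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1
        count += 1
    # Normalise so that images of different sizes can be compared.
    return [h / count for h in hist]

def search(query_feature, stored_features, distance):
    """Return image names sorted by increasing distance from the query feature."""
    return sorted(stored_features,
                  key=lambda name: distance(stored_features[name], query_feature))
```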
Several existing CBIR systems allow the user to present a single example image as a query. Among these systems are QBIC [4, 5], Virage [6], VisualSEEk [19], and NETRA [7]. Pisaro [18] supports a multiple-example query paradigm, where a user can select small tiles of colour and texture to build up an example mosaic for the query.

The CHITRA system supports multiple-example queries [10, 9], where the user can select any number of example images to pose as a query¹. Analysis of the effectiveness of multiple-example querying, as embodied in the CHITRA system, is the subject of this paper.

¹The CHITRA system can be accessed on the world-wide web at http://kroid.mds.rmit.edu.au/~stahagho/cbir/
2.1. Feature Spaces

Careful choice of the features used in a CBIR system is crucial to system effectiveness. Indeed, the combination of several features to represent an image is also important. The possible features that can be used fall into three primary categories: colour, texture, and shape.

Colour features used to abstract or summarise images include the RGB, Munsell, HSV, L*a*b* (LAB), and L*u*v* (LUV) spaces [12]. Among colour spaces, the simplest and best-known is Red, Green, Blue (RGB); this is largely due to its direct relation to the method of image representation in computer monitors. However, while it is conceptually simple, there is no simple mapping between RGB and human perception of colour.
Human perception of colour is more complex than simple RGB, since we deal with around a dozen colours, both when observing sights and discussing them [1, 3]. This observation permits classification of colours into a small number of perceptually significant colours.
An ideal colour space feature would accurately map to human perception and allow us to accurately estimate how different two images are in terms of colour. Such a colour space has several characteristics. First, it will be linear in terms of human perception: a unit change in the value of one of the colour space components will be equally perceptible across the range of values of this component. Second, it will separate the brightness component from chromaticity, to avoid the effects of varying lighting conditions on perceived colour. Last, a good colour space avoids combinations of opposing colours; some colours (white and black, red and green, and blue and yellow) are diametrically opposed in terms of human perception; we never talk of a "reddish-green" or a "bluish-yellow" [17].
The Munsell colour space is widely acknowledged to be the closest to human perception, but has the disadvantages of being non-linear and difficult to transform from other colour spaces. An approximate, fuzzy version of the Munsell colour space has been developed for application to CBIR [17].
The International Commission on Illumination (CIE²) has progressively developed several colour spaces that are practical for CBIR. In particular, the L*a*b* (LAB) and L*u*v* (LUV) colour spaces have been designed to better match the characteristics of human perception, while remaining mathematically tractable [11]. They have similar characteristics, and largely coexist through lack of agreement between their developers.

²Commission Internationale de l'Éclairage
The YUV colour space separates the luminance (Y) from chrominance. The U and V components are often subsampled at half the rate of that for luminance, since the human eye is not as sensitive to colour variations as it is to variations in luminance [13].
A second feature class used in CBIR is texture. While it is not as effective as colour in most cases, it does provide added discrimination where colour alone does not suffice [14]. We do not discuss texture features in detail here.

The third feature class used in CBIR is shape. Images may be partitioned into regions, and the shape, colour, and texture of these regions can be used for retrieval. Shape features are mostly derived from the moments of the image regions [2]. As with texture, we do not discuss shape-based CBIR in detail here.
We have compared nine feature spaces using single-image queries with the techniques described later in Section 4. In the interest of brevity, results for six of the feature spaces are not reported in this paper. The three best-performing spaces (LAB, LUV, and YUV) were retained for continued investigation. Results with these schemes using single and multiple-example querying are reported in Section 4.
2.2. Distance Measures

The calculation of statistical similarity between a query feature and the features of each database image requires a distance or similarity measure. Two common methods for calculating the distance between two images are the Manhattan (sum of absolute differences) and the Euclidean (sum of squared differences) distance measures. Other distance measures have been developed, each with its own advantages [16].
In addition to comparing the YUV, LUV, and LAB feature spaces, we compare the Manhattan and Euclidean distance measures in this paper.
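To make the two measures concrete, the following minimal sketch computes both distances between two feature vectors of equal length. The vectors are hypothetical; the paper does not prescribe a particular feature representation.

```python
import math

def manhattan_distance(a, b):
    """Sum of absolute component-wise differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Square root of the sum of squared component-wise differences (L2 distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Example with two small, hypothetical feature vectors.
query_feature = [0.2, 0.5, 0.3]
image_feature = [0.1, 0.6, 0.3]
print(manhattan_distance(query_feature, image_feature))  # 0.2
print(euclidean_distance(query_feature, image_feature))  # ~0.141
```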
3. Multiple Example Queries

Almost all CBIR systems only allow single-example queries; that is, database images are retrieved after comparison to a single query image. The results are returned in ranked order of increasing distance from the example image in one or more feature spaces.
In this paper, we investigate whether providing more example images as a query can improve the effectiveness of CBIR. Multiple-example querying provides users with an additional, alternative querying mode permitting expression of different facets of an information need. For example, a user who wishes to find images of red roses may not be concerned whether the images returned show a bunch of roses, a flowerbed of roses, or a single rose. Accordingly, the user may present three images as a single query that illustrates these different groupings of red roses and conveys a broad information need.
When a user cannot find a single image that expresses an information need, multiple-image querying provides a powerful alternative. Consider a case where the user wishes to retrieve images of a red rose but does not have a representative query image. In this case, the user may select two images that together convey the concept: a white rose and a red carnation. While the results are unlikely to solely consist of red roses (we would also expect to see white carnations, white roses, and red carnations), in this case multiple-example querying allows querying to proceed.
In our comparison of possible multiple-example querying schemes, we have restricted ourselves to multiple-example, single-feature queries; that is, distance measurement is based on a single feature. As discussed previously, we additionally restrict the comparison to three colour features and two distance measures.
There are two possible approaches to multiple-example querying. First, when multiple query images are presented, we can combine the image features to form a composite feature and execute a single query. Second, when multiple query images are presented, we can execute multiple queries (one database query per query image) and combine the ranked answer lists. Composites of these approaches are also possible. We investigate the latter approach of combining ranked result sets in this paper.
To illustrate multiple-example querying, consider Figure 1, which shows two example images as points alongside three candidate database images. Which of the three candidate images (A, B, and C) best matches a query represented by Examples 1 and 2? Three simple approaches would be to select:

- Image A, since it is close to one of the examples, Example 1;
- Image B, since it is not too distant from either example;
- Image C, since it has the smallest total distance from the examples.

Figure 1. Image points in a two-dimensional query space.
Finding effective combining functions for multiple-image querying is a difficult problem. In this paper, we consider three approaches: the Sum, Minimum, and Maximum functions. These determine the distance of a particular collection image from the specified multiple example images to be the sum, the minimum, and the maximum of the individual distances respectively.
To process a two-example query, the distance of the candidate image to each example is calculated. Then, the combining function is applied to reduce the multiple distances to a single aggregate value. When this has been performed for all images in the collection, the user is presented with a list of the images, arranged in order of increasing aggregate value.
For the example in Figure 1, these functions would return the best matches as shown below:

Rank   Minimum   Maximum   Sum
1      A         B         C
2      C         C         B
3      B         A         A
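A minimal sketch of this process is shown below. The function names and the two-dimensional "feature" points are illustrative assumptions only; the coordinates were chosen so that the three combining functions reproduce the rankings in the table above.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared component-wise differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_collection(collection, examples, distance, combine):
    """Rank collection images by an aggregate distance to all example images.

    `combine` is one of sum, min, or max, corresponding to the Sum,
    Minimum, and Maximum combining functions described above.
    """
    scored = [(combine(distance(feature, ex) for ex in examples), name)
              for name, feature in collection.items()]
    # A smaller aggregate distance means a closer match.
    return [name for _, name in sorted(scored)]

# Hypothetical query of two example points and three candidate images.
examples = [(1.0, 1.0), (5.0, 1.0)]                            # Example 1, Example 2
collection = {"A": (0.5, 1.0), "B": (3.0, 1.5), "C": (2.9, 1.1)}
print(rank_collection(collection, examples, euclidean, min))   # ['A', 'C', 'B']
print(rank_collection(collection, examples, euclidean, max))   # ['B', 'C', 'A']
print(rank_collection(collection, examples, euclidean, sum))   # ['C', 'B', 'A']
```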
4. Do More Examples Help?

In our experiments, we used a collection of one thousand assorted images. These images are categorised into one of ten concepts: Buildings, Fish, Flowerbeds, Flowers, Greenbeds, Mountains, People, Plants, Sea, and Sunsets. The number of images per concept varied from 35 (for Mountains) to 112 (for Sunsets). We selected and removed 21 images at random from each concept to form a query set of 210 images, leaving 790 images in the database as a test set.
To experiment with multiple-example querying, we partitioned our 210-image query set. First, we began by selecting one image from each of the 10 concept sets of 21 images each. This is a single-example query, and the retrieval performance with this query is recorded; we discuss performance measurement below. Second, another example was extracted from each concept and paired with each of our single examples, forming a two-example query. Last, another example can be extracted from each concept and triples made from each pair. For each query concept set of 21 images, we can produce either 10 independent two-example queries or 7 independent three-example queries. This process of extracting images and producing independent sets can be generalised for four-example and larger queries, although the number of independent queries is dramatically reduced.
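A minimal sketch of this partitioning, assuming the 21 query images per concept are simply held in a list and split into disjoint groups (an interpretation of the procedure above, not a description taken from the paper), is shown below; chunking into groups of k yields floor(21/k) independent k-example queries, which gives the 10 two-example and 7 three-example queries mentioned above.

```python
def independent_queries(concept_images, k):
    """Split a concept's query images into disjoint k-example queries.

    For 21 images per concept this yields 10 two-example queries (k=2)
    or 7 three-example queries (k=3); any leftover images are unused.
    """
    n = len(concept_images) // k
    return [concept_images[i * k:(i + 1) * k] for i in range(n)]

# Hypothetical list of 21 image identifiers for one concept.
fish_queries = [f"fish_{i:02d}.jpg" for i in range(21)]
print(len(independent_queries(fish_queries, 2)))  # 10
print(len(independent_queries(fish_queries, 3)))  # 7
```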
To measure retrieval effectiveness, we use recall-precision, as often used in information retrieval [20]. Recall-precision measurement requires that each image in the database can be classified as either "relevant" or "not relevant" to each query that is posed. In most practical applications, this assessment is impractical, since it is not feasible to assess each image in the database for relevance to each query. However, by making a simplistic assumption that only the images that are members of a concept are relevant answers to a query extracted from that concept, we can approximate recall-precision measurement. For example, for a fish query, only fish concept images are deemed relevant and all images from other concepts are judged as irrelevant answers. This somewhat restrictive assumption allows the practical calculation of effectiveness values.
Precision measures the fraction of the retrieved answers that are relevant at a particular point, that is

    P = (Concept images retrieved) / (Total images retrieved)

Recall, in contrast, measures the fraction of the relevant answers that have been retrieved at a particular point, or

    R = (Concept images retrieved) / (Total concept images)
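As a concrete example of these two measures, the sketch below computes precision and recall at a given cutoff in a ranked answer list; the image names and concept labels are hypothetical.

```python
def precision_and_recall(ranked_answers, relevant, cutoff):
    """Precision and recall over the top `cutoff` answers of a ranked list."""
    retrieved = ranked_answers[:cutoff]
    hits = sum(1 for image in retrieved if image in relevant)
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return precision, recall

# Hypothetical ranked answers for a "fish" query; 3 of the top 5 are fish images.
ranked = ["fish_01", "sea_04", "fish_07", "fish_12", "sunset_02"]
relevant = {"fish_01", "fish_07", "fish_12", "fish_20"}  # all fish-concept images
print(precision_and_recall(ranked, relevant, cutoff=5))  # (0.6, 0.75)
```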
Conventionally, precision is reported for each of eleven recall values, at 10% intervals from 0% to 100%.
Interpolated precision values are often used when discussing average effectiveness. The interpolated precision at a given recall value R is the maximum actual precision that appears at any recall value equal to or greater than R. Interpolated values are also commonly calculated at 10% increments of recall. Thus, the interpolated precision at 0% recall is the highest precision obtained at any recall value.
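A small sketch of this interpolation, assuming we already have (recall, precision) points observed as relevant answers are found down the ranked list, might look as follows:

```python
def interpolated_precision(points, recall_levels=None):
    """Interpolated precision: the maximum precision at any recall >= R.

    `points` is a list of (recall, precision) pairs observed down the
    ranked answer list; values are fractions in [0, 1].
    """
    if recall_levels is None:
        recall_levels = [i / 10 for i in range(11)]  # 0%, 10%, ..., 100%
    interpolated = []
    for r in recall_levels:
        candidates = [p for rec, p in points if rec >= r]
        interpolated.append(max(candidates) if candidates else 0.0)
    return interpolated

# Hypothetical observed (recall, precision) points for one query.
observed = [(0.25, 0.50), (0.50, 0.40), (0.75, 0.30), (1.00, 0.20)]
print(interpolated_precision(observed))
# [0.5, 0.5, 0.5, 0.4, 0.4, 0.4, 0.3, 0.3, 0.2, 0.2, 0.2]
```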
Table 1 shows the precision at four recall levels (0%, 10%, 20%, and 30%) using single-example queries and the LAB, LUV, and YUV features. We also show the results of two-example queries with the three combining functions (minimum, maximum, and sum), with precision changes shown relative to the one-example figure.
With two examples and the sum combining function, there are significant improvements in retrieval performance. Using the YUV colour space, with the Euclidean distance and the sum combining function, adding a second example image to a query improves precision by between 8.7% and 20.3% at low recall levels. We have calculated confidence levels of above 99%. At higher recall levels, gains in precision are more modest; however, it is frequently argued that users are more concerned with high precision at lower recall levels [20].
The improvement in effectiveness with the LAB and LUV features is less than that with YUV. While YUV is slightly less effective as a single-example scheme, two examples with YUV are more effective than any other single or multiple-example scheme. Interestingly, the sum combining function is the only consistently effective scheme, suggesting that images that are close to both examples in the query are more likely to be relevant than images that are close to only one of the examples.
Figure 2 shows the effect of adding a third example to each of our 70 independent two-example queries. The improvement shown is much less striking than that of two-example queries over one-example queries. The same trend continues as more examples are added to each independent query, as shown in Figure 3. This figure shows the results of a much smaller experiment, with different distance functions and combining measures; however, we have observed the same trend in increasing the number of examples with all such variations in query parameters. Because of the smaller query set, the confidence level for these results is much lower, and they are indicative only of the performance trend. We conclude that adding a second example can significantly improve retrieval effectiveness but that adding more examples offers little additional improvement.
In the results presented so far, we have compared different feature spaces and combining functions. Figure 4 shows a comparison of the Manhattan and Euclidean distance measurement schemes using the HSV feature space and a sum combining function. Using the Manhattan distance improves effectiveness by around 2%-3% over the Euclidean distance.
We have also experimented with the perceptual colour categories, but obtained poor results that we do not report in detail here. We believe that partitioning colours in this manner is not particularly effective from a CBIR perspective. We also used a 48-dimensional Gabor texture vector [8]. While it did not perform well individually, we empirically observed that it proved useful in special cases for differentiating images with no identifying colour. For example, we noted that in our experiments, texture produced effective results for the Buildings concept, where the colour features performed poorly.
Table 1. Average retrieval effectiveness of 70 one-image and two-image queries. Precision is shown for 0%, 10%, 20%, and 30% recall and the LAB, LUV, and YUV features. For two-image queries, results of the minimum, maximum, and sum combining functions are shown. The Euclidean distance function is used for similarity calculation.

Recall (%)  Feature  One Image        Two Images (% Precision Change)
                     (Precision %)    Sum      Minimum   Maximum
0           LAB      51.5             + 5.5    - 5.6     +0.9
            LUV      51.3             - 0.5    - 0.4     -4.5
            YUV      50.7             +10.7    + 0.8     +0.2
10          LAB      32.0             + 1.0    -10.8     -5.1
            LUV      30.6             + 6.6    - 8.6     +2.5
            YUV      31.8             +14.2    -10.3     -3.8
20          LAB      24.5             + 2.8    -12.0     -2.0
            LUV      23.3             + 9.6    - 4.1     +4.5
            YUV      24.5             + 8.7    - 5.0     -4.5
30          LAB      18.9             + 7.5    - 5.7     -5.3
            LUV      18.0             +20.1    + 0.6     +2.0
            YUV      17.9             +20.3    + 9.1     +2.1

5. Conclusions
In this paper, we have examined whether presenting multiple image queries to a content-based image retrieval system improves retrieval effectiveness over single-image querying. We have shown that, for selected parameters, using more example images improves retrieval performance. Two-example queries with selected parameters improve retrieval effectiveness by between 9% and 20% over single-example queries. We have also found that using more than three examples in a query is unlikely to improve retrieval significantly.
In future work, we aim to develop heuristic methods to capture the user's requirements. If a system is presented with two different examples, one of a red rose and another of a yellow daffodil, should the system judge that we are interested in only red or yellow objects, red or yellow flowers, or any type of flower? Most humans would not agree on any one solution to this problem, and it is probable that iterative feedback methods must be incorporated into the system to continually re-evaluate the user's opinion of the results.
Content-based image retrieval is becoming more important with the increasing size and prevalence of image repositories. We have shown that multiple-example querying can improve retrieval effectiveness in searching these databases and better meet user information needs.
Acknowledgments

This work was supported by the Australian Research Council and the Multimedia Database Systems group at RMIT University. We thank M. V. (Rama) Ramakrishna and Surya Nepal for their contribution to the CHITRA project. We also express our appreciation to the anonymous referees for their helpful comments.
References

[1] B. Berlin and P. Kay. Basic Color Terms: Their Universality and Evolution. University of California Press, Berkeley, California, USA, 1969.
[2] C. Carson, S. Belongie, H. Greenspan, and J. Malik. Region-based image querying. In Proc. of IEEE Workshop on Content-Based Access of Image and Video Libraries, 1997. In conjunction with IEEE CVPR '97.
[3] C. Carson and V. E. Ogle. Storage and retrieval of feature data for a very large online image collection. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 19(4):19-27, December 1996.
[4] C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, and W. Equitz. Efficient and effective querying by image content. Journal of Intelligent Information Systems, 3(3&4):231-262, July 1994.
[5] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by image and video content: The QBIC system. Computer, 28(9):23-32, September 1995.
Figure 2. The performance of three examples (using the sum combining function, the YUV colour space, and Euclidean distance) is not markedly better than that of two-example querying.
Figure 3. Effect of increasing the number of query examples using the LAB feature space, Manhattan distance, and the sum combining function.
[6] A. Gupta. Visual information retrieval: A Virage perspective. Technical Report Revision 4, Virage Inc., 9605 Scranton Road, Suite 240, San Diego, CA 92121, 1997. URL: http://www.virage.com/wpaper/.
[7] W. Y. Ma and B. S. Manjunath. NETRA: A toolbox for navigating large image databases. In Proc. IEEE International Conference on Image Processing, pages 568-571, Santa Barbara, California, October 1997.
[8] B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):837-842, August 1996.
[9] S. Nepal and M. Ramakrishna. A generalized test bed for image databases. In 10th International Conference of the Information Resources Management Association, pages 926-928, Hershey, Pennsylvania, USA, May 1999.
[10] S. Nepal, M. V. Ramakrishna, and J. A. Thom. Four layer schema for image data modelling. In Australian Computer Science Communications, Vol. 20, No. 2, Proceedings of the 9th Australasian Database Conference, ADC'98, pages 189-200, Perth, Australia, 2-3 February 1998.
[11] International Commission on Illumination. Publication CIE No. 17.4, International Lighting Vocabulary.
[12] C. Poynton. Frequently asked questions about color, 1997. URL: http://home.inforamp.net/~poynton/colorfaq.html.
[13] W. K. Pratt. Digital Image Processing. Wiley, New York, USA, second edition, 1999.
[14] M. Ramakrishna and J. Lin. Gabor histogram feature for content-based image retrieval. In Proc. Fifth International Conference on Digital Image Computing, Perth, Australia, December 1999.
Figure 4. The Manhattan distance measure performs better than the Euclidean distance using the HSV feature space.
[15] G. Salton. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, Massachusetts, 1989.
[16] S. Santini and R. Jain. Similarity matching. In Proceedings of the Second Asian Conference on Computer Vision, ACCV '95, Invited Paper, pages 571-580, Singapore, December 1995.
[17] M. Seaborn, L. Hepplewhite, and J. Stonham. Fuzzy colour category map for content-based image retrieval. Technical Report 701, Department of Electrical and Electronic Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK, 1999.
[18] M. Seaborn, L. Hepplewhite, and J. Stonham. Pisaro: Perceptual colour and texture queries using stackable mosaics. In Proceedings of the International Conference on Multimedia Computing and Systems, Uxbridge, Middlesex, UB8 3PH, UK, 1999.
[19] J. R. Smith and S.-F. Chang. VisualSEEk: A fully automated content-based image query system. In Proc. ACM Multimedia 96, pages 87-98, Boston, MA, November 1996.
[20] I. Witten, A. Moffat, and T. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers, Los Altos, CA 94022, USA, second edition, 1999.