A diffusion approach for interactive image retrieval

HAL Id: hal-00808245
https://hal.archives-ouvertes.fr/hal-00808245
Submitted on 5 Apr 2013
To cite this version:
Hassan Tabout, Youssef Chahir, Abderrahman Sbihi. A diffusion approach for interactive image retrieval. Studia Informatica Universalis, Hermann, 2010, 8 (4), pp.111-127. <hal-00808245>
H. Tabout *, Y. Chahir **, A. Sbihi ***
*LASTID, Ibntofail University, Morocco
h.tabout@ieee.org
** GREYC - CNRS UMR 6072
Université de Caen, France
youssef.chahir@info.unicaen.fr
*** Abdelemalek Essaadi University, Morocco
sbihi@cari-info.org
ABSTRACT. This paper studies the use of multiple-instance semi-supervised learning to solve the image relevance feedback problem. Many multiple-instance learning algorithms have been proposed to tackle this problem, but most of them rely only on a global representation of images. We present a semi-supervised version of multiple-instance learning. By taking into account the multiple-instance and the semi-supervised properties simultaneously, we develop a novel graph-based diffusion algorithm that uses both global and local information. Experiments on a test database containing more than 2000 color seaweed images give promising results.
KEYWORDS: Relevance Feedback, Diffusion, Multi-Instance Learning, Seaweed images
1. Introduction
The present work addresses the indexing and retrieval of algae images. Seaweeds, or macroscopic marine algae, are among the important marine living resources and have been described as promising plants for the future. They have been a source of food, feed and medicine. Agar, carrageenan and alginate are popular examples of products extracted from seaweeds. Seaweeds have been used as food for human beings, feed for animals, fertilizers for plants and a source of various chemicals, and seaweed products appear in our daily lives in one way or another. For example, some seaweed polysaccharides are employed in the manufacture of toothpastes, soaps, shampoos, cosmetics, milk, ice creams, meat, processed food, air fresheners and a host of other items [1, 2]. However, many species are deemed to be harmful [3]. The aim of this study is to manage and use multimedia metadata to facilitate access to our biodiversity heritage. This is done by considering, on the one hand, image analysis and processing tools for describing the content of these images and, on the other hand, the deployment of a search and navigation system for such a database.
With the rapid development of multimedia technologies, large collections of digital pictures have become common, and users would like to retrieve and browse these collections. Consequently, Content-Based Image Retrieval (CBIR) has gained significant interest in the computer vision field [4, 5]. CBIR systems rely on low-level features automatically extracted from images, such as color, texture and shape, to retrieve the images most similar to a query. However, such approaches suffer from the fact that the comparison is performed on automatically extracted features, which should agree with human perception. This requirement is a difficult challenge; the difficulty comes from the semantic gap between the low-level image representation and the higher-level concepts by which humans understand images. An image contains several regions of interest, and each region may have different content and a different semantic meaning. It is therefore natural to segment an image into blobs and extract visual features from each blob. Hence, many local image indexing methods have been proposed [5, 6, 7] to alleviate the problem of the semantic gap. However, such methods may not be sufficient, since they do not take human judgement into account. From a user's perspective, an ideal image retrieval system would support what is referred to as semantic retrieval. The straightforward solution is to annotate each image manually with keywords and then search on those keywords using a text search engine. The underlying principle of this approach is that keywords can capture the semantic content of images more precisely, and thus provide a better means to organize and search an image database. However, manual annotation does not scale and becomes very expensive when the volume of data is large.
A relevance feedback algorithm tries to learn the relationship between the content of an image and its semantic meaning, and thereby to retrieve the images that best match the visual content of the query image. Over successive retrieval sessions, a mapping between semantic meaning and visual content is learned: the system dynamically learns the user's intention and gradually improves its retrieval performance. In this paper, we formulate image relevance feedback as a semi-supervised learning problem and present a novel solution using Multiple-Instance Learning (MIL). In our framework, images are viewed as bags, each of which contains a few instances corresponding to the segmented image regions.
The remainder of the paper is organized as follows. Section 2 summarizes related work. Section 3 describes the segmentation approach. The MIL-based bag feature representation is presented in Section 4. Section 5 gives the overall algorithm. Section 6 provides experimental results and evaluation, and concluding remarks are offered in Section 7.
2. Related works
Image segmentation has attracted much attention in computer vision applications as a preliminary step for the recognition of different image elements or objects, and several algorithms have been introduced to tackle this problem. Here we briefly review the work most relevant to our approach. Targhi et al. [8] proposed a segmentation approach based on the Eigen-transform. The transform provides a measure of roughness by considering the eigenvalues of a matrix formed by inserting the grey values of a square patch around a pixel directly into a matrix of the same size.
Recently, graph-based approaches to image segmentation have gained significant interest [9, 10]. The basic idea of these approaches is to construct a weighted graph G = (V, E) in which each node represents a pixel of the image and the weight of an edge is some measure of the dissimilarity between the two pixels connected by that edge (e.g., based on intensity, color, or some other local feature). This graph is partitioned into components in a way that minimizes some specified cost function of the vertices in the components and/or the boundary between those components. The procedure used here is based on a diffusion model: we define a random walk through a local window by assigning a transition probability to each link. Chahir et al. have used diffusion on graphs to recognize facial expressions [11]. In [12], the authors used a nonlinear approach based on a diffusion kernel to identify the relationship between visual and text modalities. The authors of [13, 14] applied random walks to interactive image retrieval frameworks.
Much work on multiple-instance learning has been devoted to the object-based image retrieval problem [15]. Multiple-instance learning was first introduced by Dietterich et al. [16] for drug activity prediction, and has since been widely used in content-based image retrieval and image annotation [15, 17, 18]. In multiple-instance learning, the training set is composed of many bags, each of which contains many instances. The goal of a MIL algorithm is to produce a classifier that labels unseen bags correctly. A bag is positive if at least one of its instances is positive, and negative if all of its instances are negative. In region-based image retrieval, images are segmented into small regions, and each region represents an instance. Under the MIL setting, each query image that contains the target object is considered a positive bag, while the images labeled negative are considered negative bags. Regions that contain the target object, or lie within it, are considered positive instances, while the others are negative instances.
There has been much work on applying semi-supervised learning (SSL) to practical problems by propagating label information; Zhu [19] gives a detailed survey. However, most of these methods address only the single-instance setting, whereas image retrieval is often posed as a multiple-instance learning problem. In this paper we consider both the multiple-instance and the semi-supervised properties.
Regarding multiple-instance semi-supervised learning, Rahmani and Goldman [20] presented a graph-based semi-supervised learning method for the object-based image retrieval task. It transforms any MI problem into an input for a graph-based single-instance semi-supervised learning method that encodes the MI aspects of the problem, working simultaneously at both the bag and point levels. In [21], Zhou et al. proposed the Multiple-Instance learning by Semi-Supervised Support Vector Machine (MissSVM) algorithm, which tackles multiple-instance problems using semi-supervised learning techniques, in particular a special semi-supervised SVM.
3. Texture Segmentation
Region-based representation of images is an effective way to improve accuracy in image retrieval. Global features do not always represent the salient objects seen in an image: they are cheap to compute but provide only a rough representation of the image content. Retrieval performance can therefore be improved by taking more precise, local information into account. Our segmentation method is a spectral approach based on a random walk process.
Given a graph and a starting node, we move to one of its neighbors at random; then we randomly select a neighbor of this node and move to it, and so on. The random sequence of selected nodes is a random walk on the graph. Let G = (V, E) be a connected graph with n nodes and m edges. A random walker on G moves from a node u to a neighbor node v with probability
$$p(u, v) = \frac{w(u, v)}{d(u)} \tag{1}$$
where w(u, v) is the weight of the edge between vertices u and v, and d(u) is the total weight of the edges incident to vertex u. The sequence of random vertices $(v_t : t = 0, 1, 2, \ldots)$ is a Markov chain. We denote by P the matrix of transition probabilities of this Markov chain, so that
$$P = D^{-1} W \tag{2}$$
where D denotes the diagonal matrix with D(u, u) = d(u) and W is the similarity matrix.
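As an illustration of equations 1 and 2, the following minimal Python sketch row-normalises a small weight matrix into a transition matrix. The function name `transition_matrix` and the toy weights are assumptions made for this example; they do not come from the original implementation.

```python
def transition_matrix(W):
    """Row-normalise a non-negative weight matrix W into P = D^{-1} W."""
    P = []
    for row in W:
        d = sum(row)                    # d(u): total weight of edges at u
        if d <= 0:
            raise ValueError("every node needs at least one weighted edge")
        P.append([w / d for w in row])  # p(u, v) = w(u, v) / d(u)
    return P


# Toy 3-node graph; the weights are made up for illustration.
W = [[0.0, 2.0, 1.0],
     [2.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
P = transition_matrix(W)
```

Each row of `P` sums to 1, so row u can be read as the distribution of the walker's next position when it stands on node u.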
A classical construction of the weight matrix is based on a zero-mean Gaussian kernel with variance $\sigma^2$. Let d(u, v) be some general distance measure between nodes u and v; the weight W(u, v) is then computed as follows:
$$W(u, v) = \exp\left(-\frac{d(u, v)}{\sigma^2}\right) \tag{3}$$
The goal here is to extract both local and non-local information. The graph is therefore constructed from the following definition:
$$W(u, v) = \begin{cases} \exp\left(-\dfrac{\|u - v\|^2}{\sigma_1^2} - \dfrac{\|F(u) - F(v)\|^2}{\sigma_2^2}\right) & \text{if } u, v \in V_{p \times p}(u); \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$
$V_{p \times p}(u)$ is a square window of size p around u. The feature vector F is a texture descriptor based on the mean of anisotropy, contrast and polarity throughout the window.
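A possible reading of equation 4 is sketched below in Python. It assumes that pixels are given as (row, column) coordinates, that the window is centred on u with an odd side p, and that the descriptor F is a short tuple of numbers. The function name `local_weight` and all parameter values are hypothetical.

```python
import math


def local_weight(pu, pv, Fu, Fv, p, sigma1, sigma2):
    """Weight between pixels pu and pv with texture descriptors Fu and Fv.

    Returns 0 when pv lies outside the p x p window centred on pu
    (assumption: p is odd, so the window extends p // 2 pixels each way).
    """
    half = p // 2
    if abs(pu[0] - pv[0]) > half or abs(pu[1] - pv[1]) > half:
        return 0.0
    spatial = sum((a - b) ** 2 for a, b in zip(pu, pv))   # ||u - v||^2
    feature = sum((a - b) ** 2 for a, b in zip(Fu, Fv))   # ||F(u) - F(v)||^2
    return math.exp(-spatial / sigma1 ** 2 - feature / sigma2 ** 2)


# Two adjacent pixels with identical descriptors (illustrative values).
F = (0.2, 0.5, 0.1)
w_near = local_weight((5, 5), (5, 6), F, F, p=5, sigma1=2.0, sigma2=0.5)
w_far = local_weight((5, 5), (5, 9), F, F, p=5, sigma1=2.0, sigma2=0.5)
```

The spatial term penalises distance inside the window, while the feature term penalises texture differences; pixels outside the window are simply not connected.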
We proceed by computing the eigenvalues of the transition matrix P and sorting them in decreasing order. Finally, we compute a texture descriptor at each pixel using the Eigen-transform:
$$\Gamma(G) = \sum_{i=1}^{w} \lambda_i \tag{5}$$
where w is the number of eigenvalues.
For image segmentation, we propose a novel method based on the trace of the Markov matrix. Given an image X, the algorithm proceeds as follows:
1) We consider a square neighborhood of size p x p pixels around each pixel.
2) We compute a texture descriptor F throughout the window and assign this descriptor to the pixel. This descriptor is used in the similarity matrix (equation 4).
3) We construct the Markov matrix P.
4) We measure the texture at each pixel by computing the trace of the matrix P.
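The trace is attractive in step 4 because the trace of a square matrix equals the sum of all its eigenvalues, so no eigendecomposition is needed. The sketch below checks this identity on a toy 2 x 2 Markov matrix, using the closed-form eigenvalues of a 2 x 2 matrix. The helper names and the matrix entries are illustrative assumptions.

```python
import math


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def eigenvalues_2x2(M):
    """Closed-form eigenvalues of a 2 x 2 matrix with real eigenvalues."""
    (a, b), (c, d) = M
    mean = (a + d) / 2.0
    root = math.sqrt(mean * mean - (a * d - b * c))
    return mean + root, mean - root


# Toy 2-node Markov matrix (rows sum to 1); values are illustrative.
P = [[0.8, 0.2],
     [0.4, 0.6]]
lam1, lam2 = eigenvalues_2x2(P)
texture = trace(P)   # same value as lam1 + lam2, without eigenvalues
```

For an n x n matrix the trace costs n additions, which is one way to understand the lower running time of the trace-based variant.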
To test the performance of each algorithm (the sum of eigenvalues and the trace of P), several criteria were used. Let tp, tn, fp and fn denote the numbers of pixels labelled as true positive, true negative, false positive and false negative, respectively. Then
$$\mathrm{Accuracy} = \frac{tp + tn}{tp + fp + fn + tn} \tag{6}$$
$$\mathrm{Precision} = \frac{tp}{tp + fp} \tag{7}$$
$$\mathrm{Recall} = \frac{tp}{tp + fn} \tag{8}$$
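Equations 6 to 8 translate directly into code. The short Python sketch below assumes that the four pixel counts are already available and that no denominator is zero; the function name and the example counts are made up.

```python
def segmentation_scores(tp, tn, fp, fn):
    """Pixel-level accuracy, precision and recall (equations 6 to 8)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Illustrative counts for one segmented image (not taken from the paper).
scores = segmentation_scores(tp=60, tn=25, fp=10, fn=5)
```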
Experimental results are presented in Table 1. The two algorithms perform comparably in terms of pixel accuracy, precision and recall. However, the Eigen-transform method requires more time than the proposed algorithm.
Table 1. Average performance on tested images.
Method                Accuracy   Precision   Recall
Sum of eigenvalues    0.70       0.74        0.75
Trace of P            0.69       0.74        0.74
To compare our method with an existing one, we chose the normalized cuts (Ncut) technique of Shi and Malik [10]. We processed a group of images with our segmentation method and compared the results with those of the Ncut algorithm, run with the optimal parameters given by its authors [10]. Figure 1 shows a comparison between the proposed approach and the Ncut algorithm. On these images, our algorithm localizes regions better than the Ncut method.
Figure 1. A comparison between the proposed approach and the Ncut algorithm. (a) and (d): original images. (b) and (e): output of the Ncut algorithm. (c) and (f): output of our algorithm.
4. Graph-based multi instance semi supervised learning
The goal of this paper is to predict the labels of the unlabeled images. Semi-supervised learning is very useful for such problems, and in this paper we present a graph-based learning algorithm. The key to semi-supervised learning is the prior assumption of consistency, which means: (1) nearby points are likely to have the same label; and (2) points on the same structure (typically referred to as a cluster or a manifold) are likely to have the same label [22]. The first assumption is local, while the second one is global. We integrate both local and global information during learning.
In this section we introduce the relevance feedback procedure in detail and describe a multi-instance semi-supervised learning method that combines local and global information.
4.1. Local Representations
In multi-instance learning, the training set is composed of many bags, each of which includes many instances. A bag is labeled positive if it contains at least one positive instance; otherwise it is labeled negative. The task is to learn a concept from the training set that labels unseen bags correctly.
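The bag-labeling rule can be written in one line. The following Python sketch assumes that instance labels are encoded as +1 and -1; the function name `bag_label` is hypothetical.

```python
def bag_label(instance_labels):
    """Return +1 if at least one instance is positive, otherwise -1."""
    return 1 if any(label > 0 for label in instance_labels) else -1


# An image with three regions, one of which shows the requested species.
positive_bag = bag_label([-1, -1, 1])
negative_bag = bag_label([-1, -1, -1])
```

In practice the instance labels are unknown at training time; only the bag labels are observed, which is what makes the setting ambiguous.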
In region-based image retrieval, each region is an instance, and the set of regions that come from the same image can be treated as a bag. We annotate an image as positive if at least one region in the image has the semantic meaning of the requested species. Given an image containing several regions, we can expect that at least one region will correspond to the user's interest, even if the segmentation is imperfect. Hence, the image retrieval problem is in essence identical to the MIL setting. Each image in the database is segmented, and the color and texture features of each region are computed. The image is then seen as a bag containing many instances (feature vectors).
Based on the segmentation results, the representative color feature of each region is computed from its color moments. The representative texture feature of each region is the average anisotropy, polarity and contrast.
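The text does not spell out which color moments are used. A common choice, assumed in the sketch below, is the mean, standard deviation and skewness of each of three color channels, which gives nine values per region. The function name and the toy pixels are hypothetical.

```python
import math


def color_moments(pixels):
    """First three colour moments per channel for a list of 3-channel pixels.

    Returns 9 values: mean, standard deviation and skewness of each channel.
    """
    n = len(pixels)
    features = []
    for channel in zip(*pixels):
        mean = sum(channel) / n
        var = sum((x - mean) ** 2 for x in channel) / n
        third = sum((x - mean) ** 3 for x in channel) / n
        # Signed cube root keeps the skewness in the same units as the data.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        features.extend([mean, math.sqrt(var), skew])
    return features


# A toy region of three pixels (values are made up).
region = [(0, 0, 0), (0, 0, 0), (3, 3, 3)]
moments = color_moments(region)
```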
4.2. Global representations
Inaccurate image segmentation may make the MIL-based bag feature representation imprecise and therefore decrease the retrieval accuracy. We add global features to address this problem. To compensate for the limitations associated with a specific color space and a specific texture representation, we construct the global features differently from the regional features. Color histograms represent the color features, whereas the representative texture feature is the average energy in each high-frequency band after a 2-level wavelet decomposition. Once the global features of all the training images have been obtained, they are collected in a separate global bag.
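The wavelet family is not specified in the text. The sketch below assumes a Haar wavelet and a grey-level image whose sides are divisible by 4; two levels then yield three detail bands each, that is, six average-energy values. All function names are hypothetical.

```python
def haar_step(img):
    """One level of a 2-D Haar transform.

    Returns the approximation band and three detail bands
    (horizontal, vertical and diagonal differences).
    """
    rows, cols = len(img) // 2, len(img[0]) // 2
    approx = [[0.0] * cols for _ in range(rows)]
    dh = [[0.0] * cols for _ in range(rows)]
    dv = [[0.0] * cols for _ in range(rows)]
    dd = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            approx[i][j] = (a + b + c + d) / 2.0
            dh[i][j] = (a - b + c - d) / 2.0
            dv[i][j] = (a + b - c - d) / 2.0
            dd[i][j] = (a - b - c + d) / 2.0
    return approx, dh, dv, dd


def mean_energy(band):
    """Average squared coefficient of one band."""
    values = [x * x for row in band for x in row]
    return sum(values) / len(values)


def wavelet_texture(img, levels=2):
    """Average energy of the detail bands at each level (3 per level)."""
    features, approx = [], img
    for _ in range(levels):
        approx, dh, dv, dd = haar_step(approx)
        features.extend(mean_energy(band) for band in (dh, dv, dd))
    return features


# Toy 4 x 4 image with vertical stripes (values are made up).
stripes = [[0, 1, 0, 1] for _ in range(4)]
texture = wavelet_texture(stripes)
```

With this striped image, only the first-level horizontal-difference band carries energy, which shows how the six values separate orientations and scales.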
5. Algorithm
The interactive system asks the user questions whose answers make it possible to reduce the semantic gap, according to the following steps:
Step 1. The system compares the query image with each image in the database using feature vectors. In this first retrieval stage, similarity is measured by Euclidean distance, and the most similar images are returned and displayed.
Step 2. The user annotates the displayed images as positive or negative according to his/her interest.
Step 3. The system then predicts the labels of the unlabeled images. Let $X = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\} \subset \mathbb{R}^m$ denote the image set and $L = \{+1, -1\}$ the label set. The first l points $x_i$ ($i \le l$) are labeled as $y_i \in L$, and the remaining points $x_u$ ($l + 1 \le u \le n$) are unlabeled. Here $y_i = +1$ if the image $x_i$ belongs to the user's class of interest and $y_i = -1$ otherwise. We define a function F, which assigns a value to each image $x_i$ [23], by
$$F = (I - \alpha S)^{-1} y \tag{9}$$
where I denotes the identity matrix, $\alpha$ is a parameter in (0, 1), and S is the normalized similarity matrix constructed in the algorithm summarized below.
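The closed form of equation 9 can be read as the fixed point of an iterative propagation, as in the local and global consistency method of Zhou et al. [22]. The short derivation below is a sketch under that assumption; the iteration itself is not written out in this paper.

```latex
\begin{align*}
F(t+1) &= \alpha S F(t) + (1-\alpha)\, y, \qquad 0 < \alpha < 1, \\
F^{*} &= \lim_{t \to \infty} F(t) = (1-\alpha)\,(I - \alpha S)^{-1} y .
\end{align*}
```

The limit exists because the eigenvalues of $S = D^{-1/2} W D^{-1/2}$ lie in $[-1, 1]$, so the spectral radius of $\alpha S$ is below 1. The positive constant $1 - \alpha$ changes neither the sign nor the ranking of the entries of $F^{*}$, so it can be dropped, which leaves the expression of equation 9.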
In graph-based learning, feature vectors are arranged in a weighted undirected graph. The graph is characterized by a weight matrix W, whose elements $W_{ij} \ge 0$ are similarity measures between vertices i and j, and by its initial label vector. The commonly used weight is defined by equation 3.
The proposed algorithm is summarized as follows:
1) We use the segmentation approach described in Section 3 to segment the images in the database and obtain the regions of each image. We compute the visual features of every image in the database and build the local similarity matrix $W_L$, using K-nearest neighbors so that each vertex is connected only to its k nearest neighbors. We use the average Hausdorff distance in equation 3 to compute the distance between the bags of two images.
2) We extract the global features and construct the global similarity matrix $W_g$, using the Euclidean distance in equation 3. The final similarity is defined as:
$$W = W_L W_g \tag{10}$$
3) We construct the matrix $S = D^{-1/2} W D^{-1/2}$, in which D is a diagonal matrix with $D_{ii} = \sum_{j=1}^{n} w_{ij}$.
4) We label the unlabelled images by computing F (equation 9).
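Steps 3 and 4 can be sketched in plain Python as follows. The sketch takes the combined weight matrix W as given (the construction of the local and global matrices is not reproduced), sets y to 0 for unlabeled images, which is an assumption borrowed from the usual label-propagation convention, and approximates the matrix inverse of equation 9 by fixed-point iteration. Function names, the toy graph and the value alpha = 0.5 are illustrative.

```python
import math


def normalized_similarity(W):
    """S = D^{-1/2} W D^{-1/2}, with D_ii the sum of row i of W."""
    d = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)]
            for i in range(n)]


def propagate(W, y, alpha=0.5, tol=1e-10, max_iter=10000):
    """Approximate F = (I - alpha S)^{-1} y by iterating F <- alpha S F + y."""
    S = normalized_similarity(W)
    n = len(y)
    F = list(y)
    for _ in range(max_iter):
        new_F = [alpha * sum(S[i][j] * F[j] for j in range(n)) + y[i]
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_F, F)) < tol:
            return new_F
        F = new_F
    return F


# Toy graph of 4 images: 0 and 1 are similar, 2 and 3 are similar.
W = [[0.0, 1.0, 0.1, 0.0],
     [1.0, 0.0, 0.0, 0.1],
     [0.1, 0.0, 0.0, 1.0],
     [0.0, 0.1, 1.0, 0.0]]
y = [1.0, 0.0, -1.0, 0.0]   # image 0 relevant, image 2 not, others unlabeled
F = propagate(W, y, alpha=0.5)
```

The sign of each entry of `F` gives the predicted label, and sorting the images by decreasing `F` gives a ranking for the next feedback round. The iteration converges because alpha is below 1.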
6. Discussion and results
The algae image database contains more than 2000 images of various classes, such as Gelidium, Codium and Blidingia. Images are segmented using the algorithm described above, and only regions larger than a threshold are kept. A 12-dimensional low-level feature vector is extracted from each region; it includes 9 color features and 3 texture features. For the global bag, each image is indexed by a 54-dimensional feature vector, which includes 48 components for a color histogram generated in HSV color space and 6 for texture.
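The bags of region vectors described here are compared in the algorithm of Section 5 with an average Hausdorff distance, whose exact formula is not given. One common definition, assumed in the sketch below, averages the nearest-neighbor distances in both directions. The function names are hypothetical, and 2-dimensional vectors stand in for the 12-dimensional region features.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_hausdorff(bag_a, bag_b):
    """Mean nearest-neighbour distance between two bags, in both directions."""
    to_b = sum(min(euclidean(a, b) for b in bag_b) for a in bag_a)
    to_a = sum(min(euclidean(a, b) for a in bag_a) for b in bag_b)
    return (to_a + to_b) / (len(bag_a) + len(bag_b))


def bag_similarity(bag_a, bag_b, sigma):
    """Gaussian weight of equation 3 applied to the bag distance."""
    return math.exp(-average_hausdorff(bag_a, bag_b) / sigma ** 2)


# Two toy images with two and one regions respectively (made-up vectors).
image_1 = [(0.0, 0.0), (1.0, 0.0)]
image_2 = [(0.0, 0.0)]
dist = average_hausdorff(image_1, image_2)
sim = bag_similarity(image_1, image_2, sigma=1.0)
```

Unlike the maximum-based Hausdorff distance, the averaged form is less sensitive to a single badly segmented region.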
To evaluate the proposed algorithm, a total of 40 images, one from each species, were selected as query images. For each query, the top 16 images were retrieved to provide the necessary relevance feedback. In the ideal case, all of the top 16 retrieved images are from the same species as the query. Performance was measured as the average retrieval rate over the 40 query images, where the retrieval rate is defined by [23]:
$$\text{Retrieval rate} = \frac{\text{relevant images}}{\text{class size}} \times 100\%$$
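As a small worked example of this measure, the Python sketch below counts the retrieved images that share the species of the query. The function name, the species list and the class size of 16 are made up for illustration.

```python
def retrieval_rate(retrieved_species, query_species, class_size):
    """Percentage of the query's class found among the retrieved images."""
    relevant = sum(1 for s in retrieved_species if s == query_species)
    return 100.0 * relevant / class_size


# 15 of the 16 retrieved images belong to the species of the query.
top16 = ["Codium fragile"] * 15 + ["Fucus spiralis"]
rate = retrieval_rate(top16, "Codium fragile", class_size=16)
```

Averaging this value over all query images gives the kind of figure reported per iteration.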
Table 2 gives the average retrieval rate, where iter denotes the number of feedback iterations. Two observations can be made from the results. Firstly, relevance feedback substantially improves on the noninteractive scheme (iter=0): the average retrieval rate rises from 70.26% to 90.06% after a single iteration. Secondly, after three sessions of interactive learning, our method gives a promising performance: on average, 93.54% of the correct images are in the top 16 retrieved.
Table 2. Average retrieval rate (%) for 40 query images.
Iterations   Average retrieval rate (%)
iter=0       70.26
iter=1       90.06
iter=2       92.96
iter=3       93.54
In the second experiment we measured retrieval performance for each species. Retrieval effectiveness can be defined in terms of precision and recall. Precision is the percentage of retrieved images that are similar to the query among all retrieved images. Recall is the percentage of retrieved images that are similar to the query among all images in the database that are similar to the query.
Figure 2 shows the precision-recall graphs for four species, Gelidium sesquipedale, Codium fragile, Fucus spiralis and Blidingia minima, after the third round of relevance feedback. The graphs show that the proposed approach improves performance over the noninteractive method. Moreover, adopting the relevance feedback mechanism causes only a slight increase in computation time.
7. Conclusion
In this paper, we have introduced a novel framework for region-based algae image retrieval. We have proposed an image segmentation method based on a spectral decomposition with a local random walk model. We formulate interactive image retrieval as a semi-supervised learning problem and present a novel solution using multiple-instance learning, which helps bridge the semantic gap from the low-level description. The user can directly access the objects of interest, specifically algae species, and find images that contain the requested species. The evaluation showed that the proposed approach gives good results on our test image database.
In future work, we will consider a larger algae database with more species. We also plan to build an image annotation system by mining and refining more relevant semantic information and by building a more suitable connection between image content features and the available semantic information.
Figure 2. Precision-recall graphs comparing the interactive and noninteractive methods for 4 species: (a) Gelidium sesquipedale, (b) Codium fragile, (c) Fucus spiralis and (d) Blidingia minima.
References
[1] V.K. Dhargalkar and N. Pereira, "Seaweed: Promising plant of the millennium", Science and Culture, vol. 71 (3-4), pp. 60-66, 2005.
[2] S. Ayyappan, "Seaweed Cultivation and Utilization", National Academy of Agricultural Sciences, pp. 22, 2003.
[3] R. Akallal, S. Alaoui, T. Givernaud and A. Mouradi, "Intoxications alimentaires associées aux efflorescences d'algues marines nocives", Revue Marocaine, 2001.
[4] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain, "Content based image retrieval at the end of the early years", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, 2000.
[5] C. Carson, S. Belongie, H. Greenspan and J. Malik, "Blobworld: Image segmentation using Expectation-Maximization and its application to image querying", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1026-1038, 2002.
[6] J. Ashley et al., "Automatic and semiautomatic methods for image annotation and retrieval in QBIC", in SPIE Proc. Storage and Retrieval for Image and Video Databases, pp. 24-35, 1995.
[7] Khanh Vu et al., "Image Retrieval Based on Regions of Interest", IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 1045-1049, July/August 2003.
[8] Targhi Tavakoli, A. Hayman, E. Eklundh, J. Shahshahani, "The eigen-transform and applications", ACCV (1), pp. 70-79, 2006.
[9] P. Soundararajan and S. Sarkar, "An in-depth study of graph partitioning measures for perceptual organization", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 6, pp. 642-660, 2003.
[10] J. Shi and J. Malik, "Normalized cuts and image segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, August 2000.
[11] Y. Chahir, Y. Zinbi and K. Aziz, "Catégorisation des expressions faciales par marches aléatoires sur graphe", CORESA 2007.
[12] Rajeev Agrawal et al., "Application of Diffusion Kernel in Multimodal Image Retrieval", MIPR 2007.
[13] H. Sahbi et al., "Graph Laplacian for Interactive Image Retrieval", ICASSP, pp. 817-820, April 4, 2008.
[14] Jana Urban and Joemon M. Jose, "Adaptive Image Retrieval using a Graph Model for Semantic Feature Integration", MIR'06, October 26-27, Santa Barbara, California, USA, 2006.
[15] Changhu Wang et al., "Graph-based Multiple-Instance Learning for Object-based Image Retrieval", International Multimedia Conference, pp. 156-163, 2008.
[16] T. G. Dietterich, R. H. Lathrop and T. Lozano-Perez, "Solving the multiple instance problem with axis parallel rectangles", Artif. Intell., vol. 98, pp. 31-71, 1997.
[17] Changbo Yang and Ming Dong, "Region-based image annotation using asymmetrical support vector machine-based multiple-instance learning", Computer Vision and Pattern Recognition (CVPR'06), pp. 2057-2063, 2006.
[18] C. Yang, M. Dong and F. Fotouhi, "Region based image annotation through multiple instance learning", ACM Conference on Multimedia (MM'05), Singapore, pp. 435-438, Nov. 2005.
[19] X. Zhu, "Semi-Supervised Learning Literature Survey", Computer Sciences Technical Report 1530, University of Wisconsin-Madison, 2005.
[20] R. Rahmani and S. A. Goldman, "MISSL: multiple instance semi-supervised learning", in Proc. of ICML, 2006.
[21] Zhi-Hua Zhou and Jun-Ming Xu, "On the Relation Between Multi-Instance Learning and Semi-Supervised Learning", IMC, pp. 1167-1174, 2007.
[22] Dengyong Zhou et al., "Learning with Local and Global Consistency", Advances in Neural Information Processing Systems 16, pp. 321-328, 2004.
[23] B. S. Manjunath and W. Y. Ma, "Texture features for browsing and retrieval of image data", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 837-842, 1996.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Résumé La description des expressions faciales est un ensemble d'interprétations successives de composantes faciales. Un sens est construit à partir de distances de bas niveaux (présence ou non d'une composante, position éventuelle). Dans cet article, nous proposons une approche basée sur la diffusion géométrique par marches aléatoires sur graphe, pour la catégorisation des expressions faciles. L'idée de base est de considérer un graphe modélisant tous les visages d'une base de grande dimension à travers les descripteurs des différentes composantes faciales. Cette approche permet de caractériser au mieux l'information visuelle en se basant sur les graphes de similarités et l'exploitation de leurs propriétés spectrales. Pour cela, nous proposons un modèle unifié pour l'extraction des points caractéristiques d'une expression faciale, et nous définirons un ensemble de distances caractéristiques auquel nous associons un ensemble de règles logiques.
Conference Paper
Full-text available
There has been much work on applying multiple-instance (MI) learning to content- based image retrieval (CBIR) where the goal is to rank all images in a known reposi- tory using a small labeled data set. Most existing MI learning algorithms are non- transductive in that the images in the repos- itory serve only as test data and are not used in the learning process. We present MISSL (Multiple-Instance Semi-Supervised Learn- ing) that transforms any MI problem into an input for a graph-based single-instance semi- supervised learning method that encodes the MI aspects of the problem simultaneously working at both the bag and point levels. Unlike most prior MI learning algorithms, MISSL makes use of the unlabeled data.
Conference Paper
Interactive image search or relevance feedback is the process which helps a user refining his query and finding difficult target categories. This consists in a step-by-step labeling of a very small fraction of an image database and iteratively refining a decision rule using both the labeled and unlabeled data. Training of this decision rule is referred to as transductive learning. Our work is an original approach for relevance feedback based on Graph Laplacian. We introduce a new Graph Laplacian which makes it possible to robustly learn the embedding of the manifold enclosing the dataset via a diffusion map. Our approach is two-folds: it allows us (i) to integrate all the unlabeled images in the decision process and (ii) to robustly capture the topology of the image set. Relevance feedback experiments were conducted on simple databases including Olivetti and Swedish as well as challenging and large scale databases including Corel. Comparisons show clear and consistent gain of our graph Laplacian method with respect to state-of-the art relevance feedback approaches.
Article
The multiple instance problem arises in tasks where the training examples are ambiguous: a single example object may have many alternative feature vectors (instances) that describe it, and yet only one of those feature vectors may be responsible for the observed classification of the object. This paper describes and compares three kinds of algorithms that learn axis-parallel rectangles to solve the multiple instance problem. Algorithms that ignore the multiple instance problem perform very poorly. An algorithm that directly confronts the multiple instance problem (by attempting to identify which feature vectors are responsible for the observed classifications) performs best, giving 89% correct predictions on a musk odor prediction task. The paper also illustrates the use of artificial data to debug and compare these algorithms.
Conference Paper
In an annotated image database, keywords are usually associated with images instead of individual regions, which poses a major challenge for any region based image annotation algorithm. In this paper, we propose to learn the correspondence between image regions and keywords through Multiple-Instance Learning (MIL). After a representative image region has been learned for a given keyword, we consider image annotation as a problem of image classification, in which each keyword is treated as a distinct class label. The classification problem is then addressed using the Bayesian framework. The proposed image annotation method is evaluated on an image database with 5,000 images.
Conference Paper
Multi-instance learning and semi-supervised learning are different branches of machine learning. The former attempts to learn from a training set consists of labeled bags each containing many unlabeled instances; the lat- ter tries to exploit abundant unlabeled in- stances when learning with a small number of labeled examples. In this paper, we estab- lish a bridge between these two branches by showing that multi-instance learning can be viewed as a special case of semi-supervised learning. Based on this recognition, we pro- pose the MissSVM algorithm which addresses multi-instance learning using a special semi- supervised support vector machine. Experi- ments show that solving multi-instance prob- lems from the view of semi-supervised learn- ing is feasible, and the MissSVM algorithm is competitive with state-of-the-art multi- instance learning algorithms.
Conference Paper
We study in this paper the problem of using multiple-instance semi-supervised learning to solve the object-based image retrieval problem, in which the user is only interested in a portion of the image, and the rest of the image is considered irrelevant. Although many multiple-instance learning (MIL) algorithms have been proposed to solve the object-based image retrieval problem, most of them work only in a supervised manner and do not fully utilize the information of the unlabeled data in the image collection. In this paper, to make use of the large amount of unlabeled data, we present a semi-supervised version of multiple-instance learning, i.e. multiple-instance semi-supervised learning (MISSL). By taking into account both the multiple-instance property and the semi-supervised property simultaneously, a novel regularization framework for MISSL is presented. Based on this framework, a graph-based multiple-instance learning (GMIL) algorithm is developed, in which three kinds of data, i.e. labeled data, semi-labeled data, and unlabeled data, simultaneously propagate information on a graph. Moreover, under the same framework, GMIL can be reduced to a novel standard MIL algorithm (GMIL-M) by ignoring unlabeled data. We theoretically prove the convergence of the iterative solutions for GMIL and GMIL-M. We apply the GMIL algorithm to the object-based image retrieval problem, and experimental results show the superiority of the proposed method. Some experiments on standard MIL problems are also provided to show the competitiveness of the proposed algorithms compared with state-of-the-art MIL algorithms.
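The idea of propagating information on a graph can be illustrated with a generic label-propagation iteration. This is a sketch under stated assumptions, not the GMIL algorithm or its regularization framework: it uses the common update F ← αSF + (1 − α)Y with the symmetrically normalised affinity S = D^(-1/2) W D^(-1/2), and the function name `propagate_labels`, the parameter `alpha`, and the toy graph are all hypothetical.

```python
import math
from typing import List


def propagate_labels(weights: List[List[float]],
                     initial: List[float],
                     alpha: float = 0.5,
                     iterations: int = 200) -> List[float]:
    """Iterate F <- alpha * S * F + (1 - alpha) * Y on a weighted graph.

    weights: symmetric non-negative affinity matrix W.
    initial: initial scores Y (e.g. 1 for a labeled relevant node, 0 otherwise).
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Symmetric normalisation S = D^(-1/2) W D^(-1/2); isolated nodes get zeros.
    s = [[weights[i][j] / math.sqrt(degree[i] * degree[j])
          if degree[i] > 0 and degree[j] > 0 else 0.0
          for j in range(n)]
         for i in range(n)]
    scores = list(initial)
    for _ in range(iterations):
        scores = [alpha * sum(s[i][j] * scores[j] for j in range(n))
                  + (1 - alpha) * initial[i]
                  for i in range(n)]
    return scores


# Toy chain graph 0 - 1 - 2 - 3 with a single labeled node (node 0).
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
scores = propagate_labels(chain, [1.0, 0.0, 0.0, 0.0])
```

On this toy chain the scores decrease with distance from the labeled node, which is the behaviour a retrieval system would use to rank unlabeled items. Because the eigenvalues of S lie in [-1, 1] and alpha is below 1, the iteration converges.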
Conference Paper
The variety of features available to represent multimedia data constitutes a rich pool of information. However, the plethora of data poses a challenge in terms of feature selection and integration for effective retrieval. Moreover, to further improve effectiveness, the retrieval model should ideally incorporate context-dependent feature representations to allow for retrieval on a higher semantic level. In this paper we present a retrieval model and learning framework for the purpose of interactive information retrieval. We describe how semantic relations between multimedia objects based on user interaction can be learnt and then integrated with visual and textual features into a unified framework. The framework models both feature similarities and semantic relations in a single graph. Querying in this model is implemented using the theory of random walks. In addition, we present ideas to implement short-term learning from relevance feedback. Systematic experimental results validate the effectiveness of the proposed approach for image retrieval. However, the model is not restricted to the image domain and could easily be employed for retrieving multimedia data (and even a combination of different domains, e.g. images, audio and text documents). Categories and Subject Descriptors: H.3.3 (Information Storage and Retrieval): Information Search and Retrieval - relevance feedback, retrieval models
Conference Paper
Advances in technologies for scanning, networking, and CD-ROM, lower prices for large disk storage, and acceptance of common image compression and file formats have contributed to an increase in the number, size, and uses of on-line image collections. New tools are needed to help users create, manage, and retrieve images from these collections. We are developing QBIC (query by image content), a prototype system that allows a user to create and query image databases in which the image content (the colors, textures, shapes, and layout of images and the objects they contain) is used as the basis of queries. This paper describes two sets of algorithms in QBIC. The first are methods that allow "query by color drawing," a form of query in which a user draws an approximate color version of an image, and similar images are retrieved. These are automatic algorithms in the sense that no user action is necessary during database population. Secondly, we describe algorithms for semi-automatic identification of image objects during database population, improving the speed and usability of this manually intensive step. Once outlined, detailed queries on the content properties of these individual objects can be made at query time.