Unified Hybrid Image Retrieval System with
Continuous Relevance Feedback
Leszek Kaliciak, Hans Myrhaug, and Ayse Goker
Ambiesense Limited, Aberdeen, Scotland
Abstract—In this paper, we present a unified hybrid image
retrieval system consisting of the following components: various
visual features and their combinations, combination of visual and
textual feature spaces, combination of visual and textual feature
spaces in the context of search refinement, and interactive user
interface with hybrid relevance feedback, exploratory search,
query history, relevance bar, and positive and negative results
panels. In the paper we also introduce two novel hybrid spinoff
models and describe the new continuous relevance feedback
framework that allows us to move away from graded relevance
and shows the relationships between feedback images.
Keywords—Information Fusion, Intelligent User Interfaces,
Hybrid Models, Multimedia Retrieval, Relevance Feedback
Only a few prototype systems utilize both visual and textual
information in the context of relevance feedback (e.g. [1], [14],
[12]). These systems, however, combine the features in an ad-
hoc manner. No inter- or intra-feature relationships are modelled
within the proposed frameworks. Moreover, existing hybrid
prototype systems do not support the user-system interaction
at such a high level as some mono-modal approaches (e.g. [8]).
Commercial systems have only recently started to use
visual content of images (e.g. Baidu, Yandex, Google, iStock
Photo). They usually allow for searching based on one of the
modalities only and offer very little interactivity with the user.
We propose a novel solution to alleviate this problem.
Relevance feedback in Content-based Image Retrieval
(CBIR) helps to narrow down the search by involving the
users in the search refinement process. After issuing a query
and presenting the initial results, the user can provide his/her
relevance feedback. Typically the relevance feedback would
be binary in nature - relevant/not relevant (e.g. [10], [18]). A
few systems would allow for a discrete number of relevance
degrees - e.g. relevant, partially relevant, not relevant or more
degrees [9], [15].
This paper presents our prototype system consisting of an
interactive user interface, a hybrid model for the combination
of visual and textual features, and a hybrid adaptive model
for the combination of features in the context of relevance
feedback.
The novel interactive interface offers the functionalities of
mono-modal prototype systems and more: continuous degrees
of relevance represented in the form of a relevance bar onto
which a user can drag and drop feedback images in a natural
way, integrated with the hybrid relevance feedback model.
Other functionalities include: exploratory search (browsing
through positive and negative results, positive and negative
results may become the current query, total control over the
query history and images in the feedback set), query history,
zoom in and out on the images of interest, positive and negative
results which can be both utilized as feedback, selection of
visual features and their combinations.
Figure 1 shows our user interface. The panels and func-
tionalities of the interface are: Visual Example Panel, Positive
and Negative Results Panels, Relevance Feedback Panel, Query
History Panel, show/hide panel description button, help button,
two reset buttons for resetting the set of feedback images
or the query history, “reset all” button for resetting both the
set of feedback images and the query history, expandable
“collection” and “visual features” lists for selecting image
collections and specific visual features and their combinations.
One of the example use cases of the system could be: a
user selects an image as a visual example (“a woman with
tattoo and sunglasses”), performs the search, narrows down
the search by dragging and dropping images from both positive
and negative result panels (concept tattoos), and clicks the refine
button. Then the user utilizes one of the result images as a
new query and clicks the pagination buttons to find more relevant
images (exploratory search).
Visual Example Panel: here, the user can browse the data
collection by clicking the browse button or exploiting the quick
browse panel (visual example). Then, an image can be selected
which will represent a visual example, to query the system.
Thus, the selected current query will be displayed in the panel
current query. A double mouse click on the query image will
result in a full screen mode image display. Having selected the
visual example, the user can click the search button which will
perform the data collection search based on the hybrid model.
Positive and Negative Results Panels display the posi-
tive/negative results starting from the most/least relevant (top
left) image. Each result image can be displayed in a full
screen mode by double clicking it. Additionally, each result
image can be dragged and dropped in the current query panel
and thus become a current query itself. Buttons marked as
“<<” and “>>” allow the user to browse the search results
(exploratory search). Images from the positive/negative panel
results can also be dragged and dropped in the relevance bar,
thus indicating the level of relevance of specific images.
Fig. 1: User interface.
Relevance Bar indicates the levels of relevance of feed-
back images. The user can always change the levels of rele-
vance of the feedback images already placed on the relevance
bar. The degree of relevance is naturally represented in the form
of a spectrum, with one end corresponding to the positive
feedback, and the other end corresponding to the negative
feedback. If the current query is selected and the user needs
to continue the search and already gave the feedback to the
system, then clicking the refine button will narrow down and
also correct the search.
The expandable Query History Panel, as the name sug-
gests, shows the previously issued queries. These queries are
also utilized during the search refinement process.
Other functionalities provided by our interface are: reset
buttons for resetting either the relevance feedback or query
history, or both, show/hide buttons display or hide the panels’
descriptions, the help button displays descriptions of provided
functionalities, a left mouse click on the collection button
displays a list of data collections, a left click on the visual
features button displays a list of visual features, and various
combinations of visual features.
Existing image retrieval systems that incorporate interac-
tion with the user in the form of relevance feedback utilise
binary (relevant/not relevant) or graded relevance (e.g. rel-
evant, partially relevant, not relevant). The evidence suggests
that graded relevance judgments play an important role in
the information seeking process [2], [16]. This should not be
surprising, as relevance is rarely binary in nature and more
degrees of relevance offer the user more flexibility and more
freedom of choice. However, the existing implementations
of graded relevance feedback do not show the relationships
between images in terms of their relevance. By analogy, let us
imagine a set of unordered whole numbers. When placed in a
real coordinate space of one dimension (an axis), the relative
position of these numbers (in relation to each other) allows
us to immediately compare them because the number on the
right-hand side is always greater than the number on the
left-hand side. The absolute position of the numbers (in relation
to the entire axis) indicates their value. Another advantage of
placing these numbers on an axis is that the aforementioned
relationships make it easy to change one or more of these
numbers (relevance updating) or add a new one to the set.

Fig. 2: Relevance bar, continuous degree of relevance.
In this section we present a relevance continuum consisting
of the relevance bar and continuous relevance feedback model.
We can envision a relevance bar representing a spectrum of
possible relevance degrees (see Figure 2). A user can drag and
drop feedback images onto this spectrum according to their
relevance degrees.
Not only the absolute (in relation to the entire bar) but also
relative (in relation to each other) relevance of feedback images
are shown on the spectrum bar. Thus, it is straightforward to
add a new image to the feedback set. Moreover, updating of
the relevance degree of feedback images can be performed
naturally by re-arranging their order. The relevance bar shows
the relationships between the relevance degrees associated with
the images selected for feedback. This in turn makes it
straightforward to add new images to the feedback set and to
update it - the feedback images can simply be rearranged on the
relevance continuum.
The relevance degree from the relevance bar can be calcu-
lated as
w_i = −1 + 2 · dist(0, im_i^Pos)   (1)

dist(·, ·) ∈ [0, 1]   (2)

w_i ∈ [−1, 1]   (3)

where dist(0, im_i^Pos) represents the distance from zero to the
feedback image position on the relevance spectrum, as shown
in Figure 2.
Now, the weights w_i calculated in this way can be utilized to
modify the importance of individual feedback images. For example,
the weights can be incorporated into the classic Rocchio
algorithm (or Rocchio algorithm inspired variations) which
modifies the query so that it moves closer to the centroid
of relevant documents and further away from the centroid of
irrelevant ones
Q_m = a · Q_o + b · (1/|D_r|) · Σ_{D_j ∈ D_r} D_j − c · (1/|D_nr|) · Σ_{D_k ∈ D_nr} D_k   (4)

where Q_m is the modified query vector, Q_o is the original query
vector, D_j is a related document vector, D_k is a non-related
document vector, a is the original query weight, b is the related
documents’ weight, c is the non-related documents’ weight, D_r is
the set of related documents, and D_nr is the set of non-related
documents.
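The mapping from bar position to weight (Eq. (1)) and the use of the resulting weights in a Rocchio-style update can be sketched as follows (a minimal illustration, not the system’s actual implementation; the function names, the bar-length parameter, and the scaling of negative feedback are our assumptions):

```python
import numpy as np

def relevance_weight(position, bar_length):
    """Map a feedback image's position on the relevance bar to a
    weight in [-1, 1]: w_i = -1 + 2 * dist(0, position)."""
    dist = position / bar_length              # normalised distance in [0, 1]
    return -1.0 + 2.0 * dist

def weighted_rocchio(query, feedback, weights, a=1.0, b=0.75, c=0.25):
    """Rocchio-style update in which each feedback vector is scaled
    by its continuous relevance weight w_i from the relevance bar."""
    query = np.asarray(query, dtype=float)
    feedback = np.asarray(feedback, dtype=float)
    weights = np.asarray(weights, dtype=float)
    pos, neg = weights > 0, weights < 0
    modified = a * query
    if pos.any():   # move towards the weighted centroid of relevant images
        modified += b * np.mean(weights[pos, None] * feedback[pos], axis=0)
    if neg.any():   # move away from the weighted centroid of irrelevant ones
        modified -= c * np.mean(-weights[neg, None] * feedback[neg], axis=0)
    return modified
```

An image dropped at the negative end of the bar receives weight −1, one at the positive end receives +1, and intermediate positions contribute proportionally.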
We have implemented two tensor-based hybrid models in
our prototype system demo. The data in Multimedia Retrieval
is often multimodal and heterogeneous. Tensors, as generaliza-
tions of vectors and matrices, provide a natural and scalable
framework for handling data with inherent structures and
complex dependencies. There has been a recent renaissance of
tensor-based methods in machine learning. They range from
scalable algorithms for tensor operations (e.g. [13]), novel
models through tensor representations (e.g. [7]), to industry
applications such as Google TensorFlow [19] and Torch [20].
First is the hybrid model for the combination of visual
and textual features. We compute the Euclidean distances on
tensor-ed representations. This is equivalent to combining the
Euclidean metric (visual similarity), and the cosine similarity
(textual similarity) in a specific late fusion form. The advan-
tages of this hybrid approach are discussed in [6]. Thus, we
combine the distances as

s_e(d_1^v, d_2^v)² · s_c(d_1^t, d_2^t) − 2 · s_c(d_1^t, d_2^t) + 2   (5)
where s_e denotes the Euclidean distance, s_c represents the cosine
similarity measure, d_1^v and d_1^t denote the visual and textual
representations of the query respectively, d_2^v and d_2^t denote the
visual and textual representations of an image in the data
collection respectively, and ⊗ is the tensor operator. Thus, we
utilize Euclidean distance to measure the similarity between
visual representations, and the cosine similarity to measure the
similarity between textual representations. The Euclidean distance,
in the case of our mid-level visual features, performs better than
cosine similarity. This is because the normalization of our
local features hampers the retrieval performance. On the other
hand, cosine similarity in textual space performs better than
other similarity measurements.
It can be shown that the aforementioned combination
of measurements performed on individual feature spaces is
equivalent to

s_e(d_1^v, d_2^v)² · s_c(d_1^t, d_2^t) − 2 · s_c(d_1^t, d_2^t) + 2 = s_e(d_1^v ⊗ d_1^t, d_2^v ⊗ d_2^t)²   (6)
Thus, the implemented model is equivalent to computing
the Euclidean distance on a tensored representation. This
equivalence helps us avoid the curse of dimensionality (no
need to perform the tensor operation).
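Assuming L2-normalised visual and textual vectors, this equivalence can be checked numerically (a sketch; the fusion form used below is our reconstruction of the combination above):

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(v):
    return v / np.linalg.norm(v)

# unit-norm visual (v1, v2) and textual (t1, t2) representations
v1, v2 = unit(rng.normal(size=5)), unit(rng.normal(size=5))
t1, t2 = unit(rng.normal(size=7)), unit(rng.normal(size=7))

# late fusion of the per-space measurements
se2 = float(np.sum((v1 - v2) ** 2))   # squared Euclidean distance (visual)
sc = float(t1 @ t2)                   # cosine similarity (textual)
late_fusion = se2 * sc - 2.0 * sc + 2.0

# squared Euclidean distance computed on the tensored representations
tensored = float(np.sum((np.kron(v1, t1) - np.kron(v2, t2)) ** 2))

assert np.isclose(late_fusion, tensored)
```

Since no tensor product has to be materialised at query time, the per-space combination avoids the dimensionality blow-up noted above.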
The second implemented hybrid model is the hybrid rele-
vance feedback model for image refinement. Figure 3 presents
the hybrid relevance model at work. It utilizes the correlation
and complementarity between different feature spaces [5].
Moreover, because the query can be correlated with its context to
a different extent ([17], [4]), the implemented model adapts
its weights based on the interactions with the user.¹
¹In this paper, the textual and visual terms refer to image tags and instances
of visual words, respectively. Additionally, the visual and textual context is
represented as a visual and textual subspace of feedback images, respectively.
Fig. 3: Hybrid relevance feedback at work. Initially, users X
and Y issue the same query depicting a road sign and snow
among other concepts - first image from the top. User X
narrows down the search to the concept snow by dragging and
dropping relevant images to the relevance bar - second image
from the top. User Y narrows down the search to the concept
sign - third image from the top. Based on the combination of
text and visual features the top results for the original query get
re-ranked to present both users with different results - second
and third image from the top, respectively.
Let us assume that tr denotes the matrix trace operator,
⟨·|·⟩ represents the inner product, M_1, M_2 are co-occurrence
matrices corresponding to different feature spaces (a subspace
generated by the query vector and vectors from the feedback
set), ⊗ denotes the tensor operator, a, b are different represen-
tations of an image in the collection corresponding to M_1 and
M_2, q_v, q_t denote the visual and textual representations of the
query, c_i, d_i denote the visual and textual representations of the
images in the feedback set, D_q^v, D_f^v denote the density (co-
occurrence) matrices of a visual query and its visual context
(feedback images), D_q^t, D_f^t denote the density matrices of a
textual query and its textual context, and n denotes the number
of images in the feedback set. Then

tr((M_1 ⊗ M_2) · (a^T a ⊗ b^T b)) =
(str_v · ⟨q_v|a⟩² + (1 − str_v) · (1/n) Σ_i ⟨c_i|a⟩²) ·
(str_t · ⟨q_t|b⟩² + (1 − str_t) · (1/n) Σ_i ⟨d_i|b⟩²)   (7)

where str_v, str_t denote the adaptive weights reflecting the
relationship strength between the visual and textual query and
its respective context.
The presented hybrid adaptive model can be easily expanded
to incorporate more features (e.g. textual and multiple
visual features)

tr((⊗_n M_n) · (⊗_n a_n^T a_n))   (8)
Let us assume that the relevance feedback is provided
after the first round retrieval to refine the query. The adaptive
weighting can be interpreted in the following way:

1) small ⟨D_q|D_f⟩: weak relationship between the query and
its context, so the context becomes important. We adjust the
probability of the original query terms; the adjustment
will significantly modify the original query.

2) big ⟨D_q|D_f⟩: strong relationship (similarity) between the
query and its context, so the context will not help much.
The original query terms will tend to dominate the
whole term distribution in the modified model. The
adjustment will not significantly modify the original
query.
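A minimal sketch of such an adaptive weighting is given below (illustrative only: we use the normalised overlap tr(D_q D_f) as a stand-in for the relationship-strength measure, and all names are our assumptions):

```python
import numpy as np

def density(vectors):
    """Density (co-occurrence) matrix of a set of representation vectors."""
    V = np.asarray(vectors, dtype=float)
    return V.T @ V / len(V)

def adaptive_strength(q, feedback):
    """Illustrative relationship strength between a query and its context:
    the normalised overlap tr(D_q D_f), which lies in [0, 1]."""
    q = np.asarray(q, dtype=float)
    Dq = np.outer(q, q)                     # density matrix of the query
    Df = density(feedback)                  # density matrix of the context
    norm = np.linalg.norm(Dq) * np.linalg.norm(Df)
    return float(np.trace(Dq @ Df) / norm) if norm else 0.0

def adaptive_score(a, q, feedback, strength):
    """Single-space adaptive measurement: a weighted mixture of the
    query projection and the mean context projections."""
    a, q = np.asarray(a, dtype=float), np.asarray(q, dtype=float)
    fb = np.asarray(feedback, dtype=float)
    return float(strength * (q @ a) ** 2
                 + (1.0 - strength) * np.mean((fb @ a) ** 2))
```

A small strength lets the context dominate the score (case 1 above), while a large strength keeps the original query dominant (case 2).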
The visual features implemented in our prototype model
comprise: edge histogram, homogeneous texture, bag of visual
words features, colour histogram, co-occurrence matrix, and
their combinations.
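As an illustration of one of the simpler features, a per-channel colour histogram and a naive concatenation-based combination could look like this (a sketch; the system’s actual binning and combination schemes are not specified here):

```python
import numpy as np

def colour_histogram(image, bins=8):
    """Per-channel colour histogram, L1-normalised per channel and
    concatenated. `image` is an H x W x 3 uint8 array."""
    feats = []
    for ch in range(3):
        h, _ = np.histogram(image[..., ch], bins=bins, range=(0, 256))
        feats.append(h / h.sum())
    return np.concatenate(feats)

def combine(*features):
    """Early-fusion combination of feature vectors by concatenation
    (one of several possible combination schemes)."""
    return np.concatenate(features)
```

The same `combine` step could stack, for example, a colour histogram with a texture descriptor before similarity measurement.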
In the following subsections we present different variations
of our hybrid model.
A. Hybrid Relevance Feedback Model Based on the Orthogo-
nal Projection
First, let us assume that, just like in the original model,
M_1 = r_1·D_q^v + r_2·D_f^v and M_2 = r_1·D_q^t + r_2·D_f^t. We can
decompose (eigenvalue decomposition) the density matrices
M_1, M_2 to estimate the bases² (p_j^v, p_j^t) of the subspaces
generated by the query and the images in the feedback set.
Now, let us consider the measurement

⟨P_1 ⊗ P_2 | a^T a ⊗ b^T b⟩   (11)

where P_1, P_2 are the projectors onto the visual and textual sub-
spaces generated by the query and the images in the feedback
set (P_1 = Σ_i p_i^v (p_i^v)^T, P_2 = Σ_j p_j^t (p_j^t)^T), and a, b are the
visual and textual representations of an image from the data set.
Because the tensor product of the projectors corresponding to the
visual and textual Hilbert spaces (H_1, H_2) is a projector onto the
tensored Hilbert space (H_1 ⊗ H_2), our similarity measurement can
be interpreted as the probability of the relevance context, i.e. the
probability that the vector a ⊗ b was generated within the subspace
(representing the relevance context) generated by M_1 ⊗ M_2.
Pr_v = Σ_i ⟨p_i^v|a⟩²   (12)

Pr_t = Σ_j ⟨p_j^t|b⟩²   (13)
We can see that this measurement is equivalent to the
weighted combinations of all the probabilities of projections
for all the images involved. In quantum mechanics, the square
of the absolute value of the inner product between the initial
state and the eigenstate is the probability of the system
collapsing to this eigenstate. In our case, the square of the
absolute value of the inner product can be interpreted as a
particular contextual factor influencing the measurement.
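A small sketch of this projection-based measurement (our construction; it relies on the identity tr(P aa^T) = |P a|² for a projector P, and all names are our assumptions):

```python
import numpy as np

def projector(vectors):
    """Projector onto the subspace spanned by `vectors`, obtained by
    eigendecomposition of their density matrix."""
    V = np.asarray(vectors, dtype=float)
    D = V.T @ V / len(V)
    w, U = np.linalg.eigh(D)
    basis = U[:, w > 1e-10]      # keep eigenvectors with non-zero eigenvalue
    return basis @ basis.T

def relevance_probability(P1, P2, a, b):
    """Probability-like score of a (unit) tensored vector a ⊗ b under
    the tensored projector P1 ⊗ P2: |P1 a|^2 * |P2 b|^2."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum((P1 @ a) ** 2) * np.sum((P2 @ b) ** 2))
```

An image whose visual and textual representations both lie inside the feedback subspaces scores 1; one orthogonal to either subspace scores 0.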
B. Hybrid Relevance Feedback Model for Image Re-ranking
Another version of the original hybrid relevance feedback
model employs density matrices corresponding to feedback
images only (no query density matrix). The quantum-like
measurement is then utilized to re-rank the top images (from
the first round retrieval). We have discovered that fixed-weight
relevance feedback models which utilize both query and feed-
back information should employ measurement for re-scoring
of the whole data collection. On the other hand, relevance
feedback models which utilize the feedback information only
should employ measurement for re-ranking of the top images.
This is related to the dynamic nature of the importance of the query
and its context, and the adaptive models help alleviate this issue.
One of the main advantages of the hybrid relevance feedback
model for image re-ranking would be its lower computational
cost compared with the re-scoring models.

²It has been highlighted [11] that the orthogonal decomposition may not be
the best option for visual spaces because the receptive fields that result from
this process are not localized, and the vast majority do not at all resemble any
known cortical receptive fields. Thus, in the case of visual spaces, we may want
to utilize decomposition methods that produce non-orthogonal basis vectors.
Thus, the density matrices will now be generated by the
feedback images only, M_1 = D_f^v and M_2 = D_f^t, and the model
will simplify to

tr((M_1 ⊗ M_2) · (a^T a ⊗ b^T b)) = ((1/n) Σ_i ⟨c_i|a⟩²) · ((1/n) Σ_i ⟨d_i|b⟩²)
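The feedback-only scoring and top-k re-ranking step can be sketched as (illustrative; the names and the value of k are our assumptions):

```python
import numpy as np

def feedback_only_score(a, b, C, D):
    """Feedback-only hybrid score: mean squared projections of the
    visual (a) and textual (b) representations onto the visual (C)
    and textual (D) feedback sets, multiplied together."""
    C, D = np.asarray(C, dtype=float), np.asarray(D, dtype=float)
    return float(np.mean((C @ a) ** 2) * np.mean((D @ b) ** 2))

def rerank_top(results, C, D, k=20):
    """Re-rank only the top-k first-round results (cheaper than
    re-scoring the whole collection). `results` is a list of
    (image_id, visual_vec, textual_vec) tuples."""
    top = results[:k]
    top.sort(key=lambda r: feedback_only_score(r[1], r[2], C, D), reverse=True)
    return top + results[k:]
```

Because only the top of the first-round ranking is touched, the cost grows with k rather than with the collection size.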
In this paper, we have presented a unified hybrid image
retrieval system consisting of various visual features and their
combinations, combination of visual and textual feature spaces,
combination of visual and textual feature spaces in the context
of search refinement with adaptive weighting scheme, and
interactive user interface with exploratory search, query his-
tory, continuous degree of relevance, and positive and negative
results panels.
Because the original hybrid model for the combination of
features in the context of user feedback can be modified to
produce other hybrid models, in the paper we describe two
novel spinoff models and discuss their potential advantages.
The relevance continuum for an image retrieval system
consists of the relevance bar and the continuous relevance feed-
back model. The relevance continuum shows the absolute and
relative relationships between feedback images with respect to
their relevance degree. This makes it straightforward to add
new images to the feedback set and also update the feedback
set - the feedback images can be rearranged on the relevance
continuum.
We plan to perform rigorous testing and comparison
of the two variations of the original hybrid model. The
investigated models will be used in the ocean monitoring
application, where the images and videos are captured by
marine robots and made accessible to users via wireless
communication and cloud services. The user will be provided
with a unique solution combining a real-time virtual reality
headset, a 360-degree view from the 360-degree surface
and underwater camera, and augmented reality to remotely
monitor the surface and underwater environment (Figure 4).
Our objective is to enhance the user interaction with the
remote sensing and control applications.

Fig. 4: Ocean Monitoring Application - augmented user interface.

The marine robot
will augment the real-time surface and underwater data stream
with the information about similar (previously encountered)
information objects. The retrieval of similar objects will be
based on the fusion of different types of features in order
to reduce the semantic gap, the difference between machine
representation and human perception of images. Thus, the
real-time visual image of the environment will be augmented
by additional digital information. When using the select
button on the Virtual Reality (VR) control pad, the on-the-fly
image capture and retrieval will present the user with the
visual and textual information related to similar information
objects. A pop-up window will be used to display the retrieval
results which can be freely browsed. The user will be able
to further narrow down the presented top retrieval results by
highlighting the relevant images with one of the control pad
buttons. When using the select button on one of the top result
images, the associated textual information will be displayed.
Acknowledgment: This work has been partially funded
by the CERBERO project no. 732105 - a HORIZON 2020
EU project. The CERBERO project aims at developing a design
environment for Cyber Physical Systems based on two pillars:
a cross-layer model based approach to describe, optimize,
and analyze the system and all its different views concur-
rently; an advanced adaptivity support based on a multi-
layer autonomous engine. AmbieSense is working on a new
type of marine robot with surface and underwater surveillance
capabilities, which is one of the CERBERO use cases.
References

[1] Z. Chen, L. Wenyin, F. Zhang, M. Li, H. Zhang. Web Mining for Web
Image Retrieval. Journal of the American Society for Information Science
and Technology, 52(10):831–839, 2001.
[2] F. Crestani, G. Pasi. Soft Computing in Information Retrieval: Techniques
and Applications. Physica, 50, 2013.
[3] T. Deselaers, D. Keysers, H. Ney. FIRE - Flexible Image Retrieval
Engine: ImageCLEF 2004 Evaluation. Proceedings of the 5th Conference
on Cross-Language Evaluation Forum: Multilingual Information Access
for Text, Speech and Images, 688–698, 2005.
[4] A. Goker, H. Myrhaug, R. Bierig. Context and Information Retrieval,
in Information Retrieval: Searching in the 21st Century. John Wiley and
Sons, 2009.
[5] L. Kaliciak, H. Myrhaug, A. Goker, D. Song. Adaptive Relevance
Feedback for Fusion of Text and Visual Features. The 18th International
Conference on Information Fusion (Fusion 2015), Washington DC, USA,
1322–1329, 2015.
[6] L. Kaliciak, H. Myrhaug, A. Goker, D. Song. On the Duality of Fusion
Strategies and Query Modification as a Combination of Scores. The
17th International Conference on Information Fusion (Fusion 2014),
Salamanca, Spain 2014.
[7] L. Kuang, F. Hao, L. T. Yang, M. Lin, C. Luo and G. Min. A
Tensor-Based Approach for Big Data Representation and Dimensionality
Reduction. IEEE Transactions on Emerging Topics in Computing, 2,
3:280–291, 2014.
[8] H. Liu, S. Zagorac, V. Uren, D. Song, S. Ruger. Enabling Effective
User Interactions in Content-Based Image Retrieval. Proceedings of
the 5th Asia Information Retrieval Symposium on Information Retrieval
Technology, 265–276, 2009.
[9] H. Liu, P. Mulholland, D. Song, V. Uren, S. Ruger. An Information
Foraging Theory-based User Study of an Adaptive User Interaction
Framework for Content-based Image Retrieval. 17th International
Conference on MultiMedia Modeling, (MMM), 6524: 241–251, 2011.
[10] D. Markonis, R. Schaer, H. Müller. Evaluating Multimodal Relevance
Feedback Techniques for Medical Image Retrieval. Information Retrieval
Feedback Techniques for Medical Image Retrieval. Information Retrieval
Journal, 19,1:100–112, 2016.
[11] B.A. Olshausen, D.J. Field. Emergence of Simple-cell Receptive Field
Properties by Learning a Sparse Code for Natural Images. Nature,
381:607–609, 1996.
[12] M. Ortega-Binderberger, S. Mehrotra, K. Chakrabarti, K. Porkaew.
WebMARS: a Multimedia Search Engine. Proceedings of the 12th
Annual ACM International Conference on Multimedia, 314–321, 1999.
[13] L. Qiao, B. Zhang, L. Zhuang and J. Su. An Efficient Algorithm for Ten-
sor Principal Component Analysis via Proximal Linearized Alternating
Direction Method of Multipliers. International Conference on Advanced
Cloud and Big Data (CBD), 283–288, 2016.
[14] T. Quack, U. Monich, L. Thiele, B. S. Manjunath. Cortina: a System
for Large-scale, Content-based Web Image Retrieval. Proceedings of the
12th Annual ACM International Conference on Multimedia, 508–511, 2004.
[15] Y. Rui, T. S. Huang, M. Ortega, S. Mehrotra. Relevance Feedback:
a Power Tool for Interactive Content-based Image Retrieval. IEEE
Transactions on Circuits and Systems for Video Technology, 8,5:644–
655, 1998.
[16] A. Spink, H. Greisdorf, J. Bateman. From Highly Relevant to Not
Relevant: Examining Different Regions of Relevance. Information
Processing and Management, 34, 5: 599–621, 1998.
[17] J. Teevan, S. Dumais, E. Horvitz. Personalizing Search via Automated
Analysis of Interests and Activities. 28th Annual International ACM SI-
GIR Conference on Research and Development in Information Retrieval,
449–456, 2005.
[18] R. Tronci, G. Murgia, M. Pili, L. Piras, G. Giacinto. Imagehunter: a
Novel Tool for Relevance Feedback in Content-based Image Retrieval.
New Challenges in Distributed Information Filtering and Retrieval, 53–
70, 2013.
... • The Ocean Monitoring use case comprises smart video-sensing unmanned vehicles with immersive environmental monitoring capabilities. They serve as "marine eyeballs" that can capture live videos and images of the local on-sea and subsea surroundings [8,9]. The Ocean Monitoring use case combines system and hw/sw co-design levels for development of underwater ocean monitoring robots. ...
Conference Paper
Technical Requirements (TRs) provide a "black box" conceptualization of the target project results with explicit verification tests. The goal of Technical Requirements Elicitation (TRE) is to ensure that all needs of involved stakeholders are being identified and adequately addressed without prescribing how to achieve them. Whilst TRE methodology in product or service development is well known, TRE in large research projects turns far too commonly into an ad-hoc process carried out without the support of a common, solid methodology. The objective of this paper is to propose a methodology for identification of all stakeholders in research projects and their needs based on experience of Horizon 2020 project CERBERO.
Conference Paper
Full-text available
It has been shown that query can be correlated with its context to a different extent; in this case the feedback images. We introduce an adaptive weighting scheme where the respective weights are automatically modified, depending on the relationship strength between visual query and its visual context and textual query and its textual context; the number of terms or visual terms (mid-level visual features) co-occurring between current query and its context. The user simulation experiment has shown that this kind of adaptation can indeed further improve the effectiveness of hybrid CBIR models. Keywords: Hybrid Relevance Feedback, Visual Features, Textual Features, Early Fusion, Late Fusion, Re-Ranking, Adaptive Weighting Scheme
Full-text available
Medical image retrieval can assist physicians in finding information supporting their diagnosis and fulfilling information needs. Systems that allow searching for medical images need to provide tools for quick and easy navigation and query refinement as the time available for information search is often short. Relevance feedback is a powerful tool in information retrieval. This study evaluates relevance feedback techniques with regard to the content they use. A novel relevance feedback technique that uses both text and visual information of the results is proposed. The two information modalities from the image examples are fused either at the feature level using the Rocchio algorithm or at the query list fusion step using a common late fusion rule. Results using the ImageCLEF 2012 benchmark database for medical image retrieval show the potential of relevance feedback techniques in medical image retrieval. The mean average precision (mAP) is used as the evaluation metric and the proposed method outperforms commonly-used methods. The baseline without feedback reached 16 % whereas the relevance feedback with 20 images reached up to 26.35 % with three steps and when using 100 images up to 34.87 % in four steps. Most improvements occur in the first two steps of relevance feedback and then results start to become relatively flat. This might also be due to only using positive feedback as negative feeback often also improves results after more steps. The effect of relevance feedback in automatically spelling corrected and translated queries is investigated as well. Results without mistakes were better than spell-corrected results but the spelling correction more than double results over non-corrected retrieval. Multimodal relevance feedback has shown to be able to help visual medical information retrieval. 
Next steps include integrating semantics into relevance feedback techniques to benefit from the structured knowledge of ontologies and experimenting on the fusion of text and visual information.
Conference Paper
Full-text available
Nowadays, a very large number of digital image archives is easily produced thanks to the wide diffusion of personal digital cameras and mobile devices with embedded cameras. Thus, personal computers, personal storage units, as well as photo-sharing and social-network websites, are rapidly becoming the repository for thousands, or even billions of images (i.e., more than 100 million photos are uploaded every day on the social site Facebook). As a consequence, there is an increasing need for tools enabling the semantic search, classification, and retrieval of images. The use of meta-data associated to images solves the problems only partially, as the process of assigning reliable meta-data to images is not trivial, is slow, and closely related to whom performed the task. One solution for effective image search and retrieval is to combine content-based analysis with feedbacks from the users. In this chapter we present Image Hunter, a tool that implements a Content Based Image Retrieval (CBIR) engine with a Relevance Feedback mechanism. Thanks to a user friendly interface the tool is especially suited to unskilled users. In addition, the modular structure permits the use of the same core both in web-based and stand alone applications.
Conference Paper
Full-text available
We formulate and study search algorithms that consider a user's prior interactions with a wide variety of content to personalize that user's current Web search. Rather than relying on the unrealistic assumption that people will precisely specify their intent when searching, we pursue techniques that leverage implicit information about the user's interests. This information is used to re-rank Web search results within a relevance feedback framework. We explore rich models of user interests, built from both search-related information, such as previously issued queries and previously visited Web pages, and other information about the user such as documents and email the user has read and created. Our research suggests that rich representations of the user and the corpus are important for personalization, but that it is possible to approximate these representations and provide efficient client-side algorithms for personalizing search. We show that such personalization algorithms can significantly improve on current Web search.
In this paper we present FIRE, a content-based image retrieval system, and the methods we used in the ImageCLEF 2004 evaluation. In FIRE, different features are available to represent images. This diversity of available features allows the user to adapt the system to task-specific characteristics. A weighted combination of these features admits very flexible query formulations and helps in processing specific queries. For the ImageCLEF 2004 evaluation, we used content-based methods only, and the experimental results compare favorably with those of other systems that make use of textual information in addition to the images.
The popularity of digital images is rapidly increasing due to improving digital imaging technologies and the convenient availability facilitated by the Internet. However, finding user-intended images on the Internet is nontrivial, mainly because Web images are usually not annotated with semantic descriptors. In this article, we present an effective approach to, and a prototype system for, image retrieval from the Internet using Web mining. The system can also serve as a Web image search engine. One of the key ideas in the approach is to extract the text information on the Web pages to semantically describe the images. The text description is then combined with other low-level image features in the image similarity assessment. Another main contribution of this work is that we apply data mining to the log of users' feedback to improve image retrieval performance in three aspects. First, the accuracy of the document space model of image representation obtained from the Web pages is improved by removing clutter and irrelevant text information. Second, the user space model of users' representation of images is constructed and combined with the document space model to eliminate the mismatch between the page author's expression and the user's understanding and expectation. Third, the relationship between low-level and high-level features is discovered, which is extremely useful for assigning the low-level features' weights in similarity assessment.
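The similarity assessment that combines the mined text description with low-level visual features can be approximated as a weighted linear combination of per-modality similarities. In the article the weights are informed by mining the feedback log; in this rough sketch they are fixed constants, and all names are ours:

```python
import numpy as np

def _cos(a, b):
    """Cosine similarity between two feature vectors."""
    n = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / n) if n else 0.0

def hybrid_similarity(q_vis, d_vis, q_txt, d_txt, w_vis=0.4, w_txt=0.6):
    """Blend visual-space and text-space similarities with modality weights.
    The weights would normally be learned, e.g. from feedback logs."""
    return w_vis * _cos(q_vis, d_vis) + w_txt * _cos(q_txt, d_txt)
```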
Variety and veracity are two distinct characteristics of large-scale, heterogeneous data, and efficiently representing and processing big data with a unified scheme has been a great challenge. In this paper, a unified tensor model is proposed to represent unstructured, semi-structured, and structured data. With a tensor extension operator, various types of data are represented as subtensors and then merged into a unified tensor. In order to extract the core tensor, which is small but contains the valuable information, an incremental high-order singular value decomposition (IHOSVD) method is presented. By recursively applying the incremental matrix decomposition algorithm, IHOSVD is able to update the orthogonal bases and compute the new core tensor. Analyses of the time complexity, memory usage, and approximation accuracy of the proposed method are provided. A case study illustrates that approximate data reconstructed from a core set containing 18% of the elements can generally guarantee 93% accuracy. Theoretical analyses and experimental results demonstrate that the proposed unified tensor model and the IHOSVD method are efficient for big data representation and dimensionality reduction.
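The batch counterpart of the decomposition described above, a truncated HOSVD that extracts a small core tensor plus per-mode orthogonal bases, can be sketched in NumPy as follows (a non-incremental illustration only; the recursive basis updates that make IHOSVD incremental are not shown):

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rows are indexed by the chosen mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD: per-mode factor matrices and the core tensor."""
    # Leading left singular vectors of each mode unfolding.
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    # Core tensor: project T onto each mode's basis (mode-n product with U_n^T).
    core = T
    for n, Un in enumerate(U):
        core = np.moveaxis(np.tensordot(Un.T, np.moveaxis(core, n, 0), axes=1), 0, n)
    return core, U

def reconstruct(core, U):
    """Approximate the original tensor from the core and the bases."""
    T = core
    for n, Un in enumerate(U):
        T = np.moveaxis(np.tensordot(Un, np.moveaxis(T, n, 0), axes=1), 0, n)
    return T
```

Choosing `ranks` smaller than the tensor's dimensions yields the lossy compression the case study quantifies; with full ranks the reconstruction is exact.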
User relevance judgments are central to both the systems-oriented and user-oriented approaches to information retrieval (IR) systems research and development. Users' judgments and criteria for relevant items have been central issues in much of the relevance research. However, relevance research has operated on two largely unconnected tracks: first, a relevance-level track that examines users' criteria for relevance judgments, and second, a degree-of-relevance track that examines the measurement of users' high relevance judgments. In this paper, the results of these recent studies are used to expand the framework for relevance research and to identify the characteristics of the middle region of relevance, or partial relevance. Differences between users' criteria for highly, partially, and not relevant items are identified. Findings suggest that: (1) partially relevant items may play an important role in the early stages of a user's information-seeking process over time for a particular information problem, and (2) a relationship may exist between the partially relevant items retrieved and changes in users' information problems during an information-seeking process. The results also suggest that partially relevant items may be useful at the early stages of users' information-seeking processes. We further propose: (1) a useful concept of relevance as a relationship and an effect on the movement of a user through the iterative stages of their information-seeking process, and (2) that users' relevance judgments can be plotted on a Three-Dimensional Spatial Model of Relevance Level, Degree and Time. Implications for the development of IR systems, searching practice, and relevance research are also discussed.
Recent advances in the processing and networking capabilities of computers have led to the accumulation of immense amounts of multimedia data such as images. One of the largest repositories of such data is the World Wide Web (WWW). We present Cortina, a large-scale image retrieval system for the WWW that to date handles over 3 million images. The system retrieves images based on visual features and collateral text. We show that a search process consisting of an initial query-by-keyword or query-by-image, followed by relevance feedback on the visual appearance of the results, is feasible for large-scale data sets, and that it is superior to the pure text retrieval commonly used in large-scale systems. Semantic relationships in the data are explored and exploited by data mining, and multiple feature spaces are included in the search process.