Conference Paper

Fusion vs. Two-Stage for Multimodal Retrieval.

Conference: Advances in Information Retrieval - 33rd European Conference on IR Research, ECIR 2011, Dublin, Ireland, April 18-21, 2011. Proceedings
Source: DBLP


We compare two methods for retrieval from multimodal collections. The first is a score-based fusion of results retrieved visually and textually. The second is a two-stage method that visually re-ranks the top-K textually retrieved results. We discuss their underlying hypotheses and practical limitations, and conduct a comparative evaluation on a standardized snapshot of Wikipedia. Both methods are found to be significantly more effective than single-modality baselines, with no clear winner but with different robustness features. Nevertheless, two-stage retrieval provides efficiency benefits over fusion.
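
The two methods compared in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the fusion formula, so the min-max normalization, the weighted linear combination, the weight `w`, and the function names `min_max`, `fuse`, and `two_stage` are all hypothetical choices, not the authors' implementation.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]  # document id -> retrieval score


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] so both modalities are comparable (assumed choice)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def fuse(text: Scores, visual: Scores, w: float = 0.5) -> List[str]:
    """Score-based fusion: both modalities score the collection, then scores are combined."""
    t, v = min_max(text), min_max(visual)
    docs = set(t) | set(v)
    combined = {d: w * t.get(d, 0.0) + (1 - w) * v.get(d, 0.0) for d in docs}
    # Sort by descending combined score; ties are broken by document id.
    return sorted(combined, key=lambda d: (-combined[d], d))


def two_stage(text: Scores, visual_score: Callable[[str], float], k: int) -> List[str]:
    """Two-stage: take the top-K by text, then re-rank only those K visually."""
    top_k = sorted(text, key=lambda d: (-text[d], d))[:k]
    # The visual scorer is evaluated once per top-K item, never on the rest.
    return sorted(top_k, key=lambda d: (-visual_score(d), d))


# Toy scores, invented purely for illustration.
text_scores = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.5}
visual_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.8}

fused_ranking = fuse(text_scores, visual_scores)
two_stage_ranking = two_stage(text_scores, visual_scores.__getitem__, k=2)
```

In this sketch, `two_stage` calls the visual scorer on K documents only, whereas `fuse` needs a visual score for every candidate; that difference is one way to read the efficiency benefit the abstract attributes to two-stage retrieval.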

Available from: Konstantinos Zagoris
  • Source
    • "If the query image happens to have a low generality, early rank positions may be dominated by spurious results. As a result, the simultaneous employment of CBIR techniques and metadata were found to be significantly more effective and reduce the communication gap that exists between the Humans and the Computers more than the text-only and image-only baseline [3]."
    ABSTRACT: This paper tackles the problem of the user's inability to describe exactly the image that he seeks by introducing an innovative image search engine called TsoKaDo. Until now, traditional web image search has been based only on the comparison between the metadata of the webpage and the user's textual description. In the proposed method, images from various search engines are classified based on visual content and new tags are proposed to the user. Recursively, the results get closer to the user's desire. The aim of this paper is to present a new way of searching, especially in cases of low query generality, giving greater weight to visual content than to metadata.
    ACHI 2012, The Fifth International Conference on Advances in Computer-Human Interactions; 01/2012
  • Source
    • "Traditionally, the method that has been followed in order to deal with multimodal databases is to search the modalities separately and fuse their results. While fusion has been proven robust, we also found that two-stage is more effective than fusion [3]. Furthermore, a two-stage approach has an efficiency benefit: it cuts down greatly on expensive image operations."
    ABSTRACT: The Bag-Of-Visual-Words (BOVW) paradigm is fast becoming a popular image representation for Content-Based Image Retrieval (CBIR), mainly because of its better retrieval effectiveness over global feature representations on collections with images being near-duplicate to queries. In this experimental study we demonstrate that this advantage of BOVW is diminished when visual diversity is enhanced by using a secondary modality, such as text, to pre-filter images. The TOP-SURF descriptor is evaluated against Compact Composite Descriptors on a two-stage image retrieval setup, which first uses a text modality to rank the collection and then performs CBIR only on the top-K items.
    Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, July 25-29, 2011; 01/2011
  • Source
    ABSTRACT: Large amounts of medical visual data are produced daily in hospitals, while new imaging techniques continue to emerge. In addition, many images are made available continuously via publications in the scientific literature and can also be valuable for clinical routine, research and education. Information retrieval systems are useful tools to provide access to the biomedical literature and fulfil the information needs of medical professionals. The tools developed in this thesis can potentially help clinicians make decisions about difficult diagnoses via a case-based retrieval system based on a use case associated with a specific evaluation task. This system retrieves articles from the biomedical literature when querying with a case description and attached images. This thesis proposes a multimodal approach for medical case-based retrieval with focus on the integration of visual information connected to text. Furthermore, the ImageCLEFmed evaluation campaign was organised during this thesis promoting medical retrieval system evaluation.
    04/2015, Degree: Ph.D. in Sciences, Supervisor: Henning Müller; Stéphane Marchand-Maillet