Conference Paper

Bag-of-visual-words vs global image descriptors on two-stage multimodal retrieval.

DOI: 10.1145/2009916.2010144
Conference: Proceeding of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, July 25-29, 2011
Source: DBLP

ABSTRACT The Bag-Of-Visual-Words (BOVW) paradigm is fast becoming a popular image representation for Content-Based Image Retrieval (CBIR), mainly because it retrieves more effectively than global feature representations on collections whose images are near-duplicates of the queries. In this experimental study we demonstrate that this advantage of BOVW diminishes when visual diversity is enhanced by using a secondary modality, such as text, to pre-filter images. The TOP-SURF descriptor is evaluated against Compact Composite Descriptors on a two-stage image retrieval setup, which first uses a text modality to rank the collection and then performs CBIR only on the top-K items.
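
The two-stage setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the term-count text score, the Euclidean distance, the two-dimensional toy descriptors, and the names `text_score`, `euclidean`, and `two_stage_retrieval` are all assumptions chosen for illustration. A real system would use a proper text retrieval model for stage one and descriptors such as TOP-SURF or Compact Composite Descriptors for stage two.

```python
import math
from collections import Counter


def text_score(query_terms, doc_terms):
    """Toy text score (assumption): how often query terms occur in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_retrieval(query_terms, query_descriptor, collection, k):
    """Rank the collection by text, then re-rank only the top-k items visually."""
    # Stage 1: rank the whole collection with the text modality.
    by_text = sorted(
        collection,
        key=lambda item: text_score(query_terms, item["terms"]),
        reverse=True,
    )
    candidates = by_text[:k]
    # Stage 2: content-based re-ranking restricted to the top-k candidates.
    return sorted(
        candidates,
        key=lambda item: euclidean(query_descriptor, item["descriptor"]),
    )


# Hypothetical toy collection: each item has text terms and a visual descriptor.
collection = [
    {"id": "a", "terms": ["red", "car"], "descriptor": [0.9, 0.1]},
    {"id": "b", "terms": ["red", "apple"], "descriptor": [0.2, 0.8]},
    {"id": "c", "terms": ["blue", "car"], "descriptor": [0.1, 0.2]},
    {"id": "d", "terms": ["green", "tree"], "descriptor": [0.21, 0.79]},
]

results = two_stage_retrieval(["red", "car"], [0.2, 0.7], collection, k=3)
ids = [item["id"] for item in results]
print(ids)
```

In this toy data, item `d` is visually closest to the query but shares no query terms, so the text stage filters it out before the visual comparison runs. That is the pre-filtering effect the study examines.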

  • ABSTRACT: Due to the rapid development of information technology and the continuously increasing volume of available multimedia data, the task of retrieving information based on visual content has become a popular subject of scientific interest. Recent approaches adopt the bag-of-visual-words (BOVW) model to retrieve images in a semantic way. BOVW has shown remarkable performance in content-based image retrieval tasks, exhibiting better retrieval effectiveness than global and local feature (LF) representations. The performance of the BOVW approach depends strongly, however, on predicting the ideal codebook size, a difficult and database-dependent task. The contribution of this paper is threefold. First, it presents a new technique that uses a self-growing and self-organized neural gas network to calculate the most appropriate size of a codebook for a given database. Second, it proposes a new soft-weighting technique, whereby each LF is classified into only one visual word (VW) with a degree of participation. Third, by combining the information derived from the method that automatically detects the number of VWs, the soft-weighting method, and a color information extraction method from the literature, it shapes a new descriptor, called color VWs. Experimental results on two well-known benchmarking databases demonstrate that the proposed descriptor outperforms 15 contemporary descriptors and methods from the literature, in terms of both precision at K and its ability to retrieve the entire ground truth.
    IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics: a publication of the IEEE Systems, Man, and Cybernetics Society 07/2012; · 3.01 Impact Factor
  • ABSTRACT: Mobile devices such as smartphones and tablets are widely used in everyday life to perform a variety of operations, such as e-mail exchange, connection to social media, bank/financial transactions, and so on. Moreover, because of the large growth of multimedia applications, transferring and sharing video and images over wireless networks is becoming increasingly popular. Several modern mobile applications perform information retrieval and image recognition. For example, Google Goggles is an image recognition application that is used for searches based on pictures taken by handheld devices. In most cases, the image recognition procedure is an image retrieval procedure. The captured images, or a low-level description of them, are uploaded online, and the system recognizes their content by retrieving visually similar pictures. With this in mind, our goal in this paper is to evaluate the process of image retrieval/recognition over an IEEE 802.11b network operating at 2.4 GHz. Our evaluation is performed through a simulated network configuration, which consists of a number of mobile nodes communicating with an access point. Throughout our simulations, we examine the impact of several factors, such as the existence of a strong line of sight during the communication between wireless devices. Strong line of sight depends on the fading model used for the simulations and has an effect on the bit error rate (BER). We have used a large number of image descriptors and a variety of scenarios, reported in the relevant literature, in order to comprehensively evaluate our system. To reinforce our results, experiments were conducted on two well-known image databases by using 10 descriptors from the literature. Copyright © 2014 John Wiley & Sons, Ltd.
    International Journal of Communication Systems 01/2014; · 1.11 Impact Factor
