Conference Proceeding

Cluster-based retrieval using language models.

01/2004; DOI:10.1145/1008992.1009026 In proceeding of: SIGIR 2004: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK, July 25-29, 2004
Source: DBLP

ABSTRACT Previous research on cluster-based retrieval has been inconclusive as to whether it does bring improved retrieval effectiveness over document-based retrieval. Recent developments in the language modeling approach to IR have motivated us to re-examine this problem within this new retrieval framework. We propose two new models for cluster-based retrieval and evaluate them on several TREC collections. We show that cluster-based retrieval can perform consistently across collections of realistic size, and significant improvements over document-based retrieval can be obtained in a fully automatic manner and without relevance information provided by human.

0 0
 · 
1 Bookmark
 · 
55 Views
  • Source
    [show abstract] [hide abstract]
    ABSTRACT: To address the inability of current ranking systems to support subtopic retrieval, two main post-processing techniques of search results have been investigated: clustering and diversification. In this paper we present a comparative study of their performance, using a set of complementary evaluation measures that can be applied to both partitions and ranked lists, and two specialized test collections focusing on broad and ambiguous queries, respectively. The main finding of our experiments is that diversification of top hits is more useful for quick coverage of distinct subtopics whereas clustering is better for full retrieval of single subtopics, with a better balance in performance achieved through generating multiple subsets of diverse search results. We also found that there is little scope for improvement over the search engine baseline unless we are interested in strict full-subtopic retrieval, and that search results clustering methods do not perform well on queries with low divergence subtopics, mainly due to the difficulty of generating discriminative cluster labels.
    Information Processing & Management 01/2012; · 0.82 Impact Factor
  • Source
    [show abstract] [hide abstract]
    ABSTRACT: This paper is a demonstration of how a robot can, through introspection and then targeted data retrieval, improve its own performance. It is a step in the direction of lifelong learning and adaptation and is motivated by the desire to build robots that have plastic competencies which are not baked in. They should react to and benefit from use. We consider a particular instantiation of this problem in the context of place recognition. Based on a topic based probabilistic model of images, we use a measure of perplexity to evaluate how well a working set of background images explain the robot's online view of the world. Offline, the robot then searches an external resource to seek out additional background images that bolster its ability to localise in its environment when used next. In this way the robot adapts and improves performance through use.
    Robotics and Automation (ICRA), 2011 IEEE International Conference on; 06/2011
  • Source
    [show abstract] [hide abstract]
    ABSTRACT: Continuous growth of the Web and user bases forces web search engine companies to make costly investments on very large compute infrastructures. The scalability of these infrastructures requires careful performance optimizations in every major component of the search engine. Herein, we try to provide a fairly comprehensive coverage of the literature on scalability challenges in large-scale web search engines. We present the identified challenges through an architectural classification, starting from a simple single-node search system and moving towards a hypothetical multi-site web search architecture. We also discuss a number of open research problems and provide recommendations to researchers in the field.
    10/2011: pages 27-50;

Full-text

View
0 Downloads
Available from