ABSTRACT: This paper presents a probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models, user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. We discuss how this framework can unify existing retrieval models and accommodate systematic development of new retrieval models. As an example of using the framework to model non-traditional retrieval problems, we derive retrieval models for subtopic retrieval, which is concerned with retrieving documents to cover many different subtopics of a general query topic. These new models differ from traditional retrieval models in that they relax the traditional assumption of independent relevance of documents.
Information Processing & Management 01/2006; · 0.82 Impact Factor
ABSTRACT: Text clustering is most commonly treated as a fully automated task without user feedback. However, a variety of researchers have explored mixed-initiative clustering methods which allow a user to interact with and advise the clustering algorithm. This mixed-initiative approach is especially attractive for text clustering tasks where the user is trying to organize a corpus of documents into clusters for some particular purpose (e.g., clustering their email into folders that reflect various activities in which they are involved). This paper introduces a new approach to mixed-initiative clustering that handles several natural types of user feedback. We first introduce a new probabilistic generative model for text clustering (the SpeClustering model) and show that it outperforms the commonly used mixture of multinomials clustering model, even when used in fully autonomous mode with no user input. We then describe how to incorporate four distinct types of user feedback into the clustering algorithm, and provide experimental evidence showing substantial improvements in text clustering when this user feedback is incorporated.
SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, August 6-11, 2006; 01/2006