Suggesting Topic-Based Query Terms as You Type.
ABSTRACT: Query term suggestion, which interactively expands a user's query, is an indispensable technique for helping users formulate high-quality queries and has attracted much attention in the web search community. Existing methods usually suggest terms based on document statistics, query logs, and external dictionaries, but they neglect topic information, which is crucial because it helps retrieve topically relevant documents. To better serve users, we propose a novel term suggestion method: as the user types a query letter by letter, we instantly suggest terms that are topically coherent with the query and can retrieve relevant documents. To suggest highly relevant terms effectively, we propose a generative model that incorporates the topical coherence of terms. The model learns topics from the underlying documents using Latent Dirichlet Allocation (LDA). To achieve instant query suggestion, we use a trie structure to index and access terms, and we devise an efficient top-k algorithm to suggest terms as users type. Experimental results show that our approach not only improves the effectiveness of term suggestion but also achieves better efficiency and scalability.
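The combination of a trie index with top-k, topic-aware ranking can be illustrated with a minimal sketch. It is not the paper's generative model or its top-k algorithm: the scoring rule (a sum over topics of P(topic | context) times P(term | topic)), the hand-written topic tables, and the names TermTrie and suggest are all assumptions made for illustration. In a real system the topic distributions would be learned with LDA rather than typed in.

```python
import heapq


class TrieNode:
    """One character position in the trie."""

    def __init__(self):
        self.children = {}  # maps a character to the next TrieNode
        self.term = None    # set only on nodes that end a complete term


class TermTrie:
    """Indexes terms so that all completions of a prefix are cheap to enumerate."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, term):
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.term = term

    def complete(self, prefix):
        """Yield every indexed term that starts with `prefix`."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return  # no indexed term has this prefix
        stack = [node]
        while stack:
            node = stack.pop()
            if node.term is not None:
                yield node.term
            stack.extend(node.children.values())


def suggest(trie, prefix, context_topics, term_topics, k=3):
    """Return the k completions of `prefix` most coherent with the context.

    context_topics: hypothetical P(topic | words typed so far)
    term_topics:    hypothetical P(term | topic), keyed by term
    The score is an assumed stand-in for the paper's model.
    """
    def score(term):
        dist = term_topics.get(term, {})
        return sum(p * dist.get(topic, 0.0) for topic, p in context_topics.items())

    return heapq.nlargest(k, trie.complete(prefix), key=score)


# Toy data: every number below is invented for the example.
trie = TermTrie()
for t in ["java", "jaguar", "jazz", "python"]:
    trie.insert(t)

term_topics = {
    "java": {"programming": 0.5, "travel": 0.1},
    "jaguar": {"animals": 0.6, "cars": 0.3},
    "jazz": {"music": 0.7, "travel": 0.05},
}
context_topics = {"programming": 0.9, "travel": 0.1}

print(suggest(trie, "ja", context_topics, term_topics, k=2))
```

This sketch enumerates every completion under the prefix before ranking, which is simple but scans the whole subtree on each keystroke. An efficient top-k algorithm of the kind the abstract describes would presumably avoid that scan, for example by pruning subtrees that cannot reach the current top k.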
ABSTRACT: Term suggestion recommends query terms to a user based on the user's initial query. Suggesting adequate terms is challenging. Most existing commercial search engines suggest search terms based on the frequency of previously used terms that match the leading letters the user types. In this article, we present a novel mechanism that constructs semantic term-relation graphs to suggest relevant search terms at the semantic level. We build term-relation graphs from multipartite networks in existing social media, in particular Wikipedia. The multipartite linkage networks of contributor-term, term-category, and term-term are extracted from Wikipedia to form the term-relation graphs. To fuse these multipartite linkage networks, we propose incorporating contributor-category networks to model the expertise of the contributors. In our experiments, this step clearly improved the accuracy of the inferred relatedness in the term-semantic graphs. Experiments on keyword-expanded search based on 200 TREC-5 ad-hoc topics showed a clear advantage of our algorithms over existing approaches.
ACM Transactions on Intelligent Systems and Technology (TIST). 12/2013; 5(1).
ABSTRACT: Autocompletion systems support users in formulating queries in many settings, from development environments to the web. In this paper we describe Composite Match Autocompletion (COMMA), a lightweight approach to introducing semantics into an autocompletion matching algorithm for semi-structured data. The approach is formally described, then applied and evaluated with specific reference to the e-commerce context. The semantic extension to the matching algorithm exploits available information about product categories and distinguishing product features to improve the handling of exploratory queries. COMMA seamlessly handles both targeted, precise queries and exploratory, vague ones by combining different filtering and scoring techniques. The algorithm is evaluated for both effectiveness and efficiency in a real-world scenario: the improvement achieved is significant and is not accompanied by an appreciable increase in computational cost.
Web Intelligence and Agent Systems 01/2014; 12(1):35-49.
Conference Paper: Learning to personalize query auto-completion
ABSTRACT: Query auto-completion (QAC) is one of the most prominent features of modern search engines. The list of query candidates is generated according to the prefix entered by the user in the search box and is updated on each new keystroke. Query prefixes tend to be short and ambiguous, and existing models mostly rely on the past popularity of matching candidates for ranking. However, the popularity of certain queries may vary drastically across different demographics and users. For instance, while instagram and imdb have comparable popularity overall and are both legitimate candidates to show for the prefix i, the former is noticeably more popular among young female users, and the latter is more likely to be issued by men. In this paper, we present a supervised framework for personalizing auto-completion ranking. We introduce a novel labelling strategy for generating offline training labels that can be used for learning personalized rankers. We compare the effectiveness of several user-specific and demographic-based features and show that, among them, the user's long-term search history and location are the most effective for personalizing auto-completion rankers. We perform our experiments on the publicly available AOL query logs and on the larger-scale logs of Bing. The results suggest that supervised rankers enhanced by personalization features can significantly outperform the existing popularity-based baselines, improving mean reciprocal rank (MRR) by up to 9%.
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval; 07/2013
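For readers unfamiliar with the metric, mean reciprocal rank can be computed as in the short sketch below. The function name, the convention of scoring 0 when the submitted query is missing from the candidate list, and the toy data are assumptions for illustration; the paper's exact evaluation protocol is not given in the abstract.

```python
def mean_reciprocal_rank(ranked_lists, submitted_queries):
    """Average of 1/rank of the query the user actually submitted.

    ranked_lists:      one ranked candidate list per prefix impression
    submitted_queries: the query finally submitted for each impression
    An impression whose submitted query is absent contributes 0 (assumed convention).
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for candidates, target in zip(ranked_lists, submitted_queries):
        if target in candidates:
            total += 1.0 / (candidates.index(target) + 1)  # ranks start at 1
    return total / len(ranked_lists)


# Toy impressions for the prefix "i": invented data, not from the paper.
rankings = [["imdb", "instagram"], ["instagram", "imdb"], ["ikea"]]
submitted = ["instagram", "instagram", "imdb"]
print(mean_reciprocal_rank(rankings, submitted))  # (1/2 + 1 + 0) / 3 = 0.5
```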