Identifying Ambiguous Queries in Web Search
Ruihua Song (1, 2), Zhenxiao Luo, Ji-Rong Wen, Yong Yu, and Hsiao-Wuen Hon
(1) Shanghai Jiao Tong University, Shanghai, China
(2) Microsoft Research Asia, Beijing, China
(3) Fudan University, Shanghai, China
It is widely believed that some queries submitted to search
engines are by nature ambiguous (e.g., java, apple). However, few
studies have investigated the questions of “how many queries are
ambiguous?” and “how can we automatically identify an
ambiguous query?” This paper deals with these issues. First, we
construct the taxonomy of query ambiguity, and ask human
annotators to manually classify queries based upon it. From
manually labeled results, we find that query ambiguity is to some
extent predictable. We then use a supervised learning approach to
automatically classify queries as being ambiguous or not.
Experimental results show that we can correctly identify 87% of
labeled queries. Finally, we estimate that about 16% of queries in
a real search log are ambiguous.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search
and Retrieval – query formulation; H.5.2 [Information
Interfaces and Presentation]: User Interfaces – Natural
General Terms
Experimentation, Languages, Human Factors
Keywords
Ambiguous query, query classification, broad topics, Web user
Technologies such as personalized Web search and search
results clustering aim to improve users’ satisfaction with
ambiguous queries from different perspectives. However, the
identification of ambiguous queries has not been sufficiently
studied. Questions like “what percentage of queries are ambiguous?”
and “can we automatically determine whether a query is ambiguous?”
are still open. If we can estimate the percentage of ambiguous
queries, we will know how many queries could potentially benefit
from ambiguity-oriented technologies. If we can further
identify ambiguous queries automatically, such technologies can be
applied to that particular kind of query rather than to all
queries. We try to answer these questions in this paper.
Identifying ambiguous queries is challenging for three reasons.
First, there is no acknowledged definition or taxonomy of query
ambiguity. Many terms are related to this concept, such as
“ambiguous query,” “semi-ambiguous query,” “clear query,”
“general term,” “broad topic,” and “diffuse topic.” We found these
terms confusing during our investigation. Second, it is uncertain
whether most queries can be associated with a particular type of
ambiguity. Cronen-Townsend et al. [1] proposed to use
the relative entropy between a query and the collection to
quantify query clarity, but the score does not align easily with
the concepts people have in mind. Third, even if ambiguous queries
can be recognized manually, it is not realistic to label thousands of
queries sampled from query logs. So how can we identify them
automatically?
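To make the relative-entropy idea concrete, the following is a minimal sketch of a clarity-style score computed as the Kullback-Leibler divergence (in bits) between a query language model and a collection language model. The function names and the toy probabilities are assumptions for illustration only; in particular, how the query model is estimated is not described here, so the sketch simply takes both models as given word distributions.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def clarity_score(query_model, collection_model):
    """Relative entropy (bits) of the query model with respect to the collection.

    Assumes every word with non-zero query probability also has non-zero
    probability in the collection model.
    """
    score = 0.0
    for word, p_query in query_model.items():
        if p_query > 0.0:
            score += p_query * math.log2(p_query / collection_model[word])
    return score


# Hypothetical toy distributions, not taken from any real collection.
collection = {"java": 0.5, "coffee": 0.25, "island": 0.25}
focused_query = {"java": 1.0}          # all probability mass on one word
diffuse_query = dict(collection)       # indistinguishable from the collection

focused_score = clarity_score(focused_query, collection)
diffuse_score = clarity_score(diffuse_query, collection)
```

A query model that looks like the collection as a whole scores zero, while a model concentrated on a few words scores higher. The score is a single real number, which is why it is hard to map onto discrete notions such as "ambiguous" or "broad".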
In this paper, we first construct a taxonomy of query ambiguity
from the literature. We then assess human agreement on query
classification through a user study. Based on the findings, we take
a supervised learning approach to automatically identify
ambiguous queries. Experimental results show that our approach
achieves 85% precision and 81% recall in identifying ambiguous
queries. Finally, we estimate that about 16% of queries in the
sampled search log are ambiguous.
By surveying the literature, we summarize the following three
types of queries, ranging from ambiguous to specific.
Type A (Ambiguous Query): a query that has more than
one meaning;
e.g. “giant,” which may refer to “Giant Company Software
Inc.” (an internet security software developer), “Giant” (a film
produced in 1956), “Giant Bike” (a bicycle manufacturer), or
“San Francisco Giants” (a National League baseball team).
Type B (Broad Query): a query that covers a variety of
subtopics and a user might look for one of the subtopics by
issuing another query.
e.g. “songs,” which covers some subtopics such as “song
lyrics,” “love songs,” “party songs,” and “download songs.” In
practice, a user often issues such a query first, and then narrows
down to a subtopic.
Type C (Clear Query): a query that has a specific meaning
and covers a narrow topic.
e.g. “University of Chicago” and “Billie Holiday.” A clear
query usually means a successful search in which a user can find
several results with a high degree of quality in the first results
The purpose of the user study is to determine whether it is possible
to associate a query with a certain type by looking at Web search
results. Since it is difficult to find the different meanings of a query
by going through all the results, we use clustered search results
generated by Vivisimo [5] to help participants understand the query.
Copyright is held by the author/owner(s).
WWW 2007, May 8-12, 2007, Banff, Alberta, Canada.
ACM 978-1-59593-654-7/07/0005.
WWW 2007 / Poster Paper Topic: Search
[Figure 1 image not reproduced; surviving axis label: Work & Money]
Figure 1. Projection of documents represented in categories for three example queries: (a) Type A: giant; (b) Type B: songs; (c) Type C: Billie Holiday
Queries used in our user study are sampled from 12 days of Live
Search [4] query logs from August 2006. We use a total of 60
queries and involve five human subjects. Each participant is asked
to judge whether a query is ambiguous (Type A) or not. If the
query is not ambiguous, the participant answers an
additional question: “Is it necessary to add some words to the
query in order to let it be clearer?” The question aims to clarify
whether the query is broad (Type B) or clear (Type C).
The user study results indicate that participants generally agree
(90% agreement) on whether a query is ambiguous or
not. However, it is difficult to distinguish Type B from Type C, as
the agreement is only 50%.
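The text does not state how agreement is measured. As one plausible reading, the sketch below computes average pairwise percent agreement, first on the three-way labels and then after collapsing Type B and Type C into a single "not A" class. The labels are hypothetical and chosen only to show how collapsing classes can raise agreement.

```python
from itertools import combinations


def pairwise_agreement(labels_by_annotator):
    """Fraction of (annotator pair, query) combinations with identical labels."""
    agree = 0
    total = 0
    for first, second in combinations(labels_by_annotator, 2):
        for a, b in zip(first, second):
            agree += int(a == b)
            total += 1
    return agree / total


def collapse_to_binary(labels):
    """Merge Type B and Type C into one 'not A' class."""
    return ["A" if label == "A" else "not A" for label in labels]


# Hypothetical labels from three annotators for four queries.
annotations = [
    ["A", "B", "C", "B"],
    ["A", "C", "C", "B"],
    ["A", "B", "B", "B"],
]

three_way = pairwise_agreement(annotations)
binary = pairwise_agreement([collapse_to_binary(a) for a in annotations])
```

In this toy data the annotators never disagree about Type A, so the binary agreement is perfect, while the three-way agreement is lower because of B versus C disagreements.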
In this paper, we utilize a query q and a set D of the top n search
results with respect to the query in modeling query ambiguity.
We formulate the problem of identifying ambiguous queries as a
classification problem:
$$f : (q, D) \rightarrow \{A, \bar{A}\}$$
Based on the findings in the user study, we aim to classify a query
as $A$ (ambiguous queries) or $\bar{A}$ (broad or clear queries).
Support Vector Machines (SVM), developed by Vapnik [3], with an RBF
kernel are used as our classifier.
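For readers unfamiliar with kernel SVMs, the sketch below shows how a trained SVM with an RBF kernel scores a new feature vector: a weighted sum of kernel values against the support vectors, plus a bias. Training is omitted, and the support vectors, dual coefficients, bias, and gamma are hypothetical values chosen for illustration, not parameters from the experiments.

```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b."""
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias


def classify(x, support_vectors, dual_coefs, bias, gamma):
    """Return 'A' (ambiguous) for a positive score, otherwise 'not A'."""
    score = svm_decision(x, support_vectors, dual_coefs, bias, gamma)
    return "A" if score > 0 else "not A"


# Hypothetical trained model: one support vector per class.
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
dual_coefs = [-1.0, 1.0]   # alpha_i * y_i; negative means the 'not A' class
bias = 0.0
gamma = 1.0

label_near_positive = classify((1.0, 1.0), support_vectors, dual_coefs, bias, gamma)
label_near_negative = classify((0.0, 0.0), support_vectors, dual_coefs, bias, gamma)
```

Each query would be represented by its feature vector, so a point close to the positive support vector is labeled ambiguous and a point close to the negative one is not.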
A text classifier similar to that used in [2] is applied to classify
each Web document in D into the predefined categories of
KDDCUP 2005. We represent a document by a vector of
categories, in which each dimension corresponds to the
confidence that the document belongs to a category.
Our main idea for identifying an ambiguous query is that relevant
documents with different interpretations probably belong to
several different categories. To illustrate this assumption, we
project documents into a three-dimensional (3D) space and show
three example queries in Figure 1. The coordinates correspond to
the three categories that a query most likely belongs to. “Giant”, as
an ambiguous query, may refer to “giant squid” in the Library
category, “Giant Company Inc.” in the Computing category, and
“Giant Food supermarket” in the Work & Money category. Figure 1(a)
shows a scattered distribution among these three categories. “Billie
Holiday” is a clear query, and Figure 1(c) shows that almost all the
documents are gathered in the Entertainment category. “Songs”
is a broad query. A pattern between scattered
and gathered is observed in Figure 1(b).
Twelve features are derived to quantify the distribution of D, such as
the maximum Euclidean distance between a document vector and
the centroid document vector in D.
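The one feature named above, the maximum Euclidean distance between a document vector and the centroid, can be sketched as follows. The three-dimensional category vectors are hypothetical confidence values, and the remaining features are not reproduced because they are not listed in the text.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def max_distance_to_centroid(vectors):
    """Largest Euclidean distance from any document vector to the centroid."""
    center = centroid(vectors)
    return max(math.dist(vector, center) for vector in vectors)


# Hypothetical category-confidence vectors for the top results of two queries.
scattered_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
gathered_docs = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

scattered_feature = max_distance_to_centroid(scattered_docs)
gathered_feature = max_distance_to_centroid(gathered_docs)
```

Documents spread over several categories yield a large value, while documents concentrated in one category yield a value near zero, which matches the scattered versus gathered patterns described for ambiguous and clear queries.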
We conduct the experiments of learning a query ambiguity model
on 253 labeled queries, using five-fold cross-validation.
The best classifier in our experiments achieves a precision of 85.4%,
a recall of 80.9%, and an accuracy of 87.4%. This performance
indicates that ambiguous queries can be identified automatically.
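As a reminder of how these measures are defined, the sketch below splits item indices into five folds and computes precision, recall, and accuracy with Type A as the positive class. The fold assignment scheme and the toy labels are assumptions; the actual fold construction and per-fold results are not given in the text.

```python
def five_fold_indices(n, k=5):
    """Yield (train, test) index lists; fold i tests every k-th item from i."""
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, test


def precision_recall_accuracy(y_true, y_pred, positive="A"):
    """Precision and recall for the positive class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy


# Hypothetical gold labels and predictions ("N" stands for not ambiguous).
y_true = ["A", "A", "A", "N", "N", "N", "N", "N"]
y_pred = ["A", "A", "N", "A", "N", "N", "N", "N"]
precision, recall, accuracy = precision_recall_accuracy(y_true, y_pred)

folds = list(five_fold_indices(253))
```

Precision is the share of queries predicted ambiguous that truly are, recall is the share of truly ambiguous queries that are found, and accuracy counts correct decisions over both classes.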
We try to estimate what percentage of queries are ambiguous in a
query set sampled from Live Search logs. The set consists of 989
queries. To do so, our newly learned query ambiguity
model is used to predict labels for the query set. As we increase
the portion of the query set used for estimation from 1/10 to 10/10, the
percentage first fluctuates between 15% and 18% and finally
stabilizes at around 16%. Therefore, we estimate that about 16%
of all the queries are ambiguous.
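The estimation procedure can be sketched as a running share of predicted Type A labels over growing prefixes of the query set. The prediction list below is hypothetical, and taking prefixes in the given order is an assumption, since the text does not say how the growing subsets were drawn.

```python
def running_ambiguous_share(predictions, steps=10):
    """Share of 'A' predictions in the first k/steps of the list, k = 1..steps.

    Assumes len(predictions) >= steps so that every prefix is non-empty.
    """
    n = len(predictions)
    shares = []
    for k in range(1, steps + 1):
        size = n * k // steps
        prefix = predictions[:size]
        shares.append(sum(p == "A" for p in prefix) / size)
    return shares


# Hypothetical classifier output for 20 queries ("N" = not ambiguous).
predictions = ["A", "N", "N", "N", "N", "N", "N", "N", "N", "N"] * 2
shares = running_ambiguous_share(predictions)
```

Early prefixes are small, so their shares swing widely. The estimate settles as more of the set is included, which is the behavior reported for the 989-query sample.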
In this paper, we find that people are in general agreement on whether
a query is ambiguous or not. We therefore propose a machine learning
model based on search results to identify ambiguous queries. The
best classifier achieves an accuracy of 87%. By applying the
classifier, we estimate that about 16% of queries in
the sampled logs are ambiguous.
[1] S. Cronen-Townsend, Y. Zhou, and W. B. Croft. Predicting
query performance. In Proceedings of the 25th
Conference on Research in Information Retrieval (SIGIR),
pages 299-306, 2002.
[2] D. Shen, R. Pan, J.-T. Sun, J. J. Pan, K. Wu, J. Yin, and Q.
Yang. Q2c@ust: our winning solution to query classification
in KDDCUP 2005. SIGKDD Explorations, 7(2):100–110,
[3] V. Vapnik. Principles of risk minimization for learning
theory. In D. S. Lippman, J. E. Moody, and D. S. Touretzky,
editors, Advances in neural information processing systems 3,
pages 831-838. Morgan Kaufmann, 1992
[4] Live Search.
[5] Vivisimo search engine.