Determining the User Intent of Web Search Engine Queries
Bernard J. Jansen, Danielle L. Booth
College of Information Sciences and Technology
The Pennsylvania State University
University Park, PA, 16801, USA
jjansen@acm.org, dlb5000@psu.edu
Amanda Spink
Faculty of Information Technology
Queensland University of Technology
Gardens Point Campus, 2 George St, GPO Box 2434
Brisbane QLD 4001 Australia
ah.spink@qut.edu.au
ABSTRACT
Determining the user intent of Web searches is a difficult problem
due to the sparse data available concerning the searcher. In this
paper, we examine a method to determine the user intent
underlying Web search engine queries. We qualitatively analyze
samples of queries from seven transaction logs from three different
Web search engines containing more than five million queries.
From this analysis, we identified characteristics of user queries
based on three broad classifications of user intent. The
classifications of informational, navigational, and transactional
represent the type of content destination the searcher desired as
expressed by their query. We implemented our classification
algorithm and automatically classified a separate Web search
engine transaction log of over a million queries submitted by
several hundred thousand users. Our findings show that more than
80% of Web queries are informational in nature, with about 10%
each being navigational and transactional. In order to validate the
accuracy of our algorithm, we manually coded 400 queries and
compared the classification to the results from our algorithm. This
comparison showed that our automatic classification has an
accuracy of 74%. For the remaining queries (roughly a quarter), the
user intent is generally vague or multi-faceted, pointing to the need
for probabilistic classification. We illustrate how knowledge of
searcher intent might be used to enhance future Web search
engines.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Search process
General Terms
Measurement, Experimentation, Human Factors
Keywords
User intent, Web queries, Web searching, search engines
1. INTRODUCTION
The Web has become an indispensable aspect in the lives of many
people, and search engines are the main portal to the Web. Search
engines are “the tool” for accessing the information, Internet sites,
and services on the Web that many people use on a daily basis.
Beyond their popularity, how are people using these Web search
engines? How can we determine what these people are seeking?
What task, goal, need, or intent are they trying to address with their
Web searching?
Web search engines can help people find the resources they are
looking for by more clearly identifying the searcher’s intent behind
the query. In this paper, we classify user searches based on intent
in terms of the type of content specified and operationalize these
classifications with defining characteristics. We implement this
operationalized classification in an application that automatically
classifies queries from a search engine transaction log. We discuss
how this model can be used to improve Web search engines.
2. RELATED STUDIES
Discovering the intent of Web searchers is a growing research
area. Some of the earliest work is from Broder [2] and Rose
and Levinson [7]. Lee, Liu, and Cho [6] attempted automated
classification, comparing only informational and navigational in
order to simplify the problem. Baeza-Yates, Calderón-Benavides, and
González-Caro [1] use supervised and unsupervised learning to
classify 6,042 Web queries as either informational, not
informational, or ambiguous.
From a review of existing literature, efforts at classification of
Web queries have usually involved small quantities of queries
manually classified. There has been little effort on automated
classification of queries for user intent. It is these issues that
motivate our research. A comprehensive evaluation of a substantial
set of Web searching queries will significantly enhance
understanding of user intent in Web searching.
3. RESEARCH OBJECTIVES
The following are our research objectives: (1) Isolate the
characteristics of informational, navigational, and transactional
Web search queries, identifying for each query type the
characteristics that will lead to real-world classification.
(2) Validate the taxonomy by automatically classifying a large set
of queries from a Web search engine.
4. RESEARCH DESIGN
For research question one, we qualitatively analyzed samples of
queries from seven Web search engine transaction logs [3, 5] in
order to identify characteristics for each query category. For the
analysis, we selected random samples of queries and manually
classified them into one of three categories (informational,
navigational, and transactional) as defined in [2]. We then derived
characteristics for each category that would serve to define the
queries in that category. This was an iterative process with
multiple rounds of “query selection – classification –
characteristics refinement”.
To address research question two, we implemented our
characteristics in an algorithm (i.e., a program) and executed this
program on a Web transaction log. The transaction log we used
was from Dogpile.com (http://www.Dogpile.com/; see footnote 1). A complete
statistical analysis of the Dogpile transaction log is presented in
[4].
1 We will make this log file available to the research community upon
expiration of the NDA. Other search log files are available at
http://ist.psu.edu/faculty_pages/jjansen/academic/transaction_logs.html.
Copyright is held by the author/owner(s).
WWW 2007, May 8–12, 2007, Banff, Alberta, Canada.
ACM 978-1-59593-654-7/07/0005.
WWW 2007 / Poster Paper Topic: Search
5. RESULTS
For research question one, we derived the following characteristics
for each category.
Navigational Searching
• queries containing company/business/organization/people names
• queries containing domain suffixes
• queries with “web” as the source
• query length (i.e., number of terms in query) less than 3
• searcher viewing the first search engine results page
Transactional Searching
• queries containing terms related to movies, songs, lyrics, recipes, images, humor, and porn
• queries with “obtaining” terms (e.g., lyrics, recipes, etc.)
• queries with “download” terms (e.g., download, software, etc.)
• queries relating to image, audio, or video collections
• queries with “audio”, “images”, or “video” as the source
• queries with “entertainment” terms (pictures, games, etc.)
• queries with “interact” terms (e.g., buy, chat, etc.)
• queries with movies, songs, lyrics, images, and multimedia or compression file extensions (jpeg, zip, etc.)
Informational Searching
• queries using question words (i.e., “ways to,” “how to,” “what is”, etc.)
• queries with natural language terms
• queries containing informational terms (e.g., list, playlist, etc.)
• queries that were beyond the first query submitted
• queries where the searcher viewed multiple results pages
• query length (i.e., number of terms in a query) greater than 2
• queries that do not meet criteria for navigational or transactional
Some navigational queries were quite easy to identify, especially
those queries containing portions of URLs or even complete URLs.
We also classified company and organizational names as
navigational queries, assuming that the user intended to go to the
Website of that company or organization. We also noted that most
navigational queries were short in length and occurred at the
beginning of the user session. Identification of transactional
queries was primarily via term and content analysis, with
identification of key terms related to transactional domains such as
entertainment and ecommerce. With the relatively clear
characteristics of navigational and transactional queries,
informational queries became the catch-all by default.
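The characteristics above can be read as a small set of ordered rules. The following is a minimal sketch of that idea, not the program used in this study: the term lists, the name list, the regular expression, and the order in which the rules fire are all assumptions made for illustration, since the paper lists the characteristics but not the lexicons or rule precedence. Session-based characteristics (query position, result pages viewed) are omitted.

```python
import re

# Hypothetical lexicons: illustrative stand-ins, not the lists used in the study.
DOMAIN_SUFFIX = re.compile(r"\.(com|org|net|edu|gov)\b")
KNOWN_NAMES = {"ebay", "amazon", "penn state"}
TRANSACTIONAL_TERMS = {"lyrics", "recipes", "download", "software", "buy",
                       "chat", "games", "pictures", "jpeg", "zip"}
QUESTION_PREFIXES = ("how to", "ways to", "what is")


def classify(query, source="web"):
    """Assign one intent label to a query using simple ordered rules."""
    q = query.lower().strip()
    terms = q.split()                      # whitespace terms, used for query length
    tokens = re.findall(r"[a-z0-9]+", q)   # alphanumeric tokens, used for lexicon lookups

    # Informational: question words (assumed to take precedence).
    if q.startswith(QUESTION_PREFIXES):
        return "informational"
    # Navigational: domain suffixes, or a short query that is a known name.
    if DOMAIN_SUFFIX.search(q):
        return "navigational"
    if source == "web" and len(terms) < 3 and q in KNOWN_NAMES:
        return "navigational"
    # Transactional: multimedia source or transactional terms / file extensions.
    if source in {"audio", "images", "video"}:
        return "transactional"
    if any(token in TRANSACTIONAL_TERMS for token in tokens):
        return "transactional"
    # Informational is the catch-all category.
    return "informational"
```

For example, `classify("www.ebay.com")` falls under the domain-suffix rule, `classify("free song lyrics")` matches an “obtaining” term, and a query that matches nothing falls through to the informational catch-all, mirroring the default described above.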
For research question two, we implemented our characteristics in a
program. We then executed the program on the Dogpile search
engine transaction log, with Table 1 presenting the results.
Table 1. Results from Automatic Classification of Queries
Classification    Occurrences        %
Informational       1,228,427    80.6%
Navigational          155,628    10.2%
Transactional         139,738     9.2%
Total               1,523,793   100.0%
Table 1 shows that more than 80% of Web queries were
informational in intent, with navigational and transactional queries
each representing about 10% of Web queries. These results
indicate a higher level of informational queries than reported in
prior work. Broder [2] used a random sample of queries separate from
the session, and Rose and Levinson [7] used only the first query in
each session. These differences in data sampling may be
responsible for the discrepancies in percentages with our work,
which uses all queries from the user sessions.
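As a quick check on the reported distribution, the percentages can be recomputed from the occurrence counts in Table 1. The variable names in this short sketch are illustrative only.

```python
# Occurrence counts as reported in Table 1.
counts = {
    "informational": 1_228_427,
    "navigational": 155_628,
    "transactional": 139_738,
}

total = sum(counts.values())

# Share of each category as a percentage, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label:<14} {counts[label]:>10,} {share:>6.1f}%")
print(f"{'total':<14} {total:>10,}")
```

The recomputed total and shares agree with the table: the three counts sum to 1,523,793 queries, and the shares round to 80.6%, 10.2%, and 9.2%.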
6. CONCLUSION
In order for Web search engines to continue to improve, they must
leverage an increased knowledge of user behavior, especially
efforts to understand the underlying intent of the searchers. The
results of this research demonstrate that an approach for
automatically classifying queries can be implemented. Our approach
does not depend on external content and can be implemented in real
time. This makes it a viable solution for Web search engines to
classify user intent based on the type of content desired.
Additionally, the larger data set provides more accurate
percentages of user intent classification than smaller, mostly
manual studies. The higher percentage of informational queries
indicates that users view search engines primarily as information
retrieval tools rather than instruments of navigation or commerce.
A limitation of our study is that we assigned each query to one and
only one category. We are aware that a query may have multiple
intents. However, from the results of our effort to verify the
accuracy of our approach, it appears that approximately 75 percent
of queries can be classified into a single category of intent (i.e.,
informational, navigational, or transactional) based on a manual
coding of 400 queries. We are planning to investigate probabilistic
approaches such as naïve Bayes to arrive at a probability of
classifying a query into one or more categories. Future work
involves analyzing both queries and sessions in order to identify
more granular classifications of user intent (i.e., sub-categorizations
of informational, navigational, and transactional). Web results
targeted more closely to the user’s underlying content need will
increase the performance of future Web search engines.
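To make the idea of probabilistic classification concrete, the following is a minimal sketch of a multinomial naïve Bayes classifier with add-one smoothing that returns a probability for each intent category rather than a single label. It illustrates the general technique only and is not an implementation from this study; the training queries, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labeled queries, invented for illustration.
TRAINING = [
    ("how to tie a tie", "informational"),
    ("what is photosynthesis", "informational"),
    ("history of rome", "informational"),
    ("ebay", "navigational"),
    ("penn state homepage", "navigational"),
    ("www dogpile com", "navigational"),
    ("download free software", "transactional"),
    ("buy concert tickets", "transactional"),
    ("song lyrics", "transactional"),
]


def train(examples):
    """Count class frequencies and per-class term frequencies."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for query, label in examples:
        class_counts[label] += 1
        for term in query.lower().split():
            term_counts[label][term] += 1
            vocabulary.add(term)
    return class_counts, term_counts, vocabulary


def intent_probabilities(query, model):
    """Return P(intent | query) for every intent, normalized to sum to 1."""
    class_counts, term_counts, vocabulary = model
    total_examples = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total_examples)  # log prior
        denominator = sum(term_counts[label].values()) + len(vocabulary)
        for term in query.lower().split():
            # Add-one (Laplace) smoothing so unseen terms do not zero out a class.
            score += math.log((term_counts[label][term] + 1) / denominator)
        log_scores[label] = score
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_scores.values())
    unnormalized = {label: math.exp(s - peak) for label, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {label: value / z for label, value in unnormalized.items()}


model = train(TRAINING)
probabilities = intent_probabilities("ebay lyrics", model)
```

With this toy training set, a query such as "ebay lyrics" mixes a navigational cue with a transactional one, so the probability mass is split between those two categories instead of being forced into one. This is the kind of multi-faceted query that a single-label scheme handles poorly.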
ACKNOWLEDGMENT
We would like to thank Infospace.com for providing the data for
this analysis. The AFOSR and the NSF funded portions of this
research.
7. REFERENCES
[1] Baeza-Yates, R., Calderón-Benavides, L. and González-Caro, C.
2006. The Intention Behind Web Queries. In Proceedings of String
Processing and Information Retrieval (SPIRE 2006). Glasgow,
Scotland, 98-109.
[2] Broder, A. 2002. A Taxonomy of Web Search. SIGIR Forum.
36, 2, 3-10.
[3] Jansen, B. J. and Spink, A. 2005. How are we searching the
World Wide Web? A comparison of nine search engine transaction
logs. Information Processing & Management. 42, 1, 248-263.
[4] Jansen, B. J., Spink, A., Blakely, C. and Koshman, S.
forthcoming. Web Searcher Interaction with the Dogpile.com
Meta-Search Engine. Journal of the American Society for
Information Science and Technology.
[5] Jansen, B. J., Spink, A. and Saracevic, T. 2000. Real Life,
Real Users, and Real Needs: A Study and Analysis of User Queries
on the Web. Information Processing & Management. 36, 2, 207-227.
[6] Lee, U., Liu, Z. and Cho, J. 2005. Automatic Identification of
User Goals in Web Search. In Proceedings of The World Wide
Web Conference. Chiba, Japan, 391-401.
[7] Rose, D. E. and Levinson, D. 2004. Understanding User
Goals in Web Search. In Proceedings of the World Wide Web
Conference (WWW 2004). New York, NY, USA, 13-19.