Intentional Query Suggestion:
Making User Goals More Explicit During Search
Markus Strohmaier
Graz University of Technology and
Know-Center
Inffeldgasse 21a
8010 Graz, Austria
markus.strohmaier@tugraz.at
Mark Kröll
Graz University of Technology
Inffeldgasse 21a
8010 Graz, Austria
mkroell@tugraz.at
Christian Körner
Graz University of Technology
Inffeldgasse 21a
8010 Graz, Austria
christian.koerner@
student.tugraz.at
ABSTRACT
The degree to which users make their search intent explicit can
be assumed to represent an upper bound on the level of service
that search engines can provide. In a departure from traditional
query expansion mechanisms, we introduce Intentional Query
Suggestion as a novel idea that attempts to make users’ intent
more explicit during search. In this paper, we present a
prototypical algorithm for Intentional Query Suggestion and
discuss corresponding data from comparative experiments with
traditional query suggestion mechanisms. Our preliminary results
indicate that intentional query suggestions 1) diversify search
result sets (i.e., they reduce result set overlap) and 2) have the
potential to yield higher click-through rates than traditional query
suggestions.
Categories and Subject Descriptors
H.1.2 [User/Machine Systems]: Human Factors; H.3.3
[Information Storage and Retrieval]: Query Formulation,
Search Process, Retrieval Models
General Terms
Algorithms, Human Factors, Experimentation
Keywords
Query Suggestion, User Intent
1. INTRODUCTION
In IR literature, the purpose of query suggestion has often been
described as the process of making a user query resemble more
closely the documents it is expected to retrieve ([26]). In other
words, the goal of query suggestion is commonly understood as
maximizing the similarity between query terms and expected
documents. The task of a searcher then is to envision the expected
documents, and craft queries that reflect their contents.
However, research on query log analysis suggests that many
queries exhibit a lack of user understanding about the specific
documents users expect to retrieve. Broder [6] found that only
~25% of queries have a clear navigational intent, and up to ~75%
of queries need to be understood as informational or transactional
queries, meaning they are not directed towards a specific set of
expected documents. Recent studies even estimate more drastic
ratios [13]. While users crafting informational or transactional
search queries often have a high level search intent (“plan a trip to
Europe”), in many situations they have no clear idea or knowledge
about the specific documents they expect to retrieve. This makes
it difficult for users to craft successful queries and makes query
suggestion a particularly important and challenging problem.
In this paper we are interested in exploring the following question:
What if search engines, rather than letting users guess
arbitrary words from the set of documents they are expected to
retrieve, encouraged users to state their original search intent in
a more unambiguous and natural way? In other words, what if
search engines encouraged users to make their search intent
more explicit (e.g. “buy a car”) rather than formulating their query
in a rather artificial manner (“car dealership”)? In future search
interfaces (such as audio search interfaces for cell phones or
natural language search interfaces), current mechanisms for query
suggestion might become inadequate and natural language search
queries might play a more important role. This work is interested
in understanding how current search methods would cope with
such a development.
For this purpose, we introduce and study a novel approach to
query suggestion: Intentional Query Suggestion or query
suggestion by user intent. While traditional query suggestion often
aims to make a query resemble more closely the documents a user
is expected to retrieve (which might be unknown to the user), we
want to study an alternative: expanding queries to make searchers’
intentions more explicit.
To give an example: In traditional query suggestion, a query “car”
might receive the following suggestions: “car rental”, “car
insurance”, “enterprise car rental”, “car games” (actual suggestions
produced by Yahoo.com on Nov 27th 2008). In query suggestion
based on explicit user intent, the suggestions could be “buy a car”,
“rent a car”, “sell your car”, “repair your car” (see Table 1 for
examples). We can speculate that in innovative search interfaces
(such as audio search interfaces), such suggestions would be
easier to verify with a user than verifying traditional query
suggestions (e.g. “Do you want to: buy a car OR sell a car OR …?”).
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
WSCD’09 at WSDM’09, February 9, 2009, Barcelona, Spain.
Copyright 2009 ACM 1-58113-000-0/00/0004…$5.00.
Table 1: Comparison of suggested queries provided by Yahoo!, MSN and Intentional Query Suggestion.

| Initial Query | Semantic Query Suggestion¹ | Semantic Query Suggestion² | Intentional Query Suggestion |
|---|---|---|---|
| car | car rental, car insurance, enterprise car rental, car games | used cars, new cars, 2007 new cars, used cars for sale, cars for sale, fast cars, classic cars, car games | buy a car, rent a car, sell your car, repair your car |
| poker | online poker, poker games, world series of poker, party poker, free poker | free online poker, full tilt poker, free poker games, free poker, poker rules, absolute poker, online poker, poker hands | cheating at poker, learn to play poker, buy poker table, design your own poker chips |
| house | house plans, white house, house of fraser, columbia house, house of blues, full house | house TV show, houses for sale, houses for rent, house plans, house MD, house fox, haunted houses, Hugh Laurie | insure my house, sell your house, make offer on house, buy house online, build my own house |

We are interested in studying the effects of this idea on the search
result sets obtained from experiments with a current search engine
provider. In particular, we are interested in seeking answers to the
following questions: How do today’s search engines deal with
queries that contain explicit user goals? How would queries
expanded by user intent influence search results and click
through?
This paper introduces Intentional Query Suggestion as a novel
type of query suggestion. Specifically, this paper 1) introduces a
definition of Intentional Query Suggestion, 2) presents a
preliminary algorithm to perform intentional query suggestion
based on historic query log data, and 3) discusses experimental
results and potential implications for future research on search
interfaces.
2. QUERY SUGGESTION
The general idea of query suggestion is to support the searcher in
formulating queries that have a better chance to retrieve relevant
documents [21], [3]. Methods offered to expand queries can be
divided into two major categories. Global methods employ entire
document collections or external sources such as thesauri as
corpora for producing suggestions. Local methods reformulate the
initial query based on the result set it has retrieved. Relevance
feedback represents another query reformulation strategy in which
a searcher is involved by marking retrieved documents as relevant
or not. Global as well as local methods aim to eventually move
the initial query closer to the entire cluster of relevant documents.
2.1 Intentional Query Suggestion
While traditional query suggestion techniques aim at narrowing
the gap between the initial query and the set of relevant
documents, we seek to approximate the user’s intentions behind a
query and expand it based on a better understanding of the
corresponding information need, thereby aiming to make user
intent more explicit.
¹ Related query suggestion results from Yahoo!
² Related query suggestion results from MSN
We define Intentional Query Suggestion as the incremental
process of transforming a query into a new query based on
intentional structures found in a given domain, in our case: a
search query log. An initial query is replaced by the most probable
intentions that underlie the query. To give an example: for the
query “playground mat”, an Intentional Query Suggestion
mechanism might suggest the following 5 user intentions: “buy
playground equipment”, “build a swing set”, “covering dirt in a
playground”, “buy children plastic slides”, “raise money for our
playground”.
In our case, we extract the proposed intentions from search query
logs, but they could potentially be extracted from other knowledge
bases containing common human goals as well, such as
ConceptNet [19] or others. To the best of our knowledge, the
application of explicit search intent [24] to query suggestion
represents a novel idea that has not been studied yet.
3. EXPERIMENTAL SETUP
In traditional query suggestion, an initial query formulation is
replaced by some other query that refines, disambiguates or
clarifies the original query. In our approach, the initial query is
replaced by a query that exhibits a higher degree of intentional
explicitness, meaning that it makes user intent more explicit [24].
Definition: We define this replacement as query suggestion based
on user intent. The suggested queries can be considered to
represent Intentional Query Suggestions whenever they 1) contain
at least one verb and 2) describe a plausible state of affairs that
the user may want to achieve or avoid (cf.) in 3) a recognizable
way.
We developed a parametric algorithm that executes the function
f(q) → RQE = {qe,1, qe,2 … qe,k}, mapping implicit intentional
queries (length <= 2) to a set of potential explicit intentional query
suggestions (e.g. “car” → “buy a car”, “rent a car”, “repair my car”).
3.1 Datasets
The MSN Search query log excerpt contains about 15 million
queries (from US users) that were sampled over one month in
May, 2006. The search query log data is split into two files, one
file containing attributes Time, Query, QueryID and ResultCount, the
other one attributes QueryID, Query, Time, URL and Position
providing click-through data. The queries were modified via the
following normalization steps (i) trimming of each query, and (ii)
space sequence reduction to one space character. Queries and
corresponding click-through data containing adult content were
filtered out (and were not taken into account in our study).
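The two normalization steps can be illustrated with a minimal Python sketch. The function name `normalize_query` is hypothetical and not taken from the original implementation; the sketch only mirrors steps (i) and (ii) as described above and does not cover the adult-content filter.

```python
import re


def normalize_query(raw: str) -> str:
    """Apply the two normalization steps described in the text.

    (i)  trim leading and trailing whitespace
    (ii) reduce every sequence of spaces to a single space character
    """
    trimmed = raw.strip()
    return re.sub(r" +", " ", trimmed)


print(normalize_query("  lose   weight fast "))
```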
A set of ~46,000 explicit intentional queries was extracted from
the MSN Search Asset Data Spring 2006 applying the algorithm
described in [24]. The resulting set has an estimated precision of
77% of explicit intentional queries (based on the evaluations
reported in [25]) and represents our knowledge base for
Intentional Query Suggestion. We call this subset of queries the
Explicit Intentional Query Dataset from here on.
Our parametric algorithm for Intentional Query Suggestion
approximates the searcher’s intent by combining two different yet
complementary approaches, i.e. text-based Intentional Query
Suggestion (see Section 3.2) and neighborhood-based Intentional
Query Suggestion (see Section 3.3). The two approaches can be
combined yielding a ranked list of potential intentional query
suggestions.
3.2 Text-Based Intentional Query Suggestion
In the text-based approach, the tokens of input queries are
textually compared to all query tokens in the Explicit Intentional
Query Dataset. We experimented with several text-based
similarity measures including Cosine Similarity, Dice Similarity,
Jaccard Similarity and Overlap Similarity [11], [3]. Because the
similarity measures did not exhibit significant differences, we
decided on using Jaccard Similarity throughout our experiments
for reasons of simplicity. In text-based intentional query
suggestion, we calculate Jaccard Similarity in the following way:
$$S_T(q_A, q_B) = \frac{|q_A \cap q_B|}{|q_A \cup q_B|}$$
where qA and qB are the respective token sets representing two
queries.
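As a rough illustration of the text-based measure, the following Python sketch computes Jaccard similarity on query token sets. Whitespace tokenization and lowercasing are assumptions made for this example; the paper does not specify how queries were tokenized, and the function names are hypothetical.

```python
def tokenize(query: str) -> set:
    """Split a query into a set of lowercase tokens (assumed tokenization)."""
    return set(query.lower().split())


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def text_similarity(q_a: str, q_b: str) -> float:
    """Text-based similarity between two queries on their token sets."""
    return jaccard(tokenize(q_a), tokenize(q_b))


# "car" shares one of the three distinct tokens with "buy a car"
print(text_similarity("car", "buy a car"))
```

Under this tokenization, a one-token query can reach at most 1/n similarity with an n-token explicit intentional query, which is one reason a complementary neighborhood-based signal is useful.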
3.3 Neighborhood-Based Intentional Query
Suggestion
In addition to Intentional Query Suggestion based on text, we are
using a similarity construct based on query log session
neighborhood. This has the potential to include behavioural
intentional structures in our algorithm. For that purpose, we are
conceptualizing query logs as consisting of two types of nodes (a
bipartite graph), where nodes of one type correspond to explicit
intentional queries and nodes of the other type correspond to
implicit intentional queries. We construct a bipartite graph based
on session proximity between these two types of nodes. Thereby,
we use neighboring queries to further describe and characterize
explicit intentional queries, building characteristic term vectors
for explicit intentional queries. In the following, we introduce the
parametric algorithm for intentional query expansion in a more
formal way.
Table 2: Search query log excerpt illustrating the explicit intentional query qe,1 and its neighborhood N(qe,1, 3).

| Type | Query | Date |
|---|---|---|
| qu,1 | types of diet pills | 2006-05-24 13:34:16 |
| qu,2 | Lipo6 | 2006-05-24 13:36:24 |
| qu,3 | lose 20 pounds in 8 weeks | 2006-05-24 13:37:23 |
| qe,1 | lose weight fast | 2006-05-24 13:38:42 |
| qu,4 | lose weight fast | 2006-05-24 13:39:06 |
| qu,5 | weight loss upplements | 2006-05-24 13:39:51 |
| qu,6 | weight loss supplements | 2006-05-24 13:39:56 |

3.3.1 Parametric Algorithm
Let Q = {q1, q2 … qn} denote the set of n queries in a search query
log. Q consists of two disjoint sets QE={qe,1, qe,2 … qe,s} and
QU={qu,1, qu,2 … qu,t} so that Q = QE ∪ QU and s + t = n. QE
represents the set of explicit intentional queries, such as “lose
weight fast”, and QU the neighboring implicit intentional queries
such as “weight loss supplements” as illustrated in Table 2.
We define the neighborhood of an explicit intentional query qe as
N(qe, Pd), where the parameter Pd determines the number of
queries that are considered before and after the query qe. The
neighborhood N(qe, Pd) contains 2 * Pd queries q, where q ∈ QU
holds. Queries qi ∈ N(qe, Pd) are processed to serve as tags
(dimensions of the characteristic vector describing explicit
intentional queries) for the corresponding intentional query qe.
After stop words have been removed, the remaining tokens are
combined into a set of words and form a tag set T(qe)={t1, t2 … tm}
of the explicit intentional query qe. In addition to parameter Pd, we
introduce the parameter Pi that denotes the intersection size
between explicit intentional queries and neighboring queries. This
parameter can be considered as a quality filter. Tokens of one
query are only admitted to the tag set T(qe) if the query shares at
least Pi tokens with qe. Let qe be “lose weight fast”, qu be “weight
loss supplements” and Pi = 1: qe and qu share one common term
(“weight”). Consequently, the tokens of qu are considered tags for
qe, i.e. T(qe) = {“weight”, “loss”, “supplement”}. We suspect this
parameter to be related to the quality of the tags admitted to the
tag set and consequently related to the quality of the entire model.
This yields a characteristic vector of tags for each explicit
intentional query based on session-neighborhood.
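The construction of a tag set T(qe) can be sketched in Python as follows. This is an illustrative reading of the description above, not the original implementation: the stop word list is invented for the example, the window is taken positionally (Pd log entries before and after qe), explicit intentional neighbors inside the window are skipped, the token overlap for Pi is computed after stop word removal, and no stemming is applied. The names `tag_set` and `STOP_WORDS` are hypothetical.

```python
# Illustrative stop word list (an assumption; the paper does not list one).
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "for", "my", "your"}


def content_tokens(query: str) -> set:
    """Lowercase whitespace tokens of a query with stop words removed."""
    return {t for t in query.lower().split() if t not in STOP_WORDS}


def tag_set(session, idx, p_d=3, p_i=1):
    """Build T(q_e) for the explicit intentional query at position idx.

    session: chronological list of (query, is_explicit) tuples.
    p_d:     number of log entries considered before and after q_e.
    p_i:     minimum number of tokens a neighbor must share with q_e.
    """
    q_e_tokens = content_tokens(session[idx][0])
    lo = max(0, idx - p_d)
    hi = min(len(session), idx + p_d + 1)
    tags = set()
    for j in range(lo, hi):
        if j == idx:
            continue
        query, is_explicit = session[j]
        if is_explicit:
            continue  # only implicit intentional neighbors contribute tags
        neighbor_tokens = content_tokens(query)
        if len(neighbor_tokens & q_e_tokens) >= p_i:
            tags |= neighbor_tokens
    return tags


# The log excerpt from Table 2; the flags follow the text's remark that
# q_u,3 and q_u,4 are themselves explicit intentional queries.
session = [
    ("types of diet pills", False),
    ("Lipo6", False),
    ("lose 20 pounds in 8 weeks", True),
    ("lose weight fast", True),  # q_e,1
    ("lose weight fast", True),
    ("weight loss upplements", False),
    ("weight loss supplements", False),
]
print(sorted(tag_set(session, 3)))
```

With Pi = 1, the two diet-related neighbors before qe,1 contribute nothing because they share no token with “lose weight fast”, which illustrates the role of Pi as a quality filter.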
Figure 1 shows a bipartite graph that was partly generated from
the query log excerpt in Table 2 with a parameter setting Pd = 3
and Pi = 1. The graph illustrates relations between explicit
intentional queries and meaningful terms in the session
neighborhood, representing characteristic term vectors for explicit
intentional queries. The example also shows that the
neighborhood-based approach is agnostic to misspellings. The
bipartite graph is useful in at least two ways: Bottom-up, it can
help to produce intentional query suggestions based on
co-occurrence (e.g. “upplements” → “lose weight fast”). Top-down, the
graph can help to transform explicit intentional queries into
implicit ones (which is not further pursued in this paper). Note
that qu,3 and qu,4 both represent explicit intentional queries and are
therefore neglected in the graph generation process.
Figure 1: Bipartite graph partly generated from search query
log excerpt in Table 2 with parameter setting Pd=3 and Pi=1.
Similarity between an input query (“upplements”) and a number of
explicit intentional queries (“lose weight fast”) can now be
calculated with traditional similarity metrics. Again, we
experimented with different similarity measures and opted for the
Jaccard similarity measure due to insignificant differences
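between the measures. In neighbourhood-based intentional query
suggestion, we calculate Jaccard similarity in the following way:
$$S_G(q_A, q_B) = \frac{|T(q_A) \cap T(q_B)|}{|T(q_A) \cup T(q_B)|}$$
where T(qA) and T(qB) are the respective token sets representing
two queries. This is the same ratio as the text-based measure
S_T(qA, qB) of Section 3.2, which is computed directly on the
query token sets qA and qB.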
3.4 Query Suggestion based on User Intent
When input queries are processed by our algorithm, both
similarity measures are calculated. In our approach, a linear
combination determines the overall similarity between an input
query and every explicit intentional query in our dataset yielding a
ranked list of potential user intentions. The parameter α defines
the impact of each measure:
$$S(q_A, q_B) = \alpha \cdot S_T(q_A, q_B) + (1 - \alpha) \cdot S_G(q_A, q_B)$$
In this work we do not intend to identify an optimized parameter
set to generate the model. We rather chose a simple parameter set
for the purpose of seeking answers to the exploratory questions of
this paper. Future work might explore the utility of parameter
variations in greater depth.
The parametric algorithm for Intentional Query Suggestion can be
described by the function fIQS(Pd, Pi, α). We used the following
parameter setting: Pd = 3, Pi = 1 and α = 0.5 in our experiments.
An evaluation of the selected model is provided in Section 3.5.
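The ranking step can be sketched as a linear combination of the two Jaccard scores. The Python example below is a simplified illustration under several assumptions: the knowledge base is a plain dictionary from explicit intentional queries to their tag sets, the input query is represented by its whitespace tokens in both measures, and candidates with a score of zero are dropped. The function `rank_intentions` and the toy knowledge base are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_intentions(query, knowledge_base, alpha=0.5, k=10):
    """Rank explicit intentional queries for an input query.

    knowledge_base: dict mapping an explicit intentional query to its
                    neighborhood tag set T(q_e).
    alpha:          weight of the text-based score S_T; (1 - alpha) weights
                    the neighborhood-based score S_G.
    Returns up to k (explicit_query, score) pairs, best first.
    """
    q_tokens = set(query.lower().split())
    scored = []
    for q_e, tags in knowledge_base.items():
        s_t = jaccard(q_tokens, set(q_e.lower().split()))
        s_g = jaccard(q_tokens, tags)
        score = alpha * s_t + (1 - alpha) * s_g
        if score > 0:
            scored.append((score, q_e))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(q_e, score) for score, q_e in scored[:k]]


# Toy knowledge base; the tag sets are made up for illustration.
kb = {
    "lose weight fast": {"weight", "loss", "upplements", "supplements"},
    "buy a car": {"car", "dealership"},
}
print(rank_intentions("upplements", kb))
print(rank_intentions("car", kb))
```

In this toy setting the misspelled query “upplements” has no textual overlap with “lose weight fast” and is matched only through the neighborhood tags, which is the behaviour the bipartite graph is meant to provide.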
3.5 Evaluation
We conducted a user study to learn more about the quality of
intentions that were suggested by our algorithm. Annotators were
asked to categorize the 10 top-ranked suggested explicit
intentional queries for 30 queries into one of the following two
relevance classes.
Relevance Classes:
(1) Potential User Intention: the suggested query represents a plausible intention behind a short query.

| Initial Query | Intentional Query Suggestions |
|---|---|
| “anime” | “draw anime”, “draw manga” |
| “playground mat” | “buy playground equipment”, “build a swing set” |

or the suggested query represents an unlikely yet still related user intention, as illustrated by the following examples:

| Initial Query | Intentional Query Suggestions |
|---|---|
| “Boston herald” | “getting around Boston”, “sightseeing in Boston” |
| “ginseng coffee” | “moving coffee stains”, “fix my keyboard” |

(2) Clear Misinterpretation: the suggested query has no relation with the initial query. Suggestions that do not conform to our definition (see Section 3) are assigned this category as well.

| Initial Query | Intentional Query Suggestions |
|---|---|
| “Boston herald” | “care for Boston fern”, “flying to Nantucket” |
| “playground mat” | “raise money for our playground”, “weave a basket fifth grade project” |

30 queries of length 1 or 2 were randomly drawn from the MSN
search query log. The prospective queries were filtered with
regard to (i) reasonableness, i.e. discarding queries such as
“wiseco” or “drinkingmate”, and (ii) suitability for non-American
raters, i.e. discarding queries such as “target” or “espn”.
In order to evaluate intentional query suggestions that are
provided by our algorithm, we calculated the percentage of correct
suggestions, i.e. query suggestions that were assigned to relevance
class 1. Achieved precision values are illustrated in Table 3.
Table 3: Precision values of our algorithm as rated by three human annotators (X, Y and Z).

| | X | Y | Z |
|---|---|---|---|
| Precision | 0.61 | 0.73 | 0.8 |

The average precision amounts to 0.71, i.e. in seven out of ten
cases the algorithm returns a potential user intention.
In addition, we calculated the inter-rater agreement κ [8] between
all pairs of human subjects X, Y, and Z. Cohen’s κ measures the
average pair-wise agreement corrected for chance agreement
when classifying N items into C mutually exclusive categories.
Cohen’s κ formula reads:
$$\kappa = \frac{P(O) - P(C)}{1 - P(C)}$$
where P(O) is the proportion of times that a hypothesis agrees
with a standard (or another rater), and P(C) is the proportion of
times that a hypothesis and a standard would be expected to agree
by chance. The κ value is constrained to the interval [-1,1]. A
κ-value of 1 indicates total agreement, 0 indicates agreement by
chance and -1 indicates total disagreement. Table 4 shows the
achieved κ-values in our human subject study.
Table 4: Kappa values amongst three annotators (X, Y and Z) for the two relevance classes.

| | X-Y | X-Z | Y-Z |
|---|---|---|---|
| Cohen’s Kappa (κ) | 0.6416 | 0.5125 | 0.6703 |

The κ-values (see Table 4) range from 0.51 to 0.67 (0.61 on
average); two of the three values lie above 0.6, indicating some
level of agreement.
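For readers who want to reproduce this kind of agreement figure, Cohen’s κ for one pair of raters can be computed as in the following Python sketch. The label sequences are invented for illustration and are not the study’s annotations; the function name `cohens_kappa` and the handling of the degenerate case P(C) = 1 are choices made for this example.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items.

    P(O) is the observed proportion of agreement; P(C) is the agreement
    expected by chance, derived from each rater's label frequencies.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    p_c = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_c == 1.0:
        # Kappa is undefined when chance agreement is certain;
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_c) / (1 - p_c)


# Invented labels: 1 = potential user intention, 2 = clear misinterpretation
rater_x = [1, 1, 2, 2]
rater_y = [1, 2, 1, 2]
print(cohens_kappa(rater_x, rater_x))  # identical ratings
print(cohens_kappa(rater_x, rater_y))  # agreement at chance level
```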
4. PRELIMINARY RESULTS
In this section we discuss two potential implications of Intentional
Query Suggestion for web search: First, diversity of search results
has recently gained importance in web search [9]. For example in
informational queries, web search results should not provide
monolithic search result sets but rather cover as many different
aspects (topics) as possible. We are interested in exploring the
influence of explicit intentional queries on the diversity of search
result sets. If result sets of explicit intentional queries were
more diverse, Intentional Query Suggestion could help to better
focus and guide searchers’ intent in exploratory searches.
Second, click-through rates have been frequently used as a proxy
for measuring relevance in large document collections (cf. [10]).
We are interested in studying whether explicit intentional queries
would yield different or better click-through rates than implicit
intentional queries. If explicit intentional queries yielded
higher click-through rates, making user intent more explicit would
represent an interesting new mechanism to improve search engine
performance.
4.1 Influence on Diversity of Search Results
We examine the diversity within search results by calculating the
intersection size between different URL result sets produced by
different/same query suggestion mechanisms. Two experiments
were conducted, seeking answers to the following questions:
(i) Intersection between different Query Suggestion
Mechanisms: How many URLs (top level domains only)
intersect between URL result sets retrieved by 1) the
original queries, 2) the corresponding Yahoo! expanded
queries and 3) the corresponding intentional query
suggestions?
(ii) Intersection within same Query Suggestion
Mechanisms: How many URLs (top level domains only)
intersect between result sets that were retrieved by
different query suggestions (produced by the same
query suggestion mechanism) regarding one original
query?
400 queries of length 1 or 2 were randomly drawn from the MSN
search query log. The following constraints were applied: original
queries (i) should yield at least 10 suggestions by our algorithm,
(ii) should not contain misspellings and (iii) must not be ‘adult’
phrases. For each selected query, the top 10 suggestions were
produced by using the Yahoo! API and by the Intentional Query
Suggestion algorithm. We processed the top 50 result URLs for
each suggestion, totalling 500 URLs per selected query. Searches
were conducted by applying the Yahoo! BOSS API³. In order to
compare the original query results with both expanded results sets,
500 resulting URLs are retrieved for every original query. For
each query, we calculated how many URLs are shared on average
between the URL result sets taking into account only unique
URLs as well as only top level domains of the resulting set.
Again, we used Jaccard as a metric for intersection/similarity. The
averaged results over all candidate queries are shown in Table 5.
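The overlap computation for a pair of URL result sets can be sketched as follows. This Python example interprets “top level domains” as the host names of the result URLs, which is an assumption about the paper’s wording; the function names and example URLs are hypothetical, and URLs without a scheme are ignored because `urlsplit` cannot extract a host name from them.

```python
from urllib.parse import urlsplit


def hosts(urls):
    """Reduce a list of result URLs to the set of unique host names."""
    result = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host:  # skip entries from which no host name can be parsed
            result.add(host)
    return result


def result_overlap(urls_a, urls_b):
    """Jaccard overlap between two URL result sets at host-name level."""
    a, b = hosts(urls_a), hosts(urls_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up result lists for an original query and one of its suggestions
original = ["http://a.example/cars", "http://b.example/rental"]
suggestion = ["http://a.example/buy-a-car", "http://c.example/dealers"]
print(result_overlap(original, suggestion))
```

Averaging this value over all query and suggestion pairs would give a per-mechanism figure comparable in kind to the averaged intersection sizes reported here.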
Table 5: Average intersection sizes for URL sets of original queries and their corresponding suggestions.

| Compared URL result sets | Avg. Intersection |
|---|---|
| Original Queries vs. Yahoo! Suggestions | 0.1911 |
| Original Queries vs. Intentional Suggestions | 0.0467 |
| Yahoo! Suggestions vs. Intentional Suggestions | 0.0511 |

The results in Table 5 imply that original query results share more
URLs with results from Yahoo! expanded queries than with
results yielded by queries that reflect potential user intent. This
suggests that if queries are expanded by user intent, more diverse
result sets can be achieved. In addition, we calculated the inner
intersection size of the result sets, i.e. the overlap between
different result sets produced by the same suggestion mechanism.
The results were again averaged over all queries and are shown in
Table 6.
The results in Table 6 suggest that queries expanded by Yahoo!
yield more overlapping URLs than queries expanded by user
intent. These results suggest that queries that express a specific
intention lead to more diverse results than queries that attempt to
approximate the expected document content to retrieve.
³ http://developer.yahoo.com/search/boss/

Table 6: Average intersection sizes for URL sets expanded by Yahoo! Suggestions and Intentional Query Suggestion.

| Compared URL result sets | Average Intersection |
|---|---|
| Yahoo! Suggestions | 0.103 |
| Intentional Query Suggestion | 0.026 |

Considering the presented results, we can speculate that search
processes could be made more focused if the searchers’ intention
is explicitly included in the search process. It appears that
intentional query suggestions diversify search results and cover a
wider range of topics than Yahoo!’s suggestions.
4.2 Influence on Click-Through
To study the influence of explicit intentional queries on
click-through, we analyzed the number of click-through events for
different token lengths. We obtained the click-through numbers
for different token lengths in the MSN query dataset and created
the following token length bins: one token queries, two token
queries, three to four token queries, five token queries, six to ten
token queries and queries consisting of more than ten tokens
(excluding explicit intentional queries). Five token queries were
of particular interest, since the average length of queries in our
Explicit Intentional Query Dataset amounts to 5.33 tokens. For
each category, a random sample of 5,000 queries was drawn from
the MSN search query log and all corresponding click-through
events were registered and counted. Table 7 shows the number of
click-through events for each bin and also for the set of explicit
intentional queries.
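Table 7: Click-through distribution for different query lengths and explicit intentional queries.

| Query type | Query length (tokens) | #click-through |
|---|---|---|
| Implicit intentional queries | 1 | 855,649 |
| Implicit intentional queries | 2 | 358,327 |
| Implicit intentional queries | 3-4 | 64,313 |
| Implicit intentional queries | 5 | 5,559 |
| Implicit intentional queries | 6-10 | 2,728 |
| Implicit intentional queries | >10 | 960 |
| Explicit intentional queries | 5.33 (average) | 7,236 |
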
It can be observed that explicit intentional queries appear to have
a ~30% higher number of click-through events (#click-through =
7,236) than implicit intentional queries of comparable length
(length 5, #click-through = 5,559). The higher click-through
numbers of explicit intentional queries suggest that such queries
retrieve more relevant results, which appears to be an interesting
finding and preliminary evidence for the potential utility of
intentional query suggestions.
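The ~30% figure follows from the two click-through counts in Table 7. As a worked check of the arithmetic:

```latex
\[
\frac{7{,}236 - 5{,}559}{5{,}559} \;=\; \frac{1{,}677}{5{,}559} \;\approx\; 0.30
\]
```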
5. RELATED WORK
Two areas of research are particularly relevant to our work:
Studies of search intent in query logs and query suggestion.
Studies of search intent in query logs: Peter Norvig discussed
search intent as one of the outstanding problems in the future of
search (interview in the Technology Review, Monday, July 16, 2007).
One interpretation of understanding the users’ needs is to
understand the intentions behind search queries. Intentional query
suggestions could be regarded as a first step in this direction by
helping users to make their search intent more explicit. In
previous years, several different definitions of user intent emerged
[6], [10], [12], [25]. Broder [6], for example, introduced a high-level
taxonomy of search intent by categorizing search queries into
three categories: navigational, informational and transactional.
This has stimulated a series of follow-up studies on automatic
query categorization by [18], [13], [15], [12] and [23]. Evolutions
of Broder’s taxonomy include collapsing categories, adding
categories [5] and/or focusing on subsets only [18]. In contrast to
Broder, we do not incorporate high-level categories of search
intent but rather focus on instances of user intentions
(“informational” vs. “things to consider when buying a car”).
He et al. [12] used syntactic structures, i.e. verb-object pairs, to
classify queries into Broder’s categories. In a similar way,
Strohmaier et al. [25] employed part-of-speech trigrams as
features to extract instances of user intentions in search query
logs. In this paper, user intent is understood as a certain type of
verb phrases that explicitly state the user’s goal. Downey et al.
[10] view the information seeking process differently: Actions
that follow a search query are proposed as characterizations of the
searcher’s information goal. The last URL visited in a search
session serves as a proxy for the user intent. While their approach
is useful to study user behavior during search sessions, it cannot
easily be used in an interactive way to enable users to make their
search intent more explicit.
In addition to studies of user intent, research on query suggestion
is related to our work as well. Query expansion [27], query
substitution [14], query recommendation [4] and query refinement
[17] are different concepts that share a similar objective:
transforming an initial query into a ‘better’ query that is capable
of satisfying the searcher’s information need by retrieving more
relevant documents. We deviate from these traditional approaches
that focus on query vs. expected documents by focusing on
queries and potential user intentions. Xu et al. [27] for example
employed local and global documents in query expansion by
applying the measure of global analysis to the selection of query
terms in local feedback. Query suggestion is closely related to
query substitution as well where the original query is extended by
new search terms to narrow the search scope. Jones et al. [14]
investigated a query substitution mechanism that does not exhibit
query drift which represents a common drawback of query
expansion techniques. The authors make use of search query
sessions to infer relations between queries.
Baeza-Yates et al. [4] proposed an approach that suggests related
queries based on query log data and clustering. Former queries
were transformed into a new term-vector representation by taking
into account the content of the clicked URLs. Another approach
reported in [17] employed anchor texts for the purpose of query
refinement. It is based on the observation that queries and anchor
texts are highly similar. Query transformation techniques have
already spread to other areas such as question answering [1].
Recent work on query suggestion includes [20] and [22].
Both papers apply their algorithms to bipartite graphs (user -
query and/or query - URL) that were generated from search query
logs. In a similar way, our work generates a bipartite graph from a
search query log. However, our approach builds on explicit
intentional queries and their implicit intentional query
neighborhood, thereby emphasizing explicit user intent rather than
the generation of syntactic or semantic query suggestions.
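As an illustration of the graph construction discussed above, the following
minimal Python sketch builds a query - URL bipartite graph from a click log
and, for a given implicit query, collects the explicit intentional queries in
its neighborhood. It is not the algorithm presented in this paper: the toy
log, the function names, the hand-labeled set of explicit intentional queries
and the ranking by number of shared URLs are all assumptions made for
illustration.

```python
from collections import defaultdict


def build_bipartite(click_log):
    """Index a (query, clicked_url) log in both directions."""
    query_to_urls = defaultdict(set)
    url_to_queries = defaultdict(set)
    for query, url in click_log:
        query_to_urls[query].add(url)
        url_to_queries[url].add(query)
    return query_to_urls, url_to_queries


def suggest_intentions(query, query_to_urls, url_to_queries, explicit_queries, k=3):
    """Return up to k explicit queries that share clicked URLs with `query`.

    Ranking by the number of shared URLs is an illustrative assumption.
    """
    shared = defaultdict(int)
    for url in query_to_urls.get(query, ()):
        for other in url_to_queries[url]:
            if other != query and other in explicit_queries:
                shared[other] += 1
    # Most shared URLs first; ties are broken alphabetically.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:k]]


# Hypothetical toy log; a real log would hold millions of entries.
click_log = [
    ("car", "a.example/buy"),
    ("car", "b.example/repair"),
    ("car", "c.example/garage"),
    ("buy a car", "a.example/buy"),
    ("repair a car", "b.example/repair"),
    ("repair a car", "c.example/garage"),
    ("car pictures", "d.example/img"),
]
# Hand-labeled here; in practice a classifier would supply this set.
explicit_queries = {"buy a car", "repair a car"}

query_to_urls, url_to_queries = build_bipartite(click_log)
suggestions = suggest_intentions("car", query_to_urls, url_to_queries, explicit_queries)
print(suggestions)
```

In this sketch the implicit query "car" is connected to both explicit
queries through shared clicked URLs, so both appear as candidate intentions.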
6. CONCLUSIONS
While there is a significant body of research on understanding
user intent during search ([6], [23], [13], [5], [18], [10], [7]), to
the best of our knowledge, the application of user intent to query
suggestion is a novel idea which has not been studied yet. In this
paper, we introduce and define the concept of Intentional Query
Suggestion and present a prototypical algorithm as first evidence
for the feasibility of this idea. In a number of experiments,
we could highlight interesting differences from traditional query
suggestion mechanisms: 1) Differences in the diversity of search
results. Our results suggest that intentional query expansions can
be used to diversify result sets. One implication of this finding is
that search engine vendors might be able to make search processes
more focused if the searchers’ intention is explicitly included in
the search process. 2) Different click-through distributions for
explicit intentional queries. Our experiments showed a higher
click-through ratio for explicit intentional queries compared to
implicit intentional queries of similar length. The higher click-
through numbers suggest that such queries retrieve more relevant
results. This interesting finding might inspire novel ways to
approach query suggestion in the future.
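Result set diversity can be made concrete as the overlap between the result
sets retrieved for two queries. The sketch below assumes that overlap is
measured as the Jaccard coefficient over retrieved URLs; this measure, the
function name and the example URLs are illustrative assumptions and are not
taken from the experiments reported in this paper.

```python
def result_set_overlap(results_a, results_b):
    """Jaccard overlap of two result lists, ignoring rank and duplicates."""
    set_a, set_b = set(results_a), set(results_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty result sets share nothing
    return len(set_a & set_b) / len(union)


# Hypothetical top results for an original query and for one suggestion.
original_results = ["u1", "u2", "u3", "u4"]
suggestion_results = ["u3", "u4", "u5", "u6"]

overlap = result_set_overlap(original_results, suggestion_results)
print(f"overlap = {overlap:.2f}")
```

Under this measure, a lower value means that the suggestion retrieves more
results that the original query did not.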
Our results could be relevant for a number of currently open
research problems. 1) Query disambiguation: Similar to Allan [2],
where the problem of query disambiguation was approached by
posing questions, Intentional Query Suggestion could provide a
mechanism to identify the original user goal during search. 2)
Search intent: A better understanding of the user’s intent could
give search engine vendors a better picture of users’ needs. In the
long run, approximating user intent could help make search
more focused and prevent topic drift. 3) Search session: Along
with a better understanding of users’ search intent, new, more
useful definitions of search sessions might be necessary. New
definitions could differ from existing definitions by, for example,
putting emphasis on a set of coherent, goal-related queries rather
than time-based notions, where multi-tasking behavior of users is
hard to capture. 4) Evaluation: Kinney et al. [16] point out the
difficulty of finding expert annotators when it comes to annotating
web search results for evaluation purposes. In order to alleviate
the annotation task, the authors proposed statements that
described the user intent behind a query. Intentional query
suggestion might serve as a link between plain queries and the
intent statements by offering a list of empirically-grounded,
plausible user intentions.
7. ACKNOWLEDGMENTS
We would like to thank Microsoft Research for providing the
search query log and Peter Prettenhofer for his support in
extracting the Explicit Intentional Query Dataset. This work is
funded by the FWF Austrian Science Fund Grant P20269
TransAgere. The Know-Center is funded within the Austrian
COMET Program under the auspices of the Austrian Ministry of
Transport, Innovation and Technology, the Austrian Ministry of
Economics and Labor and by the State of Styria. COMET is
managed by the Austrian Research Promotion Agency FFG.
8. REFERENCES
[1] Agichtein, E., Lawrence, S. and Gravano, L. Learning search
engine specific query transformations for question
answering. In 'WWW '01: Proceedings of the 10th
International Conference on World Wide Web', ACM, New
York, NY, USA, pp. 169--178, 2001.
[2] Allan, J. and Raghavan, H. Using part-of-speech patterns to
reduce query ambiguity. In 'Proceedings of the 25th Annual
International ACM SIGIR Conference on Research and
Development in Information Retrieval', ACM Press, New
York, NY, USA, pp. 307--314, 2002.
[3] Baeza-Yates, R. and Ribeiro-Neto, B. Modern Information
Retrieval, Addison-Wesley, 1999.
[4] Baeza-Yates, R., Hurtado, C. A. and Mendoza, M. Query
recommendation using query logs in search engines. In
Lindner, W., Mesiti, M., Türker, C., Tzitzikas, Y. and Vakali,
A., 'EDBT Workshops', Springer, pp. 588--596, 2004.
[5] Baeza-Yates, R., Calderón-Benavides, L. and González-Caro,
C. The intention behind web queries. In String Processing
and Information Retrieval, pp. 98--109, 2006.
[6] Broder, A. A taxonomy of web search. In ACM SIGIR Forum
36(2), pp. 3--10, 2002.
[7] Chang, Y., He, K., Yu, S. and Lu, W. Identifying user goals
from web search results. In 'WI '06: Proceedings of the 2006
IEEE/WIC/ACM International Conference on Web
Intelligence', IEEE Computer Society, Washington, DC,
USA, pp. 1038--1041, 2006.
[8] Cohen, J. A coefficient of agreement for nominal scales. In
Educational and Psychological Measurement 20(1), 37,
1960.
[9] Crabtree, D. W., Andreae, P. and Gao, X. Exploiting
underrepresented query aspects for automatic query
expansion. In 'KDD '07: Proceedings of the 13th ACM
SIGKDD international conference on Knowledge Discovery
and Data Mining', ACM, New York, NY, USA, pp. 191--
200, 2007.
[10] Downey, D., Liebling, D. and Dumais, S. Understanding the
relationship between searchers, queries and information
goals. In 'CIKM '08: Proceedings of the 17th ACM Conference
on Information and Knowledge Management', ACM, New
York, NY, USA, 2008.
[11] Ferber, R. Information Retrieval, Dpunkt.Verlag, ISBN 978-
3898642132, 2003.
[12] He, K., Chang, Y. and Lu, W. Improving identification of
latent user goals through search-result snippet classification.
In 'WI '07: Proceedings of the IEEE/WIC/ACM International
Conference on Web Intelligence', IEEE Computer Society,
Washington, DC, USA, pp. 683--686, 2007.
[13] Jansen, B. J., Booth, D. L. and Spink, A. Determining the
informational, navigational, and transactional intent of web
queries. In Inf. Process. Manage. 44(3), pp. 1251--1266,
2008.
[14] Jones, R., Rey, B., Madani, O. and Greiner, W. Generating
query substitutions. In 'WWW '06: Proceedings of the 15th
International Conference on World Wide Web', ACM, New
York, NY, USA, pp. 387--396, 2006.
[15] Kang, I. and Kim, G. Query type classification for web
document retrieval. In 'SIGIR '03: Proceedings of the 26th
Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval', ACM, New
York, NY, USA, pp. 64--71, 2003.
[16] Kinney, K. A., Huffman, S. B. and Zhai, J. How evaluator
domain expertise affects search result relevance judgments.
In 'CIKM '08: Proceedings of the 17th ACM Conference on
Information and Knowledge Management', ACM, New
York, NY, USA, pp. 591--598, 2008.
[17] Kraft, R. and Zien, J. Mining anchor text for query
refinement. In 'WWW '04: Proceedings of the 13th
International Conference on World Wide Web', ACM, New
York, NY, USA, pp. 666--674, 2004.
[18] Lee, U., Liu, Z. and Cho, J. Automatic identification of user
goals in web search. In 'WWW '05: Proceedings of the 14th
International Conference on World Wide Web', ACM, New
York, NY, USA, pp. 391--400, 2005.
[19] Liu, H. and Singh, P. 'ConceptNet — A practical
commonsense reasoning tool-kit'. In BT Technology Journal
22(4), pp. 211--226, 2004.
[20] Ma, H., Yang, H., King, I. and Lyu, M. R. Learning latent
semantic relations from clickthrough data for query
suggestion. In 'CIKM '08: Proceedings of the 17th ACM
Conference on Information and Knowledge Management',
ACM, New York, NY, USA, pp. 709--718, 2008.
[21] Manning, C. D., Raghavan, P. and Schütze, H. Introduction
to Information Retrieval, Cambridge University Press, 2008.
[22] Mei, Q., Zhou, D. and Church, K. Query suggestion using
hitting time. In 'CIKM '08: Proceedings of the 17th ACM
Conference on Information and Knowledge Management',
ACM, New York, NY, USA, pp. 469--478, 2008.
[23] Rose, D. E. and Levinson, D. Understanding user goals in
web search. In 'WWW '04: Proceedings of the 13th
International Conference on World Wide Web', ACM, New
York, NY, USA, pp. 13--19, 2004.
[24] Strohmaier, M., Prettenhofer, P. and Lux, M. Different
degrees of explicitness in intentional artifacts - studying user
goals in a large search query log. In 'CSKGOI'08:
Proceedings of the Workshop on Commonsense Knowledge
and Goal Oriented Interfaces, in conjunction with IUI'08',
Canary Islands, Spain, 2008.
[25] Strohmaier, M., Prettenhofer, P. and Kröll, M. Acquiring
explicit user goals from search query logs. In 'International
Workshop on Agents and Data Mining Interaction ADMI
'08, in conjunction with WI '08', 2008.
[26] Strzalkowski, T. and Carballo, J. Natural Language
Information Retrieval: TREC-5 Report. In 'Text REtrieval
Conference', pp. 164--173, 1998.
[27] Xu, J. and Croft, W. B. Query expansion using local and
global document analysis. In 'SIGIR '96: Proceedings of the
19th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval', ACM,
New York, NY, USA, pp. 4--11, 1996.