How Do Users Express Goals on the Web? - An Exploration of Intentional Structures in Web Search

M. Strohmaier1, M. Lux2, M. Granitzer3, P. Scheir1,3, S. Liaskos4, E. Yu5
1Graz University of Technology, 8010 Graz, Austria
2Klagenfurt University, 9020 Klagenfurt, Austria
3Know-Center Graz, 8010 Graz, Austria
4York University, Toronto, Canada
5University of Toronto, Toronto, Canada
markus.strohmaier@tugraz.at, mlux@itec.uni-klu.ac.at, mgrani@know-center.at,
peter.scheir@tugraz.at, liaskos@yorku.ca, yu@fis.utoronto.ca
Abstract. Many activities on the web are driven by high-level goals of users,
such as “plan a trip” or “buy some product”. In this paper, we are interested in
exploring the role and structure of users’ goals in web search. We want to gain
insights into how users express goals, and how their goals can be represented in
a semi-formal way. The paper presents results from an exploratory study that
focused on analyzing selected search sessions from a search engine log. In a
detailed example, we demonstrate how goal-oriented search can be represented
and understood as a traversal of goal graphs. Finally, we provide some ideas on
how to construct large-scale goal graphs in a semi-algorithmic, collaborative
way. We conclude with a description of a series of challenges that we consider
to be important for future research.
Keywords: information search, search process, goals, intentional structures
1 Motivation
In a highly influential article regarding the future of the web [1], Tim Berners-Lee
sketches a scenario that describes a set of agents collaborating on the web to address
different needs of users – such as “get medication”, “find medical providers” or
“coordinate appointments”.
In fact, many activities on the web are already implicitly driven by goals today.
Users utilize the web for buying products, planning trips, conducting business, doing
research or seeking health advice. Many of these activities involve rather high-level
goals of users, which are typically knowledge intensive and often benefit from social
relations and collaboration. Yet, the web in its current form is largely non-intentional.
That means the web lacks explicit intentional structures and representations, which
would allow systems to, for example, associate users’ goals with resources available
on the web. As a consequence, every time users turn to the web for a specific purpose
they are required to cognitively translate their high-level goals into the non-intentional
structure of the web. They need to break down their goals into specific search queries,
tag concepts, classification terms or ontological vocabulary. This prevents users from,
for example, effectively assessing the relevance and context of resources with respect
to their goals and benefiting from the experiences of others who pursued similar goals.
It also prevents them from assessing conflicts or systematically exploring
alternative means.
In Proceedings of the International Workshop on Collaborative Knowledge Management for Web Information Systems
(We.Know'07), held in conjunction with the 8th International Conference on Web Information Systems Engineering
(WISE 2007)
In a recent interview, Peter Norvig, Director of Google Research, acknowledged that
understanding users' needs to a greater extent represents an “outstanding” research
problem. He explains that Google is currently looking at “finding ways to get the user
more involved, to have them tell us more of what they want.” [2]. Having explicit
intentional representations and structures available on the web would allow users to
express and share their goals and would enable technologies and other users to
explore, comprehend, reason about and act upon them.
It is only recently that researchers have developed a broad interest in the goals and
motivations of web users. For example, several researchers have studied intentionality and
motivations in web search logs in recent years [3,4,5]. Because web search
today represents a primary instrument through which users exercise their intent,
search engines have a tremendous corpus of intentional artifacts at their disposal. We
define intentional artifacts broadly to be electronic artifacts produced by users or user
behaviour that contain recognizable “traces of intent”, i.e. implicit traces of users’
goals and intentions.
This paper represents our initial attempt towards exploring the role and structure of
users’ goals in web search queries. We want to learn in detail how users express their
goals on the web - as opposed to what goals they have, which is the focus of other
studies [3,4,5]. We also want to explore how search goals can be represented in an
explicit, semi-formal way and we are interested in learning about the different ways in
which explicit goal representations could be useful, and to what extent. From our
preliminary findings of an exploratory study, we want to give a qualitative account of
identified potentials and obstacles in the context of goal-oriented search.
2 State of the Art
We will discuss two main streams of research that are relevant in the context of this
paper: The first stream of research focuses on identifying and understanding what
goals users pursue in web search. The second stream focuses on developing goal-
oriented technical solutions, i.e. solutions that depend on the explicit articulation of
user goals or automatic inference thereof.
In the first stream, researchers have proposed categories and taxonomies of user
goals [4,5] and automatic classification techniques to classify search queries into goal
categories [3]. Goal taxonomies include, for example, navigational, informational and
transactional categories [3]. Different categories are assumed to have different
implications for users’ search behaviour and for search algorithms. To give some
examples: Navigational search queries (such as the query “citeseer”) characterize
situations where a user has a particular web site in mind and where he is primarily
interested in visiting this page. Informational search queries (such as the query
“increase wine crop”) are queries where this is not the case, and users intend to visit
multiple pages to, for example, learn about a topic [3]. Further research aims to
empirically assess the distribution of different goal categories in search query logs via
manual classification and subsequent statistical generalization [4] and/or Web Query
Mining techniques [3,6]. There is some evidence that certain categories of goals can
be identified algorithmically based on different features of user behaviour, such as
“past user-click behaviour” and an analysis of “click distributions” [3]. Recently, a
community of researchers with an interest in Query Log Analysis has formed at the
World Wide Web 2007 conference as a separate workshop.
A second stream of research attempts to demonstrate the principal feasibility of
implementing goal-orientation on an operational level. GOOSE, for example, is a
prototypical goal-oriented search engine that aims to assist users in finding adequate
search terms for their goals [7]. Miro, another example, is an application that
facilitates goal-oriented web browsing [8]. The Lumiere Project focused on inferring
goals of software users based on Bayesian user modeling [10]. Work on goal-oriented
acquisition of requirements for hypermedia applications [11] shows that it is possible
to translate high-level goals of stakeholders into (among other things) low level
content requirements for web applications. Another example [12] facilitates
purposeful navigation of geospatial data through goal-driven service invocation based
on WSMO. WSMO is a web service description approach that decouples user desires
from service descriptions by modeling low-level goals (such as
“havingATripConfirmation”) and non-functional property constructs [13]. In addition
to these approaches, there have been several studies in the domain of information
science that focus on different search strategies (such as top-down, bottom-up, mixed
strategies) of users [14].
Apart from these isolated, yet encouraging, attempts, current research lacks a deep
understanding of how users express their goals, and of what explicit representations
could be suitable to describe them.
3 How do Users Express Goals in Web Search?
We initiated an explorative study in response to the observation that there is a lack
of research on how users express their goals in web search. In the following we will
present preliminary findings from this study.
Data sources: We have used the AOL search database [15] as our main data
source (see footnote 1). In addition to the AOL search database, several other web search logs are
available [16]. We have used the AOL search database because it provides
information about anonymous User IDs, time stamps, search queries, and clicked
links. To our knowledge, the AOL search database is also the most recent corpus of
search queries available (2006). We are aware of the ethical controversies arising from
using the AOL search database. For example, although the User IDs are anonymous, a
New York Times reporter was able to trace back the identity of one of the users in the
dataset [17]. As a consequence, we masked the search queries that are presented in
this paper by maintaining their semantic frame structure, but exchanging certain
frame element instantiations [19]. We will elaborate on this later on. In following
such an approach, we aim to protect the real identity of the users being studied while
retaining necessary temporal and intentional relations of search queries.
Footnote 1: Because the AOL search database was retracted by AOL shortly after its release, we
obtained a copy from a secondary source: http://www.gregsadetsky.com/aol-data/ (last
accessed on July 15th, 2007).
Methodology: In this study we were interested in how users express, refine, alter
and reformulate their goals while searching. We have searched the AOL search
database for different verbs that are considered to indicate the presence of goals,
including verbs such as achieve, make, improve, speedup, increase, satisfied,
completed, allocated, maintain, keep, ensure and others [18]. We subsequently
annotated random results (different search queries) with semantic frame elements
obtained from Berkeley’s Framenet [19]. Framenet is a lexical database that aims to
document the different semantic and syntactic combinatory possibilities of English
words in each of its senses. It aims to achieve that by annotating large corpora of text.
It currently provides information on more than 10,000 lexical units in more than 825
semantic frames [19]. A lexical unit is a pairing of a word with a meaning. For
example, the verb “look” has several lexical units dealing with different meanings of
this verb, such as “direct one’s gaze in a specified direction” or “attempt to find”.
Each different meaning of the word belongs to a semantic frame, which is “a script-
like conceptual structure that describes a particular type of situation, object or event
along with its participants and props” [19]. Each of these participants and props
is called a frame element. Semantic frames are evoked by lexical units. To give
an example, the semantic frame “Cause_change_of_position_on_a_scale” is evoked by a
set of lexical units, such as decline, decrease, gain, plummet, rise, increase, etc., and
has the core frame elements Agent [], Attribute [Variable], Cause [Cause] and Item [Item].
Agent refers to the person who causes a change of position on a scale, Attribute refers to
the scale that changes its value, Cause refers to non-human causes of the change, and
Item refers to the entity that is being changed.
Example: The search query “Increase Computer Speed” can be annotated with
Frame Elements from Framenet’s lexicon. The lexical unit “increase” evokes the
frame “Cause_change_of_position_on_a_scale”, which we can use to annotate “Increase
Computer Speed” in the following way: “Increase [item Computer] [attribute Speed]”. The
frame elements Agent and Cause do not apply here.
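The annotation step in this example can be illustrated with a small sketch. The Python snippet below is not the tooling used in the study; it rests on several assumptions: the set of lexical units is copied from the examples named above rather than retrieved from FrameNet, the frame element spans are supplied by a human annotator, and the function name annotate is hypothetical.

```python
# Illustrative sketch only: a hand-written mini-lexicon stands in for FrameNet.
FRAME = "Cause_change_of_position_on_a_scale"
# Lexical units named in the text as evoking FRAME (not the full FrameNet list).
LEXICAL_UNITS = {"decline", "decrease", "gain", "plummet", "rise", "increase"}


def annotate(query, elements):
    """Annotate a query with manually chosen frame elements.

    `elements` maps a frame element name (e.g. "item") to the exact
    substring of the query that instantiates it. Returns None when no
    word of the query evokes the frame.
    """
    lexical_unit = next(
        (w.lower() for w in query.split() if w.lower() in LEXICAL_UNITS), None
    )
    if lexical_unit is None:
        return None
    rendered = query
    for name, span in elements.items():
        # Wrap the first occurrence of each span in bracket notation.
        rendered = rendered.replace(span, f"[{name} {span}]", 1)
    return {
        "frame": FRAME,
        "lexical_unit": lexical_unit,
        "elements": dict(elements),
        "rendered": rendered,
    }


result = annotate("Increase Computer Speed", {"item": "Computer", "attribute": "Speed"})
print(result["rendered"])  # Increase [item Computer] [attribute Speed]
print(annotate("How to get more wine crop", {"item": "wine crop"}))  # None
```

The second call returns None because none of its words is among the listed lexical units, so the frame is not evoked for that query.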
Selected Results: One verb we used to explore the dataset was “increase”.
The query history depicted in Table 1 below presents an excerpt of the search history
of a single user who performed search queries containing the verb “increase”. We
picked this particular search log because it demonstrates several interesting aspects of
the role of goals in web search. We do not claim that this user’s search behaviour is
typical or representative of a larger set of users or queries. In fact, the majority of
search queries in the AOL search database are of a non-intentional nature. We discuss
the implications of this observation in Section 5.
We obtained the complete search record of the selected user, frame-annotated his
intentional queries based on the FrameNet lexicon and classified the queries from an
intentional perspective (e.g. refinement, generalization, etc). The particular frame
used during annotation was “Cause_change_of_position_on_a_scale”, which is evoked
by the verb “increase”. For privacy reasons, we modified the search queries in the
following way: We retained the verbs and attributes which were part of the original
query, but modified the contents of the semantic frame element item (e.g. wine crop)
and cause (e.g. fertilizer) as well as time stamps (maintaining relative time differences
with an accuracy of +/- 60 seconds). We would like to remark that the user’s search history
below was interrupted by other, non-intentional queries (such as “flickr.com”)
and also by other, more complex intentional queries. For reasons of illustration and
simplicity, we leave these out in Table 1.
Nr. | Query                                           | Frame Annotation                                                       | Time Stamp          | Goal
----+-------------------------------------------------+------------------------------------------------------------------------+---------------------+---------------------------
#1  | How to get more wine crop                       | How to get more [item wine crop]                                       | 2006-03-30 19:29:59 | Formulation
#2  | Fertilizer or insecticide to increase wine crop | [cause Fertilizer] or [cause insecticide] to increase [item wine crop] | 2006-03-30 19:45:28 | Refinement
#3  | Fertilizer to increase wine crop                | [cause Fertilizer] to increase [item wine crop]                        | 2006-03-30 19:46:11 | Refinement
    [further non-intentional queries, not related to wine crop]
#4  | Increase wine crop                              | increase [item wine crop]                                              | 2006-03-30 19:48:25 | Generalization
#5  | How to get rich wine crop                       | How to get rich [item wine crop]                                       | 2006-04-07 06:29:19 | Different Goal Formulation
    [non-intentional query “wine crop”]
#6  | How to get good wine crop                       | How to have good [item wine crop]                                      | 2006-04-07 06:40:45 | Re-formulation
    [further non-intentional queries and further more complex intentional queries related to “wine crops”]
Table 1. Frame-based Annotation of Selected Queries from a Single Search Session
From a semantic frame perspective, it is interesting to see that it is not possible to
annotate all of the above queries consistently. While the verb increase evokes the
corresponding frame “Cause_change_of_position_on_a_scale” in queries #2, #3 and #4,
the other queries #1, #5, and #6 do not contain increase and therefore do not evoke
the same frame. Although FrameNet contains lexical entries for the verbs get and
have and the adjectives good, rich and more, the word senses get more, get rich and
have good are not yet captured as lexical units in the FrameNet lexicon. However, it is
easily conceivable that an expanded or customized version of FrameNet (possibly in
combination with WordNet) would contain these units and that they could be
associated with the same semantic frame.
From a goal-oriented perspective, we will use our findings to develop a set of
hypotheses that we believe are relevant and helpful to further study the role and
structure of users’ goals on the web.
Several things are noteworthy in the search history of the above user: First, the user
started off with a goal formulation (#1 how to get more wine crop) and then
proceeded with a refinement of this goal in a second query (#2 Fertilizer or insecticide
to increase wine crop). The provided time stamps reveal that in this case, the time
difference between the two queries was more than 15 minutes! Although it is hard to
assess the real cause for this time lag, the AOL search database provides a possible
explanation by listing the websites that the user visited in response to query #1, which
includes a discussion board website hosting discussions on different strategies to get
more “wine crop” (including “insecticides” and “fertilizer”). This allows us to
hypothesize that H1: Goal refinement is a time-intensive process during search.
In query #3, the user performed a further refinement of his goal to “fertilizer to
increase wine crop” and in #4, he performed a generalization to “Increase wine crop”.
This is interesting again from a goal-oriented perspective: Instead of refining his goals
in a strict top-down approach, the user alternates between top-down (refining) and
bottom-up (generalizing) goal formulations. From a goal-oriented perspective, user
search thus appears to be neither a strict top-down nor a purely bottom-up approach,
but a combination of both. While we focus on informational queries only, previous
studies have found that the type of approach depends not only on the type of task,
but also on the type of user [14]. This leads us to hypothesize H2: Users search by
iteratively refining, generalizing and reformulating goals, in no particular order.
In query #5 the user performs a different goal formulation: “How to get rich wine
crop”. Instead of focusing on quantity (“get more” / “increase”), the search now can
be interpreted to focus on the quality of wine crop (“get rich”). In query #6, a goal re-
formulation is performed. This can be regarded to represent the same goal, but
articulated in a slightly different way (“get good” instead of “get rich” wine crop).
Another very interesting observation is that there is a time span of more than 7 days
between queries #1-#4 and queries #5-#6! Although we have no information about
what the user might have done in between these search activities, we use this evidence
to tentatively hypothesize that identifying different, but related, goals is difficult for
users, and that it involves significant time and, potentially, cognitive effort. Put more
intuitively, it seems that, especially with high-level, knowledge-intensive
goals, users learn about their goals as they go. We formulate this
observation in hypothesis H3: Exploring related goals is more time-intensive than
goal refinement.
And finally, we can observe that a smaller amount of time passes between
search queries #5 and #6. The question that is interesting to ask based on this
observation is whether goal refinements require more time and cognitive investment
from users than goal re-formulations. One might expect that users with search
experience become skilled in tweaking their queries based on the search engines’
responses without modifying their initial goal. We express this question in our
hypothesis H4: Goal re-formulation requires less time than goal exploration or
goal refinement. Next, we will explore some implications of these observations.
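Before turning to the implications, the time differences discussed above can be recomputed directly from the time stamps in Table 1. The following Python sketch is only an illustration: the function name gaps is hypothetical, the time stamps are the masked values from Table 1 (accurate to +/- 60 seconds), and the queries omitted from the table are ignored, so each gap is measured between listed queries only.

```python
from datetime import datetime

# (query number, masked time stamp, goal label), copied from Table 1.
QUERIES = [
    (1, "2006-03-30 19:29:59", "formulation"),
    (2, "2006-03-30 19:45:28", "refinement"),
    (3, "2006-03-30 19:46:11", "refinement"),
    (4, "2006-03-30 19:48:25", "generalization"),
    (5, "2006-04-07 06:29:19", "different goal formulation"),
    (6, "2006-04-07 06:40:45", "re-formulation"),
]


def gaps(queries):
    """Return (query number, goal label, time since the previous listed query)."""
    fmt = "%Y-%m-%d %H:%M:%S"
    parsed = [(nr, datetime.strptime(ts, fmt), label) for nr, ts, label in queries]
    return [
        (cur_nr, cur_label, cur_time - prev_time)
        for (_, prev_time, _), (cur_nr, cur_time, cur_label) in zip(parsed, parsed[1:])
    ]


for nr, label, delta in gaps(QUERIES):
    print(f"#{nr} {label}: {delta}")
```

With a single session, such numbers can only motivate the hypotheses; testing H1, H3 and H4 would require the same computation over many annotated sessions.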
Analysis: If hypothesis H1 were corroborated in future studies, offering users
possible goal refinements would very likely be a useful concept. If
hypothesis H2 were supported in further studies, goal-oriented search would
need to focus not only on goal refinement, but also on providing a range of different
intentional navigation structures that allow users to alternate flexibly between refining,
generalizing and exploring goals. If the exploration of goals represents a very
time-intensive process (H3), then users can be assumed to greatly benefit from having
access to the goals of other users. And finally, if goal re-formulation does not require
significant amounts of time (H4), there might be little motivation for researchers to
invest in semantic similarity of web searches, but more motivation to invest in
intentional similarity.
Surprisingly, when analyzing current search technologies such as Google, we can
see that there is almost no support for any of the goal-related search tasks
identified above (refinement, generalization, etc.). Although Google helps in reformulating
search queries (“Did you mean X?”), this can, at most, be regarded as providing some
support for users in goal re-formulation on a syntactic level, but not on a truly
intentional level (help in goal refinement, generalization, etc.).
These observations immediately raise a set of interesting research questions: Do
the formulated hypotheses hold for large sets of search sessions? How can the
hypotheses be further refined to make them amenable to algorithmic analysis? And
how can the identified goals be represented in more formal structures? While we are
interested in all of these questions, in this paper we will only discuss the issue of more
formal representations in greater detail.
4 Representing Search Goals as Semi-Formal Goal Graphs
We have modeled the goals of a user who is interested in “wine crop” with the
agent- and goal-oriented modeling framework i* [20]. When applying i*, we focused
on goal aspects and neglected agent-related concepts such as actors, roles and others.
The i* framework provides elements such as softgoals, goals, tasks, resources and a
set of semantic relations between them. The goal graph in Figure 1 was constructed
by one of the authors of this paper based on the frame-annotated goals depicted in
Table 1. In the diagram, the goals of the user are represented through oval-shaped
elements. Means-ends links are used to indicate alternative ways (means) by which a
goal (ends) can be fulfilled. Goals represent states of affairs to be reached, and tasks,
which are represented through hexagonal elements, describe specific activities that
can be performed for the fulfillment of goals. Soft-goals, which are represented
through cloud-shaped elements, describe goals for which there is no clear-cut
criterion to be used for deciding whether they are satisfied or not. Thus, soft-goals are
fulfilled or denied to a certain degree, based on the presence or absence of relevant
evidence. In i* diagrams, links such as "help" or "hurt" are used to represent how a
belief about the fulfillment or denial of a soft-goal depends on the satisfaction of other
goals. From the goal-graph in Figure 1 we can infer that the goal “increase wine crop”
can be achieved through a variety of means: Fertilizer, Insecticides and Irrigation all
represent means to achieve the end of increasing wine crop. The goal “Increase wine
crop” and the related goal “Improve wine crop” both have “help” contribution links to
the overarching soft-goal “Winery be successful”.
Fig. 1. Representing Users’ Search Goals in a Semi-Formal Goal Graph
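To make the structure of such a goal graph concrete, the following Python sketch encodes the elements described above for the “wine crop” example. It is a minimal illustration and not an i* implementation: the class GoalGraph and its method names are hypothetical, and the node kind assigned to the three means (modeled here as goals) is an assumption, since the text does not state whether they are drawn as goals or tasks.

```python
from dataclasses import dataclass, field


@dataclass
class GoalGraph:
    kinds: dict = field(default_factory=dict)          # node label -> "goal" | "softgoal" | "task"
    means_ends: list = field(default_factory=list)     # (means, end) pairs
    contributions: list = field(default_factory=list)  # (source, target, "help" | "hurt")

    def add_node(self, label, kind="goal"):
        self.kinds[label] = kind

    def add_means_end(self, means, end):
        self.means_ends.append((means, end))

    def add_contribution(self, source, target, link):
        self.contributions.append((source, target, link))

    def means_for(self, end):
        """Alternative means by which the given end can be fulfilled."""
        return [m for m, e in self.means_ends if e == end]


graph = GoalGraph()
graph.add_node("Winery be successful", kind="softgoal")
for goal in ("Increase wine crop", "Improve wine crop"):
    graph.add_node(goal)
    graph.add_contribution(goal, "Winery be successful", "help")
for means in ("Fertilizer", "Insecticides", "Irrigation"):
    label = f"{means} to increase wine crop"
    graph.add_node(label)  # assumption: the means are modeled as goals
    graph.add_means_end(label, "Increase wine crop")

print(graph.means_for("Increase wine crop"))
```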
Assuming that such goal graphs can be constructed for a range of different
domains (which is evident in a broad set of published examples from the domain of
requirements engineering), it would be interesting to see how the different goal-
related activities of users during search (such as goal formulation, goal refinement,
goal generalization, etc) can be represented as a traversal of such a goal graph. We
will explore this question next.
4.1 How Can Search be Understood as a Traversal through a Goal Graph?
Modifying search engines’ algorithms to exploit knowledge about users’ goals has
a high priority for search engine vendors [5]. Being able to relate search queries to
nodes in a goal graph could enable search engines to provide users with goal-oriented
support in search. For example, software could offer users ways to refine their
search goals or to generalize them, or it could propose related goals from other users.
Figure 2 depicts the results of manually associating the search queries presented in
Table 1 with the goal graph introduced in Figure 1. We can see that the user starts his
search by formulating a version of the goal “increase wine crop” in query #1. This
goal is refined in query #2 “Fertilizer or insecticides to increase wine crop” which can
be mapped onto the two means “Fertilizer to increase wine crop” and “Insecticides to
increase wine crop”. Query #3 “fertilizer to increase wine crop” represents a further
refinement. In query #4, the user generalizes his search goal to “increase wine crop”
again. Query #5 and #6 relate to a different goal: “Improve wine crop”. Query #5 and
#6 can be considered to be re-formulations of the same goal.
Interestingly, the goal graph reveals that the user did not execute search queries
related to the means “Irrigation to increase wine crop” or the soft-goal “Winery be
successful”, although one can reasonably expect that the user might have had a
genuine interest in these goals too (although validation of this claim is certainly hard
without user interaction).
Fig. 2. Goal-Oriented Search as a Traversal of Goal Graphs
As a consequence, a major benefit of having goal graphs available during search
could be pointing users to refined goals or making sure that users do not miss related
goals. But assuming that having such goal graphs would be beneficial, how can they
be constructed?
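The traversal described in this section can be sketched in a few lines of Python. The snippet is an illustration under stated assumptions, not a method used in the study: the mapping of queries to nodes is the manual association described for Figure 2, and the rules in the hypothetical function classify (subset or means-of relation means refinement, end-of relation means generalization, same nodes means re-formulation) are one simple way to derive the labels of Table 1 from that mapping.

```python
# Means-ends links of the hand-crafted goal graph: means -> end.
PARENT = {
    "Fertilizer to increase wine crop": "Increase wine crop",
    "Insecticides to increase wine crop": "Increase wine crop",
    "Irrigation to increase wine crop": "Increase wine crop",
}
ALL_NODES = set(PARENT) | set(PARENT.values()) | {"Improve wine crop", "Winery be successful"}

# Manual mapping of the queries in Table 1 onto graph nodes.
QUERY_NODES = {
    1: {"Increase wine crop"},
    2: {"Fertilizer to increase wine crop", "Insecticides to increase wine crop"},
    3: {"Fertilizer to increase wine crop"},
    4: {"Increase wine crop"},
    5: {"Improve wine crop"},
    6: {"Improve wine crop"},
}


def classify(prev, cur):
    """Label the step from the previous node set to the current one."""
    if prev is None:
        return "formulation"
    if cur == prev:
        return "re-formulation"
    if cur < prev or all(PARENT.get(n) in prev for n in cur):
        return "refinement"  # narrowing alternatives or moving to means
    if all(PARENT.get(n) in cur for n in prev):
        return "generalization"  # moving from means back to their end
    return "different goal formulation"


def traverse(query_nodes):
    labels, prev = {}, None
    for nr in sorted(query_nodes):
        labels[nr] = classify(prev, query_nodes[nr])
        prev = query_nodes[nr]
    return labels


labels = traverse(QUERY_NODES)
unvisited = ALL_NODES - set().union(*QUERY_NODES.values())
print(labels)
print(sorted(unvisited))
```

The set unvisited contains the nodes that the user never searched for, which is where a goal-aware search engine could offer suggestions.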
4.2 How Can Large-Scale Goal Graphs be Constructed?
Mapping search queries onto goal graphs presumes the existence and availability of
goal graphs. In our example, we have hand-crafted a goal graph for illustration
purposes. However, manually constructing such goal graphs is costly, and anticipating
the entirety, or even a large proportion, of users’ goals on the internet would render
such an approach unfeasible. So how can we construct large-scale goal graphs that do
not rely on the involvement of expert modelers? Automatic user goal identification is
an open research problem [6], and answering this question satisfactorily would go well
beyond the scope of this paper, but we would like to discuss some pointers and ideas: The
recent notion of folksonomies has powerfully demonstrated that meaningful relations
can emerge out of collective behaviour and interactions [21]. We would like to briefly
explore this idea and some of its implications for constructing large-scale goal graphs
based on frame-analysis of intentional artifacts.
Let’s assume that a system has the capability to come up with frame-based
annotations of search queries. The search query “fertilizer or insecticide to increase
wine crop” would then be annotated in a way that is depicted on the left side of Figure
3. Based on such annotations, a goal graph construction algorithm could use heuristics
to construct a goal graph similar to the one depicted on the right side of Figure 3.
Heuristic rules could, for example, prescribe that the root goal is represented by the
central verb (“increase”) and its corresponding item (“wine crop”), and that the means
to this end are represented by the frame elements cause (“fertilizer”, “insecticide”).
Each time a user formulates an intentional search query, the goal graph construction
algorithm could construct such small, atomic goal graphs heuristically.
Fig. 3. Heuristic Construction of Atomic Goal Graphs via Frame-Annotation of Search Queries
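The heuristic rule described above (root goal from the central verb and its item, one means per cause) can be written down as a short Python sketch. The function name atomic_goal_graph and the dictionary layout are hypothetical choices made for illustration; the frame-based annotation itself is assumed to be given.

```python
def atomic_goal_graph(verb, item, causes=()):
    """Build a one-level goal graph from a frame-annotated query.

    Heuristic from the text: the root goal is the central verb plus its
    item; every cause becomes one alternative means to that end.
    """
    root = f"{verb} {item}"
    means_ends = [(f"{cause} to {root}", root) for cause in causes]
    return {"root": root, "means_ends": means_ends}


# Annotation of the query "fertilizer or insecticide to increase wine crop".
graph = atomic_goal_graph("increase", "wine crop", ["fertilizer", "insecticide"])
print(graph["root"])
for means, end in graph["means_ends"]:
    print(f"  {means} -> {end}")
```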
In a next step, these atomic goal graphs constructed from different users’ search
queries would need to be connected into a larger whole. Considering hypothesis H2, this
appears to be a task that is hard to perform by algorithms alone. Nevertheless, usage
data analysis, explicit user involvement or semi-automatic, collaborative model
construction efforts (as pursued, e.g., by the ConceptNet project [9]) might help to
overcome this issue, which can be considered to represent a non-trivial research
challenge.
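As a deliberately naive baseline for this connection step, atomic goal graphs could be merged whenever their root goals have the same label. The Python sketch below illustrates only this simplest case; the function name merge, the dictionary layout and the second input graph are invented for illustration, and exact label matching is an assumption that would miss re-formulations such as “get more wine crop”.

```python
from collections import Counter, defaultdict


def merge(atomic_graphs):
    """Merge atomic goal graphs whose root labels match (case-insensitive).

    Returns {root: {means: number of atomic graphs proposing it}}.
    """
    merged = defaultdict(Counter)
    for graph in atomic_graphs:
        root = graph["root"].strip().lower()
        merged[root].update(
            means.strip().lower() for means, _end in graph["means_ends"]
        )
    return {root: dict(counts) for root, counts in merged.items()}


graphs = [
    # Derived from the query "fertilizer or insecticide to increase wine crop".
    {"root": "increase wine crop",
     "means_ends": [("fertilizer to increase wine crop", "increase wine crop"),
                    ("insecticide to increase wine crop", "increase wine crop")]},
    # Invented second user, for illustration only.
    {"root": "Increase wine crop",
     "means_ends": [("Fertilizer to increase wine crop", "Increase wine crop"),
                    ("Irrigation to increase wine crop", "Increase wine crop")]},
]
merged = merge(graphs)
print(merged)
```

The counts give a rough indication of how many users proposed each means, which collaborative or semi-automatic approaches could then review.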
5 Implications and Threats to Validity
We are aware that our particular research approach puts some constraints on the
results of our work: Due to our focus, the search queries we analyzed were not
required to be representative and, in fact, they are not. To obtain some quantitative
evidence, two of the authors have categorized a pseudo-random sample (drawn with the
java.util.Random randomizer) of 2000 out of 21,011,340 queries into intentional and
non-intentional categories. Candidates were selected based on whether a query contains at least
one verb (infinitive form, excluding gerund) and at least one noun. For each of these
candidates, two authors of this paper judged whether it would be possible to envisage
the goal a user might have had based on the specific query (such as “increase computer
speed”). From our analysis, only 2.35% (47 out of 2000) of the searches from the
AOL search database can be considered to be such “intentional queries”. This
proportion yields a 95% confidence interval of [0.0169,
0.0301] for the probability of a query being intentional according to our criteria.
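The reported interval can be checked with a short calculation. The text does not state which interval method was used; the Python sketch below assumes the normal-approximation (Wald) interval for a proportion with z = 1.96, which reproduces the reported bounds. The function name wald_interval is hypothetical.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


p, lower, upper = wald_interval(47, 2000)
print(f"p = {p:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
# p = 0.0235, 95% CI = [0.0169, 0.0301]
```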
In contrast to these findings, related studies found somewhat higher numbers. A study
reported in [4] suggests that 35% of search sessions have a general, high-level
information research goal (such as questions, undirected requests for information, and
advice seeking). The difference in numbers might be explained by different levels of
analysis and a more relaxed understanding of goals in [4], which allows a broader set
of queries (including queries that do not have verbs) to be labelled as goal-related.
There are several implications of this discrepancy: While users often have high-level
goals when they are searching the web, they are currently not rewarded for
formulating (strictly) intentional queries. In fact, one can assume that formulating
non-intentional queries represents a (locally) successful strategy in today’s search
engine landscape. As a result, users might have adapted to the non-intentional mode
in which Google, Yahoo and other search engines operate today. However, this
situation makes it necessary for users to cognitively translate their high-level goals
into search queries and perform reasoning about their goals in their mind. This
potentially increases the cognitive burden of users and makes it hard for systems to
connect them with other users who pursue similar goals or to allow them to benefit
from the experiences of other searchers.
We do not believe that these implications put constraints on our results: With a
collaborative goal modeling approach, even a small percentage of strictly intentional
queries could be used to construct large-scale goal graphs. Even if the percentage of
intentional queries among all search queries were as low as 1% or
even lower, the sheer number of queries executed on the World Wide Web would still
provide algorithms with a rich corpus to construct large-scale goal graphs. On the
web, such an approach is far from unusual: For example, on Wikipedia, a minority of
users contributes content that is used by a majority. However, the task of
constructing large-scale goal graphs would obviously become much easier if users
were aware that search engines interpret their queries as an
expression of intent rather than as input for text string matching.
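To make the scale argument concrete, the following sketch estimates how many intentional queries a log of a given size would contain at different rates. The corpus size of 20 million queries is an illustrative assumption (roughly the order of magnitude of a large published query log), not a figure derived from this analysis; the rates are the 1% lower scenario and the point estimate and interval bounds discussed above. The function name `expected_intentional` is hypothetical.

```python
def expected_intentional(total_queries: int, rate: float) -> int:
    """Expected number of intentional queries in a log, given a rate."""
    return round(total_queries * rate)


# Assumption for illustration only: a query log with 20 million entries
corpus_size = 20_000_000

# 1% pessimistic scenario, then lower bound, point estimate, upper bound
for rate in (0.01, 0.0169, 0.0235, 0.0301):
    count = expected_intentional(corpus_size, rate)
    print(f"rate {rate:.2%}: about {count:,} intentional queries")
```

Even under the 1% scenario, a log of this assumed size would contain a six-figure number of candidate queries for goal graph construction.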
6 Conclusions
Based on our preliminary findings, we can formulate a set of research
challenges: How can large-scale goal graphs be represented and constructed?
How can intentional artifacts (such as search queries) be associated with nodes in
such goal graphs? How can goals and web resources be associated? And how can
collaboration on the internet support the construction of such intentional structures?
Our work represents an initial attempt towards understanding the role and structure
of goals in web search. We have demonstrated how search processes can be
understood as a traversal of goal graphs and have provided some ideas on how
to construct large-scale goal graphs. In future work, we are interested in further
investigating and shaping intentional structures on the web.
References
1. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American 284
(2001)
2. Greene, K.: The Future of Search. MIT Technology Review, July 16 (2007),
http://www.technologyreview.com/Biztech/19050/, last accessed on July 18th, 2007
3. Lee, U., Liu, Z., Cho, J.: Automatic Identification of User Goals in Web Search. In:
WWW ’05: Proceedings of the 14th International World Wide Web Conference, New
York, NY, USA, ACM Press (2005) 391–400
4. Rose, D., Levinson, D.: Understanding User Goals in Web Search. In Feldman, S.I.,
Uretsky, M., Najork, M., Wills, C., eds.: Proceedings of the 13th International World
Wide Web Conference, ACM Press (2004) 13–19
5. Broder, A.: A Taxonomy of Web Search. SIGIR Forum 36 (2002) 3–1
6. Baeza-Yates, R., Calderon-Benavides, L., Gonzalez-Caro, C.: The Intention Behind
Web Queries. In Crestani, F., Ferragina, P., Sanderson, M., eds.: Proceedings of String
Processing and Information Retrieval (SPIRE). Volume 4209 of Lecture Notes in
Computer Science, Springer (2006) 98–109
7. Liu, H., Lieberman, H., Selker, T.: GOOSE: A Goal-Oriented Search Engine with
Commonsense. In: AH ’02: Proceedings of the Second International Conference on
Adaptive Hypermedia and Adaptive Web-Based Systems, London, UK, Springer-
Verlag (2002) 253–263
8. Faaborg, A., Lieberman, H.: A Goal-Oriented Web Browser. In: CHI ’06: Proceedings
of the SIGCHI Conference on Human Factors in Computing Systems, New York, NY,
USA, ACM Press (2006) 751–760
9. Liu, H., Singh, P.: ConceptNet - A Practical Commonsense Reasoning Tool-Kit. BT
Technology Journal 22 (2004) 211–226
10. Horvitz, E., Breese, J., Heckerman, D., Hovel, D., Rommelse, K.: The Lumiere
Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users.
In: Proceedings of the Fourteenth Conference on Uncertainty in Artificial
Intelligence, Madison, WI (1998) 256–265
11. Bolchini, D., Paolini, P., Randazzo, G.: Adding Hypermedia Requirements to Goal-
Driven Analysis. In: Proceedings of the 11th IEEE International Conference on
Requirements Engineering (RE 2003), IEEE Computer Society (2003) 127–137
12. Tanasescu, V., Gugliotta, A., Domingue, J., Villarías, L., Davies, R., Rowlatt, M.,
Richardson, M., Stincic, S.: Geospatial Data Integration with Semantic Web
Services: the eMerges Approach. (2007)
13. Roman, D., Keller, U., Lausen, H., de Bruijn, J., Lara, R., Stollberg, M., Polleres, A.,
Feier, C., Bussler, C., Fensel, D.: Web Service Modeling Ontology. Applied Ontology
1 (2005) 77–106
14. Navarro-Prieto, R., Scaife, M., Rogers, Y.: Cognitive Strategies in Web Searching. In:
Proceedings of the 5th Conference on Human Factors & the Web. (1999)
15. Pass, G., Chowdhury, A., Torgeson, C.: A Picture of Search. In: Proceedings of the 1st
International Conference on Scalable Information Systems, New York, NY, USA,
ACM Press (2006)
16. Jansen, B., Spink, A.: How Are We Searching the World Wide Web? A Comparison of
Nine Search Engine Transaction Logs. Information Processing and Management 42
(2006) 248–263
17. Barbaro, M., Zeller Jr, T.: A Face Is Exposed for AOL Searcher No. 4417749. New
York Times, August 9 (2006)
18. Regev, G., Wegmann, A.: Where Do Goals Come From: The Underlying Principles of
Goal-Oriented Requirements Engineering. In: RE ’05: Proceedings of the 13th IEEE
International Conference on Requirements Engineering (RE’05), Washington, DC,
USA, IEEE Computer Society (2005) 253–362
19. Ruppenhofer, J., Ellsworth, M., Petruck, M., Johnson, C., Scheffczyk, J.: FrameNet II:
Extended Theory and Practice, International Computer Science Institute, University of
California at Berkeley (2006)
20. Yu, E.: Modelling Strategic Relationships for Process Reengineering. PhD thesis,
Department of Computer Science, University of Toronto, Toronto, Canada (1995)
21. Mika, P.: Ontologies Are Us: A Unified Model of Social Networks and Semantics. In:
International Semantic Web Conference. LNCS, Springer (2005) 522–536