TECHNICAL REPORT
YL-2010-005
TAGEXPLORER: FACETED BROWSING OF FLICKR
PHOTOS
Börkur Sigurbjörnsson
Yahoo! Research
Diagonal 177, 08018 Barcelona, Spain
borkur@yahoo-inc.com
Roelof van Zwol
Yahoo! Research
4401 Great America Parkway. Santa Clara, CA 95054
roelof@yahoo-inc.com
August 20, 2010
ABSTRACT: In this paper we present TagExplorer, an application for faceted browsing of Flickr
photos using tags. Information facets have been proposed as a useful means to assist the user in
browsing large and complex information spaces. The tags provided by users annotating their photos
in Flickr provide valuable knowledge that we deploy to feed the faceted browsing. We identify
facets for photo browsing based on frequency of usage in Flickr annotations. We deploy a tag
classification system to map the tags to facets and combine this with tag co-occurrence analysis,
to obtain meaningful query refinement terms. The faceted browsing application presents the facets
in the form of a tag cloud where terms belonging to the same facet are grouped together. The
application has been available on-line for over a year and we present an analysis of interaction
logs collected over a period of 12 months. The analysis shows how users can effectively deploy
the query recommendations to explore large image collections, and provides detailed insight into
users’ search behavior.
1. Introduction
With the ever increasing popularity of multimedia sharing sites such as Flickr1 and YouTube2,
browsing large media collections is challenging. The user is presented with billions of images
or videos and could do with some help navigating these collections. Facets have been proposed as a
useful paradigm for browsing photographs and other types of media objects [4, 26, 22]. The core
idea is that the user can refine the search criteria along a set of pre-defined facets, such as
neighborhood characteristics for a flat rental search [4], the works of a particular
artist in an art collection [26], or related celebrities and points-of-interest in general Web search [22].
In this paper we present TagExplorer3 – an application to enhance the image exploration expe-
rience through faceted browsing of photos in Flickr. By mining the tags of a large set of photos
uploaded in Flickr, we derive appropriate facets for the Flickr tag vocabulary, in combination with
a tag classification system for associating tags with facets. For that purpose we have adopted the
WordNet broad categories [8] as our classification system and we leverage the structure and knowl-
edge of Wikipedia to extend the coverage of the tag classification system [16].
Query refinement support has become a standard feature in on-line search engines, such as
Yahoo! Search Assist4, Google Suggest5, and Yahoo! Image Search facets [22]. Contrary to the
query-refinement tools used by search engines which are based on query-log analysis, our approach
is based on the tags associated with the photos [21]. It allows the user to explore the contents of
the collection, rather than providing suggestions that were of interest to other users. Through the
analysis of tag co-occurrence among Flickr photos, we derive a ranked list of refinement terms for a
given query. The refinement terms can be used to narrow or broaden the current search, or for a lateral
exploration, where the user explores a related topic.
We combine the tag classification system and tag-based query refinement in TagExplorer – an
application that enables faceted browsing of Flickr photos. For a given query, the user is presented a
cloud of terms, which allows the user to refine her search, or to explore related concepts. The terms
are organized within the cloud in such a manner that terms belonging to the same facet are grouped
together. Figure 1 shows an example of the term-cloud refinement support that is implemented in
our interface. For the query “London” the user is presented a number of refinement terms that are
grouped around locations, subjects, activities, and time. As is common practice in tag-clouds, the more
relevant a term is, the more prominent that term is displayed in the refinement box.
The TagExplorer application has been available on-line for more than a year and we have col-
lected the interaction logs of users when exploring Flickr photos with the application. We have
analyzed the interaction logs collected over 12 months and we present the main findings in this
paper. The analysis sheds light on the way users interact with the system both in terms of query
refinement and faceted browsing of Flickr photos.
1http://www.flickr.com/
2http://www.youtube.com/
3http://tagexplorer.sandbox.yahoo.com/
4http://search.yahoo.com/
5http://google.com/
Figure 1: Tag-cloud style presentation of query refinement terms
This paper is organized as follows. We start with an overview of related work in Section 2.
We define facets for the Flickr tag vocabulary in Section 3 and present how to map Flickr tags into
facets. In Section 4 we describe the generation of query refinement terms for a given query. In
Section 5 we discuss how the components are brought together in an on-line Web application.
We then present an analysis of the user interaction logs in Section 6. Finally, we come to our
conclusions and discuss future work in Section 7.
2. Related Work
There is a wide range of related work on the topic of faceted browsing, which we’ll discuss first.
Next, we review work on facet identification for tags and finally we discuss related work on query
refinement.
2.1. Faceted Browsing
Burke et al. [4] present faceted browsing for apartment rentals. The user can refine queries
based on facets such as price, size, and neighborhood characteristics. The facets are extracted by
applying a parser specialized for parsing classified ads. Yee et al. [26] present an interface for
searching and browsing images using faceted metadata. They apply the system for searching an art
catalog where the metadata describes different facets such as artist names, types of media, dates
and textual description of the art item content. The faceted meta-data is partially provided by the
collection itself and partially extracted using WordNet. One big difference between the scenarios of
these approaches and our approach is the scope of the collection. Browsing Flickr photos involves
browsing a very large topic space in terms of photo subjects, geographic location and contexts in
which the photos are taken. Furthermore, the tags we are working with are highly heterogeneous in
nature and can cover virtually any topic.
Popescu and Moellic [17] present OLIVE – a Web-based image search engine, which uses
WordNet and the Google image search API. The hierarchical structure of WordNet is used both to
expand the query that is sent to Google and to give users the opportunity to browse using broader,
narrower, and related terms from the WordNet hierarchy. Their application covers a scope as wide
as ours. However, using WordNet alone has been shown to have insufficient cover-
age of the Flickr vocabulary [16] and a similar argument can be made about Web images in general.
In this paper we show how to address the browsing task using refinement terms with a wider range
than provided by WordNet.
Van Zwol et al. [22] describe MediaFaces – a system for faceted browsing of Image Search
results. They extract entities and relationships between entities (facets) from structured sources and
for a given input entity, rank the facets using query logs and Flickr tags. The TagExplorer application
described in this paper was the motivating prototype that later developed into MediaFaces.
In addition to the work described above there is prior work on facilitating browsing of photos by
clustering search results. This can be done using either visual features [13] or textual features [12,
23]. Although related, this work addresses a different aspect of browsing image collections than the
work presented in this paper.
2.2. Identifying Tag Facets
There are several papers that have addressed the task of identifying facets for tags. Dubinko
et al. [7] analyze Flickr tag burstiness over time to identify tags that refer to one-time events (e.g.,
Super Bowl XLII or Beijing Olympics) and tags that refer to periodic events (e.g., New Year’s Eve or
Easter). Rattenbury et al. [18] analyze Flickr tag burstiness over geographic coordinates to identify
tags that refer to places, landmarks etc. Both papers focus on a specific facet of tags, e.g. events and
locations only. Overell et al. [16] focus on more general facets of tags by using a combination of
WordNet and Wikipedia to classify Flickr tags into WordNet broad categories. The tag facets that
we describe in this paper are based on the approach proposed by Overell et al.
2.3. Query Refinement
Query refinement has been studied thoroughly in the information retrieval community, both for
automated query refinement where the search engine rewrites the user query [24, 25], as well as for
interactive query refinement where the user is prompted with options to refine the query using a list
of suggested queries or query terms [3, 2, 10, 1]. The query refinement options can be generated
by either analyzing word relationships using the corpus as a whole (global) or the top documents
retrieved for the initial query (local) [24]. The query refinement options can also be derived from
an analysis of query logs [5, 11].
In this paper we opt for an approach based on relationships derived from the Flickr corpus,
and we use a technology similar to that described in previous work for inducing term/tag hierar-
chies [19, 6, 14, 20], and for tag recommendation [21]. The motivation to go for a corpus-based
query refinement rather than query-log based query refinement is that we want the refinement terms
to represent what is contained in the collection rather than what has been searched for by other
users. Second, we have chosen not to use the top X retrieved documents to generate the query re-
finement recommendations, but to use global statistics derived from the entire collection, as
these statistics are more stable and not biased towards the ranking function that is used to retrieve
the photos.
3. Identifying Tag Facets
The first step in building a faceted browsing system is to choose the facets that will be used
and the appropriate facet instances. In our case, any tag used in Flickr can be considered a facet
Figure 2: Classification of Flickr tags into WordNet broad categories using the
ClassTag system. Distribution: Unclassified 30.8%, Location 19.3%, Person/group
15.9%, Artifact/object 15.1%, Other 7.8%, Time 7.0%, Action/event 4.1%.
instance. Hence we need to choose the facets in such a way that they have a good coverage in the
Flickr tag vocabulary. In this section we will discuss how to define facets for Flickr tags and how to
classify the tags according to the defined facets.
3.1. Defining Facets for Flickr Tags
Previous work on image annotations has identified three main facets to classify the annotation
terms: place, activity, and depictions [20, 15]. In this paper we define three broad facet cate-
gories that correspond more or less to those identified in previous work. Our facet categories are:
“Where?”, “What?” and “When?”. For a single photo these facets would correspond to information
about where the photo was taken, the subject of the photo, and when it was taken (both in terms of
time and context).
The choice of these three broad categories is based on our previous work on tag classification
where we show that 19% of tags can be classified as locations; 31% as person, group, artifact or
object; and 13% as action, event or time [16]. Figure 2 shows the classification results in more
detail. We divide the three broad facet categories into five slightly more fine-grained facets:
•Where? describes where the photo was taken and contains only one facet: locations describ-
ing the location where the photo was taken.
•What? describes the content of the photo and is divided into: subjects describing objects
(natural or man-made), animals, plants etc., and names describing names of people or orga-
nizations related to the topic of interest.
•When? describes when the photo was taken and is divided into: activities describing the
activity or event when the photo was taken, and time describing the time at which the photo
was taken.
Finally, we create a mapping between the 5 facets and the 12 WordNet broad categories [8] that ap-
pear most frequently as Flickr tags [21, 16]. The mapping is shown in Figure 3. The locations facet
is associated with the WordNet category noun.location; subjects with noun.artifact, noun.object,
Figure 3: Mapping between facets and WordNet broad categories (Where? → locations:
location; What? → subjects: artifact, object, substance, plant, animal, food; names:
person, group; When? → activities: act, event; time: time).
Tag          Facet      Category
France       locations  where?
Paris        locations  where?
Sacre Coeur  subjects   what?
church       subjects   what?
buildings    subjects   what?
Montmartre   locations  where?
June 2004    time       when?
Paul Abadie  names      what?

Figure 4: Example of a tagged photo on Flickr. We show the tags as they were
inserted by the user and the facet classification according to our framework
(Photo by: borkur.net).
noun.substance, noun.plant, noun.animal, and noun.food; names with noun.person and noun.group;
time with noun.time; and activities with noun.act and noun.event. In the next subsection we will
describe in detail how we classify the Flickr tags into the WordNet broad categories.
As an illustrative example, Figure 4 shows an annotated photo in Flickr6 whose tags are mapped
to the facets introduced above. The photo owner annotated the photo according to four facets:
locations, subjects, names, and time. The tags describe where and when the photo was taken,
together with information about the subject such as the name of the building, its type, and its
architect.
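To make this mapping concrete, here is a minimal Python sketch of the facet assignment; the dictionary mirrors the mapping of Figure 3, while the function name and the handling of unmapped categories are illustrative assumptions rather than part of the system described here.

```python
# Facet -> WordNet broad categories, mirroring the mapping of Figure 3.
FACET_CATEGORIES = {
    "locations":  {"noun.location"},
    "subjects":   {"noun.artifact", "noun.object", "noun.substance",
                   "noun.plant", "noun.animal", "noun.food"},
    "names":      {"noun.person", "noun.group"},
    "activities": {"noun.act", "noun.event"},
    "time":       {"noun.time"},
}

# Inverted index: WordNet broad category -> facet.
CATEGORY_TO_FACET = {cat: facet
                     for facet, cats in FACET_CATEGORIES.items()
                     for cat in cats}

def facet_of(wordnet_category: str) -> str:
    """Map a WordNet broad category to one of the five browsing facets."""
    # Tags whose category falls outside the twelve mapped ones stay unclassified.
    return CATEGORY_TO_FACET.get(wordnet_category, "unclassified")

# Example: a tag classified as noun.location belongs to the 'locations' facet.
assert facet_of("noun.location") == "locations"
```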
3.2. Mapping Flickr Tags to Facets
We map the Flickr tags to facets by first classifying them into WordNet broad categories using
the ClassTag system described by Overell et al. [16]. The ClassTag system is composed of a union
of two mappings: a baseline using WordNet only and a machine learned mapping using structural
features of Wikipedia pages. The learned mapping provides a significant extension of the coverage
of classified tags in Flickr.
6http://www.flickr.com/photos/borkurdotnet/363644515/
Figure 5: ClassTag’s machine learned mapping from tags to WordNet categories.
The baseline system uses straightforward string matching between tags and WordNet lemmas.
Using the baseline alone, we can classify 57% of the Flickr tags. The main drawback of the baseline
is that WordNet is limited to the vocabulary of a fairly old news corpus. It does not contain
information about many important entities, such as Pamplona, Chrysler Building, London Eye,
Britney Spears, or biodiesel, to name a few examples.
To expand the coverage of classified tags, ClassTag uses a machine learned approach that de-
ploys the structural features present in Wikipedia pages. Figure 5 shows an overview of the ap-
proach. The system is composed of two components:
1. A classifier for classifying Wikipedia articles using structural patterns as features with Word-
Net categories as the classification scheme (top part of Figure 5).
2. A pipeline for mapping (Flickr) tags to WordNet categories, using the output of the classifier
(bottom part of Figure 5).
We will now describe – at a high level – the two components. A detailed description of the approach
can be found in [16].
3.2.1. Classifying Wikipedia Articles We build the feature space of our classifier by extracting
structural patterns from Wikipedia articles – more precisely using category and template structures.
We can map a subset of the Wikipedia articles to WordNet semantic categories using simple string
matching between the Wikipedia article titles and WordNet lemmas. We use the successful matches
as training instances. We then use the trained model to classify the remaining Wikipedia articles.
Following this approach, we build models/classifiers for the 11 WordNet categories that appear most
frequently among Flickr tags [21, 16].
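As a rough illustration of this step, the sketch below trains a generic classifier on category and template features. The report does not specify the learning algorithm or the feature encoding, so the scikit-learn components and the assumed `articles` and `wordnet_lemmas` inputs are placeholders for illustration only.

```python
# Hypothetical sketch of the Wikipedia-article classification step (Section 3.2.1).
# `articles`: list of dicts with 'title', 'categories' and 'templates';
# `wordnet_lemmas`: lowercased lemma -> WordNet broad category. Both assumed.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def structural_features(article):
    # Category and template memberships are the structural patterns used as features.
    feats = {f"cat:{c}": 1 for c in article["categories"]}
    feats.update({f"tpl:{t}": 1 for t in article["templates"]})
    return feats

def classify_articles(articles, wordnet_lemmas):
    # Training instances: articles whose title string-matches a WordNet lemma.
    train = [(structural_features(a), wordnet_lemmas[a["title"].lower()])
             for a in articles if a["title"].lower() in wordnet_lemmas]
    X, y = zip(*train)
    vec = DictVectorizer()
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)
    # The trained model labels the remaining, unmatched articles.
    rest = [a for a in articles if a["title"].lower() not in wordnet_lemmas]
    labels = clf.predict(vec.transform([structural_features(a) for a in rest]))
    return dict(zip((a["title"] for a in rest), labels))
```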
3.2.2. Classifying Flickr Tags Having classified Wikipedia articles we can use the classification
results to classify Flickr tags. We do that using a simple pipeline of mappings. First we map a
Flickr tag to Wikipedia anchor texts using exact string matching. Next we map Wikipedia anchor
texts to Wikipedia articles based on their anchor frequency. Then we determine the appropriate
classification of the Flickr tag using the Wikipedia article classification procedure (as described
above).
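A minimal sketch of this pipeline is shown below; the `anchor_index` and `article_category` structures, and the choice of the most frequent anchor target, are assumptions made here for illustration.

```python
# Hypothetical sketch of the tag -> WordNet category pipeline (Section 3.2.2).
# anchor_index: anchor text -> {article title: link frequency}
# article_category: article title -> WordNet broad category (baseline match
# or output of the article classifier described above).

def classify_tag(tag, anchor_index, article_category):
    """Map a Flickr tag to a WordNet broad category via Wikipedia anchors."""
    # 1. Exact string match between the tag and Wikipedia anchor texts.
    articles = anchor_index.get(tag.lower())
    if not articles:
        return None  # tag not covered by the Wikipedia-based mapping
    # 2. Pick the article that the anchor text most frequently links to.
    best_article = max(articles, key=articles.get)
    # 3. Reuse the article's classification as the tag's category.
    return article_category.get(best_article)

anchor_index = {"london eye": {"London Eye": 120, "Millennium Wheel": 3}}
article_category = {"London Eye": "noun.artifact"}
print(classify_tag("London Eye", anchor_index, article_category))  # noun.artifact
```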
3.3. Evaluation
Using the combination of the WordNet baseline and the machine learned mapping we are able
to extend the coverage of classified tags to 69% of the Flickr tags (compared to 57% using just the
WordNet baseline). This performance was achieved with a setting of the ClassTag system that gave
a precision of 72%. Hence the classification of tags is not perfect, but strikes a reasonable balance
between precision and recall (see [16] for evaluation details).
We conclude this section by listing some tags that were covered by ClassTag, but were not
classified by the WordNet baseline:
•Activities: Triathlon, geocaching, mountain biking, kendo.
•Animals: Jack Russell Terrier, Australian Shepherd.
•Artifacts: Notre Dame, London Eye, Sagrada Familia, nikon, nokia, wii, 4x4.
•Food: BBQ, Churrasco, Japanese food, Ramen, Asado.
•Groups: Live8, G8, NBA, SIGGRAPH, Tate Modern.
•Locations: NYC, Philly, Phuket, Big Island, Nottingham.
•Objects: Blue Mountains, Point Reyes, Half Dome, Lake Titicaca, Jungfrau.
•People: Norman Foster, Ronaldinho, Britney Spears, Chris, Alex, Emily, Lisa.
•Time: New Years Eve, 4th of July, Valentines day.
Those tags are certainly important when exploring photo collections.
4. Query Refinement
Interactive query refinement refers to the task of supporting users in refining the search path by
suggesting terms to add to a query or by suggesting related queries. In our case we want to generate
a list of terms that the user can use to either expand the current query or use as a new query.
For a given term in the user’s query we want to find potential refinement terms to show to
the user. We distinguish three different types of terms that could potentially be useful for query
refinement.
•General terms: Terms that are more general than the query term. E.g., France is more general
than Paris. This allows the user to explore the more general context of the current query.
•Specific terms: Terms that are more specific than the query term. E.g., Montmartre is more
specific than Paris. This allows the user to drill down into a specific aspect of the current
query.
•Lateral terms: Terms that represent concepts related to the query term. E.g., the Chrysler
Building is laterally related to the Empire State Building. This allows the user to explore
related concepts of the current query context.
The application in this paper mainly focuses on general terms and specific terms, and implicitly also
on lateral terms. I.e., for each query term we generate a ranked list of general terms and a ranked list
of specific terms. We obtain the lateral terms together with the general and specific terms.
4.1. Query Refinement using Tag Co-occurrence
Our approach is based on our previous work on tag recommendation where we use co-occurrence
statistics for Flickr tags, calculated over a large set of Flickr photos (over 250 million) [21]. We use
a probabilistic approach to derive potential refinement terms for a user query. For each query term
$t_q$ we calculate a list of general terms using the conditional probability

$$P(t \mid t_q) := \frac{|t \cap t_q|}{|t_q|} \qquad (4.1)$$

and a list of specific terms using the conditional probability

$$P(t_q \mid t) := \frac{|t \cap t_q|}{|t|} \qquad (4.2)$$

where $|t \cap t_q|$ is the number of photos annotated with both tag $t$ and tag $t_q$, and $|t|$ is the
number of photos annotated with tag $t$.
We combine the different lists using a weighted voting strategy. Different weights are given
to the general and specific list, depending on the level of specificity of each query term. I.e., for
general query terms more weight is given to specific refinements and for specific query terms more
weight is given to general refinement terms. Generality of a term was calculated using:
$$G(t) := \frac{\log(|t| / |t|_u)}{\log\bigl(\max_t(|t| / |t|_u)\bigr)} \qquad (4.3)$$

where $|t|$ is the number of photos annotated with tag $t$ and $|t|_u$ is the number of users who have
photos annotated with tag $t$. This particular measure performed best in correctly detecting the
most specific tag for a given set of known general/specific tag pairs. The way we generate query
refinement terms is closely related to generating term hierarchies and tag recommendations [19, 14,
20, 21]. We refer to those publications for more details.
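The sketch below illustrates how Equations 4.1–4.3 could be combined. The counts are assumed to be precomputed from the collection, and the simple linear blend of the two lists stands in for the weighted voting strategy, whose exact weights are not spelled out in this report.

```python
# Hypothetical sketch of co-occurrence based query refinement (Section 4.1).
# photo_count[t]: photos tagged t; user_count[t]: distinct users with tag t;
# co_count[(a, b)]: photos tagged with both a and b. All assumed precomputed.
import math

def generality(t, photo_count, user_count, max_ratio):
    # Eq. (4.3): photos-per-user ratio, normalised by the largest observed ratio.
    return math.log(photo_count[t] / user_count[t]) / math.log(max_ratio)

def refinements(t_q, photo_count, user_count, co_count, k=16):
    """Rank candidate refinement terms for the query term t_q."""
    max_ratio = max(photo_count[t] / user_count[t] for t in photo_count)
    g = generality(t_q, photo_count, user_count, max_ratio)
    scores = {}
    for (a, b), n in co_count.items():
        if t_q not in (a, b):
            continue
        t = b if a == t_q else a
        general_score = n / photo_count[t_q]   # Eq. (4.1): P(t | t_q)
        specific_score = n / photo_count[t]    # Eq. (4.2): P(t_q | t)
        # Weighted vote: specific refinements get more weight for general
        # queries and vice versa (a simple linear blend is assumed here).
        scores[t] = g * specific_score + (1.0 - g) * general_score
    return sorted(scores, key=scores.get, reverse=True)[:k]
```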
4.2. Examples
Below are query refinement examples produced by our system for the queries “London”, “breakfast”,
and “Millennium Bridge”:
•london: england, uk, 2007, europe, big ben, united kingdom, thames, 2006, london eye,
travel, buckingham palace, music, southwark, party, city, britain
•breakfast: food, coffee, eggs, bed, morning, brunch, bacon, toast, pancakes, 2007, restaurant,
egg, cafe, bread, lunch, cereal
•millennium bridge: thames, london, bridge, england, tate modern, st pauls cathedral, uk,
gateshead, tyne, newcastle, river, river thames, london eye, south bank, night, tyne bridge
For the “London” query our system gives generalization refinements, such as England and UK,
but also more specific refinement options such as Big Ben, Thames, and London Eye. For the
“breakfast” query our system gives specific refinements such as coffee, eggs, and toast; as well
as context refinement such as bed and restaurant. For the “Millennium Bridge” query our system
includes refinement terms for disambiguating between the Millennium Bridge in London and the
Gateshead Millennium Bridge in Newcastle; as well as recommending pointers to landmarks near
the bridge in London, such as Tate Modern and St. Paul’s Cathedral.
5. Faceted Browsing Interface
In this section we discuss how to combine the query refinement and tag facets in TagExplorer –
a faceted interface for browsing Flickr photos.
5.1. Design
In Figure 6 we show the faceted browsing prototype applied to browsing photos for the user
query ‘london’. The interface is split into four main blocks:
•Query box: Allows the user to enter a query (Figure 6 A).
•Query refinement: Shows the current query (Figure 6 B) and the related term refinement
suggestions (Figure 6 C).
•Result list: Shows the 36 most relevant photos for the query, which are obtained through the
Flickr API (Figure 6 E).
•Image display: A block for displaying a small version of a photo with additional information
such as title, description, photographer and tags (Figure 6 F). This block is updated if the user
clicks on a photo thumbnail.
The goals underlying the faceted browsing interface are motivated by the design principles for
information access systems outlined by Hearst [9]:
Figure 6: Screenshot of the interface applied to browsing photos related to the
query ‘london’.
•Offer informative feedback: We provide feedback to the user about the relationship between
the query specification and the results using the image display where the user can see the
title of each photo and its tags (Figure 6 F). We also provide feedback about the relationship
between the query specification and the underlying collection by showing potential refinement
terms (Figure 6 C).
•Reduce working memory load: We alleviate the working memory load of the user by pro-
viding the option of choosing relevant query terms to refine the query (Figure 6 C); and we
offer support to quickly go back to previous searches by easily removing query terms (Fig-
ure 6 B).
•Trade-off between simplicity and power: Striking the right balance between simplicity and
expressive power is a major challenge for our interface. We want to give the user power to
refine her query in several different ways: by exploring more specific aspects of the current
topic, by exploring more general aspects of the current topic, or by moving laterally to related
topics. Giving this power in a simple manner is far from trivial. In our implementation we
believe that we have traded some of the simplicity for the power but that the interface is not
overly complex.
Figure 7: Screenshot of the initial screen of the interface
5.2. Exploration
Prior research on information access interfaces has shown that it is important that interfaces
support the user both in their initial query formulation and through successive query refinement
steps [9]. Hence, one of the main goals of our interface is to assist the user when starting an
exploration session, but also when refining the current exploration session.
5.2.1. Initial Exploration When a user is about to start to explore the collection we want to give
her a good starting point, which reflects the content of the collection. We do this by showing the
user a term-cloud that contains the 100 tags most frequently used by Flickr users – see Figure 7.
The term-cloud is organized in such a way that terms belonging to the same facet are grouped
together. The user is also free to start the exploration through entering a keyword-based query in
the query-box.
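A minimal sketch of building this start screen is given below, assuming precomputed `tag_frequency` (tag to usage count) and `tag_facet` (tag to facet label, e.g. the ClassTag output mapped through Figure 3) lookups; both names are hypothetical.

```python
# Minimal sketch of the initial faceted term-cloud (Section 5.2.1).
# `tag_frequency`: tag -> usage count; `tag_facet`: tag -> facet label
# (e.g. ClassTag output mapped through Figure 3). Both assumed precomputed.
from collections import defaultdict

def initial_cloud(tag_frequency, tag_facet, n=100):
    """Group the n most frequently used tags by facet for the start screen."""
    top = sorted(tag_frequency, key=tag_frequency.get, reverse=True)[:n]
    cloud = defaultdict(list)
    for tag in top:
        cloud[tag_facet.get(tag, "unclassified")].append(tag)
    return cloud  # e.g. {"locations": ["london", "paris", ...], "subjects": [...]}
```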
5.2.2. Continuous Exploration When a user is already in the midst of browsing a topic we want
to provide assistance to continue browsing the collection, if desired by the user. We identify three
directions in which the user can move.
•Specification: The user can explore a sub-topic of the current topic.
•Generalization: The user can explore the general context related to the current topic.
•Lateral movement: The user can explore topics similar to the current topic.
We support these three types of movement through three explicit actions (which do not map one-to-one onto the movement types):
•Add query terms: The user can add query terms clicking on the ‘+’-sign of the correspond-
ing term in the term-cloud (Figure 6 C).
•Remove query terms: The user can remove query terms by clicking the ‘×’-sign of the
corresponding term in the current query (Figure 6 B).
•New query: The user can start a new query by clicking on a term in the term-cloud (Fig-
ure 6 C).
We refer to a chain of these actions as a continuous exploration. In addition to the three actions
mentioned above the user can at any time alter the query directly using the query box (Figure 6 A).
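As a small, hypothetical sketch of the session state behind these actions, the class below models the three operations; the names are illustrative only, and the example at the end replays Session II from Figure 11.

```python
# Hypothetical sketch of the session state behind the three refinement actions;
# class and method names are illustrative only.
class ExplorationSession:
    def __init__(self):
        self.query = []  # current query as an ordered list of terms

    def new_query(self, term):
        """Clicking a term in the term-cloud replaces the current query."""
        self.query = [term]

    def add_term(self, term):
        """Clicking the '+'-sign appends the term to the current query."""
        if term not in self.query:
            self.query.append(term)

    def remove_term(self, term):
        """Clicking the 'x'-sign next to a query term removes it."""
        self.query = [t for t in self.query if t != term]

# Session II from Figure 11: beach -> water -> water ripple -> water ripple wave.
s = ExplorationSession()
s.new_query("beach")
s.new_query("water")
s.add_term("ripple")
s.add_term("wave")
print(" ".join(s.query))  # water ripple wave
```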
The most complex part of designing the interface has been to find a way for the user to choose
between adding a refinement term to the current query and posting a new query using a refinement term.
In our design we opt for the standard tag-cloud functionality for posting a new query by clicking a
refinement term. We implement the non-standard adding of a query term by adding a ‘+’-sign next
to each refinement term. We realize that this may compromise the simplicity of the interface but
we believe that this is outweighed by the additional expressive power of having the two refinement
options available.
6. Interaction Log Analysis
TagExplorer has been available online for over a year. During this time we have collected
interaction logs of users interacting with the online application. In this section we analyze the
interaction logs collected over a 12 month period after the launch of the application. The analysis
is based on more than 3,000 interaction sessions. We have left out the interaction sessions collected
during the first weeks after the launch to reduce the bias of our own testing. The goal of this analysis
is to shed light on the browsing patterns of users while interacting with the system. Users were
not recruited to use the system, so the interaction analysis is based on users who stumbled upon
the system, e.g., through the Yahoo! Sandbox website.7 This is thus an uncontrolled experiment
and one should be careful when interpreting the outcome of the analysis. We will, however, motivate
some system improvements based on the analysis. We revisit this in Section 6.5.
6.1. Exploration
First we look at the types of actions used in exploration. Figure 8 shows a summary of different
exploration actions taken by the users. We see that 32% of the actions are queries posted through
the query box while 68% of actions are interactions with the term-cloud (add/click suggested term)
or query-terms (remove/click query term). Of the query refinement actions, clicking a term is done
most frequently (45%), followed by adding a new term to the query (14%), and removing a term
from the query (8%). The action of clicking a query term is hardly ever observed. It is encouraging
to see that the query refinement actions are used more than modifying the query directly in
the query box. In the following subsections we analyze this in more detail, in terms of initial and
continuous exploration.
6.1.1. Initial Exploration For the first interaction step of the user with the system there are two
options to start the exploration: (1) use the query box or (2) click a term in the cloud (Figure 7).
Figure 9 shows the percentage of users that used each of the respective options. We see that it is
7http://sandbox.yahoo.com/
Figure 8: Summary of different exploration actions undertaken by the users,
shown as a fraction of the overall action count: click term 45%, query box 32%,
add term 14%, remove term 8%, click query term 1%.
Figure 9: Portion of initial exploration actions divided between query box
entries (52%) and term clicks (48%).
Figure 10: Summary of interaction within a session. The x-axis represents the
n-th interaction step in a session. The y-axis represents the fraction of
different exploration actions (add term, click term, query box, remove term)
relative to the total number of actions taken at that step.
almost an equal split between the two options. This result indicates that the collection overview is
useful, but the query box is still an important means to start an exploration session. If the user has
a pre-defined goal, she is probably more likely to use the query box to start her session. If, however,
the user does not have a pre-defined goal, she may be more likely to use the collection overview.
6.1.2. Continuous Exploration We now analyze the interaction steps for a complete interaction
session. Figure 10 shows the proportion of the interactions grouped by action type, taken at each
step within a session. The left-most part of the graph – interaction step 1 – corresponds to the initial
action taken by the users (Figure 9). We see that after the initial interaction the use of the query
box decreases significantly, compared to the click-based refinement. This indicates that the users do
appreciate the faceted query refinement options given to them – in particular once they have started
their exploration session.
If we compare clicking on refinement terms with adding or removing refinement terms, we see
that user behaviour changes as the session evolves: the action of clicking on a term decreases while
at the same time the addition/removal of refinement terms to the query increases. This may indicate
that when the user starts using the system the functionality of the ‘+’-sign and the ‘x’-sign (i.e., for
adding and removing query terms) is not clear. However, as the user keeps using the system she
learns the difference and hence uses these actions more frequently. A manual
inspection of a subset of the interaction logs supported this interpretation. In many cases users start
their refinement chains by clicking on terms, but later add the same terms to their query using the
‘+’-sign. An alternative explanation could be that it takes the user some time to first reformulate
Session I
Action       Term    Query
Click term   beach   beach
Click term   sand    sand
Click term   water   water
Add term     sand    water sand

Session II
Action       Term    Query
Click term   beach   beach
Click term   water   water
Add term     ripple  water ripple
Add term     wave    water ripple wave

Figure 11: Example of two interaction sessions starting from clicking the term
‘beach’.
Figure 12: Distribution of facet usage when browsing compared to usage in
Flickr photo annotations. †Based on statistics reported in [16]
the general query before she starts to zoom in on a specific aspect of the query. Figure 11 shows an
example of two interaction sessions starting with clicking the term ‘beach’. In both cases the users
start with clicking terms and then later add terms. In Session I the user may be confused by the
functionality of the ‘+’-sign as she clicks the terms ‘sand’ and ‘water’ separately before she adds
one to the other in the same query. In Session II the user seems to take two clicks to refine her basic
query before making it more specific by adding terms.
6.2. Faceted Browsing
Let us now turn our attention to the facets used in browsing. Figure 12 shows an overview of the
types of facets used in browsing, compared to the distribution in the underlying collection (based on
data reported in [16]). The browsing information is aggregated over the different click-based action
types (different refinement term clicks). We see that subjects is the most frequently used facet
type (37%), followed by locations (28%) and names (16%). Activities and time are the least used
facets for browsing with 10% and 9% respectively. The distribution of facets used in browsing is
Figure 13: Distribution of facets used in the initial screen, first browsing action and
first query box entry. †Based on manual classification of queries in query log.
similar to the facet distribution in the underlying collection. Subjects and activities have a relatively
higher frequency in the interaction logs than in the collection, while names have a relatively lower
frequency in the interaction logs compared to the collection.
We will now study in detail the facets shown to the user and compare with the facets used in
browsing actions and in the query box.
Figure 13 shows the distribution of facets used in initial screen, first browsing action and first
query box entry. Comparing the distribution of facets in the initial screen and first browsing action,
we see that locations have a relatively high click-rate as a first browsing action, whereas names
and time have a relatively low click-rate. Comparing the first query-box entry – i.e. in the case
when a browsing session was started using the query-box – and the first browsing action we see that
locations are popular in both cases. Subjects are less frequent as first query-box entries than first
browsing actions but names are by far more frequent as first query-box entries than first browsing
actions. The time facet is hardly ever used to start an exploration session through the query-box.
The frequency of names as first query-box entries is likely to be due to people searching for photos
of themselves or people they know.
The distribution of tags appearing in follow-up screens and used in follow-up actions is shown
in Figure 14. We do not have the exact numbers for the distribution of tags appearing in follow-up
screens but we estimate it by looking at the distribution of refinement terms shown for the 50 most
frequently displayed follow-up screens. Comparing the distribution of terms appearing in follow-up
screens and terms clicked in follow-up screens we see that subjects and names are clicked relatively
frequently but time less frequently. For locations and activities the probability of being used in
refinement is equal to the probability of appearing.
The distribution of facets used in the first browsing action and in follow-up actions is shown in Figure 15. We see
that the click-rate for locations drops considerably (from 45% to 27%) while subjects and names
Figure 14: Distribution of facets used in follow-up screens and follow-up brows-
ing actions. †Estimate based on the distribution for the 50 most popular queries.
Figure 15: Distribution of facets used in the first browsing action and follow-up
browsing actions.
Table 1: Transition probability of clicking or adding a term in a certain facet
as a 2nd action (columns), given that the 1st action involved a certain facet
(rows). The last column shows the fraction of click-pairs originating in the
given facet.

            Locations  Subjects  Names  Actions  Time  % of volume
Locations   52%        25%       10%    6%       7%    46%
Subjects    14%        70%       6%     5%       5%    33%
Names       3%         21%       38%    26%      12%   5%
Actions     19%        20%       37%    14%      10%   11%
Time        34%        8%        8%     11%      39%   5%
Table 2: Transition probability of clicking or adding a term in a certain facet
(columns), given that the previous action involved a certain facet (rows). The
last column shows the fraction of click-pairs originating in the given facet.

            Locations  Subjects  Names  Actions  Time  % of volume
Locations   50%        28%       9%     6%       7%    30%
Subjects    13%        60%       12%    7%       9%    37%
Names       12%        23%       45%    14%      6%    16%
Actions     15%        22%       27%    23%      13%   9%
Time        21%        19%       15%    11%      34%   8%
increase (from 33% to 37% and from 5% to 19%, respectively). The click-rate for activities and
time is the same between first action and follow-up actions.
6.3. Refinement Transitions
We have seen that users often begin their exploration looking at a location, but what about
successive actions? We will now turn our attention to different types of refinement transitions.
Table 1 shows the probability of clicking or adding a term of a certain facet as their 2nd action
(columns), given that the 1st action involved a certain facet (rows). E.g., if the session started with
a click on a location term (row 1) then in 52% of the cases the users’ next action was to click on or
add another location term; in 25% of the cases a subject was clicked or added; a name was clicked
or added in 10% of the cases; etc. The last column shows the fraction of refinement pairs originating
in the given facet. E.g., 46% of the sessions where the two first actions included tag clicks or tag
addition started by clicking a location.
Table 2 shows similar statistics as Table 1, but averaged over all interaction steps. E.g., if the
user’s previous action (at any step) involved a location and she decided to either click or add a term,
then in 50% of the cases she chose another location; in 28% of the cases she chose a subject; in 9%
of the cases she chose names; etc.
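As a sketch of how such transition statistics could be derived from the interaction logs, the code below estimates the row-normalised transition matrix; the `sessions` input (one facet sequence per session, restricted to click/add actions) is an assumption for illustration.

```python
# Hypothetical sketch of deriving facet transition statistics (Tables 1 and 2)
# from the interaction logs. `sessions` is an assumed list of per-session facet
# sequences, e.g. [["locations", "locations", "subjects"], ...], restricted to
# click/add refinement actions.
from collections import Counter, defaultdict

def transition_matrix(sessions, first_pair_only=False):
    """Estimate P(next facet | previous facet) from logged refinement pairs."""
    pairs = Counter()
    for facets in sessions:
        steps = facets[:2] if first_pair_only else facets
        for prev, nxt in zip(steps, steps[1:]):
            pairs[(prev, nxt)] += 1
    totals = defaultdict(int)
    for (prev, _), n in pairs.items():
        totals[prev] += n
    return {(prev, nxt): n / totals[prev] for (prev, nxt), n in pairs.items()}

# Table 1 corresponds to first_pair_only=True; Table 2 averages over all steps.
```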
From both tables we can clearly see that the dominant behavior is that two consecutive clicks
Figure 16: Frequency of different query refinement patterns in a random set of
100 interaction sessions. I.e., the fraction of sessions containing at least one
action of a given refinement pattern.
tend to be on terms belonging to the same facet. This is most prominently the case for the two most
popular facet types – subjects and locations.
6.4. Refinement Patterns
In the previous sections we argued that we want our system to support three types of query
refinement: generalization, specification, and lateral shift. To see if these patterns were present
in our interaction logs we randomly sampled a set of 100 interaction sessions with at least one
refinement action and manually annotated what sort of refinement patterns were present. Figure 16
shows the distribution of different query refinement patterns. Specification was the most frequent
pattern as it appeared in about 60% of the interaction sessions. Generalizations and lateral shifts
were less frequent as each appeared in about 40% of the interaction sessions. We included an
additional refinement pattern which we call “topical shift” – meaning that the user changed her
search topic in mid-session. This pattern was fairly frequent as it appeared in almost 50% of the
sessions. It is interesting to note that most sessions included multiple refinement patterns.
Figure 17 shows an example of an interaction session containing multiple refinement patterns.
The session starts out with a specification chain where the user goes from ‘Germany’ to ‘Berlin’
and to ‘Reichstag’ – the building housing the German parliament. The user then moves laterally
to exploring ‘Bundestag’ – the German parliament – before she makes a topical shift to ‘Norman
Foster’ – the architect behind the reconstruction of the Reichstag building. The user then removes
the ‘Norman Foster’ term and starts a new query sequence by typing ‘Koblenz’ into the query box.
She then makes her query more specific by clicking ‘Ehrenbreitstein’ – a mountain (and fortress)
on the east bank of the Rhine river opposite Koblenz. Finally the user makes a lateral move from
‘Ehrenbreitstein’ to ‘Rhine’.
Action       Term             Query            Refinement Pattern
Click term   germany          germany          –
Click term   berlin           berlin           specification
Click term   reichstag        reichstag        specification
Click term   bundestag        bundestag        lateral shift
Click term   normanfoster     normanfoster     topical shift
Remove term  normanfoster                      –
Query box    koblenz          koblenz          topical shift
Click term   ehrenbreitstein  ehrenbreitstein  specification
Click term   rhine            rhine            lateral shift

Figure 17: Example of an interaction session with multiple refinement patterns
occurring in the same session.
6.5. Discussion
In this section, we summarize the main observations that have come out of the analysis of the
user interaction logs and we use the observations to motivate system improvements that could be
implemented in a next iteration of the system development.
Query box and term-clouds are equally popular as starting points for an exploration session.
This indicates that a collection overview is useful for the user to start an exploration session. It
does not, however, serve as a replacement for the query box. Both functionalities should thus be
supported in any iteration of the system implementation.
Query term refinement using faceted term-clouds is used more frequently than the query box.
This indicates that the browsing support given by the cloud of refinement terms is appreciated by
users. This should be explicitly verified in a user study.
Clicking a term is used more frequently than adding a term. Before launching the tool we were
concerned that the distinction between starting a new query by clicking a term and adding a query
term to the current query might be too complicated for the users to grasp. The low usage of the
‘+’-sign seems to confirm that our concerns were valid, particularly given that 60% of the
sessions included a specification (Figure 16). Informal interviews with users have also indicated
that the distinction between clicking a term and clicking the ‘+’-sign can be confusing. Some users
expected that clicking a refinement term would add the term to the query rather than posting a new
query. However, when analyzing interaction steps we saw that the usage of the ‘+’-sign increased
as sessions progressed (see Figure 10). This may indicate that there is a learning effect (Figure 11
Session I). It could also be a natural behavior where the user first refines her basic query before
making it specific (Figure 11 Session II). In many cases it is also natural to refine one’s query without
choosing to add terms to the query, but rather click a more specific term (Figure 16). In sum, the
answers to this question are still inconclusive and should be addressed explicitly in a formal user
study.
Locations are a popular starting point for exploration sessions but subjects are more frequent
in the successive exploration steps. It is not surprising that locations serve as popular starting points
for photo exploration. Photos are an ideal means to get familiar with a location. The successive
use of subjects in query refinement indicates that users may want to zoom in on a specific subject
when browsing a certain location. An analysis of adjacent click pairs did show that the alternation
between locations and subjects was fairly high (Tables 1 and 2). Based on this observation we could
choose to make location tags more prominent in the initial exploration stage.
Specification and topical shifts are frequent refinement directions. This suggests that the
TagExplorer application can serve the dual purpose of finding photos of specific topics and of
browsing the underlying collection without a specific information need.
7. Conclusions
In this paper we have presented TagExplorer – a novel application for faceted browsing of
Flickr photos. The application combines a tag classification system with tag co-occurrence analysis
to identify for any given query what the related refinement terms are, and presents the terms in a
term-cloud that groups the refinement terms according to their facet.
The tag classification system allows us to map any tag to a facet, which provides an elementary
structure along which the facets are presented to the user. Second, based on the tag co-occurrence
frequencies, we can provide the user with terms that allow for broadening or narrowing the search
path, or for spinning off from the search topic in a lateral direction. The application implements an
endless browsing paradigm that allows the user to explore the photo collection.
The outcome of the user interaction analysis, based on more than 3,000 user sessions, clearly
reveals that users take up browsing through interaction with the facets, while at the same time
the number of manually refined queries decreases. Secondly, based on an analysis of the facet
classes we find that users predominantly start with typing or clicking on a location. This corresponds
well with the tagging behavior of users in Flickr. We also find that users have a strong interest in
subjects and specific names to further refine their exploration. Finally, we find that users click
more frequently on a tag than on the ‘+’-sign for query expansion. Further research is needed to
investigate if this is an artifact of the way the facets are presented to the user. We did, however,
observe an uptake in query expansion after the first steps of interaction, which indicates that
there is a short learning curve.
For our future work, we plan to improve the quality of the facets by incorporating the click
behavior of users. Including this user feedback in the query refinement system will allow us to
fit the recommended facets to the needs of the user. In addition, we plan to work on alternative
presentation and interaction methods, to investigate how best to support the user in the exploration
task.
References
[1] Peter Anick. Using terminological feedback for web search refinement: a log-based study. In
SIGIR ’03: Proceedings of the 26th annual international ACM SIGIR conference on Research
and development in information retrieval, pages 88–95, New York, NY, USA, 2003. ACM.
[2] N. J. Belkin, C. Cool, D. Kelly, S.-J. Lin, S. Y. Park, J. Perez-Carballo, and C. Sikora. It-
erative exploration, design and evaluation of support for query reformulation in interactive
information retrieval. Inf. Process. Manage., 37(3):403–434, 2001.
[3] Peter Bruza, Robert McArthur, and Simon Dennis. Interactive internet search: keyword, direc-
tory and query reformulation mechanisms compared. In SIGIR ’00: Proceedings of the 23rd
annual international ACM SIGIR conference on Research and development in information
retrieval, pages 280–287, New York, NY, USA, 2000. ACM.
[4] Robin D. Burke, Kristian J. Hammond, and Benjamin C. Young. Knowledge-based navigation
of complex information spaces. In Proceedings of the 13th National Conference on Artificial
Intelligence, pages 462–468, 1996.
[5] Hang Cui, Ji-Rong Wen, Jian-Yun Nie, and Wei-Ying Ma. Probabilistic query expansion using
query logs. In WWW ’02: Proceedings of the 11th international conference on World Wide
Web, pages 325–332, New York, NY, USA, 2002. ACM.
[6] Wisam Dakka, Panagiotis G. Ipeirotis, and Kenneth R. Wood. Automatic construction of
multifaceted browsing interfaces. In CIKM ’05: Proceedings of the 14th ACM international
conference on Information and knowledge management, pages 768–775, New York, NY, USA,
2005. ACM.
[7] Micah Dubinko, Ravi Kumar, Joseph Magnani, Jasmine Novak, Prabhakar Raghavan, and An-
drew Tomkins. Visualizing tags over time. In WWW ’06: Proceedings of the 15th international
conference on World Wide Web, pages 193–202, New York, NY, USA, 2006. ACM.
[8] Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. The MIT Press, 1998.
[9] M. Hearst. User interfaces and visualization. In Ricardo Baeza-Yates and Berthier Ribeiro-
Neto, editors, Modern Information Retrieval. Addison-Wesley Longman Publishing Company,
1999.
[10] Hideo Joho, Claire Coverson, Mark Sanderson, and Micheline Beaulieu. Hierarchical presen-
tation of expansion terms. In SAC ’02: Proceedings of the 2002 ACM symposium on Applied
computing, pages 645–649, New York, NY, USA, 2002. ACM.
[11] Rosie Jones, Benjamin Rey, Omid Madani, and Wiley Greiner. Generating query substitutions.
In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages
387–396, New York, NY, USA, 2006. ACM.
[12] Shuo-Peng Liao, Pu-Jen Cheng, Ruey-Cheng Chen, and Lee-Feng Chien. LiveImage: Orga-
nizing Web images by relevant concepts. In Proceedings of the Workshop on the Science of
the Artificial 2005, pages 210–220, 2005.
[13] Hao Liu, Xing Xie, Xiaoou Tang, Zhi W. Li, and Wei Y. Ma. Effective browsing of web image
search results. In MIR ’04: Proceedings of the 6th ACM SIGMM international workshop on
Multimedia information retrieval, pages 84–90, 2004.
[14] Peter Mika. Ontologies are us: A unified model of social networks and semantics. In Proceed-
ings of the 4th International Semantic Web Conference (ISWC 2005), volume 3729 of LNCS.
Springer-Verlag, 2005.
[15] Mor Naaman, Susumu Harada, QianYing Wang, Hector Garcia-Molina, and Andreas Paepcke.
Context data in geo-referenced digital photo collections. In MULTIMEDIA ’04: Proceedings
of the 12th annual ACM international conference on Multimedia, pages 196–203, New York,
NY, USA, 2004. ACM.
[16] Simon Overell, Börkur Sigurbjörnsson, and Roelof van Zwol. Classifying tags using open
content resources. In WSDM ’09: Proceedings of the Second ACM International Conference
on Web Search and Data Mining, pages 64–73. ACM, 2009.
[17] Adrian Popescu and Pierre-Alain Moellic. OLIVE: a conceptual Web image search engine.
In ACM Multimedia 2007, September 24 - 29, Augsburg, Germany, 2007.
[18] Tye Rattenbury, Nathan Good, and Mor Naaman. Towards automatic extraction of event and
place semantics from flickr tags. In Proceedings of the Thirtieth International ACM SIGIR
Conference, (SIGIR 07), 2007.
[19] Mark Sanderson and Bruce Croft. Deriving concept hierarchies from text. In SIGIR ’99:
Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and De-
velopment in Information Retrieval, pages 206–213. ACM Press, 1999.
[20] P. Schmitz. Inducing ontology from flickr tags. In Proceedings of the Collaborative Web
Tagging Workshop (WWW’06), 2006.
[21] Börkur Sigurbjörnsson and Roelof van Zwol. Flickr tag recommendation based on collective
knowledge. In WWW ’08: Proceeding of the 17th international conference on World Wide
Web, pages 327–336. ACM, 2008.
[22] Roelof van Zwol, Börkur Sigurbjörnsson, Ramu Adapala, Lluis Garcia Pueyo, Abhinav Kati-
yar, Kaushal Kurapati, Mridul Muralidharan, Sudar Muthu, Vanessa Murdock, Polly Ng,
Anand Ramani, Anuj Sahai, Sriram Thiru Sathish, Hari Vasudev, and Upendra Vuyyuru.
Faceted exploration of image search results. In WWW ’10: Proceedings of the 19th inter-
national conference on World wide web, pages 961–970, New York, NY, USA, 2010. ACM.
[23] Shuo Wang, Feng Jing, Jibo He, Qixing Du, and Lei Zhang. Igroup: presenting web image
search results in semantic clusters. In CHI ’07: Proceedings of the SIGCHI conference on
Human factors in computing systems, pages 587–596, New York, NY, USA, 2007. ACM.
[24] Jinxi Xu and W. Bruce Croft. Query expansion using local and global document analysis. In
SIGIR ’96: Proceedings of the 19th annual international ACM SIGIR conference on Research
and development in information retrieval, pages 4–11, New York, NY, USA, 1996. ACM.
[25] Jinxi Xu and W. Bruce Croft. Improving the effectiveness of information retrieval with local
context analysis. ACM Transactions on Information Systems, 18:79–112, 2000.
[26] Ping Yee, Kirsten Swearingen, Kevin Li, and Marti Hearst. Faceted metadata for image search
and browsing. In Proceedings of ACM CHI, 2003.