
On the Construction of Efficiently Navigable Tag Clouds Using Knowledge from Structured Web Content

Christoph Trattner
(KMI and IICM, Graz University of Technology, Graz, Austria
ctrattner@iicm.edu)
Denis Helic
(KMI, Graz University of Technology, Graz, Austria
dhelic@tugraz.at)
Markus Strohmaier
(KMI, Graz University of Technology, Graz, Austria
markus.strohmaier@tugraz.at)
Abstract: In this paper we present an approach to improving the navigability of hierarchically structured Web content. The approach is based on integrating a tagging module and adopting tag clouds as a navigational aid for such content. The main idea of this approach is to apply tagging to better highlight cross-references between information items across the hierarchy. Although in principle tag clouds have the potential to support efficient navigation in tagging systems, recent research has identified a number of limitations. In particular, applying tag clouds within the pragmatic limits of a typical user interface leads to poor navigational performance, as tag clouds are vulnerable to the so-called pagination effect. In this paper, a solution to the pagination problem is discussed, implemented as part of an Austrian online encyclopedia called Austria-Forum, and analyzed. In addition, a simulation-based evaluation of the new algorithm has been conducted. The first evaluation results are quite promising, as efficient navigational properties are restored.
Key Words: Tagging, tags, tag clouds, algorithm, tag cloud algorithm, navigation,
navigability, online encyclopedia
Category: H.4
1 Introduction
An example of a semi-structured website is Austria-Forum¹. Austria-Forum is essentially a collection of several hierarchically structured Austrian encyclopedias that contain information about biographies, postage stamps, coins, or the Austrian Universal Encyclopedia AEIOU². Austria-Forum is a Wiki-based system whose articles within a single encyclopedia are hierarchically structured; it is therefore also called a structured Wiki [Trattner et al. 2010]. As of 1 October 2010, the system provides over 130,000 information items to its users.
¹ http://www.austria-lexikon.at/
² http://www.aeiou.at/
Due to the hierarchical structure and the rapid growth of the system over
the past few months, links between articles in different encyclopedias are sparse
even though they might be related to each other. For example, there are several
“Mozart” stamps in the Stamps encyclopedia. However, none of these articles
has links to the “Mozart” biography, or “Mozart” coins because the articles are
created and managed independently.
To tackle the problem of poor connectivity, a simple tagging mechanism was introduced to Austria-Forum [Trattner and Helic 2009]. In tagging systems, people use free-form vocabulary [Hammond et al. 2005] to annotate resources with "tags" [Wu et al. 2006, Marlow et al. 2006, Us Saaed 2008]. This is done for semantic reasons (e.g. to enrich information items with metadata), for conversational reasons (e.g. for social signaling) [Ames and Naaman 2007], or for organizational reasons (e.g. to categorize information items) [Körner et al. 2010]. Regardless of "why people tag" [Strohmaier et al. 2010, Nov and Ye 2010, Strohmaier 2008, Körner et al. 2010a], tags can be visualized in so-called "tag clouds". A tag cloud [Ames and Naaman 2007] is a selection of tags related to a particular resource. Upon clicking on a tag, users are presented with a list of resources tagged with that tag, which lets them easily navigate to related resources. The main idea of including a tagging module in Austria-Forum can best be described via the "Mozart" example mentioned above. Suppose that users tag "Mozart" stamps, "Mozart" coins, the "Mozart" biography, or any other document dealing with "Mozart" with a common tag, e.g. "Amadeus". Whenever users navigate to any of these articles, the system presents a tag cloud containing all assigned tags. Users can then click on the "Amadeus" tag to obtain a list of all other articles tagged with it. Consequently, all articles tagged with "Amadeus" are now linked to each other; in fact, they are cross-linked across the hierarchical structure. Due to such indirect linking capabilities, tag clouds are sometimes used to provide navigational support in tagging systems (cf. systems such as Flickr, Delicious, or BibSonomy).
Recently, a number of studies have investigated tag clouds from user interface [Mesnage and Carman 2009, Sinclair and Cardew-Hall 2008] and network-theoretic [Neubauer and Obermayer 2009] perspectives. These studies agree on some interesting findings, such as the observation that current tag cloud calculation algorithms need to be improved. Under pragmatic user interface limits, such as tag cloud size and pagination, the ability of tag clouds to support "efficient" navigation is very poor [Helic et al. 2010]. In particular, the pagination effect fragments the network, destroying its giant connected component and thus leaving the majority of resources unreachable.
In this paper, we present an approach to constructing tag clouds that support efficient navigation. The new algorithm is based on hierarchical network models, which are known to be efficiently navigable [Kleinberg 2001]. The algorithm has been implemented in Austria-Forum as a general tool for improving the connectivity and navigability of the system as a whole.
The paper is structured as follows: Section 2 presents a model of tag cloud based navigation. Section 3 discusses the problems of tag cloud based navigation and of current tag cloud construction algorithms. Section 4 presents a new, optimized tag cloud calculation algorithm based on a hierarchical network model, applied within the online encyclopedia system Austria-Forum. Section 5 analyzes the potential and limitations of this new approach. Section 6 reviews related work in this field. Finally, Section 7 concludes the paper and provides an outlook on future work in this area.
2 Model of Tag Cloud Navigation
In this paper, the tagging data is modeled as pairs of the form (r, t), where r is a resource from the set of all resources R and t is a tag from the set of all tags T. We do not take users into account here, as we concentrate only on the links between resources that are imposed by the tags assigned to those resources. The main navigational aid in a tagging system is a tag cloud, which we denote by TC. Formally, a tag cloud TC is a particular selection of tags from the tag set.
Due to user interface restrictions, the number of tags within a tag cloud is usually limited to an upper bound. To model this situation, we additionally introduce a parameter n as the maximum number of tags in a tag cloud.
Usually, the most popular tags are assigned to a large number of resources, often hundreds or even thousands. When a user clicks on such a tag, tagging systems present a long paginated list of tagged resources. In most cases, 10–100 resources are presented to the users at once (see e.g. Delicious or BibSonomy). To model this user interface limitation, which we refer to as pagination from here on, we introduce a parameter k that limits the resource list of each tag within a tag cloud TC to at most k resources.
Finally, let us model the navigation process in a tagging system. Navigation in a tagging system might start from a home page where a system-global tag cloud is presented. Typically, the tags with the highest global frequency are selected for inclusion in this tag cloud. Upon clicking on a particular tag, a k-limited list of resources is shown. Once the user has selected a specific resource, the system transfers the user to the selected resource and presents a resource-specific tag cloud TC_r. The tags in such a resource-specific tag cloud are selected by their local frequency, highest first. In the next step, by selecting a tag from the resource-specific tag cloud, the user is again presented with a paginated list of resources and might continue the navigation process in the same manner as before (see Figure 1).
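The navigation model above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the Austria-Forum implementation: tag assignments are a list of (r, t) pairs in which a pair may repeat (we read the repetition count as the local tag frequency), and the k-limited resource list simply keeps the first k distinct resources in list order. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# A tag assignment (r, t). Repeated pairs count as higher local frequency
# (an assumption of this sketch, since users are not modeled).
Assignment = Tuple[str, str]


def resource_tag_cloud(assignments: List[Assignment], r: str, n: int) -> List[str]:
    """TC_r: the n tags with the highest local frequency on resource r."""
    counts = Counter(t for res, t in assignments if res == r)
    return [t for t, _ in counts.most_common(n)]


def resource_list(assignments: List[Assignment], t: str, k: int) -> List[str]:
    """k-limited resource list of tag t: first k distinct resources in list order."""
    seen: List[str] = []
    for res, tag in assignments:
        if tag == t and res not in seen:
            seen.append(res)
    return seen[:k]


def navigation_edges(assignments: List[Assignment], n: int, k: int) -> Dict[str, Set[str]]:
    """Resource -> resources reachable via one tag in TC_r and its k-limited list."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for r in {res for res, _ in assignments}:
        for t in resource_tag_cloud(assignments, r, n):
            for target in resource_list(assignments, t, k):
                if target != r:
                    edges[r].add(target)
    return edges
```

For instance, with the assignments ("stamp_mozart", "amadeus"), ("coin_mozart", "amadeus"), ("bio_mozart", "amadeus") twice and ("bio_mozart", "composer") once, and n = 1, k = 2, the biography links to both other articles, but no edge leads back to it: a tiny case of the pagination effect discussed in Section 3.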
[Figure 1: Resource-specific tag cloud TC_r of a resource r and k-limited resource list R of tag t within Austria-Forum.]
3 Problems of Tag Cloud Navigation
Resource-specific tag clouds are a simple way to connect resources within a tagging system; in a typical tagging system, nearly 99% of the resources are interlinked with each other within a tag cloud network [Helic et al. 2010]. However, this simple approach to building tag clouds exhibits certain problems. In particular, resource-specific tag clouds are vulnerable to the so-called pagination effect [Helic et al. 2010]: by k-limiting the resource list of a given tag (with typical pagination values such as 5, 10, or 20), the connectivity of the tag cloud network collapses drastically. In practice, the tag cloud network then consists of isolated network clusters (components) that are no longer linked to each other, so users cannot reach one network fragment from another by navigating resource-specific tag clouds. One simple solution to this problem is to select the resources for inclusion in a k-limited resource list uniformly at random [Helic et al. 2010]. For example, whenever the user clicks on a given tag in the tag cloud, the system randomly selects k resources and presents them to the user. Because the same links are not always selected, no isolated network clusters emerge [Helic et al. 2010]. As [Bollobás and Chung 1988, Helic et al. 2010] have shown, this approach produces a random network that is completely connected, even for small values of k.
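The difference between a fixed top-k slice and the random selection just described can be illustrated with a small Python sketch. The helpers are hypothetical and not taken from the cited work: `coverage` counts how many distinct resources of a single tag are ever offered over repeated clicks. With a fixed slice the same k resources appear every time; with uniform random selection every resource is eventually offered, which is the intuition behind the connectivity result above.

```python
import random
from typing import List, Optional, Set


def k_limited(resources: List[str], k: int, strategy: str,
              rng: Optional[random.Random] = None) -> List[str]:
    """Return at most k resources from a tag's resource list.

    'top' returns the same first k entries on every call (e.g. the k newest
    resources if the list is sorted chronologically); 'random' draws k
    resources uniformly at random without replacement on each call.
    """
    if strategy == "top":
        return resources[:k]
    if strategy == "random":
        rng = rng if rng is not None else random.Random()
        return rng.sample(resources, min(k, len(resources)))
    raise ValueError(f"unknown strategy: {strategy}")


def coverage(resources: List[str], k: int, strategy: str, clicks: int,
             seed: int = 0) -> int:
    """Number of distinct resources offered over repeated clicks on one tag."""
    rng = random.Random(seed)
    shown: Set[str] = set()
    for _ in range(clicks):
        shown.update(k_limited(resources, k, strategy, rng))
    return len(shown)
```

With 100 resources and k = 5, the top-k strategy never offers more than 5 of them, however often the tag is clicked.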
3.1 Navigable vs. Efficiently Navigable Tag Cloud Networks
Another interesting question in this context is whether such randomly generated networks are also navigable. From a network-theoretic point of view, Kleinberg [Kleinberg 2000a, Kleinberg 2000b, Kleinberg 2001] showed that a navigable network can be formally defined as a network with a low diameter [Newman 2003] bounded by log(N), where N is the number of nodes in the network, and a giant component, i.e. a strongly connected component containing almost all nodes. Additionally, Kleinberg defined an "efficiently" navigable network as a network with structural properties that make it possible to design efficient decentralized search algorithms (algorithms that have only local knowledge of the network) [Kleinberg 2000a, Kleinberg 2000b, Kleinberg 2001]. The delivery time (the expected number of steps needed to reach an arbitrary target node) of such algorithms is polylogarithmic, or at most sub-linear, in N.
In short, Kleinberg [Kleinberg 2001] also showed that naive random network models produce structures that require linear search time (O(N)), i.e. in the worst case a search has to visit all N nodes of the network to reach a given destination node; such networks are therefore not efficiently navigable. In contrast, Kleinberg [Kleinberg 2001] showed that hierarchical network models generate networks in which decentralized search takes time polylogarithmic in N, i.e. polynomial in log N. We therefore applied a hierarchical network model to generate the tag cloud network in Austria-Forum in order to support efficient navigation.
[Figure 2: Hierarchical structure and URL addressing schema within Austria-Forum, e.g. <category-page>, <category-page/category-page>, <category-page/category-page/sub-page>, <category-page/category-page/category-page>, ..., <category-page/category-page/category-page/category-page/sub-page>.]
4 Algorithm
4.1 Tag Clouds Hierarchy
We distinguish between two types of nodes within Austria-Forum: category-page nodes and sub-page nodes, the latter being the leaves of the hierarchy (see Figure 2). Information items within Austria-Forum are hierarchically structured and addressable via a hierarchical URL schema.
The first component of the tag cloud generation algorithm in Austria-Forum simply follows the hierarchical data organization and constructs hierarchically organized tag clouds. The idea of this component is to provide more links between articles in the same category and to shorten the paths between category pages and sub-pages. Thus, to generate a tag cloud for a particular category page, the tags of all sub-categories and all sub-pages are aggregated recursively [Trattner and Helic 2009]. To generate a tag cloud for a particular sub-page, on the other hand, the resource-specific tag cloud calculation pattern is applied. The hierarchical tag cloud generation algorithm is shown in Algorithm 1, with tf representing the local tag frequency.
Algorithm 1 Tag Cloud Calculation Algorithm
getTagCloud: url, n
if (url is category-page) then
    TC_r^n ← select top n tags sorted by tf where r.url.startsWith(url)
else
    TC_r^n ← select top n tags of resource r sorted by tf
end if
return TC_r^n
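A minimal Python sketch of Algorithm 1, under assumptions of our own: tag frequencies are kept per page URL in a `Counter`, the set of category-page URLs is known, and `is_under` compares whole path segments, because a literal `startsWith(url)` would also match sibling URLs that merely share a prefix (e.g. "a/bc" for "a/b"). Names such as `get_tag_cloud` and `tags_by_url` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def is_under(url: str, prefix: str) -> bool:
    """True if url equals prefix or lies below it in the URL hierarchy."""
    return url == prefix or url.startswith(prefix.rstrip("/") + "/")


def get_tag_cloud(url: str, n: int, tags_by_url: Dict[str, Counter],
                  category_pages: Set[str]) -> List[str]:
    """Top-n tags by local frequency tf, in the spirit of Algorithm 1.

    Category pages aggregate the tag counts of every page below them;
    sub-pages use only their own tag counts (resource-specific pattern).
    """
    if url in category_pages:
        tf: Counter = Counter()
        for page, counts in tags_by_url.items():
            if is_under(page, url):
                tf.update(counts)
    else:
        tf = tags_by_url.get(url, Counter())
    return [tag for tag, _ in tf.most_common(n)]
```

Computing the aggregate on every request is the simplest reading of the pseudocode; a production system would more likely cache or incrementally update the per-category counts.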
4.2 Addressing the Pagination Problem
Hierarchical network models [Kleinberg 2001] are based on the idea that, in many settings, the nodes of a network can be organized in a hierarchy. The hierarchy can be represented as a b-ary tree, and the network nodes can be attached to the leaves of the tree. For each node v, a link to every other node w is created with a probability p that decreases with h(v, w), where h(v, w) is the height of the least common ancestor of v and w in the tree. Networks generated by this model are "efficiently" navigable [Kleinberg 2001].
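To make Kleinberg's model concrete, the following Python sketch attaches the network nodes to the leaves of a complete b-ary tree, numbered 0 to b^H - 1, and draws each link target with probability proportional to b^(-h(v, w)), where leaves have height 0. To our understanding, this particular decreasing function is the one for which [Kleinberg 2001] proves efficient navigability; treat it, the leaf numbering and the helper names as assumptions of the sketch.

```python
import random
from typing import List


def lca_height(v: int, w: int, b: int) -> int:
    """h(v, w): height of the least common ancestor of leaves v and w in a
    complete b-ary tree whose leaves are numbered left to right (height 0)."""
    h = 0
    while v != w:
        v, w, h = v // b, w // b, h + 1
    return h


def sample_links(v: int, b: int, height: int, out_degree: int,
                 rng: random.Random) -> List[int]:
    """Draw out_degree link targets for leaf v, each independently with
    probability proportional to b ** -h(v, w)."""
    others = [w for w in range(b ** height) if w != v]
    weights = [b ** -lca_height(v, w, b) for w in others]
    return rng.choices(others, weights=weights, k=out_degree)
```

With b = 2 and H = 3, leaf v = 0 links to its sibling with probability 1/3, and each of the three levels of the tree receives the same total (unnormalized) weight of 1/2, so links are spread evenly across hierarchy levels rather than concentrated locally or globally.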
The main idea of applying such a hierarchical network model is to reuse the hierarchical organization of articles in Austria-Forum as the basis for generating the link probability distribution p described above. The hierarchical network model as introduced by Kleinberg uses a complete, balanced tree of nodes to obtain the link distribution. However, such an ideal model is typically not obtainable, since real-world hierarchies (cf. Open Directory Project³, Google Directory⁴ or Yahoo! Directory⁵) are rarely either complete or balanced (cf. [Adamic and Adar]). For instance, in Austria-Forum the average branching factor is around 21, ranging from 1 to over 14,000 nodes per category, and the out-degree (branching factor) distribution of the hierarchy follows a power law (see Figure 3). This violates Kleinberg's assumptions, such as a constant branching factor b. Thus, an algorithm implementing Kleinberg's model in our setting has to rely on intuitions and approximations.
³ http://www.dmoz.org/
⁴ http://directory.google.com/
⁵ http://dir.yahoo.com/
[Figure 3: Out-degree distribution and node distribution of the Austria-Forum resource hierarchy. Left panel: Out-Degree (Branching Factor) vs. Number of Nodes, log-log axes. Right panel: Height (1 to 9) vs. Number of Nodes.]
The intuition behind our algorithm is that an article is more likely to be linked with other articles from the same category than with articles from other categories (cf. [Watts et al. 2002, Adamic and Adar, Kleinberg 2001]). In short, this can be modeled by a link selection function that interlinks two nodes (articles) v, w with the link probability $p = e^{-\mathrm{dist}(v,w)}$ (cf. [Watts et al. 2002]), using the distance function $\mathrm{dist}(v,w) = h_v + h_w - 2h(v,w) - 1$, where $h_v$ and $h_w$ are the heights of the two nodes v, w in the hierarchy and $h(v,w)$ is the height of the least common ancestor of v and w in the hierarchy (cf. [Adamic and Adar]). Algorithm 2 presents the actual algorithm.
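The distance and link probability can be computed directly from Austria-Forum's hierarchical URL schema. The Python sketch below is our own illustration: it assumes that h_v is the depth of a page below the root, measured as the number of URL path segments, and that h(v, w) is the depth of the longest common path prefix. Under this reading two sibling pages get dist = 1 and p = e^(-1); the function names are hypothetical.

```python
import math
from typing import List


def segments(url: str) -> List[str]:
    """Path segments of a hierarchical URL such as 'a/b/c'."""
    return [s for s in url.strip("/").split("/") if s]


def depth(url: str) -> int:
    """h_v, read as the number of path segments below the root (assumption)."""
    return len(segments(url))


def lca_depth(v: str, w: str) -> int:
    """h(v, w), read as the depth of the longest common path prefix."""
    d = 0
    for a, b in zip(segments(v), segments(w)):
        if a != b:
            break
        d += 1
    return d


def dist(v: str, w: str) -> int:
    """dist(v, w) = h_v + h_w - 2 h(v, w) - 1."""
    return depth(v) + depth(w) - 2 * lca_depth(v, w) - 1


def link_probability(v: str, w: str) -> float:
    """p = e^(-dist(v, w))."""
    return math.exp(-dist(v, w))
```

Under the same reading, a page and its parent category are at distance 0, and pages in different top-level categories are the farthest apart.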
Algorithm 2 Resource List Calculation Algorithm
For any given node $r(t) \in R$ in the resource hierarchy R, where t is the tag applied to this node, we find all other nodes $r_j(t) \in R$ and calculate the distance $\mathrm{dist}(r(t), r_j(t)) = h(r(t)) + h(r_j(t)) - 2h(r(t), r_j(t)) - 1$. We put every node $r_j(t) \in R$ found in this way into a cluster $cl_x = [r_i, \ldots, r_j]$ according to its distance $\mathrm{dist}(r(t), r_j(t))$, and store these clusters in an array $rdist(i)_{r(t)} = [cl_{dist_1}, \ldots, cl_{dist_{n-1}}]$. Now, to select k links for the resource list, we generate k random numbers $i_k \in \{1, \ldots, \mathrm{sizeof}(rdist(i)_{r(t)})\}$ with probabilities proportional to $p = e^{-x}$, $x = 1, \ldots, \mathrm{sizeof}(rdist(i)_{r(t)})$, and select the k clusters $cl_{i_k} \leftarrow rdist(i_k)_{r(t)}$, returning for each cluster just one element, selected uniformly at random.
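Algorithm 2 can be sketched in Python as follows. This is a hypothetical implementation under stated assumptions: `tagged` is the full list of resource URLs carrying tag t, heights are read as URL depths, and clusters are ordered from nearest (x = 1) to farthest. The algorithm leaves open what happens when two draws select the same resource; this sketch simply drops repeats, so it may return fewer than k resources.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List


def dist(v: str, w: str) -> int:
    """dist = h(v) + h(w) - 2 h(v, w) - 1, with h read as URL depth (assumption)."""
    sv, sw = v.strip("/").split("/"), w.strip("/").split("/")
    common = 0
    for a, b in zip(sv, sw):
        if a != b:
            break
        common += 1
    return len(sv) + len(sw) - 2 * common - 1


def hierarchical_resource_list(r: str, tagged: List[str], k: int,
                               rng: random.Random) -> List[str]:
    """Pick up to k resources tagged with t for the page of resource r."""
    # Group the other resources carrying tag t into clusters by distance to r.
    clusters: Dict[int, List[str]] = defaultdict(list)
    for rj in tagged:
        if rj != r:
            clusters[dist(r, rj)].append(rj)
    if not clusters:
        return []
    # rdist: clusters ordered from nearest to farthest; index x = 1, 2, ...
    rdist = [clusters[d] for d in sorted(clusters)]
    weights = [math.exp(-x) for x in range(1, len(rdist) + 1)]
    # Draw k cluster indices with weight e^(-x), then one member per draw.
    picks = rng.choices(range(len(rdist)), weights=weights, k=k)
    chosen: List[str] = []
    for i in picks:
        candidate = rng.choice(rdist[i])
        if candidate not in chosen:  # repeat handling is our assumption
            chosen.append(candidate)
    return chosen
```

Because the weights depend only on the rank of a cluster, not on its distance value, a resource's chance of being listed also depends on how many other resources share its cluster.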
5 Evaluation
To evaluate the presented algorithm, we developed a theoretical framework that integrates the following two modules:
• a network-theoretic module based on the Stanford SNAP⁶ library, used to calculate and evaluate network properties of the tag cloud network such as the size of the Largest Strongly Connected Component (LSCC) or the Effective Diameter (ED) [Helic et al. 2010], and
• a searcher module that implements a hierarchical decentralized searcher to simulate "efficient" tag cloud driven navigation.
5.1 Datasets
In this section we describe the tag cloud networks that were generated and used for the evaluation. Five different types of tag cloud networks were generated (see Table 1); they differ in how the tag cloud and the resource list are calculated. Since one of our recent studies [Helic et al. 2010] showed that limiting the tag cloud to practically feasible sizes (e.g. 5, 10, or more tags) does not influence navigability, we set the tag cloud size in our experiments to a fixed value of n = 30, which is also the tag cloud size of the Austria-Forum live system. In contrast, we varied k, the maximum number of links in the resource list, over k = 15, 50, 100, values that are expected to impair navigability [Helic et al. 2010].
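The experimental grid can be written down as a small configuration list. This Python sketch is only an illustration, and the class and field names are ours: five network types, each combined with n = 30 and k = 15, 50, 100, yield the fifteen networks listed in Table 1, in the same order.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class NetworkConfig:
    name: str            # e.g. "HH 50"
    tag_cloud_algo: str  # "TopN" or "TopN-H" (Algorithm 1)
    resource_algo: str   # "Chron.", "Rand." or "Hier." (Algorithm 2)
    n: int               # maximum number of tags in a tag cloud
    k: int               # maximum number of resources in a resource list


# Network type -> (tag cloud algorithm, resource list algorithm).
NETWORK_TYPES: Dict[str, Tuple[str, str]] = {
    "N": ("TopN", "Chron."),
    "HN": ("TopN-H", "Chron."),
    "R": ("TopN", "Rand."),
    "HR": ("TopN-H", "Rand."),
    "HH": ("TopN-H", "Hier."),
}


def configurations(n: int = 30, ks: Sequence[int] = (15, 50, 100)) -> List[NetworkConfig]:
    """All type/k combinations, in the row order of Table 1."""
    return [NetworkConfig(f"{name} {k}", tc, ra, n, k)
            for name, (tc, ra) in NETWORK_TYPES.items() for k in ks]
```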
Dataset N (=Naive): This tag cloud network simulates the most common, naive tag cloud and resource list calculation approach currently used in tagging systems [Helic et al. 2010]. The tag cloud calculation algorithm follows a simple TopN approach, displaying the n most frequent tags in the tag cloud, while the resource list calculation algorithm sorts the resources in descending chronological order and selects the top k resources.
⁶ http://snap.stanford.edu/
Name    TC-Algo.  R-Algo.  n   k    Nodes   Links
N 15    TopN      Chron.   30  15   11,716    246,031
N 50    TopN      Chron.   30  50   11,716    637,448
N 100   TopN      Chron.   30  100  11,716  1,039,741
HN 15   TopN-H    Chron.   30  15   12,044    292,692
HN 50   TopN-H    Chron.   30  50   12,044    753,482
HN 100  TopN-H    Chron.   30  100  12,044  1,242,580
R 15    TopN      Rand.    30  15   11,716    254,004
R 50    TopN      Rand.    30  50   11,716    648,937
R 100   TopN      Rand.    30  100  11,716  1,050,708
HR 15   TopN-H    Rand.    30  15   12,044    308,183
HR 50   TopN-H    Rand.    30  50   12,044    777,929
HR 100  TopN-H    Rand.    30  100  12,044  1,265,023
HH 15   TopN-H    Hier.    30  15   12,044    286,513
HH 50   TopN-H    Hier.    30  50   12,044    727,252
HH 100  TopN-H    Hier.    30  100  12,044  1,199,263
TC-Algo. = Tag Cloud Calculation Algorithm, R-Algo. = Resource List Calculation Algorithm, TopN-H = TopN Hierarchically, Chron. = Chronologically Sorted, Rand. = Randomly Sorted, Hier. = Hierarchically Sorted.
Table 1: Tag cloud network statistics: Number of nodes and links.
Dataset HN (=Hierarchical Naive): This tag cloud network is generated using the hierarchical tag cloud calculation algorithm introduced in Algorithm 1. The resource list is calculated by sorting the resources (links) in descending chronological order and selecting the top k resources.
Dataset R (=Random): This tag cloud network uses a naive TopN algorithm (cf. Dataset N) for tag cloud calculation, displaying the n most frequent tags in the tag clouds. The resource list is generated by selecting k resources uniformly at random.
Dataset HR (=Hierarchical Random): This tag cloud network is generated using the hierarchical tag cloud algorithm introduced in Algorithm 1. The resource list is calculated by selecting k resources uniformly at random.
Dataset HH (=Hierarchical Hierarchical): This tag cloud network is generated using the hierarchical tag cloud algorithm introduced in Algorithm 1 and the hierarchical resource list algorithm introduced in Algorithm 2.
Name    TC-Algo.  R-Algo.  n   k    LSCC      ED       NAV
N 15    TopN      Chron.   30  15   0.567002  5.99404  unnav.
N 50    TopN      Chron.   30  50   0.761011  5.39847  unnav.
N 100   TopN      Chron.   30  100  0.863008  5.93894  unnav.
HN 15   TopN-H    Chron.   30  15   0.566008  3.47673  unnav.
HN 50   TopN-H    Chron.   30  50   0.755314  2.93258  unnav.
HN 100  TopN-H    Chron.   30  100  0.856941  2.90164  unnav.
R 15    TopN      Rand.    30  15   0.949983  5.93975  nav.
R 50    TopN      Rand.    30  50   0.949983  5.03066  nav.
R 100   TopN      Rand.    30  100  0.949983  5.43866  nav.
HR 15   TopN-H    Rand.    30  15   0.968034  3.73302  nav.
HR 50   TopN-H    Rand.    30  50   0.968034  3.17498  nav.
HR 100  TopN-H    Rand.    30  100  0.968034  2.90565  nav.
HH 15   TopN-H    Hier.    30  15   0.968034  3.46743  nav.
HH 50   TopN-H    Hier.    30  50   0.968034  2.92611  nav.
HH 100  TopN-H    Hier.    30  100  0.968034  2.92633  nav.
TC-Algo. = Tag Cloud Calculation Algorithm, R-Algo. = Resource List Calculation Algorithm, TopN-H = TopN Hierarchically Calculated, Chron. = Chronologically Sorted, Rand. = Randomly Sorted, Hier. = Hierarchically Sorted, LSCC = Largest Strongly Connected Component, ED = Effective Diameter, NAV = Navigability, unnav. = unnavigable, nav. = navigable.
Table 2: Tag cloud network dataset statistics: Largest Strongly Connected Component, Effective Diameter and Navigability.
5.2 Evaluating Navigability
To evaluate whether the generated tag cloud networks are navigable, we calculated the size of the largest strongly connected component (LSCC) and the effective diameter (ED). As defined in Section 3.1, we consider a network navigable if it has a giant component and a low diameter bounded logarithmically. As shown in Table 2, the naively constructed tag cloud networks (N 15 – N 100 and HN 15 – HN 100) are formally not navigable, since they lack a giant component containing almost all nodes of the network (their LSCC covers only 57% to 86% of the nodes). In contrast, all other networks form navigable network structures, i.e. they contain a giant component and have an effective diameter that is bounded logarithmically. Note that networks built with the hierarchical tag cloud algorithm (HN, HR and HH) have a lower effective diameter (about 2.9 to 3.7) than networks implementing a general TopN tag cloud calculation approach (about 5.0 to 6.0). This is not surprising, since such networks generate more long-range links from category pages to sub-pages, i.e. they shorten the paths to the sub-pages in the system.
[Figure 4: Example of a resource-specific tag cloud network in Austria-Forum and a search through it (tags shown include "schloß", "kirche", "domkirche", "helvetisch" and "zackenstil").]
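The two navigability criteria can be checked with a short Python sketch. It is our own illustration, not the SNAP-based module used for Table 2: LSCC sizes are computed with an iterative version of Kosaraju's algorithm, and the effective diameter is taken as a given input. The paper does not state a numeric threshold for a giant component, so `giant_threshold = 0.9` is a hypothetical choice, and log(N) is read as the natural logarithm. With these assumptions, the LSCC and ED values of Table 2 reproduce its nav./unnav. labels.

```python
import math
from typing import Dict, List, Set


def strongly_connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Kosaraju's algorithm with explicit stacks (no recursion limit issues)."""
    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    # Pass 1: iterative DFS, recording nodes in order of completion.
    order: List[str] = []
    visited: Set[str] = set()
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            nxt = next((w for w in successors if w not in visited), None)
            if nxt is None:
                stack.pop()
                order.append(node)
            else:
                visited.add(nxt)
                stack.append((nxt, iter(graph.get(nxt, ()))))
    # Pass 2: DFS on the reversed graph in reverse completion order.
    reverse: Dict[str, Set[str]] = {v: set() for v in nodes}
    for v, targets in graph.items():
        for w in targets:
            reverse[w].add(v)
    components: List[Set[str]] = []
    assigned: Set[str] = set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = set(), [start]
        while todo:
            v = todo.pop()
            component.add(v)
            for u in reverse[v]:
                if u not in assigned:
                    assigned.add(u)
                    todo.append(u)
        components.append(component)
    return components


def lscc_fraction(graph: Dict[str, Set[str]]) -> float:
    """Share of all nodes that lie in the largest strongly connected component."""
    components = strongly_connected_components(graph)
    total = sum(len(c) for c in components)
    return max(len(c) for c in components) / total if total else 0.0


def is_navigable(lscc: float, effective_diameter: float, num_nodes: int,
                 giant_threshold: float = 0.9) -> bool:
    """Giant LSCC (hypothetical threshold) and ED bounded by ln(N)."""
    return lscc >= giant_threshold and effective_diameter <= math.log(num_nodes)
```

Reading log(N) as log10 instead would reject the R networks (ED above 5 with N = 11,716), which is why the natural logarithm is assumed here.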
5.3 Evaluating Efficiency
To evaluate the efficiency of our new approach, we developed a hierarchical decentralized searcher that simulates "efficient" tag cloud driven navigation. The searcher is essentially an adaptation of the searcher by [Adamic and Adar], which uses background knowledge from the underlying resource hierarchy to navigate the tag cloud network.
To model tag cloud based navigation, we define the tag cloud network as a bipartite hypergraph of the form $V = R \cup T$ [Helic et al. 2010], where R is the set of resources and T the set of tags. Since the resource lists are limited to k entries, which forces the tag cloud network to collapse into a directed unipartite tag-resource network (with resource-specific tags), we developed a searcher that walks along the underlying projected directed resource-resource network.
Algorithm 3 presents the actual searcher algorithm.
Algorithm 3 Hierarchical Decentralized Searcher (cf. [Adamic and Adar])
Searcher: resource-resource graph G, resource hierarchy T, start node v, target node w
while v != w do
    v_i ← get all nodes adjacent to v in G
    // find the closest node according to dist = dist_min,
    // where dist(v_i, w) = h(v_i) + h(w) - 2h(v_i, w) - 1
    v ← findClosestNode(v_i, T)
end while
[Figure 5: Example of the Austria-Forum resource taxonomy and a sample of the tags applied to its resources ("schloß", "kirche", "domkirche"), with example distances dist = 2 and dist = 3.]
In words, the algorithm works as follows. To find a target resource w (e.g. tagged as "domkirche") from a start node v (e.g. tagged as "schloß") within the network (see Figure 4), the searcher first selects all nodes v_i adjacent to the start node and then moves to the node v ("kirche") with the shortest distance $\mathrm{dist}(v_i, w) = h(v_i) + h(w) - 2h(v_i, w) - 1$ to the target node w in the resource taxonomy T, where $h(v_i)$ and $h(w)$ are the heights of the two nodes v_i, w in the hierarchy and $h(v_i, w)$ is the height of their least common ancestor [Adamic and Adar]. In the next step, the nodes adjacent to v are selected again, their distances $\mathrm{dist}(v_i, w)$ are calculated, and the node with the shortest distance becomes the new v. The process continues until the target node w is reached.
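A Python sketch of the greedy searcher of Algorithm 3, under assumptions of our own: the projected resource-resource network is a dict of adjacency sets keyed by URL, heights are read as URL depths, ties are broken alphabetically, and a search is abandoned when it reaches a resource without out-links, returns to an already visited resource, or exceeds a hop limit (100 hops in the experiments described below). The function names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def dist(v: str, w: str) -> int:
    """dist(v, w) = h(v) + h(w) - 2 h(v, w) - 1, with h read as URL depth."""
    sv, sw = v.strip("/").split("/"), w.strip("/").split("/")
    common = 0
    for a, b in zip(sv, sw):
        if a != b:
            break
        common += 1
    return len(sv) + len(sw) - 2 * common - 1


def hierarchical_search(graph: Dict[str, Set[str]], start: str, target: str,
                        max_hops: int = 100) -> Optional[List[str]]:
    """Greedy decentralized search; returns the path or None if canceled."""
    path, visited = [start], {start}
    v = start
    while v != target:
        neighbours = graph.get(v, set())
        if not neighbours or len(path) - 1 >= max_hops:
            return None  # sink without out-links, or hop limit reached
        # findClosestNode: neighbour with minimal dist to the target
        # (sorted() breaks ties alphabetically, an assumption of this sketch)
        v = min(sorted(neighbours), key=lambda u: dist(u, target))
        if v in visited:
            return None  # greedy step leads back to a visited resource
        visited.add(v)
        path.append(v)
    return path
```

Because dist(w, w) = -1 is smaller than any other distance, the searcher always steps onto the target as soon as it appears among the neighbours.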
[Figure 6: Error rate (Count, %) of the hierarchical decentralized searcher for resource list sizes k = 15, 50, 100 on networks N, HN, R, HR and HH. As expected, networks with hierarchically calculated tag clouds (HN, HR and HH) perform significantly better than naively generated tag cloud networks (N and R).]
To obtain statistically significant results, we simulated 100,000 search requests, each starting at a randomly selected resource v_i and targeting a randomly selected resource w_i in the tag cloud network. Only pairs v_i, w_i for which a path from v_i to w_i exists in the network were considered. The upper limit for a search was set to 100 hops, i.e. we canceled searches that took more than 100 hops to find the target node w_i. If the searcher could not continue along a path in the tag cloud network, the search task was canceled as well. A canceled search was not restarted to find a new path for the same pair v_i, w_i.
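The simulation protocol can be sketched as follows. This hypothetical Python harness draws random (start, target) pairs, keeps only those connected by a path (checked by breadth-first search), runs any search function with the signature `search(graph, start, target)` that returns a path or None, and reports the error rate together with a hop histogram. It does not retry failed pairs, mirroring the protocol above; an attempt cap (our addition) prevents an endless loop on graphs with few connected pairs.

```python
import random
from collections import deque
from typing import Callable, Dict, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]
SearchFn = Callable[[Graph, str, str], Optional[List[str]]]


def has_path(graph: Graph, source: str, target: str) -> bool:
    """Breadth-first reachability test from source to target."""
    seen, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


def simulate(graph: Graph, search: SearchFn, requests: int,
             rng: random.Random) -> Tuple[float, Dict[int, int]]:
    """Return (error rate, {hops: count}) over `requests` connected random pairs."""
    nodes = sorted(set(graph) | {w for ws in graph.values() for w in ws})
    failures, done = 0, 0
    hops: Dict[int, int] = {}
    for _ in range(100 * requests):  # attempt cap: our addition
        if done == requests:
            break
        v, w = rng.sample(nodes, 2)
        if not has_path(graph, v, w):
            continue  # only pairs connected by a path are simulated
        done += 1
        path = search(graph, v, w)
        if path is None:
            failures += 1  # canceled searches are not retried
        else:
            hops[len(path) - 1] = hops.get(len(path) - 1, 0) + 1
    return (failures / done if done else 0.0), hops
```

Running one BFS per sampled pair is fine for a sketch; for networks of the size in Table 1, precomputing reachability from the component structure would be considerably cheaper.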
As shown in Figure 6, flat and paginated tag cloud networks (networks N and R in Figure 6) yield poor results for the hierarchical decentralized searcher. The reason is that the searcher frequently lands on a sink in the tag cloud network: either the resource has already been visited, or the resource offers no further link the searcher can follow, owing to the low number of links (see Table 1) caused by the pagination effect. "Expanded" networks implementing the hierarchical tag cloud algorithm (see Algorithm 1) are considerably better at finding paths from resources v_i to resources w_i. For instance, with paginated resource lists and hierarchically calculated tag clouds (network HN in Figure 6), the searcher fails in only 27% of all cases, while without hierarchically calculated tag clouds (network N) its error rate exceeds 89%. Furthermore, networks combining hierarchically calculated tag clouds with randomly or hierarchically selected resource lists (networks HR and HH in Figure 6) are more navigable than the other investigated approaches, since they provide more links between the resources of the tagging system (see Table 1) than "flat" networks. Finally, the network adopting the hierarchical resource list calculation algorithm (network HH in Figure 6) performs best in terms of navigation: in the case of Austria-Forum, this type of tag cloud network yields the lowest error rate and the shortest searches (see Figure 7) of all networks studied.
[Figure 7: Hop distributions (Count, %, for 1 to 9 hops) of the hierarchical decentralized searcher over 100,000 simulations for networks N, HN, R, HR and HH with resource list size (a) k = 15, (b) k = 50 and (c) k = 100.]
6 Related Work
In related research on tagging systems, tag clouds have been characterized as a way to translate the emergent vocabulary of a folksonomy into social navigation tools [Sinclair and Cardew-Hall 2008, Dieberger 1997]. Social navigation itself is a multi-dimensional concept covering a range of different issues and ideas. A distinction between direct and indirect social navigation, for example, highlights whether navigational cues are provided by direct communication among users (e.g. via chat) or inferred indirectly from historical traces left by others [Millen and Feinberg 2006]. Based on this distinction, our work focuses only on indirect social navigation, in the sense that it studies the effectiveness of traces ("tags") left by users in tagging systems. Other types of social navigation emphasise the need to show the presence of other users, to build trust among groups of users, or to encourage certain behaviour [Millen and Feinberg 2006].
Researchers have discussed the advantages and drawbacks of tag clouds, suggesting that tag clouds are a useful mechanism when users' search tasks are general and explorative (for example, learn about Web 2.0), while they provide little value for specific information-seeking tasks (for example, navigate to www.cnn.com) [Sinclair and Cardew-Hall 2008]. While the paper at hand focuses on network-theoretic aspects, cognitive aspects of navigation have been studied previously using, for example, SNIF-ACT [Fu and Pirolli] and social information foraging theory [Pirolli 2009]. Other work has studied users' motivations for tagging [Körner et al. 2010] and how these motivations influence emergent semantic (as opposed to navigational) structures. The navigational utility of single tags has been investigated [Chi and Mytkowicz], with somewhat disappointing results: over time, tags become harder to use as they lose specificity and reference too many resources. Such tags are exactly the paginated tags for which new pagination algorithms are needed.
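To make the pagination effect concrete, the following Python sketch shows how a general tag becomes paginated. It assumes, purely for illustration, a k-limited resource list that shows the k most recently tagged resources (reverse-chronological ordering); the names `Resource`, `reverse_chronological_list` and `hidden_resources`, the dates and the value k = 10 are hypothetical and not taken from any system discussed here. The more resources a tag references, the larger the share of them that never appears on the tag's first page.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Resource:
    name: str
    tagged_on: date  # hypothetical date on which the tag was assigned


def reverse_chronological_list(resources, k):
    """Return a k-limited resource list: the k most recently tagged resources."""
    return sorted(resources, key=lambda r: r.tagged_on, reverse=True)[:k]


def hidden_resources(resources, k):
    """Names of resources carrying the tag that do not fit on the tag's first page."""
    shown = {r.name for r in reverse_chronological_list(resources, k)}
    return {r.name for r in resources if r.name not in shown}


# A hypothetical general tag applied to 100 resources, one per day.
tagged = [Resource(f"r{i}", date(2009, 1, 1) + timedelta(days=i)) for i in range(100)]
page = reverse_chronological_list(tagged, k=10)
print([r.name for r in page[:3]])           # ['r99', 'r98', 'r97']
print(len(hidden_resources(tagged, k=10)))  # 90
```

With k = 10, 90 of the 100 resources cannot be reached from the tag's first page, which is the situation that motivates alternative pagination algorithms.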
Navigation models for tagging systems have also been discussed recently. In [Ramezani et al. 2009], the authors describe a navigation framework for tagging systems and apply it to analyze possible attacks on tagging systems. In principle, the framework defines a navigation channel as any combination of the basic elements of a tagging system (users, tags, and resources). Thus, the specific combination investigated in this paper can be summarized as the resource-tag or tag-resource navigation channel.
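As an illustration of the resource-tag-resource channel, the following minimal Python sketch indexes a folksonomy given as (user, resource, tag) triples and computes which resources can be reached from a resource by following one tag and then one resource. The function names and the toy data are hypothetical, the user element is deliberately ignored because this channel does not use it, and resource lists are left unlimited, so the sketch shows only the structure of the channel, not any particular tag cloud or resource list algorithm.

```python
from collections import defaultdict


def build_channel(triples):
    """Index (user, resource, tag) triples for resource -> tag -> resource navigation."""
    tags_of = defaultdict(set)       # resource -> tags (e.g. shown in its tag cloud)
    resources_of = defaultdict(set)  # tag -> resources (its resource list)
    for _user, resource, tag in triples:  # users are not part of this channel
        tags_of[resource].add(tag)
        resources_of[tag].add(resource)
    return tags_of, resources_of


def one_hop(resource, tags_of, resources_of):
    """Resources reachable by clicking one tag and then one resource, excluding the start."""
    return {r for t in tags_of[resource] for r in resources_of[t]} - {resource}


# Hypothetical toy folksonomy.
triples = [("u1", "a", "x"), ("u2", "b", "x"), ("u1", "b", "y"), ("u3", "c", "y")]
tags_of, resources_of = build_channel(triples)
print(sorted(one_hop("a", tags_of, resources_of)))  # ['b']
print(sorted(one_hop("b", tags_of, resources_of)))  # ['a', 'c']
```

In this toy example, resource "c" is not directly reachable from "a" but can be reached in two steps via "b"; the lengths of such paths are what a navigability analysis of the channel measures.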
Recent literature also discusses further algorithms for the construction of tag clouds. The ELSABer algorithm [Li et al. 2007] is one example of such an effort, aimed at identifying hierarchical relationships between annotations to facilitate browsing. The work by [Aouiche et al. 2008] is another example, introducing entropy-based algorithms for the construction of interesting tag clouds. However, these algorithms have not found widespread adoption in current social tagging systems, and their usefulness for supporting navigation is largely unknown. In future work, it would be interesting to compare additional tag cloud construction algorithms with our approach. In addition, empirical studies of tagging systems have, for example, compared the navigational characteristics of tag distributions with those of similar distributions produced by library terms [Heymann et al. 2010].
7 Conclusions and Future Work
The main contribution of this paper is the introduction of a novel, tag-based algorithm for interlinking resources in hierarchically structured Web content. Based on a review of tag cloud limitations and an existing hierarchical algorithm for the construction of efficiently navigable networks, we discussed, implemented, and evaluated by simulation a new approach to tag cloud construction that improves the overall navigability of social tagging systems. While the arguments laid out in this paper are theoretical in nature, we empirically tested the navigability of the link structures produced by such an algorithm and confirmed the theoretical expectations by simulation. In future work, evaluating the usability and usefulness of the proposed algorithm with end users in an experimental setting would bring new insights into the potential and limitations of the proposed approach.
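The idea of a hierarchical network model can be illustrated with a short Python sketch. It follows the spirit of Kleinberg's hierarchical model [Kleinberg 2001], in which the probability of linking two nodes decays exponentially with their distance in a hierarchy, here applied to choosing a k-limited resource list from a resource taxonomy. This is an illustration under stated assumptions, not the exact algorithm of this paper: the parent map, the base b, the function names, and the use of tree distance (number of edges through the least common ancestor) instead of the height of the least common ancestor are all hypothetical choices. For leaves at equal depth, the tree distance is twice that height, so the weight b ** -distance has the same exponential form with base b squared.

```python
import random


def ancestors(node, parent):
    """Path from node up to the root of the taxonomy, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(u, v, parent):
    """Number of edges between u and v, passing through their least common ancestor."""
    up, vp = ancestors(u, parent), ancestors(v, parent)
    steps_from_v = {n: i for i, n in enumerate(vp)}
    for i, n in enumerate(up):
        if n in steps_from_v:  # first shared node on u's path is the least common ancestor
            return i + steps_from_v[n]
    raise ValueError(f"{u!r} and {v!r} are not in the same taxonomy")


def hierarchical_resource_list(current, candidates, parent, k, b=2.0, rng=random):
    """Sample up to k distinct resources, each weighted by b ** -distance(current, c)."""
    pool = [c for c in candidates if c != current]
    chosen = []
    while pool and len(chosen) < k:
        weights = [b ** -tree_distance(current, c, parent) for c in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)  # sample without replacement
    return chosen


# Hypothetical two-level taxonomy: root -> {A, B}, A -> {a1, a2}, B -> {b1, b2}.
parent = {"A": "root", "B": "root", "a1": "A", "a2": "A", "b1": "B", "b2": "B"}
print(tree_distance("a1", "a2", parent), tree_distance("a1", "b1", parent))  # 2 4
print(hierarchical_resource_list("a1", ["a1", "a2", "b1", "b2"], parent, k=2,
                                 rng=random.Random(42)))
```

With b = 2, the sibling a2 (distance 2) receives weight 0.25 and each of b1 and b2 (distance 4) receives 0.0625, so a2 is drawn first with probability 2/3, while resources in distant parts of the taxonomy still have a chance of being linked. Kleinberg's analysis indicates that such a mix of near and far links is what allows decentralized search to find short paths.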
Acknowledgments
This work is funded by BMVIT, the Federal Ministry for Transport, Innovation and Technology, program line "Forschung, Innovation und Technologie für Informationstechnologie", project NAVTAG – Improving the navigability of tagging systems.
References
[Adamic and Adar] Adamic, L. and Adar, E.: How to search a social network, Social
Networks, Volume 27, Issue 3, 187-203, 2005.
[Ames and Naaman 2007] Ames, M. and Naaman, M.: Why we tag: motivations for
annotation in mobile and online media. In CHI ’07: Proceedings of the SIGCHI
conference on Human factors in computing systems, ACM, New York, 2007.
[Aouiche et al. 2008] Aouiche, K., Lemire, D. and Godin, R.: Web 2.0 OLAP: From
Data Cubes to Tag Clouds, 4th International Conference, WEBIST 2008, Lecture
Notes in Business Information Processing, Springer Berlin Heidelberg, Volume 18,
2008.
[Bollobás and Chung 1988] Bollobás, B. and Chung, F. R. K.: The diameter of a cycle
plus a random matching. In SIAM J. Discret. Math. 1(3), pp 328–333, 1988.
[Chi and Mytkowicz] Chi, E. H. and Mytkowicz, T.: Understanding the efficiency of social tagging systems using information theory, HT ’08: Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, ACM, NY, 81-88, 2008.
[Dieberger 1997] Dieberger, A.: Supporting social navigation on the World Wide Web,
Academic Press, Inc., Volume 46 (6), 805-825, Duluth, MN, USA, 1997.
[Fu and Pirolli] Fu, W.T. and Pirolli, P.: SNIF-ACT: a cognitive model of user nav-
igation on the world wide web, Hum.-Comput. Interact., Volume 22 (4), 355-412,
Hillsdale, NJ, USA, 2007.
[Hammond et al. 2005] Hammond, T., Hannay, T., Lund, B. and Scott, J.: Social Bookmarking Tools (I): A General Review, D-Lib Magazine, 11(4), 2005.
[Helic et al. 2010] Helic, D., Trattner, Ch., Strohmaier, M. and Andrews, K.: On the
Navigability of Social Tagging Systems, The 2nd IEEE Conference on Social Com-
puting, SocialCom2010, Minneapolis, Minnesota, USA, 2010.
[Heymann et al. 2010] Heymann, P., Paepcke, A. and Garcia-Molina, H.: Tagging Hu-
man Knowledge, Proceedings of the Third ACM International Conference on Web
Search and Data Mining, ACM, NY, 51-61, 2010.
[Kleinberg 2000b] Kleinberg, J. M.: The small-world phenomenon: an algorithmic perspective, In Proceedings of the Thirty-Second Annual ACM Symposium on Theory
of Computing (Portland, Oregon, United States, May 21 - 23, 2000), STOC ’00,
ACM, New York, NY, 163-170, 2000.
[Kleinberg 2000a] Kleinberg, J. M.: Navigation in a small world, Nature, vol. 406, no.
6798, August 2000.
[Kleinberg 2001] Kleinberg, J. M.: Small-World Phenomena and the Dynamics of In-
formation. In Advances in Neural Information Processing Systems (NIPS) 14, 2001.
[Körner et al. 2010] Körner, C., Benz, D., Hotho, A., Strohmaier, M. and Stumme, G.:
Stop Thinking, Start Tagging: Tag Semantics Emerge From Collaborative Verbosity,
19th International World Wide Web Conference (WWW2010), ACM, Raleigh, NC,
USA, April 26-30, 2010.
[Körner et al. 2010a] Körner, C., Kern, R., Grahsl, H.P. and Strohmaier, M.: Of categorizers and describers: An evaluation of quantitative measures for tagging motivation, Proceedings of the 21st ACM conference on Hypertext and Hypermedia,
Toronto, Canada, 2010.
[Li et al. 2007] Li, R., Bao, S., Yu, Y., Fei, B. and Su, Z.: Towards effective browsing
of large scale social annotations, Proceedings of the 16th international conference
on World Wide Web, ACM, NY, 943-952, 2007.
[Marlow et al. 2006] Marlow, C., Naaman, M., Boyd, D. and Davis, M.: HT06, tagging
paper, taxonomy, Flickr, academic article, to read, In Proceedings of the Seven-
teenth Conference on Hypertext and Hypermedia (Odense, Denmark, August 22 -
25, 2006), HYPERTEXT ’06, ACM, New York, 2006.
[Mesnage and Carman 2009] Mesnage, C. S. and Carman, M. J.: Tag navigation. In
SoSEA ’09: Proceedings of the 2nd international workshop on Social software engi-
neering and applications, ACM, New York, 29 - 32, 2009.
[Millen and Feinberg 2006] Millen, D.R. and Feinberg, J.: Using social tagging to improve social navigation, Workshop on the Social Navigation and Community Based
Adaptation Technologies, Citeseer, Dublin, Ireland, 2006.
[Neubauer and Obermayer 2009] Neubauer, N. and Obermayer, K.: Hyperincident
connected components of tagging networks, In HT’09: Proceedings of the 20th ACM
conference on Hypertext and hypermedia, ACM, New York, 229 - 238, 2009.
[Newman 2003] Newman, M. E. J.: The structure and function of complex networks,
SIAM Review, 45(2):167-256, 2003.
[Nov and Ye 2010] Nov, O. and Ye, C.: Why do people tag?: motivations for photo
tagging, Commun. ACM 53, 7 (Jul. 2010), 128-131, 2010.
[Pirolli 2009] Pirolli, P.: An elementary social information foraging model, Proceedings
of the 27th international conference on Human factors in computing systems, 605-
614, ACM, NY, 2009.
[Ramezani et al. 2009] Ramezani, M., Sandvig, J.J., Schimoler, T., Gemmell, J.,
Mobasher, B. and Burke, R.: Evaluating the Impact of Attacks in Collaborative
Tagging Environments, International Conference on Computational Science and En-
gineering 2009, CSE ’09, 136-143, 2009.
[Sinclair and Cardew-Hall 2008] Sinclair, J. and Cardew-Hall, M.: The folksonomy tag
cloud: when is it useful? Journal of Information Science, 34:15, 2008.
[Strohmaier et al. 2010] Strohmaier, M., Körner, C., and Kern, R.: Why do Users Tag?
Detecting Users’ Motivation for Tagging in Social Tagging Systems, 4th Interna-
tional AAAI Conference on Weblogs and Social Media (ICWSM2010), Washington,
DC, USA, May 23-26, 2010.
[Strohmaier 2008] Strohmaier, M.: Purpose Tagging - Capturing User Intent to As-
sist Goal-Oriented Social Search, SSM’08 Workshop on Search in Social Media, in
conjunction with CIKM’08, Napa Valley, USA, 2008.
[Trattner and Helic 2009] Trattner, C. and Helic, D.: Extending The Basic Tagging
Model: Context Aware Tagging, In Proceedings of IADIS International Conference
WWW/Internet 2009 (2009), IADIS International Conference on WWW/Internet,
Rome, 76 - 83, 2009.
[Trattner et al. 2010] Trattner, C., Hasani, I., Helic, D. and Leitner, H.: The Austrian
way of Wiki(pedia)! - Development of a Structured Wiki-based Encyclopedia within
a Local Austrian Context, WikiSym 2010 - The 6th International Symposium on
Wikis and Open Collaboration, ACM, Gdansk, Poland, 1-10, 2010.
[Us Saaed 2008] Us Saaed, A., Afzal, M.T., Latif, A., Stocker, A. and Tochtermann, K.:
Does Tagging Indicate Knowledge Diffusion? An Exploratory Case Study, In Proc.
of the ICCIT 08 - International Conference on Convergence and hybrid Information
Technology, Busan, Korea, 2008.
[Watts et al. 2002] Watts, D.J., Dodds, P.S. and Newman, M.E.J.: Identity and search
in social networks. Science, Volume 296, 1302-1305, 2002.
[Wu et al. 2006] Wu, H., Zubair, M., and Maly, K.: Harvesting social knowledge from
folksonomies. In Proceedings of the Seventeenth Conference on Hypertext and Hy-
permedia (Odense, Denmark, August 22 - 25, 2006), HYPERTEXT ’06. ACM, New
York, 111 - 114, 2006.