
On the Navigability of Social Tagging Systems



It is a widely held belief among designers of social tagging systems that tag clouds represent a useful tool for navigation. This is evident in, for example, the increasing number of tagging systems offering tag clouds for navigational purposes, which hints towards an implicit assumption that tag clouds support efficient navigation. In this paper, we examine and test this assumption from a network-theoretic perspective, and show that in many cases it does not hold. We first model navigation in tagging systems as a bipartite graph of tags and resources and then simulate the navigation process in such a graph. We use network-theoretic properties to analyse the navigability of three tagging datasets with regard to different user interface restrictions imposed by tag clouds. Our results confirm that tag-resource networks have efficient navigation properties in theory, but they also show that popular user interface decisions (such as “pagination” combined with reverse-chronological listing of resources) significantly impair the potential of tag clouds as a useful tool for navigation. Based on our findings, we identify a number of avenues for further research and the design of novel tag cloud construction algorithms. Our work is relevant for researchers interested in navigability of emergent hypertext structures, and for engineers seeking to improve the navigability of social tagging systems.
Denis Helic, Christoph Trattner, Markus Strohmaier, Keith Andrews
Knowledge Management Institute
Graz University of Technology
Graz, Austria
Email: {dhelic,markus.strohmaier}
Institute for Information Systems and Computer Media
Graz University of Technology
Graz, Austria
Email: {ctrattner,kandrews}
Know-Center, Graz University of Technology, Graz, Austria
In social tagging systems such as Flickr and Delicious, tag
clouds have emerged as an interesting alternative to traditional
forms of navigation and hypertext browsing. The basic idea
is that tag clouds provide navigational clues by aggregating
tags and corresponding resources from multiple sources, and
by displaying them in a visually appealing fashion. Users are
presented with these tag clouds as a means for exploring and
navigating the resource space in social tagging systems.
While tag clouds can potentially serve different purposes,
there seems to be an implicit assumption among engineers of
social tagging systems that tag clouds are specifically useful to
support navigation. This is evident in the large-scale adoption
of tag clouds for interlinking resources in numerous systems
such as Flickr, Delicious, and BibSonomy. However, this
Navigability Assumption has rarely been critically examined
(with some notable exceptions, for example [1]) and has
largely remained untested. In this paper, we will
demonstrate that the prevalent approach to tag cloud-based
navigation in social tagging systems is highly problematic
with regard to network-theoretic measures of navigability. In
a series of experiments, we will show that the Navigability
Assumption only holds in very specific settings, and for the
most common scenarios, we can assert that it is wrong.
While recent research has studied navigation in social
tagging systems from user interface [2], [3], [4] and network-
theoretic [5] perspectives, the unique focus of this paper is
the intersection of these issues. With that focus, we want to
answer questions such as: How do user interface constraints of
tag clouds affect the navigability of tagging systems? And how
efficient is navigation via tag clouds from a network-theoretic perspective?
Particularly, we will first 1) investigate the intrinsic navi-
gability of tagging datasets without considering user interface
effects, and then 2) take pragmatic user interface constraints
into account. Next, 3) we will demonstrate that for many social
tagging systems, the Navigability Assumption does not hold
and we will finally 4) use our findings to illuminate a path
towards improving the navigability of tag clouds.
To the best of our knowledge, this paper is among the first to
study what we have called the Navigability Assumption of Tag
Clouds, i.e. the widely held belief that tag clouds are useful
for navigating social tagging systems. One of the main results
of this paper is a more critical stance towards the usefulness of
tag clouds as a navigational aid in tagging systems. We argue
that in order to make use of the full potential of tag clouds,
new ways of thinking about tag cloud algorithms are needed.
The paper is structured as follows: In Section II we present
our network-theoretic approach to assessing navigability of
tagging systems. Section III describes the analyzed datasets.
Section IV presents and discusses the analysis results. Based
on our findings, we call for and discuss new ideas for tag
cloud algorithms in Section V. Section VI provides an overview
of related work. Finally, Section VII concludes the paper and
presents directions for future work.
A tagging dataset is typically modeled as a tripartite hypergraph
with V = R ∪ U ∪ T, where R is the resource set, U is
the user set, and T is the tag set [6], [7], [8]. An annotation
of a particular resource with a particular tag produced by a
particular user is a hyperedge (r, t, u), connecting three nodes
from these three disjoint sets.
Such a tripartite hypergraph can be mapped onto three
bipartite graphs connecting users and resources, users and tags,
and tags and resources. For different purposes it is often more
practical to analyse one or more of these bipartite graphs.
For example, in the context of ontology learning, the bipartite
graph of users and tags has been shown to be an effective
projection [9].
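The projection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `project_tag_resource` and the toy annotations are hypothetical, and each annotation is assumed to be a (resource, tag, user) triple as in the hyperedge definition.

```python
from collections import defaultdict

def project_tag_resource(annotations):
    """Project (resource, tag, user) hyperedges onto a tag-resource bipartite graph.

    Returns two adjacency maps: tag -> set of resources and
    resource -> set of tags. The user component is dropped.
    """
    tag_to_resources = defaultdict(set)
    resource_to_tags = defaultdict(set)
    for resource, tag, _user in annotations:
        tag_to_resources[tag].add(resource)
        resource_to_tags[resource].add(tag)
    return dict(tag_to_resources), dict(resource_to_tags)

# Hypothetical toy annotations, for illustration only.
annotations = [
    ("r1", "graz", "u1"),
    ("r1", "austria", "u2"),
    ("r2", "austria", "u1"),
    ("r2", "austria", "u3"),
]
tag_to_resources, resource_to_tags = project_tag_resource(annotations)
```

Several users assigning the same tag to the same resource collapse into a single tag-resource edge here; a weighted variant would keep the counts instead.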
In this paper, we focus on tag-resource bipartite graphs.
These graphs naturally reflect the way users are supposed to
adopt tag clouds for navigating social tagging systems. For
example, in many tagging systems, tag clouds are intended to
be used in the following way:
1) The system presents a tag cloud to the user.
2) The user selects a tag from the tag cloud.
3) The system presents a list of resources tagged with the
selected tag.
4) The user selects a resource from the list of resources.
5) The system transfers the user to the selected resource,
and the process potentially starts anew.
We will study this general interaction schema and model
it with a simulated user moving along the edges of the
tag-resource bipartite graph and alternately visiting tag and
resource nodes.
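The interaction schema above can be illustrated with a small simulation. The sketch below is an assumption-laden toy: the simulated user picks tags and resources uniformly at random, which is only one possible choice model, and the names `simulate_session`, `tag_to_resources`, and `resource_to_tags` are hypothetical.

```python
import random

def simulate_session(tag_to_resources, resource_to_tags, start_resource, steps, rng):
    """Walk the tag-resource bipartite graph, alternating resource -> tag -> resource."""
    path = [start_resource]
    resource = start_resource
    for _ in range(steps):
        # Steps 1-2: the tag cloud of the current resource is shown, the user picks a tag.
        tags = sorted(resource_to_tags.get(resource, ()), key=str)
        if not tags:
            break
        tag = rng.choice(tags)
        # Steps 3-4: the resource list of that tag is shown, the user picks a resource.
        resources = sorted(tag_to_resources.get(tag, ()), key=str)
        if not resources:
            break
        resource = rng.choice(resources)
        # Step 5: the user is transferred to the resource and the process repeats.
        path.extend([tag, resource])
    return path

# Hypothetical toy graph.
tag_to_resources = {"graz": {"r1"}, "austria": {"r1", "r2"}}
resource_to_tags = {"r1": {"graz", "austria"}, "r2": {"austria"}}
path = simulate_session(tag_to_resources, resource_to_tags, "r1", 3, random.Random(0))
```

The returned path alternates resource and tag nodes, so even positions hold resources and odd positions hold tags.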
To that end, we introduce a network-theoretic approach for
assessing the navigability and the efficiency of navigability
in such a bipartite graph. Ever since Milgram’s small world
experiment [10], researchers have aimed to understand “navigability”
and in particular “efficient” navigation of networks (for
details see Section VI). Among others, two important results
stem from this line of research: (1) there exist short paths
between people (nodes) in a social network and (2) people
are able to navigate “efficiently” through the network having
only local knowledge of the network, i.e. knowing only their
personal contacts.
Kleinberg [11], [12], [13] and also independently Watts
[14] formalised these properties concluding that a navigable
network has a short path between all or almost all nodes in
the network [13]. Formally, such a network has a low diameter
bounded polylogarithmically, i.e. by a polynomial in log N,
where N is the number of nodes in the network, and there
exists a giant component, i.e. a strongly connected component
containing almost all nodes [13]. Additionally, an “efficiently”
navigable network possesses certain structural properties so
that it is possible to design efficient decentralised search
algorithms (algorithms that only have local knowledge of the
network) [11], [12], [13]. The delivery time (the expected
number of steps to reach an arbitrary target node) of such
algorithms is polylogarithmic or at most sub-linear in N.
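The two structural quantities used here, the relative size of the giant component and the effective diameter, can be computed with breadth-first search on small graphs. The sketch below makes several assumptions that are not taken from the paper: the graph is treated as undirected, every node appears as a key of `adj`, and the effective diameter is taken as the smallest hop count that covers at least 90% of connected node pairs, without interpolation. The exhaustive all-pairs search is only feasible for toy inputs.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def giant_component_fraction(adj):
    """Fraction of nodes in the largest connected component (undirected graph)."""
    seen, largest = set(), 0
    for node in adj:
        if node not in seen:
            component = bfs_distances(adj, node)
            seen.update(component)
            largest = max(largest, len(component))
    return largest / len(adj)

def effective_diameter(adj, quantile=0.9):
    """Smallest hop count within which `quantile` of connected pairs lie."""
    hops = sorted(d for s in adj for d in bfs_distances(adj, s).values() if d > 0)
    if not hops:
        return 0
    return hops[math.ceil(quantile * len(hops)) - 1]

# Hypothetical toy graph: a path a-b-c-d and a separate pair e-f.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"],
       "e": ["f"], "f": ["e"]}
gc = giant_component_fraction(adj)
ed = effective_diameter(adj)
```

For graphs of the size analysed in this paper, a sampled or approximate computation would be needed instead of the exhaustive search shown here.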
User navigation in hypertext systems is naturally modeled
as a decentralised search, i.e. at each particular node in the
network, users select a new node having only local knowledge
of the network and following the idea that the selected node
would bring them closest to their destination node. We use
this model to investigate the navigability of tag clouds next.
In the following, we conduct experiments aiming to shed
light on the navigability of tag clouds in social tagging systems.
We are particularly interested in studying how design
decisions, such as what tags to include in a tag cloud or how
many tags to display, affect the navigability of tag clouds.
While, today, designers often base such decisions on intuition
or heuristics, it is our goal to study the consequences of these
decisions experimentally, i.e. by exploring their empirical
effects on the network.
In our experiments, we used three datasets covering a range
of different settings.
Dataset Austria-Forum: This dataset consists of annotations
from an Austrian encyclopedia called Austria-Forum.
The dataset contains 32,245 annotations and
12,837 unique resources. The system is at an early phase
of adoption, i.e. not many users currently contribute.
Dataset BibSonomy: This dataset contains nearly all
916,495 annotations and 235,339 unique resources from a
dump of BibSonomy [15] until 2009-01-01. Annotations
from known spammers have been excluded from the
dataset. This dataset is obtained from a more mature
tagging system.
Dataset CiteULike: This dataset contains 6,328,021
annotations and 1,697,365 unique resources and is available.
Again, this is a dataset acquired from a more
mature tagging system.
Dataset Austria-Forum represents a tagging system at an
early stage of adoption. Datasets BibSonomy and CiteULike
are tagging systems which have reached a certain level of
maturity (i.e. attracted a larger set of active users). While
all three systems adopt tag clouds for navigational purposes,
their specific approaches vary. However, because the datasets
contain complete information about the tripartite graph, we can
experimentally manipulate the data in a way that simulates
different approaches to tag cloud construction consistently
across all datasets. We will describe how we manipulate the
data to simulate different user interface constraints next.
A. User Interface Issues
The first user interface restriction which we model is the
size of a tag cloud, i.e. the maximal number of tags displayed
in a tag cloud. While different tagging systems implement
different design choices, we can simulate alternative choices
across all datasets. For example, in some tagging systems the
maximum number of tags in a tag cloud might be 20, while
in others it might be much larger.
Another important issue of tag clouds is the algorithm used
to select the tags to display in a tag cloud. While, in theory,
there are many ways to compute and visualise tag clouds
[16], [17], [3], in practice many tagging systems follow a
simple resource-specific, TopN algorithm. In resource-specific
approaches to tag cloud construction, only tags assigned to the
corresponding resources are considered. In TopN approaches,
the top n tags with the highest resource-specific frequency are
chosen for display in the corresponding tag cloud. In cases
where fewer than n tags per resource are available, the remaining
slots are left empty.
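A resource-specific TopN selection can be sketched as follows. This is an illustrative reading of the description above, not code from any of the analysed systems: the resource-specific frequency of a tag is assumed to be the number of annotations carrying that tag on the resource, ties are broken alphabetically (an arbitrary choice), and the name `top_n_tag_cloud` is hypothetical.

```python
from collections import Counter

def top_n_tag_cloud(annotations, resource, n):
    """Return up to n tags of `resource`, most frequently assigned first."""
    counts = Counter(tag for r, tag, _user in annotations if r == resource)
    # Sort by descending frequency; alphabetical order breaks ties (assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _count in ranked[:n]]

# Hypothetical toy annotations as (resource, tag, user) triples.
annotations = [
    ("r1", "graz", "u1"),
    ("r1", "austria", "u2"),
    ("r2", "austria", "u1"),
    ("r2", "austria", "u3"),
    ("r2", "vienna", "u1"),
]
cloud_r2 = top_n_tag_cloud(annotations, "r2", 1)
cloud_r1 = top_n_tag_cloud(annotations, "r1", 5)
```

When a resource has fewer than n tags, the returned list is simply shorter, which corresponds to leaving the remaining slots empty.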
For the experiments aiming to study the Navigability Assumption,
we used the TopN algorithm (because it is the
most common) to reconstruct simulated networks of resource-
specific tag clouds for our three datasets.
Popular tags in a mature tagging system can cover hundreds
or even thousands of resources, which exceeds the pragmatic
limits of a system’s user interface. In this situation, tagging
systems usually resort to limiting the set of resources being
displayed for a given tag (for example, by sorting and “pag-
inating” the list of corresponding resources). To model such
limits, we introduce a pragmatic parameter, the length of the
resource list being presented, and denote it henceforth with k.
In the majority of tagging systems, the resource lists
presented after selecting a tag are usually sorted reverse-
chronologically (the resources most recently tagged are listed
first). For simplicity, in our experiments, we select the k
resources for k-limited resource lists randomly.
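The k-limited resource list can be illustrated with a short sketch of reverse-chronological pagination. The data layout is an assumption: `tagged_at` is a hypothetical map from each resource of a tag to the time it was most recently tagged, with larger values meaning more recent.

```python
def paginate_reverse_chronological(tagged_at, k):
    """Return the k most recently tagged resources, newest first."""
    newest_first = sorted(tagged_at, key=lambda r: tagged_at[r], reverse=True)
    return newest_first[:k]

# Hypothetical timestamps for the resources of a single tag.
tagged_at = {"r1": 3, "r2": 10, "r3": 7}
first_page = paginate_reverse_chronological(tagged_at, 2)
```

From a network point of view, every resource outside the returned list loses its incoming link from the tag, which is what limits the out-degree of the tag node to k.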
A. Intrinsic navigability of tagging systems
We start our study by analysing the navigability of tagging
systems in a synthetic network-theoretic case, i.e. without
taking any user interface restrictions into account. The first
row in each of Tables I(a), I(b), and I(c) presents the obtained
results. The results show the existence of a giant component
connecting almost all of the nodes (98%), as well as a
low effective diameter (less than 7, i.e. it is
less than polynomial in log N, see Figure 1).
The only exception here is the Austria-Forum dataset. We
speculate that this is due to the system being
in an early adoption stage. While the effective diameter of
the Austria-Forum dataset is larger than that of the two
other datasets (see Figure 1), it is still bounded
polylogarithmically, whereas the giant component contains only 77% of
nodes. This result suggests that the Navigability Assumption
depends on the adoption stage of the tagging system under
investigation, i.e. the assumption may only hold for more
mature tagging systems such as BibSonomy or CiteULike. We leave
the issue of identifying the point in time where immature
tagging systems transition to tagging systems exhibiting more
useful navigational properties to future research. At this point,
we simply observe that the Navigability Assumption is sensitive
to the stage of adoption of a tagging system.
Result 1: The usefulness of tag clouds for navigation is
sensitive to the phase of adoption of the social tagging system.
Fig. 1. Hop plots for three different tagging datasets (x-axis: number of
hops; y-axis: percentage of pairs of nodes). Legend: Austria-Forum,
EffDiam 10.7262, G(24171, 64366); BibSonomy, EffDiam 6.96109,
G(291763, 1727992); CiteULike, EffDiam 6.84779, G(2045200, 12298510).
We can observe the shrinking diameter phenomenon [18]: The two mature
datasets (BibSonomy and CiteULike, the two lines on the left) exhibit a
small diameter, while the Austria-Forum (a tagging system in an early
adoption phase, the line on the right) exhibits a larger diameter, and a
larger ratio of long distances between nodes.
Figures 2(a), 2(b), and 2(c) show tag (blue), resource
(green), and degree (red) distributions for the analysed
datasets. The tag and resource distributions were obtained
by analysing a unidirectional bipartite graph, i.e. a graph
with only directed links from tags to resources. The out-
degree distribution and the in-degree distribution in this graph
correspond to tag distribution and to resource distribution
respectively. For certain ranges of degrees, both distributions
are power law distributions. There are deviations in the tail of
the tag distribution – these stem from the system tags assigned
to imported resources (see Figures 2(b) and 2(c)). The vertical
line in the tail of Figure 2(c) comes from the existence of
synonym tags in the dataset. The resource distributions exhibit
an exponential cut-off in the tail (see Figure 2(b)), a deviation
in the tail stemming from a test resource (see Figure 2(a)),
and a power law distribution as in Figure 2(c).
The degree distribution of the undirected bipartite graph (the
red line in Figures 2(a), 2(b), and 2(c)) combines both tag and
resource distributions. For lower degrees, the combined degree
distribution takes the form of the resource distribution, i.e.
the number of resources with low frequencies dominates the
number of tags with low frequencies. For higher degrees, the
combined distribution takes the form of the tag distribution, i.e.
there are more tags with high frequencies than resources with
high frequencies. The tag distribution is two or more orders
of magnitude larger than the resource distribution, i.e. the tag
distribution strongly dominates the resource distribution for
higher degrees. That means that the network hubs (high-degree
nodes) are the “head” tags, i.e. the top tags for TopN tag cloud
construction algorithms.
Fig. 2. Tag, resource, and degree distributions for the three datasets:
(a) Austria-Forum, G(24171, 64366); (b) BibSonomy, G(291763, 1727992);
(c) CiteULike, G(2045200, 12298510). Each panel plots the count (CCDF) of
the combined degree distribution, the resource distribution, and the tag
distribution.

Table I(a): Austria-Forum
UIR      GC     ED      UIA     NADT
none     0.77   10.73   none    sub-lin.
n = 5    0.75   10.99   TopN    sub-lin.
n = 10   0.76   11.3    TopN    sub-lin.
n = 20   0.76   11.97   TopN    sub-lin.
n = 30   0.76   11.05   TopN    sub-lin.
k = 5    0.36   12.04   Chron.  unnav.
k = 10   0.47   11.16   Chron.  unnav.
k = 20   0.56   10.31   Chron.  unnav.
k = 30   0.6    10.68   Chron.  unnav.

Table I(b): BibSonomy
UIR      GC     ED      UIA     NADT
none     0.98   6.96    none    sub-lin.
n = 5    0.94   6.8     TopN    sub-lin.
n = 10   0.97   6.87    TopN    sub-lin.
n = 20   0.98   6.84    TopN    sub-lin.
n = 30   0.98   6.91    TopN    sub-lin.
k = 5    0.31   6.82    Chron.  unnav.
k = 10   0.4    6.62    Chron.  unnav.
k = 20   0.5    6.61    Chron.  unnav.
k = 30   0.54   6.65    Chron.  unnav.

Table I(c): CiteULike
UIR      GC     ED      UIA     NADT
none     0.98   6.85    none    sub-lin.
n = 5    0.93   6.97    TopN    sub-lin.
n = 10   0.95   7.07    TopN    sub-lin.
n = 20   0.97   7.17    TopN    sub-lin.
n = 30   0.97   6.98    TopN    sub-lin.
k = 5    0.27   6.89    Chron.  unnav.
k = 10   0.36   6.95    Chron.  unnav.
k = 20   0.44   6.91    Chron.  unnav.
k = 30   0.48   7.05    Chron.  unnav.

UIR = UI Restriction, GC = Giant Component, ED = Effective Diameter, UIA = UI Algorithm, NADT = Navigation Algorithm Delivery Time
Chron. = Chronological algorithm, sub-lin. = sub-linear, unnav. = unnavigable network

Due to the existence of a giant component and a low
diameter, tagging systems are intrinsically navigable. In [19],
Adamic shows the existence of efficient decentralised navigation
and search algorithms for power law networks. In
principle, a user could first navigate to a hub (which is
typically achieved in a few hops in a power law network)
and, since hubs have a large out-degree, reach the
destination node easily from there. The delivery time of the algorithm
is sub-linear, although the number of inspected nodes in the
worst case is O(N), since sometimes the user needs to inspect
all outgoing links from a hub.
Result 2: Tagging networks are navigable power-law net-
works. For power law networks, efficient sub-linear decen-
tralised navigation algorithms exist.
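The hub-based strategy can be illustrated with a simple degree-seeking search. This is a simplified sketch of the general idea rather than the exact algorithm of [19]: at each node the searcher jumps to the target if it is a direct neighbour, and otherwise moves to the unvisited neighbour with the highest degree. The function gives up at a dead end instead of backtracking, and the name `degree_seeking_search` and the toy graph are hypothetical.

```python
def degree_seeking_search(adj, start, target, max_steps=1000):
    """Decentralised search that prefers high-degree neighbours (hubs).

    Uses only local knowledge: the neighbours of the current node and
    their degrees. Returns the path taken, or None if the search fails.
    """
    path = [start]
    visited = {start}
    current = start
    while current != target and len(path) <= max_steps:
        neighbours = adj[current]
        if target in neighbours:
            nxt = target
        else:
            unvisited = [n for n in neighbours if n not in visited]
            if not unvisited:
                return None  # dead end; this sketch does not backtrack
            # Highest degree first; the node name breaks ties deterministically.
            nxt = max(unvisited, key=lambda n: (len(adj[n]), n))
        visited.add(nxt)
        path.append(nxt)
        current = nxt
    return path if current == target else None

# Hypothetical toy graph: tag "hub" links all resources, tag "rare" only r1.
adj = {
    "r1": ["rare", "hub"], "r2": ["hub"], "r3": ["hub"], "r4": ["hub"],
    "hub": ["r1", "r2", "r3", "r4"], "rare": ["r1"],
}
route = degree_seeking_search(adj, "r1", "r4")
```

Checking whether the target is among the neighbours of a hub is the step that may require inspecting all of the hub's outgoing links, which is the worst-case cost mentioned above.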
B. Tag cloud size
Rows two to five of Tables I(a), I(b), and I(c) show the
results of applying the TopN algorithm to limit the tag cloud
size on the analysed datasets. From a network-theoretic point
of view, limiting the tag cloud size means limiting the out-degree
of the resource nodes in the bipartite graph. The out-degree
of the resource nodes is two orders of magnitude
smaller than the out-degree of the tag nodes, indicating there
are no resource “hubs” in the network. Therefore, limiting the
tag cloud size does not influence the network to a large extent.
In other words, the structure of the network is still maintained,
i.e. the network remains a navigable network with navigation
efficiency inherent to power law networks.
Result 3: Limiting the tag cloud size to practically feasible
sizes (e.g. 5, 10, or more) does not influence navigability.
C. Pagination
Rows six to nine of Tables I(a), I(b), and I(c) contain
the results of simulating pagination with resource lists sorted
reverse-chronologically. Even without experiments, it is ev-
ident that limiting the number of links going out from a
tag node has destructive effects on the resulting network.
In other words, limiting the out-degree of hub nodes in a
power-law network destroys the connectivity of the network
as a whole. Our experiments show exactly that: the giant
component collapses, and the largest strongly connected component
now contains only around 50% of the nodes or fewer. As such,
pagination destroys network navigability, and the Navigability
Assumption only holds when we assume that users would be
able and willing to inspect long lists (>10,000) of resources
per tag, which is not reasonable. For example, we know from
search query log research that users rarely click on links
beyond the first result page [20]. This yields our final result:
Result 4: Limiting the out-degree of high frequency tags
(e.g. through pagination with resource lists sorted
reverse-chronologically) leaves the network vulnerable to fragmentation.
This destroys navigability of prevalent approaches to tag clouds.
Input: G = <V, E>, R_t, t, k
for all r ∈ R_t
    add t_r to V
    add (r, t_r) to E
    RL_{t_r} ← f(r, k)
    for all rr ∈ RL_{t_r}
        add (t_r, rr) to E
    end for
end for
remove t from V
Fig. 3. Generalized pagination algorithm
The previous analysis illustrated the vulnerability of tagging
networks to the pagination effect, where a limit is placed on
the number of links going out from paginated tags, i.e. tags
with frequency higher than the pagination parameter k. This
vulnerability is mainly due to the simplicity of the common
pagination algorithm, i.e. the resource list is simply sorted
reverse-chronologically and only the k most recently tagged
resources are presented to the user. The algorithm does not
take into account the current user context, i.e. the resource
where the user clicks on a paginated tag. Rather, the same
reverse-chronological resource list is presented for a given
paginated tag throughout the system.
Let us now investigate possibilities to recover the navigability
of tagging networks by means of alternative tag cloud
construction algorithms. To this end, we introduce an adapted
pagination algorithm. A simple generalisation of the pagination
algorithm is to select k different resources out of all
resources tagged with a given paginated tag, depending on the
current user context, i.e. depending on the resource where the
user activates a paginated tag. Let us denote the resource list
of a given paginated tag t with R_t. In this case, a particular
selection of resources for t becomes a function of a given
resource r and parameter k, i.e. RL_{t_r} = f(r, k). In other words,
each paginated tag is replaced by as many resource-specific
tags (t_r) as there are resources in its resource list. Each
resource-specific tag is then connected to the resources computed
by f(r, k). The pseudo-code of the generalised algorithm is
given in Figure 3.
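The generalised pagination algorithm can be sketched in Python as follows. This is an illustrative interpretation under stated assumptions, not the authors' code: the graph is held as two directed adjacency maps, a resource-specific tag t_r is represented by the tuple (tag, r), and the selection function takes the candidate list as an extra argument, i.e. `select(r, candidates, k)` stands in for f(r, k). The names `generalized_pagination` and `first_k_others` are hypothetical.

```python
def generalized_pagination(tag_to_resources, resource_to_tags, tag, k, select):
    """Replace one paginated tag by resource-specific copies (tag, r).

    Returns new directed adjacency maps; the inputs are left unchanged.
    """
    candidates = sorted(tag_to_resources[tag])
    # "remove t from V": drop the paginated tag and every edge pointing to it.
    new_t2r = {t: set(rs) for t, rs in tag_to_resources.items() if t != tag}
    new_r2t = {r: set(ts) - {tag} for r, ts in resource_to_tags.items()}
    for r in candidates:
        t_r = (tag, r)                                   # "add t_r to V"
        new_r2t.setdefault(r, set()).add(t_r)            # "add (r, t_r) to E"
        new_t2r[t_r] = set(select(r, candidates, k))     # "add (t_r, rr) to E"
    return new_t2r, new_r2t

def first_k_others(r, candidates, k):
    """Placeholder selection function: the first k candidates other than r."""
    return [c for c in candidates if c != r][:k]

# Hypothetical toy graph with "austria" as the paginated tag.
tag_to_resources = {"austria": {"r1", "r2", "r3"}, "graz": {"r1"}}
resource_to_tags = {"r1": {"austria", "graz"}, "r2": {"austria"}, "r3": {"austria"}}
new_t2r, new_r2t = generalized_pagination(
    tag_to_resources, resource_to_tags, "austria", 1, first_k_others)
```

Different choices of the selection function lead to different network structures, while the surrounding replacement logic stays the same.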
We now discuss some potential functions f (r, k) for select-
ing resources from the available resource pool and analyse
their influence on network navigability.
A. Random link selection
A first obvious choice for f (r, k) is to select k resources
uniformly at random. This approach generates a random graph
as introduced by [21] for each given paginated tag. As [22]
and [23] showed, graphs generated uniformly at random are
typically connected and have, with high probability, a
diameter bound by log N (already for out-degrees k ≥ 3).
However, since there are no structural clues in a randomly
generated network, a decentralized search algorithm will need
to inspect, in the worst case, all nodes of the network in order
to reach a destination node from the given starting node.
Table II shows the results of a random pagination algorithm
on the three test datasets. All three networks become strongly
connected with a giant component even for low values of k.
As expected, all three networks also possess a low diameter.
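A uniform random choice of f(r, k) is straightforward to sketch. The version below is an assumption-labelled illustration: it excludes the current resource r from its own list (the text does not say whether it should be excluded), and the name `random_selection` is hypothetical.

```python
import random

def random_selection(r, candidates, k, rng):
    """Pick up to k resources uniformly at random, excluding r itself (assumption)."""
    pool = [c for c in candidates if c != r]
    return rng.sample(pool, min(k, len(pool)))

# Hypothetical resource list of one paginated tag.
candidates = [f"r{i}" for i in range(10)]
links = random_selection("r0", candidates, 3, random.Random(42))
```

Because the selection ignores any structure among the resources, a searcher gets no hint about which of the k links leads towards a given destination.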
B. Hierarchical network model
In [13], Kleinberg introduced the hierarchical network
model and elegantly proved that it is possible to design
efficient decentralised search algorithms for such networks
with a delivery time polynomial in log N (for details see
Section VI). Put simply, Kleinberg showed that, if the nodes
of a network can be organised into a hierarchy, then such
a hierarchy provides a probability distribution for connecting
the nodes in the network. The resulting network is efficiently
navigable. A special case of the hierarchical network model is
given when there is a constant number of links leaving a node,
i.e. when the out-degree of a node is limited by a parameter
k, as is the case with pagination. In this case, the tree leaves
contain so-called clusters of nodes, i.e. a collection of a certain
constant number of nodes.
Thus, we developed a hierarchical network generator that 1)
sorts the resource list of a given paginated tag by frequency,
2) creates resource clusters of size 10 by traversing the sorted
resource list sequentially, 3) creates a balanced b-ary (b = 5)
tree where the number of leaves is equal to the number of
the resource clusters, 4) traverses the tree in postorder from
left to right and attaches resource clusters to the tree leaves,
and 5) uses this tree structure to obtain the link probability
distribution for connecting a resource-specific tag node with
resources of a given paginated tag.
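The five steps above can be approximated with a compact sketch. Several details are assumptions rather than facts from the paper: clusters are assigned to leaves from left to right by their index, the tree is implicit in integer division of the leaf index by b, and the link weight between two resources is taken as b^(-h), where h is the height of the lowest common ancestor of their leaves, in the spirit of the hierarchical model of [13]. The names `lca_height` and `hierarchical_selection` are hypothetical.

```python
import random

def lca_height(i, j, b):
    """Height of the lowest common ancestor of leaves i and j in a b-ary tree."""
    height = 0
    while i != j:
        i //= b
        j //= b
        height += 1
    return height

def hierarchical_selection(r, ranked, k, rng, cluster_size=10, b=5):
    """Sample up to k link targets for r, favouring resources close in the tree.

    `ranked` is the resource list of a paginated tag, already sorted by
    frequency (step 1). Consecutive chunks of `cluster_size` form the
    clusters (step 2), and chunk i sits on leaf i (steps 3 and 4).
    """
    position = {res: idx for idx, res in enumerate(ranked)}
    home = position[r] // cluster_size
    pool = [res for res in ranked if res != r]
    # Step 5: link weight decays exponentially with tree distance (assumption).
    weights = [b ** -lca_height(home, position[res] // cluster_size, b)
               for res in pool]
    chosen = []
    while pool and len(chosen) < k:
        idx = rng.choices(range(len(pool)), weights=weights)[0]
        chosen.append(pool.pop(idx))
        weights.pop(idx)
    return chosen

# Hypothetical resource list with 60 resources, i.e. 6 clusters of size 10.
ranked = [f"r{i}" for i in range(60)]
links = hierarchical_selection("r0", ranked, 5, random.Random(1))
```

With these weights, most links stay inside or near the cluster of r, while a few reach distant clusters.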
It is important to note that the tree creation process follows
the statistical properties of the tagging dataset only; it has no
inherent semantic rationale. As such, it serves primarily as a
statistical tool to improve the efficiency of navigability from a
network-theoretic perspective. Table III provides an overview
of the results of the structural network analysis performed with
the three real-life datasets.
Another important observation is that in our model each
paginated tag is a source of a network generated by a hierarchy.
These networks are themselves connected through
tag co-occurrence in the dataset, i.e. since tags overlap and
share resources, such shared resources link different generated
networks. This makes it more difficult to estimate the delivery
time of a decentralised search algorithm possessing only
local knowledge. If the algorithm is extended to have
knowledge of all the hierarchies used in the generation of the
networks, then this additional information might be useful in
finding a destination node faster.
However, more theoretical work is needed to offer a proof
of this intuitive assumption. In addition, it would be interesting
to test these ideas empirically, for example, by implementing
the algorithm and applying it to the real-life datasets.
Another interesting problem is the fitting of parameters for the
hierarchical network model, for example, what is the optimal
combination of the cluster size and the maximum number of
children with respect to the size of the resource list and the
pagination parameter k?

Table II(a): Austria-Forum
UIR      GC     ED      UIA     NADT
k = 5    0.86   11.7    Random  linear
k = 10   0.86   11.02   Random  linear
k = 20   0.85   10      Random  linear
k = 30   0.84   10.42   Random  linear

Table II(b): BibSonomy
UIR      GC     ED      UIA     NADT
k = 5    0.99   8.75    Random  linear
k = 10   0.99   6.97    Random  linear
k = 20   0.99   6.75    Random  linear
k = 30   0.99   6.46    Random  linear

Table II(c): CiteULike
UIR      GC     ED      UIA     NADT
k = 5    0.99   7.98    Random  linear
k = 10   0.99   7.88    Random  linear
k = 20   0.99   7.13    Random  linear
k = 30   0.99   6.86    Random  linear

UIR = UI Restriction, GC = Giant Component, ED = Effective Diameter, UIA = UI Algorithm, NADT = Navigation Algorithm Delivery Time
C. Navigational and semantic penalty
The previous section shows that one way of designing an
efficiently navigable network in a tagging system is to classify
the resources of a given paginated tag into a hierarchy.
The hierarchy used in our experiments does not possess
any semantic grounding, but it is optimal from a navigational
point of view. However, improvements of our algorithms will
need to take the semantics of the dataset into account by
identifying a set of resource attributes. For example, resource
attributes might be the date of creation, authors, other tags,
or even attributes external to the system such as URLs, full-
text, or title. To design a navigable network, the pagination
algorithm needs to organise these resource attributes into a
hierarchy. At the same time, it is difficult to expect that an
algorithm taking into account the semantics of resources can
produce an optimal hierarchy that optimizes navigability of
the tagging system as a whole. Rather, the semantic algorithm
will tend to produce an unbalanced tree with a variable cluster
size. As a consequence, the navigational structure generated
by such an algorithm will be sub-optimal, i.e. a decentralised
search algorithm will need to take more steps (investigate more
nodes) to find a destination node. We will call this effect
the navigational penalty. Of course, the pagination algorithm
might be altered to produce a tree closer to the optimal tree
from the navigational point of view. This, however, seems
possible only by breaking semantics to a certain extent. We
will call this contrasting effect the semantic penalty. This
reveals an essential trade-off which tag cloud construction
algorithms will need to address: balancing the navigational
and semantic penalties.
Let us illustrate the navigational and semantic penalties
with an example. Suppose we have 1000 resources about
Austrian cities tagged with “Austria”. A particular tagging
system might decide to paginate that tag with a pagination
parameter of k = 20 (listing 20 resources per page). First, the
system would need to semantically classify the resources into
a clustered hierarchy. For example, it could take geography as
the criterion for creating clusters, with each cluster corresponding to
and the province of Vienna (the capital of Austria) might
dominate, since it contains, say, 500 resources. Generating
the network from such an unbalanced hierarchy will result
in a navigational penalty, whereas a new classification of the
resources taking into account the Vienna districts as a further
geographical refinement to balance the cluster size may cause
a semantic penalty, if the Vienna province is represented at a
finer level of detail than other provinces.
Further research needs to shed light on how to measure
and balance the trade-off between navigational and semantic
penalties. For example, there are measures such as FolkRank
[24] or TagCont [25] which might represent an interesting
starting point for quantifying the effects of the semantic penalty.
We start our review of related work with a brief overview
of network-related research. Research on network navigability
has been inspired by Milgram’s small world experiment [10].
In this experiment, selected persons from Nebraska received
a letter they were then asked to send through their social
networks to a stockbroker in Boston. The striking result of
the study was that, for those letters reaching the destination,
the average number of hops was around 6, i.e. the population
of the USA constituted a “small world”. While the conclusions
have been challenged [26], this experiment has attracted a
great deal of interest in the research community.
Numerous researchers analysed Milgram’s experiment try-
ing to create network models and generators able to produce
such “small world” networks (see for example [27]). The
lattice model by Watts [28] mimics a real-life social network,
where people are primarily connected to their neighbours with
a few “long-range” contacts. The networks generated by this
model have, like the random graph model [22], [23], a giant
component and a diameter bounded by log N.
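The effect of a few long-range contacts can be illustrated with a small, self-contained Python sketch. It is not the exact construction of [28]: as a simplifying assumption, shortcuts are added on top of a ring lattice instead of rewiring existing edges, so the graph stays connected. It also measures the average shortest-path length rather than the diameter. The function names and parameter values are hypothetical.

```python
import random
from collections import deque

def small_world(n, k, p, seed=0):
    """Ring of n nodes, each linked to its k nearest neighbours on each side.
    For every lattice edge, a random shortcut is added with probability p
    (assumption: shortcuts are added, not rewired)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for offset in range(1, k + 1):
            w = (v + offset) % n
            adj[v].add(w)
            adj[w].add(v)
            if rng.random() < p:
                u = rng.randrange(n)  # candidate long-range contact
                if u != v:
                    adj[v].add(u)
                    adj[u].add(v)
    return adj

def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

if __name__ == "__main__":
    print("pure lattice:  ", average_path_length(small_world(200, 2, 0.0)))
    print("with shortcuts:", average_path_length(small_world(200, 2, 0.1)))
```

Because shortcuts only add edges, the average path length with shortcuts can never exceed that of the pure lattice.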
Kleinberg analysed the second result of Milgram’s
experiment, the ability of people to find a short path when
there is such a path between two nodes [11], [12], [13]. He
concluded that there are structural clues in such networks,
which allow people to find a short path efficiently and argued
that for an “efficiently” navigable network there exists a
decentralised search algorithm with delivery time polynomial
in log N.
Kleinberg also designed a number of network models such
as 2D-grid models [12], hierarchical models [13], and group
models [13], and showed that for certain combinations of
parameters, efficient decentralised search algorithms exist.
UIR     GC      ED      UIA     NADT
(a) Austria-Forum
k=5     0.85    12.03   Hier.   polylog.
k=10    0.86    10.62   Hier.   polylog.
k=20    0.85    9.29    Hier.   polylog.
k=30    0.84    9.71    Hier.   polylog.
(b) BibSonomy
k=5     0.99    8.82    Hier.   polylog.
k=10    0.99    7.62    Hier.   polylog.
k=20    0.99    6.94    Hier.   polylog.
k=30    0.99    6.75    Hier.   polylog.
(c) CiteULike
k=5     0.99    8.76    Hier.   polylog.
k=10    0.99    7.6     Hier.   polylog.
k=20    0.99    6.36    Hier.   polylog.
k=30    0.99    5.89    Hier.   polylog.
UIR = UI Restriction, GC = Giant Component, ED = Effective Diameter, UIA = UI Algorithm, NADT = Navigation Algorithm Delivery Time
Hier. = Hierarchical Algorithm, polylog. = polylogarithmic
Kleinberg also showed that there is no such algorithm for
the lattice model.
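A decentralised search of the kind discussed here can be sketched as greedy routing on a 2D grid. The sketch follows the published 2D-grid model only loosely and rests on several assumptions not stated in this text: each node has its lattice neighbours plus one long-range contact drawn with probability proportional to d^-r, d is the Manhattan distance, and long-range contacts are sampled lazily when a node is visited. Lazy sampling is safe here because greedy routing never revisits a node. All function and parameter names are hypothetical.

```python
import random

def manhattan(a, b):
    """Lattice (Manhattan) distance between two grid nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def long_range_contact(v, n, r, rng):
    """Sample one long-range contact for v with P(w) proportional to d(v, w)^-r."""
    nodes = [(x, y) for x in range(n) for y in range(n) if (x, y) != v]
    weights = [manhattan(v, w) ** -r for w in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

def greedy_route(source, target, n, r, rng):
    """Forward the message to the known contact closest to the target.
    Uses only local knowledge: the current node's own contacts."""
    current, hops = source, 0
    while current != target:
        x, y = current
        contacts = [(x + dx, y + dy)
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= x + dx < n and 0 <= y + dy < n]
        contacts.append(long_range_contact(current, n, r, rng))
        current = min(contacts, key=lambda w: manhattan(w, target))
        hops += 1
    return hops

if __name__ == "__main__":
    rng = random.Random(42)
    for r in (0, 2, 4):  # r = 2 is the exponent usually associated with efficient search
        runs = [greedy_route((0, 0), (19, 19), 20, r, rng) for _ in range(50)]
        print("r =", r, "mean hops:", sum(runs) / len(runs))
```

Since a lattice neighbour always reduces the distance to the target by one, the number of hops never exceeds the Manhattan distance between source and target. The long-range contacts can only shorten the route.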
Particularly, hierarchical network models [13] are based on
the idea that, in many settings, the nodes in a network might
be classified according to a taxonomy. The taxonomy can be
represented as a b-ary tree and network nodes can be attached
to the leaves of the tree. For each node v, we can create a link
to all other nodes w with the probability that decreases with
h(v, w) where h is the height of the least common ancestor of
v and w in the tree. For a constant out-degree, the nodes are
clustered and then the clusters are attached to the tree. The
link distribution defined by f(h) = (h + 1)
generates a
navigable network with a decentralised search algorithm with
delivery time of O(log
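The construction of such a hierarchical network can be sketched as follows. This is a minimal illustration under stated assumptions: the nodes are the leaves of a complete b-ary tree numbered from left to right, links are sampled with replacement, and the decay function `b ** -h` is a hypothetical stand-in for a link distribution that decreases with h, not the distribution used in [13]. The names `lca_height` and `sample_links` are likewise hypothetical.

```python
import random

def lca_height(v, w, b):
    """Height h(v, w) of the least common ancestor of leaves v and w
    in a complete b-ary tree whose leaves are numbered left to right."""
    h = 0
    while v != w:
        v //= b  # move both nodes one level up the tree
        w //= b
        h += 1
    return h

def sample_links(v, n_leaves, b, out_degree, rng, decay=lambda h, b: b ** -h):
    """Draw out_degree link targets for v, favouring leaves with a low
    common ancestor. The decay function is an illustrative assumption."""
    others = [w for w in range(n_leaves) if w != v]
    weights = [decay(lca_height(v, w, b), b) for w in others]
    return rng.choices(others, weights=weights, k=out_degree)

if __name__ == "__main__":
    rng = random.Random(7)
    # 16 leaves under a binary taxonomy; three outgoing links for leaf 0
    print(sample_links(0, 16, 2, 3, rng))
```

Leaves in the same small subtree are the most likely link targets, while links to distant parts of the taxonomy remain possible but rare.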
In related research of tagging systems, tag clouds have
been characterised as a way to translate the emergent vo-
cabulary of a folksonomy into social navigation tools [4],
[29]. Social navigation itself represents a multi-dimensional
concept, covering a range of different issues and ideas. A
distinction between direct and indirect social navigation, for
example, highlights whether navigational clues are provided by
direct communication among users (e.g. via chat), or whether
navigational clues are indirectly inferred from historical traces
left by others [30]. Based on this distinction, our work only
focuses on indirect social navigation in the sense that it studies
the effectiveness of traces (“tags”) left by users in tagging
systems. Other types of social navigation emphasise the need
to show the presence of other users, to build trust among
groups of users, or to encourage certain behaviour [30].
Researchers have discussed the advantages and drawbacks
of tag clouds, suggesting that tag clouds are a useful mech-
anism when users’ search tasks are general and explorative
(for example, learn about Web 2.0), while tag clouds provide
little value for specific information-seeking tasks (for example,
navigating to a particular, known resource) [4], [31]. While the paper at hand
focuses on network-theoretic aspects, cognitive aspects of nav-
igation have been studied previously using, for example, SNIF-
ACT [32] and social information foraging theory [33]. Other
work has studied the motivations of users for tagging [34],
[35], and how they influence emergent semantic (as opposed
to navigational) structures. The navigational utility of single
tags has been investigated [36] with somewhat disappointing
results. Over time, tags become increasingly hard to use as
they lose specificity and reference too many resources. Such
tags are exactly those paginated tags where new pagination
algorithms are needed.
Navigation models for tagging systems have also been dis-
cussed recently. In [8], the authors describe a navigation framework
for tagging systems and apply it to
analyze possible attacks on such systems. In principle, the
framework identifies a navigation channel as any combination
of the basic elements of a tagging system (users, tags, and
resources). Thus, the specific combination which we investi-
gated in this paper can be summarized as the resource-tag or
tag-resource navigation channel.
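The tag-resource channel under pagination can be sketched as a small bipartite network. The sketch rests on simplifying assumptions: each tag links only to the k most recent resources on its first page (reverse-chronological listing), links are treated as undirected, and the toy data and function names are invented for illustration.

```python
from collections import defaultdict, deque

def paginated_network(assignments, k):
    """Build a bipartite tag-resource graph from (resource, tag, timestamp)
    triples, keeping for each tag only its k most recent resources."""
    adj = {}
    by_tag = defaultdict(list)
    for resource, tag, timestamp in assignments:
        adj.setdefault(("resource", resource), set())
        adj.setdefault(("tag", tag), set())
        by_tag[tag].append((timestamp, resource))
    for tag, items in by_tag.items():
        # reverse-chronological listing, truncated to the first page
        for _, resource in sorted(items, reverse=True)[:k]:
            adj[("tag", tag)].add(("resource", resource))
            adj[("resource", resource)].add(("tag", tag))
    return adj

def giant_component_share(adj):
    """Fraction of all nodes that lie in the largest connected component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best / len(adj)

# Invented toy data: five resources tagged "austria", one of them also "graz".
data = [("r1", "austria", 1), ("r2", "austria", 2), ("r3", "austria", 3),
        ("r4", "austria", 4), ("r5", "austria", 5), ("r1", "graz", 6)]

if __name__ == "__main__":
    for k in (2, 5):
        print("k =", k, "giant component share:",
              giant_component_share(paginated_network(data, k)))
```

With a small page size, older resources drop off the first page of the tag and become unreachable through this channel, which shrinks the giant component.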
Recent literature also discusses algorithms for the construc-
tion of tag clouds. The ELSABer algorithm [37] represents
an example of such an effort aimed towards identifying
hierarchical relationships between annotations to facilitate
browsing. The work by [38] is another example, introducing
entropy-based algorithms for the construction of interesting
tag clouds. However, these algorithms have not found wide-
spread adoption in current social tagging systems. In addition,
empirical studies of tagging systems have for example focused
on comparing navigational characteristics of tag distributions
to similar distributions produced by library terms [39].
Our work contributes to an increased theoretical understand-
ing of the navigability of current tag cloud algorithms in
social tagging systems. Our experiments identify empirical
problems related to the navigability of tag clouds in three real-
world tagging systems.
The motivation for this research was to examine and test
the widely held belief that tag clouds support efficient nav-
igation in social tagging systems. We have shown that for
certain specific, but popular, tag cloud scenarios, the so-called
Navigability Assumption does not hold. The results presented
in this paper make a theoretical and an empirical argument
against existing approaches to tag cloud construction. Our
work thereby both confirms and refutes the assumption that
current tag cloud incarnations are a useful tool for navi-
gating social tagging systems. While we confirm that tag-
resource networks have efficient navigational properties in
theory, we show that popular user interface decisions (such as
“pagination” combined with reverse-chronological listing of
resources) significantly impair navigability. Our experimental
results demonstrate that popular approaches to using tag clouds
for navigational purposes suffer from significant problems.
Building on recent research results from network theory, in
particular hierarchical network models, we have illustrated a
path towards constructing more efficiently navigable tag cloud
networks, which are less vulnerable to pagination influences.
Our findings suggest that engineers who want to design
effective tag cloud algorithms have to essentially strike a
balance between semantic and navigation penalties, in order to
make navigation in social tagging systems both efficient and
effective. We conclude that in order to make full use of the
potential of tag clouds for navigating social tagging systems,
new and more sophisticated ways of thinking about designing
tag cloud algorithms are needed.
[1] M. A. Hearst and D. Rosner, “Tag clouds: Data analysis tool or
social signaller?” in HICSS ’08: Proceedings of
the 41st Annual Hawaii International Conference on System Sciences.
Washington, DC, USA: IEEE Computer Society, 2008.
[2] C. S. Mesnage and M. J. Carman, “Tag navigation,” in SoSEA ’09:
Proceedings of the 2nd international workshop on Social software
engineering and applications. New York, NY, USA: ACM, 2009, pp.
[3] A. W. Rivadeneira, D. M. Gruen, M. J. Muller, and D. R. Millen,
“Getting our head in the clouds: toward evaluation studies of tagclouds,”
in CHI ’07: Proceedings of the SIGCHI conference on Human factors
in computing systems. New York, NY, USA: ACM, 2007, pp. 995–998.
[4] J. Sinclair and M. Cardew-Hall, “The folksonomy tag cloud: when is it
useful?” Journal of Information Science, vol. 34, p. 15, 2008.
[5] N. Neubauer and K. Obermayer, “Hyperincident connected components
of tagging networks,” in HT ’09: Proceedings of the 20th ACM confer-
ence on Hypertext and hypermedia. New York, NY, USA: ACM, 2009,
pp. 229–238.
[6] C. Cattuto, C. Schmitz, A. Baldassarri, V. D. P. Servedio, V. Loreto,
A. Hotho, M. Grahl, and G. Stumme, “Network properties of folk-
sonomies,” AI Commun., vol. 20, no. 4, pp. 245–262, 2007.
[7] C. Schmitz, A. Hotho, R. Jäschke, and G. Stumme, “Mining association
rules in folksonomies,” in Data Science and Classification: Proc. of
the 10th IFCS Conf., Studies in Classification, Data Analysis, and
Knowledge Organization. Springer, 2006, pp. 261–270.
[8] M. Ramezani, J. Sandvig, T. Schimoler, J. Gemmell, B. Mobasher, and
R. Burke, “Evaluating the impact of attacks in collaborative tagging
environments,” in Computational Science and Engineering, 2009. CSE
’09. International Conference on, vol. 4, Aug. 2009, pp. 136–143.
[9] P. Mika, “Ontologies are us: A unified model of social networks and
semantics,” Web Semantics: Science, Services and Agents on the World
Wide Web, vol. 5, no. 1, pp. 5–15, 2007.
[10] S. Milgram, “The small world problem,” Psychology Today, vol. 1, pp.
60–67, 1967.
[11] J. M. Kleinberg, “Navigation in a small world,” Nature, vol. 406, no.
6798, August 2000.
[12] J. Kleinberg, “The small-world phenomenon: An algorithmic perspec-
tive,” in Proceedings of the 32nd ACM Symposium on Theory of
Computing, 2000.
[13] J. M. Kleinberg, “Small-world phenomena and the dynamics of infor-
mation,” in Advances in Neural Information Processing Systems (NIPS)
14. MIT Press, 2001, p. 2001.
[14] D. J. Watts, P. S. Dodds, and M. E. J. Newman, “Identity and search in
social networks,” Science, vol. 296, pp. 1302–1305, 2002.
[15] A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme, “Bibsonomy: A
social bookmark and publication sharing system,” in Proceedings of
the Conceptual Structures Tool Interoperability Workshop at the 14th
International Conference on Conceptual Structures, 2006, pp. 87–102.
[16] T. Eda, T. Uchiyama, T. Uchiyama, and M. Yoshikawa, “Signaling emo-
tion in tagclouds,” in WWW ’09: Proceedings of the 18th international
conference on World wide web. New York, NY, USA: ACM, 2009, pp.
[17] O. Kaser and D. Lemire, “Tag-Cloud Drawing: Algorithms for Cloud
Visualization,” Proceedings of Tagging and Metadata for Social
Information Organization (WWW 2007), 2007.
[18] J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graphs over time: densi-
fication laws, shrinking diameters and possible explanations,” in KDD
’05: Proceedings of the eleventh ACM SIGKDD international conference
on Knowledge discovery in data mining. New York, NY, USA: ACM,
2005, pp. 177–187.
[19] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman,
“Search in power-law networks,” Physical Review E, vol. 64, no. 4, pp.
046 135 1–8, Sep 2001.
[20] Y. Zhang, B. Jansen, and A. Spink, “Time series analysis of a Web
search engine transaction log,” Information Processing & Management,
vol. 45, no. 2, pp. 230–245, 2009.
[21] P. Erdős and A. Rényi, “On the evolution of random graphs,” Publ.
Math. Inst. Hung. Acad. Sci, vol. 5, pp. 17–61, 1960.
[22] B. Bollobás and W. F. de la Vega, “The diameter of random regular
graphs,” Combinatorica, vol. 2, no. 2, pp. 125–134, 1982.
[23] B. Bollobás and F. R. K. Chung, “The diameter of a cycle plus a random
matching,” SIAM J. Discret. Math., vol. 1, no. 3, pp. 328–333, 1988.
[24] A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme, “Folkrank: A ranking
algorithm for folksonomies,” in Proc. FGIR 2006, 2006.
[25] C. Koerner, D. Benz, A. Hotho, M. Strohmaier, and G. Stumme, “Stop
thinking, start tagging: Tag semantics arise from collaborative verbosity,”
in 19th International World Wide Web Conference (WWW2010), Raleigh,
NC, USA, April 26-30, ACM, 2010.
[26] J. Kleinfeld, “Could it be a big world after all? The six degrees of
separation myth,” Society, April 2002.
[27] M. Kochen, Ed., The Small World. Norwood, NJ: Ablex, 1989.
[28] D. J. Watts and S. H. Strogatz, “Collective dynamics of small-world
networks,” Nature, vol. 393, no. 6684, pp. 440–442, June 1998.
[29] A. Dieberger, “Supporting social navigation on the world wide web,”
Int. J. Hum.-Comput. Stud., vol. 46, no. 6, pp. 805–825, 1997.
[30] D. Millen and J. Feinberg, “Using social tagging to improve social
navigation,” in Workshop on the Social Navigation and Community
Based Adaptation Technologies. Dublin, Ireland. Citeseer, 2006.
[31] M. Strohmaier, C. Trattner, and D. Helic, “The benefits and limitations
of tag clouds as a tool for social navigation from a network-theoretic
perspective,” Journal of Universal Computer Science, 2010, (UNDER
[32] W.-T. Fu and P. Pirolli, “Snif-act: a cognitive model of user navigation
on the world wide web,” Hum.-Comput. Interact., vol. 22, no. 4, pp.
355–412, 2007.
[33] P. Pirolli, “An elementary social information foraging model,” in Pro-
ceedings of the 27th international conference on Human factors in
computing systems. ACM, 2009, pp. 605–614.
[34] M. Strohmaier, C. Koerner, and R. Kern, “Why do users tag? Detecting
users’ motivation for tagging in social tagging systems,” in International
AAAI Conference on Weblogs and Social Media (ICWSM2010), Wash-
ington, DC, USA, May 23-26, 2010.
[35] C. Koerner, R. Kern, H. P. Grahsl, and M. Strohmaier, “Of categorizers
and describers: An evaluation of quantitative measures for tagging
motivation,” in 21st ACM SIGWEB Conference on Hypertext and Hy-
permedia (HT 2010), Toronto, Canada, ACM, Toronto, Canada, June
[36] E. H. Chi and T. Mytkowicz, “Understanding the efficiency of social
tagging systems using information theory,” in HT ’08: Proceedings of
the nineteenth ACM conference on Hypertext and hypermedia. New
York, NY, USA: ACM, 2008, pp. 81–88.
[37] R. Li, S. Bao, Y. Yu, B. Fei, and Z. Su, “Towards effective browsing
of large scale social annotations,” Proceedings of the 16th international
conference on World Wide Web, p. 952, 2007.
[38] K. Aouiche, D. Lemire, and R. Godin, “Web 2.0 OLAP: From Data
Cubes to Tag Clouds,” 4th International Conference, WEBIST 2008,
vol. 18, 2008.
[39] P. Heymann, A. Paepcke, and H. Garcia-Molina, “Tagging human
knowledge,” in Proceedings of the Third ACM International Conference
on Web Search and Data Mining. New York, NY, USA: ACM, 2010,
pp. 51–61.