Article

A call for social tagging datasets


Abstract

Tagging represents a new, user-driven form of indexing and labeling resources on the web. The notion of "social tagging" usually refers to web-based systems that are supporting users in collaboratively tagging and sharing resources, such as Delicious, Flickr and others. In recent years, social tagging systems have emerged as an interesting alternative to labeling and linking resources on the web. This development has created an interesting opportunity for the Hypertext research community to gain new perspectives and understanding about the dynamics and nature of tagging and linking in large scale, participative Hypertext systems.


... Since the work on the extraction of formal semantics from folksonomies became an active topic of research in the Semantic Web and related communities, researchers have realised that the lack of gold standards and benchmarks significantly hinders progress in this area: "The current lack of transparency about available social tagging datasets holds back research progress and leads to a number of problems including issues of efficiency, reproducibility, comparability and ultimately validity of research" [37]; and, in particular regarding finding semantics in folksonomies, "We have noticed the lack of testbeds and standard evaluation metrics that allow proper comparisons of the different research works" [29, 37, 5]. Indeed, as reported in [37], some datasets of different folksonomies are available; however, none of them is yet annotated with links to an ontology disambiguating the tags' meanings (their semantics). For instance, in the del.icio.us ...
Article
Full-text available
The aim of the INSEMTIVES project is to involve users more heavily in the generation of semantic content, i.e., content with machine-processable formal semantics. The goal of Workpackage 2 (Models and Methods for the Creation and Usage of Lightweight, Structured Knowledge) is to develop models and methods for storing and processing the semantic content produced by users, as well as for helping the user in the annotation process. Because the end user is not assumed to be knowledgeable in the semantic technologies field, these models need to be suitable for storing lightweight semantic content that, for example, can be generated by an ordinary user as part of her everyday activities. The previous deliverables of this Workpackage proposed models and methods based on the requirements collected from the use case partners and on an analysis of the state of the art. These deliverables are: D2.1.1 (Report on the state-of-the-art and requirements for annotation representation models), D2.1.2 (Specification of models for representing single-user and community-based annotations of Web resources), D2.2.1 (Report on methods and algorithms for bootstrapping Semantic Web content from user repositories and reaching consensus on the use of semantics), D2.2.2/D2.2.3 (Report on methods and algorithms for linking user-generated semantic annotations to the Semantic Web and supporting their evolution in time), D2.3.1 (Requirements for information retrieval (IR) methods for semantic content), and D2.3.2 (Specification of information retrieval (IR) methods for semantic content). The proposed models and methods were then validated against evolved requirements from the use case partners, and areas for refinement were identified. This deliverable provides a detailed account of the results of the validation and of the refinements that need to be introduced to the models and algorithms.
In particular, the following algorithms are detailed in this deliverable: (i) the semantic convergence algorithm, which supports the computation of concepts from user annotations and the positioning of these concepts in an ontology; (ii) the annotation evolution algorithm, which recomputes links from annotations to the underlying ontology as the ontology evolves; (iii) the summarization algorithm, which computes short summaries for concepts from the ontology to help users decide which concepts to use in the annotation process; and (iv) the semantic search algorithm, which uses the underlying ontology to provide the user with more relevant results. The algorithms are described at a reproducible level of detail, and their relation to the state of the art is reported wherever possible. The deliverable also presents a platform for creating gold standards for semantic annotation systems and describes a gold-standard dataset that was created using the platform and used for the evaluation of some of the proposed algorithms. To the best of our knowledge, this is the first attempt to develop such a platform facilitating the creation of gold-standard datasets for annotation systems in the Semantic Web community. The aforementioned dataset has been exported to RDF and is currently undergoing inclusion into the Linking Open Data cloud. The platform and the dataset represent a valuable contribution to the community, where the need for gold-standard datasets that can be used for a comparative analysis of existing approaches has been recognised. This is the concluding deliverable on annotation models and methods in Workpackage 2. Further refinements of the models and methods will be reported in publications in scientific conferences, journals, and other venues.
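The annotation evolution idea described in the deliverable abstract above, recomputing links from annotations to the ontology as the ontology evolves, can be sketched as follows. This is an illustrative toy, not the deliverable's actual algorithm; all names (`Annotation`, `evolve_annotations`, the concept IDs) are invented for this example.

```python
# Hypothetical sketch: annotations store links to ontology concept IDs,
# and when the ontology evolves (e.g., a concept is merged into another),
# those links are recomputed to point at the surviving concept.

from dataclasses import dataclass

@dataclass
class Annotation:
    user: str
    resource: str
    concept_id: str  # a link into the ontology, not a free-text tag

def evolve_annotations(annotations, merged_into):
    """Remap annotations whose concept was merged into another concept.
    merged_into maps an obsolete concept ID to its replacement."""
    for ann in annotations:
        while ann.concept_id in merged_into:  # follow chains of merges
            ann.concept_id = merged_into[ann.concept_id]
    return annotations

anns = [Annotation("alice", "http://example.org/doc1", "c:java_island"),
        Annotation("bob", "http://example.org/doc1", "c:java_lang")]
# The ontology maintainers merge the obsolete concept into a newer one.
evolve_annotations(anns, {"c:java_island": "c:java_place"})
print([a.concept_id for a in anns])  # → ['c:java_place', 'c:java_lang']
```

A real system would also handle concept splits and deletions, which require user interaction rather than a mechanical remapping.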
... To the best of our knowledge, our work is the first to consider processing kNN and range queries using both crowdsourced data and computation in incomplete distributed databases with an accuracy guarantee. Owing to the benefits of involving the crowd in solving problems efficiently and effectively, crowdsourcing has been investigated in recent years for data collection [24, 25], query processing [26, 27] and recommendation systems [28, 29]. In [24, 25], the authors proposed techniques to extract crowdsourced datasets from social systems such as bookmarking or tagging systems. In [30], the authors developed a technique to collect comprehensive spatio-temporal data by involving a limited number of vehicles, whereas the focus of the mobile crowdsensing framework [31] is to reduce the load on the crowd for manually collecting data while preserving the accuracy of the urban data. ...
Article
Full-text available
With the proliferation of mobile devices and wireless technologies, location-based services (LBSs) are becoming popular in smart cities. Two important classes of LBSs are nearest neighbor (NN) queries and range queries, which provide users with information about the locations of points of interest (POIs) such as hospitals or restaurants. Answers to these queries are more reliable and satisfactory if they come from a trustworthy crowd instead of traditional location service providers (LSPs). We introduce an approach to evaluate NN and range queries with crowdsourced data and computation that eliminates the role of an LSP. In our crowdsourced approach, a user evaluates LBSs in a group. It may happen that group members do not have knowledge of all POIs in a certain area. We present efficient algorithms to evaluate queries with an accuracy guarantee over incomplete databases. Experiments show that our approach is scalable and incurs low computational overhead.
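The core idea of the abstract above, answering a nearest-neighbor POI query from a group's pooled knowledge instead of a location service provider, can be sketched minimally. This is not the paper's algorithm: the merging, distance metric, and all names below are illustrative, and the incompleteness handling and accuracy guarantee are omitted.

```python
# Minimal sketch: each group member contributes the POIs they know,
# the partial reports are merged, and the k closest POIs to the query
# point are returned.

import heapq
import math

def crowd_knn(member_reports, query, k):
    """member_reports: list of dicts {poi_name: (x, y)}, one per member."""
    merged = {}
    for report in member_reports:  # union of everyone's partial knowledge
        merged.update(report)
    dist = lambda p: math.hypot(p[0] - query[0], p[1] - query[1])
    return heapq.nsmallest(k, merged.items(), key=lambda kv: dist(kv[1]))

alice = {"hospital_a": (1.0, 1.0), "cafe_b": (4.0, 0.0)}
bob = {"hospital_a": (1.0, 1.0), "restaurant_c": (0.5, 0.5)}
print(crowd_knn([alice, bob], query=(0.0, 0.0), k=2))
# → [('restaurant_c', (0.5, 0.5)), ('hospital_a', (1.0, 1.0))]
```

Because each member's database is incomplete, the real challenge (which the paper addresses and this sketch does not) is bounding the error when even the merged view may be missing POIs.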
... In 2010, Ipeirotis et al. gathered all available information from Amazon Mechanical Turk by computing daily statistics for new projects and completed tasks once a day, and shared the dataset with the public. Körner and Strohmaier [47] posted a list of social tagging datasets made available for research. ...
Conference Paper
Full-text available
Crowdsourcing has evolved as a distributed problem-solving and business production model in recent years. In the crowdsourcing paradigm, tasks are distributed to networked people to complete, such that a company's production cost can be greatly reduced. In 2003, Luis von Ahn and his colleagues pioneered the concept of "human computation", which utilizes human abilities to perform computation tasks that are difficult for computers to process. Later, the term "crowdsourcing" was coined by Jeff Howe in 2006. Since then, much work has focused on different aspects of crowdsourcing, such as computational techniques and performance analysis. In this paper, we survey the literature on crowdsourcing, categorized according to applications, algorithms, performance and datasets. This paper provides a structured view of the research on crowdsourcing to date.
... While much research in recent years has investigated social tagging systems and the resulting folksonomic structure, or has mined users' interests to support personalization in those systems, there is still little information about how a user's background (e.g., research discipline, gender, location, ...) becomes manifest in the tags they use. This is mainly due to a lack of profile information in social tagging datasets (see the call for social tagging datasets by Körner et al. [4] for details). Consequently, little is known about how users' background information is reflected in the tags used. ...
Article
Full-text available
In social tagging systems, the tagging activities of users leave behind a huge amount of implicit information about them. Users choose tags for the resources they annotate based on their interests, background knowledge, personal opinions and other criteria. While existing research on mining social tagging data has mostly focused on gaining a deeper understanding of users' interests and the emerging structures in those systems, little work has yet been done to use the rich implicit information in tagging activities to unveil to what degree users' tags convey information about their background. The automatic inference of user background information can be used to complete user profiles, which in turn supports various recommendation mechanisms. This work illustrates the application of supervised learning mechanisms to analyze a large online corpus of tagged academic literature for the extraction of user characteristics from tagging behavior. As a representative example of a background characteristic, we mine the user's research discipline. Our results show that tags convey rich information that can help the designers of those systems to better understand and support their prolific users - users that tag actively - beyond their interests.
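The supervised-learning setup described above, predicting a user's research discipline from their tags, can be illustrated with a toy model. The paper's actual features and classifier are not specified here; this sketch substitutes a simple nearest-centroid classifier over tag frequencies, and the training data is entirely invented.

```python
# Illustrative sketch: infer a user's research discipline from tag usage.
# A nearest-centroid model over tag counts stands in for the paper's
# actual learning setup.

from collections import Counter

def centroid(profiles):
    """Average tag-frequency vector over a list of Counter profiles."""
    total = Counter()
    for p in profiles:
        total.update(p)
    return {t: c / len(profiles) for t, c in total.items()}

def predict(discipline_centroids, user_tags):
    """Return the discipline whose centroid best matches the user's tags."""
    freq = Counter(user_tags)
    score = lambda cent: sum(freq[t] * w for t, w in cent.items())
    return max(discipline_centroids,
               key=lambda d: score(discipline_centroids[d]))

train = {
    "computer_science": [Counter(["python", "algorithms", "web"]),
                         Counter(["semantics", "web", "ontology"])],
    "biology": [Counter(["genome", "protein", "cell"]),
                Counter(["cell", "evolution", "protein"])],
}
centroids = {d: centroid(ps) for d, ps in train.items()}
print(predict(centroids, ["web", "ontology", "python"]))  # → computer_science
```

Note the connection to the paper's point about prolific users: the more a user tags, the more stable their frequency vector becomes, so active taggers are easier to classify.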
... However, these tags are totally uncontrolled and their semantics are not explicit. In the current datasets, for instance the ones provided by Tagora or listed in [10], no one has yet, to the best of our knowledge, provided a gold standard with such semantics. In that, the del.icio.us ...
Article
Full-text available
Corporate portals are an integral part of the enterprise infrastructure, facilitating the creation, sharing, discovery and consumption of enterprise assets through blogs, news, forums, documents and information in general. However, as the amount of data grows, it becomes much more difficult to access the right asset at the precise moment when it is needed. Annotation systems have tried to address this problem to a certain extent by allowing users to collaboratively annotate assets with tags, so that these assets can be found more easily by reusing the tags in queries. However, this model often falls short due to mismatches between the vocabularies of different users, the use of synonymous, polysemous, or more specific or general terms in tagging and searching, and so on. In this paper we: (a) provide an analysis of the corporate portal of the Telefonica group; (b) identify requirements for a semantics-enabled annotation system capable of addressing the above-mentioned shortcomings; (c) define a semantic annotation model that meets the requirements; (d) provide the details of the implementation of the annotation model for the Telefonica portal; and (e) provide an initial evaluation of the use of the semantic annotation system in the enterprise.
Article
Full-text available
Mobile crowdsensing serves as a critical building block for emerging Internet of Things (IoT) applications. However, the sensing devices continuously generate large amounts of data, which consumes substantial resources (e.g., bandwidth, energy, and storage) and may degrade the Quality-of-Service (QoS) of applications. Prior work has demonstrated that there is significant redundancy in the content of the sensed data. By judiciously reducing redundant data, data size and load can be significantly reduced, thereby lowering resource cost, facilitating the timely delivery of unique and possibly critical information, and enhancing QoS. This article presents a survey of existing work on mobile crowdsensing strategies, with an emphasis on reducing resource cost and achieving high QoS. We start by introducing the motivation for this survey and present the necessary background on crowdsensing and IoT. We then present various mobile crowdsensing strategies and discuss their strengths and limitations. Finally, we discuss future research directions for mobile crowdsensing for IoT. The survey addresses a broad range of techniques, methods, models, systems, and applications related to mobile crowdsensing and IoT. Our goal is not only to analyze and compare the strategies proposed in prior work, but also to discuss their applicability to the IoT and provide guidance on future research directions for mobile crowdsensing.
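One common way to exploit the redundancy mentioned above is to suppress readings that differ only marginally from the last transmitted value (a dead-band filter). This is a generic illustration of the redundancy-reduction idea, not a specific scheme from the surveyed papers; the function name and threshold are invented.

```python
# Dead-band filtering sketch: only forward a sensor reading upstream
# when it has moved more than `threshold` since the last value actually
# sent, suppressing near-duplicate readings.

def deadband_filter(readings, threshold):
    last_sent = None
    for r in readings:
        if last_sent is None or abs(r - last_sent) > threshold:
            last_sent = r
            yield r

temps = [21.0, 21.1, 21.05, 22.3, 22.4, 25.0, 25.1]
print(list(deadband_filter(temps, threshold=0.5)))  # → [21.0, 22.3, 25.0]
```

The trade-off is exactly the one the survey highlights: a larger threshold saves more bandwidth and energy but risks dropping information an application needs.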
Conference Paper
Full-text available
Social annotation systems such as del.icio.us, Flickr and others have gained tremendous popularity among Web 2.0 users. One factor in their success is the simplicity of the underlying model, which consists of a resource (e.g., a web page), a tag (e.g., a text string), and a user who annotates the resource with the tag. However, due to the syntactic nature of the underlying model, these systems have been criticised for being unable to take into account the semantics implicitly encoded by users in each tag. In this article we: a) provide a formalisation of an annotation model in which tags are based on concepts instead of free text strings; b) describe how an existing annotation system can be converted to the proposed model; c) report on the results of such a conversion on the example of a del.icio.us dataset; and d) show how the quality of search can be improved by the semantics in the converted dataset.
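The shift described above, from free-text tags to concept-based annotations, can be sketched as a data-model change. The sense inventory and all identifiers below are invented for illustration; a real conversion would disambiguate tags against an actual ontology or sense repository rather than take a caller-chosen index.

```python
# Sketch of a concept-based annotation model: instead of a
# (user, resource, free-text tag) triple, an annotation links to a
# concept, so "java" the language and "java" the island stay distinct.

from dataclasses import dataclass

@dataclass(frozen=True)
class ConceptAnnotation:
    user: str
    resource: str
    concept: str  # e.g., an ontology URI or a WordNet-style sense ID

# Hypothetical sense inventory mapping a tag string to candidate concepts.
SENSES = {"java": ["concept:java_programming_language",
                   "concept:java_island"]}

def convert(user, resource, tag, chosen_sense=0):
    """Convert a syntactic tag into a concept annotation by picking one
    of its candidate senses (a real system would disambiguate)."""
    concept = SENSES.get(tag, ["concept:" + tag])[chosen_sense]
    return ConceptAnnotation(user, resource, concept)

ann = convert("alice", "http://example.org/tutorial", "java")
print(ann.concept)  # → concept:java_programming_language
```

Once annotations carry concepts, search can match at the concept level, so a query for the programming language no longer retrieves pages tagged with the island sense.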
Article
Also published in: Moor, Aldo de et al. (eds.): Proceedings of the First Conceptual Structures Tool Interoperability Workshop at the 14th International Conference on Conceptual Structures. Aalborg: Universitetsforlag, 2006, pp. 87-102. Social bookmark tools are rapidly emerging on the Web. In such systems, users set up lightweight conceptual structures called folksonomies. The reason for their immediate success is that no specific skills are needed to participate. In this paper we specify a formal model for folksonomies and briefly describe our own system, BibSonomy, which allows sharing of both bookmarks and publication references in a kind of personal library.
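The formal folksonomy model referred to above is commonly given in this line of work as a tuple F = (U, T, R, Y) of users, tags, and resources, together with a ternary tag-assignment relation Y ⊆ U × T × R. A direct sketch of that structure (the helper name and example data are invented):

```python
# A folksonomy as a tuple (U, T, R, Y): the sets of users, tags and
# resources, plus the ternary relation Y of tag assignments.

def make_folksonomy(assignments):
    """assignments: iterable of (user, tag, resource) triples."""
    Y = set(assignments)
    U = {u for u, _, _ in Y}
    T = {t for _, t, _ in Y}
    R = {r for _, _, r in Y}
    return U, T, R, Y

U, T, R, Y = make_folksonomy([
    ("alice", "semweb", "http://example.org/p1"),
    ("bob", "folksonomy", "http://example.org/p1"),
    ("alice", "folksonomy", "http://example.org/p2"),
])
# Tags a given user assigned to a given resource, straight from Y:
print({t for u, t, r in Y if u == "alice" and r == "http://example.org/p2"})
```

Viewing Y as a tripartite hypergraph is what makes folksonomies amenable to graph-based analysis such as ranking and community detection.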
H. Grahsl, C. Körner, M. Strohmaier: A Collection of Tagging Datasets Containing Complete Personomies From Heterogeneous Sources. To be published in 2010.