Social tagging in the life sciences: Characterizing a new metadata resource for bioinformatics

Heart + Lung Institute at St. Paul's Hospital, University of British Columbia, Vancouver, Canada.
BMC Bioinformatics (Impact Factor: 2.58). 09/2009; 10(1):313. DOI: 10.1186/1471-2105-10-313


Academic social tagging systems, such as Connotea and CiteULike, provide researchers with a means to organize personal collections of online references with keywords (tags) and to share these collections with others. One of the side-effects of the operation of these systems is the generation of large, publicly accessible metadata repositories describing the resources in the collections. In light of the well-known expansion of information in the life sciences and the need for metadata to enhance its value, these repositories present a potentially valuable new resource for application developers. Here we characterize the current contents of two scientifically relevant metadata repositories created through social tagging. This investigation helps to establish how such socially constructed metadata might be used as it stands currently and to suggest ways that new social tagging systems might be designed that would yield better aggregate products.
We assessed the metadata that users of CiteULike and Connotea associated with citations in PubMed using the following metrics: coverage of the document space, density of metadata (tags) per document, rates of inter-annotator agreement, and rates of agreement with MeSH indexing. CiteULike and Connotea were very similar on all of the measurements. In comparison to PubMed, document coverage and per-document metadata density were much lower for the social tagging systems. Inter-annotator agreement within the social tagging systems and agreement between the aggregated social tagging metadata and MeSH indexing were both low, though the latter could be increased through voting.
The most promising uses of metadata from current academic social tagging repositories will be those that find ways to utilize the novel relationships between users, tags, and documents exposed through these systems. For more traditional kinds of indexing-based applications (such as keyword-based search) to benefit substantially from socially generated metadata in the life sciences, more documents need to be tagged and more tags are needed for each document. These issues may be addressed both by finding ways to attract more users to current systems and by creating new user interfaces that encourage more collectively useful individual tagging behaviour.
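The four metrics named in the abstract, together with the voting step, can be illustrated with a minimal sketch. The abstract does not state which agreement measures were used, so the Jaccard index for inter-annotator agreement and per-document precision for agreement with the reference indexing are assumptions made here for illustration. The data, user names, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: document id -> {user -> set of tags}.
taggings = {
    "pmid:1": {"alice": {"genomics", "yeast"}, "bob": {"yeast", "review"}},
    "pmid:2": {"alice": {"proteomics"}},
}
# Hypothetical reference indexing (standing in for MeSH): document id -> terms.
reference = {
    "pmid:1": {"yeast", "genomics"},
    "pmid:2": {"proteomics", "mass spectrometry"},
    "pmid:3": {"apoptosis"},
}


def jaccard(a, b):
    """Size of the intersection over size of the union (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coverage(taggings, corpus_ids):
    """Fraction of the corpus that has at least one tagging."""
    corpus = set(corpus_ids)
    return len(set(taggings) & corpus) / len(corpus)


def density(taggings):
    """Mean number of distinct tags per tagged document."""
    distinct = [set().union(*users.values()) for users in taggings.values()]
    return sum(len(tags) for tags in distinct) / len(distinct)


def inter_annotator_agreement(taggings):
    """Mean Jaccard score over all pairs of users tagging the same document."""
    scores = [
        jaccard(a, b)
        for users in taggings.values()
        for a, b in combinations(users.values(), 2)
    ]
    return sum(scores) / len(scores) if scores else 0.0


def voted_tags(users, min_votes):
    """Tags applied to one document by at least min_votes distinct users."""
    counts = {}
    for tags in users.values():
        for tag in tags:
            counts[tag] = counts.get(tag, 0) + 1
    return {tag for tag, n in counts.items() if n >= min_votes}


def precision_vs_reference(taggings, reference, min_votes=1):
    """Mean per-document fraction of voted tags found in the reference terms.

    Documents with no tag reaching min_votes are skipped.
    """
    scores = []
    for doc, users in taggings.items():
        tags = voted_tags(users, min_votes)
        if tags:
            scores.append(len(tags & reference.get(doc, set())) / len(tags))
    return sum(scores) / len(scores) if scores else 0.0


cov = coverage(taggings, reference)
dens = density(taggings)
iaa = inter_annotator_agreement(taggings)
prec_all = precision_vs_reference(taggings, reference, min_votes=1)
prec_voted = precision_vs_reference(taggings, reference, min_votes=2)
print(cov, dens, iaa, prec_all, prec_voted)
```

In this toy data, requiring two votes keeps only the tag that both users applied, which raises precision against the reference terms while discarding most tags. That trade-off is one reason sparse tagging limits how far voting can help.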



Available from: Joseph T Tennis
    • "They provide researchers with means of content organization using keywords (search tags) [4]. The use of Medical Subject Headings (MeSH) search tags has proven to be crucial in linking PubMed to other resources. "
    ABSTRACT: The importance of searching biomedical literature for drug interactions and side effects is apparent. Current digital libraries (e.g., PubMed) suffer from infrequent tagging and metadata annotation updates. Such limitations leave literature unlinked to new scientific evidence, which poses considerable challenges for scientists searching biomedical repositories. In this paper, we present a network mining approach that provides a bridge for linking and searching drug-related literature. Our contributions here are twofold: (1) an efficient algorithm called HashPairMiner to address the run-time complexity issues demonstrated in its predecessor algorithm, HashnetMiner, and (2) a database of discoveries hosted on the web to facilitate literature search using the results produced by HashPairMiner. Though the K-H Network model and the HashPairMiner algorithm are fairly young, their outcome is evidence of the considerable promise they offer to the biomedical science community in general and the drug research community in particular. Copyright © 2015. Published by Elsevier Inc.
    No preview · Article · Jun 2015 · Journal of Biomedical Informatics
    • "However, the limited resources may still add some values as additional index terms in users' information seeking process. In fact, it was reported that some social tags not overlapping with MeSH terms were actually good representation of the topical contents of their associated articles [3] "
    ABSTRACT: This paper presents our ongoing study of the current/future impact of social bookmarks (or social tags) on information retrieval (IR). Our main research question asked in the present work is "How are social tags compared with conventional, yet reliable manual indexing from the viewpoint of IR performance?". To answer the question, we look at the biomedical literature and begin with examining basic statistics of social tags from CiteULike in comparison with Medical Subject Headings (MeSH) annotated in the Medline bibliographic database. Then, using the data, we conduct various experiments in an IR setting, which reveals that social tags work complementarily with MeSH and that retrieval performance would improve as the coverage of CiteULike grows.
    Full-text · Conference Paper · Jan 2010
    • "We match tags and terms for every single document and then calculate average values [1]. This is done to avoid mismatches between tags and metadata of different documents, as we believe term overlap between docsonomy and respective metadata is more valuable than overlap between folksonomy and the entire metadata collection. "
    ABSTRACT: Qualitative journal evaluation makes use of cumulated content descriptions of single articles. These can either be represented by author-generated keywords, professionally indexed subject headings, automatically extracted terms or by reader-generated tags as used in social bookmarking systems. It is assumed that particularly the users' view on article content differs significantly from the authors' or indexers' perspectives. To verify this assumption, title and abstract terms, author keywords, Inspec subject headings, KeyWords Plus™ and tags are compared by calculating the overlap between the respective datasets. Our approach includes extensive term preprocessing (i.e. stemming, spelling unifications) to gain a homogeneous term collection. When term overlap is calculated for every single document of the dataset, similarity values are low. Thus, the presented study confirms the assumption, that the different types of keywords each reflect a different perspective of the articles' contents and that tags (cumulated across articles) can be used in journal evaluation to represent a reader-specific view on published content.
    Full-text · Article ·