Utopia documents: linking scholarly literature with research data.

School of Computer Science, Faculty of Life Sciences, University of Manchester, Manchester, UK.
Bioinformatics. 09/2010; 26(18):i568-74. DOI: 10.1093/bioinformatics/btq383

ABSTRACT: In recent years, the gulf between the mass of accumulating research data and the massive literature describing and analyzing those data has widened. The need for intelligent tools to bridge this gap, and to rescue the knowledge being systematically isolated in literature and data silos, is now widely acknowledged.
To this end, we have developed Utopia Documents, a novel PDF reader that semantically integrates visualization and data-analysis tools with published research articles. In a successful pilot with editors of the Biochemical Journal (BJ), the system has been used to transform static document features into objects that can be linked, annotated, visualized and analyzed interactively. Utopia Documents is now used routinely by BJ editors to mark up article content prior to publication. Recent additions include the integration of various text-mining and biodatabase plugins, demonstrating the system's ability to seamlessly integrate online content with PDF articles.

  • ABSTRACT: This paper explores how and why the Linguistic Annotation Framework might be adapted for compatibility with recent, more general proposals for the representation of annotations in the Semantic Web, referred to here as the Open Annotation models. We argue that the adapted model, in addition to being interoperable with other annotations and annotation tools, also resolves some representational limitations and semantic ambiguity of the original data model.
    Proceedings of the Sixth Linguistic Annotation Workshop; 07/2012
  • ABSTRACT: Rich and fine-grained semantic information describing varied aspects of scientific productions is essential to support their diffusion as well as to properly assess the quality of their output. To foster this trend, in the context of the ESWC2014 Semantic Publishing Challenge, we present a system that automatically generates rich RDF datasets from CEUR-WS workshop proceedings. Proceedings are analyzed through a sequence of processing phases. SVM classifiers complemented by heuristics are used to annotate missing CEUR-WS markup. Annotations are then linked to external datasets such as DBpedia and Bibsonomy. Finally, the data is modeled and published as an RDF graph. Our system is provided as an online Web service to support on-the-fly RDF generation. In this paper we describe the system and present its evaluation following the procedure set by the organizers of the challenge.
    Extended Semantic Web Conference 2014, Crete, Greece; 04/2014
  • ABSTRACT: Bioinformatics and computer aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing, and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.
    PeerJ. 01/2014; 2:e524.
