Utopia documents: Linking scholarly literature with research data

School of Computer Science, Faculty of Life Sciences, University of Manchester, Manchester, UK.
Bioinformatics (Impact Factor: 4.98). 09/2010; 26(18):i568-74. DOI: 10.1093/bioinformatics/btq383
Source: PubMed


In recent years, the gulf between the mass of accumulating research data and the massive literature describing and analyzing those data has widened. The need for intelligent tools to bridge this gap, and to rescue the knowledge being systematically isolated in literature and data silos, is now widely acknowledged.
To this end, we have developed Utopia Documents, a novel PDF reader that semantically integrates visualization and data-analysis tools with published research articles. In a successful pilot with editors of the Biochemical Journal (BJ), the system has been used to transform static document features into objects that can be linked, annotated, visualized and analyzed interactively. Utopia Documents is now used routinely by BJ editors to mark up article content prior to publication. Recent additions include the integration of various text-mining and biodatabase plugins, demonstrating the system's ability to integrate online content seamlessly with PDF articles.



Available from: Teresa K Attwood
    • "Increased interest in the LOD has been seen in various sectors e.g. Education (Dietze et al., 2013; Piedra et al., 2014), Scientific research (Attwood et al., 2010), libraries (Hannemann & Kett, 2010; Howarth, 2012), Government (Ding et al., 2011; Hendler et al., 2012; Shadbolt et al., 2012), Cultural heritage (Edelstein et al., 2013) and many others, however, the religious sector has yet to cache upon the power of the linked open data. "

    Full-text · Conference Paper · Oct 2015
    • "Not just academic software, but also commercial software exists and is actively marketed to pharmaceutical and biotech companies for this kind of entity extraction (e.g. the Linguamatics textmining suite [4]. Once the text-to-vocabulary mappings have been achieved, they may then serve as the basis for popups and visualizations [2] [25], and/or alerting systems based on researcher or industrial interest. It is clearly essential in this approach, shown conceptually in Figure 1, to employ robust entity recognition algorithms based on sound ontologies. "
    ABSTRACT: Two complementary models for biomedical literature-data integration are presented: entity-based and argument-based. We believe the argument-based model is a novel application in this domain and can be exceptionally useful in providing better support than currently exists for robust and reproducible science. We describe both approaches, along with some current models and available tools for scientific literature annotation. We then show how argument graphs, represented as stand-off annotation on research articles, can help improve the robustness of scientific findings over time.
    Full-text · Conference Paper · Jul 2015
    • "Although an increasing number of journals today require the data used to derive the results as prerequisite for publication (e.g., f1000), the steps on how these data have been assembled from primary data and how the data have been processed during the analysis are often hidden. Losing the link between primary data, derived data products, and knowledge results in a "gulf" between primary data repositories and knowledge repositories (Shotton 2009; Attwood et al. 2010). "
    ABSTRACT: We are witnessing a growing gap separating primary research data from derived data products presented as knowledge in publications. Although journals today more often require the underlying data products used to derive the results as a prerequisite for a publication, the important link to the primary data is lost. However, documenting the postprocessing steps of data, linking the primary data with derived data products, has the potential to increase the accuracy and the reproducibility of scientific findings significantly. Here, we introduce the rBEFdata R package as companion to the collaborative data management platform BEFdata. The R package provides programmatic access to features of the platform. It allows users to search for data and integrates the search with external thesauri to improve the data discovery. It allows users to download and import data and metadata into R for analysis. A batched download is available as well, which works along a paper proposal mechanism implemented by BEFdata. This feature of BEFdata allows users to group primary data and metadata and streamlines discussions and collaborations revolving around a certain research idea. The upload functionality of the R package, in combination with the paper proposal mechanism of the portal, allows users to attach derived data products and scripts directly from R, thus addressing major aspects of documenting data postprocessing. We present the core features of the rBEFdata R package along an ecological analysis example and further discuss the potential of postprocessing documentation for data, linking primary data with derived data products and knowledge.
    Full-text · Article · Jul 2015 · Ecology and Evolution