Utopia documents: Linking scholarly literature with research data

School of Computer Science, Faculty of Life Sciences, University of Manchester, Manchester, UK.
Bioinformatics (Impact Factor: 4.98). 09/2010; 26(18):i568-74. DOI: 10.1093/bioinformatics/btq383
Source: PubMed


In recent years, the gulf between the mass of accumulating research data and the massive literature describing and analyzing those data has widened. The need for intelligent tools to bridge this gap, to rescue the knowledge being systematically isolated in literature and data silos, is now widely acknowledged.
To this end, we have developed Utopia Documents, a novel PDF reader that semantically integrates visualization and data-analysis tools with published research articles. In a successful pilot with editors of the Biochemical Journal (BJ), the system has been used to transform static document features into objects that can be linked, annotated, visualized and analyzed interactively (http://www.biochemj.org/bj/424/3/). Utopia Documents is now used routinely by BJ editors to mark up article content prior to publication. Recent additions include integration of various text-mining and biodatabase plugins, demonstrating the system's ability to seamlessly integrate on-line content with PDF articles.



Available from: Teresa K Attwood
  • Source
    • "Not just academic software, but also commercial software exists and is actively marketed to pharmaceutical and biotech companies for this kind of entity extraction (e.g. the Linguamatics text-mining suite [4]). Once the text-to-vocabulary mappings have been achieved, they may then serve as the basis for popups and visualizations [2] [25], and/or alerting systems based on researcher or industrial interest. It is clearly essential in this approach, shown conceptually in Figure 1, to employ robust entity recognition algorithms based on sound ontologies."
    ABSTRACT: Two complementary models for biomedical literature-data integration are presented: entity-based and argument-based. We believe the argument-based model is a novel application in this domain and can be exceptionally useful in providing better support than currently exists for robust and reproducible science. We describe both approaches, along with some current models and available tools for scientific literature annotation. We then show how argument graphs, represented as stand-off annotation on research articles, can help improve the robustness of scientific findings over time.
    First International Workshop on Capturing Scientific Knowledge; 07/2015
  • Source
    • "Although an increasing number of journals today require the data used to derive the results as prerequisite for publication (e.g., f1000), the steps on how these data have been assembled from primary data and how the data have been processed during the analysis are often hidden. Losing the link between primary data, derived data products, and knowledge results in a "gulf" between primary data repositories and knowledge repositories (Shotton 2009; Attwood et al. 2010)."
    ABSTRACT: We are witnessing a growing gap separating primary research data from derived data products presented as knowledge in publications. Although journals today more often require the underlying data products used to derive the results as a prerequisite for a publication, the important link to the primary data is lost. However, documenting the postprocessing steps of data, linking the primary data with derived data products, has the potential to increase the accuracy and the reproducibility of scientific findings significantly. Here, we introduce the rBEFdata R package as a companion to the collaborative data management platform BEFdata. The R package provides programmatic access to features of the platform. It allows users to search for data and integrates the search with external thesauri to improve data discovery. It allows data and metadata to be downloaded and imported into R for analysis. A batched download is also available, which works along a paper proposal mechanism implemented by BEFdata. This feature of BEFdata allows primary data and metadata to be grouped and streamlines discussions and collaborations revolving around a certain research idea. The upload functionality of the R package, in combination with the paper proposal mechanism of the portal, allows derived data products and scripts to be attached directly from R, thus addressing major aspects of documenting data postprocessing. We present the core features of the rBEFdata R package along an ecological analysis example and further discuss the potential of postprocessing documentation for data, linking primary data with derived data products and knowledge.
    Ecology and Evolution 07/2015; 5(14). DOI:10.1002/ece3.1547 · 2.32 Impact Factor
  • Source
    • "As shown by the survey conducted in [2], Enhanced Publications (EPs) can be generally conceived as digital publications "enriched with" or "linking to" related research results, such as research data, workflows, software, and possibly connections among them. Enhanced Publication Information Systems (EPISs) are systems devised for the management of EPs [3] [4] [5] [6] [7] [8] [19]. The majority of those systems are tailored to their specific communities and realized "from scratch" so that functionalities that are shared across disciplines and user communities are re-implemented every time. "
    ABSTRACT: Enhanced publications (EPs) can be generally conceived as digital publications "enriched with" or "linking to" related research results, such as research data, workflows, software, and possibly connections among them. Enhanced Publication Information Systems (EPISs) are information systems devised for the management of EPs in specific application domains. Currently, no framework supporting the realization of EPISs is known, and EPISs are typically realized "from scratch" by integrating general-purpose technologies (e.g. relational databases, file stores, triple stores) and Digital Library oriented software (e.g. repositories, cataloguing systems). Such an approach is doomed to entail non-negligible realization and maintenance costs that could be decreased by adopting a more systemic approach. The framework proposed in this work addresses this task by providing EPIS developers with EP management tools that facilitate their efforts by hiding the complexity of the underlying technologies.
    D-Lib Magazine 01/2015; 21(1/2). DOI:10.1045/january2015-bardi