About
88 Publications · 20,543 Reads
1,927 Citations
Publications (88)
The early twenty-first century has witnessed massive expansions in availability and accessibility of digital data in virtually all domains of the biodiversity sciences. Led by an array of asynchronous digitization activities spanning ecological, environmental, climatological, and biological collections data, these initiatives have resulted in a ple...
Semantic segmentation has been proposed as a tool to accelerate the processing of natural history collection images. However, developing a flexible and resilient segmentation network requires an approach for adaptation which allows processing different datasets with minimal training and validation. This paper presents a cross-validation approach de...
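The cross-validation idea behind that paper can be pictured with a short sketch: hold out one collection dataset at a time, train on the remaining datasets, and score the held-out set with a mask-overlap metric such as intersection-over-union. The dataset names, the placeholder trainer and predictor, and the metric choice below are illustrative assumptions, not the network or protocol described in the paper.

# Sketch: leave-one-dataset-out validation for a segmentation model.
# Dataset names and the dummy "model" are illustrative only.
import numpy as np

def iou(pred, truth):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0

def train_segmenter(images, masks):
    """Placeholder trainer: learns a single global intensity threshold."""
    thresh = np.mean([img[mask].mean() for img, mask in zip(images, masks)])
    return lambda img: img > thresh          # "model" = thresholding rule

# Hypothetical per-institution datasets: lists of (image, ground-truth mask).
rng = np.random.default_rng(0)
datasets = {
    name: [(rng.random((64, 64)), rng.random((64, 64)) > 0.5) for _ in range(5)]
    for name in ("herbarium_A", "herbarium_B", "herbarium_C")
}

for held_out in datasets:                    # leave one dataset out each round
    train = [pair for name, pairs in datasets.items() if name != held_out for pair in pairs]
    model = train_segmenter(*zip(*train))
    scores = [iou(model(img), mask) for img, mask in datasets[held_out]]
    print(f"{held_out}: mean IoU = {np.mean(scores):.3f}")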
The FAIR principles have been accepted globally as guidelines for improving data-driven science and data management practices, yet the incentives for researchers to change their practices are presently weak. In addition, data-driven science has been slow to embrace workflow technology despite clear evidence of recurring practices. To overcome these...
A key limiting factor in organising and using information from physical specimens curated in natural science collections is making that information computable, with institutional digitization tending to focus more on imaging the specimens themselves than on efficiently capturing computable data about them. Label data are traditionally manually tran...
This paper gives a summary of implementation activities in the realm of FAIR Digital Objects (FDO). It gives an idea which software components are robust and used for many years, which components are comparatively new and are being tested out in pilot projects and what the challenges are that need to be urgently addressed by the FDO community. Afte...
Specimens have long been viewed as critical to research in the natural sciences because each specimen captures the phenotype (and often the genotype) of a particular individual at a particular point in space and time. In recent years there has been considerable focus on digitizing the many physical specimens currently in the world’s natural history...
Approved formally as a TDWG Task Group (TG) in September 2020, TG MIDS is working to harmonise a framework for "Minimum Information about a Digital Specimen (MIDS)". MIDS clarifies what is meant by different levels of digitization (MIDS levels) and specifies the minimum information to be captured at each level. Capturing and presenting data in futu...
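As an illustration of level-based checking, the sketch below tests which digitization level a specimen record reaches, given cumulative sets of required fields per level. The field lists are hypothetical placeholders and are not the element names defined in the MIDS specification.

# Sketch: deciding which digitization level a record reaches.
# The field sets per level are hypothetical, NOT the actual MIDS specification.
LEVEL_FIELDS = {
    0: {"catalogue_number"},
    1: {"scientific_name", "collection_code"},
    2: {"collector", "collection_date", "country"},
    3: {"coordinates", "georeference_method"},
}

def achieved_level(record):
    """Return the highest level whose cumulative required fields are all present."""
    level = -1
    for lvl in sorted(LEVEL_FIELDS):
        required = set().union(*(LEVEL_FIELDS[l] for l in range(lvl + 1)))
        if all(record.get(f) not in (None, "") for f in required):
            level = lvl
        else:
            break
    return level

record = {"catalogue_number": "BM000123", "scientific_name": "Quercus robur",
          "collection_code": "BM", "collector": "A. Person"}
print(achieved_level(record))   # -> 1 with these hypothetical field sets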
International mass digitization efforts through infrastructures like the European Distributed System of Scientific Collections (DiSSCo), the US resource for Digitization of Biodiversity Collections (iDigBio), the National Specimen Information Infrastructure (NSII) of China, and Australia’s digitization of National Research Collections (NRCA Digital...
Transdisciplinary and cross-cultural cooperation and collaboration are needed to build extended, densely interconnected information resources. These are the prerequisites for the successful implementation and execution of, for example, an ambitious monitoring framework accompanying the post-2020 Global Biodiversity Framework (GBF) of the Convention...
International collaboration between collections, aggregators, and researchers within the biodiversity community and beyond is becoming increasingly important in our efforts to support biodiversity, conservation and the life of the planet. The social, technical, logistical and financial aspects of an equitable biodiversity data landscape – from work...
Persistent identifiers (PID) to identify digital representations of physical specimens in natural science collections (i.e., digital specimens) unambiguously and uniquely on the Internet are one of the mechanisms for digitally transforming collections-based science. Digital Specimen PIDs contribute to building and maintaining long-term community tr...
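One way to see what a PID buys in practice is to resolve it programmatically. The sketch below queries the public Handle System REST resolver for the value records behind a handle; the handle string itself is a made-up placeholder, and the endpoint is the generic hdl.handle.net API rather than anything DiSSCo-specific.

# Sketch: resolve a handle through the public hdl.handle.net REST API.
# The handle used below is a fictitious placeholder, not a real Digital Specimen PID.
import json
import urllib.error
import urllib.request

def resolve_handle(handle):
    """Return the value records registered for a handle, or None if it does not resolve."""
    url = f"https://hdl.handle.net/api/handles/{handle}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            doc = json.load(resp)
    except urllib.error.HTTPError:        # unknown handles come back as HTTP 404
        return None
    return doc.get("values", [])

values = resolve_handle("20.5000.1025/EXAMPLE-SPECIMEN")  # placeholder PID
for record in values or []:
    print(record.get("type"), "->", record.get("data", {}).get("value"))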
To support future research based on natural sciences collection data, DiSSCo (Distributed System of Scientific Collections) – the European Research Infrastructure for Natural Science Collections – adopts Digital Object Architecture as the basis for its planned data infrastructure. Using the outputs of one Research Data Alliance (RDA) interest group...
Digitisation is the process of converting analogue data about physical specimens to digital representations that include electronic text, images and other forms. The term has been used diversely within the natural science collections community, and the outputs of different digitisation initiatives can be quite different.
Digitisation of indiv...
In a Biodiversity_Next 2019 symposium, a vision of Digital Specimens based on the concept of a Digital Object Architecture (Kahn and Wilensky 2006) (DOA) was discussed as a new layer between data infrastructure of natural science collections and user applications for processing and interacting with information about specimens and collections. This...
There has been little work to compare and understand the operating costs of digitisation using a standardised approach. This paper discusses a first attempt at gathering digitisation cost information from multiple institutions and analysing the data. This paper has been written: for other digitisation managers who want to break down and compare proj...
We describe an effective approach to automated text digitisation with respect to natural history specimen labels. These labels contain much useful data about the specimen including its collector, country of origin, and collection date. Our approach to automatically extracting these data takes the form of a pipeline. Recommendations are made for the...
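The pipeline idea can be sketched in a few lines: run optical character recognition over a label image, then pull structured fields out of the raw text with simple patterns. The choice of Tesseract (via pytesseract) and the regular expressions and country list below are illustrative assumptions, not the specific components or rules recommended in the paper.

# Sketch: OCR a specimen label and extract a few fields from the raw text.
# Engine choice (Tesseract) and the regexes are illustrative, not the paper's pipeline.
import re
from PIL import Image
import pytesseract   # requires a local Tesseract installation

DATE_RE = re.compile(r"\b(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})\b")
COLLECTOR_RE = re.compile(r"(?:leg\.|coll\.|collected by)\s*([A-Z][\w. -]+)", re.IGNORECASE)

def _first(pattern, text):
    m = pattern.search(text)
    return m.group(1).strip() if m else None

def read_label(image_path, countries=("Brazil", "Kenya", "Indonesia")):
    """OCR the label image and pick out date, collector and country candidates."""
    text = pytesseract.image_to_string(Image.open(image_path))
    return {
        "raw_text": text,
        "date": _first(DATE_RE, text),
        "collector": _first(COLLECTOR_RE, text),
        "country": next((c for c in countries if c.lower() in text.lower()), None),
    }

# Usage (path is a placeholder): print(read_label("label_0001.jpg"))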
Advances in automation, communication, sensing and computation enable experimental scientific processes to generate data at increasingly great speeds and volumes. Research infrastructures are devised to take advantage of these data, providing advanced capabilities for acquisition, sharing, processing, and analysis; enabling advanced research and pl...
We describe an effective approach to automated text digitisation with respect to natural history specimen labels. These labels contain much useful data about the specimen including its collector, country of origin, and collection date. Our approach to automatically extracting these data takes the form of a pipeline. Recommendations are made for the...
DiSSCo, the Distributed System of Scientific Collections, is a pan-European Research Infrastructure (RI) mobilising, unifying bio- and geo-diversity information connected to the specimens held in natural science collections and delivering it to scientific communities and beyond. Bringing together 120 institutions across 21 countries and combining e...
We examine the intersection of the FAIR principles (Findable, Accessible, Interoperable and Reusable), the challenges and opportunities presented by the aggregation of widely distributed and heterogeneous data about biological and geological specimens, and the use of the Digital Object Architecture (DOA) data model and components as an approach to...
DiSSCo – the Distributed System of Scientific Collections – will mobilise, unify and deliver bio- and geo-diversity information at the scale, form and precision required by scientific communities, and thereby transform a fragmented landscape into a coherent and responsive research infrastructure. At present DiSSCo has 115 partners from 21 countries...
With projected lifespans of many decades, infrastructure initiatives such as Europe’s Distributed System of Scientific Collections (DiSSCo), USA’s Integrated Digitized Biocollections (iDigBio), National Specimen Information Infrastructure (NSII) of China and Australia’s digitisation of national research collections (NRCA Digital) aim at transforming...
The definition of a digital specimen is proposed to encompass the digital representation(s) of physical specimens from natural science collections. The digital specimen concept is intended to define a representation (digital object) that brings together an array of heterogeneous data types, which are themselves alternative physical specimen represe...
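As a rough illustration of one object bringing together heterogeneous representations, the sketch below models a digital specimen as a container holding a persistent identifier plus links to images, sequences and annotations. The field names are invented for the example and are not the digital specimen schema proposed in the paper.

# Sketch: a digital specimen as a container for heterogeneous representations.
# Field names are invented for illustration, not the proposed specification.
from dataclasses import dataclass, field

@dataclass
class DigitalSpecimen:
    pid: str                                   # persistent identifier of the digital object
    physical_specimen_id: str                  # catalogue number of the physical specimen
    images: list[str] = field(default_factory=list)       # URLs of specimen images
    sequences: list[str] = field(default_factory=list)    # sequence accession numbers
    annotations: list[dict] = field(default_factory=list) # determinations, georeferences, etc.

    def add_annotation(self, author, body):
        """Attach a community annotation to the specimen's digital record."""
        self.annotations.append({"author": author, "body": body})

ds = DigitalSpecimen(pid="hdl:20.5000.1025/EXAMPLE", physical_specimen_id="BM000123")
ds.images.append("https://example.org/iiif/BM000123/full/max/0/default.jpg")
ds.add_annotation("A. Botanist", "Georeference checked against the label text.")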
Preserved specimens in natural science collections have lifespans of many decades and often, several hundreds of years. Specimens must be unambiguously identifiable and traceable in the face of changes in physical location, changes in organisation of the collection to which they belong, and changes in classification. When digitizing museum collecti...
The paper investigates how to implement open access to data in collection institutions and in the DiSSCo research infrastructure. Large-scale digitisation projects generate lots of images, but data transcription often remains backlogged for years. The paper discusses minimum information standards (MIDS) for digital specimens, and tentatively define...
More and more herbaria are digitising their collections. Images of specimens are made available online to facilitate access to them and allow extraction of information from them. Transcription of the data written on specimens is critical for general discoverability and enables incorporation into large aggregated research datasets. Different methods...
R script used for this paper
R script used to map data from FinBIF API to DwC
Table of specimen data, DOIs and URIs
Python script to upload the dataset to Zenodo
Taxonomic coverage (interactive HTML file)
R Script to compile JSON files from CSV
Essential biodiversity variables (EBV) are information products for assessing biodiversity change. Species populations EBVs are one class of EBVs that can be used to monitor the spread of invasive species. However, systematic, reliable, repeatable procedures to process primary data into EBVs do not yet exist, and environmental research infrastructu...
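A species-populations EBV data product is often summarised as occurrence counts per species, per time slice, per spatial unit. The pandas sketch below builds such a cube from a toy table of occurrence records; the column names and the 1-degree grid are assumptions made for the example, not the workflow implemented in the paper.

# Sketch: aggregate occurrence records into a species x year x grid-cell cube,
# the kind of summary used for species-populations EBVs. Columns and the
# 1-degree grid are illustrative assumptions.
import pandas as pd

records = pd.DataFrame({
    "species":   ["Dreissena polymorpha", "Dreissena polymorpha", "Mnemiopsis leidyi"],
    "latitude":  [52.1, 52.6, 43.2],
    "longitude": [4.3, 4.9, 28.0],
    "year":      [2019, 2020, 2020],
})

# Snap coordinates to a 1-degree grid cell identifier.
records["cell"] = (records["latitude"].floordiv(1).astype(int).astype(str)
                   + "_" + records["longitude"].floordiv(1).astype(int).astype(str))

ebv_cube = (records.groupby(["species", "year", "cell"])
                    .size()
                    .rename("occurrences")
                    .reset_index())
print(ebv_cube)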
Essential Biodiversity Variables (EBV) are fundamental variables that can be used for assessing biodiversity change over time, for determining adherence to biodiversity policy, for monitoring progress towards sustainable development goals, and for tracking biodiversity responses to disturbances and management interventions. Data from observations o...
Interpreting observational data is a fundamental task in the sciences, specifically in earth and environmental science where observational data are increasingly acquired, curated, and published systematically by environmental research infrastructures. Typically subject to substantial processing, observational data are used by research communities,...
DiSSCo (The Distributed System of Scientific Collections) is a Research Infrastructure (RI) aiming at providing unified physical (transnational), remote (loans) and virtual (digital) access to the approximately 1.5 billion biological and geological specimens in collections across Europe. DiSSCo represents the largest ever formal agreement between n...
ICEDIG is a design study for the new research infrastructure Distributed System of Scientific Collections (DiSSCo), focusing on the issues around digitisation of the collections and making their data freely and openly available following the FAIR principles (data being Findable, Accessible, Interoperable, and Re-usable).
As a design study, ICEDIG...
Much biodiversity data is collected worldwide, but it remains challenging to assemble the scattered knowledge for assessing biodiversity status and trends. The concept of Essential Biodiversity Variables (EBVs) was introduced to structure biodiversity monitoring globally, and to harmonize and standardize biodiversity data from disparate sources to...
Background
Making forecasts about biodiversity and giving support to policy relies increasingly on large collections of data held electronically, and on substantial computational capability and capacity to analyse, model, simulate and predict using such data. However, the physically distributed nature of data resources and of expertise in advanced...
Europe is building its Open Science Cloud; a set of robust and interoperable e-infrastructures with the capacity to provide data and computational solutions through cloud-based services. The development and sustainable operation of such e-infrastructures are at the forefront of European funding priorities. The research community, however, is still...
In order to preserve the variety of life on Earth, we must understand it better. Biodiversity research is at a pivotal point with research projects generating data at an ever increasing rate. Structuring, aggregating, linking and processing these data in a meaningful way is a major challenge. The systematic application of information management and...
Environmental research infrastructures (RIs) support data-intensive research by integrating large-scale sensor/observer networks with dedicated data curation services and analytical tools. However the diversity of scientific disciplines coupled with the lack of an accepted methodology for constructing new RIs inevitably leads to incompatibilities b...
Environmental research infrastructures (RIs) support their respective research communities by integrating large-scale sensor/observation networks with data curation services, analytical tools and common operational policies. These RIs are developed as pillars of intra-and interdisciplinary research, however comprehension of the complex, pathologica...
Essential biodiversity variables (EBVs) have been proposed by the Group on Earth Observations Biodiversity Observation Network (GEO BON) to identify a minimum set of essential measurements that are required for studying, monitoring and reporting biodiversity and ecosystem change. Despite the initial conceptualisation, however, the practical impleme...
Marine biological invasions have increased with the development of global trading, causing the homogenization of communities and the decline of biodiversity. A main vector is ballast water exchange from shipping. This study evaluates the use of ecological niche modelling (ENM) to predict the spread of 18 non-indigenous species (NIS) along shipping...
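The general shape of an ecological niche model can be conveyed with a minimal climate-envelope sketch: take the environmental values at known presence points, form a per-variable envelope between low and high percentiles, and flag candidate locations whose values fall inside every envelope. This is a generic Bioclim-style illustration with made-up numbers, not the ENM configuration evaluated in the study.

# Sketch: a minimal Bioclim-style climate envelope model.
# Presence values and candidate sites are made up; this is not the study's ENM setup.
import numpy as np

def fit_envelope(presence_env, low=5, high=95):
    """Per-variable percentile bounds from environmental values at presence points."""
    return np.percentile(presence_env, low, axis=0), np.percentile(presence_env, high, axis=0)

def suitable(candidate_env, bounds):
    """True where every environmental variable falls inside the envelope."""
    lo, hi = bounds
    return np.all((candidate_env >= lo) & (candidate_env <= hi), axis=1)

# Columns: sea-surface temperature (deg C), salinity (PSU) -- toy values.
presence = np.array([[14.0, 34.5], [15.2, 35.0], [13.8, 34.2], [16.1, 35.4]])
candidates = np.array([[15.0, 34.8],    # inside the envelope -> potentially suitable
                       [22.0, 30.0]])   # outside -> unsuitable
bounds = fit_envelope(presence)
print(suitable(candidates, bounds))     # -> [ True False ]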
Ecological Niche Modelling (ENM) Components are a set of reusable workflow components specialized for performing ENM tasks within the Taverna workflow management system. Each component encapsulates specific functionality and can be combined with other components to facilitate the creation of larger and more complex workflows. One key distinguishing...
This paper presents SCRAM–CK, a method to elicit requirements by means of strong user involvement supported by prototyping activities. The method integrates two existing approaches, SCRAM and CK theory. SCRAM provides the framework for requirements management, while CK theory provides a framework for reasoning about design and its evolution. The me...
The interoperability between research infrastructures, including not only cross invocations of services, but also the integration between data schemas, processing models and management policies and security controls, is essential to enable large scale data driven experiments. Analysing functional gaps between research infrastructures and decomposi...
For the upcoming calls for Horizon 2020 research funding, the European Commission has said that it would prefer bids from open, collaborative consortia rather than the competitive bids seen in previous funding programmes. To this end, the organizers of 18 European biodiversity informatics projects agreed at a meeting in Rome…
Much progress has been made in the past ten years to fulfil the potential of biodiversity informatics. However, it is dwarfed by the scale of what is still required. The Global Biodiversity Informatics Outlook (GBIO) offers a framework for reaching a much deeper understanding of the world’s biodiversity, and through that understanding the means to...
The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics,...
Biodiversity informatics plays a central enabling role in the research community's efforts to address scientific conservation and sustainability issues. Great strides have been made in the past decade establishing a framework for sharing data, where taxonomy and systematics has been perceived as the most prominent discipline involved. To some exten...
The European ENVRI project (Common operations of Environmental Research Infrastructures) is addressing common ICT solutions for the research infrastructures as selected in the ESFRI Roadmap. More specifically, the project is looking for solutions that will assist interdisciplinary users who want to benefit from the data and other services of more t...
Frontier environmental research increasingly depends on a wide range of data and advanced capabilities to process and analyse them. The ENVRI project, "Common Operations of Environmental Research infrastructures" is a collaboration in the ESFRI Environment Cluster, with support from ICT experts, to develop common e-science components and services f...
Wireless Sensor Networks (WSNs) produce large quantities of raw data from multiple sources, such as cameras, temperature or humidity sensors. We propose a network that uses the knowledge of domain experts that use the network, as well as previously sensed data, to classify incoming sensed data in near real-time. Using a field centre in Malaysia as...
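A simple way to picture domain-expert knowledge classifying incoming readings is a set of expert-supplied rules applied to each reading as it arrives, with previously sensed data supplying a baseline. The thresholds, sensor fields and labels below are invented for the illustration and are not the network described in the paper.

# Sketch: classify incoming sensor readings with expert rules plus a rolling baseline.
# Thresholds, field names and labels are illustrative only.
from statistics import mean

HISTORY = []          # previously sensed temperatures, used as a rolling baseline

def classify(reading):
    """Label one reading using expert rules and deviation from the historical mean."""
    temp, humidity = reading["temperature"], reading["humidity"]
    baseline = mean(HISTORY) if HISTORY else temp
    HISTORY.append(temp)

    if humidity > 95 and temp < 24:                 # expert rule: likely heavy rain
        return "rain_event"
    if abs(temp - baseline) > 5:                    # expert rule: sudden temperature swing
        return "anomaly"
    return "normal"

stream = [{"temperature": 27.0, "humidity": 80},
          {"temperature": 22.5, "humidity": 97},
          {"temperature": 33.5, "humidity": 70}]
print([classify(r) for r in stream])   # -> ['normal', 'rain_event', 'anomaly']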
The aim of the BioVeL project is to provide a seamlessly connected informatics environment that makes it easier for biodiversity scientists to carry out in-silico analysis of relevant biodiversity data and to pursue in-silico experimentation based on composing and executing sequences of complex digital data manipulations and modelling tasks. In Bio...
To propose a research agenda that addresses technological and other knowledge gaps in developing telemonitoring solutions for patients with chronic diseases, with particular focus on detecting deterioration early enough to intervene effectively.
A mixed methods approach incorporating literature review, key informant, and focus group interviews to g...
Patients with chronic disease may suffer frequent acute deteriorations and associated increased risk of hospitalisation. Earlier detection of these could enable successful intervention, improving patients’ well-being and reducing costs; however, current telemonitoring systems do not achieve this effectively. We conducted a qualitative study using s...
To examine the evidence base for telemonitoring designed for patients who have chronic obstructive pulmonary disease and heart failure, and to assess whether telemonitoring fulfils the principles of monitoring and is ready for implementation into routine settings.
Qualitative data collection using interviews and participation in a multi-path mappin...
There are many promising earth and biodiversity-monitoring projects underway across the globe, but they often operate as information islands, unable to share data easily with others. This is inconvenient: it is a barrier to scientists collaborating on the complex, cross-disciplinary projects that are essential to biodiversity research. Life...
The Species 2000 and ITIS Catalogue of Life aims to create and deliver a catalogue of all known species, using a distributed set of data sources. The current Species 2000 software has developed over a number of years, and the system requirements have evolved substantially over the same period. In this paper we discuss the current Catalogue of Life...
The LifeWatch Reference Model provides the basis for an interoperable ICT infrastructure for European biodiversity research building on standards whenever feasible. Distinguishing features will be support of workflow for scientific in-silico experiments, tracking of provenance, and semantic support for interoperability. This paper presents the key...
A Birds of a Feather (BoF) meeting brought together more than 70 interested participants of the e-Science All Hands Meeting 2008 to consider questions and issues arising from the proposition: 'e-Infrastructure: tool for the elite or tool for everybody?' Contrasting position statements from two practitioners in the field set the scene for multiple b...
The research aim underpinning the Healthcare@Home (HH) information system described here was to enable 'near real time' risk analysis for disease early detection and prevention. To this end, we are implementing a family of prototype web services to 'push' or 'pull' individuals' health-related data via a system of clinical hubs, mobile communicatio...