Conference Paper

iCurate: A Research Data Management System

Abstract

Scientific research activities generate large amounts of data, which vary in format, volume, structure and ownership. Although revision control systems and databases have been developed for data archiving, traditional data management methods are not suitable for High-Performance Computing (HPC) systems. The files in such systems do not have semantic annotations and cannot be archived and managed for public dissemination. We have proposed and developed a Research Data Management (RDM) system, ‘iCurate’, which provides easy-to-use RDM facilities with semantic annotations. The system incorporates Metadata Retrieval, Departmental Archiving, a Workflow Management System, Metadata Validation and Self Inferencing. The ‘i’ emphasises the user-oriented design. iCurate will support researchers by annotating their data in a clearer, machine-readable way from production to publication for future reuse.

... Approaches for extracting metadata at the application level, especially for HPC, are harder to find. One of them is iCurate, the research data management system at the University of Huddersfield, UK [36]. It is tailored to HPC data and offers automated metadata extraction. ...
Article
The deluge of dark data is about to happen. Lacking data management capabilities, especially in the field of supercomputing, and missing data documentation (i.e., missing metadata annotation) constitute a major source of dark data. The present work contributes to addressing this challenge by presenting ExtractIng, a generic automated metadata extraction toolkit. Existing metadata of simulation output files, scattered throughout the file system, can be aggregated, parsed and converted to the EngMeta metadata model. Use cases from computational engineering are considered to demonstrate the viability of ExtractIng. The evaluation results show that the metadata extraction is simulation-code independent in the sense that it can handle data outputs from various fields of science, is easy to integrate into simulation workflows and is compatible with a multitude of computational environments.
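The aggregate-and-parse step described in this abstract can be illustrated with a minimal sketch. It is not ExtractIng's implementation and does not produce EngMeta: the `key: value` line format, the `.log` suffix, and the function names `parse_key_values` and `aggregate_metadata` are all assumptions made for illustration.

```python
import os


def parse_key_values(text):
    """Parse 'key: value' lines (an assumed format) into a dict."""
    pairs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            pairs[key.strip()] = value.strip()
    return pairs


def aggregate_metadata(root, suffix=".log"):
    """Walk `root` and merge metadata from every file ending in `suffix`.

    Files are visited top-down in sorted order, so a key found in a
    later file overwrites the value from an earlier one.
    """
    merged = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as handle:
                    merged.update(parse_key_values(handle.read()))
    return merged
```

Calling `aggregate_metadata("/path/to/simulation/output")` would return one flat dictionary for the whole directory tree. Mapping that dictionary onto a concrete metadata model is a separate step and is left out here.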
... The importance and advantages of RDM are already widely known, and universities have started building their own checklists and platforms to provide, among other things, data sharing, mainly to collect their data and to make today's data available to their future researchers [13], [30], [31]. Existing platforms often do not meet the special needs of different research institutions and fields of study, such as sufficient storage and security safeguards [9], and as a result universities, for example, develop their own platforms with individual research data management guidelines [32]. The University of Rochester (New York), as an example, spent US$200,000 on designing and implementing a digital archive to manage the data of its researchers [24]. ...
Conference Paper
The ongoing digitalization of academic work processes has led to a shift in academic work culture in which researchers are expected to take on more responsibility for adequate data management. Third-party funding institutions as well as high-ranking journals are increasingly asking for standardized data management processes and have started to set up policies to guide researchers in managing their data properly. In this work, we deal with the highly IS-relevant topic of research data management (RDM) and provide an overview of the existing research data management guidelines of the eight largest government-funded institutions and the largest politically independent institution. All existing guidelines of those institutions were considered in a qualitative analysis, summarized and evaluated. It has been found that non-technical requirements evolve into non-technical barriers, which institutions need to address to a greater extent within their guidelines to promote scientific research. This work shows the shift in the understanding of RDM and provides the present perspective, which helps researchers to better understand the ongoing trend of RDM within science.
... They introduce persistent identifiers to reference the data. iCurate is a data management system that is not based on OAIS but has a strong focus on the management layer and metadata [17]: annotation, retrieval, and validation of metadata should be possible. The system is adapted to the HPC workflow: users can specify some metadata directly in the Portable Batch System (PBS) file, and this metadata is added to the output files when the job is complete. ...
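The idea of declaring metadata inside the PBS job file can be sketched as follows. This is only an illustration, not iCurate's actual mechanism: the `#ICURATE key=value` comment syntax and the function names `parse_job_metadata` and `sidecar_text` are hypothetical, since the excerpt does not describe how iCurate marks metadata in the job script.

```python
# Hypothetical directive prefix; the real iCurate syntax is not given in the source.
META_PREFIX = "#ICURATE"


def parse_job_metadata(script_text):
    """Collect key=value pairs from lines that start with the assumed prefix."""
    metadata = {}
    for line in script_text.splitlines():
        line = line.strip()
        if not line.startswith(META_PREFIX):
            continue  # ordinary shell lines and #PBS directives are skipped
        body = line[len(META_PREFIX):].strip()
        key, sep, value = body.partition("=")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    return metadata


def sidecar_text(metadata):
    """Render metadata as sorted 'key: value' lines, e.g. for a sidecar file."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(metadata.items()))


EXAMPLE_SCRIPT = """#!/bin/bash
#PBS -N cfd_run
#PBS -l walltime=01:00:00
#ICURATE project=turbine-study
#ICURATE creator=J. Doe
./simulate input.cfg
"""

if __name__ == "__main__":
    print(sidecar_text(parse_job_metadata(EXAMPLE_SCRIPT)))
```

Because the annotations are shell comments, the scheduler would ignore them and the job script would still run unchanged. A post-job hook could then write the rendered text next to each output file.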
Conference Paper
This paper targets the challenges of research data management with a focus on High Performance Computing (HPC) and simulation data. The main challenges are discussed: the Big Data qualities of HPC research data, technical data management, and organizational and administrative challenges. From these challenges, requirements for feasible HPC research data management are derived and an alternative data life cycle is proposed. The requirement analysis includes recommendations based on a modified OAIS architecture: to meet the HPC requirement of a scalable system, metadata and data must not be stored together. Metadata keys are defined and organizational actions are recommended. Moreover, this paper contributes by introducing the role of a Scientific Data Manager, who is responsible for the institution’s data management and takes stewardship of the data.
Article
This article offers a twofold introduction to the JISC-funded SWORD Project which ran for eight months in mid-2007. Firstly it presents an overview of the methods and madness that led us to where we currently are, including a timeline of how this work moved through an informal working group to a lightweight, distributed project. Secondly, it offers an explanation of the outputs produced for the SWORD Project and their potential benefits for the repositories community. SWORD, which stands for Simple Web service Offering Repository Deposit, came into being in March 2007 but was preceded by a series of discussions and activities which have contributed much to the project, known as the 'Deposit API'. The project itself was funded under the JISC Repositories and Preservation Programme, Tools and Innovation strand, with the over-arching aim of scoping, defining, developing and testing a standard mechanism for depositing into repositories and other systems. The motivation was that there was no standard way of doing this currently and increasingly scenarios were arising that might usefully leverage such a standard.