Data Curation Activities in Research Data Repositories: Best Practices

Although a massive amount of research data is generated through modern-day research activities, lack of awareness regarding documentation, metadata rendering, versioning, file format selection, data cleaning, secure storage options, data deposit templates and making research data easily accessible through research data repositories has resulted in invaluable research data becoming lost and/or rejected. Proper management and sharing of research data increase the access, impact, and efficiency of research activities. Therefore, as required by the guidelines of government funding agencies, academic and research institutions have started to establish research data repository platforms. However, to build effective research data repositories, appropriate data curation activities are needed before uploading and publishing data. The current study has been undertaken with the primary objective of presenting the best data curation activities in research data repositories. The study further gives an overview of the software tools and applications available for various data curation activities, viz. data cleaning, metadata creation, editing images/videos, storing data, identifying and validating data files, applications for data curation, and data indexing for searches, among other core activities. The authors further identify the roles of interdisciplinary librarians, data generators and data providers in performing the best data curation activities in research data repositories. The methodology of the study was guided by content analysis of literature on data curation activities and the role of interdisciplinary librarians in data repositories. Additionally, data about the software and application tools available for data curation was collected through web surveys.
The outcome of this study will benefit the key stakeholders in adopting best practices for data curation in research data repositories, enabling research data sharing and reuse. It will also help in developing the skills and competencies required to fulfil the role of interdisciplinary librarians in data curation activities.
Arcle
Proceedings ICSTRDA
Copyright (c) 2021: Advanced Research Publicaons
Keywords: Data Curation, Data Cleaning, Research Data Repositories,
Data Librarian, Data Generator
Corresponding Author:
Manu TR, Adani Institute of Infrastructure,
Ahmedabad, Gujarat, India.
E-mail Id:
manutr91@gmail.com
How to cite this article:
Manu TR, Gala B. Data Curation Activities in
Research Data Repositories: Best Practices.
Proceedings ICSTRDA 2021; 43-51.
Date of Submission: 2021-02-09
Date of Acceptance: 2021-03-01
Data Curation Activities in Research Data
Repositories: Best Practices
Manu TR1, Bhakti Gala2
1Library, Adani Institute of Infrastructure, Ahmedabad, Gujarat, India.
2School of Library and Information Science, Central University of Gujarat, Gandhinagar, India.
Virtual International Conference on
Statistical Tools and Techniques for Research Data Analysis ICSTRDA 2021
21 & 22 January 2021
Introduction
Research data curation is a process that considers data needs
for current and prospective use, focusing on consultation
and solutions for improved access, data protection,
citation and documentation. Data curation is the “active
and on-going management of data through its lifecycle
of interest and usefulness to scholarship, science and
education” through activities that “enable data discovery
and retrieval, maintain quality, add value and provide for
re-use over time” (Library, 2021). The best data curation
activities are required to enable research data sharing
and reuse through research data repositories. The University
of California library states that the process of research data
curation includes metadata and documentation, file and
folder organization, storage and preservation, version
control, the Carpentries, data dictionaries, etc. (University of
California, 2021). Frequently used terms in research data
curation include archive, preservation, back-up, file formats, file
sharing, Creative Commons, agreement, license, metadata,
data repositories, non-proprietary file format, persistent
identifiers and standards. The eight major data curation steps
are receiving, appraising and selecting, processing, ingesting
and storing, describing (with appropriate metadata),
facilitating access, preserving and reusing research data.
Librarians are increasingly expected to play a role in data
curation, where they can assist researchers by maintaining
and adding value to research data for current and future use.
A librarian can take measures to ensure that data is
documented, maintained and accessed through the proper
channel. Nowadays librarians are also called data curators,
digital curators, data analysts, data managers, metadata
creators, etc., according to their roles and responsibilities
within data curation practices. Since data curation is an area
of interdisciplinary research and practice, librarians need to
develop knowledge of the research lifecycle, data policy, data
curation and subject-specific data management, which poses
new challenges for them. The future of data curation is
crucial, so ICT units in organizations and libraries, with their
preservation skills and repository experience, need to work
together. Thus, library professional associations worldwide
have developed education and training frameworks for the
skills and competencies librarians require to extend their
services into data curation. Librarians can play a vital role
in the current data curation system by creating policy and
data deposit templates, preserving data, administering
infrastructure, and establishing collaboration networks
among data generators to understand data curation
needs and the importance of data management and sharing.
Data Curation
Data curation is the technical function that ensures research
datasets are stored and managed in ways that promote
ongoing integrity and accessibility. It is the activity
of managing data throughout its lifecycle, appropriately
maintaining its integrity and authenticity, and ensuring that
it is properly appraised, selected, securely stored and
made accessible, while remaining usable in subsequent
technology environments. The understanding of data, as well
as research results and data acquisition and manipulation
processes, must also be curated. Research data curation can
be performed by individuals, departments or groups,
institutions, communities, disciplines, publishers, national
services and third-party services (Rusbridge, 2007). Data
librarians, data curators and research communities play a
significant role in data curation by appraising the value
of data for long-term preservation. The term appraisal refers
to the method of identifying digital content’s permanent
value for long-term preservation. Appraisal in data curation
has therefore been closely linked to data repository
or data archival policies on research data management
(Ogier, Nicholls and Spee, 2017).
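One common way to support the "ongoing integrity" described in this section is to record a checksum for every deposited file and compare it again later. The following is a minimal sketch of that idea, not a method described by the authors; the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum at deposit time."""
    return {
        p.relative_to(dataset_dir).as_posix(): sha256_of(p)
        for p in sorted(dataset_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(dataset_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or no longer match."""
    current = build_manifest(dataset_dir)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A curator could store the manifest next to the data set at ingest and run `changed_files` during periodic audits; an empty list suggests that the stored files are unchanged.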
Research Data Repository
A data repository (also referred to as a research data
repository) is a searchable interfacing entity that can store,
manage, maintain and curate data/digital objects. It
manages the location where research data is registered,
permanently stored, made accessible and retrievable, and
curated (Johnston, 2017). Research data repositories are
an opportunity for librarians to leverage their expertise in
curation, outreach and preservation while strengthening
their long-standing relationships with academic
departments in order to implement robust repositories
that satisfy the needs of their communities (Gerwig, 2017).
Treloar and Wilkinson (2008) argue that research data
repositories should support easy access to data and
other information in reliable and consistent forms. Data
repositories are gradually replacing the institutional
repositories of universities (Gowen, Meier, 2020), because
institutional repository platforms (EPrints, DSpace, Digital
Commons, OJS) also support data discovery, provenance,
access controls, access, identity management, auditing of
use, accountability and impact (Alsaad, O’Hara, Carr, 2019;
Macgregor, 2020).
The re3data.org Registry of Research Data Repositories,
which indexes research data repositories and offers
services to researchers, funding organizations, libraries
and publishers, was launched on 28th May 2013. It indexes more
than 2,300 research data repositories from around the world and
presents them by type, such as institutional, disciplinary,
multidisciplinary and project-specific repositories
(Pampel, H., et al., 2013). These repositories are
sponsored by governments, funding agencies, academic
institutions, professional societies and scholarly publishers
(Goben & Sandusky, 2020). The FAIRsharing registry is also
a collection of public research data repositories, which
provides standards, databases and policies collections/
recommendations (Suhr, et al., 2020).
The key factors for research data repositories in encouraging
research data deposit and sharing are to make the repository
more visually appealing; carry out tailored and continuous
advocacy; demonstrate statistics and international interest;
get good visibility in Google’s search engine results; build
strong community support for the repository; be mindful of
terminology and the language of ‘repository-speak’; make the
deposit process as easy and streamlined as possible, saving time
with data entry and avoiding duplication of effort; and support
good practice in managing research data, owning IPR
and clearing third-party copyright (Gramstadt, 2012). JISC
has encouraged the creation of several repositories in the UK,
such as EThOS (http://www.ethos.ac.uk/), JorumOpen
(http://www.jorum.ac.uk/) and Depot (http://depot.edina.ac.uk/),
which provide usage services, preservation services and
shared infrastructure services (Jacobs, Thomas,
& McGregor, 2008).
Literature Review
Data curation is not a new term, being well established
in art and museum practices (Rusbridge, 2007). However,
it is relatively new in relation to research data and was
first used in Russian literature (Kosinov, et al., 2019).
Research data curation goes beyond data management,
as it comprises additional services to preserve and add
value within the research lifecycle of the research project
and beyond, i.e. by enabling reuse (Partlo, Symons and
Carlson, 2015). It adds value and increases the quality
of data (Ali, 2019), and as an active and on-going data
management activity throughout the research lifecycle, it
enables data “authentication, archiving, management,
preservation, retrieval and representation” (GEO, 2015). The
Digital Curation Centre (DCC) guide on how to develop research
data management (RDM) services describes the roles and
responsibilities of the individual stakeholders who can deliver
RDM services (Jones, Pryor, Whyte, 2013). The DCC takes a broad
view of data curation and is concerned mainly with sustainability
and exit strategy, data resources, access, re-use, preserving and
archiving, and time scales (Rusbridge, 2007). Curating research data
is part of the scholarly record and is recognized by research
funders, government agencies and research institutes
(Bryant, Lavoie, Malpas, 2017). Digital curation is defined
as “maintaining and adding value to a trusted body of digital
information for future and current use, specifically, the
active management and appraisal of data over the entire
lifecycle” (Ogier, Nicholls, Spee, 2017).
A research report by Rusbridge (2007) identified the
top data curation activities as documentation, secure
storage, metadata, data visualization, versioning, file
format transforms, quality assurance, software registry,
contextualization, code review, persistent identifiers and
file audit for research data. Bielefeld University library
has formulated a minimal data quality framework, the “Data
Irreproducibility Analyzer (DIRA)”, for checking data quality
(Schirrwagen, et al., 2019). UiT University library provides
data curation services including data collection,
description and organizing, analysis, archiving, sharing and
re-collection (Ali, 2019).
Nowadays, data processing in data curation may
range from simple calculations made in a spreadsheet
editor to distributed processes that handle data using dedicated
software and hardware tools (Miksa, Rauber, 2015). Data
processing calls for comprehensive guidance; a few of the
processes are to build a workflow for curation and re-usability,
keep data that can be processed, make ownership
and allowable uses clear, and make the data citable (Rusbridge, 2007).
The level of staffing and skills for curating data is key
to research data curation; curated data is easier for fellow
researchers and future collaborators to understand and
more likely to be trusted (Johnston et al., 2017).
A librarian requires further skills in subject
knowledge, IT knowledge, legal knowledge, ethics and research
lifecycle awareness, including data curation skills and data
description and documentation skills (Schmidt, Shearer,
2016). The broad skills that a librarian requires for RDM
implementation are providing access to data, advocacy and
support for managing data, and managing data collections.
Additionally, the librarian’s role also relates to open access
and institutional repositories, collection development,
advisory services (copyright, policies, etc.), information
literacy, digital curation, digital preservation and digital
collections.
Objectives
The objectives of this research study centre on
presenting the best data curation activities in research
data repositories for enabling effective research data sharing
and reuse. The broad objectives of the study are as follows:
• To identify the best practices of the data curation
process in research data repositories
• To examine the interdisciplinary role of librarians and the
skills and competencies required for providing data
curation services
• To highlight the responsibilities of data generators and
data providers
• To give an overview of the software and application
tools available for various data curation activities
Methodology
The researchers adopted a qualitative research
methodology for the study. It was guided by content analysis
of published literature. The Scopus citation and bibliographic
database and the Google Scholar scholarly search engine were
used to find literature on data curation activities in
research data repositories and on interdisciplinary librarians’
roles, skills and competencies required to perform research
data curation activities. Literature was also analyzed on
the primary role and responsibilities of the data generator/
provider in the data curation process, to ensure that data
access is made available through the proper channel. A web survey
was undertaken to identify and collect software and application
tools available for data curation activities, including data
repositories, data cleaning, metadata schemas, data identifier
schemas, controlled vocabularies, creating and editing
metadata, editing images or videos, storing data, identifying
and validating data files, transferring data, indexing data
for searches, tracking and measuring data, internet web
applications, etc.
Findings and Discussion
Data Curation Process
The effective implementation of the data curation process
fundamentally requires professional skills, education,
domain knowledge and IT skills on the part of the data librarian,
researcher and other stakeholders (Goben, Raszewski, 2015;
Wiljes, Cimiano, 2019). Information professionals,
data stewards, data libraries, data curators, IT departments,
metadata experts and IR managers aspire to skills in
data storing, managing, archiving and research data sharing,
metadata creation and metadata analysis. When a data set is
received from a researcher, the following aspects are considered.
General aspects of the data set:
• How many files are there?
• What is the total size of the data set?
• What are the file formats, and what software is needed to open the files?
• What is the stage of the data (e.g., raw, processed, etc.)?
• Is there documentation available?
• Who owns the copyright for this data?
• What related metadata standards are commonly used in the data or the field?
Data sharing concerns:
• Who are the coauthors of the data?
• Who funded this research, and are there agency requirements for sharing?
• What are the institutional obligations for data release?
• Is there potentially patentable information?
• What licenses, if any, would the data be released under?
• Are there sharing concerns, such as protecting the identity of human subjects?
• What are the goals for dissemination (e.g., worldwide access, researcher-only access)?
• Are there existing repositories in the field that you find and download data from?
Long-term value:
• Are there existing publications that make use of or cite the data?
• Will the data change or be added to over time? How often?
• Are there alternative file formats recommended for deposit? (e.g., the data curator
may recommend a format for preservation purposes.)
• Is the data easy or difficult to reproduce, and why?
• What is the reuse potential of this data?
• When, if ever, should the data be withdrawn or destroyed?
To answer these questions, a set of data curation activities is
needed. Figure 1 and Table 1 present the major data
curation processes, the major activities, the interdisciplinary
librarians and their required skill sets, and the responsibilities
of the data generator in data curation.
Figure 1. Data Curation Process
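The intake questions asked when a data set is received can be kept as a structured record so that unanswered items are easy to spot. The sketch below is illustrative only: the class `DepositIntake`, its field names and the helper `unanswered` are hypothetical and cover just a subset of the questions raised in this section.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DepositIntake:
    """Answers collected when a data set arrives (hypothetical field names)."""
    file_count: Optional[int] = None
    total_size_mb: Optional[float] = None
    file_formats: Optional[list] = None
    data_stage: Optional[str] = None            # e.g. "raw" or "processed"
    documentation_available: Optional[bool] = None
    copyright_owner: Optional[str] = None
    funder: Optional[str] = None
    license: Optional[str] = None
    involves_human_subjects: Optional[bool] = None
    access_goal: Optional[str] = None           # e.g. "worldwide" or "researcher only"


def unanswered(intake: DepositIntake) -> list:
    """Return the names of questions that still have no answer."""
    return [f.name for f in fields(intake) if getattr(intake, f.name) is None]


# Usage: a partially completed intake record
intake = DepositIntake(file_count=12, data_stage="raw", funder="Example Agency")
print(unanswered(intake))
```

A curator might use the resulting list as the agenda for a follow-up interview with the data generator.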
Table 1. Data Curation Process Best Practices, Interdisciplinary Librarian’s Role, Skill Set and
Responsibilities of Data Generator and Researcher
Columns: Data curation process | Major activities | Interdisciplinary librarians | Skill set | Possible responsibilities of data generator
Data curation process: Preparation for data curation
Major activities: Creating local policies and rules; Data curation workflow wizard; Building infrastructure and facilities; Skills and knowledge for data curation; Assign curation responsibilities to appropriate data curator; Build and plan data curation structure in data repository
Interdisciplinary librarian: Head - Data Librarian
Skill set: Understanding data curation lifecycle; Strategic understanding and influencing skills; Understanding the best practices of data curation; Knowledge of data curation activities; Soft skills, time management
Possible responsibilities of data generator: Collaboration with data curator, librarian, metadata specialist, data repository manager and other stakeholders; Taking part in creating internal data curation policies, rules and workflow; Volunteering for data curation activities
Data curation process: Understanding needs of data curation
Major activities: Understanding researcher needs; Interviewing data generator; Consulting researcher and data provider; Conceptualize data with data provider; Provide outreach for data repository and data curation
Skill set: Knowledge of researcher’s needs and available datasets; Negotiation and communication skills; Coordination skills across institutions; Ability to communicate and collaborate; Ability to work with data curation team
Possible responsibilities of data generator: Conceptualize data with data curators; Selection of data to be curated and published; Identification of data-generating sources
Data curation process: Data receiving and gathering
Major activities: Data receiving policy for researcher; Deposit rights and license agreement; Facilities for data deposit; Data deposit template; Data gathering with minimum available metadata and documentation; Acknowledgement of data receipt
Interdisciplinary librarian: Data Curator
Skill set: Knowledge of data deposit template creation; Negotiation skills; Coordination skills across institutions; Familiarity with research data; Academic research methods; Ability to handle data complexity and diversity; Skills in computer science
Possible responsibilities of data generator: Adhere to data deposit policy; Deposit data files in required template with data sources; Provide the minimum metadata of data; Provide data access rights; Provide Creative Commons license
Data curation process: Data cleaning and appraisal
Major activities: Understanding of data: stage of data (e.g. raw and processed etc.), size, types, format etc.; Data validation and verification; Data cleaning from statistical perspective; Data classification; Identifying qualitative & quantitative errors; Error repairing, migration; Consider key risk factors
Interdisciplinary librarian: Data Scientist / Data Analyst
Skill set: Data quality literacy skills; Ability in data cleaning and error detection; Familiarity with research data cleaning tools and applications; Familiarity with data analysis and statistical tools; Ability to understand and measure data quality; Knowledge of geospatial data and software; Knowledge about data manipulation technologies
Possible responsibilities of data generator: Convert the file formats to non-proprietary formats; Data classification by stage of data, size, format, types; Minimize the errors; Extend operational support for error repairing; Extend support in data analysis
Data curation process: Data documentation and metadata
Major activities: Data preparation for preservation: formatting, file organization, packaging; Metadata creation, description; Structural and descriptive metadata; Disciplinary metadata standards / schema; Metadata tools and resources; Identifier schemas, controlled vocabularies; Use of application tools available for data curation
Interdisciplinary librarian: Metadata Specialist / Subject specialist
Skill set: Familiarity with metadata elements, standards, schemas and tools; Skills in metadata creation; Knowledge of data preparation methods, organization; Knowledge of disciplinary metadata standards, particularly for research data; Knowledge of subject area; Skills in creating metadata standards according to data sets; Familiarity with identifier schemas, controlled vocabularies
Possible responsibilities of data generator: Familiarity with metadata structure, standards and schema; Metadata creation for data; Unique and proper file naming system; File arrangement in hierarchy system; Work with metadata specialists to create appropriate metadata
Data curation process: Long term data preservation
Major activities: Needs of long-term preservation; Selecting dataset for long-term preservation; Rights and permissions; Data storage and security; Long-term preservation value; Plan and standardization
Interdisciplinary librarian: Data Manager
Skill set: Skills in license agreement and copyright; Long-term data management skills; Knowledge of policies for preserving data
Possible responsibilities of data generator: Valid datasets for long-term preservation; Provide rights and permission for archive; Secured authentication for confidential data preservation
Data curation process: Data publishing in repository
Major activities: Find trusted data repositories; Building open data repository; Assist with required data security features; Transfer the processed data files to the repository; Data sharing policies; Key components of data publishing
Interdisciplinary librarian: Repository Manager
Skill set: Technical skills; Software required for the repository; Developing and managing data repositories; Knowledge of server and digital library architecture; Managing data repository on a daily basis; Team work skills; Maintain and update repository software
Possible responsibilities of data generator: Suggest and find discipline data repositories; Allow data sharing and publishing; Ensuring deposited data is published in repository; Work with data curator and repository manager to upload dataset into repository
Data curation process: Data access and reuse
Major activities: Searchable and discoverable facilities; Annotating data for relevant entities; Optimizing data for search engines; Keeping data up to date in mirror repository; Data embargo, authenticated access; Monitor data reuse, data citations; Consider post-ingest review techniques; Provide customer support
Skill set: User access guide and data promotion skills; Data attention platforms; Search engine optimization; Data repository index in scholarly search engines; Enabling automatic process for indexing; Optimizing data repository content
Possible responsibilities of data generator: Make sure published data is accessible to users; Embargoed access; Provide request-based confidential data access; Monitoring data citations received; Promotion of data access; Searching / browsing / downloading; Share recommended citations and contribute citations data
Data curation process: Re-evaluation of data
Major activities: Evaluate or view research data; Withdrawal of data from repository; Ensuring future usability; Enabling data citations; Keeping data regularly up to date
Interdisciplinary librarian: Data Curator
Skill set: Evaluating value of data; Skills to identify outdated data
Possible responsibilities of data generator: Evaluate the data published in data repository; Ensure long-term data quality
Data curation process: Analyzing data usage
Major activities: Monitor the usage, downloads, views, citations; Managing descriptive statistics of data usage; Data usage tracking; Data external users, researcher, funder, agencies
Interdisciplinary librarian: Data Manager
Skill set: Usage analysis and review skills; Skills in enabling data usage tools; Enabling Altmetrics; Improve access experience; Skills in metrics
Possible responsibilities of data generator: Dissemination of data sets; Social networking / sharing / tagging; Data citing
Data curation process: Continuing practices
Major activities: Educating researchers; Providing workshops for data analysis tools; Outreach program for data curators and institute community; Adopting best practices of data curation; Learning future technologies
Interdisciplinary librarian: Head - Data Librarian
Skill set: Data literacy skills; Event organizational skills; Tutorial and training module development skill; Networking skills; Subject experts
Possible responsibilities of data generator: Taking part in workshops and training programs; Identify the best practices in data curation
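The "Data cleaning and appraisal" stage of Table 1 mentions data validation and identifying qualitative and quantitative errors. As a rough illustration of what such checks can look like for a tabular file, the sketch below flags missing values, non-numeric entries in numeric columns and duplicate rows. The function name `find_issues`, the sample data and the three checks chosen are assumptions for illustration, not a procedure prescribed by the study.

```python
import csv
import io


def find_issues(csv_text: str, numeric_columns: set) -> list:
    """Return (row_number, message) pairs; row 1 is the header line."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    issues, seen = [], set()
    for row_number, record in enumerate(records, start=2):
        if len(record) != len(header):
            issues.append((row_number, "wrong number of fields"))
            continue
        values = [value.strip() for value in record]
        for column, text in zip(header, values):
            if text == "":
                issues.append((row_number, f"missing value in '{column}'"))
            elif column in numeric_columns:
                try:
                    float(text)
                except ValueError:
                    issues.append((row_number, f"non-numeric value in '{column}'"))
        key = tuple(values)
        if key in seen:
            issues.append((row_number, "duplicate of an earlier row"))
        seen.add(key)
    return issues


# Hypothetical sample data with one blank, one text-in-number and one duplicate
sample = """site,temperature_c,observer
A,21.5,Rao
B,,Mehta
C,warm,Shah
A,21.5,Rao
"""
for row_number, message in find_issues(sample, {"temperature_c"}):
    print(row_number, message)
```

Checks like these only detect problems; deciding how to repair them would normally involve the data generator, as the table suggests.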
Research Data Librarians: Interdisciplinary
Roles and Competencies
The research data curation process is typically defined as a
set of activities that require the involvement of multi-skilled
information professionals; it covers data curation
planning, data acquisition, data cleaning,
analysis, data publishing, long-term preservation, etc.
Interdisciplinary skilled information professionals are therefore
required to implement best practices in data curation
activities. As mentioned in Table 1, the major information
professionals include the data librarian, curation librarian,
digital collections curator, digital content strategist, data
management consultant, data curation librarian, digital
projects designer, repository specialist, technical analyst,
repository coordinator, data curation specialist,
metadata specialist, system administrator and
software developer.
A strong skill set and competencies are required for
such interdisciplinary librarians to develop best
practices in data curation activities in the research data
repository. The major skills include an understanding of the
data curation lifecycle; familiarity with research data (e.g.,
the ability to handle data complexity and diversity); collection
management skills; metadata knowledge, particularly for
research data; technical knowledge of repository software, servers
and their architecture; and an understanding of disciplinary
metadata, workflow and academic research. These skills help
librarians to build or plan the data curation structure in their data
repository, consult with data providers and connect them
to metadata specialists or repository managers, facilitate
communication across different entities, reach out to and
educate the campus community, work with data providers to
add metadata and upload data into the data repository,
help data providers create appropriate metadata for
their datasets, support researchers and help
in the management of data repositories.
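To make "creating appropriate metadata for a dataset" concrete, the sketch below builds a simple Dublin Core record with Python's standard library. It assumes the 15-element Dublin Core Metadata Element Set and its namespace `http://purl.org/dc/elements/1.1/`; the wrapper element `metadata`, the function name `dublin_core_record` and all field values are hypothetical placeholders, not taken from the study.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Serialize {element: [values]} as XML, rejecting unknown elements."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only
record = dublin_core_record({
    "title": ["Example survey responses"],
    "creator": ["Example Researcher"],
    "type": ["Dataset"],
    "format": ["text/csv"],
    "rights": ["CC BY 4.0"],
})
print(record)
```

A repository would typically require a richer, discipline-specific schema, so a record like this is best read as the minimum that a data provider and metadata specialist might agree on first.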
Responsibilities of Data Generator
Data generators and also called data providers are primary
stakeholders of the research data curaon pracces in
instutes. The major data generator of the instute includes
faculty members, postdoc researchers, researchers, graduate
students, undergraduate students and other aicted
researchers who have been involved in creang research
data from a sampled data sources in both qualitave and
quantave studies. Therefore data generator has huge
role to play in the research establishment of best pracces
in the research data curaon. Data generator have to
collaboravely work with data librarian, data curator, data
scienst, metadata specialist, data manager and repository
manger to make sure research data published through
research data repositories and enabling data sharing reuse.
Gradually, researchers have been managing and organizing
the data created to easily retrieving when it is required. But,
for publishing research in a repository the researcher has
to work with the data curaon team to properly preserve
50
Virtual Internaonal Conference on
Stascal Tools and Techniques for Research Data Analysis ICSTRDA 2021
21 & 22 January 2021
research data. As the data owner, the researcher has to
help the metadata specialist define the metadata elements
for the data they generate. The various types of data
created, such as raw data, text documents (e.g., Word, PDF,
LaTeX, TXT), spreadsheets (e.g., Excel), slides (e.g.,
PowerPoint), audio, audio-visual material, images, laboratory
notes, statistical data, and databases (e.g., Access, MySQL),
require different metadata elements to describe them.
As mentioned above in Table 1, along with metadata
creation, the data generator has a key role to play with the
Head-Data Librarian in creating internal data curation
policies, rules, and workflows, conducting outreach
activities, and identifying the best practices in data curation.
The data generator further has to deposit the research
data in a template made by the data curator, transfer
data access rights, convert file formats to nonproprietary
formats, and assist with data classification, error
minimization, data analysis support, browsing/downloading, etc.
The data generator also helps promote the research data
published through a research data repository by sharing it
through social networking sites, scholarly search engines,
and data repository directories, to increase data citations
and disseminate the research data widely across the world.
The data generator can also evaluate the data published
in the data repository to ensure long-term data quality
and that up-to-date data is available for user access. The data
generator holds the right to share confidential data
with users; users therefore have to make a formal request to
the data generator through the data repository to obtain
confidential data. Table 1 gives an overview of the key roles
of the data generator in each stage of data curation.
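To illustrate how a data generator and a metadata specialist might record metadata elements for one dataset, the following is a minimal sketch that builds a simple Dublin Core record with Python's standard library. The function name `build_dc_record`, the wrapping `metadata` element, and all field values are hypothetical and chosen only for illustration; a real repository would follow its own schema and deposit template.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen element names of simple Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def build_dc_record(fields):
    """Build an XML string from a dict of element name -> list of values.

    Hypothetical helper: rejects names outside simple Dublin Core so that
    typos in element names are caught before deposit.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple Dublin Core element: {name}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; not taken from any real deposit.
spreadsheet_record = build_dc_record({
    "title": ["Soil moisture readings, field plot A"],
    "creator": ["Example Researcher"],
    "type": ["Dataset"],
    "format": ["text/csv"],
    "date": ["2021-01-21"],
    "rights": ["Confidential; access on request to the data generator"],
})
print(spreadsheet_record)
```

A record for an audio file or an image would use the same elements with different `type` and `format` values, which is one simple way in which metadata differs across data types.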
Software and Application Tools for Data Curation
The data curation process is a set of technical and non-technical
activities, and implementing it in a research data repository
requires various software application tools. Many open source
and proprietary software application tools are available for
each stage of the data curation process. Table 2 presents the
major application software available for data curation in a
research data repository. A data librarian can use a best practice
template to prepare a draft of the data curation process, and
online survey/interview tools can be used to understand
data curation needs. A data deposit template helps in
gathering research data from the data generators along with
the required additional details. Members of a data curation
team should, however, have the skills to use the software
application tools below for data curation activities.
Types | Software and Application Tools
Data Repository Software | Bepress Digital Commons, DSpace, Hydra, Dataverse, HUBzero, Aubrey, SobekCM, EPrints, CKAN
Metadata Schemas | Dublin Core (DC), Qualified DC, DataCite Metadata, MODS, METS, PREMIS, MIX, EAD, Darwin Core, Ecological Metadata Language (EML), Visual Resources Association (VRA Core), DDI, CIF (Crystallographic Information Framework), ABCD (Access to Biological Collection Data), AgMES (Agricultural Metadata Element Set), AVM (Astronomy Visualization Metadata)
Metadata Schemas used in Supplementary Space | Darwin Core, EML, DDI, TEI, FGDC, ISO 19115 Geographical Metadata (ISO 19115)
Identifier Schemas | DataCite, DOI, Handle, ARK, HTTP URI, Permanent local URL
Controlled Vocabularies | DC Controlled Vocabularies, Library of Congress Subject Headings (LCSH), Medical Subject Headings (MeSH), Faceted Application of Subject Terminology (FAST), Only with Hydra: DC RDF Ontology, FOAF, RDF Schema, Astronomy Thesaurus, NASA Thesaurus, Art & Architecture Thesaurus
Creating and Editing Metadata | Microsoft Word, Microsoft Excel, Text Editor (WordPad, Notepad++), Oxygen XML Editor, Morpho (Ecology Metadata Editor), Nesstar
Editing Images or Videos | SnagIt, Photoshop for images, HandBrake for audiovisual, image
Cleaning Data | OpenRefine, DataCleaner
Storing Data | Dropbox, Google Drive
Identifying and Validating Data Files | DROID, PRONOM, Git for version control, FITS for file characterization
Transferring Data | BagIt
Indexing Data for Searches | Apache Solr
Tracking and Measuring Data | Altmetric
Internet web applications | EZID service, Google Refine
Table 2. Software and Application Tools for Data Curation in Research Data Repository
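Several of the tools listed for identifying, validating, and transferring data files rely on recorded checksums to confirm that files have not changed between deposit and preservation. The following is a minimal sketch of that idea using only Python's standard library. It does not use or reproduce the API of BagIt, DROID, or FITS; the function names `sha256_of`, `build_manifest`, and `verify_manifest`, and the sample files, are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(payload_dir):
    """Map each file path (relative to payload_dir) to its checksum."""
    payload_dir = Path(payload_dir)
    return {
        p.relative_to(payload_dir).as_posix(): sha256_of(p)
        for p in sorted(payload_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(payload_dir, manifest):
    """Return the relative paths that are missing or whose content changed."""
    payload_dir = Path(payload_dir)
    problems = []
    for rel_path, expected in manifest.items():
        target = payload_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems


# Demonstration with throwaway files (illustrative content only).
with tempfile.TemporaryDirectory() as tmp:
    deposit = Path(tmp)
    (deposit / "readings.csv").write_text("plot,moisture\nA,0.31\n", encoding="utf-8")
    (deposit / "notes.txt").write_text("Collected January 2021.\n", encoding="utf-8")

    manifest = build_manifest(deposit)
    for rel_path, checksum in manifest.items():
        print(checksum, rel_path)

    intact = verify_manifest(deposit, manifest)    # nothing changed yet
    (deposit / "notes.txt").write_text("edited after deposit\n", encoding="utf-8")
    changed = verify_manifest(deposit, manifest)   # the edited file is reported
    print("changed files:", changed)
```

A curator could store such a manifest alongside the deposited files and rerun the verification periodically as part of long-term data quality checks.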
Conclusion
Data curation, the activity of managing and promoting the use
of research data from its point of creation, ensures that the
data remains usable for contemporary purposes and supports its
discovery and reuse. Best-practice data curation is a cycle
that begins with preparation, understanding needs, and
arranging the prerequisite infrastructure, and continues through
receiving data, cleaning, documentation, metadata creation,
preservation, data publishing, data access, and evaluation of
usage and data citations. Disciplinary metadata creation
is a major activity of data librarians, along with data
repository development, which helps in publishing data for
future reuse. Librarians have to develop interdisciplinary
and IT-oriented skills and competencies, along with academic
research knowledge, to perform the best data curation practices.
Interdisciplinary skills help them to curate data and to
create metadata, descriptions, and documentation.
IT-oriented competencies serve to develop user-friendly
data repositories and to clean data. Academic research
knowledge helps librarians understand research
activities and the data curation assistance that data
generators need. Appropriate use of software and application
tools facilitates the development and practice of data
curation. Training and continuing education in data
curation skills is not limited to library and information
professionals; its scope has expanded to professionals
from the computer science domain too. This study benefits
the key data curation stakeholders in understanding the best
practices for data curation by identifying the key stages of
data curation, the core data curation activities, the software and
application tools available, and the roles and responsibilities of
each stakeholder, including data librarians and researchers.
References
1. Ali AK. Research Data Management Support. The 14th
Munin Conference on Scholarly Publishing. Tromsø:
UiT University. 2019.
2.
Bryant R, Lavoie B, Malpas C. The Realies of Research
Data Management Part Four: Sourcing and Scaling
University RDM Services. Dublin: OCLC Research. 2018.
3.
Cudre-Mauroux P. Design Consideraons on SWITCH’s
Connectome Vision. Zorich: A SWITCH Innovaon Lab.
4.
GEO. Data Management Principles Implementaon
Guidelines. Group of Earth Observaons.
5. Gerwig KJ. Current Outreach and Markeng Pracces
for Research Data Repositories. L. Johnston, Curang.
Research Data 2017; 1.
6.
Goben A, Sandusky RJ. Open data repositories: current
risks and opportunies. College & Research Libraries
News 2020; 81(2).
7.
Gramstadt MT. Kulvang Kultur: Increasing Arts
Research Deposit. ARIADNE. 2012.
8.
Jacobs N, Thomas A, McGregor A. Institutional
repositories in the UK: the JISC approach. Library Trends
2008; 57(2): 124-141. doi:10.1353/lib.0.0035
9. Jones S, Pryor G, Whyte A. How to Develop Research
Data Management Services - a guide for HEIs. In DCC
How-to Guides. Edinburgh: Digital Curaon Centre.
10. Library. Research Data Curaon. Retrieved from The
University of Melbourne: hps://library.unimelb.edu.
au/digital-stewardship/research_data_curaon
11. Miksa T, Rauber A. Beyond Data: Process Sharing and
Reuse. ERCIM News 2015; 100: 18-10.
12. Noonan D. Data Curaon and the University Archives.
The American Archivist 2014; 77(1).
13.
Ogier A, Nicholls N, Spee R. Open Exit: Reaching the End
of the Data Life Cycle. L. Johnston, Curang. Research
Data 2017; 1.
14.
Partlo K, Symons D, Carlson JD. Revoluonary or
evoluonary? Making research data management
manageable. B. Eden, Creang Research Infrastructures
in the 21
st
Century Academic Library: Conceiving,
Funding and Building New Facilies and Sta.
15.
Rusbridge C. Create, curate, re-use. Proceedings of
Educause Australasia, 2007. Auckland: EDUCAUS.
16. Schmidt B, Dierkes J. New alliances for research and
teaching support: establishing the Göngen eResearch
Alliance. Program 2015; 49(4): 461-474. doi:10.1108/
PROG-02-2015-0020
17.
Schmidt B, Shearer K. Librarians’ Competencies Prole
for Research Data Management. Chicago: Joint Task
Force on Librarians’ Competencies in Support of
EResearch and Scholarly Communicaon, 2016.
18.
University of California, L. Research Data Curaon.
Retrieved from University of California, LIBRARY: hp://
library.ucmerced.edu/research-data-curaon
19. Wi M. Instuonal Repositories and Research Data
Curaon in a Distributed Environment. Library Trends
2008; 57(2): 191-201.