Journal Article
Research Data Services at ETH-Bibliothek
Sesartić, Ana; Töwe, Matthias
Originally published in: IFLA Journal 42(4)
Rights / License: In Copyright - Non-Commercial Use Permitted
Ana Sesartic
ETH-Bibliothek, ETH Zürich, Switzerland
Matthias Töwe
ETH-Bibliothek, ETH Zürich, Switzerland
The management of research data throughout its life-cycle is both a key prerequisite for effective
data sharing and efficient long-term preservation of data. This article summarizes the data services
and the overall approach to data management as currently practiced at ETH-Bibliothek, the main
library of ETH Zürich, the largest technical university in Switzerland. The services offered by service
providers within ETH Zürich cover the entirety of the data life-cycle. The library provides support
regarding conceptual questions, offers training and services concerning data publication and long-
term preservation. As research data management continues to play a steadily more prominent part
in both the requirements of researchers and funders as well as curricula and good scientific practice,
ETH-Bibliothek is establishing close collaborations with researchers, in order to promote a mutual
learning process and tackle new challenges.
Keywords: Data life-cycle, data management plan, libraries, preservation, research data, research data
The growing volume of data produced in research has created new challenges for its
management and curation to ensure continuity, transparency and accountability. Timely and
effective management of research data throughout its life-cycle ensures its long-term value
and prevents data from falling into digital obsolescence (Corti et al., 2014; Goodman et al.,
2014).
Proper data management is a key prerequisite for effective data sharing within a specific
scientific community and for data publication beyond any particular target group. This, in turn,
increases the visibility of scholarly work and is likely to increase citation rates (Piwowar and
Vision, 2013: 25; Piwowar et al., 2007). Managing research data throughout its life-cycle
is not only a key prerequisite for effective data sharing but also for efficient long-term
preservation because the latter must rely on technical, administrative and rights metadata, as
well as sufficient context information being available to make sure that data remains usable
and understandable in the long run.
Corresponding author: Ana Sesartic, ETH Zürich, ETH-Bibliothek, Rämistrasse 101, CH-8092 Switzerland.
Depending on their respective institutional setting, libraries can contribute to research data
management in different ways and likewise, expectations from their patrons can vary
widely, e.g. by scientific discipline. Therefore, the following report should be understood
as a case study rather than a general recommendation.
Libraries are never the only service providers in a university and, ideally, a range of providers
caters for researchers’ and students’ needs. When it comes to data management in particular, IT
services will obviously be a strong player on the technical side whereas research offices must
take an interest in how researchers comply with internal and external requirements. In such a
landscape, it is important to note that libraries should focus on their strengths such as metadata
management, content curation, and support and training of their customers, in this case the
researchers. Also, the services offered are never carved in stone and should be adapted to the
current needs of science.
In the following, we summarize the data services and approach to data management as
currently practised at ETH-Bibliothek (ETH Zürich, 2016c), the main library of ETH Zürich (ETH
Zürich, 2016b), the largest technical university in Switzerland.
Figure 1. Actors and their tasks at ETH Zürich along phases of the data life-cycle. Units in red boxes belong to ETH Library.
The overall concept
The role of the ETH-Bibliothek within ETH Zürich is to support researchers from early on in the
data life-cycle (see Figure 1). This support ranges from general consulting regarding compliance,
through help with the development of data management plans and the offer of data archiving and
publication services, to the creation of DOIs for easier citation and re-use of data. Important
phases of the life-cycle, mainly in its active research phase (see Figure 1), require
competencies beyond what the library can offer. Other actors within the university support
these phases and a clarification of the division of tasks between those service providers is
needed. At ETH Zürich, a good level of mutual understanding, for example of the activities in
storage and preservation, was achieved between library staff and the storage section of the IT
services as part of a common small-scale project starting in 2006. Regular meetings have
continued ever since. Nevertheless, it can be challenging to communicate the division of
tasks transparently to customers. Ideally, there should be only a single point of contact for
customers to turn to. This has not been established so far. However, it should not matter who
customers contact as long as they are redirected appropriately. This, again, requires a
reasonable understanding of other units’ services, which obviously evolve over time.
With the Digital Curation Office (ETH-Bibliothek, 2016a), the ETH-Bibliothek offers a point of
contact for technical and conceptual questions regarding long-term preservation and
management of research data. It also offers help and support to the researchers of ETH Zürich
in managing and publishing their data, as well as following the requirements as stated in the
Guidelines for Research Integrity of ETH Zürich (ETH Zürich, 2011: 31). Regarding intellectual
property and research ethics issues, the ETH-Bibliothek closely collaborates with the
Technology Transfer Office and the Office of Research, respectively.
Figure 1 illustrates the coverage of roughly defined stages of the research data life-cycle by actors
and services at ETH Zürich with their respective tasks. Note that Guidelines for Good Scientific
Practice and further legal requirements (centre) apply throughout the cycle, but should obviously
be considered from early on.
For a first-time or occasional customer, the multitude of actors can obviously be confusing
and users should not be left alone with it. From our experience, customers appreciate being
able to get in touch with a contact person they can talk to about their needs. In a first instance,
it is not even expected that this person can provide an immediate solution, but it is important
that someone takes care of the issue and provides guidance as to where to turn. This
means that all actors must be aware of tasks not within their own portfolio as well, and we
are aware that this remains a challenge and requires an ongoing learning process. As a detail
to underline the importance of personal contacts: as one of the first teams within the
ETH Library, the Digital Curation Office put portrait images of its staff on its website to lower
the barrier of writing to an otherwise anonymous email box.
Data management training
Research data management (RDM) has gained increasing attention over the last years, due to
growing awareness of the value contained in research data and of the risks of losing such data
over time. Apart from the need to manage data over the course of a project, there is a need to
retain and curate data which one plans to work with in the future. It might be possible and
sometimes advisable to repeat an experimental measurement, but in other cases it might be
either prohibitively expensive or ethically unacceptable to do so. For unique
observational data, a repetition is not possible at all and accordingly, communities relying on
such data have long been aware of the challenges.
While these issues are mainly related to the efficiency and effectiveness of research and its
funding, RDM also addresses the accountability of science. One principle of the scientific process is
the requirement to be able to justify results by providing underlying data where necessary.
There are significant reputational risks involved for individual researchers, principal
investigators and institutions who fail to comply with good scientific practice, of which RDM is just
one part. To address these questions, a workshop within the ETH Critical Thinking Annual
Programme (ETH Zürich, 2016e) was developed. The overall aim of this programme is to
strengthen critical thinking and a responsible approach to taking actions beyond disciplinary
competences. The workshop introduced some services and tools for RDM, as well as
encouraging the participants to share both their experiences and the methods and tools they
use, during the interactive parts of the workshop. The goal of the workshop within the
Critical Thinking programme was to increase the critical thinking skills of undergraduate
and graduate students, as well as scientists, regarding RDM. The philosophy behind this
approach is the conviction that researchers themselves must be empowered to make
informed decisions on their data, as they are the experts with the most intimate knowledge of
their own data. The workshop was focused on activating teaching methods, engaging the
participants in group work and discussions.
The majority of the participants at this event were working on their doctoral studies, with few
students and post-docs present. They showed very varied needs and levels of
knowledge, but they were well aware of the problems regarding RDM beforehand and were
looking for solutions. This is why we also offer tailored training courses in RDM for groups and
departments. These can range from so-called Tools & Tricks mini lectures, short 15-minute
inputs over a coffee or lunch break, to fully-fledged one-day training workshops. It is important
to note that certain departments and institutes already offer similar or overlapping internal
training, which is why communication and coordination are key. As of now, there is no
dedicated course on RDM within the ETH curriculum; however, some departments offer
methodological courses including research ethics and scientific writing which might cover
some aspects. With increasing concerns about data management issues, it is to be
expected that the topic will figure more prominently in curricula in the future.
Data management plan checklist
Today, the availability of well-managed data is part of good scientific practice and ensures
the reproducibility of research results, a key requirement at the core of the research process.
Many funding organizations prescribe the use of data management plans and insist on open
access publication of the research results they funded.
In some parts of the EU’s research programme Horizon 2020, data management plans (DMPs) will,
for example, be evaluated as part of the impact of a proposal and in the reporting during the course of a
funded project (European Commission, 2013: 6). But even if a funding body does not explicitly
demand data management, following professional curation and preservation concepts has
numerous advantages (DLCM, 2016):
It greatly facilitates the reuse of research data;
As a result, this increases the impact of research results;
It saves precious research funds and ultimately natural and human resources by avoiding unnecessary duplication of work.
As effective and efficient data management becomes more and more challenging for
both researchers and information specialists, the question arose how they can best be reached
and supported on a national level regarding best innovative practices in digital preservation so
as to succeed in this ambitious enterprise. This led to the creation of the Swiss Data Life-Cycle
Management (DLCM) project (Blumer and Burgi, 2015: 16), facilitated by swissuniversities
(swissuniversities, 2016) and involving collaboration between eight Swiss higher education
institutions (EPF Lausanne, ETH Zürich, Universities of Basel, Geneva (lead), Lausanne, Zürich,
Geneva School of Business Administration at the Western Switzerland University of Applied
Sciences and Arts, and SWITCH, the national IT service provider for higher education).
The Data Management Plan (DMP) Checklist (ETH-Bibliothek and EPFL Library, 2016) is
among the first tangible deliverables of the DLCM project. It is meant to be an essential tool
aiding researchers in the management of their data, thus preparing them for later publication
and preservation, as needed. By giving clear guidelines, it should facilitate this task for
researchers and eventually save them time and effort. The list has been customized for
Switzerland based on pre-existing national and international policies. It covers general
planning and the phases of the data life-cycle, from data collection and creation to data
sharing and long-term management. Special sections cover documentation and metadata, file
formats, storage, ethical and intellectual property issues. Ideally, the list should be used by
researchers to critically assess their data management and to gather information they might
need to create a data management plan. It can also serve as a starting point for further
face-to-face discussions of data management issues within research groups and with
support staff if required. The list is static with no further functionality and it will be observed
whether a more interactive solution will be required later.
The checklist was created in close collaboration between the Digital Curation Office at
ETH-Bibliothek and the Research Data Team at EPFL Library. It is currently available
through the ETH-Bibliothek and EPFL Library websites (ETH-Bibliothek and EPFL Library,
2016) and will soon be disseminated on a national portal on DLCM which will be
launched in the next few months. The portal itself will touch on many aspects, aggregating
and providing further information about data organization, training (in person and online)
of end-users, and consulting regarding best practices among many others.
Active data management
Data management is understood as a comprehensive task throughout the data life-cycle. It
therefore needs to comprise the handling of research data while the actual research is carried
out. We call this active data management to signify that data at this stage is usually not static, but
keeps being analysed and worked upon as part of the research. At this stage of the life-cycle,
subject-specific tools are employed in data processing which may be implemented and run by a
specialized support unit or by research groups themselves.
At ETH Zürich, the section Scientific IT Services of the central IT Services provides this kind of
support. Research groups from life sciences are known to be among the most intensive users
of their services. Among these services is the data management software platform openBIS (ETH
Zürich, 2016d), which has also been extended with components to serve as an electronic
laboratory notebook (ELN) and/or as a laboratory information management system (LIMS).
Obviously, this platform and other tools with similar aims must already capture a lot of
information that is relevant for current research. Part of this information might be
gathered automatically, while more will be required to be entered by researchers, with
quality depending on their willingness to comply and therefore on the ease of the process. Most
of this input and possibly additional documentation will be needed in order to use and make
sense of the research data at a much later stage, when the active data platform is no longer
available.
While such systems themselves are not meant as publishing or preservation tools for
research data, they can very well serve as sources for these processes. Ideally, researchers
working in such a system should be able to decide which part of the content must be
preserved or can be published to trigger an export, e.g. to a long-term preservation
solution. Such a process has not yet been implemented, but it is envisaged that an interface
from openBIS to the ETH Data Archive will be developed in the current DLCM project.
It should be noted that the exchange between the different systems must not be understood
merely as a technical transfer. Data at this stage also crosses a boundary between the
research world in a narrow sense and a curation domain (see Figure 2). Such a transition is
prone to suffer from misunderstandings between the parties involved due to a lack of common
standards or even a uniform vocabulary.
Figure 2. Transitions between curation domains along the research data life-cycle. Adapted from Treloar (2012).
This already highlights the importance of close collaboration not only of researchers and
service providers, but also among different providers with complementary competencies.
An essential part of the work to achieve this has been a constructive exchange over several years
between ETH Library and the Central IT Services. Even before the Digital Curation
Team existed, staff from various teams in the library (e.g. University Archives of ETH Zürich,
Library IT Services, Consortium of Swiss Academic Libraries) engaged in early pilot projects
mainly with the Central IT Services team in charge of storage management. These activities from
2006 onwards served to understand the required functions, to get an idea of the available
competencies and not least to build trust within an, albeit loose, network of players in the field
of data management within the university. This helped to achieve a common understanding
and to raise awareness, e.g. for the need to define more precisely what each stakeholder
understands when talking about ‘archiving’.
Two rather surprising outcomes arose from the exchange with a retiring professor on how to
transfer his well-managed archive to the library. Firstly, one of his post-docs involved in the
operation could be hired to work with ETH Library’s Digital Curation Team, and secondly, the
discussion of the principles underlying this research group’s archive recently led to a
publication highlighting the general concepts behind their approach which had proven
useful for almost three decades of research practice (Sesartić et al., 2016).
Publication and preservation of data
From the start of its activities around research data, ETH-Bibliothek saw a central role in
support of the processes for publication and preservation of such data. This was in line with
expectations from both researchers and other service providers in the university. At the same
time, it was decided to address preservation issues with regard to all kinds of scientific and
cultural heritage unique to ETH Zürich. This includes research data from ETH Zürich staff,
administrative records or personal bequests to ETH Zürich University Archives (ETH-Bibliothek,
2016c) and born-digital and digitized content of ETH-Bibliothek, in
particular doctoral theses which must be deposited in digital form and master files from
several large-scale digitization projects.
Since 2014, ETH-Bibliothek has offered the ETH Data Archive as a productive service. It is
based on the commercial long-term preservation system Rosetta (Ex-Libris Group, 2016b).
The application itself is maintained by the library’s own IT services, while both virtual servers
and different types of storage at two different sites of the university are provided by ETH
Zürich’s central IT services.
ETH researchers can deposit content into the ETH Data Archive and define the appropriate
access rights. This can be done manually via a web client, e.g. for supplementary material
belonging to an article. Depositors are quite free to define on which level they want to or can
provide metadata: an archival package or intellectual entity may contain just a single file
with its individual description or a ZIP- or TAR-container with thousands of files included
under only one metadata set. The latter is sometimes made use of when a collection of files
belonging to a concluded thesis or another publication needs to be retained for a defined
period of time (10 years minimum). In this case, it is often assumed that the thesis itself is
the most comprehensive documentation of the data package and should suffice for the
professor to answer any inquiries. This might not be an ideal arrangement, but it represents
considerable progress compared to some previously existing group archives held exclusively
on CD-ROM and DVD. When the focus is on safeguarding data for a limited period of time only,
no strict requirements on the longevity of file formats are enforced. If requested by researchers,
support is provided with identifying suitable formats.
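The container-style deposit described above can be illustrated with a minimal Python sketch. It bundles a folder into a single TAR file and produces one metadata record, including SHA-256 checksums, for the whole package. The function name `bundle_for_deposit`, the metadata field names and the record layout are assumptions made for illustration only; they do not describe the actual submission format or interface of the ETH Data Archive.

```python
import hashlib
import tarfile
from pathlib import Path


def bundle_for_deposit(folder, tar_path, metadata):
    """Pack `folder` into one TAR file and return a single metadata record.

    Hypothetical helper: the record layout is an illustration only and is
    not the metadata schema of any real archive.
    """
    folder = Path(folder)
    files = sorted(p for p in folder.rglob("*") if p.is_file())

    # One checksum per file, so the content can be verified after retrieval.
    checksums = {
        p.relative_to(folder).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in files
    }

    # A single container holds all files; paths are stored relative to `folder`.
    with tarfile.open(str(tar_path), "w") as tar:
        for p in files:
            tar.add(str(p), arcname=p.relative_to(folder).as_posix())

    # Only one metadata set describes the whole package.
    record = dict(metadata)
    record["file_count"] = len(files)
    record["sha256"] = checksums
    return record
```

A call such as `bundle_for_deposit("thesis_data", "thesis_data.tar", {"title": "Data for thesis X", "retention_years": 10})` would return the record that accompanies the container; the field `retention_years` is likewise a made-up name.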
In other cases, metadata is routinely gathered for research data on the level of individual
folders and files in research groups who have already seen a need to operate a managed
archive. They may use the open source editor and viewer docuteam packer (docuteam, 2016;
ETH-Bibliothek, 2016e) locally to organize files in a defined structure and add metadata
as required. To support this potentially laborious task, certain metadata can be pre-defined
or inherited from the top-most level. A researcher can then decide when to submit the whole or
part of the structure she or he created to the ETH Data Archive. For data in docuteam packer,
DOIs (Digital Object Identifiers) can already be reserved which will be registered to
become active after submission to the ETH Data Archive. The advantage of this method is
that the DOI can already be used in a manuscript even before the data has actually been
deposited. DOIs are registered with the consortium DataCite (2016). Likewise, access
rights and a retention period can be defined which will be enforced after submission.
Submission itself is handled by the tool docuteam feeder, which processes the Submission
Information Package (SIP) put out by docuteam packer into the corresponding Archival
Information Packages (AIP) to be submitted to the ETH Data Archive. Docuteam packer will also
be used for submissions to ETH Zürich University Archives with a modified configuration and a
more interactive workflow between depositors and University Archives staff.
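The idea of inheriting metadata from the top-most level, mentioned above, can be sketched in a few lines of Python. The nested dictionary structure, the field names and the function `inherit_metadata` are hypothetical and serve only to show the principle; they are not the data model of docuteam packer.

```python
def inherit_metadata(node, inherited=None):
    """Return {path: metadata} for a nested folder/file description.

    `node` is a dict with a "name", optional "metadata" and optional
    "children". Hypothetical structure used for illustration only.
    """
    inherited = inherited or {}
    # Values set on the node itself override those passed down from above.
    effective = {**inherited, **node.get("metadata", {})}
    result = {node["name"]: effective}
    for child in node.get("children", []):
        for path, meta in inherit_metadata(child, effective).items():
            result[node["name"] + "/" + path] = meta
    return result


# Example: "creator" is entered once at the top and reaches every file,
# while a single file overrides the inherited "rights" value.
tree = {
    "name": "project",
    "metadata": {"creator": "Research group A", "rights": "restricted"},
    "children": [
        {"name": "raw", "children": [
            {"name": "run1.csv", "metadata": {"rights": "open"}},
        ]},
    ],
}
flat = inherit_metadata(tree)
```

In this sketch, a researcher enters shared values only once, and deviations are recorded only where they apply.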
Obviously, there are limits to which kind of submissions can be comfortably handled via a
web user interface and via docuteam packer. Automated processes with submission
applications relying on existing sources for metadata and content are currently in use only for
library content from the institutional repository ETH E-Collection, for master files from the
digitization of rare books from ETH-Bibliothek in e-rara (ETH-Bibliothek, 2016b), and for
digitized material from ETH Zürich’s Archives of Contemporary History (ETH Zürich, 2016a). In the
future, such interfaces should also be created with existing sources for research data, such as the
platform openBIS mentioned above. Apart from fully automated processes, there might still be
others requiring a trigger from the data producer to give them more control of which part of
their content is archived when. For example, producers might want to collect, (re-)structure
and describe their data before finally deciding to submit an archival package. In
other cases, professors want to review datasets from their groups before submission and start
the transfer themselves.
Deposit in the ETH Data Archive is not coupled with an immediate publication of the content.
While ETH Zürich encourages open access also to research data, it is currently up to the data
producer to decide which data they want to make accessible as long as they are observing
existing requirements, e.g. from funders. They may opt for an embargo period of e.g. two
years or for limited access within the IP-range of ETH Zürich only. Even very restrictive access
rights for defined persons only are currently accepted.
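How an embargo period and a restriction to an IP range combine can be made concrete with a short Python sketch. It is a simplified illustration under stated assumptions: the function `may_access`, its parameters and the rule order are hypothetical, and the network used in the example is a reserved documentation range, not the IP range of ETH Zürich.

```python
import ipaddress
from datetime import date


def may_access(today, client_ip, embargo_until=None, allowed_network=None):
    """Decide whether a request may see a deposited dataset.

    Hypothetical policy check; parameter names and rules are illustrative.
    """
    # An embargo blocks all access until the given date has been reached.
    if embargo_until is not None and today < embargo_until:
        return False
    # Optional restriction to one network, e.g. a university IP range.
    if allowed_network is not None:
        network = ipaddress.ip_network(allowed_network)
        return ipaddress.ip_address(client_ip) in network
    return True


# Example with a two-year embargo and a placeholder network (192.0.2.0/24).
during_embargo = may_access(date(2017, 1, 1), "192.0.2.10",
                            embargo_until=date(2018, 6, 1))
inside_network = may_access(date(2019, 1, 1), "192.0.2.10",
                            allowed_network="192.0.2.0/24")
```

Access restricted to defined persons would need an additional check on the authenticated user, which is left out here.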
Metadata of published content are published in the ETH Knowledge Portal (ETH Zürich, 2016c),
in the Primo Central Index (Ex-Libris Group, 2016a) and in the Data Citation Index (Thomson
Reuters, 2016).
Currently, the workflow for the deposit of research data is largely separated from the one for
publications, e.g. articles to be published via the green road of open access. In the future, a new
platform will pull several workflows for publications, bibliographic records and research data
together into one service.
For published research data in particular, it is a reasonable expectation that it should remain
available in the long term similar to what is expected from formal publications. Whether
this will actually be possible in the future depends to a large degree on the file formats
being employed. In the heterogeneous environment of a university with numerous
contradictory pressures and constraints on researchers, it would not be a realistic approach
to admit only a limited number of well-documented open formats to a data archive. Rather,
the Digital Curation Office at ETH-Bibliothek offers some guidance on preferred formats and
recommends a few of them depending on the expected retention period (ETH-Bibliothek,
2016d). Accepting other formats may limit the chances of actual preservation measures in the
future to the extent where only bitstream preservation remains viable. This is made transparent to
researchers submitting data and usually they are fully aware of this serious limitation.
However, those depositing data ‘just in case’ do not consider this a major problem because
they expect a need to invest effort into using such data in the future, anyway. Their
perspective then is to postpone this effort to the point where it is actually needed
rather than making an upfront investment which might be in vain, given that only a very
small part of their data might ever be re-used.
On the other hand, data producers who regularly share and exchange data with colleagues in
their own community have usually overcome the barrier of proprietary formats and often rely
on open community standards. However, this might only apply to the core of research data,
while accompanying material may contain less suitable formats. Given the high level of
awareness of these users, they might want to reconsider those formats as well.
A particular format issue concerns the vast number of research data files in plain text
formats, but with a large variety of sometimes misleading file extensions. While text files
with documented encoding are actually very suitable for preservation purposes, it can be
challenging to identify them in the first place and in many cases, the identification will
not be a technical one, but will rely on information from the data producers.
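A simple content-based check shows both what a technical identification can achieve and where it ends. The following Python sketch ignores the file extension and tests whether the beginning of a file decodes as UTF-8 without NUL bytes. The function name `looks_like_utf8_text` and the sample size are assumptions for illustration; this is a heuristic, not the identification method of any particular preservation system.

```python
import codecs


def looks_like_utf8_text(path, sample_size=8192):
    """Guess from content alone whether a file is UTF-8 (or ASCII) text.

    Heuristic sketch: the file extension is deliberately ignored.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    # NUL bytes are very rare in plain text but common in binary formats.
    if b"\x00" in sample:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # If the sample was cut at the size limit, tolerate a multi-byte
        # character that is incomplete at the very end.
        decoder.decode(sample, final=len(sample) < sample_size)
    except UnicodeDecodeError:
        return False
    return True
```

A file named `measurements.dat` that contains semicolon-separated values would pass this check despite its extension. A text file saved in Latin-1 with the word "Zürich" in it would fail, although it is plain text, because the check cannot know the encoding. This is where information from the data producers becomes necessary.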
Taking the sections above into account, it is obvious that communication with researchers
forms an essential part of providing research data services for publication and preservation.
It is in the best cases rewarding for customers and staff alike, but nevertheless
time-consuming when time might already be pressing, for example when a manuscript is about
to be submitted and supplementary material needs to be deposited on time. Obviously, this
kind of communication is much facilitated if appropriate skills are available on both sides.
It is therefore very helpful to have staff with a scientific background in the Digital Curation
Office, although it is obvious that they cannot cover all fields in depth.
With the growing digitization of science and society, a curation gap between research practice
and curation needs opens up. However, if there is collaboration and communication between IT
services, libraries and researchers, the discrepancy between research practice and research
content preservation can be minimized and the curation gap closed. In order to do so,
university libraries and data centres must continue to support and educate researchers,
which also requires a thorough understanding of researchers’ work practices and the
challenges they meet. The heterogeneity of their needs limits the possibility to generalize
or, the other way round: in some cases, it may only be possible to serve needs close to
the lowest common denominator between various interests. This is the reason why
libraries should not aim at serving all communities equally themselves, but rather also keep
an eye on subject-specific solutions, which are created by third parties to address specific
needs of one discipline. A combined and well-integrated landscape of institutional,
networked and subject-specific approaches might then cover most needs over time.
While libraries need to build on their strengths to become an active part of the overall
landscape for RDM in a university, they must also consider that the services offered are not
set in stone but have to be adapted and developed continuously. As scientific practice
evolves rapidly, a constant learning and innovation process is needed to keep up with
changing requirements. Libraries and other service providers need to open up and reach
out further towards researchers and collaborate more closely with them, in
order to establish a mutual learning process on the part of both libraries and researchers.
Requirements of researchers will not only evolve through technical and scientific
developments in their field of research, but it can also be expected that RDM will play a more
prominent part in curricula and in good scientific practice in the years to come. Constant efforts
in information and training on RDM should help to further implement it as an essential task
of researchers with each new generation of, for example, doctoral students. A top-down
commitment from universities can certainly support this, provided it is appropriately
translated into activities that really reach researchers at their workplace.
Acknowledgements
The authors would like to thank all members of the Swiss DLCM project for insightful
discussions. Special thanks go to the Research Data Team at EPFL Library for the excellent
collaboration on the Data Management Checklist.
Declaration of Conflicting Interests
The authors are employees of ETH Zürich and participate in the DLCM project.
Funding
The authors received no financial support for the research, authorship, and/or publication of
this article beyond their employment by ETH Zürich.
Author biographies
Ana Sesartic is an environmental scientist with a doctoral degree in Atmospheric Physics. She
brings hands-on experience in working with and managing big data sets, and is now bridging
the gap between scientists and librarians at the Digital Curation Office. Currently she is working
on the Data Life-Cycle Management (DLCM) project.
Matthias Töwe is a chemist with a doctoral degree in Experimental Physics and a trained
scientific librarian. Since 2003 he has been working at ETH-Bibliothek in different roles, first
for the Consortium of Swiss Academic Libraries and later for the Swiss electronic library:
Since late 2010 he has been head of the Digital Curation Office of ETH Zürich at ETH-Bibliothek.