Chapter

Open Research Data: From Vision to Practice


Abstract

“To make progress in science, we need to be open and share.” This quote from Neelie Kroes (2012), vice president of the European Commission, describes the growing public demand for Open Science. Part of Open Science, alongside Open Access to peer-reviewed publications, is Open Access to research data, the basis of scholarly knowledge. The opportunities and challenges of Data Sharing are discussed widely in the scholarly sector. The cultures of Data Sharing differ among the scholarly disciplines; biomedicine and the earth sciences, for example, are well advanced. Today, more and more funding agencies require proper Research Data Management and the possibility of data re-use. Many researchers see the potential of Data Sharing, but they act cautiously. This situation shows a clear ambivalence between the demand for Data Sharing and its current practice. Starting from a baseline study on current discussions, practices and developments, the article describes the challenges of Open Research Data. The authors briefly discuss the barriers and drivers to Data Sharing. Furthermore, the article analyses strategies and approaches to promote and implement Data Sharing. This comprises an analysis of the current landscape of data repositories, enhanced publications and data papers. In this context the authors also shed light on incentive mechanisms, data citation practices and the interaction between data repositories and journals. In the conclusions the authors outline requirements of a future Data Sharing culture.


... For some years now, the term Open Research Data has in fact referred to debates within the scientific community about the strategies and processes for opening up data, which are seen as one component of a move towards Open Science (Pampel and Dallmeier-Tiessen, 2014). ...
... From the standpoint of scientific practice, strategies for publishing research data vary. Sometimes the data are published as a standalone information object, for example in a data repository; sometimes as a "supplement" or as an enhanced publication directly linked to an article; or outright as a data paper in a scientific journal or, more specifically, in a data journal (Woutersen-Windhouwer and Brandsma, 2009; Pampel and Dallmeier-Tiessen, 2014). The Scholix framework, which appeared in 2017, rests on the concept of interaction between data repositories and journal articles (Burton et al., 2017). ...
... They are particularly valuable when descriptive metadata are not enough to convey the content of a dataset held in a repository, notably because they set out the reuse possibilities it offers. Such a data paper is published in a scientific journal, in a dedicated data journal, or in other types of publication (Pampel and Dallmeier-Tiessen, 2014). This is how the data journal Earth System Science Data (ESSD) was created in the geosciences in 2008 (Pfeiffenberger and Carlson, 2011). ...
... Data management needs planning and appropriate metadata. [30][31][32] The creation and/or collection context of the data, the purpose of data creation/collection, storage format and access rights are essential information for the reuse of data. [33,34] Providing such metadata demands expertise on data management, knowledge about the data and the context of their usage. ...
... Data practices change towards openness as funding bodies and scholarly journals require data sharing and open publishing. Pampel and Dallmeier-Tiessen [30] emphasize the effect of incentives. Researchers themselves have started to insist on opening research data for verification and replication purposes (e.g. ...
... Concerns about misuse, misinterpretation, lack of confidentiality and loss of intellectual property are typical [36,37]. Ethical issues, lack of funding, time or knowledge about the possibilities are also mentioned as barriers to data sharing [30,38]. ...
Article
This study focuses on the use and users of the Finnish social science research data archive. The study is based on enriched user data of the archive from the years 2015–2018. It investigates the number and type of downloaded datasets, the number of citations for data, the demographics of data downloaders and the purposes for which data are downloaded. Datasets were downloaded from the archive 10,346 times. The majority of the downloaded datasets are quantitative. Quantitative datasets are also cited more often, but the number of citations varies and does not always correlate with the number of downloads. Use of the archive varies by user’s country, organization, and discipline. Datasets from the archive were downloaded most often for study work, bachelor’s and master’s theses, and research purposes. It is likely that the reuse of research data will increase in the near future as more data become available, scholars become better informed about research data management, and data citation practices become established.
... Aitken et al. [27], in a systematic review of sharing health data for research, conclude that there are low levels of knowledge of current practices and uses of the data and point to the need for greater awareness among all stakeholders, combined with public deliberation on data sharing. Levin et al. [28] analysed in-depth interviews conducted in the UK with researchers in biology and bioinformatics and identified some core themes that characterize researchers' understanding of openness in science, including "the existence of repositories for data, software, and models to carry it out; the competitiveness of academic fields; the digital nature of research; the credit system; career structures in academic research; collaborations with industrial partners and attempts at commercialization; and guidelines for intellectual property" [28]. They conclude that it is necessary to take into account the diversity and contextual nature of openness in policies and recommendations, given the heterogeneity of data formats, sizes, standards, and repositories. ...
... They conclude that it is necessary to take into account the diversity and contextual nature of openness in policies and recommendations, given the heterogeneity of data formats, sizes, standards, and repositories. For example, adopting open data policies that are too rigorous may have negative effects on scientific research by forcing scientists to disclose results and resources in ways that they deem useless or inappropriate or by requiring openness at a stage of research where it is more likely to hamper than encourage progress [28]. In the study commissioned by the Wellcome Trust, the main incentives to make more data available in the future are funding to cover the cost of data preparation, enhancement of academic reputation, knowing how others will use the data and data sharing being taken into account in future funding and career promotion decisions [24]. ...
Article
This work provides an overview of a Spanish survey on research data, which was carried out within the framework of the project Datasea at the beginning of 2015. It is aligned with the Sustainable Development Goals (goal 9) in its support for research. The purpose of the study was to identify the habits and current experiences of Spanish researchers in the health sciences in relation to the management and sharing of raw research data. Method: An electronic questionnaire composed of 40 questions divided into three blocks was designed. The three sections contained questions on the following aspects: (A) personal information; (B) creation and reuse of data; and (C) preservation of data. The questionnaire was sent by email to a list of universities in Spain to be distributed among their researchers and professors. A total of 1063 researchers completed the questionnaire. More than half of the respondents (54.9%) lacked a data management plan; nearly a quarter had storage systems for the research group; 81.5% used personal computers to store data; “Contact with colleagues” was the most frequent means used to locate and access other researchers’ data; and nearly 60% of researchers stated their data were available to the research group and collaborating colleagues. The main fears about sharing were legal questions (47.9%), misuse or interpretation of data (42.7%), and loss of authorship (28.7%). The results allow us to understand the state of data sharing among Spanish researchers and can serve as a basis to identify the needs of researchers to share data, optimize existing infrastructure, and promote data sharing among those who do not practice it yet.
... One should note that "data" is not copyrightable per se, but the arrangement of data can be protected by either copyright law or sui generis database law. Advocates for open access to data argue that a mandate to make data freely available is both more efficient for the science community and enables better scientific practices, as data can be reused and verified by other researchers (Pampel and Dallmeier-Tiessen 2014). Another argument for the OA movement is rather political and distributional: for scholarship that is funded by tax money, the public has already paid for the research and should not be charged again for accessing the results (Suber 2012). ...
... Another argument for the OA movement is rather political and distributional: for scholarship that is funded by tax money, the public has already paid for the research and should not be charged again for accessing the results (Suber 2012). This argument works for both open access to journal articles and research data (Pampel and Dallmeier-Tiessen 2014). ...
Article
Web-based crowdsourced citizen science is an efficient method for scientists to collect and process data. Although laypersons gain opportunities to participate in research and engage with scientists, these crowdsourced projects generally maintain the traditional hierarchy of academic science. Laypersons have little say in project or platform governance, and institutional tools to hold project investigators accountable are almost nonexistent. This article examines how existing institutional policies address the question of distribution in crowdsourced citizen science, as it may further affect lay participants’ role in the institution of scientific knowledge production and their access to research resources. This article begins by comparing the norms developed by citizen-science institutions. It then discusses examples from Galaxy Zoo to see how the results of research projects are distributed, both in the form of access to research outcomes and in authorship. The article also discusses the potential conflicts that arise when crowdsourced projects are organized by for-profit companies and why citizen-science platforms should develop institutional norms to avoid such conflicts.
... Furthermore, a conception is needed that acknowledges, with regard to data reuse (and, some would argue, primary data collection as well), that all data and context are partial and incomplete (Moore, 2007; Carmichael, 2017; Gillies and Edwards, 2005). Aside from information about the original research being unavailable to data reusers as a result of reusers' not "being there" when the data were collected (see the discussions in Corti, 2000; Medjedović, 2011; and Mauthner and Parry, 2009), data may need to be anonymized (Andersson and Sørvik, 2013), there are time and resource constraints to recording relevant details from the original research (Borgman et al., 2007a; Pampel and Dallmeier-Tiessen, 2014), and some procedural or other tacit information can be difficult or impossible to record (Kelder, 2005; Roland and Lee, 2013; Niu, 2009a). ...
Thesis
Government funding agencies and commissions have proposed that sharing, preserving, and providing access to more scientific research data will lead to increased reuse of data in academic research and result in greater knowledge and new discoveries. However, researchers encounter significant logistical, theoretical, methodological and ethical challenges to reusing data that hinder the achievement of these goals. One of the challenges researchers face is obtaining sufficient knowledge about data and the context of data creation to make a decision to reuse the data in their research. In this dissertation, I report on a mixed methods study to investigate how researchers set limits on the types and amounts of knowledge they obtain about data, and what influences them to do so. A more nuanced understanding of how and why researchers determine such thresholds can inform strategic measures to enhance support for data reuse. My study included a survey and semi-structured interviews and was conducted on a sample of researchers who reused data from the ICPSR data archive. I used Donna Haraway’s theory of situated knowledges and Herbert Simon’s theory of satisficing to develop conceptualizations of data and means of evaluating thresholds of knowledge that researchers obtained about data. I defined a concept called “reuse equilibrium”—when researchers determine data are sufficient to reuse to meet their research goals—and examined whether satisficing was a means by which researchers obtained knowledge to reach reuse equilibrium. I found that researchers lacked knowledge they desired about data and that this lack of knowledge frequently had a negative impact on their research. The type of knowledge researchers most often desired but were unable to obtain was “supplemental” knowledge that was not archived with the data and may never have been collected. 
While researchers lacked knowledge about the data they desired, I found that satisficing did not accurately represent their behavior in knowledge attainment. Instead, researchers sought to maximize their knowledge of data to meet personal aims (i.e., to reach “personal reuse equilibrium”) in environments characterized by pressures and incentives that favored the achievement of social norms and requirements (i.e., "social reuse equilibrium"). I concluded that an important way to improve the environment for reuse was to assist researchers in obtaining supplemental knowledge about data they desired, thus supporting their achievement of personal equilibrium. This could be done by facilitating more structured and intentional “conversations” between data creators and data reusers with the purpose to influence the data that are created in the first place. My findings about the knowledge researchers lack about data and the ways they seek to obtain it will be of interest to data reusers to gain a broader perspective on their colleagues' experiences. They will also be of interest to data creators, as well as data stewards, publishers, and other data intermediaries, to understand the knowledge researchers desire about data and the role they can play in helping researchers obtain it. Such findings, in addition to those about pressures and considerations in the reuse environment, will be of interest to funders and policy makers to gain insight into the ways current policies, practices, and incentives could be enhanced or changed to maximize the return on investment in primary research.
... Evidence suggests that the replication of results, the discovery and exchange of information, and the reuse of research data have emerged as some of the most important reasons for Open Science (Hey et al., 2009; Wilkinson et al., 2019). Interestingly, the reuse of "data created by others", also known as secondary data, described as "the basis of scholarly knowledge" (Pampel & Dallmeier-Tiessen, 2014), is considered one of the key aspects of Open Science (Vicente-Sáez & Martínez-Fuentes, 2018). ...
Article
Understanding the complexity of restricted research data is vitally important in the current era of Open Science. While the FAIR Guiding Principles have been introduced to help researchers make data Findable, Accessible, Interoperable and Reusable, it is still unclear how the notions of FAIR and Openness can be applied in the context of restricted data. Many methods have been proposed in support of the implementation of the principles, but there is as yet no consensus among the scientific community on suitable mechanisms for making restricted data FAIR. We present here a systematic literature review to identify the methods applied by scientists when researching restricted data in a FAIR-compliant manner. Through the employment of a descriptive and iterative study design, we aim to answer the following three questions: (1) What methods have been proposed to apply the FAIR principles to restricted data? (2) How can the relevant aspects of the methods proposed be categorized? (3) What is the maturity of the methods proposed in applying the FAIR principles to restricted data? After analysis of the 40 included publications, we noticed that the methods found reflect the stages of the Data Life Cycle and can be divided into the following classes: Data Collection, Metadata Representation, Data Processing, Anonymization, Data Publication, Data Usage and Post Data Usage. We observed that a large number of publications used ‘Access Control’ and ‘Usage and License Terms’ methods, while others such as ‘Embargo on Data Release’ and the use of ‘Synthetic Data’ were used in fewer instances. In conclusion, we present the first extensive literature review on the methods applied to confidential data in the context of FAIR, providing a comprehensive conceptual framework for future research on restricted access data.
... This is due to the ever-increasing scale of research problems that require extensive collaboration between groups of users using geographically distributed, heterogeneous, high volume, and high-value data sources. Remote cooperation and sharing of data around the world result in new approaches [1] that require proper data handling [2], supported by easy-to-use data storage solutions. Data-driven research often entails high-performance data access and processing taking into account the expectations of both users and providers. ...
Article
Data access and management on a large, global scale is currently at the center of scientific interest. This follows from the need for data access from multi- and hybrid-cloud applications. In most cases existing solutions provide sufficient functionality to scale computing resources but scaling resources in terms of efficient data access e.g. for data-intensive applications is still not comprehensively resolved. In this paper, we present a new approach to global data access that supports the execution of data-intensive applications on globally distributed heterogeneous resources provided and managed by independent providers. We identify the functionality, representation, and processing of contextual information in the form of metadata, and the organization of data, resulting in four models describing the details of the approach. An experimental evaluation of this approach is discussed. We show overheads for single- and multi-site environments, system scalability, and usage of context awareness to achieve desired behavior proving insignificant overhead introduced by the system. The results confirm usability of the approach to supporting computations in heterogeneous environments with unified access to data distributed worldwide achieved by using broad contextual information.
... Within this datascape, by far the easiest way to circumvent limitations of unpublishable data is by avoiding them altogether, for example by working with open data, which by definition has much less restrictive conditions (Kitchin, 2014). Working with open data is regarded as providing benefits both for the research community by supporting better scientific practices and promoting data standards, as seen for example in Bartha & Kocsis (2011), as well as at the political level, where open data is regarded as "providing greater returns from the public investment in research" (Pampel & Dallmeier-Tiessen, 2014). One such case of using open data in the geospatial research community is the use of OpenStreetMap (OSM), a crowdsourced web-mapping platform (Haklay & Weber, 2008) that has seen increased use in recent years, with research associated with it both using OSM data as input as well as feeding its output back into the community, as seen for example in humanitarian mapping during emergency response (Dittus et al., 2017). ...
Article
Detailed datasets of real-world systems are becoming more and more available, accompanied by a similar increased use in research. However, datasets are often provided to researchers with restrictions regarding their publication. This poses a major limitation for the dissemination of computational tools, whose comprehension often requires the availability of the detailed dataset around which the tool was built. This paper discusses the potential of synthetic datasets for circumventing such limitations, as it is often the data content itself that is proprietary, rather than the dataset schema. Therefore, new data can be generated that conform to the schema, and may then be distributed freely alongside the relevant models, allowing other researchers to explore tools in action to their full extent. This paper presents the process of creating synthetic geospatial data within the scope of a research project which relied on real-world data, originally captured through close collaboration with industry partners.
... However, both of these documents, as well as numerous others we reviewed, encourage moving away from existing journal-level metrics, as a proxy for individual quality, in particular the JIF (see for example European Commission, 2018b; Morais and Borrell-Damián, 2018; Ayris et al., 2018). Building on the existing scientific practices of academic citation, the Next Generations Metrics report recommends using citations for every scientific output, highlighting that being able to do this depends on developing corresponding infrastructure and assigning PIDs, which is also noted in the previously mentioned Amsterdam Call for Action on Open Science (Dutch Ministry of Education, Culture and Science, 2016). Other work (Pampel and Dallmeier-Tiessen, 2014) proposes not just usage of citations, but also including some type of "sharing factor" to indicate how much researchers share information for the good of society. The starting point for designing responsible metrics for open science should be defining the desired objectives and outcomes for open science. As metrics are developed, there should also be a program of meta-research about the indicators to assess their likely benefits and consequences as they are applied in evaluation and also to identify unintended biases and consequences (Wilsdon et al., 2017; European Commission, 2018b). ...
Technical Report
We argue that open science can be encouraged and rewarded by developing FAIReR assessments recognizing open science outputs and activities. This requires a variety of stakeholders - research communities, policy makers, funders and publishers - to work together to address social and cultural barriers and challenges. It also requires creating a technical infrastructure, which makes responsible assessments of open science practices and outputs possible. In this report, we focus particularly on two aspects needed to make assessing open science practices both rewarding and technically possible. These were addressed through the course of two intertwined EOSC Co-Creation projects. The first project, European Overview of Career Merit Systems, aimed to survey the existing landscape of the policies, technical systems and data models used in researcher evaluations and to evaluate how these systems support (or do not support) the responsible assessment of open science in academic careers. The second project Vision for Research Data in Research Careers aimed to understand the current state of assessing open science and data practices and to co-create a vision and roadmap for how these practices can be responsibly taken into account in academic careers. We discuss the findings and outcomes from this work here, drawing on work conducted during an intense seven-month period of research and co-creation. During this time, we (i) performed an extensive review of policy documents, reports, manifestos for responsible assessments and the academic literature and (ii) conducted a survey and detailed case studies of five infrastructures. These studies were used to create the overview of the current state of practices and infrastructures presented in Section 1 (Overview) of this report. We also (iii) designed, moderated and engaged in four co-creation bootcamps with experts in open science, research data and academic assessments. 
We drew on the insights developed during the course of these bootcamps to propose the vision and roadmap, validated through a round of open public consultation, which compose the final two sections (Vision and Roadmap) of this report.
... Challenges in scientific publishing are as diverse as they are complex, ranging from navigating the line between scientific rigour and the rising popularity of pre-prints (Kaiser, 2017), issues of equality and bias in publishing (Hofstra et al., 2020; Tomkins et al., 2017), limitations of metrics that evaluate research and researcher impact (Berenbaum, 2019; Statzner & Resh, 2010) and challenges of data availability and reproducibility (Pampel & Dallmeier-Tiessen, 2014). Many publishing challenges also disproportionately affect early-career researchers (ECRs), for example, concerns about impact factor for job applications (Berenbaum, 2019) and biases in peer review (Tomkins et al., 2017). ...
Article
Peer‐review and subject‐matter editing is the backbone of scientific publishing. However, early‐career researchers (ECRs) are given few opportunities to participate in the editorial process beyond reviewing articles. Thus, a disconnect exists: science needs high‐quality editorial talent to conduct, oversee and improve the publishing process, yet we dedicate few resources to building editorial talent or to giving ECRs formal opportunities to influence publishing from within. ECRs can contribute to the publishing landscape in unique ways given their insight into new and rapidly developing publishing trends (e.g. open science). Here, we describe a two‐way fellowship model that gives ECRs a “seat” at the editorial table of a field‐leading journal. We describe both the necessary framework and benefits that can stem from editorial fellowships for ECRs, editors, journals, societies, and the broader scientific community.
... The publishing landscape and early career researchers. Challenges in scientific publishing are as diverse as they are complex, ranging from navigating the line between scientific rigor and the rising popularity of pre-prints (Kaiser 2017), issues of equality and bias in publishing (Tomkins et al. 2017; Hofstra et al. 2020), limitations of metrics that evaluate research and researcher impact (Statzner & Resh 2010; Berenbaum 2019), and challenges of data availability and reproducibility (Pampel & Dallmeier-Tiessen 2014). Many publishing challenges also disproportionately affect early career researchers (ECRs), e.g., concerns about impact factor for job applications (Berenbaum 2019) and biases in peer review (Tomkins et al. 2017). ...
Preprint
Peer-review and subject-matter editing is the backbone of scientific publishing. However, early career researchers (ECRs) are given few opportunities to participate in the editorial process beyond reviewing articles. Thus, a disconnect exists: science needs high-quality editorial talent to conduct, oversee, and improve the publishing process, yet we dedicate few resources to building editorial talent or to giving ECRs formal opportunities to influence the publishing landscape from within. Here, we describe a “two-way” fellowship model that gives ECRs a “seat” at the editorial table of a field-leading journal. We describe both the necessary framework and benefits that can stem from editorial fellowships for ECRs, editors, journals, and the scientific community.
... In times where data is abundant - in marketing (Sheth & Kellstadt, 2020), in finance (Begenau, Farboodi, & Veldkamp, 2018), and in accounting (Bhimani & Willcocks, 2014) - and transparency is key (Beugelsdijk et al., 2020; Mendes-da-Silva, 2019; Mendes-da-Silva & Leal, 2020), articles with open data, open code, and reused data become essential to the research community. All these practices have numerous benefits for researchers (increasing visibility, reputation, and citations), for students (decreasing learning effort and time), and for society in general (increasing transparency and decreasing the cost of science) (Drachen, Ellegaard, Larsen, & Dorch, 2016; McKiernan et al., 2016; Pampel & Dallmeier-Tiessen, 2014; Piwowar & Vision, 2013). ...
Article
Context: this document accompanies the articles in the first edition of the new section of the Journal of Contemporary Administration (RAC): the tutorial-articles section. Objective: the purpose is to present the new section and discuss relevant topics of tutorial-articles. Method: I divide the document into three main parts. First, I summarize the current state of the art in open data and open code, which jointly create the context for tutorial-articles. Second, I provide some guidance on the future of the tutorial-articles section, offering a structure and some insights that can be developed further. Third, I offer a short R script showing examples of open data that, I believe, can be used in future tutorial-articles as well as in innovative empirical studies. Conclusion: finally, I provide a short description of the first tutorial-articles accepted for publication in this current edition of RAC.
... Open data concerns research activity at every stage. The collection, management and publication of data are approached from the standpoint of transparency, sharing and reuse by the scientific community, and are subject to validation at every stage (Pampel and Dallmeier-Tiessen, 2014). ...
Book
Full-text available
Platforms, social networks, online resources, simulations, distance learning, learning data, big data, artificial intelligence… the digital transition is transforming public higher education and research. It changes teaching content, tools, and methods, as well as the roles of teachers and learners. It reshapes research, its practices, its professions, and its ecosystem. To anticipate the changes brought about by the digital transition by 2040, INRAE and Agreenium commissioned this foresight study. Drawing on a group of experts and a synthesis of current trends in the agricultural, environmental, food, and veterinary sciences, it led to the construction of four scenarios whose lessons are relevant well beyond these scientific fields for researchers, teachers, decision-makers, and foresight practitioners. These scenarios shed light on what is at stake in the evolution of learning, knowledge sharing, and the transformation of scientific practices. They open new perspectives on the relationship between science and society, and on the role of public research in the face of the digital giants.
... To funding organizations, it helps to understand how scientists allocate public and private resources, and to evaluate the return on investments in science; it also helps to allocate resources better and to prevent double spending on identical investigations or similar experiments. Finally, for the general public, it helps politicians to create and improve social and economic agendas, it promotes the public's democratic right to access knowledge, and it enhances public engagement with the directions science should take in the future (McKiernan et al., 2016; Pampel & Dallmeier-Tiessen, 2014; Piwowar & Vision, 2013). ...
Technical Report
Full-text available
"We all should trust science" is a common saying among scientists, and also among practitioners and society at large. However, science is facing a credibility and trust crisis (Bergh, Sharp, Aguinis, & Li, 2017), and we have been aware of this crisis since at least 2000 (Millstone & Zwanenberg, 2000). In the last few years, there has been increasing discussion about how to counter it (Peng, 2015). One idea has gained more and more traction over these years: Open Science (OS).
... Among other researchers, Cribb and Sari describe the access to knowledge as a necessity for human development (Cribb and Sari, 2010; Phelps et al., 2012). One aspect of Open Data addresses the reuse of published scientific data (Pampel and Dallmeier-Tiessen, 2014). Often, an academic third party like a publisher holds the rights, so the scientific community is not allowed to reuse this data without permission (Murray-Rust, 2008; Molloy, 2011). ...
Article
Full-text available
Many sectors, like finance, medicine, manufacturing, and education, use blockchain applications to profit from the unique bundle of characteristics of this technology. Blockchain technology (BT) promises benefits in trustability, collaboration, organization, identification, credibility, and transparency. In this paper, we conduct an analysis in which we show how open science can benefit from this technology and its properties. For this, we determined the requirements of an open science ecosystem and compared them with the characteristics of BT to show that the technology is suitable as an infrastructure. We also review literature and promising blockchain-based projects for open science to describe the current research situation. To this end, we examine the projects in particular for their relevance and contribution to open science and categorize them afterwards according to their primary purpose. Several of them already provide functionalities that can have a positive impact on current research workflows. So, BT offers promising possibilities for its use in science, but why is it then not used on a large scale in that area? To answer this question, we point out various shortcomings, challenges, unanswered questions, and research potentials that we found in the literature and identified during our analysis. These topics shall serve as starting points for future research to foster BT for open science and beyond, especially in the long term.
... • Open research data are the results of scientific research, they can be freely digitally accessed, are published in a machine readable form and can be reused. According to Pampel and Dallmeier-Tiessen (2014), open research data are available on the Internet and users can access, copy, analyze, re-process, and use them for any purpose. An important element of open research data are the following FAIR principles: they should be findable, accessible, interoperable, and reusable (Wilkinson et al. 2016). ...
... During recent years, we have observed a strong trend towards Open Science across different stakeholders and disciplines (Pampel & Dallmeier-Tiessen (2014)). Researchers must now submit their research data as supplementary information in order to be in compliance with the data storing requirements of major funding agencies, high profile journals and data journals (Molloy (2011)). ...
Article
Full-text available
The ability to reuse research data is now considered a key benefit for the wider research community. Researchers of all disciplines are confronted with the pressure to share their research data so that it can be reused. The demand for data use and reuse has implications on how we document, publish and share research in the first place, and, perhaps most importantly, it affects how we measure the impact of research, which is commonly a measurement of its use and reuse. It is surprising that research communities, policy makers, etc. have not clearly defined what use and reuse is yet. We postulate that a clear definition of use and reuse is needed to establish better metrics for a comprehensive scholarly record of individuals, institutions, organizations, etc. Hence, this article presents a first definition of reuse of research data. Characteristics of reuse are identified by examining the etymology of the term and the analysis of the current discourse, leading to a range of reuse scenarios that show the complexity of today’s research landscape, which has been moving towards a data-driven approach. The analysis underlines that there is no reason to distinguish use and reuse. We discuss what that means for possible new metrics that attempt to cover Open Science practices more comprehensively. We hope that the resulting definition will enable a better and more refined strategy for Open Science.
... Data curation, as it is typically known, focuses on the movement of data and its management (Research Data Management) to ensure its long-term value (so-called digital preservation) and to encourage secondary use. Over the last 20 years, libraries, data centres and other institutions have increasingly attempted to collaborate, build partnerships, define policies and build up information infrastructures in pursuit of those goals (Pampel and Dallmeier-Tiessen 2014; Oßwald and Strathmann 2012; Reilly 2012). Alongside this, many funding bodies have mandated the creation of research data management plans (RDMP) and institutional Open Research Data policies. ...
Article
Full-text available
The Open Science (OS) agenda has potentially massive cultural, organizational and infrastructural consequences. Ambitions for OS-driven policies have proliferated, within which researchers are expected to publish their scientific data. Significant research has been devoted to studying the issues associated with managing Open Research Data. Digital curation, as it is typically known, seeks to assess data management issues to ensure its long-term value and encourage secondary use. Hitherto, relatively little interest has been shown in examining the immense gap that exists between the OS grand vision and researchers’ actual data practices. Our specific contribution is to examine research data practices before systematic attempts at curation are made. We suggest that interdisciplinary ethnographically-driven contexts offer a perspicuous opportunity to understand the Data Curation and Research Data Management issues that can problematize uptake. These relate to obvious discrepancies between Open Research Data policies and subject-specific research practices and needs. Not least, it opens up questions about how data is constituted in different disciplinary and interdisciplinary contexts. We present a detailed empirical account of interdisciplinary ethnographically-driven research contexts in order to clarify critical aspects of the OS agenda and how to realize its benefits, highlighting three gaps: between policy and practice, in knowledge, and in tool use and development.
Article
Full-text available
Open Science has become established as a strategy for building knowledge and providing access to the knowledge produced in the scientific community. The study analyzes the scientific output on Open Science presented at the Encontro Nacional de Pesquisa em Ciência da Informação (ENANCIB). It is characterized as basic research that is bibliographic, documentary, and survey-based in its technical procedures, descriptive in its objectives, with a quantitative and qualitative approach. The analysis proceeded by critical-reflective inference on the retrieved materials, using Social Network Analysis to analyze the co-authorship network. The results point to 93 papers presented in the 2015 to 2019 editions, with steady growth, submitted in the categories oral communication/full papers and posters/extended abstracts. The Working Groups Information Policy and Economics (GT5), Production and Communication of Information in Science, Technology & Innovation (GT7), and Information and Technology (GT8) stood out in output on this topic. The co-authorship network totaled 180 actors engaged in investigating subjects associated with Open Science, such as open data, open government data, open innovation, academic social networks, data reuse, and research data sharing. Also noteworthy is the focus on topics dealing with scholarly communication, the access to information law, open science, access to information, open data, open access, information science, transparency, open government data, public information, public transparency, and research data, among others. It is concluded that Information Science is a field that contributes to discussions on Open Science.
Article
Data culture/s as a research topic has begun to attract attention from a wide range of disciplines, albeit with inconsistent application of definitions, dimensions, and applications. This work builds on a call to investigate data culture/s within the information studies domain as a topic related to, but distinct from, information culture. The purpose of this study is to explore what is known about data culture/s in greater depth. We apply a retroductive approach to select and consider likely dimensions, inputs, and aspects of data culture/s in order to further map this construct to the literature, and thereby highlight gaps and opportunities to add to this body of knowledge. The initial candidate dimensions explored below include data‐related skills and attitudes, data sharing, data use/reuse, data ethics and governance, and a specific focus on Indigenous perspectives to provide insights on why and how a group may contest the emergent dominant discourse of data culture/s. Our conclusion highlights areas needing further research to fully define and examine the dimensions, inputs, and aspects of data culture/s, and calls for greater understanding and engagement with data culture/s from the information studies community.
Article
Full-text available
Advanced diagnosis systems provide doctors with an abundance of high-quality data, which allows for diagnosing dangerous diseases, such as brain cancers. Unfortunately, humans flooded with such plentiful information might overlook tumor symptoms. Hence, diagnostic devices are increasingly combined with software systems that support the decision-making process. This work addresses the design of a neural-network-based system that automatically diagnoses brain tumors from MRI images and highlights important areas. The application is intended to speed up diagnosis and lower the risk of missing a neoplastic lesion. The study, based on two types of neural networks, Convolutional Neural Networks and Vision Transformers, aimed to assess the capabilities of the innovative ViT and its possible future evolution compared with well-known CNNs. The research reveals a tumor recognition rate as high as 90% with both architectures, while the Vision Transformer turned out to be easier to train and provided more detailed decision reasoning. The results show that computer-aided diagnosis and ViTs might be a significant part of modern medicine development in IoT and healthcare systems.
Article
Purpose: The purpose of this study is to examine the development of Dataverse, a global research data management consortium. The authors examine specifically the institutional characteristics, the utilization of the associated data sets and the relevant research data management services at its participating university libraries. This evidence-based approach is essential for understanding the current state of research data management practices in the global context. Design/methodology/approach: The data was collected from 67 participants' data portals between December 1, 2020, and January 31, 2021. Findings: Over 80% of its current participants joined the group in the past five years, 2016–2020. Thirty-three Dataverse portals have had less than 10,000 total downloads since their inception. Twenty-nine participating universities are included in three major global university ranking systems, and 18 of those university libraries offer research data services. Originality/value: This project is an explorative study on Dataverse, an international research data management consortium. The findings contribute to the understanding of the current development of the Dataverse project as well as the practices at the participating institutions. Moreover, they offer insights to other global higher education institutions and research organizations regarding research data management. While this study is practical, its findings and observations could be of use to future researchers interested in developing a framework for data work in academic libraries.
Article
Full-text available
This study analyzes scientists' interest in sharing research data on the basis of Robert K. Merton's ethos of science. Thirty scientific documents were reviewed, retrieved through searches of databases such as Scopus and Web of Science; the interpretive content analysis was carried out with the ATLAS.ti software. The findings highlight various factors associated with scientists' refusal to share data in science. Other positions were also found that promote awareness in favor of distributing data, on the grounds that the common good stands above private interests. Finally, the study confirms that the ethos is an appropriate framework for examining researchers' interest in distributing research data.
Chapter
Openly sharing research data helps ensure research transparency and reproducibility, thus advancing scientific discoveries. As a pivotal player in the scholarly publishing ecosystem, more and more journals adopt data policies to encourage data sharing. Many previous studies have investigated the prevalence of journal data policies (JDPs). However, prior work usually focuses on the policy prevalence in certain disciplines at a specific time. To provide a comprehensive understanding of how JDPs have evolved across scientific areas over time, a systematic literature review was conducted. This study takes a content analysis approach to review 42 empirical studies that examine the prevalence of JDPs and their respective policy emphases. An upward trend was observed in the proportion of journals having data policies over the past two decades. The present study also reveals the policy emphases repeatedly discussed in the literature such as policy strength and suggested data sharing methods. The coding results and a reusable coding frame were made publicly available as a data package via the Open Science Framework (OSF) platform. Keywords: Journal data policies; Research data sharing; Open science; Systematic literature review
Article
This study discusses the sharing of research data through the Repositori Ilmiah Nasional, the Indonesian national scientific repository, which is managed by the Center for Scientific Data and Documentation, Indonesian Institute of Sciences (Pusat Dokumentasi dan Informasi Ilmiah, Lembaga Ilmu Pengetahuan Indonesia, known by the abbreviation PDDI-LIPI). The purpose of this study is to describe the process of research data sharing and identify supporting factors and obstacles faced in that process. This study uses a qualitative approach, with a case study method. Data collection techniques included field observations and observations on the repository system; semi-structured interviews with several informants, including researchers as well as development and librarian teams; and analysis of policy documents and guidelines. Through these investigations, we discovered that while the Center has developed a new DataVerse repository system to enable research data sharing, there are still several issues that impede the repository from meeting institutional goals for increased data access. There is a need for additional training and socialization of researchers, to encourage and motivate them to share their research data through this service. Additionally, staff members need to gain competence in the management and curation of data. Researchers and librarians involved in research data sharing activities still face various obstacles in the areas of policy, service visibility, and promotion. This research is expected to increase the awareness of researchers, librarians, and repository development teams about each other's needs and to aid them in collaborating with each other to optimize the sharing of research data through the repository.
Article
Full-text available
Purpose/Thesis: Journals, as one of the primary channels of scholarly communication, should support researchers in the process of openly sharing research data. Making such data public has a positive effect on the quality of research, reduces its costs, and fosters scientific collaboration. The importance of this issue prompts research into the strategies for handling research data adopted by Polish and foreign journals. The article examines this problem using a group of 198 Polish and 95 foreign journals in the historical sciences. Approach/Methods: The strategy for handling research data was examined by analyzing the instructions for authors published on the websites of historical journals included in the MNiSW list of scored journals and of foreign journals with an Impact Factor. The instructions were searched for references to the handling of research data. Results and conclusions: The analysis shows that journals in the historical sciences are reluctant to introduce research data policies. This is particularly evident for Polish periodicals, although among foreign journals with an established Impact Factor the implementation of appropriate practices is not widespread either. Originality/Value: The study shows one of the overlooked aspects of how scholarly journals operate, especially in the context of the discussion on ensuring open access to scholarly publications and research data. It also justifies the need for scholarly journals to implement good practices related to sharing research data.
Article
Background: In the context of globalization, Vietnamese universities, whose primary function is teaching, need to improve their research performance. Methods: Based on SSHPA data, an exclusive database of Vietnamese social sciences and humanities researchers' productivity covering the 2008-2019 period, this study analyzes the research output of Vietnamese universities in the field of social sciences and humanities. Results: Vietnamese universities steadily produced a high volume of publications in the 2008-2019 period, with a peak of 598 articles in 2019. Moreover, many private universities and institutions are also joining the publication race, increasing competition in the country. Conclusions: Solutions to improve both the quantity and quality of Vietnamese universities' research practice in the context of the industrial revolution 4.0 could include applying international criteria in Vietnamese higher education, developing scientific and critical thinking in general and STEM education, and promoting science communication.
Article
Free and unrestricted access (Open Access) to scientific knowledge can help support efficient, transparent, and sustainable scientific work. Within the Open Access movement, research data, as a central foundation of scientific knowledge, are becoming increasingly important because of the steadily growing volume of data in all fields of research. Research data management and the shared use of data enable collaborative work and make research traceable and reusable across disciplinary boundaries. For a narrowly defined field of application with high potential for standardization (motor performance test data), the eResearch infrastructure MO|RE data has set itself the goal of making data publicly accessible and citable. Digital Object Identifiers (DOIs) are used, which enable the pooling of different data sets. Through a systematic search of repository registries and a survey of motor test users, the following study analyzed the need for and relevance of the eResearch infrastructure for the field of motor tests. Survey participants were asked about their interest in using MO|RE data and about their willingness to make their own motor test data openly accessible on MO|RE data. Currently, no comparable database exists that makes motor test data freely accessible. Among the 143 survey participants, motor test users from scientific institutions as well as from other settings (schools, kindergartens, sports clubs) showed great interest in using MO|RE data. Interest among data holders was markedly higher than among non-data holders. The willingness to provide one's own data on MO|RE data was very high, at over 70%.
In summary, Open Data met with high acceptance and approval in the sample studied from the sport science field of motor tests.
Article
Full-text available
The objective of this study is to develop model guidelines addressing legal impediments to open access to publicly funded research data in Malaysia. Previous studies have identified legal impediments to open access arising from intellectual property, confidentiality, privacy, national security, patent and tort laws. These legal impediments have not been fully addressed by public research funding agencies in Malaysia, hence the need to develop model guidelines. This study conducted a comparative analysis of the principles, policies and guidelines on open access to research data of civil society, government bodies, research funding agencies and research institutions in Australia, Canada, the EU, the UK and the USA. This comparative analysis attempts to identify the appropriate measures to address the legal impediments to open access to research data. The model guidelines are of international standard and suitable for adoption by public research funding agencies and research institutions in Malaysia. Hence, the model guidelines can become a benchmark in pursuing the objective of enabling open access to publicly funded research data in Malaysia.
Article
Full-text available
This article aims to present the FAIR principles and the Global Open FAIR initiative, which seeks to disseminate these principles in all countries interested in applying FAIR (Findable, Accessible, Interoperable, Reusable) data in their information services. It also proposes outreach to and training of teaching and research institutions in these principles, in order to promote standardization in data management and guarantee interoperability among data. As its methodological procedure, it uses a bibliographic and documentary review to provide the theoretical basis on open science and open access to scientific information and research data, with the aim of grounding the FAIR principles in research data management applications and services. It highlights the importance of this type of initiative for the worldwide expansion of open research data within open science. Finally, it points to the need for a change in science and technology research processes toward the adoption of these principles.
Article
Full-text available
The validation of scientific results requires reproducible methods and data. Often, however, data sets supporting research articles are not openly accessible and interlinked. This analysis tests whether open sharing and linking of supporting data through the PANGAEA® data library measurably increases the citation rate of articles published between 1993 and 2010 in the journal Paleoceanography as reported in the Thomson Reuters Web of Science database. The 12.85% (171) of articles with publicly available supporting data sets received 19.94% (8,056) of the aggregate citations (40,409). Publicly available data were thus significantly (p=0.007, 95% confidence interval) associated with about 35% more citations per article than the average of all articles sampled over the 18-year study period (1,331), and the increase is fairly consistent over time (14 of 18 years). This relationship between openly available, curated data and increased citation rate may incentivize researchers to share their data.
Article
Full-text available
The 'Berlin Declaration' was published in 2003 as a guideline to policy makers to promote the Internet as a functional instrument for a global scientific knowledge base. Because knowledge is derived from data, the principles of the 'Berlin Declaration' should apply to data as well. Today, access to scientific data is hampered by structural deficits in the publication process. Data publication needs to offer authors an incentive to publish data through long-term repositories. Data publication also requires an adequate licence model that protects the intellectual property rights of the author while allowing further use of the data by the scientific community.
Article
Full-text available
Context: The free and open sharing of information, data, and materials regarding published research is vital to the replication of published results, the efficient advancement of science, and the education of students. Yet in daily practice, the ideal of free sharing is often breached. Objective: To understand the nature, extent, and consequences of data withholding in academic genetics. Design, Setting, and Participants: Mailed survey (March-July 2000) of geneticists and other life scientists in the 100 US universities that received the most funding from the National Institutes of Health in 1998. Of a potential 3000 respondents, 2893 were eligible and 1849 responded, yielding an overall response rate of 64%. We analyzed a subsample of 1240 self-identified geneticists and made a limited number of comparisons with 600 self-identified nongeneticists. Main Outcome Measures: Percentage of faculty who made requests for data that were denied; percentage of respondents who denied requests; influences on and consequences of withholding data; and changes over time in perceived willingness to share data. Results: Forty-seven percent of geneticists who asked other faculty for additional information, data, or materials regarding published research reported that at least 1 of their requests had been denied in the preceding 3 years. Ten percent of all postpublication requests for additional information were denied. Because they were denied access to data, 28% of geneticists reported that they had been unable to confirm published research. Twelve percent said that in the previous 3 years, they had denied another academician's request for data concerning published results.
Among geneticists who said they had intentionally withheld data regarding their published work, 80% reported that it required too much effort to produce the materials or information; 64%, that they were protecting the ability of a graduate student, postdoctoral fellow, or junior faculty member to publish; and 53%, that they were protecting their own ability to publish. Thirty-five percent of geneticists said that sharing had decreased during the last decade; 14%, that sharing had increased. Geneticists were as likely as other life scientists to deny others' requests (odds ratio [OR], 1.39; 95% confidence interval [CI], 0.81-2.40) and to have their own requests denied (OR, 0.97; 95% CI, 0.69-1.40). However, other life scientists were less likely to report that withholding had a negative impact on their own research as well as their field of research. Conclusions: Data withholding occurs in academic genetics and it affects essential scientific activities such as the ability to confirm published results. Lack of resources and issues of scientific priority may play an important role in scientists' decisions to withhold data, materials, and information from other academic geneticists.
Without the free exchange of published scientific information and resources, researchers may unknowingly build on something less than the total accumulation of scientific knowledge or work on problems already solved [1]. However, a number of instances of data withholding (defining data to include the full range of research results, techniques, and materials useful in future investigations and withholding as the failure to share such published data) have been reported [2-7]. A 1994-1995 survey of academic life scientists found that 34% of respondents were denied research results requested from a fellow university scientist in the previous 3 years, and 8.9% said they had denied a request from another university scientist for access to research results [8]. Weinberg [9] asserts that secrecy is more common in genetics and particularly human genetics than in other areas. Reasons may include the increased scientific competitiveness of the field and the opportunities for commercial applications [10]. Research has shown that scientists who reported conducting research on goals similar to that of the Human Genome Project (HGP) were more likely to deny requests for information, data, and materials than were other life scientists [8]. Understanding the withholding of information, data, and materials may be particularly important in genetics for a number of reasons.
First, since academic geneticists publish more articles in peer-reviewed journals, teach more, and serve in more leadership roles in their university and discipline than do their colleagues in other biomedical specialties, the sharing and withholding practices of geneticists may have a disproportionate impact on university policy, the behavior of junior faculty, and the training and socialization of graduate students and postdoctoral fellows [11]. Second, understanding the role of genetics in human disease is believed to be important to the future of medicine [12]. Clearly, the progress made in mapping and sequencing the human genome represents a major step toward scientific breakthroughs in genetic-based diagnostics, preventive technologies, and therapeutics. The rate of progress in realizing these medical benefits may depend somewhat on the extent to which the results of genetic investigations flow freely among scientists in the field. There is scant empirical evidence regarding sharing and withholding in academic genetics. For example, little is known about the extent to which geneticists share and withhold information and how these behaviors have changed over time. Nor do we know much about the reasons researchers withhold information, data, or materials from other academicians and what impact this behavior has on individual researchers or on the field of genetics as a whole. To address these issues, we conducted a national study of data sharing and data withholding in academic genetics, with a comparison group of other life sciences.
Article
Full-text available
This paper presents some indications of the existence of a Citation Advantage related to linking to data, using astrophysics as a case. Using simple measures, I find that the Citation Advantage presently (at least since 2009 and in The Astrophysical Journal) amounts to papers with links to data receiving on average 50% more citations per paper per year than papers without links to data. A similar study by other authors showed a cumulative effect after several years amounting to 20%. Hence, a Data Sharing Citation Advantage seems inevitable.
Article
Full-text available
Free and open access to primary biodiversity data is essential for informed decision-making to achieve conservation of biodiversity and sustainable development. However, primary biodiversity data are neither easily accessible nor discoverable. Among several impediments, one is a lack of incentives to data publishers for publishing their data resources. One such mechanism currently lacking is recognition through conventional scholarly publication of enriched metadata, which should ensure rapid discovery of 'fit-for-use' biodiversity data resources. We review the state of the art of data discovery options and the mechanisms in place for incentivizing data publishers' efforts towards easy, efficient and enhanced publishing, dissemination, sharing and re-use of biodiversity data. We propose the establishment of the 'biodiversity data paper' as one possible mechanism to offer scholarly recognition for efforts and investment by data publishers in authoring rich metadata and publishing them as citable academic papers. While detailing the benefits to data publishers, we describe the objectives, work flow and outcomes of the pilot project commissioned by the Global Biodiversity Information Facility in collaboration with scholarly publishers and pioneered by Pensoft Publishers through its journals ZooKeys, PhytoKeys, MycoKeys, BioRisk, NeoBiota, Nature Conservation and the forthcoming Biodiversity Data Journal. We then debate further enhancements of the data paper beyond the pilot project and attempt to forecast the future uptake of data papers as an incentivization mechanism by the stakeholder communities. We believe that in addition to recognition for those involved in the data publishing enterprise, data papers will also expedite publishing of fit-for-use biodiversity data resources. 
However, uptake and establishment of the data paper as a potential mechanism of scholarly recognition requires a high degree of commitment and investment by the cross-sectional stakeholder communities.
Article
Full-text available
GenBank® (http://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for over 300 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.
Article
Full-text available
Scientific research in the 21st century is more data intensive and collaborative than in the past. It is important to study the data practices of researchers--data accessibility, discovery, re-use, preservation and, particularly, data sharing. Data sharing is a valuable part of the scientific method allowing for verification of results and extending research from prior results. A total of 1329 scientists participated in this survey exploring current data sharing practices and perceptions of the barriers and enablers of data sharing. Scientists do not make their data electronically available to others for various reasons, including insufficient time and lack of funding. Most respondents are satisfied with their current processes for the initial and short-term parts of the data or research lifecycle (collecting their research data; searching for, describing or cataloging, analyzing, and short-term storage of their data) but are not satisfied with long-term data preservation. Many organizations do not provide support to their researchers for data management both in the short- and long-term. If certain conditions are met (such as formal citation and sharing reprints) respondents agree they are willing to share their data. There are also significant differences and approaches in data management practices based on primary funding agency, subject discipline, age, work focus, and world region. Barriers to effective data sharing and preservation are deeply rooted in the practices and culture of the research process as well as the researchers themselves. New mandates for data management plans from NSF and other federal agencies and world-wide attention to the need to share and preserve data could lead to changes. Large scale programs, such as the NSF-sponsored DataNET (including projects like DataONE) will both bring attention and resources to the issue and make it easier for scientists to apply sound data management principles.
Article
Full-text available
Three articles from the early years of Molecular Biology of the Cell (MBoC) have had remarkably many citations in the literature since their publication approximately 10 years ago. As a coauthor of these articles and the former editor of MBoC, I was asked for possible explanations. I believe the answer lies in the unusual nature of these articles: each presents and summarizes gene expression data for nearly every gene in the yeast or human genomes. Continuing interest in the data themselves by cell biologists, rather than results or conclusions drawn by the authors, best accounts for the citation history. The flatness of the numbers of citations over time, the continuing high rate of accesses to individual Web sites set up to allow searching and display of the underlying data, and the large fraction of citations in journals focused on mathematics and computation all support the same conclusion: it's the data.
Article
Full-text available
Many journals now require that authors share their data with other investigators, either by depositing the data in a public repository or by making them freely available upon request. These policies are explicit, but remain largely untested. We sought to determine how well authors comply with such policies by requesting data from authors who had published in one of two journals with clear data sharing policies. We requested data from ten investigators who had published in either PLoS Medicine or PLoS Clinical Trials. All responses were carefully documented. In the event that we were refused data, we reminded authors of the journal's data sharing guidelines. If we did not receive a response to our initial request, a second request was made. Following the ten requests for raw data, three investigators did not respond, four authors responded and refused to share their data, two email addresses were no longer valid, and one author requested further details. A reminder of PLoS's explicit requirement that authors share data did not change the reply from the four authors who initially refused. Only one author sent an original data set, so we received only one of the ten raw data sets requested. This suggests that journal policies requiring data sharing do not lead to authors making their data sets available to independent investigators.
Article
Full-text available
Rapid release of prepublication data has served the field of genomics well. Attendees at a workshop in Toronto recommend extending the practice to other biological data sets.
Article
Full-text available
The free and open sharing of information, data, and materials regarding published research is vital to the replication of published results, the efficient advancement of science, and the education of students. Yet in daily practice, the ideal of free sharing is often breached. The objective of this study was to understand the nature, extent, and consequences of data withholding in academic genetics. The study was a mailed survey (March-July 2000) of geneticists and other life scientists in the 100 US universities that received the most funding from the National Institutes of Health in 1998. Of a potential 3000 respondents, 2893 were eligible and 1849 responded, yielding an overall response rate of 64%. We analyzed a subsample of 1240 self-identified geneticists and made a limited number of comparisons with 600 self-identified nongeneticists. The main outcome measures were the percentage of faculty who made requests for data that were denied; the percentage of respondents who denied requests; influences on and consequences of withholding data; and changes over time in perceived willingness to share data. Forty-seven percent of geneticists who asked other faculty for additional information, data, or materials regarding published research reported that at least 1 of their requests had been denied in the preceding 3 years. Ten percent of all postpublication requests for additional information were denied. Because they were denied access to data, 28% of geneticists reported that they had been unable to confirm published research. Twelve percent said that in the previous 3 years, they had denied another academician's request for data concerning published results. Among geneticists who said they had intentionally withheld data regarding their published work, 80% reported that it required too much effort to produce the materials or information; 64%, that they were protecting the ability of a graduate student, postdoctoral fellow, or junior faculty member to publish; and 53%, that they were protecting their own ability to publish. 
Thirty-five percent of geneticists said that sharing had decreased during the last decade; 14%, that sharing had increased. Geneticists were as likely as other life scientists to deny others' requests (odds ratio [OR], 1.39; 95% confidence interval [CI], 0.81-2.40) and to have their own requests denied (OR, 0.97; 95% CI, 0.69-1.40). However, other life scientists were less likely to report that withholding had a negative impact on their own research as well as their field of research. Data withholding occurs in academic genetics and it affects essential scientific activities such as the ability to confirm published results. Lack of resources and issues of scientific priority may play an important role in scientists' decisions to withhold data, materials, and information from other academic geneticists.
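The survey figures quoted in this abstract can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the helper names `response_rate` and `ci_includes_null` are hypothetical and do not come from the study, and all numbers are copied from the abstract. It recomputes the response rate and tests whether each reported 95% confidence interval for the odds ratio spans 1, the value that would indicate no difference between geneticists and other life scientists.

```python
def response_rate(responded: int, eligible: int) -> float:
    """Fraction of eligible recipients who returned the survey."""
    return responded / eligible


def ci_includes_null(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """True if the confidence interval contains the null value (1.0 for an odds ratio)."""
    return lower <= null_value <= upper


# Figures as reported in the abstract.
rate = response_rate(responded=1849, eligible=2893)
deny_others = ci_includes_null(0.81, 2.40)   # OR 1.39 for denying others' requests
were_denied = ci_includes_null(0.69, 1.40)   # OR 0.97 for having requests denied

print(f"Response rate: {rate:.1%}")
print(f"CI for denying requests includes 1: {deny_others}")
print(f"CI for being denied includes 1: {were_denied}")
```

Both intervals contain 1, which is consistent with the abstract's wording that geneticists were "as likely as" other life scientists to deny requests and to have their own requests denied.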
Article
Full-text available
The origin of the present comment lies in a failed attempt to obtain, through e-mailed requests, data reported in 141 empirical articles recently published by the American Psychological Association (APA). Our original aim was to reanalyze these data sets to assess the robustness of the research findings to outliers. We never got that far. In June 2005, we contacted the corresponding author of every article that appeared in the last two 2004 issues of four major APA journals. Because their articles had been published in APA journals, we were certain that all of the authors had signed the APA Certification of Compliance With APA Ethical Principles, which includes the principle on sharing data for reanalysis. Unfortunately, 6 months later, after writing more than 400 e-mails (and sending some corresponding authors detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes), we ended up with a meager 38 positive reactions and the actual data sets from 64 studies (25.7% of the total number of 249 data sets). This means that 73% of the authors did not share their data.
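The two percentages in this abstract rest on different denominators, which is easy to miss. The short Python sketch below shows one reading that is consistent with the reported figures; it is an assumption, not a calculation taken from the paper, and the variable names are illustrative. The 25.7% is taken per data set (64 of 249), while the 73% is taken per author, assuming one corresponding author for each of the 141 articles and 38 positive reactions.

```python
# Figures as reported in the abstract.
datasets_received = 64
datasets_requested = 249
articles_contacted = 141   # assumed: one corresponding author per article
positive_reactions = 38

# Share of requested data sets that were actually received.
share_received = datasets_received / datasets_requested

# Share of contacted authors who did not react positively.
share_not_sharing = 1 - positive_reactions / articles_contacted

print(f"Data sets received: {share_received:.1%}")
print(f"Authors not sharing: {share_not_sharing:.0%}")
```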
Article
Full-text available
Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression. This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.
Article
Scientists around the world are addressing the need to increase access to research data. Science is international and global cooperation is imperative. DataCite, launched in December 2009, is an association of more than a dozen members from 10 countries and growing, that enables researchers to locate, identify, and cite research datasets with confidence, and plays a global leadership role in promoting the use of persistent identifiers for datasets. In June 2010, the first DataCite summer meeting took place in Hannover, Germany and provided a forum for 25 speakers and nearly 100 participants from Europe, North America and Australia to exchange information for handling research data. This special issue of D-Lib Magazine includes eight articles derived from talks given at the summer meeting and one additional article on the quality of research data. Together, these articles provide a snapshot of the state-of-the-art on these topics.
Article
In 2008, ESSD was established to provide a venue for publishing highly important research data, with two main aims: To provide reward for data "authors" through fully qualified citation of research data, classically aligned with the certification of quality of a peer reviewed journal. A major step towards this goal was the definition and rationale of article structure and review criteria for articles about datasets.
Article
The main conclusion is that publishers and repositories have the building blocks and the tools, but in general do not use them to create an Enhanced Publication for all three information categories. Publishers and repositories should offer the service and tools to add research data, extra materials and post-publication data to the publications. Researchers should be responsible for the content.
Book
The advancing digitization of science is leading to a rapidly growing volume of digital research data.1 In science policy, the demand for responsible handling of these data is gaining importance. Within the framework of e-science and cyberinfrastructure,2 concepts of research data management are being discussed and applied. The diverse and often discipline-specific challenges of handling scientific data call for closer cooperation between researchers and infrastructure service institutions. Libraries have the opportunity to actively shape the development of organizational and technical solutions for research data management3 and to take on a leading role in this field. To this end, librarians are increasingly expected to have communication and interface skills.
Chapter
Taking disciplinary requirements into account, actors from research, research management, and infrastructure institutions are beginning to issue statements on the handling of research data. These statements, which are often grouped under the term policy, vary depending on the actor and the target audience. The chapter gives an overview of the diversity of policies and describes the challenges of implementing these statements, whether they are recommendations or requirements.
Article
Scientists are getting to know the pathogen causing the deadliest outbreak of enterohemorrhagic Escherichia coli bacteria on record in unprecedented detail via tweets, wikis, and blogs.
Article
The deluge of scientific research data has excited the general public, as well as the scientific community, with the possibilities for better understanding of scientific problems, from climate to culture. For data to be available, researchers must be willing and able to share them. The policies of governments, funding agencies, journals, and university tenure and promotion committees also influence how, when, and whether research data are shared. Data are complex objects. Their purposes and the methods by which they are produced vary widely across scientific fields, as do the criteria for sharing them. To address these challenges, it is necessary to examine the arguments for sharing data and how those arguments match the motivations and interests of the scientific community and the public. Four arguments are examined: to make the results of publicly funded data available to the public, to enable others to ask new questions of extant data, to advance the state of science, and to reproduce research. Libraries need to consider their role in the face of each of these arguments, and what expertise and systems they require for data curation.
Compilation of results on drivers and barriers and new opportunities
  • S Dallmeier-Tiessen
GenBank celebrates 25 years of service with two day conference. Leading scientists will discuss the DNA database at April 7-8 Meeting
  • K Cravedi
Commission recommendation on access to and preservation of scientific information. C(2012) 4890 final
  • European Commission
Opening Science Through e-infrastructures. Available at: europa.eu/rapid/pressReleasesAction.do?reference=SPEECH/12/258
  • N Kroes
Insight into digital preservation of research output in Europe
  • T Kuipers
  • J Van Der Hoeven
The role of libraries in curation and preservation of research data in Germany: Findings of a survey
  • A Osswald
  • S Strathmann
The role of libraries in supporting data exchange
  • S Reilly
International large-scale sequencing meeting
  • D Smith
  • A Carrano
The German E. coli outbreak: 40 lives and hours of crowdsourced sequence analysis later
  • M Turner
Sharing data from largescale biological research projects. A system of tripartite responsibility
  • Wellcome Trust
National Science Foundation. (2011a). Award and administration guide. Chapter VI other post award requirements and considerations. Available at: http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID4.
Nature. (2002). How to encourage the right behaviour. Nature, 416(6876), 1. doi:10.1038/416001b.
Pampel, H., Bertelmann, R., & Hobohm, H.-C. (2010). "Data librarianship" Rollen, Aufgaben, Kompetenzen. In U. Hohoff & C. Schmiedeknecht (eds.), Ein neuer Blick auf Bibliotheken. Hildesheim: Olms, pp. 159–176. Available at: http://econpapers.repec.org/paper/rswrswwps/rswwps144.htm.
Science. (2011). Challenges and opportunities. Science, 331(6018), 692–693. doi:10.1126/science.331.6018.692.
Nature Biotechnology. (2009). Credit where credit is overdue. Nature Biotechnology, 27(7), 579. doi:10.1038/nbt0709-579.
All European Academies. (2012). Open Science for the 21st century, Rome, Italy. Available at: http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/allea-declaration-1.pdf.
Henneken, E.A. & Accomazzi, A. (2011). Linking to data - effect on citation rates in astronomy. Digital Libraries; Instrumentation and Methods for Astrophysics. Available at: http://arxiv.org/abs/1111.3618v1.
Brase, J. & Farquhar, A. (2011). Access to research data. D-Lib Magazine, 17(1/2). doi:10.1045/january2011-brase.
Benson, D.A., et al. (2012). GenBank. Nucleic Acids Research, 40(D1), D48–D53. doi:10.1093/nar/gkr1202.
Turner, M. (2011). The German E. coli outbreak: 40 lives and hours of crowdsourced sequence analysis later. Nature News Blog. Available at: http://blogs.nature.com/news/2011/06/the_german_e_coli_outbreak_40.html.
van der Graaf, M. & Waaijers, L. (2011). A Surfboard for riding the wave. Towards a four country action programme on research data, Wellcome Trust. Available at: http://www.knowledge-exchange.info/Admin/Public.
222 H. Pampel and S. Dallmeier-Tiessen
High Level Expert Group on Scientific Data. (2010). Riding the wave. How Europe can gain from the rising tide of scientific data. Available at: http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/hlg-sdi-report.pdf.
Online survey on scientific information in the digital age
  • European Commission