Article

Metadata implementation and data discoverability: A survey on university libraries' Dataverse portals

Article
Research Data Management (RDM) has become increasingly important for more and more academic institutions. Using the Peking University Open Research Data Repository (PKU-ORDR) project as an example, this paper will review a library-based university-wide open research data repository project and related RDM services implementation process including project kickoff, needs assessment, partnerships establishment, software investigation and selection, software customization, as well as data curation services and training. Through the review, some issues revealed during the stages of the implementation process are also discussed and addressed in the paper such as awareness of research data, demands from data providers and users, data policies and requirements from home institution, requirements from funding agencies and publishers, the collaboration between administrative units and libraries, and concerns from data providers and users. The significance of the study is that the paper shows an example of creating an open data repository and RDM services for other Chinese academic libraries planning to implement their RDM services for their home institutions. The authors of the paper have also observed that, since the PKU-ORDR and RDM services were implemented in 2015, the Peking University Library (PKUL) has helped numerous researchers to support the entire research lifecycle and enhanced open science practices on campus, as well as impacted the national open science movement in China through various national events and activities hosted by the PKUL.
Article
The increasing amount of publicly available research data provides the opportunity to link and integrate data in order to create and prove novel hypotheses, to repeat experiments or to compare recent data to data collected at a different time or place. However, recent studies have shown that retrieving relevant data for data reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces a large amount of heterogeneous data. In particular, we focus on scholarly search interests and metadata, the primary source of data in a dataset retrieval system. We show that existing metadata currently poorly reflect information needs and therefore are the biggest obstacle in retrieving relevant data. Our findings indicate that, for data seekers in the biodiversity domain, environments, materials and chemicals, species, biological and chemical processes, locations, data parameters and data types are important information categories. These interests are well covered in metadata elements of domain-specific standards. However, instead of utilizing these standards, large data repositories tend to use metadata standards with domain-independent metadata fields that cover search interests only to some extent. A second problem is the arbitrary keywords used in descriptive fields such as title, description or subject. Keywords support scholars in a full text search only if the provided terms syntactically match or their semantic relationship to terms used in a user query is known.
Article
A 5‐year project to study scientific data uses in geography, starting in 1999, evolved into 20 years of research on data practices in sensor networks, environmental sciences, biology, seismology, undersea science, biomedicine, astronomy, and other fields. By emulating the “team science” approaches of the scientists studied, the UCLA Center for Knowledge Infrastructures accumulated a comprehensive collection of qualitative data about how scientists generate, manage, use, and reuse data across domains. Building upon Paul N. Edwards's model of “making global data”—collecting signals via consistent methods, technologies, and policies—to “make data global”—comparing and integrating those data, the research team has managed and exploited these data as a collaborative resource. This article reflects on the social, technical, organizational, economic, and policy challenges the team has encountered in creating new knowledge from data old and new. We reflect on continuity over generations of students and staff, transitions between grants, transfer of legacy data between software tools, research methods, and the role of professional data managers in the social sciences.
Article
Purpose: Research data management (RDM) has been called a “ground-breaking” area for research libraries and it is among the top future trends for academic libraries. Hence, this study aims to systematically review RDM practices and services, primarily focusing on the challenges, services and skills along with the motivational factors associated with them. Design/methodology/approach: A systematic literature review method was used, focusing on literature produced between 2016 and 2020 to understand the latest trends. An extensive research strategy was framed and 15,206 results appeared. Finally, 19 studies fulfilled the criteria to be included in the study, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Findings: RDM is gradually gaining importance among researchers and academic libraries; however, it is still poorly practiced by both. Although it is better observed in developed countries than in developing countries, there are many challenges associated with RDM practices by researchers and services by libraries. These challenges demand certain sets of skills to be developed for better practices and services. An active collaboration is required among stakeholders and university services departments to figure out the challenges and issues. Research limitations/implications: The implications from the policy and practical points of view present how research data can be better managed in the future by researchers and library professionals. The expected/desired role of key stakeholders in this regard is also highlighted. Originality/value: RDM is an important and emerging area. Researchers and Library and Information Science professionals are not comprehensively managing research data as it involves complex cooperation among various stakeholders. A combination of measures is required to better manage research data that would ultimately move forward for open access publishing.
Article
Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently beta-released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems and discuss what makes dataset search a field in its own right, with unique challenges and open questions. We look at approaches and implementations from related areas dataset search is drawing upon, including information retrieval, databases, entity-centric and tabular search in order to identify possible paths to tackle these questions as well as immediate next steps that will take the field forward.
Article
This paper reports the results of an international survey on research data management (RDM) services in libraries. More than 240 practicing librarians responded to the survey and outlined their roles and levels of preparedness in providing RDM services, challenges their libraries face, and knowledge and skills that they deemed essential to advance the RDM practice. Findings of the study revealed not only a number of location and organizational differences in RDM services and tools provided but also the impact of the level of preparedness and degree of development in RDM roles on the types of RDM services provided. Respondents’ perceptions on both the current challenges and future roles of RDM services were also examined. With a majority of the respondents recognizing the importance of RDM and hoping to receive more training while expressing concerns of lack of bandwidth or capacity in this area, it is clear that, in order to grow RDM services, institutional commitment to resources and training opportunities is crucial. As an emergent profession, data librarians need to be nurtured, mentored, and further trained. The study makes a case for developing a global community of practice where data librarians work together, exchange information, help one another grow, and strive to advance RDM practice around the world.
Article
‘Metadata’ has received a fraction of the attention that ‘data’ has received in sociological studies of scientific research. A neglect of ‘metadata’ reduces the attention on a number of critical aspects of scientific work processes, including documentary work, accountability relations, and collaboration routines. Metadata processes and products are essential components of the work needed to practically accomplish day-to-day scientific research tasks, and are central to ensuring that research findings and products meet externally driven standards or requirements. This article is an attempt to open up the discussion on and conceptualization of metadata within the sociology of science and the sociology of data. It presents ethnographic research of metadata creation within everyday scientific practice, focusing on how researchers document, describe, annotate, organize and manage their data, both for their own use and the use of researchers outside of their project. In particular, this article argues that the role and significance of metadata within scientific research contexts are intimately tied to the nature of evidence and accountability within particular social situations. Studying metadata can (1) provide insight into the production of evidence, that is, how something we might call ‘data’ becomes able to serve an evidentiary role, and (2) provide a mechanism for revealing what people in research contexts are held accountable for, and what they achieve accountability with.
Article
As data repositories make more data openly available it becomes challenging for researchers to find what they need either from a repository or through web search engines. This study attempts to investigate data users’ requirements and the role that data repositories can play in supporting data discoverability by meeting those requirements. We collected 79 data discovery use cases (or data search scenarios), from which we derived nine functional requirements for data repositories through qualitative analysis. We then applied usability heuristic evaluation and expert review methods to identify best practices that data repositories can implement to meet each functional requirement. We propose the following ten recommendations for data repository operators to consider for improving data discoverability and user’s data search experience: 1. Provide a range of query interfaces to accommodate various data search behaviours. 2. Provide multiple access points to find data. 3. Make it easier for researchers to judge relevance, accessibility and reusability of a data collection from a search summary. 4. Make individual metadata records readable and analysable. 5. Enable sharing and downloading of bibliographic references. 6. Expose data usage statistics. 7. Strive for consistency with other repositories. 8. Identify and aggregate metadata records that describe the same data object. 9. Make metadata records easily indexed and searchable by major web search engines. 10. Follow API search standards and community adopted vocabularies for interoperability.
Article
Research data is the data which is generated when the researchers undertake or execute any research activity or project. The data may be textual, quantitative, qualitative, images, recordings, musical compositions, verbal communication, experimental readings, simulations, codes and so on. It needs to be preserved for future use. In this context, the paper has studied the research data management (RDM) services implemented by different university libraries for managing, organising, curating and preserving research data generated at their universities’ departments and laboratories, for reusing and sharing. It has surveyed the central university libraries and the best 20 university libraries of the world to highlight how RDM is extended to the researchers. Further, it has suggested a model for the university libraries in the country to follow for actually deploying RDM services.
Article
This paper presents a comprehensive overview of the literature on the types, effects, conditions and users of Open Government Data (OGD). The review analyses 101 academic studies about OGD which discuss at least one of the four factors of OGD utilization: the different types of utilization, the effects of utilization, the key conditions, and the different users. Our analysis shows that the majority of studies focus on the OGD provisions while assuming, but not empirically testing, various forms of utilization. The paper synthesizes the hypothesized relations in a multi-dimensional framework of OGD utilization. Based on the framework we suggest four future directions for research: 1) investigate the link between type of utilization and type of users (e.g. journalists, citizens), 2) investigate the link between type of user and type of effect (e.g. societal, economic and good governance benefits), 3) investigate the conditions that moderate OGD effects (e.g. policy, data quality) and 4) establish a causal link between utilization and OGD outcomes.
Article
The paper provides an overview of recent research and publications on the integration of research data in Current Research Information Systems (CRIS) and addresses three related issues, i.e. the object of evaluation, identifier schemes and conservation. Our focus is on social sciences and humanities. As research data gradually become a crucial topic of scientific communication and evaluation, current research information systems must be able to consider and manage the great variety and granularity levels of data as sources and results of scientific research. More empirical and moreover conceptual work is needed to increase our understanding of the reality of research data and the way they can and should be used for the needs and objectives of research evaluation. The paper contributes to the debate on the evaluation of research data, especially in the environment of open science and open data, and will be helpful in implementing CRIS and research data policies.
Article
Research data publishing is intended as the release of research data to make it possible for practitioners to (re)use them according to “open science” dynamics. There are three main actors called to deal with research data publishing practices: researchers, publishers, and data repositories. This study analyses the solutions offered by generalist scientific data repositories, i.e., repositories supporting the deposition of any type of research data. These repositories cannot make any assumption on the application domain. They must cope with the almost open-ended typologies of data used in science. The current practices promoted by such repositories are analysed with respect to eight key aspects of data publishing, i.e., dataset formatting, documentation, licensing, publication costs, validation, availability, discovery and access, and citation. From this analysis it emerges that these repositories implement well consolidated practices and pragmatic solutions for literature repositories. These practices and solutions cannot fully meet the needs of managing and using dataset resources, especially in a context where rapid technological changes continuously open new exploitation prospects.
Article
There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.
Article
Data sharing is increasingly recognized as integral to scientific research and publishing. This requires informed and thoughtful preparation from initial research planning to collection of data/metadata, interoperability, deposit in data repositories, and curation. Research Data Canada (RDC) is a collaborative, non-government organization that promotes access to and preservation of Canadian research data. The RDC Standards and Interoperability Committee (RDC-SINC) surveyed 32 Canadian and International online data platforms for storage, data transfer, curation activities, preservation, access, and sharing features. We developed a checklist to compare criteria and features between platforms. The survey revealed a heterogeneity of features and services across platforms, non-standardized use of terms, uneven compliance with relevant standards, and a paucity of certified data repositories. Recommendations for online digital infrastructure development to meet evolving researcher and end-user needs centre around persistent identification and citation of datasets, data reliability, version control, metadata, data sharing, privacy controls, long-term preservation of data, and certification of data repositories. We identified a need in Canada for investment in an integrated, comprehensive national digital infrastructure for research data.
Article
Purpose: The purpose of this study is to examine the development of Dataverse, a global research data management consortium. The authors examine specifically the institutional characteristics, the utilization of the associated data sets and the relevant research data management services at its participating university libraries. This evidence-based approach is essential for understanding the current state of research data management practices in the global context. Design/methodology/approach: The data was collected from 67 participants’ data portals between December 1, 2020, and January 31, 2021. Findings: Over 80% of its current participants joined the group in the past five years, 2016–2020. Thirty-three Dataverse portals have had less than 10,000 total downloads since their inception. Twenty-nine participating universities are included in three major global university ranking systems, and 18 of those university libraries offer research data services. Originality/value: This project is an explorative study on Dataverse, an international research data management consortium. The findings contribute to the understanding of the current development of the Dataverse project as well as the practices at the participating institutions. Moreover, they offer insights to other global higher education institutions and research organizations regarding research data management. While this study is practical, its findings and observations could be of use to future researchers interested in developing a framework for data work in academic libraries.
Article
Metadata in various forms pervades our institutions, technologies, and daily lives. Metadata is a distinct focus of academic research and professional practice for many people within the library and information sciences (LIS). This article is an exploration of the concept of “metadata.” It presents a high-level introduction to the topic with analysis of key research problems and practical challenges. The paper discusses varying understandings of what “metadata” means, the origin and evolution of metadata as an important topic within information and data fields, and the central characteristics of that which gets called “metadata.” Metadata can be understood as both process and product and can result from both human effort and computational techniques. Given the central role metadata have in the establishment of knowledge, evidence, and truth, it is necessary for researchers and professionals within LIS to think critically about our metadata practices and systems.
Article
The ubiquity of data lakes has created fascinating new challenges for data management research. In this tutorial, we review the state-of-the-art in data management for data lakes. We consider how data lakes are introducing new problems including dataset discovery and how they are changing the requirements for classic problems including data extraction, data cleaning, data integration, data versioning, and metadata management.
Article
Large amounts of data are becoming increasingly available online. In order to benefit from it we need tools to retrieve the most relevant datasets that match one's data needs. Several vocabularies have been developed to describe datasets in order to increase their discoverability, but for data publishers it is costly and cumbersome to annotate datasets using all of them, leading to the question of which properties are more important. In this work we contribute a systematic study of the patterns and specific attributes that data consumers use to search for data and how this compares with general web search. We performed a query log analysis based on logs from four national open data portals and conducted a qualitative analysis of user data requests issued to one of them. Search queries issued on data portals differ from those issued to web search engines in their length, topic, and structure. Based on our findings we hypothesise that portals' search functionalities are currently used in an exploratory manner, rather than to retrieve a specific resource. In our study of data requests we found that geospatial and temporal attributes, as well as information on the required granularity of the data, are the most common features. The findings of both analyses suggest that these features are of higher importance in dataset retrieval than in general web search, suggesting that efforts of dataset publishers should focus on generating dataset descriptions that include them.
Article
Research data is an essential part of the scholarly record, and management of research data is increasingly seen as an important role for academic libraries. This article presents the results of a survey of directors of the Association of European Research Libraries (LIBER) academic member libraries to discover what types of research data services (RDS) are being offered by European academic research libraries and what services are planned for the future. Overall, the survey found that library directors strongly agree on the importance of RDS. As was found in earlier studies of academic libraries in North America, more European libraries are currently offering or are planning to offer consultative or reference RDS than technical or hands-on RDS. The majority of libraries provide support for training in skills related to RDS for their staff members. Almost all libraries collaborate with other organizations inside their institutions or with outside institutions in order to offer or develop policy related to RDS. We discuss the implications of the current state of RDS in European academic research libraries, and offer directions for future research. © 2017, Igitur, Utrecht Publishing and Archiving Services. All rights reserved.
Article
Purpose – The purpose of this paper is to demonstrate how knowledge of local research data management (RDM) practices critically informs the progressive development of research data services (RDS) after basic services have already been established. Design/methodology/approach – An online survey was distributed via e-mail to all university faculty in the fall of 2013, and was left open for just over one month. The authors sent two reminder e-mails before closing the survey. Survey data were downloaded from Qualtrics survey software and analyzed in R. Findings – In this paper, the authors reviewed a subset of survey findings that included data types, volume, and storage locations, RDM roles and responsibilities, and metadata practices. The authors found that Oregon State University (OSU) researchers are generating a wide variety of data types, and that practices vary between colleges. The authors discovered that faculty are not utilizing campus-wide storage infrastructure, and are maintaining their own storage servers in surprising numbers. Faculty-level research assistants perform the majority of data-related tasks at OSU, with the exception of data sharing, which is primarily handled by the professorial ranks. The authors found that many faculty on campus are creating metadata, but that there is a need to provide support in how to discover and create standardized metadata. Originality/value – This paper presents a novel example of how to efficiently move from establishing basic RDM services to providing more focussed services that meet specific local needs. It provides an approach for others to follow when tackling the difficult question of, “What next?” with regard to providing academic RDS.
Article
The aim of this research study is to examine the functionality development of the open source repository system: DSpace. The data on DSpace repositories' implementation practices were collected from the DSpace User Registry during September 2013–March 2014. A total of 545 repositories in the registry indicated specific system function customizations, representing 533 unique institutions from 95 countries worldwide. The findings indicate that U.S.A. and India are the top two countries to have adopted DSpace. The majority of the DSpace digital repositories are created by academic institutions, which indicates a strong representation of academic institutions in the use of DSpace. The major adopted system functions are statistics, Dublin Core Meta Toolkit, Manakin Themes, and language packages. Most DSpace members use the repository system as their institutional and learning resource repositories. The top content types are conference papers, research documents, and learning/teaching materials. The implications of the findings are also discussed.
Article
Research data curation initiatives must support heterogeneous kinds of projects, data, and metadata. This article examines variability in data and metadata practices using “institutions” as the key theoretical concept. Institutions, in the sense used here, are stable patterns of human behavior that structure, legitimize, or delegitimize actions, relationships, and understandings within particular situations. Based on prior conceptualizations of institutions, a theoretical framework is presented that outlines 5 categories of “institutional carriers” for data practices: (a) norms and symbols, (b) intermediaries, (c) routines, (d) standards, and (e) material objects. These institutional carriers are central to understanding how scientific data and metadata practices originate, stabilize, evolve, and transfer. This institutional framework is applied to 3 case studies: the Center for Embedded Networked Sensing (CENS), the Long Term Ecological Research (LTER) network, and the University Corporation for Atmospheric Research (UCAR). These cases are used to illustrate how institutional support for data and metadata management is not uniform within a single organization or academic discipline. Instead, broad spectra of institutional configurations for managing data and metadata exist within and across disciplines and organizations.
RAW data for detailed information of in vivo experiment of sea grapes extract activity against blood glucose level (BGL), total cholesterol (TC), and serum PGC-1α concentration
  • Harvard Dataverse
Harvard Dataverse. RAW data for detailed information of in vivo experiment of sea grapes extract activity against blood glucose level (BGL), total cholesterol (TC), and serum PGC-1α concentration. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/8IKREA.
Search facets and ranking in geospatial dataset search
  • T Hervey
  • S Lafia
  • W Kuhn
Hervey, T., Lafia, S., & Kuhn, W. (2020). Search facets and ranking in geospatial dataset search. Leibniz International Proceedings in Informatics, LIPIcs, 177. https://doi.org/10.4230/LIPIcs.GIScience.2021.I.5