Article

Abstract

OpenAIRE is the European Union initiative for an Open Access Infrastructure for Research, supporting open scholarly communication and access to the research output of European funded projects as well as open access content from a network of institutional and disciplinary repositories. This article outlines the curation activities conducted in the OpenAIRE infrastructure, which employs a multi-level, multi-targeted approach: the publication and implementation of interoperability guidelines that assist local data curation processes; the curation required by the integration of heterogeneous sources supporting different types of data; the inference of links that contextualizes publications and enriches the data; and end-user metadata curation, which allows users to edit attributes and provide links among entities.


... Generic RKGs focus on bibliographic metadata of scientific artifacts and entities. There are several well-known generic RKGs, such as Microsoft Academic Knowledge Graph (Färber, 2019), OpenAlex (Priem et al., 2022), Springer Nature SciGraph (Hammond et al., 2017), Semantic Scholar Literature Graph (Ammar et al., 2018), OpenAIRE Research Graph (Manghi et al., 2019; Schirrwagen et al., 2013), Research Graph (Aryani and Wang, 2017), and Scholarly Link Exchange (Scholix) (Burton et al., 2017). All these RKGs have in common that they use bibliographic metadata to organize scientific artifacts, entities, and their relationships to enable, for example, their search, visualization, and processing (Brack et al., 2022; Stocker et al., 2022). ...
... Generic RKGs focus on bibliographic metadata of scientific artifacts and entities. There are several well-known generic RKGs, such as Microsoft Academic Knowledge Graph [54], OpenAlex [55], Springer Nature SciGraph [56], Semantic Scholar Literature Graph [33], OpenAIRE Research-Graph [57], [58], Research Graph [59], and Scholarly Link Exchange (Scholix) [60]. These RKGs have in common that they use bibliographic metadata to organize scientific artifacts, entities, and their relationships to enable, for example, their search, visualization, and processing [40], [61]. ...
Conference Paper
[Background.] Empirical research in requirements engineering (RE) is a constantly evolving topic, with a growing number of publications. Several papers address this topic using literature reviews to provide a snapshot of its “current” state and evolution. However, these papers have never built on or updated earlier ones, resulting in overlap and redundancy. The underlying problem is the unavailability of data from earlier works. Researchers need technical infrastructures to conduct sustainable literature reviews. [Aims.] We examine the use of the Open Research Knowledge Graph (ORKG) as such an infrastructure to build and publish an initial Knowledge Graph of Empirical research in RE (KG-EmpiRE) whose data is openly available. Our long-term goal is to continuously maintain KG-EmpiRE with the research community to synthesize a comprehensive, up-to-date, and long-term available overview of the state and evolution of empirical research in RE. [Method.] We conduct a literature review using the ORKG to build and publish KG-EmpiRE which we evaluate against competency questions derived from a published vision of empirical research in software (requirements) engineering for 2020–2025. [Results.] From 570 papers of the IEEE International Requirements Engineering Conference (2000–2022), we extract and analyze data on the reported empirical research and answer 16 out of 77 competency questions. These answers show a positive development towards the vision, but also the need for future improvements. [Conclusions.] The ORKG is a ready-to-use and advanced infrastructure to organize data from literature reviews as knowledge graphs. The resulting knowledge graphs make the data openly available and maintainable by research communities, enabling sustainable literature reviews.
... Since 2004, DINI has certified repositories against a comprehensive set of criteria (Müller & Schirmbacher, 2007). From the beginning, these criteria have been aligned with the OAI standards and related European standardisation efforts driven by the EU-funded projects DRIVER (Lossau & Peters, 2008), OpenAIRE (Schirrwagen et al., 2013) and the Confederation of Open Access ... [Fig. 1 caption: Study design: schematic display of gathering, matching, and preprocessing of data; article-level data was obtained from the Web of Science in-house database of the German Competence Centre for Bibliometrics (WoS-KB), including its standardised affiliation information.] ...
Article
Full-text available
This study investigates the development of open access (OA) to journal articles from authors affiliated with German universities and non-university research institutions in the period 2010–2018. Beyond determining the overall share of openly available articles, a systematic classification of distinct categories of OA publishing allowed us to identify different patterns of adoption of OA. Taking into account the particularities of the German research landscape, variations in terms of productivity, OA uptake and approaches to OA are examined at the meso-level and possible explanations are discussed. The development of the OA uptake is analysed for the different research sectors in Germany (universities, non-university research institutes of the Helmholtz Association, Fraunhofer Society, Max Planck Society, Leibniz Association, and government research agencies). Combining several data sources (incl. Web of Science, Unpaywall, an authority file of standardised German affiliation information, the ISSN-Gold-OA 3.0 list, and OpenDOAR), the study confirms the growth of the OA share mirroring the international trend reported in related studies. We found that 45% of all considered articles during the observed period were openly available at the time of analysis. Our findings show that subject-specific repositories are the most prevalent type of OA. However, the percentages for publication in fully OA journals and OA via institutional repositories show similarly steep increases. Enabling data-driven decision-making regarding the implementation of OA in Germany at the institutional level, the results of this study furthermore can serve as a baseline to assess the impact recent transformative agreements with major publishers will likely have on scholarly communication.
... The heterogeneous nature of such metadata and the variety of sources feeding metadata into scholarly KGs [14,18,22] keep complex meta-research enquiries (research on research) challenging to analyse. This influences the quality of services that rely only on the explicitly represented information. ...
Chapter
The increasing availability of scholarly metadata in the form of Knowledge Graphs (KGs) offers opportunities for studying the structure of scholarly communication and the evolution of science. Such KGs build the foundation for knowledge-driven tasks, e.g., link discovery, link prediction, and entity classification, which enable recommendation services. Knowledge graph embedding (KGE) models have been investigated for such knowledge-driven tasks in different application domains. One application of KGE models is link prediction, which can also be viewed as a foundation for recommendation services; e.g., high-confidence “co-author” links in a scholarly knowledge graph can be seen as suggested collaborations. In this paper, KGEs are reconciled with a specific loss function (Soft Margin) and examined with respect to their performance on the co-authorship link prediction task on scholarly KGs. The results show a significant improvement in the accuracy of the experimented KGE models on the considered scholarly KGs using this specific loss. TransE with Soft Margin (TransE-SM) obtains a score of 79.5% Hits@10 for the co-authorship link prediction task, while the original TransE obtains 77.2% on the same task. In terms of accuracy and Hits@10, TransE-SM also outperforms other state-of-the-art embedding models such as ComplEx, ConvE and RotatE in this setting. The predicted co-authorship links have been validated by evaluating the profiles of the scholars involved.
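The scoring idea behind TransE, which the abstract above builds on, can be sketched in a few lines: a triple (head, relation, tail) is plausible when the head embedding, translated by the relation embedding, lands near the tail embedding. The toy sketch below uses made-up random vectors rather than trained embeddings, and all names are illustrative; the Soft Margin loss of TransE-SM is a training-time change and is not shown here.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility score: negative L2 distance of (h + r) from t.
    Scores closer to zero indicate a more plausible triple."""
    return -np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
dim = 50
# Toy embeddings (purely illustrative; real models learn these by training).
author_a = rng.normal(size=dim)
co_author = rng.normal(size=dim)
author_b = author_a + co_author + rng.normal(scale=0.01, size=dim)  # good fit for (a, co-author, ?)
author_c = rng.normal(size=dim)                                     # unrelated candidate

# Ranking candidate tails by score prefers the plausible co-author link.
print(transe_score(author_a, co_author, author_b) > transe_score(author_a, co_author, author_c))
```

In link prediction, all candidate tails are ranked by this score for a given head and relation; metrics like Hits@10 then measure how often the true tail appears among the top 10 candidates.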
... As expected, Scholarly Knowledge Graphs (SKGs) are also incomplete, mainly due to the document-oriented workflow of scientific publishing and communication as well as common data harvesting and integration practices [10,9,14]. Despite the importance of the scholarly domain, automatic KG completion methods using link discovery tools have rarely been studied [6], as the graphs are heterogeneous and complex by nature [17]. ...
Preprint
Full-text available
Knowledge graphs (KGs), i.e., representations of information as a semantic graph, provide a significant test bed for many tasks including question answering, recommendation, and link prediction. Various amounts of scholarly metadata have been made available as knowledge graphs by a diversity of data providers and agents. However, these high quantities of data remain far from quality criteria in terms of completeness while growing at a rapid pace. Most attempts at completing such KGs follow traditional data digitization, harvesting and collaborative curation approaches, whereas advanced AI-related approaches such as embedding models, specifically designed for such tasks, are usually evaluated on standard benchmarks such as Freebase and WordNet. The tailored nature of such datasets prevents those approaches from shedding light on more accurate discoveries. Applying such models to domain-specific KGs takes advantage of enriched metadata and provides accurate results from which the underlying domain can enormously benefit. In this work, the TransE embedding model is reconciled for a specific link prediction task on scholarly metadata. The results show a significant shift in the accuracy and performance evaluation of the model on a dataset with scholarly metadata. The newly proposed version of TransE obtains 99.9% for the link prediction task while the original TransE gets 95%. In terms of accuracy and Hits@10, TransE outperforms other embedding models such as ComplEx, TransH and TransR experimented over scholarly knowledge graphs.
... Aggregators and registries also play an infrastructural role of organizing and standardizing the repository landscape. This becomes particularly obvious in the European network OpenAIRE that issues guidelines for repository operators (Schirrwagen et al., 2013). An earlier and highly influential guideline has been issued by the German Initiative for Networked Information (DINI) that additionally implements a personal validation process for testing guideline compliance. ...
... Its mission is twofold: enabling the Open Science cultural shift of the current scientific communication infrastructure by linking, engaging, and aligning people, ideas, resources, and services at the global level; and monitoring Open Access trends and measuring research impact in terms of publications and datasets to serve research communities and funders. To this aim, OpenAIRE offers services [21] that collect, harmonize, de-duplicate, and enrich, by inference (text mining) or end-user feedback, metadata relative to publications, datasets, organizations, persons, projects and several funders from all over the world. Starting in 2017, in order to join the infrastructure, data sources will sign Terms of Agreement granting the OpenAIRE services the right to collect and reuse metadata records under CC0. ...
Article
Full-text available
This paper discusses the problem of the lack of clear licensing and of transparency in usage terms and conditions for research metadata. Making research data connected, discoverable and reusable is a key enabler of the new data revolution in research. We discuss how the lack of transparency hinders the discovery of research data and disconnects it from publications and other trusted research outcomes. In addition, we discuss the application of Creative Commons licenses to research metadata, and provide some examples of the applicability of this approach to internationally known data infrastructures.
... The infrastructure is capable of collecting and interlinking content from data repositories, publication repositories, CRIS systems and several entity registries, e.g. the repository registry OpenDOAR, the EC FP7 project directory CORDA, and the registry of data repositories re3data.org. Data curators manage workflows for the curation [3] (harmonization and cleansing), de-duplication, and enrichment by inference (mining algorithms) of metadata collected from thousands of data sources (currently 450+ and growing) in a sustainable way. (OpenAIRE, http://www.openaire.eu) ...
Article
Full-text available
'Aggregative Data Infrastructures' (ADIs) are systems devised to collect metadata descriptions (and files) from several data sources to construct uniform Information Spaces, hence providing cross-data-source access via standard APIs or custom portals. ADIs typically deal with data collection workflows from arbitrary numbers of data sources, with heterogeneous access protocols, data exchange formats, and data models. Besides, they handle data processing workflows for the harmonization and enrichment of aggregated metadata. Correct workflow management is crucial to ensure Information Space consistency, but is in general hard to sustain. This demo will present the solution offered in the context of the OpenAIRE infrastructure, which today collects metadata and files from around 450+ data sources (and growing) of several typologies. The D-NET Workflow Management Suite user interfaces support data curators in orchestrating, over time and in a sustainable way, the configuration, execution, and monitoring of data collection and processing workflows for thousands of data sources.
Book
Full-text available
This bibliography presents over 225 selected English-language articles and books that are useful in understanding the publication and citation of research data. It also provides limited coverage of closely related topics, such as research data identifiers (e.g., DOI) and scholarly metrics. Most sources have been published from January 2009 through December 2021. It includes full abstracts for works under certain Creative Commons Licenses. It is also available as a website (http://digital-scholarship.org/citation/citation.htm), which includes a Google Translate link. This work is licensed under a Creative Commons Attribution 4.0 International License. Keywords: altmetrics, data citation, data journals, data publication, data reuse, data sharing, data sharing policies, Digital Object Identifiers, funding agency requirements, open access, open access journals, open science, peer review, persistent identifiers, scholarly metrics, research data, research data management, research data publishing, scholarly journals, and scholarly publishing.
Article
Full-text available
Abstract: This article examines the interrelation between bibliometrics and open access in service development at German universities. Despite the science-policy and practical relevance of bibliometrics, open access services incorporate the corresponding methods and expertise only to a limited extent. While bibliometric services increasingly reflect professional-ethical aspects in the sense of responsible use, problematic practices can be found among open access services. Conversely, institutional services in the area of publication monitoring benefit both from standardised and networked research information and from a division of labour in the organisation of reporting within a university consortium.
Article
Preface
The Research Data Curation and Management Bibliography includes over 800 selected English-language articles and books that are useful in understanding the curation of digital research data in academic and other research institutions. The "digital curation" concept is still evolving. In "Digital Curation and Trusted Repositories: Steps toward Success," Christopher A. Lee and Helen R. Tibbo define digital curation as follows: Digital curation involves selection and appraisal by creators and archivists; evolving provision of intellectual access; redundant storage; data transformations; and, for some materials, a commitment to long-term preservation. Digital curation is stewardship that provides for the reproducibility and re-use of authentic digital data and other digital assets. Development of trustworthy and durable digital repositories; principles of sound metadata creation and capture; use of open standards for file formats and data encoding; and the promotion of information management literacy are all essential to the longevity of digital resources and the success of curation efforts.1 The Research Data Curation and Management Bibliography covers topics such as research data creation, acquisition, metadata, provenance, repositories, management, policies, support services, funding agency requirements, open access, peer review, publication, citation, sharing, reuse, and preservation. It is highly selective in its coverage. The bibliography does not cover conference proceedings, digital media works (such as MP3 files), editorials, e-mail messages, interviews, letters to the editor, presentation slides or transcripts, technical reports, unpublished e-prints, or weblog postings. Most sources have been published from January 2009 through December 2019; however, a limited number of earlier key sources are also included. The bibliography has links to included works. URLs may alter without warning (or automatic forwarding) or they may disappear altogether.
Where possible, this bibliography uses Digital Object Identifier System (DOI) URLs. DOIs are not rechecked after initial validation. Publisher systems may have temporary DOI resolution problems. Should a link be dead, try entering it in the Internet Archive Wayback Machine. Abstracts are included in this bibliography if a work is under a Creative Commons Attribution License (BY and national/international variations), a Creative Commons public domain dedication (CC0), or a Creative Commons Public Domain Mark and this is clearly indicated in the publisher’s current webpage for the article. Note that a publisher may have changed the licenses for all articles on a journal’s website but not have made corresponding license changes in the journal’s PDF files. The license on the current webpage is deemed to be the correct one. Since publishers can change licenses in the future, the license indicated for a work in this bibliography may not be the one you find upon retrieval of the work. Unless otherwise noted, article abstracts in this bibliography are under a Creative Commons Attribution 4.0 International License, https://creativecommons.org/licenses/by/4.0/. Abstracts are reproduced as written in the source material.
1. Christopher A. Lee and Helen R. Tibbo, "Digital Curation and Trusted Repositories: Steps Toward Success," Journal of Digital Information 8, no. 2 (2007), https://journals.tdl.org/jodi/index.php/jodi/article/view/229
Book
Full-text available
This selective bibliography presents over 800 English-language articles and books. It covers topics such as research data creation, metadata, provenance, repositories, management, policies, support services, funding agency requirements, open access, peer review, publication, citation, sharing, reuse, and preservation. It is also available as a paperback PDF file (https://www.digital-scholarship.org/rdcmb/rdcmb.pdf) and a website (https://www.digital-scholarship.org/rdcmb/rdcmb-web.htm), which includes a Google Translate link. Most sources were published from 2009 through 2019. It includes full abstracts for works under certain Creative Commons Licenses. This work is licensed under a Creative Commons Attribution 4.0 International License. (See also the Research Data Sharing and Reuse Bibliography and the Research Data Publication and Citation Bibliography.) Keywords: academic libraries, altmetrics, data citation, data curation, data journals, data preservation, data privacy, data publication, data repositories, data reuse, data sharing, data sharing policies, Digital Object Identifiers, peer review, ethical data sharing, geospatial data, funding agency requirements, open access, open access journals, open science, persistent identifiers, research data, research data management, research data metadata, research data publishing, research data services, research data training, research libraries, scholarly journals, scholarly metrics, and scholarly publishing.
Chapter
Full-text available
Drawing its analyses and examples from a variety of scientific fields, this book (the original of which was published in 2015 by MIT Press) offers an unprecedented study of the uses of data within knowledge infrastructures, uses that vary widely from one discipline to another. Although big data regularly makes headlines on both sides of the Atlantic, Christine L. Borgman shows that it is better to have the right data than to have a lot of data. She also shows that little data can prove as valuable as big data and that, in many cases, there is no data at all, because the relevant information does not exist, cannot be found, or is unavailable. Through practical case studies from diverse fields, Christine L. Borgman also highlights that data have neither value nor meaning in isolation: they are embedded in a knowledge infrastructure, that is, an ecosystem of people, practices, technologies, institutions, material objects, and relationships. For the author, managing data and exploiting them over the long term thus requires massive investments in these knowledge infrastructures. The future of research, in a networked world, depends on it.
Book
Full-text available
This bibliography presents over 750 English-language articles, books, and technical reports. It covers topics such as research data creation, acquisition, metadata, provenance, repositories, management, policies, support services, funding agency requirements, open access, peer review, publication, citation, sharing, reuse, and preservation. Most sources were published from 2009 through 2018. It is also available as a website with a Google Translate link (https://tinyurl.com/k4vvzz68). This work is licensed under a Creative Commons Attribution 4.0 International License. This bibliography has been superseded by the 2021 Research Data Curation and Management Bibliography (https://tinyurl.com/3rec6ejn), which covers over 800 works.
Chapter
This paper discusses the problem of lack of clear licensing and transparency of usage terms and conditions for research metadata. Making research data connected, discoverable and reusable are the key enablers of the new data revolution in research. We discuss how the lack of transparency hinders discovery of research data and make it disconnected from the publication and other trusted research outcomes. In addition, we discuss the application of Creative Commons licenses for research metadata, and provide some examples of the applicability of this approach to internationally known data infrastructures. KeywordsSemantic webResearch metadataLicensingDiscoverabilityData infrastructureCreative commonsOpen data
Book
This bibliography includes over 750 selected English-language articles, books, and technical reports. It covers topics such as research data creation, acquisition, metadata, provenance, repositories, management, policies, support services, funding agency requirements, open access, peer review, publication, citation, sharing, reuse, and preservation. Most sources have been published from January 2009 through December 2017. It is licensed under a Creative Commons Attribution 4.0 International License. This bibliography has been superseded by the 2021 Research Data Curation and Management Bibliography (https://tinyurl.com/3rec6ejn), which covers over 800 works.
Article
This article focuses on the curation of digital research data and the development of related infrastructure. It explains that data-intensive computational approaches to science are playing an increasingly important role in scholarly research. However, the sheer volume of digital data produced in the sciences is staggering, posing a daunting challenge to researchers and publishers alike. It also notes the moves by institutions to collaborate and form cross-sector partnerships.
Article
We present an Open Cultural Digital Content Infrastructure, a platform providing a coherent suite of loosely-coupled services that aim to promote metadata quality in repositories and facilitate the reuse of metadata and digital content. The key functions of the infrastructure are the aggregation of metadata and digital files and the automatic validation of metadata records and digital material for compliance with desired quality specifications. The system, which has recently moved to production, is currently being employed to ensure the quality standards of the output of more than 70 projects that support Greek cultural heritage organisations and are funded by the European Union structural funds. These projects are expected to produce more than 1.5 million digitized and born-digital items accompanied by detailed metadata. The validation is based on a set of quality and interoperability specifications that have been developed for the purpose. The infrastructure has been developed using an open source technology stack and tools, and in particular reuses a number of components of the publicly available Europeana aggregator and portal software platform.
Conference Paper
We present an Open Cultural Digital Content Infrastructure, a platform providing a coherent suite of loosely-coupled services that aim to promote metadata quality in repositories and facilitate the reuse of metadata and digital content. The key functions of the infrastructure are the aggregation of metadata and digital files and the automatic validation of metadata records and digital material for compliance with desired quality specifications. The system, which has recently moved to production, is currently being employed to ensure the quality standards of the output of more than 70 projects that support Greek cultural heritage organisations and are funded by the European Union structural funds. These projects are expected to produce more than 1.5 million digitised and born-digital items accompanied by detailed metadata. The validation is based on a set of quality and interoperability specifications that have been developed for the purpose. In this paper we focus on the Validator and Aggregator components and present experimental results on their scalability.
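To illustrate the kind of rule-based checking an automatic metadata validator performs, here is a minimal sketch in Python. The field names and rules are invented for the example and are not the infrastructure's actual quality specifications; a real validator would load its rules from the published interoperability specifications.

```python
# Hypothetical required fields and checks (illustrative only).
REQUIRED_FIELDS = {"title", "identifier", "rights"}

def validate_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    # Check presence of every required field.
    for field in sorted(REQUIRED_FIELDS - record.keys()):
        problems.append("missing required field: " + field)
    # Check that the identifier, if present, looks resolvable.
    identifier = record.get("identifier", "")
    if identifier and not identifier.startswith(("http://", "https://", "urn:", "doi:")):
        problems.append("identifier is not a resolvable URI/URN/DOI")
    return problems

record = {"title": "Sample item", "identifier": "https://example.org/item/1"}
print(validate_record(record))  # → ['missing required field: rights']
```

An aggregator can run such checks on every harvested record and reject or flag non-compliant ones before they enter the shared information space.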