Sören Auer

Leibniz Universität Hannover · L3S Research Center

Prof. Dr.
Working on organizing the flood of research with the Open Research Knowledge Graph: https://www.orkg.org/

About

Publications: 607
Reads: 286,828
Citations: 21,561
Introduction
My key research question is: "How can we digitize the work and information flows in science and technology?" I serve as director of TIB (German National Library of Science and Technology), the Leibniz Information Centre for Science and Technology. My research interests include social and semantic technologies; knowledge representation, engineering and management; usability; agile methodologies; and databases and information systems.
Additional affiliations
July 2017 - present
Leibniz Universität Hannover
Position
  • Professor
June 2013 - June 2017
University of Bonn
Position
  • Professor
Education
September 2003 - October 2006
University of Leipzig
Field of study
  • Computer Science
September 1997 - June 1998
Ural Federal University
Field of study
  • Mathematics
October 1995 - February 2000
Technische Universität Dresden
Field of study
  • Mathematics

Publications (607)
Technical Report
Full-text available
This whitepaper gives an overview of the aims and architecture of the Industrial Data Space. Additionally, some use cases and the Industrial Data Space Association are introduced.
Conference Paper
Full-text available
The management and analysis of large-scale datasets – described with the term Big Data – involves the three classic dimensions volume, velocity and variety. While the first two are well supported by a plethora of software components, the variety dimension is still rather neglected. We present the BDE platform – an easy-to-deploy, easy-to-use and a...
Conference Paper
Full-text available
In the engineering and manufacturing domain, there is currently an atmosphere of departure to a new era of digitized production. In different regions, initiatives in these directions are known under different names, such as industrie du futur in France, industrial internet in the US or Industrie 4.0 in Germany. While the vision of digitizing produc...
Conference Paper
Full-text available
The search for information on the Web of Data is becoming increasingly difficult due to its dramatic growth. Especially novice users need to acquire both knowledge about the underlying ontology structure and proficiency in formulating formal queries (e.g., SPARQL queries) to retrieve information from Linked Data sources. So as to simplify and autom...
Conference Paper
Full-text available
With Linked Data, a very pragmatic approach towards achieving the vision of the Semantic Web has recently gained much traction. The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. While many standards, methods and technologies developed by the Semantic Web community are applicabl...
Conference Paper
Full-text available
… [0000-0003-3975-5374], Kheir Eddine Farfar [0000-0002-0366-4596], Allard Oelen [0000-0001-9924-9153], Oliver Karras [0000-0001-5336-6899], and Sören Auer [0000-0002-0698-2864]. Abstract. One of the pillars of the scientific method is reproducibility: the ability to replicate the results of a prior study if the same procedures are followed. A...
Article
Full-text available
Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this work also examines the influential scholarly documen...
Chapter
We propose the LLMs4OL approach, which utilizes Large Language Models (LLMs) for Ontology Learning (OL). LLMs have shown significant advancements in natural language processing, demonstrating their ability to capture complex language patterns in different knowledge domains. Our LLMs4OL paradigm investigates the following hypothesis: Can LLMs effect...
Conference Paper
[Background.] Empirical research in requirements engineering (RE) is a constantly evolving topic, with a growing number of publications. Several papers address this topic using literature reviews to provide a snapshot of its “current” state and evolution. However, these papers have never built on or updated earlier ones, resulting in overlap and re...
Article
Full-text available
The Open Research Knowledge Graph (ORKG) is an Open Science digital infrastructure for the production, curation, publication, and reuse of machine-actionable scholarly knowledge. Built on top of the RDF data model and extensible ontologies, the ORKG provides a common vocabulary for researchers to describe their research contributions and data, impr...
Preprint
Full-text available
The amount of research articles produced every day is overwhelming: scholarly knowledge is getting harder to communicate and easier to get lost. A possible solution is to represent the information in knowledge graphs: structures representing knowledge in networks of entities, their semantic types, and relationships between them. But this solution h...
Chapter
Full-text available
The amount of research articles produced every day is overwhelming: scholarly knowledge is getting harder to communicate and easier to get lost. A possible solution is to represent the information in knowledge graphs: structures representing knowledge in networks of entities, their semantic types, and relationships between them. But this solution h...
Chapter
Recent investigations have explored prompt-based training of transformer language models for new text genres in low-resource settings. This approach has proven effective in transferring pre-trained or fine-tuned models to resource-scarce environments. This work presents the first results on applying prompt-based training to transformers for scholar...
Preprint
Full-text available
We propose the LLMs4OL approach, which utilizes Large Language Models (LLMs) for Ontology Learning (OL). LLMs have shown significant advancements in natural language processing, demonstrating their ability to capture complex language patterns in different knowledge domains. Our LLMs4OL paradigm investigates the following hypothesis: Can LLM...
Preprint
[Background.] Empirical research in requirements engineering (RE) is a constantly evolving topic, with a growing number of publications. Several papers address this topic using literature reviews to provide a snapshot of its "current" state and evolution. However, these papers have never built on or updated earlier ones, resulting in overlap and re...
Article
Full-text available
Complex research problems are increasingly addressed by interdisciplinary, collaborative research projects generating large amounts of heterogeneous data. The overarching processing, analysis and availability of data are critical success factors for these research efforts. Data repositories enable long-term availability of such data for th...
Preprint
Full-text available
The reuse of research software is central to research efficiency and academic exchange. The application of software enables researchers with varied backgrounds to reproduce, validate, and expand upon study findings. Furthermore, the analysis of open source code aids in the comprehension, comparison, and integration of approaches. Often, however, no...
Article
Full-text available
The purpose of this work is to describe the orkg-Leaderboard software designed to extract leaderboards defined as task–dataset–metric tuples automatically from large collections of empirical research papers in artificial intelligence (AI). The software can support both main workflows of scholarly publishing, viz. as LaTeX files or as PDF files....
Chapter
Full-text available
The overall AI trend of creating neuro-symbolic systems is reflected in the Semantic Web community with an increased interest in the development of systems that rely on both Semantic Web resources and Machine Learning components (SWeMLS, for short). However, understanding trends and best practices in this rapidly growing field is hampered by a lack...
Preprint
Full-text available
There have been many recent investigations into prompt-based training of transformer language models for new text genres in low-resource settings. The prompt-based training approach has been found to be effective in generalizing pre-trained or fine-tuned models for transfer to resource-scarce settings. This work, for the first time, reports results...
Preprint
Full-text available
The purpose of this work is to describe the Orkg-Leaderboard software designed to extract leaderboards defined as Task-Dataset-Metric tuples automatically from large collections of empirical research papers in Artificial Intelligence (AI). The software can support both main workflows of scholarly publishing, viz. as LaTeX files or as PDF files....
Article
Full-text available
Knowledge graphs have gained increasing popularity in science and technology over the last decade. However, knowledge graphs are currently relatively simple to moderately complex semantic structures that are mainly a collection of factual statements. Question answering (QA) benchmarks and systems have so far been mainly geared towards encyclopedic knowledge graphs...
Preprint
Full-text available
The rapid growth of research publications has placed great demands on digital libraries (DL) for advanced information management technologies. To cater to these demands, techniques relying on knowledge-graph structures are being advocated. In such graph-based pipelines, inferring semantic relations between related scientific concepts is a crucial s...
Preprint
Full-text available
Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this work also examines the influential scholarly documen...
Article
Full-text available
Due to the growing number of scholarly publications, finding relevant articles becomes increasingly difficult. Scholarly knowledge graphs can be used to organize the scholarly knowledge presented within those publications and represent them in machine-readable formats. Natural language processing (NLP) provides scalable methods to automatically ext...
Preprint
Full-text available
We present a large-scale empirical investigation of the zero-shot learning phenomenon in a specific recognizing textual entailment (RTE) task category, i.e., the automated mining of leaderboards for Empirical AI Research. The previously reported state-of-the-art models for leaderboard extraction formulated as an RTE task, in a non-zero-shot setting, are...
Preprint
Full-text available
In line with the general trend in artificial intelligence research to create intelligent systems that combine learning and symbolic components, a new sub-area has emerged that focuses on combining machine learning (ML) components with techniques developed by the Semantic Web (SW) community - Semantic Web Machine Learning (SWeML for short). Due to i...
Preprint
Recent events such as wars, sanctions, pandemics, and climate change have shown the importance of proper supply network management. A key step in managing supply networks is procurement. We present an approach for realizing a next-generation procurement workspace that aims to facilitate resilience and sustainability. To achieve this, the approach e...
Article
Full-text available
The Open Research Knowledge Graph is an infrastructure for the production, curation, publication and use of FAIR scientific information. Its mission is to shape a future scholarly publishing and communication where the contents of scholarly articles are FAIR research data.
Article
Full-text available
In the last decade, a large number of knowledge graph (KG) completion approaches were proposed. Albeit effective, these efforts are disjoint, and their collective strengths and weaknesses in effective KG completion have not been studied in the literature. We extend Plumber, a framework that brings together the research community’s disjoint efforts...
Conference Paper
The value of structured scholarly knowledge for research and society at large is well understood, but producing scholarly knowledge (i.e., knowledge traditionally published in articles) in structured form remains a challenge. We propose an approach for automatically extracting scholarly knowledge from published software packages by static analysis...
Preprint
Information extraction from scholarly articles is a challenging task due to the sizable document length and implicit information hidden in text, figures, and citations. Scholarly information extraction has various applications in exploration, archival, and curation services for digital libraries and knowledge management systems. We present MORTY, a...
Conference Paper
A plethora of scientific software packages are published in repositories, e.g., Zenodo and figshare. These software packages are crucial for the reproducibility of published research. As an additional route to scholarly knowledge graph construction, we propose an approach for automated extraction of machine-actionable (structured) scholarly knowled...
Chapter
Full-text available
Domain-specific named entity recognition (NER) on Computer Science (CS) scholarly articles is an information extraction task that is arguably more challenging for the various annotation aims that can hamper the task and has been less studied than NER in the general domain. Given that significant progress has been made on NER, we anticipate that sch...
Chapter
Information extraction from scholarly articles is a challenging task due to the sizable document length and implicit information hidden in text, figures, and citations. Scholarly information extraction has various applications in exploration, archival, and curation services for digital libraries and knowledge management systems. We present MORTY, a...
Chapter
Full-text available
When semantically describing knowledge graphs (KGs), users have to make a critical choice of a vocabulary (i.e., predicates and resources). The success of KG building is determined by the convergence of shared vocabularies so that meaning can be established. The typical lifecycle for a new KG construction can be defined as follows: nascent phases of...
Conference Paper
Full-text available
Knowledge Graphs (KG) have gained increasing importance in science, business and society in recent years. However, most knowledge graphs were either extracted or compiled from existing sources. There are only relatively few examples where knowledge graphs were genuinely created by an intertwined human-machine collaboration. Also, since the qualit...
Preprint
Full-text available
Knowledge Graphs (KG) have gained increasing importance in science, business and society in recent years. However, most knowledge graphs were either extracted or compiled from existing sources. There are only relatively few examples where knowledge graphs were genuinely created by an intertwined human-machine collaboration. Also, since the qualit...
Article
The seamless documentation of research data flows from generation, processing, analysis, publication, and reuse is of utmost importance when dealing with large amounts of data. Semantic linking of process documentation and gathered data creates a knowledge space enabling the discovery of relations between steps of process chains. This paper shows t...
Preprint
Full-text available
When semantically describing knowledge graphs (KGs), users have to make a critical choice of a vocabulary (i.e., predicates and resources). The success of KG building is determined by the convergence of shared vocabularies so that meaning can be established. The typical lifecycle for a new KG construction can be defined as follows: nascent phases of...
Conference Paper
Full-text available
The development of a novel manufacturing process chain is a complex scientific challenge and requires interdisciplinary collaboration, as well as technological solutions that extend the boundaries of automation and customize the information flows between different organizational units. Due to these challenges an approach to parametrize each s...
Preprint
Full-text available
We leverage the Open Research Knowledge Graph - a scholarly infrastructure that supports the creation, curation, and reuse of structured, semantic scholarly knowledge - and present an approach for persistent identification of FAIR scholarly knowledge. We propose a DOI-based persistent identification of ORKG Papers, which are machine-actionable desc...
Chapter
The development of a novel manufacturing process chain is a complex scientific challenge and requires interdisciplinary and inter-institutional collaboration. Data need to be exchanged continuously between involved researchers in order to coordinate between individual process steps and to identify cause-effect relationships within the process. This...
Article
Full-text available
Scholarly knowledge graphs provide researchers with a novel modality of information retrieval, and their wider use in academia is beneficial for the digitalization of published works and the development of scholarly communication. To increase the acceptance of scholarly knowledge graphs, we present a dashboard, which visualizes the research contrib...
Chapter
Full-text available
A key aspect of establishing data spaces is to develop a common understanding of the data to be shared in the data space. Semantic standards and technologies have been developed for this purpose for over two decades. In this article, we discuss the history and importance of semantic integration for data spaces. We will introduce the base concepts...
Chapter
Background: Recent years are seeing a growing impetus in the semantification of scholarly knowledge at the fine-grained level of scientific entities in knowledge graphs. The Open Research Knowledge Graph (ORKG, orkg.org) represents an important step in this direction, with thousands of scholarly contributions as structured, fine-grained, machine-re...
Preprint
Information Extraction (IE) tasks are commonly studied topics in various domains of research. Hence, the community continuously produces multiple techniques, solutions, and tools to perform such tasks. However, running those tools and integrating them within existing infrastructure requires time, expertise, and resources. One pertinent task here is...
Preprint
Despite improved digital access to scholarly literature in the last decades, the fundamental principles of scholarly communication remain unchanged and continue to be largely document-based. Scholarly knowledge remains locked in representations that are inadequate for machine processing. The Open Research Knowledge Graph (ORKG) is an infrastructure...
Article
Full-text available
The rapid growth of research publications has placed great demands on digital libraries (DL) for advanced information management technologies. To cater to these demands, techniques relying on knowledge-graph structures are being advocated. In such graph-based pipelines, inferring semantic relations between related scientific concepts is a crucial s...
Conference Paper
Full-text available
The comprehensive implementation of digital technologies in product manufacturing leads to changes in engineering processes and requires new approaches to data management. An important role is played by the processes of organizing the collection, storage and reuse of research data obtained and used in the process of product, system or technology deve...
Preprint
Full-text available
Supply Chains (SCs) are subject to disruptive events that potentially hinder the operational performance. Disruption Management Process (DMP) relies on the analysis of integrated heterogeneous data sources such as production scheduling, order management and logistics to evaluate the impact of disruptions on the SC. Existing approaches are limited a...
Preprint
Full-text available
Supply Chain (SC) modeling is essential to understand and influence SC behavior, especially for increasingly globalized and complex SCs. Existing models address various SC notions, e.g., processes, tiers and production, in an isolated manner limiting enriched analysis granted by integrated information systems. Moreover, the scarcity of real-world d...
Preprint
Full-text available
Semiconductor supply chains are characterized by significant demand fluctuation that increases as one moves up the supply chain, the so-called bullwhip effect. To counteract it, semiconductor manufacturers aim to optimize capacity utilization, to deliver with shorter lead times and exploit this to generate revenue. Additionally, in a competitive market, f...
Preprint
As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be more efficiently discovered and used. Natural Language Processing (NLP) techniques are able to autonomously process scholarly articles at scale and to create machine-readable representations of the article...
Preprint
Full-text available
Background: Recent years are seeing a growing impetus in the semantification of scholarly knowledge at the fine-grained level of scientific entities in knowledge graphs. The Open Research Knowledge Graph (ORKG) https://www.orkg.org/ represents an important step in this direction, with thousands of scholarly contributions as structured, fine-grained...
Preprint
Full-text available
Domain-specific named entity recognition (NER) on Computer Science (CS) scholarly articles is an information extraction task that is arguably more challenging for the various annotation aims that can beset the task and has been less studied than NER in the general domain. Given that significant progress has been made on NER, we believe that scholar...
Preprint
Full-text available
Leveraging a GraphQL-based federated query service that integrates multiple scholarly communication infrastructures (specifically, DataCite, ORCID, ROR, OpenAIRE, Semantic Scholar, Wikidata and Altmetric), we develop a novel web-widget-based approach for the presentation of scholarly knowledge with rich contextual information. We implement the prop...
Article
Full-text available
In the age of advanced information systems powering fast-paced knowledge economies that face global societal challenges, it is no longer adequate to express scholarly information - an essential resource for modern economies - primarily as article narratives in document form. Despite being a well-established tradition in scholarly communication, PDF...
Article
In multiple-choice exams, students select one answer from among typically four choices and can explain why they made that particular choice. Students are good at understanding natural language questions and based on their domain knowledge can easily infer the question's answer by 'connecting the dots' across various pertinent facts. Considering a...
Chapter
Full-text available
The continuous and significant growth of data, together with improved access to data and the availability of powerful computing infrastructure, has led to intensified activities around Big Data Value (BDV) and data-driven Artificial Intelligence (AI). Powerful data techniques and tools allow collecting, storing, analysing, processing and visualisin...
Chapter
Leveraging a GraphQL-based federated query service that integrates multiple scholarly communication infrastructures (specifically, DataCite, ORCID, ROR, OpenAIRE, Semantic Scholar, Wikidata and Altmetric), we develop a novel web-widget-based approach for the presentation of scholarly knowledge with rich contextual information. We implement the prop...
Chapter
Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts the problem of automated semantification as labeling versus clustering where the two...
Preprint
Full-text available
Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts the problem of automated semantification as labeling versus clustering where the two...
Chapter
We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) noting salient aspects of an article’s contribution in its title; and (ii) pattern regularities capturing the salient terms that could be expressed...
Preprint
Review articles are a means to structure state-of-the-art literature and to organize the growing number of scholarly publications. However, review articles suffer from numerous limitations, weakening the impact they could potentially have. A key limitation is the inability of machines to access and process knowledge presented within...