
Sören Auer
Leibniz Universität Hannover · L3S Research Center
Prof. Dr.
Working on organizing the flood of research with the Open Research Knowledge Graph: https://www.orkg.org/
About
Publications: 607 · Reads: 286,828 · Citations: 21,561
Introduction
My key research question is: "How can we digitize the work and information flows in science and technology?" I serve as director of TIB German National Library of Science and Technology ‒ Leibniz Information Centre for Science and Technology. My research interests include social and semantic technologies, knowledge representation, engineering & management, usability, agile methodologies as well as databases and information systems.
Additional affiliations
July 2017 - present
July 2017 - present
June 2013 - June 2017
Education
September 2003 - October 2006
September 1997 - June 1998
October 1995 - February 2000
Publications (607)
This whitepaper gives an overview on aims and architecture of the Industrial Data Space. Additionally, some use cases and the Industrial Data Space Association are introduced.
The management and analysis of large-scale datasets – described with the term Big Data – involves the three classic dimensions volume, velocity and variety. While the former two are well supported by a plethora of software components, the variety dimension is still rather neglected. We present the BDE platform – an easy-to-deploy, easy-to-use and a...
In the engineering and manufacturing domain, there is currently an atmosphere of departure to a new era of digitized production. In different regions, initiatives in these directions are known under different names, such as industrie du futur in France, industrial internet in the US or Industrie 4.0 in Germany. While the vision of digitizing produc...
The search for information on the Web of Data is becoming increasingly difficult due to its dramatic growth. Especially novice users need to acquire both knowledge about the underlying ontology structure and proficiency in formulating formal queries (e.g., SPARQL queries) to retrieve information from Linked Data sources. So as to simplify and autom...
With Linked Data, a very pragmatic approach towards achieving the vision of the Semantic Web has recently gained much traction. The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. While many standards, methods and technologies developed within the Semantic Web community are applicabl...
[…] (ORCID 0000-0003-3975-5374), Kheir Eddine Farfar (ORCID 0000-0002-0366-4596), Allard Oelen (ORCID 0000-0001-9924-9153), Oliver Karras (ORCID 0000-0001-5336-6899), and Sören Auer (ORCID 0000-0002-0698-2864). Abstract: One of the pillars of the scientific method is reproducibility: the ability to replicate the results of a prior study if the same procedures are followed. A...
Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this work also examines the influential scholarly documen...
We propose the LLMs4OL approach, which utilizes Large Language Models (LLMs) for Ontology Learning (OL). LLMs have shown significant advancements in natural language processing, demonstrating their ability to capture complex language patterns in different knowledge domains. Our LLMs4OL paradigm investigates the following hypothesis: Can LLMs effect...
[Background.] Empirical research in requirements engineering (RE) is a constantly evolving topic, with a growing number of publications. Several papers address this topic using literature reviews to provide a snapshot of its “current” state and evolution. However, these papers have never built on or updated earlier ones, resulting in overlap and re...
The Open Research Knowledge Graph (ORKG) is an Open Science digital infrastructure for the production, curation, publication, and reuse of machine-actionable scholarly knowledge. Built on top of the RDF data model and extensible ontologies, the ORKG provides a common vocabulary for researchers to describe their research contributions and data, impr...
The amount of research articles produced every day is overwhelming: scholarly knowledge is getting harder to communicate and easier to get lost. A possible solution is to represent the information in knowledge graphs: structures representing knowledge in networks of entities, their semantic types, and relationships between them. But this solution h...
Recent investigations have explored prompt-based training of transformer language models for new text genres in low-resource settings. This approach has proven effective in transferring pre-trained or fine-tuned models to resource-scarce environments. This work presents the first results on applying prompt-based training to transformers for scholar...
We propose the LLMs4OL approach, which utilizes Large Language Models (LLMs) for Ontology Learning (OL). LLMs have shown significant advancements in natural language processing, demonstrating their ability to capture complex language patterns in different knowledge domains. Our LLMs4OL paradigm investigates the following hypothesis: Can LLM...
Complex research problems are increasingly addressed by interdisciplinary, collaborative research projects generating large amounts of heterogeneous data. The overarching processing, analysis and availability of data are critical success factors for these research efforts. Data repositories enable long-term availability of such data for th...
The reuse of research software is central to research efficiency and academic exchange. The application of software enables researchers with varied backgrounds to reproduce, validate, and expand upon study findings. Furthermore, the analysis of open source code aids in the comprehension, comparison, and integration of approaches. Often, however, no...
The purpose of this work is to describe the orkg-Leaderboard software designed to extract leaderboards defined as task–dataset–metric tuples automatically from large collections of empirical research papers in artificial intelligence (AI). The software can support both the main workflows of scholarly publishing, viz. as LaTeX files or as PDF files....
The overall AI trend of creating neuro-symbolic systems is reflected in the Semantic Web community with an increased interest in the development of systems that rely on both Semantic Web resources and Machine Learning components (SWeMLS, for short). However, understanding trends and best practices in this rapidly growing field is hampered by a lack...
There have been many recent investigations into prompt-based training of transformer language models for new text genres in low-resource settings. The prompt-based training approach has been found to be effective in generalizing pre-trained or fine-tuned models for transfer to resource-scarce settings. This work, for the first time, reports results...
The purpose of this work is to describe the Orkg-Leaderboard software designed to extract leaderboards defined as Task-Dataset-Metric tuples automatically from large collections of empirical research papers in Artificial Intelligence (AI). The software can support both the main workflows of scholarly publishing, viz. as LaTeX files or as PDF files....
Knowledge graphs have gained increasing popularity in the last decade in science and technology. However, knowledge graphs are currently relatively simple to moderately complex semantic structures that are mainly a collection of factual statements. Question answering (QA) benchmarks and systems have so far been mainly geared towards encyclopedic knowledge graphs...
The rapid growth of research publications has placed great demands on digital libraries (DL) for advanced information management technologies. To cater to these demands, techniques relying on knowledge-graph structures are being advocated. In such graph-based pipelines, inferring semantic relations between related scientific concepts is a crucial s...
Due to the growing number of scholarly publications, finding relevant articles becomes increasingly difficult. Scholarly knowledge graphs can be used to organize the scholarly knowledge presented within those publications and represent them in machine-readable formats. Natural language processing (NLP) provides scalable methods to automatically ext...
We present a large-scale empirical investigation of the zero-shot learning phenomena in a specific recognizing textual entailment (RTE) task category, i.e. the automated mining of leaderboards for Empirical AI Research. The prior reported state-of-the-art models for leaderboards extraction formulated as an RTE task, in a non-zero-shot setting, are...
In line with the general trend in artificial intelligence research to create intelligent systems that combine learning and symbolic components, a new sub-area has emerged that focuses on combining machine learning (ML) components with techniques developed by the Semantic Web (SW) community - Semantic Web Machine Learning (SWeML for short). Due to i...
Recent events such as wars, sanctions, pandemics, and climate change have shown the importance of proper supply network management. A key step in managing supply networks is procurement. We present an approach for realizing a next-generation procurement workspace that aims to facilitate resilience and sustainability. To achieve this, the approach e...
The Open Research Knowledge Graph is an infrastructure for the production, curation, publication and use of FAIR scientific information. Its mission is to shape a future scholarly publishing and communication where the contents of scholarly articles are FAIR research data.
In the last decade, a large number of knowledge graph (KG) completion approaches were proposed. Albeit effective, these efforts are disjoint, and their collective strengths and weaknesses in effective KG completion have not been studied in the literature. We extend Plumber, a framework that brings together the research community’s disjoint efforts...
The value of structured scholarly knowledge for research and society at large is well understood, but producing scholarly knowledge (i.e., knowledge traditionally published in articles) in structured form remains a challenge. We propose an approach for automatically extracting scholarly knowledge from published software packages by static analysis...
Information extraction from scholarly articles is a challenging task due to the sizable document length and implicit information hidden in text, figures, and citations. Scholarly information extraction has various applications in exploration, archival, and curation services for digital libraries and knowledge management systems. We present MORTY, a...
A plethora of scientific software packages are published in repositories, e.g., Zenodo and figshare. These software packages are crucial for the reproducibility of published research. As an additional route to scholarly knowledge graph construction, we propose an approach for automated extraction of machine-actionable (structured) scholarly knowled...
Domain-specific named entity recognition (NER) on Computer Science (CS) scholarly articles is an information extraction task that is arguably more challenging for the various annotation aims that can hamper the task and has been less studied than NER in the general domain. Given that significant progress has been made on NER, we anticipate that sch...
When semantically describing knowledge graphs (KGs), users have to make a critical choice of a vocabulary (i.e. predicates and resources). The success of KG building is determined by the convergence of shared vocabularies so that meaning can be established. The typical lifecycle for a new KG construction can be defined as follows: nascent phases of...
Knowledge Graphs (KG) have gained increasing importance in science, business and society in the last years. However, most knowledge graphs were either extracted or compiled from existing sources. There are only relatively few examples where knowledge graphs were genuinely created by an intertwined human-machine collaboration. Also, since the qualit...
The seamless documentation of research data flows from generation, processing, analysis, publication, and reuse is of utmost importance when dealing with large amounts of data. Semantic linking of process documentation and gathered data creates a knowledge space enabling the discovery of relations between steps of process chains. This paper shows t...
The development of a novel manufacturing process chain is a complex scientific challenge and requires interdisciplinary collaboration, as well as technological solutions that extend the boundaries of automation and customize the information flows between different organizational units. Due to these challenges an approach to parametrize each s...
We leverage the Open Research Knowledge Graph - a scholarly infrastructure that supports the creation, curation, and reuse of structured, semantic scholarly knowledge - and present an approach for persistent identification of FAIR scholarly knowledge. We propose a DOI-based persistent identification of ORKG Papers, which are machine-actionable desc...
The development of a novel manufacturing process chain is a complex scientific challenge and requires interdisciplinary and inter-institutional collaboration. Data need to be exchanged continuously between involved researchers in order to coordinate between individual process steps and to identify cause-effect relationships within the process. This...
Scholarly knowledge graphs provide researchers with a novel modality of information retrieval, and their wider use in academia is beneficial for the digitalization of published works and the development of scholarly communication. To increase the acceptance of scholarly knowledge graphs, we present a dashboard, which visualizes the research contrib...
A key aspect of establishing data spaces is to develop a common understanding of the data to be shared in the data space. Semantic standards and technologies have been developed for this purpose for over two decades. In this article, we will discuss the history and importance of semantic integration for data spaces. We will introduce the base concepts...
Background: Recent years are seeing a growing impetus in the semantification of scholarly knowledge at the fine-grained level of scientific entities in knowledge graphs. The Open Research Knowledge Graph (ORKG, orkg.org) represents an important step in this direction, with thousands of scholarly contributions as structured, fine-grained, machine-re...
Information Extraction (IE) tasks are commonly studied topics in various domains of research. Hence, the community continuously produces multiple techniques, solutions, and tools to perform such tasks. However, running those tools and integrating them within existing infrastructure requires time, expertise, and resources. One pertinent task here is...
Despite improved digital access to scholarly literature in the last decades, the fundamental principles of scholarly communication remain unchanged and continue to be largely document-based. Scholarly knowledge remains locked in representations that are inadequate for machine processing. The Open Research Knowledge Graph (ORKG) is an infrastructure...
The comprehensive implementation of digital technologies in product manufacturing leads to changes in engineering processes and requires new approaches to data management. An important role belongs to the processes of organizing the collection, storage and reuse of research data obtained and used in the process of product, system or technology deve...
Supply Chains (SCs) are subject to disruptive events that potentially hinder the operational performance. Disruption Management Process (DMP) relies on the analysis of integrated heterogeneous data sources such as production scheduling, order management and logistics to evaluate the impact of disruptions on the SC. Existing approaches are limited a...
Supply Chain (SC) modeling is essential to understand and influence SC behavior, especially for increasingly globalized and complex SCs. Existing models address various SC notions, e.g., processes, tiers and production, in an isolated manner limiting enriched analysis granted by integrated information systems. Moreover, the scarcity of real-world d...
Semiconductor supply chains are described by significant demand fluctuation that increases as one moves up the supply chain, the so-called bullwhip effect. To counteract, semiconductor manufacturers aim to optimize capacity utilization, to deliver with shorter lead times and exploit this to generate revenue. Additionally, in a competitive market, f...
As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be more efficiently discovered and used. Natural Language Processing (NLP) techniques are able to autonomously process scholarly articles at scale and to create machine readable representations of the article...
Background: Recent years are seeing a growing impetus in the semantification of scholarly knowledge at the fine-grained level of scientific entities in knowledge graphs. The Open Research Knowledge Graph (ORKG) https://www.orkg.org/ represents an important step in this direction, with thousands of scholarly contributions as structured, fine-grained...
Domain-specific named entity recognition (NER) on Computer Science (CS) scholarly articles is an information extraction task that is arguably more challenging for the various annotation aims that can beset the task and has been less studied than NER in the general domain. Given that significant progress has been made on NER, we believe that scholar...
Leveraging a GraphQL-based federated query service that integrates multiple scholarly communication infrastructures (specifically, DataCite, ORCID, ROR, OpenAIRE, Semantic Scholar, Wikidata and Altmetric), we develop a novel web widget based approach for the presentation of scholarly knowledge with rich contextual information. We implement the prop...
In the age of advanced information systems powering fast-paced knowledge economies that face global societal challenges, it is no longer adequate to express scholarly information - an essential resource for modern economies - primarily as article narratives in document form. Despite being a well-established tradition in scholarly communication, PDF...
In multiple-choice exams, students select one answer from among typically four choices and can explain why they made that particular choice. Students are good at understanding natural language questions and based on their domain knowledge can easily infer the question's answer by 'connecting the dots' across various pertinent facts. Considering a...
The continuous and significant growth of data, together with improved access to data and the availability of powerful computing infrastructure, has led to intensified activities around Big Data Value (BDV) and data-driven Artificial Intelligence (AI). Powerful data techniques and tools allow collecting, storing, analysing, processing and visualisin...
Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts the problem of automated semantification as labeling versus clustering where the two...
We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) noting salient aspects of an article’s contribution in its title; and (ii) pattern regularities capturing the salient terms that could be expressed...
Review articles are a means to structure state-of-the-art literature and to organize the growing number of scholarly publications. However, review articles are suffering from numerous limitations, weakening the impact the articles could potentially have. A key limitation is the inability of machines to access and process knowledge presented within...