Jens Lehmann

Amazon

Prof. Dr.

About

448
Publications
173,914
Reads
23,522
Citations
Introduction
Website: http://jens-lehmann.org
I lead the Smart Data Analytics Research Group at the University of Bonn, Fraunhofer IAIS Dresden and InfAI. My research interests include semantic technologies, conversational AI, machine learning and, more broadly, artificial intelligence.
Additional affiliations
October 2022 - present
TU Dresden
Position
  • Professor
December 2015 - May 2022
University of Bonn
Position
  • Professor
December 2015 - May 2022
Fraunhofer Institute for Intelligent Analysis and Information Systems
Position
  • Researcher
Education
October 2006 - June 2010
Leipzig University
Field of study
  • Computer Science
October 2005 - April 2006
University of Bristol
Field of study
  • Computer Science
October 2001 - September 2006
TU Dresden
Field of study
  • Computer Science


Publications (448)
Article
Full-text available
The Semantic Web eases data and information integration tasks by providing an infrastructure based on RDF and ontologies. In this paper, we contribute to the development of a spatial Data Web by elaborating on how the collaboratively collected OpenStreetMap data can be interactively transformed and represented adhering to the RDF data model. This t...
Article
The DBpedia project is a community effort to extract structured information from Wikipedia and to make this information accessible on the Web. The resulting DBpedia knowledge base currently describes over 2.6 million entities. For each of these entities, DBpedia defines a globally unique identifier that can be dereferenced over the Web into a rich...
Article
Full-text available
With the advent of the Semantic Web, description logics have become one of the most prominent paradigms for knowledge representation and reasoning. Progress in research and applications, however, is constrained by the lack of well-structured knowledge bases consisting of a sophisticated schema and instance data adhering to this schema. It is para...
Article
Full-text available
Representation learning for link prediction is one of the leading approaches to deal with the incompleteness problem of real-world knowledge graphs. Such methods are often called knowledge graph embedding models, which represent entities and relationships in knowledge graphs in continuous vector spaces. By doing this, semantic relationships and patterns...
Conference Paper
The adaption of multilingual pre-trained LLMs into eloquent and helpful assistants is essential to facilitate their use across different language regions. In that spirit, we are the first to conduct an extensive study of the performance of multilingual models instruction-tuned on different language compositions on parallel instruction-tuning benchm...
Chapter
This study introduces a Model-Based Deep Reinforcement Learning approach to enhance the effectiveness and transparency of mechanical ventilation treatment in the critical care setting of Intensive Care Units (ICUs). Distinct from conventional model-free methods, our approach benefits from the model-based algorithms’ capability to learn and interrog...
Chapter
Full-text available
Question answering (QA) over knowledge graphs (KGs) is an essential task that maps a user’s utterance to a query over a KG to retrieve the correct answer. Earlier methods in this field relied heavily on predefined templates and rules, which had limited adaptability and learning capability. Recent research has made significant strides in answering s...
Article
Answering factual questions from heterogeneous sources, such as graphs and text, is a key capacity of intelligent systems. Current approaches either (i) perform question answering over text and structured sources as separate pipelines followed by a merge step or (ii) provide an early integration, giving up the strengths of particular information sou...
Conference Paper
Full-text available
Knowledge graph embedding (KGE) models provide a low-dimensional representation of knowledge graphs in continuous vector spaces. This representation learning enables different downstream AI tasks such as link prediction for graph completion. However, most embedding models are only designed considering the algebra and geometry of the entity embeddin...
Article
Deep Non-Negative Matrix Factorization (DNMF) methods provide an efficient low-dimensional representation of given data through their layered architecture. A limitation of such methods is that they cannot effectively preserve the local and global geometric structures of the data in each layer. Consequently, a significant amount of the geometrical i...
Article
Full-text available
Knowledge graph embedding models represent entities as vectors in continuous spaces and their relations by geometric transformations, mainly translation and rotation. However, multi-relational knowledge graphs contain complex sub-graph structures, which these two families of models alone fail to preserve. The complexities of these sub-graph s...
Chapter
Knowledge graphs comprise structural and textual information to represent knowledge. To predict new structural knowledge, current approaches learn representations using both types of information through knowledge graph embeddings and language models. These approaches commit to a single pre-trained language model. We hypothesize that heterogeneous l...
Chapter
Full-text available
We propose the use of controlled natural language as a target for knowledge graph question answering (KGQA) semantic parsing via language models as opposed to using formal query languages directly. Controlled natural languages are close to (human) natural languages, but can be unambiguously translated into a formal language such as SPARQL. Our rese...
Chapter
Geometric aspects of knowledge graph embedding models directly impact their capability to preserve knowledge from the original graph to the vector space. For example, the capability to preserve structural patterns such as hierarchies, loops, and paths present as relational structures in a knowledge graph depends on the underlying geometry. In these...
Preprint
Full-text available
Skilled employees are usually seen as the most important pillar of an organization. Despite this, most organizations face high attrition and turnover rates. While several machine learning models have been developed for analyzing attrition and its causal factors, the interpretations of those models remain opaque. In this paper, we propose the HR-DSS...
Article
Full-text available
In recent years, exciting sources of data have been modeled as knowledge graphs (KGs). This modeling represents both structural relationships and the entity-specific multi-modal data in KGs. In various data analytic pipelines and machine learning (ML), the task of semantic similarity estimation plays a significant role. Assigning similarity values...
Article
Full-text available
The amount of multilingual data on the Web proliferates; therefore, developing ontologies in various natural languages is attracting considerable attention. In order to achieve semantic interoperability for the multilingual Web, cross-lingual ontology matching techniques are highly required. This paper proposes a Multilingual Ontology Matching (MoM...
Preprint
Full-text available
Knowledge Graph Embedding models have become an important area of machine learning. Those models provide a latent representation of entities and relations of a knowledge graph which can then be used in downstream machine learning tasks such as link prediction. The learning process of such models can be performed by contrasting positive and negative...
Preprint
Full-text available
This paper addresses the task of conversational question answering (ConvQA) over knowledge graphs (KGs). The majority of existing ConvQA methods rely on full supervision signals with a strict assumption of the availability of gold logical forms of queries to extract answers from the KG. However, creating such a gold logical form is not viable for e...
Conference Paper
Full-text available
Artificial Intelligence (AI) and Machine Learning (ML) are becoming common in our daily lives. The AI-driven processes significantly affect us as individuals and as a society, spanning across ethical dimensions like discrimination, misinformation, and fraud. Several of these AI & ML approaches rely on Knowledge Graph (KG) data. Due to the large vo...
Conference Paper
Full-text available
In recent years, more and more exciting sources of data have been modeled as Knowledge Graphs (KGs). This modeling represents both structural relationships and the entity specific multi-modal data in KGs. In various data analytic pipelines and Machine Learning (ML), the task of semantic similarity estimation plays a significant role. Assigning simi...
Preprint
Full-text available
We introduce a new dataset for conversational question answering over Knowledge Graphs (KGs) with verbalized answers. Question answering over KGs is currently focused on answer generation for single-turn questions (KGQA) or multiple-turn conversational question answering (ConvQA). However, in a real-world scenario (e.g., voice assistants such as Sir...
Preprint
Full-text available
Knowledge Graphs, such as Wikidata, comprise structural and textual knowledge in order to represent knowledge. For each of the two modalities, dedicated approaches for graph embedding and language models learn patterns that allow for predicting novel structural knowledge. Few approaches have integrated learning and inference with both modalities and...
Conference Paper
For many years, link prediction on knowledge graphs has been a purely transductive task, not allowing for reasoning on unseen entities. Recently, increasing efforts are put into exploring semi- and fully inductive scenarios, enabling inference over unseen and emerging entities. Still, all these approaches only consider triple-based KGs, whereas th...
Conference Paper
Climate change has a severe impact on the overall ecosystem of the whole world, including humankind. This demo paper presents Climate Bot - a machine reading comprehension system for question answering over documents about climate change. The proposed Climate Bot provides an interface for users to ask questions in natural language and get answers f...
Preprint
While a considerable number of semantic parsing approaches have employed RNN architectures for code generation tasks, there have been only a few attempts to investigate the applicability of Transformers for this task. Including hierarchical information of the underlying programming language syntax has proven to be effective for code generation. Since...
Chapter
Many knowledge graphs (KG) contain spatial and temporal information. Most KG embedding models follow triple-based representation and often neglect the simultaneous consideration of the spatial and temporal aspects. Encoding such higher dimensional knowledge necessitates the consideration of true algebraic and geometric aspects. Hypercomplex algebra...
Article
Computing dataset statistics is crucial for exploring their structure; however, it becomes challenging for large-scale datasets. This has several key benefits, such as link target identification, vocabulary reuse, quality analysis, big data analytics, and coverage analysis. In this paper, we present the first attempt of developing a distributed app...
Preprint
Full-text available
Task-oriented dialogue generation is challenging since the underlying knowledge is often dynamic and effectively incorporating knowledge into the learning process is hard. It is particularly challenging to generate both human-like and informative responses in this setting. Recent research primarily focused on various knowledge distillation methods...
Preprint
Full-text available
Evaluating Natural Language Generation (NLG) systems is a challenging task. Firstly, the metric should ensure that the generated hypothesis reflects the reference's semantics. Secondly, it should consider the grammatical quality of the generated sentence. Thirdly, it should be robust enough to handle various surface forms of the generated sentence....
Preprint
Full-text available
Knowledge Graph Embeddings (KGEs) encode the entities and relations of a knowledge graph (KG) into a vector space for the purpose of representation learning and reasoning for an ultimate downstream task (e.g., link prediction, question answering). Since KGEs follow the closed-world assumption and assume all the present facts in KGs to be positive (corre...
Preprint
Full-text available
Entity alignment aims to identify equivalent entity pairs between different knowledge graphs (KGs). Recently, the availability of temporal KGs (TKGs) that contain time information created the need for reasoning over time in such TKGs. Existing embedding-based entity alignment approaches disregard time information that commonly exists in many large-...
Chapter
Ontologies – providing an explicit schema for underlying data – often serve as background knowledge for machine learning approaches. Similar to ILP methods, concept learning utilizes such ontologies to learn concept expressions from examples in a supervised manner. This learning process is usually cast as a search process through the space of ontol...
Article
Full-text available
Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, which is evident by the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they and how are they constructed? (...
Article
Geospatial knowledge has always been an essential driver for many societal aspects. This concerns in particular urban planning and urban growth management. To gain insights from geospatial data and guide decisions, usually authoritative and open data sources are used, combined with user or citizen sensing data. However, we see a great potential for...
Article
In recent years, Knowledge Graph Embeddings (KGEs) have shown promising performance on link prediction tasks by mapping the entities and relations from a Knowledge Graph (KG) into a geometric space and thus have gained increasing attention. In addition, many recent Knowledge Graphs involve evolving data, e.g., the fact (Obama, PresidentOf, USA...
Article
Full-text available
Most Knowledge Graph-based Question Answering (KGQA) systems rely on training data to reach their optimal performance. However, acquiring training data for supervised systems is both time-consuming and resource-intensive. To address this, in this paper, we propose Tree-KGQA, an unsupervised KGQA system leveraging pre-trained language models and...
Article
Full-text available
SPARQL query generation from natural language questions is complex because it requires an understanding of both the question and underlying knowledge graph (KG) patterns. Most SPARQL query generation approaches are template-based, tailored to a specific knowledge graph and require pipelines with multiple steps, including entity and relation linking...
Article
Full-text available
Knowledge graph embedding models have become a popular approach for knowledge graph completion through predicting the plausibility of (potential) triples. This is performed by transforming the entities and relations of the knowledge graph into an embedding space. However, knowledge graphs often include further textual information stored in literal,...
Conference Paper
Full-text available
This paper presents DistRDF2ML, the generic, scalable, and distributed framework for creating in-memory data preprocessing pipelines for Spark-based machine learning on RDF knowledge graphs. This framework introduces software modules that transform large-scale RDF data into ML-ready fixed-length numeric feature vectors. The developed modules are op...
Preprint
Full-text available
Each year the International Semantic Web Conference organizes a set of Semantic Web Challenges to establish competitions that will advance state-of-the-art solutions in some problem domains. The Semantic Answer Type and Relation Prediction Task (SMART) is one of the ISWC 2021 Semantic Web challenges. This is the second year of the challenge af...
Preprint
Full-text available
Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, which is evident by the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they and how are they constructed? (...
Article
The heterogeneity in recently published knowledge graph embedding models’ implementations, training, and evaluation has made fair and thorough comparisons difficult. To assess the reproducibility of previously published results, we re-implemented and evaluated 21 models in the PyKEEN software package. In this paper, we outline which results could b...
Poster
Full-text available
DistRDF2ML - Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs CIKM Poster
Conference Paper
Full-text available
With the tremendous increase in the volume of semantic data on the Web, reasoning over such an amount of data has become a challenging task. On the other hand, the traditional centralized approaches are no longer feasible for large-scale data due to the limitations of software and hardware resources. Therefore, horizontal scalability is desirable....
Article
Full-text available
Knowledge graph embedding models have gained significant attention in AI research. The aim of knowledge graph embedding is to embed the graphs into a vector space in which the structure of the graph is preserved. Recent works have shown that the inclusion of background knowledge, such as logical rules, can improve the performance of embeddings in d...
Chapter
For many years, link prediction on knowledge graphs (KGs) has been a purely transductive task, not allowing for reasoning on unseen entities. Recently, increasing efforts are put into exploring semi- and fully inductive scenarios, enabling inference over unseen and emerging entities. Still, all these approaches only consider triple-based KGs, where...
Article
A central component in many applications is the underlying data management layer. In Data-Web applications, the central component of this layer is the triple store. It is thus evident that finding the most adequate store for the application to develop is of crucial importance for individual projects as well as for data integration on the Data Web i...
Preprint
Full-text available
Spatial data is ubiquitous in our data-driven society. The Logic Programming community has been investigating the use of spatial data in different settings. Despite the success of this research, the Geographic Information System (GIS) community has rarely made use of these new approaches. There are two main reasons for this. First, there is a lack of tools...
Chapter
In recent years, there have been significant developments in Question Answering over Knowledge Graphs (KGQA). Despite all the notable advancements, current KGQA systems only focus on answer generation techniques and not on answer verbalization. However, in real-world scenarios (e.g., voice assistants such as Alexa, Siri, etc.), users prefer verbali...
Chapter
Knowledge graph embeddings (KGEs) have lately been at the center of many artificial intelligence studies due to their applicability for solving downstream tasks, including link prediction and node classification. However, most knowledge graph embedding models encode, into the vector space, only the local graph structure of an entity, i.e., information of...
Article
Full-text available
Industry 4.0 (I4.0) standards and standardization frameworks provide a unified way to describe smart factories. Standards specify the main components, systems, and processes inside a smart factory and the interaction among all of them. Furthermore, standardization frameworks classify standards according to their functions into layers and dimensions...
Chapter
Full-text available
The last decades have witnessed significant advancements in terms of data generation, management, and maintenance. This has resulted in vast amounts of data becoming available in a variety of forms and formats including RDF. As RDF data is represented as a graph structure, applying machine learning algorithms to extract valuable knowledge and insig...
Article
Full-text available
Knowledge graphs (KGs) are widely used for modeling scholarly communication, performing scientometric analyses, and supporting a variety of intelligent services to explore the literature and predict research dynamics. However, they often suffer from incompleteness (e.g., missing affiliations, references, research topics), leading to a reduced scope...
Book
Full-text available
What does Natural Language Processing mean, what is behind GPT-3, and how do chatbots actually work? The new study »Moderne Sprachtechnologien – Konzepte, Anwendungen, Chancen« by KI.NRW provides answers to these questions. In a comprehensive introduction, researchers from the Fraunhofer Institute for Intelligent Analysis and...
Preprint
Full-text available
For many years, link prediction on knowledge graphs (KGs) has been a purely transductive task, not allowing for reasoning on unseen entities. Recently, increasing efforts are put into exploring semi- and fully inductive scenarios, enabling inference over unseen and emerging entities. Still, all these approaches only consider triple-based KGs...
Preprint
Full-text available
The incompleteness of Knowledge Graphs (KGs) is a crucial issue affecting the quality of AI-based services. In the scholarly domain, KGs describing research publications typically lack important information, hindering our ability to analyse and predict research dynamics. In recent years, link prediction approaches based on Knowledge Graph Embedding...
Preprint
Full-text available
Knowledge graph embedding models have been studied comprehensively recently. However, these studies lack an evaluation system that compares their efficiency in a reproducible manner that follows the FAIR principles. In this study, we extend the general HOBBIT benchmarking platform to evaluate the efficiency of embedding models with such criteria. T...