Axel-Cyrille Ngonga Ngomo

Axel-Cyrille Ngonga Ngomo
Universität Paderborn | UPB · Data Science

Prof. Dr. rer. nat. habil.

About

401
Publications
104,824
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
8,539
Citations
Citations since 2017
219 Research Items
6423 Citations
201720182019202020212022202302004006008001,000
201720182019202020212022202302004006008001,000
201720182019202020212022202302004006008001,000
201720182019202020212022202302004006008001,000
Introduction
I lead the Data Science Group at Paderborn University (http://dice-research.org) with a focus on knowledge graphs and discrete representation for heterogeneous data.
Additional affiliations
April 2014 - October 2016
University of Leipzig
Position
  • Head of AKSW
August 2008 - present
University of Leipzig
Position
  • Head of AKSW/SIMBA

Publications

Publications (401)
Chapter
Class expression learning in description logics has long been regarded as an iterative search problem in an infinite conceptual space. Each iteration of the search process invokes a reasoner and a heuristic function. The reasoner finds the instances of the current expression, and the heuristic function computes the information gain and decides on t...
Chapter
Most real-world knowledge graphs, including Wikidata, DBpedia, and Yago are incomplete. Answering queries on such incomplete graphs is an important, but challenging problem. Recently, a number of approaches, including complex query decomposition (CQD), have been proposed to answer complex, multi-hop queries with conjunctions and disjunctions on suc...
Chapter
A growing number of knowledge graph embedding models exploit the characteristics of division algebras (e.g., \(\mathbb {R}\), \(\mathbb {C}\), \(\mathbb {H}\), and \(\mathbb {O}\)) to learn embeddings. Yet, recent empirical results suggest that the suitability of algebras is contingent upon the knowledge graph being embedded. In this work, we tackl...
Chapter
Full-text available
Purpose: The query language GraphQL has gained significant traction in recent years. In particular, it has recently gained the attention of the semantic web and graph database communities and is now often used as a means to query knowledge graphs. Most of the storage solutions that support GraphQL rely on a translation layer to map the said languag...
Chapter
Full-text available
Purpose: This study addresses the limitations of current short abstracts of DBPEDIA entities, which often lack a comprehensive overview due to their creating method (i.e., selecting the first two-three sentences from the full DBPEDIA abstracts). Methodology: We leverage pre-trained language models to generate abstractive summaries of DBPEDIA abstra...
Chapter
Full-text available
Purpose: Data integration and applications across knowledge graphs (KGs) rely heavily on the discovery of links between resources within these KGs. Geospatial link discovery algorithms have to deal with millions of point sets containing billions of points. Methodology: To speed up the discovery of geospatial links, we propose COBALT. COBALT combine...
Conference Paper
Full-text available
Models computed using deep learning have been effectively applied to tackle various problems in many disciplines. Yet, the predictions of these models are often at most post-hoc and locally explainable. In contrast, class expressions in description logics are ante-hoc and globally explainable. Although state-of-the-art symbolic machine learning app...
Preprint
Full-text available
Relation Extraction (RE) is a task that identifies relationships between entities in a text, enabling the acquisition of relational facts and bridging the gap between natural language and structured knowledge. However, current RE models often rely on small datasets with low coverage of relation types, particularly when working with languages other...
Chapter
Full-text available
Linked knowledge graphs build the backbone of many data-driven applications such as search engines, conversational agents and e-commerce solutions. Declarative link discovery frameworks use complex link specifications to express the conditions under which a link between two resources can be deemed to exist. However, understanding such complex link...
Chapter
Full-text available
Indonesian is classified as underrepresented in the Natural Language Processing (NLP) field, despite being the tenth most spoken language in the world with 198 million speakers. The paucity of datasets is recognized as the main reason for the slow advancements in NLP research for underrepresented languages. Significant attempts were made in 2020 to...
Article
Full-text available
Graffiti is an urban phenomenon that is increasingly attracting the interest of the sciences. To the best of our knowledge, no suitable data corpora are available for systematic research until now. The Information System Graffiti in Germany project (Ingrid) closes this gap by dealing with graffiti image collections that have been made available to...
Preprint
Full-text available
Most real-world knowledge graphs, including Wikidata, DBpedia, and Yago are incomplete. Answering queries on such incomplete graphs is an important, but challenging problem. Recently, a number of approaches, including complex query decomposition (CQD), have been proposed to answer complex, multi-hop queries with conjunctions and disjunctions on suc...
Preprint
Full-text available
Concept learning deals with learning description logic concepts from a background knowledge and input examples. The goal is to learn a concept that covers all positive examples, while not covering any negative examples. This non-trivial task is often formulated as a search problem within an infinite quasi-ordered concept space. Although state-of-th...
Conference Paper
Full-text available
At least 5% of questions submitted to search engines ask about cause-effect relationships in some way. To support the development of tailored approaches that can answer such questions, we construct Webis-CausalQA-22, a benchmark corpus of 1.1 million causal questions with answers. We distinguish different types of causal questions using a novel typ...
Preprint
Knowledge bases are widely used for information management on the web, enabling high-impact applications such as web search, question answering, and natural language processing. They also serve as the backbone for automatic decision systems, e.g. for medical diagnostics and credit scoring. As stakeholders affected by these decisions would like to u...
Article
Full-text available
Knowledge graphs (KGs) that follow the Linked Data principles are created daily. However, there are no holistic models for the Linked Open Data (LOD). Building these models( i.e., engineering a pipeline system) is still a big challenge in order to make the LOD vision comes true. In this paper, we address this challenge by presenting NELLIE, a pipel...
Article
Full-text available
We present the Linked SPARQL Queries (LSQ) dataset, which currently describes 43.95 million executions of 11.56 million unique SPARQL queries extracted from the logs of 27 different endpoints. The LSQ dataset provides RDF descriptions of each such query, which are indexed in a public LSQ endpoint, allowing interested parties to find queries with th...
Chapter
We consider fact-checking approaches that aim to predict the veracity of assertions in knowledge graphs. Five main categories of fact-checking approaches for knowledge graphs have been proposed in the recent literature, of which each is subject to partially overlapping limitations. In particular, current text-based approaches are limited by manual...
Chapter
Keyphrase extraction aims to identify a small set of phrases that best describe the content of text. The automatic generation of keyphrases has become essential for many natural language applications such as text categorization, indexing, and summarization. In this paper, we propose MultPAX, a multitask framework for extracting present and absent k...
Chapter
In recent years, several relation extractions (RE) models have been developed to extract knowledge from natural language texts. Accordingly, several benchmark datasets have been proposed to evaluate these models. These RE datasets consisted of natural language sentences with a fixed number of relations from a particular domain. Albeit useful for ge...
Chapter
Full-text available
Time-efficient solutions for querying RDF knowledge graphs depend on indexing structures with low response times to answer SPARQL queries rapidly. Hypertries—an indexing structure we recently developed for tensor-based triple stores—have achieved significant runtime improvements over several mainstream storage solutions for RDF knowledge graphs. Ho...
Chapter
A large amount of data is generated every day by different systems and applications. In many cases, this data comes in a tabular format that lacks semantic representation and poses new challenges in data modelling. For semantic applications, it then becomes necessary to lift the data to a richer representation, such as a knowledge graph that adhere...
Preprint
Full-text available
Knowledge graph embedding research has mainly focused on learning continuous representations of knowledge graphs towards the link prediction problem. Recently developed frameworks can be effectively applied in research related applications. Yet, these frameworks do not fulfill many requirements of real-world applications. As the size of the knowled...
Article
Full-text available
The rapid generation of large amounts of information about the coronavirus SARS-CoV-2 and the disease COVID-19 makes it increasingly difficult to gain a comprehensive overview of current insights related to the disease. With this work, we aim to support the rapid access to a comprehensive data source on COVID-19 targeted especially at researchers....
Article
Knowledge graph embedding research has mainly focused on learning continuous representations of knowledge graphs towards the link prediction problem. Recently developed frameworks can be effectively applied in research related applications. Yet, these frameworks do not fulfill many requirements of real-world applications. As the size of the knowled...
Preprint
Full-text available
Knowledge graph embedding research has mainly focused on learning continuous representations of entities and relations tailored towards the link prediction problem. Recent results indicate an ever increasing predictive ability of current approaches on benchmark datasets. However, this effectiveness often comes with the cost of over-parameterization...
Chapter
Concept learning approaches based on refinement operators explore partially ordered solution spaces to compute concepts, which are used as binary classification models for individuals. However, the number of concepts explored by these approaches can grow to the millions for complex learning problems. This often leads to impractical runtimes. We pro...
Preprint
Full-text available
Each year the International Semantic Web Conference organizes a set of Semantic Web Challenges to establish competitions that will advance state-of-the-art solutions in some problem domains. The Semantic Answer Type and Relation Prediction Task (SMART) task is one of the ISWC 2021 Semantic Web challenges. This is the second year of the challenge af...
Conference Paper
Full-text available
The sheer size of modern knowledge graphs has led to increased attention being paid to the entity summarization task. Given a knowledge graph T and an entity e found therein, solutions to entity summarization select a subset of the triples from T which summarize e's concise bound description. Presently, the best performing approaches rely on sequen...
Preprint
Full-text available
Class expression learning is a branch of explainable supervised machine learning of increasing importance. Most existing approaches for class expression learning in description logics are search algorithms or hard-rule-based. In particular, approaches based on refinement operators suffer from scalability issues as they rely on heuristic functions t...
Article
Full-text available
RDF has seen increased adoption in recent years, prompting the standardization of the SPARQL query language for RDF, and the development of local and distributed engines for processing SPARQL queries. This survey paper provides a comprehensive review of techniques and systems for querying RDF knowledge graphs. While other reviews on this topic tend...
Preprint
Classifying nodes in knowledge graphs is an important task, e.g., predicting missing types of entities, predicting which molecules cause cancer, or predicting which drugs are promising treatment candidates. While black-box models often achieve high predictive performance, they are only post-hoc and locally explainable and do not allow the learned m...
Conference Paper
Full-text available
Data partitioning is an effective way to manage large datasets. While a broad range of RDF graph partitioning techniques has been proposed in previous works, little attention has been given to workload-aware RDF graph partitioning. In this paper, we propose two techniques that make use of the querying workload to detect the portions of RDF graphs t...
Chapter
Unsupervised fact checking approaches for knowledge graphs commonly combine path search and scoring to predict the likelihood of assertions being true. Current approaches search for said metapaths in the discrete search space spanned by the input knowledge graph and make no use of continuous representations of knowledge graphs. We hypothesize that...
Article
A central component in many applications is the underlying data management layer. In Data-Web applications, the central component of this layer is the triple store. It is thus evident that finding the most adequate store for the application to develop is of crucial importance for individual projects as well as for data integration on the Data Web i...
Chapter
Full-text available
With significant growth in RDF datasets, application developers demand online availability of these datasets to meet the end users’ expectations. Various interfaces are available for querying RDF data using SPARQL query language. Studies show that SPARQL end-points may provide high query runtime performance at the cost of low availability. For exam...
Chapter
Full-text available
Knowledge Graph Question Answering (KGQA) systems are often based on machine learning algorithms, requiring thousands of question-answer pairs as training examples or natural language processing pipelines that need module fine-tuning. In this paper, we present a novel QA approach, dubbed TeBaQA. Our approach learns to answer questions based on grap...
Article
Full-text available
Social media plays a significant role in disaster management by providing valuable data about affected people, donations and help requests. Recent studies highlight the need to filter information on social media into fine-grained content labels. However, identifying useful information from massive amounts of social media posts during a crisis is a...
Preprint
Full-text available
Concept learning approaches based on refinement operators explore partially ordered solution spaces to compute concepts, which are used as binary classification models for individuals. However, the refinement trees spanned by these approaches can easily grow to millions of nodes for complex learning problems. This leads to refinement-based approach...
Article
Full-text available
In this article, we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data models, as well as lang...
Chapter
Relation extraction between two named entities from unstructured text is an important natural language processing task. In the absence of labelled data, semi-supervised and unsupervised approaches are used to extract relations. We present a novel approach that uses sentence encoding for unsupervised relation extraction. We use a pre-trained, SBERT...
Preprint
Full-text available
Approaches based on refinement operators have been successfully applied to class expression learning on RDF knowledge graphs. These approaches often need to explore a large number of concepts to find adequate hypotheses. This need arguably stems from current approaches relying on myopic heuristic functions to guide their search through an infinite...
Preprint
Full-text available
Knowledge graph embedding research has mainly focused on the two smallest normed division algebras, $\mathbb{R}$ and $\mathbb{C}$. Recent results suggest that trilinear products of quaternion-valued embeddings can be a more effective means to tackle link prediction. In addition, models based on convolutions on real-valued embeddings often yield sta...
Chapter
Full-text available
We investigate the problem of learning continuous vector representations of knowledge graphs for predicting missing links. Recent results suggest that using a Hermitian inner product on complex-valued embeddings or convolutions on real-valued embeddings can be effective means for predicting missing links. We bring these insights together and propos...
Chapter
Data compression for RDF knowledge graphs is used in an increasing number of settings. In parallel to this, several grammar-based graph compression algorithms have been developed to reduce the size of graphs. We port gRePair—a state-of-the-art grammar-based graph compression algorithm—to RDF (named RDFRePair). We compare this promising technique wi...
Preprint
Full-text available
Knowledge graph embedding techniques are key to making knowledge graphs amenable to the plethora of machine learning approaches based on vector representations. Link prediction is often used as a proxy to evaluate the quality of these embeddings. Given that the creation of benchmarks for link prediction is a time-consuming endeavor, most work on th...
Preprint
Full-text available
In the Open Data Portal Germany (OPAL) project, a pipeline of the following data refinement steps has been developed: requirements analysis, data acquisition, analysis, conversion, integration and selection. 800,000 datasets in DCAT format have been produced.
Preprint
Full-text available
In recent years, we observe an increasing amount of software with machine learning components being deployed. This poses the question of quality assurance for such components: how can we validate whether specified requirements are fulfilled by a machine learned software? Current testing and verification approaches either focus on a single requireme...
Conference Paper
Full-text available
We investigate the problem of named entity recognition in the user-generated text such as social media posts. This task is rendered particularly difficult by the restricted length and limited grammatical coherence of this data type. Current state-of-the-art approaches rely on external sources such as gazetteers to alleviate some of these restrictio...
Preprint
Full-text available
Recent years have seen the growing adoption of non-relational data models for representing diverse, incomplete data. Among these, the RDF graph-based data model has seen ever-broadening adoption, particularly on the Web. This adoption has prompted the standardization of the SPARQL query language for RDF, as well as the development of a variety of l...
Preprint
Full-text available
Recent years have seen the growing adoption of non-relational data models for representing diverse, incomplete data. Among these, the RDF graph-based data model has seen ever-broadening adoption, particularly on the Web. This adoption has prompted the standardization of the SPARQL query language for RDF, as well as the development of a variety of l...
Preprint
Full-text available
Finding a good query plan is key to the optimization of query runtime. This holds in particular for cost-based federation engines, which make use of cardinality estimations to achieve this goal. A number of studies compare SPARQL federation engines across different performance metrics, including query runtime, result set completeness and correctnes...
Article
Full-text available
The Linked Data paradigm builds upon the backbone of distributed knowledge bases connected by typed links. The mere volume of current knowledge bases as well as their sheer number pose two major challenges when aiming to support the computation of links across and within them. The first is that tools for link discovery have to be time-efficient whe...
Preprint
Full-text available
Knowledge Graph Question Answering (KGQA) systems are based on machine learning algorithms, requiring thousands of question-answer pairs as training examples or natural language processing pipelines that need module fine-tuning. In this paper, we present a novel QA approach, dubbed TeBaQA. Our approach learns to answer questions based on graph isom...
Preprint
Full-text available
Recent years have seen the growing adoption of non-relational data models for representing diverse , incomplete data. Among these, the RDF graph-based data model has seen ever-broadening adoption, particularly on the Web. This adoption has prompted the standardization of the SPARQL query language for RDF, as well as the development of a variety of...
Article
The number and size of datasets abiding by the Linked Data paradigm increase every day. Discovering links between these datasets is thus central to achieving the vision behind the Data Web. Declarative Link Discovery (LD) frameworks rely on complex Link Specification (LS) to express the conditions under which two resources should be linked. Underst...
Conference Paper
Full-text available
The number of RDF knowledge graphs available on the Web grows constantly. Gathering these graphs at large scale for downstream applications hence requires the use of crawlers. Although Data Web crawlers exist, and general Web crawlers could be adapted to focus on the Data Web, there is currently no benchmark to fairly evaluate their performance. Ou...
Preprint
Full-text available
Knowledge graph completion refers to predicting missing triples. Most approaches achieve this goal by predicting entities, given an entity and a relation. We predict missing triples via the relation prediction. To this end, we frame the relation prediction problem as a multi-label classification problem and propose a shallow neural model (SHALLOM)...