Axel-Cyrille Ngonga Ngomo’s research while affiliated with Paderborn University and other places


Publications (14)


Embedding Knowledge Graphs in Degenerate Clifford Algebras
  • Chapter

October 2024

·

1 Read

Louis Mozart Kamdem Teyou

·

·

Axel-Cyrille Ngonga Ngomo

Clifford algebras are a natural extension of division algebras, including real numbers, complex numbers, quaternions, and octonions. Previous research in knowledge graph embeddings has focused exclusively on Clifford algebras of a specific type, which do not include nilpotent base vectors, i.e., elements that square to zero. In this work, we introduce a novel approach by incorporating nilpotent base vectors with a nilpotency index of two, leading to a more general form of Clifford algebras named degenerate Clifford algebras. This generalization covers dual numbers and thereby brings translation and rotation models under the same generalization paradigm for the first time. We develop two models to determine the parameters that define the algebra: one using a greedy search and another predicting the parameters based on neural network embeddings of the input knowledge graph. Our evaluation on seven benchmark datasets demonstrates that this incorporation of nilpotent vectors enhances the quality of embeddings. Additionally, our method outperforms state-of-the-art approaches in terms of generalization, particularly regarding the mean reciprocal rank achieved on validation data. Finally, we show that even a simple greedy search can effectively discover optimal or near-optimal parameters for the algebra.
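To make the notion of a nilpotent base vector concrete, the following minimal sketch implements dual numbers, the simplest algebra with one base vector that squares to zero. It is an illustration only, not the embedding model from the chapter; the class name `Dual` and all variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Dual number a + b*eps, where the base vector eps is nilpotent (eps * eps == 0)."""
    real: float
    eps: float

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.real + other.real, self.eps + other.eps)

    def __mul__(self, other: "Dual") -> "Dual":
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, because the eps*eps term vanishes.
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)


EPS = Dual(0.0, 1.0)
eps_squared = EPS * EPS                      # zero: nilpotency index two
shifted = Dual(1.0, 2.0) * Dual(1.0, 3.0)    # eps coefficients add: 2 + 3
```

When both real parts equal 1, multiplication adds the `eps` coefficients, which gives an intuition for how translations can be expressed through a nilpotent component.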



Datasets used throughout the evaluation and their features (number of entities, relations, and triples in each split).
Performance Evaluation of Knowledge Graph Embedding Approaches under Non-adversarial Attacks
  • Preprint
  • File available

July 2024

·

14 Reads

Knowledge Graph Embedding (KGE) transforms a discrete Knowledge Graph (KG) into a continuous vector space, facilitating its use in various AI-driven applications like Semantic Search, Question Answering, or Recommenders. While KGE approaches are effective in these applications, most existing approaches assume that all information in the given KG is correct. This enables attackers to influence the output of these approaches, e.g., by perturbing the input. Consequently, the robustness of such KGE approaches has to be addressed. Recent work focused on adversarial attacks. However, non-adversarial attacks on all attack surfaces of these approaches have not been thoroughly examined. We close this gap by evaluating the impact of non-adversarial attacks on the performance of 5 state-of-the-art KGE algorithms on 5 datasets with respect to attacks on 3 attack surfaces: graph, parameter, and label perturbation. Our evaluation results suggest that label perturbation has a strong effect on the KGE performance, followed by parameter perturbation with a moderate effect and graph perturbation with a low effect.


Interpretability Index Based on Balanced Volumes for Transparent Models and Agnostic Explainers

June 2024

·

10 Reads

·

6 Citations

We discuss interpretability and explainability of machine learning models. We introduce a universal interpretability index, J, to quantify and monitor the interpretability of a general-purpose model, which can be static or evolve incrementally from a data stream. The models can be transparent classifiers, predictors or controllers operating on partitions or granules of the data space, e.g., rule-based models, trees, probabilistic clustering models, modular or granular neural networks. Additionally, black boxes can be monitored after the derivation of a global or local surrogate model resulting from the application of a model-agnostic explainer. The index does not depend on the type of algorithm that creates or updates the model, i.e., supervised, unsupervised, semi-supervised, or reinforcement. While loss or error-related indices, validity measures, processing time, and closed-loop criteria have been used to evaluate model effectiveness across different fields, a general interpretability index in consonance with explainable AI does not exist. The index J is computed straightforwardly. It reflects the principle of justifiable granularity by taking into account balanced volumes, number of partitions and dependent parameters, and features per partition. The index advocates that a concise model founded on balanced partitions offers a higher level of interpretability. It facilitates comparisons between models, can be used as a term in loss functions or embedded into learning procedures to motivate equilibrium of volumes, and ultimately supports human-centered decision making. Computational experiments show the versatility of the index in a biomedical prediction problem from speech data, and in image classification.


EGNN-C+: Interpretable Evolving Granular Neural Network and Application in Classification of Weakly-Supervised EEG Data Streams

May 2024

·

24 Reads

·

4 Citations

We introduce a modified incremental learning algorithm for evolving Granular Neural Network Classifiers (eGNN-C+). We use double-boundary hyper-boxes to represent granules, and customize the adaptation procedures to enhance the robustness of outer boxes for data coverage and noise suppression, while ensuring that inner boxes remain flexible to capture drifts. The classifier evolves from scratch, incorporates new classes on the fly, and performs local incremental feature weighting. As an application, we focus on the classification of emotion-related patterns within electroencephalogram (EEG) signals. Emotion recognition is crucial for enhancing the realism and interactivity of computer systems. The challenge lies exactly in developing high-performance algorithms capable of effectively managing individual differences and non-stationarities in physiological data without relying on subject-specific information. We extract features from the Fourier spectrum of EEG signals obtained from 28 individuals engaged in playing computer games (a public dataset). Each game elicits a different predominant emotion: boredom, calmness, horror, or joy. We analyze individual electrodes, time window lengths, and frequency bands to assess the accuracy and interpretability of resulting user-independent neural models. The findings indicate that both brain hemispheres assist classification, especially electrodes on the temporal (T8) and parietal (P7) areas, alongside contributions from frontal and occipital electrodes. While patterns may manifest in any band, the Alpha (8-13 Hz), Delta (1-4 Hz), and Theta (4-8 Hz) bands, in this order, exhibited higher correspondence with the emotion classes. The eGNN-C+ demonstrates effectiveness in learning EEG data. It achieves an accuracy of 81.7% and a 0.0029 II interpretability using 10-second time windows, even in the face of a highly-stochastic time-varying 4-class classification problem.
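The band-based features mentioned in the abstract can be illustrated with a small sketch that sums spectral power inside the Delta, Theta, and Alpha ranges. This is not the authors' feature extraction pipeline: the function name `band_powers`, the half-open band boundaries, the naive DFT, and the synthetic 64 Hz test signal are all assumptions made for illustration.

```python
import cmath
import math

# Band limits in Hz as quoted in the abstract; treating them as half-open
# intervals [low, high) is an assumption of this sketch.
BANDS = {"delta": (1.0, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0)}


def band_powers(signal, sampling_rate):
    """Return the summed spectral power per band, using a naive O(n^2) DFT."""
    n = len(signal)
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2):
        freq = k * sampling_rate / n  # frequency of the k-th DFT bin in Hz
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name] += abs(coeff) ** 2
    return powers


# Synthetic stand-in for an EEG window: 2 seconds of a 10 Hz sine at 64 Hz.
RATE = 64
demo = [math.sin(2 * math.pi * 10 * t / RATE) for t in range(2 * RATE)]
demo_powers = band_powers(demo, RATE)
dominant = max(demo_powers, key=demo_powers.get)
```

For real recordings, a fast Fourier transform from a numerical library would replace the naive DFT; the loop above only keeps the example dependency-free.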


Computing Repairs Under Functional and Inclusion Dependencies via Argumentation

March 2024

·

3 Reads

Lecture Notes in Computer Science

We discover a connection between finding subset-maximal repairs for sets of functional and inclusion dependencies, and computing extensions within argumentation frameworks (AFs). We study the complexity of existence of a repair, and deciding whether a given tuple belongs to some (or every) repair, by simulating the instances of these problems via AFs. We prove that subset-maximal repairs under functional dependencies correspond to the naive extensions, which also coincide with the preferred and stable extensions in the resulting AFs. For inclusion dependencies one needs a pre-processing step on the resulting AFs in order for the extensions to coincide. Allowing both types of dependencies breaks this relationship between extensions and only preferred semantics captures the repairs. Finally, we establish that the complexities of the above decision problems are $\mathbf{NP}$-complete and $\mathbf{\Pi}^{\mathbf{P}}_2$-complete, when both functional and inclusion dependencies are allowed.
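The stated correspondence for functional dependencies can be sketched on a toy relation: if each tuple is read as an argument and two tuples attack each other whenever they jointly violate a dependency, the naive extensions are the maximal conflict-free sets, which are the subset-maximal repairs. The relation, the dependency name -> dept, and all function names below are hypothetical, and the brute-force enumeration is only an illustration, not the construction from the paper.

```python
from itertools import combinations

# Toy relation Employee(name, dept); the dependency name -> dept is assumed.
TUPLES = [("ann", "sales"), ("ann", "hr"), ("bob", "it")]


def violates_fd(t1, t2):
    """Two tuples conflict if they agree on name but differ on dept."""
    return t1[0] == t2[0] and t1[1] != t2[1]


def conflict_free(subset):
    """True if no pair of tuples in the subset violates the dependency."""
    return not any(violates_fd(a, b) for a, b in combinations(subset, 2))


def subset_maximal_repairs(tuples):
    """Brute-force enumeration of maximal conflict-free subsets (naive extensions)."""
    candidates = [frozenset(c)
                  for size in range(len(tuples) + 1)
                  for c in combinations(tuples, size)
                  if conflict_free(c)]
    # Keep only the sets that are not strictly contained in another candidate.
    return [s for s in candidates if not any(s < other for other in candidates)]


repairs = subset_maximal_repairs(TUPLES)
```

Here the two conflicting tuples for "ann" end up in different repairs, while the unaffected tuple for "bob" belongs to every repair.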


Fig. 1: eGNN-C+: Granular neural network with evolving structure and parameters for classification of data streams. A similarity vector, denoted as $x^i[h] = [x^i_1 \ldots x^i_n]'$, arises from the matching between the instance $x[h] = [x_1 \ldots x_n]'$ and the granule $G^i = \{G^i_1, \ldots, G^i_n\}$. The core of $G^i_j$ is the interval [g i
Fig. 2: The bi-dimensional Arousal-Valence model: a framework to describe and categorize human emotions
Fig. 3: Examples of spectra and bands obtained from raw data generated by four frontal electrodes
Fig. 5: Evolution of the granular structure and performance of the best, user-independent, generalized eGNN-C+ model
EGNN-C+: Interpretable Evolving Granular Neural Network and Application in Classification of Weakly-Supervised EEG Data Streams

February 2024

·

132 Reads

We introduce a modified incremental learning algorithm for evolving Granular Neural Network Classifiers (eGNN-C+). We use double-boundary hyper-boxes to represent granules, and customize the adaptation procedures to enhance the robustness of outer boxes for data coverage and noise suppression, while ensuring that inner boxes remain flexible to capture drifts. The classifier evolves from scratch, incorporates new classes on the fly, and performs local incremental feature weighting. As an application, we focus on the classification of emotion-related patterns within electroencephalogram (EEG) signals. Emotion recognition is crucial for enhancing the realism and interactivity of computer systems. The challenge lies exactly in developing high-performance algorithms capable of effectively managing individual differences and non-stationarities in physiological data without relying on subject-specific calibration data. We extract features from the Fourier spectrum of EEG signals obtained from 28 individuals engaged in playing computer games (a public dataset). Each game elicits a different predominant emotion: boredom, calmness, horror, or joy. We analyze individual electrodes, time window lengths, and frequency bands to assess the accuracy and interpretability of resulting user-independent neural models. The findings indicate that both brain hemispheres assist classification, especially electrodes on the temporal (T8) and parietal (P7) areas, alongside contributions from frontal and occipital electrodes. While patterns may manifest in any band, the Alpha (8-13 Hz), Delta (1-4 Hz), and Theta (4-8 Hz) bands, in this order, exhibited higher correspondence with the emotion classes. The eGNN-C+ demonstrates effectiveness in learning EEG data. It achieves an accuracy of 81.7% and a 0.0029 II interpretability using 10-second time windows, even in the face of a highly-stochastic time-varying 4-class classification problem.


Chapter 13. Class Expression Learning with Multiple Representations

July 2023

·

17 Reads

Knowledge bases are now first-class citizens of the Web. Circa 50% of the 3.2 billion websites in the 2022 crawl of Web Data Commons contain knowledge base fragments in RDF. The 82 billion assertions known to exist in these websites are complemented by a roughly comparable number of triples available in dumps. As this data is now the backbone of a number of applications, it stands to reason that machine learning approaches able to exploit the explicit semantics exposed by RDF knowledge bases must scale to large knowledge bases. In this chapter, we present approaches based on continuous and symbolic representations that aim to achieve this goal by addressing some of the main scalability bottlenecks of existing class expression learning approaches. While we focus on the description logic ALC, the approaches we present are far from being limited to this particular expressiveness.


LauNuts: A Knowledge Graph to Identify and Compare Geographic Regions in the European Union

May 2023

·

6 Reads

·

1 Citation

Lecture Notes in Computer Science

The Nomenclature of Territorial Units for Statistics (NUTS) is a classification that represents countries in the European Union (EU). It is published at intervals of several years and organized in a hierarchical system where geographical areas are subdivided according to their population sizes. In addition to NUTS, there is a further subdivided hierarchy level, named Local Administrative Units (LAU), whose data are updated annually by EU member states. While both datasets are published by Eurostat as Excel files, an additional RDF dataset is available for NUTS up to the 2016 scheme. With this work, we provide the Linked Data community with an up-to-date Knowledge Graph in which NUTS and LAU data are linked and which contains population numbers as well as area sizes. We also publish an Open Source generator software for future released versions that will naturally arise due to changes in population numbers. These contributions can be used to enrich other datasets and allow comparisons among regions in the European Union. All resources are available at https://w3id.org/launuts.

Keywords: EU, European Union, Eurostat, Knowledge Graph, LAU, LauNuts, Linked Data, NUTS


RELD: A Knowledge Graph of Relation Extraction Datasets

May 2023

·

28 Reads

·

1 Citation

Lecture Notes in Computer Science

Relation extraction plays an important role in natural language processing. There is a wide range of available datasets that benchmark existing relation extraction approaches. However, most benchmarking datasets are provided in different formats containing specific annotation rules, thus making it difficult to conduct experiments on different types of relation extraction approaches. We present RELD, an RDF knowledge graph of eight open-licensed and publicly available relation extraction datasets. We modeled the benchmarking datasets into a single ontology that provides a unified format for data access, along with annotations required for training different types of relation extraction systems. Moreover, RELD abides by the Linked Data principles. To the best of our knowledge, RELD is the largest RDF knowledge graph of entities and relations from text, containing approximately 1230 million triples describing 1034 relations, 2 million sentences, 3 million abstracts and 4013 documents. RELD contributes to a variety of uses in the natural language processing community, and distinctly provides unified and easy modeling of data for benchmarking relation extraction and named entity recognition models.

Keywords: Knowledge graph, Relation extraction, benchmarks, Natural language processing, ontology, RDF


Citations (6)


... Toward the development of computational models based on voice and speech data for BD monitoring, the two most fundamental aspects are prediction accuracy and model interpretability or explainability [8] [9]. Practically speaking, interpretability and explainability are tied or very close concepts, often used interchangeably [10]. ...

Reference:

Incremental Learning and Granular Computing from Evolving Data Streams: An Application to Speech-based Bipolar Disorder Diagnosis
Interpretability Index Based on Balanced Volumes for Transparent Models and Agnostic Explainers
  • Citing Conference Paper
  • June 2024

... Depending on the type of data and methods employed, various XAI methods have been proposed in the literature, and they have been effectively used in the medical domain [5]. Some examples are: feature importance techniques such as SHAP (SHapley Additive exPlanations) [6] and LIME (Local Interpretable Model-agnostic Explanations) [7], Counterfactual Explanations [8], Layer-wise Relevance Propagation (LRP) [9], Rule-based Models [10,11,12,13], Attention Mechanisms [14], Surrogate Models [15,16,17]. ...

EGNN-C+: Interpretable Evolving Granular Neural Network and Application in Classification of Weakly-Supervised EEG Data Streams
  • Citing Conference Paper
  • May 2024

... A further focus is put on technologies which enhance explainability of AI models, thus facilitating human understanding [7,8,9]. Approaches which make such models more trustworthy can rely on probing with semantic adversarials which might occur in real life [1], and in a better understanding how to identify misuse and unwanted effects such as hate speech [3] and how this capability changes based on historical context [4]. ...

Neural Class Expression Synthesis
  • Citing Chapter
  • May 2023

Lecture Notes in Computer Science

... GADM is published as Linked Data, named GADM-RDF [20]. NUTS, the Nomenclature of Units for Territorial Statistics, provides geospatial regions in the European Union as Linked Data for statistical and policy purposes [21]. Table 2. Geospatial data sources and related statistics. ...

LauNuts: A Knowledge Graph to Identify and Compare Geographic Regions in the European Union
  • Citing Chapter
  • May 2023

Lecture Notes in Computer Science

... Newer systems must handle more volume and variety of knowledge. Finally, improved multilingual capabilities are urgently needed to increase the accessibility of KGQA systems to users around the world [23]. In the face of these requirements, we introduce QALD-10 as the newest successor of the Question Answering over Linked Data (QALD) benchmark series to facilitate the standardized evaluation of KGQA approaches. ...

Enhancing the Accessibility of Knowledge Graph Question Answering Systems through Multilingualization
  • Citing Conference Paper
  • January 2022

... Thus, the significant differences between the federated and the centralized runtimes were not due to the Bgee data version. Mainly because in our experimental setup, we do not consider any cost-based federated query processing engines [46], and we expected that the performance of federated queries would be significantly worse than the same centralized query, notably, due to network latency and poorer query optimization plan of federated queries. ...

An empirical evaluation of cost-based federated SPARQL query processing engines

Semantic Web