David J. Tena Cucala’s research while affiliated with Royal Holloway University of London and other places


Publications (7)


Bridging Max Graph Neural Networks and Datalog with Negation
  • Conference Paper

November 2024 · 4 Reads

David J. Tena Cucala · Bernardo Cuenca Grau

We consider a general class of data transformations based on Graph Neural Networks (GNNs), which can be used for a wide variety of tasks. An important question in this setting is to characterise the expressive power of these transformations in terms of a suitable logic-based language. From a practical perspective, the correspondence of a GNN with a logical theory can be exploited for explaining the model's predictions symbolically. In this paper, we introduce a broad family of GNN-based transformations which can be characterised using Datalog programs with negation-as-failure, which can be computed from the GNNs after training. This generalises existing approaches based on positive programs by enabling the learning of nonmonotonic transformations. We show empirically that these GNNs offer good performance for knowledge graph completion tasks, and that we can efficiently extract programs for explaining individual predictions.
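To make the GNN-to-Datalog correspondence concrete, the following is a minimal sketch, not the paper's actual extraction procedure: a one-layer max-aggregation GNN with a step activation whose output, on Boolean input features, coincides with a single Datalog rule that uses negation-as-failure. The graph, feature layout, and threshold are hypothetical and hand-picked for illustration.

```python
# A minimal sketch, NOT the paper's extraction procedure: a one-layer
# max-aggregation GNN with a step activation whose output, on Boolean input
# features, coincides with the Datalog rule with negation-as-failure
#
#     Warm(x) :- edge(x, y), Heater(y), not Insulated(x).
#
# The feature layout and the threshold are hand-picked for illustration.

HEATER, INSULATED = 0, 1  # positions of the unary predicates in the feature vector

def max_gnn_layer(nodes, edges, feat):
    """out(x) = step( max_y feat[y][HEATER] - feat[x][INSULATED] - 0.5 )."""
    out = {}
    for x in nodes:
        agg = max((feat[y][HEATER] for y in edges.get(x, [])), default=0.0)
        out[x] = 1.0 if agg - feat[x][INSULATED] - 0.5 >= 0.0 else 0.0
    return out

def datalog_rule(nodes, edges, feat):
    """Direct evaluation of  Warm(x) :- edge(x, y), Heater(y), not Insulated(x)."""
    return {x: 1.0 if any(feat[y][HEATER] == 1.0 for y in edges.get(x, []))
                      and feat[x][INSULATED] == 0.0 else 0.0
            for x in nodes}

nodes = ["a", "b", "c"]
edges = {"a": ["b"], "c": ["b"]}                       # a -> b, c -> b
feat = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}

assert max_gnn_layer(nodes, edges, feat) == datalog_rule(nodes, edges, feat)
print(max_gnn_layer(nodes, edges, feat))               # {'a': 1.0, 'b': 0.0, 'c': 0.0}
```

The nonmonotonicity sits in the negative coefficient on the Insulated component: adding Insulated(a) to the input retracts Warm(a), a behaviour that no positive program can express.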


Relational Graph Convolutional Networks Do Not Learn Sound Rules

November 2024

Graph neural networks (GNNs) are frequently used to predict missing facts in knowledge graphs (KGs). Motivated by the lack of explainability for the outputs of these models, recent work has aimed to explain their predictions using Datalog, a widely used logic-based formalism. However, such work has been restricted to certain subclasses of GNNs. In this paper, we consider one of the most popular GNN architectures for KGs, R-GCN, and we provide two methods to extract rules that explain its predictions and are sound, in the sense that each fact derived by the rules is also predicted by the GNN, for any input dataset. Furthermore, we provide a method that can verify that certain classes of Datalog rules are not sound for the R-GCN. In our experiments, we train R-GCNs on KG completion benchmarks, and we are able to verify that no Datalog rule is sound for these models, even though the models often obtain high to near-perfect accuracy. This raises some concerns about the ability of R-GCN models to generalise and about the explainability of their predictions. We further provide two variations to the training paradigm of R-GCN that encourage it to learn sound rules and find a trade-off between model accuracy and the number of learned sound rules.
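The soundness notion used here (every fact derived by a rule is also predicted by the model, for any input dataset) can be contrasted with naive random testing. Below is a toy falsification harness with a hypothetical stand-in for the trained R-GCN's thresholded predictions; finding a counterexample refutes soundness, but exhausting the trials proves nothing, which is exactly why the paper's formal verification methods are needed.

```python
# A toy falsification harness for the soundness notion above, with a
# hypothetical stand-in for the trained R-GCN's thresholded predictions.
# Random testing can only refute soundness; the paper's methods verify it
# formally, for ALL input datasets, which no amount of sampling can do.

import random

CONSTANTS = ["a", "b", "c"]

def rule_derivations(dataset):
    """Facts derived by the candidate rule  s(x, y) :- r(x, y)."""
    return {("s", x, y) for (p, x, y) in dataset if p == "r"}

def predict(dataset, fact):
    """Hypothetical stand-in for the model: predicts s(x, y) whenever
    r(x, y) is present, except when x == 'c' (an arbitrary quirk)."""
    p, x, y = fact
    return p == "s" and ("r", x, y) in dataset and x != "c"

def find_counterexample(trials=1000, seed=0):
    rng = random.Random(seed)
    candidates = [("r", x, y) for x in CONSTANTS for y in CONSTANTS]
    for _ in range(trials):
        dataset = {f for f in candidates if rng.random() < 0.5}
        for fact in rule_derivations(dataset):
            if not predict(dataset, fact):
                return dataset, fact       # rule fires, model does not predict
    return None                            # nothing found; proves nothing

print(find_counterexample())
```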


Table: Results for the monotonic LogInfer datasets. Loss is on the training set. %UB, %Stable, and %Inc are the percentages of unbounded, stable, and increasing channels, respectively; %SO is the percentage of LogInfer rules that are sound for the GNN, and %NG is the percentage of monotonic LogInfer rules that are not sound for the GNN because some grounding of the body does not entail the head. #1B and #2B are the numbers of sound rules with one and two body atoms, respectively.
Table: Results for the benchmark datasets. AUPRC is on the validation set and Loss on the training set. %UB, %Stable, %Inc, and %Safe are the percentages of unbounded, stable, increasing, and safe channels, respectively. #1B and #2B are the numbers of sound rules with one and two body atoms, respectively.
Relational Graph Convolutional Networks Do Not Learn Sound Rules
  • Preprint
  • File available

August 2024 · 5 Reads

Abstract identical to the version of this paper listed above.


The Stable Model Semantics of Datalog with Metric Temporal Operators

August 2023 · 53 Reads

Theory and Practice of Logic Programming

We introduce negation under the stable model semantics in DatalogMTL, a temporal extension of Datalog with metric temporal operators. As a result, we obtain a rule language which combines the power of answer set programming with the temporal dimension provided by metric operators. We show that, in this setting, reasoning becomes undecidable over the rational timeline, and decidable in EXPSPACE in data complexity over the integer timeline. We also show that, if we restrict our attention to forward-propagating programs, reasoning over the integer timeline becomes PSPACE-complete in data complexity, and hence, no harder than over positive programs; however, reasoning over the rational timeline in this fragment remains undecidable.
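As an illustration of the kind of program involved, here is a minimal sketch of forward evaluation over a bounded window of the integer timeline, with hypothetical rules and the simplifying assumption that the program is stratified, so that the stable model is unique and can be computed stratum by stratum; the paper's setting is more general and, in particular, covers unstratified negation.

```python
# A minimal sketch with hypothetical rules, assuming a STRATIFIED program so
# that the stable model is unique and computable stratum by stratum; the
# paper's setting is more general and covers unstratified negation.
#
#   HighTemp(x) @ t  <-  Reading(x) @ t
#   Alert(x)    @ t  <-  boxminus[1,2] HighTemp(x) @ t,  not Ack(x) @ t
#
# where boxminus[1,2] P holds at t iff P holds at every instant in [t-2, t-1].

TIMELINE = range(0, 10)   # a bounded window of the integer timeline
facts = {("Reading", "a", t) for t in (1, 2, 3, 4)} | {("Ack", "a", 3)}

def holds(interp, pred, const, t):
    return (pred, const, t) in interp

interp = set(facts)

# Stratum 1: HighTemp is a positive consequence of Reading.
for t in TIMELINE:
    if holds(interp, "Reading", "a", t):
        interp.add(("HighTemp", "a", t))

# Stratum 2: Alert reads boxminus over HighTemp and negation over Ack, both
# of which are already fully determined by the lower stratum.
for t in TIMELINE:
    box = all(holds(interp, "HighTemp", "a", s) for s in range(t - 2, t))
    if box and not holds(interp, "Ack", "a", t):
        interp.add(("Alert", "a", t))

print(sorted(f for f in interp if f[0] == "Alert"))  # Alert at t = 4 and t = 5
```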


The Stable Model Semantics of Datalog with Metric Temporal Operators

June 2023 · 13 Reads

Abstract identical to the journal version listed above; the preprint additionally notes that the paper was under consideration in Theory and Practice of Logic Programming (TPLP).


Fig. 1: Representation of tree-CQ q(x) and KG K with completion K* from Example 1; the single completion fact is drawn as a dashed line, and constants are shown by their first letter.
Fig. 2: Representation of ψ, K, and the augmentation R_q(K) from Example 2.
Table: Benchmark statistics, where ||q|| and h(q) are the number of atoms and the height of the tree-CQ, and 'pos./neg.' stands for the number of positive/negative examples. Columns: Benchmark, |Pred|, ||q||/h(q), train pos./neg., test pos./neg.
GNNQ: A Neuro-Symbolic Approach to Query Answering over Incomplete Knowledge Graphs

October 2022 · 14 Reads · 2 Citations

Lecture Notes in Computer Science

Real-world knowledge graphs (KGs) are usually incomplete, that is, they miss some facts representing valid information. So, when applied to such KGs, standard symbolic query engines fail to produce answers that are expected but not logically entailed by the KGs. To overcome this issue, state-of-the-art ML-based approaches first embed KGs and queries into a low-dimensional vector space, and then produce query answers based on the proximity of the candidate entity and the query embeddings in the embedding space. This allows embedding-based approaches to obtain expected answers that are not logically entailed. However, embedding-based approaches are not applicable in the inductive setting, where KG entities (i.e., constants) seen at runtime may differ from those seen during training. In this paper, we propose a novel neuro-symbolic approach to query answering over incomplete KGs applicable in the inductive setting. Our approach first symbolically augments the input KG with facts representing parts of the KG that match query fragments, and then applies a generalisation of the Relational Graph Convolutional Networks (RGCNs) to the augmented KG to produce the predicted query answers. We formally prove that, under reasonable assumptions, our approach can capture an approach based on vanilla RGCNs (and no KG augmentation) using an (often substantially) smaller number of layers. Finally, we empirically validate our theoretical findings by evaluating an implementation of our approach against the RGCN baseline on several dedicated benchmarks.

Keywords: Query answering · Knowledge graphs · Graph neural networks · Neuro-symbolic AI
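The augmentation step can be illustrated on the simplest case of a chain-shaped tree-CQ. The sketch below marks, with hypothetical auxiliary predicates, the entities from which each suffix of the chain matches symbolically; GNNQ's actual augmentation and its RGCN generalisation are more involved.

```python
# A minimal sketch of the augmentation idea for a chain-shaped tree-CQ
#     q(x) :- r1(x, y), r2(y, z).
# Predicate names, the toy KG, and the Match_i marking scheme are hypothetical;
# GNNQ's actual augmentation and its RGCN generalisation are more involved.

kg = {("r1", "a", "b"), ("r2", "b", "c"), ("r1", "d", "e")}  # (relation, head, tail)

def augment(kg, chain):
    """Add Match_i(v) facts: Match_i(v) iff the suffix chain[i:] matches from v."""
    aux = set()
    entities = {h for (_, h, _) in kg} | {t for (_, _, t) in kg}
    matched = entities                       # every entity matches the empty suffix
    for i in range(len(chain) - 1, -1, -1):
        matched = {h for (r, h, t) in kg if r == chain[i] and t in matched}
        aux |= {(f"Match_{i}", v) for v in matched}
    return kg | aux

augmented = augment(kg, ["r1", "r2"])
print(sorted(f for f in augmented if f[0].startswith("Match")))
# [('Match_0', 'a'), ('Match_1', 'b')]: 'a' is a full symbolic match of q;
# the GNN on the augmented graph can still score near-misses such as 'd'.
```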


Faithful Approaches to Rule Learning

July 2022 · 6 Reads · 6 Citations

Rule learning involves developing machine learning models that can be applied to a set of logical facts to predict additional facts, together with methods for extracting from the learned model a set of logical rules that symbolically explain the model's predictions. Existing approaches, however, do not formally describe the relationship between the model's predictions and the derivations of the extracted rules; rather, it is often claimed without justification that the extracted rules 'approximate' or 'explain' the model, and rule quality is evaluated by manual inspection. In this paper, we study the formal properties of Neural-LP, a prominent rule learning approach. We show that the rules extracted from Neural-LP models can be both unsound and incomplete: on the same input dataset, the extracted rules can derive facts not predicted by the model, and the model can make predictions not derived by the extracted rules. We also propose a modification to the Neural-LP model that ensures that the extracted rules are always sound and complete. Finally, we show that, on several prominent benchmarks, the classification performance of our modified model is comparable to that of the standard Neural-LP model. Thus, faithful rule learning is feasible from both a theoretical and a practical point of view.
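The two failure modes identified here, unsoundness and incompleteness, amount to set differences between rule derivations and model predictions on a given dataset. The sketch below makes this concrete with a hypothetical stand-in for Neural-LP inference; the paper's guarantees quantify over all datasets, which a single comparison cannot establish.

```python
# A small sketch of the faithfulness criterion: on one fixed dataset, compare
# the facts derived by the extracted rules with the facts the model predicts.
# `model_predictions` is a hypothetical stand-in for Neural-LP inference; the
# paper's guarantees quantify over ALL datasets, not just one.

def rule_derivations(dataset, rules):
    """Apply each extracted rule  head(x, y) :- body(x, y)  once."""
    return {(head, x, y) for (head, body) in rules for (p, x, y) in dataset if p == body}

def model_predictions(dataset):
    """Hypothetical model output on this dataset."""
    return {("s", "a", "b"), ("s", "c", "d")}

dataset = {("r", "a", "b"), ("r", "e", "f")}
rules = [("s", "r")]                         # extracted rule: s(x, y) :- r(x, y)

derived = rule_derivations(dataset, rules)
predicted = model_predictions(dataset)

print("unsound   :", derived - predicted)    # rule derives, model does not predict
print("incomplete:", predicted - derived)    # model predicts, rules do not derive
```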

Citations (2)


... Graph Neural Networks (GNNs) (Scarselli et al. 2008; Liu and Zhou 2020; Hamilton 2021) are a popular family of machine learning formalisms, which operate directly on graph-structured data. Systems based on GNNs have achieved remarkable success in applications to areas as diverse as biology (Fout et al. 2017), chemistry (Reiser et al. 2022), recommender systems (Ying et al. 2018), and data management (Schlichtkrull et al. 2018; Pflueger, Tena Cucala, and Kostylev 2022). Despite these successes, it is now well understood that GNNs have limitations, and seemingly insignificant differences between GNN variants may lead to dramatic changes in performance on various tasks. ...

Reference:

Recurrent Graph Neural Networks and Their Connections to Bisimulation and Logic
GNNQ: A Neuro-Symbolic Approach to Query Answering over Incomplete Knowledge Graphs

Lecture Notes in Computer Science

... The canonical transformation $T_N$ induced by a max GNN can only derive new unary facts, while KG completion requires also the derivation of binary facts. To address this limitation, we use an alternative encoding/decoding scheme (Liu et al. 2021; Tena Cucala et al. 2022) where binary facts are also encoded in feature vector components and edges in the graph correspond to different types of possible joins between unary and binary atoms. As shown in (Tena Cucala et al. 2023), such a scheme can be captured by fixed encoding and decoding programs $P_{\mathrm{enc}}$ and $P_{\mathrm{dec}}$, so that the overall transformation is given by $T_{P_{\mathrm{dec}}}(T_N(T_{P_{\mathrm{enc}}}(D)))$, which in turn coincides with $T_{P_{\mathrm{dec}}}(T_{P_N}^{L+2}(T_{P_{\mathrm{enc}}}(D)))$ by Theorem 7. Benchmarks, metrics, and baselines: We used the inductive KG completion benchmarks by Teru, Denis, and Hamilton (2020), based on the FB15K-237 (Bordes et al. 2013), NELL-995 (Xiong, Hoang, and Wang 2017), and WN18RR (Dettmers et al. 2018) KGs. ...

Faithful Approaches to Rule Learning
  • Citing Conference Paper
  • July 2022