ExCut: Explainable Embedding-based Clustering over
Knowledge Graphs
Mohamed H. Gad-Elrab1,2, Daria Stepanova2, Trung-Kien Tran2, Heike Adel2, and
Gerhard Weikum1
1 Max-Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
{gadelrab,weikum}@mpi-inf.mpg.de
2Bosch Center for Artificial Intelligence, Renningen, Germany
{firstname.lastname}@de.bosch.com
Abstract. Clustering entities over knowledge graphs (KGs) is an asset for ex-
plorative search and knowledge discovery. KG embeddings have been intensively
investigated, mostly for KG completion, and have potential also for entity cluster-
ing. However, embeddings are latent and do not convey user-interpretable labels
for clusters. This work presents ExCut, a novel approach that combines KG em-
beddings with rule mining methods, to compute informative clusters of entities
along with comprehensible explanations. The explanations are in the form of con-
cise combinations of entity relations. ExCut jointly enhances the quality of entity
clusters and their explanations, in an iterative manner that interleaves the learning
of embeddings and rules. Experiments on real-world KGs demonstrate the effec-
tiveness of ExCut for discovering high-quality clusters and their explanations.
1 Introduction
Motivation. Knowledge graphs (KGs) are collections of triples of the form ⟨subject predicate object⟩ used for important tasks such as entity search, question answering and text analytics, by providing rich repositories of typed entities and associated properties.
For example, Tedros Adhanom is known as a health expert, director of the World Health Organization (WHO), alumnus of the University of London, and more.
KGs can support analysts in exploring sets of interrelated entities and discovering
interesting structures. This can be facilitated by entity clustering, using unsupervised
methods for grouping entities into informative subsets. Consider, for example, an ana-
lyst or journalist who works on a large corpus of topically relevant documents, say on
the Coronavirus crisis. Assume that key entities in this collection have been spotted and
linked to the KG already. Then the KG can guide the user in understanding what kinds
of entities are most relevant. With thousands of input entities, from health experts, geo-
locations, political decision-makers all the way to diseases, drugs, and vaccinations, the
user is likely overwhelmed and would appreciate a group-wise organization. This task
of computing entity clusters [4,6,16] is the problem we address.
Merely clustering the entity set is insufficient, though. The user also needs to un-
derstand the nature of each cluster. In other words, clusters must be explainable, in
the form of user-comprehensible labels. As entities have types in the KG, an obvi-
ous solution is to label each cluster with its prevalent entity type. However, some KGs
have only coarse-grained types and labels like “people” or “diseases” cannot distinguish
health experts from politicians or virus diseases from bacterial infections. Switching to
fine-grained types, such as Wikipedia categories, on the other hand, causes the oppo-
site problem: each entity is associated with tens or hundreds of types, and it is unclear
which of these would be a good cluster label. The same holds for an approach where
common SPO properties (e.g., educatedIn UK) are considered as labels. Moreover, once
we switch from a single KG to a set of linked open data (LOD) sources as a joint entity
repository, the situation becomes even more difficult.
Problem Statement. Given a large set of entities, each with a substantial set of KG
properties in the form of categorical values or relations to other entities, our problem
is to jointly tackle: (i) Clustering: group the entities into k clusters of semantically similar entities; (ii) Explanation: generate user-comprehensible, concise labels for the clusters, based on the entities' relations to other entities.
State-of-the-Art and its Limitations. The problem of clustering relational data is traditionally known as conceptual clustering (see, e.g., [25] for an overview). Recently, it has been adapted to KGs in the Semantic Web community [6,16]. Existing approaches aim at clustering the graph-structured data itself by, e.g., introducing novel notions of distance and similarity directly on the KG [4,5]. Due to the complexity of the data, finding such universally good similarity notions is challenging [5].
Moreover, existing relational learning approaches are not sufficiently scalable to handle large KGs with millions of facts, e.g., YAGO [26] and Wikidata [30]. Clustering entities represented in a latent space, e.g., [12,31], helps to overcome this challenge; yet the resulting clusters lack explanations, the clustering process is sensitive to the embedding quality, and hyperparameters are hard to tune [5]. Approaches for explaining clusters over KGs, such as [27,28], focus on the discovery of explanations for given, perfect clusters. However, obtaining such high-quality clusters in practice is not straightforward.
Approach. To address the above shortcomings, we present ExCut, a new method for
computing explainable clusters of large sets of entities. The method uses KG embedding
as a signal for finding plausible entity clusters, and combines it with logical rule mining,
over the available set of properties, to learn interpretable labels. The labels take the
form of concise conjunctions of relations that characterize the majority of entities in a
cluster. For example, for the above Coronavirus scenario, we aim at mining such labels
as worksFor (X,Y)type(Y,health org)hasDegreeIn(X,life sciences)for
a cluster of health experts, type(X, disease)causedBy(X, Y )type(Y, virus)
for a cluster of virus diseases, and more. A key point in our approach is that these labels
can in turn inform the entity embeddings, as they add salient information. Therefore,
we interleave and iterate the computation of embeddings and rule mining, adapting the embeddings using the information inferred by the learned rules as feedback.
Contributions. Our main contributions can be summarized as follows:
– We introduce ExCut, a novel approach for computing explainable clusters, which combines embedding-based clustering with symbolic rule learning to produce human-understandable explanations for the resulting clusters. These explanations can also serve as new types for entities.
– We propose several strategies to iteratively fine-tune the embedding model to maximize the explainability and accuracy of the discovered clusters based on the feedback from the learned explanations.
– We evaluate ExCut on real-world KGs. In many cases, it outperforms state-of-the-art methods w.r.t. both clustering and explanation quality.
– ExCut's implementation and resources are available at github.com/mhmgad/ExCut.
2 Preliminaries
Knowledge Graphs. KGs represent interlinked collections of factual information, encoded as a set of ⟨subject predicate object⟩ triples, e.g., ⟨tedros_adhanom directorOf WHO⟩. For simplicity, we write triples in predicate logic format, e.g., directorOf(tedros_adhanom, WHO). A signature of a KG G is Σ_G = ⟨P, E⟩, where P is a set of binary predicates and E is a set of entities, i.e., constants, in G.
KG Embeddings. KG embeddings aim at representing all entities and relations in a continuous vector space, usually as vectors or matrices called embeddings. Embeddings can be used to estimate the likelihood of a triple to be true via a scoring function f : E × P × E → ℝ. Concrete scoring functions are defined based on various vector space assumptions: (i) The translation-based assumption, e.g., TransE [1], embeds entities and relations as vectors and assumes v_s + v_r ≈ v_o for true triples, where v_s, v_r, v_o are the vector embeddings of the subject s, relation r and object o, respectively. (ii) The linear map assumption, e.g., ComplEx [29], embeds entities as vectors and relations as matrices. It assumes that for true triples, the linear mapping M_r of the subject embedding v_s is close to the object embedding v_o: v_s M_r ≈ v_o. The likelihood that these assumptions hold should be higher for triples in the KG than for those outside it. The learning process minimizes the error induced by the assumptions via the respective loss functions.
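To make the two assumptions concrete, below is a minimal sketch of the corresponding scoring functions. The embeddings, their dimensionality, and the random values are purely illustrative, and the linear-map score is a simplified real-valued stand-in for ComplEx's complex-valued formulation.

```python
# Illustrative scoring functions for the two vector-space assumptions.
import numpy as np

def transe_score(v_s, v_r, v_o):
    """Translation-based assumption: v_s + v_r ≈ v_o for true triples,
    so a smaller translation error yields a higher score."""
    return -np.linalg.norm(v_s + v_r - v_o)

def linear_map_score(v_s, M_r, v_o):
    """Linear-map assumption: the relation matrix M_r maps the subject
    embedding close to the object embedding, v_s M_r ≈ v_o."""
    return -np.linalg.norm(v_s @ M_r - v_o)

rng = np.random.default_rng(0)
v_s, v_r, v_o = rng.normal(size=(3, 4))   # toy 4-dimensional embeddings
M_r = rng.normal(size=(4, 4))
print(transe_score(v_s, v_r, v_o), linear_map_score(v_s, M_r, v_o))
```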
Rules. Let X be a set of variables. A rule r is an expression of the form head ← body, where head, or head(r), is an atom over P ∪ E ∪ X, and body, or body(r), is a conjunction of positive atoms over P ∪ E ∪ X. In this work, we are concerned with Horn rules, a subset of first-order logic rules with only positive atoms, in which every head variable appears at least once in the body atoms.
Example 1. An example of a rule over the KG in Fig. 1 is r : has(X, covid19) ← worksWith(X, Y), has(Y, covid19), stating that coworkers of individuals with a covid19 infection potentially also have covid19.
Execution of rules over KGs is defined in the standard way. More precisely, let G be a KG, r a rule over Σ_G, and a an atom. We write r ⊨_G a if there is a variable assignment that maps all atoms of body(r) into G and head(r) to the atom a. Rule-based inference is the process of applying a given rule r on G, which results in the extension G_r of G defined as G_r = G ∪ {a | r ⊨_G a}.
Example 2. Application of the rule r from Example 1 on the KG G from Fig. 1 results in r ⊨_G has(e2, covid19) and r ⊨_G has(e3, covid19). Hence, G_r = G ∪ {has(e2, covid19), has(e3, covid19)}.
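As a minimal sketch of this inference step, the snippet below encodes a fragment of Fig. 1 as a set of (subject, predicate, object) triples and hard-codes the rule r from Example 1; a general rule engine would instead search for variable assignments for arbitrary bodies.

```python
# Fragment of the KG G from Fig. 1 (an assumption for illustration).
G = {
    ("e1", "has", "covid19"),
    ("e2", "worksWith", "e1"),
    ("e3", "worksWith", "e1"),
}

def apply_rule(G):
    """Apply r: has(X, covid19) <- worksWith(X, Y), has(Y, covid19)
    and return the extension G_r = G ∪ {a | r ⊨_G a}."""
    inferred = {
        (x, "has", "covid19")
        for (x, p, y) in G
        if p == "worksWith" and (y, "has", "covid19") in G
    }
    return G | inferred

print(apply_rule(G) - G)
# {('e2', 'has', 'covid19'), ('e3', 'has', 'covid19')}, as in Example 2
```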
Fig. 1: An example KG with potential COVID-19 cases split into two entity clusters (in
green and red). Black edges are relevant for the potential explanations of these clusters.
3 Model for Computing Explainable Clusters
Given a KG, a subset of its entities and an integer k, our goal is to find a "good" split of the entities into k clusters and to compute explanations for the constructed groups that would serve as informative cluster labels. E.g., consider the KG in Fig. 1, the set of target entities {e1, ..., e6} and the integer k = 2. One of the possible solutions is to put e1–e3 into the first cluster C1 and the other three entities into the second cluster C2. An explanation for this split would be that C1 contains those who got infected via interacting with their coworkers, while the others were infected after visiting a risk area. Obviously, in general there are many other splits, and identifying the criteria for the best ones is challenging. Formally, we define the problem of computing explainable entity clusters as follows:
Definition 1 (Computing Explainable Entity Clusters Problem).
Given: (i) a knowledge graph G over Σ_G = ⟨P, E⟩; (ii) a set T ⊆ E of target entities; (iii) a number of desired clusters k > 1; (iv) an explanation language L; and (v) an explanation evaluation function d : 2^L × 2^T × G → [0, 1].
Find: a split C = {C1, ..., Ck} of the entities in T into k clusters and a set of explanations R = {r1, ..., rk} for them, where r_i ∈ L, such that d(R, C, G) is maximal.
3.1 Explanation Language
Explanations (i.e., informative labels) for clusters can be characterized as conjunctions
of common entity properties in a given cluster; for that Horn rules are sufficient. Thus,
our explanation language relies on (cluster) explanation rules defined as follows:
Definition 2 (Cluster Explanation Rules). Let G be a KG with the signature Σ_G = ⟨P, E⟩, let C ⊆ E be a subset of entities in G, i.e., a cluster, and let X be a set of variables. A (cluster) explanation rule r for C over G is of the form

r : belongsTo(X, e_C) ← p_1(X_1), ..., p_m(X_m),   (1)

where e_C ∉ E is a fresh unique entity representing the cluster C, belongsTo ∉ P is a fresh predicate, and body(r) is a finite set of atoms over P and X ∪ E.
Example 3. A possible explanation rule for C1 = {e1, e2, e3} in G from Fig. 1 is

r : belongsTo(X, e_C1) ← worksWith(X, Y), has(Y, covid19),
which describes C1 as a set of people working with infected colleagues.
Out of all possible cluster explanation rules we naturally prefer succinct ones.
Therefore, we put further restrictions on the explanation language L by limiting the number of rule body atoms (an adjustable parameter in our method).
3.2 Evaluation Function
The function d from Def. 1 compares solutions to the problem of explainable entity clustering w.r.t. their quality. Ideally, d should satisfy the following two criteria: (i) Coverage: Given two explanation rules for a cluster, the one covering more entities should be preferred; and (ii) Exclusiveness: Explanation rules for different clusters should be (approximately) mutually exclusive.
The coverage measure from data mining is a natural choice for satisfying (i).
Definition 3 (Explanation Rule Coverage). Let G be a KG, C a cluster of entities, and r a cluster explanation rule. The coverage of r on C w.r.t. G is

cover(r, C, G) = |{c ∈ C | r ⊨_G belongsTo(c, e_C)}| / |C|   (2)
Example 4. Consider the clusters C1 = {e1, e2, e3} and C2 = {e4, e5, e6} shown in Fig. 1. The set of potential cluster explanation rules, along with their coverage scores for C1 and C2 respectively, is given as follows:

r1 : belongsTo(X, e_Ci) ← type(X, covid19_case)                  1     1
r2 : belongsTo(X, e_Ci) ← gender(X, male)                        0.67  0.33
r3 : belongsTo(X, e_Ci) ← worksWith(X, Y), has(Y, covid19)       0.67  0
r4 : belongsTo(X, e_Ci) ← visited(X, Y), listedAs(Y, risk_area)  0     1
While addressing (i), the coverage measure does not account for criterion (ii). Indeed, high coverage of a rule for a given cluster does not imply low coverage for the other clusters. For instance, in Example 4, r1 is too general, as it perfectly covers entities from both clusters. This motivates us to favour (approximately) mutually exclusive explanation rules, i.e., explanation rules with high coverage for a given cluster but low coverage for the others (similar to [13]). To capture this intuition, we define the exclusive explanation coverage of a rule for a cluster given the other clusters as follows.
Definition 4 (Exclusive Explanation Rule Coverage). Let G be a KG, let C be the set of all clusters of interest, C ∈ C a cluster, and r an explanation rule. The exclusive explanation rule coverage of r for C w.r.t. C and G is defined as

exc(r, C, C, G) = 0, if min_{C′ ∈ C\{C}} { cover(r, C, G) − cover(r, C′, G) } ≤ 0;
exc(r, C, C, G) = cover(r, C, G) − ( Σ_{C′ ∈ C\{C}} cover(r, C′, G) ) / |C \ {C}|, otherwise.   (3)
Example 5. Consider C = {C1, C2} and R = {r1, r2, r3, r4} from Example 4 and the KG G from Fig. 1. We have exc(r1, C1, C, G) = exc(r1, C2, C, G) = 0, which disqualifies r1 as an explanation for either of the clusters. For r2, we have exc(r2, C1, C, G) = 0.34, making it less suitable for the cluster C1 than r3 with exc(r3, C1, C, G) = 0.67. Finally, r4 perfectly explains C2, since exc(r4, C2, C, G) = 1.
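The two measures are straightforward to compute once the set of entities a rule covers is known. The sketch below reproduces Examples 4-5; rules are represented abstractly by their covered entity sets, which in the full system come from rule-based inference.

```python
def cover(covered, cluster):
    """Definition 3: fraction of the cluster that the rule covers."""
    return len(covered & cluster) / len(cluster)

def exc(covered, cluster, all_clusters):
    """Definition 4: coverage of the cluster minus the average coverage
    of the other clusters, or 0 if some other cluster is covered at
    least as well."""
    others = [c for c in all_clusters if c is not cluster]
    diffs = [cover(covered, cluster) - cover(covered, c) for c in others]
    if min(diffs) <= 0:
        return 0.0
    return cover(covered, cluster) - sum(cover(covered, c) for c in others) / len(others)

C1, C2 = {"e1", "e2", "e3"}, {"e4", "e5", "e6"}
r3 = {"e1", "e2"}        # entities covered by worksWith(X,Y), has(Y,covid19)
r4 = {"e4", "e5", "e6"}  # entities covered by visited(X,Y), listedAs(Y,risk_area)
print(exc(r3, C1, [C1, C2]))  # 0.67, as in Example 5
print(exc(r4, C2, [C1, C2]))  # 1.0
```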
Fig. 2: ExCut pipeline overview: (1) Embedding Learning, (2) Clustering, (3) Rule Learning, (4) Rule-based Inference, and (5) Embedding Adaptation.
Similarly, we can measure the quality of a collection of clusters with their explana-
tions by averaging their per-cluster exclusive explanation rule coverage.
Definition 5 (Quality of Explainable Clusters). Let G be a KG, C = {C1, ..., Ck} a set of entity clusters, and R = {r1, ..., rk} a set of cluster explanation rules, where each r_i is an explanation for C_i, 1 ≤ i ≤ k. The explainable clustering quality q of R for C w.r.t. G is defined as follows:

q(R, C, G) = (1 / |C|) Σ_{i=1}^{|C|} exc(r_i, C_i, C, G)   (4)
Realizing the function d in Definition 1 by the above measure allows us to conveniently compare solutions of the explainable clusters discovery problem.
Example 6. Consider G from Fig. 1, the set of target entities T = {e1, ..., e6}, k = 2, the language L of cluster explanation rules with at most 2 body atoms, and the evaluation function d given as q from Def. 5. The best solution to the respective problem of computing explainable entity clusters is C = {C1, C2}, R = {r3, r4}, where C1, C2, r3, r4 are from Example 4. We have q(R, C, G) = 0.83.
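Definition 5 is then a simple average over the per-cluster scores. The sketch below reuses cover, exc, r3, r4, C1 and C2 from the previous sketch and reproduces the value from Example 6.

```python
def q(rules_covered, clusters):
    """Definition 5: average exclusive explanation coverage."""
    return sum(exc(cov_set, cl, clusters)
               for cov_set, cl in zip(rules_covered, clusters)) / len(clusters)

print(q([r3, r4], [C1, C2]))  # (0.67 + 1.0) / 2 ≈ 0.83
```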
4 Method
We now present our method ExCut, which iteratively utilizes KG Embedding-based
Clustering and Rule Learning to compute explainable clusters. More specifically, as
shown in Fig. 2, ExCut starts with (1) Embedding Learning for a given KG. Then,
it performs (2) Clustering of the entities in the target set over the learned embeddings.
Afterwards, (3) Rule Learning is utilized to induce explanation rules for the constructed
clusters, which are ranked based on the exclusive coverage measure. Using the learned
explanation rules, we perform (4) Rule-based Inference to deduce new entity-cluster
assignment triples reflecting the learned structural similarities among the target entities.
Then, ExCut uses the rules and the inferred assignment triples in constructing feedback
to guide the clustering in the subsequent iterations. We achieve that by fine-tuning the
embeddings of the target entities in Step (5) Embedding Adaptation.
In what follows we present the detailed description of ExCut’s components.
Fig. 3: KG fragments: (A) e1 and e2 share an explicit type; (B) types are missing, but e1 and e2 are both married to politicians with a covid19 symptom.
4.1 Embedding Learning and Clustering
Embedding Learning. ExCut starts with learning vector representations of entities
and relations. We adopt KG embeddings in this first step, as they are well-known for
their ability to capture semantic similarities among entities, and thus could be suited for
defining a robust similarity function for clustering relational data [5]. Embeddings are
also effective for dealing with data incompleteness, e.g., predicting the potentially miss-
ing fact worksWith(e1, e7) in Fig. 1. Moreover, embeddings facilitate the inclusion of
unstructured external sources during training, e.g., textual entity descriptions [33].
Conceptually, any embedding method can be used in our approach. We experi-
mented with TransE [1] and ComplEx [29] as prominent representatives of translation-
based and linear map embeddings. To account for the context surrounding the target
entities, we train embeddings using the whole KG.
Clustering. The Clustering step takes as input the trained embedding vectors of the target entities and the number k of clusters to be constructed. We perform clustering relying on the embeddings as features to compute pairwise distances among the target entities using standard distance functions, e.g., cosine distance. Various classical clustering approaches or more complex embedding-driven clustering techniques [31] could be exploited here too. In this paper, we rely on the traditional Kmeans method [17] as a proof of concept.
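A minimal sketch of this step is given below; entity_emb (a dict from entity to embedding vector) is an assumption. The vectors are L2-normalized so that Euclidean Kmeans behaves like clustering by cosine distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def cluster_targets(entity_emb, target_entities, k):
    """Cluster the target entities over their embedding vectors."""
    X = normalize(np.stack([entity_emb[e] for e in target_entities]))
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return {e: int(label) for e, label in zip(target_entities, labels)}
```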
For KGs with types, the majority of embedding models [1,29] would map entities of a certain type to similar vectors [31]. For example, e1 and e2 in Fig. 3.A are likely to be close to each other in the embedding space, and thus have a high chance of being clustered together. An ideal embedding model for explainable clustering should follow the same intuition even if types in the KG are missing. In other words, it should be capable of assigning similar vectors to entities that belong to structurally similar subgraphs of certain pre-specified complexity. For instance, in Fig. 3.B, both e1 and e2 belong to subgraphs reflecting that these entities are married to politicians with some covid19 symptom, and hence should be mapped to similar vectors.
Despite certain attempts to consider specific graph patterns (e.g., [15]), to the best of our knowledge none of the existing embedding models is general enough to capture patterns of arbitrary complexity. We propose to tackle this limitation (see Sec. 4.3) by passing to the embedding model feedback created using the cluster explanation rules learned in Step 3 of ExCut.
4.2 Explanation Mining
KG-based Explanations. KG embeddings and the respective clusters constructed in
Steps 1 and 2 of our method are not interpretable. However, since KG embeddings are
expected to preserve semantic similarities among entities, the clusters in the embedding
space should intuitively have some meaning. Motivated by this, in ExCut, we aim at
decoding these similarities by learning rules over the KG extended by the facts that
reflect the cluster assignments computed in the Clustering step.
Rule Learning Procedure. After augmenting G with the facts belongsTo(e, e_Ci) for all entities e clustered in C_i, we learn Horn rules of the form (1) from Def. 2. There are powerful rule-learning tools such as AMIE+ [8], AnyBURL [18], RLvLR [21,20] and RuDiK [22]. Nevertheless, we decided to develop our own rule learner so that we could have full control over our specific scoring functions and their integration into the learner's search strategy. Following [8], we model rules as sequences of atoms, where the first atom is the head of the rule (i.e., belongsTo(X, e_Ci) with C_i being the cluster to be explained), and the other atoms form the rule's body.
For each cluster C_i, we maintain an independent queue of intermediate rules, initialized with the single head atom belongsTo(X, e_Ci), and then exploit an iterative breadth-first search strategy. At every iteration, we expand the existing rules in the queue using the following refinement operators: (i) add a positive dangling atom: add a binary positive atom with one fresh variable and another variable appearing in the rule, i.e., a shared variable, e.g., adding worksAt(X, Y), where Y is a fresh variable not appearing in the current rule; (ii) add a positive instantiated atom: add a positive atom with one argument being a constant and the other one a shared variable, e.g., adding locatedIn(X, usa), where usa is a constant and X appears elsewhere in the rule constructed so far.
These operators produce a set of new rule candidates, which are then filtered relying on the given explanation language L. Suitable rules with a minimum coverage of 0.5, i.e., rules covering the majority of the respective cluster, are added to the output set. We refine the rules until the maximum length specified in the language bias is reached. Finally, we rank the constructed rules based on the exclusive explanation coverage (Def. 4) and select the top m rules for each cluster.
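A simplified sketch of this search for a single cluster is shown below. Rule evaluation is abstracted behind an assumed evaluate(body) callback returning the coverage and exclusive coverage of Defs. 3-4, and, for brevity, new atoms only share the head variable X, whereas the actual operators may share any variable of the rule.

```python
from collections import deque

def learn_rules(cluster_id, predicates, constants, max_body_atoms, evaluate, m):
    """Breadth-first search over rule bodies for cluster C_i."""
    head = ("belongsTo", "X", f"e_C{cluster_id}")
    queue, accepted = deque([[]]), []
    while queue:
        body = queue.popleft()
        if len(body) == max_body_atoms:   # language bias: maximum rule length
            continue
        fresh = f"Y{len(body) + 1}"
        # (i) dangling atoms with a fresh variable, (ii) instantiated atoms
        candidates = [body + [(p, "X", fresh)] for p in predicates]
        candidates += [body + [(p, "X", c)] for p in predicates for c in constants]
        for cand in candidates:
            coverage, exclusive = evaluate(cand)
            if coverage >= 0.5:           # keep rules covering the majority
                accepted.append((exclusive, head, cand))
            queue.append(cand)
    accepted.sort(key=lambda t: t[0], reverse=True)
    return accepted[:m]                   # top-m rules ranked by exc
```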
Example 7. Assume that for G in Fig. 1 and T = {e1, ..., e6}, the embedding-based clustering produced the clusters C1 = {e1, e2, e4} and C2 = {e5, e6, e3}, where e4 and e3 are placed in the wrong clusters. The top cluster explanation rules for C2, ranked by the exc measure from Def. 4, are:

r1 : belongsTo(X, e_C2) ← visited(X, Y)                          0.67
r2 : belongsTo(X, e_C2) ← gender(X, male)                        0.33
r3 : belongsTo(X, e_C2) ← visited(X, Y), listedAs(Y, risk_area)  0.33
Inferring Entity-Cluster Assignments. In the Rule-based Inference step (Step 4 in Fig. 2), we apply the top-m rules obtained in the Rule Learning step on the KG to predict the assignments between the target entities and the discovered clusters over the belongsTo relation, using standard deductive reasoning techniques. The computed assignment triples are ranked and filtered based on the exc score of the rules that inferred them.
Example 8. Application of the rules from Ex. 7 on G w.r.t. the target entities e1–e6 results in the cluster assignment triples {belongsTo(e3, e_C2), belongsTo(e4, e_C2), belongsTo(e2, e_C2)}. Note that based on r1, e4 is assigned to C2 instead of C1.
Fig. 4: Modeling options for the inferred cluster assignment triples: (A) direct belongsTo edges; (B) sameClsAs edges; (C) rules as edges; (D) rules as entities with infers and appliedTo edges.
4.3 Embedding Adaptation
Learned explanation rules capture explicit structural similarities among the target entities. We propose to utilize them to create feedback that guides the embedding-based clustering towards better explainable clusters. This feedback is passed to the embedding model in the form of additional training triples reflecting the assignments inferred by the learned rules. Our intuition is that such added triples should help the embeddings discover other similarities of an analogous nature, compensating for the embedding-based clustering limitation discussed in Section 4.1.
Specifically, the embedding adaptation (Step 5 in Fig. 2) is summarized as follows: (a) From the Rule Learning and Rule-based Inference steps described above, we obtain a set of cluster assignment triples of the form belongsTo(e, e_C) together with the rules inferring them, where e is an entity in the input KG G and e_C is a new entity uniquely representing the cluster C. (b) We then model the cluster assignments from (a) and the rules that produce them using one of our four strategies described below and store the results in G_inf. (c) A subset G_context of G consisting of triples that surround the target entities is then constructed. (d) Finally, we fine-tune the embedding model by training it further on the data compiled from G_inf and G_context.
Modeling Rule-based Feedback. Determining the adequate structure and amount of training triples required for fine-tuning the embedding model is challenging. On the one hand, the training data should be rich enough to reflect the learned structure; on the other hand, it should not corrupt the current embedding. We now present our four strategies for representing the inferred cluster assignments, together with the corresponding rules, as a set of triples G_inf suitable for adapting the embedding (a sketch of all four follows the list). The strategies are listed in ascending order of their complexity.
– Direct: As a straightforward strategy, we directly use the inferred entity-cluster assignment triples in G_inf, as shown in Fig. 4.A, e.g., belongsTo(e1, e_C2).
– Same-cluster-as: In the second strategy, we model the inferred assignments as edges only. As shown in Fig. 4.B, we compile G_inf using triples with a sameClsAs relation between every pair of entities that the learned rules place in the same cluster, e.g., sameClsAs(e1, e2). Modeling the cluster assignments using fresh relations allows us to stress the updates related to the target entities, as no extra entities are added to the KG in this strategy.
– Rules as edges: Third, we propose to model the inferred assignments together with the rules that led to their prediction. More precisely, for every rule r which deduced belongsTo(e, e_Ci), we introduce a fresh predicate p_r and add the triple p_r(e, e_Ci) to the training set G_inf, as illustrated in Fig. 4.C. This allows us to encode all conflicting entity-cluster assignments (i.e., assignments in which an entity belongs to two different clusters) and supply the embedding model with richer evidence about the rules that predicted these assignments.
– Rules as entities: Rules used in the deduction process can also be modeled as entities. In the fourth strategy, we exploit this possibility by introducing the additional predicates infers and appliedTo, and for every rule r a fresh entity e_r. Here, each belongsTo(e, e_Ci) fact deduced by the rule r is modeled in G_inf with two triples, infers(e_r, e_Ci) and appliedTo(e_r, e), as shown in Fig. 4.D.
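The sketch below illustrates all four strategies, assuming assignments is a list of (entity, cluster, rule_id) tuples produced by Rule-based Inference; the fresh predicate and entity names follow the text.

```python
from itertools import combinations

def build_g_inf(assignments, strategy):
    """Turn inferred cluster assignments into feedback triples G_inf."""
    if strategy == "direct":                      # Fig. 4.A
        return {(e, "belongsTo", c) for e, c, _ in assignments}
    if strategy == "same-cluster-as":             # Fig. 4.B
        triples = set()
        for c in {c for _, c, _ in assignments}:
            members = sorted(e for e, c2, _ in assignments if c2 == c)
            triples |= {(e1, "sameClsAs", e2)
                        for e1, e2 in combinations(members, 2)}
        return triples
    if strategy == "rules-as-edges":              # Fig. 4.C
        return {(e, f"p_{r}", c) for e, c, r in assignments}
    if strategy == "rules-as-entities":           # Fig. 4.D
        return {t for e, c, r in assignments
                for t in [(f"e_{r}", "infers", c), (f"e_{r}", "appliedTo", e)]}
    raise ValueError(f"unknown strategy: {strategy}")
```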
Embedding Fine-Tuning. At every iteration i of ExCut, we start with the embedding vectors obtained in the previous iteration i−1 and train the embedding further on a set of adaptation triples G_adapt. The set G_adapt is composed of the union of all G_inf^j for j = 1 ... i and a set of context triples G_context. For G_context, we only consider triples directly involving the target entities as a subject or an object. E.g., among the facts in the surrounding context of e1, we have worksAt(e1, org1) and plays(e1, tennis).
Our empirical studies (see the technical report) showed that including assignment triples from previous iterations j < i leads to better results; thus, we include them in G_adapt, but distinguish entity and relation names from different iterations. Additionally, considering the context subgraph helps in regulating the change caused by the cluster assignment triples by preserving some of the characteristics of the original embeddings.
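A hedged sketch of this step is given below. The model.continue_training call is a hypothetical stand-in for resuming training from the previous iteration's weights (the implementation extends AmpliGraph for this purpose), and the iteration tag on relation names realizes the distinction between iterations described above.

```python
def adapt_embedding(model, g_inf_per_iter, kg, target_entities,
                    epochs=25, learning_rate=0.005):
    """Fine-tune the embedding on G_adapt = (all G_inf^j) ∪ G_context."""
    # context triples: facts directly involving a target entity
    g_context = {(s, p, o) for (s, p, o) in kg
                 if s in target_entities or o in target_entities}
    g_adapt = set(g_context)
    # accumulate feedback from all iterations so far, tagging relation
    # names per iteration so the assignments stay distinguishable
    for j, g_inf in enumerate(g_inf_per_iter, start=1):
        g_adapt |= {(s, f"{p}_iter{j}", o) for (s, p, o) in g_inf}
    model.continue_training(list(g_adapt), epochs=epochs,
                            learning_rate=learning_rate)  # hypothetical API
    return model
```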
5 Experiments
We evaluate the effectiveness of ExCut for computing explainable clusters. More specif-
ically, we report the experimental results covering the following aspects: (i) the quality
of the clusters produced by ExCut compared to existing clustering approaches; (ii) the
quality of the computed cluster explanations; (iii) the usefulness and understandability
of the explanations for humans based on a user study; (iv) the benefits of interleaving
embedding and rule learning for enhancing the quality of the clusters and their expla-
nations; and (v) the impact of using different embedding paradigms and our strategies
for modeling the feedback from the rules.
5.1 Experiment Setup
ExCut Configurations. We implemented ExCut in Python (code, data, and the technical report are available at https://github.com/mhmgad/ExCut) and configured its components as follows: (i) Embedding-based Clustering: We extended the implementation
of TransE and ComplEx provided by AmpliGraph [3] to allow embedding fine-tuning. We set the size of the embeddings to 100 and trained a base model on the whole KG for 100 epochs, using stochastic gradient descent with a learning rate of 0.0005. For fine-tuning, we trained the model for 25 epochs with a learning rate of 0.005. Kmeans is used for clustering. (ii) Rule Learning: We implemented the algorithm described in Section 4.2. For the experiments, we fix the language bias of the explanations to paths of length two, e.g., belongsTo(x, e_Ci) ← p(x, y), q(y, z), where z is either a free variable or bound to a constant. (iii) Modeling Rule-based Feedback: We experiment with the four strategies from Section 4.3: direct (belongToCl), same-cluster-as edges (sameClAs), rules as edges (entExplCl), and rules as entities (followExpl).
Datasets. We performed experiments on six datasets (Tab. 1) with a pre-specified set of target entities, which are widely used for relational clustering [4]. Additionally, we considered the following large-scale KGs: (i) LUBM-Courses: a subset of entities from the synthetic LUBM KG [9] describing the university domain, where the target entities are distributed over graduate and undergraduate courses; and (ii) YAGO-Artwork: a KG with a set of target entities randomly selected from YAGO [26]. The entities are uniformly distributed over three types: book, song, and movie. To avoid trivial explanations, type triples for target entities were removed from the KG. Tab. 1 reports the dataset statistics.

Table 1: Dataset statistics.
                   UWCSE  WebKB  Terror.  IMDB  Mutag.  Hepatitis  LUBM     YAGO
# Target Entities  209    106    1293     268   230     500        2850     3900
# Target Clusters  2      4      6        2     2       2          2        3
# KG Entities      991    5906   1392     578   6196    6511       242558   4295825
# Relations        12     7      4        4     14      19         22       38
# Facts            2216   72464  17117    1231  30805   77585      2169451  12430700
Baselines. We compare ExCut to the following clustering methods: (i) ReCeNT [4], a state-of-the-art relational clustering approach that clusters entities based on a similarity score computed from entity neighborhood trees; (ii) Deep Embedding Clustering (DEC) [32], an embedding-based clustering method that performs dimensionality reduction jointly with clustering; and (iii) standard Kmeans applied directly over the embeddings of TransE (Kmeans-T) and ComplEx (Kmeans-C). The latter baseline is equivalent to a single iteration of ExCut. Extended experiments with clustering algorithms that automatically detect the number of clusters can be found in the technical report.
Clustering Quality Metrics. We measure the clustering quality w.r.t. the ground truth with three standard metrics: Accuracy (ACC), Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI) (the higher, the better).
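ARI and NMI are available in scikit-learn, while clustering accuracy requires an optimal one-to-one matching between predicted clusters and ground-truth classes; the sketch below computes it with the Hungarian algorithm and assumes integer-encoded labels.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one mapping of predicted clusters to classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    counts = np.zeros((n, n), dtype=int)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1
    rows, cols = linear_sum_assignment(-counts)   # maximize matched pairs
    return counts[rows, cols].sum() / len(y_true)

def all_metrics(y_true, y_pred):
    return (clustering_accuracy(y_true, y_pred),
            adjusted_rand_score(y_true, y_pred),
            normalized_mutual_info_score(y_true, y_pred))
```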
Explanation Quality Metrics. The quality of the generated explanations is measured using the coverage metrics defined in Section 3.2, namely per-cluster coverage (Cov) and exclusive coverage (Exc). In addition, we adapted the "novelty" metric Weighted Relative Accuracy (WRA) [14], which represents a trade-off between the coverage and the accuracy of the discovered explanations. We compute the average of the respective quality of the top explanations for all clusters. To assess the quality of the solution to the explainable clustering problem from Def. 1 found by ExCut, we compare the computed quality value to the quality of the explanations computed over the ground truth.
All experiments were performed on a Linux machine with 80 cores and 500GB
RAM. The average results over 5 runs are reported.
User Study. To assess the human-understandability and usefulness of the explanation rules, we analyze whether ExCut explanations are the best-fitting labels for the computed clusters in the users' opinion. The study was conducted on Amazon MTurk. More specifically, based on the YAGO KG, we provided the user study participants with: (i) three clusters of entities, each represented by three entities pseudo-randomly
selected from these clusters, along with a brief summary for each entity and a link to its Wikipedia page; (ii) a set of 10 potential explanations composed of the top explanations generated by ExCut and other explanations with high Cov but low Exc. Explanations were displayed in natural language for ease of readability. We asked the participants to match each explanation to all relevant clusters.
A useful explanation is one that is exclusively matched to the correct cluster by the participants. To detect useful explanations, for every explanation-cluster pair we compute the ratio of responses in which the pair is exclusively matched. Let match(r_i, c_m) = 1 if the user matched explanation r_i to the cluster c_m (otherwise 0). Then, r_i is exclusively matched to c_m if, additionally, match(r_i, c_j) = 0 for all j ≠ m.
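A small sketch of this statistic, assuming each response is a dict mapping an explanation id to the set of clusters the participant matched it to:

```python
def exclusive_match_ratio(responses, r_i, c_m):
    """Fraction of responses that match r_i to c_m and to no other cluster."""
    hits = sum(1 for resp in responses if resp.get(r_i, set()) == {c_m})
    return hits / len(responses)
```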
5.2 Experiment Results
In seven out of eight datasets, our approach outperforms the baselines with regard to the overall clustering and explanation quality metrics. Additionally, the quality of the computed explanations increases after a few iterations.
Clustering Quality. Table 2 presents the quality of the clusters computed by the baselines, in the first four rows, followed by ExCut with the four feedback strategies, where ExCut-T and ExCut-C stand for ExCut with TransE and ComplEx, respectively.

Table 2: Clustering results of ExCut compared to the baselines.
             UWCSE            IMDB             Hepatitis        Mutagenesis      WebKB            Terrorist
             ACC  ARI  NMI    ACC  ARI  NMI    ACC  ARI  NMI    ACC  ARI  NMI    ACC  ARI  NMI    ACC  ARI  NMI
Baselines
  ReCeNT     0.90 0.60 0.54   0.61 0.02 0.01   0.51 -0.01 0.01  0.77 0.30 0.24   0.52 0.00 -0.25  0.37 0.10 0.13
  DEC        0.67 0.17 0.12   0.54 0.00 0.01   0.55 0.01 0.01   0.51 0.00 0.00   0.31 0.03 0.05   0.37 0.16 0.26
  Kmeans-T   0.91 0.66 0.51   0.58 0.03 0.08   0.51 0.00 0.00   0.52 0.00 0.00   0.33 0.01 0.06   0.53 0.33 0.44
  Kmeans-C   0.54 0.00 0.01   0.53 0.00 0.00   0.52 0.00 0.00   0.73 0.21 0.18   0.49 0.21 0.34   0.51 0.23 0.28
ExCut-T
  belongToCl 0.99 0.96 0.92   1.00 1.00 1.00   0.83 0.43 0.35   0.68 0.12 0.13   0.43 0.13 0.17   0.52 0.27 0.31
  sameClAs   1.00 1.00 1.00   1.00 1.00 1.00   0.56 0.01 0.01   0.65 0.08 0.08   0.36 0.06 0.08   0.35 0.03 0.06
  entExplCl  1.00 1.00 1.00   1.00 1.00 1.00   0.82 0.41 0.33   0.64 0.07 0.08   0.43 0.13 0.20   0.45 0.17 0.23
  followExpl 1.00 1.00 1.00   1.00 1.00 1.00   0.82 0.41 0.33   0.64 0.08 0.08   0.44 0.15 0.22   0.45 0.16 0.22
ExCut-C
  belongToCl 0.96 0.85 0.77   1.00 1.00 1.00   0.63 0.07 0.05   0.73 0.21 0.18   0.51 0.23 0.37   0.54 0.26 0.29
  sameClAs   0.98 0.91 0.86   1.00 1.00 1.00   0.58 0.02 0.02   0.73 0.21 0.18   0.38 0.08 0.17   0.34 0.03 0.08
  entExplCl  0.97 0.88 0.81   0.65 0.08 0.19   0.69 0.15 0.11   0.73 0.21 0.19   0.52 0.24 0.36   0.53 0.25 0.29
  followExpl 0.99 0.97 0.94   1.00 1.00 1.00   0.66 0.10 0.08   0.73 0.20 0.18   0.51 0.22 0.34   0.52 0.24 0.29
For all datasets except Mutagenesis, ExCut achieved, in general, better results w.r.t. the ACC value than the state-of-the-art methods. Furthermore, compared to Kmeans-T, i.e., the direct application of Kmeans on the TransE embeddings, ExCut-T results in significantly better clusters on all datasets apart from Terrorists. Since the Terrorists dataset contains several attributed predicates (e.g., facts over numerical values), a different language bias for the explanation rules would be required.
Our system managed to fully re-discover the ground truth clusters for two datasets: UWCSE and IMDB. The accuracy enhancement by ExCut-T compared to the respective baseline (Kmeans-T) exceeds 30% for IMDB and Hepatitis. The other quality measures show similar improvements.
Explanation Quality. Table 3 shows the average quality of the top explanations for the discovered clusters, where the average per-cluster coverage (Cov) and exclusive coverage (Exc) are intrinsic evaluation metrics used as our optimization functions, while the WRA measure is the extrinsic one.
Table 3: Quality of cluster explanations by ExCut compared to the baselines.
              UWCSE            IMDB             Hepatitis        Mutagenesis      WebKB            Terrorist
              Cov  Exc  WRA    Cov  Exc  WRA    Cov  Exc  WRA    Cov  Exc  WRA    Cov  Exc  WRA    Cov  Exc  WRA
Baselines
  ReCeNT      0.91 0.88 0.14   1.00 0.04 0.01   1.00 0.00 0.00   1.00 0.00 0.00   1.00 1.00 0.00   0.93 0.42 0.06
  DEC         0.73 0.31 0.07   1.00 0.03 0.01   1.00 0.01 0.00   1.00 0.00 0.00   1.00 0.06 0.01   0.60 0.13 0.02
  Kmeans-T    0.83 0.76 0.16   0.74 0.11 0.01   0.81 0.09 0.02   0.75 0.11 0.03   0.75 0.11 0.03   0.49 0.17 0.02
  Kmeans-C    0.59 0.06 0.01   0.73 0.04 0.01   0.61 0.09 0.02   0.87 0.30 0.08   0.98 0.04 0.01   0.64 0.28 0.02
ExCut-T
  belongToCl  0.89 0.89 0.19   1.00 1.00 0.11   0.76 0.64 0.13   0.94 0.39 0.09   0.98 0.12 0.01   0.68 0.26 0.03
  sameClAs    0.90 0.90 0.19   1.00 1.00 0.11   0.94 0.45 0.09   0.96 0.50 0.12   0.99 0.04 0.01   0.87 0.49 0.06
  entExplCl   0.90 0.90 0.19   1.00 1.00 0.11   0.75 0.64 0.13   0.99 0.48 0.12   0.99 0.10 0.01   0.94 0.80 0.11
  followExpl  0.90 0.90 0.19   1.00 1.00 0.11   0.75 0.63 0.13   0.98 0.46 0.11   0.99 0.09 0.01   0.95 0.79 0.11
ExCut-C
  belongToCl  0.88 0.86 0.18   1.00 1.00 0.11   0.73 0.50 0.12   0.87 0.31 0.08   0.98 0.08 0.01   0.68 0.32 0.02
  sameClAs    0.91 0.89 0.19   1.00 1.00 0.11   0.80 0.45 0.11   0.87 0.30 0.08   0.98 0.10 0.01   0.85 0.61 0.07
  entExplCl   0.88 0.88 0.19   0.73 0.18 0.01   0.85 0.73 0.18   0.87 0.31 0.08   0.97 0.08 0.01   0.68 0.33 0.03
  followExpl  0.90 0.89 0.19   1.00 1.00 0.11   0.81 0.66 0.12   0.87 0.31 0.08   0.97 0.07 0.01   0.67 0.30 0.03
Ground truth  0.92 0.90 0.19   1.00 1.00 0.11   0.92 0.57 0.14   1.00 0.16 0.04   1.00 0.04 0.01   0.64 0.33 0.03
Table 4: Quality of the clusters and the explanations found in large-scale KGs.
               LUBM Courses                      YAGO Artwork
               ACC  ARI  NMI   Cov  Exc  WRA     ACC  ARI  NMI   Cov  Exc  WRA
Baselines
  DEC          0.92 0.70 0.66  0.96 0.95 0.19    0.56 0.44 0.57  0.92 0.49 0.11
  Kmeans-T     0.50 0.00 0.00  0.46 0.03 0.01    0.52 0.42 0.58  0.92 0.42 0.11
ExCut-T
  belongToCl   1.00 1.00 1.00  1.00 1.00 0.25    0.82 0.63 0.59  0.85 0.70 0.16
  sameClAs     0.88 0.57 0.53  0.91 0.79 0.19    0.97 0.91 0.90  0.95 0.93 0.21
  entExplCl    1.00 1.00 1.00  1.00 1.00 0.25    0.97 0.92 0.91  0.95 0.93 0.21
  followExpl   1.00 1.00 1.00  1.00 1.00 0.25    0.88 0.73 0.70  0.86 0.78 0.17
Ground truth   -    -    -     1.00 1.00 0.25    -    -    -     0.95 0.93 0.21
The last row presents the quality of the learned explanations for the ground truth
clusters; these values are not necessarily 1.0, as perfect explanations under the specified
language bias may not exist. We report them as reference points.
ExCut enhances the average Exc and WRA scores of the clusters' explanations compared to those obtained by the baselines. These two measures highlight the exclusiveness of the explanations, making them more representative than Cov. Thus, the decrease in Cov, as for Terrorist, is acceptable, given that it is in favor of increasing the exclusiveness measures.
Similar to the clustering results, for UWCSE and IMDB our method achieved the explanation quality of the ground truth. For the other datasets, our method obtained higher explanation quality than the respective baselines. This demonstrates the effectiveness of the proposed feedback mechanism in adapting the embedding model to better capture the graph structures in the input KGs.
Results on Large-scale KGs. Table 4 presents the quality measures for clustering and explainability of ExCut running with TransE on LUBM and YAGO. ExCut succeeds in computing the ground truth clusters on LUBM. Despite the noise in YAGO, it achieves an approximately 40% improvement in clustering accuracy. The explanation quality is also improved. ReCeNT did not scale to LUBM and YAGO due to its memory requirements.
Table 5: Explanations of the clusters song, book, and movie from the YAGO KG (X ∈ Ci).
     Kmeans-T                                          ExCut-T
     Explanations                    Cov  Exc  WRA     Explanations                    Cov  Exc  WRA
C1   created(Y,X), bornIn(Y,Z)       0.94 0.55 0.13    created(Y,X), type(Y,artist)    0.99 0.96 0.21
     created(Y,X), type(Y,artist)    0.49 0.45 0.10    created(Y,X), won(Y,grammy)     0.57 0.57 0.12
     created(Y,X), type(Y,writer)    0.52 0.44 0.10    created(Y,X), type(Y,person)    0.84 0.48 0.11
C2   directed(Y,X)                   0.92 0.56 0.11    created(Y,X), type(Y,writer)    0.99 0.91 0.19
     directed(Y,X), gender(Y,male)   0.89 0.54 0.10    created(Y,X), diedIn(Y,Z)       0.46 0.20 0.04
     created(Y,X), type(Y,person)    0.71 0.52 0.06    created(Y,X)                    1.00 0.00 0.05
C3   actedIn(Y,X), type(Y,person)    0.58 0.30 0.07    actedIn(Y,X)                    0.81 0.81 0.19
     locatedIn(X,Y), hasLang(Y,Z)    0.60 0.29 0.07    actedIn(Y,X), bornIn(Y,Z)       0.79 0.79 0.18
     locatedIn(X,Y), currency(Y,Z)   0.60 0.29 0.07    actedIn(Y,X), type(Y,person)    0.78 0.78 0.18
Fig. 5: Ratio of explanation-to-cluster pairs exclusively matched (stacked exclusive matching ratio per cluster: movies, books, songs). The candidate explanations shown to participants were: r1: "created by a human"; r2: "written in English"; r3: "has an actor"; r4: "created by a male"; r5: "created by a living person"; r6: "created by a singer"; r7: "is a novel"; r8: "created by a Grammy winner"; r9: "created by a writer"; r10: "created by a director".
Human-understandability. For illustration, Table 5 presents the top-3 explanations for each cluster computed by ExCut, along with their quality, on the YAGO KG. In the ground truth, C1, C2 and C3 are clusters of entities of the types song, book, and movie, respectively. One can observe that the explanations generated by ExCut-T are more intuitive and of higher quality than those obtained using Kmeans-T. The correlation between the relevance of an explanation and the used quality metrics can also be observed.
Fig. 5 summarizes the results of the 50 responses collected via the user study. Each bar shows the ratio of responses exclusively matching explanation r_i to each of the provided clusters. The results show that the majority of the participants exclusively matched explanations r3 and r10 to movies, r7 and r9 to books, and r6 and r8 to songs. The explanations r3, r6, and r9 were learned by ExCut. The high relative exclusive matching ratio to the corresponding correct cluster for the ExCut explanations demonstrates their usefulness in differentiating between the given clusters.
Fig. 6: ExCut-T clustering and explanation quality over the iterations (x-axis); panels: (a) IMDB, (b) Hepatitis, (c) YAGO Artwork.

Results Analysis. In Fig. 6, we present a sample of the quality of the clusters and the aggregated quality of their top explanations over 10 iterations of ExCut-T using the followExpl configuration. In general, clustering and explanation quality consistently improve over the iterations, which demonstrates the advantage of the introduced embedding fine-tuning procedure. For IMDB, the quality drops at the beginning, but increases and reaches the highest values at the third iteration. This highlights the benefit of accumulating the auxiliary triples for enhancing the feedback signal, thus preventing
the embedding tuning from diverging. The charts also show a correlation between the clustering and explanation quality, which supports our hypothesis that the introduced exclusive coverage measure (Exc) is useful for computing good clusters.
With respect to the effects of different embeddings and feedback modeling, as shown in Tables 2 and 3, we observe that ExCut with TransE is more robust than with ComplEx, regardless of the feedback modeling method. Furthermore, modeling the feedback using the followExpl strategy leads to better results on the majority of the datasets, especially for the large-scale KGs. This reflects the benefit of passing richer feedback to the embedding, as it allows for better entity positioning in the latent space.
6 Related Work
Clustering relational data has been actively studied (e.g., [4,6,7,16,25]). The majority of the existing approaches are based on finding interesting features in KGs and defining distance measures between their vectors. Our work is conceptually similar, but we let the embedding model identify the features implicitly instead of computing them on the KG directly, which is in the spirit of linked data propositionalization [24].
A framework for explaining given high-quality clusters using linked data and in-
ductive logic programming has been proposed in [27,28]. While [28] aims at explaining
existing clusters, we focus on performing clustering and explanation learning iteratively
to discover high-quality clusters with explanations. The work [12] targets interpreting
embedding models by finding concept spaces in node embeddings and linking them to a
simple external type hierarchy. This is different from our method of explaining clusters
computed over embeddings by learning rules from a given KG. Similarly, [2] proposes
a method for learning conceptual space representations of known concepts by associ-
ating a Gaussian distribution over a learned vector space with each concept. In [10,23]
the authors introduce methods for answering logical queries over the embedding space.
In contrast, in our work, the concepts are not given but rather need to be discovered.
While the step of explanation learning in our method is an adaptation of [8], the extension to other exact symbolic rule learning methods [18,22] is likewise possible. In principle, one could also employ neural rule learners for our needs, such as [20,21,34]; however, the integration of our exclusive rule coverage scoring function into such approaches is challenging and requires further careful investigation.
Several methods have recently focused on combining [11,35] and comparing [5,19] rule learning and embedding methods. The authors of [11] propose to rank rules learned from KGs by relying both on their embedding-based predictive quality and traditional rule measures, which is conceptually different from our work. In [35], an iterative method for the joint learning of linear-map embeddings and OWL axioms (without nominals) has been introduced. The triples inferred by the learned rules are injected into the KG before the embedding is re-trained from scratch in the subsequent iteration. In contrast, the rule-based feedback generated by ExCut is not limited to fact predictions, but encodes further structural similarities across entities. Furthermore, we do not re-train the whole model from scratch, but rather adapt the embedding of the target entities accounting for the feedback. Finally, unlike [35], the rules that we learn support constants, which allows capturing a larger variety of explanations.
7 Conclusion
We have proposed ExCut, an approach for explainable KG entity clustering, which
iteratively utilizes embeddings and rule learning methods to compute accurate clusters
and human-readable explanations for them. Our approach is flexible, as any embedding
model can be used. Experiments show the effectiveness of ExCut on real-world KGs.
There are several directions for future work. Considering more general rules (e.g.,
with negations) in the Rule Learning component of our method or exploiting several
embedding models instead of a single one in the Embedding-based Clustering step
should lead to cleaner clusters. Further questions to study include the analysis of how
well our method performs when the number of clusters is very large, and how the feed-
back from the rules can be used to determine the number of clusters automatically.
References
1. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NeurIPS. pp. 2787–2795 (2013)
2. Bouraoui, Z., Schockaert, S.: Learning conceptual space representations of interrelated concepts. In: IJCAI. pp. 1760–1766 (2018)
3. Costabello, L., Pai, S., Van, C.L., McGrath, R., McCarthy, N., Tabacof, P.: AmpliGraph: a library for representation learning on knowledge graphs (Mar 2019)
4. Dumancic, S., Blockeel, H.: An expressive dissimilarity measure for relational clustering over neighbourhood trees. MLJ (2017)
5. Dumancic, S., García-Durán, A., Niepert, M.: On embeddings as an alternative paradigm for relational learning. CoRR abs/1806.11391[v2] (2018)
6. Fanizzi, N., d'Amato, C., Esposito, F.: Conceptual clustering and its application to concept drift and novelty detection. In: ESWC. pp. 318–332 (2008)
7. Fonseca, N.A., Costa, V.S., Camacho, R.: Conceptual clustering of multi-relational data. In: ILP. pp. 145–159 (2011)
8. Galárraga, L., Teflioudi, C., Hose, K., Suchanek, F.M.: Fast rule mining in ontological knowledge bases with AMIE+. The VLDB Journal 24(6), 707–730 (2015)
9. Guo, Y., Pan, Z., Heflin, J.: LUBM: A benchmark for OWL knowledge base systems. J. Web Semant. 3(2-3), 158–182 (2005)
10. Hamilton, W.L., Bajaj, P., Zitnik, M., Jurafsky, D., Leskovec, J.: Embedding logical queries on knowledge graphs. In: NeurIPS. pp. 2030–2041 (2018)
11. Ho, V.T., Stepanova, D., Gad-Elrab, M.H., Kharlamov, E., Weikum, G.: Rule learning from knowledge graphs guided by embedding models. In: ISWC. pp. 72–90 (2018)
12. Idahl, M., Khosla, M., Anand, A.: Finding interpretable concept spaces in node embeddings using knowledge bases. In: Cellier, P., Driessens, K. (eds.) ML/KDD. pp. 229–240 (2020)
13. Knobbe, A.J., Ho, E.K.Y.: Pattern teams. In: PKDD. pp. 577–584 (2006)
14. Lavrač, N., Flach, P., Zupan, B.: Rule evaluation measures: A unifying view. In: ILP. pp. 174–185 (1999)
15. Lin, Y., Liu, Z., Luan, H., Sun, M., Rao, S., Liu, S.: Modeling relation paths for representation learning of knowledge bases. In: EMNLP. pp. 705–714 (2015)
16. Lisi, F.A.: A pattern-based approach to conceptual clustering in FOL. In: ICCS. vol. 4068, pp. 346–359 (2006)
17. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Symp. on Math. Stat. and Prob. vol. 1, pp. 281–297 (1967)
18. Meilicke, C., Chekol, M.W., Ruffinelli, D., Stuckenschmidt, H.: Anytime bottom-up rule learning for knowledge graph completion. In: IJCAI. pp. 3137–3143 (2019)
19. Meilicke, C., Fink, M., Wang, Y., Ruffinelli, D., Gemulla, R., Stuckenschmidt, H.: Fine-grained evaluation of rule- and embedding-based systems for knowledge graph completion. In: ISWC. pp. 3–20 (2018)
20. Omran, P.G., Wang, K., Wang, Z.: An embedding-based approach to rule learning in knowledge graphs. IEEE TKDE, pp. 1–1 (2019)
21. Omran, P.G., Wang, K., Wang, Z.: Scalable rule learning via learning representation. In: IJCAI. pp. 2149–2155 (2018)
22. Ortona, S., Meduri, V.V., Papotti, P.: Robust discovery of positive and negative rules in knowledge bases. In: ICDE. pp. 1168–1179. IEEE (2018)
23. Ren, H., Hu, W., Leskovec, J.: Query2box: Reasoning over knowledge graphs in vector space using box embeddings. In: ICLR (2020)
24. Ristoski, P., Paulheim, H.: A comparison of propositionalization strategies for creating features from linked open data. In: 1st Workshop on Linked Data for Knowledge Discovery (2014)
25. Suárez, A.P., Martínez Trinidad, J.F., Carrasco-Ochoa, J.A.: A review of conceptual clustering algorithms. Artif. Intell. Rev. 52(2), 1267–1296 (2019)
26. Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: A core of semantic knowledge. In: Proceedings of WWW. pp. 697–706 (2007)
27. Tiddi, I., d'Aquin, M., Motta, E.: Dedalo: Looking for clusters explanations in a labyrinth of linked data. In: ESWC. pp. 333–348 (2014)
28. Tiddi, I., d'Aquin, M., Motta, E.: Data patterns explained with linked data. In: ECML/PKDD. pp. 271–275 (2015)
29. Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., Bouchard, G.: Complex embeddings for simple link prediction. In: ICML. pp. 2071–2080 (2016)
30. Vrandecic, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)
31. Wang, C., Pan, S., Hu, R., Long, G., Jiang, J., Zhang, C.: Attributed graph clustering: A deep attentional embedding approach. In: IJCAI. pp. 3670–3676 (2019)
32. Xie, J., Girshick, R.B., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: ICML. pp. 478–487 (2016)
33. Xie, R., Liu, Z., Jia, J., Luan, H., Sun, M.: Representation learning of KGs with entity descriptions. In: AAAI. pp. 2659–2665 (2016)
34. Yang, F., Yang, Z., Cohen, W.W.: Differentiable learning of logical rules for knowledge base reasoning. In: NeurIPS. pp. 2319–2328 (2017)
35. Zhang, W., Paudel, B., Wang, L., Chen, J., Zhu, H., Zhang, W., Bernstein, A., Chen, H.: Iteratively learning embeddings and rules for knowledge graph reasoning. In: WWW. pp. 2366–2377 (2019)
Article
Full-text available
Clustering is a fundamental technique in data mining and pattern recognition, which has been successfully applied in several contexts. However, most of the clustering algorithms developed so far have been focused only in organizing the collection of objects into a set of clusters, leaving the interpretation of those clusters to the user. Conceptual clustering algorithms, in addition to the list of objects belonging to the clusters, provide for each cluster one or several concepts, as an explanation of the clusters. In this work, we present an overview of the most influential algorithms reported in the field of conceptual clustering, highlighting their limitations or drawbacks. Additionally, we present a taxonomy of these methods as well as a qualitative comparison of these algorithms, regarding a set of characteristics desirable since a practical point of view, which may help in the selection of the most appropriate method for solving a problem at hand. Finally, some research lines that need to be further developed in the context of conceptual clustering are discussed.
Chapter
In this paper we propose and study the novel problem of explaining node embeddings by finding embedded human interpretable subspaces in already trained unsupervised node representation embeddings. We use an external knowledge base that is organized as a taxonomy of human-understandable concepts over entities as a guide to identify subspaces in node embeddings learned from an entity graph derived from Wikipedia. We propose a method that given a concept finds a linear transformation to a subspace where the structure of the concept is retained. Our initial experiments show that we obtain low error in finding fine-grained concepts.
Article
It is natural and effective to use rules for representing explicit knowledge in knowledge graphs. However, it is challenging to learn rules automatically from very large knowledge graphs such as Freebase and YAGO. This paper presents a new approach, RLvLR (Rule Learning via Learning Representations), to learning rules from large knowledge graphs by using the technique of embedding in representation learning together with a new sampling method. Based on RLvLR, a new method RLvLR-Stream is developed for learning rules from streams of knowledge graphs. Both RLvLR and RLvLR-Stream have been implemented and experiments conducted to validate the proposed methods regarding the tasks of rule learning and link prediction. Experimental results show that our systems are able to handle the task of rule learning from large knowledge graphs with high accuracy and outperform some state-of-the-art systems. Specifically, for massive knowledge graphs with hundreds of predicates and over 10M facts, RLvLR is much faster and can learn much more quality rules than major systems for rule learning in knowledge graphs such as AMIE+. In the setting of knowledge graph streams, RLvLR-Stream significantly improved RLvLR for both rule learning and link prediction.
Conference Paper
We propose an anytime bottom-up technique for learning logical rules from large knowledge graphs. We apply the learned rules to predict candidates in the context of knowledge graph completion. Our approach outperforms other rule-based approaches and it is competitive with current state of the art, which is based on latent representations. Besides, our approach is significantly faster, requires less computational resources, and yields an explanation in terms of the rules that propose a candidate.
Conference Paper
Graph clustering is a fundamental task which discovers communities or groups in networks. Recent studies have mostly focused on developing deep learning approaches to learn a compact graph embedding, upon which classic clustering methods like k-means or spectral clustering algorithms are applied. These two-step frameworks are difficult to manipulate and usually lead to suboptimal performance, mainly because the graph embedding is not goal-directed, i.e., designed for the specific clustering task. In this paper, we propose a goal-directed deep learning approach, Deep Attentional Embedded Graph Clustering (DAEGC for short). Our method focuses on attributed graphs to sufficiently explore the two sides of information in graphs. By employing an attention network to capture the importance of the neighboring nodes to a target node, our DAEGC algorithm encodes the topological structure and node content in a graph to a compact representation, on which an inner product decoder is trained to reconstruct the graph structure. Furthermore, soft labels from the graph embedding itself are generated to supervise a self-training graph clustering process, which iteratively refines the clustering results. The self-training process is jointly learned and optimized with the graph embedding in a unified framework, to mutually benefit both components. Experimental results compared with state-of-the-art algorithms demonstrate the superiority of our method.
Conference Paper
Several recently proposed methods aim to learn conceptual space representations from large text collections. These learned representations associate each object from a given domain of interest with a point in a high-dimensional Euclidean space, but they do not model the concepts from this domain, and can thus not directly be used for categorization and related cognitive tasks. A natural solution is to represent concepts as Gaussians, learned from the representations of their instances, but this can only be reliably done if sufficiently many instances are given, which is often not the case. In this paper, we introduce a Bayesian model which addresses this problem by constructing informative priors from background knowledge about how the concepts of interest are interrelated with each other. We show that this leads to substantially better predictions in a knowledge base completion task.
Conference Paper
We study the problem of learning first-order rules from large Knowledge Graphs (KGs). With recent advancement in information extraction, vast data repositories in the KG format have been obtained such as Freebase and YAGO. However, traditional techniques for rule learning are not scalable for KGs. This paper presents a new approach RLvLR to learning rules from KGs by using the technique of embedding in representation learning together with a new sampling method. Experimental results show that our system outperforms some state-of-the-art systems. Specifically, for massive KGs with hundreds of predicates and over 10M facts, RLvLR is much faster and can learn much more quality rules than major systems for rule learning in KGs such as AMIE+. We also used the RLvLR-mined rules in an inference module to carry out the link prediction task. In this task, RLvLR outperformed Neural LP, a state-of-the-art link prediction system, in both runtime and accuracy.
Article
Learned models composed of probabilistic logical rules are useful for many tasks, such as knowledge base completion. Unfortunately this learning problem is difficult, since determining the structure of the theory normally requires solving a discrete optimization problem. In this paper, we propose an alternative approach: a completely differentiable model for learning sets of first-order rules. The approach is inspired by a recently-developed differentiable logic, i.e. a subset of first-order logic for which inference tasks can be compiled into sequences of differentiable operations. Here we describe a neural controller system which learns how to sequentially compose the these primitive differentiable operations to solve reasoning tasks, and in particular, to perform knowledge base completion. The long-term goal of this work is to develop integrated, end-to-end systems that can learn to perform high-level logical reasoning as well as lower-level perceptual tasks.