Do Judge an Entity by Its Name! Entity Typing Using Language Models

Conference Paper · May 2021
Russa Biswas¹,², Radina Sofronova¹,², Mehwish Alam¹,²,
Nicolas Heist³, Heiko Paulheim³, and Harald Sack¹,²
¹ FIZ Karlsruhe – Leibniz Institute for Information Infrastructure, Germany
² Karlsruhe Institute of Technology, Institute AIFB, Germany
³ University of Mannheim, Germany
Abstract. The entity type information in a Knowledge Graph (KG) plays an
important role in a wide range of Natural Language Processing applications
such as entity linking, question answering, and relation extraction. However,
the available entity types are often noisy and incomplete. Entity typing is
a non-trivial task when little information is available for the entities in
a KG. In this work, neural language models and a character embedding model
are exploited to predict the type of an entity from its name alone, without
any other information from the KG. The model has been successfully evaluated
on a benchmark dataset.
Keywords: Entity Type Prediction · Knowledge Graph Completion ·
Deep Neural Networks.
1 Introduction
Entity typing is a vital task in Knowledge Graph (KG) completion and
construction. The entity types in KGs such as DBpedia, YAGO, and Wikidata are
either extracted automatically from structured data, generated using
heuristics, or human-curated. As a result, the entity type information in
these KGs is incomplete and noisy. In the case of DBpedia, the Wikipedia
infoboxes are the primary source of information: the types of the entities
in Wikipedia infoboxes are mapped to the classes in DBpedia. Recent years
have witnessed research on the automated prediction of entity types in KGs
using heuristics [5] as well as neural network-based models [1, 3, 4]. Some
of the existing state-of-the-art (SOTA) models exploit the triples in the
KGs, whereas others also consider the textual entity descriptions. While
these approaches work well when a lot of information about an entity is
available, typing entities for which information is scarce remains a
challenge. This paper focuses on predicting entity types solely from their
label names, e.g., is it possible to predict that the entity dbr:Berlin is a
place only from its name? To do so, SOTA continuous space-based Neural
Language Models (NLMs) such as Word2Vec, GloVe,
Wikipedia2Vec [11], and BERT [6], as well as a character embedding model, are
exploited. This work thus tackles the challenge of insufficient information
about the entities. Since the NLMs are trained on a huge amount of textual
data, their latent representations provide implicit contextual information
about the entities. The task of entity typing is treated as a classification
problem in which a neural network-based classifier is applied on top of the
NLMs. Furthermore, an analysis of the performance of the different NLMs on
this task is provided.
2 Related Work
SDType [5], a heuristic-based approach, leverages the relations between
instances to predict the types of entities. In [3, 4], the authors propose
embedding-based entity typing models that consider the structural information
in the KG as well as the textual entity descriptions. In [1], word embedding
models such as Word2Vec, GloVe, and FastText are trained on KGs to generate
entity vectors for predicting the types of entities. Other language
model-based entity typing models are MuLR [10] and FIGMENT [9], in which
multi-level representations of entities are learned using character, word,
and entity embeddings. However, these NLM-based entity type prediction models
do not restrict themselves to the label names; they also use other
information available in the KGs. In [8], the authors use pre-trained RDF2Vec
vectors to predict entity types with a classifier. The meaningfulness of
entity names in the Semantic Web has been studied in [7]. Unlike these
models, this work uses NLMs to generate entity embeddings from the names of
the entities alone for the task of entity type prediction.
3 Model
This section discusses the NLMs and the classifiers used for the task of entity
typing only from the names of the entities.
Word2Vec. Word2Vec learns dense, low-dimensional distributed representations
of words from a large corpus, in place of high-dimensional sparse word
representations. The CBOW variant predicts the current word from a window of
context words, whereas the skip-gram variant predicts the context words from
the current word.
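The difference between the two Word2Vec training objectives can be illustrated by the training examples each one derives from a sentence. The following is a minimal sketch, not the Word2Vec implementation itself; the helper names, the toy sentence, and the window size are illustrative assumptions.

```python
def training_pairs(tokens, window=2):
    """Yield (context_words, center_word) for every position in `tokens`."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield left + right, center


def cbow_examples(tokens, window=2):
    # CBOW: the input is the whole context window, the target is the center word.
    return [(ctx, center) for ctx, center in training_pairs(tokens, window)]


def skipgram_examples(tokens, window=2):
    # Skip-gram: the input is the center word, with one target per context word.
    return [(center, c)
            for ctx, center in training_pairs(tokens, window)
            for c in ctx]


# Toy sentence (hypothetical); real models are trained on very large corpora.
tokens = ["berlin", "is", "the", "capital", "of", "germany"]
cbow = cbow_examples(tokens, window=1)
skipgram = skipgram_examples(tokens, window=1)
```

With a window of 1, CBOW produces one example per token, while skip-gram produces one example per (center, neighbour) pair.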
GloVe. GloVe exploits the global word-word co-occurrence statistics of a
corpus, with the underlying intuition that ratios of word-word co-occurrence
probabilities encode some form of the meaning of the words.
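For readers unfamiliar with GloVe, the quantities behind this intuition can be written out as follows. The notation is recalled from the original GloVe formulation and is not restated in this paper, so the symbols below (X, P, w, b, f, V) should be read as assumptions of this sketch.

```latex
% X_{ij}: number of times word j occurs in the context of word i
% X_i = \sum_k X_{ik}, and P_{ij} = X_{ij} / X_i is the co-occurrence probability
\[
  \frac{P_{ik}}{P_{jk}} = \frac{X_{ik} / X_i}{X_{jk} / X_j}
\]
% Weighted least-squares objective over a vocabulary of size V,
% with word vectors w, context vectors \tilde{w}, biases b, \tilde{b},
% and a weighting function f that damps very frequent co-occurrences
\[
  J = \sum_{i,j=1}^{V} f(X_{ij})
      \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
\]
```

A ratio much larger than 1 indicates that the probe word k is related to word i but not to word j, a ratio close to 0 indicates the opposite, and a ratio near 1 indicates that k is related to both or to neither.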
BERT. Bidirectional Encoder Representations from Transformers (BERT) is a
contextual embedding approach that pre-trains deep bidirectional
representations from unlabeled text by conditioning on both the left and the
right context in all layers.
Wikipedia2Vec. The model jointly learns word and entity embeddings from
Wikipedia, such that similar words and entities are close to one another in the vector
space. It uses three submodels to learn the representations: the Wikipedia
link graph model, the word-based skip-gram model, and the anchor context model.
Character Embedding. A character embedding model learns latent
representations of individual characters from a corpus, which makes it
possible to build vector representations of out-of-vocabulary words.
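As a rough illustration of why character vectors cope with out-of-vocabulary names, the sketch below builds a name vector by averaging the vectors of its characters. The 3-dimensional vectors, the lowercasing, and the choice to skip characters without an embedding are assumptions made for this example only.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical 3-dimensional character vectors; pre-trained ones are much larger.
CHAR_VECTORS = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.0, 0.0, 1.0],
}


def name_vector_from_chars(name, char_vectors):
    """Average the vectors of all characters of `name` that have an embedding."""
    known = [char_vectors[ch] for ch in name.lower() if ch in char_vectors]
    if not known:
        raise ValueError(f"no known characters in {name!r}")
    return mean_vector(known)


# "Abba" is unlikely to be in a toy word vocabulary, but its characters are known.
vec = name_vector_from_chars("Abba", CHAR_VECTORS)
```

Because every name is composed of a small, closed set of characters, such a model always yields a vector, even for names never seen as whole words.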
Embeddings of the Entity Names. In this work, the following pre-trained
models are used, each with a vector dimension of 300: the Word2Vec model
trained on the Google News dataset, the GloVe model trained on Wikipedia 2014
and Gigaword 5, the Wikipedia2Vec model trained on the 2018 version of the
English Wikipedia, and English character embeddings derived from the GloVe
840B/300D dataset. The average of all word vectors in an entity name is taken
as the vector representation of the entity. For BERT, the average of the last
four hidden layers of the model is taken as the representation of an entity
name, with a dimension of 768.
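The two pooling schemes described above can be sketched in plain Python. This is a minimal illustration rather than the authors' code: the toy 2-dimensional vectors are hypothetical, skipping out-of-vocabulary words is an assumption, and so is the mean over tokens in the BERT-style function, since the paper only states that the last four hidden layers are averaged.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical 2-dimensional word vectors; the paper uses 300 dimensions.
WORD_VECTORS = {"kate": [1.0, 3.0], "winslet": [3.0, 5.0]}


def entity_vector(name, word_vectors):
    """Average the vectors of the in-vocabulary words of an entity name."""
    known = [word_vectors[w] for w in name.lower().split() if w in word_vectors]
    # Assumption: names without any known word get no vector.
    return mean_vector(known) if known else None


def bert_style_vector(hidden_layers):
    """Average the last four layers per token, then average over tokens.

    `hidden_layers` is a list of layers; each layer is a list of token vectors.
    """
    last_four = hidden_layers[-4:]
    num_tokens = len(last_four[0])
    per_token = [mean_vector([layer[t] for layer in last_four])
                 for t in range(num_tokens)]
    return mean_vector(per_token)


name_vec = entity_vector("Kate Winslet", WORD_VECTORS)

# Five toy layers with two tokens each and one dimension per token.
toy_layers = [[[float(k)], [float(k + 10)]] for k in range(5)]
bert_vec = bert_style_vector(toy_layers)
```

In practice the layer activations would come from a pre-trained BERT model; the nested lists here only stand in for them.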
Classification. In this work, entity typing is treated as a classification
task with the entity types as classes. Two classifiers have been built on top
of the NLMs: (i) a Fully Connected Neural Network (FCNN) and (ii) a
Convolutional Neural Network (CNN). The FCNN has three layers: two dense
layers with ReLU activation, applied to the vectors generated from the NLMs,
followed by a final layer in which the softmax function calculates the
probability of an entity belonging to each class. The CNN consists of two 1-D
convolutional layers with ReLU activation followed by a global max-pooling
layer; the output of the pooling layer is passed through a fully connected
final layer, in which the softmax function predicts the classes of the
entities.
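The forward pass of the FCNN described above can be sketched as follows. This is an untrained toy version under stated assumptions: the hidden size of 16, the weight initialisation, and the random input vector are all hypothetical, since the paper does not report these details; only the layer structure (two ReLU layers, then softmax over 7 classes on a 300-dimensional input) follows the text.

```python
import math
import random


def dense(x, weights, biases):
    """Affine layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fcnn_forward(x, layers):
    """Two dense layers with ReLU, then a dense layer with softmax."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(dense(x, w1, b1))
    h2 = relu(dense(h1, w2, b2))
    return softmax(dense(h2, w3, b3))


rng = random.Random(0)
EMBEDDING_DIM, HIDDEN, NUM_CLASSES = 300, 16, 7  # HIDDEN is an assumption
layers = [init_layer(EMBEDDING_DIM, HIDDEN, rng),
          init_layer(HIDDEN, HIDDEN, rng),
          init_layer(HIDDEN, NUM_CLASSES, rng)]
x = [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]  # stand-in entity vector
probs = fcnn_forward(x, layers)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

The output is a probability distribution over the seven classes; with untrained weights the prediction is of course meaningless and only demonstrates the data flow.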
4 Evaluation
This section consists of a detailed description of the datasets used for evaluating
the models, followed by an analysis of the results obtained.
Datasets. The experiments are conducted on the benchmark dataset DBpedia630k
[12], extracted from DBpedia, which consists of 14 non-overlapping classes
with 560,000 train and 70,000 test entities. However, predicting fine-grained
type information of an entity only from its name is a non-trivial task. For
example, identifying dbr:Kate_Winslet as an Athlete or an Artist from the
entity name alone is challenging. Therefore, the entities in this dataset are
grouped into seven coarse-grained classes: dbo:Organisation, dbo:Person,
dbo:MeanOfTransportation, dbo:Place, dbo:Animal, dbo:Plant, and dbo:Work.
Moreover, 4.656% of the entities in the train set and 4.614% of the entities
in the test set have their type information mentioned in their RDF(S) labels.
For example, dbr:Cybersoft_(video_game_company) has the label "Cybersoft
(video game company)", which states that it is a Company.
Table 1. Results on the DBpedia630k dataset and the CaLiGraph test set (accuracy in %)
                       Types in Labels    no Types in Labels    CaLiGraph Test Set
Embedding              FCNN     CNN       FCNN     CNN          FCNN     CNN
Word2Vec               80.11    46.71     72.08    44.39        48.93    25.91
GloVe                  83.34    54.06     82.62    53.41        61.88    31.3
Wikipedia2Vec          91.14    60.47     90.68    57.36        75.21    36.97
BERT                   67.37    62.27     64.63    60.4         53.42    35.55
Character embedding    73.43    58.13     72.66    58.3         54.91    45.73
Therefore, the experiments on the DBpedia630k dataset are conducted both with
and without the type information in the names. To evaluate the approaches
independently of DBpedia, an additional test set⁹ composed of entities from
CaLiGraph [2] is used. CaLiGraph is a Wikipedia-based KG containing entities
extracted from tables and enumerations in Wikipedia articles. The test set
consists of 70,000 entities that are unknown to DBpedia and are evenly
distributed among the 7 classes.
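The two dataset preparation steps described in this section, coarsening the classes and removing type hints from labels, could be sketched as below. Both parts are assumptions rather than the authors' procedure: the 14 fine-grained class names are listed as commonly given for DBpedia630k, the grouping follows the DBpedia ontology hierarchy, and the paper does not say how type information was removed from the labels, so the regular expression is only one plausible heuristic.

```python
import re

# Assumed fine-grained classes of DBpedia630k and their coarse-grained parents.
FINE_TO_COARSE = {
    "Company": "Organisation",
    "EducationalInstitution": "Organisation",
    "Artist": "Person",
    "Athlete": "Person",
    "OfficeHolder": "Person",
    "MeanOfTransportation": "MeanOfTransportation",
    "Building": "Place",
    "NaturalPlace": "Place",
    "Village": "Place",
    "Animal": "Animal",
    "Plant": "Plant",
    "Album": "Work",
    "Film": "Work",
    "WrittenWork": "Work",
}

# Matches one trailing parenthetical such as " (video game company)".
_TRAILING_PARENS = re.compile(r"\s*\([^()]*\)\s*$")


def strip_type_hint(label):
    """Remove a trailing parenthetical from an entity label, if present."""
    return _TRAILING_PARENS.sub("", label).strip()


with_hint = "Cybersoft (video game company)"
without_hint = strip_type_hint(with_hint)
coarse = FINE_TO_COARSE["Company"]
```

Note that this heuristic removes every trailing parenthetical, including ones that do not name a type (for example a year used for disambiguation), so it is stricter than removing type mentions only.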
Results. Table 1 shows that, for all the NLMs, the FCNN outperforms the CNN
model. This suggests that the CNN model does not find useful patterns in the
label names of the entities. With the FCNN, BERT performs worst on both
variants of DBpedia630k in predicting the type of the entities from their
label names. Further error analysis shows that, for BERT, only 4.2% of the
person entities in the test set of the Types in Labels variant have been
correctly identified as dbo:Person. Since the names of persons can be
ambiguous and BERT is a contextual embedding model, the vectors generated
only from the label names do not provide a proper latent representation of
the entities. Without the class dbo:Person, however, the FCNN achieves an
accuracy of 84.74% with BERT on the same dataset. Wikipedia2Vec, on the
other hand, works best amongst all the NLMs with the FCNN, with an accuracy
of 91.14% and 90.68% on the Types in Labels and no Types in Labels variants
of the dataset, respectively. On removal of the class dbo:Person from the
dataset, it achieves an accuracy of 91.01% on the Types in Labels variant.
This decrease of only 0.13 percentage points suggests that entities of the
class dbo:Person are well represented in the entity vectors obtained from the
pre-trained Wikipedia2Vec model.
After removing the type information from the name labels, the accuracy drops
for almost every combination of model and classifier; the only exception in
Table 1 is the character embedding model with the CNN, whose accuracy rises
marginally from 58.13% to 58.3%. With the FCNN classifier, Wikipedia2Vec,
GloVe, and the character embedding model experience the smallest drops: 0.46,
0.72, and 0.77 percentage points, respectively. For Wikipedia2Vec, this is
because DBpedia entities are extracted from Wikipedia articles, so the
entities are well represented by the Wikipedia2Vec vectors. For the character
embedding, removing the type information from the labels has a low impact
because the vector representation of an entity name depends on the
corresponding character vectors and not on word vectors. Furthermore, the
unseen test set from CaLiGraph has been evaluated with the classification
models trained on the no Types in Labels variant of the dataset. On the
CaLiGraph test set, the FCNN achieves the best result with the Wikipedia2Vec
model, an accuracy of 75.21%. The entities in the CaLiGraph test set are not
contained in DBpedia, hence representations of these entities are not learned
during the training of the Wikipedia2Vec model. This indicates the robustness
of the proposed model and shows that entity vectors generated by averaging
the vectors of the words in the entity names provide a useful latent
representation.
⁹ in-a-name caligraph test-balanced70k.csv.bz2
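The accuracy differences quoted in this section can be re-derived from Table 1 with a few lines of arithmetic. The sketch below assumes that the first value of each pair in Table 1 is the FCNN accuracy and the second the CNN accuracy, which is inferred from the figures quoted in the text (91.14%, 90.68%, and 75.21% for Wikipedia2Vec with the FCNN); the dictionary layout and function names are illustrative.

```python
# Accuracy (%) from Table 1 as (FCNN, CNN) pairs per evaluation setting.
TABLE_1 = {
    "Word2Vec": {"types": (80.11, 46.71), "no_types": (72.08, 44.39),
                 "caligraph": (48.93, 25.91)},
    "GloVe": {"types": (83.34, 54.06), "no_types": (82.62, 53.41),
              "caligraph": (61.88, 31.3)},
    "Wikipedia2Vec": {"types": (91.14, 60.47), "no_types": (90.68, 57.36),
                      "caligraph": (75.21, 36.97)},
    "BERT": {"types": (67.37, 62.27), "no_types": (64.63, 60.4),
             "caligraph": (53.42, 35.55)},
    "Character embedding": {"types": (73.43, 58.13), "no_types": (72.66, 58.3),
                            "caligraph": (54.91, 45.73)},
}


def fcnn_drop(model):
    """Drop in FCNN accuracy (percentage points) after removing types from labels."""
    with_types = TABLE_1[model]["types"][0]
    without_types = TABLE_1[model]["no_types"][0]
    return round(with_types - without_types, 2)


drops = {model: fcnn_drop(model) for model in TABLE_1}
ranked_by_drop = sorted(drops, key=drops.get)
best_on_caligraph = max(TABLE_1, key=lambda m: TABLE_1[m]["caligraph"][0])
fcnn_always_better = all(fcnn > cnn
                         for row in TABLE_1.values()
                         for fcnn, cnn in row.values())
# Effect of removing dbo:Person for Wikipedia2Vec (91.01% is quoted in the text).
person_effect = round(91.14 - 91.01, 2)
```

Sorting the models by `drops` makes it easy to see which embeddings are least affected by removing the type hints from the labels.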
5 Conclusion and Future Work
In this paper, different NLMs for entity typing in a KG have been analyzed.
The results indicate that NLMs provide enough information to predict the
coarse-grained types of entities in a KG from their names alone. Future work
will explore fine-grained type prediction that combines the NLMs with other
features from the KG.
References
1. Biswas, R., Sofronova, R., Alam, M., Sack, H.: Entity type prediction in
knowledge graphs using embeddings. In: DL4KG @ ESWC 2020 (2020)
2. Heist, N., Paulheim, H.: Entity extraction from Wikipedia list pages. In:
ESWC (2020)
3. Jin, H., Hou, L., Li, J., Dong, T.: Attributed and predictive entity
embedding for fine-grained entity typing in knowledge bases. In: COLING (2018)
4. Jin, H., Hou, L., Li, J., Dong, T.: Fine-grained entity typing via
hierarchical multi graph convolutional networks. In: EMNLP-IJCNLP (2019)
5. Paulheim, H., Bizer, C.: Type inference on noisy RDF data. In: ISWC (2013)
6. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K.,
Zettlemoyer, L.: Deep contextualized word representations. In: NAACL-HLT (2018)
7. de Rooij, S., Beek, W., Bloem, P., van Harmelen, F., Schlobach, S.: Are
names meaningful? Quantifying social meaning on the semantic web. In:
International Semantic Web Conference. pp. 184–199. Springer (2016)
8. Sofronova, R., Sack, H.: Entity typing based on RDF2Vec using supervised
and unsupervised methods. In: The Semantic Web: ESWC 2020 Satellite Events,
Heraklion, Crete, Greece, May 31–June 4, 2020, Revised Selected Papers.
p. 203 (2020)
9. Yaghoobzadeh, Y., Adel, H., Schütze, H.: Corpus-level fine-grained entity
typing. J. Artif. Intell. Res. 61, 835–862 (2018)
10. Yaghoobzadeh, Y., Schütze, H.: Multi-level representations for
fine-grained typing of knowledge base entities. In: Proceedings of the 15th
Conference of the European Chapter of the Association for Computational
Linguistics, EACL 2017, Valencia, Spain, April 3-7, 2017, Volume 1: Long
Papers (2017)
11. Yamada, I., Asai, A., Sakuma, J., Shindo, H., Takeda, H., Takefuji, Y.,
Matsumoto, Y.: Wikipedia2Vec: An efficient toolkit for learning and
visualizing the embeddings of words and entities from Wikipedia. arXiv
preprint 1812.06280v3 (2020)
12. Zhang, X., Zhao, J.J., LeCun, Y.: Character-level convolutional networks
for text classification. In: NIPS (2015)