Knowledge and Information Systems (2020) 62:2169–2190
https://doi.org/10.1007/s10115-019-01415-5
REGULAR PAPER
DBkWik: extracting and integrating knowledge from
thousands of Wikis
Sven Hertling¹ · Heiko Paulheim¹
Received: 2 January 2019 / Revised: 3 October 2019 / Accepted: 5 October 2019 /
Published online: 2 November 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019
Abstract
Popular cross-domain knowledge graphs, such as DBpedia and YAGO, are built from
Wikipedia and are therefore similar in coverage. In contrast, Wikifarms like Fandom contain
Wikis for specific topics, which are often complementary to the information contained in
Wikipedia, and thus in DBpedia and YAGO. Extracting these Wikis with the DBpedia
extraction framework is possible, but results in many isolated knowledge graphs. In this paper,
we show how to create one consolidated knowledge graph, called DBkWik, from thousands
of Wikis. We perform entity resolution and schema matching, and show that the resulting
large-scale knowledge graph is complementary to DBpedia. Furthermore, we discuss the
potential use of DBkWik as a benchmark for knowledge graph matching.
Keywords Knowledge graph creation · Information extraction · Linked open data ·
Knowledge graph matching
1 Introduction
General purpose knowledge graphs, such as DBpedia, YAGO, and Wikidata, have become
a central part of the linked open data cloud [49] and are among the most frequently used
datasets within the Web of data [8]. Such knowledge graphs contain information on millions
of entities from multiple topical domains [37].
Many of the popular knowledge graphs are created from Wikipedia and hence have
similar coverage [47]. Generally speaking, each real-world entity for which a dedicated
Wikipedia page exists becomes an entity in the knowledge graph. This is a fundamental
restriction for many applications. For example, for building content-based recommender
systems backed by knowledge graphs, Di Noia et al. showed that DBpedia covers no more
than 85% of the movies, 63% of the music artists, and 31% of the books in popular
recommender system datasets [35].
Sven Hertling
sven@informatik.uni-mannheim.de
Heiko Paulheim
heiko@informatik.uni-mannheim.de
¹ Data and Web Science Group, University of Mannheim, Mannheim, Germany