Heiko Paulheim

Heiko Paulheim
Universität Mannheim · Data and Web Science Group

PhD

About

238
Publications
27,870
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
4,817
Citations

Publications

Publications (238)
Article
The Semantic Web is distributed yet interoperable: Distributed since resources are created and published by a variety of producers, tailored to their specific needs and knowledge; Interoperable as entities are linked across resources, allowing to use resources from different providers in concord. Complementary to the explicit usage of Semantic Web...
Preprint
Full-text available
The entity type information in Knowledge Graphs (KGs) such as DBpedia, Freebase, etc. is often incomplete due to automated generation or human curation. Entity typing is the task of assigning or inferring the semantic type of an entity in a KG. This paper presents \textit{GRAND}, a novel approach for entity typing leveraging different graph walk st...
Preprint
Full-text available
Knowledge graphs have emerged as an effective tool for managing and standardizing semistructured domain knowledge in a human- and machine-interpretable way. In terms of graph-based domain applications, such as embeddings and graph neural networks, current research is increasingly taking into account the time-related evolution of the information enc...
Preprint
Full-text available
Knowledge graph embedding is a representation learning technique that projects entities and relations in a knowledge graph to continuous vector spaces. Embeddings have gained a lot of uptake and have been heavily used in link prediction and other downstream prediction tasks. Most approaches are evaluated on a single task or a single group of tasks...
Preprint
Full-text available
One of the strongest signals for automated matching of knowledge graphs and ontologies are textual concept descriptions. With the rise of transformer-based language models, text comparison based on meaning (rather than lexical features) is available to researchers. However, performing pairwise comparisons of all textual descriptions of concepts in...
Preprint
Full-text available
Medical diagnosis is the process of making a prediction of the disease a patient is likely to have, given a set of symptoms and observations. This requires extensive expert knowledge, in particular when covering a large variety of diseases. Such knowledge can be coded in a knowledge graph -- encompassing diseases, symptoms, and diagnosis paths. Sin...
Preprint
Full-text available
Ontology matching is a core task when creating interoperable and linked open datasets. In this paper, we explore a novel structure-based mapping approach which is based on knowledge graph embeddings: The ontologies to be matched are embedded, and an approach known as absolute orientation is used to align the two embedding spaces. Next to the approa...
Preprint
Full-text available
RDF2vec is a knowledge graph embedding mechanism which first extracts sequences from knowledge graphs by performing random walks, then feeds those into the word embedding algorithm word2vec for computing vector representations for entities. In this poster, we introduce two new flavors of walk extraction coined e-walks and p-walks, which put an emph...
Preprint
Full-text available
News recommender systems are used by online news providers to alleviate information overload and to provide personalized content to users. However, algorithmic news curation has been hypothesized to create filter bubbles and to intensify users' selective exposure, potentially increasing their vulnerability to polarized opinions and fake news. In th...
Article
Full-text available
Knowledge Graph Embeddings, i.e., projections of entities and relations to lower dimensional spaces, have been proposed for two purposes: (1) providing an encoding for data mining tasks, and (2) predicting links in a knowledge graph. Both lines of research have been pursued rather in isolation from each other so far, each with their own benchmarks...
Chapter
In recent scientific work, a classification-based approach to the machinery prognostics problem has been elaborated as an alternative to the Remaining Useful Life approaches. The classification-based approaches rely on a prediction horizon parameter, to which the model quality is sensitive. However, existing studies do not provide any means of dete...
Preprint
Full-text available
Knowledge graphs (KGs) provide information in machine interpretable form. In cases where multiple KGs are used in the same system, that information needs to be integrated. This is usually done by automated matching systems. Most of those systems consider only 1:1 (binary) matching tasks. Thus, matching a larger number of knowledge graphs with such...
Preprint
Full-text available
CaLiGraph is a large-scale cross-domain knowledge graph generated from Wikipedia by exploiting the category system, list pages, and other list structures in Wikipedia, containing more than 15 million typed entities and around 10 million relation assertions. Other than knowledge graphs such as DBpedia and YAGO, whose ontologies are comparably simpli...
Conference Paper
The Ontology Alignment Evaluation Initiative (OAEI) aims at comparing ontology matching systems on precisely defined test cases. These test cases can be based on ontologies of different levels of complexity and use different evaluation modalities (e.g., blind evaluation, open evaluation, or consensus). The OAEI 2021 campaign offered 13 tracks and w...
Chapter
The use of external background knowledge can be beneficial for the task of matching schemas or ontologies automatically. In this paper, we exploit six general-purpose knowledge graphs as sources of background knowledge for the matching task. The background sources are evaluated by applying three different exploitation strategies. We find that expli...
Chapter
Pricing decisions are increasingly made by AI. Thanks to their ability to train with live market data while making decisions on the fly, deep reinforcement learning algorithms are especially effective in taking such pricing decisions. In e-commerce scenarios, multiple reinforcement learning agents can set prices based on their competitor’s prices....
Conference Paper
Full-text available
One of the strongest signals for automated matching of ontologies and knowledge graphs are the textual descriptions of the concepts. The methods that are typically applied (such as character-or token-based comparisons) are relatively simple, and therefore do not capture the actual meaning of the texts. With the rise of transformer-based language mo...
Preprint
Full-text available
One of the strongest signals for automated matching of ontologies and knowledge graphs are the textual descriptions of the concepts. The methods that are typically applied (such as character- or token-based comparisons) are relatively simple, and therefore do not capture the actual meaning of the texts. With the rise of transformer-based language m...
Chapter
Wissensgraphen sind eine gebräuchliche Form der Wissensrepräsentation, die aus dem Bereich Semantic Web stammt. Sie werden heute in vielen Anwendungen verwendet; prominent ist vor allem die Verwendung in der Web-Suche von Google, die auch den Begriff Wissensgraph (engl. Knowledge Graph) geprägt hat. In einem Wissensgraph werden Dinge der Welt (z. B...
Chapter
As Knowledge Graphs are symbolic constructs, specialized techniques have to be applied in order to make them compatible with data mining techniques. RDF2Vec is an unsupervised technique that can create task-agnostic numerical representations of the nodes in a KG by extending successful language modeling techniques. The original work proposed the We...
Preprint
Full-text available
The RDF2vec method for creating node embeddings on knowledge graphs is based on word2vec, which, in turn, is agnostic towards the position of context words. In this paper, we argue that this might be a shortcoming when training RDF2vec, and show that using a word2vec variant which respects order yields considerable performance gains especially on t...
Preprint
Full-text available
Pricing decisions are increasingly made by AI. Thanks to their ability to train with live market data while making decisions on the fly, deep reinforcement learning algorithms are especially effective in taking such pricing decisions. In e-commerce scenarios, multiple reinforcement learning agents can set prices based on their competitor's prices....
Preprint
Full-text available
Modern large-scale knowledge graphs, such as DBpedia, are datasets which require large computational resources to serve and process. Moreover, they often have longer release cycles, which leads to outdated information in those graphs. In this paper, we present DBpedia on Demand -- a system which serves DBpedia resources on demand without the need t...
Chapter
Python is currently the most used platform for data science and machine learning. At the same time, public knowledge graphs have been identified as a valuable source of background knowledge in many data science tasks. In this paper, we introduce the kgextension package for Python, which allows for using knowledge graph in data science pipelines bui...
Chapter
Full-text available
The entity type information in a Knowledge Graph (KG) plays an important role in a wide range of applications in Natural Language Processing such as entity linking, question answering, relation extraction, etc. However, the available entity types are often noisy and incomplete. Entity Typing is a non-trivial task if enough information is not availa...
Preprint
Full-text available
The use of external background knowledge can be beneficial for the task of matching schemas or ontologies automatically. In this paper, we exploit six general-purpose knowledge graphs as sources of background knowledge for the matching task. The background sources are evaluated by applying three different exploitation strategies. We find that expli...
Preprint
Full-text available
In today's academic publishing model, especially in Computer Science, conferences commonly constitute the main platforms for releasing the latest peer-reviewed advancements in their respective fields. However, choosing a suitable academic venue for publishing one's research can represent a challenging task considering the plethora of available conf...
Chapter
Tables on the web constitute a valuable data source for many applications, like factual search and knowledge base augmentation. However, as genuine tables containing relational knowledge only account for a small proportion of tables on the web, reliable genuine web table classification is a crucial first step of table extraction. Previous works usu...
Preprint
Taxonomies are an important ingredient of knowledge organization, and serve as a backbone for more sophisticated knowledge representations in intelligent systems, such as formal ontologies. However, building taxonomies manually is a costly endeavor, and hence, automatic methods for taxonomy induction are a good alternative to build large-scale taxo...
Preprint
Full-text available
Public knowledge graphs such as DBpedia and Wikidata have been recognized as interesting sources of background knowledge to build content-based recommender systems. They can be used to add information about the items to be recommended and links between those. While quite a few approaches for exploiting knowledge graphs have been proposed, most of t...
Article
Full-text available
The problem of automatic detection of fake news in social media, e.g., on Twitter, has recently drawn some attention. Although, from a technical perspective, it can be regarded as a straight-forward, binary classification problem, the major challenge is the collection of large enough training corpora, since manual annotation of tweets as fake or no...
Preprint
Full-text available
This paper presents the FinMatcher system and its results for the FinSim 2021 shared task which is co-located with the Workshop on Financial Technology on the Web (FinWeb) in conjunction with The Web Conference. The FinSim-2 shared task consists of a set of concept labels from the financial services domain. The goal is to find the most relevant top...
Preprint
Tables on the web constitute a valuable data source for many applications, like factual search and knowledge base augmentation. However, as genuine tables containing relational knowledge only account for a small proportion of tables on the web, reliable genuine web table classification is a crucial first step of table extraction. Previous works usu...
Preprint
Full-text available
Knowledge about entities and their interrelations is a crucial factor of success for tasks like question answering or text summarization. Publicly available knowledge graphs like Wikidata or DBpedia are, however, far from being complete. In this paper, we explore how information extracted from similar entities that co-occur in structures like table...
Book
This book constitutes the refereed proceedings of the 18th International Semantic Web Conference, ESWC 2021, held virtually in June 2021. The 41 full papers and 2 short papers presented were carefully reviewed and selected from 167 submissions. The papers were submitted to three tracks: the research track, the resource track and the in-use track. T...
Preprint
Full-text available
One of the grand challenges discussed during the Dagstuhl Seminar "Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web" and described in its report is that of a: "Public FAIR Knowledge Graph of Everything: We increasingly see the creation of knowledge graphs that capture information about the entirety of a class of ent...
Chapter
In this demo, we introduce MELT Dashboard, an interactive Web user interface for ontology alignment evaluation which is created with the existing Matching EvaLuation Toolkit (MELT). Compared to existing, static evaluation interfaces in the ontology matching domain, our dashboard allows for interactive self-service analyses such as a drill down into...
Chapter
The taxation of multi-national companies is a complex field, since it is influenced by the legislation of several states. Laws in different states may have unforeseen interaction effects, which can be exploited by allowing multinational companies to minimize taxes, a concept known as tax planning. In this paper, we present a knowledge graph of mult...
Preprint
Full-text available
In this paper, we present MELT-ML, a machine learning extension to the Matching and EvaLuation Toolkit (MELT) which facilitates the application of supervised learning for ontology and instance matching. Our contributions are twofold: We present an open source machine learning extension to the matching toolkit as well as two supervised learning use...
Preprint
Knowledge graph embedding approaches represent nodes and edges of graphs as mathematical vectors. Current approaches focus on embedding complete knowledge graphs, i.e. all nodes and edges. This leads to very high computational requirements on large graphs such as DBpedia or Wikidata. However, for most downstream application scenarios, only a small...
Preprint
Full-text available
As KGs are symbolic constructs, specialized techniques have to be applied in order to make them compatible with data mining techniques. RDF2Vec is an unsupervised technique that can create task-agnostic numerical representations of the nodes in a KG by extending successful language modelling techniques. The original work proposed the Weisfeiler-Leh...
Preprint
Full-text available
RDF2vec is an embedding technique for representing knowledge graph entities in a continuous vector space. In this paper, we investigate the effect of materializing implicit A-box axioms induced by subproperties, as well as symmetric and transitive properties. While it might be a reasonable assumption that such a materialization before computing emb...
Preprint
Full-text available
The taxation of multi-national companies is a complex field, since it is influenced by the legislation of several states. Laws in different states may have unforeseen interaction effects, which can be exploited by allowing multinational companies to minimize taxes, a concept known as tax planning. In this paper, we present a knowledge graph of mult...
Article
Full-text available
Popular cross-domain knowledge graphs, such as DBpedia and YAGO, are built from Wikipedia, and therefore similar in coverage. In contrast, Wikifarms like Fandom contain Wikis for specific topics, which are often complementary to the information contained in Wikipedia, and thus DBpedia and YAGO. Extracting these Wikis with the DBpedia extraction fra...
Chapter
The Ontology Alignment Evaluation Initiative (OAEI) is an annual evaluation of ontology matching tools. In 2018, we have started the Knowledge Graph track, whose goal is to evaluate the simultaneous matching of entities and schemas of large-scale knowledge graphs. In this paper, we discuss the design of the track and two different strategies of gol...
Chapter
When it comes to factual knowledge about a wide range of domains, Wikipedia is often the prime source of information on the web. DBpedia and YAGO, as large cross-domain knowledge graphs, encode a subset of that knowledge by creating an entity for each page in Wikipedia, and connecting them through edges. It is well known, however, that Wikipedia-ba...
Preprint
Full-text available
In this demo, we introduce MELT Dashboard, an interactive Web user interface for ontology alignment evaluation which is created with the existing Matching EvaLuation Toolkit (MELT). Compared to existing, static evaluation interfaces in the ontology matching domain, our dashboard allows for interactive self-service analyses such as a drill down into...
Preprint
Full-text available
RDF2vec is a technique for creating vector space embeddings from an RDF knowledge graph, i.e., representing each entity in the graph as a vector. It first creates sequences of nodes by performing random walks on the graph. In a second step, those sequences are processed by the word2vec algorithm for creating the actual embeddings. In this paper, we...
Article
Full-text available
It is conventional wisdom in machine learning and data mining that logical models such as rule sets are more interpretable than other models, and that among such rule-based models, simpler models are more interpretable than more complex ones. In this position paper, we question this latter assumption by focusing on one particular aspect of interpre...
Preprint
When it comes to factual knowledge about a wide range of domains, Wikipedia is often the prime source of information on the web. DBpedia and YAGO, as large cross-domain knowledge graphs, encode a subset of that knowledge by creating an entity for each page in Wikipedia, and connecting them through edges. It is well known, however, that Wikipedia-ba...
Preprint
In this paper, we present KGvec2go, a Web API for accessing and consuming graph embeddings in a light-weight fashion in downstream applications. Currently, we serve pre-trained embeddings for four knowledge graphs. We introduce the service and its usage, and we show further that the trained models have semantic value by evaluating them on multiple...
Preprint
Knowledge Graphs are an emerging form of knowledge representation. While Google coined the term Knowledge Graph first and promoted it as a means to improve their search results, they are used in many applications today. In a knowledge graph, entities in the real world and/or a business domain (e.g., people, places, or events) are represented as nod...
Preprint
The Ontology Alignment Evaluation Initiative (OAEI) is an annual evaluation of ontology matching tools. In 2018, we have started the Knowledge Graph track, whose goal is to evaluate the simultaneous matching of entities and schemas of large-scale knowledge graphs. In this paper, we discuss the design of the track and two different strategies of gol...
Book
This book constitutes the refereed proceedings of the 17th International Semantic Web Conference, ESWC 2020, held in Heraklion, Crete, Greece.* The 39 revised full papers presented were carefully reviewed and selected from 166 submissions. The papers were submitted to three tracks: the research track, the resource track and the in-use track. These...
Chapter
Full-text available
SciGraph is a Linked Open Data graph published by Springer Nature which contains information about conferences and conference publications. In this paper, we discuss how this dataset can be utilized to build a conference recommendation system, yielding a recall@10 of up to 0.665, and a MAP of up to 0.540, generating recommendations based on authors...
Chapter
Full-text available
Knowledge Graph completion deals with the addition of missing facts to knowledge graphs. While quite a few approaches exist for type and link prediction in knowledge graphs, the addition of literal values (also called instance or entity attributes) is not very well covered in the literature. In this paper, we present an approach for extracting nume...
Chapter
Full-text available
In this paper, we present the Ontology Matching EvaLuation Toolkit (MELT), a software toolkit to facilitate ontology matcher development, configuration, evaluation, and packaging. Compared to existing tools in the ontology matching domain, our framework offers detailed evaluation capabilities on the correspondence level of alignments as well as ext...
Conference Paper
Full-text available
The Ontology Alignment Evaluation Initiative (OAEI) aims at comparing ontology matching systems on precisely defined test cases. These test cases can be based on ontologies of different levels of complexity (from simple thesauri to expressive OWL ontologies) and use different evaluation modalities (e.g., blind evaluation, open evaluation, or consen...
Chapter
Ensembles of machine learning models have proven to improve the performance of prediction tasks in various domains. The additional computational costs for the performance increase are usually high since multiple models must be trained. Recently, snapshot ensembles (Huang et al. in Snapshot ensembles: train 1 get M for free, (2017) [16]) provide a c...
Chapter
The Wikipedia category graph serves as the taxonomic backbone for large-scale knowledge graphs like YAGO or Probase, and has been used extensively for tasks like entity disambiguation or semantic similarity estimation. Wikipedia’s categories are a rich source of taxonomic as well as non-taxonomic information. The category German science fiction wri...
Chapter
Classical statistical models for time series forecasting most often make a number of assumptions about the data at hand, therewith, requiring intensive manual preprocessing steps prior to modeling. As a consequence, it is very challenging to come up with a more generic forecasting framework. Extensive hyperparameter optimization and ensemble archit...
Preprint
The Wikipedia category graph serves as the taxonomic backbone for large-scale knowledge graphs like YAGO or Probase, and has been used extensively for tasks like entity disambiguation or semantic similarity estimation. Wikipedia's categories are a rich source of taxonomic as well as non-taxonomic information. The category 'German science fiction wr...