D2R MAP – A Database to RDF Mapping Language
Christian Bizer
Freie Universität Berlin
Institut für Produktion,
Wirtschaftsinformatik und OR
Garystr. 21, D-14195 Berlin, Germany
+49 30 838 54057
bizer@wiwiss.fu-berlin.de
ABSTRACT
The vision of the Semantic Web is to give data on the web a well-defined meaning by representing it in RDF and linking it to commonly accepted ontologies. Most formatted data today is stored in relational databases. To be able to use this data on the Semantic Web, we need a flexible but easy to use mechanism to map relational data into RDF. The poster presents D2R MAP, a declarative language to describe mappings between relational database schemata and OWL ontologies.
Categories and Subject Descriptors
H.2.3 [Database Management]: Languages
General Terms
Standardization, Languages
Keywords
Semantic Web, relational databases, RDF data model, mapping
1. INTRODUCTION
The Semantic Web is an extension of the current Web, in which
data is given a well-defined meaning by representing it in RDF
and linking it to commonly accepted ontologies. This semantic
enrichment allows data to be shared, exchanged or integrated from
different sources and enables applications to use data in different
contexts [1].
Most formatted data today is stored in relational databases. To be
able to use this data in a semantic context, it has to be mapped
into RDF, the data format of the Semantic Web.
The data model behind RDF is a directed labelled graph, which
consists of nodes and labelled directed arcs linking pairs of nodes
[2]. To export data from an RDBMS into RDF, the relational
database model has to be mapped to the graph-based RDF data
model.
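As a minimal sketch of such a mapping (reusing the example namespace and URI pattern from Section 3; the title value is hypothetical), a single row of a books table would become RDF triples such as:

# hypothetical row (isbn = 321230273, title = "Some Title") from a books table
<http://example.org#book321230273> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org#Book> .
<http://example.org#book321230273> <http://example.org#title> "Some Title"@en .

The row becomes a node, one column supplies its identity, and each remaining column value is attached to it via a labelled arc.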
2. D2R LANGUAGE FEATURES
D2R MAP is a declarative, XML-based language to describe such mappings. The main goal of the language design is to allow flexible mappings of complex relational structures without having to change the existing database schema. This flexibility is achieved by employing SQL statements directly in the mapping rules. The resulting record sets are grouped afterwards and the data is mapped to the created instances. This approach allows the handling of binary and higher-degree relationships, multivalued class properties, complex conditions and highly normalized table structures, where instance data is spread over several tables.
The mapping process performed by the D2R processor has four logical steps, as shown in Figure 1. First, for each class or group of similar classes a record set is selected from the database. Second, the record set is grouped according to the groupBy columns of the specific ClassMap. Third, the class instances are created and assigned a URI or a blank node identifier. Finally, the instance properties are created using datatype and object property bridges. The separation between steps three and four allows references to blank nodes within the model and to instances dynamically created in the mapping process.
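The following minimal ClassMap, reusing the attributes of the full example in Section 3, sketches how the four steps correspond to the mapping elements:

<!-- step 1: the sql attribute selects the record set -->
<!-- step 2: the groupBy attribute groups the records, here one group per isbn -->
<!-- step 3: the uriPattern attribute assigns each instance a URI -->
<!-- step 4: property bridges create the instance properties -->
<ClassMap type="ex:Book"
          sql="SELECT isbn, title FROM books;"
          groupBy="isbn"
          uriPattern="ex:book@@isbn@@">
  <DataTypePropertyBridge property="ex:title" column="title"/>
</ClassMap>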
The second goal is to keep D2R MAP as simple as possible. Apart from elements to describe the database connection and the namespaces used, the actual mappings are expressed with just three elements. For each class or group of similar classes in the ontology, a ClassMap element is used. Each ClassMap has an sql attribute and a groupBy attribute. To create instance URIs, patterns and value substitution tables can be used. Instance properties are constructed with DataTypePropertyBridge elements for literal properties, which can be typed using XML datatypes and xml:lang attributes. Datatype property values can be transformed similarly using patterns and value substitution tables. References to external resources or to instances within the model are created with an ObjectPropertyBridge element. To refer to instances created on the fly, a referredClass attribute together with a referredGroupBy attribute is used. Multiple values of a single property can be put into rdf:Bag, rdf:Alt or rdf:Seq containers, using the useContainer attribute together with a DataTypePropertyBridge or ObjectPropertyBridge element.
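To illustrate the typing of literal properties, a bridge might look as follows. Note that the attribute name datatype and the ex:pages property are assumptions made here for illustration only; consult the language specification (URL in Section 3) for the exact syntax:

<!-- sketch; datatype attribute name and ex:pages property are assumed -->
<DataTypePropertyBridge property="ex:pages"
                        column="pages"
                        datatype="xsd:integer"/>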
Figure 1. The D2R mapping process: database tables → record set → grouped record set → class instances → instance properties.
3. EXAMPLE
The following example illustrates the use of a D2R MAP to export data about authors and their publications from a database into RDF. Because authors usually have more than one publication and publications can be written by multiple authors, the information would typically be stored in three database tables: one for the authors, one for their publications and a third one for the n:m relationship between authors and publications. A D2R MAP transformation of these tables into the classes ex:Author and ex:Book is shown below.
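The SQL statements in the map imply roughly the following table layout (a sketch inferred from the queries; primary and foreign keys are assumptions):

books(isbn, title)
authors(aid, name, URL)
bookauthor(aid, isbn)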
<Map>
  <DBConnection odbcDSN="bookDB"/>
  <ProcessorMessage outputFormat="RDF/XML-ABBREV"/>
  <Namespace prefix="ex" namespace="http://example.org#"/>
  <ClassMap type="ex:Book"
            sql="SELECT isbn, title FROM books;"
            groupBy="isbn"
            uriPattern="ex:book@@isbn@@">
    <DataTypePropertyBridge property="ex:title"
                            column="title" xml:lang="en"/>
  </ClassMap>
  <ClassMap type="ex:Author"
            sql="SELECT authors.aid, name, URL, isbn
                 FROM authors, bookauthor
                 WHERE authors.aid = bookauthor.aid;"
            groupBy="authors.aid">
    <DataTypePropertyBridge property="ex:fullname"
                            column="name"/>
    <ObjectPropertyBridge property="ex:homepage"
                          column="URL"/>
    <ObjectPropertyBridge property="ex:author_of"
                          referredClass="ex:Book"
                          referredGroupBy="isbn"
                          useContainer="rdf:Bag"/>
  </ClassMap>
</Map>
The first three subelements define the database connection, the desired output format and an example namespace. The first ClassMap element describes the mapping for the ex:Book class. The instance URIs are created using a pattern; for example, ex:book@@isbn@@ expands to http://example.org#book321230273 for the ISBN 321230273. The xml:lang attribute is set for the title property.
The second ClassMap describes the creation of ex:Author instances and links the authors to their publications using an rdf:Bag container for the ex:author_of property. Because the ex:Author class map contains no URI construction scheme, the instances are represented as blank nodes. The following example instance is created with the map above:
<ex:Author rdf:nodeID='A465'>
  <ex:fullname>Chris Bizer</ex:fullname>
  <ex:homepage rdf:resource='http://www.bizer.de'/>
  <ex:author_of>
    <rdf:Bag>
      <rdf:li rdf:resource='http://example.org#book321230273'/>
      <rdf:li rdf:resource='http://example.org#book884237273'/>
    </rdf:Bag>
  </ex:author_of>
</ex:Author>
This example shows only some features of D2R MAP. The complete language specification and further examples can be found at http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm.
A D2R processor prototype is publicly available under the GNU LGPL license. The processor is implemented in Java and is based on the Jena API [3]. It exports data as RDF/XML, N3, N-TRIPLES and Jena models. It is compatible with all relational databases offering JDBC or ODBC access. The processor can be used in a servlet environment to dynamically publish XHTML pages containing RDF, as a database connector in applications working with Jena models, or as a command-line tool.
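For instance, the ex:Author instance shown above would export in the N-TRIPLES format roughly as follows (blank node labels are arbitrary):

# same graph as the RDF/XML example above; _:A465 and _:b1 are arbitrary labels
_:A465 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org#Author> .
_:A465 <http://example.org#fullname> "Chris Bizer" .
_:A465 <http://example.org#homepage> <http://www.bizer.de> .
_:A465 <http://example.org#author_of> _:b1 .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag> .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <http://example.org#book321230273> .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2> <http://example.org#book884237273> .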
4. RELATED WORK
Other mapping approaches that have influenced the design of D2R MAP were developed by the AIFB Institute at the University of Karlsruhe, Germany [4], and by Boeing, Philadelphia, USA [5]. It is planned to extend D2R MAP with conditional mappings and more sophisticated value transformation abilities. These extensions could be based on RuleML [6], RDFT [7] or further language constructs borrowed from XSLT.
REFERENCES
[1] James Hendler, Tim Berners-Lee, Eric Miller. Integrating Applications on the Semantic Web. Journal of the Institute of Electrical Engineers of Japan, Vol. 122(10), October 2002, pp. 676-680.
[2] Graham Klyne, Jeremy Carroll (eds.). Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Working Draft (work in progress), November 2002. http://www.w3.org/TR/2002/WD-rdf-concepts-20021108/
[3] Brian McBride. Jena: Implementing the RDF Model and Syntax Specification. Technical report, Hewlett-Packard Laboratories (Bristol, 2000). http://www.hpl.hp.com/semweb/index.html
[4] Nenad Stojanovic, Ljiljana Stojanovic, Raphael Volz. A Reverse Engineering Approach for Migrating Data-intensive Web Sites to the Semantic Web. IIP-2002 (Montreal, 2002). http://www.aifb.uni-karlsruhe.de/WBS/nst/docs/papers/IIPv31finalv1.pdf
[5] Tom Barrett et al. RDF Representation of Metadata for Semantic Integration of Corporate Information Resources. WWW2002 (Hawaii, 2002). http://www.cs.rutgers.edu/~shklar/www11/final_submissions/paper3.pdf
[6] RuleML. http://www.dfki.uni-kl.de/ruleml/
[7] Borys Omelayenko. RDFT: A Mapping Meta-Ontology for Business Integration. ECAI-2002 (Lyon, 2002). http://www.cs.vu.nl/~borys/papers/rdft4ktsw02.pdf