Chapter

MOCHA 2017 as a Challenge for Virtuoso


Abstract

The Mighty Storage Challenge (MOCHA) aims to test the performance of solutions for SPARQL processing in several aspects relevant to modern Linked Data applications. Virtuoso, by OpenLink Software, is a modern enterprise-grade solution for data access, integration, and relational database management, which provides a scalable RDF Quad Store. In this paper, we present a short overview of Virtuoso with a focus on RDF triple storage and SPARQL query execution. Furthermore, we showcase the final results of the MOCHA 2017 challenge and its tasks, along with a comparison between the performance of our system and the other participating systems.


... Last year's Mighty Storage Challenge, MOCHA 2017, was quite successful for our team and Virtuoso: we won the overall challenge [5,9]. Building on that, we intend to participate in this year's challenge as well, in all four challenge tasks: (i) RDF data ingestion, (ii) data storage, (iii) versioning and (iv) browsing. ...
... A more detailed description of Virtuoso's triple storage, the compression implementation and the translation of SPARQL queries into SQL queries is available in our paper from MOCHA 2017 [9]. ...
Conference Paper
Following the success of Virtuoso at last year's Mighty Storage Challenge, MOCHA 2017, we decided to participate once again and test the latest Virtuoso version against the new tasks which comprise the MOCHA 2018 challenge. The aim of the challenge is to test the performance of solutions for SPARQL processing in aspects relevant for modern applications: ingesting data, answering queries on large datasets and serving as a backend for applications driven by Linked Data. The challenge tests the systems against data derived from real applications and with realistic loads, with an emphasis on dealing with changing data in the form of streams or updates. Virtuoso, by OpenLink Software, is a modern enterprise-grade solution for data access, integration, and relational database management, which provides a scalable RDF Quad Store. In this paper, we present the training phase results of Virtuoso for the MOCHA 2018 challenge. These results will serve as a guideline for improvements in Virtuoso which will then be tested as part of the main MOCHA 2018 challenge.
... It is written in the Java programming language, and builds on a previous generator, in order to improve some of the metrics in the resulting graph and make its features closer to a real-world RDF dataset. Aside from this, our team has also worked with other RDF graph generators, for instance in the field of geo-spatial data [16][15][14] and in benchmarking RDF storage solutions [13][18]. All of these examples include purpose-built RDF data generators, which serve a specific need. ...
Preprint
This paper introduces RDFGraphGen, a general-purpose, domain-independent generator of synthetic RDF graphs based on SHACL constraints. The Shapes Constraint Language (SHACL) is a W3C standard which specifies ways to validate data in RDF graphs, by defining constraining shapes. Although the main purpose of SHACL is the validation of existing RDF data, we envisioned and implemented a reverse role for it to address the lack of available RDF datasets in many RDF-based application development processes: we use SHACL shape definitions as a starting point to generate synthetic data for an RDF graph. The generation process involves extracting the constraints from the SHACL shapes, converting the specified constraints into rules, and then generating artificial data for a predefined number of RDF entities, based on these rules. RDFGraphGen is intended to generate small, medium or large RDF knowledge graphs for benchmarking, testing, quality control, training and similar uses in applications from the RDF, Linked Data and Semantic Web domain. RDFGraphGen is open-source and is available as a ready-to-use Python package.
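The three steps named in the abstract above (extract constraints, turn them into rules, generate entities) can be illustrated with a minimal sketch. It does not use RDFGraphGen's actual API: the `PERSON_SHAPE` dictionary is a hypothetical, already-parsed stand-in for a SHACL node shape, its keys loosely mirror `sh:targetClass`, `sh:path`, `sh:datatype`, `sh:minCount` and `sh:maxCount`, and all `example.org` IRIs are invented for illustration.

```python
import random

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical, simplified stand-in for the constraints extracted from a SHACL shape.
PERSON_SHAPE = {
    "target_class": "http://example.org/Person",
    "properties": [
        {"path": "http://example.org/name", "datatype": "string",
         "min_count": 1, "max_count": 1},
        {"path": "http://example.org/age", "datatype": "integer",
         "min_count": 0, "max_count": 1},
    ],
}


def make_value(datatype, rng):
    """Rule for one datatype: return a random value of the matching Python type."""
    if datatype == "integer":
        return rng.randint(0, 100)
    return "value-" + str(rng.randint(0, 9999))


def generate(shape, n_entities, seed=0):
    """Generate (subject, predicate, object) triples for n_entities entities."""
    rng = random.Random(seed)
    triples = []
    for i in range(n_entities):
        subject = f"http://example.org/entity/{i}"
        triples.append((subject, RDF_TYPE, shape["target_class"]))
        for prop in shape["properties"]:
            # Respect the cardinality bounds of the property constraint.
            count = rng.randint(prop["min_count"], prop["max_count"])
            for _ in range(count):
                value = make_value(prop["datatype"], rng)
                triples.append((subject, prop["path"], value))
    return triples


if __name__ == "__main__":
    for triple in generate(PERSON_SHAPE, 3):
        print(triple)
```

A real generator would parse the shapes from an RDF file and cover many more constraint types; the sketch only shows how cardinality and datatype constraints can drive generation.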
Article
This paper details RDF/OWL storage and management in two popular Relational Database Management Systems (RDBMSs): Oracle and Virtuoso. Popularity, sustainability, and conformance with the SPARQL language are the main reasons for choosing these systems. This work combines empirical and analytical studies guided by a comparative framework developed and motivated in the paper. Several dimensions are considered, including RDF data type preservation, SPARQL query and update processing, reasoning capabilities, custom inferences, blank node management, and other functional and non-functional features. Furthermore, a review of the performance assessments reported in the literature has been conducted. This study’s results identify the advantages and shortcomings of these RDBMSs for storing and managing RDF/OWL. They can help improve these systems or serve as a guide in choosing an appropriate system to use in a project context. Opportunities for further research efforts are also suggested.
Chapter
Following the success of Virtuoso at last year’s Mighty Storage Challenge, MOCHA 2017, we decided to participate once again and test the latest Virtuoso version against the new tasks which comprise the MOCHA 2018 challenge. The aim of the challenge is to test the performance of solutions for SPARQL processing in aspects relevant for modern applications: ingesting data, answering queries on large datasets and serving as a backend for applications driven by Linked Data. The challenge tests the systems against data derived from real applications and with realistic loads, with an emphasis on dealing with changing data in the form of streams or updates. Virtuoso, by OpenLink Software, is a modern enterprise-grade solution for data access, integration, and relational database management, which provides a scalable RDF Quad Store. In this paper, we present the final challenge results from MOCHA 2018 for Virtuoso v8.0, compared to the other participating systems. Based on these results, Virtuoso v8.0 was declared the overall winner of MOCHA 2018.
Conference Paper
Triple stores are the backbone of increasingly many Data Web applications. It is thus evident that the performance of those stores is mission critical for individual projects as well as for data integration on the Data Web in general. Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triple store implementations. In this paper, we propose a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational and triple stores and, thus, settled on measuring performance against a relational database which had been converted to RDF by using SQL-like queries. In contrast to those approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful to compare existing triple stores and provide results for the popular triple store implementations Virtuoso, Sesame, Jena-TDB, and BigOWLIM. The subsequent comparison of our results with other benchmark results indicates that the performance of triple stores is by far less homogeneous than suggested by previous benchmarks.
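As a rough illustration of the SPARQL feature analysis step mentioned in the abstract above, the following sketch groups logged queries by the set of SPARQL keywords they use. It is an assumption-laden simplification, not the authors' procedure: the `FEATURES` tuple is an arbitrary selection, matching is done with plain regular expressions instead of a SPARQL parser, and the sample queries are invented.

```python
import re
from collections import Counter

# Arbitrary selection of SPARQL keywords to look for (an assumption of this sketch).
FEATURES = ("DISTINCT", "FILTER", "OPTIONAL", "UNION", "REGEX", "LANG")


def features_of(query):
    """Return the set of tracked keywords that occur as whole words in the query."""
    upper = query.upper()
    return frozenset(f for f in FEATURES if re.search(r"\b" + f + r"\b", upper))


def feature_profile(queries):
    """Count how many logged queries share each combination of features."""
    return Counter(features_of(q) for q in queries)


if __name__ == "__main__":
    log = [
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . FILTER(?o > 5) }",
        "SELECT ?s WHERE { ?s ?p ?o . OPTIONAL { ?s ?q ?r } }",
        "select distinct ?x where { ?x ?p ?o filter(?o < 3) }",
    ]
    for feature_set, count in feature_profile(log).most_common():
        print(sorted(feature_set), count)
```

Keyword matching of this kind can misfire on variable names or literals that happen to contain a keyword, so a production pipeline would parse the queries properly before clustering them.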
Conference Paper
This paper discusses RDF-related work in the context of OpenLink Virtuoso, a general-purpose relational / federated database and applications platform. We discuss adapting a relational engine for native RDF support with dedicated data types, bitmap indexing and SQL optimizer techniques. We further discuss mapping existing relational data into RDF for SPARQL access without converting the data into physical triples. We present conclusions and metrics as well as a number of use cases, from DBpedia to bioinformatics and collaborative web applications.
Article
RDF (Resource Description Framework) is seeing rapidly increasing adoption, for example, in the context of the Linked Open Data (LOD) movement and diverse life sciences data publishing and integration projects. This paper discusses how we have adapted OpenLink Virtuoso, a general-purpose RDBMS, for this new type of workload. We discuss adapting Virtuoso's relational engine for native RDF support with dedicated data types, bitmap indexing and SQL optimizer techniques. We further discuss scaling out by running on a cluster of commodity servers, each with local memory and disk. We look at how this impacts query planning and execution and how we achieve high parallel utilization of multiple CPU cores on multiple servers. We present comparisons with other RDF storage models as well as other approaches to scaling out on server clusters. We present conclusions and metrics as well as a number of use cases, from DBpedia to bioinformatics and collaborative web applications.
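The general idea of serving RDF from a relational engine, as described in the abstract above, can be sketched with a single four-column quad table and a hand-written SQL translation of a two-pattern SPARQL query. This is only an illustration using SQLite: the table name `quad`, the index, the plain-text IRIs and the sample data are assumptions of the sketch, and none of Virtuoso's dedicated data types, bitmap indexes or optimizer techniques are modelled.

```python
import sqlite3


def build_store():
    """Create an in-memory quad table (graph, subject, predicate, object)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quad (g TEXT, s TEXT, p TEXT, o TEXT)")
    conn.execute("CREATE INDEX quad_psog ON quad (p, s, o, g)")
    conn.executemany(
        "INSERT INTO quad VALUES (?, ?, ?, ?)",
        [
            ("g1", "alice", "knows", "bob"),
            ("g1", "bob", "name", "Bob"),
            ("g1", "alice", "name", "Alice"),
        ],
    )
    return conn


def friend_names(conn, person):
    """SQL counterpart of:
    SELECT ?name WHERE { <person> <knows> ?x . ?x <name> ?name }
    Each triple pattern becomes one alias of the quad table; the shared
    variable ?x becomes the join condition t2.s = t1.o.
    """
    rows = conn.execute(
        "SELECT t2.o FROM quad AS t1 JOIN quad AS t2 ON t2.s = t1.o "
        "WHERE t1.s = ? AND t1.p = ? AND t2.p = ?",
        (person, "knows", "name"),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(friend_names(build_store(), "alice"))
```

The self-join makes the cost model visible: every additional triple pattern adds another join over the same table, which is why index layout and join ordering matter so much for this storage model.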
O. Erling. Virtuoso, a Hybrid RDBMS/Graph Column Store. https://virtuoso.openlinksw.com/dataspace
K. Georgala. Data Extraction Benchmark for Sensor Data
M. Jovanovik, M. Spasić. First Version of the Data Storage Benchmark
H. Petzka. First Version of the Faceted Browsing Benchmark