Chapter

MOCHA2017: The Mighty Storage Challenge at ESWC 2017


Abstract

The aim of the Mighty Storage Challenge (MOCHA) at ESWC 2017 was to test the performance of solutions for SPARQL processing in aspects that are relevant for modern applications. These include ingesting data, answering queries on large datasets and serving as a backend for applications driven by Linked Data. The challenge tested the systems against data derived from real applications and with realistic loads. An emphasis was put on dealing with data in the form of streams or updates.
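The two core operations the challenge stresses, bulk ingestion of triples and answering queries over them, can be illustrated with a minimal sketch. This toy store is illustrative only: the challenge benchmarks full SPARQL engines (e.g., Virtuoso), and all names below are made up for the example.

```python
# Toy sketch of the two operations the challenge stresses:
# ingesting triples and answering triple-pattern queries.
# Not the challenge harness; purely illustrative.

class ToyTripleStore:
    def __init__(self):
        self.triples = set()

    def ingest(self, triples):
        """Bulk-insert a batch of (subject, predicate, object) triples."""
        self.triples.update(triples)

    def match(self, s=None, p=None, o=None):
        """Answer a single triple pattern; None acts as a wildcard."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = ToyTripleStore()
store.ingest([
    ("ex:sensor1", "ex:reading", "21.5"),
    ("ex:sensor1", "rdf:type", "ex:Sensor"),
    ("ex:sensor2", "rdf:type", "ex:Sensor"),
])
# Analogous to: SELECT ?s WHERE { ?s rdf:type ex:Sensor }
sensors = store.match(p="rdf:type", o="ex:Sensor")
```

Real systems under test must sustain this at high throughput on streaming sensor or event data, which is exactly where the ingestion task showed most triple stores struggling.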


... Consequently, HOBBIT abides by the FAIR principles [47]. The practical usability of the platform was ensured by its use in 11 challenges between 2016 and 2019 (e.g., [1,17,18,21,22,24,31,34,40,41]). HOBBIT is open-source and can be deployed locally, on a local cluster and on computing services such as Amazon Web Services (AWS). ...
... The Mighty Storage Challenges 2017 and 2018 focused on benchmarking triple stores [17,18]. Their RDF data ingestion tasks showed that most triple stores are unable to consume and retrieve triples (e.g., sensor or event data) efficiently. ...
... Our results show that both versions of Virtuoso show similar performance on instance retrieval and facet counts. More detailed results of that challenge will be published in the ESWC challenge proceedings [10]. Interestingly, the query-per-second score was the lowest for choke points 6, 7 and 12 (see Figure 4). ...
... While choke point 12 concerns advanced property paths, both choke points 6 and 7 concern the performance on numerical data restrictions (Figure 4: Preliminary results from the MOCHA challenge, query-per-second score [10]). While the challenge was run with limited success in terms of the number of participating triple stores, further iterations of the faceted browsing challenge will be organized within the project over the next 18 months, allowing further systems to participate. ...
Conference Paper
Full-text available
The increasing availability of large amounts of linked data creates a need for software that allows for its efficient exploration. Systems enabling faceted browsing constitute a user-friendly solution that needs to combine suitable choices for front and back end. Since a generic solution must be adjustable with respect to the dataset, the underlying ontology and the characteristics of the knowledge graph, several challenges arise that heavily influence the browsing experience. As a consequence, an understanding of these challenges becomes an important matter of study. We present a benchmark on faceted browsing of triple stores, which allows systems to test their performance on specific choke points on the back end. Further, we address additional issues in faceted browsing that may be caused by problematic modelling choices within the underlying ontology.
... The platform provides several default dataset generators, including PoDiGG, which can be used to benchmark RDF systems. PoDiGG and its generated datasets are being used in the ESWC Mighty Storage Challenges 2017 and 2018 [35]. The first task of this challenge consists of RDF data ingestion into triple stores, and querying over this data. ...
... Last year's Mighty Storage Challenge, MOCHA 2017, was quite successful for our team and Virtuoso: we won the overall challenge [5,9]. Building on that, we intend to participate in this year's challenge as well, in all four challenge tasks: (i) RDF data ingestion, (ii) data storage, (iii) versioning and (iv) browsing. ...
Conference Paper
Full-text available
Following the success of Virtuoso at last year's Mighty Storage Challenge - MOCHA 2017, we decided to participate once again and test the latest Virtuoso version against the new tasks which comprise the MOCHA 2018 challenge. The aim of the challenge is to test the performance of solutions for SPARQL processing in aspects relevant for modern applications: ingesting data, answering queries on large datasets and serving as backend for applications driven by Linked Data. The challenge tests the systems against data derived from real applications and with realistic loads, with an emphasis on dealing with changing data in the form of streams or updates. Virtuoso, by OpenLink Software, is a modern enterprise-grade solution for data access, integration, and relational database management, which provides a scalable RDF Quad Store. In this paper, we present the training phase results for Virtuoso, for the MOCHA 2018 challenge. These results will serve as a guideline for improvements in Virtuoso which will then be tested as part of the main MOCHA 2018 challenge.
Article
When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. For these reasons, synthetic dataset generators are typically preferred over real-world datasets due to their intrinsic flexibility. Unfortunately, many synthetic datasets that are generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PoDiGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world variants. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PoDiGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. Thereby, PoDiGG provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data.
Chapter
Following the success of Virtuoso at last year’s Mighty Storage Challenge - MOCHA 2017, we decided to participate once again and test the latest Virtuoso version against the new tasks which comprise the MOCHA 2018 challenge. The aim of the challenge is to test the performance of solutions for SPARQL processing in aspects relevant for modern applications: ingesting data, answering queries on large datasets and serving as backend for applications driven by Linked Data. The challenge tests the systems against data derived from real applications and with realistic loads, with an emphasis on dealing with changing data in the form of streams or updates. Virtuoso, by OpenLink Software, is a modern enterprise-grade solution for data access, integration, and relational database management, which provides a scalable RDF Quad Store. In this paper, we present the final challenge results from MOCHA 2018 for Virtuoso v8.0, compared to the other participating systems. Based on these results, Virtuoso v8.0 was declared as the overall winner of MOCHA 2018.
Conference Paper
Full-text available
Synthetic datasets used in benchmarking need to mimic all characteristics of real-world datasets, in order to provide realistic benchmarking results. Synthetic RDF datasets usually show a significant discrepancy in the level of structuredness compared to real-world RDF datasets. This structural difference is important as it directly affects storage, indexing and querying. In this paper, we show that the synthetic RDF dataset used in the Social Network Benchmark is characterized by high structuredness, and we therefore introduce modifications to the data generator so that it produces an RDF dataset with a real-world structuredness.
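The intuition behind structuredness can be sketched compactly: for one entity type, measure what fraction of the type's full property set each instance actually sets, and average over instances. This is a rough sketch of the coverage idea only, not the exact coherence metric used in the literature, and the data below is invented for illustration.

```python
# Rough sketch of the coverage intuition behind "structuredness".
# Real metrics (e.g. the coherence measure) are weighted
# refinements of this simple average.

def structuredness(instances):
    """instances: dict mapping instance id -> set of property names."""
    all_props = set().union(*instances.values())
    if not all_props:
        return 1.0
    per_instance = [len(props) / len(all_props)
                    for props in instances.values()]
    return sum(per_instance) / len(per_instance)

# Highly structured: every instance sets every property (typical of
# naive synthetic generators).
synthetic = {"p1": {"name", "age"}, "p2": {"name", "age"}}

# Loosely structured: properties used sporadically, as in many
# real-world RDF datasets.
real_like = {"m1": {"name"},
             "m2": {"age", "email"},
             "m3": {"name", "age", "email"}}

assert structuredness(synthetic) == 1.0   # maximal structuredness
assert structuredness(real_like) < 1.0    # lower, more realistic
```

A generator modified as the paper describes would aim to push the synthetic dataset's score down toward the value measured on its real-world counterpart.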
Conference Paper
Full-text available
With Linked Data, a very pragmatic approach towards achieving the vision of the Semantic Web has recently gained much traction. The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. While many standards, methods and technologies developed by the Semantic Web community are applicable for Linked Data, there are also a number of specific characteristics of Linked Data which have to be considered. In this article we introduce the main concepts of Linked Data. We present an overview of the Linked Data lifecycle and discuss individual approaches as well as the state of the art with regard to extraction, authoring, linking, enrichment and evolution of Linked Data. We conclude the chapter with a discussion of issues, limitations and further research and development challenges of Linked Data.
Conference Paper
A large amount of public transport data is made available by many different providers, which makes RDF a great method for integrating these datasets. Furthermore, this type of data provides a great source of information that combines both geospatial and temporal data. These aspects are currently undertested in RDF data management systems, because of the limited availability of realistic input datasets. In order to bring public transport data to the world of benchmarking, we need to be able to create synthetic variants of this data. In this paper, we introduce a dataset generator with the capability to create realistic public transport data. This dataset generator, and the ability to configure it on different levels, makes it easier to use public transport data for benchmarking with great flexibility.
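The core idea of such a generator, placing stops and then scheduling trips over them, can be sketched in a few lines. This is only a toy illustration of the concept: PoDiGG's actual algorithm (region-based stop placement, route design, realistic timetabling) is far more elaborate, and every name and parameter below is invented for the example.

```python
# Toy sketch of a synthetic public transport generator: stops on a
# single line, trips departing at a fixed headway, and "connections"
# (departure stop, arrival stop, departure time) as output.
# Illustrative only; not PoDiGG's algorithm.

def generate_connections(n_stops=5, n_trips=3, headway_min=15, travel_min=4):
    stops = [f"stop{i}" for i in range(n_stops)]
    connections = []
    for trip in range(n_trips):
        depart = trip * headway_min  # minutes after service start
        for a, b in zip(stops, stops[1:]):
            connections.append((a, b, depart))
            depart += travel_min     # constant inter-stop travel time
    return connections

conns = generate_connections()
```

Each connection tuple could then be serialized as RDF triples (stop, trip and time resources) to feed a geospatial/temporal benchmark.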
Conference Paper
This paper presents the Transport Disruption ontology, a formal framework for modelling travel and transport related events that have a disruptive impact on travellers' journeys. We discuss related models, describe how transport events and their impacts are captured, and outline use of the ontology within an interlinked repository of travel information to support intelligent transport systems.
HOBBIT: Holistic Benchmarking of Big Linked Data
  • A.-C. N. Ngomo
  • A. García-Rojas
  • I. Fundulaki