Conference Paper

Web-Scale Querying through Linked Data Fragments


Abstract

To unlock the full potential of Linked Data sources, we need flexible ways to query them. Public SPARQL endpoints aim to fulfill that need, but their availability is notoriously problematic. We therefore introduce Linked Data Fragments, a publishing method that allows efficient offloading of query execution from servers to clients through a lightweight partitioning strategy. It enables servers to maintain availability rates as high as any regular HTTP server, allowing querying to scale reliably to much larger numbers of clients. This paper explains the core concepts behind Linked Data Fragments and experimentally verifies their Web-level scalability, at the cost of increased query times. We show how trading server-side query execution for inexpensive data resources with relevant affordances enables a new generation of intelligent clients.
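
The sketch below is a minimal illustration, not the authors' implementation, of the kind of interface the abstract describes: the client asks the server only for single triple patterns over plain HTTP and does the rest of the query work itself. The endpoint URL http://example.org/fragments and the subject/predicate/object parameter names are assumptions based on the common Triple Pattern Fragments convention.

import urllib.parse
import urllib.request

def fetch_fragment(fragment_base, subject=None, predicate=None, obj=None):
    """Request one page of matches for a single triple pattern."""
    params = {"subject": subject, "predicate": predicate, "object": obj}
    query = urllib.parse.urlencode({k: v for k, v in params.items() if v})
    req = urllib.request.Request(fragment_base + "?" + query,
                                 headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(req) as response:
        return response.read().decode("utf-8")

# Every fragment page is an ordinary, cacheable HTTP resource; the server only
# ever answers single triple patterns, so the client performs joins and filtering.
page = fetch_fragment("http://example.org/fragments",
                      predicate="http://xmlns.com/foaf/0.1/name")
print(page[:300])  # Turtle data plus metadata (match counts) and paging links
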


... Verblijfplaats' (Fig. 13). These JSON documents can then be interpreted as Linked Data. ...
... This introduces high costs for the clients. The SPARQL endpoint allows flexible live querying, but its availability is problematic [55]. Therefore, the Flemish Government administration will explore the possibilities of Linked Data Fragments, a REST(ful) publishing strategy that allows efficient offloading of query execution from servers to clients through a lightweight partitioning strategy [55]. ...
... The SPARQL endpoint allows flexible live querying, but its availability is problematic [55]. Therefore, the Flemish Government administration will explore the possibilities of Linked Data Fragments, a REST(ful) publishing strategy that allows efficient offloading of query execution from servers to clients through a lightweight partitioning strategy [55]. The Representational State Transfer (REST) style outlines how to construct network-based software applications having the same characteristics as the Web: simplicity, evolvability, and performance [56]. ...
Article
The transformation of society towards a digital economy and government austerity creates a new context leading to changing roles for both government and private sector. Boundaries between public and private services are blurring, enabling government and private sector to collaborate and share responsibilities. In Belgium, the regional Government of Flanders embedded the re-use of public sector information in its legislation and published a data portal containing well over 4000 Open Datasets. Due to a lack of interoperability, interconnecting and interpreting these sources of information remain challenges for public administrations, businesses and citizens. To dissolve the boundaries between the data silos, the Flemish government applied Linked Data design principles in an operational public sector context. This paper discusses the trends we have identified while ‘rewiring’ the Authentic Source for addresses to a Linked Base Registry. We observed the impact on multiple interoperability levels; namely on the legal, organisational, semantic and technical level. In conclusion Linked Data can increase semantic and technical interoperability and lead to a better adoption of government information in the public and private sector. We strongly believe that the insights from the past thirteen years in the region of Flanders could speed up processes in other countries that are facing the complexity of raising technical and semantic interoperability.
... The Web cache plays an important role in the performance of a TPF server [62]. When executing a SPARQL query, a TPF client generates many calls to the TPF server. ...
... The Triple Pattern Fragments (TPF) approach [62,61] proposes to shift complex query processing from servers to clients, with the goal of improving the availability and scalability of SPARQL servers. A SPARQL query is decomposed into fragment requests, and the TPF server returns to the clients the data matching the received triple patterns. ...
... The triples retrieved during query processing are cached in the TPF client and in a Web cache server placed in front of the TPF server. Although processing a SPARQL query increases the number of calls to the HTTP server, a large share of these calls is intercepted by the Web cache, which reduces the load on the TPF server, as demonstrated in [62]. ...
Thesis
Data producers have published millions of RDF facts on the Web by following the Linked Data principles. Anyone can retrieve useful information by querying Linked Data with SPARQL queries. Such queries are useful in several domains, such as healthcare or data journalism. However, there is a trade-off between query performance and data availability when executing SPARQL queries. In this thesis, we study how collaboration among data consumers opens new opportunities with respect to this trade-off; more precisely, how the collaboration of data consumers can improve performance without degrading availability, or improve availability without degrading performance. We consider that Linked Data allows anyone to run a compact mediator that can query data sources on the Web with SPARQL queries. The main idea is to connect these mediators together to build a federation of Linked Data consumers. In this federation, each mediator interacts with a subset of the network. Thanks to this federation, we built: (i) a decentralized cache hosted by the mediators; this client-side cache can serve a significant share of the subqueries and improves data availability with little impact on performance; (ii) a delegation algorithm that allows mediators to delegate their queries to other mediators. We demonstrate that delegation makes it possible to execute a set of queries faster when the mediators collaborate. This improves performance without degrading data availability.
... Caching plays an important role in the performance of LDF servers [17]. Client-side SPARQL query processing using Triple Pattern Fragments (TPF) generates many calls to the LDF server. ...
... The Linked Data Fragments (LDF) approach [17,16] proposes to shift complex query processing from servers to clients to improve the availability and scalability of SPARQL endpoints. A SPARQL query is decomposed into triple patterns; an LDF server answers triple patterns and sends data back to the client. ...
... The client performs join operations based on nested-loop operators; the triple patterns generated during query processing are cached in the LDF client and in the traditional HTTP cache in front of the LDF server. Although SPARQL query processing increases the number of HTTP requests to the server, a large number of requests are intercepted by the server cache, significantly reducing the load on the LDF server, as demonstrated in [17]. LDF relies on temporal locality, and the data providers have to provide resources for data caching. ...
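
A minimal sketch, not taken from the cited systems, of why this caching works: each triple pattern maps to one canonical fragment URL, so a small client-side cache (or a standard HTTP cache in front of the server) absorbs the repeated look-ups a nested-loop join produces. The URLs below are illustrative and the fetch is simulated.

from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_fragment(fragment_url):
    # In a deployment this would be an HTTP GET; here we just record the call.
    print("cache miss ->", fragment_url)
    return "<turtle data for %s>" % fragment_url

# During a nested-loop join, the same patterns recur for different bindings:
for binding in ["alice", "bob", "alice"]:          # 'alice' repeats
    url = "http://example.org/fragments?subject=http%3A//example.org/" + binding
    cached_fragment(url)                           # the second 'alice' request is a hit
print(cached_fragment.cache_info())
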
Conference Paper
The Linked Data Fragments (LDF) approach promotes a new trade-off between performance and data availability for querying Linked Data. While data providers' HTTP caches play a crucial role in LDF performance, LDF clients also cache data during SPARQL query processing. Unfortunately, as these clients do not collaborate, they cannot take advantage of this large decentralized cache hosted by clients. In this paper, we propose CyCLaDEs, an overlay network based on LDF fragment similarity. For each LDF client, CyCLaDEs builds a neighborhood of LDF clients hosting related fragments in their cache. During query processing, the neighborhood cache is checked before requesting the LDF server. Experimental results show that CyCLaDEs is able to handle a significant amount of LDF query processing and provides a more specialized cache on the client side.
... Issuing many such complex queries can put a heavy load on the server, causing a significant delay or even downtime. Recently, triple pattern fragments (TPF, [15]) were introduced as a way to reduce this load on the server by partially offloading query processing to clients. This is done by restricting the TPF server interface to simpler queries. ...
... Since every subquery causes a new HTTP request to the server, minimizing the number of queries reduces the network load and improves the total response time. The algorithm proposed by Verborgh et al. [15] is greedy: at each decision point, clients choose the local optimum by executing the request that has the fewest results. This works fine for certain classes of queries, but others can perform quite badly. ...
... In order to characterize the many possibilities for publishing Linked Datasets on the Web, Linked Data Fragments (LDF, [15]) was introduced as a uniform view on all possible Web APIs to Linked Data. The common characteristic of all interfaces is that, in one way or another, they offer specific parts of a dataset. ...
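
The following sketch illustrates, under simplified assumptions, the greedy strategy described in the excerpts above: the client reads the estimated match counts that fragment responses expose as metadata and starts with the most selective pattern. The patterns and counts here are invented, and a real client re-estimates counts as variables become bound.

def count_matches(pattern):
    # Hypothetical counts; a real client reads them from the fragment metadata.
    return {"?film a dbo:Film": 90000,
            "?film dbo:director ?d": 85000,
            "?d dbo:birthPlace dbr:Ghent": 120}[pattern]

def greedy_order(patterns):
    """Repeatedly pick the remaining pattern with the fewest estimated matches."""
    remaining, order = list(patterns), []
    while remaining:
        best = min(remaining, key=count_matches)
        order.append(best)
        remaining.remove(best)
    return order

print(greedy_order(["?film a dbo:Film",
                    "?film dbo:director ?d",
                    "?d dbo:birthPlace dbr:Ghent"]))
# starts with the ?d dbo:birthPlace dbr:Ghent pattern (120 matches)
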
Conference Paper
In order to reduce the server-side cost of publishing queryable Linked Data, Triple Pattern Fragments (TPF) were introduced as a simple interface to RDF triples. They allow for SPARQL query execution at low server cost, by partially shifting the load from servers to clients. The previously proposed query execution algorithm uses more HTTP requests than necessary, and only makes partial use of the available metadata. In this paper, we propose a new query execution algorithm for a client communicating with a TPF server. In contrast to a greedy solution, we maintain an overview of the entire query to find the optimal steps for solving a given query. We show multiple cases in which our algorithm reaches solutions with far fewer HTTP requests, without significantly increasing the cost in other cases. This improves the efficiency of common SPARQL queries against TPF interfaces, augmenting their viability compared to the more powerful, but more costly, SPARQL interface.
... In such cases, HTTP caching mechanisms are ineffective as they can only optimize repeated identical queries. The architecture of the SPARQL protocol requires the server to respond to highly complex requests, which makes hosting reliable public SPARQL endpoints exceptionally difficult [56]. Such challenges contribute to the low availability of public SPARQL endpoints [10]. ...
... In such cases, consumers set up their own private SPARQL endpoint to host the data. However, while this resolves some issues, it has several drawbacks [56]: (i) setting up a SPARQL endpoint requires (possibly expensive) infrastructural support, (ii) it involves (often manual) set-up and maintenance, (iii) the data are not up-to-date, and (iv) the entire dataset has to be loaded into the server, even though just a part of it is needed. ...
Article
Full-text available
The need for reusable, interoperable, and interlinked linguistic resources in Natural Language Processing downstream tasks has been demonstrated by the increasing efforts to develop standards and metadata suitable to represent several layers of information. Nevertheless, despite these efforts, full metadata compatibility in linguistic resource production is still far from being achieved. Access to resources observing these standards is hindered by (i) lack of or incomplete information, (ii) inconsistent ways of coding their metadata, and (iii) lack of maintenance. In this paper, we offer a quantitative and qualitative analysis of the descriptive metadata and resource availability of two main metadata repositories: LOD Cloud and Annohub. Furthermore, we introduce a metadata enrichment, which aims at improving resource information, and a metadata alignment to the META-SHARE ontology, suitable for easing the accessibility and interoperability of such resources.
... Currently, the Tarql library only supports the mapping of tabular data to RDF triple formats. RDF quad formats can be supported in the future by using the RDF Mapping Language (RML). There are three strategies to apply a mapping from a table to RDF using the Databus Client: 1) a generic transformation from CSV to RDF that generates a resource URI for each row and creates a triple for each column and its corresponding value (the column header is appended to a generic base URI to specify the property). ...
... While the Databus Client allows flexible and, via DataID metadata, fine-grained access to files, this granularity still depends on the file partitioning strategy of the dump publisher. Although a monthly DBpedia release is separated into over 3,000 files, if information for only a small set of entities is consumed by an application, a SPARQL or Linked Data Fragments [9] endpoint is more convenient. We plan to extend the current file-based focus of the client to an even more flexible extraction phase that can use e.g. ...
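
A small sketch of the generic CSV-to-RDF strategy described above: one resource URI per row, one triple per column, with the property URI formed by appending the column header to a base URI. The base URIs below are illustrative placeholders rather than the Databus Client's actual configuration.

import csv, io

BASE_ROW = "http://example.org/resource/row"
BASE_PROP = "http://example.org/property/"

def csv_to_ntriples(text):
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        subject = "<%s%d>" % (BASE_ROW, i)                       # one resource per row
        for header, value in row.items():                        # one triple per column
            triples.append('%s <%s%s> "%s" .' % (subject, BASE_PROP, header, value))
    return "\n".join(triples)

print(csv_to_ntriples("name,city\nAda,Ghent\nAlan,London\n"))
# <http://example.org/resource/row0> <http://example.org/property/name> "Ada" .
# ...
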
Chapter
Full-text available
Realizing a data-driven application or workflow that consumes bulk data files from the Web poses a multitude of challenges, ranging from sustainable dependency management supporting automatic updates, to dealing with compression, serialization format, and data model variety. In this work, we present an approach using the novel Databus Client, which is backed by the DBpedia Databus - a data asset release management platform inspired by paradigms and techniques successfully applied in software release management. The approach shifts effort from the publisher to the client while making data consumption and dependency management easier and more unified as a whole. The client leverages 4 layers (download, compression, format, and mapping) that tackle individual challenges and offers a fully automated way of extracting and compiling data assets from the DBpedia Databus, given one command and a flexible dependency configuration using SPARQL or Databus Collections. The current vertically-sliced implementation supports format conversion within, as well as mapping between, RDF triples, RDF quads, and CSV/TSV files. We developed an evaluation strategy for the format conversion and mapping functionality using so-called round-trip tests. Keywords: Data dependency management, Data compilation, Data release management platform, Metadata repository, ETL
... Linked Data Fragments (LDF) [11] highlight the role of clients for scaling query engines by offloading partial execution to the web browser. Since our prototype is implemented in JavaScript, the proxy also runs as a standalone instance in the browser. ...
... The framework only needs a connection to a SPARQL endpoint over HTTP, or a locally emulated one such as the LDF client. In this regard, we aim to achieve Web-Scale querying as described by Verborgh [11]. ...
Conference Paper
Full-text available
Powered by Semantic Web technologies, the Linked Data paradigm aims at weaving a globally interconnected graph of raw data that transforms the ways we publish, retrieve, share, reuse, and integrate data from a variety of distributed and heterogeneous sources. In practice, however, this vision faces substantial challenges with respect to data quality, coverage, and longevity, the amount of background knowledge required to query distant data, the reproducibility of query results and their derived (scientific) findings, and the lack of computational capabilities required for many tasks. One key issue underlying these challenges is the trade-off between storing data and computing them. Intuitively, data that is derived from already stored data, changes frequently in space and time, or is the result of some workflow or procedure, should be computed. However, this functionality is not readily available on the Linked Data cloud with its current technology stack. In this work, we introduce a proxy that can transparently run on top of arbitrary SPARQL endpoints to enable the on-demand computation of Linked Data together with the provenance information required to understand how they were derived. While our work can be generalized to multiple domains, we focus on two geographic use cases to showcase the proxy’s capabilities.
... Furthermore, the graph patterns are usually star-shaped, and although triple pattern chains exist, they are generally short. Other, more cacheable interfaces, as introduced by Linked Data Fragments [25], also benefit from query logs in order to, e.g., know which clients are used to query the Web, yet the user intention is lost [24]. This was called "a blessing for privacy", yet adding the user query as an HTTP header was suggested. ...
... (GTFS reference: https://developers.google.com/transit/gtfs/reference) [25] illustrates the possibilities within publishing transport data. Different ticks on the axis illustrate client effort versus server expressivity: on the far left, data dumps offer high server availability, yet the effort needed by data reusers is high, and query logs do not reach the server. ...
Conference Paper
In the field of smart cities, researchers need an indication of how people move in and between cities. Yet, getting statistics of travel flows within public transit systems has proven to be troublesome. In order to get an indication of public transit travel flows in Belgium, we analyzed the query logs of the iRail API, a highly expressive route planning API for the Belgian railways. We were able to study ∼100k to 500k requests for each month between October 2012 and November 2015, which is between 0.56% and 1.66% of the number of monthly passengers. Using data visualizations, we illustrate the commuting patterns in Belgium and confirm that Brussels, the capital, acts as a central hub. The Flemish region appears to be polycentric, while in the Walloon region, everything converges on Brussels. The findings correspond to the real travel demand, according to experts of the passenger federation Trein Tram Bus. We conclude that query logs of route planners are of high importance in getting an indication of travel flows. However, better travel intentions could be captured using dedicated HTTP POST requests.
... Specifying metadata for the mapping rules or considering the mapping rules to determine the provenance and metadata becomes even more cumbersome, in particular in the case of mapping languages whose representation is not in RDF, e.g., SML, SPARQL or XQuery. Similarly, among the RDF data publishing infrastructures, only Triple Pattern Fragments (TPF) [26, 27] provide some metadata information, mainly regarding dataset-level statistics and access. Virtuoso, 4store and other pioneering publishing infrastructures do not provide out-of-the-box metadata information, e.g., provenance, dataset-level statistics etc. of the RDF data published. ...
... Implicit graphs might be used to identify a dataset or a graph, but also triples. In the latter case, as Triple Pattern Fragments (TPF) introduced [26, 27], each triple can be found using its own elements; thus, each triple has a URI and, thereby, its implicit graph. For example, the triple x y z for a certain dataset could be identified by the TPF URI http://example.org/dataset?subject= ...
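
As a hedged illustration of this idea, the snippet below builds such a triple-identifying URI from a fully specified pattern; the dataset URI is hypothetical and the parameter names follow the usual TPF subject/predicate/object convention.

from urllib.parse import urlencode

def triple_uri(dataset, s, p, o):
    """A fully bound triple pattern yields a URI naming exactly that triple."""
    return dataset + "?" + urlencode({"subject": s, "predicate": p, "object": o})

print(triple_uri("http://example.org/dataset",
                 "http://example.org/x",
                 "http://example.org/y",
                 "http://example.org/z"))
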
Conference Paper
Provenance and other metadata are essential for determining ownership and trust. Nevertheless, no systematic approaches have been introduced so far in the Linked Data publishing workflow to capture them. Defining such metadata has remained independent of the RDF data generation and publishing. In most cases, metadata is manually defined by the data publishers (person-agents), rather than produced by the involved applications (software-agents). Moreover, the generated RDF data and the published data are considered to be one and the same, which is not always the case, leading to pure, condensed and often seductive information. This paper introduces an approach that takes into consideration declarative definitions of mapping rules, which define how the RDF data is generated, and data descriptions of raw data, allowing provenance and metadata information to be generated automatically and incrementally. This way, it is assured that the metadata information is accurate, consistent and complete.
... In particular, we aim for solutions with minimal server complexity (minimizing the cost for data publishers) while still enabling live querying (maximizing the utility for Semantic Web applications). In this article, we present and extend our ongoing work on Linked Data Fragments [6], a framework to analyze Linked Data publication interfaces on the Web, and Triple Pattern Fragments [7], a low-cost interface to triples. Novel contributions include in particular: ...
... The approach of a dynamic iterator-based pipeline applied to process SPARQL queries has been researched [72]. SPARQL endpoints implement a protocol on top of HTTP; contrary to regular HTTP servers, there are so many ways to express the same request that cache hits are likely to be very low, and therefore common HTTP caching cannot be used, which has a negative impact on scalability [73,74]. ...
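
A toy illustration of the caching argument above, under simplified assumptions: two semantically equivalent SPARQL queries are different byte strings and therefore different cache keys, whereas the corresponding triple pattern always normalizes to the same fragment URL. The fragment endpoint is a placeholder.

from urllib.parse import urlencode

q1 = "SELECT ?n WHERE { ?p <http://xmlns.com/foaf/0.1/name> ?n }"
q2 = "SELECT ?n WHERE {?p <http://xmlns.com/foaf/0.1/name> ?n}"   # same meaning, different text
print(q1 == q2)   # False -> two cache entries for one logical request

def fragment_url(s=None, p=None, o=None):
    params = [("subject", s), ("predicate", p), ("object", o)]
    return "http://example.org/fragments?" + urlencode([kv for kv in params if kv[1]])

print(fragment_url(p="http://xmlns.com/foaf/0.1/name") ==
      fragment_url(p="http://xmlns.com/foaf/0.1/name"))   # True -> one cacheable resource
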
Article
Full-text available
Smart cities need (sensor) data for better decision-making. However, while there are vast amounts of data available about and from cities, an intermediary is needed that connects and interprets (sensor) data on a Web scale. Today, governments in Europe are struggling to publish open data in a sustainable, predictable and cost-effective way. Our research question considers what methods for publishing Linked Open Data time series, in particular air quality data, are suitable in a sustainable and cost-effective way. Furthermore, we demonstrate the cross-domain applicability of our data publishing approach through a different use case on railway infrastructure Linked Open Data. Based on scenarios co-created with various governmental stakeholders, we researched methods to promote data interoperability, scalability and flexibility. The results show that applying a Linked Data Fragments-based approach on public endpoints for air quality and railway infrastructure data lowers the cost of publishing and increases availability due to better Web caching strategies.
... Note that an ETL (extract-transform-load) process is a data transformation paradigm, where data is first extracted from its original location, transformed by the data processing system and loaded into the target database. Once an LD dataset is created and accessible, either as an RDF dump, in a SPARQL endpoint, through IRI dereferencing or as a Linked Data Fragments server [13], there is an issue regarding its discoverability, i.e. how users find it. The issue of discoverability is partially tackled by, e.g., the Linked Open Data Cloud, or by a recently initialized list of known SPARQL endpoints. ...
Chapter
Full-text available
Consumption of Linked Data (LD) is a far less explored problem than its production. LinkedPipes Applications (LP-APPs) is a platform enabling data analysts and data journalists to easily create LD-based applications such as, but not limited to, visualizations. It builds on our previous research regarding the automatic discovery of possible visualizations of LD. The approach was based on matching the classes and predicates used in the data, e.g. in the form of a data sample, to what an application or visualization expects, e.g. in the form of a SPARQL query, solving potential mismatches in the data by dynamically applying data transformers. In this demo, we present a platform that allows a data analyst to automatically discover possible visualizations of a given LD data source using this method and the applications contained in the platform. Next, the data analyst is able to configure the discovered visualization application and publish it or embed it in an arbitrary web page. Thanks to the configuration being stored in their Solid POD, multiple analysts are able to collaborate on a single application in a decentralized fashion. The resulting visualization application can be kept up to date by scheduling an ETL pipeline, regularly refreshing the underlying data.
... Additionally, it is unclear how to extend SPARQL to support functionalities such as computing embeddings or node centrality. A scalable alternative to SPARQL is Linked Data Fragments (LDF) [23]. The list of native operations in LDF boils down to triple pattern matching, resembling our proposed filter operation. ...
Chapter
Full-text available
Knowledge graphs (KGs) have become the preferred technology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is heterogeneous, difficult to integrate and only covers a subset of the operations that are commonly needed in data science applications. In this paper we present KGTK, a data science-centric toolkit designed to represent, create, transform, enhance and analyze KGs. KGTK represents graphs in tables and leverages popular libraries developed for data science applications, enabling a wide audience of developers to easily construct knowledge graph pipelines for their applications. We illustrate the framework with real-world scenarios where we have used KGTK to integrate and manipulate large KGs, such as Wikidata, DBpedia and ConceptNet.
... Additionally, it is unclear how to extend SPARQL to support functionalities such as computing embeddings or node centrality. A scalable alternative to SPARQL is Linked Data Fragments (LDF) [21]. The list of natively supported operations in LDF boils down to triple pattern matching, resembling our proposed filter operation. ...
Preprint
Full-text available
Knowledge graphs (KGs) have become the preferred technology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is heterogeneous, difficult to integrate and only covers a subset of the operations that are commonly needed in data science applications. In this paper, we present KGTK, a data science-centric toolkit to represent, create, transform, enhance and analyze KGs. KGTK represents graphs in tables and leverages popular libraries developed for data science applications, enabling a wide audience of developers to easily construct knowledge graph pipelines for their applications. We illustrate KGTK with real-world scenarios in which we have used KGTK to integrate and manipulate large KGs, such as Wikidata, DBpedia and ConceptNet, in our own work.
... In this sense, future work on this research will extend the framework to incorporate a pluggable mechanism that enables developers/users to provide custom aggregates as extensions that would be integrated into the running data ingestion pipeline. Additionally, the query processing component of the framework will be further developed to enable features such as predictive caching, to anticipate the queries that are likely to be issued next according to the user's interaction behaviour, and federated querying, by implementing a Linked Data Fragments interface, which boosts system scalability by pushing part of the query computation to the client-side application [56,57]. ...
Article
Full-text available
Citizen engagement is one of the key factors for smart city initiatives to remain sustainable over time. This in turn entails providing citizens and other relevant stakeholders with the latest data and tools that enable them to derive insights that add value to their day-to-day life. The massive volume of data being constantly produced in these smart city environments makes satisfying this requirement particularly challenging. This paper introduces EXPLORA, a generic framework for serving interactive low-latency requests, typical of visual exploratory applications on spatiotemporal data, which leverages stream processing to derive, at ingestion time, synopsis data structures that concisely capture the spatial and temporal trends and dynamics of the sensed variables and serve as compacted data sets to provide fast (approximate) answers to visual queries on smart city data. The experimental evaluation conducted on proof-of-concept implementations of EXPLORA, based on traditional database and distributed data processing setups, accounts for a decrease of up to 2 orders of magnitude in query latency compared to queries running on the base raw data, at the expense of less than 10% query accuracy and 30% data footprint. The implementation of the framework on real smart city data, along with the obtained experimental results, proves the feasibility of the proposed approach.
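
The sketch below is a generic illustration, not EXPLORA's actual data structures, of an ingestion-time synopsis: readings are bucketed by a coarse spatial cell and by hour, and only count and sum are kept, so later visual queries can be answered approximately without touching the raw stream. Cell identifiers and values are invented.

from collections import defaultdict

synopsis = defaultdict(lambda: [0, 0.0])   # (cell, hour) -> [count, sum]

def ingest(cell, hour, value):
    bucket = synopsis[(cell, hour)]
    bucket[0] += 1
    bucket[1] += value

def approx_average(cell, hour):
    count, total = synopsis[(cell, hour)]
    return total / count if count else None

for cell, hour, pm25 in [("grid-42", 9, 18.0), ("grid-42", 9, 22.0), ("grid-7", 9, 35.0)]:
    ingest(cell, hour, pm25)
print(approx_average("grid-42", 9))   # 20.0, computed from the synopsis only
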
... One possible solution to the unavailability of knowledge bases is the implementation of a caching mechanism and the prefetching of data if the domain of interest can be determined by some example terms. Using data dumps is also possible, but a more balanced way is to use Linked Data Fragments [393]. OntoConnector does not yet have a user-friendly interface to help complete the provided SPARQL templates. ...
Thesis
Full-text available
Domain modeling is an important model-driven engineering activity, which is typically used in the early stages of software projects. Domain models capture concepts and relationships of respective application fields using a modeling language and domain-specific terms. They are a key factor in achieving shared understanding of the problem area among stakeholders, improving communication in software development, and generating code and software. Domain models are a prerequisite for domain-specific language development and are often implemented in software and data integration and software modernization projects, which constitute the larger part of industrial IT investment. Several studies from recent years have shown that model-driven methods are much more widespread in the industry than previously thought, yet their application is challenging. Creating domain models requires that software engineers have both experience in model-driven engineering and detailed domain knowledge. While the former is one of the general modeling skills, the required domain knowledge varies from project to project. Domain knowledge acquisition is a time-consuming manual process because it requires multidisciplinary collaboration and gathering of information from different groups of people, documents, and other sources of knowledge, and is rarely supported in current modeling environments. Consistent access to large amounts of structured domain knowledge is not possible due to the heterogeneity of formats, access methods, schemas, and semantic representations. Besides, existing suitable knowledge bases were mostly manually created and are therefore not extensive enough. The automated construction of knowledge resources utilizes mainly information extraction approaches that focus on factual knowledge at the instance level and therefore cannot be used for conceptual-level domain modeling. This thesis develops novel methods and tools that provide domain information directly during modeling to reduce the initial effort of using domain modeling and to help software developers create domain models. It connects the areas of software modeling, knowledge bases, information extraction and recommender systems to automatically acquire conceptual knowledge from structured knowledge sources and unstructured natural language datasets, to transform the aggregated knowledge into appropriate recommendations, and to develop suitable assistance services for modeling environments. With this thesis, the paradigm of Semantic Modeling Support is proposed, the methodological foundation for providing automated modeling assistance. It includes an iterative procedure of model refinement, knowledge acquisition, and element recommendation that allows the necessary domain knowledge to be queried and provided for a range of support scenarios at each stage of domain model development, keeping the human in the loop. To address the lack of conceptual knowledge resources, new methods are developed to extract conceptual terms and relationships directly from large N-gram text data using syntactic patterns, co-occurrences, and statistical features of text corpora. A large Semantic Network of Related Terms is automatically constructed with nearly 6 million unique one-word terms and multi-word expressions connected with over 355 million weighted binary and ternary relationships. It allows top-N queries to be answered directly.
This thesis introduces an extensible query component with a set of fully connected knowledge bases to uniformly access structured knowledge with well-defined relationships. The developed Mediator-Based Querying Architecture with Generic Templates is responsible for retrieving lexical information from heterogeneous knowledge bases and mapping to modeling language-specific concepts. Furthermore, it is demonstrated how to implement the semantic modeling support strategies by extending the widely used Eclipse Modeling Project. The Domain Modeling Recommender System generates context-sensitive modeling suggestions based on the connected knowledge bases, the semantic network of terms, and an integrated ranking strategy. Finally, this thesis reports on practical experience with the application of the developed methods and tools in three research projects.
... We would like to note that the community is already building an infrastructure that would eliminate this mismatch: the Linked Data Fragments initiative [15], which aims to study different ways of publishing linked data on the web. Specifically, one of the proposals of this initiative is to build an infrastructure that can support answering any SPARQL triple pattern [14]. ...
Article
Full-text available
One of the advantages that Linked Data offers over the classical database setting is the ability to connect and navigate through different datasets. At the moment, the standard mechanism for exploring navigational properties of Semantic Web data is SPARQL property paths. However, the semantics of property paths is only defined assuming one evaluates them over a single local database, and it is still not clear what the correct way to implement them over the Web of Linked Data is, nor whether this is even feasible. In this paper we explore this topic in more depth and gauge the merits of different approaches to executing property paths over Linked Data. To this end we test how property paths perform if the Linked Data is assumed to be available locally, through endpoints, or if it is accessed directly through dereferencing IRIs.
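
As a concrete illustration of the navigational queries discussed here, the snippet below evaluates a standard SPARQL 1.1 property path (foaf:knows+) against a single endpoint via the SPARQL protocol. The endpoint URL and starting IRI are placeholders; evaluating such paths across the Web of Linked Data, rather than against one local store, is precisely the open question the paper examines.

import json
import urllib.parse, urllib.request

query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person WHERE { <http://example.org/alice> foaf:knows+ ?person } LIMIT 10
"""

url = "http://example.org/sparql?" + urllib.parse.urlencode({"query": query})
req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
with urllib.request.urlopen(req) as resp:
    for binding in json.load(resp)["results"]["bindings"]:
        print(binding["person"]["value"])
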
... Linked Data Fragments (LDF) [39,88] is an initiative proposed as an alternative to SPARQL endpoints. LDF aims at addressing the availability issue of data sources, which prevents these data from being queried reliably. ...
Thesis
Driven by the Semantic Web standards, an increasing number of RDF data sources are published and connected over the Web by data providers, leading to a large distributed linked data network. However, exploiting the wealth of these data sources is very challenging for data consumers considering the data distribution, their volume growth and data sources autonomy. In the Linked Data context, federation engines allow querying these distributed data sources by relying on Distributed Query Processing (DQP) techniques. Nevertheless, a naive implementation of the DQP approach may generate a tremendous number of remote requests towards data sources and numerous intermediate results, thus leading to costly network communications. Furthermore, the distributed query semantics is often overlooked. Query expressiveness, data partitioning, and data replication are other challenges to be taken into account. To address these challenges, we first proposed in this thesis a SPARQL and RDF compliant Distributed Query Processing semantics which preserves the SPARQL language expressiveness. Afterwards, we presented several strategies for a federated query engine that transparently addresses distributed data sources, while managing data partitioning, query results completeness, data replication, and query processing performance. We implemented and evaluated our approach and optimization strategies in a federated query engine to prove their effectiveness.
... The hosting of Linked Data datasets is a well understood problem [23,25,26]. However, the wide range of different interface types for Linked Data datasets [28] requires a respective range of experts to access the entirety of the data. Linked Data clients must be able to interact with these interface types. ...
Conference Paper
Full-text available
With more and more applications providing semantic data to improve interoperability, the number of available RDF datasets is constantly increasing. The SPARQL query language is a W3C recommendation to provide query capabilities on such RDF datasets. Data integration from different RDF sources is up to now mostly the task of RDF-consuming clients. However, from a functional perspective, data integration boils down to a function application that consumes input data as parameters and, based on these, produces a new set of data as output. Following this notion, we introduce SPARQλ, an extension to the SPARQL 1.1 query language. SPARQλ enables dynamic injection of RDF datasets during evaluation of the query, and thereby lifts SPARQL to a tool for writing templates for RDF-producing functions, an important step to reduce the effort of writing SPARQL queries that work on data from various sources. SPARQλ is moreover suitable for direct translation to an RDF-described Web service interface, which allows integration of data and re-provisioning of integrated results to be lifted from clients to cloud environments, thereby solving the bottleneck of RDF data integration on the client side.
... The hosting of Linked Data datasets is a well understood problem [2,3,4]. However, the wide range of different interface types for Linked Data datasets [5] requires a respective range of experts to access the entirety of the data. Linked Data clients must be able to interact with these interface types. ...
Conference Paper
Full-text available
With more and more applications providing semantic data to improve interoperability, the number of available RDF datasets is constantly increasing. The SPARQL query language is a W3C recommendation to provide query capabilities on such RDF datasets. Yet as the coverage of RDF datasets with efficient and available SPARQL endpoints is still limited, integration of data from different RDF sources is a bottleneck that mostly has to be handled in RDF-consuming clients. We tackle this bottleneck by introducing SPARQλ, an extension to the SPARQL 1.1 query language. SPARQλ enables dynamic injection of RDF datasets during evaluation of the query, and thereby lifts SPARQL to a tool for writing templates for RDF-producing functions in a functional programming style. This is an important step to reduce the effort of writing SPARQL queries that work on data from various sources. SPARQλ is moreover suitable for direct translation to an RDF-described Web service interface, which allows integration of data and re-provisioning of integrated results to be lifted from clients to cloud environments.
... In particular, we aim for solutions with minimal server complexity (minimizing the cost for data publishers) while still enabling live querying (maximizing the utility for Semantic Web applications). In this article, we present and extend our ongoing work on Linked Data Fragments [6], a framework to analyze Linked Data publication interfaces on the Web, and Triple Pattern Fragments [7], a low-cost interface to triples. Novel contributions include in particular: ...
Article
Full-text available
Billions of Linked Data triples exist in thousands of RDF knowledge graphs on the Web, but few of those graphs can be queried live from Web applications. Only a limited number of knowledge graphs are available in a queryable interface, and existing interfaces can be expensive to host at high availability. To mitigate this shortage of live queryable Linked Data, we designed a low-cost Triple Pattern Fragments interface for servers, and a client-side algorithm that evaluates SPARQL queries against this interface. This article describes the Linked Data Fragments framework to analyze Web interfaces to Linked Data and uses this framework as a basis to define Triple Pattern Fragments. We describe client-side querying for single knowledge graphs and federations thereof. Our evaluation verifies that this technique reduces server load and increases caching effectiveness, which leads to lower costs to maintain high server availability. These benefits come at the expense of increased bandwidth and slower, but more stable query execution times. These results substantiate the claim that lightweight interfaces can lower the cost for knowledge publishers compared to more expressive endpoints, while enabling applications to query the publishers’ data with the necessary reliability.
... Another line of inquiry that tries to address the issue of distributed query processing by focusing on reducing client-server load is that of Triple Pattern Fragments (TPFs), as outlined in [19] and [20]. However, this assumes blank-node-free sources and that existing data is converted into fragments beforehand, and is hence not a general solution. ...
Conference Paper
Distributed query processing, when blank nodes might occur in sources and only the signature of sources is known, may in the worst case require that all possible query partitions be evaluated in order to ensure soundness and completeness of answers. The work presented in this paper attempts to push the boundary as to what query sizes and structures lie within practical feasibility. The presented approach utilizes semantic information obtained by probing sources for occurrences of blank nodes, interleaved in a recursive algorithm for calculating restricted growth strings, in order to detect and abort unfruitful branches in the partition-generating process. The approach is evaluated against a well-known SPARQL benchmark, modified for the distributed case, and tentative conclusions regarding its effectiveness are drawn.
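
The sketch below shows the standard enumeration of restricted growth strings, the combinatorial core the abstract refers to: each string encodes one partition of the query parts. The interleaved source probing that prunes branches in the paper is omitted here.

def restricted_growth_strings(n):
    """Yield all restricted growth strings of length n (one per set partition)."""
    def extend(prefix, current_max):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for value in range(current_max + 2):        # allowed values: 0 .. max(prefix)+1
            yield from extend(prefix + [value], max(current_max, value))
    yield from extend([0], 0)

for rgs in restricted_growth_strings(3):
    print(rgs)   # (0,0,0), (0,0,1), (0,1,0), (0,1,1), (0,1,2) -> the 5 partitions of 3 items
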
... One reason is that full-fledged semantic stacks are perceived as costly, unreliable server-side architectures, in opposition to current (i.e. modular and client-side) Web design practices [19]. We have addressed this problem in [18] by proposing HyLAR, a reasoner that can be used on both the server and client sides. ...
Conference Paper
Full-text available
Today's Web applications tend to reason about cyclic data (i.e. facts that re-occur periodically) on the client side. Although they can benefit from efficient incremental maintenance algorithms capable of handling frequent data updates, existing rule-based algorithms cause successive re-derivations of previously inferred information. In this paper, we propose an incremental maintenance approach for rule-based reasoning that prevents successive re-computations of fact derivations. We tag (i.e. annotate) facts to keep track of their provenance and validity. We compare our solution with the DRed-based incremental reasoning algorithm and show that it significantly outperforms this algorithm for fact updates in re-occurring situations, at the cost of tagging facts at their first insertion. Our experiments show that this cost can be recovered within a small number of cycles of deletions and reinsertions of explicit facts. We discuss the utility and limitations of our approach on Web clients and provide implementation packages of this reasoner that can be directly integrated in Web applications, on both server and client sides.
... One reason is that full-fledged semantic stacks are perceived as costly, unreliable server-side architectures, in opposition to current (i.e. modular and client-side) Web design practices [Verborgh et al., 2014b]. Adaptation of the reasoning can solve issues that are specific to the WoT application infrastructure and software environment. ...
Thesis
Full-text available
The Web of Things (WoT) takes place in a variety of application domains (e.g. homes, enterprises, industry, healthcare, city, agriculture...). It builds a Web-based uniform layer on top of the Internet of Things (IoT) to overcome the heterogeneity of protocols present in the IoT networks. WoT applications provide added value by combining access to connected objects and external data sources, as well as standard-based reasoning (RDF-S, OWL 2) to allow for interpretation and manipulation of gathered data as contextual information. Contextual information is then exploited to allow these applications to adapt their components to changes in their environment. Yet, contextual adaptation is a major challenge for the WoT. Existing adaptation solutions are either tightly coupled with their application domains (as they rely on domain-specific context models) or offered as standalone software components that hardly fit in Web-based and semantic architectures. This leads to integration, performance and maintainability problems. In this thesis, we propose a multi-purpose contextual adaptation solution for WoT applications that addresses usability, flexibility, relevance, and performance issues in such applications. Our work is based on a smart agriculture scenario running inside the avatar-based platform ASAWoO. First, we provide a generic context meta-model to build standard, interoperable and reusable context models. Second, we present a context lifecycle and a contextual adaptation workflow that provide parallel raw data semantization and contextualization at runtime, using heterogeneous sources (expert knowledge, device documentation, sensors, Web services, etc.). Third, we present situation-driven adaptation rule design and generation at design time that eases experts' and WoT application designers' work. Fourth, we provide two optimizations of contextual reasoning for the Web: the first adapts the location of reasoning tasks depending on the context, and the second improves incremental maintenance of contextual information.
... In an effort to make SPARQL scalable and highly available, Linked Data Fragments (LDF) have shown very promising results and represent a viable approach to "cloud-scale" SPARQL [34,33]. ...
Article
Full-text available
https://arxiv.org/abs/1803.03525
The development of increasingly complex IoT systems requires large engineering environments. These environments generally consist of tools from different vendors and are not necessarily well integrated with each other. In order to automate various analyses, queries across resources from multiple tools have to be executed in parallel to the engineering activities. In this paper, we identify the necessary requirements on such a query capability and evaluate different architectures according to these requirements. We propose an improved lifecycle query architecture, which builds upon the existing Tracked Resource Set (TRS) protocol and complements it with the MQTT messaging protocol in order to allow the data in the warehouse to be kept updated in real time. As part of a case study focusing on the development of an IoT automated warehouse, this architecture was implemented for a toolchain integrated using RESTful microservices and linked data.
... c) LDFragments: The third architecture pattern is LD-Fragments [18]. LDFragments tackle the problems of reliability and scalability prevalent in SPARQL endpoints by introducing a low-cost query technique based on triple pattern fragments. ...
Conference Paper
Full-text available
In this position paper, we propose a decentralized architecture for publishing RDF. Unlike traditional client-server architectures, we propose a peer-to-peer (P2P) architecture for SPARQL query processing and RDF knowledge sharing. We present a general architecture and its key problems. Further, we discuss its potential with respect to solving the availability and performance issues that for long time have been prevailing on existing infrastructures.
... The clean data is finally republished in a uniform serialization format, and either a SPARQL endpoint or dump files are provided. Additionally, the data can be accessed as HDT or using Linked Data Fragments [19]. The pipeline for acquiring data from LOD Laundromat consists of the following steps: 1) reading and parsing the metadata file; 2) for each dataset found, stream the clean data; 3) based on the original dataset download URI, we fetch metadata in any of the CKAN repositories analyzed. ...
Conference Paper
Full-text available
Over the last decade, we have observed a steadily increasing number of RDF datasets made available on the Web of Data. The decentralized nature of the Web, however, makes it hard to identify all these datasets. Even more so, when downloadable data distributions are discovered, only insufficient metadata is available to describe the datasets properly, thus posing barriers to their usefulness and reuse. In this paper, we describe an attempt to exhaustively identify the whole Linked Open Data cloud by harvesting metadata from multiple sources, providing insights about duplicated data and the general quality of the available metadata. This was only possible by using a probabilistic data structure called a Bloom filter. Finally, we published a dump file containing metadata which can further be used to enrich existing datasets.
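
For readers unfamiliar with the data structure mentioned above, here is a minimal, generic Bloom filter sketch: a fixed bit array and k hash functions give a memory-cheap membership test with false positives but no false negatives, which is what makes duplicate detection at this scale feasible. The sizes and hashing choices below are illustrative, not those of the paper.

import hashlib

class BloomFilter:
    def __init__(self, size_bits=8192, num_hashes=4):
        self.size, self.k = size_bits, num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k positions by salting a cryptographic hash with an index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

seen = BloomFilter()
seen.add("http://example.org/dataset/1")
print("http://example.org/dataset/1" in seen)   # True
print("http://example.org/dataset/2" in seen)   # almost certainly False
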
... These results can also be embedded on other Web pages via an HTML iframe element. We note that Wikidata is open data published under the Public Domain Dedication and Waiver (CC0), and that it is available not only through the SPARQL endpoint, but also as Linked Data Fragments [35] and, like any other project of the Wikimedia family, through an API and dump files. In the following sections, we describe how Wikidata has been used for bibliographic information, give some statistics on it, and present Scholia, our website built to expose such information. ...
Conference Paper
Full-text available
Scholia is a tool to handle scientific bibliographic information through Wikidata. The Scholia Web service creates on-the-fly scholarly profiles for researchers, organizations, journals, publishers, individual scholarly works, and for research topics. To collect the data, it queries the SPARQL-based Wikidata Query Service. Among several display formats available in Scholia are lists of publications for individual researchers and organizations, plots of publications per year, employment timelines, as well as co-author and topic networks and citation graphs. The Python package implementing the Web service is also able to format Wikidata bibliographic entries for use in LaTeX/BIBTeX. Apart from detailing Scholia, we describe how Wikidata has been used for bibliographic information and we also provide some scientometric statistics on this information.
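
A hedged example of the kind of query Scholia sends to the Wikidata Query Service: list works whose author (P50) is a given item. Q80 (Tim Berners-Lee) is used here purely as an example; Scholia constructs such queries dynamically for each profile, and this snippet is not taken from its code base.

import json
import urllib.parse, urllib.request

query = """
SELECT ?work ?workLabel WHERE {
  ?work wdt:P50 wd:Q80 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 5
"""

url = "https://query.wikidata.org/sparql?" + urllib.parse.urlencode({"query": query})
req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json",
                                           "User-Agent": "example-script"})
with urllib.request.urlopen(req) as resp:
    for row in json.load(resp)["results"]["bindings"]:
        print(row["workLabel"]["value"])
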
... However, facing the lack of any standardization or widespread adoption of Linked Services, a fatum shared with Semantic Web Services [50], the scientific community currently favours addressing more utile techniques for data integration and system interoperation. To name a few, we briefly list RDF constraint languages for describing and constraining the contents of RDF graphs [21,32,35,38,44], novel RESTful Linked Data interfaces as well as client-side query execution techniques [47,49,51,52], the specification of integration and interaction patterns for building resource-oriented Linked Data applications [13-15, 26, 39], or high-performance processing techniques for dynamic Linked Data [19,20]. ...
Conference Paper
Full-text available
The use of autonomous robots in both industry and every day life has increased significantly in the recent years. A growing number of robots is connected to and operated from networks, including the World Wide Web. Consequently, the Robotics community is exploring and adopting REST (Representational State Transfer) architectural principles and considers the use of Linked Data technologies as fruitful next step. However, we observe a lack of concise and stable specifications of how to properly leverage the RESTful paradigm and Linked Data concepts in the Robotics domain. Introducing the notion of Linked Robotic Things, we provide a minimalistic, yet well-defined specification covering a minimal set of requirements with respect to the use of HTTP and RDF.
... Fortunately, the LOD Laundromat [1,21] makes this data available by gathering dataset dumps from the Web, including archived data. LOD Laundromat cleans the data by fixing syntactic errors and removing duplicates, and then makes it available through download (either as gzipped N-Triples or N-Quads, or HDT [10] files), a SPARQL endpoint, and Triple Pattern Fragments [24]. Using the LOD Laundromat is also a better solution than trying to use documents dereferenced by URIs, because most of the datasets available online are data dumps [9,16] and thus not accessible by dereferencing. ...
Conference Paper
Full-text available
While a number of quality metrics have been successfully proposed for datasets in the Web of Data, there is a lack of trust metrics that can be computed for any given dataset. We argue that reuse of data can be seen as an act of trust. In the Semantic Web environment, datasets regularly include terms from other sources, and each of these connections expresses a degree of trust in that source. However, determining what constitutes a dataset in this context is not straightforward. We study the concepts of dataset and dataset link, to finally use the concept of Pay-Level Domain to differentiate datasets, and consider usage of external terms as connections among them. Using these connections we compute the PageRank value for each dataset, and examine the influence of ignoring predicates on the computation. This process has been performed for more than 300 datasets, extracted from the LOD Laundromat. The results show that reuse of a dataset is not correlated with its size, and provide some insight into the limitations of the approach and ways to improve its efficacy.
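
The sketch below illustrates the computation described in the abstract, under simplified assumptions: datasets identified by pay-level domain form a directed graph through term reuse, and PageRank over that graph scores how strongly each dataset is reused. The toy graph and the plain power-iteration implementation (which lets rank leak at dangling nodes) are illustrative only.

def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank over a dict mapping each node to the nodes it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical term-reuse graph between pay-level domains:
reuse = {"museum.example": ["dbpedia.org", "w3.org"],
         "library.example": ["dbpedia.org"],
         "dbpedia.org": ["w3.org"]}
for domain, score in sorted(pagerank(reuse).items(), key=lambda kv: -kv[1]):
    print(f"{domain}: {score:.3f}")
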
... During the last years, significant research activities have appeared that focus on industrially relevant scenarios, such as the LATC (LATC Project 2016) and the LOD2 (LOD2 Project 2016) projects, which aim to contribute high-quality interlinked versions of public semantic web datasets and to promote their use in new cross-domain applications by developers across the globe (Verborgh et al. 2014). In the context of these efforts and emerging tools, while there is considerable support for linked data in other areas, such as storage (Virtuoso, Sesame), linkage (Silk Framework), discovery and publishing (SPARQL standard) and even visualization of RDF graphs (LodLive, CubeViz), there are very limited options for renovating existing data into Linked Data. ...
Article
Full-text available
Linked Data has become the current W3C-recommended approach for publishing data on the World Wide Web, as it is sharable, extensible, and easily re-usable. An ecosystem of linked data hubs in the Public Sector has the potential to offer significant benefits to its consumers (other public offices and ministries, as well as researchers, citizens and SMEs), such as increased accessibility and re-use value of their data through the use of web-scale identifiers and easy interlinking with datasets of other public data providers. The power and flexibility of schema-defying Linked Data, however, are counterbalanced by inborn factors that diminish the potential for cost-effective and efficient adoption by the Public Sector. The paper analyzes these challenges in view of the current state of the art in linked data technologies and proposes a technical framework that aims to hide the underlying complexity of linked data while maintaining and promoting the interlinking capabilities enabled by the Linked Data Paradigm. The paper presents the innovations behind our proposed solutions as well as their advantages, especially for non-expert users.
... As a result, publishing RDF data via SPARQL endpoints might not be a feasible option for many data providers. c) LDFragments: The third architecture pattern is LD-Fragments [18], which was proposed in 2014 with the aim of tackling the reliability and scalability problems prevalent with SPARQL endpoints. LDFragments introduces a low-cost query technique based on triple pattern fragments. ...
... Querying multiple datasets is possible via other paradigms, such as Linked Data Fragments [18]. However, these largely share the disadvantages of SPARQL. ...
Conference Paper
Full-text available
It is difficult to find resources on the Semantic Web today, in particular if one wants to search for resources based on natural language keywords and across multiple datasets. In this paper, we present LOTUS: Linked Open Text UnleaShed, a full-text lookup index over a huge Linked Open Data collection. We detail LOTUS' approach, its implementation, its coverage, and demonstrate the ease with which it allows the LOD cloud to be queried in different domain-specific scenarios.
... The strong dynamic RDF support of the platform allows the local store to be used as a cache which relies on large external stores for background knowledge. We plan to provide an interface to linked data fragments [37]. Using the LOD Laundromat linked data fragments endpoint and appropriate caching techniques will provide web-scale querying. ...
Article
Full-text available
ClioPatria is a comprehensive semantic web development framework based on SWI-Prolog. SWI-Prolog provides an efficient C-based main-memory RDF store that is designed to cooperate naturally and efficiently with Prolog, realizing a flexible RDF-based environment for rule-based programming. ClioPatria extends this core with a SPARQL and LOD server, an extensible web frontend to manage the server, browse the data, and query the data using SPARQL and Prolog, and a Git-based plugin manager. The ability to query RDF using Prolog provides query composition and smooth integration with application logic. ClioPatria is primarily positioned as a prototyping platform for exploring novel ways of reasoning with RDF data. It has been used in several research projects to perform tasks such as data integration, enrichment, and semantic search.
... Fortunately, the LOD Laundromat [8,9] makes this data available by gathering dataset dumps from the Web, including archived data. The LOD Laundromat "cleans" the data by fixing syntactic errors and removing duplicates, and then makes it available through download (either as gzipped N-Triples or N-Quads, or HDT [10] files), a SPARQL endpoint, and Triple Pattern Fragments [11]. ...
Conference Paper
Full-text available
While a number of quality metrics have been successfully proposed for datasets in the Web of Data, there is a lack of trust metrics that can be computed for any given dataset. We argue that reuse of data can be seen as an act of trust. In the Semantic Web environment, datasets regularly include terms from other sources, and each of these connections expresses a degree of trust in that source. However, determining what constitutes a dataset in this context is not straightforward. We use the concept of Pay-Level Domain to differentiate datasets, and consider usage of external terms as connections among them. Using these connections we compute the PageRank value for each dataset. This process has been performed for more than 300 datasets, extracted from the LOD Laundromat.
... We will extend this approach to continuously updating querying over dynamic data. Figure 1 shows this shift to more static data in relation to the Linked Data Fragments (LDF) [19] axis. LDF is a conceptual framework for comparing Linked Data publication interfaces, in which TPF can be seen as a trade-off between high server effort and high client effort for data retrieval. ...
Conference Paper
Our society is evolving towards massive data consumption from heterogeneous sources, which includes rapidly changing data like public transit delay information. Many applications that depend on dynamic data consumption require highly available server interfaces. Existing interfaces involve substantial costs to publish rapidly changing data with high availability, and are therefore only possible for organisations that can afford such an expensive infrastructure. In my doctoral research, I investigate how to publish and consume real-time and historical Linked Data on a large scale. To reduce server-side costs for making dynamic data publication affordable, I will examine different possibilities to divide query evaluation between servers and clients. This paper discusses the methods I aim to follow together with preliminary results and the steps required to use this solution. An initial prototype achieves significantly lower server processing cost per query, while maintaining reasonable query execution times and client costs. Given these promising results, I feel confident this research direction is a viable solution for offering low-cost dynamic Linked Data interfaces as opposed to the existing high-cost solutions.
... In particular, we aim for solutions with minimal server complexity (minimizing the cost for data publishers) while still enabling live querying (maximizing the utility for Semantic Web applications). In this article, we present and extend our ongoing work on Linked Data Fragments [6], a framework to analyze Linked Data publication interfaces on the Web, and Triple Pattern Fragments [7], a low-cost interface to triples. Novel contributions include in particular: an extended formalization (Sections 4 and 5) that details the response format and its considerations (Section 5.3); a detailed discussion of Triple Pattern Fragments publication and their relationship to existing interfaces (Sections 4.3 and 5.4); ...
Article
Billions of Linked Data triples exist in thousands of RDF knowledge graphs on the Web, but few of those graphs can be queried live from Web applications. Only a limited number of knowledge graphs are available in a queryable interface, and existing interfaces can be expensive to host at high availability. To mitigate this shortage of live queryable Linked Data, we designed a low-cost Triple Pattern Fragments interface for servers, and a client-side algorithm that evaluates SPARQL queries against this interface. This article describes the Linked Data Fragments framework to analyze Web interfaces to Linked Data and uses this framework as a basis to define Triple Pattern Fragments. We describe client-side querying for single knowledge graphs and federations thereof. Our evaluation verifies that this technique reduces server load and increases caching effectiveness, which leads to lower costs to maintain high server availability. These benefits come at the expense of increased bandwidth and slower, but more stable query execution times. These results substantiate the claim that lightweight interfaces can lower the cost for knowledge publishers compared to more expressive endpoints, while enabling applications to query the publishers’ data with the necessary reliability.
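As an illustration of how lightweight the interface is, a client can fetch a single page of a Triple Pattern Fragment with one plain HTTP request; the endpoint URL below is a placeholder, and unbound positions of the triple pattern are simply omitted from the query string:

import requests
from rdflib import Graph

# Placeholder fragment endpoint; real deployments expose one per dataset.
FRAGMENT_URL = "https://example.org/fragments/dataset"

# Selector: all triples with rdfs:label as predicate (subject and object unbound).
params = {"predicate": "http://www.w3.org/2000/01/rdf-schema#label"}

response = requests.get(
    FRAGMENT_URL,
    params=params,
    headers={"Accept": "text/turtle"},
    timeout=30,
)
response.raise_for_status()

# The page mixes data triples with metadata (estimated counts) and
# hypermedia controls (e.g. a link to the next page), all expressed as RDF.
page = Graph()
page.parse(data=response.text, format="turtle")
print(len(page), "triples on this fragment page")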
... Complex query interfaces transfer a significant proportion of the data analysis burden (namely the costly data selection process) to the server. This has proven problematic in terms of performance and reliability of the service endpoints [3] and has given rise to decentralised approaches to complex queries against the Web of Data, such as Linked Data Fragments [17] and Linked Data queries [5]. ...
Conference Paper
Usage mining has always been, and still is, a key research topic in the context of the Web [16]. This is evidenced by the series of papers that appear in the scientific tracks of the WWW conference year after year. Web usage is being studied to create economic value by placing targeted ads or delivering personalized content, but also to better understand how people behave online in mass movements and collective action.
Chapter
The European Union Agency for Railways is a European authority tasked with providing a legal and technical framework to support harmonized and safe cross-border railway operations throughout the EU. So far, the agency relied on traditional application-centric approaches to support data exchange among the multiple actors interacting within the railway domain. This led, however, to isolated digital environments that added barriers to digital interoperability while increasing the cost of maintenance and innovation. In this work, we show how Semantic Web technologies are leveraged to create a semantic layer for data integration across the base registries maintained by the agency. We validate the usefulness of this approach by supporting route compatibility checks, a highly demanded use case in this domain, which was not available over the agency's registries before. Our contributions include (i) an official ontology for the railway infrastructure and authorized vehicle types, including 28 reference datasets; (ii) a reusable Knowledge Graph describing the European railway infrastructure; (iii) a cost-efficient system architecture that enables high flexibility for use case development; and (iv) an open source and RDF native Web application to support route compatibility checks. This work demonstrates how data-centric system design, powered by Semantic Web technologies and Linked Data principles, provides a framework to achieve data interoperability and unlock new and innovative use cases and applications. Based on the results obtained during this work, ERA officially decided to make Semantic Web and Linked Data-based approaches the default setting for any future development of the data, registers and specifications under the agency's remit for data exchange mandated by the EU legal framework. The next steps, which are already underway, include further developing and bringing these solutions to a production-ready state.
Article
Natural Language Query Interfaces (NLQIs) have once again captured the public imagination, but developing them for the Semantic Web has proven to be non-trivial. This is unfortunate, because the Semantic Web offers many opportunities for interacting with smart devices, including those connected to the Internet of Things. In this paper, we present an NLQI to the Semantic Web based on a Compositional Semantics (CS) that can accommodate many particularly tricky aspects of the English language, including nested n-ary transitive verbs, superlatives, and chained prepositional phrases, and even ambiguity. Key to our approach is a new data structure which has proven to be useful in answering NL queries. As a consequence of this, our system is able to handle NL features that are often considered to be non-compositional. We also present a novel method to memoize sub-expressions of a query formed from CS, drastically improving query execution times with respect to large triplestores. Our approach is agnostic to any particular database query language. A live demonstration of our NLQI is available online.
Conference Paper
Web-based information services transformed how we interact with public transport. Discovering alternatives to reach destinations and obtaining live updates about them is necessary to optimize journeys and improve the quality of travellers' experience. However, keeping travellers updated with opportune information is demanding. Traditional Web APIs for live public transport data follow a polling approach and allocate all data processing to either data providers, lowering data accessibility, or data consumers, increasing the costs of innovative solutions. Moreover, the data processing load increases further because previously obtained route plans are fully recalculated when live updates occur. In-between solutions that share processing load between clients and servers, as well as alternative Web API architectures, have not yet been thoroughly investigated. We study performance trade-offs of polling and push-based Web architectures to efficiently publish and consume live public transport data. We implement (i) alternative architectures that allow sharing data processing load between clients and servers, and evaluate their performance following polling- and push-based approaches; (ii) a rollback mechanism that extends the Connection Scan Algorithm to avoid unnecessary full route plan recalculations upon live updates. Evaluations show polling as a more efficient alternative on CPU and RAM, but hint towards push-based alternatives when bandwidth is a concern. Clients update route plan results 8–10 times faster with our rollback approach. Smarter API design combining polling and push-based Web interfaces for live public transport data reduces the intrinsic costs of data sharing by equitably distributing the processing load between clients and servers. Future work can investigate more complex multimodal transport scenarios. Keywords: Public transport, Web interfaces, Live updates, Route planning
Chapter
This paper describes the system architecture for generating the History of the Law developed for the Chilean National Library of Congress (BCN). The production system uses Semantic Web technologies, Akoma-Ntoso, and tools that automate the markup of plain text to XML, enriching and linking documents. These semantically annotated documents make it possible to develop specialized political and legislative services and to extract knowledge for a Legal Knowledge Base for public use. We show the strategies used for the implementation of the automatic markup tools, and describe the knowledge graph generated from the semantic documents. Finally, we contrast document processing times using semantic technologies with those of manual tasks, and present the lessons learnt in this process, establishing a basis for replicating a technological model that enables the generation of useful services for diverse contexts.
Conference Paper
We present our work-in-progress on handling temporal RDF graph data using the Ethereum distributed ledger. This work is motivated by scenarios in which multiple distributed consumers of streamed data may need or wish to verify that the data has not been tampered with since it was generated – for example, if the data describes something which can be or has been sold, such as domestically generated electricity. We describe a system in which temporal annotations, and information suitable for validating a given dataset, are stored on a distributed ledger, alongside the results of fixed SPARQL queries executed at the time of data storage. The model adopted implements a graph-based form of temporal RDF, in which time intervals are represented by named graphs corresponding to ledger entries. We conclude by discussing evaluation, what remains to be implemented, and future directions.
Thesis
Full-text available
The Internet we use every day was not originally created for convenient data analysis, but rather for exchanging documents. The answer to this state of affairs appears to be the Semantic Web and Linked Data initiatives. Using this approach to the World Wide Web, we can construct search queries and obtain useful information without much effort. This work discusses issues related to the Semantic Web, Linked Data, RDF (the language describing the data model), different RDF serializations, and database caching concepts. In addition, I present RTriples, a complete RDF graph store based on a NoSQL database, adapted for use in Semantic Web and Linked Data environments. The whole is supplemented by performance tests of the presented solution. Title in English: Caching triples in RDF graph stores
Article
We describe SPARQLES: an online system that monitors the health of public SPARQL endpoints on the Web by probing them with custom-designed queries at regular intervals. We present the architecture of SPARQLES and the variety of analytics that it runs over public SPARQL endpoints, categorised by availability, discoverability, performance and interoperability. We also detail the interfaces that the system provides for human and software agents to learn more about the recent history and current state of an individual SPARQL endpoint or about overall trends concerning the maturity of all endpoints monitored by the system. We likewise present some details of the performance of the system and the impact it has had thus far.
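A much-simplified availability probe in this spirit, assuming the endpoint accepts GET requests with a query parameter and returns SPARQL JSON results:

import datetime

import requests

def probe(endpoint: str) -> bool:
    """Return True if the endpoint answers a trivial ASK query in time."""
    try:
        response = requests.get(
            endpoint,
            params={"query": "ASK WHERE { ?s ?p ?o }"},
            headers={"Accept": "application/sparql-results+json"},
            timeout=10,
        )
        return response.ok and "boolean" in response.json()
    except (requests.RequestException, ValueError):
        return False

endpoint = "https://dbpedia.org/sparql"  # any public SPARQL endpoint
print(datetime.datetime.utcnow().isoformat(), probe(endpoint))

Repeating such probes at regular intervals and logging the outcomes is enough to derive simple uptime statistics per endpoint.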
Article
This paper demonstrates that the presence of blank nodes in RDF data represents a problem for distributed processing of SPARQL queries. It is shown that the usual decomposition strategies from the literature will leak information - even when information derives from a single source. It is argued that this leakage, and the proper reparational measures, need to be accounted for in a formal semantics. To this end a set semantics for SPARQL is generalized with a parameter representing execution contexts. This makes it possible to keep tabs on the naming of blank nodes across execution contexts, which in turn makes it possible to articulate a decomposition strategy that is provably sound and complete wrt. any selection of RDF sources even when blank nodes are allowed. Alas, this strategy is not computationally tractable. However, there are ways of utilizing knowledge about the sources, if one has it, that will help considerably.
Conference Paper
Many ICT applications and services, including those from the Semantic Web, rely on the Web for the exchange of data. This includes expensive server and network infrastructures. Most rural areas of developing countries are not reached by the Web and its possibilities, while at the same time the ability to share knowledge has been identified as a key enabler for development. To make widespread knowledge sharing possible in these rural areas, the notion of the Web has to be downscaled based on the specific low-resource infrastructure in place. In this paper, we introduce SPARQL over SMS, a solution for Web-like exchange of RDF data over cellular networks in which HTTP is substituted by SMS. We motivate and validate this through two use cases in West Africa. We present the design and implementation of the solution, along with a data compression method that combines generic compression strategies and strategies that use Semantic Web specific features to reduce the size of RDF before it is transferred over the low-bandwidth cellular network.
Conference Paper
Scalability of the data access architecture in the Semantic Web depends on the establishment of caching mechanisms to take load off of servers. Unfortunately, there is a chicken-and-egg problem here: research, implementation, and evaluation of caching infrastructure is uninteresting as long as data providers do not publish relevant metadata, and publishing metadata is useless as long as there is no infrastructure that uses it. We show by means of a survey of live RDF data sources that caching metadata is already prevalent enough to be useful in some cases. On the other hand, such metadata is not commonly provided even for relatively static data, and when it is, it is set very conservatively. We point out future directions and give recommendations for the enhanced use of caching in the Semantic Web.
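Whether a Linked Data source publishes such caching metadata can be checked by inspecting its HTTP response headers, for example:

import requests

uri = "http://dbpedia.org/resource/Ghent"  # any dereferenceable Linked Data URI

response = requests.head(uri, allow_redirects=True, timeout=15)

# Headers a cache (or client) would consult before re-fetching the resource.
for header in ("Cache-Control", "Expires", "ETag", "Last-Modified"):
    print(header + ":", response.headers.get(header, "(not set)"))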
Conference Paper
In this paper, we investigate challenges related to the adaptation of similarity measures used in the field of Information Retrieval to work with semantic features, i.e. Named Entities. The challenges to consider are numerous, including the accuracy of the annotation process, the adapted similarity measures, the quality of the Linked Data referred to, and the efficient access to the Semantic Web. We discuss each challenge in detail, as well as possible ways to tackle them.
Conference Paper
Full-text available
Publicly available Linked Data repositories provide a multitude of information. By utilizing Sparql, Web sites and services can consume this data and present it in a user-friendly form, e.g., in mash-ups. To gather RDF triples for this task, machine agents typically issue similarly structured queries with recurring patterns against the Sparql endpoint. These queries usually differ only in a small number of individual triple pattern parts, such as resource labels or literals in objects. We present an approach to detect such recurring patterns in queries and introduce the notion of query templates, which represent clusters of similar queries exhibiting these recurrences. We describe a matching algorithm to extract query templates and illustrate the benefits of prefetching data by utilizing these templates. Finally, we comment on the applicability of our approach using results from real-world Sparql query logs.
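A greatly simplified illustration of the template idea: strip literals and IRIs from each logged query and use the remaining skeleton as a cluster key (the matching algorithm described in the paper is considerably more precise):

import re
from collections import defaultdict

def template_of(query: str) -> str:
    """Replace literals and IRIs with placeholders to obtain a query skeleton."""
    skeleton = re.sub(r'"[^"]*"(@\w+|\^\^\S+)?', "?LITERAL", query)
    skeleton = re.sub(r"<[^>]*>", "?IRI", skeleton)
    return re.sub(r"\s+", " ", skeleton).strip()

log = [
    'SELECT ?o WHERE { <http://ex.org/a> rdfs:label ?o }',
    'SELECT ?o WHERE { <http://ex.org/b> rdfs:label ?o }',
    'SELECT ?s WHERE { ?s rdfs:label "Ghent"@en }',
]

clusters = defaultdict(list)
for query in log:
    clusters[template_of(query)].append(query)

for template, queries in clusters.items():
    print(len(queries), "query/queries share the template:", template)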
Article
Full-text available
This paper will give an overview of the practical implications of the HATEOAS constraint of the REST architectural style, and in that light argue why hypermedia RDF is a practical necessity. We will then sketch a vocabulary for hypermedia RDF using Mike Amundsen's H Factor classification as motivator. Finally, we will briefly argue that SPARQL is important when making non-trivial traversals of Linked Data graphs, and see how a bridge between Linked Data and SPARQL may be created with hypermedia RDF.
Article
Full-text available
The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article we present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. We describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
Article
Full-text available
The current Web of Data is producing increasingly large RDF datasets. Massive publication efforts of RDF data driven by initiatives like the Linked Open Data movement, and the need to exchange large datasets has unveiled the drawbacks of traditional RDF representations, inspired and designed by a document-centric and human-readable Web. Among the main problems are high levels of verbosity/redundancy and weak machine-processable capabilities in the description of these datasets. This scenario calls for efficient formats for publication and exchange. This article presents a binary RDF representation addressing these issues. Based on a set of metrics that characterizes the skewed structure of real-world RDF data, we develop a proposal of an RDF representation that modularly partitions and efficiently represents three components of RDF datasets: Header information, a Dictionary, and the actual Triples structure (thus called HDT). Our experimental evaluation shows that datasets in HDT format can be compacted by more than fifteen times as compared to current naive representations, improving both parsing and processing while keeping a consistent publication scheme. Specific compression techniques over HDT further improve these compression rates and prove to outperform existing compression solutions for efficient RDF exchange.
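The Dictionary and Triples components can be illustrated in a few lines of Python: every RDF term is mapped to an integer once, and triples are stored as compact ID tuples; the bit-level compression and indexing that HDT adds on top are omitted from this toy sketch:

triples = [
    ("http://ex.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://ex.org/bob"),
    ("http://ex.org/alice", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
    ("http://ex.org/bob", "http://xmlns.com/foaf/0.1/name", '"Bob"'),
]

dictionary = {}  # term -> integer ID

def term_id(term: str) -> int:
    """Assign each distinct term a small integer exactly once."""
    return dictionary.setdefault(term, len(dictionary) + 1)

# Encode every triple as a tuple of three small integers.
id_triples = [(term_id(s), term_id(p), term_id(o)) for s, p, o in triples]
reverse = {i: t for t, i in dictionary.items()}

print("dictionary size:", len(dictionary))
print("encoded triples:", id_triples)
print("decoded first triple:", tuple(reverse[i] for i in id_triples[0]))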
Article
Full-text available
The publication of Linked Open Data on the Web has gained tremendous momentum over the last five years. This development makes possible (and interesting) the execution of queries using up-to-date data from multiple, automatically discovered data sources. As a result, we currently witness the emergence of a new research area that focuses on the online execution of Linked Data queries, i.e. queries that range over data made available using the Linked Data publishing principles. This article provides a general overview of this new area. In particular, we introduce the specific challenges that need to be addressed and then focus on possible strategies for executing Linked Data queries. Furthermore, we classify approaches proposed in the literature w.r.t. these strategies.
Conference Paper
Full-text available
Link traversal based query execution is a novel query approach which enables applications that exploit the Web of Data to its full potential. This approach makes use of the characteristics of Linked Data: during query execution it traverses data links to discover data that may contribute to query results. Once retrieved from the Web, the data can be cached and reused for subsequent queries. We expect such reuse to be beneficial for two reasons: first, it may improve query performance because it reduces the need to retrieve data multiple times; second, it may provide additional query results, calculated based on cached data, that would not be discoverable by a link traversal based execution alone. However, no systematic analysis exists that justifies the application of caching strategies based on these assumptions. In this paper we evaluate the potential of caching to improve efficiency and result completeness in link traversal based query execution systems. We conceptually analyze the potential benefit of keeping and reusing retrieved data. Furthermore, we verify the theoretical impact of caching by conducting a comprehensive experiment that is based on a real-world application scenario.
Conference Paper
Full-text available
Link traversal based query execution is a new query execution paradigm for the Web of Data. This approach allows the execution engine to discover potentially relevant data during query execution and thus enables users to tap the full potential of the Web. In earlier work we proposed to implement the idea of link traversal based query execution using a synchronous pipeline of iterators. While this idea allows for an easy and efficient implementation, it introduces restrictions that cause less comprehensive result sets. In this paper we address this limitation. We analyze the restrictions and discuss how the evaluation order of a query may affect result set size and query execution costs. To identify a suitable order, we propose a heuristic for our scenario in which no a priori information about relevant data sources is available. We evaluate this heuristic by executing real-world queries over the Web of Data.
Conference Paper
Full-text available
The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.
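A minimal link-traversal loop in that spirit, assuming dereferenceable URIs and the rdflib package; it dereferences URIs found in already retrieved triples, up to a small fixed budget:

from rdflib import Graph, URIRef

seed = URIRef("http://dbpedia.org/resource/Ghent")  # a URI mentioned in the query
frontier = [seed]
seen = set()
data = Graph()

while frontier and len(seen) < 5:  # small budget keeps the example short
    uri = frontier.pop(0)
    if uri in seen:
        continue
    seen.add(uri)
    try:
        data.parse(str(uri))  # dereference via HTTP content negotiation
    except Exception:
        continue  # skip unreachable sources or non-RDF responses
    # Follow links: queue object URIs discovered in the retrieved triples.
    for _, _, obj in data.triples((uri, None, None)):
        if isinstance(obj, URIRef) and obj not in seen:
            frontier.append(obj)

print(len(data), "triples gathered from", len(seen), "sources")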
Conference Paper
Full-text available
Loose coupling is often quoted as a desirable property of systems architectures. One of the main goals of building systems using Web technologies is to achieve loose coupling. However, given the lack of a widely accepted definition of this term, it becomes hard to use coupling as a criterion to evaluate alternative Web technology choices, as all options may exhibit, and claim to provide, some kind of "loose" coupling effects. This paper presents a systematic study of the degree of coupling found in service-oriented systems based on a multi-faceted approach. Thanks to the metric introduced in this paper, coupling is no longer a one-dimensional concept with loose coupling found somewhere in between tight coupling and no coupling. The paper shows how the metric can be applied to real-world examples in order to support and improve the design process of service-oriented systems.
Article
Full-text available
Data center availability is critical considering the explosive growth in Internet services and people's dependence on them. Furthermore, in recent years, sustainability has become important. However, data center designers have little information on the sustainability impact of data center availability architectures. In this paper, we present an approach to estimate the sustainability impact of such architectures. Availability is computed using Stochastic Petri Net (SPN) models while an exergy-based lifecycle assessment (LCA) approach is used for quantifying sustainability impact. The approach is demonstrated on real life data center power infrastructure architectures. Five different architectures are considered and initial results show that quantification of sustainability impact provides important information to a data center designer in evaluating availability architecture choices.
Article
Full-text available
The SPARQL Query Language for RDF and the SPARQL Protocol for RDF are implemented by a growing number of storage systems and are used within enterprise and open Web settings. As SPARQL is taken up by the community, there is a growing need for benchmarks to compare the performance of storage systems that expose SPARQL endpoints via the SPARQL protocol. Such systems include native RDF stores as well as systems that rewrite SPARQL queries to SQL queries against non-RDF relational databases. This article introduces the Berlin SPARQL Benchmark (BSBM) for comparing the performance of native RDF stores with the performance of SPARQL-to-SQL rewriters across architectures. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products. The benchmark query mix emulates the search and navigation pattern of a consumer looking for a product. The article discusses the design of the BSBM benchmark and presents the results of a benchmark experiment comparing the performance of four popular RDF stores (Sesame, Virtuoso, Jena TDB, and Jena SDB) with the performance of two SPARQL-to-SQL rewriters (D2R Server and Virtuoso RDF Views) as well as the performance of two relational database management systems (MySQL and Virtuoso RDBMS).
Article
Full-text available
Recently, the SPARQL query language for RDF has reached the W3C recommendation status. In response to this emerging standard, the database community is currently exploring efficient storage techniques for RDF data and evaluation strategies for SPARQL queries. A meaningful analysis and comparison of these approaches necessitates a comprehensive and universal benchmark platform. To this end, we have developed SP²Bench, a publicly available, language-specific SPARQL performance benchmark. SP²Bench is set in the DBLP scenario and comprises both a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. As a proof of concept, we apply SP²Bench to existing engines and discuss their strengths and weaknesses that follow immediately from the benchmark results.
Conference Paper
Applications are increasingly using triple stores as persistence backends, and accessing large amounts of data through SPARQL endpoints. To improve query performance, this paper presents an approach that reuses results of cached queries in a content-aware way for answering subsequent queries. With a focus on a common class of conjunctive SPARQL queries with filter conditions, not only does the paper provide an efficient method for testing whether a query can be evaluated on the result of a cached query, but it also shows how to evaluate the query. Experimental results show the effectiveness of the approach.
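At its simplest, result reuse amounts to keying a cache on a normalized query string, as in the sketch below; the content-aware approach described here goes further by answering a new query from the cached result of a different query, which this sketch does not attempt:

import re

import requests

ENDPOINT = "https://dbpedia.org/sparql"  # any SPARQL endpoint
cache = {}

def normalize(query: str) -> str:
    """Collapse whitespace so trivially different spellings share a cache entry."""
    return re.sub(r"\s+", " ", query).strip()

def cached_select(query: str) -> dict:
    key = normalize(query)
    if key not in cache:
        response = requests.get(
            ENDPOINT,
            params={"query": query},
            headers={"Accept": "application/sparql-results+json"},
            timeout=30,
        )
        response.raise_for_status()
        cache[key] = response.json()
    return cache[key]

results = cached_select("SELECT ?s WHERE { ?s a ?type } LIMIT 5")
print(len(results["results"]["bindings"]), "bindings (served from cache on repeat calls)")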
Conference Paper
Linked Data repositories offer a wealth of structured facts, useful for a wide array of application scenarios. However, retrieving this data using Sparql queries yields a number of challenges, such as limited endpoint capabilities and availability, or high latency for connecting to it. To cope with these challenges, we argue that it is advantageous to cache data that is relevant for future information needs. However, instead of retaining only results of previously issued queries, we aim at retrieving data that is potentially interesting for subsequent requests in advance. To this end, we present different methods to modify the structure of a query so that the altered query can be used to retrieve additional related information. We evaluate these approaches by applying them to requests found in real-world Sparql query logs.
Conference Paper
Coping with the ever-increasing amount of data becomes increasingly challenging. To alleviate the information overload put on people, systems are progressively being connected directly to each other. They exchange, analyze, and manipulate humongous amounts of data without any human interaction. Most current solutions, however, do not exploit the whole potential of the architecture of the World Wide Web and completely ignore the possibilities offered by Semantic Web technologies. Based on the experiences gained by implementing and analyzing various RESTful APIs and drawing from the longer history of Semantic Web research we developed Hydra, a small vocabulary to describe Web APIs. It aims to simplify the development of truly RESTful services by leveraging the power of Linked Data. By breaking the descriptions down into small independent fragments, a new breed of interoperable Web APIs using decentralized, reusable, and composable contracts can be realized.
Article
To obtain query performance comparable to that of relational databases, diverse database technologies have to be adapted to confront the complexity posed by both Resource Description Framework (RDF) data and SPARQL queries. Database caching is one such technology: it improves database performance at a reasonable space expense based on the spatial/temporal/semantic locality principle. However, existing caching schemes exploited in RDF stores are found to be dysfunctional for complex query semantics. Although semantic caching approaches work effectively in this case, little work has been done in this area. In this paper, we try to improve SPARQL query performance with semantic caching approaches, i.e., SPARQL algebraic expression tree (AET) based caching and entity caching. Successive queries with multiple identical sub-queries and star-shaped joins can be efficiently evaluated with these two approaches. The approaches are implemented on a two-level storage structure: main memory stores the most frequently accessed cache items, and items swapped out are stored on disk for possible future reuse. Evaluation results on three mainstream RDF benchmarks illustrate the effectiveness and efficiency of our approaches. Comparisons with previous research are also provided.
Conference Paper
Hundreds of public SPARQL endpoints have been deployed on the Web, forming a novel decentralised infrastructure for querying billions of structured facts from a variety of sources on a plethora of topics. But is this infrastructure mature enough to support applications? For 427 public SPARQL endpoints registered on the DataHub, we conduct various experiments to test their maturity. Regarding discoverability, we find that only one-third of endpoints make descriptive meta-data available, making it difficult to locate or learn about their content and capabilities. Regarding interoperability, we find patchy support for established SPARQL features like ORDER BY as well as (understandably) for new SPARQL 1.1 features. Regarding efficiency, we show that the performance of endpoints for generic queries can vary by up to 3–4 orders of magnitude. Regarding availability, based on a 27-month long monitoring experiment, we show that only 32.2% of public endpoints can be expected to have (monthly) “two-nines” uptimes of 99–100%.
Article
We explore the factors which contribute to achieving High Availability (HA) on Linux, from intrinsic cluster type to those which lengthen the application's uptime to those which reduce the unplanned downtime.
Article
The DBpedia project is a community effort to extract structured information from Wikipedia and to make this information accessible on the Web. The resulting DBpedia knowledge base currently describes over 2.6 million entities. For each of these entities, DBpedia defines a globally unique identifier that can be dereferenced over the Web into a rich RDF description of the entity, including human-readable definitions in 30 languages, relationships to other resources, classifications in four concept hierarchies, various facts as well as data-level links to other Web data sources describing the entity. Over the last year, an increasing number of data publishers have begun to set data-level links to DBpedia resources, making DBpedia a central interlinking hub for the emerging Web of Data. Currently, the Web of interlinked data sources around DBpedia provides approximately 4.7 billion pieces of information and covers domains such as geographic information, people, companies, films, music, genes, drugs, books, and scientific publications. This article describes the extraction of the DBpedia knowledge base, the current status of interlinking DBpedia with other data sources on the Web, and gives an overview of applications that facilitate the Web of Data around DBpedia.
Conference Paper
As SPARQL endpoints are increasingly used to serve linked data, their ability to scale becomes crucial. Although much work has been done to improve query evaluation, little has been done to take advantage of caching. Effective solutions for caching query results can improve scalability by reducing latency, network IO, and CPU overhead. We show that simple augmentation of the database indexes found in common SPARQL implementations can directly lead to effective caching at the HTTP protocol level. Using tests from the Berlin SPARQL benchmark, we evaluate the potential of such caching to improve overall efficiency of SPARQL query evaluation.
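On the client side, such HTTP-level caching relies on standard validators; the following sketch issues a conditional request, assuming the server exposes an ETag for the query result representation (the endpoint URL is a placeholder):

import requests

ENDPOINT = "https://example.org/sparql"  # placeholder endpoint
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"

first = requests.get(ENDPOINT, params={"query": QUERY}, timeout=30)
etag = first.headers.get("ETag")

if etag:
    # Revalidate instead of re-downloading: a 304 means the cached copy is still fresh.
    second = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"If-None-Match": etag},
        timeout=30,
    )
    print("revalidation status:", second.status_code)
else:
    print("server does not expose an ETag for query results")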
Article
The article included many scenarios in which intelligent agents and bots undertook tasks on behalf of their human or corporate owners. Of course, shopbots and auction bots abound on the Web, but these are essentially handcrafted for particular tasks: they have little ability to interact with heterogeneous data and information types. Because we haven't yet delivered large-scale, agent-based mediation, some commentators argue that the semantic Web has failed to deliver. We argue that agents can only flourish when standards are well established, and that the Web standards for expressing shared meaning have progressed steadily over the past five years.
Article
Enterprise integration encourages the development of beneficial applications. Enterprise architects tend to favor practices and approaches based on a service-oriented architecture (SOA). Many approaches to connecting software systems, including but not limited to CORBA, many Java-based integration systems, and .NET, generally follow the best practices of SOA.
Bizer, C., Heath, T., Berners-Lee, T.: Linked Data – the story so far. International Journal on Semantic Web and Information Systems 5(3), 1–22 (Mar 2009), http://tomheath.com/papers/bizer-heath-bernerslee-ijswis-linked-data.pdf
Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.Y.: SPARQL Web-querying infrastructure: Ready for action? In: Proceedings of the 12th International Semantic Web Conference (Nov 2013), http://link.springer.com/chapter/10.1007/978-3-642-41338-4_18
Hartig, O., Bizer, C., Freytag, J.C.: Executing SPARQL queries over the Web of Linked Data. In: Proceedings of the 8th International Semantic Web Conference, pp. 293–309. Springer (2009), http://www2.informatik.hu-berlin.de/~hartig/files/HartigEtAl_QueryTheWeb_ISWC09_Preprint.pdf
Kjernsmo, K.: The necessity of hypermedia RDF and an approach to achieve it. In: Proceedings of the Workshop on Linked APIs for the Semantic Web (May 2012), http://lapis2012.linkedservices.org/papers/1.pdf
Lanthaler, M., Gütl, C.: Hydra: A vocabulary for hypermedia-driven Web APIs. In: Proceedings of the 6th Workshop on Linked Data on the Web (May 2013), http://ceur-ws.org/Vol-996/papers/ldow2013-paper-03.pdf
Martin, M., Unbehauen, J., Auer, S.: Improving the performance of Semantic Web applications with SPARQL query caching. In: The Semantic Web: Research and Applications, Lecture Notes in Computer Science, vol. 6089, pp. 304–318. Springer (2010), http://dx.doi.org/10.1007/978-3-642-13489-0_21
Vinoski, S.: Serendipitous reuse. Internet Computing 12(1), 84–87 (Jan 2008), http://steve.vinoski.net/pdf/IEEE-Serendipitous_Reuse.pdf
Fielding, R.T., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., Berners-Lee, T.: Hypertext Transfer Protocol (HTTP/1.1). Request For Comments 2616, Internet Engineering Task Force (Jun 1999), http://tools.ietf.org/html/rfc2616
Groth, P., Moreau, L.: PROV overview. Working group note, World Wide Web Consortium (Apr 2013), http://www.w3.org/TR/prov-overview/
Cyganiak, R., Zhao, J., Alexander, K., Hausenblas, M.: Vocabulary of Interlinked Datasets (VoID). Interest group note, World Wide Web Consortium (Mar 2011), http://www.w3.org/TR/void/
Troncy, R., Mannens, E., Pfeiffer, S., Van Deursen, D.: Media Fragments URI 1.0 (basic). Recommendation, World Wide Web Consortium (Sep 2012), http://www.w3.org/TR/media-frags/
sparql . federated query. Recommendation, World Wide Web Consortium
  • E Prud 'hommeaux
  • C Buil-Aranda
Prud'hommeaux, E., Buil-Aranda, C.: sparql . federated query. Recommendation, World Wide Web Consortium (Mar 2013), http:// www.w3.org/ TR/ sparql11-federated-query/
Speicher, S., Arwe, J., Malhotra, A.: Linked Data Platform 1.0. Working draft, World Wide Web Consortium (Jul 2013), http://www.w3.org/TR/2013/WD-ldp-20130730/
Cyganiak, R., Bizer, C.: Pubby – a Linked Data frontend for SPARQL endpoints, http://wifo5-03.informatik.uni-mannheim.de/pubby/
Belshe, M., Peon, R., Thomson, M., Melnikov, A.: Hypertext Transfer Protocol version 2.0. Internet draft, Internet Engineering Task Force (Dec 2013), http://tools.ietf.org/html/draft-ietf-httpbis-http2-09
Nottingham, M.: Feed paging and archiving. Request For Comments 5005, Internet Engineering Task Force (Sep 2007), http://tools.ietf.org/html/rfc5005
sparql . query language. Recommendation, World Wide Web Consortium
  • S Harris
  • A Seaborne
Harris, S., Seaborne, A.: sparql . query language. Recommendation, World Wide Web Consortium (Mar 2013), http:// www.w3.org/ TR/ sparql11-query/
Williams, G.T., Weaver, J.: Enabling fine-grained HTTP caching of SPARQL query results. In: Proceedings of the 10th International Conference on The Semantic Web, pp. 762–777. Springer (2011), http://www.cs.rpi.edu/~weavej3/papers/iswc2011.pdf