Article (PDF available)

Towards Efficient Schema-Enhanced Pattern Matching over RDF Data Streams

Authors:

Abstract

Data streams, often seen as sources of events, have appeared on the Web. However, event processing on the Web needs to cope with the typical openness and heterogeneity of the Web environment. Semantic Web technology, meant to facilitate data integration in an open environment, can help to address heterogeneities across multiple streams. In this paper we discuss an approach towards efficient pattern matching over RDF data streams based on the Rete algorithm, which can be considered a first building block for event processing on the Web. Our approach focuses on enhancing Rete with knowledge from the RDF schema associated with the data streams, so that implicit knowledge can contribute to pattern matching. Moreover, we cover Rete extensions that cope with the streaming nature of the processed data, such as support for temporal operators, time windows, consumption strategies and garbage collection.
... In Sparkwave, entailments have been implemented separately and denoted as the ε-network [Komazec and Cerri(2011), Komazec et al(2012)], to separate them from the α- and β-nodes found in the original Rete algorithm [Forgy(1982)]. In [Komazec et al(2012)] it is concluded that "only rules rdfs2, rdfs3, rdfs7 and rdfs9 need consideration at runtime", but Sparkwave also adds three rules on owl:inverseOf and owl:SymmetricProperty to arrive at subsets of both RDFS and OWL entailments. ...
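The runtime rules named in this excerpt can be pictured with a minimal Python sketch (not Sparkwave's actual ε-network; the schema closures and prefixes below are hypothetical): each incoming triple is expanded with its subclass, subproperty and inverse-property entailments before it reaches the pattern-matching network.

```python
# Minimal sketch (not Sparkwave's actual ε-network): expand each incoming triple
# with schema entailments before it reaches the pattern-matching network. The
# schema closures below are hypothetical and would be precomputed from the
# ontology associated with the stream.

RDF_TYPE = "rdf:type"

SUBCLASS_OF = {"ex:Car": {"ex:Vehicle"}}            # rdfs:subClassOf closure
SUBPROPERTY_OF = {"ex:hasDriver": {"ex:involves"}}  # rdfs:subPropertyOf closure
INVERSE_OF = {"ex:drives": "ex:isDrivenBy"}         # owl:inverseOf pairs

def expand(triple):
    """Yield the incoming triple plus the triples it entails under the schema."""
    s, p, o = triple
    yield triple
    if p == RDF_TYPE:                                # rdfs9: subclass entailment
        for super_class in SUBCLASS_OF.get(o, ()):
            yield (s, RDF_TYPE, super_class)
    for super_prop in SUBPROPERTY_OF.get(p, ()):     # rdfs7: subproperty entailment
        yield (s, super_prop, o)
    if p in INVERSE_OF:                              # owl:inverseOf expansion
        yield (o, INVERSE_OF[p], s)

for incoming in [("ex:car1", RDF_TYPE, "ex:Car"),
                 ("ex:car1", "ex:hasDriver", "ex:bob"),
                 ("ex:bob", "ex:drives", "ex:car1")]:
    print(list(expand(incoming)))
```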
... One regime may also implement two inverse variants of a rule (e.g. prp-inv1 and prp-inv2 in OWL 2 RL [W3C (2012)]), while another solution is to implement one of the rules complemented by an explicit inverse rule (inv1 and inv2 in [Komazec et al(2012)]). It is therefore observed that apart from D*, P and SWCLOS2, which are intentionally disjoint, there is no modularity between entailment regimes and they are not suitable for use in parallel. ...
Article
Full-text available
Stream reasoning is one of the building blocks giving the Semantic Web an advantage in the race for the real-time web. This paper demonstrates an implementation of materialisation-based reasoning using an event processor supporting networks of specification-compliant SPARQL Update rules. Collections of rules coded in SPARQL leave the rule implementation exposed for selection and modification by the platform user, using the same query language for both the queries and the entailment rules. Observations are made on the differences between SPARQL and rule semantics. The entailment-category tests of the SPARQL 1.1 conformance test set are thoroughly reviewed. New rules are constructed to improve the platform's pass rate, and the test results are measured. An event-based memory handling solution to the accumulation of data in stream processing scenarios, through separation of static data (e.g. the ontology) from dynamic event data, is presented and tested. This implementation extends the reasoning support available in an RDF stream processor from RDF(S) to rho-df, D*, P-entailment and OWL 2 RL. The performance of the Instans platform is measured using a well-known benchmark requiring reasoning, comparing complete sets of entailment rules against the necessary subset to complete each test. Performance is also compared to non-streaming SPARQL query processors with reasoning support.
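The idea of coding an entailment rule directly in SPARQL can be shown with a small, hedged sketch (not the Instans rule collection itself), assuming the rdflib library: the rdfs9 subclass rule is written as a standard SPARQL 1.1 INSERT and applied to a graph holding both the ontology and the instance data.

```python
# Illustrative sketch only (not the Instans rule collection), assuming rdflib:
# the rdfs9 subclass entailment written as a SPARQL 1.1 INSERT rule.
from rdflib import Graph, URIRef, RDF

g = Graph()
g.parse(data="""
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/> .
    ex:Car  rdfs:subClassOf ex:Vehicle .
    ex:car1 a ex:Car .
""", format="turtle")

# rdfs9: ?x rdf:type ?c and ?c rdfs:subClassOf ?d entail ?x rdf:type ?d.
# Re-running the update until no new triples appear would yield the closure.
g.update("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    INSERT { ?x a ?super }
    WHERE  { ?x a ?class . ?class rdfs:subClassOf ?super }
""")

assert (URIRef("http://example.org/car1"), RDF.type,
        URIRef("http://example.org/Vehicle")) in g
print("ex:car1 is entailed to be an ex:Vehicle")
```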
... -No support for heterogeneous events: Delimiting windows by time (with timestamps assigned to individual triples) or the number of triples carries a strong assumption that each event is represented by a single triple, an approach sometimes referred to as data stream processing [13]. Event objects consisting of a variable number of triples would be split across window borders (or merged, if the timestamps are identical). ...
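One way to avoid splitting multi-triple events across window borders is to group incoming triples by an event identifier before any windowing is applied; the Python sketch below uses a hypothetical per-triple event key (here simply the subject URI) purely for illustration.

```python
# Sketch of grouping triples into event objects before windowing, so that a
# multi-triple event is never split across window borders. The event key used
# here (the subject URI) is a hypothetical modelling convention; a named graph
# or a dedicated event-id property would serve the same purpose.
from collections import defaultdict

def group_events(timestamped_triples, event_of):
    """Group (timestamp, triple) pairs into complete event objects."""
    events = defaultdict(list)
    for ts, triple in timestamped_triples:
        events[event_of(triple)].append((ts, triple))
    return dict(events)

stream = [
    (1, ("ex:e1", "ex:sensor", "ex:s42")),
    (2, ("ex:e1", "ex:value", "21.5")),
    (2, ("ex:e2", "ex:sensor", "ex:s43")),
]
# ex:e1 keeps both of its triples even though their timestamps differ
print(group_events(stream, event_of=lambda triple: triple[0]))
```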
... Streaming SPARQL was first presented in [11] using a network highly similar to Rete. Komazec and Cerri [13] apply SPARQL queries to RDF data using an extended Rete algorithm in a system called Sparkwave. Their focus is on supporting selected RDF and RDFS inference rules through the use of a preprocessing ε-network and fast processing of data streams consisting of individual triples instead of multi-triple events. ...
... This size of network would already be capable of solving quite complex event processing tasks. The complete set of rules is available in executable form in the Instans GitHub repository. Memory management was not discussed in detail in this document, but the complete query set in the repository avoids garbage buildup by two means: ...
Conference Paper
Full-text available
SPARQL was originally developed as a derivative of SQL to process queries over finite-length datasets encoded as RDF graphs. Processing of infinite data streams with SPARQL has been approached by using pre-processors dividing streams into finite-length windows based on either time or the number of incoming triples. Recent extensions to SPARQL can support interconnections of queries, enabling event processing applications to be constructed out of multiple incrementally processed collaborating SPARQL update rules. With more elaborate networks of queries it is possible to perform event processing on heterogeneous event formats without strict restrictions on the number of triples per event. Heterogeneous event support combined with the capability to synthesize new events enables the creation of layered event processing systems. In this paper we review the different types of complex event processing building blocks presented in literature and show their translations to SPARQL update rules through examples, supporting a modular and layered approach. The interconnected examples demonstrate the creation of an elaborate network of SPARQL update rules for solving event processing tasks.
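The layering described here can be illustrated with a hedged sketch, assuming rdflib and a purely invented vocabulary (ex:TempReading, ex:Overheating, ex:Alert): the first SPARQL Update rule synthesises a composite event from two low-level readings, and the second rule consumes the synthesised event. A real engine such as Instans would evaluate such rules continuously and incrementally rather than in a one-off loop.

```python
# Hedged sketch of layered SPARQL Update rules (not the Instans engine; all
# vocabulary is invented for illustration). Rule 1 synthesises a composite
# event from two low-level readings; rule 2 consumes it to raise an alert.
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:r1 a ex:TempReading ; ex:sensor ex:s1 ; ex:value 95 .
    ex:r2 a ex:TempReading ; ex:sensor ex:s1 ; ex:value 97 .
""", format="turtle")

rule1 = """
    PREFIX ex: <http://example.org/>
    INSERT { [] a ex:Overheating ; ex:sensor ?s }
    WHERE  { ?a a ex:TempReading ; ex:sensor ?s ; ex:value ?v1 .
             ?b a ex:TempReading ; ex:sensor ?s ; ex:value ?v2 .
             FILTER (?a != ?b && ?v1 > 90 && ?v2 > 90) }
"""
rule2 = """
    PREFIX ex: <http://example.org/>
    INSERT { [] a ex:Alert ; ex:about ?s }
    WHERE  { ?e a ex:Overheating ; ex:sensor ?s }
"""
# A streaming engine would evaluate such rules continuously and incrementally;
# here they are simply applied once, in order, to show the layering.
for rule in (rule1, rule2):
    g.update(rule)

for row in g.query("PREFIX ex: <http://example.org/> "
                   "SELECT DISTINCT ?s WHERE { ?alert a ex:Alert ; ex:about ?s }"):
    print("alert raised about sensor", row.s)
```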
... Sparkweave [6] applies SPARQL queries to RDF data using an extended Rete algorithm, but focuses on inference and fast data stream processing of individual triples instead of heterogeneous events. Sparkweave v. 1.1 also does not support SPARQL 1.1 features such as SPARQL Update. ...
... No other system based on collaborative SPARQL queries is known to us. Current systems in the research community are mainly concentrating on running one query at a time. Even those that allow registering multiple simultaneous queries do not expect the queries to communicate during runtime. ...
Conference Paper
Full-text available
Complex event processing is currently done primarily with proprietary definition languages. Future smart environments will require collaboration of multi-platform sensors operated by multiple parties. The goal of my research is to verify the applicability of standard-compliant SPARQL for complex event processing tasks. If successful, the Semantic Web standards RDF, SPARQL and OWL, with their established base of tools, have many other benefits for event processing, including support for interconnecting disjoint vocabularies, enriching event information with linked open data and reasoning over semantically annotated content. A software platform capable of continuous incremental evaluation of multiple parallel SPARQL queries is a key enabler of the approach.
... However, extensions of it exist to support stream reasoning, e.g. [13], [81] and [43]. Jess also supports backward chaining, which is effectively simulated in terms of forward-chaining rules [109]. ...
Thesis
Semantic technologies have been extensively used for integrating stream data applications. However, SWRL, which has become the de facto standard rule language in the Semantic Web, has never been used in stream data applications. Its open world assumption and monotonic nature make SWRL powerless for doing continuous inference over stream data. For example, aggregate functions on a particular window of streams cannot be expressed in SWRL. The Semantic Web's standard query language, SPARQL, has been extensively used in stream data applications. A number of its extensions have been developed to enable powerful stream processing capabilities, including data filtering and aggregation functions. One of the most recognized, C-SPARQL, is a framework which supports continuous querying over data streams combined with "static" knowledge bases. However, stream processing systems have their own limitations, e.g. they cannot modify the knowledge base. State-of-the-art stream reasoning systems have achieved the desired expressivity and scalability level; however, being hybrid approaches, they suffer from translation, reasoner and side-effect issues. The purpose of this thesis, therefore, is to provide a unified Semantic Web stream reasoning framework that further supports continuous inference over stream data. C-SWRL was developed, a system that uses SWRL rules in conjunction with C-SPARQL filtering and aggregation of RDF streams to enable closed-world and time-aware reasoning over stream data. Moreover, non-monotonic behavior is supported with the use of OWL API constructs. In particular, it is shown how negation as failure (NAF) can be enabled in this system. C-SWRL is presented by means of examples in water quality monitoring. Moreover, the contribution of this thesis also includes the development of an ontology for water quality management called the INWS ontology. Namely, it is an SSN-based ontology to support water quality classification based on different regulation authorities such as the EU Water Framework Directive. Furthermore, to demonstrate its usage, StreamJess was developed, an expert system which uses the INWS ontology for water quality monitoring and the investigation of potential sources of pollution.
... As a different solution to optimizing mobile, semantic reasoning, Tai et al. [7] present a selective rule loading algorithm, which composes a pD* ruleset based on ontology expressivity; and a two-phase RETE construction process, which utilizes selectivity information from the first phase to optimize join sequences in the second phase. Komazec et al. [38] integrated a special network into RETE to optimize RDFS entailments. ...
Conference Paper
Full-text available
Mobile hardware improvements have opened the door for deploying rule systems on ubiquitous, mobile platforms. By executing rule-based tasks locally, less remote (cloud) resources are needed, bandwidth usage is reduced, and local, time-sensitive tasks are no longer influenced by network conditions. Further, with data being increasingly published in semantic format, an opportunity arises for rule systems to leverage the embedded semantics of semantic, ontology-based data. To support this kind of ontology-based reasoning in rule systems, rule-based axiomatizations of ontology semantics can be utilized (e.g., OWL 2 RL). Nonetheless, recent benchmarks have found that any kind of semantic reasoning on mobile platforms still lacks scalability, at least when directly re-using existing (PC- or server-based) technologies. To create a tailored solution for resource-constrained platforms, we propose changes to RETE, the mainstay algorithm for production rule systems. In particular, we present an adapted algorithm that, by selectively pooling RETE memories, aims to better balance memory usage with performance. We show that this algorithm is well-suited towards many typical Semantic Web scenarios. Using our custom algorithm, we perform an extensive evaluation of semantic, ontology-based reasoning, using our custom RETE algorithm and an OWL 2 RL ruleset, on both the PC and mobile platforms.
... However, extensions of it exist to support stream reasoning, e.g. (Walzer et al., 2008), (Komazec and Cerri, 2011) and (Schmidt et al., 2008). Jess also supports backward chaining, which is effectively simulated in terms of forward-chaining rules (Hill, 2003). ...
Article
Stream data knowledge bases modeled with OWL have proved to be a natural approach. However, querying and reasoning over these knowledge bases is not supported by standard Semantic Web technologies like SPARQL and SWRL. Query processing systems enable querying, but to the best of our knowledge, Semantic Web rules are still unable to handle the reasoning features required for effective inference over stream data, i.e. non-monotonic, closed-world and time-aware reasoning. In the absence of such a system, we showed in our previous work how Jess can be used for monitoring water quality, but with input data provided manually. In this paper, we enable stream data support and thus timely detection of faulty water quality statuses. The system also identifies potential sources of pollution by extending our ontology with a pollutants module. The solution utilizes C-SPARQL's abilities to filter and aggregate RDF streams on windows to enable closed-world and time-aware reasoning with Jess rules. Moreover, JessTab functions are used to enable non-monotonic behavior.
... Streaming SPARQL was first presented in [17] using a network highly similar to Rete. Komazec and Cerri [20] apply SPARQL queries to RDF data using an extended Rete algorithm in a system called Sparkwave. Their focus is on supporting selected RDF and RDFS inference rules through the use of a pre-processing network and fast processing of data streams consisting of individual triples instead of multi-triple events. ...
Article
Full-text available
SPARQL was originally developed to process queries over finite-length datasets encoded as RDF graphs. Processing of infinite data streams can be enabled through continuous incremental evaluation of an incoming event stream. SPARQL Update provides tools for interconnecting queries, enabling event processing applications to be constructed out of multiple incrementally processed collaborating rules. These rule networks can perform event processing on heterogeneous event structures. Heterogeneous event support combined with the capability to synthesise new events enables the creation of layered event processing networks. In this paper, we review the different types of complex event processing building blocks presented in the literature and show their translations to SPARQL Update rules through examples, supporting a modular and layered approach. The interconnected examples demonstrate the creation of an elaborate network for solving event processing tasks. The performance of the example event processing network is verified on the Instans platform.
... A pure performance comparison of Instans would be best made against another RDF/SPARQL system using incremental query evaluation. Sparkweave version 1.1, presented by Komazec and Cerri [8], which is based on the Rete algorithm, is the only one we have discovered so far. Unfortunately, it does not support FILTERs or SPARQL 1.1 features, so it was not possible to use it for comparison of our example. ...
Conference Paper
Full-text available
The SPARQL query language is targeted at searching datasets encoded in RDF. SPARQL Update adds support for insert and delete operations between graph stores, enabling queries to process data in steps, have persistent memory and communicate with each other. When used in a system supporting incremental evaluation of multiple simultaneously active and collaborating queries, SPARQL can define entire event processing networks. The method is demonstrated by an example service which triggers notifications about the proximity of friends, comparing alternative SPARQL-based approaches. The observed performance, in terms of both notification delay and correctness of results, far exceeds that of systems based on window repetition, without extending standard SPARQL or RDF.
Article
Full-text available
This article defines C-SPARQL, an extension of SPARQL whose distinguishing feature is the support of continuous queries, i.e. queries registered over RDF data streams and then continuously executed. Queries consider windows, i.e. the most recent triples of such streams, observed while data is continuously flowing. Supporting streams in RDF format guarantees interoperability and opens up important applications, in which reasoners can deal with evolving knowledge over time. C-SPARQL is presented by means of a full specification of the syntax, a formal semantics, and a comprehensive set of examples, relative to urban computing applications, that systematically cover the SPARQL extensions. The expression of meaningful queries over streaming data is strictly connected to the availability of aggregation primitives, thus C-SPARQL also includes extensions in this respect.
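What a C-SPARQL window declaration (FROM STREAM ... [RANGE ... STEP ...]) provides declaratively can be approximated procedurally; the following plain-Python sketch, with illustrative predicate names and window size, computes a sliding-window average over timestamped triples.

```python
# Plain-Python sketch of the kind of windowed aggregation C-SPARQL expresses
# declaratively; predicate names and the 30-second range are illustrative only.
from collections import deque

class SlidingAverage:
    """Average of ex:value observations seen in the last `range_s` seconds."""
    def __init__(self, range_s):
        self.range_s = range_s
        self.window = deque()                 # (timestamp, value) pairs

    def push(self, timestamp, triple):
        s, p, o = triple
        if p == "ex:value":
            self.window.append((timestamp, float(o)))
        # evict observations that have fallen out of the time window
        while self.window and self.window[0][0] < timestamp - self.range_s:
            self.window.popleft()
        values = [v for _, v in self.window]
        return sum(values) / len(values) if values else None

agg = SlidingAverage(range_s=30)
for ts, triple in [(0, ("ex:r1", "ex:value", "20.0")),
                   (10, ("ex:r2", "ex:value", "22.0")),
                   (45, ("ex:r3", "ex:value", "24.0"))]:
    print(ts, agg.push(ts, triple))   # at t=45 the older readings have expired
```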
Chapter
Complex Event Processing (CEP) is concerned with timely detection of complex events within multiple streams of atomic occurrences, and has useful applications in areas including financial services, mobile and sensor devices, click stream analysis and so forth. In this chapter, we present the ETALIS Language for Events. It is an expressive language for specifying and combining complex events. For this language we provide both a syntax and a clear declarative formal semantics. The execution model of the language is based on a compilation strategy into Prolog. We provide an implementation of the language and present experimental results of our running prototype. Further on, we show how our logic rule-based approach compares with a non-logic approach with respect to performance.
Article
Snoop is an event specification language developed for expressing primitive and composite events that are part of Event-Condition-Action (or ECA) rules. In Snoop, an event was defined to be an instantaneous, atomic (happens completely or not at all) occurrence of interest and the time of occurrence of the last event in an event expression was used as the time of occurrence for the entire event expression. The above detection-based semantics does not recognize multiple compositions of some operators – especially Sequence – in the intended way. In order to recognize all event operators, in all contexts, in the intended way, operator semantics need to include start time as well as end time for an event expression (i.e., interval-based semantics). In this paper, we formalize Snoop Interval-Based (SnoopIB), the occurrence of Snoop event operators and expressions using interval-based semantics. The algorithms for the detection of events using interval-based semantics introduce some challenges, as not all the events are known (especially their starting points).
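A minimal sketch of interval-based sequence detection, following the idea described here (an occurrence carries both a start and an end time) rather than the paper's actual algorithms:

```python
# Sketch of interval-based detection for the Snoop sequence operator, following
# the idea that an occurrence carries both a start and an end time (not the
# paper's detection algorithms).
from collections import namedtuple

Occurrence = namedtuple("Occurrence", "event start end")

def sequence(a_occurrences, b_occurrences):
    """A;B: pair occurrences where A ends strictly before B starts.

    The composite occurrence spans [A.start, B.end], which is what allows
    nested compositions such as (A;B);C to be recognised as intended.
    """
    for a in a_occurrences:
        for b in b_occurrences:
            if a.end < b.start:
                yield Occurrence(("seq", a.event, b.event), a.start, b.end)

A = [Occurrence("a1", start=1, end=3)]
B = [Occurrence("b1", start=2, end=4), Occurrence("b2", start=5, end=6)]
print(list(sequence(A, B)))   # only (a1, b2); b1 started before a1 had ended
```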
Article
The Rete Match Algorithm is an efficient method for comparing a large collection of patterns to a large collection of objects. It finds all the objects that match each pattern. The algorithm was developed for use in production system interpreters, and it has been used for systems containing from a few hundred to more than a thousand patterns and objects. This article presents the algorithm in detail. It explains the basic concepts of the algorithm, it describes pattern and object representations that are appropriate for the algorithm, and it describes the operations performed by the pattern matcher.
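A toy illustration of the incremental matching that Rete exploits, not Forgy's full algorithm: single-fact (alpha) tests feed left and right memories, and a join (beta) node combines them on shared variables as each fact arrives, so earlier partial matches are reused.

```python
# Toy Rete-style matching sketch: alpha tests over single facts, plus a beta
# join over the resulting memories. Patterns and facts are illustrative.

def alpha_test(pattern, fact):
    """Match one fact against a pattern; '?x'-style items are variables."""
    bindings = {}
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):
            bindings[p] = f
        elif p != f:
            return None
    return bindings

class BetaJoin:
    def __init__(self, left_pattern, right_pattern):
        self.left_pattern, self.right_pattern = left_pattern, right_pattern
        self.left_memory, self.right_memory = [], []

    @staticmethod
    def _consistent(b1, b2):
        return all(b2.get(k, v) == v for k, v in b1.items())

    def add_fact(self, fact):
        """Propagate one fact through both alpha tests and join incrementally."""
        matches = []
        left = alpha_test(self.left_pattern, fact)
        if left is not None:
            self.left_memory.append(left)
            matches += [{**left, **r} for r in self.right_memory
                        if self._consistent(left, r)]
        right = alpha_test(self.right_pattern, fact)
        if right is not None:
            self.right_memory.append(right)
            matches += [{**l, **right} for l in self.left_memory
                        if self._consistent(l, right)]
        return matches

# Rule body: (?x worksFor ?y) AND (?y locatedIn Vienna)
node = BetaJoin(("?x", "worksFor", "?y"), ("?y", "locatedIn", "Vienna"))
for fact in [("alice", "worksFor", "acme"), ("acme", "locatedIn", "Vienna")]:
    print(fact, "->", node.add_fact(fact))
```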
Conference Paper
Complex event processing is an important technology with possible application in supply chain management and business activity monitoring. Its basis is the identification of event patterns within multiple occurring events having logical, causal or temporal relationships. The Rete algorithm is commonly used in rule-based systems to trigger certain actions if a corresponding rule holds. The algorithm’s good performance for a high number of rules makes it ideally suited for complex event detection. However, the traditional Rete algorithm does not support aggregation of values in time-based windows although this is a common requirement in complex event processing for business applications. We propose an extension of the Rete algorithm to support temporal reasoning, namely the detection of time-based windows by adding a time-enabled beta-node to restrict event detection to a certain time-frame.
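The proposed time-enabled beta node can be pictured with a short sketch (an interpretation of the idea, not the authors' implementation): partial matches that fall outside the time-based window are discarded before joining, and an aggregate is computed over the matches that remain.

```python
# Sketch of the idea behind a time-enabled beta node: partial matches outside
# the time-based window are expired, and an aggregate is produced at the node.
from collections import deque

class TimeWindowNode:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.tokens = deque()                  # (timestamp, value) partial matches

    def insert(self, timestamp, value):
        self.tokens.append((timestamp, value))
        # expire tokens that have left the time-based window
        while self.tokens and self.tokens[0][0] < timestamp - self.window:
            self.tokens.popleft()

    def aggregate(self):
        """Example aggregate restricted to the current window: a running sum."""
        return sum(v for _, v in self.tokens)

node = TimeWindowNode(window_seconds=60)
for ts, value in [(0, 5), (30, 7), (90, 2)]:
    node.insert(ts, value)
    print(ts, node.aggregate())                # at ts=90 the ts=0 token expired
```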
Conference Paper
Making a database system active entails developing an expressive event specification language with well-defined semantics, algorithms for the detection of composite events, and an architecture for an event detector along with its implementation. This paper presents the semantics of composite events using the notion of a global event history (or a global event-log). Parameter contexts are introduced and precisely defined to facilitate efficient management and detection of composite events. Finally, an architecture and the implementation of a composite event detector are analyzed in the context of an object-oriented active DBMS.
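Parameter contexts can be illustrated with a toy sequence detector in Python (simplified; not the paper's detector): under a "recent"-style context only the most recent initiator is paired with the terminator, whereas a cumulative-style context pairs every pending initiator.

```python
# Toy illustration of parameter contexts (simplified; not the paper's detector):
# "recent" pairs only the newest initiator with the terminator, while a
# cumulative-style context pairs all pending initiators.

def detect_sequence(occurrences, context="recent"):
    """Detect A-followed-by-B over a time-ordered list of (event, time) pairs."""
    initiators, detections = [], []
    for event, time in occurrences:
        if event == "A":
            initiators.append(time)
        elif event == "B" and initiators:
            if context == "recent":
                detections.append((initiators[-1], time))   # newest A only
            else:                                            # cumulative-style
                detections.extend((a, time) for a in initiators)
            initiators = []                                  # consume initiators
    return detections

stream = [("A", 1), ("A", 3), ("B", 5)]
print(detect_sequence(stream, "recent"))      # [(3, 5)]
print(detect_sequence(stream, "cumulative"))  # [(1, 5), (3, 5)]
```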