Autonomous RDF Stream Processing for IoT Edge Devices
Manh Nguyen-Duc (1), Anh Le-Tuan (1,3), Jean-Paul Calbimonte (4), Manfred Hauswirth (1,2), and Danh Le-Phuoc (1)

(1) Open Distributed Systems, TU Berlin, Berlin, Germany
    manh.nguyenduc@campus.tu-berlin.de
(2) Fraunhofer Institute for Open Communication Systems, Berlin, Germany
(3) Insight Centre for Data Analytics, NUI Galway, Galway, Ireland
(4) University of Applied Sciences and Arts Western Switzerland HES-SO, Sierre, Switzerland
Abstract. The wide adoption of increasingly cheap and computationally powerful single-board computers has triggered the emergence of new paradigms for collaborative data processing among IoT devices. Motivated by the billions of ARM chips having been shipped as IoT gateways so far, our paper proposes a novel continuous federation approach that uses RDF Stream Processing (RSP) engines as autonomous processing agents. These agents can coordinate their resources to distribute processing pipelines by delegating partial workloads to their peers via subscribing continuous queries. Our empirical study in “cooperative sensing” scenarios with resourceful experiments on a cluster of Raspberry Pi nodes shows that the scalability can be significantly improved by adding more autonomous agents to a network of edge devices on demand. The findings open several new interesting follow-up research challenges in enabling semantic interoperability for the edge computing paradigm.

Keywords: Autonomous systems · Stream processing · Cooperative sensing · Query federation
1 Introduction
Over the last few years, Semantic Web technologies have provided promising solutions for achieving semantic interoperability in the IoT (Internet of Things) domain. Ranging from ontologies for describing streams and devices [10,11] to continuous query processors and stream reasoning agents [8], these efforts constitute important milestones towards the integration of heterogeneous IoT platforms and applications. While these different technologies enable the publication of streams using semantic technologies (e.g., RDF streams) and the querying of streaming data over ontological representations, most of them tend to centralise the processing, relegating interactions among IoT devices simply to data transmission. This approach may be convenient in certain scenarios
where the streams, typically time-annotated RDF data, are integrated following a top-down approach, for instance using cloud-based solutions for RDF Stream Processing (RSP). However, in the context of IoT, decentralised integration paradigms fit better with the distributed nature of autonomous deployments of smart devices [22]. Moreover, moving the computation closer to the edge networks, such as sensor nodes or IoT gateways, will not only create more chances to improve performance and to reduce network overhead/bottlenecks, but also enable flexible and continuous integration of new IoT devices/data sources [19].
Thanks to recent developments in the design of embedded devices, e.g., ARM boards [23], single-board computers are getting cheaper and smaller while increasing their computational power. For example, a Raspberry Pi computer costs less than 40 EUR and is roughly the size of a credit card. Despite their size, such devices are powerful enough to run a fully functioning Linux distribution, which provides both operational and deployment advantages. On the one hand, they are power efficient and cost-effective while being computationally powerful. On the other hand, their small size makes it easy to embed or bundle them with other IoT devices (e.g., sensors and actuators) as processing gateways interfacing with outer networks, called edge devices.
RDF Stream Processing (RSP) [21] extends the RDF data model, enabling heterogeneous streaming sensor sources to be captured and processed under a unified data model. An RSP engine usually supports a continuous query language based on SPARQL, e.g., C-SPARQL [3] and CQELS-QL [15]. Hence, an edge device equipped with an RSP engine could play the role of an autonomous data processing gateway. Such an autonomous gateway can coordinate its actions with other peers connected to it to execute a data processing pipeline in a collaborative fashion. However, to the best of our knowledge, there has not been any in-depth study on how such a decentralised processing paradigm would work with edge devices. In particular, an edge device has 10–100 times fewer resources than a PC counterpart, which is the originally expected execution setting for an RSP engine. Hence, this raises two main questions: how feasible would it be to enable such a paradigm for edge devices, and how would it affect performance and scalability? Putting our motivation in the context of the 100 billion ARM chips that have been shipped so far [4], enabling computational and processing autonomy along with semantic interoperability will have a huge impact even for a small fraction of this number of devices (e.g., 0.1% would already account for 100 million devices).
To this end, this paper investigates how to realise this edge computing paradigm by extending an RSP engine (i.e., CQELS) into a continuous query federation engine that enables a decentralised computation architecture for edge devices. A prototype engine was implemented to empirically study the performance and scalability aspects in “cooperative sensing” scenarios. Our experimental results on a realistic setup, with the biggest network of its kind, in Sect. 4 show that our federation engine can considerably scale the processing throughput of a network of edge devices by adding more nodes on demand. We believe this is the largest experiment setup of its kind so far. The main contributions of the paper are summarised below:
1. We propose a novel federation mechanism based on autonomous RSP engines and distributed continuous queries.
2. We present technical details on how to realise such a federation mechanism by integrating an RSP engine and an RDF store for edge devices.
3. We carry out an empirical study on performance and scalability in “cooperative sensing” scenarios that leads to various quantitative findings and opens up several interesting follow-up research challenges.

The paper is outlined as follows. The next section presents our approach to continuous federation based on autonomous RSP. Section 3 describes the implementation details of our federated RSP engine for edge devices. The setup and results of the experiments are reported in Sect. 4. We summarise related work in Sect. 5, and Sect. 6 concludes the paper.
2 Continuous Federation with Autonomous RSP
2.1 Preliminaries: RDF Stream Processing with CQELS-QL
CQELS-QL is a continuous query language for RSP that extends SPARQL 1.1 with sliding windows [15]. As an example, the CQELS-QL query below continuously provides the “updates for the locations of 10 weather stations which have reported the highest temperatures in the last 5 min”. This query is also used as Query Q3 in the experiments of Sect. 4.
1   SELECT ?sensor ?maxTemp ?lat ?lon
2   WHERE {
3     { SELECT ?sensor (MAX(?temp) AS ?maxTemp)
4       STREAM ?streamURI [RANGE 5m ON sosa:resultTime] {
5         ?observation sosa:hasSimpleResult ?temp.
6         ?sensor rdf:type <TempSensor>; made:Observation ?observation. }
7       GROUP BY ?sensor }
8     ?streamURI prov:wasGeneratedBy ?sensor. ?sensor sosa:isHostedBy ?station.
9     ?station wgs84:Point ?loc. ?loc wgs84:lat ?lat; wgs84:lon ?lon. }
10  ORDER BY ?maxTemp
11  LIMIT 10
Listing 1. Query Q3 in CQELS-QL (prefixes are omitted)
In the original centralised setting, the above query can be subscribed to a CQELS engine installed on one processing node. Stream data in RDF formats (e.g., JSON-LD or Turtle) can be provided to it from data acquisition nodes, called streaming nodes. These streaming nodes collect data from sensors of weather stations that can be geographically distributed in different locations. In practice, an edge device can host both a CQELS node and a streaming node, but we can assume they communicate via an internal process. As soon as the data is collected, the sensor data is pushed to the CQELS engine via a streaming protocol such as WebSocket or MQTT. The incoming data continuously triggers the processing pipeline compiled from Query Q3. Consequently, the computing node that hosts this CQELS engine needs to have enough resources (bandwidth, CPU and memory) to deal with the workload regardless of how many stream nodes exist in the network. Hence, if the CQELS engine is only hosted on an edge device, the physical limits of its hardware quickly become a bottleneck, as shown in Sect. 4. To create a more scalable processing system, we need to decentralise the processing pipelines of similar queries to a network of edge devices connected to these stream nodes. The following two sections describe our approach to enable this type of network.
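For illustration, the following sketch shows how a streaming node could push one observation as a JSON-LD stream element over WebSocket to a CQELS node. It is not part of the Fed4Edge codebase: the endpoint URL, the JSON-LD layout and the use of the JDK 11+ WebSocket API (the experiments in Sect. 4 ran on OpenJDK 1.7) are assumptions made for this example.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;

public class StreamingNodeSketch {
    public static void main(String[] args) {
        // One SOSA observation serialised as JSON-LD (layout is illustrative only).
        String observation = """
            { "@context": { "sosa": "http://www.w3.org/ns/sosa/" },
              "@id": "_:obs1",
              "@type": "sosa:Observation",
              "sosa:hasSimpleResult": 23.4,
              "sosa:resultTime": "2019-07-01T12:00:00Z" }
            """;

        // Connect to the CQELS node's WebSocket stream endpoint (assumed URL).
        WebSocket ws = HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(URI.create("ws://192.168.178.5/noaa/berlin"),
                            new WebSocket.Listener() { })   // no-op listener
                .join();

        ws.sendText(observation, true).join();               // push one stream element
        ws.sendClose(WebSocket.NORMAL_CLOSURE, "done").join();
    }
}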
2.2 Dynamic Subscription and Discovery for Autonomous RSP Engines
To enable a CQELS engine to work in a decentralised fashion, it requires the capability to operate as an autonomous agent which can collaborate with other peers to execute a distributed processing pipeline specified in CQELS-QL. An autonomous CQELS node can dynamically join a network of existing peers by subscribing itself to an existing node in the network, called a parent node; it then notifies the parent node about the query service and streaming service it can provide to the network. These services can be semantically described using vocabularies provided by VoCaLS [27]. For instance, VoCaLS allows describing the URIs of the streams and their related metadata (e.g., the sensors that generated the streams), which are used in the query patterns of query Q3. Hence, a subscription can be done by sending an RDF-based message via a REST API or WebSocket channel. Listing 2 illustrates a snippet of a subscription message in RDF Turtle that is used in our experiments in Sect. 4.
1   <> a vocals:StreamDescriptor, vsd:CatalogService; dcat:dataset :NOAAWeather.
2   :NOAAWeather a vocals:RDFStream; prov:wasGeneratedBy :TemperatureSensor;
3     vocals:hasEndpoint :NOAAWeatherEndpoint; dct:title "Weather stream From Berlin".
4   :NOAAWeatherEndpoint a vocals:StreamEndpoint; dct:format frmt:JSON-LD;
5     dcat:accessURL "ws://192.168.178.5/noaa/berlin".
Listing 2. Example of subscription message in RDF
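As a complementary illustration, a subscription message like the one in Listing 2 could be assembled programmatically before being sent to the parent node. The sketch below uses Apache Jena; the vocals namespace URI and the example base namespace are assumptions made for illustration, and the snippet is not the Fed4Edge subscription code.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class SubscriptionMessageSketch {
    public static void main(String[] args) {
        // Namespace and base URIs below are assumptions for illustration only.
        String VOCALS = "http://w3id.org/rsp/vocals#";
        String DCAT   = "http://www.w3.org/ns/dcat#";
        String DCT    = "http://purl.org/dc/terms/";
        String PROV   = "http://www.w3.org/ns/prov#";
        String BASE   = "http://example.org/fed4edge/";

        Model m = ModelFactory.createDefaultModel();
        m.setNsPrefix("vocals", VOCALS);
        m.setNsPrefix("dcat", DCAT);
        m.setNsPrefix("dct", DCT);
        m.setNsPrefix("prov", PROV);

        // Stream endpoint (cf. lines 4-5 of Listing 2).
        Resource endpoint = m.createResource(BASE + "NOAAWeatherEndpoint")
                .addProperty(RDF.type, m.createResource(VOCALS + "StreamEndpoint"))
                .addProperty(m.createProperty(DCAT, "accessURL"), "ws://192.168.178.5/noaa/berlin");

        // The stream itself and its provenance (cf. lines 2-3 of Listing 2).
        Resource stream = m.createResource(BASE + "NOAAWeather")
                .addProperty(RDF.type, m.createResource(VOCALS + "RDFStream"))
                .addProperty(m.createProperty(PROV, "wasGeneratedBy"),
                             m.createResource(BASE + "TemperatureSensor"))
                .addProperty(m.createProperty(VOCALS, "hasEndpoint"), endpoint)
                .addProperty(m.createProperty(DCT, "title"), "Weather stream From Berlin");

        // Descriptor that is sent to the parent node (cf. line 1 of Listing 2).
        m.createResource(BASE + "descriptor")
                .addProperty(RDF.type, m.createResource(VOCALS + "StreamDescriptor"))
                .addProperty(m.createProperty(DCAT, "dataset"), stream);

        // Serialise to Turtle; the result can be POSTed to the parent's REST API
        // or sent over its WebSocket channel.
        m.write(System.out, "TURTLE");
    }
}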
Based on the semantic descriptions provided by the subscribed nodes, the parent node can carry out stream discovery patterns which use a variable in the stream pattern, as shown in line 4 of query Q3. The variable ?streamURI can then be matched against other metadata, as shown in line 8. In this example, it is used to link with the sensors that generated this stream. Recursively, the subscription process can propagate the stream information upstream hierarchically, and vice versa, the discovery process can be recursively delegated to downstream nodes via sub-queries in CQELS-QL.

To this end, when an autonomous CQELS node joins a network, it makes itself and its connected nodes discoverable and queryable by other nodes of the network. Moreover, each node can share its processing resources by executing a CQELS query on it. This allows us to treat a query similar to query Q3 as a query to a sensor network whereby sensor nodes and network gateways collaborate as a single system to answer queries of this kind. Next, we will discuss how to federate such queries in “cooperative sensing” scenarios whereby such a network of autonomous CQELS processing nodes coordinates to answer a CQELS-QL request in a decentralised fashion.
2.3 Continuous Query Federation Mechanism
With the support of the above subscription and discovery operations, a stream processing pipeline written in CQELS-QL can be deployed across several sites distributed in different locations: e.g., weather stations provide environmental sensory streams in various locations on earth. Each autonomous CQELS node gives access to data streams fed from the streaming nodes connecting to it. Such stream nodes can ingest a range of sensors, such as air temperature, humidity and carbon monoxide. When the stream data arrives, this CQELS node can partially process the data at its processing site, and then forward the results as mappings or RDF stream elements to its parent node.

In this context, when a query is subscribed to the top-most node, called the root node, it divides this query into sub-query fragments and deploys them at one or more sites via its subscribed nodes. A query fragment consists of one or more operators, and each fragment of the same query can be deployed on different processing nodes. Recursively, a sub-query delegated to a node can be federated to its subscribed nodes. All participant nodes of a processing pipeline can synchronise their processing timeline via a timing stream that is propagated from the root node. The execution of sub-query fragments can use the resources of participant nodes, i.e., CPU, memory, disk space and network bandwidth, to process incoming RDF graphs or sets of solution mappings and generate output RDF graphs/sets of solution mappings. Output streams may be further processed by fragments of the same query, until results are sent to the query issuer at the root node. For example, the sub-query of query Q3 in Listing 3 below can be sent down to the nodes closer to the streaming nodes; the results are then recursively sent to upper nodes to carry out the partial top-k queries in lines 10 and 11 until they reach the root node, which carries out the final computation steps to return the expected results.
1   SELECT ?sensor (MAX(?temp) AS ?maxTemp)
2   WHERE {
3     STREAM ?streamURI [RANGE 5m ON sosa:resultTime] {
4       ?observation sosa:hasSimpleResult ?temp.
5       ?sensor rdf:type <TempSensor>; made:Observation ?observation. }
6     ?streamURI prov:wasGeneratedBy ?sensor. }
7   GROUP BY ?sensor
Listing 3. An example of the subquery of Q3
This federation process can be carried out dynamically thanks to the dynamic subscription and discovery capabilities above. Moreover, the processing topology of such processing pipelines in our experiment scenarios of Sect. 4 can be dynamically configured by changing where and how participant nodes subscribe themselves to the processing network. For example, we carried out five different federation topologies in Sect. 4. The biggest advantage of this federation mechanism is the ability to dynamically push some processing operations closer to the streaming nodes to alleviate the network and processing bottlenecks which often occur at edge devices. Moreover, this mechanism can significantly improve the processing throughput by adding more processing nodes on demand, as shown in the experiments in Sect. 4.
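The following toy sketch illustrates the recursive delegation described above: leaf processing nodes (those attached to streaming nodes) receive the delegated fragment, while upper nodes run a merging fragment over the partial results of their children. Class and method names are purely illustrative and do not correspond to the Fed4Edge API.

import java.util.ArrayList;
import java.util.List;

public class FederationSketch {
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node child(String n) { Node c = new Node(n); children.add(c); return c; }

        // Leaf processing nodes run the delegated fragment; every upper node runs
        // the merging fragment on the partial results flowing up from its children.
        void deploy(String mergeFragment, String leafFragment) {
            if (children.isEmpty()) {
                System.out.println(name + " runs leaf fragment: " + leafFragment);
            } else {
                System.out.println(name + " runs merge fragment: " + mergeFragment);
                children.forEach(c -> c.deploy(mergeFragment, leafFragment));
            }
        }
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node a = root.child("edge-A");
        Node b = root.child("edge-B");
        a.child("edge-A1"); a.child("edge-A2");
        b.child("edge-B1"); b.child("edge-B2");
        // For Q3: the fragment of Listing 3 goes to the leaves, while the partial
        // top-k part (ORDER BY ... LIMIT 10) is executed at the upper nodes.
        root.deploy("partial top-k over ?maxTemp", "Listing 3 sub-query");
    }
}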
3 Design and Implementation
To enable the cooperative federation of RSP engines on edge devices, we built a decentralised version of the CQELS engine, called Fed4Edge. Fed4Edge was implemented by extending the algorithms and Java codebase of the original open-source version of CQELS [15]. Thanks to the platform-agnostic design of its execution framework [14], the core components are abstract enough to be seamlessly integrated with different RDF libraries in order to port to different hardware platforms. To tailor the RDF-based data processing operations for edge devices (e.g., ARM CPUs, flash storage and the like), we integrated the core components of CQELS with the counterparts of RDF4Led [17], a RISC-style RDF engine for lightweight edge devices. The Fed4Edge system will be open-sourced at https://github.com/cqels/Fed4Edge.
Fig. 1. Overview architecture of Fed4Edge

The architectural overview of the system is depicted in Fig. 1. The core components of CQELS and RDF4Led, such as the Dictionary, Encoder, Decoder, Dynamic Executor, Adaptive Query Optimiser and Buffer Manager, are reused in our Fed4Edge implementation. The extension plugins, such as the Adaptive Federator, Thing Directory, Stream Input Handler and Stream Output Handler, are built to facilitate the federation mechanism proposed in Sect. 2. The technical details of these components are discussed next.
CQELS and RDF4Led share similar RDF data processing flows due to the fact that both systems apply the same RDF data encoding approach, which normalises RDF nodes into fixed-size integers. By encoding the RDF nodes, most of the operators on RDF data can be executed on a smaller data structure rather than on large variable-length strings. This approach is commonly used in many RDF data processors in order to reduce memory footprint and I/O time, and to improve cache efficiency. The platform-agnostic design of CQELS allows the size of the encoded node to be tuned to a targeted platform without changing the implementation of other core components. Therefore, the Encoder, Decoder and Dictionary of RDF4Led can be easily integrated with CQELS for the RDF normalisation tasks. After RDF data has been received from RDF stream subscriptions via the Stream Input Handler, the data is encoded by the Encoder. The encoded RDF triples are then sent to the Buffer Manager for further processing. The Decoder waits for the output from the Dynamic Executor and transforms the encoded nodes back to a lexical representation before sending them to the Stream Output Handler. The Encoder and Decoder share the Dictionary for encoding and decoding. Instead of using 64-bit integers for encoding nodes as in the original version of CQELS, the Dictionary of RDF4Led uses 32-bit integers, which entails a smaller memory footprint for cached data. Therefore, backed by RDF4Led, Fed4Edge can process 30 million triples with only 80 MB of memory [17] on ARM computing architectures.
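A minimal sketch of the dictionary idea, not the RDF4Led implementation, is given below: RDF node lexical forms are normalised to fixed-size 32-bit integer identifiers so that downstream operators work on integers instead of variable-length strings.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Dictionary32 {
    private final Map<String, Integer> toId = new HashMap<>();
    private final List<String> toLexical = new ArrayList<>();

    // Returns the 32-bit identifier of a node, assigning a fresh one if unseen.
    public int encode(String lexicalForm) {
        Integer id = toId.get(lexicalForm);
        if (id == null) {
            id = toLexical.size();
            toId.put(lexicalForm, id);
            toLexical.add(lexicalForm);
        }
        return id;
    }

    // Maps an identifier back to its lexical form (used on the Decoder side).
    public String decode(int id) {
        return toLexical.get(id);
    }

    public static void main(String[] args) {
        Dictionary32 dict = new Dictionary32();
        int s = dict.encode("http://example.org/sensor/42");
        int p = dict.encode("http://www.w3.org/ns/sosa/hasSimpleResult");
        int o = dict.encode("\"23.4\"^^http://www.w3.org/2001/XMLSchema#double");
        System.out.println(s + " " + p + " " + o);   // encoded triple
        System.out.println(dict.decode(s));          // back to the lexical form
    }
}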
The Buffer Manager is responsible for managing the buffered data of windows and then feeding the data to the Dynamic Executor. Furthermore, the Buffer Manager also manages cached data for querying and writing the static data in the Thing Directory. Stream data is evicted from the buffer according to the data invalidation policy defined by the window operators [12,15]. Meanwhile, the flash-aware updating algorithms of RDF4Led are reused in order to achieve fast updates of static data [17].
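The eviction policy of a time-based window such as [RANGE 5m ON sosa:resultTime] can be pictured with the following simplified sketch; it is an illustration of the invalidation idea, not the CQELS Buffer Manager code.

import java.util.ArrayDeque;
import java.util.Deque;

public class TimeWindowBuffer {
    // One encoded stream element with its event timestamp in milliseconds.
    record Element(long timestampMillis, int[] encodedTriple) { }

    private final Deque<Element> buffer = new ArrayDeque<>();
    private final long rangeMillis;

    public TimeWindowBuffer(long rangeMillis) { this.rangeMillis = rangeMillis; }

    // Insert a new element and invalidate everything that falls out of the window.
    public void insert(Element e) {
        buffer.addLast(e);
        long lowerBound = e.timestampMillis() - rangeMillis;
        while (!buffer.isEmpty() && buffer.peekFirst().timestampMillis() < lowerBound) {
            buffer.pollFirst();   // evicted: older than the window range
        }
    }

    public int size() { return buffer.size(); }

    public static void main(String[] args) {
        TimeWindowBuffer window = new TimeWindowBuffer(5 * 60 * 1000);   // RANGE 5m
        window.insert(new Element(0, new int[] {1, 2, 3}));
        window.insert(new Element(6 * 60 * 1000, new int[] {4, 5, 6}));  // evicts the first
        System.out.println(window.size());   // prints 1
    }
}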
The Dynamic Executor employs a routing-based query execution algorithm that provides dynamic execution strategies in each node [12,13]. During the lifetime of a continuous query, the query plan can be changed by redirecting data flow in the routing network. The Adaptive Optimiser continuously adjusts the query plan according to the data distribution in each execution step [15,17]. RDF4Led and CQELS employ a similar query execution paradigm: CQELS uses routing-based query execution algorithms, while RDF4Led executes SPARQL with a one-tuple-at-a-time policy. Therefore, the same cost model of the Adaptive Optimiser can be applied when calculating the best plan for a query that has static data patterns. The Buffer Manager treats the buffer for join results of the static patterns as a window and, depending on the available memory, applies either a fresh-update or an incremental-update policy.
The Adaptive Federator acts as the query rewriter, which adaptively divides the input query into subqueries. For the implementation used in our experiments in Sect. 4, the rewriter pushes operators down as close to the streaming nodes as possible, following the predicate-pushdown practice of common logical optimisation algorithms. The Thing Directory stores the metadata subscribed by the other Fed4Edge engines (cf. Sect. 2) in the default graph. Similar to [7], such metadata allows endpoint services of the Fed4Edge engines to be discovered via the Adaptive Federator. When the Adaptive Federator sends out a subquery, it notifies the Stream Input Handler to subscribe and listen to the results returning from the subquery. On the other hand, the Stream Output Handler sends out the subqueries to other nodes or sends back the results to the requester.
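The placement decision of the rewriter can be illustrated with the toy example below, using query Q2: operators that only touch the stream pattern (the window and the FILTER) are assigned to the fragment pushed towards the streaming nodes, while operators that join with static station metadata stay at the upper node. This is an illustration of the pushdown idea, not the Adaptive Federator implementation.

import java.util.List;

public class PushdownSplitSketch {
    enum Site { LEAF_FRAGMENT, ROOT_FRAGMENT }

    // An operator is pushed down if it only depends on the stream pattern.
    record Operator(String description, boolean streamPatternOnly) {
        Site placement() {
            return streamPatternOnly ? Site.LEAF_FRAGMENT : Site.ROOT_FRAGMENT;
        }
    }

    public static void main(String[] args) {
        List<Operator> q2 = List.of(
                new Operator("window [LATEST ON ssn:resultTime]", true),
                new Operator("triple patterns on ?obs / ?sensor", true),
                new Operator("FILTER(?temp > 30)", true),
                new Operator("join with static station/location metadata", false),
                new Operator("projection of ?lat ?lon", false));

        for (Operator op : q2) {
            System.out.println(op.placement() + "  <-  " + op.description());
        }
    }
}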
4 Evaluation and Analyses
4.1 Evaluation Setup
Datasets and Queries: To prepare the RDF stream dataset for the evaluation, we used the SSN/SOSA ontology [10] to map sensor readings of the NCDC Integrated Surface Database (ISD) dataset (https://www.ncdc.noaa.gov/) to RDF. The ISD dataset is one of the most prominent weather datasets; it contains weather observation data from 1901 to the present, from nearly 20K stations over the world. A weather reading of a station produces an observation that covers measurements for temperature, wind speed, wind gust, etc., depending on the types of sensors equipped at that station. Each observation needs approximately 87 RDF triples to map its values and attributes to the schema illustrated in Fig. 2. The data from different weather stations was split across multiple devices which acted as streaming nodes (i.e., the white nodes in Fig. 4). Each streaming node hosts a WebSocket server which manages WebSocket stream endpoints. The data is read from CSV files in local storage, then mapped to the RDF data schema in Fig. 2 before being streamed out.
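As an illustration of this mapping step, the hypothetical sketch below turns one CSV reading into SOSA triples along the lines of Fig. 2, using Apache Jena. The example namespace and the exact property choices are assumptions; this is not the authors' mapping code.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class IsdToSosaSketch {
    static final String SOSA = "http://www.w3.org/ns/sosa/";
    static final String EX   = "http://example.org/noaa/";   // assumed namespace

    // Maps one parsed CSV reading (station, time, temperature) to SOSA triples.
    public static Model map(String stationId, String isoTime, double tempCelsius) {
        Model m = ModelFactory.createDefaultModel();
        m.setNsPrefix("sosa", SOSA);
        Resource sensor = m.createResource(EX + "sensor/" + stationId + "/temp");
        Resource obs = m.createResource(EX + "obs/" + stationId + "/" + isoTime)
                .addProperty(RDF.type, m.createResource(SOSA + "Observation"))
                .addProperty(m.createProperty(SOSA, "hasSimpleResult"),
                             m.createTypedLiteral(tempCelsius))
                .addProperty(m.createProperty(SOSA, "resultTime"), isoTime);
        sensor.addProperty(m.createProperty(SOSA, "madeObservation"), obs);
        return m;
    }

    public static void main(String[] args) {
        map("725300-94846", "2019-07-01T12:00:00Z", 23.4).write(System.out, "TURTLE");
    }
}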
Fig. 2. RDF stream schema for NCDC weather data
We designed the following queries in order to show the advantages of cooperative federation for querying streaming data over a network of edge devices. Listings 4 and 5 respectively present the queries Q1 and Q2 that are used for measuring the improvement of the streaming throughput in simple federation cases. Q1 returns the updated temperature and the corresponding location, and Q2 answers the locations where the latest temperature is higher than 30 degrees. The subqueries of Q1 and Q2 contain only triple patterns for querying streaming data. With these simple join patterns, the behaviour of the system is mostly influenced by the behaviour of the network. The filter at line 6 of Q2 reduces the number of intermediate results sent from the lower nodes and can therefore highlight the benefit of pushing processing operators down closer to the data sources.
For the queries that can show the collaborative behaviour of the participant edge nodes, we used query Q3 (as described in the example of Sect. 2) and query Q4 in Listing 6. Query Q3 has aggregation and top-k operators, and Q4 includes a complex join across windows.
1   SELECT ?temp ?lat ?lon ?resultTime
2   WHERE {
3     STREAM ?streamURI [LATEST ON ssn:resultTime] {
4       ?obs sosa:hasSimpleResult ?temp; sosa:resultTime ?resultTime.
5       ?sensor rdf:type iot:TempSensor; made:Observation ?obs. }
6     ?streamURI prov:wasGeneratedBy ?sensor. ?sensor sosa:isHostedBy ?station.
7     ?station wgs84:Point ?loc. ?loc wgs84:lat ?lat; wgs84:lon ?lon. }
Listing 4. Q1: Return the updated temperature and the corresponding location.
1   SELECT ?lat ?lon
2   WHERE {
3     STREAM ?streamURI [LATEST ON ssn:resultTime] {
4       ?obs sosa:hasSimpleResult ?temp; sosa:resultTime ?resultTime.
5       ?sensor rdf:type iot:TempSensor; made:Observation ?obs.
6       FILTER (?temp > 30) }
7     ?streamURI prov:wasGeneratedBy ?sensor. ?sensor sosa:isHostedBy ?station.
8     ?station wgs84:Point ?loc. ?loc wgs84:lat ?lat; wgs84:lon ?lon. }
Listing 5. Q2: Return the location where the latest temperature is higher than 30 degrees.
1   SELECT ?city ?temp ?windspeed
2   WHERE {
3     STREAM ?streamURI [RANGE 5m ON ssn:resultTime] {
4       ?obs1 sosa:hasSimpleResult ?temp; sosa:resultTime ?resultTime.
5       ?obs1 sosa:hasFeatureOfInterest ?foi1.
6       ?foi1 ssn:hasProperty iot:Temperature. ?foi1 :hasLocation ?loc.
7       FILTER (?temp > 30) }
8     STREAM ?streamURI [RANGE 5m ON ssn:resultTime] {
9       ?obs2 sosa:hasSimpleResult ?windspeed; sosa:resultTime ?resultTime.
10      ?obs2 sosa:hasFeatureOfInterest ?foi2.
11      ?foi2 ssn:hasProperty iot:WindSpeed. ?foi2 :hasLocation ?loc.
12      FILTER (?windspeed > 15) }
13    ?streamURI prov:wasGeneratedBy ?sensor. ?sensor sosa:isHostedBy ?station.
14    ?station wgs84:Point ?loc. ?loc geo:city ?city. }
Listing 6. Q4: Return the city where the temperature is higher than 30 degrees and the wind speed is higher than 15 km/h in the last 5 min.
Hardware and Software: The hardware for the experiment is a cluster of 85 Raspberry Pi model B nodes, each equipped with a quad-core 1.2 GHz Broadcom BCM2837 64-bit CPU, 1 GB RAM and 100 Mbps Ethernet. All nodes are connected to five TP-LINK JetStream T2500-28TC switches; each switch has 24 100 Mbps Ethernet ports and 4 1000 Mbps uplinks, as shown in Fig. 3. As to the switching capacity, the T2500-28TC has a non-blocking aggregated bandwidth of 12.8 Gbps. Four switches for connecting streaming nodes are connected to the fifth one via the 1000 Mbps links. This fifth switch is used to connect the CQELS processing nodes. Every node uses Raspbian Jessie as the operating system and OpenJDK 1.7 for ARM as the JVM. We set 512 MB as the maximum heap size for the Fed4Edge engine.
4.2 Experiments
Fig. 3. The evaluation cluster of 85 Raspberry Pi nodes
Baseline Calibration (Exp1): In this experiment, we calibrated the maximum processing capability of a processing node as the baseline for the following federation experiments. We increased the number of streaming nodes to observe the bottleneck phenomenon whereby adding more streaming nodes decreases the processing throughput of the network. Each streaming node streams out recorded data at its maximum capacity. We used Query 1 and its two variants as the testing queries; the two variants were made by reducing the four triple patterns to one and two patterns, respectively. The throughput is measured using a timing stream whereby each streaming node sends timing triples indicating when it starts and finishes sending its data. In each test we equally split 500k–1M observations among the streaming nodes and record how much time it takes to process these observations in order to calculate the average throughput. Note that we separated the streaming and processing processes onto different physical devices to avoid performance and bandwidth interference which might have an impact on our measurements.
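As an illustration of this measurement, the sketch below derives the average throughput from per-node start and finish timestamps reported on the timing stream: the total number of triples divided by the span between the earliest start and the latest finish. The numbers and class names are illustrative only, not the evaluation harness.

import java.util.List;

public class ThroughputFromTimingStream {
    record NodeTiming(long startMillis, long finishMillis) { }

    static double averageThroughput(long totalTriples, List<NodeTiming> timings) {
        long earliestStart = timings.stream().mapToLong(NodeTiming::startMillis).min().orElseThrow();
        long latestFinish  = timings.stream().mapToLong(NodeTiming::finishMillis).max().orElseThrow();
        return totalTriples * 1000.0 / (latestFinish - earliestStart);   // triples per second
    }

    public static void main(String[] args) {
        // Two streaming nodes reporting start/finish timestamps (illustrative values).
        List<NodeTiming> timings = List.of(
                new NodeTiming(0, 110_000), new NodeTiming(500, 112_000));
        // e.g. 500,000 observations at ~87 triples each, split over the two nodes.
        System.out.printf("%.0f triples/s%n", averageThroughput(500_000L * 87, timings));
    }
}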
Fan-out Federation (Exp2): To test the possibility of increasing the processing throughput by adding more edge nodes as autonomous agents to the network, we carried out tests on the five topologies shown in Fig. 4. The first topology (1-hop) in Fig. 4a was the configuration that gave the peak throughput in Exp1. Let k be the number of hops the data has to travel to reach the final destination; we increase k to add more intermediate nodes to this topology and create new topologies. As a result, we can recursively add n nodes to the root node (k = 2, namely 2-hop) and then n nodes to each of the root node's children (k = 3, namely 3-hop), whereby n is called the fanout factor (denoted as n-fanout). Then, a topology with k hops and fanout factor n has ∑_{i=0}^{k-1} n^i = 1 + n + ... + n^(k-1) processing nodes. We chose n = 2 and n = 4 (the latter corresponding to the number of streaming nodes at the maximum throughput reported in Exp1 below); thus, we have four new topologies with 3, 5, 7 and 21 processing nodes in Figs. 4b, c, d, and e. In each processing topology, the lowest processing nodes are connected with 4 streaming nodes. We record the throughput and delay for processing the four queries (Q1, Q2, Q3 and Q4) on these five topologies in a similar fashion to Exp1.
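The topology sizes above follow directly from this formula, as the small check below illustrates (illustrative code, not part of the evaluation harness).

public class TopologySize {
    // Number of processing nodes in a tree with k hops and fanout n:
    // 1 + n + ... + n^(k-1).
    static int nodes(int k, int n) {
        int total = 0;
        int layer = 1;
        for (int i = 0; i < k; i++) {
            total += layer;
            layer *= n;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(nodes(2, 2)); // 3  (Fig. 4b)
        System.out.println(nodes(2, 4)); // 5  (Fig. 4c)
        System.out.println(nodes(3, 2)); // 7  (Fig. 4d)
        System.out.println(nodes(3, 4)); // 21 (Fig. 4e)
    }
}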
Fig. 4. Federation topologies: (a) 1 node (1-hop); (b) 3 nodes (2-hop, 2-fanout); (c) 5 nodes (2-hop, 4-fanout); (d) 7 nodes (3-hop, 2-fanout); (e) 21 nodes (3-hop, 4-fanout)
4.3 Results and Discussions
Fig. 5. Baseline calibration (Color figure online)
Figure 5 reports the results of experiment Exp1. The maximum processing throughput for the three variants of Query 1 on a single edge device ranges from about 4,200 to 5,000 triples per second, corresponding to 4 streaming nodes. Interestingly, increasing the number of streaming nodes beyond 4 gradually decreases the overall processing throughput. The results are consistent across the different complexities of the variants of Query 1. We observed that CPU usage was around 60–70% and memory consumption around 270–300 MB in all tests. Therefore, we conclude that the bottleneck was caused by bandwidth limitations. We also carried out a similar test with Q1 on a PC (Intel Core i7-7700K, 4 GHz, 1 Gbps Ethernet and 16 GB RAM) as the root node, which has more than 10 times the processing power, memory and network bandwidth of a Raspberry Pi model B. As expected, the PC's maximum throughput is approximately 36k triples/second, around 8–10 times that of a Raspberry Pi. Note that this PC costs more than 40 Raspberry Pi nodes.
Figure 6a shows the throughput improvements obtained by federating the processing workload across intermediate nodes in the four proposed topologies. The results show that adding more nodes increases the processing throughput in general. Most queries have their processing throughput consistently boosted, as a considerable amount of the processing load is done at the intermediate nodes. However, the increase is not consistently correlated with the total number of processing nodes. In fact, the topology with 5 nodes in Fig. 4c gives a slightly higher throughput than the topology with 7 nodes in Fig. 4d. This can be explained by the fact that both topologies have 4 processing nodes at the lowest level (called leaf processing nodes, i.e., those connecting to streaming nodes), but the data in the latter topology has to travel one more hop than in the former. Due to our push-down rewriting strategy presented in Sect. 3, the two upper blue nodes in Fig. 4d do not significantly contribute to the overall throughput but, on the other hand, cause more communication overhead.
Looking more closely at the reported figures, we see a high correlation between the number of leaf processing nodes, i.e., n^(k-1), and the processing throughput in all topologies. This shows that our proposed approach is able to linearly scale a network of IoT devices by adding more devices on demand. In particular, a network of 21 Raspberry Pi nodes can collaboratively process up to 74k triples/second, equivalent to roughly 8,500 sensor observations/second, streamed from 64 streaming nodes. Hence, the above 20K NCDC weather stations across the globe could be queried via such a network at an update rate of 20–30 observations per minute, which is much faster than the highest update rate currently supported by NCDC (https://www.ncdc.noaa.gov/data-access/land-based-station-data), i.e., ASOS 1-min data. Moreover, the processing capacity of this network is roughly twice that of the above PC, while it costs only about half as much. Regarding energy consumption, each Raspberry Pi consumes only around 2 W, compared with 240 W for the above PC.
Fig. 6. Federation experiment results: (a) throughput; (b) average processing time
Figure 6b reports the average time for each observation to travel through the processing pipeline specified by each query on the different topologies, i.e., the average processing time. It shows that adding more intermediate nodes for queries Q1 and Q2 can lower the average processing time, as it reduces queuing time at some nodes. That means communication time might be a dominant factor for the delay in these processing pipelines. For queries Q3 and Q4, we witness a consistent increase in processing time with respect to the number of hops, which reflects the nature of Q3 and Q4: they need more coordination among nodes. However, it is interesting that adding one hop to the network topology adds only 10–15% delay, while the maximum throughput gain is linear in n^(k-1).
4.4 Follow-Up Challenges
We observed the CPU, memory consumption and bandwidth in our experiments. Interestingly, all tests used 60–70% of CPU (across 4 cores), 25–30% of physical memory and 20–40% of the Ethernet bandwidth (i.e., 100 Mbps). Our reported performance figures show that edge devices have enough resources to enable semantic interoperability for the edge computing paradigm. From our analyses of the hardware and software libraries, the most likely suspects for the processing bottleneck are related to the communication among the nodes. Hence, there is a lot of room to make our approach much more efficient and scalable. In this context, to help 100 billion and more edge devices reach their full potential, we outline some interesting research challenges below.
The first challenge is how to address the multiple optimisation problems that such a federated processing pipeline entails. The first of these is how to optimise an RSP engine for edge devices, which have processing and I/O behaviours distinct from those of PCs/workstations due to their own design philosophies. The second is how to find optimal operator placements in very dynamic execution settings. The subsequent challenge is how to define cost models which are no longer limited to processing time/throughput, but need to cover several cost metrics such as bandwidth, power consumption and robustness.
Looking beyond database-oriented optimisation goals, another relevant research challenge is how to model socioeconomic aspects as the control or optimisation scheme for such a cooperative system. In particular, autonomous RSP processing nodes can be operated by different stakeholders which have different utility functions dictating when and how to join a network and to share data and resources. To this end, the coordination strategies become related to game theory, which has inspired some relevant proposals in both the stream processing and Web communities. For instance, [1] proposed a contract-based coordination scheme based on mechanism design, a field in economics and game theory that designs economic mechanisms or incentives toward desired objectives. Similarly, [9] also proposed to use mechanism design for establishing an incentive-driven coordination strategy among SPARQL endpoints. Inspired by this line of work, we also proposed an architecture for incorporating blockchain into RDF4Led [18] to pave the way for such incentive- and contract-based coordination strategies.
Regarding cooperation and negotiation among autonomous RSP agents, a potential research challenge is the study and exploration of protocols and strategies that follow the multi-agent system paradigm. Although early works on the topic [26] point at potential opportunities in this area, several aspects have not been studied yet. These include the usage of individual contextual knowledge for local decision making (potentially through reasoning) and for a resource-optimised distribution of tasks among a set of competing/associated nodes. The dynamics of these federated processing networks would need to adapt to changing conditions of load, membership, throughput, and other criteria, with emerging behaviour patterns on the sensing and processing nodes.
5 Related Work
Semantic interoperability in the IoT domain has gained considerable attention both in the academic and industrial spheres. Beyond syntactic standards such as SensorML, semantically rich ontologies such as SSN-O/SOSA [10] have shown a significant impact in different IoT projects and solutions, such as OpenIoT [24], SymbIoTe [25], or BigIoT [5]. Other related vocabularies, such as the Thing Description ontology, have also recently gained support from different IoT vendors, aiming at consolidating it as a backbone representation model for generic IoT devices and services. Regarding the representation of data streams themselves, the VoCaLS vocabulary [27] has been designed as a means for the publication, consumption, and shared processing of streams. Although these ontology resources provide different and complementary ways to represent IoT and streaming data, they require the necessary infrastructure and software components (or agents) able to interpret the stream metadata and apply coordination/cooperation mechanisms for federated/decentralised processing, as shown in this paper.
The processing of continuous streaming data structured according to Semantic Web standards has been studied in the last decade, generally within the fields of RDF Stream Processing (RSP) and Stream Reasoning [8]. A number of RSP engines have been developed in this period, focusing on different aspects including incremental reasoning, continuous querying, and complex event processing, among others [3,6,15,20]. However, most of these RDF stream processors lack the capability of interconnecting with each other, or of establishing cooperation patterns among themselves. The coordination among RDF stream processing nodes is sometimes delegated to a generic cloud-based stream processing platform such as Apache Storm (e.g., [16]) or Apache Spark (e.g., [20]). In contrast, in this paper we investigate a more decentralised environment whereby participant nodes can be distributed across different organisations. Moreover, the hardware capabilities of such processing nodes differ from the cloud-based setting, i.e., they are resource-constrained edge devices.
Regarding the distributed processing and integration of RSP engines on a truly decentralised architecture, different aspects and building blocks have surfaced in recent years. Initial attempts to provide HTTP-based service interfaces for streaming data were explored in [3]. Other contributions in this line are the RSP Service Interface (http://streamreasoning.org/resources/rsp-services) and the SLD Revolution framework [2]. These propose the establishment of distributed workflows of RSP engines, using lazy-transformation techniques for optimised interactions among the engines. Further conceptualisations of RDF stream processing over decentralised entities have been presented in works such as WeSP [7] (http://w3id.org/wesp/web-data-streams), which advocates for a community-driven definition of stream vocabularies and interoperable interfaces. Cooperation strategies among RDF stream processors, or stream reasoning agents, are discussed in [26], introducing potential challenges and opportunities for federated processing through negotiation established across multi-agent systems.
6 Conclusion
This paper presented a continuous query federation approach that uses RSP engines as autonomous processing agents. The approach enables the coordination of edge devices' resources to execute query processing pipelines by cooperatively delegating partial workloads to their peer agents. We implemented our approach as an open-source engine, Fed4Edge, to conduct an empirical study in “cooperative sensing” scenarios. The resourceful experiments of the study show that the scalability can be significantly improved by adding more edge devices to a network of processing nodes on demand. This opens several interesting follow-up research challenges in enabling semantic interoperability for the edge computing paradigm. Our next step will be to investigate how to adaptively optimise the distributed processing pipeline of Fed4Edge. Another interesting step is studying how communication will affect its performance and scalability in an Internet-scale setting whereby the processing nodes are distributed across different networks and countries.
Acknowledgements. This work was funded in part by the German Ministry for Education and Research as BBDC 2 - Berlin Big Data Center Phase 2 (ref. 01IS18025A), the Irish Research Council under Grant Number GOIPG/2014/917, HES-SO RCSO ISNet grant 87057 (PROFILES), and the Marie Skłodowska-Curie Programme H2020-MSCA-IF-2014 (SMARTER project) under Grant No. 661180.
References
1. Balazinska, M., Balakrishnan, H., Stonebraker, M.: Contract-based load management in federated distributed systems. In: NSDI 2004 (2004)
2. Balduini, M., Della Valle, E., Tommasini, R.: SLD revolution: a cheaper, faster yet more accurate streaming linked data framework. In: ESWC (2017)
3. Barbieri, D.F., Braga, D., Ceri, S., Grossniklaus, M.: An execution environment for C-SPARQL queries. In: EDBT 2010 (2010)
4. Enabling mass IoT connectivity as ARM partners ship 100 billion chips. http://tiny.cc/uiefcz
5. Bröring, S., et al.: The BIG IoT API - semantically enabling IoT interoperability. IEEE Pervasive Comput. 17(4), 41–51 (2018)
6. Calbimonte, J.-P., Corcho, O., Gray, A.J.G.: Enabling ontology-based access to streaming data sources. In: Patel-Schneider, P.F., et al. (eds.) ISWC 2010. LNCS, vol. 6496, pp. 96–111. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17746-0_7
7. Dell'Aglio, D., Della Valle, E., van Harmelen, F., Bernstein, A.: Stream reasoning: a survey and outlook. Data Sci. 1(1), 59–83 (2017)
8. Dell'Aglio, D., Phuoc, D.L., Le-Tuan, A., Ali, M.I., Calbimonte, J.-P.: On a web of data streams. In: DeSemWeb@ISWC (2017)
9. Grubenmann, T., Bernstein, A., Moor, D., Seuken, S.: Financing the web of data with delayed-answer auctions. In: WWW 2018 (2018)
10. Haller, A., et al.: The modular SSN ontology: a joint W3C and OGC standard specifying the semantics of sensors, observations, sampling, and actuation. Semant. Web 10(1), 9–32 (2019)
11. Kaebisch, S., Kamiya, T., McCool, M., Charpenay, V.: Web of Things (WoT) Thing Description. W3C Candidate Recommendation (2019)
12. Le-Phuoc, D.: Operator-aware approach for boosting performance in RDF stream processing. J. Web Semant. 42, 38–54 (2017)
13. Le-Phuoc, D.: Adaptive optimisation for continuous multi-way joins over RDF streams. In: Companion Proceedings of the Web Conference 2018, WWW 2018, pp. 1857–1865 (2018)
14. Le-Phuoc, D., Dao-Tran, M., Le Van, C., Le Tuan, A., Manh Nguyen Duc, T.T.N., Hauswirth, M.: Platform-agnostic execution framework towards RDF stream processing. In: RDF Stream Processing Workshop at ESWC 2015 (2015)
15. Le-Phuoc, D., Dao-Tran, M., Parreira, J.X., Hauswirth, M.: A native and adaptive approach for unified processing of linked streams and linked data. In: ISWC 2011, pp. 370–388 (2011)
16. Le-Phuoc, D., Quoc, H.N.M., Van, C.L., Hauswirth, M.: Elastic and scalable processing of linked stream data in the cloud. In: ISWC, pp. 280–297 (2013)
17. Le-Tuan, A., Hayes, C., Wylot, M., Le-Phuoc, D.: RDF4Led: an RDF engine for lightweight edge devices. In: IOT 2018 (2018)
18. Le-Tuan, A., Hingu, D., Hauswirth, M., Le-Phuoc, D.: Incorporating blockchain into RDF store at the lightweight edge devices. In: SEMANTiCS 2019 (2019)
19. Munir, A., Kansakar, P., Khan, S.U.: IFCIoT: integrated fog cloud IoT: a novel architectural paradigm for the future Internet of Things. IEEE Consum. Electron. Mag. 6(3), 74–82 (2017)
20. Ren, X., Curé, O.: Strider: a hybrid adaptive distributed RDF stream processing engine. In: d'Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 559–576. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68288-4_33
21. Sakr, S., Wylot, M., Mutharaju, R., Le Phuoc, D., Fundulaki, I.: Processing of RDF Stream Data. Springer, Cham (2018)
22. Satyanarayanan, M.: The emergence of edge computing. Computer 50(1), 30–39 (2017)
23. Smith, B.: ARM and Intel battle over the mobile chip's future. Computer 41(5), 15–18 (2008)
24. Soldatos, J., et al.: OpenIoT: open source Internet-of-Things in the cloud. In: Interoperability and Open-Source Solutions for the Internet of Things. Springer (2015)
25. Soursos, S., Žarko, I.P., Zwickl, P., Gojmerac, I., Bianchi, G., Carrozzo, G.: Towards the cross-domain interoperability of IoT platforms. In: 2016 European Conference on Networks and Communications (EuCNC), pp. 398–402. IEEE (2016)
26. Tommasini, R., Calvaresi, D., Calbimonte, J.-P.: Stream reasoning agents: blue sky ideas track. In: AAMAS, pp. 1664–1680 (2019)
27. Tommasini, R., et al.: VoCaLS: vocabulary and catalog of linked streams. In: International Semantic Web Conference (2018)