An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines
Umair Qudus a, Muhammad Saleem b, Axel-Cyrille Ngonga Ngomo c and Young-koo Lee a,*
a DKE, Kyung Hee University, South Korea
E-mail: {umair.qudus, yklee}@khu.ac.kr
b AKSW, Leipzig, Germany
E-mail: {lastname}@informatik.uni-leipzig.de
c University of Paderborn, Germany
E-mail: axel.ngonga@upb.de
* Corresponding author. E-mail: yklee@khu.ac.kr
Editor: Ruben Verborgh, Ghent University, Belgium
Solicited reviews: Stasinos Konstantopoulos, Institute of Informatics and Telecommunications, Greece; five anonymous reviewers
Abstract.
Finding a good query plan is key to the optimization of query runtime. This holds in particular for cost-based federation
engines, which make use of cardinality estimations to achieve this goal. A number of studies compare SPARQL federation
engines across different performance metrics, including query runtime, result set completeness and correctness, number of sources
selected and number of requests sent. Albeit informative, these metrics are generic and unable to quantify and evaluate the
accuracy of the cardinality estimators of cost-based federation engines. To thoroughly evaluate cost-based federation engines, the
effect of estimated cardinality errors on the overall query runtime performance must be measured. In this paper, we address this
challenge by presenting novel evaluation metrics targeted at a fine-grained benchmarking of cost-based federated SPARQL query
engines. We evaluate five cost-based federated SPARQL query engines using existing as well as novel evaluation metrics by using
LargeRDFBench queries. Our results provide a detailed analysis of the experimental outcomes that reveal novel insights, useful
for the development of future cost-based federated SPARQL query processing engines.
Keywords: SPARQL, benchmarking, cost-based, cost-free, federated, querying
1. Introduction

The availability of increasing amounts of data published in RDF has led to the genesis of many federated SPARQL query engines. These engines vary widely in their approaches to generating a good query plan [23, 36, 39, 50]. In general, there exist several possible plans that a federation engine can consider when executing a given query. These plans have different costs in terms of the resources required and the overall query execution time. Selecting the best possible plan with minimum cost is hence of key importance when devising cost-based federation engines, a fact which is corroborated by a plethora of works in database research [25, 27].
In SPARQL query federation, index-free (heuristics-based) [14, 29, 47] and index-assisted (cost-based) [7, 9, 11, 15, 18, 24, 26, 28, 34, 43, 49] engines are most
commonly used for federated query processing [39]. The heuristics-based federation engines do not store any pre-computed statistics and hence mostly use different heuristics to optimize their query plans [47]. Cost-based engines make use of an index with pre-computed statistics about the datasets [39]. Using cardinality estimates as principal input, such engines make use of cost models to calculate the cost of different query joins and generate optimized query plans. Most state-of-the-art cost-based federated SPARQL processing engines [7, 11, 15, 18, 24, 26, 28, 43, 49] achieve the goal of optimizing their query plan by first estimating the cardinality of the query's triple patterns. Subsequently, they use this information to estimate the cardinality of the joins involved in the query. A cost model is then used to compute the cost of performing the different query joins while considering network communication costs. A query plan with minimum estimated execution cost is finally selected for result retrieval. Since cardinality estimates are the principal input for cost-based query planning, their accuracy is crucial to achieve a good query plan.
The performance of federated SPARQL query processing engines has been evaluated in many recent studies [1–3, 7, 8, 10, 16, 21, 22, 28, 39–41, 43, 48] using different federated benchmarks [5, 12, 17, 30, 31, 35, 38, 45, 46]. Performance metrics, including query execution time, number of sources selected, source selection time, query planning time, continuous efficiency of query processing, answer completeness and correctness, time for the first answer, and throughput, are usually reported in these studies. While these metrics allow the evaluation of certain components (e.g., the source selection model), they cannot be used to evaluate the accuracy of the cardinality estimators of cost-based federation engines. Consequently, they are unable to show how the estimated cardinality errors affect the overall query runtime performance of federation engines.
In this paper, we address the problem of measuring the accuracy of the cardinality estimators of federated SPARQL engines, as well as the effect of these errors on the overall query runtime performance. In particular, we propose metrics¹ for measuring errors in the cardinality estimations of (1) triple patterns, (2) joins between triple patterns, and (3) query plans. We correlate these errors with the overall query runtime performance of state-of-the-art cost-based SPARQL federation engines. The observed results show that these metrics are correlated with the overall runtime performances. In addition, we compare state-of-the-art cost-based SPARQL federation engines using existing metrics pertaining to indexing, query processing, network, and overall query runtime using different evaluation setups.

¹ Our proposed metrics are open-source and available online at https://github.com/dice-group/CostBased-FedEval
In summary, the contributions of this work are as follows:

– We propose metrics to measure the errors in the cardinality estimations of cost-based federated engines. These metrics allow a fine-grained evaluation of cost-based federated SPARQL query engines and uncover novel insights about the performance of these types of federation engines that were not reported in previous works.
– We measure the correlation between the values of the novel metrics and the overall query runtimes. We show that some of these metrics have a strong correlation with runtimes and can hence be used as predictors for the overall query execution performance.
– We present an empirical evaluation of five state-of-the-art cost-based SPARQL federation engines—CostFed [43], Odyssey [28], SemaGrow [7], LHD [49] and SPLENDID [11]—on LargeRDFBench [38], using the proposed metrics along with existing metrics that affect query runtime performance.
The rest of the paper is organized as follows: In Sec-
tion 2, we present related work. A motivating example
is given in Section 3. In Section 4, we present our novel
metrics for the evaluation of cost-based federation en-
gines. In Section 5, we give an overview of the cardinal-
ity estimators of selected cost-based federation engines.
The evaluation of these engines with proposed as well
as existing metrics is shown in Section 6. Finally, we
conclude in Section 7.
2. Related Work

In this section, we focus on the performance metrics used in the state of the art to compare federated SPARQL query processing engines. Based on previous federated SPARQL benchmarks [12, 38, 45] and performance evaluations [1–3, 7, 10, 11, 16, 28, 34, 43, 47, 49] (see Table 1 for an overview), the performance metrics used to compare federated SPARQL engines can be categorized as follows:
Engine | Index: Cr, Gt | Processing: Qp, #Ts, Qet, #A, Sst | Network: #Tt, #Er | Res: Cu, Mu | RS: Cp, Ct | Add: @T, @K
CostFed [43]: ✓ ✓ ✓ ✓ ✓ ✓ ✓
SPLENDID [11]: ✓ ✓ ✓
SemaGrow [7]: ✓ ✓ ✓
Odyssey [28]: ✓ ✓ ✓ ✓ ✓ ✓
LHD [49]: ✓ ✓ ✓ ✓ ✓ ✓
DARQ [34]: ✓ ✓
ANAPSID [2]: ✓
HiBISCuS [40]: ✓ ✓ ✓ ✓ ✓ ✓ ✓
MULDER [10]: ✓ ✓ ✓ ✓ ✓
FedX [47]: ✓ ✓ ✓
Lusail [1]: ✓ ✓ ✓
BioFed [16]: ✓ ✓ ✓ ✓ ✓ ✓
TopFed [42]: ✓ ✓
SAFE [21, 22]: ✓ ✓ ✓ ✓ ✓ ✓
Table 1: Metrics used in the existing federated SPARQL query processing systems (a check mark indicates that the metric is reported by the respective engine). Res: resource-related, RS: result-set-related, Add: additional, Cr: index compression ratio, Gt: index/summary generation time, Qp: query planning time, #Ts: total number of triple pattern-wise sources selected, Qet: average query execution time, #A: total number of SPARQL ASK requests submitted, Sst: average source selection time, #Tt: number of transferred tuples, #Er: number of endpoint requests, Cu: CPU usage, Mu: memory usage, Cp: result set completeness, Ct: result set correctness, @T: dief@t, @K: dief@k.
– Index-Related: Index-assisted approaches [39] make use of stored dataset statistics to generate an optimized query execution plan. The indexes are pre-computed by collecting information from the available federated datasets. This is usually a one-time process; however, later updates are required to ensure the result-set completeness of the query processing. The index generation time and its compression ratio (w.r.t. the overall dataset size) are important measures to consider when devising index-assisted federated engines.
– Query-Processing-Related: This category contains metrics related to the query processing capabilities of federated SPARQL engines. The reported metrics in this category are the total number of triple-pattern-wise sources selected, the number of ASK requests used to perform source selection, the source selection time, the query planning time, and the overall query runtime.
– Network-Related: Federated engines collect information from multiple data sources, e.g., SPARQL endpoints. Thus, it is important to minimize the network traffic generated by the engines during query processing. The number of transferred tuples and the number of endpoint requests generated by the federation engine are the two network-related metrics used in existing federated SPARQL query processing evaluations.
– Result-Set-Related: Two systems are only comparable if they produce exactly the same results. Therefore, result set correctness and completeness are the two most important metrics in this category.
– Resource-Related: The CPU and memory resources consumed during query processing dictate the query load an engine can bear. Hence, they are of importance when evaluating the performance of federated SPARQL engines.
– Additional: The two metrics dief@t and dief@k have been proposed to measure the continuous efficiency of query processing approaches.
All of these metrics are helpful to evaluate the per-
formance of different components of federated query
engines. However, none of these metrics can be used to
evaluate the accuracy of the cardinality estimators of
cost-based federation engines. Consequently, the effect of estimated cardinality errors on the overall query runtime performance of federation engines cannot be studied based on these metrics. To overcome these limitations, we propose metrics for measuring the errors in the cardinality estimations of triple patterns, joins between triple patterns, and overall query plans, and we show how these metrics affect the overall runtime performance of federation engines.
3. Motivating Example
In this section, we present an example to motivate
our work and to understand the proposed metrics. We
assume that the reader is familiar with the concepts of
SPARQL and RDF, including the notions of a triple
pattern, the joins between triple patterns, the cardinal-
ity (result size) of a triple pattern, and left-deep query
execution plans. As aforementioned, most cost-based
SPARQL federation engines first estimate individual
triple pattern cardinality and use this information to esti-
mate the cardinality of joins found in the query. Finally,
the query execution plan is generated by ordering the
joins. In general, the optimizer first selects the triple pat-
terns and joins with minimum estimated cardinalities
[43].
Figure 1 shows a motivating example containing a SPARQL query with three triple patterns—namely TP1, TP2 and TP3—and two joins. Consider two different cost-based federation engines with different cardinality estimators. Figure 1a shows the real (Cr) and estimated cardinalities (Ce1 for Engine 1 and Ce2 for Engine 2) of the triple patterns of the query. Let us assume that both engines generate left-deep query plans by selecting the triple patterns with the smallest cardinalities to perform their first join. The results of this join are then used to perform the second join with the remaining third triple pattern. Using the actual cardinalities, the optimal query execution plan would be to first perform the join between TP1 and TP2 and then perform the second join with TP3. The same plan is generated by Engine 1, as shown in Figure 1b. The suboptimal plan generated by Engine 2 is given in Figure 1c. Note that Engine 2 did not select the optimal plan because of large errors in its cardinality estimations of triple patterns and joins between triple patterns.
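To illustrate this planning strategy, the following Python sketch (illustrative only; not taken from any of the evaluated engines) orders triple patterns for a left-deep plan by their estimated cardinalities:

def left_deep_order(estimated_cards):
    """Order triple patterns for a left-deep plan by ascending estimated
    cardinality: the two cheapest patterns form the first join, and the
    intermediate result is then joined with the next cheapest pattern."""
    return sorted(estimated_cards, key=estimated_cards.get)

# Engine 2 of the motivating example: TP1 and TP3 are joined first,
# which yields the suboptimal plan of Figure 1c.
print(left_deep_order({"TP1": 200, "TP2": 600, "TP3": 500}))  # ['TP1', 'TP3', 'TP2']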
The motivating example clearly shows that good cardinality estimations are essential to produce a better query plan. The question we aim to answer pertains to how much the accuracy of the cardinality estimations affects the overall query plan and the overall query runtime performance. To answer this question, the q-error (Q in Figure 1) was introduced in the database literature [27]. In the next section, we define this measure and propose new similarity-based metrics to measure the overall triple pattern error ET, the overall join error EJ, as well as the overall query plan error EP.
4. Cardinality Estimation-related Metrics

We now formally define the q-error and our proposed metrics, namely ET, EJ, and EP, which measure the overall error in the cardinality estimations of triple patterns, of joins between triple patterns, and of the overall query plan, respectively.
4.1. q-error

The q-error is the factor by which an estimated cardinality value differs from the actual cardinality value [27].

Definition 1 (q-error). Let $r = (r_1, \ldots, r_n) \in \mathbb{R}^n$ with $r_i > 0$ be a vector of real values and $e = (e_1, \ldots, e_n) \in \mathbb{R}^n$ be the vector of the corresponding estimated values. Defining $e/r = (e_1/r_1, \ldots, e_n/r_n)$, the q-error of the estimation $e$ of $r$ is given as

$$\|e/r\|_Q = \max_{1 \leq i \leq n} \|e_i/r_i\|_Q, \qquad \text{where } \|e_i/r_i\|_Q = \max(e_i/r_i,\; r_i/e_i).$$
In this definition, over- and underestimations are treated symmetrically [27]. In the motivating example given in Figure 1, the real cardinality of TP1 is 100 (i.e., $C_r(TP1) = 100$), while the cardinality estimated by Engine 1 for the same triple pattern is 90 (i.e., $C_{e1}(TP1) = 90$). Thus, the q-error for this individual triple pattern is $\max(90/100, 100/90) = 1.11$. The query's overall q-error over its triple patterns (see Figure 1b) is the maximum of all the triple pattern q-error values, i.e., $\max(1.11, 1.25, 1) = 1.25$. The q-error of the complete query plan is the maximum q-error over all triple patterns and joins used in the query plan, i.e., $\max(1.11, 1.25, 1, 1.3, 3) = 3$.
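As an illustration, the q-error values of the motivating example can be reproduced with the following Python sketch (illustrative only; not part of the benchmark implementation):

def q_error(real, estimated):
    """q-error of a single cardinality estimate: max(e/r, r/e)."""
    return max(estimated / real, real / estimated)

def plan_q_error(real, estimated):
    """q-error of a plan: maximum q-error over all triple patterns and joins."""
    return max(q_error(r, e) for r, e in zip(real, estimated))

# Engine 1 of the motivating example: TP1-TP3 followed by the two joins
real      = [100, 200, 300, 50, 50]
estimated = [ 90, 250, 300, 65, 150]
print(plan_q_error(real, estimated))  # 3.0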
The q-error makes use of a ratio instead of an absolute or quadratic difference and is hence able to capture the intuition that only relative differences matter for making planning decisions. In addition, the q-error provides a theoretical upper bound for the plan quality if the q-error of a query is bounded.
(a) Example query:

SELECT * WHERE {
  ?s :p1 ?o1 .   # Cr(TP1): 100, Ce1(TP1): 90,  Ce2(TP1): 200
  ?s :p2 ?o2 .   # Cr(TP2): 200, Ce1(TP2): 250, Ce2(TP2): 600
  ?s :p3 ?o3 .   # Cr(TP3): 300, Ce1(TP3): 300, Ce2(TP3): 500
}

(b) Engine 1 optimal query plan: a left-deep plan that first joins TP1 and TP2 (forming BGP1) and then joins BGP1 with TP3. Q(TP1) = 1.11, Q(TP2) = 1.25, Q(TP3) = 1; Cr(TP1 ⋈ TP2) = 50, Ce1(TP1 ⋈ TP2) = 65, Q(TP1 ⋈ TP2) = 1.3; Cr(BGP1 ⋈ TP3) = 50, Ce1(BGP1 ⋈ TP3) = 150, Q(BGP1 ⋈ TP3) = 3.

(c) Engine 2 suboptimal query plan: a left-deep plan that first joins TP1 and TP3 (forming BGP1) and then joins BGP1 with TP2. Q(TP1) = 2, Q(TP2) = 3, Q(TP3) = 1.66; Cr(TP1 ⋈ TP3) = 100, Ce2(TP1 ⋈ TP3) = 50, Q(TP1 ⋈ TP3) = 2; Cr(BGP1 ⋈ TP2) = 50, Ce2(BGP1 ⋈ TP2) = 75, Q(BGP1 ⋈ TP2) = 1.5.

Fig. 1: Motivating example: a sample SPARQL query and the corresponding query plans of two different federation engines.
Since the q-error only considers the maximum value amongst those calculated, it is possible that plans with good average estimations are regarded as poor by this measure. Consider the query plans given in Figure 1b and Figure 1c: both have a q-error of 3, yet the query plan in Figure 1b is optimal, while the query plan in Figure 1c is not. To solve this problem, we introduce the additional metrics defined below.
4.2. Similarity Errors

The overall similarity error of the query's triple patterns is formalised as follows:

Definition 2 (Triple Patterns Error ET). Let Q be a SPARQL query containing the triple patterns $T = \{TP_1, \ldots, TP_n\}$. Let $r = (C_r(TP_1), \ldots, C_r(TP_n)) \in \mathbb{R}^n$ be the vector of real cardinalities of T and $e = (C_e(TP_1), \ldots, C_e(TP_n)) \in \mathbb{R}^n$ be the vector of the corresponding estimated cardinalities of T. Then, we define our overall triple pattern error as follows:

$$E_T = \frac{\|r - e\|}{\|r\| + \|e\|} = \frac{\sqrt{\sum_{i=1}^{n}\left(C_r(TP_i) - C_e(TP_i)\right)^2}}{\sqrt{\sum_{i=1}^{n}C_r(TP_i)^2} + \sqrt{\sum_{i=1}^{n}C_e(TP_i)^2}}$$
In the motivating example given in Figure 1, the vector of real cardinalities is $r = (100, 200, 300)$ and the vector of cardinalities estimated by Engine 1 is $e = (90, 250, 300)$; thus, $E_T = 0.0658$. Similarly, the cardinality vector estimated by Engine 2 is $e = (200, 500, 600)$; thus, Engine 2 achieves $E_T = 0.388$.
Definition 3 (Joins Error EJ). Let Q be a SPARQL query containing the joins $J = \{J_1, \ldots, J_n\}$. Let $r = (C_r(J_1), \ldots, C_r(J_n)) \in \mathbb{R}^n$ be the vector of real cardinalities of J and $e = (C_e(J_1), \ldots, C_e(J_n)) \in \mathbb{R}^n$ be the vector of the corresponding estimated cardinalities of J. Then the overall joins error is defined by the same equation as in Definition 2.
Definition 4 (Query Plan Error EP). Let Q be a SPARQL query and TJ be the set of triple patterns and joins in Q. Let $r = (r_1, \ldots, r_n) \in \mathbb{R}^n$ be the vector of real cardinalities of TJ and $e = (e_1, \ldots, e_n) \in \mathbb{R}^n$ be the vector of the corresponding estimated cardinalities of TJ. Then the overall query plan error is defined by the same equation as in Definition 2.
In the motivating example given in Figure 1b, the vector of real cardinalities of all triple patterns and joins is $r = (100, 200, 300, 50, 50)$ and the vector of cardinalities estimated by Engine 1 is $e = (90, 250, 300, 65, 150)$. Thus, $E_P = 0.1391$ for Engine 1, while Engine 2 achieves $E_P = 0.3838$. In these metrics, over- and underestimations are also treated symmetrically. The purpose of these definitions is to keep the lower bound at 0, which is reached if $r = e$ (i.e., there is no error in the estimation), and the upper bound at 1, which is approached if $e$ is much larger than $r$.
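The following Python sketch (illustrative only) computes this similarity error for plain lists of real and estimated cardinalities and reproduces the Engine 1 values of the motivating example:

import math

def similarity_error(real, estimated):
    """Similarity error ||r - e|| / (||r|| + ||e||), bounded between 0 and 1."""
    diff   = math.sqrt(sum((r - e) ** 2 for r, e in zip(real, estimated)))
    norm_r = math.sqrt(sum(r ** 2 for r in real))
    norm_e = math.sqrt(sum(e ** 2 for e in estimated))
    return diff / (norm_r + norm_e)

# Engine 1 of the motivating example
print(round(similarity_error([100, 200, 300], [90, 250, 300]), 4))   # E_T = 0.0658
print(round(similarity_error([100, 200, 300, 50, 50],
                             [90, 250, 300, 65, 150]), 4))           # E_P ≈ 0.139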
5. Selected Federation Engines

In this section, we give a brief overview of the selected cost-based SPARQL federation engines. In particular, we describe how the cardinality estimations for triple patterns and joins between triple patterns are performed in these engines.

CostFed: CostFed [43] makes use of pre-computed statistics stored in an index to estimate the cardinality of triple patterns and joins between triple patterns. CostFed benefits from both bind joins [7, 11, 47] and symmetric hash joins [2] for joining the results of triple patterns; the join type is selected by estimating the runtime cost of both join types. CostFed creates three buckets for each distinct predicate used in the RDF dataset, which are used for estimating the cardinality of query triple patterns. Furthermore, CostFed stores selectivity information that is used to estimate the cardinality of triple patterns as well as to devise an efficient query plan. The CostFed query planner also considers the skew in the distribution of subjects and objects across predicates. A separate cardinality estimation is used for multi-valued predicates, i.e., predicates that can have multiple values (e.g., a person can have multiple contact numbers or graduation schools). Finally, CostFed performs a join-aware, trie-based source selection that exploits common URI prefixes.
Let D denote a dataset (a source, for short), let $tp = \langle s, p, o \rangle$ be a triple pattern with predicate p, and let R(tp) be the set of relevant sources for that triple pattern. The following notations are used to calculate the cardinality of tp:

– T(p, D) is the total number of triples with predicate p in D.
– avgSS(p, D) is the average subject selectivity of p in D.
– avgOS(p, D) is the average object selectivity of p in D.
– tT(D) is the total number of triples in D.
– tS(D) is the total number of distinct subjects in D.
– tO(D) is the total number of distinct objects in D.
Based on these notations, the cardinality C(tp) of tp is calculated as follows (the predicate b stands for "bound"):

$$C(tp) = \begin{cases}
\sum_{D_i \in R(tp)} T(p, D_i) & \text{if } b(p) \wedge \neg b(s) \wedge \neg b(o)\\
\sum_{D_i \in R(tp)} T(p, D_i) \cdot avgSS(p, D_i) & \text{if } b(p) \wedge b(s) \wedge \neg b(o)\\
\sum_{D_i \in R(tp)} T(p, D_i) \cdot avgOS(p, D_i) & \text{if } b(p) \wedge \neg b(s) \wedge b(o)\\
\sum_{D_i \in R(tp)} tT(D_i) & \text{if } \neg b(p) \wedge \neg b(s) \wedge \neg b(o)\\
\sum_{D_i \in R(tp)} tT(D_i)/tS(D_i) & \text{if } \neg b(p) \wedge b(s) \wedge \neg b(o)\\
\sum_{D_i \in R(tp)} tT(D_i)/tO(D_i) & \text{if } \neg b(p) \wedge \neg b(s) \wedge b(o)\\
\sum_{D_i \in R(tp)} tT(D_i)/(tS(D_i) \cdot tO(D_i)) & \text{if } \neg b(p) \wedge b(s) \wedge b(o)\\
1 & \text{if } b(p) \wedge b(s) \wedge b(o)
\end{cases}$$
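This case analysis can be sketched in Python as follows; the per-source statistics T, avgSS, avgOS, tT, tS, and tO are assumed to come from CostFed's index, and the data layout shown here is purely illustrative rather than CostFed's actual implementation:

def costfed_tp_cardinality(tp, sources, stats):
    """Estimate the cardinality of a triple pattern tp = (s, p, o) over its
    relevant sources; None marks an unbound position."""
    s, p, o = tp
    if s is not None and p is not None and o is not None:
        return 1  # fully bound pattern
    total = 0.0
    for d in sources:
        st = stats[d]  # per-source statistics taken from the index
        if p is not None:
            if s is None and o is None:
                total += st["T"][p]
            elif s is not None and o is None:
                total += st["T"][p] * st["avgSS"][p]
            else:  # object bound, subject unbound
                total += st["T"][p] * st["avgOS"][p]
        else:
            if s is None and o is None:
                total += st["tT"]
            elif s is not None and o is None:
                total += st["tT"] / st["tS"]
            elif s is None and o is not None:
                total += st["tT"] / st["tO"]
            else:  # both subject and object bound
                total += st["tT"] / (st["tS"] * st["tO"])
    return total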
In the query planning phase, a SPARQL expression E is defined recursively [6, 7]: every triple pattern is a SPARQL expression, and if E1 and E2 are SPARQL expressions, then $E_1 \bowtie E_2$ is also a SPARQL expression. The join cardinality of two expressions E1 and E2 is estimated as

$$C(E_1 \bowtie E_2) = M(E_1) \cdot M(E_2) \cdot \min(C(E_1), C(E_2)),$$

where M(E) denotes the average frequency of multi-valued predicates in the expression E. In M(E), E is not the result of joins between triple patterns but a triple pattern itself. M(E) is calculated using the following equation:

$$M(E) = \begin{cases}
1/\sqrt{2} & \text{if } b(p) \wedge \neg b(s) \wedge b(o)\\
C(E)/distSbjs(p, D) & \text{if } b(p) \wedge \neg b(s) \wedge \neg b(o) \wedge j(s)\\
C(E)/distObjs(p, D) & \text{if } b(p) \wedge \neg b(o) \wedge \neg b(s) \wedge j(o)\\
1 & \text{otherwise}
\end{cases}$$
Here, j(s) (resp. j(o)) denotes that the subject (resp. object) of the triple pattern is involved in the join, while b(s), b(o), and b(p) denote a bound subject, object, and predicate, respectively.
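A simplified Python sketch of this join cardinality estimation, assuming a bound predicate and using illustrative names (dist_subjects, dist_objects) for the distinct-subject and distinct-object statistics, could look as follows:

import math

def m_value(card_E, s_bound, o_bound, joined_on, dist_subjects, dist_objects):
    """Average multi-valued predicate frequency M(E) for a triple pattern with a
    bound predicate; joined_on is 's', 'o', or None."""
    if not s_bound and o_bound:
        return 1 / math.sqrt(2)
    if not s_bound and not o_bound and joined_on == "s":
        return card_E / dist_subjects
    if not s_bound and not o_bound and joined_on == "o":
        return card_E / dist_objects
    return 1.0

def costfed_join_cardinality(card_e1, m_e1, card_e2, m_e2):
    """C(E1 join E2) = M(E1) * M(E2) * min(C(E1), C(E2))."""
    return m_e1 * m_e2 * min(card_e1, card_e2)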
SPLENDID: SPLENDID [11] also uses VoID statistics to generate a query execution plan. It uses a dynamic programming approach to produce the query execution plan and makes use of both hash (h) and bind (b) joins. The cardinality of a triple pattern is estimated as follows:

$$\begin{aligned}
card_d(?, p, ?) &= card_d(p)\\
card_d(s, ?, ?) &= |d| \cdot sel.s_d\\
card_d(s, p, ?) &= card_d(p) \cdot sel.s_d(p)\\
card_d(?, ?, o) &= |d| \cdot sel.o_d\\
card_d(?, p, o) &= card_d(p) \cdot sel.o_d(p)\\
card_d(s, ?, o) &= |d| \cdot sel.s_d \cdot sel.o_d
\end{aligned}$$
Here, $card_d(p)$ is the number of triples in the data source d having predicate p, and |d| denotes the total number of triples in d. If the predicate is bound, the average selectivities of subject and object are denoted by $sel.s_d(p)$ and $sel.o_d(p)$, respectively; if the predicate is not bound, the average selectivities of subject and object are denoted by $sel.s_d$ and $sel.o_d$, respectively. For star-shaped queries, SPLENDID estimates the cardinality of the triple patterns having the same subject separately: all triple patterns with the same subject are grouped, the minimum cardinality over all triple patterns with bound objects is computed, and this minimum is multiplied by the product of the average subject selectivity and the cardinality of each remaining triple pattern with an unbound object. Formally:

$$card_d(T) = \min\left(card_d(T_{bound})\right) \cdot \prod \left(sel.s_d \cdot card_d(T_{unbound})\right)$$
The join cardinality is estimated as

$$card(q_1 \bowtie q_2) = card(q_1) \cdot card(q_2) \cdot sel(q_1, q_2),$$

where sel is the join selectivity of the two input relations, i.e., the fraction of bindings that match between the two relations. SPLENDID uses the average selectivity of the join variables as the join selectivity.
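The six triple-pattern cases above can be sketched in Python as follows (the dictionary layout of the VoID statistics is purely illustrative):

def splendid_tp_cardinality(s, p, o, void):
    """Triple pattern cardinality for one source d described by VoID statistics;
    None marks an unbound term. Only the six pattern shapes listed above are covered."""
    if p is not None and s is None and o is None:       # (?, p, ?)
        return void["card_p"][p]
    if p is None and s is not None and o is None:       # (s, ?, ?)
        return void["size"] * void["sel_s"]
    if p is not None and s is not None and o is None:   # (s, p, ?)
        return void["card_p"][p] * void["sel_s_p"][p]
    if p is None and s is None and o is not None:       # (?, ?, o)
        return void["size"] * void["sel_o"]
    if p is not None and s is None and o is not None:   # (?, p, o)
        return void["card_p"][p] * void["sel_o_p"][p]
    if p is None and s is not None and o is not None:   # (s, ?, o)
        return void["size"] * void["sel_s"] * void["sel_o"]
    raise ValueError("pattern shape not covered by the six cases above")

def splendid_join_cardinality(card_q1, card_q2, join_selectivity):
    """card(q1 join q2) = card(q1) * card(q2) * sel(q1, q2)."""
    return card_q1 * card_q2 * join_selectivity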
LHD: LHD [49] is a cardinality-based and index-assisted approach that aims to maximize the parallel execution of sub-queries. It makes use of VoID statistics for estimating the cardinality of triple patterns and joins between triple patterns, and it only uses bind joins for query execution. LHD implements a response-time cost model under the assumption that the response time of a query request is proportional to the total number of bindings transferred. LHD determines the total number of triples $t_d$, distinct subjects $s_d$, and distinct objects $o_d$ from the VoID description of a dataset d. The VoID file also provides, for a predicate p, the number of triples $t_{d.p}$, the number of distinct subjects $s_{d.p}$, and the number of distinct objects $o_{d.p}$ in the dataset d. The engine assumes a uniform distribution of subjects and objects in the datasets. For a triple pattern $T: \{S\; P\; O\}$², the function that returns the set of relevant datasets of T is denoted by S(T), the selectivity of a term x with respect to S(T) is denoted by $sel_T(x)$, and the cardinality of x with respect to S(T) is denoted by $card_T(x)$.
For a single triple pattern, the selectivity of each term is estimated as follows:

$$sel_T(S) = \begin{cases}
\frac{\sum_{d \in S(T)} t_d/s_d}{\sum_{d \in S(T)} s_d} & \text{if } var(P) \wedge \neg var(S)\\
\frac{\sum_{d \in S(T)} t_{d.p}/s_{d.p}}{\sum_{d \in S(T)} s_{d.p}} & \text{if } P = p \wedge \neg var(S)\\
1 & \text{if } var(S)
\end{cases}$$

$$sel_T(P) = \begin{cases}
\frac{\sum_{d \in S(T)} t_{d.p}}{\sum_{d \in S(T)} t_d} & \text{if } P = p\\
1 & \text{if } var(P)
\end{cases}$$

$$sel_T(O) = \begin{cases}
\frac{\sum_{d \in S(T)} t_d/o_d}{\sum_{d \in S(T)} o_d} & \text{if } var(P) \wedge \neg var(O)\\
\frac{\sum_{d \in S(T)} t_{d.p}/o_{d.p}}{\sum_{d \in S(T)} o_{d.p}} & \text{if } P = p \wedge \neg var(O)\\
1 & \text{if } var(O)
\end{cases}$$

After calculating the selectivity of each term, LHD estimates the cardinality of the triple pattern as

$$card(T) = t \cdot sel_T(S) \cdot sel_T(P) \cdot sel_T(O).$$
² In this section, a letter with a question mark (e.g., ?x) denotes a variable in an RDF triple, a literal value is represented by a lower-case letter (e.g., o), and a term that may be either a variable or a literal value is denoted by an upper-case letter (e.g., S).
Given two triple patterns T1 and T2, LHD calculates the join selectivity using the following equations:

$$sel(T_1 \bowtie T_2) = \begin{cases}
\frac{\sum_{d \in S(T_1)} s_{d.p_1} \cdot \sum_{d \in S(T_2)} s_{d.p_2}}{\sum_{d \in S(T_1)} s_d \cdot \sum_{d \in S(T_2)} s_d} & \text{if joined on } S_1 = S_2\\
\frac{\sum_{d \in S(T_1)} o_{d.p_1} \cdot \sum_{d \in S(T_2)} o_{d.p_2}}{\sum_{d \in S(T_1)} o_d \cdot \sum_{d \in S(T_2)} o_d} & \text{if joined on } O_1 = O_2\\
\frac{\sum_{d \in S(T_1)} s_{d.p_1} \cdot \sum_{d \in S(T_2)} o_{d.p_2}}{\sum_{d \in S(T_1)} s_d \cdot \sum_{d \in S(T_2)} o_d} & \text{if joined on } S_1 = O_2\\
1 & \text{if no shared variables}
\end{cases}$$

Using these join selectivity values, the join cardinality is estimated as

$$card(T_1 \bowtie T_2 \bowtie \ldots \bowtie T_n) = \prod_{i=1}^{n} card(T_i) \cdot sel(T_1 \bowtie T_2 \bowtie \ldots \bowtie T_n).$$
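As an illustration, the predicate selectivity, the subject-subject join selectivity, and the join cardinality can be sketched in Python as follows (the per-source keys t, t_p, s, and s_p are illustrative stand-ins for the VoID counts $t_d$, $t_{d.p}$, $s_d$, and $s_{d.p}$):

def lhd_sel_p(sources, p):
    """Predicate selectivity: fraction of triples with predicate p over all
    relevant sources (1 if the predicate is a variable)."""
    if p is None:
        return 1.0
    return sum(d["t_p"][p] for d in sources) / sum(d["t"] for d in sources)

def lhd_join_sel_subject(sources1, p1, sources2, p2):
    """Join selectivity for a subject-subject join (S1 = S2)."""
    num = sum(d["s_p"][p1] for d in sources1) * sum(d["s_p"][p2] for d in sources2)
    den = sum(d["s"] for d in sources1) * sum(d["s"] for d in sources2)
    return num / den

def lhd_join_cardinality(cards, join_selectivity):
    """card(T1 ... Tn) = prod(card(Ti)) * sel(T1 ... Tn)."""
    result = join_selectivity
    for c in cards:
        result *= c
    return result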
SemaGrow: SemaGrow [7] bases its query planning on VoID³ statistics [4] about the datasets. It makes use of the VoID index as well as SPARQL ASK queries to perform source selection. Three types of joins, i.e., bind, merge, and hash joins, are used during query planning, and the join operation to perform is selected based on a cost function. SemaGrow uses a reactive model for retrieving the results of joins as well as of individual triple patterns. As with CostFed, SemaGrow recursively defines SPARQL expressions. Given a data source S, the cardinality estimations of triple patterns and joins are explained below.

SemaGrow contains a resource discovery component, which returns the list of sources relevant to a triple pattern along with statistics. The statistics related to a data source include (1) the number of estimated distinct subjects, predicates, and objects matching the triple pattern, and (2) the number of triples in the data source matching the triple pattern. The cardinality of a triple pattern is thus provided directly by the resource discovery component. For more complex expressions, on the other hand, SemaGrow needs to make an estimation based on the available statistics; to this end, it adopts the formulas described by LHD [49]. The cardinality of an expression E in a data source S is denoted by Card([E], S).

³ VoID vocabulary: https://www.w3.org/TR/void/
For estimating the join cardinality, we need the join selectivity $JoinSel([E_1] \bowtie [E_2])$, which is given as follows:

$$JoinSel([E_1] \bowtie [E_2]) = \min\left(JoinSel[E_1], JoinSel[E_2]\right), \qquad JoinSel([T]) = \min_i\left(1/d_i\right)$$

Here, E1 and E2 can be any join expressions or triple patterns, T is a single triple pattern, and $d_i$ denotes the number of distinct values of the i-th join attribute of T. The join cardinality is then given as

$$Card([E_1] \bowtie [E_2], S) = Card([E_1], S) \cdot Card([E_2], S) \cdot JoinSel([E_1] \bowtie [E_2]).$$
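A minimal Python sketch of this join estimation (with distinct_values standing in for the $d_i$ statistics returned by the resource discovery component; names are illustrative) could be:

def join_sel_triple_pattern(distinct_values):
    """JoinSel([T]) = min(1 / d_i) over the distinct-value counts of the
    join attributes of a single triple pattern T."""
    return min(1.0 / d for d in distinct_values)

def join_sel(join_sel_e1, join_sel_e2):
    """JoinSel([E1] join [E2]) = min(JoinSel[E1], JoinSel[E2])."""
    return min(join_sel_e1, join_sel_e2)

def semagrow_join_cardinality(card_e1, card_e2, join_sel_e1, join_sel_e2):
    """Card([E1] join [E2], S) = Card([E1], S) * Card([E2], S) * JoinSel(...)."""
    return card_e1 * card_e2 * join_sel(join_sel_e1, join_sel_e2)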
Odyssey: Odyssey [28] makes use of distributed characteristic set (CS) [32] and characteristic pair (CP) [13] statistics to estimate cardinalities, and it estimates the cardinality of each type of query differently using these statistics. For star-shaped queries, where the subject (or object) is the same for all joining triple patterns, the estimated cardinality for a given set of properties P (the predicates of the joining triple patterns) is computed using the CSs $C_j$ that contain all these properties. The common subject (or object) is referred to as an entity. CSs can be computed in a single scan once the dataset's triples are sorted by subject; after all of an entity's properties have been scanned, the entity's CS is identified. For each CS C, Odyssey computes statistics: count(C) is the number of entities sharing C, and occurrences(p, C) is the number of triples with predicate p occurring with these entities. Odyssey uses estimatedCardinalityDistinct(P) to denote the estimated cardinality of queries that contain the DISTINCT keyword and estimatedCardinality(P) for queries that do not. Formally, the estimated cardinality for star-shaped queries is defined as follows:
$$estimatedCardinalityDistinct(P) = \sum_{P \subseteq C_j} count(C_j)$$

$$estimatedCardinality(P) = \sum_{P \subseteq C_j} \left( count(C_j) \cdot \prod_{p_i \in P} \frac{occurrences(p_i, C_j)}{count(C_j)} \right)$$
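For a star-shaped query over a property set P, this estimation can be sketched in Python as follows (char_sets is an illustrative in-memory representation of the characteristic set statistics):

def odyssey_star_cardinality(P, char_sets, distinct=False):
    """Estimate the cardinality of a star-shaped query with property set P.
    char_sets: list of dicts {'properties': set, 'count': int,
                              'occurrences': {predicate: int}}."""
    total = 0.0
    for cs in char_sets:
        if not set(P) <= cs["properties"]:
            continue  # only CSs containing all properties of P contribute
        if distinct:
            total += cs["count"]
        else:
            estimate = cs["count"]
            for p in P:
                estimate *= cs["occurrences"][p] / cs["count"]
            total += estimate
    return total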
For arbitrarily shaped queries, Odyssey also considers the connections (links) between different CSs. Characteristic pairs (CPs) describe the links between characteristic sets (CSs) via properties. For entities $e_1$ and $e_2$, the link is defined as $(CSs(e_1), CSs(e_2), p)$, given that $(e_1, p, e_2) \in s$, where s is the data source. The number of links between two CSs $C_i$ and $C_j$ through a property p is captured by the statistic $count(C_i, C_j, p)$. The cardinality (the number of pairs of entities with property sets $P_k$ and $P_l$) for complex-shaped queries is estimated as:

$$estimatedCardinality(P_k, P_l, p) = \sum_{P_k \subseteq C_i \wedge P_l \subseteq C_j} \left( count(C_i, C_j, p) \cdot \prod_{p_k \in P_k \setminus \{p\}} \frac{occurrences(p_k, C_i)}{count(C_i)} \cdot \prod_{p_l \in P_l} \frac{occurrences(p_l, C_j)}{count(C_j)} \right)$$
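Analogously, the CP-based estimation can be sketched in Python as follows (cp_counts is an illustrative representation of the count(Ci, Cj, p) statistics; char_sets is as in the star-shaped sketch above):

def odyssey_link_cardinality(Pk, Pl, p, char_sets, cp_counts):
    """Estimate the number of entity pairs linked by predicate p, where the
    first entity has property set Pk and the second Pl.
    cp_counts[(i, j, p)] = count(Ci, Cj, p)."""
    total = 0.0
    for i, ci in enumerate(char_sets):
        if not set(Pk) <= ci["properties"]:
            continue
        for j, cj in enumerate(char_sets):
            if not set(Pl) <= cj["properties"]:
                continue
            links = cp_counts.get((i, j, p), 0)
            if links == 0:
                continue
            estimate = links
            for pk in set(Pk) - {p}:
                estimate *= ci["occurrences"][pk] / ci["count"]
            for pl in Pl:
                estimate *= cj["occurrences"][pl] / cj["count"]
            total += estimate
    return total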
In order to reduce the complexity, Odyssey treats each star-shaped sub-query as a single meta-node, assuming that the join order within the star-shaped sub-queries has already been optimized. It uses characteristic pairs (CPs) to estimate the cardinality of joins between star-shaped sub-queries (meta-nodes) and uses dynamic programming (DP) to optimize the join order and find the optimal plan.
6. Evaluation and Results
In this section, we discuss the results we obtained
in our evaluation. All results are also available at the
project homepage. First, we evaluate our novel metrics
in terms of how they are correlated with the overall
query runtime performance of state-of-the-art federated
query engines. Thereafter, we compare existing cost-
based SPARQL federation engines using the proposed
metrics and discuss the evaluation results.
6.1. Experiment Setup and Hardware

Benchmarks Used: In our experiments, we used the state-of-the-art benchmark for federated engines dubbed LargeRDFBench [38]. LargeRDFBench comprises a total of 40 queries (including all queries from FedBench [45]): 14 simple queries (S1-S14) from FedBench, 10 complex queries (C1-C10), 8 complex and high-sources queries (CH1-CH8), and 10 large data queries (L1-L10). Simple queries are fast to execute and include the smallest numbers of triple patterns, ranging from 2 to 7 [38]. Complex queries are more challenging and take more time to execute than simple queries [38]; the queries in this category have at least 8 triple patterns and contain more joins and SPARQL operators than simple queries. The complex and high-sources queries are even more challenging, as they need to retrieve results from more data sources and have more triple patterns, joins, and SPARQL operators than the simple and complex queries [38].

We used all queries except the large data queries (L1-L10) in our experiments. The reason for excluding L1-L10 is that the evaluation results presented in [38] show that most engines are not yet able to execute these queries. LargeRDFBench comprises 13 real-world RDF datasets of varying sizes. We loaded each dataset into a Virtuoso 7.2 server.
Cost-based Federation Engines: We evaluated five state-of-the-art cost-based SPARQL federation engines: CostFed [43], Odyssey [28], SemaGrow [7], LHD [49], and SPLENDID [11]. To the best of our knowledge, these represent most of the currently available, open-source cost-based federation engines.
Hardware Used:
Each Virtuoso was deployed on a
physical machine (32 GB RAM, Core i7 processor and
500 GB hard disk). We ran the selected federation en-
gines on a local client machine with the same specifica-
tion. Our experiments were run in a local environment
where the network cost is negligible. This is the stan-
dard setting used in the original LargeRDFBench. Note
that the accuracy of the cardinality estimators of the
federated SPARQL query processing is independent of
the network cost.
Warm-up and Number of Runs:
In each experiment,
we warmed up each federation engine for 10 minutes by
executing the Linked Data (LD1-LD10) queries from
FedBench. Experiments were run three times and the
results were averaged. The query timeout was set to 30
minutes.
Metrics:
We present results for the (1) q-error of
triple patterns, (2) q-error of joins between triple pat-
terns, (3) q-error of overall query plans, (4) errors of
triple patterns, (5) errors of joins between triple pat-
terns, (6) errors of overall query plans, (7) overall query
runtimes, (8) number of tuples transferred (intermediate
results), (9) source selection related metrics, and (10)
quality of plans generated by query planner of each en-
gine. In addition, we used Spearman’s correlation coef-
ficient to measure the correlation between the proposed
metrics and the overall query runtimes. The Spearman
test is designed to assess how well the dependency be-
tween two variables can be described using a monotonic
function. While the Pearson test could also be used, we
preferred the Spearman test because it is parameter-free
and tests at rank level. We used simple linear and robust
regression models to compute the correlation.
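For reference, this statistical analysis can be reproduced with standard Python tooling; the sketch below (assuming plain lists of per-query errors and runtimes) computes Spearman's correlation, a simple linear regression, and a Huber-weighted robust regression:

import numpy as np
from scipy.stats import spearmanr
import statsmodels.api as sm

def analyse(errors, runtimes):
    """Spearman correlation plus simple and robust (Huber/IRLS) regression."""
    rho, p_value = spearmanr(errors, runtimes)
    X = sm.add_constant(np.asarray(errors, dtype=float))
    ols = sm.OLS(np.asarray(runtimes, dtype=float), X).fit()   # simple linear regression
    rlm = sm.RLM(np.asarray(runtimes, dtype=float), X,
                 M=sm.robust.norms.HuberT()).fit()             # IRLS with Huber weights
    return rho, p_value, ols.params, rlm.params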
6.2. Regression Experiments

Throughout our regression experiments, our null hypothesis was that there is no correlation between the runtimes of queries and the error measurements (i.e., q-error or similarity error) used in the experiments. We began by investigating the dependency between the metrics we proposed and the overall query runtime performance of the federation engines selected for our experiments. Figure 2 shows the results of a simple linear regression experiment aiming to compute the dependency between the q-error and similarity errors and the overall query runtimes. For a particular engine, the left panel shows the dependency between the q-error and the overall runtime, while the right panel in the same row shows the result of correlating the runtime with the similarity error. The higher coefficients (dubbed R in the figure) computed in the experiments with similarity errors suggest that the similarity errors are likely a better predictor for runtime. The positive value of the coefficient suggests that an increase in similarity error also means an increase in the overall runtime. It can be observed from the figure that outliers potentially contaminate the results. Hence, we applied robust regression [19, 33, 37] using the Huber loss function [20] in a second series of experiments to lessen the effect of the outliers on the results (especially for the q-errors); see Figure 3. We observe that, after down-weighting outliers using robust regression, the average R-values of the similarity-error correlations further increase. The lower p-values in the similarity-error-based experiments further confirm that our metrics are more likely to be a better predictor for runtime than the q-error. The reason for this result is that our measure exploits more information and is hence less affected by outliers. This is not the case for the q-error, which can be perturbed significantly by a single outlier.
To further investigate the correlation between the metrics and runtimes, we measured Spearman's correlation coefficient between the query runtimes and the corresponding errors for each of the first six metrics. The results are shown in Table 2: on average, the proposed metrics have positive correlations with query runtimes, i.e., the smaller the error, the smaller the query runtime. The similarity error of the overall query plan (EP) has the highest impact (i.e., 0.35) on query runtimes, followed by the similarity error of triple patterns (ET with 0.27), the q-error of joins (QJ with 0.26), the similarity error of joins (EJ with 0.22), the q-error of the overall plan (QP with 0.17), and the q-error of triple patterns (QT with 0.06).
In order to make a fair comparison between the results, we only consider the common queries on which every system succeeded. We excluded LHD [49] from this analysis because it failed on 20 of the 32 benchmark queries (only 12 simple queries passed), which makes it inadequate for this comparison. We then applied Spearman's correlation again on the common queries. Table 3 shows that the proposed metrics also have a positive correlation with query runtime when only the common queries are considered. The similarity errors of the overall plan (EP) and of triple patterns (ET) have the highest impact (i.e., 0.40) on query runtime, followed by the similarity error of joins (EJ with 0.39), the q-errors of joins (QJ) and of the overall query plan (QP) with 0.17 each, and the q-error of triple patterns (QT with 0.01).
Furthermore, we reduced the influence of outliers on the results by applying robust regression to both the q-error and the proposed similarity error metrics. Robust regression is performed via Iteratively Reweighted Least Squares (IRLS) [19], using Huber weights [20] as the weighting function. This approach further sharpened the results and made the correlation between our proposed similarity error and the runtime stronger. Table 4 shows that, on average, all metrics have a positive correlation; for our proposed metrics, however, the difference is clear-cut. The similarity error of the overall query plan (EP) has the highest impact (i.e., 0.56) on query runtimes, followed by the similarity error of triple patterns (ET with 0.49), the similarity error of joins (EJ with 0.45), the q-error of joins (QJ with 0.22), and the q-errors of the overall plan (QP) and of triple patterns (QT) with 0.18 each. Table 4 also shows that the q-error for Odyssey is negatively correlated with runtime; correspondingly, high q-error values for Odyssey can be observed in Figure 4.
Another factor worth mentioning is that robust regression does not rely on normality assumptions. Comparing the p-values (at the 5% significance level) of the simple linear regression and the robust regression suggests that the data is sufficiently normally distributed for simple linear regression.

Overall, the results show that the proposed similarity errors correlate better with query runtimes than the q-error.
Fig. 2: q-error (left column) and similarity error (right column) vs. runtime (simple linear regression analysis). The grey shaded areas represent the confidence intervals (bands) of the regression line. Per engine: CostFed: q-error R = 0.24, p = 0.28; similarity error R = 0.59, p = 0.0094. SemaGrow: q-error R = 0.4, p = 0.079; similarity error R = 0.56, p = 0.042. SPLENDID: q-error R = 0.041, p = 0.86; similarity error R = 0.45, p = 0.073. Odyssey: q-error R = −0.015, p = 0.96; similarity error R = 0.42, p = 0.13.
Fig. 3: q-error (left column) and similarity error (right column) vs. runtime (robust regression analysis). The grey areas represent the confidence intervals (bands) of the regression line. Per engine: CostFed: q-error R = 0.16, p = 0.46; similarity error R = 0.66, p = 0.0021. SemaGrow: q-error R = 0.53, p = 0.016; similarity error R = 0.56, p = 0.042. SPLENDID: q-error R = 0.041, p = 0.86; similarity error R = 0.55, p = 0.023. Odyssey: q-error R = −0.02, p = 0.94; similarity error R = 0.45, p = 0.11.
Table 2: Spearman's rank correlation coefficients between query plan features and query runtimes for all queries.

Engine    |  EJ    EP    ET     Avg.  |  QJ    QP    QT     Avg.
CostFed   |  0.23  0.59  0.43   0.42  |  0.14  0.26  0.10   0.17
SemaGrow  |  0.33  0.33  0.33   0.33  |  0.47  0.37  0.001  0.28
Odyssey   |  0.11  0.14  0.55   0.26  |  0.01  0.03  −0.06  −0.01
SPLENDID  |  0.30  0.40  0.24   0.32  |  0.17  0.10  0.24   0.17
LHD       |  0.16  0.28  −0.20  0.08  |  0.51  0.11  0.04   0.22
Average   |  0.22  0.35  0.27   0.28  |  0.26  0.17  0.06   0.17

Table 3: Spearman's rank correlation coefficients between query plan features and query runtimes after linear regression (only for the queries common to all systems).

Engine    |  EJ    EP    ET    Avg.  |  QJ     QP     QT     Avg.
CostFed   |  0.54  0.61  0.36  0.50  |  0.11   0.23   0.05   0.13
SemaGrow  |  0.44  0.56  0.43  0.48  |  0.49   0.40   −0.02  0.29
Odyssey   |  0.22  0.42  0.53  0.39  |  −0.04  −0.01  −0.20  −0.08
SPLENDID  |  0.35  0.45  0.27  0.36  |  0.12   0.04   0.21   0.12
Average   |  0.39  0.51  0.40  0.43  |  0.17   0.17   0.01   0.12

Table 4: Spearman's rank correlation coefficients between query plan features and query runtimes after robust regression (only for the queries common to all systems).

Engine    |  EJ    EP    ET    Avg.  |  QJ     QP     QT     Avg.
CostFed   |  0.60  0.66  0.62  0.63  |  0.16   0.16   0.16   0.16
SemaGrow  |  0.56  0.56  0.57  0.56  |  0.60   0.53   0.57   0.56
Odyssey   |  0.25  0.45  0.59  0.43  |  −0.04  −0.02  −0.20  −0.08
SPLENDID  |  0.49  0.55  0.20  0.38  |  0.14   0.041  0.18   0.12
Average   |  0.45  0.56  0.49  0.50  |  0.22   0.18   0.18   0.19

EJ: similarity error of joins, EP: similarity error of the overall query plan, ET: similarity error of triple patterns, QJ: q-error of joins, QP: q-error of the overall query plan, QT: q-error of triple patterns. The first four value columns are similarity errors, the last four are q-errors. Correlation strengths: 0.00–0.19 very weak, 0.20–0.39 weak, 0.40–0.59 moderate, 0.60–0.79 strong, 0.80–1.00 very strong.
Moreover, the correct estimation of the overall plan is clearly the most crucial part of plan generation. Thus, it is important for federation engines to pay particular attention to the cardinality estimation of the overall query plan. However, given that this estimation commonly depends on triple pattern and join estimations, better means for approximating triple pattern and join cardinalities should lead to better plans. The weak to moderate correlation of the similarity errors with query runtimes suggests that the query runtime is a complex measure affected by multi-dimensional metrics, such as the metrics given in Table 1, and by SPARQL features, such as the number of triple patterns, their selectivities, the use of projection variables, and the number of joins and their types [44]. Therefore, it is rather hard to pinpoint a single metric or SPARQL feature that has a high correlation with the runtime [38, 44]. The proposed similarity error metrics are related to the query planning component of the federation engines and are useful for evaluating the quality of the query plans generated by these engines.
Fig. 4: Similarity errors and q-errors of the query plans produced by CostFed, SemaGrow, SPLENDID, Odyssey, and LHD. Panels: (a) overall similarity error of query plans, (b) overall q-error of query plans, (c) join similarity error of query plans, (d) join q-error of query plans, (e) triple pattern similarity error of queries, (f) triple pattern q-error of queries. Similarity errors range from 0 to 1; q-errors are plotted on a logarithmic scale (1E00 to 1E08).
6.2.1. Outlier Analysis
In the robust regression model, the outliers are re-weighted according to the Huber loss function. For the similarity error, the queries that are re-weighted after applying robust regression are C2, C1, S14, and CH7 in CostFed; S2 in SemaGrow; S8 and S2 in SPLENDID; and S8 in Odyssey. For the q-error, the re-weighted queries are C6, C2, C4, CH7, and S3 in CostFed; CH3, CH4, S13, and C2 in SemaGrow; CH6, C2, C7, and S5 in SPLENDID; and S11, C2, C1, and S4 in Odyssey.

For these queries, the residual values are either significantly higher or lower than the regression line. For example, in CostFed the average of the similarity errors across all queries is 0.272 and the residual values of the unmodified queries lie between −0.17 and 0.17, while the C2 similarity error is 0.99 with a residual value of 0.73, the CH7 similarity error is 0.99 with a residual value of 0.19, and the C1 similarity error is 0.62 with a residual value of 0.32. By re-weighting these queries, the similarity error and q-error values are re-adjusted closer to the regression line, which gives a clearer picture of the regression experiments. As can be observed from the simple linear regression results (Figure 2), the outliers influence the results: for CostFed, the R value in simple regression is 0.59 and, after re-weighting in robust regression, the value increases to 0.66. Furthermore, we observe that for the similarity error the R values increase under robust regression compared to simple linear regression, while for the q-error the R values decrease, further suggesting that the similarity error is the better predictor of the runtime compared to the q-error.

Finally, the overall q-error is more affected by robust regression than the similarity error. This is because the q-error takes the maximum of all the errors in the cardinality estimations of the joins and triple patterns. Consequently, some queries produce very high q-error values due to a single inaccurate cardinality estimation for a join or a triple pattern.
6.2.2. Combined Regression-based Comparison Analysis
Recall that our null hypothesis was that there is no correlation between query runtime and error measurement. Based on the results shown in Figures 2 and 3, we can make the following observations:

– We can reject the null hypothesis in 62.5% (i.e., 5 out of 8) of the experiments for the similarity error, while the same can only be done in 12.5% (1 out of 8) of the experimental settings for the q-error.
– The similarity error is significantly correlated with the runtimes of CostFed (simple and robust regression), SemaGrow (simple and robust regression) and SPLENDID (robust regression). On the other hand, the q-error is only significantly correlated with the runtime of SemaGrow (robust regression). In the one case where the p-values achieved by both measures allow us to reject the null hypothesis (i.e., for SemaGrow using the robust regression analysis), the R-value of the similarity error is higher than that of the q-error (0.56 vs. 0.53).
– For Odyssey, neither the similarity error nor the q-error produced significant results in our experiments. This suggests that the two errors do not capture the phenomena that influence the performance of Odyssey. A deeper look into Odyssey's runtime performance suggests that it performs worst w.r.t. its source selection time (see Table 8), a factor which is not captured by the errors considered herein.

Our observations suggest that the similarity error is more likely to be significantly correlated with the runtime of a federated query engine than the q-error. However, for some systems (like Odyssey in our case) it may not produce significant results. Interestingly, the correlation between the similarity error and runtimes is significant and highest for the best-performing federated query engine in terms of average query runtime (CostFed, see Figure 5). We hypothesize that this result might indicate that the similarity error is most useful for systems which are already optimized to generate good plans. However, this hypothesis needs to be confirmed through further experiments. Still, the usefulness of the similarity error seems especially evident when one compares the behaviour of the similarity error and the q-error when faced with single cardinality estimation errors. For example, suppose we have 3 joins in a query with estimated cardinalities 10, 10, and 100 and with real cardinalities 10, 10, and 1, respectively. The q-error of the plan would be 100 even though only a single join estimation was not optimal. As shown by the equation in Section 4.1, the q-error is sensitive to single estimation errors if they are of high magnitude. This is not the case for the similarity error, which would return 0.86.
6.3. q-error and Similarity-Based Errors
We now present a comparison of the selected cost-
based engines based on the 6 metrics given in Figure
4. Overall, the similarity errors of query plans given
in Figure 4a suggest that CostFed produces the smallest errors, followed by SPLENDID, LHD, SemaGrow, and Odyssey. CostFed produces smaller errors than SPLENDID in 10/17 comparable queries (excluding queries with timeouts and runtime errors). SPLENDID produces smaller errors than LHD in 12/14 comparable queries. LHD produces smaller errors than SemaGrow in 6/12 comparable queries. In turn, SemaGrow produces smaller errors than Odyssey in 9/15 comparable queries.
An overall evaluation of the q-error of query plans given in Figure 4b leads to the following ranking: CostFed produces the smallest errors, followed by SPLENDID, SemaGrow, Odyssey, and LHD. In particular, CostFed produces smaller errors than SPLENDID in 9/17 comparable queries (excluding queries with timeouts and runtime errors). SPLENDID produces smaller errors than SemaGrow in 9/17 comparable queries. SemaGrow produces smaller errors than Odyssey in 8/13 comparable queries. Odyssey is superior to LHD in 5/8 cases.
An overall evaluation of the similarity error of joins leads to a different picture (see Figure 4c). While CostFed remains the best system and produces the smallest errors, it is followed by Odyssey, SPLENDID, SemaGrow, and LHD. In particular, CostFed outperforms Odyssey in 12/17 comparable queries (excluding queries with timeouts and runtime errors). Odyssey produces fewer errors than SPLENDID in 7/14 comparable queries. SPLENDID is superior to SemaGrow in 11/17 comparable queries. SemaGrow outperforms LHD in 7/12 comparable queries.
In the overall evaluation of the q-error of joins given in Figure 4d, CostFed produces the smallest errors, followed by SPLENDID, SemaGrow, Odyssey, and LHD. CostFed produces fewer errors than SPLENDID in 12/17 comparable queries (excluding queries with timeouts and runtime errors). SPLENDID produces fewer errors than SemaGrow in 9/17 comparable queries. SemaGrow produces fewer errors than Odyssey in 9/13 comparable queries. Odyssey produces fewer errors than LHD in 4/8 comparable queries.
Overall, the evaluation of the similarity errors of triple patterns given in Figure 4e reveals that CostFed produces the smallest errors, followed by SPLENDID, Odyssey, SemaGrow, and LHD. CostFed produces smaller errors than SPLENDID in 10/17 comparable queries (excluding queries with timeouts and runtime errors). SPLENDID produces smaller errors than Odyssey in 15/17 comparable queries. Odyssey produces smaller errors than SemaGrow in 7/14 comparable queries. SemaGrow outperforms LHD in 6/12 queries.
An overall evaluation of the q-error of triple patterns given in Figure 4f leads to a different ranking: CostFed produces the smallest errors, followed by LHD, SemaGrow, SPLENDID, and Odyssey. CostFed outperforms LHD in 6/11 comparable queries (excluding queries with timeouts and runtime errors). LHD produces fewer errors than SemaGrow in 5/10 comparable queries. SemaGrow is better than SPLENDID in 10/17 comparable queries. SPLENDID produces fewer errors than Odyssey in 7/14 comparable queries.
In general, the accuracy of the estimation depends on the level of detail of the statistics stored in the index or data summaries. Furthermore, it is important to pay special attention to the different types of triple patterns (with bound and unbound subjects, predicates, and objects) and join types (subject-subject, subject-object, object-object) to obtain better cardinality estimations. CostFed is more accurate because of its more detailed data summaries, which are able to handle the different types of triple patterns and joins between triple patterns. Its use of buckets allows it to estimate more accurately the cardinalities of triple patterns involving the most common predicates in the dataset. Furthermore, it handles multi-valued predicates. The Odyssey statistics are more detailed than those of SPLENDID and SemaGrow (both of which use VoID statistics). Its distributed characteristic sets (CS) and characteristic pairs (CP) statistics generally lead to better cardinality estimations for joins.
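To illustrate why characteristic set statistics support accurate estimates for star-shaped joins, the sketch below builds characteristic sets from a toy set of triples and estimates the cardinality of a subject-star query over two predicates, following the idea of Neumann and Moerkotte [32]. The toy data and function names are assumptions for illustration only; Odyssey's actual distributed CS/CP statistics are considerably richer.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object); illustrative data, not a benchmark.
triples = [
    ("s1", "p1", "o1"), ("s1", "p2", "o2"),
    ("s2", "p1", "o3"), ("s2", "p2", "o4"), ("s2", "p2", "o5"),
    ("s3", "p1", "o6"),
]

# Group subjects by the set of predicates they use.
by_subject = defaultdict(lambda: defaultdict(int))
for s, p, _ in triples:
    by_subject[s][p] += 1

# A characteristic set stores the number of subjects and, per predicate,
# the number of triples contributed by those subjects.
char_sets = defaultdict(lambda: {"subjects": 0, "counts": defaultdict(int)})
for s, preds in by_subject.items():
    cs = frozenset(preds)
    char_sets[cs]["subjects"] += 1
    for p, c in preds.items():
        char_sets[cs]["counts"][p] += c

def estimate_star(query_preds):
    """Estimate the cardinality of a subject-star join over query_preds."""
    total = 0.0
    for cs, stats in char_sets.items():
        if set(query_preds) <= cs:
            card = stats["subjects"]
            for p in query_preds:
                # average number of triples per subject for predicate p
                card *= stats["counts"][p] / stats["subjects"]
            total += card
    return total

print(estimate_star(["p1", "p2"]))  # 3.0: s1 contributes 1x1, s2 contributes 1x2
```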
6.4. How Much Does An Efficient Cardinality Estimation Really Matter?
We observed that it is possible for a federation engine to produce a rather high cardinality estimation error (e.g., 0.99 is the overall similarity error for the S11 query in SemaGrow) and yet still produce the optimal query plan. This leads to the question: how much does the efficiency of the cardinality estimators of federation engines matter for generating optimal query plans? To this end, we analyzed the query plans generated by each of the selected engines for the benchmark queries. In our analysis, there are three possible cases for each plan:
– Optimal plan: In the optimal plan, the best possible join order is selected based on the given source selection performed by the underlying federation engine, i.e., the joins with the smallest cardinalities are always executed first.
– Sub-optimal plan: In the sub-optimal plan, the engine fails to select the best join order based on the given source selection performed by the underlying federation engine, i.e., the joins with the smallest cardinalities are not always executed first. Note that this means that a high error in the join cardinality estimation leads to a sub-optimal join order.
– Only-plan: In the only-plan, there is only one possible join order according to the given source selection performed by the underlying federation engine. This is the case if only one join (excluding a left join due to an OPTIONAL clause in the query) needs to be executed locally by the federation engine. This situation occurs if there is only a single join in the query or if the federation engine creates exclusive groups of joins that are executed remotely by the underlying SPARQL endpoints.

Query CostFed SemaGrow SPLENDID Odyssey LHD
Simple Queries
S1 OnlyP OnlyP OnlyP OnlyP OnlyP
S2 OnlyP OnlyP OnlyP OnlyP OptP
S3 OnlyP OnlyP OnlyP OnlyP OptP
S4 OnlyP OptP OnlyP OnlyP OptP
S5 OnlyP OptP OnlyP OnlyP OptP
S6 OptP OptP OptP subOpt OptP
S7 OptP OptP OptP subOpt OptP
S8 OnlyP OnlyP OnlyP OnlyP OnlyP
S9 OnlyP OnlyP OnlyP OnlyP OnlyP
S10 OnlyP subOpt OnlyP OptP subOpt
S11 OnlyP OnlyP OnlyP OnlyP subOpt
S12 OptP OptP OptP OptP subOpt
S13 OnlyP OptP OptP OptP subOpt
S14 OnlyP OptP OptP OnlyP OptP
Complex Queries
C1 OnlyP OptP OptP OnlyP Failed
C2 subOpt subOpt subOpt OptP Failed
C3 OptP subOpt OptP OptP Failed
C4 OptP subOpt subOpt subOpt Failed
C5 subOpt OptP subOpt subOpt Failed
C6 OnlyP OptP subOpt OnlyP Failed
C7 OnlyP OptP OptP OnlyP Failed
C8 OnlyP subOpt subOpt OnlyP Failed
C9 OnlyP OptP subOpt OnlyP Failed
C10 OptP OptP subOpt subOpt Failed
Complex + High Data Sources Queries
CH1 OptP subOpt subOpt OptP Failed
CH2 subOpt subOpt subOpt subOpt Failed
CH3 OptP subOpt subOpt subOpt Failed
CH4 subOpt subOpt subOpt Failed Failed
CH5 Failed subOpt subOpt subOpt Failed
CH6 Failed Failed OptP subOpt Failed
CH7 subOpt subOpt subOpt subOpt Failed
CH8 subOpt subOpt subOpt subOpt Failed
Table 5: Query plans generated by the query engines for all queries (simple, complex, and complex + high data sources). Failed: the engine failed to produce a query plan; OptP: optimal query plan; subOpt: sub-optimal plan; OnlyP: only one plan possible.
Table 5 shows the query plans generated by the query planners of the selected engines according to the three aforementioned cases. Since LHD failed to generate any query plan for the majority of the LargeRDFBench queries, we omit it from further discussion. In our evaluation, CostFed produced the fewest sub-optimal plans (i.e., 6), followed by Odyssey (i.e., 11), SemaGrow (i.e., 12), and SPLENDID (i.e., 14). CostFed's small number of sub-optimal plans is due to the fact that it has the smallest cardinality estimation errors, as discussed in the previous section. In addition, it generates the highest number of only-plans (which can be regarded as optimal plans for the given source selection information). This is because CostFed's source selection is more efficient
in terms of the total number of triple pattern-wise sources selected, without losing recall (see Table 8).
In Table 5, we can see that only a few sub-optimal query plans were generated for the simple queries. This is because the simple-category queries of LargeRDFBench contain very few joins (2.6 on average [38]) to be executed by the federation engines. Thus, it is relatively easy to find the best join execution order. However, for the complex and complex-plus-high-data-sources queries, more sub-optimal plans were generated. This is because these queries contain more joins (around 4 on average [38]); hence, a more accurate join cardinality estimation is required to generate the optimal join ordering. In conclusion, efficient cardinality estimation is more important for complex queries with more possible join orderings.
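The classification used in Table 5 can be illustrated with a small sketch that, under the simplifying assumption that a plan is a linear sequence of joins, compares the join order chosen by an engine against the order implied by the real join cardinalities. The function and the example cardinalities are hypothetical; real plans are trees and the engines use richer cost models.

```python
def classify_plan(chosen_order, real_cardinality):
    """Classify a join order as OnlyP, OptP, or subOpt (cf. Table 5).

    chosen_order:     join identifiers in the order the engine executes them
    real_cardinality: real cardinality of each join, keyed by identifier
    """
    if len(chosen_order) <= 1:
        return "OnlyP"  # at most one local join: only one order is possible
    best_order = sorted(chosen_order, key=lambda j: real_cardinality[j])
    return "OptP" if list(chosen_order) == best_order else "subOpt"

# Hypothetical example: executing the largest join first yields a sub-optimal plan.
real = {"j1": 5, "j2": 5000, "j3": 40}
print(classify_plan(["j2", "j1", "j3"], real))  # subOpt
print(classify_plan(["j1", "j3", "j2"], real))  # OptP
print(classify_plan(["j1"], real))              # OnlyP
```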
6.5. Number of Transferred Tuples
Table 6 shows the number of tuples sent and received during query execution for the selected federation engines. The number of sent tuples is related to the number of endpoint requests sent by the federation engine during query processing [28, 47]. The number of received tuples can be regarded as the number of intermediate results produced by the federation engine during query processing [28]. A smaller number of transferred tuples is considered important for fast query processing [28]. In this regard, CostFed ranked first with 31 green boxes (i.e., it had the best results among the selected engines), followed by Odyssey with 24 green boxes, SemaGrow with 12 green boxes, LHD with 10 green boxes, and SPLENDID with 9 green boxes.
In most queries, CostFed and Odyssey produced only-plans (the only possible plan), which means that only one join (excluding the left join for the OPTIONAL SPARQL operator) was executed locally by the federation engine. Consequently, these engines transfer fewer tuples than the other approaches. The largest difference is observed for S13, where CostFed and Odyssey clearly outperform the other approaches, transferring 500 times fewer tuples. The number of received tuples in LHD is significantly higher than in the other approaches. This is because it does not produce normal tree-like query plans. Rather, LHD focuses on generating independent tasks that can be run in parallel. Therefore, the independent tasks retrieve many intermediate results, which need to be joined locally in order to compute the final query result set.
Another advantage that CostFed and Odyssey have over the other approaches is their join-aware selection of triple pattern-wise sources (TPWSS). This join-aware source selection reduces the number of transferred tuples because fewer sources are overestimated. CostFed also performs better because it maintains a cache for ASK requests, which avoids sending many requests to the sources. Another important factor worth mentioning here is that the number of transferred tuples does not consider the number of columns in the result set (i.e., the number of projection variables in the query), but only counts the number of rows (i.e., the number of results) returned by or sent to the endpoints. We also observed that in the case of an only-plan or an optimal plan, the number of received tuples is smaller than for sub-optimal plans, clearly indicating that a smaller number of transferred tuples is key to fast query processing. The amalgamated average over all queries can also be misleading because, for the complex queries, some systems fail or time out on queries for which others produce answers. Therefore, we calculated a separate average for each query category, i.e., simple, complex, and complex-plus-high-data-sources. From our analysis of the results, we conclude that if an engine produces an optimal plan or an only-plan, the number of intermediate results also decreases.
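As a minimal sketch of the per-category averaging described above, the snippet below computes separate averages per query category from a hypothetical results table in which failed or timed-out executions are marked as missing values; the column names and numbers are assumptions, not our measured results.

```python
import pandas as pd

# Hypothetical measurements: NaN marks failed or timed-out executions so that
# they are excluded instead of silently skewing an amalgamated average.
df = pd.DataFrame({
    "engine":   ["CostFed", "CostFed", "SPLENDID", "SPLENDID"],
    "category": ["simple",  "complex", "simple",   "complex"],
    "received": [2168,       6725,      2987,       float("nan")],
})

# One average per (engine, category) instead of a single overall average.
print(df.groupby(["engine", "category"])["received"].mean())
```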
6.6. Indexing and Source Selection Metrics
A smaller index is essential for fast index lookups during source selection, but it can lack important information. In contrast, a large index leads to slow index lookups and is harder to manage, but it may enable better cardinality estimations. It is therefore important to compare the sizes of the indexes generated by the selected federation engines. Table 7 shows a comparison of the index/data summary construction times and index sizes (i.e., the size of the summaries used for cardinality estimation, in MB) of the selected state-of-the-art cost-based SPARQL federation approaches. SemaGrow, SPLENDID, and LHD rely on VoID statistics with a size of 1 MB for the complete LargeRDFBench datasets, which amount to 34.3 GB. CostFed's index size is 10.5 MB, while Odyssey's is 5.2 GB. The much bigger index used by Odyssey might make this approach less appropriate for big RDF datasets such as Wikidata, LinkedGeoData, etc. CostFed's index construction time is around 1 hr and 6 mins for the complete LargeRDFBench datasets, while SPLENDID, SemaGrow, and LHD took 1 hr and 50 mins to generate their index.
Queries CostFed SemaGrow Odyssey LHD SPLENDID
sent received sent received sent received sent received sent received
S1 31 100 49 111 47 100 34 91 49 111
S2 12 11 21 12 11 11 10 3 21 12
S3 20 2 22 24 20 2 Failed Failed 25 4
S4 12 1 46 17 12 1 20 18 15 3
S5 16 17 21 12 16 17 18 13444 18 20
S6 34 1616 TO TO 6500 3254 283 1766 36 1618
S7 27 642 45 635 TO TO 20 143 29 371
S8 10 1159 10 1159 10 1159 4 1159 10 1159
S9 25 351 59 382 25 351 34 342 59 382
S10 20 20054 36 14578 TO TO 16 24540 21 20055
S11 12 13 14 15 12 13 19 4261 14 15
S12 2147 2136 7772 3428 2147 2136 Failed Failed 7442 3428
S13 68 228 1161 10267 105 131 2456 13079 1161 10267
S14 2877 4033 3852 4366 2877 4033 97 2449 3852 4366
Avg 379 2168 1008 2693 981 934 250 5108 910 2987
C1 4234 2573 Failed Failed 4234 2573 Failed Failed 5232 4173
C2 1371 2118 131 1532 1352 1354 Failed Failed 131 1532
C3 6213 13343 8670 15464 6213 13343 Failed Failed 13854 9796
C4 26 550 TO TO 11 1093 Failed Failed Failed Failed
C5 TO TO Failed Failed 2232 20532 Failed Failed TO TO
C6 12 11432 TO TO 12 11432 Failed Failed 20 125310
C7 87 112 550 335 87 112 Failed Failed 550 335
C8 622 3519 1365 4768 622 3519 Failed Failed 1365 4768
C9 7274 21178 3358 10275 TO TO Failed Failed Failed Failed
C10 51 5702 112 1541 Failed Failed Failed Failed 979 1312
Avg 2210 6725 2364 5652 1845 6745 NA NA 3164 21032
CH1 390 8253 1709 9439 TO TO Failed Failed Failed Failed
CH2 TO TO TO TO TO TO Failed Failed Failed Failed
CH3 167 5053 4686 4011 TO TO Failed Failed Failed Failed
CH4 72 25 39 20 Failed Failed Failed Failed Failed Failed
CH5 Failed Failed Failed Failed TO TO Failed Failed TO TO
CH6 Failed Failed Failed Failed TO TO Failed Failed 1551 9401
CH7 2332 85158 Failed Failed TO TO Failed Failed Failed Failed
CH8 TO TO Failed Failed TO TO Failed Failed Failed Failed
Avg 740 24622 2145 4490 NA NA NA NA 1551 9401
Table 6: Number of transferred tuples. NA: not applicable. Failed: runtime error or incomplete results. TO: timeout, i.e., the query execution exceeded the threshold value. Green marks the lowest and red the highest value among all systems.
CostFed SemaGrow SPLENDID Odyssey LHD
Index Gen. Time (min) 65 110 110 533 110
Index Size (MBs) 10 1 1 5200 1
Table 7: Comparison of index construction time (Index Gen. Time) and index size for the selected federation engines.
Odyssey SPLENDID LHD SemaGrow CostFed
Qry #T #A ST #T #A ST #T #A ST #T #A ST #T #A ST
S1 11 0 1 11 26 293 28 0 261 11 26 293 4 18 6
S2 3 0 14 3 9 33 10 0 8 3 9 33 3 9 1
S3 5 0 44 12 2 17 20 0 34 12 2 17 5 0 1
S4 5 0 321 19 2 14 20 0 15 19 2 14 5 0 1
S5 4 0 223 11 1 11 11 0 8 11 1 11 4 0 1
S6 6 0 88 9 2 16 10 0 36 9 2 16 8 0 3
S7 6 0 72 13 2 19 13 0 67 13 2 19 6 0 1
S8 1 0 8 1 0 2 1 0 5 1 0 2 1 0 1
S9 4 0 1 11 26 200 28 0 69 11 26 200 4 18 5
S10 7 0 705 12 1 11 20 0 46 12 1 11 5 0 1
S11 7 0 30 7 2 19 15 0 12 7 2 19 7 0 1
S12 7 0 67 10 1 7 18 0 20 10 1 7 7 0 1
S13 10 0 23 9 2 8 17 0 58 9 2 8 5 0 1
S14 5 0 17 6 1 6 6 0 18 6 1 6 6 0 1
T/A 81 0 115.3 134 77 46 217 0 47 134 77 46 70 45 1.7
C1 8 0 38 11 1 11 RE RE RE 11 1 11 8 0 1
C2 8 0 44 11 1 7 RE RE RE 11 1 7 8 0 1
C3 30 0 121 21 3 12 RE RE RE 21 3 12 11 0 1
C4 12 0 24 28 0 3 RE RE RE 28 0 3 18 0 1
C5 16 0 320 33 0 3 RE RE RE 33 0 3 10 0 1
C6 9 0 311 24 0 2 RE RE RE 24 0 2 9 0 1
C7 9 0 38 17 2 9 RE RE RE 17 2 9 9 0 1
C8 11 0 27 25 2 11 RE RE RE 25 2 11 11 0 1
C9 19 0 452 16 2 17 RE RE RE 16 2 17 9 0 1
C10 12 0 142 13 0 3 RE RE RE 13 0 3 11 0 1
T/A 134 0 151.7 199 11 7.8 NA NA NA 199 11 7.8 104 0 1
CH1 22 0 333 41 48 62 RE RE RE 41 48 62 22 0 3
CH2 10 0 196 20 32 96 RE RE RE 20 32 96 10 0 5
CH3 18 0 544 37 37 604 RE RE RE 37 37 604 13 0 4
CH4 RE RE RE 18 28 25 RE RE RE 18 28 25 12 0 3
CH5 11 0 522 29 41 48 RE RE RE 29 41 48 RE RE RE
CH6 15 0 311 34 54 42 RE RE RE RE RE RE RE RE RE
CH7 26 0 337 47 65 57 RE RE RE 47 65 57 26 0 7
CH8 35 0 126 36 77 66 RE RE RE 36 77 66 35 0 6
T/A 137 0 338.4 262 382 125 NA NA NA 228 328 136 98 0 4.6
Table 8: Comparison of the selected federation engines in terms of source selection time (ST, in msec), total number of SPARQL ASK requests (#A), and total number of triple pattern-wise sources selected (#T). RE: runtime error; TO: timeout of 20 min; T/A: total/average, where the average applies to ST and the total to #T and #A; NA: not applicable. Green marks the lowest and red the highest value among all systems.
The index construction time for Odyssey was 86 hrs and 30 mins, which makes it difficult to use for big datasets or for datasets with frequent updates.
According to [38], the efficiency of source selection can be measured in terms of: (1) the total number of triple pattern-wise sources selected (#T), (2) the number of SPARQL ASK requests sent to the endpoints (#A) during
source selection, and (3) the source selection time. Table 8 shows a comparison of the source selection algorithms of the selected federation engines across these metrics. As discussed previously, a smaller #T leads to better query plan generation [38]. A smaller #A leads to a smaller source selection time, which in turn leads to a smaller query execution time. In this regard, CostFed ranked first (83 green boxes, i.e., the best results among the selected engines), followed by Odyssey with 56 green boxes, LHD with 15 green boxes, SPLENDID with 10 green boxes, and SemaGrow with 9 green boxes.
The approaches that perform join-aware and hybrid (SPARQL + index) source selection lead to a smaller #T [38]. Both Odyssey and CostFed perform join-aware source selection and hence achieve a smaller #T than the other selected approaches. The highest number of SPARQL ASK requests is sent by index-free federation engines, followed by hybrid (SPARQL + index) engines, which in turn are followed by index-only federation engines [38]. This is because for index-free federation engines, such as FedX, the complete source selection is based on SPARQL ASK queries. Hybrid engines such as CostFed, SPLENDID, SemaGrow, and Odyssey make use of both an index and SPARQL ASK queries to perform source selection; thus, some of the SPARQL ASK requests are skipped thanks to the information stored in the index. Index-only engines, such as LHD, only use the index to perform the complete source selection. Thus, these engines do not issue a single SPARQL ASK query during source selection. The source selection time for such engines is much smaller because it involves only index lookups, without sending requests to the endpoints. However, they have a higher #T than hybrid (SPARQL + index) source selection approaches.
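The difference between hybrid and index-free source selection can be sketched as follows: a hybrid engine answers a triple pattern from its index whenever possible and falls back to SPARQL ASK requests only when the index cannot help, whereas an index-free engine sends ASK requests for every pattern. The index contents, endpoint names, and helper functions below are illustrative assumptions, not the implementation of any evaluated engine.

```python
# Hypothetical data-summary index (predicate -> endpoints) and simulated
# endpoint contents; both are assumptions for illustration only.
INDEX = {"foaf:name": {"A", "C"}, "dbo:director": {"B"}}
ENDPOINT_PREDICATES = {
    "A": {"foaf:name"},
    "B": {"dbo:director"},
    "C": {"foaf:name", "dbo:birthDate"},
}

ask_requests_sent = 0

def ask(endpoint, predicate):
    """Simulated SPARQL ASK: does the endpoint hold triples with this predicate?"""
    global ask_requests_sent
    ask_requests_sent += 1
    return predicate in ENDPOINT_PREDICATES[endpoint]

def select_sources(predicate, endpoints):
    """Hybrid source selection: use the index when possible, ASK otherwise."""
    if predicate in INDEX:
        return INDEX[predicate] & set(endpoints)        # no ASK requests needed
    return {e for e in endpoints if ask(e, predicate)}  # index-free fallback

print(select_sources("foaf:name", ["A", "B", "C"]), ask_requests_sent)      # {'A','C'}, 0 ASKs
print(select_sources("dbo:birthDate", ["A", "B", "C"]), ask_requests_sent)  # {'C'}, 3 ASKs
```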
6.7. Query Execution Time
Finally, we present the query runtime results of the selected federation engines across the different query categories of LargeRDFBench. Figure 5 gives an overview of our results. In our runtime evaluation on the simple queries (S1-S14) (see Figure 5a), CostFed has the shortest runtimes, followed by SemaGrow, LHD, Odyssey, and SPLENDID. CostFed's runtimes are shorter than SemaGrow's on 13/13 comparable queries (excluding queries with timeouts and runtime errors), with an average runtime of 0.5 sec for CostFed vs. 2.5 sec for SemaGrow. SemaGrow outperforms LHD on 4/11 comparable queries, with an average runtime of 2.5 sec for SemaGrow vs. 2.7 sec for LHD. LHD's runtimes are shorter than Odyssey's on 8/10 comparable queries, with an average runtime of 8.5 sec for Odyssey. Finally, Odyssey is clearly faster than SPLENDID on 8/12 comparable queries, with an average runtime of 131 sec for SPLENDID.
Our runtime evaluation on the complex queries (C1-C10) (see Figure 5b) leads to a different ranking: CostFed produces the shortest runtimes, followed by SemaGrow, Odyssey, and SPLENDID. CostFed outperforms SemaGrow in 6/6 comparable queries (excluding queries with timeouts and runtime errors), with an average runtime of 3 sec for CostFed vs. 9 sec for SemaGrow. SemaGrow's runtimes are shorter than Odyssey's in 3/4 comparable queries, with an average runtime of 63 sec for Odyssey. Odyssey is better than SPLENDID in 5/5 comparable queries, where SPLENDID's average runtime is 98 sec.
The runtime evaluation on the complex and high data sources queries (CH1-CH8) given in Figure 5c establishes CostFed as the best query federation engine, followed by SPLENDID and then SemaGrow. CostFed's runtimes are shorter than SemaGrow's in 3/3 comparable queries (excluding queries with timeouts and runtime errors), with an average runtime of 4 sec for CostFed vs. 191 sec for SemaGrow. SPLENDID has no comparable queries with CostFed and SemaGrow. LHD and Odyssey both fail to produce results for these complex queries.
7. Conclusion
In this paper, we presented an extensive evaluation of
existing cost-based federated query engines. We used
existing metrics from relational database research and
proposed new metrics to measure the quality of car-
dinality estimators of selected engines. To the best of
our knowledge, this work is the first evaluation of cost-
based SPARQL federation engines focused on the qual-
ity of the cardinality estimations.
– The proposed similarity-based errors have a stronger positive correlation with runtimes than the q-error, i.e., the smaller the error values, the better the query runtimes. Thus, this metric helps developers design more efficient query execution planners for federation engines. Our proposed approach produces more significant results than the q-error. However, there is still room for further improvement.
[Figure 5: three bar charts of average execution time (msec, log scale) per query and engine (CostFed, SemaGrow, Odyssey, SPLENDID, LHD); bars are marked TIMEOUT or FAILED where no result was produced.]
(a) Average execution time of simple (S) queries (FedBench)
(b) Average execution time of complex (C) queries (LargeRDFBench)
(c) Average execution time of complex and high data sources (CH) queries (LargeRDFBench)
Fig. 5.: Average execution time of LargeRDFBench and FedBench queries.
– The higher coefficients (R-values) obtained with the similarity errors (as opposed to the q-error) suggest that the proposed similarity errors are a better predictor of runtime than the q-error.
– The smaller p-values of the similarity errors, as compared to the q-error, further confirm that the similarity errors are more likely to be a better predictor of runtime than the q-error.
– Errors in the cardinality estimation of triple patterns have a higher correlation with runtimes than errors in the cardinality estimation of joins. Thus, cost-based federation engines must pay particular attention to attaining accurate cardinality estimations of triple patterns.
– The number of transferred tuples has a direct correlation with query runtime, i.e., the smaller the number of transferred tuples, the smaller the query runtime.
– A smaller number of triple pattern-wise sources selected is key to generating the maximum number of only-plans (i.e., plans with only one possible join order).
– On average, the CostFed engine produces the fewest estimation errors and has the shortest execution time for the majority of the LargeRDFBench queries.
– The weak to moderate correlation of the cardinality errors with query execution time suggests that query runtime is a complex measure affected by multi-dimensional performance metrics and SPARQL query features. The proposed similarity error metric relates to the query planning component of federation engines and is useful for evaluating the quality of the query plans generated by these engines.
– The proposed cardinality estimation metrics are generic and can also be applied to non-federated cardinality-based query processing engines.
The impact of our proposed work is to provide new measures for the development of better cost-based federated SPARQL query engines. Furthermore, our proposed metrics help in determining the quality of the generated query plans, for example by indicating whether or not the join orders are correct. This kind of information is not revealed by the query runtime alone, because the overall query runtime is affected by all of the metrics given in Table 1. As future work, we want to compare heuristics-based (index-free) federated SPARQL query processing engines with cost-based federated engines. In particular, we want to investigate how much an index assists a cost-based federated SPARQL engine in generating optimized query execution plans.
Acknowledgments
The work has been supported by the EU H2020
Marie Skłodowska-Curie project KnowGraphs (no.
860801), BMVI-funded project LIMBO (Grant no.
19F2029I), BMVI-funded project OPAL (no. 19F2028A),
and BMBF-funded EuroStars project SOLIDE (no.
13N14456). This work has also been supported by the
National Research Foundation of Korea (NRF) (grant
funded by the Korea government (MSIT) (no. NRF-
2018R1A2A2A05023669)).
References
[1]
Ibrahim Abdelaziz, Essam Mansour, Mourad Ouzzani, Ashraf
Aboulnaga, and Panos Kalnis. Lusail: A System for Querying
Linked Data at Scale. Proc. VLDB Endow., 11(4):485–498,
December 2017. DOI:10.1145/3186728.3164144.
[2]
Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio
Castillo, and Edna Ruckhaus. ANAPSID: An Adaptive Query
Processing Engine for SPARQL Endpoints. In Lora Aroyo,
Chris Welty, Harith Alani, Jamie Taylor, Abraham Bernstein,
Lalana Kagal, Natasha Noy, and Eva Blomqvist, editors, The
Semantic Web – ISWC 2011, volume 7031 of Lecture Notes in
Computer Science, pages 18–34. Springer-Verlag Berlin Heidel-
berg, 2011. DOI:10.1007/978-3-642-25073-6_2.
[3]
Maribel Acosta, Maria-Esther Vidal, and York Sure-Vetter. Dief-
ficiency Metrics: Measuring the Continuous Efficiency of Query
Processing Approaches. In Claudia d’Amato, Miriam Fernan-
dez, Valentina Tamma, Freddy Lecue, Philippe Cudré-Mauroux,
Juan Sequeda, Christoph Lange, and Jeff Heflin, editors, The
Semantic Web – ISWC 2017, pages 3–19, Cham, 2017. Springer-
Verlag Berlin Heidelberg. DOI:10.1007/978-3-319-68204-4_1.
[4]
Keith Alexander, Richard Cyganiak, Michael Hausenblas, and Jun Zhao. Describing Linked Datasets - On the Design and Usage of voiD, the "Vocabulary of Interlinked Datasets". In Linked Data on the Web Workshop (LDOW 09), in conjunction with the 18th International World Wide Web Conference (WWW 09), volume 538, 12 2010.
[5]
Christian Bizer and Andreas Schultz. The Berlin SPARQL
Benchmark. In International Journal on Semantic Web and
Information Systems (IJSWIS), volume 5, pages 1–24. Inter-
national Journal on Semantic Web and Information Systems
(IJSWIS), 2009. DOI:10.4018/jswis.2009040101.
[6]
Carlos Buil-Aranda, Aidan Hogan, Jürgen Umbrich, and Pierre-
Yves Vandenbussche. SPARQL Web-Querying Infrastructure:
Ready for Action? In Harith Alani, Lalana Kagal, Achille
Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira,
Lora Aroyo, Natasha Noy, Chris Welty, and Krzysztof Janow-
icz, editors, The Semantic Web – ISWC 2013, pages 277–
293, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
DOI:10.1007/978-3-642-41338-4_18.
[7]
Angelos Charalambidis, Antonis Troumpoukis, and Stasinos
Konstantopoulos. Semagrow: Optimizing Federated SPARQL
Queries. In Proceedings of the 11th International Conference
on Semantic Systems, SEMANTICS ’15, pages 121–128, New
York, NY, USA, 2015. ACM. DOI:10.1145/2814864.2814886.
[8] Felix Conrads, Jens Lehmann, Muhammad Saleem, Mohamed
Morsey, and Axel-Cyrille Ngonga Ngomo. Iguana: A Generic
Framework for Benchmarking the Read-Write Performance of
Triple Stores. In International Semantic Web Conference, pages
48–65, Cham, 2017. Springer, Springer International Publishing.
DOI:10.1007/978-3-319-68204-4_5.
[9]
Fang Du, Yueguo Chen, and Xiaoyong Du. Partitioned In-
dexes for Entity Search over RDF Knowledge Bases. In Pro-
ceedings of the 17th International Conference on Database
Systems for Advanced Applications - Volume Part I, DAS-
FAA’12, page 141–155, Berlin, Heidelberg, 2012. Springer-
Verlag. DOI:10.1007/978-3-642-29038-1_12.
[10]
Kemele M. Endris, Mikhail Galkin, Ioanna Lytra, Mo-
hamed Nadjib Mami, Maria-Esther Vidal, and Sören Auer.
MULDER: Querying the Linked Data Web by Bridging RDF
Molecule Templates. In Djamal Benslimane, Ernesto Dami-
ani, William I. Grosky, Abdelkader Hameurlain, Amit Sheth,
and Roland R. Wagner, editors, Database and Expert Systems
Applications (DEXA '17), pages 3–18, Cham, 8 2017. Springer
International Publishing. DOI:10.1007/978-3-319-64468-4_1.
[11]
Olaf Görlitz and Steffen Staab. SPLENDID: SPARQL Endpoint
Federation Exploiting VOID Descriptions. In Proceedings of
the Second International Conference on Consuming Linked
Data - Volume 782, COLD’11, pages 13–24, Aachen, Germany,
Germany, 2010. CEUR-WS.org.
[12]
Olaf Görlitz, Matthias Thimm, and Steffen Staab. SPLODGE:
Systematic Generation of SPARQL Benchmark Queries for
Linked Open Data. In Proceedings of the 11th International
Conference on The Semantic Web - Volume Part I, The Semantic
Web – ISWC’12, pages 116–132, Berlin, Heidelberg, 2012.
Springer-Verlag Berlin Heidelberg. DOI:10.1007/978-3-642-
35176-1_8.
[13]
Andrey Gubichev and Thomas Neumann. Exploiting the query
structure for efficient join ordering in SPARQL queries. In
EDBT, volume 14, pages 439–450, 2014.
[14]
Olaf Hartig, Christian Bizer, and Johann-Christoph Freytag.
Executing SPARQL Queries over the Web of Linked Data. In
Proceedings of the 8th International Semantic Web Conference,
ISWC ’09, page 293–309, Berlin, Heidelberg, 2009. Springer-
Verlag. DOI:10.1007/978-3-642-04930-9_19.
[15]
Ali Hasnain, Ronan Fox, Stefan Decker, and Helena F Deus.
Cataloguing and Linking Life Sciences LOD Cloud. In 1st Inter-
national Workshop on Ontology Engineering in a Data-driven
World (OEDW 2012) collocated with 8th International Confer-
ence on Knowledge Engineering and Knowledge Management
(EKAW 2012), pages 114–130, 2012.
[16]
Ali Hasnain, Qaiser Mehmood, Syeda Sana E Zainab, Muham-
mad Saleem, Claude Warren, Jr, Durre Zehra, Stefan Decker,
and Dietrich Rebholz-Schuhman. BioFed: Federated query pro-
cessing over life sciences linked open data. Journal of Biomedi-
cal Semantics, 8(1):13, 03 2017.
[17]
Ali Hasnain, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo,
and Dietrich Rebholz-Schuhmann. Extending LargeRDFBench
for Multi-Source data at scale for SPARQL endpoint federation.
In Proceedings of the 12th International Workshop on Scalable
Semantic Web Knowledge Base Systems co-located with 17th
International Semantic Web Conference, SSWS@ISWC 2018,
Monterey, California, USA, October 9, 2018, volume 2179,
pages 28–44, 2018. DOI:10.3233/978-1-61499-894-5-203.
[18]
Ali Hasnain, Syeda Sana e Zainab, Maulik R. Kamdar, Qaiser Mehmood, Claude N. Warren, Jr., Qurrat ul Ain Fatimah, Helena F. Deus, Muntazir Mehdi, and Stefan Decker. A Roadmap
for Navigating the Life Sciences Linked Open Data Cloud. In
Thepchai Supnithi, Takahira Yamaguchi, Jeff Z. Pan, Vilas Wu-
wongse, and Marut Buranarach, editors, Semantic Technology,
volume 8943 of Lecture Notes in Computer Science, pages 97–
112. Springer International Publishing, 2015. DOI:10.1007/978-
3-319-15615-6_8.
[19]
Paul W. Holland and Roy E. Welsch. Robust regression
using iteratively reweighted least-squares. Communications
in Statistics - Theory and Methods, 6(9):813–827, 1977.
DOI:10.1080/03610927708827533.
[20]
Peter J. Huber. Robust Estimation of a Location Parameter,
pages 492–518. Springer New York, New York, NY, 1992.
DOI:10.1007/978-1-4612-4380-9_35.
[21]
Yasar Khan, Muhammad Saleem, Aftab Iqbal, Muntazir Mehdi,
Aidan Hogan, Axel-Cyrille Ngonga Ngomo, Stefan Decker,
and Ratnesh Sahay. SAFE: Policy Aware SPARQL Query
Federation Over RDF Data Cubes. In Proceedings of the 7th
International Workshop on Semantic Web Applications and
Tools for Life Sciences, Berlin, Germany, December 9-11, 2014.,
Berlin, Germany, 10 2014. DOI:10.13140/2.1.3153.9204.
[22]
Yasar Khan, Muhammad Saleem, Muntazir Mehdi, Aidan
Hogan, Qaiser Mehmood, Dietrich Rebholz-Schuhmann, and
Ratnesh Sahay. SAFE: SPARQL Federation over RDF Data
Cubes with Access Control. Journal of biomedical semantics,
8(1):5, 2017. DOI:10.1186/s13326-017-0112-6.
[23]
Donald Kossmann. The State of the Art in Distributed Query
Processing. ACM Comput. Surv., 32(4):422–469, December
2000. DOI:10.1145/371578.371598.
[24]
Günter Ladwig and Thanh Tran. SIHJoin: Querying Remote
and Local Linked Data. In Grigoris Antoniou, Marko Grobel-
nik, Elena Simperl, Bijan Parsia, Dimitris Plexousakis, Pieter
De Leenheer, and Jeff Pan, editors, The Semantic Web: Re-
search and Applications, volume 6643 of Lecture Notes in
Computer Science, pages 139–153. Springer Berlin Heidelberg,
2011. DOI:10.1007/978-3-642-21034-1_10.
[25]
Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz,
Alfons Kemper, and Thomas Neumann. How Good Are Query
Optimizers, Really? Proc. VLDB Endow., 9(3):204–215, nov
2015. DOI:10.14778/2850583.2850594.
[26]
Steven Lynden, Isao Kojima, Akiyoshi Matono, and Yusuke
Tanimura. ADERIS: An Adaptive Query Processor for Join-
ing Federated SPARQL Endpoints. In R. Meersman, T. Dil-
lon, P. Herrero, A. Kumar, M. Reichert, L. Qing, B.-C. Ooi,
E. Damiani, D.C. Schmidt, J. White, M. Hauswirth, P. Hitzler,
M. Mohania, editors, On the Move to Meaningful Internet Sys-
tems (OTM2011), Part II. LNCS, volume 7045, pages 808–817.
Springer Heidelberg, 2011. DOI:10.1007/978-3-642-25106-
1_28.
[27]
Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. Pre-
venting Bad Plans by Bounding the Impact of Cardinality Es-
timation Errors. Proc. VLDB Endow., 2(1):982–993, 8 2009.
DOI:10.14778/1687627.1687738.
[28]
Gabriela Montoya, Hala Skaf-Molli, and Katja Hose. The
Odyssey Approach for Optimizing Federated SPARQL Queries.
In The Semantic Web – ISWC 2017, pages 471–489, 2017.
DOI:10.1007/978-3-319-68288-4_28.
[29]
Gabriela Montoya, Maria-Esther Vidal, and Maribel Acosta.
A Heuristic-based Approach for Planning Federated SPARQL
Queries. In Proceedings of the Third International Conference
on Consuming Linked Data - Volume 905, COLD’12, pages
63–74, Aachen, Germany, Germany, 2012. CEUR-WS.org.
DOI:10.5555/2887367.2887373.
[30]
Gabriela Montoya, Maria-Esther Vidal, Óscar Corcho, Edna
Ruckhaus, and Carlos Buil Aranda. Benchmarking Federated
SPARQL Query Engines: Are Existing Testbeds Enough? In
Philippe Cudré-Mauroux, Jeff Heflin, Evren Sirin, Tania Tu-
dorache, Jérôme Euzenat, Manfred Hauswirth, Josiane Xavier
Parreira, Jim Hendler, Guus Schreiber, Abraham Bernstein,
and Eva Blomqvist, editors, The Semantic Web - ISWC 2012 -
11th International Semantic Web Conference, Boston, MA, USA,
November 11-15, 2012, Proceedings, Part II, volume 7650 of
Lecture Notes in Computer Science, pages 313–324. Springer,
2012. DOI:10.1007/978-3-642-35173-0_21.
[31]
Mohamed Morsey, Jens Lehmann, Sören Auer, and Axel-
Cyrille Ngonga Ngomo. DBpedia SPARQL Benchmark: Perfor-
mance Assessment with Real Queries on Real Data. In Proceed-
ings of the 10th International Conference on The Semantic Web
- Volume Part I, The Semantic Web – ISWC’11, pages 454–469,
Berlin, Heidelberg, 2011. Springer-Verlag Berlin Heidelberg.
DOI:10.1007/978-3-642-25073-6_29.
[32]
Thomas Neumann and Guido Moerkotte. Characteristic Sets:
Accurate Cardinality Estimation for RDF Queries with Multiple
Joins. In Proceedings of the 2011 IEEE 27th International Con-
ference on Data Engineering, pages 984–994. IEEE Computer
Society, IEEE, 2011. DOI:10.1109/ICDE.2011.5767868.
[33]
Dianne P. O’Leary. Robust Regression Computation Using
Iteratively Reweighted Least Squares. SIAM J. Matrix Anal.
Appl., 11(3):466–480, May 1990. DOI:10.1137/0611032.
[34]
Bastian Quilitz and Ulf Leser. Querying Distributed RDF Data
Sources with SPARQL. In Proceedings of the 5th European
Semantic Web Conference on The Semantic Web: Research and
Applications, ESWC’08, pages 524–538, Berlin, Heidelberg,
2008. Springer-Verlag. DOI: 10.1007/978-3-540-68234-9_39.
[35]
Nur Aini Rakhmawati, Muhammad Saleem, Sarasi Lalithsena,
and Stefan Decker. QFed: Query Set For Federated SPARQL
Query Benchmark. In Proceedings of the 16th International
Conference on Information Integration and Web-based Applica-
tions & Services, iiWAS ’14, pages 207–211, New York,
NY, USA, 2014. ACM. DOI:10.1145/2684200.2684321.
[36]
Nur Aini Rakhmawati, Jürgen Umbrich, Marcel Karnstedt, Ali
Hasnain, and Michael Hausenblas. Querying over Federated
SPARQL Endpoints - A State of the Art survey. CoRR, 2013.
arXiv:1306.1723.
[37]
Peter J Rousseeuw and Annick M Leroy. Robust regression and
outlier detection, volume 589. John wiley & sons, Inc., New
York, NY, USA, 1st edition, 1987. DOI:10.1002/0471725382.
[38]
Muhammad Saleem, Ali Hasnain, and Axel-Cyrille
Ngonga Ngomo. LargeRDFBench: A Billion Triples Bench-
mark for SPARQL Endpoint Federation. Journal of Web Seman-
tics, 48:85–125, 01 2018. DOI:10.1016/j.websem.2017.12.005.
[39]
Muhammad Saleem, Yasar Khan, Ali Hasnain, Ivan Ermilov,
and Axel-Cyrille Ngonga Ngomo. A fine-grained evaluation of
SPARQL endpoint federation systems. Semantic Web Journal,
7(5):493–518, 6 2016. DOI:10.3233/SW-150186.
[40]
Muhammad Saleem and Axel-Cyrille Ngonga Ngomo. HiBIS-
CuS: Hypergraph-Based Source Selection for SPARQL End-
point Federation. In Valentina Presutti, Claudia d’Amato, Fa-
bien Gandon, Mathieu d’Aquin, Steffen Staab, and Anna Tordai,
editors, The Semantic Web: Trends and Challenges, volume
8465 of Lecture Notes in Computer Science, pages 176–191.
Springer International Publishing, 2014. DOI:10.1007/978-3-
319-07443-6_13.
[41]
Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Josiane
Xavier Parreira, Helena F. Deus, and Manfred Hauswirth. DAW:
Duplicate-AWare Federated Query Processing over the Web
of Data. In Proceedings of the 12th International Semantic
Web Conference - Part I, Lecture Notes in Computer Science,
pages 574–590, New York, NY, USA, 2013. Springer-Verlag
New York, Inc. DOI:10.1007/978-3-642-41335-3_36.
[42]
Muhammad Saleem, Shanmukha S. Padmanabhuni, Axel-
Cyrille Ngonga Ngomo, Aftab Iqbal, Jonas S. Almeida, Stefan
Decker, and Helena F. Deus. TopFed: TCGA Tailored Federated
Query Processing and Linking to LOD. J. Biomed. Semant.,
5:47, 2014. DOI:10.1186/2041-1480-5-47.
[43]
Muhammad Saleem, Alexander Potocki, Tommaso Soru, Olaf
Hartig, and Axel-Cyrille Ngonga Ngomo. CostFed: Cost-Based
Query Optimization for SPARQL Endpoint Federation. In
Proceedings of the 14th International Conference on Seman-
tic Systems, volume 137, pages 163–174. Elsevier, 09 2018.
DOI:10.1016/j.procs.2018.09.016.
[44]
Muhammad Saleem, Gábor Szárnyas, Felix Conrads, Syed
Ahmad Chan Bukhari, Qaiser Mehmood, and Axel-Cyrille
Ngonga Ngomo. How Representative is a SPARQL Bench-
mark? An Analysis of RDF Triplestore Benchmarks. In The
World Wide Web Conference, WWW ’19, page 1623–1633, New
York, NY, USA, 2019. Association for Computing Machinery.
DOI:10.1145/3308558.3313556.
[45]
Michael Schmidt, Olaf Görlitz, Peter Haase, Günter Ladwig,
Andreas Schwarte, and Thanh Tran. FedBench: A Bench-
mark Suite for Federated Semantic Data Query Processing. In
Lora Aroyo, Chris Welty, Harith Alani, Jamie Taylor, Abra-
ham Bernstein, Lalana Kagal, Natasha Noy, and Eva Blomqvist,
editors, The Semantic Web – ISWC 2011, pages 585–600,
Berlin, Heidelberg, 2011. Springer-Verlag Berlin Heidelberg.
DOI:10.1007/978-3-642-25073-6_37.
[46]
Michael Schmidt, Thomas Hornung, Georg Lausen, and
Christoph Pinkel. SP^2Bench: A SPARQL Performance
Benchmark. In Proceedings of the 25th International Confer-
ence on Data Engineering ICDE, pages 222–233. IEEE, 2009.
DOI:10.1109/ICDE.2009.28.
[47]
Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel,
and Michael Schmidt. FedX: Optimization Techniques for
Federated Query Processing on Linked Data. In Lora Aroyo,
Chris Welty, Harith Alani, Jamie Taylor, Abraham Bernstein,
Lalana Kagal, Natasha Noy, and Eva Blomqvist, editors, The
Semantic Web – ISWC 2011, pages 601–616, Berlin, Heidelberg,
01 2011. Springer-Verlag Berlin Heidelberg. DOI:10.1007/978-
3-642-25073-6_38.
[48]
Jürgen Umbrich, Aidan Hogan, Axel Polleres, and Stefan
Decker. Link Traversal Querying for a Diverse Web of Data. Se-
mantic Web Journal, 6(6):585–624, 6 2015. DOI:10.3233/SW-
140164.
[49]
Xin Wang, Thanassis Tiropanis, and Hugh Davis. LHD Opti-
mising Linked Data Query Processing Using Parallelisation. In
Workshop on Linked Data on the Web (LDOW '13), Proceedings
of the WWW2013, volume 996 of CEUR Workshop Proceedings,
Rio de Janeiro, Brazil, 05 2013. CEUR-WS.org.
[50]
Marcin Wylot, Manfred Hauswirth, Philippe Cudré-Mauroux,
and Sherif Sakr. RDF Data Storage and Query Processing
Schemes: A Survey. ACM Comput. Surv., 51(4):84:1–84:36,
September 2018. DOI:10.1145/3177850.