Temporal Authorization Graphs: Pros, Cons and Limits
Riste Stojanov¹(✉), Ognen Popovski², Milos Jovanovik¹,³, Eftim Zdravevski¹, Petre Lameski¹, and Dimitar Trajanov¹

¹ Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Skopje, Macedonia
{riste.stojanov,milos.jovanovik,eftim.zdravevski,petre.lameski,dimitar.trajanov}@finki.ukim.mk
² Netcetera, Skopje, Macedonia
³ OpenLink Software, Burlington, UK
Abstract. As more private data enters the web, defining authorization for its access is crucial for privacy protection. This paper proposes a policy language that leverages SPARQL expressiveness and popularity for flexible access control management, and enforces the protection using temporal graphs. The temporal graphs are created during the authentication phase and are cached for further usage. They enable design-time policy testing and debugging, which is necessary to guarantee correctness.
Security never comes with convenience, and this paper examines the environments in which temporal graphs are suitable. Based on the evaluation results, an approximate function is defined that determines suitability based on the expected temporal graph size.
Keywords: Authorization · Temporal authorization graphs · Policy language · Semantic access control
1 Introduction
The expansion of information technologies has produced vast amounts of data that are stored in various systems and represent almost every aspect of our professional and private lives. Authorization systems are the guardians of this data and restrict its access to granted requesters only. The distributed environments in which the data is stored introduce many challenges for authorization systems. Authorization is usually declared with policies that are enforced by an access control module implementation. The standards for policy definition, such as XACML [9], depend on the underlying data model, and the policies are usually defined separately for each of the sub-systems. The separate authorization definition is mainly due to the lack of integration in multi-domain scenarios. There are multiple solutions for authentication in distributed environments, such as single-sign-on services [2], WebID [19,23] and OAuth [11], and there are frameworks that enable their integration and combination [18].
© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2021
Published by Springer Nature Switzerland AG 2021. All Rights Reserved
I. M. Pires et al. (Eds.): GOODTECHS 2021, LNICST 401, pp. 105–120, 2021.
https://doi.org/10.1007/978-3-030-91421-9_9
The semantic web [3] initiatives have provided solutions for the problem of different data representations in the various systems and have defined standards for bridging this gap. The linked data initiative [4] provides means for linking the same concepts across their different representations in the various systems. However, even though these technologies solve most of the integration issues among various systems, there is no clear authorization solution that stands out and would bring them closer to enterprise and personal applications.
The environment analyzed in this paper can be described through Definition 1 and Definition 2, using a data-centric approach for authorization. The intent I is used to describe the set of facts presented by the requester or its agent application. It contains all the information necessary for the system to decide whether it will be further processed or not. It usually provides some evidence [14] about the requester, its environment and the intended action. It is also a common practice to include the software agent parameters as evidence, since it submits the intent on behalf of the requester. The system presented in this paper is not responsible for the construction of the intent I and only provides an interface for accepting it. However, it observes each intent change in order to determine the available resources for that state.
Definition 1. Protecting data (D) is a set of statements and resources that the authorization system should protect.

Definition 2. Intent (I) is a set of statements and resources that define the requester intent together with its environment.
Authorization systems usually operate in order to enforce a given set of client requirements. Clients state these requirements in natural language, and they must then be modeled in a way suitable for the system. An example requirement that an authorization system should model and enforce is shown in Requirement 1. It relates to both the data D and the intent I, using the relations among them. The requirements are designed around the assumption that there is an implicit intent present. The most general abstraction is that a requirement defines a permission or prohibition of a certain interaction with a subset of the data for a given intent. It may also be observed as a set of rules that constrain an interaction using the connections among the intent and the data.
Requirement 1. The professors can manage their active courses’ grades from
their faculty’s network.
This particular requirement permits a manage interaction with grades whose courses are connected with some professor. Which particular grades are covered by this requirement becomes known only when the intent is present. Only intents that contain a requester, an agent address and a manage action are applicable for this requirement. When a suitable intent is present, the requirement can be materialized and the corresponding grades can be determined.
The requirements are represented as policies in the authorization systems, and the policy language and formalism define their flexibility, understandability and maintainability. The flexibility describes the ability to transform the natural language requirements into policies. If some requirement cannot be modeled as a policy, then the system is not flexible enough for that kind of requirement. The time required for the administrators to learn and practice the policy language is referred to as understandability. The maintainability of the authorization system correlates with the human effort required to configure and maintain it. The transformation of the natural language requirements into policies, and the assurance of their correctness, occupies most of the maintenance effort.

In order to rank better with respect to these features, the policy language should be based on well adopted and widely spread technologies and standards. It should be as close as possible to natural language expressiveness, enabling the flexibility to define multiple complex relations among the data and the intent. For better maintainability, the policy language should provide a close one-to-one requirement-to-policy transformation, and the ability to test policies for correctness and consistency.
2 Related Work
The authorization policies are the formalized requirements that are enforced during the authorization.

The enforcement process is usually implemented differently in each system, but there are three main enforcement patterns that may be detected. Resource protection is the most commonly used pattern, where the system controls the access to each of the resources based on the authorization policies. This pattern is most widely used due to its simplicity [18]. However, its downside is that it is coarse grained. Another pattern creates an authorized data set for each user, implemented by constructing a graph composed only of the permitted data [6,8,16,20,21]. The trade-off for the implementation simplicity of this pattern is the extra time required for graph construction at each user login [13,16]. Query rewriting enforcement is the most complex pattern that can be used [20,22]. It adds authorization constructs to each query that is executed in the system, such that only the permitted resources can be obtained. This pattern uses complex query rewriting algorithms that must be extensively tested for correctness [13].
The generalized policy format is described in [13] as:

⟨Subject, Resource, AccessRight⟩   (1)
The Subject¹ defines for whom the policy will be used, i.e. the agent or the user that is interacting with the system. The Resource defines what is protected by the policy, and the Access Right defines whether a certain action is allowed or denied by the policy. In other words, the access right defines whether the policy permits or denies interaction with the resource on behalf of the requester in a given context (referred to as condition expressiveness in [14]).

¹ In this paper we will use the term requester instead of subject, since it better describes the actor that is interacting with the system.
The correctness of semantic authorization systems is poorly analyzed in most of the literature, with the exception of [13], where the query rewriting is tested for correctness against a temporal graph containing only the permitted resources. However, the possible errors in the policy design and definition processes are not considered in that work. We have addressed this issue in our previous work [20].
For simpler policy definition, many systems allow both permit and deny policies. However, in this case conflicts may arise [5,7,14]. There are various ways to resolve the conflicts, such as: default behavior [1,7,20], meta-policies [12], priorities [15,20], and detection and prevention [16,20,24].
The formalization in (1) is not sufficient, since it does not account for contextual evidence and the relation between the requester description and the underlying data. The access control models, on the other hand, are focused on separate elements of this policy format, and none of them model the complete authorization environment. Additionally, only a few systems provide the flexibility to connect the request's attributes with the protected data.
Policies have an essential role in data science applications to mitigate privacy and ethical concerns. For example, during feature engineering processes and when evaluating the information value and feature importance of different data [26], one also needs to consider whether that data is suitable to be used for that particular application in the first place. Therefore, different applications, such as churn prediction, may be affected by the various policies in place [25]. Likewise, the computational requirements may affect the scalability of the whole cloud-based solution [10].
3 Policy Format
Definition 3 provides a formalization of the requirements described previously. The activation function α is responsible for testing and applying the intent's data that is implicitly assumed in the requirement. It is executed whenever the intent is changed and filters out the policies that are not suitable for the current state. It also replaces the implicit variables with concrete values from the intent for the suitable policies. The function φ filters the data that is protected by the policy, and ε defines whether that data is allowed or denied.
Definition 3. A policy is a tuple ⟨α, φ, ε, ρ⟩ that defines the condition α(I ∪ D) that an intent I should satisfy in relation to the protected data D, so that the interaction ε ∈ {allow, deny} will be enforced on the result data R = φ(I ∪ D). The element ρ is a priority that is used for conflict resolution, α stands for the policy activation condition, and φ represents a partial data filtering function.
3.1 Policy Combination
The policy combination is important for two main reasons: (1) breaking down a complex authorization into simpler rules, and (2) conflict resolution. Definition 4 gives a formal definition of a conflict.
Definition 4. Two policies P₁ = ⟨α₁, φ₁, ε₁, ρ₁⟩ and P₂ = ⟨α₂, φ₂, ε₂, ρ₂⟩ are in conflict if:

– ε₁ ≠ ε₂,
– there exists an intent for which they are both active,
– Φ ≠ ∅, where Φ = Φ₁ ∩ Φ₂ and Φᵢ = φᵢ(D ∪ I), i ∈ [1, 2].
The policy combination is discussed in [7,17], where it is suggested that the policies² should be combined as ⋃φ⁺ \ ⋃φ⁻. Even though this method provides policy combination, it is not flexible for conflict resolution, since the deny policies always have a higher priority. The most common conflict resolution approaches include: (1) meta-policies, (2) priorities, and (3) harmonization. The harmonization approach (3) requires disjoint policies that should be provided by the administrator of the system. In (1), rules for conflict resolution are defined, which have a higher priority than the other policies, and this makes it a special case of (2).

The priority approach (2) is the most flexible, since it is similar to the way people resolve conflicts when they occur: it is much easier to define which rule is more important. This is leveraged with the ρ parameter in Definition 3. Definition 5 defines an operator for combining an ordered set of policies.

² In this description, the partial data filter function φ has superscript + or − if it is part of a policy with enforcement method ε⁺ or ε⁻, correspondingly.
Definition 5. Policy result combination is a non-commutative operator ⊕ such that:

⟨ε₁, φ₁⟩ ⊕ ⟨ε₂, φ₂⟩ = ⟨ε₁, φ₁ ∪ φ₂⟩, if ε₁ = ε₂
⟨ε₁, φ₁⟩ ⊕ ⟨ε₂, φ₂⟩ = ⟨ε₁, φ₁ \ φ₂⟩, if ε₁ ≠ ε₂
The operator ⊕ is able to produce different output for different policy priority assignments. For example, suppose there are policies that allow the data Φ₁ and Φ₂, and deny the Φ₃ part of the data, such that Φ = Φ₁ ∩ Φ₂ ∩ Φ₃ and Φ ≠ ∅. If the policies are activated for the same intent, then they are in conflict and one of their 6 possible orderings can be chosen. Here are three example orderings together with the protected data results:

ρ₁ < ρ₂ < ρ₃ ⇒ ⟨ε⁺, (Φ₁ ∪ Φ₂) \ Φ₃⟩
ρ₁ < ρ₃ < ρ₂ ⇒ ⟨ε⁺, (Φ₁ \ Φ₃) ∪ Φ₂⟩
ρ₃ < ρ₂ < ρ₁ ⇒ ⟨ε⁻, (Φ₃ \ Φ₂) \ Φ₁⟩
3.2 Policy Language
The policy language defines the level of flexibility, understandability and maintainability. In this paper, the policies are defined in RDF format, which enables a direct representation of the policy elements described in Definition 3. RDF and SPARQL are combined together in order to bridge the understandability
gap, and provide better flexibility and maintainability. Even though SPARQL
can be difficult to learn, it is chosen for the representation of the activation function (in the policy ontology it is p:intent_binding) and the partial result filtering function (p:protected_data), in order to provide flexibility for requirement modeling. Since the data is stored in a semantic format, its query language SPARQL provides greater flexibility for data selection using various patterns. This policy format also enables easier transformation of requirements into policies, with approximately one policy per requirement, which improves the system maintainability.
Example 1 shows the policy representation of Requirement 1, with a low priority of 1. The prefix p:³ denotes a lightweight policy ontology that describes the policy language of the system, while int:⁴ is a basic ontology that defines the most common intent classes and properties. Example 1 shows the classes int:Requester and int:Agent, which are used to define who the requester is and its agent definition, in this case containing the requesting IP address. This way the policy language can model requirements that are context-dependent, using these queries and the dynamic nature of the intent.
Example 1. _p a p:Policy;
p:intent_binding ’select ?s ?a ?n where {
?s a int:Requester. ?a a acl:Read.
?ag a int:Agent. ?ag int:address ?ip. ?ip int:network ?n}’;
p:protected_data ’construct { ?g ?p ?o } where {
?g a univ:Grade. ?g univ:for_course ?c.
?c univ:has_professor ?s. ?s univ:works_at ?f.
?f univ:has_network ?n. ?g ?p ?o}’;
p:enforce ’allow’;
p:priority 1.0^^xsd:double.
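For illustration, the following is a hypothetical intent graph that would activate this policy; the ex: resources are illustrative assumptions and not part of the paper's datasets, while the int: and acl: terms are the ones used in the p:intent_binding query above:

ex:prof1  a int:Requester .            # the authenticated requester
_:action  a acl:Read .                 # the intended (read) action
_:agent   a int:Agent ;                # the software agent submitting the intent
          int:address ex:ip1 .         # the requesting IP address
ex:ip1    int:network ex:facultyNet .  # the network the address belongs to

Executed against this graph, the p:intent_binding query returns the bindings ?s → ex:prof1, ?a → the read action, and ?n → ex:facultyNet, which activates the policy.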
4 Enforcement Architecture
The system presented in this paper relies on the policies defined in the format and language presented in Sect. 3.2. The enforcement process always starts with a requester's intent. The requester can choose to provide multiple pieces of evidence in the intent, among which can be its identity, additional attributes about the requester, the environment in which it operates, the action it intends to invoke, and some additional action parameters. The intent is represented as a separate semantic graph. The data contained in this graph is not controlled by the authorization system presented here; it only provides an interface for intent provisioning.
The authorization system intercepts the intent and builds a temporal graph
that corresponds to the data available for that intent. This process is illustrated
³ http://github.com/ristes/univ-datasets/ont/policy.owl.
⁴ http://github.com/ristes/univ-datasets/ont/intent.owl.
[Fig. 1. Temporal graph maintenance: a flowchart showing that, in a separate thread for each intent, the policies are activated and combined into a single query; if a temporal graph for that query already exists in the cache, the cached graph is returned, otherwise the temporal graph is constructed from the datasets by that query, cached, and then returned.]
in Fig. 1. It is invoked every time the intent is changed. In this temporal graph creation, only the policies protecting read operations are considered.

As Fig. 1 shows, the intent is used for policy activation. This process activates the policies using the query provided in the property p:intent_binding, which is defined in the policy language. This query is executed against the intent graph (from now on referred to as I), and the resulting variable bindings are used to rewrite the p:protected_data query. The rewriting process replaces the variables that are mentioned in both queries. If the p:intent_binding query does not return results, the policy is filtered out as inadequate.
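Continuing the hypothetical intent sketched after Example 1, the bindings ?s → ex:prof1 and ?n → ex:facultyNet would rewrite the p:protected_data query of Example 1 roughly as follows (the ex: resources remain illustrative assumptions):

construct { ?g ?p ?o } where {
  ?g a univ:Grade. ?g univ:for_course ?c.
  ?c univ:has_professor ex:prof1.      # ?s replaced by its binding from the intent
  ex:prof1 univ:works_at ?f.
  ?f univ:has_network ex:facultyNet.   # ?n replaced by its binding from the intent
  ?g ?p ?o }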
The active policies, with their rewritten partial data filtering queries, are then combined using the ⊕ operator described in Sect. 3.1. The only change made in the implementation is that if the lowest priority policy denies access, a new implicit allow policy for all data is inserted with an even lower priority, in order to obtain a temporal graph which always holds the permitted data. The policy combination is implemented using the Jena library⁵. As described in Sect. 3.2, each of these queries has the form construct T where OP, where T is the triple pattern that describes which data will be selected into the temporal graph, and OP defines the condition that should be met by these triples. The system presented here first rewrites all these queries so that each of them has T = ?s ?p ?o⁶. If the triple pattern of one of these queries contains a term⁷, then its OP is extended with FILTER (?var = term), where ?var is the variable used at that position in the triple. Next, their OP conditions are combined using the SPARQL UNION and MINUS operators, which correspond to the ∪ and \ operations from Definition 5. After this step, a combining query is obtained, which has the form construct ?s ?p ?o where CombinedOP.

⁵ http://jena.apache.org.
⁶ The variable names ?s, ?p and ?o are chosen for convenience, while in the implementation their names are randomly generated.
⁷ A term unifies the IRI and literal elements.
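As a sketch, combining the rewritten allow policy above with a hypothetical higher-priority deny policy for grades of archived courses (univ:is_archived is an assumed property, used only for illustration) would yield a combining query of roughly this shape:

construct { ?s ?p ?o } where {
  {  # OP of the allow policy, rewritten so that T = ?s ?p ?o
     ?s a univ:Grade. ?s univ:for_course ?c.
     ?c univ:has_professor ex:prof1. ex:prof1 univ:works_at ?f.
     ?f univ:has_network ex:facultyNet. ?s ?p ?o }
  MINUS
  {  # OP of the hypothetical deny policy
     ?s a univ:Grade. ?s univ:for_course ?c2.
     ?c2 univ:is_archived true. ?s ?p ?o }
}

A policy allowing further data would contribute another branch through UNION, in the order given by the policy priorities.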
Once this query is created, the system first checks whether it differs from the version built for the previous intent. If the current intent does not change the temporal graph construction query, the old temporal graph is returned; otherwise, the temporal graph is recreated, cached and then returned for further usage.

Once the temporal graph is created, every read operation is executed against it. The read operations include the operations for fetching resources or triples, listing their properties and executing SELECT, ASK and CONSTRUCT SPARQL queries. This way, all operations operate over the permitted data only.
4.1 Conflict Detection
[Fig. 2. Conflict detection: a flowchart showing that all variable mappings between the two policies' activation queries are generated; for each mapping, the variables in P2 are replaced to obtain P2', the query { P1 } FILTER EXISTS { P2' } is executed, grouped by the mapped variables; a non-empty result signals a conflict, while exhausting all mappings without results means there is no conflict.]
Definition 4 defines when two policies are in conflict. The temporal graph implementation enables conflicts to be detected automatically, by executing the partial data selection queries of each pair of policies with different enforcement methods. This is illustrated in Fig. 2, where at the beginning all variables from the policies' activation queries are mapped (all mappings are generated). Then, for each mapping, the variables in P2 are rewritten, and the partial data queries are combined together with a FILTER EXISTS operation. A GROUP BY element is added for all mapped variables, in order to find the results that are returned for that particular variable combination. When this query returns results, a conflict is detected. This way the administrator can be alerted about the conflict, and can resolve it using the policy priorities.
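As a sketch of this procedure, pairing the allow policy from Example 1 with the hypothetical deny policy used above (univ:is_archived remains an assumed, illustrative property), and with the requester variables of both policies mapped to ?r, the generated query could look as follows:

select ?r (count(*) as ?overlap) where {
  # partial data pattern of the allow policy P1
  ?g a univ:Grade. ?g univ:for_course ?c.
  ?c univ:has_professor ?r. ?g ?p ?o
  filter exists {
    # partial data pattern of the deny policy P2, with variables renamed
    ?g a univ:Grade. ?g univ:for_course ?c2.
    ?c2 univ:is_archived true. ?g ?p ?o }
} group by ?r

A non-empty result means that, for some requester binding ?r, both policies select the same triples, so the pair is in conflict.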
5 Evaluation
As previously described, the main overhead imposed by this authorization enforcement system is the temporal graph management described in Fig. 1. The policy activation and combination steps are carried out completely in memory, while the graph construction is carried out by the underlying storage engine, which may use multiple I/O operations with the disk. Because of this, the main focus of the evaluation was to determine the factors and their influence on the graph creation time.

Multiple datasets were generated and stored using the Jena TDB semantic storage. The dataset generators and evaluation code are based on the Jena API and can be easily extended to other storage engines. Multiple queries with different features were generated, and each of them was executed 40 times, 20 of which were warm-up.
24 datasets were generated for the evaluation process. The data was generated following a university ontology⁸. The dataset size can be expressed with the formula |DSᵢ| = |DSᵢ₋₁| + |generateGrades(100 · (i mod 6 + 1) · 10^⌊i/6⌋, i)|, where i ∈ [1..24] and DS₀ contains only one faculty, one study program and 4 technical staff users. The generateGrades function takes the number of grades that should be generated as its first argument, and the identifier of the course for which these grades will be generated as its second argument. It first generates the course, and then each grade is assigned to that course and to a newly generated student resource.
The fixed number of grades for each course is leveraged for query construction, so that it is possible to determine the correlation of the query processing time with the dataset size and the number of returned results. The template of these queries is shown in Example 2. Another variant, used to determine the filtering time, is evaluated by just adding FILTER (?v > 6), as shown after Example 2. The <courseIri> part differs in each evaluation query, and this enables obtaining a different number of results.
Example 2. CONSTRUCT { ?g ?p ?o } WHERE {
?g univ:for_course <courseIri>. ?g univ:grade_value ?v. ?g ?p ?o
}
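The filtered variant of this template then takes the following shape:

CONSTRUCT { ?g ?p ?o } WHERE {
  ?g univ:for_course <courseIri>. ?g univ:grade_value ?v.
  FILTER (?v > 6)
  ?g ?p ?o
}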
Since the temporal graph is created with a query composed of multiple SPARQL UNION and MINUS elements, a few query variants were made to explore the dependency on the number of these constructs. These queries combine the WHERE part shown in Example 2 for different courseIri replacements, as in the sketch below.
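A sketch of a two-branch UNION variant, where <courseIri1> and <courseIri2> are placeholders for two of the generated course IRIs (the MINUS variants are built analogously, subtracting one course's pattern from another):

CONSTRUCT { ?g ?p ?o } WHERE {
  { ?g univ:for_course <courseIri1>. ?g univ:grade_value ?v. ?g ?p ?o }
  UNION
  { ?g univ:for_course <courseIri2>. ?g univ:grade_value ?v. ?g ?p ?o }
}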
All the queries used for the evaluation, together with the results, are available in the generator project repository⁹.

⁸ http://github.com/ristes/univ-datasets/ont/univ.owl.
⁹ http://github.com/ristes/univ-datasets.
5.1 Evaluation Results
Fig. 3 shows part of the query results. Each line represents the dependency of the query processing time on the number of obtained results, for some of the generated datasets. All of the queries tested here are composed of a basic graph pattern without filters. The figure shows that for this type of queries, the processing time depends only on the number of selected results, and not on the dataset size. The queries with the filter appended to them also do not depend on the dataset size, but they take more time to return the same number of results.
Fig. 3. Temporal graph maintenance
The execution of the queries containing UNION and MINUS clauses revealed that the union queries do not depend on the dataset size or the number of union expressions, but only on the selected results. On the other hand, the MINUS operation increases the query processing time in proportion to the total returned plus the total removed results (the ones selected in the minus clauses).
In terms of performance, the enforcement approach with temporal authorization graphs is inferior when the graph is constructed once for every intended action [7,13,16]. This is why this paper investigates the graph construction performance, in order to define the limits within which this approach is suitable.

The suitability limits are calculated using the following approximate expectations as inputs:
Fig. 4. Different query types comparison
– |φ(p)|: the average expected number of protected triples
– icr: the expected intent change rate (changes/minute)
The results presented here show the correlations between the processing time and the query types, the resulting data size and the dataset size. Since in most of the cases the dataset size does not have a significant impact, the expected processing time function f(|φ(p)|) is interpolated from the presented results, averaged over all query types. The average is used since it is expected that multiple policies of various types will be activated, and the average best fits this diversity. An important note is that the function f does not calculate the actual execution time; it is intended to find the approximate expected processing time based on the expected query results.
Fig. 4 shows the time required for the different types of queries based on the results they select. This figure shows that the queries containing FILTER or MINUS expressions are slightly slower than the other ones. The reason is that the TDB engine has to process all suitable results that match the triple patterns, and then filter out part of those results, so these query variants require more processing time per returned result. Our analysis shows that the performance in general depends on the number of processed triples, not on the number of returned ones. This observation shows that it is enough to execute all basic graph patterns from all policies combined with UNION, and for this particular engine (Jena TDB) the time will depend on the number of returned results.
During the experiments, all the resulting Jena models were serialized as bytes, giving an average of 80 bytes per stored triple. Even though this value depends on the type of the data being stored in the dataset, it can be used to estimate the memory required for storing temporal graphs throughout the system.
Fig. 5. Linear approximation function
The linear approximation function f shown in (2) can be used to give a fast estimate of the query execution time, where results_size represents the number of expected results from all policy basic graph patterns combined. Even though this approximation gives lower values for result sizes above 1 M triples, as Fig. 5 shows, the time for this amount of data is more than 10 s, which in our opinion exceeds the reasonable limits for login time, and thus the approximation function is not fitted further.

f: time = 1.8 · 10⁻⁵ · results_size − 0.57   (2)
6 Discussion
The policy language presented in this paper provides the flexibility to capture most of the client requirements. The policy combination is designed to follow the requirement process, since it is easier to order the policies by their priority. The intent enables using all available evidence as semantic statements and their incorporation within the policies. The policies enable the combination of the intent's data with the protected data in the activation and filtering phases, which matches the flexibility of the natural language used in the requirements. This way the policy language can model requirements that are context-dependent, using these queries and the dynamic nature of the intent. The activation query, and the combination of its results with the partial data filtering, enables flexible selection of the requesters for which the policy should be applied. This is even more flexible than ABAC authorization systems, since more than just property values can be selected. The policy language defined in this paper can select the relevant requester using its contextual evidence, its properties and its connections with the rest of the data, which provides great flexibility in the requirement transformation process. The protection granularity enables the protection of resources, triples, quads and graphs, as well as actions and aggregate operations.
This range of protection elements makes this policy language unique, especially in terms of the aggregate operations, which are often requested as exceptions to the standard rules. Such an example, which infers the average grade for each student in the dataset, is shown in Example 3. This policy allows all professors access to the students' average grades, with a high priority, so it is likely to be included in the temporal graph.
Example 3. _p1 a p:Policy;
p:intent_binding ’select ?r ?a where {
?r a int:Requester. ?a a acl:Read
}’;
p:partial_data ’construct { ?st univ:has_average ?avg }
where {
?r a univ:User. ?c univ:has_professor ?r.
{ select (avg(?v) as ?avg) ?st
where {
?g univ:for_student ?st. ?g univ:grade_value ?v
} group by ?st
}}’;
p:enforce ’allow’;
p:priority 150.0^^xsd:double.
The policy language is chosen to be close to natural language, and using tools like a sentence dependency graph can lower the ambiguity in the process of policy design. The use of SPARQL for policy activation and partial data selection keeps the number of policies close to the number of separate requirements. The popularity of the technologies used in the policy language lowers the time required to learn it, and the large number of available resources makes the policy format more understandable.
The temporal graph can be used in the policy design phase for testing. The administrator can use simulated intents to test the correctness of the policies. This provides early policy inconsistency detection. Also, when the temporal graph is used, there is no way of inferring hidden data with multiple query probes. For instance, if the authorization system forbids access to the students' grades, but allows querying data with a filter by the grade value, the grades of the students may be inferred with multiple query executions. This cannot be done when temporal graphs are used.
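To illustrate this probing channel, consider a hypothetical sequence of probes of the following shape, where ex:st1 and ex:c1 are illustrative resources:

# Probe: does student ex:st1 have a grade above 8 in course ex:c1?
ASK WHERE {
  ?g univ:for_student ex:st1. ?g univ:for_course ex:c1.
  ?g univ:grade_value ?v.
  FILTER (?v > 8)
}

Varying the threshold in the filter performs a binary search over the hidden grade value. When all reads are executed against the temporal graph, the forbidden grade triples are simply absent, so every such probe fails.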
6.1 Use Case
To demonstrate how the results from this paper can be used in a real-life scenario, let us assume that a faculty has 100 professors who can see only the grades for the courses they have lectured. For simplicity, let every course have 100 students. In the worst case scenario, if all the professors have been employed for 30 years, holding 3 courses every semester, their temporal graph will contain in total 30 years × 2 semesters/year × 3 courses/semester × 100 students/course × 1 grade/student × 4 statements/grade = 72,000 statements for the grades, which
gives somewhat less than 7.2 MB for the grades. Even though many students overlap across the courses, let us assume that all the students are different; this gives an additional ~10 MB, or roughly around 20 MB per professor's temporal graph. This graph will contain in total around 200,000 statements.
Given the approximation formula (2), the expected time for the creation of each professor's temporal graph will be approximately 1.8 · 10⁻⁵ · 2 · 10⁵ − 0.57 = 3.03 seconds in the worst case. Since the grades are not changed frequently and the professor's environment does not influence the permitted data, there won't be many temporal graph re-creations, which makes the caching time acceptable, especially given the implicit security provided by the temporal graph that contains only the permitted data.
The memory requirement for the system will be 2 GB of RAM if all of the professors use the system in parallel, counting only the temporal graphs, which nowadays is the amount of memory that mobile phones have.
However, the temporal graphs are not always suitable. One such case is a scenario that contains temporal policies, which will induce many re-creations of the temporal graphs, so no matter how small they are, there will be a lot of overhead.
7 Conclusion
Temporal authorization graphs by their nature offer convenience through the implicit security they provide. Additionally, it is easy to test the policies in the design phase using this approach, leading to higher correctness of the authorization system. Even though their creation may introduce significant performance deterioration, the caching mechanism proposed in this paper overcomes this issue when there are no frequent changes in the system.

The policy formalism and language presented in this paper enable flexible requirement-to-policy transformation. The use of SPARQL in the policy language helps in this process, by allowing the selection of arbitrary pieces of data and following the natural language of the requirements. This way the maintenance of the system is improved, because each requirement is represented with one or a few policies, while the temporal graph simplifies their testing.

Finally, this paper shows that there are use cases for which the temporal authorization graphs introduce benefits, and the approximation formula (2) provides a tool that simplifies this assessment for the Jena TDB storage. During the evaluation process, several datasets were created, together with evaluation scenarios, which can be used to fit this formula for other semantic data storage engines.
References
1. Abel, F., De Coi, J.L., Henze, N., Koesling, A.W., Krause, D., Olmedilla, D.: Enabling advanced and context-dependent access control in RDF stores. In: Aberer, K., et al. (eds.) The Semantic Web, ASWC/ISWC 2007. LNCS, vol. 4825, pp. 1–14. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76298-0_1
2. Armando, A., Carbone, R., Compagna, L., Cuellar, J., Tobarra, L.: Formal analysis of SAML 2.0 web browser single sign-on: breaking the SAML-based single sign-on for Google apps. In: Proceedings of the 6th ACM Workshop on Formal Methods in Security Engineering, pp. 1–10. ACM (2008)
3. Berners-Lee, T., Hendler, J., Lassila, O., et al.: The semantic web. Sci. Am. 284(5),
28–37 (2001)
4. Bizer, C., Heath, T., Berners-Lee, T.: Linked data: the story so far. In: Semantic Services, Interoperability and Web Applications: Emerging Concepts, pp. 205–227 (2009)
5. Costabello, L., Villata, S., Rodriguez Rocha, O., Gandon, F.: Access control for HTTP operations on linked data. In: Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S. (eds.) The Semantic Web: Semantics and Big Data, ESWC 2013. LNCS, vol. 7882, pp. 185–199. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38288-8_13
6. Dietzold, S., Auer, S.: Access control on RDF triple stores from a semantic wiki
perspective. In: ESWC Workshop on Scripting for the Semantic Web. Citeseer
(2006)
7. Flouris, G., Fundulaki, I., Michou, M., Antoniou, G.: Controlling access to RDF graphs. In: Berre, A.J., Gómez-Pérez, A., Tutschku, K., Fensel, D. (eds.) Future Internet - FIS 2010. LNCS, vol. 6369, pp. 107–117. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15877-3_12
8. Franzoni, S., Mazzoleni, P., Valtolina, S., Bertino, E.: Towards a fine-grained access
control model and mechanisms for semantic databases. In: IEEE International
Conference on Web Services (ICWS 2007), pp. 993–1000. IEEE (2007)
9. Godik, S., Anderson, A., Parducci, B., Humenn, P., Vajjhala, S.: OASIS eXtensible Access Control Markup Language (XACML). Technical report, OASIS (2002)
10. Grzegorowski, M., Zdravevski, E., Janusz, A., Lameski, P., Apanowicz, C., Ślęzak, D.: Cost optimization for big data workloads based on dynamic scheduling and cluster-size tuning. Big Data Res. 25, 100203 (2021)
11. Hardt, D.: The OAuth 2.0 authorization framework (2012)
12. Kagal, L., Finin, T., Joshi, A.: A policy language for a pervasive computing environment. In: Proceedings of the IEEE 4th International Workshop on Policies for Distributed Systems and Networks (POLICY 2003), pp. 63–74. IEEE (2003)
13. Kirrane, S.: Linked data with access control. Ph.D. Thesis (2015)
14. Kirrane, S., Mileo, A., Decker, S.: Access control and the resource description framework: a survey. Semant. Web 8(2), 311–352 (2017)
15. Kolovski, V., Hendler, J., Parsia, B.: Analyzing web access control policies. In: Proceedings of the 16th International Conference on World Wide Web, pp. 677–686. ACM (2007)
16. Mühleisen, H., Kost, M., Freytag, J.-C.: SWRL-based access policies for linked data. In: Proceedings of SPOT 2010 (2010)
17. Oulmakhzoune, S., Cuppens-Boulahia, N., Cuppens, F., Morucci, S.: fQuery: SPARQL query rewriting to enforce data confidentiality. In: Foresti, S., Jajodia, S. (eds.) Data and Applications Security and Privacy XXIV, DBSec 2010. LNCS, vol. 6166, pp. 146–161. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13739-6_10
18. Scarioni, C.: Pro Spring Security. Apress, New York City (2013)
19. Sporny, M., Inkster, T., Story, H., Harbulot, B., Bachmann-Gmür, R.: WebID 1.0: web identification and discovery. Editor's draft, W3C (2011)
20. Stojanov, R., Gramatikov, S., Mishkovski, I., Trajanov, D.: Linked data authoriza-
tion platform. IEEE Access 6, 1189–1213 (2017)
21. Stojanov, R., Gramatikov, S., Popovski, O., Trajanov, D.: Semantic-driven secured
data access in distributed IoT systems. In: 2018 26th Telecommunications Forum
(TELFOR), pp. 420–425. IEEE (2018)
22. Stojanov, R., Jovanovik, M.: Authorization proxy for SPARQL endpoints. In: Trajanov, D., Bakeva, V. (eds.) ICT Innovations 2017. CCIS, vol. 778, pp. 205–218. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67597-8_20
23. Story, H., Harbulot, B., Jacobi, I., Jones, M.: FOAF+SSL: RESTful authentication for the social web. In: Proceedings of the First Workshop on Trust and Privacy on the Social and Semantic Web (SPOT2009) (2009)
24. Toninelli, A., Montanari, R., Kagal, L., Lassila, O.: Proteus: a semantic context-
aware adaptive policy model. In: Eighth IEEE International Workshop on Policies
for Distributed Systems and Networks (POLICY’07), pp. 129–140. IEEE (2007)
25. Zdravevski, E., Lameski, P., Apanowicz, C., Ślęzak, D.: From big data to business analytics: the case study of churn prediction. Appl. Soft Comput. 90, 106164 (2020)
26. Zdravevski, E., Lameski, P., Kulakov, A., Filiposka, S., Trajanov, D., Jakimovski,
B.: Parallel computation of information gain using Hadoop and MapReduce. In:
2015 Federated Conference on Computer Science and Information Systems (Fed-
CSIS), pp. 181–192 (2015)