A two-dimensional sharding model for access control and data
privilege management of Blockchain
Yibin Xu
University of Copenhagen
Copenhagen, Denmark
yx@di.ku.dk
Tijs Slaats
University of Copenhagen
Copenhagen, Denmark
slaats@di.ku.dk
Boris Düdder
University of Copenhagen
Copenhagen, Denmark
boris.d@di.ku.dk
ABSTRACT
This paper presents a method to manage private data stored on a blockchain. With our method, the blockchain's features of log transparency and tamper-resistance are maintained, even though the data is only available to authorized users.
The most relevant prior work randomly selects nodes to store the decryption key shares of a threshold cryptosystem for data that is not maintained in the system itself. Those nodes provide the decryption keys to the data requester via on-chain methods, which guarantees availability and distributes the incentives. If the system maintains the data and wants to achieve the same guarantees, it has to post the data to the blockchain, which makes the blockchain oversized and the approach impractical.
This paper shows that nodes in our method may provide data to the requester directly, without posting to the blockchain, while guaranteeing availability and a fair distribution of incentives. Furthermore, each data request incurs only a small amount of transaction data.
We achieve this by implementing a two-dimensional sharding model, where nodes are randomly assigned to shards. Data is arithmetically compressed and then split into pieces. Each data piece is stored by a node in a first-dimension shard. Without all the pieces, the data cannot be successfully decompressed.
Each node in the first-dimension shard is monitored by a second-dimension shard. We propose designs that empower the corresponding second-dimension shard to evaluate whether the first-dimension node has provided the correct data piece to the data requester. This waives the need to place the data into transactions witnessed by all. In case a first-dimension node fails, its data will be recovered by the corresponding second-dimension shard.
1 INTRODUCTION
The idea of storing data in a blockchain has been studied in recent years [8, 32, 34, 41, 44, 48, 49]. Despite the time consumed to reach consensus among different blockchain nodes, blockchain enables people to store data in an anonymous and untrustworthy environment. In the worst case, the data endures as long as a threshold number of nodes honestly record it. Compared to the data cloud, which achieves data distribution but requires trust between the service providers and the users, blockchain-based storage systems achieve data distribution and tamper-resistance without requiring trust in a central authority. Blockchain holds promise to power a new generation of networked data handling systems by addressing concerns over censorship and unethical data collection.
Despite these promises, blockchain only offers a flat data privilege model and, thus, does not support data privilege levels. In the decentralized environment, the blockchain's data is fully replicated and accessible to every node so that late-comers can verify and sync with the blockchain. Therefore, blockchain cannot store confidential data or maintain any on-chain data privilege. Currently, data privilege control is mainly achieved through encryption, where nodes sync and maintain encrypted data they do not understand. Off-chain channels [3, 18, 30] or identity key handshakes via smart contract are used to pass the decryption key to the querier when a query for data is posted. These methods incur drawbacks in three aspects:
(1) Storage size. Storing data directly as transactions either limits the data size or makes the blockchain oversized. It is infeasible to make every node in the blockchain store all user-uploaded data in a full replication scheme. Notably, it might even push the system toward a centralized architecture when anonymous and storage-constrained nodes are unable to participate due to a lack of storage space.
(2) Data privilege. The decryption key cannot be stored in the blockchain or be passed to the querier in plain text through smart contract execution, as that makes the key a piece of on-chain data that anyone can sync. It effectively makes the decryption key owner a "data centre" for the data. The probability of leakage increases with every node sharing the same decryption key. However, in many circumstances, the data should be handled by a trusted intermediary, e.g., a central key management provider, which is lacking in the decentralized environment. For example, consider a contact tracing app for Covid-19 that logs the users' locations to the blockchain. In non-infected circumstances, a user's location data should be available to the user only. However, in case of health risks, the authority should track and locate people using these data. Encryption methods are undesirable in this situation because (1) if the authority/the trusted third party does not own a decryption key to the data of the person in question, the user must cooperate whenever the authority requires data (i.e., stay online and active); or (2) if the authority/the trusted third party owns a decryption key to the data, the data handling lacks transparency, monitoring, and supervision.
(3) Attestation. Once the decryption keys are obtained by a data requester, not only can it share the plain text data with others, but the decryption keys also become evidence that authenticates the data as the decrypted secret stored in the blockchain. Thereby, once a secret is leaked, it cannot be denied.
Much research addresses the storage problem by only logging the hash of the data into the blockchain and storing the data itself separately. Users can determine the correctness of received data by matching it against the hash digest recorded in the blockchain. Due to the lack of centralized control, it is not easy to allocate storage among nodes. Many approaches [15, 32, 42–44, 48] use compensation that rewards nodes more heavily for storing data with fewer backups, thereby achieving balanced storage in a decentralized manner. However, only a few methods [43, 44] have theoretically proven the rate of data loss under their designs and fault models. In other words, most blockchain-based storage systems claim that losing data is unlikely because the anonymous nodes are enticed to store less-replicated data for profit, but they are not deterministic about how secure the systems are.
In terms of the data privilege problem, Calypso [20] employs a secret sharing scheme for blockchain. Data in Calypso is encrypted and the decryption keys are stored using the (T, m) secret sharing scheme [5, 35]. The keys are kept by a secret group of nodes. When a key is requested, at least T nodes post their encrypted shares to the blockchain with a PVSS scheme [37]. Only the requester can decrypt the encrypted shares, but the public can verify them. This method may empower a virtual trusted third party: the read and write privileges toward the encrypted data can be granted when at least T nodes out of the m nodes agree, and the m nodes themselves cannot make a decision or view/change the data individually. The "trust" may be granted if the probability of an adversary of a pre-defined population taking at least T nodes among the m nodes is below a threshold probability. Using the previous Covid-19 example, in Calypso, the decryption keys for data will only be released to the authority if the user has been declared positive for the disease. However, (1) Calypso only concerns keys and key management, not how the encrypted data is stored; thus, data availability cannot be guaranteed. (2) When the decryption keys are released to a data requester and then leaked, this not only reveals the secret data but also attests to the rumor, given that the encrypted data is stored in a decentralized manner but indicated in the blockchain and reachable by anyone.
The problem with attestation is unique to blockchain-based storage systems as it is caused by the tamper-resistance feature of blockchain. To the best of our knowledge, no existing work, on the one hand, ensures that the data requester gets the correct data and, on the other, avoids building a link between secret data maintained in the blockchain and its plain text.
This work introduces a new design for the blockchain to manage private data. Our design addresses both the storage and the privilege problems, is deterministic about the rate of data loss, and provides data privilege in a way that separates the data from its attestation. In addition, instead of simply providing a protocol for granting/revoking access to particular data, our design employs a process-aware engine that allows sophisticated processes for granting/revoking data privilege. This can be done via smart contract executions.
At a low level, our method combines blockchain sharding with data fragmentation. Compared to other data fragmentation methods [23], we do not require trust in the nodes hosting the fragments. We consider data recovery and confidentiality in the adversarial setting. This work theoretically proves that our method achieves balanced storage for data. When a threshold number of nodes (e.g., one-third of the nodes) suddenly fail, the probability of losing data is strictly below a threshold (e.g., 10⁻⁶) when the security assumptions hold. We also theoretically prove that it is hard for an adversary to access data illegally or to render data incomplete.
At a high level, we manage data privilege via smart contract execution. All nodes listen to a global blockchain, which can be an existing blockchain like Algorand or Ethereum. Our method grants access to a data querier entirely through smart contract execution on this blockchain and does not require the presence of the data owner or any trusted third party. The history of smart contract execution is recorded in the blockchain so that it is clear to the nodes which users have access rights to which data. Note that we only aim to grant data privilege to particular users via smart contract executions; the smart contract manages the data privilege. We do not aim to provide data privilege for the "input data" to a smart contract. In our context, the read or write privilege of particular data for a particular user (identified by the PKI key pair) is the "input data" for the smart contract.
1.1 Key techniques
We propose a two-dimensional sharding model of an n-node system, with the adversary taking at most f ≤ ⌊(n − 1)/3⌋ nodes. All n nodes participate in a global blockchain. For each data item, an m-row, s-column table is constructed, where each cell represents a node and n = m × s. Nodes are randomly placed in this table. The fundamental security of the system is based on the fact that nodes have no control over which cells they will be placed in, and the probability for the adversary nodes to be assigned to particular cells at the same time complies with a binomial distribution that depends on s and m.
The data is split into m pieces and these pieces are stored in the first column of nodes, one piece per node. We refer to these nodes as first-dimension nodes. Our security model ensures that the probability that all m first-dimension nodes are adversarial is minimal because of the randomization. The data is arithmetically compressed before being distributed to the nodes. Without getting all the pieces, one cannot successfully decode the data. This guarantees data confidentiality.
Each data piece is split into s − 1 sub-pieces, which are stored by the remaining nodes in each row (second-dimension nodes). The sub-pieces are erasure-code encoded, such that a data piece can be reconstructed from T = ⌊(s − 2)/2⌋ + 1 out of the s − 1 sub-pieces. Should a first-dimension node fail, the data it stored will be recovered using the sub-pieces from the remaining nodes in the same row. Our security model ensures that when at most one-third of the nodes (both first- and second-dimension) fail, the probability of data being incomplete is minimal. This guarantees data availability. Figure 1 shows a visualized example of the node table.
Figure 1: The node table for a two-dimensional sharding model of 24 nodes
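To make the randomization argument concrete, the following sketch (our own illustration, not code from the paper) estimates the probability that all m first-dimension cells of a table drawn from n nodes receive adversarial nodes, assuming f adversarial nodes and uniformly random placement:

```python
from math import comb

def p_all_first_dim_adversarial(n: int, f: int, m: int) -> float:
    """Probability that all m first-dimension cells receive adversarial
    nodes when n nodes (f of them adversarial) are placed uniformly at
    random into an m x s table (n = m * s)."""
    if f < m:
        return 0.0
    # The m nodes landing in the first column form a uniform m-subset of
    # the n nodes, so the count of adversarial ones is hypergeometric.
    return comb(f, m) / comb(n, m)

# Example: 1,000 nodes, one-third adversarial, data split into 10 pieces.
n, m = 1000, 10
f = (n - 1) // 3
p = p_all_first_dim_adversarial(n, f, m)
```

For these example parameters the probability is on the order of (f/n)^m, i.e., well below 10⁻⁴, illustrating why randomized placement makes full-column corruption unlikely.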
1.1.1 Key challenge 1: Service monitoring and fair compensation. The key challenge for maintaining off-chain data sharing in our system is to determine whether a first-dimension node actually provided a data piece to the data requester. In this context, both the node and the data requester may be adversarial and their claims cannot be trusted. To address this, we propose the following design:
(1) Each row of nodes maintains a synchronous permissioned blockchain. The data requester may sync these blockchains.
(2) When a data requester requests a data piece from a first-dimension node A, node A gives the data requester an almost complete data piece with 256 bits of random information removed, so that it takes 2^256 attempts for an attacker to recover the data without the missing information.
(3) When the data requester receives the data from node A, it requests the missing information from the second-dimension nodes of node A. The missing information is maintained in such a way that when the data requester gets data from T out of the s − 1 second-dimension nodes, the missing information is recovered. The second-dimension nodes reply to the data requester by posting the information, encrypted via a PVSS scheme [37, 46, 47], as transactions to the blockchain.
(4) In the normal case, T nodes then know that node A has provided incomplete data to the data requester; otherwise, there would be no reason for the data requester to ask for the missing information. Thus A and the first T second-dimension nodes that replied can be compensated. The transaction for compensation will be recorded on the blockchain of the corresponding row of nodes.
(5) In case A does not reply to the data requester within a time bound, the data requester may notify the blockchain of the corresponding second-dimension nodes. Nodes will then provide sub-pieces of the incomplete data to the data requester privately. When a node provides a sub-piece to the data requester, the node will also request the relevant sub-pieces from A as a liveness verification. A verdict regarding whether node A has failed will be recorded in a block and then appended to the blockchain of the corresponding second-dimension nodes. Node A will then be punished or still be compensated. The data requester then requests the missing information as in the normal case, so the second-dimension nodes will later be compensated as well.
Figure 2 shows the procedure for requesting a data piece from node A. Node A has no motivation to ignore the data requester, because that would trigger a liveness verification and cause the same communication burden as replying to the data requester. The second-dimension nodes have the motivation to provide sub-pieces to the data requester, as they will later be paid for providing the missing information to the data requester.
The data requester has no motivation to report a faulty failure of A when it has already received the data from A because: (1) the data requester would need to finish downloading at least T sub-pieces from the second-dimension nodes to trigger a liveness verification for A (a second-dimension node will not check on A if the data requester did not request the sub-piece from it); and (2) node A would then pass the verification of the corresponding blockchain and still receive compensation.
By this, we build a service monitor and a fair incentive distribution. It also eases the data burden on nodes because most transactions are logged to the blockchains of the corresponding second-dimension nodes, which are not synced by all.
The global blockchain only logs the node membership info, data info, and privilege info (e.g., which nodes are participating in the system, which data is stored in the system, what the data's node table is, who can access the data, etc.).
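As a rough illustration of the compensation logic described above (our own sketch; the names and the simple boolean outcome model are assumptions, not the paper's specification), the following function decides which parties are rewarded or punished for a single data request:

```python
from dataclasses import dataclass, field

@dataclass
class RequestOutcome:
    a_replied: bool            # did node A reply within the time bound?
    a_passed_liveness: bool    # did A answer the liveness check (if run)?
    repliers: list = field(default_factory=list)  # 2nd-dim nodes, in order

def settle(outcome: RequestOutcome, T: int) -> dict:
    """Sketch of the reward/punishment decision for one data request."""
    rewards, punished = set(), set()
    if outcome.a_replied:
        # Normal case: A and the first T second-dimension repliers are paid.
        rewards.add("A")
        rewards.update(outcome.repliers[:T])
    else:
        # A timed out: second-dimension nodes serve sub-pieces privately and
        # run a liveness verification against A.
        rewards.update(outcome.repliers)      # paid for providing sub-pieces
        rewards.update(outcome.repliers[:T])  # and for the missing info
        if outcome.a_passed_liveness:
            rewards.add("A")                  # A was alive after all
        else:
            punished.add("A")                 # verdict: A failed
    return {"rewards": rewards, "punished": punished}
```

The sketch also shows why a false failure report does not pay off: when A passes liveness verification, it still lands in the reward set.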
1.1.2 Key challenge 2: Provide data but not attestation. The goal is to empower the data requester to verify the genuineness of the data it received from our system, while it cannot prove to others that the data it received is the data stored in our system. To address this, we provide the following design:
(1) The data uploader splits the data D into m pieces, DP_1, ..., DP_m.
(2) The data uploader sends a data piece DP_i, i ∈ [1, m], with a random Nonce_i to the corresponding first-dimension node. hash(DP_i + Nonce_i) is logged to the global blockchain. Each first-dimension node can check the data it received against the hash indicated in the blockchain.
(3) The data uploader sends sub-pieces to the second-dimension nodes. Each sub-piece DP_{i,x}, x ∈ [1, s], of DP_i is attached with a random Nonce_{i,x}. The data uploader logs hash(DP_{i,x} + Nonce_{i,x}) to the global blockchain. Each second-dimension node can check the data it received against the hash indicated in the blockchain. Nonce_{i,x} is also sent to the first-dimension node that hosts DP_i, so that it can generate and send a sub-piece of data to the relevant second-dimension node when it is under liveness verification.
(4) Each data piece is attached with a checksum, such that Checksum_{DP_i} = hash(Checksum_{DP_{i−1}} + hash(DP_i)) for i ∈ [2, m], and Checksum_{DP_1} = hash(DP_1).
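The checksum chain in step (4) can be sketched as follows (a minimal illustration assuming SHA-256 as the hash and byte concatenation as +; the paper does not fix these choices):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def checksum_chain(pieces: list) -> list:
    """Checksum_1 = hash(DP_1); Checksum_i = hash(Checksum_{i-1} + hash(DP_i))."""
    checksums = [h(pieces[0])]
    for dp in pieces[1:]:
        checksums.append(h(checksums[-1] + h(dp)))
    return checksums

def verify(pieces: list, checksums: list) -> bool:
    """A requester recomputes the chain; any altered DP_i changes every
    checksum from position i onward."""
    return checksum_chain(pieces) == checksums

pieces = [b"piece-1", b"piece-2", b"piece-3"]
cs = checksum_chain(pieces)
assert verify(pieces, cs)
assert not verify([b"piece-1", b"tampered", b"piece-3"], cs)
```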
When the data requester gets all the data pieces and puts them together, it may verify each received DP_i, i ∈ [1, m], by generating the checksum and comparing it with the checksum attached. If the checksums are all correct, then the received data is authenticated. An altered DP_i changes all subsequent checksums, so unless all the data pieces are provided by adversarial nodes, unauthenticated data cannot have correct checksums. Our security
Figure 2: The procedure for a data requester to request a data piece from a first-dimension node A
model has ensured that the probability for the entire set of first-dimension nodes to be adversarial is minimal.
If the data requester shares the data it received with others, it cannot prove that the data is actually stored in the blockchain, because (1) it cannot prove that the data was received from the nodes in our system; (2) the hashes of the plain data pieces or sub-pieces are not recorded on the blockchain (only nonce-salted hashes are); and (3) Nonce_i and Nonce_{i,x} are not provided to the data requester.
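The non-attestation argument relies on the on-chain hashes being salted with nonces the requester never sees. A minimal sketch of this commitment (our own illustration, again assuming SHA-256 and byte concatenation):

```python
import hashlib
import secrets

def commit(piece: bytes) -> tuple:
    """Uploader logs hash(DP_i + Nonce_i) on-chain; the nonce stays with
    the hosting node and is never given to the data requester."""
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(piece + nonce).digest()
    return digest, nonce

on_chain, nonce = commit(b"data piece")
# A node holding the nonce can prove its copy matches the on-chain hash:
assert hashlib.sha256(b"data piece" + nonce).digest() == on_chain
# A requester holding only the plain piece cannot reproduce the digest,
# so the on-chain record does not attest to the plain text it shares.
assert hashlib.sha256(b"data piece").digest() != on_chain
```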
1.2 Contribution
To summarize, we propose a two-dimensional sharding model for the blockchain enabling:
• On-chain and process-aware data privilege and attestation control. We achieve data privilege control via a method that combines data fragmentation and blockchain sharding. On top of that, we run a process model to describe and govern sophisticated processes for data privilege. Compared to encryption methods, we do not require the decryption key holder's presence to grant data access; the rights can be granted by smart contract execution. Compared to other fragmentation methods, we do not require trust in the nodes hosting the fragments to provide confidentiality or to ensure data completeness and availability. We assume a number of them are adversaries who try to steal the data or render it incomplete. The probability of data being illegally acquired or rendered incomplete by the adversary is strictly below a threshold. Compared to secret-sharing systems, we do not need to consider data availability separately, and we allow large-size data instead of only storing decryption keys. Compared to all existing approaches for blockchain, we separate the data attestation from the data, which improves the privacy of the system.
• Balanced and loss-resilient data storage. Using the two-dimensional sharding model, we guarantee that every piece of user-uploaded data has an equal number of backups. We theoretically prove that it is difficult for data loss to occur even if a threshold number of nodes go offline simultaneously.
1.3 Organization
The rest of the paper is organized as follows. Section 2 discusses the use cases and related work. We introduce some preliminary knowledge in Section 3. Section 4 defines our Data Privilege and Authority Model (DPAM), where terms, architectures, and fault models are introduced. The two-dimensional sharding model is defined in Section 5. We formulate the security of our system in Section 6. The paper is concluded in Section 7.
2 USE CASES AND RELATED WORK
2.1 Use case
We give example applications to illustrate the benefits of storing data using a method that combines blockchain sharding and data fragmentation to support large data sizes, access control, data privilege, and attestation.
Example 1. Dynamic querying and combining private data. A government agent wants to find the addresses of all households with at least one member who has contracted Covid-19 but does not have health insurance. This task involves the government with household registration data, hospitals with electronic medical records of patients, and insurance firms. All data is stored in our system and kept confidential from the public. We assume the data is key-value data stored in tables. The government agent submits a smart contract and requests the data, which only contains the columns of the person's name and address from the household database. It collects only the column of the names of the people who are registered with Covid-19 from the medical database. It collects only the column of people who bought medical insurance from the insurance database. By executing this smart contract in the blockchain and then being granted the relevant privileges, it leaves the data access logs, in terms of the time, the scope of data, and the reason for access, permanently in the blockchain. As only nodes that have the access privilege can download the data pieces from the relevant nodes, the public monitors the updates to the data, yet the data remains hidden from the public.
Example 2. Levels of data privilege. (A) An e-commerce company wants to host its website on the blockchain. This website is dynamic and contains information varying from user to user. The company can update the dynamic information (e.g., the shopping records) privately and restrict the privilege for user information to the user only, via smart contract executions. These are not achievable via IPFS [4]; even though IPFS claims to power the distributed web, it can only host static content. (B) Many process management models and languages [12, 22, 28, 36] require a blockchain to establish trust among the participating parties. However, many processes are private and cannot be disclosed to the public, although a public witness is required. Combining our architecture with languages like DCR graphs [13], we can build process authority and privilege dynamically among participants and provide an interface for supervision in case of conflicts or arbitration.
Example 3. Separating data attestation and usage. An HR officer wants to verify the credentials of a candidate. He/she needs to obtain consent from the candidate and contact the relevant organizations to verify the information. Such procedures are repeated for every job-seeking event. In our model, the relevant organization can log the credential into the blockchain, and then the candidate can grant the data access privilege to HR by himself/herself. Compared to an asymmetric-encryption-based method, where the relevant organizations sign credentials using a private key and the candidates hold the public key, granting an access privilege has no permanent effects and can be revoked afterward. Even if HR shares the information obtained with others, they must contact the candidate to validate it further, because there is no cryptographic signature attached to the information. The information itself is not anti-counterfeiting, but the way it is maintained in our model is. Similar use cases include "vaccine passport verification", "credit check", and "medical report disclosure and sharing".
In all of the examples above, our system does not suffer from the problem of blockchain oversizing because our design splits the data and backs it up in different nodes; only the index info is stored in the blockchain. As mentioned in the examples, we support sophisticated processes for granting/revoking data privilege, and we provide data in a way that guarantees genuineness but does not attach attestation to the data.
2.2 Smart Vault/Hyperledger Fabric Channels
Suppose a sender stores a confidential document in its Smart Vault [23]. This document is added to a permission-enhanced IPFS storage system [2, 4] where the data is split into fragments and restricted to certain nodes only. An Access Control List (ACL) smart contract is deployed on the blockchain. A publicly shareable document reference, a metadata file, and its hash are generated and added to the blockchain. The IPFS Merkle Directed Acyclic Graphs (DAGs) [6] are linked to the ACL smart contract. When the sender wants to share the document with the receiver, it first grants the receiver read access using the ACL smart contract. The sender then shares the document reference (the hash of the metadata) with the receiver. After that, the receiver can fetch and open the document. The receiver's Smart Vault performs a read request to all peers, where the IPFS protocol resolves the peer holding the document. The sender's Smart Vault nodes perform an authorization check against the confidential document's ACL before distributing the content. After the above procedure, the document is stored in two decentralized Smart Vaults, protected with the same ACL.
Figure 3 shows an outline of the operations.
Figure 3: Smart Vault operation outline
Hyperledger Fabric uses a similar approach. However, both models suffer from drawbacks in both confidentiality and data completeness. Though the data is split into fragments, the design is not prepared for the situation where the nodes hosting the data are adversaries. It is then unclear how easy it is for the adversary to obtain the documents illegally by controlling all nodes storing the fragments. Besides, it cannot deal with data loss if a node fails or goes offline when no other authorized node stores a replica of the data.
2.3 Threshold Secret Sharing
A (T, m)-secret-sharing system [5, 35] allows a user to share a secret with m trustees, allowing any subset of T trustees to work together to reconstruct the secret, while any smaller group learns nothing about the secret. (T, m) secret sharing can thus tolerate up to T − 1 malevolent participants, m − T offline participants, or both. Any party knowing the public keys of the encryption methods E_i may non-interactively verify whether E_i(s_i) is a correct encryption of the share s_i for the participant P_i without knowing s_i. Calypso [20] employs this secret sharing scheme in the blockchain. Data in Calypso is encrypted and the decryption keys are stored using the (T, m) secret sharing scheme. The keys are kept by a secret group of nodes. When a key request is made, at least T nodes post their encrypted shares to the blockchain. Only the requester can decrypt the encrypted shares, but the public can verify them. However, (1) it only concerns the keys, not how the encrypted data is stored, so data availability cannot be guaranteed; (2) it cannot support the use case of Example 3; and (3) it is not designed to support sophisticated processes for granting/revoking access to data.
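The (T, m) scheme described above can be illustrated with a minimal Shamir-style sketch over a prime field (our own toy example with a small demo prime; Calypso and real deployments use large groups and add public verifiability on top):

```python
import random

P = 2**61 - 1  # a small Mersenne prime serving as the demo field modulus

def share(secret: int, T: int, m: int):
    """Split `secret` into m shares; any T of them reconstruct it."""
    # Random polynomial of degree T-1 with constant term = secret.
    coeffs = [secret % P] + [random.randrange(P) for _ in range(T - 1)]
    return [(x, sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P)
            for x in range(1, m + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = share(123456789, T=3, m=5)
assert reconstruct(shares[:3]) == 123456789   # any 3 shares suffice
assert reconstruct(shares[2:5]) == 123456789
```

Fewer than T shares leave the constant term information-theoretically undetermined, which is the property Calypso relies on for its key shares.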
2.4 Process modelling
Previous approaches towards process-aware blockchains [
19
,
22
,
27
,
39
] have focused on providing translations from process models
into existing smart contract languages, particularly, by translating
ow-based Business Process Model and Notation (BPMN) diagrams
to Solidity. [
25
] proposed to reduce the cost of redeployment of the
smart contracts when changing the process model by a specially
designed interpreter of BPMN process models based on dynamic
data structures. [
26
] presented a model for the dynamic binding of
actors to roles in collaborative processes and an associated binding
policy specication language. Contrary to these works, we focus
here on modelling data access control and employ a declarative
constraint-based formalism instead of an imperative ow-based
notation.
Inspired by institutional grammars, [10] proposed a high-level declarative language that focuses on business contracts; however, no implementation is provided. A high-level vision of the business artifact paradigm toward modelling business processes on a distributed ledger was given in [16]. [38] proposed a lean architecture enabling lightweight and full-featured on-chain implementations of a decentralised process execution system. In [28], the authors mapped DCR graphs to Solidity contracts, and in [45], the authors mapped DCR graphs to TEAL contracts. None of these works focused on the data-access control perspective.

There has also been significant work on using workflows to model security aspects, including access-control and privacy concerns, most commonly as extensions to the BPMN standard, e.g. [21, 33]. In [17], access control for inter-organizational workflows was considered, but not in a blockchain environment with untrusted partners. The use of declarative DCR Graphs for modelling data access control is a fully novel contribution of the current work.
3 PRELIMINARY
3.1 Publicly Verifiable Secret Sharing Scheme
A dealer may share a secret with a committee of shareholders using a publicly verifiable secret-sharing scheme (PVSS), which enables everyone, not just the shareholders, to confirm that the secret was correctly disclosed and that it is recoverable. The sender can broadcast a single message to the whole universe via a non-interactive PVSS mechanism, from which the shareholders can obtain their shares and everyone else can verify that the sharing was carried out correctly. In addition, a proactive PVSS scheme [29, 31] enables the secret to be passed from one committee of shareholders to the next, ensuring that (a) the secret is kept secret from an adversary that only controls a minority in each committee, and (b) everyone can verify that the secret is properly passed between subsequent committees. These protocols, which are essential to distributed cryptography, have been thoroughly investigated in the literature [11, 46, 47]. They have also lately been suggested as tools for enabling secure processing on massively distributed networks like blockchains.
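PVSS builds on plain threshold secret sharing. The following is a minimal sketch of the underlying $(T, m)$ mechanism (Shamir's scheme over a prime field); the public-verifiability layer of encrypted shares and non-interactive proofs is omitted, and all names and the field size are illustrative assumptions:

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; the field for polynomial arithmetic

def share_secret(secret, threshold, num_shares, rng=random.Random(0)):
    """Split `secret` into num_shares points on a random polynomial of
    degree threshold-1; any `threshold` shares recover it (Shamir)."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, num_shares + 1)]

def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num = den = 1
        for k, (xk, _) in enumerate(shares):
            if k != j:
                num = num * (-xk) % PRIME
                den = den * (xj - xk) % PRIME
        # den is invertible since PRIME is prime and all x-coordinates differ
        secret = (secret + yj * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

A full PVSS additionally encrypts each share under the shareholder's public key and attaches a proof that the encrypted shares lie on a single degree-(T-1) polynomial.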
3.2 Dynamic Condition Response graph
DCR Graphs are a formal declarative notation for expressing processes [14]. The primary emphasis of the basic notation is control flow, i.e., the permitted ordering of operations. The executable components, referred to as events, that make up a DCR Graph's nodes can be given labels via labelling functions. Numerous events can share a single label thanks to the labelling function, which enables process activities to take place multiple times in a graph under various constraints based on their context.

A marking that indicates for each event whether it (1) has been executed in the past, (2) is pending, and (3) is currently included in the execution, describes the state of a graph. The edges of the graph, i.e., the relations between events, specify how the graph evolves. An event might impose constraints on or affect another event through its relations. There are two constraining relations: the milestone (→◇) captures that an event cannot be executed while another event is pending, while the condition (→•) captures that an event cannot be executed unless another event has been executed some time (not necessarily immediately) before it. There are three types of effect relations: exclusion (→%), inclusion (→+), and response (•→). The exclusion relation removes an event from the process and disables any constraints it might have on other events. The inclusion relation includes the removed event back into the process and enables any constraints it may have had. Before a process may be said to be in an accepting state, pending events, which are obligations, must be satisfied by either being executed or being excluded.
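The marking semantics described above can be sketched as a minimal interpreter (this is an illustration, not the authors' implementation; effect relations are applied in the order response, exclusion, inclusion):

```python
from dataclasses import dataclass, field

@dataclass
class DCRGraph:
    """Minimal DCR Graph: events, a marking (executed/included/pending),
    and the five relation types as sets of (source, target) pairs."""
    events: set
    conditions: set = field(default_factory=set)   # (a, b): a ->. b
    milestones: set = field(default_factory=set)   # (a, b): a -><> b
    responses: set = field(default_factory=set)    # (a, b): a *-> b
    excludes: set = field(default_factory=set)     # (a, b): a ->% b
    includes: set = field(default_factory=set)     # (a, b): a ->+ b
    executed: set = field(default_factory=set)
    included: set = field(default_factory=set)
    pending: set = field(default_factory=set)

    def enabled(self, e):
        if e not in self.included:
            return False
        # every included condition-source must already have executed
        if any(a in self.included and a not in self.executed
               for (a, b) in self.conditions if b == e):
            return False
        # no included milestone-source may be pending
        if any(a in self.included and a in self.pending
               for (a, b) in self.milestones if b == e):
            return False
        return True

    def execute(self, e):
        assert self.enabled(e), f"{e} is not enabled"
        self.executed.add(e)
        self.pending.discard(e)
        for (a, b) in self.responses:
            if a == e:
                self.pending.add(b)       # b becomes an obligation
        for (a, b) in self.excludes:
            if a == e:
                self.included.discard(b)  # b is removed from the process
        for (a, b) in self.includes:
            if a == e:
                self.included.add(b)      # b is re-included

    def accepting(self):
        # no included event may still be pending
        return not (self.pending & self.included)
```

For instance, with a condition a →• b and a response a •→ b, event b is initially blocked, becomes pending after a executes, and executing b brings the graph back to an accepting state.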
[Figure 4: DCR Graph of a mortgage application process adapted from [1]. Events shown: Collect documents (Caseworker), Irregular neighbourhood (IT), On-site appraisal (Mobile consultant), Statistical appraisal (Caseworker), Assess loan application (Caseworker, initially pending), and Submit budget (Customer); the graph is not accepting in its initial marking.]
A simplied version of a loan application procedure used in in-
dustry is depicted in Fig. 4, which was modelled as a DCR Graph in
[
9
]. Along with the name of the action, the labels of the events also
list the roles that are authorized to carry out the activity. The red
wording and exclamation point, which indicate that the event is a
initial response, indicate that the loan application should always be
evaluated by the case worker. To accomplish this, the case worker
must rst gather the necessary paperwork, and the client must sub-
mit a budget as evidenced by the conditional relationships between
6
A two-dimensional sharding model for access control and data privilege management of Blockchain
these two events. Additionally, a statistical or on-site appraisal has
to be completed. Both are prerequisites for evaluating a loan appli-
cation, but they also mutually exclude one another, so if one occurs,
the other is likewise nullied and won’t prevent other events from
occurring. The budget submission also has a relationship to the
assessment’s response, thus any time a customer submits a (new)
budget, the loan application must be evaluated once more. Finally,
IT may conclude that a site appraisal is necessary for the property’s
neighborhood. Then it eliminates the event from the statistical
appraisal and includes the event from the on-site appraisal, re-
enabling on-site as a requirement for the assessment even though
it had previously been excluded by a statistical appraisal.
Unusual neighborhoods and loan applications are initially blocked
as having requirements that need to be satised. Other occurrences
are permitted because they are present and do not have any mile-
stones or blocking criteria. Since the evaluate loan application is
present and an answer is still pending, the graph is not receiving
new data.
Executing Collect documents and Submit budget will mark these
events as executed. We can execute Assess loan application, which
will remove the waiting response and put the graph into an accept-
able state, after doing a statistical appraisal, which will mark itself
as executed and exclude On-site appraisal.
Keep in mind that even if the customer provides updated in-
formation that necessitates a repeating execution of Assess loan
application, we can still execute Submit budget.
4 MODELING DPAM
In our Data Privilege and Authority Model (DPAM), we assume a pre-determined node membership table of $n$ nodes, as in any permissioned blockchain. There is a vanilla blockchain (the global blockchain) using a synchronous consensus protocol in which all nodes participate. Each time a user adds its data to the system, a two-dimensional sharding model is constructed, where the $n$ nodes are assigned spots in the sharding model. The scope of data access and update rights for data stored in the two-dimensional sharding model, as well as the settings for the sharding model, can be dynamically enforced solely by smart contract executions in the global blockchain and without any superior nodes for the data. All data has an equal number of backups and can resist a threshold number of nodes going offline suddenly.
Terms:
- On-chain data. On-chain data is the data stored in our DPAM model. The update and delete operations for the data are logged in the blockchain.
- Data ID. Every on-chain data item has a unique id called the Data ID.
- User-uploaded data. User-uploaded data is on-chain data that has an Authorized-Access-List (AAL) and an Index-List (IL).
- Atomic data. Atomic data is user-uploaded data with an empty IL. Atomic data is arithmetically compressed¹ so that without the complete text, one cannot decompress and read the data. Atomic data is split into pieces and stored in a first-dimension shard of the two-dimensional sharding model. Each piece is further split and backed up in the second-dimension shards. Atomic data has a Fragment-Index-List (FIL) to locate the pieces.
- Authorized-Access-List (AAL). The AAL of user-uploaded data records the users who have access/update rights to this data.
- Index-List (IL). The Index-List is a data-structure guide that helps users locate sub-data of on-chain data. For example, if the data is a key-value table, its IL indicates the table columns' storage locations (Data IDs).
- Fragment-Index-List (FIL). The Fragment-Index-List indicates which nodes store which parts of Atomic data. Nodes can download the data via the nodes indicated in the list.
- User. A user of data is identified by its public identity key. A user can update/read a particular piece of user-uploaded data if it has the relevant privilege.

¹ Unlike Huffman coding, which recovers quickly from errors, a single inverted bit in arithmetically compressed text corrupts all subsequent output [7].
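One possible in-memory representation of these terms is sketched below; all field names and types are illustrative assumptions, not the paper's data layout:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class OnChainData:
    data_id: str                                      # unique Data ID
    aal: List[str] = field(default_factory=list)      # identity keys with access/update rights
    il: Dict[str, str] = field(default_factory=dict)  # sub-data name -> Data ID

    def is_atomic(self) -> bool:
        # Atomic data is exactly the user-uploaded data with an empty IL
        return not self.il

@dataclass
class AtomicData(OnChainData):
    # FIL: piece index -> identifier of the first-dimension node storing it
    fil: Dict[int, str] = field(default_factory=dict)
```

A table would be an `OnChainData` whose IL points to column Data IDs, while each compressed cell would be an `AtomicData` located via its FIL.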
Figure 5 gives an example illustration of the terms. A table of four columns is uploaded to the DPAM, and the uploader has decided to make the data of each cell Atomic data. As can be seen from the figure, the value "abc" is arithmetically compressed and split into four pieces (assuming $m = 4$) by the uploader. Users may view/change data via links in the IL of the table and then via the ILs of the sub-tables until they find the Atomic data. It is up to the uploader to decide how its data is structured. It may make the whole table Atomic data, in which case a user can only get the full data privilege for the table instead of for some cells. Depending on the preference of the data uploader, the FIL, AAL, and IL of data can either be stored in the blockchain (so that anyone can access them directly) or in the sharding model like the Atomic data themselves, with enough links in the blockchain for users to locate them. When a user accesses the Atomic data, the relevant nodes hosting the data must check the user's identity against the AAL of the data before allowing the user to access/update the data.
The DPAM model provides the following four properties:
- Safety. The user-uploaded data stored in the model can only be accessed by authorized users. Illegal access to data can occur only with below-threshold probability.
- Completeness. The user-uploaded data will always be available, either through data pieces or sub-pieces in shards, assuming the adversary population is below the threshold number (e.g., one-third of the overall population).
- Witness. All updates of data leave records in the blockchain.
- Assembly. The user-uploaded data can be the assembly of sub-data. The sub-data also has the safety, completeness, witness, and assembly properties. The sub-data derive their authorized-access lists from the data. Note that Atomic data does not have sub-data.
Governance of the Authorized-Access-List: We aim to dynamically alter the data privileges of users following pre-determined protocols. Business process management languages like the DCR graph [13] can be used to update the data access authorization dynamically. Previous research [28, 45] studied methods to update the DCR graph via smart contract executions.
[Figure 5: An illustration of the terms. Via the IL of the table (ID 0x7ef), the user can access the information for a column (e.g., the Name column, ID 0x1e), and then via the IL of the column, it can locate the Atomic data "abc" (ID 0x3ed, empty IL), which is arithmetically compressed (frequency table {"c": 1, "a": 2, "b": 7}; encoded value 0.1729999999999999989175325511) and stored in the two-dimensional sharding model: the FIL places piece i in node i for i = 1..4, sub-pieces of each piece are backed up in the second-dimension shards, and an AAL at each level records who can view/change the data.]
Log of access/update: An ordinary blockchain is maintained, and changes of access/update rights are logged in the blockchain. In case of a security breach, accountability can be established by tracing transactions in the blockchain to locate the users who had the right to access particular data in a specific time frame.

Fault Model: We assume the attacker will obtain and leak the data to others when its nodes occupy positions from which the data can be reconstructed, and that it will render the data incomplete when its nodes occupy positions from which the data can be made incomplete.
Security assumption. We assume the honest nodes will stay online and store the data in the system, while the adversarial nodes will execute the attacks outlined in the fault model above and may go offline or refuse to reply suddenly. The maximum adversary population is a predefined parameter set before the system starts.

Other assumptions. Nodes have private-public identity key pairs. We assume pairwise synchronous communication channels in the system.
5 DATA PRIVILEGE AND AUTHORITY MODEL
5.1 Foundational functions
The global blockchain maintains $L$, a list of nodes represented by their public identity addresses. There are $n$ nodes in the list in total, of which up to $f \le \lfloor (n-1)/3 \rfloor$ are adversarial.
5.1.1 Uploading/modifying data. A data uploader uploads its data to the system as follows:
(1) It generates a transaction that states the $AAL$ and $IL$ for the data. A random number $R$ is attached to the transaction.
(2) It generates $hash(DP_{ix} + Nonce_{ix})$ and $hash(DP_i + Nonce_i)$ as stated in Section 1, where $DP_{ix}$ refers to the $x$-th sub-piece of the $i$-th data piece. These hashes are attached to the transaction.
(3) It then sends the transaction to the global blockchain.
(4) If it is modifying existing data, the identity of the data uploader is compared with the information indicated in the original $AAL$ to determine if the request is valid.
(5) Nodes generate the $FIL$ for the data by hashing the public identity keys in $L$ with $R$, and then rank the nodes alphabetically according to the hashes into a list. The first $m$ nodes are the first-dimension nodes for the data. The node in position $k$ is a second-dimension node for the first-dimension node $i$ where $k \bmod m + 1 = i$.
(6) Nodes request data pieces or sub-pieces from the data uploader according to the $FIL$. They post an acknowledgment to the corresponding second-dimension blockchain after downloading the data pieces/sub-pieces. These steps should finish within a preset number of block intervals.
(7) If, for each and every data piece, at least $T$ second-dimension nodes have acknowledged downloading the sub-pieces, then the data upload is complete. Otherwise, the upload has failed.
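Step (5)'s deterministic shard assignment can be sketched as follows. This assumes SHA-256 as the hash, byte-string identity keys, and zero-based positions after the first $m$ nodes; these details are illustrative assumptions:

```python
import hashlib

def assign_shards(public_keys, R, m):
    """Deterministically assign nodes to the two-dimensional sharding model
    for one data item: hash each identity key together with the transaction's
    random number R, sort nodes by the hex digest, take the first m nodes as
    first-dimension nodes, and map every later node at position k to the
    first-dimension node i = k mod m + 1."""
    ranked = sorted(
        public_keys,
        key=lambda pk: hashlib.sha256(pk + R).hexdigest(),
    )
    first_dimension = ranked[:m]          # nodes storing the m data pieces
    second_dimension = {i: [] for i in range(1, m + 1)}
    for k, node in enumerate(ranked[m:], start=m):
        # the node at position k backs up / monitors first-dimension node k mod m + 1
        second_dimension[k % m + 1].append(node)
    return first_dimension, second_dimension
```

Because every node can recompute the same ranking from $L$ and $R$, no coordinator is needed to agree on the FIL.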
5.1.2 Downloading data. A data requester requests data from the system as follows:
(1) It generates a transaction stating the data ID and sends the transaction to the global blockchain.
(2) The nodes run the data requester's identity through the $AAL$ of the data. Afterward, the data requester can reach out to the nodes and follow the protocol outlined in Section 1. The service of monitoring the second-dimension nodes is provided by the design of posting PVSS-encoded pieces of the missing data to the blockchain. The missing data as a whole has a size of 256 bits, so the generated PVSS shares are small. They are only posted to the blockchain of the nodes in the same row, preventing a data burden on individual nodes.
5.1.3 Incentives. Each row of nodes in the node table maintains a blockchain for itself. Compensation transactions are posted to this blockchain. The balance of the compensation can be transferred to the global blockchain using cross-shard protocols [24, 40]. To improve stability, the system may require each first-dimension node to post a security deposit for the data and confiscate it when the node goes offline.
5.2 Data privilege management via DCR graph
Our previous research [28, 45] studied designs that represent the markings of a DCR graph as smart contracts on the Ethereum or Algorand blockchain. One submits a transaction to execute a state of the graph. The smart contract checks whether one has the privilege to execute the state and whether any constraints prevent the state from being executed. After these checks, the state is executed and the markings are updated accordingly.

In this paper, we have shown that the processes of uploading, modifying, and downloading data can be carried out in a decentralized manner, requiring only cooperation among nodes. Thus, we can model "upload", "update" and "download" of data as states in the DCR graph. When a data requester wants to interact with the data, an instance of a predetermined DCR graph is initiated. Each state is a block of smart contract code. Only when the corresponding states are executed is the data requester's identity key added to or removed from the corresponding $AAL$. We show the DCR graph for the COVID-19 example given in Section 1 in Figure 6. When a user uploads his/her location info, this enables a state that will be automatically executed by the system to remove the info when the data expires; this state itself is removed after execution. Only when a doctor declares the user infected with COVID-19 can the health authority download the location data of the user.

Through the DCR graph, data can only be acquired or updated according to the preset rules, so the data maintained in this way is managed transparently, provided that the security assumptions remain valid.
[Figure 6: DCR Graph of a simple location sharing process for COVID-19 prevention. Events: Upload his/her location info (User), Declare that the user has been infected (Doctor), Delete out-of-date location info (System), and Acquire the location info (Health Authority); the graph is not accepting in its initial marking.]
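The access-control logic of this example can be sketched as follows, with ordinary Python standing in for the smart-contract code; all names are illustrative:

```python
class AccessProcess:
    """Guard AAL updates behind DCR-style event execution: acquiring the
    location data is enabled only after 'Declare infected' has executed."""

    def __init__(self, aal):
        self.aal = aal                # the data's Authorized-Access-List
        self.infected_declared = False

    def declare_infected(self, caller_role):
        # role check corresponds to the role label on the DCR event
        if caller_role != "Doctor":
            raise PermissionError("only the Doctor role may execute this event")
        self.infected_declared = True

    def acquire_location(self, authority_key):
        # a DCR condition: blocked until 'Declare infected' has executed
        if not self.infected_declared:
            raise RuntimeError("event blocked: condition not satisfied")
        if authority_key not in self.aal:
            self.aal.append(authority_key)   # grant access by updating the AAL
        return self.aal
```

On a real deployment, each method body would be a smart contract checking the graph's marking before mutating the AAL.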
5.3 Data requirement
Assume we have an erasure-code algorithm that generates coded data of $k$ pieces, each piece of size $Er(d, j, k)$ for original data of size $d$, such that the original data can be recovered from any $j$ out of the $k$ pieces.

Let user-uploaded data $\alpha$ be of size $D(\alpha)$. It is split equally into $m$ pieces $(\alpha_1, \dots, \alpha_m)$ at the first dimension, each piece of size $D(\alpha)/m$. Each node in the second-dimension shard $i$ stores a sub-piece of size $Er(D(\alpha)/m, T, s-1)$.

For each Atomic data item, if a node is inside the first-dimension shard hosting it, then its total data requirement is

$$DR(\alpha) = D(\alpha)/m \qquad (1)$$

otherwise,

$$DR(\alpha) = Er\left(\frac{D(\alpha)}{m}, T, s-1\right) \qquad (2)$$

The equations show that the choice of parameters influences the data requirement.
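Equations 1 and 2 can be sketched as follows, assuming an MDS erasure code (e.g. Reed-Solomon) so that $Er(d, j, k) = d/j$; the paper itself only requires some erasure code with these recovery properties, so this instantiation is an assumption:

```python
def er(d, j, k):
    """Size of one erasure-coded piece under an MDS code: data of size d is
    recoverable from any j of the k pieces, so each piece has size d / j."""
    return d / j

def data_requirement(D, m, T, s, first_dimension):
    """Per-node storage for one Atomic data item of size D:
    Eq. (1) for first-dimension nodes, Eq. (2) otherwise."""
    if first_dimension:
        return D / m                      # Eq. (1)
    return er(D / m, T, s - 1)            # Eq. (2)
```

For example, with $D = 1024$ MB, $m = 50$, $s = 40$ and $T = \lfloor 0.5 \times (s-1) \rfloor + 1 = 20$, a first-dimension node stores about 20.5 MB and a second-dimension node about 1 MB.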
We simulate an $n = 2000$ system with different $s$ and $m$. The system stores 1 terabyte of user-uploaded data in total, and each piece of user-uploaded data has a size of 1 gigabyte. Figure 7 shows the average storage size assigned to individual nodes under different settings of $s$ and $m$. Because each piece of user-uploaded data is associated with a different node table, a node may be a first-dimension node for one piece of user-uploaded data but a second-dimension node for another. When a node is a first-dimension node for data, its storage size for this user-uploaded data is calculated using Equation 1; otherwise,
[Figure 7: Data requirements (in MB) for individual nodes according to Equations 1 and 2 when $n = 2000$ and $T = \lfloor 0.5 \times (s-1) \rfloor + 1$, across the $s/m$ settings 5/400, 16/125, 40/50, 100/20, 250/8, and 1000/2.]
[Figure 8: Data requirements (in MB) for individual nodes according to Equations 1 and 2 when $n = 2000$ and $T = \lfloor 0.5 \times (s-1) \rfloor + 1$, compared to storing the data directly as transactions (1,048,576 MB). The per-node requirements for the $s/m$ settings 5/400, 16/125, 40/50, 100/20, 250/8, and 1000/2 are 1497, 1370, 1358, 1054, 1174, and 1299 MB respectively, which is almost invisible in the figure compared to storing data directly as transactions.]
it is calculated using Equation 2. Figure 8 compares the storage size for individual nodes with the storage size that would result if the data were stored directly as transactions in the blockchain. As can be seen from Figure 7, our approach requires only around 3 gigabytes of storage on average for $n = 2000$ and $T = \lfloor 0.5 \times (s-1) \rfloor + 1$, whereas storing the data directly requires 1 terabyte. Storing data in our system is therefore practical.
6 SECURITY
The security of the system relies on the random assignment of nodes to the node table. The adversary may render the data incomplete by taking at least $T$ spots in a row of the node table. In that case, the second-dimension shard can neither provide the missing information to the data requester nor reconstruct the data in the first dimension. The probability of an attack rendering a particular data piece incomplete is therefore

$$Pr_{hg} = \sum_{X = \lfloor (s-2)/2 \rfloor + 1}^{s-1} \frac{\binom{f}{X}\binom{N-f}{s-1-X}}{\binom{N}{s-1}} \qquad (3)$$

where $f = \lfloor (n-1)/3 \rfloor$. The upper-bound probability of rendering any data piece incomplete is

$$Pr = \min(1, m \times Pr_{hg}) \qquad (4)$$
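Equations 3 and 4 can be evaluated directly with exact binomial coefficients. The sketch below takes $N = n$ and uses $\lfloor (s-2)/2 \rfloor + 1$ as the lower summation limit; both are this sketch's reading of the formula:

```python
from math import comb

def pr_hg(n, s):
    """Eq. (3): hypergeometric tail probability that the adversary holds
    enough of the s-1 second-dimension spots in one fixed row, when drawing
    s-1 of the N = n nodes without replacement and f = (n-1)//3 of them
    are adversarial."""
    N = n
    f = (n - 1) // 3
    lower = (s - 2) // 2 + 1          # lower summation limit of Eq. (3)
    total = comb(N, s - 1)
    return sum(comb(f, X) * comb(N - f, s - 1 - X)
               for X in range(lower, s)) / total

def pr_incomplete_upper_bound(n, s, m):
    """Eq. (4): union bound over the m rows, capped at 1."""
    return min(1.0, m * pr_hg(n, s))
```

Note that `comb(f, X)` is 0 whenever `X > f`, so the sum is safe for small adversary populations.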
It is challenging to formulate the probability of illegally acquiring user-uploaded data in closed form. This is because the adversary can acquire a data piece either by taking the corresponding first-dimension node or by gaining enough spots in the corresponding second-dimension shard, and for each data piece we need to consider both cases. We therefore approximate the probability by sampling from the assignment distribution: Algorithm 1 outlines a Monte Carlo procedure for approximating the probabilities.
Tables 1, 2, and 3 show the probabilities approximated for several values of $n$ of different scales when $f \le \lfloor (n-1)/3 \rfloor$ nodes are adversarial. Tables 4 and 5 show the probabilities approximated for several values of $n$ of different scales when $f \le \lfloor (n-1)/4 \rfloor$ nodes are adversarial. Table 6 shows the probabilities approximated when $f \le \lfloor (n-1)/5 \rfloor$ nodes are adversarial. We highlight some good results in the tables using bold faces.

As can be seen from these tables, with suitable parameters, the probabilities of successful attacks are minimal, rendering our system practically secure.
Algorithm 1: Approximate Pr_illegal_acquirement and Pr_incomplete_data
Input: n, m, s, max_rounds
Initialization: success_a := 0, success_b := 0, Counter := 0
repeat
    Counter := Counter + 1
    Generate a node table Assign1 of width s spots and height m spots.
    if the adversary gets at least T spots in a row c of Assign1 (excluding the first column) then
        The spot of row c in the first column is considered taken by the adversary, regardless of whether it was originally assigned an honest or an adversarial node.
    end if
    if the adversary has taken all spots in the first column of Assign1 then
        success_a := success_a + 1
    end if
    if the adversary gets at least T spots in any row of Assign1 (excluding the first column) then
        success_b := success_b + 1
    end if
until Counter ≥ max_rounds
Output: Pr_illegal_acquirement = success_a / Counter, Pr_incomplete_data = success_b / Counter
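Algorithm 1 can be implemented as a plain Monte Carlo simulation. The sketch below assumes $n = m \times s$ (consistent with the paper's parameter settings), $f = \lfloor (n-1)/3 \rfloor$, and $T = \lfloor 0.5 \times (s-1) \rfloor + 1$:

```python
import random

def approximate_probabilities(n, m, s, max_rounds=10_000, seed=0):
    """Monte Carlo estimate of Pr_illegal_acquirement and Pr_incomplete_data
    per Algorithm 1. Assumptions of this sketch: n = m*s spots in the table,
    f = (n-1)//3 adversarial nodes, T = a majority of the s-1 row spots."""
    assert n == m * s, "this sketch assumes the table uses all n nodes"
    rng = random.Random(seed)
    f = (n - 1) // 3
    T = (s - 1) // 2 + 1
    success_a = success_b = 0
    for _ in range(max_rounds):
        # random assignment of the n nodes to the m x s table; True = adversarial
        nodes = [True] * f + [False] * (n - f)
        rng.shuffle(nodes)
        table = [nodes[r * s:(r + 1) * s] for r in range(m)]
        any_row_captured = False
        first_column_taken = True
        for row in table:
            row_captured = sum(row[1:]) >= T   # >= T adversarial monitoring spots
            any_row_captured |= row_captured
            # the first-dimension spot counts as taken if the node itself is
            # adversarial or its monitoring row is captured
            first_column_taken &= (row[0] or row_captured)
        success_a += first_column_taken    # all pieces acquirable -> data leaked
        success_b += any_row_captured      # some data piece rendered incomplete
    return success_a / max_rounds, success_b / max_rounds
```

Increasing `max_rounds` (the paper uses $10^5$) tightens the estimates at a linear cost in runtime.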
Table 1: The probabilities approximated for $n = 1000$, maximum run $= 10^5$, $f = \lfloor (n-1)/3 \rfloor$

s     m     Pr_illegal_acquirement   Pr_incomplete_data
4     250   0                        0.99996
5     200   0                        0.99999
8     125   0                        0.9985
10    100   0                        0.99167
20    50    0                        0.71697
25    40    0                        0.70058
40    25    0                        0.13934
50    20    0                        0.05633
100   10    1.00E-05                 0.00067
125   8     0.00017                  0.00011
200   5     0.00427                  0
250   4     0.01206                  0
500   2     0.10911                  0
Table 2: The probabilities approximated for $n = 2000$, maximum run $= 10^5$, $f = \lfloor (n-1)/3 \rfloor$

s     m     Pr_illegal_acquirement   Pr_incomplete_data
4     500   0                        0.99999
5     400   0                        0.99999
8     250   0                        0.99999
10    200   0                        0.99991
16    125   0                        0.98541
20    100   0                        0.92336
25    80    0                        0.91539
40    50    0                        0.27093
50    40    0                        0.12001
80    25    0                        0.01011
100   20    0                        0.00182
125   16    0                        0.00033
200   10    1.00E-05                 0
250   8     0.00011                  0
400   5     0.0039                   0
500   4     0.01181                  0
1000  2     0.11289                  0
Table 3: The probabilities approximated for $n = 3000$, maximum run $= 10^5$, $f = \lfloor (n-1)/3 \rfloor$

s     m     Pr_illegal_acquirement   Pr_incomplete_data
3     1000  0                        0.99999
4     750   0                        0.99999
5     600   0                        0.99999
6     500   0                        0.99999
8     375   0                        0.99999
10    300   0                        0.99999
12    250   0                        0.99997
15    200   0                        0.99999
20    150   0                        0.98004
24    125   0                        0.91534
25    120   0                        0.97697
30    100   0                        0.72848
40    75    0                        0.38641
50    60    0                        0.1789
60    50    0                        0.08009
75    40    0                        0.03776
100   30    0                        0.00347
120   25    0                        0.00082
125   24    0                        0.00081
150   20    0                        8.00E-05
200   15    0                        0
250   12    1.00E-05                 0
300   10    2.00E-05                 0
375   8     0.00013                  0
500   6     0.00144                  0
600   5     0.00408                  0
750   4     0.01206                  0
1000  3     0.03739                  0
1500  2     0.11231                  0
Table 4: The probabilities approximated for $n = 200$, maximum run $= 10^5$, $f = \lfloor (n-1)/4 \rfloor$

s     m     Pr_illegal_acquirement   Pr_incomplete_data
4     50    0                        0.54507
5     40    0                        0.90139
8     25    0                        0.2567
10    20    0                        0.16057
20    10    0                        0.01353
25    8     0                        0.00816
40    5     0                        4.00E-05
50    4     0.00356                  1.00E-05
100   2     0.06122                  0
Table 5: The probabilities approximated for $n = 400$, maximum run $= 10^5$, $f = \lfloor (n-1)/4 \rfloor$

s     m     Pr_illegal_acquirement   Pr_incomplete_data
4     100   0                        0.80421
5     80    0                        0.99148
8     50    0                        0.47251
10    40    0                        0.32001
16    25    0                        0.08606
20    20    0                        0.03449
25    16    0                        0.02395
40    10    0                        0.00046
50    8     0                        5.00E-05
80    5     0.00089                  0
100   4     0.0037                   0
200   2     0.06131                  0
Table 6: The probabilities approximated for $n = 100$, maximum run $= 10^5$, $f = \lfloor (n-1)/5 \rfloor$

s     m     Pr_illegal_acquirement   Pr_incomplete_data
4     25    0                        0.16516
5     20    0                        0.41867
10    10    0                        0.01817
20    5     0.00018                  0.00017
25    4     0.0012                   4.00E-05
50    2     0.03897                  0
6.1 Preventing the system from a Sybil attack
A simple solution to prevent adversaries from claiming numerous nodes in order to obtain more data is to use a threshold Proof-of-Work (PoW) or threshold Proof-of-Stake (PoS) scheme: the nodes must submit a threshold amount of PoS or PoW in each block interval. Thereby, the number of nodes an entity can run is bounded by the amount of computational power or stake it holds.
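A threshold-PoW check of this kind might look as follows; the message layout, the use of SHA-256, and the leading-zero-bits criterion are illustrative assumptions, not a specification from the paper:

```python
import hashlib

def meets_threshold(identity_key: bytes, block_hash: bytes, nonce: int,
                    difficulty_bits: int = 20) -> bool:
    """Per-block-interval proof-of-work ticket: H(identity || block || nonce)
    must start with difficulty_bits zero bits."""
    digest = hashlib.sha256(
        identity_key + block_hash + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0

def mine_ticket(identity_key: bytes, block_hash: bytes,
                difficulty_bits: int = 20, max_tries: int = 10_000_000):
    """Search for a valid nonce; expected work is about 2^difficulty_bits hashes,
    so each extra Sybil identity costs the adversary the same work again."""
    for nonce in range(max_tries):
        if meets_threshold(identity_key, block_hash, nonce, difficulty_bits):
            return nonce
    return None
```

Binding the ticket to the identity key and the current block hash forces fresh work per identity per interval.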
6.2 Security for Atomic data
The Atomic data is split and backed up by nodes; holding only parts of the Atomic data reveals no useful information. As introduced in Section 4, each Atomic data item is arithmetically compressed before being uploaded to the sharding model. Unlike Huffman coding, which recovers quickly from errors, a single inverted bit in arithmetically compressed text corrupts all subsequent output. Therefore, if the adversary only obtains parts of the Atomic data, it cannot successfully decode the Atomic data (or any part of it); it can only decode the Atomic data once it obtains all parts of the data.
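This error-propagation property can be demonstrated with a toy float-precision arithmetic coder, using the frequency table from Figure 5. Real implementations use integer-range coding; this sketch is for illustration only:

```python
def build_intervals(freq):
    """Map each symbol to its cumulative-probability sub-interval of [0, 1)."""
    total = sum(freq.values())
    low, intervals = 0.0, {}
    for sym in sorted(freq):
        p = freq[sym] / total
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def encode(text, freq):
    """Narrow [0, 1) by each symbol's interval; return a point inside it."""
    intervals = build_intervals(freq)
    lo, hi = 0.0, 1.0
    for ch in text:
        a, b = intervals[ch]
        lo, hi = lo + (hi - lo) * a, lo + (hi - lo) * b
    return (lo + hi) / 2

def decode(code, freq, length):
    """Invert the narrowing: each step rescales the code into [0, 1),
    so any corruption of the code value garbles all subsequent symbols."""
    intervals = build_intervals(freq)
    out = []
    for _ in range(length):
        for sym, (a, b) in intervals.items():
            if a <= code < b:
                out.append(sym)
                code = (code - a) / (b - a)
                break
    return "".join(out)
```

With the frequency table {"c": 1, "a": 2, "b": 7}, encoding "abc" yields approximately 0.173 (matching the value shown in Figure 5), and perturbing the code even slightly changes the decoded text from the first affected symbol onward.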
6.3 Updating the nodes hosting the data
The FIL of Atomic data indicates which nodes store which data pieces. The FIL is initially sent to the network by the data uploader, and then the relevant nodes begin to sync the data. When the nodes are reshuffled across the shards afterward, the FIL for existing data is not changed, nor are the nodes storing the data. So nodes do not learn new data during regular membership adjustment (periodically adding new nodes and deleting offline nodes). However, it is ideal for the system to have a mechanism to regularly (e.g., once a month) replace the nodes indicated in the FIL that have gone offline. A straightforward method is to group the data stored by the offline nodes, regenerate a new FIL for it (treating the grouped data as new Atomic data), and then recover the data. For example, if a data piece in the first dimension is lost, the system generates a new FIL for it (through a consensus decision, as nodes can determine which nodes went offline). The new node assigned this data piece according to the new FIL can acquire the sub-pieces from the relevant second-dimension shard (according to the old FIL) and reassemble the data piece. After that, the old FIL is partially updated, with the offline node replaced by the new node. If a sub-piece is lost, it can be recovered by reassembling its sub-sub-pieces. If a sub-sub-piece is lost, it can be recovered with the help of the corresponding node that hosts the relevant data in the lower dimension.

Even with mechanisms in place to replace offline nodes, adversarial nodes do not learn new data, because our security assumption states that honest nodes always stay online. Therefore, security is not compromised.

Note that with the financial features of the blockchain, the system can impose incentives to encourage nodes to stay online and penalize the ones that go offline. In this way, it stabilizes the node population.
7 CONCLUSION
This study presented a two-dimensional blockchain sharding model for on-chain data privilege and authority management. We demonstrated designs that restrict the likelihood of data loss and leakage to a minimal probability. In doing so, we showed that our designs' public witness and tamper-resistance are unaffected, despite the fact that the data is only accessible to authorized individuals.
REFERENCES
[1] Reference suppressed for double-blind reviews.
[2] Ascigil, O., Reñé, S., Król, M., Pavlou, G., Zhang, L., Hasegawa, T., Koizumi, Y., and Kita, K. Towards peer-to-peer content retrieval markets: Enhancing IPFS with ICN. In Proceedings of the 6th ACM Conference on Information-Centric Networking (2019), pp. 78–88.
[3] Avarikioti, G., Kogias, E. K., Wattenhofer, R., and Zindros, D. Brick: Asynchronous payment channels. arXiv preprint arXiv:1905.11360 (2019).
[4] Benet, J. IPFS - content addressed, versioned, P2P file system. arXiv preprint arXiv:1407.3561 (2014).
[5] Blakley, G. R. Safeguarding cryptographic keys. In 1979 International Workshop on Managing Requirements Knowledge (MARK) (1979), pp. 313–318.
[6] Bleichenbacher, D., and Maurer, U. M. Directed acyclic graphs, one-way functions and digital signatures. In Annual International Cryptology Conference (1994), Springer, pp. 75–82.
[7] Boyd, C., Cleary, J. G., Irvine, S. A., Rinsma-Melchert, I., and Witten, I. H. Integrating error detection into arithmetic coding. IEEE Transactions on Communications 45, 1 (1997), 1–3.
[8] Dai, M., Zhang, S., Wang, H., and Jin, S. A low storage room requirement framework for distributed ledger in blockchain. IEEE Access 6 (2018), 22970–22975.
[9] Debois, S., Hildebrandt, T., and Slaats, T. Concurrency and asynchrony in declarative workflows. In BPM 2016 (2016), vol. 9253 of LNCS, Springer, Cham.
[10] Frantz, C. K., and Nowostawski, M. From institutions to code: Towards automated generation of smart contracts. In FAS*W (2016), IEEE, pp. 210–215.
[11] Fujisaki, E., and Okamoto, T. A practical and provably secure scheme for publicly verifiable secret sharing and its applications. In International Conference on the Theory and Applications of Cryptographic Techniques (1998), Springer, pp. 32–46.
[12] Hildebrandt, T., Mukkamala, R. R., and Slaats, T. Safe distribution of declarative processes. In International Conference on Software Engineering and Formal Methods (2011), Springer, pp. 237–252.
[13] Hildebrandt, T., Mukkamala, R. R., Slaats, T., and Zanitti, F. Contracts for cross-organizational workflows as timed dynamic condition response graphs. The Journal of Logic and Algebraic Programming 82, 5-7 (2013), 164–185.
[14] Hildebrandt, T. T., and Mukkamala, R. R. Declarative event-based workflow as distributed dynamic condition response graphs. In PLACES 2010 (2010), K. Honda and A. Mycroft, Eds., vol. 69 of EPTCS, pp. 59–73.
[15] Huang, J., Lei, K., Du, M., Zhao, H., Liu, H., Liu, J., and Qi, Z. Survey on blockchain incentive mechanism. In International Conference of Pioneering Computer Scientists, Engineers and Educators (2019), Springer, pp. 386–395.
[16] Hull, R., Batra, V. S., Chen, Y.-M., Deutsch, A., Heath III, F. F. T., and Vianu, V. Towards a shared ledger business collaboration language based on data-aware processes. In ICSOC (2016), Springer, pp. 18–36.
[17] Kang, M. H., Park, J. S., and Froscher, J. N. Access control mechanisms for inter-organizational workflow. In Proceedings of the Sixth ACM Symposium on Access Control Models and Technologies (2001), pp. 66–74.
[18] Khalil, R., Zamyatin, A., Felley, G., Moreno-Sanchez, P., and Gervais, A. Commit-chains: Secure, scalable off-chain payments. Cryptology ePrint Archive, Report 2018/642 (2018).
[19] Klinger, P., and Bodendorf, F. Blockchain-based cross-organizational execution framework for dynamic integration of process collaborations. In WI (2020).
[20] Kokoris-Kogias, E., Alp, E. C., Gasser, L., Jovanovic, P., Syta, E., and Ford, B. Calypso: Private data management for decentralized ledgers. Proc. VLDB Endow. 14, 4 (Dec. 2020), 586–599.
[21] Labda, W., Mehandjiev, N., and Sampaio, P. Modeling of privacy-aware business processes in BPMN to protect personal data. In Proceedings of the 29th Annual ACM Symposium on Applied Computing (2014), pp. 1399–1405.
[22] Ladleif, J., Weske, M., and Weber, I. Modeling and enforcing blockchain-based choreographies. In BPM (2019), Springer, pp. 69–85.
[23] Lightstreams. Smart vault. https://docs.lightstreams.network/products/smart-vault, access date: 2021-05-16.
[24] Liu, Y., Liu, J., Yin, J., Li, G., Yu, H., and Wu, Q. Cross-shard transaction processing in sharding blockchains. In International Conference on Algorithms and Architectures for Parallel Processing (2020), Springer, pp. 324–339.
[25] López-Pintado, O., Dumas, M., García-Bañuelos, L., and Weber, I. Interpreted execution of business process models on blockchain. In EDOC (2019), IEEE, pp. 206–215.
[26] López-Pintado, O., Dumas, M., García-Bañuelos, L., and Weber, I. Controlled flexibility in blockchain-based collaborative business processes. Information Systems (2020), 101622.
[27] López-Pintado, O., García-Bañuelos, L., Dumas, M., Weber, I., and Ponomarev, A. Caterpillar: A business process execution engine on the Ethereum blockchain. SPE 49, 7 (2019), 1162–1193.
[28] Madsen, M. F., Gaub, M., Høgnason, T., Kirkbro, M. E., Slaats, T., and Debois, S. Collaboration among adversaries: Distributed workflow execution on a blockchain. In SCFAB 2018 (2018).
[29] Mashhadi, S. Secure publicly verifiable and proactive secret sharing schemes with general access structure. Information Sciences 378 (2017), 99–108.
[30] Miller, A., Bentov, I., Bakshi, S., Kumaresan, R., and McCorry, P. Sprites and state channels: Payment networks that go faster than lightning. In International Conference on Financial Cryptography and Data Security (2019), Springer, pp. 508–526.
[31] Nojoumian, M. Unconditionally secure proactive verifiable secret sharing using new detection and recovery techniques. In 2016 14th Annual Conference on Privacy, Security and Trust (PST) (2016), IEEE, pp. 269–274.
[32] Ren, Y., Liu, Y., Ji, S., Sangaiah, A. K., and Wang, J. Incentive mechanism of data storage based on blockchain for wireless sensor networks. Mobile Information
Systems 2018 (2018).
[33]
Rodríguez, A., Fernández-Medina, E., and Piattini, M. A bpmn extension for
the modeling of security requirements in business processes. IEICE transactions
on information and systems 90, 4 (2007), 745–752.
[34]
Shafagh, H., Burkhalter, L., Hithnawi, A., and Duqennoy, S. Towards
blockchain-based auditable storage and sharing of iot data. In Proceedings of the
2017 on Cloud Computing Security Workshop (2017), pp. 45–50.
[35] Shamir, A. How to share a secret. Commun. ACM 22, 11 (nov 1979), 612–613.
[36]
Slaats, T., Mukkamala, R. R., Hildebrandt, T.,and Marqard, M. Exformatics
declarative case management workows as dcr graphs. In Business process
management. Springer, 2013, pp. 339–354.
[37]
Stadler, M. Publicly veriable secret sharing. In International Conference
on the Theory and Applications of Cryptographic Techniques (1996), Springer,
pp. 190–199.
[38]
Sturm, C., Szalanczi, J., Schönig, S., and Jablonski, S. A lean architecture for
blockchain based decentralized process execution. In BPM (2019), F. Daniel, Q. Z.
Sheng, and H. Motahari, Eds., Springer, pp. 361–373.
[39]
Tran, A. B., Lu, Q., and Weber, I. Lorikeet: A model-driven engineering tool
for blockchain-based business process execution and asset management. In BPM
(2018), pp. 56–60.
[40]
Wang, C., and Raviv, N. Low latency cross-shard transactions in coded
blockchain. In 2021 IEEE International Symposium on Information Theory (ISIT)
(2021), IEEE, pp. 2678–2683.
[41]
Wang, S., Zhang, Y., and Zhang, Y. A blockchain-based framework for data
sharing with ne-grained access control in decentralized storage systems. Ieee
Access 6 (2018), 38437–38450.
[42]
Wilkinson, S., and Lowry, J. Metadisk a blockchain-based decentralized le
storage application.
[43]
Xu, Y. Section-blockchain: A storage reduced blockchain protocol, the foundation
of an autotrophic decentralized storage architecture. In 2018 23rd International
Conference on Engineering of Complex Computer Systems (ICECCS) (2018), IEEE,
pp. 115–125.
[44]
Xu, Y., and Huang, Y. Segment blockchain: A size reduced storage mechanism
for blockchain. IEEE Access 8 (2020), 17434–17441.
[45]
Xu, Y., Slaats, T., Düdder, B., Debois, S., and Wu,H. Distributed and adversarial
resistant workow execution on the algorand blockchain. In 6th Workshop on
Trusted Smart Contracts In Association with Financial Cryptography 2022 (2022),
Springer.
[46]
Young, A., and Yung, M. A pvss as hard as discrete log and shareholder separa-
bility. In International Workshop on Public Key Cryptography (2001), Springer,
pp. 287–299.
[47]
Yu, J., Kong, F., and Hao, R. Publicly veriable secret sharing with enrollment
ability. In Eighth ACIS International Conference on Software Engineering, Articial
Intelligence, Networking, and Parallel/Distributed Computing (SNPD 2007) (2007),
vol. 3, IEEE, pp. 194–199.
[48]
Zheng, Q., Li, Y., Chen, P., and Dong, X. An innovative ipfs-based storage
model for blockchain. In 2018 IEEE/WIC/ACM International Conference on Web
Intelligence (WI) (2018), IEEE, pp. 704–708.
[49]
Zyskind, G., Nathan, O., et al. Decentralizing privacy: Using blockchain to
protect personal data. In Security and Privacy Workshops (2015), IEEE, pp. 180–
184.
A A DETAILED USAGE CASE
We have shown in this paper that our DPAM serves as a decentralized and distributed database in which any update or access leaves an unerasable record. This provides a way to safely store records of public interest.
Assume the government of nation C is developing a contact-tracing application for COVID-19. Its citizens demand that their privacy be protected while using the application. In particular, only when a positive case is detected can the government acquire the GPS locations of that person and inform close contacts. Moreover, whenever such operations are performed, the scope of the data acquisition and the time of acquisition are logged.
The government decides to employ a blockchain together with our DPAM for this app. The transaction throughput of the blockchain is enhanced via existing blockchain sharding architectures. Every citizen with a valid digital ID can register a device (usually a personal smartphone) as a node in the blockchain.
Normal usage: Each device regularly uploads its GPS location by
(1) sending an information pair ({GPS location, UNIX timestamp}, one_time_index) to the blockchain, where one_time_index is a random index generated for this information pair;
(2) sending an information pair (one_time_index, identity-address) to our DPAM, where the identity-address can be indexed by the one_time_index but no one has the privilege to read this identity-address. This one_time_index is the same as the one in the corresponding ({GPS location, UNIX timestamp}, one_time_index) pair;
(3) appending the GPS location and UNIX timestamp to a space in our DPAM that can be indexed by the identity-address, such that only the identity-address can read the information in this space. Let there exist an information pair (real-world-identity-number, identity-address) in the government's private database, where the identity-address can be indexed by the real-world-identity-number and only the government has access privileges to the identity-address.
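As a concrete illustration, the three steps above can be sketched as follows. The in-memory blockchain_log and dpam stand-ins, and every function name here, are hypothetical assumptions for illustration only, not the paper's actual interfaces.

```python
import os
import time

# Illustrative stand-ins (assumptions, not the paper's API):
blockchain_log = []   # public, append-only log of (payload, one_time_index)
dpam = {}             # key -> (value, allowed_reader); reader None = no one

def upload_location(identity_address, gps_location):
    """Sketch of the three 'normal usage' steps for one GPS sample."""
    timestamp = int(time.time())
    one_time_index = os.urandom(16).hex()  # random index for this pair

    # Step 1: post ({GPS location, UNIX timestamp}, one_time_index) publicly.
    blockchain_log.append(((gps_location, timestamp), one_time_index))

    # Step 2: store (one_time_index -> identity-address) in the DPAM;
    # initially no one holds the read privilege for this entry.
    dpam[one_time_index] = (identity_address, None)

    # Step 3: append the sample to a space indexed by identity-address,
    # readable only by that identity-address.  (The random delay between
    # steps 2 and 3 described in the text is omitted for brevity.)
    space = dpam.setdefault(("space", identity_address),
                            ([], identity_address))
    space[0].append((gps_location, timestamp))
    return one_time_index
```

A real deployment would of course route steps 1-3 through the blockchain and DPAM protocols rather than a local dictionary; the sketch only shows which party learns which pair.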
Government requiring the information: A person has tested positive for COVID-19. The government acquires the person's identity-address through their real-world-identity-number. The government then informs the blockchain that this identity-address has tested positive. After that, everyone is allowed to see the information in the relevant time window in the space of this identity-address and read all the GPS locations as well as the UNIX timestamps in it.
Inform close contacts: When some GPS locations and UNIX timestamps have been determined to relate to COVID-19, the government is allowed to acquire all identity-addresses via ({GPS-location ± 10 meters, UNIX timestamp ± 10 minutes}, identity-address) pairs. The government can then contact the close contacts.
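A minimal sketch of how such a proximity query could be matched over the public log, assuming the ±10 meters / ±10 minutes windows above. The flat-Earth distance approximation and all names are illustrative assumptions, not the paper's mechanism.

```python
import math

def within(rec, case_rec, meters=10, seconds=600):
    """True if rec lies within the spatial and temporal window of case_rec."""
    (lat1, lon1), t1 = rec
    (lat2, lon2), t2 = case_rec
    # ~111,320 m per degree of latitude; longitude scaled by cos(latitude).
    dy = (lat1 - lat2) * 111_320
    dx = (lon1 - lon2) * 111_320 * math.cos(math.radians(lat2))
    return math.hypot(dx, dy) <= meters and abs(t1 - t2) <= seconds

def close_contacts(public_log, case_records):
    """Return one_time_index values whose records fall near a case record.
    The government then resolves these indices to identity-addresses via
    the (one_time_index, identity-address) pairs held in the DPAM."""
    return [idx for rec, idx in public_log
            if any(within(rec, c) for c in case_records)]
```

Each resolved index is an access-controlled DPAM read, so every such query leaves a log entry that citizens can later audit.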
Note that a random delay is imposed between step 2 and step 3 of normal usage to avoid the two pieces of information being linked together. Step 3 can also be deferred so that the GPS locations accumulated over, e.g., every 12 hours are sent to our DPAM at once, completely decoupling step 3 from step 2.
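The batching variant of step 3 can be sketched as below: samples are buffered locally and flushed to the DPAM space once per window. The class, window length, and callback interface are illustrative assumptions.

```python
class BatchedUploader:
    """Defer step 3 by accumulating samples and flushing once per window."""
    WINDOW = 12 * 3600  # flush interval in seconds (e.g., 12 hours)

    def __init__(self, identity_address, dpam_append):
        self.identity_address = identity_address
        self.dpam_append = dpam_append  # callback that performs step 3
        self.buffer = []
        self.last_flush = 0

    def record(self, gps_location, timestamp):
        # Buffer the sample; step 2 can run immediately, step 3 waits.
        self.buffer.append((gps_location, timestamp))
        if timestamp - self.last_flush >= self.WINDOW:
            self.flush(timestamp)

    def flush(self, now):
        if self.buffer:
            self.dpam_append(self.identity_address, list(self.buffer))
            self.buffer.clear()
        self.last_flush = now
```

Because the flush time no longer correlates with any single step-2 transaction, an observer cannot link a one_time_index to the identity-address through timing alone.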
In this design, only the government knows the real identity associated with an identity-address. The government can only learn the GPS locations of a person who has been classified as COVID-19 positive, and it can only acquire other identity-addresses by querying a GPS location that someone it declared COVID-19 positive has visited.
All these operations are logged in the blockchain, so people will know when and why their locations have been acquired. This design thus satisfies the privacy requirements, allows for transparency, and provides a massive number of nodes for our DPAM, keeping the failure probability below the threshold. The logs can be audited for legal purposes.