When Blockchain Meets Distributed File Systems: An Overview, Challenges, and Open Issues

Abstract

Constructing globally distributed file systems (DFS) has received great attention. Traditional Peer-to-Peer (P2P) distributed file systems have inherent drawbacks such as instability and a lack of auditing and incentive mechanisms. Thus, the Inter-Planetary File System (IPFS) and Swarm, as representative DFSs that integrate with blockchain technologies, have been proposed and are becoming a new generation of distributed file systems. Although blockchain-based DFSs provide incentives and security guarantees by exploiting the advantages of blockchain, a series of challenges, such as scalability and privacy issues, also constrain the development of this new generation of DFSs. Focusing mainly on IPFS and Swarm, this paper gives an overview of the rationale, layered structure and cutting-edge studies of blockchain-based DFSs. Furthermore, we identify their challenges, open issues and future directions. We anticipate that this survey can shed new light on subsequent studies related to blockchain-based distributed file systems.
Date of publication March, 2020, date of current version March 12, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2979881
HUAWEI HUANG (MEMBER, IEEE), JIANRU LIN (MEMBER, IEEE), BAICHUAN ZHENG,
ZIBIN ZHENG (SENIOR MEMBER, IEEE), JING BIAN
School of Data and Computer Science, Sun Yat-Sen University, 510006, Guangzhou, China
National Engineering Research Center of Digital Life, Sun Yat-sen University, Guangzhou, China
Corresponding author: Zibin Zheng. (e-mail: zhzibin@mail.sysu.edu.cn).
The work described in this paper was supported by the National Key Research and Development Program (2016YFB1000101), the
National Natural Science Foundation of China (61902445, 61722214), and the Guangdong Province Universities and Colleges Pearl River
Scholar Funded Scheme (2016).
INDEX TERMS Blockchain, Distributed File Systems, IPFS, Swarm
I. INTRODUCTION
There have been many attempts to construct a distributed file
system. The phenomenal popularity and study of Peer-to-Peer (P2P)
services, such as Napster [1], Gnutella [2], Kazaa [3] and Morpheus
[4], have made the implementation of distributed file systems an
exciting and promising research field. As one of the most successful
P2P distributed file systems, BitTorrent [5] has supported over
100 million online users. It has a large-scale deployment in which
tens of millions of nodes join and churn every day. In a distributed
file system, storage resources and system clients are dispersed
across the network. Each user is both a creator and a consumer of
the data stored in the system. Thus, the challenge is to provide
adequate incentives in an efficient, secure and practical manner.
By far, the most widely deployed system for distributing files is
the HyperText Transfer Protocol (HTTP), in which web servers host
uploaded data. Other peers can then access a particular piece of
data from anywhere in the world. Keeping data accessible on web
servers incurs a maintenance cost, and this cost grows with the
popularity of the data. Moreover, there are very few ways to share
the burden of information dissemination with the clients directly.
This is because HTTP was not designed to be upgraded and thus fails
to take advantage of the advanced file distribution techniques
proposed in the past few years. Meanwhile, P2P techniques gathered
pace and soon accounted for the majority of data packets on the
Internet. P2P file systems such as BitTorrent [5] optimize resources
by giving different pieces of popular data to clients and enabling
them to swap the missing parts with one another. In this way, the
bandwidth consumption of hosts can be balanced and the overall
operational expenditure (OPEX) can also be reduced.
Although BitTorrent has the advantages mentioned above, the
following drawbacks cannot be ignored:
1) Downloading is unstable, which prevents BitTorrent from being
widely used in some scenarios.
2) File publishers cannot be verified, so it is hard to guarantee
the credibility of the downloaded content.
3) There is no incentive mechanism, so seed nodes are not rewarded
for sharing their bandwidth and storage resources.
Aiming to replace HTTP, Zeronet [6] adopted BitTorrent as the file
distribution mechanism for Web content. However, simply sharing
bandwidth, storage and computing resources cannot provide the
experience that HTTP users expect.
Recently, blockchain has become a buzzword in both industry and
academia, and the combination of blockchain and distributed file
systems is becoming a promising solution, in which blockchain is
expected to provide incentives and security for the files stored in
the system. Currently, the popular blockchain-based distributed
file systems include IPFS [7], Swarm [8], Storj [9], and PPIO [10].
Among these, IPFS is a peer-to-peer distributed file system for
storing and accessing files, websites, applications and data; Swarm
is a distributed storage platform and content distribution service
based on Ethereum; Storj is a peer-to-peer decentralized cloud
storage platform that allows users to share data without relying on
a third-party data provider; and PPIO is a decentralized
programmable storage network that permits users to store and
retrieve any data from anywhere on the web. With respect to their
combination with blockchains, IPFS, Swarm, and Storj adopt Filecoin
[11], Ethereum [12], and Metadisk [13] as their incentive
mechanisms, respectively. PPIO exploits up to four proof algorithms,
which are explained in Section III, for its incentive layer.
Since the technologies of these distributed file systems are
similar to those of IPFS and Swarm, we review the recent
cutting-edge studies of blockchain-based DFSs by focusing mainly on
IPFS and Swarm.
The contributions of this survey include the following aspects.
1) This paper first introduces the layered structure of
blockchain-based DFSs. We then present a comprehensive taxonomy of
the cutting-edge studies from the scalability and privacy
perspectives.
2) We also clarify the challenges, open issues and future
directions of blockchain-based DFSs.
3) To the best of our knowledge, this is the first survey related
to blockchain-based DFSs. Our review can help subsequent
researchers understand both the current development and the future
trends of blockchain-based DFSs.
The rest of this paper is organized as follows. In Section II, we
explain the necessary preliminaries and basic concepts. Section III
presents the layered structure of distributed file systems. Section
IV summarizes the cutting-edge studies. Section V discusses open
issues, challenges and future directions. Finally, Section VI
concludes this paper. The structure of this survey is also shown in
Figure 1.
[Figure 1 diagram labels: Blockchain-based Distributed File Systems
(DFS); Preliminaries of DFS; Layered Structure of Blockchain-based
DFS; Cutting-Edge Studies of Blockchain-based DFS; Open Issues,
Challenges & Future Directions.]
FIGURE 1. The structure of this article.
II. PRELIMINARIES
Since the blockchain-based distributed file systems emphasized in
this article are closely related to the basic data structures of
blockchains, we first introduce the preliminaries of the Merkle
Tree and the Merkle DAG. We then give an overview of BitTorrent,
which helps explain the rationale of distributed file systems such
as IPFS and Swarm.
A. MERKLE TREE & MERKLE DAG
A Merkle tree [14] is a binary tree built on a cryptographic hash
function. Each leaf in a Merkle tree holds a hash value computed
from one or more input values. Each parent node derives its hash
value from the values of its children, and thus depends recursively
on all values in its subtree. Figure 2 illustrates an example of a
Merkle tree: each leaf (H1-H4) obtains its value by hashing an
input value (D1-D4), the parents (H5-H6) derive their values from
their children (H1-H4), and finally the root of the Merkle tree
(H7) is obtained, which depends on every value in the tree.
In the blockchain area, Merkle trees are usually used for integrity
validation (for example, block validation in Bitcoin [15]) and
quick validation (for example, light peers in Ethereum [12]). Since
a tiny change in a Merkle tree drastically changes the root of the
tree, integrity can be validated by simply storing the root. To
validate whether a node is in a Merkle tree, only a few node hashes
are needed instead of the entire tree. For example, in Figure 2, H4
and H5 are needed to validate whether H3 is in the tree. Using H3,
H4 and H5, a root (H8) can be computed as hash(H5 + hash(H3+H4)).
By comparing H7 and H8, we can confirm that H3 is in the tree if
the two roots are the same, or that H3 is not in the tree if the
two roots are different.
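The membership check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the construction used by any particular blockchain: SHA-256 is an assumed hash function, duplicating the last hash on odd levels is one common convention, and the helper names (build_levels, merkle_proof, verify) are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-256 is an assumed stand-in; the text does not fix a hash function.
    return hashlib.sha256(data).digest()

def build_levels(blocks):
    """Return every level of the tree, from the leaf hashes up to the root."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                  # odd level: pair the last hash with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def merkle_proof(levels, index):
    """Collect the sibling hashes needed to recompute the root from one leaf."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1                 # neighbor within the same pair
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(leaf_hash, proof, root):
    """Recompute a candidate root (H8 in the text) and compare it with the stored root."""
    acc = leaf_hash
    for sibling, sibling_is_left in proof:
        acc = h(sibling + acc) if sibling_is_left else h(acc + sibling)
    return acc == root

blocks = [b"D1", b"D2", b"D3", b"D4"]
levels = build_levels(blocks)
root = levels[-1][0]                        # H7
proof = merkle_proof(levels, 2)             # proof for H3 consists of H4 and H5
print(verify(h(b"D3"), proof, root))        # True
print(verify(h(b"tampered"), proof, root))  # False
```

For four data blocks the proof holds only two hashes, which is why a light client can check membership without downloading the whole tree.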
Similar to the concept of a Merkle tree, the Merkle DAG (Directed
Acyclic Graph) [7] is used in IPFS as its data object model. An
object in IPFS is a structure containing two attributes: Data and
Links. Each Link structure includes three attributes: Name, Hash
and Size. Using this object structure, IPFS can compose objects and
build a directed acyclic graph. In IPFS, a Merkle DAG organizes the
structure of a file or even a file directory, as shown in Figure 3.
In this figure, the root of the file directory contains two files
(example.js and hello.txt) and one subdirectory (dir). The file
example.js is divided into three data pieces, and the subdirectory
dir contains two files: other.txt and example.txt. Because the
contents of example.txt in dir and hello.txt in the root path are
exactly the same, they are linked to the same object. Each object
derives its hash from the hashes of its children and the content of
its own data.
[Figure 2 diagram labels: data blocks D1-D4; H1 = hash(D1), H2 =
hash(D2), H3 = hash(D3), H4 = hash(D4); H5 = hash(H1+H2), H6 =
hash(H3+H4); H7 = hash(H5+H6).]
FIGURE 2. An illustration of a Merkle Tree.
[Figure 3 diagram labels: objects Qmb5wd, Qm8nok, Qm9dv4, Qmwe89,
Qm287n, Qm23dv, Qm04km and Qm95kv; link names example.js, dir,
hello.txt, example.txt and other.txt; data values metadata,
"other\n", "Hello\n" and data pieces 1-3.]
FIGURE 3. Merkle DAG in IPFS.
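The object model above (Data plus Links, each Link carrying Name, Hash and Size) can be sketched as follows. This is only an illustration under stated assumptions: real IPFS objects use a binary encoding and multi-hash identifiers, whereas this sketch serializes with JSON and hashes with SHA-256, and the names Node, Link and link_to are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class Link:
    name: str   # Name attribute
    hash: str   # Hash attribute: identifier of the linked object
    size: int   # Size attribute: cumulative size of the linked object

@dataclass
class Node:
    data: bytes = b""
    links: list = field(default_factory=list)

    def object_hash(self) -> str:
        # Assumed encoding: JSON + SHA-256 stand in for IPFS's real format.
        payload = json.dumps(
            {"data": self.data.hex(),
             "links": [[l.name, l.hash, l.size] for l in self.links]},
            sort_keys=True,
        ).encode()
        return hashlib.sha256(payload).hexdigest()

    def size(self) -> int:
        return len(self.data) + sum(l.size for l in self.links)

def link_to(name: str, node: Node) -> Link:
    return Link(name, node.object_hash(), node.size())

# Rebuild part of the directory described in the text.
hello = Node(b"Hello\n")
other = Node(b"other\n")
sub_dir = Node(links=[link_to("example.txt", hello), link_to("other.txt", other)])
root_dir = Node(links=[link_to("dir", sub_dir), link_to("hello.txt", hello)])

# Identical content is addressed by the same hash, whatever the link name.
print(root_dir.links[1].hash == sub_dir.links[0].hash)  # True
```

Because a parent's hash covers the hashes of its links, changing any file changes the hash of every directory above it, which is the tamper-evidence property the text attributes to the Merkle DAG.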
B. OVERVIEW OF BITTORRENT
As mentioned earlier, BitTorrent (BT) is one of the most popular
distributed file systems. The process of file sharing in BitTorrent
is illustrated in Figure 4 and can be described in the following
five steps:
1) Peer A interacts with a Web server and downloads a .torrent
file.
2) Peer A interacts with the tracker that it finds in the .torrent
file, and requests a list of peers that are in the Torrent network.
3) The tracker sends a list of a specified number of peers that are
in the Torrent network.
4) Peer A randomly selects some of the candidate peers from the
list as its neighbors and establishes connections with each of
them.
5) Peer A then exchanges file pieces with its neighbors using the
swarming technique.
In BitTorrent, an overlay network called a Torrent is established
for each file being distributed. A Torrent is composed of peers,
which can be classified into two types: seeds and leechers. A seed
is a client that has a complete copy of a file, while a leecher is
a client that is still downloading the file. Besides seeds and
leechers, Web servers and trackers are also required. If a peer
wants to join a Torrent network, it can obtain a .torrent file from
a Web server. This file contains information about the shared file,
including its name, length, hash digest and the URL of the tracker.
A tracker is a special peer that stores the meta information of the
peers active in a Torrent network. A peer can interact with a
tracker to obtain the list of IP/port pairs of other peers in a
Torrent, and then randomly select about 20-40 peers from the list
as its neighbors. BitTorrent adopts a file exchanging technique
called swarming, in which a file is separated into fixed-size
pieces, each usually 256 KB in size [5]. When a piece is fully
downloaded, the peer compares its SHA1 hash value with the value in
the .torrent file. If they match, the peer announces the
availability of this complete piece to its neighbors for further
file exchanging and downloading.
[Figure 4 diagram labels: Web Server, Tracker, Seed, Leecher, a
peer; arrows 1: Download .torrent file; 2: Request a list of peers;
3: Send a list of peers; 4: Establish connections; 5: Exchange file
pieces.]
FIGURE 4. Mechanism of file-sharing in BitTorrent, which helps
explain the blockchain-based distributed file system IPFS.
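The piece-level integrity check described above can be sketched as follows. It is a simplified illustration: the piece digests are kept in a plain Python list rather than parsed from a real .torrent file, and the function names are hypothetical.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # 256 KB, the typical piece size mentioned in the text

def split_pieces(data: bytes, piece_size: int = PIECE_SIZE):
    """Split a file into fixed-size pieces (the last piece may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]

def piece_digests(data: bytes, piece_size: int = PIECE_SIZE):
    """Digests a publisher would place in the .torrent file (assumed layout)."""
    return [hashlib.sha1(p).digest() for p in split_pieces(data, piece_size)]

def verify_piece(index: int, piece: bytes, expected_digests) -> bool:
    """A downloader accepts a piece only if its SHA1 matches the published one."""
    return hashlib.sha1(piece).digest() == expected_digests[index]

original = bytes(600 * 1024)            # a 600 KB file of zero bytes
digests = piece_digests(original)
pieces = split_pieces(original)
print(len(pieces))                                   # 3
print(verify_piece(2, pieces[2], digests))           # True
print(verify_piece(2, pieces[2] + b"x", digests))    # False
```

Only after verify_piece succeeds would a peer announce the piece to its neighbors, so corrupted data is not propagated further.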
III. THE LAYERED STRUCTURE OF
BLOCKCHAIN-BASED DISTRIBUTED FILE SYSTEMS
In this section, we present the typical layered structure of
blockchain-based distributed file systems, with particular emphasis
on IPFS and Swarm. Generally, we identify seven layers behind the
popular distributed file systems: the Identity Layer, Data Layer,
Data-swap Layer, Network Layer, Routing Layer, Consensus Layer and
Incentive Layer. Each layer is a critical module of a distributed
file system. Table 1 summarizes the function of each layer together
with examples and related references.
TABLE 1. Layered structure of distributed file systems such as IPFS
and Swarm.
Layer | Function | Examples and References
Identity layer | Assigns a unique ID to each node | Keccak hash [16]
Data layer | Organizes the file structure in the distributed file system | Merkle DAG [7]
Data-swap layer | Formulates the file-sharing strategy between nodes | BitSwap [5]
Network layer | Enables nodes to discover other nodes, establish connections and exchange files with each other in a secure environment | libP2P [17], Devp2p [18]
Routing layer | Enables each file piece to be located and accessed by nodes in the network | Distributed sloppy hash table (DSHT) [19], Distributed preimage archive (DPA) [20]
Consensus layer | Ensures that the ledgers recording transactions in each node are basically correct and encourages users to maintain the consistency of the network | "Expected Consensus" [21], Proof-of-Work [12]
Incentive layer | Establishes the reward and punishment mechanism of the distributed file system and encourages nodes in the network to be active and honest in transactions | Filecoin [11], SWAP, SWEAR and SWINDLE [8]
A. IDENTITY LAYER
To achieve content distribution between nodes in a P2P file system,
each node has to be identified by a unique identifier, which must
be collision-free. That is, two different objects must never map to
the same identifier. In IPFS, the cryptographic hash (in multi-hash
format) of a public key, i.e., the NodeId, is used to identify each
node. The multi-hash format is <hash function code><hash digest
length><hash digest bytes>. Nodes check public keys and NodeIds
whenever they connect with each other. In Swarm, the node
hash-address is generated by applying Keccak 256-bit SHA3 [16] to
the public key of an Ethereum account.
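The multi-hash layout described above can be illustrated with a short sketch. It assumes SHA2-256, whose multihash function code is 0x12 and whose digest length is 0x20 (32 bytes), and it handles only single-byte code and length fields; the placeholder public key and the function names are hypothetical.

```python
import hashlib

SHA2_256_CODE = 0x12  # multihash function code for sha2-256

def multihash_sha256(data: bytes) -> bytes:
    """Encode <hash function code><hash digest length><hash digest bytes>."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, len(digest)]) + digest

def decode_multihash(mh: bytes):
    """Split a multihash back into its three fields (single-byte fields only)."""
    code, length = mh[0], mh[1]
    digest = mh[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length field")
    return code, length, digest

public_key = b"example public key bytes"   # placeholder, not a real key
node_id = multihash_sha256(public_key)
print(node_id[:2].hex())   # 1220
print(len(node_id))        # 34
```

Because the hash function is named inside the identifier itself, a peer can validate a NodeId without out-of-band agreement on the algorithm, and new hash functions can be introduced later without changing the format.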
B. ROUTING LAYER
Generally, the functionalities of the routing layer of a
distributed file system include: 1) maintaining the peer-connection
topology so that specific peers and data objects can be located,
2) responding to queries from both local and remote peers, and
3) communicating with distributed hash tables.
IPFS adopts a Distributed Sloppy Hash Table (DSHT) [19], which is
implemented based on S/Kademlia [22] and Coral [23]. The DSHT
maintained by a peer helps find 1) the network addresses of other
peers, and 2) the group of peers who can serve specific data
objects. A conventional Distributed Hash Table (DHT) stores small
values directly. For larger values, the DSHT stores references,
i.e., the NodeIds of the peers who can serve a block. It should be
noted that IPFS is highly modular, and the DSHT is just a temporary
protocol that can be replaced in the future.
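The distinction between storing small values and storing references can be sketched as below. This is a single-process toy, not the DSHT protocol: the 1024-byte inline limit is an assumed parameter, the XOR distance is the metric used by Kademlia-style tables, and the class and function names are hypothetical.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two identifiers."""
    return a ^ b

def closest_peers(target: int, peers, k: int = 2):
    """Pick the k peers whose identifiers are closest to the target key."""
    return sorted(peers, key=lambda p: xor_distance(p, target))[:k]

class SloppyTable:
    """Toy table: small values are stored inline, large ones by reference."""

    def __init__(self, inline_limit: int = 1024):   # assumed threshold
        self.inline_limit = inline_limit
        self.values = {}      # key -> small value
        self.providers = {}   # key -> set of NodeIds that can serve the block

    def put(self, key, value: bytes, provider: int):
        if len(value) <= self.inline_limit:
            self.values[key] = value
        else:
            self.providers.setdefault(key, set()).add(provider)

    def get(self, key):
        if key in self.values:
            return ("value", self.values[key])
        return ("providers", sorted(self.providers.get(key, ())))

table = SloppyTable()
table.put("small", b"x", provider=1)
table.put("large", bytes(4096), provider=7)
table.put("large", bytes(4096), provider=3)
print(table.get("small"))   # ('value', b'x')
print(table.get("large"))   # ('providers', [3, 7])
print(closest_peers(0b1000, [0b1001, 0b0001, 0b1100]))  # [9, 12]
```

A lookup for a large object therefore returns a list of peers to contact, and the actual block transfer is left to the data-swap layer.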
Swarm implements its routing layer using the Distributed Preimage
Archive (DPA) technique [20]. In the DPA, a source object is
divided into equal-sized chunks, which are then synced to different
nodes. When receiving these content-addressed chunks, other nodes
can sync them to their neighbors that are in the same address
space.
C. NETWORK LAYER
Under the framework of IPFS, an advanced generic P2P solution named
libP2P [17] is used as the network layer. libP2P is developed based
on the BitTorrent DHT implementation. Based on libP2P, IPFS can use
any network protocol to transfer data. If the underlying network is
not stable, IPFS can switch to UTP [24] or SCTP [25]. IPFS achieves
this free switching mainly by using the multiaddr format [7], which
combines addresses with their corresponding protocols.
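To make the idea of combining addresses with protocols concrete, the sketch below parses the human-readable form of a multiaddr into (protocol, value) pairs. It is a simplified illustration: only the text form is handled, the set of value-less protocols is an assumed subset, and parse_multiaddr is a hypothetical helper rather than a libP2P API.

```python
# Protocols that carry no value in this sketch (an assumed, incomplete subset).
NO_VALUE_PROTOCOLS = {"utp", "quic", "ws"}

def parse_multiaddr(addr: str):
    """Parse e.g. '/ip4/127.0.0.1/tcp/4001' into [(protocol, value), ...]."""
    if not addr.startswith("/"):
        raise ValueError("a multiaddr must start with '/'")
    tokens = addr[1:].split("/")
    parsed, i = [], 0
    while i < len(tokens):
        proto = tokens[i]
        if proto in NO_VALUE_PROTOCOLS:
            parsed.append((proto, None))
            i += 1
        else:
            if not proto or i + 1 >= len(tokens):
                raise ValueError("protocol without a value: %r" % proto)
            parsed.append((proto, tokens[i + 1]))
            i += 2
    return parsed

tcp_addr = parse_multiaddr("/ip4/127.0.0.1/tcp/4001")
utp_addr = parse_multiaddr("/ip4/10.0.0.1/udp/4001/utp")
print(tcp_addr)   # [('ip4', '127.0.0.1'), ('tcp', '4001')]
print(utp_addr)   # [('ip4', '10.0.0.1'), ('udp', '4001'), ('utp', None)]
```

Since the transport is spelled out in the address itself, a node can advertise several addresses for the same peer and let the dialer pick whichever protocol stack it supports.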
Swarm relies on the Ethereum P2P network, which comprises three
different protocols: 1) RLPx (Recursive Length Prefix) [26] for
node discovery and secure data transmission, 2) DevP2P [18] for
node session establishment and message exchange, and 3) the
Ethereum subprotocol [27]. DevP2P [18] is inspired by libP2P and
has security properties that are beneficial to Swarm. After
discovering each other through RLPx, Swarm nodes establish TCP
connections and send "HELLO" messages that include the NodeId, the
listening port and other attributes, based on DevP2P. Sessions then
start to transmit data packets. Thanks to the Ethereum ecosystem,
Swarm has a large number of long-term nodes, which support the
robustness and stability of Swarm systems.
D. DATA LAYER
There are four levels of the data model in IPFS:
1) Block: an arbitrary-sized piece of data.
2) List: a collection of blocks or other lists.
3) Tree: a collection of blocks, lists, or other trees.
4) Commit: a snapshot in the version history of a tree.
This data model is similar to that of Git [28]. Based on this data
model, IPFS employs the Merkle DAG to store data. The Merkle DAG
identifies the data and links in each data object using the
multi-hash technique [7], which protects stored data from tampering
and makes files easy to retrieve by path, because each data object
is addressed by a string-formatted path (of the form
/ipfs/object-hash/object-name). To divide a file into independent
blocks, IPFS can use algorithms such as the rsync rolling-checksum
algorithm [29] and Rabin fingerprints [30].
Swarm also defines a set of data structures:
1) Chunk: a fixed-size (maximum 4 KB [7]) piece of data.
2) File: a complete set of chunks.
3) Manifest: a mapping between paths and files, which handles file
collections.
The Chunker, a Swarm component for splitting and recovering files,
is able to process live stream data. After a file is split, its
chunks are used to calculate the Swarm hash, in which a hash
algorithm is applied to obtain the root hash of a Merkle tree. The
root hash is then used to identify the file and to detect
tampering. During this procedure, the hash of each chunk is also
calculated and serves as a reference to that chunk.
[Figure 5 diagram labels: User, Put, Get, Store, Verify, Pledge,
Filecoin, IPFS Network.]
FIGURE 5. Mechanism and position of Filecoin [11], which is adopted
by IPFS and exploits blockchain as its fundamental component.
E. INCENTIVE LAYER
1) Incentive Layer of IPFS
As shown in Figure 5, Filecoin [11] is a blockchain-based digital
payment system that supports digital storage and data retrieval for
IPFS users. It is adopted as the incentive layer of IPFS. There are
two markets in Filecoin: a Storage Market and a Retrieval Market.
The data of the Storage Market is stored on the Filecoin
blockchain, while the data of the Retrieval Market is recorded
off-chain.
These two markets provide data storage and data retrieval services
via a network composed of Storage Clients, Storage Miners,
Retrieval Clients and Retrieval Miners. These participants are
explained as follows.
a) Storage Clients are those who need file storage services. They
are on the demand side of the Storage Market.
b) Storage Miners are the nodes that provide storage to the
Filecoin system using their free disk space. They are on the supply
side of the Storage Market. The transactions that occur in the
Storage Market contribute new blocks to the Filecoin blockchain.
c) Retrieval Clients are those who wish to retrieve a specific
resource from the network. They are on the demand side of the
Retrieval Market.
d) Retrieval Miners are those who provide network resources, such
as bandwidth, to help retrieval clients search for and retrieve
information. They are on the supply side of the Retrieval Market.
To store data in Filecoin, a storage client first submits a bid
order to the Storage Market. If a storage miner intends to take a
bid order, it has to send a request order to the Storage Market.
Once the Storage Market has received a bid order and a request
order, the storage client and the storage miner start to exchange
blocks and submit a signed deal order to the Storage Market. After
that, the storage miner must prove that the data is stored on its
own dedicated physical storage by repeatedly generating proofs of
replication, which are then verified by IPFS.
To retrieve data from Filecoin, a retrieval client similarly first
submits a bid order to the Retrieval Market. Once the Retrieval
Market has received a request order from a retrieval client, the
retrieval miners begin to transport data and submit a signed deal
order to the Retrieval Market to confirm whether the retrieval deal
has succeeded.
2) Incentive Layer of Swarm
In Swarm, the incentive scheme consists of two parts: 1) bandwidth
incentives, and 2) storage incentives. This is because bandwidth
and storage are the two most important resources in a distributed
file system.
Bandwidth Incentives. In Swarm, the service of delivering chunks is
chargeable, and nodes can trade services for services or services
for tokens. To motivate nodes to provide stable services in a
credible context, Swarm proposes the Swarm Accounting Protocol
(SWAP) [8]. First, nodes negotiate the chunk price during the
handshake protocol. Different prices reflect different bandwidth
costs. After the chunk price is set, a chequebook contract is used
to secure the payment. A chequebook contract is a kind of smart
contract that holds an ether (the Ethereum token) balance. Another
secure payment mode, called the channel contract, was later
proposed by Swarm and is described in [8]. Both modes of payment
support secure off-chain transactions and delayed updates. All
transactions are stored in the state of the Ethereum blockchain,
which cannot be tampered with. Finally, nodes establish network
connections and exchange data.
Storage Incentives. Swarm encourages nodes to preserve the data
that has been uploaded to the network. Normally, long-term data
preservation is not realistic: unpopular chunks do not bring enough
profit and may be cleaned up to make room for new chunks. To
guarantee the long-term availability of data, the owner of each
chunk needs to compensate nodes for their storage. To manage
storage deals, Swarm adopts a set of incentive schemes: SWAP, SWEAR
and SWINDLE, which are described as follows.
a) SWAP [8]: Nodes establish connections with their registered
peers, which are the target nodes they want to compensate and sign
contracts with. They can then swap information, including syncing,
receipting, price negotiation and payments.
b) SWEAR [8]: Registered peers are responsible for their promises
of long-term storage, and they must register via the SWEAR (Secure
Ways of Ensuring Archival or Swarm Enforcement And Registration)
[8] contract on Ethereum by uploading a deposit. Peers stand to be
punished and lose their deposit in an on-chain litigation process
if they violate the rules.
c) SWINDLE [8]: Nodes provide signed receipts for stored chunks.
When a dispute arises about whether the rules have been violated,
nodes whose chunk has been lost can submit a challenge to the
SWINDLE (Secured With Insurance Deposit Litigation and Escrow) [8]
contract by uploading the receipt of the lost chunk. Nodes can also
refute a challenge by uploading the chunk or a proof of custody.
The SWINDLE contract decides which party is guilty by checking the
hash of the chunk.
When chunks are being forwarded, a chain of contracts is created
based on the incentive schemes mentioned above, which helps resolve
disputes between nodes.
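The challenge and refutation flow of SWEAR and SWINDLE can be sketched as an in-memory toy. This is not the Swarm contract code: receipts are unsigned, SHA-256 stands in for the Swarm chunk hash, deposits are plain integers, and every class and method name below is hypothetical.

```python
import hashlib
from dataclasses import dataclass

def hash_chunk(chunk: bytes) -> str:
    # Assumed stand-in for the content address of a chunk.
    return hashlib.sha256(chunk).hexdigest()

@dataclass(frozen=True)
class Receipt:
    chunk_hash: str   # which chunk the storer promised to keep
    storer: str       # who made the promise (signature omitted in this sketch)

class LitigationContract:
    def __init__(self):
        self.deposits = {}     # SWEAR-style registration: storer -> deposit
        self.challenges = {}   # open challenges: chunk hash -> receipt

    def register(self, storer: str, deposit: int):
        self.deposits[storer] = deposit

    def challenge(self, receipt: Receipt):
        """A node uploads the receipt of a chunk it believes was lost."""
        self.challenges[receipt.chunk_hash] = receipt

    def refute(self, chunk: bytes) -> bool:
        """The storer uploads the chunk; the hash check closes the challenge."""
        return self.challenges.pop(hash_chunk(chunk), None) is not None

    def settle(self, receipt: Receipt) -> int:
        """Slash the deposit if the challenge was never refuted."""
        if self.challenges.pop(receipt.chunk_hash, None) is None:
            return 0
        return self.deposits.pop(receipt.storer, 0)

contract = LitigationContract()
contract.register("storer-1", 100)
receipt = Receipt(hash_chunk(b"chunk data"), "storer-1")

contract.challenge(receipt)
print(contract.refute(b"chunk data"))   # True: the storer still has the chunk
print(contract.settle(receipt))         # 0: nothing is slashed

contract.challenge(receipt)
print(contract.refute(b"wrong data"))   # False: the hash does not match
print(contract.settle(receipt))         # 100: the deposit is forfeited
```

The key point is that the dispute is decided by a hash comparison alone, so the contract never needs to trust either party's claim about what was stored.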
F. DATA-SWAP LAYER
IPFS adopts BitSwap [5] as its data-swap layer. BitSwap is based on
the BitTorrent protocol. In detail, BitSwap nodes provide the
blocks they hold to each other directly, aiming to spread the
blocks within their group. The debt of a node rises when it
receives blocks it wants and decreases when it contributes blocks
that other nodes want. Thus, BitSwap encourages nodes to cache and
contribute blocks.
To guard against nodes that never share, each BitSwap node checks
the debt of a peer before exchanging blocks with it. BitSwap nodes
also keep ledgers that record the transfer history, and exchange
ledgers with each other when establishing connections. This
exchange policy protects BitSwap ledgers from tampering and
isolates malicious nodes that intentionally lose their ledgers.
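A debt check of this kind can be sketched as a per-peer ledger. The debt ratio and the sigmoid send probability below are modeled on the example strategy suggested in the IPFS white paper [7]; they are one possible policy with illustrative constants, not a mandated part of BitSwap, and the class name is hypothetical.

```python
import math

class PeerLedger:
    """Ledger a node keeps about one peer (byte counts only in this sketch)."""

    def __init__(self):
        self.bytes_sent = 0   # bytes we have sent to the peer
        self.bytes_recv = 0   # bytes we have received from the peer

    def debt_ratio(self) -> float:
        # A large ratio means the peer has taken much more than it has given.
        return self.bytes_sent / (self.bytes_recv + 1)

    def send_probability(self) -> float:
        # Sigmoid policy: generous to new peers, strict with heavy debtors.
        r = self.debt_ratio()
        return 1 - 1 / (1 + math.exp(6 - 3 * r))

fresh = PeerLedger()
print(round(fresh.send_probability(), 4))     # 0.9975

freeloader = PeerLedger()
freeloader.bytes_sent = 4000
freeloader.bytes_recv = 999
print(freeloader.debt_ratio())                # 4.0
print(round(freeloader.send_probability(), 4))  # 0.0025
```

The probability drops sharply once a peer's debt ratio passes about 2, so nodes that never contribute are quickly starved while newcomers can still bootstrap.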
In Swarm, nodes store chunks and sell them for profit when they
receive a data-retrieval request. If a node does not have the
target chunk named in the retrieval request, it passes the request
to its nearest neighbor node. Receipts play an important role in
managing storage transactions. When Swarm nodes interact with any
contract, receipts are generated and stored in Swarm. In this way,
the source of a chunk is accessible, and commitments can be traced
in case of litigation.
G. CONSENSUS LAYER
A consensus mechanism is critical for every blockchain system. In a
large distributed network, multiple peers form a network cluster,
normally through asynchronous communications. The network can
become congested, causing error messages to propagate all over the
system. Thus, peers may fail if they cannot communicate with each
other under a consistent view of the network [31]. Therefore, it is
necessary to define a resilient consensus protocol for distributed
file systems that can work in unreliable asynchronous networks. The
aim of such a consensus protocol is to ensure that each peer
reaches a secure, reliable and consistent state without a
centralized synchronizer.
In the following, we review several typical consensus protocols
proposed by recent representative studies.
1) "Expected Consensus" Algorithm of Filecoin
Different from Ethereum, which only has one main chain, Filecoin
[11] contains not only a single main chain but also a storage
market. Users in Filecoin interact with the storage market, and
these interactions are stored in the main-chain ledger. Three
proofs that play an important role in the consensus process of
Filecoin are summarized as follows.
a) Transaction Proof: After a miner and a user have reached a deal,
the main chain locks the tokens of the user and the deposit of the
miner. The main chain also records information about the
transaction, including the hard disk sector of the miner, the
details of the deposit, the transaction fee and the storage
deadline.
b) Proof-of-Replication (PoRep): A file is divided into pieces, and
each piece is accepted by a storage miner. At this point, a storage
miner may pretend to store a piece (this type of behavior is called
a generation attack [32]). Furthermore, a miner may obtain a piece
from another peer instead of storing it itself (this type of
behavior is called an outsourcing attack [32]). Another case is
that a miner may create multiple fake peers and pretend to store
several replicas of a file piece (this type of behavior is called a
Sybil attack [32]). To prevent these attacks, Filecoin requires
each miner to submit a proof of replication to the main chain. The
Proof-of-Replication ensures that each miner actually and
independently stores its file pieces.
c) Proof-of-Spacetime (PoSt): To prove that miners keep storing a
file piece for the effective duration of a transaction, each miner
has to submit proofs of spacetime to the main chain regularly. In
the current design of Filecoin, a proof of spacetime is submitted
every 20,000 blocks (roughly 6 days of mining on average) [21] to
prove that the file piece is not missing. The storage market
validates the proofs uploaded by miners and decides whether to
punish miners every 100 blocks (50 minutes of mining on average)
[21].
The consensus algorithm of Filecoin is called Expected Consensus
[21], in which a ticket is computed in each round of the consensus
process. By comparing the ticket value with the effective storage
of each peer, one or several peers are elected as the leaders of
the round. A leader selects the transactions to pack into the new
block it generates. Once a block is packed, it is sent to other
peers for synchronization. Transactions in a block are executed by
the Ethereum virtual machine (EVM) [33], and the state of each
account is updated.
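The comparison between a ticket and a peer's effective storage can be sketched as follows. This is a toy model under stated assumptions, not Filecoin's election procedure: the ticket is derived with SHA-256 from a shared round seed and the peer name, whereas a real deployment would need an unpredictable, verifiable ticket, and all names are hypothetical.

```python
import hashlib

TICKET_SPACE = 2 ** 256   # number of possible SHA-256 outputs

def ticket(round_seed: bytes, peer_id: str) -> int:
    """Per-round ticket of a peer (assumed derivation for this sketch)."""
    digest = hashlib.sha256(round_seed + peer_id.encode()).digest()
    return int.from_bytes(digest, "big")

def is_leader(t: int, power: int, total_power: int) -> bool:
    """Elected when ticket / TICKET_SPACE < power / total_power (integer form)."""
    return t * total_power < power * TICKET_SPACE

def elect_leaders(round_seed: bytes, storage: dict):
    """Return the peers elected in this round: possibly none, one, or several."""
    total = sum(storage.values())
    return [peer for peer, power in sorted(storage.items())
            if is_leader(ticket(round_seed, peer), power, total)]

storage = {"miner-a": 60, "miner-b": 30, "miner-c": 10, "miner-d": 0}
for round_number in range(3):
    seed = b"round-%d" % round_number
    print(round_number, elect_leaders(seed, storage))
```

Each peer's chance of winning a round equals its share of the total storage, so on average one leader is expected per round, although a given round may elect no leader or several.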
2) Consensus of Ethereum
There are four stages of Ethereum: Frontier, Homestead, Metropolis
and Serenity. In the first three stages, proof-of-work (PoW) [34]
is adopted as the consensus mechanism of Ethereum, while in the
fourth stage, proof-of-stake (PoS) [35] will be adopted.
In PoW, each miner packs transactions from the transaction pool and
constructs a new block in sequential order. The miner then
repeatedly adjusts a nonce value, which is fed into the PoW
function [34] together with the block header. A target indicator is
also computed according to the difficulty of the blockchain. By
comparing the result of this function with the target indicator,
the miner decides whether it has won the consensus process. When a
miner confirms that it has won, it broadcasts its new block to
other peers. Upon receiving a block from another peer, a miner
stops computing and validates the nonce value of the newly received
block. Each transaction of the new block is executed by the EVM.
After all transactions included in the new block have been
processed, the state of the peer is updated [33]. Currently, the
average time of consensus in Ethereum is around 15 seconds, which
ensures the consistency of all peers [36].
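The nonce search and the comparison with a difficulty-derived target can be sketched as follows. This is a simplified illustration: SHA-256 replaces Ethereum's actual PoW function, the header is an arbitrary byte string, and the difficulty is kept tiny so that the loop finishes quickly; the function names are hypothetical.

```python
import hashlib

def pow_hash(header: bytes, nonce: int) -> int:
    """Result of the PoW function for a header and nonce (SHA-256 stand-in)."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def target_from_difficulty(difficulty: int) -> int:
    """A higher difficulty yields a smaller target, i.e. fewer winning nonces."""
    return 2 ** 256 // difficulty

def mine(header: bytes, difficulty: int, max_nonce: int = 2 ** 32):
    """Try nonces in order until the hash falls at or below the target."""
    target = target_from_difficulty(difficulty)
    for nonce in range(max_nonce):
        if pow_hash(header, nonce) <= target:
            return nonce
    return None

def validate(header: bytes, nonce: int, difficulty: int) -> bool:
    """A receiving peer needs a single hash evaluation to check a block."""
    return pow_hash(header, nonce) <= target_from_difficulty(difficulty)

header = b"example block header"
difficulty = 1000                      # about 1000 attempts expected on average
nonce = mine(header, difficulty)
print(validate(header, nonce, difficulty))   # True
```

The asymmetry is the point of the scheme: finding a nonce takes many attempts on average, while checking a received block takes one hash, which is why a miner can cheaply validate a competitor's block and move on.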
3) Consensus Algorithms of Other File Systems
Storj’s Proofs of Retrievability: Designed as a decentral-
ized cloud object storage, Storj [9] proposed Proof of Storage
in its first-version white paper. Interestingly, we found that
in version 2.0 of Storj’s white paper [37], the consensus
algorithm has been changed to Proofs of Retrievability [38].
Proofs of Retrievability aim at ensuring that a certain piece
of a file exists on a host. An ideal proof offers high file
availability while keeping both the message size and the
pre-processing minimal. According to the new white paper
[37], Proof of Retrievability is still under ongoing research
and implementation. We then analyze the reason behind the
change of Storj’s consensus algorithm. It is probably because
the current reputation systems, including Proof of Storage,
fail to solve the cheating client attacks [37]. In such
cheating attacks, it is hard to independently verify whether
a privately verifiable audit under a reputation system was
issued as claimed. Thus, Proof of Storage lacks publicly
verifiable practices.
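To make the idea of auditing a host concrete, the following sketch shows a simple salted-hash challenge-response audit. It is an assumption-laden illustration and not the scheme specified in Storj's white papers: the data owner precomputes a few (salt, expected answer) pairs before uploading, and all function names are hypothetical.

```python
import hashlib
import os
from typing import List, Tuple

Challenge = Tuple[bytes, bytes]  # (salt, expected answer)


def make_challenges(data: bytes, count: int) -> List[Challenge]:
    """Owner side: precompute challenges before handing the data to a host."""
    challenges = []
    for _ in range(count):
        salt = os.urandom(16)
        challenges.append((salt, hashlib.sha256(salt + data).digest()))
    return challenges


def respond(stored: bytes, salt: bytes) -> bytes:
    """Host side: a correct answer requires the complete stored data."""
    return hashlib.sha256(salt + stored).digest()


def audit(challenge: Challenge, answer: bytes) -> bool:
    """Owner side: compare the host's answer with the precomputed one."""
    _salt, expected = challenge
    return answer == expected


data = b"file shard" * 100
challenges = make_challenges(data, count=3)
salt = challenges[0][0]
print(audit(challenges[0], respond(data, salt)))
```

Note that only the owner can check the answer in this sketch, which mirrors the private-verifiability limitation discussed above.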
PPIO’s 4 Proof Schemes: PPIO [10] exploits four different
proof algorithms, i.e., PoRep, PoSt, Proof of Download
(PoD), and Light Proof of Capacity (LPoC), of which PoD
and LPoC are two brand-new proof mechanisms created by
PPIO. PoD particularly supports media-streaming services.
LPoC is designed to cold-start storage miners. However,
because LPoC technically occupies hard disk resources
without storing anything of real value, the PPIO team has
decided to abandon the implementation of LPoC.
H. SUMMARY OF THE LAYERED STRUCTURE
As efficient decentralized storage layers of the
next-generation Internet, IPFS and Swarm use similar
technologies. Both provide low-latency data retrieval,
fault-tolerance guarantees and decentralized/distributed
storage solutions.
In the identities layer, IPFS uses the multi-hash technique
[7], which stores both the hash function and the hash digest,
while Swarm uses the account address of Ethereum directly. In
the network layer, Swarm adopts the secure and stable network
of Ethereum, while IPFS uses libP2P, which is a more generic
solution. The incentive layer of Swarm relies on the smart
contracts of Ethereum, which support automated auditing and
delayed payment. This saves transaction costs for Swarm while
remaining secure. Filecoin relies on the proofs and consensus
of a blockchain, which is an overuse of blockchain. The PoW
consensus of Swarm has stood the test of time, while the
Expected Consensus of IPFS still awaits the test of the real
world.
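The multi-hash technique used in the identities layer can be illustrated with a short sketch. A multihash prefixes the digest with a code for the hash function and with the digest length; for SHA2-256 the code is 0x12 and the length is 0x20 (32 bytes). The helper names below are hypothetical, and the sketch assumes codes and lengths below 128 so that each fits in a single byte.

```python
import hashlib
from typing import Tuple

SHA2_256_CODE = 0x12  # multihash function code for SHA2-256


def multihash_sha256(data: bytes) -> bytes:
    """Encode <function code><digest length><digest> for SHA2-256."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, len(digest)]) + digest


def parse_multihash(mh: bytes) -> Tuple[int, bytes]:
    """Split a single-byte-prefixed multihash into (function code, digest)."""
    code, length = mh[0], mh[1]
    digest = mh[2:2 + length]
    if len(digest) != length:
        raise ValueError("truncated multihash")
    return code, digest


mh = multihash_sha256(b"hello")
print(mh[:2].hex(), len(mh))
```

Because the function code travels with the digest, a system can later adopt a different hash function without making existing identifiers ambiguous.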
Swarm directly inherits the technology design of Ethereum.
For example, the identities layer, network layer and
consensus layer of Swarm are the same as those of Ethereum.
Swarm thus benefits from Ethereum’s large ecosystem, its
secure and living network, and its reliable funding sources.
IPFS is highly modular and can replace existing components
with state-of-the-art technology. In conclusion, the
technology of Swarm is more stable, while the technology of
IPFS is more advanced.
IV. CUTTING-EDGE STUDIES OF BLOCKCHAIN-BASED
DISTRIBUTED FILE SYSTEMS
In this section, we discuss recent cutting-edge studies of
blockchain-integrated distributed file systems, with an
emphasis on IPFS and Swarm.
A. SCALABILITY
As the number of transactions in blockchain networks
increases, each peer has to periodically validate and store a
growing volume of transactions. This places a heavy storage
and performance burden on each peer. In addition, the limited
size of each block and the latency of reaching consensus must
be taken into account, because these factors delay
transactions. Meanwhile, as the cluster size and the number
of data replicas in the network grow, the performance of IPFS
and Swarm degrades severely.
In this part, we review several studies that address the
scalability issues of distributed file systems, mainly
focusing on IPFS and Swarm. These works can be classified
into two categories: 1) scalability evaluation, and 2) storage
optimization. For convenient identification, we summarize
these studies in Table 2.
1) Scalability Evaluation
Although the performance of IPFS has been questioned by
academia, we found only a few studies that evaluate or
discuss the scalability of IPFS. The representative papers are
reviewed as follows. Wennergren et al. [39] discuss and
analyze the scalability performance of IPFS. They conducted
simulations with varying cluster sizes and replication
factors. The simulation results show that the average download
time of data stored in IPFS increases as the cluster size and
replication factor grow. Consequently, the response time
among peers in an IPFS network grows, and the download
speed drops as well. The authors mentioned that the limited
bandwidth of each IPFS instance could be one of the
critical reasons for the low scalability of IPFS.
Recently, Shen et al. [40] conducted a systematic
evaluation of the IPFS storage system by deploying real
geographically distributed instances on the Amazon EC2 cloud.
The authors focus on the data I/O operations from a
client’s perspective. The extensive measurement results show
that the access patterns of clients can severely affect the I/O
performance of IPFS. Further quantitative analysis indicates
that downloading and resolving operations could be
bottlenecks when clients read objects from remote nodes.
To address the traceability problem of distributed file
systems, Nyaletey et al. [41] proposed BlockIPFS, a solution
combining blockchain and IPFS that can trace and audit the
access events of each file on IPFS. The authors conducted a
group of experiments to evaluate the scalability of BlockIPFS
by varying the number of nodes. They measured the latency of
uploading, downloading, and reading the transactions of each
file stored in the system. The measurement results show that
increasing the number of nodes does not cause a drastic
growth in transaction times. Unfortunately, the scale of
their experiments is too small, since the number of nodes in
the BlockIPFS system ranges only from 3 to 27. Thus, this
group of experiments leaves the scalability of their system
unknown under a very large-scale deployment.

| Category | Reference | File System | Methodology |
|---|---|---|---|
| Scalability Evaluation | Wennergren et al. [39] | IPFS | Analyzed scalability performance by varying cluster sizes and replication factors. |
| Scalability Evaluation | Shen et al. [40] | IPFS | Evaluated the storage system of IPFS on the Amazon EC2 cloud, emphasizing the data I/O operations from a client’s point of view. |
| Scalability Evaluation | BlockIPFS [41] | IPFS | Nyaletey et al. proposed a solution integrating blockchain and IPFS, which can trace the access events of each file on IPFS. They evaluated the scalability of BlockIPFS by varying the number of system nodes. |
| Storage Optimization | Chen et al. [42] | IPFS | Proposed a new storage model based on Zigzag code [43] for the block storage scheme adopted by IPFS. |
| Storage Optimization | Norvill et al. [44] | IPFS | Proposed an off-chain approach that moves contract-generation code to the IPFS database, aiming to improve the storage performance of the distributed file system. |
| Storage Optimization | Design of Ethereum [45] | Swarm | Advocated that a chain of contracts in Swarm should be configured in off-chain storage. |
| Storage Optimization | White paper [9] | Storj | Storj encrypts data using a sharding technique and enables data availability based on Reed-Solomon erasure coding [46]. |
| Storage Optimization | Lightning Network [47], Plasma [48] | Off-chain file systems | Allow blockchain clients to conduct transactions in an off-chain manner, aiming to offload the storage pressure of the main chain. |

TABLE 2. Scalability-related studies of distributed file systems.
2) Storage Optimization
Erasure Codes. To guarantee high data availability, some
distributed file systems, e.g., Sia [49] and Storj [9], adopt
erasure codes in their storage strategy. In a typical (N,
K) erasure code [50], an original file is divided into
a number K (> 1) of blocks. These blocks are then encoded
into a larger number N (> K) of coded blocks. Out of
those N encoded blocks, any K of them can reconstruct the
original file. Thus, exploiting erasure codes can improve the
storage resilience of distributed file systems. For example,
to improve the user experience of a P2P file system, Chen et
al. [42] proposed a new storage model based on Zigzag codes
[43] and blockchain techniques. The new storage model aims at
improving the block storage strategy adopted by IPFS.
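The "any K of N" property can be demonstrated with the simplest possible erasure code, a single XOR parity block, which gives N = K + 1. This sketch is only an illustration under that assumption: it is neither the Reed-Solomon coding used by Storj nor the Zigzag code of [43], and the function names are hypothetical.

```python
from functools import reduce
from typing import Dict, List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k blocks and append one parity block (N = k + 1)."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def decode(available: Dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the file from any k of the k + 1 coded blocks, keyed by index."""
    if len(available) < k:
        raise ValueError("need at least k coded blocks")
    blocks = dict(available)
    missing = [i for i in range(k) if i not in blocks]
    if missing:
        # XOR of the k surviving blocks (data and parity) restores the lost one.
        blocks[missing[0]] = xor_blocks(list(blocks.values()))
    return b"".join(blocks[i] for i in range(k))[:length]


data = b"blockchain-based DFS"
coded = encode(data, k=4)
survivors = {i: b for i, b in enumerate(coded) if i != 2}  # block 2 is lost
print(decode(survivors, k=4, length=len(data)) == data)
```

A single parity block tolerates the loss of one coded block; codes with N - K > 1, such as Reed-Solomon codes, tolerate more losses at the cost of more complex arithmetic.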
Storing Data Off-Chain. On the other hand, we also found
other optimized solutions related to the storage of
transactions and smart contracts. For instance, to improve the
storage performance of distributed file systems, Norvill et al.
[44] proposed a solution that moves the contract-generation
code off-chain by treating IPFS as a storage database. In
their proposal, Ethereum loads complex contract code by
sending a simple hash value to IPFS peers. In this way, system
clients only have to send hash values rather than the full
code when performing fast synchronizations. Thus, the bulk of
the network traffic can be reduced. In the design of Swarm
[45], a chain of contracts is configured to maintain the basic
operations. These contracts increase the data size of the
blockchain such that Swarm is hard to operate as a full
blockchain ledger. Thus, according to reference [44], we know
that the developers of Ethereum have been working on moving
Swarm towards off-chain storage. Some other off-chain
solutions such as Lightning Network [47] and Plasma [48]
allow participants to execute transactions in an off-chain
manner, such that a large portion of on-chain transactions and
smart contracts can be offloaded from the main chain. Thus,
integrating off-chain techniques will bring new solutions to
the storage policy of future distributed file systems.
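The off-chain pattern discussed above, in which only a hash is kept on-chain while the bulky content lives in a content-addressed store, can be sketched as follows. The in-memory `OffChainStore` class is a hypothetical stand-in for IPFS, and `on_chain_record` is a plain dictionary standing in for blockchain state; neither reflects the actual design of [44].

```python
import hashlib
from typing import Dict


class OffChainStore:
    """In-memory stand-in for a content-addressed store such as IPFS."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store content under the hex SHA-256 of its bytes and return the key."""
        key = hashlib.sha256(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        """Fetch content and verify that it still matches its address."""
        content = self._objects[key]
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError("content does not match its hash")
        return content


store = OffChainStore()
contract_code = b"placeholder bytes for a large contract" * 50

# Only the 64-character hash is recorded on-chain, not the code itself.
on_chain_record = {"code_hash": store.put(contract_code)}
print(len(contract_code), len(on_chain_record["code_hash"]))
```

Peers that synchronize the chain exchange the short hash and fetch the full code from the store only when they need it, verifying it against the hash on retrieval.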
B. PRIVACY
In Swarm and IPFS, data uploaded to the distributed file
systems by users is divided into several pieces, which are
then stored on different peers. Although the uploaded data can
be encrypted, the data content stored in the network is
accessible to every peer. Besides, according to the design of
IPFS and Swarm, the transactions that record the activities
of a peer can be easily collected. A user’s information can be
revealed through graph analysis of these transactions. For
example, according to [51], a client can be identified through
the peers it directly connects to. Thus, the transactions
stored in the blockchain behind a distributed file system are
publicly visible.
To address these issues, a number of efforts have been
devoted to privacy preservation in distributed file systems.
Through an extensive literature review, we have found many
privacy-preserving solutions, mechanisms and applications.
Some representative works are classified into two main
categories: 1) Access Control, and 2) Peer Anonymity. We also
compare several attributes of these studies in Table 3.
1) Access Control
In distributed file systems such as IPFS and Swarm, users
cannot freely restrict data sharing to a specific group of
peers, although such a restriction is necessary when privacy
issues are taken into account. To provide access control
when sharing files, Steichen et al. [52] proposed a modified
version of IPFS named acl-IPFS, which is based on Ethereum.
An acl-IPFS peer consists of an IPFS peer and an Ethereum
account. The uploading, downloading and transferring of
data in IPFS networks are achieved through interaction with
smart contracts residing in Ethereum. Smart contracts
dynamically maintain the access control lists of each file
in acl-IPFS. Users can grant or revoke a permission of a
file through smart contracts, too.

| Category | Reference | Technology Foundation | Privacy Level | Functionality |
|---|---|---|---|---|
| Access Control | acl-IPFS [52] | Smart contract, Ethereum, IPFS | Strong | Provided an access control list for the files shared in the proposed acl-IPFS system. |
| Access Control | Ali et al. [53] | Consortium blockchain, Sidechain, IPFS | Strong | Proposed a modular consortium architecture for privacy preservation of IoT data. |
| Access Control | Nizamuddin et al. [54] | IPFS, Ethereum smart contract | Strong | Proposed a solution to the authenticity of original online published digital works. |
| Access Control | Wang et al. [55] | IPFS, Ethereum and Attribute-Based Encryption (ABE) | Unclear | Proposed a smart contract-based access control mechanism for decentralized storage systems. |
| Access Control | Naz et al. [56] | IPFS, Smart contract, RSA signatures | Strong | Proposed an IPFS-based secure data sharing framework to deliver digital assets. |
| Access Control | Nyaletey et al. [41] | IPFS, Hyperledger Fabric | Strong | Proposed a solution, BlockIPFS, to trace the access events and provide audit access to files stored on IPFS. |
| Access Control | Huang et al. [31] | Ethereum, local databases | Strong | Proposed an Ethereum-based access control and trustworthiness protection for multiple-domain participants. |
| Peer Anonymity | Zerocoin [57] | Bitcoin laundry system, zero-knowledge proof | Limited | Offered limited anonymity to the Bitcoin account addresses based on zero-knowledge proof. |
| Peer Anonymity | E-Voting System [58] | Bitcoin laundry system, Zerocoin | Limited | Provided an electronic-voting system based on Zerocoin, aiming to solve the privacy issues in the original Bitcoin. |
| Peer Anonymity | Zerocash [59] | Zero-knowledge Argument, decentralized anonymous payment scheme | Strong | Provided strongly anonymous transactions by covering up the origins, destinations and the total amount of a payment. |
| Peer Anonymity | Mixcoin [60] | Accountable mixes, an independent cryptographic accountability layer | Strong | Proposed an anonymous payment protocol that can be deployed immediately with no changes to Bitcoin. |
| Peer Anonymity | ReportCoin [61] | IPFS, blockchain | Strong | Proposed a blockchain-based reporting system for the management of smart cities. |

TABLE 3. Privacy-related studies of distributed file systems.

Aiming to enhance the privacy preservation of IoT data, Ali et al. [53]
proposed a modular consortium architecture by combining
the techniques of IoT and blockchains. The proposed
architecture can provide decentralized management for IoT
data by exploiting the advantages of blockchain and IPFS.
Nizamuddin et al. [54] studied the authenticity of online
digital and multimedia content. To provide a proof of
originality, the authors proposed an authenticity solution
based on IPFS and smart contracts. Based on IPFS, Ethereum and
attribute-based encryption (ABE) technologies, Wang et al.
[55] investigated a data storage and sharing mechanism for
distributed storage frameworks, in which no trusted private
key generator is required. To achieve fine-grained access
control, a data owner can distribute secret keys to other
users and encrypt his data under a certain access policy.
Then, aiming at the transparency and quality of data, Naz et
al. [56] proposed a secure digital-asset sharing framework
that combines IPFS, blockchain and encryption mechanisms.
Next, Huang et al. [31] proposed an Ethereum-based
network-view sharing platform, which can bring global
trustworthiness to multiple domains such as different IoT
domain networks. In particular, the domain view of each
partner is stored in its local database, while the
Ethereum-based system provides access control and
trustworthiness protection over all participants.
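The access-control idea behind systems such as acl-IPFS, in which a smart contract keeps a per-file access control list and lets the owner grant or revoke permissions, can be sketched with a plain Python class. This is not the contract of [52]: the class `AccessControlList`, its method names, and the use of string identifiers for files and accounts are all assumptions made for illustration.

```python
from typing import Dict, Set


class AccessControlList:
    """Plain-Python stand-in for a per-file ACL kept by a smart contract."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._grants: Dict[str, Set[str]] = {}

    def register(self, file_hash: str, owner: str) -> None:
        """Record the owner of a newly uploaded file."""
        if file_hash in self._owners:
            raise ValueError("file already registered")
        self._owners[file_hash] = owner
        self._grants[file_hash] = {owner}

    def _require_owner(self, file_hash: str, caller: str) -> None:
        if self._owners.get(file_hash) != caller:
            raise PermissionError("only the owner may change permissions")

    def grant(self, file_hash: str, caller: str, user: str) -> None:
        """Owner allows another account to access the file."""
        self._require_owner(file_hash, caller)
        self._grants[file_hash].add(user)

    def revoke(self, file_hash: str, caller: str, user: str) -> None:
        """Owner withdraws a previously granted permission."""
        self._require_owner(file_hash, caller)
        self._grants[file_hash].discard(user)

    def can_access(self, file_hash: str, user: str) -> bool:
        """Checked by a storage peer before it serves the file."""
        return user in self._grants.get(file_hash, set())


acl = AccessControlList()
acl.register("QmExampleHash", owner="alice")
acl.grant("QmExampleHash", caller="alice", user="bob")
print(acl.can_access("QmExampleHash", "bob"), acl.can_access("QmExampleHash", "eve"))
```

In an on-chain deployment the caller identity would come from the transaction signature rather than from an argument, so that it cannot be spoofed.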
2) Peer Anonymity
The privacy preservation of blockchain peers has attracted
particular attention in recent years. For example, considering
that the original Bitcoin system has significant limitations
on the privacy of Bitcoin peers, Miers et al. [57] proposed
Zerocoin, which offers limited anonymity to Bitcoin account
addresses based on zero-knowledge proofs. However, Zerocoin
cannot guarantee full anonymity, because at least the number
of minted and spent coins and the denomination of
transactions are visible to all users of the system. Using
Zerocoin, Takabatake et al. [58] then proposed a new laundry
middleware for Bitcoin. The authors mentioned that, in this
middleware, the origin of transactions can be hidden and
miners are able to validate transactions without signatures.
However, the destination of a transaction and the amount of a
payment are still exposed to users. Thus, the proposed
e-voting system also offers only limited anonymity. Moreover,
the execution speed of this voting system is an obstacle that
is hard to overcome. To address this problem, Zerocash [59]
is claimed to achieve strong anonymity for payments, because
it hides the transaction amount and the values of user-held
coins by invoking Zero-Knowledge Succinct Non-interactive
Arguments of Knowledge (ZK-SNARKs) [62]. Mixcoin [60]
provides a mixing service that transfers funds from multiple
source addresses to multiple destination addresses. Thus,
the relationship between two accounts is hard to reveal.
Zou et al. [61] studied an incentive-based anonymous
reporting mechanism built on blockchain and IPFS. The
accounts and transactions stored in ReportCoin are open,
transparent, and tamper-resistant. Thus, the anonymity of
reporting sources can be protected with a high guarantee.
However, the proposed ReportCoin was only evaluated through
simulations, which makes this work less convincing. The
practicality of the proposed
incentive mechanism requires more convincing proofs by real
implementations.
V. OPEN ISSUES, CHALLENGES & FUTURE
DIRECTIONS
In this section, we discuss open issues, challenges and future
directions of distributed file systems from four
perspectives: Scalability, Privacy, Applications and Big Data.
A. SCALABILITY ISSUES
1) Scalability Performance
We have reviewed some representative studies [39] related
to the scalability performance measurement of DFSs in the
previous section. These existing works offer some insights
into DFSs. For example, Wennergren et al. [39] mentioned
that the limited bandwidth of each IPFS instance could be one
of the critical reasons for the low scalability of IPFS. The
quantitative analysis [40] of the systematic evaluation of the
IPFS storage system indicates that downloading and resolving
operations could be bottlenecks when IPFS clients read
objects from remote nodes. Nyaletey et al. [41] evaluated the
scalability of the proposed BlockIPFS by varying the number
of nodes. However, the scale of their experiments is too
small, leaving the scalability performance of their system
unclear under a very large-scale deployment.
Through the studies [39], [40], [44], we see that the current
distributed file systems, such as IPFS and Storj, are still
immature. For example, IPFS still faces some notable
shortcomings, including the bottlenecks of resolving and
downloading, and the high latency of I/O operations. Thus, to
support large-scale commercial applications, IPFS must solve
a number of challenges such as storage optimization,
geo-distributed deployment of nodes, and file request
performance.
From the storage-optimization perspective, although
conventional erasure codes such as Zigzag codes [43] can be
used to improve the storage efficiency of IPFS-based systems,
some open issues should not be ignored. For example,
reconstructing original files could consume a large amount of
both disk I/O and bandwidth at some associated peer nodes.
Another critical problem that IPFS needs to address is how
to update the contents already stored in its system. This is
because all data stored in the IPFS network is addressed by a
series of hashes. Once a file stored in IPFS changes, its
hash address changes, too. Therefore, an efficient update
mechanism should be developed for IPFS.
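The update problem can be made concrete with a small sketch. Content addresses are modeled here as plain SHA-256 hex digests, which is a simplification of IPFS addressing, and the `NameRegistry` class is a hypothetical indirection layer (a stable name that points to the latest address), shown only as one conceivable direction rather than a mechanism proposed in the reviewed studies.

```python
import hashlib
from typing import Dict


def address(content: bytes) -> str:
    """Simplified content address: the hex SHA-256 digest of the content."""
    return hashlib.sha256(content).hexdigest()


class NameRegistry:
    """Hypothetical mutable pointer from a stable name to the latest address."""

    def __init__(self) -> None:
        self._latest: Dict[str, str] = {}

    def publish(self, name: str, content: bytes) -> str:
        """Point the name at the address of the new content version."""
        addr = address(content)
        self._latest[name] = addr
        return addr

    def resolve(self, name: str) -> str:
        """Return the address of the most recently published version."""
        return self._latest[name]


registry = NameRegistry()
v1 = registry.publish("annual-report", b"draft, version 1")
v2 = registry.publish("annual-report", b"draft, version 2")

# Any edit yields a different address, so links to v1 do not follow the update.
print(v1 != v2, registry.resolve("annual-report") == v2)
```

The sketch shows why updates are costly: every change produces a new address that must be propagated, and any indirection layer introduces an extra resolution step.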
Finally, to improve the scalability of blockchain-based
DFSs, we believe that developing new solutions that improve
the efficiency of the layers in the DFS structure is a
promising direction. We hope to see related studies proposed
soon.
2) Performance Measurement Methodology
The performance of IPFS and Swarm with respect to Quality of
Service (QoS) metrics still needs to be measured more widely
and deeply in the future. Especially when integrating them
into business models, users want to know which one (either
IPFS or Swarm) best matches their requirements. Fortunately,
Zheng et al. [63] proposed a real-time performance monitoring
framework for blockchain systems. This work evaluated four
well-known blockchain systems, i.e., Ethereum [12], Parity
[64], Cryptape Inter-enterprise Trust Automation (CITA) [65]
and Hyperledger Fabric [66], with respect to the QoS metrics
of transactions per second, average response delay,
transactions per CPU, transactions per memory second,
transactions per disk I/O and transactions per network data.
Such comprehensive performance evaluation results give us
insightful viewpoints on these four blockchain systems. Their
experimental logs and technical report [67] can be found at
http://xblock.pro. In addition, Curran et al. [68] mentioned
that they plan to analyze the performance of IPFS while a
website is under an unexpected surge of visitors. However,
we could not find a subsequent technical report of their
measurements.
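Two of the QoS metrics named above, transactions per second and average response delay, can be computed from a transaction log as sketched below. The definitions are simplified assumptions and are not taken from [63]: throughput is measured over the interval from the first submission to the last confirmation, and the `TxRecord` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TxRecord:
    submitted_at: float  # seconds since the start of the measurement
    confirmed_at: float  # seconds since the start of the measurement


def transactions_per_second(records: List[TxRecord]) -> float:
    """Confirmed transactions divided by the observed time window."""
    if not records:
        return 0.0
    start = min(r.submitted_at for r in records)
    end = max(r.confirmed_at for r in records)
    return len(records) / (end - start) if end > start else 0.0


def average_response_delay(records: List[TxRecord]) -> float:
    """Mean time between submitting a transaction and its confirmation."""
    if not records:
        return 0.0
    return sum(r.confirmed_at - r.submitted_at for r in records) / len(records)


log = [TxRecord(0.0, 1.0), TxRecord(0.5, 2.0), TxRecord(1.0, 4.0)]
print(transactions_per_second(log), average_response_delay(log))
```

The resource-normalized metrics (per CPU, per memory second, per disk I/O, per network data) would divide the same transaction count by the corresponding resource usage collected over the same window.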
3) System Measurement Standards
Based on the aforementioned studies, new system measurement
standards need to be proposed for IPFS and Swarm. Generally,
system testing can be separated into two phases [69]: a
standardization phase and a testing phase. In the former
phase, a series of metrics is designed to show the
performance of systems in terms of Transactions Per Second,
Contract Execution Time and Consensus-Cost Time. In the
latter phase, systems are tested in different situations. For
example, failures including network shutdown and high memory
occupation could be injected. The designed metrics could then
show the performance under different failures, which can help
identify different types of failures. Furthermore, the number
of transactions received by a blockchain system per second
could be adjusted in a testing environment. Thus, the system
performance under different transaction rates could be
measured.
Through the review described above, we see that the system
measurement of distributed file systems is still immature.
Thus, we look forward to seeing exciting new studies on this
topic.
B. PRIVACY & SECURITY ISSUES
Some current DFSs, such as IPFS, do not tolerate Byzantine
attacks. For instance, every peer can access every file
stored on IPFS as long as it joins the system. This situation
makes privacy and security weaknesses of IPFS systems.
Therefore, introducing privacy-protection means such as
smart contract-based access control mechanisms [31] and
encryption technologies [55] over the data stored on
blockchain-based DFSs could be a feasible solution.
In addition, researchers are also considering whether
Reed-Solomon erasure coding [46] will be implemented for
IPFS. Note that Reed-Solomon coding is very popular in
datacenters because it provides great disk savings compared
with data
replication. IPFS has not yet addressed this data replication
problem. On the other hand, adopting such erasure coding can
also enhance the privacy and security level of DFSs. This is
because each data chunk is encoded under an erasure code, so
even if a peer gets a chunk, it does not know what the content
is. Furthermore, if a malicious attacker outside a DFS intends
to eavesdrop on the DFS peers, the attacker must obtain all
encoded pieces of the data chunks associated with the desired
file. This would be very difficult if the malicious attacker
is blocked by an access-control mechanism.
In summary, we anticipate that new solutions regarding the
data privacy & security of IPFS will be implemented in the
near future.
C. APPLICATION ISSUES
IPFS is providing business solutions to enterprises, and a
growing number of applications based on IPFS have been
developed. According to its original design, IPFS is used for
storing data. For example, Jia et al. [70] developed a
decentralized music-sharing platform called Opus that employs
both IPFS and Ethereum. Opus provides encrypted storage using
IPFS. The keys of the encrypted data are traded using smart
contracts. Opus is also able to prevent the monopoly of
streaming platforms, track the digital ownership of artists
and compensate artists at a reasonable price. Beyond being a
game-changer in the music domain, IPFS has been adopted in
other areas. For instance, Tenorio-Forn et al. [71] proposed a
decentralized publication system for open-access science
based on IPFS. Their proposed distributed system can record
reviewers’ reputation and handle transparent governance
processes.
Recently, the IPSE team [72] proposed a new search engine,
which is implemented on top of IPFS and blockchain. IPSE
focuses on user privacy and search efficiency: it allows
users to search network files on IPFS and access them without
relying on a centralized entity such as Google or Baidu. More
importantly, IPSE also enables users to take full control of
their own network data by exploiting encryption technologies
and smart contracts. Thus, IPSE is a good example of
integrating a distributed file system with blockchain
technologies.
It can be seen that most of these applications leverage the
decentralized characteristics of IPFS. With the integration
of Filecoin, smart contracts are brought into IPFS. This new
feature brings great potential to IPFS. Thus, smart contracts
make application development based on IPFS or Swarm a
promising direction.
D. BIG DATA ISSUES
IPFS and Swarm can also be well combined with big data
applications. We discuss the big data issues from the
following two aspects: big data storage and big data
analytics.
On one hand, regarding big data storage, IPFS and Swarm can
store data in a decentralized and secure manner. For example,
Confais et al. [73] proposed an object store for Fog and Edge
Computing using IPFS and Scale-out Network Attached Storage
(NAS) systems [74]. The proposed system alleviates the high
latency of the cloud computing architecture and is thus
suitable for the Internet of Things (IoT). According to [75],
in the era of the fifth-generation communications network
(5G), more IoT facilities require larger and more secure
storage. To meet this requirement, blockchain-based
distributed file systems such as Swarm and IPFS can play an
important role as the secure storage layer for IoT.
On the other hand, with respect to big data analytics, the
transactions on blockchains and the logs in file systems can
be used for data analytics. For example, the analytics of
transactions collected from blockchain systems can be used to
extract the trading patterns of users. The data analytics of
a peer’s credit is also useful when deciding whether to sign
deals with that peer. As representative works of data
analytics, Chen et al. [76]–[78] analyzed a large number of
smart contracts collected from Bitcoin and Ethereum. The
authors then successfully detected a large number of market
manipulations and Ponzi schemes [79] using data mining and
machine learning methods. Their studies can be viewed as
pioneering work on combining big data analytics with
blockchains. The technical reports, datasets and even
data-analytics code [80] can be downloaded from
http://xblock.pro. Using similar approaches, transactions and
other data in blockchain networks can be analyzed such that
malicious peers and potential attacks in distributed file
systems can be detected.
Since we have not found any further studies related to the
big data issues, we believe that this topic will become a
very promising direction for the research community of
blockchain-based distributed file systems.
VI. CONCLUSION
The new generation of blockchain-based distributed file
systems, such as IPFS and Swarm, has shown great potential
with its key characteristics: novel incentive solutions,
low-latency data retrieval, automated auditing, and
censorship resistance. This paper first presents the
rationale, layered structure and an overview of
blockchain-based distributed file systems, particularly
focusing on IPFS and Swarm. Then, we review the cutting-edge
studies and reveal a series of challenges that constrain
their development. Open issues and future directions are also
discussed. We believe that blockchain-based distributed file
systems will become very promising solutions for
next-generation websites and data-sharing platforms. We
anticipate that this article can trigger a bloom of
investigations on blockchain-based distributed file systems.
REFERENCES
[1] M. Giesler and M. Pohlmann, “The anthropology of file sharing: Consum-
ing napster as a gift,” ACR North American Advances, 2003.
[2] M. Ripeanu, “Peer-to-peer architecture case study: Gnutella network,”
in Proceedings first international conference on peer-to-peer computing.
IEEE, 2001, pp. 99–100.
[3] N. S. Good and A. Krekelberg, “Usability and privacy: a study of kazaa
P2P file-sharing,” in Proc. of the SIGCHI conference on Human factors in
computing systems. ACM, 2003, pp. 137–144.
[4] H.-W. Tseng, Q. Zhao, Y. Zhou, M. Gahagan, and S. Swanson, “Morpheus:
creating application objects efficiently for heterogeneous computing,” in
2016 ACM/IEEE 43rd Annual International Symposium on Computer
Architecture (ISCA). IEEE, 2016, pp. 53–65.
[5] J. Pouwelse, P. Garbacki, D. Epema, and H. Sips, “The bittorrent p2p file-
sharing system: Measurements and analysis,” in International Workshop
on Peer-to-Peer Systems. Springer, 2005, pp. 205–216.
[6] J. E. Cater and J. Soria, “The evolution of round zero-net-mass-flux jets,”
Journal of Fluid Mechanics, vol. 472, pp. 167–200, 2002.
[7] J. Benet, “Ipfs-content addressed, versioned, p2p file system,” arXiv
preprint arXiv:1407.3561, 2014.
[8] V. Trón, A. Fischer, and Nagy, “State channels on swap networks: claims
and obligations on and off the blockchain (tentative title),” 2016.
[9] S. Wilkinson, T. Boshevski, J. Brandoff, and V. Buterin, “Storj a peer-to-
peer cloud storage network,” 2014.
[10] “Ppio: a decentralized programmable storage and delivery network,”
https://www.pp.io/docs/.
[11] J. Benet and N. Greco, “Filecoin: A decentralized storage network,”
Protoc. Labs, 2018.
[12] G. Wood et al., “Ethereum: A secure decentralised generalised transaction
ledger,” Ethereum project yellow paper, vol. 151, pp. 1–32, 2014.
[13] S. Wilkinson, J. Lowry, and T. Boshevski, “Metadisk: a blockchain-based
decentralized file storage application,” Storj Labs Inc., Technical Report,
hal, pp. 1–11, 2014.
[14] M. Szydlo, “Merkle tree traversal in log space and time,” in International
Conference on the Theory and Applications of Cryptographic Techniques.
Springer, 2004, pp. 541–554.
[15] S. Nakamoto et al., “Bitcoin: A peer-to-peer electronic cash system,” 2008.
[16] G. Bertoni, J. Daemen, M. Peeters, and G. V. Assche, “Keccak,” 2013.
[17] “LibP2P,” https://github.com/libp2p.
[18] “DevP2P,” https://github.com/ethereum/devp2p.
[19] M. J. Freedman, E. Freudenthal, and D. M. Eres, “Democratizing content
publication with coral,” in Conference on Symposium on Networked
Systems Design & Implementation, 2004.
[20] “Distributed preimage archive of swarm,”
https://swarm-guide.readthedocs.io/en/latest/architecture.html#distributed-preimage-archive.
[21] “Expected consensus,” http://www.ipfs.cn/news/info-100327.html.
[22] I. Baumgart and S. Mies, “S/kademlia: A practicable approach towards
secure key-based routing,” in International Conference on Parallel &
Distributed Systems, 2007.
[23] M. J. Freedman and D. Mazières, “Sloppy hashing and self-organizing
clusters,” 2003.
[24] S. Shalunov, G. Hazel, J. Iyengar, and M. Kuehlewind, “Low extra delay
background transport,” Internet-draft, Internet Engineering Task Force,
Tech. Rep., 2010.
[25] R. Stewart, Q. Xie, and M. C. Allman, “Stream control transmission
protocol (sctp): A reference,” Publisher: Addison-Wesley, 2001.
[26] A. Chockalingam and G. Bao, “Performance of tcp/rlp protocol stack on
correlated fading ds-cdma wireless links,” IEEE Transactions on Vehicular
Technology, vol. 49, no. 1, pp. 28–33, 1998.
[27] S. Kim, “Measuring ethereum’s peer-to-peer network,” 2017.
[28] “The Homepage of GIT,” https://git-scm.com/.
[29] A. Tridgell, P. Mackerras et al., “The rsync algorithm,” 1996.
[30] A. Z. Broder, “Some applications of rabin’s fingerprinting method,” in
Sequences II. Springer, 1993, pp. 143–152.
[31] H. Huang, S. Zhou, J. Lin, K. Zhang, and S. Guo, “Bridge the Trust-
worthiness Gap amongst Multiple Domains: A Practical Blockchain-based
Approach,” in Proc. of 11th IEEE International Conference on Communi-
cations (ICC’20), June 2020, pp. 1–6.
[32] “Attacks of IPFS,” http://www.ipfs.cn/news/info-100379.html.
[33] Y. Hirai, “Defining the ethereum virtual machine for interactive theorem
provers,” in International Conference on Financial Cryptography and Data
Security. Springer, 2017, pp. 520–535.
[34] A. Gervais, G. O. Karame, V. Glykantzis, H. Ritzdorf, and S. Capkun,
“On the security and performance of proof of work blockchains,” in Acm
SIGSAC Conference on Computer & Communications Security, 2016.
[35] I. Bentov, C. Lee, A. Mizrahi, and M. Rosenfeld, “Proof of activity:
extending bitcoin’s proof of work via proof of stake [extended abstract],”
ACM Sigmetrics Performance Evaluation Review, vol. 42, no. 3,
pp. 34–37, 2014.
[36] “Design rationale of ethereum,” https://github.com/ethereum/wiki/wiki/
Design-Rationale.
[37] S. Wilkinson, T. Boshevski, J. Brandoff, J. Prestwich, G. Hall, P. Gerbes,
P. Hutchins, C. Pollard, and V. Buterin, “Storj a peer-to-peer cloud storage
network (version 2.0),” Dec. 2016.
[38] H. Shacham and B. Waters, “Compact proofs of Retrievability,” Journal of
cryptology, vol. 26, no. 3, pp. 442–483, 2013.
[39] O. Wennergren, M. Vidhall, and J. Sörensen, “Transparency analysis of
distributed file systems: With a focus on interplanetary file system,” 2018.
[40] J. Shen, Y. Li, Y. Zhou, and X. Wang, “Understanding i/o performance
of IPFS storage: a client’s perspective,” in Proc. of the International
Symposium on Quality of Service (IWQoS’19), 2019, pp. 1–10.
[41] E. Nyaletey, R. M. Parizi, Q. Zhang, and K.-K. R. Choo, “BlockIPFS-
blockchain-enabled interplanetary file system for forensic and trusted
data traceability,” in 2019 IEEE International Conference on Blockchain
(Blockchain), 2019, pp. 18–25.
[42] Y. Chen, H. Li, K. Li, and J. Zhang, “An improved P2P file system scheme
based on IPFS and blockchain,” in Proc. of IEEE International Conference
on Big Data (Big Data), 2017, pp. 2652–2657.
[43] I. Tamo, Z. Wang, and J. Bruck, “Zigzag codes: Mds array codes with
optimal rebuilding,” IEEE Transactions on Information Theory, vol. 59,
no. 3, pp. 1597–1616, 2012.
[44] R. Norvill, B. B. F. Pontiveros, R. State, and A. Cullen, “IPFS for reduction
of chain size in Ethereum,” in Proc. of IEEE International Conference on
Internet of Things (iThings) and IEEE Green Computing and Commu-
nications (GreenCom) and IEEE Cyber, Physical and Social Computing
(CPSCom) and IEEE Smart Data (SmartData), 2018, pp. 1121–1128.
[45] “IPFS & SWARM,” https://github.com/ethersphere/swarm/wiki/IPFS-&-SWARM.
[46] J. S. Plank, “A tutorial on Reed–Solomon coding for fault-tolerance in
RAID-like systems,” Software: Practice and Experience, vol. 27, no. 9, pp.
995–1012, 1997.
[47] J. Poon and T. Dryja, “The bitcoin lightning network: Scalable off-chain
instant payments,” 2016.
[48] J. Poon and V. Buterin, “Plasma: Scalable Autonomous Smart Contracts,”
White Paper, pp. 1–47, 2017.
[49] D. Vorick and L. Champine, “Sia: Simple decentralized storage,” Nebulous
Inc, 2014.
[50] H. Huang, S. Guo, W. Liang, K. Wang, and Y. Okabe, “Coflow-like Online
Data Acquisition from Low-Earth-Orbit Datacenters,” IEEE Transactions
on Mobile Computing (TMC), 2019, DOI: 10.1109/TMC.2019.2936202.
[51] G. Fanti and P. Viswanath, “Deanonymization in the bitcoin P2P network,”
in Advances in Neural Information Processing Systems, 2017, pp. 1364–1373.
[52] M. Steichen, B. Fiz, R. Norvill, W. Shbair, and R. State, “Blockchain-
based, decentralized access control for IPFS,” in Proc. of IEEE Interna-
tional Conference on iThings, GreenCom, CPSCom and SmartData, 2018,
pp. 1499–1506.
[53] M. S. Ali, K. Dolui, and F. Antonelli, “Iot data privacy via blockchains and
IPFS,” in Proc. of the Seventh International Conference on the Internet of
Things. ACM, 2017, p. 14.
[54] N. Nizamuddin, H. R. Hasan, and K. Salah, “IPFS-blockchain-based
authenticity of online publications,” in International Conference on
Blockchain. Springer, 2018, pp. 199–212.
[55] S. Wang, Y. Zhang, and Y. Zhang, “A blockchain-based framework for data
sharing with fine-grained access control in decentralized storage systems,”
IEEE Access, vol. 6, pp. 38 437–38 450, 2018.
[56] M. Naz, F. A. Al-zahrani, R. Khalid, N. Javaid, A. M. Qamar, M. K.
Afzal, and M. Shafiq, “A secure data sharing platform using blockchain
and interplanetary file system,” Sustainability, vol. 11, no. 24, p. 7054,
2019.
[57] I. Miers, C. Garman, M. Green, and A. D. Rubin, “Zerocoin: Anonymous
distributed e-cash from bitcoin,” in Proc. of IEEE Symposium on Security
and Privacy, 2013, pp. 397–411.
[58] Y. Takabatake, D. Kotani, and Y. Okabe, “An anonymous distributed
electronic voting system using zerocoin,” IEICE Technique Report, 2016.
[59] E. B. Sasson, A. Chiesa, C. Garman, M. Green, I. Miers, E. Tromer, and
M. Virza, “Zerocash: Decentralized anonymous payments from bitcoin,”
in Proc. of IEEE Symposium on Security and Privacy, 2014, pp. 459–474.
[60] J. Bonneau, A. Narayanan, A. Miller, J. Clark, J. A. Kroll, and E. W.
Felten, “Mixcoin: Anonymity for bitcoin with accountable mixes,” in
International Conference on Financial Cryptography and Data Security.
Springer, 2014, pp. 486–504.
[61] S. Zou, J. Xi, S. Wang, Y. Lu, and G. Xu, “Reportcoin: A novel blockchain-
based incentive anonymous reporting system,” IEEE Access, vol. 7, pp.
65 544–65 559, 2019.
[62] H. Lipmaa, “Prover-efficient commit-and-prove zero-knowledge snarks,”
in Proc. of International Conference on Cryptology in Africa. Springer,
2016, pp. 185–206.
[63] P. Zheng, Z. Zheng, X. Luo, X. Chen, and X. Liu, “A detailed and real-
time performance monitoring framework for blockchain systems,” in Proc.
of IEEE/ACM 40th International Conference on Software Engineering:
Software Engineering in Practice Track (ICSE-SEIP), 2018, pp. 134–143.
[64] “Parity documentation,” https://paritytech.github.io/wiki.
[65] “Cita technical whitepaper,” https://github.com/cryptape/cita.
[66] “Hyperledger fabric website,” https://hyperledger-fabric.readthedocs.io/
en/release-1.4/write_first_app.html.
[67] Xblock, “Performance Monitoring,” Website, Feb. 2020, http://xblock.pro/
performance/.
[68] T. Curran and B. de Graaff, “Analysing the performance of IPFS during
flash crowds,” 2016.
[69] Z. Zheng, S. Xie, H. Dai, X. Chen, and H. Wang, “An overview of
blockchain technology: Architecture, consensus, and future trends,” in
Proc. of IEEE International Congress on Big Data (BigData Congress),
2017, pp. 557–564.
[70] B. Jia, C. Xu, R. Gotla, S. Peeters, R. Abouelnasr, and M. Mach, “Opus -
decentralized music distribution using interplanetary file systems (IPFS)
on the ethereum blockchain v0.8.3,” 2016.
[71] A. Tenorio-Fornés, V. Jacynycz, D. Llop-Vila, A. Sánchez-Ruiz, and
S. Hassan, “Towards a decentralized process for scientific publication and
peer review using blockchain and IPFS,” in Proc. of the 52nd Hawaii
International Conference on System Sciences, 2019.
[72] I. Team, “IPSE: A search engine based on IPFS,”
https://ipfssearch.io/IPSE-whitepaper-en.pdf.
[73] B. Confais, A. Lebre, and B. Parrein, “An object store for fog infrastruc-
tures based on IPFS and a scale-out nas,” in RESCOM 2017, 2017, p. 2.
[74] G. A. Gibson, “Network attached storage architecture,” Comm Acm,
vol. 43, no. 11, pp. 37–45, 2000.
[75] I. Jovović, S. Husnjak, I. Forenbacher, and S. Maček, “5G, Blockchain
and IPFS: A General Survey with Possible Innovative Applications in
Industry 4.0,” in 3rd EAI International Conference on Management of
Manufacturing Systems-MMS 2018, 2018.
[76] W. Chen, Z. Zheng, J. Cui, E. Ngai, P. Zheng, and Y. Zhou, “Detecting
ponzi schemes on ethereum: Towards healthier blockchain technology,”
in Proc. of the 2018 World Wide Web Conference on World Wide Web.
International World Wide Web Conferences Steering Committee, 2018,
pp. 1409–1418.
[77] W. Chen, J. Wu, Z. Zheng, C. Chen, and Y. Zhou, “Market manipulation of
bitcoin: Evidence from mining the mt. gox transaction network,” in IEEE
Conference on Computer Communications. IEEE, 2019, pp. 964–972.
[78] W. Chen, Z. Zheng, E. C.-H. Ngai, P. Zheng, and Y. Zhou, “Exploiting
blockchain data to detect smart ponzi schemes on ethereum,” IEEE Access,
vol. 7, pp. 37 575–37 586, 2019.
[79] “Ponzi scheme,” https://en.wikipedia.org/wiki/Ponzi_scheme.
[80] Xblock, “Fraud Detection,” Website, Feb. 2020, http://xblock.pro/
fraud-detection/.
HUAWEI HUANG (M’16) is currently an Asso-
ciate Professor with the School of Data and Com-
puter Science, Sun Yat-Sen University, China.
He earned his Ph.D. degree in Computer Sci-
ence and Engineering from the University of Aizu
(Japan) in 2016. His research interests include
blockchain and intelligent distributed computing.
He has served as a Research Fellow of JSPS (2016-2018),
a visiting scholar with Hong Kong Polytechnic University (2017-2018),
and an Assistant Professor with Kyoto University, Japan (2018-2019).
He received the best paper award from TrustCom2016.
JIANRU LIN is currently a visiting researcher
with the School of Data and Computer Science, Sun
Yat-Sen University, China. His research interests
include consensus protocols and blockchain.
BAICHUAN ZHENG received his Bachelor’s degree
from the School of Data and Computer Science
of Sun Yat-Sen University in 2019. He is
currently doing research on new consensus mechanisms
and blockchain databases.
ZIBIN ZHENG received the Ph.D. degree from
the Chinese University of Hong Kong, in 2011.
He is currently a Professor at the School of Data and
Computer Science, Sun Yat-sen University,
China. He serves as Chairman of the Software
Engineering Department. He has published over 120
international journal and conference papers, in-
cluding 3 ESI highly cited papers. According to
Google Scholar, his papers have more than 9600
citations, with an H-index of 47. His research
interests include blockchain, services computing, software engineering, and
financial big data. He was a recipient of several awards, including the Top 50
Influential Papers in Blockchain of 2018, the ACM SIGSOFT Distinguished
Paper Award at ICSE2010, and the Best Student Paper Award at ICWS2010. He
served as General Co-Chair of BlockSys’19 and CollaborateCom’16, and as
PC Co-Chair of SC2’19, ICIOT’18 and IoV’14.
JING BIAN received the B.Sc. degree in Automation
in 1988, the M.Sc. degree in Computational Mathematics
in 2001, and the Ph.D. degree in Physics in 2006,
from Sun Yat-sen University, Guangzhou, China.
She is currently a vice-professor with the School of Data and Computer
Science, Sun Yat-sen University, Guangzhou. Her
current research interests include the design and analysis
of algorithms, blockchain, electronic commerce,
and social networks.
... Therefore, the storage of original data should also use a decentralized distributed storage mode, which is also the "blockchain + IPFS" storage structure mainly adopted by the academic world [21]. IPFS (Inter Planetary File System) is a content-addressed, versionable, peer-to-peer distributed file system. ...
... Tag is a series of numbers determined by the encoding rule. As shown in Table 3, Tag = [11,21,31,255] In tag, we define a terminator. In Table 3, the terminator's value is 255. ...
... =[11,21,31,255] 11 means the block height of the block where the previous transaction is located; 21 means the previous transaction's hash value; 31 means the CID of the previous transaction's image evidence; 255 is a terminator. ...
Article
To solve the problems exposed by the application of blockchain technology under complex scenarios, such as fraudulent use of data, hard to store huge amounts of data, and low traceability efficiency under an ultra-huge number of traceability requests, this paper constructs an image-based interactive traceability structure by using images as an enhancement. By adding pointers to raw image files, a specific file structure is formed for traceability, and the traceability process is separated from the verification process, therefore realizing the distributed traceability of “traceability off the chain and verification on the chain”. The experimental results show that, compared with the traditional blockchain traceability mode, the interactive traceability structure can reduce the data retrieval pressure and greatly improve the traceability efficiency of a specific transaction chain. With the growth of the span of the transaction chain, the traceability efficiency advantage of the interactive traceability structure becomes more obvious.
... For example, the Inter-Planetary-File-System (IPFS) [15] provides availability and resilience by replicating data across nodes of a network. Recently, decentralized storage protocols were integrated in this approach for storing data on the web in a decentralized way [16]. Furthermore, IPFS can be combined with reward mechanisms. ...
... IPFS possesses, similarly to BitTorrent, a peer-to-peer architecture allowing for the retrieval of specific sets of files or directories [15], [16]. In addition, it provides mechanisms of a file system within this architecture, including references and the fine-grained retrieval of blocks of files. ...
... financial transactions for settlements using cryptocurrency or the attestation of legal documents. On the other hand, the continuous adoption in the direction of multiple blockchains and side-chains such as Polkadot, 15 Polygon, 16 and potential improvements in Ethereum 2.0 17 might lessen the network utilization in existing networks and might provide sufficient capacity overall. If transaction cost does not decline and remains at today's levels, the current cost of attestations on the order of tens of USD would be sufficient for the applicability of the approach in mid-to high-value certification scenarios. ...
Article
Full-text available
The distribution of information through web protocols is today based on the client-server model. Recently, decentralized protocols with greater availability appear as well as blockchain-based attestation methods, allowing for proving the existence of information. In combination, these methods promise a secure, decentralized and long-term storage. However, there exist two major problems: (1) the scalability of blockchains limits their storage capacity and (2) various (de)centralized web protocols are in use and could alleviate this problem, but they do not support blockchain-based attestations. In this paper, we extend an approach for blockchain-based attestation with compatibility for multi-protocol storage. Instead of specific protocols or blockchains, the extended approach aims to contribute novel concepts to the discussion on blockchain scalability. It augments the capabilities of existing protocols for applications such as certification or timestamping of digital artifacts. With the use of decentralized protocols such as IPFS, further availability and inherent resilience properties are gained, allowing for applications such as open research repositories and digital registries. We discuss the architecture of the extended approach, a possible implementation in a smart contract on the Ethereum blockchain with IPFS and Git, and evaluate the time and cost of attestations.
... Inter-Planetary File System (IPFS) is a multipurpose, distributed, peer-to-peer, versioncontrolled file system with no single point of failure [3]. IPFS engenders a global Merkle Directed Acyclic Graph data structure (Merkle DAG) [8], with a content-addressing storage block architecture driven by Distributed Hash Tables (DHT), a block exchange system and a self-certifying namespace [3]. Stored content inside IPFS is accessible via CID hyperlinks (Content Identifiers). ...
... Stored content inside IPFS is accessible via CID hyperlinks (Content Identifiers). Currently IPFS is used for a wide variety of applications, such as distributed web applications and serverless applications [9], telecommunication, cloud data storage networks [10], content delivery networks (CDN) and blockchains [8], and even a crypto-currency based large scale decentralized file storage network called Filecoin [11] [12]. Its trustless and decentralized structure may lead to the development of a censorship-resistant and permanent web[1]. ...
Article
Full-text available
This research describes the simple implementation of asynchronous distributed cellular automata and decentralized swarms of asynchronous distributed cellular automata built on top of inter-planetary file system's publish-subscribe (IPFS PubSub) experimentation. Various publish-subscribe (PubSub) models are described. As an illustration, two distributed versions and a decentralized swarm version of a 2D elementary cellular automaton are thoroughly detailed to highlight the simplicity of implementation with IPFS and the inner workings of these kinds of cellular automata (CA). Both algorithms were implemented, and experiments were conducted throughout five datacenters of Grid'5000 testbed in France to obtain preliminary performance results in terms of network band-width usage. This work is prior to implementing a large-scale decentralized epidemic propagation modeling and prediction system based upon asynchronous distributed cellular automata with application to the current pandemic of SARS-CoV-2 coronavirus disease 2019 (COVID-19). This is an open access article under the CC BY-SA license.
... Hence, when data owners transmit "erasure commands" to the distributed network, it is not clear if all the peers would obey this command and delete their version of the "deleted" file or document. A solution to this data replication issue can be a common technique commonly present in data centers [23,24]. Scalability Challenges Since GLASS aims to create an eGovernance framework to be followed by all European Union's member states, the infrastructure's scalability poses a real threat. ...
Chapter
Global health security concerns have gained vast importance in recent times with outbreak of COVID-19. Today, the growing interdependence among countries and states has effected into accelerated growth of pandemics. A global need for rugged medical systems on a common platform is deemed today. Pandemics will not stop, they will resurrect again, they will happen irrespective till such times the medical world attains a disease less world in future. But till then, we can attempt to decelerate the pandemics growth enabled with new generation technologies. Medical cyber-physical systems are marred by a number of challenges and this paper proposes a model to negate these identified challenges enabled on multichain blockchain platform that imparts peculiar blockchain characteristics to the network of effected systems. The proposed model also enables to share encrypted data on select blockchain nodes granted defined access controls with proven encryption algorithms.
Chapter
Blockchain is mainly used to store the amount of coins transferred and the information of sender/recipient in text format. Treating such simple information, computing and comparing hashes of the blocks, and maintaining its integrety already burden the system, which causes ‘the limit of block capacity’, the difficulty of putting a large amount of data in one block of the blockchain. This presents a solution to the block capacity problem by efficiently distributing and encrypting files using the IPFS program with which a large amount of data is recorded outside the Blockchain.
Article
Blockchain and the programs running on it, called Smart Contracts, are increasingly applied in all fields where trust and strong certifications are required. Our work focuses on industrial applications of blockchains, and not on cryptocurrencies or tokens. We use frameworks to compare public and permissioned blockchains, specifically suited for industrial applications. We also propose a complete solution based on Ethereum to implement a decentralized application, putting together in an original way components and patterns already used and proved. This solution is characterized by a set of validator nodes running the blockchain using Proof-of-Authority or similar efficient consensus algorithms, by the use of an Explorer enabling users to check the blockchain state, and the source code of the Smart Contracts running on it. From time to time, the hash digest of the last mined block is written into a public blockchain to guarantee immutability. The right to send transactions is granted by validator nodes to users by endowing them with the Ethers mined locally. Overall, the proposed approach has the same transparency and immutability of a public blockchain, largely reducing its drawbacks.
Article
The recently proposed Blockchain-based healthcare system proposes an interesting vision for the level of data integrity and security. This research aims to propose a conceptual model of a break-glass conceptual for Blockchain-based healthcare systems. In case of emergency, it provides access to the whole patient’s medical records for healthcare professionals as quickly as possible regarding patients’ privacy and data security. The proposed conceptual model was designed based on blockchain technology, IPFS (InterPlanetary File System), and ABAC (Attribute-Based Access control) as a novel design in this domain. In current healthcare systems, regulatory and non-integrated offline data sources make it near impossible for timely access to patients’ EHRs and EMRs, even in case of emergencies for healthcare professionals. Our conceptual model could be a satisfactory alternative not only for patients but also for governing organizations to handle this situation clearly by regarding patients’ privacy. Additionally, it can work in an untrusted environment, and it doesn’t require bypassing the access control system to make the patients’ data available. In case of emergencies, healthcare professionals receive medical records access near just in time with regard to all the rights of security and privacy based on the attribute which were set by the patients in the past. This novel conceptual model has been designed by coupling Blockchain technology with IPFS, and the attribute base control system (ABAC).
Conference Paper
Full-text available
In isolated network domains, the global trustworthiness (e.g., consistent network views, etc) is critical to the multiple-party business partners who aim to perform the trusted corporations depending on each isolated network view. However, to achieve such global trustworthiness over distributed network domains is a challenge. This is because when multiple partners are required to exchange their local domain views with each other, it is difficult to ensure the data trustworthiness among them. The isolated domain view in each partner is prone to be destroyed by the malicious falsification attacks. To this end, we propose a blockchain-based approach that can ensure the trustworthiness among multiple-party domains. In this paper, we mainly present the design and implementation of the proposed trustworthiness-protection system. A cloud-based prototype and a local testbed are developed based on Ethereum. Finally, experimental results demonstrate the effectiveness of the proposed prototype and testbed.
Article
Full-text available
In a research community, data sharing is an essential step to gain maximum knowledge from the prior work. Existing data sharing platforms depend on trusted third party (TTP). Due to the involvement of TTP, such systems lack trust, transparency, security, and immutability. To overcome these issues, this paper proposed a blockchain-based secure data sharing platform by leveraging the benefits of interplanetary file system (IPFS). A meta data is uploaded to IPFS server by owner and then divided into n secret shares. The proposed scheme achieves security and access control by executing the access roles written in smart contract by owner. Users are first authenticated through RSA signatures and then submit the requested amount as a price of digital content. After the successful delivery of data, the user is encouraged to register the reviews about data. These reviews are validated through Watson analyzer to filter out the fake reviews. The customers registering valid reviews are given incentives. In this way, maximum reviews are submitted against every file. In this scenario, decentralized storage, Ethereum blockchain, encryption, and incentive mechanism are combined. To implement the proposed scenario, smart contracts are written in solidity and deployed on local Ethereum test network. The proposed scheme achieves transparency, security, access control, authenticity of owner, and quality of data. In simulation results, an analysis is performed on gas consumption and actual cost required in terms of USD, so that a good price estimate can be done while deploying the implemented scenario in real set-up. Moreover, computational time for different encryption schemes are plotted to represent the performance of implemented scheme, which is shamir secret sharing (SSS). Results show that SSS shows the least computational time as compared to advanced encryption standard (AES) 128 and 256.
Article
Full-text available
Satellite-based communication technology regains much attention in the past few years, where satellites play mainly the supplementary roles as relay devices to terrestrial communication networks. Unlike previous work, we treat the low-earth-orbit (LEO) satellites as secure data storage mediums [1]. We focus on data acquisition from an LEO satellite-based data storage system (also referred to as the LEO based datacenters), which has been considered as a promising and secure paradigm on data storage. Under the LEO based datacenter architecture, one fundamental challenge is to deal with energy-efficient downloading from space to ground while maintaining the system stability. In this paper, we aim to maximize the amount of data admitted while minimizing the energy consumption, when downloading files from LEO based datacenters to meet user demands. To this end, we first formulate a novel optimization problem and develop an online scheduling framework. We then devise a novel coflow-like "Join the first K-shortest Queues (JKQ)" based job-dispatch strategy, which can significantly lower backlogs of queues residing in LEO satellites, thereby improving the system stability. We also analyze the optimality of the proposed approach and system stability. We finally evaluate the performance of the proposed algorithm through conducting emulator based simulations, based on real-world LEO constellation and user demand traces. The simulation results show that the proposed algorithm can dramatically lower the queue backlogs and achieve high energy efficiency.
Conference Paper
Full-text available
The Interplanetary File System (IPFS) is a distributed file system that seeks to decentralize the web and to make it faster and more efficient. It incorporates well-known technologies, including BitTorrent and Git, to create a swarm of computing systems that share information. Since its introduction in 2016, IPFS has seen great improvements and adoption from both individuals and enterprise organizations. Its distributed network allows users to share files and information across the globe. IPFS works well with large files that may consume or require large bandwidth to upload and/or download over the Internet. The rapid adoption of this distributed file system is in part because IPFS is designed to operate on top of different protocols, such as FTP and HTTP. However, there are underpinning concerns relating to security and access control, for example lack of traceability on how the files are accessed. The aim of this paper is to complement IPFS with blockchain technology, by proposing a new approach (BlockIPFS) to create a clear audit trail. BlockIPFS allows us to achieve improved trustworthiness of the data and authorship protection, and provide a clear route to trace back all activities associated with a given file using blockchain as a service.
Article
Full-text available
With the widespread popularity of Internet-enabled devices, mobile users can request and receive messages anytime and anywhere, which facilitates information feedback for smart city management. However, few people are willing to reflect or report some violations of law and discipline around them, and more people choose to ignore. In general, there are two major reasons for this phenomenon. First, reporting with a real name is highly recommended, but it is difficult to send trusted and reliable reporting messages without revealing the reporter’s identity. Second, generally no benefit, users usually lack the motivation to report due to worrying about being retaliated. In this paper, we propose an effective anonymous reporting system called ReportCoin, a novel Blockchain-based incentive anonymous reporting system. ReportCoin guarantees user identity privacy and reporting message reliability throughout the reporting process. On the one hand, ReportCoin allows nondeterministic mobile users to vote the reporting by signing and to send anonymous announcements in the non-fully trusted network. On the other hand, ReportCoin motivates users with incentives to report without worrying about the disclosure of identity information to be retaliated. Meanwhile, account information and transaction records in ReportCoin are open, transparent, and tamper-resistant. Theoretical analysis and extensive experimental results show that ReportCoin is efficient and practical.
Article
Full-text available
Blockchain technology becomes increasingly popular. It also attracts scams, for example, Ponzi scheme, a classic fraud, has been found making a notable amount of money on Blockchain, which has a very negative impact. To help dealing with this issue and to provide reusable research data sets for future research, this study collects real-world samples and proposes an approach to detect Ponzi schemes implemented as smart contracts (i.e., smart Ponzi schemes) on blockchain. Firstly, 200 smart Ponzi schemes are obtained by manually checking more than 3,000 open source smart contracts on the Ethereum platform. Then, two kinds of features are extracted from the transaction history and operation codes of the smart contracts. Finally, a classification model is presented to detect smart Ponzi schemes. Extensive experiments show that the proposed model performs better than many traditional classification models and can achieve high accuracy for practical use. By using the proposed approach, we estimate that there are more than 500 smart Ponzi schemes running on Ethereum. Based on these results, we propose to build a uniform platform to evaluate and monitor every created smart contract for early warning of scams.
Conference Paper
Full-text available
In this paper we propose a system that moves the bytecode of an Ethereum contract creation transaction off-chain. As blockchains are append-only we present a way to help reduce the chain size and growth for Ethereum. Contract creation transaction data is replaced with hashes which identify a file in InterPlanetary File System (IPFS). Doing so reduces the size of data stored in such transactions by 93.86% in our dataset. The proposed system retains the assurance provided by blockchain and reduces network traffic under certain conditions .
Conference Paper
Ethereum, the second-largest cryptocurrency valued at a peak of $138 billion in 2018, is a decentralized, Turing-complete computing platform. Although the stability and security of Ethereum, and of blockchain systems in general, have been widely studied, most analysis has focused on application-level features of these systems such as cryptographic mining challenges, smart contract semantics, or block mining operators. Little attention has been paid to the underlying peer-to-peer (P2P) networks that are responsible for information propagation and that enable blockchain consensus. In this work, we develop NodeFinder to measure this previously opaque network at scale and illuminate the properties of its nodes. We analyze the Ethereum network from two vantage points: a three-month-long view of nodes on the P2P network, and a single-day snapshot of the Ethereum Mainnet peers. We uncover a noisy DEVp2p ecosystem in which fewer than half of all nodes contribute to the Ethereum Mainnet. Through a comparison with other previously studied P2P networks including BitTorrent, Gnutella, and Bitcoin, we find that Ethereum differs in both network size and geographical distribution.
Conference Paper
IPFS has surged in popularity in recent years. It organizes user data as multiple objects, which users can obtain according to their Content IDentifiers (CIDs). As IPFS is a storage system, it is of great importance to understand its data I/O performance. However, existing work still lacks such a comprehensive study. In this work, we deploy an IPFS storage system with geographically distributed storage nodes on Amazon EC2. We then conduct extensive experiments to evaluate the performance of data I/O operations from a client's perspective. We find that the access patterns of I/O operations (e.g., request size) severely affect the I/O performance, since IPFS typically uses multiple I/O strategies to perform different I/O requests. Moreover, for read operations, IPFS must resolve remote nodes and download objects via the Internet. Our experimental study reveals that both resolving and downloading operations can become bottlenecks. Our results can shed light on optimizing IPFS to avoid high-latency I/O operations.