When Blockchain Meets Distributed File Systems: An Overview, Challenges, and Open Issues

Date of publication March, 2020, date of current version March 12, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2979881
School of Data and Computer Science, Sun Yat-Sen University, 510006, Guangzhou, China
National Engineering Research Center of Digital Life, Sun Yat-sen University, Guangzhou, China
Corresponding author: Zibin Zheng. (e-mail:
The work described in this paper was supported by the National Key Research and Development Program (2016YFB1000101), the
National Natural Science Foundation of China (61902445, 61722214), and the Guangdong Province Universities and Colleges Pearl River
Scholar Funded Scheme (2016).
ABSTRACT Constructing globally distributed file systems (DFS) has received great attention. Traditional
Peer-to-Peer (P2P) distributed file systems have inherent drawbacks such as instability and the lack of
auditing and incentive mechanisms. Thus, the Inter-Planetary File System (IPFS) and Swarm, as representative
DFSs that integrate blockchain technologies, have been proposed and are becoming a new generation of
distributed file systems. Although blockchain-based DFSs successfully provide adequate incentives and
security guarantees by exploiting the advantages of blockchain, a series of challenges, such as scalability
and privacy issues, still constrain the development of this new generation of DFSs. Focusing mainly on IPFS
and Swarm, this paper gives an overview of the rationale, layered structure and cutting-edge studies of
blockchain-based DFSs. Furthermore, we identify their challenges, open issues and future directions.
We anticipate that this survey can shed new light on subsequent studies related to blockchain-based
distributed file systems.
INDEX TERMS Blockchain, Distributed File Systems, IPFS, Swarm
There have been many attempts to construct a distributed file system. The phenomenal popularity
and study of Peer-to-Peer (P2P) services, such as Napster
[1], Gnutella [2], Kazaa [3] and Morpheus [4], make the
implementation of distributed file systems an exciting and
promising research field. As one of the most successful P2P
distributed file systems, BitTorrent [5] has supported over
100 million online users. It has a large-scale deployment
in which tens of millions of nodes join and churn every day. In a
distributed file system, storage resources and system clients
are dispersed across the network. Each user is both a creator and a
consumer of the data stored in the system. Thus, the challenge is
to provide adequate incentives in an efficient, secure and
practical manner.
By far, the most widely used system for distributing files is the HyperText
Transfer Protocol (HTTP), in which data is uploaded to a web server.
Other peers can then access that data from anywhere in the world.
To keep data accessible on web servers, a maintenance cost must be paid,
and this cost increases as the data becomes more popular.
Moreover, there are very few ways to share the burden of information
dissemination with the clients directly. This is because HTTP was not
designed to be upgraded and thus fails to take advantage of the
advanced file distribution techniques proposed in the past few
years. Meanwhile, P2P techniques gathered pace and soon accounted for
the majority of data packets on the Internet. P2P file systems such as
BitTorrent [5] optimize resources by giving different pieces of popular data
to clients and enabling them to swap the missing parts with
one another. In this way, the bandwidth consumption of
hosts can be balanced and the overall operational
expenditure (OPEX) can be reduced.
Although BitTorrent has the advantages mentioned above, the
following inevitable drawbacks cannot be ignored:
1) Downloading is unstable, which prevents BitTorrent from being
widely used in certain scenarios.
2) File publishers cannot be verified, so it is hard to guarantee
the credibility of downloaded content.
3) There is no incentive mechanism, so seed
nodes are not rewarded for sharing their bandwidth and
storage resources.
Aiming to replace HTTP, Zeronet [6] adopted BitTorrent
as the file distribution mechanism for Web content.
However, simply sharing bandwidth, storage and computing
resources cannot provide the experience that HTTP
users expect.
Recently, blockchain has become a buzzword in both
industry and academia, and the combination of blockchain
and distributed file system is becoming a promising solution,
where blockchain is expected to provide incentives and se-
curity for the stored files in systems. Currently, the popular
blockchain-based distributed file systems include IPFS [7],
Swarm [8], Storj [9], and PPIO [10]. Among these file systems,
IPFS is a peer-to-peer distributed file system for storing
and accessing files, websites, applications and data; Swarm is
a distributed storage platform and content distribution service
based on Ethereum; Storj is another peer-to-peer decentralized
cloud storage platform that allows users to share data
without relying on a third-party data provider; and PPIO is
a decentralized programmable storage network that permits
users to store and retrieve any data from anywhere on the web. With
respect to the combination with blockchains, IPFS, Swarm,
and Storj file systems adopt Filecoin [11], Ethereum [12], and
Metadisk [13] as their incentive mechanisms, respectively.
PPIO exploits up to 4 proof algorithms, which are explained
in Section III, for its incentive layer.
Considering that the technologies of these distributed file
systems are similar to those of IPFS and Swarm, we review the
recent cutting-edge studies of blockchain-based DFSs mainly
focusing on IPFS and Swarm.
The contributions of this survey include the following:
1) This paper first introduces the layered structure of
blockchain-based DFSs. We then build a comprehensive
taxonomy of the cutting-edge studies from the scalability
and privacy perspectives.
2) We also clarify the challenges, open issues and future
directions of blockchain-based DFSs.
3) To the best of our knowledge, this is the first survey
related to blockchain-based DFSs. Our review in this
article can help subsequent researchers understand
both the current development and the future trends of
blockchain-based DFSs.
The rest of this paper is organized as follows. In Section
II, we explain necessary preliminaries and basic concepts.
Section III shows the layered structure of distributed file
systems. Section IV summarizes the cutting-edge studies.
Section V discusses open issues, challenges and future directions.
Finally, Section VI concludes this paper. We also show
the structure of this survey in Figure 1.
[Figure 1 diagram: Blockchain-based Distributed File Systems (DFS), divided into Preliminaries of DFS; Layered Structure of Blockchain-based DFS; Cutting-Edge Studies of Blockchain-based DFS; Open Issues, Challenges & Future Directions.]
FIGURE 1. The structure of this article.
Since the blockchain-based distributed file systems emphasized
in this article are closely related to the basic
data structures of blockchains, we first introduce the preliminaries
of the Merkle Tree and the Merkle DAG. Then, we give an
overview of BitTorrent, which helps in understanding the
rationale of distributed file systems such as IPFS and Swarm.
A Merkle Tree [14] is a binary tree built on a cryptographic
hash function. Each leaf in a Merkle tree has a hash
value that is computed from one or more input values.
Each parent node derives its hash value from its children's values,
so it depends recursively on all values in its subtree.
Figure 2 illustrates an example of a Merkle tree. Each
leaf (H1-H4) obtains its value by hashing an input
value (D1-D4), the parents (H5-H6) derive their values from their
children (H1-H4), and finally the root of the tree (H7)
is obtained, which depends on every value in the tree.
In the blockchain area, the Merkle tree is usually used for integrity
validation (for example, block validation in Bitcoin
[15]) and quick validation (for example, the light peers in
Ethereum [12]). Since a tiny change in a Merkle tree
drastically changes the root of the tree, we can validate
integrity by simply storing the root. To validate whether a node is
in a Merkle tree, only a few node hashes are needed
instead of the entire tree. For example, in Figure 2, H4 and
H5 are needed to validate whether H3 is in the tree. Using H3, H4
and H5, a candidate root (H8) can be computed. By comparing H7 and
H8, we can confirm that H3 is in the tree if the two roots are the
same, or that H3 is not in the tree if the two roots are different.
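The membership check described above can be sketched in a few lines of Python. This is a minimal illustration, not the construction used by any particular blockchain: it assumes SHA-256 as the hash function, a number of leaves that is a power of two, and simple concatenation of child hashes. The helper names (`h`, `merkle_root`, `verify`) are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash function used for every node (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Compute the root for a power-of-two number of data blocks."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        # Each parent is the hash of its two children, left then right.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def verify(leaf_hash, proof, root):
    """proof is a list of (sibling_hash, sibling_is_left) pairs, leaf to root."""
    node = leaf_hash
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


blocks = [b"D1", b"D2", b"D3", b"D4"]
H1, H2, H3, H4 = (h(b) for b in blocks)
H5 = h(H1 + H2)
H7 = merkle_root(blocks)

# To prove H3 is in the tree, only H4 (right sibling) and H5 (left sibling
# of the parent) are needed; the recomputed root plays the role of H8.
proof_for_H3 = [(H4, False), (H5, True)]
print(verify(H3, proof_for_H3, H7))
```

The verifier stores only the root H7 and receives two hashes, so the proof size grows with the height of the tree rather than with the number of leaves.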
[Figure 2 diagram: data blocks D1-D4 hashed into the tree.]
FIGURE 2. An illustration of a Merkle Tree.
Similar to the concept of the Merkle tree, the Merkle DAG (Directed
Acyclic Graph) [7] is used in IPFS as a data object
model. An object in IPFS is a structure containing two
attributes: Data and Links. Each Link structure includes three
attributes: Name, Hash and Size. Using this object structure,
IPFS can compose objects and build a directed acyclic graph.
In IPFS, the Merkle DAG organizes the structure of a file or even
a file directory, as shown in Figure 3. In this figure, the root
of the file directory contains two files (example.js and hello.txt)
and one subdirectory (dir). The file example.js
is divided into three data pieces, and dir
contains two files: other.txt and example.txt. Because the content
of example.txt in dir and hello.txt in the root path is exactly
the same, both are linked to the same object. Each
object derives its hash value from the values of its children
and the content of its own data.
[Figure 3 diagram: root directory linking to example.js (data pieces 1-3), hello.txt and dir.]
FIGURE 3. Merkle DAG in IPFS.
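To make the object model concrete, the following sketch builds a small directory similar to the one described for Figure 3. It is only an illustration under stated assumptions: objects are serialized as JSON and hashed with SHA-256, the Size attribute is taken to be the serialized size of the linked object, and the names `DagObject`, `Link`, `put` and `link` are hypothetical rather than part of the IPFS API.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Link:
    name: str  # Name attribute of the link
    hash: str  # Hash attribute: hash of the linked object
    size: int  # Size attribute (assumed: serialized size of the target)


@dataclass
class DagObject:
    data: bytes = b""
    links: list = field(default_factory=list)

    def serialize(self) -> bytes:
        payload = {
            "data": self.data.hex(),
            "links": [[l.name, l.hash, l.size] for l in self.links],
        }
        return json.dumps(payload, sort_keys=True).encode()

    def object_hash(self) -> str:
        # The hash covers both the data and the hashes of all children.
        return hashlib.sha256(self.serialize()).hexdigest()


store = {}  # content-addressed store: hash -> object


def put(obj: DagObject) -> str:
    key = obj.object_hash()
    store[key] = obj
    return key


def link(name: str, obj: DagObject) -> Link:
    return Link(name, put(obj), len(obj.serialize()))


hello = DagObject(data=b"same content")
example_txt = DagObject(data=b"same content")
other = DagObject(data=b"other content")
directory = DagObject(links=[link("other.txt", other),
                             link("example.txt", example_txt)])
root = DagObject(links=[link("hello.txt", hello), link("dir", directory)])
root_hash = put(root)
```

Because hello.txt and example.txt have identical content, both links carry the same hash and the store keeps a single object for them, which is the deduplication effect noted in the text.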
As mentioned earlier, BitTorrent (BT) is one of the most
popular distributed file systems. The process of file
sharing in BitTorrent is illustrated in Figure 4 and can be
described with the following 5 steps:
1) Peer A interacts with a Web server and downloads a
.torrent file.
2) Peer A interacts with the tracker that it finds in
the .torrent file, and requests a list of peers that are in
the Torrent network.
3) The tracker sends a list of a specified number of peers that
are in the Torrent network.
4) Peer A randomly selects some of the candidate peers from
the list as its neighbors and establishes connections with
each of them.
5) Peer A can then exchange file pieces with its neighbors
using the swarming technique.
[Figure 4 diagram legend: download .torrent file; request a list of peers; send a list of peers; establish connections; exchange file pieces.]
FIGURE 4. Mechanism of file-sharing in BitTorrent, which helps in
understanding the blockchain-based distributed file system IPFS.
In BitTorrent, an overlay network called a Torrent is established
for each file being distributed. A Torrent is composed
of peers in a network, which can be classified into
two types: seeds and leechers. A seed is a client that has a
complete copy of a file, while a leecher is a client that is
still downloading the file. Besides seeds and leechers, Web servers
and trackers are also required. If a peer wants to join a Torrent
network, it can obtain a .torrent file from a Web server. This
file contains information about a file, including its name, length,
hash digest and the URL of the tracker. A tracker is a special
peer storing the meta information of the peers that are active
in a Torrent network. A peer can interact with a tracker to
obtain the list of IP/port pairs of other peers in a Torrent,
and then randomly select about 20-40 peers from the list
as its neighbors. BitTorrent adopts a file exchanging technique
called swarming, which separates a file into fixed-size
pieces, each usually 256 KB in size [5].
When a piece is fully downloaded, a peer compares its SHA1
hash value with the value in the .torrent file. If they match, the
peer announces the availability of this complete piece to its
neighbors for further file exchanging and downloading.
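The piece-level integrity check described above can be sketched as follows. The 256 KB piece size and the SHA1 comparison come from the text; the in-memory list `expected_hashes` merely stands in for the digests carried in the .torrent file, and the function names are hypothetical.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # 256 KB, the usual piece size mentioned in the text


def split_pieces(data: bytes, piece_size: int = PIECE_SIZE):
    """Split a file into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE):
    """SHA1 digest of every piece, as a publisher would record them."""
    return [hashlib.sha1(p).digest() for p in split_pieces(data, piece_size)]


def verify_piece(index: int, piece: bytes, expected_hashes) -> bool:
    """A downloader accepts a piece only if its SHA1 digest matches."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]


file_data = bytes(range(256)) * 3000        # 768,000 bytes of sample data
expected_hashes = piece_hashes(file_data)   # stand-in for the .torrent digests
pieces = split_pieces(file_data)

good = verify_piece(1, pieces[1], expected_hashes)
bad = verify_piece(1, b"corrupted" + pieces[1][9:], expected_hashes)
print(len(pieces), good, bad)
```

Only pieces that pass this check would be announced to neighbors, so a corrupted piece is detected before it spreads further.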
In this section, we present the typical layered structure of
blockchain-based distributed file systems, with particular
emphasis on IPFS and Swarm. Generally, we identify 7 layers behind the
popular distributed file systems, i.e., Identities Layer, Data
Layer, Data-swap Layer, Network Layer, Routing Layer,
Consensus Layer and Incentive Layer. Each layer is a critical
module for distributed file systems. We summarize their
functions and related references in Table 1.
TABLE 1. Layered structure of Distributed File systems such as IPFS & Swarm.
Layer | Function | Examples and References
Identity layer | Assigns a unique ID to each node | Keccak hash [16]
Data layer | Organizes the file structure in the distributed file system | Merkle DAG [7]
Data-swap layer | Formulates the file sharing strategy between nodes | BitSwap [5]
Network layer | Enables nodes to discover other nodes, establish connections and exchange files with each other in a secure environment | libP2P [17], Devp2p [18]
Routing layer | Enables each file piece to be located and accessed by nodes in the network | Distributed sloppy hash table (DSHT) [19], Distributed preimage archive (DPA) [20]
Consensus layer | Ensures that the ledgers recording transactions in each node are basically correct and encourages users to maintain the consistency of the network | "Expected Consensus" [21], Proof-of-Work [12]
Incentive layer | Establishes the reward and punishment mechanism of the distributed file system, and encourages nodes in the network to be active and honest in transactions | Filecoin [11], SWAP, SWEAR and SWINDLE [8]
To achieve content distribution between nodes in a P2P file
system, each node has to be identified by a unique identifier,
which must be collision-free. This means that two different
data objects can never map to the same identifier. In IPFS,
the cryptographic hash (in multi-hash format) of a public key, i.e.,
the NodeId, is used to identify each node. The format of a multi-hash
is <hash function code><hash digest length><hash digest bytes>.
Nodes check each other's public keys and NodeIds
when connecting with each other. In Swarm systems, the
node hash-address is generated by Keccak 256-bit SHA3 [16]
using the public key of an Ethereum account.
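The multi-hash layout described above can be illustrated with a short sketch. It assumes SHA2-256 as the hash function, whose multihash function code is 0x12, and encodes the code and length as single bytes (the full format uses variable-length integers, which coincide with single bytes for these small values). The public key bytes below are placeholders, and `multihash_sha256` and `parse_multihash` are hypothetical helper names.

```python
import hashlib

SHA2_256_CODE = 0x12  # multihash function code for sha2-256


def multihash_sha256(data: bytes) -> bytes:
    """Return <hash function code><hash digest length><hash digest bytes>."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, len(digest)]) + digest


def parse_multihash(mh: bytes):
    """Split a multihash back into (code, length, digest)."""
    code, length = mh[0], mh[1]
    digest = mh[2:2 + length]
    if len(digest) != length:
        raise ValueError("digest shorter than the declared length")
    return code, length, digest


# Placeholder bytes standing in for a node's serialized public key.
public_key = b"example public key bytes"
node_id = multihash_sha256(public_key)

code, length, digest = parse_multihash(node_id)
print(hex(code), length, digest.hex()[:16])
```

Because the identifier names its own hash function and digest length, a peer can recompute the hash of a received public key and compare it with the claimed NodeId without any out-of-band agreement on the algorithm.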
Generally, the functionalities of the routing layer of a
distributed file system include: 1) maintaining the peer-connection
topology such that specific peers and data objects can be
located, 2) responding to queries from both local and
remote peers, and 3) communicating with distributed hash
tables.
IPFS adopts the Distributed Sloppy Hash Table (DSHT) [19],
which is implemented based on S/Kademlia [22] and Coral
[23]. The DSHT located in a peer helps to find 1)
the network addresses of other peers, and 2) the group of
peers who can serve specific data objects. A conventional
Distributed Hash Table (DHT) stores small values. For larger
values, the DSHT stores references, i.e., the NodeIds of peers
who can serve a block. It should be noted that IPFS is
highly modular, and the DSHT is just a temporary protocol that
can be replaced in the future.
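The distinction between storing small values and storing references can be sketched as below. This is a single-process toy, not the DSHT protocol: the 1 KB threshold is an assumption for illustration, the XOR distance is the metric used by Kademlia-style tables, and `SloppyTable`, `key_of` and `closest_peers` are hypothetical names.

```python
import hashlib

SMALL_VALUE_LIMIT = 1024  # bytes; assumed threshold for "small" values


def key_of(data: bytes) -> int:
    """Content key derived from the data (SHA-256 is an assumption)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two identifiers."""
    return a ^ b


def closest_peers(key: int, peer_ids, k: int = 2):
    """The k peers whose identifiers are closest to the key."""
    return sorted(peer_ids, key=lambda p: xor_distance(p, key))[:k]


class SloppyTable:
    def __init__(self):
        self.values = {}     # key -> small value stored directly
        self.providers = {}  # key -> set of NodeIds that can serve the block

    def put(self, node_id: int, data: bytes) -> int:
        key = key_of(data)
        if len(data) <= SMALL_VALUE_LIMIT:
            self.values[key] = data
        else:
            self.providers.setdefault(key, set()).add(node_id)
        return key

    def get(self, key: int):
        if key in self.values:
            return ("value", self.values[key])
        return ("providers", sorted(self.providers.get(key, set())))


table = SloppyTable()
small_key = table.put(node_id=7, data=b"tiny")
large_key = table.put(node_id=7, data=b"x" * 5000)
table.put(node_id=9, data=b"x" * 5000)
print(table.get(small_key)[0], table.get(large_key))
```

A lookup for a large block therefore returns a list of candidate providers, and the block itself is fetched from one of them by the data-swap layer.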
Swarm implements its routing layer using the Distributed
Preimage Archive (DPA) technique [20]. In the DPA, a
source object is divided into equal-sized chunks, which are
then synced to different nodes. On receiving these content-addressed
chunks, other nodes can sync them to their neighbors
that are in the same address space.
Within the framework of IPFS, an advanced generic P2P solution
named libP2P [17] is used as the network layer.
libP2P is developed based on the BitTorrent DHT implementation.
Based on libP2P, IPFS can use any network protocol to
transfer data. If the underlying network is not stable, IPFS can
switch to UTP [24] or SCTP [25]. IPFS achieves this
free switching mainly by using the multiaddr format
[7], which combines addresses and corresponding protocols.
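A simplified view of the multiaddr idea is that an address is a path of protocol/value pairs, such as /ip4/127.0.0.1/tcp/4001, so the transport can be changed without touching the network address. The sketch below assumes that every protocol in the string carries exactly one value, which does not hold for all real multiaddr protocols; the function names are hypothetical and this is not the libP2P API.

```python
def parse_multiaddr(addr: str):
    """Parse '/proto/value/...' into a list of (protocol, value) pairs."""
    if not addr.startswith("/"):
        raise ValueError("a multiaddr must start with '/'")
    parts = addr[1:].split("/")
    if len(parts) % 2 != 0 or "" in parts:
        raise ValueError("expected protocol/value pairs")
    return list(zip(parts[0::2], parts[1::2]))


def format_multiaddr(pairs) -> str:
    """Inverse of parse_multiaddr."""
    return "".join(f"/{proto}/{value}" for proto, value in pairs)


def with_transport(addr: str, transport: str) -> str:
    """Keep the network address and port but swap the transport protocol."""
    pairs = parse_multiaddr(addr)
    network, (_, port) = pairs[0], pairs[1]
    return format_multiaddr([network, (transport, port)])


tcp_addr = "/ip4/127.0.0.1/tcp/4001"
print(parse_multiaddr(tcp_addr))
print(with_transport(tcp_addr, "sctp"))
```

Since the protocol names travel with the address, a peer that receives such a string knows which protocol stack to use for the connection.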
Swarm relies on the Ethereum P2P network, which
comprises three different protocols: 1) RLPx (Recursive
Length Prefix) [26] for node discovery and secure data
transmission, 2) DevP2P [18] for node session establishment
and message exchange, and 3) the Ethereum subprotocol [27].
DevP2P [18] is inspired by libP2P and has security properties
that are beneficial to Swarm. After discovering each other through
RLPx, Swarm nodes establish TCP connections and send
"HELLO" messages, including the NodeId, listening port and
other attributes, based on DevP2P. Sessions then start to transmit
data packets. Thanks to the ecosystem of Ethereum, Swarm
has a large number of long-term nodes, which support the
robustness and stability of Swarm systems.
There are four levels of the data model in IPFS:
Block: an arbitrary-sized piece of data.
List: a collection of blocks or other lists.
Tree: a collection of blocks, lists, or other trees.
Commit: a snapshot in the version history of a tree.
This data model is similar to that of Git [28]. Based on
this data model, IPFS employs the Merkle DAG to store
data. The Merkle DAG identifies the data and links in each data object
with the multi-hash technique [7], which protects stored data
from tampering and makes file paths easy to retrieve,
because each data object is converted into a string-formatted path
(with a format like /ipfs/object-hash/object-name). To divide
a file into independent blocks, IPFS exploits several algorithms,
such as the rsync rolling-checksum algorithm [29] and Rabin
Fingerprints [30].
Swarm also defines a set of data structures:
Chunk: a fixed-size (maximum 4 KB [7]) piece of data.
File: a complete set of chunks.
Manifest: a mapping between paths and files, which
handles file collection.
The Chunker, a Swarm component for splitting and
recovering files, is able to process live stream data. After
a file is split, its chunks are collected to calculate the Swarm
hash, in which a hash algorithm is used to obtain the root
hash of the Merkle Tree. The root hash is then used to identify
a specific file and detect tampering. During this procedure,
the hash of each chunk is also calculated and is treated as a
reference to this chunk.
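The chunk-then-hash procedure can be sketched as follows. The 4 KB chunk size follows the text; everything else is a simplification for illustration: the tree is binary, an unpaired node is hashed on its own, and Python's `hashlib.sha3_256` is used as a stand-in hash, which is not the same function as the Keccak variant used by Ethereum. The function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # 4 KB chunks, as stated in the text


def chunk_hash(data: bytes) -> bytes:
    """Stand-in hash; serves as the reference to a chunk."""
    return hashlib.sha3_256(data).digest()


def split_chunks(data: bytes):
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def root_hash(data: bytes) -> bytes:
    """Root of a simplified binary Merkle tree over the chunk hashes."""
    level = [chunk_hash(c) for c in split_chunks(data)] or [chunk_hash(b"")]
    while len(level) > 1:
        # Hash adjacent pairs; a trailing unpaired node is hashed alone.
        level = [chunk_hash(b"".join(level[i:i + 2]))
                 for i in range(0, len(level), 2)]
    return level[0]


file_data = b"swarm" * 2000                 # 10,000 bytes -> 3 chunks
chunks = split_chunks(file_data)
references = [chunk_hash(c) for c in chunks]
file_id = root_hash(file_data)

tampered = file_data[:5000] + b"X" + file_data[5001:]
print(len(chunks), file_id != root_hash(tampered))
```

Changing a single byte alters the reference of the affected chunk and therefore the root hash, which is how the root hash exposes tampering.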
[Figure 5 diagram: Filecoin and the IPFS network, with a verification step between them.]
FIGURE 5. Mechanism and position of Filecoin [11], which is adopted by
IPFS and exploits blockchain as its fundamental component.
1) Incentive Layer of IPFS
As shown in Figure 5, Filecoin [11] is a blockchain-based
digital payment system, which supports digital storage and
data retrieval for IPFS users. It is adopted as the incentive
layer for IPFS. There are two markets in Filecoin: a Storage
Market and a Retrieval Market. The data of the Storage Market
is stored on the Filecoin blockchain, and the data of the Retrieval
Market is recorded off-chain.
These two markets provide data storage and data retrieval
services via a network composed of Storage Clients, Storage
Miners, Retrieval Clients and Retrieval Miners. These participants
are explained as follows.
Storage Clients are those who need file storage services.
They are on the demand side of the Storage Market.
Storage Miners are the nodes that provide storage to
a Filecoin system using their free disk space. They are
on the supply side of the Storage Market. The transactions
that occur on the Storage Market contribute new blocks
to the Filecoin blockchain.
Retrieval Clients are those who want to retrieve a specific
resource from the network. They are on the demand
side of the Retrieval Market.
Retrieval Miners are those who provide network resources,
such as bandwidth, helping retrieval clients
search for the retrieval information. They are on the
supply side of the Retrieval Market.
To store data in Filecoin, a storage client first submits a bid
order to the Storage Market. If a storage miner intends to take a
bid order, it has to send a request order to the Storage Market.
Once the Storage Market has received a bid order and a request
order, the storage client and the storage miner start to exchange
blocks and submit a signed deal order to the Storage Market.
After that, the storage miner must prove that the data is stored in its
dedicated, unique physical storage by repeatedly generating
proofs of replication, which are then verified by IPFS.
To retrieve data from Filecoin, a retrieval client similarly
first submits a bid order to the Retrieval Market. When the Retrieval
Market receives a request order from a retrieval client, the
retrieval miners begin to transport data and submit a signed
deal order to the Retrieval Market to confirm whether the retrieval
deal has succeeded or not.
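The order flow in the Storage Market can be pictured with a toy order book. The order names (bid order, request order, deal order) follow the text, but the matching rule used here (a request matches a bid when the miner offers enough space at a price no higher than the client's limit) and all class and field names are assumptions made for illustration; proofs and signatures are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BidOrder:        # submitted by a storage client
    client: str
    size: int          # bytes to store
    max_price: int     # highest price the client accepts


@dataclass(frozen=True)
class RequestOrder:    # submitted by a storage miner
    miner: str
    space: int         # bytes the miner offers
    price: int


@dataclass(frozen=True)
class DealOrder:       # recorded once both sides agree
    client: str
    miner: str
    size: int
    price: int


class StorageMarket:
    def __init__(self):
        self.bids, self.requests, self.deals = [], [], []

    def submit_bid(self, bid: BidOrder):
        self.bids.append(bid)

    def submit_request(self, request: RequestOrder):
        self.requests.append(request)

    def match(self):
        """Pair each bid with the first request that satisfies it."""
        for bid in list(self.bids):
            for req in list(self.requests):
                if req.space >= bid.size and req.price <= bid.max_price:
                    self.deals.append(
                        DealOrder(bid.client, req.miner, bid.size, req.price))
                    self.bids.remove(bid)
                    self.requests.remove(req)
                    break
        return self.deals


market = StorageMarket()
market.submit_bid(BidOrder("client-1", size=100, max_price=5))
market.submit_request(RequestOrder("miner-A", space=50, price=1))
market.submit_request(RequestOrder("miner-B", space=500, price=4))
deals = market.match()
print(deals)
```

In this sketch miner-A is skipped because it offers too little space, so the deal is made with miner-B and miner-A's request stays in the book.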
2) Incentive Layer of Swarm
In Swarm, the incentive scheme consists of two important parts:
1) bandwidth incentives, and 2) storage incentives. This is
because bandwidth and storage are the two most important
resources in a distributed file system.
Bandwidth Incentives. In Swarm, the service of delivering
chunks is chargeable, and nodes can trade
services for services or services for tokens. To motivate
nodes to provide stable services in a credible context,
Swarm proposes the Swarm Accounting Protocol (SWAP)
[8]. First, nodes negotiate the chunk price during
the handshake protocol. Different prices reflect varying
bandwidth costs. After the chunk price is set, a chequebook
contract is used to secure the payment. A chequebook contract
is a kind of smart contract that holds an ether (Ethereum token)
balance. Another secure payment mode, called the channel contract,
was later proposed by Swarm and is described in [8]. Both
modes of payment support secure off-chain transactions and
delayed updates. All of the transactions are stored in the state
of the Ethereum blockchain, which cannot be tampered with. Finally,
nodes establish network connections and exchange data.
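The idea of trading "services for services or services for tokens" can be sketched as a per-peer balance that is settled by a cheque only when the debt grows too large. The class below is a hypothetical single-sided model, not the SWAP implementation: the threshold rule, the field names and the use of plain integers for cheques are assumptions, and no smart contract is involved.

```python
class SwapAccount:
    """Tracks the balance with one peer from the local node's point of view."""

    def __init__(self, price_per_chunk: int, payment_threshold: int):
        self.price = price_per_chunk        # negotiated during the handshake
        self.threshold = payment_threshold  # assumed debt limit before paying
        self.balance = 0                    # > 0: peer owes us, < 0: we owe peer
        self.cheques_issued = []

    def served(self, chunks: int) -> None:
        """We delivered chunks to the peer, so the peer's debt grows."""
        self.balance += chunks * self.price

    def received(self, chunks: int):
        """We received chunks; issue a cheque if our debt hits the threshold."""
        self.balance -= chunks * self.price
        if -self.balance >= self.threshold:
            amount = -self.balance
            self.cheques_issued.append(amount)  # would be cashed on-chain later
            self.balance = 0
            return amount
        return None


account = SwapAccount(price_per_chunk=2, payment_threshold=10)
first = account.received(3)   # we owe 6, below the threshold
account.served(1)             # service for service: debt drops to 4
second = account.received(3)  # debt reaches 10, so a cheque is issued
print(first, second, account.balance)
```

As long as traffic is roughly balanced, services offset each other and no payment is needed; tokens only change hands when one side consumes noticeably more than it provides.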
Storage Incentives. Swarm encourages nodes to preserve
the data that has been uploaded to the network. Normally,
long-term data preservation is not realistic. Unpopular chunks do
not bring enough profit and may be cleaned up to make room
for new chunks. To guarantee the long-term availability
of data, the owner of each chunk needs to compensate nodes
for storage. To manage storage deals, Swarm adopts a set of
incentive schemes: SWAP, SWEAR and SWINDLE, which are
described as follows.
SWAP [8]: Nodes establish connections with their registered
peers, which are the target nodes they want to
compensate and sign contracts with. They can then
swap information including syncing, receipting, price
negotiation and payments.
SWEAR [8]: Registered peers are responsible for their
promises of long-term storage, and they must register
via the SWEAR (Secure Ways of Ensuring Archival or
Swarm Enforcement And Registration) [8] contract on
Ethereum by uploading their deposit. Peers stand to
be punished and lose their deposit in an on-chain litigation
process if they violate the rules.
SWINDLE [8]: Nodes provide signed receipts for
stored chunks. When a dispute arises about whether the rules
have been violated, a node whose chunk has been lost can
submit a challenge to the SWINDLE (Secured With
Insurance Deposit Litigation and Escrow) [8] contract
by uploading the receipt of the lost chunk. Nodes can
also refute a challenge by uploading
the chunk or a proof of custody. The SWINDLE contract decides
which party is guilty by checking the hash of the chunk.
When chunks are being forwarded, a chain of contracts
is created based on the incentive schemes mentioned above,
which resolves disputes between nodes.
IPFS adopts BitSwap [5] as its data-swap layer. BitSwap
is based on the BitTorrent protocol. In detail, BitSwap nodes
provide the blocks they are holding to each other directly,
aiming to spread the blocks within their group. The debt of
a node rises when it receives target blocks and decreases
when it contributes blocks that other nodes desire. Thus,
BitSwap encourages nodes to cache and contribute blocks.
To guard against nodes that never share, each BitSwap node
checks the debt of its peers before exchanging
blocks. BitSwap nodes also keep ledgers that record the
transfer history, and exchange ledgers with each other
when establishing connections. This exchange policy
protects the BitSwap ledger from tampering, and isolates
malicious nodes that intentionally lose their ledger.
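The debt check can be sketched with a per-peer ledger. The debt ratio and the sigmoid-shaped sending probability below follow the form commonly quoted from the IPFS white paper [7]; they are shown only as one possible policy, the constants should be treated as illustrative, and the `Ledger` class is a hypothetical stand-in for the real data structure.

```python
import math
from dataclasses import dataclass


@dataclass
class Ledger:
    """Transfer history with one peer, kept by the local node."""
    bytes_sent: int = 0   # bytes we have sent to the peer
    bytes_recv: int = 0   # bytes we have received from the peer

    def debt_ratio(self) -> float:
        # The peer's debt grows with what we sent and shrinks with what it gave.
        return self.bytes_sent / (self.bytes_recv + 1)


def send_probability(debt_ratio: float) -> float:
    """Probability of serving a peer; drops quickly as its debt grows."""
    return 1 - 1 / (1 + math.exp(6 - 3 * debt_ratio))


generous_peer = Ledger(bytes_sent=1000, bytes_recv=4000)
free_rider = Ledger(bytes_sent=4000, bytes_recv=0)

for name, ledger in [("generous", generous_peer), ("free rider", free_rider)]:
    r = ledger.debt_ratio()
    print(name, round(r, 2), round(send_probability(r), 4))
```

A peer that only downloads accumulates a large debt ratio and is served with a vanishing probability, while a peer that contributes keeps its ratio low and continues to receive blocks.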
In Swarm, nodes store chunks in order to sell them for profit
when they receive a data-retrieval request. If a node does not
have the target chunk named in the request, it
passes the request to the nearest neighbor node. In
managing storage transactions, receipts play an important
role. When Swarm nodes interact with any contract, receipts
are generated and stored in Swarm. In this way, the source of
a chunk is accessible, and a commitment can be traced in case
of litigation.
A consensus mechanism is critical for every blockchain system.
In a large distributed network, multiple peers normally form a
network cluster through asynchronous communication.
The network can become congested, causing error
messages to propagate throughout the system. Peers may therefore
fail if they cannot communicate with one another under a consistent
view of the network [31]. It is thus necessary to define a
resilient consensus protocol that can work in unreliable
asynchronous networks for distributed file systems. The aim
of such a consensus protocol is to ensure that each peer reaches
a secure, reliable and consistent state without a centralized
In the following, we review several typical consensus
protocols proposed by recent representative studies.
1) “Expected Consensus" Algorithm of FileCoin
Unlike Ethereum, which has only one main chain,
Filecoin [11] contains not only a single main chain but also
a storage market. Users in Filecoin interact with the
storage market, and these interactions are stored in the
main-chain ledger. Three proofs that play an important role
in the consensus process of Filecoin are summarized as follows.
Transaction Proof: After miner and user have reached
a deal, the main chain locks the token of the user
and deposit of the miner. Main chain also records the
information about the transaction including hard disk
sector of miner, details of deposit, transaction fee and
storage deadline, etc.
Proof-of-Replication (PoRep): A file is divided into
pieces and each piece is accepted by a storage miner.
At this point, a storage miner may pretend to store a
piece (this type of behavior is called a generation attack
[32]). Furthermore, a miner may fetch a piece from
another peer instead of storing it itself (this type of behavior is
called an outsourcing attack [32]). Another case is that
a miner may create multiple fake peers and pretend to
store several replicas of a file piece (this type of
behavior is called a Sybil attack [32]). To prevent these
attacks, Filecoin requires each miner to submit
a proof of replication to the main chain. The
Proof-of-Replication ensures that each miner stores file
pieces truly and independently.
Proof-of-Spacetime (PoSt): To prove that miners keep
storing a file piece for the effective duration of a transaction,
each miner has to submit a proof of spacetime to the main
chain regularly. In the current design of Filecoin, a
proof of spacetime is submitted every 20,000
blocks (roughly 6 days to mine on average)
[21] to prove that the file piece is not missing. The storage
market validates the proofs uploaded by miners
and decides whether to punish miners every 100 blocks
(50 minutes to mine on average) [21].
The consensus algorithm of Filecoin is called Expected
Consensus [21], in which a ticket is computed in each round
of the consensus process. By comparing the ticket value with the
effective storage of each peer, one or several peers can become
the leaders of the round. A leader selects transactions to
pack into the newly generated block. When a block is packed, it
is sent to other peers for synchronization. Transactions
in a block are executed by the Ethereum virtual machine (EVM)
[33], and the state of each account is updated.
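One way to picture the comparison between a ticket and a peer's effective storage is a lottery in which each peer wins with probability proportional to its share of the total storage. The sketch below is a hypothetical simplification, not the Expected Consensus specification: the ticket is taken to be a SHA-256 hash of a public round seed and the miner identifier, and the election rule and all names are assumptions.

```python
import hashlib


def ticket(seed: bytes, miner_id: str) -> int:
    """Per-round ticket: a 256-bit number derived from public round data."""
    return int.from_bytes(hashlib.sha256(seed + miner_id.encode()).digest(), "big")


def is_leader(seed: bytes, miner_id: str, power: int, total_power: int) -> bool:
    """Elected if ticket / 2**256 < power / total_power (integer comparison)."""
    return ticket(seed, miner_id) * total_power < power * 2**256


def leaders(seed: bytes, storage: dict):
    """All miners elected in this round; the list may be empty or have several."""
    total = sum(storage.values())
    return [m for m, power in storage.items() if is_leader(seed, m, power, total)]


storage = {"miner-A": 60, "miner-B": 30, "miner-C": 10}  # effective storage units
for round_number in range(3):
    seed = f"round-{round_number}".encode()
    print(round_number, leaders(seed, storage))
```

Since each miner is evaluated independently, a round can produce zero, one or several leaders, which matches the statement that "one or several peers" may lead a round; on average one leader is expected per round.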
2) Consensus of Ethereum
Ethereum is planned in four stages: Frontier, Homestead,
Metropolis and Serenity. In the first three stages, proof-of-work
(PoW) [34] is adopted as the consensus mechanism of
Ethereum, while in the fourth stage, proof-of-stake (PoS)
[35] will be adopted.
In PoW, each miner packs transactions from the transaction
pool and constructs a new block in sequential order.
Miners then repeatedly adjust the nonce value, which is
fed into the PoW function [34] together with the block header. A
target indicator is also computed according to the difficulty
of the blockchain. By comparing the result of this function
with the target indicator, a miner decides whether it has won
the consensus round. When a miner confirms that it has
won, it broadcasts its new block to other peers. Upon
receiving a block from another peer, a miner stops computing
and validates the nonce value of the newly received block. Each
transaction of the new block is executed by the EVM. After
all transactions included in the new block have been processed, the
state of this peer is updated [33]. Currently, the average
time of consensus in Ethereum is around 15 seconds, which
ensures the consistency of all peers [36].
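The nonce search and its validation can be sketched as below. Two simplifications are assumed: SHA-256 stands in for Ethereum's actual PoW function, and the block header is an opaque byte string. The target is derived from the difficulty as 2^256 divided by the difficulty, so a higher difficulty means a smaller target; the function names are hypothetical.

```python
import hashlib


def pow_hash(header: bytes, nonce: int) -> int:
    """Stand-in PoW function over the block header and a nonce."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def target_for(difficulty: int) -> int:
    """Target indicator: the result must not exceed 2**256 / difficulty."""
    return 2**256 // difficulty


def mine(header: bytes, difficulty: int, max_nonce: int = 1_000_000):
    """Try nonces in order until the PoW result falls below the target."""
    target = target_for(difficulty)
    for nonce in range(max_nonce):
        if pow_hash(header, nonce) <= target:
            return nonce
    return None


def validate(header: bytes, nonce: int, difficulty: int) -> bool:
    """A receiving peer needs a single hash to check the claimed nonce."""
    return pow_hash(header, nonce) <= target_for(difficulty)


header = b"parent-hash|transactions-root|timestamp"
difficulty = 256                      # about 256 attempts expected on average
nonce = mine(header, difficulty)
print(nonce, validate(header, nonce, difficulty))
```

The asymmetry is the point of the scheme: finding a nonce takes many hash evaluations on average, while checking one takes a single evaluation, which is why a miner can quickly validate a block received from another peer.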
3) Consensus Algorithms of Other File Systems
Storj’s Proofs of Retrievability: Designed as a decentral-
ized cloud object storage, Storj [9] proposed Proof of Storage
6VOLUME 4, 2016
Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS
in its first-version white paper. Interestingly, we found that
in version 2.0 of Storj’s white paper [37], the consensus
algorithm has been changed to Proofs of Retrievability [38].
Proofs of Retrievability aim to ensure that a certain piece of a
file exists on a host. An ideal proof offers high availability of
files with messages of minimal size and minimal
pre-processing. According to the new white paper
[37], Proof of Retrievability is still under ongoing research
and implementation. We then analyze the reason behind the
change of Storj’s consensus algorithm. It is probably because
current reputation systems, including Proof of
Storage, fail to solve the cheating client attack [37]. In such
an attack, it is hard to independently verify whether
a privately verifiable audit under a reputation system was
issued as claimed. Thus, Proof of Storage lacks
public verifiability.
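To make the notion of a privately verifiable audit concrete, the following is a minimal sketch of a salted-hash challenge-response check. It is not the compact scheme of [38] nor Storj's implementation; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

def prepare_challenges(data: bytes, count: int):
    """Owner side: precompute (salt, expected answer) pairs before upload."""
    challenges = []
    for _ in range(count):
        salt = secrets.token_bytes(16)
        expected = hashlib.sha256(salt + data).hexdigest()
        challenges.append((salt, expected))
    return challenges

def respond(stored: bytes, salt: bytes) -> str:
    """Host side: the answer can only be computed from the full stored data."""
    return hashlib.sha256(salt + stored).hexdigest()

def audit(challenge, response: str) -> bool:
    """Owner side: compare the host's answer with the precomputed one."""
    _salt, expected = challenge
    return hmac.compare_digest(expected, response)

file_data = b"example file contents"
challenges = prepare_challenges(file_data, count=3)
salt, _ = challenges[0]
print(audit(challenges[0], respond(file_data, salt)))       # honest host
print(audit(challenges[0], respond(file_data[:-1], salt)))  # truncated copy
```

Only the party holding the expected answers can check a response, so a third party cannot tell whether an audit was really issued or passed. This is the gap in public verifiability that the paragraph above points to.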
PPIO’s 4 Proof Schemes: PPIO [10] exploits four different
proof algorithms, i.e., PoRep, PoSt, Proof of Download
(PoD), and Light Proof of Capacity (LPoC), of which PoD
and LPoC are two brand-new proof mechanisms created by
PPIO. PoD specifically supports media-streaming
services. LPoC is designed to cold-start storage miners.
However, because LPoC occupies hard disk resources
without storing anything of real value, the PPIO team has
decided to abandon the implementation of LPoC.
As efficient decentralized storage layers of the next-generation
Internet, IPFS and Swarm use similar technologies.
Both provide low-latency data retrieval, fault-tolerance
guarantees and decentralized/distributed storage solutions.
In the identities layer, IPFS uses the multi-hash technique [7],
which stores both the hash function and the hash digest.
Swarm uses the account address of Ethereum directly. In
the network layer, Swarm adopts the secure and stable network
of Ethereum, whereas IPFS uses libP2P, which is a more generic
solution. The incentive layer of Swarm relies on the smart contracts
of Ethereum, which support automated auditing and delayed
payment. This saves transaction costs for Swarm while remaining
secure. Filecoin relies on the proofs and consensus of blockchain,
which is an overuse of blockchain. The PoW consensus of
Swarm has stood the test of time, while the expected consensus
of IPFS still awaits the test of the real world.
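The multi-hash idea mentioned above, i.e., storing the hash function together with the digest, can be sketched as follows. The sketch assumes the commonly documented multihash layout (function code, digest length, digest) with code 0x12 for SHA2-256, and it simplifies the variable-length integer prefix to single bytes, which coincide for values below 128. The helper names are hypothetical.

```python
import hashlib

SHA2_256_CODE = 0x12  # function code assumed from the multihash code table

def multihash_sha256(data: bytes) -> bytes:
    """Prefix the digest with its hash-function code and its length."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, len(digest)]) + digest

def parse_multihash(value: bytes):
    """Split a single-byte-prefixed multihash into (function code, digest)."""
    code, length, digest = value[0], value[1], value[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length prefix")
    return code, digest

mh = multihash_sha256(b"hello")
code, digest = parse_multihash(mh)
print(hex(code), len(digest))
```

Because the identifier describes its own hash function, the hash function can be replaced later without making older identifiers ambiguous.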
Swarm directly inherits the technology design of Ethereum.
For example, the identities layer, network layer and consensus
layer of Swarm are the same as those of Ethereum. Swarm thus
benefits from Ethereum’s large ecosystem, secure and
live network, and reliable funding sources. IPFS is highly
modular and can replace existing components with
state-of-the-art technology. In conclusion, the technology of Swarm is
more stable, while the technology of IPFS is more advanced.
In this section, we discuss recent cutting-edge studies of
blockchain-integrated distributed file systems, with an
emphasis on IPFS and Swarm.
With the number of transactions increasing in blockchain
networks, each peer has to validate and store a growing volume
of transactions. This imposes a heavy storage and performance
burden on each peer. In addition, the limited
size of each block and the latency of reaching consensus
must be taken into account, because these factors delay
transactions. Meanwhile, as the cluster size and the number of
data replicas grow, the performance of IPFS
and Swarm degrades severely.
In this part, we review several studies that address
the scalability issues of distributed file systems, mainly
focusing on IPFS and Swarm. These works can be classified
into two categories: 1) scalability evaluation, and 2) storage
optimization. For convenience, we summarize
these studies in Table 2.
1) Scalability Evaluation
Although the performance of IPFS has been questioned by
academia, we found only a few studies that evaluate or
discuss the scalability of IPFS. The representative papers are
reviewed as follows. Wennergren et al. [39] discuss and
analyze the scalability of IPFS. They conducted
simulations with varying cluster sizes and replication
factors. The simulation results show that the average download
time of data stored in IPFS increases as the cluster size and
replication factor grow. Consequently, the response time
among peers in an IPFS network grows and the download
speed drops. The authors mentioned that the limited
bandwidth of each IPFS instance could be one of the
critical reasons for the low scalability of IPFS.
Recently, Shen et al. [40] conducted a systematic
evaluation of the IPFS storage system by deploying real
geographically distributed instances on the Amazon EC2 cloud.
The authors focus on data I/O operations from a
client’s perspective. The extensive measurement results show
that the access patterns of clients can severely affect the I/O
performance of IPFS. Further quantitative analysis indicates
that downloading and resolving operations could be
bottlenecks when clients read objects from remote nodes.
To address the traceability problem of a distributed file
system, Nyaletey et al. [41] proposed BlockIPFS, a solution
combining blockchain and IPFS that
can trace and audit the access events of each file on IPFS.
The authors conducted a group of experiments to evaluate
the scalability of BlockIPFS by varying the
number of nodes. They then measured the latency
of the uploading, downloading, and reading transactions
of each file stored in the system. The measurement results show
that increasing the number of nodes does not cause a drastic
growth of transaction times. Unfortunately, the scale of
their experiments is too small, since the number of nodes in
the BlockIPFS system ranges only from 3 to 27. Thus, this group
of experiments leaves the scalability of their system unknown
under a very large-scale deployment.

| Category | Reference | File System | Methodology |
|---|---|---|---|
| Scalability Evaluation | Wennergren et al. [39] | IPFS | Analyzed scalability performance via varying cluster sizes and replication factors. |
| Scalability Evaluation | Shen et al. [40] | IPFS | Evaluated storage system of IPFS on Amazon EC2 cloud, by emphasizing the data I/O operations from a client’s point of view. |
| Scalability Evaluation | BlockIPFS [41] | IPFS | Nyaletey et al. proposed a solution by integrating blockchain and IPFS, which can trace the access events of each file on IPFS. They evaluated the scalability of BlockIPFS by varying the number of system nodes. |
| Storage Optimization | Chen et al. [42] | IPFS | Proposed a new storage model based on Zigzag code [43], for the block storage scheme adopted by IPFS. |
| Storage Optimization | Norvill et al. [44] | IPFS | Proposed an off-chain approach that moves contract-generation codes to IPFS database, aiming to improve the storage performance of distributed file system. |
| Storage Optimization | Design of Ethereum [45] | Swarm | Advocated that a chain of contracts in Swarm should be configured off-chain. |
| Storage Optimization | White paper [9] | Storj | Storj encrypts data using sharding technique and enables data availability based on Reed-Solomon erasure coding [46]. |
| Storage Optimization | Lightning Network [47], Plasma [48] | Off-chain file | Allows blockchain clients to conduct transactions in the off-chain manner, aiming to offload the storage pressure of main-chain. |

TABLE 2. Scalability-related studies of distributed file systems.
2) Storage Optimization
Erasure Codes. To guarantee high data availability, some
distributed file systems, e.g., Sia [49] and Storj [9], adopt
erasure codes in their storage strategy. In a typical (N,
K) erasure code [50], an original file is divided into
K (> 1) blocks, which are then encoded
into a larger number N (> K) of coded blocks. Any K of
those N coded blocks can reconstruct the
original file. Thus, exploiting erasure codes can improve the
storage resilience of distributed file systems. For example,
to improve the user experience of a P2P file system, Chen et
al. [42] proposed a new storage model based on Zigzag codes [43]
and blockchain techniques. The new storage model aims at
improving the block storage strategy adopted by IPFS.
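The "any K of N" property can be demonstrated with a toy evaluation-based code in the style of Reed-Solomon. The sketch below is an illustration under simplifying assumptions, not the Zigzag construction of [43] nor the code used by Sia or Storj: it works symbol by symbol over the prime field of size 257, treats the K data symbols as the values of a polynomial at x = 1..K, and requires N < 257 and Python 3.8 or later for the modular inverse.

```python
from itertools import combinations

P = 257  # prime modulus, large enough for byte-valued symbols

def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """Return n coded symbols (x, y); the first len(data) equal the data."""
    k = len(data)
    points = list(zip(range(1, k + 1), data))
    return [(x, _interpolate(points, x)) for x in range(1, n + 1)]

def decode(shares, k):
    """Rebuild the k data symbols from any k distinct coded symbols."""
    points = list(shares)[:k]
    return [_interpolate(points, x) for x in range(1, k + 1)]

data = [72, 105, 33]           # K = 3 data symbols
coded = encode(data, 6)        # N = 6 coded symbols
ok = all(decode(subset, 3) == data for subset in combinations(coded, 3))
print(ok)
```

With N = 6 and K = 3, the file survives the loss of any three coded symbols at twice the original storage cost, whereas surviving three losses with plain replication would require four full copies.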
Storing Data Off-Chain. On the other hand, we also
found other optimized solutions related to the storage of
transactions and smart contracts. For instance, to improve the
storage performance of distributed file systems, Norvill et al.
[44] proposed a solution that moves the contract-generation
code off-chain by treating IPFS as a storage database.
In their proposal, Ethereum loads complex contract code
by sending a simple hash value to IPFS peers. In this way,
system clients only have to send hash values rather than
the full code when performing fast synchronization. Thus,
the bulk of network traffic can be reduced. In the design of
Swarm [45], a chain of contracts is configured to maintain
the basic operations. These contracts increase the data size
of the blockchain, so that Swarm is hard to operate as a
full blockchain ledger. Thus, according to reference [44],
the developers of Ethereum have been working on moving
Swarm towards off-chain storage. Some other off-chain
solutions such as Lightning Network [47] and Plasma [48]
allow participants to execute transactions in the off-chain
manner, such that a large portion of on-chain transactions and
smart contracts can be offloaded from the main chain. Thus,
integrating off-chain techniques will bring new solutions
to the storage policy of future distributed file systems.
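The pattern of keeping only a hash on-chain while the bulk data lives in a content-addressed store can be sketched as follows. The classes `ContentStore` and `Ledger` are hypothetical in-memory stand-ins for an IPFS-like store and a blockchain; they are not the design of [44].

```python
import hashlib

class ContentStore:
    """Hypothetical stand-in for an IPFS-like content-addressed store."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # The address doubles as an integrity check on retrieval.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

class Ledger:
    """Hypothetical stand-in for a chain that records only content hashes."""

    def __init__(self):
        self.records = []

    def append(self, content_hash: str) -> None:
        self.records.append(content_hash)

    def size_in_bytes(self) -> int:
        return sum(len(record) for record in self.records)

store, ledger = ContentStore(), Ledger()
contract_code = b"\x60\x80" * 5000        # 10,000 bytes of placeholder code
ledger.append(store.put(contract_code))   # only 64 hex characters go on-chain
print(ledger.size_in_bytes(), len(store.get(ledger.records[0])))
```

The ledger grows by a fixed-size hash per object regardless of the object size, which is the source of the traffic and storage reduction described above.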
In Swarm and IPFS, data uploaded by users is
divided into several pieces, which are
then stored on different peers. Although the uploaded data
can be encrypted, the content stored in the network is
accessible to every peer. Besides, according to the design
of IPFS and Swarm, transactions that record the activities
of a peer can be easily collected. A user’s information can
then be revealed through graph analysis of these transactions. For
example, according to [51], a client can be identified through
the peers it directly connects to. Thus, transactions stored
in blockchain behind distributed file systems are publicly
To address these issues, a number of efforts have been
devoted to preserving the privacy of distributed file systems.
Through an extensive literature review, we have found many
privacy-preserving solutions, mechanisms and applications.
Some representative works are classified into two main
categories: 1) Access Control, and 2) Peer Anonymity. We also
compare several attributes of these studies in Table 3.
1) Access Control
In distributed file systems such as IPFS and Swarm,
users cannot restrict the sharing of data to
a specific group of peers, although this is necessary when
privacy is taken into account. To provide access control
when sharing files, Steichen et al. [52] proposed a modified
version of IPFS named acl-IPFS, based on Ethereum. An
acl-IPFS peer consists of an IPFS peer and an Ethereum
account. The uploading, downloading and transferring of
data in IPFS networks are achieved through interaction
with smart contracts residing in Ethereum. Smart contracts
dynamically maintain the access control list of each file
in acl-IPFS. Users can also grant or revoke a permission on a
file through smart contracts.

| Category | Reference | Technology Foundation | Privacy Level | Functionality |
|---|---|---|---|---|
| Access Control | acl-IPFS [52] | Smart contract, Ethereum, IPFS | Strong | Provided an access control list for the files shared in the proposed acl-IPFS system. |
| Access Control | Ali et al. [53] | Consortium blockchain, Sidechain, IPFS | Strong | Proposed a modular consortium architecture for privacy preserving towards IoT data. |
| Access Control | Nizamuddin et al. [54] | IPFS, Ethereum smart contract | Strong | Proposed a solution to the authenticity of original online published digital works. |
| Access Control | Wang et al. [55] | IPFS, Ethereum and Attribute-Based Encryption | | Proposed a smart contract-based access control mechanism for decentralized storage systems. |
| Access Control | Naz et al. [56] | IPFS, Smart contract, RSA signatures | Strong | Proposed an IPFS-based secure data sharing framework to deliver digital assets. |
| Access Control | Nyaletey et al. [41] | IPFS, Hyperledger Fabric | Strong | Proposed a solution BlockIPFS to trace the access events and provide audit access to files stored on IPFS. |
| Access Control | Huang et al. [31] | Ethereum, local databases | Strong | Proposed an Ethereum-based access control and trustworthiness protection for multiple-domain participants. |
| Peer Anonymity | Zerocoin [57] | Bitcoin laundry system, zero-knowledge proof | Limited | Offered a limited anonymity to the Bitcoin account addresses based on zero-knowledge proof. |
| Peer Anonymity | E-Voting System [58] | Bitcoin laundry system, Zerocoin | Limited | Provided an electronic-voting system based on Zerocoin, aiming to solve the privacy issues in original Bitcoin. |
| Peer Anonymity | Zerocash [59] | Zero-knowledge Argument, decentralized anonymous payment scheme | | Provided strongly anonymous transactions by hiding the origins, destinations and total amounts. |
| Peer Anonymity | Mixcoin [60] | Accountable mixes, an independent cryptographic accountability layer | | Proposed an anonymous payment protocol that can be deployed immediately with no changes to Bitcoin. |
| Peer Anonymity | ReportCoin [61] | IPFS, blockchain | Strong | Proposed a blockchain-based reporting system for the management of smart city. |

TABLE 3. Privacy-related studies of distributed file systems.

To enhance privacy preservation for IoT data, Ali et al. [53]
proposed a modular consortium architecture by combining
the techniques of IoT and blockchains. The proposed
architecture can provide decentralized management for IoT
data by exploiting the advantages of blockchain and IPFS.
Nizamuddin et al. [54] studied the authenticity of online
digital and multimedia content. To provide proof of
originality, the authors proposed an authenticity solution based on
IPFS and smart contracts. Based on IPFS, Ethereum and
attribute-based encryption (ABE) technologies, Wang et al.
[55] investigated a data storage and sharing mechanism for
distributed storage frameworks, in which no trusted
private-key generator is required. To achieve fine-grained access
control, a data owner can distribute secret keys to other
users and encrypt his data under a certain access policy.
Then, aiming at the transparency and quality of data, Naz et al.
[56] proposed a secure digital-asset sharing framework
that combines IPFS, blockchain and
encryption mechanisms. Next, Huang et al. [31] proposed
an Ethereum-based network-view sharing platform, which
can bring global trustworthiness to multiple domains such
as different IoT domain networks. In particular, the domain
view of each partner is stored in its local database, while
the Ethereum-based system provides access control and
trustworthiness protection over all participants.
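The contract-maintained access control list described for acl-IPFS can be modelled with a small in-memory sketch. The class below is a hypothetical illustration of the grant/revoke logic only. It is not the acl-IPFS contract of [52], and the account names and content hash are placeholders.

```python
class FileAccessControl:
    """Toy model of a per-file access control list kept by a contract."""

    def __init__(self):
        self._owner = {}  # content hash -> owner account
        self._acl = {}    # content hash -> set of permitted accounts

    def register(self, caller: str, content_hash: str) -> None:
        if content_hash in self._owner:
            raise PermissionError("file is already registered")
        self._owner[content_hash] = caller
        self._acl[content_hash] = {caller}

    def grant(self, caller: str, content_hash: str, account: str) -> None:
        self._require_owner(caller, content_hash)
        self._acl[content_hash].add(account)

    def revoke(self, caller: str, content_hash: str, account: str) -> None:
        self._require_owner(caller, content_hash)
        if account == caller:
            raise PermissionError("the owner cannot revoke itself")
        self._acl[content_hash].discard(account)

    def can_access(self, account: str, content_hash: str) -> bool:
        return account in self._acl.get(content_hash, set())

    def _require_owner(self, caller: str, content_hash: str) -> None:
        if self._owner.get(content_hash) != caller:
            raise PermissionError("only the owner may change permissions")

acl = FileAccessControl()
acl.register("alice", "hash-of-file")
acl.grant("alice", "hash-of-file", "bob")
print(acl.can_access("bob", "hash-of-file"))
acl.revoke("alice", "hash-of-file", "bob")
print(acl.can_access("bob", "hash-of-file"))
```

In a deployed system the peer serving the file would consult `can_access` before transferring any data, so the permission state on the chain, rather than the storage peer, decides who may read.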
2) Peer Anonymity
The privacy preservation of blockchain peers has attracted
particular attention in recent years. For example, considering
that the original Bitcoin system has significant limitations
regarding the privacy of its peers, Miers et al. [57] proposed
Zerocoin, which offers limited anonymity to Bitcoin
account addresses based on zero-knowledge proofs. However,
Zerocoin cannot guarantee full anonymity,
because at least the number of minted and spent coins and
the denomination of transactions are visible to all users of
the system. Using Zerocoin, Takabatake et al. [58] then
proposed a new laundry middleware for Bitcoin.
The authors mentioned that, in this middleware, the origin of
transactions can be hidden and miners are able to validate
transactions without signatures. However, the destination of
a transaction and the amount of a payment are still
exposed to users. Thus, the proposed e-voting system also
offers only limited anonymity. Moreover, the execution speed
of this voting system is an obstacle that is hard to
address. To address this problem, Zerocash [59] is claimed
to achieve strong anonymity for payments, because it hides
the transaction amount and the values of user-held coins
by invoking Zero-Knowledge Succinct Non-interactive
Arguments of Knowledge (ZK-SNARKs) [62]. Mixcoin [60]
provides a combined service that transfers funds from
multiple source addresses to multiple destination addresses. Thus,
the relationship between two accounts is hard to reveal.
Zou et al. [61] studied an incentive-based anonymous reporting
mechanism based on blockchain and IPFS. The accounts and
transactions stored in ReportCoin are open, transparent, and
tamper-resistant. Thus, the anonymity of reporting sources
can be protected with a high guarantee. However, ReportCoin
was only evaluated through simulations, which makes
this work less convincing. The practicality of the proposed
incentive mechanism requires more convincing proofs by real
In this section, we discuss open issues, challenges and future
directions of distributed file systems from four
perspectives: Scalability, Privacy, Applications and Big Data.
1) Scalability Performance
We have reviewed some representative studies [39] related
to the scalability performance measurement of DFSs in the
previous section. These existing works offer some
insights into DFSs. For example, Wennergren et al. [39]
mentioned that the limited bandwidth of each IPFS instance
could be one of the critical reasons for the low scalability of
IPFS. The quantitative analysis [40] of the
IPFS storage system indicates that downloading
and resolving operations could be bottlenecks when IPFS
clients read objects from remote nodes. Nyaletey et
al. [41] evaluated the scalability of the proposed BlockIPFS
by varying the number of nodes. However, the scale of their
experiments is too small, leaving the scalability
of their system unclear under a very large-scale deployment.
Through the studies [39], [40], [44], we see that current
distributed file systems, such as IPFS and Storj, are still
immature. For example, IPFS still faces some
notable shortcomings, including the bottlenecks of resolving
and downloading, and the high latency of I/O operations.
Thus, to support large-scale commercial applications,
IPFS must solve a number of challenges such as storage
optimization, geo-distributed deployment of nodes, and file
request performance.
From the storage-optimization perspective, although
erasure codes such as Zigzag codes [43] can be used
to improve the storage efficiency of IPFS-based
systems, some open issues should not be ignored. For
example, reconstructing original files could consume a large
amount of both disk I/O and bandwidth at the associated
peer nodes.
Another critical problem that IPFS needs to address is how
to update content already stored in the system. This is
because all data stored in the IPFS network is addressed by
a series of hash addresses. Once a file stored in IPFS changes,
its hash address changes, too. Therefore, an efficient update
mechanism should be developed for IPFS.
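The update problem can be seen directly in a few lines. The sketch assumes SHA-256 as the content hash and introduces a hypothetical `NameRegistry` that maps a stable name to the latest content address. It only illustrates one possible indirection layer and is not a description of an existing IPFS component.

```python
import hashlib

def content_address(data: bytes) -> str:
    """The address is derived from the content, so any edit changes it."""
    return hashlib.sha256(data).hexdigest()

class NameRegistry:
    """Hypothetical mutable pointer: stable name -> latest content address."""

    def __init__(self):
        self._latest = {}
        self._history = {}

    def publish(self, name: str, data: bytes) -> str:
        address = content_address(data)
        self._latest[name] = address
        self._history.setdefault(name, []).append(address)
        return address

    def resolve(self, name: str) -> str:
        return self._latest[name]

    def history(self, name: str):
        return list(self._history[name])

registry = NameRegistry()
v1 = registry.publish("report", b"draft")
v2 = registry.publish("report", b"draft, revised")
print(v1 != v2)                          # the address changed with the content
print(registry.resolve("report") == v2)  # the name still leads to the new version
```

Readers who hold only the old address keep reaching the old content, so an update mechanism has to maintain such a name-to-address mapping efficiently and securely.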
Finally, to improve the scalability of blockchain-based
DFSs, we believe that developing new solutions that
improve the efficiency of the DFS structure layers is a
promising direction. We hope to see related studies
proposed soon.
2) Performance Measurement Methodology
The performance of IPFS and Swarm with respect to
Quality of Service (QoS) metrics still needs to be
measured more widely and deeply in the future. In particular,
when integrating them into business models, users want
to know which one (IPFS or Swarm) best matches their
requirements. Fortunately, Zheng et al. [63] proposed a
real-time performance monitoring framework for blockchain
systems. This work evaluated four well-known blockchain
systems, i.e., Ethereum [12], Parity [64], Cryptape
Inter-enterprise Trust Automation (CITA) [65] and Hyperledger
Fabric [66], with respect to the QoS metrics of transactions
per second, average response delay, transactions per CPU,
transactions per memory second, transactions per disk I/O
and transactions per network data. Such comprehensive
performance evaluation results give us insightful viewpoints
on these four blockchain systems. Their experimental
logs and technical report [67] are available online. In addition, Curran et al. [68] mentioned
that they plan to analyze the performance of IPFS while a
website is under an unexpected surge of visitors. However,
we cannot find the subsequent technique report of their
3) System Measurement Standards
Based on the aforementioned studies, new system
measurement standards need to be proposed for IPFS and
Swarm. Generally, system testing can be separated into
two phases [69]: a standardization phase and a testing phase.
In the former phase, a series of metrics is designed
to show the performance of systems in terms of Transactions
Per Second, Contract Execution Time and Consensus-Cost
Time. In the latter phase, systems are tested in different
situations. For example, failures such as network shutdown
and high memory occupation could be injected. The
designed metrics then show the performance under different
failures, which can help identify different types of failures.
Furthermore, the number of transactions received by
a blockchain system per second could be adjusted in a
testing environment. Thus, the system performance under
different transaction rates could be measured.
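As a small illustration of how such metrics could be derived from test logs, the sketch below computes throughput and average response delay from per-transaction timestamps. The log format, a list of (submit time, confirm time) pairs in seconds, is an assumption made for this example and is not the format used in [63] or [69].

```python
from statistics import mean

def transactions_per_second(records) -> float:
    """records: list of (submit_time, confirm_time) pairs, in seconds."""
    if not records:
        return 0.0
    start = min(submit for submit, _ in records)
    end = max(confirm for _, confirm in records)
    duration = end - start
    return len(records) / duration if duration > 0 else float("inf")

def average_response_delay(records) -> float:
    """Mean time between submitting a transaction and its confirmation."""
    return mean(confirm - submit for submit, confirm in records)

log = [(0.0, 1.0), (0.5, 2.0), (1.0, 4.0)]
print(transactions_per_second(log))   # 3 transactions over 4 seconds
print(average_response_delay(log))
```

Running the same computation on logs collected under different injected failures or submission rates gives the per-situation comparison described above.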
Through the review above, we see that
the system measurement of distributed file systems is still
immature. Thus, we look forward to seeing
new studies on this topic.
Some current DFSs, such as IPFS, do not tolerate
Byzantine attacks. For instance, every peer can access every
file stored on IPFS as long as it joins the system. This
makes privacy and security weaknesses of
IPFS systems. Therefore, introducing privacy-protection
means such as smart contract-based access control
mechanisms [31] and encryption technologies [55] for the data
stored on blockchain-based DFSs could be a feasible solution.
In addition, researchers are also considering whether
Reed-Solomon erasure coding [46] will be implemented for IPFS.
Note that Reed-Solomon coding is very popular in
datacenters, as it provides great disk savings compared with data
replication. IPFS has not yet addressed this data replication
problem. On the other hand, adopting such erasure coding
can also enhance the privacy and security level of DFSs.
This is because each data chunk is encoded under an erasure
code, so that even if a peer obtains a chunk, it does not know
what the content is. Furthermore, if a malicious attacker outside
a DFS intends to eavesdrop on the DFS peers, the attacker
must obtain enough encoded pieces of the data chunks associated
with the desired file. This would be very difficult if the
malicious attacker is blocked by an access-control mechanism.
In summary, we anticipate that new solutions regarding
the data privacy and security of IPFS will be implemented
in the near future.
IPFS is providing business solutions to enterprises. A growing
number of applications based on IPFS have been
developed. According to its original design, IPFS is used
for storing data. For example, Jia et al. [70] developed a
decentralized music-sharing platform called Opus that employs
both IPFS and Ethereum. Opus provides encrypted storage
using IPFS. The keys of the encrypted data are traded using
smart contracts. Opus is also able to prevent the monopoly of
streaming platforms, track the digital ownership of artists and
compensate artists at a reasonable price. Beyond being
a game-changer in the music domain, IPFS has been
adopted in other areas. For instance, Tenorio-Forn et al. [71]
proposed a decentralized publication system for open-access
science based on IPFS. Their proposed distributed system
can record reviewers’ reputation and handle transparent
governance processes.
Recently, the IPSE team [72] proposed a new
search engine, which is implemented on top of IPFS and
blockchain. IPSE focuses on user privacy and search
efficiency: it allows users to search network files on
IPFS and access them without relying on a centralized
entity such as Google or Baidu. More importantly, IPSE also
enables users to take full control of their own network data
by exploiting encryption technologies and smart contracts.
Thus, IPSE is a good example that integrates a distributed
file system with blockchain technologies.
It can be seen that most of these applications leverage the
decentralized characteristics of IPFS. With the integration of
Filecoin, smart contracts are introduced into IPFS. This new
feature brings great potential to IPFS. Thus, smart contracts
make application development based on IPFS or Swarm
a promising direction.
IPFS and Swarm can also be combined well with big data
applications. We discuss big data issues from the
following two aspects: big data storage and big data analytics.
On the one hand, regarding big data storage, IPFS and Swarm
can store data in a decentralized and secure manner.
For example, Confais et al. [73] proposed an object
store for Fog and Edge Computing using IPFS and
Scale-out Network Attached Storage systems (NAS) [74]. The
proposed system alleviates the high latency of the
cloud computing architecture and is thus suitable for the
Internet of Things (IoT). According to [75], in the era of
the fifth-generation communications network (5G), more
IoT facilities require larger and more secure storage. To
meet this requirement, blockchain-based distributed file
systems such as Swarm and IPFS can play an important role
as the secure storage layer for IoT.
On the other hand, with respect to big data analytics,
the transactions on blockchains and the logs in file systems
can be used for data analytics. For example, the analytics
of transactions collected from blockchain systems can be
used to extract the trading patterns of users. The data
analytics of a peer’s credit is also useful when deciding whether
to sign deals with that peer. As representative works of data
analytics, Chen et al. [76]–[78] analyzed a large number of
smart contracts collected from Bitcoin and Ethereum. The
authors then successfully detected a large number of market
manipulations and Ponzi schemes [79] using data mining and
machine learning methods. Their studies can be viewed as
pioneering work on combining big data analytics with blockchains.
The technical reports, datasets and even data-analytics code
[80] are available for download. Using similar
approaches, transactions and other data in blockchain
networks can be analyzed such that malicious peers and
potential attacks existing in distributed file systems can be
Since we have not found any further studies related to
the big data issues, we believe that this topic will become
a very promising direction for the research community of
blockchain-based distributed file systems.
The new generation of blockchain-based distributed file
systems, such as IPFS and Swarm, has shown great
potential with its key characteristics: novel incentive
solutions, low-latency data retrieval, automated auditing, and
censorship resistance. This paper first presents the
rationale, layered structure and an overview of blockchain-based
distributed file systems, particularly focusing on IPFS and
Swarm. Then, we review the cutting-edge studies
and reveal a series of challenges that constrain their
development. Open issues and future directions are also discussed.
We believe that blockchain-based distributed file systems
will become very promising solutions for next-generation
websites and data-sharing platforms. We anticipate that this
article can trigger further investigations on
blockchain-based distributed file systems.
[1] M. Giesler and M. Pohlmann, “The anthropology of file sharing: Consum-
ing napster as a gift,” ACR North American Advances, 2003.
[2] M. Ripeanu, “Peer-to-peer architecture case study: Gnutella network,”
in Proceedings first international conference on peer-to-peer computing.
IEEE, 2001, pp. 99–100.
[3] N. S. Good and A. Krekelberg, “Usability and privacy: a study of kazaa
P2P file-sharing,” in Proc. of the SIGCHI conference on Human factors in
computing systems. ACM, 2003, pp. 137–144.
[4] H.-W. Tseng, Q. Zhao, Y. Zhou, M. Gahagan, and S. Swanson, “Morpheus:
creating application objects efficiently for heterogeneous computing,” in
2016 ACM/IEEE 43rd Annual International Symposium on Computer
Architecture (ISCA). IEEE, 2016, pp. 53–65.
[5] J. Pouwelse, P. Garbacki, D. Epema, and H. Sips, “The bittorrent p2p file-
sharing system: Measurements and analysis,” in International Workshop
on Peer-to-Peer Systems. Springer, 2005, pp. 205–216.
[6] J. E. Cater and J. Soria, “The evolution of round zero-net-mass-flux jets,”
Journal of Fluid Mechanics, vol. 472, pp. 167–200, 2002.
[7] J. Benet, “Ipfs-content addressed, versioned, p2p file system,” arXiv
preprint arXiv:1407.3561, 2014.
[8] V. Trón, A. Fischer, and Nagy, “State channels on swap networks: claims
and obligations on and off the blockchain (tentative title),” 2016.
[9] S. Wilkinson, T. Boshevski, J. Brandoff, and V. Buterin, “Storj a peer-to-
peer cloud storage network,” 2014.
[10] “Ppio: a decentralized programmable storage and delivery network,” https:
[11] J. Benet and N. Greco, “Filecoin: A decentralized storage network,”
Protoc. Labs, 2018.
[12] G. Wood et al., “Ethereum: A secure decentralised generalised transaction
ledger,” Ethereum project yellow paper, vol. 151, pp. 1–32, 2014.
[13] S. Wilkinson, J. Lowry, and T. Boshevski, “Metadisk: a blockchain-based
decentralized file storage application,” Storj Labs Inc., Technical Report,
hal, pp. 1–11, 2014.
[14] M. Szydlo, “Merkle tree traversal in log space and time,” in International
Conference on the Theory and Applications of Cryptographic Techniques.
Springer, 2004, pp. 541–554.
[15] S. Nakamoto et al., “Bitcoin: A peer-to-peer electronic cash system,” 2008.
[16] G. Bertoni, J. Daemen, M. Peeters, and G. V. Assche, “Keccak,” 2013.
[17] “LibP2P,”
[18] “DevP2P,”
[19] M. J. Freedman, E. Freudenthal, and D. M. Eres, “Democratizing content
publication with coral,” in Conference on Symposium on Networked
Systems Design & Implementation, 2004.
[20] “Distributed preimage archive of swarm,” https://swarm-guide.
[21] “Expected consensus,”
[22] I. Baumgart and S. Mies, “S/kademlia: A practicable approach towards
secure key-based routing,” in International Conference on Parallel &
Distributed Systems, 2007.
[23] M. J. Freedman and D. Mazières, “Sloppy hashing and self-organizing
clusters,” 2003.
[24] S. Shalunov, G. Hazel, J. Iyengar, and M. Kuehlewind, “Low extra delay
background transport,” Internet-draft, Internet Engineering Task Force,
Tech. Rep., 2010.
[25] R. Stewart, Q. Xie, and M. C. Allman, “Stream control transmission
protocol (sctp): A reference,” Publisher: Addison-Wesley, 2001.
[26] A. Chockalingam and G. Bao, “Performance of tcp/rlp protocol stack on
correlated fading ds-cdma wireless links,” IEEE Transactions on Vehicular
Technology, vol. 49, no. 1, pp. 28–33, 1998.
[27] S. Kim, “Measuring ethereum’s peer-to-peer network,” 2017.
[28] “The Homepage of GIT,”
[29] A. Tridgell, P. Mackerras et al., “The rsync algorithm,” 1996.
[30] A. Z. Broder, “Some applications of rabin’s fingerprinting method,” in
Sequences II. Springer, 1993, pp. 143–152.
[31] H. Huang, S. Zhou, J. Lin, K. Zhang, and S. Guo, “Bridge the Trust-
worthiness Gap amongst Multiple Domains: A Practical Blockchain-based
Approach,” in Proc. of 11th IEEE International Conference on Communi-
cations (ICC’20), June 2020, pp. 1–6.
[32] “Attacks of IPFS,”
[33] Y. Hirai, “Defining the ethereum virtual machine for interactive theorem
provers,” in International Conference on Financial Cryptography and Data
Security. Springer, 2017, pp. 520–535.
[34] A. Gervais, G. O. Karame, V. Glykantzis, H. Ritzdorf, and S. Capkun,
“On the security and performance of proof of work blockchains,” in Acm
SIGSAC Conference on Computer & Communications Security, 2016.
[35] I. Bentov, C. Lee, A. Mizrahi, and M. Rosenfeld, “Proof of activity:
extending bitcoin’s proof of work via proof of stake [extended
abstract],” Acm Sigmetrics Performance Evaluation Review, vol. 42, no. 3,
pp. 34–37, 2014.
[36] “Design rationale of ethereum,”
[37] S. Wilkinson, T. Boshevski, J. Brandoff, J. Prestwich, G. Hall, P. Gerbes,
P. Hutchins, C. Pollard, and V. Buterin, “Storj a peer-to-peer cloud storage
network (version 2.0),” Dec. 2016.
[38] H. Shacham and B. Waters, “Compact proofs of Retrievability,” Journal of
cryptology, vol. 26, no. 3, pp. 442–483, 2013.
[39] O. Wennergren, M. Vidhall, and J. Sörensen, “Transparency analysis of
distributed file systems: With a focus on interplanetary file system,” 2018.
[40] J. Shen, Y. Li, Y. Zhou, and X. Wang, “Understanding i/o performance
of IPFS storage: a client’s perspective,” in Proc. of the International
Symposium on Quality of Service (IWQoS’19), 2019, pp. 1–10.
[41] E. Nyaletey, R. M. Parizi, Q. Zhang, and K.-K. R. Choo, “BlockIPFS-
blockchain-enabled interplanetary file system for forensic and trusted
data traceability,” in 2019 IEEE International Conference on Blockchain
(Blockchain), 2019, pp. 18–25.
[42] Y. Chen, H. Li, K. Li, and J. Zhang, “An improved P2P file system scheme
based on IPFS and blockchain,” in Proc. of IEEE International Conference
on Big Data (Big Data), 2017, pp. 2652–2657.
[43] I. Tamo, Z. Wang, and J. Bruck, “Zigzag codes: Mds array codes with
optimal rebuilding,” IEEE Transactions on Information Theory, vol. 59,
no. 3, pp. 1597–1616, 2012.
[44] R. Norvill, B. B. F. Pontiveros, R. State, and A. Cullen, “IPFS for reduction
of chain size in Ethereum,” in Proc. of IEEE International Conference on
Internet of Things (iThings) and IEEE Green Computing and
Communications (GreenCom) and IEEE Cyber, Physical and Social Computing
(CPSCom) and IEEE Smart Data (SmartData), 2018, pp. 1121–1128.
[45] “IPFS & SWARM,”
[46] J. S. Plank, “A tutorial on Reed–Solomon coding for fault-tolerance in
RAID-like systems,” Software: Practice and Experience, vol. 27, no. 9, pp.
995–1012, 1997.
[47] J. Poon and T. Dryja, “The bitcoin lightning network: Scalable off-chain
instant payments,” 2016.
[48] J. Poon and V. Buterin, “Plasma: Scalable Autonomous Smart Contracts,”
White Paper, pp. 1–47, 2017.
[49] D. Vorick and L. Champine, “Sia: Simple decentralized storage,” Nebulous
Inc, 2014.
[50] H. Huang, S. Guo, W. Liang, K. Wang, and Y. Okabe, “Coflow-like Online
Data Acquisition from Low-Earth-Orbit Datacenters,” IEEE Transactions
on Mobile Computing (TMC), 2019, DOI: 10.1109/TMC.2019.2936202.
[51] G. Fanti and P. Viswanath, “Deanonymization in the bitcoin P2P network,”
in Advances in Neural Information Processing Systems, 2017, pp. 1364–
[52] M. Steichen, B. Fiz, R. Norvill, W. Shbair, and R. State,
“Blockchain-based, decentralized access control for IPFS,” in Proc. of IEEE
International Conference on iThings, GreenCom, CPSCom and SmartData, 2018,
pp. 1499–1506.
[53] M. S. Ali, K. Dolui, and F. Antonelli, “Iot data privacy via blockchains and
IPFS,” in Proc. of the Seventh International Conference on the Internet of
Things. ACM, 2017, p. 14.
[54] N. Nizamuddin, H. R. Hasan, and K. Salah, “IPFS-blockchain-based
authenticity of online publications,” in International Conference on
Blockchain. Springer, 2018, pp. 199–212.
[55] S. Wang, Y. Zhang, and Y. Zhang, “A blockchain-based framework for data
sharing with fine-grained access control in decentralized storage systems,”
IEEE Access, vol. 6, pp. 38 437–38 450, 2018.
[56] M. Naz, F. A. Al-zahrani, R. Khalid, N. Javaid, A. M. Qamar, M. K.
Afzal, and M. Shafiq, “A secure data sharing platform using blockchain
and interplanetary file system,” Sustainability, vol. 11, no. 24, p. 7054,
[57] I. Miers, C. Garman, M. Green, and A. D. Rubin, “Zerocoin: Anonymous
distributed e-cash from bitcoin,” in Proc. of IEEE Symposium on Security
and Privacy, 2013, pp. 397–411.
[58] Y. Takabatake, D. Kotani, and Y. Okabe, “An anonymous distributed
electronic voting system using zerocoin,” IEICE Technique Report, 2016.
[59] E. B. Sasson, A. Chiesa, C. Garman, M. Green, I. Miers, E. Tromer, and
M. Virza, “Zerocash: Decentralized anonymous payments from bitcoin,”
in Proc. of IEEE Symposium on Security and Privacy, 2014, pp. 459–474.
[60] J. Bonneau, A. Narayanan, A. Miller, J. Clark, J. A. Kroll, and E. W.
Felten, “Mixcoin: Anonymity for bitcoin with accountable mixes,” in
International Conference on Financial Cryptography and Data Security.
Springer, 2014, pp. 486–504.
[61] S. Zou, J. Xi, S. Wang, Y. Lu, and G. Xu, “Reportcoin: A novel blockchain-
based incentive anonymous reporting system,” IEEE Access, vol. 7, pp.
65 544–65 559, 2019.
[62] H. Lipmaa, “Prover-efficient commit-and-prove zero-knowledge snarks,”
in Proc. of International Conference on Cryptology in Africa. Springer,
2016, pp. 185–206.
[63] P. Zheng, Z. Zheng, X. Luo, X. Chen, and X. Liu, “A detailed and real-
time performance monitoring framework for blockchain systems,” in Proc.
of IEEE/ACM 40th International Conference on Software Engineering:
Software Engineering in Practice Track (ICSE-SEIP), 2018, pp. 134–143.
[64] “Parity documentation,”
[65] “Cita technical whitepaper,”
[66] “Hyperledger fabric website,”
[67] Xblock, “Performance Monitoring,” Website, Feb. 2020,
[68] T. Curran and B. de Graaff, “Analysing the performance of IPFS during
flash crowds,” 2016.
[69] Z. Zheng, S. Xie, H. Dai, X. Chen, and H. Wang, “An overview of
blockchain technology: Architecture, consensus, and future trends,” in
Proc. of IEEE International Congress on Big Data (BigData Congress),
2017, pp. 557–564.
[70] B. Jia, C. Xu, R. Gotla, S. Peeters, R. Abouelnasr, and M. Mach, “Opus-
decentralized music distribution using interplanetary file systems (IPFS)
on the ethereum blockchain v0. 8.3,” 2016.
[71] A. Tenorio-Fornés, V. Jacynycz, D. Llop-Vila, A. Sánchez-Ruiz, and
S. Hassan, “Towards a decentralized process for scientific publication and
peer review using blockchain and IPFS,” in Proc. of the 52nd Hawaii
International Conference on System Sciences, 2019.
[72] I. Team, “Ipse: A search engine based on ipfs,”
[73] B. Confais, A. Lebre, and B. Parrein, “An object store for fog
infrastructures based on IPFS and a scale-out NAS,” in RESCOM 2017, 2017, p. 2.
[74] G. A. Gibson, “Network attached storage architecture,” Commun. ACM,
vol. 43, no. 11, pp. 37–45, 2000.
[75] I. Jovović, S. Husnjak, I. Forenbacher, and S. Maček, “5G, Blockchain
and IPFS: A General Survey with Possible Innovative Applications in
Industry 4.0,” in 3rd EAI International Conference on Management of
Manufacturing Systems-MMS 2018, 2018.
[76] W. Chen, Z. Zheng, J. Cui, E. Ngai, P. Zheng, and Y. Zhou, “Detecting
ponzi schemes on ethereum: Towards healthier blockchain technology,”
in Proc. of the 2018 World Wide Web Conference on World Wide Web.
International World Wide Web Conferences Steering Committee, 2018,
pp. 1409–1418.
[77] W. Chen, J. Wu, Z. Zheng, C. Chen, and Y. Zhou, “Market manipulation of
bitcoin: Evidence from mining the mt. gox transaction network,” in IEEE
Conference on Computer Communications. IEEE, 2019, pp. 964–972.
[78] W. Chen, Z. Zheng, E. C.-H. Ngai, P. Zheng, and Y. Zhou, “Exploiting
blockchain data to detect smart ponzi schemes on ethereum,” IEEE Access,
vol. 7, pp. 37 575–37 586, 2019.
[79] “Ponzi scheme,”
[80] Xblock, “Fraud Detection,” Website, Feb. 2020,
HUAWEI HUANG (M’16) is currently an Associate Professor with the School of
Data and Computer Science, Sun Yat-Sen University, China. He earned his
Ph.D. degree in Computer Science and Engineering from the University of
Aizu (Japan) in 2016. His research interests include blockchain and
intelligent distributed computing. He has served as a Research Fellow of
JSPS (2016-2018), a visiting scholar with Hong Kong Polytechnic University
(2017-2018), and an Assistant Professor with Kyoto University, Japan
(2018-2019). He received the best paper award from TrustCom2016.
JIANRU LIN is currently a visiting researcher
in the School of Data and Computer Science, Sun
Yat-Sen University, China. His research interests
include consensus protocols and blockchain.
BAICHUAN ZHENG received his Bachelor’s degree from the School of Data and
Computer Science of Sun Yat-Sen University in 2019. He is currently doing
research on new consensus mechanisms and blockchain databases.
ZIBIN ZHENG received the Ph.D. degree from the Chinese University of Hong
Kong in 2011. He is currently a Professor with the School of Data and
Computer Science, Sun Yat-sen University, China. He serves as Chairman of
the Software Engineering Department. He has published over 120 international
journal and conference papers, including 3 ESI highly cited papers.
According to Google Scholar, his papers have more than 9600 citations, with
an H-index of 47. His research interests include blockchain, services
computing, software engineering, and financial big data. He was a recipient
of several awards, including the Top 50 Influential Papers in Blockchain of
2018, the ACM SIGSOFT Distinguished Paper Award at ICSE2010, and the Best
Student Paper Award at ICWS2010. He served as General Co-Chair of
BlockSys’19 and CollaborateCom’16, and as PC Co-Chair of SC2’19, ICIOT’18,
and IoV’14.
JING BIAN received the B.Sc. degree in Automation in 1988, the M.Sc. degree
in Computational Mathematics in 2001, and the Ph.D. degree in Physics in
2006, from Sun Yat-sen University, Guangzhou, China. She is currently a
vice-professor with the School of Data and Computer Science, Sun Yat-sen
University, Guangzhou. Her current research interests include design and
analysis of algorithms, blockchain, electronic commerce, and social networks.