All content in this area was uploaded by Mohammad Jabed Morshed Chowdhury on Mar 16, 2019
Blockchain as a Notarization Service for Data
Sharing with Personal Data Store
Mohammad Jabed Morshed Chowdhury∗, Alan Colman∗, Muhammad Ashad Kabir†, Jun Han∗ and Paul Sarda∗
∗School of Software and Electrical Engineering,
Swinburne University of Technology, Melbourne, Australia
Email: {mjchowdhury, acolman, jhan, 4949501}@swin.edu.au
†School of Computing and Mathematics
Charles Sturt University, NSW, Australia
Email: akabir@csu.edu.au
Abstract—Personal data such as electronic medical records and
academic records are critical and sensitive private information.
This personal information is usually hosted across many data-custodian
systems. A Personal Data Store (PDS) is a service that lets an individual
store, manage and deploy their key personal data in a highly secure and
structured way. It also gives the user a central point of control for
their personal information. One of the inherent problems of digital
records is that they can be easily forged. Therefore, the data consumer
(with whom the data is shared) often needs to verify the authenticity of
the shared document/record by communicating with the document/certificate
issuing authority (e.g., data custodian). However, this process is time
consuming and inefficient. Recently, blockchain has gained tremendous
attention from both industry and academia for distributed recording and
immutable transactions. Blockchain provides a shared, immutable and
transparent history of transactions, enabling the building of
applications that incorporate trust, accountability and transparency.
This provides a unique opportunity to develop a secure and trustworthy
data sharing system using blockchain. However, blockchain was primarily
proposed for publicly verifiable transactions and does not provide
privacy to individuals. In this paper, we propose a data sharing
framework that guarantees the authenticity of the shared data in real
time and provides transactional privacy in a blockchain network. We have
implemented our framework in a prototype that ensures privacy, integrity,
and fine-grained access control over the shared data. The proposed work
can significantly reduce the turnaround time for data sharing, improve
the decision making process and reduce the overall cost.
Index Terms—context-aware, privacy, blockchain, data sharing.
I. INTRODUCTION
Currently, most established organizations (e.g., hospitals,
universities, banks) use web-enabled systems to provide their services
and host their clients’ data. Recently, online social networks (e.g.,
Facebook, Twitter, and LinkedIn) have revolutionized individuals’
attitudes towards data sharing. Individuals (i.e., data subjects) now
want to share their personal information, such as academic and medical
records, with others (i.e., data consumers). They wish to give others
secure and selective access to their personal data hosted at different
data-custodian systems (which store and manage individuals’ data) for
different purposes.
However, different data custodians have different goals and policies
regarding the data subject’s access to, and control over, the personal
data about them. Traditional organizations (e.g., universities,
hospitals) are usually reluctant to allow their users to share their data
outside of their administrative domain. In some cases, they outsource
limited sharing functionality to a trusted third party (e.g.,
universities in Australia and New Zealand have outsourced academic record
sharing to a third-party organization called My eQuals [12]). The authors
of [4] have identified the following reasons why organizations do not
allow data subjects to share their data:
1) there are legal implications for possible sharing of data,
so entities that possess data tend to keep them locked;
2) data (and the knowledge related to them) represent a
useful and valuable good, thus companies do not intend
to share information with others.
The lack of sharing facilities from data custodians has fueled the recent
popularity of cloud-based Personal Health Record (PHR) [5] and Personal
Data Storage (PDS) [6] solutions among data subjects. PDS service
providers offer cryptographic solutions that store data in encrypted form
to ensure the confidentiality of personal data in the cloud. They allow
data subjects to store their data from different data custodians (e.g.,
health data, academic data) in a single place. This helps data subjects
to get a “subject-centric” view of their data. However, in a PDS-based
system the shared data is not accessed directly from the source/issuing
authority, so the data consumer often has to rely on an offline
notarization mechanism [7] to prove the authenticity of the shared
document. Sometimes the data consumer (the party with whom the data is
shared) contacts the organization that generated the data (e.g., an
employer calls the university to validate the academic certificate of a
student). This is often time consuming and inefficient. Therefore, we
need a system in which the data consumer can easily verify the
authenticity of the shared document.
Blockchain [8] technology has gained popularity for its role
in crypto-currencies (Bitcoin currently being the best known
of them all). Aside from the use of the blockchain technology
in cryptocurrency, developers and researchers are working to
build applications and services (e.g., identity management,
proof of existence for intellectual property, decentralised DNS
services, and others) that leverage the power and versatility
of the blockchain technology. Blockchain technology offers
secure cryptographic techniques to store data that cannot
be edited by other entities in the blockchain network. The
records in the blockchain are hosted in multiple participating
nodes in the network. Due to its distributed and immutable
characteristics and cryptographic strength, the blockchain can
provide proof of authenticity of any particular record stored
in the blockchain.
In addition, researchers have identified that data subjects want to keep
an audit trail of their sharing [9]. Usually, a blockchain only records
“write operations” that change the state of an object/data; “read
operations” are not recorded in the blockchain (e.g., in Hyperledger).
Keeping an audit log of “read operations” is important. However, such a
log may leak private information about the data subject within the
blockchain network. Therefore, we need a mechanism to protect the privacy
of the individual.
The essence of our approach is to provide authenticity of the shared
documents using blockchain technology. We employ permissioned blockchain
technology (see section II) to ensure that only a limited set of valid
data custodians can join the network and generate hashes of the
documents. The contribution of the paper is twofold. Firstly, we present
an architecture for blockchain-based data sharing with Personal Data
Storage (PDS), where the blockchain provides the notarization service.
Secondly, we provide a mechanism to ensure transactional privacy in our
proposed framework. We developed a prototype using an Ethereum private
blockchain.
In Section 2, we discuss blockchain and its associated technologies.
Section 3 presents the requirements for Personal Data Storage (PDS) based
data sharing. Section 4 presents a system architecture to achieve
“subject-centric” data sharing using PDS and blockchain. A mechanism for
transactional privacy is presented in Section 5. Section 6 presents the
implementation and validation. Section 7 shows the mapping between the
identified requirements (III-B) and the proposed system architecture and
implementation. Section 8 presents the related work. We conclude the
paper with a discussion of future work in Section 9.
II. BACKGROUND ON BLOCKCHAIN
A blockchain is a distributed, always available, irreversible,
tamper-resistant, replicated public repository of data. It allows users
who do not trust each other to agree on an immutable and auditable piece
of data without third-party interaction [17]. In other words, blockchain
technology makes it possible to build an append-only secure database that
relies on a distributed consensus protocol to decide which new data is
valid and should be added. There are different variations of blockchain,
based on who can join the network and how the data is stored in the
network.
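The append-only, tamper-resistant property described above can be illustrated with a minimal sketch of a hash-linked list of blocks. This is not the prototype's code: consensus and replication are left out entirely, SHA-256 and the JSON block layout are assumptions made for illustration, and the names `block_hash`, `append_block` and `is_valid` are hypothetical.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the hash of its predecessor."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a new block that commits to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True
```

Because every block commits to the hash of its predecessor, changing the payload of an earlier block makes `is_valid` fail unless all later hashes are recomputed, which is what the consensus protocol is meant to prevent in a real network.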
A blockchain is called public if every participant can read it and use it
to carry out transactions, and if everyone can also participate in the
consensus process. There is therefore no central register and no trusted
third party.
On the other hand, a blockchain is called private (or semi-private) if
the consensus process can only be carried out by a limited and predefined
number of participants. Write access is given by an organization, and
read permissions can be public or restricted. In the case of a private
blockchain, the consensus mechanism is usually lightweight.
In a blockchain, data or transactions are usually kept on the chain. This
ensures the ACID (Atomicity, Consistency, Isolation, Durability) behavior
of the chain. With a blockchain, the logic of the system requires a large
group of participants to keep copies of everything. If data is lost or
corrupted in any particular node, that node can easily synchronize with
the other nodes in the network. Therefore, the blockchain can provide
durability.
III. DATA SHARING WITH PERSONAL DATA STORE
Researchers have proposed the Personal Health Record (PHR) as a means to
aggregate the different kinds of medical records of a particular patient
[5]. Personal Data Storage is an extension of that concept, where
individuals can store and control their different types (e.g., academic,
financial) of personal data from a central point. An individual’s data
could be co-located with an external service provider (e.g., a hospital,
university or cloud-based fitness tracking provider). Data from a PDS may
be accessed via an API. Users are allowed to selectively share sets of
data with other users. Researchers from MIT have developed a PDS-based
system called OpenPDS [6], which allows users to collect, store, and give
fine-grained access to their data, all while protecting their privacy.
Although a PDS provides individuals with a central point of control over
their data, it cannot by itself prove the authenticity of a document that
is shared with others. We elaborate on this problem with the help of a
scenario in the next section.
A. Scenario
Alice has just graduated from the University of Wonderland. She is now
applying for jobs. Mr. Smith from Xbiz Corporation liked her interview
and wants to see her academic transcripts and academic certificates.
Luckily, Alice uses a PDS service where she has put all her academic
records. She shares those documents with Mr. Smith, who now has access to
them. However, he is not sure about the authenticity of the documents and
wants to verify them. In addition, Alice wants the system to keep an
audit log of who has accessed her documents. While keeping the audit log,
she also wants to be sure that her privacy is preserved.
B. Requirement Analysis
From an analysis of the scenario, we define the following
requirements:
Notarization (Req. 1): As the potential employer (Mr. Smith) cannot
access the shared academic record directly from the data custodian
(University of Wonderland) due to privacy concerns, he needs proof of
authenticity of the documents given to him by potential employees. In our
scenario, as the data is shared from the PDS, Mr. Smith may ask for proof
of authenticity of the shared document. Currently, Alice can try to prove
the authenticity of the shared document by using a traditional
notarization service, although this may also be unreliable.
Alternatively, Mr. Smith can call the university to verify the
authenticity of the shared document. Both of these approaches are time
consuming and do not provide a real-time solution. Therefore, we need a
notarization mechanism that can prove the authenticity of the document in
a PDS-based sharing environment.

[Figure 1: architecture diagram. Components: Membership Service, Data Custodians 1 to 3, Personal Data Store, Context Providers, Data Consumer and Data Consumer Client. Labelled interactions: user registration (register for ID, e.g., DCusid), sends H(DSid + DCusid + DR), store records, hosts data of, define policy, retrieve context information, access the shared data, validation process, validate hash.]
Fig. 1: The system architecture of blockchain based data notarization
Audit Trail (Req. 2): Individuals want to control access to their
personal data. At the same time, they want to see logs of who has
accessed their data and at what time. Regulations such as HIPAA [15] and
the Gramm-Leach-Bliley Act [16] have specific mandates relating to audit
trails.
Privacy in Sharing (Req. 3): Privacy plays an important role in personal
data sharing use cases. Individuals are usually concerned about their
privacy when they share some of their personal data with others. For
example, in the scenario, Alice has applied for a job at Xbiz, but she
does not want anybody other than Xbiz to know that she has applied there.
Therefore, we need a mechanism that can provide privacy while sharing
data.
IV. SYSTEM ARCHITECTURE
To fulfill the above-mentioned requirements, we have come up with a
system architecture that can provide a notarization service in a
PDS-based sharing environment. We have deployed blockchain technology to
provide real-time notarization in collaboration with the data custodians
and the PDS.
A. Design Choices
In terms of architecture, we have made the following design choices:
1.1 Off-Chain Data Storage: Storing data in the blockchain makes the data
durable. However, it may reduce the performance of the network and
increase the data storage requirements. In the presented scenario, if
universities store the academic records on the chain, the blockchain size
will grow quite rapidly. Therefore, the documents themselves are kept off
chain and only their hashes are stored in the blockchain.
1.2 Consortium Blockchain: As a PDS-based system mainly deals with
personal data, the data subject is concerned about the real identity of
the data consumer. Therefore, we use a permissioned blockchain, which
mandates real-world identity mapping of the participating parties.
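To give a rough sense of the storage argument behind the off-chain design choice, the following back-of-the-envelope sketch compares storing whole documents on chain with storing one digest per document. The record count and average document size are hypothetical figures chosen for illustration, a 32-byte SHA-256 digest is assumed (the hash function is not named here), and per-transaction overhead such as keys and signatures is ignored.

```python
SHA256_DIGEST_BYTES = 32  # size of one SHA-256 digest (assumed hash function)


def on_chain_bytes(num_records, bytes_per_record):
    """Payload bytes stored on chain, ignoring per-transaction overhead."""
    return num_records * bytes_per_record


# Hypothetical workload, chosen only for illustration.
num_records = 100_000           # academic records issued by all custodians
avg_document_bytes = 1_000_000  # 1 MB per document

full_documents = on_chain_bytes(num_records, avg_document_bytes)
hashes_only = on_chain_bytes(num_records, SHA256_DIGEST_BYTES)

print(f"documents on chain: {full_documents / 1e9:.1f} GB")
print(f"hashes on chain:    {hashes_only / 1e6:.1f} MB")
```

Under these assumptions the on-chain payload shrinks from about 100 GB to about 3.2 MB, and the hash size stays fixed regardless of how large the individual documents are.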
B. Architecture
Figure 1 shows the proposed architecture for PDS-based data sharing. The
framework consists of the Membership Service, data custodians (e.g.,
universities), the Personal Data Storage (PDS), context providers and the
data consumer client (a blockchain client for verification). There will
be multiple data custodians (e.g., all the universities in Australia),
which host the data subjects’ data, and multiple data consumers (e.g.,
many companies) in the system. There could also be multiple PDSs, which
manage data sharing on behalf of the data subjects.
The data custodian (e.g., a university) generates or issues a document
for a particular data subject. The custodian stores the main copy and
generates a hash over the data-subject id (DSid), the data-custodian id
(DCusid) and the data resource (DR). The data custodian then uploads this
hash to the blockchain: H{DSid || DCusid || DR}. Data is stored as a
key-value pair in the blockchain. The custodian id, the student id (e.g.,
a unique personally identifiable number) and the document id are
concatenated to form the key (e.g., Key(datacustodianid || studentid ||
documentid)), and the value is the produced hash (e.g., H{DSid || DCusid
|| DR}). In this blockchain network, only a data custodian can upload a
hash to the blockchain.
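The key-value scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the prototype's code: SHA-256, UTF-8 encoding of the identifiers and the literal "||" separator are assumptions, a plain dictionary stands in for the blockchain's key-value store, and `make_key`, `make_hash` and `notarize` are hypothetical names.

```python
import hashlib

SEPARATOR = "||"  # assumed concrete encoding of the concatenation operator


def make_key(custodian_id, subject_id, document_id):
    """Key(datacustodianid || studentid || documentid)."""
    return SEPARATOR.join([custodian_id, subject_id, document_id])


def make_hash(subject_id, custodian_id, document_bytes):
    """H{DSid || DCusid || DR}, computed with SHA-256 (an assumption)."""
    digest = hashlib.sha256()
    digest.update(subject_id.encode("utf-8"))
    digest.update(SEPARATOR.encode("utf-8"))
    digest.update(custodian_id.encode("utf-8"))
    digest.update(SEPARATOR.encode("utf-8"))
    digest.update(document_bytes)
    return digest.hexdigest()


def notarize(ledger, custodian_id, subject_id, document_id, document_bytes):
    """Record the hash under its key; `ledger` is a dict standing in for the chain."""
    key = make_key(custodian_id, subject_id, document_id)
    ledger[key] = make_hash(subject_id, custodian_id, document_bytes)
    return key
```

For example, `notarize(ledger, "uni-wonderland", "s1234", "transcript-2018", pdf_bytes)` would store one 64-character hexadecimal digest under the key `uni-wonderland||s1234||transcript-2018`; all identifiers in this call are made up. In the real network the write would be a transaction that only a registered data custodian is allowed to submit.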
The data subject downloads the data (e.g., an academic certificate) from
the data custodian and uploads it to the PDS. As the PDS is a cloud data
storage platform, the data subject encrypts the data using a symmetric
encryption key (SKAES) before storing it in the PDS. When the data
resource is shared after fulfilling the policy conditions, the symmetric
encryption key is passed to the data consumer for decryption. The data
subject defines the data sharing policy using our proposed data sharing
policy language [9]. The data consumer gets a notification when the data
subject shares any documents. When the data consumer clicks on the shared
resource link, it redirects the data consumer to the PDS. The PDS
authenticates the data consumer and enforces the defined policy. If the
policy is successfully evaluated, then the data consumer gets access to
the data. Figure 2 shows the interactions between the different parties
in the PDS-based system. In addition, revocation of an issued document is
rare but not impossible. There are occasions when the issuing authority
may revoke an issued document for various reasons.
V. TRANSACTIONAL PRIVACY
In a public blockchain like Bitcoin, anyone can join the network and work
as a “miner” to verify transactions without revealing their true
identity. This is problematic in personal data sharing use cases, because
individuals want to be sure with whom they are sharing their data.
Therefore, we propose to use a consortium-based blockchain (for details
see section II) for the presented scenario.
A. Audit Trail
In a blockchain, a transaction is an activity that changes the state of
the blockchain. Transactions are recorded in the blockchain as part of a
block. As a read or data query does not change the state of the
blockchain, it is not treated as a transaction and is not kept in the
blockchain. We have also explored a few current blockchain
implementations, such as Ethereum and Hyperledger, and found that a
“read” operation is not treated as a transaction.
1) Transactions and Visibility: In the presented scenario, there are
different types of transactions in the blockchain network. Data
custodians have two types of transactions: “uploading hash” and
revocation. These transactions need to be visible to everyone within the
network. Any employer with a valid blockchain client (of this network)
should be able to query this information to check the authenticity of a
document.
We propose to divide transactions into public and private ones.
Information concerning public transactions and the consensus process is
available to all the nodes within the system that have the right
privilege (e.g., data custodians). For example, universities in this
blockchain network play the role of “data custodian”. Therefore, when a
university uploads the “hash” of a document, this transaction is
communicated to all the “data custodian” nodes. All the creator (data
custodian) nodes then vote on whether the transaction is “valid” using a
Byzantine Fault Tolerant (BFT) algorithm [13]. This provides safety for
the network.
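The voting step can be sketched as a simple quorum check. The text does not say which BFT protocol or threshold is used, so this sketch assumes the standard rule for protocols that tolerate f Byzantine nodes among n >= 3f + 1 voters: a transaction is accepted once at least ceil((n + f + 1) / 2) custodian nodes vote “valid”. The function names and node identifiers are hypothetical, and message exchange, signatures and view changes are omitted.

```python
def max_faulty(n):
    """Largest number f of Byzantine nodes tolerated when n >= 3f + 1."""
    return (n - 1) // 3


def quorum_size(n):
    """Votes needed so that any two quorums share at least one honest node."""
    f = max_faulty(n)
    return (n + f + 2) // 2  # integer form of ceil((n + f + 1) / 2)


def is_accepted(votes, custodians):
    """Accept a transaction when enough registered custodians voted 'valid'.

    votes maps a node id to True ("valid") or False; votes from ids that are
    not in `custodians` are ignored.
    """
    valid_votes = sum(1 for node in custodians if votes.get(node) is True)
    return valid_votes >= quorum_size(len(custodians))
```

With four custodian nodes, one faulty node is tolerated and three “valid” votes are required, so a single misbehaving university can neither block a valid hash upload nor force through an invalid one on its own.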
The query transaction will be treated as a private transaction. The
private transaction will be encrypted with the public key of the data
subject (e.g., the student). Therefore, only the data subject can see who
has accessed their data.
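$$\mathit{PrivateTransaction} = \left(pk^{\text{data-consumer}}_{\text{sign}},\; pk^{\text{data-subject}}_{\text{sign}},\; \mathit{Key}_{\text{transaction}},\; \mathit{timestamp}\right)_{\mathrm{EncryptBy}(pk^{\text{data-subject}})}$$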
The structure of the private transaction is given by the expression
above. It keeps a record of who has accessed the data (i.e., the data
consumer), whose data was accessed (i.e., the data subject), what data
was accessed (e.g., the hash of the data) and the time of the access. The
record is then encrypted with the public key of the data subject.
VI. IMPLEMENTATION
We have implemented a prototype to demonstrate how universities (data
custodians) can generate the hash of an academic record or transcript
using our proposed schema. The “hash” transaction is then communicated to
the other nodes. After a successful consensus procedure, the “hash” of
the record is added to the blockchain.
The data consumer gets a copy of the shared document from the PDS after
the successful evaluation of the data sharing policy [9]. The data
consumer then uses a blockchain client interface to validate the
authenticity of the shared academic record. The blockchain client
generates the hash of the document using the same hashing schema
(presented in IV-B) and queries the blockchain. If the document is valid,
the client reports a match. If the document is forged, the client reports
that it is not a match. If the blockchain client does not find the hash
in the blockchain, it shows a question mark. Lastly, if the issued
document has been revoked by the university, it is shown as revoked.
Figure 4 shows the different types of output of the verification client.
In this paper, we have not implemented audit trail transactions in the
blockchain due to time limitations; we plan this as future work.
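The four outcomes of the verification client can be sketched as below. This is an illustrative sketch and not the prototype's code: it assumes the client looks up the record by the custodian/student/document key and then compares hashes, that revocations are available as a set of revoked keys, and that the hash is SHA-256 over identifiers joined by "||". The names `Result`, `document_hash` and `verify` are hypothetical.

```python
import hashlib
from enum import Enum


class Result(Enum):
    MATCH = "shared document is a match"
    NO_MATCH = "shared document is not a match"
    NOT_FOUND = "certificate cannot be found"
    REVOKED = "certificate has been revoked"


def document_hash(subject_id, custodian_id, document_bytes):
    """H{DSid || DCusid || DR} with SHA-256 and a '||' separator (assumptions)."""
    prefix = f"{subject_id}||{custodian_id}||".encode("utf-8")
    return hashlib.sha256(prefix + document_bytes).hexdigest()


def verify(ledger, revoked_keys, custodian_id, subject_id, document_id,
           document_bytes):
    """Classify a shared document against the notarized record.

    `ledger` (dict: key -> hash) and `revoked_keys` (set of keys) stand in
    for queries to the blockchain.
    """
    key = "||".join([custodian_id, subject_id, document_id])
    if key not in ledger:
        return Result.NOT_FOUND
    if key in revoked_keys:
        return Result.REVOKED
    if ledger[key] != document_hash(subject_id, custodian_id, document_bytes):
        return Result.NO_MATCH
    return Result.MATCH
```

Checking revocation before comparing hashes is a design choice of this sketch: a revoked certificate is reported as revoked even if the presented copy is byte-identical to the one originally issued.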
Figure 3 shows the time (in milliseconds) taken by upload and
verification transactions for different file sizes. The experiment shows
that the time for both uploading and verification is nearly the same
across file sizes (between 50KB and 25MB). The source code of the project
is available on GitHub (see footnote 1). A block explorer shows a
transaction and its payload in the Ethereum blockchain network. We have
uploaded a demonstration video online to show our blockchain
implementation (see footnote 2).

Footnote 1: https://github.com/sardap/BARS

[Figure 2: sequence diagram. Participants: DataCustodian, DataSubject, Blockchain, DataConsumer, PDS, ContextProvider. Message labels: Download the document; Return document; Send Hash ({H(DCusid || DSid || File)}Sign DCus); Transaction added to blockchain; Upload the document; File is stored; Shares the document with DataConsumer; Send notification of sharing; Request to access shared document; Authenticate the data consumer; Request context info if needed; Reply the context info; Policy Evaluation; Policy is evaluated as deny; Access denied; Access granted; Reply of data access request; Query the blockchain; Reply (valid/invalid/revoked).]
Fig. 2: Sequence diagram for sharing and verification

[Figure 3: plot of uploading and verification time (in milliseconds) against file size.]
Fig. 3: Performance of uploading and verification
VII. RELATED WORK
Several higher education institutions have employed blockchain technology
to design different solutions and approaches related to higher education.
The majority of these solutions use the Bitcoin blockchain [2]. One
incubation project is based on the Bitcoin blockchain and is led by the
Media Lab Learning Initiative at the Massachusetts Institute of
Technology (MIT) [1]. This approach addresses the digitization of
academic certificates; however, it does not consider selective sharing.
The generated certificates are publicly viewable.
There are also other higher education institutions that use, or intend to
use, blockchain technology. In 2015, a software engineering school in San
Francisco, the Holberton School, announced that it would use the
technology to help employers verify academic credentials [10]. Each
graduate will receive a paper certificate and a digital certificate
number that they can include on resumes. The digital number will allow
employers to verify the certificate’s validity. Therefore, it only allows
verification, not sharing of the certificate online.

Footnote 2: https://www.youtube.com/watch?v=3Hbr9kBhn9s

[Figure 4: screenshots of the verification client. Panels: (a) Shared Document is a match; (b) Shared Document is not a match; (c) The certificate cannot be found; (d) The certificate has been revoked.]
Fig. 4: Results of the verification client
In [3], Yan et al. have proposed a blockchain mechanism to protect
privacy in an OpenPDS [14] based environment. OpenPDS is a
privacy-preserving SafeAnswer platform which stores user data and
provides an answer when a query is submitted using its APIs. They have
also proposed a notary service using blockchain. However, they have
proposed to store the personal data in the blockchain. Storing data in
the blockchain is prohibitively expensive. Therefore, in our approach, we
store the data in the PDS, and the blockchain only stores the hash of the
original data.
VIII. CONCLUSION AND FUTURE WORK
Individuals’ data remain scattered among many data-custodian systems.
Individuals may even lose access to their data-custodian systems. The
personal data store (PDS) provides a common point of storage and control
for different types of data about a particular individual, which we refer
to as subject-centric data storage. When data is shared from a PDS-based
system, there is the issue of how to verify its authenticity, as the data
consumer often requires verification of the shared document. In this
paper, we have provided a blockchain-based architecture for verifying the
authenticity of shared documents in real time while maintaining the
necessary privacy. In addition, we have proposed a blockchain-based
mechanism for keeping an audit trail of accesses to the shared
information while keeping the audit trail private to the individuals
concerned. In particular, we have used a consortium-based blockchain
network, so that the privacy risk is minimized and only valid data
creators (e.g., universities) can join the blockchain network, while all
potential data consumers can have verification access.
REFERENCES
[1] Initiative MLL. Digital Certificates Project. Available from: http://certificates.media.mit.edu/ (accessed on 18th May, 2018)
[2] Bond F, Amati F, Blousson G. Blockchain, academic verification use case; 2015. Available from: https://s3.amazonaws.com/signatura-usercontent/blockchain academic verification use case.pdf (accessed on 18th May, 2018)
[3] Yan Z, Gan G, Riad K. BC-PDS: Protecting Privacy and Self-Sovereignty through BlockChains for OpenPDS. In Service-Oriented System Engineering (SOSE), 2017 IEEE Symposium on 2017 Apr 6 (pp. 138-144). IEEE.
[4] Dong X, Guo B, Duan X, Shen Y, Zhang H, Shen Y. DSPM: A Platform for Personal Data Share and Privacy Protect Based on Metadata. In Embedded Software and Systems (ICESS), 2016 13th International Conference on 2016 Aug 13 (pp. 182-185). IEEE.
[5] Tang, Paul C., Joan S. Ash, David W. Bates, J. Marc Overhage, and Daniel Z. Sands. "Personal health records: definitions, benefits, and strategies for overcoming barriers to adoption." Journal of the American Medical Informatics Association 13, no. 2 (2006): 121-126.
[6] de Montjoye YA, Wang SS, Pentland A, Anh DT, Datta A. On the Trusted Use of Large-Scale Personal Data. IEEE Data Eng. Bull. 2012 Dec;35(4):5-8.
[7] What is Notarization? https://www.nationalnotary.org/knowledge-center/about-notaries/what-is-notarization (accessed 15th February, 2018)
[8] Pilkington M. 11 Blockchain technology: principles and applications. Research handbook on digital transformations. 2016 Sep 30:225.
[9] Chowdhury MJ, Colman A, Han J, Kabir MA. A Policy Framework for Subject-Driven Data Sharing. In Proceedings of the 51st Hawaii International Conference on System Sciences 2018 Jan 3.
[10] Coleman L. Engineering School Simplifies Verifying Certificates Using The Blockchain - CryptoCoinsNews; 2015. Available from: https://www.cryptocoinsnews.com/engineering-school-simplifies-verifying-certificates-using-block-chain/ (accessed on 18th May, 2018)
[11] Wood G. Ethereum: A secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper. 2014 Apr;151:1-32.
[12] MyEquals, https://www.myequals.edu.au/ (accessed on 18th May, 2018)
[13] Vukolić M. The quest for scalable blockchain fabric: Proof-of-work vs. BFT replication. In International Workshop on Open Problems in Network Security 2015 Oct 29 (pp. 112-125). Springer, Cham.
[14] Yan Z, Gan G, Riad K. BC-PDS: Protecting Privacy and Self-Sovereignty through BlockChains for OpenPDS. In Service-Oriented System Engineering (SOSE), 2017 IEEE Symposium on 2017 Apr 6 (pp. 138-144). IEEE.
[15] Lee WB, Lee CD. A cryptographic key management solution for HIPAA privacy/security regulations. IEEE Transactions on Information Technology in Biomedicine. 2008 Jan;12(1):34-41.
[16] Akhigbe A, Whyte AM. The Gramm-Leach-Bliley Act of 1999: risk implications for the financial services industry. Journal of Financial Research. 2004 Sep 1;27(3):435-46.
[17] Alexopoulos N, Daubert J, Mühlhäuser M, Habib SM. Beyond the Hype: On Using Blockchains in Trust Management for Authentication. In Trustcom/BigDataSE/ICESS, 2017 IEEE 2017 Aug 1 (pp. 546-553). IEEE.