Applied Sciences
Article
MedChain: Efficient Healthcare Data Sharing via Blockchain
Bingqing Shen, Jingzhi Guo * and Yilong Yang
Faculty of Science and Technology, University of Macau, Macau SAR 999078, China;
daniel.shen@connect.umac.mo (B.S.); yylonly@gmail.com (Y.Y.)
*Correspondence: jzguo@umac.mo; Tel.: +853-8822-4360
Received: 27 November 2018; Accepted: 28 December 2018; Published: 22 March 2019


Abstract:
Healthcare information exchange is an important research topic, which can benefit both
healthcare providers and patients. In healthcare data sharing, many cloud-based solutions have been
proposed, but the trustworthiness of a third-party cloud service is questionable. Recently, blockchain
has been introduced in healthcare record sharing, which does not rely on trusting a third party.
However, existing approaches only focus on the records collected from medical examination. They are
not efficient in sharing data streams continuously generated from sensors and other monitoring
devices. Today, IoT devices have been widely deployed and sensors and mobile applications can
monitor patients’ body conditions. The collected data are shared to laboratories and institutions
for diagnosis and further study. Moreover, existing approaches are too rigid to efficiently support
metadata change. In this paper, an efficient data-sharing scheme is proposed, called MedChain,
which combines blockchain, digest chain, and structured P2P network techniques to overcome the
above efficiency issues in the existing approaches for sharing both types of healthcare data. Based on
MedChain, a session-based healthcare data-sharing scheme is devised, which brings flexibility in
data sharing. The evaluation results show that MedChain can achieve higher efficiency and satisfy
the security requirements in data sharing.
Keywords:
blockchain; healthcare data; electronic health record; data stream; healthcare information
exchange; data sharing; peer-to-peer; decentralization; digest chain
1. Introduction
In the industry, healthcare is an important sector. In addition to traditional medical examination,
a patient’s body states, including heart rate, diabetes, electroencephalogram, and other vital
biomedical signals, can be monitored by applying various medical tracking devices for diagnosis [1,2]
or health quality improvement [3]. Sharing such a huge amount of data among organizations can
facilitate medical diagnosis, biomedical research, and policy making. For example, a doctor may need
the medical history of a patient stored in different hospitals when deciding on the best treatment.
Moreover, this market will have a big impact on the economy [4]. In healthcare data sharing, user
trust is a key factor for success. Any deficiency could result in distrust among patients towards the
e-healthcare market [5].
For scalability, flexibility, and economic reasons, some cloud-based healthcare data sharing
schemes [6] have been proposed through data encryption and operation anonymization. However,
users are always hesitant to transfer their private and sensitive data to the cloud due to its potential
risks [7]. Recently, blockchain-based solutions have been widely discussed [8]. Blockchain can achieve
many compelling features without trusting a third party. The most important one is tamper resistance,
which is achieved by the special data structure and the consensus mechanism. Moreover, data stored
on a blockchain are highly reliable and available through replication. With the above advantages,
the market evidence has shown the potential of the blockchain-based solutions from both a profit shift
(Figure 1a) and management awareness (Figure 1b).
Appl. Sci. 2019, 9, 1207; doi:10.3390/app9061207 www.mdpi.com/journal/applsci
Figure 1. The potential of blockchain-based solution and major external barriers of adoption in
the healthcare industry: (a) Healthcare investment in blockchain by 2018 [9]. (b) Awareness about
blockchain technology among medical practice administrators and executives [10]. Major external
barriers of healthcare blockchain adoption in (c) the world [11] and (d) Asia-Pacific [12], answered
by respondents.
However, there are still some challenges in healthcare blockchain adoption. Within the major
technical barriers, efficiency (including scalability and latency) is one of the top concerns (Figure 1c,d).
Existing blockchain-based solutions are less efficient in sharing data streams from Internet-of-Things
(IoT) devices. First, the data from IoT devices are time-series data streams, such as ECG signals.
In storage, they are cut into many data chunks. Different from a single healthcare record, accessing a
data stream requires access to all the data chunks. Existing schemes are designed to verify the integrity
for a single record. For verifying a data stream, they need to download the digest of each chunk and
check the integrity for all of them, which is inefficient, especially for accessing a long stream. Second,
the description of data could be changed from time to time. For example, a new tag or category could
be added to the existing data. Managing mutable information on blockchain either needs to add new
blocks, which consumes more storage space, or needs to re-write the entire blockchain. Moreover,
data sharing is a dynamic process. When a sharing session is over, some temporary information, such as
the location of the actual data and the cryptographic keys used for security purposes, needs to be
cleared so that it does not litter the storage space. Existing schemes are incapable of providing
such a mechanism, since the information stored on a blockchain is immutable. When IoT is applied in
healthcare, it can be expected that more data will be shared and storage space overhead will become
a prominent problem. Thus, efficiently sharing healthcare data has become an urgent problem.
To solve the problem, a new healthcare data sharing solution, called MedChain, is proposed,
which introduces the following novelties. First, it leverages two separate decentralized networks:
a blockchain network and a peer-to-peer (P2P) storage network. The blockchain network stores the
fingerprints of data, sessions, and operations, such as data digests, which are immutable, while the
P2P storage network stores the descriptions of data and sessions, which are mutable. By separating the
mutable part into another P2P network, data descriptions can be easily updated without bringing
additional overhead to the immutable part. Second, a new data structure, called digest chain,
is proposed to facilitate data stream verification. By chaining the digests of the chunks of the same
data stream, the problem of excessive digest downloads and integrity checks can be solved. Third,
based on the proposed architecture, a session is introduced in the data sharing process for packaging
and removing the mutable information, which can largely reduce storage overhead. Moreover,
the security properties of the system are validated, which is crucial in healthcare data sharing. Thus,
compared with state-of-the-art works, this paper has the following contributions.
(1) It has described a MedChain data-sharing framework for flexibly managing different types of
information derived from healthcare data.
(2) It has also devised a chained digest creation approach to efficiently check the integrity of shared
medical IoT data streams.
(3) It has provided a session-based data-sharing scheme for achieving efficiency improvement and
the security, integrity, auditability, and privacy-preservation goals.
The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 describes
the proposed model. Section 4 presents the data-sharing scheme based on the proposed model.
The analysis of the security properties and the experiment results are discussed in Section 5. Lastly,
Section 6 concludes the paper.
2. Related Works
In the medical and healthcare sector, legacy systems generally only exchange medical resources
internally [13] and are not interoperable with external systems [14]. Yet, evidence [15,16] shows
numerous benefits from connecting these systems for integrated and improved healthcare, calling
on health informatics researchers for an interconnection solution among different organizations. One of
the most important challenges is inter-organizational data sharing [17], which demands that the medical
data collected by one healthcare provider be securely accessible to other entities, such as a doctor or a
research organization.
Cloud computing is considered to be an immediate solution for medical data storage and
sharing [6,18,19] because it is scalable, highly available, and flexible in pricing. Yet, due to the privacy
and confidentiality requirements in healthcare data sharing, additional security means must be applied
to mitigate the risks of cloud sourcing in the healthcare and public health industry [7]. Many studies [4,20]
have been conducted to address the security and privacy issues. For example, Reference [21] analyzes
the security and privacy issues in the access and management of EHRs. References [22,23] explore
solutions for searching over encrypted health records in a public cloud. Reference [24] studies the identity
exposure problem in cloud-based healthcare applications and proposes an anonymous authentication
approach. However, cloud services are not fully trusted by users, due to the infrastructure security,
data ownership, and vendor lock-in issues in cloud sourcing [7,25,26].
Our experience [18] shows the importance of inter-organizational healthcare data sharing.
In Macau, for example, the hemodialysis center does not provide a platform for examination result
sharing. Patients have to carry the paper/CD-based results to hospitals for diagnosis. Our previous
work [18] provides a hybrid cloud-based medical resource sharing solution for electronic healthcare
record (EHR) sharing, which relies on the trust of an external cloud service provider. This work
replaces the cloud with two decentralized networks to remove the need for the external party.
To win user trust, blockchain-enabled medical record sharing has been extensively discussed
within the last two years [8]. Yue et al. reported their early adoption in Reference [27].
They treat blockchain as a database for storing health records. Likewise, References [28,29] use
permissioned blockchain as a repository of health data to attain access accountability and data integrity.
These approaches need to transfer the actual data to blockchain servers, bringing extra overhead in
data transmission. Reference [30] uses blockchain to store only the address of shared data. However,
data sharing is not sufficiently discussed.
Recently, MeDShare [31] enables the sharing of patient medical data to third-party research
institutes in cloud repositories. It uses a smart contract for data access auditing and access control.
However, MeDShare also shares the user trust issue in cloud computing. In another design for medical
image sharing [32], the source files are still stored on the healthcare providers' side and only the URLs
for file access are stored on the blockchain. Unfortunately, these solutions do not provide an efficient
approach for data search over the blockchain. Since blockchain is not search-friendly, looking up a
specific record becomes very slow as the amount of data increases.
MedRec [33] and MedBlock [34] provide a breadcrumb mechanism for record search.
Breadcrumbs maintain the addresses of blocks containing the records of a patient, grouped by healthcare
provider or department. Reference [35] implements keyword-based search over encrypted data for
record matching. However, they still share three limitations. First, a blockchain only stores immutable
records. Yet, the actual locations of data are likely to change. When a URL is modified, a new
block containing the new URL has to be generated rather than modifying the old block (due to the content
immutability of blockchain). Second, their data sharing schemes do not reclaim the space after a
sharing. Thus, a data sharing session also takes a new block [35]. Both limitations will result in high
storage overhead. Third, the breadcrumb mechanism is only efficient in looking up a single
record. In data stream sharing, it has O(n) communication overhead, where n is the number
of data chunks.
3. MedChain Model
This section discusses the MedChain model. First, the overall system architecture and data
representations are introduced. Then, the details of the blockchain service and the directory service
are introduced.
3.1. System Architecture
MedChain is constructed on a decentralized network, which connects all healthcare providers,
including hospitals, medical centers, clinics, and healthcare corporations. The MedChain network
contains two types of peer nodes: super peers and edge peers. A simple but reasonable peer selection
approach is adopted. Super peers consist of the servers from large healthcare providers, such as
national hospitals, which are more capable in computing and storage, providing the main infrastructure
of data sharing. The edge peers are the servers from small providers such as community clinics,
which only store the actual patient data. The overall network model is shown in Figure 2, in which a
trusted Certificate Authority (CA) is employed for certifying and validating public keys. The network
contains two sub-networks: a blockchain service network and a directory service network, which store
different types of information derived from healthcare data.
The resources of a super peer are divided into three modules: blockchain service, directory service,
and healthcare database (HDB), as shown in Figure 3. The blockchain server maintains a complete
blockchain for verifying data integrity and auditing activities. The directory server maintains the
inventory of user healthcare data, maps them to the actual location of storage, and manages sessions
for data sharing. The servers of the two types on all super-peers form two sub-networks. The last
component, HDB, stores the actual healthcare data of patients. Note that MedChain does not require
the healthcare provider to migrate the actual data to the new system. Instead, it provides the reference
to the data in the legacy system for access. Thus, it is an integrated solution, which can reduce
the difficulty of adoption, since most healthcare providers are reluctant to migrate their data to a
new platform.
Figure 2. MedChain architecture.
Figure 3. Modules of a super peer.
3.2. Preliminaries
Before describing the details of the system, some preliminaries are introduced first to facilitate the
subsequent description.
3.2.1. Healthcare Data
Healthcare data describes a patient’s health state, in the form of electronic healthcare record, e.g.,
a CT image, or data stream collected by a medical sensor, e.g., ECG signal. Each healthcare record/data
stream is assigned a unique ID (DID). A data stream is a time series record, which maps each sampled
value to a time label. A data stream is cut and saved into multiple data chunks. One chunk records the
sampled value within a time interval. Thus, each chunk has a start time and an end time labeling the
range of the data chunk. All data chunks of the same data stream share the same DID, but different
chunk indexes.
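The chunking described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `DataChunk` and `cut_stream`, the fixed interval length, and the half-open time range of each chunk are hypothetical choices made here, since the paper does not prescribe a particular chunking rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (time label, sampled value)


@dataclass
class DataChunk:
    did: str            # DID shared by every chunk of the same stream
    index: int          # chunk index, unique within the stream
    start_time: float   # start of the covered time range
    end_time: float     # end of the covered time range (exclusive, by assumption)
    samples: List[Sample]


def cut_stream(did: str, samples: List[Sample], interval: float) -> List[DataChunk]:
    """Cut a time-ordered stream into chunks that each cover `interval` time units."""
    if not samples:
        return []
    origin = samples[0][0]
    buckets: Dict[int, List[Sample]] = {}
    for t, value in samples:
        slot = int((t - origin) // interval)  # which time interval the sample falls in
        buckets.setdefault(slot, []).append((t, value))
    chunks: List[DataChunk] = []
    for index, slot in enumerate(sorted(buckets)):
        start = origin + slot * interval
        chunks.append(DataChunk(did, index, start, start + interval, buckets[slot]))
    return chunks
```

For instance, four ECG-like samples at times 0.0, 0.5, 1.0, and 2.5 with a one-second interval yield three chunks with indexes 0, 1, and 2, all carrying the same DID.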
The derived information from healthcare data can be classified into immutable information and
mutable information. The immutable information maintains the security, integrity, and authenticity
of data, which includes the identity of the related users, the digest of the data, the endorsement (e.g.,
digital signature) from the data provider, and the data collection range of a stream. The mutable
information facilitates data query and access, which includes the description, the tags, and the network
location (i.e., URL) of the actual data.
For healthcare data, the identity of the data, the network location, and the start and end times of
each data stream chunk are private to the patient, as is the actual data. Thus,
privacy protection is needed in MedChain to disassociate the DID, the data location, and the actual data
from the data owner (i.e., the patient).
3.2.2. User Roles
MedChain contains three user roles: patient, requester, and healthcare provider.
(1) Patient: shares their data through MedChain with, for instance, a doctor, an insurance company,
or a research center for medical consultancy.
(2) Requester: a party, e.g., a doctor, that asks a patient to share some of their healthcare data
through MedChain.
(3) Healthcare provider: maintains the actual healthcare data of patients.
3.2.3. Cryptographic Keys
Cryptographic keys are applied to fulfill the security and privacy requirements of the system
design. The keys to achieve confidentiality over untrusted communication channels and storage space
are highlighted as follows.
Patient public-private key pair (PK_pat/SK_pat).
Healthcare provider public-private key pair (PK_sp/SK_sp).
Requester public-private key pair (PK_req/SK_req).
Inventory secret S_data: a symmetric secret key for patient inventory access, generated by a
healthcare provider.
Section secret K_sec: a symmetric secret key for accessing a section of a session, generated by
the patient.
In the implementation, MedChain employs elliptic curve cryptography (ECC) [36] for
public-private key pair generation. Moreover, the following notation for cryptographic functions
is employed for description consistency.
E_key(m): encrypts message m with key.
D_key(m): decrypts message m with key.
H(m): creates a hash code/digest of content m.
For authentication and integrity purposes, a message m sent by a user is signed with the user’s
private key to generate a signature appended to m. Later, the signature is verified with the public
key of the user.
3.2.4. Other Notations
In addition to the above notation, the data formats of the blockchain, directory, and messages are
defined with basic set and logic notations to facilitate description. Throughout the paper, the symbols
for representing the relations of elements are listed in Table 1.
Table 1. Notations.
Notation    Description
:=          Definition
⟨···⟩       Tuple, representing data or message format
{···}       Set, representing one or more items of the same type
(,···)      Optional element(s)
|           Exclusive disjunction (XOR)
‖           String concatenation operation
.           Membership
→           Message transmission
←           Assignment
3.3. Blockchain Service
A blockchain is a distributed ledger recording the events of healthcare data generation and data
sharing. It is composed of a growing number of blocks. In MedChain, each block contains one or more
events identified by event hash. An event hash is computed by hashing the event content, as the event
fingerprint. A block also has a block header, which contains:
Merkle root: The root of the Merkle tree [37] constructed from all the event hashes in the block.
Timestamp: The time when the block is created.
Block hash: The hash code computed from the hash of the last block, the Merkle root, and
the timestamp.
The cascaded hash computing at the event level (event hash), the block level (Merkle root), and the
chain level (block hash) ensures the content immutability of a blockchain. If someone wants to modify
the block information, he/she has to modify every subsequent block in the chain. Yet, any tampering
with the content can be easily detected by re-generating the hash codes and comparing them with the
original ones. Thus, the blockchain is effective for storing immutable information, but weak at storing
mutable information. The MedChain blockchain structure is illustrated in Figure 4, which shows a block
with two events.
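The cascaded hashing described above can be illustrated with a short Python sketch. It is a sketch under assumptions, not the MedChain implementation: SHA-256, the pairing rule that duplicates the last hash on odd-sized levels, the `|`-separated header encoding, and the function names `h`, `merkle_root`, and `block_hash` are all hypothetical choices made here.

```python
import hashlib
from typing import List


def h(data: bytes) -> str:
    """Hash helper; SHA-256 is an assumption, the paper only writes H(m)."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(event_hashes: List[str]) -> str:
    """Fold the event hashes of one block into a single Merkle root."""
    if not event_hashes:
        return h(b"")
    level = list(event_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed rule: duplicate the last hash on odd levels
        level = [h((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


def block_hash(prev_block_hash: str, root: str, timestamp: int) -> str:
    """Chain-level hash over the last block hash, the Merkle root, and the timestamp."""
    return h(f"{prev_block_hash}|{root}|{timestamp}".encode())


# A block with two events, as in Figure 4.
event_hashes = [h(b"event 1 content"), h(b"event 2 content")]
header_hash = block_hash("hash-of-block-n-1", merkle_root(event_hashes), 1553212800)
```

Changing a single byte of either event content changes its event hash, hence the Merkle root, hence the block hash, which is how re-generating and comparing the hash codes exposes tampering.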
Figure 4. An illustration of the MedChain blockchain structure with a digest chain.
In MedChain, the blockchain service manages the immutable information of healthcare data.
Two types of events are recorded on a blockchain: data generation event and session creation
event. The immutable information and the format of events will be elaborated in detail in the
following paragraphs.
Data Generation Event (Event_DG). A data generation event is created when a healthcare record
file or a data chunk file is generated. It contains DID, chunk index, patient public key, healthcare
provider public key and signature (Sign_sp), data digest, and the reference (Ref_L) to the last chunk
location (identified by the block hash (BH_L) and the event hash (EH_L)) on the blockchain (for data
stream only) and event type. As shown in Def. (1), the privacy-sensitive items, including DID, chunk
index, and PK_pat, are encrypted with PK_pat for disassociating a patient with a healthcare provider.
The data digest is created by hashing the data file for data integrity check. See the sub-section below
for the details of the digest chain.
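\[
\begin{aligned}
Event_{DG} &:= \langle Content,\ PK_{sp},\ Digest,\ Sign_{sp}(,\ Ref_{L}),\ Type \rangle \\
Content &\leftarrow E_{PK_{pat}}\big(DID(,\ Index),\ PK_{pat}\big) \\
Digest &\leftarrow H(Data) \mid H\big(Chunk(i) \,\|\, Digest_{i-1}\big) \\
Ref_{L} &\leftarrow \langle BH_{L},\ EH_{L} \rangle \\
Type &\leftarrow AddData
\end{aligned}
\tag{1}
\]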
Session Creation Event (Event_SC). A session creation event is created when a patient grants the
access of some of the healthcare data to a requester. It contains a list of DIDs. For the data stream,
the start time (st) and end time (et) can be specified as an access constraint. It also contains the public
keys of the patient and the requester for identifying the session participants, as well as the signature
of the patient (Sign_pat). The formal definition of the session creation event is shown in Def. (2).
For de-identification, DID, st, et, and PK_req are encrypted with PK_pat. The session digest is generated
by hashing the DIDs and the time constraints of the shared data and the session participants’ public keys.
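\[
\begin{aligned}
Event_{SC} &:= \langle Content,\ PK_{pat},\ Digest,\ Sign_{pat},\ Type \rangle \\
Content &\leftarrow E_{PK_{pat}}\big(\{DID(,\ st,\ et)\},\ PK_{req}\big) \\
Digest &\leftarrow H\big(\{DID(,\ st,\ et)\} \,\|\, PK_{req} \,\|\, PK_{pat}\big) \\
Type &\leftarrow CreateSession
\end{aligned}
\tag{2}
\]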
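To make the session digest in Def. (2) concrete, the following Python sketch hashes the shared DIDs, their optional time constraints, and the two public keys. The serialization (sorted items joined by `;`), the use of SHA-256, and the name `session_digest` are assumptions for illustration only; the paper specifies just the concatenation order of the hashed parts.

```python
import hashlib
from typing import List, Optional, Tuple

# One shared item: (DID, st, et); st and et are None for a plain healthcare record.
SharedItem = Tuple[str, Optional[int], Optional[int]]


def session_digest(items: List[SharedItem], pk_req: bytes, pk_pat: bytes) -> str:
    """Digest over {DID(, st, et)} || PK_req || PK_pat, following Def. (2)."""
    parts = []
    # Sort by DID so that the set of items hashes the same in any order (assumption).
    for did, st, et in sorted(items, key=lambda item: item[0]):
        parts.append(did if st is None else f"{did},{st},{et}")
    payload = ";".join(parts).encode() + pk_req + pk_pat
    return hashlib.sha256(payload).hexdigest()
```

Because the time constraints are part of the hashed payload, widening the access window of a shared stream after the session is created would no longer match the digest recorded on the blockchain.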
The blockchain servers run the blockchain service on all super peers, which collectively provide a
consortium blockchain network [38]. Each blockchain server maintains a complete blockchain and
they run a distributed consensus algorithm to collectively determine the content of the next block.
MedChain is not coupled with a particular consensus protocol. The current implementation adopts
BFT-SMaRt [39] for its simplicity. It can be easily changed to another protocol, such as
Proof-of-Stake [32], with minimal adaptation.
Digest Chain
To efficiently verify the data stream, the digest generation algorithm is modified. For a data
stream, the digest of the ith chunk is created by hashing (H(·)) the content, which consists of the
chunk data (Chunk(i)) and the digest of the last chunk (Digest_{i-1}), formulated as
H(Chunk(i) ‖ Digest_{i-1}). Then, multiple digests of the chunks covering a continuous sampling time
period form a digest chain. As illustrated in Figure 4, the chunk digests of a digest chain could be
distributed on different blocks and they are connected with the reference Ref_L in the Event_DG. Thus,
to generate a new block, only the block containing the digest of the last chunk needs to be retrieved
through Ref_L for querying the digest of the last chunk.
Importantly, with a digest chain, downloading and verifying the digest of one chunk checks
the integrity of all the previous chunks. If the last chunk is valid, the integrity check can stop.
Otherwise, it repeats on the preceding chunks until a valid chunk is found. This mechanism allows early
termination in the data integrity check, which increases system responsiveness in data stream access.
Note that a digest chain does not need to be verified by the blockchain servers in new block generation.
Verification can be left to data access, which is especially efficient when only part of the data is used.
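The digest chain and its early-terminating check can be sketched as follows. This is an illustrative
sketch rather than the MedChain implementation: SHA-256 stands in for H, the first chunk is assumed to be
chained to an empty digest, and the names chunk_digest, build_digest_chain, and last_valid_index are
hypothetical.

```python
import hashlib

GENESIS = b""  # assumption: the first chunk is chained to an empty digest


def chunk_digest(chunk: bytes, prev_digest: bytes) -> bytes:
    """Digest_i = H(Chunk(i) || Digest_{i-1}); SHA-256 stands in for H."""
    return hashlib.sha256(chunk + prev_digest).digest()


def build_digest_chain(chunks):
    """Return one digest per chunk, each chained to its predecessor."""
    digests, prev = [], GENESIS
    for chunk in chunks:
        prev = chunk_digest(chunk, prev)
        digests.append(prev)
    return digests


def last_valid_index(chunks, on_chain_digests):
    """Walk back from the newest chunk and stop at the first matching digest.

    The local chain is recomputed once; only the comparison against the
    on-chain digests stops early. A match at index i vouches for chunks
    0..i, so no earlier on-chain digest is needed. Returns -1 if none match.
    """
    local = build_digest_chain(chunks)
    for i in range(len(chunks) - 1, -1, -1):
        if local[i] == on_chain_digests[i]:
            return i
    return -1
```

In the common case where the stream is intact, the loop exits on its first iteration, so a client needs
only the digest of the last chunk from the blockchain.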
3.4. Directory Service
The directory service is also an important component in the MedChain model, managing the
mutable information of healthcare data for data identification, location, and sharing. MedChain
provides two types of directories: patient inventory and session. The mutable information and the
format of directories are introduced below in detail.
Inventory. An inventory records all the description details of a patient’s healthcare data
maintained by a healthcare provider. As defined in Def. (3), it includes the inventory ID (InvID),
the inventory content (InvContent), and the encrypted inventory secret for healthcare providers (ES_sp)
and patients (ES_pat). The inventory ID uniquely identifies the inventory with the association of the
patient and the healthcare provider. For privacy, such association is obfuscated by the
inventory secret S_data. The inventory content contains a list of data descriptions, including the DIDs,
metadata (e.g., the summary and the tags of the data), the locations of the actual data (i.e., URLs),
and the references (Ref or Ref_L) to the blockchain for data integrity check. For a single record, Ref refers
to the location (identified by the block hash (BH) and event hash (EH)) of the Event_DG on the blockchain.
For a data stream, Ref_L refers to the location of the last chunk Event_DG.
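$$
\begin{aligned}
Inventory &\triangleq \langle InvID,\ InvContent,\ ES_{sp},\ ES_{pat} \rangle \\
InvID &\triangleq H(PK_{user} \parallel S_{data}) \\
InvContent &\triangleq E_{S_{data}}(\{DID,\ Metadata,\ URL,\ Ref \mid Ref_L\}) \\
Ref &\triangleq \langle BH,\ EH \rangle \\
Ref_L &\triangleq \langle BH_L,\ EH_L \rangle \\
ES_{sp|pat} &\triangleq E_{PK_{sp}|PK_{pat}}(S_{data})
\end{aligned}
\tag{3}
$$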
A patient may have multiple inventories from different healthcare providers (i.e., different S_data).
In addition, a provider can divide the inventory of a patient into multiple inventories, e.g., according
to the hospital departments, which provides more flexibility in inventory storage and management.
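To illustrate how the inventory secret separates inventories of the same patient, the sketch below
derives InvID as in Def. (3). SHA-256 as H, a 32-byte random S_data, and the helper names are assumptions
for illustration only.

```python
import hashlib
import secrets


def new_inventory_secret() -> bytes:
    """Generate a fresh inventory secret S_data (32 random bytes, an assumption)."""
    return secrets.token_bytes(32)


def inventory_id(pk_user: bytes, s_data: bytes) -> str:
    """InvID = H(PK_user || S_data), with SHA-256 standing in for H."""
    return hashlib.sha256(pk_user + s_data).hexdigest()
```

Two inventories created for the same public key with different secrets receive unrelated IDs, so an
observer who knows only the public key cannot enumerate a patient’s inventories.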
Session. A session describes data sharing between a patient and a third party (e.g., a doctor),
called a requester. Def. (4) shows the session definition. It includes a session ID (SID) and multiple
sections. Session ID is the event hash of the corresponding session creation event in the blockchain.
A session contains the directory of all the shared data, but it divides them into multiple sections
according to patient inventories to facilitate management. Each section contains some of the
entries copied from the corresponding inventory (after decryption by S_data). Thus, the section ID is
the InvID of the inventory. Each entry of a section is composed of the DID, the metadata, the URL of
the actual data, and the reference (Ref or Ref_L) to the Event_DG in the blockchain. For the data stream,
the range of data that may be accessed can also be specified (i.e., by st and et). In this case, Ref_L refers
to the last shared chunk. For privacy, the entries of a section are encrypted by the section secret K_sec.
K_sec is shared with the requester for accessing the shared data. It is also shared with the corresponding
healthcare provider for metadata and URL update and data access verification. To securely distribute
the section secret, K_sec is encrypted with the public keys of the section participants, yielding EK_sp,
EK_req, and EK_pat. Notably, different sections and sessions are encrypted with different K_sec.
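$$
\begin{aligned}
Session &\triangleq \langle SID,\ \{Section\} \rangle \\
Section &\triangleq \langle SectionID,\ \{Section\} \rangle \\
SID &\triangleq SCEvent.EventHash \\
SectionID &\triangleq InvID \\
Section &\triangleq E_{K_{sec}}(\{Entry\}),\ EK_{sp},\ EK_{req},\ EK_{pat} \\
Entry &\triangleq \langle DID,\ Metadata,\ URL(,\ st,\ et),\ Ref_L \rangle \\
Ref_L &\triangleq \langle BH,\ EH \rangle \mid \langle BH_L,\ EH_L \rangle \\
EK_{sp|req|pat} &\triangleq E_{PK_{sp}|PK_{req}|PK_{pat}}(K_{sec})
\end{aligned}
\tag{4}
$$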
Besides the blockchain service, super peers are also connected to provide the directory service.
For directory lookup, MedChain employs Chord [40] to map a directory ID (i.e., InvID or SID) to a
server ID. Chord can be replaced with any other P2P routing protocol implementing the distributed
hash table (DHT) function [41].
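The mapping from a directory ID to a server ID can be illustrated with a simplified successor lookup on
an identifier ring. The sketch assumes a global view of all super peers and omits Chord’s finger tables
and hop-by-hop routing; the 32-bit identifier space and the names ring_id and Ring are choices made here
for readability, not part of MedChain.

```python
import bisect
import hashlib

RING_BITS = 32  # assumption: a small identifier space for readability


def ring_id(key: str) -> int:
    """Hash a server name or directory ID onto the identifier ring."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class Ring:
    """Global-view successor lookup; real Chord resolves this by routing."""

    def __init__(self, server_names):
        self._nodes = sorted((ring_id(name), name) for name in server_names)
        self._ids = [node_id for node_id, _ in self._nodes]

    def lookup(self, directory_id: str) -> str:
        """Return the server whose ID is the first at or after the key's ID."""
        pos = bisect.bisect_left(self._ids, ring_id(directory_id))
        return self._nodes[pos % len(self._nodes)][1]  # wrap around the ring
```

For example, Ring(["super-peer-1", "super-peer-2"]).lookup(inv_id) returns the super peer responsible
for storing the inventory with that InvID.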
Search over Blockchain
In MedChain, a healthcare record or data stream is identified by its unique identity (DID). On the
blockchain, however, DID is encrypted together with patient identity (i.e., PK_pat) in the Content section
for privacy (see Def. (1)). Given only a DID, the entire blockchain would have to be downloaded,
traversed, and decrypted to find the record, as in some existing works.
The directory service can be treated as the index of the events on the blockchain, providing an
efficient way to search over the blockchain. Like the breadcrumbs mechanism in References [33,34],
each patient inventory contains a reference (either Ref or Ref_L) to the corresponding record on the
blockchain (see Def. (3)). A reference consists of a block hash (BH) and an event hash (EH). On a
blockchain server, the events are stored in a database indexed first by block hash and then by event
hash. Thus, an efficient search algorithm (e.g., binary search) can be applied for event lookup. If
binary search is applied, the search complexity is O(log(p) · log(q)), where p is the number of blocks
and q is the number of events in a block, which is much lower than O(p · q) if the entire blockchain
is traversed (as in References [31,32]). The same search process can be applied to event lookup for
a session, since inventory and session share the same data structure in healthcare data description
(Def. (3) and Def. (4)).
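A reference-based lookup of this kind can be sketched with two binary searches, one over the sorted
block hashes and one over the sorted event hashes of the matching block. The in-memory EventIndex class
below is a hypothetical stand-in for the database on a blockchain server.

```python
import bisect


class EventIndex:
    """Events indexed first by block hash, then by event hash."""

    def __init__(self, blocks):
        # blocks: {block_hash: {event_hash: event}}
        self._bhs = sorted(blocks)
        self._ehs, self._events = [], []
        for bh in self._bhs:
            pairs = sorted(blocks[bh].items())
            self._ehs.append([eh for eh, _ in pairs])
            self._events.append([event for _, event in pairs])

    def find(self, bh, eh):
        """Resolve a reference <BH, EH>; return None if it does not exist."""
        i = bisect.bisect_left(self._bhs, bh)
        if i == len(self._bhs) or self._bhs[i] != bh:
            return None
        j = bisect.bisect_left(self._ehs[i], eh)
        if j == len(self._ehs[i]) or self._ehs[i][j] != eh:
            return None
        return self._events[i][j]
```

Because the inventory already stores the reference, no event content has to be decrypted during the
search.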
The directory service maps a data stream to the last chunk generation event on the blockchain with
the last chunk block hash (BH_L) and event hash (EH_L) (see Def. (3)). On the blockchain, each chunk,
except the first one, links to the previous chunk of the same stream by referring to its block hash and
event hash (see Def. (1)). Thus, to download all the chunks of a data stream on the blockchain, a client
first locates the position of the last chunk (by referring to the Ref_L in Def. (3)) and then recursively
searches for the previous chunk (by referring to the Ref_L in Def. (1)) until it reaches the first one.
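The backward traversal can be sketched as below, written iteratively for brevity. The ledger is modeled
as a dictionary keyed by (BH, EH) pairs and each event as a dictionary with a ref_l field that is None
for the first chunk; these representations and the name collect_stream are assumptions for illustration.

```python
def collect_stream(ledger, last_ref):
    """Follow Ref_L links from the last chunk event back to the first.

    ledger maps (block_hash, event_hash) to an event dict whose "ref_l"
    entry points to the previous chunk event, or is None for the first chunk.
    Returns the events in chronological order.
    """
    events, ref = [], last_ref
    while ref is not None:
        event = ledger[ref]
        events.append(event)
        ref = event["ref_l"]
    events.reverse()
    return events
```

Starting from the Ref_L stored in the inventory, the client touches only the events of the requested
stream.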
A more fine-grained and user-friendly chunk search approach can be designed by adding the
chunk start time and end time to each chunk generation event on the blockchain to support
chunk-level data search, which will be implemented in our future work.
4. Healthcare Data Sharing
This section describes the session-based healthcare data-sharing scheme in detail, which includes
data generation, session management, and key management. The first two processes describe the main
activities involved in data sharing, while the third part presents an efficient key management design,
since many types of keys are involved in the scheme.
4.1. Data Generation
The data generation process includes adding the data generation event to the blockchain service
and the data description to the directory service. Figure 5 illustrates this procedure, which can be
described with the following steps.
Step 1. The healthcare provider collects the healthcare data from a patient through medical devices
and sensors.
Step 2–3. The healthcare provider creates an Event_DG and sends it to the blockchain. The blockchain
service adds the Event_DG to a new block together with other received events, and then returns the
block hash and the event hash to the provider.
Step 4. The healthcare provider adds a new entry to the patient’s inventory for describing the data.
If the inventory does not exist, a new inventory will be created. Later, the patient can access it
by decrypting the inventory with S_data, which is retrieved by decrypting the corresponding ES_pat
with his/her SK_pat.
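The steps above can be sketched end to end with a toy in-memory ledger and inventory. The sketch is a
simplification under stated assumptions: it places one event per block, omits the encrypted Content,
PK_sp, and Sign_sp fields of Def. (1) as well as the inventory encryption, and uses the hypothetical
names ToyLedger and add_record.

```python
import hashlib
import json


class ToyLedger:
    """Stand-in for the blockchain service; one event per block (assumption)."""

    def __init__(self):
        self.blocks = []  # list of (block_hash, event_hash, event)

    def add_event(self, event: dict):
        """Steps 2-3: append the event and return its (BH, EH) reference."""
        eh = hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()
        prev_bh = self.blocks[-1][0] if self.blocks else ""
        bh = hashlib.sha256((prev_bh + eh).encode()).hexdigest()
        self.blocks.append((bh, eh, event))
        return bh, eh


def add_record(ledger, inventory, did, data, url, metadata):
    """Create a simplified Event_DG, then describe the data in the inventory."""
    event = {"type": "AddData", "digest": hashlib.sha256(data).hexdigest()}
    bh, eh = ledger.add_event(event)
    # Step 4: the inventory entry keeps the reference for later integrity checks.
    inventory.append({"DID": did, "Metadata": metadata, "URL": url, "Ref": (bh, eh)})
    return bh, eh
```

The returned reference is what later allows a requester to fetch the digest directly instead of scanning
the chain.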
Figure 5. Process of new healthcare data generation.
4.2. Session Management
After the information of the data is added into the blockchain service and the directory service,
a patient can grant access to his/her healthcare data to a requester through a session. The complete
data sharing process is shown in Figure 6.
Step 1–2. As per request, the patient selects the data from his/her inventory for sharing, creates
a session with the selected data descriptions and the PK_req, encrypts them with PK_pat (see Def. (2)),
generates the session digest to maintain session integrity, creates an Event_SC, and sends it to the
blockchain service. The blockchain service adds the Event_SC to a new block together with other received
events, and appends the block to the end of the blockchain. Then, it sends back the SID (i.e.,
the event hash of Event_SC) to the patient.
Step 3–4. The patient then creates a session in the directory service with the received SID,
and notifies the requester and the healthcare provider of the session by sending them the SID.
He/she uses K_sec to encrypt the content of the session and then PK_req to encrypt the session key.
Step 5–6. With the SID, the requester can find and access the session. After decrypting the session
key (K_sec) with his/her SK_req and the session with the K_sec, the requester learns the actual location
of the shared data and sends the request to the healthcare provider for data access.
Step 7–8. On receiving the data access request, the healthcare provider checks the session state
in the directory service. If the session exists, it then verifies the request, including the message
signature and the range of the requested data. If they are all valid, the healthcare provider returns
the data to the requester. In data transmission, a secure channel can be established through
asymmetric encryption on the data with PK_req.
Step 9. The requester, after receiving and decrypting the data with SK_req, verifies the data integrity
by downloading the data digests from the blockchain service. If the data is valid, then he/she can
access the data. If the data is invalid due to, for example, data loss/corruption during storage or
transmission, an alert will be triggered, which is out of the scope of this paper.
The symmetric keys (i.e., S_data and K_sec) are not stored in user key cases, but stored along with the
information they encrypted (an inventory or a session section). The symmetric key storage is
illustrated in Figure 7b,c. S_data is stored within the patient inventory and encrypted by PK_sp and PK_pat.
K_sec is stored within each session section and encrypted by PK_pat, PK_sp, and PK_req. Since key cases do
Figure 6. Process of session-based data sharing.
A patient can revoke the access of a requester by closing the session. Session removal is
implemented by removing the session in the directory service to prevent the requester from future
access and to recycle storage space.
For each data stream in a session, a patient can determine the data access range by specifying the
start time (st) and end time (et) of the shared segment. In a patient-centric sharing scheme, a healthcare
provider will only return the chunks within the range to a requester. Let st_c and et_c be the data
collection start time and end time of a chunk c. The chunks allowed to access are
{c | st_c ≥ st ∧ et_c ≤ et}. In a user interface design, we plan to align the pace of time selection
with the pace of chunk file generation to avoid specifying a time in the middle of a chunk. For instance,
if the chunks of a stream are generated per hour, then st and et will be varied by hour, starting from
the start time of the first chunk.
Note that the above process is flexible in several respects. First, a patient can leave the session
open for future access by the same doctor, if subsequent consultations are needed or the data is used
for medical research. Second, the end time of a data stream in a session can be left empty to enable
the doctor to track the patient’s condition as long as needed. If the end time is empty, the sharing
scheme will not restrict a requester from accessing any chunk of the stream from the specified start
time. If both the start time and the end time are empty, no chunk access restriction will be imposed
on the requester. Moreover, MedChain does not aim to replace the existing EHR systems at hospitals,
but to serve as a complementary solution specifically for inter-organizational data sharing. Thus, doctors
who need the local patient records generated at the hospital can still retrieve them through the existing
procedure and the legacy system.
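The access-range rule, including the cases where st or et is left empty, can be sketched as a simple
filter. The Chunk type, the integer timestamps, and the function names are assumptions for illustration;
None models an empty bound.

```python
from typing import NamedTuple, Optional


class Chunk(NamedTuple):
    st: int  # data collection start time of the chunk (st_c)
    et: int  # data collection end time of the chunk (et_c)


def allowed(chunk: Chunk, st: Optional[int], et: Optional[int]) -> bool:
    """True if st_c >= st and et_c <= et; an empty bound imposes no limit."""
    return (st is None or chunk.st >= st) and (et is None or chunk.et <= et)


def accessible_chunks(chunks, st=None, et=None):
    """Return the chunks a provider may hand to the requester."""
    return [c for c in chunks if allowed(c, st, et)]
```

With hourly chunks covering hours 0 to 4, the bounds st = 1 and et = 3 select the two chunks spanning
hours 1 to 3, while leaving et empty also admits every later chunk as it is generated.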
4.3. Key Management
The application provides each user (a patient, requester, or healthcare provider) a software key
case for storing the received cryptographic keys. As shown in Figure 7a, a user first generates and stores
his/her key pair (public key and private key) in the key case. The public key is signed and verified
by the Certificate Authority (CA). Users also store the keys of others for data sharing. Specifically,
when a patient registers at a healthcare service, the patient and the healthcare provider exchange
their respective public keys for inventory creation. When a requester needs access to a patient’s data,
his/her public key will be included in the request and cached in the key case of the patient for session
creation. When a session is revoked, the patient will remove the corresponding key (i.e., PK_req) from
the key case. On first use, a user will send a copy of his/her public key to the communicating
super-peer for message authentication. Moreover, the keys of patients can be shared with family members
for emergency healthcare provision. Yet, a more sophisticated solution for emergency access is left to
our future work.
not need to store these keys, which are temporarily generated in data sharing (esp. for session keys
since inventories are likely to be long-term maintained), this design can largely reduce key
management overhead. When the data sharing network connects to a huge number of healthcare
providers and requesters, such a benefit will become more important for the reduced space
requirement, and more user-friendly especially for the application clients built on mobile devices.
Figure 7. Key Management Scheme: Keys are stored in (a) key case, (b) inventory, and (c) session
through encryption. [Diagram labels: (a) provider, patient, and doctor key cases holding the key pairs
PK_sp/SK_sp, PK_pat/SK_pat, and PK_req/SK_req, a CA, and a super-peer key store holding PK_req, PK_sp,
and PK_pat, with the actions Request Data, Register Service, and Use System; (b) an inventory holding
InvID, S_data(InvContent), PK_pat(S_data), and PK_sp(S_data); (c) a session holding SID, K_sec(Entry),
PK_pat(K_sec), PK_sp(K_sec), and PK_req(K_sec).]
5. Evaluations
In this section, the accuracy and performance of MedChain are evaluated. First, the system
security is analyzed by studying the related properties. Then, the system efficiency is studied through
theoretical analysis and experiments. Lastly, the evaluation is summarized through a comprehensive
discussion.
5.1. Security Analysis
This section validates the security requirement fulfillment, including the analysis on attack
resistance, privacy protection, and non-repudiation. Assume the users of the system never expose
their private keys and secret keys to others. The encryption algorithm cannot be compromised and
an adversary cannot recover the keys. In addition, the authorized requesters are assumed to be
trustworthy. For example, related laws and regulations are enacted to restrain healthcare data
secondary usage. To facilitate description, some conventions are made first. A message m sent from
C to B is denoted by C → B: m, Sign_C(m), where Sign_C(m) represents the signature of the message
signed with C’s private key. C(A) represents the scenario that C is disguised as