
Efficient Malicious Node Detection and Distributed Data Storage using Blockchain and IPFS in WSNs

Muhammad Nouman
MS Thesis
Computer Science
COMSATS University Islamabad, Islamabad - Pakistan
Spring, 2022
COMSATS University Islamabad
Efficient Malicious Node Detection and
Distributed Data Storage using Blockchain and IPFS in WSNs
A Thesis Presented to
COMSATS University Islamabad
In partial fulfillment
of the requirement for the degree of
MS (Computer Science)
Muhammad Nouman
Spring, 2022
Efficient Malicious Node Detection and
Distributed Data Storage using Blockchain and IPFS in WSNs
A Post Graduate Thesis submitted to the Department of Computer Science as
partial fulfilment of the requirement for the award of Degree of MS (Computer Science).
Name: Muhammad Nouman
Registration Number: CIIT/FA20-RCS-015/ISB
Prof. Dr. Nadeem Javaid,
Department of Computer Science,
COMSATS University Islamabad,
Islamabad, Pakistan
Dr. Umar Qasim,
Associate Professor, Department of Computer Science,
University of Engineering and Technology at Lahore (New Campus),
Lahore, Pakistan
Final Approval
This thesis titled
Efficient Malicious Node Detection and Distributed Data Storage
using Blockchain and IPFS in WSNs
Muhammad Nouman
has been approved
For the COMSATS University Islamabad, Islamabad
External Examiner:
Dr. Muhammad Muzammal
Sr. Associate Professor, Department of Software Engineering,
Bahria University Islamabad, Islamabad
Prof. Dr. Nadeem Javaid
Department of Computer Science,
COMSATS University Islamabad, Islamabad
Dr. Umar Qasim
Associate Professor, Department of Computer Science,
University of Engineering and Technology at Lahore (New Campus), Lahore, Pakistan
Prof. Dr. Majid Iqbal Khan,
Department of Computer Science,
COMSATS University Islamabad, Islamabad
I, Muhammad Nouman (Registration No. CIIT/FA20-RCS-015/ISB), hereby declare
that I have produced the work presented in this thesis during the scheduled period
of study. I also declare that I have not taken any material from any source except
where it is referenced, and that the amount of plagiarism is within the acceptable range. If
a violation of HEC rules on research has occurred in this thesis, I shall be liable
to punishable action under the plagiarism rules of the HEC.
Date: September, 2022
Muhammad Nouman
It is certified that Muhammad Nouman (Registration No. CIIT/FA20-RCS-015/ISB)
has carried out all the work related to this thesis under my supervision at the
Department of Computer Science, COMSATS University Islamabad, and the work
fulfills the requirements for the award of the MS degree.
Date: September, 2022
Prof. Dr. Nadeem Javaid
Department of Computer Science
Dr. Umar Qasim
Department of Computer Science,
University of Engineering and Technology at Lahore,
Lahore, Pakistan
Prof. Dr. Majid Iqbal Khan
Department of Computer Science
Dedicated to my mentors, Dr. Nadeem Javaid and Dr. Mariam Akbar,
and to my loving parents, who equipped me with pearls of knowledge and
showed me the way of spiritual and personal enlightenment in this
world and the world hereafter.
First of all, thanks to Allah Almighty, who gave me the strength and confidence to
complete this dissertation. I would also like to express my profound appreciation
to the many people who supported me during my MS and helped me to complete
my thesis. Their generous support made this research work possible. In particular,
I would like to express my sincere gratitude to my advisor, Prof. Dr. Nadeem
Javaid, for the continuous support of my MS study and related research, and
for his patience, motivation, and immense knowledge. His guidance helped me
throughout the research and writing of this thesis. I could not have imagined having
a better advisor and mentor for my MS study. I am truly indebted to him for his
knowledge, thoughts, and friendship. Last but not least, I am greatly thankful
to the Director of ComSens Lab and all of my colleagues at CUI for providing me with a
warm and friendly atmosphere.
Efficient Malicious Node Detection and Distributed Data
Storage using Blockchain and IPFS in WSNs.
This thesis makes two major contributions. In the first contribution,
blockchain is implemented on the Base Stations (BSs) and Cluster Heads (CHs)
to register the nodes using their credentials and to tackle various security
issues. Moreover, a Machine Learning (ML) classifier, termed Histogram Gradient
Boost (HGB), is employed on the BSs to classify the nodes as malicious
or legitimate. If a node is found to be malicious, its registration is revoked
from the network. If a node is found to be legitimate, its data is stored in the
Interplanetary File System (IPFS). IPFS stores the data in the form of chunks
and generates a hash for the data, which is then stored in the blockchain. In
addition, Verifiable Byzantine Fault Tolerance (VBFT) is used instead of Proof of
Work (PoW) to perform consensus and validate transactions. Extensive simulations
are performed using the WSN dataset, referred to as WSN-DS. The proposed model
is evaluated on both the original dataset and the balanced dataset. Furthermore,
HGB is compared with existing classifiers, namely Adaptive Boost (AdaBoost),
Gradient Boost (GB), Linear Discriminant Analysis (LDA), Extreme Gradient Boost
(XGB), and ridge, using performance metrics such as accuracy, precision, recall,
micro-F1 score, and macro-F1 score. The performance evaluation shows that HGB
outperforms all other classifiers in terms of these metrics. Overall, the proposed
model performs efficiently in terms of malicious node detection and secure data storage.
In the second contribution, a blockchain based distributed data storage scheme
using IPFS and a DRCL stacking model for malicious node detection in WSNs are
proposed. Blockchain is integrated with IPFS: IPFS is used to store massive
amounts of data, while blockchain is used to record their hashes. The proposed
DRCL stacking model consists of four standalone classifiers, namely Decision Tree
(DT), Random Forest (RF), Complement Naive Bayes (CNB), and Logistic Regression
(LR). Moreover, the Adaptive Synthetic (ADASYN) technique is used to balance the
wireless sensor network dataset (WSN-DS). The proposed DRCL stacking model with
ADASYN is compared to the standalone classifiers on the original and balanced
datasets in terms of accuracy, precision, recall, F1-score, and Area Under the
Curve (AUC). The proposed model achieves 99.91% accuracy, 100% precision, 100%
recall, 100% F1-score, and 99.99% AUC score. Moreover, storing the hashes on the
blockchain consumes 309585 gwei transaction cost and 46438 gwei execution cost.
The simulation results and comparative analysis show that the proposed DRCL
stacking model with ADASYN outperforms the standalone classifiers.
Dedication
Acknowledgements
Abstract
List of Figures
List of Tables
List of Symbols
1 Introduction
1.1 Overview of Research
1.2 Problem Statement
1.3 Research Objectives
1.4 Research Questions and Answers
1.5 Research Contributions 1
1.6 Research Contributions 2
1.7 Research Methodology
1.8 Thesis Structure
2 Background Studies
2.1 Blockchain
2.1.1 Public Blockchain
2.1.2 Private Blockchain
2.1.3 Consortium Blockchain
2.2 Mining
2.3 Consensus Mechanism
2.3.1 Proof of Work
2.3.2 Proof of Stake
2.3.3 Proof of Authority
2.4 Wireless Sensor Network
2.5 Sensor Nodes
2.6 Cluster Head
2.7 Malicious Node
2.8 Blackhole Attack
2.9 Grayhole Attack
2.10 Interplanetary File System
2.11 Hashing
2.12 Chapter Summary
3 Literature Review
3.1 Related Work Contributions 1
3.2 Related Work Contributions 2
3.3 Critical Analysis
4 Proposed Solution
4.1 Proposed System Model 1
4.1.1 Assumptions
4.1.2 Network Model Description
4.1.3 Registration
4.1.4 Sensor Nodes
4.1.5 Cluster Heads
4.1.6 Base Station
4.1.7 Customers
4.1.8 Malicious Node Detection Using Machine Learning Classifiers
4.1.9 Dataset Description
4.1.10 Data Sampling
4.2 Proposed System Model 2
4.2.1 Wireless Sensor Network (WSN)
4.2.2 Blockchain
4.2.3 Wireless Sensor Network Dataset (WSN-DS)
4.2.4 Oversampling with ADASYN
4.2.5 Malicious Nodes’ Detection Using Stacking Model
    Logistic Regression
    Random Forest
    Complement Naive Bayes
    Decision Tree
5 Experimental Results and Evaluation
5.1 Contributions 1 Results and Discussions
5.1.1 Blockchain Results’ Discussion
5.1.2 Analysis of ML Results
5.1.3 Comparative Analysis of the Proposed Model
5.2 Feasibility of the Proposed Model
5.3 Contributions 2 Results and Discussions
5.4 Security Analysis
5.4.1 Integer Underflow and Overflow
5.4.2 Transaction Ordering Dependence (TOD)
5.4.3 Callstack Attack
5.4.4 Timestamp Dependency
5.4.5 Reentrancy Attack
5.5 Chapter Summary
6 Conclusions and Future Work
7 References
List of Figures
4.1 Proposed Network Model for WSNs.
4.2 Proposed Model’s Workflow.
4.3 DRCL Stacking model for malicious nodes detection in WSNs
5.1 Comparison of PoW and VBFT consensus mechanism in terms of transaction cost.
5.2 (a) Comparison of time consumed in uploading files on IPFS. (b) Comparison of time consumed in downloading files from IPFS.
5.3 (a) Accuracy of classifiers on the balanced dataset. (b) Accuracy of classifiers on the original dataset.
5.4 (a) Precision of classifiers on the balanced dataset. (b) Precision of classifiers on the original dataset.
5.5 (a) Recall of classifiers on the balanced dataset. (b) Recall of classifiers on the original dataset.
5.6 (a) F1-score of classifiers on the balanced dataset. (b) F1-score of classifiers on the original dataset.
5.7 ROC curve of classifiers on the balanced dataset.
5.8 Comparison of DRCL model with standalone classifiers on the original dataset
5.9 Comparison of DRCL model with standalone classifiers on the balanced dataset
5.10 Comparison of IPFS upload time and download time on various files
5.11 Comparison of data storage and hashes storage on blockchain
5.12 Security Analysis of Proposed Smart Contract
List of Tables
3.1 Literature Review
4.1 Details of WSN-DS
4.2 Details of original and Balanced WSN-DS
5.1 Time Complexity of Classifiers
5.2 The detection performance of six classifiers using the dataset balanced through SMOTE.
5.3 The detection performance of six classifiers on the original dataset.
5.4 Comparison of DRCL stacking model with standalone classifiers on the original dataset
5.5 Comparison of DRCL stacking model with standalone classifiers on the balanced dataset using ADASYN
List of Symbols
BSs Base Stations
SNs Sensor Nodes
CHs Cluster Heads
IPFS Interplanetary File System
WSNs Wireless Sensor Networks
BCR Blockchain Contractual Routing
PoW Proof of Work
PoA Proof of Authority
VBFT Verifiable Byzantine Fault Tolerance
PoS Proof of Stake
BC-EKM Blockchain-Based Encryption Key Management
MEC Mobile Edge Computing
ADMM Alternating Direction Method of Multipliers
ICN Information Centric Network
IoT Internet of Things
IIoT Industrial Internet of Things
DT Data Transmission
SMP Synergistic Multiple Proof
P2P Peer to Peer
LB Light Block
RMCV Real time Message Content Validation
STS Station to Station
UBoF Unrelated Block Offloading Filter
DoS Denial of Service
DDoS Distributed Denial of Service
TDFS Trusted Data Forwarding Support
SDN Software Defined Network
AoI Age of Information
ML Machine Learning
LDA Linear Discriminant Analysis
AdaBoost Adaptive Boost
GB Gradient Boost
XGBoost Extreme Gradient Boost
HGB Histogram Gradient Boost
DT Decision Tree
DS Datasets
RQs Research Questions
RAs Research Answers
ROs Research Objectives
Chapter 1
Introduction
The objective of this thesis is to design a blockchain based registration and
malicious node detection system for WSNs. This chapter provides an overview of
blockchain and WSNs and then presents the problem statement, research objectives,
research questions, contributions, research methodology, and thesis structure.
1.1 Overview of Research
A Wireless Sensor Network (WSN), which may comprise thousands of nodes, is widely
used in applications such as supply chain management, military surveillance, and
environmental monitoring [1]. Sensor Nodes (SNs) are used to monitor and gather
environmental data. In crowd sensing networks, SNs send massive amounts of the
collected data to nearby nodes and Cluster Heads (CHs). This reduces the cost of
dedicated equipment and of conventional data collection methods. However, some
nodes do not participate in crowd sensing networks due to privacy concerns.
Moreover, in the absence of a security mechanism, WSNs become vulnerable to
malicious nodes that modify the data in their own interest. Furthermore, SNs
are resource constrained and do not utilize their resources efficiently. In
addition, traditional methods are often unable to detect malicious nodes. Once
a malicious node launches an attack, the network is compromised, and the
malicious activities affect the entire network. To prevent nodes from acting
maliciously, many authors propose authentication schemes that allow only
authentic nodes to join the network [2]. However, the existing authentication
schemes depend on centralized entities, which are vulnerable to cyber-attacks.
In WSNs, SNs are either randomly or statically deployed, depending on the network
topology. SNs gather environmental data and transfer it to its destination.
However, some SNs do not store location information because the topology changes
frequently, and the use of a large number of sensing nodes may cause network
congestion. To solve this issue, a WSN is split into sub-networks (clusters) that
are managed by CHs. CHs receive data from SNs and send it to Base Stations (BSs)
[3]. Moreover, SNs are resource constrained, with low storage and computational
power. SNs are also prone to different types of attacks and are easily compromised
by malicious nodes. Many researchers propose techniques to avoid malicious attacks
and detect malicious nodes [4,5]. However, malicious node detection in WSNs
depends on a third party, which can easily
be compromised. Therefore, blockchain is introduced to overcome the problems
associated with centralization and the involvement of third parties [6,7,8,9,10].
WSN nodes produce vast amounts of data and store them on a centralized system.
However, security breaches and failures of this central system might destabilize
the WSN. Therefore, a Peer-to-Peer (P2P) network is proposed to overcome the
centralization issues related to data storage [11]. In a P2P network, nodes
transfer data directly from the source to the destination without the assistance
of a third party. However, with the rapid growth in the number of WSN nodes, the
P2P architecture faces security and privacy challenges. Therefore, blockchain
technology is introduced to address the security issues of WSNs through a
distributed, decentralized, and immutable ledger [12]. Once data is added to the
blockchain, a malicious party cannot tamper with it undetected, because every
block is linked to its predecessor by a cryptographic hash and the ledger is
replicated across many nodes. Furthermore, integrating WSNs and blockchain has
attracted considerable attention. However, blockchain consumes a lot of
computational resources, whereas SNs have limited resources, and incorporating a
blockchain design into WSNs may raise further issues. Besides, the Proof of Work
(PoW) consensus mechanism, which is widely used in blockchain, verifies
transactions and effectively limits the influence of malicious nodes. However,
PoW requires a large amount of computational power to confirm a transaction and
add it to a block in the blockchain [13].
Moreover, many researchers propose the Interplanetary File System (IPFS),
introduced by Juan Benet in 2014 [14], for data storage. IPFS shares many
characteristics with blockchain: it is a P2P, decentralized, and distributed file
storage system. IPFS nodes are the machines that run the IPFS software to store
and retrieve files from the IPFS network. They use content addressing, i.e., a
file is located by a hash of its content rather than by its location. IPFS stores
files in the form of chunks, similar to a BitTorrent network, so the failure of a
single node does not affect the network as long as other nodes hold the content.
Furthermore, IPFS uses two data structures to distribute files: a Distributed
Hash Table (DHT), which maps content to the nodes that store it, and a Merkle
Directed Acyclic Graph (DAG), which links the chunks of a file together. When a
node adds a file to IPFS, the SHA-256 algorithm is executed on the file content,
and a hash value is generated for each stored file. This hash value is called a
Content Identifier (CID) and is used to retrieve the stored file from IPFS.
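The chunking and content-addressing steps described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the IPFS implementation: real IPFS builds a Merkle DAG of UnixFS nodes and encodes the digest as a multihash-based CID, whereas the hypothetical `add_file` and `get_file` helpers below simply store each fixed-size chunk under its SHA-256 digest and hash the ordered list of chunk digests into a root identifier. The 256 KiB chunk size mirrors the default chunker of the IPFS reference implementation.

```python
import hashlib

# Assumed chunk size: 256 KiB, the default of the IPFS reference implementation.
CHUNK_SIZE = 256 * 1024


def add_file(data: bytes, store: dict, chunk_size: int = CHUNK_SIZE):
    """Hypothetical helper: split data into chunks and store each under its SHA-256 digest.

    Returns (root_id, manifest), where manifest lists the chunk digests in order.
    Real IPFS encodes chunks as Merkle DAG nodes and uses multihash-based CIDs.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    manifest = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store[digest] = chunk  # content addressing: identical chunks share one entry
        manifest.append(digest)
    # The root identifier commits to every chunk digest and to their order.
    root_id = hashlib.sha256("".join(manifest).encode()).hexdigest()
    return root_id, manifest


def get_file(root_id: str, manifest: list, store: dict) -> bytes:
    """Hypothetical helper: reassemble a file by content address and verify it."""
    if hashlib.sha256("".join(manifest).encode()).hexdigest() != root_id:
        raise ValueError("manifest does not match the root identifier")
    parts = []
    for digest in manifest:
        chunk = store[digest]
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError(f"chunk {digest[:8]} has been modified")
        parts.append(chunk)
    return b"".join(parts)


store = {}
reading = b"temperature=24.5;humidity=40" * 20000  # about 560 kB of sample sensor data
root, manifest = add_file(reading, store)
print(len(manifest), "chunks, root identifier", root[:16])
assert get_file(root, manifest, store) == reading
```

Only the short root identifier would need to be written to the blockchain. Any modification of a stored chunk changes its digest and is rejected by `get_file`.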
1.2 Problem Statement
This section presents the problem statement of this thesis. In WSNs, nodes are
randomly deployed and collect data from the surrounding environment. WSNs are
easily accessible, and any node can join them. As a result, malicious nodes can
enter the network and perform malicious activities that affect the entire
network. The authors in [24] propose a centralized authentication mechanism that
registers the nodes and protects confidential node identities from unauthorized
nodes in WSNs. However, a centralized system introduces a single point of failure.
Moreover, SNs are resource constrained and cannot efficiently detect malicious
behaviour in the network. Also, malicious nodes can easily damage and compromise
WSNs [4]. Furthermore, malicious nodes collect false data and deliver it to
destination nodes where blockchain is deployed to store the data [23]. However,
storing huge volumes of data in a blockchain is very expensive. In addition, the
blockchain uses the PoW consensus mechanism for block generation, which consumes a
huge amount of computational power [13].
1.3 Research Objectives
The research objectives (ROs) are derived from the research aim and are listed below.
RO-1: For a secure network, authentication is necessary to protect the network
from external nodes. Otherwise, any node can enter the network and perform
unwanted operations.
RO-2: Malicious nodes compromise the network and perform malicious activities,
which decreases network performance. Malicious node detection methods
are therefore required to keep the network free of malicious nodes.
RO-3: Storing large amounts of data on the blockchain is expensive. Therefore,
an effective method is required to reduce storage costs.
RO-4: The PoW consensus mechanism requires a significant amount of computing
power and incurs high transaction costs to verify transactions and create new
blocks. Therefore, an effective method is required to reduce transaction
cost and computational power.
1.4 Research Questions and Answers
The research questions (RQs) and their corresponding research answers (RAs) are as follows.
RQ-1: How to perform registration and authentication of nodes that
are participating in the network?
RA-1: A network model is deployed that registers and authenticates SNs and CHs
using blockchain. SNs and CHs are randomly deployed in the WSN and register
themselves on the BSs, which are deployed statically. The BSs act as trusted nodes
for the CHs and SNs.
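The registration and revocation flow in RA-1 can be sketched as follows. This is a hypothetical, in-memory Python stand-in for a registration smart contract on a BS, not the thesis's actual contract: the class name `NodeRegistry`, the shared-secret credential, and the `revoke` hook are assumptions made for illustration. Only a SHA-256 digest of each credential is recorded, mirroring how a ledger would store verifiable records rather than secrets.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class NodeRecord:
    role: str              # "SN" or "CH"
    credential_hash: str   # only a digest of the credential is recorded
    active: bool = True


class NodeRegistry:
    """Hypothetical in-memory stand-in for a registration smart contract on a BS."""

    def __init__(self):
        self.records = {}

    @staticmethod
    def _digest(node_id: str, secret: str) -> str:
        # Simplification: a real deployment would use salted key derivation
        # or public-key signatures rather than a plain hash of a shared secret.
        return hashlib.sha256(f"{node_id}:{secret}".encode()).hexdigest()

    def register(self, node_id: str, role: str, secret: str) -> None:
        if node_id in self.records:
            raise ValueError(f"{node_id} is already registered")
        self.records[node_id] = NodeRecord(role, self._digest(node_id, secret))

    def authenticate(self, node_id: str, secret: str) -> bool:
        record = self.records.get(node_id)
        return bool(record and record.active
                    and record.credential_hash == self._digest(node_id, secret))

    def revoke(self, node_id: str) -> None:
        # Invoked when the classifier on the BS labels the node as malicious.
        self.records[node_id].active = False


registry = NodeRegistry()
registry.register("SN-17", "SN", "s3cret")
print(registry.authenticate("SN-17", "s3cret"))  # True
print(registry.authenticate("SN-17", "guess"))   # False
registry.revoke("SN-17")
print(registry.authenticate("SN-17", "s3cret"))  # False
```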
RQ-2: What is the purpose of IPFS, and why is it used?
RA-2: IPFS is used as the storage system in the network model. IPFS is a
distributed file storage system. It is not itself a blockchain, but it shares many
qualities with blockchain and is commonly integrated with it. IPFS nodes store
data in chunks, similar to a BitTorrent network, so if one node fails, the network
remains unaffected. Storing the data on IPFS and recording only its hash on the
blockchain reduces storage costs, while the integrity of the stored data can
still be verified against the hash recorded on the blockchain.
RQ-3: How to detect the malicious nodes in the network?
RA-3: Malicious nodes must be removed from the network to keep it free of
malicious activity. Therefore, ML classifiers are trained and run on the BSs to
detect malicious nodes. This task is assigned to the BSs because the SNs are
heterogeneous and resource constrained and are not able to learn to detect
malicious nodes themselves.
RQ-4: How to reduce the computational cost in blockchain?
RA-4: A public blockchain is used with the VBFT consensus mechanism. In each round,
the consensus (miner) nodes are randomly selected instead of competing through
PoW, which reduces the computational cost.
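The random selection of consensus nodes mentioned in RA-4 can be sketched as below. VBFT derives this randomness from a verifiable random function (VRF); the sketch replaces the VRF with a plain SHA-256 hash of public values (the previous block hash and the round number), which any node can recompute to verify the selection but which, unlike a VRF, is predictable in advance. The function name, the seed value, and the candidate identifiers are hypothetical.

```python
import hashlib


def select_consensus_nodes(prev_block_hash: str, round_no: int,
                           candidates: list, k: int) -> list:
    """Pick k distinct consensus nodes for a round (hash-based stand-in for a VRF).

    Every node can recompute the same selection from public values, so the
    result is verifiable; unlike a real VRF, it is also predictable in advance.
    """
    if k > len(candidates):
        raise ValueError("k exceeds the number of candidates")
    # Score each candidate by hashing the public seed together with its identifier.
    scored = sorted(
        (hashlib.sha256(f"{prev_block_hash}|{round_no}|{node}".encode()).hexdigest(), node)
        for node in candidates
    )
    return [node for _, node in scored[:k]]


candidates = ["BS-1", "BS-2", "BS-3", "BS-4", "CH-1", "CH-2", "CH-3"]  # hypothetical IDs
for round_no in range(3):
    print(round_no, select_consensus_nodes("9f2c", round_no, candidates, k=4))
```

Because only a handful of hashes are computed per round, the selection cost is negligible compared with the nonce search that PoW requires.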
1.5 Research Contributions 1
The significant contributions of the first part of this research are as follows:
• in a WSN, a blockchain based decentralized authentication mechanism is
used to protect node identities from disclosure to external nodes,
• for data storage in a WSN, IPFS is deployed and integrated with blockchain
technology. Storing data on IPFS minimizes the cost of storing data in the
blockchain: the data is stored in chunks on IPFS, and only the generated hashes
are recorded in the blockchain,
• the proposed blockchain based network uses the Verifiable Byzantine Fault
Tolerance (VBFT) consensus mechanism [?], which reduces the blockchain
transaction cost and increases the throughput as compared to existing
consensus mechanisms like PoW, and
• a comparative analysis of the proposed classifier, i.e., Histogram Gradient
Boost (HGB), with Adaptive Boost (AdaBoost), Gradient Boost (GB),
Extreme Gradient Boost (XGB), Linear Discriminant Analysis (LDA), and
ridge classifiers is performed on the basis of numerous performance metrics,
including accuracy, precision, recall, micro F1-score, and macro F1-score.
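Because the comparison relies on both micro and macro F1-scores, a small standard-library sketch of how the two differ may help. The helper name `classification_metrics` and the three-class toy labels are illustrative assumptions; in practice, a library routine such as scikit-learn's `f1_score` with `average="micro"` or `average="macro"` computes the same quantities.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy, macro-F1, micro-F1 and per-class F1 for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # a sample was wrongly assigned to class p
            fn[t] += 1  # a sample of class t was missed

    def f1(tp_, fp_, fn_):
        precision = tp_ / (tp_ + fp_) if tp_ + fp_ else 0.0
        recall = tp_ / (tp_ + fn_) if tp_ + fn_ else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    macro_f1 = sum(per_class.values()) / len(labels)  # each class weighs the same
    micro_f1 = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))  # pooled counts
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, macro_f1, micro_f1, per_class


# Hypothetical predictions: one Blackhole sample is misclassified as Grayhole.
y_true = ["Normal"] * 6 + ["Blackhole"] * 2 + ["Grayhole"] * 2
y_pred = ["Normal"] * 6 + ["Blackhole", "Grayhole"] + ["Grayhole"] * 2
acc, macro, micro, per_class = classification_metrics(y_true, y_pred)
print(round(acc, 2), round(macro, 4), round(micro, 2))  # 0.9 0.8222 0.9
```

Macro-F1 averages the per-class scores, so the weaker Blackhole class (F1 of about 0.67) pulls it down to about 0.82. Micro-F1 pools all counts and, for single-label multi-class data, equals accuracy.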
1.6 Research Contributions 2
• a DRCL stacking model is proposed for malicious node detection in WSNs
by combining Logistic Regression (LR), Random Forest (RF), Complement
Naive Bayes (CNB), and Decision Tree (DT),
• the Adaptive Synthetic (ADASYN) technique is used to tackle the imbalanced
dataset problem,
• the Proof of Authority (PoA) consensus mechanism is used, which reduces the
transaction cost and increases the throughput as compared to existing
consensus mechanisms,
• IPFS, integrated with blockchain technology, is used for data storage
in a WSN. When data is stored on IPFS, IPFS generates hashes
that are stored on the blockchain, which reduces the storage cost,
• we perform a comparative analysis of the proposed technique, i.e., the DRCL
stacking model with ADASYN, against Decision Tree (DT), Random
Forest (RF), Complement Naive Bayes (CNB), and Logistic Regression (LR),
• simulation results show that our DRCL stacking model with ADASYN gives
better results than the standalone classifiers on both the original and
balanced datasets, and
• we use the Oyente tool to evaluate the smart contract's robustness against
several attacks, including integer underflow and overflow, timestamp
dependence, and callstack attacks.
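ADASYN, used in the second contribution to balance WSN-DS, generates more synthetic samples for minority samples that are harder to learn, i.e., those surrounded by many majority samples. The following standard-library sketch of the binary case is for illustration only: the function name, the toy grid data, and the parameter defaults (`k=5`, `beta=1.0`) are assumptions, and in practice a library implementation such as the `ADASYN` class in imbalanced-learn (`imblearn.over_sampling`) would normally be used.

```python
import math
import random


def adasyn(X, y, minority, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples generated with ADASYN (binary case).

    X is a list of numeric feature vectors, y the matching labels, and
    `minority` the label of the class to oversample (illustrative sketch).
    """
    rng = random.Random(seed)
    minority_X = [x for x, label in zip(X, y) if label == minority]
    n_min, n_maj = len(minority_X), len(X) - len(minority_X)
    G = int((n_maj - n_min) * beta)  # total number of samples to synthesize
    if G <= 0 or n_min < 2:
        return []

    def nearest(point, pool, n):
        # Indices of the n points in `pool` closest to `point`, excluding itself.
        order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
        return [i for i in order if pool[i] is not point][:n]

    # Step 1: r_i = share of majority samples among the k nearest neighbours.
    ratios = []
    for x in minority_X:
        neigh = nearest(x, X, k)
        ratios.append(sum(y[i] != minority for i in neigh) / len(neigh))
    total = sum(ratios)
    if total == 0:  # no minority sample lies near the majority class
        return []

    # Step 2: harder (more majority-surrounded) samples receive more synthetic points.
    counts = [round(r / total * G) for r in ratios]

    # Step 3: interpolate between each sample and a random minority neighbour.
    synthetic = []
    for x, g in zip(minority_X, counts):
        neigh = nearest(x, minority_X, min(k, n_min - 1))
        for _ in range(g):
            z = minority_X[rng.choice(neigh)]
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(x, z)])
    return synthetic


# Toy data: 20 "Normal" points on a grid and 5 "Blackhole" points near one corner.
X = [[i % 5, i // 5] for i in range(20)] + [[4.5, 4.5], [5, 5], [5.5, 4.8], [4.2, 3.9], [5.1, 5.6]]
y = ["Normal"] * 20 + ["Blackhole"] * 5
new_points = adasyn(X, y, "Blackhole", k=3)
print(len(new_points), "synthetic Blackhole samples")
```

With `beta=1.0`, the number of generated samples is approximately the class gap (15 here), up to rounding of the per-sample counts.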
1.7 Research Methodology
This research focuses on deploying a network model for WSN data storage and
malicious node identification, in which blockchain is used to prevent malicious
attacks. To do this, I first selected blockchain and machine learning as the
research domains. After that, I selected a supervisor and began my research.
During my research, I searched for and downloaded the most recent research papers
in my field. I then read these papers thoroughly and identified certain
shortcomings, which are listed in Table 3.1. Following that, I proposed a solution
that addresses these limitations. Finally, I chose a benchmark WSN dataset and
suitable performance measures to validate the proposed solution's effectiveness.
1.8 Thesis Structure
The remainder of this thesis is organized as follows. The preliminaries and basic
concepts of the domain used in this study are provided in Chapter 2. Chapter 3
presents the closely related research articles. Chapter 4 describes the components
of the proposed system models. Chapter 5 explains the findings and discusses the
proposed and baseline models. The conclusions and future work of the thesis are
presented in Chapter 6.
Chapter 2
Background Studies
This chapter explains the preliminaries used in this thesis. It first presents the
basic concepts of blockchain, its types, mining, and consensus mechanisms, and
then the WSN concepts, attacks, IPFS, and hashing used in this work.
2.1 Blockchain
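Blockchain is a distributed, decentralized, and immutable ledger. It is made up
of blocks that are linked together by cryptographic hashes, as each block stores
the hash of the previous block. The genesis block is the first block on the
blockchain. Transactions are stored in blocks of limited size; in Bitcoin, for
example, the block size was originally limited to 1 megabyte. Because changing a
block changes its hash and breaks the link to every subsequent block, a record
that has been added to the blockchain cannot be tampered with by a malicious
party without the change being detected.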
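The hash linking described above, and why it makes tampering detectable, can be shown with a minimal Python sketch. The block layout (`prev_hash`, `timestamp`, `transactions`) and the helper names are hypothetical simplifications; real blockchains also include fields such as a Merkle root and consensus data.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    # Hash a canonical JSON encoding so identical blocks always hash identically.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def make_block(prev_hash: str, transactions: list) -> dict:
    # Hypothetical minimal block layout.
    return {"prev_hash": prev_hash, "timestamp": time.time(), "transactions": transactions}


def is_valid(chain: list) -> bool:
    """Check that every block stores the hash of the block before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


chain = [make_block("0" * 64, ["genesis"])]  # the genesis block has no predecessor
chain.append(make_block(block_hash(chain[-1]), ["SN-3 registered"]))
chain.append(make_block(block_hash(chain[-1]), ["IPFS hash of file A"]))
print(is_valid(chain))                        # True
chain[1]["transactions"][0] = "SN-3 revoked"  # tamper with an earlier record
print(is_valid(chain))                        # False
```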
2.1.1 Public Blockchain
In a public blockchain, anyone can join and exit the network
without requiring permission from a third party. Users who are part of the
blockchain can participate in all operations without restriction. These types of
blockchain are permissionless.
2.1.2 Private Blockchain
A private blockchain contrasts with a public blockchain. In a private blockchain,
only authorized users are permitted to join the network, and only pre-selected
nodes are allowed to perform network operations such as writing, updating, and
storing transactions. It is also called a permissioned blockchain.
2.1.3 Consortium Blockchain
A consortium blockchain is shared between two or more organizations. Only
pre-selected, permissioned participants from these organizations can join the
network and access the ledger.
2.2 Mining
In a blockchain, mining is performed by miner nodes. The blockchain transactions
are validated during the mining process. After that, the validated transactions
are summarized in a Merkle tree, whose root is stored in the block. Finally, the
newly mined block is added to the blockchain.
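Summarizing the validated transactions as a Merkle root can be sketched as follows. This is a simplified illustration with assumed conventions: it applies a single SHA-256 at each level and duplicates the last hash when a level has an odd number of nodes (Bitcoin uses the same duplication rule but hashes with double SHA-256). Changing any transaction changes the root, so the root stored in the block commits to all of them.

```python
import hashlib


def merkle_root(transactions: list) -> str:
    """Compute a Merkle root over transaction strings (single SHA-256, simplified)."""
    if not transactions:
        raise ValueError("a block needs at least one transaction")
    level = [hashlib.sha256(tx.encode()).digest() for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        # Hash each adjacent pair to build the next level up.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


txs = ["SN-3 registered", "CH-1 registered", "hash of file A stored"]  # hypothetical
print(merkle_root(txs))
print(merkle_root(txs) == merkle_root(txs[:2] + ["hash of file B stored"]))  # False
```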
2.3 Consensus Mechanism
A consensus mechanism is the process by which the nodes of a network reach
agreement on a single value, such as the next block to be added to the blockchain.
The most common consensus mechanisms used in blockchain are Proof of Work (PoW),
Proof of Stake (PoS), and Proof of Authority (PoA).
2.3.1 Proof of Work
Multiple consensus mechanisms are used to perform blockchain mining. The most
common consensus technique is PoW, in which any node of the blockchain based
network can take part in the mining process: each miner repeatedly hashes the
block contents with a changing nonce until the hash meets a difficulty target. As
a consensus mechanism, PoW therefore consumes a significant amount of
computational power.
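The brute-force nonce search that makes PoW expensive can be illustrated with the Python sketch below. It is a toy under stated assumptions: difficulty is expressed as a number of leading zero hexadecimal digits (real networks compare the hash against a numeric target), and the block data string is hypothetical. Each additional zero digit multiplies the expected number of hash attempts by 16, while verifying a found nonce takes a single hash.

```python
import hashlib
import time


def mine(block_data: str, difficulty: int):
    """Find a nonce whose SHA-256 digest starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


for difficulty in range(1, 5):
    start = time.perf_counter()
    nonce, digest = mine("block 1: SN-3 -> CH-1", difficulty)  # hypothetical block data
    print(difficulty, nonce, digest[:12], f"{time.perf_counter() - start:.3f}s")
```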
2.3.2 Proof of Stake
In PoS, the node that validates transactions and creates the next block is
selected according to its stake, so a node with a larger stake has a greater
chance of performing this task than other nodes.
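Stake-proportional selection can be sketched with Python's `random.choices`. This is a deliberately simplified model: the stake values and node names are hypothetical, and production PoS protocols add mechanisms (such as verifiable randomness and penalties for misbehaviour) that are omitted here.

```python
import random


def select_validator(stakes: dict, rng: random.Random) -> str:
    """Pick one validator with probability proportional to its stake."""
    nodes = list(stakes)
    return rng.choices(nodes, weights=[stakes[n] for n in nodes], k=1)[0]


stakes = {"BS-1": 50, "BS-2": 30, "CH-1": 20}  # hypothetical stakes
rng = random.Random(42)
wins = {node: 0 for node in stakes}
for _ in range(10_000):
    wins[select_validator(stakes, rng)] += 1
print(wins)  # counts roughly proportional to 50 : 30 : 20
```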
2.3.3 Proof of Authority
In PoA, pre-authorized nodes (authorities) validate the transactions and create
new blocks. A new node can only take part in validating transactions if the
existing authorities approve it.
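A common way to schedule block production among PoA authorities is a fixed round-robin rotation. The sketch below assumes that rotation and hypothetical authority names; it checks only who proposed a block, whereas a real PoA network would also verify the proposer's digital signature on the block.

```python
from dataclasses import dataclass

AUTHORITIES = ["BS-1", "BS-2", "BS-3"]  # hypothetical pre-authorized validators


@dataclass
class Block:
    height: int
    proposer: str
    payload: str


def expected_proposer(height: int) -> str:
    # Authorities take turns producing blocks in a fixed round-robin order.
    return AUTHORITIES[height % len(AUTHORITIES)]


def accept(block: Block) -> bool:
    """Accept a block only if the authority whose turn it is proposed it.

    A real PoA network would also verify the proposer's signature on the block.
    """
    return block.proposer == expected_proposer(block.height)


print(accept(Block(4, "BS-2", "tx batch")))  # True: 4 % 3 == 1, so it is BS-2's turn
print(accept(Block(4, "BS-3", "tx batch")))  # False: not BS-3's turn
print(accept(Block(5, "CH-9", "tx batch")))  # False: CH-9 is not an authority
```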
2.4 Wireless Sensor Network
The nodes of a WSN are randomly distributed and communicate wirelessly. SNs
detect, monitor, and gather data from the surrounding environment and transmit it over wireless links.
2.5 Sensor Nodes
Sensor nodes are resource- and energy-constrained devices with sensing ability.
They consume energy when collecting, processing, and communicating data.
2.6 Cluster Head
The logical subdivision of a network is called a cluster. In a cluster, nodes are
grouped together and a cluster head (CH) is selected, to which the member nodes
send their data.
2.7 Malicious Node
Malicious nodes are nodes that execute malicious operations and impersonate the
identities of normal nodes in the network.
2.8 Blackhole Attack
In a blackhole attack, a malicious node drops all the packets it receives instead
of forwarding them to the destination node. Compared to a grayhole attack, this
attack drops the most packets during network routing.
2.9 Grayhole Attack
In a grayhole attack, an intermediate node acts maliciously: it receives data
packets from the source, forwards some of them towards the destination, and
drops the others.
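The difference between the two attacks can be expressed through a relay node's packet forwarding ratio, as in the minimal sketch below. The function name and both thresholds are hypothetical and chosen only for illustration. Legitimate nodes also lose some packets to collisions and link failures, so a fixed rule like this is only a rough heuristic compared with the trained classifiers used in this thesis.

```python
def classify_relay(received: int, forwarded: int,
                   blackhole_max: float = 0.05, normal_min: float = 0.95) -> str:
    """Label a relay node from its packet forwarding ratio (hypothetical thresholds)."""
    if received == 0:
        return "unknown"  # no traffic observed yet
    ratio = forwarded / received
    if ratio <= blackhole_max:
        return "blackhole suspect"  # drops (almost) every packet
    if ratio < normal_min:
        return "grayhole suspect"   # forwards some packets, drops others
    return "normal"


for received, forwarded in [(200, 0), (200, 120), (200, 198)]:
    print(received, forwarded, classify_relay(received, forwarded))
# blackhole suspect, grayhole suspect, normal
```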
2.10 Interplanetary File System
IPFS is a distributed, decentralized file storage system used for data storage.
Although it is not itself a blockchain, many of its qualities are similar to those
of blockchain, and it is commonly integrated with blockchain, which stores the
hashes of the files kept on IPFS.
2.11 Hashing
The process of converting data of arbitrary size into data of a fixed size is called hashing.
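The fixed-size property of hashing can be checked with SHA-256 from Python's standard library, the same algorithm mentioned earlier for IPFS content identifiers. The helper name `fingerprint` and the sample messages are illustrative. The output length is always 64 hexadecimal characters (256 bits), and changing a single character (SN-3 to SN-4) produces an unrelated digest.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Map data of any size to a fixed 256-bit (64 hex character) SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


for message in [b"", b"node SN-3", b"node SN-4", b"x" * 1_000_000]:
    digest = fingerprint(message)
    print(f"{len(message):>8} bytes -> {len(digest)} hex chars: {digest[:16]}...")
```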
2.12 Chapter Summary
In this chapter, the terms that are necessary to understand the blockchain and WSN
fields are discussed and elaborated. First, blockchain, its types, and mining are
discussed. Moreover, the consensus mechanisms used in blockchain technology are
presented. Finally, the WSN concepts, malicious nodes, blackhole and grayhole
attacks, IPFS, and hashing are explained.
Chapter 3
Literature Review
This chapter reviews the literature in terms of the problems already addressed,
the solutions provided, and the problems that remain to be addressed.
3.1 Related Work Contributions 1
In WSNs, SNs share information and communicate with each other. WSNs are easily accessible, and any node can join them. Malicious nodes can acquire legitimate node identities, which makes it easy for them to become part of the network. The authors propose a lightweight blockchain based IoT authentication scheme in [15]. This scheme ensures integrity and non-repudiation in the network: whenever IoT nodes communicate with one another, they first authenticate each other using a lightweight blockchain. In [16], the authors develop a hybrid blockchain model for IoT nodes to prevent malicious or fake data packets from spreading throughout the network. The hybrid blockchain combines a public and a private blockchain. The public blockchain is implemented between CHs and BSs, while the private blockchain is implemented between CHs and SNs. SNs are authenticated on CHs using a smart contract, and CHs are authenticated on BSs. In [17], a blockchain and reinforcement learning based model is proposed for efficient and secure routing in WSNs. The reinforcement learning algorithm selects the best possible routing path and avoids malicious routing links that might send data through compromised nodes, while blockchain is used for node authentication and for managing all routing information. In [18], blockchain based key management is presented to tackle the issue of certificate-less key management. The blockchain performs node authentication, registration, and the joining or quitting of nodes. In addition, it provides a mechanism for detecting compromised nodes. In [19], a blockchain based data structure is used to hold nodes' authentication and trust information. Blockchain authentication consists of three aspects: public keys, block mining, and mutual influence, while the blockchain trust model consists of two aspects: knowledge based trust and trust evaluation. In [20], blockchain is used to overcome IoT issues. IoT devices register themselves on the blockchain, and a successfully registered and authenticated device performs activities according to its capability. Similarly, users must be authenticated in the blockchain network before they can control and manage IoT devices. This restricts malicious nodes from becoming part of the network, and all evidence is stored on the blockchain. In [21], the authors propose an information-centric network (ICN) for node identification using public key cryptography. In addition, a decentralized sharing and cross-verification scheme is used for storing data. In [22], a modified version of the station-to-station (STS) protocol is presented. It
first authenticates the user and then establishes a secret exchange session key that
ensures user anonymity inside a group.
In [23], a blockchain based data structure model is used for malicious node detection. WSN nodes have limited memory and computational power and are unable to detect malicious nodes, so whenever an attack is performed on a node, the node can be compromised by a malicious node. In [24], malicious service providers or clients may refuse to honour services, and a blockchain based non-repudiation scheme is proposed to resolve such conflicts. A homomorphic hash based service verification method verifies the service record and stores it on the blockchain. In [25], the authors propose a three-layered SDN architecture that monitors and analyzes traffic in the IoT environment, and blockchain is used for decentralized attack detection. Fog computing and mobile edge computing also provide attack detection, which reduces the number of attacks that occur at the edge layer. In [26], a secure and privacy-preserving model is proposed for the smart city. The proposed model consists of three modules. The first module is trustworthiness, where the authors use blockchain among the IoT devices to maintain trust. The second module is two-level privacy, where an enhanced PoW is used in the blockchain to achieve confidentiality and prevent poisoning attacks. The third module is an intrusion detection system, which uses an XGBoost (XGB) classifier to identify malicious nodes. In [27], the authors propose a secure privacy-preserving framework with two major components: two-level privacy and an intrusion detection mechanism. In the two-level privacy component, blockchain is used to securely transmit data among IoT nodes, and principal component analysis (PCA) transforms the data into a new form to protect it against inference attacks. For the intrusion detection system, the authors use gradient boost anomaly detection (GBAD) based on the light gradient boosting machine (LGBM). GBAD is deployed in a smart city and can efficiently classify normal and malicious observations. In [28], a blockchain based automated machine learning (AutoML) model is proposed for customer services to overcome the challenges of relying on third parties. IoT devices are used to collect data, and blockchain is used for secure data exchange in an open environment. Furthermore, AutoML is designed to process data and reduce expert costs. In [29], the authors propose an ensemble learning technique that uses multiple ML techniques to classify data. The final classification report is obtained from the votes of all classifiers.
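Combining the classifiers' votes, as in [29], is often done by hard majority voting. The Python sketch below shows that rule; whether [29] uses hard voting, soft voting or weighted voting is not stated here, and the three per-sample predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; on a tie, the label seen first wins."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of three classifiers for three samples.
votes_per_sample = [
    ["normal", "normal", "malicious"],
    ["malicious", "malicious", "normal"],
    ["malicious", "normal", "malicious"],
]
final = [majority_vote(v) for v in votes_per_sample]
print(final)  # ['normal', 'malicious', 'malicious']
```

An odd number of voters avoids ties for a two-class problem such as normal versus malicious. With an even number, the tie-breaking rule has to be chosen explicitly, as noted in the docstring.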
In [30], a confusion model with a blockchain based incentive mechanism is used to ensure that data is not tampered with by malicious nodes. Data is stored
on the blockchain, and users are rewarded with virtual currency according to how many times they participate. In [31], secure routing with a multi-layered IoT architecture is proposed, in which a light blockchain and the cloud are used: the light blockchain provides security and privacy, while the cloud is used for data storage. In [32], two different kinds of blockchain are used in a WSN: one for storing data and the other for managing how users can access data. A verifiable data possession consensus mechanism is also used to reduce the computing cost. In [33], a decentralized blockchain mechanism is proposed for Internet of Things (IoT) monitoring and controlling, in which each entity can track and communicate. The data controller manager receives the data and filters out the specific data that is stored in the blockchain. In [34], the multiple synergistic proofs green consensus method is proposed to address the issue of limited data storage, so that less space is allocated to blockchain data. When peer nodes verify a transaction or a block, they frequently send the same information, which is not favourable and deteriorates network performance. In [35], a blockchain based aggregation scheme is proposed that decreases the device's duty cycle. Moreover, it reduces the risk of transmitting a large amount of data, at the cost of increased data delay at the IoT device. Gathered data is selected based on the channel quality and the most recent data structure statistics. The cost of the scheme lies in sending Merkle-Patricia tree data structures as proofs of inclusion for the most recent data. In [36], an optimized sampling rate strategy is proposed for IoT sensors that transmit data using blockchain and Tangle technologies, which decreases the age of information and makes efficient use of end-user processing and networking resources. In [37], blockchain is used to resolve the security and privacy issues of IoT networks by providing a secure, distributed, and immutable ledger. Some nodes in a blockchain network are responsible for mining, which entails validating transactions and adding new blocks to the blockchain. Mining nodes require high computing capacity compared to conventional nodes, whereas IoT devices have limited power and battery life. This research encourages IoT devices to purchase processing power from edge servers, participate in mining, and earn incentives on the blockchain network. In [38], a scalable and secure blockchain is proposed for the IIoT system. First, a self-adaptive PoW consensus mechanism is proposed, which adjusts the mining difficulty for each node according to its behaviour and thereby reduces the computational power required. Moreover, asymmetric cryptography is used for access control, giving users more options for managing data authority. Furthermore, a blockchain based directed acyclic graph (DAG) is used to improve throughput and transaction time. In [39], a hybrid model based on SDN and blockchain is
proposed for the smart city. Two types of smart city nodes exist in the network: edge nodes and core nodes. Edge nodes receive data from the sensors and pass it on to the core nodes. Edge nodes act as centralized entities because they also use SDN technology, and their computing power and storage are lower than those of core nodes. Core nodes, in contrast, are powerful nodes that receive the sensors' data and perform mining. Core nodes, also called miner nodes, use blockchain technology to mine transactions and enhance security.
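The proof of inclusion mentioned for [35] lets a node prove that one record belongs to a committed set by sending only a few hashes instead of the whole data structure. A Merkle-Patricia trie is more involved, so the Python sketch below shows the idea with a plain binary Merkle tree. It is a simplification (SHA-256, last hash duplicated on odd levels, at least one leaf assumed), not the structure used in [35], and the records are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return every tree level, from the hashed leaves up to the root (needs >= 1 leaf)."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pair the last hash with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from the leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # odd level: the node was paired with itself
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling_hash, sibling_is_left in proof:
        node = h(sibling_hash + node) if sibling_is_left else h(node + sibling_hash)
    return node == root

records = [b"r0", b"r1", b"r2", b"r3", b"r4"]
levels = build_levels(records)
root = levels[-1][0]
proof = prove(levels, 4)
print(verify(b"r4", proof, root))        # True
print(verify(b"tampered", proof, root))  # False
```

The proof grows with the logarithm of the number of records, which is why inclusion proofs are cheaper to transmit than whole blocks, although, as [35] notes, they still add a transmission cost.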
In [40], a lightweight exclusive OR (XOR) hash algorithm is used that provides secure and reliable data routing using blockchain technology. In [41], a rolling blockchain is used for WSNs to ensure that WSN nodes and data are secured from attackers. In [42], a blockchain system with mobile edge computing allows mobile miner nodes to offload computationally intensive tasks to surrounding edge nodes, which minimizes backhaul and latency. In [43], a trust-aware localization routing protocol with class based dynamic encryption is proposed. The method first searches for a secure path from source to destination, selected using trust values, and then forwards the data packet. Moreover, blockchain based encryption is used for data integrity. In [44], a trust based range-free safe localization method is proposed. The trust values of beacon nodes are communicated to their nearby nodes through the blockchain, and trustworthy beacon nodes are selected as miners for block mining. However, this process is time consuming.
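In [43], the secure path is selected using trust values, but the text does not state how per-node trust is combined into a path score. The Python sketch below assumes, purely for illustration, that a path is only as trustworthy as its least trusted relay and that paths containing any relay below a threshold are discarded. The node names, trust values and the 0.5 threshold are all hypothetical.

```python
def path_trust(path, trust):
    """Score a path by its least trusted relay (source and destination excluded)."""
    relays = path[1:-1]
    return min((trust[n] for n in relays), default=1.0)

def select_secure_path(paths, trust, threshold=0.5):
    candidates = [p for p in paths if path_trust(p, trust) >= threshold]
    if not candidates:
        return None  # no path is trusted enough, so the packet is not forwarded
    # Prefer the most trusted path; among equally trusted paths, prefer the shorter one.
    return max(candidates, key=lambda p: (path_trust(p, trust), -len(p)))

trust = {"A": 0.9, "B": 0.2, "C": 0.8, "D": 0.7}
paths = [["S", "A", "B", "T"], ["S", "C", "D", "T"], ["S", "A", "T"]]
print(select_secure_path(paths, trust))  # ['S', 'A', 'T']
```

The minimum rule reflects the idea that one malicious relay, such as a black hole, is enough to compromise a route. A product of trust values would be an equally plausible alternative.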
In [45], the authors note that WSN nodes have limited memory and computational power and are unable to detect malicious nodes. Whenever an attack is performed on a node, the node can be compromised and turned into a malicious node, which increases the attack density on network nodes that have been compromised or have lost data. The authors propose a rolling blockchain for WSNs to ensure that WSN nodes and their data are secure from attackers. In [46], the authors point out that the existing cryptographic protocols, secure sockets layer and transport layer security (SSL/TLS), are only tunneling mechanisms that cannot ensure specific goals such as the authenticity of data obtained from sensor modules or user anonymity. Data security and privacy remain challenging because malicious nodes may steal or modify valuable data. The authors propose a secret exchange session key using an updated variant of the station-to-station (STS) protocol to guarantee that user anonymity within a group is maintained. In addition, blockchain is used to prevent tampering and to monitor the data: the SNs' data is stored on a blockchain, and once it is recorded, malicious nodes cannot tamper with it. However, data
security and privacy are still challenging issues that are not fully resolved. In [47], the authors note that the use of blockchain creates several problems: the miners' computational capability and storage are in extremely high demand, and there are backhaul and delay issues. The authors propose a blockchain system with mobile edge computing (MEC) that allows mobile miners to offload computation-intensive tasks to surrounding edge nodes. Offloading under service provider overload is divided into two modes. In [48], malicious nodes affect network routing and pose a serious data security issue. The authors propose a trust-aware localization routing protocol with class based dynamic encryption. This method first searches for a secure path from source to destination, selected according to trust values, and then forwards the data packet. The TDFS metric is used for all routes, and blockchain based encryption is used for data integrity. The authors validate the model through performance parameters such as security performance, throughput performance, encryption/decryption performance, and time complexity.
Table 3.1: Literature Review
Limitations already addressed | Solutions already proposed | Validations already done | Limitations to be addressed
Malicious nodes are present in the network | Intrusion prevention framework is proposed [15] | Data integrity and network … | XOR function is not strong enough; black hole and grey hole attacks may occur
Localization problem of unknown nodes | Trust model based on blockchain is used [16] | Feasibility, fairness and traceability | RSA slows down encryption for large data
Crowd sensing networks are vulnerable … | Confusion mechanism and blockchain based incentive mechanism are proposed [17] | Energy consumption, delay | Not improved in route acquisition latency and packet delivery
… method is used for … | A hybrid blockchain based model is proposed | Processing time and transmission delay | Data duplication
Localization of WSN nodes; unknown nodes perform attacks on the network | A blockchain trust model is proposed | False negative rate, detection accuracy and energy consumption | No encryption or hashing algorithm is used for security
Dynamic routing and centralized … | Blockchain and reinforcement learning algorithm are proposed [20] | Delay, energy … | Queue delay and processing delay
IoT nodes are resource constrained | IoT authentication scheme is proposed | Latency and packet delivery … | Increases processing time
IoT node manufacturers are unable to agree on a simple central administration | BCR protocol is proposed [22] | Packet drop ratio, packet delivery ratio, … | Not improved in route acquisition latency and packet delivery
Key management security weakness is caused by a vulnerable and untrusted BS | Secure key management technique is proposed [23] | Packet delivery ratio and route latency with ad hoc on-demand distance vector … | Security issue in centralized key management
High computational cost for data transmission | Blockchain technology is used [24] | |
Increased attack density on network nodes that have been compromised or have lost data | Rolling blockchain technology is proposed for WSNs [25] | … with traditional … | No encryption or hashing algorithm for …
Data security and privacy issues | STS protocol is proposed [26] | Less storage and energy | Single point of failure
PoW consensus mechanism; backhaul and delay issues | MEC-enabled blockchain system is proposed [27] | Average delay, total net revenue and task offloading decision |
No identification mechanism for sensing data; data duplication wastes resources; privacy sensitive … | Secure caching scheme is proposed | Processing time and transmission delay | Increases processing time
Data storage depends on a trusted third party or centralized system | A blockchain based incentive system for data storage is proposed [29] | | No encryption or hashing algorithm for …
Trust management is a serious issue in decentralized networks and WSNs | Security model using a blockchain based data structure is proposed | PoA consumes less time in mining transactions | Centralized edge …
Malicious nodes in a network are a serious data security issue | Trusted routing protocol with class based dynamic encryption is proposed [?] | Security performance, encryption/decryption and time complexity | Queue delay and processing delay
Centralized industry architecture is used for monitoring and controlling IoT devices; duplicated data is stored in IoT devices, resulting in resource waste | … blockchain mechanism for industrial IoT is proposed | Average localization error, probability of finding true … | Does not find the accurate position of nodes
Traditional non-repudiation solutions in IIoT environments rely on trusted third parties | A consortium blockchain with PoA is used for non-repudiation [33] | Average transaction latency, average … and bloating | No authentication mechanism for clients
Computational overhead in the network | … blockchain with PoA is used for non-repudiation [34] | … cost, latency | Does not address node behaviour
Centralized management structure, massive storage … | A synergistic multiple proof consensus mechanism is proposed [35] | | Attackers analyze the security …
Information can be tampered with because it is stored on a centralized platform | Blockchain, SVM and k-means are used | Quality loss, weight loss and mean relative … | Queue delay and processing delay
Centralization, privacy issues, large data problem | Blockchain based SDN network is proposed [37] | Score, accuracy, DT and AUC | SDN acts as a centralized system
Growth of IoT devices causes high latency, bandwidth and scalability … | Blockchain technology is used with SDN [38] | Packet delivery ratio, latency |
Storing entire blocks causes high computational overhead | Proof of inclusion is proposed [39] | Duty cycle trade-offs and … | Not trusted as compared to …
Blockchain consensus mechanism is … | Optimal sampling and age of information is proposed | Age of information and sampling intervals | Security com…
Throughput, scalability … | Sharding scheme based on MaOEA-DRP is proposed | Shard invalid … | Uneven distribution of malicious …
Centralized architecture, service delay, bandwidth | Incentive mechanism for secure blockchain based IoT is proposed | Impact on unit price, power on strategy … | PoW consumes high computational resources
Key management relies on a centralized BS; CL-EKM cannot guarantee forward secrecy | Blockchain based key management is proposed for DWSN [42] | Storage overhead, energy …, packet reception rate | Asymmetric encryption creates high overhead
Conflict between high concurrency and low bandwidth | Blockchain based DAG structure is proposed to achieve high throughput | Average time, transactions per second, credit value change, impact node … | DAG based blockchain is vulnerable to double spending
Centralized architecture causes a SPOF; IoT devices are vulnerable to different attacks | Blockchain is proposed to solve the issues associated with IoT [44] | … setup is used for the experiment |
Centralization, scalability, verifiability | TP2SF framework is proposed [45] | F1-score, detection rate, upload file time |
Scalability, trans… | PPSF framework and GBAD are proposed [46] | Off-chain storage … | Gradient Boost requires high …
Unknown nodes broadcast misleading information in the network | SVR is proposed | Root mean square, correlation coefficient |
Lack of development of basic customer service capabilities | A blockchain based AutoML is proposed [48] | Block size, … | Privacy of customer data
A single ML technique does not classify malicious nodes | An ensemble learning technique is proposed [49] | Accuracy, recall | Increases time
3.2 Related Work Contributions 2
WSN nodes are an essential part of IoT networks that collect data from specified regions. A WSN is a self-organizing network that consists of several SNs with poor computing, storage, and energy capabilities. WSN nodes rely on two types of architecture: centralized and distributed [49]. In a centralized WSN, the SNs' data is transmitted to the BS, whereas in a decentralized WSN, SNs transmit data to and receive data from other SNs. In [50], the low energy adaptive clustering hierarchy (LEACH) protocol is designed to increase network lifetime. It uses a probability function to choose some of the nodes as CHs. The remaining nodes are randomly distributed around the network. The SNs remain fixed and send data to the BS in a one-hop transfer through the connected CHs. However, the CHs' computational overhead increases, which is ineffective for WSNs. Furthermore, no mechanism is used to authenticate node identities and ensure security. In [51], the authors address the identity authentication problem in WSNs. Existing authentication methods rely on a centralized system or trusted third parties, which are vulnerable to cyber attacks, so the authors propose a mutual authentication method using a hybrid blockchain. In the proposed model, the hybrid blockchain consists of two types: public and local. A public blockchain is implemented on the BS and authenticates the selected CHs, while a local blockchain is implemented on the CHs and authenticates the ordinary nodes. Nodes first mutually authenticate each other and then communicate. In [52], a decentralized blockchain architecture with authentication and privacy protection techniques is developed to allow secure communication among WSN nodes. The BS is responsible for registering and verifying all SNs. After verification, the CHs use an immutable key method to keep all the essential parameters. Collected data is sent to the BS through the CHs and is classified into significant parameters and detected data. The large quantity of detected data is stored in the cloud, while the important parameters are kept on the blockchain to increase the immutability and transparency of the data. A malicious SN is eliminated from the network when its certificate is revoked. In [53], the authors develop a blockchain based access control architecture for IoT nodes that allows verification and revocation of access privileges. The blockchain checks the identity of each user before recording all public keys, user attribute sets, and revocation lists. After establishing these settings, the administrator provides the nodes with their private keys. The administrator is responsible for establishing and managing the security policies. If the credentials of the nodes are valid and their attributes meet the access criteria, the cloud servers provide intermediate access. The decisional bilinear
Diffie-Hellman assumption ensures the system's security and resistance to intrusion, which allows malicious nodes to be tracked and their access rights to be revoked at any moment.
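The probability function that LEACH [50] uses to choose CHs is the threshold from the original LEACH protocol. A node n that has not been a CH during the current epoch of 1/p rounds (the set G) draws a random number in [0, 1) and becomes a CH in round r if the number is below T(n) = p / (1 - p·(r mod 1/p)). For nodes outside G, T(n) = 0. Here p is the desired fraction of CHs. The Python sketch below implements this rule under the assumption that 1/p is an integer. The network size, p = 0.1 and the random seed are illustrative.

```python
import random

def leach_threshold(p, r):
    """LEACH threshold T(n) = p / (1 - p * (r mod 1/p)) for a node still in G.

    With p = 1/period this equals exactly 1 / (period - r mod period), which avoids
    floating-point error so that T(n) is exactly 1 in the last round of an epoch.
    """
    period = round(1 / p)
    return 1.0 / (period - r % period)

def elect_cluster_heads(node_ids, p, r, last_ch_round, rng):
    """Return the CHs for round r; last_ch_round maps node -> last round it was a CH."""
    period = round(1 / p)
    epoch = r // period
    threshold = leach_threshold(p, r)
    heads = []
    for n in node_ids:
        last = last_ch_round.get(n)
        in_g = last is None or last // period < epoch  # not yet a CH in this epoch
        if in_g and rng.random() < threshold:
            heads.append(n)
            last_ch_round[n] = r
    return heads

rng = random.Random(7)
nodes = list(range(20))
last_ch_round = {}
for r in range(10):  # one epoch of 1/p = 10 rounds
    print(r, elect_cluster_heads(nodes, 0.1, r, last_ch_round, rng))
```

The threshold rises as the epoch progresses and reaches 1 in its last round. As a result, every node serves as a CH exactly once per epoch, which is how LEACH spreads the extra CH energy cost that the paragraph above mentions.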
In [54], the authors note that malicious node detection in WSNs relies on a third party, which cannot guarantee fairness and traceability. To tackle this issue, they propose a blockchain based trust model for malicious node detection in WSNs. Unlike existing techniques, the proposed model creates a blockchain based data structure to identify malicious nodes. It stores information in blocks, and each block is linked to the previous one by a hash value.
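Linking each block to the previous one by its hash is what makes stored trust records tamper-evident. The Python sketch below shows only this hash-chaining. It omits mining, consensus and signatures, and the record fields (`node`, `trust`) and values are hypothetical rather than the format used in [54].

```python
import hashlib
import json

def block_hash(block):
    # Hash a canonical JSON encoding so that key order does not change the digest.
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def append_block(chain, records):
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis block has no parent
    chain.append({"index": len(chain), "prev_hash": prev, "records": records})

def is_valid(chain):
    """Every block must store the hash of the block before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

chain = []
append_block(chain, [{"node": "SN-03", "trust": 0.91}])
append_block(chain, [{"node": "SN-11", "trust": 0.12}])
append_block(chain, [{"node": "SN-03", "trust": 0.88}])
print(is_valid(chain))  # True
chain[1]["records"][0]["trust"] = 0.95  # tamper with an old trust record
print(is_valid(chain))  # False
```

Changing a record in block 1 changes that block's hash, so the `prev_hash` stored in block 2 no longer matches. An attacker would have to rewrite every later block, which the consensus mechanism is designed to prevent.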
A blockchain based, low power data gathering approach for unmanned aerial vehicle (UAV) assisted IoT is proposed in [55]. The model includes a recharge station for long-term UAV service. Charging coins and blockchain secure all transactions by recording and incentivizing them. In this approach, malicious UAVs ultimately lose power and depart, because attacking UAVs use up all their available energy. Adaptive data prediction is expected to minimize blockchain transactions and increase energy efficiency. The proposed method is a secure and efficient blockchain based way to collect data for UAV assisted IoT.
Industry 4.0 cannot operate without IoT devices. IoT devices help reduce costs, improve performance, and enhance security in manufacturing and industrial processes, and they have huge potential to enhance the industry's integrity, availability, and scalability. IoT devices generate massive amounts of data that is stored by cloud service providers [56]. A traditional cloud based storage system follows a centralized control strategy in which trusted outside parties keep the information. Although a centralized system may appear to function effectively from the outside, it has several issues: it is expensive to maintain and prone to hacking, which makes service availability hard to ensure. To tackle these issues, researchers suggested blockchain based data storage [57]. However, blockchain consumes significant resources, and its storage cost is very high. To address these problems, the authors of [58] suggested an IPFS based data storage system. When a file is uploaded to IPFS, its hash value is returned to the file's owner, and only users with the hash key can access the data. IPFS ensures data security and reliability while being less expensive than alternative storage options. In [59], cloud based blockchain technology is proposed for malicious attack prevention and efficient energy usage of IoT nodes. Cloud servers are used with edge servers that reduce the burden on cloud servers and give IoT
nodes instant access to resources. Blockchain, in turn, keeps IoT nodes secure from malicious attacks: it checks the legitimacy of an edge server and its service before making it accessible to the cloud servers and IoT nodes. In [60], a fitness framework is built on blockchain technology and machine learning. The system uses a blockchain based IoT network to protect and store sensing data, and an improved smart contract relationship and inference engine to unearth insights hidden in IoT and user device network data. The updated smart contract allows immutable monitoring, controlling, straightforward access, and reporting, and the inference engine module enables users to judge more accurately and improve services. Using blockchain technology, multiple nodes in an IoT network securely transmit information and assets. Reducing centralized control and solving current problems in artificial intelligence (AI) is crucial, so in [61] blockchain is integrated with IoT architecture and AI. This provides a feasible option for enhancing IoT operations and infrastructure. The technique employs both qualitative and quantitative evaluation: the qualitative evaluation examines how blockchain and AI can be implemented in IoT networks, whereas the quantitative evaluation assesses the performance of the proposed model and compares it with previous research across various criteria. The suggested IoT system architecture outperforms existing IoT systems in all studied domains. Furthermore, the architecture can be extended with decentralized machine learning concepts such as feature extraction, scalability, and categorization.
In [62], conditional packet manipulation attacks (CPMAs) are addressed. These are insider targeted attacks in which attackers modify packets whose attribute values match particular requirements. Existing CPMA detection systems are ineffective and inefficient for low-power IoT nodes; they identify malicious nodes by collecting and analyzing node behaviour. CPMAED is designed to identify malicious nodes during CPMA attacks. While forwarding packets with varied attribute values, CPMAED maintains partial trust metrics for each relay node, which indicate the probability of an attack. In addition, regression and clustering methods are applied to categorize nodes as normal or malicious. Detection precision is improved by optimizing packet routing and injecting packets to obtain more node data.
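The exact trust metrics and clustering method of CPMAED are not described here. The Python sketch below illustrates only the general idea of clustering per-relay trust into normal and malicious groups. It assumes a simplified trust score per relay (the fraction of injected probe packets forwarded unmodified) and splits the scores with a one-dimensional, two-cluster k-means. Node names and counts are hypothetical.

```python
def trust_scores(observations):
    """observations: node -> (forwarded_unmodified, total_relayed)."""
    return {n: ok / total for n, (ok, total) in observations.items() if total > 0}

def two_means_1d(values, iterations=20):
    """Split 1-D values into a low and a high cluster; returns the two centroids."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if low_group:
            low = sum(low_group) / len(low_group)
        if high_group:
            high = sum(high_group) / len(high_group)
    return low, high

def classify(observations):
    scores = trust_scores(observations)
    low, high = two_means_1d(list(scores.values()))
    # Nodes closer to the low-trust centroid are flagged; if all scores are equal, none are.
    return {n: "malicious" if abs(s - low) < abs(s - high) else "normal"
            for n, s in scores.items()}

obs = {"R1": (98, 100), "R2": (95, 100), "R3": (40, 100), "R4": (97, 100), "R5": (35, 100)}
print(classify(obs))
```

Clustering avoids fixing a global trust threshold in advance, but it always produces two groups. A deployment would therefore also need a rule for the case where every relay is honest, which this sketch only handles when all scores are identical.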
Machine learning techniques are frequently used for IDSs because they learn quickly and are adaptable. These techniques are trained on large datasets containing many attacks to enhance anomaly detection. However, high-dimensional data creates problems for machine learning,
increasing processing time, which is a crucial issue for nodes with limited resources such as battery and energy. The authors of [63] propose a classification system with a trust based IDS that finds and classifies intrusions on a secure network. It uses a novel approach to reduce the number of input data features: the features are initially assigned to random groups, which increases their likelihood of being in distinct groups, and the groups are ranked by their accuracy. Only the highest ranked features are used to classify the network packets stored as part of each node's history. Furthermore, it uses an accelerated trust based intrusion detection and classification system (TIDCS), which employs a dynamic approach to decide when nodes need to be removed and to limit their exposure periods. Both methods use machine learning to estimate the final category from the nodes' past behaviour. Any attack makes a node less reliable, and the system immediately eliminates it. There are still concerns about the quality of labeled data, even though a massive volume of data has been produced in recent years. Without labels in the dataset, it is challenging to discriminate between qualities and impossible to predict target classes. Various machine learning techniques assume that all predicted classes are equally represented in the data. This assumption is false in several applications, such as weather forecasting, alignment diagnostics, and intrusion detection, where most records typically belong to one class and the remainder to another [64].
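The imbalance described in [64] can be quantified, and one common remedy is to weight each class inversely to its frequency. The Python sketch below uses the formula n_samples / (n_classes × count), the same formula scikit-learn applies for `class_weight="balanced"`. Reweighting is only one option (resampling is another), [64] is not claimed to use it, and the 950/50 label split is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}

labels = ["normal"] * 950 + ["malicious"] * 50
weights = balanced_class_weights(labels)
print(Counter(labels))
print(weights)
```

With these weights, each class contributes the same total weight (950 × 0.526… ≈ 50 × 10 = 500). A classifier trained with them can no longer reach a low loss simply by predicting "normal" for every record.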
3.3 Critical Analysis
According to the reviewed literature, several methods have been used to detect malicious nodes in WSNs and remove them from the network. However, centralized authentication mechanisms are used to protect node identities from external nodes in a WSN. Furthermore, SNs are resource constrained and cannot detect malicious activity in the network. Malicious nodes compromise the network and perform malicious actions, which decreases network performance. Malicious node detection is a difficult task because nodes are interconnected with one another. Moreover, blockchain is used for decentralized data storage to achieve verifiability, scalability, and transparency. However, storing a huge volume of data on the blockchain is costly. In addition, many blockchains use the PoW consensus mechanism to verify transactions and add new blocks. However, PoW consumes substantial computational resources, which is not feasible for WSNs.
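The cost of PoW that makes it unsuitable for WSNs can be seen in a toy miner. The Python sketch below searches for a nonce whose SHA-256 digest starts with a given number of hex zeros. This leading-zeros rule and the block string are simplifications; Bitcoin, for instance, compares the hash against a numeric target.

```python
import hashlib

def mine(block_data: str, difficulty: int):
    """Find a nonce whose SHA-256 digest starts with `difficulty` hex zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution costs a single hash."""
    digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

for d in range(1, 5):
    nonce, digest = mine("block-42:tx-hashes", d)
    # nonce + 1 hashes were computed; on average about 16**d are needed.
    print(d, nonce + 1, digest[:12])
```

Each extra hex digit of difficulty multiplies the expected mining work by 16, while verification always costs one hash. This asymmetry secures PoW, but it also means that battery-powered SNs and CHs would spend most of their energy on hashing, which motivates lighter consensus mechanisms.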
Chapter 4
Proposed Network Models
Chapter 4 Proposed Solution
4.1 Proposed System Model 1
This section discusses the WSN model developed in the proposed work and introduces the assumptions on which it is based. Figure 4.1 presents the proposed system model.
[Figure 4.1 labels: Wireless Sensor Network (Sensor Node, Cluster Head); Base Station; Miner Node; Smart Contract; Verification of the …; New Blocks Creation; Revoking the …; Machine Learning (Input Data, Data Preprocessing, Target Label: Normal / Malicious); Hashes Stored on Blockchain; Data Stored on …; Access Data; Provide Hash; Request Hash; Response Hash]
Figure 4.1: Proposed Network Model for WSNs.
4.1.1 Assumptions
• SNs and CHs are resource constrained, and each node has a unique identity.
• BSs provide a certain amount of data storage and computational power for processing the data sent by the SNs.
• BSs are resource rich and are trusted by the CHs and SNs.
4.1.2 Network Model Description
The nodes of our proposed network model are divided into SNs, CHs, and BSs. SNs are randomly deployed, whereas CHs are chosen based on their high residual energy relative to SNs and their closeness to the BSs. SNs and CHs are registered on
the blockchain, which is deployed on the BSs. Following registration, the blockchain authenticates SNs and CHs using their Node IDs. SNs collect data and send it to CHs, while CHs process the data and transmit it to BSs. BSs use an ML classifier, HGB, to determine whether data is transmitted to a malicious node or a normal node. When data is transferred to a malicious node, the HGB classifier identifies the malicious class to which the data belongs, classifies the attack accordingly, and reports it to the blockchain. The blockchain then revokes the malicious node's registration. Otherwise, the data is stored in an IPFS database. IPFS generates a unique identifier (hash) for the data and sends it back to the BSs, where it is stored on the blockchain. Moreover, a public blockchain is implemented on the BSs and customized to allow nodes to add, remove, and validate transactions. Furthermore, a VBFT consensus mechanism is used in the blockchain to verify transactions and store the nodes' credentials and cryptographic hashes [?]. Figure 4.2 represents the workflow of the proposed model. The steps involved in the system model are given in Algorithm 1.
Algorithm 1: Pseudo Code of the Proposed Model
Step-1: All nodes are registered on the BSs, where the blockchain is deployed.
Step-2: SNs collect data from the surrounding area and relay it to CHs, while CHs forward the data to the BSs [79].
Step-3: The BSs use a trained ML classifier, HGB, to classify whether the data is being sent to a malicious node or a normal node.
Step-4: Data is stored on IPFS if the BS classifies the node as a normal node. Otherwise, the BS revokes the node's registration.
Step-5: IPFS provides hashes, which are stored on the blockchain implemented on the BSs.
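The steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is a toy built on stated assumptions, not the implementation used in this work: the hypothetical `Ledger` class stands in for the VBFT-based blockchain (no blocks or consensus are modelled), the hypothetical `ToyContentStore` imitates IPFS content addressing with a plain SHA-256 hex digest (real IPFS content identifiers use a different, multihash-based encoding), and the hypothetical `looks_malicious` rule is a placeholder for the trained HGB classifier.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Toy stand-in for the blockchain on the BSs (no consensus, no blocks)."""
    registered: set = field(default_factory=set)
    revoked: set = field(default_factory=set)
    hashes: list = field(default_factory=list)

    def register(self, node_id: str) -> None:          # Step-1
        self.registered.add(node_id)

    def revoke(self, node_id: str) -> None:            # Step-4, malicious branch
        self.registered.discard(node_id)
        self.revoked.add(node_id)

    def is_authenticated(self, node_id: str) -> bool:
        return node_id in self.registered


class ToyContentStore:
    """Imitates IPFS content addressing: data is keyed by a hash of itself."""

    def __init__(self) -> None:
        self._blobs: dict = {}

    def add(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()      # not a real IPFS CID
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


def handle_report(ledger, store, node_id, payload, is_malicious):
    """Steps 3-5 for one report forwarded by a CH; returns the stored hash or None."""
    if not ledger.is_authenticated(node_id):
        return None                                    # unknown or revoked node
    if is_malicious(payload):                          # placeholder for the HGB classifier
        ledger.revoke(node_id)
        return None
    digest = store.add(payload)                        # Step-4: data goes to IPFS
    ledger.hashes.append(digest)                       # Step-5: only the hash goes on-chain
    return digest


def looks_malicious(payload: bytes) -> bool:
    """Hypothetical rule standing in for the trained classifier."""
    return payload.startswith(b"FLOOD")


ledger, store = Ledger(), ToyContentStore()
for node in ("CH-1", "CH-2"):
    ledger.register(node)
h1 = handle_report(ledger, store, "CH-1", b"temp=21.5", looks_malicious)
h2 = handle_report(ledger, store, "CH-2", b"FLOOD" * 4, looks_malicious)
h3 = handle_report(ledger, store, "CH-2", b"temp=19.0", looks_malicious)
```

Keeping only the digest on the ledger mirrors Step-5: the bulky sensor data lives off-chain, while the small hash that identifies it is what the blockchain records. Once CH-2 is revoked, its later reports (`h3`) are ignored.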
4.1.3 Registration
SNs and CHs play an important role in the overall system model. They must create accounts on the blockchain and send registration requests to the BSs. After receiving their registration responses, SNs and CHs start sending data toward the BSs.
4.1.4 Sensor Nodes
In the proposed model, we adopt the WSN model used in [79]. SNs are randomly deployed according to their functionalities. SNs collect data from their surrounding area and transmit it to the CH node. Each SN is directly connected to a CH in the network and, once fully registered, shares its data and credentials with it. These credentials are stored on the blockchain.
[Figure 4.2 stages: WSN nodes are registered on a blockchain; WSN nodes provide data; data preprocessing; detection (e.g., of flooding); revoke the malicious node; data is stored on IPFS.]
Figure 4.2: Proposed Model's Workflow.
4.1.5 Cluster Heads
CHs are intermediate nodes that receive data from SNs, process it, and then pass it to the BSs. The storage capacity and computational power of CHs are higher than those of SNs and lower than those of BSs. Each CH is directly connected to a BS and shares its information and credentials with it.
4.1.6 Base Station
A BS is a powerful node that has the highest computational power and storage capacity in the proposed network. It is also considered the core node of the network.
In the proposed network, BSs receive data from CHs, perform complex operations, and verify whether the data is being transferred to a malicious node or a normal node. If data is delivered to a malicious node, that node's registration is revoked. Otherwise, the BS stores the data in the IPFS database. BSs store the credentials of the network nodes and monitor the whole network. They serve as trusted nodes for other nodes or subnetworks in the network.
4.1.7 Customers
In the proposed model (Figure 4.1), customers rely on IPFS and blockchain. Customers need to access the data gathered by the SNs. Therefore, customers are first registered on the blockchain, which confirms whether the customer identity exists. If the customer is validated, it is allowed to join the network. The customer then requests the hash of the desired data, which is already recorded on the blockchain, and provides this hash value to the IPFS database to retrieve the associated data.
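The customer access path described above (registration check, hash lookup on the blockchain, retrieval from IPFS) can be sketched as follows. This is a hypothetical toy under stated assumptions: Python dictionaries stand in for the blockchain and the IPFS network, a plain SHA-256 hex digest stands in for an IPFS content identifier, and the class and method names are illustrative, not part of the thesis implementation or of any IPFS API.

```python
import hashlib


class ToyAccessService:
    """Hypothetical model of customer access: blockchain lookup, then IPFS retrieval."""

    def __init__(self) -> None:
        self.customers = set()   # customer identities registered on the blockchain
        self.onchain = {}        # data label -> content hash recorded on the blockchain
        self.ipfs = {}           # content hash -> data (stand-in for IPFS)

    def publish(self, label: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.ipfs[digest] = data
        self.onchain[label] = digest
        return digest

    def register_customer(self, customer_id: str) -> None:
        self.customers.add(customer_id)

    def request_hash(self, customer_id: str, label: str) -> str:
        # Only validated customers may read hashes from the blockchain.
        if customer_id not in self.customers:
            raise PermissionError(f"customer {customer_id!r} is not registered")
        return self.onchain[label]

    def fetch(self, digest: str) -> bytes:
        data = self.ipfs[digest]
        # Recomputing the hash detects data altered after the hash was recorded.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("retrieved data does not match its hash")
        return data


svc = ToyAccessService()
svc.publish("cluster-3/reading-1", b"rssi=-71;energy=0.82")
svc.register_customer("alice")
data = svc.fetch(svc.request_hash("alice", "cluster-3/reading-1"))

try:
    svc.request_hash("mallory", "cluster-3/reading-1")
    denied = False
except PermissionError:
    denied = True
```

Because the retrieval key is derived from the content itself, recomputing the hash in `fetch` lets a customer detect data that was altered after its hash was recorded on the blockchain.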
4.1.8 Malicious Node Detection Using Machine Learning
Some malicious nodes enter the network posing as legitimate nodes to complete the registration process. After registration, these malicious nodes change their behaviour and act maliciously to attack the network nodes. In our proposed system, an ML classifier is deployed on the BSs to classify normal and malicious nodes. We conduct a comparative analysis of six ML classifiers for malicious node detection: AdaBoost, GB, XGB, LDA, ridge, and HGB. After classification, the BSs revoke the malicious nodes from the network. The classifiers are discussed further below.
Freund and Schapire introduced AdaBoost in 1996 [66]. It was one of the first practical boosting techniques; it combines multiple weak learners into a strong model, because a single weak learner cannot predict the class accurately. The weak classifiers are combined through a weighted voting mechanism into a new strong classifier. Because of its low time complexity, fast performance, and ease of implementation, AdaBoost has been widely and effectively used, for example in computer vision. Boosting can also be viewed as a greedy, stagewise minimization of the exponential loss function. AdaBoost thereby improves on the accuracy of the individual weak classifiers. This algorithm
initially assigns equal weights to all samples and passes them to the first weak learner. The weak classifier is trained and outputs labels in {-1, +1}. In the next round, the weights of the observations are updated so that misclassified samples receive larger weights. This process is repeated several times, creating a set of weak classifiers. The time complexity of AdaBoost is O(ftn) [67], where f represents the number of features, t the number of weak learners, and n the number of dataset samples. The AdaBoost algorithm is presented below.
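Algorithm 2: AdaBoost Algorithm
Initialize the observation weights $w_i = 1/N$, $i = 1, 2, \ldots, N$.
for $m = 1$ to $M$
    (a) Fit a classifier $G_m(x)$ to the training data using weights $w_i$.
    (b) Compute $\mathrm{err}_m = \frac{\sum_{i=1}^{N} w_i \, I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N} w_i}$.
    (c) Compute $\alpha_m = \log\left((1 - \mathrm{err}_m)/\mathrm{err}_m\right)$.
    (d) Set $w_i \leftarrow w_i \cdot \exp\left[\alpha_m \cdot I(y_i \neq G_m(x_i))\right]$, $i = 1, 2, \ldots, N$.
end for
Output $G(x) = \mathrm{sign}\left[\sum_{m=1}^{M} \alpha_m G_m(x)\right]$.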
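To make Algorithm 2 concrete, the following NumPy sketch implements it with decision stumps (one-feature threshold rules) as the weak classifiers G_m. The stump learner, the clipping of err_m to avoid division by zero, and all function names are assumptions made for illustration; labels are expected in {-1, +1}.

```python
import numpy as np


def stump_predict(X, feature, threshold, polarity):
    """Weak classifier: +1 on one side of the threshold, -1 on the other."""
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error (steps (a) and (b))."""
    best = (0, 0.0, 1, np.inf)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = np.sum(w * (pred != y)) / np.sum(w)
                if err < best[3]:
                    best = (feature, threshold, polarity, err)
    return best


def adaboost_fit(X, y, M=10):
    """Algorithm 2 with decision stumps as G_m; y must contain -1/+1 labels."""
    w = np.full(len(y), 1.0 / len(y))            # initialize w_i = 1/N
    ensemble = []
    for _ in range(M):
        feature, threshold, polarity, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)     # guard against log(0) for perfect stumps
        alpha = np.log((1 - err) / err)          # step (c)
        miss = stump_predict(X, feature, threshold, polarity) != y
        w = w * np.exp(alpha * miss)             # step (d): up-weight misclassified samples
        ensemble.append((alpha, feature, threshold, polarity))
    return ensemble


def adaboost_predict(ensemble, X):
    """Output G(x) = sign(sum_m alpha_m G_m(x))."""
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.sign(score)


# Usage on a tiny one-feature dataset.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_fit(X, y, M=5)
pred = adaboost_predict(model, X)
```

The exhaustive stump search is what makes each round cost roughly proportional to the number of features times the number of samples, in line with the O(ftn) figure quoted above (ignoring the threshold scan).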
GB is a supervised ML algorithm introduced by Friedman in 2001 [68]. It is an ensemble technique used for classification and regression. Unlike AdaBoost, it comprises three parts: a loss function, weak learners, and an additive model. The loss function measures the error (residual) that the model minimizes as it converges to the final output, while the weak learners make the predictions. GB starts with a base model that makes a constant prediction, computes the residuals of this model, and passes them to the first weak learner. It combines several weak learners into a single strong learner using the additive model. In GB, the individual weak learners are decision trees (DTs). Each new DT is fitted to the residuals of the previous step, so it learns from the mistakes of the prior DT and reduces the error. The DTs are thus connected sequentially, each one minimizing the error of the previous one. Finally, the additive model combines the outputs of all steps to give the strong learner. The time complexity of GB is O(ftn log n) [69], where log n represents the depth of the weak learners. The GB algorithm is presented below.
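As an illustration of the residual-fitting loop, the following Python sketch implements gradient boosting under simplifying assumptions that are not taken from the thesis: a single numeric feature, the squared loss L(y, F) = (y - F)^2 / 2 (for which the negative gradient is the residual y - F and the optimal leaf value is the mean residual in each region), depth-one regression trees (stumps), and a learning rate nu = 0.1. The function names are hypothetical.

```python
import numpy as np


def fit_regression_stump(x, r):
    """Least-squares regression stump on one feature: regions x < thr and x >= thr."""
    best = None
    for thr in np.unique(x)[1:]:
        left, right = r[x < thr], r[x >= thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            # For squared loss, the optimal leaf value is the mean residual.
            best = (sse, thr, left.mean(), right.mean())
    return best[1:]


def gb_fit(x, y, M=50, nu=0.1):
    f0 = y.mean()                          # F_0 = argmin_r sum (y_i - r)^2 / 2
    F = np.full(len(y), f0)
    stumps = []
    for _ in range(M):
        r = y - F                          # negative gradient of the squared loss
        thr, left_val, right_val = fit_regression_stump(x, r)
        F = F + nu * np.where(x < thr, left_val, right_val)   # shrunken additive update
        stumps.append((thr, left_val, right_val))
    return f0, nu, stumps


def gb_predict(model, x):
    f0, nu, stumps = model
    F = np.full(len(x), f0)
    for thr, left_val, right_val in stumps:
        F = F + nu * np.where(x < thr, left_val, right_val)
    return F


# Usage: a step-shaped target that a single constant cannot fit.
x = np.linspace(0.0, 1.0, 20)
y = (x > 0.5).astype(float)
model = gb_fit(x, y)
mse_before = np.mean((y - y.mean()) ** 2)
mse_after = np.mean((y - gb_predict(model, x)) ** 2)
```

With shrinkage nu < 1, each stump only partially corrects the residuals, which is why many small steps are combined rather than one large one.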
LDA is a generalization of Fisher's linear discriminant analysis (FDA), also known as normal discriminant analysis. LDA is a supervised ML technique used for classification and dimensionality reduction, introduced by Ronald A. Fisher in 1936 [70]. The primary goal of LDA is to project high-dimensional data onto a lower-dimensional space without losing the information that separates the classes, thereby reducing the consumption of computational resources. However, the number of features exceeding the number of samples,
Algorithm 3: Gradient Boost Algorithm
Initialize the model with a constant value: $F_0(x) = \arg\min_{r} \sum_{i=1}^{n} L(y_i, r)$.
for $m = 1$ to $M$
    Compute the residuals $r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}$ for $i = 1, \ldots, n$.
    Train a regression tree with features $x$ against $r$ and create terminal node regions $R_{jm}$ for $j = 1, \ldots, J_m$.
    Compute $r_{jm} = \arg\min_{r} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + r)$ for $j = 1, \ldots, J_m$.
    Update the model: $F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J_m} r_{jm} \mathbf{1}(x \in R_{jm})$.
end for
Output $f(x) = F_M(x)$.
along with nonlinearity in the data points, can cause LDA to fail. Dimensionality reduction with LDA involves three steps. In the first step, the separability between the classes, known as the between-class matrix or between-class variance, is calculated; the goal is to maximize the separation between the classes. In the second step, the distance between the mean of each class and the data points of that class, known as the within-class matrix or within-class variance, is calculated; the aim is to minimize it. In the third step, the new lower-dimensional space is constructed and the data points are projected onto it. When the number of instances exceeds the number of features, the time complexity of LDA is O(Mf^2) [71], where M represents the number of instances. The LDA algorithm is presented below.
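Algorithm 4: Linear Discriminant Analysis Algorithm
Given a set of $N$ samples $\{x_i\}_{i=1}^{N}$, each represented as a row of length $M$, stacked in the data matrix $X$ ($N \times M$).
Compute the mean of each class $\mu_j$ ($1 \times M$).
Compute the total mean of all data $\mu$ ($1 \times M$).
Calculate the between-class matrix $S_B = \sum_{j=1}^{c} n_j (\mu_j - \mu)^T (\mu_j - \mu)$ ($M \times M$), where $n_j$ is the number of samples in class $j$.
for class $i = 1, 2, \ldots, c$
    Compute the within-class matrix of each class $S_{W_i}$ ($M \times M$) as follows: $S_{W_i} = \sum_{x_k \in \omega_i} (x_k - \mu_i)^T (x_k - \mu_i)$.
    Construct a transformation matrix for each class $W_i$ as follows: $W_i = S_{W_i}^{-1} S_B$.
    Calculate the eigenvalues ($\lambda^i$) and eigenvectors ($V^i$) of each transformation matrix $W_i$.
    Sort the eigenvectors in descending order according to their corresponding eigenvalues, and keep the first $k$ eigenvectors $V_k^i$.
    Project the samples of class $\omega_i$ onto their lower-dimensional space: $Y_i = X_i V_k^i$.
end for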
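The scatter matrices and eigen-decomposition in Algorithm 4 can be sketched in NumPy. Note the assumption: this sketch uses the common class-independent form of LDA (a single transformation $S_W^{-1} S_B$ with $S_W$ summed over all classes) rather than the class-dependent per-class matrices $W_i$ of Algorithm 4, and the function name is hypothetical.

```python
import numpy as np


def lda_fit(X, y, k=1):
    """Class-independent LDA: top-k eigenvectors of S_W^{-1} S_B."""
    mu = X.mean(axis=0)                        # total mean (1 x M)
    n_feat = X.shape[1]
    S_W = np.zeros((n_feat, n_feat))
    S_B = np.zeros((n_feat, n_feat))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)                 # class mean (1 x M)
        D = Xc - mu_c
        S_W += D.T @ D                         # within-class scatter
        d = (mu_c - mu)[:, None]
        S_B += len(Xc) * (d @ d.T)             # between-class scatter
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(evals.real)[::-1]       # sort eigenvalues in descending order
    return evecs[:, order[:k]].real            # projection matrix V_k (M x k)


# Usage: classes differ along feature 0 but have large spread along feature 1.
X = np.array([[0, -10], [1, 10], [0, 10], [1, -10],
              [3, -10], [4, 10], [3, 10], [4, -10]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
V = lda_fit(X, y, k=1)
Y = X @ V                                      # projected data (N x k)
```

In this example the within-class spread along feature 1 is large, so LDA discards that direction and keeps feature 0, which is the one that separates the classes.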
XGB was created by Tianqi Chen in 2014 to improve the performance and speed of ML models [72]. Although XGB is a scalable and highly accurate extension of GB for boosted tree algorithms, it can require substantial computing power. Its name refers to the engineering goal of pushing the limit of computational resources for the GB technique. XGB generates DTs sequentially, and weights are very important in it. All independent variables are assigned weights and then fed into the DT, which predicts the results. When the tree predicts a variable wrongly, the weight of that variable is increased, and these variables are fed into the second DT. These various predictors are combined to form a more robust and precise model. XGB performs three steps. First, it reduces overfitting by using regularization. Second, it optimizes sorting with parallel execution, which increases runtime speed. Finally, it prunes the tree using the maximum depth of the DT as a parameter, minimizing the total runtime. The time complexity of XGB is O(tdx log n) [73], where d represents the height of the tree and x represents the number of non-missing entries in the training data. The XGB algorithm is presented below.
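Algorithm 5: Extreme Gradient Boost Algorithm
Data: dataset and hyperparameters
Initialize $f_0(x)$;
for $k = 1$ to $M$
    Calculate $g_k = \frac{\partial L(y, f)}{\partial f}$;
    Calculate $h_k = \frac{\partial^2 L(y, f)}{\partial f^2}$;
    Determine the structure by choosing splits with maximized gain $A = \frac{1}{2}\left[\frac{G_L^2}{H_L} + \frac{G_R^2}{H_R} - \frac{G^2}{H}\right]$;
    Determine the leaf weights $w^* = -\frac{G}{H}$;
    Determine the base learner $b(x) = \sum_{j=1}^{T} w_j \, I(x \in R_j)$;
    Add trees $f_k(x) = f_{k-1}(x) + b(x)$;
end for
Result: $f(x) = \sum_{k=0}^{M} f_k(x)$
Here, $G$ and $H$ are the sums of $g_k$ and $h_k$ over the instances in a node, the subscripts $L$ and $R$ refer to its left and right children, and $R_j$ is the region of leaf $j$ among the $T$ leaves.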
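The gradient, Hessian, leaf-weight, and gain quantities of Algorithm 5 can be made concrete with a small Python sketch. It assumes the binary logistic loss with raw score f (so g = p - y and h = p(1 - p), with p the sigmoid of f) and a single numeric feature; the optional regularization term `lam` corresponds to the λ of the original XGBoost formulation but defaults to 0 to match the formulas above. Function names are hypothetical and this is not the xgboost library's code.

```python
import numpy as np


def grad_hess_logloss(y, f):
    """First and second derivatives of the logistic loss w.r.t. the raw score f."""
    p = 1.0 / (1.0 + np.exp(-f))
    return p - y, p * (1.0 - p)


def leaf_weight(G, H, lam=0.0):
    """Optimal leaf weight w* = -G / (H + lam)."""
    return -G / (H + lam)


def split_gain(g, h, left, lam=0.0):
    """Gain A of splitting a node into left/right children (left is a boolean mask)."""
    GL, HL = g[left].sum(), h[left].sum()
    GR, HR = g[~left].sum(), h[~left].sum()
    G, H = GL + GR, HL + HR
    return 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam))


def best_split(x, g, h, lam=0.0):
    """Exact greedy search over thresholds of one feature."""
    best_thr, best_gain = None, -np.inf
    for thr in np.unique(x)[1:]:
        gain = split_gain(g, h, x < thr, lam)
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain


# Usage: first tree of a boosted model starting from f_0(x) = 0.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
g, h = grad_hess_logloss(y, np.zeros_like(y))
thr, gain = best_split(x, g, h)
left = x < thr
w_left = leaf_weight(g[left].sum(), h[left].sum())
w_right = leaf_weight(g[~left].sum(), h[~left].sum())
```

Here the split at x < 3 separates the two labels, giving the largest gain and leaf weights of opposite sign that push the scores of the two groups apart.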
A histogram counts the frequency of data values that fall within specific intervals (bins); grouping values into such intervals is known as binning or bucketing [74]. Instead of calculating split points on the sorted feature values, HGB applies binning to the DT [75]. Binning is applied as a pre-processing step: the feature values are sorted and then divided into a number of buckets or bins, so that split points only need to be searched at bin boundaries. This makes HGB more efficient than XGB, LGB, and GB in terms of memory consumption and training speed. Binning also converts continuous or numerical variables into categorical features and helps deal with noisy data. The time complexity of HGB is O(f t n_bins) [76], where n_bins represents the data instances in the current block used to generate the histogram. The HGB algorithm is presented below.
Algorithm 6: Histogram Gradient Boost Algorithm
Initialize the model with a constant value: $F_0(x) = \arg\min_{r} \sum_{i=1}^{n} L(y_i, r)$.
for $m = 1$ to $M$
    Compute the residuals $r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}$ for $i = 1, \ldots, n$.
    Apply the binning technique:
        Sort the data features.
        Distribute the data features into bins.
    Train a regression tree with features $x$ against $r$ and create terminal node regions $R_{jm}$ for $j = 1, \ldots, J_m$.
    Compute $r_{jm} = \arg\min_{r} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + r)$ for $j = 1, \ldots, J_m$.
    Update the model: $F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J_m} r_{jm} \mathbf{1}(x \in R_{jm})$.
end for
Output $f(x) = F_M(x)$.
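The binning idea behind Algorithm 6 can be sketched in NumPy: each feature is mapped to a small number of quantile bins, per-bin sums of gradients and Hessians are accumulated once, and candidate splits are scanned only at the bin boundaries. This is an illustrative sketch under assumptions (quantile cut points, logistic-loss gradients, the same gain formula as Algorithm 5, hypothetical function names), not scikit-learn's HistGradientBoostingClassifier implementation.

```python
import numpy as np


def bin_feature(x, n_bins=4):
    """Quantile binning: map each value to an integer bin index in [0, n_bins)."""
    cut_points = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(cut_points, x, side="right"), cut_points


def histogram_best_split(bins, g, h, n_bins, lam=0.0):
    """Find the best split by scanning bin boundaries of gradient/Hessian histograms."""
    G_hist = np.bincount(bins, weights=g, minlength=n_bins)
    H_hist = np.bincount(bins, weights=h, minlength=n_bins)
    G, H = G_hist.sum(), H_hist.sum()
    best_bin, best_gain = None, -np.inf
    GL = HL = 0.0
    for b in range(n_bins - 1):               # left child = bins 0..b
        GL += G_hist[b]
        HL += H_hist[b]
        GR, HR = G - GL, H - HL
        if HL <= 0 or HR <= 0:
            continue                          # skip splits with an empty child
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam))
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain


# Usage: logistic-loss gradients at f = 0 for a cleanly separable feature.
x = np.arange(8, dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
p = np.full_like(y, 0.5)                      # sigmoid(0)
g, h = p - y, p * (1 - p)
bins, cut_points = bin_feature(x, n_bins=4)
b, gain = histogram_best_split(bins, g, h, n_bins=4)
threshold = cut_points[b]                     # split rule: x < threshold goes left
```

The split search loops over n_bins - 1 boundaries instead of every distinct feature value, which is where the memory and speed savings described above come from.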
A ridge classifier is a classifier based on the ridge regressor. It first converts the target variable into binary form (1, -1) and then treats the problem as ridge regression. Hoerl and Kennard introduced the ridge regressor in 1970 as a regularization method for reducing model complexity [77]. The time complexity of the ridge classifier is O(n^3) [78]. Ridge regression estimates coefficients for predictor variables that are not linearly independent but highly correlated. The ridge estimator shrinks the coefficient values toward zero, producing lower-variance estimates that can lie closer to the actual population values. Furthermore, it shrinks all coefficients without setting any of them exactly to zero, so no variable is eliminated when the model is built. The loss function minimized by ridge regression is the least-squares loss plus this L2 penalty on the coefficients.
Algorithm 7: Ridge Classification Algorithm
Step-1: The training dataset is stored in the input data matrix X, while the test dataset is stored in the input data matrix X-test.
Step-2: For each test sample x in X-test, calculate the regression parameter vector $\hat{\alpha}_i$ of each class i as $\hat{\alpha}_i = \arg\min_{\alpha} \|x - X_i \alpha\|_2^2 + \lambda \|\alpha\|_2^2$, where $\lambda$ represents the regularization parameter and $X_i$ contains the training samples of class i.
Step-3: Project the test sample x onto the subspace of each class i: $\hat{x}_i = X_i \hat{\alpha}_i$.
Step-4: Compute the distance between the class-specific projection $\hat{x}_i$ and the test sample x: $d_i = \|x - \hat{x}_i\|_2$.
Step-5: The test sample x belongs to the class with the shortest distance, $\arg\min_i d_i$.
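The label-encoding view described in the prose above (convert targets to -1/+1, fit ridge regression, classify by the sign of the prediction) can be sketched in Python as follows. This is an assumption-based illustration with hypothetical function names; it follows the global ridge classifier described in the text rather than the class-specific projection variant of Algorithm 7. scikit-learn's `sklearn.linear_model.RidgeClassifier` provides a library implementation of the same idea.

```python
import numpy as np


def ridge_classifier_fit(X, y, lam=1.0):
    """Encode 0/1 labels as -1/+1 and fit ridge regression with an unpenalized intercept."""
    t = np.where(y == 1, 1.0, -1.0)
    x_mean, t_mean = X.mean(axis=0), t.mean()
    Xc, tc = X - x_mean, t - t_mean             # centring keeps the intercept out of the penalty
    # Closed form: w = (Xc^T Xc + lam * I)^{-1} Xc^T tc
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ tc)
    b = t_mean - x_mean @ w
    return w, b


def ridge_classifier_predict(w, b, X):
    """Predict class 1 when the regression output is positive, else class 0."""
    return np.where(X @ w + b > 0, 1, 0)


# Usage: a larger lam shrinks the coefficient toward zero.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w_small, b_small = ridge_classifier_fit(X, y, lam=0.1)
w_large, _ = ridge_classifier_fit(X, y, lam=10.0)
pred = ridge_classifier_predict(w_small, b_small, X)
```

The linear solve on the penalized normal equations is the cubic-cost step of this approach; increasing `lam` shrinks the coefficients but, as noted above, never sets them exactly to zero.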
4.1.9 Dataset Description
The WSN dataset (WSN-DS) used in this study was published in [79]. According to that work, WSN-DS was developed using the Low Energy Adaptive Clustering Hierarchy (LEACH) routing protocol. SNs collect the data and deliver it to the CH. The CH receives data from the SNs and transmits it to the BS, which aggregates the data from all CHs and generates the dataset. The dataset has 18 features, including Node ID, Is CH, RSSI, Distance to CH, Average distance to CH, Max distance to CH, Current energy, ADV CH send, ADV CH receives, Distance CH to BS, Data send, and Data received; more details about the features are given in [79]. The dataset is divided into five classes. The first class is normal, while the remaining four classes correspond to DoS attacks: Grayhole, Blackhole, Time Division Multiple Access (TDMA), and Flooding attacks. The classes and the distribution of instances are given in Table 4.1. WSN-DS consists of 374661 instances, of which 340066 belong to the normal class and 34595 belong to the malicious classes. WSN-DS is therefore highly imbalanced, which could lead to weak generalization by the classifiers.
Table 4.1: Details of WSN-DS
Class   Label       Number of Instances
0       Normal      340066
1       Grayhole    14596
2       Blackhole   10049
3       TDMA        6638
4       Flooding    3312
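As a quick consistency check of Table 4.1, the short Python snippet below recomputes the totals quoted in the text and the imbalance between the largest and smallest classes; the variable names are illustrative only.

```python
# Instance counts per class, copied from Table 4.1.
counts = {"Normal": 340066, "Grayhole": 14596, "Blackhole": 10049,
          "TDMA": 6638, "Flooding": 3312}

total = sum(counts.values())
malicious = total - counts["Normal"]
# Share of each class in percent.
shares = {label: 100 * n / total for label, n in counts.items()}
# Ratio between the majority class and the smallest attack class.
imbalance_ratio = counts["Normal"] / counts["Flooding"]

print(total, malicious)                 # 374661 34595
print(f"{shares['Normal']:.2f}%")       # 90.77%
print(f"{imbalance_ratio:.1f}")         # 102.7
```

Normal instances make up about 90.77% of WSN-DS and outnumber Flooding instances by a factor of roughly 103, which quantifies the imbalance that the data sampling step has to address.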
4.1.10 Data Sampling
WSN-DS is highly imbalanced, as shown in Table 4.1. When the majority and minority classes are not balanced, classifiers become biased in favour of the majority class. As a result, classification accuracy on the minority classes decreases and the classifiers' performance degrades. In our case, the number of normal class instances is much greater than the number of malicious class instances. Therefore, the data must be balanced before it is given to the classification model. In the literature, two types of balancing techniques are used to deal with imbalanced data: oversampling and undersampling. Oversampling
increases the number of minority class instances, whereas undersampling decreases the number of majority class instances. Both are used to solve the data imbalance problem, and each has its own benefits and drawbacks. This research uses the Synthetic Minority Oversampling Technique (SMOTE) to handle the imbalanced data [80]. Rather than simply duplicating minority class instances, SMOTE uses existing minority instances to create new synthetic instances by interpolating between an instance and one of its nearest minority class neighbours.
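The interpolation step of SMOTE can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions (Euclidean distance, brute-force neighbour search within the minority class, a fixed random seed, a hypothetical function name), not the exact implementation used in this work; the imbalanced-learn library provides a maintained `SMOTE` class in `imblearn.over_sampling`.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating towards minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Brute-force Euclidean distances between minority samples only.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]   # k nearest minority neighbours
    base = rng.integers(0, n, size=n_new)          # randomly chosen seed samples
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                   # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Usage: minority samples on the line x2 = 2 * x1; synthetic ones stay on it.
X_min = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
synthetic = smote(X_min, n_new=10, k=2)
X_balanced = np.vstack([X_min, synthetic])
```

Because each new point lies on a segment between two real minority samples, the synthetic data stays inside the region the minority class already occupies instead of repeating identical copies.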
4.2 Proposed System Model 2
This section provides complete details about the proposed system model. The steps for developing the system model are described in Algorithm 8. The proposed system model has several sub-modules: WSN, blockchain, WSN-DS, oversampling, malicious node detection, and the stacking model. As illustrated in Figure 4.3, these sub-modules are graphically represented in our proposed system model for malicious node detection in WSNs. The following subsections provide in-depth discussions of the abovementioned sub-modules and their respective
[Figure 4.3 components: sensor nodes, cluster heads, and base station; normalization and oversampling using ADASYN; training and testing datasets; stacking model with Level 0 base models and a meta model.]