2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2021.3116928, IEEE Internet of
Things Journal
JOURNAL OF XXXXX CLASS FILES, VOL. XX, NO. X, XXXXX 20XX 1
Partially Cooperative Scalable Spectrum Sensing in
Cognitive Radio Networks under SDF Attacks
Sadia Khaf, Student Member, IEEE, Mohammad T. Alkhodary, and Georges Kaddoum, Senior Member, IEEE
Abstract—Massive Internet of Things (IoT) connectivity re-
quires addressing spectrum congestion caused by spectrum
scarcity in wireless communications. Over the past decade,
cognitive radio has been proposed as a promising solution to
utilize the licensed spectrum efficiently. Conventional spectrum
sensing approaches are complex and require statistical informa-
tion about the behavior of licensed users, which is impractical.
To overcome this limitation, several reinforcement learning (RL)-
based spectrum sensing approaches have been proposed that
are also highly adaptable to the dynamics of IoT environments.
Additionally, cooperative RL-based spectrum sensing approaches
have been widely used because they are more accurate than non-
cooperative approaches. However, the advantage comes at the
cost of scalability due to increased information sharing overhead.
Furthermore, cooperative spectrum sensing (CSS) approaches
suffer from attacks on the network, such as sensing data
falsification (SDF) attacks, which deteriorate sensing accuracy
dramatically. In this paper, we present a scalable, partially CSS
algorithm that is highly resilient to SDF attacks. The novelty
of the proposed algorithm lies in partial cooperation through
coalition formation, which reduces sensing and information
sharing overhead while improving sensing accuracy. Moreover,
the algorithm learns to adapt the sensing participation percentage
and selects the most rewarding channel for sensing to maximize
rewards while minimizing energy consumption. The proposed
algorithm outperforms state-of-the-art CSS algorithms in terms
of sensing accuracy and overhead. Contrary to centralized CSS
algorithms, the proposed algorithm’s performance is directly
proportional to the number of devices; hence, it is suitable for
massive connectivity.
Index Terms—Smart spectrum sensing, cognitive radio Inter-
net of Things (CR-IoT), channel utilization, energy efficiency.
I. INTRODUCTION
The number of Internet of Things (IoT) devices is expected
to grow to 41.6 billion devices, generating 79.4
zettabytes (ZB) of data, in 2025 [1]. Therefore, the need
for intelligent resource management and efficient bandwidth
utilization is greater than ever before [2]. The radio spectrum is
divided into licensed bands, which are used mainly by cellular,
radio and television networks, and unlicensed bands occupied
primarily by IoT devices. The rapid growth in the number
of IoT devices will quickly consume all available unlicensed
bands, leading to congestion and packet drops. The concept of
cognitive radio (CR) has become much more popular in recent
years to solve the problem of congestion in the unlicensed
spectrum [3]. In CR, secondary (unlicensed) users (SUs) are
allowed to utilize the licensed spectrum when it is not occupied
by primary (licensed) users (PUs). Ordinary IoT devices would
inevitably need to be transformed into cognitive IoT devices to
solve spectrum congestion [2]. Introducing CR to IoT opened
the door to massive connectivity in IoT networks and promised
support for an unprecedented number of sensors and devices.
The concept of CR-IoT emerged as a promising solution that
leverages the vacant spaces in the licensed spectrum to solve
the congestion problem in the unlicensed one. The authors
of [4] survey the CR-IoT architectures and frameworks and
propose potential applications. However, CR algorithms face
significant challenges in the context of IoT, such as sensing
accuracy, energy consumption, and network scalability [5].

Sadia Khaf, Mohammad T. Alkhodary, and Georges Kaddoum are with
the Department of Electrical Engineering, École de technologie supérieure,
Montreal, QC, H3C 1K3 Canada. e-mail: sadia.khaf.1@ens.etsmtl.ca
Manuscript received xxxxx xx, 2021; revised xxxx xx, 2021.

Spectrum sharing in the context of CR-IoT relies heavily
on spectrum sensing accuracy to minimize harmful interfer-
ence to PUs caused by SUs. Sensing accuracy is commonly
compromised by device accuracy and sensing data falsification
(SDF) attacks. The former is caused by hardware limitations
of the sensing device, and the latter is due to malicious
attackers falsifying the sensing results of SUs, leading to the
SUs learning incorrect channel statuses [6]. Because an attack of
the latter type affects only a few SUs, majority vote is an
effective strategy to mitigate it and hence improve
sensing accuracy. Cooperative spectrum sensing (CSS) was
introduced to improve the sensing accuracy of individual
spectrum sensing devices [7]. In this vein, several multi-agent
deep learning approaches have been proposed to provide a
high level of accuracy using CSS [8], [9], [10]. The authors
of [8] propose an SU selection strategy that chooses an SU
to sense the channel status based on energy detection and
shares the results with other SUs. The authors of [9] also
use deep reinforcement learning to efficiently explore the
radio environment through CSS. However, these approaches
focus on full cooperation among SUs to improve accuracy
and are not scalable to massive IoT networks, and hence have
limited performance under high load factors. Therefore, an
energy efficient approach that maintains a high level of sensing
accuracy under various load factors is needed to support the
future demands of IoT networks.
Current state-of-the-art spectrum sensing techniques con-
sume a lot of time and energy, which makes it impractical
to implement them with low-powered IoT devices [11]. To
overcome this limitation, spectrum sensing using evolutionary
game theory (EGT) approaches gives SUs the freedom to be
free riders to conserve their energy [12], but suffers from
inaccurate sensing due to probabilistic environment model-
ing. On the other hand, spectrum sensing using multi-agent
reinforcement learning (RL) trains the agents (SUs) through
rewards from interaction with the environment to provide a
realistic environment and accurate sensing [8]. However, the
aforementioned approaches consume a lot of time and energy
under a high load factor [13]. Thus, reducing the energy
consumption (for sensing and collaboration) of multi-agent
CSS is an open research problem.
To tackle CR-IoT’s sensing accuracy, energy consumption
and scalability challenges, this paper presents a novel partially
cooperative multi-agent RL (PCMARL) spectrum sensing
algorithm. Partial cooperation is achieved through coalition
formation, which restricts sensing and information sharing to
small subsets of SUs. The SUs in these subsets collaborate and
use majority voting to fight SDF attacks and improve sensing
accuracy as well as reduce sensing overhead. Our approach
teaches the SUs to optimize their sensing participation by
learning to be free riders to conserve energy while maximizing
rewards. We also give SUs the freedom to select a channel to
sense based on reward history, which gives them the flexibility
to switch channels when a channel is frequently busy or due
to SU mobility.
Novelty and Contribution
This paper presents the first scalable, energy efficient, RL-
based algorithm that takes advantage of coalition formation
and majority voting for reliable spectrum sensing in the
presence of SDF attacks. The contribution of this paper can
be summarized as follows:
• The PCMARL algorithm provides agents with a realistic
  model of the radio environment, rather than a probabilistic
  one as in the case of EGT-based algorithms. In other
  words, the environment is more realistic in the sense
  that it does not rely on SUs' probability of detection
  and probability of false alarm, but rather calculates the
  rewards through the agent's interaction with the radio
  environment.
• We provide a coalition-based partial cooperation strategy
  that reduces the amount of energy spent physically sensing
  the channel and sharing the channel status with other
  SUs.
• Contrary to traditional CSS approaches, the PCMARL
  algorithm does not rely on a fusion center to calculate
  majority vote. Instead, the SUs in each coalition compute
  majority vote locally, which makes the PCMARL
  algorithm scalable to massive connectivity.
• This is the first approach that gives SUs the freedom to
  choose the channel for sensing based on channel-specific
  reward history and teaches them to be free riders to
  conserve energy.
• In contrast to state-of-the-art spectrum sensing algorithms,
  the PCMARL algorithm is the first approach
  whose performance is directly proportional to the number
  of devices in the network, making it suitable for massive
  connectivity.
The remainder of this paper is organized as follows. Section
II provides a literature review of the most relevant works. The
system model and the proposed PCMARL spectrum sensing
algorithm are presented in Section III. Section IV discusses the
simulation results. Finally, Section V concludes this work.
II. RELATED WORKS
The challenges of spectrum sensing in the context of IoT
networks can be summarized as spectrum sensing scheduling,
sensing time minimization, energy consumption minimization,
enabling massive connectivity, and maintaining a high level
of accuracy in the presence of SDF attacks [14], [6]. The
solutions proposed to address some of these challenges can be
classified into three categories based on the approach used: A)
Optimization and Heuristic Approaches, B) Game-Theoretic
Approaches, and C) Machine Learning Approaches. A brief
comparison of these approaches in terms of methodology,
advantages, and disadvantages, is presented in Table I with
corresponding references. The key aspects of each approach
are discussed below.
A. Optimization and Heuristic Approaches
Optimization and heuristic approaches were among the
earliest approaches used for CSS [15]. More recently, authors
of [16], [17], [18] have tried sub-carrier allocation subject
to delay, maximum power, and interference constraints. The
proposed methods are computationally expensive. They work
well with a limited number of IoT devices; however, scaling
them to a large number of devices makes them too complex.
The authors of [19] used a Hidden Markov Model (HMM)
for CSS and spectrum hand-off in a CR network (CRN);
however, the fully cooperative nature of the solution makes
it unscalable. In short, classical optimization methods are not
suitable for CR-IoT applications since a complete model of
the environment is not available [29], they are not scalable to
support massive IoT connectivity [30], [31], and they cannot
adapt to the dynamic and evolving environment [31].
B. Game Theoretic Approaches
The use of game theoretic approaches for dynamic spectrum
allocation is studied in [20], and both cooperative and non-
cooperative game models are explored. The application of
EGT for CSS is presented in [21]. The authors present a novel
concept of “free will” for the SUs to participate in spectrum
sensing to reduce energy consumption. An extension of the
work that allows SUs to join multiple coalitions at the same
time is studied in [23]. Adaptive random access for collecting
sensing data is proposed in [22] to optimize sensing time.
In all the game theoretic approaches mentioned above, the
rewards are calculated from SUs’ probability of detection and
probability of false alarm, which provide an unrealistic repre-
sentation of the environment. Additionally, game theoretic ap-
proaches assume homogeneous rewards, are highly complex,
and have poor convergence and low flexibility. Furthermore,
game theoretic approaches can be challenging to implement
in practical scenarios due to the difficulty of distributing the
game rules to the players [32].
TABLE I: Literature on Spectrum Sensing: Benefits and Drawbacks

| | Optimization and Heuristic Approaches | Game Theoretic Approaches | Machine Learning Approaches |
|---|---|---|---|
| Methodology | Simplified model of the environment to formulate spectrum sensing as a classic optimization problem. Optimization is done using genetic algorithms, e.g., particle swarm optimization, etc. | Distributing resources among competing players to reach a Nash equilibrium. | Learning from data or the environment to make decisions autonomously. Improves by experience. |
| Advantages | Fast implementation with a closed-form solution. | Fair distribution of resources. Dynamically varying in time. | No prior environment model is needed. Learns complex patterns in the data. Reacts to dynamic environment. |
| Disadvantages | Incomplete environment model. Difficult to scale in dynamic IoT environments. | Does not support heterogeneous players. Slow convergence. May not reach equilibrium. Difficult to scale in dynamic IoT environments. | Rely on labeled data. Need training using large datasets in case of deep learning (DL). Curse of dimensionality. Supervised machine learning is difficult to scale in dynamic IoT environments. |
| References | [15], [16], [17], [18], [19] | [20], [21], [22], [23] | [24], [25], [26], [27], [28] |

C. Machine Learning Approaches
Several deep learning frameworks for spectrum sensing are
proposed in [24], [25], [26] to improve the energy efficiency
of the system and provide a low-complexity alternative to
classical optimization methods. A survey of supervised and
unsupervised machine learning methods for spectrum sensing
can be found in [27]. Our earlier works [33], [34], [35]
provide a foundation for resource sharing among competing
IoT devices using deep learning. Similarly, the authors of
[36] propose a reinforcement learning method, SARSA, for
resource provisioning and horizontal container scaling in fog
computing. The authors further extend their work to deep
reinforcement learning in [37], [38] proposing deep Q-learning
as an alternative to heuristic methods for service provisioning
to IoT devices due to the NP hardness of the problem and
demonstrating the algorithm’s efficiency in making proac-
tive placement and scaling decisions. Several supervised and
unsupervised machine learning approaches such as K-means
clustering, Gaussian mixture model, support vector machine, and
K-nearest neighbors are proposed for opportunistic spectrum
access in [28]. Nevertheless, it should be highlighted that data-
driven machine learning approaches suffer from a number of
inherent drawbacks such as lack of sufficient labeled data,
partial observability of the environment, and lack of support
for incorporating delayed feedback from the environment.
Such shortcomings prevent machine learning algorithms from
being an ideal solution for spectrum allocation in a dynamic
radio environment and motivate the use of RL.
The authors of [39] propose to use RL for spectrum sensing
in a dynamic IoT environment and analyze and compare the
performance of $\epsilon$-greedy and upper confidence bound (UCB)
exploratory strategies. The idea of using RL with UCB is
further explored in [9], where the authors created several PU
traffic models such as bursty, legacy, and frequency hopping
patterns to show the adaptability of RL-based spectrum sens-
ing methods. The use of RL to facilitate the coexistence of
long-term evolution license-assisted access (LTE-LAA) and
IEEE 802.11 Wi-Fi systems is studied in [40], where agents
compete for channel access, considering the effects of MAC
and physical layers. The authors of [41] proposed a spatial-
correlation-based SU selection mechanism that selects the
most suitable SU for local sensing.
An RL-based multi-agent CSS problem is studied in [42].
A distributed model is used that assumes perfect channel
information. Since the approach is distributed, it is easily
scalable to massive IoT networks. It is assumed that there
will always be only one PU in six channels with a 20%
probability that the PU remains on the same channel and an
80% probability that it switches channel. This is an unrealistic
assumption since, in practice, there can be more than one PU
transmitting at the same time. The authors of [43], [44] also
propose a multi-agent RL framework for spectrum sensing in
CR-IoT. The use of RL for spectrum sensing is also advocated
in more recent works due to the RL agents’ ability to quickly
adapt to a highly dynamic IoT sensor environment [45], [8],
[46], [47], [48], [49].
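To make the channel model attributed to [42] above concrete, the following minimal sketch simulates a single PU hopping among six channels, staying on its channel with probability 0.2 and switching with probability 0.8. The uniform choice among the other five channels, the uniform initial channel, and the name `simulate_single_pu` are assumptions made for illustration and are not taken from [42].

```python
import random

def simulate_single_pu(num_slots, num_channels=6, p_stay=0.2, seed=None):
    """Return the channel occupied by the single PU in each time slot."""
    rng = random.Random(seed)
    channel = rng.randrange(num_channels)  # assumed: uniform initial channel
    trace = []
    for _ in range(num_slots):
        trace.append(channel)
        if rng.random() >= p_stay:
            # Assumed: a switching PU picks uniformly among the other channels.
            others = [c for c in range(num_channels) if c != channel]
            channel = rng.choice(others)
    return trace

trace = simulate_single_pu(10_000, seed=0)
# Fraction of consecutive slots in which the PU stayed on the same channel.
stay_fraction = sum(a == b for a, b in zip(trace, trace[1:])) / (len(trace) - 1)
```

In such a trace exactly one channel is busy in every slot, which illustrates why the model cannot represent several PUs transmitting at the same time.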
Cooperative RL approaches exhibit superior performance in
terms of accuracy. To implement cooperative RL algorithms
for low-power CR-IoT devices, a significant reduction in
energy consumption, sensing overhead, and sensing time is
needed. This can be achieved by using coalition formation
and majority vote, and sharing sensing belief with a limited
number of neighbors as shown herein.
Sensing Falsification
The sensing data of individual SUs is vulnerable to sensing
falsification due to device sensing inaccuracies and SDF
attacks. Therefore, the belief-sharing aspect of CSS introduces
new security threats. In an SDF attack, malicious attackers
send false local sensing results to mislead the cooperating
SUs [6]. Attacks may result in either excessive interference
in the PU network or a decrease in spectrum utilization [50].
Attacks of this type are carried out by sparse agents, and
majority vote is an effective strategy to mitigate them. In
addition to SDF attacks, SU sensing results are subject to
falsification due to device-hardware sensing inaccuracies. The
authors of [51] suggest using blockchains to mitigate the
effect of falsified sensing. The authors of [52] propose a
distributed trust model to discredit the malicious and selfish
nodes, thereby reducing the weight of their signals. In another
work [53], the authors discuss probabilistic SDF attacks and
a centralized fusion center (FC)-based mitigation strategy.
Similarly, the authors of [54] propose a sequential fusion-based
mitigation strategy that relies on an FC. The authors of [55]
also propose an FC-based network topology with three levels
of control, where the secondary controller is responsible for
resource sharing and management strategies as well as attack
detection and mitigation. Although FC is optimized overall, it
creates a single point of failure and makes the approach non-
scalable. Moreover, the performance of FC-based approaches
degrades in terms of energy efficiency due to large information
sharing overhead. In contrast, we propose an energy-efficient
partially cooperative SDF mitigation strategy that computes
the majority vote locally and improves both energy efficiency
and accuracy with a greater number of devices.
III. PCMARL ALGORITHM-BASED SPECTRUM SENSING
This section presents the proposed system model, intro-
duces the notation used, explains the structure and flow of
the PCMARL algorithm, lists performance indicators, shows
the algorithm’s resilience against sensing falsification, and
analyzes its complexity.
A. System Model
The proposed CR environment consists of $N_p$ PUs, $N_s$ SUs,
and $M$ licensed channels, and we assume $M = N_p$. The internal
clocks of all PUs and SUs are synchronized and sufficiently
accurate [56]. The stochastic process governing PUs' access
to channels can be modeled as a Markov decision process
(MDP). Each PU transmits randomly with transmission length
$L = kt$, where $k$ is a random number and $t$ is the transmission
unit "time slot". The SUs have no prior knowledge of the
transmission pattern or transmission length of the PUs, and
must not interfere with the PUs. Each SU can
sense only $K$ channels, where $K \le M$, in each time slot due
to energy and hardware limitations. Each SU's sensing results
are subject to falsification with probability $P$ due to SDF
attacks and hardware sensing inaccuracy. The CR model is
distributed, and there is no centralized unit to share sensing
information among SUs.
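As an illustration of the PU traffic described above, the sketch below generates a busy/idle trace for one licensed channel in which every transmission lasts $k$ time slots. The upper bound of three slots follows the range listed for $L_j$ in Table II; the uniform draw of $k$, the per-slot start probability `p_start`, and the function name `pu_occupancy` are assumptions introduced only for this example.

```python
import random

def pu_occupancy(num_slots, p_start=0.5, max_len=3, seed=None):
    """Return a list of 0/1 flags (1 = busy) for one licensed channel."""
    rng = random.Random(seed)
    trace = []
    while len(trace) < num_slots:
        if rng.random() < p_start:
            # Assumed: transmission length k drawn uniformly from 1..max_len.
            k = rng.randint(1, max_len)
            trace.extend([1] * k)
        else:
            trace.append(0)  # the channel stays idle for one slot
    return trace[:num_slots]

occupancy = pu_occupancy(20, seed=1)
```

An SU in this setting sees only the resulting 0/1 trace, not `p_start` or `k`, which matches the assumption that SUs have no prior knowledge of the PU transmission pattern.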
Let $\mathcal{S} = \{s_1, \dots, s_{N_s}\}$ be the SU set with $N_s$ user "agents",
and $\mathcal{C} = \{c_1, \dots, c_M\}$ be a set of $M$ channels occupied by $N_p$
PUs. Each agent interacts with the environment and with other
agents, a subset of $\mathcal{S}$ that sense the same channel, to collect
information about channel occupancy and decide on its sensing
policy. The sets of observations and actions are denoted
by $\mathcal{O} = \{0, 1, \dots, T\}$ and $\mathcal{A} = \{\text{not-sensed}, \text{idle}, \text{busy}\}$,
respectively, with sizes $|\mathcal{O}|$ and $|\mathcal{A}|$. The "not-sensed" action
represents the agent's action if it decides to be a free rider
to conserve energy. The notation $a_j^{s_i} \in \mathcal{A}$ represents the
action taken by the agent $s_i$ for the channel $j$. The set
$D_j = \{a_j^{s_1}, \dots, a_j^{s_{N_j}}\}$ represents the multi-set of actions of all
agents in coalition $j$, where $N_j$ is the number of agents sensing
channel $j$. Similarly, $r_j^{s_i}(o_t) \in \{0, 1\}$ denotes the reward of
agent $s_i$ as a result of selecting channel $j$ for observation
$o_t \in \mathcal{O}$.
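The action set and the falsification of individual reports can be sketched as follows. The integer encoding 0, 1, 2 mirrors the action space listed in Table II, but the order in which the three actions are mapped to these integers is an assumption, as is the attack model in which a falsified report is simply flipped between idle and busy.

```python
import random
from enum import IntEnum

class Action(IntEnum):
    NOT_SENSED = 0  # free riding: the SU skips sensing to save energy
    IDLE = 1
    BUSY = 2

def falsify(report, p_false, rng):
    """Flip an idle/busy report with probability p_false (assumed attack model)."""
    if report is Action.NOT_SENSED or rng.random() >= p_false:
        return report
    return Action.BUSY if report is Action.IDLE else Action.IDLE

rng = random.Random(0)
reports = [falsify(Action.IDLE, 0.3, rng) for _ in range(10_000)]
flipped_fraction = sum(r is Action.BUSY for r in reports) / len(reports)
```

Here `p_false` plays the role of the falsification probability $P$, and a free rider has no report that could be falsified.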
Interaction with the environment happens through coalition
formation, the sharing of sensing belief $D_j$, majority vote,
and the exchange of rewards. Each agent keeps a record of its
own actions and its rewards from each channel. In each time
slot, the agent picks a channel to sense based on its reward
history. All agents sensing channel $j$ are considered coalition
$\mathcal{C}_j$, and they can join or leave a coalition based on their quality
of service (QoS), as depicted in Figure 1. After joining a
coalition, each agent calculates the best action, i.e., channel
status, based on the Q-table and shares it with all members
in the coalition. All agents in the coalition then calculate the
majority vote of the coalition locally using the channel status
shared by other members and update their actions based on
the majority vote. Lastly, the agents receive a reward from
the environment if their predicted channel status matches the
actual status and update their action-reward history.

Fig. 1: The proposed CR environment; all SUs that are trying
to access the same channel are considered a coalition. Note:
this approach is distributed since there is no centralized control
unit or fusion center. SUs can join or leave a coalition based
on their QoS.
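The coalition formation and local majority vote described above can be sketched as follows. The dictionary layout, the SU identifiers, and the helper names `form_coalitions` and `majority_decision` are hypothetical, and the treatment of free riders in the vote is not modeled here.

```python
from collections import Counter, defaultdict

def form_coalitions(choices):
    """Group SUs by the channel they chose: {channel: [su_id, ...]}."""
    coalitions = defaultdict(list)
    for su_id, (channel, _status) in choices.items():
        coalitions[channel].append(su_id)
    return dict(coalitions)

def majority_decision(statuses):
    """Return the most common reported status (ties: first one encountered)."""
    return Counter(statuses).most_common(1)[0][0]

# Hypothetical reports: su_id -> (chosen channel, reported status)
choices = {
    "s1": (0, "busy"), "s2": (0, "busy"), "s3": (0, "idle"),  # s3 is falsified
    "s4": (1, "idle"),
}
coalitions = form_coalitions(choices)
decisions = {
    ch: majority_decision([choices[su][1] for su in members])
    for ch, members in coalitions.items()
}
# Every member adopts the decision of its own coalition.
updated = {su: decisions[ch] for su, (ch, _) in choices.items()}
```

Each SU needs only the reports of its own coalition to run `majority_decision`, so no fusion center is involved in this step.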
The work of this paper (PCMARL algorithm) is inspired
by EGT and RL in the following ways. From an EGT
perspective, the work uses the concepts of coalition formation,
belief sharing, and majority vote. From an RL perspective,
the proposed algorithm uses belief sharing, Q-learning, and
rewards through interaction with the environment. Thus, the
algorithm uses belief sharing, which is common to cooperative
games and cooperative RL approaches, to provide a unified
and more realistic model of the radio environment.
B. PCMARL Spectrum Sensing Algorithm
Algorithm 1 highlights the important steps in the PCMARL
spectrum sensing algorithm. Additionally, Figure 2 depicts the
flow of the algorithm. The input parameters $\alpha$ and $\gamma$ define
the learning rate and the weight associated with the temporal
difference in the Q-learning algorithm, respectively. The Q-
table and agent histories are initialized to zero so that the
initial actions of all agents are always exploratory.
The PCMARL algorithm uses $\epsilon$-greedy to balance exploration
and exploitation such that the agent takes an exploratory
action with probability $\epsilon$ and chooses the greedy action, i.e.,
the action with the highest Q-value, with probability $1 - \epsilon$.
However, the value of $\epsilon$ decreases exponentially from 1 to 0
with the number of episodes rather than being constant.
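A decaying exploration schedule of this kind could look like the sketch below. The exact decay law used by the algorithm is not given in the text, so the form $e^{-\lambda \cdot \text{episode}}$ and the value of `decay_rate` are assumptions chosen only to illustrate an exponential decrease from 1 toward 0.

```python
import math
import random

def epsilon(episode, decay_rate=0.01):
    """Exploration probability: 1 at episode 0, decaying toward 0."""
    return math.exp(-decay_rate * episode)

def should_explore(episode, rng, decay_rate=0.01):
    """Return True with probability epsilon(episode)."""
    return rng.random() < epsilon(episode, decay_rate)

rng = random.Random(0)
early = sum(should_explore(0, rng) for _ in range(1000))     # epsilon = 1
late = sum(should_explore(2000, rng) for _ in range(1000))   # epsilon ~ 2e-9
```

With this schedule the first episode is purely exploratory, while late episodes almost always follow the greedy choice.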
Algorithm 1: PCMARL Cooperative Spectrum Sensing
  Algorithm parameters: learning rate $\alpha \in (0, 1]$, $\gamma$, majority-vote $v \in [0, 1]$;
  Initialize $Q(o, a)$, for all $o \in \mathcal{O}$, $a \in \mathcal{A}$;
  Initialize $R(o, s)$, for all $o \in \mathcal{O}$, $s \in \mathcal{S}$;
  foreach episode do
    Reset Environment;
    foreach step of episode do
      if random float $< \epsilon(\text{episode count})$ then
        Sample $a$ from $\mathcal{A}$;
      else
        Choose channel with $\max(R_c)$;
        Choose action for channel from Q-table;
      if $v == 1$ then
        Form coalition $\mathcal{C}_j$; $s_i \in \mathcal{C}_j$ if $s_i$ sensed channel $j$;
        Calculate majority decision for each coalition;
        Update actions of each $s_i$ based on majority-vote;
      foreach $s_i \in \mathcal{S}$ do
        Take action $a$, observe $R$, $o'$;
        $Q(o, a) \leftarrow Q(o, a) + \alpha [R + \gamma \max_a Q(o', a) - Q(o, a)]$;
        $o \leftarrow o'$;

The Q-table holds Q-values for all actions and all observations
for each channel. In the case of a random action, the
agent randomly selects the channel and the channel status,
whereas in the case of a greedy action, the agent picks the
best channel to sense based on channel reward history and the
channel status based on the Q-table. The total reward $R_j^{s_i}$ that
SU $s_i$ obtains from channel $j$ is
$$R_j^{s_i} = \sum_{t=1}^{t_o} r_j^{s_i}(o_t), \tag{1}$$
where $o_t$ and $t_o$ are the observation at time $t$ and the current
time, respectively. The average channel reward $\bar{R}_j^{s_i}$ of $s_i$ for
channel $j$ is calculated as
$$\bar{R}_j^{s_i} = \frac{R_j^{s_i}}{K_j^{s_i}}, \tag{2}$$
where $K_j^{s_i}$ is the number of times that channel $j$ was sensed
in all past observations up to the current observation. In each
time slot, each SU picks the channel with the highest average
sensing reward to sense as follows:
$$c_j = \arg\max_j \bar{R}_j^{s_i}. \tag{3}$$
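The reward bookkeeping behind Eqs. (1) to (3) can be sketched for a single SU as follows. The class name `ChannelHistory`, the convention that a channel never sensed has an average reward of zero, and the tie-breaking toward the lowest channel index are assumptions introduced for this example.

```python
class ChannelHistory:
    """Per-SU record of total reward and sensing count for each channel."""

    def __init__(self, num_channels):
        self.total_reward = [0] * num_channels   # total reward, Eq. (1)
        self.times_sensed = [0] * num_channels   # sensing count used in Eq. (2)

    def record(self, channel, reward):
        self.total_reward[channel] += reward
        self.times_sensed[channel] += 1

    def average_reward(self, channel):
        # Assumed: a channel never sensed has average reward 0 (avoids 0/0).
        k = self.times_sensed[channel]
        return self.total_reward[channel] / k if k else 0.0

    def best_channel(self):
        # Eq. (3): argmax over channels; ties go to the lowest index.
        channels = range(len(self.total_reward))
        return max(channels, key=self.average_reward)

history = ChannelHistory(num_channels=3)
for channel, reward in [(0, 1), (0, 0), (1, 1), (1, 1), (2, 0)]:
    history.record(channel, reward)
```

For the hypothetical history above, the average rewards are 0.5, 1.0, and 0.0, so `history.best_channel()` selects channel 1.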
After selecting the best channel, the agent needs to choose
the best action for that channel. The action, i.e., the channel
status for channel $j$, is selected by SU $s_i$ based on
$$a_j^{s_i} = \arg\max_{a \in \mathcal{A}} Q(o_t, a). \tag{4}$$
Since the action space contains "free riding", "idle", and
"busy" as actions, the one with the highest Q-value is selected
according to Eq. 4.
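The greedy selection in Eq. 4 amounts to an argmax over the three Q-values stored for the current observation. The sketch below shows this for the Q-table of a single channel; the table layout, the example values, and the function name `greedy_action` are hypothetical.

```python
ACTIONS = ("not-sensed", "idle", "busy")

def greedy_action(q_table, observation):
    """Return the action with the highest Q-value for this observation."""
    q_values = q_table[observation]
    best_index = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best_index]

# Hypothetical Q-table for one channel: observation -> one Q-value per action
q_table = {
    0: [0.1, 0.7, 0.2],
    1: [0.4, 0.1, 0.3],
}
```

With these values, `greedy_action(q_table, 0)` returns "idle", while `greedy_action(q_table, 1)` returns "not-sensed", i.e., the agent chooses to be a free rider for that observation.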
In a traditional Q-learning approach, an individual action
chosen based on the Q-table is first compared to the PU
transmission, and then the agent receives a reward accordingly.
SUs' individual sensing results can be falsified with probability
$P$ due to SDF attacks and hardware sensing inaccuracy.
Coalition formation and majority vote provide a strong defense
against falsification, as explained in detail in Section III-D.

Fig. 2: Flowchart of the proposed PCMARL cooperative
sensing algorithm.
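A rough calculation illustrates why a majority vote helps. The sketch below is not the analysis of Section III-D; it assumes binary idle/busy reports, independent falsification of each report with the same probability, and that the coalition decision is wrong only when more than half of the reports are falsified. The function name `majority_error_probability` is hypothetical.

```python
from math import comb

def majority_error_probability(n_members, p_false):
    """P(more than half of n independent reports are falsified)."""
    threshold = n_members // 2 + 1
    return sum(
        comb(n_members, k) * p_false**k * (1 - p_false) ** (n_members - k)
        for k in range(threshold, n_members + 1)
    )

single = majority_error_probability(1, 0.2)  # one SU acting alone
five = majority_error_probability(5, 0.2)    # coalition of five SUs
```

Under these assumptions and a falsification probability of 0.2, the error probability falls from 0.2 for a single SU to about 0.058 for a coalition of five.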
Once all agents in all coalitions have chosen their individual
actions, they share the channel status with the other members
of their coalition [42]. Each coalition member in coalition $\mathcal{C}_j$
thus receives a multi-set $D_j = \{a_j^{s_1}, \dots, a_j^{s_{N_j}}\}$ containing the
actions of all SUs in the coalition, some of which may have
been falsified. Then, each agent calculates the majority action
of the coalition as
$$d_j = \mathrm{mode}(D_j), \tag{5}$$
and all members of the coalition update their actions as
$a_j^{s_i} = d_j$. Once all members have updated their actions, their
actions are compared to the actual transmission pattern of the
PUs and the agents receive a reward $r_j^{s_i} \in \{0, 1\}$ from
the radio environment for predicting the channel status. The
reward is $r_j^{s_i} = 1$ for correctly predicting the channel status and
$r_j^{s_i} = 0$ for not sensing or incorrectly predicting it. The goal
of the SUs is to maximize their rewards. Each agent $s_i$ then
updates its Q-table as follows:
Authorized licensed use limited to: Bibliothèque ÉTS. Downloaded on October 12,2021 at 17:00:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2021.3116928, IEEE Internet of
Things Journal
TABLE II: Table of notations
Notation                                              Description
$S = \{s_1, \ldots, s_{N_s}\}$                        set of SUs
$C = \{c_1, \ldots, c_M\}$                            set of channels
$s_i$                                                 SU $i$
$N_s = |S|$                                           number of SUs
$M = |C|$                                             number of channels
$N_p = M$                                             number of PUs
$T$                                                   number of steps per episode, horizon
$t$                                                   time
$o_t : t = 1, \ldots, T$                              observation (function of time)
$P$                                                   probability of falsification due to SDF attack
$a_j^{s_i} \in \mathcal{A}$                           action of SU $s_i$ for channel $j$
$\mathcal{A} = \{0, 1, 2\}$                           action space
$O = \{o_1, o_2, \ldots, o_T\}$                       observation space
$r_j^{s_i}(o_t) \in \{0, 1\}$                         reward of SU $s_i$ from channel $j$ in $o_t$
$R_j^{s_i}$                                           total reward of SU $s_i$ from channel $j$
$R^{s_i}$                                             total reward of SU $s_i$ regardless of the channel
$\bar{R}_j^{s_i}$                                     average channel reward of SU $s_i$
$\hat{R}^{s_i}$                                       total reward-per-contribution of SU $s_i$
$K_j^{s_i}$                                           number of times channel $j$ was sensed by $s_i$
$K^{s_i}$                                             number of times SU $s_i$ participated in sensing
$Q^{s_i}(o_t, a_j^{s_i})$                             Q-table of SU $s_i$
$D_j = \{a_j^{s_1}, \ldots, a_j^{s_{N_j}}\}$          multi-set of actions of all SUs in coalition $C_j$
$\{s_{N_1}, \ldots, s_{N_j}\} \subset S$              set of SUs in coalition $C_j$
$d_j = \mathrm{mode}(D_j)$                            majority decision of coalition $C_j$
$N_j$                                                 number of SUs sensing channel $j$
$\eta = N_s / N_p$                                    load factor
$L_j \in [1, 3]$                                      transmission length of PU $j$
$H^{s_i}$                                             sensing overhead of SU $s_i$
$$Q(o_t, a_j^{s_i}) = Q(o_t, a_j^{s_i}) + \alpha\Big(r_j^{s_i} + \gamma\big(\max_{a \in \mathcal{A}} Q(o_{t+1}, a) - Q(o_t, a_j^{s_i})\big)\Big). \tag{6}$$
The agents continue to repeat the above steps until they
reach the terminal observation, which marks the end of the
episode. At that point, the environment is reset to the initial
observation, a new episode begins, and the process repeats
itself. The convergence of the algorithm is well established
and can be found in [57], [58], [59]. The notations used to
describe the above process are listed in Table II.
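The per-step procedure described above (falsification, majority vote, reward, Q-table update) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the mapping of the action values 0, 1, 2 to free riding, idle and busy, the idle/busy flip used to model falsification, the tie-breaking rule, and all function names are hypothetical. The update follows the grouping of Eq. 6 as printed.

```python
import random
from collections import Counter, defaultdict

# Assumed encoding of the action space {0, 1, 2}; the mapping is illustrative.
FREE_RIDE, IDLE, BUSY = 0, 1, 2
ACTIONS = (FREE_RIDE, IDLE, BUSY)
ALPHA, GAMMA = 0.1, 0.1  # learning rate and discount factor listed in Table III


def falsify(report, p, rng):
    """Flip a sensed status (idle <-> busy) with probability p (assumed attack model)."""
    if report != FREE_RIDE and rng.random() < p:
        return BUSY if report == IDLE else IDLE
    return report


def majority_action(reports):
    """Eq. 5: mode of the contributors' reports; ties go to the first report seen."""
    votes = [r for r in reports if r != FREE_RIDE]
    if not votes:
        return FREE_RIDE  # no contributor, hence no coalition decision
    return Counter(votes).most_common(1)[0][0]


def reward(decision, true_status):
    """1 for a correct prediction, 0 for not sensing or a wrong prediction."""
    return 1 if decision != FREE_RIDE and decision == true_status else 0


def q_update(q, obs, action, r, next_obs):
    """One Q-table update following the grouping of Eq. 6 as printed.

    The textbook form would use ALPHA * (r + GAMMA * best_next - q[obs][action]).
    """
    best_next = max(q[next_obs][a] for a in ACTIONS)
    q[obs][action] += ALPHA * (r + GAMMA * (best_next - q[obs][action]))


if __name__ == "__main__":
    rng = random.Random(0)
    q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
    true_status = BUSY
    # Five contributors sense correctly; each report may then be falsified.
    reports = [falsify(true_status, 0.3, rng) for _ in range(5)]
    decision = majority_action(reports)
    q_update(q, obs=0, action=decision, r=reward(decision, true_status), next_obs=1)
    print(reports, decision, q[0][decision])
```

Restricting the vote to contributors mirrors the remark in Section III-E that the mode is performed only on contributors.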
C. Performance Indicators
Sensing accuracy is the first performance indicator used in
this work. It is defined as the number of correct predictions
normalized by the number of times the SU participated in
spectrum sensing, as shown below:
$$\text{Sensing Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total sensing contribution}} \times 100. \tag{7}$$
Sensing accuracy is calculated per SU, and the total sensing
contribution is an indicator of the energy consumed by the
SU for spectrum sensing and collaboration. The SU’s goal
is therefore to minimize its sensing contribution while
maintaining a high level of accuracy. In terms of RL, this is
achieved by assigning a small negative “step cost” to all actions
other than “not-sensed”. However, two agents with different sensing
contributions can have the same sensing accuracy but different
sensing rewards. Therefore, the second performance indicator
used is the total reward per episode for each SU (regardless
of the channel), $R^{s_i}$, which is defined as
$$R^{s_i} = \sum_{j=1}^{|C|} R_j^{s_i}. \tag{8}$$
Since total reward alone does not reflect an SU’s contribution,
reward-per-contribution $\hat{R}^{s_i}$ is another performance indicator,
defined as
$$\hat{R}^{s_i} = R^{s_i} / K^{s_i}, \tag{9}$$
where $K^{s_i}$ is the number of times $s_i$ participated in spectrum
sensing. The last performance indicator used is sensing overhead
$H^{s_i}$, which shows the amount of energy SUs consume
in spectrum sensing without receiving any reward. Sensing
overhead is defined as
$$H^{s_i} = 1 - R^{s_i} / K^{s_i}. \tag{10}$$
The PCMARL algorithm’s performance is evaluated in more
detail using the aforementioned indicators in Section IV.
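The four indicators reduce to simple ratios of per-episode counts. The sketch below illustrates Eqs. 7 to 10 under the assumption that each correct prediction contributes a reward of exactly 1 (the small step cost is ignored); the function names are hypothetical. It also shows the case mentioned above of two SUs with equal accuracy but different total rewards.

```python
def total_reward(per_channel_rewards):
    """Eq. 8: sum of the rewards an SU collected on each channel."""
    return sum(per_channel_rewards)


def sensing_accuracy(correct, contributions):
    """Eq. 7: correct predictions per sensing contribution, in percent."""
    if contributions <= 0:
        raise ValueError("undefined for an SU that never participated in sensing")
    return 100.0 * correct / contributions


def reward_per_contribution(reward, contributions):
    """Eq. 9: total reward normalized by the number of sensing contributions."""
    if contributions <= 0:
        raise ValueError("undefined for an SU that never participated in sensing")
    return reward / contributions


def sensing_overhead(reward, contributions):
    """Eq. 10: fraction of sensing contributions that earned no reward."""
    return 1.0 - reward_per_contribution(reward, contributions)


if __name__ == "__main__":
    # Two SUs with the same accuracy but different total rewards.
    frugal = {"reward": 3, "contributions": 3}
    eager = {"reward": 20, "contributions": 20}
    for su in (frugal, eager):
        print(sensing_accuracy(su["reward"], su["contributions"]), su["reward"])
    # An SU that sensed in all 20 slots and was correct 14 times.
    print(reward_per_contribution(14, 20), sensing_overhead(14, 20))
```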
D. Resilience of the PCMARL Algorithm Against SDF Attacks
CR approaches are vulnerable to sensing inaccuracy due to
hardware limitations and SDF attacks. Majority vote provides
a strong defense against such inaccuracies since the coalition’s
probability of incorrect decision $P_j$ is always lower than $P$,
where $P$ is the probability that the sensing result reported by
an SU has been falsified. We calculate $P_j$ as
$$P_j = \Pr(\text{more than half of the agents in coalition } C_j \text{ report falsified results}) = \sum_{k=\lceil (1+|C_j|)/2 \rceil}^{|C_j|} \Pr(k \text{ agents in } C_j \text{ falsified}). \tag{11}$$
The $k$ agents from $C_j$ who reported a falsified channel status
form subset $C_j^k$. Hence, there are $\binom{|C_j|}{k}$ possibilities of such
subsets. Therefore, the coalition’s probability of an incorrect
decision can be calculated as
$$P_j = \sum_{k=\lceil (1+|C_j|)/2 \rceil}^{|C_j|} \binom{|C_j|}{k} P^k (1-P)^{|C_j|-k}. \tag{12}$$
Since $P_j < P$ for all $|C_j| > 1$ and $P \in (0, 0.5)$, majority vote is
an effective remedy against SDF attacks. Figure 3 shows this
phenomenon by plotting a coalition’s probability of incorrect
decision $P_j$ against $P$ for various $|C_j|$. It also demonstrates that
larger coalitions provide stronger defense against SDF attacks,
which is advantageous to support scalability. Furthermore,
Eq. 12 provides a relationship between $|C_j|$ and $P_j$, where $|C_j|$
is the number of SUs that actually participated in sensing.
The relationship helps determine the number of contributors
required in a coalition to achieve a certain probability of an
incorrect decision. Hence, coalition formation and majority
vote not only provide a defense against SDF attacks but also
enable greater energy savings by providing a threshold of
contributors required to achieve a certain level of accuracy. If
there are more SUs in a coalition than required, the remaining
SUs can be free riders.
Fig. 3: Coalition’s probability of incorrect decision against $P$
for various $|C_j|$.
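Eq. 12 can be evaluated directly to estimate how many contributors a coalition needs for a target probability of an incorrect decision. The sketch below is a minimal illustration of that calculation; the function names and the `n_max` search bound are hypothetical, and the lower summation limit follows the strict-majority convention of Eq. 11.

```python
from math import ceil, comb


def coalition_error_probability(n, p):
    """Eq. 12: probability that more than half of n contributors report falsified results."""
    k_min = ceil((1 + n) / 2)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def min_contributors(p, target, n_max=1000):
    """Smallest coalition size whose error probability is at or below target.

    Sizes are scanned in increasing order because, under the strict-majority
    limit, consecutive odd and even sizes need not decrease monotonically.
    Returns None if no size up to n_max (an assumed bound) qualifies.
    """
    for n in range(1, n_max + 1):
        if coalition_error_probability(n, p) <= target:
            return n
    return None


if __name__ == "__main__":
    for n in (1, 3, 5, 11, 21):
        print(n, coalition_error_probability(n, 0.3))
    print(min_contributors(0.3, 0.05))
```

Any SUs beyond the returned size could then act as free riders, in line with the threshold argument above.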
E. On the Complexity of the PCMARL Algorithm
The authors of [60] determine the complexity of general
MDPs to be bounded by polynomials in experiment time and
the size of the state-space and action-space. The time
complexity of model-free algorithms, including Q-learning with
$\epsilon$-greedy, UCB, and UCB-H exploration strategies, is shown in
[61] to be $O(E)$, where $E$ denotes the total experiment time.
The space complexity of Q-learning is proven in [62] to be
$O(|\mathcal{S}||\mathcal{A}|)$, where $|\mathcal{S}|$ is the size of the state-space and
$|\mathcal{A}|$ is the size of the action-space. The complexity of the
PCMARL algorithm differs in only two aspects:
1) Reward history-based channel selection, which is a max
operation over the number of channels and, hence, has
a complexity of $O(M)$.
2) Majority vote, which is a mode operation over the action
set $D_j$ and has a complexity of $O(|D_j|)$. Note that
free-riders reduce this complexity, since the mode is
performed only on contributors.
Therefore, the additional complexity is linear in channel
quantity and coalition size but constant in load factor since
each agent performs all computations locally.
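The reward history-based channel selection amounts to a single pass over the $M$ channels. The following sketch illustrates this under stated assumptions: the class name borrows the flowchart label Agent_History, the average channel reward is assumed to be the total reward on a channel divided by the number of times it was sensed, and a never-sensed channel is assumed to have an average of 0.

```python
class AgentHistory:
    """Per-SU reward history used to pick a channel (hypothetical structure)."""

    def __init__(self, num_channels):
        self.total_reward = [0.0] * num_channels  # reward collected per channel
        self.times_sensed = [0] * num_channels    # sensing count per channel

    def record(self, channel, reward):
        """Store the outcome of one sensing contribution on a channel."""
        self.total_reward[channel] += reward
        self.times_sensed[channel] += 1

    def average_reward(self, channel):
        """Assumed average channel reward; 0.0 for a channel never sensed."""
        k = self.times_sensed[channel]
        return self.total_reward[channel] / k if k else 0.0

    def best_channel(self):
        """One max pass over all channels, i.e. O(M); ties go to the lowest index."""
        return max(range(len(self.total_reward)), key=self.average_reward)


if __name__ == "__main__":
    history = AgentHistory(num_channels=3)
    history.record(0, 1)
    history.record(0, 0)
    history.record(1, 1)
    print(history.best_channel())
```

Because each SU keeps only its own history, the cost of this step does not grow with the number of SUs.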
The simulation results are discussed in the next section.
IV. PERFORMANCE EVALUATION
In this section, we present the simulation results, discuss
the PCMARL sensing algorithm, and evaluate various performance
indicators under several load factors. The PCMARL
algorithm’s performance is compared to that of traditional
Q-learning with constant $\epsilon$ proposed in [9], Q-learning with UCB
as used in [59], and UCB-H as used in [9] for CSS.
Fig. 4: Average sensing accuracy under $\eta = 10$, $P = 0.3$ for
all approaches.
The sensing capacity $K$ of each SU is assumed to be 1,
i.e., each SU can sense only one channel in one time slot,
which is a realistic assumption for most low-powered IoT
devices. The SUs are assumed to be capable of switching
between available channels. We use $N_p = 3$ PUs for all
simulations and $N_s = \eta N_p$, where $\eta \in \{10, 25, 50, 100, 200\}$.
The simulations are carried out for 1,000 episodes, and each
episode has $T = 20$ steps. If an SU participates in spectrum
sensing, its reported channel status is subject to change due
to SDF attacks and sensing inaccuracy with probability $P$. The
simulation parameters are given in Table III.
TABLE III: Simulation Parameters
Parameter            Value(s)                         Parameter       Value(s)
$\eta$               [2, 10, 25, 50, 100, 200]        $N_S$           $\eta N_P$
$P$                  [0.01, 0.05, 0.1, 0.2, 0.3]      $\mathcal{A}$   {0, 1, 2}
Number of episodes   [100, 500, 1000]                 $T$             20
$N_P$                3                                $\sigma$        5
$\gamma$             0.1                              $\alpha$        0.1
Figure 4 compares the sensing accuracy of the proposed PCMARL
algorithm with the three related approaches mentioned
above, under load factor $\eta = 10$ and falsification probability
due to SDF attack $P = 0.3$. Since the PCMARL algorithm
uses majority voting, it is significantly more accurate than
traditional Q-learning with constant $\epsilon$, the Q-UCB approach
of [59], and the Q-UCB-H approach of [9]. Similarly, Figure
5 shows the sensing accuracy of all four approaches for $\eta = 5$
and $P = 0.1$. The lower values of $\eta$ and $P$ compared to Figure
4 bring the performance of the other approaches closer to that of
the PCMARL algorithm; nevertheless, the PCMARL algorithm
is still significantly more accurate.
In order to show the effect of the load factor on sensing
accuracy, Figure 6 plots the sensing accuracy of all approaches
for two different load factors: $\eta = 2$ and $\eta = 100$. With a
high load factor, the PCMARL algorithm converges to 100%
accuracy in under 200 episodes due to cooperation among a
large number of SUs, which provides a strong defense against
SDF attacks.
Fig. 5: Average sensing accuracy under $\eta = 5$, $P = 0.1$ for
all approaches.
Fig. 6: Average sensing accuracy of all approaches for $\eta = 2$
and $\eta = 100$ for $P = 0.3$.
Figure 7 shows the sensing accuracy of the PCMARL
algorithm under various load factors to show scalability. As
the load factor increases, majority vote heavily influences
individual decisions. This figure also shows our approach’s
resilience against SDF attacks when the probability of falsi-
fication is high, i.e., P= 0.3. The system was tested with
various load factor values up to η= 200, and it achieved
100% sensing accuracy in about half as many episodes as the
other approaches for all load factors.
Figure 8 compares the accuracy of all approaches with
different probabilities of falsification in order to show the
proposed algorithm’s resilience. The PCMARL algorithm is
the most resilient to SDF attacks even when 30% of the
reported channel statuses have been falsified. The performance
of all approaches degrades with increasing $P$. Figure 9 shows
the PCMARL algorithm’s resilience to SDF attacks by plotting
its accuracy for various probabilities of sensing falsification.
Fig. 7: Average sensing accuracy of PCMARL for different
load factors.
Fig. 8: Average sensing accuracy of all approaches under
various $P$.
Figure 10 shows the average rewards per episode over all
SUs for all approaches. Each episode has $T = 20$ slots, and the
SU $s_i$ receives reward $r_j^{s_i}(o_t) = 1$ for sensing and correctly
predicting the status of channel $j$ in state $o_t$. The total reward
per episode therefore lies in the range $[0, 20]$, which is the
range of the y-axis in all rewards-related figures.
Figure 11 shows the average reward per episode for all
approaches under various load factors. The PCMARL algorithm
converges faster to higher rewards than the other approaches
despite the probability of falsification being $P = 0.3$. Similarly,
Figure 12 shows the average reward per episode with
different probabilities of falsification for $\eta = 10$. It emphasizes
the PCMARL algorithm’s resilience against SDF attacks, as
its average reward even when $P = 0.3$ is higher than that of
the other approaches when $P = 0.1$.
Fig. 9: Average sensing accuracy of PCMARL under various
$P$.
Fig. 10: Average reward over all SUs per episode for all
approaches with $\eta = 10$ and $P = 0.3$.
One of the contributions of this work is giving SUs the
freedom to choose a channel for sensing based on their reward
history. If a channel frequently becomes busy, an SU can
decide to stop sensing it, leave the coalition and join another
coalition (the one corresponding to the channel from which it
has historically received the highest reward) for sensing in the
hope of finding a more frequently available channel. Another
reason for SUs to switch between coalitions is their mobility,
which affects their sensing accuracy. The SUs can thus pick
a channel for which they are able to predict channel status
more accurately, thereby leading to higher rewards. In other
words, the SUs also learn to join the most suitable coalition
for higher rewards. This concept is reflected in the PCMARL
algorithm’s behavior as shown in Figures 10, 11 and 12.
Fig. 11: Average reward per episode for all approaches under
various load factors.
Fig. 12: Average reward per episode for all approaches under
various $P$.
The SUs are free to be contributors or free riders. A small
per-step negative cost is associated with participating in
spectrum sensing, which encourages the SUs to learn
to participate in sensing only when a high reward is expected
and thus to conserve their energy. In other words, SUs can
learn to settle for a lower reward to conserve their energy. The
number of times that SUs participate in spectrum sensing per
episode is indicated by their contribution percentage, shown in
Figure 13, which also reflects their total energy consumption.
We assume that an SU spends one unit of energy whenever
it participates in spectrum sensing in a particular time slot
and zero units of energy whenever it is a free rider. Note
that the epsilon-greedy algorithm exhibits better exploration
of the environment than the UCB and UCB-H algorithms that
prioritize saving energy.
Fig. 13: Average contribution percentage for all approaches.
Fig. 14: Average energy consumption per unit reward for all
approaches.
However, total rewards and contribution percentage alone
do not give a complete measure of energy efficiency for SUs
since they do not reflect whether the reward is worth the energy
spent for spectrum sensing. For example, an SU may have
participated in spectrum sensing only three times per episode
to conserve energy and predicted correctly each time. This SU’s
total reward per episode might still be lower than that of the
other SUs that participated more in sensing. Thus, rewards
and contribution percentage, when looked at separately, do
not fully represent energy efficiency.
Fig. 15: Average energy consumption per unit reward for
PCMARL under various load factors.
Fig. 16: Average sensing overhead of all approaches for $P = 0.1$
and $P = 0.3$.
A better indicator of energy-reward balance is energy-per-reward,
which reflects the total energy the SU consumed
per episode normalized by the total rewards it received per
episode. Figure 14 shows total energy-per-reward for the SUs
for all approaches. A lower energy-per-reward ratio reflects
higher energy efficiency since it shows that the SUs learned
to participate in sensing only when they expected a reward.
Ideally, the energy-per-reward ratio should equal 1. The
PCMARL algorithm achieves a near-ideal energy-per-reward
ratio, whereas UCB-H is the most energy-inefficient approach
and does not learn to lower the participation percentage to
conserve energy. Figure 15 shows the PCMARL algorithm’s
energy-per-reward ratio under various load factors, and it is
clear that as the load factor increases, the SUs have more
opportunities to conserve their energy and still achieve a
near-ideal energy-per-reward ratio.
Fig. 17: Average sensing overhead of PCMARL under various
$P$.
A related measure of energy wastage is sensing overhead,
which indicates the amount of energy the SUs spent sensing a
channel when they did not receive any reward. Figure 16 shows
the average sensing overhead of all approaches. The PCMARL
algorithm reaches extremely low sensing overhead in about half as
many episodes as the other approaches. Sensing overhead
is high at the beginning when the SUs are exploring the
environment, but drops as the number of episodes increases. In
an ideal system with zero wasted energy, the sensing overhead
would be zero. Figure 17 shows the PCMARL algorithm’s
sensing overhead for various values of $P$; the overhead remains
very low even when $P$ is high.
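Under the unit-energy assumption stated earlier, energy-per-reward and sensing overhead are two views of the same ratio. The short derivation below makes the link explicit. It is a sketch rather than a result from the paper: the symbol $E^{s_i}$ for energy-per-reward is introduced here for illustration only, the small step cost is ignored, and $R^{s_i} > 0$ is assumed.

```latex
% E^{s_i} is a hypothetical symbol for energy-per-reward.
% Assumption: one unit of energy per sensing contribution, so the energy
% spent in an episode equals K^{s_i}.
\begin{align*}
E^{s_i} &= \frac{K^{s_i}}{R^{s_i}} = \frac{1}{\hat{R}^{s_i}}, \\
H^{s_i} &= 1 - \hat{R}^{s_i} = 1 - \frac{1}{E^{s_i}}.
\end{align*}
% Worked example: K^{s_i} = 20 sensing contributions with R^{s_i} = 16
% correct predictions give E^{s_i} = 20/16 = 1.25 and H^{s_i} = 1 - 16/20 = 0.2.
```

An energy-per-reward ratio of 1 therefore corresponds to zero sensing overhead, which matches the two ideal values quoted in this section.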
V. CONCLUSION
In this paper, we propose a PCMARL spectrum sensing
algorithm that takes advantage of coalition formation and
majority voting to combat SDF attacks in CR-IoT. The proposed
PCMARL algorithm outperforms state-of-the-art RL-based
sensing algorithms, namely Q-$\epsilon$, Q-UCB, and Q-UCB-H,
in terms of sensing accuracy and energy efficiency. The
adopted majority vote mechanism makes the PCMARL algorithm
resilient to SDF attacks even when 30% of the
reported sensing results have been falsified. Unlike centralized
fully CSS approaches, the proposed PCMARL algorithm has
a coalition-specific partially cooperative sensing and
belief-sharing mechanism that makes its performance directly
proportional to the number of devices in the network. Moreover, the
algorithm forces agents to learn to optimize their participation
percentage to conserve energy. Agents also learn to switch
channels based on their reward history to maximize their
rewards. Future work may consider approximating the Q-table
to extend the approach to very large state-spaces and implementing
it in low-powered devices.
ACKNOWLEDGMENT
This work is supported by FRQNT scholarship number
305094 and the Canada Research Chair Program tier-II entitled
“Towards a Novel and Intelligent Framework for the Next
Generations of IoT Networks”.
REFERENCES
[1] C. MacGillivray and D. Reinsel, “Worldwide global datasphere
iot device and data forecast, 2019-2023,” Available at
https://www.idc.com/getdoc.jsp?containerId=prUS45213219
(2019/06/18).
[2] A. A. Khan, M. H. Rehmani, and A. Rachedi, “When Cognitive Radio
meets the Internet of Things?” 2016 International Wireless Communica-
tions and Mobile Computing Conference, IWCMC, pp. 469–474, 2016.
[3] J. Mitola and G. Maguire, “Cognitive radio: making software radios
more personal,” IEEE Personal Communications, vol. 6, no. 4, pp. 13–
18, 1999.
[4] A. A. Khan, M. H. Rehmani, and A. Rachedi, “Cognitive-Radio-
Based Internet of Things: Applications, architectures, spectrum related
functionalities, and future research directions,” IEEE Wireless Commu-
nications, vol. 24, no. 3, pp. 17–25, 2017.
[5] P. Cheng, Z. Chen, M. Ding, Y. Li, B. Vucetic, and D. Niyato, “Spectrum
Intelligent Radio: Technology, Development, and Future Trends,” IEEE
Communications Magazine, vol. 58, no. 1, pp. 12–18, 2020.
[6] Y. Zou, J. Zhu, L. Yang, Y. C. Liang, and Y. D. Yao, “Securing
physical-layer communications for cognitive radio networks,” IEEE
Communications Magazine, vol. 53, no. 9, pp. 48–54, 2015.
[7] H. Vu-Van and I. Koo, “Cooperative spectrum sensing with collaborative
users using individual sensing credibility for cognitive radio network,”
IEEE Transactions on Consumer Electronics, vol. 57, no. 2, pp. 320–326,
2011.
[8] R. Sarikhani and F. Keynia, “Cooperative Spectrum Sensing Meets
Machine Learning: Deep Reinforcement Learning Approach,” IEEE
Communications Letters, vol. 24, no. 7, pp. 1459–1462, 2020.
[9] Y. Zhang, P. Cai, C. Pan, and S. Zhang, “Multi-Agent Deep Reinforcement
Learning-Based Cooperative Spectrum Sensing With Upper
Confidence Bound Exploration,” IEEE Access, vol. 7, pp. 118898–118906,
2019.
[10] Y. Alghorani, G. Kaddoum, S. Muhaidat, and S. Pierre, “On the
Approximate Analysis of Energy Detection over n Rayleigh Fading
Channels Through Cooperative Spectrum Sensing,” IEEE Wireless
Communications Letters, vol. 4, no. 4, pp. 413–416, 2015.
[11] F. Hussain, S. A. Hassan, R. Hussain, and E. Hossain, “Machine
Learning for Resource Management in Cellular and IoT Networks:
Potentials, Current Solutions, and Open Challenges,” IEEE Communi-
cations Surveys and Tutorials, vol. 22, no. 2, pp. 1251–1275, 2020.
[12] H. Li, X. Xing, J. Zhu, X. Cheng, K. Li, R. Bie, and T. Jing, “Utility-
Based Cooperative Spectrum Sensing Scheduling in Cognitive Radio
Networks,” IEEE Transactions on Vehicular Technology, vol. 66, no. 1,
pp. 645–655, 2017.
[13] K. Jagannathan, I. Menache, E. Modiano, and G. Zussman,
“Non-cooperative spectrum access the dedicated vs. free spectrum choice,”
IEEE Journal on Selected Areas in Communications, vol. 30, no. 11,
pp. 2251–2261, 2012.
[14] A. Ali and W. Hamouda, “Advances on Spectrum Sensing for Cognitive
Radio Networks: Theory and Applications,” IEEE Communications
Surveys and Tutorials, vol. 19, no. 2, pp. 1277–1304, 2017.
[15] Z. Quan, S. Cui, and A. H. Sayed, “Optimal linear cooperation for
spectrum sensing in cognitive radio networks,” IEEE Journal of Selected
Topics in Signal Processing, vol. 2, no. 1, pp. 28–40, 2008.
[16] J. Oueis, E. C. Strinati, and S. Barbarossa, “The fog balancing: Load dis-
tribution for small cell cloud computing,” in 2015 IEEE 81st Vehicular
Technology Conference (VTC Spring), 2015, pp. 1–6.
[17] M. Kim and I.-Y. Ko, “An efficient resource allocation approach based
on a genetic algorithm for composite services in IoT environments,” in
2015 IEEE International Conference on Web Services. IEEE, 2015,
pp. 543–550.
[18] Y. Xu, Y. Yang, Q. Liu, and Z. Li, “Joint energy-efficient resource
allocation and transmission duration for cognitive HetNets under
imperfect CSI,” Signal Processing, vol. 167, p. 107309, 2020. [Online].
Available: https://doi.org/10.1016/j.sigpro.2019.107309
[19] C. Pham, N. H. Tran, C. T. Do, S. I. Moon, and C. S. Hong, “Spectrum
handoff model based on hidden markov model in cognitive radio
networks,” in The International Conference on Information Networking
2014 (ICOIN2014), 2014, pp. 406–411.
[20] Q. Ni, R. Zhu, Z. Wu, Y. Sun, L. Zhou, and B. Zhou, “Spectrum
allocation based on game theory in cognitive radio networks,” JNW,
vol. 8, no. 3, pp. 712–722, 2013.
[21] H. Li, X. Xing, J. Zhu, X. Cheng, K. Li, R. Bie, and T. Jing, “Utility-
Based Cooperative Spectrum Sensing Scheduling in Cognitive Radio
Networks,” IEEE Transactions on Vehicular Technology, vol. 66, no. 1,
pp. 645–655, 2017.
[22] D. J. Lee, “Adaptive random access for cooperative spectrum sensing in
cognitive radio networks,” IEEE Transactions on Wireless Communications,
vol. 14, no. 2, pp. 831–840, 2015.
[23] Z. Dai, Z. Wang, and V. W. Wong, “An Overlapping Coalitional Game
for Cooperative Spectrum Sensing and Access in Cognitive Radio
Networks,” IEEE Transactions on Vehicular Technology, vol. 65, no. 10,
pp. 8400–8413, 2016.
[24] M. C. Hlophe and S. B. Maharaj, “Spectrum Occupancy Reconstruction
in Distributed Cognitive Radio Networks Using Deep Learning,” IEEE
Access, vol. 7, pp. 14294–14307, 2019.
[25] H. He and H. Jiang, “Deep Learning Based Energy Efficiency
Optimization for Distributed Cooperative Spectrum Sensing,” IEEE Wireless
Communications, vol. 26, no. 3, pp. 32–39, 2019.
[26] Z. Shi, W. Gao, S. Zhang, J. Liu, and N. Kato, “AI-Enhanced
Cooperative Spectrum Sensing for Non-Orthogonal Multiple Access,” IEEE
Wireless Communications, vol. 27, no. 2, pp. 173–179, 2020.
[27] M. Bkassiny, Y. Li, and S. K. Jayaweera, “A Survey on
Machine-Learning Techniques in Cognitive Radios,” IEEE Communications
Surveys and Tutorials, vol. 15, no. 3, pp. 1136–1159, 2013.
[28] K. M. Thilina, K. W. Choi, N. Saquib, and E. Nazmus, “Machine
Learning Techniques for Cooperative Spectrum Sensing in Cognitive
Radio Networks,” IEEE Journal on Selected Areas in Communications,
vol. 31, no. 11, pp. 2209–2221, 2013.
[29] G. Scutari and J. S. Pang, “Joint sensing and power allocation in
nonconvex cognitive radio games: Nash equilibria and distributed algorithms,”
IEEE Transactions on Information Theory, vol. 59, no. 7, pp. 4626–4661,
2013.
[30] A. S. Alfa, B. T. Maharaj, S. Lall, and S. Pal, “Mixed-Integer Program-
ming based Techniques for Resource Allocation in Underlay Cognitive
Radio Networks : A Survey,” Journal of Communications and Networks,
vol. 18, no. 5, pp. 744–761, 2016.
[31] S. Verma, Y. Kawamoto, Z. M. Fadlullah, H. Nishiyama, and N. Kato,
“A Survey on Network Methodologies for Real-Time Analytics of
Massive IoT Data and Open Research Issues,” IEEE Communications
Surveys and Tutorials, vol. 19, no. 3, pp. 1457–1477, 2017.
[32] C. H. Tan, K. C. Tan, and A. Tay, “Dynamic Game Difficulty Scaling
Using Adaptive Behavior-Based AI,” IEEE Transactions on Computational
Intelligence and AI in Games, vol. 3, no. 4, pp. 289–301, 2011.
[33] Z. Ali, L. Jiao, T. Baker, G. Abbas, Z. H. Abbas, and S. Khaf, “A
deep learning approach for energy efficient computational offloading in
mobile edge computing,” IEEE Access, vol. 7, pp. 149623–149633,
2019.
[34] Z. Ali, S. Khaf, Z. Abbas, G. Abbas, L. Jiao, A. Irshad, K. Kwak, and
M. Bilal, “A comprehensive utility function for resource allocation in
mobile edge computing,” Cmc -Tech Science Press-, vol. 66, pp. 1461–
1477, 11 2020.
[35] Z. Ali, S. Khaf, Z. H. Abbas, G. Abbas, F. Muhammad, and S. Kim, “A
deep learning approach for mobility-aware and energy-efficient resource
allocation in mec,” IEEE Access, vol. 8, pp. 179530–179546, 2020.
[36] H. Sami, A. Mourad, H. Otrok, and J. Bentahar, “FScaler: Automatic
Resource Scaling of Containers in Fog Clusters Using Reinforcement
Learning,” 2020 International Wireless Communications and Mobile
Computing, IWCMC 2020, pp. 1824–1829, 2020.
[37] ——, “Demand-Driven Deep Reinforcement Learning for Scalable Fog
and Service Placement,” IEEE Transactions on Services Computing, pp.
1–14, 2021.
[38] H. Sami, H. Otrok, J. Bentahar, and A. Mourad, “AI-based Resource
Provisioning of IoE Services in 6G: A Deep Reinforcement Learning
Approach,” IEEE Transactions on Network and Service Management,
pp. 1–14, 2021.
[39] J. Oksanen, J. Lundén, and V. Koivunen, “Reinforcement learning
based sensing policy optimization for energy efficient cognitive radio
networks,” Neurocomputing, vol. 80, pp. 102–110, 2012.
[40] S. Mosleh, Y. Ma, J. D. Rezac, and J. B. Coder, “Dynamic Spectrum
Access with Reinforcement Learning for Unlicensed Access in 5G
and beyond,” IEEE Vehicular Technology Conference, vol. 2020-May,
no. Ml, 2020.
[41] R. Sarikhani and F. Keynia, “Cooperative Spectrum Sensing Meets
Machine Learning: Deep Reinforcement Learning Approach,” IEEE
Communications Letters, vol. XX, no. X, pp. 1–1, 2020.
[42] Y. Zhang, P. Cai, C. Pan, and S. Zhang, “Multi-Agent Deep Reinforcement
Learning-Based Cooperative Spectrum Sensing With Upper
Confidence Bound Exploration,” IEEE Access, vol. 7, pp. 118898–118906,
2019.
[43] J. Lunden, S. R. Kulkarni, V. Koivunen, and H. V. Poor, “Multiagent
reinforcement learning based spectrum sensing policies for cognitive
radio networks,” pp. 858–868, 2013.
[44] J. Lunden, V. Koivunen, S. R. Kulkarni, and H. V. Poor, “Exploiting
spatial diversity in multiagent reinforcement learning based spectrum
sensing,” 2011 4th IEEE International Workshop on Computational
Advances in Multi-Sensor Adaptive Processing, CAMSAP 2011, pp. 325–
328, 2011.
[45] K. S. Shin, G. H. Hwang, and O. Jo, “Distributed reinforcement learning
scheme for environmentally adaptive IoT network selection,” Electronics
Letters, vol. 56, no. 9, pp. 441–444, 2020.
[46] H.-h. Chang, H. Song, Y. Yi, J. Zhang, H. He, and L. Liu, “Distributive
Dynamic Spectrum Access Through Deep Reinforcement Learning:
A Reservoir Computing-Based Approach,” IEEE Internet of Things
Journal, vol. 6, no. 2, pp. 1938–1948, 2019.
[47] T. Panayiotou, K. Manousakis, S. P. Chatzis, and G. Ellinas, “A Data-
Driven Bandwidth Allocation Framework with QoS Considerations for
EONs,” Journal of Lightwave Technology, vol. 37, no. 9, pp. 1853–1864,
2019.
[48] K. Zia, N. Javed, M. N. Sial, S. Ahmed, A. A. Pirzada, and F. Pervez, “A
Distributed Multi-Agent RL-Based Autonomous Spectrum Allocation
Scheme in D2D Enabled Multi-Tier HetNets,” IEEE Access, vol. 7, pp.
6733–6745, 2019.
[49] Y. Xu, J. Yu, W. C. Headley, and R. M. Buehrer, “Deep Reinforcement
Learning for Dynamic Spectrum Access in Wireless Networks,”
Proceedings - IEEE Military Communications Conference MILCOM, vol.
2019-Octob, pp. 207–212, 2019.
[50] H. Chen, M. Zhou, K. Wang, and J. Li, “Joint Spectrum Sensing
and Resource Allocation Scheme in Cognitive Radio Networks with
Spectrum Sensing Data Falsification Attack,” IEEE Transactions on
Vehicular Technology, vol. 65, no. 11, pp. 9181–9191, 2016.
[51] A. Sajid, B. Khalid, M. Ali, S. Mumtaz, U. Masud, and F. Qamar,
“Securing Cognitive Radio Networks using blockchains,” Future
Generation Computer Systems, vol. 108, pp. 816–826, 2020. [Online].
Available: https://doi.org/10.1016/j.future.2020.03.020
[52] N. Haddadou, A. Rachedi, and Y. Ghamri-Doudane, “A Job Market
Signaling Scheme for Incentive and Trust Management in Vehicular Ad
Hoc Networks,” IEEE Transactions on Vehicular Technology, vol. 64,
no. 8, pp. 3657–3674, 2015.
[53] A. Ahmadfard, A. Jamshidi, and A. Keshavarz-Haddad, “Probabilistic
spectrum sensing data falsification attack in cognitive radio networks,”
Signal Processing, vol. 137, pp. 1–9, 2017. [Online]. Available:
http://dx.doi.org/10.1016/j.sigpro.2017.01.033
[54] J. Wu, C. Wang, Y. Yu, T. Song, and J. Hu, “Sequential fusion to defend
against sensing data falsification attack for cognitive Internet of Things,”
ETRI Journal, vol. 42, no. August 2019, pp. 976–986, 2020.
[55] D. Bendouda, A. Rachedi, and H. Haffaf, “Programmable
architecture based on Software Defined Network for Internet of
Things: Connected Dominated Sets approach,” Future Generation
Computer Systems, vol. 80, pp. 188–197, 2018. [Online]. Available:
https://doi.org/10.1016/j.future.2017.09.070
[56] X. Zhang and H. Su, “CREAM-MAC: Cognitive Radio-EnAbled Multi-
Channel MAC Protocol Over Dynamic Spectrum Access Networks,”
IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 1, pp.
110–123, 2011.
[57] J. N. Tsitsiklis, “Asynchronous Stochastic Approximation and Q-
Learning,” Machine Learning, vol. 16, no. 3, 1994.
[58] A. L. Strehl, E. Wiewiora, J. Langford, and M. L. Littman, “PAC Model-
Free Reinforcement Learning,” Proceedings of the 23rd International
Conference on Machine Learning, pp. 881–888, 2006.
[59] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction,
2nd ed. The MIT Press, 2018.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2021.3116928, IEEE Internet of
Things Journal
[60] M. Kearns and S. Singh, “Near-optimal reinforcement learning in
polynomial time,” Machine Learning, vol. 49, no. 2, pp. 209–232, 2002.
[61] C. Jin, Z. Allen-Zhu, S. Bubeck, and M. I. Jordan, “Is Q-learning prov-
ably efficient?” Advances in Neural Information Processing Systems,
vol. 2018-Decem, no. NeurIPS, pp. 4863–4873, 2018.
[62] A. L. Strehl, E. Wiewiora, J. Langford, and M. L. Littman, “PAC Model-
Free Reinforcement Learning,” Proceedings of the 23rd International
Conference on Machine Learning, pp. 881–888, 2006.
Sadia Khaf received the B.E. degree in electrical engineering from the National University of Sciences and Technology, School of Electrical Engineering and Computer Science (NUST-SEECS), Pakistan, in 2015. She received the M.S. degree in electrical and electronics engineering from Bilkent University, Turkey, in 2018. From 2015 to 2018, she was a Research Assistant with IONOLAB, Turkey. From 2018 to 2020, she was with the Faculty of Electrical Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Pakistan, as a Lecturer. Currently, she is with École de Technologie Supérieure (ÉTS), Canada, as a Ph.D. candidate. Her research interests include cognitive radio networks, internet of things, radio resource management, and machine learning. She received several research excellence awards and grants, including the Fonds de recherche du Québec, Nature et technologies (FRQNT) doctoral fellowship, P.E.O. International Peace Scholarship, ÉTS Bourses D’implication aux Supérieurs, and ÉTS Palmarès Féminin pluriel award.
Mohammad T. Alkhodary received a B.S. degree in Telecommunication Engineering from the University of Science and Technology, Yemen, in 2008. He received an M.S. in Communication Engineering and Ph.D. in Electrical Engineering from the King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia, in 2012 and 2017, respectively. From 2015 to 2018, he held research positions at King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. He was a visiting scholar at the Georgia Institute of Technology (Georgia Tech), Atlanta, USA, in 2017. In 2018, he joined École de Technologie Supérieure (ÉTS) as a postdoctoral Fellow. His research interests include MLOps, the applications of ML and signal processing techniques to wireless communications, cloud-native for B5G, node clustering, and ML for edge services.
Georges Kaddoum (Senior Member, IEEE) received the bachelor’s degree in electrical engineering from the École Nationale Supérieure de Techniques Avancées (ENSTA), France, the M.Sc. degree in telecommunications and signal processing from the Telecom Bretagne (ENSTB), Brest, in 2005, and the Ph.D. degree in signal processing and telecommunications from the National Institute of Applied Sciences (INSA), Toulouse, France, in 2009. He is currently a Full Professor and the Tier-II Canada Research Chair with the École de Technologie Supérieure, University of Quebec, Montréal, Canada. He has published over 150 journal and conference papers and has two pending patents. His recent research interests include IoT wireless communication networks, resource allocations, security and space communications, and navigation. He was awarded the ÉTS Research Chair in physical-layer security for wireless networks in 2014; the Research Excellence Award of the Université du Québec, in 2018; the Prestigious Tier 2 Canada Research Chair in wireless IoT networks in 2019; and the Research Excellence Award from the ÉTS in recognition of his outstanding research outcomes, in 2019.
... Reinforcement learning (RL)-based sensing and scheduling algorithms offer promise for addressing the energy constraints of CRNs [4], [5], [6]. Unlike heuristic or game-theoretic methods, RL-based algorithms adapt to PU traffic patterns and eliminate the need for constant sensing by learning in real time and facilitating SU cooperation to conserve energy [3]. ...
... However, SFA and the energy-intensive nature of these approaches remain pressing concerns, particularly for resource-constrained IoT devices operating in CRNs [14], [15], [16], [10]. In response to this challenge, recent research endeavors have increasingly focused on harnessing machine learning approaches to enhance spectrum sensing efficiency, adaptability, and energy conservation in CRNs [4], [5], [6]. ...
... The algorithm, which employs a deep deterministic policy gradient (DDPG) structure with four deep neural networks, can handle hybrid action spaces and is benchmarked against exhaustive search and other algorithms. Participatory spectrum sensing and results sharing enable partial cooperation among agents to conserve energy and improve sensing accuracy while avoiding causing interference to PUs [4]. The proposed coalition scheme enhances the SUs' learning with ICSI and mitigates SFA and SDF attacks for granular optimization of the sensing, cooperation, and transmission processes. ...
Cognitive radio networks (CRNs) mitigate spectrum scarcity by leveraging the holes in the licensed spectrum to enable internet of things (IoT) devices to opportunistically access the spectrum. However, IoT devices need to sense the spectrum before they can access it, which is an energy-intensive process and hinders the practical implementation of opportunistic spectrum access for energy-constrained IoT devices. In this context, reinforcement learning-based algorithms that encourage cooperation among IoT devices to eliminate the need for constant sensing are promising candidates for practical CRN implementation. As exciting as the application of reinforcement learning to CRNs is, benchmarking the performance of different algorithms is a huge challenge due to a lack of standardized comparison metrics, especially for hybrid action spaces that comprise both discrete and continuous actions. We propose a hybrid discrete-continuous space deep reinforcement learning algorithm that maximizes the energy efficiency of CRNs by optimizing sensing, cooperation, and transmission by IoT devices. We also analyze the algorithm’s performance by setting the theoretical upper bound for throughput and find that it reaches 99.4% of the theoretical upper bound, while its discrete action-space version reaches 96% and other baseline algorithms range between 70% and 86%.
... Simulations show that the proposed method is applicable to existing CR environments. Since CSS methods are vulnerable to network attacks such as sensing data falsification (SDF), Sadia Khaf, in [40], proposed a scalable, partial CSS algorithm that improves the sensing accuracy while reducing the sharing overhead; the results show that the performance of the proposed algorithm is proportional to the number of devices and that it is suitable for multi-device connectivity. As for the underwater cognitive network environment, Changho Yun, in [41], proposed an underwater cooperative spectrum-sharing (UCSS) protocol for centralized underwater cognitive acoustic networks, which effectively utilizes non-proprietary hydroacoustic frequencies by periodically detecting the random appearance of interference, together with two heuristic resource allocation algorithms, multi-round RA (MRRA) and mono-regular RA (SRRA); the results show that MRRA outperforms SRRA in a variety of scenarios. ...
... Knowing a priori information about the signal, higher SNR, and higher phase synchronization. [37][38][39][40][41][42][43] Cooperative spectrum sensing ...
In recent years, with the rapid development in wireless communication and 5G networks, the rapid growth in mobile users has been accompanied by an increasing demand for the electromagnetic spectrum. The birth of cognitive radio and its spectrum-sensing technology provides hope for solving the problem of low utilization of the wireless spectrum. Artificial intelligence (AI) has been widely discussed globally. Deep learning technology, known for its strong learning ability and adaptability, plays a significant role in this field. Moreover, integrating deep learning with wireless communication technology has become a prominent research direction in recent years. The research objective of this paper is to summarize cognitive radio spectrum-sensing algorithms that incorporate deep learning. To review the advantages of deep-learning-based spectrum-sensing algorithms, this paper first introduces the traditional spectrum-sensing methods and then summarizes and compares the advantages and disadvantages of each method. It then describes the application of deep learning algorithms in spectrum sensing and focuses on the typical deep-neural-network-based sensing methods. Then, the existing deep-learning-based cooperative spectrum-sensing methods are summarized. Finally, the deep learning spectrum-sensing methods are discussed, along with challenges in the field and future research directions.
... Due to the scattered nature of cognitive radio devices across different locations, cooperation among those nodes to improve spectrum-sensing accuracy often remains a challenging issue. Further, the accuracy also depends on the number of channels and states in addition to the geographical locations of the nodes in the CRN [103]. Moreover, the correlation of the measured spectrum performed using efficient computation must be robust to noise with enhanced prediction accuracy. ...
Deep learning has been proven to be a powerful tool for addressing the most significant issues in cognitive radio networks, such as spectrum sensing, spectrum sharing, resource allocation, and security attacks. The utilization of deep learning techniques in cognitive radio networks can significantly enhance the network's capability to adapt to changing environments and improve the overall system's efficiency and reliability. As the demand for higher data rates and connectivity increases, B5G/6G wireless networks are expected to enable new services and applications significantly. Therefore, the significance of deep learning in addressing cognitive radio network challenges cannot be overstated. This review article provides valuable insights into potential solutions that can serve as a foundation for the development of future B5G/6G services. By leveraging the power of deep learning, cognitive radio networks can pave the way for the next generation of wireless networks capable of meeting the ever-increasing demands for higher data rates, improved reliability, and security.
... Hence, by means of SD, J. Wu et al. further decreased the samples required by CSS to improve the cooperative efficiency in References 13-15. To be specific, References 11, 12, and 31 took the energy efficiency and consumption, or detection performance in hostile sensing environments, into account, respectively. In addition, in References 36-38, M. Faheem et al. also took some optimization algorithms into consideration to improve communication security and decrease communication delay and energy consumption in 5G/smart grid, respectively. ...
With the rapid growth of Internet of Things (IoT) devices, cooperative spectrum sensing (CSS) has emerged as a promising solution to leverage the spatial diversity of multiple secondary IoT sensing nodes (SNs) for spectrum availability. However, the cooperative paradigm also incurs increased cooperative costs between each SN and the fusion center (FC), leading to decreased cooperative efficiency and achievable throughput, especially in large‐scale cognitive IoT (CIoT). To address these challenges, we present a sequential detection with feedback information (SD‐FI) approach in this paper. To achieve this objective, we propose a two‐way CSS model that formulates an optimization problem of Bayes cost in a quickest detection framework with feedback. To solve this optimization problem, we derive the structure of the optimal local decision rule from the local decision function and determine the optimal detection threshold in conjunction with the cost function. Following the optimal threshold pair, we implement the optimal SD‐FI and theoretically demonstrate the uniqueness of the optimal threshold and optimal sensing time. Simulation results demonstrate the superiority of SD‐FI in terms of cooperative performance (i.e., detection performance and Bayes cost) and sample size. Notably, even with limited sensing time, our proposed SD‐FI exhibits high throughput, highlighting its effectiveness in enhancing spectrum availability and utilization in CIoT.
... The advent of Industry 4.0 is a bright spot of innovation in the dynamic field of industrial systems, as it turns conventional manufacturing into intelligent, networked ecosystems. This paper outlines a novel introduction to a forward-looking Internet of Things (IoT) framework that is carefully designed to integrate robustness and scalability, acting as the cornerstone for putting future developments in intelligent and adaptable industrial systems into practice [1]. ...
The emergence of Industry 4.0 signifies a paradigm shift in industrial systems, characterized by the amalgamation of digital technologies with tangible operations. The goal of this study is to present a state-of-the-art, scalable, and robust Internet of Things (IoT) framework that will enable future innovations in intelligent and adaptable industrial systems to be seamlessly integrated. Our framework gives scalability first priority in response to Industry 4.0's dynamic nature, which is marked by fast technical evolution and rising connection in order to handle the expanding ecosystem of networked devices. The suggested structure places a strong emphasis on resilience and is designed to resist setbacks and guarantee the continuation of vital industrial processes. Our framework improves industrial systems' intelligence by utilizing edge computing, machine learning techniques, and improved communication protocols. This allows the systems to self-adapt to changing situations. Moreover, it adopts a modular architecture that facilitates interoperability and makes it simple to integrate various devices and technologies. Our IoT framework creates a solid, flexible, and future-proof industrial environment with this all-encompassing strategy, enabling businesses to confidently and effectively traverse Industry 4.0's frontiers.
... Cognitive radio expands the concepts of software-defined radio (SDR) and hardware radio from a device with a specific purpose to a radio that perceives and reacts to its operating environment [4]. Therefore, the radio that automatically predicts the channels in the wireless spectrum adjusts its reception or transmission characteristics to permit many concurrent wireless communications inside a specific frequency band [5]. Cognitive radio is a means to automatically exploit neighbouring unutilized spectrum to provide exclusive spectrum access routes. ...
During the last decade, Cognitive Radio Network (CRN) technology has been a significant advance in addressing the ever-increasing spectrum demand. As the number of licensed and unlicensed users in a network rises to complete a certain activity, the information exchange between various types of traffic becomes more complex and difficult. Congestion in CRN is also caused by the conflict among several users (PUs and SUs) for channel access. In a very crowded network, many applications perform badly owing to packet collisions and, as a result, packet loss before significant buffer queue building. This circumstance is aggravated by an increase in network users. Congestion control is a vital and essential aspect of the present research issue in communication networks. Several recent reviews in the literature indicate that the congestion issue in the CRN has not been thoroughly studied. Thus, effective and efficient congestion control strategies are sought to optimize network resource usage and management. To prevent congestion, research on an efficient congestion management system for CRN is crucial. This will improve the network's resource consumption and performance. "Performance improvement via efficient spectrum management through optimum resource management and congestion control in the CRN by mitigating different threats" is the primary target of this project. This research also focused on enhancing performance by addressing security issues in an IoT-based CRN environment. This study provides a comprehensive review of several similar studies and their limitations, which may be used to formulate a new research target.
The advent of 6G wireless communication promises improvements in signal coverage, data rates, and latency, addressing increased connectivity demands due to the proliferation of 5G, IoT, and augmented reality. This paper introduces a Quantum-Secured IoT Communication Framework designed for 6G Cognitive Radio Networks (CRNs) in order to cater to the growing need for dependable and protected connectivity. Noteworthy aspects of this framework encompass dual-layer authentication utilizing Quantum Key Distribution (QKD) and Public Key Infrastructure (PKI), secured spectrum access regulations, and efficient beamforming strategies. Moreover, the framework employs a Reinforcement Learning-based Ensemble Regression (RL-ER) model for spectrum sensing and a Multi-Layer Perceptron with Kalman Filter (MLP-KF) for Channel State Information (CSI) prediction. Simulations demonstrate that the framework significantly improves prediction accuracy, encryption and decryption times, and error rates, thereby enhancing IoT network performance with better signal coverage, reduced latency, and robust security.
The advancement of 6G cognitive radio networks aims to reduce latency in rural and remote areas. Very few studies have been conducted on this technology. Therefore, this study utilizes massive multiple-input, multiple-output (MIMO) technology for secure data transmission at 6G base stations. Blockchain technology authenticates IDs and maintains secure records for network users, with decentralization achieved through the chimp optimization algorithm. The availability of the spectrum is monitored using the Q-learning hidden sparse variate logistic regression model, and the channel-state information is predicted using the quasi-Newton iterative unscented Kalman filter algorithm. Additionally, beamforming is enhanced through cooperative strategies. Secure routing is facilitated by the golden eagle optimization-hyper elliptic curve cryptography algorithm, where data are routed according to paths determined by the Dijkstra algorithm. The MIMO-6G-cognitive radio-based Internet-of-Things framework performs better compared to existing methods.
Cognitive Radio Ad-hoc Networks (CRAHNs) combine characteristics of ad-hoc networks with cognitive radios to facilitate a variety of communication scenarios. However, these networks are subject to persistent attacks from internal and external adversaries, such as Masquerading, Spoofing, Spying, and Distributed Denial of Service (DDoS). Existing deep learning models proposed to counter these attacks suffer from complexity, real-time processing limitations, and a lack of network scalability. In addition, their limited IP tracing capabilities make them unsuitable for real-time deployments. To address these issues, this paper proposes a novel blockchain-based model for deep traffic pattern analysis and mitigation of malicious attacks in CRAHNs. The proposed model employs a multi-step methodology. The collected traffic patterns are then used to train a deep Convolutional Neural Network (dCNN) model. This trained model permits the temporal classification of incoming real-time traffic packets. The packets are stored on a distributed network based on blockchain technology to ensure data integrity, transparency, and traceability. This blockchain implementation renders the packets immutable and easily accessible, while also facilitating the recommendation of improved mitigation strategies. The blockchain employs a Proof-of-Trust (PoT) consensus mechanism and is managed via a Genetic Algorithm (GA)-based sidechaining model, which significantly reduces access and writing delays for various packets. The proposed model achieves a significant reduction in communication delay in comparison to existing models, with decreases of 35.4%, 19.5%, and 23.5% for BIDS, BIST WM, and GAN, respectively. Additionally, energy consumption is reduced by 18.5%, 18.3%, and 24.5%, respectively. In addition, the model exhibits an increase in throughput of 14.5%, 16.4%, and 15.5%, respectively. Lastly, it improves the accuracy of attack detection during various communications by 8.3%, 23.2%, and 8.3%, respectively. This paper presents a promising solution for enhancing the security of CRAHNs by providing a robust defense against malicious attacks and real-time protection for dynamic ad hoc networks. The integration of deep learning, blockchain technology, and optimization techniques results in significant improvements in performance metrics, demonstrating the potential of the proposed model for ensuring the security and integrity of CRAHNs.
Currently, researchers have motivated a vision of 6G for empowering the new generation of the Internet of Everything (IoE) services that are not supported by 5G. In the context of 6G, more computing resources are required, a problem that is dealt with by Mobile Edge Computing (MEC). However, due to the dynamic change of service demands from various locations, the limitation of available computing resources of MEC, and the increase in the number and complexity of IoE services, intelligent resource provisioning for multiple applications is vital. To address this challenging issue, we propose in this paper IScaler, a novel intelligent and proactive IoE resource scaling and service placement solution. IScaler is tailored for MEC and benefits from the new advancements in Deep Reinforcement Learning (DRL). Multiple requirements are considered in the design of IScaler’s Markov Decision Process. These requirements include the prediction of the resource usage of scaled applications, the prediction of available resources by hosting servers, performing combined horizontal and vertical scaling, as well as making service placement decisions. The use of DRL to solve this problem raises several challenges that prevent the realization of IScaler’s full potential, including exploration errors and long learning time. These challenges are tackled by proposing an architecture that embeds an Intelligent Scaling and Placement module (ISP). ISP utilizes IScaler and an optimizer based on heuristics as a bootstrapper and backup. Finally, we use the Google Cluster Usage Trace dataset to perform real-life simulations and illustrate the effectiveness of IScaler’s multi-application autonomous resource provisioning.
In mobile edge computing (MEC), one of the important challenges is how many resources of which mobile edge server (MES) should be allocated to which user equipment (UE). The existing resource allocation schemes only consider CPU as the requested resource and assume utility for MESs as either a random variable or dependent on the requested CPU only. This paper presents a novel comprehensive utility function for resource allocation in MEC. The utility function considers the heterogeneous nature of applications that a UE offloads to MES. The proposed utility function considers all important parameters, including CPU, RAM, hard disk space, required time, and distance, to calculate a more realistic utility value for MESs. Moreover, we improve upon some general algorithms, used for resource allocation in MEC and cloud computing, by considering our proposed utility function. We name the improved versions of these resource allocation schemes as comprehensive resource allocation schemes. The UE requests are modeled to represent the amount of resources requested by the UE as well as the time for which the UE has requested these resources. The utility function depends upon the UE requests and the distance between UEs and MES, and serves as a realistic means of comparison between different types of UE requests. Choosing (or selecting) an optimal MES with the optimal amount of resources to be allocated to each UE request is a challenging task. We show that MES resource allocation is sub-optimal if CPU is the only resource considered. By taking into account the other resources, i.e., RAM, disk space, request time, and distance in the utility function, we demonstrate improvement in the resource allocation algorithms in terms of service rate, utility, and MES energy consumption.
Mobile Edge Computing (MEC) has emerged as an alternative to cloud computing to meet the latency and Quality-of-Service (QoS) requirements of mobile devices. In this paper, we address the problem of server resource allocation in MEC. Due to the dynamic load conditions on MEC servers, their resources need to be used intelligently to meet the QoS requirements of the users and to minimize server energy consumption. We present a novel resource allocation algorithm, called Power Migration Expand (PowMigExpand). Our algorithm assigns user requests to the optimal server and allocates optimal amount of resources to User Equipment (UE) based on our comprehensive utility function. The PowMigExpand also migrates UE requests due to the mobility of users, to a new server when needed. We also present a low cost Energy Efficient Smart Allocator (EESA) algorithm that uses deep learning for energy efficient allocation of requests to optimal servers. The proposed algorithm considers varying load of incoming requests and their heterogeneous nature, energy efficient activation of servers, and Virtual Machine (VM) migration for smart resource allocation and, thus, is the first comprehensive approach to address the complex and multidimensional resource allocation problem using deep learning. We compare our proposed algorithm with other resource allocation approaches and show that our approach can handle the dynamic load conditions better than the others.
Proliferation of smart Internet of Things (IoT) devices has boosted the improvement of multiple networking functions which have different capabilities in terms of capacity and access delay. Herein, the networking function of IoT devices should be properly selected to fully utilise the capabilities of the different types of networking technologies. In this Letter, a reinforcement learning‐based self‐organising scheme is proposed for the IoT. A node selects an adequate IoT network function and adapts its topology by learning channel circumstance. To verify the performance of the proposed learning‐based scheme, simulations model multiple heterogeneous IoT networks and show that the average latency of IoT devices can be efficiently reduced compared to the conventional benchmark networks (Wi‐Fi and narrow band IoT).
The increasing number of Internet of Things (IoT) devices necessitates a more substantial fog computing infrastructure to support the users' demand for services. In this context, the placement problem consists of selecting fog resources and mapping services to these resources. This problem is particularly challenging due to the dynamic changes in both users' demand and available fog resources. Existing solutions utilize on-demand fog formation and periodic container placement using heuristics due to the NP-hardness of the problem. Unfortunately, constant updates of services are time consuming in terms of environment setup, especially when required services and available fog nodes are changing. Therefore, due to the need for fast and proactive service updates to meet users' demand, and the complexity of the container placement problem, we propose in this paper a Deep Reinforcement Learning (DRL) solution, named Intelligent Fog and Service Placement (IFSP), to perform instantaneous placement decisions proactively. The DRL-based IFSP is developed through a scalable Markov Decision Process (MDP) design. To address the long learning time for DRL to converge, and the high volume of errors needed to explore, we also propose a novel end-to-end architecture utilizing a service scheduler and a bootstrapper.
Cognitive radio network (CRN) emerged to utilize the frequency bands efficiently. To use the frequency bands efficiently without any interference to the licensed user, detection of the frequency holes is the first step, which is called spectrum sensing. In order to increase the quality of local spectrum sensing results, cooperative spectrum sensing (CSS) was introduced in the literature to combine the local sensing results. Recently, machine learning techniques have been designed to improve the classification of images and signals. Specifically, Deep Reinforcement Learning (DRL) is of interest for its substantial improvement in classification problems. In this paper, we propose a DRL-based CSS algorithm, which is employed to decrease the signaling in the network of SUs. The simulation results show the superiority of the proposed approach over state-of-the-art approaches, including Deep Cooperative Sensing (DCS), K-out-of-N, and Support Vector Machine (SVM) based CSS algorithms.
Due to the increase in industrial applications of the Internet of Things (IoT), the number of internet-connected devices has increased accordingly. This has resulted in big challenges in terms of accessibility, scalability, connectivity and adaptability. IoT is capable of creating connections between devices over a wireless medium, but the efficient utilization of scarce spectrum for the establishment of these connections is the biggest concern. To address the spectrum allocation problem, different radio technologies are being utilized. One of the most efficient techniques is cognitive radio, which dynamically allocates the unlicensed spectrum for IoT applications. Spectrum sensing, being the fundamental component of a Cognitive Radio Network (CRN), is threatened by security attacks. The process of spectrum sensing is disturbed by a malicious user (MU), which attacks the primary signal detection and affects the accuracy of the sensing outcome. The presence of such an MU in the system, sending false sensing data, can degrade the performance of cognitive radios. Therefore, in this article, a blockchain-based method is proposed for MU detection in the network. By using this method, an MU can easily be discriminated from a reliable user through cryptographic keys. The efficiency of the proposed mechanism is analyzed through simulations using MATLAB. Consequently, this mechanism can be deployed for the validation of participating users in the process of spectrum sensing in CRN for IoT.
Internet of Things (IoT) is considered the future network to support wireless communications. To realize an IoT network, sufficient spectrum should be allocated for the rapidly increasing IoT devices. Through cognitive radio, unlicensed IoT devices exploit cooperative spectrum sensing (CSS) to opportunistically access a licensed spectrum without causing harmful interference to licensed primary users (PUs), thereby effectively improving the spectrum utilization. However, an open access cognitive IoT allows abnormal IoT devices to undermine the CSS process. Herein, we first establish a hard‐combining attack model according to the malicious behavior of falsifying sensing data. Subsequently, we propose a weighted sequential hypothesis test (WSHT) to increase the PU detection accuracy and decrease the sampling number, which comprises the data transmission status‐trust evaluation mechanism, sensing data availability, and sequential hypothesis test. Finally, simulation results show that when various attacks are encountered, the requirements of the WSHT are less than those of the conventional WSHT for a better detection performance.