A Q-Learning Approach for Real-Time NOMA Scheduling of Medical Data in UAV-aided WBANs

ZEINAB ASKARI1, JAMSHID ABOUEI2 (Senior, IEEE), MUHAMMAD JASEEMUDDIN3 (Member, IEEE), ALAGAN ANPALAGAN4 (Senior, IEEE), KONSTANTINOS N. PLATANIOTIS5 (Fellow, IEEE)
1WINEL research group in the Department of Electrical Engineering, Yazd University (e-mail: askarizeinab1989@gmail.com)
2WINEL research group in the Department of Electrical Engineering, Yazd University (e-mail: abouei@yazd.ac.ir)
3Department of Electrical, Computer and Biomedical Engineering, Ryerson University, Toronto, Canada (e-mail: jaseem@ee.ryerson.ca)
4Department of Electrical, Computer and Biomedical Engineering, Ryerson University, Toronto, Canada (e-mail: alagan@ee.ryerson.ca)
5Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada (e-mail: kostas@comm.utoronto.ca)
ABSTRACT Unmanned Aerial Vehicles (UAVs) have emerged as a flexible and cost-effective solution for remote monitoring of the vital signs of patients in large-scale Internet of Medical Things (IoMT) Wireless Body Area Networks (WBANs). This paper deals with the problem of using UAVs for real-time scheduling of the transmission of vital signs in delay-sensitive IoMT WBANs. The main challenge for such a network is to transmit the vital signs of patients to the remote monitoring center in a timely and reliable manner without interrupting their daily lives. To achieve this goal, we propose a Q-learning-based algorithm that optimizes the trajectory of each UAV, acting as a mobile Base Station (BS), to harvest the vital signs of patients in outdoor applications, especially in unreachable areas. In this algorithm, UAVs learn to reach the best 3D position by exploring the network environment step by step. The best position is the one at which the patients covered by each UAV achieve the highest transmission rate and the lowest delay and energy consumption. Moreover, we employ the Non-Orthogonal Multiple Access (NOMA) technique to schedule multiple transmissions simultaneously by accepting a degree of interference between them, in order to enhance the spectrum efficiency of the network. Finally, the performance of the proposed scheme is evaluated via extensive simulations in terms of throughput, energy consumption, and delay. The simulation results show that the proposed scheme iteratively converges to the benchmark values of these metrics as the UAVs gain information about the cluster environment over successive episodes.
INDEX TERMS IoMT WBAN, UAV, latency, trajectory, NOMA, Q-learning.
I. INTRODUCTION
A. BACKGROUND AND RELATED WORK
THE expansion of Internet of Things (IoT) devices enables real-time monitoring of the vital signs of patients in indoor and outdoor environments without interrupting their daily lives. In recent years, remotely tracking the vital signs of patients has attracted considerable research interest in the context of 5G, 6G, and beyond wireless networks. These devices, including smartphones and smartwatches, should be able to connect to the Internet via Base Stations (BSs) or cloud servers and transmit the vital signs of patients to the remote health-care center with high reliability. These capabilities have recently extended traditional Wireless Body Area Networks (WBANs) to emerging Internet of Medical Things (IoMT)-based WBANs for emergency situations in outdoor environments [1]. In particular, the COVID-19 pandemic, which jeopardized the health and safety of the elderly, resulted in significant dependence on IoMT-based WBANs. Unlike traditional healthcare applications, where vital signs could be monitored only inside hospitals and with wired equipment, recent IoMT-based WBANs do not require patients to be hospitalized in a medical center to have their vital signs monitored. Indeed, in state-of-the-art IoMT-based WBANs, the vital signs of patients are collected via their smartphones or smartwatches and then
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3218675
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
transmitted to the remote monitoring center on a regular basis, without interrupting their everyday life. In other words, these IoMT devices provide Internet access everywhere for remote monitoring of vital signs. In such networks, a massive traffic load is invariably generated by the patients. The most challenging issue in such an ultra-dense e-health network is therefore aggregating the data packets of these patients in a timely and reliable manner. This problem becomes far more critical when emergency conditions arise for patients located in unreachable areas where static BSs cannot provide efficient service. In these delay-sensitive applications, using Unmanned Aerial Vehicles (UAVs) as mobile base stations is a viable solution for real-time remote monitoring of vital signs. As demonstrated in the literature [2]–[4], UAVs provide an efficient and reliable data harvesting system for wireless networks by hovering over outdoor areas. In addition, UAVs can easily access the data packets of patients in unreachable areas by providing air-to-ground communication links, and they can enhance the throughput and energy efficiency of WBANs by adapting their horizontal locations and altitudes.
Furthermore, optimizing the horizontal and vertical trajectory of UAVs remarkably increases the network coverage area compared with static BSs on the ground, which in turn enhances the throughput (e.g., the sum capacity) and the quality of service of the network. The authors in [5] state that the use of UAVs as BSs in future wireless communication networks is gaining significant attention for its ability to yield ultra-flexible deployments in use cases such as disaster recovery. Moreover, [6] asserts that future mobile networks aim to realize larger coverage, support more devices, and achieve higher throughput to meet the explosively increasing demand for data. As a result, there has been growing interest in hybrid cellular networks assisted by UAVs as mobile BSs, owing to their mobility and flexibility. Furthermore, [7] claims that UAVs can provide wireless connectivity even without network infrastructure, or can complement conventional BSs whose coverage may suffer from severe blockage due to tall buildings or from damage caused by natural disasters.
Although UAVs considerably expand the coverage area of the network, efficiently designing their trajectory is a major challenge in IoMT-based WBANs, because the delay and the energy consumption must be reduced while the transmission rate of the network is increased. Several recent works employ Reinforcement Learning (RL)-based algorithms to find the best location of UAVs in different wireless networks such as wireless sensor networks, cellular systems, and vehicular networks [7]–[12]; to the best of our knowledge, however, no RL-based algorithm exists for UAV-assisted WBANs. The relevant RL-based algorithms are generally classified into two main categories: i) model-based and ii) model-free RL schemes. In the model-based class, the agent computes the transition probability distribution and the reward function of all possible state-action pairs and then uses this model to optimize the policy by predicting the actions that lead to higher rewards through interaction with the environment. In contrast, in model-free algorithms such as Q-Learning (QL), the agent does not use the transition probabilities to predict the best action; instead, it optimizes the policy by making direct decisions through a trial-and-error mechanism. Recently, QL algorithms have attracted remarkable research attention for finding an optimal UAV trajectory. QL is based on the Markov Decision Process (MDP) and sequentially finds the best position of the UAVs in each state in order to achieve the highest reward in terms of the aforementioned factors. In addition, the QL agent has no prior knowledge of the environment or of the reward of each state-action transition. It is worth mentioning that complex machine learning algorithms such as DQN, DDQN, and Rainbow need a great deal of time to collect large datasets in a replay buffer and to train their neural networks, so they converge slowly to the optimal value. Taking this issue into account, we employ an efficient Q-learning scheme with a modified state and action space to address the real-time transmission of delay-sensitive data packets in the proposed test-bed.
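To make the trial-and-error mechanism of model-free QL concrete, the following is a minimal sketch of the generic tabular Q-learning update with epsilon-greedy action selection. It is not the algorithm proposed in this paper: the action names, the grid-cell states, the learning rate `lr`, and the discount factor `gamma` are all hypothetical values chosen for illustration.

```python
import random

def q_update(q_table, state, action, reward, next_state, actions, lr=0.1, gamma=0.9):
    """One tabular Q-learning update; unseen (state, action) pairs default to 0."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old_value = q_table.get((state, action), 0.0)
    # Move the old estimate toward the bootstrapped target reward + gamma * max Q.
    new_value = old_value + lr * (reward + gamma * best_next - old_value)
    q_table[(state, action)] = new_value
    return new_value

def epsilon_greedy(q_table, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))

# Hypothetical action set for a UAV moving on a 3D grid (not taken from the paper).
ACTIONS = ["north", "south", "east", "west", "up", "down", "hover"]
q_table = {}
state, next_state = (0, 0, 1), (0, 0, 2)  # hypothetical (x, y, altitude) cells

updated = q_update(q_table, state, "up", reward=1.0,
                   next_state=next_state, actions=ACTIONS)
greedy_action = epsilon_greedy(q_table, state, ACTIONS,
                               epsilon=0.0, rng=random.Random(0))
```

The agent needs no transition model here: it only observes the reward and the next state after acting, which is the property that distinguishes QL from the model-based class described above.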
Although the use of UAVs has been studied in various wireless networks, to the best of our knowledge, only a few research works address the application of UAVs in IoMT-WBANs [13]–[20]. The authors in [13], [14] identify open research issues and challenges in UAV-assisted intelligent health-care systems and describe some of the practical attempts that have been made to employ UAVs in emergency medical services. References [15], [16] study the security issues in outdoor health monitoring systems assisted by UAVs. The authors in [17]–[19] aim to improve the procedure of collecting data from bio-sensors using UAVs. Finally, the authors in [20] focus on optimizing the UAV placement over a serving area where the UAV is considered a fog node that serves the IoMT devices on the ground. To this end, they propose a particle swarm optimization-based algorithm to improve the communication coverage, energy consumption, exploration area, and optimal number of UAVs.
The Non-Orthogonal Multiple Access (NOMA) scheme is another promising technique for reducing the delay and increasing the transmission rate of vital signs to the remote monitoring center. With this scheme, multiple patients can transmit their vital signs simultaneously, provided that the interference at the receiver side remains limited. NOMA achieves this goal by utilizing Superposition Coding (SC) at the transmitter and Successive Interference Cancellation (SIC) at the receiver. Thus, the NOMA scheduling scheme can considerably outperform conventional Orthogonal Multiple Access (OMA) schemes in delay-sensitive applications that require high reliability.
This paper aims to address the aforementioned issues, for
the first time, by simultaneously employing multiple UAVs
as mobile BSs and the NOMA scheduling technique. The main objective of this paper is the timely and reliable transmission of the vital signs of patients in outdoor applications, especially in unreachable areas. To this end, each UAV finds the trajectory along which it harvests the vital signs of its covered patients with increased throughput and reduced delay and energy consumption. In this regard, we propose a Q-learning-based algorithm that explores the environment to reach the best position step by step. The proposed algorithm reduces the computational complexity at the UAVs because it does not require complete information about the entire environment. Additionally, to further increase the transmission rate and reduce the delay, the NOMA technique is employed to schedule multiple transmissions to each UAV in the same time slot.
B. MAIN CONTRIBUTIONS
The key contributions of this paper are briefly explained as
follows:
• This paper investigates, for the first time, the NOMA scheduling of the vital signs of patients in a UAV-enabled IoMT-WBAN for outdoor applications. The patients are equipped with IoMT devices that sense and gather different vital signs, and the UAVs are responsible for collecting the data packets of patients distributed over a city area that includes unreachable locations.
• Our proposed scheduling algorithm comprises two levels. In the first level, we schedule the transmissions of the data packets of the bio-sensors belonging to each patient to the corresponding hub. To eliminate the interference between asynchronous transmissions of bio-sensors, the Walsh Hadamard (WH) coding scheme is employed, in which the sensed data packets of each bio-sensor are multiplied by one of the orthogonal codes extracted from the WH matrix. With this scheme, the transmissions of all bio-sensors of a patient can be scheduled in the same time slot without any collision among them.
• In the second level, the city area is partitioned into multiple clusters and one UAV is assigned to each cluster. All patients belonging to a cluster are served by the corresponding UAV. The transmissions of hubs to UAVs are then scheduled using the NOMA technique, which schedules multiple hubs in the same time slot by accepting a degree of interference between them while satisfying the rate requirements of all these hubs.
• Moreover, we optimize the 3D trajectory of the UAVs by jointly considering the transmission rate, energy consumption, and delay. To achieve this goal, our proposed algorithm numerically solves a multi-objective problem with the Q-learning method. Using this method, we train each UAV individually to find the best 3D location, where it achieves the highest sum rate along with the lowest energy consumption and delay.
• Different priority and emergency levels of vital signs are further challenging issues in IoMT-WBANs. Because of the wide variety of chronic diseases, the vital signs of different patients have different delay sensitivities. Furthermore, in disaster situations, unexpected emergency conditions may arise for some patients, whose vital signs must then be transmitted to the monitoring center in time. In this regard, our proposed algorithm takes the combined effect of data priority, patient priority, and emergency conditions into account in determining the total delay.
The rest of the paper is organized as follows. Section II introduces the preliminaries of our work, including the system model, the channel propagation model, and the NOMA-based uplink transmission model. In Section III, we describe in detail the procedure of our proposed clustering and Q-learning-based trajectory optimization algorithms. In Section V, simulation results are presented to verify the performance of our proposed scheme. Finally, Section VI concludes the paper.
Notation: In this paper, scalars are denoted by italic letters and vectors by boldface lower-case letters. $\mathbb{R}^{M \times 1}$ denotes the space of $M$-dimensional real-valued vectors. For a vector $\mathbf{x}$, $\mathbf{x}^T$ denotes its transpose and $\|\mathbf{x}\|$ represents its Euclidean norm.
II. HEALTH CARE SYSTEM MODEL
In this work, we consider a large-scale UAV-aided WBAN consisting of $M$ UAVs, indexed by $\mathcal{U} = \{u_1, \ldots, u_M\}$, as flying BSs that form an adaptive multi-hop network to serve a set $\mathcal{P} = \{p_1, \ldots, p_K\}$ of patients randomly distributed over a large geographical area of size $(A \times A)\ \mathrm{m}^2$. Each patient $p_i \in \mathcal{P}$ is equipped with a set of $N$ bio-sensors, indexed by the set $\mathcal{B} = \{b_{i1}, \ldots, b_{iN}\}$, for sensing and transmitting the vital signs of the patient's body, and with one IoMT device acting as a hub $H_i$ that gathers the data packets of its bio-sensors and transmits them to the monitoring center. We suppose that patients can move independently in all directions with velocity $v^{t}_{H_i}$, where $t$ represents the current time slot. It should be noted that the velocity of patients is much lower than the velocity of UAVs. Throughout this paper, we occasionally use the term hub $H_i$ instead of the corresponding patient $p_i$. It is assumed that each UAV covers a cluster of terrestrial hubs that satisfy the SINR threshold of the UAV. As will be fully described in Section III, we classify all patients in the network area into $M$ groups based on their locations and horizontal coordinates by proposing a modified version of an unsupervised machine learning method, namely the Fast Global K-Means (FGKM) algorithm. Let $CP_{u_j}$ be the set of all hubs forming the cluster that is covered by UAV $u_j \in \mathcal{U}$, and let $\mathcal{C} = \{CP_{u_1}, \ldots, CP_{u_M}\}$ be the set of all clusters in the network, where $CP_{u_m} \cap CP_{u_n} = \emptyset$, $\forall m \neq n$. Because of the mobility of patients in the network area, the locations and members of the clusters vary over time; thus, the topology of the backbone changes adaptively as the UAVs fly to the locations of the new clusters.
To give more insight into the aforementioned system model, Fig. 1 illustrates a practical IoMT-based WBAN architecture proposed by the research community for the timely and reliable aggregation of the data packets of patients distributed in a smart
city. As illustrated in this figure, the vital signs are first sensed by bio-sensors attached to the patients' bodies or by in-body implant devices, which communicate with a hub for further processing via a star topology. This hub can be an IoMT-based device such as a smartphone or a smartwatch carried on the human body. In this layer, since the communication range is less than a meter, the path loss reduces to the loss at the reference distance, which is far smaller than the path loss in long-range communication models such as WLANs and cellular networks and can therefore be neglected. Thus, the channel propagation model of Tier I in the proposed network model is totally different from that of WLANs and cellular networks.
Since patients are distributed over a smart city area, some of them are located in unreachable areas that cannot be covered by static access points. Therefore, they are not able to transmit their data packets to the monitoring center. This becomes a major problem when an unexpected emergency condition arises for patients in hot spots. UAVs perform remarkably well in online and reliable tracking of the vital signs of patients in outdoor applications. After the vital signs are aggregated at each hub, they are transmitted to the corresponding UAV or to any Access Point (AP) in proximity. At this level, the city area is divided into different clusters and a single UAV covers the patients located in each cluster. This structure is particularly useful in unreachable areas that cannot be covered by stationary base stations. Note that the channel propagation model of Tier II is approximately similar to the channel distribution of UAV-assisted cellular networks. More precisely, as in cellular networks, the Rician distribution is employed to model the channel between transmitters and receivers, which comprises LoS, NLoS, and shadowing terms. However, LoS communication is the dominant term here, in contrast to WLANs, where transmissions mostly occur in indoor environments and, because of walls, doors, and furniture, NLoS links are common between transmitters and receivers.
Finally, the UAVs transmit the data packets to a remote cloud server where medical experts can access them directly and track the vital signs of patients in real time. In Tier III of the proposed system model, the collected vital signs are forwarded to the cloud by the UAVs, and the cloud server then estimates the best path for each data packet in line with its emergency condition and bandwidth requirement. In this tier, data packets are transmitted via wired links, where the attenuation is related to the thermal noise junctions, material, and electromagnetic field of the link, which is completely different from the transmission links in WLANs and cellular networks.
A. BIO-SENSORS’ SIGNALING MODEL
To enhance the spectral efficiency, we suppose that all bio-sensors on one patient's body use the same bandwidth to transmit their sensed vital signs to the corresponding hub. To mitigate the co-channel interference due to the simultaneous transmissions of the different types of bio-sensors of a patient, we assign each bio-sensor a code from the Walsh Hadamard (WH) code space, which guarantees the use of orthogonal codes for transmission. To this end, $\mathbf{WH}_e = [\mathbf{rw}_1, \cdots, \mathbf{rw}_{k+1}]^T \in \mathbb{Z}^{(k+1) \times 2^k}$ is employed as the matrix of pre-specified rows extracted from the original WH matrix:
$$\mathbf{WH}_{2^k} = \begin{bmatrix} \mathbf{WH}_{2^{k-1}} & \mathbf{WH}_{2^{k-1}} \\ \mathbf{WH}_{2^{k-1}} & -\mathbf{WH}_{2^{k-1}} \end{bmatrix}, \quad k = 1, 2, \ldots \tag{1}$$
with $\mathbf{WH}_1 = [1]$. It is shown in [21] that the rows of $\mathbf{WH}_e$ represent codewords that are pairwise orthogonal for every phase shift $\psi = 0, \cdots, 2^k - 1$. The orthogonality of each pair of rows $\mathbf{rw}_i = [c_{i1}, \cdots, c_{i2^k}]$ and $\mathbf{rw}_j = [c_{j1}, \cdots, c_{j2^k}]$, $i \neq j$, is expressed by the following cross-correlation at phase shift $\psi$:
$$\Phi_{\mathbf{rw}_i, \mathbf{rw}_j}(\psi) = \sum_{\ell=1}^{2^k} c^{\psi}_{i\ell} \times c^{\psi}_{j\ell} = 0, \quad \forall \psi = 0, \cdots, 2^k - 1. \tag{2}$$
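Taking this property into account, by multiplying the signal of each bio-sensor $b_{in}$ by one of the rows $\mathbf{rw}_i$, we can guarantee collision-free transmission of the different bio-sensor signals of one patient, i.e.,
$$\mathbf{rw}_i^{\psi} \perp \mathbf{rw}_j^{\psi'} \;\Rightarrow\; \mathbf{s}_{b_{in}} \cdot \mathbf{rw}_i^{\psi} \perp \mathbf{s}_{b_{im}} \cdot \mathbf{rw}_j^{\psi'}, \quad \forall n \neq m, \tag{3}$$
for all $\psi, \psi' = 0, \cdots, 2^k - 1$, where $\mathbf{s}_{b_{in}}$ and $\mathbf{s}_{b_{im}}$ are the signal vectors of bio-sensors $b_{in}$ and $b_{im}$, respectively.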
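As an illustration of the coding step, the sketch below builds the WH matrix with the recursion in (1), checks the cross-correlation of a set of rows under every relative cyclic shift, and spreads two symbols with different rows before separating them again at the hub. The text does not state which rows are pre-specified; choosing row 0 and the rows whose index is a power of two is an assumption made here, as are the helper names and the example symbol values.

```python
def walsh_hadamard(k):
    """Return the 2^k x 2^k Walsh-Hadamard matrix built with the recursion in (1)."""
    wh = [[1]]
    for _ in range(k):
        top = [row + row for row in wh]
        bottom = [row + [-c for c in row] for row in wh]
        wh = top + bottom
    return wh

def cyclic_shift(row, psi):
    """Rotate a codeword to the left by psi chips."""
    psi %= len(row)
    return row[psi:] + row[:psi]

def cross_correlation(row_i, row_j, psi=0):
    """Correlate row_i with row_j shifted by the relative phase psi."""
    return sum(a * b for a, b in zip(row_i, cyclic_shift(row_j, psi)))

def shift_orthogonal(rows):
    """True if every pair of distinct rows has zero correlation at every shift."""
    n = len(rows[0])
    return all(cross_correlation(rows[i], rows[j], psi) == 0
               for i in range(len(rows))
               for j in range(len(rows)) if i != j
               for psi in range(n))

wh8 = walsh_hadamard(3)
# Assumed selection of k + 1 = 4 rows: index 0 and the power-of-two indices.
selected = [wh8[0], wh8[1], wh8[2], wh8[4]]
selected_ok = shift_orthogonal(selected)
adjacent_ok = shift_orthogonal([wh8[2], wh8[3]])  # fails under a shift of one chip

# Two bio-sensors spread one (hypothetical) symbol each and transmit together.
s1, s2 = 3, -2
received = [s1 * a + s2 * b for a, b in zip(wh8[1], wh8[2])]
# The hub recovers each symbol by correlating with the matching codeword.
recovered_1 = sum(r * a for r, a in zip(received, wh8[1])) // len(wh8[1])
recovered_2 = sum(r * b for r, b in zip(received, wh8[2])) // len(wh8[2])
```

Not every pair of WH rows survives a phase shift (rows 2 and 3 of the 8-chip matrix are cyclic shifts of each other), which suggests why only a subset of $k+1$ rows is extracted into $\mathbf{WH}_e$.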
B. UAV CHANNEL PROPAGATION MODEL
In order to model the channel characteristics between each UAV $u_j \in \mathcal{U}$ and a terrestrial hub $H_i$ belonging to cluster $CP_{u_j}$, we compute the 3D Euclidean distance between $u_j$ and $H_i$. The positions of hub $H_i$ and UAV $u_j$ are represented by $\mathbf{p}_{H_i} = [x_{H_i}, y_{H_i}]^T \in \mathbb{R}^{2 \times 1}$ and $\mathbf{p}_{u_j} = [x_{u_j}, y_{u_j}, a_{u_j}]^T \in \mathbb{R}^{3 \times 1}$, respectively, where $(x_{u_j}, y_{u_j})$ denotes the coordinates of UAV $u_j$ on the horizontal plane and $a_{u_j}$ indicates its altitude. Accordingly, the 3D distance between hub $H_i$ and UAV $u_j$ is calculated by:
$$d_{H_i,u_j} = \sqrt{(x_{H_i} - x_{u_j})^2 + (y_{H_i} - y_{u_j})^2 + (a_{u_j})^2}. \tag{4}$$
FIGURE 1. Typical IoMT-based WBAN architecture equipped with UAV technology.
We also consider a well-known Air-To-Ground (ATG) channel model [22], in which there are two main propagation groups. Depending on the altitude of the UAVs and the elevation angle between hub $H_i$ and UAV $u_j$, denoted by $\theta_{H_i,u_j}$, these groups are categorized as Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS) communication links. The probability of LoS communication between UAV $u_j$ at altitude $a_{u_j}$ and hub $H_i$ at Euclidean distance $d_{H_i,u_j}$ is formulated by the sigmoid function as follows:
$$P^{\mathrm{LoS}}_{H_i,u_j} = \frac{1}{1 + \alpha \exp\left(-\beta\left(\frac{180}{\pi}\theta_{H_i,u_j} - \alpha\right)\right)}, \tag{5}$$
where $\alpha$ and $\beta$ are environment constants that represent the ratio of built-up area to total land area multiplied by the mean number of buildings per unit area, and the distribution of building heights, respectively. Furthermore, the elevation angle $\theta_{H_i,u_j}$ is defined as
$$\theta_{H_i,u_j} = \tan^{-1}\left(\frac{a_{u_j}}{r_{H_i,u_j}}\right), \tag{6}$$
where $r_{H_i,u_j} = \sqrt{(x_{H_i} - x_{u_j})^2 + (y_{H_i} - y_{u_j})^2}$. On the other hand, the probability of NLoS propagation is obtained as
$$P^{\mathrm{NLoS}}_{H_i,u_j} = 1 - P^{\mathrm{LoS}}_{H_i,u_j}. \tag{7}$$
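The geometry and LoS model in (4)–(7) can be evaluated directly. The following is a small sketch, not the authors' simulation code: the function names are hypothetical, and the values $\alpha = 9.61$ and $\beta = 0.16$ are illustrative constants of the kind often quoted for urban environments, assumed here rather than taken from this paper.

```python
import math

def distance_3d(hub_xy, uav_xyz):
    """3D Euclidean distance between a ground hub and a UAV, as in (4)."""
    dx = hub_xy[0] - uav_xyz[0]
    dy = hub_xy[1] - uav_xyz[1]
    return math.sqrt(dx ** 2 + dy ** 2 + uav_xyz[2] ** 2)

def elevation_angle(hub_xy, uav_xyz):
    """Elevation angle in radians, as in (6); atan2 also handles r = 0."""
    r = math.hypot(hub_xy[0] - uav_xyz[0], hub_xy[1] - uav_xyz[1])
    return math.atan2(uav_xyz[2], r)

def p_los(theta_rad, alpha, beta):
    """LoS probability of (5); the angle is converted to degrees inside."""
    theta_deg = math.degrees(theta_rad)
    return 1.0 / (1.0 + alpha * math.exp(-beta * (theta_deg - alpha)))

# Illustrative values only (assumed, not from the paper).
ALPHA, BETA = 9.61, 0.16
hub = (0.0, 0.0)
uav = (30.0, 40.0, 50.0)  # horizontal range 50 m, altitude 50 m

d = distance_3d(hub, uav)
theta = elevation_angle(hub, uav)
prob_los = p_los(theta, ALPHA, BETA)
prob_nlos = 1.0 - prob_los  # (7)
```

With these numbers the elevation angle is 45 degrees and the LoS probability is close to 0.97; raising the UAV increases the angle and therefore the LoS probability, at the cost of a longer link distance.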
The vital signs collected by the hubs propagate through the network area, where they experience shadowing and scattering caused by skyscrapers and other large obstacles; they then enter free space and reach the corresponding UAVs. The shadowing effect, represented by $X_{H_i,u_j}$, follows a Gaussian distribution and imposes excess loss on the ATG link. Accordingly, the LoS and NLoS path loss expressions for a given hub $H_i$ and UAV $u_j$ are determined as
$$L^{\mathrm{LoS}}_{H_i,u_j}\,[\mathrm{dB}] = L_0 + 10\,\eta_{\mathrm{LoS}} \log d_{H_i,u_j} + X^{\mathrm{LoS}}_{H_i,u_j}, \tag{8}$$
$$L^{\mathrm{NLoS}}_{H_i,u_j}\,[\mathrm{dB}] = L_0 + 10\,\eta_{\mathrm{NLoS}} \log d_{H_i,u_j} + X^{\mathrm{NLoS}}_{H_i,u_j}, \tag{9}$$
where $L_0 = 20 \log\left(\frac{4 \pi f_c d_0}{c}\right)$ represents the path loss at a reference distance $d_0$, and $f_c$ and $c$ denote the carrier frequency and the speed of light, respectively. Moreover, $\eta_{\mathrm{LoS}}$ and $\eta_{\mathrm{NLoS}}$ are the path loss exponents for LoS and NLoS links. Based on the aforementioned definitions, the average path loss between hub $H_i$ and UAV $u_j$ can be calculated as follows:
$$\bar{L}_{H_i,u_j} = P^{\mathrm{LoS}}_{H_i,u_j} L^{\mathrm{LoS}}_{H_i,u_j} + P^{\mathrm{NLoS}}_{H_i,u_j} L^{\mathrm{NLoS}}_{H_i,u_j}. \tag{10}$$
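In addition, the small-scale channel fading coefficient for transmitting a symbol from hub $H_i$ to UAV $u_j$, denoted by $\tilde{C}_{H_i,u_j}$, is represented by a complex Gaussian random variable with non-zero expected value and variance $\sigma^2$. Taking the above definitions into account, the instantaneous channel coefficient between hub $H_i$ and UAV $u_j$ is represented as
$$C_{H_i,u_j} = \frac{\tilde{C}_{H_i,u_j}}{\sqrt{\bar{L}_{H_i,u_j}}}. \tag{11}$$
Thus, the average channel gain between hub $H_i$ and UAV $u_j$, given $E\left[\left|\tilde{C}_{H_i,u_j}\right|^2\right] = 1$, is computed as
$$\bar{g}_{H_i,u_j} = E\left[\left|\frac{\tilde{C}_{H_i,u_j}}{\sqrt{\bar{L}_{H_i,u_j}}}\right|^2\right] = \frac{E\left[\left|\tilde{C}_{H_i,u_j}\right|^2\right]}{\bar{L}_{H_i,u_j}} = \left(\bar{L}_{H_i,u_j}\right)^{-1}. \tag{12}$$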
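The chain from (8) to (12) can be sketched numerically as follows. Several points here are assumptions rather than statements from the paper: the logarithms are taken as base 10 (consistent with the dB unit), the average path loss in dB is converted to linear scale before it is inverted in (12), the shadowing terms are passed in as already-sampled values, and the carrier frequency and path loss exponents are hypothetical.

```python
import math

def reference_loss_db(fc_hz, d0_m=1.0, c=3.0e8):
    """Path loss L0 at the reference distance d0, in dB."""
    return 20.0 * math.log10(4.0 * math.pi * fc_hz * d0_m / c)

def path_loss_db(l0_db, eta, d_m, shadowing_db=0.0):
    """Log-distance path loss of (8)/(9) with a given shadowing sample."""
    return l0_db + 10.0 * eta * math.log10(d_m) + shadowing_db

def average_path_loss_db(p_los, loss_los_db, loss_nlos_db):
    """Probability-weighted average of the LoS and NLoS losses, as in (10)."""
    return p_los * loss_los_db + (1.0 - p_los) * loss_nlos_db

def average_channel_gain(avg_loss_db):
    """Linear-scale gain 1 / L of (12); the dB-to-linear step is an assumption."""
    return 10.0 ** (-avg_loss_db / 10.0)

# Hypothetical parameters for illustration.
FC_HZ = 2.0e9
ETA_LOS, ETA_NLOS = 2.0, 3.5

l0 = reference_loss_db(FC_HZ)
loss_los = path_loss_db(l0, ETA_LOS, 100.0)
loss_nlos = path_loss_db(l0, ETA_NLOS, 100.0)
avg_loss = average_path_loss_db(0.9, loss_los, loss_nlos)
gain = average_channel_gain(avg_loss)
```

Because the NLoS exponent is larger, the average loss always lies between the LoS and NLoS losses and moves toward the LoS value as the LoS probability approaches one.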
C. NOMA-BASED UPLINK TRANSMISSION MODEL
As previously mentioned, one of the responsibilities of hub $H_i$ is to transmit the collected vital signs of patient $p_i$ to the corresponding UAV $u_j$. Once the hubs are partitioned into different clusters, all hubs covered by UAV $u_j$, i.e., $\forall H_i \in CP_{u_j}$, employ the NOMA signaling technique to transmit their data packets to UAV $u_j$ simultaneously. In fact, the signals of the different $H_i \in CP_{u_j}$ are superposed with different transmission powers and then sent to UAV $u_j$. The transmission power of each hub $H_i \in CP_{u_j}$ is a fraction of the total transmission power $P_{total}$ supported by UAV $u_j$. Thus, the average channel gain is computed for all members of $CP_{u_j}$, and the members are then indexed in descending order of gain:
$$\bar{g}_{H_1,u_j} \geq \cdots \geq \bar{g}_{H_{|CP_{u_j}|},u_j}, \quad \forall H_i \in CP_{u_j}, \tag{13}$$
which implies that the first and the last hubs in cluster $CP_{u_j}$ have the strongest and the weakest channel conditions, respectively. Based on the criterion in (13), the NOMA technique assigns a fraction of $P_{total}$ to each $H_i \in CP_{u_j}$ by employing power coefficients $\zeta_1 \leq \cdots \leq \zeta_{|CP_{u_j}|}$, where $\sum_{i=1}^{|CP_{u_j}|} \zeta_i = 1$. More specifically, the maximum coefficient is assigned to the hub with the weakest channel condition to form its power $\zeta_{|CP_{u_j}|} P_{total} = P_{H_{|CP_{u_j}|}}$, while the minimum coefficient is allocated to the hub with the strongest channel condition. Accordingly, the transmission powers of all $H_i \in CP_{u_j}$ are arranged in ascending order as follows:
$$P_{H_1} \leq \cdots \leq P_{H_{|CP_{u_j}|}}, \quad \forall H_i \in CP_{u_j}. \tag{14}$$
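Based on the above power allocation technique, the signal transmitted by each $H_i \in CP_{u_j}$ is expressed as follows:
$$x_{H_i} = \sqrt{P_{H_i}}\, s_{H_i} = \sqrt{\zeta_i P_{total}}\, s_{H_i}, \tag{15}$$
where $s_{H_i}$ indicates the information of hub $H_i$ with $E\left[|s_{H_i}|^2\right] = 1$. In addition, the received signal at UAV $u_j$ can be expressed as
$$r_{u_j} = \sum_{i=1}^{|CP_{u_j}|} C_{H_i,u_j}\, x_{H_i} + n_{u_j} = \sum_{i=1}^{|CP_{u_j}|} \sqrt{\frac{\zeta_i P_{total}}{\bar{L}_{H_i,u_j}}}\; \tilde{C}_{H_i,u_j}\, s_{H_i} + n_{u_j}, \tag{16}$$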
where $n_{u_j}$ denotes the zero-mean additive white Gaussian noise with power $N_0$ at UAV $u_j$. On the other hand, the NOMA scheduling technique employs the SIC scheme at the receiving UAV $u_j$ to detect the message signal of each $H_i \in CP_{u_j}$ without co-channel interference. To this end, the SIC scheme first detects the signal of the hub with the strongest channel condition and treats the signals of the weaker hubs as interference. Subsequently, this signal is subtracted from the total received signal. This procedure is repeated until the signals of all $H_i \in CP_{u_j}$ are detected separately. Under this procedure, the received SINR of hub $H_i$ at UAV $u_j$, represented by $\Gamma_{H_i,u_j}$, is obtained as
$$\Gamma_{H_i,u_j} = \frac{\zeta_i P_{total}\, \bar{g}_{H_i,u_j}}{\sum_{k=i+1}^{|CP_{u_j}|} \zeta_k P_{total}\, \bar{g}_{H_k,u_j} + N_0}, \quad \forall H_i \in CP_{u_j}. \tag{17}$$
The data rate of $H_i \in CP_{u_j}$ is computed as follows:
$$R_{H_i,u_j} = W \log_2\left(1 + \Gamma_{H_i,u_j}\right), \tag{18}$$
where $W$ is the total bandwidth. In order to ensure the reliability of the received information, the received SINR of hub $H_i$ at UAV $u_j$ should satisfy the following constraint:
$$\Gamma_{H_i,u_j} \geq \Gamma_{th}, \tag{19}$$
where $\Gamma_{th}$ is the minimum SINR of hub $H_i$ that is required to achieve a satisfactory Bit Error Rate (BER) for all UAVs.
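The ordering in (13) and (14), the SIC-based SINR in (17), the rate in (18), and the constraint in (19) can be combined into a short sketch. This is an illustration under stated assumptions: the paper does not give the values of the power coefficients in this section, so they are supplied by the caller, and the gains, noise power, bandwidth, and threshold below are hypothetical.

```python
import math

def noma_uplink(avg_gains, power_coeffs, p_total, noise_power, bandwidth_hz):
    """Return (sinr, rate) per hub, ordered from strongest to weakest channel."""
    if len(avg_gains) != len(power_coeffs):
        raise ValueError("one power coefficient is needed per hub")
    if abs(sum(power_coeffs) - 1.0) > 1e-9:
        raise ValueError("power coefficients must sum to one")
    gains = sorted(avg_gains, reverse=True)  # descending gains, as in (13)
    coeffs = sorted(power_coeffs)            # ascending powers, as in (14)
    results = []
    for i, (gain, coeff) in enumerate(zip(gains, coeffs)):
        # Hubs with weaker channels (larger index) remain as interference.
        interference = sum(c * p_total * g
                           for c, g in zip(coeffs[i + 1:], gains[i + 1:]))
        sinr = coeff * p_total * gain / (interference + noise_power)  # (17)
        rate = bandwidth_hz * math.log2(1.0 + sinr)                   # (18)
        results.append((sinr, rate))
    return results

# Hypothetical two-hub cluster.
results = noma_uplink(avg_gains=[1e-6, 4e-6], power_coeffs=[0.75, 0.25],
                      p_total=1.0, noise_power=1e-7, bandwidth_hz=1e6)
sinr_threshold = 1.0  # assumed value of the threshold in (19)
all_reliable = all(sinr >= sinr_threshold for sinr, _ in results)
```

In this example the strongest hub receives the smaller coefficient and sees the weaker hub as interference, while the weakest hub is decoded last and is limited only by noise.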
D. ENERGY CONSUMPTION AND DELAY MODEL
The total energy consumption of each hub $H_i$ for transmitting $L_b$ data bits to UAV $u_j$ consists of the following two components:
$$E_{H_i,u_j} = E^{elec}_{H_i} + E^{Tx}_{H_i,u_j}, \tag{20}$$
where $E^{elec}_{H_i}$ and $E^{Tx}_{H_i,u_j}$ represent the energy consumption of the electronic circuits of hub $H_i$ and the energy consumption of transmitting data packets over the channel, respectively. The amount of $E^{Tx}_{H_i,u_j}$ depends on the channel condition, distance, transmission power, and packet length, and it can be calculated by:
$$E^{Tx}_{H_i,u_j} = \frac{P_{H_i} L_b}{R_{H_i,u_j}} = \frac{\zeta_i P_{total} L_b}{W \log_2\left(1 + \Gamma_{H_i,u_j}\right)}. \tag{21}$$
Since the altitude of each UAV is optimized to maximize the throughput of the network, the distance between each UAV and its corresponding hubs fluctuates between short and long ranges. When the distance between hubs and UAVs is sufficiently large, $E^{elec}_{H_i}$ is considerably smaller than $E^{Tx}_{H_i,u_j}$; thus, we ignore this term in determining $E_{H_i,u_j}$. Moreover, UAVs are battery-powered with limited energy capacity, so if the remaining energy of a UAV falls below a specified value, the vital signs will not be forwarded in time. In this regard, the parameter $E_{th}$ is defined as the remaining energy threshold of the UAVs. In our proposed algorithm, the remaining energy of every UAV must be higher than $E_{th}$ to ensure the real-time and reliable transmission of vital signs.
The total delay experienced by the data packets of hub $H_i$ until they are received by UAV $u_j$ is the sum of two terms:
$$D_{H_i,u_j} = D^{acc}_{H_i} + D^{Tx}_{H_i,u_j}, \tag{22}$$
where $D^{acc}_{H_i}$ and $D^{Tx}_{H_i,u_j}$ denote the time required for hub $H_i$ to access the channel and the time for transmitting one data packet of size $L_b$ on the channel, respectively. In this regard, $D^{Tx}_{H_i,u_j}$ is obtained as
$$D^{Tx}_{H_i,u_j} = \frac{L_b}{R_{H_i,u_j}} = \frac{L_b}{W \log_2\left(1 + \Gamma_{H_i,u_j}\right)}. \tag{23}$$
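Equations (21) and (23) share the term $L_b / R_{H_i,u_j}$, so the transmission energy is simply the hub power multiplied by the transmission delay. A minimal sketch of this relationship is given below; the function names are hypothetical, and the access delay is treated as a given input because its computation is not specified here.

```python
def tx_delay(packet_bits, rate_bps):
    """Transmission delay of one packet, as in (23), in seconds."""
    return packet_bits / rate_bps

def tx_energy(hub_power_w, packet_bits, rate_bps):
    """Transmission energy, as in (21): hub power times transmission delay."""
    return hub_power_w * tx_delay(packet_bits, rate_bps)

def total_delay(access_delay_s, packet_bits, rate_bps):
    """Total delay, as in (22); the access delay is an assumed input."""
    return access_delay_s + tx_delay(packet_bits, rate_bps)
```

For a fixed packet size and hub power, a higher rate reduces both quantities at once.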
E. 3D TRAJECTORY MODEL OF UAV
We assume that the initial and final locations of each UAV $u_j$ are represented by $p^{(0)}_{u_j} = \left[x^{(0)}_{u_j}, y^{(0)}_{u_j}, a^{(0)}_{u_j}\right]^T$ and $p^{(F)}_{u_j} = \left[x^{(F)}_{u_j}, y^{(F)}_{u_j}, a^{(F)}_{u_j}\right]^T$, respectively. In order to show the time-spatial changes of UAVs, we divide the time horizon $T$ into multiple equal-length time slots represented by the set $\mathcal{T} = \{t_0, \cdots, t_Z\}$. The length of each time slot is obtained by $\delta = T/Z$. The maximum horizontal (in the $xy$ plane) and vertical speeds of each UAV $u_j$ are indicated by $v^{u_j}_{xy}$ and $v^{u_j}_{a}$, where it is supposed that each UAV can independently control its horizontal and vertical speeds [23]. Accordingly, the maximum horizontal and vertical distances that UAV $u_j$ spans during each time slot are calculated by $S^{u_j}_{xy} = v^{u_j}_{xy}\delta$ and $S^{u_j}_{a} = v^{u_j}_{a}\delta$, respectively. In this regard, considering
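$q_{u_j} = \left[x_{u_j}, y_{u_j}\right]^T$ as the horizontal coordinates of UAV $u_j$, the distance between two consecutive coordinates of UAV $u_j$ should satisfy the following constraints:
$$\left\| q^{(t_{\ell+1})}_{u_j} - q^{(t_\ell)}_{u_j} \right\| \leq S^{u_j}_{xy}, \quad \left| a^{(t_{\ell+1})}_{u_j} - a^{(t_\ell)}_{u_j} \right| \leq S^{u_j}_{a}. \tag{24}$$
Moreover, to avoid obstacles such as buildings, each UAV should fly above the minimum altitude denoted by $a_{min}$, i.e., $a_{u_j} \geq a_{min}$.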
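The mobility constraints in (24) and the minimum-altitude requirement can be checked per time slot as in the following sketch. The function name and the tuple layout `(x, y, a)` are illustrative assumptions.

```python
import math

def step_is_feasible(p_now, p_next, v_xy, v_a, delta, a_min):
    """Check one move of a UAV between consecutive time slots.

    p_now, p_next : (x, y, a) positions in consecutive slots (assumed layout)
    v_xy, v_a     : maximum horizontal and vertical speeds
    delta         : time-slot length
    a_min         : minimum allowed altitude
    """
    (x0, y0, a0), (x1, y1, a1) = p_now, p_next
    horizontal = math.hypot(x1 - x0, y1 - y0)   # Euclidean distance in the xy plane
    vertical = abs(a1 - a0)
    return (horizontal <= v_xy * delta
            and vertical <= v_a * delta
            and a1 >= a_min)
```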
F. PROBLEM FORMULATION
Based on the aforementioned definitions, the main objective of our proposed algorithm is to jointly maximize the total effective throughput and minimize the total energy consumption and total delay in each time slot. To this end, these factors are defined as the sum of rates, the sum of energy consumption, and the sum of delays of all hubs belonging to cluster $CP_{u_j}$ that are allowed to connect to UAV $u_j$, respectively. These factors are determined by:
$$R^{(t_\ell)}_{u_j} = \sum_{\forall H_i \in H^{(t_\ell)}} R_{H_i,u_j}, \tag{25}$$
$$E^{(t_\ell)}_{u_j} = \sum_{\forall H_i \in H^{(t_\ell)}} E_{H_i,u_j}, \tag{26}$$
$$D^{(t_\ell)}_{u_j} = \sum_{\forall H_i \in H^{(t_\ell)}} D_{H_i,u_j}, \tag{27}$$
where $H^{(t_\ell)}$ is the set of all hubs $H_i$ connected to UAV $u_j$ in time slot $t_\ell$ based on the real-time NOMA scheduling technique. Accordingly, we aim to optimize these three metrics simultaneously, as mathematically expressed below:
$$\max_{\left\{q^{(\ell)}_{u_j}\right\}} \left( R^{(t_\ell)}_{u_j},\; \frac{1}{E^{(t_\ell)}_{u_j}},\; \frac{1}{D^{(t_\ell)}_{u_j}} \right). \tag{28}$$
It is worth mentioning that there exists a correlation among
sum rates, sum energy consumption, and sum delays as the
main optimization objectives in the proposed IoMT-based
WBAN. Thus, if these objectives are separately optimized,
achieving the local optimum value of one variable leads
to inefficient values of the others. Accordingly, there is a
tradeoff between optimizing rate and delay values along
with a tradeoff between delay and energy consumption. For
more clarification, consider a situation in which an unex-
pected emergency condition would occur to a patient, but
the corresponding communications suffer from poor channel
conditions. In this case, the network experiences a delay-
sensitive situation, but at the cost of a lower transmission
rate (according to the Shannon capacity formula). Under these circumstances, optimizing the delay alone leads to an inefficient transmission rate, and vice versa. As another example, consider two different transmissions to be scheduled in the current time slot, and assume that the first transmission has higher delay sensitivity and higher energy consumption than the second one. In this situation, if only the delay is optimized, the first transmission is selected to access the channel; in contrast, if only the energy consumption is considered in the optimization problem, the second transmission is chosen. Moreover, based on (21), there is an inverse relationship between energy consumption and transmission rate. Consequently, by selecting the transmissions that have the highest data rate in each time slot, the energy consumption is reduced to its minimum value.
Taking the above considerations into account, our objective
is simultaneously optimizing all of these factors to design an
energy-efficient scheduling algorithm for timely and reliably
transmitting the vital signs to the monitoring center. As will
be shown in subsection III-A, to fairly combine the impacts
of these metrics in the aforementioned multi-objective opti-
mization problem, the computed value of each metric would
be normalized to its maximum value. Using this method, the
values of all metrics are mapped to the interval of [0,1].
III. THE PROPOSED Q-REDTO ALGORITHM
In this section, we propose a two-layer scheduling algorithm
for optimizing the trajectory of UAVs, namely Q-Learning-
based Rate- Energy- and Delay-aware Trajectory Optimizer
(Q-REDTO). The first layer is related to scheduling the
transmission of vital signs sensed by bio-sensors of one
patient to the corresponding hub, while the second layer
schedules the transmission of collected signals by each hub
to the monitoring center via UAVs. The procedure of our
proposed Q-REDTO algorithm is explained in the following
four stages:
Stage 1: Interference Avoidance Scheduling in Tier I:
In the first stage, the vital signals of different bio-sensors are
timely scheduled to transmit to the corresponding hub. Since
all bio-sensors of a typical patient share the same bandwidth
along with their asynchronous duty cycling mechanism, the
collision is not avoidable between transmissions of these bio-
sensors, without using any scheduling technique. To solve
this problem, we employ the WH coding scheme, in which the sensed data of all bio-sensors are simultaneously delivered to the hub after multiplying them by the cyclic orthogonal WH codes extracted from (1). This process mitigates the interference between concurrent transmissions because these codes are orthogonal at every phase shift. For more clarification, consider $s_{b_{in}} = [s_{b_{in}}(1), \cdots, s_{b_{in}}(X)]^T$ and $s_{b_{im}} = [s_{b_{im}}(1), \cdots, s_{b_{im}}(Y)]^T$ as the vital signs of bio-sensors $b_{in}$ and $b_{im}$, $n \neq m$. As described in the previous section, since the cross-correlation of two different cyclic orthogonal codes $rw^{\psi}_{i}$ and $rw^{\psi'}_{j}$, $i \neq j$, is equal to zero, it can be proved that by multiplying $s_{b_{in}}$ by $rw^{\psi}_{i}$ and $s_{b_{im}}$ by $rw^{\psi'}_{j}$, and denoting $\tilde{s}_{b_{in}} = s_{b_{in}} \cdot rw^{\psi}_{i}$ and $\tilde{s}_{b_{im}} = s_{b_{im}} \cdot rw^{\psi'}_{j}$, the cross-correlation of these two signals will be zero, i.e.,
$$\Phi_{\tilde{s}_{b_{in}}, \tilde{s}_{b_{im}}}(\psi, \psi') = \sum_{x=1}^{X} \sum_{y=1}^{Y} \sum_{\kappa=1}^{2^k} s_{b_{in}}(x)\, c^{\psi}\, s_{b_{im}}(y)\, c^{\psi'} = 0, \quad \forall \psi, \psi' = 0, \cdots, 2^k - 1,\; n \neq m,\; i \neq j. \tag{29}$$
To highlight the benefit of the above interference avoidance scheduling in a practical WBAN, let us consider four different medical bio-sensors: two MAX30100 chips used for ECG and SpO2 signals, the MPS20N0040D sensor for sensing the systolic blood pressure, the MAX30205MAT employed for measuring body temperature, and the X2M200 chip exploited for sensing respiration rate. The different sampling frequencies of these bio-sensors are shown in Fig. 2. As can be seen from the figure, because of the asynchronous sampling frequencies of the bio-sensors, interference between the transmissions is inevitable. To avoid the interference, the cyclic orthogonal WH coding scheme is employed, which multiplies the vital signs by a set of codes that are orthogonal at every phase shift. Besides, this simultaneous transmission of vital signs reduces the transmission delay of the proposed algorithm, which satisfies the real-time monitoring of medical data in delay-sensitive WBAN applications.
Stage 2: Patients Clustering Method: After the vital signs of the bio-sensors are transmitted to their corresponding hubs, the hubs communicate with UAVs to deliver data packets to the monitoring center. As depicted in Fig. 3, because multiple UAVs are employed in the network, we classify the hubs into different clusters using an unsupervised machine learning-based scheme, namely Fast Global K-Means (FGKM) [24]. This algorithm establishes an incremental deterministic global optimization method that optimally adds one new cluster at each step until convergence. Assuming a set of hubs $H_i \in P$ distributed in the geographical area, the FGKM algorithm partitions these hubs into $M$ disjoint clusters to optimize a specific criterion. In the first iteration, the hub that optimizes the criterion is selected as the optimal cluster center $cc_1$ for the $m = 1$ clustering problem. Subsequently, in the second iteration, i.e., $m = 2$, one new optimal cluster center $cc_2$ is added by assuming that the previous $cc_1$ is the first optimal cluster center in the current iteration. This procedure is repeated until the objective function converges to its minimum value. We define this objective function as the Mean Square Error (MSE) of the distances between each $H_i$ and the corresponding cluster center $cc_m$, i.e.,
$$\mathcal{E}_M(cc_1, \cdots, cc_M) = \frac{1}{K} \sum_{i=1}^{K} \min_{m=1,\cdots,M} \left\| cc_m - H_i \right\|^2. \tag{30}$$
In each iteration, the FGKM algorithm adds the cluster that minimizes the upper bound of the MSE in the current iteration, i.e., $\mathcal{E}_{m-1} - \Delta_m$, where $\mathcal{E}_{m-1}$ denotes the upper bound of the MSE in the previous iteration and
$$\Delta_m = \arg\max_{H'_i \in P} \Delta_{H'_i}, \tag{31}$$
in which
$$\Delta_{H'_i} = \frac{1}{K} \sum_{i=1}^{K} \max\left( \left\| cc_{m-1} - H_i \right\|^2 - \left\| H'_i - H_i \right\|^2,\; 0 \right). \tag{32}$$
In the above equations, $H'_i$ denotes the candidate hub for being the new cluster center, and $\| cc_{m-1} - H_i \|^2$ represents the squared distance between hub $H_i$ and its previous closest cluster center $cc_{m-1}$. Based on (32), if the squared distance between hub $H_i$ and $H'_i$ in each iteration is smaller than the squared distance of $H_i$ to $cc_{m-1}$, $H_i$ is added to the set of cluster members of $H'_i$. Thus, regarding (31), this metric selects as the new cluster center $cc_m$ the hub that has the largest number of other hubs in its proximity and whose adjacent hubs are furthest from $cc_1, \cdots, cc_{m-1}$. In this situation, it can be easily shown that $\mathcal{E}_m \leq \mathcal{E}_{m-1} - \Delta_m$. Accordingly, this metric reduces the computational complexity of the FGKM scheme and speeds up the convergence of the objective function. We summarize the aforementioned procedure of the FGKM scheme as pseudocode in Algorithm 1.
Algorithm 1 Pseudocode of the Proposed FGKM Algorithm
1: Input: Positions of all $H_i \in P$ and number of UAVs $M$.
2: Output: $\{cc_1, \cdots, cc_M\}$ and $C = \{CP_{u_1}, ..., CP_{u_M}\}$.
3: for $m = 1, \cdots, M$ do
4:     if $m == 1$ then
5:         for each $H'_i \in P$ do
6:             Calculate $\mathcal{E}_1$ from (30).
7:         end for
8:         $cc_1 \leftarrow H'_i \leftarrow \arg\min \mathcal{E}_1$ over all $H'_i \in P$.
9:     else
10:         Suppose $cc_1, \cdots, cc_{m-1}$ as the existing optimal cluster centers.
11:         for each $H'_i \in P$ do
12:             Calculate $\Delta_{H'_i}$ according to (32).
13:         end for
14:         Compute $\Delta_m$ using (31).
15:         $cc_m \leftarrow H'_i \leftarrow \arg\min(\mathcal{E}_{m-1} - \Delta_m)$ over $H'_i \in P$.
16:     end if
17: end for
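The incremental selection rule of Algorithm 1 can be sketched in a few lines of Python. The sketch below is only an illustration under simplifying assumptions: hubs are given as coordinate tuples, candidate centers are restricted to hub positions, the constant factor $1/K$ is omitted because it does not change the selected hub, and no k-means refinement is run after each insertion. The function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def fgkm_centers(hubs, num_clusters):
    """Pick cluster centers one at a time, following the rule in (31)-(32)."""
    # m = 1: the hub with the smallest total squared distance to all hubs.
    first = min(hubs, key=lambda c: sum(sq_dist(c, h) for h in hubs))
    centers = [first]
    # nearest[i] is the squared distance of hub i to its closest center so far.
    nearest = [sq_dist(first, h) for h in hubs]
    for _ in range(1, num_clusters):
        def gain(candidate):
            # Guaranteed reduction of the error if `candidate` becomes a center.
            return sum(max(d - sq_dist(candidate, h), 0.0)
                       for d, h in zip(nearest, hubs))
        best = max(hubs, key=gain)
        centers.append(best)
        nearest = [min(d, sq_dist(best, h)) for d, h in zip(nearest, hubs)]
    return centers

def assign_clusters(hubs, centers):
    """Index of the closest center for every hub."""
    return [min(range(len(centers)), key=lambda m: sq_dist(centers[m], h))
            for h in hubs]
```

For two well-separated groups of hubs, the second center is placed in the group that does not contain the first one, since that choice yields the largest guaranteed error reduction.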
Stage 3: Q-Learning-based Solution for Trajectory Optimization and Vital Signals Scheduling: After partitioning the area into different clusters, all hubs belonging to the same cluster are served by the same UAV. The optimal number of UAVs is determined by the FGKM algorithm. By dividing the total time frame into multiple time slots, we design the Q-REDTO algorithm in such a way that the best vertical and horizontal position of each UAV is determined in each time slot. Following the objective function given in subsection II-F, the best position is the one in which each UAV achieves the highest sum rate along with the least energy consumption and delay. The Q-REDTO algorithm employs the QL-based framework to solve the multi-objective trajectory optimization problem of UAVs to aggregate data packets of their corresponding hubs in each cluster. The Q-REDTO algorithm consists of four core elements, i.e., states, actions, reward, and Q-values, which are described as follows:
States: Since the WBAN environment is generally a city
FIGURE 2. Typical duty cycle of four practical medical bio-sensors: MAX30100 chip, MPS20N0040D sensor, MAX30205MAT, and X2M200 chip.
FIGURE 3. A typical city area partitioned into 10 clusters using FGKM where the green star shows the UAV with the optimum value of altitude serving patients in its corresponding cluster. Altitude labels shown in the figure: $a_{u_1} = 200$, $a_{u_2} = 220$, $a_{u_3} = 190$, $a_{u_4} = 210$, $a_{u_5} = 170$, $a_{u_6} = 190$, $a_{u_7} = 220$, $a_{u_8} = 180$, $a_{u_9} = 210$, $a_{u_{10}} = 210$.
area, we encounter infinitely many states in our 3D trajectory optimization problem, which is intractable because of the infinite decision space. In order to map the continuous environment to a finite number of states, we discretize the WBAN environment into a set of equally spaced tiles denoted by $S = \{st_1, \cdots, st_I\}$. These tiles represent the available states of the environment, and UAVs can only fly between the central points of tiles.
Actions: The procedure of the state transition of
UAVs is implemented by taking different actions. In this
work, all directions in which UAVs can fly indicate the
action space. Based on the aforementioned discrete state-
space model, our action space is limited to eight direc-
tions. In other words, each state is surrounded by eight
other states that each UAV can fly to. In this situation, the
set $A$ = {"north", "east", "south", "west", "north-east", "north-west", "south-east", "south-west"} denotes the action space of a typical UAV.
Rewards: The consequence of each action is defined
as a reward. By selecting an action from the set Aand
transition to the new state, each UAV can observe a new part
of its cluster environment and can achieve a new amount of
sum rate, energy consumption, and delay requirements. The
combination of these factors is considered as a reward gained
by UAV for observing the new state.
Q-values: These values are calculated from the following Q-function that is responsible for the convergence of our proposed Q-REDTO algorithm:
$$Q_{new}(st, ac) \leftarrow (1 - \lambda)\, Q_{old}(st, ac) + \lambda \left[ \Re(st, ac) + \gamma \max_{ac' \in A} Q_{old}(st', ac') \right], \tag{33}$$
where $\lambda$, $\gamma$, and $st'$ denote the learning rate, discount factor, and the newly observed state, respectively. Moreover, $\Re(st, ac)$ and $Q_{old}(st', ac')$ represent the reward value of the current state-action pair and the expected reward of the new state-action pair. Each UAV keeps the Q-values of all possible state-action pairs in a Q-table whose rows and columns stand for actions and states, respectively. On observing the new state, the Q-value of the current state is updated using (33).
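A single application of the update rule in (33) can be written as the following sketch. The Q-table is represented here as a dictionary keyed by (state, action) pairs with unseen entries treated as zero, which is an implementation assumption; the default values of the learning rate and discount factor follow Table 3.

```python
def q_update(q_table, state, action, reward, next_state, actions,
             lam=0.1, gamma=0.7):
    """Apply (33) once and return the new Q-value.

    q_table : dict mapping (state, action) -> Q-value (assumed layout)
    lam     : learning rate (lambda in (33))
    gamma   : discount factor
    """
    # Expected future reward: the best Q-value reachable from the new state.
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = (1.0 - lam) * old + lam * (reward + gamma * best_next)
    return q_table[(state, action)]
```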
A. UTILITY MAXIMIZATION
After describing the preliminaries of the QL-based method,
the procedure of the proposed Q-REDTO algorithm is pre-
sented to find the best position of UAV in each time slot. As
mentioned before, the Q-REDTO scheme aims at optimizing
both vertical and horizontal trajectories of UAVs to gain the
best performance in terms of increasing the total sum rate
and reducing the energy consumption and delay. To this end,
we propose the following utility function to simultaneously
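consider the effects of the aforementioned factors:
$$U_{RED} = \omega_r R_{ns} - \omega_e E_{ns} + \omega_i I_{ns}, \tag{34}$$
where $R_{ns}$, $E_{ns}$, and $I_{ns}$ represent the normalized sum rate, sum of the energy consumption, and sum of the emergency index of all hubs covered by each UAV in the specific position, respectively. In addition, $\omega_r$, $\omega_e$, and $\omega_i$ denote the corresponding weighting coefficients. Accordingly, the proposed optimization problem in (28) is converted to the following optimization problem:
$$(P1) \quad \max_{\left\{q^{(\ell)}_{u_j}\right\}} U_{RED} \quad \text{s.t.} \tag{35}$$
$$C1: \; q^{(0)}_{u_j} = \left[x^{(0)}_{u_j}, y^{(0)}_{u_j}\right]^T = q^{(t_0)}_{u_j}, \tag{36}$$
$$C2: \; q^{(F)}_{u_j} = \left[x^{(F)}_{u_j}, y^{(F)}_{u_j}\right]^T = q^{(t_Z)}_{u_j}, \tag{37}$$
$$C3: \; \Gamma_{H_i,u_j} > \Gamma_{th}, \quad \forall H_i \in H^{(t_\ell)}, \tag{38}$$
$$C4: \; \left\| q^{(t_{\ell+1})}_{u_j} - q^{(t_\ell)}_{u_j} \right\| \leq S^{u_j}_{xy}, \quad \forall t_\ell \in \mathcal{T}, \tag{39}$$
$$C5: \; a_{u_j} \geq a_{min}. \tag{40}$$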
The procedure for computing the normalized terms in (34) is described in detail as follows.
As mentioned before, our Q-REDTO algorithm employs the NOMA technique to simultaneously schedule multiple hubs to connect to the corresponding UAV. In this regard, the total throughput achieved by each UAV is equal to the sum of the rates of all hubs located in the coverage area of that UAV that satisfy the constraint (19). According to this metric definition, the proposed Q-REDTO algorithm computes the sum rate of all hubs covered by each UAV in different states from (18) and subsequently divides it by the largest value of the sum rate to calculate the normalized metric $R_{ns}$. With a similar argument, our proposed algorithm obtains the energy consumption of each hub using (20) and then adds all the values together to obtain the sum of the energy consumption. Finally, to calculate the normalized sum of the energy consumption, i.e., $E_{ns}$, in each state of the cluster area, the computed sum is divided by the largest value of the sum of energy consumption.
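The normalization step and the utility in (34) can be illustrated with the sketch below. Each list holds one value per state (tile); dividing by the largest entry maps every metric to $[0, 1]$. The weight values used in any call are hypothetical, since no specific weights are stated here, and the normalized emergency term $I_{ns}$ is assumed to be obtained in the same way.

```python
def normalize(values):
    """Divide every per-state sum by the largest one, giving values in [0, 1]."""
    peak = max(values)
    if peak <= 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]

def utility(r_ns, e_ns, i_ns, w_r, w_e, w_i):
    """Utility of one state in the form of (34); the weights are assumed inputs."""
    return w_r * r_ns - w_e * e_ns + w_i * i_ns
```

A state with a high normalized rate and emergency index but low normalized energy consumption therefore receives the highest utility.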
The third factor in the utility function $U_{RED}$ is the normalized sum of the emergency indexes of all hubs covered by each UAV $u_j$ that satisfy the constraint (19), based on the NOMA power assignment. In emergency health care systems like WBANs, some vital signs or patients have precedence over others and require a lower delay and higher data rate. Under these circumstances, we should design an appropriate mechanism to address the timely transmission of life-critical vital signs to the monitoring center. The first factor in the emergency index of each hub $H_i$ is the data priority index, denoted by $I_{dp}$. This index defines the priority of the different vital signals relative to each other. For instance, the vital signs of respiration rate bio-sensors have priority over the vital signs of body temperature
TABLE 1. Data priority mapping in the IEEE 802.15.6 standard.
DP index ($I_{dp}$) | Data packet types
0 | Background (BK)
1 | Best effort (BE)
2 | Excellent effort (EE)
3 | Video (VI)
4 | Voice (VO)
5 | Low-priority medical data (e.g., body TMP)
6 | High-priority medical data (e.g., body SBP)
7 | Emergency medical data (e.g., body ECG)
TABLE 2. Patient priority mapping in the IEEE 802.15.6 standard.
PP index ($I_{pp}$) | WBAN services
0 | Non-medical services (e.g., sport applications)
1 | Low priority medical services (e.g., people in laboratory)
2 | General health services (e.g., patient in GHU)
3 | Highest priority medical services (e.g., patients in ICU)
bio-sensors. Our algorithm uses Table 1, categorized by the IEEE 802.15.6 standard [25], to determine $I_{dp}$ for different bio-sensors. The second influential parameter in the total emergency index is the patient priority index $I_{pp}$, which represents the precedence of some patients over others. As an example, the vital signs of patients who are under surgery or have chronic illnesses have precedence over those of other patients. In this situation, the patient priority is mapped by Table 2, extracted from the IEEE 802.15.6 standard, to distinguish between the life-emergency patients and non-critical patients.
The most influential factor in computing the total emergency index is the data severity index, represented by $I_{ds}$. This parameter checks for the occurrence of unexpected emergency conditions. In practical healthcare applications, each vital sign has a normal range, and if the sensed value falls outside this range, it indicates an anomaly in the vital sign that may be due to the patient's life-critical condition. For more clarification, consider the case when the value of the blood pressure bio-sensor exceeds its normal range. Under this condition, it takes precedence over an ECG vital sign that is in the normal range, although $I_{dp}$ of ECG is higher than $I_{dp}$ of the blood pressure sensor. In order to model $I_{ds}$, consider $\phi_{b_{ij}}$ as the vital sign sensed by bio-sensor $b_{ij}$ and $[\phi_{low_j}, \phi_{up_j}]$ as the pre-assigned normal range of that bio-sensor. In this regard, let us define $\phi^{(ij)}_{up,b} \triangleq \phi_{up_j} - \phi_{b_{ij}}$ and $\phi^{(ij)}_{b,low} \triangleq \phi_{b_{ij}} - \phi_{low_j}$. In this situation, we define the following indicator function that models the normal and abnormal cases that occur for each patient:
$$I^{(ij)}_{n} = \begin{cases} 1, & \text{if } \phi^{(ij)}_{up,b} > 0 \text{ and } \phi^{(ij)}_{b,low} > 0 \text{ (Normal)}, \\ 0, & \text{if } \phi^{(ij)}_{up,b} < 0 \text{ or } \phi^{(ij)}_{b,low} < 0 \text{ (Abnormal)}. \end{cases} \tag{41}$$
To this end, the data severity index of the $j$th bio-sensor of
patient $p_i$ is formulated as follows:
$$I^{(ij)}_{ds} = \begin{cases} 1, & \text{if } I^{(ij)}_{n} = 1, \\ 1 + \dfrac{|\phi^{(ij)}_{up,b}|}{\phi^{(j)}_{up,low}}, & \text{if } I^{(ij)}_{n} = 0 \text{ and } \phi^{(ij)}_{up,b} < 0, \\ 1 + \dfrac{|\phi^{(ij)}_{b,low}|}{\phi^{(j)}_{up,low}}, & \text{if } I^{(ij)}_{n} = 0 \text{ and } \phi^{(ij)}_{b,low} < 0, \end{cases} \tag{42}$$
where $\phi^{(j)}_{up,low} \triangleq \phi_{up_j} - \phi_{low_j}$. From (42), we can see that in the abnormal case, i.e., $I^{(ij)}_{n} = 0$, the severity index equals 1 plus the deviation of $\phi_{b_{ij}}$ from the violated boundary, normalized by the width of the normal range. According to these factors, the emergency index of each hub $H_i$, represented by $I_{em}$, is determined as follows:
$$I^{H_i}_{em} = \left( \omega_{dp} \frac{I^{H_i}_{dp}}{7} + \omega_{pp} \frac{I^{H_i}_{pp}}{3} \right) I^{H_i}_{ds}, \tag{43}$$
where $\omega_{dp}$ and $\omega_{pp}$ represent the weighting coefficients of $I_{dp}$ and $I_{pp}$, respectively. It is worth mentioning that $I^{H_i}_{em}$ indicates the delay sensitivity level of the hub $H_i$. Accordingly, a higher value of $I^{H_i}_{em}$ indicates a lower tolerable access delay for data packets belonging to the hub $H_i$. Thus, in order to minimize the access delay of each hub, the proposed Q-REDTO algorithm should select the hubs with the maximum value of $I_{em}$. Then, the Q-REDTO algorithm calculates the sum of $I_{em}$ of all hubs covered by each UAV $u_j$ in each state. Finally, $I_{ns}$ of each state is computed by dividing its sum of $I_{em}$ by the maximum value of that summation among all states.
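The severity index in (42) and the emergency index in (43) can be sketched as follows. Two assumptions are made for illustration: a reading that lies exactly on a boundary is treated as normal, since (41) leaves this case open, and the hub-level severity index is passed in directly because its derivation from the individual sensor indexes is not detailed here. The function names and weight values are hypothetical.

```python
def severity_index(value, low, high):
    """Data severity index of one reading, in the form of (42).

    low, high : bounds of the pre-assigned normal range of the bio-sensor.
    """
    span = high - low
    if value > high:                       # upper bound violated
        return 1.0 + (value - high) / span
    if value < low:                        # lower bound violated
        return 1.0 + (low - value) / span
    return 1.0                             # normal (boundary treated as normal)

def emergency_index(i_dp, i_pp, i_ds, w_dp, w_pp):
    """Emergency index of a hub, in the form of (43).

    i_dp in 0..7 (Table 1) and i_pp in 0..3 (Table 2) are scaled to [0, 1].
    """
    return (w_dp * i_dp / 7.0 + w_pp * i_pp / 3.0) * i_ds
```

With this form, an out-of-range reading from a lower-priority sensor can outrank an in-range reading from a higher-priority one, which matches the blood pressure versus ECG example above.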
In order to optimize the vertical position of each UAV $u_j$, the Q-REDTO algorithm changes the altitude of $u_j$ in ascending order from $a_{min}$ to $a_{max}$, and at each specific altitude, it calculates the $U^{u_j}_{RED}$ metric in (34) for all tiles belonging to each cluster. In the next stage, the algorithm computes the average $U^{u_j}_{RED}$ over all the tiles, and eventually, the altitude that maximizes the average $U^{u_j}_{RED}$ in each cluster is selected as the optimal vertical position of each UAV $u_j$. After determining the optimal vertical position of each UAV $u_j$, we optimize its horizontal trajectory. To this end, the proposed Q-REDTO algorithm employs the QL-based mechanism to train each UAV $u_j$ to find its best position in the corresponding cluster area. According to the aforementioned definitions of the elements in the QL-based mechanism, Q-REDTO runs enough episodes to update the values of the Q-table of each UAV $u_j$ step by step until it converges to the optimum value. In this regard, each episode is started by randomly selecting one state in each cluster as the initial state of the corresponding UAV. Then, each UAV $u_j$ selects an action from its available action space $A$ to reach the next state. In this regard, our proposed algorithm uses the following two different policies:
Reward maximizing action selection policy: Using this policy, the Q-REDTO algorithm selects an action that maximizes the new Q-value of the current state, i.e.,
$$ac^{u_j}_{sel} \leftarrow \arg\max_{ac \in A} Q^{u_j}_{new}(st, ac). \tag{44}$$
According to (33), the Q-function of each UAV $u_j$ consists of three factors. The first one is the previous Q-value of the current state, and the second one is the reward function, which has the most important role in identifying the optimal horizontal trajectory of each UAV $u_j$. In this regard, if $U^{u_j}_{RED}(st) < U^{u_j}_{RED}(st')$, the reward function is designed as $\Re^{u_j}(st, ac) = U^{u_j}_{RED}(st') - U^{u_j}_{RED}(st)$; otherwise, it takes the value zero. To select an optimal action, the reward function is computed for all available actions of the current state. According to this reward function, the states that cover the hubs with lower energy consumption, higher data rate, and higher delay sensitivity are more probable candidates to be selected as the next state for the corresponding UAV. The third factor in (33) is the future expected reward, denoted by $\max_{ac' \in A} Q_{old}(st', ac')$, which refers to the maximum Q-value of the future state.
After calculating the above three factors, the corresponding Q-values are obtained from (33), and consequently, the action that has the maximum Q-value is selected as the optimal action for each UAV $u_j$.
Random action selection policy: We use the random action selection mechanism in early episodes to enable UAVs to experience new actions and states. In this regard, we introduce a policy selection variable $\Xi$ that is drawn at random for each state from the interval $[0, 1]$. If this number is higher than a pre-specified threshold $ep_{th} = \frac{ep_i - 1}{\kappa}$, where $ep_i$ represents the current episode and $\kappa = |E|$, with $E = \{ep_1, ..., ep_\kappa\}$, is the total number of episodes, the random selection policy is used to reach the next state; otherwise, the reward maximization policy is used. It is worth mentioning that the threshold increases from one episode to the next, so random selection becomes less likely in later episodes.
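The switch between the two policies can be sketched as follows. The candidate Q-values of the current state are assumed to be available as a dictionary from action names to values, and the random source is passed in so that the behavior can be reproduced; both are illustrative choices rather than part of the proposed algorithm.

```python
import random

def select_action(q_row, episode, total_episodes, rng=random):
    """Choose an action with the episode-dependent threshold described above.

    q_row   : dict mapping action -> candidate Q-value for the current state
    episode : 1-based index of the current episode
    rng     : object providing random() and choice() (assumed interface)
    """
    threshold = (episode - 1) / total_episodes
    if rng.random() > threshold:
        # Exploration: more likely in early episodes, when the threshold is small.
        return rng.choice(list(q_row))
    # Exploitation: the action with the largest Q-value, as in (44).
    return max(q_row, key=q_row.get)
```

In the first episode the threshold is zero, so almost every draw leads to a random action; as the episode index approaches the total number of episodes, the greedy choice dominates.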
We summarize the aforementioned procedure of the pro-
posed Q-REDTO algorithm as pseudocode in Algorithm 2.
Beyond WBAN transmission model: After collecting
vital signs of patients by UAVs, they must be transmitted to
the remote monitoring center by means of a cloud backbone
as is shown in Fig. 1. The cloud network consists of a cloud
server in addition to several routers forming a mesh network
for delivering vital signs to beyond the WBAN. These routers
have equivalent transmission ranges and all of them employ
the same channel for transmitting data packets. Due to the
limited transmission range of each router, we use multiple
routers to increase the coverage area of the network. After
forwarding vital signs to the cloud by UAVs, the cloud server
estimates the best path for each data packet in line with
its emergency condition and bandwidth requirement. To this
end, the shortest path tree (SPT) algorithm is run by the
cloud server to find the shortest path for each packet reaching
its destination, and the routing tables of the corresponding
routers are updated, simultaneously. Furthermore, a weight
representing the amount of traffic load is assigned to each link
between the routers, to prevent the congestion in the routes
and guarantee the load balancing. In this regard, by selecting
each path, the weights of the constituent links are increased
resulting in reducing the chance of selecting these links in the
future paths.
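The load-aware routing idea described above can be illustrated with the following sketch. It uses Dijkstra's algorithm as one possible way to obtain shortest paths, which is an assumption, since the exact SPT procedure is not detailed here. The graph layout (a symmetric nested dictionary of link weights) and the fixed weight increment `load_step` are likewise hypothetical.

```python
import heapq

def shortest_path(weights, src, dst):
    """Dijkstra shortest path on a dict-of-dicts graph; returns a node list or None."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in weights.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def route_and_load(weights, src, dst, load_step=1.0):
    """Route one packet, then raise the weights of the links it used."""
    path = shortest_path(weights, src, dst)
    if path:
        for a, b in zip(path, path[1:]):
            weights[a][b] += load_step
            weights[b][a] += load_step  # links are assumed to be symmetric
    return path
```

After a path is selected, its links become more expensive, so a later packet between the same endpoints may be steered onto a less loaded route.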
Algorithm 2 Procedure of the proposed Q-REDTO algorithm
1: Input: $U = \{u_1, ..., u_M\}$, $C = \{CP_{u_1}, ..., CP_{u_M}\}$, $S$, $A$ = {"north", "east", "south", "west", "north-east", "north-west", "south-east", "south-west"}.
2: Output: The optimal trajectory of UAVs.
3: for each $CP_{u_j} \in C$ do
4:     Discretize the cluster environment into tiles.
5: end for
6: for each $H_i$ covered by $u_j$ do
7:     Calculate $R_{H_i,u_j}$ from (18).
8:     Calculate $E_{H_i,u_j}$ from (20).
9:     Calculate $I^{H_i}_{em}$ from (43).
10: end for
11: for each $u_j \in U$ do
12:     for each $a_{min} < a_{u_j} < a_{max}$ do
13:         for each $st_i \in S$ do
14:             Compute $R_{ns}$, $E_{ns}$, and $I_{ns}$.
15:             Compute $U^{u_j}_{RED}(st)$ from (34).
16:         end for
17:         Determine the average value of $U^{u_j}_{RED}$.
18:     end for
19:     $a^{opt}_{u_j} \leftarrow \arg\max_{a_{u_j}} U^{u_j}_{RED}$.
20: end for
21: for each $ep_i \in E$ do
22:     for each $u_j \in U$ do
23:         Randomly select the initial state of $u_j$.
24:         for each $st_i$ do
25:             Select a random number $\Xi \in [0, 1]$.
26:             Calculate $ep_{th} = \frac{ep_i - 1}{|E|}$.
27:             if $\Xi > ep_{th}$ then
28:                 Randomly select $ac^{u_j} \in A$.
29:             else
30:                 for each $ac^{u_j} \in A$ do
31:                     $\varpi \leftarrow U^{u_j}_{RED}(st') - U^{u_j}_{RED}(st)$
32:                     if $\varpi > 0$ then
33:                         $\Re^{u_j}(st, ac) \leftarrow \varpi$.
34:                     else
35:                         $\Re^{u_j}(st, ac) \leftarrow 0$.
36:                     end if
37:                     Compute $Q^{u_j}_{new}(st, ac)$ from (33).
38:                 end for
39:                 $ac^{u_j}_{sel} \leftarrow \arg\max_{ac \in A} Q^{u_j}_{new}(st, ac)$.
40:             end if
41:             Update $Q^{u_j}_{new}(st, ac)$ using (33).
42:         end for
43:     end for
44: end for
IV. COMPLEXITY ANALYSIS
Proposition 1: The computational complexity of Algorithm 1 is of order $O(M|P|)$, in which $M$ represents the total number of UAVs needed after the convergence of FGKM and $P$ denotes the set of all patients in the network, where $|\cdot|$ is the cardinality operator.
Proof: It can be seen from the pseudocode of Algorithm 1 that there is one main "for" loop (i.e., lines 3-17) and two inner "for" loops (i.e., lines 5-7 and 11-13). The computational complexity of the first inner loop is $O(|P \setminus \{cc_1\}|)$, and the complexity of the second one is $O(|P \setminus \{cc_1, \cdots, cc_{m-1}\}|)$. Because the number of cluster centers is much smaller than the number of patients, it can be neglected in comparison to $|P|$. Thus, the complexity of these two inner loops is of order $O(|P|)$. Moreover, the complexities of the other lines and the main loop are $O(1)$ and $O(M)$, respectively. Thus, the total complexity of the nested loops is equal to $O(M|P|)$.
Proposition 2: The computational complexity of Algo-
rithm 2is of order O(|E|M|S| |A|), where |E|,|S|, and
|A|represent the number of episodes, available states, and
actions in the proposed Q-REDTO algorithm, respectively.
Proof: According to the pseudocode of Algorithm 2, there are four main “for” loops in this algorithm. The complexities of the first and second loops (lines 3–5 and 6–10) are $O(1)$ and $O(|P|)$, respectively. The third main “for” loop (lines 11–20) consists of three nested “for” loops: the outer loop is $O(M)$, the first inner loop is $O(1)$, and the second inner loop is $O(|S|)$. Therefore, the complexity of this part is of order $O(M|S|)$. The fourth main loop (lines 21–44) comprises four nested “for” loops whose complexities, from the outermost to the innermost, are $O(|E|)$, $O(M)$, $O(|S|)$, and $O(|A|)$, respectively. Thus, the complexity of this part is of order $O(|E|\,M\,|S|\,|A|)$. Finally, the total complexity of the Q-REDTO algorithm is the sum of the complexities of the main loops, i.e., $O(|P| + M|S| + |E|\,M\,|S|\,|A|)$. In this expression, the third term is much larger than the other terms when the number of episodes grows. Therefore, the third term is dominant and the other terms can be ignored. It should be noted that, in deep learning methods based on data sets, the complexity is related to the number of data samples used for training the model. In the Q-learning method, which is based on trial and error, the complexity is instead determined by the number of episodes.
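To see how quickly the third term dominates, the three terms of the bound can be evaluated for concrete sizes. The sketch below is illustrative only: the function names and the example values of $M$, $|S|$, $|A|$, and $|E|$ are assumptions chosen for the demonstration (only $|P| = 300$ is taken from Section V), not values reported in the paper.

```python
def qredto_cost_terms(num_patients, num_uavs, num_states, num_actions, num_episodes):
    """Return the three terms of O(|P| + M|S| + |E| M |S| |A|) as plain counts."""
    init_term = num_patients                      # second main loop: O(|P|)
    setup_term = num_uavs * num_states            # third main loop: O(M|S|)
    learn_term = num_episodes * num_uavs * num_states * num_actions  # fourth main loop
    return init_term, setup_term, learn_term


def dominant_share(terms):
    """Fraction of the total count contributed by the largest term."""
    return max(terms) / sum(terms)


if __name__ == "__main__":
    # Hypothetical sizes: 4 UAVs, 50 states, 7 actions, 100 episodes.
    terms = qredto_cost_terms(300, 4, 50, 7, 100)
    print(terms)
    print(round(dominant_share(terms), 4))
```

With these hypothetical sizes the learning term already accounts for more than 99% of the count, which matches the argument that the first two terms can be ignored.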
It is worth mentioning that all computational tasks in the ESTO algorithm must be executed for all tiles in each cluster area. Accordingly, the complexity of this algorithm is of order $O(M\tilde{S})$, where $\tilde{S}$ is the number of all tiles in each cluster, which is much larger than $|S|$ in Q-REDTO. Since $|E|$ and $|A|$ are upper bounded by the maximum numbers of episodes and actions deployed in Q-REDTO, it is concluded that the computational complexity of ESTO is higher than that of the proposed Q-REDTO algorithm, especially for a large number of tiles.

This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3218675
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

TABLE 3. Simulation Parameters
Parameter | Value | Parameter | Value
$c$ | $3 \times 10^{8}$ m/s | $f_c$ | 2.4 GHz
$d_0$ | 1 m | $W$ | 20 MHz
$P_{total}$ | 60 dBm | $N_0$ | 36 dBm
$\eta_{LoS}$ | 2.5 | $\eta_{NLoS}$ | 3
$\alpha$ | 9.61 | $\beta$ | 0.16
$a_{min}$ | 170 m | $a_{max}$ | 220 m
$\lambda$ | 0.1 | $\gamma$ | 0.7
V. SIMULATION RESULTS
In this section, we evaluate the performance of the proposed Q-REDTO algorithm in terms of average sum rate, delay, and the energy consumption of hubs. Specifically, we investigate how the Q-REDTO algorithm designs the optimal trajectory of UAVs to maximize the average sum rate while minimizing the average sum delay and energy consumption. To this end, we consider a large network in which $|P| = 300$ patients are randomly distributed over an area of $3000 \times 3000$ m$^2$. The optimal number of UAVs in the network is calculated by the FGKM scheme described in Algorithm 1. This algorithm divides the network area into $|C|$ clusters and dedicates a single UAV to each cluster. To model the state space of the proposed algorithm, each cluster area is discretized into tiles. To reduce the complexity of Q-REDTO and to guarantee the real-time transmission of vital signs in delay-sensitive healthcare applications, the numerical results of Q-REDTO are compared for state spaces with different numbers of tiles, and a tile size of $100 \times 100$ m$^2$ is selected as the trade-off between the quality of the results and the complexity of the algorithm. In addition, we assume that each UAV $u_j$ can fly horizontally at the altitudes $a_{u_j} \in \{170, 180, 190, 200, 210, 220\}$ meters. Moreover, the horizontal coverage radius of each UAV at a given altitude is equal to that altitude. To capture the dynamic nature of the WBAN environment, the Difference Correlated Random Walk (DCRW) model is assumed for the mobility pattern of hubs: the velocity of each hub $H_i$ is randomly selected from the interval $[0, 4]$ m/s and changes in each time slot. Furthermore, to achieve fair results, equal weights are assigned to transmission rate, energy consumption, and delay in the proposed utility function in (34); that is, $\omega_r$, $\omega_e$, and $\omega_i$ are all set to 1. Under these assumptions, we simulate the proposed algorithm in MATLAB (version 2018). Table 3 lists the parameters used in our simulations.
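The discretization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the cluster is taken to be an axis-aligned rectangle, each candidate state is a tile center paired with one of the six altitudes, and a hub counts as covered when its horizontal distance to the UAV is at most the UAV's altitude. The names `build_state_space` and `covered_hubs` and the 500 m cluster size are hypothetical and do not come from the paper.

```python
import math
from itertools import product

TILE_SIZE_M = 100                                  # tile edge length from Section V
ALTITUDES_M = (170, 180, 190, 200, 210, 220)       # allowed UAV altitudes


def build_state_space(x_min, y_min, x_max, y_max):
    """Enumerate (x, y, altitude) states: tile centers crossed with altitudes."""
    xs = [x_min + TILE_SIZE_M * (i + 0.5)
          for i in range(int((x_max - x_min) // TILE_SIZE_M))]
    ys = [y_min + TILE_SIZE_M * (j + 0.5)
          for j in range(int((y_max - y_min) // TILE_SIZE_M))]
    return [(x, y, a) for x, y, a in product(xs, ys, ALTITUDES_M)]


def covered_hubs(state, hubs):
    """Hubs whose horizontal distance to the UAV is within the coverage radius.

    The radius is assumed equal to the UAV altitude, as stated in the text.
    """
    ux, uy, altitude = state
    return [h for h in hubs if math.hypot(h[0] - ux, h[1] - uy) <= altitude]


if __name__ == "__main__":
    states = build_state_space(0, 0, 500, 500)     # hypothetical 500 m x 500 m cluster
    print(len(states))                             # 5 x 5 tiles times 6 altitudes
    hubs = [(60.0, 40.0), (400.0, 400.0)]
    print(covered_hubs((50.0, 50.0, 170), hubs))
```

Halving the tile edge would quadruple the number of horizontal positions, which is the complexity side of the trade-off mentioned in the text.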
To assess the performance of the proposed Q-REDTO algorithm in terms of transmission rate, energy efficiency, and delay-sensitive transmission of vital signs in WBANs, we consider the following three baseline schemes:
Particle Swarm Optimization (PSO)-based UAV Placement with OMA: To make a fair comparison, the transmissions belonging to communication tier I in this scheme are scheduled using the proposed WH coding scheme, in which all bio-sensors of each patient can simultaneously transmit data to the corresponding hub by employing the aforementioned orthogonal WH codes. Afterward, in communication tier II, where the aggregated data in each hub is transmitted to UAVs, the OMA-based PSO algorithm proposed in [20] is employed to optimize the trajectory of the UAV in each cluster. This algorithm finds, in each time slot, the position at which each UAV achieves the best throughput along with the least energy consumption. Because this algorithm uses the OMA scheme to schedule the transmissions of hubs to UAVs, only one hub can transmit its data to each UAV in each time slot. To the best of our knowledge, [20] is the only work that has investigated the problem of trajectory optimization of UAVs in WBANs.
Particle Swarm Optimization (PSO)-based UAV Placement with NOMA: This scheme extends the PSO algorithm proposed in [20] to support the NOMA scheduling technique. As in the other algorithms, the cyclic orthogonal WH codes are used to schedule the transmissions of bio-sensors to the corresponding hubs in tier I of this scheme. Subsequently, in communication tier II, the best trajectory of each UAV over its cluster area is determined by the extended PSO algorithm so as to optimize the value of the utility function $U_{RED}$. In this case, the transmissions of multiple hubs that satisfy constraint (19) can be scheduled simultaneously in the same time slot using the NOMA technique.
Exhaustive Search-based Trajectory Optimization (ESTO) Algorithm: Similar to the PSO-based algorithms, in communication tier I of this scheme, the transmission of vital signs sensed by bio-sensors is scheduled by employing the cyclic orthogonal WH coding scheme. In the next stage, UAVs aggregate the vital signs collected by hubs. In the ESTO scheme, each UAV has complete information about the hubs located in its corresponding cluster; in other words, the utility function in ESTO is evaluated for every tile from (34). Based on this information, each UAV computes its best trajectory in each time slot. Moreover, ESTO uses the NOMA scheduling technique, in which multiple hubs can simultaneously transmit their data packets to their UAV.
Proposed Q-REDTO Algorithm: This scheme is our proposed Q-learning-based UAV trajectory optimization algorithm, which uses the cyclic WH codes in communication tier I. Furthermore, as described earlier, we design a Q-learning-based search algorithm to find the best location of each UAV in order to maximize the utility function in (34). Like ESTO, our proposed algorithm employs the NOMA scheduling scheme to deliver the data packets of hubs to UAVs.
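The exhaustive-search benchmark amounts to evaluating the utility at every candidate position and keeping the maximizer. The sketch below illustrates that idea only. The function `toy_utility` is a hypothetical stand-in (negative mean 3D distance to the hubs) and is not the utility $U_{RED}$ of (34), whose definition lies outside this section; the candidate positions and hub coordinates are likewise made up for the example.

```python
import math


def toy_utility(position, hubs):
    """Hypothetical placeholder utility: negative mean 3D distance to the hubs.

    The paper's utility (34) combines rate, energy, and delay; it is NOT
    reproduced here.
    """
    x, y, z = position
    return -sum(math.sqrt((x - hx) ** 2 + (y - hy) ** 2 + z ** 2)
                for hx, hy in hubs) / len(hubs)


def exhaustive_search(candidates, hubs, utility=toy_utility):
    """Evaluate every candidate position and return (best_position, best_utility)."""
    best_position, best_value = None, -math.inf
    for position in candidates:
        value = utility(position, hubs)
        if value > best_value:
            best_position, best_value = position, value
    return best_position, best_value


if __name__ == "__main__":
    hubs = [(40.0, 60.0), (60.0, 40.0)]
    candidates = [(50.0, 50.0, 170), (50.0, 50.0, 220), (450.0, 450.0, 170)]
    print(exhaustive_search(candidates, hubs))
```

The cost of this search is one utility evaluation per candidate position, so it grows linearly with the number of tiles and altitudes, whereas a learning-based search only evaluates the positions it actually visits.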
A. RESULTS AND DISCUSSION
In this subsection, we first investigate the trajectory of UAVs in their corresponding cluster areas in the IoMT WBAN. This trajectory is obtained by dividing the whole city area into clusters using FGKM and assigning one UAV to each cluster. Fig. 4 shows the 3D trajectories of two different UAVs obtained from Q-REDTO, compared with those of the ESTO algorithm, during ten time slots. In each time slot, Q-REDTO and ESTO find the 3D position of each UAV that maximizes $U_{RED}$ according to (34). It should be noted that, because of the mobility of patients in the network and the variations of vital signs, the best position of each UAV changes over time. As shown in Fig. 4, in the proposed Q-REDTO algorithm, UAVs are trained to reach the benchmark trajectory, shown as a solid line, by learning the cluster environment episode by episode. Moreover, in some time slots, the position selected by Q-REDTO differs from the benchmark obtained from the exhaustive search. This occurs because UAVs have only limited knowledge of their cluster areas when the Q-REDTO algorithm is employed.

FIGURE 4. 3D trajectory of two UAVs over their cluster areas during ten time slots.

Afterward, we evaluate the performance of the proposed Q-REDTO algorithm through various scenarios in terms of spectral efficiency, energy consumption, and delay.
Scenario I: We first examine how the average spectral efficiency, defined as the ratio between the average sum rate and the total bandwidth $W$, changes with the number of patients. With the number of patients $|P|$ varied from 100 to 300, Fig. 5 illustrates the performance of Q-REDTO in terms of the average spectral efficiency in comparison with the other schemes. Because both ESTO and Q-REDTO employ the NOMA scheduling technique, any increase in the number of patients increases the number of hubs covered by each UAV. In this situation, NOMA allows more simultaneous transmissions from hubs to each UAV, which increases the average sum rate. However, as the number of patients increases further, the interference grows considerably, which slows the growth of the average spectral efficiency. As can be seen from this figure, Q-REDTO with FGKM achieves considerably better results than Q-REDTO with K-means in terms of the average sum rate. This indicates the poor performance of the K-means algorithm in finding the globally optimal solution of the city clustering problem, which in turn degrades the optimized trajectory of UAVs in the network. Furthermore, the results in Fig. 5 show that the proposed Q-REDTO algorithm outperforms the PSO-based algorithms in terms of the average spectral efficiency. Relative to OMA-based PSO, this is a result of scheduling the transmissions of multiple hubs in the same time slot using the NOMA technique. Relative to NOMA-based PSO, the higher average sum rate of Q-REDTO indicates that PSO converges more slowly than Q-REDTO. In other words, under very similar assumptions, because the NOMA-based PSO does not converge to the best value of the utility function $U_{RED}$, it obtains a lower average sum rate than Q-REDTO. Finally, Fig. 5 demonstrates that the results of Q-REDTO are very close to the optimal values of the spectral efficiency obtained by the ESTO algorithm. Thus, we can claim that Q-REDTO achieves near-optimal results without requiring complete information about the locations and channel conditions of hubs, simply by discovering the cluster environment step by step through the episodes.

FIGURE 5. The variation of the average spectral efficiency versus different number of patients. (Horizontal axis: Number of Patients, 100 to 300. Vertical axis: Average Sum Rate (bps), scale $\times 10^{7}$. Curves: ESTO, Q-REDTO with FGKM, Q-REDTO with K-means, NOMA-based PSO, OMA-based PSO.)
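Since spectral efficiency is defined in Scenario I as the average sum rate divided by the total bandwidth $W$, the conversion is a one-line computation. The sketch below assumes $W = 20$ MHz from Table 3; the sum-rate value of $2 \times 10^{7}$ bps is only an illustrative number of the same order as the vertical axis of Fig. 5, not a reported data point, and the function name is hypothetical.

```python
BANDWIDTH_HZ = 20e6  # total bandwidth W = 20 MHz (Table 3)


def spectral_efficiency(avg_sum_rate_bps, bandwidth_hz=BANDWIDTH_HZ):
    """Average spectral efficiency in bps/Hz: average sum rate over bandwidth."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return avg_sum_rate_bps / bandwidth_hz


if __name__ == "__main__":
    # Illustrative value only: 2e7 bps over 20 MHz gives 1 bps/Hz.
    print(spectral_efficiency(2e7))  # 1.0
```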
Scenario II: This scenario investigates the effect of the number of patients on the average sum energy consumption in each time slot. To this end, $|P|$ is varied between 100 and 300, and the results are shown in Fig. 6. According to this figure, the average sum energy consumption of the proposed Q-REDTO and ESTO algorithms is higher than that of the PSO-based algorithms. This is a consequence of employing the NOMA scheduling technique in Q-REDTO and ESTO: multiple adjacent hubs that satisfy condition (19) simultaneously transmit their data packets to the corresponding UAV, whereas in the OMA-based PSO algorithm only one hub can transmit its data to each UAV in each time slot. Additionally, the average sum energy consumption of NOMA-based PSO is lower than that of Q-REDTO. Because of the lower convergence speed of NOMA-based PSO, fewer simultaneous transmissions are scheduled in each time slot, and hence the sum energy consumption of these transmissions is lower than that of Q-REDTO. Moreover, Fig. 6 shows that the average sum energy consumption of Q-REDTO and ESTO grows with the number of patients in the network. In these algorithms, increasing the number of patients in each cluster increases the number of hubs covered by each UAV, and the larger number of simultaneous transmissions to each UAV increases the total energy consumption of the network. However, as the number of concurrent transmissions increases further, some of them can no longer satisfy condition (19) and are discarded from the scheduling process, in which case the sum energy consumption decreases. In contrast, the blue curve shows the average sum energy consumption of the OMA-based PSO algorithm, where only one hub transmits its data packet to each UAV within each time slot. Since increasing the number of patients does not increase the number of hubs transmitting to each UAV, the energy consumption of OMA-based PSO remains roughly the same for different numbers of patients. In the other algorithms, by contrast, a larger number of patients allows more hubs to transmit their data packets concurrently, which increases their average sum energy consumption until their curves cross the blue curve. Finally, Fig. 6 illustrates that the average sum energy consumption of Q-REDTO with K-means is lower than that of Q-REDTO with FGKM. Because K-means does not solve the clustering problem to global optimality, fewer simultaneous transmissions are scheduled in the same time slot, and consequently the sum energy consumption of these transmissions is lower.
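The behavior described in Scenario II, in which additional concurrent transmissions raise the interference until some hubs fail the SINR condition and are discarded, can be illustrated with a toy admission check. This is a sketch under loudly stated assumptions: it uses a textbook uplink model with successive interference cancellation (the receiver decodes the strongest signal first and treats the weaker, not-yet-decoded signals as interference) and a simple greedy rule that drops the weakest hub until every remaining hub meets the threshold. Condition (19) is defined outside this section, so neither the model nor the rule should be read as the paper's formulation, and all names and numbers are hypothetical.

```python
def sic_sinrs(rx_powers_w, noise_w):
    """SINR of each signal under successive interference cancellation.

    Signals are decoded from strongest to weakest; a signal sees only the
    weaker, still-undecoded signals plus noise as interference.
    `rx_powers_w` must be sorted in descending order.
    """
    sinrs = []
    for i, power in enumerate(rx_powers_w):
        interference = sum(rx_powers_w[i + 1:])
        sinrs.append(power / (interference + noise_w))
    return sinrs


def admit_hubs(rx_powers_w, noise_w, sinr_threshold):
    """Greedily drop the weakest hub until all admitted hubs meet the threshold."""
    admitted = sorted(rx_powers_w, reverse=True)
    while admitted:
        if all(s >= sinr_threshold for s in sic_sinrs(admitted, noise_w)):
            break
        admitted.pop()  # discard the weakest received signal
    return admitted


if __name__ == "__main__":
    # Hypothetical received powers (W), noise power (W), and SINR threshold.
    print(admit_hubs([1.0, 8.0, 2.0], noise_w=0.5, sinr_threshold=2.0))
```

In this toy example the three hubs cannot all be served, because the middle hub's SINR falls below the threshold; once the weakest hub is discarded, the remaining two both satisfy it.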
FIGURE 6. The variation of the average sum energy consumption versus different number of patients. (Horizontal axis: Number of Patients, 100 to 300. Vertical axis: Average Sum Energy Consumption (J). Curves: ESTO, Q-REDTO with FGKM, Q-REDTO with K-means, NOMA-based PSO, OMA-based PSO.)

Scenario III: The delay sensitivity of the proposed algorithm is evaluated in this scenario. To this end, the sum emergency index of the different algorithms is compared for different numbers of patients. This metric is obtained by adding together the emergency indexes of all hubs covered by all UAVs, and it reflects the delay sensitivity of the algorithms: a higher sum emergency index means that hubs with more urgent conditions can access the channel earlier, which guarantees the timely transmission of vital signals in life-critical situations. The results are shown in Fig. 7 for $|P|$ varied from 100 to 300. As illustrated in Fig. 7, the sum emergency index of Q-REDTO and ESTO is much higher than that of the PSO-based algorithms. This is a consequence of scheduling multiple hubs in the same time slot using the NOMA technique in ESTO and Q-REDTO. Accordingly, we can claim that the proposed Q-REDTO algorithm outperforms the PSO-based schemes in terms of delay sensitivity. Furthermore, this figure shows that, under similar assumptions, the NOMA-based PSO scheme performs worse than the proposed Q-REDTO algorithm. As mentioned before, this occurs because the convergence speed of NOMA-based PSO is lower than that of our proposed algorithm. As a result, the NOMA-based PSO scheme selects an inefficient value of $U_{RED}$, which reduces the sum emergency index of the concurrent transmissions scheduled by the NOMA technique. Moreover, as the number of patients increases, the sum emergency index of ESTO and Q-REDTO increases, whereas it remains constant for the OMA-based PSO algorithm. As clarified in the previous scenario, this is a result of the larger number of simultaneous transmissions to each UAV in ESTO and Q-REDTO. Similar to the argument in Scenario I, Q-REDTO with FGKM achieves a higher average sum emergency index than Q-REDTO with K-means, which is a result of the FGKM clustering algorithm finding a globally optimal solution.
FIGURE 7. The variation of the average sum emergency index versus different number of patients. (Horizontal axis: Number of Patients, 100 to 300. Vertical axis: Average Sum Emergency Index. Curves: ESTO, Q-REDTO with FGKM, Q-REDTO with K-means, NOMA-based PSO, OMA-based PSO.)

Scenario IV: In this scenario, we investigate the convergence of the proposed Q-REDTO algorithm over the episodes and for different learning rates. To this end, the optimal value obtained from the ESTO algorithm in one time slot is taken as the benchmark, and Q-REDTO is trained to converge to this value step by step during the episodes. Fig. 8 illustrates the convergence of the proposed algorithm as the number of episodes increases over the range $[1, 100]$, in terms of the aforementioned factors, for one of the UAVs and for different values of the learning rate $\lambda$. Figs. 8a, 8b, and 8c show, respectively, the convergence of the sum rate, sum energy consumption, and sum emergency index of all hubs admitted by the NOMA technique at the selected position of the UAV versus the number of episodes. These figures illustrate that, in the earlier episodes, the values obtained from Q-REDTO are far from the benchmark line obtained from ESTO. This occurs because, in earlier episodes, the UAV has no adequate knowledge of its cluster environment and thus selects its positions randomly. However, as the number of episodes increases, the UAV's knowledge of the environment improves, and eventually the values of sum rate, energy consumption, and emergency index converge to the corresponding benchmark values. Moreover, Fig. 8 shows that, as the learning rate increases, the three factors converge more slowly. This is because of the unknown nature of the cluster: a larger learning rate intensifies the effect of the calculated reward when the Q-value of the selected state-action pair is updated according to (33). Therefore, in earlier episodes, when little is known about the environment, selecting some inappropriate positions leads to considerable changes in the Q-values of that state's actions, which increases the number of episodes required for the algorithm to converge. Additionally, Fig. 8c demonstrates the convergence of the sum of the emergency indexes of all hubs scheduled in the same time slot, which is inversely related to the access delay. Indeed, as explained in Section III-A, the emergency index metric $I^{H_i}_{em}$ represents the delay sensitivity of each hub $H_i$. Accordingly, a higher sum emergency index in a time slot means that hubs with more urgent conditions are scheduled in that time slot, which reduces the access delay of life-critical data packets transmitted in the allocated bandwidth.
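The effect of the learning rate on the size of each Q-value change can be seen from a single update step. The sketch below assumes that (33), which is defined outside this section, has the standard tabular Q-learning form $Q_{new}(st, ac) = (1-\lambda)\,Q(st, ac) + \lambda\,[r + \gamma \max_{ac'} Q(st', ac')]$, where $r$ denotes the reward. The default $\lambda = 0.1$ is the learning rate listed in Table 3, and $\gamma = 0.7$ from the same table is assumed here to be the discount factor; the function name and the sample reward are hypothetical.

```python
def q_update(q_old, reward, max_q_next, learning_rate=0.1, discount=0.7):
    """One standard tabular Q-learning update (assumed form of Eq. (33))."""
    target = reward + discount * max_q_next
    return (1.0 - learning_rate) * q_old + learning_rate * target


if __name__ == "__main__":
    # A single poor move (hypothetical reward of -1) from an untrained table.
    for lam in (0.1, 0.5, 0.9):
        print(lam, q_update(q_old=0.0, reward=-1.0, max_q_next=0.0, learning_rate=lam))
```

Under this assumed form, the same poor move shifts the Q-value nine times further with $\lambda = 0.9$ than with $\lambda = 0.1$, which is consistent with the explanation that early, poorly informed choices disturb the Q-table more when the learning rate is large.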
VI. CONCLUSION AND FUTURE WORK
In this paper, we addressed the Q-learning-based 3D trajectory optimization of UAVs for the timely transmission of the vital signs of patients without interrupting their daily lifestyle. To this end, we proposed the Q-REDTO algorithm, which increases the throughput and reduces the energy consumption and delay by training UAVs to achieve the best 3D placement. At first, each UAV has no prior knowledge of its corresponding cluster area, and it learns the environment during the episodes by moving among the states based on their Q-values. The mobility of patients leads to a time-varying network topology; in this situation, the proposed Q-REDTO algorithm learns to reach the best 3D position for each UAV in each time slot by updating its Q-table step by step. Moreover, our algorithm employs the NOMA technique, which simultaneously schedules the transmissions of multiple hubs by tolerating a degree of interference among them. In this setting, the data rate requirement of all simultaneous transmissions must be satisfied through a pre-specified SINR threshold. The simulation results demonstrated that the Q-REDTO scheme can achieve the benchmark values of throughput, energy consumption, and delay without requiring complete information about the environment. One possible direction for future work is to extend this study with a more sophisticated learning algorithm, together with edge computing, to find the best set of simultaneous transmissions in the NOMA technique, and to employ federated learning to improve the performance of the proposed trajectory optimization algorithm in very large test beds.
REFERENCES
[1] A. Ghubaish, T. Salman, M. Zolanvari, D. Unal, A. K. Al-Ali, and R. Jain.
Recent advances in the internet of medical things (IoMT) systems security.
IEEE Internet of Things Journal, 8(11):8707–8718, Dec. 2020.
[2] M. T. Mamaghani and Y. Hong. Intelligent trajectory design for secure full-duplex MIMO-UAV relaying against active eavesdroppers: A model-free reinforcement learning approach. IEEE Access, 9:4447–4465, Jan. 2021.
[3] A. Visintini, T. D. P. Perera, and D. N. K. Jayakody. 3-D trajectory optimization for fixed-wing UAV-enabled wireless network. IEEE Access, 9:35045–35056, March 2021.
[4] Y. Du, Z. Hen, J. Hao, and Y. Guo. Joint optimization of trajectory
and communication in multi-UAV assisted backscatter communication
networks. IEEE Access, 10:40861–40871, Apr. 2022.
[5] O. Esrafilian, R. Gangula, and D. Gesbert. Learning to communicate in
UAV-aided wireless networks: Map-based approaches. IEEE Internet of
Things Journal, 6(2):1791–1802, April 2019.
[6] F. Cheng, S. Zhang, Z. Li, Y. Chen, N. Zhao, F. R. Yu, and V. C. M. Leung.
UAV trajectory optimization for data offloading at the edge of multiple
cells. IEEE Transactions on Vehicular Technology, 67(7):6732–6736, July
2018.
[7] L. Wang, K. Wang, C. Pan, W. Xu, N. Aslam, and L. Hanzo. Multi-agent
deep reinforcement learning based trajectory planning for multi-UAV as-
sisted mobile edge computing. IEEE Trans. on Cognitive Communications
and Networking, 7(1):73–84, March 2021.
[8] X. Liu, Y. Liu, Y. Chen, and L. Hanzo. Trajectory design and power control
for multi-UAV assisted wireless networks: A machine learning approach.
IEEE Trans. on Vehicular Technology, 68(8):7957–7969, Aug. 2019.
[9] J. Cui, Z. Ding, Y. Deng, A. Nallanathan, and L. Hanzo. Adaptive UAV-
trajectory optimization under quality of service constraints: A model-free
solution. IEEE Access, 8:112253–112265, June 2020.
[10] H. Wu, F. Lyu, C. Zhou, J. Chen, L. Wang, and X. Shen. Optimal UAV
caching and trajectory in aerial-assisted vehicular networks: A learning-
based approach. IEEE Journal on Selected Areas in Communications,
38(12):2783–2797, Dec. 2020.
[11] Y. Hsu and R. Gau. Reinforcement learning-based collision avoidance and
optimal trajectory planning in UAV communication networks. IEEE Trans.
on Mobile Computing, 2020.
[12] S. Zhu, L. Gui, N. Cheng, F. Sun, and Q. Zhang. Joint design of access
point selection and path planning for UAV-assisted cellular networks.
IEEE Internet of Things Journal, 7(1):220–233, Jan. 2020.
[13] R. W. Jones and G. Despotou. Unmanned aerial systems and healthcare:
Possibilities and challenges. In Proc. 2019 14th IEEE Conference on
Industrial Electronics and Applications (ICIEA), pages 189–194, 2019.
[14] S. Ullah, K. Kim, K. H. Kim, M. Imran, P. Khan, E. Tovar, and F. Ali.
UAV-enabled healthcare architecture: Issues and challenges. Future Gen-
eration Computer Systems, 97:425–432, Aug. 2019.
[15] R. Gupta, A. Shukla, P. Mehta, P. Bhattacharya, S. Tanwar, S. Tyagi,
and N. Kumar. VAHAK: A blockchain-based outdoor delivery scheme
using UAV for healthcare 4.0 services. In Proc. IEEE Conference on
Computer Communications Workshops (INFOCOM WKSHPS), pages
255–260, 2020.
[16] A. Islam and S. Y. Shin. BHMUS: Blockchain based secure outdoor
health monitoring scheme using UAV in smart city. In Proc. 2019 7th
International Conference on Information and Communication Technology
(ICoICT), pages 1–6, 2019.
[17] A. Mukhopadhyay and D. Ganguly. FANET based emergency healthcare
data dissemination. In Proc. 2020 Second International Conference on
Inventive Research in Computing Applications (ICIRCA), pages 170–175,
2020.
[18] A. Kachroo, S. Vishwakarma, J. N. Dixon, H. Abuella, A. Popuri, Q. H.
Abbasi, C. F. Bunting, J. D. Jacob, and S. Ekin. Unmanned aerial vehicle-
to-wearables (UAV2W) indoor radio propagation channel measurements
and modeling. IEEE Access, 7:73741–73750, May 2019.
[19] S. R. Vangimalla and M. El-Sharkawy. Interoperability enhancement in
health care at remote locations using thread protocol in UAVs. In Proc.
IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics
Society, pages 2821–2826, 2018.
[20] C. Tang, C. Zhu, X. Wei, J. J. P. C. Rodrigues, M. Guizani, and W. Jia.
UAV placement optimization for Internet of medical things. In Proc. IEEE
International Wireless Communications and Mobile Computing, pages
752–757. Limassol, Cyprus, July 2020.
[21] A. Tawfiq, J. Abouei, and K. N. Plataniotis. Cyclic orthogonal codes
in CDMA-based asynchronous wireless body area networks. In Proc.
International Conference on Acoustics, Speech and Signal Processing
(ICASSP), pages 1593–1596, 2012.
[22] A. Al-Hourani, S. Kandeepan, and S. Lardner. Optimal LAP altitude for
maximum coverage. IEEE Wireless Communications Letters, 3(6):569–
572, Dec. 2014.
[23] C. You and R. Zhang. 3D trajectory optimization in Rician fading for
UAV-enabled data harvesting. IEEE Trans. on Wireless Communications,
18(6):3192–3207, June 2019.
[24] A. Likas, N. Vlassis, and J. J. Verbeek. The global k-means clustering algorithm. Pattern Recognition, 36(2):451–461, Feb. 2003.
[25] IEEE standard for local and metropolitan area networks-part 15.6: Wire-
less body area networks. IEEE Std 802.15.6-2012, pages 1–271, Feb.
2012.
ZEINAB ASKARI received the B.Sc. degree in electronics engineering from Sheikhbahaee University, Iran, in 2012. She completed the M.Sc. program in telecommunication systems engineering at Najafabad University, Iran, in 2016. Her main research interest is wireless networking, especially the Internet of Medical Things (IoMT), Wireless Body Area Networks (WBANs), scheduling, reinforcement learning, and resource allocation.
JAMSHID ABOUEI (S'05–M'11–SM'13) received
the B.Sc. degree in Electronics Engineering and
the M.Sc. degree in Communication Systems En-
gineering both from Isfahan University of Tech-
nology, Iran, in 1993 and 1996, respectively, and
the Ph.D. degree in Electrical Engineering from
University of Waterloo, Canada, in 2009. He
joined the Department of Electrical Engineering,
Yazd University, Iran, in 1996 (as a Lecturer) and
was promoted to Assistant Professor in 2010, and
Associate Professor in 2015. From 2009 to 2010, he was a Postdoctoral Fel-
low in the Department of Electrical and Computer Engineering, University
of Toronto, Canada. During his sabbatical, he was an Associate Researcher
in the Department of Electrical, Computer and Biomedical Engineering,
Ryerson University, Canada. His research interests are in 5G and wireless
sensor networks (WSNs), with a particular emphasis on PHY/MAC layer
designs including the energy efficiency and optimal resource allocation in
cognitive cell-free massive MIMO networks, multi-user information theory,
mobile edge computing and femtocaching.
MUHAMMAD JASEEMUDDIN (M ’98) re-
ceived B.E. from N.E.D. University, Pakistan, M.
S. from The University of Texas at Arlington,
and Ph.D. from University of Toronto. He worked
in Advanced IP group and Wireless Technology
Lab (WTL) at Nortel Networks. He is Professor
and Program Director of Computer Networks Pro-
gram at Ryerson University. His research interests
include network automation; caching in 5G and
ICN networks; context-aware mobile middleware
and mobile cloud; localization, power-aware MAC and routing for sensor
networks; heterogeneous wireless networks; and IP routing and traffic engi-
neering.
ALAGAN ANPALAGAN (S'98–M'01–SM'04) received the B.A.Sc., M.A.Sc., and Ph.D. degrees, all in electrical engineering, from the University of Toronto, Canada. He joined the ELCE Department, Ryerson University, Canada, in 2001, and was promoted to Full Professor in 2010. He
served the department in administrative positions
as Associate Chair, Program Director for Electri-
cal Engineering, and Graduate Program Director.
During his sabbatical, he was a Visiting Professor
at Asian Institute of Technology, and Visiting Researcher at Kyoto Univer-
sity. His industrial experience includes working for three years with Bell
Mobility, Nortel Networks, and IBM. He directs a research group working
on radio resource management (RRM) and radio access and networking
(RAN) areas within the WINCORE Laboratory. Dr. Anpalagan served as
an Editor for the IEEE COMMUNICATIONS SURVEYS AND TUTO-
RIALS (2012-2014), IEEE COMMUNICATIONS LETTERS (2010-2013),
and EURASIP Journal of Wireless Communications and Networking (2004-
2009). He also served as Guest Editor for six special issues published in
IEEE, IET and ACM. He served as TPC Co-Chair, IEEE VTC Fall 2017,
TPC Co–Chair, IEEE INFOCOM’16, IEEE Globecom15, IEEE PIMRC’11.
He served as Vice Chair, IEEE SIG on Green and Sustainable Networking
and Computing with Cognition and Cooperation (2015–18), IEEE Canada
Central Area Chair (2012-2014), IEEE Toronto Section Chair (2006-2007),
ComSoc Toronto Chapter Chair (2004-2005), and IEEE Canada Professional
Activities Committee Chair (2009-2011). Dr. Anpalagan was the recipient
of the IEEE Canada J.M. Ham Outstanding Engineering Educator Award
(2018), YSGS Outstanding Contribution to Graduate Education Award
(2017), Deans Teaching Award (2011), Faculty Scholastic, Research and
Creativity Award thrice from Ryerson University.
KONSTANTINOS (KOSTAS) N. PLATANIOTIS received his B.Eng. degree in Computer Engineering from the University of Patras, Greece, and his M.S. and Ph.D. degrees in Electrical Engineering from Florida Institute of Technology, Melbourne, Florida. Dr. Plataniotis is currently a Professor with The Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto in Toronto, Ontario, Canada, where he directs the Multimedia Laboratory. He has held the Bell Canada Endowed Chair in Multimedia since 2014. His research interests are primarily in the areas of image/signal processing, machine learning and adaptive learning systems, visual data analysis, multimedia and knowledge media, and affective computing. Dr. Plataniotis is a Fellow of IEEE, Fellow of the Engineering Institute of Canada, Fellow of the Canadian Academy of Engineering, and a registered professional engineer in Ontario.
Dr. Plataniotis has served as the Editor-in-Chief of the IEEE Signal Processing Letters. He was the Technical Co-Chair of the IEEE 2013 International Conference on Acoustics, Speech and Signal Processing, and he served as the inaugural IEEE Signal Processing Society Vice President for Membership (2014-2016) and General Co-Chair for the 2017 IEEE GLOBALSIP. He served as the General Co-Chair for the 2018 IEEE International Conference on Image Processing (ICIP 2018) and the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021). Dr. Plataniotis is the General Chair for the 2027 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2027), Toronto, ON, Canada.
FIGURE 8. Convergence of Q-REDTO in terms of (a) sum transmission rate (bps), (b) sum energy consumption (J), and (c) sum emergency index (reverse of access delay). Each panel plots the metric against episodes (0 to 100) for ESTO and for Q-REDTO with learning rates 0.1, 0.4, and 0.8.
... In conclusion, NOMA performance can be improved by applying AI technologies to channel estimation, interference mitigation, detection or modulation, resource optimization, and signal processing [50], [87], [93], [277]-[279]. For example, most AI solutions for various PD-NOMA designs include DL (e.g., [182]-[186], [251], [255], [260], [261]), DRL (e.g., [178], [179], [250], [252], [253], [257]-[259], [262], [263]), and FL (e.g., [192], [193]). In particular, DL is used in MIMO NOMA for CSI [182] and resource allocation [183], in RSMA for the precoder vector [184] and robustness [185], in HMAS for signal detection [186], in multi-carrier NOMA for power optimizing [255], in NOMA-VLC for signal demodulation [255], in FTN-NOMA for sliding window detection [260], and in cooperative NOMA for channel classification [261]. ...
... In particular, DL is used in MIMO NOMA for CSI [182] and resource allocation [183], in RSMA for the precoder vector [184] and robustness [185], in HMAS for signal detection [186], in multi-carrier NOMA for power optimizing [255], in NOMA-VLC for signal demodulation [255], in FTN-NOMA for sliding window detection [260], and in cooperative NOMA for channel classification [261]. DRL is used in RSMA and PDMA for resource allocation and optimization [178], [179], in up/downlink NOMA system for transmission scheduling [250], resource allocation [252], [257], [258], beamforming [253], channel estimation [259], [262], and in random NOMA for slot allocation [263]. FL is used in RSMA for optimizing [192] and edge transmission [193]. ...
With the rapidly increasing number of bandwidth-intensive terminals capable of intelligent computing and communication, such as smart devices equipped with shallow neural network models, the complexity of multiple access for these intelligent terminals is increasing due to the dynamic network environment and ubiquitous connectivity in 6G systems. Traditional multiple access (MA) design and optimization methods are gradually losing ground to artificial intelligence (AI) techniques that have proven their superiority in handling complexity. AI-empowered MA and its optimization strategies aimed at achieving high Quality-of-Service (QoS) are attracting more attention, especially in the area of latency-sensitive applications in 6G systems. In this work, we aim to: 1) present the development and comparative evaluation of AI-enabled MA; 2) provide a timely survey focusing on spectrum sensing, protocol design, and optimization for AI-empowered MA; and 3) explore the potential use cases of AI-empowered MA in the typical application scenarios within 6G systems. Specifically, we first present a unified framework of AI-empowered MA for 6G systems by incorporating various promising machine learning techniques in spectrum sensing, resource allocation, MA protocol design, and optimization. We then introduce AI-empowered MA spectrum sensing related to spectrum sharing and spectrum interference management. Next, we discuss the AI-empowered MA protocol designs and implementation methods by reviewing and comparing the state-of-the-art, and we further explore the optimization algorithms related to dynamic resource management, parameter adjustment, and access scheme switching. Finally, we discuss the current challenges, point out open issues, and outline potential future research directions in this field.
... Moreover, UAVs are also suggested for telemedicine purposes, as in one experimental study [79], wherein drones enabled the medical professional to guide a subject to conduct a remotely mentored lung examination on himself, or in the theoretical model proposed in [82], in which UAVs resulted in being a flexible and cost-effective solution for the remote monitoring of the vital signs of patients and real-time scheduling of the transmission of vital signs. ...
... Drones are being explored for various other medical applications, including assessing patient conditions [78,81] and telemedicine support [79,82]. These applications offer innovative solutions for remote healthcare, but further research and development are required to address technical and regulatory challenges. ...
Uncrewed aerial vehicles (UAVs), commonly known as drones, have emerged as transformative tools in the healthcare sector, offering the potential to revolutionize medical logistics, emergency response, and patient care. This scoping review provides a comprehensive exploration of the diverse applications of drones in healthcare, addressing critical gaps in existing literature. While previous reviews have primarily focused on specific facets of drone technology within the medical field, this study offers a holistic perspective, encompassing a wide range of potential healthcare applications. The review categorizes and analyzes the literature according to key domains, including the transport of biomedical goods, automated external defibrillator (AED) delivery, healthcare logistics, air ambulance services, and various other medical applications. It also examines public acceptance and the regulatory framework surrounding medical drone services. Despite advancements, critical knowledge gaps persist, particularly in understanding the intricate interplay between technological challenges, the existing regulatory framework, and societal acceptance. This review highlights the need for the extensive validation of cost-effective business cases, the development of control techniques that can address time and resource savings within the constraints of real-life scenarios, the design of crash-protected containers, and the establishment of corresponding tests and standards to demonstrate their conformity.
... However, IoMT encompasses a wider range of medical devices and equipment than only the human body [17]. The authors in [18] used NOMA scheduling for vital signs of patients in IoMT-WBAN with unmanned air vehicle (UAV) capability for outdoor application. IoMT devices are used by the patients to sense and collect several vital signs, while UAVs are in charge of collecting patient data packets dispersed throughout the city, including unreachable areas. ...
Wireless body area networks (WBANs) are noteworthy, dependable, and most advantageous development in many health directed applications. WBANs in health system connect tiny sensors to invasive or non-invasive medical equipment for patient examinations utilizing low power wireless technology. When WBANs are utilized in crowded areas or in conjunction with other wireless sensor networks, communication interference can arise. This can lead to unstable signal integrity, which can impair system performance. Thus, interference mitigation needs to be taken into account when designing. In this paper we survey interference mitigation methods used in WBANs, classify them, and finally point to some future research directions in this area. In particular, we start by reviewing sample papers that tackle the interference management problem through classical signal processing and shaping methods. Then we review some works on cooperative communications approaches to encounter this problem. After that we consider approaches that are not centralized through the use of game theoretic formulations. We close our discussion by surveying model free algorithm through learning based approaches.
... The throughput of the secondary network can be significantly improved by optimizing the UAV's power allocation in NOMA transmissions. The authors in [41] focused on a UAV-assisted large-scale wireless body area network for remote monitoring of the patients' vital signs by optimizing each UAV's trajectory. The NOMA technique was employed to simultaneously schedule UAVs' data transmissions, which can enhance the network throughput with high spectrum efficiency. ...
This paper considers multiple unmanned aerial vehicles (UAVs) to assist sensing data transmissions from the ground users (GUs) to a remote base station (BS). Each UAV collects sensing data from the GUs via low-power backscatter communications and then forwards it to the remote BS by the non-orthogonal multiple access (NOMA) transmissions. We formulate a multi-stage stochastic optimization problem to minimize the long-term time-averaged age-of-information (AoI) by jointly optimizing the GUs' access control, the UAVs' beamforming, and trajectory planning strategies. We first model the dynamics of the GUs' AoI statuses by virtual queueing systems, and then propose the AoI-aware sensing scheduling and trajectory optimization (AoI-STO) algorithm. This allows us to transform the multi-stage AoI minimization problem into a series of per-slot control problems by using the Lyapunov optimization framework. In each time slot, the GUs' access control, the UAVs' beamforming, and mobility control strategies are updated by using the block coordinate descent (BCD) method according to the instant GUs' AoI statuses. Simulation results reveal that the proposed AoI-STO algorithm can reduce the overall AoI by more than 50%. The GUs' scheduling fairness is also improved by adapting the GUs' access control compared with typical baseline schemes.
... Moreover, deep learning is introduced into UAV trajectory optimization. [19] uses Q-learning for trajectory optimization, and [20] extends this by incorporating quality of service constraints. However, the optimization problems of the existing works only focus on the location of the UAV, and use PSO or convex optimization methods to solve them, without paying attention to the user group, or joint optimization of UAV's position and user grouping. ...
This article investigates the use of unmanned aerial vehicles (UAVs) in assisting hybrid non-orthogonal multiple access (NOMA) systems to enhance spectrum efficiency and communication connectivity. A joint optimization problem is formulated for UAV positioning and user grouping to maximize the sum rate. The formulated problem exhibits non-convexity, calling for an effective solution. To address this issue, a two-stage approach is proposed. In the first stage, a particle swarm optimization algorithm is employed to optimize the UAV positions without considering user grouping. With the UAV positions optimized, a game theory-based approach is utilized in the second stage to optimize user grouping and improve the sum rate of the hybrid NOMA system. Simulation results demonstrate that the proposed two-stage method achieves solutions close to the global optimum of the original problem. By optimizing the positions of UAVs and user groups, the sum rate can be effectively improved. Additionally, optimizing the deployment of UAVs ensures better fairness in providing communication services to multiple users. Keywords: game theory, non-orthogonal multiple access, particle swarm optimization, unmanned aerial vehicle, user grouping.
... Error-free transmission at ergodic capacity becomes no longer applicable [15]. Instead, data transmission in WBAN typically operates in the finite blocklength (FBL) regime [16]. Furthermore, due to the time-varying nature of WBAN channels, different transmission sessions experience different channel realization. ...
Energy-efficient transmission is essential to wireless body area networks (WBAN) as most biosensors in WBAN have limited energy supply. In this paper, we study the energy consumption minimization problem for each amplify-and-forward (AF) relay transmission session while satisfying a certain reliability requirement in WBAN. To minimize the energy consumption during successful transmission and wasted energy due to failed transmission attempts, over finite blocklength (FBL) regime, we design an intelligent agent that can determine: i) whether to transmit or not for given current channel state information (CSI) and available resources and ii) the best power levels and blocklength values for the current transmission session if the agent decides to transmit. To perform these two tasks simultaneously, we propose a novel hybrid supervised/reinforcement learning solution. Specifically, we design a classification network following the supervised learning approach to determine whether to transmit or not based on predicted minimal packet error probability. We then develop a deep reinforcement learning (DRL)-based solution that determines the optimal values of the transmission parameters. We also propose a DRL-based online parameter tuning (DRL-OPT) algorithm to minimize the impact of model inaccuracy and/or environment changes. Simulation results reveal that the performance of the proposed hybrid solution is almost identical to that of the exhaustive search. The DRL-OPT algorithm can follow environment variation and maintain a good performance with low computational complexity. Moreover, we numerically analyze the effect of slot duration on energy consumption and develop a guideline for practical WBAN design.
... Internet of medical things-based wireless body-area network (IoMT-based WBAN), a promising technology in continuous patient tracking and health monitoring applications, has recently received considerable attention, even in 5G and beyond 6G systems [1]-[5]. The IEEE Std 802.15.6-2012 is the published international standard for WBAN, based on which various frequency bands are allocated to WBAN applications, such as 402-405 MHz medical implant communication services band, 902-928; 2400-2500; 5725-5875 MHz industrial, scientific, and medical (ISM) band, 3.1-10.76 ...
Continuous health monitoring of vital signs of patients is a challenging issue, especially in emergency medical conditions. This paper designs a practical internet of medical things-based wireless body area network (IoMT-based WBAN), to address this issue. Accordingly, a two-fold test-bed design is proposed taking i) signaling and ii) antenna configuration into account to attain uninterruptible on/off-body communication links. Firstly, the Walsh-Hadamard coding technique is used in all bio-sensors to retain orthogonal simultaneous signaling for on-body links. Secondly, an antenna configuration of the hub is designed so that it prevents probable interruptions in off-body links which may be caused by some human postures. More accurately, a novel periodic leaky-wave antenna (LWA) with an elliptical belt shape is introduced which generates a quasi-omnidirectional pattern. The LWA is designed based on a multi-tone periodicity of a width-modulated microstrip line. At the design frequency of 5.8 GHz, the suggested conformal periodic LWA was simulated and then fabricated. Simulations and measurement results illustrate that the performance of on/off-body communication links is improved in comparison to conventional antennas. Furthermore, simulated and measured radiation patterns have a good agreement with theoretical calculations. Moreover, specific absorption rate (SAR) values of the proposed antenna are significantly below the SAR limits so that this technique can be highly recommended for WBAN applications.
With the rapidly increasing number of bandwidth-intensive terminals capable of intelligent computing and communication, such as smart devices equipped with shallow neural network (NN) models, the complexity of multiple access (MA) for these intelligent terminals is increasing due to the dynamic network environment and ubiquitous connectivity in sixth-generation (6G) systems. Traditional MA design and optimization methods are gradually losing ground to artificial intelligence (AI) techniques that have proven their superiority in handling complexity. AI-empowered MA and its optimization strategies aimed at achieving high quality-of-service (QoS) are attracting more attention, especially in the area of latency-sensitive applications in 6G systems. In this work, we aim to: 1) present the development and comparative evaluation of AI-enabled MA; 2) provide a timely survey focusing on spectrum sensing, protocol design, and optimization for AI-empowered MA; and 3) explore the potential use cases of AI-empowered MA in the typical application scenarios within 6G systems. Specifically, we first present a unified framework of AI-empowered MA for 6G systems by incorporating various promising machine learning (ML) techniques in spectrum sensing, resource allocation, MA protocol design, and optimization. We then introduce AI-empowered MA spectrum sensing related to spectrum sharing and spectrum interference management. Next, we discuss the AI-empowered MA protocol designs and implementation methods by reviewing and comparing the state of the art and further explore the optimization algorithms related to dynamic resource management, parameter adjustment, and access scheme switching. Finally, we discuss the current challenges, point out open issues, and outline potential future research directions in this field.
This paper investigates a multiple unmanned aerial vehicle (multi-UAV) assisted backscatter communication network (BCN), where multiple UAVs are employed to transmit RF carriers to as well as collect data from multiple backscatter sensor nodes (BSNs) deployed on the ground. We formulate an optimization problem to maximize the max-min rate of the BCN by jointly optimizing three blocks of variables, i.e., the UAVs’ trajectories, the UAVs’ transmission power and the BSNs’ scheduling. The BSNs’ sequential energy constraints are innovatively considered in our work. However, the formulated optimization problem is difficult to be solved due to its non-convexity and combinatorial nature. To this end, we use the block coordinate descent (BCD) method and successive convex approximation (SCA) technique. Numerical results show the impact of the BSNs’ sequential energy constraint on the designed UAV trajectories and verify the gain of the proposed design in the max-min rate as compared to a benchmark scheme with UAV trajectory not optimized.
Unmanned aerial vehicles (UAVs) are a promising technology for the next-generation communication systems. In this paper, a fixed-wing UAV to enhance the connectivity for far-users at the coverage region of an overcrowded base station (BS) is considered. In particular, a three-dimensional (3D) UAV trajectory optimization is studied to improve the overall energy efficiency of the communication system by considering the system throughput and the UAV’s energy consumption for a given finite time horizon. The solutions for the proposed optimization problem are derived by applying Lagrangian optimization and using an algorithm based on successive convex iteration techniques. Numerical results demonstrate that by optimizing the UAV’s trajectory in the 3D space, the proposed system design achieves significantly higher energy efficiency with the gain reaching up to 20 bits/J compared to the 14 bits/J maximum gain achieved by the 2D space trajectory. Further, results reveal that the proposed algorithm converges earlier for the 3D space trajectory than for the 2D space trajectory.
Unmanned aerial vehicle (UAV) assisted wireless communication has recently been recognized as an inevitably promising component of future wireless networks. Particularly, UAVs can be utilized as relays to establish or improve network connectivity thanks to their flexible mobility and likely line-of-sight channel conditions. However, this gives rise to more harmful security issues due to potential adversaries, particularly active eavesdroppers. To combat active eavesdroppers, we propose an artificial-noise beamforming based secure transmission scheme for a full-duplex UAV relaying scenario. In the considered scheme, we investigate a UAV-relay equipped with multiple antennas to securely serve multiple ground users in the presence of randomly located active eavesdroppers. We formulate a novel average system secrecy rate (ASSR) maximization problem under some quality of service (QoS) and mission time constraints. Since the ASSR optimization problem is too hard to solve by conventional optimization methods due to the unavailability of the environment's dynamics and complex model, we develop some model-free reinforcement learning-based algorithms, i.e., Q-learning, SARSA, Expected SARSA, Double Q-learning, and SARSA(λ), to efficiently solve the problem without substantial UAV-network data exchange. Using the proposed algorithms, we can maximize ASSR via finding an optimal UAV trajectory and proper resource allocation. Simulation results demonstrate that all the proposed learning-based algorithms can train the UAV-relay to learn the environment by iterative interactions, thus finding an optimal trajectory, intelligently. Particularly, we find that SARSA(λ) based proposed algorithm with λ=0.1 outperforms the others in terms of the ASSR.
The rapid evolutions in micro-computing, mini-hardware manufacturing, and machine to machine (M2M) communications have enabled novel Internet of Things (IoT) solutions to reshape many networking applications. Healthcare systems are among these applications that have been revolutionized with IoT, introducing an IoT branch known as the Internet of Medical Things (IoMT) systems. IoMT systems allow remote monitoring of patients with chronic diseases. Thus, they can provide timely patient diagnostics that can save lives in case of emergencies. However, security in these critical systems is a major challenge facing their wide utilization. In this paper, we present state-of-the-art techniques to secure IoMT systems’ data during collection, transmission, and storage. We comprehensively overview IoMT systems’ potential attacks, including physical and network attacks. Our findings reveal that most security techniques do not consider various types of attacks. Hence, we propose a security framework that combines several security techniques. The framework covers IoMT security requirements and can mitigate most of its known attacks.
An unmanned aerial vehicle (UAV)-aided mobile edge computing (MEC) framework is proposed, where several UAVs having different trajectories fly over the target area and support the user equipments (UEs) on the ground. We aim to jointly optimize the geographical fairness among all the UEs, the fairness of each UAV’s UE-load and the overall energy consumption of UEs. The above optimization problem includes both integer and continuous variables and it is challenging to solve. To address the above problem, a multi-agent deep reinforcement learning based trajectory control algorithm is proposed for managing the trajectory of each UAV independently, where the popular Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method is applied. Given the UAVs’ trajectories, a low-complexity approach is introduced for optimizing the offloading decisions of UEs. We show that our proposed solution has considerable performance over other traditional algorithms, both in terms of the fairness for serving UEs, fairness of UE-load at each UAV and energy consumption for all the UEs.
Unmanned aerial vehicles (UAVs) with the potential of providing reliable high-rate connectivity, are becoming a promising component of future wireless networks. A UAV collects data from a set of randomly distributed sensors, where both the locations of these sensors and their data volume to be transmitted are unknown to the UAV. In order to assist the UAV in finding the optimal motion trajectory in the face of the uncertainty without the above knowledge whilst aiming for maximizing the cumulative collected data, we formulate a reinforcement learning problem by modelling the motion-trajectory as a Markov decision process with the UAV acting as the learning agent. Then, we propose a pair of novel trajectory optimization algorithms based on stochastic modelling and reinforcement learning, which allows the UAV to optimize its flight trajectory without the need for system identification. More specifically, by dividing the considered region into small tiles, we conceive state-action-reward-state-action (Sarsa) and Q-learning based UAV-trajectory optimization algorithms (i.e., SUTOA and QUTOA) aiming to maximize the cumulative data collected during the finite flight-time. Our simulation results demonstrate that both of the proposed approaches are capable of finding an optimal trajectory under the flight-time constraint. The preference for QUTOA vs. SUTOA depends on the relative position of the start and the end points of the UAVs.
In this paper, we investigate the UAV-aided edge caching to assist terrestrial vehicular networks in delivering high-bandwidth content files. Aiming at maximizing the overall network throughput, we formulate a joint caching and trajectory optimization (JCTO) problem to make decisions on content placement, content delivery, and UAV trajectory simultaneously. As the decisions interact with each other and the UAV energy is limited, the formulated JCTO problem is intractable directly and timely. To this end, we propose a deep supervised learning scheme to enable intelligent edge for real-time decision-making in the highly dynamic vehicular networks. In specific, we first propose a clustering-based two-layered (CBTL) algorithm to solve the JCTO problem offline. With a given content placement strategy, we devise a time-based graph decomposition method to jointly optimize the content delivery and trajectory design, with which we then leverage the particle swarm optimization (PSO) algorithm to further optimize the content placement. We then design a deep supervised learning architecture of the convolutional neural network (CNN) to make fast decisions online. The network density and content request distribution with spatio-temporal dimensions are labeled as channeled images and input to the CNN-based model, and the results achieved by the CBTL algorithm are labeled as model outputs. With the CNN-based model, a function which maps the input network information to the output decision can be intelligently learnt to make timely inference and facilitate online decisions. We conduct extensive trace-driven experiments, and our results demonstrate both the efficiency of CBTL in solving the JCTO problem and the superior learning performance with the CNN-based model.
In this paper, we propose a reinforcement learning approach of collision avoidance and investigate optimal trajectory planning for unmanned aerial vehicle (UAV) communication networks. Specifically, each UAV takes charge of delivering objects in the forward path and collecting data from heterogeneous ground IoT devices in the backward path. We adopt reinforcement learning for assisting UAVs to learn collision avoidance without knowing the trajectories of other UAVs in advance. In addition, for each UAV, we use optimization theory to find out a shortest backward path that assures data collection from all associated IoT devices. To obtain an optimal visiting order for IoT devices, we formulate and solve a no-return traveling salesman problem. Given a visiting order, we formulate and solve a sequence of convex optimization problems to obtain line segments of an optimal backward path for heterogeneous ground IoT devices. We use analytical results and simulation results to justify the usage of the proposed approach. Simulation results show that the proposed approach is superior to a number of alternative approaches.