Deep RL-assisted Energy Harvesting in CR-NOMA
Communications for NextG IoT Networks
Syed Asad Ullah, Shah Zeb, Aamir Mahmood, Syed Ali Hassan, and Mikael Gidlund
School of Electrical Engineering & Computer Science (SEECS),
National University of Sciences & Technology (NUST), 44000 Islamabad, Pakistan.
Department of Information Systems & Technology, Mid Sweden University, 851 70 Sundsvall, Sweden.
Email: {sullah.phdee21seecs, szeb.dphd19seecs, ali.hassan}@seecs.edu.pk, {firstname.lastname}@miun.se
Abstract—Zero-energy radios in energy-constrained devices
are envisioned as key enablers to realizing the next-generation
Internet-of-things (NG-IoT) networks for ultra-dense sensing
and monitoring. This paper presents analytical modeling and
analysis of the energy-efficient uplink transmission of an energy-
constrained secondary sensor operating opportunistically among
several primary sensors. The considered scenario assumes that
all primary sensors transmit in a round-robin, time division
multiple access-based scheme, and the secondary sensor is
admitted into the time slot of each primary sensor using a
non-orthogonal multiple access technique inspired by cognitive radio.
The energy efficiency of the secondary sensor is maximized
using a deep reinforcement learning-based algorithm
known as the deep deterministic policy gradient (DDPG). Our
results demonstrate that the DDPG-based transmission scheme
outperforms the conventional random and greedy algorithms in
terms of energy efficiency at different operating conditions.
Index Terms—Next-generation Internet-of-things (NG-IoT),
non-orthogonal multiple access (NOMA), deep deterministic
policy gradient (DDPG), energy efficiency (EE).
I. INTRODUCTION
The provision of energy-efficient wireless connectivity is
becoming vital to realize next-generation Internet-of-things
(NG-IoT) networks. The IoT devices usually have constrained
power supplies, mandating the design of energy-efficient
radios and optimized communication protocols to reduce
energy consumption. In this respect, zero-energy radios are
envisioned to enable ultra-dense connectivity for numerous
application areas, including smart industries, smart healthcare,
smart agriculture, and smart cities [1], [2]. Such radios are
expected to increase the scale of sensing and monitoring
without requiring operators to charge or replace batteries.
Hence, the goal of NG-IoT networks is to ensure
energy-efficient communication while meeting sustainable
development goals (SDGs) and limiting the operational expenditures
(OPEX) of the communication network [3], [4].
With the ever-growing size of IoT networks, sustaining
the lifetime of energy-constrained sensors
becomes difficult. In particular, when the sensors are deployed
in hard-to-reach places, traditional battery-based solutions
are impractical due to the high cost of battery replacement and
recycling issues. Therefore, numerous radio frequency (RF)-based
energy harvesting and green communication techniques
are being investigated to address this challenge [5], [6].
In the harvest-then-transmit model, the energy-constrained
sensors may need to switch from transmitting to harvesting or
vice versa depending on various dynamic factors, including
battery capacity, channel conditions, transmit power, and
circuit power [7]–[9]. Under these dynamics, autonomous
and intelligent decision-making and optimization techniques
are necessary, for which deep reinforcement learning (DRL)-
based strategies are gaining momentum [10].
Nevertheless, serving multiple energy-constrained sensors
is still a challenging task due to spectrum limitations.
This challenge can be addressed by adopting cognitive
radio-inspired non-orthogonal multiple access (CR-NOMA),
a multiple access technique in which multiple uplink users are
multiplexed together and served concurrently [11]–[13].
To provide energy- and spectrum-efficient communication,
optimal energy harvesting and CR-NOMA-based transmis-
sion methods are being investigated in the literature. The
work in [14] addressed a long-term throughput maximization
problem of a point-to-point network and applied the deep
deterministic policy gradient (DDPG) algorithm to achieve
this goal. The authors of [12] have looked into the throughput
maximization problem in an extended uplink scenario where
one unlicensed user uses the NOMA approach to transmit
data during a licensed user’s time slot. To the best of our
knowledge, energy efficiency maximization and its analysis
for an energy-constrained sensor in a CR-NOMA-assisted
NG-IoT network have not been addressed yet.
To maintain a reasonable quality of service (QoS), we mathematically model
the uplink transmission of an energy-constrained sensor
operating in a CR-NOMA-assisted NG-IoT network and analyze
its energy consumption. A DRL-based approach is
implemented to maximize the EE of the energy-constrained IoT
sensor operating among several primary sensors that follow a
round-robin time division multiple access (TDMA) scheme.
The contributions of this paper are listed as follows.
• We formulate the energy efficiency metric for an energy-constrained
sensor in a CR-NOMA-assisted IoT network
and optimize it using the DDPG algorithm.
• We analyze the energy efficiency for different
parameters, including the path loss exponent, distance, and circuit
power, and compare the results with existing
benchmark schemes, namely the greedy and random algorithms.
Fig. 1. System model diagram for uplink communication in NG-IoT network
The remainder of the paper is structured as follows. The
system model is presented in Sec. II. Sec. III formulates our
problem within the DDPG framework, and Sec. IV discusses
the simulation results. Finally, Sec. V concludes the
paper.
II. SYSTEM MODEL
We consider an uplink communication scenario as shown in
Fig. 1. There are $N$ primary users (e.g., sensors), denoted by
$U_j$ for $j \in \{1, \cdots, N\}$, a base station (BS), and an
energy-constrained secondary sensor, represented by $U_0$, which can
harvest energy from the primary sensors when they transmit.
The channel gain of the secondary sensor is denoted by $h_0$, and
those of the primary sensors are denoted by $h_j$. The channel
between the secondary sensor and the respective primary
sensor is given by $h_{j,0}$. All primary sensors transmit based
on TDMA round-robin scheduling, assisted by CR-NOMA,
with a fixed time slot duration $T$, and the transmission continues for a long
time $(NT)$ so that each primary sensor can transmit at least
once.
1) CR-NOMA-enhanced scheme: For transmitting data, the
energy-constrained secondary sensor is admitted into the time slot of
each primary sensor via CR-NOMA. In each time
slot of duration $T$, the first $\tau_t T$ seconds are used by the secondary sensor
for transmitting data and the remaining time, $(1-\tau_t)T$, for
harvesting energy, where $\tau_t$ denotes the time-sharing coefficient
and assumes a value between 0 and 1. The following
assumptions are made in this scenario: i) the secondary
sensor is aware of the channel state information of the
primary sensor scheduled in that particular time slot, and ii)
the battery of the energy-constrained sensor is
full at the start of the communication. With these assumptions,
the transmit power of the secondary sensor is constrained by
$$\tau_t T P_{0,t} \le E_t, \tag{1}$$
where $E_t$ denotes the energy currently available in the battery of the
secondary sensor at time $t$ and $P_{0,t}$ represents its transmit
power at time $t$. Similarly, the energy available to the
secondary sensor at the start of time slot $t+1$ is given
by
$$E_{t+1} = \min\left\{E_t + (1-\tau_t) T \eta P_{j_t} |h_{j_t,0}|^2 - \tau_t T P_{0,t},\; E_m\right\}, \tag{2}$$
which fulfills the condition of no energy overflow. In (2), $E_m$
represents the secondary sensor’s maximum battery capacity,
$P_{j_t}$ represents the transmit power of the $j$-th primary
sensor, scheduled at time $t$, $\eta$ is the energy harvesting
efficiency coefficient, and $h_{j_t,0}$ represents the channel between the
secondary sensor and the $j$-th primary sensor at time $t$. Therefore,
the EE of the secondary sensor can be defined
as [15]
$$\hat{\Gamma}_{EE} = \frac{\sum_{t=1}^{M} R_t(\tau_t, P_{0,t})}{P_T}, \tag{3}$$
where $R_t(\tau_t, P_{0,t}) = \tau_t \log_2\left(1 + \frac{P_{0,t}|h_0|^2}{1 + P_{j_t}|h_{j_t}|^2}\right)$ and $P_T = P_c + P_{0,t}$,
with $P_c$ representing the circuit power consumed by the
internal circuitry of the secondary sensor. The $R_t$ expression
reflects that the BS performs successive interference
cancellation (SIC): it first decodes the signal from
the secondary sensor while treating the primary sensor’s signal as
interference. After the BS removes the secondary
sensor’s decoded signal, the signal of the primary sensor
can be decoded.
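As an illustration, the battery update in (2) and the per-slot contribution to (3) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' simulation code: the function names are hypothetical, the default values of `T`, `eta`, and `e_max` are taken from Table I, and all channel gains are assumed to be already normalized by the noise power so that the rate expression of (3) applies directly.

```python
import math

def rate(tau, p0, h0_gain, pj, hj_gain):
    """Rate term R_t of (3): tau * log2(1 + P0|h0|^2 / (1 + Pj|hj|^2))."""
    sinr = p0 * h0_gain / (1.0 + pj * hj_gain)
    return tau * math.log2(1.0 + sinr)

def next_energy(e_t, tau, p0, pj, hj0_gain, T=1.0, eta=0.9, e_max=0.2):
    """Battery update of (2): harvest for (1 - tau)T, transmit for tau*T, cap at E_m."""
    harvested = (1.0 - tau) * T * eta * pj * hj0_gain
    consumed = tau * T * p0
    if consumed > e_t + 1e-12:
        # Constraint (1): the sensor cannot spend more energy than it holds.
        raise ValueError("tau*T*P0 must not exceed the available energy E_t")
    return min(e_t + harvested - consumed, e_max)

def slot_energy_efficiency(tau, p0, h0_gain, pj, hj_gain, p_c):
    """Single-slot term of (3): R_t divided by the total power P_c + P0."""
    return rate(tau, p0, h0_gain, pj, hj_gain) / (p_c + p0)
```

For a fixed rate, a larger circuit power `p_c` enlarges the denominator and therefore lowers the value returned by `slot_energy_efficiency`.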
A. Problem Formulation
Our goal is to maximize the EE; therefore, (3) can be formulated
as the maximization problem
$$\begin{aligned}
\max_{\tau_t, P_{0,t}} \quad & f_o(\tau_t, P_{0,t}) \\
\text{s.t.} \quad & \mathrm{C1}: f_1(P_{0,t}, \tau_t) = \min\{E_m, Q\}, \\
& \mathrm{C2}: f_2(P_{0,t}, \tau_t) \le 0, \\
& \mathrm{C3}: 0 \le f_3(\tau_t) \le 1, \\
& \mathrm{C4}: 0 \le f_4(P_{0,t}) \le P_{sm},
\end{aligned} \tag{4}$$
where $P_{sm}$ is the maximum transmit power of the secondary
sensor, $f_o(\tau_t, P_{0,t}) = \hat{\Gamma}_{EE}(\tau_t, P_{0,t})$, $f_1(P_{0,t}, \tau_t) = E_{t+1}$,
$f_2(P_{0,t}, \tau_t) = \tau_t T P_{0,t} - E_t$, $f_3(\tau_t) = \tau_t$, $f_4(P_{0,t}) = P_{0,t}$,
and $Q = E_t + (1-\tau_t) T \eta P_{j_t}|h_{j_t,0}|^2 - \tau_t T P_{0,t}$. Constraint C1
expresses the battery energy level of the secondary sensor at
time $t+1$, where the amount of harvested energy cannot exceed
its maximum battery capacity. C2 is the difference between
the energy consumed and the energy available at time $t$, which
ensures the non-negativity of C1. C3 limits the value of the
time-sharing coefficient to between 0 and 1. Finally, C4 states
that the transmit power of the secondary sensor can assume a
value between 0 and $P_{sm}$.
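The role of the product of the two optimization variables in C2 can be made concrete with a short check. The sketch below is illustrative only: `is_feasible` is a hypothetical helper, `p_sm = 0.2` W is a rounded stand-in for the 23 dBm of Table I, and the energy level of 0.06 J is an arbitrary example value. It tests C2 to C4 for a candidate pair and shows two feasible points whose midpoint violates C2, so the feasible set is not convex.

```python
def is_feasible(tau, p0, e_t, T=1.0, p_sm=0.2):
    """Check constraints C2-C4 of problem (4) for a candidate (tau, P0)."""
    c2 = tau * T * p0 - e_t <= 0.0   # energy spent must not exceed energy held
    c3 = 0.0 <= tau <= 1.0           # time-sharing coefficient in [0, 1]
    c4 = 0.0 <= p0 <= p_sm           # transmit power within its limit
    return c2 and c3 and c4

# Two feasible operating points for E_t = 0.06 J (illustrative value).
a = (1.0, 0.05)    # transmit all slot long at low power
b = (0.25, 0.2)    # transmit briefly at maximum power
mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)

print(is_feasible(*a, e_t=0.06), is_feasible(*b, e_t=0.06), is_feasible(*mid, e_t=0.06))
```

Both endpoints spend 0.05 J, whereas the midpoint spends about 0.078 J, which exceeds the available energy.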
Problem (4) is non-convex because C1 is not an affine
function and the two optimization variables appear as a product
in C2. However, because the optimization variables
are continuous, problem (4) can be solved using the
DDPG algorithm. Problem (4) is first divided into two
sub-problems, since the range of values of the optimization
variables makes a direct implementation of DDPG challenging.
The first sub-problem is defined as
$$\begin{aligned}
\max_{\tau_t, P_{0,t}} \quad & f_o(\tau_t, P_{0,t}) \\
\text{s.t.} \quad & \mathrm{C1}: \hat{f}_1(P_{0,t}, \tau_t) = 0, \\
& \mathrm{C2}, \mathrm{C3}, \mathrm{C4} \text{ in (4)},
\end{aligned} \tag{5}$$
where $\hat{f}_1(P_{0,t}, \tau_t) = (1-\tau_t) T \eta P_{j_t}|h_{j_t,0}|^2 - \tau_t T P_{0,t} - \bar{E}_t$
and $\bar{E}_t = (1-\tau_t) T \eta P_{j_t}|h_{j_t,0}|^2 - \tau_t T P_{0,t}$, which denotes the
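energy fluctuation parameter. Problem (5) is solved by convex
optimization, where closed-form expressions are obtained
for a given $\bar{E}_t$. The corresponding closed-form expressions
are given as [12]
$$P^*_{0,t}(\bar{E}_t) = \frac{(1-\tau^*_t)\,\eta P_{j_t}|h_{j_t,0}|^2}{\tau^*_t} - \frac{\bar{E}_t}{\tau^*_t T},$$
and
$$\tau^*_t(\bar{E}_t) = \min\{1, \max\{x^*, \Theta_0\}\},$$
where
$$\Theta_0 = \max\left\{1 - \frac{E_t + \bar{E}_t}{T \eta P_{j_t}|h_{j_t,0}|^2},\; \frac{T \eta P_{j_t}|h_{j_t,0}|^2 - \bar{E}_t}{T \eta P_{j_t}|h_{j_t,0}|^2 + T P_{sm}}\right\},$$
$$x^* = \frac{x_1 - x_2}{e^{W_0\left(e^{-1}(x_1 - 1)\right) + 1} - 1 + x_1},$$
$x_1 = \frac{\eta P_{j_t}|h_{j_t,0}|^2 |h_0|^2}{1 + P_{j_t}|h_{j_t}|^2}$, $x_2 = \frac{\bar{E}_t |h_0|^2}{T(1 + P_{j_t}|h_{j_t}|^2)}$,
and $W_0(\cdot)$ represents the Lambert W function.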
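A numerical sketch of these closed-form expressions is given below. It rests on several assumptions that should be checked against [12]: the lower bound on the time-sharing coefficient (the inner maximum of two fractions) is taken as read from the expressions above, the maximum transmit power in that bound is assumed to be the 23 dBm limit of the secondary sensor (rounded to 0.2 W), and channel gains are assumed normalized by the noise power. The helper names are hypothetical. The principal branch of the Lambert W function is computed with a plain Newton iteration so that the example needs only the standard library; `scipy.special.lambertw` could be used instead.

```python
import math

def lambert_w0(z, tol=1e-12, max_iter=200):
    """Principal branch W0(z) for z >= -1/e, by Newton iteration on w*exp(w) = z."""
    if z < -1.0 / math.e - 1e-12:
        raise ValueError("W0 is real only for z >= -1/e")
    if z <= -1.0 / math.e:
        return -1.0
    w = math.log1p(z)  # starts at or to the right of the root
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (1.0 + w))
        w -= step
        if abs(step) < tol:
            break
    return w

def optimal_tau_and_power(e_bar, e_t, pj, h0_gain, hj_gain, hj0_gain,
                          T=1.0, eta=0.9, p_sm=0.2):
    """Return (tau*, P0*) for a given energy fluctuation e_bar."""
    harvest = eta * pj * hj0_gain  # harvested power while not transmitting
    if harvest <= 0.0:
        raise ValueError("this sketch assumes a positive harvested power")
    x1 = harvest * h0_gain / (1.0 + pj * hj_gain)
    x2 = e_bar * h0_gain / (T * (1.0 + pj * hj_gain))
    w = lambert_w0((x1 - 1.0) / math.e)
    x_star = (x1 - x2) / (math.exp(w + 1.0) - 1.0 + x1)
    lower = max(1.0 - (e_t + e_bar) / (T * harvest),
                (T * harvest - e_bar) / (T * harvest + T * p_sm))
    tau = min(1.0, max(x_star, lower))
    if tau <= 0.0:
        return 0.0, 0.0  # the whole slot is spent harvesting
    p0 = (1.0 - tau) * harvest / tau - e_bar / (tau * T)
    return tau, p0
```

The two terms of the lower bound correspond to constraints C2 and C4: the first keeps the energy spent within $E_t$, and the second keeps the resulting transmit power within its maximum.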
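The second sub-problem is defined as follows. As our goal
is to maximize the EE, we can observe from (5) that the EE, $\hat{\Gamma}_{EE}$,
at time $t$ does not depend on $\tau_{\hat{t}}$ and $P_{0,\hat{t}}$ for $t \ne \hat{t}$. Hence, the
optimization problem (4) can be reformulated as a function of
$\bar{E}_t$ within the DDPG framework, which is given as
$$\begin{aligned}
\max_{\bar{E}_t} \quad & \gamma^{t-1}\, \hat{\Gamma}_{EE}\left(\bar{E}_t \mid \tau^*_t, P^*_{0,t}\right) \\
\text{s.t.} \quad & E_{t+1} = \min\left\{E_m, E_t + \bar{E}_t\right\},
\end{aligned} \tag{6}$$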
where $\gamma$ represents the discount factor and assumes a value
between 0 and 1. From problem (6), it can be seen that the
action of the energy-constrained sensor is to choose $\bar{E}_t$ for
a given $\tau^*_t$ and $P^*_{0,t}$. By substituting the expression of $\hat{\Gamma}_{EE}$
in (6), we get the maximization problem
$$\begin{aligned}
\max_{\bar{E}_t} \quad & \frac{\sum_{t=1}^{M} \gamma^{t-1}\, \tau^*_t(\bar{E}_t) \log_2\left(1 + \frac{P^*_{0,t}(\bar{E}_t)|h_0|^2}{1 + P_{j_t}|h_{j_t}|^2}\right)}{P_T} \\
\text{s.t.} \quad & E_{t+1} = \min\{E_m, E_t + \bar{E}_t\}.
\end{aligned} \tag{7}$$
It can be observed that the objective of the above maximization problem is
a continuous, univariate function of $\bar{E}_t$. This makes
problem (7) well suited to the DDPG algorithm.
III. IMPLEMENTATION OF DRL ALGORITHM
In this section, we provide preliminaries of the DRL
algorithm, i.e., DDPG, and formulate our problem within the
DDPG framework.
A. Deep Deterministic Policy Gradient
DDPG is an actor-critic algorithm based on the
deterministic policy gradient (DPG) and the deep Q-network (DQN) [16].
Deep Q-learning (DQL) becomes inefficient when the action and
state spaces are continuous and high-dimensional; therefore,
DDPG is well suited to such scenarios [17]. In a DRL setup,
the agent (or observer) initially has no knowledge
of the environment. The agent learns the environment over
time, as it continuously monitors its surroundings and learns
how to maximize a reward signal by following an optimal policy.
1) DDPG Framework: In the DDPG algorithm, at a particular
time step $t$, the goal of the agent is to find an action
$a_t$ for an observation $s_t$ that yields a reward $r_t$ and
consequently maximizes the action-value function, represented
by $Q(s_t, a_t)$. Accordingly, the maximization problem is given
as
$$a^*_t(s_t) = \arg\max_{a_t} Q(s_t, a_t), \tag{8}$$
where $Q(s_t, a_t)$ represents the expected return. The actor
network (or policy network) takes the action, whereas the
critic network (or Q network) acts as an evaluator that
assesses how good the action taken by the actor network is.
The policy network is parameterized by $\theta^{\mu}$; it takes $s_t$ as
input and produces an action, represented by $\mu(s_t|\theta^{\mu})$. The
corresponding actor target network is parameterized by $\theta^{\mu_t}$
and outputs $\mu_t(s_t|\theta^{\mu_t})$. The critic network is parameterized
by $\theta^{Q}$; it takes $s_t$ and $a_t$ as inputs and produces the
action-value function, represented by $Q(s_t, a_t|\theta^{Q})$. The
corresponding critic target network is parameterized by $\theta^{Q_t}$ and outputs
$Q_t(s_t, a_t|\theta^{Q_t})$.
2) Networks Updating Process: The actor network takes
the action, while the other networks ensure that the actor network
is trained properly by evaluating its output (action). Let
us assume a tuple $(s_t, a_t, r_t, s_{t+1})$, where $s_t$ represents the
current state, $a_t$ represents the action the agent took according
to the observed state, $r_t$ is the reward for the action taken, and
$s_{t+1}$ represents the next state. Based on this tuple,
the network update process is as follows.
1) The actor network is trained
by maximizing the action-value function in (8).
Using the parameters of the actor and critic networks, (8)
can be reformulated as
$$J(\theta^{\mu}) = Q\left(s_t, a_t = \mu(s_t|\theta^{\mu}) \mid \theta^{Q}\right). \tag{9}$$
Taking the gradient of (9) with respect to $\theta^{\mu}$, we get
$$\nabla_{\theta^{\mu}} J(\theta^{\mu}) = \nabla_{a_t} Q(s_t, a_t|\theta^{Q})\, \nabla_{\theta^{\mu}} \mu(s_t|\theta^{\mu}). \tag{10}$$
2) Updating the critic network relies on two estimates of the
action value. The first is the target value, obtained by feeding the output of the target actor
network to the target critic network:
$$y_t = r_t + \gamma\, Q_t\left(s_{t+1}, \mu_t(s_{t+1}|\theta^{\mu_t}) \mid \theta^{Q_t}\right). \tag{11}$$
The second estimate is the output of the critic network itself,
$Q(s_t, a_t|\theta^{Q})$, and the critic is trained by minimizing the loss function
$$L(\theta^{Q}) = \left|y_t - Q(s_t, a_t|\theta^{Q})\right|^2. \tag{12}$$
3) The parameters of both the critic target network and the actor
target network are updated using a soft update parameter, which takes a small value, so that the target
networks change more slowly than their
counterparts. The corresponding parameters
are updated as
$$\theta^{\mu_t} \leftarrow \xi \theta^{\mu} + (1-\xi)\theta^{\mu_t} \tag{13}$$
and
$$\theta^{Q_t} \leftarrow \xi \theta^{Q} + (1-\xi)\theta^{Q_t}, \tag{14}$$
respectively, where $\xi$ denotes the soft update parameter.
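The scalar arithmetic behind the critic update and the soft target update can be sketched as follows. This is a framework-free illustration: parameters are plain lists of floats rather than network weights, the function names are hypothetical, and the defaults for the discount factor (0.99) and the soft update parameter (0.01) are those listed in Table I.

```python
def td_target(r_t, q_target_next, gamma=0.99):
    """Target value of (11): reward plus discounted target-critic estimate."""
    return r_t + gamma * q_target_next

def critic_loss(y_t, q_value):
    """Squared error of (12) between the target and the critic output."""
    return (y_t - q_value) ** 2

def soft_update(target_params, online_params, xi=0.01):
    """Soft updates (13) and (14): move each target parameter a fraction xi
    of the way toward its online counterpart."""
    return [xi * w + (1.0 - xi) * w_t
            for w, w_t in zip(online_params, target_params)]
```

With a small `xi`, repeated calls to `soft_update` let the target parameters trail the online parameters slowly, which keeps the target value in (11) from shifting abruptly between updates.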
Replay buffer and exploration are two other important features
of the DDPG algorithm. The replay buffer stores
past tuples $(s_t, a_t, r_t, s_{t+1})$ in a pool. These
tuples are used to enhance the learning of the agent: for each
network update, a batch of tuples
is drawn randomly from the pool and passed
on for updating the networks. Regarding exploration, the actor
network is encouraged to explore its environment thoroughly. To
do so, noise is added to the actor network’s
output, which can be represented as
$$a(s_t) = \mu(s_t|\theta^{\mu}) + \Psi, \tag{15}$$
where $\Psi$ represents the added noise.
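A minimal replay buffer and exploration step might look as follows. The class and function names are hypothetical; the capacity of 10000 and batch size of 64 follow Table I. The text does not specify the noise distribution, so zero-mean Gaussian noise with an arbitrary standard deviation, followed by clipping to the valid action range, is an assumption of this sketch.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity pool of (s_t, a_t, r_t, s_next) tuples."""

    def __init__(self, capacity=10000):
        # A deque with maxlen silently discards the oldest tuple when full.
        self._pool = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self._pool.append((s, a, r, s_next))

    def sample(self, batch_size=64):
        """Draw a random batch without replacement (fewer if the pool is small)."""
        pool = list(self._pool)
        return random.sample(pool, min(batch_size, len(pool)))

    def __len__(self):
        return len(self._pool)

def explore(action, sigma=0.1, low=0.0, high=1.0):
    """Noisy action of (15): add Gaussian noise (assumed) and clip to [low, high]."""
    return min(high, max(low, action + random.gauss(0.0, sigma)))
```

Sampling uniformly from the pool breaks the temporal correlation between consecutive time slots, which is the usual motivation for a replay buffer.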
B. Problem Formulation into DDPG Framework
The DDPG algorithm is applied to the above problem
by defining the state space, action space, and reward as follows:
1) State Space: The state is a tuple containing the
channel gains and the energy-constrained sensor’s available
energy, which is represented as
$$s_t = \left[E_t, |h_{j_t}|^2, |h_0|^2, |h_{j_t,0}|^2\right]^T. \tag{16}$$
2) Action Space: The action space contains a single
parameter, $\bar{E}_t$. The minimum and maximum values
of $\bar{E}_t$ are given by
$$-\min\{T P_{sm}, E_t\} \le \bar{E}_t \le \min\{E_m - E_t, T \eta P_{j_t}|h_{j_t,0}|^2\}, \tag{17}$$
where the lower bound corresponds to $\tau_t = 1$, i.e., no
energy harvesting but transmission only, and is also limited by the
energy available at the start of time slot $t$. The upper bound
on $\bar{E}_t$ corresponds to $\tau_t = 0$, i.e., no transmission
but only energy harvesting, and is also limited because only a finite amount of
energy can be gathered in time slot $t$.
TABLE I
SIMULATION PARAMETERS
Parameter                                  Symbol        Value
Actor network’s learning rate              $\alpha_a$    0.002
Critic network’s learning rate             $\alpha_c$    0.005
Batch size                                 $B$           64 tuples
Memory capacity                            $R$           10000
Noise spectral density                     $\sigma_o$    -190 dBm
Signal bandwidth                           $W_s$         10 MHz
Maximum battery capacity                   $E_m$         0.2 J
Maximum transmit power                     $P_{sm}$      23 dBm
Circuit power                              $P_c$         15 dBm
Energy harvesting efficiency coefficient   $\eta$        0.9
Time slot duration                         $T$           1 s
Discount factor                            $\gamma$      0.99
Center frequency                           $f_c$         914 MHz
Soft update parameter                      $\xi$         0.01
Since the bounds in (17) can assume much larger or much smaller values,
the action is mapped to the interval between 0 and 1; hence, $\bar{E}_t$ is
normalized as follows:
$$\bar{E}_t = \zeta_t \min\left\{E_m - E_t, T \eta P_{j_t}|h_{j_t,0}|^2\right\} - (1 - \zeta_t)\min\left\{T P_{sm}, E_t\right\}. \tag{18}$$
According to (18), the action parameter for the DDPG
algorithm is $\zeta_t$, where $\zeta_t \in [0, 1]$.
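The mapping in (18) from the normalized action to the energy fluctuation can be written as a single function. In this sketch the function name is hypothetical, the defaults for `T`, `eta`, and `e_max` come from Table I, and `p_sm = 0.2` W is a rounded stand-in for 23 dBm.

```python
def action_to_energy(zeta, e_t, pj, hj0_gain, T=1.0, eta=0.9, e_max=0.2, p_sm=0.2):
    """Map zeta in [0, 1] to the energy fluctuation of (18)."""
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie in [0, 1]")
    upper = min(e_max - e_t, T * eta * pj * hj0_gain)  # harvest-only limit in (17)
    lower = min(T * p_sm, e_t)                         # transmit-only limit in (17)
    return zeta * upper - (1.0 - zeta) * lower
```

Setting `zeta = 0` returns the most negative value allowed by (17), i.e., the battery is drained as much as permitted, whereas `zeta = 1` returns the largest admissible energy gain.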
3) Reward: The reward is the EE achieved by
the secondary sensor, i.e., $\hat{\Gamma}_{EE}$.
IV. SIMULATION RESULTS AND ANALYSIS
In this section, we provide a performance analysis of the
system model defined in Sec. II. We benchmark the
performance of the DDPG algorithm against random and greedy
methods. In these benchmark methods, the transmit power
of the energy-constrained sensor is fixed at $P_{sm}$; however,
the selection of the time-sharing coefficient, $\tau_t$, differs. In
the random algorithm, $\tau_t$ is chosen uniformly between 0 and
$\min\left\{1, \frac{E_t}{T P_{sm}}\right\}$, whereas in the greedy algorithm, $\tau_t$ is set
to $\min\left\{1, \frac{E_t}{T P_{sm}}\right\}$.
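The two benchmark rules differ only in how they pick the time-sharing coefficient and can be sketched as follows. The function names are hypothetical, and `p_sm = 0.2` W is again a rounded stand-in for the 23 dBm of Table I.

```python
import random

def tau_cap(e_t, T=1.0, p_sm=0.2):
    """Largest transmit fraction the battery supports at full power: min{1, E_t/(T*P_sm)}."""
    return min(1.0, e_t / (T * p_sm))

def greedy_tau(e_t, T=1.0, p_sm=0.2):
    """Greedy benchmark: transmit for as long as the stored energy allows."""
    return tau_cap(e_t, T, p_sm)

def random_tau(e_t, T=1.0, p_sm=0.2, rng=random):
    """Random benchmark: draw tau uniformly between 0 and the cap."""
    return rng.uniform(0.0, tau_cap(e_t, T, p_sm))
```

Both rules transmit at the maximum power, so neither of them adapts the transmit power to the channel conditions or to the circuit power.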
A. Simulation Environment Setup and Parameters Selection
In our simulations, we assume that the BS is located
at the origin of the x-y plane, i.e., (0,0), and we consider only large-scale
path loss, ignoring random fading. Both the actor
and critic networks are neural networks with two hidden layers each.
In the actor network, the two
hidden layers use the
rectified linear unit (ReLU) activation function, whereas the output
layer uses the hyperbolic tangent function.
In the critic network, the ReLU activation function is
used in all hidden layers. Further details of the fixed parameters
chosen for the simulations are listed in Table I.
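To relate the dBm values of Table I to the linear quantities used in the equations, a unit conversion and a simple path-loss helper can be sketched as below. The conversion from dBm to watts is standard. The path-loss model, however, is an assumption: the text only states that large-scale path loss is considered, so the distance-power law $|h|^2 = d^{-n}$ used here is one common choice and not necessarily the authors' model.

```python
import math

def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)

def path_gain(tx_xy, rx_xy, n):
    """Assumed large-scale channel gain |h|^2 = d^(-n) for path loss exponent n."""
    d = math.dist(tx_xy, rx_xy)
    return d ** (-n)

p_sm = dbm_to_watt(23.0)   # maximum transmit power of the secondary sensor
p_c = dbm_to_watt(15.0)    # circuit power
far_gain = path_gain((0.0, 1000.0), (0.0, 0.0), n=3.0)  # distant primary sensor to BS
```

Under these conversions, 23 dBm is roughly 0.2 W and 15 dBm is roughly 0.032 W.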
B. Results Analysis
In this section, we present a performance analysis of the
DDPG scheme in comparison with other benchmark schemes,
i.e., greedy and random algorithms.
Fig. 2. Energy efficiency of the energy-constrained sensor versus the
number of episodes for the three algorithms. (Axes: episodes vs. energy efficiency in bits/Joule; curves: DDPG, random, greedy; inset: magnified view of the random and greedy curves.)
1) EE comparison against Episodes: Fig. 2 compares the
episodic rewards, in terms of EE, of the DDPG
algorithm and the benchmark schemes over a number of
episodes. It can be observed that DDPG achieves higher
rewards than the greedy and random techniques.
Additionally, the DDPG algorithm has almost
converged after 40 episodes, with only a marginal
improvement in the episodic reward after that point. For
clarity, a magnified view of the
performance of the random and greedy algorithms is
provided in Fig. 2.
2) EE comparison against Path Loss: To evaluate
the performance of the DDPG algorithm, the EE of all three
schemes is plotted in Fig. 3 for various values of the path
loss exponent. In this setup, the two primary sensors
are at locations (0 m, 1000 m) and (0 m, 1 m), respectively.
The maximum transmit power of the primary sensors is fixed at
$P_{um} = 30$ dBm, and the power consumed by the RF circuitry
is assumed to be $P_c = 15$ dBm. It can be observed that the
DDPG-based algorithm outperforms both the random and
the greedy approaches, and that the EE increases with the path loss
exponent. This may seem counterintuitive, since
increasing the path loss exponent usually increases the energy consumption
because of the denser environment assumed.
However, the increase in EE arises because the throughput of the
secondary sensor depends on the transmit power of the primary
sensors; when the path loss exponent is increased,
the power received from the primary sensor (located at (0 m,
1 m)) is affected more than that of the secondary sensor.
Therefore, the secondary sensor benefits and achieves a higher
EE as the path loss exponent increases.
3) EE comparison against Transmit Power of Primary
Sensors: The comparison of EE against the transmit power
of the primary sensors is shown in Fig. 4. Once again, the DDPG
algorithm outperforms the random and greedy algorithms. In
this setup, the path loss exponent is set to $n = 3$, and the
two primary sensors, assisting the secondary sensor, are at
locations (0 m, 1000 m) and (0 m, 1 m), respectively, in the
x-y plane, whereas the secondary sensor is located at (1 m,
1 m). The power consumed by the RF circuitry
is assumed to be $P_c = 15$ dBm.
Fig. 3. Energy efficiency comparison of three algorithms against path loss
exponent. (Axes: path loss exponent vs. energy efficiency in bits/J; curves: DDPG, random, greedy.)
Fig. 4. Energy efficiency comparison of three algorithms against the transmit
power of primary sensors. (Axes: maximum transmit power in W vs. energy efficiency in bits/J; curves: DDPG, random, greedy.)
We can observe that increasing the transmit power of
the primary sensors does not raise the EE of the secondary sensor
much; the EE remains nearly constant. In the case of the DDPG
algorithm, this is because any increase in the data rate expression
caused by a higher transmit power is offset by
a simultaneous decrease of the same magnitude, hence the constant
trend.
4) EE comparison against Distance and Circuit Power:
The combined effect of distance and circuit power on the
DDPG and random algorithms is depicted in Fig. 5(a)
and Fig. 5(b), respectively. The distance of the secondary sensor from the BS
and primary sensors is shown on the y-axis, and the
power consumed by the internal circuitry of the secondary
sensor is shown on the x-axis. In this setting, the path loss
exponent is assumed to be $n = 3$, the maximum transmit
power of the primary sensors is assumed to be $P_{um} = 30$ dBm,
and the two primary sensors are located, in the x-y plane, at
(0 m, 1000 m) and (0 m, 1 m), respectively. One can observe
that the EE of the secondary sensor decreases as either variable
increases. In other words, as the energy-constrained
sensor moves away from the primary sensors along the
x-axis, more energy is required by the secondary
sensor for its transmissions; hence, its EE is reduced.
The EE of the secondary sensor also declines as its circuit
power increases. This is because an increase in the circuit
power of the secondary sensor increases the total
power required to transmit data, which reduces the EE of the
secondary sensor.
Fig. 5. Energy efficiency of the energy-constrained sensor against distance
and circuit power, (a) DDPG algorithm, (b) random algorithm. (Axes: circuit power in W vs. distance in m; contour levels: energy efficiency.)
ACKNOWLEDGMENT
This work was supported by the Swedish Knowledge Foun-
dation (KKS) research profile NIIT.
V. CONCLUSION
This paper studied the uplink performance of an
energy-constrained secondary sensor in a
CR-NOMA-assisted IoT network. We mathematically modeled
and formulated the EE maximization problem of the secondary
sensor, which was solved using a DRL framework, i.e., the
DDPG algorithm. Moreover, we analyzed and compared the
obtained simulation results with the benchmark algorithms,
i.e., greedy and random. The simulation results demonstrated
that the DDPG algorithm outperforms the selected
benchmark algorithms in terms of EE. We
observed that the EE curve for the DDPG algorithm had almost
converged after 40 episodes, and that high EE was
observed in harsher and more diverse environmental conditions.
Similarly, the results demonstrated that increasing the transmit
power of primary sensors in CR-assisted NOMA transmission
yields only a marginal improvement in the EE of the secondary sensor with DDPG. We
also examined the combined effect of separation distance and
circuit power, which can be useful from a system design
perspective. In future work, the model can be extended to
analyze the EE of multiple energy-constrained sensors in a
CR-NOMA network.
REFERENCES
[1] Y. B. Zikria, R. Ali, M. K. Afzal, and S. W. Kim, “Next-generation
Internet of things (IoT): Opportunities, challenges, and solutions,”
Sensors, vol. 21, no. 4, p. 1174, 2021.
[2] S. Zeb, A. Mahmood, et al., “Analysis of beyond 5G integrated
communication and ranging services under indoor 3-D mmWave stochastic
channels,” IEEE Transactions on Industrial Informatics, vol. 18, no. 10,
pp. 7128–7138, 2022.
[3] S. Zeb et al., “Industry 5.0 is coming: A survey on intelligent
nextG wireless networks as technological enablers,” arXiv preprint
arXiv:2205.09084, 2022.
[4] S. Zeb, M. A. Rathore, et al., “Edge intelligence in softwarized 6G:
Deep learning-enabled network traffic predictions,” in IEEE Globecom
Workshops (GC Wkshps), pp. 1–6, 2021.
[5] G. G. de Oliveira Brante, M. T. Kakitani, and R. D. Souza, “Energy
efficiency analysis of some cooperative and non-cooperative trans-
mission schemes in wireless sensor networks,” IEEE Transactions on
Communications, vol. 59, no. 10, pp. 2671–2677, 2011.
[6] A. W. Nazar, S. A. Hassan, H. Jung, A. Mahmood, and M. Gidlund,
“BER analysis of a backscatter communication system with non-
orthogonal multiple access,” IEEE Transactions on Green Communi-
cations and Networking, vol. 5, no. 2, pp. 574–586, 2021.
[7] S. Zeb et al., “Industrial digital twins at the nexus of nextG wireless
networks and computational intelligence: A survey,” Journal of Network
and Computer Applications, vol. 200, p. 103309, 2022.
[8] B. Matthiesen, A. Zappone, et al., “A globally optimal energy-efficient
power control framework and its efficient implementation in wireless
interference networks,” IEEE Transactions on Signal Processing, vol. 68,
pp. 3887–3902, 2020.
[9] N. Rubab et al., “Interference mitigation in RIS-assisted 6G systems
for indoor industrial IoT networks,” in IEEE 12th Sensor Array and
Multichannel Signal Processing Workshop (SAM), pp. 211–215, 2022.
[10] A. Mahmood et al., “Industrial IoT in 5G-and-beyond networks: Vi-
sion, architecture, and design trends,” IEEE Transactions on Industrial
Informatics, vol. 18, no. 6, pp. 4122–4137, 2022.
[11] F. Jameel et al., “NOMA-enabled backscatter communications: Toward
battery-free IoT networks,” IEEE Internet of Things Magazine, vol. 3,
no. 4, pp. 95–101, 2020.
[12] Z. Ding, R. Schober, and H. V. Poor, “No-pain no-gain: DRL assisted
optimization in energy-constrained CR-NOMA networks,” IEEE
Transactions on Communications, vol. 69, no. 9, pp. 5917–5932, 2021.
[13] S. Zeb, Q. Abbas, et al., “NOMA enhanced backscatter communication
for green IoT networks,” in 16th International Symposium on Wireless
Communication Systems, pp. 640–644, 2019.
[14] L. Li, H. Xu, J. Ma, A. Zhou, and J. Liu, “Joint EH time and transmit
power optimization based on DDPG for EH communications,” IEEE
Communications Letters, vol. 24, no. 9, pp. 2043–2046, 2020.
[15] G. Y. Li, Z. Xu, C. Xiong, C. Yang, S. Zhang, Y. Chen, and S. Xu,
“Energy-efficient wireless communications: Tutorial, survey, and open
issues,” IEEE Wireless Communications, vol. 18, no. 6, pp. 28–35, 2011.
[16] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller,
“Deterministic policy gradient algorithms,” in International Conference
on Machine Learning, pp. 387–395, 2014.
[17] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa,
D. Silver, and D. Wierstra, “Continuous control with deep reinforcement
learning,” arXiv preprint arXiv:1509.02971, 2015.
... A DRL-assisted approach is used to solve the long-term throughput maximization problem in a communication network that employs CR-NOMA. Similarly, the work presented in [28], addressed the EE maximization problem for an EH radio functioning amidst scheduled primary devices. By leveraging an RL approach, the authors proposed an optimal transmission policy that enhances spectral efficiency and maximizes the EE of the EH device through CR-NOMA. ...
... Studies such as [26] and [27] use linear EH models. Research in [28] and [29] also use linear EH models with RL and DDPG algorithms. Our contributions include the development of a non-linear EH model, accounting for RF circuit power consumption, and the implementation of CR-NOMA, all optimized through an advanced DRL algorithm, termed the MDDPG algorithm. ...
... where Ω = max ˆ, Ξ , and it can be observed that (28) is a function ofΓ . Finally, using (28), the optimal solution Ψ * (Γ ) is given by (from (9)) ...
Article
Full-text available
Given the rising demand for low-power sensing, integrating additional devices into an existing wireless infrastructure calls for innovative energy-and spectrum-efficient wireless connectivity strategies. In this respect, wireless-powered or energy-harvesting symbiotic radio (EHSR) is gaining attention for establishing the secondary relationship with the primary wireless systems in terms of RF EH and opportunistically sharing the spectrum or schedule. In this paper, assuming the commensalistic relationship with the primary system, we consider the energy-efficient optimization of such an EHSR by intelligently making EH and transmission decisions under the inherent nonlinearity of the EH circuitry and dynamics of pre-scheduled primary devices. We present a state-of-the-art deep reinforcement learning (DRL)-engineered, energy-efficient transmission strategy, which intelligently orchestrates EHSR’s uplink transmissions, leveraging the cognitive radio-inspired non-orthogonal multiple access (CR-NOMA) scheme. We first formulate the energy efficiency (EE) optimization metric for EHSR considering the nonlinear EH model, and then we decompose the inherently complex, non-convex problem into two optimization layers. The strategy first derives the optimal transmit power and time-sharing coefficient parameters, using convex optimization. Subsequently, these inferred parameters are substituted in the subsequent layer, where the optimization problem with continuous action space is addressed via a DRL framework, named modified deep deterministic policy gradient (MDDPG). Simulation results reveal that, compared to the baseline DDPG algorithm, our proposed solution provides a 6% EE gain with the linear EH model and approximately a 7% EE gain with the non-linear EH model.
... A DRL-assisted approach is used to solve the long-term throughput maximization problem in a communication network that employs CR-NOMA. Similarly, the work presented in [28], addressed the EE maximization problem for an EH radio functioning amidst scheduled primary devices. By leveraging an RL approach, the authors proposed an optimal transmission policy that enhances spectral efficiency and maximizes the EE of the EH device through CR-NOMA. ...
Article
Given the rising demand for low-power sensing, integrating additional devices into an existing wireless infrastructure calls for innovative energy- and spectrum-efficient wireless connectivity strategies. In this respect, wireless-powered or energy-harvesting symbiotic radio (EHSR) is gaining attention for establishing the secondary relationship with the primary wireless systems in terms of RF EH and opportunistically sharing the spectrum or schedule. In this paper, assuming the commensalistic relationship with the primary system, we consider the energy-efficient optimization of such an EHSR by intelligently making EH and transmission decisions under the inherent nonlinearity of the EH circuitry and dynamics of pre-scheduled primary devices. We present a state-of-the-art deep reinforcement learning (DRL)-engineered, energy-efficient transmission strategy, which intelligently orchestrates EHSR's uplink transmissions, leveraging the cognitive radio-inspired non-orthogonal multiple access (CR-NOMA) scheme. We first formulate the energy efficiency (EE) optimization metric for EHSR considering the nonlinear EH model, and then we decompose the inherently complex, non-convex problem into two optimization layers. The strategy first derives the optimal transmit power and time-sharing coefficient parameters, using convex optimization. Subsequently, these inferred parameters are substituted in the subsequent layer, where the optimization problem with continuous action space is addressed via a DRL framework, named modified deep deterministic policy gradient (MDDPG). Simulation results reveal that, compared to the baseline DDPG algorithm, our proposed solution provides a 6% EE gain with the linear EH model and approximately a 7% EE gain with the non-linear EH model. INDEX TERMS Symbiotic radio, RF EH, cognitive radio-inspired non-orthogonal multiple access (CR-NOMA), energy efficiency (EE), deep deterministic policy gradient (DDPG).
... Among these technologies, cognitive radio (CR) technology emerges as a promising solution to address the challenges associated with spectrum under-utilization by allowing secondary users to share the spectrum with primary users during their idle mode [9]. In the context of this paper, we will be exploring the combination of CR and non-orthogonal multiple access (NOMA) [10], [11] to establish a spectral-efficient WPCN. The core principle of CR-NOMA is to serve the secondary node (SN) while guaranteeing the quality-of-service (QoS) of the primary device (PD) [12]. ...
... Subsequently, in the second subproblem, the optimal solutions derived from the first subproblem are employed to formulate a one-dimensional, continuous action space optimization problem, which is solved by a DRL framework. • We employ various deep reinforcement learning (DRL) algorithms, such as deep deterministic policy gradient (DDPG) [30], prioritized experience replay (PER)-DDPG [31], combined experience replay (CER)-DDPG [11], proximal policy optimization (PPO) [32], and twin delayed DDPG (TD3) [33], alongside non-DRL algorithms like the random method [11] and the greedy method [11], to investigate how different algorithms could perform in dynamic environment settings. • We present how various factors, such as the number of RF-EH antennas, and the transmit power of the PDs impact the EH performance and sum rate of SN. ...
Article
In the rapidly evolving landscape of advanced wireless networks, self-sustainable Internet-of-things (IoT) networks become pivotal, necessitating the seamless accommodation of additional resource-limited devices into the existing wireless infrastructures. To this end, this paper considers an IoT scenario with a wireless-powered communication network (WPCN) where a resource-constrained secondary node (SN) with energy harvesting (EH) capabilities harvests energy from the ambient radio-frequency (RF) signals to meet its energy requirements. Notably, we introduce RF-EH diversity-combining techniques, such as equal gain combining (EGC), maximum ratio combining (MRC), and selection combining (SC) tailored for linear EH models. To address the spectrum scarcity, the SN employs a quality of service (QoS)-aware non-orthogonal multiple access (NOMA) scheme to opportunistically transmit data within the uplink transmissions of the primary devices (PDs) operating around. Aiming to maximize the sum rate of the SN, we jointly optimize the EH time and transmit power of the SN using deep reinforcement learning (DRL). Specifically, we implement a set of DRL and non-DRL algorithms to investigate their robustness in diverse RF-EH diversity-combining environment settings. Simulation results demonstrate the influence of diversity combining techniques on the sum rate performance of the SN, providing valuable insights into their role in optimizing SN performance under dynamic EH environments. Keywords-Internet-of-things (IoT), wireless-powered communication network (WPCN), RF energy harvesting, diversity-combining, non-orthogonal multiple access (NOMA), and deep reinforcement learning (DRL).
... Their simulation results demonstrated that the proposed method performs better than empirical search algorithm (ESA) and genetic algorithm (GA). In [33], Ullah et al. proposed a power allocation algorithm based on DDPG to maximize energy efficiency in MIMO-NOMA next-generation Internet-of-Things (NG-IoT) networks. Their simulation results demonstrated that the proposed method achieved better performance compared with random algorithms and greedy algorithms. ...
... In the testing stage, we verify the performance of the near-optimal policy obtained in the training stage. Existing works have adopted GA [32] and random power allocation policy [33] as the baseline algorithm for power allocation; therefore, we selected these two algorithms for comparison. Here, random power allocation policy and GA are introduced as follows: ...
Article
Multi-input multi-output and non-orthogonal multiple access (MIMO-NOMA) Internet-of-Things (IoT) systems can improve channel capacity and spectrum efficiency distinctly to support real-time applications. Age of information (AoI) plays a crucial role in real-time applications as it determines the timeliness of the extracted information. In MIMO-NOMA IoT systems, the base station (BS) determines the sample collection commands and allocates the transmit power for each IoT device. Each device determines whether to sample data according to the sample collection commands and adopts the allocated power to transmit the sampled data to the BS over the MIMO-NOMA channel. Afterwards, the BS employs the successive interference cancellation (SIC) technique to decode the signal of the data transmitted by each device. The sample collection commands and power allocation may affect the AoI and energy consumption of the system. Optimizing the sample collection commands and power allocation is essential for minimizing both AoI and energy consumption in MIMO-NOMA IoT systems. In this paper, we propose the optimal power allocation to achieve it based on deep reinforcement learning (DRL). Simulations have demonstrated that the optimal power allocation effectively achieves lower AoI and energy consumption compared to other algorithms. Overall, the reward is reduced by 6.44% and 11.78% compared to the GA algorithm and random algorithm, respectively.
... They employed a DRL-assisted approach to tackle the long-term throughput maximization problem in a communication network implementing CR-NOMA. Similarly, in [3], the authors maximized energy efficiency (EE) for an EH radio alongside scheduled primary devices. Their objective is to enhance the system's spectral efficiency and optimize the EE of the EH device using a DRL method. ...
Article
This paper investigates the uplink communication of an energy harvesting (EH)-enabled resource-constrained secondary device (RCSD) coexisting with primary devices in a cognitive radio-aided non-orthogonal multi-access (CR-NOMA) network. Assuming a non-linear EH model in practice, the data rate of the RCSD is maximized using deep reinforcement learning (DRL). We first derive the optimal solutions for the parameters of interest, including the time-sharing coefficient and transmit power of the RCSD, using convex optimization and then implement the DRL to address a continuous action-space optimization problem. To comprehensively assess the agent's performance and adaptability, we implement various DRL algorithms and compare them under non-linear EH, which reveals their suitability in various scenarios, aiding in selecting the most effective approach.
... Similar to the studies in [7] and [8], the work in [6] also concentrates on enhancing overall user throughput without addressing the overall system EE. On the other hand, the work in [9] primarily concentrates on EE optimization. ...
Conference Paper
Amidst the ongoing debate about limited spectral availability, there remains a persistent demand for the development of spectrally efficient self-sustainable network (SSN) models. This paper addresses this challenge by optimizing spectral efficiency (SE) in uplink transmissions for an energy harvesting (EH)-enabled secondary user (SU) that operates opportunistically among multiple primary users (PUs) in an Internet-of-things (IoT) network. The PUs are assumed to employ a rotational time division multiple access (TDMA) scheme for transmissions, where the signals are divided into time slots for each PU to transmit data in a cyclic manner, while the SU uses an opportunistic non-orthogonal multiple access (NOMA) technique to transmit data without interfering with the PU transmissions, such that, at any given time slot, a PU and a SU share the same frequency band simultaneously. The SE of the system is maximized jointly by employing convex optimization and a deep reinforcement learning (DRL) model, specifically the deep deterministic policy gradient (DDPG) algorithm. Simulations demonstrate that the proposed approach significantly improves the SE of the considered IoT network, highlighting its potential for efficient spectrum management in IoT networks. We present a comprehensive SE analysis of the system, which further underscores the robustness and adaptability of our approach in optimizing SE under diverse operational conditions. Index Terms-Self-sustainable network (SSN), spectral efficiency (SE), non-orthogonal multiple access (NOMA), Internet-of-things (IoT), and deep deterministic policy gradient (DDPG).
... Finally, we optimize the joint objective of AoI and energy consumption in NR-V2X using MPDQN. Many recent works employed genetic algorithms [56] and random algorithms [57] as baseline algorithms for resource allocation, and thus we shall compare our approach with these two methods above. It is observed that the average AoI in the system increases with the number of vehicles, regardless of whether LTE-V2X or NR-V2X communication mode is utilized. ...
Preprint
As autonomous driving may be the most important application scenario of the next generation, the development of wireless access technologies enabling reliable and low-latency vehicle communication becomes crucial. To address this, 3GPP has developed Vehicle-to-Everything (V2X) specifications based on 5G New Radio (NR) technology, where Mode 2 Side-Link (SL) communication resembles Mode 4 in LTE-V2X, allowing direct communication between vehicles. This supplements SL communication in LTE-V2X and represents the latest advancement in cellular V2X (C-V2X) with improved performance of NR-V2X. However, in NR-V2X Mode 2, resource collisions still occur, and thus degrade the age of information (AoI). Therefore, an interference cancellation method is employed to mitigate this impact by combining NR-V2X with Non-Orthogonal multiple access (NOMA) technology. In NR-V2X, when vehicles select a smaller resource reservation interval (RRI), higher-frequency transmissions take more energy to reduce AoI. Hence, it is important to jointly consider AoI and communication energy consumption based on NR-V2X communication. Then, we formulate such an optimization problem and employ the Deep Reinforcement Learning (DRL) algorithm to compute the optimal transmission RRI and transmission power for each transmitting vehicle to reduce the energy consumption of each transmitting vehicle and the AoI of each receiving vehicle. Extensive simulations have demonstrated the performance of our proposed algorithm.
... Finally, we optimize the joint objective of AoI and energy consumption in NR-V2X using MPDQN. Many recent works have employed genetic algorithms [36] and random algorithms [37] as baseline algorithms for resource allocation, and thus, we shall compare our approach with these two methods above. Figure 3 illustrates the variation in average AoI in the system as the number of vehicles using LTE-V2X and NR-V2X for V2V direct communication scenarios. ...
Article
As autonomous driving may be the most important application scenario of the next generation, the development of wireless access technologies enabling reliable and low-latency vehicle communication becomes crucial. To address this, 3GPP has developed Vehicle-to-Everything (V2X) specifications based on 5G New Radio (NR) technology, where Mode 2 Side-Link (SL) communication resembles Mode 4 in LTE-V2X, allowing direct communication between vehicles. This supplements SL communication in LTE-V2X and represents the latest advancements in cellular V2X (C-V2X) with the improved performance of NR-V2X. However, in NR-V2X Mode 2, resource collisions still occur and thus degrade the age of information (AoI). Therefore, an interference cancellation method is employed to mitigate this impact by combining NR-V2X with Non-Orthogonal multiple access (NOMA) technology. In NR-V2X, when vehicles select smaller resource reservation intervals (RRIs), higher-frequency transmissions use more energy to reduce AoI. Hence, it is important to jointly consider AoI and communication energy consumption based on NR-V2X communication. Then, we formulate such an optimization problem and employ the Deep Reinforcement Learning (DRL) algorithm to compute the optimal transmission RRI and transmission power for each transmitting vehicle to reduce the energy consumption of each transmitting vehicle and the AoI of each receiving vehicle. Extensive simulations demonstrate the performance of our proposed algorithm.
Article
The next frontier in wireless connectivity lies at the intersection of cognitive radio (CR) technology and machine learning (ML), where intelligent networks can provide pervasive connectivity for an ever-expanding range of applications. In this regard, this survey provides an in-depth examination of the integration of ML-based CR in a wide range of emerging wireless networks, including the Internet of Things (IoT), mobile communications (vehicular and railway), and unmanned aerial vehicle (UAV) communications. By combining ML-based CR and emerging wireless networks, we can create intelligent, efficient, and ubiquitous wireless communication systems that satisfy spectrum-hungry applications and services of next-generation networks. For each type of wireless network, we highlight the key motivation for using intelligent CR and present a full review of the existing state-of-the-art ML approaches that address pressing challenges, including energy efficiency, interference, throughput, latency, and security. Our goal is to provide researchers and newcomers with a clear understanding of the motivation and methodology behind applying intelligent CR to emerging wireless networks. Moreover, problems and prospective research avenues are outlined, and a future roadmap is offered that explores possibilities for overcoming challenges through trending concepts.
Conference Paper
The ever-increasing need for tremendous data traffic and the massive deployment of industrial Internet-of-Things (IIoT) devices operating in higher bands, i.e., millimeter-wave (mmWave) and terahertz (THz), have encouraged academia and industry to transition towards the future sixth-generation (6G) wireless communication networks. Nevertheless, the recent emergence of 6G enablers, i.e., reflecting intelligent surface (RIS) and massive multiple-input multiple-output (mMIMO), has the ability to potentially change the previous paradigm of indoor mmWave wireless communication by modifying the propagation environment. It can control and establish favorable and tunable wireless channel responses by exploiting the multipath and diversity of the propagation environment. Therefore, this cutting-edge technology is capable of massively improving the performance of mmWave-mMIMO-enabled IIoT data transmissions, making it a feasible solution for 6G networks. In this paper, we propose a RIS-assisted mmWave-mMIMO system for a multi-cell indoor factory propagation environment, which 1) aids in mitigating the impact of radio frequency (RF) interference from interferers in close vicinity (i.e., neighbor cells) by employing metasurface-laminated walls, and 2) increases the mmWave-mMIMO system performance by controlling and tuning the indoor factory channel conditions. Our results indicate that the proposed RIS-assisted mmWave-mMIMO system outperforms the benchmark link capacity performance in the presence of interference. Index Terms-6G, reconfigurable intelligent surfaces, massive MIMO, millimeter-Wave, industrial Internet-of-things.
Article
By amalgamating recent communication and control technologies, computing and data analytics techniques, and modular manufacturing, Industry 4.0 promotes integrating cyber-physical worlds through cyber-physical systems (CPS) and digital twin (DT) for monitoring, optimization, and prognostics of industrial processes. A DT enables interaction with the digital image of the industrial physical objects/processes to simulate, analyze, and control their real-time operation. DT is rapidly diffusing in numerous industries with the interdisciplinary advances in the industrial Internet of things (IIoT), edge and cloud computing, machine learning, artificial intelligence, and advanced data analytics. However, the existing literature lacks in identifying and discussing the role and requirements of these technologies in DT-enabled industries from the communication and computing perspective. In this article, we first present the functional aspects, appeal, and innovative use of DT in smart industries. Then, we elaborate on this perspective by systematically reviewing and reflecting on recent research trends in next-generation (NextG) wireless technologies (e.g., 5G-and-Beyond networks) and design tools, and current computational intelligence paradigms (e.g., edge and cloud computing-enabled data analytics, federated learning). Moreover, we discuss the DT deployment strategies at different communication layers to meet the monitoring and control requirements of industrial applications. We also outline several key reflections and future research challenges and directions to facilitate industrial DT's adoption.
Article
5G and beyond (B5G) networks are moving towards the higher end of the millimeter-wave (mmWave) spectrum (i.e., from 25 GHz to 100 GHz) to support integrated communications and ranging (ICAR) services in next-generation factory deployments. The ICAR services in factory deployments require extreme bandwidth/capacity and large ranging coverage, which a mmWave-B5G system can fulfill using massive multi-input and multi-output (mMIMO), beamforming, and advanced ranging techniques. However, as mmWave signal propagation is sensitive to harsh channel conditions experienced in typical indoor factory environments, there is a growing interest in the realistic mmWave indoor channel modeling to evaluate the practical scope of the mmWave-B5G systems. In this paper, we study and implement a 3D stochastic channel model using the baseline 3GPP model. Our channel model employs the time-cluster spatial-lobe (TCSL) technique, and utilizes the temporal and spatial statistics to create the channel impulse response (CIR), reflecting realistic indoor factory conditions. Using the generated CIR, we present the performance analysis of a mmWave-B5G system in terms of power delay profile (PDP), path loss, communication and ranging coverage, and mMIMO channel capacity.
Conference Paper
The 6G vision is envisaged to enable agile network expansion and rapid deployment of new on-demand microservices (e.g., visibility services for data traffic management, mobile edge computing services) closer to the network's edge IoT devices. However, providing one of the critical features of network visibility services, i.e., data flow prediction in the network, is challenging at the edge devices within a dynamic cloud-native environment as the traffic flow characteristics are random and sporadic. To provide the AI-native services for the 6G vision, we propose a novel edge-native framework to provide an intelligent prognosis technique for data traffic management in this paper. The prognosis model uses long short-term memory (LSTM)-based encoder-decoder deep learning, which we train on real time-series multivariate data records collected from the edge µ-boxes of a selected testbed network. Our result accurately predicts the statistical characteristics of data traffic and verifies the trained model against the ground truth observations. Moreover, we validate our novel framework with two performance metrics for each feature of the multivariate data.
Article
Cellular networks are envisioned to be a cornerstone in future industrial IoT (IIoT) wireless connectivity in terms of fulfilling the industrial-grade coverage, capacity, robustness, and timeliness requirements. This vision has led to the design of verticals-centric service-based architecture of 5G radio access and core networks. The design incorporates the capabilities to include 5G-AI-Edge ecosystem for computing, intelligence, and flexible deployment and integration options (e.g., centralized and distributed, physical and virtual) while eliminating the privacy/security concerns of mission-critical systems. In this paper, driven by the industrial interest in enabling large-scale wireless IIoT deployments for operational agility and flexible, cost-efficient production, we present the state-of-the-art 5G architecture, transformative technologies, and recent design trends, which we also selectively supplement with new results. We also identify several research challenges in these promising design trends that beyond-5G systems must overcome to support the rapidly unfolding transition in creating value-centric industrial wireless networks.
Preprint
By amalgamating recent communication and control technologies, computing and data analytics techniques, and modular manufacturing, Industry 4.0 promotes integrating cyber-physical worlds through cyber-physical systems (CPS) and digital twin (DT) for monitoring, optimization, and prognostics of industrial processes. A DT is an emerging but conceptually different construct than CPS. Like CPS, DT relies on communication to create a highly-consistent, synchronized digital mirror image of the objects or physical processes. DT, in addition, uses built-in models on this precise image to simulate, analyze, predict, and optimize their real-time operation using feedback. DT is rapidly diffusing in the industries with recent advances in the industrial Internet of things (IIoT), edge and cloud computing, machine learning, artificial intelligence, and advanced data analytics. However, the existing literature lacks in identifying and discussing the role and requirements of these technologies in DT-enabled industries from the communication and computing perspective. In this article, we first present the functional aspects, appeal, and innovative use of DT in smart industries. Then, we elaborate on this perspective by systematically reviewing and reflecting on recent research in next-generation (NextG) wireless technologies (e.g., 5G and beyond networks), various tools (e.g., age of information, federated learning, data analytics), and other promising trends in networked computing (e.g., edge and cloud computing). Moreover, we discuss the DT deployment strategies at different industrial communication layers to meet the monitoring and control requirements of industrial applications. We also outline several key reflections and future research challenges and directions to facilitate industrial DT's adoption.
Article
Backscatter communication (BackCom) has been emerging as a prospective candidate in tackling lifetime management problems for massively deployed Internet-of-things (IoT) devices, which suffer from battery related issues, i.e., replacements, charging, and recycling. This passive sensing approach allows a backscatter sensor node (BSN) to transmit information by reflecting the incident signal from a carrier emitter without initiating its transmission. To multiplex multiple BSNs, power-domain non-orthogonal multiple access (NOMA) is fully exploited in this work. In this paper, we present the design and analysis of a NOMA enhanced bistatic BackCom system for a battery-less smart communication paradigm. Specifically, we derive the closed-form bit error rate (BER) expressions for a cluster of two devices in a bistatic BackCom system employing NOMA with imperfect successive interference cancellation under Nakagami-m fading channel. The obtained expressions are utilized to evaluate the reflection coefficients of devices needed for the most favorable system performance along with the performance comparison with orthogonal multiple access-time domain multiple access scheme (OMA-TDMA).
Article
It is predicted that by 2025, all devices will be connected to the Internet, subsequently causing the number of devices connected with the Internet to rise [...]
Article
This paper applies machine learning to optimize the transmission policy of cognitive radio inspired non-orthogonal multiple access (CR-NOMA) networks, where time-division multiple access (TDMA) is used to schedule multiple primary users and an energy-constrained secondary user is admitted to the primary users' time slots via NOMA. During each time slot, the secondary user performs the two tasks: data transmission and energy harvesting based on the signals received from the primary users. The goal of the paper is to maximize the secondary user's long-term throughput, by optimizing its transmit power and the time-sharing coefficient for its two tasks. The long-term throughput maximization problem is challenging due to the need for making decisions that yield long-term gains but might result in short-term losses. For example, when in a given time slot, a primary user with large channel gains transmits, intuition suggests that the secondary user should not carry out data transmission due to the strong interference from the primary user but perform energy harvesting only, which results in zero data rate for this time slot but yields potential long-term benefits. In this paper, a deep reinforcement learning approach is applied to emulate this intuition, where the deep deterministic policy gradient (DDPG) algorithm is employed together with convex optimization. Our simulation results demonstrate that the proposed deep reinforcement learning assisted NOMA transmission scheme can yield significant performance gains over two benchmark schemes.
Article
(To appear in IEEE Internet of Things Magazine 2020.) A new wireless era beckons, giving rise to novel communication techniques to support the services and demands foreseen for the coming decades. One such revolutionizing technique intended to enable the Internet-of-things (IoT) is backscatter communication. Simply employing backscatter communication may not be enough to efficiently connect the massive number of devices in the IoT network. To achieve this feat, non-orthogonal multiple access (NOMA) techniques have been merged with backscatter communications. Although NOMA-enabled backscatter communication is expected to significantly improve the low-powered IoT system, the benefits come with several challenges. In this article, we show that NOMA-enabled backscatter communication has the potential to connect a large number of IoT devices in a battery-free manner. To begin, this article provides the basics on backscatter communication and NOMA techniques. Next, a taxonomy and gap analysis of the studies on backscatter communication is provided. Then, novel use-cases for NOMA-enabled backscatter communication are detailed, and a case study for smart farming using the overall data rate of backscatter sensor devices employing power-domain NOMA (PD-NOMA) is presented. Finally, we discuss some interesting and potential research challenges to the realization of massive IoT networks using NOMA-enabled backscatter communication.