Deep RL-assisted Energy Harvesting in CR-NOMA
Communications for NextG IoT Networks
Syed Asad Ullah, Shah Zeb, Aamir Mahmood, Syed Ali Hassan, and Mikael Gidlund
School of Electrical Engineering & Computer Science (SEECS),
National University of Sciences & Technology (NUST), 44000 Islamabad, Pakistan.
Department of Information Systems & Technology, Mid Sweden University, 851 70 Sundsvall, Sweden.
Email: {sullah.phdee21seecs, szeb.dphd19seecs, ali.hassan}@seecs.edu.pk, {firstname.lastname}@miun.se
Abstract—Zero-energy radios in energy-constrained devices
are envisioned as key enablers to realizing the next-generation
Internet-of-things (NG-IoT) networks for ultra-dense sensing
and monitoring. This paper presents analytical modeling and
analysis of the energy-efficient uplink transmission of an energy-
constrained secondary sensor operating opportunistically among
several primary sensors. The considered scenario assumes that
all primary sensors transmit in a round-robin, time division multiple access-based scheme, and the secondary sensor is admitted into the time slot of each primary sensor using a cognitive radio-inspired non-orthogonal multiple access technique. The energy efficiency of the secondary sensor is maximized by training it with a deep reinforcement learning-based algorithm known as the deep deterministic policy gradient (DDPG). Our results demonstrate that the DDPG-based transmission scheme outperforms the conventional random and greedy algorithms in terms of energy efficiency under different operating conditions.
Index Terms—Next-generation Internet-of-things (NG-IoT),
non-orthogonal multiple access (NOMA), deep deterministic
policy gradient (DDPG), energy efficiency (EE).
I. INTRODUCTION
The provision of energy-efficient wireless connectivity is
becoming vital to realize next-generation Internet-of-things
(NG-IoT) networks. The IoT devices usually have constrained
power supplies, mandating the design of energy-efficient
radios and optimized communication protocols to reduce
energy consumption. In this respect, zero-energy radios are
envisioned to enable ultra-dense connectivity for numerous
application areas, including smart industries, smart healthcare,
smart agriculture, smart cities, etc., [1], [2]. Such radios are
expected to increase the scale of sensing and monitoring without requiring operators to charge or replace batteries. Hence, the goal of NG-IoT networks is to ensure energy-efficient communication while meeting sustainable development goals (SDGs) and reducing the operational expenditure (OPEX) of the communication network [3], [4].
With the ever-growing size of IoT networks, maintaining the lifetime of energy-constrained sensors becomes difficult. In particular, when sensors are deployed in unreachable places, traditional battery-based solutions are impractical due to the high cost of battery replacement and
recycling issues. Therefore, numerous radio frequency (RF)-
based energy harvesting and green communications techniques
are being investigated to address this challenge [5], [6].
In the harvest-then-transmit model, the energy-constrained
sensors may need to switch from transmitting to harvesting or
vice versa depending on various dynamic factors, including
battery capacity, channel conditions, transmit power, and
circuit power [7]–[9]. Under these dynamics, autonomous
and intelligent decision-making and optimization techniques
are necessary, for which deep reinforcement learning (DRL)-
based strategies are gaining momentum [10].
Nevertheless, servicing multiple energy-constrained sen-
sors is still a challenging task due to spectrum limitations.
The limited-spectrum challenge can be addressed by adopting a prominent, cognitive radio-inspired multiple access technique known as cognitive radio non-orthogonal multiple access (CR-NOMA), which multiplexes multiple uplink users so that they are served concurrently [11]–[13].
To provide energy- and spectrum-efficient communication,
optimal energy harvesting and CR-NOMA-based transmis-
sion methods are being investigated in the literature. The
work in [14] addressed a long-term throughput maximization
problem of a point-to-point network and applied the deep
deterministic policy gradient (DDPG) algorithm to achieve
this goal. The authors of [12] have looked into the throughput
maximization problem in an extended uplink scenario where
one unlicensed user uses the NOMA approach to transmit
data during a licensed user’s time slot. To the best of our
knowledge, energy efficiency maximization and its analysis
for an energy-constrained sensor in a CR-NOMA-assisted
NG-IoT network have not been addressed yet.
To maintain a reasonable quality of service (QoS) in CR-NOMA-assisted NG-IoT networks, we mathematically model the uplink transmission of an energy-constrained sensor and provide its energy consumption analysis. A DRL-based approach is implemented to maximize the energy efficiency (EE) of the energy-constrained IoT sensor, which operates alongside several primary sensors under a round-robin time division multiple access (TDMA) scheme.
The contributions of this paper are as follows.
• We formulate the energy efficiency metric for an energy-constrained sensor in a CR-NOMA-assisted IoT network and optimize it using the DDPG algorithm.
• We present an analysis of energy efficiency for different parameters, including the path loss exponent, distance, and circuit power, and compare the results with existing benchmark schemes, i.e., the greedy and random algorithms.
Fig. 1. System model diagram for uplink communication in NG-IoT network
The remainder of the paper is structured as follows. The
system model is presented in Sec. II. Sec. III formulates our
problem within the DDPG framework and Sec. IV explores
the results of the simulations. Finally, Sec. V concludes the
paper.
II. SYSTEM MODEL
We consider an uplink communication scenario as shown in Fig. 1. There are $N$ primary users (e.g., sensors), denoted by $U_j$, for $j \in \{1, \cdots, N\}$, a base station (BS), and an energy-constrained secondary sensor, represented by $U_0$, which can harvest energy from the primary sensors when they transmit. The channel gain of the secondary sensor is denoted by $h_0$, and those of the primary sensors by $h_j$. The channel between the secondary sensor and the respective primary sensor is given by $h_{j,0}$. All primary sensors transmit based on TDMA round-robin scheduling, assisted by CR-NOMA, with a fixed slot duration $T$, and the transmission continues for a long time ($NT$) so that each primary sensor can transmit at least once.
1) CR-NOMA-enhanced scheme: For transmitting data, the energy-constrained sensor is admitted into the time slot of each primary sensor via CR-NOMA. Within each time slot $T$, the first $\tau_t T$ seconds are used by the secondary sensor for transmitting data, and the remaining time $(1-\tau_t)T$ for harvesting energy, where $\tau_t$ denotes the time-sharing coefficient and assumes a value between 0 and 1. The following assumptions are made in this scenario: i) the secondary sensor is aware of the channel state information of the primary sensor scheduled in that particular time slot $T$, and ii) the battery of the energy-constrained sensor is full at the start of the communication. With these assumptions, the transmit power of the secondary sensor is constrained by

$\tau_t T P_{0,t} \le E_t$, (1)

where $E_t$ denotes the current energy in the battery of the secondary sensor at time $t$ and $P_{0,t}$ represents its transmit power at time $t$. Similarly, the energy accumulated by the secondary sensor at the start of time slot $t+1$ is given by

$E_{t+1} = \min\left\{E_t + (1-\tau_t)T\eta P_{j_t}|h_{j_t,0}|^2 - \tau_t T P_{0,t},\, E_m\right\}$, (2)
which fulfills the condition of no energy overflow. In (2), $E_m$ represents the secondary sensor's maximum battery capacity, $P_{j_t}$ represents the power received from the $j$-th transmitting sensor at time $t$, $\eta$ is the energy harvesting efficiency coefficient, and $h_{j_t,0}$ represents the channel between the secondary sensor and the $j$-th primary sensor at time $t$. Therefore, the EE of the secondary sensor at the $t$-th time can be defined as [15]

$\hat{\Gamma}_{EE} = \dfrac{\sum_{t=1}^{M} R_t(\tau_t, P_{0,t})}{P_T}$, (3)

where $R_t(\tau_t, P_{0,t}) = \tau_t \log_2\!\left(1 + \dfrac{P_{0,t}|h_0|^2}{1 + P_{j_t}|h_{j_t}|^2}\right)$ and $P_T = P_c + P_{0,t}$, with $P_c$ representing the circuit power consumed by the internal circuitry of the secondary sensor. The $R_t$ expression ensures that the BS first performs successive interference cancellation (SIC) and can correctly decode the signal from the secondary sensor. After the BS eliminates the secondary sensor's decoded signal, the signals of the primary sensors can be decoded.
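To make the slot-level model concrete, the constraint (1), the battery update (2), and the rate term in (3) can be sketched in a few lines of Python; all numerical values below (channel gains, powers, battery levels) are illustrative assumptions, not the paper's simulation settings.

```python
import math

def battery_update(E_t, tau, T, eta, P_j, g_j0, P_0, E_m):
    """Energy at the start of slot t+1, Eq. (2): harvest for (1 - tau)*T seconds,
    transmit for tau*T seconds, and cap at the battery capacity E_m."""
    harvested = (1 - tau) * T * eta * P_j * g_j0
    consumed = tau * T * P_0
    return min(E_t + harvested - consumed, E_m)

def slot_rate(tau, P_0, g_0, P_j, g_j):
    """Per-slot rate R_t in Eq. (3); the primary signal acts as interference
    before SIC at the BS."""
    return tau * math.log2(1 + P_0 * g_0 / (1 + P_j * g_j))

# Illustrative (hypothetical) values, not the paper's simulation parameters
E_t, E_m, T, eta = 0.1, 0.2, 1.0, 0.9
P_j, g_j, g_j0, g_0 = 1.0, 0.5, 0.8, 0.6
tau, P_0 = 0.4, 0.05
assert tau * T * P_0 <= E_t  # Eq. (1): cannot spend more than the stored energy
E_next = battery_update(E_t, tau, T, eta, P_j, g_j0, P_0, E_m)
R = slot_rate(tau, P_0, g_0, P_j, g_j)
```

Note how the `min` in `battery_update` enforces the no-overflow condition of (2): with the values above, the harvested energy would exceed $E_m$, so the battery saturates at capacity.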
A. Problem Formulation
Our goal is to maximize EE; therefore, (3) can be formulated as a maximization problem:

$\max_{\tau_t, P_{0,t}} \; f_o(\tau_t, P_{0,t})$
s.t. $C_1$: $f_1(P_{0,t}, \tau_t) = \min\{E_m, Q\}$,
$C_2$: $f_2(P_{0,t}, \tau_t) \le 0$,
$C_3$: $0 \le f_3(\tau_t) \le 1$,
$C_4$: $0 \le f_4(P_{0,t}) \le P_{sm}$, (4)

where $P_{sm}$ is the maximum transmit power of the secondary sensor, $f_o(\tau_t, P_{0,t}) = \hat{\Gamma}_{EE}(\tau_t, P_{0,t})$, $f_1(P_{0,t}, \tau_t) = E_{t+1}$, $f_2(P_{0,t}, \tau_t) = \tau_t T P_{0,t} - E_t$, $f_3(\tau_t) = \tau_t$, $f_4(P_{0,t}) = P_{0,t}$, and $Q = E_t + (1-\tau_t)T\eta P_{j_t}|h_{j_t,0}|^2 - \tau_t T P_{0,t}$. Constraint $C_1$ expresses the battery energy level of the secondary sensor at time $t+1$, ensuring that the harvested energy cannot exceed the maximum battery capacity. $C_2$ is the difference between the energy consumed and the energy available at time $t$, which ensures the non-negativity of $C_1$. $C_3$ limits the value of the time-sharing coefficient between 0 and 1. Finally, $C_4$ states that the transmit power of the secondary sensor can assume a value between 0 and $P_{sm}$.
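As a numerical sanity check on problem (4), a brute-force grid search over $(\tau_t, P_{0,t})$ that enforces $C_1$–$C_4$ for a single slot can be sketched as follows; the per-slot EE objective and all parameter values are simplifying assumptions made for illustration, not the paper's DDPG-based solution.

```python
import math

def per_slot_ee(tau, P0, g_0, P_j, g_j, Pc):
    """Single-slot analogue of Eq. (3): rate R_t divided by total power P_T."""
    R = tau * math.log2(1 + P0 * g_0 / (1 + P_j * g_j))
    return R / (Pc + P0)

def grid_search(E_t, T, eta, P_j, g_j0, g_0, g_j, Pc, P_sm, steps=200):
    """Exhaustive search over (tau, P0) subject to C1-C4 of problem (4)."""
    best = (-1.0, None, None)
    for i in range(steps + 1):
        tau = i / steps                                   # C3: 0 <= tau <= 1
        for k in range(steps + 1):
            P0 = P_sm * k / steps                         # C4: 0 <= P0 <= P_sm
            if tau * T * P0 > E_t:                        # C2, i.e., Eq. (1)
                continue
            Q = E_t + (1 - tau) * T * eta * P_j * g_j0 - tau * T * P0
            if Q < 0:                                     # keep C1 feasible
                continue
            ee = per_slot_ee(tau, P0, g_0, P_j, g_j, Pc)
            if ee > best[0]:
                best = (ee, tau, P0)
    return best

# Illustrative (hypothetical) parameter values
ee_best, tau_best, p0_best = grid_search(
    E_t=0.1, T=1.0, eta=0.9, P_j=1.0, g_j0=0.8, g_0=0.6, g_j=0.5,
    Pc=0.03, P_sm=0.2)
```

Such an exhaustive search is only tractable for one slot and a coarse grid; the coupling across slots through the battery state (2) is precisely what motivates the decomposition and DDPG-based treatment that follow.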
Problem (4) is non-convex because $C_1$ is not an affine function and the two optimization variables appear as a product in $C_2$. However, because the optimization variables take continuous values, problem (4) can be solved using the DDPG algorithm. Problem (4) is first divided into two sub-problems, since the range of values of the optimization variables makes a direct implementation of DDPG challenging. The first sub-problem is defined as

$\max_{\tau_t, P_{0,t}} \; f_o(\tau_t, P_{0,t})$
s.t. $C_1$: $\hat{f}_1(P_{0,t}, \tau_t) = 0$, and $C_2$, $C_3$, $C_4$ in (4), (5)

where $\hat{f}_1(P_{0,t}, \tau_t) = (1-\tau_t)T\eta P_{j_t}|h_{j_t,0}|^2 - \tau_t T P_{0,t} - \bar{E}_t$, and $\bar{E}_t = (1-\tau_t)T\eta P_{j_t}|h_{j_t,0}|^2 - \tau_t T P_{0,t}$ denotes the energy fluctuation parameter, i.e., the net change in stored energy during slot $t$. Problem (5) is solved by convex optimization, where closed-form expressions are obtained for a given $\bar{E}_t$. The corresponding closed-form expressions are given as [12]

$P^*_{0,t}(\bar{E}_t) = \dfrac{(1-\tau^*_t)\eta P_{j_t}|h_{j_t,0}|^2}{\tau^*_t} - \dfrac{\bar{E}_t}{\tau^*_t T}$,

and

$\tau^*_t(\bar{E}_t) = \min\{1, \max\{x, x_0\}\}$,

where

$x_0 = \max\left\{1 - \dfrac{E_t + \bar{E}_t}{T\eta P_{j_t}|h_{j_t,0}|^2},\; \dfrac{T\eta P_{j_t}|h_{j_t,0}|^2 - \bar{E}_t}{T\eta P_{j_t}|h_{j_t,0}|^2 + T P_{sm}}\right\}$,

$x = \dfrac{x_1 - x_2}{e^{W_0(e^{-1}(x_1 - 1)) + 1} - 1 + x_1}$, $x_1 = \dfrac{\eta P_{j_t}|h_{j_t,0}|^2 |h_0|^2}{1 + P_{j_t}|h_{j_t}|^2}$, $x_2 = \dfrac{\bar{E}_t |h_0|^2}{T(1 + P_{j_t}|h_{j_t}|^2)}$,

and $W_0(\cdot)$ represents the Lambert W function.
The second sub-problem is defined as follows. As our goal is to maximize EE, from (5) we can observe that the EE, $\hat{\Gamma}_{EE}$, at time $t$ does not depend on $\tau_{\hat{t}}$ and $P_{0,\hat{t}}$ for $t \ne \hat{t}$. Hence, the optimization problem (4) can be reformulated as a function of $\bar{E}_t$ within the DDPG framework as

$\max_{\bar{E}_t} \; \gamma^{t-1}\,\hat{\Gamma}_{EE}\!\left(\bar{E}_t \,|\, \tau^*_t, P^*_{0,t}\right)$
s.t. $E_{t+1} = \min\{E_m, E_t + \bar{E}_t\}$, (6)

where $\gamma$ represents the discount factor and assumes a value between 0 and 1. From problem (6), it can be seen that the action of the energy-constrained sensor is to choose $\bar{E}_t$ for given $\tau^*_t$ and $P^*_{0,t}$. By substituting the expression of $\hat{\Gamma}_{EE}$ into (6), we obtain the maximization problem

$\max_{\bar{E}_t} \; \dfrac{\sum_{t=1}^{M} \gamma^{t-1}\,\tau^*_t(\bar{E}_t)\log_2\!\left(1 + \dfrac{P^*_{0,t}(\bar{E}_t)|h_0|^2}{1 + P_{j_t}|h_{j_t}|^2}\right)}{P_T}$
s.t. $E_{t+1} = \min\{E_m, E_t + \bar{E}_t\}$. (7)

It can be observed that the objective in (7) is a continuous, univariate function of $\bar{E}_t$, which makes problem (7) well suited to the DDPG algorithm.
III. IMPLEMENTATION OF DRL ALGORITHM
In this section, we provide preliminaries of the DRL algorithm, i.e., DDPG, and formulate our problem within the DDPG framework.
A. Deep Deterministic Policy Gradient
DDPG, an actor-critic algorithm, is based on the deterministic policy gradient (DPG) and the deep Q-network (DQN) [16]. Deep Q-learning (DQL) becomes inefficient when the action and state spaces are continuous and high-dimensional; therefore, DDPG suits such scenarios best [17]. In a DRL setup, the agent (or observer) initially possesses zero knowledge about the environment. The agent learns the environment over time, as it continuously monitors its surroundings and learns how to maximize a reward signal using an optimal policy.
1) DDPG Framework: In the DDPG algorithm, at a particular time step $t$, the goal of the agent is to find an action $a_t$, for an observation $s_t$, that receives a reward $r_t$ and consequently maximizes the action value function $Q(s_t, a_t)$. Accordingly, the maximization problem is given as

$a^*_t(s_t) = \arg\max_{a_t} Q(s_t, a_t)$, (8)

where $Q(s_t, a_t)$ represents the expected return. The actor network (or policy network) takes the action, whereas the critic network (or Q network) acts as an evaluator, assessing how good the action taken by the actor network is. The actor network is parameterized by $\theta^\mu$; it takes $s_t$ as input and produces an action, represented by $\mu(s_t|\theta^\mu)$. The corresponding actor target network is parameterized by $\theta^{\mu_t}$ and outputs $\mu_t(s_t|\theta^{\mu_t})$. The critic network is parameterized by $\theta^Q$; it takes $s_t$ and $a_t$ as inputs and produces the action value function, represented by $Q(s_t, a_t|\theta^Q)$. The corresponding critic target network is parameterized by $\theta^{Q_t}$ and outputs $Q_t(s_t, a_t|\theta^{Q_t})$.
2) Networks Updating Process: The actor network takes the action, while the other networks ensure that the actor network is properly trained by evaluating its output (action). Consider a tuple $(s_t, a_t, r_t, s_{t+1})$, where $s_t$ represents the current state, $a_t$ the action the agent took according to the observed state, $r_t$ the reward for the action taken, and $s_{t+1}$ the next state. Based on this tuple, the network update process is as follows.
1) The actor network is trained by maximizing the objective in (8). Using the parameters of the actor and critic networks, (8) can be reformulated as

$J(\theta^\mu) = Q(s_t, a_t = \mu(s_t|\theta^\mu)\,|\,\theta^Q)$. (9)

Taking the gradient of (9) with respect to $\theta^\mu$ yields

$\nabla_{\theta^\mu} J(\theta^\mu) = \nabla_{a_t} Q(s_t, a_t|\theta^Q)\, \nabla_{\theta^\mu} \mu(s_t|\theta^\mu)$. (10)
2) Updating the critic network involves both target networks: first, the output of the actor target network is fed to the critic target network, which outputs the target value

$y_t = r_t + \gamma Q_t(s_{t+1}, \mu_t(s_{t+1}|\theta^{\mu_t})\,|\,\theta^{Q_t})$. (11)

The critic network is then trained by minimizing the loss function

$L(\theta^Q) = |y_t - Q(s_t, a_t|\theta^Q)|^2$. (12)
3) Using a soft update parameter, which assumes a very small value, the parameters of both the critic target network and the actor target network are updated, so that the target networks change more slowly than their corresponding counterparts. The parameters are updated as

$\theta^{\mu_t} \leftarrow \xi\theta^\mu + (1-\xi)\theta^{\mu_t}$ (13)

and

$\theta^{Q_t} \leftarrow \xi\theta^Q + (1-\xi)\theta^{Q_t}$, (14)

respectively, where $\xi$ denotes the soft update parameter.
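The soft updates in (13)–(14) amount to Polyak averaging of the parameter vectors; a minimal sketch, treating the parameters as plain Python lists of scalars rather than network weights:

```python
def soft_update(target, source, xi=0.01):
    """Eqs. (13)-(14): theta_target <- xi*theta + (1 - xi)*theta_target."""
    return [xi * s + (1 - xi) * t for s, t in zip(source, target)]

# Toy parameter vectors; xi = 0.01 matches Table I's soft update parameter
critic_target = [0.0, 1.0]
critic = [1.0, 3.0]
critic_target = soft_update(critic_target, critic, xi=0.01)
# each target parameter moves only 1% of the way toward its online counterpart
```

Because $\xi$ is small, the target networks track their online counterparts with a long time constant, which stabilizes the bootstrapped target in (11).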
Replay buffer and exploration are two other important features of the DDPG algorithm. The replay buffer stores past tuples $(s_t, a_t, r_t, s_{t+1})$ in a pool; these tuples are used to enhance the agent's learning. During the network updating process, a batch of tuples is chosen randomly from the pool and used to update the networks. Regarding exploration, the actor network is encouraged to fully explore its surroundings by adding noise to its output, which can be represented as

$a(s_t) = \mu(s_t|\theta^\mu) + \Psi$, (15)

where $\Psi$ represents the added noise.
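The replay buffer and the exploration rule (15) can be sketched as follows; the buffer capacity and batch size mirror Table I, while the Gaussian model for the noise $\Psi$ (and its standard deviation) is an assumption for illustration, as the paper does not specify the noise distribution.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past (s_t, a_t, r_t, s_{t+1}) tuples and samples random minibatches."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)  # oldest tuples are evicted first

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size=64):
        return random.sample(self.pool, min(batch_size, len(self.pool)))

def explore(mu_out, sigma=0.1):
    """Eq. (15): add exploration noise Psi to the actor's deterministic output."""
    return mu_out + random.gauss(0.0, sigma)

buf = ReplayBuffer(capacity=10000)
for t in range(200):
    a = explore(0.5)              # noisy action around mu(s_t) = 0.5
    buf.store(t, a, 0.0, t + 1)   # placeholder reward; states indexed by t
batch = buf.sample(64)
```

Random sampling from the pool breaks the temporal correlation between consecutive tuples, which is what makes minibatch updates of the critic well behaved.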
B. Problem Formulation into DDPG Framework
The DDPG algorithm is applied to the above problem by defining the state space, action space, and reward as follows:
1) State Space: The state space is a tuple containing the channel gains and the energy-constrained sensor's available energy, represented as

$s_t = \left[E_t, |h_{j_t}|^2, |h_0|^2, |h_{j_t,0}|^2\right]^T$. (16)
2) Action Space: The action space contains a single parameter, $\bar{E}_t$. The minimum and maximum values of $\bar{E}_t$ are given by

$-\min\{T P_{sm}, E_t\} \le \bar{E}_t \le \min\{E_m - E_t, T\eta P_{j_t}|h_{j_t,0}|^2\}$, (17)

where the lower bound corresponds to $\tau_t = 1$, i.e., transmission only with no energy harvesting, and is limited by the energy available at the start of time slot $t$. The upper bound on $\bar{E}_t$ corresponds to $\tau_t = 0$, i.e., energy harvesting only with no transmission, and reflects the finite amount of energy that can be harvested in time slot $t$.
TABLE I
SIMULATION PARAMETERS

Parameter                        Symbol   Value
Actor network's learning rate    α_a      0.002
Critic network's learning rate   α_c      0.005
Batch size                       B        64 tuples
Memory capacity                  R        10000
Noise spectral density           σ_o      -190 dBm
Signal bandwidth                 W_s      10 MHz
Maximum battery capacity         E_m      0.2 J
Maximum transmit power           P_sm     23 dBm
Circuit power                    P_c      15 dBm
Energy harvesting efficiency     η        0.9
Time slot duration               T        1 s
Discount factor                  γ        0.99
Center frequency                 f_c      914 MHz
Soft update parameter            ξ        0.01
Since (17) can assume much larger or much smaller values, it is convenient to bound the action between 0 and 1; hence, $\bar{E}_t$ is normalized as follows:

$\bar{E}_t = \zeta_t \min\left\{E_m - E_t, T\eta P_{j_t}|h_{j_t,0}|^2\right\} - (1-\zeta_t)\min\left\{T P_{sm}, E_t\right\}$. (18)

According to (18), the action parameter for the DDPG algorithm is $\zeta_t$, where $\zeta_t \in [0, 1]$.
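The mapping in (18) from the normalized action $\zeta_t$ to $\bar{E}_t$ can be written directly; the parameter values used below are illustrative assumptions, not the simulation settings of Table I.

```python
def action_to_ebar(zeta, E_t, E_m, T, eta, P_j, g_j0, P_sm):
    """Eq. (18): map the normalized action zeta in [0, 1] to the energy change
    E_bar, which then always lies inside the bounds of Eq. (17)."""
    upper = min(E_m - E_t, T * eta * P_j * g_j0)  # harvest-only bound (tau = 0)
    lower = min(T * P_sm, E_t)                    # transmit-only bound (tau = 1)
    return zeta * upper - (1 - zeta) * lower

# Illustrative (hypothetical) values
params = dict(E_t=0.1, E_m=0.2, T=1.0, eta=0.9, P_j=1.0, g_j0=0.5, P_sm=0.2)
e_harvest = action_to_ebar(1.0, **params)  # zeta = 1: pure harvesting
e_spend = action_to_ebar(0.0, **params)    # zeta = 0: pure transmission
```

At the extremes, $\zeta_t = 1$ recovers the upper bound of (17) and $\zeta_t = 0$ recovers the (negative) lower bound, so the tanh-style output of the actor can be rescaled to $[0, 1]$ without ever proposing an infeasible $\bar{E}_t$.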
3) Reward: The reward is the EE achieved by the secondary sensor, i.e., $\hat{\Gamma}_{EE}$.
IV. SIMULATION RESULTS AND ANALYSIS
In this section, we provide performance analysis of the
system model defined in Sec. II. We benchmark the perfor-
mance of the DDPG algorithm against random and greedy
methods. In these benchmark methods, the transmit power
of the energy-constrained sensor is fixed at Psm, however,
the selection of the time-sharing coefficient, τt, differs. In
the random algorithm, τtis chosen uniformly between 0 and
min{1,Et
T Psm }, whereas, in the greedy algorithm, τtis selected
to be min{1,Et
T Psm }.
A. Simulation Environment Setup and Parameters Selection
In our simulations, we assume that the BS is located at the origin of the x-y plane, i.e., (0, 0), and we consider large-scale path loss while ignoring small-scale fading. Both the actor and critic are implemented as neural networks with two hidden layers. For the actor network, the rectified linear unit (ReLU) activation function is used in the two hidden layers, whereas the output layer uses the hyperbolic tangent activation function. For the critic network, the ReLU activation function is used in all hidden layers. Further details of the fixed parameters chosen for the simulations are listed in Table I.
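A minimal sketch of the actor's forward pass with the stated architecture (two ReLU hidden layers, tanh output) follows; the hidden-layer width of 16 and the random initialization are assumptions made for illustration, and the tanh output is rescaled to the action $\zeta_t \in [0, 1]$.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(x, W, b):
    """Fully connected layer: one output per row of W."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj
            for row, bj in zip(W, b)]

def actor_forward(state, params):
    """Actor mu(s|theta): two ReLU hidden layers, tanh output in [-1, 1]."""
    h1 = relu(dense(state, *params[0]))
    h2 = relu(dense(h1, *params[1]))
    out = dense(h2, *params[2])
    return [math.tanh(o) for o in out]

def init_layer(n_in, n_out, scale=0.1):
    rng = random.Random(0)  # deterministic toy initialization
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return (W, [0.0] * n_out)

# Hypothetical widths: 4 -> 16 -> 16 -> 1 (the paper does not state layer sizes)
params = [init_layer(4, 16), init_layer(16, 16), init_layer(16, 1)]
state = [0.1, 0.5, 0.6, 0.8]       # [E_t, |h_jt|^2, |h_0|^2, |h_jt,0|^2], Eq. (16)
action = actor_forward(state, params)
zeta = 0.5 * (action[0] + 1.0)     # rescale tanh output to zeta in [0, 1]
```

The tanh output layer is what makes the simple affine rescaling to $\zeta_t \in [0, 1]$ possible, matching the normalized action of (18).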
B. Results Analysis
In this section, we present a performance analysis of the
DDPG scheme in comparison with other benchmark schemes,
i.e., greedy and random algorithms.
[Figure: EE (bits/Joule) of the DDPG, random, and greedy algorithms versus episodes, with an inset magnifying episodes 20–30.]
Fig. 2. Energy efficiency of the energy-constrained sensor versus the number of episodes for the three algorithms.
1) EE comparison against Episodes: Fig. 2 shows the episodic reward, in terms of EE, of the DDPG algorithm and the benchmark schemes against the number of episodes. It can be observed that DDPG achieves higher rewards than the greedy and random techniques. Additionally, the DDPG algorithm nearly converges after 40 episodes, with only marginal improvement in the episodic reward after that point. For clarity, a magnified view of the performance of the random and greedy algorithms is provided in the inset of Fig. 2.
2) EE comparison against Path Loss: To evaluate the performance of the DDPG algorithm, the EE of all three schemes is plotted in Fig. 3 for various values of the path loss exponent. In this setup, the two primary sensors are located at (0 m, 1000 m) and (0 m, 1 m), respectively. The maximum transmit power of each primary sensor is fixed at $P_{um} = 30$ dBm, and the power consumed by the RF circuitry is assumed to be $P_c = 15$ dBm. It can be observed that the DDPG-based algorithm outperforms both the random and the greedy approach. This may appear contradictory, as increasing the path loss exponent would normally increase energy consumption in the assumed dense environment. However, the increase in EE occurs because the throughput of the secondary sensor depends on the transmit power of the primary sensors; when the path loss exponent increases, the signal of the primary sensor (located at (0 m, 1 m)) is attenuated more than that of the secondary sensor. This benefits the secondary sensor, which achieves higher EE as the path loss exponent increases.
3) EE comparison against Transmit Power of Primary Sensors: The comparison of EE against the transmit power of the primary sensors is shown in Fig. 4. Once again, the DDPG algorithm outperforms the random and greedy algorithms. In this setup, the path loss exponent is set to $n = 3$, the two primary sensors assisting the secondary sensor are located at (0 m, 1000 m) and (0 m, 1 m) in the x-y plane, respectively, and the secondary sensor is located at (1 m, 1 m). The power consumed by the RF circuitry
[Figure: EE (bits/J) of the DDPG, random, and greedy algorithms versus the path loss exponent.]
Fig. 3. Energy efficiency comparison of three algorithms against path loss exponent.
[Figure: EE (bits/J) of the DDPG, random, and greedy algorithms versus the maximum transmit power of the primary sensors.]
Fig. 4. Energy efficiency comparison of three algorithms against the transmit power of primary sensors.
is assumed to be $P_c = 15$ dBm.
We observe that increasing the transmit power of the primary sensors does not appreciably raise the EE of the secondary sensor, which exhibits a nearly constant behavior. In the case of the DDPG algorithm, this is because an increase in the primary transmit power raises the useful term and the interference term of the data-rate expression by comparable amounts, hence the nearly constant trend.
4) EE comparison against Distance and Circuit Power: The combined effect of distance and circuit power on both the DDPG and random algorithms is depicted in Fig. 5(a) and Fig. 5(b). The distance of the secondary sensor from the BS and the primary sensors is presented on the y-axis, and the power consumed by the internal circuitry of the secondary sensor on the x-axis. In this setting, the path loss exponent is assumed to be $n = 3$, the maximum transmit power of the primary sensors is $P_{um} = 30$ dBm, and the two primary sensors are located at (0 m, 1000 m) and (0 m, 1 m) in the x-y plane, respectively. One can observe a decrease in the EE of the secondary sensor as both variables increase: as the energy-constrained sensor moves away from the primary sensors, more energy is required for its transmissions, hence its EE is reduced.
[Figure: contour maps of EE versus circuit power (0.01–0.09 W) and distance (5–35 m) for (a) the DDPG algorithm and (b) the random algorithm.]
Fig. 5. Energy efficiency of the energy-constrained sensor against distance and circuit power, (a) DDPG algorithm, (b) random algorithm.
The EE of the secondary sensor can also be observed to decline as its circuit power increases. This is because an increase in the circuit power of the secondary sensor increases the total power required to transmit data, which reduces its EE.
V. CONCLUSION
This paper studied the uplink performance of an energy-constrained secondary sensor in a CR-NOMA-assisted IoT network. We mathematically modeled and formulated the EE maximization problem of the secondary sensor, which was solved using a DRL framework, i.e., the DDPG algorithm. Moreover, we analyzed and compared the obtained simulation results with the benchmark algorithms, i.e., greedy and random. The simulation results demonstrated that the considered DDPG algorithm outperforms the selected benchmarks in the EE metric. We observed that the EE curve of the DDPG algorithm converged after roughly 40 episodes, and that high EE performance was maintained under harsher and more diverse environmental conditions. Similarly, the results demonstrated that, as the transmit power of the primary sensors increases in CR-assisted NOMA transmission, the EE of the secondary sensor with DDPG remains superior to the benchmarks. We also examined the combined effect of separation distance and circuit power, which can be helpful from a system design perspective. In future work, the model can be extended to analyze the EE of multiple energy-constrained sensors in a CR-NOMA network.
ACKNOWLEDGMENT
This work was supported by the Swedish Knowledge Foundation (KKS) research profile NIIT.
REFERENCES
[1] Y. B. Zikria, R. Ali, M. K. Afzal, and S. W. Kim, “Next-generation
Internet of things (IoT): Opportunities, challenges, and solutions,”
Sensors, vol. 21, no. 4, p. 1174, 2021.
[2] S. Zeb, A. Mahmood, et al., “Analysis of beyond 5G integrated communication and ranging services under indoor 3-D mmWave stochastic channels,” IEEE Transactions on Industrial Informatics, vol. 18, no. 10, pp. 7128–7138, 2022.
[3] S. Zeb et al., “Industry 5.0 is coming: A survey on intelligent nextG wireless networks as technological enablers,” arXiv preprint arXiv:2205.09084, 2022.
[4] S. Zeb, M. A. Rathore, et al., “Edge intelligence in softwarized 6G: Deep learning-enabled network traffic predictions,” in IEEE Globecom Workshops (GC Wkshps), pp. 1–6, 2021.
[5] G. G. de Oliveira Brante, M. T. Kakitani, and R. D. Souza, “Energy
efficiency analysis of some cooperative and non-cooperative trans-
mission schemes in wireless sensor networks,” IEEE Transactions on
Communications, vol. 59, no. 10, pp. 2671–2677, 2011.
[6] A. W. Nazar, S. A. Hassan, H. Jung, A. Mahmood, and M. Gidlund,
“BER analysis of a backscatter communication system with non-
orthogonal multiple access,” IEEE Transactions on Green Communi-
cations and Networking, vol. 5, no. 2, pp. 574–586, 2021.
[7] S. Zeb et al., “Industrial digital twins at the nexus of nextG wireless
networks and computational intelligence: A survey,” Journal of Network
and Computer Applications, vol. 200, p. 103309, 2022.
[8] B. Matthiesen, A. Zappone, et al., “A globally optimal energy-efficient power control framework and its efficient implementation in wireless interference networks,” IEEE Transactions on Signal Processing, vol. 68, pp. 3887–3902, 2020.
[9] N. Rubab et al., “Interference mitigation in RIS-assisted 6G systems for indoor industrial IoT networks,” in IEEE 12th Sensor Array and Multichannel Signal Processing Workshop (SAM), pp. 211–215, 2022.
[10] A. Mahmood et al., “Industrial IoT in 5G-and-beyond networks: Vi-
sion, architecture, and design trends,” IEEE Transactions on Industrial
Informatics, vol. 18, no. 6, pp. 4122–4137, 2022.
[11] F. Jameel et al., “NOMA-enabled backscatter communications: Toward battery-free IoT networks,” IEEE Internet of Things Magazine, vol. 3, no. 4, pp. 95–101, 2020.
[12] Z. Ding, R. Schober, and H. V. Poor, “No-pain no-gain: DRL assisted optimization in energy-constrained CR-NOMA networks,” IEEE Transactions on Communications, vol. 69, no. 9, pp. 5917–5932, 2021.
[13] S. Zeb, Q. Abbas, et al., “NOMA enhanced backscatter communication for green IoT networks,” in 16th International Symposium on Wireless Communication Systems, pp. 640–644, 2019.
[14] L. Li, H. Xu, J. Ma, A. Zhou, and J. Liu, “Joint EH time and transmit
power optimization based on DDPG for EH communications,” IEEE
Communications Letters, vol. 24, no. 9, pp. 2043–2046, 2020.
[15] G. Y. Li, Z. Xu, C. Xiong, C. Yang, S. Zhang, Y. Chen, and S. Xu, “Energy-efficient wireless communications: Tutorial, survey, and open issues,” IEEE Wireless Communications, vol. 18, no. 6, pp. 28–35, 2011.
[16] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller,
“Deterministic policy gradient algorithms,” in International conference
on machine learning, pp. 387–395, 2014.
[17] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa,
D. Silver, and D. Wierstra, “Continuous control with deep reinforcement
learning,” arXiv preprint arXiv:1509.02971, 2015.
ResearchGate has not been able to resolve any citations for this publication.
Conference Paper
Full-text available
Ever necessity of tremendous data traffic and massive deployment of industrial Internet-of-Things (IIoT) devices operating in higher bands, i.e., millimeter-wave (mmWave) and terahertz (THz), have encouraged academia and industry to transition towards the future sixth-generation (6G) wireless communication networks. Nevertheless, the recent emergence of 6G enablers, i.e., reflecting intelligent surface (RIS) and massive multiple-input multiple-output (mMIMO), has the ability to potentially change the previous paradigm of indoor mmWave wireless communication by modifying the propagation environment. It can control and establish the favorable and tunable wireless channel responses by exploiting the multipath and diversity of the propagation environment. Therefore, this next cutting-edge technology is capable of massively improving the performance of mmWave-mMIMO-enabled IIoT data transmissions, making it a feasible solution for 6G networks. In this paper, we propose a RIS-assisted mmWave-mMIMO for a multi-cells indoor factory propagation environment, which provides, 1) aid in mitigating the impact of radio frequency (RF) interference from interferes in closed vicinity (i.e., neighbor cells) by employing metasurface laminated walls, and 2) increased the mmWave-mMIMO system performance by controlling and tuning the indoor factory channel conditions. Our results indicate that the proposed RIS-assisted mmWave-mMIMO system outperforms the benchmark link capacity performance in the presence of interference Index Terms-6G, reconfigurable intelligent surfaces, massive MIMO, millimeter-Wave, industrial Internet-of-things.
Article
Full-text available
By amalgamating recent communication and control technologies, computing and data analytics techniques, and modular manufacturing, Industry 4.0 promotes integrating cyber-physical worlds through cyber-physical systems (CPS) and digital twin (DT) for monitoring, optimization, and prognostics of industrial processes. A DT enables interaction with the digital image of the industrial physical objects/processes to simulate, analyze, and control their real-time operation. DT is rapidly diffusing in numerous industries with the interdisciplinary advances in the industrial Internet of things (IIoT), edge and cloud computing, machine learning, artificial intelligence, and advanced data analytics. However, the existing literature lacks in identifying and discussing the role and requirements of these technologies in DT-enabled industries from the communication and computing perspective. In this article, we first present the functional aspects, appeal, and innovative use of DT in smart industries. Then, we elaborate on this perspective by systematically reviewing and reflecting on recent research trends in next-generation (NextG) wireless technologies (e.g., 5G-and-Beyond networks) and design tools, and current computational intelligence paradigms (e.g., edge and cloud computing-enabled data analytics, federated learning). Moreover, we discuss the DT deployment strategies at different communication layers to meet the monitoring and control requirements of industrial applications. We also outline several key reflections and future research challenges and directions to facilitate industrial DT's adoption.
Article
5G and beyond (B5G) networks are moving towards the higher end of the millimeter-wave (mmWave) spectrum (i.e., from 25 GHz to 100 GHz) to support integrated communications and ranging (ICAR) services in next-generation factory deployments. The ICAR services in factory deployments require extreme bandwidth/capacity and large ranging coverage, which a mmWave-B5G system can fulfill using massive multiple-input multiple-output (mMIMO), beamforming, and advanced ranging techniques. However, as mmWave signal propagation is sensitive to the harsh channel conditions experienced in typical indoor factory environments, there is a growing interest in realistic mmWave indoor channel modeling to evaluate the practical scope of mmWave-B5G systems. In this paper, we study and implement a 3D stochastic channel model using the baseline 3GPP model. Our channel model employs the time-cluster spatial-lobe (TCSL) technique and utilizes temporal and spatial statistics to create the channel impulse response (CIR), reflecting realistic indoor factory conditions. Using the generated CIR, we present the performance analysis of a mmWave-B5G system in terms of power delay profile (PDP), path loss, communication and ranging coverage, and mMIMO channel capacity.
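To give a feel for the clustered-CIR idea behind the TCSL approach, the sketch below generates a toy impulse response with exponentially decaying time clusters and rays, then reports the RMS delay spread. All parameters are illustrative placeholders, not the paper's calibrated 3GPP/indoor-factory values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative time-cluster parameters (not the paper's calibrated values).
n_clusters, rays_per_cluster = 4, 5
cluster_delays = np.sort(rng.exponential(50e-9, n_clusters))        # seconds
ray_delays = rng.exponential(5e-9, (n_clusters, rays_per_cluster))  # intra-cluster

# Exponentially decaying cluster/ray powers with uniform random phases.
gamma_c, gamma_r = 25e-9, 10e-9
delays, gains = [], []
for c in range(n_clusters):
    for r in range(rays_per_cluster):
        tau = cluster_delays[c] + ray_delays[c, r]
        pwr = np.exp(-cluster_delays[c] / gamma_c) * np.exp(-ray_delays[c, r] / gamma_r)
        delays.append(tau)
        gains.append(np.sqrt(pwr) * np.exp(1j * rng.uniform(0, 2 * np.pi)))

delays, gains = np.array(delays), np.array(gains)
gains /= np.linalg.norm(gains)                  # normalize total power to 1

p_lin = np.abs(gains) ** 2                      # discrete power delay profile
mean_tau = np.average(delays, weights=p_lin)
rms_ds = np.sqrt(np.average((delays - mean_tau) ** 2, weights=p_lin))
print(f"RMS delay spread: {rms_ds * 1e9:.1f} ns")
```

From such a CIR, the PDP, path loss, and capacity figures the paper reports follow by standard post-processing.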
Conference Paper
The 6G vision is envisaged to enable agile network expansion and rapid deployment of new on-demand microservices (e.g., visibility services for data traffic management, mobile edge computing services) closer to the network's edge IoT devices. However, providing one of the critical features of network visibility services, i.e., data flow prediction in the network, is challenging at the edge devices within a dynamic cloud-native environment, as the traffic flow characteristics are random and sporadic. To realize AI-native services for the 6G vision, in this paper we propose a novel edge-native framework that provides an intelligent prognosis technique for data traffic management. The prognosis model uses long short-term memory (LSTM)-based encoder-decoder deep learning, which we train on real time-series multivariate data records collected from the edge µ-boxes of a selected testbed network. Our results accurately predict the statistical characteristics of data traffic, and we verify the trained model against the ground-truth observations. Moreover, we validate our novel framework with two performance metrics for each feature of the multivariate data.
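An LSTM encoder-decoder of the kind described above can be seeded as follows; this is a minimal PyTorch sketch with illustrative dimensions (lookback 48, horizon 12, 4 traffic features), not the paper's trained architecture:

```python
import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    """Minimal LSTM encoder-decoder for multivariate time-series forecasting."""
    def __init__(self, n_features, hidden=32, horizon=12):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):
        # x: (batch, lookback, n_features)
        _, state = self.encoder(x)           # summarize the past into (h, c)
        step = x[:, -1:, :]                  # seed decoder with last observation
        outputs = []
        for _ in range(self.horizon):
            out, state = self.decoder(step, state)
            step = self.head(out)            # next-step prediction, fed back in
            outputs.append(step)
        return torch.cat(outputs, dim=1)     # (batch, horizon, n_features)

model = Seq2SeqForecaster(n_features=4)
x = torch.randn(8, 48, 4)                    # 8 windows of 48 past samples
y_hat = model(x)
print(y_hat.shape)  # torch.Size([8, 12, 4])
```

The autoregressive decoding loop (feeding each prediction back as the next input) is what lets a fixed-size model emit an arbitrary forecast horizon.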
Article
Cellular networks are envisioned to be a cornerstone of future industrial IoT (IIoT) wireless connectivity in terms of fulfilling industrial-grade coverage, capacity, robustness, and timeliness requirements. This vision has led to the design of a verticals-centric, service-based architecture for 5G radio access and core networks. The design incorporates the capabilities of the 5G-AI-Edge ecosystem for computing, intelligence, and flexible deployment and integration options (e.g., centralized and distributed, physical and virtual) while eliminating the privacy/security concerns of mission-critical systems. In this paper, driven by the industrial interest in enabling large-scale wireless IIoT deployments for operational agility and flexible, cost-efficient production, we present the state-of-the-art 5G architecture, transformative technologies, and recent design trends, which we also selectively supplement with new results. We also identify several research challenges in these promising design trends that beyond-5G systems must overcome to support the rapidly unfolding transition toward value-centric industrial wireless networks.
Article
Backscatter communication (BackCom) has been emerging as a prospective candidate for tackling lifetime management problems of massively deployed Internet-of-things (IoT) devices, which suffer from battery-related issues, i.e., replacement, charging, and recycling. This passive sensing approach allows a backscatter sensor node (BSN) to transmit information by reflecting the incident signal from a carrier emitter without initiating its own transmission. To multiplex multiple BSNs, power-domain non-orthogonal multiple access (NOMA) is fully exploited in this work. In this paper, we present the design and analysis of a NOMA-enhanced bistatic BackCom system for a battery-less smart communication paradigm. Specifically, we derive closed-form bit error rate (BER) expressions for a cluster of two devices in a bistatic BackCom system employing NOMA with imperfect successive interference cancellation under a Nakagami-m fading channel. The obtained expressions are utilized to evaluate the reflection coefficients of the devices needed for the most favorable system performance, along with a performance comparison with the orthogonal multiple access time-division multiple access (OMA-TDMA) scheme.
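The BER behavior analyzed above can be checked empirically. The Monte Carlo sketch below simulates two-device power-domain NOMA with imperfect SIC (stage-1 decisions are subtracted, so stage-1 errors propagate), using BPSK and Rayleigh fading (the Nakagami-m model with m = 1); the coefficient and SNR values are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
a1, a2 = 0.8, 0.2          # reflection (power) coefficients, a1 + a2 <= 1
snr = 10 ** (20 / 10)      # 20 dB

s1 = rng.choice([-1.0, 1.0], n)
s2 = rng.choice([-1.0, 1.0], n)
# Rayleigh envelope = |CN(0, 1)| (Nakagami-m with m = 1).
h = np.abs(rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
noise = rng.standard_normal(n) / np.sqrt(2 * snr)

y = h * (np.sqrt(a1) * s1 + np.sqrt(a2) * s2) + noise

# Stage 1: decode the stronger device treating the weaker one as interference.
s1_hat = np.sign(y)
ber1 = np.mean(s1_hat != s1)

# Stage 2: SIC - subtract the decoded stronger signal, then decode the weaker
# one. Imperfect SIC: stage-1 errors leave residual interference behind.
s2_hat = np.sign(y - h * np.sqrt(a1) * s1_hat)
ber2 = np.mean(s2_hat != s2)
print(ber1, ber2)
```

Sweeping `a1` in such a simulation is one way to cross-check the favorable reflection coefficients the closed-form expressions identify.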
Article
It is predicted that by 2025, all devices will be connected to the Internet, subsequently causing the number of devices connected with the Internet to rise [...]
Article
This paper applies machine learning to optimize the transmission policy of cognitive radio inspired non-orthogonal multiple access (CR-NOMA) networks, where time-division multiple access (TDMA) is used to schedule multiple primary users and an energy-constrained secondary user is admitted to the primary users' time slots via NOMA. During each time slot, the secondary user performs the two tasks: data transmission and energy harvesting based on the signals received from the primary users. The goal of the paper is to maximize the secondary user's long-term throughput, by optimizing its transmit power and the time-sharing coefficient for its two tasks. The long-term throughput maximization problem is challenging due to the need for making decisions that yield long-term gains but might result in short-term losses. For example, when in a given time slot, a primary user with large channel gains transmits, intuition suggests that the secondary user should not carry out data transmission due to the strong interference from the primary user but perform energy harvesting only, which results in zero data rate for this time slot but yields potential long-term benefits. In this paper, a deep reinforcement learning approach is applied to emulate this intuition, where the deep deterministic policy gradient (DDPG) algorithm is employed together with convex optimization. Our simulation results demonstrate that the proposed deep reinforcement learning assisted NOMA transmission scheme can yield significant performance gains over two benchmark schemes.
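The DDPG machinery the paper relies on reduces to three moves per step: a TD regression for the critic, a deterministic policy gradient for the actor, and Polyak-averaged target networks. The skeleton below shows one such update on random stand-in transitions (dimensions and hyperparameters are illustrative, and no CR-NOMA environment is modeled):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
obs_dim, act_dim, tau, gamma = 4, 2, 0.005, 0.99

def mlp(inp, out, act=None):
    layers = [nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, out)]
    if act:
        layers.append(act)
    return nn.Sequential(*layers)

actor, critic = mlp(obs_dim, act_dim, nn.Tanh()), mlp(obs_dim + act_dim, 1)
actor_t, critic_t = mlp(obs_dim, act_dim, nn.Tanh()), mlp(obs_dim + act_dim, 1)
actor_t.load_state_dict(actor.state_dict())
critic_t.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

# A random mini-batch standing in for replay-buffer transitions (s, a, r, s').
s, a = torch.randn(32, obs_dim), torch.rand(32, act_dim)
r, s2 = torch.randn(32, 1), torch.randn(32, obs_dim)

# Critic: regress Q(s, a) toward the bootstrapped target r + gamma*Q'(s', mu'(s')).
with torch.no_grad():
    y = r + gamma * critic_t(torch.cat([s2, actor_t(s2)], 1))
critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], 1)), y)
opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

# Actor: deterministic policy gradient, ascend Q(s, mu(s)).
actor_loss = -critic(torch.cat([s, actor(s)], 1)).mean()
opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

# Soft (Polyak) update of the target networks.
with torch.no_grad():
    for net, net_t in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)

print(float(critic_loss), float(actor_loss))
```

In the paper's setting the action would be the time-sharing coefficient (with the transmit power recovered by the embedded convex optimization), and the reward the secondary user's per-slot rate.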
Article
This work develops a novel framework for energy-efficient power control in wireless networks. The proposed method is a new branch-and-bound procedure based on problem-specific bounds for energy-efficiency maximization that allow for faster convergence. This enables finding the global solution of all of the most common energy-efficient power control problems with a complexity that, although still exponential in the number of variables, is much lower than that of other available global optimization frameworks. Moreover, the reduced complexity of the proposed framework allows its practical implementation through the use of deep neural networks. Specifically, thanks to its reduced complexity, the proposed method can be used to train an artificial neural network to predict the optimal resource allocation. This is in contrast with other power control methods based on deep learning, which train the neural network on suboptimal power allocations, owing to the prohibitive complexity of generating large training sets of optimal power allocations with available global optimization methods. As a benchmark, we also develop a novel first-order optimal power allocation algorithm. Numerical results show that a neural network can be trained to predict the optimal power allocation policy.
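For intuition on what "energy-efficiency maximization" means as a fractional program, the sketch below solves the single-link case, maximizing rate/(circuit power + transmit power), via Dinkelbach's algorithm, a standard alternative to the branch-and-bound procedure the paper develops (which is needed for the harder multi-link, interference-coupled case). All constants are illustrative:

```python
import numpy as np

g, n0, p_c, p_max = 0.9, 1.0, 0.5, 2.0   # channel gain, noise, circuit power, power cap
gamma = g / n0                           # effective SNR per unit power

def rate(p):                             # spectral efficiency, bit/s/Hz
    return np.log2(1 + gamma * p)

lam = 0.0                                # current energy-efficiency estimate
for _ in range(30):
    # Inner problem max_p rate(p) - lam*(p_c + p) has a closed-form
    # stationary point from d/dp: gamma/((1 + gamma*p) ln 2) = lam.
    p = p_max if lam == 0 else np.clip(1 / (lam * np.log(2)) - 1 / gamma, 0, p_max)
    res = rate(p) - lam * (p_c + p)      # Dinkelbach residual, -> 0 at the optimum
    lam = rate(p) / (p_c + p)
    if abs(res) < 1e-9:
        break

print(f"optimal power {p:.4f}, energy efficiency {lam:.4f}")
```

Each iteration turns the fractional objective into a subtractive one; the fixed point of `lam` is the globally optimal energy efficiency for this concave-over-affine ratio.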
Article
An energy management and power allocation policy is considered for energy harvesting (EH) communications. In this letter, we propose a joint optimization problem over the continuous EH time and transmit power to maximize the long-term throughput based on the deep deterministic policy gradient (DDPG). However, the joint optimization problem leads to a large continuous action space. In order to reduce the dimension of the action space, we present a deep reinforcement learning (DRL) framework combining DDPG and convex programming. The original problem is decomposed into two-layer optimization subproblems using the primal decomposition method. The upper-layer problem can be solved by DDPG with a low-dimensional action space, while the lower-layer subproblem can be solved using an existing convex toolbox. Numerical simulation results show that, compared with existing energy management and power allocation policies for EH communications, the proposed DRL framework achieves higher long-term throughput.
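The flavor of this primal decomposition can be shown on a deliberately simplified single-slot model (all quantities illustrative, not the letter's system model): for a fixed harvesting fraction tau, the inner power subproblem is solved in closed form, leaving a one-dimensional outer problem over tau, which is the low-dimensional decision the letter hands to the DDPG agent:

```python
import numpy as np

h_gain, sigma2, e_h, batt = 1.2, 1.0, 0.8, 0.3  # channel gain, noise, EH rate, stored energy

def throughput(tau):
    """Inner subproblem in closed form: spend the whole energy budget.

    For a fixed harvesting fraction tau, the rate is increasing in transmit
    power, so the optimal power uses all energy available in the slot
    (energy-causality constraint tight).
    """
    if tau >= 1.0:
        return 0.0
    p = (batt + tau * e_h) / (1.0 - tau)
    return (1.0 - tau) * np.log2(1.0 + p * h_gain / sigma2)

# Outer (upper-layer) problem: one-dimensional search over the time-sharing
# coefficient tau -- here a grid search stands in for the learned policy.
taus = np.linspace(0.0, 0.99, 1000)
rates = np.array([throughput(t) for t in taus])
best = taus[rates.argmax()]
print(f"best tau = {best:.3f}, throughput = {rates.max():.4f} bit/s/Hz")
```

The decomposition keeps the learned action space one-dimensional while the convex inner layer absorbs the power variable, which is exactly why it shrinks DDPG's continuous action space.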