Actor-Critic Deep Reinforcement Learning for
Energy Minimization in UAV-Aided Networks
Yaxiong Yuan, Lei Lei, Thang X. Vu, Symeon Chatzinotas, and Björn Ottersten
Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg
Emails: {yaxiong.yuan; lei.lei; thang.vu; symeon.chatzinotas; bjorn.ottersten}@uni.lu
Abstract—In this paper, we investigate a user-timeslot scheduling problem for downlink unmanned aerial vehicle (UAV)-aided networks, where the UAV serves as an aerial base station. We formulate an optimization problem that jointly determines user scheduling and hovering time to minimize the UAV's transmission and hovering energy. An offline algorithm is proposed to solve the problem based on the branch and bound method and the golden section search. However, executing the offline algorithm suffers from exponential growth of the computational time. Therefore, we apply a deep reinforcement learning (DRL) method to design an online algorithm with less computational time. To this end, we first reformulate the original user scheduling problem as a Markov decision process (MDP). Then, an actor-critic-based RL algorithm is developed to determine the scheduling policy under the guidance of two deep neural networks. Numerical results show that the proposed online algorithm achieves a good tradeoff between performance gain and computational time.
Index Terms—UAV-aided networks, deep reinforcement learning, actor-critic, user scheduling, energy minimization.
I. INTRODUCTION
Unmanned aerial vehicles (UAVs) are widely used in many areas. Two promising features of UAVs are flexibility and mobility, which allow them to be applied in dynamic, distributed, or plug-and-play environments, e.g., disaster rescue and live concerts [1]. Since UAVs are more likely to experience line-of-sight (LoS) connections with ground users, they are favorable for reliable communications [2]. Owing to these benefits, applications of UAV-aided wireless networks have been emerging. We consider a UAV-aided downlink scenario in which the UAV serves as an aerial base station (BS) to deliver data to ground users when some of the terrestrial BSs have been destroyed after a disaster. The design of an energy-efficient UAV network is necessary, as the battery storage of the UAV is limited.
The energy consumption of the UAV mainly comes from the propulsion energy, i.e., the energy used for flying and hovering, and from communication. A proper user-timeslot scheduling scheme is effective in achieving energy conservation for UAV systems [3]-[5]. In [3], the authors introduced an energy model and proposed a user scheduling method to minimize the energy consumption of the UAV. In [4] and [5], the authors studied an energy-efficiency maximization problem via joint UAV-to-user scheduling, power control, and trajectory design. The above papers focus on single-antenna UAV networks where the users are served in time division multiple access (TDMA) mode. Equipped with multiple antennas, the UAV can transmit data to multiple users simultaneously to improve the network capacity. This is known as space division multiple access (SDMA). In [6], the authors proposed a semiorthogonal user selection (SUS) algorithm for terrestrial multiple-input multiple-output (MIMO) systems. However, the SDMA-based user scheduling problem for UAV networks is more difficult than its TDMA counterpart due to the combinatorial explosion of the user groups. Moreover, the diversified sources of the UAV's energy consumption make the problem more complicated.
Deep reinforcement learning (DRL) combines artificial neural networks with a reinforcement learning architecture and can provide efficient algorithms. Since DRL makes decisions based on the current environment state, it is well suited to dynamic systems, e.g., with UAV movement, time-varying channels, and new user arrivals. In [7], the authors proposed a user association algorithm based on a deep Q-network, where nonlinear deep neural networks are used to approximate the value function. In [8], the authors proposed an echo state network-based DRL algorithm for joint path selection, UAV-BS association, and power control. In [9], an actor-critic-based deep deterministic policy gradient algorithm was applied to the problem of flying direction and distance selection. Actor-critic learning can acquire a stochastic policy that handles a very large or continuous action space. In our problem, the combinatorial nature of the scheduling results in a huge action space. Therefore, we employ actor-critic-based DRL to develop an online user scheduling algorithm.
The following lists our major contributions:
• For energy minimization, we formulate a combinatorial optimization problem and propose an offline algorithm to solve it.
• Implementing the offline algorithm is impractical since it suffers from long computational time. To overcome this difficulty, we reformulate the original optimization problem as an MDP and design an actor-critic-based DRL algorithm.
• Simulation results demonstrate that the proposed DRL algorithm strikes a good tradeoff between the performance gain and computational time.
II. SYSTEM MODEL
In the UAV-aided downlink system, the UAV serves as the
BS to deliver data to the ground users. The service area is
2020 European Conference on Networks and Communications (EuCNC): Wireless, Optical and Satellite Networks (WOS)
978-1-7281-4355-2/20/$31.00 ©2020 IEEE 348
Authorized licensed use limited to: University of Luxembourg. Downloaded on October 15,2020 at 11:50:00 UTC from IEEE Xplore. Restrictions apply.
divided into N clusters due to the limited service range of the UAV. Before the service starts, the UAV has to be fully charged and prepared at a dock station. Then, the UAV flies through all the clusters successively at a fixed altitude and transmits data to the ground users. We denote K_n and q_{k,n} (in bits) as the number of users and the k-th user's demand in the n-th cluster, respectively. When all the demands in the current cluster are satisfied, the UAV leaves for the next cluster. After the service, the UAV flies back to the dock station and prepares for the next round. New users may arrive during the service; their requests will be processed in the next round. Fig. 1 illustrates an example of the system model.
Fig. 1. A UAV network with N = 3 clusters.
The delivery process evolves over a sequence of frames whose structure is standardized. Since the data collected by the UAV have a certain life span, in each round all the tasks must be completed within a limited time T_L (in frames). We assume that a frame lasts T_F (in seconds) and consists of I timeslots; thus, each timeslot lasts T_I = T_F/I (in seconds). Under the assumption of SDMA, the UAV can schedule more than one user in each timeslot. As shown in Fig. 2, a shaded block indicates that the user is scheduled. We define the scheduled users as a user group; if no user is scheduled, the group is the empty set. The number of users in each cluster is up to K, so the maximum number of candidate groups is G = 2^K, which increases exponentially with K.
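For illustration, the candidate groups of a cluster can be enumerated as subsets of its users; the short sketch below (not from the paper) makes the exponential growth G = 2^K concrete:

```python
from itertools import combinations

def candidate_groups(K):
    """Enumerate all candidate user groups of a K-user cluster,
    including the empty group, as tuples of user indices."""
    groups = []
    for size in range(K + 1):
        groups.extend(combinations(range(K), size))
    return groups

# The number of candidate groups is G = 2^K.
print(len(candidate_groups(10)))  # prints 1024
```

Enumerating all 2^K groups per timeslot is exactly what makes exhaustive scheduling intractable as K grows.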
Fig. 2. An illustration of the structure of the frame.
For the UAV-to-ground communication links, we consider a quasi-static Rician fading channel, as it comprises both a deterministic LoS component and a random multipath component [10]. The channel states are fixed within a transmission frame. We denote h_{k,n} = g_{k,n} 10^{−ξ_{k,n}/10} as the channel vector between the UAV and ground user k in cluster n, where ξ_{k,n} is the path loss. We assume L is the number of antennas of the UAV, while the ground users are equipped with a single antenna. g_{k,n} = [g_{k,n,1}, ..., g_{k,n,L}] is the Rician fading vector, whose elements are independent of each other. We collect the channel vectors of cluster n into a matrix H_n ∈ C^{K_n×L}.
Towards eliminating multi-user interference within a group, minimum mean square error (MMSE) precoding is applied [11]. The precoding vector for user k is calculated by w_{k,n} = √p · ĥ_{k,n}, where ĥ_{k,n} is the column corresponding to user k in H_n^H (σ²I + H_n H_n^H)^{−1}. We normalize the precoder ĥ_{k,n} and assume the power allocation p is the same for all users; σ² refers to the noise power. Denote β_n^{(kj)} = |h_{k,n}^H ĥ_{j,n}|² as the channel coefficient after precoding. The signal-to-interference-plus-noise ratio (SINR) for user k is given by:

SINR_{k,g,n} = β_n^{(kk)} p / (Σ_{j∈K_g\{k}} β_n^{(kj)} p + σ²), k ∈ K_g, g ∈ G_n, (1)
where K_g is the set of users in group g and G_n is the set of candidate groups in cluster n. Let B be the system bandwidth. The data transmitted in each timeslot can be expressed by the Shannon equation:

d_{k,g,n} = T_I B log₂(1 + SINR_{k,g,n}), k ∈ K_g, g ∈ G_n. (2)

The communication energy of group g is given by:

e_{g,n}^{(c)} = T_I Σ_{k∈K_g} β_n^{(kk)} p, g ∈ G_n. (3)

To facilitate the following calculation, we collect d_{k,g,n} into a data matrix for each cluster, D_n = {d_{k,g,n}}_{K×G}. We set the element d_{k,g,n} to 0 if k ∉ K_g or g ∉ G_n. e_n^{(c)} = [e_{1,n}^{(c)}, ..., e_{G,n}^{(c)}] is denoted as the vector of communication energy for all the groups in cluster n.
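As a numerical illustration of Eqs. (1)–(3), the sketch below (hypothetical code with a randomly drawn channel matrix, not the authors' implementation) computes the MMSE precoders, the post-precoding coefficients β_n^{(kj)}, and the per-user data and group energy:

```python
import numpy as np

def mmse_precoders(H, sigma2):
    """Normalized columns of H^H (sigma^2 I + H H^H)^(-1), one per user.
    H has shape (K_n, L): one row per single-antenna user."""
    Kn = H.shape[0]
    W = H.conj().T @ np.linalg.inv(sigma2 * np.eye(Kn) + H @ H.conj().T)
    return W / np.linalg.norm(W, axis=0, keepdims=True)

def group_rate_and_energy(H, group, p, sigma2, T_I, B):
    """Per-user data d_{k,g,n} (Eq. (2)) and group energy e_{g,n}^(c) (Eq. (3))
    for the scheduled user group `group` (a list of user indices)."""
    W = mmse_precoders(H, sigma2)
    beta = np.abs(H.conj() @ W) ** 2        # beta[k, j] = |h_k^H hhat_j|^2
    data, energy = {}, 0.0
    for k in group:
        interference = sum(beta[k, j] * p for j in group if j != k)
        sinr = beta[k, k] * p / (interference + sigma2)  # Eq. (1)
        data[k] = T_I * B * np.log2(1 + sinr)            # Eq. (2)
        energy += T_I * beta[k, k] * p                   # Eq. (3)
    return data, energy
```

Serving a user alone removes the interference term in the SINR denominator, so its per-timeslot data can only increase; the group energy, however, sums over all scheduled users.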
To analyze the propulsion energy, we employ a UAV energy model [3]. The flying power P(V) is given by:

P(V) = P_0 (1 + 3V²/U_tip²) + P_1 ((1 + V⁴/(4v_0⁴))^{1/2} − V²/(2v_0²))^{1/2} + (1/2) d_0 ρ s A V³, (4)

where V is the flying speed, while P_0 and P_1 refer to the blade profile power and induced power in hovering status, respectively. U_tip is the tip speed of the rotor blade. v_0 denotes the mean rotor induced velocity. d_0 and s are the fuselage drag ratio and rotor solidity, respectively. ρ and A denote the air density and rotor disc area, respectively. By substituting V = 0, the hovering power is given by P^(h) = P(0) = P_0 + P_1. We assume the UAV travels between clusters with a constant speed V_mr along a predetermined flying path (total flying distance S). V_mr refers to the maximum-range speed, which maximizes the total traveling distance for any given battery storage [3]. Thus, the flying power is fixed at P^(f) = P(V_mr) and the flying energy can be calculated by E^(f) = S P^(f) / V_mr.
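Eq. (4) and the derived quantities can be sketched as follows; the rotor parameters are illustrative values in the spirit of the model in [3], not taken from this paper:

```python
import math

# Illustrative rotary-wing parameters (assumed, not from this paper).
P0, P1 = 79.86, 88.63      # blade profile / induced power in hover (W)
U_tip, v0 = 120.0, 4.03    # rotor-blade tip speed and mean induced velocity (m/s)
d0, rho, s, A = 0.6, 1.225, 0.05, 0.503  # drag ratio, air density, solidity, disc area

def flying_power(V):
    """Propulsion power P(V) of Eq. (4) at flying speed V (m/s)."""
    blade = P0 * (1 + 3 * V**2 / U_tip**2)
    induced = P1 * math.sqrt(math.sqrt(1 + V**4 / (4 * v0**4)) - V**2 / (2 * v0**2))
    parasite = 0.5 * d0 * rho * s * A * V**3
    return blade + induced + parasite

def flying_energy(S, V_mr):
    """E^(f) = S P^(f) / V_mr for a path of length S flown at speed V_mr."""
    return S * flying_power(V_mr) / V_mr

hover_power = flying_power(0.0)   # P^(h) = P(0) = P0 + P1
```

Note that P(0) recovers the hovering power P_0 + P_1, since the induced-power factor equals 1 and the parasite term vanishes at V = 0.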
III. PROBLEM FORMULATION AND OFFLINE APPROACH
A. Problem Formulation
The user scheduling scheme varies between frames. We denote Λ_n[t] = {λ_{g,i,n}}_{G×I} as the scheduling matrix on frame t, where the element λ_{g,i,n} ∈ {0,1} indicates whether the g-th group is selected at the i-th timeslot. ν_n[t] ∈ {0,1} refers to the hovering indicator; ν_n[t] = 1 means the UAV is hovering above cluster n at frame t. The variable ν_n[t] has the following constraints:

ν_n[t+1] + ν_{n+1}[t+1] = 1, if ν_n[t] = 1, ∀t, (5)
Σ_{n=1}^{N} ν_n[t] ≤ 1, ∀t. (6)

Eq. (5) shows that, at the end of each frame, the UAV must either stay at the current cluster or fly to the next cluster, but cannot fly back to a previous cluster. Eq. (6) means the UAV can serve at most one cluster at each frame. The energy minimization problem is formulated as P1:

P1: min_{Λ_n[t], ν_n[t]} E^(f) + E^(c) + E^(h) (7)
s.t. d_n ⪰ q_n, ∀n, (7a)
 Λ_n[t]^T 1 ⪯ 1, ∀n, t, (7b)
 Λ_n[t] ∈ B^{G×I}, ν_n[t] ∈ {0,1}, ∀n, t, (7c)
 (5), (6).
• E^(c) is the UAV's communication energy, which is given by Σ_{t=1}^{T_L} Σ_{n=1}^{N} ν_n[t] e_n^{(c)}[t] Λ_n[t] 1.
• E^(h) is the UAV's hovering energy, which is given by Σ_{t=1}^{T_L} (Σ_{n=1}^{N} ν_n[t]) T_F P^(h).
• d_n is the received data for all the users in the n-th cluster, which is given by Σ_{t=1}^{T_L} ν_n[t] D_n[t] Λ_n[t] 1.
• q_n is the demand vector of the n-th cluster, consisting of the users' requests q_{k,n}.
Constraints (7a) mean that all the users' requests have to be satisfied. Constraints (7b) indicate that no more than one group can be scheduled per timeslot. Constraints (7c) confine both Λ_n[t] and ν_n[t] to binary variables. P1 has several characteristics that make it difficult to solve:
• P1 is a combinatorial optimization problem whose goal is to find an optimal user group from a finite set. The size of the set increases exponentially with the user scale.
• The two variables Λ_n[t] and ν_n[t] are coupled. A change of ν_n[t] affects the scheduling policy.
• Constraints (5) and (6) make the decisions of ν_n[t] time-correlated. The hovering time allocation of the current cluster affects the decisions of the next cluster.
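Under assumed array shapes (hypothetical code, not the authors'), the objective terms of P1 can be evaluated directly from given decisions ν_n[t] and Λ_n[t]:

```python
import numpy as np

def objective_terms(nu, Lam, e_c, D, T_F, P_h):
    """Evaluate the terms of P1 for given decisions.
    nu:  (N, T_L) hovering indicators nu_n[t]
    Lam: (N, T_L, G, I) scheduling matrices Lambda_n[t]
    e_c: (N, T_L, G) per-group communication energies e_n^(c)[t]
    D:   (N, T_L, K, G) per-user data matrices D_n[t]
    Returns (E_c, E_h, d) with d of shape (N, K)."""
    N, T_L = nu.shape
    ones_I = np.ones(Lam.shape[-1])
    # E^(c) = sum_t sum_n nu_n[t] e_n^(c)[t] Lambda_n[t] 1
    E_c = sum(nu[n, t] * e_c[n, t] @ Lam[n, t] @ ones_I
              for n in range(N) for t in range(T_L))
    # E^(h) = sum_t (sum_n nu_n[t]) T_F P^(h)
    E_h = nu.sum() * T_F * P_h
    # d_n = sum_t nu_n[t] D_n[t] Lambda_n[t] 1
    d = np.stack([sum(nu[n, t] * D[n, t] @ Lam[n, t] @ ones_I for t in range(T_L))
                  for n in range(N)])
    return E_c, E_h, d
```

The demand constraint (7a) then amounts to checking d ⪰ q element-wise.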
B. Offline Approach
To make P1more tractable, we first investigate the schedul-
ing policy under the fixed hovering indicator νn[t]. Thus, the
constraints (5) and (6) can be removed. We denote tn=
PTL
t=1 νn[t]as the hovering time for each cluster n. When
νn[t]is fixed, the scheduling policies are independent between
clusters. P1can be divided into Nsub-problems P1(n)to find
the optimal Λn[t]. For concise expression, we replace νv[t]
with tnand delete the constant E(f)from the objective.
P1(n) : min
Λn[t]E(c)
n+E(h)
n(8)
s.t. (7a),(7b),(7c),
where E(c)
n=Pτn+tn
t=τn+1 e(c)
n[t]Λn[t]1and E(h)
n=TFP(h)tn.
τnrefers to the consumed time before the UAV arriving
cluster n, which can be calculated by τn=PTL
t=1 PN1
n=1 νn[t].
We can observe that P1(n)is a binary linear programming
problem that can be solved optimally by branch and bound
(B&B) [12].
Determining optimal hovering time tnis a non-trivial task
since tnand Λn[t]are coupled. Brute force search is the most
direct approach but not efficient. In our study, an efficient
search method, golden section search (GSS), is employed to
provide sub-optimal solutions [14]. The offline algorithm is
summarized in Alg. 1. In the outer loop (line 4, Alg. 1), GSS
is used to determine the hovering time tnfor each cluster. In
the inner loop (line 5, Alg. 1), we apply B&B to find the user
scheduling scheme.
Algorithm 1 Offline Algorithm
Inputs: Users' demands q_1, ..., q_N, maximal time limitation T_L, channel coefficients {β_1^{(kj)}[t]}, ..., {β_N^{(kj)}[t]}.
Outputs: Hovering time t_n*, user scheduling Λ_n*[t].
1: for n = 1 : N do:
2:  ε = 0.618, a_1 = 0, b_1 = T_L, i = 1,
3:  u_1 = ⌈b_1 − ε(b_1 − a_1)⌉, v_1 = ⌈a_1 + ε(b_1 − a_1)⌉.
4:  while |b_i − a_i| ≠ 1 do:  ▷ golden section search
5:   Solve P1(n) with t_n^1 = u_i and t_n^2 = v_i by B&B.
6:   Obtain the user scheduling Λ_n^1[t] and Λ_n^2[t].
7:   Obtain the objective energies E_n^1 and E_n^2.
8:   if E_n^1 < E_n^2 then:
9:    a_{i+1} = a_i, b_{i+1} = v_i, v_{i+1} = u_i, u_{i+1} = ⌈b_{i+1} − ε(b_{i+1} − a_{i+1})⌉.
10:   else:
11:    a_{i+1} = u_i, b_{i+1} = b_i, u_{i+1} = v_i, v_{i+1} = ⌈a_{i+1} + ε(b_{i+1} − a_{i+1})⌉.
12:   end if
13:   i = i + 1.
14:  end while
15:  t_n* = t_n^2, Λ_n*[t] = Λ_n^2[t].
16: end for
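A minimal sketch of the GSS outer loop of Alg. 1, with the B&B inner solver replaced by a stand-in `energy` function so the code is self-contained (the placeholder and the test function are assumptions, not the paper's solver):

```python
import math

def golden_section_search(energy, a, b):
    """Integer golden-section search for the hovering time t_n on [a, b],
    mirroring lines 2-14 of Alg. 1; `energy(t)` stands in for solving
    P1(n) by B&B with the hovering time fixed to t."""
    eps = 0.618
    u = math.ceil(b - eps * (b - a))
    v = math.ceil(a + eps * (b - a))
    while abs(b - a) != 1:
        if energy(u) < energy(v):      # keep the lower sub-interval [a, v]
            b, v = v, u
            u = math.ceil(b - eps * (b - a))
        else:                          # keep the upper sub-interval [u, b]
            a, u = u, v
            v = math.ceil(a + eps * (b - a))
    return v                           # t_n* = t_n^2, as in line 15 of Alg. 1
```

On a unimodal toy energy such as (t − 6)² + 5 the search returns an integer hovering time close to the minimizer; as the text notes, GSS over integers is sub-optimal rather than exact.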
However, the offline approach still has limitations in practical usage. The computational time of the inner loop increases exponentially with the user scale: in the worst case, the time complexity of B&B is O((2^{2^K})^{I·t_n}). Besides, when a new round starts, the problem needs to be recalculated due to new users' arrivals and channel condition changes. Therefore, we propose a deep reinforcement learning algorithm that enables the UAV to make decisions intelligently, overcoming these difficulties.
IV. ACTOR-CRITIC-BASED ONLINE ALGORITHM
Actor-critic is a reinforcement learning framework that splits the model into two components to leverage the strengths of both value-based and policy-based methods [13]. For the actor part, the policy π can be described as an action probability distribution π(a|s), where a ∈ A and s ∈ S are the current action and state, respectively. We usually call π(a|s) the stochastic policy, which can handle a huge or continuous action space. The objective of reinforcement learning is to find a policy π that maximizes

J = E_π[Q^π(s, a)] = ∫_S p^π(s) ∫_A π(a|s) Q^π(s, a) da ds, (9)

where p^π(s) is the state distribution and Q^π(s, a) is the action-value (or Q-value) function under policy π. To predict the action distribution, a parameterized approximator π_θ(a|s) is built. Assuming π_θ(a|s) is differentiable with regard to θ, the gradient of J can be expressed by:

∇_θ J(θ) = ∫_S p^{π_θ}(s) ∫_A ∇_θ π_θ(a|s) Q^{π_θ}(s, a) da ds
 = E_{π_θ}[∇_θ log(π_θ(a|s)) Q^{π_θ}(s, a)]. (10)

Based on gradient ascent, the parameter θ is updated by:

θ′ = θ + α_a ∇_θ J(θ), (11)

where α_a is the learning rate for the actor.
The critic is responsible for estimating Q^{π_θ}(s, a). Like the actor, the critic also has a parameterized approximator Q_w(s, a). Temporal difference (TD) learning can be applied in the critic to enhance the learning efficiency [13]. The objective of the critic is to minimize the squared TD error

L(w) = [(r + Q_w(s′, a′)) − Q_w(s, a)]², (12)

where a′ and s′ are the next action and state, respectively. The update rule for the parameter w can be derived as:

w′ = w − α_c ∇_w L(w). (13)

In our study, we use two fully connected deep neural networks (DNNs) as the approximators for the actor and the critic. Thus, θ and w represent the weights of the neural networks.
To apply the DRL algorithm, we reformulate the optimization problem P1 as an MDP.
1) System States: The system state s_t is jointly determined by the channel state, the remaining requests, and the cluster indicator. To characterize the fading nature of the wireless channel, the time-varying channel can be modeled as a finite-state Markov channel (FSMC) [7]. In our study, we model the channel coefficient β_n^{(kj)}[t] as a Markov random variable. The remaining requests are the difference between the required data and the delivered data:

b_n[t] = q_n − Σ_{t′=1}^{t} d_n[t′]. (14)

We denote c[t] as the cluster indicator that shows which cluster the UAV is serving on frame t. When all the users' requests in the current cluster are completed, i.e., b_n[t] = 0, the UAV will move to the next cluster, such that c[t+1] = c[t] + 1. The state can be defined as:

s_t = {β_1^{(kj)}[t], ..., β_N^{(kj)}[t], b_1[t], ..., b_N[t], c[t]}. (15)
2) System Actions: The action of the UAV is to choose a group of users to serve. On each frame t, we define the action a_t as:

a_t = {a_{1,t}, ..., a_{I,t}}, a_{i,t} ∈ [0, G], (16)

where a_{i,t} is a continuous value. After selecting an action a_t, we round all the elements to integer values â_{i,t}; â_{i,t} = g means the g-th group is selected at the i-th timeslot. The hovering time of each cluster can be determined by calculating the time difference between the UAV arriving at and leaving the cluster.
3) Rewards Design: The rewards of a DRL algorithm are commonly related to the objective of the problem. Since P1 is an energy minimization problem, the reward function can be designed as:

r_t = 1/e[t], (17)

where e[t] is the energy consumed on frame t. The reward function is monotonically decreasing with regard to the energy, which steers the UAV's decisions towards reducing energy consumption. If the learned policy violates constraint (7a), the agent receives a penalty φ, which is a negative value.
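The reward rule of Eq. (17), including the constraint-violation penalty φ, amounts to a few lines (a sketch; φ = −100 is the value reported later in Section V):

```python
def reward(energy_joules, demand_violated, phi=-100.0):
    """r_t = 1 / e[t] (Eq. (17)); the negative penalty phi is returned
    when the learned policy violates the demand constraint (7a)."""
    if demand_violated:
        return phi
    return 1.0 / energy_joules
```

Since 1/e[t] is decreasing in e[t], lower-energy frames earn strictly higher rewards.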
Based on the above definitions, we propose an actor-critic-based DRL algorithm in Alg. 2. When a new user arrives, the DRL algorithm only needs to update the state information at the beginning of the next round, without restarting training.
Algorithm 2 Actor-Critic-Based DRL Algorithm
Inputs: The current state s_t.
Outputs: The current action a_t.
1: Initialize the parameters θ_1 and w_1.
2: for each learning round do:
3:  for t = 1 : T_L do:
4:   Approximate a distribution π(a_t|s_t, θ_t) by the actor.
5:   Randomly choose a_t according to π.
6:   Round a_t to â_t and observe r_t, s_{t+1}.
7:   Pass r_t and s_{t+1} to the critic.
8:   Approximate Q(a_t, s_t|w_t), Q(a_{t+1}, s_{t+1}|w_t) by the critic.
9:   Calculate the TD error L(w_t) by Eq. (12).
10:   Collect tuples {a_t, s_t, s_{t+1}, r_t, L(w_t)}.
11:   Update θ_t and w_t by Eq. (11) and Eq. (13).
12:  end for
13: end for
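A toy, self-contained sketch of the update steps behind Alg. 2, using linear approximators and a softmax policy in place of the paper's 5-layer DNNs and Gaussian policy (dimensions, the toy task, and learning rates are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
S_DIM, N_ACTIONS = 4, 3
theta = np.zeros((N_ACTIONS, S_DIM))   # actor weights (stand-in for the actor DNN)
w = np.zeros((N_ACTIONS, S_DIM))       # critic weights: Q_w(s, a) = w[a] @ s
alpha_a, alpha_c = 0.01, 0.05          # learning rates of Eqs. (11) and (13)

def policy(s):
    """Stochastic softmax policy pi_theta(a|s) over the discrete action set."""
    z = theta @ s
    z = z - z.max()                    # numerical stability
    p = np.exp(z)
    return p / p.sum()

def ac_update(s, a, r, s_next, a_next):
    """One actor-critic step: TD error of Eq. (12), then the updates of
    Eqs. (11) and (13) (semi-gradient form for the critic)."""
    global theta, w
    td = (r + w[a_next] @ s_next) - w[a] @ s       # Eq. (12), no discount
    w[a] += alpha_c * td * s                       # critic: shrink the TD error
    p = policy(s)
    grad_log = -np.outer(p, s)                     # grad_theta log pi_theta(a|s)
    grad_log[a] += s
    theta += alpha_a * (w[a] @ s) * grad_log       # actor: ascend grad log pi * Q

# Toy task: action 0 always earns reward 1, the other actions earn 0.
s = np.ones(S_DIM)
for _ in range(2000):
    a = rng.choice(N_ACTIONS, p=policy(s))
    r = 1.0 if a == 0 else 0.0
    a_next = rng.choice(N_ACTIONS, p=policy(s))
    ac_update(s, a, r, s, a_next)
```

On this toy task the critic learns a higher Q-value for the rewarded action, which the actor then amplifies through the log-likelihood gradient of Eq. (10).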
V. SIMULATION RESULTS
In this section, we compare the proposed actor-critic-based DRL algorithm with the semiorthogonal user selection (SUS) algorithm in [6] and the offline approach. The UAV is equipped with L = 10 antennas and serves N = 3 clusters. The users are randomly scattered in the service area, and their arrivals and departures follow a Poisson distribution. In each round, the users' demands q_{k,n} are randomly selected from {1, 1.5, 2, 3, 4.45, 5} (Mbit). Each cluster holds a maximum of K = 10 users, such that the number of candidate groups is G = 2^10 = 1024. We assume the bandwidth B = 10 MHz, noise power σ² = 0.1 mW, transmission power p = 3 W, and hovering power P^(h) = 10 W. Based on the FSMC model, we quantize the channel coefficient β_n^{(kj)} into 10 levels, B = {0, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4}, and apply the transition probability matrix referring to [7]. For the
Fig. 3. Energy vs. T_L (offline algorithm, actor-critic DRL algorithm, and SUS algorithm in [6]).
Fig. 4. Energy vs. K (same three schemes).
Fig. 5. Computational time vs. K (same three schemes).
Fig. 6. Outage ratio vs. T_L (same three schemes).
DRL algorithm, we set the learning rates α_a = α_c = 0.001 and the penalty φ = −100. We use two 5-layer (300 nodes per layer) DNNs to build the approximators. The stochastic policy π(a|s) follows a Gaussian distribution.
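The FSMC channel evolution used in the simulations can be mimicked as below; the tridiagonal transition matrix is a made-up example, not the matrix taken from [7]:

```python
import numpy as np

# The quantized channel-coefficient levels listed in Section V.
LEVELS = [0.0, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4]

def make_transition_matrix(n, stay=0.6):
    """Tridiagonal FSMC transition matrix: keep the current level with
    probability `stay`, otherwise move to an adjacent level (illustrative
    numbers; the paper applies the matrix from [7])."""
    P = np.zeros((n, n))
    for i in range(n):
        P[i, i] = stay
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in neighbors:
            P[i, j] = (1 - stay) / len(neighbors)
    return P

def sample_fsmc(P, steps, start=0, seed=0):
    """Sample a trajectory of channel coefficients beta[t] from the chain."""
    rng = np.random.default_rng(seed)
    state, path = start, []
    for _ in range(steps):
        state = rng.choice(len(LEVELS), p=P[state])
        path.append(LEVELS[state])
    return path
```

Each sampled trajectory stays on the quantized levels and moves at most one level per frame, which is what makes β_n^{(kj)}[t] a Markov random variable in the state of Eq. (15).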
Fig. 3 compares the energy consumption of the three schemes under different time limitations T_L. We can observe that the energy first drops to a minimum point and then increases steadily. This is because, when T_L is not sufficient, users have to share timeslots, leading to inter-user interference and degraded average transmission rates. In this case, the UAV needs to consume more transmission energy to satisfy the users' requests. Thus, the communication energy undergoes a sharp decrease and then becomes stable once T_L is large enough. On the other hand, the hovering energy grows linearly with T_L. Among the three algorithms, the offline approach has the best performance in energy conservation, followed by the proposed DRL algorithm. Fig. 4 shows the energy consumption for different K. We can observe that a larger user scale leads to more energy consumption. The offline algorithm consumes the least energy compared to the others, and the proposed DRL algorithm saves more than 25% of the energy consumed by SUS.
Fig. 5 compares the computational time for different K. The computational time refers to the average time needed to generate a scheduling solution for each frame. It can be seen that the proposed DRL algorithm is more time-efficient: the computational time of the offline algorithm grows exponentially, while that of the DRL algorithm and SUS increases linearly. Since the DRL algorithm has an extra learning process, it takes more time than the non-learning SUS.
If a user's request is not completed within T_L, the service of the user will be interrupted. We define the outage ratio as the percentage of interrupted users among the total number of users. Fig. 6 shows the outage ratio with regard to T_L. We can observe that, if T_L is sufficient, almost all the users' requests can be completed without interruption. When T_L is small, the proposed DRL algorithm has a lower outage ratio than SUS.
VI. CONCLUSION
In this paper, we have investigated an energy-efficient user scheduling problem in UAV-aided communication systems. We first proposed an offline algorithm that provides user scheduling solutions to minimize the energy consumption of the UAV. To reduce the computational time, an actor-critic-based DRL algorithm was developed by transforming the problem into an MDP. Numerical results show that the proposed DRL algorithm achieves good performance in energy conservation with less computational time.
ACKNOWLEDGMENT
The work has been supported by the ERC project
AGNOSTIC (742648), by the FNR CORE projects RO-
SETTA (11632107), ProCAST (C17/IS/11691338) and 5G-
Sky (C19/IS/13713801), and by the FNR bilateral project
LARGOS (12173206).
REFERENCES
[1] Y. Zeng, R. Zhang and T. J. Lim, “Wireless Communications with Unmanned Aerial Vehicles: Opportunities and Challenges,” in IEEE Communications Magazine, vol. 52, no. 5, pp. 36–42, May 2016.
[2] M. Mozaffari, W. Saad, M. Bennis et al., “A Tutorial on UAVs for
Wireless Networks: Applications, Challenges, and Open Problems,” in
IEEE Communications Surveys & Tutorials, Mar. 2019.
[3] Y. Zeng, J. Xu and R. Zhang, “Energy Minimization for Wireless Com-
munication with Rotary-Wing UAV,” in IEEE Transactions on Wireless
Communications, vol. 18, no. 4, pp. 2329–2345, Mar. 2019.
[4] Y. Cai, Z. Wei, R. Li et al., “Energy-Efficient Resource Allocation for
Secure UAV Communication Systems,” in 2019 IEEE Wireless Commu-
nications and Networking Conference (WCNC), Apr. 2019.
[5] S. Ahmed, M. Z. Chowdhury and Y. M. Jang, “Energy-Efficient UAV-to-User Scheduling to Maximize Throughput in Wireless Networks,” in IEEE Access, vol. 8, pp. 21215–21225, Jan. 2020.
[6] T. Yoo and A. Goldsmith, “On the Optimality of Multiantenna Broadcast Scheduling Using Zero-Forcing Beamforming,” in IEEE Journal on Selected Areas in Communications, vol. 24, no. 3, pp. 528–541, Mar. 2006.
[7] Y. He, Z. Zhang, F. R. Yu et al., “Deep-Reinforcement-Learning-Based Optimization for Cache-Enabled Opportunistic Interference Alignment Wireless Networks,” in IEEE Transactions on Vehicular Technology, vol. 66, no. 11, pp. 10433–10445, Sep. 2017.
[8] U. Challita, W. Saad, and C. Bettstetter, “Cellular-Connected UAVs over
5G: Deep Reinforcement Learning for Interference Management,” in
arXiv preprint:1801.05500, Jan. 2018.
[9] C. H. Liu, Z. Chen, J. Tang et al., “Energy-Efficient UAV Control for
Effective and Fair Communication Coverage: A Deep Reinforcement
Learning Approach,” in IEEE Journal on Selected Areas in Communi-
cations, vol. 36, no. 9, pp. 2059–2070, Aug. 2018.
[10] C. You and R. Zhang, “3D Trajectory Optimization in Rician Fading
for UAV-Enabled Data Harvesting,” in arXiv preprint:1901.04106, Jan.
2019.
[11] C. Zhang, W. Xu and M. Chen, “Robust MMSE Beamforming for
Multiuser MISO Systems With Limited Feedback,” in IEEE Signal
Processing Letters, vol. 16, no. 7, pp. 588–591, Jul. 2009.
[12] W. Zhang, “Branch-and-Bound Search Algorithms and Their Computa-
tional Complexity,” Research Report (No. ISI/RR-96-443), University of
Southern California, May 1996.
[13] V. R. Konda and J. N. Tsitsiklis, “Actor-Critic Algorithms,” in Advances
in Neural Information Processing Systems, pp. 1008–1014, 2000.
[14] J. Guillot, D. R. Leal, C. R. Algarín et al., “Search for Global Maxima in Multimodal Functions by Applying Numerical Optimization Algorithms: A Comparison Between Golden Section and Simulated Annealing,” in Computation, vol. 7, no. 3, 2019.
... Year Category Specific Algorithm CA ML [232] 2018 Heuristic [233] 2021 Heuristic [29] 2020 Heuristic [234] 2020 Heuristic [235] 2020 Heuristic [236] 2016 Heuristic [237] 2018 Heuristic [238] 2020 Heuristic [30] 2020 DDPG [239], [240] 2020, 2021 Actor-critic as well as timely recharging of the UAV battery. A DRL algorithm based on DDPG was employed to solve the joint optimization of both the mobility and charging cycle of the UAV-BSs, leading to maximization of EE and achievement of fair user coverage. ...
... A DRL algorithm based on DDPG was employed to solve the joint optimization of both the mobility and charging cycle of the UAV-BSs, leading to maximization of EE and achievement of fair user coverage. The work in [239] and [240] considered the problem of user scheduling in UAV-based communication networks to minimize the energy consumption of the UAVs. An offline method, based on branch and bound method, was proposed to solve the problem, however, this approach involves a huge computational overhead. ...
Preprint
Wireless communication networks have been witnessing an unprecedented demand due to the increasing number of connected devices and emerging bandwidth-hungry applications. Albeit many competent technologies for capacity enhancement purposes, such as millimeter wave communications and network densification, there is still room and need for further capacity enhancement in wireless communication networks, especially for the cases of unusual people gatherings, such as sport competitions, musical concerts, etc. Unmanned aerial vehicles (UAVs) have been identified as one of the promising options to enhance the capacity due to their easy implementation, pop up fashion operation, and cost-effective nature. The main idea is to deploy base stations on UAVs and operate them as flying base stations, thereby bringing additional capacity to where it is needed. However, because the UAVs mostly have limited energy storage, their energy consumption must be optimized to increase flight time. In this survey, we investigate different energy optimization techniques with a top-level classification in terms of the optimization algorithm employed; conventional and machine learning (ML). Such classification helps understand the state of the art and the current trend in terms of methodology. In this regard, various optimization techniques are identified from the related literature, and they are presented under the above mentioned classes of employed optimization methods. In addition, for the purpose of completeness, we include a brief tutorial on the optimization methods and power supply and charging mechanisms of UAVs. Moreover, novel concepts, such as reflective intelligent surfaces and landing spot optimization, are also covered to capture the latest trend in the literature.
... UAV-MECs are a promising technology due to flexible wireless connectivity and coverage even in the absence of network infrastructure. Despite the fact that UAV-MECs have numerous applications ranging from mobile relay BS to caching and MEC cloudlets, it is critical to thoroughly investigate UAV trajectory optimization [220] [221] [222], hovering altitude [223] [224], and speed control [225] [226]. The UAV-MEC mobility causes them to leave the coverage area of the serviced MDs, which may increase latency [227]; additionally, there is a large amount of data to be offloaded from MDs concerning the available bandwidth on both UAV-MECS and the backhaul network [228]. ...
Article
Full-text available
The lack of resource constraints for edge servers makes it difficult to simultaneously perform many Mobile Devices' (MDs) requests. The Mobile Network Operator (MNO) must then select how to delegate MD queries to its Mobile Edge Computing (MEC) server to maximize the overall benefit of admitted requests with varying latency needs. Unmanned Aerial Vehicles (UAVs) and Artificial Intelligent (AI) can increase MNO performance because of their flexibility in deployment, high mobility of UAV, and efficiency of AI algorithms. There is a trade-off between the cost incurred by the MD and the profit received by the MNO. Intelligent computing offloading to UAV-enabled MEC, on the other hand, is a promising way to bridge the gap between MDs' limited processing resources, as well as the intelligent algorithms that are utilized for computation offloading in the UAV-MEC network and the high computing demands of upcoming applications. This study looks at some of the research on the benefits of computation offloading process in the UAV-MEC network, as well as the intelligent models that are utilized for computation offloading in the UAV-MEC network. In addition, this article examines several intelligent pricing techniques in different structures in the UAV-MEC network. Finally, this work highlights some important open research issues and future research directions of Artificial Intelligent (AI) in computation offloading and applying intelligent pricing strategies in the UAV-MEC network.
Article
Energy consumption is a critical constraint for Unmanned Aerial Vehicle (UAV) delivery operations to achieve their full potential of providing fast delivery, reducing cost, and cutting emissions. In this article, we propose a synchronized delivery mechanism that employs trucks and UAVs to construct an energy-efficient essential-service delivery model using a Multi-Swarm UAV-Truck (MSUT) framework in a sixth-generation (6G)-assisted environment. First, we introduce an efficient Brain Storm Optimization (BSO) algorithm that determines the optimal placement of the trucks and the number of UAV launch sites, given the delivery requirements, for optimal delivery of essentials to the target destination. Further, a Multi-Agent Reinforcement Learning (MARL) model, namely Multi-Agent Advantage Actor Critic (MAAC), is employed on UAVs in a swarm for route optimization and efficient energy consumption while en route to the destination. We further investigate the reduced overall delivery time and energy metrics of the proposed UAV-truck network by comparing it with existing Deep Reinforcement Learning (DRL) delivery models.
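The MAAC model above builds on the advantage actor-critic update. A hedged sketch of that core update, on a toy one-state, two-action problem (not the paper's multi-agent UAV setup), looks like this:

```python
import math, random

# Toy tabular advantage actor-critic: the critic learns a state value via a
# TD update, and the actor takes a policy-gradient step weighted by the
# advantage. The tiny environment and learning rates are illustrative.
random.seed(0)

GAMMA = 0.9
ALPHA_PI, ALPHA_V = 0.1, 0.1
prefs = [0.0, 0.0]   # actor: action preferences (softmax policy)
value = 0.0          # critic: value of the single state

def policy():
    e = [math.exp(p) for p in prefs]
    s = sum(e)
    return [x / s for x in e]

for _ in range(500):
    probs = policy()
    a = 0 if random.random() < probs[0] else 1
    reward = 1.0 if a == 1 else 0.0              # action 1 is better
    advantage = reward + GAMMA * value - value   # TD error as advantage
    value += ALPHA_V * advantage                 # critic update
    for i in range(2):                           # actor update
        grad = (1.0 if i == a else 0.0) - probs[i]
        prefs[i] += ALPHA_PI * advantage * grad

print([round(p, 2) for p in policy()])
```

After training, the softmax policy concentrates on the rewarded action; the multi-agent variant in the paper runs one such actor per UAV with a shared or centralized critic.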
Article
Full-text available
Wireless communication networks have been witnessing unprecedented demand due to the increasing number of connected devices and emerging bandwidth-hungry applications. Although there are many competent technologies for capacity enhancement, such as millimeter wave communications and network densification, there is still room and need for further capacity enhancement in wireless communication networks, especially in cases of unusually large gatherings of people, such as sports competitions, musical concerts, etc. Unmanned aerial vehicles (UAVs) have been identified as one of the promising options to enhance capacity due to their easy implementation, pop-up fashion operation, and cost-effective nature. The main idea is to deploy base stations on UAVs and operate them as flying base stations, thereby bringing additional capacity where it is needed. However, UAVs mostly have limited energy storage, so their energy consumption must be optimized to increase flight time. In this survey, we investigate different energy optimization techniques with a top-level classification in terms of the optimization algorithm employed: conventional and machine learning (ML). Such classification helps understand the state-of-the-art and the current trend in terms of methodology. In this regard, various optimization techniques are identified from the related literature, and they are presented under the above-mentioned classes of employed optimization methods. In addition, for the purpose of completeness, we include a brief tutorial on the optimization methods and the power supply and charging mechanisms of UAVs. Moreover, novel concepts, such as reflective intelligent surfaces and landing spot optimization, are also covered to capture the latest trends in the literature.
Article
Satellite and unmanned aerial vehicle (UAV) networks have been introduced as an enhanced approach to provide dynamic control, massive connectivity, and global coverage for future wireless communication systems. This paper considers a coordinated satellite-UAV communication system, where the UAV performs an environmental reconnaissance task with the assistance of a satellite in a hostile jamming environment. To fulfill this task, the UAV needs to realize autonomous trajectory control and upload the collected data to the satellite. With the aid of the uploaded data, the satellite builds an environment situation map integrating the beam quality, jamming status, and traffic distribution. Accordingly, we propose a closed-loop anti-jamming dynamic trajectory optimization approach, which is divided into three stages. First, an intentional trajectory plan is made according to the limited prior information and preset points. Second, the flight control between two preset points is formulated as a Markov decision process, and a reinforcement learning (RL)-based automatic flight control algorithm is proposed to explore the unknown hostile environment and realize autonomous and precise trajectory control. Third, based on the data collected during the UAV's flight, the satellite uses an environment situation estimating algorithm to build the environment situation map, which is used to reselect the preset points for the first stage and provide a better initialization for the RL process in the second stage. Simulation results verify the validity and superiority of the proposed approach.
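The second stage above casts flight control between preset points as a Markov decision process. A minimal sketch of that idea with tabular Q-learning on an illustrative 1-D corridor containing a jammed cell (the environment, rewards, and hyperparameters are assumptions, not the paper's model):

```python
import random

# Tabular Q-learning for "flight control between two preset points as an
# MDP": a toy 1-D corridor where the agent must reach cell 5 while a jammed
# cell imposes an extra penalty. Illustrative only.
random.seed(1)

N, GOAL, JAMMED = 6, 5, 2          # cells 0..5
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2
Q = [[0.0, 0.0] for _ in range(N)]  # actions: 0 = left, 1 = right

def step(s, a):
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    if s2 == GOAL:
        return s2, 10.0, True
    return s2, (-5.0 if s2 == JAMMED else -1.0), False

for _ in range(300):                # training episodes
    s, done = 0, False
    while not done:
        a = random.randrange(2) if random.random() < EPS else \
            (0 if Q[s][0] > Q[s][1] else 1)
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

greedy = [0 if q[0] > q[1] else 1 for q in Q[:GOAL]]
print(greedy)
```

The learned greedy policy heads toward the goal even through the penalized cell, since detouring in a 1-D corridor only accumulates more cost; the paper's algorithm applies the same principle in a far richer jamming environment.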
Article
Unmanned aerial vehicles (UAVs) have emerged as a promising candidate solution for data collection in large-scale wireless sensor networks (WSNs). In this paper, we investigate a UAV-aided WSN, where cluster heads (CHs) receive data from their member nodes, and a UAV is dispatched to collect data from the CHs along the planned trajectory. We aim to minimize the total energy consumption of the UAV-WSN system in a complete round of data collection. Toward this end, we formulate the energy consumption minimization problem as a constrained combinatorial optimization problem by jointly selecting CHs from the nodes within clusters and planning the UAV's visiting order to the selected CHs. The formulated energy consumption minimization problem is NP-hard, and hence hard to solve optimally. To tackle this challenge, we propose a novel deep reinforcement learning (DRL) technique, pointer network-A* (Ptr-A*), which can efficiently learn the UAV trajectory policy for minimizing the energy consumption from experience. The UAV's start point and the WSN with a set of pre-determined clusters are fed into the Ptr-A*, and the Ptr-A* outputs a group of CHs and the visiting order to these CHs, i.e., the UAV's trajectory. The parameters of the Ptr-A* are trained on small-scale cluster instances for faster training by using the actor-critic algorithm in an unsupervised manner. At inference, three search strategies are also proposed to improve the quality of solutions. Simulation results show that models trained on 20-cluster and 40-cluster instances generalize well to UAV trajectory planning in WSNs with different numbers of clusters, without the need to retrain the models. Furthermore, the results show that our proposed DRL algorithm outperforms two baseline techniques.
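The visiting-order sub-problem that the Ptr-A* network learns is a travelling-salesman-style tour over cluster heads. For context, a classical (non-learned) baseline for the same sub-problem is nearest-neighbour construction followed by 2-opt improvement; the coordinates below are illustrative:

```python
import math

# Nearest-neighbour tour construction plus 2-opt improvement: a classical
# baseline for the CH visiting-order sub-problem (not the paper's Ptr-A*).

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def tour_length(pts, order):
    return sum(dist(pts[order[i]], pts[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbour(pts):
    order, left = [0], list(range(1, len(pts)))
    while left:
        nxt = min(left, key=lambda j: dist(pts[order[-1]], pts[j]))
        order.append(nxt)
        left.remove(nxt)
    return order

def two_opt(pts, order):
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if tour_length(pts, cand) < tour_length(pts, order) - 1e-9:
                    order, improved = cand, True
    return order

chs = [(0, 0), (0, 4), (4, 4), (4, 0), (2, 2)]   # illustrative CH positions
tour = two_opt(chs, nearest_neighbour(chs))
print(tour, round(tour_length(chs, tour), 2))
```

Such heuristics are exactly the kind of baseline a learned trajectory policy is compared against, and 2-opt-style local search mirrors the paper's post-inference search strategies in spirit.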
Article
Full-text available
Unmanned aerial vehicle (UAV) communication is a promising technology to meet the growing demand of next-generation cellular users due to its reliable connectivity and cost-effective deployment. However, UAV communications have to be energy-efficient so that the UAV, with its limited on-board energy, can fly long enough to serve the ground users. In this paper, we investigate energy-efficient UAV communication by designing the UAV trajectory. We consider throughput and the UAV propulsion energy consumption jointly. We assume that the UAV flies at a fixed altitude so that it can avoid tall obstacles. A binary decision variable is assigned to schedule UAV-to-user communication. First, we derive the UAV-to-user channel model based on line-of-sight and non-line-of-sight communication links and jointly optimize the trajectory, transmit power, and speed of the UAV, as well as the UAV-to-user scheduling, to maximize throughput. Then, we apply the UAV propulsion energy consumption, which is a function of the UAV trajectory and speed. Finally, we formulate the UAV energy-efficiency maximization problem, where energy efficiency is defined as the total bits of information sent to the ground users per unit of UAV energy consumed over a given flight duration. The formulated energy-efficiency maximization problem is non-convex, fractional, mixed-integer non-linear programming in nature. We propose an efficient algorithm based on successive convex approximation and the classical Dinkelbach method to solve the energy-efficient UAV problem. We present simulation results to validate the efficacy of our proposed algorithms. The results show a significant performance improvement compared to the benchmark methods.
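The algorithm above combines successive convex approximation with the classical Dinkelbach method for fractional programming. A hedged sketch of the Dinkelbach iteration alone, on a toy throughput-over-power ratio (the functions `N` and `D` are illustrative stand-ins, not the paper's model):

```python
import math

# Dinkelbach's method maximizes a ratio N(x)/D(x), D > 0, by repeatedly
# solving the parametric problem max_x N(x) - lam * D(x) and updating
# lam to the achieved ratio; it converges when the parametric optimum is ~0.

def N(x):                        # throughput-like numerator (toy)
    return math.log2(1.0 + x)

def D(x):                        # power-like denominator (toy)
    return 0.5 + x

def argmax_parametric(lam, lo=0.0, hi=10.0, steps=10000):
    # fine grid search stands in for the paper's inner convex solver
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(xs, key=lambda x: N(x) - lam * D(x))

lam, x = 0.0, 0.0
for _ in range(30):              # Dinkelbach iterations
    x = argmax_parametric(lam)
    F = N(x) - lam * D(x)        # parametric optimum value
    lam = N(x) / D(x)            # update the ratio estimate
    if abs(F) < 1e-9:
        break

print(round(lam, 4), round(x, 4))
```

Each iteration can only increase the ratio estimate, which is why the method is popular for energy-efficiency (bits-per-joule) objectives like the one formulated in this paper.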
Article
Full-text available
In engineering, when a problem cannot be solved analytically, methods are developed to approximate a solution. These efforts gave rise to the numerical methods known today, which allow mathematical problems to be formulated and solved using logical and arithmetic operations. This paper presents a comparison between the numerical optimization algorithms golden section search and simulated annealing, which are tested in four different scenarios. These scenarios are functions implemented with a feedforward neural network, which emulate partial-shading behavior in photovoltaic modules with local and global maxima. The presence of the local maxima makes it difficult to track the maximum power point, which is necessary to obtain the highest possible performance of the photovoltaic module. The algorithms were implemented in C. The results demonstrate the effectiveness of the algorithms in finding global maxima. However, the golden section search method showed better performance in terms of percentage of error, computation time, and number of iterations, except in test scenario number three, where a better percentage of error was obtained with the simulated annealing algorithm for a computational temperature of 1000.
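Golden section search, benchmarked in the abstract above, shrinks a bracketing interval by the inverse golden ratio each iteration while reusing one interior evaluation point. A minimal sketch for a unimodal 1-D maximum (the test function, a single-peak curve with its maximum at x = 3, is illustrative):

```python
import math

# Golden-section search for the maximum of a unimodal function on [a, b].
INVPHI = (math.sqrt(5) - 1) / 2          # 1/phi ≈ 0.618

def golden_section_max(f, a, b, tol=1e-8):
    c, d = b - INVPHI * (b - a), a + INVPHI * (b - a)
    while b - a > tol:
        if f(c) < f(d):                  # maximum lies in [c, b]
            a, c = c, d                  # reuse d as the new left probe
            d = a + INVPHI * (b - a)
        else:                            # maximum lies in [a, d]
            b, d = d, c                  # reuse c as the new right probe
            c = b - INVPHI * (b - a)
    return (a + b) / 2

xstar = golden_section_max(lambda x: -(x - 3.0) ** 2 + 5.0, 0.0, 10.0)
print(round(xstar, 6))
```

Because each iteration reuses one probe, the method needs only one new function evaluation per step, which is why it compares favorably on computation time in studies like the one above.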
Article
Full-text available
This paper studies unmanned aerial vehicle (UAV)-enabled wireless communication, where a rotary-wing UAV is dispatched to send/collect data to/from multiple ground nodes (GNs). We aim to minimize the total UAV energy consumption, including both propulsion energy and communication-related energy, while satisfying the communication throughput requirement of each GN. To this end, we first derive an analytical propulsion power consumption model for rotary-wing UAVs, and then formulate the energy minimization problem by jointly optimizing the UAV trajectory and the communication time allocation among GNs, as well as the total mission completion time. The problem is difficult to solve optimally, as it is non-convex and involves infinitely many variables over time. To tackle this problem, we first consider the simple fly-hover-communicate design, where the UAV successively visits a set of hovering locations and communicates with one corresponding GN while hovering at each location. For this design, we propose an efficient algorithm to optimize the hovering locations and durations, as well as the flying trajectory connecting these hovering locations, by leveraging travelling salesman problem (TSP) and convex optimization techniques. Next, we consider the general case where the UAV also communicates while flying. We propose a new path discretization method to transform the original problem into a discretized equivalent with a finite number of optimization variables, for which we obtain a locally optimal solution by applying the successive convex approximation (SCA) technique. Numerical results show the significant performance gains of the proposed designs over benchmark schemes in achieving energy-efficient communication with rotary-wing UAVs.
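The analytical rotary-wing propulsion power model this work derives expresses power as the sum of blade-profile, induced, and parasite terms as a function of flight speed. A sketch using representative parameter values commonly quoted with this model (a specific UAV's constants may differ):

```python
import math

# Rotary-wing propulsion power vs. speed V:
#   P(V) = blade profile + induced + parasite power.
# Parameter values are representative, not tied to a particular UAV.
P0, Pi = 79.86, 88.63        # blade profile / induced power at hover (W)
U_TIP, V0 = 120.0, 4.03      # rotor tip speed, mean rotor induced velocity (m/s)
D0, RHO, S, A = 0.6, 1.225, 0.05, 0.503  # fuselage drag ratio, air density,
                                         # rotor solidity, rotor disc area

def propulsion_power(v):
    profile = P0 * (1.0 + 3.0 * v**2 / U_TIP**2)
    induced = Pi * math.sqrt(
        math.sqrt(1.0 + v**4 / (4.0 * V0**4)) - v**2 / (2.0 * V0**2))
    parasite = 0.5 * D0 * RHO * S * A * v**3
    return profile + induced + parasite

hover = propulsion_power(0.0)            # reduces to P0 + Pi at hover
print(round(hover, 2))
```

The resulting power-speed curve is U-shaped: hovering is more expensive than moderate-speed flight, which is exactly what makes the fly-hover-communicate versus fly-and-communicate trade-off non-trivial.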
Article
Full-text available
The use of flying platforms such as unmanned aerial vehicles (UAVs), popularly known as drones, is rapidly growing in a wide range of wireless networking applications. In particular, with their inherent attributes such as mobility, flexibility, and adaptive altitude, UAVs admit several key potential applications in wireless systems. On the one hand, UAVs can be used as aerial base stations to enhance coverage, capacity, reliability, and energy efficiency of wireless networks. For instance, UAVs can be deployed to complement existing cellular systems by providing additional capacity to hotspot areas as well as to provide network coverage in emergency and public safety situations. On the other hand, UAVs can operate as flying mobile terminals within the cellular networks. In this paper, a comprehensive tutorial on the potential benefits and applications of UAVs in wireless communications is presented. Moreover, the important challenges and the fundamental tradeoffs in UAV-enabled wireless networks are thoroughly investigated. In particular, the key UAV challenges such as three-dimensional deployment, performance analysis, air-to-ground channel modeling, and energy efficiency are explored along with representative results. Then, fundamental open problems and potential research directions pertaining to wireless communications and networking with UAVs are introduced. To cope with the open research problems, various analytical frameworks and mathematical tools such as optimization theory, machine learning, stochastic geometry, transport theory, and game theory are described. The use of such tools for addressing unique UAV problems is also presented. In a nutshell, this tutorial provides key guidelines on how to analyze, optimize, and design UAV-based wireless communication systems.
Article
Full-text available
In this paper, an interference-aware path planning scheme for a network of cellular-connected unmanned aerial vehicles (UAVs) is proposed. In particular, each UAV aims at achieving a tradeoff between maximizing energy efficiency and minimizing both wireless latency and the interference level caused on the ground network along its path. The problem is cast as a dynamic game among UAVs. To solve this game, a deep reinforcement learning algorithm, based on echo state network (ESN) cells, is proposed. The introduced deep ESN architecture is trained to allow each UAV to map each observation of the network state to an action, with the goal of minimizing a sequence of time-dependent utility functions. Each UAV uses ESN to learn its optimal path, transmission power level, and cell association vector at different locations along its path. The proposed algorithm is shown to reach a subgame perfect Nash equilibrium (SPNE) upon convergence. Moreover, an upper and lower bound for the altitude of the UAVs is derived thus reducing the computational complexity of the proposed algorithm. Simulation results show that the proposed scheme achieves better wireless latency per UAV and rate per ground user (UE) while requiring a number of steps that is comparable to a heuristic baseline that considers moving via the shortest distance towards the corresponding destinations. The results also show that the optimal altitude of the UAVs varies based on the ground network density and the UE data rate requirements and plays a vital role in minimizing the interference level on the ground UEs as well as the wireless transmission delay of the UAV.
Article
Dispatching unmanned aerial vehicles (UAVs) to harvest sensing data from distributed sensors is expected to significantly improve the data collection efficiency of conventional wireless sensor networks (WSNs). In this paper, we consider a UAV-enabled WSN where a flying UAV is employed to collect data from multiple sensor nodes (SNs). Our objective is to maximize the minimum average data collection rate over all SNs, subject to a prescribed reliability constraint for each SN, by jointly optimizing the UAV communication scheduling and three-dimensional (3D) trajectory. Different from existing works that assume simplified line-of-sight (LoS) UAV-ground channels, we consider the more practically accurate angle-dependent Rician fading channels between the UAV and SNs, with the Rician factors determined by the corresponding UAV-SN elevation angles. However, the formulated optimization problem is intractable due to the lack of a closed-form expression for a key parameter, termed effective fading power, that characterizes the achievable rate given the reliability requirement in terms of outage probability. To tackle this difficulty, we first approximate the parameter by a logistic ('S'-shape) function with respect to the 3D UAV trajectory by using a data regression method. Then the original problem is reformulated into an approximate form, which, however, is still challenging to solve due to its non-convexity. As such, we further propose an efficient algorithm to derive a suboptimal solution by using the block coordinate descent technique, which iteratively optimizes the communication scheduling, the UAV's horizontal trajectory, and its vertical trajectory. The latter two subproblems are shown to be non-convex, while locally optimal solutions are obtained for them by using the successive convex approximation technique. Last, extensive numerical results are provided to evaluate the performance of the proposed algorithm and draw new insights on the 3D UAV trajectory under Rician fading as compared to conventional LoS channel models.
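The regression step described above fits a logistic ('S'-shape) curve to effective-fading-power samples. A minimal stand-in using a least-squares grid search over synthetic angle/power data (the data, the logistic form, and the parameter grids are all illustrative assumptions):

```python
import math

# Least-squares grid fit of a logistic curve y = 1/(1 + exp(-k (theta - theta0)))
# to synthetic elevation-angle samples, standing in for the paper's
# data-regression step. Data and parameter grids are illustrative.

def logistic(theta, k, theta0):
    return 1.0 / (1.0 + math.exp(-k * (theta - theta0)))

# synthetic "measurements": true parameters k = 0.15 /deg, theta0 = 45 deg
angles = list(range(0, 91, 5))
samples = [logistic(t, 0.15, 45.0) for t in angles]

best, best_err = None, float("inf")
for k100 in range(1, 31):                # k in 0.01 .. 0.30 per degree
    for t0 in range(0, 91):              # theta0 in 0 .. 90 degrees
        k = k100 / 100.0
        err = sum((logistic(t, k, t0) - y) ** 2
                  for t, y in zip(angles, samples))
        if err < best_err:
            best, best_err = (k, t0), err

print(best)
```

In practice a gradient-based or library curve fitter would replace the grid search, but the shape of the approximation, monotone in elevation angle with a single inflection, is the same.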
Article
Unmanned aerial vehicles (UAVs) can serve as aerial base stations to enhance both the coverage and performance of communication networks in various scenarios, such as emergency communications and network access for remote areas. Mobile UAVs can establish communication links to deliver packets to ground users. However, UAVs have limited communication ranges and energy resources. In particular, for a large region, they cannot cover the entire area all the time or keep flying for a long time. It is thus challenging to control a group of UAVs to achieve a certain communication coverage in the long run, while preserving their connectivity and minimizing their energy consumption. Toward this end, we propose to leverage emerging deep reinforcement learning (DRL) for UAV control and present a novel and highly energy-efficient DRL-based method, which we call DRL-based energy-efficient control for coverage and connectivity (DRL-EC³). The proposed method 1) maximizes a novel energy-efficiency function that jointly considers communication coverage, fairness, energy consumption, and connectivity; 2) learns the environment and its dynamics; and 3) makes decisions under the guidance of two powerful deep neural networks. We conduct extensive simulations for performance evaluation. Simulation results show that DRL-EC³ significantly and consistently outperforms two commonly used baseline methods in terms of coverage, fairness, and energy consumption.
Article
Both caching and interference alignment (IA) are promising techniques for next-generation wireless networks. Nevertheless, most existing works on cache-enabled IA wireless networks assume that the channel is invariant, which is unrealistic given the time-varying nature of practical wireless environments. In this paper, we consider realistic time-varying channels. Specifically, the channel is formulated as a finite-state Markov channel (FSMC). The complexity of the system is very high when realistic FSMC models are considered. Therefore, we propose a novel deep reinforcement learning approach that uses a deep Q-network to approximate the Q action-value function. We implement the deep reinforcement learning approach in Google TensorFlow to obtain the optimal IA user-selection policy in cache-enabled opportunistic IA wireless networks. Simulation results show that the proposed approach significantly improves the performance of cache-enabled opportunistic IA networks in terms of the network's sum rate and energy efficiency.
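The finite-state Markov channel (FSMC) underlying this formulation quantizes the channel gain into a few states with a transition probability matrix. A minimal sketch with an illustrative three-state chain (the states and matrix are assumptions, not the paper's parameters), checking its empirical state frequencies:

```python
import random

# A toy finite-state Markov channel: three quantised channel states evolving
# according to an illustrative transition matrix; long-run state frequencies
# approach the chain's stationary distribution (2/7, 3/7, 2/7 here).
random.seed(7)

STATES = ["bad", "medium", "good"]
P = [[0.6, 0.3, 0.1],     # row: current state, column: next state
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

def step(s):
    r, acc = random.random(), 0.0
    for s2, p in enumerate(P[s]):
        acc += p
        if r < acc:
            return s2
    return len(P[s]) - 1

s, counts = 0, [0, 0, 0]
for _ in range(20000):
    s = step(s)
    counts[s] += 1

freqs = [c / 20000 for c in counts]
print([round(f, 2) for f in freqs])
```

A DQN agent in this setting would observe the current (quantized) channel state and cache status and pick the IA user set, with the FSMC supplying the state transitions between decisions.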
Article
Wireless communication systems that include unmanned aerial vehicles (UAVs) promise to provide cost-effective wireless connectivity for devices without infrastructure coverage. Compared to terrestrial communications or those based on high-altitude platforms (HAPs), on-demand wireless systems with low-altitude UAVs are in general faster to deploy, more flexibly re-configured, and are likely to have better communication channels due to the presence of short-range line-of-sight (LoS) links. However, the utilization of highly mobile and energy-constrained UAVs for wireless communications also introduces many new challenges. In this article, we provide an overview of UAV-aided wireless communications, by introducing the basic networking architecture and main channel characteristics, highlighting the key design considerations as well as the new opportunities to be exploited.