ArticlePDF Available

Adaptive Security Solutions for NOMA Networks: The Role of DDPG and RIS-Equipped UAVs

Authors:

Abstract and Figures

In this work, we apply the Deep Deterministic Policy Gradient (DDPG) technique to improve the security of a non-orthogonal multiple access (NOMA) downlink network by enabling use of a reconfigurable intelligent surface (RIS) equipped unmanned aerial vehicles (UAV). Our main objective is to prevent eavesdroppers from accessing the network while preserving seamless communication for authorized users. The system is made up of a UAV integrated with an RIS which is essential for optimizing signal paths, and a Base Station. Our work aims to maximize secrecy rates for all users under possible eavesdropping scenarios by dynamically adjusting the RIS’s phase shifts and power allocations. This method not only shows how flexible and successful the DDPG algorithm is at protecting wireless communications when used in conjunction with an RIS but it also highlights how much the algorithm has advanced secure communication systems.
Content may be subject to copyright.
International Journal of Electrical, Energy
and Power System Engineering
Vol. 07, No. 03, October 2024
e-ISSN: 2654-4644
158
Adaptive Security Solutions for NOMA Networks: The Role
of DDPG and RIS-Equipped UAVs
Syed Zain Ul Abideen1*, Abdul Wahid2, Mian Muhammad Kamal3
1,2College of Computer Science and Technology, Qingdao University, Qingdao, China
3Joint International Research Laboratory of Information Display and Visualization, School of Electronic
Science and Engineering, Southeast University, Nanjing, China
zain208shah@gmail.com, wahidjan999@gmail.com, mmkamal@seu.edu.cn
*Corresponding author: zain208shah@gmail.com
Abstract
This study employs the Deep Deterministic Policy Gradient (DDPG) algorithm to enhance the security of non-
orthogonal multiple access (NOMA) downlink networks using Reconfigurable Intelligent Surface (RIS) technology
integrated with Unmanned Aerial Vehicles (UAV). The primary aim is to safeguard communications against eavesdropping
while maintaining high-quality service for authorized users. The proposed system features a UAV equipped with an RIS
and a Base Station (BS), designed to optimize secrecy rates by dynamically adjusting RIS phase shifts and power
allocations in real-time. The novel use of DDPG facilitates adaptive optimization in complex, non-convex environments,
effectively mitigating eavesdropping risks and enhancing Physical Layer Security (PLS). Through comprehensive
simulations, this study demonstrates the DDPG algorithm’s superior capability to secure wireless transmissions,
significantly improving the secrecy rate and resource allocation under varying network conditions. The system’s
adaptability allows it to respond efficiently to changing channel conditions, showcasing its potential for high-dimensional
problem-solving in modern communication networks. The findings highlight the strategic importance of integrating AI-
driven learning algorithms with RIS-equipped UAVs for future secure and robust wireless communications.
Keywords: deep deterministic policy gradient algorithm, non-orthogonal multiple access network, physical layer security
enhancement, reconfigurable intelligent surface technology, unmanned aerial vehicle integration.
This%work%is%licensed%under%a%CC%BY-SA.%Copyright%©2024%by%Author.%Published%by%Universitas%Riau. %
INTRODUCTION
Next-level wireless communication systems, particularly those designed for the sixth
generation (6G) networks, are expected to set new standards in connectivity characterized by
high reliability, extremely fast data rates, and minimal latency. RIS and UAV assisted
communication systems are identified as crucial components contributing to this significant
advancement [1]. In such a scenario, NOMA is high- lighted as an impactful access method
intended to enhance spectral efficiency and cater to the increasing demand for wireless
connectivity [2]. RIS have emerged as a strategic approach to alter the propagation
environment. Through the use of programmable metal surfaces containing numerous passive
reflecting elements, RISs can manipulate the phase and amplitude of in- coming signals in
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
159
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
order to tailor propagation paths according to specific communication needs [3]. The inclusion
of this technology in UAV-assisted wireless networks is particularly promising due to its ability
to provide a flexible and responsive solution for achieving comprehensive network coverage
[4]. UAV introduce an additional level of dynamism to the telecommunications ecosystem.
Their ability to be positioned and move in the air allows for rapid and specific deployment,
making them extremely valuable tools for im- proving coverage, capacity, and addressing
service gaps in ground-based networks [5]. When combined with RIS and aided by NOMA
protocols [6], UAVs can further enhance resource allocation efficiency and enhance user
experiences throughout the network. The emergence of RIS as a cornerstone technology has
marked a significant milestone in the progression of PLS, providing a dynamic and controllable
domain for wireless communications [7]. RIS is architected from an assortment of passive
components that are not only cost-efficient but also have the capacity for electromagnetic
modulation, thanks to integrated PIN diodes [8]. With the strategic modulation of signal phases
orchestrated by an advanced control system, RIS enhances signal integrity for approved
receivers while concurrently disrupting those designated for unauthorized interceptors.
When compared to conventional PLS methodologies that utilize artificial noise [9] or advanced
multi-antenna beamforming [10], RIS stands out for its passive operational nature, which
bypasses the high costs associated with RF chains. Moreover, the flexible nature of RIS
facilitates its straightforward incorporation into prevailing network systems and makes it ideal
for attachment to a variety of structures within urban landscapes, as well as to wearable
technology [11]. Nevertheless, the growing interconnections and complexity of systems also
increase the risk of security breaches. The physical layer of wireless communication systems
is particularly vulnerable to eavesdropping and other cyber threats, highlighting the need for
new protective strategies [12]. In this respect RIS offer a valuable means to enhance the
security of the physical layer. Through strategic manipulation of signal reflections [13], RISs
can reduce signal reception at potential eavesdropper locations while strengthening it for
authorized recipients, establishing a secure communication environment. The field of industrial
automation is experiencing a dynamic evolution due to the rise of innovative AI algorithm
frameworks like RL, DL, and particularly DRL. These technologies are streamlining the path
toward real- time automation and sustained advancement. DRL is at the forefront,
distinguished by its capability to effectively process the intricate flow of data within
communication systems and to navigate the complex management of system and resource
control, often presented in non-linear and challenging non-convex scenarios. This level of
adeptness is achieved even in the computationally rigorous task of de- ciphering and network
formation for understanding wireless channels, conducted without reliance on established
channel models or documented patterns of user movements [14]. Moreover, DRL’s strength
lies in its strategic identification of optimal solutions for complex optimization issues, a skill
honed by analyzing the patterns of rewards obtained from interactions in a wireless context,
thus contributing vitally to the advancement of cutting-edge algorithmic designs [11].
RIS have rapidly emerged as a transformative technology within wireless networks gaining
prominence in fields such as NOMA [15], [16], CoMP [17], [18], and UAV-based
communications [19], [20], primarily due to their capacity to markedly enhance network
functionalities. This development was made possible by the innovative research of [15], which
provided an effective method for integrating RIS into NOMA systems while effectively
managing the conflict between sum rate and power efficiency. This was achieved by carefully
applying the SCA technique, which made it possible to improve beamforming and phase shifts
on a periodic schedule. Expanding on this research [16] extended the study to evaluate how
RIS affects the performance of semi-grant- free NOMA systems, proposing algorithms that are
adapted for various RIS setups. The investigation went on [17], who examined the role of RIS
in CoMP communications, considering a spectrum of scenarios from ideal to less than ideal,
International Journal of Electrical, Energy
and Power System Engineering
Vol. 07, No. 03, October 2024
e-ISSN: 2654-4644
160
and applying the dual close approach to optimize reflection coefficients. Simultaneously [18],
focused on enhancing the long-term energy efficiency in STAR-RIS- facilitated CoMP networks
by combining methods for active and passive beamforming optimization with a combination of
partial programming and DRL approaches to achieve close to optimal results.
The discussion expands much more to include UAV communications, as [19], addressing the
challenge of enhancing sum rates within RIS-supported multi-UAV NOMA frameworks. This
entailed a holistic optimization strategy that encompassed UAV positioning, power
management, RIS reflection matrix configurations, and the sequencing of NOMA decoding, all
resolved through a BCD iterative methodology. Complementary to this [20], pioneered a novel
DRRL algorithm designed to optimize both UAV flight patterns and beamforming processes
active at the UAV and passive at the STAR-RIS simultaneously thus showcasing the expansive
utility and significant advantages of RIS in advancing the capabilities of contemporary wireless
communication systems. Recent research incorporating RIS has significantly advanced the
topic of Physical Layer Security (PLS). Important advancements include the [21], novel RIS
configuration that protects downlink NOMA systems from attackers and the [22], method that
enhances beamforming in secure wireless systems that use RIS. Robust optimization
strategies were proposed by [23], to address the problem of imperfect eavesdropper CSI. By
combining STAR-RIS with NOMA in a novel way. [24], improved security by creating artificial
noise [25], explored the use of RIS to optimize secure energy efficiency in the context of UAVs
and [26], used aerial RIS to adjust signal distributions for NOMA systems in order to improve
secrecy rates, highlighting the revolutionary effect of RIS on improving wireless communication
security.
This work is motivated by the critical role that RIS play in the development of 6G wireless
communication technology. We are at the beginning of a new age in wireless communications,
and with it comes a growing need for improved energy efficiency, cost reduction and spectrum
utilization. RIS presents a fresh solution to these problems because of its capacity to
reorganize electromagnetic wave propagation especially in situations where direct
communication routes are restricted. Moreover, the escalating concerns regarding security
breaches in wireless communications highlight the urgency of integrating robust PLS
measures. RIS technology presents an innovative way to secure data transmission against
illegal access and eavesdropping, while simultaneously enhancing service quality through
innovative non-line-of-sight connections. By carefully modifying RIS elements one can boost
QoS and strengthen security while simultaneously leveraging the unique spatial selectivity that
comes with RIS to optimize signal confidentiality for authorized users and restrict it for potential
eavesdroppers.
The integration of RIS and NOMA along with the potential integration with UAVs highlight the
revolutionary possibilities of these technologies in establishing a more adaptable, effective and
safe wireless communication environment. The potential to leverage these state-of-the-art
technologies to tackle the challenges of dynamic multi-user environments where traditional
optimization techniques fall short because of their non- linear nature is what drives this
research. Adaptability and Resource Efficiency: Our work employs the DDPG algorithm to
simultaneously optimize power distribution, phase shifting, and UAV positioning allowing for
dynamic adaptation to changing channel conditions and user requirements. With the help of
this comprehensive optimization approach the system is able to efficiently allocate wireless
resources in real-time maximizing spectrum efficiency and ensuring maximum performance in
a variety of operational conditions. Moreover, the system may constantly im- prove its
approaches responding to changing network dynamics and improving overall utilization of
resources in wireless communication environments through using AI-driven learning
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
161
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
algorithms. Securing Wireless Communication: Our work presents distinctive PLS approaches
that attempt to enhance the system’s security measures against potential eaves- dropping
threats in addition to optimizing system performance. Our solution strengthens the security and
integrity of wireless transmissions by incorporating PLS mechanisms like secure beamforming
and trans- mission techniques into the optimization framework. This reduces the possibility of
unauthorized access and information interception. The practical relevance and significance of
our research in protecting wireless communication systems from malicious actors is high-
lighted by the need of this proactive security approach which is especially important in wireless
communication networks handling confidential information or operating in unsafe
environments. Our aims are to demonstrate the efficacy of the DDPG- based strategy through
comprehensive computer simulations demonstrating its ability to strengthen the physical layer
against security attacks and improve overall network performance. The goal of this research
is to fully utilize RIS, NOMA, and UAV technologies in order to influence the development of
secure and efficient wireless communication networks in the future.
Figure 1. System Model
In a complex communication system that utilizes UAV-assisted NOMA to enhance the
confidentiality of the physical layer, the structure consists of a Base Transmission Station
responsible for sending data to a set of K end-users represented as {1, 2, ..., K}. A UAV with
an RIS comprising N passive modulators aids in reflecting and directing signals for this data
relay. The UAV operates independently at a predetermined altitude above the specified area
{A}, commencing its mission from an established charging point. The system is designed for
quasi-static frequency flat-fading channels assuming perfect Channel State Information
availability at both the BTS and the UAV-mounted IRS. Energy consumption and operational
duration considerations are abstracted away for simplicity. This sophisticated setup is
vulnerable to potential covert surveillance attempts by a group of eavesdroppers denoted as
{E} = {1, 2, ..., E}, aiming to illicitly intercept communication. Notably, due to utilizing the DDPG
method, the proposed algorithm can adjust to changing channel conditions across different
time slots while maintaining consistency within each individual time slot ensuring resilience
and dependability in dynamic communication environments. The linkage between the BTS and
the RIS is characterized by G ! "!"#"$, representing the propagation path of the transmitted
signal. Concurrently, the communication channels from the RIS to each k-th end-user and e-
th eavesdropper are denoted by #%&' ! "!"#"$ and #%&( ! "!"#"$, respectively, capturing the
nuances of signal reflection and potential attenuation or enhancement due to the RIS's
modulation capabilities.
International Journal of Electrical, Energy
and Power System Engineering
Vol. 07, No. 03, October 2024
e-ISSN: 2654-4644
162
The reception at each k-th end-user is mathematically modeled as:
where Φ = diag$%)*!&'&%)*"( encapsulates the RIS's phase shift capabilities. )+ represents the
power allocated to the i-th user's signal *,, and +' symbolizes the additive white Gaussian
noise, inherent in wireless communication, at the k-th end-user's receiver. ,+ denotes the BS's
power allocation coefficient for the i-th user, constrained within the interval [0,1]. The sum of
these coefficients across all K users equals 1. The transmitted signal for the i-th user is
represented by *,, designed such that its expected power equals 1, denoted as -.*+
-/0 12. The
noise affecting the k-th user's signal, denoted as +', follows a complex normal distribution with
zero mean and variance 3-. The RIS, positioned on a UAV, is located at v(x, y) with a height
#,&1while the BS is at the origin (0,0) with height #.. The horizontal position of each k-th user is
given by uk(xk,yk), surveillance by the %45# eavesdropper yields the intercepted signal as:
6/0$70&/
189(:;
2
34$
<=3>3?@/&A!B111CDE
1
For each legitimate user C1F1E, the channel gain considering the path loss can be expressed
as:
where α is the path loss coefficient, and dBI and G,5# represent the distances from the base
station (BS) to the RIS and from the RIS to the k user respectively.
To incorporate Eve into this model, we can extend the scenario to include the channel vector
from the RIS to Eve, hre, and the distance from the RIS to Eve, dIe. The channel gain for Eve
would then be similar to that of the legitimate users, adjusted for Eve's position:
In the SINR calculations for implementing Successive Interference Cancellation (SIC) among
NOMA users, it's important to account for the potential interception by Eve. The SINR for a
user j decoding the signal intended for a weaker user t is given by:
For Eve attempting to decode the signal intended for user t the SINR would be:
This formulation allows the system to evaluate the risk posed by Eve's interception attempts
and adjust the power allocation pi and phase shifts θn accordingly to ensure secure
communication. To ensure effective SIC and maintain the system's security, the data rates for
the legitimate users must be optimized to maximize the difference between their SINRs and
𝑦𝑘=𝐡𝑟,𝑘
𝐻𝚽𝐆 𝑖=1
𝐾
*𝑝𝑖𝑠𝑖+𝑛𝑘,𝑘 𝒦(1)
𝒉𝑘=𝒉𝑟𝑘
𝐻𝚽𝒈
(𝑑𝐵𝐼𝑑𝐼𝑢𝑘)𝛼// (3)
𝒉𝑒=𝒉𝑟𝑒
𝐻𝚽𝒈
(𝑑𝐵𝐼𝑑𝐼𝑒)𝛼(4)
SINR𝑡→𝑗 =|𝒉𝑗|2𝑃max𝜌𝑡
𝑖=𝑡+1
𝐾)|𝒉𝑗|2𝑃max𝜌𝑖+𝜎2
(5)
SINR
𝑡→𝑒 =|𝒉𝑒|2𝑃max𝜌𝑡
𝑖=𝑡+1
𝐾)|𝒉𝑒|2𝑃max𝜌𝑖+𝜎2
(6)
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
163
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
Eve's SINR, effectively increasing the secrecy rate and making the system robust against
eavesdropping. To ensure physical layer confidentiality, the covert communication rate or
secrecy rate for each 1F45# user, which quantifies the secure information transmission rate,
is defined as:
where Rk is the legitimate communication rate to the k-th user, and Re,k is the potential
information rate accessible to the e-th eavesdropper. These rates are articulated by: effectively
capturing the dynamics of secure and potentially compromised communication paths. The
principal objective in this advanced communication paradigm is to maximize the sum of all
users' secrecy rates, which is pivotal for ensuring robust secure communication against
eavesdropping threats.
The use of the DDPG algorithm is essential in the setting of the integrated RIS-equipped UAV-
assisted NOMA downlink system with a focus on improving PLS [27]. This section begins with
a brief overview of DDPG and then goes into great detail into how the DDPG framework was
modified to fit the given optimization challenge.
Figure 2. DDPG Diagram
𝑅𝑘=log21+ |𝐡𝑟,𝑘
𝐻𝚽𝐆|2𝑝𝑘
𝜎2+𝑖≠𝑘 ,|𝐡𝑟,𝑘
𝐻𝚽𝐆|2𝑝𝑖
(8)
𝑅𝑒,𝑘=log21+ |𝐡𝑟,𝑒
𝐻𝚽𝐆|2𝑝𝑘
𝜎2+𝑖≠𝑘-|𝐡𝑟,𝑒
𝐻𝚽𝐆|2𝑝𝑖
(9)
International Journal of Electrical, Energy
and Power System Engineering
Vol. 07, No. 03, October 2024
e-ISSN: 2654-4644
164
Since conventional algorithms such Deep Q-Networks (DQN), are mostly designed to operate
in discrete action spaces they are constrained in their application to continuous action spaces.
In order to solve our stated problem, this constraint compels us to look into alternative
strategies such DDPG. Using deep function approximators, DDPG which is characterized by
its model free off-policy actor-critic mechanism efficiently explores the high-dimensional
continuous action space [28]. The deterministic policy gradient technique which is a
fundamental component of DDPG maps a particular action α systematically and assesses its
effectiveness using a Ǫ function, Ǫ(s,a,θq) where θq stands for the parameters of the critic
network. Optimizing the output Ǫ value is the main goal of DDPG, which also improves system
performance overall. Experience replay is a key component of DDPG, since it solves the
problem of non-i.i.d. (independently and identically distributed) data that result from
environment exploration in sequence [29]. To maintain stability and reduce variation in the
learning process this is combined with the use of soft update approaches. The soft update
technique uses the following equation to gradually adjust the target network parameters θ`, to
match the learned evaluation network:
Where τ is significantly less than 1, ensuring a conservative update approach. Moreover, the
investigation in the continuous action space presents an important challenge that
DDPG successfully resolves by introducing noise N into the policy. The following equation
describes this approach:
where N is the environment dependent noise and μ′(st) is the target policy with extra noise
encouraging exploration and helping in the identification of optimal policies. This
comprehensive approach makes use of DDPG's capabilities to provide a solid basis for
addressing the challenges of PLS optimization in the RIS-UAV-NOMA downlink system
ensuring effective and secure network connectivity [30]. After introducing the DDPG algorithm
and implementing it in a secure RIS-equipped UAV-assisted NOMA downlink communication
system we then explore the details of processing Deep Reinforcement Learning (DRL) in this
complicated environment. In our advanced RIS-enhanced UAV-supported NOMA downlink
system, the dynamic nature of wireless channels constitutes the environment in which our
system operates. The collaborative setup of the RIS coupled with an UAV act as the central
agent within this scenario, tasked with the dual objectives of maximizing transmission efficiency
and ensuring robust security at the physical layer. Our Deep Reinforcement Learning (DRL)
framework is crafted to mirror the unique features and goals of our setup through several key
elements. For each discrete interval, labeled as t, the state space is meticulously defined to
capture an exhaustive view of the system’s present condition, incorporating aspects such as:
Prior timestep secrecy metrics (Rs,k(t−1)) for individual users k, underscoring the
emphasis on safeguarding data confidentiality.
Last known RIS modulation phases (θ) and power distribution profiles (ρ), key
parameters for optimizing signal propagation and energy consumption.
The UAV’s most recent positioning coordinates (x and y), vital for optimal aerial relay
positioning and interference management.
𝜃ʹ𝜏𝜃 +(1 𝜏)𝜃ʹ(10)
𝜇
(𝑠𝑡)= 𝜇(𝑠𝑡; 𝜃𝜇)+ 𝒩(11)
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
165
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
This state is mathematically captured as:
Action Space Configuration
Aligned with the state st, the action set at includes a suite of strategic decisions enacted
by the RIS- UAV duo, designed to navigate the intricacies of the prevailing wireless
communication landscape and user requirements. These actions encompass:
Fine-tuning of RIS reflective properties (θ) to enhance signal directionality and
strength.
Strategic adjustments in transmission power allocations (ρ) to users, ensuring
optimal energy utilization.
Real-time repositioning of the UAV (x, y) to maintain superior signal quality and
reduce potential interference.
:
Reward Function
At each interval t, the reward metric r
t
is ingeniously formulated to resonate with the
system’s overarching aims of enhancing throughput and reinforcing data security
across the wireless network. It amalgamates
the total system throughput and the
collective secrecy
performance across all users:
In this formulation, α and β are coefficients that calibrate the focus between operational
efficiency and security enhancement, facilitating a balanced optimization approach.
Through this nuanced DRL processing methodology, our RIS-UAV framework adeptly learns
to traverse the dynamic terrain of our NOMA downlink communication system. By judiciously
making decisions that not only propel the throughput but also significantly elevate the system’s
security posture, this adaptive learning paradigm, rooted in DDPG methodologies, marks a
significant leap towards achieving secure, high-performing, and adaptive wireless
communication networks.
In optimizing our sophisticated RIS-aided UAV-based NOMA downlink architecture, the
DRL strategy is finely tuned to respect the system's Quality of Service (QoS) benchmarks
and other operational intricacies. Here’s how the DRL scheme, particularly leveraging the
DDPG algorithm, methodically addresses the stipulated problem:
Initial Data Rate Validation for QoS Alignment: At every decision point, marked as timestep
t, the DRL algorithm diligently computes the transmission rate
R
(t)
for every user in the
setup. This step is pivotal for confirming adherence to predefined QoS criteria, a
fundamental aspect delineated in our problem’s constraint 5b. Actions leading to
satisfactory QoS fulfillment are stored within the model’s memory buffer, reinforcing
optimal behavior. In contrast, actions resulting in QoS breaches are either penalized or
discarded to prevent recurrence, steering the DRL agent towards more effective strategies.
Adaptive SIC Procedure Refinement: Navigating the complexities of efficient SIC
execution, a critical facet of NOMA systems, our DRL framework opts for an agile
𝑠𝑡=[𝑅1
(𝑡1), ⋯, 𝑅𝐾
(𝑡−1), 𝑅𝑠,1
(𝑡−1), ⋯, 𝑅𝑠,𝐾
(𝑡1),
𝜃1
(𝑡−1), ⋯, 𝜃2𝑁
(𝑡−1), 𝜌1
(𝑡1), ⋯, 𝜌𝑘
(𝑡1), 𝑥(𝑡−1), 𝑦 (𝑡−1)](12)
𝑎𝑡= 𝜃1
(𝑡), , 𝜃2𝑁
(𝑡), 𝜌1
(𝑡), ⋯,𝜌𝑘
(𝑡), 𝑥(𝑡),𝑦(𝑡) (13)
!"#$%#&
'
(
)
(
*%
+", -.%#&
'
(
)
(
*/0%
+", +&1,
International Journal of Electrical, Energy
and Power System Engineering
Vol. 07, No. 03, October 2024
e-ISSN: 2654-4644
166
"
approach. Post each action, it recalculates channel vectors, adjusting the decoding
sequence to suit the current channel dynamics. This flexibility ensures the SIC requirement
is inherently met, underscoring the DRL algorithm’s capacity to adjust to live changes
within the communication framework, thus optimizing the SIC mechanism’s effectiveness.
Proposition: Guaranteeing SIC Compliance: The assurance of the SIC constraint within
our DRL framework is showcased through an adaptive decoding sequence optimization,
reliant on the immediate channel state. This assertion is supported by revisiting the SINR
formulations, for user t decoding user j’s channel:
And for user t decoding at its own channel:
Given the channel condition:
it's evident that the SINR for decoding at j consistently meets or surpasses the self-decoding
SINR at t thereby upholding the SIC stipulation. This highlights the DRL algorithm's adeptness
in dynamically tailoring to channel fluctuations while ensuring the SIC protocol's integrity within
the NOMA framework. The DRL approach refocuses the optimization challenge towards
elevating each user’s data rate, while stringently adhering to the system’s QoS requisites and
the physical constraints tied to the RIS and UAV:
Maintaining QoS by ensuring
R
tt
R
min
for all network users t, observing power distribution
boundaries and UAV spatial constraints for optimal resource utilization, Confirming RIS phase
adjustments remain within practical limits, facilitating peak system operation. This structured
DRL methodology, powered by the nuanced capabilities of the DDPG algorithm, proficiently
navigates through the multifaceted constraints and goals inherent in the RIS-enhanced UAV-
assisted NOMA downlink system. It not only propels network efficiency but also substantially
fortifies the physical layer’s security, illustrating the potential for ground-breaking progress in
secure and high-performing wire- less communication networks.
The foundation of our DDPG-based learning mechanism is established upon four critical neural
networks: the target and evaluation networks for both the actor and critic components.
Specifically, the target
actor
network
(
θ
µ
)
and
the
evaluation
actor
network
(
θ
µ
), alongside
their
critic
counterparts
(
θ
q
and
θ
q
),
are meticulously constructed with parallel architectures to
ensure consistency in learning and prediction dynamics. An experience replay buffer (B), with
a predefined capacity (C), serves as the memory infrastructure, storing transitions that
encapsulate the state-action-reward sequences experienced by the agent. Within each
episode of the learning process, a fresh initialization of channel gain (hk), RIS phase shifts (Φ),
|𝑗||𝑡|
𝑚𝑎𝑥
{𝛩,𝑎,𝑣}
𝑡=1
𝐾
.
𝑅𝑡→𝑡 (17)
𝑆𝐼𝑁𝑅𝑡→𝑗 =𝑃
max
𝜌𝑡
𝑖=𝑡+1
𝐾
3
𝑃
max
𝜌𝑖+𝜎2
|ℎ𝑗|2
(15)
!"#$%&% '(
)*+
,%
-'%./
0
1
2
1
(
)*+
,-.34
56%54
7/89
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
167
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
and user positions (u) within the designated area (A) is conducted. The UAV’s horizontal
position (v) is set at a predetermined point, and power allocations (ρ) are uniformly distributed
as initial conditions. Following this setup, the system computes the data rates (Ri(t)) for all users,
setting the stage for the initial state (st). The evaluation actor network then processes st to
generate the corresponding action (at), which encompasses decisions on phase shifts, power
allocations, and UAV positioning. The immediate reward (rt), reflective of the system’s
performance as per equation [9], and the subsequent state (st+1), as determined by equation
[8], are determined thereafter. These elements form a transition ({st, at, rt, st+1}) that is
archived within the replay buffer (D). Upon filling the replay buffer, the training phase
commences, wherein each episode entails updating the current transition within D,
followed by the extraction of a mini-batch consisting of NB transitions ({si, ai, ri, si+1}) for
processing. The target Q values (yi) are computed for each transition in the minibatch,
employing the equation:
Here, λ denotes the discount factor, emphasizing the value of future rewards. The critic
evaluation network (θq) is updated by minimizing the loss function:
Subsequently, the actor evaluation network (θu) is refined through gradient ascent, leveraging
the gradient of the Q function with respect to the actor parameters:
The algorithm iterates through these steps, periodically employing soft updates (as per
equation (6)) to gradually align the target networks (θμ′ and θq′) with their evaluation
counterparts, thereby ensuring a stable convergence towards optimal policy and value
function. The reward function rt directly incorporates the secrecy rate equations from our
system model, ensuring that the optimization is closely aligned with the objective of enhancing
PLS The state and action representations are designed to capture the essential elements of
the RIS-equipped UAV-NOMA system, including the UAV's mobility, the RIS's configurability,
and the dynamic wireless environment. Adjustments in the algorithm (e.g., network
architectures, exploration strategies) might be necessary to accommodate the complexity and
specifics of our communication system model. This structured DDPG algorithm, tailored to our
RIS-UAV-NOMA system, provides a mathematical framework for optimizing the system's
performance with respect to PLS leveraging deep reinforcement learning techniques to
address the high-dimensional and non-convex nature of the problem.
RESULTS AND DISCUSSION
In our investigation, we employ a DDPG-driven framework tailored for an RIS-supported UAV-
NOMA communication setup to scrutinize its efficacy in boosting system performance and
security. The simulation setup positions the Base Station (BS) at the coordinate origin while
situating the RIS-equipped UAV at the initial location of (50,0). The designated user area,
defined by the vertices (45,45), (55,45), (55,55), and (45,55) hosts users whose positions are
𝑦𝑖= 𝑟𝑖𝑟𝑖+ 𝜆𝑄
𝑠𝑖+1, 𝜇
𝑠𝑖+1; 𝜃𝜇
;𝜃𝑞
(18)
!"#$%& '
()*&'
()
+
,
+
"-*./"0*12*3#$%%4"'5%
𝛻𝜃𝜇𝐽 = 1
𝑁𝐵𝑖=1
𝑁𝐵
+
𝛻𝑎𝑄(𝑠𝑖,𝜇(𝑠𝑖;𝜃𝜇);𝜃𝑞)|𝛻𝜃𝜇𝜇(𝑠𝑖;𝜃𝜇) (20)
International Journal of Electrical, Energy
and Power System Engineering
Vol. 07, No. 03, October 2024
e-ISSN: 2654-4644
168
predetermined and remain constant throughout each simulation episode. The system assumes
Line-of-Sight (LoS) connectivity for both the BS-to-RIS and RIS-to-user links adopting a Rician
fading model articulated as:
Here, represents the line-of-sight component, HRayleigh signifies the non-line-of-sight
component subject to Rayleigh fading, and is the Rician K-factor set to 10 for our simulations.
The path loss exponent is denoted by α and is chosen as 2. Channel conditions are randomly
generated at exponent is denoted by α and is chosen as 2. Channel conditions are randomly
generated at the onset of each episode and remain static for the episode's duration. The BS
and the RIS-equipped UAV are fixed at altitudes of 20 meters and 30 meters, respectively.
Further parameters include a noise power setting of σ2 = -60dB and a baseline user rate
requirement of Rmin= 1.2 bps/Hz. The network architecture for the Actor network comprises a
dual-layer fully connected neural network for both the evaluation and target networks, with
input and output layers sized according to the dimensions of the state and action vectors.
Activation functions include ReLU for the initial layer and tanh for the output layer to ensure a
strong gradient signal. The Critic network adopts a similar two-layered structure, processing
state and action inputs through separate pathways before merging and applying ReLU
activation, leading into the final output layer. Batch normalization is applied across both
networks to stabilize learning. Hyperparameters for the simulation include an evaluation
network learning rate (β) of 0.0001, a discount factor (λ) of 0.95, a soft update rate (τ) of 0.005,
a replay buffer capacity (C) of 50,000, 1500 episodes, 300 steps per episode, and a minibatch
size (NB) of 16. Exploration noise, introduced to promote policy diversity, follows a complex
Gaussian distribution with zero mean and a variance of 0.1. To incorporate the potential threat
posed by eavesdroppers (Eve) into the system, the simulation contemplates the RIS-to-Eve
channel, denoted as hr,e, and the RIS-to-Eve distance, dI,e. The channel gain experienced by
Eve, while analogous to that of legitimate users, is adjusted for Eve's specific location,
expressed as:
This incorporation enables the system to evaluate and mitigate the risks associated with Eve's
interception attempts, fine-tuning power allocations pi and phase adjustments θn to safeguard
communication channels. In Figure. 3, the graph showcases the progression of secrecy rates
over a sequence of episodes for various learning rates in a UAV-aided communication network
using a DDPG algorithm. The system, aimed at bolstering PLS adapts and refines its
performance across episodes. Learning rates are a critical factor here: a high rate (LR=0.1)
leads to rapid learning but with considerable volatility, potentially due to over-adjustments.
Conversely, lower learning rates, such as LR=0.001, demonstrate a gradual but stable
enhancement in secrecy rates, suggesting a more methodical learning approach that might
converge more reliably to optimal strategies for secure communications. This balance between
speed and stability in learning rates is crucial for the algorithm to efficiently navigate and
optimize the complex dynamics of the secure communication environment.
In Figure. 4 depicts the impact of different numbers of eavesdroppers (denoted by E) on the
secrecy rate in a UAV-assisted secure communication system, as episodes progress. A clear
𝑮=Ω
Ω+1𝑯
$+1
Ω+1𝑯Rayleigh
𝐡𝑒=
𝐡𝑟,𝑒
𝐻𝚽𝐆
(𝑑𝐵,𝐼 𝑑𝐼,𝑒)𝛼
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
169
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
pattern emerges: as the number of eavesdroppers increases from E=2 to E=5, there is a
general trend of decreasing secrecy rate, indicating that more eavesdroppers make it
challenging to maintain high levels of secure communication. The plot with E=2 reaches the
highest secrecy rate more quickly and maintains it with less fluctuation, suggesting that fewer
eavesdroppers make it easier for the system to optimize security. Conversely, with E=5, the
system takes longer to reach a stable and high secrecy rate, and it experiences more
significant variance, reflecting the increased complexity of optimizing secure communication
in the presence of more eavesdropping threats. This illustrates the system's adaptive learning
in response to the number of eavesdroppers, which directly affects the secrecy rate over time.
Figure 3. Different Learning rates
Figure 4. Different Number of Eves
In Figure. 5 displays the evolution of the secrecy rate, measured in bits per second per Hertz
(bps/Hz), across different episodes for varying numbers of users in a secure communication
system aided by an UAV. As the episodes increase, the secrecy rate for each scenario with
different user counts ranging from K=2 to K=5 tends to rise and eventually stabilizes. Initially,
the learning algorithm quickly enhances the secrecy rate for lower user scenarios (K=2 and
K=3), indicating that the system can more easily optimize for fewer users. As the user count
increases (K=4 and K=5), the rate at which the secrecy rate improves slows down, suggesting
that a greater number of users presents additional challenges for the system to optimize secure
communication. However, all scenarios reach a point of convergence, indicating that despite
the complexity introduced by more users, the system's learning algorithm can adapt and
enhance the security over time. In Figure. 6 plots the secrecy rate in bits per second per Hertz
(bps/Hz) against the transmit power in decibels (dB) for different configurations of an RIS using
both a proposed DDPG approach and a random RIS orientation. It compares the performance
International Journal of Electrical, Energy
and Power System Engineering
Vol. 07, No. 03, October 2024
e-ISSN: 2654-4644
170
of a 50-element and 100-element RIS under both methodologies. As the transmit power
increases, all configurations demonstrate an increase in the secrecy rate. The proposed DDPG
algorithm with 100 RIS elements achieves the highest secrecy rate, indicating that the DDPG
approach efficiently optimizes the phase shifts of the RIS for enhanced secure communication.
In contrast, a random RIS orientation, even with the same number of elements, results in a
significantly lower secrecy rate, highlighting the benefits of intelligent phase shift design in RIS-
aided communication systems. The difference in performance between the proposed DDPG
method and the random approach is more pronounced at higher element counts, suggesting
that the advantages of the DDPG algorithm become more substantial as the number of RIS
elements increases.
Figure 5. Different Number of Users
Figure 6. Transmit Power
In Figure. 7 depicts a comparison of secrecy rates achieved by employing an RIS with different
numbers of elements. It compares a proposed DDPG optimization strategy against a random
RIS element orientation at two transmit power levels, 10dB and 30dB. The DDPG strategy
outperforms the random orientation at both power levels, with its advantage more pronounced
at the higher number of RIS elements.
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
171
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
Figure 7. Number of RIS Elements
The DDPG method maintains superior performance even at the lower transmit power,
highlighting the effectiveness of intelligent optimization in enhancing secure wireless
communication.
CONCLUSION
This research successfully demonstrates the potential of integrating Reconfigurable Intelligent
Surfaces (RIS) and Unmanned Aerial Vehicles (UAV) in Non-Orthogonal Multiple Access
(NOMA) downlink networks to enhance physical layer security (PLS). By utilizing the Deep
Deterministic Policy Gradient (DDPG) algorithm, the study achieves significant improvements
in maintaining secure communications while optimizing resource allocation and system
performance. The proposed DDPG-based approach efficiently adapts to dynamic channel
conditions, showcasing robust real-time optimization capabilities for power distribution, RIS
phase shifts, and UAV positioning. The findings from the simulations reveal that incorporating
RIS technology with UAVs allows for better control over the propagation environment,
effectively minimizing the risks associated with eavesdropping. The advanced AI-driven
framework presented in this study not only ensures superior secrecy rates but also
demonstrates the scalability and adaptability required for future 6G wireless communication
systems. The results underline the strategic importance of leveraging AI and machine learning
techniques to address complex non-linear optimization problems in wireless security. This
research paves the way for future exploration into combining reinforcement learning algorithms
with advanced wireless technologies to develop more secure and efficient communication
networks.
REFERENCES
[1] J. Li, S. Xu, J. Liu, Y. Cao, and W. Gao, “Reconfigurable Intelligent Surface Enhanced
Secure Aerial-Ground Communication,” IEEE Transactions on Communications, vol. 69,
no. 9, pp. 61856197, Jun. 2021.
[2] V. N. Vo, C. So-In, H. Tran, D.-D. Tran, and T. P. Huu, “Performance Analysis of an Energy-
Harvesting IoT System Using a UAV Friendly Jammer and NOMA Under Cooperative
Attack,” IEEE Access, vol. 8, pp. 221986222000, Dec. 2020.
[3] A. S. Abdalla, T. F. Rahman, and V. Marojevic, “UAVs with reconfigurable intelligent
surfaces: Applications, challenges, and opportunities,” arXiv preprint arXiv:2012.04775,
2020.
International Journal of Electrical, Energy
and Power System Engineering
Vol. 07, No. 03, October 2024
e-ISSN: 2654-4644
172
[4] W. U. Khan, A. Mahmood, C. K. Sheemar, E. Lagunas, S. Chatzinotas, and B. Ottersten,
“Reconfigurable Intelligent Surfaces for 6G Non-Terrestrial Networks: Assisting
Connectivity from the Sky,” IEEE Internet of Things Magazine, vol. 7, no. 1, pp. 3439,
Jan. 2024.
[5] T. Alladi, Naren, G. Bansal, V. Chamola, and M. Guizani, “Secauthuav: A novel
authentication scheme for uav-ground station and uav-uav communication,” IEEE
Transactions on Vehicular Technology, vol. 69, no. 12, pp. 1506815077, Oct. 2020.
[6] S. Jiao, X. Xie, and Z. Ding, “Deep reinforcement learning based optimization for irs based
uav-noma downlink networks,” arXiv preprint arXiv:2106.09616, 2021.
[7] J. Yao and J. Xu, “Joint 3D Maneuver and Power Adaptation for Secure UAV
Communication With CoMP Reception,” IEEE Transactions on Wireless Communications,
vol. 19, no. 10, pp. 69927006, Jul. 2020.
[8] M. Najafi, V. Jamali, R. Schober, and H. V. Poor, “Physics-Based Modeling and Scalable
Optimization of Large Intelligent Reflecting Surfaces,” IEEE Transactions on
Communications, vol. 69, no. 4, pp. 26732691, Dec. 2021.
[9] F. Zhou, Z. Li, J. Cheng, Q. Li, and J. Si, “Robust AN-Aided Beamforming and Power
Splitting Design for Secure MISO Cognitive Radio With SWIPT,” IEEE Transactions on
Wireless Communications, vol. 16, no. 4, pp. 24502464, Mar. 2017.
[10] X. Zhang, X. Zhou, and M. R. McKay, “Enhancing Secrecy with Multi-Antenna
Transmission in Wireless Ad Hoc Networks,” IEEE Transactions on Information Forensics
and Security, vol. 8, no. 11, pp. 18021814, Aug. 2013.
[11] Y. Liu, X. Liu, X. Mu, T. Hou, J. Xu, M. Di Renzo, and N. Al-Dhahir, “Reconfigurable
Intelligent Surfaces: Principles and Opportunities,” IEEE Communications Surveys
Tutorials, vol. 23, no. 3, pp. 15461577, May 2021.
[12] X. Sun, D. W. K. Ng, Z. Ding, Y. Xu, and Z. Zhong, “Physical layer security in uav systems:
Challenges and opportunities,” IEEE Wireless Communications, vol. 26, no. 5, pp. 4047,
Oct. 2019.
[13] Z. Zhang, C. Zhang, C. Jiang, F. Jia, J. Ge, and F. Gong, “Improving physical layer security
for reconfigurable intelligent surface aided NOMA 6G networks,” IEEE Transactions on
Vehicular Technology, vol. 70, no. 5, pp. 44514463, Mar. 2021.
[14] C. Huang, G. C. Alexandropoulos, A. Zappone, C. Yuen, and M. Debbah, “Deep Learning
for UL/DL Channel Calibration in Generic Massive MIMO Systems,” in ICC 2019 - 2019
IEEE International Conference on Communications (ICC), Shanghai, China, Jul. 2019, pp.
1–6.
[15] T. Wang, F. Fang, and Z. Ding, “An SCA and Relaxation Based Energy Efficiency
Optimization for Multi-User RIS-Assisted NOMA Networks,” IEEE Transactions on
Vehicular Technology, vol. 71, no. 6, pp. 68436847, Mar. 2022.
[16] J. Chen, L. Guo, J. Jia, J. Shang, and X. Wang, “Resource Allocation for IRS Assisted
SGF NOMA Transmission: A MADRL Approach,” IEEE Journal on Selected Areas in
Communications, vol. 40, no. 4, pp. 13021316, 2022.
[17] J. Chen, Y. Xie, X. Mu, J. Jia, Y. Liu, and X. Wang, “Energy Efficient Resource Allocation
for IRS Assisted CoMP Systems,” IEEE Transactions on Wireless Communications, vol.
21, no. 7, pp. 56885702, Jan. 2022.
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
173
Received: April 9, 2024, Revised: May 9, 2024, Accepted: October 30, 2024
https://doi.org/10.31258/ijeepse.7.3.158-174
[18] J. Chen, Z. Ma, Y. Zou, J. Jia, and X. Wang, “DRL-based Energy Efficient Resource
Allocation for STAR-RIS Assisted Coordinated Multi-Cell Networks,” in GLOBECOM 2022
- 2022 IEEE Global Communications Conference, Jan. 2022, pp. 42324237.
[19] X. Mu, Y. Liu, L. Guo, J. Lin, and H. V. Poor, “Intelligent Reflecting Surface Enhanced
Multi-UAV NOMA Networks,” IEEE Journal on Selected Areas in Communications, vol.
39, no. 10, pp. 30513066, Jun. 2021.
[20] J. Zhao, Y. Zhu, X. Mu, K. Cai, Y. Liu, and L. Hanzo, “Simultaneously Transmitting and
Reflecting Reconfigurable Intelligent Surface (STAR-RIS) Assisted UAV
Communications,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 10,
pp. 3041–3056, Aug. 2022.
[21] Z. Tang, T. Hou, Y. Liu, J. Zhang, and C. Zhong, “A Novel Design of RIS for Enhancing
the Physical Layer Security for RIS-Aided NOMA Networks,” IEEE Wireless
Communications Letters, vol. 10, no. 11, pp. 23982401, Aug. 2021.
[22] M. Cui, G. Zhang, and R. Zhang, “Secure Wireless Communication via Intelligent
Reflecting Surface,” IEEE Wireless Communications Letters, vol. 8, no. 5, pp. 14101414,
May 2019.
[23] X. Yu, D. Xu, Y. Sun, D. W. K. Ng, and R. Schober, “Robust and Secure Wireless
Communications via Intelligent Reflecting Surfaces,” IEEE Journal on Selected Areas in
Communications, vol. 38, no. 11, pp. 26372652, Jul. 2020.
[24] Y. Han, N. Li, Y. Liu, T. Zhang, and X. Tao, “Artificial Noise Aided Secure NOMA
Communications in STAR-RIS Networks,” IEEE Wireless Communications Letters, vol.
11, no. 6, pp. 11911195, Mar. 2022.
[25] H. Long, M. Chen, Z. Yang, Z. Li, B. Wang, X. Yun, and M. ShikhBahaei, “Joint Trajectory
and Passive Beamforming Design for Secure UAV Networks with RIS,” in 2020 IEEE
Globecom Workshops (GC Wkshps), Mar. 2020, pp. 16.
[26] D. Wang, Y. Zhao, Y. Lou, L. Pang, Y. He, and D. Zhang, “Secure NOMA Based RIS-UAV
Networks: Passive Beamforming and Location Optimization,” in GLOBECOM 2022 - 2022
IEEE Global Communications Conference, Rio de Janeiro, Brazil, Jan. 2022, pp. 3168
3173.
[27] L. Li, Q. Cheng, X. Tang, T. Bai, W. Chen, Z. Ding, and Z. Han, “Resource Allocation for
NOMA-MEC Systems in Ultra-Dense Networks: A Learning Aided Mean-Field Game
Approach,” IEEE Transactions on Wireless Communications, vol. 20, no. 3, pp. 1487
1500, Nov. 2021.
[28] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic
policy gradient algorithms,” in International Conference on Machine Learning. PMLR,
2014, pp. 387395.
[29] S. Zhang and R. S. Sutton, “A deeper look at experience replay,” arXiv preprint
arXiv:1712.01275, 2017.
[30] H. Peng and X. S. Shen, “DDPG-based Resource Management for MEC/UAV-Assisted
Vehicular Networks,” in 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-
Fall), Victoria, BC, Canada, Feb. 2020, pp. 16.
International Journal of Electrical, Energy
and Power System Engineering
Vol. 07, No. 03, October 2024
e-ISSN: 2654-4644
174
BIOGRAPHIES OF AUTHORS
SYED ZAIN UL ABIDEEN received the B.Sc. degree in software engineering from
University of Science and Technology Bannu, kpk Pakistan in 2018. Since 2018, he has
worked as a Full Stack Developer in various companies, mastering a wide range of
programming languages. He is currently pursuing the Master degree with College of
Computer Science and Technology, Qingdao University, Qingdao 266071, China. his
main research interests include wireless communication, machine learning, deep
learning, physical layer security, reconfigurable intelligent surface and backscatter
communication.
ABDUL WAHID received the B.S. degree in Computer Science from the University of
Peshawar, Pakistan, and the M.S. degree in Computer Science and Technology from
the Southwest University of Science and Technology (SWUST), Mianyang, Sichuan,
China, in 2017 and 2020, respectively. He is currently pursuing the Ph.D. degree with
the College of Computer Science and Technology, Qingdao University, Qingdao,
Shandong, China. His current research interests include wireless communications,
physical layer security, reconfigurable intelligent surface, machine learning, and 6G
technology.
MIAN MUHAMMAD KAMAL received the Master degree from Northwestern
Polytechnical University, in 2018, and the Ph.D. degree from the Zhengzhou University
in 2023. He was a part of the Next Generation Ubiquitous Network Laboratory, College
of Electronic and Information, Northwestern Polytechnical University, China, from Sep
2016 to June 2019. He is currently a Postdoctoral Fellow at Southeast University. His
research interests include MIMO antenna systems, smart-phone antennas, slot
antennas, 5G antennas, millimeter-wave, terahertz components, origami and kirigami
antennas, multiband antennas, and meta-surfaces. He is the author or coauthor of more
than 25 technical journals and conference papers.
... RIS is an advanced programmable metasurface capable of precisely modifying the electromagnetic properties of incoming signals, thus intelligently controlling the radio propagation environment [10]. This capability has gained significant interest from the academic community, leading to research on its integration into diverse wireless systems such as vehicular networks [11], mmWave communication [12], and integrated space-air-ground communication frameworks [13,14], [15] demonstrates how RIS combined with UAVs can optimize energy efficiency and secrecy performance. It focuses on using DRL to adjust RIS phase shifts dynamically under uncertain channel conditions, making it highly relevant to secure UAV-aided communication systems. ...
... Spatial parameters, such as the distances between the transmitter and the RIS. (25), to users (30), and from the RIS to Eves (20), along with the variable user-RIS distances (15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25), play a pivotal role in shaping the agent's strategy. These distances influence the optimization of signal quality and security through RIS manipulation. ...
Article
Full-text available
In 6 G wireless networks, secure communication is crucial due to the inherent susceptibility of electromagnetic waves to eavesdropping. Reconfigurable intelligent surfaces (RIS) and backscatter communication technologies offer promising solutions by securely directing signals to authorized users, even in the presence of multiple passive eavesdroppers equipped with multi-antenna setups. This paper proposes a novel RIS-enhanced backscatter communication system that utilizes radio frequency (RF) signals from a power beacon (PB) to transmit confidential information to multiple authorized users, each equipped with a single antenna. To optimize system performance, the deep deterministic policy gradient (DDPG) algorithm is employed to dynamically control RIS beamforming and mitigate eavesdropping attempts by adversaries using linear decoding techniques. Simulation results demonstrate that the proposed DDPG-based strategy significantly improves multicast secrecy rates while satisfying transmit power and unit modulus constraints. Compared to conventional optimization methods, the DDPG algorithm enhances the alignment of RIS reflections toward intended users and minimizes signal leakage to eavesdroppers. This research highlights how RIS and backscatter communication technologies can enhance security and energy efficiency in 6 G networks, providing a scalable solution to reduce eavesdropping threats in future wireless systems.
Article
Full-text available
A novel air-to-ground communication paradigm is conceived, where an unmanned aerial vehicle (UAV)-mounted base station (BS) equipped with multiple antennas sends information to multiple ground users (GUs) with the aid of a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS). In contrast to the conventional RIS whose main function is to reflect incident signals, the STAR-RIS is capable of both transmitting and reflecting the impinging signals from either side of the surface, thereby leading to full-space 360 degree coverage. However, the transmissive and reflective capabilities of the STAR-RIS require more complex transmission/reflection coefficient design. Therefore, in this work, a sum-rate maximization problem is formulated for the joint optimization of the UAV’s trajectory, the active beamforming at the UAV, and the passive transmission/reflection beamforming at the STAR-RIS. This cutting-edge optimization problem is also subject to the UAV’s flight safety, to the maximum flight duration constraint, as well as to the GUs’ minimum data rate requirements. Given the unknown locations of obstacles prior to the UAV’s flight, we provide an online decision making framework employing reinforcement learning (RL) to simultaneously adjust both the UAV’s trajectory as well as the active and passive beamformer. To enhance the system’s robustness against the associated uncertainties caused by limited sampling of the environment, a novel “distributionally-robust” RL (DRRL) algorithm is proposed for offering an adequate worst-case performance guarantee. Our numerical results unveil that: 1) the STAR-RIS assisted UAV communications benefit from significant sum-rate gain over the conventional reflecting-only RIS; and 2) the proposed DRRL algorithm achieves both more stable and more robust performance than the state-of-the-art RL algorithms.
Article
Full-text available
This study investigates the application of deep deterministic policy gradient (DDPG) to reconfigurable intelligent surface (RIS)-based unmanned aerial vehicles (UAV)-assisted non-orthogonal multiple access (NOMA) downlink networks. The deployment of UAV equipped with a RIS is important, as the UAV increases the flexibility of the RIS significantly, especially for the case of users who have no line-of-sight (LoS) path to the base station (BS). Therefore, the aim of this study is to maximize the sum-rate by jointly optimizing the power allocation of the BS, the phase shifting of the RIS, and the horizontal position of the UAV. The formulated problem is non-convex, the DDPG algorithm is utilized to solve it. The computer simulation results are provided to show the superior performance of the proposed DDPG-based algorithm.
Article
Full-text available
This letter proposes a novel design of reconfigurable intelligent surface (RIS) to enhance the physical layer security (PLS) in the RIS-aided non-orthogonal multiple access (NOMA) network. Under the design of the RIS, the problem of increasing the number of RIS elements damaging the secrecy performance is solved. Besides, it also ensures that the networks can use traditional channel coding schemes to achieve secrecy. Our results show that the novel design of the RIS is ready for enhancing secrecy performance.
Article
Sixth-generation (6G) non-terrestrial networks (NTNs) are advanced wireless communication systems that operate beyond traditional terrestrial networks. These networks utilize various technologies and platforms to provide flexible, enhanced connectivity and coverage. When operating at high frequency, ground user terminals require low-directional antennas, which experience poor link budgets from satellites and thus drive the quest for novel solutions. Reconfigurable Intelligent Surfaces (RISs) have recently emerged as a promising technology for 6G and beyond cellular systems. This article studies the potential of RIS-in-tegrated NTNs to revolutionize next-generation connectivity. First, it discusses the fundamentals of RIS technology. Secondly, it delves into reporting the recent advances in RIS-integrated NTNs. Subsequently, it presents a novel framework based on the current state-of-the-art for IRS-integrated NTNs with classical single connected diagonal RIS and fully connected beyond diagonal RIS architectures. Finally, the article highlights open challenges and future research directions to revolutionize the realm of RIS-integrated NTNs.
Article
Reconfigurable intelligent surface (RIS) assisted non-orthogonal multiple access (NOMA) transmission can effectively improve the energy/spectrum efficiency in wireless networks. This paper designs a low-complexity scheme to achieve the balanced tradeoff between the sum-rate and power consumption in a RIS-assisted NOMA system, which can be measured by energy efficiency. To solve the formulated problem effectively, the original non-convex problem is first decomposed into two subproblems, i.e., beamforming optimization and phase shift optimization. Alternating optimization is proposed to solve these two subproblems iteratively. In particular, successive convex approximation (SCA) is utilized to convert the non-convex constraints to convex ones. The provided simulation results demonstrate that the proposed scheme can achieve superior performance on energy efficiency compared to the random phase shifts and orthogonal multiple access (OMA) schemes.
Article
Combination of simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) and non-orthogonal multiple access (NOMA) is a win-win strategy which can significantly enhance the coverage performance. However, eavesdroppers may enjoy similar performance gains as the legitimate users. To solve this problem, an artificial noise (AN) assisted secure communication strategy is proposed to maximize the secrecy rate. An alternating optimization (AO) based iterative algorithm leveraging the classical successive convex approximation (SCA) and the semidefinite relaxation (SDR) techniques is proposed to derive the optimal AN model and the RIS parameters. It is found that the proposed algorithm provides better secrecy performance with less AN power compared with the benchmark schemes. More RIS elements help reducing the AN power, while this effect shrinks when the number of RIS elements is sufficiently large. Increasing the number of transmit antennas reduces the AN power if the eavesdropper is quite close to the transmitter, while improves it when the eavesdropper is far away.
Article
Non-orthogonal multiple access (NOMA) assisted semi-grant-free (SGF) transmission has been viewed as one of the promising technologies to meet massive connectivity requirements of the next-generation networks. A novel intelligent reconfigurable surface (IRS) assisted SGF NOMA transmission system is proposed, where the IRS is employed to satisfy the channel gain requirements for grant-based users (GBUs) and grant-free users (GFUs). The dynamic optimization on the sub-carrier assignment and power allocation for roaming GFUs, and the amplitude control and phase shift design for reflecting elements of the IRS, is formulated. Aiming at maximizing the long-term data rate of all GFUs, the optimization problem is first modeled as a multi-agent Markov decision problem. Then, three multi-agent deep reinforcement learning based frameworks are proposed to solve the problem under three different IRS cases, including the ideal IRS, non-ideal IRS with continuous phase shifts, and non-ideal IRS with discrete phase shifts. Specifically, for each GFU agent, a sub-carrier assignment deep Q-network (DQN) and a power allocation deep deterministic policy gradient (DDPG) are integrated to dynamically assign network resources for each GFU. For the only IRS agent, two DDPGs are integrated to dynamically assign phase shift and amplitude for each reflecting element of ideal IRS. The single DDPG for dynamically assigning continuous phase shifts, and parallel DQNs for dynamically assigning discrete phase shifts for non-ideal IRS with fixed amplitude are also proposed. Simulation results demonstrate that: 1) The network sum rates of all GFUs can achieve a significant improvement with the aid of IRS, comparing with the system without IRS. 2) The network sum rates of the NOMA assisted SGF transmissions are superior to that of OMA assisted GF transmissions.
Article
Reconfigurable intelligent surfaces (RISs), also known as intelligent reflecting surfaces (IRSs), or large intelligent surfaces (LISs), 1 have received significant attention for their potential to enhance the capacity and coverage of wireless networks by smartly reconfiguring the wireless propagation environment. Therefore, RISs are considered a promising technology for the sixth-generation (6G) of communication networks. In this context, we provide a comprehensive overview of the state-of-the-art on RISs, with focus on their operating principles, performance evaluation, beamforming design and resource management, applications of machine learning to RIS-enhanced wireless networks, as well as the integration of RISs with other emerging technologies. We describe the basic principles of RISs both from physics and communications perspectives, based on which we present performance evaluation of multiantenna assisted RIS systems. In addition, we systematically survey existing designs for RIS-enhanced wireless networks encompassing performance analysis, information theory, and performance optimization perspectives. Furthermore, we survey existing research contributions that apply machine learning for tackling challenges in dynamic scenarios, such as random fluctuations of wireless channels and user mobility in RIS-enhanced wireless networks. Last but not least, we identify major issues and research opportunities associated with the integration of RISs and other emerging technologies for applications to next-generation networks. 1 Without loss of generality, we use the name of RIS in the remainder of this paper. </fn