Citation: Ma, W.; Zhuo, L.; Li, L.; Liu, Y.; Ren, H. Deep Reinforcement Learning for RIS-Aided Multiuser MISO System with Hardware Impairments. Appl. Sci. 2022, 12, 7236. https://doi.org/10.3390/app12147236
Academic Editor: Christos Bouras
Received: 31 May 2022
Accepted: 13 July 2022
Published: 18 July 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Deep Reinforcement Learning for RIS-Aided Multiuser MISO
System with Hardware Impairments
Wenjie Ma, Liuchang Zhuo, Luchu Li, Yuhao Liu and Hong Ren *
National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China;
213200837@seu.edu.cn (W.M.); 213203755@seu.edu.cn (L.Z.); 213203234@seu.edu.cn (L.L.);
213202125@seu.edu.cn (Y.L.)
*Correspondence: hren@seu.edu.cn
Abstract:
In this paper, we study a reconfigurable intelligent surface (RIS)-aided multiuser MISO
system with imperfect hardware, where the transceiver design is based on the statistical channel state
information (CSI). Considering the transceiver hardware impairments (HWI), we aim to maximize
the minimum average user data rate, where the precoding matrices at the base station (BS) and
the reflecting phase shifts at the RIS are jointly optimized. Since the problem is nonconvex and the
objective function cannot be derived in closed form, we adopt the deep deterministic policy gradient
(DDPG) algorithm to deal with this challenging optimization problem, where we generate a set of
CSI vectors in an offline way, and then these data sets are used to train the neural networks. The
simulation results demonstrate the rapid convergence speed of the adopted DDPG algorithm and
also emphasize that it is crucial to consider the HWI when optimizing the transceiver.
Keywords:
intelligent reflecting surface (IRS); reconfigurable intelligent surface (RIS); hardware
impairment (HWI); deep deterministic policy gradient (DDPG)
1. Introduction
Thanks to its low power consumption and low hardware cost, the reconfigurable intelligent surface (RIS) is recognized as one of the most promising techniques for future sixth-generation (6G) wireless systems [1–6]. The RIS consists of an array of passive, low-cost reflecting elements whose phase shifts can be tuned. The authors of [7,8] studied RIS-aided multicell systems and RIS-aided simultaneous wireless information and power transfer, respectively. Low-complexity algorithms were developed to jointly optimize the precoding matrices at the base station (BS) and the reflecting phase shifts at the RIS. However, the contributions in [7,8] were based on the ideal assumption of perfect hardware, which rarely holds in practice. Practical communication systems suffer from inevitable transceiver hardware impairments (HWI), which cause signal distortion and cannot be ignored in the transceiver design.
The authors of [9] derived the closed-form data rate expression for RIS-aided communication systems and then analyzed the impact of HWI on such systems. A RIS-aided single-user communication system with HWI was studied in [10], where the phase shifts of the RIS were optimized by the majorization-minimization (MM) algorithm. Recently, the joint beamforming and phase shift design was studied in a RIS-aided physical layer security system in [11]. Besides the transceiver hardware impairment, the authors of [12] further considered the impact of the phase noise at the RIS and derived the closed-form data rate expression, based on which the genetic algorithm was adopted to solve the phase shift optimization problem. In [13], the RIS-aided communication system for serving a mobile user was studied, and the authors proposed an algorithm to predict the positions of the user under HWI. In [14], the authors analyzed the outage performance of RIS-aided non-orthogonal multiple access systems with HWI, where both near-field and far-field users were considered. Most recently, robust transceiver design for RIS-aided communication systems was studied in [15], where both imperfect CSI and HWI were taken into account. Semidefinite programming was used to solve the robust problem.
However, all the above papers were based on the assumption that the BS can acquire the instantaneous CSI, which is challenging in practice due to the limited channel coherence time. Recently, researchers have focused on phase shift design based on statistical CSI, such as location/angle information or channel distribution information (e.g., channel covariance matrices), which varies on a much slower time scale than the instantaneous CSI. There are several advantages to using statistical CSI for transceiver design [16]. Firstly, the channel estimation overhead is reduced, as only the slowly varying statistical CSI is needed. Secondly, the computational complexity is significantly reduced, as the phase shifts at the RIS only need to be recomputed when the statistical CSI changes. Thirdly, the feedback overhead is decreased, since the phase shift values are fed back to the RIS controller only when they are updated, which happens only when the statistical CSI changes. Due to these appealing advantages, transceiver design based on statistical CSI for RIS-aided systems has attracted extensive research attention [17,18]. Specifically, the authors of [17] derived the closed-form data rate expression for a RIS-aided multiuser system. Then, a genetic algorithm was proposed to optimize the phase shifts, which depend only on the statistical CSI. As a step further, the authors of [18] extended the work in [17] to the practical case of imperfect hardware, and a robust transmission design was proposed to optimize the phase shifts by considering HWI.
However, the contributions in [17,18] considered a two-time-scale design, where the BS designed its precoding matrices based on the instantaneous effective CSI, while only the phase shifts were designed based on statistical CSI. This means that the BS still needs to estimate the instantaneous effective CSI, which incurs sizable channel estimation overhead in highly mobile scenarios. Against this background, the authors of [19] studied the transceiver design for RIS-aided communication systems based on fully statistical CSI, where both the precoding matrices at the BS and the reflecting phase shifts at the RIS were designed based on statistical CSI. However, that work was based on the ideal assumption of perfect hardware, which rarely holds in practice. To fill this gap, we study the fully statistical CSI-based design under HWI. The contributions of this work are summarized as follows:
• We optimize the precoding matrices at the BS and the reflecting phase shifts at the RIS based on statistical CSI to maximize the minimum user data rate, which ensures fairness among the users, while taking the imperfect hardware into account.
• Due to the expectation operator along with the hardware impairment, it is challenging to derive the closed-form data rate expression. Furthermore, the max-min objective function is non-differentiable. As a result, the existing algorithms based on mathematical derivations are not applicable. Instead, we resort to the deep deterministic policy gradient (DDPG) algorithm to solve this challenging optimization problem.
• The algorithm converges quickly, within 600–900 iterations, and the overall computational complexity comes mainly from the calculation of rewards, which involves only simple mathematical operations. In addition, the calculated parameters can be reused in subsequent steps and only need to be recalculated when the statistical CSI changes. Once the neural network is trained, it can be directly applied in real-time applications with only simple mathematical calculations, and it only needs to be retrained when the statistical CSI changes. Hence, the computational complexity is low.
2. System Model
We consider a RIS-aided downlink multiuser system where the base station (BS) is equipped with $M$ antennas and each of the $K$ users has a single antenna. The system architecture is shown in Figure 1. In this system, we assume that the RIS has $N$ reflecting elements. Considering the hardware impairment, the transmit signal at the BS can be expressed as
$$\mathbf{x} = \sum_{k=1}^{K} \left( \mathbf{w}_k s_k + \boldsymbol{\eta}_s \right), \tag{1}$$

where $\mathbf{w}_k \in \mathbb{C}^{M \times 1}$ represents the beamforming vector from the BS to the $k$-th user and $s_k \sim \mathcal{CN}(0, 1)$ represents the data symbol transmitted to the $k$-th user, which satisfies $\mathbb{E}\{|s_k|^2\} = 1$. Furthermore, $\boldsymbol{\eta}_s$ denotes the independent Gaussian distortion noise, which has zero mean and a power proportional to the transmit power of each antenna. Then, $\boldsymbol{\eta}_s$ can be represented as $\boldsymbol{\eta}_s \sim \mathcal{CN}\left(\mathbf{0}, k_s \,\mathrm{diag}\left\{\sum_{k=1}^{K} \mathbf{w}_k \mathbf{w}_k^H\right\}\right)$, where $k_s \in (0, 1)$ denotes the normalized variance of the transmit distortion noise.
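As a small illustration of the transmit-side impairment model, the NumPy sketch below draws one realization of the distortion noise with covariance $k_s \,\mathrm{diag}\{\sum_k \mathbf{w}_k \mathbf{w}_k^H\}$. The function name `sample_tx_distortion`, the random beamformer and the seed are illustrative assumptions and not part of the paper's simulation code.

```python
import numpy as np

def sample_tx_distortion(W, k_s, rng):
    """Draw eta_s ~ CN(0, k_s * diag(sum_k w_k w_k^H)) for an M x K beamformer W."""
    # The diagonal of W W^H is the per-antenna transmit power (row-wise sum of |W|^2).
    per_antenna_power = np.sum(np.abs(W) ** 2, axis=1)
    # Split each variance equally between the real and imaginary parts.
    std = np.sqrt(k_s * per_antenna_power / 2.0)
    M = W.shape[0]
    return std * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

rng = np.random.default_rng(0)
M, K, k_s = 8, 4, 0.01  # illustrative sizes and impairment level (assumed here)
W = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
eta_s = sample_tx_distortion(W, k_s, rng)
```

Because the covariance is diagonal, the distortion is independent across antennas, and an antenna that radiates more power also generates more distortion.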
Figure 1. System Model.
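For $K$ users, the beamforming matrix at the BS can be expressed as

$$\mathbf{W} = [\mathbf{w}_1, \cdots, \mathbf{w}_K]. \tag{2}$$

The beamforming matrix $\mathbf{W}$ has to satisfy the power constraint, which can be formulated as

$$\mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} \le P_{\max}. \tag{3}$$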
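Furthermore, the channel between the BS and the RIS is denoted as $\mathbf{H}_{SI} \in \mathbb{C}^{N \times M}$, the channel between the BS and the $k$-th user is denoted as $\mathbf{h}_{SD,k} \in \mathbb{C}^{M \times 1}$, and the channel from the RIS to the $k$-th user is denoted as $\mathbf{h}_{ID,k} \in \mathbb{C}^{N \times 1}$. In this paper, we consider the Rician fading model, under which the channels $\mathbf{H}_{SI}$, $\mathbf{h}_{SD,k}$ and $\mathbf{h}_{ID,k}$ can be formulated as

$$\mathbf{H}_{SI} = \sqrt{\beta} \left( \sqrt{\frac{\delta}{\delta + 1}}\, \bar{\mathbf{H}}_{SI} + \sqrt{\frac{1}{\delta + 1}}\, \tilde{\mathbf{H}}_{SI} \right), \tag{4}$$

$$\mathbf{h}_{SD,k} = \sqrt{\gamma_k} \left( \sqrt{\frac{\rho_k}{\rho_k + 1}}\, \bar{\mathbf{h}}_{SD,k} + \sqrt{\frac{1}{\rho_k + 1}}\, \tilde{\mathbf{h}}_{SD,k} \right), \tag{5}$$

$$\mathbf{h}_{ID,k} = \sqrt{\alpha_k} \left( \sqrt{\frac{\varepsilon_k}{\varepsilon_k + 1}}\, \bar{\mathbf{h}}_{ID,k} + \sqrt{\frac{1}{\varepsilon_k + 1}}\, \tilde{\mathbf{h}}_{ID,k} \right), \tag{6}$$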
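where $\beta$, $\gamma_k$ and $\alpha_k$ are the large-scale path loss coefficients; $\delta$, $\rho_k$ and $\varepsilon_k$ are the Rician factors; $\bar{\mathbf{H}}_{SI}$, $\bar{\mathbf{h}}_{SD,k}$ and $\bar{\mathbf{h}}_{ID,k}$ represent the line-of-sight components, which are statistical CSI and remain unchanged over a long time. When using the uniform planar array model, the line-of-sight components $\bar{\mathbf{H}}_{SI}$, $\bar{\mathbf{h}}_{SD,k}$ and $\bar{\mathbf{h}}_{ID,k}$ can be formulated as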
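$$\bar{\mathbf{H}}_{SI} = \mathbf{a}_N(\theta_A^a, \theta_A^e)\, \mathbf{a}_M^H(\phi_D^a, \phi_D^e), \tag{7}$$

$$\bar{\mathbf{h}}_{ID,k} = \mathbf{a}_N(\theta_{D,k}^a, \theta_{D,k}^e), \tag{8}$$

$$\bar{\mathbf{h}}_{SD,k} = \mathbf{a}_M(\phi_{D,k}^a, \phi_{D,k}^e), \tag{9}$$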
where $\theta_A^a$ and $\theta_A^e$ denote the azimuth and elevation angles of arrival at the RIS from the BS, respectively; $\phi_D^a$ and $\phi_D^e$ denote the azimuth and elevation angles of departure from the BS to the RIS, respectively; $\theta_{D,k}^a$ and $\theta_{D,k}^e$ ($\phi_{D,k}^a$, $\phi_{D,k}^e$) are the azimuth and elevation angles of departure from the RIS to the $k$-th user (from the BS to the $k$-th user). These angles are randomly generated. In addition, the array response vector is defined as
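$$\mathbf{a}_X(\vartheta^a, \vartheta^e) = \left[ 1, \cdots, e^{j 2\pi \frac{d}{\lambda} \left( x \sin\vartheta^a \sin\vartheta^e + y \cos\vartheta^e \right)}, \cdots, e^{j 2\pi \frac{d}{\lambda} \left( (\sqrt{X} - 1) \sin\vartheta^a \sin\vartheta^e + (\sqrt{X} - 1) \cos\vartheta^e \right)} \right]^T, \tag{10}$$

where $X$ can be substituted as $M$, $N$ or $K$; $\vartheta^a$ and $\vartheta^e$ denote the azimuth and elevation angles, respectively; $d$ and $\lambda$ represent the antenna spacing at the BS and the wavelength, respectively.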
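Moreover, $\tilde{\mathbf{H}}_{SI} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_{HR} \otimes \mathbf{R}_{HB})$, $\tilde{\mathbf{h}}_{SD,k} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_{hB,k})$ and $\tilde{\mathbf{h}}_{ID,k} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_{hR,k})$ represent the non-line-of-sight components, with $\mathbf{R}_{HR}$, $\mathbf{R}_{HB}$, $\mathbf{R}_{hB,k}$ and $\mathbf{R}_{hR,k}$ being the corresponding spatial covariance matrices, which are given by $[\mathbf{R}_{HB}]_{i,j} = \rho^{|i-j|}$, $[\mathbf{R}_{HR}]_{i,j} = \rho^{|i-j|}$, $[\mathbf{R}_{hB}]_{i,j} = \rho^{|i-j|}$ and $[\mathbf{R}_{hR}]_{i,j} = \rho^{|i-j|}$, where $\rho$ represents the correlation coefficient.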
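To make the channel model concrete, the sketch below builds the exponential correlation matrix with entries $\rho^{|i-j|}$ and draws one correlated Rician vector. It assumes that the path loss coefficient enters through a square root, as in (4), and it uses an all-ones vector as a stand-in for the line-of-sight component; the function names and numerical values are hypothetical.

```python
import numpy as np

def exp_correlation(n, rho):
    """n x n matrix with entries rho**|i-j| (exponential correlation model)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def rician_vector(h_los, R, rician_factor, path_gain, rng):
    """One draw of sqrt(g) * (sqrt(k/(k+1)) * h_los + sqrt(1/(k+1)) * h_nlos), h_nlos ~ CN(0, R)."""
    n = h_los.shape[0]
    L = np.linalg.cholesky(R)  # R = L L^H, so L z has covariance R when z ~ CN(0, I)
    z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    h_nlos = L @ z
    k = rician_factor
    return np.sqrt(path_gain) * (np.sqrt(k / (k + 1)) * h_los + np.sqrt(1 / (k + 1)) * h_nlos)

rng = np.random.default_rng(1)
N = 20
R = exp_correlation(N, rho=0.1)
h_los = np.ones(N, dtype=complex)  # placeholder LoS vector (assumption)
h = rician_vector(h_los, R, rician_factor=3.0, path_gain=1e-7, rng=rng)
```

Only the non-line-of-sight part is random, which is why the expectation in the design objective is taken over that component while the line-of-sight part acts as statistical CSI.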
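$$[\mathbf{a}_X(\theta^a, \theta^e)]_x = \exp\left\{ j 2\pi \frac{d}{\lambda} \left( \left\lfloor \frac{x-1}{\sqrt{X}} \right\rfloor \sin\theta^a \sin\theta^e + \big( (x-1) \bmod \sqrt{X} \big) \cos\theta^e \right) \right\}. \tag{11}$$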
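$$\mathrm{SINR}_k = \frac{\left| \left( \mathbf{h}_{ID,k}^H \boldsymbol{\Phi} \mathbf{H}_{SI} + \mathbf{h}_{SD,k}^H \right) \mathbf{w}_k \right|^2}{\sum_{i=1, i \neq k}^{K} (1 + k_B) \left| \left( \mathbf{h}_{ID,k}^H \boldsymbol{\Phi} \mathbf{H}_{SI} + \mathbf{h}_{SD,k}^H \right) \mathbf{w}_i \right|^2 + (1 + k_B) \sigma_k^2 + \Gamma_k(\mathbf{w}, \boldsymbol{\Phi})}. \tag{12}$$

$$\Gamma_k(\mathbf{w}, \boldsymbol{\Phi}) = \left( \mathbf{h}_{ID,k}^H \boldsymbol{\Phi} \mathbf{H}_{SI} + \mathbf{h}_{SD,k}^H \right) \left[ k_B \mathbf{w}_k \mathbf{w}_k^H + (1 + k_B) k_s\, \mathrm{diag}\left\{ \sum_{i=1}^{K} \mathbf{w}_i \mathbf{w}_i^H \right\} \right] \left( \mathbf{H}_{SI}^H \boldsymbol{\Phi}^H \mathbf{h}_{ID,k} + \mathbf{h}_{SD,k} \right). \tag{13}$$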
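Thus, the signal received at the $k$-th user can be written as

$$\begin{aligned} y_k ={}& \underbrace{\mathbf{h}_{ID,k}^H \boldsymbol{\Phi} \mathbf{H}_{SI} (\mathbf{w}_k s_k + \boldsymbol{\eta}_s)}_{\text{reflected link}} + \underbrace{\mathbf{h}_{SD,k}^H (\mathbf{w}_k s_k + \boldsymbol{\eta}_s)}_{\text{direct link}} + \underbrace{\sum_{i=1, i \neq k}^{K} \mathbf{h}_{ID,k}^H \boldsymbol{\Phi} \mathbf{H}_{SI} \mathbf{w}_i s_i}_{\text{multiuser interference}} + \underbrace{\eta_B}_{\text{receiver HWI}} + \underbrace{\eta_k}_{\text{noise}} \\ ={}& \tilde{y}_k + \eta_B. \end{aligned} \tag{14}$$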
where $\boldsymbol{\Phi} = \mathrm{diag}\left\{ e^{j\theta_1}, e^{j\theta_2}, \cdots, e^{j\theta_N} \right\}$ denotes the phase shift matrix at the RIS and $\theta_i$ is the phase shift of the $i$-th reflecting element; $\eta_B \sim \mathcal{CN}\left(0, k_B \mathbb{E}\{|\tilde{y}_k|^2\}\right)$ denotes the additional distortion noise at the user's receiver, which is Gaussian with zero mean, and $k_B \in (0, 1)$ denotes the normalized variance of the receive distortion noise; $\eta_k \sim \mathcal{CN}(0, \sigma_k^2)$ denotes the additive white Gaussian noise at the $k$-th user.
Therefore, the $k$-th user's instantaneous signal-to-interference-plus-noise ratio (SINR) is given by (12) and (13). Based on (12) and (13), the instantaneous data rate of the $k$-th user can be expressed as

$$R_k = \log_2(1 + \mathrm{SINR}_k). \tag{15}$$
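The SINR and rate expressions can be evaluated numerically for a given channel realization, as in the following sketch. The array layout (one column per user) and the function name `sinr_and_rate` are assumptions made for illustration; the distortion term is computed as the quadratic form of the effective channel with the bracketed matrix in (13).

```python
import numpy as np

def sinr_and_rate(h_id, h_sd, H_si, theta, W, k_B, k_s, noise_power):
    """Per-user SINR (12)-(13) and rate (15).

    h_id: N x K, h_sd: M x K (one column per user), H_si: N x M,
    theta: N phase shifts in radians, W: M x K beamforming matrix.
    """
    Phi = np.diag(np.exp(1j * theta))
    K = W.shape[1]
    # Shape of the transmit distortion covariance: diag(sum_i w_i w_i^H).
    D = np.diag(np.sum(np.abs(W) ** 2, axis=1))
    sinr = np.zeros(K)
    for k in range(K):
        # Effective 1 x M channel: h_ID^H Phi H_SI + h_SD^H.
        g = h_id[:, k].conj() @ Phi @ H_si + h_sd[:, k].conj()
        gains = np.abs(g @ W) ** 2  # |g w_i|^2 for every user i
        signal = gains[k]
        interference = (1 + k_B) * (np.sum(gains) - signal)
        wk = W[:, k]
        C = k_B * np.outer(wk, wk.conj()) + (1 + k_B) * k_s * D
        gamma = np.real(g @ C @ g.conj())  # distortion term Gamma_k
        sinr[k] = signal / (interference + (1 + k_B) * noise_power + gamma)
    return sinr, np.log2(1 + sinr)

rng = np.random.default_rng(0)
N, M, K = 4, 2, 2  # small illustrative sizes
cplx = lambda *shape: rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
sinr, rate = sinr_and_rate(cplx(N, K), cplx(M, K), cplx(N, M),
                           rng.uniform(0, 2 * np.pi, N), cplx(M, K),
                           k_B=0.01, k_s=0.01, noise_power=1.0)
```

Setting `k_B = k_s = 0` removes the distortion term and recovers the usual SINR without hardware impairments.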
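Therefore, the optimization problem in this paper can be written as

$$\begin{aligned} \max_{\mathbf{W}, \boldsymbol{\Phi}} \ \min_{k} \quad & \mathbb{E}[R_k] \\ \text{s.t.} \quad & \text{C1}: \mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} \le P_{\max}, \\ & \text{C2}: \left| [\boldsymbol{\Phi}]_{i,i} \right| = 1, \ i = 1, 2, \ldots, N, \end{aligned} \tag{16}$$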
where the expectation in the objective function is taken over the non-line-of-sight components of the CSI. In the above optimization problem, C1 represents the power constraint at the BS, while C2 represents the unit-modulus constraints on the phase shifts of the RIS. Unfortunately, it is challenging to derive a closed-form expression of the objective function, since the average data rate contains an expectation over numerous random small-scale channel gains. In addition, this work accounts for hardware impairments at both the BS and the users, which makes the average data rate expressions even more complicated. Hence, conventional optimization algorithms that rely on closed-form expressions cannot be applied directly to this problem.
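Since the expectation has no closed form, one natural way to evaluate the objective is a sample average over channel realizations. The sketch below shows this idea under the assumption that instantaneous rates have already been computed for each sampled channel; the exponential placeholder SINR values are synthetic and only stand in for real channel draws.

```python
import numpy as np

def min_average_rate(rate_samples):
    """Estimate min_k E[R_k] from an (n_samples x K) array of instantaneous rates."""
    return float(np.min(np.mean(rate_samples, axis=0)))

rng = np.random.default_rng(2)
# Synthetic SINR samples standing in for values obtained from sampled channels.
fake_sinr = rng.exponential(scale=[1.0, 2.0, 4.0], size=(5000, 3))
estimate = min_average_rate(np.log2(1 + fake_sinr))
```

The average is taken per user before the minimum, which matches the order of the operators in the max-min objective.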
3. Proposed Algorithm
In this section, we propose a statistical CSI-based transmission scheme where the DDPG algorithm is adopted to solve the optimization problem.
3.1. Transmission Scheme
In the existing transmission schemes for RIS-assisted communication systems, the instantaneous CSI is used to adjust the beamforming and the RIS phase shifts, which requires channel estimation in each channel coherence time interval, as shown in Figure 2. This approach has two drawbacks. First, the beamforming matrix and the phase shift matrix need to be recalculated in every channel coherence interval, which increases the computational complexity. Second, the phase shifts of the RIS need to be updated frequently and sent back to the RIS controller, which incurs significant feedback overhead.
To address these issues, in this paper, we consider the design of the transmission scheme based on statistical CSI. As shown in Figure 2, for the statistical CSI-based scheme, the BS only needs to estimate the statistical CSI at the start of the transmission, and the remaining channel coherence time intervals are fully used to transmit information, which significantly reduces the computational complexity and feedback overhead. Once the network is trained, it can be directly applied in real time with only simple mathematical calculations. The neural networks only need to be retrained when the statistical CSI changes.
3.2. DDPG Algorithm
As the objective function in (16) does not have a closed-form expression, it is difficult to solve this problem using conventional optimization algorithms. Therefore, in this paper, we adopt a deep reinforcement learning algorithm, which can efficiently process complex environmental parameters and a large amount of state information by utilizing techniques such as stochastic gradient optimization and backpropagation in deep neural networks. Specifically, the DDPG algorithm, which is one of the deep reinforcement learning algorithms, is employed to solve the optimization problem in this paper. The DDPG algorithm is suitable for this challenging optimization problem because it handles continuous variables.
Figure 2. Transmission Scheme
The DDPG algorithm adopts the Actor-Critic architecture, which uses the policy
network to output deterministic actions directly, and the functions of its four networks are
introduced as follows:
(1) Actor Current network
The role of the actor current network is to iteratively update the policy network parameters $\theta$ and select the current action according to the state $S^{(t)}$ at time step $t$, which is composed of three parts: the beamforming matrix $\mathbf{W}^{(t)} \in \mathbb{C}^{M \times K}$, the phase shift matrix $\boldsymbol{\Phi}^{(t)} \in \mathbb{C}^{N \times N}$ and the channel matrices, i.e., $\mathbf{H}_{SI}^{(t)} \in \mathbb{C}^{N \times M}$, $\mathbf{h}_{SD,k}^{(t)} \in \mathbb{C}^{M \times 1}$ and $\mathbf{h}_{ID,k}^{(t)} \in \mathbb{C}^{N \times 1}$. In addition, the actor current network also interacts with the environment to generate $S^{(t+1)}$ and the reward $R^{(t)}$, which can be defined as $R^{(t)} = \max_{\mathbf{W}, \boldsymbol{\Phi}} \min_{k} \mathbb{E}[R_k^{(t)}]$, where $R_k^{(t)}$ is defined in (15). The loss function $J^{(t)}(\theta)$ can be expressed as

$$J^{(t)}(\theta) = -\frac{1}{m} \sum_{j=1}^{m} Q^{(t)}(s_j, a_j, w). \tag{17}$$
(2) Actor Target network
The actor target network serves to select the optimal action $a^{(t+1)}$ based on the state $S^{(t+1)}$ at time $t+1$ sampled from the experience replay buffer. The action $a^{(t)} \in \mathbb{R}^{2MK+N}$ is composed of two parts: the first $N$ elements correspond to the phase shifts of the RIS reflecting elements, and the remaining $2MK$ elements correspond to the real and imaginary parts of the beamforming matrix. We take action $a^{(t)}$ to optimize the beamforming matrix $\mathbf{W}^{(t)}$ and the phase shift matrix $\boldsymbol{\Phi}^{(t)}$, and the optimized results can be described as

$$\mathbf{W}^{(t+1)} = \sqrt{P_{\max}}\, \frac{\mathbf{W}^{(t)}}{\left\| \mathbf{W}^{(t)} \right\|_F},$$

$$\varphi_n^{a(t+1)} = \varphi_n^{a(t)} + a_j^{(t)} \pi,$$

where $\varphi_n^{(t)} = \cos(\varphi_n^{a(t)}) + j \sin(\varphi_n^{a(t)})$ is the $n$-th phase shift in $\boldsymbol{\Phi}^{(t)}$ and $a_j^{(t)}$ is the $j$-th action value in $a^{(t)}$, $n = j = 1, 2, \ldots, N$.
The target network parameter $\theta^{(t)\prime}$ is periodically copied from the current network parameter $\theta^{(t)}$ using the soft update method with soft update factor $\tau$:

$$\theta^{(t+1)\prime} = \tau \theta^{(t)} + (1 - \tau)\, \theta^{(t)\prime}.$$
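(3) Critic Current network
The critic current network is used to iteratively update the value network parameter $w^{(t)}$ and calculate the current value $Q\left(S^{(t)}, a^{(t)}, w^{(t)}\right)$. The target value is given by

$$y_j = R^{(t)} + \gamma\, Q'\left(S^{(t+1)}, a^{(t+1)}, w^{(t)\prime}\right). \tag{18}$$

The loss function is given by

$$J\left(w^{(t)}\right) = \frac{1}{m} \sum_{j=1}^{m} \left( y_j - Q\left(\phi(S_j^{(t)}), a_j^{(t)}, w^{(t)}\right) \right)^2, \tag{19}$$

where $\phi(S)$ denotes the feature vector of state $S$.
(4) Critic Target network
The critic target network aims to calculate the $Q'\left(S^{(t+1)}, a^{(t+1)}, w^{(t)\prime}\right)$ portion of the target value. The network parameter $w^{(t)\prime}$ is periodically copied from $w^{(t)}$ using the soft update method with soft update factor $\tau$:

$$w^{(t+1)\prime} = \tau w^{(t)} + (1 - \tau)\, w^{(t)\prime}.$$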
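The target computation, the critic loss and the soft update described for the four networks can be sketched in a few lines of NumPy. This is a toy illustration on plain arrays rather than real network parameters; the function names, the use of a termination flag and the numerical values are assumptions.

```python
import numpy as np

def td_target(reward, q_next, gamma, done):
    """y_j = R_j if the transition is terminal, else R_j + gamma * Q'(next state, next action)."""
    return reward + gamma * q_next * (1.0 - done)

def critic_loss(y, q):
    """Mean squared error between the targets y_j and the current estimates Q_j."""
    return float(np.mean((y - q) ** 2))

def soft_update(target_params, online_params, tau):
    """target <- tau * online + (1 - tau) * target, applied to every parameter array."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]

y = td_target(np.array([1.0, 0.5]), np.array([2.0, 3.0]), gamma=0.9,
              done=np.array([0.0, 1.0]))
loss = critic_loss(y, np.array([2.0, 0.5]))
new_target = soft_update([np.zeros(2)], [np.ones(2)], tau=0.01)
```

With a small $\tau$, the target networks change slowly, which keeps the regression target of the critic stable between updates.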
At the same time, to add randomness and widen the coverage of the learning process, the DDPG algorithm adds noise $\mathcal{N}$ to the selected action. That is, the final action that interacts with the environment is

$$a^{(t)} = \pi_\theta\left(S^{(t)}\right) + \mathcal{N}. \tag{20}$$
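The following sketch shows one way the noisy action could be formed and then mapped to the optimization variables. Several choices here are assumptions rather than details given in the paper: the exploration noise is Gaussian, the action is clipped to $[-1, 1]$ to match a tanh output, and the beamformer is rescaled to Frobenius norm $\sqrt{P_{\max}}$ so that the power constraint (3) holds with equality.

```python
import numpy as np

def noisy_action(policy_output, noise_std, rng):
    """a = pi(S) + noise, with Gaussian exploration noise (assumed) and clipping to [-1, 1]."""
    noise = rng.normal(0.0, noise_std, size=policy_output.shape)
    return np.clip(policy_output + noise, -1.0, 1.0)

def apply_action(action, phase, M, K, P_max):
    """Map an action of length N + 2MK to updated RIS phases and a power-normalized W."""
    N = phase.shape[0]
    new_phase = phase + action[:N] * np.pi          # first N entries shift the phases
    re = action[N:N + M * K].reshape(M, K)          # real part of W
    im = action[N + M * K:].reshape(M, K)           # imaginary part of W
    W = re + 1j * im
    W = np.sqrt(P_max) * W / np.linalg.norm(W, "fro")  # tr{W W^H} = P_max
    return new_phase, W

rng = np.random.default_rng(0)
N, M, K = 4, 2, 2  # small illustrative sizes
a = noisy_action(np.zeros(N + 2 * M * K), noise_std=0.1, rng=rng)
phase, W = apply_action(a, np.zeros(N), M, K, P_max=2.0)
Phi = np.diag(np.exp(1j * phase))  # unit-modulus diagonal entries by construction
```

Parameterizing the RIS by its phases rather than by complex entries means the unit-modulus constraint is satisfied automatically.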
The structures of the actor and critic networks are shown in Table 1, and both of them have three layers.

Table 1. Network Structures.

| Network | Number of Neurons | Activation Function |
|---|---|---|
| Actor | 128 | ReLU |
| | 64 | ReLU |
| | $N + 2MK$ | tanh(·) |
| Critic | 64 | ReLU |
| | 32 | ReLU |
| | 1 | None |
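As a rough illustration of the layer sizes in Table 1, the sketch below runs a forward pass through two small fully connected networks in NumPy. The state dimension, the weight initialization and the choice to feed the critic a concatenation of state and action are hypothetical; the paper does not specify them in this section.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights and zero biases for a fully connected network with the given layer sizes."""
    return [(rng.standard_normal((n_in, n_out)) * np.sqrt(1.0 / n_in), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x, final_activation=None):
    """ReLU on hidden layers, optional activation on the output layer."""
    for i, (Wt, b) in enumerate(params):
        x = x @ Wt + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return final_activation(x) if final_activation is not None else x

rng = np.random.default_rng(3)
M, K, N = 8, 4, 20
state_dim = 64                       # hypothetical input size
action_dim = N + 2 * M * K           # N phase entries plus real/imag parts of W
actor = init_mlp([state_dim, 128, 64, action_dim], rng)
critic = init_mlp([state_dim + action_dim, 64, 32, 1], rng)
s = rng.standard_normal(state_dim)
a = forward(actor, s, np.tanh)                # bounded action in [-1, 1]
q = forward(critic, np.concatenate([s, a]))   # scalar value estimate
```

The tanh output bounds every action entry, which suits a phase increment scaled by $\pi$ and a beamformer that is normalized afterwards.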
The overall algorithm is summarized in Algorithm 1.

Algorithm 1 The Proposed DDPG Algorithm.
1: Randomly initialize the current network parameters $\theta^{(t)}$ and $w^{(t)}$, and set the target network parameters $w^{(t)\prime} = w^{(t)}$, $\theta^{(t)\prime} = \theta^{(t)}$. Empty the experience replay set $D$.
2: for $t = 1, 2, \ldots, T$ do
3: Initialize $S^{(t)}$ as the first state of the current state sequence, and get its feature vector $\phi(S^{(t)})$.
4: Get the action $A^{(t)} = \pi_\theta(S^{(t)}) + \mathcal{N}$ from the actor's current network based on state $S^{(t)}$.
5: Perform the action $A^{(t)}$, get the new state $S^{(t+1)}$ and the reward $R^{(t)}$, and determine whether the termination status 'end' has been reached.
6: Store the tuple $\left\{ \phi(S^{(t)}), A^{(t)}, R^{(t)}, \phi(S^{(t+1)}), \text{end} \right\}$ in the experience replay set $D$.
7: $S^{(t)} \leftarrow S^{(t+1)}$.
8: Get $m$ samples $\left\{ \phi(S_j^{(t)}), A_j^{(t)}, R_j^{(t)}, \phi(S_j^{(t+1)}), \text{end}_j \right\}$, $j = 1, 2, \ldots, m$, from the experience replay set, and calculate the current target $Q$ value $y_j$:

$$y_j = \begin{cases} R_j^{(t)}, & \text{if } \text{end}_j \text{ is true}, \\ R_j^{(t)} + \gamma\, Q'\left( \phi(S_j^{(t+1)}), \pi_{\theta'}\left(\phi(S_j^{(t+1)})\right), w' \right), & \text{otherwise}. \end{cases} \tag{21}$$

9: Use the mean squared error loss function

$$\frac{1}{m} \sum_{j=1}^{m} \left( y_j - Q^{(t)}\left( \phi(S_j^{(t)}), A_j^{(t)}, w \right) \right)^2$$

to update the critic's current network parameter $w$ through gradient backpropagation.
10: Use

$$J(\theta) = -\frac{1}{m} \sum_{j=1}^{m} Q^{(t)}(s_j, a_j, w)$$

to update the actor's current network parameter $\theta$ through gradient backpropagation.
11: If $t \,\%\, C = 1$, update the parameters of the critic's target network and the actor's target network:

$$w' \leftarrow \tau w + (1 - \tau) w',$$

$$\theta' \leftarrow \tau \theta + (1 - \tau) \theta'.$$

12: If $S^{(t+1)}$ is at termination status, end the current iteration; otherwise go to Step 4.
13: end for
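The experience replay set $D$ used in Steps 6 and 8 of Algorithm 1 can be sketched as a fixed-capacity buffer with uniform sampling. The capacity, the uniform sampling rule and the class name `ReplayBuffer` are assumptions, since the paper does not state these details.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay set storing (state, action, reward, next_state, end)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once the buffer is full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, end):
        self.buffer.append((state, action, reward, next_state, end))

    def sample(self, m, rng=random):
        """Uniformly sample min(m, len) transitions without replacement."""
        return rng.sample(list(self.buffer), min(m, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.store(t, 0.0, 1.0, t + 1, False)
batch = buffer.sample(2)
```

Sampling mini-batches at random from past transitions breaks the correlation between consecutive time steps, which is the usual motivation for experience replay.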
4. Simulation Results
In this section, the performance of the DDPG algorithm-based scheme is evaluated.
Firstly, the locations of the BS and the RIS are set at (0, 0, 30 m) and (100 m, 20 m, 10 m), respectively. In addition, the users are located within a circle centered at (150 m, 0, 1.5 m) with a radius of 20 m. Other parameters are shown in Table 2.
Based on the parameters in Table 2, we adopt the DDPG algorithm based on statistical CSI. The number of reflecting elements is set to $N = 20$, 30, 40, and 50, respectively. For each time step, we use the beamforming matrix $\mathbf{W}$ and the phase shift matrix $\boldsymbol{\Phi}$ at time step $t$ as the input of the DDPG neural network, and the output is $\mathbf{W}$ and $\boldsymbol{\Phi}$ at time $t+1$. In Figures 3–6, we illustrate the minimum average user data rate versus the time steps for different $N$.
Table 2. Simulation Parameters.

| Parameter Name | Symbol | Value |
|---|---|---|
| Noise power density | $\rho_n$ | −174 dBm/Hz |
| Channel bandwidth | $B$ | 1 MHz |
| Reference path loss | $PL_0$ | −30 dB |
| Reference distance | $d_0$ | 1 m |
| Path loss coefficients | $\beta$ | 2.2 |
| | $\alpha_k$ | 2.2 |
| | $\gamma_k$ | 3.75 |
| Rician factors | $\delta$ | 3 |
| | $\varepsilon_k$ | 3 |
| | $\rho_k$ | 3 |
| Correlation coefficient | $\rho$ | 0.1 |
| Normalized variance | $k_B$ | 0.01 |
| Normalized variance | $k_s$ | 0.01 |
| Number of antennas | $M$ | 8 |
| Number of users | $K$ | 4 |
| Number of reflecting elements | $N$ | 20–50 |
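The link-budget quantities implied by Table 2 can be checked with a short calculation. The noise power follows directly from the noise density and the bandwidth. The log-distance path loss formula, and the reading of the two fixed node positions in this section as the BS and the RIS, are assumptions made for this sketch.

```python
import math

def noise_power_dbm(density_dbm_per_hz, bandwidth_hz):
    """Total noise power in dBm over the given bandwidth."""
    return density_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)

def path_gain_db(pl0_db, exponent, distance_m, d0_m=1.0):
    """Log-distance model (assumed): PL0 - 10 * exponent * log10(d / d0), in dB."""
    return pl0_db - 10.0 * exponent * math.log10(distance_m / d0_m)

sigma2_dbm = noise_power_dbm(-174.0, 1e6)  # about -114 dBm for 1 MHz
d_bs_ris = math.dist((0.0, 0.0, 30.0), (100.0, 20.0, 10.0))
beta_db = path_gain_db(-30.0, 2.2, d_bs_ris)
```

The distance between the two fixed nodes is slightly above 100 m, so under these assumptions the BS-RIS link sits several tens of dB below the reference path loss.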
Figure 3. RIS = 20.
Figure 4. RIS = 30.
Figure 5. RIS = 40.
Figure 6. RIS = 50.
As shown in Figures 3–6, when the number of RIS reflecting elements is set to $N = 20$, the minimum average user data rate (MAUR) converges to 0.75; when $N = 30$, the MAUR converges to about 0.8. When the number of RIS reflecting elements increases to 40, the MAUR increases to 0.9, and when $N$ is 50, the rate is about 1.1. Hence, the MAUR increases with the number of reflecting elements. Meanwhile, the simulation results show that increasing the number of reflecting elements does not affect the convergence speed of the proposed DDPG algorithm.
In Figures 7–10, we set $k_B = k_s$ to 0.01, 0.05, 0.1 and 0.15, respectively, to explore the convergence of the MAUR under different levels of hardware impairment. The results indicate that the MAUR decreases as $k_B$ and $k_s$ increase. This conclusion is consistent with the SINR expression in (12) and (13). Furthermore, when $k_B$ and $k_s$ are set to 0, the result is identical to the case where hardware impairments are not considered.
In addition, to account for environmental effects such as wind and rain, we add a random perturbation to the channel angles (angle of departure and angle of arrival), where the perturbation is assumed to follow the uniform distribution. In general, the uniform distribution can be regarded as the worst case, since the perturbation is spread evenly over its range rather than concentrated around one point as in the Gaussian distribution. Then, we apply the trained solution obtained from our DDPG networks to the realistic channels with angle variations to demonstrate the effectiveness of our algorithm. As observed from Figure 11, the performance degradation due to the channel variations is small, which confirms the robustness of the proposed algorithm.
Figure 7. Convergence Speed when kB=kS=0.01.
Figure 8. Convergence Speed when kB=kS=0.05.
Figure 9. Convergence Speed when kB=kS=0.1.
Figure 10. Convergence Speed when kB=kS=0.15.
Figure 11. Performance verification under disturbance.
Finally, in Figure 12, we compare the performance of the proposed algorithm with a non-optimized baseline to evaluate the effectiveness of the optimization. Specifically, for the non-optimized baseline, the beamforming vector at the BS is randomly generated and the phase shift matrix is set to the identity matrix. It is observed from Figure 12 that the proposed algorithm significantly outperforms the non-optimized baseline.
Figure 12. Comparison with non-optimized algorithm.
5. Conclusions
In this paper, we studied the downlink RIS-aided multiuser MISO system with imperfect hardware, where the design is based on statistical CSI. The DDPG algorithm was applied to jointly optimize the beamforming matrix at the BS and the phase shift matrix at the RIS. Furthermore, the transceiver hardware impairment was considered to account for the inevitable hardware imperfections in practical systems. The simulation results demonstrated that it is necessary to take HWI into account and that the DDPG algorithm can achieve excellent performance.
Author Contributions: Conceptualization, W.M., L.L., L.Z., Y.L. and H.R.; methodology, W.M., L.L., L.Z., Y.L. and H.R.; software, W.M.; validation, L.Z.; formal analysis, Y.L.; investigation, W.M.; resources, W.M.; data curation, W.M.; writing (original draft preparation), W.M., L.L., L.Z. and Y.L.; writing (review and editing), H.R.; visualization, Y.L.; supervision, H.R.; project administration, H.R.; funding acquisition, H.R. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported in part by the National Natural Science Foundation of China (62101128) and Basic Research Project of Jiangsu Provincial Department of Science and Technology (BK20210205).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
References
1. Pan, C.; Ren, H.; Wang, K.; Kolb, J.F.; Elkashlan, M.; Chen, M.; Di Renzo, M.; Hao, Y.; Wang, J.; Swindlehurst, A.L.; et al. Reconfigurable Intelligent Surfaces for 6G Systems: Principles, Applications, and Research Directions. IEEE Commun. Mag. 2021, 59, 14–20.
2. Renzo, M.D.; Debbah, M.; Phan-Huy, D.T.; Zappone, A.; Alouini, M.S.; Yuen, C.; Sciancalepore, V.; Alexandropoulos, G.C.; Hoydis, J.; Gacanin, H.; et al. Smart radio environments empowered by reconfigurable AI meta-surfaces: An idea whose time has come. EURASIP J. Wirel. Commun. Netw. 2019, 2019, 1–20.
3. Oliveri, G.; Rocca, P.; Salucci, M.; Massa, A. Holographic smart EM skins for advanced beam power shaping in next generation wireless environments. IEEE J. Multiscale Multiphys. Comput. Tech. 2021, 6, 171–182.
4. Di Renzo, M.; Zappone, A.; Debbah, M.; Alouini, M.S.; Yuen, C.; de Rosny, J.; Tretyakov, S. Smart Radio Environments Empowered by Reconfigurable Intelligent Surfaces: How It Works, State of Research, and The Road Ahead. IEEE J. Sel. Areas Commun. 2020, 38, 2450–2525.
5. Huang, C.; Hu, S.; Alexandropoulos, G.C.; Zappone, A.; Yuen, C.; Zhang, R.; Renzo, M.D.; Debbah, M. Holographic MIMO Surfaces for 6G Wireless Networks: Opportunities, Challenges, and Trends. IEEE Wirel. Commun. 2020, 27, 118–125.
6. Benoni, A.; Salucci, M.; Oliveri, G.; Rocca, P.; Li, B.; Massa, A. Planning of EM Skins for Improved Quality-of-Service in Urban Areas. IEEE Trans. Antennas Propag. 2022.
7. Pan, C.; Ren, H.; Wang, K.; Xu, W.; Elkashlan, M.; Nallanathan, A.; Hanzo, L. Multicell MIMO Communications Relying on Intelligent Reflecting Surfaces. IEEE Trans. Wirel. Commun. 2020, 19, 5218–5233.
8. Pan, C.; Ren, H.; Wang, K.; Elkashlan, M.; Nallanathan, A.; Wang, J.; Hanzo, L. Intelligent Reflecting Surface Aided MIMO Broadcasting for Simultaneous Wireless Information and Power Transfer. IEEE J. Sel. Areas Commun. 2020, 38, 1719–1734.
9. Boulogeorgos, A.A.A.; Alexiou, A. How Much do Hardware Imperfections Affect the Performance of Reconfigurable Intelligent Surface-Assisted Systems? IEEE Open J. Commun. Soc. 2020, 1, 1185–1195.
10. Shen, H.; Xu, W.; Gong, S.; Zhao, C.; Ng, D.W.K. Beamforming Optimization for IRS-Aided Communications with Transceiver Hardware Impairments. IEEE Trans. Commun. 2021, 69, 1214–1227.
11. Zhou, G.; Pan, C.; Ren, H.; Wang, K.; Peng, Z. Secure Wireless Communication in RIS-Aided MISO System with Hardware Impairments. IEEE Wirel. Commun. Lett. 2021, 10, 1309–1313.
12. Peng, Z.; Li, T.; Pan, C.; Ren, H.; Wang, J. RIS-Aided D2D Communications Relying on Statistical CSI With Imperfect Hardware. IEEE Commun. Lett. 2022, 26, 473–477.
13. Wang, K.; Lam, C.T.; Ng, B.K. Doppler Effect Mitigation using Reconfigurable Intelligent Surfaces with Hardware Impairments. In Proceedings of the 2021 IEEE Globecom Workshops (GC Wkshps), Madrid, Spain, 7–11 December 2021; pp. 1–6.
14. Hemanth, A.; Umamaheswari, K.; Pogaku, A.C.; Do, D.T.; Lee, B.M. Outage Performance Analysis of Reconfigurable Intelligent Surfaces-Aided NOMA Under Presence of Hardware Impairment. IEEE Access 2020, 8, 212156–212165.
15. Peng, Z.; Chen, Z.; Pan, C.; Zhou, G.; Ren, H. Robust Transmission Design for RIS-Aided Communications With Both Transceiver Hardware Impairments and Imperfect CSI. IEEE Wirel. Commun. Lett. 2022, 11, 528–532.
16. Hassan, A.K.; Moinuddin, M.; Al-Saggaf, U.M.; Aldayel, O.; Davidson, T.N.; Al-Naffouri, T.Y. Performance Analysis and Joint Statistical Beamformer Design for Multi-User MIMO Systems. IEEE Commun. Lett. 2020, 24, 2152–2156.
17. Zhi, K.; Pan, C.; Ren, H.; Wang, K. Power Scaling Law Analysis and Phase Shift Optimization of RIS-Aided Massive MIMO Systems With Statistical CSI. IEEE Trans. Commun. 2022, 70, 3558–3574.
18. Dai, J.; Zhu, F.; Pan, C.; Ren, H.; Wang, K. Statistical CSI-Based Transmission Design for Reconfigurable Intelligent Surface-Aided Massive MIMO Systems With Hardware Impairments. IEEE Wirel. Commun. Lett. 2022, 11, 38–42.
19. Ren, H.; Pan, C.; Wang, L.; Liu, W.; Kou, Z.; Wang, K. Long-Term CSI-Based Design for RIS-Aided Multiuser MISO Systems Exploiting Deep Reinforcement Learning. IEEE Commun. Lett. 2022, 26, 567–571.
Article
In practice, residual transceiver hardware impairments inevitably lead to distortion noise which causes the performance loss. In this paper, we study the robust transmission design for a reconfigurable intelligent surface (RIS)-aided secure communication system in the presence of transceiver hardware impairments. We aim for maximizing the secrecy rate while ensuring the transmit power constraint on the active beamforming at the base station and the unit-modulus constraint on the passive beamforming at the RIS. To address this problem, we adopt the alternate optimization method to iteratively optimize one set of variables while keeping the other set fixed. Specifically, the successive convex approximation (SCA) method is used to solve the active beamforming optimization subproblem, while the passive beamforming is obtained by using the semidefinite program (SDP) method. Numerical results illustrate that the proposed transmission design scheme is more robust to the hardware impairments than the conventional non-robust scheme that ignores the impact of the hardware impairments.