
Citation: Ma, W.; Zhuo, L.; Li, L.; Liu, Y.; Ren, H. Deep Reinforcement Learning for RIS-Aided Multiuser MISO System with Hardware Impairments. Appl. Sci. 2022, 12, 7236. https://doi.org/10.3390/app12147236

Academic Editor: Christos Bouras

Received: 31 May 2022

Accepted: 13 July 2022

Published: 18 July 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

applied

sciences

Article

Deep Reinforcement Learning for RIS-Aided Multiuser MISO System with Hardware Impairments

Wenjie Ma, Liuchang Zhuo, Luchu Li, Yuhao Liu and Hong Ren *

National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China;

213200837@seu.edu.cn (W.M.); 213203755@seu.edu.cn (L.Z.); 213203234@seu.edu.cn (L.L.);

213202125@seu.edu.cn (Y.L.)

*Correspondence: hren@seu.edu.cn

Abstract: In this paper, we study a reconfigurable intelligent surface (RIS)-aided multiuser MISO system with imperfect hardware, where the transceiver design is based on the statistical channel state information (CSI). Considering the transceiver hardware impairments (HWI), we aim to maximize the minimum average user data rate, where the precoding matrices at the base station (BS) and the reflecting phase shifts at the RIS are jointly optimized. Since the problem is nonconvex and the objective function cannot be derived in closed form, we adopt the deep deterministic policy gradient (DDPG) algorithm to deal with this challenging optimization problem, where we generate a set of CSI vectors offline and then use these data sets to train the neural networks. The simulation results demonstrate the rapid convergence of the adopted DDPG algorithm and also emphasize that it is crucial to consider the HWI when optimizing the transceiver.

Keywords: intelligent reflecting surface (IRS); reconfigurable intelligent surface (RIS); hardware impairment (HWI); deep deterministic policy gradient (DDPG)

1. Introduction

Thanks to its attractive properties of low power consumption and low hardware cost, the reconfigurable intelligent surface (RIS) is recognized as one of the most promising techniques for future sixth-generation (6G) wireless systems [1–6]. An RIS consists of an array of passive, low-cost reflecting elements whose phase shifts can be tuned. The authors of [7,8] studied RIS-aided multicell systems and RIS-aided simultaneous wireless information and power transfer, respectively. Low-complexity algorithms were developed to jointly optimize the precoding matrices at the base station (BS) and the reflecting phase shifts at the RIS. However, the contributions in [7,8] were based on the ideal assumption of perfect hardware, which rarely holds in practice. In practical communication systems, transceiver hardware impairments (HWI) are inevitable; they cause signal distortions and cannot be ignored in the transceiver design.

The authors of [9] derived the closed-form data rate expression for RIS-aided communication systems and then analyzed the impact of HWI on such systems. An RIS-aided single-user communication system with HWI was studied in [10], where the phase shifts of the RIS were optimized by the majorization-minimization (MM) algorithm. Recently, the joint beamforming and phase shift design was studied for an RIS-aided physical layer security system in [11]. Besides the transceiver hardware impairment, the authors of [12] further considered the impact of the phase noise at the RIS and derived the closed-form data rate expression, based on which a genetic algorithm was adopted to solve the phase shift optimization problem. In [13], an RIS-aided communication system serving a mobile user was studied, and the authors proposed an algorithm to predict the positions of the user under HWI. In [14], the authors analyzed the outage performance of RIS-aided non-orthogonal multiple access systems with HWI, where both near-field and far-field users were considered. Most recently, robust transceiver design for RIS-aided communication systems was studied in [15], where both imperfect CSI and HWI were taken into account, and semidefinite programming was used to solve the robust design problem.

Appl. Sci. 2022, 12, 7236. https://doi.org/10.3390/app12147236 https://www.mdpi.com/journal/applsci

However, all the above papers were based on the assumption that the BS can acquire the instantaneous CSI, which is challenging in practice due to the limited channel coherence time. Recently, researchers have focused on phase shift design based on statistical CSI, such as location/angle information or channel distribution information (e.g., channel covariance matrices), which varies on a much slower time scale than the instantaneous CSI. There are several advantages to using statistical CSI for transceiver design [16]. Firstly, the channel estimation overhead can be reduced, as only statistical CSI is needed, which changes very slowly. Secondly, the computational complexity is significantly reduced, as the phase shifts at the RIS only need to be recomputed when the statistical CSI has changed. Thirdly, the feedback overhead is decreased, since the phase shift values are fed back to the RIS controller only when they are updated, which happens only when the statistical CSI changes. Due to these appealing advantages, transceiver design based on statistical CSI for RIS-aided systems has attracted extensive research attention [17,18]. Specifically, the authors of [17] derived the closed-form data rate expression for an RIS-aided multiuser system. Then, a genetic algorithm was proposed to optimize the phase shifts, which only depend on the statistical CSI. As a step further, the authors of [18] extended the work in [17] to the practical case with imperfect hardware, and a robust transmission design was proposed to optimize the phase shifts by considering HWI.

However, the contributions in [17,18] considered the two-time-scale design, where the BS designed its precoding matrices based on the instantaneous effective CSI, while only the phase shifts were designed based on statistical CSI. This means that the BS still needs to estimate the instantaneous effective CSI, which will incur sizable channel estimation overhead in highly mobile scenarios. Against the above background, the authors of [19] studied the transceiver design for RIS-aided communication systems based on fully statistical CSI, where both the precoding matrices at the BS and the reflecting phase shifts at the RIS were designed based on statistical CSI. However, this work was based on the ideal assumption of perfect hardware, which rarely holds in practice. As a result, the contributions of this work are summarized as follows:

• We consider optimizing the precoding matrices at the BS and the reflecting phase shifts at the RIS based on statistical CSI to maximize the minimum user data rate, which ensures fairness among the users, where the imperfect hardware is taken into account.

• Due to the expectation operator along with the hardware impairment, it is challenging to derive the closed-form data rate expression. Furthermore, the objective function in the max-min form is discontinuous and non-differentiable. As a result, the existing algorithms based on mathematical derivations are not applicable. Instead, we resort to the deep deterministic policy gradient (DDPG) algorithm to solve this challenging optimization problem.

• The adopted DDPG algorithm converges quickly, within 600–900 iterations, and its computational complexity is dominated by the calculation of rewards, which involves only simple mathematical operations. In addition, the calculated parameters can be reused in subsequent steps and only need to be recalculated when the statistical CSI changes. Once the neural network is trained, it can be applied directly in real time with only simple mathematical calculations, and it only needs to be retrained when the statistical CSI changes. Hence, the computational complexity is not high.

2. System Model

We consider an RIS-aided downlink multiuser system where the base station (BS) is equipped with $M$ antennas and each user has a single antenna. The system architecture is shown in Figure 1. In this system, we assume that the RIS has $N$ reflecting elements. Considering the hardware impairment, the transmit signal at the BS can be expressed as

$$\mathbf{x} = \sum_{k=1}^{K} \left( \mathbf{w}_k s_k + \boldsymbol{\eta}_s \right), \tag{1}$$

where $\mathbf{w}_k \in \mathbb{C}^{M \times 1}$ represents the beamforming vector from the BS to the $k$-th user and $s_k \sim \mathcal{CN}(0, 1)$ represents the data symbol transmitted to the $k$-th user, which satisfies $\mathbb{E}\{|s_k|^2\} = 1$. Furthermore, $\boldsymbol{\eta}_s$ denotes the independent zero-mean Gaussian distortion noise, whose power is proportional to the transmit power of the antenna. Then, $\boldsymbol{\eta}_s$ can be represented as $\boldsymbol{\eta}_s \sim \mathcal{CN}\left(\mathbf{0}, k_s\,\mathrm{diag}\left\{\sum_{k=1}^{K} \mathbf{w}_k \mathbf{w}_k^H\right\}\right)$, where $k_s \in (0, 1)$ denotes the normalized variance of the transmit distortion noise.

Figure 1. System Model.

For $K$ users, the beamforming matrix at the BS can be expressed as

$$\mathbf{W} = [\mathbf{w}_1, \cdots, \mathbf{w}_K]. \tag{2}$$

The beamforming matrix $\mathbf{W}$ has to satisfy the power constraint, which can be formulated as

$$\mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} \le P_{\max}. \tag{3}$$

Furthermore, the channel between the BS and the RIS is denoted as $\mathbf{H}_{SI} \in \mathbb{C}^{N \times M}$, the channel between the BS and the $k$-th user is denoted as $\mathbf{h}_{SD,k} \in \mathbb{C}^{M \times 1}$, and the channel from the RIS to the $k$-th user is denoted as $\mathbf{h}_{ID,k} \in \mathbb{C}^{N \times 1}$. In this paper, we consider the Rician fading model, so the channels $\mathbf{H}_{SI}$, $\mathbf{h}_{SD,k}$ and $\mathbf{h}_{ID,k}$ can be formulated as

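$$\mathbf{H}_{SI} = \sqrt{\beta}\left(\sqrt{\frac{\delta}{\delta+1}}\,\bar{\mathbf{H}}_{SI} + \sqrt{\frac{1}{\delta+1}}\,\tilde{\mathbf{H}}_{SI}\right), \tag{4}$$

$$\mathbf{h}_{SD,k} = \sqrt{\gamma_k}\left(\sqrt{\frac{\rho_k}{\rho_k+1}}\,\bar{\mathbf{h}}_{SD,k} + \sqrt{\frac{1}{\rho_k+1}}\,\tilde{\mathbf{h}}_{SD,k}\right), \tag{5}$$

$$\mathbf{h}_{ID,k} = \sqrt{\alpha_k}\left(\sqrt{\frac{\varepsilon_k}{\varepsilon_k+1}}\,\bar{\mathbf{h}}_{ID,k} + \sqrt{\frac{1}{\varepsilon_k+1}}\,\tilde{\mathbf{h}}_{ID,k}\right), \tag{6}$$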

where $\beta$, $\gamma_k$ and $\alpha_k$ are the large-scale path loss coefficients; $\delta$, $\rho_k$ and $\varepsilon_k$ are the Rician factors; and $\bar{\mathbf{H}}_{SI}$, $\bar{\mathbf{h}}_{SD,k}$ and $\bar{\mathbf{h}}_{ID,k}$ represent the line-of-sight (LoS) components, which are statistical CSI and remain unchanged over a long time. Under the uniform planar array model, the LoS components $\bar{\mathbf{H}}_{SI}$, $\bar{\mathbf{h}}_{SD,k}$ and $\bar{\mathbf{h}}_{ID,k}$ can be formulated as

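$$\bar{\mathbf{H}}_{SI} = \mathbf{a}_N(\theta^a_A, \theta^e_A)\,\mathbf{a}_M^H(\varphi^a_D, \varphi^e_D), \tag{7}$$

$$\bar{\mathbf{h}}_{ID,k} = \mathbf{a}_N(\theta^a_{D,k}, \theta^e_{D,k}), \tag{8}$$

$$\bar{\mathbf{h}}_{SD,k} = \mathbf{a}_M(\varphi^a_{D,k}, \varphi^e_{D,k}), \tag{9}$$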

where $\theta^a_A$ and $\theta^e_A$ denote the azimuth and elevation angles of arrival at the RIS from the BS, respectively; $\varphi^a_D$ and $\varphi^e_D$ denote the azimuth and elevation angles of departure from the BS to the RIS, respectively; and $\theta^a_{D,k}$ and $\theta^e_{D,k}$ ($\varphi^a_{D,k}$ and $\varphi^e_{D,k}$) are the azimuth and elevation angles of departure from the RIS to the $k$-th user (from the BS to the $k$-th user). These angles are randomly generated. In addition, the array response vector is defined as

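$$\mathbf{a}_X(\vartheta^a, \vartheta^e) = \left[1, \cdots, e^{j2\pi\frac{d}{\lambda}\left(x \sin\vartheta^a \sin\vartheta^e + y \cos\vartheta^e\right)}, \cdots, e^{j2\pi\frac{d}{\lambda}\left((\sqrt{X}-1)\sin\vartheta^a \sin\vartheta^e + (\sqrt{X}-1)\cos\vartheta^e\right)}\right]^T, \tag{10}$$

where $X$ can be substituted by $M$ or $N$; $\vartheta^a$ and $\vartheta^e$ denote the azimuth and elevation angles, respectively; and $d$ and $\lambda$ represent the antenna spacing at the BS and the wavelength, respectively.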

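Moreover, $\tilde{\mathbf{H}}_{SI} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_{HR} \otimes \mathbf{R}_{HB})$, $\tilde{\mathbf{h}}_{SD,k} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_{hB,k})$ and $\tilde{\mathbf{h}}_{ID,k} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_{hR,k})$ represent the non-line-of-sight (NLoS) components, with $\mathbf{R}_{HR}$, $\mathbf{R}_{HB}$, $\mathbf{R}_{hB,k}$ and $\mathbf{R}_{hR,k}$ being the corresponding spatial covariance matrices, which are given by $[\mathbf{R}_{HB}]_{i,j} = \rho^{|i-j|}$, $[\mathbf{R}_{HR}]_{i,j} = \rho^{|i-j|}$, $[\mathbf{R}_{hB}]_{i,j} = \rho^{|i-j|}$ and $[\mathbf{R}_{hR}]_{i,j} = \rho^{|i-j|}$, where $\rho$ represents the correlation coefficient.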

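Element-wise, the $x$-th entry of the array response vector is

$$[\mathbf{a}_X(\theta^a, \theta^e)]_x = \exp\left\{ j2\pi\frac{d}{\lambda}\left( \left\lfloor \frac{x-1}{\sqrt{X}} \right\rfloor \sin\theta^a \sin\theta^e + \left((x-1) \bmod \sqrt{X}\right)\cos\theta^e \right) \right\}. \tag{11}$$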
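As an illustration of the channel model in (4)–(11), the following minimal Python sketch builds the array response of (11) and draws one Rician channel vector of the form (5)/(6). It is a sketch under stated assumptions rather than the simulation code used in this paper: the function names, the half-wavelength spacing `d_over_lambda = 0.5`, the square array size `N = 16`, the angles and the path loss value are all hypothetical choices.

```python
import numpy as np

def upa_response(X, azimuth, elevation, d_over_lambda=0.5):
    """Array response of a sqrt(X)-by-sqrt(X) planar array, element-wise as in (11)."""
    side = int(round(np.sqrt(X)))
    if side * side != X:
        raise ValueError("X must be a perfect square for a square planar array")
    idx = np.arange(X)            # idx = x - 1 for x = 1, ..., X
    row = idx // side             # floor((x - 1) / sqrt(X))
    col = idx % side              # (x - 1) mod sqrt(X)
    phase = 2 * np.pi * d_over_lambda * (
        row * np.sin(azimuth) * np.sin(elevation) + col * np.cos(elevation)
    )
    return np.exp(1j * phase)

def exp_correlation(n, rho):
    """Spatial covariance matrix with entries rho^|i-j|."""
    i = np.arange(n)
    return rho ** np.abs(i[:, None] - i[None, :])

def rician_vector(path_loss, rician_factor, los, cov, rng):
    """One draw of sqrt(pl) * (sqrt(k/(k+1)) * LoS + sqrt(1/(k+1)) * NLoS)."""
    n = los.shape[0]
    chol = np.linalg.cholesky(cov)
    white = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    nlos = chol @ white           # NLoS part ~ CN(0, cov)
    k = rician_factor
    return np.sqrt(path_loss) * (
        np.sqrt(k / (k + 1)) * los + np.sqrt(1 / (k + 1)) * nlos
    )

rng = np.random.default_rng(0)
N = 16                            # hypothetical square RIS size
los = upa_response(N, azimuth=0.3, elevation=1.1)
h = rician_vector(path_loss=1e-6, rician_factor=3.0, los=los,
                  cov=exp_correlation(N, 0.1), rng=rng)
```

Only the NLoS draw changes between calls; the LoS vector and the covariance matrix are the statistical CSI that stays fixed.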

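$$\mathrm{SINR}_k = \frac{\left|\left(\mathbf{h}_{ID,k}^H \boldsymbol{\Phi} \mathbf{H}_{SI} + \mathbf{h}_{SD,k}^H\right)\mathbf{w}_k\right|^2}{\sum_{i=1, i \neq k}^{K} (1+k_B)\left|\left(\mathbf{h}_{ID,k}^H \boldsymbol{\Phi} \mathbf{H}_{SI} + \mathbf{h}_{SD,k}^H\right)\mathbf{w}_i\right|^2 + (1+k_B)\sigma_k^2 + \Gamma_k(\mathbf{w}, \boldsymbol{\Phi})}. \tag{12}$$

$$\Gamma_k(\mathbf{w}, \boldsymbol{\Phi}) = \left(\mathbf{h}_{ID,k}^H \boldsymbol{\Phi} \mathbf{H}_{SI} + \mathbf{h}_{SD,k}^H\right)\left[k_B \mathbf{w}_k \mathbf{w}_k^H + (1+k_B)\,k_s\,\mathrm{diag}\left\{\sum_{i=1}^{K} \mathbf{w}_i \mathbf{w}_i^H\right\}\right]\left(\mathbf{H}_{SI}^H \boldsymbol{\Phi}^H \mathbf{h}_{ID,k} + \mathbf{h}_{SD,k}\right). \tag{13}$$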

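Thus, the signal received at the $k$-th user can be written as

$$y_k = \underbrace{\mathbf{h}_{ID,k}^H \boldsymbol{\Phi} \mathbf{H}_{SI}\left(\mathbf{w}_k s_k + \boldsymbol{\eta}_s\right)}_{\text{reflected link}} + \underbrace{\mathbf{h}_{SD,k}^H\left(\mathbf{w}_k s_k + \boldsymbol{\eta}_s\right)}_{\text{direct link}} + \underbrace{\sum_{i=1, i \neq k}^{K} \mathbf{h}_{ID,k}^H \boldsymbol{\Phi} \mathbf{H}_{SI} \mathbf{w}_i s_i}_{\text{multiuser interference}} + \underbrace{\eta_B}_{\text{receiver HWI}} + \underbrace{\eta_k}_{\text{noise}} = \tilde{y}_k + \eta_B, \tag{14}$$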

where $\boldsymbol{\Phi} = \mathrm{diag}\{e^{j\theta_1}, e^{j\theta_2}, \cdots, e^{j\theta_N}\}$ denotes the phase shift matrix at the RIS and $\theta_i$ is the phase shift of the $i$-th reflecting element; $\eta_B \sim \mathcal{CN}(0, k_B\,\mathbb{E}\{|\tilde{y}_k|^2\})$ denotes the additional zero-mean Gaussian distortion noise at the user's receiver, and $k_B \in (0, 1)$ denotes the normalized variance of the receive distortion noise; and $\eta_k \sim \mathcal{CN}(0, \sigma_k^2)$ denotes the additive white Gaussian noise at the $k$-th user.

Therefore, the $k$-th user's instantaneous signal-to-interference-plus-noise ratio (SINR) is given by (12) and (13). Based on (12) and (13), the instantaneous data rate of the $k$-th user can be expressed as

$$R_k = \log_2(1 + \mathrm{SINR}_k). \tag{15}$$

Therefore, the optimization problem in this paper can be written as

$$\begin{aligned} \max_{\mathbf{W}, \boldsymbol{\Phi}} \ \min_{k} \quad & \mathbb{E}[R_k] \\ \text{s.t.} \quad & \mathrm{C1}: \ \mathrm{tr}\{\mathbf{W}\mathbf{W}^H\} \le P_{\max}, \\ & \mathrm{C2}: \ \left|[\boldsymbol{\Phi}]_{i,i}\right| = 1, \ \forall i = 1, 2, \ldots, N, \end{aligned} \tag{16}$$

where the expectation in the objective function is taken over the non-line-of-sight components of the CSI. In the above optimization problem, C1 represents the power constraint at the BS, while C2 represents the unit-modulus constraints on the phase shifts of the RIS. Unfortunately, it is challenging to derive a closed-form expression of the objective function, since the average data rate contains an expectation over numerous random small-scale channel gains. In addition, this work considers the impact of hardware impairments at both the BS and the users, which makes the average data rate expressions much more complicated. Hence, conventional optimization algorithms that rely on closed-form derivations cannot be directly applied to this problem.
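Because the objective of (16) has no closed form, it can only be evaluated numerically. The sketch below shows one way to estimate the minimum average rate by Monte Carlo averaging of (15) with the SINR of (12) and (13). It is illustrative only: the function names, the small dimensions, the noise power and the i.i.d. Rayleigh sampler `draw_iid_channels` (a stand-in for the Rician model of this paper) are assumptions.

```python
import numpy as np

def sinr_all_users(H_SI, h_SD, h_ID, W, theta, k_s, k_B, sigma2):
    """SINR of every user following (12)-(13).
    Shapes: H_SI (N, M), h_SD (M, K), h_ID (N, K), W (M, K), theta (N,)."""
    Phi = np.diag(np.exp(1j * theta))
    # Row k of G is the effective channel h_ID,k^H Phi H_SI + h_SD,k^H.
    G = h_ID.conj().T @ Phi @ H_SI + h_SD.conj().T
    P = np.abs(G @ W) ** 2                   # P[k, i] = |g_k w_i|^2
    signal = np.diag(P)
    interference = (1 + k_B) * (P.sum(axis=1) - signal)
    # g_k diag{sum_i w_i w_i^H} g_k^H = sum_m |g_km|^2 * sum_i |w_mi|^2
    power_per_antenna = np.sum(np.abs(W) ** 2, axis=1)
    tx_distortion = (np.abs(G) ** 2) @ power_per_antenna
    gamma_k = k_B * signal + (1 + k_B) * k_s * tx_distortion
    return signal / (interference + (1 + k_B) * sigma2 + gamma_k)

def draw_iid_channels(rng, M, K, N):
    """Hypothetical stand-in sampler: i.i.d. CN(0, 1) entries."""
    def cn(*shape):
        return (rng.standard_normal(shape)
                + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    return cn(N, M), cn(M, K), cn(N, K)

def min_average_rate(W, theta, k_s, k_B, sigma2, n_samples, seed):
    """Monte Carlo estimate of min_k E[R_k] for a fixed W and theta."""
    rng = np.random.default_rng(seed)
    M, K = W.shape
    N = theta.shape[0]
    rates = np.zeros(K)
    for _ in range(n_samples):
        H_SI, h_SD, h_ID = draw_iid_channels(rng, M, K, N)
        sinr = sinr_all_users(H_SI, h_SD, h_ID, W, theta, k_s, k_B, sigma2)
        rates += np.log2(1.0 + sinr)
    return float(np.min(rates / n_samples))

M, K, N, P_max = 4, 2, 8, 1.0
rng = np.random.default_rng(1)
W = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
W *= np.sqrt(P_max) / np.linalg.norm(W, "fro")   # enforce tr{W W^H} = P_max
theta = rng.uniform(0.0, 2.0 * np.pi, N)

# Same seed, so both estimates use identical channel draws.
rate_ideal = min_average_rate(W, theta, 0.0, 0.0, 0.1, 200, seed=7)
rate_hwi = min_average_rate(W, theta, 0.05, 0.05, 0.1, 200, seed=7)
```

With identical channel draws, the impaired case can only enlarge the denominator of (12), so `rate_hwi` is smaller than `rate_ideal`.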

3. Proposed Algorithm

In this section, we propose a statistical CSI-based transmission scheme where the DDPG algorithm is adopted to solve the optimization problem.

3.1. Transmission Scheme

In existing transmission schemes for RIS-assisted communication systems, the instantaneous CSI is used to adjust the beamforming and the RIS phase shifts, which requires channel estimation in each channel coherence time interval, as shown in Figure 2. This approach has two drawbacks. First, the beamforming matrix and the phase shift matrix need to be recalculated in each channel coherence interval, which increases the computational complexity. Second, the phase shifts of the RIS need to be updated frequently and sent back to the RIS controller, which incurs significant feedback overhead.

To address this issue, in this paper, we consider a transmission scheme based on statistical CSI. As shown in Figure 2, in the statistical CSI-based scheme, the BS only needs to estimate the statistical CSI at the start of the transmission, and the remaining channel coherence time intervals are fully used to transmit information, which significantly reduces the computational complexity and feedback overhead. Once the network is trained, it can be applied directly in real time with only simple mathematical calculations. The neural networks only need to be retrained when the statistical CSI changes.

3.2. DDPG Algorithm

As the objective function in (16) does not have a closed-form expression, it is difficult to solve this problem using conventional optimization algorithms. Therefore, in this paper, we adopt a deep reinforcement learning algorithm, which can efficiently process complex environmental parameters and a large amount of state information by utilizing techniques such as stochastic gradient optimization and gradient backpropagation in deep neural networks. Specifically, the DDPG algorithm, which is one of the deep reinforcement learning algorithms, is employed, since it can handle this challenging optimization problem with continuous variables.

Figure 2. Transmission Scheme

The DDPG algorithm adopts the Actor-Critic architecture, which uses the policy

network to output deterministic actions directly, and the functions of its four networks are

introduced as follows:

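(1) Actor Current network

The role of the actor current network is to iteratively update the policy network parameters $\theta$ and to select the current action according to the state $S^{(t)}$ at time step $t$. The state is composed of three parts: the beamforming matrix $\mathbf{W}^{(t)} \in \mathbb{C}^{M \times K}$, the phase shift matrix $\boldsymbol{\Phi}^{(t)} \in \mathbb{C}^{N \times N}$ and the channel matrices, i.e., $\mathbf{H}_{SI}^{(t)} \in \mathbb{C}^{N \times M}$, $\mathbf{h}_{SD,k}^{(t)} \in \mathbb{C}^{M \times 1}$ and $\mathbf{h}_{ID,k}^{(t)} \in \mathbb{C}^{N \times 1}$. In addition, the actor current network interacts with the environment to generate the next state $S^{(t+1)}$ and the reward $R^{(t)}$, which is defined as $R^{(t)} = \max_{\mathbf{W}, \boldsymbol{\Phi}} \min_k \mathbb{E}[R_k^{(t)}]$, where $R_k^{(t)}$ is defined in (15). The loss function $J^{(t)}(\theta)$ can be expressed as

$$J^{(t)}(\theta) = -\frac{1}{m}\sum_{j=1}^{m} Q^{(t)}(s_j, a_j, w), \tag{17}$$

where $m$ is the number of samples drawn from the experience replay set.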

(2) Actor Target network

The actor target network selects the optimal action $a^{(t+1)}$ based on the state $S^{(t+1)}$ at time $t+1$ sampled from the experience replay pool. The action $a^{(t)} \in \mathbb{R}^{2MK+N}$ is composed of two parts: the first $N$ elements correspond to the phase shifts of the RIS reflecting elements, and the remaining $2MK$ elements correspond to the real and imaginary parts of the beamforming matrix. We take action $a^{(t)}$ to optimize the beamforming matrix $\mathbf{W}^{(t)}$ and the phase shift matrix $\boldsymbol{\Phi}^{(t)}$, and the optimized results can be described as

$$\mathbf{W}^{(t+1)} = \sqrt{P_{\max}}\,\frac{\mathbf{W}^{(t)}}{\|\mathbf{W}^{(t)}\|_F},$$

$$\phi_n^{a(t+1)} = \phi_n^{a(t)} + a_j^{(t)}\pi,$$

where $\phi_n^{(t)} = \cos(\phi_n^{a(t)}) + j\sin(\phi_n^{a(t)})$ is the $n$-th phase shift in $\boldsymbol{\Phi}^{(t)}$ and $a_j^{(t)}$ is the $j$-th action value in $a^{(t)}$, $\forall n = j = 1, 2, \ldots, N$.
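The mapping from an action vector to a feasible pair of beamforming and phase shift matrices can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `apply_action` and the ordering of the real and imaginary parts inside the last $2MK$ entries are assumptions.

```python
import numpy as np

def apply_action(action, phase, M, K, P_max):
    """Turn an action in R^(N + 2MK) into (W, Phi, new_phase).
    Assumed layout: N phase increments, then M*K real parts, then M*K imaginary parts."""
    N = phase.shape[0]
    if action.shape != (N + 2 * M * K,):
        raise ValueError("action has the wrong length")
    # Phase update: each increment is scaled by pi, as in the update rule above.
    new_phase = phase + action[:N] * np.pi
    real = action[N:N + M * K].reshape(M, K)
    imag = action[N + M * K:].reshape(M, K)
    W = real + 1j * imag
    # Scale to the power budget so that tr{W W^H} = P_max (constraint C1).
    W = np.sqrt(P_max) * W / np.linalg.norm(W, "fro")
    # Unit-modulus diagonal entries satisfy constraint C2 by construction.
    Phi = np.diag(np.exp(1j * new_phase))
    return W, Phi, new_phase

rng = np.random.default_rng(0)
M, K, N, P_max = 8, 4, 20, 1.0
action = rng.uniform(-1.0, 1.0, size=N + 2 * M * K)
W, Phi, phase = apply_action(action, np.zeros(N), M, K, P_max)
```

Normalizing by the Frobenius norm means every action yields a solution that meets both constraints of (16), so no separate projection step is needed.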

The target network parameter $\theta^{(t)\prime}$ is periodically copied from the current network parameter $\theta^{(t)}$ using the soft update method with soft update factor $\tau$:

$$\theta^{(t+1)\prime} = \tau\theta^{(t)} + (1-\tau)\theta^{(t)\prime}.$$

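(3) Critic Current network

The critic current network is used to iteratively update the value network parameter $w^{(t)}$ and to calculate the current value $Q(S^{(t)}, a^{(t)}, w^{(t)})$. The target value is given by

$$y_j = R^{(t)} + \gamma Q'\left(S^{(t+1)}, a^{(t+1)}, w^{(t)\prime}\right), \tag{18}$$

where $\gamma$ is the discount factor. The loss function is given by

$$J(w^{(t)}) = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q\left(\phi(S_j^{(t)}), a_j^{(t)}, w^{(t)}\right)\right)^2, \tag{19}$$

where $\phi(S)$ denotes the feature vector of the state $S$.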

(4) Critic Target network

The critic target network calculates the $Q'\left(S^{(t+1)}, a^{(t+1)}, w^{(t)\prime}\right)$ part of the target value in (18). The network parameter $w^{(t)\prime}$ is periodically copied from $w^{(t)}$ using the soft update method with soft update factor $\tau$:

$$w^{(t+1)\prime} = \tau w^{(t)} + (1-\tau) w^{(t)\prime}.$$
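The soft update used for both target networks can be illustrated with a few lines of Python. This is a sketch under assumptions: the parameters are represented as a list of NumPy arrays, and the value `tau = 0.1` is chosen only for illustration.

```python
import numpy as np

def soft_update(target_params, current_params, tau):
    """Return tau * current + (1 - tau) * target for each parameter array."""
    return [tau * c + (1.0 - tau) * t
            for t, c in zip(target_params, current_params)]

current = [np.ones((2, 2)), np.ones(2)]     # current network parameters
target = [np.zeros((2, 2)), np.zeros(2)]    # target network parameters

target = soft_update(target, current, tau=0.1)
first_step = target[0].copy()               # every entry has moved 10% of the way

# Repeated updates let the target track the current network slowly.
for _ in range(200):
    target = soft_update(target, current, tau=0.1)
```

A small $\tau$ makes the target values change slowly, which is the usual reason for keeping separate target networks in DDPG.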

At the same time, to add randomness and broaden the coverage of exploration during learning, the DDPG algorithm adds some noise $\mathcal{N}$ to the selected action. That is, the action that finally interacts with the environment is

$$a^{(t)} = \pi_\theta(S^{(t)}) + \mathcal{N}. \tag{20}$$

The structures of the actor and critic networks are shown in Table 1; both of them have three layers.

Table 1. Network Structures.

Network    Number of Neurons    Activation Function
Actor      128                  ReLU
           64                   ReLU
           N + 2MK              tanh(·)
Critic     64                   ReLU
           32                   ReLU
           1                    None
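To make the layer sizes in Table 1 concrete, the following sketch runs one forward pass through an actor and a critic with those sizes, then adds exploration noise as in (20). It uses plain NumPy with random weights, so it shows shapes and activations only, not a trained model. The state dimension (real and imaginary parts of $\mathbf{W}$ and of the diagonal of $\boldsymbol{\Phi}$), the weight initialization, the noise standard deviation 0.1 and the clipping to $[-1, 1]$ are all assumptions.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Random weights and zero biases for a fully connected network."""
    return [
        (rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(params, x, output_activation=None):
    """ReLU on the hidden layers, optional activation on the output layer."""
    for weights, bias in params[:-1]:
        x = np.maximum(x @ weights + bias, 0.0)
    weights, bias = params[-1]
    x = x @ weights + bias
    return output_activation(x) if output_activation is not None else x

M, K, N = 8, 4, 20
action_dim = N + 2 * M * K          # actor output size from Table 1
state_dim = 2 * M * K + 2 * N       # assumed state encoding

rng = np.random.default_rng(0)
actor = init_mlp([state_dim, 128, 64, action_dim], rng)
critic = init_mlp([state_dim + action_dim, 64, 32, 1], rng)

state = rng.standard_normal(state_dim)
action = forward(actor, state, np.tanh)                  # entries in [-1, 1]
noise = 0.1 * rng.standard_normal(action_dim)            # exploration noise
noisy_action = np.clip(action + noise, -1.0, 1.0)
q_value = forward(critic, np.concatenate([state, noisy_action]))
```

The tanh output keeps each action entry in $[-1, 1]$, which matches the phase update that scales the first $N$ entries by $\pi$.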

The overall algorithm is summarized in Algorithm 1.

Algorithm 1 The Proposed DDPG Algorithm.

1: Randomly initialize $\theta$ and $w$, and set the target network parameters $w' = w$ and $\theta' = \theta$. Empty the experience replay set $D$.
2: for $I = 1, 2, \ldots, T$ do
3:   Initialize $S^{(t)}$ as the first state of the current state sequence, and get its feature vector $\phi(S^{(t)})$.
4:   Get the action $A^{(t)} = \pi_\theta(S^{(t)}) + \mathcal{N}$ from the Actor's current network based on the state $S^{(t)}$.
5:   Perform the action $A^{(t)}$, get the new state $S^{(t+1)}$ and the reward $R^{(t)}$, and determine whether the termination status 'end' is reached.
6:   Store the tuple $\{\phi(S^{(t)}), A^{(t)}, R^{(t)}, \phi(S^{(t+1)}), \mathrm{end}\}$ in the experience replay set $D$.
7:   Set $S^{(t)} \leftarrow S^{(t+1)}$.
8:   Sample $m$ tuples $\{\phi(S_j^{(t)}), A_j^{(t)}, R_j^{(t)}, \phi(S_j^{(t+1)}), \mathrm{end}_j\}$, $\forall j = 1, 2, \ldots, m$, from the experience replay set, and calculate the current target Q value $y_j$:

     $$y_j = \begin{cases} R_j^{(t)}, & \text{if } \mathrm{end}_j \text{ is true}, \\ R_j^{(t)} + \gamma Q'\left(\phi(S_j^{(t+1)}), \pi_{\theta'}(\phi(S_j^{(t+1)})), w'\right), & \text{otherwise}. \end{cases} \tag{21}$$

9:   Use the mean squared error loss function

     $$\frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q^{(t)}\left(\phi(S_j^{(t)}), A_j^{(t)}, w\right)\right)^2$$

     to update the Critic's current network parameter $w$ through gradient backpropagation.
10:  Use

     $$J(\theta) = -\frac{1}{m}\sum_{j=1}^{m} Q^{(t)}(s_j, a_j, w)$$

     to update the Actor's current network parameter $\theta$ through gradient backpropagation.
11:  If $t \bmod C = 1$, update the parameters of the Critic's target network and of the Actor's target network:

     $$w' \leftarrow \tau w + (1-\tau) w', \qquad \theta' \leftarrow \tau\theta + (1-\tau)\theta'.$$

12:  If $S^{(t+1)}$ is at the termination status, end the current iteration; otherwise, go to Step 4.
13: end for
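Steps 8 and 9 of Algorithm 1 can be illustrated numerically. The sketch below draws a mini-batch from a replay set, forms the targets of (21) and evaluates the mean squared error that the critic minimizes. The helper names and the toy numbers are hypothetical; in a full implementation the Q values would come from the critic networks rather than from fixed arrays.

```python
import numpy as np

def sample_minibatch(buffer, m, rng):
    """Draw m distinct stored transitions uniformly at random."""
    idx = rng.choice(len(buffer), size=m, replace=False)
    return [buffer[i] for i in idx]

def td_targets(rewards, done, next_q, gamma):
    """Targets of (21): R_j if the episode ended, else R_j + gamma * Q'(next)."""
    rewards = np.asarray(rewards, dtype=float)
    done = np.asarray(done, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    return rewards + gamma * (1.0 - done) * next_q

def critic_mse(targets, q_values):
    """Mean squared error between targets and current Q values (step 9)."""
    diff = np.asarray(targets, dtype=float) - np.asarray(q_values, dtype=float)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
buffer = list(range(10))                    # stand-in for stored transitions
batch = sample_minibatch(buffer, 4, rng)

# Two toy transitions: the second one is terminal, so its target is the reward.
y = td_targets(rewards=[1.0, 0.5], done=[0, 1], next_q=[2.0, 3.0], gamma=0.9)
loss = critic_mse(y, q_values=[2.0, 0.5])
```

Here the targets are 2.8 and 0.5, and the resulting loss is 0.32.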

4. Simulation Results

In this section, the performance of the DDPG algorithm-based scheme is evaluated. The locations of A and B are set at (0, 0, 30 m) and (100 m, 20 m, 10 m), respectively. In addition, the users are located within a circle centered at (150 m, 0, 1.5 m) with a radius of 20 m. Other parameters are shown in Table 2.

Based on the parameters in Table 2, we adopt the DDPG algorithm based on statistical CSI. The number of reflecting elements is set to $N = 20$, 30, 40 and 50, respectively. For each time step, we use the beamforming matrix $\mathbf{W}$ and the phase shift matrix $\boldsymbol{\Phi}$ at time step $t$ as the input of the DDPG neural network, and the output is $\mathbf{W}$ and $\boldsymbol{\Phi}$ at time $t+1$. In Figures 3–6, we illustrate the minimum average user data rate versus the time steps for different $N$.

Table 2. Simulation Parameters.

Parameter Name                   Sign             Parameter Value
Noise power density              $\rho_n$         −174 dBm/Hz
Channel bandwidth                $B$              1 MHz
Reference path loss              $PL_0$           −30 dB
Reference distance               $d_0$            1 m
Path loss coefficients           $\beta$          2.2
                                 $\alpha_k$       2.2
                                 $\gamma_k$       3.75
Rician factors                   $\delta$         3
                                 $\varepsilon_k$  3
                                 $\rho_k$         3
Correlation coefficient          $\rho$           0.1
Normalized variance              $k_B$            0.01
Normalized variance              $k_s$            0.01
Number of antennas               $M$              8
Number of users                  $K$              4
Number of reflecting elements    $N$              20–50
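As a worked example of how the entries of Table 2 translate into linear quantities, the sketch below computes the noise power from the noise power density and the bandwidth, and a large-scale path loss from the reference path loss. The log-distance model in `path_loss_db` is an assumption, since the text lists the parameters but not the formula; the reading of the reference path loss as −30 dB at 1 m and the 100 m example distance are likewise assumptions.

```python
import numpy as np

# Noise power over the channel bandwidth: density (dBm/Hz) + 10*log10(B in Hz).
noise_density_dbm_hz = -174.0
bandwidth_hz = 1e6
noise_power_dbm = noise_density_dbm_hz + 10.0 * np.log10(bandwidth_hz)
sigma2_watt = 10.0 ** ((noise_power_dbm - 30.0) / 10.0)   # dBm to watts

def path_loss_db(distance_m, exponent, pl0_db=-30.0, d0_m=1.0):
    """Assumed log-distance model: PL0 - 10 * exponent * log10(d / d0), in dB."""
    return pl0_db - 10.0 * exponent * np.log10(distance_m / d0_m)

# Hypothetical 100 m link with the exponent 2.2 listed in Table 2.
pl_db = path_loss_db(100.0, 2.2)
pl_linear = 10.0 ** (pl_db / 10.0)
```

Under these assumptions the noise power is −114 dBm, and the 100 m link has a path loss of −74 dB.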

Figure 3. RIS = 20.

Figure 4. RIS = 30.


Figure 5. RIS = 40.

Figure 6. RIS = 50.

As shown in Figures 3–6, when the number of RIS reflecting elements is set to $N = 20$, the minimum average user data rate (MAUR) converges to 0.75; when $N = 30$, the MAUR converges to about 0.8. When the number of RIS reflecting elements increases to 40, the MAUR increases to 0.9, and when $N$ is 50, the rate is about 1.1. Hence, the MAUR increases with the number of reflecting elements. Meanwhile, the simulation results show that increasing the number of reflecting elements does not affect the convergence speed of the proposed DDPG algorithm.

In Figures 7–10, we set $k_B = k_s$ to 0.01, 0.05, 0.1 and 0.15, respectively, to explore the convergence of the MAUR under different levels of hardware impairment. The results indicate that the MAUR decreases as $k_B$ and $k_s$ increase. This conclusion is consistent with the SINR expression in (12) and (13). Furthermore, when $k_B$ and $k_s$ are set to 0, the result is identical to the case in which hardware impairments are not considered.

In addition, considering the influence of wind and rain in practical environments, we add a random perturbation to the channel angles (angles of departure and angles of arrival), where the perturbation is assumed to follow a uniform distribution. The uniform distribution can be regarded as a worst case, since the perturbation is spread evenly over its range rather than concentrated around one point as in the Gaussian distribution. Then, we apply the solution obtained from our trained DDPG networks to the realistic channels with angle perturbations to demonstrate the effectiveness of our algorithm. As observed from Figure 11, the performance degradation due to the channel variations is small, which confirms the robustness of the proposed algorithm.


Figure 7. Convergence Speed when kB=kS=0.01.

Figure 8. Convergence Speed when kB=kS=0.05.

Figure 9. Convergence Speed when kB=kS=0.1.


Figure 10. Convergence Speed when kB=kS=0.15.

Figure 11. Performance veriﬁcation under disturbance.

Finally, in Figure 12, we compare the performance of the proposed algorithm with a non-optimized baseline to evaluate the effectiveness of the optimization. Specifically, for the non-optimized baseline, the beamforming vectors at the BS are randomly generated and the phase shift matrix is set to the identity matrix. It is observed from Figure 12 that the proposed algorithm significantly outperforms the non-optimized baseline.

Figure 12. Comparison with non-optimized algorithm.


5. Conclusions

In this paper, we studied the downlink RIS-aided multiuser MISO system with imperfect hardware, where the design is based on statistical CSI. The DDPG algorithm was applied to jointly optimize the beamforming matrix at the BS and the phase shift matrix at the RIS. Furthermore, the transceiver hardware impairment was considered to account for the inevitable hardware imperfections in practical systems. The simulation results demonstrated that it is necessary to take HWI into account and that the DDPG algorithm can achieve excellent performance.

Author Contributions: Conceptualization, W.M., L.L., L.Z., Y.L. and H.R.; methodology, W.M., L.L., L.Z., Y.L. and H.R.; software, W.M.; validation, L.Z.; formal analysis, Y.L.; investigation, W.M.; resources, W.M.; data curation, W.M.; writing (original draft preparation), W.M., L.L., L.Z. and Y.L.; writing (review and editing), H.R.; visualization, Y.L.; supervision, H.R.; project administration, H.R.; funding acquisition, H.R. All authors have read and agreed to the published version of the manuscript.

Funding: This work was supported in part by the National Natural Science Foundation of China (62101128) and the Basic Research Project of Jiangsu Provincial Department of Science and Technology (BK20210205).

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Not applicable.

Conflicts of Interest: The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

1. Pan, C.; Ren, H.; Wang, K.; Kolb, J.F.; Elkashlan, M.; Chen, M.; Di Renzo, M.; Hao, Y.; Wang, J.; Swindlehurst, A.L.; et al. Reconfigurable Intelligent Surfaces for 6G Systems: Principles, Applications, and Research Directions. IEEE Commun. Mag. 2021, 59, 14–20.

2. Renzo, M.D.; Debbah, M.; Phan-Huy, D.T.; Zappone, A.; Alouini, M.S.; Yuen, C.; Sciancalepore, V.; Alexandropoulos, G.C.; Hoydis, J.; Gacanin, H.; et al. Smart radio environments empowered by reconfigurable AI meta-surfaces: An idea whose time has come. EURASIP J. Wirel. Commun. Netw. 2019, 2019, 1–20.

3. Oliveri, G.; Rocca, P.; Salucci, M.; Massa, A. Holographic smart EM skins for advanced beam power shaping in next generation wireless environments. IEEE J. Multiscale Multiphys. Comput. Tech. 2021, 6, 171–182.

4. Di Renzo, M.; Zappone, A.; Debbah, M.; Alouini, M.S.; Yuen, C.; de Rosny, J.; Tretyakov, S. Smart Radio Environments Empowered by Reconfigurable Intelligent Surfaces: How It Works, State of Research, and The Road Ahead. IEEE J. Sel. Areas Commun. 2020, 38, 2450–2525.

5. Huang, C.; Hu, S.; Alexandropoulos, G.C.; Zappone, A.; Yuen, C.; Zhang, R.; Renzo, M.D.; Debbah, M. Holographic MIMO Surfaces for 6G Wireless Networks: Opportunities, Challenges, and Trends. IEEE Wirel. Commun. 2020, 27, 118–125.

6. Benoni, A.; Salucci, M.; Oliveri, G.; Rocca, P.; Li, B.; Massa, A. Planning of EM Skins for Improved Quality-of-Service in Urban Areas. IEEE Trans. Antennas Propag. 2022.

7. Pan, C.; Ren, H.; Wang, K.; Xu, W.; Elkashlan, M.; Nallanathan, A.; Hanzo, L. Multicell MIMO Communications Relying on Intelligent Reflecting Surfaces. IEEE Trans. Wirel. Commun. 2020, 19, 5218–5233.

8. Pan, C.; Ren, H.; Wang, K.; Elkashlan, M.; Nallanathan, A.; Wang, J.; Hanzo, L. Intelligent Reflecting Surface Aided MIMO Broadcasting for Simultaneous Wireless Information and Power Transfer. IEEE J. Sel. Areas Commun. 2020, 38, 1719–1734.

9. Boulogeorgos, A.A.A.; Alexiou, A. How Much do Hardware Imperfections Affect the Performance of Reconfigurable Intelligent Surface-Assisted Systems? IEEE Open J. Commun. Soc. 2020, 1, 1185–1195.

10. Shen, H.; Xu, W.; Gong, S.; Zhao, C.; Ng, D.W.K. Beamforming Optimization for IRS-Aided Communications with Transceiver Hardware Impairments. IEEE Trans. Commun. 2021, 69, 1214–1227.

11. Zhou, G.; Pan, C.; Ren, H.; Wang, K.; Peng, Z. Secure Wireless Communication in RIS-Aided MISO System with Hardware Impairments. IEEE Wirel. Commun. Lett. 2021, 10, 1309–1313.

12. Peng, Z.; Li, T.; Pan, C.; Ren, H.; Wang, J. RIS-Aided D2D Communications Relying on Statistical CSI With Imperfect Hardware. IEEE Commun. Lett. 2022, 26, 473–477.

13. Wang, K.; Lam, C.T.; Ng, B.K. Doppler Effect Mitigation using Reconfigurable Intelligent Surfaces with Hardware Impairments. In Proceedings of the 2021 IEEE Globecom Workshops (GC Wkshps), Madrid, Spain, 7–11 December 2021; pp. 1–6.

14. Hemanth, A.; Umamaheswari, K.; Pogaku, A.C.; Do, D.T.; Lee, B.M. Outage Performance Analysis of Reconfigurable Intelligent Surfaces-Aided NOMA Under Presence of Hardware Impairment. IEEE Access 2020, 8, 212156–212165.

15. Peng, Z.; Chen, Z.; Pan, C.; Zhou, G.; Ren, H. Robust Transmission Design for RIS-Aided Communications With Both Transceiver Hardware Impairments and Imperfect CSI. IEEE Wirel. Commun. Lett. 2022, 11, 528–532.

16. Hassan, A.K.; Moinuddin, M.; Al-Saggaf, U.M.; Aldayel, O.; Davidson, T.N.; Al-Naffouri, T.Y. Performance Analysis and Joint Statistical Beamformer Design for Multi-User MIMO Systems. IEEE Commun. Lett. 2020, 24, 2152–2156.

17. Zhi, K.; Pan, C.; Ren, H.; Wang, K. Power Scaling Law Analysis and Phase Shift Optimization of RIS-Aided Massive MIMO Systems With Statistical CSI. IEEE Trans. Commun. 2022, 70, 3558–3574.

18. Dai, J.; Zhu, F.; Pan, C.; Ren, H.; Wang, K. Statistical CSI-Based Transmission Design for Reconfigurable Intelligent Surface-Aided Massive MIMO Systems With Hardware Impairments. IEEE Wirel. Commun. Lett. 2022, 11, 38–42.

19. Ren, H.; Pan, C.; Wang, L.; Liu, W.; Kou, Z.; Wang, K. Long-Term CSI-Based Design for RIS-Aided Multiuser MISO Systems Exploiting Deep Reinforcement Learning. IEEE Commun. Lett. 2022, 26, 567–571.