
A Joint Optimization of Sensor Activation and Mobile Charging Scheduling in Industrial Wireless Rechargeable Sensor Networks

Jiayuan Chen∗, Changyan Yi∗, Ran Wang∗, Kun Zhu∗ and Jun Cai†
∗College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
†Department of Electrical and Computer Engineering, Concordia University, Montréal, QC, H3G 1M8, Canada
Email: {jiayuan.chen, changyan.yi, wangran, zhukun}@nuaa.edu.cn, jun.cai@concordia.ca

Abstract—In this paper, a joint optimization of sensor activation and mobile charging scheduling for industrial wireless rechargeable sensor networks (IWRSNs) is studied. In the considered model, an optimal sensor set is selected to collaboratively execute a bundle of heterogeneous production-line monitoring tasks while meeting the quality-of-monitoring (QoM) requirement of each individual task. A mobile charger vehicle (MCV) is scheduled to recharge sensors before their charging deadlines (i.e., the time instants at which they run out of energy). Our goal is to jointly optimize the sensor activation and the MCV scheduling to minimize the energy consumption of the entire IWRSN, subject to the tasks' QoM requirements, the sensors' charging deadlines and the energy capacity of the MCV. Solving this problem is non-trivial because it involves two tightly coupled NP-hard problems. To address this issue, we design an efficient algorithm integrating deep reinforcement learning and a marginal-product-based approximation algorithm. Simulations are conducted to evaluate the performance of the proposed solution and demonstrate its superiority over counterparts.

I. INTRODUCTION

WITH the development of intelligent manufacturing,

industrial wireless sensor networks (IWSNs) have been

widely used for the automatic control of industrial production

process and the monitoring of various parameters. Never-

theless, wireless sensor nodes are severely energy-limited,

which hinders the wide application of IWSNs. To tackle such

sensor energy provisioning problem, researchers studied how

to reduce the energy consumption by optimizing wake-up and

sleeping scheduling, data gathering and routing strategies, etc.

to prolong the lifetime of IWSNs. However, these methods

cannot fundamentally address the shortage of total energy

capacities of sensors. Therefore, recent advances of wireless

energy transfer technology have inspired the emergence of

industrial wireless rechargeable sensor networks (IWRSNs)

[1], in which mobile charger vehicles (MCVs) are employed

to travel around and replenish energy for sensors without

interconnecting wires.

Although IWRSNs can obviously outperform traditional IWSNs in alleviating the heavy burden of energy consumption, some open problems remain. In practice, sensing tasks for production-line monitoring may be highly heterogeneous in terms of quality-of-monitoring (QoM) requirements, locations and types. Besides, industrial sensors may also be heterogeneous in terms of sensing radius, type, etc. Therefore, it is crucial to select the optimal set of sensors to activate for collaboratively and continuously executing all monitoring tasks while meeting the QoM requirement of each task, and this problem becomes even more complicated since sensors in IWRSNs are rechargeable.

Furthermore, industrial sensors must sustain high-intensity work for long periods and continuously feed data back to controllers or actuators. For example, while a cutting machine is working, industrial camera sensors must collaboratively monitor the position of the cutters in real time and send out the data in a timely manner. Any unpredictable sensor failure may cause serious consequences, e.g., unexpected damage and casualties. Hence, in order to guarantee that all activated sensors can work continuously during the monitoring period, the MCV in an IWRSN should be scheduled to recharge sensors before their charging deadlines (i.e., the instants at which they run out of energy). However, the energy capacity of the MCV is also limited, and thus the scheduling of the MCV is subject not only to the charging deadlines of the sensors, but also to its own energy capacity constraint.

To address the aforementioned issues, in this paper, we study a joint optimization of sensor activation and mobile charging scheduling for IWRSNs. The goal is to jointly optimize the sensor activation and the MCV scheduling to minimize the energy consumption of the considered IWRSN, subject to the tasks' QoM requirements, the sensors' charging deadlines and the energy capacity of the MCV. In the considered model, the MCV starts from the depot, travels along the scheduled path and returns to the depot at the end of a trip. While traveling on its path, the MCV charges activated sensors before their charging deadlines. To solve this joint sensor activation and mobile charging scheduling problem, we propose an efficient algorithm integrating deep reinforcement learning (DRL) and a marginal-product-based approximation algorithm.

The main contributions of this paper are summarized as follows.
• A joint optimization of sensor activation and mobile charging scheduling for IWRSNs is formulated, where the objective is to minimize the energy consumption of the entire network.

• An efficient algorithm, called the joint sensor activation and charging scheduling algorithm (JSACS), is proposed, integrating DRL and a marginal-product-based approximation algorithm, which jointly optimizes the sensor activation and the MCV's charging route scheduling.
• Simulations are conducted to show the superiority of the proposed JSACS over counterparts.

Fig. 1. An illustration of the considered IWRSN (an industrial environment with a depot, a mobile charging vehicle, inactive and active sensors, monitoring tasks, the sensors' initial energy, charging routes, sensing radii and charging deadlines).

The rest of this paper is organized as follows: Section II presents the system model and the problem description. In Section III, an efficient solution for the problem is proposed. Simulation results are provided in Section IV, followed by conclusions in Section V.

II. SYSTEM MODEL AND PROBLEM DESCRIPTION

A. Network Model

Consider an IWRSN, as illustrated in Fig. 1, consisting of a group of tasks for production-line monitoring, a set of stationary industrial rechargeable sensors S with cardinality |S| = S uniformly distributed in a certain area, and an MCV which starts working from a depot deployed at the center.

At the beginning of a monitoring period, the industrial controller declares a bundle of monitoring tasks Z = {z_j^m | ∀m ∈ {1, 2, ..., M}, ∀j ∈ {1, 2, ..., J}} to the IWRSN, where m and j stand for the index of the monitoring task and its corresponding type, respectively. To meet the QoM requirements of these tasks, a group of sensors H ⊆ S should be activated to collaboratively execute the monitoring tasks.

In practice, sensors' sensing radii are limited, denoted by R_i, i ∈ S. In addition, different types of sensors can only execute tasks fitting their types, and thus we define S_j as the set of sensors specialized in task type j. Obviously, each sensor i ∈ S can only execute a task z_j^m ∈ Z that is located within its sensing radius R_i and falls into its targeted type. In each monitoring period, each sensor is able to execute at most one task. In this paper, we adopt the probabilistic sensing coverage (PSC) model [2], [3], and denote p_{i,z_j^m} as the detection probability of z_j^m by sensor i, which can be calculated as

    p_{i,z_j^m} = { e^{−α_i · dist(i, z_j^m)},  if dist(i, z_j^m) ≤ R_i and i ∈ S_j,
                  { 0,                          otherwise,                            (1)

where α_i represents the intensity coefficient related to sensor i's physical characteristics, and dist(i, z_j^m) indicates the Euclidean distance between sensor i and task z_j^m [2]–[4]. The collaborative coverage probability of the sensor set H for the monitoring task z_j^m is required to be larger than or equal to P_{z_j^m}^{demand}, i.e.,

    1 − ∏_{i ∈ H} (1 − p_{i,z_j^m}) ≥ P_{z_j^m}^{demand},    (2)

where P_{z_j^m}^{demand} measures the minimum QoM demanded by each task z_j^m. Sensors that are activated to execute tasks should work continuously during the monitoring period, owing to the industrial monitoring application. However, the battery capacity E_i^{capacity} of each sensor is limited, and once the battery is completely depleted, the sensor stops working. To this end, the MCV, with energy capacity E_{MCV}, is employed; it starts traveling at the depot, charges dying sensors in H and returns to the depot at the end. Because of hardware limitations, the MCV can only recharge one sensor at a time. We denote E_i^{initial} as the initial energy of each sensor i ∈ S at the beginning of the monitoring period. For simplicity, assume that for each sensor i ∈ S, E_i^{initial} is sufficiently large to guarantee that E_i^{initial} ≥ E_i^{min}, where E_i^{min} is the minimum energy for sensor i to be operational. Here, we characterize the energy consumption rate of each sensor i ∈ S by E_i^{consume}. Note that some sensors may have sufficient energy to work continuously during the monitoring period without being recharged by the MCV. We classify these sensors into the set H_0 ⊆ H, and categorize the others, which have to be recharged by the MCV, into the set H_1 = H \ H_0. Obviously, the amount of energy that sensor i ∈ H_1 requires to be recharged can be calculated as

    E_i^{demand} = T · E_i^{consume} − (E_i^{initial} − E_i^{min}),  ∀i ∈ H_1,    (3)

where T is the time duration of each production-line monitoring period.
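The detection probability in (1) and the QoM feasibility check in (2) can be sketched as follows (a minimal illustration; the function and variable names are ours, not from the paper):

```python
import math

def detection_prob(alpha_i, distance, radius_i, type_match=True):
    """Detection probability p_{i,z} under the PSC model, Eq. (1)."""
    if type_match and distance <= radius_i:
        return math.exp(-alpha_i * distance)
    return 0.0

def collaborative_coverage(probs):
    """Collaborative coverage of a sensor set, left-hand side of Eq. (2)."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p          # probability that every sensor misses the task
    return 1.0 - miss

def qom_satisfied(probs, p_demand):
    """Check the QoM constraint (2) for one task."""
    return collaborative_coverage(probs) >= p_demand
```

For instance, two sensors that each detect a task with probability 0.5 cover it collaboratively with probability 0.75, which already satisfies a QoM demand drawn from the [0.5, 0.7] range of Table I.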

To ensure that all activated sensors can execute tasks continuously, the MCV should charge the sensors in the set H_1 before their charging deadlines ddl_i, i ∈ H_1, which can be calculated as

    ddl_i = (E_i^{initial} − E_i^{min}) / E_i^{consume},  ∀i ∈ H_1.    (4)
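The recharge demand (3) and charging deadline (4) follow directly from the per-sensor energy parameters, as in this sketch (the concrete numbers below are ours, chosen within the ranges of Table I):

```python
def charging_demand(T, e_consume, e_initial, e_min):
    """Energy the MCV must deliver to a sensor, Eq. (3).
    A non-positive value means the sensor can finish the period unaided."""
    return T * e_consume - (e_initial - e_min)

def charging_deadline(e_initial, e_min, e_consume):
    """Time at which a sensor's residual energy reaches E_min, Eq. (4)."""
    return (e_initial - e_min) / e_consume
```

With T = 3600 s, E^consume = 0.5 J/s, E^initial = 2000 J and E^min = 540 J, the demand is 340 J and the deadline is 2920 s.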

Besides, let us denote the charging route of the MCV by a vector L_{H_1} = {π_0, π_1, ..., π_g, ..., π_{|H_1|}, π_{|H_1|+1}}, where π_g signifies the g-th visiting target (i.e., the targeted sensor for recharging). Specifically, π_0 = π_{|H_1|+1} = 0 indicates that the MCV travels starting from the depot and returns at the end, and π_g ∈ H_1 for g = 1, ..., |H_1|. Note that each sensor i ∈ H_1 can only be visited once, that is, π_g ≠ π_{g'} for g ≠ g'. Furthermore, we define the arrival time of the MCV at a visiting target π_g as A_{π_g}. Clearly, A_{π_g} depends on the arrival time at the previously visited target π_{g−1}, the service time (i.e., battery recharging time) for the target π_{g−1}, and the traveling time of the MCV from π_{g−1} to π_g. Hence, A_{π_g} can be expressed as

    A_{π_g} = A_{π_{g−1}} + E_{π_{g−1}}^{demand} / ε + dist(π_{g−1}, π_g) / v,  ∀π_g ∈ L_{H_1},    (5)

where ε and v stand for the charging efficiency and the velocity of the MCV, respectively. Following the definition in (3), E_{π_g}^{demand} depicts the amount of energy that the target π_g (i.e., sensor π_g) demands for recharging. In particular, E_{π_0}^{demand} = E_{π_{|H_1|+1}}^{demand} = 0, and A_{π_0} = 0.

In this paper, we assume that once a sensor i ∈ H has been fully recharged, it can work continuously without interruption during the monitoring period, namely E_i^{capacity} ≥ T · E_i^{consume}.
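The arrival-time recursion (5) unrolls along a route as in the following sketch (illustrative names of our own; the depot is index 0 with zero demand):

```python
import math

def arrival_times(route, coords, demands, eps, v):
    """Arrival time A_{pi_g} at each stop of the MCV route, Eq. (5).

    route:   visiting order, e.g. [0, 3, 1, 0] with 0 = depot
    coords:  (x, y) position of each node, indexed by node id
    demands: E^demand of each node in joules (0 for the depot)
    eps:     charging efficiency (W); v: travel speed (m/s)
    """
    times = [0.0]                                    # A_{pi_0} = 0
    for g in range(1, len(route)):
        prev, cur = route[g - 1], route[g]
        service = demands[prev] / eps                # recharging time at pi_{g-1}
        travel = math.dist(coords[prev], coords[cur]) / v
        times.append(times[-1] + service + travel)
    return times
```

For a depot at (0, 0) and one sensor at (4, 3) m with a 150 J demand, ε = 15 W and v = 2 m/s, the arrival times along [0, 1, 0] are [0, 2.5, 15] s.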

B. Problem Description

The energy consumption of an IWRSN includes the energy consumption of the MCV and the energy consumption of the sensors in H for executing tasks. Although the energy cost of the MCV further consists of both a traveling energy cost and a recharging energy cost, all recharging energy will eventually be consumed by sensors (for a higher energy utilization efficiency), and thus this term is implied by the energy cost of the sensors in H. Therefore, the total energy consumption of an IWRSN, E^{total}(H, L_{H_1}), can be formulated as

    E^{total}(H, L_{H_1}) = Σ_{g=0}^{|H_1|} γ · dist(π_g, π_{g+1}) + Σ_{i ∈ H} T · E_i^{consume},

where γ represents the energy consumption rate of the MCV's traveling.

Accordingly, a joint optimization of sensor activation (i.e., the optimal set of sensors to activate, H) and mobile charging scheduling (i.e., the optimal charging route L_{H_1}) for the IWRSN can be formulated as

    [P1]:  min_{H, L_{H_1}}  E^{total}(H, L_{H_1})    (6)
    s.t.,  1 − ∏_{i ∈ H} (1 − p_{i,z_j^m}) ≥ P_{z_j^m}^{demand},  ∀z_j^m ∈ Z,    (7)
           A_{π_g} ≤ ddl_{π_g},  g = 1, ..., |H_1|,    (8)
           π_g ≠ π_{g'},  g ≠ g';  g = 1, ..., |H_1|, g' = 1, ..., |H_1|,    (9)
           Σ_{g=0}^{|H_1|} γ · dist(π_g, π_{g+1}) + Σ_{g=1}^{|H_1|} E_{π_g}^{demand} ≤ E_{MCV},    (10)
           π_0 = 0, π_{|H_1|+1} = 0,    (11)
           H ⊆ S,    (12)
           H = H_0 ∪ H_1,    (13)
           L_{H_1} = {π_0, π_1, ..., π_g, ..., π_{|H_1|}, π_{|H_1|+1}},    (14)

where constraint (7) states that each monitoring task's QoM requirement should be met; constraint (8) ensures that the MCV is always scheduled to arrive before each sensor's charging deadline expires; constraint (9) means that the MCV should not visit the same sensor more than once in the scheduled charging route; constraint (10) indicates that the total energy consumption of the MCV should be less than or equal to its energy capacity E_{MCV}; and constraint (11) states that the MCV starts at the depot and returns to the depot at the end. In the following section, we will propose an efficient algorithm to derive the solution of this joint optimization problem.
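As a concrete reading of constraints (8)–(11), the following sketch checks whether a given charging route is feasible (an illustrative helper of our own, not part of the paper's algorithms):

```python
import math

def route_feasible(route, coords, demands, deadlines, eps, v, gamma, e_mcv):
    """Check a charging route against constraints (8)-(11) of [P1].

    route starts and ends at the depot (node 0); demands[0] must be 0.
    """
    if route[0] != 0 or route[-1] != 0:                 # constraint (11)
        return False
    inner = route[1:-1]
    if len(set(inner)) != len(inner):                   # constraint (9)
        return False
    t, travel = 0.0, 0.0
    for g in range(1, len(route)):
        prev, cur = route[g - 1], route[g]
        leg = math.dist(coords[prev], coords[cur])
        t += demands[prev] / eps + leg / v              # arrival time, Eq. (5)
        travel += leg
        if cur != 0 and t > deadlines[cur]:             # constraint (8)
            return False
    energy = gamma * travel + sum(demands[i] for i in inner)
    return energy <= e_mcv                              # constraint (10)
```

With the Table I values (γ = 20 J/m, ε = 15 W, v = 2 m/s, E_MCV = 128 kJ), short routes with modest demands pass easily, while a route that reaches a sensor after its deadline is rejected.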

III. JOINT SENSOR ACTIVATION AND MOBILE CHARGING SCHEDULING

A. Hardness Analysis

From the problem formulation [P1], we can observe that the joint optimization of sensor activation and mobile charging scheduling actually comprises a two-layer optimization. The upper-layer optimization mainly addresses the sensor set selection under the tasks' QoM constraints, where the objective is to minimize the energy consumption of the activated sensor set H. The lower-layer optimization aims to determine the charging route scheduling for the MCV by taking into account the sensors' charging deadlines, where the objective is to minimize the traveling energy consumption of the MCV. Indeed, these two optimization problems are tightly coupled.

Given the charging route L_{H_1} of the MCV, we can obtain the set of candidate sensors S' ⊆ S, where all sensors in S' have sufficient energy to execute monitoring tasks continuously during the monitoring period. The upper-layer sensor set selection problem then becomes a variant of the generalized assignment problem, which is NP-hard:

    [P2]:  min_H  Σ_{i ∈ H} T · E_i^{consume}
    s.t.,  (7), (13) and H ⊆ S'.

Conversely, given the set H, the set H_1 can also be obtained, and the lower-layer mobile charging route scheduling problem can be seen as a reduced traveling salesman problem with time windows, which is NP-hard:

    [P3]:  min_{L_{H_1}}  Σ_{g=0}^{|H_1|} γ · dist(π_g, π_{g+1})
    s.t.,  (8), (9), (10), (11) and (14).

Based on the above analysis, it is obvious that directly solving the joint optimization of sensor activation and mobile charging scheduling for the IWRSN is very challenging because: i) both the upper-layer sensor selection optimization and the lower-layer charging route scheduling problem are NP-hard; and ii) the upper- and lower-layer problems are tightly coupled (i.e., the input of the lower-layer problem depends on the output of the upper-layer one, while the optimization of the upper-layer problem impacts the lower-layer problem). In the following subsections, we first solve the MCV charging route scheduling problem by applying a DRL-based approach. Then, we jointly optimize the sensor set selection and the MCV charging route scheduling by utilizing a marginal-product-based approximation algorithm.

B. DRL Algorithm for Mobile Charging Route Scheduling

Here, a modified pointer network similar to that in [5] is introduced to model the lower-layer problem [P3], and the Actor-Critic algorithm is utilized for training.

First, we introduce the input structure of the neural network. At each decoding step g = 0, 1, ..., |H_1| + 1, let the set of inputs be X_g = {x_g^0, x_g^1, ..., x_g^{|H_1|}}, where |H_1| indicates the number of targets that need to be recharged. Each x_g^i is represented by a tuple x_g^i = (s^i, d_g^i), where s^i and d_g^i stand for the static and dynamic elements of the input, respectively. It is worth noting that the dynamic elements of each input are allowed to alter between decoding steps, while the static elements are invariant. For example, s^i comprises the attributes of target i, including target i's location and charging deadline, which do not change during the charging process, whereas the charging requirement of target i becomes 0 after it is charged by the MCV. Therefore, x_g^i can be viewed as a feature vector that depicts the state of target i at decoding step g. In particular, x_g^0 represents the attributes of the depot, which is located at the center of the area, has an infinite charging deadline and has no charging demand.

The output of the model is a permutation of the sensors and the depot, L_{H_1} = {π_0, π_1, ..., π_g, ..., π_{|H_1|+1}}. At each decoding step g = 0, 1, ..., |H_1| + 1, π_g points to a sensor or the depot in X_g, determining the next visiting target. The states of the sensors in X_g are updated every time a target has been visited. When the charging requirements of all sensors are satisfied, the process terminates.

To map the input X_0 to the output L_{H_1}, the probability chain rule is utilized:

    P(L_{H_1} | X_0) = ∏_{g=1}^{|H_1|} P(π_{g+1} | π_0, π_1, ..., π_g, X_g).    (15)

First, the depot is selected as π_0. Eq. (15) provides the probability of selecting the next visiting target according to π_0, π_1, ..., π_g, i.e., the already visited targets. Then a modified pointer network similar to that in [5] is used to model (15). Its basic structure is the sequence-to-sequence model [6], a powerful model in the machine translation field, which maps one sequence to another. The sequence-to-sequence model consists of two recurrent neural networks (RNNs), namely the encoder and the decoder.

The encoder encodes the input sequence into a code vector that contains knowledge of the input. Since the attributes of the targets convey no sequential information and the order of the targets in the inputs is meaningless, an RNN is unnecessary in the encoder. Therefore, a simple embedding layer is adopted to encode the inputs, which decreases the computational complexity without decreasing the efficiency [5]. In this work, we apply a 1-dimensional (1-D) convolution layer to encode the inputs into a high-dimensional vector (d = 128 in this work) [5]. The parameters of the 1-D convolution layer are shared among the inputs.

Algorithm 1: Actor-Critic training algorithm
  Output: The optimal model M* = [θ*, φ*].
  1:  Initialize: actor network with random weights θ and critic network with random weights φ;
  2:  for iteration = 1, 2, ... do
  3:    generate F problem instances from {Φ_{M_1}, Φ_{M_2}, ..., Φ_{M_M}};
  4:    for c = 1, ..., F do
  5:      t ← 0;
  6:      while not terminated do
  7:        select the next target π_{g+1}^c according to P(π_{g+1}^c | π_1^c, ..., π_g^c, X_g^c);
  8:        update X_g^c to X_{g+1}^c, leaving out the visited targets;
  9:      compute the reward R^c;
  10:   dθ ← (1/F) Σ_{c=1}^{F} (R^c − V(X_0^c; φ)) ∇_θ log P(Y^c | X_0^c);
  11:   dφ ← (1/F) Σ_{c=1}^{F} ∇_φ (R^c − V(X_0^c; φ))^2;
  12:   θ ← θ + η dθ;
  13:   φ ← φ + η dφ;
  14: Determine θ* = θ, φ* = φ.

Different from the encoder, we use an RNN to model the decoder network, since we need to store the knowledge of the previous steps π_0, π_1, ..., π_g to assist in obtaining π_{g+1}. The hidden state d_g of the RNN decoder memorizes the previously selected targets. Then d_g is combined with the encodings of the inputs ρ_g^0, ρ_g^1, ..., ρ_g^{|H_1|} to calculate the conditional probability P(π_{g+1} | π_0, π_1, ..., π_g, X_g).

The attention mechanism is utilized to calculate the degree of correlation of each input to decoding step g. More attention is given to the most relevant input, which is more likely to be selected as the next target. The calculation can be expressed as

    u_g^i = w^T tanh(W_1 ρ_g^i + W_2 d_g),  i ∈ {0, 1, ..., |H_1|};
    P(π_{g+1} | π_0, π_1, ..., π_g, X_g) = softmax(u_g),

where w, W_1 and W_2 are learnable parameters. For each target i, u_g^i is computed from d_g and its encoder hidden state ρ_g^i. The softmax operator normalizes u_g^0, u_g^1, ..., u_g^{|H_1|}, yielding the probability of selecting each target i at step g. In this paper, the greedy decoder is utilized to select the next target.
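One pointer/attention step can be sketched with NumPy as follows (illustrative shapes and names of our own; a mask excludes already-visited targets, as in [5]):

```python
import numpy as np

def pointer_step(rho, d_g, w, W1, W2, visited=None):
    """Compute P(pi_{g+1} | ...) via u_g^i = w^T tanh(W1 rho_i + W2 d_g).

    rho:     list of encoder hidden states, one per target
    d_g:     decoder hidden state at step g
    visited: optional boolean array, True for already-visited targets
    """
    u = np.array([w @ np.tanh(W1 @ r + W2 @ d_g) for r in rho])
    if visited is not None:
        u = np.where(visited, -np.inf, u)   # never re-select a visited target
    e = np.exp(u - u.max())                 # numerically stable softmax
    return e / e.sum()
```

A greedy decoder then picks `np.argmax(pointer_step(...))`, while training samples from this distribution instead.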

We adopt the well-known Actor-Critic method to train the network. The method introduces two networks that need to be trained: i) an actor network, which is the pointer network in this work, used to calculate the probability distribution for choosing the next target; and ii) a critic network that evaluates the expected reward given a specific problem state. The critic network uses the same architecture as the pointer network's encoder, mapping the encoder hidden state into the critic output. Note that, during training, the model selects the next target by sampling from the probability distribution instead of choosing the target with the maximum probability.

The training is conducted in an unsupervised way, and the training procedure is presented in Algorithm 1. During the training process, we generate instances from the distributions {Φ_{M_1}, Φ_{M_2}, ..., Φ_{M_M}}, where M signifies different input features of the targets, i.e., the targets' locations, charging deadlines, etc. F instances are sampled from {Φ_{M_1}, Φ_{M_2}, ..., Φ_{M_M}} for training the actor and critic networks with parameters θ and φ. For each instance, the actor network with the current parameters θ produces a permutation of the targets, and the corresponding reward can be obtained. Then the policy gradient is computed in line 10 to update the actor network. Meanwhile, the critic network is updated in line 11 by reducing the difference between the observed rewards and the approximated rewards.

Algorithm 2: Joint Sensor Activation and Charging Scheduling Algorithm (JSACS)
  Input: S_{z_j^m}^{candidate} = {i | p_{i,z_j^m} ≠ 0, ∀i ∈ S}, S^{candidate} = ∪_{z_j^m ∈ Z} S_{z_j^m}^{candidate}, Z^{unsatisfied} = Z.
  Output: H, L_{H_1}.
  1:  Initialize: H_0 = ∅, H_1 = ∅, H = ∅, E_{MCV}^{travel}(H_1) = 0;
  2:  while Z^{unsatisfied} is nonempty do
  3:    for each i ∈ S^{candidate} do
  4:      if E_i^{initial} − E_i^{min} ≥ T · E_i^{consume} then
  5:        E_{MCV}^{travel}(H_1 ∪ {i}) = E_{MCV}^{travel}(H_1);
  6:      else
  7:        call the model M* = [θ*, φ*] from Algorithm 1 to get a charging route L_{H_1 ∪ {i}} that meets each sensor's charging deadline (if no such route exists, or the energy consumption of the MCV exceeds E_{MCV}, delete sensor i from S^{candidate}), then compute the energy consumption of the charging route E_{MCV}^{travel}(H_1 ∪ {i});
  8:    i^{selected} = arg max_{i ∈ S^{candidate}} { [(1 − ∏_{i' ∈ H ∪ {i}} (1 − p_{i',z_j^m})) − (1 − ∏_{i' ∈ H} (1 − p_{i',z_j^m}))] / [E^{total}(H ∪ {i}, L_{H_1 ∪ h}) − E^{total}(H, L_{H_1})], ∀z_j^m ∈ Z };
       update H = H ∪ {i^{selected}} and E_{MCV}^{travel}(H_1) = E_{MCV}^{travel}(H_1 ∪ {i^{selected}});
  9:    if E_{i^{selected}}^{initial} − E_{i^{selected}}^{min} ≥ T · E_{i^{selected}}^{consume} then
  10:     update H_0 = H_0 ∪ {i^{selected}};
  11:   else
  12:     update H_1 = H_1 ∪ {i^{selected}};
  13:   for each z_j^m ∈ Z^{unsatisfied} do
  14:     if 1 − ∏_{i ∈ H} (1 − p_{i,z_j^m}) ≥ P_{z_j^m}^{demand} then
  15:       update S^{candidate} = S^{candidate} \ S_{z_j^m}^{candidate} and Z^{unsatisfied} = Z^{unsatisfied} \ {z_j^m};
  16:   update S^{candidate} = S^{candidate} \ {i^{selected}};
  17: return H, L_{H_1}.

C. Joint Sensor Activation & Charging Scheduling Algorithm

Based on the MCV's traveling energy consumption calculated by the trained model M*, the core idea is to iteratively select the new sensor i with the largest marginal product [7]. Marginal product is a concept in economics that refers to the increase in total output brought about by adding one unit of an input, assuming that the quantities of the other inputs are held constant [7]. In this paper, the energy consumption of the IWRSN corresponds to the added input, and the QoM obtained by all tasks corresponds to the output.

Then, in each iteration, a new sensor to activate should be selected according to

    arg max_{i ∈ S^{candidate}} { [(1 − ∏_{i' ∈ H ∪ {i}} (1 − p_{i',z_j^m})) − (1 − ∏_{i' ∈ H} (1 − p_{i',z_j^m}))] / [E^{total}(H ∪ {i}, L_{H_1 ∪ h}) − E^{total}(H, L_{H_1})], ∀z_j^m ∈ Z },

where h indicates whether this sensor needs to be recharged or not:

    h = { {i},  if E_i^{initial} − E_i^{min} < T · E_i^{consume},
        { ∅,    otherwise.

Initially, H = ∅; the details of the proposed JSACS algorithm can be found in Algorithm 2.

TABLE I
MAIN SIMULATION PARAMETERS

Parameter | Value
Sensor types | [0, 1, 2, 3]
Task types | [0, 1, 2, 3]
Number of sensors | 800 (200 of each type)
Number of tasks | 40 (types randomly chosen over [0, 1, 2, 3])
Area dimensions | 80 m × 80 m
Sensing radius R_i | randomly chosen over [10, 15, 20, 25] m
Energy capacity E_i^{capacity} | 10.8 kJ
Energy consumption rate E_i^{consume} | 0.5 J/s
Minimum energy E_i^{min} | 540 J
Initial energy E_i^{initial} | randomly over [1080, 3240] J
Intensity coefficient α_i | randomly over [0.1, 0.3]
QoM demand P_{z_j^m}^{demand} | randomly over [0.5, 0.7]
Charging efficiency ε | 15 W
Velocity v | 2 m/s
Traveling energy consumption rate γ | 20 J/m
Energy capacity of the MCV, E_{MCV} | 128 kJ
Time duration of the monitoring period, T | 1 hour
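A minimal sketch of this greedy marginal-product step (a simplification of our own: we sum the coverage gain over all tasks, one plausible reading of the "∀z_j^m ∈ Z" in the selection rule, and take the energy increments as given):

```python
def select_sensor(candidates, active, p, delta_energy):
    """Pick the candidate sensor with the largest marginal product.

    p[i][z]:         detection probability of task z by sensor i, Eq. (1)
    delta_energy[i]: E_total(H + {i}) - E_total(H), assumed precomputed
    """
    def coverage(sensors, z):
        miss = 1.0
        for i in sensors:
            miss *= 1.0 - p[i][z]
        return 1.0 - miss            # left-hand side of Eq. (2)

    tasks = range(len(next(iter(p.values()))))
    best, best_ratio = None, float('-inf')
    for i in candidates:
        gain = sum(coverage(active | {i}, z) - coverage(active, z) for z in tasks)
        ratio = gain / delta_energy[i]
        if ratio > best_ratio:
            best, best_ratio = i, ratio
    return best
```

With equal energy increments, the sensor whose activation raises total coverage the most is chosen; a cheaper sensor can win despite a smaller coverage gain.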

IV. SIMULATION RESULTS

In this section, simulations are conducted to numerically evaluate the performance of the proposed JSACS for problem [P1]. Table I lists the values of the main simulation parameters; similar settings have been employed in the literature [8]. Note that some parameters may vary according to the different evaluation scenarios.

For effective and fair comparisons, we introduce a greedy algorithm (GRE) and an existing algorithm named the reward-cost ratio algorithm (RC-ratio) [9]. GRE greedily selects into H the sensors that have the maximum coverage probability until all tasks' QoM requirements are satisfied, and then applies the earliest-deadline-first (EDF) policy [10] to derive the charging tour of the MCV for H_1. Under EDF, the MCV always selects the sensor with the earliest charging deadline as its next serving target; both the charging deadlines of the sensors in H_1 and the energy capacity of the MCV are taken into account when selecting each sensor. RC-ratio selects sensors into H according to the marginal product function, while the MCV's charging route is determined by EDF.
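The EDF ordering used by both baselines can be sketched as follows (a simplified, illustrative version of our own that only orders pending sensors by deadline; the feasibility checks described above are omitted):

```python
def edf_route(demands, deadlines):
    """Earliest-deadline-first charging order: always serve the pending
    sensor with the earliest charging deadline next; 0 denotes the depot."""
    pending = [i for i, d in demands.items() if d > 0]
    return [0] + sorted(pending, key=lambda i: deadlines[i]) + [0]
```

Note that this ordering ignores travel distance entirely, which is exactly the weakness the DRL-based route scheduler is meant to avoid.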

Fig. 3. Comparison of energy consumption of the entire IWRSN w.r.t. the number of tasks.

Fig. 4. Comparison of energy utilization efficiency of the MCV w.r.t. the number of tasks.

Fig. 5. Comparison of energy consumption of the entire IWRSN w.r.t. network size.

Fig. 3 demonstrates the superiority of the proposed JSACS in terms of the entire network's energy consumption. The energy consumption of the entire network increases monotonically with the number of tasks: as the number of tasks grows, more sensors need to be activated, leading to higher energy consumption. Meanwhile, with more sensors being activated, a growing number of them need to be recharged within the area, increasing the MCV's traveling energy consumption. Additionally, it can be observed that the proposed JSACS outperforms GRE and RC-ratio. The reason is that GRE iteratively selects the sensor with the maximum coverage probability while ignoring the impact of sensor selection on the total energy consumption. RC-ratio outperforms GRE since it selects the sensor with the maximum marginal product in each iteration. The proposed JSACS achieves the best performance because it not only selects the sensor with the largest marginal product in each iteration, but also determines the charging route of the MCV by a well-trained DRL model instead of EDF.

Fig. 4 compares the energy utilization efficiency of GRE, RC-ratio and the proposed JSACS. The energy utilization efficiency refers to the proportion of the energy used for recharging sensors to the total MCV energy consumption. The proposed JSACS performs better than GRE and RC-ratio because it considers the two-layer optimization simultaneously when selecting a sensor. In addition, the objective of the trained DRL model is to minimize the traveling energy consumption of the MCV while meeting the charging deadlines of the sensors, whereas the EDF policy applied in GRE and RC-ratio does not consider the traveling length of the MCV and simply recharges sensors in deadline order. Therefore, the proposed JSACS can prompt the MCV to use more of its energy for task execution, increasing the QoM of the tasks, rather than wasting energy on traveling.

Fig. 5 shows that the energy consumption of the entire network under all three algorithms increases almost linearly with the network size. The reason is that a larger network size makes the sensor deployment sparser, leading to more energy consumed on traveling. In addition, a larger network size also increases the distances between sensors and their monitored tasks, so the detection probabilities of the sensors decrease and more sensors need to be activated to execute the tasks, inducing more energy consumption by the sensors. As expected, the proposed JSACS outperforms GRE and RC-ratio, benefiting from integrating DRL and the marginal-product-based approximation algorithm to jointly solve the sensor activation and charging scheduling problem.

V. CONCLUSION

In this paper, the joint optimization of sensor activation and mobile charging scheduling for IWRSNs has been studied. Considering the objective of minimizing the energy consumption of the entire network, subject to the tasks' QoM requirements, the sensors' charging deadlines and the energy capacity of the MCV, an efficient algorithm named JSACS has been proposed, integrating DRL and a marginal-product-based approximation algorithm. Simulation results show that, compared to its counterparts, the proposed algorithm can decrease the energy consumption of the entire IWRSN and improve the energy utilization efficiency of the MCV.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 62002164, 62176122, and 62171218.

REFERENCES

[1] Y. Feng, W. Zhang, G. Han, Y. Kang, and J. Wang, "A newborn particle swarm optimization algorithm for charging-scheduling algorithm in industrial rechargeable sensor networks," IEEE Sensors J., vol. 20, no. 18, pp. 11014–11027, 2020.
[2] H. P. Gupta, T. Venkatesh, S. V. Rao, and T. Dutta, "Analysis of coverage under border effects in three-dimensional mobile sensor networks," IEEE Trans. Mobile Comput., vol. 16, no. 9, pp. 2436–2449, 2017.
[3] C. Yi, J. Cai, K. Zhu, and R. Wang, "A queueing game based management framework for fog computing with strategic computing speed control," IEEE Trans. Mobile Comput., 2022.
[4] C. Yi, J. Cai, T. Zhang, K. Zhu, B. Chen, and Q. Wu, "Workload re-allocation for edge computing with server collaboration: A cooperative queueing game approach," IEEE Trans. Mobile Comput., pp. 1–1, 2022.
[5] M. Nazari, A. Oroojlooy, L. V. Snyder, and M. Takáč, "Reinforcement learning for solving the vehicle routing problem," in Adv. Neural Inf. Process. Syst., 2018, pp. 9839–9849.
[6] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Adv. Neural Inf. Process. Syst., 2014, pp. 3104–3112.
[7] A. Brewer, The Making of the Classical Theory of Economic Growth. Routledge, 2010.
[8] T. Liu, B. Wu, S. Zhang, J. Peng, and W. Xu, "An effective multi-node charging scheme for wireless rechargeable sensor networks," in Proc. IEEE Int. Conf. Comput. Commun. (INFOCOM), 2020.
[9] T. Wu, P. Yang, H. Dai, C. Xiang, X. Rao, J. Huang, and T. Ma, "Joint sensor selection and energy allocation for tasks-driven mobile charging in wireless rechargeable sensor networks," IEEE Internet Things J., vol. 7, no. 12, pp. 11505–11523, 2020.
[10] J. A. Stankovic, M. Spuri, K. Ramamritham, and G. C. Buttazzo, Deadline Scheduling for Real-Time Systems: EDF and Related Algorithms. Springer Science & Business Media, 2012, vol. 460.