All content in this area was uploaded by Jiayuan Chen on Aug 12, 2022
A Joint Optimization of Sensor Activation and
Mobile Charging Scheduling in Industrial Wireless
Rechargeable Sensor Networks
Jiayuan Chen∗, Changyan Yi∗, Ran Wang∗, Kun Zhu∗ and Jun Cai†
∗College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
†Department of Electrical and Computer Engineering, Concordia University, Montréal, QC, H3G 1M8, Canada
Email: {jiayuan.chen, changyan.yi, wangran, zhukun}@nuaa.edu.cn, jun.cai@concordia.ca
Abstract—In this paper, a joint optimization of sensor acti-
vation and mobile charging scheduling for industrial wireless
rechargeable sensor networks (IWRSNs) is studied. In the considered model, an optimal sensor set is selected to collaboratively execute a bundle of heterogeneous production-line monitoring tasks while meeting the quality-of-monitoring (QoM) requirement of each individual task. A mobile charger vehicle (MCV) is scheduled to recharge sensors before their charging deadlines (i.e., the time instants at which they run out of energy). Our goal is to jointly optimize the sensor activation and MCV scheduling to minimize the energy consumption of the entire IWRSN, subject to the tasks’ QoM requirements, the sensor charging deadlines and the energy capacity of the MCV. Solving this problem is non-trivial because it involves two tightly coupled NP-hard problems. To address this issue, we design an efficient algorithm that integrates deep reinforcement learning with a marginal-product-based approximation algorithm. Simulations are conducted to evaluate the performance of the proposed solution and demonstrate its superiority over counterparts.
I. INTRODUCTION
With the development of intelligent manufacturing, industrial wireless sensor networks (IWSNs) have been widely used for the automatic control of industrial production processes and the monitoring of various parameters. Nevertheless, wireless sensor nodes are severely energy-limited, which hinders the wide application of IWSNs. To tackle this energy provisioning problem, researchers have studied how to reduce energy consumption by optimizing wake-up and sleep scheduling, data gathering and routing strategies, etc., so as to prolong the lifetime of IWSNs. However, these methods cannot fundamentally overcome the limited total energy capacity of sensors. Recent advances in wireless energy transfer technology have thus inspired the emergence of industrial wireless rechargeable sensor networks (IWRSNs) [1], in which mobile charger vehicles (MCVs) travel around and replenish the energy of sensors without interconnecting wires.
Although IWRSNs can clearly outperform traditional IWSNs in alleviating the heavy burden of energy consumption, some open problems remain. In practice, sensing tasks for production-line monitoring may be highly heterogeneous in terms of quality of monitoring (QoM) requirements, locations and types. Besides, industrial sensors may also be heterogeneous in terms of sensing radius, type, etc. Therefore, it is crucial to select the optimal set of sensors to activate so that they collaboratively and continuously execute all monitoring tasks while meeting the QoM of each task, and this problem becomes more complicated because sensors in IWRSNs are rechargeable.
Furthermore, industrial sensors must sustain high-intensity work for long periods and continuously feed data back to controllers or actuators. For example, while a cutting machine is working, industrial camera sensors must collaboratively monitor the position of the cutters in real time and send out the data in a timely manner. Any unpredictable sensor failure may cause serious consequences, e.g., unexpected damage and casualties. Hence, to guarantee that all activated sensors can work continuously during the monitoring period, the MCV in IWRSNs should be scheduled to recharge sensors before their charging deadlines (i.e., the instants at which they run out of energy). However, the energy capacity of the MCV is also limited, so the MCV scheduling is subject not only to the charging deadlines of sensors but also to its own energy capacity constraint.
To address the aforementioned issues, in this paper, we study a joint optimization of sensor activation and mobile charging scheduling for IWRSNs. The goal is to jointly optimize the sensor activation and MCV scheduling to minimize the energy consumption of the considered IWRSN, subject to the tasks’ QoM requirements, the sensor charging deadlines and the energy capacity of the MCV. In the considered model, the MCV starts from the depot, travels along the scheduled path and returns to the depot at the end of a trip. While traveling along its path, the MCV charges the activated sensors before their charging deadlines. To solve this joint sensor activation and mobile charging scheduling problem, we propose an efficient algorithm that integrates deep reinforcement learning (DRL) with a marginal-product-based approximation algorithm.
The main contributions of this paper are summarized in the
following.
• A joint optimization of sensor activation and mobile charging scheduling for IWRSNs is formulated, where the objective is to minimize the energy consumption of the entire network.
• An efficient algorithm, called the joint sensor activation and charging scheduling algorithm (JSACS), is proposed; it integrates DRL with a marginal-product-based approximation algorithm to jointly optimize the sensor activation and the MCV’s charging route scheduling.
• Simulations are conducted to show the superiority of the proposed JSACS over counterparts.

Fig. 1. An illustration of the considered IWRSN. [Figure: an industrial environment with a depot, a mobile charging vehicle and its charging route; legend entries: inactive sensor, active sensor, task of monitoring, initial energy of sensor, sensing radius, charging deadline.]
The rest of this paper is organized as follows: Section II
presents the system model and the problem description. In
Section III, an efficient solution for the problem is proposed.
Simulation results are provided in Section IV, followed by
conclusions in Section V.
II. SYSTEM MODEL AND PROBLEM DESCRIPTION
A. Network Model
Consider an IWRSN, as illustrated in Fig. 1, consisting of a group of tasks for production-line monitoring, a set $\mathcal{S}$ of stationary industrial rechargeable sensors with cardinality $|\mathcal{S}| = S$ uniformly distributed over a certain area, and an MCV that starts working from a depot deployed at the center.
At the beginning of a monitoring period, the industrial controller declares a bundle of monitoring tasks $\mathcal{Z} = \{z_j^m \mid \forall m \in \{1, 2, \dots, M\}, \forall j \in \{1, 2, \dots, J\}\}$ to the IWRSN, where $m$ and $j$ stand for the index of a monitoring task and its type, respectively. To meet the QoM requirements of these tasks, a group of sensors $\mathcal{H} \subseteq \mathcal{S}$ should be activated to collaboratively execute the monitoring tasks.
In practice, the sensing radius of each sensor $i \in \mathcal{S}$ is limited and is denoted by $R_i$. In addition, each type of sensor can only execute tasks of the matching type, and thus we define $\mathcal{S}_j$ as the set of sensors specialized in task type $j$. Hence, each sensor $i \in \mathcal{S}$ can only execute a task $z_j^m \in \mathcal{Z}$ that is located within its sensing radius $R_i$ and matches its type. In each monitoring period, each sensor can execute at most one task. In this paper, we adopt the probabilistic sensing coverage (PSC) model [2], [3], and denote by $p_{i,z_j^m}$ the detection probability of $z_j^m$ by sensor $i$, which can be calculated as
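$$p_{i,z_j^m} = \begin{cases} e^{-\alpha_i \cdot \mathrm{dist}(i, z_j^m)}, & \text{if } \mathrm{dist}(i, z_j^m) \le R_i,\ i \in \mathcal{S}_j, \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$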
where $\alpha_i$ represents the intensity coefficient related to sensor $i$’s physical characteristics, and $\mathrm{dist}(i, z_j^m)$ indicates the Euclidean distance between sensor $i$ and task $z_j^m$ [2]–[4]. The collaborative coverage probability of sensor set $\mathcal{H}$ for monitoring task $z_j^m$ is required to be greater than or equal to $P^{\mathrm{demand}}_{z_j^m}$, i.e.,
$$1 - \prod_{i \in \mathcal{H}} \bigl(1 - p_{i,z_j^m}\bigr) \ge P^{\mathrm{demand}}_{z_j^m}, \tag{2}$$
where $P^{\mathrm{demand}}_{z_j^m}$ is the minimum QoM demanded by task $z_j^m$. Sensors that are activated to execute tasks should work continuously during the monitoring period, as required by industrial monitoring applications. However, the battery capacity $E_i^{\mathrm{capacity}}$ of each sensor is limited, and once the battery is depleted, the sensor stops working. To this end, an MCV with energy capacity $E_{\mathrm{MCV}}$ is employed; it starts from the depot, charges dying sensors in $\mathcal{H}$ and returns to the depot at the end. Because of hardware limitations, the MCV can recharge only one sensor at a time. We denote by $E_i^{\mathrm{initial}}$ the initial energy of each sensor $i \in \mathcal{S}$ at the beginning of the monitoring period. For simplicity, we assume that for each sensor $i \in \mathcal{S}$, $E_i^{\mathrm{initial}}$ is sufficiently large to guarantee $E_i^{\mathrm{initial}} \ge E_i^{\min}$, where $E_i^{\min}$ is the minimum energy for sensor $i$ to be operational.
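As a minimal sketch of Eqs. (1) and (2), the Python snippet below evaluates the PSC detection probability and the collaborative coverage of one task. The function names, coordinates and parameter values are illustrative assumptions, and the type condition $i \in \mathcal{S}_j$ is modeled as a simple equality of type labels.

```python
import math

def detection_prob(sensor_xy, task_xy, alpha, radius, sensor_type, task_type):
    """Eq. (1): exp(-alpha * dist) if the task is in range and of the sensor's type, else 0."""
    d = math.dist(sensor_xy, task_xy)  # Euclidean distance dist(i, z_j^m)
    if d <= radius and sensor_type == task_type:
        return math.exp(-alpha * d)
    return 0.0

def collaborative_coverage(probs):
    """Left-hand side of Eq. (2): 1 - prod_i (1 - p_i) over the activated sensors."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p  # probability that every activated sensor misses the task
    return 1.0 - miss

def meets_qom(probs, p_demand):
    """Constraint (2) for a single task."""
    return collaborative_coverage(probs) >= p_demand

# Hypothetical example: two type-0 sensors at 3 m and 4 m from a type-0 task.
p1 = detection_prob((0.0, 0.0), (3.0, 0.0), alpha=0.2, radius=10.0, sensor_type=0, task_type=0)
p2 = detection_prob((0.0, 4.0), (0.0, 0.0), alpha=0.2, radius=10.0, sensor_type=0, task_type=0)
print(p1, p2, collaborative_coverage([p1, p2]), meets_qom([p1, p2], 0.7))
```

Neither sensor alone reaches a demand of 0.7 here ($e^{-0.6} \approx 0.55$), but together they do, which is the collaborative effect that (2) captures.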
Here, we characterize the energy consumption rate of each sensor $i \in \mathcal{S}$ by $E_i^{\mathrm{consume}}$. Note that some sensors may have enough energy to work continuously throughout the monitoring period and thus need not be recharged by the MCV. We classify these sensors into the set $\mathcal{H}_0 \subseteq \mathcal{H}$, and the others, which have to be recharged by the MCV, into the set $\mathcal{H}_1 = \mathcal{H} \setminus \mathcal{H}_0$. The amount of energy that sensor $i \in \mathcal{H}_1$ requires to be recharged can be calculated as
$$E_i^{\mathrm{demand}} = T \cdot E_i^{\mathrm{consume}} - \bigl(E_i^{\mathrm{initial}} - E_i^{\min}\bigr), \quad \forall i \in \mathcal{H}_1, \tag{3}$$
where $T$ is the duration of each production-line monitoring period.
To ensure that all activated sensors can execute tasks continuously, the MCV should charge the sensors in set $\mathcal{H}_1$ before their charging deadlines $ddl_i$, $i \in \mathcal{H}_1$, which can be calculated as
$$ddl_i = \frac{E_i^{\mathrm{initial}} - E_i^{\min}}{E_i^{\mathrm{consume}}}, \quad \forall i \in \mathcal{H}_1. \tag{4}$$
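Plugging the Table I values into Eqs. (3) and (4) gives a quick worked check of which sensors fall into $\mathcal{H}_1$. The sketch below assumes $T = 3600$ s (one hour); the helper names and the example sensor are hypothetical.

```python
T = 3600.0        # monitoring period T in seconds (1 hour, Table I)
E_CONSUME = 0.5   # E_i^consume in J/s (Table I)
E_MIN = 540.0     # E_i^min in J (Table I)

def needs_recharge(e_initial, t=T, e_consume=E_CONSUME, e_min=E_MIN):
    """True if the sensor belongs to H_1: its usable energy cannot cover the whole period."""
    return e_initial - e_min < t * e_consume

def energy_demand(e_initial, t=T, e_consume=E_CONSUME, e_min=E_MIN):
    """Eq. (3): energy the MCV must deliver to a sensor in H_1."""
    return t * e_consume - (e_initial - e_min)

def charging_deadline(e_initial, e_consume=E_CONSUME, e_min=E_MIN):
    """Eq. (4): time (s) at which the sensor's usable energy runs out."""
    return (e_initial - e_min) / e_consume

# Hypothetical sensor with E_i^initial = 1500 J.
e0 = 1500.0
print(needs_recharge(e0), energy_demand(e0), charging_deadline(e0))
```

For this sensor the usable energy is $1500 - 540 = 960$ J, short of the $3600 \times 0.5 = 1800$ J needed, so it belongs to $\mathcal{H}_1$ with $E_i^{\mathrm{demand}} = 840$ J and $ddl_i = 960/0.5 = 1920$ s. Under these parameters, any sensor with $E_i^{\mathrm{initial}} \ge 2340$ J falls into $\mathcal{H}_0$.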
Besides, we denote the charging route of the MCV by a vector $\mathcal{L}_{\mathcal{H}_1} = \{\pi_0, \pi_1, \dots, \pi_g, \dots, \pi_{|\mathcal{H}_1|}, \pi_{|\mathcal{H}_1|+1}\}$, where $\pi_g$ is the $g$th visiting target (i.e., the targeted sensor for recharging). Specifically, $\pi_0 = \pi_{|\mathcal{H}_1|+1} = 0$ indicates that the MCV starts from the depot and returns to it at the end, and $\pi_g \in \mathcal{H}_1$ for $g = 1, \dots, |\mathcal{H}_1|$. Note that each sensor $i \in \mathcal{H}_1$ can be visited only once, that is, $\pi_g \neq \pi_{g'}$ for $g \neq g'$. Furthermore, we define the arrival time of the MCV at visiting target $\pi_g$ as $A_{\pi_g}$. Clearly, $A_{\pi_g}$ depends on the arrival time at the previously visited target $\pi_{g-1}$, the service time (i.e., battery recharging time) for target $\pi_{g-1}$, and the traveling time of the MCV from $\pi_{g-1}$ to $\pi_g$. Hence, $A_{\pi_g}$ can be expressed as
$$A_{\pi_g} = A_{\pi_{g-1}} + \frac{E^{\mathrm{demand}}_{\pi_{g-1}}}{\varepsilon} + \frac{\mathrm{dist}(\pi_{g-1}, \pi_g)}{v}, \quad \forall \pi_g \in \mathcal{L}_{\mathcal{H}_1}, \tag{5}$$
where $\varepsilon$ and $v$ stand for the charging efficiency and the velocity of the MCV, respectively. Following the definition in (3), $E^{\mathrm{demand}}_{\pi_g}$ is the amount of energy that target $\pi_g$ (i.e., sensor $\pi_g$) demands for recharging. In particular, $E^{\mathrm{demand}}_{\pi_0} = E^{\mathrm{demand}}_{\pi_{|\mathcal{H}_1|+1}} = 0$ and $A_{\pi_0} = 0$.
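The recursion in (5) can be sketched as below, under stated assumptions: node 0 denotes the depot, positions are planar coordinates in meters, and $\varepsilon$ is treated as a charging rate in W (J/s), which is what its unit in Table I suggests. The function name and the example numbers are hypothetical.

```python
import math

def arrival_times(route, pos, demand, eps, v):
    """Eq. (5): A_g = A_{g-1} + E^demand_{g-1} / eps + dist(g-1, g) / v, with A_0 = 0.

    route  -- [0, s1, ..., sn, 0]; node 0 is the depot
    pos    -- node -> (x, y) in meters
    demand -- sensor -> E^demand in J; the depot (missing key) has demand 0
    """
    times = [0.0]  # A_{pi_0} = 0
    for prev, cur in zip(route, route[1:]):
        service = demand.get(prev, 0.0) / eps        # recharging time at the previous target
        travel = math.dist(pos[prev], pos[cur]) / v  # driving time to the current target
        times.append(times[-1] + service + travel)
    return times

# Hypothetical instance: eps = 15 W, v = 2 m/s.
pos = {0: (0.0, 0.0), 1: (20.0, 0.0), 2: (20.0, 30.0)}
demand = {1: 840.0, 2: 300.0}
print(arrival_times([0, 1, 2, 0], pos, demand, eps=15.0, v=2.0))
```

In the example, the MCV reaches sensor 1 at 10 s and sensor 2 at $10 + 840/15 + 30/2 = 81$ s; comparing such values with $ddl_i$ from (4) is exactly what constraint (8) below requires.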
In this paper, we assume that when a sensor $i \in \mathcal{H}$ has been fully recharged, it can work continuously without interruption during the monitoring period, namely $E_i^{\mathrm{capacity}} \ge T \cdot E_i^{\mathrm{consume}}$.
B. Problem Description
The energy consumption of an IWRSN includes the energy consumption of the MCV and that of the sensors in $\mathcal{H}$ for executing tasks. Although the energy cost of the MCV consists of both the traveling energy cost and the recharging energy cost, all recharged energy is eventually consumed by the sensors (for higher energy utilization efficiency), so the recharging term is already counted in the energy cost of the sensors in $\mathcal{H}$. Therefore, the total energy consumption of an IWRSN, $E^{\mathrm{total}}(\mathcal{H}, \mathcal{L}_{\mathcal{H}_1})$, can be formulated as
$$E^{\mathrm{total}}(\mathcal{H}, \mathcal{L}_{\mathcal{H}_1}) = \sum_{g=0}^{|\mathcal{H}_1|} \gamma \cdot \mathrm{dist}(\pi_g, \pi_{g+1}) + \sum_{i \in \mathcal{H}} T \cdot E_i^{\mathrm{consume}},$$
where $\gamma$ is the energy consumption rate of the MCV’s traveling (per unit distance).
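A direct transcription of $E^{\mathrm{total}}$ is sketched below, assuming planar coordinates in meters and node 0 as the depot; the function name and the example values are hypothetical.

```python
import math

def total_energy(route, pos, consume_rates, gamma, t):
    """E^total(H, L_{H_1}): MCV traveling energy plus sensing energy of all activated sensors.

    route         -- charging route [0, s1, ..., sn, 0] over H_1 (0 = depot)
    pos           -- node -> (x, y) in meters
    consume_rates -- E_i^consume (J/s) for every activated sensor in H = H_0 and H_1
    gamma         -- traveling energy per meter (J/m)
    t             -- monitoring period T in seconds
    """
    travel = sum(gamma * math.dist(pos[a], pos[b]) for a, b in zip(route, route[1:]))
    sensing = sum(t * rate for rate in consume_rates.values())
    return travel + sensing

# Hypothetical: a 30-40-50 m triangle tour and three activated sensors (one of them in H_0).
pos = {0: (0.0, 0.0), 1: (30.0, 0.0), 2: (30.0, 40.0)}
print(total_energy([0, 1, 2, 0], pos, {1: 0.5, 2: 0.5, 3: 0.5}, gamma=20.0, t=3600.0))
```

With $\gamma = 20$ J/m, the 120 m tour costs 2400 J, and three sensors at 0.5 J/s over one hour add 5400 J, giving 7800 J in total.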
Accordingly, a joint optimization of sensor activation (i.e., the optimal set $\mathcal{H}$ of sensors to activate) and mobile charging scheduling (i.e., the optimal charging route $\mathcal{L}_{\mathcal{H}_1}$) for the IWRSN can be formulated as
$$
\begin{aligned}
\text{[P1]}:\ \min_{\mathcal{H},\,\mathcal{L}_{\mathcal{H}_1}} \quad & E^{\mathrm{total}}(\mathcal{H}, \mathcal{L}_{\mathcal{H}_1}) && (6)\\
\text{s.t.} \quad & 1 - \prod_{i \in \mathcal{H}} \bigl(1 - p_{i,z_j^m}\bigr) \ge P^{\mathrm{demand}}_{z_j^m},\ \forall z_j^m \in \mathcal{Z}, && (7)\\
& A_{\pi_g} \le ddl_{\pi_g},\ g = 1, \dots, |\mathcal{H}_1|, && (8)\\
& \pi_g \neq \pi_{g'},\ g \neq g';\ g, g' = 1, \dots, |\mathcal{H}_1|, && (9)\\
& \sum_{g=0}^{|\mathcal{H}_1|} \gamma \cdot \mathrm{dist}(\pi_g, \pi_{g+1}) + \sum_{g=1}^{|\mathcal{H}_1|} E^{\mathrm{demand}}_{\pi_g} \le E_{\mathrm{MCV}}, && (10)\\
& \pi_0 = 0,\ \pi_{|\mathcal{H}_1|+1} = 0, && (11)\\
& \mathcal{H} \subseteq \mathcal{S}, && (12)\\
& \mathcal{H} = \mathcal{H}_0 \cup \mathcal{H}_1, && (13)\\
& \mathcal{L}_{\mathcal{H}_1} = \{\pi_0, \pi_1, \dots, \pi_g, \dots, \pi_{|\mathcal{H}_1|}, \pi_{|\mathcal{H}_1|+1}\}, && (14)
\end{aligned}
$$
where constraint (7) states that each monitoring task’s QoM requirement must be met; constraint (8) ensures that the MCV arrives at each sensor before its charging deadline expires; constraint (9) means that the MCV does not visit the same sensor more than once in the scheduled charging route; constraint (10) indicates that the total energy consumption of the MCV (traveling plus recharging) does not exceed its energy capacity $E_{\mathrm{MCV}}$; and constraint (11) specifies that the MCV starts at and returns to the depot. In the following section, we propose an efficient algorithm to solve this joint optimization problem.
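To make the route-related constraints of [P1] concrete, the sketch below checks (8)–(11) for one candidate route. It assumes node 0 is the depot and uses (5) for arrival times with $\varepsilon$ read as a charging rate; the function name and test values are hypothetical, and the function only validates a given route rather than searching for one.

```python
import math

def route_is_feasible(route, pos, demand, deadline, eps, v, gamma, e_mcv):
    """Check constraints (8)-(11) of [P1] for one candidate route.

    route    -- [0, s1, ..., sn, 0]; 0 denotes the depot
    demand   -- sensor -> E^demand from Eq. (3)
    deadline -- sensor -> ddl from Eq. (4)
    """
    # (11): the route starts and ends at the depot
    if len(route) < 2 or route[0] != 0 or route[-1] != 0:
        return False
    sensors = route[1:-1]
    # (9): no sensor is visited twice, and the depot does not reappear mid-route
    if len(set(sensors)) != len(sensors) or 0 in sensors:
        return False
    # (8): arrival time from Eq. (5) never exceeds the charging deadline
    t = 0.0
    for prev, cur in zip(route, sensors):
        t += demand.get(prev, 0.0) / eps + math.dist(pos[prev], pos[cur]) / v
        if t > deadline[cur]:
            return False
    # (10): traveling energy plus delivered energy stays within E_MCV
    travel = sum(gamma * math.dist(pos[a], pos[b]) for a, b in zip(route, route[1:]))
    return travel + sum(demand[s] for s in sensors) <= e_mcv

# Hypothetical instance using the Table I values for eps, v, gamma and E_MCV.
pos = {0: (0.0, 0.0), 1: (20.0, 0.0), 2: (20.0, 30.0)}
print(route_is_feasible([0, 1, 2, 0], pos, {1: 840.0, 2: 300.0},
                        {1: 1920.0, 2: 100.0}, 15.0, 2.0, 20.0, 128000.0))
```

Pairing `route` with `sensors` (the route without its endpoints) yields exactly the legs that end at a sensor, so the return leg to the depot is counted in (10) but not checked against any deadline.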
III. JOINT SENSOR ACTIVATION AND MOBILE CHARGING SCHEDULING
A. Hardness Analysis
From the problem formulation [P1], we observe that the joint optimization of sensor activation and mobile charging scheduling actually consists of two layers of optimization. The upper-layer optimization addresses the sensor set selection under the tasks’ QoM constraints, with the objective of minimizing the energy consumption of the activated sensor set $\mathcal{H}$. The lower-layer optimization determines the MCV’s charging route while taking into account the sensors’ charging deadlines, with the objective of minimizing the traveling energy consumption of the MCV. These two optimization problems are tightly coupled.
Given the charging route $\mathcal{L}_{\mathcal{H}_1}$ of the MCV, we can obtain the set of candidate sensors $\mathcal{S}' \subseteq \mathcal{S}$, in which every sensor has sufficient energy to execute monitoring tasks continuously during the monitoring period. The upper-layer sensor set selection problem then becomes a variant of the generalized assignment problem, which is NP-hard:
$$\text{[P2]}:\ \min_{\mathcal{H}} \sum_{i \in \mathcal{H}} T \cdot E_i^{\mathrm{consume}} \quad \text{s.t. } (7),\ (13) \text{ and } \mathcal{H} \subseteq \mathcal{S}'.$$
Conversely, given the set $\mathcal{H}$, the set $\mathcal{H}_1$ can also be obtained, and the lower-layer mobile charging route scheduling problem can be seen as a reduced traveling salesman problem with time windows, which is NP-hard:
$$\text{[P3]}:\ \min_{\mathcal{L}_{\mathcal{H}_1}} \sum_{g=0}^{|\mathcal{H}_1|} \gamma \cdot \mathrm{dist}(\pi_g, \pi_{g+1}) \quad \text{s.t. } (8),\ (9),\ (10),\ (11) \text{ and } (14).$$
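For very small $\mathcal{H}_1$, [P3] can be solved exactly by enumerating all $|\mathcal{H}_1|!$ visiting orders, which also makes the factorial growth behind its hardness visible. The following is an illustrative brute-force baseline, not part of the proposed method; the names and values are hypothetical, node 0 is the depot, and $\varepsilon$ is read as a charging rate.

```python
import itertools
import math

def solve_p3_brute_force(sensors, pos, demand, deadline, eps, v, gamma, e_mcv):
    """Exact [P3] for tiny H_1 by enumerating every visiting order.

    Returns (minimum traveling energy, route) or (math.inf, None) if no order is feasible.
    """
    best = (math.inf, None)
    for order in itertools.permutations(sensors):
        route = [0, *order, 0]
        t, ok = 0.0, True
        for prev, cur in zip(route, order):  # constraint (8) via Eq. (5)
            t += demand.get(prev, 0.0) / eps + math.dist(pos[prev], pos[cur]) / v
            if t > deadline[cur]:
                ok = False
                break
        if not ok:
            continue
        travel = sum(gamma * math.dist(pos[a], pos[b]) for a, b in zip(route, route[1:]))
        if travel + sum(demand[s] for s in order) > e_mcv:  # constraint (10)
            continue
        if travel < best[0]:
            best = (travel, route)
    return best

# Hypothetical 10 m square; sensor 3 has a tight deadline of 6 s.
pos = {0: (0.0, 0.0), 1: (10.0, 0.0), 2: (10.0, 10.0), 3: (0.0, 10.0)}
demand = {1: 150.0, 2: 150.0, 3: 150.0}
print(solve_p3_brute_force([1, 2, 3], pos, demand, {1: 1e3, 2: 1e3, 3: 6.0},
                           15.0, 2.0, 20.0, 128000.0))
```

With the tight deadline on sensor 3, only orders that visit it first survive, and the 40 m tour [0, 3, 2, 1, 0] (800 J at 20 J/m) is returned. Already at $|\mathcal{H}_1| = 10$ there are 3,628,800 orders to check, so exhaustive search quickly becomes impractical.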
Based on the above analyses, directly solving the joint optimization of sensor activation and mobile charging scheduling for the IWRSN is very challenging because: i) both the upper-layer sensor selection problem and the lower-layer charging route scheduling problem are NP-hard; and ii) the two layers are tightly coupled (i.e., the input of the lower-layer problem depends on the output of the upper-layer one, while the candidate sensor set of the upper-layer problem depends on the charging route obtained in the lower layer). In the following subsections, we first solve the MCV charging route scheduling problem with a DRL-based approach. Then, we jointly optimize the sensor set selection and the MCV charging route scheduling using a marginal-product-based approximation algorithm.
B. DRL Algorithm for Mobile Charging Route Scheduling
Here, a modified pointer network similar to that in [5] is
introduced to model the lower layer problem [P3], and the
Actor-Critic algorithm is utilized for training.
First, we introduce the input structure of the neural network. At each decoding step $g = 0, 1, \dots, |\mathcal{H}_1|+1$, let the set of inputs be $X_g = \{x_g^0, x_g^1, \dots, x_g^{|\mathcal{H}_1|}\}$, where $|\mathcal{H}_1|$ is the number of targets that need to be recharged. Each $x_g^i$ is represented by a tuple $x_g^i = (s^i, d_g^i)$, where $s^i$ and $d_g^i$ stand for the static and dynamic elements of the input, respectively. The dynamic elements of each input may change between decoding steps, while the static elements are invariant. For example, $s^i$ contains the attributes of target $i$, including its location and charging deadline, which do not change during the charging process, whereas the charging requirement of target $i$ becomes 0 after it has been charged by the MCV. Therefore, $x_g^i$ can be viewed as a vector of features that depicts the state of target $i$ at decoding step $g$. In particular, $x_g^0$ represents the attributes of the depot, which is located at the center of the area; its charging deadline is infinite and it has no charging demand.
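A minimal sketch of this input structure is given below, assuming that the static part holds the location and charging deadline and the dynamic part holds the remaining charging requirement (the features named above); the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class StaticFeat:      # s^i: does not change while the MCV is charging
    x: float
    y: float
    deadline: float    # ddl_i; math.inf for the depot

@dataclass
class DynamicFeat:     # d_g^i: may change between decoding steps
    demand: float      # remaining charging requirement; 0 for the depot

def build_inputs(depot_xy, sensors):
    """Return (static, dynamic) lists describing X_0; index 0 is the depot.

    sensors -- list of (x, y, deadline, demand) tuples for the targets in H_1
    """
    static = [StaticFeat(depot_xy[0], depot_xy[1], math.inf)]
    dynamic = [DynamicFeat(0.0)]
    for x, y, ddl, dem in sensors:
        static.append(StaticFeat(x, y, ddl))
        dynamic.append(DynamicFeat(dem))
    return static, dynamic

def step(dynamic, visited):
    """X_g -> X_{g+1}: the charging requirement of the visited target drops to 0."""
    dynamic[visited].demand = 0.0

def feature_vector(static, dynamic, i):
    """x_g^i = (s^i, d_g^i) flattened into one feature vector."""
    s, d = static[i], dynamic[i]
    return [s.x, s.y, s.deadline, d.demand]

# Hypothetical depot at the center of an 80 m x 80 m area and two targets.
static, dynamic = build_inputs((40.0, 40.0), [(10.0, 20.0, 1920.0, 840.0),
                                               (70.0, 60.0, 600.0, 300.0)])
step(dynamic, 1)  # the MCV has just recharged target 1
print([feature_vector(static, dynamic, i) for i in range(3)])
```

Keeping the static part frozen makes explicit that only $d_g^i$ is rewritten between decoding steps.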
The output of the model is a permutation of the sensors and the depot, $\mathcal{L}_{\mathcal{H}_1} = \{\pi_0, \pi_1, \dots, \pi_{|\mathcal{H}_1|}, \pi_{|\mathcal{H}_1|+1}\}$. At each decoding step $g = 0, 1, \dots, |\mathcal{H}_1|+1$, $\pi_g$ points to a sensor or the depot in $X_g$, determining the next target to visit. The states of the sensors in $X_g$ are updated every time a target has been visited, and the process terminates once the charging requirements of all sensors are satisfied.
To map the input $X_0$ to the output $\mathcal{L}_{\mathcal{H}_1}$, the probability chain rule is used:
$$P\bigl(\mathcal{L}_{\mathcal{H}_1} \mid X_0\bigr) = \prod_{g=0}^{|\mathcal{H}_1|} P\bigl(\pi_{g+1} \mid \pi_0, \pi_1, \dots, \pi_g, X_g\bigr). \tag{15}$$
First, the depot is selected as $\pi_0$. Each factor in (15) gives the probability of selecting the next target given $\pi_0, \pi_1, \dots, \pi_g$, i.e., the already visited targets. A modified pointer network similar to that in [5] is then used to model (15). Its basic structure is the sequence-to-sequence model [6], a powerful model from machine translation that maps one sequence to another. The sequence-to-sequence model consists of two recurrent neural networks (RNNs), namely an encoder and a decoder.
The encoder encodes the input sequence into a code vector that contains knowledge of the input. Since the attributes of the targets convey no sequential information and the order of targets in the input is meaningless, an RNN is unnecessary in the encoder. Therefore, a simple embedding layer is adopted to encode the inputs, which reduces the computational complexity without degrading performance [5]. In this work, we apply a one-dimensional (1-D) convolution layer to encode each input into a high-dimensional vector [5] ($d = 128$ in this work). The parameters of the 1-D convolution layer are shared among the inputs.
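The shared-parameter embedding can be sketched as follows, assuming a kernel size of 1, so that the 1-D convolution reduces to the same affine map applied to every target’s feature vector. Pure Python is used for self-containment; the weight initialization and the names are illustrative assumptions.

```python
import random

def make_embedding(in_dim, d, seed=0):
    """Weights of a kernel-size-1 1-D convolution: a d x in_dim matrix plus a bias vector."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(d)]
    bias = [0.0] * d
    return weight, bias

def embed_all(inputs, weight, bias):
    """Apply the SAME (weight, bias) to every input x_g^i, giving rho_g^i in R^d.

    Each target is embedded independently, so its embedding does not depend on
    where it appears in the input list.
    """
    return [[sum(w * x_k for w, x_k in zip(row, x)) + b for row, b in zip(weight, bias)]
            for x in inputs]

# Hypothetical inputs: [x, y, deadline, demand] for the depot and two targets.
X = [[40.0, 40.0, 1e9, 0.0], [10.0, 20.0, 1920.0, 840.0], [70.0, 60.0, 600.0, 300.0]]
W, b = make_embedding(in_dim=4, d=128)
E = embed_all(X, W, b)
print(len(E), len(E[0]))
```

Because every target passes through the same weights, reordering the inputs simply reorders the embeddings, which matches the observation that the order of targets carries no meaning.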
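Different from the encoder, we use an RNN to model the decoder network, since we need to store the knowledge of previous steps $\pi_0, \pi_1, \dots, \pi_g$ to help obtain $\pi_{g+1}$. The hidden state $d_g$ of the RNN decoder memorizes the previously visited targets. Then $d_g$ is combined with the encodings of the inputs $\rho_g^0, \rho_g^1, \dots, \rho_g^{|\mathcal{H}_1|}$ to calculate the conditional probability $P(\pi_{g+1} \mid \pi_0, \pi_1, \dots, \pi_g, X_g)$.

Algorithm 1: Actor-Critic training algorithm
Output: The optimal model $\mathcal{M}^* = [\theta^*, \phi^*]$.
1  Initialize the actor network with random weights $\theta$ and the critic network with random weights $\phi$;
2  for iteration $\leftarrow 1, 2, \dots$ do
3      generate $F$ problem instances from $\{\Phi_{M_1}, \Phi_{M_2}, \dots, \Phi_{M_M}\}$;
4      for $c \leftarrow 1, \dots, F$ do
5          $g \leftarrow 0$;
6          while not terminated do
7              select the next target $\pi^c_{g+1}$ according to $P(\pi^c_{g+1} \mid \pi^c_1, \dots, \pi^c_g, X^c_g)$;
8              update $X^c_g$ to $X^c_{g+1}$, leaving out the visited targets, and set $g \leftarrow g + 1$;
9          compute the reward $R^c$;
10     $d\theta \leftarrow \frac{1}{F}\sum_{c=1}^{F} \bigl(R^c - V(X^c_0; \phi)\bigr) \nabla_\theta \log P(\mathcal{L}^c \mid X^c_0)$, where $\mathcal{L}^c$ is the route generated for instance $c$;
11     $d\phi \leftarrow \frac{1}{F}\sum_{c=1}^{F} \nabla_\phi \bigl(R^c - V(X^c_0; \phi)\bigr)^2$;
12     $\theta \leftarrow \theta + \eta\, d\theta$;
13     $\phi \leftarrow \phi - \eta\, d\phi$;
14 Determine $\theta^* = \theta$, $\phi^* = \phi$.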
The attention mechanism is used to calculate the degree of correlation between each input and decoding step $g$. More attention is given to the most relevant input, which is more likely to be selected as the next target. The calculation can be expressed as
$$u_g^i = w^{\top} \tanh\bigl(W_1 \rho_g^i + W_2 d_g\bigr), \quad i \in \{0, 1, \dots, |\mathcal{H}_1|\},$$
$$P(\pi_{g+1} \mid \pi_0, \pi_1, \dots, \pi_g, X_g) = \mathrm{softmax}\bigl(u_g^i\bigr),$$
where $w$, $W_1$ and $W_2$ are learnable parameters. For each target $i$, $u_g^i$ is computed from $d_g$ and its encoder hidden state $\rho_g^i$. The softmax operator normalizes $u_g^0, u_g^1, \dots, u_g^{|\mathcal{H}_1|}$, yielding the probability of selecting each target $i$ at step $g$. In this paper, a greedy decoder is used to select the next target.
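A minimal sketch of the attention scoring, softmax normalization and greedy selection is given below. It assumes that already-visited targets are masked out before the softmax (mirroring “leaving out the visited targets” in Algorithm 1); the masking rule, the vector sizes and the names are assumptions, and the parameters would normally be learned rather than hand-set.

```python
import math

def matvec(M, v):
    """Plain-list matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def attention_probs(rho, d_g, w, W1, W2, visited):
    """softmax over i of u_g^i = w^T tanh(W1 rho_g^i + W2 d_g), with masking.

    rho     -- encoder outputs rho_g^i, one vector per target (index 0 = depot)
    d_g     -- decoder hidden state at step g
    visited -- visited[i] is True if target i must be excluded (assumed masking)
    """
    if all(visited):
        raise ValueError("no selectable target left")
    w2d = matvec(W2, d_g)
    scores = []
    for i, r in enumerate(rho):
        if visited[i]:
            scores.append(-math.inf)  # masked: probability 0 after the softmax
            continue
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W1, r), w2d)]
        scores.append(sum(wk * hk for wk, hk in zip(w, hidden)))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # math.exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def greedy_next(probs):
    """Greedy decoder: choose the target with the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical 2-D encodings; the depot (index 0) is masked at this step.
rho = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = attention_probs(rho, [0.0, 0.0], [1.0, 1.0],
                        [[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]],
                        [True, False, False])
print(probs, greedy_next(probs))
```

During training, the same probabilities would be sampled from instead of passed to `greedy_next`, as described in the next paragraph.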
We adopt the well-known Actor-Critic method to train the network. The method introduces two networks to be trained: i) an actor network (the pointer network in this work), which calculates the probability distribution for choosing the next target; and ii) a critic network, which evaluates the expected reward given a specific problem state. The critic network uses the same architecture as the pointer network’s encoder and maps the encoder hidden states into the critic output. Unlike the greedy decoding described above, during training the model selects the next target by sampling from the probability distribution instead of choosing the target with the maximum probability.
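Algorithm 2: Joint Sensor Activation and Charging Scheduling Algorithm (JSACS)
Input: $\mathcal{S}^{\mathrm{candidate}}_{z_j^m} = \{i \mid p_{i,z_j^m} \neq 0, i \in \mathcal{S}\}$, $\mathcal{S}^{\mathrm{candidate}} = \bigcup_{z_j^m \in \mathcal{Z}} \mathcal{S}^{\mathrm{candidate}}_{z_j^m}$, $\mathcal{Z}^{\mathrm{unsatisfied}} = \mathcal{Z}$.
Output: $\mathcal{H}$, $\mathcal{L}_{\mathcal{H}_1}$.
1  Initialize: $\mathcal{H}_0 = \emptyset$, $\mathcal{H}_1 = \emptyset$, $\mathcal{H} = \emptyset$, $E^{\mathrm{travel}}_{\mathrm{MCV}}(\mathcal{H}_1) = 0$;
2  while $\mathcal{Z}^{\mathrm{unsatisfied}}$ is nonempty do
3      for each $i \in \mathcal{S}^{\mathrm{candidate}}$ do
4          if $E_i^{\mathrm{initial}} - E_i^{\min} \ge T \cdot E_i^{\mathrm{consume}}$ then
5              $E^{\mathrm{travel}}_{\mathrm{MCV}}(\mathcal{H}_1 \cup \{i\}) = E^{\mathrm{travel}}_{\mathrm{MCV}}(\mathcal{H}_1)$;
6          else
7              call the model $\mathcal{M}^* = [\theta^*, \phi^*]$ from Algorithm 1 to obtain a charging route $\mathcal{L}_{\mathcal{H}_1 \cup \{i\}}$ that meets each sensor’s charging deadline (if no such route exists, or the energy consumption of the MCV exceeds $E_{\mathrm{MCV}}$, delete sensor $i$ from $\mathcal{S}^{\mathrm{candidate}}$), then compute the energy consumption of the charging route, $E^{\mathrm{travel}}_{\mathrm{MCV}}(\mathcal{H}_1 \cup \{i\})$;
8      $i^{\mathrm{selected}} = \arg\max_{i \in \mathcal{S}^{\mathrm{candidate}}} \Bigl\{ \frac{\bigl(1 - \prod_{i' \in \mathcal{H} \cup \{i\}} (1 - p_{i',z_j^m})\bigr) - \bigl(1 - \prod_{i' \in \mathcal{H}} (1 - p_{i',z_j^m})\bigr)}{E^{\mathrm{total}}(\mathcal{H} \cup \{i\}, \mathcal{L}_{\mathcal{H}_1 \cup h}) - E^{\mathrm{total}}(\mathcal{H}, \mathcal{L}_{\mathcal{H}_1})}, \forall z_j^m \in \mathcal{Z} \Bigr\}$;
       update $\mathcal{H} = \mathcal{H} \cup \{i^{\mathrm{selected}}\}$ and $E^{\mathrm{travel}}_{\mathrm{MCV}}(\mathcal{H}_1) = E^{\mathrm{travel}}_{\mathrm{MCV}}(\mathcal{H}_1 \cup \{i^{\mathrm{selected}}\})$;
9      if $E^{\mathrm{initial}}_{i^{\mathrm{selected}}} - E^{\min}_{i^{\mathrm{selected}}} \ge T \cdot E^{\mathrm{consume}}_{i^{\mathrm{selected}}}$ then
10         update $\mathcal{H}_0 = \mathcal{H}_0 \cup \{i^{\mathrm{selected}}\}$;
11     else
12         update $\mathcal{H}_1 = \mathcal{H}_1 \cup \{i^{\mathrm{selected}}\}$;
13     for each $z_j^m \in \mathcal{Z}^{\mathrm{unsatisfied}}$ do
14         if $1 - \prod_{i \in \mathcal{H}} (1 - p_{i,z_j^m}) \ge P^{\mathrm{demand}}_{z_j^m}$ then
15             update $\mathcal{S}^{\mathrm{candidate}} = \mathcal{S}^{\mathrm{candidate}} \setminus \mathcal{S}^{\mathrm{candidate}}_{z_j^m}$ and $\mathcal{Z}^{\mathrm{unsatisfied}} = \mathcal{Z}^{\mathrm{unsatisfied}} \setminus \{z_j^m\}$;
16     update $\mathcal{S}^{\mathrm{candidate}} = \mathcal{S}^{\mathrm{candidate}} \setminus \{i^{\mathrm{selected}}\}$;
17 return $\mathcal{H}$, $\mathcal{L}_{\mathcal{H}_1}$.

The training is conducted in an unsupervised way, and the training procedure is presented in Algorithm 1. During training, we generate instances from distributions $\{\Phi_{M_1}, \Phi_{M_2}, \dots, \Phi_{M_M}\}$, where $M$ signifies different input features of the targets, i.e., the targets’ locations, charging deadlines, etc. $F$ instances are sampled from $\{\Phi_{M_1}, \Phi_{M_2}, \dots, \Phi_{M_M}\}$ for training the actor and critic networks with parameters $\theta$ and $\phi$. For each instance, the actor network with current parameters $\theta$ produces a permutation of the targets, and the corresponding reward is obtained. The policy gradient is then computed in line 10 to update the actor network. Meanwhile, the critic network is updated in line 11 by reducing the difference between the observed rewards and the approximated rewards.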
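The update rules in lines 10–13 of Algorithm 1 can be illustrated on a deliberately tiny problem. In this hypothetical sketch the actor is a softmax over a few fixed candidate routes of one instance (instead of a pointer network), the critic is a single scalar baseline, and the reward is the negative traveling cost; the gradients are written out by hand, so it shows the structure of the updates rather than the paper’s implementation.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_toy_actor_critic(route_costs, iterations=500, batch=16, lr=0.05, seed=0):
    """Toy actor-critic on one fixed instance with a few candidate routes.

    Actor: one logit per candidate route (stands in for theta).
    Critic: a scalar baseline V (stands in for V(X_0; phi)).
    Reward R^c = -(traveling cost of the sampled route).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(route_costs)
    phi = 0.0
    for _ in range(iterations):
        probs = softmax(theta)
        d_theta = [0.0] * len(theta)
        d_phi = 0.0
        for _ in range(batch):  # F sampled "instances" (here: sampled routes)
            # sample (not greedy) during training
            k = rng.choices(range(len(theta)), weights=probs)[0]
            advantage = -route_costs[k] - phi  # R^c - V
            for a in range(len(theta)):
                # line 10: d(log softmax_k)/d(logit a) = 1[a == k] - probs[a]
                d_theta[a] += advantage * ((1.0 if a == k else 0.0) - probs[a]) / batch
            d_phi += -2.0 * advantage / batch  # line 11: d(R^c - V)^2 / dV
        theta = [t + lr * g for t, g in zip(theta, d_theta)]  # line 12: ascent on reward
        phi -= lr * d_phi                                      # line 13: descent on squared error
    return softmax(theta), phi

probs, baseline = train_toy_actor_critic([3.0, 1.0, 2.0])
print(probs, baseline)
```

With costs [3.0, 1.0, 2.0], the policy should concentrate on the cheapest route, and the baseline should settle near that route’s reward of about −1. Subtracting the baseline in line 10 leaves the expected gradient unchanged while reducing its variance, and line 13 moves $\phi$ against the gradient because the critic minimizes its squared error.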
C. Joint Sensor Activation & Charging Scheduling Algorithm
Based on the MCV’s traveling energy consumption computed by the trained model $\mathcal{M}^*$, the core idea is to iteratively select the new sensor $i$ with the largest marginal product [7]. Marginal product is a concept in economics that refers to the increase in total output brought about by adding one unit of an input, assuming that the quantities of other inputs are held constant [7]. In this paper, the energy consumption of the IWRSN corresponds to the added input, and the QoM obtained by all tasks corresponds to the output. Then, in each iteration, a new sensor to activate is selected according to
$$\arg\max_{i \in \mathcal{S}^{\mathrm{candidate}}} \left\{ \frac{\left(1 - \prod_{i' \in \mathcal{H} \cup \{i\}} \bigl(1 - p_{i',z_j^m}\bigr)\right) - \left(1 - \prod_{i' \in \mathcal{H}} \bigl(1 - p_{i',z_j^m}\bigr)\right)}{E^{\mathrm{total}}(\mathcal{H} \cup \{i\}, \mathcal{L}_{\mathcal{H}_1 \cup h}) - E^{\mathrm{total}}(\mathcal{H}, \mathcal{L}_{\mathcal{H}_1})},\ \forall z_j^m \in \mathcal{Z} \right\},$$
where $h$ indicates whether this sensor needs to be recharged:
$$h = \begin{cases} \{i\}, & \text{if } E_i^{\mathrm{initial}} - E_i^{\min} < T \cdot E_i^{\mathrm{consume}}, \\ \emptyset, & \text{otherwise}. \end{cases}$$
Initially, $\mathcal{H} = \emptyset$; the details of the proposed JSACS algorithm are given in Algorithm 2.

TABLE I
MAIN SIMULATION PARAMETERS.

| Parameter | Value |
|---|---|
| Sensor types | 0, 1, 2, 3 |
| Task types | 0, 1, 2, 3 |
| Number of sensors | 800 (200 of each type) |
| Number of tasks | 40 (types randomly chosen from {0, 1, 2, 3}) |
| Area dimensions | 80 m × 80 m |
| Sensing radius $R_i$ | randomly chosen from {10, 15, 20, 25} m |
| Energy capacity $E_i^{\mathrm{capacity}}$ | 10.8 kJ |
| Energy consumption rate $E_i^{\mathrm{consume}}$ | 0.5 J/s |
| Minimum energy $E_i^{\min}$ | 540 J |
| Initial energy $E_i^{\mathrm{initial}}$ | random over [1080, 3240] J |
| Intensity coefficient $\alpha_i$ | random over [0.1, 0.3] |
| QoM demand $P^{\mathrm{demand}}_{z_j^m}$ | random over [0.5, 0.7] |
| Charging efficiency $\varepsilon$ | 15 W |
| Velocity $v$ | 2 m/s |
| Traveling energy consumption $\gamma$ | 20 J/m |
| Energy capacity of MCV $E_{\mathrm{MCV}}$ | 128 kJ |
| Duration of monitoring period $T$ | 1 hour |
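A simplified sketch of the marginal-product selection in Section III-C and Algorithm 2 follows. Two simplifications are assumptions of this sketch, not the paper’s: the QoM gain of a sensor is taken as its coverage increase summed over the still-unsatisfied tasks, and the energy increment $E^{\mathrm{total}}(\mathcal{H} \cup \{i\}, \mathcal{L}_{\mathcal{H}_1 \cup h}) - E^{\mathrm{total}}(\mathcal{H}, \mathcal{L}_{\mathcal{H}_1})$ is supplied as a fixed per-sensor number (`delta_energy`, hypothetical) instead of being recomputed with the trained DRL model after each selection.

```python
import math

def coverage(probs):
    """1 - prod(1 - p): collaborative coverage of one task, as in Eq. (2)."""
    return 1.0 - math.prod(1.0 - q for q in probs)

def greedy_marginal_product(p, p_demand, delta_energy):
    """Greedy sensor activation by marginal product (QoM gain per extra joule).

    p            -- p[i][z]: detection probability of task z by sensor i (missing = 0)
    p_demand     -- p_demand[z]: P^demand of task z
    delta_energy -- delta_energy[i]: hypothetical, fixed energy increment of activating i
    """
    active = []
    unsatisfied = set(p_demand)
    candidates = sorted(p)  # deterministic tie-breaking
    while unsatisfied:
        def qom_gain(i):
            # coverage increase summed over still-unsatisfied tasks (assumption)
            return sum(coverage([p[k].get(z, 0.0) for k in active + [i]])
                       - coverage([p[k].get(z, 0.0) for k in active])
                       for z in unsatisfied)
        useful = [i for i in candidates if qom_gain(i) > 0.0]
        if not useful:
            raise ValueError("remaining tasks cannot be covered by the candidates")
        best = max(useful, key=lambda i: qom_gain(i) / delta_energy[i])
        active.append(best)
        candidates.remove(best)
        unsatisfied = {z for z in unsatisfied
                       if coverage([p[k].get(z, 0.0) for k in active]) < p_demand[z]}
    return active

# Hypothetical instance with two tasks "a" and "b" and three sensors.
p = {1: {"a": 0.6}, 2: {"a": 0.5, "b": 0.5}, 3: {"b": 0.7}}
p_demand = {"a": 0.55, "b": 0.6}
delta_energy = {1: 1800.0, 2: 2000.0, 3: 1800.0}
print(greedy_marginal_product(p, p_demand, delta_energy))
```

In the example, sensor 2 is chosen first because it adds the most coverage per joule (1.0/2000), sensor 3 then satisfies task b, and sensor 1 finally raises task a from 0.5 to 0.8, so the activation order is [2, 3, 1].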
IV. SIMULATION RESULTS
In this section, simulations are conducted to numerically evaluate the performance of the proposed JSACS for problem [P1]. Table I lists the values of the main simulation parameters; similar settings have been employed in the literature [8]. Note that some parameters may vary across evaluation scenarios.
For effective and fair comparisons, we introduce a greedy algorithm (GRE) and an existing algorithm named the reward-cost ratio algorithm (RC-ratio) [9]. GRE greedily adds to $\mathcal{H}$ the sensors with the maximum coverage probability until all tasks’ QoM requirements are satisfied, and then applies the earliest deadline first (EDF) policy [10] to derive the charging tour of the MCV for $\mathcal{H}_1$. Under EDF, the MCV always selects the sensor with the earliest charging deadline as its next serving target. Besides, both the charging deadlines of the sensors in $\mathcal{H}_1$ and the energy capacity of the MCV are taken into account when selecting each sensor. RC-ratio selects sensors into $\mathcal{H}$ according to the marginal product function, while the MCV’s charging route is determined by EDF.
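For reference, the EDF baseline used by GRE and RC-ratio can be sketched as below. The sketch only orders sensors by deadline and measures the resulting travel distance; the deadline and MCV-capacity checks mentioned above are omitted, and the coordinates, deadlines and names are hypothetical.

```python
import math

def edf_route(sensors, deadline):
    """EDF: the MCV always serves the unvisited sensor with the earliest charging deadline."""
    return [0, *sorted(sensors, key=lambda i: deadline[i]), 0]

def route_length(route, pos):
    """Total traveling distance of a route that starts and ends at the depot (node 0)."""
    return sum(math.dist(pos[a], pos[b]) for a, b in zip(route, route[1:]))

# Hypothetical sensors on a line: 1 and 3 are close together, 2 lies on the other side.
pos = {0: (0.0, 0.0), 1: (10.0, 0.0), 2: (-10.0, 0.0), 3: (11.0, 0.0)}
ddl = {1: 90.0, 2: 95.0, 3: 100.0}
edf = edf_route([1, 2, 3], ddl)
print(edf, route_length(edf, pos), route_length([0, 1, 3, 2, 0], pos))
```

Here EDF yields [0, 1, 2, 3, 0] with a 62 m tour, while visiting sensor 3 right after sensor 1 gives a 42 m tour; whether such a shorter order is admissible depends on the deadlines through (5), which is the trade-off that the DRL scheduler is meant to handle.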
Fig. 3 demonstrates the superiority of the proposed JSACS in terms of the energy consumption of the entire network. The energy consumption of the entire network increases monotonically with the number of tasks, because more tasks require more sensors to be activated, leading to more energy consumption. Meanwhile, with more sensors activated, a growing number of them need to be recharged within the area, which increases the MCV’s traveling energy consumption. Additionally, the proposed JSACS outperforms both GRE and RC-ratio. GRE iteratively selects the sensor with the maximum coverage probability while ignoring the impact of sensor selection on the total energy consumption. RC-ratio outperforms GRE since it selects the sensor with the maximum marginal product in each iteration. The proposed JSACS achieves the best performance because it not only selects the sensor with the largest marginal product in each iteration, but also determines the MCV’s charging route with a well-trained DRL model instead of EDF.

Fig. 3. Comparison of energy consumption of the entire IWRSN w.r.t. number of tasks. [Plot: x-axis, Number of Tasks (35 to 45); y-axis, Energy Consumption of the Entire IWRSN (J), ×10^5; curves: GRE, RC-ratio, Proposed JSACS.]

Fig. 4. Comparison of energy utilization efficiency of the MCV w.r.t. number of tasks. [Plot: x-axis, Number of Tasks (35 to 45); y-axis, Energy Utilization Efficiency of the MCV (%); curves: GRE, RC-ratio, Proposed JSACS.]

Fig. 5. Comparison of energy consumption of the entire IWRSN w.r.t. network sizes. [Plot: x-axis, Network Size (L×L, L from 50 to 150); y-axis, Energy Consumption of the Entire IWRSN (J), ×10^5; curves: GRE, RC-ratio, Proposed JSACS.]
Fig. 4 compares the energy utilization efficiency of GRE, RC-ratio and the proposed JSACS. The energy utilization efficiency is the ratio of the energy used for recharging sensors to the total energy consumption of the MCV. The proposed JSACS performs better than GRE and RC-ratio because it considers the two-layer optimization simultaneously when selecting a sensor. In addition, the objective of the trained DRL model is to minimize the traveling energy consumption of the MCV while meeting the sensors’ charging deadlines, whereas the EDF policy applied in GRE and RC-ratio ignores the MCV’s traveling distance and simply recharges sensors in order of their deadlines. Therefore, the proposed JSACS lets the MCV spend more of its energy on recharging sensors for task execution, which increases the QoM of tasks, rather than wasting energy on traveling.
Fig. 5 shows that the energy consumption of the entire network under all three algorithms increases almost linearly with the network size. The reason is that a larger network makes the sensor deployment sparser, leading to more energy consumption on traveling. In addition, a larger network also increases the distances between sensors and their monitoring tasks, which lowers the sensors’ detection probabilities, so more sensors need to be activated to execute tasks, inducing more sensor energy consumption. The proposed JSACS consistently outperforms GRE and RC-ratio, benefiting from integrating DRL and the marginal-product-based approximation algorithm to jointly solve the sensor activation and charging scheduling problem.
V. CONCLUSION
In this paper, the joint optimization of sensor activation and mobile charging scheduling for IWRSNs has been studied. With the objective of minimizing the energy consumption of the entire network subject to the tasks’ QoM requirements, the sensor charging deadlines and the energy capacity of the MCV, an efficient algorithm named JSACS, which integrates DRL with a marginal-product-based approximation algorithm, has been proposed. Simulation results show that, compared with counterparts, the proposed algorithm decreases the energy consumption of the entire IWRSN and improves the energy utilization efficiency of the MCV.
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 62002164, 62176122, and 62171218.
REFERENCES
[1] Y. Feng, W. Zhang, G. Han, Y. Kang, and J. Wang, “A newborn particle swarm optimization algorithm for charging-scheduling algorithm in industrial rechargeable sensor networks,” IEEE Sensors J., vol. 20, no. 18, pp. 11014–11027, 2020.
[2] H. P. Gupta, T. Venkatesh, S. V. Rao, and T. Dutta, “Analysis of coverage
under border effects in three-dimensional mobile sensor networks,” IEEE
Trans. Mobile Comput., vol. 16, no. 9, pp. 2436–2449, 2017.
[3] C. Yi, J. Cai, K. Zhu, and R. Wang, “A queueing game based man-
agement framework for fog computing with strategic computing speed
control,” IEEE Trans. Mobile Comput., 2022.
[4] C. Yi, J. Cai, T. Zhang, K. Zhu, B. Chen, and Q. Wu, “Workload re-
allocation for edge computing with server collaboration: A cooperative
queueing game approach,” IEEE Trans. Mobile Comput., pp. 1–1, 2022.
[5] M. Nazari, A. Oroojlooy, L. V. Snyder, and M. Takáč, “Reinforcement learning for solving the vehicle routing problem,” in Adv. Neural Inf. Process. Syst., 2018, pp. 9839–9849.
[6] I. Sutskever and O. Vinyals, “Sequence to sequence learning with neural
networks,” in Adv. Neural Inf. Process. Syst., 2014, pp. 3104–3112.
[7] A. Brewer, The making of the classical theory of economic growth.
Routledge, 2010.
[8] T. Liu, B. Wu, S. Zhang, J. Peng, and W. Xu, “An effective multi-node
charging scheme for wireless rechargeable sensor networks,” in Proc.
IEEE Int. Conf. Comput. Commun., 2020.
[9] T. Wu, P. Yang, H. Dai, C. Xiang, X. Rao, J. Huang, and T. Ma, “Joint sensor selection and energy allocation for tasks-driven mobile charging in wireless rechargeable sensor networks,” IEEE Internet Things J., vol. 7, no. 12, pp. 11505–11523, 2020.
[10] J. A. Stankovic, M. Spuri, K. Ramamritham, and G. C. Buttazzo,
Deadline scheduling for real-time systems: EDF and related algorithms.
Springer Science & Business Media, 2012, vol. 460.