
A Joint Optimization of Sensor Activation and Mobile Charging Scheduling in Industrial Wireless Rechargeable Sensor Networks

Jiayuan Chen, Changyan Yi, Ran Wang, Kun Zhu and Jun Cai
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Department of Electrical and Computer Engineering, Concordia University, Montréal, QC, H3G 1M8, Canada
Email: {jiayuan.chen, changyan.yi, wangran, zhukun}@nuaa.edu.cn, jun.cai@concordia.ca
Abstract—In this paper, a joint optimization of sensor activation and mobile charging scheduling for industrial wireless rechargeable sensor networks (IWRSNs) is studied. In the considered model, an optimal sensor set is selected to collaboratively execute a bundle of heterogeneous production-line monitoring tasks while meeting the quality-of-monitoring (QoM) requirement of each individual task. A mobile charger vehicle (MCV) is scheduled to recharge sensors before their charging deadlines (i.e., the time instants at which they run out of energy). Our goal is to jointly optimize the sensor activation and MCV scheduling to minimize the energy consumption of the entire IWRSN, subject to the tasks' QoM requirements, the sensor charging deadlines and the energy capacity of the MCV. Unfortunately, solving this problem is non-trivial, because it involves solving two tightly coupled NP-hard problems. To address this issue, we design an efficient algorithm integrating deep reinforcement learning and a marginal-product-based approximation algorithm. Simulations are conducted to evaluate the performance of the proposed solution and demonstrate its superiority over counterparts.
I. INTRODUCTION
WITH the development of intelligent manufacturing,
industrial wireless sensor networks (IWSNs) have been
widely used for the automatic control of industrial production
process and the monitoring of various parameters. Never-
theless, wireless sensor nodes are severely energy-limited,
which hinders the wide application of IWSNs. To tackle this energy provisioning problem, researchers have studied how to reduce energy consumption by optimizing wake-up and sleep scheduling, data gathering, routing strategies, etc., so as to prolong the lifetime of IWSNs. However, these methods cannot fundamentally address the shortage of the total energy capacities of sensors. Therefore, recent advances in wireless energy transfer technology have inspired the emergence of
industrial wireless rechargeable sensor networks (IWRSNs)
[1], in which mobile charger vehicles (MCVs) are employed
to travel around and replenish energy for sensors without
interconnecting wires.
Although IWRSNs can obviously outperform traditional
IWSNs in alleviating the heavy burden of energy consumption,
there are still some open problems remaining. In practice,
sensing tasks for production-line monitoring may be highly
heterogeneous in terms of quality of monitoring (QoM) re-
quirements, locations and types. Besides, industrial sensors
may also be heterogeneous in terms of sensing radius, types,
etc. Therefore, it is crucial to select the optimal set of sensors to activate for collaboratively and continuously executing all monitoring tasks while meeting the QoM of each task; this problem becomes even more complicated since sensors in IWRSNs are rechargeable.
Furthermore, industrial sensors must keep up high-intensity
work for long periods and continuously feed data back to
controllers or actuators. For example, while a cutting machine
is working, industrial camera sensors must collaboratively
monitor the position of cutters in real-time and send out the
data in a timely manner. Any unpredictable sensor failure
may cause serious consequences, e.g., unexpected damages
and casualties. Hence, in order to guarantee that all activated
sensors can work continuously during the monitoring period,
the MCV in IWRSNs should be scheduled to recharge sensors
before their charging deadlines (i.e., the instant of running out
of their energy). However, the energy capacity of the MCV is also limited, and thus the scheduling of the MCV is subject not only to the charging deadlines of sensors, but also to its own energy capacity constraint.
To address the aforementioned issues, in this paper, we
study a joint optimization of sensor activation and mobile
charging scheduling for IWRSNs. The goal is to jointly opti-
mize the sensor activation and MCV scheduling for minimiz-
ing the energy consumption of the considered IWRSN, sub-
jected to tasks’ QoM requirements, sensor charging deadlines
and energy capacity of the MCV. In the considered model, the
MCV starts from the depot, travels along the scheduled path
and returns to the depot at the end of a trip. While traveling
on its path, the MCV charges activated sensors before their
charging deadlines. To solve such joint sensor activation and
mobile charging scheduling problem, we propose an efficient
algorithm integrating deep reinforcement learning (DRL) and
marginal product based approximation algorithm.
The main contributions of this paper are summarized in the following.
• A joint optimization of sensor activation and mobile charging scheduling for IWRSNs is formulated, where the objective is to minimize the energy consumption of the entire network.
• An efficient algorithm, called the joint sensor activation and charging scheduling algorithm (JSACS), is proposed, integrating DRL and a marginal-product-based approximation algorithm, which jointly optimizes the sensor activation and the MCV's charging route scheduling.
• Simulations are conducted to show the superiority of the proposed JSACS over counterparts.
Fig. 1. An illustration of the considered IWRSN (legend: industrial environment, depot, mobile charging vehicle, inactive/active sensors, tasks of monitoring, initial energy of sensors, charging route, sensing radius, charging deadline).
The rest of this paper is organized as follows: Section II
presents the system model and the problem description. In
Section III, an efficient solution for the problem is proposed.
Simulation results are provided in Section IV, followed by
conclusions in Section V.
II. SYSTEM MODEL AND PROBLEM DESCRIPTION
A. Network Model
Consider an IWRSN, as illustrated in Fig. 1, consisting of a group of tasks for production-line monitoring, a set of stationary industrial rechargeable sensors $\mathcal{S}$ with cardinality $|\mathcal{S}| = S$ uniformly distributed in a certain area, and an MCV which starts working from a depot deployed at the center.
At the beginning of a monitoring period, the industrial controller declares a bundle of monitoring tasks $\mathcal{Z} = \{z^m_j \mid \forall m \in \{1,2,\ldots,M\}, \forall j \in \{1,2,\ldots,J\}\}$ to the IWRSN, where $m$ and $j$ stand for the index of the monitoring task and its corresponding type, respectively. To meet the QoM requirements of these tasks, a group of sensors $\mathcal{H} \subseteq \mathcal{S}$ should be activated to collaboratively execute the monitoring tasks.
In practice, sensors' sensing radii are limited, denoted by $R_i$, $\forall i \in \mathcal{S}$. In addition, different types of sensors can only execute tasks fitting their types, and thus we define $\mathcal{S}_j$ as the set of sensors specialized in task type $j$. Obviously, each sensor $i \in \mathcal{S}$ can only execute a task $z^m_j \in \mathcal{Z}$ that is located within its sensing radius $R_i$ and falls into its targeted type. In each monitoring period, each sensor is able to execute at most one task. In this paper, we adopt the probabilistic sensing coverage (PSC) model [2], [3], and denote $p_{i,z^m_j}$ as the detection probability of $z^m_j$ by sensor $i$, which can be calculated as
$$p_{i,z^m_j} = \begin{cases} e^{-\alpha_i \cdot dist(i, z^m_j)}, & \text{if } dist(i, z^m_j) \le R_i,\ i \in \mathcal{S}_j,\\ 0, & \text{otherwise}, \end{cases} \quad (1)$$
where $\alpha_i$ represents the intensity coefficient related to sensor $i$'s physical characteristics, and $dist(i, z^m_j)$ indicates the Euclidean distance between sensor $i$ and task $z^m_j$ [2]–[4]. The collaborative coverage probability of sensor set $\mathcal{H}$ for the monitoring task $z^m_j$ is required to be larger than or equal to $P^{demand}_{z^m_j}$, i.e.,
$$1 - \prod_{i \in \mathcal{H}} (1 - p_{i,z^m_j}) \ge P^{demand}_{z^m_j}, \quad (2)$$
where $P^{demand}_{z^m_j}$ measures the minimum QoM demanded by each task $z^m_j$. Sensors that are activated to execute tasks should work continuously during the monitoring period due to the industrial monitoring application. However, the battery capacity $E^{capacity}_i$ of each sensor is limited, and once the battery is completely consumed, the sensor stops working. To this end, the MCV with energy capacity $E^{MCV}$ is employed, which travels starting at the depot, charges dying sensors in $\mathcal{H}$ and returns to the depot at the end. Because of hardware limitations, the MCV can only recharge one sensor at a time. We denote $E^{initial}_i$ as the initial energy of each sensor $i \in \mathcal{S}$ at the beginning of the monitoring period.
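The per-sensor detection model (1) and the collaborative coverage requirement (2) translate directly into code. Below is a minimal sketch under illustrative assumptions (the record fields and helper names such as `detection_prob` are ours, not the paper's):

```python
import math

def detection_prob(sensor, task):
    """Eq. (1): p = exp(-alpha_i * dist) if the task lies within sensor i's
    radius and matches its type; 0 otherwise."""
    d = math.dist(sensor["loc"], task["loc"])
    if d <= sensor["radius"] and sensor["type"] == task["type"]:
        return math.exp(-sensor["alpha"] * d)
    return 0.0

def coverage_met(active_sensors, task):
    """Eq. (2): collaborative coverage 1 - prod(1 - p_i) >= P_demand."""
    miss = 1.0
    for s in active_sensors:
        miss *= 1.0 - detection_prob(s, task)
    return 1.0 - miss >= task["p_demand"]

# Toy example: two type-0 sensors jointly covering one task.
s1 = {"loc": (0.0, 0.0), "radius": 15.0, "type": 0, "alpha": 0.1}
s2 = {"loc": (10.0, 0.0), "radius": 15.0, "type": 0, "alpha": 0.1}
task = {"loc": (5.0, 0.0), "type": 0, "p_demand": 0.6}
print(coverage_met([s1, s2], task))
```

Each sensor alone detects the task with probability $e^{-0.5} \approx 0.61$, and together they clear the 0.6 demand with collaborative coverage $\approx 0.85$.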
For simplicity, assume that for each sensor $i \in \mathcal{S}$, $E^{initial}_i$ is sufficiently large to guarantee that $E^{initial}_i \ge E^{min}_i$, where $E^{min}_i$ is the minimum energy for $i \in \mathcal{S}$ to be operational. Here, we characterize the energy consumption rate of each sensor $i \in \mathcal{S}$ by $E^{consume}_i$. Note that some sensors may have sufficient energy to work continuously during the monitoring period without being recharged by the MCV. We classify these sensors into the set $\mathcal{H}_0 \subseteq \mathcal{H}$, and categorize the others, which have to be recharged by the MCV, into the set $\mathcal{H}_1 = \mathcal{H} \setminus \mathcal{H}_0$. Obviously, the amount of energy that sensor $i \in \mathcal{H}_1$ requires to be recharged can be calculated as
$$E^{demand}_i = T \cdot E^{consume}_i - (E^{initial}_i - E^{min}_i), \quad \forall i \in \mathcal{H}_1, \quad (3)$$
where $T$ is the time duration of each production-line monitoring task period.
To ensure that all activated sensors can execute tasks continuously, the MCV should charge the sensors in set $\mathcal{H}_1$ before their charging deadlines $ddl_i$, $\forall i \in \mathcal{H}_1$, which can be calculated as
$$ddl_i = \frac{E^{initial}_i - E^{min}_i}{E^{consume}_i}, \quad \forall i \in \mathcal{H}_1. \quad (4)$$
Besides, let us denote the charging route of the MCV by a vector $L_{\mathcal{H}_1} = \{\pi_0, \pi_1, \ldots, \pi_g, \ldots, \pi_{|\mathcal{H}_1|}, \pi_{|\mathcal{H}_1|+1}\}$, where $\pi_g$ signifies the $g$th visiting target (i.e., the targeted sensor for recharging). Specifically, $\pi_0 = \pi_{|\mathcal{H}_1|+1} = 0$ indicates that the MCV travels starting from the depot and returns at the end, and $\pi_g \in \mathcal{H}_1$ for $g = 1, \ldots, |\mathcal{H}_1|$. Note that each sensor $i \in \mathcal{H}_1$ can only be visited once, i.e., $\pi_g \ne \pi_{g'}$ for $g \ne g'$. Furthermore, we define the arrival time of the MCV at a visiting target $\pi_g$ as $A_{\pi_g}$. Clearly, $A_{\pi_g}$ depends on the arrival time at the last visited target $\pi_{g-1}$, the service time (i.e., battery recharging time) for the target $\pi_{g-1}$, and the traveling time of the MCV from $\pi_{g-1}$ to $\pi_g$. Hence, $A_{\pi_g}$ can be expressed as
$$A_{\pi_g} = A_{\pi_{g-1}} + \frac{E^{demand}_{\pi_{g-1}}}{\varepsilon} + \frac{dist(\pi_{g-1}, \pi_g)}{v}, \quad \forall \pi_g \in L_{\mathcal{H}_1}, \quad (5)$$
where $\varepsilon$ and $v$ stand for the charging efficiency and the velocity of the MCV, respectively. Following the definition in (3), $E^{demand}_{\pi_g}$ depicts the amount of energy that the target $\pi_g$ (or sensor $\pi_g$) demands for recharging. In particular, $E^{demand}_{\pi_0} = E^{demand}_{\pi_{|\mathcal{H}_1|+1}} = 0$, and $A_{\pi_0} = 0$.
In this paper, we assume that once a sensor $i \in \mathcal{H}$ has been fully recharged, it can work continuously without interruption during the monitoring period, namely $E^{capacity}_i \ge T \cdot E^{consume}_i$.
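The recharge demand (3), charging deadline (4), and the arrival-time recursion (5) can each be computed in a line or two; the sketch below is a minimal illustration with a hypothetical two-stop route (all names and numbers are ours):

```python
import math

def recharge_demand(e_init, e_min, e_rate, T):
    """Eq. (3): energy the sensor must receive to survive a period of length T."""
    return T * e_rate - (e_init - e_min)

def charging_deadline(e_init, e_min, e_rate):
    """Eq. (4): time at which the sensor would hit its minimum energy."""
    return (e_init - e_min) / e_rate

def arrival_times(route, demands, eps, v):
    """Eq. (5): A_g = A_{g-1} + E_demand_{g-1}/eps + dist(g-1, g)/v.
    `route` is a list of (x, y) stops starting at the depot; `demands[g]`
    is the energy delivered at stop g (0 at the depot)."""
    A = [0.0]
    for g in range(1, len(route)):
        service = demands[g - 1] / eps          # recharging time at previous stop
        travel = math.dist(route[g - 1], route[g]) / v
        A.append(A[-1] + service + travel)
    return A

# Toy route: depot at the origin, two sensors on a line; eps = 15 W, v = 2 m/s.
route = [(0.0, 0.0), (20.0, 0.0), (40.0, 0.0)]
demands = [0.0, 300.0, 450.0]                   # joules delivered at each stop
print(arrival_times(route, demands, eps=15.0, v=2.0))
```

The MCV reaches the first sensor after 10 s of travel, spends 20 s charging it, travels another 10 s, and thus arrives at the second sensor at t = 40 s.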
B. Problem Description
The energy consumption of an IWRSN includes the energy consumption of the MCV and the energy consumption of the sensors in $\mathcal{H}$ for executing tasks. Although the energy cost of the MCV further consists of both a traveling energy cost and a recharging energy cost, all recharging energy is eventually consumed by sensors for a higher energy utilization efficiency, and thus this term is subsumed by the energy cost of the sensors in $\mathcal{H}$. Therefore, the total energy consumption of an IWRSN, $E^{total}(\mathcal{H}, L_{\mathcal{H}_1})$, can be formulated as
$$E^{total}(\mathcal{H}, L_{\mathcal{H}_1}) = \sum_{g=0}^{|\mathcal{H}_1|} \gamma \cdot dist(\pi_g, \pi_{g+1}) + \sum_{i \in \mathcal{H}} T \cdot E^{consume}_i,$$
where $\gamma$ represents the energy consumption rate of the MCV's traveling.
Accordingly, a joint optimization of sensor activation (i.e., the optimal set of sensors to activate $\mathcal{H}$) and mobile charging scheduling (i.e., the optimal charging route $L_{\mathcal{H}_1}$) for the IWRSN can be formulated as
$$[\text{P1}]: \ \min_{\mathcal{H}, L_{\mathcal{H}_1}} E^{total}(\mathcal{H}, L_{\mathcal{H}_1}) \quad (6)$$
$$\text{s.t.,} \quad 1 - \prod_{i \in \mathcal{H}} (1 - p_{i,z^m_j}) \ge P^{demand}_{z^m_j}, \quad \forall z^m_j \in \mathcal{Z}, \quad (7)$$
$$A_{\pi_g} \le ddl_{\pi_g}, \quad \forall g = 1, \ldots, |\mathcal{H}_1|, \quad (8)$$
$$\pi_g \ne \pi_{g'}, \quad \forall g \ne g',\ g, g' = 1, \ldots, |\mathcal{H}_1|, \quad (9)$$
$$\sum_{g=0}^{|\mathcal{H}_1|} \gamma \cdot dist(\pi_g, \pi_{g+1}) + \sum_{g=1}^{|\mathcal{H}_1|} E^{demand}_{\pi_g} \le E^{MCV}, \quad (10)$$
$$\pi_0 = 0, \quad \pi_{|\mathcal{H}_1|+1} = 0, \quad (11)$$
$$\mathcal{H} \subseteq \mathcal{S}, \quad (12)$$
$$\mathcal{H} = \mathcal{H}_0 \cup \mathcal{H}_1, \quad (13)$$
$$L_{\mathcal{H}_1} = \{\pi_0, \pi_1, \ldots, \pi_g, \ldots, \pi_{|\mathcal{H}_1|}, \pi_{|\mathcal{H}_1|+1}\}, \quad (14)$$
where constraint (7) states that each monitoring task’s QoM
requirement should be met; constraint (8) ensures that the
MCV can always be scheduled to arrive before each sensor’s
charging deadline expires; constraint (9) means that the MCV
should not visit the same sensor more than once in the
scheduled charging route; constraint (10) indicates that the total energy consumption of the MCV should be less than or equal to its energy capacity $E^{MCV}$; constraint (11) illustrates
that the MCV starts at the depot and returns to the depot at
the end. In the following section, we will propose an efficient
algorithm to derive the solution of this joint optimization
problem.
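Before turning to the solution, note that constraints (8) and (10) can be checked for any candidate route in a single linear pass. The sketch below is our own illustrative helper (not part of the paper's algorithm), where each stop carries a position, a deadline, and a recharge demand:

```python
import math

def route_feasible(stops, eps, v, gamma, e_mcv):
    """Check a route (depot -> sensors -> depot) against:
    (8)  arrive at each sensor before its charging deadline,
    (10) traveling + recharging energy within the MCV capacity.
    `stops` = [(pos, deadline, demand), ...], depot first and last."""
    t, energy = 0.0, 0.0
    for g in range(1, len(stops)):
        (p0, _, d0), (p1, ddl1, _) = stops[g - 1], stops[g]
        dist = math.dist(p0, p1)
        t += d0 / eps + dist / v           # arrival-time recursion, Eq. (5)
        energy += gamma * dist             # traveling term of (10)
        if t > ddl1:                       # deadline constraint (8) violated
            return False
    energy += sum(d for _, _, d in stops)  # recharging term of (10)
    return energy <= e_mcv                 # capacity constraint (10)

depot = ((0.0, 0.0), math.inf, 0.0)
s_a = ((10.0, 0.0), 20.0, 150.0)
s_b = ((10.0, 10.0), 60.0, 150.0)
print(route_feasible([depot, s_a, s_b, depot],
                     eps=15.0, v=2.0, gamma=20.0, e_mcv=2000.0))
```

Tightening `e_mcv` (e.g., to 900 J here) makes the same route fail constraint (10), illustrating how the capacity bound prunes otherwise deadline-feasible routes.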
III. JOINT SENSOR ACTIVATION AND MOBILE CHARGING SCHEDULING
A. Hardness Analysis
From the problem formulation [P1], we can observe that
the joint optimization of sensor activation and mobile charg-
ing scheduling actually involves a two-layer optimization. The upper layer optimization addresses the sensor set selection under the tasks' QoM constraints, where the objective is to minimize the energy consumption of the activated sensor set $\mathcal{H}$. The lower layer optimization aims to determine the
charging route scheduling for the MCV by taking into account
sensors’ charging deadlines, where the objective is to minimize
the traveling energy consumption of the MCV. Indeed, these
two optimization problems are tightly coupled.
Given the charging route $L_{\mathcal{H}_1}$ of the MCV, we can obtain the set of candidate sensors $\mathcal{S}' \subseteq \mathcal{S}$, where all sensors in $\mathcal{S}'$ have sufficient energy to execute monitoring tasks continuously during the monitoring period. The upper layer sensor set selection problem then becomes a variant of the generalized assignment problem, which is NP-hard:
$$[\text{P2}]: \ \min_{\mathcal{H}} \sum_{i \in \mathcal{H}} T \cdot E^{consume}_i \quad \text{s.t., } (7), (13) \text{ and } \mathcal{H} \subseteq \mathcal{S}'.$$
Meanwhile, given the set $\mathcal{H}$, the set $\mathcal{H}_1$ can also be obtained, and the lower layer mobile charging route scheduling problem can be seen as a reduced traveling salesman problem with time windows, which is NP-hard:
$$[\text{P3}]: \ \min_{L_{\mathcal{H}_1}} \sum_{g=0}^{|\mathcal{H}_1|} \gamma \cdot dist(\pi_g, \pi_{g+1}) \quad \text{s.t., } (8), (9), (10), (11) \text{ and } (14).$$
Based on the above analyses, it is obvious that solving the
joint optimization of sensor activation and mobile charging
scheduling for the IWRSN directly is very challenging be-
cause: i) both the upper layer sensor selection optimization,
and the lower layer charging route scheduling problem are
NP-hard; ii) the upper and lower layer problems are tightly
coupled (i.e., the input of the lower layer problem depends on
the output of the upper layer one, while the optimization of the
upper problem would impact the lower layer problem). In the
following subsection, we first solve the MCV charging route
scheduling problem by applying a DRL-based approach. Then,
we jointly optimize the sensor set selection and the MCV
charging route scheduling by utilizing a marginal product
based approximation algorithm.
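For intuition on the lower-layer problem [P3], a tiny instance can still be solved exactly by enumerating all visiting orders; the factorial blow-up of this enumeration is precisely why a learned heuristic is needed at scale. A sketch under our own simplifying assumptions (deadline and capacity checks omitted for brevity):

```python
import itertools
import math

def best_route(depot, sensors, gamma):
    """Exhaustive [P3]: minimize traveling energy gamma * total tour length
    over all permutations of the sensors to visit."""
    best_cost, best_order = math.inf, None
    for order in itertools.permutations(sensors):
        path = [depot, *order, depot]            # start and end at the depot
        cost = gamma * sum(math.dist(path[g], path[g + 1])
                           for g in range(len(path) - 1))
        if cost < best_cost:
            best_cost, best_order = cost, order
    return best_cost, best_order

depot = (0.0, 0.0)
sensors = [(10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
cost, order = best_route(depot, sensors, gamma=20.0)
print(cost, order)
```

With three sensors on a 10 m square, the optimal tour simply walks the perimeter (40 m, i.e., 800 J at γ = 20 J/m); at the paper's scale of hundreds of sensors, the |H1|! permutations make this approach hopeless, motivating the DRL scheduler below.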
B. DRL Algorithm for Mobile Charging Route Scheduling
Here, a modified pointer network similar to that in [5] is
introduced to model the lower layer problem [P3], and the
Actor-Critic algorithm is utilized for training.
First, we introduce the input structure of the neural network. At each decoding step $g = 0, 1, \ldots, |\mathcal{H}_1|+1$, let the set of inputs be $X_g = \{x^0_g, x^1_g, \ldots, x^{|\mathcal{H}_1|}_g\}$, where $|\mathcal{H}_1|$ indicates the number of targets that need to be recharged. Each $x^i_g$ is represented by a tuple $x^i_g = (s^i, d^i_g)$, where $s^i$ and $d^i_g$ stand for the static and dynamic elements of the input, respectively. It is worth noting that the dynamic elements of each input are allowed to change between decoding steps, while the static elements are invariant. For example, $s^i$ contains the attributes of target $i$, including target $i$'s location and charging deadline, which do not change during the charging process. However, the charging requirement of target $i$ becomes 0 after it has been charged by the MCV. Therefore, $x^i_g$ can be viewed as a vector of features that depicts the state of $i$ at decoding step $g$. In particular, $x^0_g$ represents the attributes of the depot, which is set to be located at the center of the area; its charging deadline is infinite and it has no charging demand.
The output of the model is a permutation of the sensors and the depot, $L_{\mathcal{H}_1} = \{\pi_0, \pi_1, \ldots, \pi_g, \ldots, \pi_{|\mathcal{H}_1|+1}\}$. At each decoding step $g = 0, 1, \ldots, |\mathcal{H}_1|+1$, $\pi_g$ points to a sensor or the depot in $X_g$, determining the next visiting target. The states of the sensors in $X_g$ are updated every time a target has been visited. When the charging requirements of all sensors are satisfied, the process terminates.
To map the input $X_0$ to the output $L_{\mathcal{H}_1}$, the probability chain rule is utilized:
$$P(L_{\mathcal{H}_1} \mid X_0) = \prod_{g=1}^{|\mathcal{H}_1|} P(\pi_{g+1} \mid \pi_0, \pi_1, \ldots, \pi_g, X_g). \quad (15)$$
First, the depot is selected as $\pi_0$. Eq. (15) provides the probability of selecting the next visiting target according to $\pi_0, \pi_1, \ldots, \pi_g$, i.e., the already visited targets. Then a modified pointer network similar to that in [5] is used to model (15). Its basic structure is the sequence-to-sequence model [6], a powerful model in the machine translation field, which maps one sequence to another. The sequence-to-sequence model consists of two recurrent neural networks (RNNs), namely an encoder and a decoder.
The encoder encodes the input sequence into a code vector which contains knowledge of the input. Since the attributes of the targets convey no sequential information and the order of targets in the input is meaningless, an RNN is not necessary in the encoder. Therefore, a simple embedding layer is adopted to encode the inputs, which decreases the computational complexity without decreasing the efficiency [5]. In this work, we apply a 1-dimensional (1-D) convolution layer to encode the inputs into a high-dimensional vector [5] ($d = 128$ in this work). The parameters of the 1-D convolution layer are shared among the inputs.
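A 1-D convolution with kernel size 1 applied across the target sequence is equivalent to one linear map shared by every target. The numpy sketch below illustrates this equivalence; the feature layout and dimensions are our own illustrative choices ($d = 128$ as in the text):

```python
import numpy as np

def embed_inputs(features, W, b):
    """Shared 1-D conv (kernel size 1) encoder: every target's feature
    vector is mapped by the same (W, b) into a d-dimensional embedding."""
    return features @ W.T + b          # shape: (num_targets, d)

rng = np.random.default_rng(0)
num_targets, feat_dim, d = 5, 4, 128   # e.g., features: x, y, deadline, demand
features = rng.normal(size=(num_targets, feat_dim))
W = rng.normal(size=(d, feat_dim)) * 0.1
b = np.zeros(d)
rho = embed_inputs(features, W, b)     # encoder embeddings rho^0..rho^4
print(rho.shape)
```

Because the same (W, b) is applied to every position, the encoder is permutation-equivariant, matching the observation that the input order of targets is meaningless.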
Different from the encoder, we use an RNN to model the decoder network, since we need to store the knowledge of
Algorithm 1: Actor-Critic training algorithm
Output: The optimal model $M = [\theta^*, \phi^*]$.
1: Initialize the actor network with random weights $\theta$ and the critic network with random weights $\phi$;
2: for iteration $= 1, 2, \ldots$ do
3:   generate $F$ problem instances from $\{\Phi_{M_1}, \Phi_{M_2}, \ldots, \Phi_{M_M}\}$;
4:   for $c = 1, \ldots, F$ do
5:     $t \leftarrow 0$;
6:     while not terminated do
7:       select the next target $\pi^c_{g+1}$ according to $P(\pi^c_{g+1} \mid \pi^c_1, \ldots, \pi^c_g, X^c_g)$;
8:       update $X^c_g$ to $X^c_{g+1}$, leaving out the visited targets;
9:     compute the reward $R^c$;
10:  $d\theta \leftarrow \frac{1}{F}\sum_{c=1}^{F} \left(R^c - V(X^c_0;\phi)\right) \nabla_\theta \log P(Y^c \mid X^c_0)$;
11:  $d\phi \leftarrow \frac{1}{F}\sum_{c=1}^{F} \nabla_\phi \left(R^c - V(X^c_0;\phi)\right)^2$;
12:  $\theta \leftarrow \theta + \eta\, d\theta$;
13:  $\phi \leftarrow \phi + \eta\, d\phi$;
14: Determine $\theta^* = \theta$, $\phi^* = \phi$.
the previous steps $\pi_0, \pi_1, \ldots, \pi_g$ to assist in obtaining $\pi_{g+1}$. The hidden state $d_g$ of the RNN decoder can memorize the previously selected targets. Then $d_g$ is combined with the encodings of the inputs $\rho^0_g, \rho^1_g, \ldots, \rho^{|\mathcal{H}_1|}_g$ to calculate the conditional probability $P(\pi_{g+1} \mid \pi_0, \pi_1, \ldots, \pi_g, X_g)$.
The attention mechanism is utilized to calculate the degree of correlation of each input to the decoding step $g$. More attention is given to the most relevant input, which is more likely to be selected as the next target. The calculation can be expressed as
$$u^i_g = w^T \tanh\left(W_1 \rho^i_g + W_2 d_g\right), \quad i \in \{0, 1, \ldots, |\mathcal{H}_1|\};$$
$$P(\pi_{g+1} \mid \pi_0, \pi_1, \ldots, \pi_g, X_g) = \text{softmax}(u^i_g),$$
where $w$, $W_1$, $W_2$ are learnable parameters. For each target $i$, $u^i_g$ is computed from $d_g$ and its encoder hidden state $\rho^i_g$. The softmax operator is used to normalize $u^0_g, u^1_g, \ldots, u^{|\mathcal{H}_1|}_g$, and the probability of selecting each target $i$ at step $g$ can then be obtained. In this paper, the greedy decoder is utilized to select the next target.
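The attention scoring and pointer distribution above fit in a few lines of numpy. The sketch below is illustrative (parameter shapes are our assumptions; a mask keeps already-visited targets from being selected again, mirroring line 8 of Algorithm 1):

```python
import numpy as np

def pointer_probs(rho, d_g, w, W1, W2, visited):
    """u_g^i = w^T tanh(W1 rho_i + W2 d_g), then softmax over targets;
    visited targets are masked out with -inf before the softmax."""
    u = np.array([w @ np.tanh(W1 @ r + W2 @ d_g) for r in rho])
    u[visited] = -np.inf
    e = np.exp(u - u.max())            # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
n, d = 4, 8                            # 4 targets, hidden size 8 (illustrative)
rho = rng.normal(size=(n, d))          # encoder embeddings
d_g = rng.normal(size=d)               # decoder hidden state at step g
w = rng.normal(size=d)
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
visited = np.array([False, True, False, False])
p = pointer_probs(rho, d_g, w, W1, W2, visited)
print(p)
```

The greedy decoder of the paper corresponds to `p.argmax()`; during training, the next target would instead be sampled from `p`.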
We adopt the well-known Actor-Critic method to train the network. The method introduces two networks that need to be trained: i) an actor network, which is the pointer network in this work, used to calculate the probability distribution for choosing the next target; and ii) a critic network that evaluates the expected reward given a specific problem state. The critic network uses the same architecture as the pointer network's encoder and maps the encoder hidden state into the critic output. Note that, during training, the model selects the next target by sampling from the probability distribution instead of choosing the target with the maximum probability.
The training is conducted in an unsupervised way and the training procedure is presented in Algorithm 1. During the training process, we generate instances from the distributions $\{\Phi_{M_1}, \Phi_{M_2}, \ldots, \Phi_{M_M}\}$, where $M$ signifies different input features of the targets, i.e., the targets' locations, charging deadlines, etc. $F$ instances are sampled from $\{\Phi_{M_1}, \Phi_{M_2}, \ldots, \Phi_{M_M}\}$ for training the actor and critic net-
Algorithm 2: Joint Sensor Activation and Charging Scheduling Algorithm (JSACS)
Input: $\mathcal{S}^{candidate}_{z^m_j} = \{i \mid p_{i,z^m_j} \ne 0, \forall i \in \mathcal{S}\}$, $\mathcal{S}^{candidate} = \bigcup_{z^m_j \in \mathcal{Z}} \mathcal{S}^{candidate}_{z^m_j}$, $\mathcal{Z}^{unsatisfied} = \mathcal{Z}$.
Output: $\mathcal{H}$, $L_{\mathcal{H}_1}$.
1: Initialize: $\mathcal{H}_0 = \emptyset$, $\mathcal{H}_1 = \emptyset$, $\mathcal{H} = \emptyset$, $E^{travel}_{MCV}(\mathcal{H}_1) = 0$;
2: while $\mathcal{Z}^{unsatisfied}$ is nonempty do
3:   for each $i \in \mathcal{S}^{candidate}$ do
4:     if $E^{initial}_i - E^{min}_i \ge T \cdot E^{consume}_i$ then
5:       $E^{travel}_{MCV}(\mathcal{H}_1 \cup \{i\}) = E^{travel}_{MCV}(\mathcal{H}_1)$;
6:     else
7:       call the model $M = [\theta^*, \phi^*]$ in Algorithm 1 to get a charging route $L_{\mathcal{H}_1 \cup \{i\}}$ which meets each sensor's charging deadline (if there is no charging route that meets the sensors' charging deadlines, or the energy consumption of the MCV exceeds $E^{MCV}$, delete sensor $i$ from $\mathcal{S}^{candidate}$), then compute the energy consumption of the charging route $E^{travel}_{MCV}(\mathcal{H}_1 \cup \{i\})$;
8:   $i^{selected} = \arg\max_{i \in \mathcal{S}^{candidate}} \left\{ \frac{\left(1 - \prod_{i' \in \mathcal{H} \cup \{i\}} (1 - p_{i',z^m_j})\right) - \left(1 - \prod_{i' \in \mathcal{H}} (1 - p_{i',z^m_j})\right)}{E^{total}(\mathcal{H} \cup \{i\}, L_{\mathcal{H}_1 \cup h}) - E^{total}(\mathcal{H}, L_{\mathcal{H}_1})}, \ \forall z^m_j \in \mathcal{Z} \right\}$; update $\mathcal{H} = \mathcal{H} \cup \{i^{selected}\}$, $E^{travel}_{MCV}(\mathcal{H}_1) = E^{travel}_{MCV}(\mathcal{H}_1 \cup \{i^{selected}\})$;
9:   if $E^{initial}_{i^{selected}} - E^{min}_{i^{selected}} \ge T \cdot E^{consume}_{i^{selected}}$ then
10:    update $\mathcal{H}_0 = \mathcal{H}_0 \cup \{i^{selected}\}$;
11:  else
12:    update $\mathcal{H}_1 = \mathcal{H}_1 \cup \{i^{selected}\}$;
13:  for each $z^m_j \in \mathcal{Z}^{unsatisfied}$ do
14:    if $1 - \prod_{i \in \mathcal{H}} (1 - p_{i,z^m_j}) \ge P^{demand}_{z^m_j}$ then
15:      update $\mathcal{S}^{candidate} = \mathcal{S}^{candidate} \setminus \mathcal{S}^{candidate}_{z^m_j}$, $\mathcal{Z}^{unsatisfied} = \mathcal{Z}^{unsatisfied} \setminus \{z^m_j\}$;
16:  update $\mathcal{S}^{candidate} = \mathcal{S}^{candidate} \setminus \{i^{selected}\}$;
17: return $\mathcal{H}$, $L_{\mathcal{H}_1}$.
works with parameters $\theta$ and $\phi$. For each instance, the actor network with the current parameters $\theta$ produces a permutation of targets, and the corresponding reward can be obtained. The policy gradient is then computed in line 10 to update the actor network. Meanwhile, the critic network is updated via line 11 by reducing the difference between the observed rewards and the approximated rewards.
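Reduced to its essence, lines 10-11 of Algorithm 1 average the advantage-weighted score function $(R^c - V)\nabla_\theta \log P$ over a batch of $F$ instances, while the critic chases the observed rewards. The toy below (entirely our own, with a softmax "policy" over three fixed-reward actions and a scalar critic standing in for the pointer and critic networks) shows the same two update rules at work:

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.zeros(3)                  # actor: softmax preferences over 3 actions
phi = np.array([0.0])                # critic: a single scalar baseline V
eta, F = 0.1, 64
true_reward = np.array([1.0, 0.0, 0.5])

for _ in range(200):
    probs = np.exp(theta) / np.exp(theta).sum()
    actions = rng.choice(3, size=F, p=probs)
    R = true_reward[actions]
    adv = R - phi[0]                               # (R^c - V(X^c_0; phi))
    # line 10: d_theta = (1/F) sum_c adv_c * grad_theta log P(action_c)
    grad_log = -probs[None, :].repeat(F, 0)        # softmax score function
    grad_log[np.arange(F), actions] += 1.0
    d_theta = (adv[:, None] * grad_log).mean(axis=0)
    # line 11: gradient of the critic loss (R - V)^2 w.r.t. phi
    d_phi = -2.0 * adv.mean()
    theta += eta * d_theta                         # actor ascent (line 12)
    phi -= eta * d_phi                             # critic descent (line 13)
print(np.argmax(theta), phi[0])
```

The actor's preferences concentrate on the highest-reward action while the baseline tracks the expected reward under the current policy, which is what keeps the gradient estimate in line 10 low-variance.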
C. Joint Sensor Activation & Charging Scheduling Algorithm
Based on the MCV's traveling energy consumption calculated by the trained model $M$, the core idea is to iteratively select the new sensor $i$ which has the largest marginal product [7]. Marginal product is a concept in economics which refers to the increase in total output brought about by adding one unit of an input, assuming that the quantities of the other inputs are held constant [7]. In this paper, the energy consumption of the IWRSN corresponds to the added input, and the QoM obtained by all tasks corresponds to the output. Then, in each iteration, a new activating sensor should be
TABLE I
MAIN SIMULATION PARAMETERS
Parameter | Value
Sensor types | [0, 1, 2, 3]
Task types | [0, 1, 2, 3]
Number of sensors | 800 (200 of each type)
Number of tasks | 40 (types randomly chosen over [0, 1, 2, 3])
Area dimensions | 80 m × 80 m
Sensing radius $R_i$ | randomly chosen over [10, 15, 20, 25] m
Energy capacity $E^{capacity}_i$ | 10.8 kJ
Energy consumption rate $E^{consume}_i$ | 0.5 J/s
Minimum energy $E^{min}_i$ | 540 J
Initial energy $E^{initial}_i$ | randomly over [1080, 3240] J
Intensity coefficient $\alpha_i$ | randomly over [0.1, 0.3]
QoM demand $P^{demand}_{z^m_j}$ | randomly over [0.5, 0.7]
Charging efficiency $\varepsilon$ | 15 W
Velocity $v$ | 2 m/s
Traveling energy consumption $\gamma$ | 20 J/m
Energy capacity of MCV $E^{MCV}$ | 128 kJ
Time duration of monitoring period $T$ | 1 hour
selected according to:
$$i^{selected} = \arg\max_{i \in \mathcal{S}^{candidate}} \left\{ \frac{\left(1 - \prod_{i' \in \mathcal{H} \cup \{i\}} (1 - p_{i',z^m_j})\right) - \left(1 - \prod_{i' \in \mathcal{H}} (1 - p_{i',z^m_j})\right)}{E^{total}(\mathcal{H} \cup \{i\}, L_{\mathcal{H}_1 \cup h}) - E^{total}(\mathcal{H}, L_{\mathcal{H}_1})}, \ \forall z^m_j \in \mathcal{Z} \right\},$$
where $h$ indicates whether this sensor needs to be recharged or not:
$$h = \begin{cases} \{i\}, & \text{if } E^{initial}_i - E^{min}_i < T \cdot E^{consume}_i,\\ \emptyset, & \text{otherwise}. \end{cases}$$
Initially, $\mathcal{H} = \emptyset$, and the details of the proposed JSACS
algorithm can be found in Algorithm 2.
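One iteration of the marginal-product selection rule can be sketched as follows: for each candidate sensor, the QoM gain (here aggregated by summing over tasks, which is our own reading of the $\forall z^m_j$ term) is divided by the increase in total energy, and the best ratio wins. All helper names and numbers below are illustrative:

```python
def qom(active, task_probs):
    """Collaborative coverage 1 - prod(1 - p_i) for one task."""
    miss = 1.0
    for i in active:
        miss *= 1.0 - task_probs.get(i, 0.0)
    return 1.0 - miss

def marginal_product(i, active, tasks, energy_with, energy_without):
    """QoM gain over all tasks divided by the added energy cost."""
    gain = sum(qom(active | {i}, tp) - qom(active, tp) for tp in tasks)
    return gain / (energy_with - energy_without)

# tasks: per-task dict mapping sensor id -> detection probability p_{i,z}
tasks = [{1: 0.6, 2: 0.4}, {2: 0.5, 3: 0.7}]
active = set()
# hypothetical total-energy values E_total(H ∪ {i}) for each candidate i,
# against a current E_total(H) of 1800 J
energy_with = {1: 1900.0, 2: 2000.0, 3: 2100.0}
best = max(energy_with, key=lambda i: marginal_product(
    i, active, tasks, energy_with[i], 1800.0))
print(best)
```

Sensor 2 offers the largest raw QoM gain (0.9), but sensor 1 wins the ratio test (0.6 gain for only 100 J extra), which is exactly the trade-off the marginal product criterion encodes.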
IV. SIMULATION RESULTS
In this section, simulations are conducted to numerically
evaluate the performance of the proposed JSACS for problem
P1. Table I lists the values of main simulation parameters.
Similar settings have been employed in the literature [8].
Note that some parameters may vary according to different
evaluation scenarios.
For effective and fair comparisons, we introduce the greedy algorithm (GRE) and an existing algorithm named the reward-cost ratio algorithm (RC-ratio) [9]. GRE greedily selects sensors with the maximum coverage probability into $\mathcal{H}$ until all tasks' QoM requirements are satisfied, and then applies the earliest-deadline-first (EDF) policy [10] to derive the charging tour of the MCV for $\mathcal{H}_1$. Under EDF, the MCV always selects the sensor with the earliest charging deadline as its next serving target; both the charging deadlines of sensors in $\mathcal{H}_1$ and the energy capacity of the MCV are taken into account when selecting each sensor. RC-ratio selects sensors into $\mathcal{H}$ according to the marginal product function, while the MCV's charging route is determined by EDF.
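The EDF baseline reduces to sorting by deadline and simulating the tour; the sketch below is a simplified reading of EDF [10] in this setting (field and function names are ours):

```python
import math

def edf_route(depot, sensors, eps, v, gamma, e_mcv):
    """sensors: list of dicts with 'loc', 'ddl', 'demand'.
    Returns the EDF visiting order, or None if a deadline or the
    MCV energy budget is violated."""
    order = sorted(sensors, key=lambda s: s["ddl"])   # earliest deadline first
    t, energy, pos = 0.0, 0.0, depot
    for s in order:
        dist = math.dist(pos, s["loc"])
        t += dist / v
        if t > s["ddl"]:
            return None                               # missed a deadline
        t += s["demand"] / eps                        # recharging time
        energy += gamma * dist + s["demand"]
        pos = s["loc"]
    energy += gamma * math.dist(pos, depot)           # return trip to depot
    return order if energy <= e_mcv else None

sensors = [
    {"loc": (10.0, 0.0), "ddl": 30.0, "demand": 150.0},
    {"loc": (0.0, 10.0), "ddl": 8.0,  "demand": 150.0},
]
route = edf_route((0.0, 0.0), sensors, eps=15.0, v=2.0, gamma=20.0, e_mcv=2000.0)
print([s["ddl"] for s in route])
```

Note that EDF orders purely by urgency, ignoring tour length; this is precisely the weakness the DRL scheduler in JSACS is designed to avoid.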
Fig. 3 demonstrates the superiority of the proposed JSACS in terms of entire-network energy consumption. It is shown that the energy consumption of the entire network increases monotonically with the number of tasks. This is because, as the number of tasks grows, more sensors need to be activated, leading to more energy consumption. Meanwhile,
Fig. 3. Comparison of energy consumption of the entire IWRSN w.r.t. number of tasks.
Fig. 4. Comparison of energy utilization efficiency of the MCV w.r.t. number of tasks.
Fig. 5. Comparison of energy consumption of the entire IWRSN w.r.t. network sizes.
with more sensors being activated, a growing number of them need to be recharged within the area, resulting in an increase in the MCV's traveling energy consumption. Additionally, it can be observed that the proposed JSACS outperforms GRE and RC-ratio. The reason is that GRE iteratively selects the sensor with the maximum coverage probability while ignoring the impact of sensor selection on the total energy consumption. RC-ratio outperforms GRE since RC-ratio selects the sensor with the maximum marginal product in each iteration. The proposed JSACS achieves the best performance because it not only selects the sensor with the largest marginal product in each iteration, but also determines the charging route of the MCV by a well-trained DRL model instead of EDF.
Fig. 4 compares the energy utilization efficiency of GRE, RC-ratio and the proposed JSACS. The energy utilization efficiency refers to the proportion of the MCV's total energy consumption that is spent on recharging sensors. It is shown that the proposed JSACS performs better than GRE and RC-ratio. The reason is that the proposed JSACS considers the two-layer optimization simultaneously when selecting a sensor. In addition, the objective of the trained DRL model is to minimize the traveling energy consumption of the MCV while meeting the charging deadlines of sensors. In contrast, the EDF policy applied in GRE and RC-ratio does not consider the traveling length of the MCV; it simply recharges sensors in deadline order. Therefore, the proposed JSACS can prompt the MCV to devote more energy to recharging sensors for task execution, thereby increasing the QoM of tasks, rather than wasting energy on traveling.
Fig. 5 shows that the energy consumption of the entire network under all three algorithms increases almost linearly with the network size. The reason is that a larger network size makes the sensor deployment sparser, leading to more traveling energy consumption. In addition, a larger network size also increases the distance between sensors and their monitored tasks, so the detection probabilities of sensors decrease and more sensors need to be activated to execute the tasks, inducing more sensor energy consumption. As expected, the proposed JSACS outperforms GRE and RC-ratio, benefiting from the integration of DRL and the marginal-product-based approximation algorithm to jointly solve the sensor activation and charging scheduling problem.
V. CONCLUSION
In this paper, the joint optimization of sensor activation and mobile charging scheduling for IWRSNs has been studied. Considering the objective of minimizing the energy consumption of the entire network, subject to tasks' QoM requirements, sensor charging deadlines and the energy capacity of the MCV, an efficient algorithm named JSACS has been proposed, integrating DRL and a marginal-product-based approximation algorithm. Simulation results show that, compared to counterparts, the proposed algorithm can decrease the energy consumption of the entire IWRSN and improve the energy utilization efficiency of the MCV.
ACKNOWLEDGMENTS
This work was supported by National Natural Science Foun-
dation of China (NSFC) under Grants 62002164, 62176122,
and 62171218.
REFERENCES
[1] Y. Feng, W. Zhang, G. Han, Y. Kang, and J. Wang, "A newborn particle swarm optimization algorithm for charging-scheduling algorithm in industrial rechargeable sensor networks," IEEE Sensors J., vol. 20, no. 18, pp. 11014–11027, 2020.
[2] H. P. Gupta, T. Venkatesh, S. V. Rao, and T. Dutta, "Analysis of coverage under border effects in three-dimensional mobile sensor networks," IEEE Trans. Mobile Comput., vol. 16, no. 9, pp. 2436–2449, 2017.
[3] C. Yi, J. Cai, K. Zhu, and R. Wang, "A queueing game based management framework for fog computing with strategic computing speed control," IEEE Trans. Mobile Comput., 2022.
[4] C. Yi, J. Cai, T. Zhang, K. Zhu, B. Chen, and Q. Wu, "Workload re-allocation for edge computing with server collaboration: A cooperative queueing game approach," IEEE Trans. Mobile Comput., pp. 1–1, 2022.
[5] M. Nazari, A. Oroojlooy, L. V. Snyder, and M. Takáč, "Reinforcement learning for solving the vehicle routing problem," in Adv. Neural Inf. Process. Syst., 2018, pp. 9839–9849.
[6] I. Sutskever and O. Vinyals, "Sequence to sequence learning with neural networks," in Adv. Neural Inf. Process. Syst., 2014, pp. 3104–3112.
[7] A. Brewer, The Making of the Classical Theory of Economic Growth. Routledge, 2010.
[8] T. Liu, B. Wu, S. Zhang, J. Peng, and W. Xu, "An effective multi-node charging scheme for wireless rechargeable sensor networks," in Proc. IEEE Int. Conf. Comput. Commun., 2020.
[9] T. Wu, P. Yang, H. Dai, C. Xiang, X. Rao, J. Huang, and T. Ma, "Joint sensor selection and energy allocation for tasks-driven mobile charging in wireless rechargeable sensor networks," IEEE Internet Things J., vol. 7, no. 12, pp. 11505–11523, 2020.
[10] J. A. Stankovic, M. Spuri, K. Ramamritham, and G. C. Buttazzo, Deadline Scheduling for Real-Time Systems: EDF and Related Algorithms. Springer Science & Business Media, 2012, vol. 460.
... J. Chen, C. Yi, R. Wang A preliminary version [1] has been presented in IEEE ICC 2022. (Corresponding author: Changyan Yi) sensor device runs out of the energy, its perception ability will be greatly reduced and the overall system may collapse. ...
Article
Full-text available
In this paper, the joint sensor activation and mobile charging vehicle scheduling for wireless rechargeable sensor network (WRSN) based industrial Internet of Things (IIoT) is studied. In the proposed framework, an optimal sensor set is selected to collaboratively execute a bundle of heterogeneous industrial tasks (e.g., production-line monitoring), meeting the quality-of-monitoring (QoM) of each individual task, and we consider that a mobile charging vehicle (MCV) is scheduled for recharging sensors before their charging deadlines, i.e., time instants of running out of their batteries, in order to prevent from any potential service interruptions (which is one of the key features of IIoT). Our goal is to jointly optimize the sensor activation and MCV charging scheduling for minimizing the system energy consumption, subject to tasks' QoM requirements, sensor charging deadlines and the energy capacity of the MCV. Unfortunately, solving this problem is nontrivial, because it involves solving two tightly coupled NP-hard optimization problems. To address this issue, we design a novel scheme integrating reinforcement learning and marginal product based approximation algorithms, and prove that it is not only computationally efficient but also theoretically bounded with a guaranteed performance in terms of the approximation ratio. Simulation results show the feasibility of the proposed scheme and demonstrate its superiority over counterparts.
Article
We present an end-to-end framework for solving Vehicle Routing Problem (VRP) using deep reinforcement learning. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing the reward signals and following feasibility rules. Our model represents a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. Our method is faster in both training and inference than a recent method that solves the Traveling Salesman Problem (TSP), with nearly identical solution quality. On the more general VRP, our approach outperforms classical heuristics on medium-sized instances in both solution quality and computation time (after training). Our proposed framework can be applied to variants of the VRP such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems.
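The policy-gradient training described above can be illustrated at its smallest scale. The sketch below is not the VRP model: it runs plain REINFORCE on a two-armed bandit with a softmax policy, updating the logits along the reward-weighted gradient of the log-probability; all hyperparameters are invented for this example.

```python
import math
import random

# Minimal REINFORCE sketch on a 2-armed bandit (illustrative, not the VRP
# model): the reward-weighted grad of log pi shifts probability mass toward
# the higher-reward action.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce(rewards=(0.0, 1.0), steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    for _ in range(steps):
        probs = softmax(logits)
        a = 0 if rng.random() < probs[0] else 1   # sample an action
        r = rewards[a]
        # d/d logit_k of log pi(a) is (1[k == a] - probs[k])
        for k in range(2):
            logits[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(logits)

probs = reinforce()   # probs[1] approaches 1: action 1 is the rewarded one
```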
Article
In this paper, a long-term workload management problem for multi-server edge computing with server collaboration is studied. In the considered model, mobile users' computation-intensive tasks are generated dynamically over time and offloaded to associated edge servers according to pre-determined subscription agreements. Upon receiving the subscribed workload, each edge server can then decide whether to participate in server collaboration for enabling workload re-allocation (i.e., workload exchange) with other heterogeneously configured edge servers. Unlike most of the existing work, this paper takes into account both competitions and collaborations among strategic edge servers in sharing their computing capacities. To achieve the equilibrium for each edge server in minimizing its expected cost (including energy consumption, delay, transmission, configuration and pricing costs), a joint optimization is formulated for determining i) its amount of workload to undertake, ii) the compensation price charged from peers, and iii) the computing speed to adopt. To efficiently solve this problem, we propose a novel cooperative queueing game approach, which integrates a convex optimization, a core cost sharing scheme and a mapping rule. Theoretical analyses and extensive simulations are conducted to evaluate the performance of the proposed solution, and demonstrate its superiority over counterparts.
Article
In this paper, a novel management framework for fog computing with strategic computing speed control at fog nodes (FNs) is studied. In the considered model, mobile users declare requests for offloading resource-hungry computation tasks that are dynamically collected at a dedicated edge server (ES). Upon receiving these requests, the ES can decide to either self-process or delegate some workloads to third-party FNs for maximizing the overall management profit. Unlike the existing work, this paper takes into account strategic behaviors of FNs in computing speed control, i.e., each FN can strategically allocate its computing resource to maximize its utility, which consists of the benefit gained from executing offloaded tasks and the cost incurred by dissatisfied (delayed) service to its own subscribed tasks. To jointly address the long-term system performance and FNs' strategic interactions, a scheduling mechanism integrating a noncooperative game and a queueing model is formulated. We then investigate two delegation reward settings, i.e., constant and utility-dependent delegation prices, and propose efficient adaptive algorithms to determine the optimal workload distribution at the ES and the computing speed equilibrium among FNs. Both theoretical analyses and simulations are conducted to evaluate the performance of the proposed solutions and demonstrate their superiority over counterparts.
Article
Wireless power transfer (WPT) has emerged as a promising paradigm to charge devices due to the high reliability and efficiency of continuous power supply. Recent studies usually focus on relatively general charging patterns and metrics but neglect the collaborative task execution of nodes, which incurs charging inefficiency. In this article, we account for the diversity of energy requirements among nodes and investigate the collaborative, task-driven mobile charging problem. Our goal is to maximize the overall task utility, which concerns both sensor selection and task cooperation. To address this problem, we propose a $(1-1/e)/4$-approximation algorithm. First, we propose a novel energy allocation scheme with a specific theoretical analysis of the submodularity and gap property of the surrogate function. Then, we approximate the traveling cost to transform the formulated problem into an essentially monotone submodular function optimization subject to a general routing constraint, and propose a greedy algorithm to address it. We conduct extensive simulations to validate our theoretical results, and the results show our algorithm achieves a near-optimal solution covering at least 84.9% of the optimal result achieved by the OPT algorithm. Furthermore, field experiments in an office room and in a soccer field environment are implemented, respectively, to validate our proposed algorithm.
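The monotone submodular greedy underlying such $(1-1/e)$-style guarantees (Nemhauser et al.) can be sketched on the simplest instance, a cardinality-constrained set-coverage function; the routing-constrained variant in the abstract above is considerably more involved. The sets below are invented for this example.

```python
# Classic greedy for monotone submodular maximization under a cardinality
# constraint, illustrated on set coverage (guarantee: 1 - 1/e of optimum).

def greedy_max_coverage(sets, k):
    """sets: dict name -> frozenset of covered elements; pick k sets greedily."""
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for name, elems in sets.items():
            if name in chosen:
                continue
            gain = len(elems - covered)   # marginal coverage gain of this set
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:                  # no set adds new elements
            break
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = {"A": frozenset({1, 2, 3}), "B": frozenset({3, 4}), "C": frozenset({4, 5, 6})}
```

With `k = 2` the greedy picks `A` (gain 3) and then `C` (gain 3, versus 1 for `B`), covering all six elements.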
Article
The Industrial Wireless Rechargeable Sensor Network (IWRSN) is a sensor network used in industrial environments. In order to ensure a certain intensity of industrial monitoring and real-time industrial control, the network is equipped with a mobile charger that replenishes the energy of sensors according to a charging schedule. Because of the complexity of industrial environments, the monitoring area is first divided into grids, and a set of paths that the mobile charger can traverse is established. On this basis, a newborn particle swarm optimization (NPSO) charging scheduling algorithm is proposed under the constraint of node working time windows. The NPSO algorithm borrows the idea of the fireworks algorithm to introduce newborn particles into the population, improving the convergence speed, and applies it to the charging scheduling process. The NPSO charging algorithm first plans an initial scheduling path for the nodes that need priority charging. The remaining nodes to be charged are then inserted into the initial path near their positions, updating the time windows of the subsequent charging nodes. The simulation results show that the proposed newborn particle swarm optimization charging scheduling algorithm outperforms existing charging scheduling algorithms in energy utilization and node mortality.
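The PSO core that NPSO builds on can be shown in a few lines. The sketch below is plain PSO minimizing $f(x) = x^2$ in one dimension, with the standard velocity update (inertia plus pulls toward the personal and global bests); the "newborn particle" injection that distinguishes NPSO is omitted, and all hyperparameters are invented for this example.

```python
import random

# Bare-bones particle swarm optimization (PSO) on f(x) = x^2 in 1D.
# The NPSO variant additionally injects newborn particles (not shown here).

def pso(f, steps=200, n=10, w=0.7, c1=1.5, c2=1.5, seed=1):
    rng = random.Random(seed)
    xs = [rng.uniform(-10, 10) for _ in range(n)]
    vs = [0.0] * n
    pbest = list(xs)                # per-particle best positions
    gbest = min(xs, key=f)          # swarm-wide best position
    for _ in range(steps):
        for i in range(n):
            r1, r2 = rng.random(), rng.random()
            # Velocity: inertia + pull toward personal and global bests.
            vs[i] = (w * vs[i] + c1 * r1 * (pbest[i] - xs[i])
                     + c2 * r2 * (gbest - xs[i]))
            xs[i] += vs[i]
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
        gbest = min(pbest, key=f)
    return gbest

best = pso(lambda x: x * x)   # converges close to the minimizer x = 0
```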
Conference Paper
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.7 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a strong phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which beats the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Article
Recent advances in robotics and low-power embedded systems made three-dimensional (3D) mobile wireless sensor networks (MSNs) an effective solution for monitoring a field of interest (FoI). From a cost perspective, it is often important to ensure the desired coverage ratio for the FoI within a maximum allowable response (MAR) time, by using a minimum number of sensors in MSNs. The literature on determining the minimum number of sensors for the desired coverage ratio assumes that the FoI is unbounded, to overcome the border effects. Since the entire sensing sphere of the sensors near the boundary may not be useful for the coverage, the number of sensors estimated without the border effects is lower than the actual value. In this paper, we estimate the minimum number of sensors required to achieve a desired coverage ratio in a given MAR time for a 3D FoI; the problem is parameterized by the desired coverage ratio, the average sensor speed V, and the MAR time T. We assume a straight-line mobility model for the sensors and consider the border effects while deriving the expected sensing volume of a sensor useful in coverage. We also account for the restriction on the sampling rate of the sensors in this analysis. We discuss the application of our analysis to a non-hyper-rectangular FoI and to the random walk and waypoint mobility models, as well as the impact of neglecting the border effects. Our numerical and simulation results demonstrate the significance of border effects on the number of sensors, and the relationship between the coverage ratio, MAR time, sampling period and sensing range.