## Abstract

This paper considers random access in deadline-constrained broadcasting with frame-synchronized traffic. To enhance the maximum achievable timely delivery ratio (TDR), we define a dynamic control scheme that allows each active node to determine the transmission probability with certainty based on the current delivery urgency and the knowledge of current contention intensity. For an idealized environment where the contention intensity is completely known, we develop an analytical framework based on the theory of Markov Decision Process (MDP), which leads to an optimal scheme by applying backward induction. For a realistic environment where the contention intensity is incompletely known, we develop a framework using Partially Observable Markov Decision Process (POMDP), which can in theory be solved. We show that for both environments, there exists an optimal scheme that is optimal over all types of policies. To overcome the infeasibility in obtaining an optimal or near-optimal scheme from the POMDP framework, we investigate the behaviors of the optimal scheme for two extreme cases in the MDP framework, and leverage intuition gained from these behaviors to propose a heuristic scheme for the realistic environment with TDR close to the maximum achievable TDR in the idealized environment. In addition, we propose an approximation on the knowledge of contention intensity to further simplify this heuristic scheme. Numerical results with respect to a wide range of configurations are provided to validate our study.
arXiv:2108.03176v1 [eess.SY] 6 Aug 2021
Dynamic Control for Random Access in
Aoyu Gong, Lei Deng, Fang Liu and Yijin Zhang
ing, reliability, Markov decision process
I. INTRODUCTION
BROADCASTING is a fundamental operation in wireless
networks. With the explosive growth of ultrareliable low-
latency services for Internet of things [1], [2], [3], such
as multimedia sharing in sensor networks, safety message
dissemination in vehicular networks and industrial control in
has been becoming a research focus in recent years. For such
broadcasting, each packet needs to be transmitted within a
if the deadline expires. Hence, timely delivery ratio (TDR), de-
ﬁned as the probability that a broadcast packet is successfully
This work was supported in part by the National Natural Science Foundation
of China under Grants 62071236, 61902256, and in part by the Fundamental
Research Funds for the Central Universities of China (No. 30920021127).
(Corresponding author: Yijin Zhang.)
A. Gong and Y. Zhang are with the School of Electronic and Optical
Engineering, Nanjing University of Science and Technology, Nanjing 210094,
China. E-mail: {gongaoyu; yijin.zhang}@gmail.com.
L. Deng is with the College of Electronics and Information Engineering,
Shenzhen University, Shenzhen 518061, China. E-mail: ldeng@szu.edu.cn.
F. Liu is with the Department of Information Engineering, The Chi-
nese University of Hong Kong, Shatin, N. T., Hong Kong. E-mail:
lf015@ie.cuhk.edu.hk.
delivered to an arbitrary intended receiver within the given
delivery deadline, is considered as a critical metric to evaluate
that an uncertain set of nodes with new or backlogged packets
attempt to transmit at approximately the same time, so that
random access mechanisms are needed to support efﬁcient
channel sharing and careful design of access parameters is
needed to maximize the TDR.
Many recent studies have addressed this issue. Under saturated traffic, Bae [4], [5] obtained the optimal slotted-ALOHA for broadcasting single-slot packets and the optimal p-persistent CSMA for broadcasting multi-slot packets, respectively. Under a discrete-time Geo/Geo/1 queue model, Bae [6] obtained the optimal slotted-ALOHA for broadcasting single-slot packets. Under frame-synchronized traffic, Campolo et al. [7] proposed an analytical model for using IEEE 802.11p CSMA/CA to broadcast multi-slot packets, which can be used to obtain the optimal contention window size. However, [4], [5], [6] adopted a static transmission probability and [7] adopted a static contention window size, thus inevitably limiting the maximum achievable TDR. Other studies on deadline-constrained random access include [8], [9] for
networks, which all still restrict their attention to static access parameters.
As such, to enhance the maximum achievable TDR of ran-
required to develop a dynamic control scheme that allows
each node to adjust its access parameters according to the
knowledge of current delivery urgencies and the knowledge
of current contention intensity. Unfortunately, due to random
trafﬁc or limited capability on observing the channel status,
each node cannot obtain a complete knowledge of the current
contention intensity in practice, which renders such design a
challenging task. So, each node has to estimate the current
contention intensity using the information obtained from the
observed channel status. A great amount of work has gone
into studying such information that can be obtained [11], [12],
[13], [14], [15] under various models and protocols. Our work
follows the same direction of [12], [13] to keep an A Poste-
riori Probability (APP) distribution for the current contention
intensity given all past observations and access settings, which
is a sufﬁcient statistic for the optimal design [16], but needs to
additionally take into account the impact of delivery urgencies.
To our best knowledge, this is the ﬁrst time to study dynamic
Furthermore, it is naturally desirable for this dynamic
control to strike a balance between the chance to gain an
instantaneous successful transmission and the chance to gain
a future successful transmission within the given deadline,
which requires reasoning about future sequences of access
parameters and observations. So, the dynamic control design
under this objective is more challenging than that for maximiz-
ing the instantaneous throughput of random access [12], [14],
[15], which is only “one-step look-ahead”. By seeing access
parameters as actions, in this paper we apply the theories
of Markov Decision Process (MDP) and Partially Observable
MDP (POMDP) to obtain optimal control schemes. Although
the idea of using MDP and POMDP in the context of ran-
dom access control is not new [13], [17], [18], to our best
knowledge, this is the ﬁrst work to apply them in deadline-
general computationally prohibitive, it is important to develop
a simple control scheme with little performance loss.
ing under frame-synchronized trafﬁc. Such a trafﬁc pattern
can capture a number of scenarios in machine-to-machine
communications [1], [7], [8], [10], [19] where each node has
periodic-i.i.d. packet arrivals. The contributions of our work
are as follows.
1) We generalize slotted-ALOHA to deﬁne a dynamic con-
trol scheme, i.e., a deterministic Markovian policy, which
allows each active node to determine the current trans-
mission probability with certainty based on its current
delivery urgency and the knowledge of current contention
intensity.
2) For an idealized environment where the contention in-
tensity is completely known, we develop an analytical
framework based on the theory of MDP, which leads
to an optimal control scheme by applying the backward
induction algorithm. We further show it is indeed optimal
over all types of policies for this environment.
3) For a realistic environment where the contention inten-
sity is incompletely known, we develop an analytical
framework based on the theory of POMDP, which can
in theory lead to an optimal control scheme by backward
induction. We also show it is indeed optimal over all types
of policies for this environment.
4) To overcome the infeasibility in obtaining an optimal or
near-optimal control scheme from the POMDP frame-
work, we investigate the behaviors of the optimal control
scheme for two extreme cases in the idealized environ-
ment, and use these behaviors as clues to design a simple
heuristic control scheme for the realistic environment
with TDR close to the maximum achievable TDR in
the idealized environment. In addition, we propose an
approximation on the knowledge of contention intensity
to further simplify this heuristic scheme.
Note that although the MDP framework for the idealized
environment has limited applicability as the contention inten-
sity cannot be completely known in practice, it will serve to
provide an upper bound on the maximum achievable TDR in
the realistic environment, and serve to provide clues to design
a heuristic scheme for the realistic environment.
The remainder of this paper is organized as follows. The
system model is speciﬁed in Section II-A, and a dynamic
control scheme is deﬁned in Section II-B. The idealized and
realistic environments are studied in Sections III and IV,
respectively. A simple heuristic control scheme for the realistic
environment is proposed in Section V. Numerical results with
respect to a wide range of conﬁgurations are provided in
Section VI. Conclusions are given in Section VII.
II. SYSTEM MODEL AND DYNAMIC CONTROL SCHEME
A. System model
Consider a globally synchronized wireless network where a finite number, $N \geq 2$, of nodes are within the communication range of each other. The global time axis is divided into frames, each of which consists of a finite number, $D \geq 1$, of time slots of equal duration, indexed by $t \in \mathcal{T} \triangleq \{1, 2, \ldots, D\}$. To broadcast the freshest information, at the beginning of each frame, each node independently generates a packet to be transmitted with probability $\lambda \in (0, 1]$. We further assume that every packet has a strict delivery deadline of $D$ slots, i.e., a packet generated at the beginning of a frame will
for collaborative target detection with the above assumptions can be found in Fig. 1.
[Fig. 1: collaborative target detection example showing UAVs, targets, the detection range and communication links, with frames of $D = 3$ slots.]
By considering random channel errors due to the wireless fading effect, we assume that a packet sent from a node is successfully received by an arbitrary other node with probability $\sigma \in (0, 1]$ if it does not collide with other packets, and is otherwise certainly not received by any other node. Due to the broadcast nature, we assume that every packet is neither acknowledged nor retransmitted. Then, at the beginning of slot $t$ of a frame, a node is called an active node if it generated a packet at the beginning of the frame and has not transmitted before slot $t$, and is called an inactive node otherwise. Each active node follows a common control scheme for random access, which will be defined in Section II-B, to generate transmission probabilities at the beginnings of different slots.
A slot is said to be in the idle channel status if no packet is being transmitted, and in the busy status otherwise. At the end of a slot, we assume that each node is aware of the channel status of this slot.
The values of $N$, $D$, $\lambda$ and $\sigma$ are all assumed to be completely known in advance to each node.
B. Dynamic Control for Random Access
Due to frame-synchronized traffic, at the beginning of an arbitrary slot with at least one active node, each active node has the same delivery urgency. To take into account the joint impact of delivery urgency and the knowledge of contention intensity on determining transmission probabilities, a dynamic control scheme for random access in deadline-constrained broadcasting, which can be seen as a generalization of slotted-ALOHA, is formally defined as follows.
Consider an arbitrary frame. Let the random variable $n_t$, taking values in $\mathcal{N} \triangleq \{0, 1, \ldots, N-1\}$, denote the actual number of other active nodes in the view of an arbitrary node at the beginning of slot $t \in \mathcal{T}$. At the beginning of an arbitrary slot $t \in \mathcal{T}$ with at least one active node, we assume that each active node has the same observation history (for estimating the actual value of $n_t$) from the environment, and we require each active node to adopt the same transmission probability. Thus, each active node has the same knowledge of the actual value of $n_t$ based on all past observations from the environment and all past transmission probabilities. Such knowledge can be summarized by a probability vector $\mathbf{b}_t \triangleq \big(b_t(0), b_t(1), \ldots, b_t(N-1)\big)$, called the activity belief, where $b_t(n)$ is the conditional probability (given all past observations from the environment and all past transmission probabilities) that $n_t = n$. Let $\mathcal{B}_t$ denote the set of all possible values of $\mathbf{b}_t$ in $[0, 1]^N$ such that $\sum_{n=0}^{N-1} b_t(n) = 1$. Hence, at the beginning of every slot $t \in \mathcal{T}$ with at least one active node, we require each active node to use the values of $t$ and $\mathbf{b}_t$ for determining the value of the transmission probability $p_t$ by a transmission function $\pi_t: \mathcal{B}_t \to [0, 1]$. An example of the working procedure for the case of $N = 8$, $D = 6$ is illustrated in Fig. 2.
[Figure: active/inactive status and packet transmissions of nodes 1 to 8 over slots $t = 1, \ldots, 6$ of an arbitrary frame, with transmission probabilities $p_1, \ldots, p_5$ generated by $\pi_t: \mathcal{B}_t \to [0, 1]$ and channel statuses busy, busy, busy, idle, busy, idle.]
Fig. 2. An example of the working procedure for $N = 8$, $D = 6$.
We further consider two different environments for active nodes to obtain the value of the activity belief $\mathbf{b}_t$.
1) Idealized environment: at the beginning of every slot $t \in \mathcal{T}$ with at least one active node, each active node always has a complete knowledge of the value of $n_t$, i.e., $\mathbf{b}_t = (0, \ldots, 0, b_t(n) = 1, 0, \ldots, 0)$ if $n_t = n$ actually. Hence, the transmission function $\pi_t$ in this environment can be simply written as a function $\hat{\pi}_t: \mathcal{N} \to [0, 1]$.
2) Realistic environment: at the beginning of each slot $t \in \mathcal{T}$ with at least one active node, each active node is able to obtain the value of $\mathbf{b}_t$ only based on the characteristic of packet arrivals, all past channel statuses (idle or busy) and all past transmission probabilities, and thus has an incomplete knowledge of the value of $n_t$.
Obviously, the idealized environment is infeasible to implement due to the difficulty in determining the initial actual number of other active nodes and determining the number of nodes involved in each busy slot, whereas the realistic environment can be easily implemented without imposing
The objective of subsequent sections is to seek optimal control schemes, i.e., to design $\hat{\pi}_t$ and $\pi_t$ sequentially in each slot so that the TDR is maximized, for the idealized and realistic environments, respectively. It will be shown in Section III-B that the mapping of $\hat{\pi}_t$ can lead to an optimal control scheme over all possible schemes for the idealized environment, and in Section IV-B that the mapping of $\pi_t$ can lead to an optimal control scheme over all possible schemes for the realistic environment.
III. MDP FRAMEWORK FOR THE IDEALIZED ENVIRONMENT
In this section, we formulate the random access control
problem in the idealized environment as an MDP, use the
expected total reward of this MDP to evaluate the TDR, and
then obtain an optimal control scheme that maximizes the
TDR.
A. MDP Formulation
For an arbitrary frame, consider an arbitrary node as the tagged node, and let the random variable $q_t$, taking values in $\{0\ \text{(inactive)}, 1\ \text{(active)}\}$, denote the status of the tagged node at the beginning of slot $t$. From the dynamic control scheme specified in Section II-B, we see that each node becomes inactive at the beginning of slot $t+1$ with the transmission probability $p_t$ if it is active at the beginning of slot $t$, and will always be inactive if it is inactive at the beginning of slot $t$. This implies that the probability of moving to the next state in the state process $(q_t, n_t)_{t \in \mathcal{T}}$ depends only on the current state. Thus, $(q_t, n_t)_{t \in \mathcal{T}}$ can be viewed as a discrete-time, finite-horizon, finite-state Markov chain.
Based on the Markov chain $(q_t, n_t)_{t \in \mathcal{T}}$, we present an MDP formulation by introducing the following definitions.
1) Actions: At the beginning of each slot $t \in \mathcal{T}$ with $q_t = 1$, the action performed by the tagged node (and the other active nodes) is the chosen transmission probability $p_t$ taking values in the action space $[0, 1]$. Note that the tagged node performs no action when $q_t = 0$.
2) State Transition Function: As the tagged node will never transmit from slot $t$ onward if $q_t = 0$, we are only concerned with the state transition function when $q_t = 1$. The state transition function $\beta_t\big((q', n'), (1, n), p\big)$ is defined as the transition probability of moving from the state $(q_t, n_t) = (1, n)$ to the state $(q_{t+1}, n_{t+1}) = (q', n')$ when each active node at the beginning of slot $t$ adopts the transmission probability $p_t = p$. So, we have
$$\beta_t\big((q', n'), (1, n), p\big) \triangleq \Pr\big((q_{t+1}, n_{t+1}) = (q', n') \mid (q_t, n_t) = (1, n),\, p_t = p\big) = \begin{cases} \binom{n}{n - n'}\, p^{\,n - n' + 1 - q'} (1 - p)^{\,n' + q'}, & \text{if } n' \leq n, \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$
for each $t \in \mathcal{T} \setminus \{D\}$, each $q' \in \{0, 1\}$, each $n, n' \in \mathcal{N}$ and each $p \in [0, 1]$.
3) Rewards: The reward gained at slot $t$ is defined as the average number of packets of the tagged node transmitted successfully to an arbitrary other node at slot $t$. As there is no reward at slot $t$ when $q_t = 0$, we only focus on the cases when $q_t = 1$. Let $r_t\big((1, n), p\big)$ denote the reward at slot $t$ for the state $(q_t, n_t) = (1, n)$ when each active node at the beginning of slot $t$ adopts the transmission probability $p_t = p$. So, we have
$$r_t\big((1, n), p\big) = \sigma p (1 - p)^n, \tag{2}$$
for each $t \in \mathcal{T}$, each $n \in \mathcal{N}$ and each $p \in [0, 1]$.
4) Policies: A deterministic Markovian policy $\hat{\boldsymbol{\pi}}$ is defined by a sequence of transmission functions (i.e., decision rules) for the idealized environment:
$$\hat{\boldsymbol{\pi}} \triangleq (\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_D), \quad \text{where } \hat{\pi}_t: \mathcal{N} \to [0, 1].$$
Let $\hat{\Pi}^{\mathrm{MD}}$ denote the set of all possible such policies. Obviously, a dynamic control scheme for the idealized environment as described in Section II-B is essentially a deterministic Markovian policy here.
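The state transition function in Eq. (1) and the reward in Eq. (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `transition` and `reward` and their argument order are assumptions made here, not part of the paper.

```python
from math import comb


def transition(q_next, n_next, n, p):
    """Eq. (1): probability of moving from (1, n) to (q_next, n_next) under p."""
    if q_next not in (0, 1) or not 0 <= n_next <= n:
        return 0.0
    k = n - n_next  # other active nodes that transmit in the current slot
    return comb(n, k) * p ** (k + 1 - q_next) * (1 - p) ** (n_next + q_next)


def reward(n, p, sigma):
    """Eq. (2): expected reward at a slot in state (1, n) under p."""
    return sigma * p * (1 - p) ** n


# Sanity check: from any state (1, n), the probabilities over all
# next states (q', n') should sum to one.
total = sum(transition(q, m, 5, 0.3) for q in (0, 1) for m in range(6))
print(abs(total - 1.0) < 1e-12)
```

The factor $p^{1-q'}(1-p)^{q'}$ accounts for the tagged node itself, and the binomial term accounts for the $n$ other active nodes, which is why the sum over both $q'$ and $n'$ equals one.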
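Let $R^{\hat{\boldsymbol{\pi}}}(1, n)$ denote the expected total reward from slot 1 to slot $D$ when $q_1 = 1$, $n_1 = n$ and the policy $\hat{\boldsymbol{\pi}}$ is used, which can be defined by
$$R^{\hat{\boldsymbol{\pi}}}(1, n) \triangleq \mathbb{E}^{\hat{\boldsymbol{\pi}}} \Big\{ \sum_{t = 1,\, q_t = 1}^{D} r_t\big((1, n_t), \hat{\pi}_t(n_t)\big) \,\Big|\, q_1 = 1,\, n_1 = n \Big\},$$
where $\mathbb{E}^{\hat{\boldsymbol{\pi}}}$ represents the conditional expectation given that policy $\hat{\boldsymbol{\pi}}$ is employed. Then, the TDR under the policy $\hat{\boldsymbol{\pi}}$ can be computed by
$$\mathrm{TDR}^{\hat{\boldsymbol{\pi}}} = \sum_{n \in \mathcal{N}} \binom{N-1}{n} \lambda^n (1 - \lambda)^{N-1-n} R^{\hat{\boldsymbol{\pi}}}(1, n).$$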
B. MDP solution
Due to the finite horizon, finite state space, compact action space, bounded rewards, continuous rewards with respect to $p$ and continuous state transition function with respect to $p$ in our MDP formulation, [20, Prop. 4.4.3, Ch. 4] indicates that for maximizing $\mathrm{TDR}^{\hat{\boldsymbol{\pi}}}$, there exists a $\hat{\boldsymbol{\pi}}^* \in \hat{\Pi}^{\mathrm{MD}}$, which is indeed optimal over all randomized and deterministic, history-dependent and Markovian policies. This property also justifies the transmission function and design goal for the idealized environment considered in Section II-B. Hence, we aim to seek
$$\hat{\boldsymbol{\pi}}^* \in \arg\max_{\hat{\boldsymbol{\pi}} \in \hat{\Pi}^{\mathrm{MD}}} \mathrm{TDR}^{\hat{\boldsymbol{\pi}}}.$$
Let $U^*_t(1, n)$ denote the value function corresponding to the maximum total expected reward from slot $t$ to slot $D$ when $q_t = 1$ and $n_t = n$. Averaging over all possible next states with $q_{t+1} = 1$, we arrive at the following Bellman's equation:
$$\begin{aligned} U^*_D(1, n) &= \max_{p \in [0, 1]} r_D\big((1, n), p\big), \quad \forall n \in \mathcal{N}, \\ U^*_t(1, n) &= \max_{p \in [0, 1]} \Big\{ r_t\big((1, n), p\big) + \sum_{n' \in \mathcal{N}} \beta_t\big((1, n'), (1, n), p\big)\, U^*_{t+1}(1, n') \Big\}, \quad \forall n \in \mathcal{N}, \end{aligned} \tag{3}$$
for each $t \in \mathcal{T} \setminus \{D\}$. Then, applying the backward induction algorithm to get a solution to Eq. (3) involves finding global maximizers of a series of real-coefficient univariate polynomials defined on $[0, 1]$, and can formally lead to $\hat{\boldsymbol{\pi}}^*$.
IV. POMDP FRAMEWORK FOR THE REALISTIC ENVIRONMENT
In this section, we formulate the random access control
problem in the realistic environment as a POMDP, use the
expected total reward of this POMDP to evaluate the TDR,
and then discuss how to obtain an optimal or near-optimal
control scheme.
A. POMDP Formulation
Based on the Markov chain $(q_t, n_t)_{t \in \mathcal{T}}$ specified in Section III and the activity belief $\mathbf{b}_t$ for each $t \in \mathcal{T}$ with $q_t = 1$ specified in Section II-B, we present a POMDP formulation by introducing the following definitions.
1) Actions, State Transition Function, Rewards: The definitions of these elements are the same as in Section III.
2) Observations and Observation Function: The tagged node at the beginning of slot $t+1$ can obtain an observation on the channel status of slot $t$, denoted by $o_t$. When $q_{t+1} = 0$, the tagged node will never transmit from slot $t+1$ onward and $o_t$ will thus be useless. Hence, we only consider $o_t$ when $q_{t+1} = 1$, taking values in the observation space $\mathcal{O} \triangleq \{0\ \text{(idle)}, 1\ \text{(busy)}\}$. Further, the observation function $\omega_t\big(o, (1, n), (1, n')\big)$ is defined as the probability that the tagged node at the beginning of slot $t+1$ obtains the observation $o_t = o$ if the state $(q_t, n_t) = (1, n)$ and the state $(q_{t+1}, n_{t+1}) = (1, n')$. So, we have
$$\omega_t\big(o, (1, n), (1, n')\big) \triangleq \Pr\big(o_t = o \mid (q_t, n_t) = (1, n),\, (q_{t+1}, n_{t+1}) = (1, n')\big) = \begin{cases} 1, & \text{if } o = 0,\ n' = n, \\ 1, & \text{if } o = 1,\ n' \leq n - 1, \\ 0, & \text{otherwise}, \end{cases}$$
for each $t \in \mathcal{T}$, each $o \in \mathcal{O}$ and each $n, n' \in \mathcal{N}$.
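3) Bayesian update of the Activity Belief: It has been shown in [16] that for each $t \in \mathcal{T}$ with $q_t = 1$, the value of the activity belief $\mathbf{b}_t$ is a sufficient statistic for the initial activity belief, all past channel statuses and all past transmission probabilities. First, by the total number of nodes $N$ and the packet generation probability $\lambda$, the tagged node can obtain
$$\mathbf{b}_1 = \mathbf{h}_\lambda \triangleq \big( (1 - \lambda)^{N-1},\ (N-1)\lambda(1 - \lambda)^{N-2},\ \ldots,\ \lambda^{N-1} \big). \tag{4}$$
Then, for each $t \in \mathcal{T} \setminus \{D\}$, given the condition $q_t = q_{t+1} = 1$, the activity belief $\mathbf{b}_t = \mathbf{b}$, the observation $o_t = o$ and the transmission probability $p_t = p$ used at slot $t$, the tagged node at slot $t+1$ can obtain $\mathbf{b}_{t+1}$ via Bayes' rule:
$$\mathbf{b}_{t+1} \triangleq \theta_t(\mathbf{b}, p, o, 1, 1),$$
$$b_{t+1}(n') \triangleq \Pr\big(n_{t+1} = n' \mid \mathbf{b}_t = \mathbf{b},\, p_t = p,\, o_t = o,\, q_t = q_{t+1} = 1\big) = \frac{\sum_{n \in \mathcal{N}} b(n)\, \omega_t\big(o, (1, n), (1, n')\big)\, \beta_t\big((1, n'), (1, n), p\big)}{\chi_t(o, \mathbf{b}, p, 1, 1)},$$
for each $n' \in \mathcal{N}$, where
$$\chi_t(o, \mathbf{b}, p, 1, 1) \triangleq \Pr\big(q_{t+1} = 1,\, o_t = o \mid q_t = 1,\, \mathbf{b}_t = \mathbf{b},\, p_t = p\big) = \sum_{n \in \mathcal{N}} b(n) \sum_{n'' \in \mathcal{N}} \omega_t\big(o, (1, n), (1, n'')\big)\, \beta_t\big((1, n''), (1, n), p\big).$$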
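4) Policies: A deterministic Markovian policy $\boldsymbol{\pi}$ is defined by a sequence of transmission functions for the realistic environment:
$$\boldsymbol{\pi} \triangleq (\pi_1, \pi_2, \ldots, \pi_D), \quad \text{where } \pi_t: \mathcal{B}_t \to [0, 1].$$
Let $\Pi^{\mathrm{MD}}$ denote the set of all possible such policies. Obviously, a dynamic control scheme for the realistic environment as specified in Section II-B is essentially a deterministic Markovian policy here.
Let $R^{\boldsymbol{\pi}}(1, \mathbf{h}_\lambda)$ denote the expected total reward from slot 1 to slot $D$ when $q_1 = 1$, $\mathbf{b}_1 = \mathbf{h}_\lambda$ and the policy $\boldsymbol{\pi}$ is used, which can be defined by
$$R^{\boldsymbol{\pi}}(1, \mathbf{h}_\lambda) \triangleq \mathbb{E}^{\boldsymbol{\pi}} \Big\{ \sum_{t = 1,\, q_t = 1}^{D} r_t\big((1, n_t), \pi_t(\mathbf{b}_t)\big) \,\Big|\, q_1 = 1,\, \mathbf{b}_1 = \mathbf{h}_\lambda \Big\}.$$
Obviously, we have $\mathrm{TDR}^{\boldsymbol{\pi}} = R^{\boldsymbol{\pi}}(1, \mathbf{h}_\lambda)$, where $\mathrm{TDR}^{\boldsymbol{\pi}}$ denotes the TDR under the policy $\boldsymbol{\pi}$.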
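The initial belief in Eq. (4) and the Bayesian update $\theta_t$ can be sketched as follows. The sketch assumes the tagged node stays active ($q_t = q_{t+1} = 1$) and stores the belief as a plain list of length $N$; the names `initial_belief` and `belief_update` are hypothetical.

```python
from math import comb


def initial_belief(N, lam):
    """Eq. (4): binomial prior on the number of other active nodes."""
    return [comb(N - 1, n) * lam ** n * (1 - lam) ** (N - 1 - n) for n in range(N)]


def belief_update(b, p, o):
    """Return (b_next, chi) for observation o (0 = idle, 1 = busy)."""
    N = len(b)
    joint = [0.0] * N  # joint[m] = Pr(n_{t+1} = m, o_t = o, q_{t+1} = 1 | b, p)
    for n, bn in enumerate(b):
        for m in range(n + 1):  # m stands for n'
            # Observation function: idle iff nobody left, busy iff someone left.
            consistent = (o == 0 and m == n) or (o == 1 and m <= n - 1)
            if consistent:
                # beta_t((1, m), (1, n), p) from Eq. (1) with q' = 1.
                joint[m] += bn * comb(n, n - m) * p ** (n - m) * (1 - p) ** (m + 1)
    chi = sum(joint)
    if chi == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [x / chi for x in joint], chi


b1 = initial_belief(N=3, lam=0.5)
b2, chi_idle = belief_update(b1, p=0.5, o=0)
print(b2, chi_idle)
```

Summing `chi` over both observations gives $1 - p$, the probability that the tagged node itself stays silent, which is a convenient consistency check.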
B. POMDP solution
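Due to the finite horizon, finite state space, compact action space, bounded rewards, continuous rewards with respect to $p$ and continuous $\chi_t(o, \mathbf{b}, p, 1, 1)$ with respect to $p$ in our POMDP formulation, [20, Prop. 4.4.3, Ch. 4] and [21, Thm. 7.1, Ch. 6] indicate that for maximizing $\mathrm{TDR}^{\boldsymbol{\pi}}$, there exists a $\boldsymbol{\pi}^* \in \Pi^{\mathrm{MD}}$, which is indeed optimal over all types of policies. This property also justifies the transmission function and design goal for the realistic environment considered in Section II-B. Hence, we aim to seek an optimal policy in $\Pi^{\mathrm{MD}}$ that maximizes $\mathrm{TDR}^{\boldsymbol{\pi}}$, i.e.,
$$\boldsymbol{\pi}^* \in \arg\max_{\boldsymbol{\pi} \in \Pi^{\mathrm{MD}}} \mathrm{TDR}^{\boldsymbol{\pi}}.$$
Let $V^*_t(1, \mathbf{b})$ denote the value function corresponding to the maximum total expected reward from slot $t$ to slot $D$ when $q_t = 1$ and $\mathbf{b}_t = \mathbf{b}$. Averaging over all possible current states with $q_t = 1$ and observations with $q_{t+1} = 1$, we arrive at the following Bellman's equation:
$$\begin{aligned} V^*_D(1, \mathbf{b}) &= \max_{p \in [0, 1]} \sum_{n \in \mathcal{N}} b(n)\, r_D\big((1, n), p\big), \quad \forall \mathbf{b} \in \mathcal{B}_D, \\ V^*_t(1, \mathbf{b}) &= \max_{p \in [0, 1]} \Big\{ \sum_{n \in \mathcal{N}} b(n)\, r_t\big((1, n), p\big) + \sum_{o \in \mathcal{O}} \chi_t(o, \mathbf{b}, p, 1, 1)\, V^*_{t+1}\big(1, \theta_t(\mathbf{b}, p, o, 1, 1)\big) \Big\}, \quad \forall \mathbf{b} \in \mathcal{B}_t, \end{aligned} \tag{5}$$
for each $t \in \mathcal{T} \setminus \{D\}$. Solving Eq. (5) formally leads to $\boldsymbol{\pi}^*$.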
Unfortunately, getting $\boldsymbol{\pi}^*$ by solving Eq. (5) is computationally intractable, as both the belief state space $\bigcup_{t \in \mathcal{T}} \mathcal{B}_t$ and the action space $[0, 1]$ are infinite in our POMDP formulation. As such, an alternative is to consider a discretized action space $\mathcal{A}_d$ that only consists of uniformly distributed samples of the interval $[0, 1]$, i.e., $\mathcal{A}_d \triangleq \{0, \Delta p, 2\Delta p, \ldots, 1\}$, where $\Delta p$ denotes the sampling interval. Hence, it is easy to see that $\mathcal{B}_t$ will become finite for each $t \in \mathcal{T}$ due to the finite $\mathcal{A}_d$. Then, theoretically, applying the backward induction algorithm [16] to get a solution to Eq. (5) can lead to a near-optimal policy, whose loss of optimality increases with $\Delta p$. However, this approach is still computationally prohibitive due to super-exponential growth in the value function complexity.
V. A HEURISTIC SCHEME FOR THE REALISTIC ENVIRONMENT
To overcome the infeasibility in obtaining an optimal or
near-optimal control scheme for the realistic environment from
the POMDP framework, in this section, we propose a simple
heuristic control scheme that utilizes the key properties of our
problem. It will be shown in Section VI that the heuristic
scheme performs quite well in simulations.
A. Heuristic from the idealized environment
We first investigate the behaviors of $\hat{\boldsymbol{\pi}}^*$ for two extreme cases in the idealized environment, which would serve to provide important clues on approximating $\hat{\boldsymbol{\pi}}^*$. Let $U_t\big((1, n), p\big)$ denote the total expected reward from slot $t$ to slot $D$ for the state $(q_t, n_t) = (1, n)$ when each active node at the beginning of slot $t$ adopts the transmission probability $p_t = p$ and the optimal decision rules at slots $t+1, t+2, \ldots, D$. So, we have $U_t\big((1, n), \hat{\pi}^*_t(n)\big) = U^*_t(1, n)$.
Lemma 1. When $n_1 = m \to \infty$, by assuming that each collision involves a finite number of packets, for each $t \in \mathcal{T}$ and each possible $n_t = n$, we have
$$\lim_{m \to \infty} (n + 1)\, U_t\Big((1, n), \frac{1}{n + 1}\Big) = \frac{(D - t + 1)\sigma}{e}, \tag{6}$$
$$\lim_{m \to \infty} (n + 1)\, U^*_t(1, n) = \frac{(D - t + 1)\sigma}{e}. \tag{7}$$
The proof of Lemma 1 is provided in Appendix A.
Lemma 1 motivates us to conjecture that, if $n_1$ takes a value sufficiently larger than $D$, the realizations of $(n_t + 1)\hat{\pi}^*_t(n_t)$ would always approach 1 for each $t \in \mathcal{T}$. Fig. 3 shows 1000 such realizations when $D = 10$ for $n_1 = 30, 50, 100$, respectively, which confirm our conjecture.
Fig. 3. Realizations of $(n_t + 1)\hat{\pi}^*_t(n_t)$ when $n_1 = 30, 50, 100$ and $D = 10$.
We further investigate the behaviors of $\hat{\boldsymbol{\pi}}^*$ for the extreme case that $n_1$ takes a value sufficiently smaller than $D$.
Lemma 2. For each $t \in \mathcal{T}$, we have
$$U^*_t(1, 1) = \frac{3D - 3t + 1}{3D - 3t + 4}\,\sigma, \tag{8}$$
and for each $t \in \mathcal{T} \setminus \{D\}$, we have
$$\hat{\pi}^*_t(1) = \frac{3}{3D - 3t + 4}. \tag{9}$$
The proof of Lemma 2 is provided in Appendix B.
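Inspired by Eq. (9), we consider a simple control scheme $\hat{\boldsymbol{\pi}}^{\mathrm{eve}} \triangleq [\hat{\pi}^{\mathrm{eve}}_1, \hat{\pi}^{\mathrm{eve}}_2, \ldots, \hat{\pi}^{\mathrm{eve}}_D] \in \hat{\Pi}^{\mathrm{MD}}$, where
$$\hat{\pi}^{\mathrm{eve}}_t(n) = \frac{1}{D - t + 1}, \tag{10}$$
for each $t \in \mathcal{T}$ and each $n \in \mathcal{N}$.
Let $U^{\mathrm{eve}}_t(1, n)$ denote the expected total reward from slot $t$ to slot $D$ for the state $(q_t, n_t) = (1, n)$ when each active node follows $\hat{\boldsymbol{\pi}}^{\mathrm{eve}}$ at slots $t, t+1, \ldots, D$. So, using the finite-horizon policy evaluation algorithm [20], we have
$$\begin{aligned} U^{\mathrm{eve}}_D(1, n) &= r_D\big((1, n), \hat{\pi}^{\mathrm{eve}}_D(n)\big), \quad \forall n \in \mathcal{N}, \\ U^{\mathrm{eve}}_t(1, n) &= r_t\big((1, n), \hat{\pi}^{\mathrm{eve}}_t(n)\big) + \sum_{n' \in \mathcal{N}} \beta_t\big((1, n'), (1, n), \hat{\pi}^{\mathrm{eve}}_t(n)\big)\, U^{\mathrm{eve}}_{t+1}(1, n'), \quad \forall n \in \mathcal{N}, \end{aligned} \tag{11}$$
for each $t \in \mathcal{T} \setminus \{D\}$.
Lemma 3. For each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$, we have
$$U^{\mathrm{eve}}_t(1, n) = \sigma \Big(1 - \frac{1}{D - t + 1}\Big)^{n}. \tag{12}$$
The proof of Lemma 3 is provided in Appendix C.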
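The policy evaluation in Eq. (11) can be run numerically and compared against the closed form in Eq. (12). The sketch below is only a sanity check under the model of Section III; the function name `evaluate_even_policy` and the dictionary layout are assumptions made for illustration.

```python
from math import comb


def evaluate_even_policy(N, D, sigma):
    """Eq. (11): U[t][n] ~ U^eve_t(1, n) under p_t = 1 / (D - t + 1)."""
    U = {}
    U_next = [0.0] * N  # no reward after slot D
    for t in range(D, 0, -1):
        p = 1.0 / (D - t + 1)
        U[t] = []
        for n in range(N):
            value = sigma * p * (1 - p) ** n  # instantaneous reward, Eq. (2)
            for m in range(n + 1):  # m stands for n', with q_{t+1} = 1
                beta = comb(n, n - m) * p ** (n - m) * (1 - p) ** (m + 1)
                value += beta * U_next[m]
            U[t].append(value)
        U_next = U[t]
    return U


D, sigma = 8, 0.9
U = evaluate_even_policy(N=6, D=D, sigma=sigma)
# Largest gap to Eq. (12) over all t < D and all n.
gap = max(abs(U[t][n] - sigma * (1 - 1 / (D - t + 1)) ** n)
          for t in range(1, D) for n in range(6))
print(gap)
```

One way to read Eq. (12): under this scheme each active node effectively picks one of the $D - t + 1$ remaining slots uniformly at random, so the tagged packet avoids collision exactly when none of the $n$ other nodes picks the same slot.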
For each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$, based on the fact $U^*_t(1, n) \leq \sigma$, we have
$$\frac{U^{\mathrm{eve}}_t(1, n)}{U^*_t(1, n)} \geq \Big(1 - \frac{1}{D - t + 1}\Big)^{n}. \tag{13}$$
We can observe from Eq. (13) that, if $n$ is sufficiently smaller than $D - t + 1$, the value of $U^*_t(1, n)$ is close to the value of $U^{\mathrm{eve}}_t(1, n)$. Then, Eq. (13) motivates us to conjecture that, if $n_t$ takes a value sufficiently smaller than $D - t + 1$, $\hat{\pi}^{\mathrm{eve}}_t$ may behave like $\hat{\pi}^*_t$, i.e., the realizations of $(D - t + 1)\hat{\pi}^*_t(n_t)$ would always approach $(D - t + 1)\hat{\pi}^{\mathrm{eve}}_t(n_t) = 1$ for each $n_t = n \in \mathcal{N}$. Fig. 4 shows 1000 such realizations when $n_1 = 10$ for $D = 30, 50, 100$, respectively, which confirm our conjecture.
Fig. 4. Realizations of $(D - t + 1)\hat{\pi}^*_t(n_t)$ when $n_1 = 10$ and $D = 30, 50, 100$.
Naturally, we obtain the following heuristic from Lemma 1 and Eq. (13).
1) When the number of active nodes is sufficiently large compared with the number of remaining slots, it is desirable for the active nodes to adopt the transmission probability that maximizes the instantaneous throughput.
2) When the number of active nodes is sufficiently small compared with the number of remaining slots, it is desirable for the active nodes to adopt the transmission probability that ensures all the backlogged packets would be almost evenly transmitted in the remaining slots.
Based on this heuristic and the obvious fact $\hat{\pi}^*_D(n) = \frac{1}{n+1}$ for each $n \in \mathcal{N}$, we propose a simple approximation on $\hat{\boldsymbol{\pi}}^*$.
Approximation on $\hat{\boldsymbol{\pi}}^*$: For each slot $t \in \mathcal{T}$ and each $n_t = n \in \mathcal{N}$, if the number of active nodes $n + 1$ is larger than the number of remaining slots $D - t + 1$ or $t = D$, $\hat{\pi}^*_t(n)$ can be estimated by $\frac{1}{n+1}$; otherwise $\hat{\pi}^*_t(n)$ can be estimated by $\frac{1}{D-t+1}$.
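The approximation rule stated above amounts to a two-branch function of $t$, $n$ and $D$. A minimal sketch follows; the name `approx_tx_probability` is hypothetical.

```python
def approx_tx_probability(t, n, D):
    """Approximate the optimal transmission probability in the idealized environment.

    t: current slot (1..D); n: number of other active nodes; D: deadline in slots.
    """
    remaining = D - t + 1  # slots left in the frame, including the current one
    if t == D or n + 1 > remaining:
        # Many contenders (or last slot): maximize instantaneous throughput.
        return 1.0 / (n + 1)
    # Few contenders: spread transmissions evenly over the remaining slots.
    return 1.0 / remaining


# Probabilities across one frame with D = 6 when two other nodes stay active.
print([round(approx_tx_probability(t, 2, 6), 3) for t in range(1, 7)])
```

Both branches coincide when $n + 1 = D - t + 1$, so the rule is continuous at the switching point.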
Fig. 5. $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$.
Fig. 6. The total expected rewards from slot $t$ to $D$ corresponding to $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$.
Fig. 5 compares $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$. The results show that the approximation error is very small when the difference between $n$ and $D - t$ is large, but is noticeable when this difference is small, thus justifying our heuristic. The results also show that the ratio of the cases with the approximation error larger than 8% is 6.67% and the largest approximation error is 11.47%, thus justifying our approximation. Furthermore, Fig. 6 compares the total expected rewards from slot $t$ to $D$ corresponding to $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$. The results show that the approximation leads to at most 0.66% reward loss (much smaller than the approximation error), and thus verify that our approximation can be used to obtain a TDR quite close to the maximum achievable TDR in the idealized environment.
B. A simple approximation on the activity belief of the realistic environment
To apply the approximation on $\hat{\pi}^*$ to the realistic environment, each active node must update the activity belief $\mathbf{b}_t$ at runtime. However, as shown in Section IV-A, the full Bayesian update of $\mathbf{b}_t$ is computationally demanding to implement. We therefore propose a simple approximation on $\mathbf{b}_t$, denoted by $\mathbf{b}^{\text{bd}}_t \triangleq \big(b^{\text{bd}}_t(0), b^{\text{bd}}_t(1), \ldots, b^{\text{bd}}_t(N-1)\big)$, relying on a binomial distribution with a changeable parameter vector $(M_t, \alpha_t)$. More specifically, if $(M_t, \alpha_t) = (M, \alpha)$, we have
$$
b^{\text{bd}}_t(n) =
\begin{cases}
\binom{M}{n}\alpha^{n}(1-\alpha)^{M-n}, & \text{if } 0 \le n \le M, \\
0, & \text{otherwise.}
\end{cases}
\tag{14}
$$
In this manner, each active node only keeps the parameter vector $(M_t, \alpha_t)$ rather than the activity belief $\mathbf{b}_t$.
Obviously, by Eq. (4), we can set $(M_1, \alpha_1) = (N-1, \lambda)$ to achieve $\mathbf{b}_t = \mathbf{b}^{\text{bd}}_t$ at $t = 1$. Then, for each $t \in \mathcal{T} \setminus \{D\}$, we will show that we can use the Bayes' rule exactly to set the value of $(M_{t+1}, \alpha_{t+1})$ when the observation $o_t = 0$, but must introduce an approximation assumption when $o_t = 1$.
For each $t \in \mathcal{T} \setminus \{D\}$, given $(M_t, \alpha_t) = (M, \alpha)$, $\mathbf{b}^{\text{bd}}_t = \mathbf{b}^{\text{bd}}$ and $p_t = p$, the following procedure first uses the Bayes' rule to compute $\mathbf{b}^{\text{med}}_{t+1} \triangleq \big(b^{\text{med}}_{t+1}(0), b^{\text{med}}_{t+1}(1), \ldots, b^{\text{med}}_{t+1}(N-1)\big)$, and then computes $(M_{t+1}, \alpha_{t+1})$ based on the value of $\mathbf{b}^{\text{med}}_{t+1}$.
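Case 1: if $o_t = 0$, the Bayesian update yields
$$
b^{\text{med}}_{t+1}(n')
= \frac{\sum_{n \in \mathcal{N}} b^{\text{bd}}(n)\,\omega_t\big(0,(1,n),(1,n')\big)\,\beta_t\big((1,n),(1,n'),p\big)}{\chi_t(0,\mathbf{b}^{\text{bd}},p,1,1)}
=
\begin{cases}
\binom{M}{n'}\Big(\frac{\alpha-\alpha p}{1-\alpha p}\Big)^{n'}\Big(1-\frac{\alpha-\alpha p}{1-\alpha p}\Big)^{M-n'}, & \text{if } 0 \le n' \le M, \\
0, & \text{otherwise.}
\end{cases}
$$
We require $\mathbf{b}^{\text{bd}}_{t+1}$ to directly take the value of $\mathbf{b}^{\text{med}}_{t+1}$, and set
$(M_{t+1}, \alpha_{t+1}) = \big(M, \frac{\alpha-\alpha p}{1-\alpha p}\big)$.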
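Case 2: if $o_t = 1$, the Bayesian update yields
$$
b^{\text{med}}_{t+1}(n')
= \frac{\sum_{n \in \mathcal{N}} b^{\text{bd}}(n)\,\omega_t\big(1,(1,n),(1,n')\big)\,\beta_t\big((1,n),(1,n'),p\big)}{\chi_t(1,\mathbf{b}^{\text{bd}},p,1,1)}
=
\begin{cases}
\dfrac{1}{1-(1-\alpha p)^{M}} \cdot \bigg[\binom{M}{n'}\big(\alpha(1-p)\big)^{n'}\big(1-\alpha(1-p)\big)^{M-n'} \\
\qquad - (1-\alpha p)^{M}\binom{M}{n'}\Big(\frac{\alpha-\alpha p}{1-\alpha p}\Big)^{n'}\Big(1-\frac{\alpha-\alpha p}{1-\alpha p}\Big)^{M-n'}\bigg], & \text{if } 0 \le n' \le M-1, \\
0, & \text{otherwise.}
\end{cases}
$$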
Such a Bayesian update does not yield a distribution in the form (14). However, we modify the value of $\mathbf{b}^{\text{med}}_{t+1}$ to a distribution in the form (14) by keeping the mean of the distribution unchanged and considering that the number of active nodes will be reduced by at least one due to a busy slot. So, when $M > 1$, we set
$$
(M_{t+1}, \alpha_{t+1}) = \bigg(M-1,\ \frac{M(\alpha-\alpha p)\big(1-(1-\alpha p)^{M-1}\big)}{(M-1)\big(1-(1-\alpha p)^{M}\big)}\bigg),
$$
and when $M = 1$, we adopt the convention that
$$
(M_{t+1}, \alpha_{t+1}) = (M-1, 1).
$$
The accuracy of this approximation will be examined via numerical results at the end of this section.

TABLE I
A COMPARISON BETWEEN THE REALIZATIONS OF $\mathbf{b}_t$ AND ITS APPROXIMATION $\mathbf{b}^{\text{bd}}_t$ WHEN EACH ACTIVE NODE ADOPTS $\pi^{\text{heu}}$ FOR $N = 10$, $\lambda = 0.8$, $D = 10$.

| $t$ | $o_t$ | | $b_t(0)$ | $b_t(1)$ | $b_t(2)$ | $b_t(3)$ | $b_t(4)$ | $b_t(5)$ | $b_t(6)$ | $b_t(7)$ | $b_t(8)$ | $b_t(9)$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | $\mathbf{b}_t$ | 0.000001 | 0.000018 | 0.000295 | 0.002753 | 0.016515 | 0.066060 | 0.176161 | 0.301990 | 0.301990 | 0.134218 |
| | | Approx. | 0.000001 | 0.000018 | 0.000295 | 0.002753 | 0.016515 | 0.066060 | 0.176161 | 0.301990 | 0.301990 | 0.134218 |
| 2 | 1 | $\mathbf{b}_t$ | 0.000001 | 0.000042 | 0.000583 | 0.004760 | 0.024988 | 0.087458 | 0.204068 | 0.306102 | 0.267839 | 0.104160 |
| | | Approx. | 0.000001 | 0.000042 | 0.000583 | 0.004760 | 0.024988 | 0.087458 | 0.204068 | 0.306102 | 0.267839 | 0.104160 |
| 3 | 1 | $\mathbf{b}_t$ | 0.000059 | 0.001098 | 0.009014 | 0.042646 | 0.127254 | 0.245406 | 0.298859 | 0.210235 | 0.065430 | 0 |
| | | Approx. | 0.000052 | 0.001004 | 0.008559 | 0.041692 | 0.126924 | 0.247294 | 0.301138 | 0.209545 | 0.063792 | 0 |
| 4 | 1 | $\mathbf{b}_t$ | 0.001086 | 0.012248 | 0.059916 | 0.164987 | 0.276437 | 0.282086 | 0.162465 | 0.040774 | 0 | 0 |
| | | Approx. | 0.000974 | 0.011537 | 0.058598 | 0.165343 | 0.279925 | 0.284347 | 0.160466 | 0.038810 | 0 | 0 |
| 5 | 1 | $\mathbf{b}_t$ | 0.010921 | 0.072058 | 0.201100 | 0.304173 | 0.263268 | 0.123764 | 0.024716 | 0 | 0 | 0 |
| | | Approx. | 0.010329 | 0.070827 | 0.202359 | 0.308353 | 0.264299 | 0.120821 | 0.023013 | 0 | 0 | 0 |
| 6 | 0 | $\mathbf{b}_t$ | 0.068102 | 0.238724 | 0.340491 | 0.247285 | 0.091556 | 0.013842 | 0 | 0 | 0 | 0 |
| | | Approx. | 0.067210 | 0.240606 | 0.344541 | 0.246686 | 0.088312 | 0.012646 | 0 | 0 | 0 | 0 |
| 7 | 0 | $\mathbf{b}_t$ | 0.169904 | 0.357679 | 0.306377 | 0.133629 | 0.029713 | 0.002698 | 0 | 0 | 0 | 0 |
| | | Approx. | 0.167239 | 0.359554 | 0.309208 | 0.132956 | 0.028585 | 0.002458 | 0 | 0 | 0 | 0 |
| 8 | 1 | $\mathbf{b}_t$ | 0.421334 | 0.395352 | 0.150943 | 0.029344 | 0.002908 | 0.000118 | 0 | 0 | 0 | 0 |
| | | Approx. | 0.416144 | 0.398784 | 0.152859 | 0.029297 | 0.002807 | 0.000108 | 0 | 0 | 0 | 0 |
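The update of $(M_t, \alpha_t)$ described in this subsection can be sketched as follows. The function names `update_params` and `exact_busy_posterior` are illustrative assumptions rather than part of the original scheme; the second helper evaluates the Case 2 posterior directly so that one can check that the busy-slot rule keeps its mean.

```python
from math import comb


def binomial_pmf(M: int, q: float, n: int) -> float:
    """Binomial(M, q) probability of n, as in the form of Eq. (14)."""
    return comb(M, n) * q**n * (1 - q) ** (M - n) if 0 <= n <= M else 0.0


def exact_busy_posterior(M: int, alpha: float, p: float) -> list:
    """Exact b^med after a busy slot (Case 2), indexed by n' = 0..M."""
    idle_prob = (1 - alpha * p) ** M
    thinned = (alpha - alpha * p) / (1 - alpha * p)
    return [
        max(0.0, binomial_pmf(M, alpha * (1 - p), n)
            - idle_prob * binomial_pmf(M, thinned, n)) / (1 - idle_prob)
        for n in range(M + 1)
    ]


def update_params(M: int, alpha: float, p: float, busy: bool):
    """Return (M_{t+1}, alpha_{t+1}) from (M, alpha), p_t = p and o_t."""
    if alpha * p >= 1:
        raise ValueError("alpha * p must be below 1")
    if not busy:  # o_t = 0: exact Bayesian update (Case 1)
        return M, (alpha - alpha * p) / (1 - alpha * p)
    if M < 1 or alpha * p <= 0:
        raise ValueError("a busy slot has zero probability here")
    if M == 1:  # convention stated in the text
        return 0, 1.0
    q = 1 - alpha * p
    # o_t = 1: mean-preserving projection onto the form of Eq. (14) (Case 2)
    new_alpha = M * (alpha - alpha * p) * (1 - q ** (M - 1)) / ((M - 1) * (1 - q**M))
    return M - 1, new_alpha
```

As a usage note, for any valid input with `M > 1` the product `M_next * alpha_next` returned for a busy slot should equal the mean of `exact_busy_posterior(M, alpha, p)` up to rounding, which is exactly the mean-preservation property used above.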
C. A heuristic scheme
With the investigations in Sections V-A and V-B together, we are ready to propose a heuristic but very simple control scheme for the realistic environment, $\pi^{\text{heu}}$.
At the beginning of each slot $t \in \mathcal{T}$, given the parameter vector of belief approximation $(M_t, \alpha_t) = (M, \alpha)$, each active node uses $M\alpha$ to estimate the mean of $n_t$, and further uses the following simple rule $\pi^{\text{heu}}_t(M, \alpha)$ to determine the value of transmission probability $p_t$.
1) If $M\alpha + 1 > D - t + 1$ or $t = D$, set $\pi^{\text{heu}}_t(M, \alpha)$ to maximize the expected instantaneous throughput, i.e.,
$$
\pi^{\text{heu}}_t(M, \alpha) \in \arg\max_{p \in [0,1]} \sum_{n \in \mathcal{N}} b^{\text{bd}}(n)\, r_t\big((1,n), p\big)
= \min\Big(\frac{1}{M\alpha + \alpha}, 1\Big). \tag{15}
$$
The proof of Eq. (15) is provided in Appendix D.
2) Otherwise, set
$$
\pi^{\text{heu}}_t(M, \alpha) = \frac{1}{D - t + 1}.
$$
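The two-branch rule of $\pi^{\text{heu}}$ can be sketched in a few lines. The function name `heuristic_tx_prob` is an illustrative assumption, and slots are assumed to be numbered $1, \ldots, D$; the guard for a non-positive denominator is also an assumption added so that the sketch never divides by zero.

```python
def heuristic_tx_prob(M: int, alpha: float, t: int, D: int) -> float:
    """Transmission probability chosen by the heuristic rule in slot t.

    (M, alpha) is the binomial belief parameter vector; names are hypothetical.
    """
    if not 1 <= t <= D:
        raise ValueError("expected 1 <= t <= D")
    if M * alpha + 1 > D - t + 1 or t == D:
        # Rule 1: maximize expected instantaneous throughput, Eq. (15).
        denom = M * alpha + alpha
        return 1.0 if denom <= 1.0 else 1.0 / denom
    # Rule 2: spread transmissions evenly over the remaining slots.
    return 1.0 / (D - t + 1)
```

In a full per-slot loop, a node would call this function with its current $(M_t, \alpha_t)$, decide whether to transmit with the returned probability, and then update $(M_t, \alpha_t)$ from the observation $o_t$ as in Section V-B.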
Table I compares the realizations of $\mathbf{b}_t$ and its approximation $\mathbf{b}^{\text{bd}}_t$ when each active node adopts $\pi^{\text{heu}}$ for $N = 10$, $\lambda = 0.8$, $D = 10$, and verifies that the proposed approximation is reasonable.
VI. NUMERICAL EVALUATION
In this section, we present numerical results to compare the TDR performance of an optimal control scheme for the idealized environment $\hat{\pi}^*$, the proposed heuristic scheme for the realistic environment $\pi^{\text{heu}}$, and an optimal static scheme for the realistic environment $\pi^{\text{sta}}$. Here, $\pi^{\text{sta}}$ requires each active node to always adopt a static and identical transmission probability, and can be obtained using single-variable optimization methods. Such comparisons are helpful not only to demonstrate the performance loss due to the incomplete knowledge of the value of $n_t$, but also to demonstrate the benefit of dynamically adjusting the transmission probability.
Fig. 7. The TDR as a function of the packet arrival rate $\lambda$ for $N = 50$, $D = 10, 20$, $\sigma = 0.9$.
The scenarios considered in the numerical experiments are in accordance with the system model specified in Section II. We shall vary the network configuration over a wide range to investigate the impact of control scheme design on the TDR performance. Each numerical result is obtained from $10^7$ independent numerical experiments.
Fig. 7 shows the TDR performance as a function of the packet arrival rate $\lambda$ for $N = 50$, $D = 10, 20$, $\sigma = 0.9$. We observe that $\pi^{\text{heu}}$ performs close to $\hat{\pi}^*$: 3.07%–8.28% loss when $D = 10$ and 0.60%–4.47% loss when $D = 20$. This indicates that the design of $\pi^{\text{heu}}$ is reasonable, and the incomplete knowledge of the actual value of $n_t$ has a minor impact on the TDR performance. We further observe that $\pi^{\text{heu}}$ significantly outperforms $\pi^{\text{sta}}$: 1.84%–17.12% improvement when $D = 10$ and 11.11%–19.40% improvement when $D = 20$. The reason is that $\pi^{\text{sta}}$ does not adjust the transmission probability according to the current delivery urgency and contention intensity. Meanwhile, it is interesting to note that $\pi^{\text{sta}}$ performs closer to the other schemes as $\lambda$ increases. This is because the optimal transmission probabilities for different values of $t$ and $n_t$ become closer as the value of $n_1$ grows, as indicated by Lemma 1.
Fig. 8. The TDR as a function of the delivery deadline $D$ (slots) for $N = 50$, $\lambda = 0.25$, $\sigma = 0.8, 1$.
The observations in Fig. 7 are confirmed again by Figs. 8–9, which show the TDR performance as a function of the delivery deadline $D$ (slots) and the TDR performance as a function of the packet success rate $\sigma$, respectively. In Fig. 8, we observe that $\pi^{\text{heu}}$ performs significantly better than $\pi^{\text{sta}}$: 6.45%–17.06% improvement when $\sigma = 0.8$ and 6.30%–16.74% improvement when $\sigma = 1$, and performs close to $\hat{\pi}^*$: 3.12%–6.68% loss when $\sigma = 0.8$ and 3.45%–6.83% loss when $\sigma = 1$. In Fig. 9, we observe that $\pi^{\text{heu}}$ performs significantly better than $\pi^{\text{sta}}$: 18.51%–19.33% improvement when $\lambda = 0.1$ and 5.58%–5.81% improvement when $\lambda = 0.4$, and performs close to $\hat{\pi}^*$: 0.55%–0.87% loss when $\lambda = 0.1$ and 4.11%–4.41% loss when $\lambda = 0.4$. It is also shown that $\pi^{\text{sta}}$ performs close to the other transmission schemes as $\frac{N\lambda}{D}$ becomes larger.
Fig. 9. The TDR as a function of the packet success rate $\sigma$ for $N = 50$, $\lambda = 0.1, 0.4$, $D = 15$.
VII. CONCLUSION
In this paper, under the idealized and realistic environments, optimal dynamic control schemes for random access in deadline-constrained broadcasting with frame-synchronized traffic have been investigated based on the theories of MDP and POMDP, respectively. A novel feature of this work is to require each active node to determine the current transmission probability not only according to the knowledge of current contention intensity, but also according to the current delivery urgency. The proposed heuristic scheme for the realistic environment achieves a threefold goal: it can be implemented without imposing extra overhead and hardware cost, it has very low computational complexity, and it achieves a TDR close to the maximum achievable TDR in the idealized environment. An interesting and important future research direction is to optimize deadline-constrained broadcasting under general traffic patterns.
APPENDIX A
PROOF OF LEMMA 1
Assume each collision involves at most a finite number, $k \ge 2$, of packets. We begin with the case $t = D$. By Eq. (2), we know $U_D\big((1,n),p\big) = \sigma p(1-p)^n$ and thus $\hat{\pi}^*_D(n) = \frac{1}{n+1}$. As $k$ and $D$ are both finite, for each $n_D = n \in \{m - k(D-1), m - k(D-1) + 1, \ldots, m\}$, we obtain that $m \to \infty$ implies $n \to \infty$ and then
$$
\lim_{m\to\infty}(n+1)\,U^*_D(1,n) = \lim_{m\to\infty}(n+1)\,U_D\Big((1,n),\frac{1}{n+1}\Big) = \frac{\sigma}{e}. \tag{16}
$$
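The limit in Eq. (16) is easy to check numerically. The snippet below is only an illustrative sketch (the function name `scaled_value_at_deadline` is an assumption): it evaluates $(n+1)\,U_D\big((1,n),\frac{1}{n+1}\big) = \sigma\big(1-\frac{1}{n+1}\big)^n$ for growing $n$ and compares it with $\sigma/e$.

```python
import math


def scaled_value_at_deadline(n: int, sigma: float) -> float:
    """(n + 1) * U_D((1, n), p) evaluated at p = 1 / (n + 1)."""
    p = 1.0 / (n + 1)
    return (n + 1) * sigma * p * (1.0 - p) ** n


if __name__ == "__main__":
    sigma = 0.9  # example packet success rate
    for n in (10, 100, 10_000, 1_000_000):
        gap = scaled_value_at_deadline(n, sigma) - sigma / math.e
        print(f"n = {n:>9}: gap to sigma/e = {gap:.3e}")
```

The printed gap shrinks roughly in proportion to $1/n$, in line with the limit stated in Eq. (16).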
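Next, we consider the case $t = D-1$. By the finite-horizon policy evaluation algorithm [20] and Eqs. (1), (2), for each $n_{D-1} = n \in \{m - k(D-2), m - k(D-2) + 1, \ldots, m\}$, we have
$$
\begin{aligned}
(n+1)\,U_{D-1}\big((1,n),p\big)
&= (n+1)\,r_{D-1}\big((1,n),p\big) + (n+1)\sum_{n' \in \mathcal{N}} \beta_{D-1}\big((1,n),(1,n'),p\big)\,U^*_D(1,n') \\
&= \sigma(n+1)p(1-p)^n + \sum_{n' \in \mathcal{N}} (n+1)\frac{n!}{n'!\,(n-n')!}\,p^{n-n'}(1-p)^{n'+1}\,U^*_D(1,n') \\
&= \sigma(n+1)p(1-p)^n + \sum_{n' \in \mathcal{N}} \binom{n+1}{n-n'}p^{n-n'}(1-p)^{n'+1}(n'+1)\,U^*_D(1,n').
\end{aligned}
$$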
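By assuming each collision involves at most a finite number, $k \ge 2$, of packets, we have
$$
\begin{aligned}
(n+1)\,U_{D-1}\big((1,n),p\big)
&= \sigma(n+1)p(1-p)^n + \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'}p^{n-n'}(1-p)^{n'+1}(n'+1)\,U^*_D(1,n') \\
&\quad + \bigg(1 - \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'}p^{n-n'}(1-p)^{n'+1}\bigg)\cdot(n-k+1)\,U^*_D(1,n-k) \qquad (17) \\
&\le \sigma\Big(1-\frac{1}{n+1}\Big)^{n} + (n-k+1)\,U^*_D(1,n-k) \\
&\quad + \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'}p^{n-n'}(1-p)^{n'+1}\cdot\Big((n'+1)\,U^*_D(1,n') - (n-k+1)\,U^*_D(1,n-k)\Big). \qquad (18)
\end{aligned}
$$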
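For each $n' \in \{n-k+1, n-k+2, \ldots, n\}$, since $0 \le \binom{n+1}{n-n'}p^{n-n'}(1-p)^{n'+1} \le 1$, by applying the squeeze theorem, we obtain from Eq. (16) that
$$
\lim_{m\to\infty} \binom{n+1}{n-n'}p^{n-n'}(1-p)^{n'+1}\cdot\Big((n'+1)\,U^*_D(1,n') - (n-k+1)\,U^*_D(1,n-k)\Big) = 0. \tag{19}
$$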
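By Eqs. (16), (19) and inequality (18), as $k$ and $D$ are both finite, we further obtain that $m \to \infty$ implies $n \to \infty$ and then
$$
\begin{aligned}
\limsup_{m\to\infty}\,(n+1)\,U_{D-1}\big((1,n),p\big)
&\le \limsup_{m\to\infty}\bigg[\sigma\Big(1-\frac{1}{n+1}\Big)^{n} + (n-k+1)\,U^*_D(1,n-k)\bigg] \\
&= \lim_{m\to\infty}\bigg[\sigma\Big(1-\frac{1}{n+1}\Big)^{n} + (n-k+1)\,U^*_D(1,n-k)\bigg]
= \frac{2\sigma}{e},
\end{aligned}
$$
which implies
$$
\limsup_{m\to\infty}\,(n+1)\,U^*_{D-1}(1,n) \le \frac{2\sigma}{e}. \tag{20}
$$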
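By setting $\hat{\pi}_{D-1}(n) = \frac{1}{n+1}$ for each $n_{D-1} = n \in \{m - k(D-2), m - k(D-2) + 1, \ldots, m\}$, as $k$ and $D$ are both finite, we obtain that $m \to \infty$ implies $n \to \infty$, and then obtain from Eqs. (16), (17) and (19) that
$$
\lim_{m\to\infty}(n+1)\,U_{D-1}\Big((1,n),\frac{1}{n+1}\Big)
= \lim_{m\to\infty}\bigg[\sigma\Big(1-\frac{1}{n+1}\Big)^{n} + (n-k+1)\,U^*_D(1,n-k)\bigg]
= \frac{2\sigma}{e}.
$$
Since $U^*_{D-1}(1,n) \ge U_{D-1}\big((1,n),\frac{1}{n+1}\big)$, we have
$$
\liminf_{m\to\infty}\,(n+1)\,U^*_{D-1}(1,n) \ge \liminf_{m\to\infty}\,(n+1)\,U_{D-1}\Big((1,n),\frac{1}{n+1}\Big) = \frac{2\sigma}{e}. \tag{21}
$$
Combining inequalities (20) and (21), we have $\lim_{m\to\infty}(n+1)\,U^*_{D-1}(1,n) = \frac{2\sigma}{e}$.
For the cases $t = D-2, D-3, \ldots, 1$, iteratively repeating the above argument leads to Eqs. (6) and (7) for each possible $n_t = n$.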
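APPENDIX B
PROOF OF LEMMA 2
As $U^*_t(1,0) = \sigma$ for each $t \in \mathcal{T}$, we have
$$
U_t\big((1,1),p\big) = 2\sigma p(1-p) + (1-p)^2\,U^*_{t+1}(1,1), \tag{22}
$$
for each $t \in \mathcal{T} \setminus \{D\}$. Taking the derivative of $U_t\big((1,1),p\big)$ with respect to $p$ yields
$$
\frac{d}{dp}U_t\big((1,1),p\big) = 2\sigma - 2U^*_{t+1}(1,1) - \big(4\sigma - 2U^*_{t+1}(1,1)\big)p.
$$
As $\sigma > 0$ and $U^*_t(1,1) \le \sigma$ for each $t \in \mathcal{T} \setminus \{D\}$, we have
$$
\hat{\pi}^*_t(1) = \frac{\sigma - U^*_{t+1}(1,1)}{2\sigma - U^*_{t+1}(1,1)}, \tag{23}
$$
for each $t \in \mathcal{T} \setminus \{D\}$. In particular, as $U^*_D(1,1) = \sigma/4$, we obtain $\hat{\pi}^*_{D-1}(1) = 3/7$, which satisfies Eq. (9).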
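Then, we aim to investigate the relation between $\hat{\pi}^*_t(1)$ and $\hat{\pi}^*_{D-1}(1)$ for each $t \in \mathcal{T} \setminus \{D-1, D\}$. By setting $p = \hat{\pi}^*_t(1)$ in Eq. (22), we obtain
$$
U^*_t(1,1) = 2\sigma\hat{\pi}^*_t(1)\big(1-\hat{\pi}^*_t(1)\big) + \big(1-\hat{\pi}^*_t(1)\big)^2\,U^*_{t+1}(1,1)
= \frac{\sigma^2}{2\sigma - U^*_{t+1}(1,1)}. \tag{24}
$$
Using Eq. (23) to express $U^*_{t+1}(1,1)$ and $U^*_t(1,1)$ in Eq. (24) in terms of $\hat{\pi}^*_t(1)$ and $\hat{\pi}^*_{t-1}(1)$, respectively, we have
$$
\hat{\pi}^*_t(1) = \frac{\hat{\pi}^*_{t+1}(1)}{1 + \hat{\pi}^*_{t+1}(1)}, \tag{25}
$$
for each $t \in \mathcal{T} \setminus \{D-1, D\}$. Furthermore, recursively using Eq. (25) yields
$$
\hat{\pi}^*_t(1) = \frac{\hat{\pi}^*_{D-1}(1)}{1 + (D-t-1)\,\hat{\pi}^*_{D-1}(1)} \tag{26}
$$
and thus implies Eq. (9) by $\hat{\pi}^*_{D-1}(1) = 3/7$.
Finally, combining Eqs. (9) and (23) yields
$$
U^*_t(1,1) = \frac{1 - 2\hat{\pi}^*_{t-1}(1)}{1 - \hat{\pi}^*_{t-1}(1)}\,\sigma = \frac{3D - 3t + 1}{3D - 3t + 4}\,\sigma, \tag{27}
$$
for each $t \in \mathcal{T} \setminus \{1\}$, and substituting Eq. (27) into Eq. (24) yields $U^*_1(1,1) = \frac{3D-2}{3D+1}\,\sigma$. Hence we complete the proof for Eq. (8).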
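The recursion in Eqs. (23) and (24) and the closed forms in Eqs. (26) and (27) can be cross-checked with exact rational arithmetic. The sketch below is illustrative only: the function name and the dictionary layout are assumptions, and the terminal values $\hat{\pi}^*_D(1) = 1/2$ and $U^*_D(1,1) = \sigma/4$ are taken from the text.

```python
from fractions import Fraction


def two_node_backward_induction(D: int, sigma: Fraction):
    """Backward induction for one other active node (n = 1).

    Returns (pi, U) as dicts keyed by slot t = 1..D.
    """
    pi = {D: Fraction(1, 2)}  # optimal probability at the deadline slot
    U = {D: sigma / 4}        # U*_D(1, 1) = sigma / 4
    for t in range(D - 1, 0, -1):
        nxt = U[t + 1]
        pi[t] = (sigma - nxt) / (2 * sigma - nxt)  # Eq. (23)
        U[t] = sigma**2 / (2 * sigma - nxt)        # Eq. (24)
    return pi, U


if __name__ == "__main__":
    D, sigma = 10, Fraction(9, 10)
    pi, U = two_node_backward_induction(D, sigma)
    for t in range(1, D):
        # Eq. (26) with pi*_{D-1}(1) = 3/7
        closed = Fraction(3, 7) / (1 + (D - t - 1) * Fraction(3, 7))
        assert pi[t] == closed
    for t in range(1, D + 1):
        assert U[t] == Fraction(3 * D - 3 * t + 1, 3 * D - 3 * t + 4) * sigma
    print("closed forms match for D =", D)
```

Because `Fraction` keeps every intermediate value exact, the equality checks do not rely on floating-point tolerances.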
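APPENDIX C
PROOF OF LEMMA 3
We shall prove $U^{\text{eve}}_t(1,n) = \sigma\big(1 - \frac{1}{D-t+1}\big)^n$ for each $n \in \mathcal{N}$ by induction from $t = D-1$ down to $1$.
First, when $t = D-1$, by Eqs. (1), (2) and (11), we have
$$
\begin{aligned}
U^{\text{eve}}_{D-1}(1,n)
&= r_{D-1}\big((1,n),\hat{\pi}^{\text{eve}}_{D-1}(n)\big) + \sum_{n' \in \mathcal{N}} \beta_{D-1}\big((1,n),(1,n'),\hat{\pi}^{\text{eve}}_{D-1}(n)\big)\,U^{\text{eve}}_D(1,n') \\
&= \sigma\,\frac{1}{2}\Big(1-\frac{1}{2}\Big)^{n} + \Big(1-\frac{1}{2}\Big)^{n+1}U^{\text{eve}}_D(1,0) \\
&= \sigma\Big(1-\frac{1}{2}\Big)^{n},
\end{aligned}
$$
for each $n \in \mathcal{N}$, thereby establishing the induction basis.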
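Next, when $t \in \mathcal{T} \setminus \{D-1, D\}$, we assume $U^{\text{eve}}_{t+1}(1,n) = \sigma\big(1 - \frac{1}{D-t}\big)^n$ for each $n \in \mathcal{N}$. By Eqs. (1), (2) and (11), we have
$$
\begin{aligned}
U^{\text{eve}}_t(1,n)
&= r_t\big((1,n),\hat{\pi}^{\text{eve}}_t(n)\big) + \sum_{n' \in \mathcal{N}} \beta_t\big((1,n),(1,n'),\hat{\pi}^{\text{eve}}_t(n)\big)\,U^{\text{eve}}_{t+1}(1,n') \\
&= \sigma\,\frac{1}{D-t+1}\Big(1-\frac{1}{D-t+1}\Big)^{n} + \sum_{n' \in \mathcal{N}} \binom{n}{n'}\Big(\frac{1}{D-t+1}\Big)^{n-n'}\Big(1-\frac{1}{D-t+1}\Big)^{n'+1}\sigma\Big(1-\frac{1}{D-t}\Big)^{n'} \\
&= \sigma\Big(1-\frac{1}{D-t+1}\Big)^{n}\frac{1}{D-t+1} + \sigma\Big(1-\frac{1}{D-t+1}\Big)^{n}\frac{D-t}{D-t+1}\cdot\sum_{n' \in \mathcal{N}} \binom{n}{n'}\Big(\frac{1}{D-t}\Big)^{n-n'}\Big(1-\frac{1}{D-t}\Big)^{n'} \\
&= \sigma\Big(1-\frac{1}{D-t+1}\Big)^{n},
\end{aligned}
$$
for each $n \in \mathcal{N}$. So, the inductive step is established.
Since both the base case and the inductive step have been proved, we have $U^{\text{eve}}_t(1,n) = \sigma\big(1 - \frac{1}{D-t+1}\big)^n$ for each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$.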
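The induction can also be checked numerically by running finite-horizon policy evaluation for the even-spreading policy $p_t = \frac{1}{D-t+1}$. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the reward $\sigma p(1-p)^n$ and the transition weights $\binom{n}{n'}p^{n-n'}(1-p)^{n'+1}$ are read off from the derivations in Appendices A and C, and the terminal values follow from using $p = 1$ in slot $D$.

```python
from fractions import Fraction
from math import comb


def evaluate_even_policy(D: int, N: int, sigma: Fraction):
    """Policy evaluation under p_t = 1 / (D - t + 1).

    Returns U with U[t][n] for t = 1..D and n = 0..N-1 other active nodes.
    """
    # In slot D the policy transmits with probability 1, so the reward
    # sigma * 1 * (1 - 1)**n is sigma for n = 0 and 0 otherwise.
    U = {D: [sigma if n == 0 else Fraction(0) for n in range(N)]}
    for t in range(D - 1, 0, -1):
        q = Fraction(1, D - t + 1)
        row = []
        for n in range(N):
            reward = sigma * q * (1 - q) ** n
            future = sum(
                comb(n, k) * q ** (n - k) * (1 - q) ** (k + 1) * U[t + 1][k]
                for k in range(n + 1)
            )
            row.append(reward + future)
        U[t] = row
    return U


if __name__ == "__main__":
    D, N, sigma = 8, 6, Fraction(9, 10)
    U = evaluate_even_policy(D, N, sigma)
    for t in range(1, D):
        for n in range(N):
            assert U[t][n] == sigma * (1 - Fraction(1, D - t + 1)) ** n
    print("Lemma 3 closed form reproduced for D =", D, "and N =", N)
```

Exact fractions are used so that the comparison with the closed form is an equality test rather than a tolerance check.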
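APPENDIX D
PROOF OF EQ. (15)
Letting $f\big((M,\alpha),p\big) \triangleq \frac{(M+1)\alpha}{\sigma}\sum_{n \in \mathcal{N}} b^{\text{bd}}(n)\,r_t\big((1,n),p\big)$ for $p \in [0,1]$ and $c_i = \binom{M+1}{i}\alpha^{i}(1-\alpha)^{M+1-i}$ for $1 \le i \le M+1$, we have
$$
f\big((M,\alpha),p\big) = \sum_{i=1}^{M+1} i\,c_i\,p(1-p)^{i-1}.
$$
The derivative of $f\big((M,\alpha),p\big)$ with respect to $p$ is given by
$$
\begin{aligned}
\frac{d}{dp}f\big((M,\alpha),p\big)
&= \sum_{i=1}^{M+1} i\,c_i(1-p)^{i-1} - \sum_{i=2}^{M+1} i(i-1)\,c_i\,p(1-p)^{i-2} \\
&= (M+1)\alpha + (M+1)^2\alpha^{M+1}(-p)^{M} + \sum_{j=1}^{M-1}\beta_j\,p^{j},
\end{aligned}
\tag{28}
$$
where $\beta_j$, $1 \le j \le M-1$, is derived as follows: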
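$$
\begin{aligned}
\beta_j
&= (-1)^{j}\sum_{k=1}^{M+1-j}\binom{M+1}{j+k}\alpha^{j+k}(1-\alpha)^{M+1-j-k}\cdot(j+k)\bigg[\binom{j+k-1}{j} + (j+k-1)\binom{j+k-2}{j-1}\bigg] \\
&= (-1)^{j}(j+1)^2\alpha^{j+1}\cdot\sum_{k=1}^{M+1-j}\binom{j+k}{k-1}\binom{M+1}{j+k}\alpha^{k-1}(1-\alpha)^{M+1-j-k} \\
&= (-1)^{j}(j+1)^2\alpha^{j+1}\binom{M+1}{j+1}\cdot\sum_{k=1}^{M+1-j}\binom{M-j}{k-1}\alpha^{k-1}(1-\alpha)^{M-j-k+1} \\
&= (-1)^{j}\alpha^{j+1}\cdot\Big[j(M+1)^2 + (M+1)(M-j)\Big]\frac{(M-1)!}{(M-j)!\,j!} \\
&= (-1)^{j}\alpha^{j+1}\cdot\bigg[(M+1)^2\binom{M-1}{j-1} + (M+1)\binom{M-1}{j}\bigg].
\end{aligned}
\tag{29}
$$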
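Combining Eqs. (28) and (29), we have
$$
\begin{aligned}
\frac{d}{dp}f\big((M,\alpha),p\big)
&= (M+1)\alpha + (M+1)^2\alpha^{M+1}(-p)^{M} + \sum_{j=1}^{M-1}\bigg[(M+1)^2\binom{M-1}{j-1} + (M+1)\binom{M-1}{j}\bigg]\cdot\alpha^{j+1}(-p)^{j} \\
&= (M+1)\alpha\big(1-(M+1)\alpha p\big) + (M+1)\alpha\big(1-(M+1)\alpha p\big)(-\alpha p)^{M-1} \\
&\quad + \sum_{j=1}^{M-2}\binom{M-1}{j}(M+1)\alpha\big(1-(M+1)\alpha p\big)(-\alpha p)^{j} \\
&= (M+1)\alpha\big(1-(M+1)\alpha p\big)(1-\alpha p)^{M-1}.
\end{aligned}
\tag{30}
$$
From Eq. (30), for $p \in [0,1]$, we obtain that $f\big((M,\alpha),p\big) \le f\big((M,\alpha),\frac{1}{M\alpha+\alpha}\big)$ when $\frac{1}{M\alpha+\alpha} \le 1$, and $f\big((M,\alpha),p\big) \le f\big((M,\alpha),1\big)$ when $\frac{1}{M\alpha+\alpha} > 1$.
Hence we complete the proof for Eq. (15).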
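As a sanity check on this derivation, the series form of $f$, the derivative in Eq. (30) and the maximizer in Eq. (15) can be compared numerically. This is a sketch under stated assumptions: all function names are hypothetical, $M \ge 1$ is assumed, and the closed form $f = (M+1)\alpha\,p\,(1-\alpha p)^{M}$ used in `f_closed` is not stated in the text but follows from applying the binomial theorem to the series.

```python
from math import comb


def f_series(M: int, alpha: float, p: float) -> float:
    """f((M, alpha), p) as the sum over i of i * c_i * p * (1 - p)**(i - 1)."""
    total = 0.0
    for i in range(1, M + 2):
        c_i = comb(M + 1, i) * alpha**i * (1 - alpha) ** (M + 1 - i)
        total += i * c_i * p * (1 - p) ** (i - 1)
    return total


def f_closed(M: int, alpha: float, p: float) -> float:
    """Closed form of f obtained via the binomial theorem (an added step)."""
    return (M + 1) * alpha * p * (1 - alpha * p) ** M


def df_closed(M: int, alpha: float, p: float) -> float:
    """Right-hand side of Eq. (30); assumes M >= 1."""
    return (M + 1) * alpha * (1 - (M + 1) * alpha * p) * (1 - alpha * p) ** (M - 1)


def best_p(M: int, alpha: float) -> float:
    """Maximizer stated in Eq. (15): min(1 / (M * alpha + alpha), 1)."""
    denom = (M + 1) * alpha
    return 1.0 if denom <= 1.0 else 1.0 / denom


if __name__ == "__main__":
    M, alpha, h = 9, 0.8, 1e-6
    grid = [k / 1000 for k in range(1001)]
    for p in grid[1:-1]:
        assert abs(f_series(M, alpha, p) - f_closed(M, alpha, p)) < 1e-12
        numeric = (f_series(M, alpha, p + h) - f_series(M, alpha, p - h)) / (2 * h)
        assert abs(numeric - df_closed(M, alpha, p)) < 1e-6
    print("grid maximizer:", max(grid, key=lambda p: f_series(M, alpha, p)))
    print("Eq. (15) value:", best_p(M, alpha))
```

The grid search is deliberately crude; it is only meant to confirm that the sign change of Eq. (30) occurs at the probability given by Eq. (15).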
ACKNOWLEDGEMENT
The authors would like to thank Dr. He Chen for helpful
suggestions and discussions.
REFERENCES
[1] D. Feng, C. She, K. Ying, L. Lai, Z. Hou, T. Q. S. Quek, Y. Li, and B. Vucetic, “Toward ultrareliable low-latency communications: Typical scenarios, possible solutions, and open issues,” IEEE Veh. Technol. Mag., vol. 14, no. 2, pp. 94–102, 2019.
[2] J. Gao, M. Li, L. Zhao, and X. Shen, “Contention intensity based distributed coordination for V2V safety message broadcast,” IEEE Trans. Veh. Technol., vol. 67, no. 12, pp. 12288–12301, 2018.
[3] M. Luvisotto, Z. Pang, and D. Dzung, “High-performance wireless networks for industrial control applications: New targets and feasibility,” Proc. IEEE, vol. 107, no. 6, pp. 1074–1093, 2019.
[4] Y. H. Bae, “Analysis of optimal random access for broadcasting with
no. 3, pp. 573–575, 2013.
[5] Y. H. Bae, “Random access scheme to improve broadcast reliability,” IEEE Commun. Lett., vol. 17, no. 7, pp. 1467–1470, 2013.
wireless networks,” IEEE Commun. Lett., vol. 19, no. 10, pp. 1782–1785, 2015.
[7] C. Campolo, A. Vinel, A. Molinaro, and Y. Koucheryavy, “Modeling
broadcasting in IEEE 802.11p/WAVE vehicular networks,” IEEE Com-
mun. Lett., vol. 15, no. 2, pp. 199–201, 2011.
[8] L. Deng, J. Deng, P. Chen, and Y. S. Han, “On the asymptotic perfor-
mance of delay-constrained slotted ALOHA,” in Proc. IEEE ICCCN,
2018, pp. 1–8.
[9] Y. Zhang, Y. Lo, F. Shu, and J. Li, “Achieving maximum reliability
in deadline-constrained random access with multiple-packet reception,”
IEEE Trans. Veh. Technol., vol. 68, no. 6, pp. 5997–6008, 2019.
[10] L. Deng, F. Liu, Y. Zhang, and W. S. Wong, “Delay-constrained
topology-transparent distributed scheduling for MANETs,” IEEE Trans.
Veh. Technol., vol. 70, no. 1, pp. 1083–1088, 2021.
[11] A. Segall, “Recursive estimation from discrete-time point processes,”
IEEE Trans. Inf. Theory, vol. 22, no. 4, pp. 422–431, 1976.
[12] R. Rivest, “Network control by Bayesian broadcast,” IEEE Trans. Inf. Theory, vol. IT-33, no. 3, pp. 323–328, 1987.
[13] G. del Angel and T. L. Fine, “Optimal power and retransmission control
policies for random access systems,” IEEE/ACM Trans. Netw., vol. 12,
no. 6, pp. 1156–1166, 2004.
[14] L. Bononi, M. Conti, and E. Gregori, “Runtime optimization of IEEE 802.11 wireless LANs performance,” IEEE Trans. Parallel Distrib. Syst., vol. 15, no. 1, pp. 66–80, 2004.
[15] H. Wu, C. Zhu, R. J. La, X. Liu, and Y. Zhang, “FASA: Accelerated S-
ALOHA using access history for event-driven M2M communications,”
IEEE/ACM Trans. Netw., vol. 21, no. 6, pp. 1904–1917, 2013.
[16] R. Smallwood and E. Sondik, “The optimal control of partially observable Markov processes over a finite horizon,” Oper. Res., vol. 21, no. 5, pp. 1071–1088, 1973.
[17] Y. Zhang, A. Gong, Y. Lo, J. Li, F. Shu, and W. S. Wong, “Generalized
p-persistent CSMA for asynchronous multiple-packet reception,” IEEE
Trans. Commun., vol. 67, no. 10, pp. 6966–6979, 2019.
[18] A. Biason, S. Dey, and M. Zorzi, “A decentralized optimization frame-
work for energy harvesting devices,” IEEE Trans. Mob. Comput., vol. 17,
no. 11, pp. 2483–2496, 2018.
[19] A 5G trafﬁc model for industrial use cases. White Paper, 5G Alliance
for Connected Industries and Automation, 2019.
[20] M. L. Puterman, Markov decision processes: Discrete stochastic dy-
namic programming. John Wiley & Sons, 2014.
[21] P. R. Kumar and P. Varaiya, Stochastic systems: Estimation, identiﬁca-
tion, and adaptive control. SIAM, 2015.
[22] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein, “The
complexity of decentralized control of Markov decision processes,”
Math. Oper. Res., vol. 27, no. 4, pp. 819–840, 2002.