Dynamic Optimization of Random Access in
Deadline-Constrained Broadcasting
Aoyu Gong, Yijin Zhang, Senior Member, IEEE, Lei Deng, Fang Liu, Jun Li, Senior Member, IEEE,
and Feng Shu, Member, IEEE
Abstract—This paper considers dynamic optimization of random access in deadline-constrained broadcasting with frame-synchronized traffic. Under the non-retransmission setting, we define a dynamic control scheme that allows each active node to determine the transmission probability based on the local knowledge of current delivery urgency and contention intensity (i.e., the number of active nodes). For an idealized environment where the contention intensity is completely known, we develop a Markov Decision Process (MDP) framework, by which an optimal scheme for maximizing the timely delivery ratio (TDR) can be explicitly obtained. For a realistic environment where the contention intensity is incompletely known, we develop a Partially Observable MDP (POMDP) framework, by which an optimal scheme can only in theory be found. To overcome the infeasibility in obtaining an optimal or near-optimal scheme from the POMDP framework, we investigate the behaviors of the optimal scheme for extreme cases in the MDP framework, and leverage intuition gained from these behaviors together with an approximation on the contention intensity knowledge to propose a heuristic scheme for the realistic environment with TDR close to the maximum TDR in the idealized environment. We further generalize the heuristic scheme to support retransmissions. Numerical results are provided to validate our study.
Index Terms—Distributed algorithms, dynamic optimization,
random access, delivery deadline
I. INTRODUCTION
A. Background
BROADCASTING is a fundamental operation in dis-
tributed wireless systems. With the explosive growth of
ultra-reliable low-latency services for the Internet of things
This work was supported in part by the National Natural Science Foundation
of China under Grants 62071236, U22A2002, 62071234, 61902256, in part
by the Major Science and Technology plan of Hainan Province under Grant
ZDKJ2021022, in part by the Scientific Research Fund Project of Hainan
University under Grant KYQD(ZR)-21008, in part by the Fundamental
Research Funds for the Central Universities of China (Nos. 30920021127,
30921013104), in part by Future Network Grant of Provincial Education
Board in Jiangsu, and in part by the Open Research Fund of State Key
Laboratory of Integrated Services Networks, Xidian University, under Grant
ISN22-14. (Corresponding author: Yijin Zhang.)
A. Gong, Y. Zhang, and J. Li are with the School of Electronic and
Optical Engineering, Nanjing University of Science and Technology, Nanjing
210094, China. Y. Zhang is also with the State Key Laboratory of Integrated
Services Networks, Xidian University, Xi'an 710071, China. E-mail: {gongaoyu;
yijin.zhang}@gmail.com, jun.li@njust.edu.cn.
L. Deng is with the College of Electronics and Information Engineering,
Shenzhen University, Shenzhen 518060, China. E-mail: ldeng@szu.edu.cn.
F. Liu is with the Department of Information Engineering, The Chi-
nese University of Hong Kong, Shatin, N. T., Hong Kong. E-mail:
lf015@ie.cuhk.edu.hk.
F. Shu is with the School of Information and Communication Engineer-
ing, Hainan University, Haikou 570228, China, and also with the School
of Electronic and Optical Engineering, Nanjing University of Science and
Technology, Nanjing, 210094, China. E-mail: shufeng0101@163.com.
(IoT) [1]–[4], such as detection information sharing in un-
manned aerial vehicles (UAV) networks, safety message dis-
semination in vehicular networks, and industrial control in fac-
tory automation, deadline-constrained broadcasting has become a research focus in recent years. For such broadcasting, each packet needs to be transmitted within a strict delivery
deadline since its arrival and will be discarded if the deadline
expires. Hence, timely delivery ratio (TDR), defined as the
probability that a broadcast packet is successfully delivered
to an arbitrary intended receiver within the given delivery
deadline, is considered as a critical metric to evaluate the
performance of such broadcasting. Note that such broadcasting
is important for agents in these applications to obtain as much timely information about the world as possible for making decisions that best achieve their application objectives.
For example, in UAV networks for collaborative multitarget
tracking [5], due to the limited detection capability, each UAV
is only able to detect targets situated within a certain area, and
has to use the detection information shared by other UAVs to
decide on the optimal path to follow in order to cover as many
targets as possible.
A canonical deadline-constrained broadcasting scenario is
that, under a given traffic pattern, an uncertain set of nodes
with new or backlogged packets attempt to transmit before
deadline expiration without centralized scheduling. In this
scenario, it is expected that the MAC layer behaves differently
from what is commonly believed in conventional deadline-
unconstrained protocols, as the contention intensity would be
jointly determined by the traffic pattern, delivery deadline, and
retransmission setting. So, random access mechanisms tailored for this scenario are needed to support efficient channel sharing under deadline constraints, and careful design of access parameters is needed to maximize the TDR.
B. Related Work and Motivation
Many recent studies [6]–[9] have been dedicated to this issue by assuming no retransmissions, which is commonly adopted in broadcasting due to the lack of acknowledgments or for the sake of energy efficiency. Under saturated traffic,
Bae [6], [7] obtained the optimal slotted-ALOHA for broad-
casting single-slot packets and optimal p-persistent CSMA for
broadcasting multi-slot packets, respectively. Under a discrete-
time Geo/Geo/1 queue model, Bae [8] obtained the optimal
slotted-ALOHA for broadcasting single-slot packets. Under
frame-synchronized traffic, Campolo et al. [9] proposed an
analytical model for using IEEE 802.11p CSMA/CA to broad-
cast multi-slot packets, which can be used to obtain the optimal
contention window size. To improve the TDR with the help
of retransmissions, Hassan et al. [10] investigated the impact
of retransmissions on the TDR of IEEE 802.11p CSMA/CA
under frame-synchronized traffic, and Bae [11] obtained the
optimal slotted-ALOHA with retransmissions under saturated
traffic. However, [6]–[8], [11] adopted a static transmission
probability and [9], [10] adopted a static contention window
size, thus inevitably limiting the maximum achievable TDR.
Other studies on deadline-constrained random access in-
clude [12]–[16] for uplink to a common receiver. Under
Bernoulli traffic, Bae [12] derived the optimal static transmis-
sion probability for maximizing the TDR based on stationary
Markov chain modeling. Under frame-synchronized traffic,
Deng et al. [13] developed an algorithm to recursively analyze
the timely throughput for any static transmission probability and characterized the asymptotic behavior of the optimal one. However, [12], [13] still restrict their attention to static access
parameters. Using absorbing Markov chain modeling, Bae
et al. [14] proposed to myopically change the transmission
probability when the contention intensity is completely known,
which, however, did not account for dynamic programming
optimality. Zhao et al. [15] proposed to simply double or
halve the transmission probability based on the channel feedback,
which is easily implemented but lacks an explicit optimization
goal. Zhang et al. [16] proposed to adjust the transmission
probability for maximizing the TDR by a joint use of the
fixed point iteration and a recursive estimator, but did not
utilize all of the observed data. Another type of deadline-
constrained access is based on the sequence design [17], [18],
where each active node deterministically decides whether to
transmit according to the assigned sequence but utilizes no
observation to adjust its access behavior.
As such, to enhance the maximum achievable TDR of dis-
tributed random access in deadline-constrained broadcasting,
it is highly desirable to develop a dynamic control scheme
that allows each node to adjust its access parameters according
to local knowledge of current delivery urgency and contention
intensity. Unfortunately, due to random traffic or the limited capability of observing the channel status, each node cannot obtain
a complete knowledge of the current contention intensity in
practice, which renders such design a challenging task. So,
each node has to estimate the current contention intensity using
the information obtained from the observed channel status. A
great amount of work has gone into studying such information
that can be obtained [16], [19]–[23] under various models and
protocols. Our work follows the same direction of [20], [21]
to keep an A Posteriori Probability (APP) distribution for the
current contention intensity given all past observations and
access settings, which is a sufficient statistic for the optimal
design [24], but needs to additionally take into account the
impact of delivery urgencies. It should be noted that another
estimation technique is based on the “Certainty Equivalence”
principle [16], [19], [22], [23], which uses simple recursive
point estimators to merely estimate the actual value of the
current contention intensity, but does not utilize a sufficient
statistic for the optimal design. To the best of our knowledge, this is the first work to study dynamic control for deadline-constrained
random access, and the previous estimation approaches [16],
[19]–[23] cannot be directly applied here.
Furthermore, it is naturally desirable for this dynamic
control to strike a balance between the chance to gain an
instantaneous successful transmission and the chance to gain
a future successful transmission within the given deadline,
which requires reasoning about future sequences of access
parameters and observations. So, the dynamic control design
under this objective is more challenging than that for maxi-
mizing the instantaneous throughput of random access [14],
[20]–[23], which is only “one-step look-ahead”. By seeing
access parameters as actions, in this paper we apply the
theories of Markov Decision Process (MDP) and Partially
Observable MDP (POMDP) to obtain optimal control schemes
for maximizing the TDR. To the best of our knowledge, this is the first work to apply them to deadline-constrained broadcasting.
Although the idea of using MDP and POMDP in the context
of random access control is not new [21], [25], [26], our study
is different because the delivery urgency plays a nontrivial role
in decision making. It not only leads to accounting for time-
dependent decision rules, but also leads to a number of new
theoretical model properties (see Lemmas 1–3) to answer how
the delivery deadline affects optimal policies.
In addition, as solving POMDP is in general computation-
ally prohibitive, it is important to develop a simple control
scheme for deadline-constrained broadcasting with little TDR
performance loss. However, this design objective is uniquely challenging due to the difficulty in defining a reasonable myopic optimization goal. Note that the instantaneous-
throughput-maximization (ITM) goal usually adopted in the
literature [14], [20], [21] is no longer a suitable candidate here,
because it may significantly degrade the TDR performance
especially when the delivery deadline is relatively long and
the maximum allowed number of retransmissions is limited.
As such, how to utilize the model properties to design a simple
control scheme is a major issue that needs to be addressed.
C. Contributions
In this paper, we focus on deadline-constrained broadcasting
under frame-synchronized traffic. Such a traffic pattern can
capture a number of scenarios in IoT communications [1], [9],
[10], [13], [14], [27], [28] where each node has periodic-i.i.d.
packet arrivals. Our contributions are as follows.
1) For the commonly adopted non-retransmission setting, we
generalize slotted-ALOHA to define a dynamic control
scheme, i.e., a deterministic Markovian policy, which
allows each active node to determine the current trans-
mission probability with certainty based on its current
delivery urgency and the knowledge of current contention
intensity.
2) For an idealized environment where the contention in-
tensity is completely known, we develop an analytical
framework based on the theory of MDP, which leads
to an optimal control scheme by applying the backward
induction algorithm. We further show it is indeed optimal
over all types of policies for this environment.
3) For a realistic environment where the contention inten-
sity is incompletely known, we develop an analytical
TABLE I: Comparison of the proposed schemes and previously known schemes when applied to deadline-constrained broadcasting under frame-synchronized traffic. The elementary operation is defined as root finding of one univariate polynomial.

Scheme | Observation Requirement | Retrans. Limit | Complexity
The optimal scheme (idealized), $\hat{\pi}^*$ (Section III) | idealized | no retrans. | $O(ND)$ operations
The optimal scheme (realistic), $\pi^*$ (Section IV) | realistic | no retrans. | $O(|\mathcal{B}_D|D)$ operations
The heuristic scheme (realistic), $\pi^{\mathrm{heu}}$ (Section V) | realistic | no retrans. | $O(1)$ operation (closed-form formula)
The heuristic retrans.-based scheme (realistic), $\pi^{\mathrm{heuR}}$ (Section VI) | realistic | an arbitrary number of retrans. | $O(1)$ operation (closed-form formula)
The ITM scheme [14], [20], [21] | realistic | no retrans./no limit | $O(|\mathcal{B}_D|)$ operations
The static scheme [6], [8], [11], [13] | realistic | an arbitrary number of retrans. | $O(1)$ operation
framework based on the theory of POMDP, which can
in theory lead to an optimal control scheme by backward
induction. We also show it is indeed optimal over all types
of policies for this environment.
4) To overcome the infeasibility in obtaining an optimal or
near-optimal control scheme from the POMDP frame-
work, we investigate the behaviors of the optimal control
scheme for two extreme cases in the idealized envi-
ronment and use these behaviors as clues to design a
simple heuristic control scheme (without need to solve
any dynamic programming equations) for the realistic
environment with TDR close to the maximum achievable
TDR in the idealized environment. In addition, we pro-
pose an approximation on the knowledge of contention
intensity to further simplify this heuristic scheme.
Note that, although the MDP framework in the idealized envi-
ronment has limited applicability as the contention intensity
cannot be completely known in practice, it will serve to
provide an upper bound on the maximum achievable TDR in
the realistic environment, and serve to provide clues to design
a heuristic scheme for the realistic environment.
5) To further improve the TDR in the realistic environment,
we generalize the proposed heuristic scheme to support
retransmissions.
A comparison of the proposed schemes and previously known schemes when applied to deadline-constrained broadcasting under frame-synchronized traffic is summarized in Table I. Previously known schemes [6], [8], [11], [13] adopted
the static transmission probability and [14], [20], [21] adjusted
the transmission probability for ITM merely relying on the
knowledge of contention intensity. In contrast, our schemes ac-
count for the local knowledge of current delivery urgency and
contention intensity simultaneously to adjust the transmission
probability, which can yield better performance. Moreover,
our schemes can be applied to the realistic environment, can
support an arbitrary number of retransmissions, and still have
low complexity.
The remainder of this paper is organized as follows. For
the non-retransmission setting, the network model, protocol
design, and problem formulation are specified in Section II.
Optimal schemes for the idealized and realistic environments
are studied in Sections III and IV, respectively. The proposed
heuristic scheme for the realistic environment is presented in
Section V. In Section VI, we generalize this heuristic scheme
to support retransmissions. Numerical results with respect to
a wide range of configurations are provided in Section VII.
Conclusions are given in Section VIII.
II. SYSTEM MODEL AND PROBLEM FORMULATION
A. Network Model
Consider a wireless system with global synchronization, where a finite number, $N \ge 2$, of nodes are within the communication range of each other. The global time axis is divided into frames, each of which consists of a finite number, $D \ge 1$, of time slots of equal duration, indexed by $t \in \mathcal{T} \triangleq \{1, 2, \ldots, D\}$. To broadcast the freshest information, at the beginning of each frame, each node independently generates a packet to be transmitted with probability $\lambda \in (0, 1]$. We further assume that every packet has a strict delivery deadline of $D$ slots, i.e., a packet generated at the beginning of a frame will be discarded at the end of this frame.
By considering random channel errors due to wireless fading effects, we assume that a packet sent from a node is successfully received by an arbitrary other node with probability $\sigma \in (0, 1]$ if it does not collide with other packets, and otherwise is certainly unsuccessfully received by any other node. Due to the broadcast nature, we assume that every packet is neither acknowledged nor retransmitted. Then, at the beginning of slot $t$ of a frame, a node is called an active node if it generated a packet at the beginning of the frame and has not transmitted before slot $t$, and is called an inactive node otherwise. Each active node follows a common control scheme for random access, which will be defined in Section II-B, to generate transmission probabilities at the beginnings of different slots. A slot is said to be in the idle channel status if no packet is being transmitted, and in the busy status otherwise. At the end of a slot, we assume that each node is able to be aware of the channel status of this slot. The values of $N$, $D$, and $\lambda$ are all assumed to be completely known in advance to each node.
In this paper, we will mainly focus on the aforementioned
system model. The extension to retransmission-based broad-
casting will be discussed in Section VI. It should be noted
that, for ease of presentation, although we have assumed an
identical arrival rate, results obtained in this paper can be
readily extended to general cases.
The main notations used in our analysis are listed in Table II.
TABLE II: Main notations.

$N$ | Number of nodes
$D$ | Delivery deadline (in time slots)
$\lambda$ | Packet arrival rate
$\sigma$ | Packet success rate
$n_t$ | Actual number of other active nodes in the view of an arbitrary node at the beginning of slot $t$
$\mathbf{b}_t$ | Activity belief at the beginning of slot $t$
$p_t$ | Transmission probability at the beginning of slot $t$
$q_t$ | Status of the tagged node at the beginning of slot $t$
$\beta_t$ | State transition function at slot $t$
$r_t$ | Reward gained at slot $t$
$\hat{\pi}_t$ | Transmission function (idealized) at slot $t$
$\hat{\pi}$ | Policy (idealized)
$R_{\hat{\pi}}$ | Expected total reward (idealized)
$\mathrm{TDR}_{\hat{\pi}}$ | TDR under the policy $\hat{\pi}$
$U_t^*$ | Value function (idealized) at slot $t$
$o_t$ | Observation on the channel status (realistic) of slot $t$
$\omega_t$ | Observation function (realistic) at slot $t$
$\mathbf{h}_\lambda$ | Initial activity belief (realistic)
$\theta_t$ | Bayesian update function (realistic) at slot $t$
$\chi_t$ | Bayesian update normalizing factor (realistic) at slot $t$
$\pi_t$ | Transmission function (realistic) at slot $t$
$\pi$ | Policy (realistic)
$R_\pi$ | Expected total reward (realistic)
$\mathrm{TDR}_\pi$ | TDR under the policy $\pi$
$V_t^*$ | Value function (realistic) at slot $t$
$(M_t, \alpha_t)$ | Parameter vector for approximating $\mathbf{b}_t$ (realistic)
A broadcasting scenario with the above assumptions can be
found in Fig. 1. There is a communication link between any
two UAVs, i.e., all UAVs are within the communication range
of each other, and each UAV adopts the protocol proposed
in Section II-B to broadcast its detection information to all
other UAVs for collaborative multitarget detection. It can be
employed in a traffic mapping application [29] where a team of
UAVs keep estimating the locations of ground vehicles, and
a surveillance application [30] where a team of UAVs keep
tracking several targets with uncertainties caused by occlusion
or motion blur.
B. Protocol Description
Due to the frame-synchronized traffic, at the beginning of an arbitrary slot with at least one active node, each active node has the same delivery urgency. To take into account
the joint impact of delivery urgency and the knowledge of
contention intensity on determining transmission probabilities,
a dynamic control scheme for random access in deadline-
constrained broadcasting, which can be seen as a generaliza-
tion of slotted-ALOHA, is formally defined as follows.
Consider an arbitrary frame. Let the random variable $n_t$, taking values in $\mathcal{N} \triangleq \{0, 1, \ldots, N-1\}$, denote the actual number of other active nodes in the view of an arbitrary node at the beginning of slot $t \in \mathcal{T}$. At the beginning of an arbitrary slot $t \in \mathcal{T}$ with at least one active node, we assume
Fig. 1: A deadline-constrained broadcasting scenario for collaborative multitarget detection, which requires each UAV to be equipped with a Ground Moving Target Indicator (GMTI) sensor module and a wireless communication module.
that each active node has the same observation history (for estimating the actual value of $n_t$) from the environment, and we require each active node to adopt the same transmission probability. Thus, each active node has the same knowledge of the value of $n_t$ based on all past observations from the environment and all past transmission probabilities. Such knowledge can be summarized by a probability vector $\mathbf{b}_t \triangleq (b_t(0), b_t(1), \ldots, b_t(N-1))$, called the activity belief, where $b_t(n)$ is the conditional probability (given all past observations from the environment and all past transmission probabilities) that $n_t = n$. Let $\mathcal{B}_t$ denote the set of all possible values of $\mathbf{b}_t$ in $[0,1]^N$ such that $\sum_{n=0}^{N-1} b_t(n) = 1$. Hence, at the beginning of every slot $t \in \mathcal{T}$ with at least one active node, we require each active node to use the values of $t$ and $\mathbf{b}_t$ to determine the transmission probability $p_t$ by a transmission function $\pi_t : \mathcal{B}_t \to [0,1]$. An example of the working procedure for $N = 8$, $D = 6$ is illustrated in Fig. 2.
Fig. 2: An example of the working procedure of the dynamic control scheme for $N = 8$, $D = 6$.
We further consider two different environments for active nodes to obtain the value of the activity belief $\mathbf{b}_t$.
1) Idealized environment: at the beginning of every slot $t \in \mathcal{T}$ with at least one active node, each active node always has a complete knowledge of the value of $n_t$, i.e., $\mathbf{b}_t = (0, \ldots, 0, b_t(n) = 1, 0, \ldots, 0)$ if $n_t = n$ actually. Hence, the transmission function in this environment can be simply written as a function $\hat{\pi}_t : \mathcal{N} \to [0,1]$, i.e., $\hat{\pi}_t$ can be chosen from all possible mappings from $\mathcal{N}$ to $[0,1]$. A dynamic control scheme $\hat{\pi}$ is defined by a sequence of transmission functions for the idealized environment:
$\hat{\pi} \triangleq (\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_D)$, where $\hat{\pi}_t : \mathcal{N} \to [0,1]$.
Let $\hat{\Pi}$ denote the set of all possible such schemes.
2) Realistic environment: at the beginning of each slot $t \in \mathcal{T}$ with at least one active node, each active node is able to obtain the value of $\mathbf{b}_t$ only based on the characteristic of packet arrivals, all past channel statuses (idle or busy) and all past transmission probabilities, and thus has an incomplete knowledge of the value of $n_t$. A dynamic control scheme $\pi$ is defined by a sequence of transmission functions for the realistic environment:
$\pi \triangleq (\pi_1, \pi_2, \ldots, \pi_D)$, where $\pi_t : \mathcal{B}_t \to [0,1]$.
Let $\Pi$ denote the set of all possible such schemes.
Obviously, the idealized environment is infeasible to imple-
ment due to the difficulty in determining the initial actual
number of other active nodes and determining the number
of nodes involved in each busy slot, whereas the realistic
environment can be easily implemented without imposing
extra overhead and hardware cost.
C. Problem Formulation
As all active nodes are homogeneous and, accordingly, their performance is identical, we can consider an arbitrary node as the tagged node to evaluate the network performance. Let the random variable $q_t$, taking values from $\{0\ \text{(inactive)}, 1\ \text{(active)}\}$, denote the status of the tagged node at the beginning of slot $t$.
The optimization problem for the idealized environment can be formulated as
(P1) $\max_{\hat{\pi} \in \hat{\Pi}} \mathrm{TDR}_{\hat{\pi}}$,
where $\mathrm{TDR}_{\hat{\pi}}$ is the TDR under the control scheme $\hat{\pi}$, i.e.,
$\mathrm{TDR}_{\hat{\pi}} \triangleq \sum_{n \in \mathcal{N}} \binom{N-1}{n} \lambda^n (1-\lambda)^{N-1-n} \cdot \mathbb{E}_{\hat{\pi}}\big[ \sum_{t \in \mathcal{T}, q_t = 1} \sigma \hat{\pi}_t(n_t) \big(1 - \hat{\pi}_t(n_t)\big)^{n_t} \,\big|\, q_1 = 1, n_1 = n \big]$.
Similarly, the optimization problem for the realistic environment can be formulated as
(P2) $\max_{\pi \in \Pi} \mathrm{TDR}_{\pi}$,
where $\mathrm{TDR}_{\pi}$ is the TDR under the control scheme $\pi$, i.e.,
$\mathrm{TDR}_{\pi} \triangleq \mathbb{E}_{\pi}\big[ \sum_{t \in \mathcal{T}, q_t = 1} \sigma \pi_t(\mathbf{b}_t) \big(1 - \pi_t(\mathbf{b}_t)\big)^{n_t} \,\big|\, q_1 = 1, \mathbf{b}_1 = \mathbf{h}_\lambda \big]$.
The objective of Sections III–IV is to seek optimal control
schemes for maximizing the TDR under the idealized and
realistic environments, respectively.
III. MDP FRAMEWORK FOR THE IDEALIZED ENVIRONMENT
In this section, we formulate the random access control
problem in the idealized environment as an MDP, use the
expected total reward of this MDP to evaluate the TDR, and
obtain an optimal control scheme that maximizes the TDR.
A. MDP Formulation
From the dynamic control scheme specified in Section II-B, we see that each node becomes inactive at the beginning of slot $t+1$ with the transmission probability $p_t$ if it is active at the beginning of slot $t$, and will always be inactive otherwise. This implies that the probability of moving to the next state in the state process $(q_t, n_t)_{t \in \mathcal{T}}$ depends only on the current state. Thus, $(q_t, n_t)_{t \in \mathcal{T}}$ can be viewed as a discrete-time finite-horizon, finite-state Markov chain.
Based on the Markov chain $(q_t, n_t)_{t \in \mathcal{T}}$, we present an MDP formulation by introducing the following definitions.
Actions: At the beginning of each slot $t \in \mathcal{T}$ with $q_t = 1$, the action performed by the tagged node (and the other active nodes) is the chosen transmission probability $p_t$, taking values in the action space $[0,1]$. Note that the tagged node performs no action when $q_t = 0$.
State Transition Function: As the tagged node will never transmit after slot $t$ if $q_t = 0$, we only consider the state transition function when $q_t = 1$. The state transition function $\beta_t((q', n'), (1, n), p)$ is defined as the transition probability of moving from the state $(q_t, n_t) = (1, n)$ to $(q_{t+1}, n_{t+1}) = (q', n')$ when each active node at the beginning of slot $t$ adopts the transmission probability $p_t = p$. So, we have
$\beta_t((q', n'), (1, n), p) \triangleq \Pr\{(q_{t+1}, n_{t+1}) = (q', n') \mid (q_t, n_t) = (1, n), p_t = p\} = \begin{cases} \binom{n}{n - n'}\, p^{\,n - n' + 1 - q'} (1-p)^{\,n' + q'}, & \text{if } n' \le n, \\ 0, & \text{otherwise}, \end{cases}$ (1)
for each $t \in \mathcal{T} \setminus \{D\}$, each $q' \in \{0, 1\}$, each $n, n' \in \mathcal{N}$, and each $p \in [0,1]$.
Rewards: The reward gained at slot $t$ is defined as the average number of packets of the tagged node transmitted successfully to an arbitrary other node at slot $t$. As there is no reward at slot $t$ when $q_t = 0$, we only focus on the cases when $q_t = 1$. Let $r_t((1, n), p)$ denote the reward at slot $t$ for the state $(q_t, n_t) = (1, n)$ when each active node at the beginning of slot $t$ adopts $p_t = p$. So, we have
$r_t((1, n), p) = \sigma p (1-p)^n$, (2)
for each $t \in \mathcal{T}$, each $n \in \mathcal{N}$, and each $p \in [0,1]$.
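To make Eqs. (1) and (2) concrete, the following Python sketch (our illustration, not part of the original formulation; function names are ours) evaluates $\beta_t$ and $r_t$:

```python
from math import comb

def beta(q_next, n_next, n, p):
    """Eq. (1): probability of moving from (q_t, n_t) = (1, n) to
    (q_{t+1}, n_{t+1}) = (q_next, n_next) when every active node
    transmits with probability p; n - n_next other nodes transmit,
    and the tagged node transmits iff q_next == 0."""
    if n_next > n:
        return 0.0
    return comb(n, n - n_next) * p ** (n - n_next + 1 - q_next) \
        * (1 - p) ** (n_next + q_next)

def reward(n, p, sigma):
    """Eq. (2): sigma * p * (1-p)^n, the probability that the tagged
    node transmits alone among n + 1 active nodes and the packet
    survives the channel."""
    return sigma * p * (1 - p) ** n
```

As a sanity check, summing `beta` over $q' \in \{0,1\}$ and $n' \in \{0, \ldots, n\}$ returns 1 for any $p \in [0,1]$.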
Policy: A dynamic control scheme $\hat{\pi}$ defined in Section II-B can be seen as a deterministic Markovian policy.
Let $R_{\hat{\pi}}(1, n)$ denote the expected total reward from slot 1 to slot $D$ when $q_1 = 1$, $n_1 = n$, and the policy $\hat{\pi}$ is used, which can be defined by
$R_{\hat{\pi}}(1, n) \triangleq \mathbb{E}_{\hat{\pi}}\big[ \sum_{t=1, q_t=1}^{D} r_t((1, n_t), \hat{\pi}_t(n_t)) \,\big|\, q_1 = 1, n_1 = n \big]$.
Obviously, $\mathrm{TDR}_{\hat{\pi}} = \sum_{n \in \mathcal{N}} \binom{N-1}{n} \lambda^n (1-\lambda)^{N-1-n} R_{\hat{\pi}}(1, n)$.
B. MDP Solution
Due to the finite horizon, finite state space, compact action space, bounded rewards, and the continuity of the rewards and of the state transition function with respect to $p$ in our MDP formulation, [31, Prop. 4.4.3, Ch. 4] indicates that for maximizing $\mathrm{TDR}_{\hat{\pi}}$, there exists a $\hat{\pi}^* \in \hat{\Pi}$ which is indeed optimal over all random and deterministic, history-dependent and Markovian policies. This property also justifies the design goal for the idealized environment considered in Section II. Hence, we aim to seek
$\hat{\pi}^* \in \arg\max_{\hat{\pi} \in \hat{\Pi}} R_{\hat{\pi}}(1, n), \quad \forall n \in \mathcal{N}$.
Let $U_t^*(1, n)$ denote the value function corresponding to the maximum total expected reward from slot $t$ to slot $D$ when $q_t = 1$ and $n_t = n$. Averaging over all possible next states with $q_{t+1} = 1$, we arrive at the following Bellman's equation:
$U_D^*(1, n) = \max_{p \in [0,1]} r_D((1, n), p), \quad \forall n \in \mathcal{N}$,
$U_t^*(1, n) = \max_{p \in [0,1]} \big[ r_t((1, n), p) + \sum_{n' \in \mathcal{N}} \beta_t((1, n'), (1, n), p)\, U_{t+1}^*(1, n') \big], \quad \forall n \in \mathcal{N}$, (3)
for each $t \in \mathcal{T} \setminus \{D\}$.
Applying the backward induction algorithm to get a solution to Eq. (3) involves finding global maximizers of $ND$ real-coefficient univariate polynomials defined on $[0,1]$, and can formally lead to $\hat{\pi}^*$. This indicates that even if the contention intensity is completely known, the computation required to obtain an optimal scheme is still demanding in practice.
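For illustration, the sketch below (our own; it searches a discretized action grid instead of performing the exact polynomial root finding described above, so it only approximates the backward induction for Eq. (3)):

```python
import numpy as np
from math import comb

def solve_mdp(N, D, sigma, grid=1001):
    """Approximate backward induction for Eq. (3): U[t][n] estimates
    U*_t(1, n) and pi[t][n] estimates pi-hat*_t(n) for n = 0..N-1
    (other active nodes), searching p over a uniform grid on [0, 1]."""
    ps = np.linspace(0.0, 1.0, grid)
    U = np.zeros((D + 2, N))   # row D+1 stays 0 (terminal padding)
    pi = np.zeros((D + 1, N))
    for t in range(D, 0, -1):
        for n in range(N):
            vals = np.zeros(grid)
            for i, p in enumerate(ps):
                v = sigma * p * (1 - p) ** n          # Eq. (2)
                for n2 in range(n + 1):               # tagged node stays active
                    tr = comb(n, n - n2) * p ** (n - n2) * (1 - p) ** (n2 + 1)
                    v += tr * U[t + 1][n2]
                vals[i] = v
            best = int(np.argmax(vals))
            U[t][n], pi[t][n] = vals[best], ps[best]
    return U, pi

# e.g. at t = D the rule reduces to pi[D][n] ~ 1/(n+1)
```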
IV. POMDP FRAMEWORK FOR THE REALISTIC ENVIRONMENT
In this section, we formulate the random access control
problem in the realistic environment as a POMDP, use the
expected total reward of this POMDP to evaluate the TDR,
and discuss how to obtain an optimal or near-optimal scheme.
A. POMDP Formulation
Based on the Markov chain $(q_t, n_t)_{t \in \mathcal{T}}$ specified in Section III and the activity belief $\mathbf{b}_t$ for each $t \in \mathcal{T}$ with $q_t = 1$ specified in Section II-B, we present a POMDP formulation by introducing the following definitions.
Actions, State Transition Function, Rewards: The definitions of these elements are the same as in Section III.
Observations and Observation Function: The tagged node at the beginning of slot $t+1$ can obtain an observation on the channel status of slot $t$, denoted by $o_t$. When $q_{t+1} = 0$, the tagged node will never transmit after slot $t+1$ and $o_t$ will thus be useless. Hence, we only consider $o_t$ when $q_{t+1} = 1$, taking values from the observation space $\mathcal{O} \triangleq \{0\ \text{(idle)}, 1\ \text{(busy)}\}$. Further, the observation function $\omega_t(o, (1, n'), (1, n))$ is defined as the probability that the tagged node at the beginning of slot $t+1$ obtains the observation $o_t = o$ if $(q_t, n_t) = (1, n)$ and $(q_{t+1}, n_{t+1}) = (1, n')$. So, we have
$\omega_t(o, (1, n'), (1, n)) \triangleq \Pr\{o_t = o \mid (q_t, n_t) = (1, n), (q_{t+1}, n_{t+1}) = (1, n')\} = \begin{cases} 1, & \text{if } o = 0,\ n' = n, \\ 1, & \text{if } o = 1,\ n' \le n - 1, \\ 0, & \text{otherwise}, \end{cases}$
for each $t \in \mathcal{T} \setminus \{D\}$, each $o \in \mathcal{O}$, and each $n, n' \in \mathcal{N}$.
Bayesian Update of the Activity Belief: It has been shown in [24] that for each $t \in \mathcal{T}$ with $q_t = 1$, the value of the activity belief $\mathbf{b}_t$ is a sufficient statistic for the initial activity belief, all past channel statuses, and all past transmission probabilities. First, by the total number of nodes $N$ and the packet generation probability $\lambda$, the tagged node can obtain
$\mathbf{b}_1 = \mathbf{h}_\lambda \triangleq \big((1-\lambda)^{N-1},\ (N-1)\lambda(1-\lambda)^{N-2},\ \ldots,\ \lambda^{N-1}\big)$. (4)
Then, for each $t \in \mathcal{T} \setminus \{D\}$, given the condition $q_t = q_{t+1} = 1$, the activity belief $\mathbf{b}_t = \mathbf{b}$, the observation $o_t = o$, and the transmission probability $p_t = p$ used at slot $t$, the tagged node at slot $t+1$ can obtain $\mathbf{b}_{t+1}$ via the Bayes' rule:
$\mathbf{b}_{t+1} \triangleq \theta_t(\mathbf{b}, p, o, 1, 1)$,
$b_{t+1}(n') \triangleq \Pr\{n_{t+1} = n' \mid \mathbf{b}_t = \mathbf{b}, p_t = p, o_t = o, q_t = q_{t+1} = 1\} = \dfrac{\sum_{n \in \mathcal{N}} b(n)\, \omega_t(o, (1, n'), (1, n))\, \beta_t((1, n'), (1, n), p)}{\chi_t(o, \mathbf{b}, p, 1, 1)}$,
for each $n' \in \mathcal{N}$, where
$\chi_t(o, \mathbf{b}, p, 1, 1) \triangleq \Pr\{q_{t+1} = 1, o_t = o \mid q_t = 1, \mathbf{b}_t = \mathbf{b}, p_t = p\} = \sum_{n \in \mathcal{N}} b(n) \sum_{n'' \in \mathcal{N}} \omega_t(o, (1, n''), (1, n))\, \beta_t((1, n''), (1, n), p)$.
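For concreteness, a minimal sketch of the update $\theta_t$ (ours; the belief is a list over $n = 0, \ldots, N-1$, and the names are hypothetical):

```python
from math import comb

def bayes_update(b, p, o):
    """theta_t: posterior activity belief given prior b, common
    transmission probability p, and observed channel status o,
    conditioned on the tagged node staying active (q_t = q_{t+1} = 1)."""
    n_max = len(b)
    post = [0.0] * n_max
    for n in range(n_max):
        for n2 in range(n + 1):
            # beta_t((1, n2), (1, n), p): the tagged node holds (q' = 1)
            tr = comb(n, n - n2) * p ** (n - n2) * (1 - p) ** (n2 + 1)
            # omega_t: idle iff no other node transmitted (n2 == n)
            if (o == 0 and n2 == n) or (o == 1 and n2 < n):
                post[n2] += b[n] * tr
    chi = sum(post)   # chi_t(o, b, p, 1, 1)
    return [x / chi for x in post] if chi > 0 else post
```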
Policy: A dynamic control scheme $\pi$ defined in Section II-B can be seen as a deterministic Markovian policy.
Let $R_\pi(1, \mathbf{h}_\lambda)$ denote the expected total reward from slot 1 to slot $D$ when $q_1 = 1$, $\mathbf{b}_1 = \mathbf{h}_\lambda$, and the policy $\pi$ is used, which can be defined by
$R_\pi(1, \mathbf{h}_\lambda) \triangleq \mathbb{E}_\pi\big[ \sum_{t=1, q_t=1}^{D} r_t((1, n_t), \pi_t(\mathbf{b}_t)) \,\big|\, q_1 = 1, \mathbf{b}_1 = \mathbf{h}_\lambda \big]$.
Obviously, $\mathrm{TDR}_\pi = R_\pi(1, \mathbf{h}_\lambda)$.
B. POMDP Solution
Due to the finite horizon, finite state space, compact action space, bounded rewards, and the continuity of the rewards and of $\chi_t(o, \mathbf{b}, p, 1, 1)$ with respect to $p$ in our POMDP formulation, [31, Prop. 4.4.3, Ch. 4] and [32, Thm. 7.1, Ch. 6] indicate that for maximizing $\mathrm{TDR}_\pi$, there exists a $\pi^* \in \Pi$ which is indeed optimal over all types of policies. This property justifies the design goal for the realistic environment considered in Section II. Hence, we aim to seek
$\pi^* \in \arg\max_{\pi \in \Pi} R_\pi(1, \mathbf{h}_\lambda)$.
Let $V_t^*(1, \mathbf{b})$ denote the value function corresponding to the maximum total expected reward from slot $t$ to slot $D$ when $q_t = 1$ and $\mathbf{b}_t = \mathbf{b}$. Averaging over all possible current states with $q_t = 1$ and observations with $q_{t+1} = 1$, we arrive at the following Bellman's equation:
$V_D^*(1, \mathbf{b}) = \max_{p \in [0,1]} \sum_{n \in \mathcal{N}} b(n)\, r_D((1, n), p), \quad \forall \mathbf{b} \in \mathcal{B}_D$,
$V_t^*(1, \mathbf{b}) = \max_{p \in [0,1]} \big[ \sum_{n \in \mathcal{N}} b(n)\, r_t((1, n), p) + \sum_{o \in \mathcal{O}} \chi_t(o, \mathbf{b}, p, 1, 1)\, V_{t+1}^*\big(1, \theta_t(\mathbf{b}, p, o, 1, 1)\big) \big], \quad \forall \mathbf{b} \in \mathcal{B}_t$, (5)
for each $t \in \mathcal{T} \setminus \{D\}$. Solving Eq. (5) formally leads to $\pi^*$.
Unfortunately, getting $\pi^*$ by solving Eq. (5) is computationally intractable, as both the belief state space $\bigcup_{t \in \mathcal{T}} \mathcal{B}_t$ and the action space $[0,1]$ are infinite in our POMDP formulation. As such, an alternative is to consider a discretized action space $\mathcal{A}_d$ that only consists of uniformly distributed samples of the interval $[0,1]$, i.e., $\mathcal{A}_d \triangleq \{0, \Delta p, 2\Delta p, \ldots, 1\}$, where $\Delta p$ denotes the sampling interval. Hence, it is easy to see that $\mathcal{B}_t$ will become finite for each $t \in \mathcal{T}$ due to the finite $\mathcal{A}_d$. Then, theoretically, applying the backward induction algorithm [24] to get a solution to Eq. (5) can lead to a near-optimal policy, whose loss of optimality increases with $\Delta p$. However, this approach is still computationally prohibitive due to the super-exponential growth in the value function complexity.
V. A HEURISTIC SCHEME FOR THE REALISTIC ENVIRONMENT
To overcome the infeasibility in obtaining an optimal or
near-optimal control scheme for the realistic environment from
the POMDP framework, in this section, we propose a simple
heuristic control scheme that utilizes the key properties of our
problem. It will be shown in Section VII that the heuristic
scheme performs quite well in simulations.
A. Heuristics from the idealized environment
We first investigate the behaviors of $\hat{\pi}^*$ for two extreme cases in the idealized environment, which would serve to provide important clues on approximating $\hat{\pi}^*$. Let $U_t((1, n), p)$ denote the total expected reward from slot $t$ to slot $D$ for the state $(q_t, n_t) = (1, n)$ when each active node at the beginning of slot $t$ adopts the transmission probability $p_t = p$ and the optimal decision rules at slots $t+1, t+2, \ldots, D$. So, we have $U_t\big((1, n), \hat{\pi}_t^*(n)\big) = U_t^*(1, n)$.
Lemma 1. When $n_1 = m \to \infty$, by assuming that each collision involves a finite number of packets, for each $t \in \mathcal{T}$ and each possible $n_t = n$, we have
$\lim_{m \to \infty} (n+1)\, U_t\big((1, n), \tfrac{1}{n+1}\big) = \dfrac{(D - t + 1)\sigma}{e}$, (6)
$\lim_{m \to \infty} (n+1)\, U_t^*(1, n) = \dfrac{(D - t + 1)\sigma}{e}$. (7)
Proof. Assume each collision involves at most a finite number, $k \ge 2$, of packets. We begin with the case $t = D$. By Eq. (2), we know $U_D((1, n), p) = \sigma p (1-p)^n$ and thus $\hat{\pi}_D^*(n) = \frac{1}{n+1}$. As $k$ and $D$ are both finite, for each $n_D = n \in \{m - k(D-1), m - k(D-1) + 1, \ldots, m\}$, we obtain that $m \to \infty$ implies $n \to \infty$ and then
$\lim_{m \to \infty} (n+1)\, U_D^*(1, n) = \lim_{m \to \infty} (n+1)\, U_D\big((1, n), \tfrac{1}{n+1}\big) = \dfrac{\sigma}{e}$. (8)
Consider the case $t = D-1$. By the finite-horizon policy evaluation algorithm [31] and Eqs. (1), (2), for each $n_{D-1} = n \in \{m - k(D-2), m - k(D-2) + 1, \ldots, m\}$, we have
$(n+1)\, U_{D-1}((1, n), p) = (n+1)\, r_{D-1}((1, n), p) + (n+1) \sum_{n' \in \mathcal{N}} \beta_{D-1}((1, n'), (1, n), p)\, U_D^*(1, n')$
$= \sigma (n+1) p (1-p)^n + \sum_{n' \in \mathcal{N}} (n+1) \dfrac{n!}{n'!\,(n - n')!}\, p^{n-n'} (1-p)^{n'+1}\, U_D^*(1, n')$
$= \sigma (n+1) p (1-p)^n + \sum_{n' \in \mathcal{N}} \binom{n+1}{n - n'} p^{n-n'} (1-p)^{n'+1}\, (n'+1)\, U_D^*(1, n')$.
By the assumption that each collision involves at most $k$ packets, we have
$(n+1)\, U_{D-1}((1, n), p) = \sigma (n+1) p (1-p)^n + \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1}\, (n'+1)\, U_D^*(1, n') + \Big[1 - \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1}\Big] (n-k+1)\, U_D^*(1, n-k)$ (9)
$\le \sigma \big(1 - \tfrac{1}{n+1}\big)^n + (n-k+1)\, U_D^*(1, n-k) + \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} \big[(n'+1)\, U_D^*(1, n') - (n-k+1)\, U_D^*(1, n-k)\big]$. (10)
For each $n' \in \{n-k+1, n-k+2, \ldots, n\}$, since $0 \le \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} \le 1$, by applying the squeeze theorem, we obtain from Eq. (8) that
$\lim_{m \to \infty} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} \big[(n'+1)\, U_D^*(1, n') - (n-k+1)\, U_D^*(1, n-k)\big] = 0$. (11)
By Eqs. (8), (11) and inequality (10), as $k$ and $D$ are both finite, we further obtain that $m \to \infty$ implies $n \to \infty$ and
$\limsup_{m \to \infty} (n+1)\, U_{D-1}((1, n), p) \le \limsup_{m \to \infty} \big[\sigma \big(1 - \tfrac{1}{n+1}\big)^n + (n-k+1)\, U_D^*(1, n-k)\big] = \lim_{m \to \infty} \big[\sigma \big(1 - \tfrac{1}{n+1}\big)^n + (n-k+1)\, U_D^*(1, n-k)\big] = \dfrac{2\sigma}{e}$,
which implies
$\limsup_{m \to \infty} (n+1)\, U_{D-1}^*(1, n) \le \dfrac{2\sigma}{e}$. (12)
By setting $\hat{\pi}_{D-1}(n) = \frac{1}{n+1}$ for each $n_{D-1} = n \in \{m - k(D-2), m - k(D-2) + 1, \ldots, m\}$, as $k$ and $D$ are both finite, we obtain that $m \to \infty$ implies $n \to \infty$, and then obtain from Eqs. (8), (9) and (11) that
$\lim_{m \to \infty} (n+1)\, U_{D-1}\big((1, n), \tfrac{1}{n+1}\big) = \lim_{m \to \infty} \big[\sigma \big(1 - \tfrac{1}{n+1}\big)^n + (n-k+1)\, U_D^*(1, n-k)\big] = \dfrac{2\sigma}{e}$.
Since $U_{D-1}^*(1, n) \ge U_{D-1}\big((1, n), \tfrac{1}{n+1}\big)$, we have
$\liminf_{m \to \infty} (n+1)\, U_{D-1}^*(1, n) \ge \liminf_{m \to \infty} (n+1)\, U_{D-1}\big((1, n), \tfrac{1}{n+1}\big) = \dfrac{2\sigma}{e}$. (13)
Combining inequalities (12) and (13), we have $\lim_{m \to \infty} (n+1)\, U_{D-1}^*(1, n) = \frac{2\sigma}{e}$.
For the cases of $t = D-2, D-3, \ldots, 1$, iteratively repeating the above argument leads to Eqs. (6) and (7) for each possible $n_t = n$.
Lemma 1 motivates us to conjecture that, if $n_1$ takes a value sufficiently larger than $D$, the realizations of $(n_t + 1)\hat{\pi}_t^*(n_t)$ would always approach 1 for each $t \in \mathcal{T}$. Fig. 3 shows 1000 such realizations when $D = 10$ for $n_1 = 30, 50, 100$, respectively, which confirm our conjecture.
Fig. 3: Realizations of $(n_t + 1)\hat{\pi}_t^*(n_t)$ when $n_1 = 30, 50, 100$ and $D = 10$.
We further investigate the behaviors of $\hat{\pi}^*$ for the extreme case that $n_1$ takes a value sufficiently smaller than $D$.
Lemma 2. For each $t \in \mathcal{T}$, we have
$U_t^*(1, 1) = \dfrac{3D - 3t + 1}{3D - 3t + 4}\,\sigma$, (14)
and for each $t \in \mathcal{T} \setminus \{D\}$, we have
$\hat{\pi}_t^*(1) = \dfrac{3}{3D - 3t + 4}$. (15)
Proof. As $U_t^*(1, 0) = \sigma$ for each $t \in \mathcal{T}$, we have
$U_t((1, 1), p) = 2\sigma p(1-p) + (1-p)^2\, U_{t+1}^*(1, 1)$, (16)
for each $t \in \mathcal{T} \setminus \{D\}$. Taking the derivative of $U_t((1, 1), p)$ with respect to $p$ yields
$\dfrac{d}{dp} U_t((1, 1), p) = 2\sigma - 2U_{t+1}^*(1, 1) - \big(4\sigma - 2U_{t+1}^*(1, 1)\big)p$.
As $\sigma > 0$ and $U_{t+1}^*(1, 1) \le \sigma$ for each $t \in \mathcal{T} \setminus \{D\}$, we have
$\hat{\pi}_t^*(1) = \dfrac{\sigma - U_{t+1}^*(1, 1)}{2\sigma - U_{t+1}^*(1, 1)}$, (17)
for each $t \in \mathcal{T} \setminus \{D\}$. In particular, as $U_D^*(1, 1) = \sigma/4$, we obtain $\hat{\pi}_{D-1}^*(1) = 3/7$, which satisfies Eq. (15).
Then, we aim to investigate the relation between $\hat{\pi}_t^*(1)$ and $\hat{\pi}_{D-1}^*(1)$ for each $t \in \mathcal{T} \setminus \{D-1, D\}$. By setting $p = \hat{\pi}_t^*(1)$ in Eq. (16), we obtain
$U_t^*(1, 1) = 2\sigma \hat{\pi}_t^*(1)\big(1 - \hat{\pi}_t^*(1)\big) + \big(1 - \hat{\pi}_t^*(1)\big)^2 U_{t+1}^*(1, 1) = \dfrac{\sigma^2}{2\sigma - U_{t+1}^*(1, 1)}$. (18)
Using Eq. (17) to express $U_{t+1}^*(1, 1)$ and $U_t^*(1, 1)$ in Eq. (18) in terms of $\hat{\pi}_t^*(1)$ and $\hat{\pi}_{t-1}^*(1)$, respectively, we have
$\hat{\pi}_t^*(1) = \dfrac{\hat{\pi}_{t+1}^*(1)}{1 + \hat{\pi}_{t+1}^*(1)}$, (19)
for each $t \in \mathcal{T} \setminus \{D-1, D\}$. Furthermore, recursively using Eq. (19) yields
$\hat{\pi}_t^*(1) = \dfrac{\hat{\pi}_{D-1}^*(1)}{1 + (D - t - 1)\hat{\pi}_{D-1}^*(1)}$, (20)
and thus implies Eq. (15) by $\hat{\pi}_{D-1}^*(1) = 3/7$.
Finally, combining Eqs. (15) and (17) yields
$U_t^*(1, 1) = \dfrac{1 - 2\hat{\pi}_{t-1}^*(1)}{1 - \hat{\pi}_{t-1}^*(1)}\,\sigma = \dfrac{3D - 3t + 1}{3D - 3t + 4}\,\sigma$, (21)
for each $t \in \mathcal{T} \setminus \{1\}$, and substituting Eq. (21) into Eq. (18) yields $U_1^*(1, 1) = \frac{3D - 2}{3D + 1}\sigma$. Hence we complete the proof for Eq. (14).
Inspired by Eq. (15), we consider a simple control scheme $\hat{\pi}^{\mathrm{eve}} \triangleq (\hat{\pi}_1^{\mathrm{eve}}, \hat{\pi}_2^{\mathrm{eve}}, \ldots, \hat{\pi}_D^{\mathrm{eve}}) \in \hat{\Pi}$, where
$\hat{\pi}_t^{\mathrm{eve}}(n) = \dfrac{1}{D - t + 1}$, (22)
for each $t \in \mathcal{T}$ and each $n \in \mathcal{N}$.
Let $U_t^{\mathrm{eve}}(1, n)$ denote the expected total reward from slot $t$ to slot $D$ for the state $(q_t, n_t) = (1, n)$ when each active node adopts the decision rules $\hat{\pi}^{\mathrm{eve}}$ at slots $t, t+1, \ldots, D$. So, using the finite-horizon policy evaluation algorithm [31], we have
$U_D^{\mathrm{eve}}(1, n) = r_D\big((1, n), \hat{\pi}_D^{\mathrm{eve}}(n)\big), \quad \forall n \in \mathcal{N}$,
$U_t^{\mathrm{eve}}(1, n) = r_t\big((1, n), \hat{\pi}_t^{\mathrm{eve}}(n)\big) + \sum_{n' \in \mathcal{N}} \beta_t\big((1, n'), (1, n), \hat{\pi}_t^{\mathrm{eve}}(n)\big)\, U_{t+1}^{\mathrm{eve}}(1, n'), \quad \forall n \in \mathcal{N}$, (23)
for each $t \in \mathcal{T} \setminus \{D\}$.
Lemma 3. For each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$, we have
$U_t^{\mathrm{eve}}(1, n) = \sigma\Big(1 - \dfrac{1}{D - t + 1}\Big)^n$. (24)
Proof. We shall prove $U_t^{\mathrm{eve}}(1, n) = \sigma\big(1 - \frac{1}{D-t+1}\big)^n$ for each $n \in \mathcal{N}$ by induction from $t = D-1$ down to $t = 1$.
First, when $t = D-1$, by Eqs. (1), (2) and (23), we have
$U_{D-1}^{\mathrm{eve}}(1, n) = r_{D-1}\big((1, n), \hat{\pi}_{D-1}^{\mathrm{eve}}(n)\big) + \sum_{n' \in \mathcal{N}} \beta_{D-1}\big((1, n'), (1, n), \hat{\pi}_{D-1}^{\mathrm{eve}}(n)\big)\, U_D^{\mathrm{eve}}(1, n') = \sigma \dfrac{1}{2}\Big(1 - \dfrac{1}{2}\Big)^n + \Big(1 - \dfrac{1}{2}\Big)^{n+1} U_D^{\mathrm{eve}}(1, 0) = \sigma\Big(1 - \dfrac{1}{2}\Big)^n$,
for each $n \in \mathcal{N}$, thereby establishing the induction basis.
When $t \in \mathcal{T} \setminus \{D-1, D\}$, assume $U_{t+1}^{\mathrm{eve}}(1, n) = \sigma\big(1 - \frac{1}{D-t}\big)^n$ for each $n \in \mathcal{N}$. By Eqs. (1), (2) and (23), we have
$U_t^{\mathrm{eve}}(1, n) = r_t\big((1, n), \hat{\pi}_t^{\mathrm{eve}}(n)\big) + \sum_{n' \in \mathcal{N}} \beta_t\big((1, n'), (1, n), \hat{\pi}_t^{\mathrm{eve}}(n)\big)\, U_{t+1}^{\mathrm{eve}}(1, n')$
$= \sigma \dfrac{1}{D-t+1}\Big(1 - \dfrac{1}{D-t+1}\Big)^n + \sum_{n' \in \mathcal{N}} \binom{n}{n-n'} \Big(\dfrac{1}{D-t+1}\Big)^{n-n'} \Big(1 - \dfrac{1}{D-t+1}\Big)^{n'+1} \sigma\Big(1 - \dfrac{1}{D-t}\Big)^{n'}$
$= \sigma\Big(1 - \dfrac{1}{D-t+1}\Big)^n \dfrac{1}{D-t+1} + \sigma\Big(1 - \dfrac{1}{D-t+1}\Big)^n \dfrac{D-t}{D-t+1} \sum_{n' \in \mathcal{N}} \binom{n}{n-n'} \Big(\dfrac{1}{D-t}\Big)^{n-n'} \Big(1 - \dfrac{1}{D-t}\Big)^{n'}$
$= \sigma\Big(1 - \dfrac{1}{D-t+1}\Big)^n$,
for each $n \in \mathcal{N}$. So, the inductive step is established.
Since both the base case and the inductive step have been proved as true, we have $U_t^{\mathrm{eve}}(1, n) = \sigma\big(1 - \frac{1}{D-t+1}\big)^n$ for each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$.
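Lemma 3 is also easy to verify numerically; the sketch below (our own, with assumed parameter values) evaluates $\hat{\pi}^{\mathrm{eve}}$ through the recursion of Eq. (23) and compares against the closed form of Eq. (24):

```python
from math import comb

def u_eve(N, D, sigma):
    """Finite-horizon policy evaluation, Eq. (23), under
    pi_eve_t(n) = 1/(D - t + 1); U[t][n] for t = 1..D, n = 0..N-1."""
    U = [[0.0] * N for _ in range(D + 2)]
    for t in range(D, 0, -1):
        p = 1.0 / (D - t + 1)
        for n in range(N):
            v = sigma * p * (1 - p) ** n
            for n2 in range(n + 1):
                tr = comb(n, n - n2) * p ** (n - n2) * (1 - p) ** (n2 + 1)
                v += tr * U[t + 1][n2]
            U[t][n] = v
    return U

N, D, sigma = 10, 8, 0.9
U = u_eve(N, D, sigma)
for t in range(1, D):          # Lemma 3 covers t in T \ {D}
    for n in range(N):
        closed = sigma * (1 - 1.0 / (D - t + 1)) ** n   # Eq. (24)
        assert abs(U[t][n] - closed) < 1e-9
```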
For each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$, based on the fact $U_t^*(1, n) \le \sigma$, we have
$\dfrac{U_t^{\mathrm{eve}}(1, n)}{U_t^*(1, n)} \ge \Big(1 - \dfrac{1}{D - t + 1}\Big)^n$. (25)
We can observe from Eq. (25) that, if $n$ is sufficiently smaller than $D - t + 1$, the value of $U_t^*(1, n)$ is close to the value of $U_t^{\mathrm{eve}}(1, n)$. Then, Eq. (25) motivates us to conjecture that, if $n_t$ takes a value sufficiently smaller than $D - t + 1$, $\hat{\pi}_t^{\mathrm{eve}}$ may behave like $\hat{\pi}_t^*$, i.e., the realizations of $(D - t + 1)\hat{\pi}_t^*(n_t)$ would always approach $(D - t + 1)\hat{\pi}_t^{\mathrm{eve}}(n_t) = 1$ for each $n_t = n \in \mathcal{N}$. Fig. 4 shows 1000 such realizations when $n_1 = 10$ for $D = 30, 50, 100$, respectively, which confirm our conjecture.
Naturally, we obtain the following heuristics from Lemma
1 and Eq. (25), respectively.
1) From Lemma 1: When the number of active nodes is
sufficiently large compared with the value of remaining
slots, it is desirable for the active nodes to adopt the
transmission probability that maximizes the instantaneous
throughput. This implies that, when the remaining slots
are not enough, the active nodes should utilize the channel
as much as possible, which is time-independent.
Fig. 4: Realizations of $(D - t + 1)\hat{\pi}_t^*(n_t)$ when $n_1 = 10$ and $D = 30, 50, 100$.
Fig. 5: $\hat{\pi}_t^*(n)$ and its approximation for typical choices of parameters when $D = 30$.
2) From Eq. (25): When the number of active nodes is sufficiently small compared with the number of remaining slots, it is desirable for the active nodes to adopt the transmission probability that ensures that all the backlogged packets would be almost evenly transmitted in the remaining slots. This implies that, when the remaining slots are enough, the active nodes should cherish the transmission chance in order to utilize the time as much as possible, which is time-dependent.
Based on these heuristics and the obvious fact $\hat{\pi}_D^*(n) = \frac{1}{n+1}$ for each $n \in \mathcal{N}$, we propose a simple approximation on $\hat{\pi}^*$.
Approximation on $\hat{\pi}^*$: For each slot $t \in \mathcal{T}$ and each $n_t = n \in \mathcal{N}$, if the number of active nodes $n + 1$ is larger than the number of remaining slots $D - t + 1$ or $t = D$, $\hat{\pi}_t^*(n)$ can be estimated by $\frac{1}{n+1}$; otherwise, $\hat{\pi}_t^*(n)$ can be estimated by $\frac{1}{D-t+1}$.
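In code form, the rule is a one-line case split (our restatement, with a hypothetical function name):

```python
def pi_hat_approx(t, n, D):
    """Approximation on pi-hat*_t(n): contend greedily when the active
    nodes (n + 1) outnumber the remaining slots (D - t + 1) or t == D,
    otherwise spread transmissions evenly over the remaining slots."""
    if n + 1 > D - t + 1 or t == D:
        return 1.0 / (n + 1)
    return 1.0 / (D - t + 1)
```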
Fig. 5 compares $\hat{\pi}_t^*(n)$ and its approximation for typical choices of parameters when $D = 30$. It is shown that the approximation error is very small when the difference between $n$ and $D - t$ is large. This is because $\hat{\pi}_t^*(n)$ is dominated by $n$ if $n$ is much larger than $D - t$, but is dominated by $t$ if $n$ is much smaller than $D - t$, which is consistent with the heuristics. However, it is shown that the approximation error is noticeable
Fig. 6: The total expected rewards from slot $t$ to $D$ corresponding to $\hat{\pi}_t^*(n)$ and its approximation for typical choices of parameters when $D = 30$.
when the difference between $n$ and $D - t$ is small. This is because both $n$ and $t$ have notable impacts on $\hat{\pi}_t^*(n)$ in this case, which is not considered in the heuristics. The results also show that the ratio of the cases with an approximation error larger than 8% is 6.67%, and the largest approximation error is 11.47%, thus justifying our approximation. Furthermore, Fig. 6 compares the total expected rewards from slot $t$ to $D$ corresponding to $\hat{\pi}_t^*(n)$ and its approximation for typical choices of parameters when $D = 30$. The results show that the approximation leads to at most 0.66% reward loss (much smaller than the approximation error), and thus verify that our approximation can be used to obtain a TDR quite close to the maximum achievable TDR in the idealized environment.
B. A Simple Approximation on the Activity Belief of the Realistic Environment
To apply the approximation on $\hat{\pi}^*$ to the realistic environment, it is necessary for each active node to perform a runtime update of the activity belief $\mathbf{b}_t$. However, as shown in Section IV-A, the full Bayesian update of $\mathbf{b}_t$ is computationally demanding to implement. So, we shall propose a simple approximation on $\mathbf{b}_t$, denoted by $\mathbf{b}_t^{\mathrm{bd}} \triangleq (b_t^{\mathrm{bd}}(0), b_t^{\mathrm{bd}}(1), \ldots, b_t^{\mathrm{bd}}(N-1))$, relying on a binomial distribution with a changeable parameter vector $(M_t, \alpha_t)$. More specifically, if $(M_t, \alpha_t) = (M, \alpha)$, we have
$b_t^{\mathrm{bd}}(n) = \begin{cases} \binom{M}{n} \alpha^n (1-\alpha)^{M-n}, & \text{if } 0 \le n \le M, \\ 0, & \text{otherwise}. \end{cases}$ (26)
As such, in this manner, each active node will only keep the parameter vector $(M_t, \alpha_t)$ rather than the activity belief $\mathbf{b}_t$. Obviously, by Eq. (4), we can set $(M_1, \alpha_1) = (N-1, \lambda)$ to achieve $\mathbf{b}_1 = \mathbf{b}_1^{\mathrm{bd}}$. Then, for each $t \in \mathcal{T} \setminus \{D\}$, we will show that we can use the Bayes' rule exactly to set the value of $(M_{t+1}, \alpha_{t+1})$ when the observation $o_t = 0$, but must introduce an approximation assumption when $o_t = 1$.
For each t T \ {D}, given (Mt, αt) = (M, α),bbd
t=bbd
and pt=p, the following procedure first uses the Bayes’ rule
to compute bmed
t+1 ,bmed
t+1(0), bmed
t+1(1), . . . , bmed
t+1(N1), and
then computes (Mt+1, αt+1 )based on the value of bmed
t+1.
Case 1: if ot= 0, the Bayesian update yields
bmed
t+1(n)
=n∈N bbd(n)ωt0,(1, n),(1, n)βt(1, n),(1, n), p
χt(0,bbd, p, 1,1)
=
M
nααp
1αp n1ααp
1αp Mn
,if 0nM,
0,otherwise.
We require bbd
t+1 to directly take the value of bmed
t+1, and set
(Mt+1, αt+1 ) = (M, ααp
1αp ).
Case 2: if ot= 1, the Bayesian update yields
bmed
t+1(n)
=n∈N bbd(n)ωt1,(1, n),(1, n)βt(1, n),(1, n), p
χt(1,bbd, p, 1,1)
=
1
1(1αp)M
·M
nα(1 p)n1α(1 p)Mn
(1 αp)MM
nααp
1αp n1ααp
1αp Mn,
if 0nM1,
0,otherwise.
Such a Bayesian update does not yield a distribution in the
form (26). However, we modify the value of bmed
t+1 to a
distribution in the form (26) by keeping the mean of the
distribution unchanged and considering that the number of
active nodes will be reduced by at least one due to a busy
slot. So, when M > 1, we set
(Mt+1, αt+1 ) = M1,M(ααp)1(1 αp)M1
(M1)1(1 αp)M,
and when M= 1, we adopt the convention that
(Mt+1, αt+1 ) = (M1,1).
The accuracy of this approximation will be examined via
numerical results at the end of this section.
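A compact sketch of this parameter update (ours; `update_belief_params` is a hypothetical name):

```python
def update_belief_params(M, alpha, p, o):
    """Section V-B update of (M_t, alpha_t). For an idle slot (o = 0)
    the Bayesian update is exact and keeps the binomial form; for a
    busy slot (o = 1) the posterior is moment-matched to a binomial
    with one fewer candidate node."""
    if o == 0:
        return M, (alpha - alpha * p) / (1 - alpha * p)
    if M == 1:
        return 0, 1.0          # convention (M - 1, 1) when M = 1
    s = 1 - alpha * p
    new_alpha = M * (alpha - alpha * p) * (1 - s ** (M - 1)) \
        / ((M - 1) * (1 - s ** M))
    return M - 1, new_alpha
```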
C. A Heuristic Scheme
With the investigations in Sections V-A and V-B together, we are ready to propose a heuristic but very simple control scheme for the realistic environment, $\pi^{\mathrm{heu}}$.
At the beginning of each slot $t \in \mathcal{T}$, given the parameter vector of the belief approximation $(M_t, \alpha_t) = (M, \alpha)$, each active node uses $M\alpha$ to estimate the mean of $n_t$, and further uses the following simple rule $\pi_t^{\mathrm{heu}}(M, \alpha)$ to determine the value of the transmission probability $p_t$.
1) If $M\alpha + 1 > D - t + 1$ or $t = D$, set $\pi_t^{\mathrm{heu}}(M, \alpha)$ to maximize the expected instantaneous throughput, i.e.,
$\pi_t^{\mathrm{heu}}(M, \alpha) \in \arg\max_{p \in [0,1]} \sum_{n \in \mathcal{N}} b^{\mathrm{bd}}(n)\, r_t((1, n), p) = \min\Big\{\dfrac{1}{M\alpha + \alpha}, 1\Big\}$. (27)
TABLE III: A comparison between the realizations of $\mathbf{b}_t$ and its approximation $\mathbf{b}_t^{\mathrm{bd}}$ when each active node adopts $\pi^{\mathrm{heu}}$ for $N = 10$, $\lambda = 0.8$, $D = 8$. Each pair of rows lists $b_t(0), b_t(1), \ldots, b_t(9)$.

t = 1, o_1 = 0:
  b_t:     0.000001 0.000018 0.000295 0.002753 0.016515 0.066060 0.176161 0.301990 0.301990 0.134218
  Approx.: 0.000001 0.000018 0.000295 0.002753 0.016515 0.066060 0.176161 0.301990 0.301990 0.134218
t = 2, o_2 = 1:
  b_t:     0.000001 0.000042 0.000583 0.004760 0.024988 0.087458 0.204068 0.306102 0.267839 0.104160
  Approx.: 0.000001 0.000042 0.000583 0.004760 0.024988 0.087458 0.204068 0.306102 0.267839 0.104160
t = 3, o_3 = 1:
  b_t:     0.000059 0.001098 0.009014 0.042646 0.127254 0.245406 0.298859 0.210235 0.065430 0
  Approx.: 0.000052 0.001004 0.008559 0.041692 0.126924 0.247294 0.301138 0.209545 0.063792 0
t = 4, o_4 = 1:
  b_t:     0.001086 0.012248 0.059916 0.164987 0.276437 0.282086 0.162465 0.040774 0 0
  Approx.: 0.000974 0.011537 0.058598 0.165343 0.279925 0.284347 0.160466 0.038810 0 0
t = 5, o_5 = 1:
  b_t:     0.010921 0.072058 0.201100 0.304173 0.263268 0.123764 0.024716 0 0 0
  Approx.: 0.010329 0.070827 0.202359 0.308353 0.264299 0.120821 0.023013 0 0 0
t = 6, o_6 = 0:
  b_t:     0.068102 0.238724 0.340491 0.247285 0.091556 0.013842 0 0 0 0
  Approx.: 0.067210 0.240606 0.344541 0.246686 0.088312 0.012646 0 0 0 0
t = 7, o_7 = 0:
  b_t:     0.169904 0.357679 0.306377 0.133629 0.029713 0.002698 0 0 0 0
  Approx.: 0.167239 0.359554 0.309208 0.132956 0.028585 0.002458 0 0 0 0
t = 8, o_8 = 1:
  b_t:     0.421334 0.395352 0.150943 0.029344 0.002908 0.000118 0 0 0 0
  Approx.: 0.416144 0.398784 0.152859 0.029297 0.002807 0.000108 0 0 0 0
The proof of Eq. (27) is provided in the supplemental material.
2) Otherwise, set
$\pi_t^{\mathrm{heu}}(M, \alpha) = \dfrac{1}{D - t + 1}$.
The heuristic scheme depends on not only $n_t$ but also $t$, so it is a time-dependent dynamic policy.
Table III compares the realizations of $\mathbf{b}_t$ and its approximation $\mathbf{b}_t^{\mathrm{bd}}$ when each active node adopts $\pi^{\mathrm{heu}}$ for $N = 10$, $\lambda = 0.8$, $D = 8$, and verifies that the proposed approximation is reasonable.
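Putting Sections V-A and V-B together, $\pi^{\mathrm{heu}}$ reduces to a few lines; the sketch below is ours and reuses the hypothetical `update_belief_params` from above:

```python
def pi_heu(t, M, alpha, D):
    """pi_heu_t(M, alpha): Eq. (27) when the estimated number of active
    nodes M*alpha + 1 exceeds the remaining slots D - t + 1 (or t == D),
    otherwise the even-spreading rule 1/(D - t + 1)."""
    if M * alpha + 1 > D - t + 1 or t == D:
        return 1.0 if alpha == 0 else min(1.0 / (M * alpha + alpha), 1.0)
    return 1.0 / (D - t + 1)

# Per-slot loop of an active node (sketch):
#   p = pi_heu(t, M, alpha, D); transmit with probability p;
#   observe o; M, alpha = update_belief_params(M, alpha, p, o)
```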
Let $R_{\pi^{\mathrm{heu}}}(1, (N-1, \lambda))$ denote the expected total reward from slot 1 to slot $D$ when $q_1 = 1$, $(M_1, \alpha_1) = (N-1, \lambda)$, and the policy $\pi^{\mathrm{heu}}$ is used, which can be defined by
$R_{\pi^{\mathrm{heu}}}(1, (N-1, \lambda)) \triangleq \mathbb{E}_{\pi^{\mathrm{heu}}}\big[ \sum_{t=1, q_t=1}^{D} r_t\big((1, n_t), \pi_t^{\mathrm{heu}}(M_t, \alpha_t)\big) \,\big|\, q_1 = 1, (M_1, \alpha_1) = (N-1, \lambda) \big]$.
Obviously, we have $\mathrm{TDR}_{\pi^{\mathrm{heu}}} = R_{\pi^{\mathrm{heu}}}(1, (N-1, \lambda))$.
Remark: The proposed heuristic scheme for the realistic environment is based on the model-based MDP and POMDP formulations, which require each node to know the other nodes' traffic parameters. Owing to the known model parameters, the heuristic scheme enjoys three advantages: it can be implemented without imposing extra overhead and hardware cost, it has very low computational complexity, and it achieves a TDR close to the maximum achievable TDR in the idealized environment. However, in practice, the model parameters may not be available to formulate the optimization and obtain the heuristic scheme. In this case, reinforcement learning algorithms [33] can be used to learn the model parameters by directly interacting with the environment, but at the cost of an extra training phase with slow convergence speed and high computational complexity, which may not be feasible for low-cost nodes.
VI. EXTENSION TO RETRANSMISSION-BASED BROADCASTING
In our study so far, we have assumed that every packet is never retransmitted, which is a common setting in broadcasting due to the lack of acknowledgements, and a desirable setting for energy saving in many energy-constrained applications. This assumption also allows us to provide a nice presentation in Sections III–V for designing optimal and heuristic schemes. In this section, to further improve the TDR, we relax this assumption to allow at most $K \ge 1$ copies of every packet to be transmitted within the deadline of $D$ slots. However, the updating of the activity belief for the case $K \ge 2$ would lead to much more complicated modeling than that for the case $K = 1$. Instead, we directly generalize $\pi^{\mathrm{heu}}$ to propose a heuristic scheme for the realistic environment with $K \ge 1$, $\pi^{\mathrm{heuR}}$, which makes good use of the results in Sections III–V.
The key idea behind $\pi^{\mathrm{heuR}}$ is based on the heuristic that each copy of every packet has an "average deadline" of $D/K$ slots. So, we first divide a frame into $K$ consecutive subframes, indexed by $k \in \{1, 2, \ldots, K\}$, and require the $k$-th subframe to occupy $D_k$ slots by the following rule: if $r = 0$, $D_k = D/K$ for $k = 1, 2, \ldots, K$; otherwise, $D_k = \lfloor D/K \rfloor$ for $k = 1, 2, \ldots, K - r$, and $D_k = \lceil D/K \rceil$ for $k = K + 1 - r, \ldots, K$, where $r = D \bmod K$. Then, we require all the nodes that are active at the beginning of a frame to broadcast the $k$-th copy at most once during the $k$-th subframe, for $k = 1, 2, \ldots, K$. In this manner, it is easy to see that the updating of the activity belief proposed in Sections IV–V is still applicable to each subframe. This property motivates us to use $\pi^{\mathrm{heu}}$ with the same initial belief to broadcast the $k$-th copy during the $k$-th subframe for $k = 1, 2, \ldots, K$, i.e., to repeatedly use $\pi^{\mathrm{heu}}$ $K$ times. Clearly, the study in Section V can be seen as a particular case here (i.e., $K = 1$, $D_1 = D$). An example of the working procedure of $\pi^{\mathrm{heuR}}$ for $D = 6$, $K = 2$ can be found in Fig. 7, and a code sketch of the subframe partition follows below.
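The subframe partition can be written directly (our sketch of the rule above):

```python
def subframe_lengths(D, K):
    """Split D slots into K subframes: the first K - r subframes get
    floor(D/K) slots and the last r get ceil(D/K), where r = D % K,
    so the lengths always sum to D."""
    r = D % K
    base = D // K
    return [base] * (K - r) + [base + 1] * r

# subframe_lengths(6, 2) -> [3, 3]; subframe_lengths(7, 3) -> [2, 2, 3]
```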
Fig. 7: An example of the working procedure of $\pi^{\mathrm{heuR}}$ for $D = 6$, $K = 2$.
Let $R_k^{\pi^{\mathrm{heu}}}(1, n, (N-1, \lambda))$ denote the expected total reward of the $k$-th subframe from slot 1 to slot $D_k$ when $q_1 = 1$, $n_1 = n$, $(M_1, \alpha_1) = (N-1, \lambda)$, and the policy $\pi^{\mathrm{heu}}$ is used, which can be defined by
$R_k^{\pi^{\mathrm{heu}}}(1, n, (N-1, \lambda)) \triangleq \mathbb{E}_{\pi^{\mathrm{heu}}}\big[ \sum_{t=1, q_t=1}^{D_k} r_t\big((1, n_t), \pi_t^{\mathrm{heu}}(M_t, \alpha_t)\big) \,\big|\, q_1 = 1, n_1 = n, (M_1, \alpha_1) = (N-1, \lambda) \big]$.
Then, the TDR under the policy $\pi^{\mathrm{heuR}}$ can be computed by
$\mathrm{TDR}_{\pi^{\mathrm{heuR}}} = 1 - \sum_{n \in \mathcal{N}} \binom{N-1}{n} \lambda^n (1-\lambda)^{N-1-n} \cdot \prod_{k=1}^{K} \big(1 - R_k^{\pi^{\mathrm{heu}}}(1, n, (N-1, \lambda))\big)$.
The performance improvement owing to the use of $\pi^{\mathrm{heuR}}$ will be investigated in Section VII.
VII. NUMERICAL EVALUATION
This section includes two subsections. To validate the stud-
ies in Sections III–V without considering retransmissions, the
first subsection compares the TDR performance of the optimal
scheme for the idealized environment
π, the proposed heuris-
tic scheme for the realistic environment πheu, an optimal static
scheme for the realistic environment πsta [6], [8], [11], [13],
and the ITM scheme for the realistic environment (also can
be seen as the myopic scheme in our POMDP model) πmyo
[14], [20], [21]. Here, πsta requires each active node to always
adopt a static and identical transmission probability, and an
optimal transmission probability can be obtained by solving
maxp[0,1] Rπ(1,hλ)s.t. πt(b) = p,t T ,b Bt.
Such comparisons are not only helpful to demonstrate the
performance loss due to the incomplete knowledge of the value
of nt, but also helpful to demonstrate the performance advan-
tage benefitting from dynamically adjusting the transmission
probability. In the second subsection, for the scenario where
each packet is allowed to be transmitted at most Ktimes,
we compare the TDR performance of the proposed heuristic
scheme for the realistic environment πheuR with K=KheuR,
and an optimal static scheme for the realistic environment
πstaR with K=KstaR. Here, KheuRand KstaRare optimal
values of Kthat maximize the TDR of πheuR and πstaR,
respectively.
It is worth mentioning that, to our best knowledge, design-
ing the ITM scheme with 1< K < D for the realistic
environment requires a new modeling to update the activity
belief due to the impact of the retransmission limit on the
node status, which is very different from the modelings in this
paper and previous studies [19]–[23]. Meanwhile, the ITM
scheme with K=Dfor the realistic environment can be
seen as a particular static scheme. So, based on these two
considerations, the ITM scheme with retransmissions is not
presented in Subsection VII-B for comparisons.
The scenarios considered in the numerical experiments for the first subsection are in accordance with the system model specified in Section II-A, while the scenarios for the second subsection additionally allow retransmissions. We vary the system configuration over a wide range to investigate the impact of control scheme design on the TDR performance. Each numerical result is obtained from 10^7 independent numerical experiments.
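For reference, a single experiment can be organized as in the sketch below, which follows the system model of Section II-A under the non-retransmission setting. The policy callback scheme(t, n) is a hypothetical interface; passing it the true number of active nodes n corresponds to the idealized environment, whereas a realistic scheme would instead maintain its own belief.

    import random

    def one_experiment(N, lam, D, sigma, scheme):
        # Bernoulli(lam) arrivals at the frame start; a slot delivers a packet
        # only if exactly one node transmits and the channel succeeds (prob. sigma).
        active = [random.random() < lam for _ in range(N)]
        arrivals = sum(active)
        delivered = 0
        for t in range(1, D + 1):
            n = sum(active)
            if n == 0:
                break
            txers = [i for i in range(N)
                     if active[i] and random.random() < scheme(t, n)]
            for i in txers:
                active[i] = False      # non-retransmission: one shot per packet
            if len(txers) == 1 and random.random() < sigma:
                delivered += 1         # timely delivery within the deadline
        return delivered, arrivals

Averaging delivered over arrivals across the 10^7 runs yields the TDR estimate.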
A. Comparisons Without Retransmissions
Fig. 8 shows the TDR performance without retransmissions as a function of the packet arrival rate λ. The TDR of each scheme except πmyo always decreases when λ increases, due to the increase of contention intensity. But λ has a more complicated impact on πmyo, because πmyo behaves more optimally when λ increases, as indicated by Lemma 1, which compensates for the negative impact of the increased contention intensity. We observe that πheu performs close to π∗: 2.87%–8.18% loss when D = 10 and 0.56%–4.04% loss when D = 20. This indicates that the design of πheu is reasonable, and that the incomplete knowledge of the value of nt has only a minor impact on the TDR performance. We further observe that πheu significantly outperforms πsta: 2.03%–17.06% improvement when D = 10 and 11.09%–19.59% improvement when D = 20. The reason is obvious: πsta does not adjust the transmission probability according to the current delivery urgency and contention intensity. It is interesting to note that πsta performs closer to the other schemes as λ increases. This is because the optimal transmission probabilities for different values of t and nt become closer as the value of n1 grows, as indicated by Lemma 1. Moreover, we also observe that πheu significantly outperforms πmyo when Nλ < D: 14.43%–58.24% improvement when D = 10 and 12.37%–106.57% improvement when D = 20; πheu achieves almost the same TDR as πmyo when Nλ ≥ D. This observation confirms again the heuristic from Lemma 1 in Section V-A and further indicates that the ITM goal [14], [20], [21] is unsuitable here.
Fig. 9 shows the TDR performance without retransmissions as a function of the delivery deadline D. We observe that the TDR of each scheme except πmyo always increases with D, due to the decrease of delivery urgency. But the TDR of πmyo first increases and then remains the same as D grows. This phenomenon is due to the fact that, when Nλ < D, πmyo would transmit all backlogged packets in the first Nλ slots on average and waste the remaining slots. We observe that πheu performs significantly better than πsta: 6.61%–16.74% improvement when σ = 0.8 and 6.56%–16.89% improvement when σ = 1; πheu performs close to π∗: 3.30%–6.56% loss when σ = 0.8 and 3.17%–6.60% loss when σ = 1.
Fig. 8: The TDR as a function of the packet arrival rate λ for N = 50, D = 10, 20, σ = 0.9.
It is shown that the relative gap between π∗ and πheu is small when D is much larger than Nλ. The rationale is that, for this case in the realistic environment, nt probably takes a value n much smaller than D − t, and it is easy for the active nodes to judge correctly whether n < D − t, which is crucial for πheu. We also observe that πheu significantly outperforms πmyo when Nλ < D: 6.15%–39.84% improvement when σ = 0.8 and 6.58%–39.72% improvement when σ = 1, and achieves almost the same TDR performance as πmyo when Nλ ≥ D. This confirms again that the inefficient time utilization of πmyo, which runs opposite to the insight behind the heuristic from Eq. (25), yields the poor TDR performance of πmyo.
Fig. 9: The TDR as a function of the delivery deadline D for N = 50, λ = 0.25, σ = 0.8, 1.
Fig. 10 shows the TDR performance without retransmissions as a function of the packet success rate σ. The TDR of each scheme always increases with σ, due to the improvement of channel quality. We observe that πheu performs significantly better than πsta: 18.59%–19.03% improvement when λ = 0.1 and 5.51%–5.77% improvement when λ = 0.4, and performs close to π∗: 0.70%–1.06% loss when λ = 0.1 and 4.09%–4.33% loss when λ = 0.4. We also observe that πheu significantly outperforms πmyo when Nλ < D: 89.67%–91.00% improvement when λ = 0.1; πheu achieves almost the same TDR as πmyo when Nλ ≥ D. This confirms again the benefits of determining the transmission probability based simultaneously on the current delivery urgency and contention intensity. It is also shown that πsta performs close to the other schemes as Nλ/D becomes larger. This is because nt probably takes a value n much larger than D − t, and the knowledge of n then has only a minor impact on the TDR.
Fig. 10: The TDR as a function of the packet success rate σ for N = 50, λ = 0.1, 0.4, D = 15.
B. Comparisons With Retransmissions
Figs. 11–13 show the TDR performance with retransmissions as functions of λ, D and σ, respectively. We observe from Fig. 11 that πheuR with K = KheuR outperforms πstaR with K = KstaR: 2.06%–9.29% improvement when D = 10 and 3.42%–11.04% improvement when D = 20. We observe from Fig. 12 that πheuR with K = KheuR enjoys 5.64%–10.46% improvement when σ = 0.8 and 6.58%–11.25% improvement when σ = 1. We observe from Fig. 13 that πheuR with K = KheuR enjoys 1.80%–3.74% improvement when λ = 0.1 and 5.61%–5.95% improvement when λ = 0.4. These results indicate that generalizing πheu to πheuR is effective in improving the TDR over a wide range of configurations when retransmissions are allowed. Next, we can see that, in general, KheuR increases when λ decreases, D increases, or σ decreases. This is because appropriately introducing more retransmissions helps to improve the time utilization for smaller λ and larger D, and to resist the risk of transmission failures for smaller σ. Moreover, it is shown that KheuR ≤ KstaR in all the cases, which is desirable for the sake of energy efficiency. This is because dynamic control helps to improve the delivery reliability of a single transmission and thus alleviates the need for retransmissions. It also implies that the performance gap in Figs. 11–13, although less notable than that when K = 1, is narrowed only by allowing the static scheme to unfairly perform more retransmissions.
Fig. 11: The TDR as a function of the packet arrival rate λ for N = 50, D = 10, 20, σ = 0.9.
Fig. 12: The TDR as a function of the delivery deadline D for N = 50, λ = 0.25, σ = 0.8, 1.
Fig. 13: The TDR as a function of the packet success rate σ for N = 50, λ = 0.1, 0.4, D = 15.
VIII. CONCLUSION
In this paper, under the idealized and realistic environments without retransmissions, optimal dynamic control schemes for random access in deadline-constrained broadcasting with frame-synchronized traffic have been investigated based on the theories of MDP and POMDP, respectively.
The proposed heuristic scheme for the realistic environment achieves a threefold goal: it can be implemented without imposing extra overhead or hardware cost, it runs with very low computational complexity, and it achieves a TDR close to the maximum achievable TDR in the idealized environment. Moreover, it has been shown that the proposed heuristic scheme can be easily extended to incorporate retransmissions for further improving the TDR.
An interesting and important future research direction is
to optimize deadline-constrained broadcasting under general
traffic patterns. To handle such an asymmetric scenario,
a standard way is to develop a dynamic control scheme based
on the theory of decentralized MDP (Dec-MDP). However,
solving a Dec-MDP is in general NEXP-complete [34]. Hence,
an appropriate practical scheme needs to be further studied,
which is our ongoing work.
ACKNOWLEDGEMENT
The authors would like to thank Dr. He Chen for helpful
suggestions and discussions.
REFERENCES
[1] D. Feng, C. She, K. Ying, L. Lai, Z. Hou, T. Q. S. Quek, Y. Li, and
B. Vucetic, “Toward ultrareliable low-latency communications: Typical
scenarios, possible solutions, and open issues,” IEEE Veh. Technol. Mag.,
vol. 14, no. 2, pp. 94–102, 2019.
[2] J. Gao, M. Li, L. Zhao, and X. Shen, “Contention intensity based
distributed coordination for V2V safety message broadcast,” IEEE Trans.
Veh. Technol., vol. 67, no. 12, pp. 12288–12301, 2018.
[3] M. Luvisotto, Z. Pang, and D. Dzung, “High-performance wireless
networks for industrial control applications: New targets and feasibility,”
Proc. IEEE, vol. 107, no. 6, pp. 1074–1093, 2019.
[4] C. Chen, H. Li, H. Li, R. Fu, Y. Liu, and S. Wan, “Efficiency and
fairness oriented dynamic task offloading in Internet of Vehicles,” IEEE
Trans. Green Commun. Netw., vol. 6, no. 3, pp. 1481–1493, 2022.
[5] Y. Zhao, X. Wang, C. Wang, Y. Cong, and L. Shen, “Systemic design
of distributed multi-UAV cooperative decision-making for multi-target
tracking,” Auton. Agents Multi-Agent Syst., vol. 33, no. 1, pp. 132–158,
2019.
[6] Y. H. Bae, “Analysis of optimal random access for broadcasting with deadline in cognitive radio networks,” IEEE Commun. Lett., vol. 17, no. 3, pp. 573–575, 2013.
[7] Y. H. Bae, “Random access scheme to improve broadcast reliability,” IEEE Commun. Lett., vol. 17, no. 7, pp. 1467–1470, 2013.
[8] Y. H. Bae, “Queueing analysis of deadline-constrained broadcasting in
wireless networks,” IEEE Commun. Lett., vol. 19, no. 10, pp. 1782–
1785, 2015.
[9] C. Campolo, A. Vinel, A. Molinaro, and Y. Koucheryavy, “Modeling
broadcasting in IEEE 802.11p/WAVE vehicular networks,” IEEE Com-
mun. Lett., vol. 15, no. 2, pp. 199–201, 2011.
[10] M. I. Hassan, H. L. Vu, and T. Sakurai, “Performance analysis of the IEEE 802.11 MAC protocol for DSRC with and without retransmissions,” in Proc. IEEE WoWMoM, 2010, pp. 1–8.
[11] Y. H. Bae, “Optimal retransmission-based broadcasting under delivery
deadline constraint,” IEEE Commun. Lett., vol. 19, no. 6, pp. 1041–1044,
2015.
[12] Y. H. Bae, “Modeling timely-delivery ratio of slotted aloha with energy
harvesting,” IEEE Commun. Lett., vol. 21, no. 8, pp. 1823–1826, 2017.
[13] L. Deng, J. Deng, P. Chen, and Y. S. Han, “On the asymptotic perfor-
mance of delay-constrained slotted ALOHA,” in Proc. IEEE ICCCN,