
Dynamic Optimization of Random Access in Deadline-Constrained Broadcasting

Aoyu Gong, Yijin Zhang, Senior Member, IEEE, Lei Deng, Fang Liu, Jun Li, Senior Member, IEEE, and Feng Shu, Member, IEEE

Abstract—This paper considers dynamic optimization of random access in deadline-constrained broadcasting with frame-synchronized traffic. Under the non-retransmission setting, we define a dynamic control scheme that allows each active node to determine its transmission probability based on local knowledge of the current delivery urgency and contention intensity (i.e., the number of active nodes). For an idealized environment where the contention intensity is completely known, we develop a Markov Decision Process (MDP) framework, from which an optimal scheme for maximizing the timely delivery ratio (TDR) can be explicitly obtained. For a realistic environment where the contention intensity is incompletely known, we develop a Partially Observable MDP (POMDP) framework, from which an optimal scheme can be found only in theory. To overcome the infeasibility of obtaining an optimal or near-optimal scheme from the POMDP framework, we investigate the behaviors of the optimal scheme for extreme cases in the MDP framework, and leverage the intuition gained from these behaviors, together with an approximation of the contention-intensity knowledge, to propose a heuristic scheme for the realistic environment whose TDR is close to the maximum TDR in the idealized environment. We further generalize the heuristic scheme to support retransmissions. Numerical results are provided to validate our study.

Index Terms—Distributed algorithms, dynamic optimization, random access, delivery deadline

I. INTRODUCTION

A. Background

BROADCASTING is a fundamental operation in distributed wireless systems. With the explosive growth of ultra-reliable low-latency services for the Internet of Things (IoT) [1]–[4], such as detection-information sharing in unmanned aerial vehicle (UAV) networks, safety-message dissemination in vehicular networks, and industrial control in factory automation, deadline-constrained broadcasting has become a research focus in recent years. In such broadcasting, each packet must be transmitted within a strict delivery deadline after its arrival and is discarded once the deadline expires. Hence, the timely delivery ratio (TDR), defined as the probability that a broadcast packet is successfully delivered to an arbitrary intended receiver within the given delivery deadline, is considered a critical metric for evaluating the performance of such broadcasting. Note that such broadcasting is important for agents in these applications to obtain as much timely information about the world as possible for choosing the operations that best achieve their application objectives. For example, in UAV networks for collaborative multitarget tracking [5], due to its limited detection capability, each UAV can only detect targets situated within a certain area, and has to use the detection information shared by other UAVs to decide on the optimal path to follow in order to cover as many targets as possible.

This work was supported in part by the National Natural Science Foundation of China under Grants 62071236, U22A2002, 62071234, and 61902256; in part by the Major Science and Technology Plan of Hainan Province under Grant ZDKJ2021022; in part by the Scientific Research Fund Project of Hainan University under Grant KYQD(ZR)-21008; in part by the Fundamental Research Funds for the Central Universities of China (Nos. 30920021127, 30921013104); in part by the Future Network Grant of the Provincial Education Board in Jiangsu; and in part by the Open Research Fund of the State Key Laboratory of Integrated Services Networks, Xidian University, under Grant ISN22-14. (Corresponding author: Yijin Zhang.)

A. Gong, Y. Zhang, and J. Li are with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China. Y. Zhang is also with the State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China. E-mail: {gongaoyu; yijin.zhang}@gmail.com, jun.li@njust.edu.cn.

L. Deng is with the College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China. E-mail: ldeng@szu.edu.cn.

F. Liu is with the Department of Information Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong. E-mail: lf015@ie.cuhk.edu.hk.

F. Shu is with the School of Information and Communication Engineering, Hainan University, Haikou 570228, China, and also with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China. E-mail: shufeng0101@163.com.

A canonical deadline-constrained broadcasting scenario is one in which, under a given traffic pattern, an uncertain set of nodes with new or backlogged packets attempt to transmit before deadline expiration without centralized scheduling. In this scenario, the MAC layer is expected to behave differently from what is commonly assumed in conventional deadline-unconstrained protocols, as the contention intensity is jointly determined by the traffic pattern, the delivery deadline, and the retransmission setting. Random access mechanisms tailored to this scenario are therefore needed to support efficient channel sharing under a deadline, and careful design of the access parameters is needed to maximize the TDR.

B. Related Work and Motivation

Many recent studies [6]–[9] have addressed this issue under the assumption of no retransmissions, which is commonly adopted in broadcasting due to the lack of acknowledgments or for the sake of energy efficiency. Under saturated traffic, Bae [6], [7] obtained the optimal slotted-ALOHA for broadcasting single-slot packets and the optimal p-persistent CSMA for broadcasting multi-slot packets, respectively. Under a discrete-time Geo/Geo/1 queue model, Bae [8] obtained the optimal slotted-ALOHA for broadcasting single-slot packets. Under frame-synchronized traffic, Campolo et al. [9] proposed an analytical model for using IEEE 802.11p CSMA/CA to broadcast multi-slot packets, which can be used to obtain the optimal contention window size. To improve the TDR with the help of retransmissions, Hassan et al. [10] investigated the impact of retransmissions on the TDR of IEEE 802.11p CSMA/CA under frame-synchronized traffic, and Bae [11] obtained the optimal slotted-ALOHA with retransmissions under saturated traffic. However, [6]–[8], [11] adopted a static transmission probability and [9], [10] adopted a static contention window size, inevitably limiting the maximum achievable TDR.

Other studies on deadline-constrained random access include [12]–[16], which consider uplink transmissions to a common receiver. Under Bernoulli traffic, Bae [12] derived the optimal static transmission probability for maximizing the TDR based on stationary Markov chain modeling. Under frame-synchronized traffic, Deng et al. [13] developed an algorithm to recursively analyze the timely throughput for any static transmission probability and characterized the asymptotic behavior of the optimal one. However, [12], [13] still restrict their attention to static access parameters. Using absorbing Markov chain modeling, Bae et al. [14] proposed to myopically change the transmission probability when the contention intensity is completely known, which, however, does not account for dynamic programming optimality. Zhao et al. [15] proposed to simply double or halve the transmission probability based on the channel feedback, which is easy to implement but lacks an explicit optimization goal. Zhang et al. [16] proposed to adjust the transmission probability for maximizing the TDR by jointly using a fixed-point iteration and a recursive estimator, but did not utilize all of the observed data. Another type of deadline-constrained access is based on sequence design [17], [18], where each active node deterministically decides whether to transmit according to an assigned sequence but uses no observations to adjust its access behavior.

As such, to enhance the maximum achievable TDR of distributed random access in deadline-constrained broadcasting, it is strongly desirable to develop a dynamic control scheme that allows each node to adjust its access parameters according to local knowledge of the current delivery urgency and contention intensity. Unfortunately, due to random traffic and the limited capability of observing the channel status, each node cannot obtain complete knowledge of the current contention intensity in practice, which renders such a design a challenging task. Each node therefore has to estimate the current contention intensity using the information obtained from the observed channel status. A great amount of work has gone into studying the information that can be so obtained [16], [19]–[23] under various models and protocols. Our work follows the same direction as [20], [21] in keeping an a posteriori probability (APP) distribution for the current contention intensity given all past observations and access settings, which is a sufficient statistic for the optimal design [24], but it additionally needs to take into account the impact of delivery urgencies. It should be noted that another estimation technique is based on the "certainty equivalence" principle [16], [19], [22], [23], which uses simple recursive point estimators to estimate only the actual value of the current contention intensity, but does not utilize a sufficient statistic for the optimal design. To the best of our knowledge, this is the first study of dynamic control for deadline-constrained random access, and the previous estimation approaches [16], [19]–[23] cannot be directly applied here.

Furthermore, it is naturally desirable for this dynamic control to strike a balance between the chance of gaining an instantaneous successful transmission and the chance of gaining a future successful transmission within the given deadline, which requires reasoning about future sequences of access parameters and observations. The dynamic control design under this objective is thus more challenging than that for maximizing the instantaneous throughput of random access [14], [20]–[23], which is only "one-step look-ahead". By viewing access parameters as actions, in this paper we apply the theories of the Markov Decision Process (MDP) and the Partially Observable MDP (POMDP) to obtain optimal control schemes for maximizing the TDR. To the best of our knowledge, this is the first work to apply them to deadline-constrained broadcasting. Although the idea of using MDPs and POMDPs in the context of random access control is not new [21], [25], [26], our study differs because the delivery urgency plays a nontrivial role in decision making. It not only requires accounting for time-dependent decision rules, but also leads to a number of new theoretical model properties (see Lemmas 1–3) that answer how the delivery deadline affects optimal policies.

In addition, as solving a POMDP is in general computationally prohibitive, it is important to develop a simple control scheme for deadline-constrained broadcasting with little loss in TDR performance. However, this design objective is uniquely challenging due to the difficulty of defining a reasonable myopic optimization goal. Note that the instantaneous-throughput-maximization (ITM) goal usually adopted in the literature [14], [20], [21] is no longer a suitable candidate here, because it may significantly degrade the TDR performance, especially when the delivery deadline is relatively long and the maximum allowed number of retransmissions is limited. As such, how to utilize the model properties to design a simple control scheme is a major issue that needs to be addressed.

C. Contributions

In this paper, we focus on deadline-constrained broadcasting under frame-synchronized traffic. Such a traffic pattern can capture a number of scenarios in IoT communications [1], [9], [10], [13], [14], [27], [28] where each node has periodic i.i.d. packet arrivals. Our contributions are as follows.

1) For the commonly adopted non-retransmission setting, we generalize slotted-ALOHA to define a dynamic control scheme, i.e., a deterministic Markovian policy, which allows each active node to determine its current transmission probability with certainty based on its current delivery urgency and its knowledge of the current contention intensity.

2) For an idealized environment where the contention intensity is completely known, we develop an analytical framework based on the theory of MDPs, which leads to an optimal control scheme by applying the backward induction algorithm. We further show that this scheme is indeed optimal over all types of policies for this environment.

3) For a realistic environment where the contention intensity is incompletely known, we develop an analytical framework based on the theory of POMDPs, which can in theory lead to an optimal control scheme by backward induction. We also show that this scheme is indeed optimal over all types of policies for this environment.

4) To overcome the infeasibility of obtaining an optimal or near-optimal control scheme from the POMDP framework, we investigate the behaviors of the optimal control scheme for two extreme cases in the idealized environment and use these behaviors as clues to design a simple heuristic control scheme (with no need to solve any dynamic programming equations) for the realistic environment, whose TDR is close to the maximum achievable TDR in the idealized environment. In addition, we propose an approximation of the knowledge of contention intensity to further simplify this heuristic scheme.

Note that, although the MDP framework in the idealized environment has limited applicability, since the contention intensity cannot be completely known in practice, it serves both to provide an upper bound on the maximum achievable TDR in the realistic environment and to provide clues for designing a heuristic scheme for the realistic environment.

5) To further improve the TDR in the realistic environment, we generalize the proposed heuristic scheme to support retransmissions.

A comparison of the proposed schemes and previously known schemes when applied to deadline-constrained broadcasting under frame-synchronized traffic is summarized in Table I. Previously known schemes [6], [8], [11], [13] adopted a static transmission probability, and [14], [20], [21] adjusted the transmission probability for ITM relying merely on the knowledge of contention intensity. In contrast, our schemes account for the local knowledge of the current delivery urgency and contention intensity simultaneously when adjusting the transmission probability, which yields better performance. Moreover, our schemes can be applied to the realistic environment, can support an arbitrary number of retransmissions, and still have low complexity.

TABLE I: Comparison of the proposed schemes and previously known schemes when applied to deadline-constrained broadcasting under frame-synchronized traffic. The elementary operation is defined as root finding of one univariate polynomial.

Scheme | Observation Requirement | Retrans. Limit | Complexity
The optimal scheme (idealized), $\hat{\pi}^*$ (Section III) | idealized | no retrans. | $O(ND)$ operations
The optimal scheme (realistic), $\pi^*$ (Section IV) | realistic | no retrans. | $O(|\mathcal{B}_D| D)$ operations
The heuristic scheme (realistic), $\pi_{\mathrm{heu}}$ (Section V) | realistic | no retrans. | $O(1)$ operation (closed-form formula)
The heuristic retrans.-based scheme (realistic), $\pi_{\mathrm{heuR}}$ (Section VI) | realistic | an arbitrary number of retrans. | $O(1)$ operation (closed-form formula)
The ITM scheme ([14], [20], [21]) | realistic | no retrans./no limit | $O(|\mathcal{B}_D|)$ operations
The static scheme ([6], [8], [11], [13]) | realistic | an arbitrary number of retrans. | $O(1)$ operation

The remainder of this paper is organized as follows. For the non-retransmission setting, the network model, protocol design, and problem formulation are specified in Section II. Optimal schemes for the idealized and realistic environments are studied in Sections III and IV, respectively. The proposed heuristic scheme for the realistic environment is presented in Section V. In Section VI, we generalize this heuristic scheme to support retransmissions. Numerical results for a wide range of configurations are provided in Section VII. Conclusions are given in Section VIII.

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. Network Model

Consider a wireless system with global synchronization, where a finite number, $N \geq 2$, of nodes are within the communication range of each other. The global time axis is divided into frames, each of which consists of a finite number, $D \geq 1$, of time slots of equal duration, indexed by $t \in \mathcal{T} \triangleq \{1, 2, \ldots, D\}$. To broadcast the freshest information, at the beginning of each frame, each node independently generates a packet to be transmitted with probability $\lambda \in (0, 1]$. We further assume that every packet has a strict delivery deadline of $D$ slots, i.e., a packet generated at the beginning of a frame will be discarded at the end of this frame.

To account for random channel errors due to wireless fading, we assume that a packet sent from a node is successfully received by an arbitrary other node with probability $\sigma \in (0, 1]$ if it does not collide with other packets, and otherwise is certainly not received by any other node. Due to the broadcast nature, we assume that every packet is neither acknowledged nor retransmitted. Then, at the beginning of slot $t$ of a frame, a node is called an active node if it generated a packet at the beginning of the frame and has not transmitted before slot $t$, and is called an inactive node otherwise. Each active node follows a common control scheme for random access, which will be defined in Section II-B, to generate transmission probabilities at the beginnings of different slots. A slot is said to be in the idle channel status if no packet is being transmitted in it, and in the busy status otherwise. At the end of a slot, each node is assumed to be able to observe the channel status of this slot. The values of $N$, $D$, and $\lambda$ are all assumed to be completely known in advance to each node.

In this paper, we mainly focus on the aforementioned system model. The extension to retransmission-based broadcasting will be discussed in Section VI. It should be noted that, although we have assumed an identical arrival rate for ease of presentation, the results obtained in this paper can be readily extended to general cases.
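To make the model concrete, a frame of this system can be simulated directly. The following sketch is our own illustration, not part of the paper's analysis: it plugs in a static transmission probability `p` where a control scheme would supply $\pi_t$, and the parameter values are arbitrary.

```python
import random

def simulate_frame(N, D, lam, sigma, p, rng):
    """Simulate one frame and return the number of packets delivered
    to one fixed receiver. A static transmission probability p stands
    in for a control scheme pi_t."""
    # Frame-synchronized traffic: each node independently generates a
    # packet with probability lam at the beginning of the frame.
    active = [rng.random() < lam for _ in range(N)]
    delivered = 0
    for _ in range(D):
        # Non-retransmission setting: a node becomes inactive right
        # after its single transmission attempt.
        txers = [i for i, a in enumerate(active) if a and rng.random() < p]
        for i in txers:
            active[i] = False
        # A packet is received only if it does not collide (it is the
        # sole transmission in the slot) and the channel succeeds.
        if len(txers) == 1 and rng.random() < sigma:
            delivered += 1
    return delivered

rng = random.Random(0)
trials = 20000
avg = sum(simulate_frame(8, 6, 0.6, 0.9, 0.2, rng) for _ in range(trials)) / trials
print(avg)  # average packets delivered per frame
```

Since at most one packet can be delivered per slot, the average is bounded by $D$; with these arbitrary parameters it falls well below that bound because of collisions.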

The main notations used in our analysis are listed in Table II.

TABLE II: Main notations.

Notation | Description
$N$ | Number of nodes
$D$ | Delivery deadline (in time slots)
$\lambda$ | Packet arrival rate
$\sigma$ | Packet success rate
$n_t$ | Actual number of other active nodes in the view of an arbitrary node at the beginning of slot $t$
$b_t$ | Activity belief at the beginning of slot $t$
$p_t$ | Transmission probability at the beginning of slot $t$
$q_t$ | Status of the tagged node at the beginning of slot $t$
$\beta_t$ | State transition function at slot $t$
$r_t$ | Reward gained at slot $t$
$\hat{\pi}_t$ | Transmission function (idealized) at slot $t$
$\hat{\pi}$ | Policy (idealized)
$R_{\hat{\pi}}$ | Expected total reward (idealized)
$\mathrm{TDR}_{\hat{\pi}}$ | TDR under the policy $\hat{\pi}$
$U^*_t$ | Value function (idealized) at slot $t$
$o_t$ | Observation on the channel status (realistic) of slot $t$
$\omega_t$ | Observation function (realistic) at slot $t$
$h_\lambda$ | Initial activity belief (realistic)
$\theta_t$ | Bayesian update function (realistic) at slot $t$
$\chi_t$ | Bayesian update normalizing factor (realistic) at slot $t$
$\pi_t$ | Transmission function (realistic) at slot $t$
$\pi$ | Policy (realistic)
$R_\pi$ | Expected total reward (realistic)
$\mathrm{TDR}_\pi$ | TDR under the policy $\pi$
$V^*_t$ | Value function (realistic) at slot $t$
$(M_t, \alpha_t)$ | Parameter vector for approximating $b_t$ (realistic)

A broadcasting scenario under the above assumptions is shown in Fig. 1. There is a communication link between any two UAVs, i.e., all UAVs are within the communication range of each other, and each UAV adopts the protocol proposed in Section II-B to broadcast its detection information to all other UAVs for collaborative multitarget detection. This scenario can arise in a traffic mapping application [29], where a team of UAVs keeps estimating the locations of ground vehicles, and in a surveillance application [30], where a team of UAVs keeps tracking several targets with uncertainties caused by occlusion or motion blur.

B. Protocol Description

Due to frame-synchronized traffic, at the beginning of an arbitrary slot with at least one active node, each active node has the same delivery urgency. To take into account the joint impact of the delivery urgency and the knowledge of contention intensity on determining transmission probabilities, a dynamic control scheme for random access in deadline-constrained broadcasting, which can be seen as a generalization of slotted-ALOHA, is formally defined as follows.

Consider an arbitrary frame. Let the random variable $n_t$, taking values in $\mathcal{N} \triangleq \{0, 1, \ldots, N-1\}$, denote the actual number of other active nodes in the view of an arbitrary node at the beginning of slot $t \in \mathcal{T}$. At the beginning of an arbitrary slot $t \in \mathcal{T}$ with at least one active node, we assume that each active node has the same observation history (for estimating the actual value of $n_t$) from the environment, and we require each active node to adopt the same transmission probability. Thus, each active node has the same knowledge of the value of $n_t$ based on all past observations from the environment and all past transmission probabilities. Such knowledge can be summarized by a probability vector $b_t \triangleq (b_t(0), b_t(1), \ldots, b_t(N-1))$, called the activity belief, where $b_t(n)$ is the conditional probability (given all past observations from the environment and all past transmission probabilities) that $n_t = n$. Let $\mathcal{B}_t$ denote the set of all possible values of $b_t$ in $[0,1]^N$ such that $\sum_{n=0}^{N-1} b_t(n) = 1$. Hence, at the beginning of every slot $t \in \mathcal{T}$ with at least one active node, we require each active node to use the values of $t$ and $b_t$ to determine the transmission probability $p_t$ by a transmission function $\pi_t : \mathcal{B}_t \to [0,1]$. An example of the working procedure for $N = 8$, $D = 6$ is illustrated in Fig. 2.

Fig. 1: A deadline-constrained broadcasting scenario for collaborative multitarget detection, which requires each UAV to be equipped with a Ground Moving Target Indicator (GMTI) sensor module and a wireless communication module.

Fig. 2: An example of the working procedure of the dynamic control scheme for $N = 8$, $D = 6$. (The figure marks each of the eight nodes as active or inactive in each slot, shows the common transmission probabilities $p_1, \ldots, p_5$ generated by $\pi_t : \mathcal{B}_t \to [0,1]$, and shows the observed channel status, idle or busy, of each slot of the frame.)

We further consider two different environments in which active nodes obtain the value of the activity belief $b_t$.

1) Idealized environment: at the beginning of every slot $t \in \mathcal{T}$ with at least one active node, each active node always has complete knowledge of the value of $n_t$, i.e., $b_t = (0, \ldots, 0, b_t(n) = 1, 0, \ldots, 0)$ if $n_t = n$ actually. Hence, the transmission function $\hat{\pi}_t$ in this environment can simply be written as a function $\hat{\pi}_t : \mathcal{N} \to [0,1]$, i.e., $\hat{\pi}_t$ can be chosen from all possible mappings from $\mathcal{N}$ to $[0,1]$. A dynamic control scheme $\hat{\pi}$ is defined by a sequence of transmission functions for the idealized environment: $\hat{\pi} \triangleq (\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_D)$, where $\hat{\pi}_t : \mathcal{N} \to [0,1]$. Let $\hat{\Pi}$ denote the set of all possible such schemes.

2) Realistic environment: at the beginning of each slot $t \in \mathcal{T}$ with at least one active node, each active node can obtain the value of $b_t$ based only on the characteristics of packet arrivals, all past channel statuses (idle or busy), and all past transmission probabilities, and thus has incomplete knowledge of the value of $n_t$. A dynamic control scheme $\pi$ is defined by a sequence of transmission functions for the realistic environment: $\pi \triangleq (\pi_1, \pi_2, \ldots, \pi_D)$, where $\pi_t : \mathcal{B}_t \to [0,1]$. Let $\Pi$ denote the set of all possible such schemes.

Obviously, the idealized environment is infeasible to implement due to the difficulty of determining the initial actual number of other active nodes and the number of nodes involved in each busy slot, whereas the realistic environment can easily be implemented without imposing extra overhead or hardware cost.

C. Problem Formulation

As all active nodes are homogeneous and, accordingly, their performance is the same, we can consider an arbitrary node as the tagged node to evaluate the network performance. Let the random variable $q_t$, taking values in $\{0 \text{ (inactive)}, 1 \text{ (active)}\}$, denote the status of the tagged node at the beginning of slot $t$.

The optimization problem for the idealized environment can be formulated as

(P1) $\max_{\hat{\pi} \in \hat{\Pi}} \mathrm{TDR}_{\hat{\pi}}$,

where $\mathrm{TDR}_{\hat{\pi}}$ is the TDR under the control scheme $\hat{\pi}$, i.e.,

$$\mathrm{TDR}_{\hat{\pi}} \triangleq \sum_{n \in \mathcal{N}} \binom{N-1}{n} \lambda^n (1-\lambda)^{N-1-n} \cdot \mathbb{E}_{\hat{\pi}}\Big[ \sum_{t \in \mathcal{T}, q_t = 1} \sigma \hat{\pi}_t(n_t) \big(1 - \hat{\pi}_t(n_t)\big)^{n_t} \,\Big|\, q_1 = 1, n_1 = n \Big].$$

Similarly, the optimization problem for the realistic environment can be formulated as

(P2) $\max_{\pi \in \Pi} \mathrm{TDR}_{\pi}$,

where $\mathrm{TDR}_{\pi}$ is the TDR under the control scheme $\pi$, i.e.,

$$\mathrm{TDR}_{\pi} \triangleq \mathbb{E}_{\pi}\Big[ \sum_{t \in \mathcal{T}, q_t = 1} \sigma \pi_t(b_t) \big(1 - \pi_t(b_t)\big)^{n_t} \,\Big|\, q_1 = 1, b_1 = h_\lambda \Big].$$

The objective of Sections III–IV is to seek optimal control schemes for maximizing the TDR under the idealized and realistic environments, respectively.
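The TDR of a given scheme can also be estimated by simulation. The sketch below is our own illustration (the function and parameter names are ours): it estimates $\mathrm{TDR}_{\hat{\pi}}$ for an idealized-environment scheme `policy(t, n)` by sampling $n_1$ from its binomial distribution and replaying the frame, with a static scheme plugged in purely for illustration.

```python
import random

def estimate_tdr(N, D, lam, sigma, policy, trials, rng):
    """Monte Carlo estimate of TDR for an idealized-environment scheme
    policy(t, n) -> p, where n is the number of OTHER active nodes
    known to the tagged node at the beginning of slot t."""
    delivered = 0
    for _ in range(trials):
        # Condition on q_1 = 1; n_1 is Binomial(N - 1, lam).
        n = sum(rng.random() < lam for _ in range(N - 1))
        tagged_active = True
        for t in range(1, D + 1):
            if not tagged_active:
                break
            p = policy(t, n)
            others_tx = sum(rng.random() < p for _ in range(n))
            if rng.random() < p:  # the tagged node transmits
                # Delivery succeeds iff no other node transmits in the
                # same slot and the channel succeeds (probability sigma).
                if others_tx == 0 and rng.random() < sigma:
                    delivered += 1
                tagged_active = False  # no retransmissions
            n -= others_tx  # nodes that transmitted become inactive
    return delivered / trials

rng = random.Random(1)
static = lambda t, n: 0.25  # a static scheme, purely for illustration
tdr = estimate_tdr(N=8, D=6, lam=0.6, sigma=0.9, policy=static,
                   trials=20000, rng=rng)
print(tdr)
```

Replacing `static` with a time- and intensity-dependent `policy` gives the estimate for any dynamic scheme $\hat{\pi}$.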

III. MDP FRAMEWORK FOR THE IDEALIZED ENVIRONMENT

In this section, we formulate the random access control problem in the idealized environment as an MDP, use the expected total reward of this MDP to evaluate the TDR, and obtain an optimal control scheme that maximizes the TDR.

A. MDP Formulation

From the dynamic control scheme specified in Section II-B, we see that a node that is active at the beginning of slot $t$ becomes inactive at the beginning of slot $t+1$ with the transmission probability $p_t$, and a node that is inactive remains inactive thereafter. This implies that the probability of moving to the next state in the state process $(q_t, n_t)_{t \in \mathcal{T}}$ depends only on the current state. Thus, $(q_t, n_t)_{t \in \mathcal{T}}$ can be viewed as a discrete-time, finite-horizon, finite-state Markov chain.

Based on the Markov chain $(q_t, n_t)_{t \in \mathcal{T}}$, we present an MDP formulation by introducing the following definitions.

Actions: At the beginning of each slot $t \in \mathcal{T}$ with $q_t = 1$, the action performed by the tagged node (and the other active nodes) is the chosen transmission probability $p_t$, taking values in the action space $[0,1]$. Note that the tagged node performs no action when $q_t = 0$.

State Transition Function: As the tagged node never transmits from slot $t$ onward if $q_t = 0$, we only consider the state transition function when $q_t = 1$. The state transition function $\beta_t\big((q', n'), (1, n), p\big)$ is defined as the probability of moving from the state $(q_t, n_t) = (1, n)$ to $(q_{t+1}, n_{t+1}) = (q', n')$ when each active node at the beginning of slot $t$ adopts the transmission probability $p_t = p$. So, we have

$$\beta_t\big((q', n'), (1, n), p\big) \triangleq \Pr\big((q_{t+1}, n_{t+1}) = (q', n') \mid (q_t, n_t) = (1, n), p_t = p\big) = \begin{cases} \binom{n}{n-n'} p^{\,n-n'+1-q'} (1-p)^{n'+q'}, & \text{if } n' \leq n, \\ 0, & \text{otherwise}, \end{cases} \quad (1)$$

for each $t \in \mathcal{T} \setminus \{D\}$, each $q' \in \{0, 1\}$, each $n, n' \in \mathcal{N}$, and each $p \in [0,1]$.

Rewards: The reward gained at slot $t$ is defined as the average number of packets of the tagged node transmitted successfully to an arbitrary other node at slot $t$. As there is no reward at slot $t$ when $q_t = 0$, we only focus on the cases with $q_t = 1$. Let $r_t\big((1, n), p\big)$ denote the reward at slot $t$ for the state $(q_t, n_t) = (1, n)$ when each active node at the beginning of slot $t$ adopts $p_t = p$. So, we have

$$r_t\big((1, n), p\big) = \sigma p (1-p)^n, \quad (2)$$

for each $t \in \mathcal{T}$, each $n \in \mathcal{N}$, and each $p \in [0,1]$.
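Eqs. (1) and (2) translate directly into code. The sketch below is our own illustration: it evaluates $\beta_t$ and $r_t$ and checks that the transition probabilities out of a state sum to one.

```python
from math import comb

def beta(q_next, n_next, n, p):
    """Eq. (1): from state (1, n), n - n_next of the n other active
    nodes transmit (each with probability p), and the tagged node
    transmits iff q_next == 0."""
    if n_next > n:
        return 0.0
    others = comb(n, n - n_next) * p ** (n - n_next) * (1 - p) ** n_next
    tagged = (1 - p) if q_next == 1 else p
    return others * tagged

def reward(n, p, sigma):
    """Eq. (2): the tagged node transmits while none of the n other
    active nodes do, and the channel succeeds."""
    return sigma * p * (1 - p) ** n

# Sanity check: the transition probabilities out of (1, 5) sum to one.
total = sum(beta(q, m, 5, 0.3) for q in (0, 1) for m in range(6))
print(round(total, 10))  # 1.0
```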

Policy: A dynamic control scheme $\hat{\pi}$ defined in Section II-B can be seen as a deterministic Markovian policy.

Let $R_{\hat{\pi}}(1, n)$ denote the expected total reward from slot 1 to slot $D$ when $q_1 = 1$, $n_1 = n$, and the policy $\hat{\pi}$ is used, which is defined by

$$R_{\hat{\pi}}(1, n) \triangleq \mathbb{E}_{\hat{\pi}}\Big[ \sum_{t=1, q_t=1}^{D} r_t\big((1, n_t), \hat{\pi}_t(n_t)\big) \,\Big|\, q_1 = 1, n_1 = n \Big].$$

Obviously, $\mathrm{TDR}_{\hat{\pi}} = \sum_{n \in \mathcal{N}} \binom{N-1}{n} \lambda^n (1-\lambda)^{N-1-n} R_{\hat{\pi}}(1, n)$.

B. MDP Solution

Due to the finite horizon, finite state space, compact action space, bounded rewards, and the continuity of the rewards and of the state transition function with respect to $p$ in our MDP formulation, [31, Prop. 4.4.3, Ch. 4] indicates that for maximizing $\mathrm{TDR}_{\hat{\pi}}$, there exists a $\hat{\pi} \in \hat{\Pi}$ that is indeed optimal over all randomized and deterministic, history-dependent and Markovian policies. This property also justifies the design goal for the idealized environment considered in Section II. Hence, we aim to seek

$$\hat{\pi}^* \in \arg\max_{\hat{\pi} \in \hat{\Pi}} R_{\hat{\pi}}(1, n), \quad \forall n \in \mathcal{N}.$$

Let $U^*_t(1, n)$ denote the value function corresponding to the maximum expected total reward from slot $t$ to slot $D$ when $q_t = 1$ and $n_t = n$. Averaging over all possible next states with $q_{t+1} = 1$, we arrive at the following Bellman equation:

$$U^*_D(1, n) = \max_{p \in [0,1]} r_D\big((1, n), p\big), \quad \forall n \in \mathcal{N},$$
$$U^*_t(1, n) = \max_{p \in [0,1]} \Big[ r_t\big((1, n), p\big) + \sum_{n' \in \mathcal{N}} \beta_t\big((1, n'), (1, n), p\big)\, U^*_{t+1}(1, n') \Big], \quad \forall n \in \mathcal{N}, \quad (3)$$

for each $t \in \mathcal{T} \setminus \{D\}$.

Applying the backward induction algorithm to solve Eq. (3) involves finding the global maximizers of $ND$ real-coefficient univariate polynomials defined on $[0,1]$, and formally yields $\hat{\pi}^*$. This indicates that even if the contention intensity is completely known, the computation required to obtain an optimal scheme is still demanding in practice.
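The backward induction just described can be sketched as follows. For simplicity, this illustration (ours, not the paper's implementation) maximizes over a discrete grid of transmission probabilities instead of performing the polynomial root finding, so it only approximates $\hat{\pi}^*$; the grid size and parameter values are arbitrary.

```python
from math import comb

def solve_mdp(N, D, sigma, grid=201):
    """Approximate backward induction for Eq. (3): U[t][n] is the value
    with n other active nodes at slot t + 1, policy[t][n] the maximizer."""
    ps = [i / (grid - 1) for i in range(grid)]

    def beta(q_next, n_next, n, p):  # Eq. (1)
        if n_next > n:
            return 0.0
        return (comb(n, n - n_next) * p ** (n - n_next)
                * (1 - p) ** n_next * ((1 - p) if q_next == 1 else p))

    def reward(n, p):  # Eq. (2)
        return sigma * p * (1 - p) ** n

    U = [[0.0] * N for _ in range(D + 1)]   # U[D][.] = 0 (terminal)
    policy = [[0.0] * N for _ in range(D)]
    for t in range(D - 1, -1, -1):          # slots D, D - 1, ..., 1
        for n in range(N):
            best_p, best_v = 0.0, -1.0
            for p in ps:
                v = reward(n, p) + sum(
                    beta(1, m, n, p) * U[t + 1][m] for m in range(n + 1))
                if v > best_v:
                    best_p, best_v = p, v
            policy[t][n], U[t][n] = best_p, best_v
    return policy, U

policy, U = solve_mdp(N=8, D=6, sigma=0.9)
print(policy[5][0])  # 1.0: last slot, no competitors, so transmit surely
```

In the last slot, Eq. (3) reduces to maximizing $\sigma p (1-p)^n$, so the grid recovers $p = 1/(n+1)$, e.g., $p = 0.5$ for one competitor.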

IV. POMDP FRAMEWORK FOR THE REALISTIC ENVIRONMENT

In this section, we formulate the random access control problem in the realistic environment as a POMDP, use the expected total reward of this POMDP to evaluate the TDR, and discuss how to obtain an optimal or near-optimal scheme.

A. POMDP Formulation

Based on the Markov chain $(q_t, n_t)_{t \in \mathcal{T}}$ specified in Section III and the activity belief $b_t$ for each $t \in \mathcal{T}$ with $q_t = 1$ specified in Section II-B, we present a POMDP formulation by introducing the following definitions.

Actions, State Transition Function, Rewards: The definitions of these elements are the same as in Section III.

Observations and Observation Function: At the beginning of slot $t+1$, the tagged node can obtain an observation of the channel status of slot $t$, denoted by $o_t$. When $q_{t+1} = 0$, the tagged node never transmits from slot $t+1$ onward, so $o_t$ is useless. Hence, we only consider $o_t$ when $q_{t+1} = 1$, taking values in the observation space $\mathcal{O} \triangleq \{0 \text{ (idle)}, 1 \text{ (busy)}\}$. Further, the observation function $\omega_t\big(o, (1, n), (1, n')\big)$ is defined as the probability that the tagged node at the beginning of slot $t+1$ obtains the observation $o_t = o$ if $(q_t, n_t) = (1, n)$ and $(q_{t+1}, n_{t+1}) = (1, n')$. So, we have

$$\omega_t\big(o, (1, n), (1, n')\big) \triangleq \Pr\big(o_t = o \mid (q_t, n_t) = (1, n), (q_{t+1}, n_{t+1}) = (1, n')\big) = \begin{cases} 1, & \text{if } o = 0,\ n = n', \\ 1, & \text{if } o = 1,\ n - n' \geq 1, \\ 0, & \text{otherwise}, \end{cases}$$

for each $t \in \mathcal{T} \setminus \{D\}$, each $o \in \mathcal{O}$, and each $n, n' \in \mathcal{N}$.

Bayesian Update of the Activity Belief: It has been shown in [24] that for each $t \in \mathcal{T}$ with $q_t = 1$, the value of the activity belief $\mathbf{b}_t$ is a sufficient statistic for the initial activity belief, all past channel statuses, and all past transmission probabilities. First, by the total number of nodes $N$ and the packet generation probability $\lambda$, the tagged node can obtain

$$\mathbf{b}_1 = \mathbf{h}_\lambda \triangleq \big[(1-\lambda)^{N-1},\ (N-1)\lambda(1-\lambda)^{N-2},\ \ldots,\ \lambda^{N-1}\big]. \tag{4}$$

Then, for each $t \in \mathcal{T} \setminus \{D\}$, given the condition $q_t = q_{t+1} = 1$, the activity belief $\mathbf{b}_t = \mathbf{b}$, the observation $o_t = o$, and the transmission probability $p_t = p$ used at slot $t$, the tagged node at slot $t+1$ can obtain $\mathbf{b}_{t+1}$ via Bayes' rule:

$$\mathbf{b}_{t+1} \triangleq \theta_t(\mathbf{b}, p, o, 1, 1),$$

$$b_{t+1}(n') \triangleq \Pr\big(n_{t+1} = n' \mid \mathbf{b}_t = \mathbf{b}, p_t = p, o_t = o, q_t = q_{t+1} = 1\big) = \frac{\sum_{n \in \mathcal{N}} b(n)\, \omega_t\big(o, (1, n), (1, n')\big)\, \beta_t\big((1, n'), (1, n), p\big)}{\chi_t(o, \mathbf{b}, p, 1, 1)},$$

for each $n' \in \mathcal{N}$, where

$$\chi_t(o, \mathbf{b}, p, 1, 1) \triangleq \Pr\big(q_{t+1} = 1, o_t = o \mid q_t = 1, \mathbf{b}_t = \mathbf{b}, p_t = p\big) = \sum_{n \in \mathcal{N}} b(n) \sum_{n'' \in \mathcal{N}} \omega_t\big(o, (1, n), (1, n'')\big)\, \beta_t\big((1, n''), (1, n), p\big).$$
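As an illustration, the initial belief of Eq. (4) and the Bayesian update above can be sketched as follows. This is a sketch under an assumption: the transition kernel is taken to be $\beta_t((1,n'),(1,n),p) = \binom{n}{n-n'} p^{n-n'} (1-p)^{n'+1}$, the form used in the later proofs (Eq. (1)), which is not restated at this point in the text:

```python
import math

def initial_belief(N, lam):
    # Eq. (4): b_1(n) = C(N-1, n) lam^n (1-lam)^(N-1-n), n = 0..N-1
    return [math.comb(N - 1, n) * lam ** n * (1 - lam) ** (N - 1 - n)
            for n in range(N)]

def omega(o, n, n2):
    # observation function: idle (o=0) iff no other active node transmitted
    if o == 0:
        return 1.0 if n2 == n else 0.0
    return 1.0 if n - n2 >= 1 else 0.0

def beta(n2, n, p):
    # assumed transition kernel: C(n, n-n') p^(n-n') (1-p)^(n'+1)
    if n2 > n:
        return 0.0
    return math.comb(n, n - n2) * p ** (n - n2) * (1 - p) ** (n2 + 1)

def bayes_update(b, p, o):
    """theta_t(b, p, o, 1, 1): posterior over n_{t+1} given observation o."""
    N = len(b)
    post = [sum(b[n] * omega(o, n, n2) * beta(n2, n, p) for n in range(N))
            for n2 in range(N)]
    chi = sum(post)                 # chi_t(o, b, p, 1, 1)
    return [x / chi for x in post]
```

Note how an idle observation shifts the belief mass toward smaller $n$: an idle slot is more likely when few other nodes are active.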

Policy: A dynamic control scheme $\pi$ defined in Section II-B can be seen as a deterministic Markovian policy. Let $R_\pi(1, \mathbf{h}_\lambda)$ denote the expected total reward from slot 1 to slot $D$ when $q_1 = 1$, $\mathbf{b}_1 = \mathbf{h}_\lambda$ and the policy $\pi$ is used, which can be defined by

$$R_\pi(1, \mathbf{h}_\lambda) \triangleq \mathbb{E}_\pi\Big[\sum_{t=1, q_t=1}^{D} r_t\big((1, n_t), \pi_t(\mathbf{b}_t)\big) \,\Big|\, q_1 = 1, \mathbf{b}_1 = \mathbf{h}_\lambda\Big].$$

Obviously, $\text{TDR}_\pi = R_\pi(1, \mathbf{h}_\lambda)$.

B. POMDP solution

Due to the finite horizon, finite state space, compact action space, bounded rewards, and the continuity of the rewards and of $\chi_t(o, \mathbf{b}, p, 1, 1)$ with respect to $p$ in our POMDP formulation, [31, Prop. 4.4.3, Ch. 4] and [32, Thm. 7.1, Ch. 6] indicate that for maximizing $\text{TDR}_\pi$ there exists a $\pi \in \Pi$ that is optimal over all types of policies. This property justifies the design goal for the realistic environment considered in Section II. Hence, we aim to seek

$$\pi^* \in \arg\max_{\pi \in \Pi} R_\pi(1, \mathbf{h}_\lambda).$$


Let $V^*_t(1, \mathbf{b})$ denote the value function corresponding to the maximum total expected reward from slot $t$ to slot $D$ when $q_t = 1$ and $\mathbf{b}_t = \mathbf{b}$. Averaging over all possible current states with $q_t = 1$ and observations with $q_{t+1} = 1$, we arrive at the following Bellman equation:

$$V^*_D(1, \mathbf{b}) = \max_{p \in [0,1]} \sum_{n \in \mathcal{N}} b(n)\, r_D\big((1, n), p\big), \quad \forall \mathbf{b} \in \mathcal{B}_D,$$

$$V^*_t(1, \mathbf{b}) = \max_{p \in [0,1]} \Big[\sum_{n \in \mathcal{N}} b(n)\, r_t\big((1, n), p\big) + \sum_{o \in \mathcal{O}} \chi_t(o, \mathbf{b}, p, 1, 1)\, V^*_{t+1}\big(1, \theta_t(\mathbf{b}, p, o, 1, 1)\big)\Big], \quad \forall \mathbf{b} \in \mathcal{B}_t, \tag{5}$$

for each $t \in \mathcal{T} \setminus \{D\}$. Solving Eq. (5) formally leads to $\pi^*$.

Unfortunately, getting $\pi^*$ by solving Eq. (5) is computationally intractable, as both the belief state space $\bigcup_{t \in \mathcal{T}} \mathcal{B}_t$ and the action space $[0, 1]$ are infinite in our POMDP formulation. As such, an alternative is to consider a discretized action space $\mathcal{A}_d$ that only consists of uniformly spaced samples of the interval $[0, 1]$, i.e., $\mathcal{A}_d \triangleq \{0, \Delta p, 2\Delta p, \ldots, 1\}$, where $\Delta p$ denotes the sampling interval. Then $\mathcal{B}_t$ becomes finite for each $t \in \mathcal{T}$ due to the finite $\mathcal{A}_d$. Theoretically, applying the backward induction algorithm [24] to Eq. (5) can then lead to a near-optimal policy, whose loss of optimality increases with $\Delta p$. However, this approach is still computationally prohibitive due to the super-exponential growth in the value-function complexity.

V. A HEURISTIC SCHEME FOR THE REALISTIC ENVIRONMENT

To overcome the infeasibility in obtaining an optimal or

near-optimal control scheme for the realistic environment from

the POMDP framework, in this section, we propose a simple

heuristic control scheme that utilizes the key properties of our

problem. It will be shown in Section VII that the heuristic

scheme performs quite well in simulations.

A. Heuristics from the idealized environment

We first investigate the behaviors of $\hat{\pi}^*$ for two extreme cases in the idealized environment, which serve to provide important clues on approximating $\hat{\pi}^*$. Let $U_t\big((1, n), p\big)$ denote the total expected reward from slot $t$ to slot $D$ for the state $(q_t, n_t) = (1, n)$ when each active node at the beginning of slot $t$ adopts the transmission probability $p_t = p$ and the optimal decision rules at slots $t+1, t+2, \ldots, D$. So we have $U_t\big((1, n), \hat{\pi}^*_t(n)\big) = U^*_t(1, n)$.

Lemma 1. When $n_1 = m \to \infty$, by assuming that each collision involves a finite number of packets, for each $t \in \mathcal{T}$ and each possible $n_t = n$, we have

$$\lim_{m \to \infty} (n+1)\, U_t\Big((1, n), \frac{1}{n+1}\Big) = \frac{(D-t+1)\sigma}{e}, \tag{6}$$

$$\lim_{m \to \infty} (n+1)\, U^*_t(1, n) = \frac{(D-t+1)\sigma}{e}. \tag{7}$$

Proof. Assume each collision involves at most a finite number, $k \ge 2$, of packets. We begin with the case $t = D$. By Eq. (2), we know $U_D\big((1, n), p\big) = \sigma p (1-p)^n$ and thus $\hat{\pi}^*_D(n) = \frac{1}{n+1}$. As $k$ and $D$ are both finite, for each $n_D = n \in \{m - k(D-1), m - k(D-1) + 1, \ldots, m\}$, we obtain that $m \to \infty$ implies $n \to \infty$ and then

$$\lim_{m \to \infty} (n+1)\, U^*_D(1, n) = \lim_{m \to \infty} (n+1)\, U_D\Big((1, n), \frac{1}{n+1}\Big) = \frac{\sigma}{e}. \tag{8}$$

Consider the case $t = D-1$. By the finite-horizon policy evaluation algorithm [31] and Eqs. (1), (2), for each $n_{D-1} = n \in \{m - k(D-2), m - k(D-2) + 1, \ldots, m\}$, we have

$$(n+1)\, U_{D-1}\big((1, n), p\big) = (n+1)\, r_{D-1}\big((1, n), p\big) + (n+1) \sum_{n' \in \mathcal{N}} \beta_{D-1}\big((1, n'), (1, n), p\big)\, U^*_D(1, n')$$
$$= \sigma (n+1) p (1-p)^n + \sum_{n' \in \mathcal{N}} (n+1) \frac{n!}{n'!\,(n-n')!}\, p^{n-n'} (1-p)^{n'+1}\, U^*_D(1, n')$$
$$= \sigma (n+1) p (1-p)^n + \sum_{n' \in \mathcal{N}} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1}\, (n'+1)\, U^*_D(1, n').$$

By the assumption that each collision involves at most a finite number, $k \ge 2$, of packets, we have

$$(n+1)\, U_{D-1}\big((1, n), p\big) = \sigma (n+1) p (1-p)^n + \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} (n'+1)\, U^*_D(1, n') + \Big[1 - \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1}\Big] (n-k+1)\, U^*_D(1, n-k) \tag{9}$$

$$\le \sigma \Big(1 - \frac{1}{n+1}\Big)^n + (n-k+1)\, U^*_D(1, n-k) + \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} \Big[(n'+1)\, U^*_D(1, n') - (n-k+1)\, U^*_D(1, n-k)\Big]. \tag{10}$$

For each $n' \in \{n-k+1, n-k+2, \ldots, n\}$, since $0 \le \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} \le 1$, by applying the squeeze theorem we obtain from Eq. (8) that

$$\lim_{m \to \infty} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} \Big[(n'+1)\, U^*_D(1, n') - (n-k+1)\, U^*_D(1, n-k)\Big] = 0. \tag{11}$$

By Eqs. (8), (11) and inequality (10), as $k$ and $D$ are both finite, we further obtain that $m \to \infty$ implies $n \to \infty$ and

$$\limsup_{m \to \infty}\, (n+1)\, U_{D-1}\big((1, n), p\big) \le \limsup_{m \to \infty} \Big[\sigma \Big(1 - \frac{1}{n+1}\Big)^n + (n-k+1)\, U^*_D(1, n-k)\Big] = \lim_{m \to \infty} \Big[\sigma \Big(1 - \frac{1}{n+1}\Big)^n + (n-k+1)\, U^*_D(1, n-k)\Big] = \frac{2\sigma}{e},$$

which implies

$$\limsup_{m \to \infty}\, (n+1)\, U^*_{D-1}(1, n) \le \frac{2\sigma}{e}. \tag{12}$$

By setting $\pi_{D-1}(n) = \frac{1}{n+1}$ for each $n_{D-1} = n \in \{m - k(D-2), m - k(D-2) + 1, \ldots, m\}$, as $k$ and $D$ are both finite, we obtain that $m \to \infty$ implies $n \to \infty$, and then obtain from Eqs. (8), (9) and (11) that

$$\lim_{m \to \infty} (n+1)\, U_{D-1}\Big((1, n), \frac{1}{n+1}\Big) = \lim_{m \to \infty} \Big[\sigma \Big(1 - \frac{1}{n+1}\Big)^n + (n-k+1)\, U^*_D(1, n-k)\Big] = \frac{2\sigma}{e}.$$

Since $U^*_{D-1}(1, n) \ge U_{D-1}\big((1, n), \frac{1}{n+1}\big)$, we have

$$\liminf_{m \to \infty}\, (n+1)\, U^*_{D-1}(1, n) \ge \liminf_{m \to \infty}\, (n+1)\, U_{D-1}\Big((1, n), \frac{1}{n+1}\Big) = \frac{2\sigma}{e}. \tag{13}$$

Combining inequalities (12) and (13), we have $\lim_{m \to \infty} (n+1)\, U^*_{D-1}(1, n) = \frac{2\sigma}{e}$.

For the cases of $t = D-2, D-3, \ldots, 1$, iteratively repeating the above argument leads to Eqs. (6) and (7) for each possible $n_t = n$.
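The base-case limit in Eq. (8) is easy to check numerically: $(n+1)\, U_D\big((1,n), \frac{1}{n+1}\big) = \sigma (1 - \frac{1}{n+1})^n$ approaches $\sigma/e$ from above as $n$ grows. A minimal check:

```python
import math

def scaled_reward(n, sigma):
    # (n+1) * U_D((1,n), 1/(n+1)) with U_D((1,n), p) = sigma * p * (1-p)^n
    return (n + 1) * sigma * (1 / (n + 1)) * (1 - 1 / (n + 1)) ** n

# converges to sigma / e from above as n grows
for n in (10, 100, 10000):
    print(n, scaled_reward(n, 0.9), 0.9 / math.e)
```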

Lemma 1 motivates us to conjecture that, if $n_1$ takes a value sufficiently larger than $D$, the realizations of $(n_t+1)\hat{\pi}^*_t(n_t)$ would always approach 1 for each $t \in \mathcal{T}$. Fig. 3 shows 1000 such realizations when $D = 10$ for $n_1 = 30, 50, 100$, respectively, which confirm our conjecture.

Fig. 3: Realizations of $(n_t+1)\hat{\pi}^*_t(n_t)$ when $n_1 = 30, 50, 100$ and $D = 10$.

We further investigate the behaviors of $\hat{\pi}^*$ for the extreme case that $n_1$ takes a value sufficiently smaller than $D$.

Lemma 2. For each $t \in \mathcal{T}$, we have

$$U^*_t(1, 1) = \frac{3D - 3t + 1}{3D - 3t + 4}\,\sigma, \tag{14}$$

and for each $t \in \mathcal{T} \setminus \{D\}$, we have

$$\hat{\pi}^*_t(1) = \frac{3}{3D - 3t + 4}. \tag{15}$$

Proof. As $U^*_t(1, 0) = \sigma$ for each $t \in \mathcal{T}$, we have

$$U_t\big((1, 1), p\big) = 2\sigma p (1-p) + (1-p)^2\, U^*_{t+1}(1, 1), \tag{16}$$

for each $t \in \mathcal{T} \setminus \{D\}$. Taking the derivative of $U_t\big((1, 1), p\big)$ with respect to $p$ yields

$$\frac{d}{dp} U_t\big((1, 1), p\big) = \big[2\sigma - 2U^*_{t+1}(1, 1)\big] - \big[4\sigma - 2U^*_{t+1}(1, 1)\big] p.$$

As $\sigma > 0$ and $U^*_t(1, 1) \le \sigma$ for each $t \in \mathcal{T} \setminus \{D\}$, we have

$$\hat{\pi}^*_t(1) = \frac{\sigma - U^*_{t+1}(1, 1)}{2\sigma - U^*_{t+1}(1, 1)}, \tag{17}$$

for each $t \in \mathcal{T} \setminus \{D\}$. In particular, as $U^*_D(1, 1) = \sigma/4$, we obtain $\hat{\pi}^*_{D-1}(1) = 3/7$, which satisfies Eq. (15).

Then, we investigate the relation between $\hat{\pi}^*_t(1)$ and $\hat{\pi}^*_{D-1}(1)$ for each $t \in \mathcal{T} \setminus \{D-1, D\}$. By setting $p = \hat{\pi}^*_t(1)$ in Eq. (16), we obtain

$$U^*_t(1, 1) = 2\sigma\, \hat{\pi}^*_t(1)\big(1 - \hat{\pi}^*_t(1)\big) + \big(1 - \hat{\pi}^*_t(1)\big)^2\, U^*_{t+1}(1, 1) = \frac{\sigma^2}{2\sigma - U^*_{t+1}(1, 1)}. \tag{18}$$

Using Eq. (17) to express $U^*_{t+1}(1, 1)$ and $U^*_t(1, 1)$ in Eq. (18) in terms of $\hat{\pi}^*_t(1)$ and $\hat{\pi}^*_{t-1}(1)$, respectively, we have

$$\hat{\pi}^*_t(1) = \frac{\hat{\pi}^*_{t+1}(1)}{1 + \hat{\pi}^*_{t+1}(1)}, \tag{19}$$

for each $t \in \mathcal{T} \setminus \{D-1, D\}$. Furthermore, recursively using Eq. (19) yields

$$\hat{\pi}^*_t(1) = \frac{\hat{\pi}^*_{D-1}(1)}{1 + (D-t-1)\,\hat{\pi}^*_{D-1}(1)}, \tag{20}$$

and thus implies Eq. (15) by $\hat{\pi}^*_{D-1}(1) = 3/7$.

Finally, combining Eqs. (15) and (17) yields

$$U^*_t(1, 1) = \frac{1 - 2\hat{\pi}^*_{t-1}(1)}{1 - \hat{\pi}^*_{t-1}(1)}\,\sigma = \frac{3D - 3t + 1}{3D - 3t + 4}\,\sigma, \tag{21}$$

for each $t \in \mathcal{T} \setminus \{1\}$, and substituting Eq. (21) into Eq. (18) yields $U^*_1(1, 1) = \frac{3D-2}{3D+1}\sigma$. Hence we complete the proof of Eq. (14).
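The closed form of Eq. (14) can be cross-checked against the recursion of Eq. (18), starting from $U^*_D(1,1) = \sigma/4$; a minimal sketch:

```python
def u_star_11(D, sigma):
    """U*_t(1,1) computed backward via the recursion of Eq. (18):
    U*_D(1,1) = sigma/4 and U*_t(1,1) = sigma^2 / (2*sigma - U*_{t+1}(1,1))."""
    U = {D: sigma / 4}
    for t in range(D - 1, 0, -1):
        U[t] = sigma ** 2 / (2 * sigma - U[t + 1])
    return U

def u_star_11_closed(t, D, sigma):
    # closed form of Eq. (14)
    return (3 * D - 3 * t + 1) / (3 * D - 3 * t + 4) * sigma
```

Running both for any horizon confirms that the recursion and the closed form agree slot by slot.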

Inspired by Eq. (15), we consider a simple control scheme $\pi^{\text{eve}} \triangleq [\pi^{\text{eve}}_1, \pi^{\text{eve}}_2, \ldots, \pi^{\text{eve}}_D] \in \hat{\Pi}$, where

$$\pi^{\text{eve}}_t(n) = \frac{1}{D-t+1}, \tag{22}$$

for each $t \in \mathcal{T}$ and each $n \in \mathcal{N}$.

Let $U^{\text{eve}}_t(1, n)$ denote the expected total reward from slot $t$ to slot $D$ for the state $(q_t, n_t) = (1, n)$ when each active node adopts the decision rules $\pi^{\text{eve}}_t$ at slots $t, t+1, \ldots, D$. So, using the finite-horizon policy evaluation algorithm [31], we have

$$U^{\text{eve}}_D(1, n) = r_D\big((1, n), \pi^{\text{eve}}_D(n)\big), \quad \forall n \in \mathcal{N},$$

$$U^{\text{eve}}_t(1, n) = r_t\big((1, n), \pi^{\text{eve}}_t(n)\big) + \sum_{n' \in \mathcal{N}} \beta_t\big((1, n'), (1, n), \pi^{\text{eve}}_t(n)\big)\, U^{\text{eve}}_{t+1}(1, n'), \quad \forall n \in \mathcal{N}, \tag{23}$$

for each $t \in \mathcal{T} \setminus \{D\}$.

Lemma 3. For each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$, we have

$$U^{\text{eve}}_t(1, n) = \sigma\Big(1 - \frac{1}{D-t+1}\Big)^n. \tag{24}$$


Proof. We shall prove $U^{\text{eve}}_t(1, n) = \sigma\big(1 - \frac{1}{D-t+1}\big)^n$ for each $n \in \mathcal{N}$ by induction from $t = D-1$ down to $t = 1$.

First, when $t = D-1$, by Eqs. (1), (2) and (23), we have

$$U^{\text{eve}}_{D-1}(1, n) = r_{D-1}\big((1, n), \pi^{\text{eve}}_{D-1}(n)\big) + \sum_{n' \in \mathcal{N}} \beta_{D-1}\big((1, n'), (1, n), \pi^{\text{eve}}_{D-1}(n)\big)\, U^{\text{eve}}_D(1, n') = \sigma\,\frac{1}{2}\Big(1 - \frac{1}{2}\Big)^n + \Big(1 - \frac{1}{2}\Big)^{n+1} U^{\text{eve}}_D(1, 0) = \sigma\Big(1 - \frac{1}{2}\Big)^n,$$

for each $n \in \mathcal{N}$, thereby establishing the induction basis.

When $t \in \mathcal{T} \setminus \{D-1, D\}$, assume $U^{\text{eve}}_{t+1}(1, n) = \sigma\big(1 - \frac{1}{D-t}\big)^n$ for each $n \in \mathcal{N}$. By Eqs. (1), (2) and (23), we have

$$U^{\text{eve}}_t(1, n) = r_t\big((1, n), \pi^{\text{eve}}_t(n)\big) + \sum_{n' \in \mathcal{N}} \beta_t\big((1, n'), (1, n), \pi^{\text{eve}}_t(n)\big)\, U^{\text{eve}}_{t+1}(1, n')$$
$$= \sigma\,\frac{1}{D-t+1}\Big(1 - \frac{1}{D-t+1}\Big)^n + \sum_{n' \in \mathcal{N}} \binom{n}{n-n'} \Big(\frac{1}{D-t+1}\Big)^{n-n'} \Big(1 - \frac{1}{D-t+1}\Big)^{n'+1} \sigma\Big(1 - \frac{1}{D-t}\Big)^{n'}$$
$$= \sigma\Big(1 - \frac{1}{D-t+1}\Big)^n \frac{1}{D-t+1} + \sigma\Big(1 - \frac{1}{D-t+1}\Big)^n \frac{D-t}{D-t+1} \sum_{n' \in \mathcal{N}} \binom{n}{n-n'} \Big(\frac{1}{D-t}\Big)^{n-n'} \Big(1 - \frac{1}{D-t}\Big)^{n'}$$
$$= \sigma\Big(1 - \frac{1}{D-t+1}\Big)^n,$$

for each $n \in \mathcal{N}$. So the inductive step is established.

Since both the base case and the inductive step have been proved true, we have $U^{\text{eve}}_t(1, n) = \sigma\big(1 - \frac{1}{D-t+1}\big)^n$ for each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$.
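Eq. (24) can likewise be verified by running the policy evaluation of Eq. (23) directly. A sketch, assuming the kernel form $\beta_t((1,n'),(1,n),p) = \binom{n}{n-n'} p^{n-n'} (1-p)^{n'+1}$ of Eq. (1); note that $\pi^{\text{eve}}_D(n) = 1$, so $U^{\text{eve}}_D(1,n) = \sigma$ only for $n = 0$:

```python
import math

def u_eve_slot1(D, sigma, n_max):
    """Evaluate pi_eve by backward recursion (Eq. (23)); returns the list
    of U_eve_1(1, n) for n = 0..n_max."""
    U = [sigma if n == 0 else 0.0 for n in range(n_max + 1)]   # slot D
    for t in range(D - 1, 0, -1):
        p = 1 / (D - t + 1)                                    # Eq. (22)
        U = [sigma * p * (1 - p) ** n
             + sum(math.comb(n, n - n2) * p ** (n - n2)
                   * (1 - p) ** (n2 + 1) * U[n2]
                   for n2 in range(n + 1))
             for n in range(n_max + 1)]
    return U
```

The result matches the closed form $\sigma(1 - 1/D)^n$ at $t = 1$ for every $n$.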

For each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$, based on the fact $U^*_t(1, n) \le \sigma$, we have

$$\frac{U^{\text{eve}}_t(1, n)}{U^*_t(1, n)} \ge \Big(1 - \frac{1}{D-t+1}\Big)^n. \tag{25}$$

We can observe from Eq. (25) that, if $n$ is sufficiently smaller than $D-t+1$, the value of $U^*_t(1, n)$ is close to the value of $U^{\text{eve}}_t(1, n)$. Eq. (25) thus motivates us to conjecture that, if $n_t$ takes a value sufficiently smaller than $D-t+1$, $\pi^{\text{eve}}_t$ may behave like $\hat{\pi}^*_t$, i.e., the realizations of $(D-t+1)\hat{\pi}^*_t(n_t)$ would always approach $(D-t+1)\pi^{\text{eve}}_t(n_t) = 1$ for each $n_t = n \in \mathcal{N}$. Fig. 4 shows 1000 such realizations when $n_1 = 10$ for $D = 30, 50, 100$, respectively, which confirm our conjecture.

Naturally, we obtain the following heuristics from Lemma

1 and Eq. (25), respectively.

1) From Lemma 1: When the number of active nodes is sufficiently large compared with the number of remaining slots, it is desirable for the active nodes to adopt the transmission probability that maximizes the instantaneous throughput. This implies that, when the remaining slots are not enough, the active nodes should utilize the channel as much as possible, which is time-independent.

Fig. 4: Realizations of $(D-t+1)\hat{\pi}^*_t(n_t)$ when $n_1 = 10$ and $D = 30, 50, 100$.

Fig. 5: $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$.

2) From Eq. (25): When the number of active nodes is sufficiently small compared with the number of remaining slots, it is desirable for the active nodes to adopt a transmission probability ensuring that all the backlogged packets are almost evenly transmitted over the remaining slots. This implies that, when the remaining slots are enough, the active nodes should cherish the transmission chance in order to utilize the time as much as possible, which is time-dependent.

Based on these heuristics and the obvious fact $\hat{\pi}^*_D(n) = \frac{1}{n+1}$ for each $n \in \mathcal{N}$, we propose a simple approximation on $\hat{\pi}^*$.

Approximation on $\hat{\pi}^*$: For each slot $t \in \mathcal{T}$ and each $n_t = n \in \mathcal{N}$, if the number of active nodes $n+1$ is larger than the number of remaining slots $D-t+1$ or $t = D$, then $\hat{\pi}^*_t(n)$ can be estimated by $\frac{1}{n+1}$; otherwise, $\hat{\pi}^*_t(n)$ can be estimated by $\frac{1}{D-t+1}$.
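The two-regime rule can be written down directly; a minimal sketch, where the comparison $n+1 > D-t+1$ decides which regime applies:

```python
def approx_pi_star(t, n, D):
    """Approximation on pi*_t(n): contention-limited regime vs.
    deadline-limited regime."""
    if n + 1 > D - t + 1 or t == D:
        return 1 / (n + 1)        # maximize instantaneous throughput
    return 1 / (D - t + 1)        # spread transmissions over remaining slots
```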

Fig. 5 compares $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$. It is shown that the approximation error is very small when the difference between $n$ and $D-t$ is large. This is because $\hat{\pi}^*_t(n)$ is dominated by $n$ if $n$ is much larger than $D-t$, but is dominated by $t$ if $n$ is much smaller than $D-t$, which is consistent with the heuristics. However, it is shown that the approximation error is noticeable

Fig. 6: The total expected rewards from slot $t$ to $D$ corresponding to $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$.

when the difference between $n$ and $D-t$ is small. This is because both $n$ and $t$ have notable impacts on $\hat{\pi}^*_t(n)$ in this case, which is not considered in the heuristics. The results also show that the ratio of the cases with approximation error larger than 8% is 6.67% and the largest approximation error is 11.47%, thus justifying our approximation. Furthermore, Fig. 6 compares the total expected rewards from slot $t$ to $D$ corresponding to $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$. The results show that the approximation leads to at most 0.66% reward loss (much smaller than the approximation error), and thus verify that our approximation can be used to obtain a TDR quite close to the maximum achievable TDR in the idealized environment.

B. A simple approximation on the activity belief of the realistic environment

To apply the approximation on $\hat{\pi}^*$ to the realistic environment, it is necessary for each active node to perform a runtime update of the activity belief $\mathbf{b}_t$. However, as shown in Section IV-A, the full Bayesian update of $\mathbf{b}_t$ is computationally demanding to implement. So, we shall propose a simple approximation on $\mathbf{b}_t$, denoted by $\mathbf{b}^{\text{bd}}_t \triangleq \big[b^{\text{bd}}_t(0), b^{\text{bd}}_t(1), \ldots, b^{\text{bd}}_t(N-1)\big]$, relying on a binomial distribution with a changeable parameter vector $(M_t, \alpha_t)$. More specifically, if $(M_t, \alpha_t) = (M, \alpha)$, we have

$$b^{\text{bd}}_t(n) = \begin{cases} \binom{M}{n} \alpha^n (1-\alpha)^{M-n}, & \text{if } 0 \le n \le M, \\ 0, & \text{otherwise}. \end{cases} \tag{26}$$

In this manner, each active node will only keep the parameter vector $(M_t, \alpha_t)$ rather than the activity belief $\mathbf{b}_t$. Obviously, by Eq. (4), we can set $(M_1, \alpha_1) = (N-1, \lambda)$ to achieve $\mathbf{b}_1 = \mathbf{b}^{\text{bd}}_1$. Then, for each $t \in \mathcal{T} \setminus \{D\}$, we will show that we can use Bayes' rule exactly to set the value of $(M_{t+1}, \alpha_{t+1})$ when the observation $o_t = 0$, but must introduce an approximation assumption when $o_t = 1$.

For each $t \in \mathcal{T} \setminus \{D\}$, given $(M_t, \alpha_t) = (M, \alpha)$, $\mathbf{b}^{\text{bd}}_t = \mathbf{b}^{\text{bd}}$ and $p_t = p$, the following procedure first uses Bayes' rule to compute $\mathbf{b}^{\text{med}}_{t+1} \triangleq \big[b^{\text{med}}_{t+1}(0), b^{\text{med}}_{t+1}(1), \ldots, b^{\text{med}}_{t+1}(N-1)\big]$, and then computes $(M_{t+1}, \alpha_{t+1})$ based on the value of $\mathbf{b}^{\text{med}}_{t+1}$.

Case 1: if $o_t = 0$, the Bayesian update yields

$$b^{\text{med}}_{t+1}(n') = \frac{\sum_{n \in \mathcal{N}} b^{\text{bd}}(n)\, \omega_t\big(0, (1, n), (1, n')\big)\, \beta_t\big((1, n'), (1, n), p\big)}{\chi_t(0, \mathbf{b}^{\text{bd}}, p, 1, 1)} = \begin{cases} \binom{M}{n'} \Big(\dfrac{\alpha - \alpha p}{1 - \alpha p}\Big)^{n'} \Big(1 - \dfrac{\alpha - \alpha p}{1 - \alpha p}\Big)^{M-n'}, & \text{if } 0 \le n' \le M, \\ 0, & \text{otherwise}. \end{cases}$$

We require $\mathbf{b}^{\text{bd}}_{t+1}$ to directly take the value of $\mathbf{b}^{\text{med}}_{t+1}$, and set $(M_{t+1}, \alpha_{t+1}) = \big(M, \frac{\alpha - \alpha p}{1 - \alpha p}\big)$.

Case 2: if $o_t = 1$, the Bayesian update yields

$$b^{\text{med}}_{t+1}(n') = \frac{\sum_{n \in \mathcal{N}} b^{\text{bd}}(n)\, \omega_t\big(1, (1, n), (1, n')\big)\, \beta_t\big((1, n'), (1, n), p\big)}{\chi_t(1, \mathbf{b}^{\text{bd}}, p, 1, 1)} = \begin{cases} \dfrac{1}{1 - (1 - \alpha p)^M} \Big[\binom{M}{n'} \big(\alpha(1-p)\big)^{n'} \big(1 - \alpha(1-p)\big)^{M-n'} - (1 - \alpha p)^M \binom{M}{n'} \Big(\dfrac{\alpha - \alpha p}{1 - \alpha p}\Big)^{n'} \Big(1 - \dfrac{\alpha - \alpha p}{1 - \alpha p}\Big)^{M-n'}\Big], & \text{if } 0 \le n' \le M-1, \\ 0, & \text{otherwise}. \end{cases}$$

Such a Bayesian update does not yield a distribution in the form (26). However, we modify the value of $\mathbf{b}^{\text{med}}_{t+1}$ to a distribution in the form (26) by keeping the mean of the distribution unchanged and considering that the number of active nodes is reduced by at least one due to a busy slot. So, when $M > 1$, we set

$$(M_{t+1}, \alpha_{t+1}) = \Big(M-1,\ \frac{M(\alpha - \alpha p)\big(1 - (1 - \alpha p)^{M-1}\big)}{(M-1)\big(1 - (1 - \alpha p)^M\big)}\Big),$$

and when $M = 1$, we adopt the convention that $(M_{t+1}, \alpha_{t+1}) = (M-1, 1)$.

The accuracy of this approximation will be examined via numerical results at the end of this section.
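The two cases reduce the belief update to two scalar parameters; a sketch of the resulting update, exact for an idle slot and moment-matched for a busy slot:

```python
def update_params(M, alpha, p, o):
    """(M_{t+1}, alpha_{t+1}) from (M_t, alpha_t) = (M, alpha), transmission
    probability p, and observation o (0 = idle, 1 = busy)."""
    if o == 0:
        # Case 1, exact: still binomial, with a shrunken success probability
        return M, (alpha - alpha * p) / (1 - alpha * p)
    if M == 1:
        return 0, 1.0                  # convention for M = 1 and a busy slot
    # Case 2: moment-matched binomial on M-1 trials
    # (a busy slot removes at least one active node)
    num = M * (alpha - alpha * p) * (1 - (1 - alpha * p) ** (M - 1))
    den = (M - 1) * (1 - (1 - alpha * p) ** M)
    return M - 1, num / den
```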

C. A heuristic scheme

With the investigations in Sections V-A and V-B together, we are ready to propose a heuristic but very simple control scheme for the realistic environment, $\pi^{\text{heu}}$.

At the beginning of each slot $t \in \mathcal{T}$, given the parameter vector of the belief approximation $(M_t, \alpha_t) = (M, \alpha)$, each active node uses $M\alpha$ to estimate the mean of $n_t$, and further uses the following simple rule $\pi^{\text{heu}}_t(M, \alpha)$ to determine the value of the transmission probability $p_t$.

1) If $M\alpha + 1 > D - t + 1$ or $t = D$, set $\pi^{\text{heu}}_t(M, \alpha)$ to maximize the expected instantaneous throughput, i.e.,

$$\pi^{\text{heu}}_t(M, \alpha) \in \arg\max_{p \in [0,1]} \sum_{n \in \mathcal{N}} b^{\text{bd}}(n)\, r_t\big((1, n), p\big) = \min\Big\{\frac{1}{M\alpha + \alpha},\ 1\Big\}. \tag{27}$$


TABLE III: A comparison between the realizations of $\mathbf{b}_t$ and its approximation $\mathbf{b}^{\text{bd}}_t$ when each active node adopts $\pi^{\text{heu}}$ for $N = 10$, $\lambda = 0.8$, $D = 8$.

```
                      b_t(0)    b_t(1)    b_t(2)    b_t(3)    b_t(4)    b_t(5)    b_t(6)    b_t(7)    b_t(8)    b_t(9)
t=1, o_1=0  b_t       0.000001  0.000018  0.000295  0.002753  0.016515  0.066060  0.176161  0.301990  0.301990  0.134218
            Approx.   0.000001  0.000018  0.000295  0.002753  0.016515  0.066060  0.176161  0.301990  0.301990  0.134218
t=2, o_2=1  b_t       0.000001  0.000042  0.000583  0.004760  0.024988  0.087458  0.204068  0.306102  0.267839  0.104160
            Approx.   0.000001  0.000042  0.000583  0.004760  0.024988  0.087458  0.204068  0.306102  0.267839  0.104160
t=3, o_3=1  b_t       0.000059  0.001098  0.009014  0.042646  0.127254  0.245406  0.298859  0.210235  0.065430  0
            Approx.   0.000052  0.001004  0.008559  0.041692  0.126924  0.247294  0.301138  0.209545  0.063792  0
t=4, o_4=1  b_t       0.001086  0.012248  0.059916  0.164987  0.276437  0.282086  0.162465  0.040774  0         0
            Approx.   0.000974  0.011537  0.058598  0.165343  0.279925  0.284347  0.160466  0.038810  0         0
t=5, o_5=1  b_t       0.010921  0.072058  0.201100  0.304173  0.263268  0.123764  0.024716  0         0         0
            Approx.   0.010329  0.070827  0.202359  0.308353  0.264299  0.120821  0.023013  0         0         0
t=6, o_6=0  b_t       0.068102  0.238724  0.340491  0.247285  0.091556  0.013842  0         0         0         0
            Approx.   0.067210  0.240606  0.344541  0.246686  0.088312  0.012646  0         0         0         0
t=7, o_7=0  b_t       0.169904  0.357679  0.306377  0.133629  0.029713  0.002698  0         0         0         0
            Approx.   0.167239  0.359554  0.309208  0.132956  0.028585  0.002458  0         0         0         0
t=8, o_8=1  b_t       0.421334  0.395352  0.150943  0.029344  0.002908  0.000118  0         0         0         0
            Approx.   0.416144  0.398784  0.152859  0.029297  0.002807  0.000108  0         0         0         0
```

The proof of Eq. (27) is provided in the supplemental material.

2) Otherwise, set

$$\pi^{\text{heu}}_t(M, \alpha) = \frac{1}{D-t+1}.$$

The heuristic scheme depends not only on $n_t$ but also on $t$, so it is a time-dependent dynamic policy.
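Putting the two branches together, the per-slot decision rule of $\pi^{\text{heu}}$ can be sketched as follows (degenerate cases such as $\alpha = 0$ in the throughput branch are ignored in this sketch):

```python
def pi_heu(t, M, alpha, D):
    """pi_heu_t(M, alpha): M * alpha estimates the mean number of other
    active nodes, so M * alpha + 1 estimates the contention intensity."""
    if M * alpha + 1 > D - t + 1 or t == D:
        return min(1 / (M * alpha + alpha), 1.0)    # Eq. (27)
    return 1 / (D - t + 1)                          # even spreading
```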

Table III compares the realizations of $\mathbf{b}_t$ and its approximation $\mathbf{b}^{\text{bd}}_t$ when each active node adopts $\pi^{\text{heu}}$ for $N = 10$, $\lambda = 0.8$, $D = 8$, and verifies that the proposed approximation is reasonable.

Let $R_{\pi^{\text{heu}}}\big(1, (N-1, \lambda)\big)$ denote the expected total reward from slot 1 to slot $D$ when $q_1 = 1$, $(M_1, \alpha_1) = (N-1, \lambda)$ and the policy $\pi^{\text{heu}}$ is used, which can be defined by

$$R_{\pi^{\text{heu}}}\big(1, (N-1, \lambda)\big) \triangleq \mathbb{E}_{\pi^{\text{heu}}}\Big[\sum_{t=1, q_t=1}^{D} r_t\big((1, n_t), \pi^{\text{heu}}_t(M_t, \alpha_t)\big) \,\Big|\, q_1 = 1, (M_1, \alpha_1) = (N-1, \lambda)\Big].$$

Obviously, we have $\text{TDR}_{\pi^{\text{heu}}} = R_{\pi^{\text{heu}}}\big(1, (N-1, \lambda)\big)$.

Remark: The proposed heuristic scheme for the realistic environment is based on the model-based MDP and POMDP formulations, which require each node to know the other nodes' traffic parameters. Owing to the known model parameters, the heuristic scheme enjoys the threefold advantages of being implemented without imposing extra overhead and hardware cost, of being implemented with very low computational complexity, and of achieving a TDR close to the maximum achievable TDR in the idealized environment. However, in practice, the model parameters may not be available to formulate the optimization and obtain the heuristic scheme. In that case, reinforcement learning algorithms [33] can be used to learn the model parameters by directly interacting with the environment, but at the cost of an extra training phase with slow convergence and high computational complexity, which may not be feasible for low-cost nodes.

VI. EXTENSION TO RETRANSMISSION-BASED BROADCASTING

In our study so far, we have assumed that every packet is never retransmitted, which is a common setting in broadcasting due to the lack of acknowledgements, and a desirable setting for energy saving in many energy-constrained applications. This assumption also allows us to provide a clean presentation in Sections III–V for designing optimal and heuristic schemes. In this section, to further improve the TDR, we relax this assumption to allow at most $K \ge 1$ copies of every packet to be transmitted within the deadline of $D$ slots. However, updating the activity belief for the case $K \ge 2$ would require much more complicated modeling than for the case $K = 1$. Instead, we directly generalize $\pi^{\text{heu}}$ to propose a heuristic scheme for the realistic environment with $K \ge 1$, $\pi^{\text{heuR}}$, which makes good use of the results in Sections III–V.

The key idea behind $\pi^{\text{heuR}}$ is based on the heuristic that each copy of every packet has an "average deadline" of $D/K$ slots. So, we first divide a frame into $K$ consecutive subframes, indexed by $k \in \{1, 2, \ldots, K\}$, and require the $k$-th subframe to occupy $D_k$ slots by the following rule: if $r = 0$, $D_k = D/K$ for $k = 1, 2, \ldots, K$; otherwise, $D_k = \lfloor D/K \rfloor$ for $k = 1, 2, \ldots, K-r$, and $D_k = \lceil D/K \rceil$ for $k = K+1-r, \ldots, K$, where $r = D \bmod K$. Then, we require all the nodes that are active at the beginning of a frame to broadcast the $k$-th copy at most once during the $k$-th subframe, for $k = 1, 2, \ldots, K$. In this manner, it is easy to see that the updating of the activity belief proposed in Sections IV–V is still applicable to each subframe. This property motivates us to use $\pi^{\text{heu}}$ with the same initial belief to broadcast the $k$-th copy during the $k$-th subframe for $k = 1, 2, \ldots, K$, i.e., to repeatedly use $\pi^{\text{heu}}$ $K$ times. Clearly, the study in Section V can be seen as the particular case $K = 1$, $D_1 = D$. An example of the working procedure of $\pi^{\text{heuR}}$ for $D = 6$, $K = 2$ can be found in Fig. 7.
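The subframe-partition rule above can be sketched as:

```python
def subframe_lengths(D, K):
    """Slot counts D_1..D_K: if K divides D, all equal D/K; otherwise the
    first K-r subframes get floor(D/K) slots and the last r subframes get
    ceil(D/K) slots, where r = D mod K."""
    r = D % K
    if r == 0:
        return [D // K] * K
    return [D // K] * (K - r) + [D // K + 1] * r
```

By construction, the lengths always sum to $D$, so the $K$ subframes exactly tile the frame.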

Fig. 7: An example of the working procedure of $\pi^{\text{heuR}}$ for $D = 6$, $K = 2$.

Let $R^{\pi^{\text{heu}}}_k\big(1, n, (N-1, \lambda)\big)$ denote the expected total reward of the $k$-th subframe from slot 1 to slot $D_k$ when $q_1 = 1$, $n_1 = n$, $(M_1, \alpha_1) = (N-1, \lambda)$ and the policy $\pi^{\text{heu}}$ is used, which can be defined by

$$R^{\pi^{\text{heu}}}_k\big(1, n, (N-1, \lambda)\big) \triangleq \mathbb{E}_{\pi^{\text{heu}}}\Big[\sum_{t=1, q_t=1}^{D_k} r_t\big((1, n_t), \pi^{\text{heu}}_t(M_t, \alpha_t)\big) \,\Big|\, q_1 = 1, n_1 = n, (M_1, \alpha_1) = (N-1, \lambda)\Big].$$

Then, the TDR under the policy $\pi^{\text{heuR}}$ can be computed by

$$\text{TDR}_{\pi^{\text{heuR}}} = 1 - \sum_{n \in \mathcal{N}} \binom{N-1}{n} \lambda^n (1-\lambda)^{N-1-n} \prod_{k=1}^{K} \Big(1 - R^{\pi^{\text{heu}}}_k\big(1, n, (N-1, \lambda)\big)\Big).$$
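The combination step is mechanical once the per-subframe rewards are known. The sketch below takes hypothetical per-subframe success probabilities `R_sub[k][n]` standing in for $R^{\pi^{\text{heu}}}_k\big(1, n, (N-1, \lambda)\big)$, which must be computed separately:

```python
import math

def tdr_heuR(N, lam, R_sub):
    """TDR of pi_heuR: weight each initial contention level n by its
    binomial probability, and a packet is lost only if all K copies fail."""
    total_loss = 0.0
    for n in range(N):  # n = number of other initially active nodes
        w = math.comb(N - 1, n) * lam ** n * (1 - lam) ** (N - 1 - n)
        fail_all = 1.0
        for Rk in R_sub:             # product over the K subframes
            fail_all *= (1 - Rk[n])
        total_loss += w * fail_all
    return 1 - total_loss
```

With $K = 1$ this reduces to the no-retransmission TDR, and adding a second independent subframe of the same quality raises the TDR accordingly.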

The performance improvement owing to the use of $\pi^{\text{heuR}}$ will be investigated in Section VII.

VII. NUMERICAL EVALUATION

This section includes two subsections. To validate the studies in Sections III–V without considering retransmissions, the first subsection compares the TDR performance of the optimal scheme for the idealized environment $\hat{\pi}^*$, the proposed heuristic scheme for the realistic environment $\pi^{\text{heu}}$, an optimal static scheme for the realistic environment $\pi^{\text{sta}}$ [6], [8], [11], [13], and the ITM scheme for the realistic environment (which can also be seen as the myopic scheme in our POMDP model) $\pi^{\text{myo}}$ [14], [20], [21]. Here, $\pi^{\text{sta}}$ requires each active node to always adopt a static and identical transmission probability, and an optimal transmission probability can be obtained by solving $\max_{p \in [0,1]} R_\pi(1, \mathbf{h}_\lambda)$ s.t. $\pi_t(\mathbf{b}) = p$, $\forall t \in \mathcal{T}$, $\forall \mathbf{b} \in \mathcal{B}_t$. Such comparisons are helpful not only to demonstrate the performance loss due to the incomplete knowledge of the value of $n_t$, but also to demonstrate the performance advantage of dynamically adjusting the transmission probability. In the second subsection, for the scenario where each packet is allowed to be transmitted at most $K$ times, we compare the TDR performance of the proposed heuristic scheme for the realistic environment $\pi^{\text{heuR}}$ with $K = K^{\text{heuR}*}$, and an optimal static scheme for the realistic environment $\pi^{\text{staR}}$ with $K = K^{\text{staR}*}$. Here, $K^{\text{heuR}*}$ and $K^{\text{staR}*}$ are the values of $K$ that maximize the TDR of $\pi^{\text{heuR}}$ and $\pi^{\text{staR}}$, respectively.

It is worth mentioning that, to the best of our knowledge, designing the ITM scheme with $1 < K < D$ for the realistic environment requires a new model to update the activity belief due to the impact of the retransmission limit on the node status, which is very different from the modelings in this paper and previous studies [19]–[23]. Meanwhile, the ITM scheme with $K = D$ for the realistic environment can be seen as a particular static scheme. Based on these two considerations, the ITM scheme with retransmissions is not presented in Subsection VII-B for comparison.

The scenarios considered in the numerical experiments for the first subsection are in accordance with the system model specified in Section II-A, while the scenarios for the second subsection additionally allow retransmissions. We vary the system configuration over a wide range to investigate the impact of control scheme design on the TDR performance. Each numerical result is obtained from $10^7$ independent numerical experiments.

A. Comparisons Without Retransmissions

Fig. 8 shows the TDR performance without retransmissions as a function of the packet arrival rate $\lambda$. The TDR of each scheme except $\pi^{\text{myo}}$ always decreases when $\lambda$ increases, due to the increase of contention intensity. But $\lambda$ has a more complicated impact on $\pi^{\text{myo}}$, because $\pi^{\text{myo}}$ behaves more optimally when $\lambda$ increases, as indicated by Lemma 1, which compensates the negative impact of the increase of contention intensity. We observe that $\pi^{\text{heu}}$ performs close to $\hat{\pi}^*$: 2.87%–8.18% loss when $D = 10$ and 0.56%–4.04% loss when $D = 20$. This indicates that the design of $\pi^{\text{heu}}$ is reasonable, and the incomplete knowledge of the value of $n_t$ has a minor impact on the TDR performance. We further observe that $\pi^{\text{heu}}$ significantly outperforms $\pi^{\text{sta}}$: 2.03%–17.06% improvement when $D = 10$ and 11.09%–19.59% improvement when $D = 20$. The reason is obvious: $\pi^{\text{sta}}$ does not adjust the transmission probability according to the current delivery urgency and contention intensity. It is interesting to note that $\pi^{\text{sta}}$ performs closer to the other schemes as $\lambda$ increases. This is because the optimal transmission probabilities for different values of $t$ and $n_t$ become closer with the value of $n_1$, as indicated by Lemma 1. Moreover, we also observe that $\pi^{\text{heu}}$ significantly outperforms $\pi^{\text{myo}}$ when $N\lambda < D$: 14.43%–58.24% improvement when $D = 10$ and 12.37%–106.57% improvement when $D = 20$; $\pi^{\text{heu}}$ achieves almost the same TDR as $\pi^{\text{myo}}$ when $N\lambda \ge D$. This observation confirms again the heuristic from Lemma 1 in Section V-A and further indicates that the ITM goal [14], [20], [21] is unsuitable here.

Fig. 9 shows the TDR performance without retransmissions as a function of the delivery deadline $D$. We observe that the TDR of each scheme except $\pi^{\text{myo}}$ always increases with $D$, due to the decrease of delivery urgency. But the TDR of $\pi^{\text{myo}}$ first increases and then remains the same as $D$ grows. This phenomenon is due to the fact that, when $N\lambda < D$, $\pi^{\text{myo}}$ would on average transmit all backlogged packets in the first $\lceil N\lambda \rceil$ slots and waste the remaining slots. We observe that $\pi^{\text{heu}}$ performs significantly better than $\pi^{\text{sta}}$: 6.61%–16.74% improvement when $\sigma = 0.8$ and 6.56%–16.89% improvement

Fig. 8: The TDR as a function of the packet arrival rate $\lambda$ for $N = 50$, $D = 10, 20$, $\sigma = 0.9$.

when σ= 1;πheu performs close to

π∗:3.30%–6.56% loss

when σ= 0.8and 3.17%–6.60% loss when σ= 1. It is shown

that the relative gap between

π∗and πheu is small when D

is much larger than Nλ. The rationale is that, for this case in

the realistic environment, ntprobably takes a value nmuch

smaller than D−tand it is easy for the active nodes to judge

correctly whether n < D−t, which is crucial for πheu . We also

observe that πheu signiﬁcantly outperforms πmyo when Nλ <

D:6.15%–39.84% improvement when σ= 0.8and 6.58%–

39.72% improvement when σ= 1, and achieves almost the

same TDR performance as πmyo when Nλ ≥D. This conﬁrms

again that the inefﬁcient time utilization of πmyo, which is

opposite to the insight behind the heuristic from Eq. (25),

yields the poor TDR performance of πmyo.

Fig. 9: The TDR as a function of the delivery deadline $D$ for $N = 50$, $\lambda = 0.25$, $\sigma = 0.8, 1$.

Fig. 10 shows the TDR performance without retransmissions as a function of the packet success rate $\sigma$. The TDR of each scheme always increases with $\sigma$, due to the increase of channel quality. We observe that $\pi^{\text{heu}}$ performs significantly better than $\pi^{\text{sta}}$: 18.59%–19.03% improvement when $\lambda = 0.1$ and 5.51%–5.77% improvement when $\lambda = 0.4$, and performs close to $\hat{\pi}^*$: 0.70%–1.06% loss when $\lambda = 0.1$ and 4.09%–4.33% loss when $\lambda = 0.4$. We also observe that $\pi^{\text{heu}}$ significantly outperforms $\pi^{\text{myo}}$ when $N\lambda < D$: 89.67%–91.00% improvement when $\lambda = 0.1$; $\pi^{\text{heu}}$ achieves almost the same TDR as $\pi^{\text{myo}}$ when $N\lambda \ge D$. This confirms again the benefits of determining the transmission probability based simultaneously on the current delivery urgency and contention intensity. It is also shown that $\pi^{\text{sta}}$ performs close to the other schemes as $N\lambda/D$ becomes larger. This is because $n_t$ probably takes a value $n$ much larger than $D-t$, and the knowledge of $n$ has a minor impact on the TDR when $n$ is large.

Fig. 10: The TDR as a function of the packet success rate $\sigma$ for $N = 50$, $\lambda = 0.1, 0.4$, $D = 15$.

B. Comparisons With Retransmissions

Figs. 11–13 show the TDR performance with retransmissions as functions of $\lambda$, $D$ and $\sigma$, respectively. We observe from Fig. 11 that $\pi^{\text{heuR}}$ with $K = K^{\text{heuR}*}$ outperforms $\pi^{\text{staR}}$ with $K = K^{\text{staR}*}$: 2.06%–9.29% improvement when $D = 10$ and 3.42%–11.04% improvement when $D = 20$. We observe from Fig. 12 that $\pi^{\text{heuR}}$ with $K = K^{\text{heuR}*}$ enjoys 5.64%–10.46% improvement when $\sigma = 0.8$ and 6.58%–11.25% improvement when $\sigma = 1$. We observe from Fig. 13 that $\pi^{\text{heuR}}$ with $K = K^{\text{heuR}*}$ enjoys 1.80%–3.74% improvement when $\lambda = 0.1$ and 5.61%–5.95% improvement when $\lambda = 0.4$. These results indicate that the idea of generalizing $\pi^{\text{heu}}$ to $\pi^{\text{heuR}}$ is effective in improving the TDR for a wide range of configurations when retransmissions are allowed. Next, we can see that, in general, $K^{\text{heuR}*}$ increases when $\lambda$ decreases, $D$ increases, or $\sigma$ decreases. This is because appropriately introducing more retransmissions is useful to improve the time utilization for smaller $\lambda$ and larger $D$, and to resist the risk of transmission failures for smaller $\sigma$. Moreover, it is shown that $K^{\text{heuR}*} \le K^{\text{staR}*}$ in all the cases, which is desirable for the sake of energy efficiency. This is because dynamic control helps improve the delivery reliability of a single transmission and alleviates the need for retransmissions. It also implies that, although the performance gap in Figs. 11–13 is less notable than that when $K = 1$, such a phenomenon is caused by allowing the static scheme to unfairly perform more retransmissions.

VIII. CONCLUSION

In this paper, under the idealized and realistic environments without retransmissions, optimal dynamic control schemes for random access in deadline-constrained broadcasting with

14

0.1 0.16 0.22 0.28 0.34 0.4

0.1

0.25

0.4

0.55

0.7

0.85

Fig. 11: The TDR as a function of the packet arrival rate λ

for N= 50,D= 10,20,σ= 0.9.

10 12 14 16 18 20

0.2

0.28

0.36

0.44

0.52

0.6

Fig. 12: The TDR as a function of the delivery deadline D

for N= 50,λ= 0.25,σ= 0.8,1.

0.8 0.84 0.88 0.92 0.96 1

0.1

0.25

0.4

0.55

0.7

0.85

Fig. 13: The TDR as a function of the packet success rate σ

for N= 50,λ= 0.1,0.4,D= 15.

frame-synchronized trafﬁc have been investigated based on

the theories of MDP and POMDP, respectively. The pro-

posed heuristic scheme for the realistic environment is able

to achieve the threefold goal of being implemented with-

out imposing extra overhead and hardware cost, of being

implemented with very low computational complexity, and

of achieving TDR close to the maximum achievable TDR

in the idealized environment. Moreover, it has been shown

that the proposed heuristic scheme can be easily extended to

incorporate retransmissions for further improving the TDR.

An interesting and important future research direction is to optimize deadline-constrained broadcasting under general traffic patterns. To handle such an asymmetric scenario, a standard approach is to develop a dynamic control scheme based on the theory of decentralized MDP (Dec-MDP). However, solving a Dec-MDP is in general NEXP-complete [34]. Hence, an appropriate practical scheme needs to be further studied, which is our ongoing work.

ACKNOWLEDGEMENT

The authors would like to thank Dr. He Chen for helpful suggestions and discussions.

REFERENCES

[1] D. Feng, C. She, K. Ying, L. Lai, Z. Hou, T. Q. S. Quek, Y. Li, and B. Vucetic, "Toward ultrareliable low-latency communications: Typical scenarios, possible solutions, and open issues," IEEE Veh. Technol. Mag., vol. 14, no. 2, pp. 94–102, 2019.
[2] J. Gao, M. Li, L. Zhao, and X. Shen, "Contention intensity based distributed coordination for V2V safety message broadcast," IEEE Trans. Veh. Technol., vol. 67, no. 12, pp. 12288–12301, 2018.
[3] M. Luvisotto, Z. Pang, and D. Dzung, "High-performance wireless networks for industrial control applications: New targets and feasibility," Proc. IEEE, vol. 107, no. 6, pp. 1074–1093, 2019.
[4] C. Chen, H. Li, H. Li, R. Fu, Y. Liu, and S. Wan, "Efficiency and fairness oriented dynamic task offloading in Internet of Vehicles," IEEE Trans. Green Commun. Netw., vol. 6, no. 3, pp. 1481–1493, 2022.
[5] Y. Zhao, X. Wang, C. Wang, Y. Cong, and L. Shen, "Systemic design of distributed multi-UAV cooperative decision-making for multi-target tracking," Auton. Agents Multi-Agent Syst., vol. 33, no. 1, pp. 132–158, 2019.
[6] Y. H. Bae, "Analysis of optimal random access for broadcasting with deadline in cognitive radio networks," IEEE Commun. Lett., vol. 17, no. 3, pp. 573–575, 2013.
[7] Y. H. Bae, "Random access scheme to improve broadcast reliability," IEEE Commun. Lett., vol. 17, no. 7, pp. 1467–1470, 2013.
[8] Y. H. Bae, "Queueing analysis of deadline-constrained broadcasting in wireless networks," IEEE Commun. Lett., vol. 19, no. 10, pp. 1782–1785, 2015.
[9] C. Campolo, A. Vinel, A. Molinaro, and Y. Koucheryavy, "Modeling broadcasting in IEEE 802.11p/WAVE vehicular networks," IEEE Commun. Lett., vol. 15, no. 2, pp. 199–201, 2011.
[10] M. I. Hassan, H. L. Vu, and T. Sakurai, "Performance analysis of the IEEE 802.11 MAC protocol for DSRC with and without retransmissions," in Proc. IEEE WoWMoM, 2010, pp. 1–8.
[11] Y. H. Bae, "Optimal retransmission-based broadcasting under delivery deadline constraint," IEEE Commun. Lett., vol. 19, no. 6, pp. 1041–1044, 2015.
[12] Y. H. Bae, "Modeling timely-delivery ratio of slotted aloha with energy harvesting," IEEE Commun. Lett., vol. 21, no. 8, pp. 1823–1826, 2017.
[13] L. Deng, J. Deng, P. Chen, and Y. S. Han, "On the asymptotic performance of delay-constrained slotted ALOHA," in Proc. IEEE ICCCN,