arXiv:2108.03176v1 [eess.SY] 6 Aug 2021

Dynamic Control for Random Access in

Deadline-Constrained Broadcasting

Aoyu Gong, Lei Deng, Fang Liu and Yijin Zhang

Abstract—This paper considers random access in deadline-

constrained broadcasting with frame-synchronized trafﬁc. To

enhance the maximum achievable timely delivery ratio (TDR),

we deﬁne a dynamic control scheme that allows each active

node to determine the transmission probability with certainty

based on the current delivery urgency and the knowledge of

current contention intensity. For an idealized environment where

the contention intensity is completely known, we develop an

analytical framework based on the theory of Markov Deci-

sion Process (MDP), which leads to an optimal scheme by

applying backward induction. For a realistic environment where

the contention intensity is incompletely known, we develop a

framework using Partially Observable Markov Decision Process

(POMDP), which can in theory be solved. We show that for both

environments, there exists an optimal scheme that is optimal over

all types of policies. To overcome the infeasibility in obtaining an

optimal or near-optimal scheme from the POMDP framework,

we investigate the behaviors of the optimal scheme for two

extreme cases in the MDP framework, and leverage intuition

gained from these behaviors to propose a heuristic scheme for the

realistic environment with TDR close to the maximum achievable

TDR in the idealized environment. In addition, we propose

an approximation on the knowledge of contention intensity to

further simplify this heuristic scheme. Numerical results with

respect to a wide range of conﬁgurations are provided to validate

our study.

Index Terms—Random access, deadline-constrained broadcast-

ing, reliability, Markov decision process

I. INTRODUCTION

BROADCASTING is a fundamental operation in wireless

networks. With the explosive growth of ultrareliable low-

latency services for Internet of things [1], [2], [3], such

as multimedia sharing in sensor networks, safety message

dissemination in vehicular networks and industrial control in

factory automation systems, deadline-constrained broadcasting

has become a research focus in recent years. For such

broadcasting, each packet needs to be transmitted within a

strict delivery deadline since its arrival, and will be discarded

if the deadline expires. Hence, timely delivery ratio (TDR), defined as the probability that a broadcast packet is successfully delivered to an arbitrary intended receiver within the given delivery deadline, is considered a critical metric to evaluate the performance of such broadcasting.

This work was supported in part by the National Natural Science Foundation of China under Grants 62071236, 61902256, and in part by the Fundamental Research Funds for the Central Universities of China (No. 30920021127). (Corresponding author: Yijin Zhang.)

A. Gong and Y. Zhang are with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China. E-mail: {gongaoyu; yijin.zhang}@gmail.com.

L. Deng is with the College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518061, China. E-mail: ldeng@szu.edu.cn.

F. Liu is with the Department of Information Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong. E-mail: lf015@ie.cuhk.edu.hk.

A canonical deadline-constrained broadcasting scenario is

that an uncertain set of nodes with new or backlogged packets

attempt to transmit at approximately the same time, so that

random access mechanisms are needed to support efﬁcient

channel sharing and careful design of access parameters is

needed to maximize the TDR.

Many recent studies have been dedicated to this issue.

Under saturated trafﬁc, Bae [4], [5] obtained the optimal

slotted-ALOHA for broadcasting single-slot packets and op-

timal p-persistent CSMA for broadcasting multi-slot packets,

respectively. Under a discrete-time Geo/Geo/1 queue model,

Bae [6] obtained the optimal slotted-ALOHA for broadcasting

single-slot packets. Under frame-synchronized trafﬁc, Cam-

polo et al. [7] proposed an analytical model for using IEEE

802.11p CSMA/CA to broadcast multi-slot packets, which

can be used to obtain the optimal contention window size.

However, [4], [5], [6] adopted a static transmission probability

and [7] adopted a static contention window size, thus in-

evitably limiting the maximum achievable TDR. Other studies

on deadline-constrained random access include [8], [9] for

uplink to a common receiver and [10] for unicast in ad hoc

networks, all of which still restrict their attention to static access

parameters.

As such, to enhance the maximum achievable TDR of ran-

dom access in deadline-constrained broadcasting, it is strongly

required to develop a dynamic control scheme that allows

each node to adjust its access parameters according to the

knowledge of current delivery urgencies and the knowledge

of current contention intensity. Unfortunately, due to random

trafﬁc or limited capability on observing the channel status,

each node cannot obtain a complete knowledge of the current

contention intensity in practice, which renders such design a

challenging task. So, each node has to estimate the current

contention intensity using the information obtained from the

observed channel status. A great amount of work has gone

into studying such information that can be obtained [11], [12],

[13], [14], [15] under various models and protocols. Our work

follows the same direction of [12], [13] to keep an A Poste-

riori Probability (APP) distribution for the current contention

intensity given all past observations and access settings, which

is a sufﬁcient statistic for the optimal design [16], but needs to

additionally take into account the impact of delivery urgencies.

To the best of our knowledge, this is the first work to study dynamic
control for deadline-constrained random access.

Furthermore, it is naturally desirable for this dynamic

control to strike a balance between the chance to gain an

instantaneous successful transmission and the chance to gain

a future successful transmission within the given deadline,

which requires reasoning about future sequences of access

parameters and observations. So, the dynamic control design

under this objective is more challenging than that for maximiz-

ing the instantaneous throughput of random access [12], [14],

[15], which is only “one-step look-ahead”. By seeing access

parameters as actions, in this paper we apply the theories

of Markov Decision Process (MDP) and Partially Observable

MDP (POMDP) to obtain optimal control schemes. Although

the idea of using MDP and POMDP in the context of random
access control is not new [13], [17], [18], to the best of our
knowledge, this is the first work to apply them in deadline-constrained
broadcasting. In addition, as solving POMDP is in

general computationally prohibitive, it is important to develop

a simple control scheme with little performance loss.

In this paper, we focus on deadline-constrained broadcast-

ing under frame-synchronized trafﬁc. Such a trafﬁc pattern

can capture a number of scenarios in machine-to-machine

communications [1], [7], [8], [10], [19] where each node has

periodic-i.i.d. packet arrivals. The contributions of our work

are as follows.

1) We generalize slotted-ALOHA to deﬁne a dynamic con-

trol scheme, i.e., a deterministic Markovian policy, which

allows each active node to determine the current trans-

mission probability with certainty based on its current

delivery urgency and the knowledge of current contention

intensity.

2) For an idealized environment where the contention in-

tensity is completely known, we develop an analytical

framework based on the theory of MDP, which leads

to an optimal control scheme by applying the backward

induction algorithm. We further show it is indeed optimal

over all types of policies for this environment.

3) For a realistic environment where the contention inten-

sity is incompletely known, we develop an analytical

framework based on the theory of POMDP, which can

in theory lead to an optimal control scheme by backward

induction. We also show it is indeed optimal over all types

of policies for this environment.

4) To overcome the infeasibility in obtaining an optimal or

near-optimal control scheme from the POMDP frame-

work, we investigate the behaviors of the optimal control

scheme for two extreme cases in the idealized environ-

ment, and use these behaviors as clues to design a simple

heuristic control scheme for the realistic environment

with TDR close to the maximum achievable TDR in

the idealized environment. In addition, we propose an

approximation on the knowledge of contention intensity

to further simplify this heuristic scheme.

Note that although the MDP framework for the idealized

environment has limited applicability as the contention inten-

sity cannot be completely known in practice, it will serve to

provide an upper bound on the maximum achievable TDR in

the realistic environment, and serve to provide clues to design

a heuristic scheme for the realistic environment.

The remainder of this paper is organized as follows. The

system model is speciﬁed in Section II-A, and a dynamic

control scheme is deﬁned in Section II-B. The idealized and

realistic environments are studied in Sections III and IV,

respectively. A simple heuristic control scheme for the realistic

environment is proposed in Section V. Numerical results with

respect to a wide range of conﬁgurations are provided in

Section VI. Conclusions are given in Section VII.

II. SYSTEM MODEL AND DYNAMIC CONTROL SCHEME

A. System model

Consider a globally synchronized wireless network where a finite number, $N \geq 2$, of nodes are within the communication range of each other. The global time axis is divided into frames, each of which consists of a finite number, $D \geq 1$, of time slots of equal duration, indexed by $t \in \mathcal{T} \triangleq \{1, 2, \ldots, D\}$. To broadcast the freshest information, at the beginning of each frame, each node independently generates a packet to be transmitted with probability $\lambda \in (0, 1]$. We further assume that every packet has a strict delivery deadline of $D$ slots, i.e., a packet generated at the beginning of a frame will be discarded at the end of this frame. A broadcasting scenario for collaborative target detection with the above assumptions can be found in Fig. 1.

[Figure: UAVs and targets with detection ranges and communication links; time axis divided into frames of D = 3 slots each.]

Fig. 1. A deadline-constrained broadcasting scenario for collaborative target

detection

By considering random channel errors due to wireless

fading effect, we assume that a packet sent from a node

is successfully received by an arbitrary other node with the

probability σ∈(0,1] if it does not collide with other packets,

and otherwise is certainly unsuccessfully received by any other

node. Due to the broadcast nature, we assume that every

packet is neither acknowledged nor retransmitted. Then, at the

beginning of slot $t$ of a frame, a node is called an active node

if it generated a packet at the beginning of the frame and has

not transmitted before slot t, but is called an inactive node

otherwise. Each active node follows a common control scheme

for random access, which will be deﬁned in Section II-B,

to generate transmission probabilities at the beginnings of

different slots.

A slot is said to be in the idle channel status if no packet

is being transmitted, and in the busy status otherwise. At the

end of a slot, we assume that each node is able to be aware

of the channel status of this slot.

The values of $N$, $D$, $\lambda$ and $\sigma$ are all assumed to be

completely known in advance to each node.

B. Dynamic Control for Random Access

Due to frame-synchronized traffic, at the beginning of an

arbitrary slot with at least one active node, we know each

active node has the same delivery urgency. To take into account

the joint impact of delivery urgency and the knowledge of

contention intensity on determining transmission probabilities,

a dynamic control scheme for random access in deadline-

constrained broadcasting, which can be seen as a generaliza-

tion of slotted-ALOHA, is formally deﬁned as follows.

Consider an arbitrary frame. Let the random variable $n_t$, taking values in $\mathcal{N} \triangleq \{0, 1, \ldots, N-1\}$, denote the actual number of other active nodes in the view of an arbitrary node at the beginning of slot $t \in \mathcal{T}$. At the beginning of an arbitrary slot $t \in \mathcal{T}$ with at least one active node, we assume that each active node has the same observation history (for estimating the actual value of $n_t$) from the environment, and we require each active node to adopt the same transmission probability. Thus, each active node has the same knowledge of the actual value of $n_t$ based on all past observations from the environment and all past transmission probabilities. Such knowledge can be summarized by a probability vector $\mathbf{b}_t \triangleq \big(b_t(0), b_t(1), \ldots, b_t(N-1)\big)$, called the activity belief, where $b_t(n)$ is the conditional probability (given all past observations from the environment and all past transmission probabilities) that $n_t = n$. Let $\mathcal{B}_t$ denote the set of all possible values of $\mathbf{b}_t$ in $[0,1]^N$ such that $\sum_{n=0}^{N-1} b_t(n) = 1$. Hence, at the beginning of every slot $t \in \mathcal{T}$ with at least one active node, we require each active node to use the values of $t$ and $\mathbf{b}_t$ for determining the value of the transmission probability $p_t$ by a transmission function $\pi_t: \mathcal{B}_t \to [0,1]$. An example of the working procedure for the case of $N = 8$, $D = 6$ is illustrated in Fig. 2.

[Figure: activity timeline (active/inactive) and packet transmissions of nodes 1 to 8 over slots t = 1, ..., 6 of an arbitrary frame, with transmission probabilities p1 to p5 generated by $\pi_t: \mathcal{B}_t \to [0,1]$ and channel status busy, busy, busy, idle, busy, idle.]

Fig. 2. An example of the working procedure for $N = 8$, $D = 6$.

We further consider two different environments for active nodes to obtain the value of the activity belief $\mathbf{b}_t$.

1) Idealized environment: at the beginning of every slot $t \in \mathcal{T}$ with at least one active node, each active node always has complete knowledge of the value of $n_t$, i.e., $\mathbf{b}_t = (0, \ldots, 0, b_t(n) = 1, 0, \ldots, 0)$ if $n_t = n$ actually. Hence, the transmission function $\pi_t$ in this environment can be simply written as a function $\hat{\pi}_t: \mathcal{N} \to [0,1]$.

2) Realistic environment: at the beginning of each slot $t \in \mathcal{T}$ with at least one active node, each active node is able to obtain the value of $\mathbf{b}_t$ only based on the characteristic of packet arrivals, all past channel statuses (idle or busy) and all past transmission probabilities, and thus has incomplete knowledge of the value of $n_t$.

Obviously, the idealized environment is infeasible to imple-

ment due to the difﬁculty in determining the initial actual

number of other active nodes and determining the number

of nodes involved in each busy slot, whereas the realistic

environment can be easily implemented without imposing

extra overhead and hardware cost.

The objective of subsequent sections is to seek optimal control schemes, i.e., design $\hat{\pi}_t$ and $\pi_t$ sequentially in each slot so that the TDR is maximized, for the idealized and realistic environments, respectively. It will be shown in Section III-B that the mapping of $\hat{\pi}_t$ can lead to an optimal control scheme over all possible schemes for the idealized environment, and will be shown in Section IV-B that the mapping of $\pi_t$ can lead to an optimal control scheme over all possible schemes for the realistic environment.

III. MDP FRAMEWORK FOR THE IDEALIZED ENVIRONMENT

In this section, we formulate the random access control

problem in the idealized environment as an MDP, use the

expected total reward of this MDP to evaluate the TDR, and

then obtain an optimal control scheme that maximizes the

TDR.

A. MDP Formulation

For an arbitrary frame, consider an arbitrary node as the tagged node, and let the random variable $q_t$, taking values in $\{0 \text{ (inactive)}, 1 \text{ (active)}\}$, denote the status of the tagged node at the beginning of slot $t$. From the dynamic control scheme specified in Section II-B, we see that each node becomes inactive at the beginning of slot $t+1$ with the transmission probability $p_t$ if it is active at the beginning of slot $t$, and will always be inactive if it is inactive at the beginning of slot $t$. This implies that the probability of moving to the next state in the state process $(q_t, n_t)_{t \in \mathcal{T}}$ depends only on the current state. Thus, $(q_t, n_t)_{t \in \mathcal{T}}$ can be viewed as a discrete-time, finite-horizon, finite-state Markov chain.

Based on the Markov chain $(q_t, n_t)_{t \in \mathcal{T}}$, we present an MDP formulation by introducing the following definitions.

1) Actions: At the beginning of each slot $t \in \mathcal{T}$ with $q_t = 1$, the action performed by the tagged node (and the other active nodes) is the chosen transmission probability $p_t$, taking values in the action space $[0,1]$. Note that the tagged node performs no action when $q_t = 0$.

2) State Transition Function: As the tagged node will never transmit from slot $t$ onward if $q_t = 0$, we are only concerned with the state transition function when $q_t = 1$. The state transition function $\beta_t\big((q', n'), (1, n), p\big)$ is defined as the transition probability of moving from the state $(q_t, n_t) = (1, n)$ to the state $(q_{t+1}, n_{t+1}) = (q', n')$ when each active node at the beginning of slot $t$ adopts the transmission probability $p_t = p$. So, we have

$$\beta_t\big((q', n'), (1, n), p\big) \triangleq \Pr\big((q_{t+1}, n_{t+1}) = (q', n') \mid (q_t, n_t) = (1, n), p_t = p\big) = \begin{cases} \binom{n}{n-n'} p^{\,n-n'+1-q'} (1-p)^{\,n'+q'}, & \text{if } n' \leq n, \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$

for each $t \in \mathcal{T} \setminus \{D\}$, each $q' \in \{0, 1\}$, each $n, n' \in \mathcal{N}$ and each $p \in [0,1]$.

3) Rewards: The reward gained at slot $t$ is defined as the average number of packets of the tagged node transmitted successfully to an arbitrary other node at slot $t$. As there is no reward at slot $t$ when $q_t = 0$, we only focus on the cases when $q_t = 1$. Let $r_t\big((1, n), p\big)$ denote the reward at slot $t$ for the state $(q_t, n_t) = (1, n)$ when each active node at the beginning of slot $t$ adopts the transmission probability $p_t = p$. So, we have

$$r_t\big((1, n), p\big) = \sigma p (1-p)^n, \tag{2}$$

for each $t \in \mathcal{T}$, each $n \in \mathcal{N}$ and each $p \in [0,1]$.

4) Policies: A deterministic Markovian policy $\hat{\boldsymbol{\pi}}$ is defined by a sequence of transmission functions (i.e., decision rules) for the idealized environment:

$$\hat{\boldsymbol{\pi}} \triangleq (\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_D), \quad \text{where } \hat{\pi}_t: \mathcal{N} \to [0,1].$$

Let $\hat{\Pi}^{\text{MD}}$ denote the set of all possible such policies. Obviously, a dynamic control scheme for the idealized environment as described in Section II-B is essentially a deterministic Markovian policy here.
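As an illustrative sketch (not part of the original formulation), Eqs. (1) and (2) can be written as two small Python functions. The names `transition_prob` and `reward` are hypothetical, and the final check only confirms that the transition probabilities out of a state $(1, n)$ sum to one.

```python
from math import comb

def transition_prob(q_next, n_next, n, p):
    """Eq. (1): probability of moving from state (1, n) to (q_next, n_next)
    when every active node transmits with probability p."""
    if n_next < 0 or n_next > n:
        return 0.0
    k = n - n_next  # number of other active nodes that transmit in this slot
    return comb(n, k) * p ** (k + 1 - q_next) * (1 - p) ** (n_next + q_next)

def reward(n, p, sigma):
    """Eq. (2): the tagged node transmits alone and the packet survives fading."""
    return sigma * p * (1 - p) ** n

# Sanity check: the outgoing probabilities from (1, n) form a distribution.
n, p = 4, 0.3
total = sum(transition_prob(q, m, n, p) for q in (0, 1) for m in range(n + 1))
print(total)
```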

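Let $R^{\hat{\boldsymbol{\pi}}}(1, n)$ denote the expected total reward from slot 1 to slot $D$ when $q_1 = 1$, $n_1 = n$ and the policy $\hat{\boldsymbol{\pi}}$ is used, which can be defined by

$$R^{\hat{\boldsymbol{\pi}}}(1, n) \triangleq \mathbb{E}^{\hat{\boldsymbol{\pi}}}\Big\{ \sum_{t=1,\, q_t=1}^{D} r_t\big((1, n_t), \hat{\pi}_t(n_t)\big) \,\Big|\, q_1 = 1, n_1 = n \Big\},$$

where $\mathbb{E}^{\hat{\boldsymbol{\pi}}}$ represents the conditional expectation given that policy $\hat{\boldsymbol{\pi}}$ is employed. Then, the TDR under the policy $\hat{\boldsymbol{\pi}}$ can be computed by

$$\text{TDR}^{\hat{\boldsymbol{\pi}}} = \sum_{n \in \mathcal{N}} \binom{N-1}{n} \lambda^n (1-\lambda)^{N-1-n} R^{\hat{\boldsymbol{\pi}}}(1, n).$$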
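The TDR expression above can be evaluated numerically for any given deterministic Markovian policy. The following is a minimal sketch under stated assumptions: the function name `evaluate_policy` and the convention that a policy is a callable `policy(t, n)` are hypothetical, and the recursion simply accumulates the rewards of Eq. (2) backward in time over the transitions of Eq. (1) with $q' = 1$.

```python
from math import comb

def evaluate_policy(policy, N, D, lam, sigma):
    """Return (TDR, R), where R[n] is the expected total reward R(1, n)
    of the policy `policy(t, n) -> p` over slots 1..D."""
    U = [0.0] * N  # reward-to-go beyond slot D is zero
    for t in range(D, 0, -1):
        U_next, U = U, [0.0] * N
        for n in range(N):
            p = policy(t, n)
            value = sigma * p * (1 - p) ** n
            if t < D:
                # Tagged node stays active (q' = 1); n - m other nodes transmit.
                for m in range(n + 1):
                    prob = comb(n, n - m) * p ** (n - m) * (1 - p) ** (m + 1)
                    value += prob * U_next[m]
            U[n] = value
    # Average over the binomial number of other active nodes at slot 1.
    tdr = sum(comb(N - 1, n) * lam ** n * (1 - lam) ** (N - 1 - n) * U[n]
              for n in range(N))
    return tdr, U

# Example: a static transmission probability, as in slotted-ALOHA.
tdr_static, _ = evaluate_policy(lambda t, n: 0.2, N=5, D=4, lam=0.6, sigma=0.9)
print(f"TDR of a static policy: {tdr_static:.4f}")
```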

B. MDP solution

Due to the finite horizon, finite state space, compact action space, bounded rewards, continuous rewards with respect to $p$ and continuous state transition function with respect to $p$ in our MDP formulation, [20, Prop. 4.4.3, Ch. 4] indicates that for maximizing $\text{TDR}^{\hat{\boldsymbol{\pi}}}$, there exists a $\hat{\boldsymbol{\pi}} \in \hat{\Pi}^{\text{MD}}$, which is indeed optimal over all randomized and deterministic, history-dependent and Markovian policies. This property also justifies the transmission function and design goal for the idealized environment considered in Section II-B. Hence, we aim to seek

$$\hat{\boldsymbol{\pi}}^* \in \arg\max_{\hat{\boldsymbol{\pi}} \in \hat{\Pi}^{\text{MD}}} \text{TDR}^{\hat{\boldsymbol{\pi}}}.$$

Let $U^*_t(1, n)$ denote the value function corresponding to the maximum total expected reward from slot $t$ to slot $D$ when $q_t = 1$ and $n_t = n$. Averaging over all possible next states with $q_{t+1} = 1$, we arrive at the following Bellman's equation:

$$\begin{aligned} U^*_D(1, n) &= \max_{p \in [0,1]} r_D\big((1, n), p\big), \quad \forall n \in \mathcal{N}, \\ U^*_t(1, n) &= \max_{p \in [0,1]} \Big\{ r_t\big((1, n), p\big) + \sum_{n' \in \mathcal{N}} \beta_t\big((1, n'), (1, n), p\big)\, U^*_{t+1}(1, n') \Big\}, \quad \forall n \in \mathcal{N}, \end{aligned} \tag{3}$$

for each $t \in \mathcal{T} \setminus \{D\}$. Then, applying the backward induction algorithm to get a solution to Eq. (3) involves finding global maximizers of a series of real-coefficient univariate polynomials defined on $[0,1]$, and can formally lead to $\hat{\boldsymbol{\pi}}^*$.

IV. POMDP FRAMEWORK FOR THE REALISTIC ENVIRONMENT

In this section, we formulate the random access control

problem in the realistic environment as a POMDP, use the

expected total reward of this POMDP to evaluate the TDR,

and then discuss how to obtain an optimal or near-optimal

control scheme.

A. POMDP Formulation

Based on the Markov chain $(q_t, n_t)_{t \in \mathcal{T}}$ specified in Section III and the activity belief $\mathbf{b}_t$ for each $t \in \mathcal{T}$ with $q_t = 1$ specified in Section II-B, we present a POMDP formulation by introducing the following definitions.

1) Actions, State Transition Function, Rewards: The definitions of these elements are the same as in Section III.

2) Observations and Observation Function: The tagged node at the beginning of slot $t+1$ can obtain an observation on the channel status of slot $t$, denoted by $o_t$. When $q_{t+1} = 0$, the tagged node will never transmit from slot $t+1$ onward and $o_t$ will thus be useless. Hence, we only consider $o_t$ when $q_{t+1} = 1$, taking values in the observation space $\mathcal{O} \triangleq \{0 \text{ (idle)}, 1 \text{ (busy)}\}$. Further, the observation function $\omega_t\big(o, (1, n), (1, n')\big)$ is defined as the probability that the tagged node at the beginning of slot $t+1$ obtains the observation $o_t = o$ if the state $(q_t, n_t) = (1, n)$ and the state $(q_{t+1}, n_{t+1}) = (1, n')$. So, we have

$$\omega_t\big(o, (1, n), (1, n')\big) \triangleq \Pr\big(o_t = o \mid (q_t, n_t) = (1, n), (q_{t+1}, n_{t+1}) = (1, n')\big) = \begin{cases} 1, & \text{if } o = 0,\ n = n', \\ 1, & \text{if } o = 1,\ n - n' \geq 1, \\ 0, & \text{otherwise}, \end{cases}$$

for each $t \in \mathcal{T}$, each $o \in \mathcal{O}$ and each $n, n' \in \mathcal{N}$.

3) Bayesian Update of the Activity Belief: It has been shown in [16] that for each $t \in \mathcal{T}$ with $q_t = 1$, the value of the activity belief $\mathbf{b}_t$ is a sufficient statistic for the initial activity belief, all past channel statuses and all past transmission probabilities. First, by the total number of nodes $N$ and the packet generation probability $\lambda$, the tagged node can obtain

$$\mathbf{b}_1 = \mathbf{h}_\lambda \triangleq \big( (1-\lambda)^{N-1},\ (N-1)\lambda(1-\lambda)^{N-2},\ \ldots,\ \lambda^{N-1} \big). \tag{4}$$

Then, for each $t \in \mathcal{T} \setminus \{D\}$, given the condition $q_t = q_{t+1} = 1$, the activity belief $\mathbf{b}_t = \mathbf{b}$, the observation $o_t = o$ and the transmission probability $p_t = p$ used at slot $t$, the tagged node at slot $t+1$ can obtain $\mathbf{b}_{t+1}$ via the Bayes' rule:

$$\mathbf{b}_{t+1} \triangleq \theta_t(\mathbf{b}, p, o, 1, 1),$$

$$b_{t+1}(n') \triangleq \Pr\big(n_{t+1} = n' \mid \mathbf{b}_t = \mathbf{b}, p_t = p, o_t = o, q_t = q_{t+1} = 1\big) = \frac{\sum_{n \in \mathcal{N}} b(n)\, \omega_t\big(o, (1, n), (1, n')\big)\, \beta_t\big((1, n'), (1, n), p\big)}{\chi_t(o, \mathbf{b}, p, 1, 1)},$$

for each $n' \in \mathcal{N}$, where

$$\chi_t(o, \mathbf{b}, p, 1, 1) \triangleq \Pr\big(q_{t+1} = 1, o_t = o \mid q_t = 1, \mathbf{b}_t = \mathbf{b}, p_t = p\big) = \sum_{n \in \mathcal{N}} b(n) \sum_{n'' \in \mathcal{N}} \omega_t\big(o, (1, n), (1, n'')\big) \cdot \beta_t\big((1, n''), (1, n), p\big).$$

4) Policies: A deterministic Markovian policy $\boldsymbol{\pi}$ is defined by a sequence of transmission functions for the realistic environment:

$$\boldsymbol{\pi} \triangleq (\pi_1, \pi_2, \ldots, \pi_D), \quad \text{where } \pi_t: \mathcal{B}_t \to [0,1].$$

Let $\Pi^{\text{MD}}$ denote the set of all possible such policies. Obviously, a dynamic control scheme for the realistic environment as specified in Section II-B is essentially a deterministic Markovian policy here.
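The Bayesian update of the activity belief described above can be sketched in a few lines. The function names `initial_belief` and `update_belief` are hypothetical, and the sketch assumes, as in the formulation, that the tagged node is active both before and after the slot ($q_t = q_{t+1} = 1$), so only transitions with $q' = 1$ are used.

```python
from math import comb

def initial_belief(N, lam):
    """Eq. (4): binomial prior over the number of other active nodes."""
    return [comb(N - 1, n) * lam ** n * (1 - lam) ** (N - 1 - n) for n in range(N)]

def update_belief(b, p, o):
    """Posterior belief after using p and observing o (0 = idle, 1 = busy).
    Returns (posterior, chi), where chi = Pr(q_{t+1} = 1, o_t = o | b, p)."""
    N = len(b)
    unnormalized = [0.0] * N
    for n in range(N):
        for m in range(n + 1):  # m plays the role of n'
            # Observation function: idle iff nobody else transmitted (m == n).
            consistent = (m == n) if o == 0 else (m < n)
            if consistent:
                beta = comb(n, n - m) * p ** (n - m) * (1 - p) ** (m + 1)
                unnormalized[m] += b[n] * beta
    chi = sum(unnormalized)
    if chi == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [x / chi for x in unnormalized], chi

b1 = initial_belief(N=3, lam=0.5)
b2_idle, chi_idle = update_belief(b1, p=0.5, o=0)
b2_busy, chi_busy = update_belief(b1, p=0.5, o=1)
print(b2_idle, chi_idle + chi_busy)
```

In this example the two values of chi add up to $1 - p$, the probability that the tagged node itself stays silent in the slot.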

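Let $R^{\boldsymbol{\pi}}(1, \mathbf{h}_\lambda)$ denote the expected total reward from slot 1 to slot $D$ when $q_1 = 1$, $\mathbf{b}_1 = \mathbf{h}_\lambda$ and the policy $\boldsymbol{\pi}$ is used, which can be defined by

$$R^{\boldsymbol{\pi}}(1, \mathbf{h}_\lambda) \triangleq \mathbb{E}^{\boldsymbol{\pi}}\Big\{ \sum_{t=1,\, q_t=1}^{D} r_t\big((1, n_t), \pi_t(\mathbf{b}_t)\big) \,\Big|\, q_1 = 1, \mathbf{b}_1 = \mathbf{h}_\lambda \Big\}.$$

Obviously, we have $\text{TDR}^{\boldsymbol{\pi}} = R^{\boldsymbol{\pi}}(1, \mathbf{h}_\lambda)$, where $\text{TDR}^{\boldsymbol{\pi}}$ denotes the TDR under the policy $\boldsymbol{\pi}$.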

B. POMDP solution

Due to the finite horizon, finite state space, compact action space, bounded rewards, continuous rewards with respect to $p$ and continuous $\chi_t(o, \mathbf{b}, p, 1, 1)$ with respect to $p$ in our POMDP formulation, [20, Prop. 4.4.3, Ch. 4] and [21, Thm. 7.1, Ch. 6] indicate that for maximizing $\text{TDR}^{\boldsymbol{\pi}}$, there exists a $\boldsymbol{\pi} \in \Pi^{\text{MD}}$, which is indeed optimal over all types of policies. This property also justifies the transmission function and design goal for the realistic environment considered in Section II-B. Hence, we aim to seek an optimal policy in $\Pi^{\text{MD}}$ that maximizes $\text{TDR}^{\boldsymbol{\pi}}$, i.e.,

$$\boldsymbol{\pi}^* \in \arg\max_{\boldsymbol{\pi} \in \Pi^{\text{MD}}} \text{TDR}^{\boldsymbol{\pi}}.$$

Let $V^*_t(1, \mathbf{b})$ denote the value function corresponding to the maximum total expected reward from slot $t$ to slot $D$ when $q_t = 1$ and $\mathbf{b}_t = \mathbf{b}$. Averaging over all possible current states with $q_t = 1$ and observations with $q_{t+1} = 1$, we arrive at the following Bellman's equation:

$$\begin{aligned} V^*_D(1, \mathbf{b}) &= \max_{p \in [0,1]} \sum_{n \in \mathcal{N}} b(n)\, r_D\big((1, n), p\big), \quad \forall \mathbf{b} \in \mathcal{B}_D, \\ V^*_t(1, \mathbf{b}) &= \max_{p \in [0,1]} \Big\{ \sum_{n \in \mathcal{N}} b(n)\, r_t\big((1, n), p\big) + \sum_{o \in \mathcal{O}} \chi_t(o, \mathbf{b}, p, 1, 1)\, V^*_{t+1}\big(1, \theta_t(\mathbf{b}, p, o, 1, 1)\big) \Big\}, \quad \forall \mathbf{b} \in \mathcal{B}_t, \end{aligned} \tag{5}$$

for each $t \in \mathcal{T} \setminus \{D\}$. Solving Eq. (5) formally leads to $\boldsymbol{\pi}^*$.

Unfortunately, getting $\boldsymbol{\pi}^*$ by solving Eq. (5) is computationally intractable, as both the belief state space $\bigcup_{t \in \mathcal{T}} \mathcal{B}_t$ and the action space $[0,1]$ are infinite in our POMDP formulation. As such, an alternative is to consider a discretized action space $\mathcal{A}_d$ that only consists of uniformly distributed samples of the interval $[0,1]$, i.e., $\mathcal{A}_d \triangleq \{0, \Delta p, 2\Delta p, \ldots, 1\}$, where $\Delta p$ denotes the sampling interval. Hence, it is easy to see that $\mathcal{B}_t$ will become finite for each $t \in \mathcal{T}$ due to the finite $\mathcal{A}_d$. Then, theoretically, applying the backward induction algorithm [16] to get a solution to Eq. (5) can lead to a near-optimal policy, whose loss of optimality increases with $\Delta p$. However, this approach is still computationally prohibitive due to super-exponential growth in the value function complexity.

V. A HEURISTIC SCHEME FOR THE REALISTIC ENVIRONMENT

To overcome the infeasibility in obtaining an optimal or

near-optimal control scheme for the realistic environment from

the POMDP framework, in this section, we propose a simple

heuristic control scheme that utilizes the key properties of our

problem. It will be shown in Section VI that the heuristic

scheme performs quite well in simulations.

A. Heuristic from the idealized environment

We first investigate the behaviors of $\hat{\boldsymbol{\pi}}^*$ for two extreme cases in the idealized environment, which would serve to provide important clues on approximating $\hat{\boldsymbol{\pi}}^*$. Let $U_t\big((1, n), p\big)$ denote the total expected reward from slot $t$ to slot $D$ for the state $(q_t, n_t) = (1, n)$ when each active node at the beginning of slot $t$ adopts the transmission probability $p_t = p$ and the optimal decision rules at slots $t+1, t+2, \ldots, D$. So, we have $U_t\big((1, n), \hat{\pi}^*_t(n)\big) = U^*_t(1, n)$.

Lemma 1. When $n_1 = m \to \infty$, by assuming that each collision involves a finite number of packets, for each $t \in \mathcal{T}$ and each possible $n_t = n$, we have

$$\lim_{m \to \infty} (n+1)\, U_t\Big((1, n), \frac{1}{n+1}\Big) = \frac{(D-t+1)\sigma}{e}, \tag{6}$$

$$\lim_{m \to \infty} (n+1)\, U^*_t(1, n) = \frac{(D-t+1)\sigma}{e}. \tag{7}$$

The proof of Lemma 1 is provided in Appendix A.

Lemma 1 motivates us to conjecture that, if $n_1$ takes a value sufficiently larger than $D$, the realizations of $(n_t + 1)\hat{\pi}^*_t(n_t)$ would always approach 1 for each $t \in \mathcal{T}$. Fig. 3 shows 1000 such realizations when $D = 10$ for $n_1 = 30, 50, 100$, respectively, which confirm our conjecture.

[Figure: plot with horizontal axis ticks from 1 to 10 and vertical axis ticks from 1 to 1.016.]

Fig. 3. Realizations of $(n_t + 1)\hat{\pi}^*_t(n_t)$ when $n_1 = 30, 50, 100$ and $D = 10$.

We further investigate the behaviors of $\hat{\boldsymbol{\pi}}^*$ for the extreme case that $n_1$ takes a value sufficiently smaller than $D$.

Lemma 2. For each $t \in \mathcal{T}$, we have

$$U^*_t(1, 1) = \frac{3D - 3t + 1}{3D - 3t + 4}\,\sigma, \tag{8}$$

and for each $t \in \mathcal{T} \setminus \{D\}$, we have

$$\hat{\pi}^*_t(1) = \frac{3}{3D - 3t + 4}. \tag{9}$$

The proof of Lemma 2 is provided in Appendix B.
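As a quick sanity check of Eqs. (8) and (9) (a worked special case, not a substitute for the proof in Appendix B), consider $t = D-1$ and $n = 1$. The only inputs are Eqs. (1) to (3) together with the last-slot values $U^*_D(1, 0) = \sigma$ (attained at $p = 1$) and $U^*_D(1, 1) = \sigma/4$ (attained at $p = 1/2$).

```latex
\begin{align*}
U_{D-1}\big((1,1),p\big)
  &= \sigma p(1-p)
     + \underbrace{p(1-p)}_{\beta_{D-1}((1,0),(1,1),p)} U^*_D(1,0)
     + \underbrace{(1-p)^2}_{\beta_{D-1}((1,1),(1,1),p)} U^*_D(1,1) \\
  &= \sigma\Big[\,2p(1-p) + \tfrac{1}{4}(1-p)^2\Big], \\
\frac{\partial}{\partial p}\, U_{D-1}\big((1,1),p\big) = 0
  &\;\Longleftrightarrow\; 2 - 4p - \tfrac{1}{2}(1-p) = 0
  \;\Longleftrightarrow\; p = \tfrac{3}{7}, \\
U^*_{D-1}(1,1)
  &= \sigma\Big[\,2\cdot\tfrac{3}{7}\cdot\tfrac{4}{7} + \tfrac{1}{4}\cdot\tfrac{16}{49}\Big]
   = \tfrac{4}{7}\,\sigma .
\end{align*}
```

Both values agree with Lemma 2 for $D - t = 1$: Eq. (9) gives $3/(3+4) = 3/7$ and Eq. (8) gives $(3+1)/(3+4)\,\sigma = 4\sigma/7$.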

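Inspired by Eq. (9), we consider a simple control scheme $\hat{\boldsymbol{\pi}}^{\text{eve}} \triangleq [\hat{\pi}^{\text{eve}}_1, \hat{\pi}^{\text{eve}}_2, \ldots, \hat{\pi}^{\text{eve}}_D] \in \hat{\Pi}^{\text{MD}}$, where

$$\hat{\pi}^{\text{eve}}_t(n) = \frac{1}{D - t + 1}, \tag{10}$$

for each $t \in \mathcal{T}$ and each $n \in \mathcal{N}$.

Let $U^{\text{eve}}_t(1, n)$ denote the expected total reward from slot $t$ to slot $D$ for the state $(q_t, n_t) = (1, n)$ when each active node adopts the decision rules $\hat{\pi}^{\text{eve}}_t$ at slots $t, t+1, \ldots, D$. So, using the finite-horizon policy evaluation algorithm [20], we have

$$\begin{aligned} U^{\text{eve}}_D(1, n) &= r_D\big((1, n), \hat{\pi}^{\text{eve}}_D(n)\big), \quad \forall n \in \mathcal{N}, \\ U^{\text{eve}}_t(1, n) &= r_t\big((1, n), \hat{\pi}^{\text{eve}}_t(n)\big) + \sum_{n' \in \mathcal{N}} \beta_t\big((1, n'), (1, n), \hat{\pi}^{\text{eve}}_t(n)\big)\, U^{\text{eve}}_{t+1}(1, n'), \quad \forall n \in \mathcal{N}, \end{aligned} \tag{11}$$

for each $t \in \mathcal{T} \setminus \{D\}$.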

Lemma 3. For each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$, we have

$$U^{\text{eve}}_t(1, n) = \sigma \Big(1 - \frac{1}{D - t + 1}\Big)^n. \tag{12}$$

The proof of Lemma 3 is provided in Appendix C.
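The closed form in Eq. (12) can be checked numerically against the recursion in Eq. (11). The following is a minimal sketch with hypothetical names (`even_policy_values`, `max_gap`) and arbitrarily chosen parameters; it only compares the two expressions and does not replace the proof.

```python
from math import comb

def even_policy_values(N, D, sigma):
    """Evaluate Eq. (11) under p_t = 1 / (D - t + 1).
    Returns a dict U with U[t][n] for t = 1..D (U[D + 1] is all zeros)."""
    U = {D + 1: [0.0] * N}
    for t in range(D, 0, -1):
        p = 1.0 / (D - t + 1)
        U[t] = []
        for n in range(N):
            value = sigma * p * (1 - p) ** n
            for m in range(n + 1):
                prob = comb(n, n - m) * p ** (n - m) * (1 - p) ** (m + 1)
                value += prob * U[t + 1][m]
            U[t].append(value)
    return U

N, D, sigma = 5, 6, 0.9
U = even_policy_values(N, D, sigma)
# Largest deviation between the recursion and the closed form of Eq. (12).
max_gap = max(abs(U[t][n] - sigma * (1 - 1 / (D - t + 1)) ** n)
              for t in range(1, D) for n in range(N))
print(max_gap)
```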

For each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$, based on the fact $U^*_t(1, n) \leq \sigma$, we have

$$\frac{U^{\text{eve}}_t(1, n)}{U^*_t(1, n)} \geq \Big(1 - \frac{1}{D - t + 1}\Big)^n. \tag{13}$$

We can observe from Eq. (13) that, if $n$ is sufficiently smaller than $D - t + 1$, the value of $U^*_t(1, n)$ is close to the value of $U^{\text{eve}}_t(1, n)$. Then, Eq. (13) motivates us to conjecture that, if $n_t$ takes a value sufficiently smaller than $D - t + 1$, $\hat{\pi}^{\text{eve}}_t$ may behave like $\hat{\pi}^*_t$, i.e., the realizations of $(D - t + 1)\hat{\pi}^*_t(n_t)$ would always approach $(D - t + 1)\hat{\pi}^{\text{eve}}_t(n_t) = 1$ for each $n_t = n \in \mathcal{N}$. Fig. 4 shows 1000 such realizations when $n_1 = 10$ for $D = 30, 50, 100$, respectively, which confirm our conjecture.

[Figure: plot with horizontal axis ticks from 1 to 10 and vertical axis ticks from 0.972 to 1.]

Fig. 4. Realizations of $(D - t + 1)\hat{\pi}^*_t(n_t)$ when $n_1 = 10$ and $D = 30, 50, 100$.

Naturally, we obtain the following heuristic from Lemma 1 and Eq. (13).

1) When the number of active nodes is sufficiently large compared with the number of remaining slots, it is desirable for the active nodes to adopt the transmission probability that maximizes the instantaneous throughput.

2) When the number of active nodes is sufficiently small compared with the number of remaining slots, it is desirable for the active nodes to adopt the transmission probability that ensures all the backlogged packets are transmitted almost evenly over the remaining slots.

Based on this heuristic and the obvious fact $\hat{\pi}^*_D(n) = \frac{1}{n+1}$ for each $n \in \mathcal{N}$, we propose a simple approximation on $\hat{\boldsymbol{\pi}}^*$.

Approximation on $\hat{\boldsymbol{\pi}}^*$: For each slot $t \in \mathcal{T}$ and each $n_t = n \in \mathcal{N}$, if the number of active nodes $n + 1$ is larger than the number of remaining slots $D - t + 1$ or $t = D$, $\hat{\pi}^*_t(n)$ can be estimated by $\frac{1}{n+1}$; otherwise, $\hat{\pi}^*_t(n)$ can be estimated by $\frac{1}{D-t+1}$.
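The approximation rule stated above translates directly into a short function. The name `approx_policy` is hypothetical; the sketch assumes slots are indexed from 1 to $D$ and that `n` counts the other active nodes, as in the text.

```python
def approx_policy(t, n, D):
    """Approximate transmission probability for slot t (1-indexed)
    when n other nodes are active and the frame has D slots."""
    remaining = D - t + 1          # remaining slots, including slot t
    if t == D or n + 1 > remaining:
        return 1.0 / (n + 1)       # maximize the instantaneous throughput
    return 1.0 / remaining         # spread transmissions evenly over the remaining slots

# A few illustrative calls for D = 30.
for t, n in [(1, 4), (28, 9), (30, 4)]:
    print(t, n, approx_policy(t, n, D=30))
```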

[Figure: plot with horizontal axis ticks from 0 to 30 and vertical axis ticks from 0.03 to 0.11.]

Fig. 5. $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$.

[Figure: plot with horizontal axis ticks from 0 to 30 and vertical axis ticks from 0 to 0.8.]

Fig. 6. The total expected rewards from slot $t$ to $D$ corresponding to $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$.

Fig. 5 compares $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$. The results show that the approximation error is very small when the difference between $n$ and $D - t$ is large, but is noticeable when this difference is small, thus justifying our heuristic. The results also show that the ratio of the cases with the approximation error larger than 8% is 6.67% and the largest approximation error is 11.47%, thus justifying our approximation. Furthermore, Fig. 6 compares the total expected rewards from slot $t$ to $D$ corresponding to $\hat{\pi}^*_t(n)$ and its approximation for typical choices of parameters when $D = 30$. The results show that the approximation leads to at most 0.66% reward loss (much smaller than the approximation error), and thus verify that our approximation can be used to obtain a TDR quite close to the maximum achievable TDR in the idealized environment.

B. A simple approximation on the activity belief of the realistic

environment

To apply the approximation on $\widehat{\pi}^{*}$ to the realistic environment, each active node must update the activity belief $\mathbf{b}_t$ at runtime. However, as shown in Section IV-A, the full Bayesian update of $\mathbf{b}_t$ is computationally demanding to implement. We therefore propose a simple approximation on $\mathbf{b}_t$, denoted by $\mathbf{b}^{\mathrm{bd}}_t \triangleq \big(b^{\mathrm{bd}}_t(0), b^{\mathrm{bd}}_t(1), \ldots, b^{\mathrm{bd}}_t(N-1)\big)$, relying on a binomial distribution with a changeable parameter vector $(M_t, \alpha_t)$. More specifically, if $(M_t, \alpha_t) = (M, \alpha)$, we have

$$
b^{\mathrm{bd}}_t(n) =
\begin{cases}
\binom{M}{n}\alpha^{n}(1-\alpha)^{M-n}, & \text{if } 0 \le n \le M,\\
0, & \text{otherwise.}
\end{cases}
\tag{14}
$$

In this manner, each active node only keeps the parameter vector $(M_t, \alpha_t)$ rather than the activity belief $\mathbf{b}_t$. Obviously, by Eq. (4), we can set $(M_1, \alpha_1) = (N-1, \lambda)$ to achieve $\mathbf{b}_1 = \mathbf{b}^{\mathrm{bd}}_1$. Then, for each $t \in \mathcal{T} \setminus \{D\}$, we will show that Bayes' rule can be used exactly to set the value of $(M_{t+1}, \alpha_{t+1})$ when the observation $o_t = 0$, but that an approximation assumption must be introduced when $o_t = 1$.

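For each $t \in \mathcal{T} \setminus \{D\}$, given $(M_t, \alpha_t) = (M, \alpha)$, $\mathbf{b}^{\mathrm{bd}}_t = \mathbf{b}^{\mathrm{bd}}$ and $p_t = p$, the following procedure first uses Bayes' rule to compute $\mathbf{b}^{\mathrm{med}}_{t+1} \triangleq \big(b^{\mathrm{med}}_{t+1}(0), b^{\mathrm{med}}_{t+1}(1), \ldots, b^{\mathrm{med}}_{t+1}(N-1)\big)$, and then computes $(M_{t+1}, \alpha_{t+1})$ based on the value of $\mathbf{b}^{\mathrm{med}}_{t+1}$.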

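Case 1: if $o_t = 0$, the Bayesian update yields

$$
\begin{aligned}
b^{\mathrm{med}}_{t+1}(n')
&= \frac{\sum_{n \in \mathcal{N}} b^{\mathrm{bd}}(n)\,\omega_t\big(0,(1,n),(1,n')\big)\,\beta_t\big((1,n'),(1,n),p\big)}{\chi_t(0,\mathbf{b}^{\mathrm{bd}},p,1,1)}\\
&=
\begin{cases}
\binom{M}{n'}\left(\frac{\alpha-\alpha p}{1-\alpha p}\right)^{n'}\left(1-\frac{\alpha-\alpha p}{1-\alpha p}\right)^{M-n'}, & \text{if } 0 \le n' \le M,\\
0, & \text{otherwise.}
\end{cases}
\end{aligned}
$$

We require $\mathbf{b}^{\mathrm{bd}}_{t+1}$ to directly take the value of $\mathbf{b}^{\mathrm{med}}_{t+1}$, and set $(M_{t+1}, \alpha_{t+1}) = \left(M, \frac{\alpha-\alpha p}{1-\alpha p}\right)$.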

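Case 2: if $o_t = 1$, the Bayesian update yields

$$
\begin{aligned}
b^{\mathrm{med}}_{t+1}(n')
&= \frac{\sum_{n \in \mathcal{N}} b^{\mathrm{bd}}(n)\,\omega_t\big(1,(1,n),(1,n')\big)\,\beta_t\big((1,n'),(1,n),p\big)}{\chi_t(1,\mathbf{b}^{\mathrm{bd}},p,1,1)}\\
&=
\begin{cases}
\dfrac{1}{1-(1-\alpha p)^{M}}\left[\binom{M}{n'}\big(\alpha(1-p)\big)^{n'}\big(1-\alpha(1-p)\big)^{M-n'}
-(1-\alpha p)^{M}\binom{M}{n'}\left(\frac{\alpha-\alpha p}{1-\alpha p}\right)^{n'}\left(1-\frac{\alpha-\alpha p}{1-\alpha p}\right)^{M-n'}\right], & \text{if } 0 \le n' \le M-1,\\
0, & \text{otherwise.}
\end{cases}
\end{aligned}
$$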

Such a Bayesian update does not yield a distribution in the form (14). We therefore modify the value of $\mathbf{b}^{\mathrm{med}}_{t+1}$ to a distribution in the form (14) by keeping the mean of the distribution unchanged and considering that the number of active nodes will be reduced by at least one due to a busy slot. So, when $M > 1$, we set

$$
(M_{t+1}, \alpha_{t+1}) = \left(M-1,\ \frac{M(\alpha-\alpha p)\big[1-(1-\alpha p)^{M-1}\big]}{(M-1)\big[1-(1-\alpha p)^{M}\big]}\right),
$$

and when $M = 1$, we adopt the convention that $(M_{t+1}, \alpha_{t+1}) = (M-1, 1)$.

The accuracy of this approximation will be examined via numerical results at the end of this section.

TABLE I. A comparison between the realizations of $\mathbf{b}_t$ and its approximation $\mathbf{b}^{\mathrm{bd}}_t$ when each active node adopts $\pi^{\mathrm{heu}}$ for $N = 10$, $\lambda = 0.8$, $D = 10$.

| $t$ | $o_t$ | | $b_t(0)$ | $b_t(1)$ | $b_t(2)$ | $b_t(3)$ | $b_t(4)$ | $b_t(5)$ | $b_t(6)$ | $b_t(7)$ | $b_t(8)$ | $b_t(9)$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | $\mathbf{b}_t$ | 0.000001 | 0.000018 | 0.000295 | 0.002753 | 0.016515 | 0.066060 | 0.176161 | 0.301990 | 0.301990 | 0.134218 |
| | | Approx. | 0.000001 | 0.000018 | 0.000295 | 0.002753 | 0.016515 | 0.066060 | 0.176161 | 0.301990 | 0.301990 | 0.134218 |
| 2 | 1 | $\mathbf{b}_t$ | 0.000001 | 0.000042 | 0.000583 | 0.004760 | 0.024988 | 0.087458 | 0.204068 | 0.306102 | 0.267839 | 0.104160 |
| | | Approx. | 0.000001 | 0.000042 | 0.000583 | 0.004760 | 0.024988 | 0.087458 | 0.204068 | 0.306102 | 0.267839 | 0.104160 |
| 3 | 1 | $\mathbf{b}_t$ | 0.000059 | 0.001098 | 0.009014 | 0.042646 | 0.127254 | 0.245406 | 0.298859 | 0.210235 | 0.065430 | 0 |
| | | Approx. | 0.000052 | 0.001004 | 0.008559 | 0.041692 | 0.126924 | 0.247294 | 0.301138 | 0.209545 | 0.063792 | 0 |
| 4 | 1 | $\mathbf{b}_t$ | 0.001086 | 0.012248 | 0.059916 | 0.164987 | 0.276437 | 0.282086 | 0.162465 | 0.040774 | 0 | 0 |
| | | Approx. | 0.000974 | 0.011537 | 0.058598 | 0.165343 | 0.279925 | 0.284347 | 0.160466 | 0.038810 | 0 | 0 |
| 5 | 1 | $\mathbf{b}_t$ | 0.010921 | 0.072058 | 0.201100 | 0.304173 | 0.263268 | 0.123764 | 0.024716 | 0 | 0 | 0 |
| | | Approx. | 0.010329 | 0.070827 | 0.202359 | 0.308353 | 0.264299 | 0.120821 | 0.023013 | 0 | 0 | 0 |
| 6 | 0 | $\mathbf{b}_t$ | 0.068102 | 0.238724 | 0.340491 | 0.247285 | 0.091556 | 0.013842 | 0 | 0 | 0 | 0 |
| | | Approx. | 0.067210 | 0.240606 | 0.344541 | 0.246686 | 0.088312 | 0.012646 | 0 | 0 | 0 | 0 |
| 7 | 0 | $\mathbf{b}_t$ | 0.169904 | 0.357679 | 0.306377 | 0.133629 | 0.029713 | 0.002698 | 0 | 0 | 0 | 0 |
| | | Approx. | 0.167239 | 0.359554 | 0.309208 | 0.132956 | 0.028585 | 0.002458 | 0 | 0 | 0 | 0 |
| 8 | 1 | $\mathbf{b}_t$ | 0.421334 | 0.395352 | 0.150943 | 0.029344 | 0.002908 | 0.000118 | 0 | 0 | 0 | 0 |
| | | Approx. | 0.416144 | 0.398784 | 0.152859 | 0.029297 | 0.002807 | 0.000108 | 0 | 0 | 0 | 0 |
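The two-case update of $(M_t, \alpha_t)$ can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names `update_belief_params` and `busy_posterior` are hypothetical, the observation is passed as a boolean `busy` ($o_t = 1$), and inputs for which the observed event has probability zero are rejected. The helper `busy_posterior` evaluates the exact Case 2 posterior so that the mean-matching property can be checked numerically.

```python
from math import comb


def update_belief_params(M: int, alpha: float, p: float, busy: bool) -> tuple[int, float]:
    """Map (M_t, alpha_t) to (M_{t+1}, alpha_{t+1}) after an idle or busy slot."""
    if M < 0 or not 0.0 <= alpha <= 1.0 or not 0.0 <= p <= 1.0:
        raise ValueError("need M >= 0 and alpha, p in [0, 1]")
    stay_prob = 1.0 - alpha * p      # one candidate node does not transmit
    idle_prob = stay_prob ** M       # none of the M candidate nodes transmits
    if not busy:
        if M == 0:
            return 0, alpha          # nothing to update
        if idle_prob == 0.0:
            raise ValueError("an idle slot has probability zero for these inputs")
        # Case 1: exact Bayesian update, still binomial.
        return M, (alpha - alpha * p) / stay_prob
    if idle_prob == 1.0:
        raise ValueError("a busy slot has probability zero for these inputs")
    if M == 1:
        return 0, 1.0                # convention used for M = 1
    # Case 2: keep the posterior mean, shrink the support by one node.
    numerator = M * (alpha - alpha * p) * (1.0 - stay_prob ** (M - 1))
    return M - 1, numerator / ((M - 1) * (1.0 - idle_prob))


def busy_posterior(M: int, alpha: float, p: float) -> list[float]:
    """Exact posterior b^med_{t+1}(n') for o_t = 1, n' = 0, ..., M-1."""
    a = alpha * (1.0 - p)                    # still active after the slot
    c = a / (1.0 - alpha * p)                # same, given an idle slot
    idle_prob = (1.0 - alpha * p) ** M
    return [
        (comb(M, k) * a**k * (1.0 - a) ** (M - k)
         - idle_prob * comb(M, k) * c**k * (1.0 - c) ** (M - k)) / (1.0 - idle_prob)
        for k in range(M)
    ]
```

Comparing the mean of `busy_posterior(M, alpha, p)` with `M_next * alpha_next` from `update_belief_params(M, alpha, p, True)` gives a quick sanity check that the binomial replacement preserves the mean.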

C. A heuristic scheme

With the investigations in Sections V-A and V-B together, we are ready to propose a heuristic but very simple control scheme for the realistic environment, $\pi^{\mathrm{heu}}$.

At the beginning of each slot $t \in \mathcal{T}$, given the parameter vector of belief approximation $(M_t, \alpha_t) = (M, \alpha)$, each active node uses $M\alpha$ to estimate the mean of $n_t$, and further uses the following simple rule $\pi^{\mathrm{heu}}_t(M, \alpha)$ to determine the value of transmission probability $p_t$.

1) If $M\alpha + 1 > D - t + 1$ or $t = D$, set $\pi^{\mathrm{heu}}_t(M, \alpha)$ to maximize the expected instantaneous throughput, i.e.,

$$
\pi^{\mathrm{heu}}_t(M, \alpha) \in \arg\max_{p \in [0,1]} \sum_{n \in \mathcal{N}} b^{\mathrm{bd}}(n)\, r_t\big((1,n), p\big) = \min\left\{\frac{1}{M\alpha+\alpha},\ 1\right\}.
\tag{15}
$$

The proof of Eq. (15) is provided in Appendix D.

2) Otherwise, set

$$
\pi^{\mathrm{heu}}_t(M, \alpha) = \frac{1}{D-t+1}.
$$
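The decision rule above can be written compactly. The following is a sketch only; the function name `heuristic_probability` and its argument order are assumptions introduced here for illustration.

```python
def heuristic_probability(t: int, D: int, M: int, alpha: float) -> float:
    """Transmission probability for slot t given the belief parameters (M, alpha)."""
    if not 1 <= t <= D or M < 0 or not 0.0 <= alpha <= 1.0:
        raise ValueError("need 1 <= t <= D, M >= 0 and alpha in [0, 1]")
    remaining = D - t + 1
    if M * alpha + 1 > remaining or t == D:
        # Rule 1: min{1 / (M*alpha + alpha), 1}, as in Eq. (15).
        denom = (M + 1) * alpha
        return 1.0 if denom <= 1.0 else 1.0 / denom
    # Rule 2: spread transmissions evenly over the remaining slots.
    return 1.0 / remaining
```

For instance, with $D = 10$ and $(M, \alpha) = (9, 0.8)$, the estimated contention $M\alpha + 1 = 8.2$ does not exceed the 10 remaining slots at $t = 1$, so rule 2 applies; at $t = 5$ only 6 slots remain and rule 1 applies.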

Table I compares the realizations of $\mathbf{b}_t$ and its approximation $\mathbf{b}^{\mathrm{bd}}_t$ when each active node adopts $\pi^{\mathrm{heu}}$ for $N = 10$, $\lambda = 0.8$, $D = 10$, and verifies that the proposed approximation is reasonable.

VI. NUMERICAL EVALUATION

In this section, we present numerical results to compare the TDR performance of an optimal control scheme for the idealized environment $\widehat{\pi}^{*}$, the proposed heuristic scheme for the realistic environment $\pi^{\mathrm{heu}}$, and an optimal static scheme for the realistic environment $\pi^{\mathrm{sta}}$. Here, $\pi^{\mathrm{sta}}$ requires each active node to always adopt a static and identical transmission probability, and can be obtained using single-variable optimization methods. Such comparisons help to demonstrate both the performance loss due to the incomplete knowledge of the value of $n_t$ and the performance advantage of dynamically adjusting the transmission probability.

Fig. 7. The TDR as a function of the packet arrival rate $\lambda$ for $N = 50$, $D = 10, 20$, $\sigma = 0.9$.

The scenarios considered in the numerical experiments are in accordance with the system model specified in Section II. We shall vary the network configuration over a wide range to investigate the impact of control scheme design on the TDR performance. Each numerical result is obtained from $10^7$ independent numerical experiments.

Fig. 7 shows the TDR performance as a function of the packet arrival rate $\lambda$ for $N = 50$, $D = 10, 20$, $\sigma = 0.9$. We observe that $\pi^{\mathrm{heu}}$ performs close to $\widehat{\pi}^{*}$: 3.07%–8.28% loss when $D = 10$ and 0.60%–4.47% loss when $D = 20$. This indicates that the design of $\pi^{\mathrm{heu}}$ is reasonable, and that the incomplete knowledge of the actual value of $n_t$ has a minor impact on the TDR performance. We further observe that $\pi^{\mathrm{heu}}$ significantly outperforms $\pi^{\mathrm{sta}}$: 1.84%–17.12% improvement when $D = 10$ and 11.11%–19.40% improvement when $D = 20$. The reason is that $\pi^{\mathrm{sta}}$ does not adjust the transmission probability according to the current delivery urgency and contention intensity. Meanwhile, it is interesting to note that $\pi^{\mathrm{sta}}$ performs closer to the other schemes as $\lambda$ increases. This is because the optimal transmission probabilities for different values of $t$ and $n_t$ become closer as the value of $n_1$ increases, as indicated by Lemma 1.

Fig. 8. The TDR as a function of the delivery deadline $D$ (slots) for $N = 50$, $\lambda = 0.25$, $\sigma = 0.8, 1$.

The observations in Fig. 7 are confirmed again by Figs. 8–9, which show the TDR performance as a function of the delivery deadline $D$ (slots) and as a function of the packet success rate $\sigma$, respectively. In Fig. 8, we observe that $\pi^{\mathrm{heu}}$ performs significantly better than $\pi^{\mathrm{sta}}$: 6.45%–17.06% improvement when $\sigma = 0.8$ and 6.30%–16.74% improvement when $\sigma = 1$, and performs close to $\widehat{\pi}^{*}$: 3.12%–6.68% loss when $\sigma = 0.8$ and 3.45%–6.83% loss when $\sigma = 1$. In Fig. 9, we observe that $\pi^{\mathrm{heu}}$ performs significantly better than $\pi^{\mathrm{sta}}$: 18.51%–19.33% improvement when $\lambda = 0.1$ and 5.58%–5.81% improvement when $\lambda = 0.4$, and performs close to $\widehat{\pi}^{*}$: 0.55%–0.87% loss when $\lambda = 0.1$ and 4.11%–4.41% loss when $\lambda = 0.4$. It is also shown that $\pi^{\mathrm{sta}}$ performs close to the other transmission schemes as $\frac{N\lambda}{D}$ becomes larger.

Fig. 9. The TDR as a function of the packet success rate $\sigma$ for $N = 50$, $\lambda = 0.1, 0.4$, $D = 15$.

VII. CONCLUSION

In this paper, under the idealized and realistic environments, optimal dynamic control schemes for random access in deadline-constrained broadcasting with frame-synchronized traffic have been investigated based on the theories of MDP and POMDP, respectively. A novel feature of this work is to require each active node to determine the current transmission probability not only according to the knowledge of current contention intensity, but also according to the current delivery urgency. The proposed heuristic scheme for the realistic environment is able to achieve the threefold goal of being implemented without imposing extra overhead and hardware cost, of being implemented with very low computational complexity, and of achieving TDR close to the maximum achievable TDR in the idealized environment. An interesting and important future research direction is to optimize deadline-constrained broadcasting under general traffic patterns.

APPENDIX A

PROOF OF LEMMA 1

Assume each collision involves at most a finite number, $k \ge 2$, of packets. We begin with the case $t = D$. By Eq. (2), we know $U_D\big((1,n), p\big) = \sigma p (1-p)^{n}$ and thus $\widehat{\pi}^{*}_D(n) = \frac{1}{n+1}$. As $k$ and $D$ are both finite, for each $n_D = n \in \{m - k(D-1), m - k(D-1) + 1, \ldots, m\}$, we obtain that $m \to \infty$ implies $n \to \infty$ and then

$$
\lim_{m \to \infty} (n+1)\, U^{*}_D(1,n) = \lim_{m \to \infty} (n+1)\, U_D\Big((1,n), \frac{1}{n+1}\Big) = \frac{\sigma}{e}.
\tag{16}
$$

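Next, we consider the case $t = D-1$. By the finite-horizon policy evaluation algorithm [20] and Eqs. (1), (2), for each $n_{D-1} = n \in \{m - k(D-2), m - k(D-2) + 1, \ldots, m\}$, we have

$$
\begin{aligned}
(n+1)\, U_{D-1}\big((1,n), p\big)
&= (n+1)\, r_{D-1}\big((1,n), p\big) + (n+1) \sum_{n' \in \mathcal{N}} \beta_{D-1}\big((1,n'),(1,n),p\big)\, U^{*}_D(1,n')\\
&= \sigma (n+1) p (1-p)^{n} + \sum_{n' \in \mathcal{N}} (n+1) \frac{n!}{n'!\,(n-n')!}\, p^{n-n'} (1-p)^{n'+1}\, U^{*}_D(1,n')\\
&= \sigma (n+1) p (1-p)^{n} + \sum_{n' \in \mathcal{N}} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} (n'+1)\, U^{*}_D(1,n').
\end{aligned}
$$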

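By assuming each collision involves at most a finite number, $k \ge 2$, of packets, we have

$$
\begin{aligned}
(n+1)\, U_{D-1}\big((1,n), p\big)
&= \sigma (n+1) p (1-p)^{n} + \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} (n'+1)\, U^{*}_D(1,n')\\
&\quad + \left(1 - \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1}\right) (n-k+1)\, U^{*}_D(1,n-k)
\end{aligned}
\tag{17}
$$

$$
\begin{aligned}
&\le \sigma \Big(1 - \frac{1}{n+1}\Big)^{n} + (n-k+1)\, U^{*}_D(1,n-k)\\
&\quad + \sum_{n'=n-k+1}^{n} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} \Big[(n'+1)\, U^{*}_D(1,n') - (n-k+1)\, U^{*}_D(1,n-k)\Big].
\end{aligned}
\tag{18}
$$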

For each $n' \in \{n-k+1, n-k+2, \ldots, n\}$, since $0 \le \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} \le 1$, by applying the squeeze theorem, we obtain from Eq. (16) that

$$
\lim_{m \to \infty} \binom{n+1}{n-n'} p^{n-n'} (1-p)^{n'+1} \Big[(n'+1)\, U^{*}_D(1,n') - (n-k+1)\, U^{*}_D(1,n-k)\Big] = 0.
\tag{19}
$$

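By Eqs. (16), (19) and inequality (18), as $k$ and $D$ are both finite, we further obtain that $m \to \infty$ implies $n \to \infty$ and then

$$
\begin{aligned}
\limsup_{m \to \infty}\, (n+1)\, U_{D-1}\big((1,n), p\big)
&\le \limsup_{m \to \infty} \left[\sigma \Big(1 - \frac{1}{n+1}\Big)^{n} + (n-k+1)\, U^{*}_D(1,n-k)\right]\\
&= \lim_{m \to \infty} \left[\sigma \Big(1 - \frac{1}{n+1}\Big)^{n} + (n-k+1)\, U^{*}_D(1,n-k)\right]\\
&= \frac{2\sigma}{e},
\end{aligned}
$$

which implies

$$
\limsup_{m \to \infty}\, (n+1)\, U^{*}_{D-1}(1,n) \le \frac{2\sigma}{e}.
\tag{20}
$$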

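By setting $\widehat{\pi}_{D-1}(n) = \frac{1}{n+1}$ for each $n_{D-1} = n \in \{m - k(D-2), m - k(D-2) + 1, \ldots, m\}$, as $k$ and $D$ are both finite, we obtain that $m \to \infty$ implies $n \to \infty$, and then obtain from Eqs. (16), (17) and (19) that

$$
\begin{aligned}
\lim_{m \to \infty} (n+1)\, U_{D-1}\Big((1,n), \frac{1}{n+1}\Big)
&= \lim_{m \to \infty} \left[\sigma \Big(1 - \frac{1}{n+1}\Big)^{n} + (n-k+1)\, U^{*}_D(1,n-k)\right]\\
&= \frac{2\sigma}{e}.
\end{aligned}
$$

Since $U^{*}_{D-1}(1,n) \ge U_{D-1}\big((1,n), \frac{1}{n+1}\big)$, we have

$$
\liminf_{m \to \infty}\, (n+1)\, U^{*}_{D-1}(1,n) \ge \liminf_{m \to \infty}\, (n+1)\, U_{D-1}\Big((1,n), \frac{1}{n+1}\Big) = \frac{2\sigma}{e}.
\tag{21}
$$

Combining inequalities (20) and (21), we have $\lim_{m \to \infty} (n+1)\, U^{*}_{D-1}(1,n) = \frac{2\sigma}{e}$.

For the cases $t = D-2, D-3, \ldots, 1$, iteratively repeating the above argument leads to Eqs. (6) and (7) for each possible $n_t = n$.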

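APPENDIX B

PROOF OF LEMMA 2

As $U^{*}_t(1,0) = \sigma$ for each $t \in \mathcal{T}$, we have

$$
U_t\big((1,1), p\big) = 2\sigma p (1-p) + (1-p)^{2}\, U^{*}_{t+1}(1,1),
\tag{22}
$$

for each $t \in \mathcal{T} \setminus \{D\}$. Taking the derivative of $U_t\big((1,1), p\big)$ with respect to $p$ yields

$$
\frac{d}{dp} U_t\big((1,1), p\big) = \big(2\sigma - 2U^{*}_{t+1}(1,1)\big) - \big(4\sigma - 2U^{*}_{t+1}(1,1)\big)\, p.
$$

As $\sigma > 0$ and $U^{*}_t(1,1) \le \sigma$ for each $t \in \mathcal{T} \setminus \{D\}$, we have

$$
\widehat{\pi}^{*}_t(1) = \frac{\sigma - U^{*}_{t+1}(1,1)}{2\sigma - U^{*}_{t+1}(1,1)},
\tag{23}
$$

for each $t \in \mathcal{T} \setminus \{D\}$. In particular, as $U^{*}_D(1,1) = \sigma/4$, we obtain $\widehat{\pi}^{*}_{D-1}(1) = 3/7$, which satisfies Eq. (9).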

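Then, we aim to investigate the relation between $\widehat{\pi}^{*}_t(1)$ and $\widehat{\pi}^{*}_{D-1}(1)$ for each $t \in \mathcal{T} \setminus \{D-1, D\}$. By setting $p = \widehat{\pi}^{*}_t(1)$ in Eq. (22), we obtain

$$
\begin{aligned}
U^{*}_t(1,1) &= 2\sigma\, \widehat{\pi}^{*}_t(1)\big(1 - \widehat{\pi}^{*}_t(1)\big) + \big(1 - \widehat{\pi}^{*}_t(1)\big)^{2}\, U^{*}_{t+1}(1,1)\\
&= \frac{\sigma^{2}}{2\sigma - U^{*}_{t+1}(1,1)}.
\end{aligned}
\tag{24}
$$

Using Eq. (23) to express $U^{*}_{t+1}(1,1)$ and $U^{*}_t(1,1)$ in Eq. (24) in terms of $\widehat{\pi}^{*}_t(1)$ and $\widehat{\pi}^{*}_{t-1}(1)$, respectively, we have

$$
\widehat{\pi}^{*}_t(1) = \frac{\widehat{\pi}^{*}_{t+1}(1)}{1 + \widehat{\pi}^{*}_{t+1}(1)},
\tag{25}
$$

for each $t \in \mathcal{T} \setminus \{D-1, D\}$. Furthermore, recursively using Eq. (25) yields

$$
\widehat{\pi}^{*}_t(1) = \frac{\widehat{\pi}^{*}_{D-1}(1)}{1 + (D-t-1)\, \widehat{\pi}^{*}_{D-1}(1)}
\tag{26}
$$

and thus implies Eq. (9) by $\widehat{\pi}^{*}_{D-1}(1) = 3/7$.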

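Finally, combining Eqs. (9) and (23) yields

$$
U^{*}_t(1,1) = \frac{1 - 2\widehat{\pi}^{*}_{t-1}(1)}{1 - \widehat{\pi}^{*}_{t-1}(1)}\, \sigma = \frac{3D - 3t + 1}{3D - 3t + 4}\, \sigma,
\tag{27}
$$

for each $t \in \mathcal{T} \setminus \{1\}$, and substituting Eq. (27) into Eq. (24) yields $U^{*}_1(1,1) = \frac{3D-2}{3D+1}\, \sigma$. Hence we complete the proof for Eq. (8).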
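As a sanity check on this derivation, the backward recursion in Eqs. (23) and (24) can be evaluated in exact rational arithmetic and compared with the closed forms that follow from Eqs. (26) and (27), namely $\widehat{\pi}^{*}_t(1) = \frac{3}{3D-3t+4}$ for $t < D$ and $U^{*}_t(1,1) = \frac{3D-3t+1}{3D-3t+4}\sigma$. The function name `lemma2_backward` and the dictionary layout are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction


def lemma2_backward(D: int, sigma: Fraction = Fraction(1)):
    """Run Eqs. (23)-(24) backward for one other active node (n = 1).

    Returns two dicts keyed by slot t: U[t] = U*_t(1,1), pi[t] = optimal p.
    """
    if D < 1:
        raise ValueError("need D >= 1")
    U = {D: sigma / 4}            # max over p of sigma * p * (1 - p)
    pi = {D: Fraction(1, 2)}      # maximizer in the last slot
    for t in range(D - 1, 0, -1):
        pi[t] = (sigma - U[t + 1]) / (2 * sigma - U[t + 1])   # Eq. (23)
        U[t] = sigma**2 / (2 * sigma - U[t + 1])              # Eq. (24)
    return U, pi


if __name__ == "__main__":
    D = 10
    U, pi = lemma2_backward(D)
    for t in range(1, D):
        print(t, pi[t], U[t])
```

Because `Fraction` is exact, any mismatch with the closed forms would indicate an algebra error rather than rounding.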

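APPENDIX C

PROOF OF LEMMA 3

We shall prove $U^{\mathrm{eve}}_t(1,n) = \sigma \big(1 - \frac{1}{D-t+1}\big)^{n}$ for each $n \in \mathcal{N}$ by induction from $t = D-1$ down to $1$.

First, when $t = D-1$, by Eqs. (1), (2) and (11), we have

$$
\begin{aligned}
U^{\mathrm{eve}}_{D-1}(1,n)
&= r_{D-1}\big((1,n), \widehat{\pi}^{\mathrm{eve}}_{D-1}(n)\big) + \sum_{n' \in \mathcal{N}} \beta_{D-1}\big((1,n'),(1,n),\widehat{\pi}^{\mathrm{eve}}_{D-1}(n)\big)\, U^{\mathrm{eve}}_D(1,n')\\
&= \sigma\, \frac{1}{2} \Big(1 - \frac{1}{2}\Big)^{n} + \Big(1 - \frac{1}{2}\Big)^{n+1} U^{\mathrm{eve}}_D(1,0)\\
&= \sigma \Big(1 - \frac{1}{2}\Big)^{n},
\end{aligned}
$$

for each $n \in \mathcal{N}$, thereby establishing the induction basis.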

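Next, when $t \in \mathcal{T} \setminus \{D-1, D\}$, we assume $U^{\mathrm{eve}}_{t+1}(1,n) = \sigma \big(1 - \frac{1}{D-t}\big)^{n}$ for each $n \in \mathcal{N}$. By Eqs. (1), (2) and (11), we have

$$
\begin{aligned}
U^{\mathrm{eve}}_t(1,n)
&= r_t\big((1,n), \widehat{\pi}^{\mathrm{eve}}_t(n)\big) + \sum_{n' \in \mathcal{N}} \beta_t\big((1,n'),(1,n),\widehat{\pi}^{\mathrm{eve}}_t(n)\big)\, U^{\mathrm{eve}}_{t+1}(1,n')\\
&= \sigma\, \frac{1}{D-t+1} \Big(1 - \frac{1}{D-t+1}\Big)^{n}\\
&\quad + \sum_{n' \in \mathcal{N}} \binom{n}{n-n'} \Big(\frac{1}{D-t+1}\Big)^{n-n'} \Big(1 - \frac{1}{D-t+1}\Big)^{n'+1} \sigma \Big(1 - \frac{1}{D-t}\Big)^{n'}\\
&= \sigma \Big(1 - \frac{1}{D-t+1}\Big)^{n} \frac{1}{D-t+1}\\
&\quad + \sigma \Big(1 - \frac{1}{D-t+1}\Big)^{n} \frac{D-t}{D-t+1} \sum_{n' \in \mathcal{N}} \binom{n}{n-n'} \Big(\frac{1}{D-t}\Big)^{n-n'} \Big(1 - \frac{1}{D-t}\Big)^{n'}\\
&= \sigma \Big(1 - \frac{1}{D-t+1}\Big)^{n},
\end{aligned}
$$

for each $n \in \mathcal{N}$. So, the inductive step is established.

Since both the base case and the inductive step have been proved, we have $U^{\mathrm{eve}}_t(1,n) = \sigma \big(1 - \frac{1}{D-t+1}\big)^{n}$ for each $t \in \mathcal{T} \setminus \{D\}$ and each $n \in \mathcal{N}$.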
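The induction can also be checked numerically by evaluating the policy $\widehat{\pi}^{\mathrm{eve}}_t(n) = \frac{1}{D-t+1}$ with exact fractions. The sketch below assumes the reward $r_t\big((1,n),p\big) = \sigma p (1-p)^{n}$ and the transition probability $\beta_t\big((1,n'),(1,n),p\big) = \binom{n}{n-n'} p^{n-n'} (1-p)^{n'+1}$, as they appear in the expansions above; the function name `evenly_spread_value` is hypothetical.

```python
from fractions import Fraction
from math import comb


def evenly_spread_value(D: int, n_max: int, sigma: Fraction = Fraction(1)):
    """Evaluate U^eve_t(1, n) for t = D..1 and n = 0..n_max by backward recursion."""
    U = {}
    for t in range(D, 0, -1):
        p = Fraction(1, D - t + 1)            # evenly spread over remaining slots
        for n in range(n_max + 1):
            value = sigma * p * (1 - p) ** n  # immediate reward of the tagged node
            if t < D:
                for n_next in range(n + 1):
                    # n - n_next other nodes transmit; the tagged node and the
                    # n_next remaining nodes stay silent.
                    prob = comb(n, n - n_next) * p ** (n - n_next) * (1 - p) ** (n_next + 1)
                    value += prob * U[(t + 1, n_next)]
            U[(t, n)] = value
    return U
```

Comparing `U[(t, n)]` with `sigma * (1 - Fraction(1, D - t + 1)) ** n` for small `D` and `n_max` reproduces the closed form of Lemma 3 exactly.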

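APPENDIX D

PROOF OF EQ. (15)

Letting $f\big((M,\alpha), p\big) \triangleq \frac{(M+1)\alpha}{\sigma} \sum_{n \in \mathcal{N}} b^{\mathrm{bd}}(n)\, r_t\big((1,n), p\big)$ for $p \in [0,1]$ and $c_i = \binom{M+1}{i} \alpha^{i} (1-\alpha)^{M+1-i}$ for $1 \le i \le M+1$, we have

$$
f\big((M,\alpha), p\big) = \sum_{i=1}^{M+1} i\, c_i\, p (1-p)^{i-1}.
$$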

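The derivative of $f\big((M,\alpha), p\big)$ with respect to $p$ is given by

$$
\begin{aligned}
\frac{d}{dp} f\big((M,\alpha), p\big)
&= \sum_{i=1}^{M+1} i\, c_i (1-p)^{i-1} - \sum_{i=2}^{M+1} i(i-1)\, c_i\, p (1-p)^{i-2}\\
&= (M+1)\alpha + (M+1)^{2} \alpha^{M+1} (-p)^{M} + \sum_{j=1}^{M-1} \beta_j\, p^{j},
\end{aligned}
\tag{28}
$$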

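where $\beta_j$, $1 \le j \le M-1$, is derived as follows:

$$
\begin{aligned}
\beta_j
&= (-1)^{j} \sum_{k=1}^{M+1-j} \binom{M+1}{j+k} \alpha^{j+k} (1-\alpha)^{M+1-j-k}\, (j+k) \left[\binom{j+k-1}{j} + (j+k-1) \binom{j+k-2}{j-1}\right]\\
&= (-1)^{j} (j+1)^{2} \alpha^{j+1} \sum_{k=1}^{M+1-j} \binom{j+k}{k-1} \binom{M+1}{j+k} \alpha^{k-1} (1-\alpha)^{M+1-j-k}\\
&= (-1)^{j} (j+1)^{2} \alpha^{j+1} \binom{M+1}{j+1} \sum_{k=1}^{M+1-j} \binom{M-j}{k-1} \alpha^{k-1} (1-\alpha)^{M-j-k+1}\\
&= (-1)^{j} \alpha^{j+1} \Big[j (M+1)^{2} + (M+1)(M-j)\Big] \frac{(M-1)!}{(M-j)!\, j!}\\
&= (-1)^{j} \alpha^{j+1} \left[(M+1)^{2} \binom{M-1}{j-1} + (M+1) \binom{M-1}{j}\right].
\end{aligned}
\tag{29}
$$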

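Combining Eqs. (28) and (29), we have

$$
\begin{aligned}
\frac{d}{dp} f\big((M,\alpha), p\big)
&= (M+1)\alpha + (M+1)^{2} \alpha^{M+1} (-p)^{M} + \sum_{j=1}^{M-1} \left[(M+1)^{2} \binom{M-1}{j-1} + (M+1) \binom{M-1}{j}\right] \alpha^{j+1} (-p)^{j}\\
&= (M+1)\alpha \big(1 - (M+1)\alpha p\big) + (M+1)\alpha \big(1 - (M+1)\alpha p\big) (-\alpha p)^{M-1}\\
&\quad + \sum_{j=1}^{M-2} \binom{M-1}{j} (M+1)\alpha \big(1 - (M+1)\alpha p\big) (-\alpha p)^{j}\\
&= (M+1)\alpha \big(1 - (M+1)\alpha p\big) (1 - \alpha p)^{M-1}.
\end{aligned}
\tag{30}
$$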

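From Eq. (30), for $p \in [0,1]$, we obtain that $f\big((M,\alpha), p\big) \le f\big((M,\alpha), \frac{1}{M\alpha+\alpha}\big)$ when $\frac{1}{M\alpha+\alpha} \le 1$, and $f\big((M,\alpha), p\big) \le f\big((M,\alpha), 1\big)$ when $\frac{1}{M\alpha+\alpha} > 1$.

Hence we complete the proof for Eq. (15).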
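A brute-force check of Eq. (15) is sketched below. It evaluates the expected instantaneous throughput $\sum_{n} b^{\mathrm{bd}}(n)\, \sigma p (1-p)^{n}$ directly from the binomial belief in Eq. (14) and compares a grid of $p$ values against the claimed maximizer $\min\{\frac{1}{M\alpha+\alpha}, 1\}$. The function names, the grid resolution and the tolerance are illustrative assumptions.

```python
from math import comb


def expected_throughput(M: int, alpha: float, p: float, sigma: float = 1.0) -> float:
    """Sum over n of b^bd(n) * sigma * p * (1 - p)^n with b^bd ~ Binomial(M, alpha)."""
    return sum(
        comb(M, n) * alpha**n * (1.0 - alpha) ** (M - n) * sigma * p * (1.0 - p) ** n
        for n in range(M + 1)
    )


def claimed_maximizer(M: int, alpha: float) -> float:
    """min{1 / (M*alpha + alpha), 1}, the right-hand side of Eq. (15)."""
    denom = (M + 1) * alpha
    return 1.0 if denom <= 1.0 else 1.0 / denom


def is_grid_optimal(M: int, alpha: float, steps: int = 2000, tol: float = 1e-12) -> bool:
    """True if no grid point p in [0, 1] beats the claimed maximizer."""
    best = expected_throughput(M, alpha, claimed_maximizer(M, alpha))
    return all(
        expected_throughput(M, alpha, i / steps) <= best + tol
        for i in range(steps + 1)
    )
```

A grid search cannot replace the proof, but it catches sign or index slips when the derivative in Eq. (30) is re-derived.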

ACKNOWLEDGEMENT

The authors would like to thank Dr. He Chen for helpful

suggestions and discussions.

REFERENCES

[1] D. Feng, C. She, K. Ying, L. Lai, Z. Hou, T. Q. S. Quek, Y. Li, and

B. Vucetic, “Toward ultrareliable low-latency communications: Typical

scenarios, possible solutions, and open issues,” IEEE Veh. Technol. Mag.,

vol. 14, no. 2, pp. 94–102, 2019.

[2] J. Gao, M. Li, L. Zhao, and X. Shen, “Contention intensity based distributed coordination for V2V safety message broadcast,” IEEE Trans. Veh. Technol., vol. 67, no. 12, pp. 12288–12301, 2018.

[3] M. Luvisotto, Z. Pang, and D. Dzung, “High-performance wireless

networks for industrial control applications: New targets and feasibility,”

Proc. IEEE, vol. 107, no. 6, pp. 1074–1093, 2019.

[4] Y. H. Bae, “Analysis of optimal random access for broadcasting with

deadline in cognitive radio networks,” IEEE Commun. Lett., vol. 17,

no. 3, pp. 573–575, 2013.

[5] Y. H. Bae, “Random access scheme to improve broadcast reliability,”

IEEE Commun. Lett., vol. 17, no. 7, pp. 1467–1470, 2013.

[6] Y. H. Bae, “Queueing analysis of deadline-constrained broadcasting in

wireless networks,” IEEE Commun. Lett., vol. 19, no. 10, pp. 1782–

1785, 2015.

[7] C. Campolo, A. Vinel, A. Molinaro, and Y. Koucheryavy, “Modeling

broadcasting in IEEE 802.11p/WAVE vehicular networks,” IEEE Com-

mun. Lett., vol. 15, no. 2, pp. 199–201, 2011.

[8] L. Deng, J. Deng, P. Chen, and Y. S. Han, “On the asymptotic perfor-

mance of delay-constrained slotted ALOHA,” in Proc. IEEE ICCCN,

2018, pp. 1–8.

[9] Y. Zhang, Y. Lo, F. Shu, and J. Li, “Achieving maximum reliability

in deadline-constrained random access with multiple-packet reception,”

IEEE Trans. Veh. Technol., vol. 68, no. 6, pp. 5997–6008, 2019.

[10] L. Deng, F. Liu, Y. Zhang, and W. S. Wong, “Delay-constrained

topology-transparent distributed scheduling for MANETs,” IEEE Trans.

Veh. Technol., vol. 70, no. 1, pp. 1083–1088, 2021.

[11] A. Segall, “Recursive estimation from discrete-time point processes,”

IEEE Trans. Inf. Theory, vol. 22, no. 4, pp. 422–431, 1976.

[12] R. Rivest, “Network control by Bayesian broadcast,” IEEE Trans. Inf.

Theory, vol. IT-33, no. 3, pp. 323–328, 1987.

[13] G. del Angel and T. L. Fine, “Optimal power and retransmission control

policies for random access systems,” IEEE/ACM Trans. Netw., vol. 12,

no. 6, pp. 1156–1166, 2004.

[14] L. Bononi, M. Conti, and E. Gregori, “Runtime optimization of IEEE

802.11 wireless LANs performance,” IEEE Trans. Parallel Distrib. Syst.,

vol. 15, no. 1, pp. 66–80, 2004.

[15] H. Wu, C. Zhu, R. J. La, X. Liu, and Y. Zhang, “FASA: Accelerated S-

ALOHA using access history for event-driven M2M communications,”

IEEE/ACM Trans. Netw., vol. 21, no. 6, pp. 1904–1917, 2013.

[16] R. Smallwood and E. Sondik, “The optimal control of partially observ-

able Markov processes over a ﬁnite horizon,” Oper. Res., vol. 21, no. 5,

pp. 1071–1088, 1973.

[17] Y. Zhang, A. Gong, Y. Lo, J. Li, F. Shu, and W. S. Wong, “Generalized

p-persistent CSMA for asynchronous multiple-packet reception,” IEEE

Trans. Commun., vol. 67, no. 10, pp. 6966–6979, 2019.

[18] A. Biason, S. Dey, and M. Zorzi, “A decentralized optimization frame-

work for energy harvesting devices,” IEEE Trans. Mob. Comput., vol. 17,

no. 11, pp. 2483–2496, 2018.

[19] A 5G trafﬁc model for industrial use cases. White Paper, 5G Alliance

for Connected Industries and Automation, 2019.

[20] M. L. Puterman, Markov decision processes: Discrete stochastic dy-

namic programming. John Wiley & Sons, 2014.

[21] P. R. Kumar and P. Varaiya, Stochastic systems: Estimation, identiﬁca-

tion, and adaptive control. SIAM, 2015.

[22] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein, “The

complexity of decentralized control of Markov decision processes,”

Math. Oper. Res., vol. 27, no. 4, pp. 819–840, 2002.