Game of Ages
Kumar Saurav
School of Technology and Computer Science
Tata Institute of Fundamental Research
Mumbai, India.
kumar.saurav@tifr.res.in
Rahul Vaze
School of Technology and Computer Science
Tata Institute of Fundamental Research
Mumbai, India.
rahul.vaze@gmail.com
Abstract—We consider a distributed IoT network, where each
node wants to minimize its age of information and there is a
cost to make any transmission. A collision model is considered,
where any transmission is successful from a node to a common
monitor if no other node transmits in the same slot. There is
no explicit communication/coordination between any two nodes.
The selfish objective of each node is to minimize a function of its
individual age of information and its transmission cost. Under
this distributed competition model, the objective of this paper
is to find a distributed transmission strategy for each node that
converges to an equilibrium. The proposed transmission strategy
only depends on the past observations seen by each node and
does not require explicit information of the number of other
nodes, or their strategies. A simple update strategy is shown to converge to an equilibrium, which is in fact a Nash equilibrium for a suitable utility function that captures the relevant tradeoffs for each node. In addition, the price of anarchy for this utility function is shown to approach unity as the number of nodes grows large.
Index Terms—Age of information, distributed equilibrium,
game theory
I. INTRODUCTION
Consider the modern IoT paradigm in a 5G context, where a large number of small IoT devices are spread across a medium-sized environment, e.g., a home, an office, an automobile, or a factory floor. Each IoT device is monitoring certain inputs, and wants to communicate an update to a common monitor essentially as soon as possible. To model this scenario, a metric called the age of information (AoI) was introduced recently, which represents the freshness of information at the monitor/receiver side, and it has become a very popular object of theoretical interest in the recent past [1]–[5]. A nice review can be found in [6]. Essentially, the age for any device at any time is the difference between the current time and the generation time of the last update.
Many variants of the AoI problem for a single node, e.g., depending on the scheduling discipline such as FCFS [1] or LCFS [7], and more importantly with multiple nodes, have been considered in prior work, e.g., with multiple sources in [2], [4], [5], [8]–[11]. With multiple sources, in each time slot, one bit of information can be sent from a set $S$ of sources to a monitor, e.g., $|S| = 1$ in [8], and the objective is to minimize the long-term weighted sum of the ages of all sources, subject to individual source throughput constraints.
We acknowledge support of the Department of Atomic Energy, Government
of India, under project no. 12-R&D-TFR-5.01-0500.
One common assumption across almost all prior work on AoI with multiple nodes is centralized control over the transmission decisions of each node. For example, in the policy of [8], transmission decisions for each node are based on the current global age of each source. Centralized policies lead to a large overhead and delay, which could be limiting in a practical large-scale IoT deployment, where devices are low-powered and delay-sensitive, and a distributed or autonomous setup is preferred, where each IoT device can make its own decisions, given the transmission history.
This paper focuses on the distributed IoT paradigm, where
each device has to make autonomous decisions with no
communication between nodes. To keep the model simple, we
consider slotted time, and assume that if two nodes transmit in
the same slot, a collision occurs, and no update is received at
the monitor. To keep the model practical, we assume that each
node incurs a fixed cost for each transmission, thus ensuring
that no node can transmit all the time. Under this distributed
setting, the objective of each node is to minimize its own
time-averaged age of information while incurring a reasonable
average cost of transmission.
A typical approach to study such problems is to model them as a game with a particular utility function, and then try to find a Nash equilibrium (NE) for it. There are multiple issues with such an approach: (i) the choice of the exact utility function is not obvious, and more importantly, (ii) gathering network information, e.g., the number of other nodes, may not be possible in a distributed setting, possibly because it varies over time. The solution concept of NE is important in a distributed competitive setting, since it establishes that there is a fixed point or a stable strategy for each node, and ensures that the system can be driven to an equilibrium.
In this paper, to eliminate the need for network information, we take an alternate approach to reach equilibrium by considering a local probabilistic transmit (learning) algorithm for each node, which decides the probability with which each node transmits its most recent packet in any slot. The learning algorithm for each node is local in the sense that it only depends on its own current time-averaged age, current average transmission cost, and past history of successes/failures in slots in which it had transmitted a packet. With this learning algorithm, the objective is to show that it converges to a fixed point/equilibrium when followed by all nodes autonomously.
In prior work, finding learning algorithms that achieve equilibrium has been considered for congestion games (which are also potential games), where the congestion costs are additive, and the multiplicative weights learning algorithm is known to converge to the NE [12], [13]. For a more general setting, Friedman and Shenker [14] showed that learning algorithms can achieve the NE in a two-player zero-sum game; however, a similar result does not hold for a three-player game, as shown by Daskalakis et al. [15]. For a brief survey, we refer the reader to the work of Shoham et al. [16]. For non-congestion games, learning algorithms achieving the NE have been briefly considered [17]–[19]. Similar to our setup, there is also work [20], [21] on finding utility functions for which a given set of strategies is an NE. Finding utility functions for which the given set of strategies in addition has a low price of anarchy, however, has remained intractable.
The most related work on learning algorithms to achieve
equilibrium for communication settings is [22]–[24]. In par-
ticular, for analyzing exponential backoff [22] and for arrival
games [23], existence of equilibrium via a learning algorithm
is established. In [24], an uplink throughput game is consid-
ered, where in a distributed setup each node is interested in
maximizing its throughput via updating its transmission rate
using a learning algorithm. Notably, [24] shows that it is not
always possible to show the existence of an equilibrium or
how to achieve it, and in principle, the learning algorithm
based approach to achieve equilibrium is challenging.
In our model, we consider that each IoT device always has a packet to transmit, following [8]–[10]. In the presence of multiple competing nodes and the collision model, it is not clear when each node should transmit, given that there is no explicit communication between any two nodes and there is a cost for each transmission. Thus, a local probabilistic transmit (learning) algorithm is considered, where each node decides to transmit in each slot with a probability that is determined by its own local knowledge of past successes and failures, current empirical age/cost, etc., and the goal is to reach an equilibrium in this distributed setting. In particular, the proposed learning algorithm weights the current empirical averages of age and cost inverse-exponentially, which is intuitive: the larger the time-averaged age, the more aggressive the transmission probability should be, and the opposite holds for a large average transmission cost. The fact that a deterministic strategy cannot be an equilibrium strategy can be argued rather easily.
The learning algorithm tries to find the right balance between transmitting too often, which leads to many collisions and a large transmission cost, and transmitting too seldom, which increases the time-averaged age. Moreover, the learning algorithm does not need any knowledge of the network, for example, the number of other nodes in the network, the transmission strategies of other nodes, etc.
The main result of this paper is to show that the proposed
learning algorithm converges to a unique, non-trivial fixed
point (equilibrium). We also explicitly characterize the fixed
point, and show that it is in fact a NE for a game, where
the corresponding virtual utility function captures the relevant
tradeoffs of the problem, i.e., the utility function for each node
is a function of its own transmission probability via the time-
averaged age and average transmission cost, and is decreasing
in the other nodes’ transmission probabilities, etc.
It is worth noting that the actual probabilistic learning
algorithm makes no use of the knowledge of this virtual
utility function that depends on network parameters such as
the number of nodes in the network, and that is why we call
it the virtual utility function. The virtual utility function is
discovered only to characterize the fixed point of the learning
algorithm. Moreover, we are also able to show that the price
of anarchy of the virtual utility function approaches 1 as the
number of nodes grows large. This shows that even if nodes
knew the virtual utility function and could collaborate, the
optimal social utility would be close to the sum of the utilities
obtained by the proposed learning algorithm.
The main technical ingredients of the paper are as follows. We first consider an expected version of the proposed learning algorithm, where all random variables are replaced by their expected values. We then find the underlying virtual utility function that the expected learning algorithm is trying to maximize. Corresponding to this utility function, we identify a multiplayer game $\mathcal{G}$, and show that there is a unique NE for this game, which is achieved by the best response strategy. To show the convergence of the proposed learning algorithm to a fixed point, we show that its updates converge to the best response actions for $\mathcal{G}$. Thus, in two steps: (i) the proposed learning algorithm converges to the best response actions for $\mathcal{G}$, and (ii) the best response actions for $\mathcal{G}$ converge to the NE of $\mathcal{G}$, we show that the proposed learning algorithm converges to a fixed point characterized by the NE of $\mathcal{G}$. Note that this correspondence between the learning algorithm and the best response strategy that requires network information is made only for analysis, and the learning algorithm itself does not need any network information.
We also present numerical results to validate our theoretical findings. In particular, we show that the proposed learning algorithm converges to an equilibrium quite fast, and does so for any choice of $N$, the number of nodes in the network. To show this effect, we perturb the system by increasing/decreasing $N$, and plot the resultant transmission probabilities. We also plot the time-averaged age seen by any node in the network, which appears to grow exponentially with $N$ as expected, since there is no coordination in the network, and the success probability for node $i$ is $p_i\prod_{j=1, j\ne i}^{N}(1-p_j)$ if $p_i$ is the fixed point for each node $i$. Even though we analytically show only that the price of anarchy approaches 1 as the number of nodes becomes large, in simulations we observe that it is very close to 1 for all values of the number of nodes in the network.
II. SYSTEM MODEL
Consider a network with $N$ nodes and a single receiver/monitor. Time is discretized into equal-length slots. Following prior work [8]–[11], we assume that a new data-packet (in short, packet) is generated in each slot at each node. If a node decides to transmit in any slot, it transmits the most recent packet, irrespective of the successes/failures of transmissions in earlier slots. A packet transmitted by a node in a slot is correctly decoded by the monitor if no other node transmits in the same slot. Otherwise, a collision occurs and all the simultaneously transmitted packets are lost. For realistic modelling, we assume that each transmission by node $\ell$ costs $c_\ell$ units, to capture the transmit energy cost, etc. Also, in each slot $i$, the age of node $\ell$ is given by $i - \mu_\ell(i)$, where $\mu_\ell(i)$ is the last slot (relative to slot $i$) in which a packet of node $\ell$ was successfully received by the monitor; e.g., if the monitor last received a packet of node $\ell$ in slot 7, then in slot 10 its age is 3. Fig. 1 shows a sample plot of age against slot. Here, $\Delta_\ell(i)$ denotes the age of node $\ell$ in slot $i$, while $\Delta_{\ell 0}$ is its initial age. Until the monitor receives a packet from node $\ell$, the age of node $\ell$ grows linearly with the passage of slots, and it drops to 0 when a packet is received.
Fig. 1. Sample plot of age (of node $\ell$) against slot.
Since the nodes are distributed and there is no coordination/communication between them, a natural competition model emerges. Each node wants to transmit often to minimize its age, but in distinct slots, since otherwise there is a collision in which all the colliding nodes accrue a transmission cost without any age reduction. Thus, each node inherently wants to selfishly maximize a utility function that depends on its time-averaged age and average transmission cost. The most appropriate form of the utility function is debatable, and even if we consider a specific utility function, analytically showing that an NE exists and can be achieved may not be tractable. The basic idea behind seeking an NE is to show that there is a fixed point or a stable strategy for each node, and that the system can be made to operate at an equilibrium.
In this paper, we take an alternate approach to reach equilibrium by considering a local probabilistic learning algorithm (called the learning algorithm hereafter) for each node, which decides the probability with which the node transmits its most recent packet in any slot. The learning algorithm for each node is local in the sense that it only depends on its own current time-averaged age, current average transmission cost, and past history of successes/failures in slots in which it had transmitted a packet. This approach completely eliminates the need for knowledge of network parameters, such as the number of other nodes in the network and their strategies, which would be needed if we were to follow the usual technique of considering a utility function for each node and finding its NE.
With this learning algorithm, the objective is to show that it converges to a fixed point/equilibrium when followed by all nodes autonomously. A priori, this appears to be a challenging task; however, we show in the next section that it is possible for the considered problem. In fact, we also characterize the fixed point that this learning algorithm achieves.
III. THE LEARNING ALGORITHM
Let $m$ be a positive integer. Divide the time-axis into frames by grouping $m$ consecutive slots together. Therefore, the $t$-th frame consists of slot $m(t-1)+1$ to slot $mt$. Henceforth, let $(t,i)$ refer to slot $m(t-1)+i$, i.e., the $i$-th slot in the $t$-th frame. Further, let $C^{av}_{\ell}(t)$ be the average transmission cost of node $\ell$ in frame $t$, given by
$$C^{av}_{\ell}(t) = \frac{c_\ell \sum_{i=1}^{m} T_\ell(t,i)}{m}, \qquad (1)$$
where $c_\ell$ is the cost per transmission for node $\ell$, and $T_\ell(t,i)$ is a binary random variable which takes value 1 if node $\ell$ transmits in slot $(t,i)$, and 0 otherwise. Also, let $\Delta^{av}_{\ell}(t)$ be the time-averaged age of node $\ell$ in frame $t$ (assuming the age at the start of the frame to be 0), given by
$$\Delta^{av}_{\ell}(t) = \frac{\sum_{i=1}^{m} \Delta_\ell(t,i)}{m}, \qquad (2)$$
where $\Delta_\ell(t,i) = \min\{i, \mu_\ell(t,i)\}$ is the age of node $\ell$ in slot $(t,i)$ with the age reset to 0 at the start of frame $t$ (recall that $\mu_\ell(t,i)$ denotes the last slot (relative to slot $(t,i)$) in which a packet of node $\ell$ was successfully received by the monitor).
Consider the following learning algorithm for deciding the probability with which to transmit in any slot of a given frame: in each slot of frame $t$, node $\ell$ transmits the packet with probability $p_\ell(t)$, where $p_\ell(1)$ is initialized with a random value from the interval $(0,1)$, whereas at the end of each frame $t \ge 1$, $p_\ell(t+1)$ is given by
$$p_\ell(t+1) = \max\left\{p^{min}_{\ell},\ p_\ell(t) + \kappa(t)\left(e^{-\rho_{\ell 1} C^{av}_{\ell}(t)} - \frac{1}{(1+\Delta^{av}_{\ell}(t))\,e^{\rho_{\ell 2}}} - p_\ell(t)\right)\right\}, \qquad (3)$$
where $\kappa(t) \in (0,1]$ is the learning rate, while $p^{min}_{\ell} \in (0,1)$ and $\rho_{\ell 1}, \rho_{\ell 2} > 0$ are constants decided by node $\ell$ locally (i.e., independently of the other nodes in the network).
The intuition for (3) is as follows. If $C^{av}_{\ell}(t)$ is high, $p_\ell(t+1)$ should decrease, whereas if $\Delta^{av}_{\ell}(t)$ is high, $p_\ell(t+1)$ should increase. To account for this trade-off, the learning component of algorithm (3) consists of two additive terms: (i) $e^{-\rho_{\ell 1} C^{av}_{\ell}(t)}$ (a decreasing function of $C^{av}_{\ell}(t)$), and (ii) $-1/((1+\Delta^{av}_{\ell}(t))e^{\rho_{\ell 2}})$ (an increasing function of $\Delta^{av}_{\ell}(t)$ due to the negative sign). Because $C^{av}_{\ell}(t) \in [0, c_\ell]$ and $\Delta^{av}_{\ell}(t) \in [0,\infty)$, the value of the first term belongs to the interval $[e^{-c_\ell\rho_{\ell 1}}, 1]$ while the value of the second term belongs to the interval $[-e^{-\rho_{\ell 2}}, 0)$. Therefore, $\rho_{\ell 1}$ and $\rho_{\ell 2}$ control the relative weights of $C^{av}_{\ell}(t)$ and $\Delta^{av}_{\ell}(t)$ respectively, as well as the range of values the corresponding terms can take.
Note that by a simple rearrangement of terms in (3), we obtain $p_\ell(t+1) = \max\{p^{min}_{\ell},\ \kappa(t)(e^{-\rho_{\ell 1} C^{av}_{\ell}(t)} - \frac{1}{(1+\Delta^{av}_{\ell}(t))e^{\rho_{\ell 2}}}) + (1-\kappa(t))p_\ell(t)\}$. For each $t \ge 1$, $e^{-\rho_{\ell 1} C^{av}_{\ell}(t)} - 1/((1+\Delta^{av}_{\ell}(t))e^{\rho_{\ell 2}}) < 1$, and because we initialize $p_\ell(1) < 1$, for $t \ge 1$ and $\kappa(t) \in (0,1]$ we have $\kappa(t)(e^{-\rho_{\ell 1} C^{av}_{\ell}(t)} - 1/((1+\Delta^{av}_{\ell}(t))e^{\rho_{\ell 2}})) + (1-\kappa(t))p_\ell(t) < 1$ (because a convex combination of two terms with values in the interval $(0,1)$ also lies in that interval). Also, the $\max\{\cdot\}$ function ensures that $p_\ell(t+1) \ge p^{min}_{\ell} > 0$. Hence, $\forall t$, $p_\ell(t+1) \in [p^{min}_{\ell}, 1)$, and as $[p^{min}_{\ell}, 1) \subseteq [0,1]$, $p_\ell(t+1)$ is a valid transmission probability.
Remark 1: For analytical tractability, we update the transmission probability for frame $t+1$, i.e., $p_\ell(t+1)$ in (3), using $C^{av}_{\ell}(t)$ and $\Delta^{av}_{\ell}(t)$, which only account for the average transmission cost and time-averaged age of the previous (i.e., $t$-th) frame (instead of all the previous frames). We show that by choosing a large enough frame length $m$, this simplification is sufficient.
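To make the update concrete, the following Python sketch simulates (3) for $N$ nodes under the collision model of Section II. It is an illustration only, not part of the analysis: the frame length, costs, $\rho_{\ell 1}$, $\rho_{\ell 2}$, $p^{min}_{\ell}$, and the random seed are assumed values chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_frame(p, c, m):
    """Simulate one frame of m slots; return per-node average cost (1) and
    time-averaged age (2), with ages reset to 0 at the frame start."""
    N = len(p)
    age = np.zeros(N)
    cost_sum = np.zeros(N)
    age_sum = np.zeros(N)
    for _ in range(m):
        tx = rng.random(N) < p            # T_l(t, i): who transmits this slot
        cost_sum += c * tx                # each transmission costs c_l
        age += 1.0
        if tx.sum() == 1:                 # success only if exactly one transmitter
            age[np.argmax(tx)] = 0.0      # age of the successful node resets
        age_sum += age
    return cost_sum / m, age_sum / m

def update(p, C_av, D_av, kappa, rho1, rho2, p_min):
    """One step of the learning algorithm (3)."""
    v = np.exp(-rho1 * C_av) - 1.0 / ((1.0 + D_av) * np.exp(rho2)) - p
    return np.maximum(p_min, p + kappa * v)

N, m = 10, 5000
c = np.ones(N)                            # illustrative per-transmission costs
rho1, rho2, p_min = 1.0, 1.0, 0.05        # illustrative local constants
p = rng.uniform(0.1, 0.9, N)              # p_l(1) random in (0, 1)
for t in range(1, 51):
    C_av, D_av = simulate_frame(p, c, m)
    p = update(p, C_av, D_av, 1.0 / t, rho1, rho2, p_min)  # kappa(t) = 1/t
print(p)                                  # transmission probabilities after 50 frames
```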
Now, with the given description of the learning algorithm,
the three main results of the paper can be summarized as
follows.
Theorem 1: If all the nodes in the system obtain their
transmission probability using the learning algorithm (3), then
their transmission probabilities converge to a unique fixed
point almost surely.
Theorem 1 establishes that the learning algorithm (3) can
achieve an equilibrium. Next, we characterize this equilibrium
as follows.
Theorem 2: The unique fixed point of Theorem 1 corresponds to the NE of the non-cooperative game $\mathcal{G} = \{N, p_\ell, U_\ell;\ \ell \in [N]\}$, where the $N$ nodes act as $N$ players, and each node $\ell$ chooses an action $p_\ell \in [p^{min}_{\ell}, 1]$ to maximize its own (virtual) utility $U_\ell$ given by
$$U_\ell(p_\ell; P_{-\ell}) = -\frac{e^{-\alpha_\ell p_\ell}}{\alpha_\ell} - \frac{p_\ell^2}{2}\left(1 + \frac{1}{b_\ell}\right) + \frac{1+\alpha_\ell}{\alpha_\ell}, \qquad (4)$$
where $\alpha_\ell = c_\ell\rho_{\ell 1}$, $b_\ell = \left(\prod_{k\ne\ell}(1-p_k)\right)^{-1} e^{\rho_{\ell 2}}$, and $P_{-\ell}$ denotes the transmission probabilities of all the nodes in the system except node $\ell$.
The utility function (4) is relevant for the considered problem since it depends on a node's own transmission probability via the time-averaged age and average transmission cost, and is decreasing in the other nodes' transmission probabilities through $b_\ell$.
Even though an NE guarantees an equilibrium, its efficiency is quantified by the price of anarchy ($PoA$), which counts the price of selfish behaviour. For $U_\ell$ defined in (4), let $U_{sys} = \sum_\ell U_\ell(p_\ell; P_{-\ell})$ be the sum of the utilities of all nodes. Then the PoA is defined as $PoA = \frac{U_{sys}(P_{OPT})}{U_{sys}(P_{NE})}$, where $P_{OPT}$ is the global maximizer of $U_{sys}$ while $P_{NE}$ is the NE point. In the next theorem, we show that the $PoA$ remains close to 1 (as desirable) for the considered game with the virtual utility function (4).
Theorem 3: For the non-cooperative game $\mathcal{G}$ defined in Theorem 2, as $N$ becomes large, the price of anarchy ($PoA$) approaches unity.
The rest of the theoretical part of the paper is dedicated to proving the above three theorems. We first consider an expected version of the proposed learning algorithm (3), where all random variables are replaced by their expected values. We then find the underlying virtual utility function that the expected learning algorithm is trying to maximize. Corresponding to this utility function, we identify a multiplayer game $\mathcal{G}$, and show that there is a unique NE for this game, which is achieved by the best response strategy. To show the convergence of the proposed learning algorithm to a fixed point, we show that its updates converge to the best response actions for $\mathcal{G}$. Thus, in two steps: i) the proposed learning algorithm converges to the best response actions for $\mathcal{G}$, and ii) the best response actions for $\mathcal{G}$ converge to the NE of $\mathcal{G}$, we show that the proposed learning algorithm converges to a fixed point characterized by the NE of $\mathcal{G}$.
Let $P(t)$ denote the transmission probability vector of the $N$ nodes. Then we have the following lemma (proof in Appendix A).
Lemma 1: For a given $P(t)$, if $m$ is large,
1) $C^{av}_{\ell}(t) \xrightarrow{a.s.} c_\ell \cdot p_\ell(t)$,
2) $\Delta^{av}_{\ell}(t) \xrightarrow{a.s.} \dfrac{1 - p_\ell(t)\prod_{k\ne\ell}(1-p_k(t))}{p_\ell(t)\prod_{k\ne\ell}(1-p_k(t))}$.
Replacing $C^{av}_{\ell}(t)$ and $\Delta^{av}_{\ell}(t)$ in (3) by their converged values (assuming large $m$ and using Lemma 1), we obtain the following expected form of the learning algorithm (3):
$$p_\ell(t+1) = \max\left\{p^{min}_{\ell},\ p_\ell(t) + \kappa(t)\left(e^{-\alpha_\ell p_\ell(t)} - \frac{1}{b_\ell/p_\ell(t)} - p_\ell(t)\right)\right\}, \qquad (5)$$
where $\alpha_\ell = c_\ell\rho_{\ell 1}$ and $b_\ell = \left(\prod_{k\ne\ell}(1-p_k)\right)^{-1} e^{\rho_{\ell 2}}$. To avoid overloading notation, we use the same symbol $p$ for this expected update strategy (5) as in (3); the distinction will be clear in the sequel. Note that (5) is just for analysis and cannot be used in practice, since $b_\ell$ is unknown. Now, using (5), we extract the virtual utility function that (5) is trying to maximize.
Theorem 4: Let $P_{-\ell}$ denote the transmission probabilities of all the nodes in the system except node $\ell$. Then, for a given $P_{-\ell}$, the learning algorithm (5) maximizes the following virtual utility function (unique up to a constant):
$$U_\ell(p_\ell; P_{-\ell}) = -\frac{e^{-\alpha_\ell p_\ell}}{\alpha_\ell} - \frac{p_\ell^2}{2}\left(1 + \frac{1}{b_\ell}\right) + \frac{1+\alpha_\ell}{\alpha_\ell}, \qquad (6)$$
which is continuous and strictly concave for $p_\ell$ in the interval $(0,1]$, with a unique maximizer $p^*_{\ell}$ that lies in $[e^{-\alpha_\ell}/2, 1)$.
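As a quick sanity check (ours, not the paper's), the following sketch verifies numerically that the stationary point of the expected update (5) is exactly the maximizer of (6), and that it lies in $[e^{-\alpha_\ell}/2, 1)$; the values of $\alpha_\ell$ and $b_\ell$ are illustrative.

```python
import numpy as np
from scipy.optimize import brentq

alpha, b = 1.0, 4.0                            # illustrative alpha_l and b_l

def U(p):                                      # virtual utility (6)
    return -np.exp(-alpha * p) / alpha - 0.5 * p**2 * (1 + 1 / b) \
           + (1 + alpha) / alpha

def dU(p):                                     # its derivative; also the drift in (5)
    return np.exp(-alpha * p) - p / b - p

p_star = brentq(dU, 1e-9, 1.0)                 # unique root of dU in (0, 1]
grid = np.linspace(1e-3, 1.0, 100_000)
assert abs(grid[np.argmax(U(grid))] - p_star) < 1e-4
print(p_star, p_star >= np.exp(-alpha) / 2)    # maximizer lies in [e^{-alpha}/2, 1)
```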
A. Non-Cooperative Game Model

Using the virtual utility function (6), we next define a game where the strategy of each node is the probability with which to transmit in each slot in an autonomous way. Let $\mathcal{G} = \{N, p_\ell, U_\ell;\ \ell \in [N]\}$ be a game with the $N$ nodes as players, where each node $\ell$ chooses an action $p_\ell \in [p^{min}_{\ell}, 1]$ to maximize its own utility $U_\ell$ given by (6). The best response of a node $\ell$ is given by
$$p^{br}_{\ell} = \arg\max_{p_\ell \in [p^{min}_{\ell}, 1]} U_\ell(p_\ell; P_{-\ell}), \qquad (7)$$
and under the best response strategy, at the end of each frame $t$, $p_\ell(t+1) = p^{br}_{\ell}$. Further, a Nash equilibrium (NE) is said to exist for $\mathcal{G}$ if there exists a transmission probability vector $P_{NE}$ such that, for each node $\ell$, $p^{NE}_{\ell}$ is the best response for node $\ell$ given $P^{NE}_{-\ell}$. Note that the set $\{p_\ell \mid p^{min}_{\ell} \le p_\ell \le 1\}$ is non-empty, compact and convex in $\mathbb{R}$. Additionally, from Theorem 4, the utility function (6) is continuous and strictly concave (strict concavity implies quasi-concavity as well) for $p_\ell(t) \in [p^{min}_{\ell}, 1]$. Hence, using Proposition 1, we conclude that an NE exists for $\mathcal{G}$.
Proposition 1: [Proposition 20.3 in [25]] The non-cooperative game $\mathcal{G} = \{N, p_\ell, U_\ell;\ \ell \in [N]\}$ has a Nash equilibrium if, for all $\ell \in [N]$,
1) the set of actions $\{p_\ell\}$ of player $\ell$ is a non-empty, compact, convex subset of a Euclidean space, and
2) the utility function $U_\ell$ is continuous and quasi-concave on the set of actions $\{p_\ell\}$.
Next, we show that the best response strategy (7) for $\mathcal{G}$ converges to the unique NE.
B. Convergence of Best Response Strategy

Theorem 5: For the non-cooperative game $\mathcal{G}$, if for each node $\ell$,
$$\frac{(N-1)(1-p^{min}_{global})^{(N-2)}}{e^{\rho_{\ell 2}}(\alpha_\ell + 1)} < 1 \qquad (8)$$
(where $p^{min}_{global} \le \min_j\{p^{min}_{j}\}$), then the best response strategy converges to the unique NE.
Theorem 5 is proved in Appendix D. Note that (8) depends on the values of $N$ and $p^{min}_{global}$. We have assumed that the value of $N$ is unknown to the nodes. Next, we show that if each node chooses its parameters depending on a predetermined value of $p^{min}_{global}$, independent of the value of $N$, then (8) can be made to hold for all values of $N$. The result is summarized in the following lemma (detailed proof in Appendix E).
Lemma 2: Let $p^{min}_{global} \in (0, 0.5)$ be the lower bound on $p^{min}_{\ell}$ for each node $\ell$, known as part of the learning algorithm (3). If each node $\ell$ assigns $p^{min}_{\ell} = p^{min}_{global}$, $\rho_{\ell 1} \le -\frac{1}{c_\ell}\ln(2p^{min}_{global})$, and
$$\rho_{\ell 2} > \max\left\{0,\ \ln\left(\frac{(n^*-1)(1-p^{min}_{global})^{(n^*-2)}}{\alpha_\ell + 1}\right)\right\}, \qquad (9)$$
where $n^* = 1 - \frac{1}{\ln(1-p^{min}_{global})}$, then (8) is always satisfied.
Remark 2: As per Lemma 2, $\rho_{\ell 1} \le -\frac{1}{c_\ell}\ln(2p^{min}_{global})$. Suppose that for each node $\ell$, $\rho_{\ell 1} = -\frac{1}{c_\ell}\ln(2p^{min}_{global})$; then $\alpha_\ell = c_\ell\rho_{\ell 1} = -\ln(2p^{min}_{global})$, which is a global constant (independent of $c_\ell$). Now, from (5), note that for each node $\ell$, the trajectory of its transmission probability is determined by $\alpha_\ell$, $\rho_{\ell 2}$ and $p^{min}_{\ell}$. When $\alpha_\ell$ is independent of $c_\ell$, then as per Lemma 2, $\rho_{\ell 2}$ and $p^{min}_{\ell}$ are also independent of $c_\ell$. Therefore, if for each node $\ell$, $\rho_{\ell 1} = -\frac{1}{c_\ell}\ln(2p^{min}_{global})$, then the trajectory of the transmission probability is independent of $c_\ell$.
In summary, we conclude that each node can independently
choose the parameters such that the condition (8) for conver-
gence of the best response strategy is satisfied for all values
of N.
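A sketch of this local parameter choice is given below; each node runs it knowing only its own cost $c_\ell$ and the predetermined $p^{min}_{global}$, with the concrete inputs being illustrative.

```python
import math

def choose_parameters(c, p_min_global):
    """Pick (p_min, rho1, rho2) per Lemma 2, locally and independently of N."""
    assert 0 < p_min_global < 0.5
    p_min = p_min_global
    rho1 = -math.log(2 * p_min_global) / c        # largest rho1 allowed by Lemma 2
    alpha = c * rho1                              # = -ln(2 p_min_global)
    n_star = 1 - 1 / math.log(1 - p_min_global)
    f_star = (n_star - 1) * (1 - p_min_global) ** (n_star - 2)
    rho2 = max(0.0, math.log(f_star / (alpha + 1))) + 1e-6  # strict inequality (9)
    return p_min, rho1, rho2

print(choose_parameters(c=1.0, p_min_global=0.05))
```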
To finally prove Theorem 1, we next show that the learning algorithm (3) converges to the best response strategy (7), which converges to the NE of $\mathcal{G}$ as shown in Theorem 5.
C. Convergence of Learning Algorithm (3) to the Best Response Strategy (7)

Definition 1: For the real-valued concave function $U_\ell : (0,1] \to \mathbb{R}$, $\phi$ is said to be its subgradient at a point $p^* \in (0,1]$ if, for every other point $p^0 \in (0,1]$, we have $U_\ell(p^0) - U_\ell(p^*) \le \phi\cdot(p^0 - p^*)$. Further, a function $v(t)$ (where $t$ is time) is called a stochastic subgradient of $U_\ell(p_\ell; P_{-\ell})$ at the point $p_\ell(t)$ if, for given values of the random variables $(p_\ell(0), p_\ell(1), \ldots, p_\ell(t))$, $E\{v(t)\mid p_\ell(0), p_\ell(1), \ldots, p_\ell(t); P_{-\ell}\}$ is a subgradient of $U_\ell(p_\ell; P_{-\ell})$ at $p_\ell(t)$.

Consider the function $v_\ell(t)$ defined as
$$v_\ell(t) = e^{-\rho_{\ell 1} C^{av}_{\ell}(t)} - \frac{1}{(1+\Delta^{av}_{\ell}(t))\,e^{\rho_{\ell 2}}} - p_\ell(t). \qquad (10)$$
Taking the conditional expectation on both sides of (10), we get
$$E\{v_\ell(t)\mid p_\ell(0), p_\ell(1), \ldots, p_\ell(t); P_{-\ell}(t)\} \overset{(a)}{=} e^{-\alpha_\ell p_\ell(t)} - \frac{1}{b_\ell/p_\ell(t)} - p_\ell(t) \overset{(b)}{=} \frac{\partial U_\ell(p_\ell(t); P_{-\ell}(t))}{\partial p_\ell(t)}, \qquad (11)$$
where (a) is obtained for large $m$ using Lemma 1, with $\alpha_\ell$ and $b_\ell$ denoting $c_\ell\rho_{\ell 1}$ and $\left(\prod_{k\ne\ell}(1-p_k)\right)^{-1}e^{\rho_{\ell 2}}$ respectively, while (b) is obtained using (6). Further, due to Theorem 4, we know that $U_\ell$ is a concave function of $p_\ell$ (for fixed $P_{-\ell}$). Therefore, for $p^0, p^* \in (0,1]$ and $p^0 \ne p^*$,
$$U_\ell(p^0; P_{-\ell}) - U_\ell(p^*; P_{-\ell}) \le \left.\frac{\partial U_\ell(p_\ell; P_{-\ell})}{\partial p_\ell}\right|_{p_\ell=p^*}\cdot(p^0 - p^*). \qquad (12)$$
Hence, using (11) and (12) along with Definition 1, we conclude that $v_\ell(t)$ is a stochastic subgradient of $U_\ell(p_\ell(t); P_{-\ell}(t))$. Now, using (10), we can write the learning algorithm (3) as
$$p_\ell(t+1) = \max\{p^{min}_{\ell},\ p_\ell(t) + \kappa(t)v_\ell(t)\}, \qquad (13)$$
which suggests that the learning algorithm can be interpreted as a stochastic subgradient algorithm [26] that maximizes the virtual utility function given by (6). Using this interpretation of the learning algorithm, we obtain Theorem 6, with a detailed proof in Appendix F.
Theorem 6: If the learning rate $\kappa(\cdot)$ is chosen such that $\forall t$, $\kappa(t) > 0$, $\sum_{t=1}^{\infty}\kappa(t) = \infty$, and $\sum_{t=1}^{\infty}\kappa^2(t) < \infty$, then the learning algorithm (3) converges to the best response strategy (7) almost surely.
Theorem 6 shows that for properly chosen learning rates (for example, $\kappa(t) = 1/t$, $\forall t \ge 1$), the learning algorithm converges to the best response strategy almost surely, and if (8) is satisfied, then according to Theorem 5, the best response strategy further converges to a unique NE. Thus, combining Theorem 5 and Theorem 6, we conclude that if all the nodes update their transmission probabilities by following the learning algorithm (3) with an appropriate learning rate and with parameters satisfying (8), then their transmission probabilities converge to a unique NE (a fixed point) almost surely. This completes the proof of Theorem 1 and Theorem 2 simultaneously. The proof of Theorem 3 can be found in Appendix G.
IV. NUMERICAL RESULTS
We analyzed the convergence properties of the learning algorithm (3) by simulating a scenario with 10 nodes, with $\rho_{\ell 1}$, $\rho_{\ell 2}$, and $p^{min}_{\ell}$ chosen as per Lemma 2 for different values of $p^{min}_{global}$. Also, $\kappa(t) = 1/t$, $\forall t \ge 1$. As shown in Fig. 2, the transmission probability obtained using the learning algorithm converges to the best response strategy very quickly.
Fig. 2. Variation of transmission probability of a node with time (learning algorithm vs. best response strategy).
To check the robustness of the learning algorithm under dynamic conditions, we performed a second simulation with 3 nodes at $t = 0$, with 7 new nodes joining the system at $t = 20$ and leaving it again at $t = 80$. As shown in Fig. 3, irrespective of the disturbance, the learning algorithm converges to the best response strategy. Note, however, that the learning rate $\kappa(t)$ decreases with $t$. Hence, if the system is disturbed at large $t$, the convergence is slow. This issue can be resolved by reinitializing $t$ whenever it becomes very large.
To understand the effect of the number of nodes $N$ on the fixed point of the learning algorithm (3), Fig. 4 plots the (converged) transmission probability obtained using the learning algorithm (3) for different values of $N$ (for the simulation, we used $p^{min}_{global} = 0.05$ and $c_\ell = 1$ for each node $\ell$, while $p^{min}_{\ell}$, $\rho_{\ell 1}$ and $\rho_{\ell 2}$ were chosen as per Lemma 2). For a comparative study, Fig. 4 also plots the transmission probability for the round-robin (RR) scheme, in which each node is assigned a slot in round-robin fashion to avoid collisions. With RR, the nodes may transmit their packets only in their respective allotted slots, with the transmission probability obtained using (3). However, note that RR is only of theoretical interest, because in practice there is no mechanism for slot allotment (as the nodes cannot communicate with each other, nor is there a centralized controller to do so).
Fig. 3. Convergence of the learning algorithm to the best response strategy when the number of nodes varies with time.
Remark 3: When the number of nodes is small, the interval between consecutive allotted slots of each node in RR is also small. Therefore, depending on the transmission cost of a node, it may not be optimal for the node to transmit a packet in every allotted slot. To account for this fact, we consider that in RR, a node transmits in its allotted slot with the probability obtained using (3). Further, due to this specific choice of (3) for obtaining the transmission probability under RR, the comparison of the corresponding plots for the learning algorithm and RR provides a nice insight into the impact of collisions on the learning algorithm (3).
For the learning algorithm (3), as $N$ increases, there are two phenomena that simultaneously influence the transmission probability: (i) for large $N$, the frequency of packet collisions is high, so the average transmission cost increases with $N$, thereby decreasing the transmission probability; (ii) with more collisions happening (and fewer packets getting received by the monitor) due to large $N$, the time-averaged age becomes high, thereby increasing the transmission probability. As shown in Fig. 4, for small $N$, phenomenon (ii) dominates, thereby increasing the transmission probability. However, when $N$ is large, the two phenomena balance each other, and hence the transmission probability saturates.
For the round-robin scheme, as $N$ increases, the interval between successive allotted slots of each node becomes large. Therefore, for a fixed transmission probability, the average transmission cost decreases, whereas the time-averaged age increases: both lead to an increase in the transmission probability. Therefore, the transmission probability under RR increases very rapidly with $N$ (in comparison to the learning algorithm).
Additionally, we analysed the variation in time-averaged age with increasing $N$ for the learning algorithm as well as the round-robin scheme. As shown in Fig. 5, the time-averaged age for the learning algorithm increases much more rapidly with $N$ than for the round-robin scheme.
Fig. 4. Variation of transmission probability of a node with number of nodes.
If a packet from node $\ell$ is successfully received by the monitor once every $T$ slots, then using (2) and assuming $m/T$ ($m$ is the number of slots in each frame) to be an integer, we get the time-averaged age to be
$$\Delta^{av}_{\ell}(t) = \frac{\sum_{j=1}^{m/T}\sum_{i=1}^{T}\Delta_\ell(t,i)}{\sum_{j=1}^{m/T} T} = \frac{T^2/2}{T} = \frac{T}{2}. \qquad (14)$$
In the round-robin scheme, a packet is successfully received every $N E[s_{\ell a}]$ slots, where $s_{\ell a}$ is the number of allotted slots per transmission for node $\ell$. As shown in Fig. 4, the transmission probability increases with $N$, and hence $E[s_{\ell a}]$ decreases (approaches 1) as $N$ increases. Therefore, the time-averaged age for the round-robin scheme is $N E[s_{\ell a}]/2$, which converges to $N/2$ when $N$ is large. Fig. 5 shows a similar trend, as can be verified using the transmission probability values from Fig. 4.
Now, for the learning algorithm (3), the probability that a packet of node $\ell$ is received by the monitor is $p_\ell\prod_{k\ne\ell}(1-p_k)$. Suppose the transmission probability of each node is equal (say, $p$). Then the probability that a packet of node $\ell$ is received by the monitor becomes $p(1-p)^{N-1}$, and hence the expected number of slots required for each successful reception of a packet by the monitor is $O((1-p)^{-N})$. So, as $N$ increases, the time-averaged age for the learning algorithm grows exponentially.
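The following back-of-the-envelope sketch (ours) illustrates this scaling: with a symmetric fixed point $p$, the mean inter-reception time is $1/(p(1-p)^{N-1})$, so by (14) the time-averaged age grows exponentially in $N$; the value of $p$ is illustrative.

```python
p = 0.5                                  # illustrative symmetric fixed point
for N in range(2, 13, 2):
    T = 1.0 / (p * (1 - p) ** (N - 1))   # expected slots per successful reception
    print(N, T / 2)                       # time-averaged age ~ T/2, cf. (14)
```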
Fig. 5. Variation of time-averaged age with number of nodes.
Finally, we also computed the price of anarchy ($PoA$) with the utility function of each node given by (4). For any combination of node transmission probabilities $P$, the overall utility of the system is $U_{sys}(P) = \sum_\ell U_\ell(p_\ell; P_{-\ell})$, where $U_\ell(p_\ell; P_{-\ell})$ is the utility of node $\ell$. Therefore, the $PoA$ of the learning algorithm is
$$PoA = \frac{U_{sys}(P_{OPT})}{U_{sys}(P_{LA})}, \qquad (15)$$
where $P_{OPT}$ is the optimal transmission probability vector that maximizes $U_{sys}(\cdot)$, while $P_{LA}$ is the vector of (converged) transmission probabilities obtained using the learning algorithm (3). Note that $PoA \ge 1$, and a value close to 1 indicates that the algorithm is close to optimal.
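A minimal sketch of how such a $PoA$ curve can be estimated is given below, assuming symmetric nodes so that the NE is the symmetric solution of (28); the parameters $\alpha$ and $\rho_{\ell 2}$ and the use of a generic numerical optimizer as a proxy for $P_{OPT}$ are our assumptions.

```python
import numpy as np
from scipy.optimize import brentq, minimize

alpha, rho2 = 1.0, 1.0                          # illustrative alpha_l, rho_l2

def U_sys(P):
    """System utility: sum of virtual utilities (4) over all nodes."""
    P = np.asarray(P, dtype=float)
    total = 0.0
    for l in range(len(P)):
        b = np.exp(rho2) / np.prod(np.delete(1 - P, l))
        total += (-np.exp(-alpha * P[l]) / alpha
                  - 0.5 * P[l] ** 2 * (1 + 1 / b) + (1 + alpha) / alpha)
    return total

for N in [2, 4, 8, 16]:
    # symmetric NE: e^{-alpha p} = p (1 + (1-p)^{N-1} e^{-rho2}), cf. (28)
    g = lambda p: np.exp(-alpha * p) - p * (1 + (1 - p) ** (N - 1) * np.exp(-rho2))
    p_ne = brentq(g, 1e-9, 1.0)
    res = minimize(lambda P: -U_sys(P), np.full(N, p_ne),
                   bounds=[(0.05, 0.999)] * N)  # numerical proxy for P_OPT
    print(N, -res.fun / U_sys(np.full(N, p_ne)))  # PoA estimate, eq. (15)
```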
Figure 6 plots the $PoA$ of the learning algorithm (3) for different values of $N$. Initially, as $N$ increases, the $PoA$ increases as well. However, for large $N$, the $PoA$ converges back to unity, as per Theorem 3. A detailed explanation of this phenomenon is given in Appendix G.
Fig. 6. Variation in price of anarchy ($PoA$) of the learning algorithm with number of nodes.
V. CONCLUSION
In this paper, we have presented a new direction for achieving equilibrium in a distributed IoT setting, where each node is interested in minimizing its age of information when there is a cost for each transmission. Typically, for distributed models, one identifies a utility function for each node and tries to establish an NE for it. However, such an approach requires network knowledge, e.g., the number of nodes in the network and their strategies, which may not be available in a distributed network. We instead propose a simple local update (learning) strategy for each node that determines the probability with which to transmit in each slot, depending on the current empirical averages of age and cost. For an appropriate choice of parameters, this strategy is shown to achieve an equilibrium that is also identified as the NE of a suitable virtual game. To further quantify the efficiency of this learning strategy, it is shown that the price of anarchy of the virtual game approaches unity when the number of nodes in the network is large enough.
REFERENCES
[1] S. Kaul, R. Yates, and M. Gruteser, “Real-time status: How often should
one update?” in INFOCOM, 2012 Proceedings IEEE. IEEE, 2012, pp.
2731–2735.
[2] L. Huang and E. Modiano, “Optimizing age-of-information in a multi-
class queueing system,” arXiv preprint arXiv:1504.05103, 2015.
[3] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff,
“Update or wait: How to keep your data fresh,” IEEE Transactions on
Information Theory, vol. 63, no. 11, pp. 7492–7508, 2017.
[4] R. D. Yates and S. K. Kaul, “The age of information: Real-time status
updating by multiple sources,” arXiv preprint arXiv:1608.08622, 2016.
[5] E. Najm, R. Nasser, and E. Telatar, “Content based status updates,” arXiv
preprint arXiv:1801.04067, 2018.
[6] A. Kosta, N. Pappas, V. Angelakis et al., “Age of information: A new
concept, metric, and tool,” Foundations and Trends® in Networking,
vol. 12, no. 3, pp. 162–259, 2017.
[7] S. K. Kaul, R. D. Yates, and M. Gruteser, “Status updates through
queues,” in 2012 46th Annual Conference on Information Sciences and
Systems (CISS). IEEE, 2012, pp. 1–6.
[8] I. Kadota, A. Sinha, and E. Modiano, “Scheduling algorithms for
optimizing age of information in wireless networks with throughput
constraints,” in INFOCOM, 2018 Proceedings IEEE. IEEE, 2018.
[9] Y. Sun, E. Uysal-Biyikoglu, and S. Kompella, “Age-optimal updates of
multiple information flows,” in IEEE INFOCOM 2018-IEEE Confer-
ence on Computer Communications Workshops (INFOCOM WKSHPS).
IEEE, 2018, pp. 136–141.
[10] Y.-P. Hsu, E. Modiano, and L. Duan, “Scheduling algorithms for
minimizing age of information in wireless broadcast networks with
random arrivals: The no-buffer case,” arXiv preprint arXiv:1712.07419,
2017.
[11] V. Tripathi and S. Moharir, “Age of information in multi-source
systems,” in GLOBECOM 2017-2017 IEEE Global Communications
Conference. IEEE, 2017, pp. 1–6.
[12] R. Kleinberg, G. Piliouras, and E. Tardos, “Multiplicative updates out-
perform generic no-regret learning in congestion games,” in Proceedings
of the forty-first annual ACM symposium on Theory of computing.
ACM, 2009, pp. 533–542.
[13] W. Krichene, B. Drighès, and A. M. Bayen, "Online learning of Nash equilibria in congestion games," SIAM Journal on Control and Optimization, vol. 53, no. 2, pp. 1056–1081, 2015.
[14] E. Friedman and S. Shenker, “Learning and implementation on the
internet,” Manuscript. New Brunswick: Rutgers University, Department
of Economics, 1997.
[15] C. Daskalakis, R. Frongillo, C. H. Papadimitriou, G. Pierrakos, and G. Valiant, "On learning algorithms for Nash equilibria," in International Symposium on Algorithmic Game Theory. Springer, 2010, pp. 114–125.
[16] Y. Shoham, R. Powers, and T. Grenager, “If multi-agent learning is the
answer, what is the question?” Artificial Intelligence, vol. 171, no. 7,
pp. 365–377, 2007.
[17] E. Altman and N. Shimkin, “Individual equilibrium and learning in
processor sharing systems,” Operations Research, vol. 46, no. 6, pp.
776–784, 1998.
[18] G. Kasbekar and A. Proutiere, “Opportunistic medium access in multi-
channel wireless systems: A learning approach,” in Communication,
Control, and Computing (Allerton), 2010 48th Annual Allerton Con-
ference on. IEEE, 2010, pp. 1288–1294.
[19] X. Chen and J. Huang, “Distributed spectrum access with spatial reuse,”
IEEE Journal on Selected Areas in Communications, vol. 31, no. 3, pp.
593–603, 2013.
[20] N. Li and J. R. Marden, “Designing games for distributed optimization,”
IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 2, pp.
230–242, 2013.
[21] J. R. Marden and A. Wierman, “Distributed welfare games,” Operations
Research, vol. 61, no. 1, pp. 155–168, 2013.
[22] A. Tang, J.-W. Lee, J. Huang, M. Chiang, and A. R. Calderbank,
“Reverse engineering MAC,” in 2006 4th International Symposium on
Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks.
IEEE, 2006, pp. 1–11.
[23] P. Thaker, A. Gopalan, and R. Vaze, “When to arrive in a congested
system: Achieving equilibrium via learning algorithm,” in 2017 15th
International Symposium on Modeling and Optimization in Mobile, Ad
Hoc, and Wireless Networks (WiOpt). IEEE, 2017, pp. 1–8.
[24] E. Sabir, R. El-Azouzi, V. Kavitha, Y. Hayel, and E.-H. Bouyakhf, "Stochastic learning solution for constrained Nash equilibrium throughput in non saturated wireless collision channels," in Proceedings of the Fourth International ICST Conference on Performance Evaluation Methodologies and Tools. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2009, p. 61.
[25] M. J. Osborne and A. Rubinstein, A course in game theory. MIT press,
1994.
[26] S. Boyd and A. Mutapcic, “Stochastic subgradient methods,” Lecture
Notes for EE364b, Stanford University, 2008.
[27] N. Sandrić, "A note on the Birkhoff ergodic theorem," Results in Mathematics, vol. 72, no. 1-2, pp. 715–730, 2017.
[28] A. Kumar, “Discrete event stochastic processes,” Lecture Notes for
Engineering Curriculum, 2012.
[29] T. Tao, Analysis II, Texts and Readings in Mathematics, vol. 38. Hindustan Book Agency, New Delhi, 2009.
[30] Y. M. Ermoliev and R.-B. Wets, Numerical techniques for stochastic
optimization. Springer-Verlag, 1988.
APPENDIX A
PROOF OF LEMMA 1
From (1), $C^{av}_{\ell}(t) = \frac{c_\ell\sum_{i=1}^{m} T_\ell(t,i)}{m}$, where $T_\ell(t,i)$ has a Bernoulli distribution (it takes value 1 with probability $p_\ell(t)$, and 0 otherwise). Since $T_\ell(t,1), T_\ell(t,2), \ldots, T_\ell(t,m)$ are independent and identically distributed, when $m$ is large we get part 1) of Lemma 1 using the strong law of large numbers.

Now, note that $\Delta_\ell(t,i) = (\Delta_\ell(t,i-1)+1)\,\mathbb{1}_{\{F_\ell(t,i)\}}$, where $F_\ell(t,i)$ is the event that a packet transmitted by node $\ell$ is not received by the monitor in slot $(t,i)$. Hence,
$$E\{\Delta_\ell(t,m)\mid P(t)\} = \big[E\{\Delta_\ell(t,m-1)\mid P(t)\} + 1\big]\Big[1 - p_\ell(t)\prod_{k\ne\ell}(1-p_k(t))\Big]. \qquad (16)$$
We also have the following lemma (proved in Appendix B).
Lemma 3: For a fixed $t$, the sequence $\Delta_\ell(t,1), \Delta_\ell(t,2), \ldots$ is an ergodic uniform Markov chain.
Using Lemma 3, when $m$ is large, $E\{\Delta_\ell(t,m)\mid P(t)\} = E\{\Delta_\ell(t,m-1)\mid P(t)\}$, and additionally, using the extension of the Birkhoff ergodic theorem discussed in [27], $\Delta^{av}_{\ell}(t) \xrightarrow{a.s.} E\{\Delta_\ell(t,m)\mid P(t)\}$. Therefore, plugging these results into (16), we get part 2) of Lemma 1.
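As an illustration (ours, not part of the proof), a short Monte Carlo run confirms both limits of Lemma 1 for a fixed $P$ over one long frame; the probabilities, cost, and frame length are assumed values.

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([0.3, 0.2, 0.4]); c = 1.0; m = 100_000
l = 0                                        # node whose averages we track
nu = P[l] * np.prod(1 - np.delete(P, l))     # per-slot success probability
tx = rng.random((m, len(P))) < P             # all transmission decisions
age, age_sum = 0, 0.0
for i in range(m):
    age += 1
    if tx[i, l] and tx[i].sum() == 1:        # node l alone transmits: success
        age = 0
    age_sum += age
print(c * tx[:, l].mean(), c * P[l])         # C_av -> c_l p_l (part 1)
print(age_sum / m, (1 - nu) / nu)            # Delta_av -> (1-nu)/nu (part 2)
```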
APPENDIX B
PROOF OF LEMMA 3
Within a frame $t$, $P(t)$ is fixed, and hence,
$$P[\Delta_\ell(t,i+1) = x \mid \Delta_\ell(t,i) = y] = \begin{cases} 1-\nu, & x = y+1,\\ \nu, & x = 0,\end{cases}$$
where $\nu = p_\ell(t)\prod_{k\ne\ell}(1-p_k(t))$ is the probability that a packet transmitted by node $\ell$ in a slot of frame $t$ is successfully received by the monitor. Therefore, for a given state $\Delta_\ell(t,i) = y$, $\Delta_\ell(t,i+1)$ is distributed independently of $\Delta_\ell(t,j)$, $\forall j < i$, and the transition probability is independent of $i$. Therefore, the sequence $\Delta_\ell(t,1), \Delta_\ell(t,2), \ldots$ is a uniform Markov chain. Further, note that $P[\Delta_\ell(t,i+x+1) = x \mid \Delta_\ell(t,i) = x] = \nu(1-\nu)^x > 0$, as well as $P[\Delta_\ell(t,i+x+2) = x \mid \Delta_\ell(t,i) = x] = \nu^2(1-\nu)^x > 0$. Therefore, the Markov chain is also aperiodic (i.e., its period is 1). Hence, to prove that the Markov chain is an ergodic uniform Markov chain, it is sufficient to show that it is positive recurrent.

Also, from any state $y$, any other state $x$ can be reached in a finite number of steps (slots) with positive probability, given by $(1-\nu)^{x-y}$ if $x > y$, and $\nu(1-\nu)^x$ if $x \le y$. So, the Markov chain is a single communicating class. Hence, to show that it is positive recurrent, it is sufficient to show that any particular state is positive recurrent [28].

Let $f^{(j)}_{00}$ denote the probability of returning to state 0 in the $j$-th step. Then, $\forall j \ge 1$, we have $f^{(j)}_{00} = \nu > 0$. Hence,
$$\lim_{m\to\infty}\sum_{j=1}^{m} f^{(j)}_{00} = \infty, \quad\text{and}\quad \lim_{m\to\infty}\frac{1}{m}\sum_{j=1}^{m} f^{(j)}_{00} = \nu > 0. \qquad (17)$$
Therefore, using Theorem 7, we conclude that the state $\Delta_\ell(t,i) = 0$ is positive recurrent, thereby proving Lemma 3.
Theorem 7: [Theorems 2.4-2.5 in [28]] If $\lim_{m\to\infty}\sum_{j=1}^{m} f^{(j)}_{\gamma\gamma} = \infty$, then the state $\gamma$ is recurrent. Additionally, if $\lim_{m\to\infty}\frac{1}{m}\sum_{j=1}^{m} f^{(j)}_{\gamma\gamma} > 0$, then $\gamma$ is positive recurrent.
APPENDIX C
PROOF OF THEOREM 4
If the learning algorithm converges to the maximizer $p^*_{\ell}$, then it should satisfy
$$p^*_{\ell} = \max\left\{p^{min}_{\ell},\ p^*_{\ell} + \kappa(t)\left(e^{-\alpha_\ell p^*_{\ell}} - \frac{1}{b_\ell/p^*_{\ell}} - p^*_{\ell}\right)\right\}, \quad\text{and} \qquad (18)$$
$$\left.\frac{\partial U_\ell(p_\ell(t); P_{-\ell}(t))}{\partial p_\ell(t)}\right|_{p_\ell(t)=p^*_{\ell}} = 0. \qquad (19)$$
Therefore, using (18) and (19), we can write
$$\frac{\partial U_\ell(p_\ell(t); P_{-\ell}(t))}{\partial p_\ell(t)} = e^{-\alpha_\ell p_\ell(t)} - \frac{1}{b_\ell/p_\ell(t)} - p_\ell(t). \qquad (20)$$
Integrating both sides of (20) w.r.t. $p_\ell$, we get (6) (with $(1+\alpha_\ell)/\alpha_\ell$ as the integration constant), which is continuous and strictly concave ($\because \partial^2 U_\ell/\partial p_\ell^2 < 0$) for $p_\ell$ in the interval $(0,1]$. Also, $\partial U_\ell/\partial p_\ell$ is continuous, and it can be verified that for $p_\ell = 0^+$, $\partial U_\ell/\partial p_\ell > 0$, while for $p_\ell = 1$, $\partial U_\ell/\partial p_\ell < 0$. So, $\exists\, p^*_{\ell} \in (0,1)$ at which $\partial U_\ell/\partial p_\ell = 0$, and $p^*_{\ell}$ is the unique maximizer because of the strict concavity of $U_\ell$.

Further, on solving (19), we get
$$e^{-\alpha_\ell p^*_{\ell}} = p^*_{\ell}\left(1 + \frac{1}{b_\ell}\right). \qquad (21)$$
Since $b_\ell \ge 1$, using (21) we get $e^{-\alpha_\ell p^*_{\ell}} \le 2p^*_{\ell}$, and as $e^{-\alpha_\ell} \le e^{-\alpha_\ell p^*_{\ell}}$, therefore $e^{-\alpha_\ell}/2 \le p^*_{\ell}$. Hence, $p^*_{\ell} \in [e^{-\alpha_\ell}/2, 1)$.

Remark 4: Note that (18) follows from (5), which uses Lemma 1 assuming the limit $m \to \infty$. Additionally, for the convergence of $p_\ell(t)$, we assume $t \to \infty$, and from Theorem 6, $\kappa(t) \to 0$ as $t \to \infty$. Therefore, for (20) to hold, we first take the limit $m \to \infty$, followed by the limit $t \to \infty$. If the order of the two limits is exchanged, then $\kappa(t)$ would converge to 0 before $p_\ell$ converges to $p^*_{\ell}$, and hence (18) and (19) would not be satisfied.
APPENDIX D
PROOF OF THEOREM 5
The best response strategy for the non-cooperative game $\mathcal{G}$ can be expressed as a function $f^{br} : [P^{min}, P^{max}] \to [P^{min}, P^{max}]$, where $P^{min}$ and $P^{max}$ are $N$-dimensional vectors. To prove Theorem 5, we use the contraction mapping theorem:

Theorem 8: [Theorem 6.6.4 in [29]] In a metric space $(X,d)$, a function $f : X \to X$ is called a strict contraction if there exists a constant $\gamma \in (0,1)$ such that $d(f(x), f(y)) \le \gamma d(x,y)$, $\forall x,y \in X$. Additionally, if $X$ is non-empty and compact, then $f$ has a unique fixed point, i.e., there exists a unique $x^* \in X$ such that $x^* = f(x^*)$, and sequences of the form $x(t+1) = f(x(t))$ converge to $x^*$.

Let $[P^{min}, P^{max}] \subset \mathbb{R}^N$ be the metric space with the infinity norm as the distance metric. Then, for any $P_1, P_2 \in [P^{min}, P^{max}]$,
$$d(f^{br}(P_1), f^{br}(P_2)) = \|f^{br}(P_2) - f^{br}(P_1)\|_\infty \overset{(a)}{\le} \|J\|_\infty \|P_2 - P_1\|_\infty = \|J\|_\infty\, d(P_1, P_2), \qquad (22)$$
where in (a), $J$ is the Jacobian (whose elements are given by $J_{\ell j} \triangleq \frac{\partial p^{br}_{\ell}}{\partial p_j}$), and the matrix norm is induced by the vector norm. Also, $[P^{min}, P^{max}]$ is non-empty (since $\forall k$, $p^{min}_{k} < 1$) and compact. Hence, to prove the existence of a unique fixed point for $f^{br}$ using Theorem 8, it is sufficient to show that $\|J\|_\infty < 1$.

Now, using Lemma 4 (discussed below), we can write (21) as
$$e^{-\alpha_\ell p^{br}_{\ell}} = p^{br}_{\ell}\left(1 + \frac{1}{b_\ell}\right). \qquad (23)$$
Lemma 4: If $p^{min}_{\ell} \le e^{-\alpha_\ell}/2$, then $p^{br}_{\ell} = p^*_{\ell}$.
Proof: According to Theorem 4, $p^*_{\ell}$ is the unique maximizer of $U_\ell$ in the interval $(0,1]$; therefore, if $p^*_{\ell} \in [p^{min}_{\ell}, 1]$, then according to (7), $p^{br}_{\ell} = p^*_{\ell}$. Also, $p^*_{\ell} \ge e^{-\alpha_\ell}/2$. Hence, if $p^{min}_{\ell} \le e^{-\alpha_\ell}/2$, then $p^{br}_{\ell} = p^*_{\ell}$.

Differentiating (23) w.r.t. $p_j$ ($\forall j$), we get
$$\frac{\partial p^{br}_{\ell}}{\partial p_\ell} = 0, \quad\text{and}\quad \left.\frac{\partial p^{br}_{\ell}}{\partial p_j}\right|_{j\ne\ell} = \frac{e^{-\rho_{\ell 2}}\prod_{k\ne\ell,j}(1-p_k)}{\left(\alpha_\ell + \frac{1}{p^{br}_{\ell}}\right)\left(1 + \frac{1}{b_\ell}\right)}. \qquad (24)$$
Since $\|J\|_\infty = \max_\ell \sum_j |J_{\ell j}| = \max_\ell \sum_j \left|\frac{\partial p^{br}_{\ell}}{\partial p_j}\right|$, hence
$$\|J\|_\infty \le \max_\ell\left\{\frac{e^{-\rho_{\ell 2}}}{\alpha_\ell + 1}\sum_{j\ne\ell}\prod_{k\ne\ell,j}(1-p_k)\right\} \le \max_\ell\left\{\frac{(N-1)(1-p^{min}_{global})^{(N-2)}}{e^{\rho_{\ell 2}}(\alpha_\ell + 1)}\right\}, \qquad (25)$$
where $p^{min}_{global} \le \min_j\{p^{min}_{j}\}$. Hence, $\|J\|_\infty < 1$ if (25) is less than 1, thereby proving the existence of a unique fixed point. Now, note that any fixed point of $f^{br}$ is also an NE of $\mathcal{G}$, and vice-versa. Therefore, there exists a unique NE. Hence, $f^{br}$ (the best response strategy) converges to the unique NE, thereby proving Theorem 5.
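For intuition, the following sketch (ours) runs the best-response iteration $P(t+1) = f^{br}(P(t))$ by solving (23) numerically for each node; the $\alpha_\ell$, $\rho_{\ell 2}$, and starting point are illustrative, and under (8) the iterates contract to the unique NE.

```python
import numpy as np
from scipy.optimize import brentq

alpha = np.array([1.0, 1.2, 0.8])       # illustrative alpha_l = c_l rho_l1
rho2 = 1.0                              # illustrative rho_l2 (same for all nodes)

def best_response(P):
    """Solve (23) for each node, holding the other nodes' probabilities fixed."""
    P_next = np.empty_like(P)
    for l in range(len(P)):
        b = np.exp(rho2) / np.prod(np.delete(1 - P, l))
        g = lambda p: np.exp(-alpha[l] * p) - p * (1 + 1 / b)
        P_next[l] = brentq(g, 1e-9, 1.0)  # unique root: g(0+) > 0, g(1) < 0
    return P_next

P = np.array([0.3, 0.5, 0.7])           # arbitrary starting point
for _ in range(50):                     # x(t+1) = f_br(x(t)), cf. Theorem 8
    P = best_response(P)
print(P)                                # unique fixed point = NE of G
```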
APPENDIX E
PROOF OF LEMMA 2
Let $p^{min}_{\ell} = p^{min}_{global}$ for each node $\ell$. To satisfy the condition in Lemma 4, we restrict $p^{min}_{\ell}$ to the interval $(0, e^{-\alpha_\ell}/2]$. Therefore, $p^{min}_{global} \le e^{-\alpha_\ell}/2 = e^{-\rho_{\ell 1}c_\ell}/2$. Hence, $\rho_{\ell 1} \le -\frac{1}{c_\ell}\ln(2p^{min}_{global})$. Note that $\rho_{\ell 1} > 0$ for $p^{min}_{global} < 0.5$.

Now, given that $p^{min}_{\ell}$ and $\rho_{\ell 1}$ are fixed, consider the function $f(n) = (n-1)(1-p^{min}_{global})^{(n-2)}$, where $n \in \mathbb{R}^+$. If $f(n)/(e^{\rho_{\ell 2}}(\alpha_\ell+1)) < 1$ for every $n \in \mathbb{R}^+$, then (8) is always satisfied, irrespective of $N$.

Let $n^*$ be the maximizer of $f(n)$. Therefore,
$$n^* = \arg\max_n f(n) = 1 - \frac{1}{\ln(1-p^{min}_{global})}, \quad\text{and} \qquad (26)$$
$$f(n^*) = \max_n f(n) = (n^*-1)(1-p^{min}_{global})^{(n^*-2)}. \qquad (27)$$
So, $f(n)/(e^{\rho_{\ell 2}}(\alpha_\ell+1)) \le f(n^*)/(e^{\rho_{\ell 2}}(\alpha_\ell+1))$, and $f(n^*)/(e^{\rho_{\ell 2}}(\alpha_\ell+1)) < 1$ is implied by $\rho_{\ell 2} > \ln(f(n^*)/(\alpha_\ell+1))$. Also, $\rho_{\ell 2} > 0$. Hence, (8) is always satisfied if $\rho_{\ell 2} > \max\{0, \ln(f(n^*)/(\alpha_\ell+1))\}$.
APPENDIX F
PROOF OF THEOREM 6
To prove Theorem 6, we use Theorem 9, which is a special case of Theorem 6.2 in [30].

Theorem 9: In the optimization problem (7), let $U_\ell$ be a strictly concave, continuous one-dimensional function in $p_\ell$, and let $p^*_{\ell}$ be the unique maximizer. The stochastic subgradient method (13) satisfies $\lim_{t\to\infty} p_\ell(t) = p^*_{\ell}$ with probability 1 if the following conditions are satisfied:
1) $U_\ell(p^*_{\ell}; P_{-\ell}) - U_\ell(p_\ell(t); P_{-\ell}) \le E\{v_\ell(t)\mid p_\ell(1), p_\ell(2), \ldots, p_\ell(t); P_{-\ell}\}(p^*_{\ell} - p_\ell(t)) + r_o(t)$, where $r_o(t)$ may depend on $p_\ell(1), p_\ell(2), \ldots, p_\ell(t)$;
2) $\kappa(t) > 0$, $\forall t$, and $\sum_{t=1}^{\infty}\kappa(t) = \infty$;
3) $\sum_{t=1}^{\infty} E[\kappa(t)|r_o(t)| + \kappa^2(t)|v_\ell(t)|^2] < \infty$.

From Theorem 4, we know that $U_\ell$ is a strictly concave and continuous one-dimensional function in $p_\ell \in [p^{min}_{\ell}, 1]$ (for fixed $P_{-\ell}$), and $p^*_{\ell}$ is its unique maximizer. Therefore, according to Theorem 9, (13) (and hence (3)) converges to $p^*_{\ell}$ almost surely if the three conditions are satisfied. Note that, taking $r_o(t) = 0$, $\forall t$, and using (11) and the strict concavity of $U_\ell$, condition 1 is satisfied.

Further, if the sequence $\{\kappa(t)\}_{t\in\mathbb{N}}$ is chosen such that $\forall t$, $\kappa(t) > 0$, $\sum_{t=1}^{\infty}\kappa(t) = \infty$, and $\sum_{t=1}^{\infty}\kappa^2(t) < \infty$, then condition 2 is satisfied.

For $r_o(t) = 0$, condition 3 simplifies to $\sum_{t=1}^{\infty} E[\kappa^2(t)|v_\ell(t)|^2] < \infty$. Since $|v_\ell(t)|$ is bounded, there exists a constant $M < \infty$ such that $|v_\ell(t)| < M$. Hence, $\sum_{t=1}^{\infty} E[\kappa^2(t)|v_\ell(t)|^2] \le M^2\sum_{t=1}^{\infty} E[\kappa^2(t)] < \infty$ (because $\kappa(t)$ is deterministic, $E[\kappa^2(t)] = \kappa^2(t)$, and we chose $\{\kappa(t)\}_{t\in\mathbb{N}}$ such that $\sum_{t=1}^{\infty}\kappa^2(t) < \infty$). Hence, condition 3 is also satisfied.

Therefore, if $\forall t$, $\kappa(t) > 0$, $\sum_{t=1}^{\infty}\kappa(t) = \infty$, and $\sum_{t=1}^{\infty}\kappa^2(t) < \infty$, then the learning algorithm (3) converges to $p^*_{\ell}$ almost surely. Further, due to Lemma 4, we have $p^*_{\ell} = p^{br}_{\ell}$, thereby proving Theorem 6.
APPENDIX G
PROOF OF THEOREM 3
Let $P_{NE}$ be the transmission probability vector of the nodes at the NE. Using (23) for each node $\ell$, we get
$$e^{-\alpha_\ell p^{NE}_{\ell}} = p^{NE}_{\ell}\left(1 + \frac{1}{b^{NE}_{\ell}}\right), \qquad (28)$$
where $b^{NE}_{\ell} = \left(\prod_{k\ne\ell}(1-p^{NE}_{k})\right)^{-1} e^{\rho_{\ell 2}}$. Further, the overall utility of the system is given by $U_{sys}(P) = \sum_\ell U_\ell(p_\ell; P_{-\ell})$. Therefore, for each node $j$,
$$\frac{\partial U_{sys}(P)}{\partial p_j} = e^{-\alpha_j p_j} - p_j\left(1 + \frac{1}{b_j}\right) + \sum_{\ell\ne j}\frac{p_\ell^2}{2}e^{-\rho_{\ell 2}}\prod_{k\ne\ell,j}(1-p_k). \qquad (29)$$
Note that from Theorem 1 and Theorem 2, we have $P_{LA} = P_{NE}$ (where $P_{LA}$ denotes the vector of (converged) transmission probabilities obtained using the learning algorithm (3)). Therefore,
$$\left.\frac{\partial U_{sys}(P)}{\partial p_j}\right|_{P=P_{LA}} = \left.\frac{\partial U_{sys}(P)}{\partial p_j}\right|_{P=P_{NE}} \overset{(a)}{=} \frac{1}{2}\sum_{\ell\ne j}(p^{NE}_{\ell})^2 e^{-\rho_{\ell 2}}\prod_{k\ne\ell,j}(1-p^{NE}_{k}), \qquad (30)$$
where (a) is obtained using (28) and (29).

Also, for each node $j$, $U_{sys}(P)$ is continuously differentiable in $p_j$, and
$$\frac{\partial^2 U_{sys}(P)}{\partial p_j^2} = -\alpha_j e^{-\alpha_j p_j} - \left(1 + \frac{1}{b_j}\right) < 0. \qquad (31)$$
So, $U_{sys}(P)$ is strictly concave in $p_j$ for each node $j$, and hence, for a given $P_{-j}$, $U_{sys}(P)$ is maximized at the $p_j \in [0,1]$ at which the absolute value of its slope (29) is minimum (i.e., close to 0). Since $U_{sys}(P)$ is maximum at $P = P_{OPT}$ (in (15), we assumed $P_{OPT}$ to be the optimal transmission probability vector that maximizes $U_{sys}(\cdot)$), for $P_{LA}$ (i.e., $P_{NE}$) to be close to $P_{OPT}$ (and $PoA \approx 1$ according to (15)), (30) must be close to 0 for each node $j$. However, when $N$ is small (less than 4 in Fig. 6), with the addition of every new node to the system, the number of positive terms in the summation on the RHS of (30) increases, taking the value of (30) farther from 0 for each node $j$; therefore, the $PoA$ increases. But because $(1-p^{NE}_{k}) < 1$ ($\because \forall k$, $p^{NE}_{k} \in (0,1)$), if $N$ increases, then for each $\ell$, $(p^{NE}_{\ell})^2 e^{-\rho_{\ell 2}}\prod_{k\ne\ell,j}(1-p^{NE}_{k})$ (i.e., each term in the summation on the RHS of (30)) decreases exponentially. Hence, when $N$ is large (e.g., in Fig. 6, for $N \ge 4$), with the addition of each new node, the overall value of (30) decreases to a value close to 0, and as a consequence, $P_{LA}$ moves closer to $P_{OPT}$. Hence, as $N \to \infty$, the $PoA$ approaches unity.