# Michael J. Neely's research while affiliated with University of Southern California and other places

**What is this page?**

This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.

## Publications (171)

In this paper, we provide a sub-gradient based algorithm to solve general constrained convex optimization without taking projections onto the domain set. The well studied Frank-Wolfe type algorithms also avoid projections. However, they are only designed to handle smooth objective functions. The proposed algorithm treats both smooth and non-smooth...

This paper proves an impossibility result for stochastic network utility maximization for multi-user wireless systems, including multiple access and broadcast systems. Every time slot an access point observes the current channel states for each user and opportunistically selects a vector of transmission rates. Channel state vectors are assumed to b...

This paper proves a representation theorem regarding sequences of random elements that take values in a Borel space and are measurable with respect to the sigma algebra generated by an arbitrary union of sigma algebras. This, together with a related representation theorem of Kallenberg, is used to characterize the set of multidimensional decision v...

This paper considers online convex optimization (OCO) problems where decisions are constrained by available energy resources. A key scenario is optimal power control for an energy harvesting device with a finite capacity battery. The goal is to minimize a time-average loss function while keeping the used energy less than what is available. In this...

Future generation wireless technologies are expected to serve an increasingly dense and dynamic population of users that generate short bundles of information to be transferred over the shared spectrum. This calls for new distributed and low-overhead Multiple-Access-Control (MAC) strategies to serve such dynamic demands with spectral efficiency cha...

This paper revisits a classical problem of slotted multiple access with success, idle, and collision events on each slot. First, results of a 2-user multiple access game are reported. The game was conducted at the University of Southern California over multiple semesters and involved competitions between student-designed algorithms. An algorithm ca...

This paper considers online convex optimization (OCO) problems where decisions are constrained by available energy resources. A key scenario is optimal power control for an energy harvesting device with a finite capacity battery. The goal is to minimize a time-average loss function while keeping the used energy less than what is available. In this...

This paper considers online optimization of a renewal-reward system. A controller performs a sequence of tasks back-to-back. Each task has a random vector of parameters, called the task type vector, that affects the task processing options and also affects the resulting reward and time duration of the task. The probability distribution for the task...

We consider online convex optimization with stochastic constraints where the objective functions are arbitrarily time-varying and the constraint functions are independent and identically distributed (i.i.d.) over time. Both the objective and constraint functions are revealed after the decision is made at each time slot. The best known expected regr...

We consider online convex optimization with stochastic constraints where the objective functions are arbitrarily time-varying and the constraint functions are independent and identically distributed (i.i.d.) over time. Both the objective and constraint functions are revealed after the decision is made at each time slot. The best known expected regr...

This paper proves an impossibility result for stochastic network utility maximization for multi-user wireless systems, including multiple access and broadcast systems. Every time slot an access point observes the current channel states for each user and opportunistically selects a vector of transmission rates. Channel state vectors are assumed to b...

We consider online convex optimization with stochastic constraints where the objective functions are arbitrarily time-varying and the constraint functions are independent and identically distributed (i.i.d.) over time. Both the objective and constraint functions are revealed after the decision is made at each time slot. The best known expected regr...

This paper considers utility optimal power control for energy-harvesting wireless devices with a finite capacity battery. The distribution information of the underlying wireless environment and harvestable energy is unknown, and only outdated system state information is known at the device controller. This scenario shares similarity with Lyapunov o...

We consider multiple parallel Markov decision processes (MDPs) coupled by global constraints, where the time varying objective and constraint functions can only be observed after the decision is made. Special attention is given to how well the decision maker can perform in T slots, starting from any state, compared to the best feasible randomized s...

This paper considers optimization over multiple renewal systems coupled by time-average constraints. These systems act asynchronously over variable length frames. When a particular system starts a new renewal frame, it chooses an action from a set of options for that frame. The action determines the duration of the frame, the penalty incurred durin...

We propose a new primal-dual homotopy smoothing algorithm for a linearly constrained convex program, where neither the primal nor the dual function has to be smooth or strongly convex. The best known iteration complexity solving such a non-smooth problem is $\mathcal{O}(\varepsilon^{-1})$. In this paper, we show that by leveraging a local error bou...

We consider multiple parallel Markov decision processes (MDPs) coupled by global constraints, where the time varying objective and constraint functions can only be observed after the decision is made. Special attention is given to how well the decision maker can perform in T slots, starting from any state, compared to the best feasible randomized s...

We consider multiple parallel Markov decision processes (MDPs) coupled by global constraints, where the time varying objective and constraint functions can only be observed after the decision is made. Special attention is given to how well the decision maker can perform in T slots, starting from any state, compared to the best feasible randomized s...

We study constrained stochastic programs where the decision vector at each time slot cannot be chosen freely but is tied to the realization of an underlying random state vector. The goal is to minimize a general objective function subject to linear constraints. A typical scenario where such programs appear is opportunistic scheduling over a network...

This paper considers utility optimal power control for energy harvesting wireless devices with a finite capacity battery. The distribution information of the underlying wireless environment and harvestable energy is unknown and only outdated system state information is known at the device controller. This scenario shares similarity with Lyapunov op...

This paper considers the fundamental convergence time for opportunistic scheduling over time-varying channels. The channel state probabilities are unknown and algorithms must perform some type of estimation and learning while they make decisions to optimize network utility. Existing schemes can achieve a utility within $\epsilon$ of optimality, for...

We consider multiple parallel Markov decision processes (MDPs) coupled by global constraints, where the time varying objective and constraint functions can only be observed after the decision is made. Special attention is given to how well the decision maker can perform in $T$ slots, starting from any state, compared to the best feasible randomized...

This paper considers online convex optimization (OCO) with stochastic constraints, which generalizes Zinkevich's OCO over a known simple fixed set by introducing multiple stochastic functional constraints that are i.i.d. generated at each round and are disclosed to the decision maker only after the decision is made. This formulation arises naturall...

This paper studies the convergence time of dual gradient methods for general (possibly nondifferentiable) strongly convex programs. For general convex programs, the convergence time of dual subgradient/gradient methods with simple running averages (running averages started from iteration 0) is known to be O(1/ε...

This paper considers large scale constrained convex (possibly composite and non-separable) programs, which are usually difficult to solve by interior point methods or other Newton-type methods due to the non-smoothness or the prohibitive computation and storage complexity for Hessians and matrix inversions. Instead, they are often solved by first o...

This paper considers dynamic transmit covariance design in point-to-point MIMO fading systems with unknown channel state distributions and inaccurate channel state information subject to both long term and short term power constraints. First, the case of instantaneous but possibly inaccurate channel state information at the transmitter (CSIT) is tr...

This paper considers online convex optimization with time-varying constraint functions. Specifically, we have a sequence of convex objective functions $\{f_t(x)\}_{t=0}^{\infty}$ and convex constraint functions $\{g_{t,i}(x)\}_{t=0}^{\infty}$ for $i \in \{1, ..., k\}$. The functions are gradually revealed over time. For a given $\epsilon>0$, the go...

The backpressure algorithm has been widely used as a distributed solution to the problem of joint rate control and routing in multi-hop data networks. By controlling a parameter $V$ in the algorithm, the backpressure algorithm can achieve an arbitrarily small utility optimality gap. However, this in turn brings in a large queue length at each node...

Traffic load-balancing in datacenters alleviates hot spots and improves network utilization. In this paper, a stable in-network load-balancing algorithm is developed in the setting of software-defined networking. A control plane configures a data plane over successive intervals of time. While the MaxWeight algorithm can be applied in this setting a...

This paper considers time-average optimization, where a decision vector is chosen every time step within a (possibly non-convex) set, and the goal is to minimize a convex function of the time averages subject to convex constraints on these averages. Such problems have applications in networking, multi-agent systems, and operations research, where d...

This paper considers optimization over multiple renewal systems coupled by time average constraints. These systems act asynchronously over variable length frames. For each system, at the beginning of each renewal frame, it chooses an action which affects the duration of its own frame, the penalty, and the resource expenditure throughout the frame....

Stochastic non-smooth convex optimization constitutes a class of problems in machine learning and operations research. This paper considers minimization of a non-smooth function based on stochastic subgradients. When the function has a locally polyhedral structure, a staggered time average algorithm is proven to have O(1/T) convergence rate. A more...

This paper considers constrained optimization over a renewal system. A controller observes a random event at the beginning of each renewal frame and then chooses an action that affects the duration of the frame, the amount of resources used, and a penalty metric. The goal is to make frame-wise decisions so as to minimize the time average penalty su...

This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. Zinkevich's conventional projection-based online algorithm (Zinkevich 2003) can be difficult to implement due to the potentially high computation complexity of the projection operat...

This paper considers large scale constrained convex programs, which are usually not solvable by interior point methods or other Newton-type methods due to the prohibitive computation and storage complexity for Hessians and matrix inversions. Instead, large scale constrained convex programs are often solved by gradient based methods or decomposition...

This paper considers convex programs with a general (possibly
non-differentiable) convex objective function and Lipschitz continuous convex
inequality constraint functions and proposes a simple parallel algorithm with
$O(1/t)$ convergence rate. Similar to the classical dual subgradient algorithm
or the ADMM algorithm, the new algorithm has a distribut...

This paper considers dynamic power allocation in MIMO fading systems with
unknown channel state distributions. First, the ideal case of perfect
instantaneous channel state information at the transmitter (CSIT) is treated.
Using the drift-plus-penalty method, a dynamic power allocation policy is
developed and shown to approach optimality, regardless...

This paper considers a cost minimization problem for data centers with $N$
servers and randomly arriving service requests. A central router decides which
server to use for each new request. Each server has three types of states
(active, idle, setup) with different costs and time durations. The servers
operate asynchronously over their own states an...

This paper considers the problem of minimizing the time average of a
controlled stochastic process subject to multiple time average constraints on
other related processes. The probability distribution of the random events in
the system is unknown to the controller. A typical application is time average
power minimization subject to network throughp...

This paper considers optimization of power and delay in a time-varying
wireless link using rateless codes. The link serves a sequence of
variable-length packets. Each packet is coded and transmitted over multiple
slots. Channel conditions can change from slot to slot and are unknown to the
transmitter. The amount of mutual information accumulated o...

We consider the problem of simultaneous on-demand streaming of stored video
to multiple users in a multi-cell wireless network where multiple unicast
streaming sessions are run in parallel and share the same frequency band. Each
streaming session is formed by the sequential transmission of video "chunks,"
such that each chunk arrives into the corre...

This paper treats power-aware throughput maximization in a multi-user file
downloading system. Each user can receive a new file only after its previous
file is finished. The file state processes for each user act as coupled Markov
chains that form a generalized restless bandit system. First, an optimal
algorithm is derived for the case of one user...

This paper studies the convergence time of the drift-plus-penalty algorithm
for strongly convex programs. The drift-plus-penalty algorithm was originally
developed to solve more general stochastic optimization and is closely related
to the dual subgradient algorithm when applied to deterministic convex
programs. For general convex programs, the con...

One practical open problem is the development of a distributed algorithm that
achieves near-optimal utility using only a finite (and small) buffer size for
queues in a stochastic network. This paper studies utility maximization (or
cost minimization) in a finite-buffer regime and considers the corresponding
delay and reliability (or rate of packet...

We consider the design of a scheduling policy for video streaming in a wireless network formed by several users and helpers (e.g., base stations). In such networks, any user is typically in the range of multiple helpers. Hence, an efficient policy should allow the users to dynamically select the helper nodes to download from and determine adaptivel...

This paper considers information sharing in a multi-player repeated game.
Every round, each player observes a subset of components of a random vector and
then takes a control action. The utility earned by each player depends on the
full random vector and on the actions of others. An example is a game where
different rewards are placed over multiple...

This paper considers time-average stochastic optimization, where a time
average decision vector, an average of decision vectors chosen in every time
step from a time-varying (possibly nonconvex) set, minimizes a convex objective
function and satisfies convex constraints. A class of this formulation with a
random, discrete decision set has applicati...

This paper considers the problem of minimizing the time average of a
stochastic process subject to time average constraints on other processes. A
canonical example is minimizing average power in a data network subject to
multi-user throughput constraints. Another example is a (static) convex
program. Under a Slater condition, the drift-plus-penalty...

This paper considers a wireless link with randomly arriving data that is
queued and served over a time-varying channel. It is known that any algorithm
that comes within $\epsilon$ of the minimum average power required for queue
stability must incur average queue size at least $\Omega(\log(1/\epsilon))$.
However, the optimal convergence time is unkn...

This paper considers peer-to-peer scheduling for a network with multiple wireless devices. A subset of the devices are mobile users that desire specific files. Each user may already have certain popular files in its cache. The remaining devices are access points that typically have a larger set of files. Users can download packets of their requeste...

We consider extensions and improvements on our previous work on dynamic
adaptive video streaming in a multi-cell multiuser "small cell" wireless
network. Previously, we treated the case of single-antenna base stations and,
starting from a network utility maximization (NUM) formulation, we devised a
"push" scheduling policy, where users place re...

This paper treats power-aware throughput maximization in a multi-user file
downloading system. Each user can receive a new file only after its previous
file is finished. The file state processes for each user act as coupled Markov
chains that form a generalized restless bandit system. First, an optimal
algorithm is derived for the case of one user....

This paper considers a stochastic optimization approach for job scheduling and server management in large-scale, geographically distributed data centers. Randomly arriving jobs are routed to a choice of servers. The number of active servers depends on server activation decisions that are updated at a slow time scale, and the service rates of the se...

We consider a wireless broadcast station that transmits packets to multiple users. The packet requests for each user may overlap, and some users may already have certain packets. This presents a problem of broadcasting in the presence of side information, and is a generalization of the well-known (and unsolved) index coding problem of information t...

This paper considers a time-varying game with N players. Every time slot,
players observe their own random events and then take a control action. The
events and control actions affect the individual utilities earned by each
player. The goal is to maximize a concave function of time average utilities
subject to equilibrium constraints. Specifically,...

This paper investigates Quality of Information (QoI) aware adaptive sampling in a system where two sensor devices report information to an end user. The system carries out a sequence of tasks, where each task relates to a random event that must be observed. The accumulated information obtained from the sensor devices is reported once per task to a...

This demo abstract describes an initial design of a new adaptive video streaming protocol for device-to-device WiFi-based mobile platforms and its software implementation. For the demonstration, two mobile servers and two mobile users will be deployed verifying that our device-to-device adaptive video streaming implementation works with desirable u...

We consider a one-hop wireless system with a small number of delay constrained users and a larger number of users without delay constraints. We develop a scheduling algorithm that reacts to time varying channels and maximizes throughput utility (to within a desired proximity), stabilizes all queues, and satisfies the delay constraints. The problem...

This paper considers a base station that delivers packets to multiple
receivers through a sequence of coded transmissions. All receivers overhear the
same transmissions. Each receiver may already have some of the packets as side
information, and requests another subset of the packets. This problem is known
as the index coding problem and can be rep...

We consider the jointly optimal design of a transmission scheduling and
admission control policy for adaptive video streaming over small cell networks.
We formulate the problem as a dynamic network utility maximization and observe
that it naturally decomposes into two subproblems: admission control and
transmission scheduling. The resulting algorit...

We consider the optimal design of a scheduling policy for adaptive video
streaming in wireless 'Small-Cells' networks. We formulate the problem as a
network utility maximization, and we observe that it naturally decomposes into
two subproblems: admission control and transmission scheduling. The resulting
algorithms are simple and suitable for distr...

This paper considers a problem where multiple users make repeated decisions
based on their own observed events. The events and decisions at each time step
determine the values of a utility function and a collection of penalty
functions. The goal is to make distributed decisions over time to maximize time
average utility subject to time average cons...

It is well known that max-weight policies based on a queue backlog index can be used to stabilize stochastic networks, and that similar stability results hold if a delay index is used. Using Lyapunov optimization, we extend this analysis to design a utility maximizing algorithm that uses explicit delay information from the head-of-line packet at ea...

This paper considers optimization of time averages in systems with variable length renewal frames. Applications include power-aware and profit-aware scheduling in wireless networks, peer-to-peer networks, and transportation systems. Every frame, a new policy is implemented that affects the frame size and that creates a vector of attributes. The pol...

An information collection problem in a wireless network with random events is
considered. Wireless devices report on each event using one of multiple
reporting formats. Each format has a different quality and uses different data
lengths. Delivering all data in the highest quality format can overload system
resources. The goal is to make intelligent...

We consider the jointly optimal design of a transmission scheduling and admission control policy for adaptive streaming over wireless device-to-device networks. We formulate the problem as a dynamic network utility maximization and observe that it naturally decomposes into two subproblems: admission control and transmission scheduling. The resultin...

We investigate optimal routing and scheduling strategies for multi-hop wireless networks with rateless codes. Rateless codes allow each node of the network to accumulate mutual information with every packet transmission. This enables a significant performance gain over conventional shortest path routing. Further, it also outperforms cooperative com...

This paper considers optimal control for a collection of separate Markov decision systems that operate asynchronously over their own state spaces. Decisions at each system affect: (i) the time spent in the current state, (ii) a vector of penalties incurred, and (iii) the next-state transition probabilities. An example is a network of smart devices...

We analyze a generalized index coding problem that allows multiple users to request the same packet. For this problem we introduce a novel coding scheme called partition multicast. Our scheme can be seen as a natural generalization of clique cover for directed index coding problems. Further, partition multicast corresponds to an achievable scheme f...

Lyapunov drift is a powerful tool for optimizing stochastic queueing networks
subject to stability. However, the most convenient drift conditions often
provide results in terms of a time average expectation, rather than a pure time
average. This paper provides an extended drift-plus-penalty result that ensures
stability with desired time averages w...

An information collection problem in a wireless network with random events is considered. Wireless nodes report on each event using one of multiple reporting formats. Each format has a different quality and uses a different number of bits. Delivering all data in the highest quality format can overload system resources. The goal is to make intellige...

We consider a discrete time queueing system where a controller makes a 2-stage decision every slot. The decision at the first stage reveals a hidden source of randomness with a control-dependent (but unknown) probability distribution. The decision at the second stage generates an attribute vector that depends on this revealed randomness. The goal i...

We investigate opportunistic cooperation between secondary (femtocell) users and primary (macrocell) users in cognitive femtocell networks. We consider two models for such cooperation. In the first model, called the Cooperative Relay Model, a secondary user cannot transmit its own data concurrently with a primary user. However, it can employ cooper...

We consider energy-aware scheduling in a multi-server system with N classes of jobs. Jobs arrive randomly and are queued according to their class. Servers operate asynchronously over their own timelines. Each server can be in either the active state or the idle state. At the beginning of each active period, a server chooses a processing mode from a...

In this work we focus on a stochastic optimization based approach to make distributed routing and server management decisions in the context of large-scale, geographically distributed data centers, which offers significant potential for exploring power cost reductions. Our approach considers such decisions at different time scales and offers provab...

This paper considers peer-to-peer scheduling for a network with multiple wireless devices. A subset of the devices are mobile users that desire specific files. Each user may already have certain popular files in its cache. The remaining devices are access points that typically have access to a larger set of files. Users can download packets of thei...

This paper considers energy-aware control for a computing system with two
states: "active" and "idle." In the active state, the controller chooses to
perform a single task using one of multiple task processing modes. The
controller then saves energy by choosing an amount of time for the system to be
idle. These decisions affect processing time, ene...

The multiple-access framework of ZigZag decoding (1) is a useful technique for combating interference via multiple repeated transmissions, and is known to be compatible with distributed random access protocols. However, in the presence of noise this type of decoding can magnify errors, particularly when packet sizes are large. We present a simple s...

The multiple-access framework of ZigZag decoding (Gollakota and Katabi 2008) is a useful technique for combating interference via multiple repeated transmissions, and is known to be compatible with distributed random access protocols. However, in the presence of noise this type of decoding can magnify errors, particularly when packet sizes are larg...

The freedom and flexibility of wireless Mobile Ad-hoc Networks (MANETs) that make them extremely desirable for many military, emergency, and sensor network applications also present challenges for multiple layers in the network stack. Max-weight scheduling, also known as backpressure routing, is a cross-layer control algorithm that is well-known to...

We consider a system with K states which operates over frames with different lengths. Every frame, the controller observes a new random event and then chooses a control action based on this observation. The current state, random event, and control action together affect: (i) the frame size, (ii) a vector of penalties incurred over the frame, and (i...

We consider a wireless broadcast station that transmits packets to multiple
users. The packet requests for each user may overlap, and some users may
already have certain packets. This presents a problem of broadcasting in the
presence of side information, and is a generalization of the well known (and
unsolved) index coding problem of information t...

We study the fundamental network capacity of a multiuser wireless downlink under two assumptions: (1) Channels are not explicitly measured and thus instantaneous states are unknown; (2) Channels are modeled as Markov chains. This is an important network model to explore because channel probing may be costly or infeasible in some contexts. In this c...

This paper considers maximizing throughput utility in a multi-user network with partially observable Markov ON/OFF channels. Instantaneous channel states are never known, and all control decisions are based on information provided by ACK/NACK feedback from past transmissions. This system can be viewed as a restless multi-armed bandit problem with a...

There has been considerable recent work developing a new stochastic network utility maximization framework using Backpressure algorithms, also known as MaxWeight. A key open problem has been the development of utility-optimal algorithms that are also delay efficient. In this paper, we show that the Backpressure algorithm, when combined with the LIF...

## Citations

... To measure the freshness of data, the concept of Age of Information (AoI) has been introduced over the last decade (see, for example, [2]-[4]), which is defined concisely as the elapsed time since the generation time of the last received status update. Since the introduction of the AoI metric, numerous related studies emerged in various networking scenarios, including wireless random access networks (e.g., [5], [6]), content distribution networks (e.g., [7], [8]), scheduling (e.g., [9]-[13]), queuing networks (e.g., [14], [15]), and vehicular networks (e.g., [16]). ...

... Let E p [|A[k] p − p|] denote the expected mean absolute error given the true parameter is p. The following Bernoulli estimation lemma for mean absolute error is from [28] and is a modified version of a lemma for mean squared error developed in [25]: ...

... The n channels are related, and each channel follows Markovity. Therefore, the entire system can be described as a Markov model [21], [22]. Each channel has two states (idle (1) or busy (0)), which is called the Gilbert-Elliott channel [23] as shown in Figure 1. ...

... This is surprising because strong convexity/concavity provides convergence improvements in other contexts, including online convex optimization problems [23], [24], deterministic minimization via gradient descent [25], and deterministic minimization via stochastic gradients [17]-[19]. This emphasizes the unique properties of opportunistic scheduling problems. ...

... Lemma A.1 (Pushback property of Bregman divergences [25, Lemma 14]). Let $B : \Delta \times \Delta^o \to \mathbb{R}$ be a Bregman divergence function, where $\Delta$ is the probability simplex in $\mathbb{R}^d$ and $\Delta^o$ is the interior of $\Delta$. ...

... Yu et al. [46] provide a primal-dual proximal gradient algorithm achieving $O(\sqrt{T})$ cumulative regret and constraint violation by assuming Slater's condition. Moreover, Wei et al. [43] provide bounds of the same order by assuming a less stringent version of Slater's condition. As a performance metric, the latter work uses static regret. ...

... They proposed an algorithm that simultaneously achieves $O(\sqrt{T})$ regret and (expected) constraint violation. A recent work (Wei, Yu, and Neely 2020) has improved this result by removing some assumptions while maintaining the regret guarantees. ...

... However, the energy of battery may be exhausted if the average of harvested energy is below the maximum of feasible set. Then the OCO with stochastic constraints [13] is deeply analyzed, and a new algorithm of guaranteeing node continuity is proposed which required a costly big-capacity battery [14]. ...

... However, the MFA approach is only applicable to a homogeneous system. Finally, [17] and [27] studied weakly-coupled MDPs, where individual MDPs are independent and coupled through constraints, instead of coupled reward as in ours. ...

... In the seminal work of [36] and [37], the max-weight algorithm for assigning service to queues was shown to maximize throughput under complex scheduling constraints and probabilistic dynamics. This framework has been extended and applied to network switching [38], satellite communications [39], ad-hoc networking [40], [41], packet multicasting and broadcasting [42], packet-delivery-time reduction [43], multi-user MIMO [44], energy harvesting systems [45], and age-of-information minimization [46], [47]. In the works of [48] and [49], learning algorithms were used for achieving network stability under unknown arrival and channel statistics. ...