# Tamer Başar

University of Illinois, Urbana-Champaign | UIUC · Department of Electrical and Computer Engineering

Doctor of Philosophy

## About

- Publications: 1,025
- Reads: 106,988 (a 'read' is counted each time someone views a publication summary, such as the title, abstract, and list of authors, clicks on a figure, or views or downloads the full-text)
- Citations: 39,398

## Publications

Book on mean-field-type games (under preparation)

We introduce the receding-horizon policy gradient (RHPG) algorithm, the first PG algorithm with provable global convergence in learning the optimal linear estimator designs, i.e., the Kalman filter (KF). Notably, the RHPG algorithm does not require any prior knowledge of the system for initialization and does not require the target system to be ope...

In this paper, we present a systematic procedure for robust adaptive control design for minimum phase uncertain multiple-input multiple-output linear systems that are right invertible and can be dynamically extended to a linear system with vector relative degree using a dynamic compensator that is known. For this class of systems, it is always poss...

In the classical communication setting, having multiple senders with access to the same source of information transmit it over channel(s) to a receiver generally leads to a decrease in estimation error at the receiver, as compared with the single-sender case. However, if the objectives of the information providers are different from that of the e...

This paper proposes a novel discrete-time multi-virus susceptible-infected-recovered (SIR) model that captures the spread of competing epidemics over a population network. First, we provide sufficient conditions for the infection level of all the viruses over the networked model to converge to zero in exponential time. Second, we propose an observa...
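
To make networked epidemic dynamics concrete, here is a minimal discrete-time simulation of a single-virus SIR epidemic over a contact network; the update rule, parameter names, and values are illustrative assumptions rather than the multi-virus model from the paper.

```python
def step_networked_sir(s, x, r, adj, beta=0.3, gamma=0.2, h=0.1):
    """One Euler step of a networked SIR model.

    s[i], x[i], r[i]: susceptible/infected/recovered fractions at node i.
    adj[i][j]: contact weight from node j to node i (illustrative).
    For small h, fractions stay in [0, 1] and s + x + r is conserved.
    """
    n = len(s)
    s_new, x_new, r_new = [], [], []
    for i in range(n):
        pressure = sum(adj[i][j] * x[j] for j in range(n))
        new_inf = h * beta * s[i] * pressure  # new infections at node i
        rec = h * gamma * x[i]                # recoveries at node i
        s_new.append(s[i] - new_inf)
        x_new.append(x[i] + new_inf - rec)
        r_new.append(r[i] + rec)
    return s_new, x_new, r_new
```

Each node's fractions remain a valid probability split of its population at every step, which is the invariant that convergence analyses of such models rely on.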

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-...

While the techniques in optimal control theory are often model-based, the policy optimization (PO) approach directly optimizes the performance metric of interest. Even though it has been an essential approach for reinforcement learning problems, there is little theoretical understanding of its performance. In this paper, we focus on the risk-constr...

This paper studies an $N$-agent cost-coupled game where the agents are connected via an unreliable, capacity-constrained network. Each agent receives state information over that network, which loses packets with probability $p$. A base station (BS) actively schedules agent communications over the network by minimizing a weighted Age of Information (...

We revisit in this paper the discrete-time linear quadratic regulator (LQR) problem from the perspective of receding-horizon policy gradient (RHPG), a newly developed model-free learning framework for control applications. We provide a fine-grained sample complexity analysis for RHPG to learn a control policy that is both stabilizing and $\epsilon$...
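
RHPG learns the LQR policy without a model; as a point of comparison, the model-based solution comes from a backward Riccati recursion. A scalar-case sketch, with standard LQR notation and illustrative values:

```python
def lqr_riccati_1d(a, b, q, r, qf, horizon):
    """Backward Riccati recursion for the scalar discrete-time LQR
    x_{t+1} = a x_t + b u_t with stage cost q x^2 + r u^2 and terminal
    cost qf x^2. Returns the time-indexed gains and the initial-time
    cost-to-go coefficient p.
    """
    p = qf
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)  # optimal gain: u_t = -k x_t
        p = q + a * p * a - a * p * b * k  # Riccati backward step
        gains.append(k)
    gains.reverse()  # gains[t] is the gain applied at time t
    return gains, p
```

For a = b = q = r = 1, p converges as the horizon grows to the golden ratio (1 + √5)/2, the solution of the algebraic Riccati equation p = 1 + p/(1 + p).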

We develop the first end-to-end sample complexity of model-free policy gradient (PG) methods in discrete-time infinite-horizon Kalman filtering. Specifically, we introduce the receding-horizon policy gradient (RHPG-KF) framework and demonstrate $\tilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity for RHPG-KF in learning a stabilizing filter that...

We revisit in this letter the discrete-time linear quadratic regulator (LQR) problem from the perspective of receding-horizon policy gradient (RHPG), a newly developed model-free learning framework for control applications. We provide a fine-grained sample complexity analysis for RHPG to learn a control policy that is both stabilizing and $\epsilon$...

How to effectively communicate over wireless networks characterized by link failures is central to understanding the fundamental limits in the performance of a networked control system. In this letter, we study the online remote control of linear-quadratic Gaussian systems over unreliable wireless channels (with random packet drops), where the cont...

In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG (NPG) methods, and their variance-reduced variants, under general smooth policy parametrizations. More specifically, with the Fisher information matrix of the policy being positive definite: i) we show that a state-of-the-art variance-reduced PG method, which...

In this paper, we consider a discrete-time multi-agent system involving $N$ cost-coupled networked rational agents solving a consensus problem and a central Base Station (BS), scheduling agent communications over a network. Due to a hard bandwidth constraint on the number of transmissions through the network, at most $R_d < N$ agents can concurrent...

We consider online reinforcement learning in Mean-Field Games. In contrast to the existing works, we alleviate the need for a mean-field oracle by developing an algorithm that estimates the mean-field and the optimal policy using a single sample path of the generic agent. We call this Sandbox Learning, as it can be used as a warm-start for any agen...

Multiagent decision making over networks has recently attracted an exponentially growing number of researchers from the systems and control community. The area has gained increasing momentum in engineering, social sciences, economics, urban science, and artificial intelligence as it serves as a prevalent framework for studying large and complex sys...

The paper deals with the setting where two viruses (say virus 1 and virus 2) coexist in a population, and they are not necessarily mutually exclusive, in the sense that infection due to one virus does not preclude the possibility of simultaneous infection due to the other. We develop a coupled bi-virus susceptible-infected-susceptible (SIS) model f...

We study incentive designs for a class of stochastic Stackelberg games with one leader and a large number of (finite as well as infinite population of) followers. We investigate whether the leader can craft a strategy under a dynamic information structure that induces a desired behavior among the followers. For the finite population setting, under...

We study sequential decision making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a...
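
As a toy illustration of the natural policy gradient primitive used in such analyses (without the constraint-handling machinery of the paper), consider a softmax policy on a multi-armed bandit: for this parameterization, the NPG update reduces to adding the scaled reward vector to the logits, which drives the policy toward the best arm. All names and values below are illustrative.

```python
import math

def npg_softmax_bandit(rewards, iters=100, eta=0.5):
    """Natural policy gradient on a bandit with a softmax policy.

    For softmax parameterization the NPG direction is the advantage,
    and shifting all logits by a constant leaves the policy unchanged,
    so the update simplifies to theta_a <- theta_a + eta * rewards[a].
    Returns the resulting policy (action probabilities).
    """
    n = len(rewards)
    theta = [0.0] * n
    for _ in range(iters):
        theta = [theta[a] + eta * rewards[a] for a in range(n)]
    m = max(theta)  # subtract max for numerical stability
    z = [math.exp(t - m) for t in theta]
    s = sum(z)
    return [zi / s for zi in z]
```

After enough iterations, nearly all probability mass sits on the highest-reward arm, which is the mechanism behind global-convergence results for NPG.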

Evolution of disease in a large population is a function of the top-down policy measures from a centralized planner, as well as the self-interested decisions (to be socially active) of individual agents in a large heterogeneous population. This paper is concerned with understanding the latter based on a mean-field type optimal control model. Specif...

Scalability of reinforcement learning algorithms to multi-agent systems is a significant bottleneck to their practical use. In this paper, we approach multi-agent reinforcement learning from a mean-field game perspective, where the number of agents tends to infinity. Our analysis focuses on the structured setting of systems with linear dynamics and...

This paper considers the distributed optimization problem where each node of a peer-to-peer network minimizes a finite sum of objective functions by communicating with its neighboring nodes. In sharp contrast to the existing literature where the fastest distributed algorithms converge either with a global linear or a local superlinear rate, we prop...
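
The baseline that such accelerated schemes improve upon is plain decentralized gradient descent: each node averages with its neighbors under a doubly stochastic mixing matrix and then takes a local gradient step. A minimal sketch under assumed quadratic local objectives (names and values are mine):

```python
def decentralized_gd(local_grads, mix, x0, step=0.1, iters=300):
    """Decentralized gradient descent over a peer-to-peer network.

    mix[i][j] is the (doubly stochastic) weight node i puts on node j;
    local_grads[i] is the gradient of node i's private objective.
    With a constant step size, iterates converge to a neighborhood of
    the minimizer of the sum of the local objectives.
    """
    xs = list(x0)
    n = len(xs)
    for _ in range(iters):
        # consensus step: mix with neighbors
        mixed = [sum(mix[i][j] * xs[j] for j in range(n)) for i in range(n)]
        # local gradient step
        xs = [mixed[i] - step * local_grads[i](mixed[i]) for i in range(n)]
    return xs
```

With f_i(x) = (x - c_i)²/2 the global minimizer is the mean of the c_i; a constant step size leaves a small residual disagreement of order `step`, which is exactly the limitation that faster distributed methods address.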

This paper proposes a novel discrete-time multi-virus SIR (susceptible-infected-recovered) model that captures the spread of competing SIR epidemics over a population network. First, we provide a sufficient condition for the infection level of all the viruses over the networked model to converge to zero in exponential time. Second, we propose an ob...

In this paper, we study a large population game with heterogeneous dynamics and cost functions solving a consensus problem. Moreover, the agents have communication constraints which appear as: (1) an Additive-White Gaussian Noise (AWGN) channel, and (2) asynchronous data transmission via a fixed scheduling policy. Since the complexity of solving th...

We propose a fully asynchronous networked aggregative game (Asy-NAG) where each player minimizes a local cost function that depends on its own action and the aggregate of all players’ actions. In sharp contrast to the existing NAGs, each player in our Asy-NAG can compute an estimate of the aggregate action at any wall-clock time by only using (poss...

This paper proposes a fully asynchronous scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks. Without waiting for any other node of the network, each node can locally update its value function at any time using (possibly delayed) information from its neighbors. This is in sharp...

We explore in this paper sufficient conditions for the $H$-property to hold, with a particular focus on the so-called line graphons. A graphon is a symmetric, measurable function from the unit square $[0,1]^2$ to the closed interval $[0,1]$. Graphons can be used to sample random graphs, and a graphon is said to have the $H$-property if graphs on $n...
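
A graphon, as defined in the entry above, can be used to sample random graphs directly from its definition: draw a uniform latent label for each node and connect each pair independently with probability given by the graphon. A minimal sketch (function names are mine):

```python
import random

def sample_from_graphon(w, n, seed=None):
    """Sample an n-node graph G_n from a graphon w: [0,1]^2 -> [0,1].

    Each node i gets a latent label u_i ~ Uniform[0,1]; an edge {i, j}
    is included independently with probability w(u_i, u_j).
    Returns the edge set.
    """
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < w(u[i], u[j]):
                edges.add((i, j))
    return edges
```

The constant graphon w ≡ p recovers the Erdős–Rényi model G(n, p); the H-property in the entry above concerns whether graphs sampled this way admit Hamiltonian decompositions as n grows.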

We consider information dissemination over a network of gossiping agents (nodes). In this model, a source keeps the most up-to-date information about a time-varying binary state of the world, and $n$ receiver nodes want to follow the information at the source as accurately as possible. When the information at the source changes, the source first se...

This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of calculating a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing pos...

In this article, we study a class of discrete-time decentralized linear-quadratic-Gaussian (LQG) control problems with two controllers and a $d$-step delayed information sharing pattern. An explicit form of the pair of optimal linear controllers is obtained in terms of backward Riccati equations and forward equations of estimation error covariance...

Learning in stochastic games is arguably the most standard and fundamental setting in multi-agent reinforcement learning (MARL). In this paper, we consider decentralized MARL in stochastic games in the non-asymptotic regime. In particular, we establish the finite-sample complexity of fully decentralized Q-learning algorithms in a significant class...
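
The single-agent primitive underlying such decentralized Q-learning schemes is the tabular Q-learning update; here is a minimal sketch on an assumed two-state example (the paper's algorithm is decentralized and multi-agent, which this does not capture).

```python
def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q

def run_chain(updates=1000):
    """Deterministic 2-state example: action 1 in state 0 pays 1 and
    moves to absorbing state 1, where every action pays 0."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(updates):
        q_update(q, 0, 1, 1.0, 1)  # transition (s=0, a=1) -> s'=1, r=1
        q_update(q, 1, 0, 0.0, 1)  # absorbing state, zero reward
    return q
```

In this example q[0][1] converges to the optimal value 1 + gamma · 0 = 1; finite-sample analyses like the one above quantify how fast such convergence happens from a bounded number of samples.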

Graphons W can be used as stochastic models to sample graphs Gn on n nodes for n arbitrarily large. A graphon W is said to have the H-property if Gn admits a decomposition into disjoint cycles with probability one as n goes to infinity. Such a decomposition is known as a Hamiltonian decomposition. In this paper, we provide necessary conditions for...

Increased penetration of wind energy will make electricity market prices more volatile. As a result, market participants will bear increased financial risks, which impact investment decisions and in turn, makes it harder to achieve sustainable energy goals. As a remedy, in this paper, we propose an insurance market that complements any wholesale ma...

In this paper, we consider the stochastic optimal control problem for (forward) stochastic differential equations (SDEs) with jump diffusions and random coefficients under a recursive-type objective functional captured by a backward SDE (BSDE). Due to the jump-diffusion process with random coefficients in both the constraint (forward SDE) and the r...

This paper is concerned with developing mean-field game models for the evolution of epidemics. Specifically, an agent's decision -- to be socially active in the midst of an epidemic -- is modeled as a mean-field game with health-related costs and activity-related rewards. By considering the fully and partially observed versions of this problem, the...

Many real-world applications of multi-agent reinforcement learning (RL), such as multi-robot navigation and decentralized control of cyber-physical systems, involve the cooperation of agents as a team with aligned objectives. We study multi-agent RL in the most basic cooperative setting -- Markov teams -- a class of Markov games where the cooperati...

This is an overview paper on the relationship between risk-averse designs based on exponential loss functions with or without an additional unknown (adversarial) term and some classes of stochastic games. In particular, the paper discusses the equivalences between risk-averse controller and filter designs and saddle-point solutions of some correspo...

This paper considers remote state estimation in cyber-physical systems (CPSs) with multiple sensors. Each plant is modeled by a discrete-time stochastic linear system, with measurements of each sensor transmitted to the corresponding remote estimator over a shared communication network, where their security levels are interdependent due to network-induced...

In this chapter, we propose and analyze a cohesive minimax detection (MAD) mechanism for modern computing systems, e.g., a computer or a network of computers. MAD monitors system-level activities across the entire system in order to assess them together. It evaluates system-level activities along two orthogonal directions, in terms of their likeliness...

In this paper, we consider a discrete-time stochastic control problem with uncertain initial and target states. We first discuss the connection between optimal transport and stochastic control problems of this form. Next, we formulate a linear-quadratic regulator problem where the initial and terminal states are distributed according to specified p...

We study a class of deterministic two-player nonzero-sum differential games where one player uses piecewise-continuous controls to affect the continuously evolving state, while the other player uses impulse controls at certain discrete instants of time to shift the state from one level to another. The state measurements are made at some given insta...

In this paper, we derive the Stackelberg solution associated with a leader–follower rational expectations model. By solving the dynamic rational expectations optimization problem subject to forward and backward stochastic difference equations (FBSDEs), we obtain the Stackelberg strategy of the leader with an adapted open-loop information structure,...

The problem of event-triggered output feedback control for networked control systems (NCSs) with packet losses and quantization is addressed. A new dynamic quantization scheme is proposed to prevent saturation of the quantizer in the presence of external disturbances, and using the emulation-based approach, we show how to design the event-triggerin...

We introduce a deceptive signaling framework as a new defense measure against advanced adversaries in cyber-physical systems. In general, adversaries look for system-related information, e.g., the underlying state of the system, in order to learn the system dynamics and to receive useful feedback regarding the success/failure of their actions so as t...

We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized controller, but only based on their own payoffs and local actions executed. The agents need not observe th...

The interconnectivity of cyber and physical systems and Internet of things has created ubiquitous concerns of cyber threats for enterprise system managers. It is common that the asset owners and enterprise network operators need to work with cybersecurity professionals to manage the risk by remunerating them for their efforts that are not directly...

The quality of many online services (such as online games, video streaming, cloud services) depends not only on the service capacity but also on the number of users using the service simultaneously. For a new online service, the potential users are often uncertain about both the capacity and congestion level of the service, and hence are uncertain...

Multi-agent reinforcement learning (MARL) has long been a significant research topic in both machine learning and control systems. Recent development of (single-agent) deep reinforcement learning has created a resurgence of interest in developing new MARL algorithms, especially those founded on theoretical analysis. In this paper, we review recent...

This paper deals with distributed reinforcement learning problems with safety constraints. In particular, we consider that a team of agents cooperate in a shared environment, where each agent has its individual reward function and safety constraints that involve all agents' joint actions. As such, the agents aim to maximize the team-average long-te...

Recent years have witnessed significant advances in technologies and services in modern network applications, including smart grid management, wireless communication, cybersecurity as well as multi-agent autonomous systems. Considering the heterogeneous nature of networked entities, emerging network applications call for game-theoretic models and l...

This article deals with distributed policy optimization in reinforcement learning, which involves a central controller and a group of learners. In particular, two typical settings encountered in several applications are considered: multiagent reinforcement learning (RL) and parallel RL, where frequent information exchanges between the learners...

We study the economic interactions among sellers and buyers in online markets. In such markets, buyers have limited information about the product quality, but can observe the sellers' reputations which depend on their past transaction histories and ratings from past buyers. Sellers compete in the same market through pricing, while considering the i...

Static reduction of dynamic stochastic team (or decentralized stochastic control) problems has been an effective method for establishing existence and approximation results for optimal policies. In this Part I of a two-part paper, we address stochastic dynamic teams. Part II addresses stochastic dynamic games. We consider two distinct types of stat...

Static reduction of dynamic stochastic team problems has been an effective method for establishing existence and approximation results for optimal policies, as we have discussed extensively in Part I of this paper. In this Part II of the two-part paper, we address stochastic dynamic games. Similar to Part I, we consider two distinct types of static...

We address Bayesian persuasion between a sender and a receiver with state-dependent quadratic cost measures for general classes of distributions. The receiver seeks to make mean-square-error estimate of a state based on a signal sent by the sender while the sender signals strategically in order to control the receiver's estimate in a certain way. S...

We propose a fully asynchronous networked aggregative game (Asy-NAG) where each player minimizes a cost function that depends on its local action and the aggregate of all players' actions. In sharp contrast to the existing NAGs, each player in our Asy-NAG can compute an estimate of the aggregate action at any wall-clock time by only using (possibly...

This paper considers a network of agents, where each agent is assumed to take actions optimally with respect to a predefined payoff function involving the latest actions of the agent's neighbors. Neighborhood relationships stem from payoff functions rather than actual communication channels between the agents. A principal is tasked to optimize the...

Despite the increasing interest in multi-agent reinforcement learning (MARL) in multiple communities, understanding its theoretical foundation has long been recognized as a challenging problem. In this paper, we address this problem by providing a finite-sample analysis for decentralized batch MARL. Specifically, we consider a type of mixed MARL se...

Direct policy search serves as one of the workhorses in modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention. In this work, we investigate the convergence theory of policy gradient (PG) methods for learning the linear risk-sensitive and robust controller. In particular, we...

In this paper, we introduce a model for multiple competing viruses over networks, derived using each state variable of the model as the infection percentage of a group or a subpopulation. We show that the model is well-posed, and also compare it to a full probabilistic Markov model. We provide a necessary and sufficient condition for uniqueness of...