Article

Abstract

We describe Urban Driving Games (UDGs) as a particular class of differential games that model the interactions and incentives of the urban driving task. The drivers possess a “communal” interest, such as not colliding with each other, but are also self-interested in fulfilling traffic rules and personal objectives. Subject to their physical dynamics, the preference of the agents is expressed via a lexicographic relation that puts as first priority the shared objective of not colliding. Under mild assumptions, we show that communal UDGs have the structure of a lexicographic ordinal potential game which allows us to prove several interesting properties. Namely, socially efficient equilibria can be found by solving a single (lexicographic) optimal control problem and iterated best response schemes have desirable convergence guarantees.
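The lexicographic relation at the heart of the abstract can be made concrete with a small sketch (the cost-tuple layout and names are illustrative, not code from the paper): outcomes are compared first on the shared collision cost, and personal objectives only break ties.

```python
def lex_prefers(a, b, tol=1e-9):
    """True if cost tuple `a` is strictly lexicographically preferred to `b`.

    Tuples are ordered by priority, e.g. (collision_cost, personal_cost):
    the shared collision cost is compared first; lower-priority costs
    matter only when higher-priority costs tie.
    """
    for ca, cb in zip(a, b):
        if ca < cb - tol:
            return True
        if ca > cb + tol:
            return False
    return False  # equal at every level: neither is strictly preferred

# A collision-free but slow outcome beats a fast outcome that collides.
safe_but_slow = (0.0, 5.0)
fast_but_crash = (1.0, 0.1)
assert lex_prefers(safe_but_slow, fast_but_crash)
assert not lex_prefers(fast_but_crash, safe_but_slow)
```

This captures why no amount of personal-objective improvement can compensate for a collision under such a preference.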


... Non-cooperative (dynamic) trajectory games [7] are a powerful tool to model multi-agent coordination tasks, for instance, autonomous driving [3,8,9], manipulation [4], multi-robot navigation [10], physical human-robot interaction [11], power networks [12], and space missions [13]. Nash [14] and Stackelberg [15] equilibria are popular solution concepts for such games. ...
... is the surrogate safety cost. Other nonlinear trajectory game algorithms (e.g., ILQGame [16,22] and iterated best response [8]) iteratively update players' strategies until convergence, which is not always guaranteed. In contrast, STP computes the strategy of each player in a single pass, leading to a deterministic computation overhead that scales linearly with the number of players (cf. Figure 7). ...
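The iterated-best-response scheme mentioned in this passage can be illustrated on a toy two-player quadratic game (costs and closed-form minimizers are made up for illustration, unrelated to any cited implementation): each player in turn minimizes its own cost with the other's strategy held fixed, and for these costs the sweeps contract to the unique Nash equilibrium.

```python
def br1(x2, a=0.0, c=1.0):
    # Closed-form argmin over x1 of (x1 - a)**2 + c * (x1 - x2)**2
    return (a + c * x2) / (1.0 + c)

def br2(x1, b=1.0, c=1.0):
    # Closed-form argmin over x2 of (x2 - b)**2 + c * (x2 - x1)**2
    return (b + c * x1) / (1.0 + c)

x1, x2 = 0.0, 0.0
for _ in range(50):   # Gauss-Seidel sweeps: each player replies in turn
    x1 = br1(x2)
    x2 = br2(x1)

# For these costs the unique Nash equilibrium is (1/3, 2/3); the symmetric
# coupling term makes this an exact potential game, which is what gives the
# sweeps their convergence guarantee here.
assert abs(x1 - 1/3) < 1e-9 and abs(x2 - 2/3) < 1e-9
```

In general-sum games without such structure, the same iteration can cycle, which is the convergence caveat the passage raises.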
... Connection to lexicographic preferences. We can employ STP to compute Stackelberg equilibria with lexicographic preferences, e.g., urban driving games [8], where the LSE in Definition 2 is instead defined with the lexicographic total order ⪯ on the tuple (J^i_safe, J^i_indv). In this case, Assumption 2 can be relaxed and we only need to enforce (8) for the leader, assuming that the follower's STP policy γ^i is the locally unique minimum of J^i_safe. ...
... The multi-agent trajectory planning problem is formulated as a generalized Nash equilibrium problem (GNEP) [20] with shared linear collision constraints encoded using a mixed-integer quadratic programming (MIQP) framework. Building on the cooperative structure of urban driving [11,13,21], we recast the problem as a generalized mixed-integer potential game (GMIPG) [22,23], a specialized form of generalized potential games (GPGs), which are a subclass of GNEPs with desirable convergence properties. To manage the computational complexity of this problem, we introduce homotopy class constraints to tame the combinatorial aspect. ...
... Game-theoretic approaches have proven to be effective for modeling decision-making and have been successfully applied in various applications, namely urban driving [11,13]- [15,28,29], and racing [30]- [34]. A substantial body of work employs iterated best response algorithms to find a Nash equilibrium in the joint trajectory space of all agents [17,30]- [32,35]- [37], and nonlinear programming (NLP) techniques dedicated to solving the coupled optimal control problems (OCPs) of GNEPs [10,12,14,38]. ...
... Other approaches exploit the intrinsic cooperative nature of urban driving to give the trajectory planning problem additional structure. This allows reducing the OCPs to a single one within a GPG [39] framework [11,13,21,40], and solving for a pure-strategy Nash equilibrium using a suitable off-the-shelf solver. This work aligns with the latter approach, as we aim for a more refined notion of equilibria [39,41], one that is optimal from the social point of view as it introduces some notions of fairness [40]. ...
Preprint
We propose a tactical homotopy-aware decision-making framework for game-theoretic motion planning in urban environments. We model urban driving as a generalized Nash equilibrium problem and employ a mixed-integer approach to tame the combinatorial aspect of motion planning. More specifically, by utilizing homotopy classes, we partition the high-dimensional solution space into finite, well-defined subregions. Each subregion (homotopy) corresponds to a high-level tactical decision, such as the passing order between pairs of players. The proposed formulation allows globally optimal Nash equilibria to be found in a computationally tractable manner by solving a mixed-integer quadratic program. Each homotopy decision is represented by a binary variable that activates a different set of linear collision avoidance constraints. These extra homotopic constraints allow solutions to be found more efficiently (on average 5 times faster in a roundabout scenario). We experimentally validate the proposed approach on scenarios taken from the rounD dataset. Simulation-based testing in a receding-horizon fashion demonstrates the capability of the framework to achieve globally optimal solutions while yielding a 78% average decrease in computational time with respect to an implementation without the homotopic constraints.
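The enumerate-homotopy-classes idea can be sketched in a deliberately tiny discrete setting (all names and numbers below are illustrative, not the paper's formulation): a binary passing-order decision activates one of two linear precedence constraints at a shared conflict point, each class is optimized separately, and the global optimum is the best solution across both classes.

```python
import itertools

# Toy setting: two vehicles choose arrival times at a shared conflict point;
# the binary homotopy decision fixes the passing order.
candidate_times = {"A": [1.0, 2.0, 3.0], "B": [1.0, 2.0, 3.0]}
GAP = 1.0                    # minimum time separation at the conflict point
REF = {"A": 1.0, "B": 1.0}   # each vehicle's preferred arrival time

def cost(tA, tB):
    # Joint (social) cost: deviation of both vehicles from their preference.
    return (tA - REF["A"]) ** 2 + (tB - REF["B"]) ** 2

best = None
for a_first in (True, False):   # binary homotopy variable: who passes first
    for tA, tB in itertools.product(candidate_times["A"], candidate_times["B"]):
        # The active linear constraint depends on the homotopy decision.
        feasible = (tB - tA >= GAP) if a_first else (tA - tB >= GAP)
        if feasible and (best is None or cost(tA, tB) < best[0]):
            best = (cost(tA, tB), tA, tB, a_first)

# Either passing order yields cost (1-1)^2 + (2-1)^2 = 1, the global optimum.
assert best[0] == 1.0
```

In the actual MIQP formulation the per-class problems are continuous QPs and the solver branches over the binaries, but the partition-then-optimize structure is the same.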
... To address this challenge, several numerical methods have been proposed in the literature to solve for the (open-loop) Generalized Nash Equilibrium (GNE). These include decomposition techniques such as Jacobi and (regularized) Gauss-Seidel iterated best response (IBR) schemes [4], [7], [9], [15] where players iteratively improve their strategies while assuming that the strategies of other agents remain constant, sensitivity-enhanced IBR [16], non-linear programming (NLP) techniques [3], [8] which solve the Karush-Kuhn-Tucker (KKT) system of the constraint-coupled optimization problem [17], and differential dynamic programming approaches [18] which aim to obtain a linear-quadratic (LQ) approximation of the underlying GNEP [6]. Additionally, Laine et al. propose a numerical technique to solve for the generalized feedback quasi-Nash equilibria of differential games [19], and apply it in a multi-agent motion planning setting. ...
... In general, game-theoretic motion planning frameworks require that the autonomous vehicles have information about the others' cost functions and constraints [3], [7]- [9], which is a reasonable assumption that is also adopted in this work. The cost function can generally be obtained through information sharing, as demonstrated in [3], [4], [6]- [8], [10], or alternatively through an approximation obtained via inverse reinforcement learning techniques, as showcased in [9], where an agent aims to learn the cost function of other vehicles on the road. ...
... This paper, alongside others such as [4], [7], [9], [22], [23], leverages the inherent cooperative nature of urban driving. As a result, the stochastic GNEP (SGNEP) is transformed into a stochastic potential game (SPG), which provides additional desirable properties [24], as it possesses a pure-strategy Nash equilibrium [4]. ...
Article
Urban driving is a challenging task that requires autonomous agents to account for the stochastic dynamics and interactions with other vehicles. In this paper, we propose a novel framework that models urban driving as a stochastic generalized Nash equilibrium problem (SGNEP) and solves it using information-theoretic model predictive control (IT-MPC). By exploiting the cooperative nature of urban driving, we transform the SGNEP into a stochastic potential game (SPG), which has desirable convergence guarantees. Furthermore, we provide an algorithm for isolating interacting vehicles and thus factorizing a game into multiple sub-games. Finally, we solve for the open-loop generalized Nash equilibrium of a stochastic game utilizing a sampling-based technique. We solve the problem in a receding-horizon fashion, and apply our framework to various urban scenarios, such as intersections, lane merges, and ramp merges, and show that it can achieve safe and efficient multi-agent navigation.
... Arguably, the hardness of driving interactions lies in coordinating on a certain equilibrium [1]: who goes first when resources are contended. Under mild assumptions, it has been shown that there actually exist certain equilibria that should be preferred in terms of social efficiency [8] or in terms of cost sharing [9]. At the same time, big strides forward have been made for game-theoretic planners that have local guarantees of convergence [5], [9], [10]. ...
... In recent years, many works have modeled driving interactions as a general-sum game [8], [13]. Most, if not all, of the devised solution methods provide guarantees only for local convergence to Nash equilibria; examples range from iterative quadratic approximations [10] and augmented Lagrangian methods [9] to "Newtonesque" methods [14]. ...
... We consider the class of (urban) driving games akin to [8]. They are a particular subclass of general-sum games with a few peculiarities. ...
Preprint
Full-text available
We consider the interaction among agents engaging in a driving task and model it as a general-sum game. This class of games exhibits a plurality of different equilibria, posing the issue of equilibrium selection. While selecting the most efficient equilibrium (in terms of social cost) is often impractical from a computational standpoint, in this work we study the (in)efficiency of any equilibrium the players might agree to play. More specifically, we bound the equilibrium inefficiency by modeling driving games as a particular type of congestion game over spatio-temporal resources. We obtain novel guarantees that refine existing bounds on the Price of Anarchy (PoA) as a function of problem-dependent game parameters, for instance, the relative trade-off between proximity costs and personal objectives such as comfort and progress. Although the obtained guarantees concern open-loop trajectories, we observe efficient equilibria even when agents employ closed-loop policies trained via decentralized multi-agent reinforcement learning.
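The Price of Anarchy can be computed by brute force in a small congestion game. The following toy two-player, Pigou-style routing example (illustrative numbers, not from the cited work) recovers the classic 4/3 ratio: the worst Nash equilibrium costs 2 while the social optimum costs 1.5.

```python
import itertools

# Two players each route over link "A" (load-dependent cost x/2, where x is
# the number of users on A) or link "B" (constant cost 1).
def player_cost(link, load_on_A):
    return load_on_A / 2 if link == "A" else 1.0

def social_cost(profile):
    load_A = profile.count("A")
    return sum(player_cost(link, load_A) for link in profile)

def is_nash(profile):
    # No player can strictly reduce its own cost by a unilateral deviation.
    for i, link in enumerate(profile):
        for dev in "AB":
            alt = profile[:i] + (dev,) + profile[i + 1:]
            if player_cost(dev, alt.count("A")) < player_cost(link, profile.count("A")) - 1e-12:
                return False
    return True

profiles = list(itertools.product("AB", repeat=2))
nash = [p for p in profiles if is_nash(p)]
poa = max(social_cost(p) for p in nash) / min(social_cost(p) for p in profiles)
assert abs(poa - 4/3) < 1e-9
```

Problem-dependent refinements of PoA bounds, as studied in the cited work, tighten this worst-case ratio using the relative weight of the coupling (here, the load-dependent link) against purely personal costs.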
... In contrast, our work uses an iterative best response (IBR) scheme, in which each player takes a turn solving for their optimal strategy while assuming all other players' strategies are fixed [21][22][23][24]. By replacing the dynamic game with a sequence of optimal control problems, the computational burden of solving for a local Nash equilibrium strategy is substantially reduced. ...
... Our work also draws inspiration from the potential games literature, which exploits the cost structure of multi-player interactions in certain robotics applications [23,24]. In particular, when cost terms that couple different players are all symmetric, there exists a single optimal control problem whose solution gives Nash equilibrium strategies of the interacting players. ...
... This connection to potential games is critical, as it suggests a locally-convergent solution method for GTP-SLAM problems given in Section IV-C (Corollary 1). The following results are based upon established concepts in the literature [23,24,26]; here, we illustrate their pertinence to the noncooperative SLAM problem. ...
Preprint
Full-text available
Robots operating in complex, multi-player settings must simultaneously model the environment and the behavior of human or robotic agents who share that environment. Environmental modeling is often approached using Simultaneous Localization and Mapping (SLAM) techniques; however, SLAM algorithms usually neglect multi-player interactions. In contrast, a recent branch of the motion planning literature uses dynamic game theory to explicitly model noncooperative interactions of multiple agents in a known environment with perfect localization. In this work, we fuse ideas from these disparate communities to solve SLAM problems with game theoretic priors. We present GTP-SLAM, a novel, iterative best response-based SLAM algorithm that accurately performs state localization and map reconstruction in an uncharted scene, while capturing the inherent game-theoretic interactions among multiple agents in that scene. By formulating the underlying SLAM problem as a potential game, we inherit a strong convergence guarantee. Empirical results indicate that, when deployed in a realistic traffic simulation, our approach performs localization and mapping more accurately than a standard bundle adjustment algorithm across a wide range of noise levels.
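The potential-game property invoked in the citing passages, that symmetric coupling costs admit an exact potential, can be checked numerically on toy quadratic costs (the functions below are hypothetical, chosen only for illustration): a unilateral deviation changes the deviating player's cost and the potential by exactly the same amount.

```python
import random

def coupling(x, y):
    # Symmetric pairwise interaction term: identical in both players' costs.
    return (x - y) ** 2

def cost1(x1, x2): return x1 ** 2 + coupling(x1, x2)
def cost2(x1, x2): return (x2 - 1) ** 2 + coupling(x1, x2)

def potential(x1, x2):
    # Individual terms plus the shared coupling, counted once.
    return x1 ** 2 + (x2 - 1) ** 2 + coupling(x1, x2)

rng = random.Random(0)
for _ in range(100):
    x1, x2, d = rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform(-2, 2)
    # Exact-potential condition: unilateral cost change == potential change.
    assert abs((cost1(d, x2) - cost1(x1, x2))
               - (potential(d, x2) - potential(x1, x2))) < 1e-9
    assert abs((cost2(x1, d) - cost2(x1, x2))
               - (potential(x1, d) - potential(x1, x2))) < 1e-9
```

Minimizers of such a potential are Nash equilibria of the underlying game, which is why a single optimal control problem can stand in for the coupled ones.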
... Additionally, motivated by the complexity of traffic rules and the need for a transparent and interpretable system, another body of literature looked at techniques to specify objectives in a prioritized fashion [2], [15]- [17], leading to practical results [18]. ...
... Then, the existence of pure NE follows via standard game theory results [26]. a) Jointly Communal Objectives: We consider the idea of players not having contrasting objectives by extending the idea of communal metrics [17,Def. 4] to arbitrary pairs of metrics. ...
... Note that Eq. (4) is not restrictive for many common cost functions used in mobile robotics, including, among others, collision costs and clearance objectives (i.e., keeping a minimum safety distance from obstacles) [17]. Moreover, any pair of personal metrics (i.e., metrics whose outcome depends on a single player's strategy only) is trivially jointly communal. ...
Preprint
Full-text available
Modern applications require robots to comply with multiple, often conflicting rules and to interact with other agents. We present Posetal Games as a class of games in which each player expresses a preference over the outcomes via a partially ordered set of metrics. This allows one to combine the hierarchical priorities of each player with the interactive nature of the environment. By contextualizing standard game-theoretical notions, we provide two sufficient conditions on the preferences of the players to prove the existence of pure Nash Equilibria in finite action sets. Moreover, we define formal operations on the preference structures and link them to a refinement of the game solutions, showing how the set of equilibria can be systematically shrunk. The presented results are showcased in a driving game where autonomous vehicles select from a finite set of trajectories, and demonstrate interpretability in terms of the minimum rank violation for each player.
... HRI as a partially observable game. HRI is oftentimes a noncooperative game [14,40] since decisions of the human and robot are coupled and they may not share the same objectives. Due to interaction uncertainty, HRI planning is more precisely categorized as a partially-observable stochastic game (POSG) [16]. ...
... Due to interaction uncertainty, HRI planning is more precisely categorized as a partially-observable stochastic game (POSG) [16]. While efficient game solvers have been developed for deterministic HRI planning [14,15,40], extending those to POSG requires active information gathering to reduce the uncertainty in doxo-states. Recent approaches [31,38] make simplifying assumptions on the POSG and incorporate a heuristic information-gathering term in the robot's cost to reduce uncertainty, which requires manual tuning to balance with the robot's nominal performance. ...
... While various methods to cope with hierarchical preferences have been developed-such as the aforementioned strategy of weighting agents' preferences according to their priority as in [30]-most focus on single-agent scenarios, and there are very limited results for multi-agent, noncooperative settings. For example, recent work [31] applies lexicographic minimization to an urban driving game via an iterated best response (IBR) scheme. However, this approach is limited to a certain class of games where IBR is guaranteed to converge. ...
Preprint
Full-text available
We study noncooperative games, in which each agent's objective is composed of a sequence of ordered, and potentially conflicting, preferences. Problems of this type naturally model a wide variety of scenarios: for example, drivers at a busy intersection must balance the desire to make forward progress with the risk of collision. Mathematically, these problems possess a nested structure, and to behave properly agents must prioritize their most important preference, and only consider less important preferences to the extent that they do not compromise performance on more important ones. We consider multi-agent, noncooperative variants of these problems, and seek generalized Nash equilibria in which each agent's decision reflects both its hierarchy of preferences and other agents' actions. We make two key contributions. First, we develop a recursive approach for deriving the first-order optimality conditions of each agent's nested problem. Second, we propose a sequence of increasingly tight relaxations, each of which can be transcribed as a mixed complementarity problem and solved via existing methods. Experimental results demonstrate that our approach reliably converges to equilibrium solutions that strictly reflect agents' individual ordered preferences.
... After obtaining the cost matrix, we compute the Nash equilibrium of the matrix game by enumerating all possible combinations of semantic-level actions. If multiple Nash equilibria exist, we select the equilibrium with the lowest social cost [45], defined as C_ij := J^EV_ij + J^VG_ij. If a pure-strategy Nash equilibrium does not exist, we choose the Stackelberg equilibrium [21], [35] with the EV as the follower as a backup solution. ...
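The selection rule described in this passage, enumerating pure-strategy profiles and keeping the equilibrium with the lowest social cost, can be sketched on a hypothetical 2x2 cost-matrix game (cost-minimizing players; the matrices below are made up for illustration):

```python
import itertools

# Toy cost matrices: rows index player 1's action, columns player 2's.
J1 = [[3.0, 1.0],
      [0.0, 2.0]]
J2 = [[3.0, 0.0],
      [1.0, 2.0]]

def pure_nash(J1, J2):
    """Enumerate pure-strategy Nash equilibria of a 2x2 cost game."""
    eqs = []
    for i, j in itertools.product(range(2), range(2)):
        br1 = all(J1[i][j] <= J1[k][j] for k in range(2))  # no better row
        br2 = all(J2[i][j] <= J2[i][k] for k in range(2))  # no better column
        if br1 and br2:
            eqs.append((i, j))
    return eqs

eqs = pure_nash(J1, J2)
assert set(eqs) == {(0, 1), (1, 0)}   # two equilibria, like two passing orders
# Equilibrium selection: pick the profile with the lowest social cost.
best = min(eqs, key=lambda e: J1[e[0]][e[1]] + J2[e[0]][e[1]])
```

Here both equilibria happen to tie at social cost 1.0, mirroring the coordination flavor of the passing-order ambiguity; the cited work falls back to a Stackelberg solution when the equilibrium set is empty.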
... Dynamic games have shown promise in addressing a wide range of multi-agent coordination scenarios, from autonomous driving [8][9][10] and physical human-robot interaction [11] to smart-grid networks [12]. While computing equilibrium solutions of dynamic games is typically challenging, recently developed toolboxes solve linear-quadratic (LQ) approximations of intricate, non-convex games. ...
... In Sadigh et al. (2018), the authors propose to model human-robot interaction as a dynamical system, in which the robot's action can affect the human's action, thus producing predictions of the human's future states. Related work in the following years proposes to solve the joint problem as a general-sum dynamic game Fisac et al. (2019); Fridovich-Keil et al. (2020); Zanardi et al. (2021). Recent work Hu et al. (2023) also considers the joint prediction-planning setting, where the authors focus on safety analysis in belief space, leading to a belief-space zero-sum game formulated using Hamilton-Jacobi (HJ) Reachability. ...
Article
The ability to accurately predict others’ behavior is central to the safety and efficiency of robotic systems in interactive settings, such as human–robot interaction and multi-robot teaming tasks. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as other agents’ goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as stochastic hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning, mainly due to the fundamental coupling between the robot’s trajectory plan and its prediction of other agents’ intent. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem that can be readily solved by real-time gradient-based optimization methods. The resulting policy is shown to preserve the dual control effect for a broad class of predictive models with both continuous and categorical uncertainty. To ensure the safe operation of the interacting agents, we use a runtime safety filter (also referred to as a “shielding” scheme), which overrides the robot’s dual control policy with a safety fallback strategy when a safety-critical event is imminent. We then augment the dual control framework with an improved variant of the recently proposed shielding-aware robust planning scheme, which proactively balances the nominal planning performance with the risk of high-cost emergency maneuvers triggered by low-probability agent behaviors. 
We demonstrate the efficacy of our approach with both simulated driving studies and hardware experiments using 1/10 scale autonomous vehicles.
... Due to the interactive nature of racing, where each agent has to account for other agents' decisions, interactions among agents can naturally be captured in a game-theoretic framework. Game-theoretic planning has been extensively used in non-cooperative motion planning for multiple agents [6], [7], [8], [9], [10], [11]. Due to its success in motion planning, game theory has also recently been applied in autonomous racing. ...
Preprint
Full-text available
In this work, we consider the problem of autonomous racing with multiple agents, where agents must interact closely and influence each other to compete. We model interactions among agents through a game-theoretic framework and propose an efficient algorithm for tractably solving the resulting game in real time. More specifically, we capture interactions among multiple agents through a constrained dynamic game. We show that the resulting game is an instance of a simple-to-analyze class of games; namely, our racing game is a constrained dynamic potential game. An important and appealing property of dynamic potential games is that a generalized Nash equilibrium of the underlying game can be computed by solving a single constrained optimal control problem instead of multiple coupled constrained optimal control problems. Leveraging this property, we show that the problem of autonomous racing is greatly simplified and develop RAPID (autonomous multi-agent RAcing using constrained PotentIal Dynamic games), a racing algorithm that can be solved tractably in real time. Through simulation studies, we demonstrate that our algorithm outperforms the state-of-the-art approach. We further show the real-time capabilities of our algorithm in hardware experiments.
... Dynamic games have shown promise in addressing a wide range of multi-agent coordination scenarios, from autonomous driving [8][9][10] and physical human-robot interaction [11] to smart-grid networks [12]. While computing equilibrium solutions of dynamic games is typically challenging, contemporary tools have been created to enable linear-quadratic (LQ) approximations of intricate, nonconvex games. ...
Preprint
Full-text available
We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to unsafe behavior in safety-critical events. The recently developed nonlinear opinion dynamics (NOD) provide guarantees for breaking deadlocks. However, choosing the appropriate model parameters automatically in general multi-agent settings remains a challenge. In this paper, we first propose a novel and principled procedure for synthesizing NOD based on the value functions of dynamic games conditioned on agents' intents. In particular, we provide for the two-player two-option case precise stability conditions for equilibria of the game-induced NOD based on the mismatch between agents' opinions and their game values. We then propose an optimization-based trajectory optimization algorithm that computes agents' policies guided by the evolution of opinions. The efficacy of our method is illustrated with a simulated toll station coordination example.
... This choice simplifies the presentation of this work by restricting the interaction among the agents to collisions only. Nevertheless, in more general cases, one can also have coupling among the agents at the cost level (e.g., a safety-distance penalty) [16]. The concepts presented here would still be applicable (perhaps with minor rework) as long as the MA-MMP cost function is separable in the agents. ...
Preprint
Full-text available
Modern robotics often involves multiple embodied agents operating within a shared environment. Path planning in these cases is considerably more challenging than in single-agent scenarios. Although standard Sampling-based Algorithms (SBAs) can be used to search for solutions in the robots' joint space, this approach quickly becomes computationally intractable as the number of agents increases. To address this issue, we integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods. During the search for a solution we can decouple (i.e., factorize) different subsets of agents into independent lower-dimensional search spaces once we certify that their future solutions will be independent of each other using a factorization heuristic. Consequently, we progressively construct a lean hypergraph where certain (hyper-)edges split the agents into independent subgraphs. In the best case, this approach can reduce the growth in dimensionality of the search space from exponential to linear in the number of agents. On average, fewer samples are needed to find high-quality solutions while preserving the optimality, completeness, and anytime properties of SBAs. We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.
... Monderer and Shapley [19] proved that Nash equilibria are guaranteed to exist in potential games, a large subclass of non-cooperative and simultaneous games; specifically, if the rationality parameters of a potential game, i.e., the parameters describing the agents' payoffs and strategy sets, are common information and observable, there always exists a Nash equilibrium that is an extremum of a so-called potential function. Several games with practical applications can be modeled as potential games, for example, congestion games, Cournot games [19], and certain urban driving games [26]. However, the agents' self-driven behavior frequently conflicts with the greater societal goals; in such cases, external regulators (e.g., governments) can intervene in the agents' interaction via incentives, laws, and regulations. ...
Preprint
Full-text available
We propose a stochastic first-order algorithm to learn the rationality parameters of simultaneous and non-cooperative potential games, i.e., the parameters of the agents' optimization problems. Our technique combines (i.) an active-set step that enforces that the agents play at a Nash equilibrium and (ii.) an implicit-differentiation step to update the estimates of the rationality parameters. We detail the convergence properties of our algorithm and perform numerical experiments on Cournot and congestion games, showing that our algorithm effectively finds high-quality solutions (in terms of out-of-sample loss) and scales to large datasets.
... In Sadigh et al. (2018), the authors propose to model human-robot interaction as a dynamical system, in which the robot's action can affect the human's action, thus producing predictions of the human's future states. Related work in the following years proposes to solve the joint problem as a general-sum dynamic game Fisac et al. (2019); Fridovich-Keil et al. (2020); Zanardi et al. (2021). Our method falls into the category of joint prediction and planning. ...
Preprint
Full-text available
The ability to accurately predict the opponent's behavior is central to the safety and efficiency of robotic systems in interactive settings, such as human-robot interaction and multi-robot teaming tasks. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as opponent's goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem. The resulting policy is shown to preserve the dual control effect for a broad class of predictive models with both continuous and categorical uncertainty. To ensure the safe operation of the interacting agents, we leverage a supervisory control scheme, oftentimes referred to as ``shielding'', which overrides the ego agent's dual control policy with a safety fallback strategy when a safety-critical event is imminent. We then augment the dual control framework with an improved variant of the recently proposed shielding-aware robust planning scheme, which proactively balances the nominal planning performance with the risk of high-cost emergency maneuvers triggered by low-probability opponent's behaviors. We demonstrate the efficacy of our approach with both simulated driving examples and hardware experiments using 1/10 scale autonomous vehicles.
... Analogous to topological braids, other analytical methods have also been used to evaluate the priority preferences of human drivers, which are then embedded into game-theoretic frameworks to model interactions. For instance, Zanardi et al. (2021a, 2021b) formulated the interactions between human drivers as urban driving games and embedded each player's problem complexity in the lexicographic preference or the prioritized metrics of the agent over the outcomes. ...
Book
Full-text available
In real-world traffic, rational human drivers can make socially-compatible decisions in complex and crowded scenarios by efficiently negotiating with their surroundings using non-linguistic communications such as gesturing, deictics, and motion cues. Understanding the principles and rules of the dynamic interaction among human drivers in complex traffic scenes allows 1) generating diverse social driving behaviors that leverage beliefs and expectations about others’ actions or reactions; 2) predicting the future states of a scene with moving objects, which is essential to building provably safe intelligent vehicles with the capabilities of behavior prediction and potential collision detection; and 3) creating realistic driving simulators. However, this task is not trivial since various social factors exist along the driving interaction process, including social motivation, social perception, and social control. Generally, human driving behavior is compounded by human drivers’ social interactions and their physical interactions with the scene. No human drives a car in a vacuum; she/he must negotiate with other road users to achieve their goals in social traffic scenes. A rational human driver can interact with other road users in a socially-compatible way through implicit communications to complete their driving tasks smoothly in interaction-intensive, safety-critical environments. This monograph reviews the existing approaches and theories to help understand and rethink the interactions among human drivers toward social autonomous driving. Fundamental questions which are covered include: 1) What is social interaction in road traffic scenes? 2) How to measure and evaluate social interaction? 3) How to model and reveal the process of social interaction? 4) How do human drivers reach an implicit agreement and negotiate smoothly in social interaction?
This monograph reviews various approaches to modeling and learning the social interactions between human drivers, ranging from optimization theory, deep learning, and graphical models to social force theory and behavioral and cognitive science. Also highlighted are some new directions, critical challenges, and opening questions for future research.
Article
Full-text available
No human drives a car in a vacuum; she/he must negotiate with other road users to achieve their goals in social traffic scenes. A rational human driver can interact with other road users in a socially-compatible way through implicit communications to complete their driving tasks smoothly in interaction-intensive, safety-critical environments. This paper aims to review the existing approaches and theories to help understand and rethink the interactions among human drivers toward social autonomous driving. We take this survey to seek the answers to a series of fundamental questions: 1) What is social interaction in road traffic scenes? 2) How to measure and evaluate social interaction? 3) How to model and reveal the process of social interaction? 4) How do human drivers reach an implicit agreement and negotiate smoothly in social interaction? This paper reviews various approaches to modeling and learning the social interactions between human drivers, ranging from optimization theory and graphical models to social force theory and behavioral & cognitive science. We also highlight some new directions, critical challenges, and opening questions for future research.
... We express these objectives as a lexicographic preference over the outcomes. We write the players' preferences so that they satisfy the communal properties of Urban Driving Games (UDGs) [29], thus making the game potential. This allows us to search for equilibria in pure strategies, with the guarantee that the global optimum minimizes the social cost. ...
Conference Paper
Full-text available
Dynamic games feature a state-space complexity that scales superlinearly with the number of players. This makes this class of games often intractable even for a handful of players. We introduce the factorization process of dynamic games as a transformation leveraging the independence of players at equilibrium to build a leaner game graph. When applicable, it yields fewer nodes, fewer players per game node, hence much faster solutions. While for the general case checking for independence of players requires to solve the game itself, we observe that for dynamic games in the robotic domain there exist exact heuristics based on the spatio-temporal occupancy of the individual players. We validate our findings in realistic autonomous driving scenarios showing that already for a 4-player intersection we have a reduction of game nodes and solving time close to 99%.
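The occupancy-based heuristic described in the abstract can be sketched as a union-find grouping over players whose spatio-temporal occupancy sets intersect. This is a minimal illustration, not the paper's implementation; the occupancy sets and grid discretization below are hypothetical:

```python
def factorize_players(occupancy):
    """Group players into independent subgames: players whose
    spatio-temporal occupancy sets are disjoint never interact,
    so each connected component can be solved as a smaller game."""
    n = len(occupancy)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Union players whose occupancy sets overlap in space-time.
    for i in range(n):
        for j in range(i + 1, n):
            if occupancy[i] & occupancy[j]:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Occupancy as sets of (grid cell, time step) pairs (hypothetical values).
groups = factorize_players([
    {(0, 0), (1, 1)},   # player 0 crosses cell 1 at t=1 ...
    {(1, 1), (2, 2)},   # ... as does player 1: they share a subgame
    {(9, 0), (9, 1)},   # player 2 is far away: independent subgame
])
```

Each returned group can then be solved as its own, smaller dynamic game, which is where the reported reduction in game nodes comes from.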
... the cumulative reward ∑_{h=0}^{H−1} r^i(s_h, π(s_h)), since we expect this to be high at equilibria (Zanardi et al. (2021) showed that this is typical of driving games under some assumptions). ...
Preprint
Full-text available
We consider model-based multi-agent reinforcement learning, where the environment transition model is unknown and can only be learned via expensive interactions with the environment. We propose H-MARL (Hallucinated Multi-Agent Reinforcement Learning), a novel sample-efficient algorithm that can efficiently balance exploration, i.e., learning about the environment, and exploitation, i.e., achieve good equilibrium performance in the underlying general-sum Markov game. H-MARL builds high-probability confidence intervals around the unknown transition model and sequentially updates them based on newly observed data. Using these, it constructs an optimistic hallucinated game for the agents for which equilibrium policies are computed at each round. We consider general statistical models (e.g., Gaussian processes, deep ensembles, etc.) and policy classes (e.g., deep neural networks), and theoretically analyze our approach by bounding the agents' dynamic regret. Moreover, we provide a convergence rate to the equilibria of the underlying Markov game. We demonstrate our approach experimentally on an autonomous driving simulation benchmark. H-MARL learns successful equilibrium policies after a few interactions with the environment and can significantly improve the performance compared to non-exploratory methods.
Article
Existing multi-agent motion planners face scalability challenges with the number of agents and with route plans that span long time horizons. We tackle these issues by introducing an additional layer of abstraction: we interpolate agent trajectories with natural cubic splines and leverage existing results showing that, under some natural assumptions, the resulting game has the structure of a potential game. We prove that a simultaneous gradient descent method using independent per-agent step sizes is guaranteed to converge to a local Nash equilibrium. Compared with recent iLQR-based potential game solvers, our method solves for local Nash equilibrium trajectories faster in games with up to 52 agents, and we demonstrate scalability to long horizons.
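As a toy illustration of the convergence result above (not the paper's spline-based implementation), one can run simultaneous gradient descent with independent per-agent step sizes on a two-agent game whose proximity coupling is shared by both agents, hence potential. All costs and constants here are hypothetical:

```python
import math

def simultaneous_gradient_descent(steps=2000, lr=(0.05, 0.05), w=0.5):
    """Each agent descends only its own cost. The proximity term
    w*exp(-(x0-x1)^2) is identical for both agents, so the game is
    potential and the iteration converges to a local Nash equilibrium."""
    x, goal = [0.0, 0.0], [1.0, -1.0]
    for _ in range(steps):
        d = x[0] - x[1]
        shared = -2.0 * d * w * math.exp(-d * d)  # d/dd of w*exp(-d^2)
        g0 = 2.0 * (x[0] - goal[0]) + shared      # agent 0's own gradient
        g1 = 2.0 * (x[1] - goal[1]) - shared      # agent 1's own gradient
        x[0] -= lr[0] * g0
        x[1] -= lr[1] * g1
    return x

x = simultaneous_gradient_descent()
# Agents settle slightly past their goals, pushed apart by the proximity term.
```

At the fixed point neither agent can reduce its own cost by a unilateral change, which is exactly the local Nash condition the abstract refers to.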
Chapter
We give an overview of time-continuous pedestrian models with a focus on data-driven modelling. Starting from pioneering reactive force-based models, we move forward to modern, active pedestrian models with sophisticated collision-avoidance and anticipation techniques formulated through optimisation problems. The overview focuses on the mathematical aspects of the models and their different components. We include methods used for data-based calibration of model parameters, hybrid approaches incorporating neural networks, and purely data-based models fitted by deep learning. The conclusion outlines some development perspectives that we expect to grow in the coming years.
Article
We consider the interaction among agents engaging in a driving task and model it as a general-sum game. This class of games exhibits a plurality of different equilibria, posing the issue of equilibrium selection. While selecting the most efficient equilibrium (in terms of social cost) is often impractical from a computational standpoint, in this work we study the (in)efficiency of any equilibrium the players might agree to play. More specifically, we bound the equilibrium inefficiency by modeling driving games as a particular type of congestion game over spatio-temporal resources. We obtain novel guarantees that refine existing bounds on the Price of Anarchy (PoA) as a function of problem-dependent game parameters, for instance, the relative trade-off between proximity costs and personal objectives such as comfort and progress. Although the obtained guarantees concern open-loop trajectories, we observe efficient equilibria even when agents employ closed-loop policies trained via decentralized multi-agent reinforcement learning.
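For finite games, the Price of Anarchy the abstract bounds can be computed exactly by enumeration. A minimal sketch; the cost tables are hypothetical, prisoner's-dilemma style:

```python
import itertools

def price_of_anarchy(costs, n_actions, n_players):
    """PoA over pure Nash equilibria of a finite cost game:
    worst equilibrium social cost / minimum achievable social cost."""
    profiles = list(itertools.product(range(n_actions), repeat=n_players))

    def social(p):
        return sum(costs[i][p] for i in range(n_players))

    def is_nash(p):
        # No player can strictly reduce its cost by deviating alone.
        return all(
            costs[i][p[:i] + (a,) + p[i + 1:]] >= costs[i][p]
            for i in range(n_players) for a in range(n_actions)
        )

    equilibria = [p for p in profiles if is_nash(p)]
    return max(social(p) for p in equilibria) / min(social(p) for p in profiles)

# Two drivers, actions 0 = yield, 1 = push through (hypothetical costs).
c0 = {(0, 0): 1, (0, 1): 3, (1, 0): 0, (1, 1): 2}
c1 = {(0, 0): 1, (0, 1): 0, (1, 0): 3, (1, 1): 2}
poa = price_of_anarchy([c0, c1], n_actions=2, n_players=2)  # -> 2.0
```

Here the unique equilibrium (both push through) has social cost 4, while mutual yielding achieves 2, giving PoA = 2.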
Article
Modern applications require robots to comply with multiple, often conflicting rules and to interact with the other agents. We present Posetal Games as a class of games in which each player expresses a preference over the outcomes via a partially ordered set of metrics. This allows one to combine hierarchical priorities of each player with the interactive nature of the environment. By contextualizing standard game theoretical notions, we provide two sufficient conditions on the preference of the players to prove existence of pure Nash Equilibria in finite action sets. Moreover, we define formal operations on the preference structures and link them to a refinement of the game solutions, showing how the set of equilibria can be systematically shrunk. The presented results are showcased in a driving game where autonomous vehicles select from a finite set of trajectories. The results demonstrate the interpretability of results in terms of minimum-rank-violation for each player.
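A lexicographic preference over cost tuples, the totally ordered special case of the posetal preferences above, can be expressed directly with tuple comparison. The trajectory scores below are hypothetical:

```python
# Each outcome is scored as a tuple ordered by priority:
# (collision cost, rule violations, discomfort) -- safety first.
def lex_best(outcomes):
    """Pick the lexicographically minimal cost tuple: a lower value in a
    higher-priority component wins regardless of lower-priority terms."""
    return min(outcomes)  # Python compares tuples lexicographically

traj_yield = (0.0, 1, 5.0)   # safe, one minor rule violation, uncomfortable
traj_push  = (2.0, 0, 1.0)   # risks a collision but is comfortable
best = lex_best([traj_push, traj_yield])  # -> (0.0, 1, 5.0): safety dominates
```

A posetal preference generalizes this by only partially ordering the metrics, so several outcomes can be mutually incomparable rather than totally ranked.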
Conference Paper
Full-text available
We present CoverNet, a new method for multimodal, probabilistic trajectory prediction for urban driving. Previous work has employed a variety of methods, including multimodal regression, occupancy maps, and 1-step stochastic policies. We instead frame the trajectory prediction problem as classification over a diverse set of trajectories. The size of this set remains manageable due to the limited number of distinct actions that can be taken over a reasonable prediction horizon. We structure the trajectory set to a) ensure a desired level of coverage of the state space, and b) eliminate physically impossible trajectories. By dynamically generating trajectory sets based on the agent's current state, we can further improve our method's efficiency. We demonstrate our approach on public, real world self-driving datasets, and show that it outperforms state-of-the-art methods.
Article
Full-text available
In this paper, we present a game-theoretic model, a new algorithmic framework with convergence theory, and numerical examples for the solution of intersection management problems. In our model, we consider autonomous vehicles that can communicate with each other in order to find individual optimal driving strategies through an intersection, without colliding with other vehicles. This results in coupled optimal control problems, and we consider a generalized Nash equilibrium reformulation of the problem. Herein, we have individual differential equations, state and control constraints, and additionally nonconvex shared constraints. To handle the nonconvexity we consider a partial penalty approach. To solve the resulting standard Nash equilibrium problem, we propose a decomposition method, where the selection of the players is controlled through penalty terms. The proposed method allows the prevention of a priori introduced hierarchies. Using dynamic programming, we prove convergence of our algorithm. Finally, we present numerical studies that show the effectiveness of the approach.
Article
Full-text available
Despite the dramatic advances made in artificial intelligence (AI) and other fields of computer science towards implementing “intelligent” systems expert in specific tasks, the goal of devising algorithms and machines able to interact with human beings just as naturally as other humans do is still elusive. As this naturalness is arguably a consequence of the similarity of the underlying ‘hardware’ (the human brain), it is reasonable to claim that only artificial systems closely inspired by the actual functioning of the human brain and mind have the potential to render this possible. More specifically, the aim of this paper is to propose a new, biologically inspired computational model able to mimic, in a more accurate way than existing ones, the set of functionalities known as Theory of Mind. This is a set of mental processes that allow an individual to attribute mental states to others. In human social interactions this mechanism is crucial, as it allows one to explain the observed behaviour of others, to guess their intentions and to effectively predict their future conduct. This happens by modelling and selecting the most likely (unobservable) mental states of the considered person, which are the primary causes of everyone’s observed actions. The proposed model combines a number of concepts, including those of hierarchical structure, hypotheses pre-activation, and the notion of agent class or ‘stereotype’. It rests on one of the main psychological approaches to Theory of Mind, termed Simulation Theory (ST), and is supported by significant neuroscientific evidence. Crucially, unlike previous efforts in AI, the proposed model puts the learning element at the forefront, in the belief that simulations of other intelligent beings’ reasoning processes need to be learned from experience. In this perspective, a possible implementation of the model in terms of deep, reconfigurable neural networks, trained in a reinforcement learning setting, is outlined.
Conference Paper
Full-text available
Autonomous vehicles (AVs) or self-driving cars have the potential to replace human-operated cars. AVs can sense the environment and even navigate some of the roads in conditions humans find challenging. This may quickly lead to people's overreliance on AVs and overconfidence that no failures will occur. Therefore, AVs can impact society positively and negatively. AVs are X-ware systems that consist of software, hardware, humans, and their interactions. Despite the large number of studies on AVs, there are still a large number of unsolved problems. One major challenge for AVs is communication with other vehicles on the road as well as pedestrians. Replacing some of the human-operated vehicles with AVs will require interactions between AVs and these other users of the transportation network. Most of the previous research efforts consider software failures, whereas few consider the role of humans in the current transition to a society in which self-driving cars predominate. This paper considers three points of view: I) the driver and passenger of the AV, II) pedestrians, and III) AV interaction with other users of the transportation network. We also discuss related studies on human behavior.
Article
Full-text available
Robot motion planners are increasingly being equipped with an intriguing property: human likeness. This property can enhance human–robot interactions and is essential for a convincing computer animation of humans. This paper presents a (multi-agent) motion planner for dynamic environments that generates human-like motion. The presented motion planner stands out against other motion planners by explicitly modeling human-like decision making and taking interdependencies between individuals into account, which is achieved by applying game theory. Non-cooperative games and the concept of a Nash equilibrium are used to formulate the decision process that describes human motion behavior while walking in a populated environment. We evaluate whether our approach generates human-like motions through two experiments: a video study showing simulated, moving pedestrians, wherein the participants are passive observers, and a collision avoidance study, wherein the participants interact within virtual reality with an agent that is controlled by different motion planners. The experiments are designed as variations of the Turing test, which determines whether participants can differentiate between human motions and artificially generated motions. The results of both studies coincide and show that the participants could not distinguish between human motion behavior and our artificial behavior based on game theory. In contrast, the participants could distinguish human motions from motions based on established planners, such as the reciprocal velocity obstacles or social forces.
Article
Full-text available
Traditionally, autonomous cars treat human-driven vehicles like moving obstacles. They predict their future trajectories and plan to stay out of their way. While physically safe, this results in defensive and opaque behaviors. In reality, an autonomous car’s actions will actually affect what other cars will do in response, creating an opportunity for coordination. Our thesis is that we can leverage these responses to plan more efficient and communicative behaviors. We introduce a formulation of interaction with human-driven vehicles as an underactuated dynamical system, in which the robot’s actions have consequences on the state of the autonomous car, but also on the human actions and thus the state of the human-driven car. We model these consequences by approximating the human’s actions as (noisily) optimal with respect to some utility function. The robot uses the human actions as observations of her underlying utility function parameters. We first explore learning these parameters offline, and show that a robot planning in the resulting underactuated system is more efficient than when treating the person as a moving obstacle. We also show that the robot can target specific desired effects, like getting the person to switch lanes or to proceed first through an intersection. We then explore estimating these parameters online, and enable the robot to perform active information gathering: generating actions that purposefully probe the human in order to clarify their underlying utility parameters, like driving style or attention level. We show that this significantly outperforms passive estimation and improves efficiency. Planning in our model results in coordination behaviors: the robot inches forward at an intersection to see if can go through, or it reverses to make the other car proceed first. These behaviors result from the optimization, without relying on hand-coded signaling strategies. 
Our user studies support the utility of our model when interacting with real users.
Article
Full-text available
We define and discuss several notions of potential functions for games in strategic form. We characterize games that have a potential function, and we present a variety of applications. Journal of Economic Literature Classification Numbers: C72, C73.
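The defining property of an exact potential function can be checked directly on a finite game: every unilateral deviation changes the deviator's payoff by exactly the change in the potential. A minimal sketch with hypothetical payoffs, built as each player's utility equal to the potential plus a term depending only on the other player:

```python
import itertools

def is_exact_potential(payoff, potential, actions):
    """Verify the exact-potential condition for a two-player game."""
    for a1, a2 in itertools.product(actions, repeat=2):
        for b in actions:
            # Player 1 deviates a1 -> b.
            if (payoff[(b, a2)][0] - payoff[(a1, a2)][0]
                    != potential[(b, a2)] - potential[(a1, a2)]):
                return False
            # Player 2 deviates a2 -> b.
            if (payoff[(a1, b)][1] - payoff[(a1, a2)][1]
                    != potential[(a1, b)] - potential[(a1, a2)]):
                return False
    return True

payoff = {(0, 0): (3, 3), (0, 1): (0, 2), (1, 0): (2, 0), (1, 1): (2, 2)}
potential = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 2}
ok = is_exact_potential(payoff, potential, actions=(0, 1))  # -> True
```

Note that the potential's maximizer, profile (1, 1), is a pure Nash equilibrium of the game, which is the key property exploited by potential-game arguments.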
Article
Full-text available
How long does it take until economic agents converge to an equilibrium? By studying the complexity of the problem of computing a mixed Nash equilibrium in a game, we provide evidence that there are games in which convergence to such an equilibrium takes prohibitively long. Traditionally, computational problems fall into two classes: those that have a polynomial-time algorithm and those that are NP-hard. However, the concept of NP-hardness cannot be applied to the rare problems where "every instance has a solution"--for example, in the case of games Nash's theorem asserts that every game has a mixed equilibrium (now known as the Nash equilibrium, in honor of that result). We show that finding a Nash equilibrium is complete for a class of problems called PPAD, containing several other known hard problems; all problems in PPAD share the same style of proof that every instance has a solution.
Article
In this article, we propose an online 3-D planning algorithm for a drone to race competitively against a single adversary drone. The algorithm computes an approximation of the Nash equilibrium in the joint space of trajectories of the two drones at each time step, and proceeds in a receding horizon fashion. The algorithm uses a novel sensitivity term, within an iterative best response computational scheme, to approximate the amount by which the adversary will yield to the ego drone to avoid a collision. This leads to racing trajectories that are more competitive than without the sensitivity term. We prove that the fixed point of this sensitivity enhanced iterative best response satisfies the first-order optimality conditions of a Nash equilibrium. We present results of a simulation study of races with 2-D and 3-D race courses, showing that our game theoretic planner significantly outperforms a model predictive control (MPC) racing algorithm. We also present results of multiple drone racing experiments on a 3-D track in which drones sense each other's relative position with onboard vision. The proposed game theoretic planner again outperforms the MPC opponent in these experiments where drones reach speeds up to 1.25 m/s.
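Stripped of the sensitivity term and the receding horizon, the iterative best response scheme over finite trajectory candidates looks roughly like the sketch below. The cost functions are placeholders, not the paper's racing costs:

```python
def iterated_best_response(cost0, cost1, n_actions, max_iters=100):
    """Alternate best responses for two players over finite action sets;
    a fixed point is a pure Nash equilibrium. Convergence is guaranteed
    in (ordinal) potential games but not in general."""
    a0, a1 = 0, 0
    for _ in range(max_iters):
        b0 = min(range(n_actions), key=lambda x: cost0(x, a1))
        b1 = min(range(n_actions), key=lambda x: cost1(b0, x))
        if (b0, b1) == (a0, a1):
            return a0, a1  # mutual best responses: fixed point
        a0, a1 = b0, b1
    return None  # cycled without converging

# Hypothetical congestion-style costs: big penalty for sharing a lane.
eq = iterated_best_response(
    cost0=lambda x, y: x + 10 * (x == y),
    cost1=lambda x, y: (2 - y) + 10 * (x == y),
    n_actions=3,
)  # -> (0, 2)
```

The paper's sensitivity term modifies the adversary's anticipated response inside each best-response step; the fixed-point characterization quoted in the abstract then applies to that enhanced update.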
Article
We consider autonomous racing of two cars and present an approach to formulate the decision making as a non-cooperative non-zero-sum game. The game is formulated by restricting both players to fulfill static track constraints as well as collision constraints which depend on the combined actions of the two players. At the same time the players try to maximize their own progress. In the case where the action space of the players is finite, the racing game can be reformulated as a bimatrix game. For this bimatrix game, we show that the actions obtained by a sequential maximization approach where only the follower considers the action of the leader are identical to a Stackelberg and a Nash equilibrium in pure strategies. Furthermore, we propose a game promoting blocking, by additionally rewarding the leading car for staying ahead at the end of the horizon. We show that this changes the Stackelberg equilibrium, but has a minor influence on the Nash equilibria. For an online implementation, we propose to play the games in a moving horizon fashion, and we present two methods for guaranteeing feasibility of the resulting coupled repeated games. Finally, we study the performance of the proposed approaches in simulation for a set-up that replicates the miniature race car tested at the Automatic Control Laboratory of ETH Zurich. The simulation study shows that the presented games can successfully model different racing behaviors and generate interesting racing situations. (A preprint can be found at: https://arxiv.org/abs/1712.03913)
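The sequential maximization described in the abstract, where the leader commits and the follower best-responds, can be enumerated directly for a bimatrix game. The payoffs below are hypothetical, with ties broken in the leader's favor:

```python
def stackelberg_pure(A, B):
    """Pure-strategy Stackelberg equilibrium of a bimatrix game.
    A[i][j] is the leader's payoff, B[i][j] the follower's; the leader
    commits to the row maximizing its payoff given the follower's
    best response to that row."""
    best = None
    for i, row in enumerate(B):
        m = max(row)
        # Follower best responses; break ties in the leader's favor.
        j = max((k for k, b in enumerate(row) if b == m),
                key=lambda k: A[i][k])
        if best is None or A[i][j] > best[2]:
            best = (i, j, A[i][j])
    return best[0], best[1]

# Leader gets its top payoff in row 0 only if the follower yields;
# committing to row 1 exploits the follower's indifference instead.
A = [[4, 0],
     [2, 1]]
B = [[1, 3],
     [2, 2]]
strategy = stackelberg_pure(A, B)  # -> (1, 0)
```

For finite action spaces this enumeration mirrors the bimatrix reformulation in the abstract, where the sequential maximization coincides with Stackelberg and pure Nash solutions.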
Article
We consider optimal control problems with ordinary differential equations that are coupled by shared, possibly nonconvex, constraints. For these problems, we use the generalized Nash equilibrium approach and provide a reformulation of normalized Nash equilibria as solutions to a single optimal control problem. By this reformulation, we are able to prove existence, and in some settings, exploiting convexity properties, we also get a limited number or even uniqueness of the normalized Nash equilibria. Then, we use our approach to discuss traffic scenarios with several autonomous vehicles, whose dynamics is described through differential equations, and the avoidance of collisions couples the optimal control problems of the vehicles. For the solution to the discretized problems, we prove strong convergence of the states and weak convergence of the controls. Finally, using existing optimal control software, we show that the generalized Nash equilibrium approach leads to reasonable results for a crossing scenario with different vehicle models.
Article
Real-time implementation of optimization-based control and trajectory planning can be very challenging for nonlinear systems. As a result, if an implementation based on a fixed linearization is not suitable, the nonlinear problems are typically locally approximated online, in order to leverage the speed and robustness of embedded convex quadratic programming (QP) solver technology developed during the last decade. The purpose of this paper is to demonstrate that, using simple standard building blocks from nonlinear programming (NLP), combined with a structure-exploiting linear system solver, it is possible to achieve computation times in the range typical of solvers for QPs, while retaining nonlinearities and solving the NLP to local optimality. The implemented algorithm is an interior-point method with approximate Hessians and adaptive barrier rules, and is provided as an extension to the C code generator FORCES. Three detailed examples are provided that illustrate a significant improvement in control performance when solving NLPs, with computation times that are comparable with those achieved by fast approximate schemes and up to an order of magnitude faster than the state-of-the-art interior-point solver IPOPT.
Article
Potential games are noncooperative games for which there exist auxiliary functions, called potentials, such that the maximizers of the potential are also Nash equilibria of the corresponding game. Some properties of Nash equilibria, such as existence or stability, can be derived from the potential, whenever it exists. We survey different classes of potential games in the static and dynamic cases, with a finite number of players, as well as in population games where a continuum of players is allowed. Likewise, theoretical concepts and applications are discussed by means of illustrative examples.
Chapter
Multiobjective optimization (also known as multiobjective programming, vector optimization, multicriteria optimization, multiattribute optimization, or Pareto optimization) is an area of multiple-criteria decision-making, concerning mathematical optimization problems involving more than one objective functions to be optimized simultaneously. Multiobjective optimization has been applied to many fields of science, including engineering, where optimal decisions need to be taken in the presence of trade-offs between two or more objectives that may be in conflict. Indeed, in many practical engineering applications, designers are making decisions between conflict objectives—for example, maximizing performance while minimizing fuel consumption and emission of pollutants of a vehicle. In these cases, a multiobjective optimization study should be performed, which provides multiple solutions representing the trade-offs among the objective functions.
Conference Paper
Merging is one of the important issues in studying roadway traffic. Merging disturbs the mainline of traffic, which reduces the efficiency or capacity of the highway system. In this paper, we have considered the application of a Stackelberg game theory to a driver behavior model in a merging situation. In this model, the so-called payoffs that reflect the drivers’ aggressiveness affect the decision to proceed to merge and whether to accelerate or decelerate in the game theoretic framework. These merging behaviors in turn impact the mainline traffic, which may lead to a variety of influences, such as collisions or reduced roadway throughput. Consequently, this impact depends on the level of aggressiveness of the driver merging in and those in the mainline, which results in both longitudinal and lateral disturbances in the mainline due to their interaction.
Article
This paper develops an optimization-based theory for the existence and uniqueness of equilibria of a noncooperative game wherein the selfish players' optimization problems are nonconvex and there are side constraints and an associated price clearance to be satisfied by the equilibria. A new concept of equilibrium for such a nonconvex game, which we term a "quasi-Nash equilibrium" (QNE), is introduced as a solution of the variational inequality (VI) obtained by aggregating the first-order optimality conditions of the players' problems while retaining the convex constraints (if any) in the defining set of the VI. Under a second-order sufficiency condition from nonlinear programming, a QNE becomes a local Nash equilibrium of the game. Uniqueness of a QNE is established using a degree-theoretic proof. Under a key boundedness property of the Karush-Kuhn-Tucker multipliers of the nonconvex constraints and the positive definiteness of the Hessians of the players' Lagrangian functions, we establish the single-valuedness of the players' best-response maps, from which the existence of a Nash equilibrium (NE) of the nonconvex game follows. We also present a distributed algorithm for computing an NE of such a game and provide a matrix-theoretic condition for the convergence of the algorithm. An application is presented that pertains to a special multi-leader-follower game wherein the nonconvexity is due to the followers' equilibrium conditions in the leaders' optimization problems. Another application to a cognitive radio paradigm in a signal processing game is described in detail in [G. Scutari and J.S. Pang, IEEE Trans. Inform. Theory, submitted; J.S. Pang and G. Scutari, Joint IEEE Trans. Signal Process, submitted].
Article
We consider the problem of automatic generation of control strategies for robotic vehicles given a set of high-level mission specifications, such as "Vehicle x must eventually visit a target region and then return to a base," "Regions A and B must be periodically surveyed," or "None of the vehicles can enter an unsafe region." We focus on instances when all of the given specifications cannot be reached simultaneously due to their incompatibility and/or environmental constraints. We aim to find the least-violating control strategy while considering different priorities of satisfying different parts of the mission. Formally, we consider the missions given in the form of linear temporal logic formulas, each of which is assigned a reward that is earned when the formula is satisfied. Leveraging ideas from the automata-based model checking, we propose an algorithm for finding an optimal control strategy that maximizes the sum of rewards earned if this control strategy is applied. We demonstrate the proposed algorithm on an illustrative case study.
Article
We analyze some new decomposition schemes for the solution of generalized Nash equilibrium problems. We prove convergence for a particular class of generalized potential games that includes some interesting engineering problems. We show that some versions of our algorithms can also deal with problems lacking any convexity, and we consider separately the case of two players, for which stronger results can be obtained. Keywords: Generalized Nash equilibrium problem; generalized potential game; decomposition; regularization.
Article
Some examples are given of differentiable functions of three variables, having the property that if they are treated by the minimization algorithm that searches along the coordinate directions in sequence, then the search path tends to a closed loop. On this loop the gradient of the objective function is bounded away from zero. We discuss the relevance of these examples to the problem of proving general convergence theorems for minimization algorithms that use search directions.
Article
The Generalized Nash equilibrium problem is an important model that has its roots in the economic sciences but is being fruitfully used in many different fields. In this survey paper we aim at discussing its main properties and solution algorithms, pointing out what could be useful topics for future research in the field.
Article
…provided by relating the similarities between various extensions to the basic problem within a common mathematical framework. Modeling, analysis, algorithms, and computed examples are presented for each of three problems: (1) motion planning under uncertainty in sensing and control; (2) motion planning under environment uncertainties; and (3) multiple-robot motion planning. Traditional approaches to the first problem are often based on a methodology known as preimage planning, which involves worst-case analysis. In this context, a general method for determining feedback strategies is developed by blending ideas from stochastic optimal control and dynamic game theory with traditional preimage planning concepts. This generalizes classical preimages to performance preimages and preimage plans to motion strategies with information feedback. For the second problem, robot strategies are analyzed and determined for situations in which the environment…