Zhe Wang’s research while affiliated with Google Inc. and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (12)


A bird’s eye overview of TacticAI
A How corner kick situations are converted to a graph representation. Each player is treated as a node in a graph, with node, edge and graph features extracted as detailed in the main text. Then, a graph neural network operates over this graph by performing message passing; each node’s representation is updated using the messages sent to it from its neighbouring nodes. B How TacticAI processes a given corner kick. To ensure that TacticAI’s answers are robust in the face of horizontal or vertical reflections, all possible combinations of reflections are applied to the input corner, and these four views are then fed to the core TacticAI model, where they are able to interact with each other to compute the final player representations—each internal blue arrow corresponds to a single message passing layer from (A). Once player representations are computed, they can be used to predict the corner’s receiver, whether a shot has been taken, as well as assistive adjustments to player positions and velocities, which increase or decrease the probability of a shot being taken.
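
The caption above describes two mechanisms: mean-aggregation message passing over a fully connected player graph, and robustness to reflections of the pitch. Below is a minimal, illustrative Python sketch of both, in which the four reflected views are simply averaged after processing; in TacticAI itself the views interact inside the model, and all names and sizes here are invented.

    import numpy as np

    def message_passing_layer(H, W_msg, W_upd):
        # H: (num_players, d) node features over a fully connected graph.
        # Each node aggregates the mean of the messages from all other nodes.
        M = H @ W_msg
        agg = (M.sum(0, keepdims=True) - M) / (H.shape[0] - 1)
        return np.tanh(np.concatenate([H, agg], axis=1) @ W_upd)

    def reflect(X, flip_x, flip_y):
        # First four feature dims assumed to be (px, py, vx, vy);
        # a reflection flips the corresponding coordinates.
        X = X.copy()
        if flip_x:
            X[:, [0, 2]] *= -1.0
        if flip_y:
            X[:, [1, 3]] *= -1.0
        return X

    rng = np.random.default_rng(0)
    d = 8
    X = rng.normal(size=(22, d))              # 22 players, toy features
    W_msg = rng.normal(size=(d, d)) * 0.1
    W_upd = rng.normal(size=(2 * d, d)) * 0.1

    # Identity, horizontal, vertical and double reflections, processed by
    # the same weights and averaged to approximate reflection-robustness.
    views = [message_passing_layer(reflect(X, fx, fy), W_msg, W_upd)
             for fx in (False, True) for fy in (False, True)]
    H = np.mean(views, axis=0)                # final player representations
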
Corner kicks represented in the latent space shaped by TacticAI
We visualise the latent representations of attacking and defending teams in 1024 corner kicks using t-SNE. The latent team embedding for a corner kick sample is the mean of the latent representations of the players on the same attacking (A–C) or defending (D) team. Given the reference corner kick sample (A), we retrieve another corner kick sample (B) whose latent representation is closest to that of (A). We observe that (A) and (B) are both out-swing corner kicks and share similar attacking patterns, highlighted with same-coloured rectangles, although they differ in the absolute positions and velocities of the players. Meanwhile, the latent representation of an in-swing attack (C) is distant from both (A) and (B) in the latent space. The red arrows only illustrate the difference between in- and out-swing corner kicks; they are not the actual ball trajectories.
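
The retrieval step in this caption reduces to mean-pooling player latents into a team embedding and finding the nearest neighbour in the latent space. A minimal sketch, assuming the latents are plain NumPy arrays (names illustrative):

    import numpy as np

    def team_embedding(player_latents):
        # player_latents: (num_players_on_team, d) latent representations.
        return player_latents.mean(axis=0)

    def retrieve_nearest(query, corpus):
        # corpus: (num_corners, d) team embeddings for a bank of corners.
        dists = np.linalg.norm(corpus - query, axis=1)
        return int(np.argmin(dists)), float(dists.min())

    rng = np.random.default_rng(1)
    bank = rng.normal(size=(1024, 32))        # toy embeddings, 1024 corners
    query = team_embedding(rng.normal(size=(10, 32)))
    idx, dist = retrieve_nearest(query, bank)
    print(f"closest corner: {idx} (distance {dist:.3f})")
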
Example of refining a corner kick tactic with TacticAI
TacticAI makes it possible for human coaches to redesign corner kick tactics in ways that help maximise the probability of a positive outcome for either the attacking or the defending team, by identifying key players and by providing temporally coordinated tactic recommendations that take all players into consideration. As demonstrated in the present example (A), for a corner kick in which there was a shot attempt in reality (B), TacticAI can generate a tactically adjusted setting in which the shot probability has been reduced by adjusting the positioning of the defenders (D). The suggested defender positions result in reduced receiver probability for attacking players 2–5 (see bottom row), while the receiver probability of Attacker 1, who is distant from the goalpost, has been increased (C). The model is capable of generating multiple such scenarios. Coaches can inspect the different options visually and additionally consult TacticAI’s quantitative analysis of the presented tactics.
Statistical analysis for the case study tasks
In task 1, we tested for statistical differences between the real corner kick samples and the synthetic ones generated by TacticAI from two aspects: (A.1) the distributions of their assigned ratings, and (A.2) the corresponding histograms of the rating values. Analogously, in task 2 (receiver prediction), (B.1) we track the distributions of the top-3 accuracy of receiver prediction using those samples, and (B.2) the corresponding histogram of the mean rating per sample. No statistically significant difference in the mean was observed in either case (A.1: z = −0.34, p > 0.05; B.1: z = 0.97, p > 0.05). Additionally, we observed a statistically significant difference between the ratings of different raters on receiver prediction, with three clear clusters emerging (C). Specifically, Raters A and E had similar ratings (z = 0.66, p > 0.05), and Raters B and D also rated in similar ways (z = −1.84, p > 0.05), while Rater C responded differently from all other raters. This suggests a good level of variety among the human raters with respect to their perceptions of corner kicks. In task 3 (identifying similar corners retrieved in terms of salient strategic setups), there were no significant differences among the distributions of the ratings by different raters (D), suggesting a high level of agreement on the usefulness of TacticAI’s capability of retrieving similar corners (F(1,4) = 1.01, p > 0.1). Finally, in task 4, we compared the ratings of TacticAI’s strategic refinements across the human raters (E) and found that the raters also agreed on the general effectiveness of the refinements recommended by TacticAI (F(1,4) = 0.45, p > 0.05). Note that the violin plots used in B.1 and C–E model a continuous probability distribution and hence assign nonzero probabilities to values outside of the allowed ranges; we only label y-axis ticks for the possible set of ratings.
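
For readers who want to run this style of analysis on their own rating data, the sketch below computes a two-sample z-statistic for a difference in mean ratings and a one-way ANOVA F-statistic across raters using SciPy. The paper’s exact statistical procedures may differ, and the data here are synthetic:

    import numpy as np
    from scipy import stats

    def two_sample_z(a, b):
        # Large-sample z-test for a difference in means (illustrative).
        z = (a.mean() - b.mean()) / np.sqrt(
            a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        return z, 2 * stats.norm.sf(abs(z))

    rng = np.random.default_rng(2)
    real = rng.integers(1, 6, size=50).astype(float)    # toy 1-5 ratings
    synth = rng.integers(1, 6, size=50).astype(float)
    z, p = two_sample_z(real, synth)
    print(f"z = {z:.2f}, p = {p:.3f}")

    # One-way ANOVA across five raters, as in panels D and E.
    raters = [rng.integers(1, 6, size=20) for _ in range(5)]
    F, p = stats.f_oneway(*raters)
    print(f"F = {F:.2f}, p = {p:.3f}")
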
Examples of the tactic refinements recommended by TacticAI
These examples are selected from our case study with human experts, to illustrate the breadth of tactical adjustments that TacticAI suggests to teams defending a corner. The density of the yellow circles corresponds to the number of times that the corresponding change was recognised as constructive by human experts. Instead of optimising the movement of one specific player, TacticAI can recommend improvements for multiple players in one generation step, by suggesting better positions to block the opposing players, or better orientations to track them more efficiently. Some specific comments from expert raters follow. In A, according to raters, TacticAI suggests more favourable positions for several defenders, and improved tracking runs for several others; further, the goalkeeper is positioned more deeply, which is also beneficial. In B, TacticAI suggests that the defenders furthest away from the corner make improved covering runs, which was unanimously deemed useful, with several other defenders also positioned more favourably. In C, TacticAI recommends improved covering runs for a central group of defenders in the penalty box, which our raters unanimously considered salient. And in D, TacticAI suggests substantially better tracking runs for two central defenders, along with better positioning for two other defenders in the goal area.
TacticAI: an AI assistant for football tactics
  • Article
  • Full-text available

March 2024

·

2,396 Reads

·

50 Citations

Zhe Wang

·

Petar Veličković

·

[...]

Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lie at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing corner kicks, as they offer coaches the most direct opportunities for interventions and improvements. TacticAI incorporates both a predictive and a generative component, allowing coaches to effectively sample and explore alternative player setups for each corner kick routine and to select those with the highest predicted likelihood of success. We validate TacticAI on a number of relevant benchmark tasks: predicting receivers and shot attempts, and recommending player position adjustments. The utility of TacticAI is further validated by a qualitative study conducted with football domain experts at Liverpool FC. We show that TacticAI’s model suggestions are not only indistinguishable from real tactics, but also favoured over existing tactics 90% of the time, and that TacticAI offers an effective corner kick retrieval system. TacticAI achieves these results despite the limited availability of gold-standard data, achieving data efficiency through geometric deep learning.
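
The interplay between the generative and predictive components described in this abstract amounts to a sample-and-select loop. A minimal sketch with stand-in models; generate_adjustment and shot_probability are hypothetical placeholders, not TacticAI’s actual interfaces:

    import numpy as np

    rng = np.random.default_rng(3)

    def generate_adjustment(setup):
        # Stand-in for the generative model: jitter positions/velocities.
        return setup + rng.normal(scale=0.05, size=setup.shape)

    def shot_probability(setup):
        # Stand-in for the learned shot predictor; a toy score here.
        return 1.0 / (1.0 + np.exp(-setup.sum() * 0.01))

    setup = rng.normal(size=(22, 4))          # (players, [px, py, vx, vy])
    candidates = [generate_adjustment(setup) for _ in range(16)]
    # A defending coach keeps the candidate with the LOWEST predicted shot
    # probability; an attacking coach would take the max instead.
    best = min(candidates, key=shot_probability)
    print(f"baseline {shot_probability(setup):.3f} -> best {shot_probability(best):.3f}")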


Mastering the game of Stratego with model-free multiagent reinforcement learning

December 2022

·

289 Reads

·

148 Citations

Science

We introduce DeepNash, an autonomous agent that plays the imperfect information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego through self-play from scratch. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a year-to-date (2022) and all-time top-three ranking on the Gravon games platform, competing with human expert players.


Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

September 2022

·

121 Reads

The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.


Developing, evaluating and scaling learning agents in multi-agent environments

September 2022

·

79 Reads

·

1 Citation

AI Communications

The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.


From motor control to team play in simulated humanoid football

August 2022

·

83 Reads

·

93 Citations

Science Robotics

Learning to combine control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has demonstrated the potential of learning-based approaches applied to the respective problems of complex movement, long-term planning, and multiagent coordination. However, their integration traditionally required the design and optimization of independent subsystems and remains challenging. In this work, we tackled the integration of motor control and long-horizon decision-making in the context of simulated humanoid football, which requires agile motor control and multiagent coordination. We optimized teams of agents to play simulated football via reinforcement learning, constraining the solution space to that of plausible movements learned using human motion capture data. They were trained to maximize several environment rewards and to imitate pretrained football-specific skills if doing so led to improved performance. The result is a team of coordinated humanoid football players that exhibit complex behavior at different scales, quantified by a range of analyses and statistics, including those used in real-world sport analytics. Our work constitutes a complete demonstration of learned integrated decision-making at multiple scales in a multiagent setting.


Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

June 2022

·

150 Reads

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of 10^535 nodes, i.e., 10^175 times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of 10^164 nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
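
The "converging instead of cycling" behaviour of R-NaD can be illustrated in a single-state game. The toy below runs a replicator-style update on rock-paper-scissors with each action's payoff penalised by eta * log(pi / pi_reg), periodically resetting the regularisation policy to the current iterate; this mirrors the reward-transformation idea at the heart of R-NaD, though DeepNash applies it with deep reinforcement learning over the full game, and all constants here are illustrative:

    import numpy as np

    # Rock-paper-scissors payoff for the row player (zero-sum).
    A = np.array([[0., -1., 1.],
                  [1., 0., -1.],
                  [-1., 1., 0.]])

    def step(p, q, p_reg, q_reg, eta, lr):
        # Payoffs regularised by eta * log(pi / pi_reg), then a
        # multiplicative-weights (replicator-style) update.
        up = A @ q - eta * np.log(p / p_reg)
        uq = -A.T @ p - eta * np.log(q / q_reg)
        p = p * np.exp(lr * (up - p @ up)); p /= p.sum()
        q = q * np.exp(lr * (uq - q @ uq)); q /= q.sum()
        return p, q

    p = q = np.array([0.6, 0.3, 0.1])
    p_reg, q_reg = p.copy(), q.copy()
    for it in range(20000):
        p, q = step(p, q, p_reg, q_reg, eta=0.2, lr=0.05)
        if it % 2000 == 1999:           # periodically reset the
            p_reg, q_reg = p.copy(), q.copy()  # regularisation policy
    print(p, q)  # both approach the uniform Nash equilibrium (1/3, 1/3, 1/3)

Without the log penalty (eta = 0), the same update cycles around the equilibrium indefinitely rather than approaching it.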


Stylized visualization of the multiagent time-series imputation setting. (a) Agent trajectories up to and including time t. Dark blue indicates trajectory portions that are observed (with light indicating otherwise); the camera field of view at the current time t is indicated in grey. (b) Visualization of the masks m for all timesteps, where m_t^i = 1 where dark, and m_t^i = 0 where light. The mask at time t, which corresponds to the frame shown in (a), is highlighted in grey.
Graph Imputer model. Our model imputes missing information at each timestep using a combination of bidirectional LSTMs and graph networks. An exposition of a forward-direction update (corresponding to directionalupdate in Algorithm 1 in the “Methods” section) is provided in the left portion of the figure. Dark blue boxes indicate trajectory segments that are observed for each agent (with light blue indicating otherwise). In each direction, agent-specific temporal context is updated via LSTMs with shared parameters. All agents’ forward LSTM hidden states, h_{t-1}, are subsequently used as node features in variational graph networks to ensure information-sharing across agents. This enables learning of a distribution over agent state deviations, Δx_t. The process is likewise repeated in the backward direction (right portion of the figure), with the directional updates fused to produce an imputed estimate x̂_t at each time t. The dotted line indicates that the GraphNet encoder is used only at training time, with the GraphNet prior being used for the final evaluations conducted at test time.
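
A minimal sketch of one forward-direction update of this architecture, assuming PyTorch. The simple mean aggregation stands in for the variational graph network, and sizes are illustrative; the backward pass, the fusion step, and the train-time encoder vs. test-time prior split are omitted:

    import torch
    import torch.nn as nn

    class ForwardImputerStep(nn.Module):
        # Shared per-agent LSTM, message passing over all agents' hidden
        # states, then a Gaussian over per-agent state deltas.
        def __init__(self, state_dim=4, hidden=32):
            super().__init__()
            self.cell = nn.LSTMCell(state_dim, hidden)   # shared across agents
            self.msg = nn.Linear(hidden, hidden)
            self.head = nn.Linear(2 * hidden, 2 * state_dim)  # mean, log-var

        def forward(self, x, hc):
            h, c = self.cell(x, hc)                      # agents as batch dim
            m = self.msg(h)
            agg = (m.sum(0, keepdim=True) - m) / (h.shape[0] - 1)
            mu, logvar = self.head(torch.cat([h, agg], -1)).chunk(2, -1)
            dx = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            return x + dx, (h, c)

    agents, state_dim, hidden = 22, 4, 32
    step = ForwardImputerStep(state_dim, hidden)
    x = torch.randn(agents, state_dim)
    hc = (torch.zeros(agents, hidden), torch.zeros(agents, hidden))
    for _ in range(5):                                   # roll forward in time
        x, hc = step(x, hc)
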
Trajectory visualizations (best viewed when zoomed in). Each column provides an example trajectory sequence, with the first row illustrating the ground truth, and subsequent rows showing results from various models, including the Graph Imputer (ours). For all examples, the Graph Imputer trajectories seamlessly adhere to the boundary value constraints imposed at the moments of disappearance and reappearance of players.
Pitch control error visualizations. The first column shows the ground truth pitch control field, player positions, and the camera field of view. Each remaining column provides a visualization of the absolute error between pitch control fields based on predicted model outputs and ground truth.
Predicted vs. ground truth pitch control mean absolute error (MAE) across models, under partially observable settings. Means and standard deviations are reported over all trajectories in our validation dataset. The Graph Imputer model yields the lowest pitch control error across all baselines. Note that the Bidir. Role-invariant VRNN model, which comes closest to our Graph Imputer model in terms of performance, was handcrafted by us specifically for the football domain.
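
Pitch control itself is straightforward to approximate. The sketch below computes a simple time-to-reach pitch control field, one common formulation rather than necessarily the exact model behind these numbers, and notes where the MAE comparison would plug in:

    import numpy as np

    def pitch_control(home_xy, away_xy, speed=5.0, beta=1.0, grid=(105, 68)):
        # P(home controls cell) from a logistic in the difference of the
        # two teams' fastest arrival times to each cell.
        xs, ys = np.meshgrid(np.arange(grid[0]), np.arange(grid[1]))
        cells = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)

        def min_time(players):
            d = np.linalg.norm(cells[:, None, :] - players[None, :, :], axis=2)
            return (d / speed).min(axis=1)

        dt = min_time(away_xy) - min_time(home_xy)   # > 0: home arrives first
        return (1.0 / (1.0 + np.exp(-beta * dt))).reshape(grid[1], grid[0])

    rng = np.random.default_rng(4)
    home = rng.uniform([0, 0], [105, 68], size=(11, 2))
    away = rng.uniform([0, 0], [105, 68], size=(11, 2))
    field = pitch_control(home, away)    # (68, 105) control probabilities
    # MAE against a field computed from fully observed trajectories:
    # np.abs(field_from_imputed - field_from_ground_truth).mean()
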
Multiagent off-screen behavior prediction in football

May 2022

·

749 Reads

·

25 Citations

In multiagent worlds, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents’ dynamic behaviors, make such systems complex and interesting to study from a decision-making perspective. Significant research has been conducted on learning models for forward-direction estimation of agent behaviors, for example, pedestrian predictions used for collision-avoidance in self-driving cars. In many settings, only sporadic observations of agents may be available in a given trajectory sequence. In football, subsets of players may come in and out of view of broadcast video footage, while unobserved players continue to interact off-screen. In this paper, we study the problem of multiagent time-series imputation in the context of human football play, where available past and future observations of subsets of agents are used to estimate missing observations for other agents. Our approach, called the Graph Imputer, uses past and future information in combination with graph networks and variational autoencoders to enable learning of a distribution of imputed trajectories. We demonstrate our approach on multiagent settings involving players that are partially observable, using the Graph Imputer to predict the behaviors of off-screen players. To quantitatively evaluate the approach, we conduct experiments on football matches with ground truth trajectory data, using a camera module to simulate the off-screen player state estimation setting. We subsequently use our approach for downstream football analytics under partial observability using the well-established framework of pitch control, which traditionally relies on fully observed data. We illustrate that our method outperforms several state-of-the-art approaches, including those hand-crafted for football, across all considered metrics.
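
The camera module mentioned above determines which players are observed at each timestep. A deliberately simplified sketch that produces the binary observation masks, using an axis-aligned field of view rather than the projective camera model used in the paper:

    import numpy as np

    def fov_mask(positions, center, half_w=20.0, half_h=15.0):
        # positions: (T, N, 2) trajectories; center: (T, 2) camera centre.
        # Returns m with m[t, i] = 1 iff player i is inside the (simplified,
        # axis-aligned) field of view at time t.
        off = np.abs(positions - center[:, None, :])
        return ((off[..., 0] <= half_w) & (off[..., 1] <= half_h)).astype(int)

    rng = np.random.default_rng(5)
    T, N = 100, 22
    traj = rng.uniform([0, 0], [105, 68], size=(T, N, 2))
    cam = np.column_stack([np.linspace(20, 85, T), np.full(T, 34.0)])
    m = fov_mask(traj, cam)   # (T, N) masks, as visualised in the figure above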


Time-series Imputation of Temporally-occluded Multiagent Trajectories

June 2021

·

94 Reads

In multiagent environments, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents' decision-making processes, make such systems complex and interesting to study from a dynamical perspective. Significant research has been conducted on learning models for forward-direction estimation of agent behaviors, for example, pedestrian predictions used for collision-avoidance in self-driving cars. However, in many settings, only sporadic observations of agents may be available in a given trajectory sequence. For instance, in football, subsets of players may come in and out of view of broadcast video footage, while unobserved players continue to interact off-screen. In this paper, we study the problem of multiagent time-series imputation, where available past and future observations of subsets of agents are used to estimate missing observations for other agents. Our approach, called the Graph Imputer, uses forward- and backward-direction information in combination with graph networks and variational autoencoders to enable learning of a distribution of imputed trajectories. We evaluate our approach on a dataset of football matches, using a projective camera module to train and evaluate our model for the off-screen player state estimation setting. We illustrate that our method outperforms several state-of-the-art approaches, including those hand-crafted for football.


From Motor Control to Team Play in Simulated Humanoid Football

May 2021

·

182 Reads

·

1 Citation

Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement, longer-term planning and multi-agent coordination. However, there is limited research aimed at their integration. We study this problem by training teams of physically simulated humanoid avatars to play football in a realistic virtual environment. We develop a method that combines imitation learning, single- and multi-agent reinforcement learning and population-based training, and makes use of transferable representations of behaviour for decision making at different levels of abstraction. In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds. We investigate the emergence of behaviours at different levels of abstraction, as well as the representations that underlie these behaviours using several analysis techniques, including statistics from real-world sports analytics. Our work constitutes a complete demonstration of integrated decision-making at multiple scales in a physically embodied multi-agent setting. See project video at https://youtu.be/KHMwq9pv7mg.
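
One common way to realise "imitate pretrained football-specific skills if doing so leads to improved performance" is a KL penalty toward a skill prior inside the policy objective. The sketch below is a toy discrete-action version of such a combined loss, an assumption-laden illustration rather than the paper's exact objective:

    import numpy as np

    def policy_loss(logits, prior_logits, actions, advantages, kl_weight=0.1):
        # Policy-gradient term plus a KL penalty toward a pretrained prior.
        logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
        logq = prior_logits - np.log(np.exp(prior_logits).sum(-1, keepdims=True))
        pg = -(advantages * logp[np.arange(len(actions)), actions]).mean()
        kl = (np.exp(logp) * (logp - logq)).sum(-1).mean()
        return pg + kl_weight * kl

    rng = np.random.default_rng(6)
    B, A = 64, 8                      # batch of decisions, discrete actions
    loss = policy_loss(rng.normal(size=(B, A)), rng.normal(size=(B, A)),
                       rng.integers(0, A, size=B), rng.normal(size=B))
    print(f"combined loss: {loss:.3f}")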


Game Plan: What AI can do for Football, and What Football can do for AI

May 2021

·

599 Reads

·

105 Citations

Journal of Artificial Intelligence Research

The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players’ and coordinated teams’ behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).
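
The game-theoretic penalty-kick analysis mentioned here reduces to solving a small zero-sum game. A minimal sketch that finds the kicker's mixed Nash equilibrium by linear programming; the scoring probabilities are invented for illustration, not estimates from the paper:

    import numpy as np
    from scipy.optimize import linprog

    # Rows: kicker aims (left, right). Columns: keeper dives (left, right).
    # Entries: illustrative scoring probabilities.
    A = np.array([[0.55, 0.95],
                  [0.90, 0.60]])

    def solve_zero_sum(A):
        # Max-min LP for the row player: maximise v s.t. (A^T x)_j >= v
        # for every column j, with x a probability vector.
        n, m = A.shape
        c = np.zeros(n + 1); c[-1] = -1.0          # variables [x_1..x_n, v]
        A_ub = np.hstack([-A.T, np.ones((m, 1))])  # v - (A^T x)_j <= 0
        b_ub = np.zeros(m)
        A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, 1)] * n + [(None, None)])
        return res.x[:n], res.x[-1]

    mix, value = solve_zero_sum(A)
    print(f"kicker mixes {mix}, expected scoring probability {value:.3f}")

With these toy numbers the kicker should aim left about 43% of the time, for an expected scoring probability of 0.75.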


Citations (7)


... Commentary generation systems, evaluated by benchmarks like SCBench and CommentarySet dataset, provide contextually relevant narratives [16]. Tactical analysis systems utilize player tracking and state reconstruction to understand formations and strategies [34,44], while sports health monitoring assesses athlete performance and workload through video analysis [12,19]. Intelligent refereeing systems, such as those developed using the SoccerNet-XFoul dataset [21], integrate multimodal data to support complex decision-making processes. ...

Reference:

SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding
TacticAI: an AI assistant for football tactics

... The convergence of online learning algorithms in games under self-play is a fundamental question in both game theory and machine learning. Self-play methods for computing Nash equilibria have enabled the development of superhuman AI agents in competitive games such as Go [Sil+17], Poker [Bow+15; BS18; BS19], Stratego [Per+22], and Diplomacy [FAI+22]. More recently, self-play learning algorithms have also been applied to large language model (LLM) alignment with human feedback, which can be modeled as a two-player zero-sum game [Mun+23; Swa+24; Ye+24; Wu+24; Liu+24; Liu+25]. ...

Mastering the game of Stratego with model-free multiagent reinforcement learning
  • Citing Article
  • December 2022

Science

... • DeepMind [7] • Five AI [9] • Heriot-Watt University [13] • King's College London [2] • Teesside University [8] • University of Aberdeen [3] • University of Edinburgh [1] • University of Essex [11] • University of Lancaster [4] C ...

Developing, evaluating and scaling learning agents in multi-agent environments

AI Communications

... Robot soccer serves as a high-profile benchmark for such settings, combining real-time control, decentralized decision making, and long-horizon strategy. Yet, despite growing interest, most prior work either adopts rule-based pipelines [9], focuses on simplified 1v1 games [10], [11], or remains confined to simulation [12], [13]. Deploying a fully learning-based, competitive and cooperative soccer system on real legged robots remains a challenge. ...

From motor control to team play in simulated humanoid football
  • Citing Article
  • August 2022

Science Robotics

... The behaviours of agents (players and the ball) in soccer form a rich and important testbed for the study of multiagent adversarial systems (Yeh et al. 2019; Tuyls et al. 2021; Omidshafiei et al. 2022; Wang et al. 2024). In this paper, we model the fine-grained spatiotemporal behaviours of agents in professional soccer games. ...

Multiagent off-screen behavior prediction in football

... Robots also struggle to generalize or adapt to other environments or tasks. To alleviate this problem to a certain extent, there have been recent DRL studies based on motion priors [86][87][88][89][90], which have been successfully applied to quadrupedal locomotion tasks [12,56,91]. However, the variety of motion priors in these studies is insufficient, and the robot's behavior is not agile and natural. ...

From Motor Control to Team Play in Simulated Humanoid Football
  • Citing Preprint
  • May 2021

... However, the practical operationalization of all the dimensions that influence tactical behavior and patterns has led to the development of complex and time-consuming methodologies, highly dependent on experience and susceptible to human error (3,5). Although the automation of information collection systems such as tracking systems based on global position system (GPS) or Global Navigation Satellite Systems, local position measure (LPM), or video-based motion analysis (VBMA) has been already a widespread procedure in technical teams (6)(7)(8), the quantification of this information, the visualization datasets, and the dynamics of the work teams have undergone some transformation in recent years (9,10). ...

Game Plan: What AI can do for Football, and What Football can do for AI
  • Citing Article
  • May 2021

Journal of Artificial Intelligence Research