Guy Lever’s research while affiliated with University College London and other places


Publications (36)


Learning agile soccer skills for a bipedal robot with deep reinforcement learning
  • Article

April 2024 · 129 Reads · 125 Citations

Science Robotics

Tuomas Haarnoja · Ben Moran · Guy Lever · [...] · Nicolas Heess

We investigated whether deep reinforcement learning (deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies. We used deep RL to train a humanoid robot to play a simplified one-versus-one soccer game. The resulting agent exhibits robust and dynamic movement skills, such as rapid fall recovery, walking, turning, and kicking, and it transitions between them in a smooth and efficient manner. It also learned to anticipate ball movements and block opponent shots. The agent’s tactical behavior adapts to specific game contexts in a way that would be impractical to manually design. Our agent was trained in simulation and transferred to real robots zero-shot. A combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training enabled good-quality transfer. In experiments, the agent walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline.


Figure 4 | The joint key poses used to train the get-up teacher, extracted from a scripted get-up controller (Robotis, 2023).
Figure 6 | Embedding of the joint angles recorded while executing either the scripted baseline walking policy (left) or the learned 1v1 policy (right).
Figure 7 | Top row: Example frames of initializations for the set-piece tasks in sim and on the real robot. Bottom row: Overlaid plots of the trajectories collected from the set-piece experiment. The plots show the robot trajectory before kicking (solid red lines) and after kicking (dotted red line), the ball trajectory (dashed red line), and the final ball position (white circle) and opponent position (blue circle). Behaviors in the set pieces in simulation were reliable and consistent: the robot and ball trajectories clearly overlap and the agent scores nearly every time. In contrast, the behaviors in the real environment were somewhat less consistent: the ball trajectories vary more and the robot misses the goal slightly more often.
Figure 12 | Overlaid plots of the trajectories collected from the simulated vision set-piece experiment. We refer the reader to our supplementary website for videos of experiments on real robots. Robot trajectory before kick (solid red lines), after kick (dotted red line) and corresponding ball trajectory (dashed red line), ball (white circle) and opponent (blue circle). (Left) Analysis of penalty kick from fixed starting positions of ball and robots. The robot scored in all trials. (Right) A single trajectory showing adaptive movement behavior when the ball is initialized to be rolling along the negative y-axis.
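
The abstract above credits zero-shot sim-to-real transfer to sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training. Below is a minimal sketch of that general idea; the parameter names, ranges, and the `apply_physics` / `apply_push` environment hooks are hypothetical illustrations, not the paper's implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class PhysicsParams:
    """Per-episode physics parameters sampled for dynamics randomization."""
    friction: float
    joint_damping_scale: float
    torso_mass_scale: float
    control_latency_s: float

def sample_physics_params() -> PhysicsParams:
    # Ranges are illustrative placeholders, not the paper's values.
    return PhysicsParams(
        friction=random.uniform(0.5, 1.2),
        joint_damping_scale=random.uniform(0.8, 1.2),
        torso_mass_scale=random.uniform(0.9, 1.1),
        control_latency_s=random.uniform(0.0, 0.03),
    )

def run_episode(env, policy, push_prob: float = 0.02):
    """One training episode with randomized dynamics and occasional random pushes."""
    env.apply_physics(sample_physics_params())   # hypothetical simulator hook
    obs, done, ret = env.reset(), False, 0.0
    while not done:
        if random.random() < push_prob:
            env.apply_push(force=random.uniform(5.0, 15.0))  # hypothetical perturbation hook
        obs, reward, done, _ = env.step(policy(obs))
        ret += reward
    return ret
```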
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
  • Preprint
  • File available

April 2023 · 725 Reads

We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting. The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and transitions between them in a smooth, stable, and efficient manner - well beyond what is intuitively expected from the robot. The agents also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. The full range of behaviors emerged from a small set of simple rewards. Our agents were trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer, despite significant unmodeled effects and variations across robot instances. Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way. Indeed, even though the agents were optimized for scoring, in experiments they walked 156% faster, took 63% less time to get up, and kicked 24% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives. Examples of the emergent behaviors and full 1v1 matches are available on the supplementary website.
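
The preprint describes training individual skills (such as getting up) in isolation and then composing them into a single 1v1 policy trained with self-play. A rough sketch of one way to express that composition step is shown below: the 1v1 policy is regularized toward a pretrained get-up teacher only in fallen states. The loss form, the weight, and the discrete-action simplification are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def composition_loss(rl_loss, student_logits, teacher_logits, fallen_mask, kl_weight=0.1):
    """RL objective plus a distillation term toward a get-up teacher in fallen states.

    Actions are treated as categorical here for brevity; the real controller
    outputs continuous joint targets.
    """
    # KL(teacher || student), computed per sample over the action dimension.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="none",
    ).sum(-1)
    # Only apply the teacher term in states where that skill is relevant.
    distill = (kl * fallen_mask.float()).mean()
    return rl_loss + kl_weight * distill
```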


Generative Adversarial Equilibrium Solvers

February 2023 · 23 Reads

We introduce the use of generative adversarial learning to compute equilibria in general game-theoretic settings, specifically the generalized Nash equilibrium (GNE) in pseudo-games, and its specific instantiation as the competitive equilibrium (CE) in Arrow-Debreu competitive economies. Pseudo-games are a generalization of games in which players' actions affect not only the payoffs of other players but also their feasible action spaces. Although the computation of GNE and CE is intractable in the worst case, i.e., PPAD-hard, in practice, many applications only require solutions with high accuracy in expectation over a distribution of problem instances. We introduce Generative Adversarial Equilibrium Solvers (GAES): a family of generative adversarial neural networks that can learn GNE and CE from only a sample of problem instances. We provide computational and sample complexity bounds, and apply the framework to finding Nash equilibria in normal-form games, CE in Arrow-Debreu competitive economies, and GNE in an environmental economic model of the Kyoto mechanism.
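
The abstract describes learning a map from sampled game instances to equilibrium strategies. Below is a minimal sketch of that idea for two-player normal-form games, with the learned adversarial best-response network replaced by an exact best response over pure strategies for brevity; the architecture, game distribution, and hyperparameters are illustrative and not the GAES implementation.

```python
import torch
import torch.nn as nn

A = 3        # actions per player
BATCH = 128

class StrategyNet(nn.Module):
    """Maps a payoff tensor to a mixed strategy for each of the two players."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * A * A, 64), nn.ReLU(), nn.Linear(64, 2 * A))

    def forward(self, payoffs):                  # payoffs: (B, 2, A, A)
        logits = self.net(payoffs.flatten(1)).view(-1, 2, A)
        return torch.softmax(logits, dim=-1)     # (B, 2, A)

def exploitability(payoffs, strat):
    """Sum of each player's best-response gain against the proposed profile."""
    u1, u2 = payoffs[:, 0], payoffs[:, 1]        # row-player / column-player payoffs
    p, q = strat[:, 0], strat[:, 1]              # (B, A) mixed strategies
    v1 = torch.einsum("bi,bij,bj->b", p, u1, q)
    v2 = torch.einsum("bi,bij,bj->b", p, u2, q)
    br1 = torch.einsum("bij,bj->bi", u1, q).max(dim=1).values   # best row deviation
    br2 = torch.einsum("bi,bij->bj", p, u2).max(dim=1).values   # best column deviation
    return ((br1 - v1) + (br2 - v2)).mean()

net = StrategyNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    games = torch.randn(BATCH, 2, A, A)          # toy distribution of game instances
    loss = exploitability(games, net(games))     # train to output low-exploitability profiles
    opt.zero_grad(); loss.backward(); opt.step()
```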


Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

September 2022 · 121 Reads

The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.


Developing, evaluating and scaling learning agents in multi-agent environments

September 2022 · 78 Reads · 1 Citation

AI Communications

The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.


From motor control to team play in simulated humanoid football

August 2022 · 83 Reads · 92 Citations

Science Robotics

Learning to combine control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has demonstrated the potential of learning-based approaches applied to the respective problems of complex movement, long-term planning, and multiagent coordination. However, their integration traditionally required the design and optimization of independent subsystems and remains challenging. In this work, we tackled the integration of motor control and long-horizon decision-making in the context of simulated humanoid football, which requires agile motor control and multiagent coordination. We optimized teams of agents to play simulated football via reinforcement learning, constraining the solution space to that of plausible movements learned using human motion capture data. They were trained to maximize several environment rewards and to imitate pretrained football-specific skills if doing so led to improved performance. The result is a team of coordinated humanoid football players that exhibit complex behavior at different scales, quantified by a range of analysis and statistics, including those used in real-world sport analytics. Our work constitutes a complete demonstration of learned integrated decision-making at multiple scales in a multiagent setting.


From Motor Control to Team Play in Simulated Humanoid Football

May 2021 · 180 Reads · 1 Citation

Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement, longer-term planning and multi-agent coordination. However, there is limited research aimed at their integration. We study this problem by training teams of physically simulated humanoid avatars to play football in a realistic virtual environment. We develop a method that combines imitation learning, single- and multi-agent reinforcement learning and population-based training, and makes use of transferable representations of behaviour for decision making at different levels of abstraction. In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds. We investigate the emergence of behaviours at different levels of abstraction, as well as the representations that underlie these behaviours using several analysis techniques, including statistics from real-world sports analytics. Our work constitutes a complete demonstration of integrated decision-making at multiple scales in a physically embodied multi-agent setting. See project video at https://youtu.be/KHMwq9pv7mg.
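
The abstract describes constraining solutions to realistic, human-like movement learned from motion capture. One standard way to write down such a constraint, and roughly the mechanism described here, is to penalize divergence from a pretrained low-level motor prior. The function below is a schematic sketch; the distributions, weighting, and where the penalty enters training are assumptions rather than the paper's exact objective.

```python
import torch
import torch.distributions as D

def kl_regularized_objective(task_return, policy_mean, policy_std,
                             prior_mean, prior_std, kl_weight=0.01):
    """Trade off task reward against staying close to a motion-capture motor prior."""
    policy = D.Normal(policy_mean, policy_std)   # task policy's per-joint action distribution
    prior = D.Normal(prior_mean, prior_std)      # low-level prior distilled from motion capture
    kl = D.kl_divergence(policy, prior).sum(-1)  # sum over action dimensions
    return (task_return - kl_weight * kl).mean() # maximize reward while remaining human-like
```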


A Generalized Training Approach for Multiagent Learning

January 2020 · 95 Reads · 61 Citations

This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings, computation of Nash equilibria quickly becomes infeasible. Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, α-Rank, which is unique (thus faces no equilibrium selection issues, unlike Nash) and applies readily to general-sum, many-player settings. We establish convergence guarantees in several game classes, and identify links between Nash equilibria and α-Rank. We demonstrate the competitive performance of α-Rank-based PSRO against an exact Nash solver-based PSRO in 2-player Kuhn and Leduc Poker. We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where α-Rank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable general games solver. We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain.
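
As the abstract notes, PSRO generalizes double oracle: each player maintains a population of policies, a meta-solver computes a distribution over that population, and best responses to the resulting meta-strategies are added back in. The sketch below runs that loop on a two-player zero-sum matrix game, with a uniform meta-solver standing in for α-Rank or Nash; the function names and settings are illustrative, not taken from the paper.

```python
import numpy as np

def best_response(payoff, opponent_mix, opponent_pop):
    """Best pure strategy against the opponent's meta-strategy over its population."""
    # Expected payoff of every pure strategy vs. the restricted opponent mixture.
    expected = payoff[:, opponent_pop] @ opponent_mix
    return int(np.argmax(expected))

def psro_zero_sum(payoff, iters=20):
    """PSRO-style loop: grow each player's population with best responses
    to the current meta-strategies. A uniform meta-solver stands in for alpha-Rank."""
    rng = np.random.default_rng(0)
    pop_row = [int(rng.integers(payoff.shape[0]))]
    pop_col = [int(rng.integers(payoff.shape[1]))]
    for _ in range(iters):
        mix_row = np.ones(len(pop_row)) / len(pop_row)   # uniform meta-strategies
        mix_col = np.ones(len(pop_col)) / len(pop_col)
        br_row = best_response(payoff, mix_col, pop_col)       # row player maximizes payoff
        br_col = best_response(-payoff.T, mix_row, pop_row)    # column player minimizes it
        if br_row not in pop_row: pop_row.append(br_row)
        if br_col not in pop_col: pop_col.append(br_col)
    return pop_row, pop_col

# Rock-paper-scissors payoff matrix for the row player.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(psro_zero_sum(rps))
```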


Biases for Emergent Communication in Multi-agent Reinforcement Learning

December 2019 · 82 Reads · 64 Citations

We study the problem of emergent communication, in which language arises because speakers and listeners must communicate information in order to solve tasks. In temporally extended reinforcement learning domains, it has proved hard to learn such communication without centralized training of agents, due in part to a difficult joint exploration problem. We introduce inductive biases for positive signalling and positive listening, which ease this problem. In a simple one-step environment, we demonstrate how these biases ease the learning problem. We also apply our methods to a more extended environment, showing that agents with these inductive biases achieve better performance, and analyse the resulting communication protocols.
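
The two biases in the abstract can be read as auxiliary losses: positive signalling pushes up a mutual-information proxy between an agent's observation and its message, and positive listening pushes the listener's action distribution to actually change with the received message. The sketch below is a rough instantiation of that reading; the exact loss terms and targets used in the paper differ in detail.

```python
import torch
import torch.nn.functional as F

def positive_signalling_loss(message_logits):
    """Encourage messages to depend on the observation: maximize
    H(average message distribution) - average H(message | observation)."""
    probs = F.softmax(message_logits, dim=-1)               # (batch, num_messages)
    avg = probs.mean(dim=0)
    h_avg = -(avg * torch.log(avg + 1e-8)).sum()
    h_cond = -(probs * torch.log(probs + 1e-8)).sum(-1).mean()
    return -(h_avg - h_cond)   # negate so minimizing raises the MI proxy

def positive_listening_loss(action_logits_with_msg, action_logits_no_msg):
    """Encourage the listener's policy to differ when the message is present,
    measured as an L1 distance between the two action distributions."""
    p_with = F.softmax(action_logits_with_msg, dim=-1)
    p_without = F.softmax(action_logits_no_msg, dim=-1)
    return -(p_with - p_without).abs().sum(-1).mean()
```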


Biases for Emergent Communication in Multi-agent Reinforcement Learning

December 2019 · 29 Reads · 1 Citation

We study the problem of emergent communication, in which language arises because speakers and listeners must communicate information in order to solve tasks. In temporally extended reinforcement learning domains, it has proved hard to learn such communication without centralized training of agents, due in part to a difficult joint exploration problem. We introduce inductive biases for positive signalling and positive listening, which ease this problem. In a simple one-step environment, we demonstrate how these biases ease the learning problem. We also apply our methods to a more extended environment, showing that agents with these inductive biases achieve better performance, and analyse the resulting communication protocols.


Citations (27)


... Robot soccer serves as a high-profile benchmark for such settings, combining real-time control, decentralized decision making, and long-horizon strategy. Yet, despite growing interest, most prior work either adopts rule-based pipelines [9], focuses on simplified 1v1 games [10], [11], or remains confined to simulation [12], [13]. Deploying a fully learning-based, competitive and cooperative soccer system on real legged robots remains a challenge. ...

Reference:

Toward Real-World Cooperative and Competitive Soccer with Quadrupedal Robot Teams
Learning agile soccer skills for a bipedal robot with deep reinforcement learning
  • Citing Article
  • April 2024

Science Robotics

... Recently, several nonparametric models for dependent data, sequence modeling, and time series analysis based on kernel mean embeddings have emerged. Popular approaches include filtering (Song et al., 2009; Fukumizu et al., 2013; Gebhardt et al., 2019), transition models (Sun et al., 2019; Grünewälder et al., 2012b) and reinforcement learning (van Hoof et al., 2015; Lever et al., 2016; van Hoof et al., 2017; Stafford and Shawe-Taylor, 2018; Gebhardt et al., 2018), to name only a few. A theoretical tool for understanding these concepts is the kernel autocovariance operator, as it plays an important role in the nonparametric approximation of transition probabilities. ...

Compressed Conditional Mean Embeddings for Model-Based Reinforcement Learning
  • Citing Article
  • February 2016

Proceedings of the AAAI Conference on Artificial Intelligence

... • DeepMind [7] • Five AI [9] • Heriot-Watt University [13] • King's College London [2] • Teesside University [8] • University of Aberdeen [3] • University of Edinburgh [1] • University of Essex [11] • University of Lancaster [4] ...

Developing, evaluating and scaling learning agents in multi-agent environments

AI Communications

... Robot soccer serves as a high-profile benchmark for such settings, combining real-time control, decentralized decision making, and long-horizon strategy. Yet, despite growing interest, most prior work either adopts rule-based pipelines [9], focuses on simplified 1v1 games [10], [11], or remains confined to simulation [12], [13]. Deploying a fully learning-based, competitive and cooperative soccer system on real legged robots remains a challenge. ...

From motor control to team play in simulated humanoid football
  • Citing Article
  • August 2022

Science Robotics

... Robots also struggle to generalize or adapt to other environments or tasks. To alleviate this problem to a certain extent, there have been recent DRL studies based on motion priors [86][87][88][89][90], which have been successfully applied to quadrupedal locomotion tasks [12,56,91]. However, the variety of motion priors in these studies is insufficient, and the robot's behavior is not agile and natural. ...

From Motor Control to Team Play in Simulated Humanoid Football
  • Citing Preprint
  • May 2021

... A seminal technique in these successes is self-play and its variants (Heinrich et al., 2015;Heinrich & Silver, 2016;Hennes et al., 2020;Xu et al., 2023b), where agents repeatedly train against older versions of themselves to refine their policies. Another prominent line of work is Policy-Space Response Oracles (PSRO) (Lanctot et al., 2017;Muller et al., 2019), an iterative procedure that produces best responses to a growing population of policies in a meta-game. Conceptually, our iterative framework is related to PSRO in that we both solve an abstracted game before enlarging it to approach the full original game. ...

A Generalized Training Approach for Multiagent Learning
  • Citing Conference Paper
  • January 2020

... To facilitate the agents to use the communication channel, we modify the loss functions of the communication and action policies as suggested by Eccles et al. in [20]. The goal is to bias the agents towards positive signalling, encouraging them to send distinct messages in response to different observations, and positive listening, encouraging agents to respond differently when different messages are received. ...

Biases for Emergent Communication in Multi-agent Reinforcement Learning
  • Citing Conference Paper
  • December 2019

... A signal can emerge in an evolutionary process, as part of the configuration of the search space. This occurs mainly at early stages of the optimization process because this is when exploration often takes more random starting points (Eccles, Bachrach, Lever, Lazaridou & Graepel, 2019). But if the emergent behavior is not enough, the signal must be established using an adequate evolutionary strategy (Simões, Lau & Reis, 2019), (Gigliotta, Bartolomeo & Miglino, 2015). ...

Biases for Emergent Communication in Multi-agent Reinforcement Learning
  • Citing Preprint
  • December 2019

... We investigate using neural networks to predict power indices, including Shapley values and Banzhaf indices, in the marginal contribution network class of coalitional games [12,13]. Game theory studies interactions and teams between participants [14][15][16], with cooperative game theory examining how teams of players may cooperate to achieve certain outcomes [17][18][19][20][21], with many applications [22], including security [23], network analysis [24][25][26][27][28][29][30], voting [31][32][33], logics and reasoning [34][35][36], robotics [37][38][39][40], data selection and valuation [41,42], market analysis [43][44][45][46][47][48][49], interpretability [50], and team formation [51][52][53][54][55]. The concept of a power index quantifies the importance of each player [56][57][58][59]. Calculating these indices exactly is computationally expensive in large games [28,60,61], motivating approximation algorithms [62][63][64]. ...

The Body is Not a Given: Joint Agent Policy Learning and Morphology Evolution

... In this section, we examine how RL can regulate and control the collective dynamics in active swarms, focusing on two complementary aspects. First, we discuss the self-organization of active swarms, [99][100][101][102][103][104][105][106][107][108][109] where RL helps individual behaviors optimize local interactions, leading to the emergence of complex patterns like flocking or clustering, without direct centralized control or external influence. Second, we explore the goal-directed control of swarm behaviors, [110][111][112][113] where RL facilitates adjustments to global intervention parameters, guiding individual agents to align with predefined collective goals through external influence or manipulation. ...

Reinforcement Learning Agents acquire Flocking and Symbiotic Behaviour in Simulated Ecosystems
  • Citing Conference Paper
  • January 2019