
Edgar Duenez-Guzman, PhD
- Staff Research Scientist at DeepMind
Working on multi-agent reinforcement learning both to build intelligent systems and to understand real social phenomena
About
- Publications: 41
- Reads: 12,907
- Citations: 700
Introduction
I am a scientist and an engineer on DeepMind's Multi-Agent team. My research interests span the evolution of cooperation, in nature and in reinforcement learning agents, as well as AI safety, the theory of computation, and swarm robotics.
Current institution
DeepMind
Current position
- Staff Research Scientist
Additional affiliations
March 2015 - present
DeepMind
Position
- Senior Researcher
July 2012 - July 2013
February 2010 - May 2012
Publications (41)
What is appropriateness? Humans navigate a multi-scale mosaic of interlocking notions of what is appropriate for different situations. We act one way with our friends, another with our family, and yet another in the office. Likewise for AI, appropriate behavior for a comedy-writing assistant is not the same as appropriate behavior for a customer-se...
Traditionally, cognitive and computer scientists have viewed intelligence solipsistically, as a property of unitary agents devoid of social context. Given the success of contemporary learning algorithms, we argue that the bottleneck in artificial intelligence (AI) progress is shifting from data assimilation to novel data generation. We bring togeth...
Coordinated pair bonds are common in birds and also occur in many other taxa. How do animals solve the social dilemmas they face in coordinating with a partner? We developed an evolutionary model to explore this question, based on observations that a) neuroendocrine feedback provides emotional bookkeeping which is thought to play a key role in vert...
In developing artificial intelligence (AI), researchers often benchmark against human performance as a measure of progress. Is this kind of comparison possible for moral cognition? Given that human moral judgment often hinges on intangible properties like “intention” which may have no natural analog in artificial agents, it may prove difficult to d...
Is it possible to evaluate the moral cognition of complex artificial agents? In this work, we take a look at one aspect of morality: `doing the right thing for the right reasons.' We propose a behavior-based analysis of artificial moral cognition which could also be applied to humans to facilitate like-for-like comparison. Morally-motivated behavio...
In social psychology, Social Value Orientation (SVO) describes an individual's propensity to allocate resources between themself and others. In reinforcement learning, SVO has been instantiated as an intrinsic motivation that remaps an agent's rewards based on particular target distributions of group reward. Prior studies show that groups of agents...
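As a concrete illustration of the reward remapping described above, here is a minimal sketch of one common angle-based instantiation of SVO (the exact formulation varies across papers, and the function name and parameterization here are assumptions, not the specific method of this publication): the agent's effective reward mixes its own reward with the mean reward of the other agents.

```python
import math

def svo_reward(own_reward, others_rewards, theta_degrees):
    """Remap an agent's reward by a Social Value Orientation angle.

    One common instantiation (assumed here; formulations differ):
    mix the agent's own reward with the mean reward of the other
    agents, weighted by the cosine and sine of the SVO angle.
    theta = 0 degrees is purely selfish; 45 degrees is prosocial.
    """
    theta = math.radians(theta_degrees)
    group_reward = sum(others_rewards) / len(others_rewards)
    return math.cos(theta) * own_reward + math.sin(theta) * group_reward

selfish = svo_reward(1.0, [3.0, 1.0], 0.0)     # 1.0: ignores the group
prosocial = svo_reward(1.0, [3.0, 1.0], 45.0)  # ~2.12: values the group
```

Sweeping the angle over a population produces the heterogeneous target distributions of group reward that the abstract refers to.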
Many environments contain numerous available niches of variable value, each associated with a different local optimum in the space of behaviors (policy space). In such situations it is often difficult to design a learning process capable of evading distraction by poor local optima long enough to stumble upon the best available niche. In this work w...
Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, an...
What inductive biases must be incorporated into multi-agent artificial intelligence models to get them to capture high-fidelity imitation? We think very little is needed. In the right environments, both instrumental- and ritual-stance imitation can emerge from generic learning mechanisms operating on non-deliberative decision architectures. In this...
The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources a...
Humans are learning agents that acquire social group representations from experience. Here, we discuss how to construct artificial agents capable of this feat. One approach, based on deep reinforcement learning, allows the necessary representations to self-organize. This minimizes the need for hand-engineering, improving robustness and scalability....
A key challenge in the study of multiagent cooperation is the need for individual agents not only to cooperate effectively, but to decide with whom to cooperate. This is particularly critical in situations when other agents have hidden, possibly misaligned motivations and goals. Social deduction games offer an avenue to study how individuals might...
Undesired bias afflicts both human and algorithmic decision making, and may be especially prevalent when information processing trade-offs incentivize the use of heuristics. One primary example is statistical discrimination -- selecting social partners based not on their underlying attributes, but on readily perceptible characteristics tha...
Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap, and uses reinforcement learning to reduce the human labor required to create nove...
Society is characterized by the presence of a variety of social norms: collective patterns of sanctioning that can prevent miscoordination and free-riding. Inspired by this, we aim to construct learning dynamics where potentially beneficial social norms can emerge. Since social norms are underpinned by sanctioning, we introduce a training regime wh...
We present DeepMind Lab2D, a scalable environment simulator for artificial intelligence research that facilitates researcher-led experimentation with environment design. DeepMind Lab2D was built with the specific needs of multi-agent deep reinforcement learning researchers in mind, but it may also be useful beyond that particular subfield.
Game theoretic views of convention generally rest on notions of common knowledge and hyper-rational models of individual behavior. However, decades of work in behavioral economics have questioned the validity of both foundations. Meanwhile, computational neuroscience has contributed a modernized 'dual process' account of decision-making where model...
Even in simple multi-agent systems, fixed incentives can lead to outcomes that are poor for the group and each individual agent. We propose a method, D3C, for online adjustment of agent incentives that reduces the loss incurred at a Nash equilibrium. Agents adjust their incentives by learning to mix their incentive with that of other agents, until...
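The incentive-mixing idea can be sketched in a few lines: each agent's effective loss becomes a convex combination of all agents' losses via a row-stochastic mixing matrix. This is an illustrative toy with a fixed matrix, not the D3C algorithm itself, which learns the mixture online; all names here are assumptions.

```python
def mix_incentives(losses, mix_weights):
    """Blend each agent's loss with the other agents' losses.

    losses:      length-n list, loss of each agent at the current play
    mix_weights: n x n nonnegative matrix (list of rows); row i holds
                 agent i's mixing weights over all agents' losses.
                 Rows are normalized to sum to 1 (row-stochastic).
    Returns the list of mixed losses the agents actually optimize.
    """
    mixed = []
    for row in mix_weights:
        total = sum(row)
        mixed.append(sum(w / total * l for w, l in zip(row, losses)))
    return mixed

# Identity mixing leaves agents fully selfish; uniform mixing makes
# every agent optimize the group's mean loss.
losses = [3.0, 1.0]
identity = mix_incentives(losses, [[1, 0], [0, 1]])  # [3.0, 1.0]
uniform = mix_incentives(losses, [[1, 1], [1, 1]])   # [2.0, 2.0]
```

Moving from the identity toward a uniform matrix is one way a group can trade individual incentives for group-level ones.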
Recent research on reinforcement learning in pure-conflict and pure-common interest games has emphasized the importance of population heterogeneity. In contrast, studies of reinforcement learning in mixed-motive games have primarily leveraged homogeneous approaches. Given the defining characteristic of mixed-motive games--the imperfect correlation...
Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation. In Malthusian RL, increases in a subpopulation's average return drive subsequent increases in its size, just as Thomas...
We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that do not take the agent to undesirable situations. We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization...
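Lagrangian relaxation is one standard way to turn a CMDP into an unconstrained problem (this is a generic sketch of that approach, not necessarily the specific algorithm of this paper; the names and learning rate are assumptions): maximize reward minus a learned penalty on constraint violation, and raise the multiplier while the policy is unsafe.

```python
def lagrangian_objective(reward_return, cost_return, cost_limit, lam):
    """Unconstrained surrogate for a CMDP: maximize expected reward
    while penalizing expected cost above the limit, weighted by lam."""
    return reward_return - lam * (cost_return - cost_limit)

def update_multiplier(lam, cost_return, cost_limit, lr=0.1):
    """Dual ascent on lam: grow the penalty while the constraint is
    violated, shrink it (never below zero) when the policy is safe."""
    return max(0.0, lam + lr * (cost_return - cost_limit))

# The multiplier tracks constraint violation over training.
lam = 0.0
for cost in [5.0, 4.0, 2.0]:  # observed cost returns per iteration
    lam = update_multiplier(lam, cost, cost_limit=3.0)
# lam rose while costs exceeded the limit, then relaxed below it
```

In practice the policy is updated by gradient ascent on the surrogate objective between multiplier updates.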
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to tempo...
Multi-agent cooperation is an important feature of the natural world. Many tasks involve individual incentives that are misaligned with the common good, yet a wide range of organisms from bacteria to insects and humans are able to overcome their differences and collaborate. Therefore, the emergence of cooperative behavior amongst self-interested in...
In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. In particular, besides optimizing performance it is crucial to guarantee the safety of an agent during training as well as deployment (e.g. a robot should avoid taking actions -...
Division of labor is ubiquitous in biological systems, as evidenced by various forms of complex task specialization observed in both animal societies and multicellular organisms. Although clearly adaptive, the way in which division of labor first evolved remains enigmatic, as it requires the simultaneous co-occurrence of several complex traits to a...
Microbial populations often contain a fraction of slow-growing persister cells that withstand antibiotics and other stress factors. Current theoretical models predict that persistence levels should reflect a stable state in which the survival advantage of persisters under adverse conditions is balanced with the direct growth cost impaired under fav...
NLRP proteins are important components of inflammasomes with a major role in innate immunity. A subset of NLRP genes, with unknown functions, are expressed in oocytes and early embryos. Mutations of Nlrp5 in mice are associated with maternal-effect embryonic lethality and mutations of NLRP7 in women are associated with conception of biparental comp...
The conflicts over sex allocation and male production in insect societies have long served as an important test bed for Hamilton's theory of inclusive fitness, but have for the most part been considered separately. Here, we develop new coevolutionary models to examine the interaction between these two conflicts and demonstrate that sex ratio and co...
In this paper we propose GESwarm, a novel tool that can automatically synthesize collective behaviors for swarms of autonomous robots through evolutionary robotics. Evolutionary robotics typically relies on artificial evolution for tuning the weights of an artificial neural network that is then used as individual behavior representation. The main c...
Punishment offers a powerful mechanism for the maintenance of cooperation in human and animal societies, but the maintenance of costly punishment itself remains problematic. Game theory has shown that corruption, where punishers can defect without being punished themselves, may sustain cooperation. However, in many human societies and some insect o...
We extend previous results concerning black box search algorithms, presenting new theoretical tools related to no free lunch (NFL) where functions are restricted to some benchmark (that need not be permutation closed), algorithms are restricted to some collection (that need not be permutation closed) or limited to some number of steps, or...
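The classic permutation-closed setting that this work generalizes beyond can be verified by brute force on a toy example (this is an illustration of the baseline NFL result, not of the paper's new tools): averaged over every function in a permutation-closed benchmark, two non-adaptive search orders find equally good values.

```python
from itertools import product

def best_after_k(f, order, k):
    """Best (max) objective value seen after evaluating the first k
    points of a fixed, non-adaptive search order."""
    return max(f[x] for x in order[:k])

# All 2^3 = 8 functions from a 3-point domain to values {0, 1}:
# a permutation-closed benchmark, as classic NFL requires.
domain = [0, 1, 2]
functions = [dict(zip(domain, values))
             for values in product([0, 1], repeat=3)]

# Two different non-adaptive search orders.
order_a = [0, 1, 2]
order_b = [2, 0, 1]

avg_a = sum(best_after_k(f, order_a, 2) for f in functions) / len(functions)
avg_b = sum(best_after_k(f, order_b, 2) for f in functions) / len(functions)
# Averaged over the whole benchmark, the orders are indistinguishable.
```

Restricting the benchmark or the algorithm collection so it is no longer permutation closed is exactly where equalities like this can break, which is the regime the abstract addresses.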
One of the most influential concepts in artificial intelligence is the notion of the swarm. That is, intelligent adaptive behaviour can arise in large groups of interacting agents, even when the individual agents have limited local information and use simple rules. Self-organisation provides a basic structure in such agent societies, while natural...
In this paper, we propose a mechanism for systematic comparison of the efficacy of unsupervised evaluation methods for parameter selection of binarization algorithms in optical character recognition (OCR). We also analyze these measures statistically and ascertain whether a measure is suitable or not to assess a binarization method. The comparison...
Cooperation is ubiquitous in the natural world. What seems nonsensical is why natural selection favors a behavior whereby individuals would lose out by benefiting their competitor. This conundrum, for almost half a century, has puzzled scientists and remains a fundamental problem in biology, psychology, and economics. In recent years, the explanati...
We build a spatial individual-based multilocus model of homoploid hybrid speciation tailored for a tentative case of hybrid origin of Heliconius heurippa from H. melpomene and H. cydno in South America. Our model attempts to account for empirical patterns and data on genetic incompatibility, mating preferences and selection by predation (both based...
We describe our experience developing custom C code for simulating evolution and speciation dynamics using Kraken, the Cray XT5 system at the National Institute for Computational Sciences. The problem's underlying quadratic complexity was problematic, and the numerical instabilities we faced would either compromise or else severely complicate large...
Background: Arguably the most influential force in human history is the formation of social coalitions and alliances (i.e., long-lasting coalitions) and their impact on individual power. Understanding the dynamics of alliance formation and its consequences for biological, social, and cultural evolution is a formidable theoretical challenge. In mos...