Satinder Singh
University of Michigan | U-M · Division of Computer Science and Engineering

About

232 Publications · 43,387 Reads · 21,505 Citations

Publications (232)
Article
We study emergent communication between speaker and listener recurrent neural-network agents that are tasked to cooperatively construct a blocks-world target image sampled from a generative grammar of blocks configurations. The speaker receives the target image and learns to emit a sequence of discrete symbols from a fixed vocabulary. The listener...
Preprint
This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via the generative adversarial imitation learning framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, wh...
Preprint
All-goals updating exploits the off-policy nature of Q-learning to update all possible goals an agent could have from each transition in the world, and was introduced into Reinforcement Learning (RL) by Kaelbling (1993). In prior work this was mostly explored in small-state RL problems that allowed tabular representations and where all possible goa...
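The all-goals updating idea above can be sketched in a small tabular setting: every transition the agent experiences is used to update the Q-function for every possible goal, not just the one currently pursued. The chain environment, reward convention, and hyperparameters below are illustrative assumptions, not details from the paper.

```python
import random

# Sketch of all-goals updating (Kaelbling, 1993): a 1-D chain of states,
# where every state is also a candidate goal. One observed transition
# updates Q for ALL goals, exploiting the off-policy nature of Q-learning.
N_STATES, ACTIONS, GAMMA, ALPHA = 5, (-1, +1), 0.9, 0.5
# Q[goal][state][action_index]
Q = [[[0.0, 0.0] for _ in range(N_STATES)] for _ in range(N_STATES)]

def step(s, a):
    # deterministic chain with clamped ends
    return min(max(s + a, 0), N_STATES - 1)

random.seed(0)
s = 0
for _ in range(5000):
    ai = random.randrange(2)              # behave randomly (off-policy)
    s2 = step(s, ACTIONS[ai])
    for g in range(N_STATES):             # one transition updates ALL goals
        r = 1.0 if s2 == g else 0.0       # goal-conditional reward (assumed)
        target = r + (0.0 if s2 == g else GAMMA * max(Q[g][s2]))
        Q[g][s][ai] += ALPHA * (target - Q[g][s][ai])
    s = s2

# Greedy action toward goal 4 from state 0 should be "move right" (+1).
print(ACTIONS[Q[4][0].index(max(Q[4][0]))])
```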
Preprint
This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration. Our empirical results show that SIL significantly improves advantage...
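The SIL objective described above can be sketched without a neural network: only past decisions whose return exceeded the current value estimate contribute, through the clipped advantage (R - V)+. The clipping rule follows the abstract's description; the concrete numbers and the absence of a network or replay buffer are illustrative simplifications.

```python
# Minimal sketch of the Self-Imitation Learning (SIL) losses on one
# stored episode. A real agent would obtain returns, values, and
# log-probs from its actor-critic network and a replay buffer.

def sil_losses(returns, values, log_probs):
    """Per-episode SIL policy and value losses.

    Only steps whose past return R exceeds the current value estimate V
    contribute, via the clipped advantage (R - V)+ -- the agent imitates
    only its better-than-expected past decisions.
    """
    policy_loss, value_loss = 0.0, 0.0
    for R, V, logp in zip(returns, values, log_probs):
        adv = max(R - V, 0.0)           # (R - V)+ : drop bad experiences
        policy_loss += -logp * adv      # imitate good actions more strongly
        value_loss += 0.5 * adv ** 2    # push V up toward the good return
    return policy_loss, value_loss

# Step 1 was better than expected (R=3 > V=1); step 2 was worse (ignored).
pl, vl = sil_losses(returns=[3.0, 0.5], values=[1.0, 2.0],
                    log_probs=[-0.7, -0.7])
print(pl, vl)
```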
Article
In many sequential decision making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal...
Article
This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of...
Article
As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks. In this problem, we consider two types of generalizations: to previously unseen instructions and to...
Conference Paper
Full-text available
How detailed should we make the goals we prescribe to AI agents acting on our behalf in complex environments? Detailed and low-level specification of goals can be tedious and expensive to create, and abstract and high-level goals could lead to negative surprises as the agent may find behaviors that we would not want it to do, i.e., lead to unsafe A...
Article
In cooperative multiagent planning, it can often be beneficial for an agent to make commitments about aspects of its behavior to others, allowing them in turn to plan their own behaviors without taking the agent's detailed behavior into account. Extending previous work in the Bayesian setting, we consider instead a worst-case setting in which the a...
Conference Paper
Full-text available
Planning in MDPs often uses a smaller planning horizon than specified in the problem to save computational expense at the risk of a loss due to sub-optimal plans. Jiang et al. [2015b] recently showed that smaller than specified planning horizons can in fact be beneficial in cases where the MDP model is learned from data and therefore not accurate....
Article
Full-text available
In this paper, we introduce a new set of reinforcement learning (RL) tasks in Minecraft (a flexible 3D world). We then use these tasks to systematically compare and contrast existing deep reinforcement learning (DRL) architectures with our new memory-based DRL architectures. These tasks are designed to emphasize, in a controllable manner, issues th...
Conference Paper
An agent that adopts a commitment to another agent should act so as to bring about a state of the world meeting the specifications of the commitment. Thus, by faithfully pursuing a commitment, an agent can be trusted to make sequential decisions that it believes can cause an intended state to arise. In general, though, an agent’s actions will have...
Article
Full-text available
Monte Carlo Tree Search (MCTS) methods have proven powerful in planning for sequential decision-making problems such as Go and video games, but their performance can be poor when the planning depth and sampling trajectories are limited or when the rewards are sparse. We present an adaptation of PGRD (policy-gradient for reward-design) for learning...
Article
Full-text available
Background: Cognitive behavioral therapy (CBT) is one of the most effective treatments for chronic low back pain. However, only half of Department of Veterans Affairs (VA) patients have access to trained CBT therapists, and program expansion is costly. CBT typically consists of 10 weekly hour-long sessions. However, some patients improve after the...
Conference Paper
Full-text available
Predictive state representations (PSRs) model dynamical systems using appropriately chosen predictions about future observations as a representation of the current state. In contrast to the hidden states posited by HMMs or RNNs, PSR states are directly observable in the training data; this gives rise to a moment-matching spectral algorithm for lea...
Article
We consider a setting for Inverse Reinforcement Learning (IRL) where the learner is extended with the ability to actively select multiple environments, observing an agent's behavior on each environment. We first demonstrate that if the learner can experiment with any transition dynamics on some fixed set of states and actions, then there exists an...
Article
Introduction: Guidelines to prevent sudden cardiac death (SCD) following acute coronary syndrome (ACS) are widely based on cutoffs defined on left ventricular ejection fraction (LVEF) with limited use of other available data. Methods: We investigated the improvement in predicting post-ACS SCD using a multi-factorial model that integrates an assessm...
Article
Background: Postoperative atrial fibrillation (PAF) is a frequent complication following cardiothoracic surgery and is associated with increased length of stay and increased morbidity. This study develops a novel multi-factorial computational model to predict PAF, as well as a simplified risk score for PAF based on this model. Methods: A multi-fact...
Article
Background: Text messages can improve medication adherence and outcomes in several conditions. For this study, experts developed text messages addressing determinants of medication adherence: disease beliefs, medication necessity, medication concerns, and forgetfulness, as well as positive reinforcement messages for patients who were adherent. Ob...
Article
Full-text available
Motivated by vision-based reinforcement learning (RL) problems, in particular Atari games from the recent benchmark Arcade Learning Environment (ALE), we consider spatio-temporal prediction problems where future (image-)frames are dependent on control variables or actions as well as previous frames. While not composed of natural scenes, frames in...
Article
Full-text available
The accurate and early detection of epileptic seizures in continuous electroencephalographic (EEG) data has a growing role in the management of patients with epilepsy. Early detection allows for therapy to be delivered at the start of seizures and for caregivers to be notified promptly about potentially debilitating events. The challenge to detecti...
Conference Paper
Full-text available
State abstractions are often used to reduce the complexity of model-based reinforcement learning when only limited quantities of data are available. However, choosing the appropriate level of abstraction is an important problem in practice. Existing approaches have theoretical guarantees only under strong assumptions on the domain or asymptotically...
Conference Paper
Full-text available
Kulesza et al. [2014] recently observed that low-rank spectral learning algorithms, which discard the smallest singular values of a moment matrix during training, can behave in unexpected ways, producing large errors even when the discarded singular values are arbitrarily small. In this paper we prove that when learning predictive state representat...
Article
Predictive state representations (PSRs) are models of dynamical systems that represent state as a vector of predictions about future observable events (tests) conditioned on past observed events (histories). If a practitioner selects finite sets of tests and histories that are known to be sufficient to completely capture the system, an exact PSR ca...
Conference Paper
Full-text available
Predictive state representations (PSRs) are models of dynamical systems that represent state as a vector of predictions about future observable events (tests) conditioned on past observed events (histories). If a practitioner selects finite sets of tests and histories that are known to be sufficient to completely capture the system, an exact PSR ca...
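The prediction-vector view of state in the two PSR abstracts above can be made concrete with a linear-PSR filtering step, b' = (b M_o) / (b · m_o). The tiny alternating-observation system and its hand-built parameters below are illustrative assumptions, not an algorithm from either paper.

```python
# Sketch of linear-PSR filtering: state is a vector of predictions for a
# set of core tests, updated after each observation. Parameters here are
# hand-built for a process that deterministically alternates A, B, A, B...

def psr_update(b, M_o, m_o):
    """One filtering step: condition core-test predictions on observation o."""
    denom = sum(bi * mi for bi, mi in zip(b, m_o))  # p(o | history)
    return [sum(bi * M_o[i][j] for i, bi in enumerate(b)) / denom
            for j in range(len(b))]

# Core tests: t1 = "next obs is A", t2 = "next obs is B".
# m_o gives p(o | h) and M_o[i][j] gives p(o t_j | h), both as linear
# functions of the current prediction vector b.
m = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
M = {"A": [[0.0, 1.0], [0.0, 0.0]],   # after A, the next obs is surely B
     "B": [[0.0, 0.0], [1.0, 0.0]]}   # after B, the next obs is surely A

b = [1.0, 0.0]                        # history ends in B, so A comes next
for obs in ["A", "B", "A"]:
    b = psr_update(b, M[obs], m[obs])
print(b)  # after ...A B A, the next observation must be B
```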
Article
Following work on designing optimal rewards for single agents, we define a multiagent optimal rewards problem (ORP) in cooperative (specifically, common-payoff or team) settings. This new problem solves for individual agent reward functions that guide agents to better overall team performance relative to teams in which all agents guide their behavi...
Article
Background: Mobile health (mHealth) services cannot easily adapt to users' unique needs. Purpose: We used simulations of text messaging (SMS) for improving medication adherence to demonstrate benefits of interventions using reinforcement learning (RL). Methods: We used Monte Carlo simulations to estimate the relative impact of an intervention...
Conference Paper
Speech patterns are modulated by the emotional and neurophysiological state of the speaker. There exists a growing body of work that computationally examines this modulation in patients suffering from depression, autism, and post-traumatic stress disorder. However, the majority of the work in this area focuses on the analysis of structured speech c...
Article
Utility maximization is a key element of a number of theoretical approaches to explaining human behavior. Among these approaches are rational analysis, ideal observer theory, and signal detection theory. While some examples of these approaches define the utility maximization problem with little reference to the bounds imposed by the organism, other...
Article
We propose a framework for including information-processing bounds in rational analyses. It is an application of bounded optimality (Russell & Subramanian, 1995) to the challenges of developing theories of mechanism and behavior. The framework is based on the idea that behaviors are generated by cognitive mechanisms that are adapted to the structur...
Article
Full-text available
When faced with the problem of learning a model of a high-dimensional environment, a common approach is to limit the model to make only a restricted set of predictions, thereby simplifying the learning problem. These partial models may be directly useful for making decisions or may be combined together to form a more complete, structured model. How...
Article
Postoperative atrial fibrillation (PAF) occurs in 10% to 65% of the patients undergoing cardiothoracic surgery. It is associated with increased post-surgical mortality and morbidity, and results in longer and more expensive hospital stays. Accurately stratifying patients for PAF allows for selective use of prophylactic therapies (e.g., amiodarone)....
Article
Full-text available
Stackelberg games form the core of a number of tools deployed for computing optimal patrolling strategies in adversarial domains, such as the US Federal Air Marshall Service and the US Coast Guard. In traditional Stackelberg security game models the attacker knows only the probability that each target is covered by the defender, but is oblivious...
Article
Missing values are a common problem when applying classification algorithms to real-world medical data. This is especially true for trauma patients, where the emergent nature of the cases makes it difficult to collect all of the relevant data for each patient. Standard methods for handling missingness first learn a model to estimate missing data va...
Article
Full-text available
The combination of modern Reinforcement Learning and Deep Learning approaches holds the promise of making significant progress on challenging applications requiring both rich perception and policy-selection. The Arcade Learning Environment (ALE) provides a set of Atari games that represent a useful benchmark set of such applications. A recent break...
Conference Paper
Eye-movements in reading exhibit frequency spillover effects: fixation durations on a word are affected by the frequency of the previous word. We explore the idea that this effect may be an emergent property of a computationally rational eye-movement strategy that is navigating a tradeoff between processing immediate perceptual input, and continued...
Article
Quality Improvement in the ICU I. SESSION TYPE: Original Investigation Slide. PRESENTED ON: Monday, October 28, 2013 at 01:45 PM - 03:15 PM. PURPOSE: Existing ICU scoring metrics do not rely upon minute-to-minute vital sign measurements over the course of the first 24 hours in the ICU and require the use of trained annotators. METHODS: We analyzed 19,685...
Conference Paper
Full-text available
The rate of introduction of new technology into safety critical domains continues to increase. Improvements in evaluation methods are needed to keep pace with the rapid development of these technologies. A significant challenge in improving evaluation is developing efficient methods for collecting and characterizing knowledge of the domain and cont...
Article
Full-text available
Interactive voice response (IVR) calls enhance health systems' ability to identify health risk factors, thereby enabling targeted clinical follow-up. However, redundant assessments may increase patient dropout and represent a lost opportunity to collect more clinically useful data. We determined the extent to which previous IVR assessments predicte...
Article
We explore the idea that eye-movement strategies in reading are precisely adapted to the joint constraints of task structure, task payoff, and processing architecture. We present a model of saccadic control that separates a parametric control policy space from a parametric machine architecture, the latter based on a small set of assumptions derived...
Article
Full-text available
Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the disco...
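Policy iteration, whose complexity the abstract above analyzes, alternates exact policy evaluation with greedy improvement until the policy stops changing. The two-state MDP below is an illustrative assumption used only to show the loop structure.

```python
# Sketch of policy iteration (PI) on a tiny 2-state MDP (assumed example).
GAMMA = 0.9
# P[s][a] = list of (prob, next_state, reward)
P = {0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
     1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]}}

def evaluate(policy, n_sweeps=200):
    """Iterative policy evaluation (enough sweeps to converge here)."""
    V = {s: 0.0 for s in P}
    for _ in range(n_sweeps):
        V = {s: sum(p * (r + GAMMA * V[s2])
                    for p, s2, r in P[s][policy[s]]) for s in P}
    return V

def policy_iteration():
    policy = {s: 0 for s in P}
    while True:
        V = evaluate(policy)                       # policy evaluation
        new = {s: max(P[s],                        # greedy improvement
                      key=lambda a: sum(p * (r + GAMMA * V[s2])
                                        for p, s2, r in P[s][a]))
               for s in P}
        if new == policy:                          # policy stable: done
            return policy, V
        policy = new

pi, V = policy_iteration()
print(pi)  # both states should choose action 1 (head to the rewarding loop)
```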
Article
Full-text available
We are interested in the problem of planning for factored POMDPs. Building on the recent results of Kearns, Mansour and Ng, we provide a planning algorithm for factored POMDPs that exploits the accuracy-efficiency tradeoff in the belief state simplification introduced by Boyen and Koller.
Article
Full-text available
Stochastic games generalize Markov decision processes (MDPs) to a multiagent setting by allowing the state transitions to depend jointly on all player actions, and having rewards determined by multiplayer matrix games at each state. We consider the problem of computing Nash equilibria in stochastic games, the analogue of planning in MDPs. We begin...
Article
Full-text available
Multi-agent games are becoming an increasingly prevalent formalism for the study of electronic commerce and auctions. The speed at which transactions can take place and the growing complexity of electronic marketplaces makes the study of computationally simple agents an appealing direction. In this work, we analyze the behavior of agents that increme...
Article
Full-text available
In this work, we introduce graphical models for multi-player game theory, and give powerful algorithms for computing their Nash equilibria in certain cases. An n-player game is given by an undirected graph on n nodes and a set of n local matrices. The interpretation is that the payoff to player i is determined entirely by the actions of player i and...
Conference Paper
Following work on designing optimal rewards for single agents, we define a multiagent optimal rewards problem (ORP) in common-payoff (or team) settings. This new problem solves for individual agent reward functions that guide agents to better overall team performance relative to teams in which all agents guide their behavior with the same given tea...
Article
Full-text available
Modeling dynamical systems, both for control purposes and to make predictions about their behavior, is ubiquitous in science and engineering. Predictive state representations (PSRs) are a recently introduced class of models for discrete-time dynamical systems. The key idea behind PSRs and the closely related OOMs (Jaeger's observable operator model...
Article
Full-text available
Models of dynamical systems based on predictive state representations (PSRs) are defined strictly in terms of observable quantities, in contrast with traditional models (such as Hidden Markov Models) that use latent variables or state-space representations. In addition, PSRs have an effectively infinite memory, allowing them to model some systems th...
Article
Full-text available
Consider a multi-agent system in a dynamic and uncertain environment. Each agent's local decision problem is modeled as a Markov decision process (MDP) and agents must coordinate on a joint action in each period, which provides a reward to each agent and causes local state transitions. A social planner knows the model of every agent's MDP and wants...
Article
Full-text available
A graphical multiagent model (GMM) represents a joint distribution over the behavior of a set of agents. One source of knowledge about agents' behavior may come from game-theoretic analysis, as captured by several graphical game representations developed in recent years. GMMs generalize this approach to express arbitrary distributions, based on game...
Article
Abstraction followed by equilibrium finding has emerged as the leading approach to solving games. Lossless abstraction typically yields games that are still too large to solve, so lossy abstraction is needed. Unfortunately, prior lossy game abstraction algorithms have no guarantees on solution quality. We developed a framework that enables the desi...
Conference Paper
Full-text available
Recent work has defined an optimal reward problem (ORP) in which an agent designer, with an objective reward function that evaluates an agent's behavior, has a choice of what reward function to build into a learning or planning agent to guide its behavior. Existing results on ORP show weak mitigation of limited computational resources, i.e., the ex...
Conference Paper
Full-text available
Factored models of multiagent systems address the complexity of joint behavior by exploiting locality in agent interactions. History-dependent graphical multiagent models (hGMMs) further capture dynamics by conditioning behavior on history. The challenges of modeling real human behavior motivated us to extend the hGMM representation by distinguishi...
Article
Factored models of multiagent systems address the complexity of joint behavior by exploiting locality in agent interactions. History-dependent graphical multiagent models (hGMMs) further capture dynamics by conditioning behavior on history. The challenges of modeling real human behavior motivated us to extend the hGMM representation by distinguishi...
Article
Full-text available
The explore-exploit dilemma is one of the central challenges in Reinforcement Learning (RL). Bayesian RL solves the dilemma by providing the agent with information in the form of a prior distribution over environments; however, full Bayesian planning is intractable. Planning with the mean MDP is a common myopic approximation of Bayesian planning. W...
Article
Full-text available
Stackelberg games increasingly influence security policies deployed in real-world settings. Much of the work to date focuses on devising a fixed randomized strategy for the defender, accounting for an attacker who optimally responds to it. In practice, defense policies are often subject to constraints and vary over time, allowing an attacker to...
Article
Full-text available
Randomized first-mover strategies of Stackelberg games are used in several deployed applications to allocate limited resources for the protection of critical infrastructure. Stackelberg games model the fact that a strategic attacker can surveil and exploit the defender's strategy, and randomization guards against the worst effects by making the def...
Article
Full-text available
Stackelberg games have been used in several deployed applications of game theory to make recommendations for allocating limited resources for protecting critical infrastructure. The resource allocation strategies are randomized to prevent a strategic attacker from using surveillance to learn and exploit patterns in the allocation. An important l...
Conference Paper
Full-text available
Modeling information diffusion in networks enables reasoning about the spread of ideas, news, opinion, and technology across a network of agents. Existing models generally assume a given network structure, in practice derived from observations of agent communication or other interactions. In many realistic settings, however, observing all connectio...
Article
Full-text available
The First Trading Agent Competition (TAC) was held from June 22nd to July 8th, 2000. TAC was designed to create a benchmark problem in the complex domain of e-marketplaces and to motivate researchers to apply unique approaches to a common task. This article describes ATTac-2000, the first-place finisher in TAC. ATTac-2000 uses a principled bidding...
Article
Full-text available
Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construc...
Conference Paper
Full-text available
We consider semi-autonomous agents that have uncertain knowledge about their environment, but can ask what action the operator would prefer taking in the current or in a potential future state. Asking queries can help improve behavior, but if queries come at a cost (e.g., due to limited operator attention), the number of queries needs to be minimiz...
Conference Paper
Full-text available
Planning agents often lack the computational resources needed to build full planning trees for their environments. Agent designers commonly overcome this finite-horizon approximation by applying an evaluation function at the leaf-states of the planning tree. Recent work has proposed an alternative approach for overcoming computational constraints o...
Conference Paper
Full-text available
Reinforcement learning (RL) research typically develops algorithms for helping an RL agent best achieve its goals—however they came to be defined—while ignoring the relationship of those goals to the goals of the agent designer. We extend agent design to include the meta-optimization problem of selecting internal agent goals (rewards) which optimiz...
Conference Paper
Full-text available
When its human operator cannot continuously supervise (much less teleoperate) an agent, the agent should be able to recognize its limitations and ask for help when it risks making autonomous decisions that could significantly surprise and disappoint the operator. Inspired by previous research on making exploration-exploitation tradeoff decisions an...
Article
Full-text available
There is great interest in building intrinsic motivation into artificial systems using the reinforcement learning framework. Yet, what intrinsic motivation may mean computationally, and how it may differ from extrinsic motivation, remains a murky and controversial subject. In this paper, we adopt an evolutionary perspective and define a new optimal...
Article
Full-text available
Much of AI is concerned with the design of intelligent agents. A complementary challenge is to understand how to design “rules of encounter” (Rosenschein and Zlotkin 1994) by which to promote simple, robust and beneficial interactions between multiple intelligent agents. This is a natural development, as AI is incre...
Conference Paper
Full-text available
A dynamic model of a multiagent system defines a probability distribution over possible system behaviors over time. Alternative representations for such models present trade-offs in expressive power, and accuracy and cost for inferential tasks of interest. In a history-dependent representation, behavior at a given time is specified as a probabil...
Conference Paper
Full-text available
Learning, planning, and representing knowledge in large state spaces at multiple levels of temporal abstraction are key, long-standing challenges for building flexible autonomous agents. The options framework provides a formal mechanism for specifying and learning temporally-extended skills. Although past work has demonstrated the benefit of acting...
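The temporally-extended skills ("options") mentioned above are conventionally specified by three components: an initiation set, an internal policy, and a termination condition. The sketch below shows that interface on a made-up chain environment; the environment and the specific option are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """An option in the options framework: (I, pi, beta)."""
    initiation: Set[int]            # states where the option may start
    policy: Callable[[int], int]    # maps state -> primitive action
    beta: Callable[[int], bool]     # termination condition

def run_option(opt, s, step):
    """Execute the option until beta fires; return final state and duration."""
    assert s in opt.initiation, "option not available in this state"
    k = 0
    while not opt.beta(s):
        s = step(s, opt.policy(s))
        k += 1
    return s, k

# A "go to state 5" skill on a 0..9 chain (illustrative example).
go_right = Option(initiation=set(range(5)),
                  policy=lambda s: +1,       # always move right
                  beta=lambda s: s == 5)     # terminate at the subgoal
s, k = run_option(go_right, 0, step=lambda s, a: s + a)
print(s, k)  # → 5 5 (reaches state 5 after 5 primitive steps)
```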