Michael L. Littman
Brown University · Department of Computer Science

PhD

About

299 Publications · 104,473 Reads · 36,978 Citations
Additional affiliations
July 2002 - June 2012
Rutgers, The State University of New Jersey
Position
  • Professor (Full)
January 2000 - March 2002
AT&T
Position
  • Principal member of technical staff

Publications (299)
Article
Trigger-action programming (TAP) empowers a wide array of users to automate Internet of Things (IoT) devices. However, it can be challenging for users to create completely correct trigger-action programs (TAPs) on the first try, necessitating debugging. While TAP has received substantial research attention, TAP debugging has not. In this paper, we...
Conference Paper
Reward is the driving force for reinforcement-learning agents. We here set out to understand the expressivity of Markov reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task": (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial o...
Conference Paper
In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved. Previous studies have alluded to this fact but have not examine...
Article
Full-text available
One of the most striking features of human cognition is the ability to plan. Two aspects of human planning stand out—its efficiency and flexibility. Efficiency is especially impressive because plans must often be made in complex environments, and yet people successfully plan solutions to many everyday problems despite having limited cognitive resou...
Preprint
We employ Proximal Iteration for value-function optimization in reinforcement learning. Proximal Iteration is a computationally efficient technique that enables us to bias the optimization procedure towards more desirable solutions. As a concrete application of Proximal Iteration in deep reinforcement learning, we endow the objective function of th...
Preprint
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over b...
Preprint
Reinforcement learning is hard in general. Yet, in many specific environments, learning is easy. What makes learning easy in one environment, but difficult in another? We address this question by proposing a simple measure of reinforcement-learning hardness called the bad-policy density. This quantity measures the fraction of the deterministic stat...
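As a rough illustration of the flavor of this measure (not the paper's exact definition), the sketch below enumerates all deterministic policies of a tiny, invented two-state MDP and reports the fraction whose start-state value falls more than epsilon below optimal:

```python
# Toy "bad-policy density"-style computation on a made-up 2-state, 2-action MDP.
import itertools
import numpy as np

n_s, n_a, gamma, eps = 2, 2, 0.9, 0.1
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.1, 0.9], [0.8, 0.2]]])  # P[a, s, s'] transition probabilities
R = np.array([[1.0, 0.0], [0.0, 0.5]])    # R[s, a] rewards

def value(pi):
    # Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi.
    P_pi = np.array([P[pi[s], s] for s in range(n_s)])
    R_pi = np.array([R[s, pi[s]] for s in range(n_s)])
    return np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)

policies = list(itertools.product(range(n_a), repeat=n_s))
start_values = {pi: value(pi)[0] for pi in policies}
v_star = max(start_values.values())
density = sum(v < v_star - eps for v in start_values.values()) / len(policies)
print(f"bad-policy density: {density:.2f}")
```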
Preprint
Full-text available
Fluid human-agent communication is essential for the future of human-in-the-loop reinforcement learning. An agent must respond appropriately to feedback from its human trainer even before they have significant experience working together. Therefore, it is important that learning agents respond well to various feedback schemes human trainers are lik...
Article
Full-text available
Theory of mind enables an observer to interpret others' behavior in terms of unobservable beliefs, desires, intentions, feelings, and expectations about the world. This also empowers the person whose behavior is being observed: By intelligently modifying her actions, she can influence the mental representations that an observer ascribes to her, and...
Article
We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These the...
Article
A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis...
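A minimal PyTorch sketch of the general idea as described in this abstract; the layer sizes, the Gaussian kernel, and all hyperparameters below are assumptions for illustration, not details taken from the paper:

```python
import torch
import torch.nn as nn

class RBFQNetwork(nn.Module):
    """Q(s, a) as an RBF mixture over learned, state-dependent action centers."""
    def __init__(self, state_dim, action_dim, n_centers=16, beta=10.0):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.centers = nn.Linear(64, n_centers * action_dim)  # action centers
        self.values = nn.Linear(64, n_centers)                # value at each center
        self.n_centers, self.action_dim, self.beta = n_centers, action_dim, beta

    def forward(self, state, action):
        h = self.trunk(state)
        c = self.centers(h).view(-1, self.n_centers, self.action_dim)
        v = self.values(h)
        # Gaussian RBF weights from the distance between the query action and
        # each center; Q is the weighted average of the center values.
        d2 = ((action.unsqueeze(1) - c) ** 2).sum(-1)
        w = torch.softmax(-self.beta * d2, dim=1)
        return (w * v).sum(-1)
```

One appeal of this output form is that a near-greedy action can be found cheaply by evaluating Q at the predicted centers instead of solving a continuous optimization over actions.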
Article
In this work, we propose and explore Deep Graph Value Network (DeepGV) as a promising method to work around sample complexity in deep reinforcement-learning agents using a message-passing mechanism. The main idea is that the agent should be guided by structured non-neural-network algorithms like dynamic programming. According to recent advances in...
Preprint
One of the most striking features of human cognition is the capacity to plan. Two aspects of human planning stand out: its efficiency, even in complex environments, and its flexibility, even in changing environments. Efficiency is especially impressive because directly computing an optimal plan is intractable, even for modestly complex tasks, and y...
Article
Two common approaches for automating IoT smart spaces are having users write rules using trigger-action programming (TAP) or training machine learning models based on observed actions. In this paper, we unite these approaches. We introduce and evaluate Trace2TAP, a novel method for automatically synthesizing TAP rules from traces (time-stamped logs...
Preprint
Planning is useful. It lets people take actions that have desirable long-term consequences. But, planning is hard. It requires thinking about consequences, which consumes limited computational and cognitive resources. Thus, people should plan their actions, but they should also be smart about how they deploy resources used for planning their action...
Preprint
Can simple algorithms with a good representation solve challenging reinforcement learning problems? In this work, we answer this question in the affirmative, where we take "simple learning algorithm" to be tabular Q-Learning, the "good representations" to be a learned state abstraction, and "challenging problems" to be continuous control tasks. Our...
Preprint
A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned state-action value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep RBF value functions: state-action value functions learned using a deep neural network wit...
Preprint
We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These the...
Chapter
Because machine learning would benefit from reduced data requirements, some prior work has proposed using humans not just to label data, but also to explain those labels. To characterize the evidence humans might want to provide, we conducted a user study and a data experiment. In the user study, 75 participants provided classification labels for 2...
Preprint
In order for robots and other artificial agents to efficiently learn to perform useful tasks defined by an end user, they must understand not only the goals of those tasks, but also the structure and dynamics of that user's environment. While existing work has looked at how the goals of a task can be inferred from a human teacher, the agent is ofte...
Article
Human social behavior is structured by relationships. We form teams, groups, tribes, and alliances at all scales of human life. These structures guide multi-agent cooperation and competition, but when we observe others these underlying relationships are typically unobservable and hence must be inferred. Humans make these inferences intuitively and...
Article
Abstraction can give rise to models of environments that are both compressed and useful, thereby enabling efficient sequential decision making. In this work, we offer the first formalism and analysis of the trade-off between compression and performance made in the context of state abstraction for Apprenticeship Learning. We build on Rate-Distortion theory,...
Preprint
Agents that can make better use of computation, experience, time, and memory can solve a greater range of problems more effectively. A crucial ingredient for managing such finite resources is intelligently chosen abstract representations. But, how do abstractions facilitate problem solving under limited resources? What makes an abstraction useful?...
Conference Paper
Trigger-action programming (TAP) is a programming model enabling users to connect services and devices by writing if-then rules. As such systems are deployed in increasingly complex scenarios, users must be able to identify programming bugs and reason about how to fix them. We first systematize the temporal paradigms through which TAP systems could...
Article
Full-text available
Carrots and sticks motivate behavior, and people can teach new behaviors to other organisms, such as children or nonhuman animals, by tapping into their reward learning mechanisms. But how people teach with reward and punishment depends on their expectations about the learner. We examine how people teach using reward and punishment by contrasting t...
Preprint
Theory of mind enables an observer to interpret others’ behavior in terms of unobservable beliefs, desires, intentions, feelings, and expectations about the world. This also empowers the person whose behavior is being observed: By intelligently modifying her actions, she can influence the mental representations that an observer ascribes to her, and...
Preprint
Human social behavior is structured by relationships. We form teams, groups, tribes, and alliances at all scales of human life. These structures guide multi-agent cooperation and competition, but when we observe others these underlying relationships are typically unobservable and hence must be inferred. Humans make these inferences intuitively and...
Article
Existing work in machine learning has shown that algorithms can benefit from the use of curricula—learning first on simple examples before moving to more difficult problems. This work studies the curriculum-design problem in the context of sequential decision tasks, analyzing how different curricula affect learning in a Sokoban-like domain, and pre...
Article
Significance: Two aims of behavioral science, often pursued separately, are to model the evolutionary dynamics and cognitive mechanisms governing behavior in social conflicts. Combining these approaches, we show that the dynamics of proximate mechanisms such as reward learning can play a pivotal role in determining evolutionary outcomes. We focus on...
Conference Paper
This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has made the assumption that people provide feedback for decisions that is dependent on the behavior they are teaching and is independent from the learner's current policy....
Article
Full-text available
Background: Computer-delivered interventions have been shown to be effective in reducing alcohol consumption in heavy drinking college students. However, these computer-delivered interventions rely on mouse, keyboard, or touchscreen responses for interactions between the users and the computer-delivered intervention. The principles of motivational i...
Article
We propose a new task-specification language for Markov decision processes that is designed to be an improvement over reward functions by being environment independent. The language is a variant of Linear Temporal Logic (LTL) that is extended to probabilistic specifications in a way that permits approximations to be learned in finite time. We provi...
Article
Humans often attempt to influence one another’s behavior using rewards and punishments. How does this work? Psychologists have often assumed that “evaluative feedback” influences behavior via standard learning mechanisms that learn from environmental contingencies. On this view, teaching with evaluative feedback involves leveraging learning systems...
Article
For agents and robots to become more useful, they must be able to quickly learn from non-technical users. This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has made the assumption that people provide feedback for decisions...
Conference Paper
While researchers have long investigated end-user programming using a trigger-action (if-then) model, the website IFTTT is among the first instances of this paradigm being used on a large scale. To understand what IFTTT users are creating, we scraped the 224,590 programs shared publicly on IFTTT as of September 2015 and are releasing this dataset t...
Conference Paper
Full-text available
Successfully navigating the social world requires reasoning about both high-level strategic goals, such as whether to cooperate or compete, and the low-level actions needed to achieve those goals. While previous work in experimental game theory has examined the former and work on multi-agent systems has examined the latter, there has been lit...
Article
Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well...
Article
For real-world applications, virtual agents must be able to learn new behaviors from non-technical users. Positive and negative feedback are an intuitive way to train new behaviors, and existing work has presented algorithms for learning from such feedback. That work, however, treats feedback as numeric reward to be maximized, and assumes that all...
Article
We investigate the practicality of letting average users customize smart-home devices using trigger-action ("if, then") programming. We find trigger-action programming can express most desired behaviors submitted by participants in an online study. We identify a class of triggers requiring machine learning that has received little attention. We eva...
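For readers unfamiliar with the model, here is a toy sketch of trigger-action rules; the event fields and device behavior are invented for illustration, and real TAP platforms such as IFTTT use structured service triggers rather than arbitrary predicates:

```python
# A minimal trigger-action ("if, then") rule engine with an invented example.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    trigger: Callable[[dict], bool]  # predicate over an event dictionary
    action: Callable[[], None]

rules = [
    Rule(trigger=lambda ev: ev.get("motion") and ev.get("hour", 0) >= 22,
         action=lambda: print("turn on hallway light")),
]

def on_event(event):
    for rule in rules:
        if rule.trigger(event):
            rule.action()

on_event({"motion": True, "hour": 23})  # -> turn on hallway light
```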
Article
Full-text available
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms. We argue that, although MDPs can be solved efficiently in...
Article
Full-text available
Most exact algorithms for general partially observable Markov decision processes (POMDPs) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving this problem and compare them to earlier algorithms...
Article
Full-text available
In this work, we introduce graphical models for multi-player game theory, and give powerful algorithms for computing their Nash equilibria in certain cases. An n-player game is given by an undirected graph on n nodes and a set of n local matrices. The interpretation is that the payoff to player i is determined entirely by the actions of player i and...
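A small sketch of the representation itself (the chain graph and random payoffs are invented; the paper's contribution is equilibrium algorithms on top of this structure):

```python
import numpy as np

n_actions = 2
neighbors = {0: [1], 1: [0, 2], 2: [1]}  # a 3-player chain: 0 - 1 - 2
rng = np.random.default_rng(0)
# payoff[i] is indexed by (own action, *neighbor actions): its size grows
# with player i's degree, not with the total number of players.
payoff = {i: rng.random((n_actions,) * (1 + len(nbrs)))
          for i, nbrs in neighbors.items()}

def best_response(i, joint):
    """Player i's best action given everyone else's current actions."""
    def u(a):
        return payoff[i][(a, *(joint[j] for j in neighbors[i]))]
    return max(range(n_actions), key=u)

joint = [0, 0, 0]
print([best_response(i, joint) for i in range(3)])
```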
Article
Full-text available
This paper considers the problem of learning task specific web-service descriptions from traces of users successfully completing a task. Unlike prior approaches, we take a traditional machine-learning perspective to the construction of web-service models from data. Our representation models both syntactic features of web-service schemas (including...
Article
Full-text available
Two-player complete-information game trees are perhaps the simplest possible setting for studying general-sum games and the computational problem of finding equilibria. These games admit a simple bottom-up algorithm for finding subgame perfect Nash equilibria efficiently. However, such an algorithm can fail to identify optimal equilibria, such as t...
Article
Full-text available
Model-based learning algorithms have been shown to use experience efficiently when learning to solve Markov Decision Processes (MDPs) with finite state and action spaces. However, their high computational cost due to repeatedly solving an internal model inhibits their use in large-scale problems. We propose a method based on real-time dynamic progr...
Article
Full-text available
Continuous state spaces and stochastic, switching dynamics characterize a number of rich, real-world domains, such as robot navigation across varying terrain. We describe a reinforcement-learning algorithm for learning in these domains and prove for certain environments the algorithm is probably approximately correct with a sample complexity that sca...
Article
Full-text available
We present a polynomial-time algorithm that always finds an (approximate) Nash equilibrium for repeated two-player stochastic games. The algorithm exploits the folk theorem to derive a strategy profile that forms an equilibrium by buttressing mutually beneficial behavior with threats, where possible. One component of our algorithm efficiently searc...
Conference Paper
Full-text available
This paper addresses the problem of training an artificial agent to follow verbal instructions representing high-level tasks using a set of instructions paired with demonstration traces of appropriate behavior. From this data, a mapping from instructions to tasks is learned, enabling the agent to carry out new instructions in novel environments.
Conference Paper
Full-text available
This article presents a population-based cognitive hierarchy model that can be used to estimate the reasoning depth and sophistication of a collection of opponents' strategies from observed behavior in repeated games. This framework provides a compact representation of a distribution of complicated strategies by reducing them to a small number of p...
Article
Full-text available
This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforce...
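To convey the KWIK contract concretely: the learner must either make an accurate prediction or explicitly answer "I don't know." The sketch below is a simplified stand-in that gates predictions on whether the query lies in the span of observed inputs; it is not the published algorithm's actual criterion, and it assumes noise-free labels:

```python
import numpy as np

class KWIKLinearRegression:
    """Toy KWIK-style learner: predict only when the answer is determined."""
    def __init__(self, dim, tol=1e-6):
        self.X, self.y = np.empty((0, dim)), np.empty(0)
        self.tol = tol

    def predict(self, x):
        if len(self.X) == 0:
            return None  # "I don't know"
        # Predict only if x is (numerically) in the span of observed inputs.
        coeffs, *_ = np.linalg.lstsq(self.X.T, x, rcond=None)
        if np.linalg.norm(self.X.T @ coeffs - x) > self.tol:
            return None
        w, *_ = np.linalg.lstsq(self.X, self.y, rcond=None)
        return float(x @ w)

    def observe(self, x, y):
        self.X = np.vstack([self.X, x])
        self.y = np.append(self.y, y)
```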
Article
Full-text available
We present a modular approach to reinforcement learning that uses a Bayesian representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set), drives exploration by sampling multiple models from the posterior and selecting actions optimistically. It extends previous work by providing a rule for deciding when to resample and h...
Article
The Communications Web site, http://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we'll publish selected posts or excerpts...
Article
Full-text available
Bayes-optimal behavior, while well-defined, is often difficult to achieve. Recent advances in the use of Monte-Carlo tree search (MCTS) have shown that it is possible to act near-optimally in Markov Decision Processes (MDPs) with very large or infinite state spaces. Bayes-optimal behavior in an unknown MDP is equivalent to optimal behavior in the k...
Article
Full-text available
Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number for the reachable belief space, which...
Article
Full-text available
We show that the problem of finding an optimal stochastic 'blind' controller in a Markov decision process is an NP-hard problem. The corresponding decision problem is NP-hard, in PSPACE, and SQRT-SUM-hard, hence placing it in NP would imply breakthroughs in long-standing open problems in computer science. Our result establishes that the more genera...
Conference Paper
Full-text available
This paper examines how the choice of the selection mechanism in an evolutionary algorithm impacts the objective function it optimizes, specifically when the fitness function is noisy. We provide formal results showing that, in an abstract infinite-population model, proportional selection optimizes expected fitness, truncation selection optimizes o...
Conference Paper
Full-text available
Although household devices and home appliances function more and more as network-connected computers, they don’t provide programming interfaces for the average user. We first identify the programming primitives and control structures necessary for the universal programming of devices. We then propose a mapping between the features necessary for the...
Article
Full-text available
The special issue of Machine Learning focused on empirical evaluations in reinforcement learning. It was demonstrated that reinforcement-learning approaches were evaluated in one or more of three ways: subjectively, theoretically, or empirically. Subjective evaluations, in which researchers assessed the significance and pot...
Article
The nodes in a wireless ad hoc network act as routers in a self-configuring network without infrastructure. An application running on the nodes in the ad hoc network may require that intermediate nodes act as routers, receiving and forwarding data packets to other nodes to overcome the limitations of noise, router congestion and limited transmissio...
Article
Full-text available
For the problem of model choice in linear regression, we introduce a Bayesian adaptive sampling algorithm (BAS) that samples models without replacement from the space of models. For problems that permit enumeration of all models, BAS is guaranteed to enumerate the model space in 2^p iterations where p is the number of potential variables under cons...
Article
Full-text available
The First Trading Agent Competition (TAC) was held from June 22nd to July 8th, 2000. TAC was designed to create a benchmark problem in the complex domain of e-marketplaces and to motivate researchers to apply unique approaches to a common task. This article describes ATTac-2000, the first-place finisher in TAC. ATTac-2000 uses a principled bidding...
Article
Lexicographic preference models (LPMs) are an intuitive representation that corresponds to many real-world preferences exhibited by human decision makers. Previous algorithms for learning LPMs produce a “best guess” LPM that is consistent with the observations. Our approach is more democratic: we do not commit to a single LPM. Instead, we approxima...
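For context, a lexicographic preference model compares options attribute by attribute in importance order, and the first attribute on which two options differ decides the preference. The sketch below (with invented attributes) shows a single such model; the paper instead maintains an ensemble over candidate models rather than committing to one:

```python
def lpm_prefers(a, b, attribute_order, better):
    """Return True if option `a` is lexicographically preferred to `b`."""
    for attr in attribute_order:
        if a[attr] != b[attr]:
            return better[attr](a[attr], b[attr])
    return False  # indifferent: options agree on every attribute

order = ["safety", "price", "color"]
better = {"safety": lambda x, y: x > y,
          "price": lambda x, y: x < y,   # lower price preferred
          "color": lambda x, y: x == "blue"}

car1 = {"safety": 5, "price": 20000, "color": "red"}
car2 = {"safety": 5, "price": 18000, "color": "blue"}
print(lpm_prefers(car2, car1, order, better))  # True: same safety, cheaper
```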
Article
Full-text available
We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK (knows what it knows) framework was designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to, as is true in reinforcement-learning and active-learning...
Article
A good (a professional soft-serve ice cream maker, as it turns out) is being raffled off by an auctioneer. Bidders can buy tickets in the raffle for $1 apiece and at the end of the raffle the good is assigned to one of the bidders. The probability of assigning the good to a given bidder is proportional to the number of tickets he or she bought. Fiv...
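The allocation rule is simple to state numerically; with invented ticket counts:

```python
# Worked example of the raffle rule described above: winning probability is
# proportional to the number of tickets a bidder holds. Counts are illustrative.
tickets = {"alice": 3, "bob": 1, "carol": 6}
total = sum(tickets.values())
win_prob = {bidder: n / total for bidder, n in tickets.items()}
print(win_prob)  # {'alice': 0.3, 'bob': 0.1, 'carol': 0.6}
```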
Article
Most Relevant Explanation (MRE) is the problem of finding a partial instantiation of a set of target variables that maximizes the generalized Bayes factor as the explanation for given evidence in a Bayesian network. MRE has a huge solution space and is extremely difficult to solve in large Bayesian networks. In this paper, we first prove that MRE i...
Article
Full-text available
Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construc...
Article
Full-text available
The nodes in a wireless ad hoc network act as routers in a self-configuring network without infrastructure. An application running on the nodes in the ad hoc network may require that intermediate nodes act as routers, receiving and forwarding data packets to other nodes to overcome the limitations of noise, router congestion and limited transmissio...
Conference Paper
Full-text available
In this paper, we present a new algorithm that integrates recent advances in solving continuous bandit problems with sample-based rollout methods for planning in Markov Decision Processes (MDPs). Our algorithm, Hierarchical Optimistic Optimization applied to Trees (HOOT) addresses planning in continuous-action MDPs. Empirical results are given that...
Conference Paper
Full-text available
In this paper, we apply tools from inverse reinforcement learning (IRL) to the problem of learning from (unlabeled) demonstration trajectories of behavior generated by varying "intentions" or objectives. We derive an EM approach that clusters observed trajectories by inferring the objectives for each cluster using any of several possible IRL method...
Conference Paper
Full-text available
The field of multiagent decision making is extending its tools from classical game theory by embracing reinforcement learning, statistical analysis, and opponent modeling. For example, behavioral economists conclude from experimental results that people act according to levels of reasoning that form a "cognitive hierarchy" of strategies, rather tha...
Data
Full-text available
Giving Everyone A Reason To Program: Our programming-based interface allows end users to do MORE than they can with a standard button-style interface. End users should be able to accomplish the same tasks with the traditional interface, only FASTER. People will be able to learn to do new tasks SOONER because the programming interface is the same acros...
Chapter
Full-text available
Lexicographic preference models (LPMs) are one of the simplest yet most commonly used preference representations. In this chapter, we formally define LPMs and present learning algorithms for mining these models from data. In particular, we study a greedy algorithm that produces a "best guess" LPM that is consistent with the observations and two vot...
Article
The sample complexity of a reinforcement-learning algorithm is highly coupled to how proficiently it explores, which in turn depends critically on the effective size of its state space. This paper proposes a new exploration mechanism for model-based algorithms in continuous state spaces that automatically discovers the relevant dimensions of the en...
Conference Paper
Full-text available
Q-learning in single-agent environments is known to converge in the limit given sufficient exploration. The same algorithm has been applied, with some success, in multiagent environments, where traditional analysis techniques break down. Using established dynamical systems methods, we derive and study an idealization of Q-learning in 2-player 2-act...
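As a companion to the idealization studied here, one can simulate the underlying stochastic process directly: two independent Boltzmann Q-learners in a 2x2 game. The Prisoner's Dilemma payoffs, learning rate, and temperature below are assumptions chosen for illustration:

```python
import numpy as np

# payoff[a1, a2] = (reward to player 1, reward to player 2); action 0 = cooperate.
payoff = np.array([[[3, 3], [0, 5]],
                   [[5, 0], [1, 1]]])
Q = np.zeros((2, 2))  # Q[player, action]; stateless repeated game
alpha, temp, rng = 0.05, 0.5, np.random.default_rng(0)

def act(p):
    probs = np.exp(Q[p] / temp)
    probs /= probs.sum()          # Boltzmann (softmax) exploration
    return rng.choice(2, p=probs)

for _ in range(20000):
    a = [act(0), act(1)]
    r = payoff[a[0], a[1]]
    for p in range(2):
        Q[p, a[p]] += alpha * (r[p] - Q[p, a[p]])

print(Q)  # with these payoffs, defection (action 1) comes to dominate
```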
Conference Paper
Full-text available
This paper develops a generalized apprenticeship learning protocol for reinforcement-learning agents with access to a teacher who provides policy traces (transition and reward observations). We characterize sufficient conditions of the underlying models for efficient apprenticeship learning and link these criteria to two established learnability clas...
Conference Paper
Full-text available
Recent advancements in model-based reinforcement learning have shown that the dynamics of many structured domains (e.g. DBNs) can be learned with tractable sample complexity, despite their exponentially large state spaces. Unfortunately, these algorithms all require access to a planner that computes a near optimal policy, and while many t...
Article
Full-text available
One of the key problems in reinforcement learning (RL) is balancing exploration and exploitation. Another is learning and acting in large Markov decision processes (MDPs) where compact function approximation has to be used. This paper introduces REKWIRE, a provably efficient, model-free algorithm for finite-horizon RL problems with value function a...
Conference Paper
Full-text available
We describe the "Great Insights in Computer Science" courses that are taught at Rutgers and UMBC. These courses were designed independently, but have in common a broad, engaging introduction to computing for non-majors. Both courses include a programming component to help the students gain an intuition for computational concepts, but neither is pri...