About
236
Publications
77,559
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
15,696
Citations
Introduction
Much of my work is at the intersection of computer science and microeconomics, addressing computational problems in economic contexts and incentive issues in multiagent systems. I also study the application of machine learning to the automated design and analysis of algorithms for solving hard computational problems.
Current institution
Additional affiliations
January 2004 - present
September 1998 - June 2003
Publications
Publications (236)
Conspicuous consumption occurs when a consumer derives value from a good based on its social meaning as a signal of wealth, taste, and/or community affiliation. Common conspicuous goods include designer footwear, country club memberships, and artwork; conspicuous goods also exist in the digital sphere, with non-fungible tokens (NFTs) as a prominent...
Models of human behavior in game-theoretic settings often distinguish between strategic behavior, in which a player both reasons about how others will act and best responds to these beliefs, and "level-0" non-strategic behavior, in which they do not respond to explicit beliefs about others. The state of the art for predicting human behavior on unre...
With the rise of online applications, recommender systems (RSs) often encounter constraints in balancing exploration and exploitation. Such constraints arise when exploration is carried out by agents whose individual utility should be balanced with overall welfare. Recent work suggests that recommendations should be individually rational. Specifica...
Motivated by the rapid ascent of Large Language Models (LLMs) and debates about the extent to which they possess human-level qualities, we propose a framework for testing whether any agent (be it a machine or a human) understands a subject matter. In Turing-test fashion, the framework is based solely on the agent's performance, and specifically on...
Utilitarian algorithm configuration is a general-purpose technique for automatically searching the parameter space of a given algorithm to optimize its performance, as measured by a given utility function, on a given set of inputs. Recently introduced utilitarian configuration procedures offer optimality guarantees about the returned parameterizati...
Researchers building behavioral models, such as behavioral game theorists, use experimental data to evaluate predictive models of human behavior. However, there is little agreement about which loss function should be used in evaluations, with error rate, negative log-likelihood, cross-entropy, Brier score, and squared L2 error all being common choi...
Mobile gaming is a rapidly growing and incredibly profitable sector; having grown seven-fold over the past 10 years, it now grosses over $100 billion annually. This growth was due in large part to a shift in monetization strategies: rather than charging players an upfront cost ("pay-to-play"), games often request optional microtransactions througho...
This paper revisits the descending clock “reverse” auction design used in the U.S. Federal Communications Commission’s 2016–2017 “incentive auction.” We use extensive computational simulations to investigate the quantitative significance of various aspects of the design, leveraging a reverse auction simulator and realistic models of bidder values....
Retrieval-Augmented Language Modeling (RALM) methods, which condition a language model (LM) on relevant documents from a grounding corpus during generation, were shown to significantly improve language modeling performance. In addition, they can mitigate the problem of factually inaccurate text generation and provide natural source attribution mech...
Before deploying a language model (LM) within a given domain, it is important to measure its tendency to generate factually incorrect information in that domain. Existing factual generation evaluation methods focus on facts sampled from the LM itself, and thus do not control the set of evaluated facts and might under-represent rare and unlikely fac...
Peer grading systems aggregate noisy reports from multiple students to approximate a "true" grade as closely as possible. Most current systems either take the mean or median of reported grades; others aim to estimate students’ grading accuracy under a probabilistic model. This paper extends the state of the art in the latter approach in three key w...
Behavioral game theorists all use experimental data to evaluate predictive models of human behavior. However, they differ greatly in their choice of loss function for these evaluations, with error rate, negative log-likelihood, cross-entropy, Brier score, and L2 error all being common choices. We attempt to offer a principled answer to the question...
Retrieval-Augmented Language Modeling (RALM) methods, that condition a language model (LM) on relevant documents from a grounding corpus during generation, have been shown to significantly improve language modeling while also providing a natural source attribution mechanism. Existing RALM approaches focus on modifying the LM architecture in order t...
For applications that require processing large amounts of text at inference time, Large Language Models (LLMs) are handicapped by their limited context windows, which are typically 2048 tokens. In-context learning, an emergent phenomenon in LLMs in sizes above a certain parameter threshold, constitutes one significant example because it can only le...
We introduce Monte Carlo Forest Search (MCFS), an offline algorithm for automatically synthesizing strong tree-search solvers for proving \emph{unsatisfiability} on given distributions, leveraging ideas from the Monte Carlo Tree Search (MCTS) algorithm that led to breakthroughs in AlphaGo. The crucial difference between proving unsatisfiability and...
In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society. It was written by a panel of 17 study authors, each of whom is deeply rooted in AI research, chaired by Peter Stone of the U...
Peer grading systems aggregate noisy reports from multiple students to approximate a true grade as closely as possible. Most current systems either take the mean or median of reported grades; others aim to estimate students' grading accuracy under a probabilistic model. This paper extends the state of the art in the latter approach in three key way...
Formulating real-world optimization problems often begins with making predictions from historical data (e.g., an optimizer that aims to recommend fast routes relies upon travel-time predictions). Typically, learning the prediction model used to generate the optimization problem and solving that problem are performed in two separate stages. Recent w...
When trying to solve a computational problem we are often faced with a choice among algorithms that are all guaranteed to return the right answer but that differ in their runtime distributions (e.g., SAT solvers, sorting algorithms). This paper aims to lay theoretical foundations for such choices by formalizing preferences over runtime distribution...
Huge language models (LMs) have ushered in a new era for AI, serving as a gateway to natural-language-based knowledge tasks. Although an essential element of modern AI, LMs are also inherently limited in a number of ways. We discuss these limitations and how they can be avoided by adopting a systems approach. Conceptualizing the challenge as one th...
Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM -- i.e., leaving its we...
This paper studies a novel reviewer-paper matching approach that was recently deployed in the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), and has since been adopted by other conferences including AAAI 2022 and ICML 2022. This approach has three main elements: (1) collecting and processing input data to identify problematic matches...
In many real-world auctions, a bidder does not know her exact value for an item, but can perform a costly deliberation to reduce her uncertainty. Relatively little is known about such deliberative environments, which are fundamentally different from classical auction environments. In this paper, we propose a new approach that allows us to leverage...
Stackelberg security games form the backbone of systems like ARMOR, IRIS and PROTECT, which are in regular use by the Los Angeles International Police, US Federal Air Marshal Service and the US Coast Guard respectively. An understanding of the runtime required by algorithms that power such systems is critical to furthering the application of game t...
Supervised learning models often make systematic errors on rare subsets of the data. However, such systematic errors can be difficult to identify, as model performance can only be broken down across sensitive groups when these groups are known and explicitly labelled. This paper introduces a method for discovering systematic errors, which we call t...
Formulating real-world optimization problems often begins with making predictions from historical data (e.g., an optimizer that aims to recommend fast routes relies upon travel-time predictions). Typically, learning the prediction model used to generate the optimization problem and solving that problem are performed in two separate stages. Recent w...
Mechanical TA 2 (MTA2) is an open source web-based peer grading application that leverages trusted TA graders to incentivize high-quality peer review. A previous, prototype implementation of MTA proved the value of the concept, but was neither suitable for use at scale nor easily extensible; MTA2 is a complete reimplementation of the system that ov...
We study a dynamic non-bipartite matching problem. There is a fixed set of agent types, and agents of a given type arrive and depart according to type-specific Poisson processes. The value of a match is determined by the types of the matched agents. We present an online algorithm that is (1/8)-competitive with respect to the value of the optimal-in...
We study a dynamic non-bipartite matching problem. There is a fixed set of agent types, and agents of a given type arrive and depart according to type-specific Poisson processes. The value of a match is determined by the types of the matched agents. We present an online algorithm that is (1/6)-competitive with respect to the value of the optimal-in...
We consider the problem of wisely using a limited budget to label a small subset of a large unlabeled dataset. We are motivated by the NLP problem of word sense disambiguation. For any word, we have a set of candidate labels from a knowledge base, but the label set is not necessarily representative of what occurs in the data: there may exist labels...
Masking tokens uniformly at random constitutes a common flaw in the pretraining of Masked Language Models (MLMs) such as BERT. We show that such uniform masking allows an MLM to minimize its training objective by latching onto shallow local signals, leading to pretraining inefficiency and suboptimal downstream performance. To address this flaw, we...
In many settings, an effective way of evaluating objects of interest is to collect evaluations from dispersed individuals and to aggregate these evaluations together. Some examples are categorizing online content and evaluating student assignments via peer grading. For this data science problem, one challenge is to motivate participants to conduct...
Instrumental variable methods provide a powerful approach to estimating causal effects in the presence of unobserved confounding. But a key challenge when applying them is the reliance on untestable "exclusion" assumptions that rule out any relationship between the instrument variable and the response that is not mediated by the treatment. In this...
A recent body of work addresses safety constraints in explore-and-exploit systems. Such constraints arise where, for example, exploration is carried out by individuals whose welfare should be balanced with overall welfare. In this paper, we adopt a model inspired by recent work on a bandit-like setting for recommendations. We contribute to this lin...
Strangely enough, it is possible to use machine learning models to predict the satisfiability status of hard SAT problems with accuracy considerably higher than random guessing. Existing methods have relied on extensive, manual feature engineering and computationally complex features (e.g., based on linear programming relaxations). We show for the...
On-street parking is convenient, but has many disadvantages: on-street spots come at the expense of other road uses such as traffic lanes, transit lanes, bike lanes, or parklets; drivers looking for parking contribute substantially to traffic congestion and hence to greenhouse gas emissions; safety is reduced both due to the fact that drivers looki...
In many settings, an effective way of evaluating objects of interest is to collect evaluations from dispersed individuals and to aggregate these evaluations together. Some examples are categorizing online content and evaluating student assignments via peer grading. For this data science problem, one challenge is to motivate participants to conduct...
We consider the problem of a nonprofit organization ("center") that must divide resources among subsidiaries ("agents"), based on agents' reported demand forecasts, with the aim of maximizing social good (agents' valuations for the allocation minus any payments that are imposed on them). We investigate the impact of a common feature of the nonprofi...
Peer grading systems make large courses more scalable, provide students with faster and more detailed feedback, and help students to learn by thinking critically about the work of others. A key obstacle to the broader adoption of peer grading is motivating students to provide accurate grades. To incentivize accurate grading, previous work has consi...
Many different machine learning algorithms exist; taking into account each algorithm’s hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters. We show that this problem can be addressed by a fully automated appro...
Recommendation systems often face exploration-exploitation tradeoffs: the system can only learn about the desirability of new options by recommending them to some user. Such systems can thus be modeled as multi-armed bandit settings; however, users are self-interested and cannot be made to follow recommendations. We ask whether exploration can neve...
Behavioral game theory seeks to describe the way actual people (as compared to idealized, "rational" agents) act in strategic situations. Our own recent work has identified iterative models, such as quantal cognitive hierarchy, as the state of the art for predicting human play in unrepeated, simultaneous-move games. Iterative models predict that ag...
Algorithm configuration methods optimize the performance of a parameterized heuristic algorithm on a given distribution of problem instances. Recent work introduced an algorithm configuration procedure ('Structured Procrastination') that provably achieves near optimal performance with high probability and with nearly minimal runtime in the worst ca...
In 2017, the U.S. Federal Communications Commission completed the world's first two-sided spectrum auction, reclaiming radio frequency spectrum from television broadcasters to meet exploding demand for mobile broadband, 5G, and other wireless services. Operations research-including a customized series of optimization models, decompositions, cuts, h...
Research in multiagent systems often does not draw a clear distinction between strategic behavior and other forms of intentional (e.g., boundedly rational) action, which are broadly lumped under the term "nonstrategic". If we are already confident that all agents in a system are perfectly rational, as often envisioned by game theory, this issue may...
Assessing the progress made in AI and contributions to the state of the art is of major concern to the community. Recently, Frechette et al. [2016] advocated performing such analysis via the Shapley value, a concept from coalitional game theory. In this paper, we argue that while this general idea is sound, it unfairly penalizes older algorithms th...
Every day we read in the scientific and popular press about advances in AI and how AI is changing our lives. Things are moving at a fast pace, with no obvious end in sight.
In recent years the availability of parallel computation resources has grown rapidly. Nevertheless, even for the most widely studied constraint programming problems such as SAT, solver development and applications remain largely focussed on sequential rather than parallel approaches. To ease the burden usually associated with designing, implementin...
We use deep learning to model interactions across two or more sets of objects, such as user-movie ratings or protein-drug bindings. The canonical representation of such interactions is a matrix (or tensor) with an exchangeability property: the encoding's meaning is not changed by permuting rows or columns. We argue that models should hence be Permu...
The optimization of algorithm (hyper-)parameters is crucial for achieving peak performance across a wide range of domains, ranging from deep neural networks to solvers for hard combinatorial problems. The resulting algorithm configuration (AC) problem has attracted much attention from the machine learning community. However, the proper evaluation o...
Algorithm configuration methods have achieved much practical success, but to date have not been backed by meaningful performance guarantees. We address this gap with a new algorithm configuration framework, Structured Procrastination. With high probability and nearly as quickly as possible in the worst case, our framework finds an algorithm configu...
The recent “incentive auction” of the US Federal Communications Commission was the first auction to reallocate radio frequencies between two different kinds of uses: from broadcast television to wireless Internet access. The design challenge was not just to choose market rules to govern a fixed set of potential trades but also, to determine the bro...
We investigate the economic outcomes that result under simulated bidder behavior in a model of the FCC's reverse auction for radio spectrum. In our simulations, limiting our notion of efficiency to the reverse auction in isolation, the reverse clock auction achieves very efficient solutions, the FCC's scoring rule greatly reduces the total payments...
Over 13 months in 2016-17 the FCC conducted an "incentive auction" to repurpose radio spectrum from broadcast television to wireless internet. In the end, the auction yielded $19.8 billion, $10.05 billion of which was paid to 175 broadcasters for voluntarily relinquishing their licenses across 14 UHF channels. Stations that continued broadcasting w...
Over 13 months in 2016-17 the FCC conducted an "incentive auction" to repurpose radio spectrum from broadcast television to wireless internet. In the end, the auction yielded $19.8 billion, $10.05 billion of which was paid to 175 broadcasters for voluntarily relinquishing their licenses across 14 UHF channels. Stations that continued broadcasting w...
The optimization of algorithm (hyper-)parameters is crucial for achieving peak performance across a wide range of domains, ranging from deep neural networks to solvers for hard combinatorial problems. The resulting algorithm configuration (AC) problem has attracted much attention from the machine learning community. However, the proper evaluation o...
After experimentation with other designs, major search engines converged on weighted, generalized second-price auctions (wGSPs) for selling keyword advertisements. Theoretical analysis is still not able to settle the question of why they found this design preferable to other alternatives. We approach this question in a new way, adopting an analytic...
WEKA is a widely used, open-source machine learning platform. Due to its intuitive interface, it is particularly popular with novice users. However, such users often find it hard to identify the best approach for their particular dataset among the many available. We describe the new version of Auto-WEKA, a system designed to help such users by auto...
Computational mechanism analysis is a recent approach to economic analysis in which a mechanism design setting is analyzed entirely by a computer. For games with non-trivial numbers of players and actions, the approach is only feasible when these games can be encoded compactly, e.g., as Action-Graph Games. Such encoding is currently a manual proces...
In many real-world systems, strategic agents' decisions can be understood as complex - i.e., consisting of multiple sub-decisions - and hence can give rise to an exponential number of pure strategies. Examples include network congestion games, simultaneous auctions, and security games. However, agents' sets of strategies are often structured, allow...
We are in the middle of a remarkable rise in the use and capability of artificial intelligence. Much of this growth has been fueled by the success of deep learning architectures: models that map from observables to outputs via multiple layers of latent representations. These deep learning algorithms are effective tools for unstructured prediction,...
In many games, players’ decisions consist of multiple sub-decisions, and hence can give rise to an exponential number of pure strategies. However, this set of pure strategies is often structured, allowing it to be represented compactly, as in network congestion games, security games, and extensive form games. Reduction to the standard normal form g...
Behavioral game theory seeks to describe the way actual people (as compared to idealized, "rational" agents) act in strategic situations. Our own recent work has identified iterative models (such as quantal cognitive hierarchy) as the state of the art for predicting human play in unrepeated, simultaneous-move games (Wright & Leyton-Brown 2012, 2016...
In many settings, an effective way of evaluating objects of interest is to collect evaluations from dispersed individuals and to aggregate these evaluations together. Some examples are categorizing online content and evaluating student assignments via peer grading. For this data science problem, one challenge is to motivate participants to conduct...
A natural way of attacking a new, computationally challenging problem is to find a novel way of combining design elements introduced in existing algorithms. For example, this approach was made systematic in SATenstein [15], a highly parameterized stochastic local search (SLS) framework for SAT that unifies techniques across a wide range of well-kno...
Since 2004, increases in computational power described by Moore's law have substantially been realized in the form of additional cores rather than through faster clock speeds. To make effective use of modern hardware when solving hard computational problems, it is therefore necessary to employ parallel solution strategies. In this work, we demonstr...
Algorithms for NP-complete problems often have different strengths andweaknesses, and thus algorithm portfolios often outperform individualalgorithms. It is surprisingly difficult to quantify a component algorithm's contributionto such a portfolio. Reporting a component's standalone performance wronglyrewards near-clones while penalizing algorithms...
We investigate the problem of repacking stations in the FCC's upcoming, multi-billion-dollar "incentive auction". Early efforts to solve this problem considered mixed-integer programming formulations, which we show are unable to reliably solve realistic, national-scale problem instances. We describe the result of a multi-year investigation of alter...
The task of algorithm selection involves choosing an algorithm from a set of
algorithms on a per-instance basis in order to exploit the varying performance
of algorithms over a set of instances. The algorithm selection problem is
attracting increasing attention from researchers and practitioners in AI. Years
of fruitful applications in a number of...
It is well known that different solution strategies work well for different
types of instances of hard combinatorial problems. As a consequence, most
solvers for the propositional satisfiability problem (SAT) expose parameters
that allow them to be customized to a particular family of instances. In the
international SAT competition series, these pa...
We describe Mechanical TA, an automated peer review system, and report on our experience using it over three years. Mechanical TA differs from many other peer review systems by involving human teaching assistants (TAs) as a way to assure review quality. Human TAs both evaluate the peer reviews of students who have not yet demonstrated reviewing pro...
Hyperparameter optimization is crucial for achieving peak performance with many machine learning algorithms; however, the evaluation of new optimization techniques on real-world hyperparameter optimization problems can be very expensive. Therefore, experiments are often performed using cheap synthetic test functions with characteristics rather diff...
We study two-sided matching markets in which participants are initially endowed with partial preference orderings, lacking precise information about their true, strictly ordered list of preferences. We wish to reason about matchings that are stable with respect to agents' true preferences, and which are furthermore optimal for one given side of the...
Behavioral game theory seeks to describe the way actual people (as compared to idealized, ``rational'' agents) act in strategic situations. Our own recent work has identified iterative models (such as quantal cognitive hierarchy) as the state of the art for predicting human play in unrepeated, simultaneous-move games [Wright and Leyton-Brown 2012]....
Algorithmic Game Theory (AGT) studies problems at the interface between computer science and microeconomics. Research in this area typically attacks very general settings using theoretical tools. There are great advantages to such an approach: in particular, the field has amassed an impressive range of sweeping impossibility, optimality, and approx...
We study two-sided matching markets in which participants are initially endowed with partial preference orderings, lacking precise information about their true, strictly ordered list of preferences. We wish to reason about matchings that are stable with respect to agents' true preferences, and which are furthermore optimal for one given side of the...
State-of-the-art planners often exhibit substantial runtime variation, making it useful to be able to efficiently predict how long a given planner will take to run on a given instance. In other areas of AI, such needs are met by building so-called empirical performance models (EPMs), statistical models derived from sets of problem instances and per...
PROBLEMS ARE INTRACTABLE when they "can be solved, but not fast enough for the solution to be usable."13 NP-complete problems are commonly said to be intractable; however, the reality is more complex. All known algorithms for solving NP-complete problems require exponential time in the worst case; however, these algorithms nevertheless solve many p...
This article makes two contributions. First, we present a platform for running and analyzing multiagent reinforcement learning experiments. Second, to demonstrate this platform we undertook and evaluated an empirical test of multiagent reinforcement learning algorithms from the literature, which to our knowledge is the largest such test ever conduc...
Configuring algorithms automatically to achieve high performance is becoming increasingly relevant and important in many areas of academia and industry. Algorithm configuration methods take a parameterized target algorithm, a performance metric and a set of example data, and aim to find a parameter configuration that performs as well as possible on...
Modern solvers for hard computational problems often expose parameters that permit customization for high performance on specific instance types. Since it is tedious and time-consuming to manually optimize such highly parameterized algorithms, recent work in the AI literature has developed automated approaches for this algorithm configuration probl...
There exist many algorithms for learning how to play repeated bimatrix games. Most of these algorithms are justified in terms of some sort of theoretical guarantee. On the other hand, little is known about the empirical performance of these algorithms. Most such claims in the literature are been based on small experiments, which has hampered unders...