Thorsten Joachims

Thorsten Joachims
Cornell University | CU · Department of Computer Science

About

256
Publications
61,188
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
53,629
Citations

Publications

Publications (256)
Preprint
Many large-scale recommender systems consist of two stages, where the first stage focuses on efficiently generating a small subset of promising candidates from a huge pool of items for the second-stage model to curate final recommendations from. In this paper, we investigate how to ensure groups fairness to the items in this two-stage paradigm. In...
Preprint
Full-text available
Conventional methods for query autocompletion aim to predict which completed query a user will select from a list. A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user beh...
Preprint
Full-text available
Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems, since they allow efficient reuse of existing log data. However, there are fundamental limits to using existing log data alone, since the counterfactual estimators that are commonly used in these methods can have large bias and la...
Preprint
Many selection processes such as finding patients qualifying for a medical trial or retrieval pipelines in search engines consist of multiple stages, where an initial screening stage focuses the resources on shortlisting the most promising candidates. In this paper, we investigate what guarantees a screening classifier can provide, independently of...
Article
In recent years, a new line of research has taken an interventional view of recommender systems, where recommendations are viewed as actions that the system takes to have a desired effect. This interventional view has led to the development of counterfactual inference techniques for evaluating and optimizing recommendation policies. This article ex...
Conference Paper
Rankings are the primary interface through which many online platforms match users to items (e.g. news, products, music, video). In these two-sided markets, not only do the users draw utility from the rankings, but the rankings also determine the utility (e.g. exposure, revenue) for the item providers (e.g. publishers, sellers, artists, studios). I...
Preprint
Full-text available
Fairness has emerged as an important consideration in algorithmic decision-making. Unfairness occurs when an agent with higher merit obtains a worse outcome than an agent with lower merit. Our central point is that a primary cause of unfairness is uncertainty. A principal or algorithm making decisions never has access to the agents' true merit, and...
Preprint
Based on the success of recommender systems in e-commerce, there is growing interest in their use in matching markets (e.g., labor). While this holds potential for improving market fluidity and fairness, we show in this paper that naively applying existing recommender systems to matching markets is sub-optimal. Considering the standard process wher...
Preprint
Contextual bandit algorithms have become widely used for recommendation in online systems (e.g. marketplaces, music streaming, news), where they now wield substantial influence on which items get exposed to the users. This raises questions of fairness to the items -- and to the sellers, artists, and writers that benefit from this exposure. We argue...
Preprint
Ranking items by their probability of relevance has long been the goal of conventional ranking systems. While this maximizes traditional criteria of ranking performance, there is a growing understanding that it is an oversimplification in online platforms that serve not only a diverse user population, but also the producers of the items. In particu...
Preprint
Full-text available
Learning effective contextual-bandit policies from past actions of a deployed system is highly desirable in many settings (e.g. voice assistants, recommendation, search), since it enables the reuse of large amounts of log data. State-of-the-art methods for such off-policy learning, however, are based on inverse propensity score (IPS) weighting. A k...
Preprint
Full-text available
Rankings are the primary interface through which many online platforms match users to items (e.g. news, products, music, video). In these two-sided markets, not only the users draw utility from the rankings, but the rankings also determine the utility (e.g. exposure, revenue) for the item providers (e.g. publishers, sellers, artists, studios). It h...
Preprint
In offline reinforcement learning (RL), the goal is to learn a successful policy using only a dataset of historical interactions with the environment, without any additional online interactions. This serves as an extreme test for an agent's ability to effectively use historical data, which is critical for efficient RL. Prior work in offline RL has...
Preprint
Addressing unfairness in rankings has become an increasingly important problem due to the growing influence of rankings in critical decision making, yet existing learning-to-rank algorithms suffer from multiple drawbacks when learning fair ranking policies from implicit feedback. Some algorithms suffer from extrinsic reasons of unfairness due to in...
Conference Paper
The REVEAL workshop¹ focuses on framing the recommendation problem as a one of making personalized interventions. Moreover, these interventions sometimes depend on each other, where a stream of interactions occurs between the user and the system, and where each decision to recommend something will have an impact on future steps and long-term reward...
Conference Paper
Accurate estimates of examination bias are crucial for unbiased learning-to-rank from implicit feedback in search engines and recommender systems, since they enable the use of Inverse Propensity Score (IPS) weighting techniques to address selection biases and missing data. Unfortunately, existing examination-bias estimators are limited to the Posit...
Conference Paper
Implicit feedback (e.g., click, dwell time) is an attractive source of training data for Learning-to-Rank, but its naive use leads to learning results that are distorted by presentation bias. For the special case of optimizing average rank for linear ranking functions, however, the recently developed SVM-PropRank method has shown that counterfactua...
Preprint
Full-text available
Conventional Learning-to-Rank (LTR) methods optimize the utility of the rankings to the users, but they are oblivious to their impact on the ranked items. However, there has been a growing understanding that the latter is important to consider for a wide range of ranking applications (e.g. online marketplaces, job placement, admissions). To address...
Conference Paper
Recommender systems rely heavily on the predictive accuracy of the learning algorithm. Most work on improving accuracy has focused on the learning algorithm itself. We argue that this algorithmic focus is myopic. In particular, since learning algorithms generally improve with more and better data, we propose shaping the feedback generation process...
Preprint
Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal. While it was recently shown how counterfactual learning-to-rank (LTR) approaches \cite{Joachims/etal/17a} can provably overcome presentation bias when observation propensities are known, it remains to show how...
Preprint
Full-text available
The ability to perform offline A/B-testing and off-policy learning using logged contextual bandit feedback is highly desirable in a broad range of applications, including recommender systems, search engines, ad placement, and personalized health care. Both offline A/B-testing and off-policy learning require a counterfactual estimator that evaluates...
Preprint
Full-text available
Accurate estimates of examination bias are crucial for unbiased learning-to-rank from implicit feedback in search engines and recommender systems, since they enable the use of Inverse Propensity Score (IPS) weighting techniques to address selection biases and missing data \citep{Joachims/etal/17a}. Unfortunately, existing examination-bias estimator...
Conference Paper
The inaugural REVEAL workshop¹ focuses on revisiting the offline evaluation problem for recommender systems. Being able to perform offline experiments is key to rapid innovation; however practitioners often observe significant differences between offline results and the outcome of an online experiment, where users are actually exposed to the result...
Conference Paper
Full-text available
Rankings are ubiquitous in the online world today. As we have transitioned from finding books in libraries to ranking products, jobs, job applicants, opinions and potential romantic partners, there is a substantial precedent that ranking systems have a responsibility not only to their users but also to the items being ranked. To address these often...
Conference Paper
Full-text available
Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user-centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how...
Preprint
Full-text available
Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal with uninformative signals due to position in the ranking, saliency, and other presentation factors. While it was recently shown how counterfactual learning-to-rank (LTR) approaches \cite{Joachims/etal/17a} can...
Preprint
Full-text available
Implicit feedback (e.g., clicks, dwell times) is an attractive source of training data for Learning-to-Rank, but it inevitably suffers from biases such as position bias. It was recently shown how counterfactual inference techniques can provide a rigorous approach for handling these biases, but existing methods are restricted to the special case of...
Article
Full-text available
Recommender systems rely heavily on the predictive accuracy of the learning algorithm. Most work on improving accuracy has focused on the learning algorithm itself. We argue that this algorithmic focus is myopic. In particular, since learning algorithms generally improve with more and better data, we propose shaping the feedback generation process...
Article
Full-text available
Rankings are ubiquitous in the online world today. As we have transitioned from finding books in libraries to ranking products, jobs, job applicants, opinions and potential romantic partners, there is a substantial precedent that ranking systems have a responsibility not only to their users but also to the items being ranked. To address these often...
Conference Paper
Full-text available
Any learning algorithm for recommendation faces a fundamental trade-off between exploiting partial knowledge of a user»s interests to maximize satisfaction in the short term and discovering additional user interests to maximize satisfaction in the long term. To enable discovery, a machine learning algorithm typically elicits feedback on items it is...
Conference Paper
Full-text available
Accurately evaluating new policies (e.g. ad-placement models, ranking functions, recommendation functions) is one of the key prerequisites for improving interactive systems. While the conventional approach to evaluation relies on online A/B tests, recent work has shown that counterfactual estimators can provide an inexpensive and fast alternative,...
Article
Full-text available
This paper examines the reliability of implicit feedback generated from clickthrough data in WWW search. Analyzing the users' decision process using eyetracking and comparing implicit feedback against manual relevance judgments, we conclude that clicks are informative but biased. While this makes the interpretation of clicks as absolute relevance j...
Article
Full-text available
We present an interface that can be leveraged to quickly and effortlessly elicit people's preferences for visual stimuli, such as photographs, visual art and screensavers, along with rich side-information about its users. We plan to employ the new interface to collect dense recommender datasets that will complement existing sparse industry-scale da...
Conference Paper
Full-text available
Online marketplaces, search engines, and databases employ aggregated social information to rank their content for users. Two ranking heuristics commonly implemented to order the available options are the average review score and item popularity—that is, the number of users who have experienced an item. These rules, although easy to implement, only...
Article
Full-text available
Online marketplaces, search engines, and databases employ aggregated social information to rank their content for users. Two ranking heuristics commonly implemented to order the available options are the average review score and item popularity-that is, the number of users who have experienced an item. These rules, although easy to implement, only...
Article
Full-text available
Accurately evaluating new policies (e.g. ad-placement models, ranking functions, recommendation functions) is one of the key prerequisites for improving interactive systems. While the conventional approach to evaluation relies on online A/B tests, recent work has shown that counterfactual estimators can provide an inexpensive and fast alternative,...
Conference Paper
Full-text available
Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how...
Article
Full-text available
The ability to perform effective off-policy learning would revolutionize the process of building better interactive systems, such as search engines and recommendation systems for e-commerce, computational advertising and news. Recent approaches for off-policy evaluation and learning in these settings appear promising. With this paper, we provide re...
Conference Paper
Full-text available
Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge. Unlike traditional approaches that make this selection deterministically, probabilistic sampling enables the design of estimators that are provably unbiased even when reusing data with missing judgments. In t...
Article
Full-text available
Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how...
Conference Paper
Full-text available
In the study of human learning, there is broad evidence that our ability to retain information improves with repeated exposure and decays with delay since last exposure. This plays a crucial role in the design of educational software, leading to a trade-off between teaching new material and reviewing what has already been taught. A common way to ba...
Conference Paper
Full-text available
We present a general probabilistic framework for predicting the outcome of pairwise matchups (e.g. two-player sport matches) and pairwise preferences (e.g. product preferences), both of which have widespread applications ranging from matchmaking in computer games to recommendation in e-commerce. Unlike existing models for these tasks, our model not...
Chapter
Full-text available
Peer assessment is the most common approach to evaluating scientific work, and it is also gaining popularity for scaling evaluation of student work in large and distributed classes. The key idea is that each peer reviewer or grader rates a relatively small subset of the items, and that some method of manual, semi-automatic, or fully-automatic aggre...
Conference Paper
Full-text available
Students in online courses generate large amounts of data that can be used to personalize the learning process and improve quality of education. In this paper, we present the Latent Skill Embedding (LSE), a probabilistic model of students and educational content that can be used to recommend personalized sequences of lessons with the goal of helpin...
Article
Full-text available
Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge. Unlike traditional approaches that make this selection deterministically, probabilistic sampling has shown intriguing promise since it enables the design of estimators that are provably unbiased even when reu...
Conference Paper
Full-text available
In this paper, we study shortlists as an interface component for recommender systems with the dual goal of supporting the user's decision process, as well as improving implicit feedback elicitation for increased recommendation quality. A shortlist is a temporary list of candidates that the user is currently considering, e.g., a list of a few movies...
Article
Full-text available
In the study of human learning, there is broad evidence that our ability to retain a piece of information improves with repeated exposure, and that it decays with delay since the last exposure. This plays a crucial role in the design of educational software, leading to a trade-off between teaching new material and reviewing what has already been ta...
Article
Full-text available
Students in online courses generate large amounts of data that can be used to personalize the learning process and improve quality of education. In this paper, we present the Latent Skill Embedding (LSE), a probabilistic model of students and educational content that can be used to recommend personalized sequences of lessons with the goal of helpin...
Article
Full-text available
Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself. In this paper, we provide a principled approach to handling selection biases, adapting models and estimation techniques from causal inference. The approach l...
Conference Paper
Full-text available
We present a method for learning potentially intransitive preference relations from pairwise comparison and matchup data. Unlike standard preference-learning models that represent the properties of each item/player as a single number, our method infers a multi-dimensional representation for the different aspects of each item/player's strength. We s...
Article
Full-text available
We present a factorization framework to analyze the data of a regression learning task with two peculiarities. First, inputs can be split into two parts that represent semantically significant entities. Second, the performance of regressors is very low. The basic idea of the approach presented here is to try to learn the ordering relations of the t...
Article
Full-text available
In this paper, we study the impact of design choices for recommender systems on one-choice tasks where users want to select one item out of a variety of options. Instead of focusing on only user factors or recommendation quality, we consider how an interface design that provides the user with digital short-term memory impacts both user behavior and...
Article
Understanding and modeling human preferences is one of the key problems in applications ranging from marketing to automated recommendation. In this paper, we focus on learning and analyzing the preferences of consumers regarding food products. In particular, we explore Machine Learning methods that embed consumers and products in an Euclidean space...
Article
We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots. The preferences we learn are more intricate than simple geometric constraints on trajectories; they are rather governed by the surrounding context of various objects and human interactions in the environment. W...
Conference Paper
Full-text available
We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recommendation), where an algorithm makes a prediction (e.g., ad ranking) for a given input (e.g., query) and observes bandit feedback (e.g., user clicks on...
Conference Paper
Full-text available
We address the problem of assessing the quality of a ranking system (e.g., search engine, recommender system, review ranker) given a fixed budget for collecting expert judgments. In particular, we propose a method that selects which items to judge in order to optimize the accuracy of the quality estimate. Our method is not only efficient, but also...
Article
Full-text available
We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user. Interactions in the Coactive Learning model take the following form: at each step, the system (e.g. search engine) receives a context (e.g. query) and predicts an obj...
Article
Full-text available
Massive Online Open Courses have become an accessible and affordable choice for education. This has led to new technical challenges for instructors such as student evaluation at scale. Recent work has found ordinal peer grading, where individual grader orderings are aggregated into an overall ordering of assignments, to be a viable alternate to tra...
Article
Full-text available
We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recommendation), where an algorithm makes a prediction (e.g., ad ranking) for a given input (e.g., query) and observes bandit feedback (e.g., user clicks on...
Conference Paper
The ability to learn from user interactions can give systems access to unprecedented amounts of world knowledge. This is already evident in search engines, recommender systems, and electronic commerce, and other applications are likely to follow in the near future (e.g., education, smart homes). More generally, the ability to learn from user intera...
Article
Full-text available
We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implic...
Article
We explore the use of personalized radio to facilitate the discovery of music created by local artists. We describe a system called MegsRadio.fm that produces a customizable stream of music by both local and well-known (non-local) artists based on seed artists, tags, venues and/or location. We hypothesize that the more popular artists provide conte...
Article
Full-text available
MOOCs have the potential to revolutionize higher education with their wide outreach and accessibility, but they require instructors to come up with scalable alternates to traditional student evaluation. Peer grading -- having students assess each other -- is a promising approach to tackling the problem of evaluation at scale, since the number of "g...
Conference Paper
When a website hosting user-generated content asks users a straightforward question - "Was this content helpful?" with one "Yes" and one "No" button as the two possible answers - one might expect to get a straightforward answer. In this paper, we explore how users respond to this question and find that their responses are not quite straightforward...
Conference Paper
Full-text available