Maarten de Rijke's research while affiliated with University of Amsterdam and other places

Publications (907)

Article
Network-based information has been widely explored and exploited in the information retrieval literature. Attributed networks, consisting of nodes, edges as well as attributes describing properties of nodes, are a basic type of network-based data, and are especially useful for many applications. Examples include user profiling in social networks an...
Preprint
When using medical images for diagnosis, either by clinicians or artificial intelligence (AI) systems, it is important that the images are of high quality. When an image is of low quality, the medical exam that produced the image often needs to be redone. In telemedicine, a common problem is that the quality issue is only flagged once the patient h...
Article
In this work, we explain the setup for a technical, graduate-level course on Fairness, Accountability, Confidentiality, and Transparency in Artificial Intelligence (FACT-AI) at the University of Amsterdam, which teaches FACT-AI concepts through the lens of reproducibility. The focal point of the course is a group project based on reproducing existi...
Article
Model interpretability has become an important problem in machine learning (ML) due to the increased effect algorithmic decisions have on humans. Counterfactual explanations can help users understand not only why ML models make certain decisions, but also how these decisions can be changed. We frame the problem of finding counterfactual explanation...
Preprint
Learned recommender systems may inadvertently leak information about their training data, leading to privacy violations. We investigate privacy threats faced by recommender systems through the lens of membership inference. In such attacks, an adversary aims to infer whether a user's data is used to train the target recommender. To achieve this, pre...
Preprint
Full-text available
We study the problem of restocking a grocery store's inventory with perishable items over time, from a distributional point of view. The objective is to maximize sales while minimizing waste, with uncertainty about the actual consumption by costumers. This problem is of a high relevance today, given the growing demand for food and the impact of foo...
Preprint
Fairness of exposure is a commonly used notion of fairness for ranking systems. It is based on the idea that all items or item groups should get exposure proportional to the merit of the item or the collective merit of the items in the group. Often, stochastic ranking policies are used to ensure fairness of exposure. Previous work unrealistically a...
Preprint
Full-text available
Methods for reinforcement learning for recommendation (RL4Rec) are increasingly receiving attention as they can quickly adapt to user feedback. A typical RL4Rec framework consists of (1) a state encoder to encode the state that stores the users' historical interactions, and (2) an RL method to take actions and observe rewards. Prior work compared f...
Preprint
The cross-entropy objective has proved to be an all-purpose training objective for autoregressive language models (LMs). However, without considering the penalization of problematic tokens, LMs trained using cross-entropy exhibit text degeneration. To address this, unlikelihood training has been proposed to force unlikely tokens to be assigned a lo...
Conference Paper
Full-text available
Dialogue systems (DSs) are evaluated depending on their type and purpose. Two categories are often distinguished: (1) task-oriented dialogue systems (TDSs), which are typically evaluated on utility, i.e., their ability to complete a specified task, and (2) open-domain chat-bots, which are evaluated on the user experience, i.e., based on their abili...
Preprint
There are several measures for fairness in ranking, based on different underlying assumptions and perspectives. PL optimization with the REINFORCE algorithm can be used for optimizing black-box objective functions over permutations. In particular, it can be used for optimizing fairness measures. However, though effective for queries with a moderate...
Preprint
To train image-caption retrieval (ICR) methods, contrastive loss functions are a common choice for optimization functions. Unfortunately, contrastive ICR methods are vulnerable to learning shortcuts: decision rules that perform well on the training data but fail to transfer to other testing conditions. We introduce an approach to reduce shortcut fe...
Preprint
Dialogue systems are evaluated depending on their type and purpose. Two categories are often distinguished: (1) task-oriented dialogue systems (TDS), which are typically evaluated on utility, i.e., their ability to complete a specified task, and (2) open domain chatbots, which are evaluated on the user experience, i.e., based on their ability to en...
Preprint
Automatic text summarization has experienced substantial progress in recent years. With this progress, the question has arisen whether the types of summaries that are typically generated by automatic summarization models align with users' needs. Ter Hoeve et al (2020) answer this question negatively. Amongst others, they recommend focusing on gener...
Preprint
Full-text available
A long-term ambition of information seeking QA systems is to reason over multi-modal contexts and generate natural answers to user queries. Today, memory intensive pre-trained language models are adapted to downstream tasks such as QA by fine-tuning the model on QA data in a specific modality like unstructured text or structured tables. To avoid tr...
Preprint
Neural ranking models (NRMs) have shown remarkable success in recent years, especially with pre-trained language models. However, deep neural models are notorious for their vulnerability to adversarial examples. Adversarial attacks may become a new type of web spamming technique given our increased reliance on neural information retrieval models. T...
Preprint
Task-oriented dialogue systems (TDSs) are assessed mainly in an offline setting or through human evaluation. The evaluation is often limited to single-turn or very time-intensive. As an alternative, user simulators that mimic user behavior allow us to consider a broad set of user goals to generate human-like conversations for simulated evaluation....
Preprint
The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch. Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning. We ask whether these findings gener...
Article
Recent research has highlighted the importance of mixed-initiative interactions in conversational search. To enable mixed-initiative interactions, information retrieval systems should be able to ask diverse questions, such as information-seeking, clarification, and open-ended ones. QG of open-domain conversational systems aims at enhancing the inte...
Article
Sequential recommenders capture dynamic aspects of users’ interests by modeling sequential behavior. Previous studies on sequential recommendations mostly aim to identify users’ main recent interests to optimize the recommendation accuracy; they often neglect the fact that users display multiple interests over extended periods of time, which could...
Article
Probabilistic time series forecasting is crucial in many application domains, such as retail, ecommerce, finance, and biology. With the increasing availability of large volumes of data, a number of neural architectures have been proposed for this problem. In particular, Transformer-based methods achieve state-of-the-art performance on real-world be...
Preprint
Full-text available
Traditional ranking systems are expected to sort items in the order of their relevance and thereby maximize their utility. In fair ranking, utility is complemented with fairness as an optimization goal. Recent work on fair ranking focuses on developing algorithms to optimize for fairness, given position-based exposure. In contrast, we identify the...
Preprint
Full-text available
E-commerce provides rich multimodal data that is barely leveraged in practice. One aspect of this data is a category tree that is being used in search and recommendation. However, in practice, during a user's session there is often a mismatch between a textual and a visual representation of a given category. Motivated by the problem, we introduce t...
Article
Sequential recommendation is a task in which one models and uses sequential information about user behavior for recommendation purposes. We study sequential recommendation in a particularly challenging context, in which multiple individual users share a single account (i.e., they have a shared account) and in which user behavior is available in mul...
Preprint
Full-text available
User interactions with recommender systems (RSs) are affected by user selection bias, e.g., users are more likely to rate popular items (popularity bias) or items that they expect to enjoy beforehand (positivity bias). Methods exist for mitigating the effects of selection bias in user ratings on the evaluation and optimization of RSs. However, thes...
Preprint
Full-text available
In this work we explain the setup for a technical, graduate-level course on Fairness, Accountability, Confidentiality and Transparency in Artificial Intelligence (FACT-AI) at the University of Amsterdam, which teaches FACT-AI concepts through the lens of reproducibility. The focal point of the course is a group project based on reproducing existing...
Article
In this article, we address the problem of answering complex information needs by conducting conversations with search engines , in the sense that users can express their queries in natural language and directly receive the information they need from a short system response in a conversational manner. Recently, there have been some attempts towards...
Article
Conversational search is a relatively young area of research that aims at automating an information-seeking dialogue. In this article, we help to position it with respect to other research areas within conversational artificial intelligence (AI) by analysing the structural properties of an information-seeking dialogue. To this end, we perform a lar...
Article
Hierarchical context modeling plays an important role in the response generation for multi-turn conversational systems. Previous methods mainly model context as multiple independent utterances and rely on attention mechanisms to obtain the context representation. They tend to ignore the explicit responds-to relationships between adjacent utterances...
Preprint
The goal of a next basket recommendation (NBR) system is to recommend items for the next basket for a user, based on the sequence of their prior baskets. Recently, a number of methods with complex modules have been proposed that claim state-of-the-art performance. They rarely look into the predicted basket and just provide intuitive reasons for the...
Preprint
Multiclass multilabel classification refers to the task of attributing multiple labels to examples via predictions. Current models formulate a reduction of that multilabel setting into either multiple binary classifications or multiclass classification, allowing for the use of existing loss functions (sigmoid, cross-entropy, logistic, etc.). Empiri...
Preprint
In counterfactual learning to rank (CLTR) user interactions are used as a source of supervision. Since user interactions come with bias, an important focus of research in this field lies in developing methods to correct for the bias of interactions. Inverse propensity scoring (IPS) is a popular method suitable for correcting position bias. Affine c...
Conference Paper
State-of-the-art Learning to Rank (LTR) methods for optimizing ranking systems based on user interactions are divided into online approaches – that learn by direct interaction – and counterfactual approaches – that learn from historical interactions. We propose a novel intervention-aware estimator to bridge this online/counterfactual division. The...
Article
Factorization machines are a generic supervised method for a wide range of tasks in the field of artificial intelligence, such as prediction, inference, etc., which can effectively model feature interactions. However, handling combinations of features is expensive due to the exponential growth of feature interactions with the order. In nature, not...
Preprint
Conversational Question Simplification (CQS) aims to simplify self-contained questions into conversational ones by incorporating some conversational characteristics, e.g., anaphora and ellipsis. Existing maximum likelihood estimation (MLE) based methods often get trapped in easily learned tokens as all tokens are treated equally during training. In...
Preprint
Writers such as journalists often use automatic tools to find relevant content to include in their narratives. In this paper, we focus on supporting writers in the news domain to develop event-centric narratives. Given an incomplete narrative that specifies a main event and a context, we aim to retrieve news articles that discuss relevant events th...
Preprint
Full-text available
Electronic health record (EHR) coding is the task of assigning ICD codes to each EHR. Most previous studies either only focus on the frequent ICD codes or treat rare and frequent ICD codes in the same way. These methods perform well on frequent ICD codes but due to the extremely unbalanced distribution of ICD codes, the performance on rare ones is...
Preprint
Full-text available
One of the key challenges in Sequential Recommendation (SR) is how to extract and represent user preferences. Traditional SR methods rely on the next item as the supervision signal to guide preference extraction and representation. We propose a novel learning strategy, named preference editing. The idea is to force the SR model to discriminate the...
Preprint
Full-text available
Gradient Boosting Machines (GBM) are hugely popular for solving tabular data problems. However, practitioners are not only interested in point predictions, but also in probabilistic predictions in order to quantify the uncertainty of the predictions. Creating such probabilistic predictions is difficult with existing GBM-based solutions: they either...
Article
Conversational interfaces are increasingly popular as a way of connecting people to information. With the increased generative capacity of corpus‐based conversational agents comes the need to classify and filter out malevolent responses that are inappropriate in terms of content and dialogue acts. Previous studies on the topic of detecting and clas...
Preprint
Conversational information seeking (CIS) is playing an increasingly important role in connecting people to information. Due to the lack of suitable resource, previous studies on CIS are limited to the study of theoretical/conceptual frameworks, laboratory-based user studies, or a particular aspect of CIS (e.g., asking clarifying questions). In this...
Conference Paper
Full-text available
Evaluation is crucial in the development process of task-oriented dialogue systems. As an evaluation method, user simulation allows us to tackle issues such as scalability and cost-efficiency, making it a viable choice for large-scale automatic evaluation. To help build a human-like user simulator that can measure the quality of a dialogue, we prop...
Preprint
Medical dialogue generation aims to provide automatic and accurate responses to assist physicians to obtain diagnosis and treatment suggestions in an efficient manner. In medical dialogues two key characteristics are relevant for response generation: patient states (such as symptoms, medication) and physician actions (such as diagnosis, treatments)...
Preprint
Full-text available
Contract element extraction (CEE) is the novel task of automatically identifying and extracting legally relevant elements such as contract dates, payments, and legislation references from contracts. Automatic methods for this task view it as a sequence labeling problem and dramatically reduce human labor. However, as contract genres and element typ...
Preprint
Evaluation is crucial in the development process of task-oriented dialogue systems. As an evaluation method, user simulation allows us to tackle issues such as scalability and cost-efficiency, making it a viable choice for large-scale automatic evaluation. To help build a human-like user simulator that can measure the quality of a dialogue, we prop...
Preprint
Being able to generate informative and coherent dialogue responses is crucial when designing human-like open-domain dialogue systems. Encoder-decoder-based dialogue models tend to produce generic and dull responses during the decoding step because the most predictable response is likely to be a non-informative response instead of the most suitable...
Preprint
Full-text available
Conversational search is a relatively young area of research that aims at automating an information-seeking dialogue. In this paper we help to position it with respect to other research areas within conversational Artificial Intelligence (AI) by analysing the structural properties of an information-seeking dialogue. To this end, we perform a large-...