W. Bruce Croft

W. Bruce Croft
University of Massachusetts Amherst | UMass Amherst · School of Computer Science

About

272
Publications
31,957
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
20,333
Citations

Publications

Publications (272)
Conference Paper
Full-text available
Existing commercial search engines often struggle to represent different perspectives of a search query. Argument retrieval systems address this limitation of search engines and provide both positive (PRO) and negative (CON) perspectives about a user's information need on a controversial topic (e.g., climate change). The effectiveness of such argum...
Preprint
Full-text available
Existing commercial search engines often struggle to represent different perspectives of a search query. Argument retrieval systems address this limitation of search engines and provide both positive (PRO) and negative (CON) perspectives about a user's information need on a controversial topic (e.g., climate change). The effectiveness of such argum...
Article
Users install many apps on their smartphones, raising issues related to information overload for users and resource management for devices. Moreover, the recent increase in the use of personal assistants has made mobile devices even more pervasive in users’ lives. This article addresses two research problems that are vital for developing effective...
Preprint
Users often need to look through multiple search result pages or reformulate queries when they have complex information-seeking needs. Conversational search systems make it possible to improve user satisfaction by asking questions to clarify users' search intents. This, however, can take significant effort to answer a series of questions starting w...
Preprint
In this work, we address multi-modal information needs that contain text questions and images by focusing on passage retrieval for outside-knowledge visual question answering. This task requires access to outside knowledge, which in our case we define to be a large unstructured passage collection. We first conduct sparse retrieval with BM25 and stu...
Preprint
Traditional statistical retrieval models often treat each document as a whole. In many cases, however, a document is relevant to a query only because a small part of it contain the targeted information. In this work, we propose a neural passage model (NPM) that uses passage-level information to improve the performance of ad-hoc retrieval. Instead o...
Preprint
Recent studies on Question Answering (QA) and Conversational QA (ConvQA) emphasize the role of retrieval: a system first retrieves evidence from a large collection and then extracts answers. This open-retrieval ConvQA setting typically assumes that each question is answerable by a single span of text within a particular passage (a span answer). The...
Preprint
Full-text available
Users install many apps on their smartphones, raising issues related to information overload for users and resource management for devices. Moreover, the recent increase in the use of personal assistants has made mobile devices even more pervasive in users' lives. This paper addresses two research problems that are vital for developing effective pe...
Preprint
Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, especially conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have been studied recently but the accurate utilization of user respons...
Preprint
Full-text available
Conversational search is one of the ultimate goals of information retrieval. Recent research approaches conversational search by simplified settings of response ranking and conversational question answering, where an answer is either selected from a given candidate set or extracted from a given passage. These simplifications neglect the fundamental...
Preprint
Full-text available
Product search is an important way for people to browse and purchase items on E-commerce platforms. While customers tend to make choices based on their personal tastes and preferences, analysis of commercial product search logs has shown that personalization does not always improve product search quality. Most existing product search techniques, ho...
Preprint
Redundancy-aware extractive summarization systems score the redundancy of the sentences to be included in a summary either jointly with their salience information or separately as an additional sentence scoring step. Previous work shows the efficacy of jointly scoring and selecting sentences with neural sequence generation models. It is, however, n...
Chapter
Full-text available
Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still feels the significant lack of large-scale non-factoid question answering collections with real questions and comprehensiv...
Preprint
Full-text available
Personal assistant systems, such as Apple Siri, Google Assistant, Amazon Alexa, and Microsoft Cortana, are becoming ever more widely used. Understanding user intent such as clarification questions, potential answers and user feedback in information-seeking conversations is critical for retrieving good responses. In this paper, we analyze user inten...
Conference Paper
Full-text available
Intelligent assistants change the way people interact with computers and make it possible for people to search for products through conversations when they have purchase needs. During the interactions, the system could ask questions on certain aspects of the ideal products to clarify the users' needs. For example, previous work proposed to ask user...
Conference Paper
Users are known to interact more with fresh content in certain temporally associated domains such as news search or job seeking, leading to an uneven distribution of interactions over items of different degrees of freshness. Data collected under such an "aging effect'' is usually used unconditionally on all sort of recommendation tasks, and as a re...
Conference Paper
Full-text available
In product search, users tend to browse results on multiple search result pages (SERPs) (e.g., for queries on clothing and shoes) before deciding which item to purchase. Users' clicks can be considered as implicit feedback which indicates their preferences and used to re-rank subsequent SERPs. Relevance feedback (RF) techniques are usually involved...
Conference Paper
Product search is one of the most popular methods for people to discover and purchase products on e-commerce websites. Because personal preferences often have an important influence on the purchase decision of each customer, it is intuitive that personalization should be beneficial for product search engines. While synthetic experiments from previo...
Conference Paper
Full-text available
Intelligent personal assistant systems that are able to have multi-turn conversations with human users are becoming increasingly popular. Most previous research has been focused on using either retrieval-based or generation-based methods to develop such systems. Retrieval-based methods have the advantage of returning fluent and informative response...
Conference Paper
Conversational question answering (ConvQA) is a simplified but concrete setting of conversational search. One of its major challenges is to leverage the conversation history to understand and answer the current question. In this work, we propose a novel solution for ConvQA that involves three aspects. First, we propose a positional history answer e...
Article
Full-text available
Product search is one of the most popular methods for customers to discover products online. Most existing studies on product search focus on developing effective retrieval models that rank items by their likelihood to be purchased. However, they ignore the problem that there is a gap between how systems and customers perceive the relevance of item...
Conference Paper
Estimating the quality of a result list, often referred to as query performance prediction (QPP), is a challenging and important task in information retrieval. It can be used as feedback to users, search engines, and system administrators. Although predicting the performance of retrieval models has been extensively studied for the ad-hoc retrieval...
Conference Paper
Full-text available
In information retrieval, sampling methods used to select documents for neural models must often deal with large class imbalances during training. This issue necessitates careful selection of negative instances when training neural models to avoid the risk of overfitting. For most work, heuristic sampling approaches, or policies, are created based...
Conference Paper
In traditional retrieval environments, a ranked list of candidate documents is produced without regard to the number of documents. With the rise in interactive IR as well as professional searches such as legal retrieval, this results in a substantial ranked list which is scanned by a user until their information need is satisfied. Determining the p...
Preprint
Full-text available
Product search is one of the most popular methods for customers to discover products online. Most existing studies on product search focus on developing effective retrieval models that rank items by their likelihood to be purchased. They, however, ignore the problem that there is a gap between how systems and customers perceive the relevance of ite...
Preprint
Full-text available
In product search, users tend to browse results on multiple search result pages (SERPs) (e.g., for queries on clothing and shoes) before deciding which item to purchase. Users' clicks can be considered as implicit feedback which indicates their preferences and used to re-rank subsequent SERPs. Relevance feedback (RF) techniques are usually involved...
Preprint
Full-text available
Intelligent assistants change the way people interact with computers and make it possible for people to search for products through conversations when they have purchase needs. During the interactions, the system could ask questions on certain aspects of the ideal products to clarify the users' needs. For example, previous work proposed to ask user...
Preprint
Full-text available
Product search serves as an important entry point for online shopping. In contrast to web search, the retrieved results in product search not only need to be relevant but also should satisfy customers' preferences in order to elicit purchases. Previous work has shown the efficacy of purchase history in personalized product search. However, customer...
Preprint
Full-text available
Conversational question answering (ConvQA) is a simplified but concrete setting of conversational search. One of its major challenges is to leverage the conversation history to understand and answer the current question. In this work, we propose a novel solution for ConvQA that involves three aspects. First, we propose a positional history answer e...
Conference Paper
Full-text available
Users often fail to formulate their complex information needs in a single query. As a consequence, they may need to scan multiple result pages or reformulate their queries, which may be a frustrating experience. Alternatively, systems can improve user satisfaction by proactively asking questions of the users to clarify their information needs. Aski...
Conference Paper
Conversational search is an emerging topic in the information retrieval community. One of the major challenges to multi-turn conversational search is to model the conversation history to answer the current question. Existing methods either prepend history turns to the current question or use complicated attention mechanisms to model the history. We...
Conference Paper
There has historically been a divide between the user-oriented and system-oriented research communities in information retrieval. In my opinion, this divide is based primarily on a difference in viewpoint about the relative importance of understanding how people search for information compared to developing new retrieval models and ranking algorith...
Preprint
Full-text available
Users often fail to formulate their complex information needs in a single query. As a consequence, they may need to scan multiple result pages or reformulate their queries, which may be a frustrating experience. Alternatively, systems can improve user satisfaction by proactively asking questions of the users to clarify their information needs. Aski...
Preprint
Full-text available
Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still feels the significant lack of large-scale non-factoid question answering collections with real questions and comprehensiv...
Preprint
Conversational search is an emerging topic in the information retrieval community. One of the major challenges to multi-turn conversational search is to model the conversation history to answer the current question. Existing methods either prepend history turns to the current question or use complicated attention mechanisms to model the history. We...
Preprint
The bidirectional encoder representations from transformers (BERT) model has recently advanced the state-of-the-art in passage re-ranking. In this paper, we analyze the results produced by a fine-tuned BERT model to better understand the reasons behind such substantial improvements. To this aim, we focus on the MS MARCO passage re-ranking dataset a...
Chapter
Full-text available
Relevance feedback techniques assume that users provide relevance judgments for the top k (usually 10) documents and then re-rank using a new query model based on those judgments. Even though this is effective, there has been little research recently on this topic because requiring users to provide substantial feedback on a result list is impractic...
Preprint
Ranking models lie at the heart of research on information retrieval (IR). During the past decades, different techniques have been proposed for constructing ranking models, from traditional heuristic methods, probabilistic methods, to modern machine learning methods. Recently, with the advance of deep learning technology, we have witnessed a growin...
Preprint
Full-text available
Deep text matching approaches have been widely studied for many applications including question answering and information retrieval systems. To deal with a domain that has insufficient labeled data, these approaches can be used in a Transfer Learning (TL) setting to leverage labeled data from a resource-rich source domain. To achieve better perform...
Preprint
Full-text available
Relevance feedback techniques assume that users provide relevance judgments for the top k (usually 10) documents and then re-rank using a new query model based on those judgments. Even though this is effective, there has been little research recently on this topic because requiring users to provide substantial feedback on a result list is impractic...
Preprint
Full-text available
As more and more search traffic comes from mobile phones, intelligent assistants, and smart-home devices, new challenges (e.g., limited presentation space) and opportunities come up in information retrieval. Previously, an effective technique, relevance feedback (RF), has rarely been used in real search scenarios due to the overhead of collecting u...
Preprint
Full-text available
With the recent growth in the use of conversational systems and intelligent assistants such as Google Assistant and Microsoft Cortana, mobile devices are becoming even more pervasive in our lives. As a consequence, users are getting engaged with mobile apps and frequently search for an information need using different apps. Recent work has stated t...
Conference Paper
Implicit feedback (e.g., user clicks) is an important source of data for modern search engines. While heavily biased [8, 9, 11, 27], it is cheap to collect and particularly useful for user-centric retrieval applications such as search ranking. To develop an unbiased learning-to-rank system with biased feedback, previous studies have focused on cons...
Conference Paper
The availability of massive data and computing power allowing for effective data driven neural approaches is having a major impact on machine learning and information retrieval research, but these models have a basic problem with efficiency. Current neural ranking models are implemented as multistage rankers: for efficiency reasons, the neural mode...
Conference Paper
Full-text available
Conversational search and recommendation based on user-system dialogs exhibit major differences from conventional search and recommendation tasks in that 1) the user and system can interact for multiple semantically coherent rounds on a task through natural language dialog, and 2) it becomes possible for the system to understand the user needs or t...
Data
In this data collection, we are particularly interested in providing the first dataset focusing on a unified search framework for mobile devices by collecting cross-app mobile queries as well as their target apps. To this end, we recruited 255 participants through an open online call asking them to install uSearch on their smartphones and let it ru...
Data
In this data collection, we are particularly interested in providing the first dataset focusing on a unified search framework for mobile devices by collecting cross-app mobile queries as well as their target apps. To this end, we initially asked crowdworkers to explain their latest search experience on their smartphones and used them to define vari...
Conference Paper
Neural network approaches have recently shown to be effective in several information retrieval (IR) tasks. However, neural approaches often require large volumes of training data to perform effectively, which is not always available. To mitigate the shortage of labeled data, training neural IR models with weak supervision has been recently proposed...
Conference Paper
Retrieving short, precise answers to non-factoid queries is an increasingly important task, especially for mobile and voice search. Many of these questions may have multiple or alternative answers. In an environment where answers are presented incrementally, this raises the question of how to generate a diverse ranking to cover these alternatives....
Conference Paper
Full-text available
Implicit user feedback (such as clicks and dwell time) is an important source of data for modern search engines. While heavily biased~\citejoachims2005accurately,keane2006modeling,joachims2007evaluating,yue2010beyond, it is cheap to collect and particularly useful for user-centric retrieval applications such as search ranking and query recommendati...
Conference Paper
The ease of constructing effective neural networks has resulted in a large number of varying architectures iteratively improving performance on a task. Due to the nature of these models being black boxes, standard weight inspection is difficult. We propose a probe based methodology to evaluate what information is important or extraneous at each lev...
Preprint
Despite the somewhat different techniques used in developing search engines and recommender systems, they both follow the same goal: helping people to get the information they need at the right time. Due to this common goal, search and recommendation models can potentially benefit from each other. The recent advances in neural network technologies...
Article
Full-text available
The purpose of the Strategic Workshop in Information Retrieval in Lorne is to explore the long-range issues of the Information Retrieval field, to recognize challenges that are on-or even over-the horizon, to build consensus on some of the key challenges, and to disseminate the resulting information to the research community. The intent is that thi...
Conference Paper
Full-text available
Learning to rank has been intensively studied and widely applied in information retrieval. Typically, a global ranking function is learned from a set of labeled data, which can achieve good performance on average but may be suboptimal for individual queries by ignoring the fact that relevant documents for different queries may have different distri...
Conference Paper
Full-text available
Evidence derived from passages that closely represent likely answers to a posed query can be useful input to the ranking process. Based on a novel use of Community Question Answering data, we present an approach for the creation of such passages. A general framework for extracting answer passages and estimating their quality is proposed, and this e...
Conference Paper
With the rise in mobile and voice search, answer passage retrieval acts as a critical component of an effective information retrieval system for open domain question answering. Currently, there are no comparable collections that address non-factoid question answering within larger documents while simultaneously providing enough examples sufficient...
Conference Paper
Full-text available
Learning to rank with biased click data is a well-known challenge. A variety of methods has been explored to debias click data for learning to rank such as click models, result interleaving and, more recently, the unbiased learning-to-rank framework based on inverse propensity weighting. Despite their differences, most existing studies separate the...
Conference Paper
Predicting the performance of a search engine for a given query is a fundamental and challenging task in information retrieval. Accurate performance predictors can be used in various ways, such as triggering an action, choosing the most effective ranking function per query, or selecting the best variant from multiple query formulations. In this pap...
Conference Paper
Unlike traditional learning to rank models that depend on hand-crafted features, neural representation learning models learn higher level features for the ranking task by training on large datasets. Their ability to learn new features directly from the data, however, may come at a price. Without any special supervision, these models learn relations...
Conference Paper
Learning to rank is a key component of modern information retrieval systems. Recently, regression forest models (i.e., random forests, LambdaMART and gradient boosted regression trees) have come to dominate learning to rank systems in practice, as they provide the ability to learn from large scale data while generalizing well to additional test que...