Falk Scholer

RMIT University | RMIT · School of Computer Science and Information Technology

PhD

About

175
Publications
29,525
Reads
2,903
Citations

Publications (175)
Preprint
Full-text available
Digital Assistants (DAs) can support workers in the workplace and beyond. However, target user needs are not fully understood, and the functions that workers would ideally want a DA to support require further study. A richer understanding of worker needs could help inform the design of future DAs. We investigate user needs of future workplace DAs u...
Article
Full-text available
Digital Assistants (DAs) can support workers in the workplace and beyond. However, target user needs are not fully understood, and the functions that workers would ideally want a DA to support require further study. A richer understanding of worker needs could help inform the design of future DAs. We investigate user needs of future workplace DAs u...
Conference Paper
Full-text available
Where do queries (the words searchers type into a search box) come from? The Information Retrieval community understands the performance of queries and search engines extensively, and has recently begun to examine the impact of query variation, showing that different queries for the same information need produce different results. In an information e...
Preprint
Full-text available
Asking clarification questions is an active area of research; however, resources for training and evaluating search clarification methods are not sufficient. To address this issue, we describe MIMICS-Duo, a new freely available dataset of 306 search queries with multiple clarifications (a total of 1,034 query-clarification pairs). MIMICS-Duo contai...
Article
Full-text available
Query performance prediction (QPP) has been studied extensively in the IR community over the last two decades. A by-product of this research is a methodology to evaluate the effectiveness of QPP techniques. In this paper, we re-examine the existing evaluation methodology commonly used for QPP, and propose a new approach. Our key idea is to model QP...
Conference Paper
Full-text available
Commercial software systems are typically opaque with regard to their inner workings. This makes it challenging to understand the nuances of complex systems, and to study their operation, in particular in the context of fairness and bias. We explore a methodology for studying aspects of the behavior of black box systems, focusing on a commercial se...
Article
Relevance is a key concept in information retrieval and widely used for the evaluation of search systems using test collections. We present a comprehensive study of the effect of the choice of relevance scales on the evaluation of information retrieval systems. Our work analyzes and compares four crowdsourced scales (2-levels, 4-levels, and 100-lev...
Conference Paper
Full-text available
Existing commercial search engines often struggle to represent different perspectives of a search query. Argument retrieval systems address this limitation of search engines and provide both positive (PRO) and negative (CON) perspectives about a user's information need on a controversial topic (e.g., climate change). The effectiveness of such argum...
Article
Full-text available
This research analyzes human-generated clarification questions to provide insights into how they are used to disambiguate and provide a better understanding of information needs. A set of clarification questions is extracted from posts on the Stack Exchange platform. A novel taxonomy is defined for the annotation of the questions and their responses....
Preprint
Full-text available
Existing commercial search engines often struggle to represent different perspectives of a search query. Argument retrieval systems address this limitation of search engines and provide both positive (PRO) and negative (CON) perspectives about a user's information need on a controversial topic (e.g., climate change). The effectiveness of such argum...
Article
Full-text available
In many search scenarios, such as exploratory, comparative, or survey-oriented search, users interact with dynamic search systems to satisfy multi-aspect information needs. These systems utilize different dynamic approaches that exploit various user feedback granularity types. Although studies have provided insights about the role of many component...
Article
Full-text available
Like other disease outbreaks, the COVID-19 pandemic has led to the rapid generation and dissemination of misinformation and fake news. We investigated whether subscribers to a fact checking newsletter (n = 1397) were willing to share possible misinformation, and whether predictors of possible misinformation sharing are the same as for general sampl...
Conference Paper
Full-text available
Query Performance Prediction (QPP) has been studied extensively in the IR community over the last two decades. A by-product of this research is a methodology to evaluate the effectiveness of QPP techniques. In this paper, we reexamine the existing evaluation methodology commonly used for QPP, and propose a new approach. Our key idea is to model QPP...
Article
Recent years have seen an increase in the number of publicly available datasets that are released to foster research in question answering systems. In this work, we survey the available datasets and also provide a simple, multi-faceted classification of those datasets. We further survey the most recent evaluation results that form the current state...
Conference Paper
Full-text available
We present an ongoing collaboration between computer science researchers and fact-checking experts in a broadcast corporation to develop Watch 'n' Check, a social media monitoring tool that assists fact-checkers to detect and target misinformation online. The lean methodology followed in our collaboration has helped us to better understand how info...
Article
Full-text available
When evaluating IR run effectiveness using a test collection, a key question is: What search topics should be used? We explore what happens to measurement accuracy when the number of topics in a test collection is reduced, using the Million Query 2007, TeraByte 2006, and Robust 2004 TREC collections, which all feature more than 50 topics, something...
Article
Geolocating Twitter users—the task of identifying their home locations—serves a wide range of community and business applications such as managing natural crises, journalism, and public health. Many approaches have been proposed for automatically geolocating users based on their tweets; at the same time, various evaluation metrics have been propose...
Article
A lack of reliable relevance labels for training ranking functions is a significant problem for many search applications. Transfer ranking is a technique aiming to transfer knowledge from an existing machine learning ranking task to a new ranking task. Unsupervised transfer ranking is a special case of transfer ranking where there aren’t any releva...
Conference Paper
Full-text available
We investigated the learning process in search by conducting a log-based study involving registered job seekers of a commercial job search engine. The analysis shows that job search is a complex task: seekers usually submit multiple queries over sessions that can last days or even weeks. We find that querying, clicking, and job application rates ch...
Preprint
Full-text available
Geolocating Twitter users (the task of identifying their home locations) serves a wide range of community and business applications such as managing natural crises, journalism, and public health. Many approaches have been proposed for automatically geolocating users based on their tweets; at the same time, various evaluation metrics have been pro...
Chapter
Full-text available
Complex dynamic search tasks typically involve multi-aspect information needs and repeated interactions with an information retrieval system. Various metrics have been proposed to evaluate dynamic search systems, including the Cube Test, Expected Utility, and Session Discounted Cumulative Gain. While these complex metrics attempt to measure overall...
Conference Paper
Full-text available
Complex dynamic search tasks typically involve multi-aspect information needs and repeated interactions with an information retrieval system. Various metrics have been proposed to evaluate dynamic search systems, including the Cube Test, Expected Utility, and Session Discounted Cumulative Gain. While these complex metrics attempt to measure overall...
Conference Paper
Full-text available
Intelligent assistants can serve many purposes, including entertainment (e.g. playing music), home automation, and task management (e.g. timers, reminders). The role of these assistants is evolving to also support people engaged in work tasks, in workplaces and beyond. To design truly useful intelligent assistants for work, it is important to bette...
Preprint
Information retrieval systems are evolving from document retrieval to answer retrieval. Web search logs provide large amounts of data about how people interact with ranked lists of documents, but very little is known about interaction with answer texts. In this paper, we use Amazon Mechanical Turk to investigate three answer presentation and intera...
Conference Paper
Offline metrics for IR evaluation are often derived from a user model that seeks to capture the interaction between the user and the ranking, conflating the interaction with a ranking of documents with the user's interaction with the search results page. A desirable property of any effectiveness metric is if the scores it generates over a set of ra...
Conference Paper
Full-text available
A wide range of evaluation metrics have been proposed to measure the quality of search results, including in the presence of diversification. Some of these metrics have been adapted for use in search tasks with different complexities, such as where the search system returns lists of different lengths. Given the range of requirements, it can be diff...
Conference Paper
Consistency of relevance judgments is a vital issue for the construction of test collections in information retrieval. As human relevance assessments are costly, and large collections can contain many documents of varying relevance, collecting reliable judgments is a critical component to building reusable test collections. We explore the impact of...
Article
One typical way of building test collections for offline measurement of information retrieval systems is to pool the ranked outputs of different systems down to some chosen depth d and then form relevance judgments for those documents only. Non-pooled documents—ones that did not appear in the top-d sets of any of the contributing systems—are then d...
Conference Paper
Full-text available
Clinical Decision Support (CDS) systems aim to assist clinicians in their daily decision-making related to diagnosis, tests, and treatments of patients by providing relevant evidence from the scientific literature. This promise however is yet to be fulfilled, with search for relevant literature for a given patient condition still being an active re...
Conference Paper
Full-text available
Errors in formulation of queries made by users can lead to poor search results pages. We performed a living lab study using online A/B testing to measure the degree of improvement achieved with a query amendment technique when applied to a commercial job search engine. Of particular interest in this case study is a clear "success" signal, namely, t...
Article
Full-text available
The purpose of the Strategic Workshop in Information Retrieval in Lorne is to explore the long-range issues of the Information Retrieval field, to recognize challenges that are on, or even over, the horizon, to build consensus on some of the key challenges, and to disseminate the resulting information to the research community. The intent is that thi...
Conference Paper
Full-text available
Evidence derived from passages that closely represent likely answers to a posed query can be useful input to the ranking process. Based on a novel use of Community Question Answering data, we present an approach for the creation of such passages. A general framework for extracting answer passages and estimating their quality is proposed, and this e...
Conference Paper
Typing is a common form of query input for search engines and other information retrieval systems; we therefore investigate the relationship between typing behavior and search interactions. The search process is interactive and typically requires entering one or more queries, and assessing both summaries from Search Engine Result Pages and the unde...
Conference Paper
Score-at-a-Time index traversal is a query processing approach which supports early termination in order to balance efficiency and effectiveness trade-offs. In this work, we explore new techniques which extend a modern Score-at-a-Time traversal algorithm to allow for parallel postings traversal. We show that careful integration of parallel traversa...
Conference Paper
Query performance prediction estimates the effectiveness of a query in advance of human judgements. Accurate prediction could be used, for example, to trigger special processing, select query variants, or choose whether to search at all. Prediction evaluations have not distinguished effects due to query wording from effects due to the underlying in...
Conference Paper
Search is near-ubiquitous in human society, being used for entertainment, health, financial and business information seeking. Traditional methods of search evaluation have assumed that searchers move forward through search results in a linear manner; early eye tracking studies have suggested the same. Recent research, though, including eye-tracking...
Conference Paper
Prior work on using retrievability measures in the evaluation of information retrieval (IR) systems has laid out the foundations for investigating the relation between retrieval performance and retrieval bias. While various factors influencing retrievability have been examined, showing how the retrieval model may influence bias, no prior work has e...
Conference Paper
Recipe search systems rely on keyword matching, and in this work we analyze the consistency of vocabulary, investigating how agreement differs when searching: for ingredients versus dishes; for common versus uncommon items; and, between recipe authors and searchers. The experiments for this study use a crowd-sourcing framework and a large corpus of...
Conference Paper
Full-text available
This paper investigates if Information Foraging Theory can be used to understand differences in user behavior when searching on mobile and desktop web search systems. Two groups of thirty-six participants were recruited to carry out six identical web search tasks on desktop or on mobile. The search tasks were prepared with a different number and di...
Conference Paper
A search engine that can return the ideal results for a person's information need, independent of the specific query that is used to express that need, would be preferable to one that is overly swayed by the individual terms used; search engines should be consistent in the presence of syntactic query variations responding to the same information ne...
Conference Paper
In recent years, gathering relevance judgments through non-topic originators has become an increasingly important problem in Information Retrieval. Relevance judgments can be used to measure the effectiveness of a system, and are often needed to build supervised learning models in learning-to-rank retrieval systems. The two most popular approaches...
Article
Re-finding is the process of searching for information that a user has previously encountered and is a common activity carried out with information retrieval systems. In this work, we investigate re-finding in the context of vertical search, differentiating and modeling user re-finding behavior within different media and topic domains, including im...
Article
Information retrieval systems aim to help users satisfy information needs. We argue that the goal of the person using the system, and the pattern of behavior that they exhibit as they proceed to attain that goal, should be incorporated into the methods and techniques used to evaluate the effectiveness of IR systems, so that the resulting effectiven...
Conference Paper
Full-text available
We investigate the influence of language on the accuracy of geolocating Twitter users. Our analysis, using a large corpus of tweets written in thirteen languages, provides a new understanding of the reasons behind reported performance disparities between languages. The results show that data imbalance has a greater impact on accuracy than geographi...
Article
Magnitude estimation is a psychophysical scaling technique for the measurement of sensation, where observers assign numbers to stimuli in response to their perceived intensity. We investigate the use of magnitude estimation for judging the relevance of documents for information retrieval evaluation, carrying out a large-scale user study across 18 T...
Conference Paper
Full-text available
The Web has created a global marketplace for e-Commerce as well as for talent. Online employment marketplaces provide an effective channel to facilitate the matching between job seekers and hirers. This paper presents an initial exploration of user behavior in job and talent search using query and click logs from a popular employment marketplace. T...
Conference Paper
Judging the relevance of documents for an information need is an activity that underpins the most widely-used approach in the evaluation of information retrieval systems. In this study we investigate the relationship between how long it takes an assessor to judge document relevance, and three key factors that may influence the judging scenario: the...
Conference Paper
Full-text available
We investigate the effectiveness of using semantic and context features for extracting document summaries that are designed to contain answers for non-factoid queries. The summarization methods are compared against state-of-the-art factoid question answering and query-biased summarization techniques. The accuracy of generated answer summaries is e...
Conference Paper
Query-level instance weighting is a technique for unsupervised transfer ranking, which aims to train a ranker on a source collection so that it also performs effectively on a target collection, even if no judgement information exists for the latter. Past work has shown that this approach can be used to significantly improve effectiveness; in this w...
Conference Paper
Human relevance judgments are a key component for measuring the effectiveness of information retrieval systems using test collections. Since relevance is not an absolute concept, human assessors can disagree on particular topic-document pairs for a variety of reasons. In this work we investigate the effect that document presentation order has on in...
Conference Paper
We describe the UQV100 test collection, designed to incorporate variability from users. Information need "backstories" were written for 100 topics (or sub-topics) from the TREC 2013 and 2014 Web Tracks. Crowd workers were asked to read the backstories, and provide the queries they would use; plus effort estimates of how many useful documents they w...
Article
We present a study of which baseline to use when testing a new retrieval technique. In contrast to past work, we show that measuring a statistically significant improvement over a weak baseline is not a good predictor of whether a similar improvement will be measured on a strong baseline. Sometimes strong baselines are made worse when a new techniq...
Conference Paper
Full-text available
Retrieving finer grained text units such as passages or sentences as answers for non-factoid Web queries is becoming increasingly important for applications such as mobile Web search. In this work, we introduce the answer sentence retrieval task for non-factoid Web queries, and investigate how this task can be effectively solved under a learning to...
Conference Paper
A large number of metrics have been proposed to measure the effectiveness of information retrieval systems. Here we provide a detailed explanation of one recent proposal, INST, articulate the various properties that it embodies, and describe a number of pragmatic issues that need to be taken into account when writing an implementation. The result...