Mark Sanderson

RMIT University | RMIT · School of Computer Science and Information Technology

PhD

About

397
Publications
79,769
Reads
9,847
Citations
Introduction
I am a Professor at RMIT University. I'm a researcher in information retrieval, where I work on the evaluation of search engines, summarisation, geographic search, and log analysis.
Additional affiliations
September 2010 - present
RMIT University
April 1999 - August 2010
The University of Sheffield
January 1998 - March 1999
University of Massachusetts Amherst

Publications (397)
Article
Full-text available
Digital Assistants (DAs) can support workers in the workplace and beyond. However, target user needs are not fully understood, and the functions that workers would ideally want a DA to support require further study. A richer understanding of worker needs could help inform the design of future DAs. We investigate user needs of future workplace DAs u...
Conference Paper
Full-text available
Where do queries, the words searchers type into a search box, come from? The Information Retrieval community understands the performance of queries and search engines extensively, and has recently begun to examine the impact of query variation, showing that different queries for the same information need produce different results. In an information e...
Preprint
Full-text available
Asking clarification questions is an active area of research; however, resources for training and evaluating search clarification methods are not sufficient. To address this issue, we describe MIMICS-Duo, a new freely available dataset of 306 search queries with multiple clarifications (a total of 1,034 query-clarification pairs). MIMICS-Duo contai...
Preprint
This volume contains the position papers presented at CSCW 2021 Workshop - Investigating and Mitigating Biases in Crowdsourced Data, held online on 23rd October 2021, at the 24th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2021). The workshop explored how specific crowdsourcing workflows, worker attributes, and...
Article
Full-text available
This research analyzes human‐generated clarification questions to provide insights into how they are used to disambiguate and provide a better understanding of information needs. A set of clarification questions is extracted from posts on the Stack Exchange platform. A novel taxonomy is defined for the annotation of the questions and their responses....
Conference Paper
Full-text available
Studies of interaction log analysis are a common tool to investigate behavioural data and contribute to insights into users' interaction patterns with a system [11, 18]. We present a log analysis from a bespoke conversational system, RealSAM, an audio-only interaction media assistant in which users can navigate and interact with media content thro...
Article
We investigate the impact of popularity bias in false-positive metrics in the offline evaluation of recommender systems. Unlike their true-positive complements, false-positive metrics reward systems that minimize recommendations disliked by users. Our analysis is, to the best of our knowledge, the first to show that false-positive metrics tend to p...
Article
Recent years have seen an increase in the number of publicly available datasets that are released to foster research in question answering systems. In this work, we survey the available datasets and also provide a simple, multi-faceted classification of those datasets. We further survey the most recent evaluation results that form the current state...
Preprint
Full-text available
This paper investigates the Cyber-Physical behavior of users in a large indoor shopping mall by leveraging anonymized (opt in) Wi-Fi association and browsing logs recorded by the mall operators. Our analysis shows that many users exhibit a high correlation between their cyber activities and their physical context. To find this correlation, we propo...
Article
In the week of November 10-15, 2019, 44 researchers from the fields of information retrieval and Web search, natural language processing, human computer interaction, and dialogue systems met for the Dagstuhl Seminar 19461 "Conversational Search" to share the latest development in the area of conversational search and discuss its research agenda an...
Preprint
Dagstuhl Seminar 19461 "Conversational Search" was held on 10-15 November 2019. 44 researchers in Information Retrieval and Web Search, Natural Language Processing, Human Computer Interaction, and Dialogue Systems were invited to share the latest development in the area of Conversational Search and discuss its research agenda and future directions....
Article
The rapid growth in speech and small screen interfaces, particularly on mobile devices, has significantly influenced the way users interact with intelligent systems to satisfy their information needs. The growing interest in personal digital assistants, such as Amazon Alexa, Apple Siri, Google Assistant, and Microsoft Cortana, demonstrates the will...
Article
Full-text available
This paper investigates the Cyber-Physical behavior of a user in a large indoor shopping center by leveraging anonymized (opt in) Wi-Fi association and browsing logs recorded by the center operators. Our analysis shows that many users exhibit high correlation between their cyber activities and physical context. To find this correlation, we propose...
Article
Full-text available
When evaluating IR run effectiveness using a test collection, a key question is: What search topics should be used? We explore what happens to measurement accuracy when the number of topics in a test collection is reduced, using the Million Query 2007, TeraByte 2006, and Robust 2004 TREC collections, which all feature more than 50 topics, something...
Preprint
Full-text available
This paper discusses the potential for creating academic resources (tools, data, and evaluation approaches) to support research in conversational search, by focusing on realistic information needs and conversational interactions. Specifically, we propose to develop and operate a prototype conversational search system for scholarly activities. This...
Article
Full-text available
Neural sequence-to-sequence (seq2seq) models have been widely used in abstractive summarization tasks. One challenge of this task is that redundant content in the input document often confuses the models and leads to poor performance. An efficient way to solve this problem is to select salient information from the input document. In this paper,...
Article
Geolocating Twitter users—the task of identifying their home locations—serves a wide range of community and business applications such as managing natural crises, journalism, and public health. Many approaches have been proposed for automatically geolocating users based on their tweets; at the same time, various evaluation metrics have been propose...
Article
Full-text available
The purpose of the SIGIR 2019 workshop on Fairness, Accountability, Confidentiality, Transparency, and Safety (FACTS-IR) was to explore challenges in responsible information retrieval system development and deployment. To this end, the workshop aimed to crowd-source from the larger SIGIR community and draft an actionable research agenda on five ke...
Article
A lack of reliable relevance labels for training ranking functions is a significant problem for many search applications. Transfer ranking is a technique aiming to transfer knowledge from an existing machine learning ranking task to a new ranking task. Unsupervised transfer ranking is a special case of transfer ranking where there aren’t any releva...
Conference Paper
Full-text available
We investigated the learning process in search by conducting a log-based study involving registered job seekers of a commercial job search engine. The analysis shows that job search is a complex task: seekers usually submit multiple queries over sessions that can last days or even weeks. We find that querying, clicking, and job application rates ch...
Preprint
Full-text available
Conversation is the natural mode for information exchange in daily life, so a spoken conversational interaction for search input and output is a logical format for information seeking. However, the conceptualisation of user-system interactions or information exchange in spoken conversational search (SCS) has not been explored. The first step in concep...
Preprint
Full-text available
Geolocating Twitter users, the task of identifying their home locations, serves a wide range of community and business applications such as managing natural crises, journalism, and public health. Many approaches have been proposed for automatically geolocating users based on their tweets; at the same time, various evaluation metrics have been pro...
Conference Paper
Full-text available
We improve the measurement accuracy of retrieval system performance by better modeling the noise present in test collection scores. Our technique draws its inspiration from two approaches: one, which exploits the variable measurement accuracy of topics; the other, which randomly splits document collections into shards. We describe and theoretically...
Conference Paper
Full-text available
Intelligent assistants can serve many purposes, including entertainment (e.g. playing music), home automation, and task management (e.g. timers, reminders). The role of these assistants is evolving to also support people engaged in work tasks, in workplaces and beyond. To design truly useful intelligent assistants for work, it is important to bette...
Article
Full-text available
Despite the bulk of research studying how to more accurately compare the performance of IR systems, less attention is devoted to better understanding the different factors which play a role in such performance and how they interact. This is the case of shards, i.e. partitioning a document collection into sub-parts, which are used for many different...
Conference Paper
The task intelligence workshop at the 2019 ACM Web Search and Data Mining (WSDM) conference comprised a mixture of research paper presentations, reports from data challenge participants, invited keynote(s) on broad topics related to tasks, and a workshop-wide discussion about task intelligence and its implications for system development.
Article
Full-text available
Understanding the association between customer demographics and behaviour is critical for operators of indoor retail spaces. This study explores such an association based on a combined understanding of customer Cyber (online), Physical, and (some aspects of) Social (CPS) behaviour, at the conjunction of corresponding CPS spaces. We combine the resu...
Article
Full-text available
Purpose: Social media platforms provide a source of information about events. However, this information may not be credible, and the distance between an information source and the event may impact on that credibility. Therefore, the purpose of this paper is to address an understanding of the relationship between sources, physical distance from that...
Conference Paper
Full-text available
This paper investigates the Cyber-Physical behavior of a user in a large indoor shopping center by leveraging anonymized (opt in) Wi-Fi association and browsing logs recorded by the center operators. Our analysis shows that many users exhibit high correlation between their cyber activities and physical context. To find this correlation, we propose...
Conference Paper
Full-text available
Errors in formulation of queries made by users can lead to poor search results pages. We performed a living lab study using online A/B testing to measure the degree of improvement achieved with a query amendment technique when applied to a commercial job search engine. Of particular interest in this case study is a clear "success" signal, namely, t...
Article
Full-text available
The purpose of the Strategic Workshop in Information Retrieval in Lorne is to explore the long-range issues of the Information Retrieval field, to recognize challenges that are on, or even over, the horizon, to build consensus on some of the key challenges, and to disseminate the resulting information to the research community. The intent is that thi...
Conference Paper
Full-text available
Evidence derived from passages that closely represent likely answers to a posed query can be useful input to the ranking process. Based on a novel use of Community Question Answering data, we present an approach for the creation of such passages. A general framework for extracting answer passages and estimating their quality is proposed, and this e...
Chapter
Full-text available
We address the problem of identifying in-app user actions from Web access logs when the content of those logs is both encrypted (through HTTPS) and also contains automated Web accesses. We find that the distribution of time gaps between HTTPS accesses can distinguish user actions from automated Web accesses generated by the apps, and we determine t...
Conference Paper
Full-text available
We address the problem of identifying in-app user actions from Web access logs when the content of those logs is both encrypted (through HTTPS) and also contains automated Web accesses. We find that the distribution of time gaps between HTTPS accesses can distinguish user actions from automated Web accesses generated by the apps, and we determine th...
Conference Paper
Full-text available
We conducted a laboratory-based observational study where pairs of people performed search tasks communicating verbally. Examination of the discourse allowed commonly used interactions to be identified for Spoken Conversational Search (SCS). We compared the interactions to existing models of search behaviour. We find that SCS is more complex and in...
Conference Paper
Typing is a common form of query input for search engines and other information retrieval systems; we therefore investigate the relationship between typing behavior and search interactions. The search process is interactive and typically requires entering one or more queries, and assessing both summaries from Search Engine Result Pages and the unde...
Article
Full-text available
Traditionally, recommender systems modelled the physical and cyber contextual influence on people's moving, querying, and browsing behaviours in isolation. Yet, searching, querying and moving behaviours are intricately linked, especially indoors. Here, we introduce a tripartite location-query-browse graph (LQB) for nuanced contextual recommendation...
Conference Paper
Full-text available
We investigated the credibility perception of tweet readers from the USA and from eight Arabic countries; our aim was to understand if credibility was affected by country and/or by culture. Results from a crowd-sourcing experiment showed that a wide variety of factors affected credibility perception, including a tweet author’s gender, profil...
Conference Paper
Full-text available
This paper investigates if Information Foraging Theory can be used to understand differences in user behavior when searching on mobile and desktop web search systems. Two groups of thirty-six participants were recruited to carry out six identical web search tasks on desktop or on mobile. The search tasks were prepared with a different number and di...
Conference Paper
Full-text available
The increase of voice-based interaction has changed the way people seek information, making search more conversational. Development of effective conversational approaches to search requires better understanding of how people express information needs in dialogue. This paper describes the creation and examination of over 32K spoken utterances collec...
Conference Paper
Full-text available
Incorporating conventional, unsupervised features into a neural architecture has the potential to improve modeling effectiveness, but this aspect is often overlooked in the research of deep learning models for information retrieval. We investigate this incorporation in the context of answer sentence selection, and show that combining a set of query...
Conference Paper
Full-text available
Understanding the factors comprising IR system effectiveness is of primary importance to compare different IR systems. Effectiveness is traditionally broken down, using ANOVA, into a topic and a system effect but this leaves out a key component of our evaluation paradigm: the collections of documents. We break down effectiveness into topic, system and...
Article
Re-finding is the process of searching for information that a user has previously encountered and is a common activity carried out with information retrieval systems. In this work, we investigate re-finding in the context of vertical search, differentiating and modeling user re-finding behavior within different media and topic domains, including im...
Article
Searching for specific topics on Twitter, readers have to judge the credibility of tweets. In this paper, we examine the relationship between reader demographics, news attributes and tweet features with reader’s credibility perception, and further examine the correlation among these factors. We found that reader’s educational background and geo-loc...
Conference Paper
Full-text available
We investigate the influence of language on the accuracy of geolocating Twitter users. Our analysis, using a large corpus of tweets written in thirteen languages, provides a new understanding of the reasons behind reported performance disparities between languages. The results show that data imbalance has a greater impact on accuracy than geographi...