Alistair Moffat

University of Melbourne | MSD · Department of Computing and Information Systems

About

316
Publications
33,517
Reads
14,385
Citations
Additional affiliations
November 1986 - present
University of Melbourne
Position
  • Professor
February 1981 - November 1986
University of Canterbury
Position
  • Assistant Lecturer

Publications

Publications (316)
Article
Full-text available
In top- k ranked retrieval the goal is to efficiently compute an ordered list of the highest scoring k documents according to some stipulated similarity function such as the well-known BM25 approach. In most implementation techniques a min-heap of size k is used to track the top scoring candidates. In this work we consider the question of how best...
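A minimal sketch of the min-heap pattern mentioned in this abstract (a generic illustration only, not the paper's proposed technique): a heap of size k holds the best candidates seen so far, and a new document is admitted only if its score beats the current k-th best.

```python
import heapq

def top_k(scored_docs, k):
    """Return the k highest-scoring (doc_id, score) pairs.

    A size-k min-heap keeps the current candidates; the smallest retained
    score sits at the root, so a new document is admitted only if it beats
    that score. Illustrative sketch only.
    """
    heap = []                                  # entries are (score, doc_id)
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:               # beats the current k-th score
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)

# Example: candidates scored by some similarity function such as BM25.
print(top_k([("d1", 2.3), ("d2", 5.1), ("d3", 0.7), ("d4", 4.2)], k=2))
# [(5.1, 'd2'), (4.2, 'd4')]
```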
Article
Web connectivity graphs and similar linked data such as inverted indexes are important components of the information access systems provided by social media and web search services. The Bipartite Graph Partitioning mechanism of Dhulipala et al. [KDD 2016] relabels the vertices of large sparse graphs, seeking to enhance compressibility and thus redu...
Preprint
The recent MSMARCO passage retrieval collection has allowed researchers to develop highly tuned retrieval systems. One aspect of this data set that makes it distinctive compared to traditional corpora is that most of the topics only have a single answer passage marked relevant. Here we carry out a "what if" sensitivity study, asking whether a set o...
Article
We consider the precedence and priority issues that arise from the increasingly common trend of distributing unrefereed preprints via services such as arXiv prior to or at the same time as they are submitted for peer review, and the effect that this practice can have on the integrity of the scientific review process. We offer some suggestions which...
Chapter
IR test collections make use of human annotated judgments. However, new systems that surface unjudged documents high in their result lists might undermine the reliability of statistical comparisons of system effectiveness, eroding the collection’s value. Here we explore a Bayesian inference-based analysis in a “high uncertainty” evaluation scenario...
Chapter
Privacy concerns can prohibit research access to large-scale commercial query logs. Here we focus on generation of a synthetic log from a publicly available dataset, suitable for evaluation of query auto completion (QAC) systems. The synthetic log contains plausible string sequences reflecting how users enter their queries in a QAC interface. Prope...
Article
Full-text available
Query auto completion (QAC) is used in search interfaces to interactively offer a list of suggestions to users as they enter queries. The suggested completions are updated each time the user modifies their partial query, as they either add further keystrokes or interact directly with completions that have been offered. In this work we use a state m...
Conference Paper
Offline metrics for IR evaluation are often derived from a user model that seeks to capture the interaction between the user and the ranking, conflating the interaction with a ranking of documents with the user's interaction with the search results page. A desirable property of any effectiveness metric is that the scores it generates over a set of ra...
Conference Paper
Full-text available
A wide range of evaluation metrics have been proposed to measure the quality of search results, including in the presence of diversification. Some of these metrics have been adapted for use in search tasks with different complexities, such as where the search system returns lists of different lengths. Given the range of requirements, it can be diff...
Conference Paper
Relevance judgments are conventionally formed by small numbers of experts using ordinal relevance scales defined by two or more relevance categories. Such judgments often contain many ties: documents in the same category that cannot be separated by relevance. Here we explore the use of crowd-sourcing and combined three-way relevance assessments usi...
Article
One typical way of building test collections for offline measurement of information retrieval systems is to pool the ranked outputs of different systems down to some chosen depth d and then form relevance judgments for those documents only. Non-pooled documents—ones that did not appear in the top-d sets of any of the contributing systems—are then d...
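For illustration, a minimal sketch of depth-d pooling as described in this abstract; the run data and system names below are made up.

```python
def pool_to_depth(runs, d):
    """Form a judgment pool from the top-d documents of each system run.

    runs: dict mapping system name to its ranked list of doc ids.
    Documents outside every top-d set are the non-pooled documents,
    which are conventionally treated as not relevant. Hypothetical sketch.
    """
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:d])
    return pool

runs = {
    "sysA": ["d3", "d1", "d7", "d9"],
    "sysB": ["d1", "d4", "d3", "d8"],
}
print(sorted(pool_to_depth(runs, d=2)))   # ['d1', 'd3', 'd4']
```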
Conference Paper
Full-text available
Errors in formulation of queries made by users can lead to poor search results pages. We performed a living lab study using online A/B testing to measure the degree of improvement achieved with a query amendment technique when applied to a commercial job search engine. Of particular interest in this case study is a clear "success" signal, namely, t...
Article
Full-text available
The purpose of the Strategic Workshop in Information Retrieval in Lorne is to explore the long-range issues of the Information Retrieval field, to recognize challenges that are on, or even over, the horizon, to build consensus on some of the key challenges, and to disseminate the resulting information to the research community. The intent is that thi...
Conference Paper
We examine approaches used for block-based inverted index compression, such as the OptPFOR mechanism, in which fixed-length blocks of postings data are compressed independently of each other. Building on previous work in which asymmetric numeral systems (ANS) entropy coding is used to represent each block, we explore a number of enhancements: (i) t...
Article
Efficient storage of large inverted indexes is one of the key technologies that support current web search services. Here we re-examine mechanisms for representing document-level inverted indexes and within-document term frequencies, including comparing specialized methods developed for this task against recent fast implementations of general-purpo...
Conference Paper
Query performance prediction estimates the effectiveness of a query in advance of human judgements. Accurate prediction could be used, for example, to trigger special processing, select query variants, or choose whether to search at all. Prediction evaluations have not distinguished effects due to query wording from effects due to the underlying in...
Conference Paper
Query auto completion mechanisms assist users to formulate search requests by suggesting possible queries corresponding to incomplete text they have typed. Keystroke by keystroke, these mechanisms proceed by finding matching strings from resources such as logs that have captured the behavior of previous users; they might also be informed by key phr...
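As a rough illustration of the core lookup step described here, the sketch below matches a partial query against a sorted log of past queries; real QAC systems also rank candidates by popularity, recency, and mined phrases, and the log shown is hypothetical.

```python
import bisect

def completions(log_queries, prefix, n=5):
    """Suggest up to n completions for a partial query.

    log_queries: sorted list of unique past queries (e.g. from a query log).
    Binary search locates the contiguous range of queries sharing the prefix;
    the first n matches are returned in lexicographic order. Sketch only.
    """
    lo = bisect.bisect_left(log_queries, prefix)
    hi = bisect.bisect_right(log_queries, prefix + "\uffff")
    return log_queries[lo:hi][:n]

log = sorted({"information retrieval", "index compression",
              "inverted index", "information need"})
print(completions(log, "in"))    # all four logged queries
print(completions(log, "inf"))   # ['information need', 'information retrieval']
```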
Conference Paper
Search is near-ubiquitous in human society, being used for entertainment, health, financial and business information seeking. Traditional methods of search evaluation have assumed that searchers move forward through search results in a linear manner; early eye tracking studies have suggested the same. Recent research, though, including eye-tracking...
Conference Paper
Techniques for effectively representing the postings lists associated with inverted indexes have been studied for many years. Here we combine the recently developed "asymmetric numeral systems" (ANS) approach to entropy coding and a range of previous index compression methods, including VByte, Simple, and Packed. The ANS mechanism allows each of th...
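Of the codes named in this abstract, VByte is the simplest to sketch; the following encoder/decoder is illustrative only and is not the paper's ANS-based mechanism.

```python
def vbyte_encode(numbers):
    """VByte-encode a list of non-negative integers.

    Each integer is split into 7-bit groups, low-order first; the high bit
    is set on the final byte of each integer. Illustrative sketch of one
    classic byte-aligned code.
    """
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)       # continuation byte, high bit clear
            n >>= 7
        out.append(n | 0x80)           # terminating byte, high bit set
    return bytes(out)

def vbyte_decode(data):
    """Decode a VByte-encoded byte string back into integers."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:                   # terminator: emit the value
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers

gaps = [1, 7, 300, 128, 5]             # e.g. docid gaps from a postings list
assert vbyte_decode(vbyte_encode(gaps)) == gaps
```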
Article
Given an effectiveness metric M(·), two ordered document rankings X_1 and X_2 generated by a score-based information retrieval activity, and relevan...
Conference Paper
Increasing test collection sizes and limited judgment budgets create measurement challenges for IR batch evaluations, challenges that are greater when using deep effectiveness metrics than when using shallow metrics, because of the increased likelihood that unjudged documents will be encountered. Here we study the problem of metric score adjustment...
Conference Paper
A search engine that can return the ideal results for a person's information need, independent of the specific query that is used to express that need, would be preferable to one that is overly swayed by the individual terms used; search engines should be consistent in the presence of syntactic query variations responding to the same information ne...
Article
Information retrieval systems aim to help users satisfy information needs. We argue that the goal of the person using the system, and the pattern of behavior that they exhibit as they proceed to attain that goal, should be incorporated into the methods and techniques used to evaluate the effectiveness of IR systems, so that the resulting effectiven...
Article
Full-text available
Simulation and analysis have shown that selective search can reduce the cost of large-scale distributed information retrieval. By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of shards to search for each query, fewer postings are evaluated. In this paper we extend the study of...
Conference Paper
Vast amounts of data are collected and stored every day, as part of corporate knowledge bases and as a response to legislative compliance requirements. To reduce the cost of retaining such data, compression tools are often applied. But simply seeking the best compression ratio is not necessarily the most economical choice, and other factors also co...
Conference Paper
Indexed pattern search in text has been studied for many decades. For small alphabets, the FM-Index provides unmatched performance for Count operations, in terms of both space required and search speed. For large alphabets – for example, when the tokens are words – the situation is more complex, and FM-Index representations are compact, but potenti...
Conference Paper
Full-text available
The Web has created a global marketplace for e-Commerce as well as for talent. Online employment marketplaces provide an effective channel to facilitate the matching between job seekers and hirers. This paper presents an initial exploration of user behavior in job and talent search using query and click logs from a popular employment marketplace. T...
Conference Paper
Batch-mode retrieval evaluation relies on suitable relevance judgments being available. Here we explore the implications on pool size of adopting a "query variations" approach to collection construction. Using the resources provided as part of the UQV100 collection [Bailey et al., SIGIR 2016] and a total of five different systems, we show that pool...
Conference Paper
We explore the implications of tied scores arising in the document similarity scoring regimes that are used when queries are processed in a retrieval engine. Our investigation has two parts: first, we evaluate past TREC runs to determine the prevalence and impact of tied scores, to understand the alternative treatments that might be used to handle...
Conference Paper
Batched evaluations in IR experiments are commonly built using relevance judgments formed over a sampled pool of documents. However, judgment coverage tends to be incomplete relative to the metrics being used to compute effectiveness, since collection size often makes it financially impractical to judge every document. As a result, a considerable b...
Conference Paper
Bag-of-words retrieval models are widely used, and provide a robust trade-off between efficiency and effectiveness. These models often make simplifying assumptions about relations between query terms, and treat term statistics independently. However, query terms are rarely independent, and previous work has repeatedly shown that term dependencies c...
Article
Motivation: Next generation sequencing machines produce vast amounts of genomic data. For the data to be useful, it is essential that it can be stored and manipulated efficiently. This work responds to the combined challenge of compressing genomic data, while providing fast access to regions of interest, without necessitating decompression of whol...
Article
Full-text available
Large-scale retrieval systems are often implemented as a cascading sequence of phases -- a first filtering step, in which a large set of candidate documents are extracted using a simple technique such as Boolean matching and/or static document scores; and then one or more ranking steps, in which the pool of documents retrieved by the filter is scor...
Article
Full-text available
Batch IR evaluations are usually performed in a framework that consists of a document collection, a set of queries, a set of relevance judgments, and one or more effectiveness metrics. A large number of evaluation metrics have been proposed, with two primary families having emerged: recall-based metrics, and utility-based metrics. In both families,...
Conference Paper
We describe the UQV100 test collection, designed to incorporate variability from users. Information need "backstories" were written for 100 topics (or sub-topics) from the TREC 2013 and 2014 Web Tracks. Crowd workers were asked to read the backstories, and provide the queries they would use; plus effort estimates of how many useful documents they w...
Conference Paper
Simulation and analysis have shown that selective search can reduce the cost of large-scale distributed information retrieval. By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of shards to search for each query, fewer postings are evaluated. Here we extend the study of selectiv...
Conference Paper
Many types of search tasks are answered through the computation of a ranked list of suggested answers. We re-examine the usual assumption that answer lists should be as long as possible, and suggest that when the number of matching items is potentially small -- perhaps even zero -- it may be more helpful to "quit while ahead", that is, to truncate...
Article
Indexed pattern search in text has been studied for many decades. For small alphabets, the FM-Index provides unmatched performance, in terms of both space required and search speed. For large alphabets -- for example, when the tokens are words -- the situation is more complex, and FM-Index representations are compact, but potentially slow. In this...
Conference Paper
Web crawls generate vast quantities of text, retained and archived by the search services that initiate them. To store such data and to allow storage costs to be minimized, while still providing some level of random access to the compressed data, efficient and effective compression techniques are critical. The Relative Lempel Ziv (RLZ) scheme provi...
Conference Paper
Full-text available
Selective search is a distributed retrieval technique that reduces the computational cost of large-scale information retrieval. By partitioning the collection into topical shards, and using a resource selection algorithm to identify a subset of shards to search, selective search allows retrieval effectiveness to be maintained while evaluating fewer...
Conference Paper
Web archives, query and proxy logs, and so on, can all be very large and highly repetitive; and are accessed only sporadically and partially, rather than continually and holistically. This type of data is ideal for compression-based archiving, provided that random-access to small fragments of the original data can be achieved without needing to dec...
Conference Paper
A large number of metrics have been proposed to measure the effectiveness of information retrieval systems. Here we provide a detailed explanation of one recent proposal, INST, articulate the various properties that it embodies, and describe a number of pragmatic issues that need to be taken into account when writing an implementation. The result...
Conference Paper
Sophisticated ranking mechanisms make use of term dependency features in order to compute similarity scores for documents. These features often include exact phrase occurrences, and term proximity estimates. Both cases build on the intuition that if multiple query terms appear near each other, the document is more likely to be relevant to the query...
Conference Paper
Evaluation of information retrieval systems with test collections makes use of a suite of fixed resources: a document corpus; a set of topics; and associated judgments of the relevance of each document to each topic. With large modern collections, exhaustive judging is not feasible. Therefore an approach called pooling is typically used where, for...
Article
Crowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd's work be filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts agreement le...
Conference Paper
Effective postings list compression techniques, and the efficiency of postings list processing schemes such as WAND, have significantly improved the practical performance of ranked document retrieval using inverted indexes. Recently, suffix array-based index structures have been proposed as a complementary tool, to support phrase searching. The rel...
Conference Paper
Test collection design eliminates sources of user variability to make statistical comparisons among information retrieval (IR) systems more affordable. Does this choice unnecessarily limit generalizability of the outcomes to real usage scenarios? We explore two aspects of user variability with regard to evaluating the relative performance of IR sys...
Article
Huffman codes are legendary in the computing disciplines, and are embedded in a wide range of critically important communications and storage codecs. With 2015 marking the 64th anniversary of their development (1,000,000 years in binary), it is timely to review Huffman and related codes, and the many mechanisms that have been developed for computing a...
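A compact sketch of the classic Huffman construction, repeatedly merging the two least-weight subtrees and computing codeword lengths only; this is a generic illustration, not any of the specialized mechanisms surveyed in the paper.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Compute Huffman codeword lengths for the symbols of `text`.

    Each merge of the two least-weight trees adds one bit to the length of
    every leaf below it. Returns a dict of symbol -> code length.
    """
    freqs = Counter(text)
    if len(freqs) == 1:                        # degenerate single-symbol case
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, tie_breaker, {symbol: code_length_so_far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: length + 1 for s, length in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman_code_lengths("abracadabra"))
# 'a' gets a 1-bit code; 'b', 'r', 'c', 'd' get 3-bit codes (one optimal assignment)
```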
Conference Paper
The use of phrases as part of similarity computations can enhance search effectiveness. But the gain comes at a cost, either in terms of index size, if all word-tuples are treated as queryable objects; or in terms of processing time, if postings lists for phrases are constructed at query time. There is also a lack of clarity as to which phrases are...
Conference Paper
Information retrieval systems are often evaluated through the use of effectiveness metrics. In the past, the metrics used have corresponded to fixed models of user behavior, presuming, for example, that the user will view a pre-determined number of items in the search engine results page, or that they have a constant probability of advancing from o...
Conference Paper
A citation network is a structure of linked documents that share a pool of authors and a pool of subjects, and via citations, provide references to related documents that have preceded them in the chronology of research. In this paper we review citation networks, and survey and categorize the operations that extract data from them. Our goal is to c...
Conference Paper
Information retrieval systems can be evaluated in laboratory settings through the use of user studies, and through the use of test collections and effectiveness metrics. In a larger investigation we are exploring the extent to which individual user differences and behaviours can affect the scores generated by a retrieval system. Our objective in th...
Conference Paper
The dominant retrieval models in information retrieval systems today are variants of TF×IDF, and typically use bag-of-words processing in order to balance recall and precision. However, the size of collections continues to increase, and the number of results produced by these models exceeds the number of documents that can be reasonably assessed. T...
Conference Paper
We consider the problem of pattern-search in compressed text in a context in which: (a) the text is stored as a sequence of factors against a static phrase-book; (b) decoding of factors is from right-to-left; and (c) extraction of each symbol in each factor requires Θ(log σ) time, where σ is the size of the original alphabet. To determine possible a...
Article
Understanding and modeling user behavior is critical to designing search systems: it allows us to drive batch evaluations, predict how users would respond to changes in systems or interfaces, and suggest ideas for improvement. In this work we present a comprehensive model of the interactions between a searcher and a search engine, and the decisions...
Article
Score-safe index processing has received a great deal of attention over the last two decades. By pre-calculating maximum term impacts during indexing, the number of scoring operations can be minimized, and the top-k documents for a query can be located efficiently. However, these methods often ignore the importance of the effectiveness gains possib...
Chapter
Engineering efficient implementations of compact and succinct structures is time-consuming and challenging, since there is no standard library of easy-to-use, highly optimized, and composable components. One consequence is that measuring the practical impact of new theoretical proposals is difficult, since older baseline implementations may not rel...
Article
Descriptions of new string search or indexing algorithms are often accompanied by an experimental evaluation. In this article, we provide guidance as to how such investigations can be carried out, drawing on our experience of measurement in this field. In particular, we describe methodologies for stratifying patterns according to their length and f...
Article
Full-text available
Next-generation sequencing technologies are revolutionizing medicine. Data from sequencing technologies is typically represented as a string of bases, an associated sequence of per-base quality scores, and other meta-data; and in aggregate can require a very large amount of space. The quality scores show how accurate the bases are with respect to th...
Article
Problem Definition: Suppose that a message M = ⟨s_1, s_2, ..., s_n⟩ of length n = |M| symbols is to be represented, where each symbol s_i is an integer in the range 1 ≤ s_i ≤ U, for some upper limit U that may or may not be known, and may or may not be finite. Messages in this form are commonly the output...
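As one concrete example of a code for this symbol model, a sketch of Elias gamma coding is shown below; it is illustrative only and not necessarily the code the entry itself covers.

```python
def elias_gamma_encode(n):
    """Elias gamma codeword for an integer n >= 1: (len-1) zero bits,
    followed by the binary form of n. Illustrative bit-string sketch."""
    assert n >= 1
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def elias_gamma_decode(bits):
    """Decode a concatenation of gamma codewords back into integers."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":              # count the unary length prefix
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return out

msg = [1, 5, 2, 17]
coded = "".join(elias_gamma_encode(s) for s in msg)
print(coded)        # '100101010000010001', i.e. 1 | 00101 | 010 | 000010001
assert elias_gamma_decode(coded) == msg
```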
Conference Paper
Recent human evaluation of machine translation has focused on relative preference judgments of translation quality, making it difficult to track longitudinal improvements over time. We carry out a large-scale crowd-sourcing experiment to estimate the degree to which state-of-the-art performance in machine translation has increased over the past fiv...
Conference Paper
Search effectiveness metrics quantify the relevance of the ranked document lists returned by retrieval systems. In this paper we characterize metrics according to seven numeric properties – boundedness, monotonicity, convergence, top-weightedness, localization, completeness, and realizability. We demonstrate that these properties partition the comm...
Conference Paper
Search engine result pages – the ten blue links – are a staple of document retrieval services. The usual presumption is that users read these one-by-one from the top, making judgments about the usefulness of documents based on the snippets presented, accessing the underlying document when a snippet seems attractive, and then moving on to the next s...
Conference Paper
Web search services process thousands of queries per second, and filter their answers from collections containing very large amounts of data. Fast response to queries is a critical service expectation. The well-known WAND processing strategy is one way of reducing the amount of computation necessary when executing such a query. The value of WAND ha...
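A simplified sketch of the WAND pruning idea described here, assuming a plain additive scoring model and linear cursor advancement instead of skip pointers; the postings, identifiers, and scores in the example are made up.

```python
import heapq

def wand_top_k(postings, k):
    """Simplified WAND-style pruning for top-k retrieval.

    postings: dict term -> list of (docid, score), sorted by docid.
    Documents whose accumulated per-term upper bounds cannot beat the
    current k-th best score are skipped without being fully scored.
    """
    ubound = {t: max(s for _, s in pl) for t, pl in postings.items()}
    cursor = {t: 0 for t in postings}
    heap, threshold = [], 0.0               # min-heap of (score, docid)

    def doc(t):                             # current docid of term t, or None
        i = cursor[t]
        return postings[t][i][0] if i < len(postings[t]) else None

    while True:
        live = sorted((t for t in postings if doc(t) is not None), key=doc)
        if not live:
            break
        # Pivot: first term at which the accumulated upper bounds could
        # still beat the score of the current k-th best document.
        acc, pivot = 0.0, None
        for t in live:
            acc += ubound[t]
            if acc > threshold or len(heap) < k:
                pivot = t
                break
        if pivot is None:
            break                           # nothing remaining can enter the top k
        pivot_doc = doc(pivot)
        if doc(live[0]) == pivot_doc:
            # All earlier cursors sit on the pivot document: score it fully.
            score = 0.0
            for t in live:
                if doc(t) == pivot_doc:
                    score += postings[t][cursor[t]][1]
                    cursor[t] += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
            if len(heap) == k:
                threshold = heap[0][0]
        else:
            # Otherwise move the first cursor forward to the pivot document.
            t = live[0]
            while doc(t) is not None and doc(t) < pivot_doc:
                cursor[t] += 1
    return sorted(heap, reverse=True)

postings = {
    "cat": [(1, 1.2), (4, 0.8), (7, 2.0)],
    "hat": [(2, 1.0), (4, 1.5), (9, 0.3)],
}
print(wand_top_k(postings, k=2))   # [(2.3, 4), (2.0, 7)]
```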
Conference Paper
Web search tools are used on a daily basis by billions of people. The commercial providers of these services spend large amounts of money measuring their own effectiveness and benchmarking against their competitors; nothing less than their corporate survival is at stake. Techniques for offline or "batch" evaluation of search quality have received c...
Conference Paper
Full-text available
Human evaluation of machine translation quality is a key element in the development of machine translation systems, as automatic metrics are validated through correlation with human judgment. However, achievement of consistent human judgments of machine translation is not easy, with decreasing levels of consistency reported in annual evaluation ca...
Conference Paper
Engineering efficient implementations of compact and succinct structures is a time-consuming and challenging task, since there is no standard library of easy-to-use, highly optimized, and composable components. One consequence is that measuring the practical impact of new theoretical proposals is a difficult task, since older baseline implementa...
Conference Paper
Retrieval system effectiveness can be measured in two quite different ways: by monitoring the behavior of users and gathering data about the ease and accuracy with which they accomplish certain specified information-seeking tasks; or by using numeric effectiveness metrics to score system runs in reference to a set of relevance judgments. In the sec...
Conference Paper
The suffix array is an efficient in-memory data structure for pattern search; and two-level variants also exist that are suited to external searching and can handle strings larger than the available memory. Assuming the latter situation, we introduce a factor-based mechanism for compressing the text string that integrates seamlessly with the in-mem...
Article
The suffix array is an efficient data structure for in-memory pattern search. Suffix arrays can also be used for external-memory pattern search, via two-level structures that use an internal index to identify the correct block of suffix pointers. In this paper we describe a new two-level suffix array-based index structure that requires significantl...
Conference Paper
Genomic sequence data is being generated in massive quantities, and must be stored in compressed form. Here we examine the combined challenge of storing such data compactly, yet providing bioinformatics researchers with the ability to extract particular regions of interest without needing to fully decompress multi-gigabyte data collections. We focu...
Article
When faced with a poor set of document summaries on the first page of returned search results, a user may respond in various ways: by proceeding on to the next page of results; by entering another query; by switching to another service; or by abandoning their search. We analyse this aspect of searcher behaviour using a commercial search system, com...