Bodo Billerbeck

Microsoft · Bing

PhD

About

41
Publications
3,012
Reads
544
Citations
Additional affiliations
October 2007 - present
Microsoft
Position
  • Senior Applied Scientist
January 2000 - July 2005
RMIT University
Position
  • Sessional Head Tutor
Education
February 2003 - September 2005
RMIT University
Field of study
  • Computer Science, Information Retrieval
September 1997 - February 2002
RMIT University
Field of study
  • Computer Systems Engineering
February 1997 - September 2002
RMIT University
Field of study
  • Computer Science

Publications (41)
Preprint
Search techniques make use of elementary information such as term frequencies and document lengths in computation of similarity weighting. They can also exploit richer statistics, in particular the number of documents in which any two terms co-occur. In this paper we propose alternative methods for computing this statistic, a challenging task becau...
Preprint
Users of Web search engines reveal their information needs through queries and clicks, making click logs a useful asset for information retrieval. However, click logs have not been publicly released for academic use, because they can be too revealing of personally or commercially sensitive information. This paper describes a click data release rela...
Chapter
Privacy concerns can prohibit research access to large-scale commercial query logs. Here we focus on generation of a synthetic log from a publicly available dataset, suitable for evaluation of query auto completion (QAC) systems. The synthetic log contains plausible string sequences reflecting how users enter their queries in a QAC interface. Prope...
Chapter
The SynthaCorpus methods for generating words (Chapters 4 and 5) are concerned with generating a series of integers Ri representing the ranks of the words in a Zipf-style ordering. If we are given a lexicon (for example the lexicon of a corpus being emulated), we can convert each Ri to a string by simple look-up. If we have no lexicon and no intere...
Chapter
Term dependence manifests itself as patterns of word associations which would be very unlikely to be observed if words were randomly scattered throughout the corpus.
Chapter
There are two principal use cases for simulating information retrieval test collections: emulating private corpora to train and tune algorithms, and to estimate hardware requirements; and generating artificial test collections to support academic research, particularly into efficiency and scalability...
Chapter
In order to study query processing efficiency and effectiveness using a simulated text corpus, it is necessary to obtain a set of compatible queries and judgments. The text generation methods implemented in SynthaCorpus make no pretence of being able to generate meaningful natural language. Consequently, it is presently out of the question that ad...
Chapter
Chapters 3–6 present alternative approaches to each of the major corpus modeling dimensions. These approaches vary in their ability to faithfully model a corpus; some of the approaches can be more or less faithful depending upon settings such as the number of segments in a piecewise linear model. There is a clear need to devise suitable evaluation m...
Chapter
In discussion of collection emulation thus far we have focused on achieving fidelity of emulation, and shown that achieving high fidelity requires complex models with many parameters.
Chapter
In this chapter, we assess the validity of synthetic test collections, constructed using methods we have described, in IR experimentation. To what extent do the timing, resource usage, and effectiveness results obtainable using synthetic data predict those we would get with real data? We also explore the trade-off between emulation fidelity and con...
Chapter
Both in industry and in research, it is desirable that the runtime of IR algorithms should grow no faster than linearly with the size of the problem. An algorithm is said to be scalable if its running time increases linearly with size.
Chapter
In this chapter we take a look at the speed of operation of various components of the SynthaCorpus suite. We start with a somewhat off-the-wall idea, that of building a text generator into the retrieval system component (particularly the indexer) in order to save on the time and space needed to write very large corpora to disk.
Chapter
The distribution of word frequencies has a big effect on the design of efficient text retrieval systems. Schemes for reducing the size of indexes, for matching phrases, and for efficiently processing queries of all types are designed around the observation that in most real text corpora thousands of words occur only once while a few words may accou...
Chapter
In some applications, very approximate modeling of the distribution of document lengths will suffice. However, there are many scenarios in which accurate modeling is desirable. For example, significant gains in retrieval effectiveness were achieved at TREC-3 through better normalization of document length [74, 81]. Those effects could not have been...
Article
Modern web search engines use many signals to select and rank results in response to queries. However, searchers’ mental models of search are relatively unsophisticated, hindering their ability to use search engines efficiently and effectively. Annotating results with more in-depth explanations could help, but search engine providers need to know w...
Article
Full-text available
Query auto completion (QAC) is used in search interfaces to interactively offer a list of suggestions to users as they enter queries. The suggested completions are updated each time the user modifies their partial query, as they either add further keystrokes or interact directly with completions that have been offered. In this work we use a state m...
Article
Full-text available
The purpose of the Strategic Workshop in Information Retrieval in Lorne is to explore the long-range issues of the Information Retrieval field, to recognize challenges that are on, or even over, the horizon, to build consensus on some of the key challenges, and to disseminate the resulting information to the research community. The intent is that thi...
Conference Paper
When building a large inverted file index on a system with effectively unlimited memory, performance may be constrained by RAM latency. To optimise speed requires an understanding of the non-uniform memory access characteristics of modern systems. We address three main techniques for improving the performance of an in-memory, list-based inverted fi...
Conference Paper
Similarity functions assign scores to documents in response to queries. These functions require as input statistics about the terms in the queries and documents, where the intention is that the statistics are estimates of the relative informativeness of the terms. Common measures of informativeness use the number of documents containing each term (...
Patent
Full-text available
Systems, methods, and computer media for identifying query rewriting replacement terms are provided. A list of related string pairs each comprising a first string and second string is received. The first string of each related string pair is a user search query extracted from user click log data. For one or more of the related string pairs, the str...
Conference Paper
Query rewriting algorithms can be used as a form of query expansion, by combining the user's original query with automatically generated rewrites. Rewriting algorithms bring linguistic datasets to bear without the need for iterative relevance feedback, but most studies of rewriting have used proprietary datasets such as large-scale search logs. By...
Conference Paper
Full-text available
We present a new approach for personalizing Web search results to a specific user. Ranking functions for Web search engines are typically trained by machine learning algorithms using either direct human relevance judgments or indirect judgments obtained from click-through data from millions of users. The rankings are thus optimized to this generic...
Conference Paper
Full-text available
Searching for entities is an emerging task in Information Retrieval for which the goal is finding well defined entities instead of documents matching the query terms. In this paper we propose a novel approach to Entity Retrieval by using Web search engine query logs. We use Markov random walks on (1) Click Graphs – built from clickthrough data – a...
Conference Paper
Full-text available
We present an approach for answering Entity Retrieval queries using click-through information in query log data from a commercial Web search engine. We compare results using click graphs and session graphs and present an evaluation test set making use of Wikipedia "List of" pages.
Conference Paper
Full-text available
Clickthrough data has been the subject of increasing popularity as an implicit indicator of user feedback. Previous analysis has suggested that user click behaviour is subject to a quality bias: that is, users click at different rank positions when viewing effective search results than when viewing less effective search results. Based on this ob-...
Article
Query expansion is a well-known method for improving average effectiveness in information retrieval. The most effective query expansion methods rely on retrieving documents which are used as a source of expansion terms. Retrieving those documents is costly. We examine the bottlenecks of a conventional approach and investigate alternative methods ai...
Article
Full-text available
In document information retrieval, the terminology given by a user may not match the terminology of a relevant document. Query expansion seeks to address this mismatch; it can significantly increase effectiveness, but is slow and resource-intensive. We investigate the use of document expansion as an alternative, in which documents are augmented...
Conference Paper
Full-text available
The terabyte track consists of the three tasks: adhoc retrieval, efficient retrieval, and named page finding. For the adhoc retrieval task we used a language modelling approach based on query likelihood, as well as a new technique aimed at reducing the amount of memory used for ranking documents. For the efficiency task, we submitted results from b...
Conference Paper
Full-text available
Query expansion is a well-known method for improving average effectiveness in information retrieval. However, the most effective query expansion methods rely on costly retrieval and processing of feedback documents. We explore alternative methods for reducing query-evaluation costs, and propose a new method based on keeping a brief summary of each...
Thesis
Full-text available
Hundreds of millions of users each day search the web and other repositories to meet their information needs. However, queries can fail to find documents due to a mismatch in terminology. Query expansion seeks to address this problem by automatically adding terms from highly ranked documents to the query. While query expansion has been shown to be...
Article
In information retrieval, queries can fail to find documents due to mismatch in terminology. Query expansion is a well-known technique addressing this problem, where additional query terms are automatically chosen from highly ranked documents, and it has been shown to be effective at improving query performance. However, current techniques for query...
Conference Paper
Full-text available
In information retrieval, queries can fail to find documents due to mismatch in terminology. Query expansion is a well-known technique addressing this problem, where additional query terms are automatically chosen from highly ranked documents, and it has been shown to be effective at improving query performance. However, current techniques for quer...
Article
Full-text available
Hundreds of millions of users each day use web search engines to meet their information needs. Advances in web search effectiveness are therefore perhaps the most significant public outcomes of IR research. Query expansion is one such method for improving the effectiveness of ranked retrieval by adding additional terms to a query. In previous approac...
Conference Paper
Full-text available
The effectiveness of queries in information retrieval can be improved through query expansion. This technique automatically introduces additional query terms that are statistically likely to match documents on the intended topic. However, query expansion techniques rely on fixed parameters. Our investigation of the effect of varying these parameter...
