Stephen E. Robertson

University College London | UCL · Department of Computer Science

Doctor of Philosophy

258
Publications
124,438
Reads
19,406
Citations
Introduction
Stephen E. Robertson is currently retired, but has a visiting position at the Department of Computer Science, University College London. Stephen's research area is information retrieval. His most recent publication is 'A Brief History of Search Results Ranking'.

Publications (258)
Book
Full-text available
The idea that the digital age has revolutionized our day-to-day experience of the world is nothing new, and has been amply recognized by cultural historians. In contrast, Stephen Robertson’s BC: Before Computers is a work which questions the idea that the mid-twentieth century saw a single moment of rupture. It is about all the things that we had t...
Article
The theory and practice of search results ranking, as currently offered by most web search engines, is older than one might think. The first proposal for a system of ranking was in a JACM paper in 1960. Through the remainder of the twentieth century, extensive research was done on ranking systems - on devising methods of ranking, on the use of lear...
Conference Paper
To me, an awareness of history is a fundamental requirement for progress; and I believe that we in the field of information retrieval are currently ill-served in this domain, or at least not as aware as we should be. While it is true that a researcher in IR is expected to acquire some knowledge of what has gone before, this knowledge is typically f...
Article
Full-text available
Article
Stemming is a widely used technique in information retrieval systems to address the vocabulary mismatch problem arising out of morphological phenomena. The major shortcoming of the commonly used stemmers is that they accept the morphological variants of the query words without considering their thematic coherence with the given query, which leads t...
Conference Paper
Full-text available
Score-distribution models are used for various practical purposes in search, for example for results merging and threshold setting. In this paper, the basic ideas of the score-distributional approach to viewing and analysing the effectiveness of search systems are re-examined. All recent score-distribution modelling work depends on the availability...
Article
The possibility of using fewer topics in TREC, and in TREC-like initiatives, has been studied recently, with encouraging results: even when considerably reducing the number of topics (for example, using a topic subset of cardinality only 10, in place of the usual 50) it is possible, at least potentially, to obtain similar results when evaluating...
Article
Full-text available
A solid research path towards new information retrieval models is to further develop the theory behind existing models. A profound understanding of these models is therefore essential. In this paper, we revisit probability ranking principle (PRP)-based models, probability of relevance (PR) models, and language models, finding conceptual differences...
Conference Paper
Increasingly, web recommender systems face scenarios where they need to serve suggestions to groups of users; for example, when families share e-commerce or movie rental web accounts. Research to date in this domain has proposed two approaches: computing recommendations for the group by merging any members' ratings into a single profile, or computi...
Article
Full-text available
Lab-based evaluations typically assess the quality of a retrieval system with respect to its ability to retrieve documents that are relevant to the information need of an end user. In a real-time search task, however, users wish to retrieve not only the most relevant items but also the most recent. The current evaluation framework is not adequate...
Article
Full-text available
We explore the notion, put forward by Cormack & Lynam and Robertson, that we should consider a document collection used for Cranfield-style experiments as a sample from some larger population of documents. In this view, any per-topic metric (such as average precision) should be regarded as an estimate of that metric's true value for that topic in t...
Article
Full-text available
We propose an approach to the retrieval of entities that have a specific relationship with the entity given in a query. Our research goal is to investigate whether the related entity finding problem can be addressed by combining a measure of relatedness of candidate answer entities to the query, and the likelihood that the candidate answer entity belongs t...
Article
Full-text available
In this work, we propose a theory for information matching. It is motivated by the observation that retrieval is about the relevance matching between two sets of properties (features), namely, the information need representation and information item representation. However, many probabilistic retrieval models rely on fixing one representation and o...
Conference Paper
On the basis of a theoretical analysis of issues around populations and sampling, for both topics and documents, and parameters with which we hope to characterise the effectiveness of different systems, we propose a modification to the traditional average precision metric. This modification involves both transformation and (in the estimation of the...
Chapter
The web search engine has come to occupy a central position in our information-seeking habits as citizens. This chapter explores the genesis of the idea of a search engine, and how this mechanism has developed in the web context. Search engines have adapted to the world of the web (in particular, to users and to the uses to which they have been put...
Article
Full-text available
In this paper, an Eliteness Hypothesis for information retrieval is proposed, where we define two generative processes to create information items and queries. By assuming the deterministic relationships between the eliteness of terms and relevance, we obtain a new theoretical retrieval framework. The resulting ranking function is a unified one as...
Conference Paper
We consider the selection of good subsets of topics for system evaluation. It has previously been suggested that some individual topics and some subsets of topics are better for system evaluation than others: given limited resources, choosing the best subset of topics may give significantly better prediction of overall system effectiveness than (fo...
Article
Full-text available
We review the history of modeling score distributions, focusing on the mixture of normal-exponential by investigating the theoretical as well as the empirical evidence supporting its use. We discuss previously suggested conditions which valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two ne...
Article
Full-text available
We first present in this paper an analytical view of heuristic retrieval constraints which yields simple tests to determine whether a retrieval function satisfies the constraints or not. We then review empirical findings on word frequency distributions ...
Article
Full-text available
Traditional information retrieval research has mostly focussed on satisfying clearly specified information needs. However, in reality, queries are often ambiguous and/or underspecified. In light of this, evaluating search result diversity is beginning to receive attention. We propose simple evaluation metrics for diversified Web search results. O...
Article
Most current machine learning methods for building search engines are based on the assumption that there is a target evaluation metric that evaluates the quality of the search engine with respect to an end user and the engine should be trained to optimize for that metric. Treating the target evaluation metric as a given, many different approaches (...
Conference Paper
Full-text available
Evaluation metrics play a critical role both in the context of comparative evaluation of the performance of retrieval systems and in the context of learning-to-rank (LTR) as objective functions to be optimized. Many different evaluation metrics have been proposed in the IR literature, with average precision (AP) being the dominant one due to a number...
Conference Paper
Most information retrieval evaluation metrics are designed to measure the satisfaction of the user given the results returned by a search engine. In order to evaluate user satisfaction, most of these metrics have underlying user models, which aim at modeling how users interact with search engine results. Hence, the quality of an evaluation metric i...
Article
Full-text available
We consider the issue of evaluating information retrieval systems on the basis of a limited number of topics. In contrast to statistically-based work on sample sizes, we hypothesize that some topics or topic sets are better than others at predicting true system effectiveness, and that with the right choice of topics, accurate predictions can be obt...
Conference Paper
Full-text available
We review the history of modeling score distributions, focusing on the mixture of normal-exponential by investigating the theoretical as well as the empirical evidence supporting its use. We discuss previously suggested conditions which valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two ne...
Article
Full-text available
The LETOR datasets consist of data extracted from traditional IR test corpora. For each of a number of test topics, a set of documents has been extracted, in the form of features of each document-query pair, for use by a ranker. An examination of the ways in which documents were selected for each topic shows that the selection has (for each o...
Article
Full-text available
Although Average Precision (AP) has been the most widely-used retrieval effectiveness metric since the advent of the Text Retrieval Conference (TREC), the general belief among researchers is that it lacks a user model. In light of this, Robertson recently pointed out that AP can be interpreted as a special case of Normalised Cumulative Precision (NCP...
Article
Full-text available
The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970s and 1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especi...
Conference Paper
Full-text available
We took part in the Web and Relevance Feedback tracks, using the ClueWeb09 corpus. To process the corpus, we developed a parallel processing pipeline which avoids the generation of an inverted file. We describe the components of the parallel architecture and the pipeline and how we ran the TREC experiments, and we present effectiveness results.
Conference Paper
Full-text available
Ranked retrieval has a particular disadvantage in comparison with traditional Boolean retrieval: there is no clear cut-off point at which to stop consulting results. This is a serious problem in some setups. We investigate and further develop methods to select the rank cut-off value which optimizes a given effectiveness measure. Assuming no other inp...
Conference Paper
Much research in learning to rank has focused on developing sophisticated learning methods, treating the training set as a given. However, the number of judgments in the training set directly affects the quality of the learned system. Given the expense of obtaining relevance judgments for constructing training data, one often has a limited bud...
Conference Paper
Full-text available
The ESP Game was designed to harvest human intelligence to assign labels to images - a task which is still difficult for even the most advanced systems in image processing. However, the ESP Game as it is currently implemented encourages players to assign "obvious" labels, which can be easily predicted given previously assigned labels. We present a...
Conference Paper
This book constitutes the refereed proceedings of the Second International Conference on the Theory of Information Retrieval, ICTIR 2009, held in Cambridge, UK, in September 2009. The 18 revised full papers, 14 short papers, and 11 posters presented together with one invited talk were carefully reviewed and selected from 82 submissions. The papers...
Article
Full-text available
Themes of the talk: search as a science; the role of experiment and other empirical data gathering in IR; the (partial) standoff between the Cranfield tradition and user-oriented work; the role of theory in IR (the relation of theories and models to empirical data); abstraction. (Evaluation workshop, SIGIR 09, Boston, July 2009.) A caricature On...
Article
Full-text available
Collaborative filtering is concerned with making recommendations about items to users. Most formulations of the problem are specifically designed for predicting user ratings, assuming past data of explicit user ratings is available. However, in practice we may only have implicit evidence of user preference; and furthermore, a better view of the tas...
Article
Full-text available
Relevance Feedback has been one of the successes of information retrieval research for the past 30 years. It has been proven to be worthwhile in a wide variety of settings, both when actual user feedback is available, and when the user feedback is implicit. However, while the applications of relevance feedback and type of user input to relevance fe...
Article
Full-text available
This paper is a personal take on the history of evaluation experiments in information retrieval. It describes some of the early experiments that were formative in our understanding, and goes on to discuss the current dominance of TREC (the Text REtrieval Conference) and to assess its impact.
Conference Paper
Full-text available
We present the results of experiments using terms from citations for scientific literature search. To index a given document, we use terms used by citing documents to describe that document, in combination with terms from the document itself. We find that the combination of terms gives better retrieval performance than standard indexing of the docu...
Article
This article presents a bilingual ontology-based dialog system with multiple services. An ontology-alignment algorithm is proposed to integrate ontologies of different languages for cross-language applications. A domain-specific ontology is further ...
Conference Paper
Full-text available
Query expansion by word alterations (alternative forms of a word) is often used in Web search to replace word stemming. This allows users to specify particular word forms in a query. However, if many alterations are added, query traffic will be greatly increased. In this paper, we propose methods to select only a few useful word alterations for q...
Conference Paper
Full-text available
In the field of information retrieval, one is often faced with the problem of computing the correlation between two ranked lists. The most commonly used statistic that quantifies this correlation is Kendall's τ. Often, in the information retrieval community, discrepancies among those items having high rankings are more important than those am...
Conference Paper
Full-text available
Pseudo-relevance feedback assumes that the most frequent terms in the pseudo-feedback documents are useful for retrieval. In this study, we re-examine this assumption and show that it does not hold in reality - many expansion terms identified in traditional approaches are indeed unrelated to the query and harmful to retrieval. We also show that...
Conference Paper
We consider the question of whether Average Precision, as a measure of retrieval effectiveness, can be regarded as deriving from a model of user searching behaviour. It turns out that indeed it can be so regarded, under a very simple stochastic model of user behaviour.
Conference Paper
Full-text available
We address the problem of learning large complex ranking functions. Most IR applications use evaluation metrics that depend only upon the ranks of documents. However, most ranking functions generate document scores, which are sorted to produce a ranking. Hence IR metrics are innately non-smooth with respect to the scores, due to the sort. Unfor...
Conference Paper
The Cranfield projects began in 1958 -- fifty years ago. They have of course been extraordinarily influential, forming a view of information retrieval as an experimental science, which in some fashion persists to this day. Although the Cranfield tradition has had its ups and downs - the main down being in the late eighties, when it showed signs of...
Conference Paper
In previous work, we have shown that using terms from around citations in citing papers to index the cited paper, in addition to the cited paper's own terms, can improve retrieval effectiveness. Now, we investigate how to select text from around the citations in order to extract good index terms. We compare the retrieval effectiveness that results...
Article
Full-text available
In this work, we analyze the popular KL-divergence ranking function in information retrieval. We uncover the generative distribution, namely the Smoothed Dirichlet distribution, underlying this ranking function and show that this distribution captures term occurrence distribution much better than the multinomial, thus offering, for the first...
Conference Paper
Full-text available
This paper describes the official measures of retrieval effectiveness that are planned to be employed for the ad hoc track of INEX 2007.
Conference Paper
This paper describes the official measures of retrieval effectiveness that are employed for the Ad Hoc Track at INEX 2007. Whereas in earlier years all, but only, XML elements could be retrieved, the result format has been liberalized to arbitrary passages. In response, the INEX 2007 measures are based on the amount of highlighted text retrieved, l...
Article
Full-text available
Retrieval system experimentation has assumed that user requests represent a single information need. The problem is identifying and meeting this need. Search engine experience demonstrates that this assumption is far from holding in the real world. Responding appropriately to this fact raises new issues for research on retrieval system theory, desi...
Article
Retrieval system experimentation has assumed that user requests represent a single information need. The problem is identifying and meeting this need. Search engine experience demonstrates that this assumption is far from holding in the real world. Responding appropriately to this fact raises new issues for research on retrieval system theory, desi...
Article
This paper describes research that aims to define the information needs of mobile individuals, to implement a mobile information system that can satisfy those needs, and finally to evaluate the performance of that system with end-users. First a review ...
Article
Full-text available
Purpose – An issue that tends to be ignored in information retrieval is the issue of updating inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. This paper aims to study the effect of using parallel methods for the update of inverted f...
Article
Full-text available
Many current retrieval models and scoring functions contain free parameters which need to be set - ideally, optimized. The process of optimization normally involves some training corpus of the usual document-query-relevance judgement type, and some choice of measure that is to be optimized. The paper proposes a way to think about the process of...
Article
Full-text available
We investigate the effect of different sources of relevant documents in the creation of a test collection in the scientific domain. Based on the Cranfield 2 design, paper authors are asked to judge their cited papers for relevance in the first stage. In a second stage, documents outside the reference list are judged. In this paper, we use the test...
Conference Paper
Full-text available
We discuss the idea of modelling the statistical distributions of scores of documents, classified as relevant or non-relevant. Various specific combinations of standard statistical distributions have been used for this purpose. Some theoretical considerations indicate problems with some of the choices of pairs of distributions. Specifically, we rev...
Article
Full-text available
In early 2006, as a result of a series of conversations between Steve Robertson, Mark Sanderson and Karen Spärck-Jones, Karen circulated a note summing up our discussions, which were on the topic of ambiguous requests. At the core of our discussion was the question: is too much information retrieval research focussed on search tasks where the query...
Conference Paper
Full-text available
We propose a novel method of analysing data gathered from TREC or similar information retrieval evaluation experiments. We define two normalized versions of average precision, that we use to construct a weighted bipartite graph of TREC systems and topics. We analyze the meaning of well-known (and somewhat generalized) indicators from social n...
Conference Paper
Full-text available
The experimental evaluation of information retrieval systems has a venerable history. Long before the current notion of a search engine, in fact before search by computer was even feasible, people in the library and information science community were beginning to tackle the evaluation issue. Sometimes it feels as though evaluation methodology has b...
Article
Full-text available
Work on the statistical validity of experimental results in retrieval tests has concentrated on treating the topics as a sample from a population, but regarding the collection of documents as fixed. This paper raises the argument that we should also consider the documents as having been sampled from a population. It follows that we should regard a...
Conference Paper
Full-text available
In this paper, we describe the Centre for Interactive Systems Research’s participation in the INEX 2006 adhoc track. Rather than using a field-weighted BM25 model, as in INEX 2005, we revert to the traditional BM25 weighting function. Our main research aims this year are to investigate the effects of document filtering (by considering onl...
Article
Lexical cohesion is a property of text, achieved through lexical-semantic relations between words in text. Most information retrieval systems make use of lexical relations in text only to a limited extent. In this paper we empirically investigate whether the degree of lexical cohesion between the contexts of query terms’ occurrences in a document i...
Article
Full-text available
We consider the question of how information from the textual context of citations in scientific papers could improve indexing of the cited papers. We first present examples which show that the context should in principle provide better and new index terms. We then discuss linguistic phenomena around citations and which type of processing would...
Conference Paper
Full-text available
We consider the retrieval of XML-structured documents, and of passages from such documents, defined as elements of the XML structure. These are considered from the point of view of passage retrieval, as a form of document retrieval. A retrievable unit (an element chosen as defining suitable passages for retrieval) is a textual document in its own r...
Chapter
Full-text available
The two previous probabilistic models of information retrieval, which seemed to be in some sense incompatible, can now be regarded as two complementary parts of a unified model. The new Model 3, which is derived in the framework of the unified model from a combination of Models 1 and 2, makes use of relevance feedback information from the individua...
Conference Paper
Full-text available
We present an approach to building a test collection of research papers. The approach is based on the Cranfield 2 tests but uses as its vehicle a current conference; research questions and relevance judgements of all cited papers are elicited from conference authors. The resultant test collection is different from TREC's in that it comprises scie...
Conference Paper
Full-text available
This is the first year in which the City University Centre for Interactive Systems Research (CISR) has participated in the Expert Search Task. In this paper, we describe an expert search experiment based on window-based techniques, that is, we build a profile for each expert by using information around the expert's name and email address in the docum...
Conference Paper
This paper, based on a talk, presents an overview of evaluation experiments in information retrieval, and also of statistical approaches to search. A strong connection exists between them: the notion that the objective of search can be expressed in terms of the measures used for evaluation informs the statistical theory in several ways. The latest...