S. Walker's research while affiliated with City, University of London and other places

Publications (17)

Article
Full-text available
this paper. For these comparisons we consider only the msPUMb model. We note that these results are better than those originally submitted: after submission we discovered an error in our data normalisation procedure; correcting it improved the performance of all models
Article
Full-text available
This paper explores various strategies for enhancing the reliability of pseudo-relevance feedback using TREC and NTCIR test collections. For each test request, the number of pseudo-relevant documents (P) or the number of expansion terms (T) is determined based on a similar training request (i.e. via direct mapping) or a group of similar training r...
Article
Full-text available
Introduction: summary City submitted two runs each for the automatic ad hoc, very large collection track, automatic routing and Chinese track; and took part in the interactive and filtering tracks. There were no very significant new developments; the same Okapi-style weighting as in TREC-3 and TREC-4 was used this time round, although there w...
Article
Full-text available
this paper; comparisons between passage and non-passage runs can be seen in a few of the tables.
Article
This paper reports on City University's work on the TREC-2 project from its commencement up to November 1993. It includes many results which were obtained after the August 1993 deadline for submission of official results.
Article
A brief review of the history of laboratory testing of information retrieval systems focuses on the idea of a general-purpose test collection of documents, queries and relevance judgements. The TREC programme is introduced in this context, and an overview is given of the methods used in TREC. The Okapi team’s participation in TREC is then discussed...
Article
Transaction logging has been used extensively in Okapi-related projects to allow search algorithms and user interfaces to be investigated, evaluated and compared. Logging software may be independent of the system being examined, or integrated with it. Logging is undertaken for various purposes: recovery, playback or analysis, and the methods and fo...
Article
At the heart of the Okapi system is a formula referring to some half a dozen variables, which estimate the probability that a given document is relevant to a given query. User interface design for Okapi aims to present its search capabilities as clearly and simply as possible. But between the basic formula and the simple interface lie several layer...
Article
Full-text available
City submitted two runs each for the automatic ad hoc, very large collection track, automatic routing and Chinese track; and took part in the interactive and filtering tracks. There were no very significant new developments; the same Okapi-style weighting as in TREC-3 and TREC-4 was used this time round, although there were attempts, in the ad hoc and...
Article
The Okapi system has been used in a series of experiments on the TREC collections, investigating probabilistic models, relevance feedback, and query expansion, and interaction issues. Some new probabilistic models have been developed, resulting in simple weighting functions that take account of document length and within-document and within-query t...

Citations

... Text matching is an important technique that retrieves accurate information from huge collections of resources and plays a fundamental role in many downstream natural language processing tasks, such as Document Retrieval (DR) [33], Question Answering (QA) [6], Retrieval-based Dialogue (RD) [52], Paraphrase Identification (PI) [11], and Natural Language Inference (NLI) [3]. Traditional text matching methods focus on measuring word-to-word exact matching between two texts, for instance TF-IDF [17] and BM25 [40]. The design of such methods follows the heuristics of information retrieval [12], so they can generalize to different tasks by adjusting only a small number of parameters. ...
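The exact-term matching that TF-IDF and BM25 perform can be sketched briefly. The following is a minimal illustration of the classic BM25 scoring formula, not any particular cited system's implementation; the toy corpus, query, and the default parameters k1 = 1.2 and b = 0.75 are all illustrative assumptions.

```python
import math
from collections import Counter

# Toy corpus (hypothetical), already tokenised into terms.
docs = [
    ["okapi", "bm25", "probabilistic", "weighting"],
    ["vector", "space", "model", "tf", "idf"],
    ["bm25", "ranking", "for", "document", "retrieval"],
]
n_docs = len(docs)
avg_dl = sum(len(d) for d in docs) / n_docs
# Document frequency: number of documents containing each term.
df = Counter(t for d in docs for t in set(d))

def bm25(query, doc, k1=1.2, b=0.75):
    """Classic BM25: idf-weighted, length-normalised term-frequency score."""
    tf = Counter(doc)
    dl = len(doc)
    score = 0.0
    for t in query:
        if df[t] == 0:
            continue  # term absent from the corpus contributes nothing
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avg_dl))
    return score

query = ["bm25", "retrieval"]
ranked = sorted(range(n_docs), key=lambda i: bm25(query, docs[i]), reverse=True)
print(ranked)  # doc 2 ranks first: it matches both query terms
```

Because scoring is purely term-based, documents sharing no vocabulary with the query (doc 1 here) score exactly zero, which is the limitation the snippet above attributes to exact-matching methods.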
... In information retrieval, query expansion, especially pseudo-relevance feedback (PRF), has been studied for years [3], [4], [5], [6], [7], [8], [9], and integrated in various classic retrieval models, such as vector space model [10], probabilistic model [11], relevance model [12] and mixture model [13]. ...
... It provides implementations of all twenty-five retrieval models selected for the empirical investigation proposed in this article. These include TF-IDF [44,61,63,68], LemurTF_IDF [80], DLH [3], DLH13 [3,45], DPH [3,5], BM25 [34,48,61,69], DFR_BM25 [2,4], InL2 [2,4], InB2 [44], In_expB2 [2,4], In_expC2 [2,4], IFB2 [2,4], PL2 [2,4], BB2 [2,4], DFIC [22,39], DFIZ [22,39], DFRee [44], DFReeKLIM [5,44], DirichletLM [59,81], HiemstraLM [27], LGD [19,20], ML2 [58], MDL2 [58], PL2F [46], and BM25F [79]. To make these models easier to understand, Table 4 of Appendix 1 lists all the symbols used in their mathematical representations along with their intended meaning and usage. ...
... ) for utility optimization [9], [13], and margin-based local regression for risk reduction [11]. It is beyond the scope of this paper to compare all the different ways of adapting Rocchio-style methods for AF. ...
... Sakai [2000, 2001] proposed a Flexible PRF method that determines P and T based on "sudden drops" in the initial document score curve. At NTCIR-2, Sakai et al. [2001a, 2001b] used various statistics, such as the (normalized) document scores, query-term idf and expansion-term selection values, for mapping test topics onto training topics to achieve Flexible PRF. Subsequent work revisited the approach of Sakai et al. [2000], but used the average initial document score, the average term selection value, and so on, as evidence for grouping topics, and reported that using the average initial document score for Flexible PRF has a small positive effect that is consistent across different test collections and languages. ...
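The basic pseudo-relevance feedback loop that P and T parameterise can be sketched as follows. This is a minimal illustration with fixed P and T (so not the "Flexible" variant), and it ranks expansion terms by plain frequency only for simplicity; real systems use proper term-selection values, and the corpus and query here are hypothetical.

```python
from collections import Counter

def expand_query(query, ranked_docs, P=2, T=3):
    """Minimal pseudo-relevance feedback: treat the top-P ranked documents
    as relevant and append the T most frequent new terms to the query.
    (Plain term frequency stands in for a real term-selection value.)"""
    pseudo_relevant = ranked_docs[:P]
    counts = Counter(t for d in pseudo_relevant for t in d if t not in query)
    expansion_terms = [t for t, _ in counts.most_common(T)]
    return list(query) + expansion_terms

# Hypothetical initial retrieval result, best document first.
ranked_docs = [
    ["okapi", "bm25", "weighting", "probabilistic"],
    ["bm25", "ranking", "probabilistic", "model"],
    ["vector", "space", "model"],
]
print(expand_query(["bm25"], ranked_docs))
```

The expanded query would then be re-run against the collection; choosing P and T per topic, as in the Flexible PRF work above, replaces the fixed defaults used here.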
... Interactive Information Retrieval algorithms for term selection for query expansion. Prior to TREC, the Okapi team had worked only with operational systems or with small-scale, partially controlled experiments on real collections (Robertson, Walker & Beaulieu, 1997, p. 23). As such, TREC represented a change of test culture and environment from the Okapi test tradition. ...
... The more important a candidate term is judged to be with respect to the analysed document, the more relevant it is as a key term. Okapi (or BM25) (Robertson et al., 1999) is an alternative measure to TF-IDF. In Information Retrieval (IR), it is used more widely than TF-IDF. ...
... The statistics of this data set are shown in Table 5. In the IP@CLEF-2009 competition, the best systems employed retrieval models (Kullback-Leibler [33], Okapi [34]) as well as regression models [35, 36, 37]. Most of the participating systems obtained poor results despite the high complexity of the task [3]. ...
... We consider the dialog history and the initial response as a query to retrieve relevant knowledge instances from the corpus. Next, we identify the top relevant instances in the given corpus with respect to the constructed query using cosine similarity on TF-IDF-based representations (Robertson et al., 1995). ...
... A decade ago, a TREC task called "Filtering" [11] had the following definition: finding documents relevant to a query in a stream of data. Several effective approaches were inspired by information retrieval techniques for scoring documents (Okapi [12], Rocchio [13], ...), with a learned threshold used to filter out non-relevant documents [15]. The most successful approaches rely on machine learning, with extensive use of SVMs with words as features [2]. ...