This is RMIT's first year of participation in the TDT evaluation. Our system uses a linear classifier to track topics and an approach based on our previous work in document routing. We aimed this year to develop a baseline system, and then to test selected variations, including adaptive tracking. Our contribution this year is to have implemented an efficient system, that is, one that maximises tracking document throughput and minimises system overheads.
The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.
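The stepwise idea in the abstract above can be illustrated with a minimal Python sketch. It is not the original BCPL program and covers only a handful of rules: the `STEPS` table is a small illustrative subset, and the names `is_consonant`, `measure`, `apply_step`, and `stem` are hypothetical helpers chosen here. The sketch assumes the usual definition of the measure as the number of vowel-consonant sequences in the stem, and that within a step only the first (longest) matching suffix is considered.

```python
VOWELS = set("aeiou")


def is_consonant(word, i):
    """Return True if word[i] is a consonant; 'y' after a consonant counts as a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-consonant sequences, a rough proxy for syllable length."""
    pattern = "".join(
        "c" if is_consonant(stem, i) else "v" for i in range(len(stem))
    )
    return pattern.count("vc")


# Each rule is (suffix, replacement, measure the remaining stem must exceed).
# A threshold of -1 means the rule is unconditional.
# Illustrative subset only; a full stemmer has many more rules and steps.
STEPS = [
    [("sses", "ss", -1), ("ies", "i", -1), ("ss", "ss", -1), ("s", "", -1)],
    [("ization", "ize", 0), ("ational", "ate", 0)],
    [("alize", "al", 0), ("icate", "ic", 0)],
    [("al", "", 1), ("ate", "", 1), ("ize", "", 1)],
]


def apply_step(word, rules):
    """Apply the first matching rule of one step, if the stem condition holds."""
    for suffix, replacement, min_measure in rules:
        if word.endswith(suffix):
            stem_part = word[: len(word) - len(suffix)]
            if measure(stem_part) > min_measure:
                return stem_part + replacement
            return word  # suffix matched but the stem is too short
    return word


def stem(word):
    """Strip a complex suffix by removing simple suffixes one step at a time."""
    for rules in STEPS:
        word = apply_step(word, rules)
    return word


if __name__ == "__main__":
    for w in ("generalizations", "relational", "caresses"):
        print(w, "->", stem(w))
```

With these rules, "generalizations" is reduced in four passes (generalization, generalize, general, gener), which shows how a compound suffix is treated as a chain of simple ones. Because the rule set is incomplete, outputs for other words will generally differ from those of a full implementation.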
This paper reports on City University's work on the TREC-2 project from its commencement up to November 1993. It includes many results which were obtained after the August 1993 deadline for submission of official results.
D.D. Lewis, R.E. Schapire, J.P. Callan, and R. Papka. Training algorithms for linear text classifiers. In Hans-Peter Frei,
Donna Harman, Peter Schäuble, and Ross Wilkinson, editors,
Proc. ACM-SIGIR International Conference on Research and
Development in Information Retrieval, pages 298-306, Zurich,