Vaughan R. Shanks's research while affiliated with RMIT University and other places

Publications (5)

Conference Paper
Automatic categorisation is an important technique for the management of large document collections. Categorisation can be used to store or locate documents that satisfy an information need when the need cannot be expressed as a concise list of query terms. Inverted indexes are used in all query-based retrieval systems to allow efficient query proc...
Conference Paper
Categorisation is a useful method for organising documents into subcollections that can be browsed or searched to more accurately and quickly meet information needs. On the Web, category-based portals such as Yahoo! and DMOZ are extremely popular: DMOZ is maintained by over 56,000 volunteers, is used as the basis of the popular Google directory, an...
Article
This is RMIT's first year of participation in the TDT evaluation. Our system uses a linear classifier to track topics and an approach based on our previous work in document routing. We aimed this year to develop a baseline system, and to then test selected variations, including adaptive tracking. Our contribution this year to have implement...

Citations

... In addition to the inverted index, we store statistical information about the terms, such as document frequency and inverse document frequency [51]. Figure 3.6 shows a table of this information with and without lemmatisation, decomposition, and stop-word removal. ...
... Sparse inference. Earlier research has applied inverted indices for reducing the classification times for K-nearest Neighbours [Yang, 1994] and Centroid [Shanks et al., 2003]. The same reductions are gained for computing posterior probabilities for linearly interpolated language models in information retrieval [Hiemstra, 1998, Zhai and Lafferty, 2001b]. ...
... The classification was actually conducted on the summaries of the web text documents, which are organized in a word-based approach. Shanks and Williams used only the first fragment of each document for their classification task [19]. However, this approach only works well for documents which present an overview of the whole document at the beginning. ...
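
The citing passages above mention inverted indices being used to speed up centroid classification. As a rough illustration of that general idea only (not the method of Shanks et al., whose details are not given here), the sketch below inverts a set of per-category centroid vectors so that scoring a document touches only the postings for terms the document contains. The function names, the toy centroids, and the dot-product scoring are all assumptions made for this example.

```python
from collections import defaultdict


def build_centroid_index(centroids):
    """Invert {category: {term: weight}} into {term: [(category, weight), ...]}.

    Hypothetical helper: the layout is an assumption for illustration.
    """
    index = defaultdict(list)
    for category, weights in centroids.items():
        for term, weight in weights.items():
            if weight != 0.0:
                index[term].append((category, weight))
    return index


def classify(doc_terms, index):
    """Rank categories by dot product with the document vector.

    Only postings for terms present in the document are visited,
    so categories sharing no terms with the document are never scored.
    """
    scores = defaultdict(float)
    for term, doc_weight in doc_terms.items():
        for category, weight in index.get(term, ()):
            scores[category] += doc_weight * weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented for the example).
centroids = {
    "sport": {"goal": 0.8, "match": 0.6},
    "finance": {"bank": 0.9, "match": 0.1},
}
index = build_centroid_index(centroids)
ranking = classify({"match": 1.0, "goal": 1.0}, index)
print(ranking[0][0])  # highest-scoring category
```

With many categories and sparse centroids, the work per document is proportional to the total length of the postings lists visited rather than to the number of categories times the vocabulary size.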