Using variable length ngrams for retrieving technical abstracts in Japanese (poster session).

DOI: 10.1145/355214.355250
Conference: Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages, Hong Kong, China, September 30 - October 1, 2000
Previous studies have reported that bigrams work well for many Asian languages, including Chinese, Korean, and Japanese. Most of these studies have focused on newspaper text. We report an experiment on a very different genre, technical abstracts, and find that performance can be improved by combining short and long ngrams. Working with ngrams of all lengths is a sound approach because they carry more information than bigrams alone.
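
To illustrate the difference between bigram-only indexing and variable-length ngrams, the sketch below extracts character ngrams from a short Japanese string. The function name `char_ngrams`, its parameters, and the sample term are assumptions made for illustration; the abstract does not describe how the ngrams were actually indexed, weighted, or combined in the experiment.

```python
def char_ngrams(text, min_n=1, max_n=4):
    """Return every character ngram of `text` with min_n <= n <= max_n.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the string.
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


term = "情報検索"  # "information retrieval", four characters

# Bigrams only: the baseline reported to work well in earlier studies.
bigrams = char_ngrams(term, 2, 2)

# All lengths from 1 to 4: short and long ngrams combined.
all_grams = char_ngrams(term, 1, 4)
```

For a four-character term, the bigram list holds three overlapping pairs, while the variable-length list also holds the single characters, the trigrams, and the full term, so a retrieval system could match both short fragments and longer technical terms.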

