How many performance measures to evaluate information retrieval systems?
ABSTRACT The effectiveness of information retrieval systems is evaluated by performing searches over a collection of documents: a set of test queries is run and, for each query, the list of relevant documents is identified. This evaluation framework
also includes performance measures that make it possible to assess the impact of a change in search parameters. The program
trec_eval calculates a large number of measures, some of which, such as mean average precision or recall-precision
curves, are more widely used than others. The motivation of our work is to compare all of these measures and to help users choose a small number of them when evaluating
different information retrieval systems. In this paper, we present a study based on a large-scale data analysis of
TREC results. We investigate the relationships between the 130 measures calculated by trec_eval for individual queries, and
we show that they can be grouped into homogeneous clusters.
Keywords: Information retrieval – Performance measures – Evaluation – Statistical data analysis