Many factors now drive a growing production of information. In modern large environments it is desirable, from the user's point of view, to have information retrieval systems (IRS) that retrieve documents according to their relevance levels. Relevance levels have been studied in some previous IR work, while a few other studies have addressed the relationship between IRS effectiveness and collection size. These latter studies applied standard IR measures to collections of increasing size to analyze how IRS effectiveness scales. In this work, we bring together these two issues (multigraded relevance and scalability) by designing new metrics that evaluate the ability of IRS to rank documents according to their relevance levels as collection size increases.
ABSTRACT: User relevance judgments are central to both the systems and user-oriented approaches to information retrieval (IR) systems research and development. User-oriented relevance research has also operated on two largely unconnected tracks: first, a relevance level track that examines users' criteria for relevance judgments; second, a regions of relevance track that examines the measurement of users' relevance judgments. Users' judgments and criteria for highly relevant items have been central issues for much of the relevance research. Findings are presented from four separate studies of relevance judgments by 55 users, each conducting their initial online search on a particular information problem. In three studies, the number of items judged “partially” relevant (on a scale of relevant, partially relevant or not relevant) was positively correlated with different aspects of change for users, including: (1) their information problem definition, (2) search intermediaries' perceptions that a user's question and information problem had changed during the mediated search interaction, (3) their personal knowledge due to the search interaction, and (4) their criteria for making relevance judgments. Users with high knowledge and topic levels were more likely to judge items as highly relevant. Differences between users' criteria for highly, partially and non-relevant items are also identified. Findings suggest the need to expand the framework for relevance research and to further identify the characteristics of the middle region of relevance, or partial relevance: (1) partially relevant items may play an important role in the early stages of a user's information seeking process over time for a particular information problem, and (2) a relationship may exist between partially relevant items retrieved and changes in users' information problems during an information seeking process.
Results also suggest that partially relevant items may be useful at the early stages of users' information seeking processes. We propose a useful concept of relevance as a relationship and an effect on the movement of a user through the iterative stages of their information seeking process. Users' relevance judgments can also be plotted on a three-dimensional spatial model of relevance level, region and time. Implications for the development of IR systems, searching practice and relevance research are also discussed.
ABSTRACT: This paper proposes evaluation methods based on the use of non-dichotomous relevance judgements in IR experiments. It is argued that evaluation methods should credit IR methods for their ability to retrieve highly relevant documents. This is desirable from the user point of view in modern large IR environments. The proposed methods are (1) a novel application of P-R curves and average precision computations based on separate recall bases for documents of different degrees of relevance, and (2) two novel measures computing the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. We then demonstrate the use of these evaluation methods in a case study on the effectiveness of query types, based on combinations of query structures and expansion, in retrieving documents of various degrees of relevance. The test was run with a best match retrieval system (InQuery) in a text database consisting of newspaper articles. The results indicate that the tested strong query structures are most effective in retrieving highly relevant documents. The differences between the query types are practically essential and statistically significant. More generally, the novel evaluation methods and the case demonstrate that non-dichotomous relevance assessments are applicable in IR experiments, may reveal interesting phenomena, and allow harder testing of IR methods.
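The abstract above describes measures that accumulate the gain a user obtains while reading down a ranked result list. The following is a minimal sketch of that idea, not the paper's exact definition: it assumes graded relevance scores on a 0 to 3 scale and a logarithmic discount of base 2 applied from rank 2 onward, which is one common formulation. The function names and the example gain vector are hypothetical.

```python
import math
from typing import List, Sequence


def cumulated_gain(gains: Sequence[float]) -> List[float]:
    """Running sum of graded relevance scores down the ranked list."""
    total = 0.0
    result = []
    for gain in gains:
        total += gain
        result.append(total)
    return result


def discounted_cumulated_gain(gains: Sequence[float], base: float = 2.0) -> List[float]:
    """Running sum where each gain is divided by log_base(rank).

    Ranks below `base` are not discounted (assumption: this avoids
    dividing by log(1) = 0 and leaves the top of the list at full value).
    """
    total = 0.0
    result = []
    for rank, gain in enumerate(gains, start=1):
        discount = math.log(rank, base) if rank >= base else 1.0
        total += gain / discount
        result.append(total)
    return result


# Hypothetical ranked list: relevance grade of the document at each rank
# (3 = highly relevant, 0 = not relevant).
example_gains = [3, 2, 3, 0, 1, 2]
cg = cumulated_gain(example_gains)
dcg = discounted_cumulated_gain(example_gains)
```

Because the discount grows with rank, a highly relevant document contributes less the further down the list it appears, so a system that places highly relevant documents early scores higher under the discounted variant than one that retrieves the same documents in a worse order.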