How Am I Doing?
Evaluating Conversational Search Systems Offline
Aldo Lipani, Ben Carterette, Emine Yilmaz
ACM Transactions on Information Systems
Volume 39, Issue 4, October 2021
SCAI - October 8th, 2021
Build test collections for conversational search systems.
A Framework for Offline Evaluation
• A methodology for building test collections with relevance judgments
• An evaluation measure based on a user interaction model
• An approach to collecting user interaction data to train the model
Test Collection-Based Evaluation of Conversational Search
Can we simulate this?
Subtopic-Based Evaluation of Conversational Search
Conversational Search Simulation Model
Components of a Simulation-Based Evaluation
Conversational search system:
• Takes a question/query, returns an answer in the form of a sentence or paragraph
• User queries that model subtopics
• Transition probabilities between subtopics
• Corpus of “answers”
• Relevance judgments of answers to subtopics
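The components listed above can be sketched as a small data structure plus a simulation loop. This is a minimal illustration under stated assumptions, not the implementation from the paper: the names SimCollection, simulate_conversation, START, and END are hypothetical, the system is modelled as any callable from a query string to an answer id, and relevance is assumed to be binary.

```python
import random
from dataclasses import dataclass

# Hypothetical sentinel states marking the start and end of a conversation.
START, END = "<start>", "<end>"


@dataclass
class SimCollection:
    queries: dict      # subtopic -> list of user query strings
    transitions: dict  # subtopic (or START) -> {next subtopic or END: probability}
    answers: dict      # answer id -> answer text (the corpus)
    qrels: dict        # (subtopic, answer id) -> relevance, assumed 0 or 1


def simulate_conversation(collection, system, rng, max_turns=20):
    """Walk the subtopic chain, asking `system` one query per turn.

    Returns a list of (subtopic, query, answer_id, relevance) tuples.
    """
    turns = []
    state = START
    for _ in range(max_turns):
        options = collection.transitions[state]
        # Sample the next subtopic from the transition probabilities.
        state = rng.choices(list(options), weights=list(options.values()))[0]
        if state == END:
            break
        query = rng.choice(collection.queries[state])
        answer_id = system(query)
        # Unjudged (subtopic, answer) pairs are treated as non-relevant.
        rel = collection.qrels.get((state, answer_id), 0)
        turns.append((state, query, answer_id, rel))
    return turns
```

Passing a seeded random.Random instance as rng keeps simulated conversations reproducible across systems being compared.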
Two variants for line 16, the subtopic sampling step:
RI (relevance independent). The subtopic sampled is independent of answer relevance
RD (relevance dependent). The subtopic sampled is conditioned on the answer relevance
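The difference between the two variants can be sketched as a single sampling function. This is an illustrative sketch only: the function name next_subtopic and the table layouts are assumptions. For RI the table is keyed by the current subtopic; for RD it is keyed by the pair (current subtopic, whether the last answer was relevant).

```python
import random


def next_subtopic(current, relevant, transitions, rng, variant="RI"):
    """Sample the next subtopic under the RI or RD variant.

    RI: transitions[current]             -> {next subtopic: probability}
    RD: transitions[(current, relevant)] -> {next subtopic: probability}
    """
    if variant == "RI":
        dist = transitions[current]              # ignores answer relevance
    elif variant == "RD":
        dist = transitions[(current, relevant)]  # conditioned on answer relevance
    else:
        raise ValueError(f"unknown variant: {variant}")
    return rng.choices(list(dist), weights=list(dist.values()))[0]
```

An RD table can, for example, encode a user who stays on the same subtopic after a non-relevant answer and moves on after a relevant one; that behaviour is an illustrative assumption, not a result from the talk.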
Comparison of ECS to Precision and RBP
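For reference, the two baseline measures can be sketched as follows, using their standard definitions. Treating each conversation turn as one rank position is an assumption made here for illustration, and ECS itself is not reproduced because its definition is not given in these slides. The persistence value p=0.8 is an arbitrary default.

```python
def precision(rels):
    """Fraction of turns whose answer was judged relevant."""
    return sum(rels) / len(rels) if rels else 0.0


def rbp(rels, p=0.8):
    """Rank-biased precision: (1 - p) * sum_i rel_i * p**(i - 1), i from 1.

    `p` is the user's persistence, i.e. the probability of continuing
    to the next position.
    """
    return (1 - p) * sum(r * p**i for i, r in enumerate(rels))
```

Here rels is the list of per-turn relevance values in conversation order, for example [1, 0, 1].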
• Evaluating conversations offline with test collections is hard
• Used insights from diversity and novelty, session, and task evaluation to design an evaluation framework based on simulation
• Provided results for two variants: based on subtopics alone, and based on subtopics and relevance
• Crowdsourced data for test collection queries and transition probabilities
• Evaluating conversations offline with test collections is not so hard anymore!