How Am I Doing?
Evaluating Conversational
Search Systems Offline
Aldo Lipani, Ben Carterette, Emine Yilmaz
ACM Transactions on Information Systems
Volume 39 Issue 4 October 2021
SCAI - October 8th, 2021
Build test collections for conversational search systems.
A Framework for Offline Evaluation
• A methodology for building test collections with relevance judgments
• An evaluation measure based on a user interaction model
• An approach to collecting user interaction data to train the model
Test Collection-Based Evaluation of Conversational Search
(Figure: example conversations with System A and System B.)
Can we simulate this?
Subtopic-Based Evaluation of Conversational Search
Conversational Search Simulation Model
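The simulation model can be sketched as a simple user loop: the simulated user issues a query for the current subtopic, receives an answer, and either stops or moves to the next subtopic according to the transition probabilities. The function name, stopping rule, and parameters below are illustrative assumptions, not the paper's exact formulation:

```python
import random

def simulate_conversation(system, topic, max_turns=10, stop_prob=0.2, seed=0):
    """Sketch of a simulated conversation with a system.

    `topic` is assumed to hold a starting subtopic, queries per subtopic,
    and a transition matrix between subtopics.
    """
    rng = random.Random(seed)
    subtopic = topic["start"]
    turns = []
    for _ in range(max_turns):
        # pick a user query that models the current subtopic
        query = rng.choice(topic["queries"][subtopic])
        answer = system(query)
        turns.append((subtopic, query, answer))
        if rng.random() < stop_prob:  # user abandons the conversation
            break
        # sample the next subtopic from the transition probabilities
        nxt, probs = zip(*topic["transitions"][subtopic].items())
        subtopic = rng.choices(nxt, weights=probs, k=1)[0]
    return turns
```

Any system exposing a query-in, answer-out interface can be plugged in, which is what makes the evaluation offline and repeatable.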
Components of a Simulation-Based Evaluation
Conversational search system:
• Takes a question/query and returns an answer in the form of a sentence/paragraph
Test collection:
• Topics/tasks/information needs
• Subtopics/aspects/facets/subtasks/entities
• User queries that model subtopics
• Transition probabilities between subtopics
• Corpus of “answers”
• Relevance judgments of answers to subtopics
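The components above suggest a minimal container for such a test collection. This is a sketch; the class and field names are illustrative assumptions, not the paper's released schema:

```python
from dataclasses import dataclass

@dataclass
class Topic:
    topic_id: str
    subtopics: list   # subtopic identifiers for this topic
    queries: dict     # subtopic -> list of user queries modelling it
    transitions: dict # subtopic -> {next_subtopic: probability}

@dataclass
class TestCollection:
    topics: list      # Topic objects
    answers: dict     # answer_id -> answer text (the "corpus of answers")
    qrels: dict       # (topic_id, subtopic, answer_id) -> relevance grade
```

Keeping transition probabilities per subtopic, rather than per query, matches the component list above: queries are only samples of how a user might express a subtopic.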
Evaluation Metric (ECS)
Two variants, differing in how the next subtopic is sampled at line 16 of the simulation:
• RI: the next subtopic is sampled independently of the answer's relevance
• RD: the next subtopic sampled is conditioned on the answer's relevance
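A minimal sketch of the two sampling variants. How RD conditions on relevance is an assumption here (a non-relevant answer makes the user re-ask about the same subtopic); the paper's exact conditioning may differ:

```python
import random

def next_subtopic(current, transitions, relevant, variant="RI", rng=None):
    """Sample the next subtopic (line 16 of the simulation).

    RI: transition regardless of the answer's relevance.
    RD: assumed behaviour — stay on the current subtopic if the
        answer was not relevant, otherwise transition as in RI.
    """
    rng = rng or random.Random(0)
    if variant == "RD" and not relevant:
        return current  # unsatisfied user re-asks about the same subtopic
    nxt, probs = zip(*transitions[current].items())
    return rng.choices(nxt, weights=probs, k=1)[0]
```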
Comparison of ECS to Precision and RBP
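For reference, the two baseline measures can be computed from a per-turn binary relevance vector as follows. The persistence value `p=0.8` is a common choice for RBP, not a value taken from the paper:

```python
def precision(rels):
    # fraction of turns whose answer was relevant
    return sum(rels) / len(rels)

def rbp(rels, p=0.8):
    # Rank-biased precision: expected utility of a user who moves on
    # to the next answer with persistence probability p
    return (1 - p) * sum(r * p**i for i, r in enumerate(rels))
```

Unlike these rank-based measures, ECS weights turns by the simulated user's subtopic trajectory, which is why the comparison is informative.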
Summary
• Evaluating conversations offline with test collections is hard
• Used insights from diversity & novelty, sessions, and tasks to design an evaluation framework based on simulation
• Provided two metric variants: one based on subtopics alone (RI), and one based on subtopics and relevance (RD)
• Crowdsourced data for test collection queries and transition probabilities
• Evaluating conversations offline with test collections is not so hard anymore!