SCAI 2021 Slides - How Am I Doing? Evaluating Conversational Search Systems Offline

How Am I Doing? Evaluating Conversational Search Systems Offline
Aldo Lipani, Ben Carterette, Emine Yilmaz
ACM Transactions on Information Systems
Volume 39 Issue 4 October 2021
SCAI - October 8th, 2021
Build test collections for conversational search systems.
A Framework for Offline Evaluation
A methodology for building test collections with relevance judgments
An evaluation measure based on a user interaction model
An approach to collecting user interaction data to train the model
Test Collection-Based Evaluation of Conversational Search
System A
System B
Can we simulate this?
Subtopic-Based Evaluation of Conversational Search
Conversational Search Simulation Model
Components of a Simulation-Based Evaluation
Conversational search system:
Takes a question/query, returns an answer in the form of a sentence/paragraph
Test collection:
Topics/tasks/information needs
Subtopics/aspects/facets/subtasks/entities
User queries that model subtopics
Transition probabilities between subtopics
Corpus of “answers”
Relevance judgments of answers to subtopics
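The test collection components listed above can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `Topic`, the field names, the `START`/`END` marker states, and all of the toy data are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Topic:
    """One topic (information need) in a hypothetical test collection."""
    name: str
    # subtopic -> user queries that model it
    queries: Dict[str, List[str]]
    # state -> {next state -> probability}; "START" and "END" are assumed markers
    transitions: Dict[str, Dict[str, float]]
    # (subtopic, answer id) -> relevance judgment (1 = relevant, 0 = not relevant)
    qrels: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def validate(self) -> None:
        """Check that the outgoing probabilities of every state sum to 1."""
        for state, row in self.transitions.items():
            if abs(sum(row.values()) - 1.0) > 1e-9:
                raise ValueError(f"transitions out of {state!r} do not sum to 1")

    def relevance(self, subtopic: str, answer_id: str) -> int:
        """Look up a judgment; unjudged answers are treated as not relevant."""
        return self.qrels.get((subtopic, answer_id), 0)


# Toy corpus of "answers" and one toy topic (all invented for illustration).
corpus = {
    "a1": "A medium grind suits most drip brewers.",
    "a2": "Water just off the boil works well for brewing.",
}
topic = Topic(
    name="brewing coffee",
    queries={
        "grind": ["how fine should I grind the beans?"],
        "temperature": ["how hot should the water be?"],
    },
    transitions={
        "START": {"grind": 1.0},
        "grind": {"temperature": 0.6, "END": 0.4},
        "temperature": {"END": 1.0},
    },
    qrels={("grind", "a1"): 1, ("temperature", "a2"): 1},
)
topic.validate()
```

Treating unjudged answers as not relevant is a common convention in test collection-based evaluation, adopted here as an assumption.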
Evaluation Metric
Two variants for line 16, in which the subtopic is
RI: sampled independently of the answer relevance
RD: sampled conditioned on the answer relevance
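The difference between the RI and RD variants can be sketched as a small user simulation. This is an illustrative sketch under stated assumptions, not the algorithm from the slides: the function names, the `START`/`END` marker states, the `max_turns` cap, and the toy transition tables are all hypothetical, and the system is reduced to a function that says whether its answer to a subtopic is relevant. Scoring of the simulated conversations is left out because the slides do not give the formula.

```python
import random

START, END = "START", "END"  # assumed marker states


def sample(dist, rng):
    """Draw one state from a {state: probability} dictionary."""
    states = list(dist)
    weights = [dist[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def simulate(ri, rd, is_relevant, rng, variant="RI", max_turns=20):
    """Simulate one conversation; return a list of (subtopic, relevant) turns.

    ri: state -> next-state distribution (relevance independent)
    rd: (state, relevant) -> next-state distribution (relevance dependent)
    is_relevant: stand-in for system plus judgments, subtopic -> bool
    """
    turns = []
    current = sample(ri[START], rng)
    while current != END and len(turns) < max_turns:
        relevant = is_relevant(current)
        turns.append((current, relevant))
        if variant == "RD":
            # RD: the next subtopic depends on whether the answer was relevant
            dist = rd[(current, relevant)]
        else:
            # RI: the next subtopic ignores the answer's relevance
            dist = ri[current]
        current = sample(dist, rng)
    return turns


# Toy transition tables (invented for illustration).
ri = {START: {"a": 1.0}, "a": {"b": 0.5, END: 0.5}, "b": {END: 1.0}}
rd = {
    ("a", True): {"b": 1.0},             # satisfied users move on
    ("a", False): {"a": 0.5, END: 0.5},  # unsatisfied users retry or give up
    ("b", True): {END: 1.0},
    ("b", False): {END: 1.0},
}

rng = random.Random(0)
always_relevant = lambda subtopic: True
rd_turns = simulate(ri, rd, always_relevant, rng, variant="RD")
ri_turns = simulate(ri, rd, always_relevant, rng, variant="RI")
```

In this toy setup the RD variant lets a non-relevant answer change what the simulated user does next, which the RI variant cannot express.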
Comparison of ECS to Precision and RBP
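For reference, the two baseline measures named in this comparison have standard definitions over a ranked or ordered list of relevance values. The sketch below applies them to a list of binary relevance values, for example one value per conversation turn, which is an assumption about how they are used here. The function names and the default persistence `p=0.8` are illustrative choices, and ECS itself is not reproduced because the slides do not give its formula.

```python
def precision(rels):
    """Fraction of relevant items in the list (0.0 for an empty list)."""
    return sum(rels) / len(rels) if rels else 0.0


def rbp(rels, p=0.8):
    """Rank-biased precision: (1 - p) * sum over i of rels[i] * p**i, i from 0.

    p is the persistence parameter: the probability that the user
    continues from one item to the next.
    """
    return (1 - p) * sum(r * p ** i for i, r in enumerate(rels))


turn_relevance = [1, 0, 1, 1]  # toy data, invented for illustration
p_score = precision(turn_relevance)
rbp_score = rbp(turn_relevance, p=0.5)
```

Precision weights every position equally, while RBP discounts later positions geometrically according to `p`.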
Summary
Evaluating conversations offline with test collections is hard
Use insights from diversity & novelty, sessions, and tasks to design an evaluation framework based on simulation
Provided results for two variants: one based on subtopics only, and one based on subtopics and relevance
Crowdsource data for test collection queries and transition probabilities
Evaluating conversations offline with test collections is not so hard anymore!