Publications (189)
January 2024 · 16 Reads · 2 Citations
IEEE/ACM Transactions on Audio Speech and Language Processing
This paper introduces the Ninth Dialog System Technology Challenge (DSTC-9). This edition of the DSTC focuses on applying end-to-end dialog technologies to four distinct tasks in dialog systems, namely: (1) task-oriented dialog modeling with unstructured knowledge access, (2) multi-domain task-oriented dialog, (3) interactive evaluation of dialog, and (4) situated interactive multimodal dialog. This paper describes the task definition, provided datasets, baselines, and evaluation setup for each track. We also summarize the results of the submitted systems to highlight general trends in state-of-the-art technologies for these tasks.
January 2023 · 487 Reads
Language models have steadily increased in size over the past few years, achieving a high level of performance on various natural language processing (NLP) tasks such as question answering and summarization. Large language models (LLMs) have been used for generation and can now output human-like text. As a result, other downstream tasks in the realm of dialog can now harness the LLMs' language understanding capabilities. This paper explores one such task: dialog evaluation. It concentrates on prompting with six LLMs: BLOOM, OPT, GPT-3, Flan-T5, InstructDial, and TNLGv2. The paper shows that the choice of datasets used to train a model affects both how well it performs on a task and how the prompt should be structured. Specifically, the more diverse and relevant the group of datasets a model is trained on, the better it performs at dialog evaluation. The paper also investigates how the number of examples in the prompt and the type of example selection affect the model's performance.
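As a rough illustration of the kind of prompting setup this abstract describes, here is a minimal sketch of few-shot, LLM-based dialog evaluation. The prompt template, the 1-5 rating scale, and the `llm_complete` hook are assumptions made for the sketch, not the paper's exact method:

```python
# Minimal sketch of few-shot prompting for dialog evaluation. The template,
# rating scale, and `llm_complete` hook are illustrative assumptions.
from typing import Callable, List, Tuple

def build_eval_prompt(context: str, response: str,
                      examples: List[Tuple[str, str, int]], k: int = 3) -> str:
    """Assemble a prompt asking the model to score a response from 1 to 5."""
    parts = ["Rate the response to each dialog context from 1 (poor) to 5 (excellent)."]
    for ctx, resp, score in examples[:k]:            # in-context examples
        parts.append(f"Context: {ctx}\nResponse: {resp}\nScore: {score}")
    parts.append(f"Context: {context}\nResponse: {response}\nScore:")
    return "\n\n".join(parts)

def evaluate(llm_complete: Callable[[str], str], context: str, response: str,
             examples: List[Tuple[str, str, int]]) -> int:
    """Query any completion-style LLM and parse the leading digit as the score."""
    answer = llm_complete(build_eval_prompt(context, response, examples)).strip()
    return int(answer[0]) if answer[:1].isdigit() else 0
```

Varying `k` and the strategy for choosing `examples` corresponds to the example-count and example-selection factors the abstract mentions.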
August 2022 · 14 Reads
The DialPort project http://dialport.org/, funded by the National Science Foundation (NSF), comprises a group of tools and services that aim to fulfill the needs of the dialog research community. Over the course of six years, several offerings have been created, including the DialPort Portal and DialCrowd. This paper describes these contributions, which will be demoed at SIGDIAL, covering their implementation, prior studies, the corresponding findings, and where the tools will remain freely available to the community going forward.
July 2022 · 14 Reads
The ultimate goal of dialog research is to develop systems that can be effectively used in interactive settings by real users. To this end, we introduced the Interactive Evaluation of Dialog Track at the 9th Dialog System Technology Challenge. This track consisted of two sub-tasks. The first sub-task involved building knowledge-grounded response generation models. The second sub-task aimed to extend dialog models beyond static datasets by assessing them in an interactive setting with real users. Our track challenges participants to develop strong response generation models and to explore strategies that extend them to back-and-forth interactions with real users. The progression from static corpora to interactive evaluation introduces unique challenges and facilitates a more thorough assessment of open-domain dialog systems. This paper provides an overview of the track, including the methodology and results. Furthermore, it provides insights into how best to evaluate open-domain dialog models.
July 2022 · 51 Reads
To facilitate zero-shot generalization in task-oriented dialog, this paper proposes Language Models as Data (LAD). LAD is a paradigm for creating diverse and accurate synthetic data that conveys the necessary structural constraints and can be used to train a downstream neural dialog model. LAD leverages GPT-3 to induce linguistic diversity. LAD achieves significant performance gains in zero-shot settings on intent prediction (+15%), slot filling (+31.4 F1), and next action prediction (+11 F1). Furthermore, an interactive human evaluation shows that training with LAD is competitive with training on human dialogs. LAD is open-sourced, with the code and data available at https://github.com/Shikib/lad.
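A minimal sketch of the LAD idea as stated in the abstract: start from a schema-derived seed utterance and use an LLM to add linguistic diversity while keeping the structured annotation valid. The prompt wording and the `paraphrase` hook below are assumptions, not the released implementation:

```python
# Sketch of LLM-driven data diversification in the spirit of LAD. The prompt
# and the `paraphrase` hook are illustrative assumptions.
from typing import Callable, Dict

def diversify(seed_utterance: str, slots: Dict[str, str],
              paraphrase: Callable[[str], str]) -> Dict[str, object]:
    """Rephrase a templated utterance with an LLM; keep the output only if
    every slot value still appears verbatim, so the annotation stays valid."""
    prompt = (f"Rephrase this naturally, keeping the exact values "
              f"{list(slots.values())}: {seed_utterance}")
    candidate = paraphrase(prompt)
    if all(value in candidate for value in slots.values()):
        return {"text": candidate, "slots": slots}
    return {"text": seed_utterance, "slots": slots}  # fall back to the seed

# e.g. diversify("book a table at Luigi's for 7pm",
#                {"restaurant": "Luigi's", "time": "7pm"}, my_llm)
```

The verbatim-containment check is one simple way to keep synthetic data "accurate" in the abstract's sense: the LLM supplies diversity, while the seed supplies the structural constraints.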
July 2022 · 15 Reads
Dialog system developers need high-quality data to train, fine-tune and assess their systems. They often use crowdsourcing for this since it provides large quantities of data from many workers. However, the data may not be of sufficiently good quality. This can be due to the way that the requester presents a task and how they interact with the workers. This paper introduces DialCrowd 2.0 to help requesters obtain higher quality data by, for example, presenting tasks more clearly and facilitating effective communication with workers. DialCrowd 2.0 guides developers in creating improved Human Intelligence Tasks (HITs) and is directly applicable to the workflows used currently by developers and researchers.
May 2022 · 385 Reads
Instruction tuning is an emergent paradigm in NLP wherein natural language instructions are leveraged with language models to induce zero-shot performance on unseen tasks. Instructions have been shown to enable good performance on unseen tasks and datasets in both large and small language models. Dialogue is an especially interesting area in which to explore instruction tuning because dialogue systems perform multiple kinds of tasks related to language (e.g., natural language understanding and generation, domain-specific interaction), yet instruction tuning has not been systematically explored for dialogue-related tasks. We introduce InstructDial, an instruction tuning framework for dialogue, which consists of a repository of 48 diverse dialogue tasks in a unified text-to-text format created from 59 openly available dialogue datasets. Next, we explore the cross-task generalization ability of models tuned on InstructDial across diverse dialogue tasks. Our analysis reveals that InstructDial enables good zero-shot performance on unseen datasets and tasks such as dialogue evaluation and intent detection, and even better performance in a few-shot setting. To ensure that models adhere to instructions, we introduce novel meta-tasks. We establish benchmark zero-shot and few-shot performance of models trained using the proposed framework on multiple dialogue tasks.
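To make the unified text-to-text format concrete, here is an illustrative conversion of one dialogue task into an instruction-tuning example. The field names, separator token, and instruction wording are assumptions for the sketch, not InstructDial's actual schema:

```python
# Illustrative flattening of a dialogue task into instruction-tuned
# text-to-text form. Field names and wording are assumptions.
from typing import Dict, List

def to_instruction_example(task_instruction: str,
                           dialog_context: List[str],
                           target: str) -> Dict[str, str]:
    """Flatten one task instance into an (input text, output text) pair."""
    history = " [SEP] ".join(dialog_context)   # linearized dialogue turns
    return {
        "input": f"Instruction: {task_instruction}\nDialogue: {history}",
        "output": target,
    }

# Intent detection recast as a generation task:
example = to_instruction_example(
    "Identify the intent of the last user turn.",
    ["user: i need a cab to the airport"],
    "request_taxi",
)
```

Recasting classification tasks as generation like this is what lets a single instruction-tuned model cover all 48 tasks with one language generation head.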
March 2022 · 91 Reads · 1 Citation
This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog. The workshop explored the current state of the art along with its limitations and suggested promising directions for future work in this important and very rapidly changing area of research.
January 2022 · 15 Reads · 49 Citations
Citations (58)
... Identifying knowledge sources was also used in the Ninth Dialog System Technology Challenge (DSTC9) in a track called "Beyond Domain APIs: Task-oriented Conversational Modeling with Unstructured Knowledge Access". This track aimed to extend task-oriented dialog systems by incorporating external unstructured knowledge sources (Gunasekara et al., 2020). The track's purpose was to investigate how to support frictionless task-oriented interactions so that the flow of the conversation does not break when users have questions that are out of the scope of the APIs/DB but may be answerable from external knowledge sources. ...
Reference: Conversational Information Seeking
- Citing Article · Full-text available · January 2024 · IEEE/ACM Transactions on Audio Speech and Language Processing
... If the applications were made available on the Play Store, 70% of the students indicated that they would download them; however, only 5% were willing to pay for the download. This is contrary to the findings of a study done at Carnegie Mellon University in the United States, where non-native English students evaluated an application used during the preparation of scientific presentations [7]. The students thought the application could be used in real-life situations and were willing to pay between $1 and $2 for it [7]. ...
- Citing Conference Paper · August 2013
... This is especially true when the evaluation is carried out through user studies, which compensate users for their participation [4]. Therefore, considerable effort has gone into automating the evaluation, or at least certain aspects of it [19,20,21,22]. Still, as automated metrics do not necessarily capture all aspects of a system's quality, a human evaluation is also performed, usually asking about the naturalness and quality of the generated utterances and the flow of the dialogue [6,17]. ...
- Citing Conference Paper · January 2020
... A long-standing goal in task-oriented dialogue research has been zero-shot transfer of critical modules such as the NLU and DST to previously unseen domains and backend APIs (Mehri et al., 2022). To achieve this goal, we need a way to represent new domains and APIs in a format that can be fed to a machine learning model. ...
- Citing Conference Paper · January 2022
... Previous studies primarily focused on text-based zero-shot natural language understanding (NLU) [2,3,4,5], which processes transcripts produced by an automatic speech recognition (ASR) model to create a modular solution to zero-shot SLU. Among these studies, the prompt-based question-answering (QA) framework [6,7,8] has gained popularity, driven by the recent advancements in generative large language models (LLMs) [9,10,11]. This approach involves crafting a descriptive question for each semantic label (e.g. ...
- Citing Conference Paper · January 2021
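The snippet above describes a prompt-based QA framing for zero-shot SLU, in which each semantic label becomes a natural-language question asked against the ASR transcript. As a hedged sketch of that idea, the label set, question wording, and `answer` hook below are assumptions made for illustration:

```python
# Sketch of prompt-based QA for zero-shot slot filling over ASR transcripts.
# Labels, questions, and the `answer` hook are illustrative assumptions.
from typing import Callable, Dict, Optional

LABEL_QUESTIONS = {                  # one descriptive question per slot label
    "destination": "Where does the speaker want to go?",
    "departure_time": "When does the speaker want to leave?",
}

def zero_shot_slu(transcript: str,
                  answer: Callable[[str, str], str]) -> Dict[str, Optional[str]]:
    """Fill each slot by posing its question to a QA model over the transcript."""
    slots: Dict[str, Optional[str]] = {}
    for label, question in LABEL_QUESTIONS.items():
        span = answer(question, transcript).strip()
        slots[label] = span or None  # treat an empty answer as slot absent
    return slots
```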
... To address the generalization issues in neural networks, particularly in task-oriented dialogue systems, various neuro-symbolic methodologies have been investigated. Mehri and Eskenazi (2021) propose schema graphs to generalize across various unseen domains and tasks. In Romero et al. (2021), the authors fine-tuned GPT-2 to generate both text and symbolic representations. ...
- Citing Conference Paper · January 2021
... To facilitate zero-shot inference across varied tasks and datasets, Google introduced instruction tuning (Wei et al., 2021). This method trains language models to execute tasks based on natural language instructions (Chakrabarty et al., 2022; Gupta et al., 2022). Whereas the traditional classification paradigm requires that a specific new head be trained for each task, the instruction tuning paradigm keeps the language generation head trained during pre-training and fine-tunes the model by transforming classification tasks into language generation tasks. ...
- Citing Conference Paper · January 2022
... A growing body of work has shown that people mirror linguistic patterns produced by technology, as well. For example, people adopt the words and syntactic structures produced by a computer system [9,10] and the pronunciation patterns of text-to-speech (TTS) voices presented across a variety of forms [14,17,19,24,30,49,54,56,57]. However, the magnitude of mirroring often differs when making direct comparisons between a human and technological interlocutor. ...
- Citing Conference Paper · September 2012
... Dialogue evaluation research has raised awareness of measuring flexibility and understanding, among many other criteria. There exist automated metrics based on NLP models for assessing the quality of dialogues, but their correlation with human judgments still needs to be improved (Mehri et al., 2022; Siro et al., 2022). While TTM is focused on usability metrics (easiness, confidence, speed, likeliness to use), we target dialogue and explanation quality metrics. ...
- Citing Preprint · File available · March 2022
... Related work is relatively sparse. Although automatic evaluation of dialogue systems is an active field of research (Yeh, Eskenazi, and Mehri 2021; Khalid and Lee 2022), most metrics and approaches focus on evaluating a dialogue at the utterance level (Ghazarian et al. 2020). However, our work focuses on evaluating dialogues at the conversation level, mostly dialogues produced by AI algorithms such as Graph2Bot, introduced by Bouraoui et al. (2019) as a tool for assisting conversational agent designers. ...
- Citing Conference Paper · January 2021