Sina J. Semnani’s scientific contributions


Publications (26)


Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations
  • Preprint

August 2024

Yucheng Jiang · Yijia Shao · Dekun Ma · [...] · Monica S. Lam
While language model (LM)-powered chatbots and generative search engines excel at answering concrete queries, discovering information in the terrain of unknown unknowns remains challenging for users. To emulate the common educational scenario where children/students learn by listening to and participating in conversations of their parents/teachers, we create Collaborative STORM (Co-STORM). Unlike QA systems that require users to ask all the questions, Co-STORM lets users observe and occasionally steer the discourse among several LM agents. The agents ask questions on the user's behalf, allowing the user to discover unknown unknowns serendipitously. To facilitate user interaction, Co-STORM assists users in tracking the discourse by organizing the uncovered information into a dynamic mind map, ultimately generating a comprehensive report as takeaways. For automatic evaluation, we construct the WildSeek dataset by collecting real information-seeking records with user goals. Co-STORM outperforms baseline methods on both discourse trace and report quality. In a further human evaluation, 70% of participants prefer Co-STORM over a search engine, and 78% favor it over a RAG chatbot.
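The turn-taking mechanism the abstract describes can be caricatured with a minimal, LLM-free sketch: agents take turns asking questions on the user's behalf, the user occasionally steers, and every uncovered utterance is filed into a simple mind map. All class and function names here are invented for illustration; the actual system backs each agent with an LLM and retrieval.

```python
# Illustrative sketch of a Co-STORM-style round-robin discourse loop.
# Agent behavior is stubbed with canned questions; names are hypothetical.

class StubAgent:
    def __init__(self, name, questions):
        self.name = name
        self._questions = iter(questions)

    def take_turn(self, topic):
        # Ask the next question "on the user's behalf".
        return next(self._questions, None)

def run_discourse(topic, agents, user_steers=None, rounds=2):
    """Round-robin over agents; the user may inject a steering turn.

    Returns (trace, mind_map): the discourse trace and a topic -> utterances
    mind map organizing the uncovered information.
    """
    user_steers = user_steers or {}
    trace, mind_map = [], {}
    turn = 0
    for _ in range(rounds):
        for agent in agents:
            if turn in user_steers:              # user occasionally steers
                utterance, speaker = user_steers[turn], "user"
            else:
                utterance, speaker = agent.take_turn(topic), agent.name
            if utterance:
                trace.append((speaker, utterance))
                mind_map.setdefault(topic, []).append(utterance)
            turn += 1
    return trace, mind_map
```

The point of the loop is that the user is optional at every turn: if no steering input arrives, the agents keep the discourse going on their own.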


SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions

July 2024

Recent work integrating Large Language Models (LLMs) has led to significant improvements in the Knowledge Base Question Answering (KBQA) task. However, we posit that existing KBQA datasets that either have simple questions, use synthetically generated logical forms, or are based on small knowledge base (KB) schemas, do not capture the true complexity of KBQA tasks. To address this, we introduce the SPINACH dataset, an expert-annotated KBQA dataset collected from forum discussions on Wikidata's "Request a Query" forum with 320 decontextualized question-SPARQL pairs. Much more complex than existing datasets, SPINACH calls for strong KBQA systems that do not rely on training data to learn the KB schema, but can dynamically explore large and often incomplete schemas and reason about them. Along with the dataset, we introduce the SPINACH agent, a new KBQA approach that mimics how a human expert would write SPARQLs for such challenging questions. Experiments on existing datasets show SPINACH's capability in KBQA, achieving a new state of the art on the QALD-7, QALD-9 Plus and QALD-10 datasets by 30.1%, 27.0%, and 10.0% in F1, respectively, and coming within 1.6% of the fine-tuned LLaMA SOTA model on WikiWebQuestions. On our new SPINACH dataset, SPINACH agent outperforms all baselines, including the best GPT-4-based KBQA agent, by 38.1% in F1.
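The agent's core behavior — draft a SPARQL query, execute it, and refine when it comes back empty — can be sketched without the LLM or the live endpoint. Here `execute` stands in for a SPARQL endpoint call, the candidate (entity, property) pairs stand in for the agent's schema exploration, and the function names are hypothetical; `wd:`/`wdt:` are the standard Wikidata prefixes.

```python
# Sketch of a SPINACH-style draft-execute-refine loop for KBQA,
# with the LLM and the Wikidata endpoint stubbed out.

def draft_query(entity_id, prop_id):
    # Compose a one-hop SPARQL query over Wikidata identifiers.
    return (
        "SELECT ?x WHERE { "
        f"wd:{entity_id} wdt:{prop_id} ?x . "
        "}"
    )

def answer(question, candidates, execute):
    """Try candidate (entity, property) pairs until one returns results.

    `execute` is any callable mapping a query string to a list of bindings;
    in a real system it would hit a SPARQL endpoint.
    """
    for entity_id, prop_id in candidates:
        query = draft_query(entity_id, prop_id)
        results = execute(query)
        if results:                  # refine loop: stop on the first hit
            return query, results
    return None, []
```

A real agent would not enumerate fixed candidates but would inspect the schema between iterations and let an LLM propose the next query.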


Figure 8: An example feedback report on lack of information in the social-good domain (Doctors Without Borders).
Figure 11: A snapshot of the crowdsourcing instructions.
Figure 13: A snapshot of the instructions for the real-user experiment.
Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval
  • Preprint
  • File available

July 2024

Persuasion plays a pivotal role in a wide range of applications, from health intervention to the promotion of social good. Persuasive chatbots can accelerate the positive effects of persuasion in such applications. Existing methods rely on fine-tuning persuasive chatbots with task-specific training data, which is costly, if not infeasible, to collect. To address this issue, we propose a method that leverages the generalizability and inherent persuasive abilities of large language models (LLMs) to create effective and truthful persuasive chatbots for any given domain in a zero-shot manner. Unlike previous studies, which used pre-defined persuasion strategies, our method first uses an LLM to generate responses, then extracts the strategies used on the fly, and replaces any unsubstantiated claims in the response with retrieved facts supporting the strategies. We applied our chatbot, PersuaBot, to three significantly different domains requiring persuasion skills: donation solicitation, recommendations, and health intervention. Our experiments on simulated and human conversations show that our zero-shot approach is more persuasive than prior work while achieving factual accuracy surpassing state-of-the-art knowledge-oriented chatbots. Our study demonstrates that persuasive chatbots, when employed responsibly for social good, can enable positive individual and social change.
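The repair step of the pipeline — keep claims the fact store supports, swap unsupported claims for a retrieved fact serving the same strategy — can be sketched as below. The LLM-based strategy extractor is assumed to have already produced (strategy, claim, supported) triples; all names are illustrative, not PersuaBot's actual API.

```python
# Hedged sketch of the claim-repair stage of a zero-shot persuasive chatbot:
# unsupported claims are replaced with retrieved facts for the same strategy.

def repair_response(draft, fact_store):
    """draft: list of (strategy, claim, supported) triples from an extractor.

    Unsupported claims are replaced with a retrieved fact matching the same
    persuasion strategy; claims with no matching fact are dropped entirely,
    so nothing unverifiable reaches the user.
    """
    repaired = []
    for strategy, claim, supported in draft:
        if supported:
            repaired.append(claim)               # substantiated: keep as-is
        elif strategy in fact_store:
            repaired.append(fact_store[strategy])  # swap in a retrieved fact
        # else: drop the unverifiable claim
    return " ".join(repaired)
```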


Figure 3: Evaluation issues resolved within the gap between EM and platinum.
SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing

June 2024

We introduce SPAGHETTI: Semantic Parsing Augmented Generation for Hybrid English information from Text Tables and Infoboxes, a hybrid question-answering (QA) pipeline that utilizes information from heterogeneous knowledge sources, including knowledge bases, text, tables, and infoboxes. Our LLM-augmented approach achieves state-of-the-art performance on the Compmix dataset, the most comprehensive heterogeneous open-domain QA dataset, with a 56.5% exact match (EM) rate. More importantly, manual analysis of a sample of the dataset suggests that SPAGHETTI is more than 90% accurate, indicating that EM is no longer a suitable metric for assessing the capabilities of QA systems today.
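The heterogeneous-source idea can be sketched as a simple pooling router: query every source, collect candidates, and prefer answers from more structured sources when they exist. The component calls are stubbed and the preference order and function names are assumptions for illustration, not the pipeline's actual design.

```python
# Sketch of hybrid QA over heterogeneous sources: pool candidate answers
# from all sources, then prefer structured sources over free text.

def hybrid_answer(question, sources, preference=("kb", "infobox", "table", "text")):
    """`sources` maps a source name to an answer function that returns a
    (possibly empty) list of candidate answers for the question."""
    candidates = []
    for name, ask in sources.items():
        candidates.extend((name, a) for a in ask(question))
    # Return the first answer from the most-preferred source that responded.
    for preferred in preference:
        hits = [a for (name, a) in candidates if name == preferred]
        if hits:
            return hits[0]
    return None
```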


Benchmark Underestimates the Readiness of Multi-lingual Dialogue Agents

May 2024

Creating multilingual task-oriented dialogue (TOD) agents is challenging due to the high cost of training data acquisition. Following the research trend of improving training data efficiency, we show for the first time that in-context learning is sufficient to tackle multilingual TOD. To handle the challenging dialogue state tracking (DST) subtask, we break it down into simpler steps that are more compatible with in-context learning, where only a handful of few-shot examples are used. We test our approach on the multilingual TOD dataset X-RiSAWOZ, which has 12 domains in Chinese, English, French, Korean, Hindi, and code-mixed Hindi-English. Our turn-by-turn DST accuracy on the 6 languages ranges from 55.6% to 80.3%, seemingly worse than the SOTA results from fine-tuned models, which achieve 60.7% to 82.8%; our BLEU scores in the response generation (RG) subtask are also significantly lower than SOTA. However, after manually evaluating the validation set, we find that by correcting gold label errors and improving the dataset annotation schema, GPT-4 with our prompts can achieve (1) 89.6%-96.8% accuracy in DST, and (2) more than 99% correct response generation across different languages. This leads us to conclude that current automatic metrics heavily underestimate the effectiveness of in-context learning.
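The decomposition the abstract alludes to — track the dialogue state turn by turn rather than re-deriving it from the whole history — can be sketched as a small state-merge step. The per-turn slot extractor (the in-context-learning part) is stubbed out; slot names and the `None`-means-retracted convention are illustrative assumptions.

```python
# Minimal sketch of turn-by-turn dialogue state tracking: extract this
# turn's slot updates (stubbed; an LLM with few-shot examples in the real
# system), then merge them into the cumulative state.

def update_state(state, turn_updates):
    """Merge one turn's slot updates into the cumulative dialogue state.
    A value of None marks a slot the user retracted."""
    new_state = dict(state)
    for slot, value in turn_updates.items():
        if value is None:
            new_state.pop(slot, None)
        else:
            new_state[slot] = value
    return new_state

def track(per_turn_updates):
    """One extractor call per turn keeps each prompt small and simple."""
    state = {}
    for updates in per_turn_updates:
        state = update_state(state, updates)
    return state
```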







Citations (12)


... Yoon et al. [44] investigated how multiple coordinated chatbots could encourage charitable donations, finding that organizational alignment enhanced credibility, though some approaches appeared overly insistent. Furumai et al. [16] demonstrated success with a multi-step dialogue approach that achieved superior scores in both persuasiveness and factual accuracy by grounding appeals in verifiable data and success stories. ...

Reference:

Persuasion with Large Language Models: a Survey
Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval
  • Citing Conference Paper
  • January 2024

... The absence of high-quality test data built on heterogeneous knowledge sources. Existing benchmarks that are built on multiple knowledge sources either have a narrow knowledge scope, typically only encompassing encyclopedic knowledge from Wikipedia and Wikidata [43,48,49], or lack instance-level design, allowing individual questions to be answered using a single knowledge source without requiring cross-knowledge-source querying and comparison [6,40]. ...

SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing
  • Citing Conference Paper
  • January 2024

... With the continuous representation nature of LLMs, dictionary lookup in LLMs works not only for exactly matched patterns but also for similar, relevant patterns. Such behavior has been observed in prior works focusing on knowledge discovery from structured data such as tables and graphs [188,241,269], and extends even further to multi-hop reasoning through dictionary lookup from structured data [233]. ...

SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models
  • Citing Conference Paper
  • January 2024

... This feature allows the system to efficiently keep up with dynamic knowledge domains, eliminating the need for costly and time-consuming LLM fine-tuning. Because of its great flexibility, RAG arises in innumerable industrial contexts (e.g., ChatGPT Retrieval Plugin [21], WikiChat [22], FinGPT [23], and ChatRTX [24]) as well as in specialized domains (e.g., medical diagnostic assistants [25] and email/code completion [26]). ...

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
  • Citing Conference Paper
  • January 2023

... To improve the reliability and factual accuracy of language model responses, several approaches have been proposed, covering various aspects of language modeling and data analysis. The work by Xu et al. [39] fine-tunes LLMs on high-quality Wikidata-related questions and answers. Huo et al. [15] investigate a method for automatically verifying LLM responses against a corpus. Another study [34] expands the training set with contrastive samples exhibiting different degrees of errors. ...

Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata
  • Citing Conference Paper
  • January 2023

... Most approaches involve translating data from one language, which is still labor-intensive (Ding et al., 2022; Li et al., 2021a). Zero- and few-shot approaches that have been proposed often have a large performance gap with more data-intensive methods, and are either only tested in the monolingual setting (Zhao et al., 2022; Li et al., 2021b; Campagna et al., 2020, 2022) or rely on machine translation from a large preexisting dataset (Moradshahi et al., 2023a). Recent advances in LLMs have made possible an extremely data-efficient approach: in-context learning (Brown et al., 2020). ...

Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with a Distilled Representation
  • Citing Conference Paper
  • January 2023

... In addition, researchers have also released multilingual dialogue datasets for task-oriented dialogues, such as GlobalWoZ [52] and X-RiSAWOZ [53], to help developers better build and evaluate multilingual task-oriented dialogue systems. ...

X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents

... Regarding mapping a natural language input to a prompt template, existing techniques of knowledge representation and reasoning can be very helpful. More specifically, ontology alignment and semantic parsing [4,43] can help map an NL input to a structured representation of knowledge and infer implicit concepts and relationships. These algorithms can be used to generate more precise and accurate prompts for LLMs, and to improve the effectiveness of the Socratic method in dialogue formulation [42]. ...

A Few-Shot Semantic Parser for Wizard-of-Oz Dialogues with the Precise ThingTalk Representation
  • Citing Conference Paper
  • January 2022

... Recently, some works, such as SMCalFlow (Semantic Machines et al., 2020), TOC (Campagna et al., 2021), and ThingTalk (Lam et al., 2022; Campagna et al., 2022), propose using executable dialogue states for TOD. To this end, the representation itself is also a specially designed programming language. ...

ThingTalk: An Extensible, Executable Representation Language for Task-Oriented Dialogues
  • Citing Preprint
  • March 2022