Monica S. Lam’s research while affiliated with Stanford University and other places

Publications (253)


Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations
  • Preprint
August 2024
Yucheng Jiang · Yijia Shao · Dekun Ma · [...] · Monica S. Lam

While language model (LM)-powered chatbots and generative search engines excel at answering concrete queries, discovering information in the terrain of unknown unknowns remains challenging for users. To emulate the common educational scenario where children/students learn by listening to and participating in conversations of their parents/teachers, we create Collaborative STORM (Co-STORM). Unlike QA systems that require users to ask all the questions, Co-STORM lets users observe and occasionally steer the discourse among several LM agents. The agents ask questions on the user's behalf, allowing the user to discover unknown unknowns serendipitously. To facilitate user interaction, Co-STORM assists users in tracking the discourse by organizing the uncovered information into a dynamic mind map, ultimately generating a comprehensive report as takeaways. For automatic evaluation, we construct the WildSeek dataset by collecting real information-seeking records with user goals. Co-STORM outperforms baseline methods on both discourse trace and report quality. In a further human evaluation, 70% of participants prefer Co-STORM over a search engine, and 78% favor it over a RAG chatbot.
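The interaction loop described above — agents converse and ask questions on the user's behalf, the user occasionally steers, and uncovered information is filed into a dynamic mind map — can be sketched as follows. This is a minimal illustration, not the paper's implementation: all class and function names are hypothetical, and the stubs stand in for the LLM and retrieval calls the real system makes.

```python
# Illustrative sketch of a Co-STORM-style interaction round.
# agent_turn() is a stub for an LLM call that asks or answers a question.

class MindMap:
    """Dynamic concept tree that organizes information uncovered in the discourse."""
    def __init__(self, topic):
        self.topic = topic
        self.nodes = {}          # concept -> list of supporting snippets

    def insert(self, concept, snippet):
        self.nodes.setdefault(concept, []).append(snippet)

    def report(self):
        # Flatten the map into a takeaway report, one section per concept.
        lines = [f"# {self.topic}"]
        for concept, snippets in self.nodes.items():
            lines.append(f"## {concept}")
            lines.extend(f"- {s}" for s in snippets)
        return "\n".join(lines)

def agent_turn(agent, history):
    # Stub: the real agent asks a question or answers with retrieved evidence.
    return f"[{agent}] question/answer grounded in sources"

def co_storm_round(agents, history, mind_map, user_utterance=None):
    """One discourse round: the user may steer; otherwise agents speak in turn."""
    if user_utterance is not None:
        history.append(("user", user_utterance))
    for agent in agents:
        utterance = agent_turn(agent, history)
        history.append((agent, utterance))
        # Each utterance is filed under a concept in the mind map (stubbed:
        # the real system chooses the concept with an LLM).
        mind_map.insert(concept=agent, snippet=utterance)
    return history

mind_map = MindMap("solar sails")
history = co_storm_round(["Expert", "Moderator"], [], mind_map)
```

The final `report()` call corresponds to the comprehensive takeaway report the abstract mentions.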


SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions

July 2024

Recent work integrating Large Language Models (LLMs) has led to significant improvements in the Knowledge Base Question Answering (KBQA) task. However, we posit that existing KBQA datasets, which either contain simple questions, use synthetically generated logical forms, or are based on small knowledge base (KB) schemas, do not capture the true complexity of the KBQA task. To address this, we introduce the SPINACH dataset, an expert-annotated KBQA dataset collected from discussions on Wikidata's "Request a Query" forum, with 320 decontextualized question-SPARQL pairs. Much more complex than existing datasets, SPINACH calls for strong KBQA systems that do not rely on training data to learn the KB schema but can dynamically explore large and often incomplete schemas and reason about them. Along with the dataset, we introduce the SPINACH agent, a new KBQA approach that mimics how a human expert would write SPARQL queries for such challenging questions. Experiments on existing datasets show SPINACH's capability in KBQA, achieving a new state of the art on the QALD-7, QALD-9 Plus, and QALD-10 datasets, improving F1 by 30.1%, 27.0%, and 10.0%, respectively, and coming within 1.6% of the fine-tuned LLaMA SOTA model on WikiWebQuestions. On our new SPINACH dataset, the SPINACH agent outperforms all baselines, including the best GPT-4-based KBQA agent, by 38.1% in F1.
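The abstract describes an agent that, instead of memorizing the schema, iteratively drafts a SPARQL query, executes it, and repairs it from execution feedback. A hedged sketch of that loop, with the LLM drafting step and the Wikidata endpoint call stubbed out (the stop criterion and function names are assumptions, not the paper's design):

```python
# Minimal propose-execute-refine loop in the spirit of an agentic KBQA system.

def execute_sparql(query):
    # Stub for a Wikidata SPARQL endpoint call; returns (rows, error message).
    if "?item" in query:
        return [{"item": "wd:Q42"}], None
    return [], "no projected variables"

def draft_query(question, feedback):
    # Stub for the LLM step that drafts or repairs a query from the question,
    # the schema explored so far, and any execution feedback.
    return """
    SELECT ?item WHERE {
      ?item wdt:P31 wd:Q5 .     # instance of: human
    } LIMIT 5
    """

def kbqa_agent(question, max_steps=5):
    feedback = None
    for _ in range(max_steps):
        query = draft_query(question, feedback)
        rows, error = execute_sparql(query)
        if error is None and rows:          # usable result: return it
            return rows
        feedback = error or "empty result"  # feed the failure back to the LLM
    return []

answer = kbqa_agent("Which entities are instances of human?")
```

The key property is that schema knowledge enters through exploration and feedback at run time, not through training data.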


LLM-Based Open-Domain Integrated Task and Knowledge Assistants with Programmable Policies
Figure 3: KITA provides the latest set of worksheets and only one previous conversation turn to the semantic parser. The parser outputs the current worksheets, which the Agent Policy uses to generate the Agent Dialogue Acts. These acts, along with the latest worksheet value and the latest user utterance, are used to generate the response.
Figure 6: User study web app for the course-enrollment assistant; a similar interface was used for the other studies.
Figure 15: Definition of the Course worksheet, which holds the details of a single course. The form has no predicates, so it can be filled at any time. It contains fields for course_name, grade_type (either Credit/No Credit or Letter), and course_num_units (how many units the course is taken for).
Figure 16: Course database worksheet.
Figure 28: Restaurant Reservation: User Info form.
  • Preprint
  • File available
July 2024

Programming LLM-based knowledge and task assistants that faithfully conform to developer-provided policies is challenging. These agents must retrieve and provide consistent, accurate, and relevant information to address users' queries and needs, yet they often generate unfounded responses ("hallucinations"). Traditional dialogue trees, meanwhile, can only handle a limited number of conversation flows, making them inherently brittle. To this end, we present KITA, a programmable framework for creating task-oriented conversational agents designed to handle complex user interactions. Unlike raw LLM prompting, KITA provides reliable, grounded responses, with controllable agent policies expressed through its specification language, KITA Worksheet. In contrast to dialogue trees, it is resilient to diverse user queries, helpful with knowledge sources, and easy to program policies for through its declarative paradigm. Through a real-user study involving 62 participants, we show that KITA beats the GPT-4 function-calling baseline by 26.1, 22.5, and 52.4 points on execution accuracy, dialogue act accuracy, and goal completion rate, respectively. We also release 22 real-user conversations with KITA, manually corrected to ensure accuracy.
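A worksheet, as described in the figure captions above, is a declarative form whose fields the agent fills over the conversation, with an agent policy that decides the next dialogue act from the worksheet state. The sketch below is only in the spirit of that design: the field names, the act tuples, and the policy rule are invented for illustration and are far simpler than KITA's actual specification language.

```python
# Hedged sketch of a declarative worksheet and a next-act policy.

from dataclasses import dataclass, field

@dataclass
class Worksheet:
    name: str
    fields: dict = field(default_factory=dict)   # field name -> value or None

    def missing(self):
        return [k for k, v in self.fields.items() if v is None]

def agent_policy(ws):
    """Return the next dialogue act given the worksheet state."""
    missing = ws.missing()
    if missing:
        return ("Ask", missing[0])       # ask the user for the next empty slot
    return ("Confirm", ws.name)          # all slots filled: confirm the task

course = Worksheet("CourseEnrollment",
                   {"course_name": "CS 101", "grade_type": None,
                    "course_num_units": None})
act = agent_policy(course)
```

The appeal of the declarative style is that the developer states what must be collected, and the policy, not a hand-built dialogue tree, works out the order of questions from the current state.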


Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval
Figure 8: An example feedback report on lack of information in the social-good domain (Doctors Without Borders).
Figure 11: A snapshot of the crowdsourcing instructions.
Figure 13: A snapshot of the instructions for the real-user experiment.

July 2024

Persuasion plays a pivotal role in a wide range of applications, from health intervention to the promotion of social good. Persuasive chatbots can accelerate the positive effects of persuasion in such applications. Existing methods rely on fine-tuning persuasive chatbots with task-specific training data, which is costly, if not infeasible, to collect. To address this issue, we propose a method that leverages the generalizability and inherent persuasive abilities of large language models (LLMs) to create effective and truthful persuasive chatbots for any given domain in a zero-shot manner. Unlike previous studies, which used pre-defined persuasion strategies, our method first uses an LLM to generate responses, then extracts the strategies used on the fly, and replaces any unsubstantiated claims in the response with retrieved facts supporting the strategies. We applied our chatbot, PersuaBot, to three significantly different domains requiring persuasion skills: donation solicitation, recommendations, and health intervention. Our experiments on simulated and human conversations show that our zero-shot approach is more persuasive than prior work, while achieving factual accuracy surpassing state-of-the-art knowledge-oriented chatbots. Our study demonstrates that when persuasive chatbots are employed responsibly for social good, they can be enablers of positive individual and social change.
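The generate-then-ground pipeline described above — draft a persuasive reply, identify which claims need substantiation, and swap unsupported claims for retrieved facts serving the same strategy — can be sketched as below. Every helper is a stub standing in for an LLM or retrieval call, and the claim classifier is a deliberately naive placeholder, not the paper's method.

```python
# Illustrative zero-shot "draft, then ground" pipeline.

def draft_reply(user_msg):
    # Stub for the LLM draft; returns a list of claims in the reply.
    return ["Donating feels great.",            # emotional appeal (kept as-is)
            "We cured 90% of patients."]        # factual claim (needs support)

def needs_evidence(claim):
    # Stub classifier: flag claims that assert checkable facts.
    # (Placeholder heuristic; the real step would extract the strategy
    # behind each claim with an LLM.)
    return "%" in claim

def retrieve_fact(claim):
    # Stub retriever: return a verified statement serving the same strategy.
    return "Independent audits report strong patient outcomes."

def persuabot_reply(user_msg):
    claims = draft_reply(user_msg)
    grounded = [retrieve_fact(c) if needs_evidence(c) else c for c in claims]
    return " ".join(grounded)

reply = persuabot_reply("Why should I donate?")
```

The design choice worth noting: persuasion strategy comes from the LLM's draft, while factual content comes from retrieval, so the reply stays both persuasive and truthful.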


SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing
Figure 3: Evaluation issues resolved within the gap between EM and platinum.

June 2024

We introduce SPAGHETTI: Semantic Parsing Augmented Generation for Hybrid English information from Text Tables and Infoboxes, a hybrid question-answering (QA) pipeline that utilizes information from heterogeneous knowledge sources, including knowledge bases, text, tables, and infoboxes. Our LLM-augmented approach achieves state-of-the-art performance on the CompMix dataset, the most comprehensive heterogeneous open-domain QA dataset, with a 56.5% exact match (EM) rate. More importantly, manual analysis of a sample of the dataset suggests that SPAGHETTI is more than 90% accurate, indicating that EM is no longer suitable for assessing the capabilities of QA systems today.
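A hybrid pipeline of this kind combines a semantic parser over the knowledge base with retrieval over text, tables, and infoboxes. The sketch below shows one plausible arrangement, try the KB parse first and fall back to retrieved evidence; the routing rule, helper names, and final answer-extraction step are all illustrative stubs, not the paper's actual architecture.

```python
# Sketch of a hybrid QA pipeline over heterogeneous sources.

def query_knowledge_base(question):
    # Stub semantic parser over the KB; returns an answer or None if no
    # executable parse is found.
    return "Paris" if "capital" in question else None

def retrieve_heterogeneous(question):
    # Stub retrieval over text passages, tables, and infoboxes.
    return ["infobox: Capital - Paris",
            "text: ... the capital, Paris, is the largest city ..."]

def answer(question):
    kb_answer = query_knowledge_base(question)
    if kb_answer is not None:
        return kb_answer
    evidence = retrieve_heterogeneous(question)
    # Stub for the LLM that reads the evidence and produces the answer.
    return evidence[0].split("-")[-1].strip()

result = answer("What is the capital of France?")
```

The abstract's point about EM is visible even here: a retrieved answer like "Paris, the largest city" would fail exact match while still being correct, which is why manual analysis finds higher true accuracy.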


Benchmark Underestimates the Readiness of Multi-lingual Dialogue Agents

May 2024

Creating multilingual task-oriented dialogue (TOD) agents is challenging due to the high cost of training data acquisition. Following the research trend of improving training data efficiency, we show for the first time that in-context learning is sufficient to tackle multilingual TOD. To handle the challenging dialogue state tracking (DST) subtask, we break it down into simpler steps that are more compatible with in-context learning, where only a handful of few-shot examples are used. We test our approach on the multilingual TOD dataset X-RiSAWOZ, which has 12 domains in Chinese, English, French, Korean, Hindi, and code-mixed Hindi-English. Our turn-by-turn DST accuracy on the six languages ranges from 55.6% to 80.3%, seemingly worse than the SOTA results from fine-tuned models, which achieve from 60.7% to 82.8%; our BLEU scores in the response generation (RG) subtask are also significantly lower than SOTA. However, after manual evaluation of the validation set, we find that by correcting gold label errors and improving the dataset annotation schema, GPT-4 with our prompts can achieve (1) 89.6%-96.8% accuracy in DST, and (2) more than 99% correct response generation across different languages. This leads us to conclude that current automatic metrics heavily underestimate the effectiveness of in-context learning.
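The decomposition idea, replace one monolithic DST call with a sequence of simpler steps, each answerable from a handful of in-context examples, can be sketched as follows. The two-step split (pick the active domain, then fill that domain's slots one at a time), the domain/slot inventory, and the stub heuristics are assumptions for illustration; the real system drives each step with a few-shot LLM prompt.

```python
# Sketch of decomposed dialogue state tracking for in-context learning.

DOMAINS = {"hotel": ["name", "stars"], "restaurant": ["name", "cuisine"]}

def pick_domain(utterance):
    # Step 1 (stub): a few-shot prompt that names the active domain.
    return "restaurant" if "eat" in utterance else "hotel"

def fill_slot(utterance, domain, slot):
    # Step 2 (stub): one small few-shot prompt per slot, far simpler
    # than predicting the full dialogue state in one call.
    if slot == "cuisine" and "Thai" in utterance:
        return "Thai"
    return None

def track_state(utterance, state):
    domain = pick_domain(utterance)
    for slot in DOMAINS[domain]:
        value = fill_slot(utterance, domain, slot)
        if value is not None:
            state[(domain, slot)] = value
    return state

state = track_state("I want to eat Thai food tonight", {})
```

Because each step has a narrow output space, a handful of examples per step can cover it, which is what makes the approach viable across languages without fine-tuning.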






Citations (63)


... Yoon et al. [44] investigated how multiple coordinated chatbots could encourage charitable donations, finding that organizational alignment enhanced credibility, though some approaches appeared overly insistent. Furumai et al. [16] demonstrated success with a multi-step dialogue approach that achieved superior scores in both persuasiveness and factual accuracy by grounding appeals in verifiable data and success stories. ...

Reference:

Persuasion with Large Language Models: a Survey
Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval
  • Citing Conference Paper
  • January 2024

... The absence of high-quality test data built on heterogeneous knowledge sources. Existing benchmarks that are built on multiple knowledge sources either have a narrow knowledge scope, typically only encompassing encyclopedic knowledge from Wikipedia and Wikidata [43,48,49], or lack instance-level design, allowing individual questions to be answered using a single knowledge source without requiring cross-knowledge-source querying and comparison [6,40]. ...

SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing
  • Citing Conference Paper
  • January 2024

... With the continuous-representation nature of LLMs, dictionary lookup in LLMs works not only for exactly matched patterns but also for similarly relevant patterns. Such behavior has been observed in prior works focusing on knowledge discovery from structured data such as tables and graphs [188,241,269], and extends even further to multi-hop reasoning through dictionary lookup from structured data [233]. ...

SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models
  • Citing Conference Paper
  • January 2024

... Artificial intelligence (AI) has become an increasingly valuable tool in academic research, offering support across various aspects of the scientific process (Byun & Stuhlmüller, 2023;Chen & Eger, 2023;Nechakhin et al., 2024;Shao et al., 2024). For instance, platforms such as Elicit (Byun & Stuhlmüller, 2023) 2 and ResearchRabbit 3 facilitate finding relevant literature for specific research topics. ...

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
  • Citing Conference Paper
  • January 2024

... In current development environments, it is rather difficult for software developers to provide a multimodal user interface because of its complex and cost-intensive implementation [1,3,24]. Therefore, we focus on software developers' needs and users' expectations. ...

ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language Models
  • Citing Conference Paper
  • May 2024

... This feature allows the system to efficiently keep up with dynamic knowledge domains, eliminating the need for costly and time-consuming LLM fine-tuning. Because of its great flexibility, RAG arises in innumerable industrial contexts (e.g., ChatGPT Retrieval Plugin [21], WikiChat [22], FinGPT [23], and ChatRTX [24]) as well as in specialized domains (e.g., medical diagnostic assistants [25] and email/code completion [26]). ...

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
  • Citing Conference Paper
  • January 2023

... To improve the reliability and factual accuracy of language model responses, several approaches have been proposed, covering various aspects of language modeling and data analysis. The work by Xu et al. [39] involves fine-tuning LLMs on high-quality Wikidata-related questions and answers. Huo et al. [15] investigate a method for automatically verifying LLM responses using a corpus. Another study [34] expands the training set with contrastive samples exhibiting different degrees of errors. ...

Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata
  • Citing Conference Paper
  • January 2023

... Most approaches involve translating data from one language, which is still labor intensive (Ding et al., 2022; Li et al., 2021a). Zero- and few-shot approaches that have been proposed often have a large performance gap with more data-intensive methods, and are either only tested in the monolingual setting (Zhao et al., 2022; Li et al., 2021b; Campagna et al., 2020, 2022) or rely on machine translation from a large preexisting dataset (Moradshahi et al., 2023a). Recent advances in LLMs have made possible an extremely data-efficient approach: in-context learning (Brown et al., 2020). ...

Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with a Distilled Representation
  • Citing Conference Paper
  • January 2023

... Another recent multilingual dataset X-RiSAWOZ (Moradshahi et al., 2023b) was translated from RiSAWOZ (Quan et al., 2020), which contains human-written dialogues spanning 12 domains and has the lowest annotation error rate among popular TOD datasets (Moradshahi et al., 2023c). Moradshahi et al. (2023b) first manually translated the validation and test sets of RiSAWOZ from Chinese to English. ...

Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues
  • Citing Conference Paper
  • January 2023

... In addition, researchers have also released multilingual dialogue datasets for task-oriented dialogues, such as GlobalWoZ [52] and X-RiSAWOZ [53], to help developers better develop and evaluate multilingual task-oriented dialogue systems. ...

X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents