Kaixin Ma

Kaixin Ma
Carnegie Mellon University | CMU · Language Technologies Institute

Doctor of Philosophy

About

46
Publications
7,515
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
534
Citations
Introduction
Commonsense Reasoning, Question Answering, Natural Language Understanding

Publications

Publications (46)
Conference Paper
Full-text available
This paper presents a new corpus and a robust deep learning architecture for a task in reading comprehension, passage completion, on multi-party dialog. Given a dialog in text and a passage containing factual descriptions about the dialog where mentions of the characters are replaced by blanks, the task is to fill the blanks with the most appropria...
Conference Paper
Full-text available
Recent developments in pre-trained neural language modeling have led to leaps in accuracy on common-sense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks, without learning to utilize external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluations have shown promi...
Conference Paper
Full-text available
Automatic emotion recognition plays a critical role in technologies such as intelligent agents and social robots and is increasingly being deployed in applied settings such as education and healthcare. Most research to date has focused on recognizing the emotional expressions of young and middle-aged adults and, to a lesser extent, children and ado...
Preprint
The rapid development of large language and multimodal models has sparked significant interest in using proprietary models, such as GPT-4o, to develop autonomous agents capable of handling real-world scenarios like web navigation. Although recent open-source efforts have tried to equip agents with the ability to explore environments and continuousl...
Preprint
Full-text available
Object navigation in unknown environments is crucial for deploying embodied agents in real-world applications. While we have witnessed huge progress due to large-scale scene datasets, faster simulators, and stronger models, previous studies mainly focus on limited scene types and target objects. In this paper, we study a new task of navigating to d...
Preprint
Full-text available
Large Language Models (LLMs) excel in code generation yet struggle with modern AI software engineering tasks. Unlike traditional function-level or file-level coding tasks, AI software engineering requires not only basic coding proficiency but also advanced skills in managing and interacting with code repositories. However, existing methods often ov...
Preprint
Text-rich images, where text serves as the central visual element guiding the overall understanding, are prevalent in real-world applications, such as presentation slides, scanned documents, and webpage snapshots. Tasks involving multiple text-rich images are especially challenging, as they require not only understanding the content of individual i...
Preprint
Full-text available
We introduce Cognitive Kernel, an open-source agent system towards the goal of generalist autopilots. Unlike copilot systems, which primarily rely on users to provide essential state information (e.g., task descriptions) and assist users by answering questions or auto-completing contents, autopilot systems must complete tasks from start to finish i...
Preprint
Full-text available
While visual question-answering (VQA) benchmarks have catalyzed the development of reasoning techniques, they have focused on vertical thinking. Effective problem-solving also necessitates lateral thinking, which remains understudied in AI and has not been used to test visual perception systems. To bridge this gap, we formulate visual lateral think...
Conference Paper
Full-text available
The success of language models has inspired the NLP community to attend to tasks that require implicit and complex reasoning, relying on human-like commonsense mechanisms. While such vertical thinking tasks have been relatively popular, lateral thinking puzzles have received little attention. To bridge this gap, we devise BRAINTEASER: a multiple-ch...
Chapter
Full-text available
Stories about everyday situations are an essential part of human communication, motivating the need to develop AI agents that can reliably understand these stories. Despite the long list of supervised methods for story completion and procedural understanding, current AI fails to generalize its procedural reasoning to unseen stories. This paper is b...
Preprint
Full-text available
Large language models (LLMs) have been successfully adapted for interactive decision-making tasks like web navigation. While achieving decent performance, previous methods implicitly assume a forward-only execution mode for the model, where they only provide oracle trajectories as in-context examples to teach the model how to reason in the interact...
Chapter
Commonsense reasoning is an attractive test bed for neuro-symbolic techniques, because it is a difficult challenge where pure neural and symbolic methods fall short. In this chapter, we review commonsense reasoning methods that combine large-scale knowledge resources with generalizable neural models to achieve both robustness and explainability. We...
Preprint
Full-text available
Intelligent Traffic Monitoring (ITMo) technologies hold the potential for improving road safety/security and for enabling smart city infrastructure. Understanding traffic situations requires a complex fusion of perceptual information with domain-specific and causal commonsense knowledge. Whereas prior work has provided benchmarks and methods for tr...
Preprint
Communication via natural language is a crucial aspect of intelligence, and it requires computational models to learn and reason about world concepts, with varying levels of supervision. While there has been significant progress made on fully-supervised non-interactive tasks, such as question-answering and procedural text understanding, much of the...
Preprint
The retrieval model is an indispensable component for real-world knowledge-intensive tasks, e.g., open-domain question answering (ODQA). As separate retrieval skills are annotated for different datasets, recent work focuses on customized methods, limiting the model transferability and scalability. In this work, we propose a modular retriever where...
Preprint
Full-text available
Stories about everyday situations are an essential part of human communication, motivating the need to develop AI agents that can reliably understand these stories. Despite the long list of supervised methods for story completion and procedural understanding, current AI has no mechanisms to automatically track and explain procedures in unseen stori...
Conference Paper
Full-text available
We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources. The key novelty of our method is the introduction of the intermediary modules into the current retriever-reader pipeline. Unlike previous methods that solely rely on the retriever for gathering all evid...
Preprint
We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources. The key novelty of our method is the introduction of the intermediary modules into the current retriever-reader pipeline. Unlike previous methods that solely rely on the retriever for gathering all evid...
Conference Paper
Full-text available
Procedural text understanding is a challenging language reasoning task that requires models to track entity states across the development of a narrative. A complete procedural understanding solution should combine three core aspects: local and global views of the inputs, and global view of outputs. Prior methods considered a subset of these aspects...
Preprint
Procedural text understanding is a challenging language reasoning task that requires models to track entity states across the development of a narrative. A complete procedural understanding solution should combine three core aspects: local and global views of the inputs, and global view of outputs. Prior methods considered a subset of these aspects...
Preprint
Full-text available
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models, in zero-shot evaluation on various downstream language reasoning tasks. Since these improvements are reported in aggregate, however, little is known about (i) how to select the appropriate knowledge for so...
Preprint
Full-text available
This chapter illustrates how suitable neuro-symbolic models for language understanding can enable domain generalizability and robustness in downstream tasks. Different methods for integrating neural language models and knowledge graphs are discussed. The situations in which this combination is most appropriate are characterized, including quantitat...
Chapter
Full-text available
This chapter illustrates how suitable neuro-symbolic models for language understanding can enable domain generalizability and robustness in downstream tasks. Different methods for integrating neural language models and knowledge graphs are discussed. The situations in which this combination is most appropriate are characterized, including quantitat...
Preprint
Full-text available
Due to its potential for a universal interface over both data and text, data-to-text generation is becoming increasingly popular recently. However, few previous work has focused on its application to downstream tasks, e.g. using the converted data for grounding or reasoning. In this work, we aim to bridge this gap and use the data-to-text method as...
Preprint
Full-text available
Commonsense reasoning benchmarks have been largely solved by fine-tuning language models. The downside is that fine-tuning may cause models to overfit to task-specific data and thereby forget their knowledge gained during pre-training. Recent works only propose lightweight model updates as models may already possess useful knowledge from past exper...
Article
Full-text available
Commonsense knowledge is essential for many AI applications, including those in natural language processing, visual processing, and planning. Consequently, many sources that include commonsense knowledge have been designed and constructed over the past decades. Recently, the focus has been on large text-based sources, which facilitate easier integr...
Article
Full-text available
Recent developments in pre-trained neural language modeling have led to leaps in accuracy on common-sense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks, without learning to utilize external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluations have shown promi...
Preprint
Full-text available
Commonsense knowledge is essential for many AI applications, including those in natural language processing, visual processing, and planning. Consequently, many sources that include commonsense knowledge have been designed and constructed over the past decades. Recently, the focus has been on large text-based sources, which facilitate easier integr...
Conference Paper
Full-text available
Conditional text generation has been a challenging task that is yet to see human-level performance from state-of-the-art models. In this work, we specifically focus on the CommonGen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts. Despite advances in other tasks, large pre-trained language models tha...
Preprint
Full-text available
Conditional text generation has been a challenging task that is yet to see human-level performance from state-of-the-art models. In this work, we specifically focus on the Commongen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts. Despite advances in other tasks, large pre-trained language models tha...
Preprint
Full-text available
Recent developments in pre-trained neural language modeling have led to leaps in accuracy on commonsense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks, without learning to utilize external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluations have shown promis...
Preprint
As audio/visual classification models are widely deployed for sensitive tasks like content filtering at scale, it is critical to understand their robustness along with improving the accuracy. This work aims to study several key questions related to multimodal learning through the lens of adversarial noises: 1) The trade-off between early/middle/lat...
Preprint
Full-text available
Computational context understanding refers to an agent's ability to fuse disparate sources of information for decision-making and is, therefore, generally regarded as a prerequisite for sophisticated machine reasoning capabilities, such as in artificial intelligence (AI). Data-driven and knowledge-driven methods are two classical techniques in the...
Preprint
Full-text available
Non-extractive commonsense QA remains a challenging AI task, as it requires systems to reason about, synthesize, and gather disparate pieces of information, in order to generate responses to queries. Recent approaches on such tasks show increased performance, only when models are either pre-trained with additional information or when domain-specifi...
Conference Paper
Full-text available
We propose a convolutional neural network model for text-based speaker identification on multiparty dialogues extracted from the TV show, Friends. While most previous works on this task rely heavily on acoustic features, our approach attempts to identify speakers in dialogues using their speech patterns as captured by transcriptions to the TV show....

Network

Cited By