Source publication
Recent research has shown that rationales, or step-by-step chains of thought, can be used to improve performance in multi-step reasoning tasks. We reconsider rationale-augmented prompting for few-shot in-context learning, where (input -> output) prompts are expanded to (input, rationale -> output) prompts. For rationale-augmented prompting we demon...
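The excerpt above describes expanding standard few-shot (input -> output) prompts into rationale-augmented (input, rationale -> output) prompts. The following is a minimal illustrative sketch of that expansion; the exemplar text and the `build_prompt` helper are made up for illustration and are not taken from the paper.

```python
# Illustrative sketch (not from the paper): a standard few-shot exemplar vs.
# a rationale-augmented exemplar, plus a helper to assemble the prompt.

standard_exemplar = (
    "Q: There are 3 cars and each car has 4 wheels. How many wheels are there?\n"
    "A: 12\n"
)

rationale_exemplar = (
    "Q: There are 3 cars and each car has 4 wheels. How many wheels are there?\n"
    "A: Each car has 4 wheels, so 3 cars have 3 * 4 = 12 wheels. The answer is 12.\n"
)

def build_prompt(exemplars: list[str], question: str) -> str:
    """Concatenate in-context exemplars and append the test question."""
    return "".join(exemplars) + f"Q: {question}\nA:"

print(build_prompt([rationale_exemplar],
                   "A farmer has 5 pens with 6 pigs in each. How many pigs are there?"))
```

With rationale-augmented exemplars, the model is encouraged to emit a step-by-step rationale before the final answer rather than the answer alone.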
Contexts in source publication
Context 1
... results for the PaLM-540B model are shown in Table 3, Table 4 and Table 6, and give a comparison to two baseline approaches: (1) standard few-shot prompting without rationales (Brown et al., 2020), and (2) rationale-based prompting, including few-shot chain-of-thought (CoT) prompting, and zero-shot CoT (Kojima et al., 2022), where the model is prompted with "Let's think step by step" to generate initial rationales and then prompted with "Therefore, the answer is" to obtain the final answer. ...
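The two-stage zero-shot CoT baseline mentioned above can be sketched as follows. This is a hedged illustration: `llm_complete` is a hypothetical text-completion callable standing in for whatever model API is used.

```python
# Minimal sketch of zero-shot CoT (Kojima et al., 2022) as described above.
# `llm_complete` is a hypothetical completion function: prompt in, text out.

def zero_shot_cot(llm_complete, question: str) -> str:
    # Stage 1: elicit a rationale with the reasoning trigger.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    rationale = llm_complete(reasoning_prompt)

    # Stage 2: append the generated rationale and the answer trigger
    # to obtain the final answer.
    answer_prompt = f"{reasoning_prompt} {rationale}\nTherefore, the answer is"
    return llm_complete(answer_prompt)
```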
Context 2
... explain these experiments in more detail. Table 3 shows the results obtained across a range of natural language inference tasks. One can see that the three rationale-augmented ensembling strategies ("output-sampled") all achieve significantly higher accuracy than chain-of-thought prompting with human-written rationales (Wei et al., 2022). ...
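A rough sketch of the "output-sampled" ensembling idea referred to above: sample several rationale-and-answer completions at non-zero temperature and take a majority vote over the parsed final answers. The helpers `llm_sample` and `extract_answer` below are illustrative assumptions, not the cited work's released code.

```python
# Hedged sketch of output-space rationale sampling with majority voting.

from collections import Counter

def extract_answer(completion: str) -> str:
    """Rough parse: keep whatever follows 'The answer is', if present."""
    marker = "The answer is"
    if marker in completion:
        return completion.split(marker)[-1].strip().rstrip(".")
    return completion.strip()

def rationale_ensemble(llm_sample, prompt: str, n_samples: int = 40) -> str:
    """Sample several rationale+answer completions, then majority-vote the answers."""
    answers = [extract_answer(llm_sample(prompt, temperature=0.7))
               for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```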
Context 3
... control for the bias of manually written rationales, we also investigate performance on the e-SNLI dataset using crowd-sourced rationales (Camburu et al., 2018). As shown in Table 3, the improvement of rationale-augmented ensembles appears to be stable regardless of whether the rationales are crowd-sourced or author-supplied. ...
Similar publications
We describe the systems of the University of Alberta team for the SemEval-2023 Visual Word Sense Disambiguation (V-WSD) Task. We present a novel algorithm that leverages glosses retrieved from BabelNet, in combination with text and image encoders. Furthermore, we compare language-specific encoders against the application of English encoders to tran...
In this chapter we consider Information Extraction approaches that automatically identify structured information in text documents and comprise a set of tasks. The Text Classification task assigns a document to one or more pre-defined content categories or classes. This includes many subtasks such as language identification, sentiment analysis, etc....
Humans often make creative use of words to express novel senses. A long-standing effort in natural language processing has been focusing on word sense disambiguation (WSD), but little has been explored about how the sense inventory of a word may be extended toward novel meanings. We present a paradigm of word sense extension (WSE) that enables word...
Word Sense Disambiguation (WSD) aims to determine the correct meaning of words that can have multiple interpretations. Recently, contextualized word embeddings, whose goal is to give different representations of the same word in diverse contexts, have been shown to have a tremendous impact on several natural language processing tasks including ques...
Natural language processing (NLP) may face the inexplicable “black-box” problem of parameters and unreasonable modeling for lack of embedding of some characteristics of natural language, while the quantum-inspired models based on quantum theory may provide a potential solution. However, the essential prior knowledge and pretrained text features are...
Citations
... For AQuA, DROP, ANLI-A1, ANLI-A2, ANLI-A3, ComV and OBQA, we use their official test set for evaluation. For BoolQ, we follow Wang et al. (2022a) and use the validation set for evaluation, since its test set is not publicly available. For FactCK and WikiQA, we manually split them into train/test sets and use the training set's questions as the unlabeled dataset, since no official split has been released for them. ...
... The answer is it is not possible to tell. Table 7: Few-shot CoT prompts for NLI tasks, three subsets of ANLI from Wang et al. (2022a). Q: Which one of the following statements is against common sense? ...
Large Language Models have shown impressive abilities on various tasks. However, fundamentally improving them depends on high-quality datasets or computationally expensive fine-tuning. In contrast, humans can easily improve themselves by thinking and memory, without external resources. In this paper, we propose a framework, MoT, to let the LLM self-improve through Memory of Thoughts, without annotated datasets and parameter updates. Specifically, the framework is divided into two stages: (1) before the test stage, we let the LLM pre-think on an unlabeled dataset and save the high-confidence thoughts as external memory; (2) during inference, given a test question, we let the LLM recall relevant memory to help itself reason and answer it. Experimental results show that the proposed framework can help ChatGPT significantly improve its abilities in math reasoning, commonsense reasoning, factual reasoning and natural language inference. Further analyses show that each component contributes critically to the improvements.
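The two MoT stages described in this abstract can be sketched loosely as below. The confidence filter (majority-vote agreement over sampled thoughts) and the retrieval rule (plain word overlap) are simplifying assumptions for illustration, and `llm_sample` is a hypothetical sampling callable; the actual MoT implementation may differ.

```python
# Loose sketch of MoT's two stages under the assumptions stated above.

from collections import Counter

def build_memory(llm_sample, unlabeled_questions, n_samples=5, threshold=0.6):
    """Stage 1 (pre-test): pre-think on unlabeled questions, keep high-confidence thoughts."""
    memory = []
    for q in unlabeled_questions:
        samples = [llm_sample(f"Q: {q}\nA: Let's think step by step.")
                   for _ in range(n_samples)]
        answers = [s.split("The answer is")[-1].strip() for s in samples]
        top_answer, count = Counter(answers).most_common(1)[0]
        if count / n_samples >= threshold:           # confidence filter (assumed heuristic)
            best_thought = next(s for s, a in zip(samples, answers) if a == top_answer)
            memory.append({"question": q, "thought": best_thought})
    return memory

def answer_with_memory(llm_sample, memory, question, k=2):
    """Stage 2 (inference): recall the k most relevant memories and prepend them."""
    def overlap(m):
        return len(set(m["question"].lower().split()) & set(question.lower().split()))
    recalled = sorted(memory, key=overlap, reverse=True)[:k]
    context = "\n\n".join(f"Q: {m['question']}\nA: {m['thought']}" for m in recalled)
    return llm_sample(f"{context}\n\nQ: {question}\nA: Let's think step by step.")
```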
Introduction
Constructing an accurate and comprehensive knowledge graph of specific diseases is critical for practical clinical disease diagnosis and treatment, reasoning and decision support, rehabilitation, and health management. For knowledge graph construction tasks (such as named entity recognition, relation extraction), classical BERT-based methods require a large amount of training data to ensure model performance. However, real-world medical annotation data, especially disease-specific annotation samples, are very limited. In addition, existing models do not perform well in recognizing out-of-distribution entities and relations that are not seen in the training phase.
Method
In this study, we present a novel and practical pipeline for constructing a heart failure knowledge graph using large language models and medical expert refinement. We apply prompt engineering to the three phases of knowledge graph construction: schema design, information extraction, and knowledge completion. The best performance is achieved by designing task-specific prompt templates combined with the TwoStepChat approach, as sketched below.
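The excerpt does not spell out the TwoStepChat prompts, so the following is only one plausible reading of a two-step chat flow for the information extraction phase: first ask for entities, then ask for relations among them in a second turn. The prompt wording and the `chat` callable are placeholders, not the authors' templates or API.

```python
# Illustrative two-step chat flow for information extraction (assumed design).

def two_step_extract(chat, clinical_text: str):
    step1 = (
        "Extract all heart-failure-related entities (diseases, symptoms, drugs, "
        f"procedures) from the text below, one per line.\n\nText: {clinical_text}"
    )
    entities = chat([{"role": "user", "content": step1}])

    step2 = (
        "Using the entity list from the previous step, list the relations between "
        "entities as (head, relation, tail) triples.\n\n"
        f"Entities:\n{entities}\n\nText: {clinical_text}"
    )
    triples = chat([{"role": "user", "content": step1},
                    {"role": "assistant", "content": entities},
                    {"role": "user", "content": step2}])
    return entities, triples
```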
Results
Experiments on two datasets show that the TwoStepChat method outperforms the vanilla prompt as well as the fine-tuned BERT-based baselines. Moreover, our method saves 65% of the time compared to manual annotation and is better suited to extracting out-of-distribution information in the real world.
Generative methods greatly promote aspect-based sentiment analysis via generating a sequence of sentiment elements in a specified format. However, existing studies usually predict sentiment elements in a fixed order, which ignores the effect of the interdependence of the elements in a sentiment tuple and the diversity of language expression on the results. In this work, we propose Multi-view Prompting (MvP) that aggregates sentiment elements generated in different orders, leveraging the intuition of human-like problem-solving processes from different views. Specifically, MvP introduces element order prompts to guide the language model to generate multiple sentiment tuples, each with a different element order, and then selects the most reasonable tuples by voting. MvP can naturally model multi-view and multi-task as permutations and combinations of elements, respectively, outperforming previous task-specific designed methods on multiple ABSA tasks with a single model. Extensive experiments show that MvP significantly advances the state-of-the-art performance on 10 datasets of 4 benchmark tasks, and performs quite effectively in low-resource settings. Detailed evaluation verified the effectiveness, flexibility, and cross-task transferability of MvP.
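The multi-view idea in this abstract (generate sentiment tuples under different element orders, then vote) can be sketched compactly as follows. The element names, order-prompt format, and the `generate_tuples` callable are illustrative assumptions rather than MvP's actual implementation.

```python
# Compact sketch of multi-view prompting with voting over element orders.

from itertools import permutations
from collections import Counter

ELEMENTS = ("aspect", "opinion", "sentiment")

def mvp_predict(generate_tuples, sentence: str, n_views: int = 3, min_votes: int = 2):
    votes = Counter()
    for order in list(permutations(ELEMENTS))[:n_views]:
        order_prompt = " -> ".join(order)                  # e.g. "opinion -> aspect -> sentiment"
        for tup in generate_tuples(sentence, order_prompt):  # tuples predicted under this view
            votes[tuple(sorted(tup))] += 1                  # canonicalise so views are comparable
    # Keep tuples that enough views agree on.
    return [t for t, c in votes.items() if c >= min_votes]
```

Voting across element orders plays the role of an ensemble: a tuple predicted consistently under several generation orders is more likely to be correct than one produced by a single fixed order.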