Table 2 - available via license: Creative Commons Attribution 4.0 International
Source publication
With increasing scale, large language models show both quantitative improvements and new qualitative capabilities, especially as zero-shot learners, as exemplified by GPT-3. However, these results rely heavily on delicate prompt design and heavy computation. In this work, we explore whether strong zero-shot ability can be achieved with a smaller model...
Contexts in source publication
Context 1
... evaluate the proposed methods on five kinds of understanding tasks: sentiment classification, natural language inference, question answering, grammar checking, and paraphrasing, as shown in Table 2. Following Wei et al. (2022a), we report test accuracy if a test set is provided by Huggingface Datasets; otherwise, we use the validation set as our test set. ...
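The split-selection rule described above (use the test split when Huggingface Datasets provides one, otherwise fall back to the validation split) could be implemented roughly as in the following sketch. This is not code from the paper; the dataset and config names are placeholders, and a real evaluation script would also need to verify that the chosen test split actually contains usable labels.

from datasets import get_dataset_split_names, load_dataset

def load_eval_split(dataset_name, config_name=None):
    # List the splits Huggingface Datasets provides for this dataset.
    splits = get_dataset_split_names(dataset_name, config_name)
    # Prefer the test split; otherwise use validation as the test set.
    split = "test" if "test" in splits else "validation"
    return load_dataset(dataset_name, config_name, split=split)

# Illustrative usage with placeholder dataset/config names.
eval_set = load_eval_split("glue", "sst2")
print(len(eval_set))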
Context 2
... Wei et al. (2022a), we take the results of GPT-3 (175B), LaMDA-PT (137B), and GLaM (64B) from their respective papers. For OPT (175B), we re-implement the evaluation ourselves, using the same templates as in Table 2 to obtain the results. Note that only the comparison with OPT is strictly fair for the proposed method. ...
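As a rough illustration of that re-implementation step, the sketch below scores zero-shot classification with an OPT checkpoint by comparing the log-likelihood of each candidate answer under a fixed prompt template. The checkpoint (a small OPT model standing in for OPT-175B), the template, and the label verbalizers are all placeholders, not the exact ones from Table 2 of the paper.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # stand-in for OPT (175B)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def score_option(prompt, option):
    # Total log-likelihood the model assigns to `option` after `prompt`.
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given its prefix.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Sum only over the option tokens (those appended after the prompt).
    option_len = full_ids.shape[1] - prompt_ids.shape[1]
    return token_lp[0, -option_len:].sum().item()

# Placeholder template and verbalizers for a sentiment task.
template = 'Review: "{text}"\nIs the review positive or negative?\nAnswer:'
labels = [" positive", " negative"]

def predict(text):
    prompt = template.format(text=text)
    return max(labels, key=lambda lab: score_option(prompt, lab))

print(predict("A moving and beautifully shot film."))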
Similar publications
Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address the problem of reference resolution, when people use natural expressions to choose between real world entities. For example, given the choice `Shoul...
Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning. However, the ICL performance does not scale well with the number of available training samples as it is limited by the inherent input length constraint of the underlying language model. Meanwhile, many studies have re...
Multilingual language models (MLMs) acquire valuable, generalizable linguistic information during pretraining and have advanced the state of the art on task-specific finetuning. So far, only ~28 out of ~2,000 African languages are covered in existing language models. We ameliorate this limitation by developing SERENGETI, a set of massively multili...
Large language models having hundreds of millions, and even billions, of parameters have performed extremely well on a variety of natural language processing (NLP) tasks. Their widespread use and adoption, however, is hindered by the lack of availability and portability of sufficiently large computational resources. This paper proposes a knowledge...