Source publication
Preprint
Full-text available
With increasing scale, large language models such as GPT-3 demonstrate both quantitative improvements and new qualitative capabilities, especially as zero-shot learners. However, these results rely heavily on delicate prompt design and large-scale computation. In this work, we explore whether strong zero-shot ability can be achieved at a smaller model...

Contexts in source publication

Context 1
... evaluate the proposed methods on five kinds of understanding tasks: sentiment classification, natural language inference, question answering, grammar checking, and paraphrasing, as shown in Table 2. Following Wei et al. (2022a), we report the test accuracy if a test set is provided by Hugging Face Datasets; otherwise, we use the validation set as our test set. ...
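The split-selection rule quoted above is simple to reproduce. Below is a minimal sketch, assuming the Hugging Face datasets library; the benchmark identifier in the usage comment is hypothetical, not one of the paper's actual task names.

from datasets import load_dataset

def pick_eval_split(name, config=None):
    # Prefer the test split when the benchmark ships one;
    # otherwise fall back to the validation split, mirroring
    # the evaluation protocol described above.
    ds = load_dataset(name, config)
    return ds["test"] if "test" in ds else ds["validation"]

# Illustrative usage with a hypothetical benchmark identifier:
# eval_set = pick_eval_split("some_benchmark", "some_config")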
Context 2
... Wei et al. (2022a), we take the results of GPT-3 (175B), LaMDA-PT (137B), and GLaM (64B) from their respective papers. For OPT (175B), we re-implement the evaluation ourselves and use the same templates in Table 2 to obtain the results. Note that only OPT offers a strictly fair comparison with the proposed method. ...
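One common way to carry out such a zero-shot re-evaluation with OPT is to score each candidate label under the language model and pick the highest-likelihood option. The sketch below is only an illustration under stated assumptions: it uses the Hugging Face transformers library, a smaller OPT checkpoint as a stand-in for OPT-175B, and a generic sentiment template with placeholder label words rather than the paper's actual templates from its Table 2.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "facebook/opt-1.3b"  # smaller stand-in for OPT-175B
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def option_logprob(prompt, option):
    # Sum the log-probabilities of the option tokens conditioned on the prompt.
    # Assumes the prompt's tokenization is a prefix of tokenize(prompt + option),
    # which typically holds for whitespace-separated options with OPT's tokenizer.
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)  # predicts tokens 1..T-1
    targets = full_ids[0, 1:]
    start = prompt_len - 1  # position that predicts the first option token
    return logprobs[start:].gather(1, targets[start:, None]).sum().item()

def zero_shot_predict(text, options, template="{text} It was"):
    # Rank the candidate labels by their likelihood under the model.
    prompt = template.format(text=text)
    return max(options, key=lambda o: option_logprob(prompt, " " + o))

print(zero_shot_predict("A touching and well-acted film.", ["great", "terrible"]))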

Similar publications

Preprint
Full-text available
Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address the problem of reference resolution when people use natural expressions to choose between real-world entities. For example, given the choice `Shoul...
Preprint
Full-text available
Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning. However, the ICL performance does not scale well with the number of available training samples as it is limited by the inherent input length constraint of the underlying language model. Meanwhile, many studies have re...
Preprint
Full-text available
Multilingual language models (MLMs) acquire valuable, generalizable linguistic information during pretraining and have advanced the state of the art on task-specific finetuning. So far, only ~28 out of ~2,000 African languages are covered in existing language models. We ameliorate this limitation by developing SERENGETI, a set of massively multili...
Preprint
Full-text available
Large language models with hundreds of millions, or even billions, of parameters have performed extremely well on a variety of natural language processing (NLP) tasks. Their widespread use and adoption, however, are hindered by the lack of availability and portability of sufficiently large computational resources. This paper proposes a knowledge...