Average results and standard deviation (in parentheses) over 5 runs on low-resource data in GLUE. ∆ shows the absolute difference between the results of the KL-Norm model and BERT-base.
Source publication
Large pre-trained models, such as BERT, GPT, and Wav2Vec, have demonstrated great potential for learning representations that are transferable to a wide variety of downstream tasks. It is difficult to obtain a large quantity of supervised data due to the limited availability of resources and time. In light of this, a significant amount of research...
Context in source publication
Context 1
... Benchmarks: We have used three low-resource datasets, including MRPC and RTE, for the evaluation of the proposed method (Table 1). Low-resource datasets of varying size: We have used four large NLP datasets, namely SNLI, MNLI, QNLI, and YELP, and subsampled them using random seeds. We then evaluated performance on the NLI datasets under varying sizes of training data (200, 400, 600, 800, and 1000 samples). ...
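The subsampling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Hugging Face `datasets` library, uses SNLI as a stand-in dataset, and assumes 5 seeds (matching the 5 runs reported in the table); the helper names are hypothetical.

```python
# Minimal sketch of low-resource subsampling: draw fixed-size training
# subsets from a large dataset with different random seeds, so results
# can later be averaged over runs. Dataset name and seed count are assumptions.
from datasets import load_dataset

SUBSET_SIZES = [200, 400, 600, 800, 1000]  # training sizes from the text
SEEDS = [0, 1, 2, 3, 4]                    # one seed per run (5 runs assumed)

def low_resource_subsets(dataset_name="snli", split="train"):
    """Yield (size, seed, subset) triples for low-resource fine-tuning."""
    full = load_dataset(dataset_name, split=split)
    for size in SUBSET_SIZES:
        for seed in SEEDS:
            # Shuffle with a fixed seed, then keep the first `size` examples.
            subset = full.shuffle(seed=seed).select(range(size))
            yield size, seed, subset

if __name__ == "__main__":
    for size, seed, subset in low_resource_subsets():
        print(f"size={size} seed={seed} -> {len(subset)} training examples")
        # fine_tune_and_evaluate(subset)  # model training/evaluation would go here
```

Each (size, seed) pair yields one training subset; averaging the evaluation scores over the seeds gives the mean and standard deviation reported per training size.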
Similar publications
Transfer learning is beneficial because it allows the expressive features of models pretrained on large-scale datasets to be fine-tuned for target tasks on smaller, more domain-specific datasets. However, there is a concern that these pretrained models may come with their own biases, which would propagate into the fine-tuned model. In this work, we inves...