Validation and training losses of KL-Norm for varying β and a fixed bottleneck size on GLUE.
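The sweep in the figure varies the weight β on a KL regularizer applied to a fixed-size bottleneck. As a rough, hypothetical sketch (not the paper's implementation), such an objective can be written as a task loss plus a β-weighted KL term that pulls a diagonal-Gaussian bottleneck q(z|x) = N(μ(x), σ(x)²) toward a standard normal prior; the function and argument names below are illustrative assumptions:

import torch
import torch.nn.functional as F

def kl_norm_loss(task_logits, labels, mu, log_var, beta):
    """Hypothetical sketch: task loss plus a beta-weighted KL term that
    regularizes a diagonal-Gaussian bottleneck q(z|x) = N(mu, exp(log_var))
    toward the standard normal prior N(0, I)."""
    task_loss = F.cross_entropy(task_logits, labels)
    # Analytic KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over the
    # bottleneck dimensions and averaged over the batch.
    kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum(dim=-1).mean()
    return task_loss + beta * kl

Sweeping β, as in the figure, then interpolates between an unregularized bottleneck (β = 0) and one collapsed onto the prior (large β), which is the trade-off the validation and training loss curves track.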

Source publication
Large pre-trained models, such as BERT, GPT, and Wav2Vec, have demonstrated great potential for learning representations that are transferable to a wide variety of downstream tasks. Obtaining a large quantity of supervised data is difficult due to limited resources and time. In light of this, a significant amount of research...

Similar publications

Transfer learning is beneficial because it allows the expressive features of models pretrained on large-scale datasets to be finetuned on smaller, more domain-specific target datasets. However, there is a concern that these pretrained models may carry their own biases, which would propagate into the finetuned model. In this work, we inves...