Figure 1: BERT architecture [1] (available via license: Creative Commons Zero 1.0)

Source publication (preprint, full-text available)
Pre-trained language models such as BERT are known to perform exceedingly well on various NLP tasks and have even established new State-Of-The-Art (SOTA) benchmarks for many of these tasks. Owing to its success on various tasks and benchmark datasets, industry practitioners have started to explore BERT to build applications solving industry use cas...
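As context for the kind of application the preprint describes, below is a minimal sketch of fine-tuning a pre-trained BERT checkpoint for sentence classification with the Hugging Face transformers library. The model name, toy data, labels, and hyperparameters are illustrative assumptions, not details taken from the publication.

```python
# Minimal sketch (illustrative, not the preprint's setup): fine-tune a pre-trained
# BERT checkpoint for binary sentence classification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Toy batch standing in for an industry dataset (assumed data).
texts = ["The product works as advertised.", "Support never answered my ticket."]
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**inputs, labels=labels)  # cross-entropy loss on the classification head
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```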

Similar publications

Preprint (full-text available)
Fine-tuning contextualized representations learned by pre-trained language models has become a standard practice in the NLP field. However, pre-trained representations are prone to degradation (also known as representation collapse) during fine-tuning, which leads to instability, suboptimal performance, and weak generalization. In this paper, we pr...
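To make the notion of representation degradation concrete, the sketch below compares [CLS] embeddings of the same sentences under the original pre-trained encoder and a fine-tuned one; larger drift suggests the pre-trained features have been distorted. This is only an assumed way of measuring representation drift, not the method proposed in the publication, and "finetuned-checkpoint" is a hypothetical local path.

```python
# Sketch (illustrative only): quantify how far fine-tuned representations have
# drifted from the pre-trained ones by comparing [CLS] vectors for the same inputs.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
pretrained = AutoModel.from_pretrained("bert-base-uncased").eval()
finetuned = AutoModel.from_pretrained("finetuned-checkpoint").eval()  # hypothetical path

sentences = [
    "Fine-tuning can distort pre-trained features.",
    "Representation collapse hurts generalization.",
]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    cls_pre = pretrained(**batch).last_hidden_state[:, 0]  # [CLS] vectors, pre-trained
    cls_ft = finetuned(**batch).last_hidden_state[:, 0]    # [CLS] vectors, fine-tuned

drift = 1 - torch.nn.functional.cosine_similarity(cls_pre, cls_ft, dim=-1)
print(drift)  # higher values indicate stronger deviation from the original representations
```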