November 2023
·
51 Reads
IEEE Transactions on Pattern Analysis and Machine Intelligence
Transformer models have achieved outstanding results on a variety of language tasks, such as text classification, ma- chine translation, and question answering. This success in the field of Natural Language Processing (NLP) has sparked interest in the computer vision community to apply these models to vision and multi-modal learning tasks. However, visual data has a unique structure, requiring the need to rethink network designs and training methods. As a result, Transformer models and their variations have been suc- cessfully used for image recognition, object detection, seg- mentation, image super-resolution, video understanding, image generation, text-image synthesis, and visual question answering, among other applications.