Augment-CLIP systems performance.

Source publication
Preprint
This paper describes our zero-shot approaches for the Visual Word Sense Disambiguation (VWSD) Task in English. Our preliminary study shows that the simple approach of matching candidate images with the phrase using CLIP suffers from the many-to-many nature of image-text pairs. We find that the CLIP text encoder may have limited abilities in capturi...

Contexts in source publication

Context 1
... then rank the candidate images based on the new probability in descending order, with the highest probability candidate image being the predicted image from the ensembled model. See Table 1. ...
Context 2
... Augment-CLIP does not outperform Base-CLIP, often because of poor translation. Interestingly, it is complementary enough to Base-CLIP and to other Augment-CLIP systems that ensembling them improves performance. See results in Table 1. ...
Context 3
... the organizers' baseline uses CLIP-ViT-large-patch14-336, an even larger model, which improved performance on the test data. See Table 1. This raises the question of how different Base-CLIP embeddings affect performance on this task, which is outside the scope of this paper because we take the Base-CLIP embedding as a given in our systems. ...
Context 4
... adding Chinese translation to the ensemble (ensemble(B-CLIP, zh, k2t 2)), the test-set hit rate increases from 59.18 to 63.71 and the test-set MRR increases from 73.21 to 76.11. See Table 1. ...
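
The contexts above mention ranking candidate images by an ensembled probability and reporting hit rate and MRR. The excerpt does not show how the paper combines the per-system probabilities, so the sketch below assumes a plain average. The function names (`ensemble_rank`, `hit_rate`, `mrr`) and the toy image names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def ensemble_rank(prob_sets: Sequence[Dict[str, float]]) -> List[str]:
    """Rank candidates by their mean probability across systems, highest first.

    ASSUMPTION: systems are combined by unweighted averaging; the source
    excerpt does not state the actual combination rule.
    """
    candidates = list(prob_sets[0].keys())
    averaged = {
        c: sum(probs[c] for probs in prob_sets) / len(prob_sets)
        for c in candidates
    }
    return sorted(candidates, key=lambda c: averaged[c], reverse=True)


def hit_rate(rankings: Sequence[List[str]], gold: Sequence[str]) -> float:
    """Fraction of instances whose top-ranked candidate is the gold image."""
    return sum(r[0] == g for r, g in zip(rankings, gold)) / len(gold)


def mrr(rankings: Sequence[List[str]], gold: Sequence[str]) -> float:
    """Mean reciprocal rank of the gold image (rank 1 is the top position)."""
    return sum(1.0 / (r.index(g) + 1) for r, g in zip(rankings, gold)) / len(gold)


# Toy data: two systems scoring the same three candidate images.
base_clip = {"a.jpg": 0.5, "b.jpg": 0.3, "c.jpg": 0.2}
augment_clip = {"a.jpg": 0.2, "b.jpg": 0.6, "c.jpg": 0.2}

ranking = ensemble_rank([base_clip, augment_clip])
all_rankings = [ranking, ["a.jpg", "b.jpg", "c.jpg"]]
gold_images = ["b.jpg", "b.jpg"]

print(ranking)
print(hit_rate(all_rankings, gold_images), mrr(all_rankings, gold_images))
```

These functions return fractions in [0, 1]. The figures quoted above (for example 59.18 and 73.21) appear to be on a 0 to 100 scale, so multiply by 100 before comparing.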
