Figure 3 - uploaded by Farhad Nooralahzadeh
The code-mixed question: a set of 4 words is randomly selected and replaced with their translations, drawn from the MUSE bilingual dictionaries (Lample et al. 2018), into 4 randomly selected xGQA target languages.

Source publication
Conference Paper
Full-text available
While several benefits have been realized for multilingual vision-language pretrained models, recent benchmarks across various tasks and languages show poor cross-lingual generalisation when these models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross...

Contexts in source publication

Context 1
... the word has multiple translations in the target language, then one of them is randomly selected for replacement. To increase data diversity during training, Qin et al. (2020) propose to reset the replacements after each epoch, so that different words are replaced in different epochs. Figure 3 shows the result of applying the code-mixing procedure to our example. ...
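As an illustration, the code-mixing step described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the `muse_dicts` dictionaries and the example question are placeholders standing in for the MUSE bilingual dictionaries (Lample et al. 2018) and an xGQA question.

```python
import random

# Illustrative stand-in for MUSE bilingual dictionaries: source word -> translations.
# In practice these would be loaded from the MUSE dictionary files.
muse_dicts = {
    "de": {"is": ["ist"], "the": ["der", "die", "das"], "color": ["farbe"]},
    "ru": {"what": ["что"], "color": ["цвет"]},
    "zh": {"fruit": ["水果"], "what": ["什么"]},
    "id": {"hanging": ["menggantung"], "the": ["itu"]},
}

def code_mix(question, target_langs, num_words=4, seed=None):
    """Randomly pick up to `num_words` tokens and replace each with a random
    dictionary translation into a randomly chosen target language."""
    rng = random.Random(seed)
    tokens = question.split()
    positions = rng.sample(range(len(tokens)), k=min(num_words, len(tokens)))
    for pos in positions:
        lang = rng.choice(target_langs)
        translations = muse_dicts[lang].get(tokens[pos].lower())
        if translations:  # words absent from the dictionary are left untouched
            tokens[pos] = rng.choice(translations)  # random pick among multiple translations
    return " ".join(tokens)

print(code_mix("what color is the hanging fruit", ["de", "ru", "zh", "id"], seed=0))
```

Resetting the replacement after each epoch, as Qin et al. (2020) suggest, then simply amounts to calling `code_mix` again with a fresh random seed at the start of every epoch.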

Similar publications

Preprint
Full-text available
Language models are an integral part of various lossless text compression methods, especially arithmetic coding. While statistical models are usually preferred due to their low latency and low memory requirements, it is possible to attain high compression rates with neural language models. Modern large language models (LLMs) have pushed the boundar...
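The link between a language model and arithmetic coding can be made concrete with a toy example: an ideal arithmetic coder spends about -log2 p(symbol | context) bits per symbol, so the model's cross-entropy directly bounds the achievable compression. The sketch below is only an illustration with a character-level bigram model, not the preprint's method.

```python
import math
from collections import defaultdict

def train_bigram(text):
    """Toy character-level bigram model (counts of next char given previous char)."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, cur in zip(text, text[1:]):
        counts[prev][cur] += 1
    return counts

def prob(counts, prev, cur, alphabet_size):
    """Add-one-smoothed conditional probability p(cur | prev)."""
    total = sum(counts[prev].values()) + alphabet_size
    return (counts[prev][cur] + 1) / total

def ideal_code_length_bits(text, counts, alphabet_size):
    """Ideal arithmetic-coding length: sum of -log2 p(symbol | context)."""
    return sum(-math.log2(prob(counts, prev, cur, alphabet_size))
               for prev, cur in zip(text, text[1:]))

corpus = "the quick brown fox jumps over the lazy dog " * 20
model = train_bigram(corpus)
bits = ideal_code_length_bits(corpus, model, alphabet_size=len(set(corpus)))
print(f"estimated compressed size: {bits / (8 * len(corpus)):.3f} x original")
```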
Preprint
Full-text available
Research interest in task-oriented dialogs has increased as systems such as Google Assistant, Alexa and Siri have become ubiquitous in everyday life. However, the impact of academic research in this area has been limited by the lack of datasets that realistically capture the wide array of user pain points. To enable research on some of the more cha...
Article
Full-text available
At the crossroad between German and Italian worlds, merchants in Bolzano (South Tyrol, Italy) played a crucial role in promoting multilingualism. This work will explore the language management of merchants’ life as emerging from the documents of their official institution, the so-called Magistrato Mercantile (1635-1851). For this first linguistic s...
Preprint
Full-text available
We present Vārta, a large-scale multilingual dataset for headline generation in Indic languages. This dataset includes 41.8 million news articles in 14 different Indic languages (and English), which come from a variety of high-quality sources. To the best of our knowledge, this is the largest collection of curated articles for Indic languages cur...

Citations

... Sparse fine-tuning (Dao et al., 2022; Thangarasa et al., 2023) is a technique for adapting LLMs to specific tasks or datasets while only updating a small subset of the model's parameters. Our approach shares similarities with sparse fine-tuning, a technique commonly used in multilingual modeling (Nooralahzadeh & Sennrich, 2023; Choenni et al., 2024), where languages are typically assumed to be encoded modularly. Sparse fine-tuning identifies specific components (e.g., heads), fine-tunes all their parameters, and discards the others. ...
Preprint
Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, are commonly used to adapt LLMs. However, the effectiveness of standard PEFT methods is limited in low-resource scenarios with only a few hundred examples. Recent advances in interpretability research have inspired the emergence of activation editing techniques, which modify the activations of specific model components. These methods, due to their extremely small parameter counts, show promise for small datasets. However, their performance is highly dependent on identifying the correct modules to edit and often lacks stability across different datasets. In this paper, we propose Joint Localization and Activation Editing (JoLA), a method that jointly learns (1) which heads in the Transformer to edit, (2) whether the intervention should be additive, multiplicative, or both, and (3) the intervention parameters themselves - the vectors applied as additive offsets or multiplicative scalings to the head output. Through evaluations on three benchmarks spanning commonsense reasoning, natural language understanding, and natural language generation, we demonstrate that JoLA consistently outperforms existing methods.
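The kind of head-level intervention described here can be pictured with a short PyTorch sketch. This is a hedged illustration of an additive/multiplicative edit on one head's output, not the JoLA implementation; in particular, the joint learning of which heads to edit (the gating part) is omitted, and `HeadActivationEdit` and the tensor shapes are made up for the example.

```python
import torch
import torch.nn as nn

class HeadActivationEdit(nn.Module):
    """Learnable per-head intervention on a head's output: h' = scale * h + offset.
    Training only `offset` gives an additive edit, only `scale` a multiplicative one,
    and both together the combined variant."""
    def __init__(self, head_dim, additive=True, multiplicative=True):
        super().__init__()
        self.offset = nn.Parameter(torch.zeros(head_dim), requires_grad=additive)
        self.scale = nn.Parameter(torch.ones(head_dim), requires_grad=multiplicative)

    def forward(self, head_output):              # (batch, seq_len, head_dim)
        return self.scale * head_output + self.offset

edit = HeadActivationEdit(head_dim=64)
h = torch.randn(2, 10, 64)                       # dummy head output
print(edit(h).shape)                             # torch.Size([2, 10, 64])
```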
... It involves constructing scene graphs by extracting triples of subjects, objects, and their relationships [1]. These graphs have wide-ranging applications, including image retrieval [2,3], visual reasoning [4,5], visual question answering [6], image captioning [7,8], and robotics [9,10]. ...
Article
Full-text available
The generation of panoramic scene graphs represents a cutting-edge challenge in image scene understanding, necessitating sophisticated predictions of both intra-object relationships and interactions between objects and their backgrounds. This complexity tests the limits of current predictive models' ability to discern nuanced relationships within images. Conventional approaches often fail to effectively combine visual and semantic data, leading to predictions that are semantically impoverished. To address these issues, we propose a novel method of semantic-enhanced panoramic scene graph generation through hybrid and axial attentions (PSGAtten). Specifically, a series of hybrid attention networks are stacked within both the object context encoding and relationship context encoding modules, enhancing the refinement and fusion of visual and semantic information. Within the hybrid attention networks, self-attention mechanisms facilitate feature refinement within modalities, while cross-attention mechanisms promote feature fusion across modalities. The axial attention model is further applied to enhance the integration ability of global information. Experimental validation on the PSG dataset confirms that our approach not only surpasses existing methods in generating detailed panoramic scene graphs but also significantly improves recall rates, thereby enhancing the ability to predict relationships in scene graph generation.
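A hybrid attention block of the general kind described (self-attention within a modality, cross-attention across modalities) can be sketched as follows. This is a simplified illustration, not the PSGAtten architecture; the module name, dimensions, and the absence of feed-forward sublayers and axial attention are all simplifications for the example.

```python
import torch
import torch.nn as nn

class HybridAttentionBlock(nn.Module):
    """Self-attention refines features within one modality; cross-attention then
    fuses them with features from the other modality (e.g. visual vs. semantic)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, other):
        # Within-modality refinement.
        x = self.norm1(x + self.self_attn(x, x, x)[0])
        # Cross-modality fusion: queries from x, keys/values from the other modality.
        x = self.norm2(x + self.cross_attn(x, other, other)[0])
        return x

visual = torch.randn(2, 36, 256)     # e.g. object region features
semantic = torch.randn(2, 36, 256)   # e.g. label/word embeddings
print(HybridAttentionBlock()(visual, semantic).shape)   # torch.Size([2, 36, 256])
```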
... Knowledge Localization Pruning methods have been used to uncover subnetworks, i.e. subsets of model parameters (Frankle and Carbin, 2018) isolating task-specific (Nooralahzadeh and Sennrich, 2023), domain-specific (Hendy et al., 2022) or language-specific (Wang et al., 2020; Choenni et al., 2023a,b; Nooralahzadeh and Sennrich, 2023) information. In this paper, we use pruning to find subnetworks that contain stereotypical gender bias. ...
Preprint
Full-text available
Stereotypical bias encoded in language models (LMs) poses a threat to safe language technology, yet our understanding of how bias manifests in the parameters of LMs remains incomplete. We introduce local contrastive editing that enables the localization and editing of a subset of weights in a target model in relation to a reference model. We deploy this approach to identify and modify subsets of weights that are associated with gender stereotypes in LMs. Through a series of experiments, we demonstrate that local contrastive editing can precisely localize and control a small subset (< 0.5%) of weights that encode gender bias. Our work (i) advances our understanding of how stereotypical biases can manifest in the parameter space of LMs and (ii) opens up new avenues for developing parameter-efficient strategies for controlling model properties in a contrastive manner.
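The localization step can be illustrated by contrasting the weights of a target model with those of a reference model and keeping only the small fraction that differs most. The selection criterion below (largest absolute weight difference) is an illustrative stand-in, not necessarily the paper's criterion, and the 0.5% fraction simply mirrors the figure quoted in the abstract.

```python
import torch

def localize_contrastive_weights(target_model, reference_model, fraction=0.005):
    """Return boolean masks marking the `fraction` of weights whose values
    differ most between the target and the reference model (same architecture assumed)."""
    names, diffs, shapes = [], [], []
    for (name, p_t), (_, p_r) in zip(target_model.named_parameters(),
                                     reference_model.named_parameters()):
        names.append(name)
        diffs.append((p_t.detach() - p_r.detach()).abs().flatten())
        shapes.append(p_t.shape)
    all_diffs = torch.cat(diffs)
    k = max(1, int(fraction * all_diffs.numel()))
    threshold = torch.topk(all_diffs, k).values.min()    # cutoff for the top-k differences
    return {name: (d >= threshold).reshape(shape)
            for name, d, shape in zip(names, diffs, shapes)}
```

Once such a mask is available, editing could, for instance, interpolate the selected weights back toward the reference model while leaving the rest of the network untouched.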
... For visual question answering in particular, there has been work dedicated to reducing the cross-lingual performance gap. Nooralahzadeh and Sennrich (2023) observed that high ambiguity in the label space makes learning more difficult and attempted to remedy this with several strategies, including the addition of a similarity-based loss to the standard classification cross-entropy loss, code-switching at the instance level, and a sparse fine-tuning approach. Liu et al. (2023a) reduce the ZS-XLT performance gap by replacing the standard single-layer classifier with a deeper two-layer architecture. ...
... It is worth noting that all three question types effectively have only 'yes' and 'no' as answer labels. This means that SUF is not improving ZS-XLT by reducing label space ambiguity (as, e.g., Nooralahzadeh and Sennrich (2023) do), but rather by preventing early overfitting to English-specific idiosyncrasies in the questions. Expectedly, all models generally exhibit the lowest performance on the open-ended Query questions, which account for the largest portion of the xGQA data. ...
... Since then, it has been shown that such subnetworks exist within BERT models (Prasanna et al., 2020; Budhraja et al., 2021), and that both language-neutral and language-specific subnetworks can be found in multilingual LMs (Foroutan et al., 2022). Hence, sparse training gained popularity in multilingual NLP: Nooralahzadeh and Sennrich (2023) show that training task-specific subnetworks can help in cross-lingual transfer, Lin et al. (2021) use language-pair-specific subnetworks for neural machine translation, and Hendy et al. (2022) use domain-specific subnetworks. Finally, Wang et al. (2020), Lu et al. (2022), and Choenni et al. (2023a) use language-specific subnetworks to improve cross-lingual performance on a range of tasks, e.g. ...
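The pruning-based recipe referenced in these works can be summarized in a short sketch: derive a binary mask over the weights (here by simple magnitude pruning, which is only one of several criteria used in the cited papers) and then fine-tune only the weights inside that subnetwork by masking the gradients. Function names and the sparsity level are illustrative.

```python
import torch

def magnitude_mask(model, sparsity=0.7):
    """Keep the largest-magnitude (1 - sparsity) fraction of each weight matrix."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:                      # leave biases and norm parameters dense
            continue
        k = max(1, int((1 - sparsity) * p.numel()))
        threshold = torch.topk(p.detach().abs().flatten(), k).values.min()
        masks[name] = p.detach().abs() >= threshold
    return masks

def apply_sparse_grads(model, masks):
    """Zero out gradients outside the subnetwork so only masked weights get updated."""
    for name, p in model.named_parameters():
        if name in masks and p.grad is not None:
            p.grad.mul_(masks[name].to(p.grad.dtype))
```

In a training loop, `apply_sparse_grads` would be called after `loss.backward()` and before `optimizer.step()`, so that the optimizer only ever moves the parameters belonging to the chosen subnetwork.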
... In this article, we propose novel methods for using language-specific subnetworks, which control cross-lingual parameter sharing, to reduce conflicts and increase positive transfer during fine-tuning, with the goal of improving the performance of multilingual language models on low-resource languages. While recent works apply various subnetwork-based approaches to their models statically (Lu et al. 2022; Yang et al. 2022; Nooralahzadeh and Sennrich 2022), we propose a new method that allows the model to dynamically update the subnetworks during fine-tuning. This allows for sharing between language pairs to a different extent at the different learning stages of the models. ...
... Analyzing and training shared subnetworks. The idea of sharing through sparse subnetworks was first proposed for multi-task learning (Sun et al. 2020), and was recently studied in the multilingual setting: Foroutan et al. (2022) show that both language-neutral and language-specific subnetworks exist in multilingual models, and Nooralahzadeh and Sennrich (2022) show that training task-specific subnetworks can help in cross-lingual transfer as well. ...
Article
Full-text available
Large multilingual language models typically share their parameters across all languages, which enables cross-lingual task transfer, but learning can also be hindered when training updates from different languages are in conflict. In this article, we propose novel methods for using language-specific subnetworks, which control cross-lingual parameter sharing, to reduce conflicts and increase positive transfer during fine-tuning. We introduce dynamic subnetworks, which are jointly updated with the model, and we combine our methods with meta-learning, an established, but complementary, technique for improving cross-lingual transfer. Finally, we provide extensive analyses of how each of our methods affects the models.
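The distinguishing idea in this work is that the subnetworks are updated during fine-tuning rather than fixed beforehand. A minimal, self-contained illustration of that dynamic flavour is sketched below on a toy linear model, where per-language masks are simply re-derived from the current weight magnitudes each epoch; the paper's actual mask-update rule and meta-learning components are not reproduced here.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                                  # toy stand-in for a language model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
languages = ["de", "ru"]

def magnitude_mask(module, keep=0.3):
    """Boolean mask keeping the largest-magnitude `keep` fraction of the weights."""
    w = module.weight.detach().abs().flatten()
    k = max(1, int(keep * w.numel()))
    threshold = torch.topk(w, k).values.min()
    return module.weight.detach().abs() >= threshold

for epoch in range(3):
    # Dynamic variant: masks are recomputed from the current weights every epoch.
    masks = {lang: magnitude_mask(model) for lang in languages}
    for lang in languages:
        x, y = torch.randn(8, 16), torch.randn(8, 4)      # dummy batch for this language
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        grad_mask = masks[lang].to(model.weight.grad.dtype)
        model.weight.grad.mul_(grad_mask)                 # update only this language's subnetwork
        optimizer.step()
```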
Article
Visual Question Answering (VQA) is a challenging task at the intersection of natural language processing (NLP) and computer vision (CV) that has attracted significant attention from researchers. English is a resource-rich language with extensive datasets and models for visual question answering, while resources and models for VQA in other languages remain scarce. In addition, no multilingual dataset targets the visual content of a particular country, with its own objects and cultural characteristics. To address this gap, we provide the research community with a benchmark dataset named EVJVQA, including 33,000+ question-answer pairs in three languages (Vietnamese, English, and Japanese) over approximately 5,000 images taken in Vietnam, for evaluating multilingual VQA systems or models. EVJVQA was used as the benchmark dataset for the multilingual visual question answering challenge at the 9th Workshop on Vietnamese Language and Speech Processing (VLSP 2022), which attracted 62 participating teams from various universities and organizations. In this article, we present the organization of the challenge, an overview of the methods employed by the shared-task participants, and the results. The highest performances on the private test set are 0.4392 in F1-score and 0.4009 in BLEU. The multilingual QA systems proposed by the top two teams use ViT as the pre-trained vision model and mT5, a powerful transformer-based pre-trained language model, as the language model. EVJVQA is a challenging dataset that motivates NLP and CV researchers to further explore multilingual models and systems for visual question answering.
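As a rough idea of how answers in such a shared task can be scored, the snippet below computes a token-overlap F1 between predicted and reference answers. This is a generic implementation, not the official EVJVQA scorer: the whitespace tokenization is a simplification (Japanese in particular would normally need a dedicated tokenizer), and BLEU would typically be computed with a standard toolkit such as sacrebleu.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)   # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

predictions = ["hai chiếc xe máy", "a red umbrella"]
references  = ["hai chiếc xe máy màu đỏ", "red umbrella"]
print(sum(token_f1(p, r) for p, r in zip(predictions, references)) / len(predictions))
```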