Albert Gatt

University of Malta · Institute of Linguistics

PhD

About

164
Publications
26,157
Reads
3,675
Citations
Introduction
My main interests are computational linguistics and experimental psycholinguistics. I focus mostly on automatic generation of text from non-linguistic data, and on language production. I also work on digital language resources and tools for small languages, in particular Maltese.
Additional affiliations
November 2004 - September 2009
University of Aberdeen
Position
  • Research Associate
September 2014 - present
Tilburg University
Position
  • Research Associate

Publications (164)
Preprint
Full-text available
In Natural Language Generation (NLG), important information is sometimes omitted in the output text. To better understand and analyse how this type of mistake arises, we focus on RDF-to-Text generation and explore two methods of probing omissions in the encoder output of BART (Lewis et al, 2020) and of T5 (Raffel et al, 2019): (i) a novel parameter...
Preprint
Full-text available
This study investigates the ability of various vision-language (VL) models to ground context-dependent and non-context-dependent verb phrases. To do that, we introduce the CV-Probes dataset, designed explicitly for studying context understanding, containing image-caption pairs with context-dependent verbs (e.g., "beg") and non-context-dependent ver...
Preprint
Automatic metrics are extensively used to evaluate natural language processing systems. However, there has been increasing focus on how they are used and reported by practitioners within the field. In this paper, we have conducted a survey on the use of automatic metrics, focusing particularly on natural language generation (NLG) tasks. We inspect...
Preprint
Full-text available
Various benchmarks have been proposed to test linguistic understanding in pre-trained vision & language (VL) models. Here we build on the existence task from the VALSE benchmark (Parcalabescu et al, 2022) which we use to test models' understanding of negation, a particularly interesting issue for multimodal models. However, while such VL benchmark...
Article
Full-text available
Among the existing eXplainable AI (XAI) approaches, Feature Attribution methods are a popular option due to their interpretable nature. However, each method leads to a different solution, thus introducing uncertainty regarding their reliability and coherence with respect to the underlying model. This work introduces TextFocus, a metric for evalu...
Article
Full-text available
When applied to Image-to-text models, explainability methods face two challenges. First, they often provide token-by-token explanations; that is, they compute a visual explanation for each token of the generated sequence. This makes explanations expensive to compute and unable to comprehensively explain the model's output. Second, for models with vis...
Preprint
Full-text available
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable...
Preprint
Full-text available
When applied to Image-to-text models, interpretability methods often provide token-by-token explanations; that is, they compute a visual explanation for each token of the generated sequence. Those explanations are expensive to compute and unable to comprehensively explain the model's output. Therefore, these models often require some sort of approxim...
Article
Full-text available
Situational context is crucial for linguistic reference to visible objects, since the same description can refer unambiguously to an object in one context but be ambiguous or misleading in others. This also applies to Referring Expression Generation (REG), where the production of identifying descriptions is always dependent on a given context. Re...
Preprint
Full-text available
Current captioning datasets focus on object-centric captions, describing the visible objects in the image, often ending up stating the obvious (for humans), e.g. "people eating food in a park". Although these datasets are useful to evaluate the ability of Vision & Language models to recognize the visual content, they lack in expressing trivial abs...
Preprint
Full-text available
Image captioning models tend to describe images in an object-centric way, emphasising visible objects. But image descriptions can also abstract away from objects and describe the type of scene depicted. In this paper, we explore the potential of a state-of-the-art Vision and Language model, VinVL, to caption images at the scene level using (1) a no...
Conference Paper
Full-text available
Natural language techniques have been employed in attempts to automatically translate legal texts, and specifically contracts, into formal models that allow automatic reasoning. However, such techniques suffer from incomplete coverage, typically resulting in parts of the text being left uninterpreted, and which, in turn, may result in the formal mo...
Conference Paper
In this work we link the understandability of machine learning models to the complexity of their SHapley Additive exPlanations (SHAP). Thanks to this reframing we introduce two novel metrics for understandability: SHAP Length and SHAP Interaction Length. These are model-agnostic, efficient, intuitive and theoretically grounded metrics that are anch...
Preprint
Current image description generation models do not transfer well to the task of describing human faces. To encourage the development of more human-focused descriptions, we developed a new data set of facial descriptions based on the CelebA image data set. We describe the properties of this data set, and present results from a face description gener...
Preprint
Multilingual language models such as mBERT have seen impressive cross-lingual transfer to a variety of languages, but many languages remain excluded from these models. In this paper, we analyse the effect of pre-training with monolingual data for a low-resource language that is not included in mBERT -- Maltese -- with a range of pre-training set up...
Article
Full-text available
Twitter sentiment has been shown to be useful in predicting whether Bitcoin’s price will increase or decrease. Yet the state-of-the-art is limited to predicting the price direction and not the magnitude of increase/decrease. In this paper, we seek to build on the state-of-the-art to predict not only the direction but also the magnitude o...
Article
Full-text available
Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, the advances in deep learning over the past couple of year...
Preprint
Full-text available
We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena. VALSE offers a suite of six tests covering various linguistic constructs. Solving these requires models t...
Preprint
Full-text available
Developing speech technologies is a challenge for low-resource languages for which both annotated and raw speech data is sparse. Maltese is one such language. Recent years have seen an increased interest in the computational processing of Maltese, including speech technologies, but resources for the latter remain sparse. In this paper, we consider...
Preprint
Full-text available
Images can be described in terms of the objects they contain, or in terms of the types of scene or place that they instantiate. In this paper we address to what extent pretrained Vision and Language models can learn to align descriptions of both types with images. We compare 3 state-of-the-art models, VisualBERT, LXMERT and CLIP. We find that (i) V...
Preprint
Recent work has shown evidence that the knowledge acquired by multilingual BERT (mBERT) has two components: a language-specific and a language-neutral one. This paper analyses the relationship between them, in the context of fine-tuning on two tasks -- POS tagging and natural language inference -- which require the model to bring to bear different...
Article
Full-text available
Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated, with a particularly high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how (mostly intrinsic) human evaluation is currently conducted and presents a set of best practices, grounded i...
Chapter
Full-text available
We have defined an interdisciplinary program for training a new generation of researchers who will be ready to leverage the use of Artificial Intelligence (AI)-based models and techniques even by non-expert users. The final goal is to make AI self-explaining and thus contribute to translating knowledge into products and services for economic and so...
Preprint
Full-text available
An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method...
Conference Paper
Full-text available
The opaque nature of many machine learning techniques prevents the wide adoption of powerful information processing tools for high stakes scenarios. The emerging field eXplainable Artificial Intelligence (XAI) aims at providing justifications for automatic decision-making systems in order to ensure reliability and trustworthiness in the users. For...
Chapter
Full-text available
The European MAPA (Multilingual Anonymisation for Public Administrations) project aims at developing an open-source solution for automatic de-identification of medical and legal documents. We introduce here the context, partners and aims of the project, and report on preliminary results.
Preprint
Full-text available
Existing research on Authorship Attribution (AA) focuses on texts for which a lot of data is available (e.g. novels), mainly in English. We approach AA via Authorship Verification (AV) on short Italian texts in two novel datasets, and analyze the interaction between genre, topic, gender and length. Results show that AV is feasible even with little data,...
Preprint
Full-text available
Contextualized word embeddings have been replacing standard embeddings as the representational knowledge source of choice in NLP systems. Since a variety of biases have previously been found in standard word embeddings, it is crucial to assess biases encoded in their replacements as well. Focusing on BERT (Devlin et al., 2018), we measure gender bi...
Conference Paper
We have defined an interdisciplinary program for training a new generation of researchers who will be ready to leverage the use of Artificial Intelligence (AI)-based models and techniques even by non-expert users. The final goal is to make AI self-explaining and thus contribute to translating knowledge into products and services for economic and so...
Preprint
Full-text available
This paper presents a novel scheme for the annotation of hate speech in corpora of Web 2.0 commentary. The proposed scheme is motivated by the critical analysis of posts made in reaction to news reports on the Mediterranean migration crisis and LGBTIQ+ matters in Malta, which was conducted under the auspices of the EU-funded C.O.N.T.A.C.T. project....
Preprint
Full-text available
Maltese, the national language of Malta, is spoken by approximately 500,000 people. Speech processing for Maltese is still in its early stages of development. In this paper, we present the first spoken Maltese corpus designed purposely for Automatic Speech Recognition (ASR). The MASRI-HEADSET corpus was developed by the MASRI project at the Univers...
Chapter
Existing research on Authorship Attribution (AA) focuses on texts for which a lot of data is available (e.g. novels), mainly in English. We approach AA via Authorship Verification (AV) on short Italian texts in two novel datasets, and analyze the interaction between genre, topic, gender and length. Results show that AV is feasible even with little data,...
Preprint
A neural language model can be conditioned into generating descriptions for images by providing visual information apart from the sentence prefix. This visual information can be included into the language model through different points of entry resulting in different neural architectures. We identify four main architectures which we call init-injec...
Article
Full-text available
We describe an applied methodology to build fuzzy models of geographical expressions, which are meant to be used for natural language generation purposes. Our approach encompasses a language grounding task within the development of an actual data-to-text system for the generation of textual descriptions of live weather data. For this, we gathered d...
Preprint
Full-text available
Natural Language Inference (NLI) is the task of determining the semantic relationship between a premise and a hypothesis. In this paper, we focus on the generation of hypotheses from premises in a multimodal setting, to generate a sentence (hypothesis) given an image and/or its description (premise) as the input. The main goals of this paper...
Preprint
Inspired by Labov's seminal work on stylistic variation as a function of social stratification, we develop and compare neural models that predict a person's presumed socio-economic status, obtained through distant supervision, from their writing style on social media. The focus of our work is on identifying the most important stylistic parameters to...
Article
Full-text available
In psycholinguistics, there has been relatively little work investigating conceptualization: how speakers decide which concepts to express. This contrasts with work in natural language generation (NLG), a subfield of artificial intelligence, where much research has explored content determination during the generation of referring expressions. Existi...
Preprint
In psycholinguistics, there has been relatively little work investigating conceptualisation: how speakers decide which concepts to express. This contrasts with work in natural language generation (NLG), a subfield of AI, where much research has explored content determination during the generation of referring expressions. Existing NLG algorithms fo...
Chapter
Healthcare organizations are in a continuous effort to improve health outcomes, reduce costs, and enhance patient experience of care. Data is essential to measure and help achieve these improvements in healthcare delivery. Consequently, a data influx from various clinical, financial, and operational sources is now overtaking healthcare organizati...
Chapter
Image caption generation systems are typically evaluated against reference outputs. We show that it is possible to predict output quality without generating the captions, based on the probability assigned by the neural model to the reference captions. Such pre-gen metrics are strongly correlated to standard evaluation metrics.
Chapter
This paper addresses the sensitivity of neural image caption generators to their visual input. A sensitivity analysis and omission analysis based on image foils is reported, showing that the extent to which image captioning architectures retain and are sensitive to visual information varies depending on the type of word being generated and the posi...
Preprint
When designing a neural caption generator, a convolutional neural network can be used to extract image features. Is it possible to also use a neural language model to extract sentence prefix features? We answer this question by trying different ways to transfer the recurrent neural network and embedding layer from a neural language model to an imag...
Preprint
Full-text available
Image caption generation systems are typically evaluated against reference outputs. We show that it is possible to predict output quality without generating the captions, based on the probability assigned by the neural model to the reference captions. Such pre-gen metrics are strongly correlated to standard evaluation metrics.
Preprint
Full-text available
This paper addresses the sensitivity of neural image caption generators to their visual input. A sensitivity analysis and omission analysis based on image foils is reported, showing that the extent to which image captioning architectures retain and are sensitive to visual information varies depending on the type of word being generated and the posi...
Preprint
In this paper we study empirically the validity of measures of referential success for referring expressions involving gradual properties. More specifically, we study the ability of several measures of referential success to predict the success of a user in choosing the right object, given a referring expression. Experimental results indicate that...
Conference Paper
Full-text available
We present a data resource which can be useful for research purposes on language grounding tasks in the context of geographical referring expression generation. The resource is composed of two data sets that encompass 25 different geographical descriptors and a set of associated graphical representations, drawn as polygons on a map by two groups of...
Preprint
Full-text available
We present a data resource which can be useful for research purposes on language grounding tasks in the context of geographical referring expression generation. The resource is composed of two data sets that encompass 25 different geographical descriptors and a set of associated graphical representations, drawn as polygons on a map by two groups of...
Preprint
Full-text available
Healthcare organizations are in a continuous effort to improve health outcomes, reduce costs and enhance patient experience of care. Data is essential to measure and help achieve these improvements in healthcare delivery. Consequently, a data influx from various clinical, financial and operational sources is now overtaking healthcare organization...
Chapter
Full-text available
Healthcare organizations are in a continuous effort to improve health outcomes, reduce costs and enhance patient experience of care. Data is essential to measure and help achieve these improvements in healthcare delivery. Consequently, a data influx from various clinical, financial and operational sources is now overtaking healthcare organizatio...
Preprint
Full-text available
Capturing semantic relations between sentences, such as entailment, is a long-standing challenge for computational semantics. Logic-based models analyse entailment in terms of possible worlds (interpretations, or situations) where a premise P entails a hypothesis H iff in all worlds where P is true, H is also true. Statistical models view this rela...
Article
Full-text available
The past few years have witnessed renewed interest in NLP tasks at the interface between vision and language. One intensively-studied problem is that of automatically generating text from images. In this paper, we extend this problem to the more specific domain of face description. Unlike scene descriptions, face descriptions are more fine-grained...
Article
Full-text available
This article describes the development of a free/open-source morphological description of Maltese, originally created as the analysis component in a rule-based machine translation system for Maltese to Arabic and later applied to other tasks. The lexicon formalism we use is lttoolbox, part of the Apertium machine translation platform. An evaluation...
Conference Paper
Full-text available
In neural image captioning systems, a recurrent neural network (RNN) is typically viewed as the primary 'generation' component. The dominant model in the literature is one in which visual features encoded by a convolutional network are 'injected' into the RNN. An alternative architecture encodes visual and linguistic features separately, merging th...
Preprint
In neural image captioning systems, a recurrent neural network (RNN) is typically viewed as the primary 'generation' component. This view suggests that the image features should be 'injected' into the RNN. This is in fact the dominant view in the literature. Alternatively, the RNN can instead be viewed as only encoding the previously generated word...
Conference Paper
We present a novel heuristic approach that defines fuzzy geographical descriptors using data gathered from a survey with human subjects. The participants were asked to provide graphical interpretations of the descriptors 'north' and 'south' for the Galician region (Spain). Based on these interpretations, our approach builds fuzzy descriptors that a...
Article
Full-text available
We present a novel heuristic approach that defines fuzzy geographical descriptors using data gathered from a survey with human subjects. The participants were asked to provide graphical interpretations of the descriptors 'north' and 'south' for the Galician region (Spain). Based on these interpretations, our approach builds fuzzy descriptors that a...
Article
Full-text available
This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applic...