Mirella Lapata

Mirella Lapata
The University of Edinburgh | UoE · School of Informatics

About

265
Publications
27,308
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
16,669
Citations
Introduction
Skills and Expertise

Publications

Publications (265)
Preprint
Full-text available
The ability to convey relevant and faithful information is critical for many tasks in conditional generation and yet remains elusive for neural seq-to-seq models whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditiona...
Article
We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input. We focus on generating long-form text, that is, documents with multiple paragraphs, and propose a neural model enhanced with a planning component responsible for organizing high-level information in a coherent and meaningful way. We infer...
Preprint
Full-text available
In this work, we study the representation space of contextualized embeddings and gain insight into the hidden topology of large language models. We show there exists a network of latent states that summarize linguistic properties of contextualized representations. Instead of seeking alignments to existing well-defined annotations, we infer this lat...
Article
The availability of large-scale datasets has driven the development of neural models that create generic summaries for single or multiple documents. For query-focused summarization (QFS), labeled training data in the form of queries, documents, and summaries is not readily available. We provide a unified modeling framework for any kind of summariza...
Preprint
Full-text available
We propose Composition Sampling, a simple but effective method to generate diverse outputs for conditional generation of higher quality compared to previous stochastic decoding strategies. It builds on recently proposed plan-based neural generation models (Narayan et al, 2021) that are trained to first create a composition of the output and then ge...
Preprint
Full-text available
We propose a generative model of paraphrase generation, that encourages syntactic diversity by conditioning on an explicit syntactic sketch. We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings as a sequence of discrete latent variables that make iterative refinem...
Preprint
We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input. We focus on generating long-form text, i.e., documents with multiple paragraphs, and propose a neural model enhanced with a planning component responsible for organizing high-level information in a coherent and meaningful way. We infer la...
Preprint
We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language. The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German, and the methodology for its creation can be applied to several other lan...
Article
Full-text available
We present a memory-based model for context- dependent semantic parsing. Previous approaches focus on enabling the decoder to copy or modify the parse from the previous utterance, assuming there is a dependency between the current and previous parses. In this work, we propose to represent contextual information using an external memory. We learn a...
Preprint
Full-text available
Background: Natural language processing could assist multiple tasks in systematic reviews to reduce workflow, including the extraction of PICO elements such as study populations, interventions and outcomes. The PICO framework provides a basis for the retrieval and selection for inclusion of published evidence relevant to a specific systematic revie...
Preprint
Full-text available
There is mounting evidence that existing neural network models, in particular the very popular sequence-to-sequence architecture, struggle with compositional generalization, i.e., the ability to systematically generalize to unseen compositions of seen components. In this paper we demonstrate that one of the reasons hindering compositional generaliz...
Article
Full-text available
We sought to apply natural language processing to the task of automatic risk of bias assessment in preclinical literature, which could speed the process of systematic review, provide information to guide research improvement activity, and support translation from preclinical to clinical research. We use 7,840 full-text publications describing anima...
Preprint
Full-text available
Opinion summarization has been traditionally approached with unsupervised, weakly-supervised and few-shot learning techniques. In this work, we collect a large dataset of summaries paired with user reviews for over 31,000 products, enabling supervised training. However, the number of reviews per product is large (320 on average), making summarizati...
Preprint
Full-text available
Recent work on opinion summarization produces general summaries based on a set of input reviews and the popularity of opinions expressed in them. In this paper, we propose an approach that allows the generation of customized summaries based on aspect queries (e.g., describing the location and room of a hotel). Using a review corpus, we create a syn...
Conference Paper
Full-text available
We present a memory-based model for context-dependent semantic parsing. Previous approaches focus on enabling the decoder to copy or modify the parse from the previous utterance, assuming there is a dependency between the current and previous parses. In this work, we propose to represent contextual information using an external memory. We learn a c...
Preprint
We present a memory-based model for context-dependent semantic parsing. Previous approaches focus on enabling the decoder to copy or modify the parse from the previous utterance, assuming there is a dependency between the current and previous parses. In this work, we propose to represent contextual information using an external memory. We learn a c...
Article
The ability to convey relevant and diverse information is critical in multi-document summarization and yet remains elusive for neural seq-to-seq models whose outputs are often redundant and fail to correctly cover important details. In this work, we propose an attention mechanism which encourages greater focus on relevance and diversity. Attention...
Preprint
Full-text available
Objective: We sought to apply natural language processing to the task of automatic risk of bias assessment in preclinical literature, which could speed the process of systematic review, provide information to guide research improvement activity, and support translation from preclinical to clinical research. Materials and Methods: We use 7,840 full-...
Preprint
Full-text available
Despite success in many domains, neural models struggle in settings where train and test examples are drawn from different distributions. In particular, in contrast to humans, conventional sequence-to-sequence (seq2seq) models fail to generalize systematically, i.e., interpret sentences representing novel combinations of concepts (e.g., text segmen...
Preprint
Full-text available
We propose a method for generating paraphrases of English questions that retain the original intent but use a different surface form. Our model combines a careful choice of training objective with a principled information bottleneck, to induce a latent encoding space that disentangles meaning and form. We train an encoder-decoder model to reconstru...
Preprint
The availability of large-scale datasets has driven the development of neural models that create summaries from single documents, for generic purposes. When using a summarization system, users often have specific intents with various language realizations, which, depending on the information need, can range from a single keyword to a long narrative...
Article
Full-text available
Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or variants thereof. These models generate text that is fluent (but often imprecise) and perform quite poorly at selecting appropriate content and ordering it coherently. To overcome some of these issues, we propose a neural model with a macro...
Preprint
Full-text available
Recent work in crosslingual semantic parsing has successfully applied machine translation to localize accurate parsing to new languages. However, these advances assume access to high-quality machine translation systems, and tools such as word aligners, for all test languages. We remove these assumptions and study cross-lingual semantic parsing as a...
Preprint
Full-text available
Semantic parsing aims at translating natural language (NL) utterances onto machine-interpretable programs, which can be executed against a real-world environment. The expensive annotation of utterance-program pairs has long been acknowledged as a major bottleneck for the deployment of contemporary neural models to real-life applications. In this wo...
Article
Full-text available
We present the Quantized Transformer (QT), an unsupervised system for extractive opinion summarization. QT is inspired by Vector- Quantized Variational Autoencoders, which we repurpose for popularity-driven summarization. It uses a clustering interpretation of the quantized space and a novel extraction algorithm to discover popular opinions among h...
Article
Full-text available
We consider the task of cross-lingual semantic parsing in the style of Discourse Representation Theory (DRT) where knowledge from annotated corpora in a resource-rich language is transferred via bitext to guide learning in other languages. We introduce Universal Discourse Representation Theory (UDRT), a variant of DRT that explicitly anchors semant...
Preprint
Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or variants thereof. These models generate text which is fluent (but often imprecise) and perform quite poorly at selecting appropriate content and ordering it coherently. To overcome some of these issues, we propose a neural model with a macr...
Preprint
The availability of large-scale datasets has driven the development of neural sequence-to-sequence models to generate generic summaries, i.e., summaries which do not correspond to any pre-specified queries. However, due to the lack of training data, query focused summarization (QFS) has been studied mainly with extractive methods. In this work, we...
Preprint
Full-text available
The recent success of deep learning techniques for abstractive summarization is predicated on the availability of large-scale datasets. When summarizing reviews (e.g., for products or movies), such training data is neither available nor can be easily sourced, motivating the development of methods which rely on synthetic datasets for supervised trai...
Preprint
Full-text available
We summarize full-length movies by creating shorter videos containing their most informative scenes. We explore the hypothesis that a summary can be created by assembling scenes which are turning points (TPs), i.e., key events in a movie that describe its storyline. We propose a model that identifies TP scenes by building a sparse movie graph that...
Preprint
We present the Quantized Transformer (QT), an unsupervised system for extractive opinion summarization. QT is inspired by Vector-Quantized Variational Autoencoders, which we repurpose for popularity-driven summarization. It uses a clustering interpretation of the quantized space and a novel extraction algorithm to discover popular opinions among hu...
Conference Paper
Full-text available
Sentence simplification aims to make sentences easier to read and understand. Recent approaches have shown promising results with encoder-decoder models trained on large amounts of parallel data which often only exists in English. We propose a zero-shot modeling framework which transfers simplification knowledge from English to another language (fo...
Preprint
Although neural sequence-to-sequence models have been successfully applied to semantic parsing, they struggle to perform well on query-based data splits that require \emph{composition generalization}, an ability of systematically generalizing to unseen composition of seen components. Motivated by the explicitly built-in compositionality in traditio...
Preprint
The importance of building semantic parsers which can be applied to new domains and generate programs unseen at training has long been acknowledged, and datasets testing out-of-domain performance are becoming increasingly available. However, little or no attention has been devoted to studying learning algorithms or objectives which promote domain g...
Preprint
In this paper we apply self-knowledge distillation to text summarization which we argue can alleviate problems with maximum-likelihood training on single reference and noisy datasets. Instead of relying on one-hot annotation labels, our student summarization model is trained with guidance from a teacher which generates smoothed labels to help regul...
Preprint
Opinion summarization is an automatic creation of text reflecting subjective information expressed in multiple documents, such as user reviews of a product. The task is practically important and has attracted a lot of attention. However, due to a high cost of summary production, datasets large enough for training supervised models are lacking. Inst...
Preprint
Most general-purpose extractive summarization models are trained on news articles, which are short and present all important information upfront. As a result, such models are biased on position and often perform a smart selection of sentences from the beginning of the document. When summarizing long narratives, which have complex structure and pres...
Preprint
It is a big challenge to model long-range input for document summarization. In this paper, we target using a select and generate paradigm to enhance the capability of selecting explainable contents (i.e., interpret the selection given its semantics, novelty, relevance) and then guiding to control the abstract generation. Specifically, a newly desig...
Preprint
The supervised training of high-capacity models on large datasets containing hundreds of thousands of document-summary pairs is critical to the recent success of deep learning techniques for abstractive summarization. Unfortunately, in most domains (other than news) such training data is not available and cannot be easily sourced. In this paper we...
Preprint
Successful linguistic communication relies on a shared experience of the world, and it is this shared experience that makes utterances meaningful. Despite the incredible effectiveness of language processing models trained on text alone, today's best systems still make mistakes that arise from a failure to relate language to the physical world it de...
Preprint
Datasets for semantic parsing scarcely consider languages other than English and professional translation can be prohibitively expensive. In this work, we propose to adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation. We evaluate if machine translation is an adequate...
Preprint
We consider the problem of better modeling query-cluster interactions to facilitate query focused multi-document summarization (QFS). Due to the lack of training data, existing work relies heavily on retrieval-style methods for estimating the relevance between queries and text segments. In this work, we leverage distant supervision from question an...
Preprint
Summarization of opinions is the process of automatically creating text summaries that reflect subjective information expressed in input documents, such as product reviews. While most previous research in opinion summarization has focused on the extractive setting, i.e. selecting fragments of the input documents to produce a summary, we let the mod...
Preprint
Sentence simplification aims to make sentences easier to read and understand. Recent approaches have shown promising results with sequence-to-sequence models which have been developed assuming homogeneous target audiences. In this paper we argue that different users have different simplification needs (e.g. dyslexics vs. non-native speakers), and p...
Preprint
Semantic parses are directed acyclic graphs (DAGs), so semantic parsing should be modeled as graph prediction. But predicting graphs presents difficult technical challenges, so it is simpler and more common to predict the linearized graphs found in semantic parsing datasets using well-understood sequence models. The cost of this simplicity is that...
Article
We introduce "extreme summarization," a new single-document summarization task which aims at creating a short, one-sentence news summary answering the question "What is the article about?". We argue that extreme summarization, by nature, is not amenable to extractive strategies and requires an abstractive modeling approach. In the hope of driving r...
Preprint
Full-text available
Semantic parsing aims to map natural language utterances onto machine interpretable meaning representations, aka programs whose execution against a real-world environment produces a denotation. Weakly-supervised semantic parsers are trained on utterance-denotation pairs treating programs as latent. The task is challenging due to the large search sp...
Preprint
Opinion summarization is the task of automatically generating summaries for a set of opinions about a specific target (e.g., a movie or a product). Since the number of input documents can be prohibitively large, neural network-based methods sacrifice end-to-end elegance and follow a two-stage approach where an extractive model first pre-selects a s...
Preprint
According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a screenplay: they define the plot structure, determine its progression and segment the screenplay into thematic units (e.g., setup, complications, aftermath). We propose the task of turning point identification in m...
Preprint
Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstr...
Preprint
In this paper we introduce domain detection as a new natural language processing task. We argue that the ability to detect textual segments which are domain-heavy, i.e., sentences or phrases which are representative of and provide evidence for a given domain could enhance the robustness and portability of various text classification applications. W...
Preprint
We introduce 'extreme summarization', a new single-document summarization task which aims at creating a short, one-sentence news summary answering the question ``What is the article about?''. We argue that extreme summarization, by nature, is not amenable to extractive strategies and requires an abstractive modeling approach. In the hope of driving...
Article
Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We...
Preprint
Existing neural generation approaches create multi-sentence text as a single sequence. In this paper we propose a structured convolutional decoder that is guided by the content structure of target summaries. We compare our model with existing sequential decoders on three data sets representing different domains. Automatic and human evaluation demon...
Preprint
Single document summarization has enjoyed renewed interests in recent years thanks to the popularity of neural network models and the availability of large-scale datasets. In this paper we develop an unsupervised approach arguing that it is unrealistic to expect large-scale and high-quality training data to be available or created for different typ...
Preprint
Recent approaches to data-to-text generation have shown great promise thanks to the use of large-scale datasets and the application of neural network architectures which are trained end-to-end. These models rely on representation learning to select content appropriately, structure it coherently, and verbalize it grammatically, treating entities as...
Preprint
In this paper, we develop a neural summarization model which can effectively process multiple input documents and distill Transformer architecture with the ability to encode documents in a hierarchical manner. We represent cross-document relationships via an attention mechanism which allows to share information as opposed to simply concatenating te...
Preprint
Generating texts which express complex ideas spanning multiple sentences requires a structured representation of their content (document plan), but these representations are prohibitively expensive to manually produce. In this work, we address the problem of generating coherent multi-sentence texts from the output of an information extraction syste...
Article
Full-text available
In this paper we focus on learning dependency aware representations for semantic role labeling without recourse to an external parser. The backbone of our model is an LSTM-based semantic role labeler jointly trained with two auxiliary tasks: predicting the dependency label of a word and whether there exists an arc linking it to the predicate. The a...
Article
Full-text available
In this paper we introduce domain detection as a new natural language processing task. We argue that the ability to detect textual segments that are domain-heavy (i.e., sentences or phrases that are representative of and provide evidence for a given domain) could enhance the robustness and portability of various text classification applications. We...
Preprint
Full-text available
Categories such as animal or furniture are acquired at an early age and play an important role in processing, organizing, and communicating world knowledge. Categories exist across cultures: they allow to efficiently represent the complexity of the world, and members of a community strongly agree on their nature, revealing a shared mental represent...