Alon Talmor's research while affiliated with Tel Aviv University and other places

Publications (20)

Preprint
Constructing benchmarks that test the abilities of modern natural language understanding models is difficult - pre-trained language models exploit artifacts in benchmarks to achieve human parity, but still fail on adversarial examples and make errors that demonstrate a lack of common sense. In this work, we propose gamification as a framework for d...
Preprint
Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning. In this work, we propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs, where answering the question requires reasoning over multiple f...
Preprint
When answering complex questions, people can seamlessly combine information from visual, textual and tabular sources. While interest in models that reason over multiple pieces of evidence has surged in recent years, there has been relatively little work on question answering models that reason across multiple modalities. In this paper, we present M...
Article
Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. In this work, we propose eight reasoning tasks, which conceptually require operations s...
Preprint
To what extent can a neural network systematically reason over symbolic facts? Evidence suggests that large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control. Recently, it has been shown that Transformer-based models succeed in consistent reasoning over explicit symbolic facts, under a "clos...
Preprint
Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. In this work, we propose eight reasoning tasks, which conceptually require operations s...
Preprint
Reading comprehension is one of the crucial tasks for furthering research in natural language understanding. A lot of diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple paraphrase matching and entity typing to entity tracking and understanding the implications of...
Preprint
We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems. In this task, we adapted and unified 18 distinct question answering datasets into the same format. Among them, six datasets were made available for training, six datasets were m...
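As a rough illustration of that unification step, a converter that maps dataset-specific records into one shared extractive-QA schema might look like the Python sketch below; the field names and source formats here are assumptions for illustration, not the exact MRQA format.

    # Illustrative sketch: normalize heterogeneous QA examples into one
    # shared schema, in the spirit of the MRQA shared task. Field names
    # ("context", "question", "answers") are assumed, not the official format.
    from typing import Dict

    def to_unified(example: Dict, source: str) -> Dict:
        """Map a dataset-specific example to a common extractive-QA record."""
        if source == "squad_like":
            return {
                "context": example["context"],
                "question": example["question"],
                "answers": [a["text"] for a in example["answers"]],
            }
        if source == "triviaqa_like":
            # Evidence documents are concatenated into a single context.
            return {
                "context": " ".join(example["evidence_docs"]),
                "question": example["query"],
                "answers": list(example["answer_aliases"]),
            }
        raise ValueError(f"unknown source format: {source}")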
Preprint
Full-text available
Recent years have seen a dramatic expansion of tasks and datasets posed as question answering, from reading comprehension, semantic role labeling, and even machine translation, to image and video understanding. With this expansion, there are many differing views on the utility and definition of "question answering" itself. Some argue that its scope...
Preprint
A large number of reading comprehension (RC) datasets have been created recently, but little analysis has been done on whether they generalize to one another, and the extent to which existing datasets can be leveraged for improving performance on new ones. In this paper, we conduct such an investigation over ten RC datasets, training on one or more...
Preprint
Full-text available
When answering a question, people often draw upon their rich world knowledge in addition to some task-specific context. Recent work has focused primarily on answering questions based on some relevant document or content, and required very little general background. To investigate question answering with prior knowledge, we present CommonsenseQA: a...
Preprint
Recently, Talmor and Berant (2018) introduced ComplexWebQuestions - a dataset focused on answering complex questions by decomposing them into a sequence of simpler questions and extracting the answer from retrieved web snippets. In their work the authors used a pre-trained reading comprehension (RC) model (Salant and Berant, 2018) to extract the an...
Article
Answering complex questions is a time-consuming activity for humans that requires reasoning and integration of information. Recent work on reading comprehension made headway in answering simple questions, but tackling complex questions is still an ongoing research challenge. Conversely, semantic parsers have been successful at handling compositiona...
Article
Full-text available
Semantic parsing shines at analyzing complex natural language that involves composition and computation over multiple pieces of evidence. However, datasets for semantic parsing contain many factoid questions that can be answered from a single web document. In this paper, we propose to evaluate semantic parsing-based question answering models by com...

Citations

... Gordon and Van Durme [2013] note that learning about the world from language corpora is challenging because much of world knowledge is implied: people are much more likely to communicate new or unusual information than commonly known facts. Indeed, language models have impaired knowledge of domains that are underreported, such as basic shape knowledge [e.g., "the wheels are round"; Lucy and Gauthier, 2017, Utsumi, 2020, Chersoni et al., 2021] and object size knowledge [e.g., "a table is smaller than an airplane"; Talmor et al., 2020a, Liu et al., 2022b]. These shortcomings arise from the fact that these models are trained to extract statistical information about words in text rather than a set of stable, consistent, and complete facts about the world. ...
... We collect three types of supervision signals for model pretraining: named entity annotation in OntoNotes for task annotation D_task; NYT (Riedel, Yao, and McCallum 2010) and Rebel (Huguet Cabot and Navigli 2021) for distant supervision D_distant; and machine reading comprehension from MRQA (Fisch et al. 2019) for indirect supervision D_indirect. For the Rebel data, we keep only the 230 most frequently occurring relation types and randomly sample 300K instances for pretraining. ...
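A minimal sketch of the filtering and sampling step described in that snippet (keeping the most frequent relation types, then subsampling); the function and variable names are assumptions for illustration, not the cited authors' code.

    # Illustrative sketch: keep only the most frequent relation types,
    # then randomly subsample instances for pretraining. The constants
    # mirror the snippet above (230 relation types, 300K instances).
    import random
    from collections import Counter

    def filter_and_sample(instances, n_relations=230, n_samples=300_000, seed=0):
        """instances: iterable of dicts, each with a 'relation' field."""
        instances = list(instances)
        counts = Counter(x["relation"] for x in instances)
        top = {r for r, _ in counts.most_common(n_relations)}
        kept = [x for x in instances if x["relation"] in top]
        return random.Random(seed).sample(kept, min(n_samples, len(kept)))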
... Effective communication in a second language (L2) requires learners to understand what words mean in context (Wright & Cervetti, 2017), making vocabulary building essential (Nation & Webb, 2011; Susanto, 2017; Wang & Treffers-Daller, 2017). In reading, for example, a larger vocabulary leads to greater comprehension (Gardner et al., 2019), and a reader must know at least 98% of the words in a passage to interpret its meaning (Schmitt et al., 2011). Furthermore, it is thought that a person needs to know at least 6,000 words to interpret written content in a language (Frevert et al., 2014). ...
... In other words, providing a justification does not significantly increase the cognitive load of annotators, because they already perform this task implicitly when choosing the correct label (Zaidan, Eisner, and Piatko 2007). For existing benchmarks, rationale annotations can be gathered post-publication (Dua et al. 2019). ...
... A straightforward approach is to directly mix the datasets of each task without any task-specific marks and fine-tune PLMs on the mixed data. For example, Talmor et al. [63] propose MultiQA. They train models on multiple datasets and find that this leads to robust generalization in reading comprehension tasks. ...
... Although more and more datasets and methods for multi-hop QA have been proposed, many challenges in this field remain unsolved. For example, the data sources of most multi-hop QA datasets are limited to a single information type, such as textual documents [3,4], semi-structured tables [5], or knowledge bases (KBs) [6,7]. However, different types of data are often complementary to each other. ...
... Internet-powered Q&A: Several recent works have attempted complex question answering using the internet. Talmor and Berant [2018] retrieve snippets of information from the web with a search engine and answer a structured decomposition of the question with a simpler Q&A model. The recent WebGPT [Nakano et al., 2021] system takes a broadly similar approach to long-form question answering. ...
... Evaluating the parsing accuracy: to evaluate performance on the datasets, we use exact-match parsing accuracy [63,71,24]. This metric measures whether the predicted parse exactly matches the gold parse in the dataset. ...
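For concreteness, exact-match parsing accuracy as described there can be computed in a few lines of Python; the whitespace normalization step is a common but assumed detail, not necessarily what the cited works do.

    # Illustrative sketch: fraction of predicted parses that exactly
    # match the gold parse. Whitespace normalization is an assumption.
    def exact_match_accuracy(predictions, golds):
        assert len(predictions) == len(golds) and golds
        norm = lambda s: " ".join(s.split())  # collapse whitespace
        return sum(norm(p) == norm(g) for p, g in zip(predictions, golds)) / len(golds)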
... Webclopedia combines information retrieval, NLP components, and statistical techniques. WebQA [18] is a web question answering system based on named entity recognition, which uses template mapping techniques to detect the question type. Wolfram Alpha [19] is a computational knowledge engine created by Wolfram Alpha LLC, a subsidiary of Wolfram Research. ...