Conference Paper

A Method for Building a Commonsense Inference Dataset based on Basic Events

... The fourth best-resourced language is Japanese, with a Cloze RC dataset [270], a manual translation of a part of SQuAD [10], and a commonsense reasoning resource [189]. ...
Preprint
Alongside huge volumes of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of "reasoning types" in question answering and propose a new taxonomy. We also discuss the implications of over-focusing on English, and survey the current monolingual resources for other languages and multilingual resources. The study is aimed at both practitioners looking for pointers to the wealth of existing data, and at researchers working on new resources.
Article
Alongside huge volumes of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of “skills” that question answering/reading comprehension systems are supposed to acquire, and propose a new taxonomy. The supplementary materials survey the current multilingual resources and monolingual resources for languages other than English, and we discuss the implications of over-focusing on English. The study is aimed at both practitioners looking for pointers to the wealth of existing data, and at researchers working on new resources.
Article
This paper describes our dataset of Japanese cloze questions designed for the evaluation of machine reading comprehension. The dataset consists of questions automatically generated from Aozora Bunko, and each question is defined as a 4-tuple: a context passage, a query holding a slot, an answer character, and a set of possible answer characters. The query is generated from the original sentence, which appears immediately after the context passage in the target book, by replacing the answer character with the slot. The set of possible answer characters consists of the answer character and the other characters who appear in the context passage. Because the context passage and the query share the same context, a machine that precisely understands the context may select the correct answer from the set of possible answer characters. The unique point of our approach is that we focus on the characters of target books as slots for generating queries from original sentences, because they play important roles in narrative texts and a precise understanding of their relationships is necessary for reading comprehension. To extract characters from target books, manually created dictionaries of characters are employed, because some characters appear as common nouns rather than as named entities.
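A minimal Python sketch of the 4-tuple question structure described in this abstract; the identifiers (ClozeQuestion, make_question, SLOT) are illustrative assumptions and do not come from the paper or its released resources.

    # Minimal sketch of the 4-tuple cloze-question structure described above;
    # identifiers are illustrative, not from the paper's released data.
    from dataclasses import dataclass
    from typing import List

    SLOT = "____"

    @dataclass
    class ClozeQuestion:
        context: str           # passage preceding the query sentence
        query: str             # original sentence with the answer character replaced by SLOT
        answer: str            # the character name that fills the slot
        candidates: List[str]  # the answer plus the other characters appearing in the context

    def make_question(context: str, sentence: str, answer: str,
                      context_characters: List[str]) -> ClozeQuestion:
        """Replace the answer character in the sentence with a slot and collect
        candidate answers from the characters appearing in the context passage."""
        query = sentence.replace(answer, SLOT)
        candidates = sorted(set(context_characters) | {answer})
        return ClozeQuestion(context, query, answer, candidates)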
Article
Full-text available
We present an unsupervised method for inducing semantic frames from verb uses in giga-word corpora. Our semantic frames are verb-specific example-based frames that are distinguished according to their senses. We use the Chinese Restaurant Process to automatically induce these frames from a massive amount of verb instances. In our experiments, we acquire broad-coverage semantic frames from two giga-word corpora, the larger comprising 20 billion words. Our experimental results indicate the effectiveness of our approach.
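For readers unfamiliar with the Chinese Restaurant Process prior mentioned in this abstract, the short sketch below shows its basic clustering behaviour only; the paper's induction model additionally conditions on the verb's argument words, so this is a hedged illustration of the prior, not the authors' implementation.

    # Generic Chinese Restaurant Process sketch: each verb instance joins an
    # existing frame with probability proportional to that frame's size, or a
    # new frame with probability proportional to alpha. Purely illustrative.
    import random
    from collections import defaultdict

    def crp_assign(num_instances: int, alpha: float = 1.0) -> list:
        counts = defaultdict(int)   # frame index -> number of instances assigned so far
        assignments = []
        for _ in range(num_instances):
            frames = list(counts)
            weights = [counts[k] for k in frames] + [alpha]  # existing frames + a new frame
            choice = random.choices(frames + [len(frames)], weights=weights)[0]
            counts[choice] += 1
            assignments.append(choice)
        return assignments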
Conference Paper
Full-text available
SemEval-2012 Task 7 presented a deceptively simple challenge: given an English sentence as a premise, select the sentence amongst two alternatives that more plausibly has a causal relation to the premise. In this paper, we describe the development of this task and its motivation. We describe the two systems that competed in this task as part of SemEval-2012, and compare their results to those achieved in previously published research. We discuss the characteristics that make this task so difficult, and offer our thoughts on how progress can be made in the future.
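To make the task format concrete, the sketch below encodes one COPA-style instance together with a trivial lexical-overlap selector; the systems discussed in the paper relied on corpus statistics such as PMI, so this stand-in baseline is assumed purely for illustrating the input/output shape.

    # Illustrative COPA-style instance plus a naive word-overlap chooser.
    # This is not either competing system; it only demonstrates the task format.
    from dataclasses import dataclass

    @dataclass
    class CopaInstance:
        premise: str
        alternative1: str
        alternative2: str
        ask_for: str  # "cause" or "effect"

    def choose_alternative(inst: CopaInstance) -> int:
        """Return 1 or 2 for the alternative sharing more word types with the premise."""
        premise_words = set(inst.premise.lower().split())
        overlap1 = len(premise_words & set(inst.alternative1.lower().split()))
        overlap2 = len(premise_words & set(inst.alternative2.lower().split()))
        return 1 if overlap1 >= overlap2 else 2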
Conference Paper
Full-text available
We present the second version of the Penn Discourse Treebank, PDTB-2.0, describing its lexically-grounded annotations of discourse relations and their two abstract object arguments over the 1-million-word Wall Street Journal corpus. We describe all aspects of the annotation, including (a) the argument structure of discourse relations, (b) the sense annotation of the relations, and (c) the attribution of discourse relations and each of their arguments. We list the differences between PDTB-1.0 and PDTB-2.0. We present representative statistics for several aspects of the annotation in the corpus.
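The record sketch below mirrors the three annotation layers listed in this abstract (argument structure, sense, attribution); the field names and the example sense string are illustrative assumptions, not the corpus's actual file format.

    # Hypothetical record shape for a PDTB-2.0 discourse relation; the layers
    # follow the abstract above, but all field names are illustrative only.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class DiscourseRelation:
        connective: Optional[str]  # explicit connective text, or None for implicit relations
        arg1: str                  # first abstract-object argument (text span)
        arg2: str                  # second abstract-object argument (text span)
        sense: str                 # e.g. "Contingency.Cause.Reason" (illustrative label)
        rel_attribution: str       # source the relation is attributed to
        arg1_attribution: str      # source of Arg1
        arg2_attribution: str      # source of Arg2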
Article
Full-text available
Over the past decade, a person-century of effort has gone into building CYC, a universal schema of roughly 10⁵ general concepts spanning human reality. Most of the time has been spent codifying knowledge about these concepts; approximately 10⁶ commonsense axioms have been handcrafted for and entered into CYC's knowledge base, and millions more have been inferred and cached by CYC. This paper studies the fundamental assumptions behind doing such a large-scale project, reviews the technical lessons learned by the developers, and surveys the range of applications that are enabled by the technology.
Conference Paper
Much work in knowledge extraction from text tacitly assumes that the frequency with which people write about actions, outcomes, or properties is a reflection of real-world frequencies or the degree to which a property is characteristic of a class of individuals. In this paper, we question this idea, examining the phenomenon of reporting bias and the challenge it poses for knowledge extraction. We conclude with discussion of approaches to learning commonsense knowledge from text despite this distortion.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • Jacob Devlin
  • Ming-Wei Chang
  • Kenton Lee
  • Kristina Toutanova
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171-4186, Minneapolis, Minnesota. Association for Computational Linguistics.
ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning
  • Maarten Sap
  • Ronan Le Bras
  • Emily Allaway
  • Chandra Bhagavatula
  • Nicholas Lourie
  • Hannah Rashkin
  • Brendan Roof
  • Noah A. Smith
  • Yejin Choi
Maarten Sap, Ronan Le Bras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, and Yejin Choi. 2019. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 -February 1, 2019, pages 3027-3035. AAAI Press.
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
  • Alon Talmor
  • Jonathan Herzig
  • Nicholas Lourie
  • Jonathan Berant
Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. 2019. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4149-4158, Minneapolis, Minnesota. Association for Computational Linguistics.
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
  • Alex Wang
  • Yada Pruksachatkun
  • Nikita Nangia
  • Amanpreet Singh
  • Julian Michael
  • Felix Hill
  • Omer Levy
  • Samuel R Bowman
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019a. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 3266-3280. Curran Associates, Inc.