About
21
Publications
2,432
Reads
137
Citations
Introduction
Michael Wayne Goodman recently completed a Ph.D. in Computational Linguistics in the Department of Linguistics at the University of Washington in Seattle. He focuses on semantic representations and machine translation, as well as NLP for low-resource languages.
Additional affiliations
April 2014 - August 2015
Publications (21)
The Grammar Matrix project is a meta-grammar engineering framework expressed in Head-driven Phrase Structure Grammar (HPSG) and Minimal Recursion Semantics (MRS). It automates grammar implementation and is thus a tool and a resource for linguistic hypothesis testing at scale. In this paper, we summarize how the Grammar Matrix grew in the last decad...
Abstract Meaning Representation (AMR; Banarescu et al., 2013) encodes the meaning of sentences as a directed graph and Smatch (Cai and Knight, 2013) is the primary metric for evaluating AMR graphs. Smatch, however, is unaware of some meaning-equivalent variations in graph structure allowed by the AMR Specification and gives different scores for AMRs exhibit...
We propose neural models to generate high-quality text from structured representations based on Minimal Recursion Semantics (MRS). MRS is a rich semantic representation that encodes more precise semantic detail than other representations such as Abstract Meaning Representation (AMR). We show that a sequence-to-sequence model that maps a linearizati...
Extracting semi-structured text from scientific writing in PDF files is a difficult task that researchers have faced for decades. In the 1990s, this task was largely a computer vision and OCR problem, as PDF files were often the result of scanning printed documents. Today, PDFs have standardized digital typesetting without the need for OCR, but ext...
The current release of the ODIN (Online Database of Interlinear Text) database contains over 150,000 linguistic examples, from nearly 1,500 languages, extracted from PDFs found on the web, representing a significant source of data for language research, particularly for low-resource languages. Errors introduced during PDF-to-text conversion or poor...
The majority of the world’s languages have little to no NLP resources or tools. This is due to a lack of training data (“resources”) over which tools, such as taggers or parsers, can be trained. In recent years, there have been increasing efforts to apply NLP methods to a much broader swath of the world’s languages. In many cases this involves boot...
We describe resources aimed at increasing the usability of the semantic representations utilized within the DELPH-IN (Deep Linguistic Processing with HPSG) consortium. We concentrate in particular on the Dependency Minimal Recursion Semantics (DMRS) formalism, a graph-based representation designed for compositional semantic representation with deep...
The majority of the world’s languages have little to no NLP resources or tools. This is due to a lack of training data (“resources”) over which tools, such as taggers or parsers, can be trained. In recent years, there have been increasing efforts to apply NLP methods to a much broader swathe of the world’s languages. In many cases this involves boot...
This paper presents Xigt, an extensible storage format for interlinear glossed text (IGT). We review design desiderata for such a format based on our own use cases as well as general best practices, and then explore existing representations of IGT through the lens of those desiderata. We give an overview of the data model and XML serialization of X...
We present a case study of the methodology of using information extracted from interlinear glossed text (IGT) to create actual working HPSG grammar fragments using the Grammar Matrix, focusing on one language: Chintang. Though the results are barely measurable in terms of coverage over running text, they nonetheless provide a proof of concept. Ou...
The LinGO Grammar Matrix customization system (Bender et al., 2002, 2010) is a web-based software system for creating implemented HPSG (Pollard and Sag, 1994) grammar fragments on the basis of user input of typological...
This demonstration presents the LinGO Grammar Matrix grammar customization system: a repository of distilled linguistic knowledge and a web-based service which elicits a typological description of a language from the user and yields a customized grammar fragment ready for sustained development into a broad-coverage grammar. We describe the implemen...
We demonstrate that the bidirectionality of deep grammars, allowing them to generate as well as parse sentences, can be used to automatically and effectively identify errors in the grammars. The system is tested on two implemented HPSG grammars: Jacy for Japanese, and the ERG for English. Using this system, we were able to increase generation cover...