Michael White

The Ohio State University | OSU · Department of Linguistics

PhD

About

99
Publications
8,222
Reads
1,691
Citations
Citations since 2016
18 Research Items
538 Citations
[Chart: citations per year, 2016 to 2022; vertical axis 0 to 100]
Additional affiliations
September 2005 - present
The Ohio State University
Position
  • Professor (Associate)
June 2002 - May 2005
The University of Edinburgh
Position
  • Research Associate
August 1989 - June 1994
University of Pennsylvania
Position
  • PhD Student

Publications

Article
Introduction: Advances in natural language understanding have facilitated the development of Virtual Standardized Patients (VSPs) that may soon rival human patients in conversational ability. We describe herein the development of an artificial intelligence (AI) system for VSPs enabling students to practice their history taking skills. Methods: O...
Article
Full-text available
Randomized prospective studies represent the gold standard for experimental design. In this paper, we present a randomized prospective study to validate the benefits of combining rule-based and data-driven natural language understanding methods in a virtual patient dialogue system. The system uses a rule-based pattern matching approach together wit...
Preprint
Full-text available
Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requi...
Article
Introduction: Practicing a medical history using standardized patients is an essential component of medical school curricula. Recent advances in technology now allow for newer approaches for practicing and assessing communication skills. We describe herein a virtual standardized patient (VSP) system that allows students to practice their history ta...
Preprint
Generating fluent natural language responses from structured semantic representations is a critical step in task-oriented conversational systems. Avenues like the E2E NLG Challenge have encouraged the development of neural approaches, particularly sequence-to-sequence (Seq2Seq) models for this problem. The semantic representations used, however, ar...
Article
We investigate the extent to which syntactic choice in written English is influenced by processing considerations as predicted by Gibson's (2000) Dependency Locality Theory (DLT) and Surprisal Theory (Hale, 2001; Levy, 2008). A long line of previous work attests that languages display a tendency for shorter dependencies, and in a previous corpus st...
Article
In this survey, we review recent progress on surface realization in natural language generation (NLG), highlighting how machine learning models have moved beyond n-grams to successfully incorporate linguistic insights into increasingly rich models. We also advance the view that NLG still has much to gain by taking up insights from psycholinguistic...
Conference Paper
Full-text available
We investigate whether parsers can be used for self-monitoring in surface realization in order to avoid egregious errors involving "vicious" ambiguities, namely those where the intended interpretation fails to be considerably more likely than alternative ones. Using parse accuracy in a simple reranking strategy for self-monitoring, we find that with...
Conference Paper
Full-text available
We present a novel algorithm for inducing Combinatory Categorial Grammars from dependency treebanks, along with initial experiments showing that it can be used to achieve competitive realization results using an enhanced version of the surface realization shared task data.
Conference Paper
Full-text available
Monolingual alignment is frequently required for natural language tasks that involve similar or comparable sentences. We present a new model for monolingual alignment in which the score of an alignment decomposes over both the set of aligned phrases as well as a set of aligned dependency arcs. Optimal alignments under this scoring function are deco...
Conference Paper
Full-text available
Comprehension and corpus studies have found that the tendency to minimize dependency length has a strong influence on constituent ordering choices. In this paper, we investigate dependency length minimization in the context of discriminative realization ranking, focusing on its potential to eliminate egregious ordering errors as well as better matc...
Conference Paper
Full-text available
We describe a new shared task on syntactic paraphrase ranking that is intended to run in conjunction with the main surface realization shared task. Taking advantage of the human judgments collected to evaluate the surface realizations produced by competing systems, the task is to automatically rank these realizations---viewed as syntactic paraphras...
Conference Paper
Full-text available
The Surface Realisation Shared Task was first run in 2011. Two common-ground input representations were developed and for the first time several independently developed surface realisers produced realisations from the same shared inputs. However, the input representations had several shortcomings which we have been aiming to address in the time s...
Article
Full-text available
This report documents our efforts to develop a Generation Challenges 2011 surface realization system by converting the shared task deep inputs to ones compatible with OpenCCG. Although difficulties in conversion led us to employ machine learning for relation mapping and to introduce several robustness measures into OpenCCG's grammar-based chart...
Conference Paper
This paper shows how glue rules can be used to increase the robustness of statistical chart realization in a manner inspired by dependency realization. Unlike the use of glue rules in MT---but like previous work with XLE on improving robustness with hand-crafted grammars---they are invoked here as a fall-back option when no grammatically complete r...
Conference Paper
Full-text available
This paper shows that using linguistically motivated features for English that-complementizer choice in an averaged perceptron model for classification can improve upon the prediction accuracy of a state-of-the-art realization ranking model. We report results on a binary classification task for predicting the presence/absence of a that-complementiz...
Conference Paper
Full-text available
We present a method of creating disjunctive logical forms (DLFs) from aligned sentences for grammar-based paraphrase generation using the OpenCCG broad coverage surface realizer. The method takes as input word-level alignments of two sentences that are paraphrases and projects these alignments onto the logical forms that result from automatically...
Conference Paper
Full-text available
The Surface Realisation (SR) Task was a new task at Generation Challenges 2011, and had two tracks: (1) Shallow: mapping from shallow input representations to realisations; and (2) Deep: mapping from deep input representations to realisations. Five teams submitted six systems in total, and we additionally evaluated human toplines. Systems were e...
Article
Full-text available
This article introduces Discourse Combinatory Categorial Grammar (DCCG) and shows how it can be used to generate multi-sentence paraphrases, flexibly incorporating both intra- and inter- sentential discourse connectives. DCCG employs a simple, practical approach to extending Combinatory Categorial Grammar (CCG) to encompass coverage of discourse-le...
Article
Full-text available
Generating responses that take user preferences into account requires adaptation at all levels of the generation process. This article describes a multi-level approach to presenting user-tailored information in spoken dialogues which brings together for the first time multi-attribute decision models, strategic content planning, surface realization...
Conference Paper
Full-text available
We present the first evaluation of the utility of automatic evaluation metrics on surface realizations of Penn Treebank data. Using outputs of the OpenCCG and XLE realizers, along with ranked WordNet synonym substitutions, we collected a corpus of generated surface realizations. These outputs were then rated and post-edited by human annotators. We...
Conference Paper
Full-text available
This paper shows that incorporating linguistically motivated features to ensure correct animacy and number agreement in an averaged perceptron ranking model for CCG realization helps improve a state-of-the-art baseline even further. Traditionally, these features have been modelled using hard constraints in the grammar. However, given the graded nat...
Conference Paper
We show that a ranking model produced by machine learning outperforms two baselines when applied to the task of selecting texts for use in creating a unit-selection synthesis voice with good domain coverage. The model learns to predict the estimated utility of an utterance based on features relating it to the utterances selected so far and a corpus...
Article
Full-text available
This study examines the relationship between online processing effects observed in earlier eye-tracking experiments [1, 2] and offline quality ratings gathered for the synthetic and natural speech stimuli used in these experiments, along with their acoustic-prosodic properties. White et al. [2] reported that even high-quality synthetic speech fail...
Conference Paper
Full-text available
This paper describes how named entity (NE) classes can be used to improve broad coverage surface realization with the OpenCCG realizer. Our experiments indicate that collapsing certain multi-word NEs and interpolating a language model where NEs are replaced by their class labels yields the largest quality increase, with 4-grams adding a sma...
Conference Paper
Full-text available
This paper shows that discriminative reranking with an averaged perceptron model yields substantial improvements in realization quality with CCG. The paper confirms the utility of including language model log probabilities as features in the model, which prior work on discriminative training with log linear models for HPSG realization had called...
Conference Paper
Full-text available
The past decade has witnessed remarkable progress in speech synthesis research, to the point where synthetic voices can be hard to distinguish from natural ones, at least for utterances with neutral, declarative prosody. Neutral intonation often does not suffice, however, in interactive systems: instead it can sound disengaged or “dead,” and can be...
Article
Full-text available
Corpus conversion and grammar extraction have traditionally been portrayed as tasks that are performed once and never again revisited (Burke et al., 2004). We report the successful implementation of an approach to these tasks that facilitates the improvement of grammar engineering as an evolving process. Taking the standard version of the CCGbank (...
Article
Full-text available
This paper describes a more precise analysis of punctuation for a bi-directional, broad coverage English grammar extracted from the CCGbank (Hockenmaier and Steedman, 2007). We discuss various approaches which have been proposed in the literature to constrain overgeneration with punctuation, and illustrate how aspects of Briscoe's (1994) influen...
Conference Paper
Full-text available
This paper describes a method of accurately projecting Propbank roles onto constituents in the CCGbank with near perfect accuracy and automatically annotating verbal categories with the semantic roles of their arguments. The current version of the CCGbank annotates arguments and adjuncts in a suboptimal way - it relies heavily on the Penn Treebank...
Conference Paper
Full-text available
In lexicalized grammatical formalisms, it is possible to separate lexical category assignment from the combinatory processes that make use of such categories, such as parsing and realization. We adapt techniques from supertagging---a relatively recent technique that performs complex lexical tagging before full parsing (Bangalore and Joshi, 19...
Conference Paper
Full-text available
We investigate two methods for enhancing variation in the output of a stochastic surface realiser: choosing from among the highest-scoring realisation candidates instead of taking the single highest-scoring result (ε-best sampling), and penalising the words from earlier sentences in a discourse when generating later ones (anti-repetition scoring). I...
Article
Full-text available
We describe a chart realization algorithm for Combinatory Categorial Grammar (CCG), and show how it can be used to efficiently realize a wide range of coordination phenomena, including argument cluster coordination and gapping. The algorithm incorporates three novel methods for improving the efficiency of chart realization: (i) using rules to chunk...
Conference Paper
Full-text available
This paper presents a method for adapting a language generator to the strengths and weaknesses of a synthetic voice, thereby improving the naturalness of synthetic speech in a spoken language dialogue system. The method trains a discriminative reranker to select paraphrases that are predicted to sound natural when synthesized. The ranker is tra...
Article
Full-text available
This paper presents a novel algorithm for efficiently generating paraphrases from disjunctive logical forms. The algorithm is couched in the framework of Combinatory Categorial Grammar (CCG) and has been implemented as an extension to the OpenCCG surface realizer. The algorithm makes use of packed representations similar to those initially proposed...
Article
Full-text available
We present an extensible API for integrating language modeling and realization, describing its design and efficient implementation in the OpenCCG surface realizer. With OpenCCG, language models may be used to select realizations with preferred word orders, promote alignment with a conversational partner, avoid repetitive language use, and...
Conference Paper
Full-text available
We describe how context-sensitive, user-tailored output is specified and produced in the COMIC multimodal dialogue system. At the conference, we will demonstrate the user-adapted features of the dialogue manager and text planner.
Article
Full-text available
For a successful and satisfying interaction, a dialogue participant may align their language to be more like that of their interlocutor. In the first part of this paper, we examine the alignment phenomenon from the viewpoint of personality-related, linguistic, sociolinguistic and psycholinguistic research, concluding that some people are stronger...
Article
Full-text available
This document describes the first NIST MT Evaluation submission of the newly formed Edinburgh University Statistical Machine Translation Group. Our entry to the 2005 DARPA/NIST MT Evaluation was largely based on the 2004 MIT system. In a two-month effort we focused on adding more data and a few new features to our Arabic-English system. We al...
Article
We describe a method of synthesising contextually appropriate intonation with limited domain unit selection voices. The method enables the natural language generation component of a dialogue system to specify its intonation choices via APML, an XML-based markup language. In a pilot study, we built an APML-aware limited domain voice for use in fligh...
Article
Full-text available
We describe the design of an MT system that employs transfer rules induced from parsed bitexts and present evaluation results. The system learns lexico-structural transfer rules using syntactic pattern matching, statistical co-occurrence and error-driven filtering. In an experiment with domain-specific Korean to English translation, the approach...
Article
Full-text available
We describe an approach to presenting information in spoken dialogues that for the first time brings together multi-attribute decision models, strategic content planning, state-of-the-art dialogue management, and realization which incorporates prosodic features. The system selects the most important subset of available options to mention and the at...
Conference Paper
Full-text available
We present a novel ensemble of six methods for improving the efficiency of chart realization. The methods are couched in the framework of Combinatory Categorial Grammar (CCG), but we conjecture that they can be adapted to related grammatical frameworks as well. The ensemble includes two new methods introduced here---feature-based licensing and inst...
Article
We describe an approach to presenting information in spoken dialogues that for the first time brings together multi-attribute decision models, strategic content planning, state-of-the-art dialogue management, and realization which incorporates prosodic features. The system selects the most important subset of available options to mention and the at...
Article
We report on two preliminary evaluations of RIPTIDES, a system that combines information extraction (IE), extraction-based summarization, and natural language generation to support user-directed multidocument summarization. We report first on a case study of the system's ability to detect discrepancies in numerical estimates appearing in different n...
Article
Full-text available
This paper describes a novel approach to inducing lexico-structural transfer rules from parsed bi-texts using syntactic pattern matching, statistical cooccurrence and error-driven filtering.
Article
Full-text available
We describe a bottom-up chart realization algorithm adapted for use with Combinatory Categorial Grammar (CCG), and show how it can be used to efficiently realize a wide range of coordination phenomena, including argument cluster coordination and gapping. The algorithm has been implemented as an extension to the OpenNLP open source CCG parser. As...
Article
Full-text available
In this paper, we present EXEMPLARS, an object-oriented, rule-based framework designed to support practical, dynamic text generation, emphasizing its novel features compared to existing hybrid systems that mix template-style and more sophisticated techniques. These features include an extensible classification-based text planning mechanism, a def...
Article
Full-text available
We present and evaluate a randomized local search procedure for selecting sentences to include in a multidocument summary. The search favors the inclusion of adjacent sentences while penalizing the selection of repetitive material, in order to improve intelligibility without unduly affecting informativeness. Sentence similarity is determined using...
Conference Paper
Full-text available
To aid analysts in detecting discrepancies in numeric estimates in news articles from multiple sources, we propose the automatic generation of hypertext summaries that include a high-level textual overview; tables of all comparable numeric estimates, organized to highlight discrepancies; and targeted access to supporting information from the origin...
Article
Full-text available
The first part of the paper develops a novel, sortally-based approach to the problem of aspectual composition. The account is argued to be superior on both empirical and computational grounds to previous semantic approaches relying on referential homogeneity tests. While the account is restricted to manner-of-motion verbs, it does cover their inter...
Article
Full-text available
In the first part of the paper, I present a new treatment of THE IMPERFECTIVE PARADOX (Dowty 1979) for the restricted case of trajectory-of-motion events. This treatment extends and refines those of Moens and Steedman (1988) and Jackendoff (1991). In the second part, I describe an implemented algorithm based on this treatment which determines wh...
Conference Paper
Full-text available
We report on two preliminary evaluations of RIPTIDES, a system that combines information extraction (IE), extraction-based summarization, and natural language generation to support user-directed multidocument summarization. We report first on a case study of the system's ability to detect discrepancies in numerical estimates appearing in differe...
Article
is somehow to be related to properties of the predicates drink wine and drink a bottle of wine which, in event talk, may be stated as follows: an event of drinking wine may have a proper part which is also an event of drinking wine, but an event of drinking a bottle of wine cannot have a proper part which is an event of drinking a bottle of wine. I...
Conference Paper
Full-text available