
David Stallard
Raytheon BBN Technologies | BBN
58 Publications
7,146 Reads
1,056 Citations
Publications (58)
The development of high-performance statistical machine translation (SMT) systems is contingent on the availability of substantial, in-domain parallel training corpora. The latter, however, are expensive to produce due to the labor-intensive nature of manual translation. We propose to alleviate this problem with a novel, semi-supervised, batch-mode...
In this paper we present a speech-to-speech (S2S) translation system called the BBN TransTalk that enables two-way communication between speakers of English and speakers who do not understand or speak English. The BBN TransTalk has been configured for several languages including Iraqi Arabic, Pashto, Dari, Farsi, Malay, Indonesian, and Levantine Ar...
If unsupervised morphological analyzers could approach the effectiveness of supervised ones, they would be a very attractive choice for improving MT performance on low-resource inflected languages. In this paper, we compare performance gains for state-of-the-art supervised vs. unsupervised morphological analyzers, using a state-of-the-art Arabic-to...
Arabic Dialects present many challenges for machine translation, not least of which is the lack of data resources. We use crowdsourcing to cheaply and quickly build Levantine-English and Egyptian-English parallel corpora, consisting of 1.1M words and 380k words, respectively. The dialectal sentences are selected from a large corpus of Arabic web te...
A common cause of errors in spoken language systems is the presence of out-of-vocabulary (OOV) words in the input. Named entities (people, places, organizations, etc.) are a particularly important class of OOVs. In this paper we focus on detecting OOV named entities (NEs) for two-way English/Iraqi speech-to-speech translation. Our approach builds o...
The availability of substantial, in-domain parallel corpora is critical for the development of high-performance statistical machine translation (SMT) systems. Such corpora, however, are expensive to produce due to the labor intensive nature of manual translation. We propose to alleviate this problem with a novel, semi-supervised, batch-mode active...
Speech-to-speech translation systems have made a great deal of progress in recent years. But users of such systems still face the problem of not knowing whether the system has translated their utterance correctly. Various confirmation strategies can be used to address this problem. Some of these generate a confirmation utterance for the user to app...
Production of parallel training corpora for the development of statistical machine translation (SMT) systems for resource-poor languages usually requires extensive manual effort. Active sample selection aims to reduce the labor, time, and expense incurred in producing such resources, attaining a given performance benchmark with the smallest possibl...
We report on recent improvements in our English/Iraqi Arabic speech-to-speech translation system. User interface improvements include a novel parallel approach to user confirmation which makes confirmation cost-free in terms of dialog duration. Automatic speech recognition improvements include the incorporation of state-of-the-art techniques in fea...
In this paper, we describe a novel approach that exploits intra-sentence and dialog-level context for improving translation performance on spoken Iraqi utterances that contain named entities (NEs). Dialog-level context is used to predict whether the Iraqi response is likely to contain names and the intra-sentence context is used to determine words...
Speech-to-speech translation (S2S) technology holds out the promise of allowing spoken communication across language barriers. Recently, there has been a great deal of progress in S2S technology, much of it under the sponsorship of DARPA's TransTac program. In this paper, we present BBN's S2S system, "TransTalk", whose development has been funded u...
We report on recent ASR and MT work on our English/Iraqi Arabic speech-to-speech translation system. We present detailed results for both objective and subjective evaluations of translation quality, along with a detailed analysis and categorization of translation errors. We also present novel ideas for quantifying the relative importance of differe...
In this paper, we introduce a new metric which we call the semantic translation error rate, or STER, for evaluating the performance of machine translation systems. STER is based on the previously published translation error rate (TER) (Snover et al., 2006) and METEOR (Banerjee and Lavie, 2005) metrics. Specifically, STER extends TER in two ways: fi...
In this paper we present a speech-to-speech translation system configured for translingual communication in English and colloquial Iraqi on a mobile, handheld device. The end-to-end system employs a medium/large vocabulary n-gram speech recognition engine for recognizing English and colloquial Iraqi, a question canonicalizer for mapping a recognize...
In this paper, we present a 2-way speech-to-speech translation system for English and Iraqi colloquial Arabic, the dialect of Arabic spoken by ordinary people in Iraq. The application domain of the system is military force protection, including municipal services surveys, detainee screening, and descriptions of people, houses, vehicles, etc. The sy...
We describe and present evaluation results for Talk'n'Travel, a spoken dialogue language system for making air travel plans over the telephone. Talk'n'Travel is a fully conversational, mixed-initiative system that allows the user to specify the constraints on his travel plan in arbitrary order, ask questions, etc., in general spoken English. The sys...
This paper describes the evaluation methodology and results of the 2001 DARPA Communicator evaluation. The experiment spanned 6 months of 2001 and involved eight DARPA Communicator systems in the travel planning domain. It resulted in a corpus of 1242 dialogs which include many more dialogues for complex tasks than the 2000 evaluation. We describ...
this document, the user will get fired. This means that in every possible future where the system fails to print this document, the user gets fired. This research is by Andrew Haas, who is also working on a planning program that uses these ideas
We present a natural language interface system which is based entirely on trained statistical models. The system consists of three stages of processing: parsing, semantic interpretation, and discourse. Each of these stages is modeled as a statistical process. The models are fully integrated, resulting in an end-to-end system that maps input utteran...
We propose a distinction between two kinds of metonymy: "referential" metonymy, in which the referent of an NP is shifted, and "predicative" metonymy, in which the referent of the NP is unchanged and the argument place of the predicate is shifted instead. Examples are, respectively, "The hamburger is waiting for his check" and "Which airlines fl...
We describe Talk'n'Travel, a spoken dialogue language system for making air travel plans over the telephone.
We present a computational treatment of the semantics of plural Noun Phrases which extends an earlier approach presented by Scha [7] to be able to deal with multiple-level plurals ("the boys and the girls", "the juries and the committees", etc.). We argue that the arbitrary depth to which such plural structures can be nested creates a correspond...
the syntactically impossible antecedents. This latter for handling bound anaphora, disjoint reference, and pronominal reference. The algorithm maps over every node in a parse tree in a left-to-right, depth first manner. Forward and backwards coreference, and disjoint reference are assigned during this tree walk. A semantic interpretation procedure...
A new method is presented for simplifying the logical expressions used to represent utterance meaning in a natural language system. This simplification method utilizes the encoded knowledge and the limited inference-making capability of a taxonomic knowledge representation system to reduce the constituent structure of logical expressions. The spe...
This paper describes results of an experiment with 9 different DARPA Communicator systems that participated in the June 2000 data collection. All systems supported travel planning and utilized some form of mixed-initiative interaction. However, they varied in several critical dimensions: (1) They targeted different back-end databases for travel infor...
A central problem for mixed-initiative dialogue management is coping with user utterances that fall outside of the expected sequence of dialogue. Independent initiative by the user may require a complete revision of the future course of the dialogue, even when the system is engaged in activities of its own, such as querying a database, etc. This pa...
We describe the first sentence understanding system that is completely based on learned methods both for understanding individual sentences, and determining their meaning in the context of preceding sentences. We divide the problem into three stages: semantic parsing, semantic classification, and discourse modeling. Each of these stages requires a...
Describes a sentence understanding system that is completely based on learned methods both for understanding individual sentences and for determining their meaning in the context of the preceding sentences. We describe the models used for each of three stages in the understanding: semantic parsing, semantic classification and discourse modeling. Wh...
The design and performance of a complete spoken language understanding system under development at BBN are described. The system, dubbed HARC (Hear And Respond to Continuous speech), successfully integrates state-of-the-art speech recognition and natural language understanding subsystems. The system has been tested extensively on a restricted airli...
This paper presents the Semantic Linker, the fallback component used by the DELPHI natural language component of the BBN spoken language system HARC. The Semantic Linker is invoked when DELPHI's regular chart-based unification grammar parser is unable to parse an input; it attempts to come up with a semantic interpretation by combining the frag...
We have recently made significant changes to the BBN DELPHI syntactic and semantic analysis component. The goal of these changes was to maintain the tight coupling between syntax and semantics characteristic of earlier versions of DELPHI, while making it possible for the system to provide useful semantic interpretations of input for which complet...
We present results from the February '92 evaluation on the ATIS travel planning domain for HARC, the BBN spoken language system (SLS). In addition, we discuss in detail the individual performance of BYBLOS, the speech recognition (SPREC) component. In the official scoring, conducted by NIST, BBN's HARC system produced a weighted SLS score of 43.7 on...
This paper presents the fallback understanding component of BBN's DELPHI NL system. This component is invoked when the core DELPHI system is unable to understand an input. It incorporates both syntax- and frame-based fragment combination sub-components, in an attempt to provide a smoother path from accurate but fragile conventional parsers on the...
We present the "mapping unit" approach to representing subcategorization information, a computational framework for encoding subcategorization information which has been developed and implemented for BBN's DELPHI system (the NL component of the HARC spoken language system). The advantage of our approach to subcategorization lies in its flexibil...
This paper presents the test results of running BBN's HARC spoken language system and DELPHI natural language understanding system on the ATIS benchmarks. We give a brief system overview, and review the major changes that have
This paper reports recent progress on the development of the Delphi natural language component of the BBN spoken language system for the ATIS domain, focussing on the comparative evaluation performed by NIST in June, 1990.
This paper presents recent natural language work on HARC, the BBN Spoken Language System. The HARC system incorporates the Byblos system (6) as its speech recognition component and the natural language system Delphi, which consists of a bottom-up parser paired with an integrated syntax/semantics unification grammar, a discourse module, and a da...
We describe HARC, a system for speech understanding that integrates speech recognition techniques with natural language processing. The integrated system uses statistical pattern recognition to build a lattice of potential words in the input speech. This word lattice is passed to a unification parser to derive all possible associated syntactic stru...
This paper describes the current state of work on unification-based semantic interpretation in HARC (for Hear and Recognize Continuous speech), the BBN Spoken Language System. It presents the implementation of an integrated syntax/semantics grammar written in a unification formalism similar to Definite Clause Grammar. This formalism is described, and...
Theories of semantic interpretation which wish to capture as many generalizations as possible must face up to the manifoldly ambiguous and contextually dependent nature of word meaning. In this paper I present a two-level scheme of semantic interpretation in which the first level deals with the semantic consequences of syntactic structure and the s...
BBN's responsibility is to conduct research and development in natural language interface technology. This responsibility has three aspects:
• to demonstrate state-of-the-art technology in a Strategic Computing application, collecting data regarding the effectiveness of the demonstrated heuristics,
• to conduct research in natural language interface...
An abstract is not available.
Significant advances have been achieved in Speech-to-Speech (S2S) translation systems in recent years. However, rapid configuration of S2S systems for low-resource language pairs and domains remains a challenging problem due to lack of human translated bilingual training data. In this paper, we report on an effort to port our existing English/Iraqi...