Richard Schwartz

Raytheon BBN Technologies | BBN · Speech Language and Multimedia

BS in EE from M.I.T.

About

307 Publications
43,825 Reads
15,831 Citations
Citations since 2017
6 Research Items
2963 Citations
Additional affiliations
February 1972 - August 2013
Raytheon BBN Technologies
Position
  • Principal Investigator
Description
  • Research on many areas of speech and language including speech recognition, keyword spotting, speech coding, speaker ID, OCR, handwriting recognition, information retrieval, topic spotting, name finding, machine translation

Publications (307)
Preprint
Full-text available
Graph-based extractive document summarization relies on the quality of the sentence similarity graph. Bag-of-words or tf-idf based sentence similarity uses exact word matching, but fails to measure the semantic similarity between individual words or to consider the semantic structure of sentences. In order to improve the similarity measure between...
Conference Paper
Full-text available
We propose a neural network model to estimate word translation probabilities for Cross-Lingual Information Retrieval (CLIR). The model estimates better probabilities for word translations than automatic word alignments alone, and generalizes to unseen source-target word pairs. We further improve the lexical neural translation model (and subsequentl...
Article
Full-text available
We investigate modeling strategies for English code-switched words as found in a Swahili spoken term detection system. Code switching, where speakers switch language in a conversation, occurs frequently in multilingual environments, and typically deteriorates STD performance. Analysis is performed in the context of the IARPA Babel program which f...
Article
Many under-resourced languages such as Arabic diglossia or Hindi sub-dialects do not have sufficient in-domain text to build strong language models for use with automatic speech recognition (ASR). Semi-supervised language modeling uses a speech-to-text system to produce automatic transcripts from a large amount of in-domain audio typically to augme...
Article
Full-text available
We present a three-pronged approach to improving Statistical Machine Translation (SMT), building on recent success in the application of neural networks to SMT. First, we propose new features based on neural networks to model various non-local translation phenomena. Second, we augment the architecture of the neural network with tensor layers that c...
Article
Full-text available
We explore the impact of morphological segmentation on keyword spotting (KWS). Despite potential benefits, state-of-the-art KWS systems do not use morphological information. In this paper, we augment a state-of-the-art KWS system with sub-word units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and...
Conference Paper
Full-text available
Recent work has shown success in using neural network language models (NNLMs) as features in MT systems. Here, we present a novel formulation for a neural network joint model (NNJM), which augments the NNLM with a source context window. Our model is purely lexicalized and can be integrated into any MT decoder. We also present several variations of...
Conference Paper
Full-text available
In this paper we describe the Vietnamese conversational telephone speech keyword spotting system under the IARPA Babel program for the 2013 evaluation conducted by NIST. The system contains several, recently developed, novel methods that significantly improve speech-to-text and keyword spotting performance such as stacked bottleneck neural network...
Conference Paper
As shown in [1, 2], score normalization is of crucial importance for improving the Average Term-Weighted Value (ATWV) measure that is commonly used for evaluating keyword spotting systems. In this paper, we compare three different methods for score normalization within a keyword spotting system that employs phonetic search. We show that a new unsup...
Patent
Full-text available
A method for text recognition includes generating a number of text hypotheses for an image, for example, using an HMM based approach using fixed-width analysis features. For each text hypothesis, one or more segmentations are generated and scored at the segmental level, for example, according to character or character group segments of the text hyp...
Article
We compare several approaches, separately and together, for spotting of out-of-vocabulary (OOV) keywords, in terms of their ATWV scores. We considered three types of recognition units (whole words, syllables, and subwords of different lengths) and two basic search strategies (whole-unit, fuzzy phonetic search). In all cases, the search was performe...
Article
This paper presents a set of techniques that we used to improve our keyword search system for the third phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state of the art detection capabilities on audio from highly degraded radio communication channels. The results for both Levantine and Farsi, which...
Article
In this work, we investigate how to improve semi-supervised DNN for low resource languages where the initial systems may have high error rate. We propose using semi-supervised MLP features for DNN training, and we also explore using confidence to improve semi-supervised cross entropy and sequence training. The work conducted in this paper was evalu...
Conference Paper
In this paper, we investigate semi-supervised training for low resource languages where the initial systems may have high error rate (≥ 70.0% word error rate). To handle the lack of data, we study semi-supervised techniques including data selection, data weighting, discriminative training and multilayer perceptron learning to improve system performa...
Conference Paper
We present two techniques that are shown to yield improved Keyword Spotting (KWS) performance when using the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords become commensurate with each other and they more closely correspond to the probability of being correct than raw posteriors; and (ii) system com...
Conference Paper
Full-text available
This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state of the art detection capabilities on audio from highly degraded communication channels. We present results using multiple SID systems differi...
Article
We present a systematic study of the effect of crowdsourced translations on Machine Translation performance. We compare Machine Translation systems trained on the same data but with translations obtained using Amazon's Mechanical Turk vs. professional translations, and show that the same performance is obtained from Mechanical Turk translations at...
Conference Paper
Full-text available
Arabic Dialects present many challenges for machine translation, not least of which is the lack of data resources. We use crowdsourcing to cheaply and quickly build Levantine-English and Egyptian-English parallel corpora, consisting of 1.1M words and 380k words, respectively. The dialectal sentences are selected from a large corpus of Arabic web te...
Article
We present a method that avoids the problem of a large vocabulary recognition system missing keywords due to pruning errors or degraded speech. The method, called white listing, assures that all tokens of all of the keywords are found by the recognizer, albeit with a low score. We show that this method far outperforms methods that attempt to increa...
Article
Up till recently, state-of-the-art, large vocabulary, continuous speech recognition (CSR) had employed hidden Markov modeling (HMM) to model speech sounds. In an attempt to improve over HMM we developed a hybrid system that integrates HMM technology with neural networks. We present the concept of a Segmental Neural Net (SNN) for phonetic modeling i...
Conference Paper
Full-text available
Useful training data for automatic speech recognition systems of colloquial speech is usually limited to expensive in-domain transcription. Broadcast news is an appealing source of easily available data to bootstrap into a new dialect. However, some languages, like Arabic, have deep linguistic differences resulting in poor cross domain performance....
Conference Paper
Full-text available
BBN submitted system combination outputs for Czech-English, German-English, Spanish-English, and French-English language pairs. All combinations were based on confusion network decoding. The confusion networks were built using incremental hypothesis alignment algorithm with flexible matching. A novel bi-gram count feature, which can penalize bi-gra...
Conference Paper
In this paper we attempt to quantify the ability of naive listeners to perform speaker recognition in the context of the NIST evaluation task. We describe our protocol: a series of listening experiments using large numbers of naive listeners (432) on Amazon's Mechanical Turk that attempts to measure the ability of the average human listener to perf...
Article
This chapter describes approaches for translation from speech. Translation from speech presents two new issues. First, of course, we must recognize the speech in the source language. Although speech recognition has improved considerably over the last three decades, it is still far from being a solved problem. In the best of conditions, when the spe...
Article
Full-text available
BBN submitted system combination outputs for Czech-English, German-English, Spanish-English, French-English, and All-English language pairs. All combinations were based on confusion network decoding. An incremental hypothesis alignment algorithm with flexible matching was used to build the networks. The bi-gram decoding weights for the single so...
Article
Full-text available
We present a method for incorporating arbitrary context-informed word attributes into statistical machine translation by clustering attribute-qualified source words, and smoothing their word translation probabilities using binary decision trees. We describe two ways in which the decision trees are used in machine translation: by using the attribute...
Article
Full-text available
In this paper we present a methodology for the scoring of punctuation annotated texts, as well as a preliminary system to perform the task. We modify SCLITE's scoring methodology to support scoring of punctuation. Using this methodology, we show that the error rate of an initial automatic system is comparable to annotator inconsistency. However...
Conference Paper
Full-text available
Previous work on self-training of acoustic models using unlabeled data reported significant reductions in WER assuming a large phonetic dictionary was available. We now assume only those words from ten hours of speech are initially available. Subsequently, we are then given a large vocabulary and then quantify the value of repeating self-training w...
Article
Full-text available
This paper describes a new evaluation metric, TER-Plus (TERp) for automatic evaluation of machine translation (MT). TERp is an extension of Translation Edit Rate (TER). It builds on the success of TER as an evaluation metric and alignment tool and addresses several of its weaknesses through the use of paraphrases, stemming, synonyms, as well as edi...
Conference Paper
Full-text available
We measure the effects of a weak language model, estimated from as little as 100k words of text, on unsupervised acoustic model training and then explore the best method of using word confidences to estimate n-gram counts for unsupervised language model training. Even with 100k words of text and 10 hours of training data, unsupervised acoustic mode...
Article
Full-text available
Automatic Machine Translation (MT) evaluation metrics have traditionally been evaluated by the correlation of the scores they assign to MT output with human judgments of translation performance. Different types of human judgments, such as Fluency, Adequacy, and HTER, measure varying aspects of MT performance that can be captured by autom...
Conference Paper
Full-text available
Automatic Machine Translation (MT) evaluation metrics have traditionally been evaluated by the correlation of the scores they assign to MT output with human judgments of translation performance. Different types of human judgments, such as Fluency, Adequacy, and HTER, measure varying aspects of MT performance that can be captured by automatic MT met...
Conference Paper
Full-text available
This paper describes the incremental hypothesis alignment algorithm used in the BBN submissions to the WMT09 system combination task. The alignment algorithm used a sentence specific alignment order, flexible matching, and new shift heuristics. These refinements yield more compact confusion networks compared to using the pair-wise or incremental TE...
Article
Full-text available
Confusion network decoding has been the most successful approach in combining outputs from multiple machine translation (MT) systems in the recent DARPA GALE and NIST Open MT evaluations. Due to the varying word order between outputs from different MT systems, the hypothesis alignment presents the biggest challenge in confusion network decodi...
Conference Paper
Full-text available
Traditionally, statistical machine translation systems have relied on parallel bi-lingual data to train a translation model. While bi-lingual parallel data are expensive to generate, mono-lingual data are relatively common. Yet mono-lingual data have been under-utilized, having been used primarily for training a language model in the target langu...
Article
Full-text available
In this paper, we discuss how we apply automatically generated semantic knowledge to benefit statistical machine translation (SMT). Currently, almost all statistical machine translation systems rely heavily on memorizing translations of phrases. Some systems attempt to go further and generalize these learned phrase translations into templates using...
Article
Full-text available
Most state-of-the-art statistical machine translation systems use log-linear models, which are defined in terms of hypothesis features and weights for those features. It is standard to tune the feature weights in order to maximize a translation quality metric, using held-out test sentences and their corresponding reference translations. However,...
Article
Full-text available
This paper describes TER-Plus (TERp) the University of Maryland / BBN Technologies submission for the NIST Metric MATR 2008 workshop on automatic machine translation evaluation metrics. TERp is an extension of Translation Edit Rate (TER) that builds off of the success of TER as an evaluation metric and alignment tool while addressing several of its...
Article
Full-text available
This paper addresses two types of classification of noisy, unstructured text such as newsgroup messages: (1) spotting messages containing topics of interest, and (2) automatic conceptual organization of messages without prior knowledge of topics of interest. In addition to applying our hidden Markov model methodology to spotting topics of interest...
Article
This article introduces a new task-based evaluation measure called Relevance Prediction that is a more intuitive measure of an individual’s performance on a real-world task than interannotator agreement. Relevance Prediction parallels what a user does in the real world task of browsing a set of documents using standard search tools, i.e., the user...
Article
This article examines the application of two single-document sentence compression techniques to the problem of multi-document summarization—a “parse-and-trim” approach and a statistical noisy-channel approach. We introduce the multi-candidate reduction (MCR) framework for multi-document summarization, in which many compressed candidates are generat...
Conference Paper
Full-text available
This paper presents a set of experiments that we conducted in order to optimize the performance of an Arabic/English machine translation system on broadcast news and conversational speech data. Proper integration of speech-to-text (STT) and machine translation (MT) requires special attention to issues such as sentence boundary detection, punctuatio...
Conference Paper
Full-text available
In many applications of topic spotting technology, especially those that require a human review of in-topic documents, a low false alarm rate is a key requirement. Topic spotting techniques typically include a rejection scheme to filter out off-topic documents. In this paper we present a robust methodology for rejecting off-topic messages that, in...
Conference Paper
Full-text available
This paper investigates the use of several language model adaptation techniques applied to the task of machine translation from Arabic broadcast speech. Unsupervised and discriminative approaches slightly outperform the traditional perplexity-based optimization technique. Language model adaptation, when used for n-best rescoring, improves machine t...
Conference Paper
Full-text available
Currently there are several approaches to machine translation (MT) based on different paradigms; e.g., phrasal, hierarchical and syntax-based. These three approaches yield similar translation accuracy despite using fairly different levels of linguistic knowledge. The availability of such a variety of systems has led to a growing interest toward f...
Conference Paper
Full-text available
Recently, confusion network decoding has been applied in machine translation system combination. Due to errors in the hypothesis alignment, decoding may result in ungrammatical combination outputs. This paper describes an improved confusion network based method to combine outputs from multiple MT systems. In this approach, arbitrary featu...
Article
Full-text available
This paper aims to quantify the main error types the 2004 BBN speech recognition system made in the broadcast news (BN) and conversational telephone speech (CTS) DARPA EARS evaluations. We show that many of the remaining errors occur in clusters rather than isolated, have specific causes, and differ to some extent between the BN and CTS domains. Th...
Article
Full-text available
This paper describes the progress made in the transcription of broadcast news (BN) and conversational telephone speech (CTS) within the combined BBN/LIMSI system from May 2002 to September 2004. During that period, BBN and LIMSI collaborated in an effort to produce significant reductions in the word error rate (WER), as directed by the aggressive g...
Conference Paper
Full-text available
The region-dependent transform (RDT) is a feature extraction method for speech recognition that employs the Minimum Phoneme Error (MPE) criterion to optimize a set of feature transforms, each concentrating on a region of the acoustic space. Previous results have shown that RDT gives significant recognition-error reduction in a large vocabula...
Conference Paper
Full-text available
Discriminatively trained feature transforms such as MPE-HLDA, fMPE and MMI-SPLICE have been shown to be effective in reducing recognition errors in today's state-of-the-art speech recognition systems. This paper introduces the concept of region dependent linear transform (RDLT), which unifies the above three types of feature transforms and provides...
Conference Paper
Full-text available
In this paper, we present a novel approach for morphological decomposition in large vocabulary Arabic speech recognition. It achieved low out-of-vocabulary (OOV) rate as well as high recognition accuracy in a state-of-the-art Arabic broadcast news transcription system. In this approach, the compound words are decomposed into stems and affixes in bo...
Conference Paper
Full-text available
This paper presents our recent effort that aims at improving our Arabic broadcast news (BN) recognition system by using thousands of hours of un-transcribed Arabic audio in the way of unsupervised training. Unsupervised training is first carried out on the 1,900-hour English topic detection and tracking (TDT) data and is compared with the lightly-s...
Article
Full-text available
We examine a new, intuitive measure for evaluating machine-translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a ref...
Article
Full-text available
We applied a single-document sentence-trimming approach (Trimmer) to the problem of multi-document summarization. Trimmer was designed with the intention of compressing a lead sentence into a space consisting of tens of characters. In our Multi-Document Trimmer (MDT), we use Trimmer to generate multiple trimmed candidates for each sentence. Sent...
Article
Full-text available
In this paper, we present cluster-dependent acoustic modeling for large-vocabulary speech recognition. With a large amount of acoustic training data, we build multiple cluster-dependent models (CDM), each focusing on a group of speakers in order to represent speaker-dependent characteristics. It is motivated by the fact that a sufficiently trained sp...
Article
Full-text available
We implemented an initial application of a sentence-trimming approach (Trimmer) to the problem of multi-document summarization in the MSE2005 and DUC2005 tasks. Sentence trimming was incorporated into a feature-based summarization system, called Multi-Document Trimmer (MDT), by using sentence trimming as both a pre-processing stage and a feature...