Srinivas Bangalore

AT&T

About

184
Publications
15,728
Reads
4,185
Citations
Citations since 2016
0 Research Items
790 Citations

Publications (184)
Patent
A method, system and computer readable medium that generates a dialog model for use in automated dialog is disclosed. The method may include collecting a plurality of task-oriented dialog interactions between users and human agents for a given domain, identifying one or more tasks in each dialog interaction, identifying one or more subtasks in each...
Conference Paper
Full-text available
Statistical Machine Translation (SMT) systems are heavily dependent on the quality of parallel corpora used to train translation models. Translation quality between certain Indian languages is often poor due to the lack of training data of good quality. We used triangulation as a technique to improve the quality of translations in cases where the d...
Conference Paper
Full-text available
Morphologically rich languages generally require large amounts of parallel data to adequately estimate parameters in a statistical Machine Translation (SMT) system. However, it is time consuming and expensive to create large collections of parallel data. In this paper, we explore two strategies for circumventing sparsity caused by lack of large para...
Conference Paper
Full-text available
The purpose of the current investigation is to predict post-editor profiles based on user behaviour and demographics using machine learning techniques to gain a better understanding of post-editor styles. Our study extracts process unit features from the CasMaCat LS14 database from the CRITT Translation Process Research Database (TPR-DB). The analy...
Conference Paper
Full-text available
Professional human translation is necessary to meet high quality standards in industry and governmental agencies. Translators engage in multiple activities during their task, and there is a need to model their behavior, with the objective to understand and optimize the translation process. In recent years, user interfaces enabled us to record user...
Conference Paper
Full-text available
In a CAT (Computer Assisted Translation) system, a human translator translates a source language string into a target language string using different input methods such as speech and typing. In this paper, we improve the performance of speech recognition of a translator speaking in the target language, taking advantage of source language strin...
Patent
A system and method are disclosed for using and updating a database of template responses for a live agent in response to user communications. The method includes computing an average string distance between each response from a live agent and a template used to generate the response, modifying the computed average string distance based on a customer s...
Conference Paper
Full-text available
Researchers are proposing interactive machine translation as a potential method to make the language translation process more efficient and usable. The introduction of different modalities like eye gaze and speech is being explored to add to the interactivity of language translation systems. Unfortunately, the raw data provided by Automatic Speech Recognit...
Conference Paper
Full-text available
In this paper, we discuss our efforts in the development of Indian spoken language corpora for building large vocabulary speech recognition systems using the WATSON Toolkit. The current paper demonstrates that these corpora can be reduced to a varied degree for various phonemes by comparing the similarity among phonemes of different languages. We also...
Patent
A method, system and machine-readable medium are provided for watermarking natural language digital text. A deep structure may be generated and a group of features may be extracted from natural language digital text input. The deep structure may be modified based, at least partly, on a watermark. Natural language digital text output may be generate...
Patent
Systems, methods, and non-transitory computer-readable media for referring to entities. The method includes receiving domain-specific training data of sentences describing a target entity in a context, extracting a speaker history and a visual context from the training data, selecting attributes of the target entity based on at least one of the spe...
Patent
A system to generate a response to a text-based natural language message includes a user interface, processing device, and a computer-readable storage medium storing executable instructions to generate the response to the text-based natural language message. The instructions and a method for generating the response include identifying a sentence in...
Patent
A website mining tool is disclosed that extracts information from, for example, a company's website and presents the extracted information in a graphical user interface (GUI). In one embodiment, web pages from a website are stored in, for example, computer memory and a structure of the web pages is identified. A plurality of blocks of information i...
Conference Paper
Full-text available
Typing has traditionally been the only input method used by human translators working with computer-assisted translation (CAT) tools. However, speech is a natural communication channel for humans and, in principle, it should be faster and easier than typing from a keyboard. This contribution investigates the integration of automatic speech recog...
Conference Paper
The Web is an ever increasing, dynamically changing, multilingual repository of text. There have been several approaches to harvest this repository for bootstrapping, supplementing and adapting data needed for training models in speech and language applications. In this paper, we present semi-supervised and unsupervised approaches to harvesting mul...
Conference Paper
In a conventional telephone conversation between two speakers of the same language, the interaction is real-time and the speakers process the information stream incrementally. In this work, we address the problem of incremental speech-to-speech translation (S2S) that enables cross-lingual communication between two remote participants over a telepho...
Conference Paper
A key step in validating a proposed idea or system is to evaluate over a suitable dataset. However, to this date there have been no useful tools for researchers to understand which datasets have been used for what purpose, or in what prior work. Instead, they have to manually browse through papers to find the suitable datasets and their correspondi...
Conference Paper
Natural Language Processing systems are often composed of a sequence of transductive components that transform their input into an output with additional syntactic and/or semantic labels. However, each component in this chain is typically error-prone and the error is magnified as the processing proceeds down the chain. In this paper, we present det...
Conference Paper
We introduce a simple and novel method for the weakly supervised problem of Part-Of-Speech tagging with a dictionary. Our method involves training a connectionist network that simultaneously learns a distributed latent representation of the words, while maximizing the tagging accuracy. To compensate for the unavailability of true labels, we resort...
Article
Conventional approaches to speech-to-speech (S2S) translation typically ignore key contextual information such as prosody, emphasis, discourse state in the translation process. Capturing and exploiting such contextual information is especially important in machine-mediated S2S translation as it can serve as a complementary knowledge source that can...
Conference Paper
Full-text available
Word prediction performed by language models has an important role in many tasks, such as word sense disambiguation, speech recognition, handwriting recognition, query spelling and query segmentation. Recent research has exploited the textual content of the Web to create language models. In this paper, we propose a new focused crawling strategy to...
Conference Paper
Parallel text acquisition from the Web is an attractive way for augmenting statistical models (e.g., machine translation, crosslingual document retrieval) with domain representative data. The basis for obtaining such data is a collection of pairs of bilingual Web sites or pages. In this work, we propose a crawling strategy that locates bilingual Web...
Conference Paper
Full-text available
With the widespread adoption of high-speed wireless networks symbiotically complemented by the burgeoning demand for smart mobile devices, access to the Internet is evolving from personal computers (PCs) to mobile devices. In this article, we highlight the characteristics of mobile search, discuss the state of speech-based mobile search, and presen...
Conference Paper
Full-text available
Distributed representations of words are attractive since they provide a means for measuring word similarity. However, most approaches to learning distributed representations are divorced from the task context. In this paper, we describe a model that learns distributed representations of words in order to optimize task performance. We investigate t...
Article
In this paper, we present techniques that exploit finite-state models for voice search applications. In particular, we illustrate the use of finite-state models for encoding the search index in order to tightly integrate the speech recognition and the search components of a voice search system. We show that the tight integration mutually benefits A...
Conference Paper
Full-text available
State-of-the-art probabilistic models of text such as n-grams require an exponential number of examples as the size of the context grows, a problem that is due to the discrete word representation. We propose to solve this problem by learning a continuous-valued and low-dimensional mapping of words, and base our predictions for the probabilities of...
Conference Paper
Full-text available
There are several theories regarding what influences prominence assignment in English noun-noun compounds. We have developed corpus-driven models for automatically predicting prominence assignment in noun-noun compounds using feature sets based on two such theories: the informativeness theory and the semantic composition theory. The evaluation of t...
Conference Paper
We present an approach for enriching dialog based text-to-speech (TTS) synthesis systems by explicitly controlling the expressiveness through the use of dialog act tags. The dialog act tags in our framework are automatically obtained by training a maximum entropy classifier on the Switchboard-DAMSL data set, unrelated to the TTS database. We compare...
Article
Full-text available
Determining the coreference of entity mentions in a discourse is a key part of the interpretation process for advanced spoken dialog applications. In this paper, we present the most comprehensive system for statistical coreference resolution in dialog to date. We also compare the impact of two contrasting theories of dialog structure (the stack mod...
Article
A system and method provide a natural language interface to world-wide web content. Either in advance or dynamically, webpage content is parsed using a parsing algorithm. A person using a telephone interface can provide speech information, which is converted to text and used to automatically fill in input fields on a webpage form. The form is then...
Conference Paper
Full-text available
Mobile devices are being used as aids to access a variety of information resources by browsing the Web. However, given their limited screen real estate and soft keyboards, general Web browsing to access information is a tedious task. A system that (a) allows a user to specify their information need as a spoken language query and (b) returns the ans...
Conference Paper
Full-text available
Mobile devices are becoming the dominant mode of information access, even though entering text on small keyboards and browsing web pages on small screens is cumbersome. We present Qme!, a speech-based question-answering system that allows for spoken queries and retrieves answers to the questions instead of web pages. We present bootstrap methods t...
Conference Paper
Full-text available
The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
Article
Investigations into employing statistical approaches with linguistically motivated representations and their impact on natural language processing tasks. © 2010 Massachusetts Institute of Technology. All rights reserved.
Article
Prosody is an important cue for identifying dialog acts. In this paper, we show that modeling the sequence of acoustic–prosodic values as n-gram features with a maximum entropy model for dialog act (DA) tagging can perform better than conventional approaches that use coarse representation of the prosodic contour through summative statistics of the...
Article
Full-text available
Multimodal grammars provide an effective mechanism for quickly creating integration and understanding capabilities for interactive systems supporting simultaneous use of multiple input modalities. However, like other approaches based on hand-crafted grammars, multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent inp...
Article
Statistical phrase-based machine translation models crucially rely on word alignments. The search for word-alignments assumes a model of word locality between source and target languages that is violated in starkly different word-order languages such as English-Hindi. In this article, we present models that decouple the steps of lexical selection a...
Conference Paper
Full-text available
With the exponential growth in the number of mobile devices, voice enabled local search is emerging as one of the most popular applications. Although the application is essentially an integration of automatic speech recognition (ASR) and text or database search, the potential usefulness of this application has been widely acknowledged. In this pape...
Conference Paper
Full-text available
Current statistical speech translation approaches predominantly rely on just text transcripts and are limited in their use of rich contextual information such as prosody and discourse function. In this paper, we explore the role of discourse context characterized through dialog acts (DAs) in statistical translation. We present a bag-of-words (BOW)...
Conference Paper
Full-text available
In this paper, we discuss the benefits of tightly coupling speech recognition and search components in the context of a speech-driven search application. We demonstrate that incorporating constraints from the information repository that is being searched not only improves the speech recognition accuracy but also results in higher search ac...
Conference Paper
Full-text available
MICA is a dependency parser which returns deep dependency representations, is fast, has state-of-the-art performance, and is freely available.
Conference Paper
Full-text available
In this paper, we present an integrated model of the two central tasks of dialog management: interpreting user actions and generating system actions. We model the interpretation task as a classification problem and the generation task as a prediction problem. These two tasks are interleaved in an incremental parsing-based dialog model. We co...
Conference Paper
Full-text available
Mobile voice-enabled search is emerging as one of the most popular applications abetted by the exponential growth in the number of mobile devices. The automatic speech recognition (ASR) output of the voice query is parsed into several fields. Search is then performed on a text corpus or a database. In order to improve the robustness of the query...
Conference Paper
Full-text available
Key contextual information such as word prominence, emphasis, and contrast is typically ignored in speech-to-speech (S2S) translation due to the compartmentalized nature of the translation process. Conventional S2S systems rely on extracting prosody dependent cues from hypothesized (possibly erroneous) translation output using only words an...
Conference Paper
Full-text available
In this paper we describe a statistical shared plan-based approach to dialog modeling and dialog management. We apply this approach to a corpus of human-human spoken dialogs. We compare the performance of models trained on transcribed and automatically recognized speech, and present ideas for further improving the models.
Article
Full-text available
In the first REG competition, researchers proposed several general-purpose algorithms for attribute selection for referring expression generation. However, most of this work did not take into account: a) stylistic differences between speakers; or b) trainable surface realization approaches that combine semantic and word order information. In this...
Conference Paper
Full-text available
Previous work in referring expression generation has explored general purpose techniques for attribute selection and surface realization. However, most of this work did not take into account: a) stylistic differences between speakers; or b) trainable surface realization approaches that combine semantic and word order information. In this paper we d...
Conference Paper
Full-text available
Current statistical speech translation approaches predominantly rely on just text transcripts and do not adequately utilize rich contextual information such as that conveyed through prosody and discourse function. In this paper, we explore the role of context characterized through dialog acts (DAs) in statistical translation. We demonstrate...
Article
Full-text available
In this paper, we describe a maximum entropy-based automatic prosody labeling framework that exploits both language and speech information. We apply the proposed framework to both prominence and phrase structure detection within the Tones and Break Indices (ToBI) annotation scheme. Our framework utilizes novel syntactic features in the form of supe...
Conference Paper
Full-text available
Most existing generation systems for spoken dialog require the system engineer to specify by hand the words to be used in system prompts. However, the existence of corpora of spoken dialog makes it possible to acquire the words and structure of system prompts automatically. In this paper, we construct statistical models for generating system prompt...
Article
Prosody is an important cue for identifying dialog acts. In this paper, we show that modeling the sequence of acoustic-prosodic values as n-gram features with a maximum entropy model for dialog act (DA) tagging can perform better than conventional approaches that use coarse representation of the prosodic contour through acoustic correlates of proso...
Conference Paper
Full-text available
Compared to the telephone, email-based customer care is increasingly becoming the preferred channel of communication for corporations and customers. Most email-based customer care management systems provide a method to include template texts in order to reduce the handling time for a customer’s email. The text in a template is su...
Conference Paper
Full-text available
With the explosive growth in mobile computing and communication over the past few years, it is possible to access almost any information from virtually anywhere. However, the efficiency and effectiveness of this interaction is severely limited by the inherent characteristics of mobile devices, including small screen size and the lack of a viable ke...
Conference Paper
Full-text available
Cue-based automatic dialog act tagging uses lexical, syntactic and prosodic knowledge in the identification of dialog acts. In this paper, we propose a discriminative framework for automatic dialog act tagging using maximum entropy modeling. We propose two schemes for integrating prosody in our modeling framework: (i) Syntax-based categori...
Article
Full-text available
Machine translation of a source language sentence involves selecting appropriate target language words and ordering the selected words to form a well-formed target language sentence. Most of the previous work on statistical machine translation relies on (local) associations of target words/phrases with source words/phrases for lexical selection...
Conference Paper
In this paper we describe an automatic prosody labeling framework that exploits both language and speech information intended for the purpose of incorporating prosody within a speech-to-speech translation framework. We propose a maximum entropy syntactic-prosodic model that achieves an accuracy of 85.22% and 91.54% for pitch accent and boundary to...
Conference Paper
Full-text available
Machine translation of a source language sentence involves selecting appropriate target language words and ordering the selected words to form a well-formed target language sentence. Most of the previous work on statistical machine translation relies on (local) associations of target words/phrases with source words/phrases for lexical sel...
Article
There has been a contemporary surge of interest in the application of stochastic models of parsing. The use of tree-adjoining grammar (TAG) in this domain has been relatively limited due in part to the unavailability, until recently, of large-scale corpora hand-annotated with TAG structures. Our goals are to develop inexpensive means of generating...
Conference Paper
Full-text available
With the availability of large corpora of spoken dialog, it is now possible to use data-driven techniques to build and use models of task-oriented dialogs. In this paper, we use data-driven techniques to build task structures for individual dialogs, and use the dialog task structures for: dialog act classification, task/subtask classification, task...
Conference Paper
Full-text available
Data-driven techniques have influenced many aspects of speech and language processing. Models derived from data are generally more robust than hand-crafted systems since they better reflect the distributions of the phenomena being modeled. With the availability of large spoken dialog corpora, dialog management can now reap the benefit of data-drive...
Conference Paper
Full-text available
Multimodal grammars provide an expressive formalism for multimodal integration and understanding. However, hand-crafted multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent inputs. In previous work, we have shown how the robustness of stochastic language models can be combined with the expressiveness of multimodal...
Article
Full-text available
Spoken language understanding (SLU) aims at extracting meaning from natural language speech. Over the past decade, a variety of practical goal-oriented spoken dialog systems have been built for limited domains. SLU in these systems ranges from understanding predetermined phrases through fixed grammars, extracting some predefined named entities, ext...
Conference Paper
Full-text available
Multimodal grammars provide an expressive formalism for multimodal integration and understanding. However, hand-crafted multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent inputs. Spoken language (speech-only) understanding systems have addressed this issue of lack of robustness of hand-crafted grammars by...
Article
Full-text available
In this paper, we present our system for statistical machine translation that is based on weighted finite-state transducers. We describe the construction of the transducer, the estimation of the weights, acquisition of phrases (locally ordered tokens) and the mechanism we use for global reordering. We also present a novel approach to machine trans...
Article
Summary form only given. A typical text-based natural language application (e.g., machine translation, summarization, information extraction) consists of a pipeline of preprocessing steps such as tokenization, stemming, part-of-speech tagging, named entity detection, chunking, parsing. Information flows downstream through the preprocessing steps alon...
Conference Paper
Full-text available
Arabic orthography does not provide full vocalization of the text, and the reader is expected to infer short vowels from the context of the sentence. Inferring the full form of a word is useful when developing Arabic speech and language processing tools, since it is likely to reduce ambiguity in these tasks. In this paper, we present generative tec...
Article
Multimodal interfaces are systems that allow input and/or output to be conveyed over multiple channels such as speech, graphics, and gesture. In addition to parsing and understanding separate utterances from different modes such as speech or gesture, multimodal interfaces also need to parse and understand composite multimodal utterances that are di...
Article
Full-text available
Many speech and language processing problems have been successfully cast as classification problems: associating a token with a label from a prespecified label set. However, in all these applications the set of labels is regarded as a flat list of symbols with no inherent internal structure and no co-constraints among the labels. In this paper,...