Gina-Anne Levow's research while affiliated with University of Washington Seattle and other places

Publications (121)

Article
Our study tests the acoustic fidelity of remote recordings, using a large variety of stimuli and recording environments. With standard recording environments not available due to COVID-19, more studies investigate remote recordings for acoustic analyses [e.g., Guan and Li (2021); Freeman and De Decker (2021)]. High fidelity remote recordings also s...
Article
Automatic speech recognition (ASR) has seen dramatic improvements as a result of advances in deep learning. This has led several recent studies to conclude that ASR is approaching parity with human performance, at least in certain speech contexts. These studies use average word error rate (WER) as their primary evaluation...
Preprint
Segmentation remains an important preprocessing step both in languages where "words" or other important syntactic/semantic units (like morphemes) are not clearly delineated by white space, as well as when dealing with continuous speech data, where there is often no meaningful pause between words. Near-perfect supervised methods have been developed...
Article
Advances in speech and language processing have enabled the creation of applications that could, in principle, accelerate the process of language documentation, as speech communities and linguists work on urgent language documentation and reclamation projects. However, such systems have yet to make a significant impact on language documentation, as...
Preprint
Full-text available
This study presents a corpus of turn changes between speakers in U.S. Supreme Court oral arguments. Each turn change is labeled on a spectrum of "cooperative" to "competitive" by a human annotator with legal experience in the United States. We analyze the relationship between speech features, the nature of exchanges, and the gender and legal role o...
Chapter
Automatic speech recognition (ASR) can be deployed in a previously unknown language, in less than 24 h, given just three resources: an acoustic model trained on other languages, a set of language-model training data, and a grapheme-to-phoneme (G2P) transducer to connect them. The LanguageNet G2Ps were created with the goal of being small, fast, and...
Preprint
Full-text available
We present software that, in only a few hours, transcribes forty hours of recorded speech in a surprise language, using only a few tens of megabytes of noisy text in that language, and a zero-resource grapheme to phoneme (G2P) table. A pretrained acoustic model maps acoustic features to phonemes; a reversed G2P maps these to graphemes; then a langu...
Article
This special issue contains four articles based on and expanded from systems presented at the SIGHAN-7 Chinese Spelling Check Bakeoff. We provide an overview of the approaches and designs for Chinese spelling checkers presented in these articles. We conclude this introductory article with a summary of possible future directions.
Conference Paper
Full-text available
This study investigates characteristics of stance-related discourse function, stance strength, and polarity in uses of the word 'yeah.' In an annotated corpus of 20 talker dyads engaged in collaborative tasks, over 2300 'yeahs' fall into six common stance-act categories. While agreement, usually with weak, positive stance, accounts for about three-...
Conference Paper
Full-text available
From activities as simple as scheduling a meeting to those as complex as balancing a national budget, people take stances in negotiations and decision making. While the related areas of subjectivity and sentiment analysis have received significant attention, work has focused almost exclusively on text, and much stance-taking activity is carried out...
Conference Paper
Full-text available
Normally, yeah has positive polarity, but with a change in prosody, it can convey a negative stance (e.g., polite disagreement/rejection). This study examines acoustic-prosodic features of ‘negative yeahs’ in a stance-rich corpus of collaborative tasks. Four categories are identified based on degree of agreement/acceptance and distinguished by an i...
Conference Paper
Imprecise articulatory breakdown is one of the characteristics of dysarthric speech. This work attempts to develop a framework to automatically identify problematic articulatory patterns of dysarthric speakers in terms of distinctive features (DFs), which are effective for describing speech production. The identification of problematic articulatory...
Poster
Full-text available
Stance, or a speaker’s attitudes or opinions about the topic of discussion, has been investigated textually in conversation- and discourse analysis and in computational models, but little work has focused on its acoustic-phonetic properties. This is a difficult problem, given that stance is a complex activity that must be expressed along with sever...
Conference Paper
Full-text available
The ATAROS project aims to identify acoustic signals of stance-taking in order to inform the development of automatic stance recognition in natural speech. Due to the typically low frequency of stance-taking in existing corpora that have been used to investigate related phenomena such as subjectivity, we are creating an audio corpus of unscripted c...
Technical Report
Full-text available
This technical report describes the ATAROS (Automatic Tagging and Recognition of Stance) corpus design, collection, and transcription procedures, as well as current work in progress: stance annotation protocols and task validation measures. Portions of this work were presented at the ASA 2014 Spring Meeting and are under review for Interspeech 2014...
Poster
Full-text available
While stance-taking has been examined qualitatively within conversation and discourse analysis and modeled using text-based approaches in computational linguistics, there has been little quantification of its acoustic-phonetic correlates. One reason for this is the relative sparsity of stance-taking behavior in spontaneous conversations. Another is...
Article
Differences in pronunciation have been shown to underlie significant talker-dependent intelligibility differences. There are several dimensions of variability that are correlated with talker intelligibility including pitch range, vowel-space expansion, and rhythmic patterns. Prior work has shown that some of the better predictors of individual inte...
Article
In this chapter, the authors describe their work in developing a crowdsourcing methodology for spoken dialog system (SDS) evaluation through Amazon Mechanical Turk (MTurk), a popular crowdsourcing marketplace that makes use of human intelligence online to perform tasks which cannot be completed entirely by computer programs. The chapter is organize...
Book
Provides an insightful and practical introduction to crowdsourcing as a means of rapidly processing speech data. Intended for those who want to get started in the domain and learn how to set up a task, what interfaces are available, how to assess the work, etc. as well as for those who already have used crowdsourcing and want to create better tasks...
Article
We propose a collaborative filtering (CF) model to predict user satisfaction in SDS evaluation. Inspired by the use of CF in recommendation systems, where a user's preference for a new item is assumed to resemble that for similar items rated previously, we adapt the idea to predict user evaluations of unrated dialogs based on the ratings received by...
Conference Paper
Verbal feedback provides important cues in establishing interactional rapport. The challenge of recognizing contexts for verbal feedback largely arises from relative sparseness and optionality. In addition, cross-language and inter-speaker variations can make recognition more difficult. In this paper, we show that boosting can improve accuracy in r...
Conference Paper
Spoken dialog systems frameworks fill a crucial role in the spoken dialog systems community by providing resources to lower barriers to entry. However, different user groups have different requirements and expectations for such systems. Here, we consider the particular needs for spoken dialog systems toolkits within an instructional setting. We dis...
Presentation
Full-text available
Stance-taking, evaluation, attitude- and opinion-expression have been examined qualitatively within conversation analysis and discourse analysis. Recognition of evaluation has also received attention within computational linguistics. However, little work has been done on the phonetic correlates of such affective expression. In order to examine stan...
Article
Full-text available
Diverse multi-modal behaviors provide important cues in establishing and maintaining interactional rapport. However, these behaviors are often subtle and culture-specific. In this paper, we focus on two forms of backchannel behavior: vocal backchannels and non-verbal headnods. We employ a corpus of quasi-monologic story-telling interactions elicite...
Conference Paper
We propose a tone recognition approach that employs linear-chain Conditional Random Fields (CRF) to model tone variation due to intonation effects. We implement three linear-chain CRFs which aim at modeling intonation effects at phrase-, sentence-, and story-level boundaries, where we show that standard recognition techniques degrade and common normaliz...
Article
Full-text available
Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems genes within one set of orthologs may have over a dozen different names. Classifying research publications based on biological processes, cellular components, molecular functions, and mi...
Chapter
The University of Maryland participated in the TDT-1999 topic tracking task. This chapter describes the system architecture, including source-dependent normalization, and then focuses on the cross-language case in which English training stories were used to find Mandarin stories on the same topic. Processes that may introduce noise, including error...
Conference Paper
Developing accurate models to automatically predict user satisfaction about the overall quality of a Spoken Dialog System (SDS) is highly desirable for SDS evaluation. In the original PARADISE framework, a linear regression model is trained using measures drawn from rated dialogs as predictors with user satisfaction as the target. In this paper, we...
Conference Paper
Verbal feedback is an important information source in establishing interactional rapport. However, predicting verbal feedback across languages is challenging due to language-specific differences, inter-speaker variation, and the relative sparseness and optionality of verbal feedback. In this paper, we employ an approach combining classifier weighti...
Article
This paper presents an initial attempt at the use of crowd-sourcing for collection of user judgments on spoken dialog systems (SDSs). This is implemented on Amazon Mechanical Turk (MTurk), where a Requester can design a human intelligence task (HIT) to be performed by a large number of Workers efficiently and cost-effectively. We describe a design...
Article
Development of spoken dialog systems (SDSs) can be facilitated by better evaluation methods. Previous methods seldom consider the efficiency of the system, which is important to users. We study the problem of evaluating SDSs and propose a new framework by generalizing states from utterances of dialogs to build finite state machine (FSM). These stat...
Article
User evaluations of dialogs from a spoken dialog system (SDS) can be directly used to gauge the system's performance. However, it is costly to obtain manual evaluations of a large corpus of dialogs. Semi-supervised learning (SSL) provides a possible solution. This process learns from a small amount of manually labeled data, together with a large am...
Conference Paper
Full-text available
Aspects of speech and non-verbal behavior allow conversational partners to establish and maintain rapport by signaling engagement or endorsement. In the verbal channel, these factors encompass requests for and production of vocal feedback, as well as lexical and grammatical mirroring. However, these cues are often subtle and culture-specific. Here,...
Conference Paper
Prosody plays an integral role in spoken language understanding. In isiZulu, a Nguni family language with lexical tone, prosodic information determines word meaning. We assess the impact of models of tone and coarticulation for tone recognition. We demonstrate the importance of modeling prosodic context to improve tone recognition. We employ this l...
Conference Paper
Acquisition of prosody, in addition to vocabulary and grammar, is essential for language learners. However, it has received less attention in instruction. To enable automatic identification and feedback on learners' prosodic errors, we investigate automatic pitch accent labeling for non-native speech. We demonstrate that an acoustic-based cont...
Conference Paper
We investigate several measures of voice quality (VQ) to improve tone recognition in Mandarin Chinese. We find that band energy measures such as spectral balance (Sluijter and van Heuven, 1996) work better than measures based on glottal flow estimation and harmonic-formant differences. We also determine a set of bands and measures that improve tone...
Conference Paper
This paper discusses a new approach to improve tone recognition by modeling the tone nucleus with vowel landmark detection. The tone nucleus region is identified based on vowel landmark frames derived by an automatic landmark recognition system. In the corresponding tone recognition experiments, the best results with landmark-based tone nucleus reg...
Article
Teaching Computational Linguistics is inherently multi-disciplinary and frequently poses challenges and provides opportunities in teaching to a student body with diverse educational backgrounds and goals. This paper describes the use of a computational environment (SIDGrid) that facilitates interdisciplinary instruction by providing suppo...
Article
Full-text available
The SIDGrid architecture provides a framework for distributed annotation, archiving, and analysis of the rapidly growing volume of multimodal data. The framework integrates three main components: an annotation and analysis client, a web-accessible data repository, and a portal to the distributed processing capability of the TeraGrid. The archi...
Article
Document indexing and representation of term-document relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by the recent success of co-occurrenc...
Conference Paper
Document representation has a large impact on the performance of document retrieval and clustering algorithms. We propose a hybrid document indexing scheme that combines the traditional bag-of-words representation with spectral embedding. This method accounts for the specifics of the document collection and also uses semantic similarity info...
Article
Full-text available
The Social Informatics Data Grid is a new infrastructure designed to transform how social and behavioral scientists collect and annotate data, collaborate and share data, and analyze and mine large data repositories. An important goal of the project is to be compatible with existing databases and tools that support the sharing, storage and retrieva...
Article
The Third International Chinese Language Processing Bakeoff was held in Spring 2006 to assess the state of the art in two important tasks: word segmentation and named entity recognition. Twenty-nine groups submitted result sets in the two tasks across two tracks and a total of five corpora. We found strong results in both tasks as well as continuin...
Article
Document indexing and representation of term-document relations are very important for document clustering and retrieval. In this paper, we combine a graph-based dimensionality reduction method with a corpus-based association measure within the Generalized Latent Semantic Analysis framework. We evaluate the graph-based GLSA on the document cluster...
Article
We investigate the use in Mandarin tone recognition of over two hundred possible local acoustic features based on pitch, overall intensity, and band-passed intensity in the rhyme of a syllable. Features involving pitch height are not as useful as one might expect, showing the need for phrase-level pitch height correction. The intensity contour is...
Conference Paper
To improve tone recognition in continuous speech, we propose a strategy focusing on separating regions influenced by tonal coarticulation from regions that more closely approximate canonical tone production. Given a syllable segmentation, this approach employs amplitude and pitch information to generate an improved sub-syllable segmentation and...
Conference Paper
We use a combination of linear support vector machines and hidden Markov models for dialog act tagging in the HCRC MapTask corpus, and obtain better results than those previously reported. Support vector machines allow easy integration of sparse high-dimensional text features and dense low-dimensional acoustic features, and produce posterior p...
Conference Paper
Recognition of tone and intonation is essential for speech recognition and language understanding. However, most approaches to this recognition task have relied upon extensive collections of manually tagged data obtained at substantial time and financial cost. In this paper, we explore two approaches to tone learning with substantially...
Conference Paper
Term translation probabilities proved an effective method of semantic smoothing in the language modelling approach to information retrieval tasks. In this paper, we use Generalized Latent Semantic Analysis to compute semantically motivated term and document vectors. The normalized cosine similarity between the term vectors is used as term trans...
Article
Most cues for Mandarin tone recognition involve pitch, overall intensity and duration. This paper investigates ten other possible cues, and finds one that results in modest, but significant, improvement in classification accuracy on a small speaker-independent corpus of Mandarin news broadcast speech. This cue consists of the energies in the sixt...
Article
Cross-language information retrieval (CLIR) systems allow users to find documents written in different languages from that of their query. Simple knowledge structures such as bilingual term lists have proven to be a remarkably useful basis for bridging that language gap. A broad array of dictionary-based techniques have demonstrated utility, but co...
Article
We use Support Vector Machines for Dialog Act Tagging in the HCRC MapTask corpus, and achieve 64.5% classification accuracy using text and prosodic features without using contextual or higher-level information. The sparse text features are converted to a dense representation using Principal Components Analysis before concatenating them with...
Conference Paper
Tone and intonation play a crucial role across many languages. However, the use and structure of tone varies widely, ranging from lexical tone which determines word identity to pitch accent signalling information status. In this paper, we employ a uniform representation of acoustic features for recognition of both Mandarin tone and English pitc...
Article
Fluent dialogue requires that speakers successfully negotiate and signal turn-taking. While many cues to turn change have been proposed, especially in multi-modal frameworks, here we focus on the use of prosodic cues to these functions. In particular, we consider the use of prosodic cues in a tone language, Mandarin Chinese, where variations in...
Conference Paper
Miscommunication in human-computer interaction is unavoidable, although speech recognition accuracy continues to improve. The perceived difficulty of correcting miscommunications has an even larger negative impact on assessments of system quality than does the absolute error rate. Therefore it is essential to improve error resolution capabi...
Conference Paper
The University of Chicago participated in the Cross-Language Evaluation Forum 2004 (CLEF2004) cross-language multilingual, bilingual, and spoken language tracks. Cross-language experiments focused on meeting the challenges of new languages with freely available resources. We found that modest effectiveness could be achieved with the additional appl...
Conference Paper
Full-text available
This paper summarizes the Cross-Language Spoken Document Retrieval (CL-SDR) track held at CLEF 2004. The CL-SDR task at CLEF 2004 was again based on the TREC-8 and TREC-9 SDR tasks. This year the CL-SDR task was extended to explore the unknown story boundaries condition introduced at TREC. The paper reports results from the participants showing tha...
Article
Automatic topic segmentation, separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has bee...
Article
Automatic topic segmentation, separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has bee...
Article
Tonal languages, such as Mandarin, convey information using both phonemes and tones. Using a recently proposed framework for measuring the functional load of a phonological contrast (i.e. how much use a language makes of the contrast), we carry out several computations to estimate how much use Mandarin makes of tones. The most interesting result is...
Article
This paper describes a system for rapidly retargetable interactive translingual retrieval. Basic functionality can be achieved for a new document language in a single day, and further improvements require only a relatively modest additional investment. We applied the techniques first to search Chinese collections using English queries, and have succe...
Article
Automatic topic segmentation, separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has been...
Article
Pseudo-relevance feedback, while useful in monolingual applications for refining and enriching short user queries, proves even more important in cross-language information retrieval (CLIR). For CLIR, query expansion before and after translation can provide an opportunity to recover from translation gaps, reduce ambiguity, and enhance recall. Furt...
Article
This paper describes a system for rapidly retargetable interactive translingual retrieval. Basic functionality can be achieved for a new document language in a single day, and further improvements require only a relatively modest additional investment. We applied the techniques first to search Chinese collections using English queries, and have succ...
Conference Paper
Full-text available
This paper presents the application of document expansion using a side collection to a cross-language spoken document retrieval (CL-SDR) task to improve retrieval performance. Document expansion is applied to a series of English-Mandarin CL-SDR experiments using selected retrieval models (probabilistic belief network, vector space model, and HMM-b...
Article
Query expansion by pseudo-relevance feedback is a well-established technique in both mono- and cross-lingual information retrieval, enriching and disambiguating the typically terse queries provided by searchers. Comparable document-side expansion is a relatively more recent development motivated by error-prone transcription and translation process...
Conference Paper
Query expansion by pseudo-relevance feedback is a well-established technique in both mono- and cross-lingual information retrieval, enriching and disambiguating the typically terse queries provided by searchers. Comparable document-side expansion is a relatively more recent development motivated by error-prone transcription and translation...
Article
This paper describes an approach to large-scale construction of a semantic hierarchy for Chinese verbs. Leveraging off of an existing Chinese conceptual database called HowNet and a Levin-based English verb classification, we use thematic-role information to create links between Chinese concepts and English classes. The resulting hierarchy is used...
Article
This paper addresses the problem of automatic acquisition of lexical knowledge for rapid construction of engines for machine translation and embedded multilingual applications. We describe new techniques for large-scale construction of a Chinese–English verb lexicon and we evaluate the coverage and effectiveness of the resulting lexicon. Leveraging...
Article
Miscommunication in spoken human–computer interaction is unavoidable. Ironically, the user's attempts to repair these miscommunications are even more likely to result in recognition failures, leading to frustrating error “spirals”. In this paper we investigate users' adaptations to recognition errors made by a spoken language system and the impact...
Article
Full-text available
This report describes Project MEI (Mandarin-English Information), one of the four projects selected for the Johns Hopkins University Summer Workshop 2000. Our research focus is on the integration of speech recognition and embedded machine translation technologies in the context of crosslingual spoken document retrieval (CL-SDR), also known as trans...
Article
This paper describes the Mandarin–English Information (MEI) project, where we investigated the problem of cross-language spoken document retrieval (CL-SDR), and developed one of the first English–Chinese CL-SDR systems. Our system accepts an entire English news story (text) as query, and retrieves relevant Chinese broadcast news stories (audio) fro...
Conference Paper
The limited coverage of available translation lexicons can pose a serious challenge in some cross-language information retrieval applications. We present two techniques for combining evidence from dictionary-based and corpus-based translation lexicons, and show that backoff translation outperforms a technique based on merging lexicons.
Article
This paper addresses the problem of building conceptual resources for multilingual applications. We describe new techniques for large-scale construction of a Chinese-English lexicon for verbs, using thematic-role information to create links between Chinese and English conceptual information. We then present an approach to compensating for gaps in...
Article
The University of Maryland participated in the CLEF 2000 multilingual task, submitting three official runs that explored the impact of applying language-independent stemming techniques to dictionary-based cross-language information retrieval. The paper begins by describing a cross-language information retrieval architecture based on balanced docum...
Article
Normal human speech has a clear intonational and rhythmic character. This is true of many Pacific Rim languages and plays a particularly crucial role in the many tone languages of the region, such as Thai and Chinese. However, most computer speech systems fail to utilize prosody for disambiguation or increased naturalness. In this paper, we examine...
Article
The University of Maryland participated in the topic tracking task, submitting four runs for the required condition (four English training stories). In this paper, we present the results of these runs and three additional contrastive runs, comparing the effectiveness of different translation selection strategies, stopwording in Mandarin, post-trans...
Article
Full-text available
We outline challenges for modeling human language assessment in automatic systems, both in terms of the process and the reliability of the result. We propose an architecture for a system to evaluate examinees via the Computerized Oral Proficiency Instrument, to determine whether they have 'reached' or 'not reached' the Intermediate Low level of pro...
Article
Miscommunication in speech recognition systems is unavoidable, but a detailed characterization of user corrections will enable speech systems to identify when a correction is taking place and to more accurately recognize the content of correction utterances. In this paper we investigate the adaptations of users when they encounter recognition error...
Article
The University of Maryland participated in the topic tracking task, submitting four runs for the required conditions (basic and challenge). In this working notes paper, we present preliminary results based on those runs and six additional contrastive runs that explored translation selection, post-translation resegmentation, post-transcription docume...
Article
The University of Maryland participated in the CLEF 2000 multilingual task, submitting three official runs that explored the impact of applying language-independent stemming techniques to dictionary-based cross-language information retrieval. The paper begins by describing a cross-language information retrieval architecture based on balanced docume...
Article
The University of Maryland participated in the CLEF 2000 multilingual task of information retrieval in English, French, Italian, and German, submitting three contrastive runs. These runs explore the impact of using language-independent stemming techniques in a dictionary-based document translation strategy. We performed a three-way contrast of unst...