Gina-Anne Levow's research while affiliated with University of Washington Seattle and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (121)
Our study tests the acoustic fidelity of remote recordings, using a large variety of stimuli and recording environments. With standard recording environments unavailable due to COVID-19, more studies are investigating remote recordings for acoustic analyses [e.g., Guan and Li (2021); Freeman and De Decker (2021)]. High-fidelity remote recordings also s...
Automatic speech recognition (ASR) has seen dramatic improvements as a result of advances in deep learning. This has led several recent studies to conclude that ASR is approaching parity with human performance, at least in certain speech contexts. These studies use average word error rate (WER) as their primary evaluation...
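The abstract above names average word error rate (WER) as the usual evaluation metric. As a rough illustration only (not code from this work), WER is commonly computed as the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. The function names and the naive whitespace tokenization are assumptions for this sketch.

```python
from typing import Iterable, List, Tuple


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # curr[0]: delete all i reference words
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """WER for one utterance (whitespace tokenization is an assumption)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return word_errors(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER: total errors over total reference words across utterances."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        errors += word_errors(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return errors / words
```

Pooling errors across utterances, as `corpus_wer` does, weights long utterances more heavily than a plain mean of per-utterance scores would.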
Segmentation remains an important preprocessing step both in languages where "words" or other important syntactic/semantic units (like morphemes) are not clearly delineated by white space and in continuous speech data, where there is often no meaningful pause between words. Near-perfect supervised methods have been developed...
Advances in speech and language processing have enabled the creation of applications that could, in principle, accelerate the process of language documentation, as speech communities and linguists work on urgent language documentation and reclamation projects. However, such systems have yet to make a significant impact on language documentation, as...
This study presents a corpus of turn changes between speakers in U.S. Supreme Court oral arguments. Each turn change is labeled on a spectrum of "cooperative" to "competitive" by a human annotator with legal experience in the United States. We analyze the relationship between speech features, the nature of exchanges, and the gender and legal role o...
Automatic speech recognition (ASR) can be deployed in a previously unknown language, in less than 24 h, given just three resources: an acoustic model trained on other languages, a set of language-model training data, and a grapheme-to-phoneme (G2P) transducer to connect them. The LanguageNet G2Ps were created with the goal of being small, fast, and...
We present software that, in only a few hours, transcribes forty hours of recorded speech in a surprise language, using only a few tens of megabytes of noisy text in that language, and a zero-resource grapheme-to-phoneme (G2P) table. A pretrained acoustic model maps acoustic features to phonemes; a reversed G2P maps these to graphemes; then a langu...
This special issue contains four articles based on and expanded from systems presented at the SIGHAN-7 Chinese Spelling Check Bakeoff. We provide an overview of the approaches and designs for Chinese spelling checkers presented in these articles. We conclude this introductory article with a summary of possible future directions.
This study investigates characteristics of stance-related discourse function, stance strength, and polarity in uses of the word 'yeah.' In an annotated corpus of 20 talker dyads engaged in collaborative tasks, over 2300 'yeahs' fall into six common stance-act categories. While agreement, usually with weak, positive stance, accounts for about three-...
From activities as simple as scheduling a meeting to those as complex as balancing a national budget, people take stances in negotiations and decision making. While the related areas of subjectivity and sentiment analysis have received significant attention, work has focused almost exclusively on text, and much stance-taking activity is carried out...
Normally, yeah has positive polarity, but with a change in prosody, it can convey a negative stance (e.g., polite disagreement/rejection). This study examines acoustic-prosodic features of ‘negative yeahs’ in a stance-rich corpus of collaborative tasks. Four categories are identified based on degree of agreement/acceptance and distinguished by an i...
Imprecise articulatory breakdown is one of the characteristics of dysarthric speech. This work attempts to develop a framework to automatically identify problematic articulatory patterns of dysarthric speakers in terms of distinctive features (DFs), which are effective for describing speech production. The identification of problematic articulatory...
Stance, or a speaker’s attitudes or opinions about the topic of discussion, has been investigated textually in conversation- and discourse analysis and in computational models, but little work has focused on its acoustic-phonetic properties. This is a difficult problem, given that stance is a complex activity that must be expressed along with sever...
The ATAROS project aims to identify acoustic signals of stance-taking in order to inform the development of automatic stance recognition in natural speech. Due to the typically low frequency of stance-taking in existing corpora that have been used to investigate related phenomena such as subjectivity, we are creating an audio corpus of unscripted c...
This technical report describes the ATAROS (Automatic Tagging and Recognition of Stance) corpus design, collection, and transcription procedures, as well as current work in progress: stance annotation protocols and task validation measures. Portions of this work were presented at the ASA 2014 Spring Meeting and are under review for Interspeech 2014...
While stance-taking has been examined qualitatively within conversation and discourse analysis and modeled using text-based approaches in computational linguistics, there has been little quantification of its acoustic-phonetic correlates. One reason for this is the relative sparsity of stance-taking behavior in spontaneous conversations. Another is...
Differences in pronunciation have been shown to underlie significant talker-dependent intelligibility differences. There are several dimensions of variability that are correlated with talker intelligibility including pitch range, vowel-space expansion, and rhythmic patterns. Prior work has shown that some of the better predictors of individual inte...
In this chapter, the authors describe their work in developing a crowdsourcing methodology for spoken dialog system (SDS) evaluation through Amazon Mechanical Turk (MTurk), a popular crowdsourcing marketplace that makes use of human intelligence online to perform tasks which cannot be completed entirely by computer programs. The chapter is organize...
Provides an insightful and practical introduction to crowdsourcing as a means of rapidly processing speech data. Intended for those who want to get started in the domain and learn how to set up a task, what interfaces are available, how to assess the work, etc. as well as for those who already have used crowdsourcing and want to create better tasks...
We propose a collaborative filtering (CF) model to predict user satisfaction in SDS evaluation. Inspired by the use of CF in recommendation systems, where a user's preference for a new item is assumed to resemble that for similar items rated previously, we adapt the idea to predict user evaluations of unrated dialogs based on the ratings received by...
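The abstract is truncated before the model details, so the following is only a generic neighborhood-style illustration of the idea it describes: predict the rating of an unrated dialog from the ratings of the most similar rated dialogs. Representing each dialog as a numeric feature vector, using cosine similarity, and the parameter `k` are all assumptions of this sketch, not the paper's formulation.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_rating(
    target: Sequence[float],
    rated: List[Tuple[Sequence[float], float]],
    k: int = 3,
) -> float:
    """Similarity-weighted mean rating of the k rated dialogs most similar to target.

    Falls back to the plain mean rating when no neighbor has positive similarity.
    """
    if not rated:
        raise ValueError("need at least one rated dialog")
    neighbors = sorted(
        ((cosine(target, feats), rating) for feats, rating in rated),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    neighbors = [(s, r) for s, r in neighbors if s > 0]
    if not neighbors:
        return sum(r for _, r in rated) / len(rated)
    total = sum(s for s, _ in neighbors)
    return sum(s * r for s, r in neighbors) / total
```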
Verbal feedback provides important cues in establishing interactional rapport. The challenge of recognizing contexts for verbal feedback largely arises from relative sparseness and optionality. In addition, cross-language and inter-speaker variations can make recognition more difficult. In this paper, we show that boosting can improve accuracy in r...
Spoken dialog systems frameworks fill a crucial role in the spoken dialog systems community by providing resources to lower barriers to entry. However, different user groups have different requirements and expectations for such systems. Here, we consider the particular needs for spoken dialog systems toolkits within an instructional setting. We dis...
Stance-taking, evaluation, attitude- and opinion-expression have been examined qualitatively within conversation analysis and discourse analysis. Recognition of evaluation has also received attention within computational linguistics. However, little work has been done on the phonetic correlates of such affective expression. In order to examine stan...
Diverse multi-modal behaviors provide important cues in establishing and maintaining interactional rapport. However, these behaviors are often subtle and culture-specific. In this paper, we focus on two forms of backchannel behavior: vocal backchannels and non-verbal headnods. We employ a corpus of quasi-monologic story-telling interactions elicite...
We propose a tone recognition approach that employs linear-chain Conditional Random Fields (CRF) to model tone variation due to intonation effects. We implement three linear-chain CRFs which aim at modeling intonation effects at phrase-, sentence-, and story-level boundaries, where we show that standard recognition techniques degrade and common normaliz...
Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems, genes within one set of orthologs may have over a dozen different names. Classifying research publications based on biological processes, cellular components, molecular functions, and mi...
The University of Maryland participated in the TDT-1999 topic tracking task. This chapter describes the system architecture, including source-dependent normalization, and then focuses on the cross-language case in which English training stories were used to find Mandarin stories on the same topic. Processes that may introduce noise, including error...
Developing accurate models to automatically predict user satisfaction about the overall quality of a Spoken Dialog System (SDS) is highly desirable for SDS evaluation. In the original PARADISE framework, a linear regression model is trained using measures drawn from rated dialogs as predictors with user satisfaction as the target. In this paper, we...
Verbal feedback is an important information source in establishing interactional rapport. However, predicting verbal feedback across languages is challenging due to language-specific differences, inter-speaker variation, and the relative sparseness and optionality of verbal feedback. In this paper, we employ an approach combining classifier weighti...
This paper presents an initial attempt at the use of crowd-sourcing for collection of user judgments on spoken dialog systems (SDSs). This is implemented on Amazon Mechanical Turk (MTurk), where a Requester can design a human intelligence task (HIT) to be performed by a large number of Workers efficiently and cost-effectively. We describe a design...
Development of spoken dialog systems (SDSs) can be facilitated by better evaluation methods. Previous methods seldom consider the efficiency of the system, which is important to users. We study the problem of evaluating SDSs and propose a new framework by generalizing states from utterances of dialogs to build a finite state machine (FSM). These stat...
User evaluations of dialogs from a spoken dialog system (SDS) can be directly used to gauge the system's performance. However, it is costly to obtain manual evaluations of a large corpus of dialogs. Semi-supervised learning (SSL) provides a possible solution. This process learns from a small amount of manually labeled data, together with a large am...
Aspects of speech and non-verbal behavior allow conversational partners to establish and maintain rapport by signaling engagement or endorsement. In the verbal channel, these factors encompass requests for and production of vocal feedback, as well as lexical and grammatical mirroring. However, these cues are often subtle and culture-specific. Here,...
Prosody plays an integral role in spoken language understanding. In isiZulu, a Nguni family language with lexical tone, prosodic information determines word meaning. We assess the impact of models of tone and coarticulation for tone recognition. We demonstrate the importance of modeling prosodic context to improve tone recognition. We employ this l...
Acquisition of prosody, in addition to vocabulary and grammar, is essential for language learners. However, it has received less attention in instruction. To enable automatic identification and feedback on learners' prosodic errors, we investigate automatic pitch accent labeling for non-native speech. We demonstrate that an acoustic-based cont...
We investigate several measures of voice quality (VQ) to improve tone recognition in Mandarin Chinese. We find that band energy measures such as spectral balance (Sluijter and van Heuven, 1996) work better than measures based on glottal flow estimation and harmonic-formant differences. We also determine a set of bands and measures that improve tone...
This paper discusses a new approach to improve tone recognition by modeling the tone nucleus with vowel landmark detection. The tone nucleus region is identified based on vowel landmark frames derived by an automatic landmark recognition system. In the corresponding tone recognition experiments, the best results with landmark-based tone nucleus reg...
Teaching Computational Linguistics is inherently multi-disciplinary and frequently poses challenges and provides opportunities in teaching to a student body with diverse educational backgrounds and goals. This paper describes the use of a computational environment (SIDGrid) that facilitates interdisciplinary instruction by providing suppo...
The SIDGrid architecture provides a framework for distributed annotation, archiving, and analysis of the rapidly growing volume of multimodal data. The framework integrates three main components: an annotation and analysis client, a web-accessible data repository, and a portal to the distributed processing capability of the TeraGrid. The archi...
Document indexing and representation of term-document relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by the recent success of co-occurrenc...
Document representation has a large impact on the performance of document retrieval and clustering algorithms. We propose a hybrid document indexing scheme that combines the traditional bag-of-words representation with spectral embedding. This method accounts for the specifics of the document collection and also uses semantic similarity info...
The Social Informatics Data Grid is a new infrastructure designed to transform how social and behavioral scientists collect and annotate data, collaborate and share data, and analyze and mine large data repositories. An important goal of the project is to be compatible with existing databases and tools that support the sharing, storage and retrieva...
The Third International Chinese Language Processing Bakeoff was held in Spring 2006 to assess the state of the art in two important tasks: word segmentation and named entity recognition. Twenty-nine groups submitted result sets in the two tasks across two tracks and a total of five corpora. We found strong results in both tasks as well as continuin...
Document indexing and representation of term-document relations are very important for document clustering and retrieval. In this paper, we combine a graph-based dimensionality reduction method with a corpus-based association measure within the Generalized Latent Semantic Analysis framework. We evaluate the graph-based GLSA on the document cluster...
We investigate the use in Mandarin tone recognition of over two hundred possible local acoustic features based on pitch, overall intensity, and band-passed intensity in the rhyme of a syllable. Features involving pitch height are not as useful as one might expect, showing the need for phrase-level pitch height correction. The intensity contour is...
To improve tone recognition in continuous speech, we propose a strategy focusing on separating regions influenced by tonal coarticulation from regions that more closely approximate canonical tone production. Given a syllable segmentation, this approach employs amplitude and pitch information to generate an improved sub-syllable segmentation and...
We use a combination of linear support vector machines and hidden Markov models for dialog act tagging in the HCRC MapTask corpus, and obtain better results than those previously reported. Support vector machines allow easy integration of sparse high-dimensional text features and dense low-dimensional acoustic features, and produce posterior p...
Recognition of tone and intonation is essential for speech recognition and language understanding. However, most approaches to this recognition task have relied upon extensive collections of manually tagged data obtained at substantial time and financial cost. In this paper, we explore two approaches to tone learning with substantially...
Term translation probabilities proved an effective method of semantic smoothing in the language modelling approach to information retrieval tasks. In this paper, we use Generalized Latent Semantic Analysis to compute semantically motivated term and document vectors. The normalized cosine similarity between the term vectors is used as term trans...
Most cues for Mandarin tone recognition involve pitch, overall intensity and duration. This paper investigates ten other possible cues, and finds one that results in modest, but significant, improvement in classification accuracy on a small speaker-independent corpus of Mandarin news broadcast speech. This cue consists of the energies in the sixt...
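The abstract breaks off just as it describes how cosine similarity becomes a term translation probability. One plausible reading, sketched here under stated assumptions, normalizes a word's similarities to every vocabulary term so they sum to one: p(t | w) = max(0, cos(v_t, v_w)) / (sum over all terms t' of max(0, cos(v_t', v_w))). Clipping negative similarities to zero, including w itself, and the toy vectors are assumptions; computing the GLSA term vectors is not shown.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def translation_probabilities(
    word: str, term_vectors: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Return p(t | word) for every term t, proportional to clipped cosine similarity.

    Negative similarities are clipped to zero (an assumption of this sketch), and
    `word` itself is included, so it receives a self-translation probability.
    """
    w = term_vectors[word]
    sims = {t: max(0.0, cosine(w, v)) for t, v in term_vectors.items()}
    total = sum(sims.values())
    if total == 0.0:
        return {}
    return {t: s / total for t, s in sims.items()}
```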
Cross-language information retrieval (CLIR) systems allow users to find documents written in different languages from that of their query. Simple knowledge structures such as bilingual term lists have proven to be a remarkably useful basis for bridging that language gap. A broad array of dictionary-based techniques have demonstrated utility, but co...
We use Support Vector Machines for Dialog Act Tagging in the HCRC MapTask corpus, and achieve 64.5% classification accuracy using text and prosodic features without using contextual or higher-level information. The sparse text features are converted to a dense representation using Principal Components Analysis before concatenating them with...
Tone and intonation play a crucial role across many languages. However, the use and structure of tone varies widely, ranging from lexical tone which determines word identity to pitch accent signalling information status. In this paper, we employ a uniform representation of acoustic features for recognition of both Mandarin tone and English pitc...
Fluent dialogue requires that speakers successfully negotiate and signal turn-taking. While many cues to turn change have been proposed, especially in multi-modal frameworks, here we focus on the use of prosodic cues to these functions. In particular, we consider the use of prosodic cues in a tone language, Mandarin Chinese, where variations in...
Miscommunication in human-computer interaction is unavoidable, although speech recognition accuracy continues to improve. The perceived difficulty of correcting miscommunications has an even larger negative impact on assessments of system quality than does the absolute error rate. Therefore it is essential to improve error resolution capabi...
The University of Chicago participated in the Cross-Language Evaluation Forum 2004 (CLEF2004) cross-language multilingual, bilingual, and spoken language tracks. Cross-language experiments focused on meeting the challenges of new languages with freely available resources. We found that modest effectiveness could be achieved with the additional appl...
This paper summarizes the Cross-Language Spoken Document Retrieval (CL-SDR) track held at CLEF 2004. The CL-SDR task at CLEF 2004 was again based on the TREC-8 and TREC-9 SDR tasks. This year the CL-SDR task was extended to explore the unknown story boundaries condition introduced at TREC. The paper reports results from the participants showing tha...
Automatic topic segmentation, separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has bee...
Tonal languages, such as Mandarin, convey information using both phonemes and tones. Using a recently proposed framework for measuring the functional load of a phonological contrast (i.e. how much use a language makes of the contrast), we carry out several computations to estimate how much use Mandarin makes of tones. The most interesting result is...
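The abstract does not spell out the framework's formula. A common entropy-based formulation of functional load, assumed here, measures how much of a language's entropy is lost when a contrast is neutralized: FL = (H(L) - H(L')) / H(L), where L' is the language with the contrast merged. The toy sketch below uses a word unigram model and pinyin-style tokens with tone digits 1 to 5; both representations are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Iterable


def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the unigram distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def strip_tones(token: str) -> str:
    """Neutralize the tone contrast by deleting tone digits 1-5 (toy notation)."""
    return re.sub(r"[1-5]", "", token)


def tone_functional_load(tokens: Iterable[str]) -> float:
    """Fraction of unigram entropy lost when all tones are merged."""
    tokens = list(tokens)
    h = entropy(Counter(tokens))
    if h == 0.0:
        return 0.0
    h_merged = entropy(Counter(strip_tones(t) for t in tokens))
    return (h - h_merged) / h
```

With four equally frequent tokens `ma1 ma3 ba1 ba2`, merging tones halves the entropy (2 bits to 1 bit), so the sketch reports a functional load of 0.5.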
This paper describes a system for rapidly retargetable interactive translingual retrieval. Basic functionality can be achieved for a new document language in a single day, and further improvements require only a relatively modest additional investment. We applied the techniques first to search Chinese collections using English queries, and have succe...
Automatic topic segmentation, separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has been...
Pseudo-relevance feedback, while useful in monolingual applications for refining and enriching short user queries, proves even more important in cross-language information retrieval (CLIR). For CLIR, query expansion before and after translation can provide an opportunity to recover from translation gaps, reduce ambiguity, and enhance recall. Furt...
This paper describes a system for rapidly retargetable interactive translingual retrieval. Basic functionality can be achieved for a new document language in a single day, and further improvements require only a relatively modest additional investment. We applied the techniques first to search Chinese collections using English queries, and have succ...
This paper presents the application of document expansion using a side collection to a cross-language spoken document retrieval (CL-SDR) task to improve retrieval performance. Document expansion is applied to a series of English-Mandarin CL-SDR experiments using selected retrieval models (probabilistic belief network, vector space model, and HMM-b...
Query expansion by pseudo-relevance feedback is a well-established technique in both mono- and cross-lingual information retrieval, enriching and disambiguating the typically terse queries provided by searchers. Comparable document-side expansion is a relatively more recent development motivated by error-prone transcription and translation process...
Query expansion by pseudo-relevance feedback is a well-established technique in both mono- and cross-lingual information retrieval, enriching and disambiguating the typically terse queries provided by searchers. Comparable document-side expansion is a relatively more recent development motivated by error-prone transcription and translation...
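Several abstracts in this list refer to query expansion by pseudo-relevance feedback. As a generic sketch only (the papers' actual term-selection and weighting schemes are not given here), the basic loop treats the top-ranked documents from an initial search as relevant and appends their most frequent new terms to the query. Raw term counts and the parameters `k` and `n_terms` are simplifying assumptions; practical systems usually weight candidate terms, for example with tf-idf.

```python
from collections import Counter
from typing import List, Sequence


def expand_query(
    query_terms: Sequence[str],
    ranked_docs: Sequence[Sequence[str]],
    k: int = 10,
    n_terms: int = 5,
) -> List[str]:
    """Append the n_terms most frequent terms from the top-k documents.

    ranked_docs holds tokenized documents in initial retrieval order; terms
    already in the query are skipped. Ties keep first-seen order.
    """
    counts: Counter = Counter()
    for doc in ranked_docs[:k]:
        counts.update(doc)
    existing = set(query_terms)
    expansion = [t for t, _ in counts.most_common() if t not in existing]
    return list(query_terms) + expansion[:n_terms]
```

In a cross-language setting, the same step can be run on source-language results before translation and on target-language results after it.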
This paper describes an approach to large-scale construction of a semantic hierarchy for Chinese verbs. Leveraging off of an existing Chinese conceptual database called HowNet and a Levin-based English verb classification, we use thematic-role information to create links between Chinese concepts and English classes. The resulting hierarchy is used...
This paper addresses the problem of automatic acquisition of lexical knowledge for rapid construction of engines for machine translation and embedded multilingual applications. We describe new techniques for large-scale construction of a Chinese–English verb lexicon and we evaluate the coverage and effectiveness of the resulting lexicon. Leveraging...
Miscommunication in spoken human–computer interaction is unavoidable. Ironically, the user's attempts to repair these miscommunications are even more likely to result in recognition failures, leading to frustrating error “spirals”. In this paper we investigate users' adaptations to recognition errors made by a spoken language system and the impact...
This report describes Project MEI (Mandarin-English Information), one of the four projects selected for the Johns Hopkins University Summer Workshop 2000. Our research focus is on the integration of speech recognition and embedded machine translation technologies in the context of crosslingual spoken document retrieval (CL-SDR), also known as trans...
This paper describes the Mandarin–English Information (MEI) project, where we investigated the problem of cross-language spoken document retrieval (CL-SDR), and developed one of the first English–Chinese CL-SDR systems. Our system accepts an entire English news story (text) as query, and retrieves relevant Chinese broadcast news stories (audio) fro...
The limited coverage of available translation lexicons can pose a serious challenge in some cross-language information retrieval applications. We present two techniques for combining evidence from dictionary-based and corpus-based translation lexicons, and show that backoff translation outperforms a technique based on merging lexicons.
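The abstract contrasts backoff translation with merging lexicons but gives no details. The sketch below illustrates the general distinction under simple assumptions: backoff consults lexicons in a fixed priority order (dictionary-based before corpus-based, an assumed ordering) and uses only the first lexicon that covers the term, while merging pools candidates from all lexicons. The lexicon contents and function names are hypothetical.

```python
from typing import Dict, List, Sequence

Lexicon = Dict[str, List[str]]


def backoff_translate(term: str, lexicons: Sequence[Lexicon]) -> List[str]:
    """Use the first lexicon (in priority order) that has translations for term."""
    for lex in lexicons:
        if lex.get(term):
            return list(lex[term])
    return [term]  # uncovered: pass the source term through untranslated


def merge_translate(term: str, lexicons: Sequence[Lexicon]) -> List[str]:
    """Pool translations from every lexicon, dropping duplicates."""
    merged: List[str] = []
    for lex in lexicons:
        for candidate in lex.get(term, []):
            if candidate not in merged:
                merged.append(candidate)
    return merged or [term]
```

For example, with a dictionary lexicon `{"bank": ["Bank"]}` and a corpus lexicon `{"bank": ["Ufer", "Bank"], "loan": ["Kredit"]}`, backoff keeps only the dictionary's `Bank` for "bank" but still recovers `Kredit` for "loan", whereas merging also adds the corpus-derived `Ufer`.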
This paper addresses the problem of building conceptual resources for multilingual applications. We describe new techniques for large-scale construction of a Chinese-English lexicon for verbs, using thematic-role information to create links between Chinese and English conceptual information. We then present an approach to compensating for gaps in...
The University of Maryland participated in the CLEF 2000 multilingual task, submitting three official runs that explored the impact of applying language-independent stemming techniques to dictionary-based cross-language information retrieval. The paper begins by describing a cross-language information retrieval architecture based on balanced docum...
Normal human speech has a clear intonational and rhythmic character. This is true of many Pacific Rim languages and plays a particularly crucial role in the many tone languages of the region, such as Thai and Chinese. However, most computer speech systems fail to utilize prosody for disambiguation or increased naturalness. In this paper, we examine...
The University of Maryland participated in the topic tracking task, submitting four runs for the required condition (four English training stories). In this paper, we present the results of these runs and three additional contrastive runs, comparing the effectiveness of different translation selection strategies, stopwording in Mandarin, post-trans...
We outline challenges for modeling human language assessment in automatic systems, both in terms of the process and the reliability of the result. We propose an architecture for a system to evaluate examinees via the Computerized Oral Proficiency Instrument, to determine whether they have 'reached' or 'not reached' the Intermediate Low level of pro...
Miscommunication in speech recognition systems is unavoidable, but a detailed characterization of user corrections will enable speech systems to identify when a correction is taking place and to more accurately recognize the content of correction utterances. In this paper we investigate the adaptations of users when they encounter recognition error...
The University of Maryland participated in the topic tracking task, submitting four runs for the required conditions (basic and challenge). In this working notes paper, we present preliminary results based on those runs and six additional contrastive runs that explored translation selection, posttranslation resegmentation, post-transcription docume...
The University of Maryland participated in the CLEF 2000 multilingual task, submitting three official runs that explored the impact of applying language-independent stemming techniques to dictionary-based cross-language information retrieval. The paper begins by describing a cross-language information retrieval architecture based on balanced docume...
The University of Maryland participated in the CLEF 2000 multilingual task of information retrieval in English, French, Italian, and German, submitting three contrastive runs. These runs explore the impact of using language-independent stemming techniques in a dictionary-based document translation strategy. We performed a three-way contrast of unst...