Jody Kreiman

  • PhD
  • Professor (Full) at University of California, Los Angeles

About

236
Publications
75,824
Reads
8,694
Citations
Current institution
University of California, Los Angeles
Current position
  • Professor (Full)
Additional affiliations
January 1987 - December 2012
University of California, Los Angeles
November 1993 - present
University of California, Los Angeles
Position
  • Toward standardizing perceptual voice quality measures
Description
  • Ongoing inquiries into the perception, production, and acoustics of voice
Education
September 1977 - April 1987
University of Chicago
Field of study
  • Linguistics

Publications (236)
Article
Voice quality serves as a rich source of information about speakers, providing listeners with impressions of identity, emotional state, age, sex, reproductive fitness, and other biologically and socially salient characteristics. Understanding how this information is transmitted, accessed, and exploited requires knowledge of the psychoacoustic dimen...
Article
Recent decades have seen rapid growth in our understanding of the production and perception of the human voice and of the acoustic features that critically link production and perception in communication. This review describes some of the many acoustic measures of voice (and voice quality) that have been proposed to quantify these features, with at...
Article
The problem of characterizing voice quality has long caused debate and frustration. The richness of the available descriptive vocabulary is overwhelming, but the density and complexity of the information voices convey lead some to conclude that language can never adequately specify what we hear. Others argue that terminology lacks an empirical basi...
Article
This paper continues our studies of acoustic variability in voice between and within speakers. Previous work indicates that acoustic variability is characterized by the balance between high-frequency harmonic and inharmonic energy in the voice (measured using cepstral peak prominence) and by formant dispersion, regardless of the speaker’s sex, nati...
Article
The problem of how to characterize voice quality is an endless source of debate and frustration across disciplines. The richness of the vocabulary available to describe voice is overwhelming, but the density of the information conveyed by voice has led some scholars to conclude that language can never adequately specify what we hear. Others have ar...
Poster
“Creaky voice” is a term that covers multiple kinds of voicing, and there is no single defining acoustic property shared by all subtypes of creaky voice. Here we explore the distinct characteristics of each subtype. We identify three main properties of creaky voice: low f0, irregular f0, and constricted glottis (as shown by electroglottography). Pr...
Article
Recent empirical work has begun to consider the social, political, and biological contexts in which voice information is exploited by both speakers and listeners, making this a fruitful moment to address the points of contact between scientific thought and the critical perspectives put forth in Nina Sun Eidsheim’s The Race of Sound. In that book, v...
Conference Paper
This study aims to identify laryngeal manipulations that would allow a male to approximate a female-sounding voice, and that can be targeted in voice feminization surgery or therapy. Synthetic voices were generated using a three-dimensional vocal fold model with parametric variations in vocal fold geometry, stiffness, adduction, and subglottal pres...
Conference Paper
This study presents a cross-linguistic investigation of acoustic voice spaces in English, Seoul Korean, and White Hmong, which differ in whether they phonologically contrast phonation type and/or tone. The overarching hypothesis is that acoustic variability in voice will be shaped by biological factors, linguistic factors, and individual idiosyncra...
Article
This study replicates and extends the recent findings of Lee, Keating, and Kreiman [J. Acoust. Soc. Am. 146(3), 1568–1579 (2019)] on acoustic voice variation in read speech, which showed remarkably similar acoustic voice spaces for groups of female and male talkers and the individual talkers within these groups. Principal component analysis was app...
Article
Our previous studies examined the manner in which within- and between-speaker acoustic variability in voice follow patterns determined by biological factors, the language spoken, and individual idiosyncrasies. To date, we have analyzed data from speakers of English, Seoul Korean, and Hmong, which differ in whether they con...
Article
This study compares human speaker discrimination performance for read speech versus casual conversations and explores differences between unfamiliar voices that are “easy” versus “hard” to “tell together” versus “tell apart.” Thirty listeners were asked whether pairs of short style-matched or -mismatched, text-independent utterances represented the...
Article
What does it mean for a voice to sound normal? Our recent work comparing listener judgments of talkers with and without a diagnosed voice disorder showed that judgments of “normal” versus “not normal” and strategies for estimating how much a given voice deviates from normal depended on the listener, the context, the purpo...
Article
This study examines the extent to which the phonological structure of a language impacts acoustic variation in voice spaces for individuals and populations of speakers. Our recent work on two typologically different languages, American English and Seoul Korean, showed striking similarities in the acoustic spaces derived fr...
Article
Male vocal folds are often longer and thicker than female vocal folds. The length difference is generally considered responsible for the notable difference in fundamental frequency between men and women, which plays an important role in gender perception. The role of the thickness difference in gender perception is less cl...
Article
Introduction: Vibratory asymmetry and neuromuscular compensation are often seen in laryngeal neuromuscular pathology. However, the ramifications of these findings on voice quality are unclear. This study investigated the effects of varying levels of vibratory asymmetry and neuromuscular compensation on cepstral peak prominence (CPP), an analog of vo...
Article
Objectives: Laryngeal vibratory asymmetry occurring with paresis may result in a perceptually normal or abnormal voice. The present study aims to determine the relationships between the degree of vibratory asymmetry, acoustic measures, and perception of sound stimuli. Study Design: Animal Model of Voice Production, Perceptual Analysis of Voice. Met...
Article
No agreed-upon method currently exists for objective measurement of perceived voice quality. This paper describes validation of a psychoacoustic model designed to fill this gap. This model includes parameters to characterize the harmonic and inharmonic voice sources, vocal tract transfer function, fundamental frequency, and amplitude of the voice,...
Chapter
It is rather unclear what is meant by “normal” voice quality, just as it is often unclear what is meant by “voice quality” in general. To shed light on this matter, listeners heard 1-sec sustained vowels produced by 100 female speakers, half of whom were recorded as part of a clinical voice evaluation and half of whom were undergraduate students wh...
Article
Acoustic voice spaces for English speakers are characterized mainly by variability in F0, the balance between higher harmonic amplitudes and inharmonic energy, and higher formant frequencies [JASA 146(4), 3011 (2019)]. We extended this investigation to another language to test the hypothesis that a few biologically relevan...
Article
Objectives/hypotheses: Charismatic leaders use vocal behavior to persuade their audience, achieve goals, arouse emotional states, and convey personality traits and leadership status. This study investigates voice fundamental frequency (f0) and sound pressure level (SPL) in female and male French, Italian, Brazilian, and American politicians to det...
Preprint
Does speaking style variation affect humans' ability to distinguish individuals from their voices? How do humans compare with automatic systems designed to discriminate between voices? In this paper, we attempt to answer these questions by comparing human and machine speaker discrimination performance for read speech versus casual conversations. Th...
Chapter
The sound of a voice—its quality—plays an integral role in the biological and social existences of animal species ranging from frogs to birds to elephants to primates and humans. Across animal species, voice plays a part in many, many aspects of behavior, including mate selection and attraction, social organization, identification of parent/child/s...
Chapter
Investigators and clinicians have long sought to apply acoustic analysis to track changes in voice quality in Parkinson disease (PD), in order to evaluate or document treatment effects, track disease progression, or to attempt remote automatic diagnosis. These studies have often had disappointing results, so that the best way to apply acoustics to...
Article
Using principal component analysis (PCA), our previous study [JASA 145(pt. 2), 1930, (2019)] of read sentences found surprisingly similar acoustic voice spaces for groups of female and male talkers and for the individuals within groups. Formant frequencies and the balance between higher harmonic amplitudes and inharmonic e...
Article
Our recent studies [JASA 145(Pt. 2), 1930, (2019); this conference] show that acoustic spaces characterizing within- and between-speaker variability in voice quality have similar structures, with a few features (acoustic variability and formant dispersion) important for all speakers combined with idiosyncratic features cha...
Article
Little is known about the nature or extent of everyday variability in voice quality. This paper describes a series of principal component analyses to explore within- and between-talker acoustic variation and the extent to which they conform to expectations derived from current models of voice perception. Based on studies of faces and cognitive model...
Conference Paper
Little is known about the nature or extent of everyday variability in voice quality within a speaker or how this differs across speakers. Using principal component analysis, we identified measures that account for perceptually relevant acoustic variance within speakers. Based on face-identity studies and cognitive models of speaker recognition, we...
Article
Little is known about the nature or extent of everyday variability in voice quality within a speaker or how this differs across speakers. Using a suite of measures that map between acoustics and perception of voice quality, this study elucidates which acoustic variables within speakers’ individual voice spaces best charact...
Preprint
The manner in which acoustic features contribute to perceiving speaker identity remains unclear. In an attempt to better understand speaker perception, we investigated human and machine speaker discrimination with utterances shorter than 2 seconds. Sixty-five listeners performed a same vs. different task. Machine performance was estimated with i-ve...
Article
The past decades have seen an explosion of research into the psychological, cognitive, neural, biological, and technical mechanisms of voice perception. These mechanisms refer to the general ability to extract information from voices expressed by other living beings or by technical systems. Voice perception research is now a lively area of research...
Article
Voices play essential roles in human experience, and carry many kinds of meaning. At present, the study of voice is consistently tied to the speech chain model, so that voice production, acoustics, and perception are treated as separate, independently studied stages. Such studies provide no insight into how meaning in voice is constructed, and crea...
Article
Little is known about human and machine speaker discrimination ability when utterances are very short and the speaking style is variable. This study compares text-independent speaker discrimination ability of humans and machines based on utterances shorter than 2 s in two different speaking styles (read sentences and speech directed towards pets, c...
Article
It is rather unclear what is meant by “normal” voice quality, just as it is unclear what is meant by “voice quality” in general. A clearer understanding of what listeners perceive as normal and what strikes them as disordered would benefit both clinical practice, for which a normal sound is presumably the goal of treatment, and the study of voice q...
Article
Many commentaries on the voice of singer Maria Callas note that her voice changed markedly over the course of her career, with changes often attributed to “ferocious dieting.” Such claims are particularly troubling in the absence of evidence that weight loss affects voice acoustics, and in the relative absence of acoustic data testing specific hypo...
Article
Little is known about how to characterize normal variability in voice quality within and across utterances from normal speakers. Our previous study of female voices suggested that only a few acoustic parameters consistently distinguish speakers, with most of the work being done by idiosyncratic subsets of parameters. The present study extends this...
Conference Paper
Due to within-speaker variability in phonetic content and/or speaking style, the performance of automatic speaker verification (ASV) systems degrades especially when the enrollment and test utterances are short. This study examines how different types of variability influence performance of ASV systems. Speech samples (< 2 sec) from the UCLA Speake...
Article
Purpose: The question of what type of utterance (a sustained vowel or continuous speech) is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. Me...
Article
Based on our psychoacoustic model of voice quality, the UCLA voice synthesizer allows users to copy synthesize nearly any steady-state voice sample or to create stimuli that systematically vary in specific acoustic dimensions. This new release contains a number of significant improvements from earlier versions. The vocal tract model now includes th...
Article
Little is known about how to characterize normal variability in voice quality within and across utterances from normal speakers. Given a standard set of acoustic measures of voice, how similar are samples of 50 women’s voices? Fifty women, all native speakers of English, read 5 sentences twice on 3 days—30 sentences per speaker. The VoiceSauce anal...
Conference Paper
Despite recent breakthroughs in automatic speaker recognition (ASpR), system performance still degrades when utterances are short and/or when within-speaker variability is large. This study used short test utterances (2-3 sec) to investigate the effect of within-speaker variability on state-of-the-art ASpR system performance. A subset of a newly-dev...
Article
Purpose: This letter briefly reviews ideas about the purpose and benefits of peer review and reaches some idealistic conclusions about the process. Method: The author uses both literature review and meditation born of long experience. Results: From a cynical perspective, peer review constitutes an adversarial process featuring domination of the weak...
Article
The voice of jazz performer Jimmy Scott raises interesting questions of how gender is marked (or not marked) in singing voice. Scott was born with Kallmann syndrome, which affects male hormonal levels and prevents the onset of puberty. Although he self-identified as a “regular guy,” in his career he was presented as a novelty act: a boy who sounded...
Article
Experiments using animal and human larynx models are often conducted without a vocal tract. While it is often assumed that the absence of a vocal tract has only small effects on vocal fold vibration, it is not actually known how sound production and quality are affected. In this study, the validity of using data obtained in the absence of a vocal t...
Article
A psychoacoustic model of the voice source spectrum is proposed. The model is characterized by four spectral slope parameters: the difference in amplitude between the first two harmonics (H1–H2), the second and fourth harmonics (H2–H4), the fourth harmonic and the harmonic nearest 2 kHz in frequency (H4–2 kHz), and the harmonic nearest 2 kHz and that...
Conference Paper
Little is known about intraspeaker changes in voice across changing speaking situations in everyday life. In this study, we examined acoustic variations between and within 5 talkers and their effect on the likelihood that voice samples would not be identified as coming from the same talker. Talkers were drawn from a large database recorded to captu...
Conference Paper
There is not one kind, but instead several kinds, of creaky voice, or creak. There is no single defining property shared by all kinds. Instead, each kind exhibits some properties but not others. Therefore different acoustic measures characterize different kinds of creak. This paper describes how various acoustic measures should pattern for each kin...
Article
Models of the voice source differ in their fits to natural voices, but it is unclear which differences in fit are perceptually salient. This study examined the relationship between the fit of five voice source models to 40 natural voices, and the degree of perceptual match among stimuli synthesized with each of the modeled sources. Listeners comple...
Article
Increasing evidence suggests that voices are best thought of as complex auditory patterns, and that listeners perceive and remember voices with reference to a “prototype” or “average” for that talker. Little is known about how, and how much, individual talkers vary their voice quality across situations that arise in everyday speaking, so the natur...
Conference Paper
Our proposed perceptually motivated spectral-domain model of the voice source comprises spectral slopes in four ranges (H1-H2, H2-H4, H4-the harmonic nearest 2 kHz, and that harmonic to the harmonic nearest 5 kHz). Previous studies established the necessity of these parameters by demonstrating that listeners are sensitive to all of these parameters...
Conference Paper
Models of the voice source differ in how they fit natural voices, but it is still unclear which differences in fit are perceptually salient. This study describes ongoing analyses of differences in the fit of six voice source models to 40 natural voices, and how these differences relate to perceptual similarities among stimuli. Listeners completed a...
Conference Paper
The relative magnitude of the first two harmonics of the voice source (H1*-H2*) is an important measure and is assumed to be one exponent of changes in vocal quality along a breathy-to-pressed continuum. H1*-H2* is often associated with glottal open quotient (OQ) and glottal pulse skewness (as quantified by speed quotient, SQ), but may also covary...
Conference Paper
The relative magnitude of the first two harmonics of the voice source (H1*-H2*) is an important measure and is assumed to be one exponent of changes in vocal quality along a breathy-to-pressed continuum. H1*-H2* is often associated with glottal open quotient (OQ) and glottal pulse skewness (as quantified by speed quotient, SQ), but may also covary...
Article
At present, two important questions about voice remain unanswered: When voice quality changes, what physiological alteration caused this change, and if a change to the voice production system occurs, what change in perceived quality can be expected? We argue that these questions can only be answered by an integrated model of voice linking productio...
Article
Our previous study examined the perceptual adequacy of different source models. We found that perceived similarity between modeled and natural voice samples was best predicted (in the time dimension) by the match between waveforms at the negative peak of the flow derivative (R² = 0.34). The extent of fit during the opening phase of the source puls...
Article
How specific aspects of vocal fold vibration alter voice register has long been a subject of interest. Transitions between vocal registers are often studied using dynamic vocal fold models and electroglottographic signals. Although laryngeal high-speed videoendoscopy has also been applied to study steady-state voice registers, there has been little...
Article
Laryngeal high-speed videoendoscopy is a state-of-the-art technique to examine physiological vibrational patterns of the vocal folds. With sampling rates of thousands of frames per second, high-speed videoendoscopy produces a large amount of data that is difficult to analyze subjectively. In order to visualize high-speed video in a straightforward...
Article
A psychoacoustic model of the source spectrum has been proposed in which source contributions to overall voice quality can be quantified by four spectral slope components: H1-H2 (the amplitude difference between the first and second harmonics), H2-H4, H4-2000 Hz (i.e., the harmonic nearest to 2000 Hz), and 2000-5000 Hz. The natural variability of t...
Conference Paper
Investigating the relationship between glottal area waveform shape and harmonic magnitudes through computational modeling and laryngeal high-speed videoendoscopy. Abstract: The glottal open quotient (OQ) is often associated with the amplitude of the first source harmonic relative to the second (H1*-H2*), which is assumed to be one cause of a change i...
Conference Paper
Many glottal source models have been proposed, but none has been systematically validated perceptually. Our previous work showed that model fitting of the negative peak of the flow derivative is the most important predictor of perceptual similarity to the target voice. In this study, a new voice source model is proposed to capture perceptually-impo...
Article
A psychoacoustic model of the source spectrum has been proposed in which four spectral slope parameters describe perception of overall voice quality: H1-H2 (the difference in amplitude between the first and second harmonics), H2-H4, H4-2000 Hz (i.e., the harmonic nearest 2000 Hz), and 2000-5000 Hz. The goals of this study are to evaluate perceptual...
Article
Decreasing epilaryngeal area has been shown to increase glottal flow pulse skewing and harmonic amplitudes [Titze, J. Acoust. Soc. Am. 123, 2733 (2008)]. It is not known, however, whether listeners perceive voice quality changes when epilaryngeal area is altered, or if perceived quality is different if the area change occurs at the ventricular fold...
Article
Many glottal source models have been proposed, but none has been systematically validated perceptually. Our previous work showed that model fitting of the negative peak of the flow derivative is the most important predictor of perceptual similarity to the target voice. In this study, a new voice source model motivated by high-speed laryngeal videoe...
Article
Because voice signals result from vocal fold vibration, perceptually meaningful vibratory measures should quantify those aspects of vibration that correspond to differences in voice quality. In this study, glottal area waveforms were extracted from high-speed videoendoscopy of the vocal folds. Principal component analysis was applied to these wavef...
Article
This study investigates the importance of source spectrum slopes in the perception of phonation by White Hmong listeners. In White Hmong, nonmodal phonation (breathy or creaky voice) accompanies certain lexical tones, but its importance in tonal contrasts is unclear. In this study, native listeners participated in two perceptual tasks, in which the...
Article
At present, it is not well understood how changes in vocal fold biomechanics correspond to changes in voice quality. Understanding such cross-domain links from physiology to acoustics to perception in the "speech chain" is of both theoretical and clinical importance. This study investigates links between changes in body layer stiffness, which is re...
Article
Laryngeal high-speed videoendoscopy is a state-of-the-art technique to examine physiological vibrational patterns of the vocal folds. With sampling rates of thousands of frames per second, high-speed videoendoscopy produces a large amount of data that is difficult to analyze subjectively. In order to visualize high-speed video in a straightforward...
Article
Increases in open quotient are widely assumed to cause changes in the amplitude of the first harmonic relative to the second (H1*-H2*), which in turn correspond to increases in perceived vocal breathiness. Empirical support for these assumptions is rather limited, and reported relationships among these three descriptive levels have been variable. T...
Conference Paper
Estimation of the glottal source has applications in many areas of speech processing. Therefore, a noise-robust automatic source estimation algorithm is proposed in this paper. The source signal is estimated using a codebook search approach. The glottal area waveforms extracted from high-speed recordings of the glottis are converted to the glottal f...
Article
Because voice signals result from vocal fold vibration, perceptually-meaningful vibratory measures should quantify those aspects of vibration that correspond to differences in voice quality. In this study, glottal area waveforms were calculated from high-speed images of the vocal folds. Principal component analysis was applied to these waveforms to...
Article
Many models of the glottal source have been proposed, but none has been systematically validated perceptually, so that it is unclear whether deviations from perfect fit have perceptual importance. If model fit fails in ways that have no perceptual significance, such "errors" can be ignored, but poor fit with respect to perceptually-important featur...
Article
This study investigates the relative importance of phonation and pitch cues in (White) Hmong tone identification. Hmong has seven productive tones, two of which involve non-modal phonation. The breathy tone is usually produced with a mid- or high-falling pitch contour similar to the high-falling modal tone. Similarly, aside from some pitch differen...
Conference Paper
High-speed video provides a way to record physiological vibrational patterns of the vocal folds. Due to the large amount of data it produces, many methods have been proposed to reduce raw images to the underlying vibrational patterns. Previous methods either focus on a certain location on the vocal folds, or a certain frequency of vibrational acti...
Article
Although the amount of inharmonic energy (noise) present in a human voice is an important determinant of vocal quality, little is known about the perceptual interaction between harmonic and inharmonic aspects of the voice source. This paper reports three experiments investigating this issue. Results indicate that perception of the harmonic slope an...
Article
The goal of this project is to develop a new source model from high-speed recordings of vocal fold vibrations and simultaneous audio recordings. By analyzing area waveforms and acoustics and performing perception experiments, we will better parameterize the new source model and also uncover those aspects of the model that are perceptually salient,...
Article
Modeling the source spectrum requires understanding of the perceptual importance of different spectral-domain attributes of the voice source. Although the roles of H1-H2 and high-frequency harmonics in quality perception are somewhat understood, the extent of spectral detail that is perceptually significant in other frequency ranges is not known. T...
Conference Paper
During speech production, the vocal folds may not close completely. The resulting glottal gap (GG) or incomplete glottal closure has not been systematically studied in terms of GG acoustic and/or perceptual consequences. This paper uses high-speed imaging to investigate the relationship between GG area, source parameters, acoustic measures, and voi...
Article
The human voice is described in dialogic linguistics as an embodiment of self in a social context, contributing to expression, perception and mutual exchange of self, consciousness, inner life, and personhood. While these approaches are subjective and arise from phenomenological perspectives, scientific facts about personal vocal identity, and its...
Article
Purpose: Interrater disagreements in ratings of quality plague the study of voice. This study compared 2 methods for handling this variability. Method: Listeners provided multiple breathiness ratings for 2 sets of pathological voices, one including 20 male and 20 female voices unselected for quality and one including 20 breathy female voices. Rating...
Article
The debate over the equivalence of voice quality measures derived from continuous speech vs steady state vowels is founded in the various definitions of voice quality and the resulting uncertainty about which aspects of speech measures of quality are most important. Measures derived from steady-state phonation correspond to narrow definitions of voi...
Article
Despite the great need for valid measures of voice quality, we are still unable to adequately quantify what a person sounds like. Drawing on recent theoretical and experimental work, we propose assessment of overall quality as a whole, rather than using individual rating scales. This paper describes experiments evaluating a psychoacoustic model that...
Article
Control of vocal quality is a primary goal of speech and the focus of clinical management of voice disorders. Therefore, it can be argued that auditory-perceptual sensitivity to the acoustic changes that occur as a laryngeal parameter is manipulated should be considered in the development of physiologic models of phonation. Listener perception of t...
Chapter
Introduction; Respiration; The Larynx and Phonation; The Supraglottal Vocal Tract and Resonance; Kinds of Acoustic Analyses; Controlling the Sound of a Voice; What Gives Rise to Individual Voice Quality?
Chapter
Why Should We Care About Voice Quality? What is Voice? What is Voice Quality? The Definitional Dilemma Measuring Voice Quality Alternatives to Dimensional and Featural Measurement Systems for Voice Quality Organization of the Book
