
Acoustic Interpretation of the Voice Range Profile (Phonetogram)

Abstract

The voice range profile (VRP) is a display of vocal intensity range versus fundamental frequency (F0). Past measurements have shown that the intensity range is reduced at the extremes of the F0 range, that there is a gradual upward tilt of the high- and low-intensity boundaries with increasing F0, and that a ripple exists at the boundaries. The intensity ripple, which results from tuning of source harmonics to the formants, is more noticeable at the upper boundary than the lower boundary because higher harmonics are not energized as effectively near phonation threshold as at maximum lung pressure. The gradual tilt of the intensity boundaries results from more effective transmission and radiation of acoustic energy at higher fundamental frequencies. This depends on the spectral distribution of the source power, however. At low F0, a smaller spectral slope (more harmonic energy) produces greater intensity. At high F0, on the other hand, a shift of energy toward the fundamental results in greater intensity. This dependence of intensity on spectral distribution of source power seems to explain the reduced intensity range at higher F0. An unrelated problem of reduced intensity range at low F0 stems from the inherent difficulty of keeping F0 from rising when subglottal pressure is increased.
... This is further substantiated by the increasingly large z axis value ranges in the detail graphs (see Figs. 3-5 as well as the video simulations). Particularly as regards tuning of f R1, the results presented here corroborate the findings of Titze (1992), who indicated that at lower f o, a shallower spectral slope (and thus a stronger voice source) produces greater intensity output, while at higher f o, a shift of energy toward the fundamental is required to optimize sound output levels. ...
... In contrast, in a voice with a strong voice source, the need for resonance tuning may not be that prevalent. This is again in agreement with previously published data (Titze, 1992). ...
... dB(A) per octave of f o. This is roughly in line with empirical data from voice range profiles (Ternström et al., 2016), where the level increase is due to both vocal tract radiation issues (systematically tested here) and increase in lung pressure as a function of f o (Titze, 1992), i.e., a factor that was not considered in this simulation study. ...
Article
A well-known concept of singing voice pedagogy is “formant tuning,” where the lowest two vocal tract resonances ([Formula: see text]) are systematically tuned to harmonics of the laryngeal voice source to maximize the level of radiated sound. A comprehensive evaluation of this resonance tuning concept is still needed. Here, the effect of [Formula: see text] variation was systematically evaluated in silico across the entire fundamental frequency range of classical singing for three voice source characteristics with spectral slopes of –6, –12, and –18 dB/octave. Respective vocal tract transfer functions were generated with a previously introduced low-dimensional computational model, and resultant radiated sound levels were expressed in dB(A). Two distinct strategies for optimized sound output emerged for low vs high voices. At low pitches, spectral slope was the predominant factor for sound level increase, and resonance tuning only had a marginal effect. In contrast, resonance tuning strategies became more prevalent and voice source strength played an increasingly marginal role as fundamental frequency increased to the upper limits of the soprano range. This suggests that different voice classes (e.g., low male vs high female) likely have fundamentally different strategies for optimizing sound output, which has fundamental implications for pedagogical practice.
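The interplay between spectral slope and fundamental frequency described above can be illustrated with a rough sketch. It is not the model used in the study: it assumes an idealized source whose harmonics fall off by a fixed number of dB per octave, ignores the vocal tract transfer function and A-weighting entirely, and sums harmonic power up to a hypothetical 5 kHz cutoff. The function name and the cutoff are illustrative choices.

```python
import math

def overall_level_db(f0_hz, slope_db_per_octave, fmax_hz=5000.0):
    """Overall level (dB re the fundamental) of an idealized harmonic source.

    Assumption: harmonic n sits slope_db_per_octave * log2(n) dB below or above
    the fundamental; no formants, no radiation term, no A-weighting.
    """
    n_harmonics = max(1, int(fmax_hz // f0_hz))
    total_power = 0.0
    for n in range(1, n_harmonics + 1):
        level_db = slope_db_per_octave * math.log2(n)
        total_power += 10.0 ** (level_db / 10.0)
    return 10.0 * math.log10(total_power)

def slope_gain_db(f0_hz):
    """Level gained by flattening the source slope from -18 to -6 dB/octave."""
    return overall_level_db(f0_hz, -6.0) - overall_level_db(f0_hz, -18.0)

for f0 in (100.0, 1000.0):
    print(f"f0 = {f0:6.0f} Hz: gain from flatter slope = {slope_gain_db(f0):.2f} dB")
```

In this simplified setting the gain from a flatter slope is larger at the low fundamental, because more harmonics fall below the cutoff. The sketch captures only that harmonic-count effect; the resonance tuning effects reported in the study require a vocal tract model.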
... The most obvious clash between frequency and loudness is that loud vocalizations tend to be high-pitched because increasing subglottal pressure and vocal effort, which is necessary for being loud, simultaneously raises f o (Behrman, 2021; Titze, 2000). Of particular relevance to vocal intimidation, voice range profiles demonstrate that the maximum achievable loudness tends to increase as f o rises, at least up to some threshold (Gramming & Sundberg, 1988; Hunter & Titze, 2005; Lamesch et al., 2012; Titze, 1992). Another reason to increase f o if loudness is a priority is that radiation efficiency (i.e., the proportion of generated acoustic power that leaves the mouth) improves at higher frequencies. ...
... f o increased by 1.9 semitones (95% CI [1.5, 2.3]) in women and 1.7 [1.3, 2.2] semitones in men for every doubling of loudness (+6 dB). This change in f o was much smaller than described by Titze (1992), who reported an increase of 8-9 dB per octave in the opposite setup (controlled stepwise increases of f o rather than loudness), but otherwise in line with theoretical expectations and previous evidence. Interestingly, f o rose by about an octave (12 semitones) over the typical range of loudness in vocal ramps (30 dB), and both of these values were very close to the changes in the fight/free and fight/vowel conditions relative to baseline, again suggesting that the rise of f o in intimidating vocalizations may be a simple side effect of moving from relaxed to very loud phonation. ...
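One way to read the comparison in this excerpt is to put both figures in the same units. The sketch below assumes that a boundary slope of 8-9 dB per octave can be inverted linearly to give the f o rise that accompanies a 6 dB level step; that inversion, and the function name, are assumptions for illustration and are not taken from either paper.

```python
def semitones_per_level_step(slope_db_per_octave, level_step_db=6.0):
    """Semitone rise in f_o for a given level step along a line whose
    level climbs slope_db_per_octave (12 semitones per octave).

    Assumption: a simple linear inversion of the dB-per-octave slope.
    """
    return 12.0 * level_step_db / slope_db_per_octave

for slope in (8.0, 9.0):
    st = semitones_per_level_step(slope)
    print(f"{slope:.0f} dB/octave corresponds to {st:.1f} semitones per 6 dB")
```

Under this reading, a slope of 8-9 dB per octave corresponds to roughly 8 to 9 semitones per 6 dB, against the 1.7 to 1.9 semitones per 6 dB quoted in the excerpt.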
Article
Across many species, a major function of vocal communication is to convey formidability, with low voice frequencies traditionally considered the main vehicle for projecting large size and aggression. Vocal loudness is often ignored, yet it might explain some puzzling exceptions to this frequency code. Here we demonstrate, through acoustic analyses of over 3,000 human vocalizations and four perceptual experiments, that vocalizers produce low frequencies when attempting to sound large, but loudness is prioritized for displays of strength and aggression. Our results show that, although being loud is effective for signaling strength and aggression, it poses a physiological trade-off with low frequencies because a loud voice is achieved by elevating pitch and opening the mouth wide into a-like vowels. This may explain why aggressive vocalizations are often high-pitched and why open vowels are considered “large” in sound symbolism despite their high first formant. Callers often compensate by adding vocal harshness (nonlinear vocal phenomena) to undesirably high-pitched loud vocalizations, but a combination of low and loud remains an honest predictor of both perceived and actual physical formidability. The proposed notion of a loudness–frequency trade-off thus adds a new dimension to the widely accepted frequency code and requires a fundamental rethinking of the evolutionary forces shaping the form of acoustic signals.
... The SST range and f o values were chosen to reflect values realistically achievable by healthy adults. 48,72,73 For this exploration of intensity effects, no VE was included, and was set to 0.0 ST. The relative SPL of each tone was measured using the PRAAT "get intensity" function. ...
Article
Objectives. This in silico study explored the effects of a wide range of fundamental frequency (f o), source-spectrum tilt (SST), and vibrato extent (VE) on commonly used frequency and amplitude perturbation and noise measures. Method. Using 53 synthesized tones produced in Madde, the effects of stepwise increases in f o, intensity (modeled by decreasing SST), and VE on the PRAAT parameters jitter % (local), relative average perturbation (RAP) %, shimmer % (local), amplitude perturbation quotient 3 (APQ3) %, and harmonics-to-noise ratio (HNR) dB were investigated. A secondary experiment was conducted to determine whether any f o effects on jitter, RAP, shimmer, APQ3, and HNR were stable. A total of 10 sinewaves were synthesized in Sopran from 100 to 1000 Hz using formant frequencies for /a/, /i/, and /u/-like vowels, respectively. All effects were statistically assessed with Kendall's tau-b and partial correlation. Results. Increasing f o resulted in an overall increase in jitter, RAP, shimmer, and APQ3 values, respectively (P < 0.01). Oscillations of the data across the explored f o range were observed in all measurement outputs. In the Sopran tests, the oscillatory pattern seen in the Madde f o condition remained and showed differences between vowel conditions. Increasing intensity (decreasing SST) led to reduced pitch and amplitude perturbation and HNR (P < 0.05). Increasing VE led to lower HNR and an almost linear increase of all other measures (P < 0.05). Conclusion. These novel data offer a controlled demonstration for the behavior of jitter (local) %, RAP %, shimmer (local) %, APQ3 %, and HNR (dB) when varying f o, SST, and VE in synthesized tones. Since humans will vary in all of these aspects in spoken language and vowel phonation, researchers should take potential resonance-harmonics type effects into account when comparing intersubject or preintervention and post-intervention data using these measures.
... Thus, it may be difficult to distinguish between efficient (and sustainable) and hyperfunctional phonation solely based on ER. Further, fo influences ER, as higher fo are stronger in SPL owing to resonance-harmonics interactions and greater radiation efficiency [70,91,115,116]. Previous research has noted an increase in speakers' PTP after increased vocal demand [90,97,117], most likely owing to increased tissue viscosity, thickness of the vocal folds' colliding edge, and sub-optimal (i.e., too narrow or too wide) prephonatory glottal width [117][118][119]. ...
Article
To date, no established protocol exists for measuring functional voice changes in singers with subclinical singing-voice complaints. Hence, these may go undiagnosed until they progress into greater severity. This exploratory study sought to (1) determine which scale items in the self-perceptual Evaluation of Ability to Sing Easily (EASE) are associated with instrumental voice measures, and (2) construct as proof-of-concept an instrumental index related to singers' perceptions of their vocal function and health status. Eighteen classical singers were acoustically recorded in a controlled environment singing an /a/ vowel using soft phonation. Aerodynamic data were collected during a softly sung /papapapapapapa/ task with the KayPENTAX Phonatory Aerodynamic System. Using multi and univariate linear regression techniques, CPPS, vibrato jitter, vibrato shimmer, and an efficiency ratio (SPL/PSub) were included in a significant model (p < 0.001) explaining 62.4% of variance in participants' composite scores of three scale items related to vocal fatigue. The instrumental index showed a significant association (p = 0.001) with the EASE vocal fatigue subscale overall. Findings illustrate that an aeroacoustic instrumental index may be useful for monitoring functional changes in the singing voice as part of a multidimensional diagnostic approach to preventative and rehabilitative voice healthcare for professional singing-voice users.
... A potential explanation might be a coupling between call frequency and call amplitude. In humans and birds, frequency and amplitude are coupled when vocalising at the physiological limits (very high or very low vocal amplitude): low frequency vocalisations can only be emitted at lower amplitudes and high frequencies are typically of higher amplitudes (Nemeth et al., 2013; Titze, 1992). Echolocation calls are very intense, close to the bats' physiological limits (Currie et al., 2020). ...
Article
Echolocation is the use of self-emitted calls to probe the surrounding environment. The atmosphere strongly absorbs sound energy, particularly high frequencies, thereby limiting the sensory range of echolocating animals. Atmospheric attenuation varies with temperature and humidity, which both vary widely in the temperate zone. Since echolocating insectivorous bats rely on ultrasound to capture insects, their foraging success might decrease with seasonal and daily variations in weather. To counteract weather-induced variations in prey detection, we hypothesised that European bats decrease call frequency and increase call energy when atmospheric attenuation increases, thereby maintaining their prey detection distance. Using acoustic localisation and automated call analysis, we measured call frequency and energy in free-flying bats of three common European insectivorous species. One species, Pipistrellus nathusii/kuhlii, increased call frequency, but simultaneously decreased call energy, while the two other species (P. pipistrellus and Myotis daubentonii) did not alter call parameters. We estimated the detection distance for prey based on the recorded call parameters and prey characteristics, using a custom-developed theoretical model. None of the three species maintained prey detection distance (it decreased by 1.7 to 3.4 m) when atmospheric attenuation increased. This study contributes to a better understanding of the sensory challenges faced by animals in fluctuating environments.
Article
Purpose. This study examined whether the “Three Bears Passage” (TB), a standard Mandarin reading passage, could elicit significant vocal range variations in individuals with voice disorders. Relative sensitivity of TB versus another existing standard reading passage, “Passage in Mandarin” (PM), for differentiating between individuals with and without voice disorders was also evaluated. Method. Forty-two individuals with normal voice and 30 individuals with voice disorders participated in the study. Maximum fundamental frequency (f0), minimum f0, mean f0, f0 range, maximum vocal intensity, minimum intensity, mean intensity, and intensity range of all participants reading aloud the two passages were measured with Praat to construct speech range profiles (SRPs). Results. Significantly larger vocal range was found for TB than for PM in individuals with voice disorders, including significantly higher maximum f0, mean f0, maximum intensity, mean intensity, and significantly larger f0 range and intensity range. Significantly more limited vocal range was observed in individuals with voice disorders than those without, with more obviously restricted SRPs while reading aloud TB compared to PM. Receiver operating characteristic analysis suggested that TB was more sensitive than PM in distinguishing between individuals with and without voice disorders. Conclusions. Our findings supported the potential of TB as a standard clinical assessment tool for evaluating pathological changes in vocal range. Future studies should explore if therapeutic approaches based on the passage or variations of it could be developed for overcoming functional limitations and restrictions in vocal range for specific voice disorders.
Article
Objective. Adenotonsillectomy is one of the most common surgical procedures performed on children. Caregivers are often concerned about voice change after the procedure, and such concerns remain unsettled. This meta‐analysis analyzed voice change in children after adenotonsillectomy. Data Sources. The PubMed, Medline, EMBASE, and Cochrane databases. Review Methods. The study protocol was registered on PROSPERO. Two authors independently searched for articles using keywords “adenoidectomy,” “tonsillectomy,” “voice,” “nasalance,” and “speech.” English articles specifying voice changes after adenotonsillectomy were pooled with standardized mean difference (SMD) using random‐effects model. Evaluation methods were computerized acoustic voice analysis, aerodynamic analysis, nasometer, rhinomanometry, evaluations from a speech‐language pathologist or otolaryngologist, and a caregiver assessment questionnaire. Results. Twenty‐three studies with 2154 children were analyzed (mean age: 8.0 y; 58% boys; mean sample size: 94 children). Due to insufficient data for other outcome variables, this meta‐analysis only summarized changes in the computerized acoustic voice analysis 1 month and 3 months after surgery. The computerized acoustic analysis revealed significant changes in jitter (SMD = −0.36; 95% confidence interval [CI]: −0.60 to −0.11), shimmer (SMD = −0.34; 95% CI: −0.57 to −0.11), and soft phonation index (SMD = −0.36; 95% CI: −0.57 to −0.15) at 1 month after surgery. Parameters including fundamental frequency, jitter, noise‐to‐harmonics ratio, and shimmer were not significantly changed at 3 months after surgery. Conclusions. This meta‐analysis observed small improvements in jitter, shimmer, and soft phonation index 1 month after surgery. No significant effects were observed in voice outcomes 3 months after surgery. Laryngoscope, 2023
Article
Purpose The purpose of this clinical focus article is to summarize a community discussion about the practical implementation of recommended vocal function measures by practicing speech-language pathologists specializing in the treatment of voice and upper airway disorders, review common barriers and challenges to implementation, and suggest opportunities for further education and discussion. Method An online discussion was held with members of American Speech-Language-Hearing Association Special Interest Group (SIG) 3 to facilitate discussion regarding participants' experiences implementing vocal function assessment (acoustic and aerodynamic assessment) in their practice settings. The discussion was based on the expert panel consensus paper by Patel et al. (2018), which provided recommendations for a minimum core set of vocal function measures. Results Discussion topics included standardization methods, environmental factors, preferred hardware and software, tasks and measures, interpretation, and infection control. Participants reported that the recommendations of the consensus paper provide a useful guideline for obtaining a core set of reliable and valid measures. They also reported facing barriers in meeting these recommendations due to varying practice settings and resources. Conclusions Variations in instrumental assessment may arise due to differences in clinic models, testing environments, accessible equipment, allotted time, and clinician opinion. During the discussion, participants emphasized the need for further education and discussion on the implementation of vocal function assessment, particularly regarding adaptations for different clinical models, low-cost and low-tech alternatives, synthesis of findings, and the relevance of additional or omitted measures in specific situations. 
To address these concerns, it is recommended that the SIG 3 community delve deeper into this topic, open additional discussion about various topics cited as barriers to vocal function assessment implementation, and create ongoing educational opportunities for clinicians, especially for those who lack access to a voice-specialized clinical fellowship program or mentorship by a specialized clinical expert.
Article
This study compared voice range profiles (VRPs) of modal and falsetto register in 53 dysphonic and 53 non-dysphonic adult women with gliding vowel /a/. The results show that maximum fundamental frequency (F0MAX), maximum intensity (IMAX), F0 range (F0RANGE), and intensity range (IRANGE) are lower in the dysphonic group than in the non-dysphonic group. F0MAX and F0RANGE are significantly higher in falsetto register than modal register in both groups. IMAX and IRANGE are significantly higher in falsetto register in the non-dysphonic group, but they do not differ between the two registers in the dysphonic group. There was no statistically significant difference in minimum F0 (F0MIN) and minimum intensity (IMIN) between the two groups. Modal-falsetto register transition occurred at 378.86 Hz (F4#) in the dysphonic group and 557.79 Hz (C5#) in the non-dysphonic group, which was significantly lower in the dysphonic group. It can be seen that both modal and falsetto registers in dysphonic adult women are reduced compared to non-dysphonic adult women, indicating that the vocal folds of dysphonic adult women do not vibrate easily at high pitches. The results of this study provide basic data for understanding the acoustic features of voice disorders.
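The note names quoted for the register transition frequencies can be checked with a short conversion. The sketch assumes twelve-tone equal temperament with A4 = 440 Hz, which the abstract does not state, and it writes the sharp before the octave number (F#4), whereas the abstract writes F4#. The function name is illustrative.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name for a frequency.

    Assumption: A4 = 440 Hz reference tuning; MIDI note 69 is A4.
    """
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"

for label, freq in (("dysphonic", 378.86), ("non-dysphonic", 557.79)):
    print(f"{label}: {freq} Hz is nearest to {nearest_note(freq)}")
```

Under these assumptions the two transition frequencies map to F#4 and C#5, matching the note names given in the abstract.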
Article
Glottal flow parameters are generally defined as time-domain entities that specify the shape of glottal pulses or their derivatives. The present study is concerned with the relations of glottal parameters to frequency-domain properties in order to bring out perceptually important aspects. The analysis also aims at techniques to extract frequency- as well as time-domain parameters from frequency-domain representations. This involves frequency-domain inverse filtering, analytical transformations, and analysis-by-synthesis procedures. Frequency-domain processing is recommended as a complement to or a substitute for conventional time-domain analysis. An advantage is the less severe demand on low-frequency recording fidelity. Moreover, already available narrow-band spectral sections may be processed in order to derive major voice source parameters. The frequency-domain matching ensures optimal conditions for Hi-Fi resynthesis. The theoretical analysis also sheds light on time-domain processing techniques suitable to support frequency-domain processing, e.g., selective inverse filtering. The frequency-domain analysis includes studies of covarying formant bandwidths and subglottal coupling effects which become especially apparent in breathy voicing.
Article
This book has its origin in a letter. In November of 1959, the late Prof. Dr. WERNER MEYER-EPPLER wrote to me, asking if I would contribute to a series he was planning on Communication. His book "Grundlagen und Anwendungen der Informationstheorie" was to serve as the initial volume of the series. After protracted consideration, I agreed to undertake the job provided it could be done outside my regular duties at the Bell Telephone Laboratories. Shortly afterwards, I received additional responsibilities in my research organization, and felt that I could not conveniently pursue the manuscript. Consequently, except for the preparation of a detailed outline, the writing was delayed for about a year and a half. In the interim, Professor MEYER-EPPLER suffered a fatal illness, and Professors H. WOLTER and W. D. KEIDEL assumed the editorial responsibilities for the book series. The main body of this material was therefore written as a leisure-time project in the years 1962 and 1963. The complete draft of the manuscript was duplicated and circulated to colleagues in three parts during 1963. Valuable comments and criticisms were obtained, revisions made, and the manuscript submitted to the publisher in March of 1964. The mechanics of printing have filled the remaining time. If the reader finds merit in the work, it will be owing in great measure to the people with whom I have had the good fortune to be associated.
Article
Phonation threshold pressure has previously been defined as the minimum lung pressure required to initiate phonation. By modeling the dependence of this pressure on fundamental frequency, it is shown that relatively simple aerodynamic relations for time-varying flow in the glottis are obtained. Lung pressure and peak glottal flow are nearly linearly related, but not proportional. For this reason, typical power-law relations that have previously been proposed do not hold. Glottal impedance for time-varying flow must be defined differentially rather than as a simple ratio between pressure and flow. It is shown that the peak flow, the peak flow derivative, the open quotient, and the speed quotient of inverse-filtered glottal flow waveforms all depend explicitly on phonation threshold pressure. Data from singers are compared with those from nonsingers. The primary difference is that singers obtain two to three times greater peak flow for a given lung pressure, suggesting that they adjust their glottal or vocal tract impedance for optimal flow transfer between the source and the resonator.
Article
Measurements on the inverse filtered airflow waveform (the “glottal waveform”) and of estimated average transglottal pressure and glottal airflow were made from noninvasive recordings of productions of syllable sequences in soft, normal, and loud voice for 25 male and 20 female speakers. Statistical analyses showed that with change from normal to loud voice, both males and females produced loud voice with increased pressure, accompanied by increased ac flow and increased maximum airflow declination rate. With change from normal voice, soft voice was produced with decreased pressure, ac flow and maximum airflow declination rate, and increased dc and average flow. Within the loudness conditions, there was no significant male–female difference in air pressure. Several glottal waveform parameters separated males and females in normal and loud voice. The data indicate higher ac flow and higher maximum airflow declination rate for males. In soft voice, the male and female glottal waveforms were more alike, and there was no significant difference in maximum airflow declination rate. The dc flow did not differ significantly between males and females. Possible relevance to biomechanical differences and differences in voice source characteristics between males and females and across loudness conditions is discussed.