Source publication
Voice acoustic analysis is becoming more and more useful in the diagnosis of voice disorders and laryngological pathologies. The ease of recording a voice signal is an advantage over other, invasive techniques. This paper presents statistical analyses of a set of voice parameters, namely jitter, shimmer and HNR, across four groups of subjects with dysphoni...
Similar publications
Purpose:
The present study aimed at observing the possible differential effects of eight semioccluded vocal tract exercises (SOVTE) on vocal economy measured by the Quasi Output Cost Ratio (QOCR).
Methods:
Thirty-six participants were included in this study. They were divided into two groups: an experimental group of subjects diagnosed with mild...
Citations
... As part of the acoustic voice parameters, jitter ppq5 (%), shimmer (dB), HNR (dB), and CPPS (dB) were extracted. Jitter is influenced by the controlled vibration of the vocal cords; shimmer correlates with reduced glottal resistance; HNR captures the overall periodicity of the voice signal as the ratio of its harmonic to noise components, 30 and increased CPPS indicates a good harmonic structure in the voice. 31,32 Therefore, the perturbation parameters, that is, jitter and shimmer, are expected to decrease, whereas HNR and CPPS are expected to increase with improved vocal function following effective voice therapy. ...
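The parameters named in this excerpt can be extracted with the praat-parselmouth Python bindings for Praat. The sketch below is a minimal, hedged example: the file name voice.wav is a placeholder, the thresholds are Praat's defaults, and CPPS is omitted because it requires Praat's cepstrogram machinery rather than a single query.

```python
# Minimal sketch: jitter (ppq5), shimmer (dB) and HNR via praat-parselmouth.
# "voice.wav" is a hypothetical sustained-vowel recording; all thresholds are
# Praat defaults.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("voice.wav")

# Glottal pulse sequence required by the jitter/shimmer queries (75-500 Hz range)
pulses = call(snd, "To PointProcess (periodic, cc)", 75, 500)

jitter_ppq5 = call(pulses, "Get jitter (ppq5)", 0, 0, 0.0001, 0.02, 1.3)
shimmer_db = call([snd, pulses], "Get shimmer (local_dB)",
                  0, 0, 0.0001, 0.02, 1.3, 1.6)

harmonicity = snd.to_harmonicity_cc()         # HNR contour in dB
hnr_db = call(harmonicity, "Get mean", 0, 0)  # mean over the whole recording

print(f"jitter ppq5 = {jitter_ppq5:.4%}, "
      f"shimmer = {shimmer_db:.2f} dB, HNR = {hnr_db:.1f} dB")
```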
Summary: Purpose. The data on the effectiveness of the semi-occluded vocal tract exercise (SOVTE) program over multiple sessions are not yet sufficiently documented, and there is a need for future research within structured therapeutic frameworks. Investigating therapeutic techniques within a treatment paradigm may help to evaluate the long-term effectiveness and better management of voice disorders.
Study design. This was a prospective, single-blind, randomized controlled trial.
Method. The study was a four-arm trial: (a) a lip trill group (experimental group 1), (b) a straw phonation group (experimental group 2), (c) a vocal function exercises group (experimental group 3), and (d) a no-treatment group (NT). The study included 44 participants who were diagnosed with a voice disorder secondary to organic causes. Outcome measures were obtained after 6 voice therapy sessions of 30–40 minutes each, carried out twice weekly for the 3 experimental groups. Independent practice at home was used as well. Subjective, objective, and multiparametric measures of voice were analyzed.
Results. Linear mixed-model analyses revealed significant effects of Conditions and Groups, confirming treatment efficacy. Repeated measures analysis of variance also showed strong condition effects. Posttreatment, the values of the subjective parameters (Grade, Roughness, Breathiness, Asthenia, Strain, and the scores of the Vocal Performance Questionnaire) decreased; the acoustic and multiparametric measures such as jitter, shimmer, Acoustic Voice Quality Index, and Acoustic Breathiness Index scores decreased while the smoothed cepstral peak prominence values increased; and the perturbation-related electroglottographic parameters decreased in all experimental groups. Post hoc tests confirmed significantly better outcomes in the experimental groups versus the NT, and the lip trill group showed the best improvements among the experimental groups.
Conclusion. The results confirmed that all three SOVTEs caused significant improvements in the outcome measures in individuals with dysphonia compared with the NT.
Trial registration. Clinical Trials Registry—India (CTRI/2023/09/057396).
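As a rough illustration of the linear mixed-model analysis this abstract reports, the sketch below fits a random-intercept model with statsmodels. The data file and the column names (jitter, condition, group, subject) are assumptions for the example, not the study's actual variables.

```python
# Sketch: random-intercept linear mixed model of a voice outcome.
# "voice_outcomes.csv" and its columns are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("voice_outcomes.csv")  # long format: one row per subject x condition

# Fixed effects: condition (pre/post), group (therapy arm) and their interaction;
# a random intercept per subject absorbs between-speaker baseline differences.
model = smf.mixedlm("jitter ~ condition * group", df, groups=df["subject"])
result = model.fit()
print(result.summary())
```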
... Our results are consistent with previous research (Santos et al., 2013), which also identified increased levels of jitter and shimmer, along with reduced HNR, as distinguishing acoustic features in children with ASD, integrated into an ML model to classify ASD and TD groups. The study further emphasizes that these vocal quality differences, linked to breathiness, hoarseness, and roughness, can serve as early biomarkers for ASD or even other disorders or pathologies (Meghashree & Nataraja, 2019; Santos et al., 2013; Teixeira & Fernandes, 2015). ... optimization, were implemented using the PyTorch v.2.0.1 framework. ...
The objective of this study is to identify the acoustic characteristics of cries of Typically Developing (TD) and Autism Spectrum Disorder (ASD) children via Deep Learning (DL) techniques to support clinicians in the early detection of ASD.
We used an existing cry dataset that included 31 children with ASD and 31 TD children aged between 18 and 54 months. Statistical analysis was applied to find differences between groups for different voice acoustic features such as jitter, shimmer and harmonics-to-noise ratio (HNR). A DL model based on Recursive Convolutional Neural Networks (R-CNN) was developed to classify cries of ASD and TD children.
We found a statistically significant increase in jitter and shimmer for ASD cries compared to TD cries, as well as a decrease in HNR for ASD cries. Additionally, the DL algorithm achieved an accuracy of 90.28% in differentiating ASD cries from TD cries.
Empowering clinicians with automatic non-invasive Artificial Intelligence (AI) tools based on cry vocal biomarkers holds considerable promise in advancing early detection and intervention initiatives for children at risk of ASD, thereby improving their developmental trajectories.
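A group comparison of the kind this abstract describes can be sketched with SciPy. The values below are made-up placeholders (one value per child would come from the extracted cry features), and the Mann-Whitney U test is one reasonable choice for small samples; the study does not state which test it used.

```python
# Sketch: testing ASD vs TD group differences in acoustic cry features.
# The arrays are fabricated placeholders, not data from the study.
import numpy as np
from scipy import stats

asd = {"jitter": np.array([1.9, 2.3, 2.1, 2.6, 2.2]),
       "hnr":    np.array([9.8, 8.7, 10.1, 9.2, 9.5])}
td  = {"jitter": np.array([1.2, 1.4, 1.1, 1.5, 1.3]),
       "hnr":    np.array([13.4, 12.8, 14.0, 13.1, 13.6])}

for feature in ("jitter", "hnr"):
    # Mann-Whitney U avoids a normality assumption in small samples
    u, p = stats.mannwhitneyu(asd[feature], td[feature], alternative="two-sided")
    print(f"{feature}: U = {u:.1f}, p = {p:.4f}")
```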
... 4. Jitter and shimmer, which are acoustic measures of voice stability (Teixeira et al., 2013), were frequently found to be higher in EL infants, correlating with later diagnoses of autism or behavioral issues (Santos et al., 2013). They are often associated with hoarseness and roughness of the voice (Teixeira & Fernandes, 2015). Jitter refers to the cycle-to-cycle variation in F0, while shimmer measures the cycle-to-cycle variation in amplitude. 5. Formants (F1, F2, etc.) are the prominent spectral peaks observed in the acoustic signal, created by resonance frequencies shaped by articulation in speech. ...
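The cycle-to-cycle definitions quoted above translate directly into code. The sketch below computes local jitter and shimmer from hypothetical arrays of per-cycle periods and peak amplitudes; a real pipeline would obtain those arrays from pitch-mark extraction.

```python
# Sketch: local jitter and shimmer from their textbook definitions.
# The period/amplitude arrays are illustrative placeholders.
import numpy as np

def local_jitter(periods: np.ndarray) -> float:
    """Mean absolute difference of consecutive periods, relative to the mean period."""
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes: np.ndarray) -> float:
    """Mean absolute difference of consecutive peak amplitudes, relative to the mean."""
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

periods = np.array([0.0100, 0.0102, 0.0099, 0.0101, 0.0100])  # ~100 Hz voice
amps = np.array([0.82, 0.80, 0.84, 0.81, 0.83])

print(f"jitter  = {local_jitter(periods):.2%}")   # cycle-to-cycle F0 variation
print(f"shimmer = {local_shimmer(amps):.2%}")     # cycle-to-cycle amplitude variation
```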
... Both retrospective and prospective studies demonstrate that features like F0, hyperphonation, amplitude, voice quality measures, and cry duration serve as reliable vocal biomarkers for distinguishing AutI/EL from TD/DL groups. Among these, hyperphonation is often associated with heightened arousal or stress, potentially reflecting the sensory or emotional regulation differences common in autism (Zeskind et al., 2011), while voice quality irregularities in jitter and shimmer may signify underlying motor or neurological differences (Teixeira & Fernandes, 2015), underscoring the potential of infant cries as a biomarker for early screening. Moreover, the consistency of these findings across different ages and cry contexts (e.g., distress-related or spontaneous cries) also suggests their robustness as vocal markers of early autism-related neurodevelopmental differences. ...
Cry analysis is emerging as a promising tool for early autism identification. Acoustic features such as fundamental frequency (F0), cry duration, and phonation have shown potential as early vocal biomarkers. This systematic review and meta-analysis aimed to evaluate the diagnostic value of cry characteristics and the role of Machine Learning (ML) in improving autism screening. A comprehensive search of relevant databases was conducted to identify studies examining acoustic cry features in infants with an elevated likelihood of autism. Inclusion criteria focused on retrospective and prospective studies with clear cry feature extraction methods. A meta-analysis was performed to synthesize findings, particularly focusing on differences in F0, and assessing the role of ML-based cry analysis. The review identified eleven studies with consistent acoustic markers, including F0, phonation, duration, amplitude, and voice quality, as reliable indicators of neurodevelopmental differences associated with autism. ML approaches significantly improved screening precision by capturing non-linear patterns in cry data. The meta-analysis of six studies revealed a trend toward higher F0 in autistic infants, although the pooled effect size was not statistically significant. Methodological heterogeneity and small sample sizes were notable limitations across studies. Cry analysis holds promise as a non-invasive, accessible tool for early autism screening, with ML integration enhancing its diagnostic potential. However, the findings emphasize the need for large-scale, longitudinal studies with standardized methodologies to validate its utility and ensure its applicability across diverse populations. Addressing these gaps could establish cry analysis as a cornerstone of early autism identification.
... The overall HNR value of the signal varies because different vocal tract configurations involve different amplitudes for the harmonics [60]. The vowel /a/ generally has a lower HNR than the vowels /i/ and /u/ [61]. The vowel /a/ presents a constriction in the pharynx and an expansion near the lips, while the vowel /i/ requires a constriction near the lips and an ...
Introduction. Teachers have a high risk of developing voice disorders and high stress levels due to their working conditions. Moreover, stress causes changes at a physiological level in different systems such as the cardiac, gastrointestinal, and respiratory systems. In the latter, the rate of airflow is increased, producing significant changes in the acoustic parameters of the voice. Methods. An exploratory, correlational, longitudinal study was conducted to investigate the association between perceived stress and three acoustic parameters related to voice perturbation and harmonicity (jitter, shimmer, and harmonics-to-noise ratio) among college professors. The study also aimed to explore potential changes in this association over the follow-up period. Twenty-four college professors participated in the study. Participants completed a questionnaire that gathered information on socio-demographic characteristics, working conditions, and stress perception. Voice samples were collected from each participant and subjected to acoustic analysis using Praat software. To examine the associations between stress levels and the acoustic parameters, generalized linear models (GLM) with a gamma distribution were employed. Results. We found that professors with low stress levels had increased jitter and shimmer values, whereas participants with moderate and high stress levels had increased harmonics-to-noise ratio values compared to those with a lower stress level. Conclusions. Stress has an important effect on voice perturbation and harmonicity parameters. These results justify the design of interdisciplinary workplace interventions for voice disorders among teachers that include activities on stress management.
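The gamma-family GLM mentioned in the Methods can be sketched with statsmodels. The file and column names below (professors_voice.csv, stress_level, hnr) are assumptions for illustration; a log link is a common pairing with the gamma family for strictly positive, right-skewed acoustic measures, though the study does not specify its link function.

```python
# Sketch: gamma GLM relating an acoustic parameter to perceived stress level.
# The data file and columns are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("professors_voice.csv")
# Dummy-code the stress level (low / moderate / high), dropping one reference level
X = sm.add_constant(pd.get_dummies(df["stress_level"], drop_first=True).astype(float))
y = df["hnr"]  # strictly positive outcome

model = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log()))
print(model.fit().summary())
```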
... The Harmonics-to-Noise Ratio (HNR) allows the relationship between the harmonic and noise components of a speech signal to be assessed. Different vocal tract topologies result in different amplitudes for the harmonics, which can cause the HNR value of a signal to change [16,21–24]. ...
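One common way to make the harmonics-to-noise relationship concrete is Boersma's autocorrelation formulation, HNR = 10·log10(r_max / (1 - r_max)), where r_max is the normalized autocorrelation at the pitch period. The sketch below estimates it on a synthetic frame; the pitch search range and frame length are illustrative choices, not values from the cited works.

```python
# Sketch: autocorrelation-based HNR estimate on a synthetic harmonic+noise frame.
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(0, 0.04, 1 / fs)                        # one 40 ms frame
frame = np.sin(2 * np.pi * 200 * t) + 0.05 * rng.standard_normal(t.size)

n = frame.size
ac = np.correlate(frame, frame, mode="full")[n - 1:]  # lags 0..n-1
ac = ac / (n - np.arange(n))                          # unbiased estimate
ac /= ac[0]                                           # normalize so r(0) = 1

lo, hi = fs // 500, fs // 75                          # 75-500 Hz pitch search range
lag = lo + np.argmax(ac[lo:hi])
r_max = ac[lag]

hnr_db = 10 * np.log10(r_max / (1 - r_max))
print(f"estimated HNR = {hnr_db:.1f} dB")             # about 23 dB at this noise level
```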
Using acoustic analysis to classify and identify speech disorders non-invasively can reduce waiting times for patients and specialists while also increasing the accuracy of diagnoses. In order to identify models to use in a vocal disease diagnosis system, we want to know which models have higher success rates in distinguishing between healthy and pathological sounds. For this purpose, 708 pathological subjects spread across 19 pathologies and 194 control subjects were used. There are nine sound files per subject: three vowels, each in three tones. From each sound file, 13 parameters were extracted. For the classification of healthy/pathological individuals, a variety of Machine Learning classifiers were used, including decision trees, discriminant analyses, logistic regression classifiers, naive Bayes classifiers, support vector machines, nearest-neighbor classifiers, ensemble classifiers and artificial neural network classifiers. For each patient, 118 parameters were used initially. The first analysis aimed to find the best classifier, obtaining an accuracy of 81.3% for the Ensemble Subspace Discriminant classifier. The second and third analyses aimed to improve on this baseline accuracy using preprocessing methodologies. In the second analysis, the PCA technique was used, with an accuracy of 80.2%. The third analysis combined several outlier-treatment models with several data-normalization models; in general, accuracy improved, with the best accuracy (82.9%) obtained by combining the Grubbs model for outlier treatment with the range model for data normalization.
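A scikit-learn version of this preprocessing-plus-classification pipeline might look like the sketch below. The data are random placeholders; MinMaxScaler plays the role of the range normalization, and a random forest stands in for the paper's Ensemble Subspace Discriminant classifier, which scikit-learn does not provide under that name.

```python
# Sketch: range normalization + PCA + ensemble classifier, cross-validated.
# X and y are random placeholders, not the paper's data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 118))          # placeholder: 118 parameters per subject
y = rng.integers(0, 2, size=200)         # placeholder: healthy=0 / pathological=1

pipe = make_pipeline(MinMaxScaler(),     # "range" normalization of each feature
                     PCA(n_components=20),
                     RandomForestClassifier(n_estimators=300, random_state=0))
scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```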
... Analyzing voice signals yields a number of parameters. For the purpose of diagnosing dysphonia, Teixeira and Fernandes (2015) investigated how reliable the jitter, shimmer, and HNR features are. The three parameters were statistically examined for the vowels /a/, /i/, and /u/ in high, low, and normal tones. ...
... The Jitta parameter stood out for nearly every tone across the vowel system. This relates to the examination conducted in (Teixeira & Fernandes, 2015), which highlights a distinction between normal and pathological conditions in terms of jitter characteristics across three different tones and vowels. No emphasis was placed on any particular pitch or vowel for the jitter parameters. ...
Analysis of the voice is an important diagnostic technique that may be used to identify anomalies in the voice, providing a non-invasive alternative to invasive procedures. Within the scope of this research, a complete investigation of voice disorder evaluation approaches is conducted, with a particular emphasis placed on acoustic analysis and classification. The research makes use of a number of metrics, including Jitter, Shimmer, and Harmonic-to-Noise Ratio (HNR), in conjunction with an Artificial Neural Network (ANN) classifier. Using the Saarbruecken Voice Database (SVD), the research attempts to differentiate between healthy and dysphonic voices regardless of gender. Principal Component Analysis (PCA) is used for feature selection to improve the accuracy of the model. Among the females, the best accuracy achieved when using all indicators to differentiate between healthy and dysphonic persons was 87.9%. By using the produced output model, we were able to reduce the number of input parameters to 17; the highest obtained accuracy remained 87.9%, with neither loss nor gain of information. The male case shows considerably lower precision: the male group attained a maximum accuracy of 94% when classifying between healthy and dysphonic individuals using all measures, and the generated output model enabled us to decrease the input parameters to 14, after which the highest level of accuracy achieved was 70% across all parameters. The findings demonstrate varying degrees of accuracy in the male and female groups, demonstrating the efficacy of particular factors in categorization.
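The PCA-plus-ANN setup this abstract describes can be sketched with scikit-learn, whose MLPClassifier stands in for the paper's ANN. The data below are random placeholders; only the reduction to 17 components echoes the reported configuration.

```python
# Sketch: PCA feature reduction feeding a small neural-network classifier.
# X and y are random placeholders, not Saarbruecken Voice Database data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 28))        # placeholder feature matrix
y = rng.integers(0, 2, size=300)      # healthy=0 / dysphonic=1

clf = make_pipeline(StandardScaler(),
                    PCA(n_components=17),   # reduced parameter set, as reported
                    MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                  random_state=1))
print(f"mean CV accuracy: {cross_val_score(clf, X, y, cv=5).mean():.3f}")
```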
... Jitter and shimmer are the two most common perturbation measures in acoustic analysis. Jitter is a measure of frequency instability, while shimmer is a measure of amplitude instability [15]. Perturbation refers to a disturbance in the regularity of a waveform and correlates with the perceived roughness or harshness of the voice. ...
Currently, there are no established objective biomarkers for diagnosing or monitoring schizophrenia. Studies have shown that there are noteworthy differences in the speech of schizophrenics. The primary goal of the current study is to examine possible acoustic differences in vowel production between Greek speakers with schizophrenia and healthy controls. Eleven Greek speakers with schizophrenia and twelve healthy controls participated in the study. The results showed significant differences between the two groups in F1 and F2 frequencies, in jitter and shimmer as well as in the total length of pauses in spontaneous speech. These can pave the way for future developments toward the detection of disease patterns using inexpensive and non-invasive methods.
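Formant frequencies such as the F1 and F2 values analyzed in this study are routinely measured with Praat; the sketch below does so through the praat-parselmouth bindings. The file name vowel.wav and the midpoint sampling are illustrative assumptions.

```python
# Sketch: F1/F2 at a vowel's temporal midpoint via praat-parselmouth.
# "vowel.wav" is a hypothetical recording of a sustained vowel.
import parselmouth

snd = parselmouth.Sound("vowel.wav")
formants = snd.to_formant_burg()   # Burg method, Praat defaults (5 formants, 5500 Hz)

t_mid = snd.duration / 2
f1 = formants.get_value_at_time(1, t_mid)
f2 = formants.get_value_at_time(2, t_mid)
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```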
... This is in line with Zeskind's findings (Zeskind et al., 1985), where cries with a faster repetition rate, shorter cry expirations or pauses, and higher F0 values may elicit more urgent caregiver responses than other vocalizations with less intense acoustic characteristics. Also, our results matched the limited literature on Jitter, Shimmer, HNR or excessive crying when studying irritable infants (Fuller et al., 1994) or dysphonia in adults (Teixeira and Fernandes, 2015). In summary, our findings were consistent with the assumption that the myelinated branch of the vagus system is involved in both the regulation of heart rate and the laryngeal muscles, suggesting that vagal influence on the heart may reflect vagal output to the laryngeal muscles, related to the F0 of infant crying (Shinya et al., 2016). ...
Introduction
Even though infant crying is a common phenomenon in humans' early life, it is still a challenge for researchers to properly understand it as a reflection of complex neurophysiological functions. Our study aims to determine the association of neonatal cry acoustics with neurophysiological signals and behavioral features according to the different cry distress levels of newborns.
Methods
Multimodal data from 25 healthy term newborns were collected simultaneously recording infant cry vocalizations, electroencephalography (EEG), near-infrared spectroscopy (NIRS) and videos of facial expressions and body movements. Statistical analysis was conducted on this dataset to identify correlations among variables during three different infant conditions (i.e., resting, cry, and distress). A Deep Learning (DL) algorithm was used to objectively and automatically evaluate the level of cry distress in infants.
Results
We found correlations between most of the features extracted from the signals depending on the infant's arousal state, among them: fundamental frequency (F0), brain activity (delta, theta, and alpha frequency bands), cerebral and body oxygenation, heart rate, facial tension, and body rigidity. Additionally, these associations reinforce that what occurs at the acoustic level can be characterized by behavioral and neurophysiological patterns. Finally, the DL audio model developed was able to classify the different levels of distress, achieving 93% accuracy.
Conclusion
Our findings strengthen the potential of crying as a biomarker evidencing the physical, emotional, and health status of the infant, making it a crucial tool for caregivers and clinicians.
... The remaining determination forms of jitter and shimmer are not used because, in a statistical study carried out by [61], they did not show statistically significant differences from relative jitter and relative shimmer, respectively. ...
Schizophrenia is a mental illness that affects an estimated 21 million people worldwide. The literature establishes that electroencephalography (EEG) is a well-implemented means of studying and diagnosing mental disorders. However, it is known that speech and language provide unique and essential information about human thought. Semantic and emotional content, semantic coherence, syntactic structure, and complexity can thus be combined in a machine learning process to detect schizophrenia. Several studies show that early identification is crucial to prevent the onset of illness or mitigate possible complications. Therefore, it is necessary to identify disease-specific biomarkers for an early diagnosis support system. This work contributes to improving our knowledge about schizophrenia and the features that can identify this mental illness via speech and EEG. The emotional state is a specific characteristic of schizophrenia that can be identified with speech emotion analysis. The most used speech features found in the literature review are fundamental frequency (F0), intensity/loudness (I), frequency formants (F1, F2, and F3), Mel-frequency cepstral coefficients (MFCCs), the duration of pauses and sentences (SD), and the duration of silence between words. Combining at least two feature categories achieved high accuracy in schizophrenia classification. Prosodic and spectral or temporal features achieved the highest accuracy. The work with the highest accuracy used the prosodic and spectral features QEVA, SDVV, and SSDL, which were derived from the F0 and the spectrogram. The emotional state can be identified with most of the features previously mentioned (F0, I, F1, F2, F3, MFCCs, and SD), linear prediction cepstral coefficients (LPCC), linear spectral features (LSF), and the pause rate. Using event-related potentials (ERP), the most promising features found in the literature are mismatch negativity (MMN), P2, P3, P50, N1, and N2. The EEG features with the highest accuracy in classifying subjects with schizophrenia are nonlinear features such as Cx, HFD, and Lya.
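Several of the speech features listed here, F0 and MFCCs in particular, are straightforward to extract with librosa, as the sketch below illustrates; the file speech.wav and the 75-500 Hz pitch range are assumptions for the example.

```python
# Sketch: extracting per-frame F0 and 13 MFCCs with librosa.
# "speech.wav" is a hypothetical recording.
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=None)

f0 = librosa.yin(y, fmin=75, fmax=500, sr=sr)        # per-frame F0 estimates (Hz)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 coefficients per frame

print(f"median F0: {np.median(f0):.1f} Hz, MFCC matrix shape: {mfcc.shape}")
```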
... The voices were recorded with a Zoom recorder (H5, Japan) [27]. Subjective assessment: patients were given a Persian version of the VHI questionnaire. This questionnaire has 30 questions in three subgroups: emotional, functional, and physical. ...
... The mean overall voice severity parameter "G" from the GRBAS scale, ranging from 0 to 3 (0 = normal, 1 = mild, 2 = moderate, and 3 = severe), was considered for each patient [27]. Two experienced speech therapists, completely unaware of the patients' condition, evaluated all voice samples separately. All the mentioned tests were repeated at three time points: before, at the end of, and 6 months after treatment. ...
... BLR is used when the outcome variable has two categories [22,27]. In this study, laryngeal damage was considered the dependent variable (present = 1, absent = 0), while chemotherapy; the mean, maximum, and minimum doses in the larynx; V50 Gy in a volume of 27% or more of the larynx; smoking status; age; and gender were considered independent variables. First, a statistical collinearity test was carried out to investigate collinearity between the predictor variables. ...
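The workflow in this excerpt, a collinearity screen followed by binary logistic regression with odds ratios, can be sketched with statsmodels. The file laryngeal_outcomes.csv and its column names are assumptions for the example.

```python
# Sketch: collinearity check (VIF) then binary logistic regression with odds ratios.
# The data file and columns are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("laryngeal_outcomes.csv")
X = sm.add_constant(df[["chemo", "mean_dose", "v50", "age", "gender"]].astype(float))
y = df["laryngeal_damage"]                 # presence = 1, absent = 0

# Collinearity screen: VIF above roughly 5-10 flags a problematic predictor
for i, col in enumerate(X.columns[1:], start=1):
    print(f"VIF({col}) = {variance_inflation_factor(X.values, i):.2f}")

result = sm.Logit(y, X).fit()
print(np.exp(result.params))               # odds ratios per independent variable
```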
Background: Laryngeal damage after chemoradiation therapy (RT) in nonlaryngeal head-and-neck cancers (HNCs) can cause voice disorders and ultimately reduce the patient's quality of life (QOL). The aim of this study was to evaluate voice and predict laryngeal damage using statistical binary logistic regression (BLR) models in patients with nonlaryngeal HNCs. Methods: This cross-sectional experimental study was performed on seventy patients (46 males, 24 females) with an average age of 50.43 ± 16.54 years, with nonlaryngeal HNCs, and eighty individuals with assumed normal voices. Subjective and objective voice assessments were carried out at three stages: before, at the end of, and 6 months after treatment. Finally, the Enter method of the BLR was used to measure the odds ratios of the independent variables. Results: In the objective evaluation, the acoustic parameters except F0 increased significantly (P < 0.001) at the end-of-treatment stage and decreased 6 months after treatment. The same trend was seen in the subjective evaluations, although none of the values returned to pretreatment levels. Statistical BLR models showed that chemotherapy (P < 0.05), mean laryngeal dose (P < 0.05), V50 Gy (P = 0.002), and gender (P = 0.008) had the greatest effect on the incidence of laryngeal damage. The model based on acoustic analysis had the highest accuracy (84.3%), sensitivity (87.2%), and area under the curve (0.927). Conclusions: Voice evaluation and the use of BLR models to determine important factors are optimal methods for reducing laryngeal damage and maintaining the patient's QOL.