ABSTRACT: Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined in order to model perceptual evaluation of the German Roughness-Breathiness-Hoarseness (RBH) scheme. 58 connected speech samples (43 women and 15 men; years) containing the German version of the text “The North Wind and the Sun” were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression with measurements of the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx) of the Laryngograph and 33 features from a prosodic analysis module were used to model the listeners’ ratings. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (, ). These correlations were approximately the same as the interrater agreement among human raters (, ). CQx was one of the substantial features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis.
Full-text · Article · Jul 2015 · Computational and Mathematical Methods in Medicine
ABSTRACT: Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text ‘Der Nordwind und die Sonne’ (‘The North Wind and the Sun’). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion ‘match of breath and sense units’ and r = 0.87 for the overall voice quality. Human–machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
No preview · Article · May 2015 · Logopedics, phoniatrics, vocology
ABSTRACT: Parkinson’s disease (PD) is a chronic neurodegenerative disorder of the central nervous system that can affect patients’ communication skills. The research community is interested in developing computer-aided tools that analyze the speech of people with PD for detection and monitoring. In this paper, three new acoustic measures for the simultaneous analysis of the phonation and articulation of patients with PD are presented. These new measures, along with other classical articulation and perturbation features, are objectively evaluated with a discriminant criterion. According to the results, the speech of people with PD can be detected with an accuracy of 81% when phonation and articulation features are combined.
ABSTRACT: Background
Patients with chronic laryngitis and patients with T1 vocal fold carcinoma were compared by means of subjective and objective, text-based voice and speech analysis in order to determine which patient group is more restricted in its vocal communication ability and whether the two groups can be differentiated.
Patients and methods
Thirteen patients with histologically confirmed chronic laryngitis and thirteen with T1 vocal fold carcinoma were rated and compared subjectively by five experts on the basis of seven clinically relevant criteria and objectively by means of automatic speech recognition and prosodic analysis.
Results
Neither the data of the five experts nor the results of the automatic evaluation show significant differences between the two patient groups in the voice and speech analysis.
Conclusion
Subjective and objective voice and speech analysis cannot distinguish between chronic laryngitis and T1 vocal fold carcinoma, because the patient groups do not differ significantly in their voice quality.
ABSTRACT: BACKGROUND: Patients with chronic laryngitis and T1 vocal cord cancer were compared using perceptual and text-based objective voice and speech analyses in order to determine which group is more affected in its ability to communicate and whether a distinction between the two pathologies is possible. PATIENTS AND METHODS: In all, 13 patients with histologically proven chronic laryngitis and 13 patients with T1 vocal cord cancer were compared perceptually by five speech therapists on the basis of seven criteria and objectively by a speech recognition system and prosodic analysis. RESULTS: Both the data of the five speech therapists and the results of the automatic analysis revealed no significant differences between the two patient groups. CONCLUSION: A distinction between chronic laryngitis and T1 vocal cord carcinoma by mere voice and speech analysis is not possible, because the patient groups do not show significant differences in their voice quality.
ABSTRACT: When coming to terms with a diagnosis of laryngeal cancer, patients find different ways of coping with their illness. These may or may not be related to communication. Vocal aspects of quality of life are particularly important with cancer of the larynx. The correlation between coping and subjective assessment of the voice-related quality of life was assessed in a cross-sectional study of patients after resection of T1 and T2 laryngeal tumours. As part of follow-up care, 55 male cancer patients with partial laryngectomy were asked about their voice-related quality of life and their coping strategies. The Voice-Related Quality of Life Questionnaire (V-RQOL) and the Trier Coping Scales (TCS) were used as survey instruments. The voice-related quality of life of the patients was assessed on average as medium to good. The coping strategy most frequently chosen by patients was 'threat prevention', followed by 'search for social integration', 'rumination', 'search for information and experience exchange' and 'search for support in religion'. Correlations between coping strategy and the voice-related quality of life were weak to moderate and somewhat inconsistent in this patient population. There was no consistent or strong correlation between voice-related quality of life and coping strategies in male patients with partial laryngectomy, so that individual differences appeared to be more important in coping with illness than primarily voice-related factors such as the voice-related quality of life.
No preview · Article · Apr 2012 · Archives of Oto-Rhino-Laryngology
ABSTRACT: The tracheoesophageal (TE) substitute voice is currently the state-of-the-art treatment to restore the ability to speak after laryngectomy. Intelligibility while talking over a telephone is an important clinical factor, as it is a crucial part of the patients’ social life. An objective way to rate the intelligibility of substitute voices when talking over a telephone is desirable to improve post-laryngectomy speech therapy. An automatic speech recognition (ASR) system was applied to 41 high-quality recordings of post-laryngectomy patients. The ASR system was trained with normal, non-pathologic speech. It yielded a word accuracy (WA) of 36.9% ± 18.0%; compared to the intelligibility rating of a group of human experts, the ASR system had a correlation coefficient of -.88. After downsampling the 41 recordings to telephone quality, the ASR system reached a WA of 26.4% ± 13.9%, leading to a correlation coefficient of -.80. These results confirm that an ASR system can be used for objective intelligibility rating over the telephone.
Slovenian title: Samodejna evalvacija traheoezofagalnega telefonskega govora (Automatic evaluation of tracheoesophageal telephone speech).
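The abstract above reports word accuracy (WA) without defining it. As a rough illustration only, the sketch below uses the conventional ASR definition, WA = (N − S − D − I) / N, where N is the number of reference words and S, D, I are substitutions, deletions, and insertions. The function names and the whitespace tokenization are assumptions made for this example and are not taken from the paper.

```python
def edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn the reference word list into the hypothesis word list."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion of a reference word
                            curr[j - 1] + 1,       # insertion of an extra word
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference_text, recognized_text):
    """Word accuracy in percent; assumes simple whitespace tokenization."""
    reference = reference_text.lower().split()
    hypothesis = recognized_text.lower().split()
    if not reference:
        raise ValueError("reference text must contain at least one word")
    errors = edit_distance(reference, hypothesis)
    return 100.0 * (1.0 - errors / len(reference))


if __name__ == "__main__":
    print(word_accuracy("the north wind and the sun",
                        "the north wind and a sun"))
```

Because insertions count as errors, WA under this definition can become negative when the recognizer outputs many spurious words, which is one reason low values are plausible for highly pathologic speech.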
ABSTRACT: The standard for the analysis of distorted voices is perceptual rating of read-out texts or spontaneous speech. Automatic voice evaluation, however, is usually done on stable sections of sustained vowels. In this paper, text-based and established vowel-based analysis are compared with respect to their ability to measure hoarseness and its subclasses. 73 hoarse patients (48.3 ± 16.8 years) uttered the vowel /e/ and read the German version of the text “The North Wind and the Sun”. Five speech therapists and physicians rated roughness, breathiness, and hoarseness according to the German RBH evaluation scheme. The best human-machine correlations were obtained for measures based on the Cepstral Peak Prominence (CPP; up to |r|=0.73). Support Vector Regression (SVR) on CPP-based measures and prosodic features improved the results further to r ≈ 0.8 and confirmed that automatic voice evaluation should be performed on a text recording.
ABSTRACT: The assessment of the treatment results of laryngeal cancer includes subjective aspects. Two tools for assessing the quality of life of patients after treatment of small laryngeal carcinoma were compared: the disease-unspecific Short Form-36 Health Survey (SF-36) and the specific Voice-Related Quality of Life questionnaire (V-RQOL).
Data of 65 patients after partial laryngeal resection were evaluated during regular outpatient examinations.
The average V-RQOL total score was 70.0 ± 24.3. Similar results were obtained for the physical (68.2 ± 24.3) and the emotional (72.5 ± 27.6) subscores of the V-RQOL survey, all lower than the cut-off for healthy voices, which is 80 points. The SF-36 health survey score was 43.0 ± 10.7 for the physical subscore and 50.2 ± 9.1 for the emotional subscore. Both subscores were worse than the age-adjusted standard value for the SF-36. There is a moderate correlation between the two questionnaires, which does not depend on the size of the laryngeal carcinoma (T1 or T2).
Voice-related quality of life is one part of health-related quality of life, alongside other factors. An improvement in voice-related quality of life should thus lead to better general, health-related quality of life.
No preview · Article · Oct 2011 · Laryngo-Rhino-Otologie
ABSTRACT: Automatic voice evaluation is usually performed on stable sections of sustained vowels, which often cannot capture hoarseness properly. The measures cepstral peak prominence (CPP) and smoothed CPP (CPPS) do not require exact determination of the cycles of fundamental frequency like established perturbation-based measures. They can also be applied to text recordings. In this study, they were compared with perceptual evaluation of voice quality and the German roughness-breathiness-hoarseness (RBH) scheme.
Retrospective data analysis.
Seventy-three hoarse patients (48.3±16.8 years) uttered the vowel /e/ and read the German version of the text "The North Wind and the Sun". The text recordings were evaluated perceptually by five speech therapists and physicians according to the RBH scale. The criterion "overall quality" was measured on a 4-point scale and a visual analog scale. For the human-machine correlation, the automatic measures of the Praat program (vowels only) and the "cpps" software were compared with the experts' ratings. The experiments were repeated for speakers with jitter ≤5% or shimmer ≤5% (n=47).
For the entire group (n=73), the best human-machine results for most of the rating criteria were obtained for text-based CPP and CPPS (up to |ρ|=0.73). For the 47 selected speakers, the correlation was remarkably worse for all measures but still best for text-based CPP and CPPS (|ρ|≤0.50).
Cepstrum analysis should be performed on a text recording. Then, it outperforms all perturbation-based measures, and it can be a meaningful objective support for perceptual analysis.
Full-text · Article · Sep 2011 · Journal of voice: official journal of the Voice Foundation
ABSTRACT: Objective assessment of intelligibility on the telephone is desirable for voice and speech assessment and rehabilitation. A total of 82 patients after partial laryngectomy read a standardized text which was synchronously recorded by a headset and via telephone. Five experienced raters assessed intelligibility perceptually on a five-point scale. Objective evaluation was performed by support vector regression on the word accuracy (WA) and word correctness (WR) of a speech recognition system, and a set of prosodic features. WA and WR alone exhibited correlations to human evaluation between |r| = 0.57 and |r| = 0.75. The correlation was r = 0.79 for headset and r = 0.86 for telephone recordings when prosodic features and WR were combined. The best feature subset was optimal for both signal qualities. It consists of WR, the average duration of the silent pauses before a word, the standard deviation of the fundamental frequency on the entire sample, the standard deviation of jitter, and the ratio of the durations of the voiced sections and the entire recording.
Full-text · Article · Aug 2011 · Logopedics, phoniatrics, vocology
ABSTRACT: One aspect of voice and speech evaluation after laryngeal cancer is acoustic analysis. Perceptual evaluation by expert raters is a standard in the clinical environment for global criteria such as overall quality or intelligibility. So far, automatic approaches evaluate acoustic properties of pathologic voices based on voiced/unvoiced distinction and fundamental frequency analysis of sustained vowels. Because of the high amount of noisy components and the increasing aperiodicity of highly pathologic voices, a fully automatic analysis of fundamental frequency is difficult. We introduce a purely data-driven system for the acoustic analysis of pathologic voices based on recordings of a standard text.
Short-time segments of the speech signal are analyzed in the spectral domain, and speaker models based on this information are built. These speaker models act as a clustered representation of the acoustic properties of a person's voice and are thus characteristic for speakers with different kinds and degrees of pathologic conditions. The system is evaluated on two different data sets with speakers reading standardized texts. One data set contains 77 speakers after laryngeal cancer treated with partial removal of the larynx. The other data set contains 54 totally laryngectomized patients, equipped with a Provox shunt valve. Each speaker was rated by five expert listeners regarding three different criteria: strain, voice quality, and speech intelligibility.
We show correlations for each data set with r and ρ≥0.8 between the automatic system and the mean value of the five raters. The interrater correlation of one rater to the mean value of the remaining raters is in the same range. We thus assume that for selected evaluation criteria, the system can serve as a validated objective support for acoustic voice and speech analysis.
Full-text · Article · Aug 2011 · Journal of voice: official journal of the Voice Foundation
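The evaluation scheme described in the abstract above (each rater correlated with the mean of the remaining raters, and the automatic system correlated with the mean of all raters) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `ratings[r][s]` layout, and the toy scores in the usage section are hypothetical, and only Pearson's r is shown, not Spearman's ρ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rater_vs_rest(ratings):
    """For each rater, correlate that rater's scores with the mean score
    of all other raters. ratings[r][s] is rater r's score for speaker s."""
    results = []
    for r, own_scores in enumerate(ratings):
        others = [row for k, row in enumerate(ratings) if k != r]
        rest_mean = [sum(col) / len(col) for col in zip(*others)]
        results.append(pearson(own_scores, rest_mean))
    return results


def machine_vs_raters(machine_scores, ratings):
    """Correlate automatic scores with the mean score of all raters."""
    mean_rating = [sum(col) / len(col) for col in zip(*ratings)]
    return pearson(machine_scores, mean_rating)


if __name__ == "__main__":
    # Hypothetical toy data: 3 raters, 4 speakers.
    toy_ratings = [[1, 2, 3, 4], [1, 2, 3, 4], [2, 3, 4, 5]]
    print(rater_vs_rest(toy_ratings))
    print(machine_vs_raters([0.9, 0.7, 0.5, 0.3], toy_ratings))
```

Comparing the machine's correlation with the rater-versus-rest values is what allows a statement such as "the interrater correlation is in the same range".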
ABSTRACT: Laryngeal cancer can affect the patients' voice. For assessment of the patients' self-perception of their voice, several tools were introduced into clinical routine. The voice handicap index questionnaire (VHI) is regarded as the "gold standard". However, in benign laryngeal pathologies and in functional dysphonia, the shorter voice-related quality of life questionnaire (V-RQOL) proved to be equivalent. This study examines the correlation of both questionnaires in patients who had been treated for small (T1 and T2) laryngeal cancer. It was performed during regular outpatient examinations. In total, 65 patients aged 62.1 ± 10.0 years completed the German versions of the VHI and V-RQOL. Their average VHI total score was 38.9 ± 26.0 points and the average V-RQOL score was 70.1 ± 24.4%. The total scores correlated with |ρ| = 0.92 and p < 0.01. Both questionnaires give nearly identical results, so the shorter V-RQOL may be favoured for clinical application.
Preview · Article · Mar 2011 · Archives of Oto-Rhino-Laryngology
ABSTRACT: For voice rehabilitation, speech intelligibility is an important criterion. Automatic evaluation of intelligibility has been shown to be successful for automatic speech recognition methods combined with prosodic analysis. In this paper, this method is extended by using measures based on the Cepstral Peak Prominence (CPP). 73 hoarse patients (48.3 ± 16.8 years) uttered the vowel /e/ and read the German version of the text “The North Wind and the Sun”. Their intelligibility was evaluated perceptually by 5 speech therapists and physicians according to a 5-point scale. Support Vector Regression (SVR) revealed a feature set with a human-machine correlation of r = 0.85 consisting of the word accuracy, smoothed CPP computed from a speech section, and three prosodic features (normalized energy of word-pause-word intervals, F0 value at voice offset in a word, and standard deviation of jitter). The average human-human correlation was r = 0.82. Hence, the automatic method can be a meaningful objective support for perceptual analysis.
ABSTRACT: Treatment of small carcinoma of the larynx may lead to voice handicap and restricted quality of life. This study examines the relationship between the two. Sixty-five patients aged 62.1 ± 10.0 years rated their voice handicap and quality of life after treatment of T1 (n = 35) or T2 (n = 30) laryngeal carcinoma during regular out-patient examinations. For the self-assessment, the Voice Handicap Index (VHI) and the disease-independent Short Form-36 Health Survey (SF-36) questionnaires were used. Voice handicap (total score 38.9 ± 26.0) did not differ in the two tested groups, T1 and T2, and the data of SF-36 (physical score 43.0 ± 10.7; mental score 50.2 ± 9.1) showed significant differences for the mental score. After treatment of laryngeal carcinoma, patients rated their voice handicap worse than healthy persons did. VHI and SF-36 data were strongly correlated. Voice handicap is significantly related to the quality of life, especially affecting the mental domain. Thus, the rehabilitation of voice disorders should have a beneficial impact on quality of life.
Preview · Article · Oct 2010 · Folia Phoniatrica et Logopaedica
ABSTRACT: Dental rehabilitation by complete dentures is a state-of-the-art approach to improve functional aspects of the oral cavity of edentulous patients. It is important to ensure that these dentures fit sufficiently. We introduce a dataset of 13 edentulous patients who were recorded with and without complete dentures in situ. The dentures of these patients had been rated as having an insufficient fit, so additional (sufficient) dentures and additional speech recordings were prepared. In this paper we show that sufficient dentures increase the performance of an ASR system by ca. 27%. Based on these results, we present and discuss three different systems that automatically determine whether the dentures of an edentulous person have a sufficient fit or not. The system with the best performance models the recordings by GMMs and uses the mean vectors of these GMMs as features in an SVM. With this system we were able to achieve a recognition rate of 80%.
ABSTRACT: We present a novel system for the automatic evaluation of speech and voice disorders. The system can be accessed via the internet platform-independently. The patient reads a text or names pictures. His or her speech is then analyzed by automatic speech recognition and prosodic analysis. For patients who had their larynx removed due to cancer and for children with cleft lip and palate we show that we can achieve significant correlations between the automatic analysis and the judgment of human experts in a leave-one-out experiment (p < .001). A correlation of .90 for the evaluation of the laryngectomees and .87 for the evaluation of the children’s data was obtained. This is comparable to human inter-rater correlations.
Full-text · Article · May 2009 · Speech Communication
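The leave-one-out experiment mentioned in the abstract above can be illustrated with a small sketch: each speaker is held out once, a model is fitted on the remaining speakers, and the held-out prediction is later correlated with the expert rating. The abstract does not say which regression model was used, so the single-feature least-squares line here is only a stand-in, and all function names and the toy numbers are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit y ≈ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(feature, rating):
    """Predict each speaker's rating from a model trained on all others."""
    predictions = []
    for i in range(len(feature)):
        train_x = feature[:i] + feature[i + 1:]
        train_y = rating[:i] + rating[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * feature[i] + intercept)
    return predictions


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical data: one automatic feature (e.g. word accuracy in %)
    # and the mean expert rating per speaker (lower = better).
    feature = [62.0, 48.5, 35.0, 51.0, 20.5, 41.0]
    rating = [1.4, 2.2, 3.6, 2.4, 4.4, 3.0]
    predicted = leave_one_out_predictions(feature, rating)
    print(pearson(predicted, rating))
```

Holding out the test speaker during fitting keeps the reported correlation from being inflated by training and testing on the same recording.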
ABSTRACT: The Hoarseness Diagram, a program for voice quality analysis used in German-speaking countries, was compared with an automatic speech recognition system with a module for prosodic analysis. The latter computed prosodic features on the basis of a text recording. We examined whether voice analysis of sustained vowels and text analysis correlate in tracheoesophageal speakers.
Test speakers were 24 male laryngectomees with tracheoesophageal substitute speech, aged 60.6 ± 8.9 years. Each person read the German version of the text 'The North Wind and the Sun'. Additionally, five sustained vowels were recorded from each patient. The fundamental frequency (F0) detected by both programs was compared for all vowels. The correlation between the measures obtained by the Hoarseness Diagram and the features from the prosody module was computed.
Both programs have problems in determining the F0 of highly pathologic voices. Parameters like jitter, shimmer, F0, and irregularity as computed by the Hoarseness Diagram from vowels show correlations of about -0.8 with prosodic features obtained from the text recordings.
Voice properties can reliably be evaluated both on the basis of vowel and text recordings. Text analysis, however, also offers possibilities for the automatic evaluation of running speech since it realistically represents everyday speech.
No preview · Article · Apr 2009 · Folia Phoniatrica et Logopaedica
ABSTRACT: Tracheoesophageal voice is state-of-the-art in voice rehabilitation after laryngectomy. Intelligibility on a telephone is an important evaluation criterion as it is a crucial part of social life. An objective measure of intelligibility when talking on a telephone is desirable in the field of postlaryngectomy speech therapy and its evaluation.
Based upon successful earlier studies with broadband speech, an automatic speech recognition (ASR) system was applied to 41 recordings of postlaryngectomy patients. Recordings were available in different signal qualities; quality was the crucial criterion for this study.
Compared to the intelligibility rating of 5 human experts, the ASR system had a correlation coefficient of r = -0.87 and Krippendorff's alpha of 0.65 when broadband speech was processed. The rater group alone achieved alpha = 0.66. With the test recordings in telephone quality, the system reached r = -0.79 and alpha = 0.67.
For medical purposes, a comprehensive diagnostic approach to (substitute) voice has to cover both subjective and objective tests. An automatic recognition system such as the one proposed in this study can be used for objective intelligibility rating with results comparable to those of human experts. This holds for broadband speech as well as for automatic evaluation via telephone.
No preview · Article · Apr 2009 · Folia Phoniatrica et Logopaedica