Tino Haderlein

Universitätsklinikum Erlangen, Erlangen, Bavaria, Germany

Publications (54) · 19.65 Total impact

  • ABSTRACT: Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text ‘Der Nordwind und die Sonne’ (‘The North Wind and the Sun’). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion ‘match of breath and sense units’ and r = 0.87 for the overall voice quality. Human–machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
    Logopedics, phoniatrics, vocology 05/2015; DOI:10.3109/14015439.2015.1019563 · 0.82 Impact Factor
  • ABSTRACT: Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined in order to model perceptual evaluation of the German Roughness-Breathiness-Hoarseness (RBH) scheme. 58 connected speech samples (43 women and 15 men; years) containing the German version of the text “The North Wind and the Sun” were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression with measurements of the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx) of the Laryngograph and 33 features from a prosodic analysis module were used to model the listeners’ ratings. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (, ). These correlations were approximately the same as the interrater agreement among human raters (, ). CQx was one of the substantial features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis.
    Computational and Mathematical Methods in Medicine 01/2015; 2015:1-11. DOI:10.1155/2015/316325 · 1.02 Impact Factor
  • ABSTRACT: Parkinson’s disease (PD) is a chronic neurodegenerative disorder of the central nervous system, and it can affect the communication skills of patients. The research community is interested in developing computer-aided tools that analyze the speech of people with PD for detection and monitoring. In this paper, three new acoustic measures for the simultaneous analysis of the phonation and articulation of patients with PD are presented. These new measures, along with other classical articulation and perturbation features, are objectively evaluated with a discriminant criterion. According to the results, the speech of people with PD can be detected with an accuracy of 81% when phonation and articulation features are combined.
    Text, Speech, and Dialogue, Brno, Czech Republic; 09/2014
  • ABSTRACT: Automatic intelligibility assessment using automatic speech recognition is usually language specific. In this study, a language-independent approach is proposed. It uses models that are trained with Flemish speech, and it is applied to assess chronically hoarse German speakers. The research questions are: is it possible to construct suitable acoustic features that generalize to other languages and a speech disorder, and is the generated model for intelligibility also suitable for specific subtypes of that disorder, i.e. functional and organic dysphonia? 73 German-speaking persons with chronic hoarseness read the text 'Der Nordwind und die Sonne'. Perceptual intelligibility scores were used as ground truth during the training of an automatic model that converts speaker-level acoustic measurements into intelligibility scores. Cross-validation is used to assess model performance. The interrater agreement for all patients (n = 73) and for the functional and organic dysphonia subgroups (n = 45 and n = 24) is r = 0.82, r = 0.83 and r = 0.75, respectively. The automatic assessment based on phonologically based acoustic models revealed correlations between perceptual and automatic intelligibility ratings of r = 0.79 (all patients), r = 0.78 (functional dysphonia) and r = 0.80 (organic dysphonia). The automatic, objective measurement of intelligibility is a valuable instrument in an evidence-based clinical practice. © 2015 S. Karger AG, Basel.
  • ABSTRACT: BACKGROUND: Patients with chronic laryngitis and patients with T1 vocal fold carcinoma were compared by means of a subjective and an objective, text-based voice and speech analysis in order to determine which patient group is more strongly restricted in its vocal communication ability and whether the two groups can be differentiated. PATIENTS AND METHODS: Thirteen patients each with histologically confirmed chronic laryngitis or T1 vocal fold carcinoma were evaluated and compared subjectively by five experts on the basis of seven clinically relevant criteria and objectively by means of automatic speech recognition and prosodic analysis. RESULTS: Neither the data of the five experts nor the results of the automatic evaluation show significant differences between the two patient groups in the voice and speech analysis. CONCLUSION: Chronic laryngitis and T1 vocal fold carcinoma cannot be distinguished by subjective and objective voice and speech analysis, because the patient groups show no significant difference in voice quality.
    HNO 08/2013; 61(8). DOI:10.1007/s00106-013-2718-z · 0.54 Impact Factor
  • B Bartke · T Haderlein · M Döllinger · E Nöth · S Graf · U Eysholdt · A Ziethe
    ABSTRACT: BACKGROUND: Patients with chronic laryngitis and T1 vocal cord cancer were compared using perceptual and text-based objective voice and speech analyses in order to determine which group is more affected in its ability to communicate and whether a distinction between the two pathologies is possible. PATIENTS AND METHODS: In all, 13 patients with histologically proven chronic laryngitis and 13 patients with T1 vocal cord cancer were compared perceptually by five speech therapists on the basis of seven criteria and objectively by a speech recognition system and prosodic analysis. RESULTS: Both the data of the five speech therapists and the results of the automatic analysis revealed no significant differences between the two patient groups. CONCLUSION: A distinction between chronic laryngitis and T1 vocal cord carcinoma by mere voice and speech analysis is not possible, because the patient groups do not show significant differences in their voice quality.
    HNO 06/2013; · 0.54 Impact Factor
  • ABSTRACT: This paper describes the combination of the work of speech therapists and speech recognition systems. Our long-term goal is to evaluate the degree of stuttering during therapy and to use the automatic analysis of stuttered speech as a screening method, e.g. the search for potential stutterers at an early age. The approach is to have a patient read a standard text aloud and then automatically count the unfluent parts and classify them. The text to be read by the patients is automatically transformed into a formal grammar that considers potential dysfluencies caused by stuttering. Recordings from
  • E Seiferlein · T Haderlein · M Schuster · E Gräßel · C Bohr
    ABSTRACT: When coming to terms with a diagnosis of laryngeal cancer, patients find different ways of coping with their illness. These may or may not be related to communication. Vocal aspects of quality of life are particularly important with cancer of the larynx. The correlation between coping and subjective assessment of the voice-related quality of life was assessed in a cross-sectional study of patients after resection of T1 and T2 laryngeal tumours. As part of follow-up care, 55 male cancer patients with partial laryngectomy were asked about their voice-related quality of life and their coping strategies. The Voice-Related Quality of Life Questionnaire (V-RQOL) and the Trier Coping Scales (TCS) were used as survey instruments. The voice-related quality of life of the patients was assessed on average as medium to good. The coping strategy most frequently chosen by patients was 'threat prevention', followed by 'search for social integration', 'rumination', 'search for information and experience exchange' and 'search for support in religion'. Correlations between coping strategy and the voice-related quality of life were weak to moderate and somewhat inconsistent in this patient population. There was no consistent or strong correlation between voice-related quality of life and coping strategies in male patients with partial laryngectomy, so that individual differences appeared to be more important in coping with illness than primarily voice-related factors such as the voice-related quality of life.
    Archives of Oto-Rhino-Laryngology 04/2012; 269(9):2091-6. DOI:10.1007/s00405-012-2020-9 · 1.61 Impact Factor
  • ABSTRACT: The tracheoesophageal (TE) substitute voice is currently the state-of-the-art treatment to restore the ability to speak after laryngectomy. Intelligibility while talking over a telephone is an important clinical factor, as it is a crucial part of the patients' social life. An objective way to rate the intelligibility of substitute voices when talking over a telephone is desirable to improve post-laryngectomy speech therapy. An automatic speech recognition (ASR) system was applied to 41 high-quality recordings of post-laryngectomy patients. The ASR system was trained with normal, non-pathologic speech. It yielded a word accuracy (WA) of 36.9% ± 18.0%; compared to the intelligibility rating of a group of human experts, the ASR system had a correlation coefficient of -.88. After downsampling the 41 recordings to telephone quality, the ASR system reached a WA of 26.4% ± 13.9%, leading to a correlation coefficient of -.80. These results confirm that an ASR system can be used for objective intelligibility rating over the telephone.
  • Tino Haderlein · Cornelia Moers · Bernd Möbius · Elmar Nöth
    ABSTRACT: The standard for the analysis of distorted voices is perceptual rating of read-out texts or spontaneous speech. Automatic voice evaluation, however, is usually done on stable sections of sustained vowels. In this paper, text-based and established vowel-based analysis are compared with respect to their ability to measure hoarseness and its subclasses. 73 hoarse patients (48.3 ± 16.8 years) uttered the vowel /e/ and read the German version of the text “The North Wind and the Sun”. Five speech therapists and physicians rated roughness, breathiness, and hoarseness according to the German RBH evaluation scheme. The best human-machine correlations were obtained for measures based on the Cepstral Peak Prominence (CPP; up to |r|=0.73). Support Vector Regression (SVR) on CPP-based measures and prosodic features improved the results further to r ≈ 0.8 and confirmed that automatic voice evaluation should be performed on a text recording.
  • M Lindl · T Haderlein · E Grässel · A Maier · A Ströbele · C Bohr · M Schuster
    ABSTRACT: The assessment of the treatment results of laryngeal cancer includes subjective aspects. Two tools for assessing the quality of life of patients after treatment of small laryngeal carcinoma were compared: the generic Short Form-36 Health Survey (SF-36) and the voice-specific Voice-Related Quality of Life questionnaire (V-RQOL). Data from 65 patients after partial laryngeal resection were evaluated during regular outpatient examinations. The average V-RQOL total score was 70.0 ± 24.3. Similar results were obtained for the physical (68.2 ± 24.3) and emotional (72.5 ± 27.6) subscores of the V-RQOL; these scores were below the cut-off for healthy voices, which is 80 points. The SF-36 score was 43.0 ± 10.7 for the physical subscore and 50.2 ± 9.1 for the emotional subscore. Both subscores were rated worse than the age-adjusted standard value for the SF-36. There is a moderate correlation between the two questionnaires, which does not depend on the size of the laryngeal carcinoma (T1 or T2). The voice-related quality of life is one part of the health-related quality of life, alongside other factors. An improvement in voice-related quality of life should thus lead to better general, health-related quality of life.
    Laryngo-Rhino-Otologie 10/2011; 91(8):494-9. DOI:10.1055/s-0031-1279734 · 0.99 Impact Factor
  • ABSTRACT: Automatic voice evaluation is usually performed on stable sections of sustained vowels, which often cannot capture hoarseness properly. The measures cepstral peak prominence (CPP) and smoothed CPP (CPPS) do not require exact determination of the cycles of fundamental frequency like established perturbation-based measures. They can also be applied to text recordings. In this study, they were compared with perceptual evaluation of voice quality and the German roughness-breathiness-hoarseness (RBH) scheme. Retrospective data analysis. Seventy-three hoarse patients (48.3±16.8 years) uttered the vowel /e/ and read the German version of the text "The North Wind and the Sun". The text recordings were evaluated perceptually by five speech therapists and physicians according to the RBH scale. The criterion "overall quality" was measured on a 4-point scale and a visual analog scale. For the human-machine correlation, the automatic measures of the Praat program (vowels only) and the "cpps" software were compared with the experts' ratings. The experiments were repeated for speakers with jitter ≤5% or shimmer ≤5% (n=47). For the entire group (n=73), the best human-machine results for most of the rating criteria were obtained for text-based CPP and CPPS (up to |ρ|=0.73). For the 47 selected speakers, the correlation was remarkably worse for all measures but still best for text-based CPP and CPPS (|ρ|≤0.50). Cepstrum analysis should be performed on a text recording. Then, it outperforms all perturbation-based measures, and it can be a meaningful objective support for perceptual analysis.
    Journal of voice: official journal of the Voice Foundation 09/2011; 26(4):416-24. DOI:10.1016/j.jvoice.2011.05.001 · 0.94 Impact Factor
  • ABSTRACT: Objective assessment of intelligibility on the telephone is desirable for voice and speech assessment and rehabilitation. A total of 82 patients after partial laryngectomy read a standardized text which was synchronously recorded by a headset and via telephone. Five experienced raters assessed intelligibility perceptually on a five-point scale. Objective evaluation was performed by support vector regression on the word accuracy (WA) and word correctness (WR) of a speech recognition system, and a set of prosodic features. WA and WR alone exhibited correlations to human evaluation between |r| = 0.57 and |r| = 0.75. The correlation was r = 0.79 for headset and r = 0.86 for telephone recordings when prosodic features and WR were combined. The best feature subset was optimal for both signal qualities. It consists of WR, the average duration of the silent pauses before a word, the standard deviation of the fundamental frequency on the entire sample, the standard deviation of jitter, and the ratio of the durations of the voiced sections and the entire recording.
    Logopedics, phoniatrics, vocology 08/2011; 36(4):175-81. DOI:10.3109/14015439.2011.607470 · 0.82 Impact Factor
  • ABSTRACT: One aspect of voice and speech evaluation after laryngeal cancer is acoustic analysis. Perceptual evaluation by expert raters is a standard in the clinical environment for global criteria such as overall quality or intelligibility. So far, automatic approaches evaluate acoustic properties of pathologic voices based on voiced/unvoiced distinction and fundamental frequency analysis of sustained vowels. Because of the high amount of noisy components and the increasing aperiodicity of highly pathologic voices, a fully automatic analysis of fundamental frequency is difficult. We introduce a purely data-driven system for the acoustic analysis of pathologic voices based on recordings of a standard text. Short-time segments of the speech signal are analyzed in the spectral domain, and speaker models based on this information are built. These speaker models act as a clustered representation of the acoustic properties of a person's voice and are thus characteristic for speakers with different kinds and degrees of pathologic conditions. The system is evaluated on two different data sets with speakers reading standardized texts. One data set contains 77 speakers after laryngeal cancer treated with partial removal of the larynx. The other data set contains 54 totally laryngectomized patients, equipped with a Provox shunt valve. Each speaker was rated by five expert listeners regarding three different criteria: strain, voice quality, and speech intelligibility. We show correlations for each data set with r and ρ≥0.8 between the automatic system and the mean value of the five raters. The interrater correlation of one rater to the mean value of the remaining raters is in the same range. We thus assume that for selected evaluation criteria, the system can serve as a validated objective support for acoustic voice and speech analysis.
    Journal of voice: official journal of the Voice Foundation 08/2011; 26(3):390-7. DOI:10.1016/j.jvoice.2011.04.010 · 0.94 Impact Factor
  • ABSTRACT: In comparison with laryngeal voice, substitute voice after laryngectomy is characterized by restricted aero-acoustic properties. Until now, no objective means of describing prosodic differences between substitute and normal voices has existed. In a pilot study, we applied an automatic prosody analysis module to 18 speech samples of laryngectomees (age: 64.2 ± 8.3 years) and 18 recordings of normal speakers of the same age (65.4 ± 7.6 years). Ninety-five different features per word based upon the speech energy, fundamental frequency F0 and duration measures on words, pauses and voiced/voiceless sections were measured. These reflect aspects of loudness, pitch and articulation rate. Subjective evaluation of the 18 patients’ voices was performed by a panel of five experts on the criteria “noise”, “speech effort”, “roughness”, “intelligibility”, “match of breath and sense units” and “overall quality”. These ratings were compared to the automatically computed features. Several of the features were found to be twice as high for the laryngectomees compared to the normal speakers, and vice versa. Comparing the evaluation data of the human experts and the automatic rating, correlation coefficients of up to 0.84 were measured. The automatic analysis serves as a good means to objectify and quantify the global speech outcome of laryngectomees. Even better results are expected once both the computation of the features and the method of comparison with the human ratings have been revised and adapted to the special properties of the substitute voices.
  • ABSTRACT: Laryngeal cancer can affect the patients' voice. For assessment of the patients' self-perception of their voice, several tools were introduced into clinical routine. The voice handicap index questionnaire (VHI) is regarded as the "gold standard". However, in benign laryngeal pathologies and in functional dysphonia, the shorter voice-related quality of life questionnaire (V-RQOL) proved to be equivalent. This study examines the correlation of both questionnaires in patients who had been treated for small (T1 and T2) laryngeal cancer. It was performed during regular outpatient examinations. In total, 65 patients aged 62.1 ± 10.0 years completed the German versions of the VHI and V-RQOL. Their average VHI total score was 38.9 ± 26.0 points and the average V-RQOL score was 70.1 ± 24.4%. The total scores correlated with |ρ| = 0.92 and p < 0.01. Both questionnaires give nearly identical results, so the shorter V-RQOL may be favoured for clinical application.
    Archives of Oto-Rhino-Laryngology 03/2011; 268(3):401-4. DOI:10.1007/s00405-010-1374-0 · 1.61 Impact Factor
  • ABSTRACT: For voice rehabilitation, speech intelligibility is an important criterion. Automatic evaluation of intelligibility has been shown to be successful for automatic speech recognition methods combined with prosodic analysis. In this paper, this method is extended by using measures based on the Cepstral Peak Prominence (CPP). 73 hoarse patients (48.3 ± 16.8 years) uttered the vowel /e/ and read the German version of the text “The North Wind and the Sun”. Their intelligibility was evaluated perceptually by 5 speech therapists and physicians according to a 5-point scale. Support Vector Regression (SVR) revealed a feature set with a human-machine correlation of r = 0.85 consisting of the word accuracy, smoothed CPP computed from a speech section, and three prosodic features (normalized energy of word-pause-word intervals, F0 value at voice offset in a word, and standard deviation of jitter). The average human-human correlation was r = 0.82. Hence, the automatic method can be a meaningful objective support for perceptual analysis.
    Text, Speech and Dialogue - 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 1-5, 2011. Proceedings; 01/2011
  • ABSTRACT: Treatment of small carcinoma of the larynx may lead to voice handicap and restricted quality of life. This study examines the relationship between the two. Sixty-five patients aged 62.1 ± 10.0 years rated their voice handicap and quality of life after treatment of T1 (n = 35) or T2 (n = 30) laryngeal carcinoma during regular out-patient examinations. For the self-assessment of the voice, the Voice Handicap Index (VHI) and the disease-independent Short Form-36 Health Survey (SF-36) questionnaires were used. Voice handicap (total score 38.9 ± 26.0) did not differ in the two tested groups, T1 and T2, and the data of SF-36 (physical score 43.0 ± 10.7; mental score 50.2 ± 9.1) showed significant differences for the mental score. After treatment of laryngeal carcinoma, patients rated their voice handicap worse than healthy persons did. VHI and SF-36 data were strongly correlated. Voice handicap is significantly related to the quality of life, especially affecting the mental domain. Thus, the rehabilitation of voice disorders should have a beneficial impact on quality of life.
    Folia Phoniatrica et Logopaedica 10/2010; 63(3):122-8. DOI:10.1159/000316416 · 0.55 Impact Factor
  • ABSTRACT: Dental rehabilitation by complete dentures is a state-of-the-art approach to improve functional aspects of the oral cavity of edentulous patients. It is important to ensure that these dentures have a sufficient fit. We introduce a dataset of 13 edentulous patients who were recorded with and without complete dentures in situ. The fit of these patients' dentures was rated insufficient, so additional (sufficient) dentures were made and additional speech recordings were collected. In this paper we show that sufficient dentures increase the performance of an ASR system by ca. 27%. Based on these results, we present and discuss three different systems that automatically determine whether the dentures of an edentulous person have a sufficient fit or not. The system with the best performance models the recordings by GMMs and uses the mean vectors of these GMMs as features in an SVM. With this system we were able to achieve a recognition rate of 80%.
    Text, Speech and Dialogue, 13th International Conference, TSD 2010, Brno, Czech Republic, September 6-10, 2010. Proceedings; 01/2010
  • ABSTRACT: We present a novel system for the automatic evaluation of speech and voice disorders. The system can be accessed via the internet platform-independently. The patient reads a text or names pictures. His or her speech is then analyzed by automatic speech recognition and prosodic analysis. For patients who had their larynx removed due to cancer and for children with cleft lip and palate we show that we can achieve significant correlations between the automatic analysis and the judgment of human experts in a leave-one-out experiment (p < .001). A correlation of .90 for the evaluation of the laryngectomees and .87 for the evaluation of the children’s data was obtained. This is comparable to human inter-rater correlations.
    Speech Communication 05/2009; 51(5-51):425-437. DOI:10.1016/j.specom.2009.01.004 · 1.55 Impact Factor
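
Several of the abstracts above compare a human-machine correlation with an interrater correlation, the latter described in one abstract as the correlation of one rater to the mean value of the remaining raters. The following is only a minimal sketch of how such figures could be computed, not the authors' implementation. The function names, the averaging of the per-rater values into one number, and the example ratings are all hypothetical and chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def interrater_correlation(ratings):
    """Correlate each rater with the mean of the remaining raters.

    ratings[r][s] is the score rater r gave to speaker s.
    Averaging the per-rater correlations is an assumption of this sketch.
    """
    n_raters = len(ratings)
    n_speakers = len(ratings[0])
    per_rater = []
    for r in range(n_raters):
        mean_of_others = [
            sum(ratings[k][s] for k in range(n_raters) if k != r) / (n_raters - 1)
            for s in range(n_speakers)
        ]
        per_rater.append(pearson(ratings[r], mean_of_others))
    return sum(per_rater) / n_raters


def human_machine_correlation(ratings, machine_scores):
    """Correlate the mean rating per speaker with an automatic score."""
    mean_rating = [sum(column) / len(column) for column in zip(*ratings)]
    return pearson(mean_rating, machine_scores)


# Made-up example: three raters, five speakers, 5-point scale.
example_ratings = [
    [1, 2, 3, 4, 5],
    [1, 3, 3, 4, 5],
    [2, 2, 3, 5, 5],
]
# Made-up automatic scores, e.g. the output of a regression model.
example_machine = [1.2, 2.4, 2.9, 4.3, 4.8]

print(interrater_correlation(example_ratings))
print(human_machine_correlation(example_ratings, example_machine))
```

When the automatic measure falls as the perceptual score rises, as with word accuracy against an intelligibility scale on which higher values mean worse speech, the coefficient is negative; the abstracts above then report either the signed value or its absolute value |r|.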