Tino Haderlein

University of Bonn, Bonn, North Rhine-Westphalia, Germany

Publications (47) · 15.53 Total impact

  • ABSTRACT: BACKGROUND: Patients with chronic laryngitis and T1 vocal cord cancer were compared using perceptual and text-based objective voice and speech analyses in order to determine which group is more affected in its ability to communicate and whether a distinction between the two pathologies is possible. PATIENTS AND METHODS: In all, 13 patients with histologically proven chronic laryngitis and 13 patients with T1 vocal cord cancer were compared perceptually by five speech therapists on the basis of seven criteria and objectively by a speech recognition system and prosodic analysis. RESULTS: Both the data of the five speech therapists and the results of the automatic analysis revealed no significant differences between the two patient groups. CONCLUSION: A distinction between chronic laryngitis and T1 vocal cord carcinoma by mere voice and speech analysis is not possible, because the patient groups do not show significant differences in their voice quality.
    HNO 06/2013; · 0.42 Impact Factor
  • ABSTRACT: When coming to terms with a diagnosis of laryngeal cancer, patients find different ways of coping with their illness. These may or may not be related to communication. Vocal aspects of quality of life are particularly important with cancer of the larynx. The correlation between coping and subjective assessment of the voice-related quality of life was assessed in a cross-sectional study of patients after resection of T1 and T2 laryngeal tumours. As part of follow-up care, 55 male cancer patients with partial laryngectomy were asked about their voice-related quality of life and their coping strategies. The Voice-Related Quality of Life Questionnaire (V-RQOL) and the Trier Coping Scales (TCS) were used as survey instruments. The voice-related quality of life of the patients was assessed on average as medium to good. The coping strategy most frequently chosen by patients was 'threat prevention', followed by 'search for social integration', 'rumination', 'search for information and experience exchange' and 'search for support in religion'. Correlations between coping strategy and the voice-related quality of life were weak to moderate and somewhat inconsistent in this patient population. There was no consistent or strong correlation between voice-related quality of life and coping strategies in male patients with partial laryngectomy, so that individual differences appeared to be more important in coping with illness than primarily voice-related factors such as the voice-related quality of life.
    Archives of Oto-Rhino-Laryngology 04/2012; 269(9):2091-6. · 1.29 Impact Factor
  • ABSTRACT: The assessment of the treatment results of laryngeal cancer includes subjective aspects. Two tools for assessing the quality of life of patients after treatment of small laryngeal carcinoma were compared: the disease-unspecific Short Form-36 Health Survey (SF-36) and the voice-specific Voice-Related Quality of Life questionnaire (V-RQOL). Data of 65 patients after partial laryngeal resection were evaluated during regular outpatient examinations. The average V-RQOL total score was 70.0 ± 24.3. Similar results were obtained for the physical (68.2 ± 24.3) and the emotional (72.5 ± 27.6) subscores of the V-RQOL; all were lower than the cut-off for healthy voices, which is 80 points. The SF-36 score was 43.0 ± 10.7 for the physical subscore and 50.2 ± 9.1 for the emotional subscore. Both subscores were rated worse than the age-adjusted standard value for the SF-36. There is a moderate correlation between the two questionnaires, which does not depend on the size of the laryngeal carcinoma (T1 or T2). The voice-related quality of life is one component of the health-related quality of life, next to other factors. An improvement in voice-related quality of life should thus lead to better general, health-related quality of life.
    Laryngo-Rhino-Otologie 10/2011; 91(8):494-9. · 0.82 Impact Factor
  • ABSTRACT: Automatic voice evaluation is usually performed on stable sections of sustained vowels, which often cannot capture hoarseness properly. The measures cepstral peak prominence (CPP) and smoothed CPP (CPPS) do not require exact determination of the cycles of fundamental frequency like established perturbation-based measures. They can also be applied to text recordings. In this study, they were compared with perceptual evaluation of voice quality and the German roughness-breathiness-hoarseness (RBH) scheme. Retrospective data analysis. Seventy-three hoarse patients (48.3±16.8 years) uttered the vowel /e/ and read the German version of the text "The North Wind and the Sun". The text recordings were evaluated perceptually by five speech therapists and physicians according to the RBH scale. The criterion "overall quality" was measured on a 4-point scale and a visual analog scale. For the human-machine correlation, the automatic measures of the Praat program (vowels only) and the "cpps" software were compared with the experts' ratings. The experiments were repeated for speakers with jitter ≤5% or shimmer ≤5% (n=47). For the entire group (n=73), the best human-machine results for most of the rating criteria were obtained for text-based CPP and CPPS (up to |ρ|=0.73). For the 47 selected speakers, the correlation was remarkably worse for all measures but still best for text-based CPP and CPPS (|ρ|≤0.50). Cepstrum analysis should be performed on a text recording. Then, it outperforms all perturbation-based measures, and it can be a meaningful objective support for perceptual analysis.
    Journal of voice: official journal of the Voice Foundation 09/2011; 26(4):416-24. · 0.95 Impact Factor
  • ABSTRACT: Objective assessment of intelligibility on the telephone is desirable for voice and speech assessment and rehabilitation. A total of 82 patients after partial laryngectomy read a standardized text which was synchronously recorded by a headset and via telephone. Five experienced raters assessed intelligibility perceptually on a five-point scale. Objective evaluation was performed by support vector regression on the word accuracy (WA) and word correctness (WR) of a speech recognition system, and a set of prosodic features. WA and WR alone exhibited correlations with human evaluation between |r| = 0.57 and |r| = 0.75. The correlation was r = 0.79 for headset and r = 0.86 for telephone recordings when prosodic features and WR were combined. The best feature subset was optimal for both signal qualities. It consists of WR, the average duration of the silent pauses before a word, the standard deviation of the fundamental frequency on the entire sample, the standard deviation of jitter, and the ratio of the durations of the voiced sections and the entire recording.
    Logopedics, phoniatrics, vocology 08/2011; 36(4):175-81.
  • ABSTRACT: One aspect of voice and speech evaluation after laryngeal cancer is acoustic analysis. Perceptual evaluation by expert raters is a standard in the clinical environment for global criteria such as overall quality or intelligibility. So far, automatic approaches evaluate acoustic properties of pathologic voices based on voiced/unvoiced distinction and fundamental frequency analysis of sustained vowels. Because of the high amount of noisy components and the increasing aperiodicity of highly pathologic voices, a fully automatic analysis of fundamental frequency is difficult. We introduce a purely data-driven system for the acoustic analysis of pathologic voices based on recordings of a standard text. Short-time segments of the speech signal are analyzed in the spectral domain, and speaker models based on this information are built. These speaker models act as a clustered representation of the acoustic properties of a person's voice and are thus characteristic for speakers with different kinds and degrees of pathologic conditions. The system is evaluated on two different data sets with speakers reading standardized texts. One data set contains 77 speakers after laryngeal cancer treated with partial removal of the larynx. The other data set contains 54 totally laryngectomized patients, equipped with a Provox shunt valve. Each speaker was rated by five expert listeners regarding three different criteria: strain, voice quality, and speech intelligibility. We show correlations for each data set with r and ρ≥0.8 between the automatic system and the mean value of the five raters. The interrater correlation of one rater to the mean value of the remaining raters is in the same range. We thus assume that for selected evaluation criteria, the system can serve as a validated objective support for acoustic voice and speech analysis.
    Journal of voice: official journal of the Voice Foundation 08/2011; 26(3):390-7. · 0.95 Impact Factor
  • ABSTRACT: Laryngeal cancer can affect the patients' voice. For assessment of the patients' self-perception of their voice, several tools were introduced into clinical routine. The voice handicap index questionnaire (VHI) is regarded as the "gold standard". However, in benign laryngeal pathologies and in functional dysphonia, the shorter voice-related quality of life questionnaire (V-RQOL) proved to be equivalent. This study examines the correlation of both questionnaires in patients who had been treated for small (T1 and T2) laryngeal cancer. It was performed during regular outpatient examinations. In total, 65 patients aged 62.1 ± 10.0 years completed the German versions of the VHI and V-RQOL. Their average VHI total score was 38.9 ± 26.0 points and the average V-RQOL score was 70.1 ± 24.4%. The total scores correlated with |ρ| = 0.92 and p < 0.01. Both questionnaires give nearly identical results; the shorter V-RQOL may be favoured for clinical application.
    Archives of Oto-Rhino-Laryngology 03/2011; 268(3):401-4. · 1.29 Impact Factor
  • Text, Speech and Dialogue - 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 1-5, 2011. Proceedings; 01/2011
  • ABSTRACT: Treatment of small carcinoma of the larynx may lead to voice handicap and restricted quality of life. This study examines the relationship between the two. Sixty-five patients aged 62.1 ± 10.0 years rated their voice handicap and quality of life after treatment of T1 (n = 35) or T2 (n = 30) laryngeal carcinoma during regular out-patient examinations. For the self-assessment of the voice, the Voice Handicap Index (VHI) and the disease-independent Short Form-36 Health Survey (SF-36) questionnaires were used. Voice handicap (total score 38.9 ± 26.0) did not differ in the two tested groups, T1 and T2, and the data of SF-36 (physical score 43.0 ± 10.7; mental score 50.2 ± 9.1) showed significant differences for the mental score. After treatment of laryngeal carcinoma, patients rated their voice handicap worse than healthy persons did. VHI and SF-36 data were strongly correlated. Voice handicap is significantly related to the quality of life, especially affecting the mental domain. Thus, the rehabilitation of voice disorders should have a beneficial impact on quality of life.
    Folia Phoniatrica et Logopaedica 10/2010; 63(3):122-8. · 1.08 Impact Factor
  • EURASIP J. Audio, Speech and Music Processing. 01/2010; 2010.
  • ABSTRACT: Dental rehabilitation by complete dentures is a state-of-the-art approach to improve functional aspects of the oral cavity of edentulous patients. It is important to assure that these dentures have a sufficient fit. We introduce a dataset of 13 edentulous patients who were recorded with and without complete dentures in situ. The fit of these patients' dentures had been rated insufficient, so additional (sufficient) dentures were prepared and additional speech recordings were made. In this paper we show that sufficient dentures increase the performance of an ASR system by ca. 27 %. Based on these results, we present and discuss three different systems that automatically determine whether the dentures of an edentulous person have a sufficient fit or not. The system with the best performance models the recordings by GMMs and uses the mean vectors of these GMMs as features in an SVM. With this system we were able to achieve a recognition rate of 80 %.
    Text, Speech and Dialogue, 13th International Conference, TSD 2010, Brno, Czech Republic, September 6-10, 2010. Proceedings; 01/2010
  • ABSTRACT: We present a novel system for the automatic evaluation of speech and voice disorders. The system can be accessed platform-independently via the internet. The patient reads a text or names pictures. His or her speech is then analyzed by automatic speech recognition and prosodic analysis. For patients who had their larynx removed due to cancer and for children with cleft lip and palate we show that we can achieve significant correlations between the automatic analysis and the judgment of human experts in a leave-one-out experiment (p < .001). A correlation of .90 for the evaluation of the laryngectomees and .87 for the evaluation of the children’s data was obtained. This is comparable to human inter-rater correlations.
    Speech Communication. 05/2009;
  • ABSTRACT: The Hoarseness Diagram, a program for voice quality analysis used in German-speaking countries, was compared with an automatic speech recognition system with a module for prosodic analysis. The latter computed prosodic features on the basis of a text recording. We examined whether voice analysis of sustained vowels and text analysis correlate in tracheoesophageal speakers. Test speakers were 24 male laryngectomees with tracheoesophageal substitute speech, age 60.6 ± 8.9 years. Each person read the German version of the text 'The North Wind and the Sun'. Additionally, five sustained vowels were recorded from each patient. The fundamental frequency (F0) detected by both programs was compared for all vowels. The correlation between the measures obtained by the Hoarseness Diagram and the features from the prosody module was computed. Both programs have problems in determining the F0 of highly pathologic voices. Parameters like jitter, shimmer, F0, and irregularity as computed by the Hoarseness Diagram from vowels show correlations of about -0.8 with prosodic features obtained from the text recordings. Voice properties can reliably be evaluated both on the basis of vowel and text recordings. Text analysis, however, also offers possibilities for the automatic evaluation of running speech since it realistically represents everyday speech.
    Folia Phoniatrica et Logopaedica 04/2009; 61(2):112-6. · 1.08 Impact Factor
  • ABSTRACT: In this study, an objective version of the postlaryngectomy telephone test (PLTT) for measuring speech intelligibility based on automatic speech recognition is presented. Thirty-one patients with tracheoesophageal substitute voice (25 men and six women, 63.4 ± 8.7 years) were evaluated by 11 naïve listeners. The automatic measurement of speech intelligibility was expressed by means of word accuracy and word recognition rates, or the percentage of correctly recognized words from a word sequence. These automatic measures were compared with the subjectively obtained PLTT values. The average PLTT intelligibility of the 11 naïve listeners was 47%; the automatically obtained word accuracy and word recognition rates were much lower (approximately 0% and 15%, respectively). The correlation between subjective and automatic evaluation, however, reached more than 0.9 in some of the examined cases. Automatic speech recognition provides an efficient, objective measure that is equivalent to the overall PLTT intelligibility value.
    HNO 02/2009; 57(1):51-6. · 0.42 Impact Factor
  • ABSTRACT: Tracheoesophageal voice is state-of-the-art in voice rehabilitation after laryngectomy. Intelligibility on a telephone is an important evaluation criterion as it is a crucial part of social life. An objective measure of intelligibility when talking on a telephone is desirable in the field of postlaryngectomy speech therapy and its evaluation. Based upon successful earlier studies with broadband speech, an automatic speech recognition (ASR) system was applied to 41 recordings of postlaryngectomy patients. Recordings were available in different signal qualities; quality was the crucial criterion for this study. Compared to the intelligibility rating of 5 human experts, the ASR system had a correlation coefficient of r = -0.87 and Krippendorff's alpha of 0.65 when broadband speech was processed. The rater group alone achieved alpha = 0.66. With the test recordings in telephone quality, the system reached r = -0.79 and alpha = 0.67. For medical purposes, a comprehensive diagnostic approach to (substitute) voice has to cover both subjective and objective tests. An automatic recognition system such as the one proposed in this study can be used for objective intelligibility rating with results comparable to those of human experts. This holds for broadband speech as well as for automatic evaluation via telephone.
    Folia Phoniatrica et Logopaedica 01/2009; 61(1):12-7. · 1.08 Impact Factor
  • ABSTRACT: There are several methods to create visualizations of speech data. All of them, however, lack the ability to remove microphone-dependent distortions. We examined the use of Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and the COmprehensive Space Map of Objective Signal (COSMOS) method in this work. To solve the problem of the lack of microphone independence in PCA, LDA, and COSMOS, we present two methods to reduce the influence of the recording conditions on the visualization. The first one is a rigid registration of maps created from identical speakers recorded under different conditions, i.e. different microphones and distances. The second method is an extension of the COSMOS method, which performs a non-rigid registration during the mapping procedure. As a measure for the quality of the visualization, we computed the mapping error which occurs during the dimension reduction and the grouping error as the average distance between the representations of the same speaker recorded by different microphones. The best linear method in leave-one-speaker-out evaluation is PCA plus rigid registration with a mapping error of 47 % and a grouping error of 18 %. The proposed method, however, surpasses this even further with a mapping error of 24 % and a grouping error which is close to zero.
    Journal of Pattern Recognition Research. 01/2009; 1:32-51.
  • INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, September 6-10, 2009; 01/2009
  • ABSTRACT: We describe a GMM-UBM-based evaluation system for pathologic voices that uses standard cepstral features. Per speaker one GMM is created and its components are used to create a so-called GMM supervector. The supervector of each speaker is labeled with the intelligibility values obtained by human evaluation and is used to train an SVR. We studied different GMM supervectors containing different GMM components. On a database of 85 pathologic speakers, we achieved a correlation between the automatic system and the expert listeners of r = 0.83 when using a 13312-dimensional supervector containing the values of the diagonal covariance matrices of 26-dimensional Gaussians.
    01/2009;
  • Text, Speech and Dialogue, 12th International Conference, TSD 2009, Pilsen, Czech Republic, September 13-17, 2009. Proceedings; 01/2009
  • T. Haderlein, T. Bocklet, E. Nöth, F. Rosanowski
    ABSTRACT: The Hoarseness Diagram, a program for voice quality analysis using recordings of sustained vowels, was compared to an automatic speech recognition system with a module for prosodic analysis. The latter computed prosodic features on a text recording. We examined whether the voice analysis of sustained vowels and text analysis correlate on a group of 24 male laryngectomees (average age: 60.6 ± 8.9 years) using tracheoesophageal substitute speech. Each person read the German version of the text "The North Wind and the Sun", which consists of 108 words. Additionally, 5 sustained vowels were recorded from each patient. The correlation between the measures obtained by the Hoarseness Diagram and the prosodic features from the prosody module was determined. Parameters like jitter, shimmer, F0 and irregularity computed by the Hoarseness Diagram on vowel recordings show correlations of about -0.8 with prosodic features obtained from the text recordings. Hence, voice properties can reliably be evaluated both on a vowel and a text recording. The text analysis, however, also offers possibilities for automatic speech evaluation since it better represents a real communication situation.
    Systems, Signals and Image Processing, 2008. IWSSIP 2008. 15th International Conference on; 07/2008
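
Several abstracts above report the word accuracy (WA) and the word correctness or word recognition rate (WR) of a speech recognizer, and one notes that WA was close to 0% while WR was about 15%. The following minimal sketch illustrates how such a gap can arise. It assumes the definitions commonly used in speech recognition evaluation, WR = (N - S - D) / N and WA = (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions and insertions of a minimum-edit alignment. The abstracts do not state these formulas, so their use here is an assumption; the function names and example word sequences are hypothetical.

```python
def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of one minimum-edit
    alignment between two word lists (plain Levenshtein alignment)."""
    n, m = len(reference), len(hypothesis)
    # cell[i][j] = (total_errors, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cell = [[None] * (m + 1) for _ in range(n + 1)]
    cell[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cell[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cell[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (t, s, d, k)             # match, no error
            else:
                diagonal = (t + 1, s + 1, d, k)     # substitution
            t, s, d, k = cell[i - 1][j]
            deletion = (t + 1, s, d + 1, k)
            t, s, d, k = cell[i][j - 1]
            insertion = (t + 1, s, d, k + 1)
            # Tuples compare by total error count first; ties are broken arbitrarily.
            cell[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = cell[n][m]
    return subs, dels, ins


def word_rates(reference, hypothesis):
    """Return (WR, WA) as fractions, under the assumed standard definitions."""
    n = len(reference)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    subs, dels, ins = align_counts(reference, hypothesis)
    wr = (n - subs - dels) / n          # insertions are ignored
    wa = (n - subs - dels - ins) / n    # insertions are penalized
    return wr, wa


if __name__ == "__main__":
    ref = "the north wind and the sun".split()
    hyp = "the north wind and and the the son".split()
    wr, wa = word_rates(ref, hyp)
    print(f"WR = {wr:.3f}, WA = {wa:.3f}")   # WR = 0.833, WA = 0.500
```

Because insertions count only against WA, a recognizer that inserts many spurious words on a strongly distorted voice can reach a WA near zero, or even below zero, while WR remains positive.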

Publication Stats

171 Citations
509 Downloads
2k Views
15.53 Total Impact Points

Institutions

  • 2011
    • University of Bonn
      Bonn, North Rhine-Westphalia, Germany
  • 2006–2011
    • Universitätsklinikum Erlangen
      • Department of Phoniatrics and Paedaudiology
      • Department of Otorhinolaryngology – Head and Neck Surgery
      Erlangen, Bavaria, Germany
  • 2004–2011
    • Friedrich-Alexander Universität Erlangen-Nürnberg
      • Pattern Recognition Lab
      • Department of Computer Science
      Erlangen, Bavaria, Germany