Conference Paper

An Acoustic Framework for Detecting Fatigue in Speech Based Human-Computer-Interaction

DOI: 10.1007/978-3-540-70540-6_7 Conference: Computers Helping People with Special Needs, 11th International Conference, ICCHP 2008, Linz, Austria, July 9-11, 2008. Proceedings
Source: DBLP


This article describes a general framework for detecting accident-prone fatigue states based on prosody-, articulation-, and speech-quality-related speech characteristics. The advantages of this real-time measurement approach are that obtaining speech data is non-obtrusive and free of sensor application and calibration effort. The core of the feature computation is the combination of frame-level speech features and high-level contour descriptors, resulting in over 8,500 features per speech sample. In general, the measurement process follows the speech-adapted steps of pattern recognition: (a) recording speech, (b) preprocessing (segmenting the speech units of interest), (c) feature computation (using perceptual and signal-processing-related features, e.g. fundamental frequency, intensity, pause patterns, formants, cepstral coefficients), (d) dimensionality reduction (filter- and wrapper-based feature subset selection, (un-)supervised feature transformation), (e) classification (e.g. SVM, k-NN classifiers), and (f) evaluation (e.g. 10-fold cross-validation). The validity of this approach is briefly discussed by summarizing the empirical results of a sleep-deprivation study.
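Steps (c)–(f) of the pipeline above can be sketched with off-the-shelf tools. The following is a minimal illustration, not the authors' implementation: the acoustic feature matrix is stood in for by synthetic data, feature subset selection uses a simple univariate filter, and the classifier and evaluation scheme match those named in the abstract (SVM, 10-fold cross-validation).

```python
# Hypothetical sketch of steps (c)-(f) using scikit-learn.
# X stands in for the acoustic feature matrix (step c); real systems would
# compute F0, intensity, pause, formant, and cepstral features from audio.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 500))     # 120 speech samples x 500 acoustic features
y = rng.integers(0, 2, size=120)    # 0 = alert, 1 = fatigued (synthetic labels)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=50)),  # (d) filter-based subset selection
    ("clf", SVC(kernel="rbf")),                # (e) SVM classifier
])
# (f) evaluation: 10-fold cross-validation
scores = cross_val_score(pipe, X, y, cv=StratifiedKFold(n_splits=10))
print(round(scores.mean(), 3))
```

Putting selection inside the `Pipeline` ensures the feature filter is refit on each training fold, avoiding the selection bias that leaks test-fold information into the subset choice.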



Available from: Jarek Krajewski
  • Source
    • "However, since it is a voice based system we propose our output would, though using texts and visuals, not use embodied agents. Work to date suggests that speech is a significant property for detecting fatigue [17] and emotion [18]. In this work, voiced-to-unvoiced and voiced-to-silence ratio is used as an indicator for detecting drowsiness and measuring level of alertness respectively. "
    ABSTRACT: While significant work is being done to develop empathic agents that can identify a user's emotions through eye gaze and facial expressions, a neglected area, especially in the pedagogical context, is the use of voice for detecting alertness, fatigue, and emotions. Some of the issues are the lack of constant monitoring and of visual feedback. However, such a system has advantages: the ability to work where visual monitoring is expensive, in darkness, or where mobile devices cannot provide adequate visual feedback. We propose a model for a system capable of identifying emotions as well as alertness and fatigue based entirely on voice interaction, keyboard input, and mouse clicks. We also propose to develop an engine that can intelligently improve its prediction of emotions and cognitive states based on earlier interaction, and suggest appropriate measures to improve emotions, reduce distractions, and mitigate fatigue.
    Full-text · Conference Paper · Dec 2013
  • Source
    • "Furthermore, much effort has been put into considering the special requirements of assistive environments and developing the accordingly adapted interactive systems. Krajewski et al. (2008) described an acoustic framework for detecting accident-prone fatigue states according to prosody, articulation and speech quality related speech characteristics for speech-based human computer interaction (HCI). Moreover, Jian et al. (2012) studied, implemented, and evaluated the speech interface of a multimodal interactive guidance system based on the most common elderly-centered characteristics during interaction within assistive environments. "
    ABSTRACT: In this paper we describe our recent and future research on multimodal interaction in an Ambient Assisted Living Lab. Our work combines two interaction modes, speech and gesture, for multiple device control in Ambient Assisted Living environments. We conducted a user study concerning multimodal interaction between participants and an intelligent wheelchair in a smart home environment. Important empirical data were collected through the user study, which encouraged further developments on our multi-modal interactive system for Ambient Assisted Living environments.
    Full-text · Conference Paper · Jul 2012
  • Source
    ABSTRACT: The aim of this study is to apply a state-of-the-art speech emotion recognition engine on the detection of microsleep endangered sleepiness states. Current approaches in speech emotion recognition use low-level descriptors and functionals to compute brute-force feature sets. This paper describes a further enrichment of the temporal information, aggregating functionals and utilizing a broad pool of diverse elementary statistics and spectral descriptors. The resulting 45,088 features were applied to speech samples gained from a car simulator based sleep deprivation study. After a correlation-filter based feature subset selection, which was employed on the feature space in an attempt to maximize relevance, several classification models were trained. The best model (Support Vector Machine, dot kernel) achieved 86.1% recognition rate in predicting microsleep endangered sleepiness stages.
    Full-text · Conference Paper · Jan 2009
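One of the citing works above uses the voiced-to-unvoiced and voiced-to-silence ratios as drowsiness and alertness indicators. A minimal, assumption-laden sketch of computing such ratios follows: frames are labeled by short-time energy and zero-crossing rate, with hypothetical thresholds that a real system would tune to its recording conditions.

```python
# Sketch: label each frame as voiced, unvoiced, or silence, then form ratios.
# Thresholds (energy_thresh, zcr_thresh) are illustrative, not from the paper.
import numpy as np

def frame_ratios(signal, sr, frame_ms=25, energy_thresh=0.01, zcr_thresh=0.1):
    n = int(sr * frame_ms / 1000)
    frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    voiced = unvoiced = silence = 0
    for f in frames:
        energy = np.mean(f ** 2)                        # short-time energy
        zcr = np.mean(np.abs(np.diff(np.sign(f)))) / 2  # zero-crossing rate
        if energy < energy_thresh:
            silence += 1
        elif zcr < zcr_thresh:
            voiced += 1      # periodic, low-ZCR frames
        else:
            unvoiced += 1    # noisy, high-ZCR frames
    return voiced / max(unvoiced, 1), voiced / max(silence, 1)

# Toy input: 1 s of a 200 Hz tone (voiced-like) followed by 1 s of silence.
sr = 8000
t = np.arange(sr) / sr
sig = np.concatenate([0.5 * np.sin(2 * np.pi * 200 * t), np.zeros(sr)])
v2u, v2s = frame_ratios(sig, sr)
```

On this toy signal all tone frames land in the voiced bin and all zero frames in the silence bin, so the voiced-to-silence ratio comes out near 1; real speech would distribute frames across all three classes.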