Conference Paper

An Acoustic Framework for Detecting Fatigue in Speech Based Human-Computer-Interaction

DOI: 10.1007/978-3-540-70540-6_7
Conference: Computers Helping People with Special Needs, 11th International Conference, ICCHP 2008, Linz, Austria, July 9-11, 2008, Proceedings
Source: DBLP

ABSTRACT This article describes a general framework for detecting accident-prone fatigue states based on prosodic, articulatory, and speech-quality-related speech characteristics. The advantages of this real-time measurement approach are that obtaining speech data is unobtrusive and free of sensor application and calibration effort. The core of the feature computation is the combination of frame-level speech features with high-level contour descriptors, resulting in over 8,500 features per speech sample. The measurement process follows the steps of pattern recognition, adapted to speech: (a) recording speech, (b) preprocessing (segmenting the speech units of interest), (c) feature computation (using perceptual and signal-processing features such as fundamental frequency, intensity, pause patterns, formants, and cepstral coefficients), (d) dimensionality reduction (filter- and wrapper-based feature subset selection, (un-)supervised feature transformation), (e) classification (e.g., SVM and k-NN classifiers), and (f) evaluation (e.g., 10-fold cross-validation). The validity of this approach is briefly discussed by summarizing the empirical results of a sleep deprivation study.
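To make steps (c)-(f) concrete, here is a minimal sketch in Python using scikit-learn. It assumes the acoustic features of step (c) have already been computed; the random data, the number of selected features, and the kernel choice are placeholders, not the framework's actual configuration.

```python
# Minimal sketch of steps (d)-(f); X stands in for a precomputed acoustic
# feature matrix (n_samples x ~8,500 features: F0, intensity, pause, formant,
# and cepstral descriptors), y for binary fatigue labels. All settings here
# are illustrative assumptions, not the paper's actual configuration.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 8500))   # placeholder for the acoustic features
y = rng.integers(0, 2, size=80)   # placeholder labels (0 = alert, 1 = fatigued)

pipeline = Pipeline([
    ("scale", StandardScaler()),                # normalize feature ranges
    ("filter", SelectKBest(f_classif, k=200)),  # (d) filter-based subset selection
    ("clf", SVC(kernel="linear")),              # (e) SVM classifier
])

# (f) 10-fold cross-validation
scores = cross_val_score(pipeline, X, y, cv=10)
print(f"mean accuracy: {scores.mean():.3f}")
```

Running the selection and classifier inside one pipeline ensures the feature filter is refit on each training fold, so the cross-validation estimate is not biased by the selection step.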

  • ABSTRACT: While significant work is being done to develop empathic agents that can identify the user's emotions through eye gaze and facial expressions, a neglected area, especially in the pedagogical context, is the use of voice for the detection of alertness, fatigue, and emotions. The main obstacles are the lack of constant monitoring and of visual feedback. Such a system nevertheless has advantages: the ability to work where visual monitoring is expensive, in darkness, or where mobile devices cannot provide adequate visual feedback. We propose a model for a system capable of identifying emotions as well as alertness and fatigue based entirely on voice interaction and keyboard and mouse clicks (see the sketch after this entry); we also propose to develop an engine which can intelligently improve its prediction of emotions and cognitive states based on earlier interaction, and which suggests appropriate measures to improve emotions, reduce distractions, and mitigate fatigue.
    Keywords: empathic agent; fatigue detection; intelligent e-learning system; speech emotion recognition.
    I. INTRODUCTION: Today many e-learning systems are able to adapt to the needs, requirements, and orientations of individual students. Such systems are considered intelligent or adaptive. A more recent development is systems that are able to identify the emotions and affective states of individuals and react intelligently to them. In the pedagogical context such developments are considered significant, since they would allow computers to interact like human instructors and teachers through animated agents that appear and speak on the computer screen. Currently, many models [1], [2], [3] are being developed which use face, eye tracking, voice, etc. in combination as inputs to identify emotions and respond intelligently to them. Other research focuses only on visual inputs, such as the face and eyes, to predict emotions [4], [5]. However, there is hardly any research on developing robust technology that identifies emotions, fatigue, and alertness through voice input alone and applies this in an intelligent, empathic feedback system. Voice has already been used in other contexts such as driver fatigue [6], [7] and stress detection [8], but its independent use in a pedagogical context needs exploration. This is all the more challenging since the user does not provide constant audio feedback. We propose a model that uses the user's keyboard and mouse clicking behavior along with voice input to build an intelligent student monitoring and interactive system that can be used with any e-learning system.
    T4E, IIT Kharagpur; 12/2013
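The entry above proposes its system only at the model level. As a rough illustration of the multimodal fusion it envisions, the following hypothetical Python sketch combines intermittent voice-derived scores with keyboard and mouse activity counts into one feature vector for a downstream classifier; all type names, fields, and values are assumptions, not part of the cited paper.

```python
# Hypothetical sketch of per-window multimodal input fusion: voice-derived
# scores are merged with keyboard/mouse activity into a single feature vector.
# Since audio input is intermittent, the last voice scores are carried
# forward together with their age.
from dataclasses import dataclass

@dataclass
class InteractionWindow:
    voice_fatigue_score: float       # e.g., output of an acoustic fatigue detector
    voice_emotion_score: float       # e.g., valence estimate from speech
    keystrokes: int                  # keyboard activity in the window
    mouse_clicks: int                # mouse activity in the window
    seconds_since_last_audio: float  # age of the carried-forward voice scores

    def to_features(self) -> list[float]:
        # Flatten the window into a feature vector for a classifier.
        return [self.voice_fatigue_score, self.voice_emotion_score,
                float(self.keystrokes), float(self.mouse_clicks),
                self.seconds_since_last_audio]

window = InteractionWindow(0.7, -0.2, 112, 9, 35.0)
print(window.to_features())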
  • ABSTRACT: The aim of this study is to apply a state-of-the-art speech emotion recognition engine to the detection of microsleep-endangered sleepiness states. Current approaches in speech emotion recognition use low-level descriptors and functionals to compute brute-force feature sets. This paper describes a further enrichment of the temporal information, aggregating functionals and utilizing a broad pool of diverse elementary statistics and spectral descriptors. The resulting 45,088 features were applied to speech samples obtained from a car-simulator-based sleep deprivation study. After a correlation-filter-based feature subset selection, employed on the feature space to maximize relevance, several classification models were trained (see the sketch after this entry). The best model (Support Vector Machine, dot kernel) achieved an 86.1% recognition rate in predicting microsleep-endangered sleepiness states.
    19th International Conference on Pattern Recognition (ICPR 2008); 01/2009
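As a rough illustration of the correlation-filter feature subset selection and dot-kernel (linear) SVM described above, the sketch below ranks features by their absolute Pearson correlation with the labels and keeps the top k. The data, the value of k, and the training details are placeholders, not the study's actual configuration.

```python
# Rough sketch of a correlation-filter feature subset selection: rank features
# by absolute Pearson correlation with the sleepiness label, keep the top k,
# then train a linear ("dot"-kernel) SVM. Data and k are placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 45088))   # placeholder for the 45,088 features
y = rng.integers(0, 2, size=120)    # placeholder sleepiness labels

# Pearson correlation of each feature column with the (centered) label vector
Xc = X - X.mean(axis=0)
yc = y - y.mean()
corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)

top_k = np.argsort(np.abs(corr))[-500:]   # keep the 500 most label-correlated
clf = SVC(kernel="linear").fit(X[:, top_k], y)
print(clf.score(X[:, top_k], y))          # training accuracy only
```

Note that in practice the correlation filter must be refit inside each cross-validation fold; selecting features on the full data set before evaluation inflates the reported recognition rate.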