Publications

  • Petr Zelinka, Milan Sigmund, Jiri Schimmel
    ABSTRACT: The impact of changes in a speaker’s vocal effort on the performance of automatic speech recognition has largely been overlooked by researchers and virtually no speech resources exist for the development and testing of speech recognizers at all vocal effort levels. This study deals with speech properties in the whole range of vocal modes – whispering, soft speech, normal speech, loud speech, and shouting. Fundamental acoustic and phonetic changes are documented. The impact of vocal effort variability on the performance of an isolated-word recognizer is shown and effective means of improving the system’s robustness are tested. The proposed multiple model framework approach reaches a 50% relative reduction of word error rate compared to the baseline system. A new specialized speech database, BUT-VE1, is presented, which contains speech recordings of 13 speakers at 5 vocal effort levels with manual phonetic segmentation and sound pressure level calibration.
    Speech Communication. 07/2012; 54(6):732–742.
  • Milan Sigmund, Petr Zelinka
    ABSTRACT: A significant part of the information carried in a speech signal refers to the speaker. This paper investigates alcohol intoxication by analyzing recorded speech. Speech changes resulting from alcohol intoxication were investigated in the waveform of glottal pulses estimated from speech by applying Iterative Adaptive Inverse Filtering (IAIF). Experimental results show that analysis of glottal excitation appears to be a useful approach for providing evidence of alcohol intoxication above 1‰. At this alcohol level, the associated negative events influence professional performance and may lead to fatal accidents in some cases. By analyzing the speech signal, the speaker could be monitored automatically without their active cooperation. For use in our experiments, a new collection of Czech alcoholized speech was created, consisting of phonetically identical speech data spoken in both sober and intoxicated states.
    Information technology and control 06/2011; 40. · 0.67 Impact Factor
  • P. Zelinka, M. Sigmund
    ABSTRACT: This paper describes several practical steps for accurate statistical modeling of a known acoustic noise environment to attain good performance of a small-vocabulary isolated-word speech recognizer based on whole-word hidden Markov models. Hierarchical segmentation based on the Bayesian information criterion and k-means clustering, followed by split-merge Gaussian mixture model training, was used for noise model estimation. The parallel model combination technique produces the final noise-corrupted speech models for a small group of speakers. Experiments were carried out on real operating-room ambient noise recorded during neurosurgery at the University Hospital in Marburg.
    Radioelektronika (RADIOELEKTRONIKA), 2010 20th International Conference; 05/2010
  • Milan Sigmund
    ABSTRACT: A significant part of the information carried in a speech signal refers to the speaker: expressing their personality, conveying emotions, and reflecting conditions such as fatigue, stress, or medical problems. This paper deals with speech produced by a speaker under psychological stress. A classification of various states of stress and the corresponding types of stressor is defined. For use in our experiments, a new database of speech under stress was created, consisting of data collected during oral examinations at our university. Long-time changes in fundamental frequency and short-time changes in the spectrum of vowels due to exam stress were investigated. The primary goal of the study reported here is to give an introduction to the problem of “speaker's stress and speech signal”.
    01/2010;
  • Petr Zelinka, Milan Sigmund
    ABSTRACT: This paper describes an approach for enhancing the robustness of an isolated-word recognizer by extending its flexibility with respect to the speaker's variable vocal effort level. An analysis of the spectral properties of spoken vowels in four speaking modes (whispering, soft, normal, and loud) confirms consistent spectral tilt changes. The severe impact of vocal effort variability on the accuracy of a speaker-dependent word recognizer is presented, and an efficient remedial measure using a multiple-model framework paired with an accurate speech mode detector is proposed.
    01/2010;
  • Milan Sigmund
    Recent Advances in Signal Processing, 11/2009; ISBN: 978-953-307-002-5
  • Pavel Sala, Milan Sigmund
    ABSTRACT: This paper deals with methods for estimating glottal pulses from the speech signal and analyzing them to find appropriate criteria for describing a selected diagnosis. The Iterative Adaptive Inverse Filtering method is presented in more detail. This method was applied to speech under stress in order to investigate the influence of the speaker's stress on the generated glottal flow. Using two developed criteria, the best stress detection rate achieved was 89% on a small available database of speakers.
    01/2009;
  • Milan Sigmund
    ABSTRACT: This work was supported by the Czech Ministry of Education in the frame of the Research Plan No. MSM 0021630513 "Advanced Electronic Communication Systems and Technologies".
    Frontiers in Robotics, Automation and Control, 10/2008; ISBN: 978-953-7619-17-6
  • Milan Sigmund
    ABSTRACT: This paper presents and discusses an approach to automatic gender distinction in a short segment of normally spoken continuous speech. In order to see which phonemes are effective for gender recognition, we analyzed individual vowels. Two different simple identifiers based on selected mel-frequency cepstral coefficients were evaluated. Using vowel phonemes, we achieved a gender identification accuracy of more than 90% in short-time analysis (20 msec). For the vowel "a" in particular, almost no errors occur. For text-independent analysis, a speech duration of 500 msec was sufficient to identify male/female speakers with an accuracy of more than 93%. Automatic estimation of a speaker's gender from her/his voice is an important factor in realizing high-quality dialogue systems.
    01/2008;
  • Milan Sigmund, Ales Prokes, Zdenek Brabec
    ABSTRACT: In this paper, the problem of speech signal under psychological stress is addressed. The investigation into the speaker's stress is based on statistical analysis of glottal pulse derivative extracted from the vowel signals. A pitch synchronous selection of segments from the glottal pulse waveform is used. Selected segments are fixed in their maxima and overlaid. The generated distribution matrix is analysed using special cuts. A new database of speech under stress is created for use in our experiments consisting of data collected during oral final examinations at our university. The database contains read and conversational speech of 31 male speakers, both in neutral and in stressed state. The stress recognition rate in the speaker dependent binomial classification (stress/no stress) reaches 88%.
    01/2007;
  • Milan Sigmund
    ABSTRACT: This paper deals with the speech signal as a significant indicator of psychological stress when the speaker is involved in a stressful activity. The investigation of the speaker's stress is based on specific changes in the short-time spectrum of vowel phonemes. For each selected signal segment, the spectrum is computed by means of two different methods: Fourier transformation and chirp transformation. Comparative results from the two spectra are used for detecting the speaker's stress. In the case of speech under stress, the obtained spectra differ towards the higher frequencies due to enhanced pitch modulation observed in the envelope of the chirp spectrum. For use in our experiments, a new database of speech under stress was created, consisting of data collected during oral examinations at our university.
    01/2007;
  • Milan Sigmund, Tomás Dostál
    IASTED International Conference on Artificial Intelligence and Applications, part of the 25th Multi-Conference on Applied Informatics, Innsbruck, Austria, February 12-14, 2007; 01/2007
  • M. Sigmund
    ABSTRACT: This paper briefly describes a newly created database of Czech speech under realistic stressed conditions and presents some selected results achieved by analyzing stressed speech. The motivation for creating a new database was the absence of stressed speech corpora for Czech or any other Slavic language. The database contains read and conversational speech of 31 male speakers, in both neutral and stressed states. The stressed speech was recorded during final oral examinations at the Brno University of Technology. In order to quantify the stress of individual speakers, each speaker's heart rate was also measured and recorded simultaneously with the speech. Experiments conducted using this database show that the speech corpus can be used for the development and evaluation of algorithms that identify the extent of speakers' stress by analyzing their voices only.
    Signal Processing Symposium, 2006. NORSIG 2006. Proceedings of the 7th Nordic; 07/2006
  • ABSTRACT: This paper introduces two different time-frequency representations for voice activity detection (VAD). The first method is based on the chirp-based spectral representation of the signal, while the second method is based on wavelet decomposition. Not only is this the first implementation of the Fan-Chirp Transform for VAD, but the method based on the Discrete Wavelet Transform is also one of the few multidimensional approaches in the field. The paper addresses the performance of both methods with clean speech and speech in noisy conditions, and discusses their limitations.
    Proceedings of the IASTED International Conference on Signal Processing, Pattern Recognition, and Applications, SPPRA 2006, February 15-17, 2006, Innsbruck, Austria; 01/2006
  • A. Kuiper, M. Sigmund
    ABSTRACT: Film recordings from the beginning of the century deteriorate over time. There are many reasons for the decreasing quality. Mainly, the quality decreases because of faults arising from poor maintenance of the reproduction equipment or inappropriate storage of the film. Research institutions have been trying to restore faults in motion pictures using digital image processing for the last couple of years. Many methods for correcting typical faults, such as scratches, dust and dirt, are known. Nearly every approach to fixing a fault gives different results, and therefore rating the quality of an individual restoration is a difficult task. Mostly it is judged based on the subjective perception of an audience. As the original state of the material is not known in most cases, a comparison between the original undamaged state and the restored state is not possible. A solution to this problem could be to simulate faults on an undamaged picture sequence and then apply the developed restoration algorithms to it. This paper describes an application intended to fill this gap by simulating faults as realistically as possible in order to test restoration methods.
    Computer as a Tool, 2005. EUROCON 2005.The International Conference on; 02/2005
  • Milan Sigmund, Tomás Dostál
    IASTED International Conference on Artificial Intelligence and Applications, part of the 23rd Multi-Conference on Applied Informatics, Innsbruck, Austria, February 14-16, 2005; 01/2005
  • Milan Sigmund, Petr Jelínek
    ABSTRACT: This paper presents a new tool developed for finding phoneme boundaries in spoken language. The algorithm used is based on the comparison of speech features obtained in the stable portion of each phoneme.
    01/2005;
  • Milan Bostik, Milan Sigmund
    8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4, 2003; 01/2003
  • Milan Sigmund, Pavel Novotny
    ABSTRACT: This paper describes programs for 3-dimensional engraving. The programs use raster or vector images to create a 3D model and, subsequently, convert this model into a sequence of control commands for 3D engraving machines. Three programs have been developed: a program for engraving general 3D surfaces from grey-scale images, a program for preparing these grey-scale images from patterns and vector images, and a program for fast 2D engraving. A simple and fast preparation of the 3D model, a user-friendly environment, and small hardware requirements were the principal goals.
    Journal of Intelligent and Robotic Systems 01/2000; 28:69-84. · 0.83 Impact Factor
  • Tomáš Dostál, Milan Sigmund
    ABSTRACT: A universal multifunctional (low-pass, high-pass, band-pass, band-reject and all-pass) third-order active filter in current mode is presented in this paper. It is based on a multi-loop feedback state-variable structure with differential-input single-output transconductors (OTA) and single-input multi-output current followers (mirrors).