
Louis C.W. Pols
University of Amsterdam | UVA · Department of Linguistics
PhD
About
191
Publications
15,629
Reads
2,508
Citations
Publications (191)
This study examined spectral properties of the Hungarian vowel pair /i/ vs. /i:/, which contrast in phonemic vowel length, in boys aged 2;0 and 4;0 years acquiring Hungarian as their native language. Results were obtained by an automated pitch-synchronous bandfilter analysis method that estimates the spectral envelope representation of vowels. Subsequ...
Purpose: Confusions between voiced and voiceless plosives and voiced and voiceless fricatives are common in Dutch tracheoesophageal (TE) speech. This study investigates (a) which acoustic measures are found to convey a correct voicing contrast in TE speech and (b) whether different measures are found in TE speech than in normal laryngeal (NL) speech...
The influence of the mother tongue on vowel productions in infancy is different for deaf and hearing babies. Audio material of five hearing and five deaf infants acquiring Dutch was collected monthly from month 5-18, and at 24 months. Fifty unlabelled utterances were digitized for each recording. This study focused on developmental paths in vowel p...
It is well known that listeners can ignore disturbances in speech and rely on context to interpolate the message. This fact is used to determine the importance of individual words for projecting Transition Relevance Places, TRPs. Subjects were asked to shadow manipulated pre-recorded dialogs with minimal responses, saying 'ah' when they feel it...
Confusions between voiced and voiceless plosives and fricatives are the most common confusions in Dutch tracheoesophageal (TE) speech. The problem is attributed to the working of the new voice source: the pharyngo-esophageal segment, or neoglottis. In order to learn how these speakers convey the voiced-voiceless distinction, detailed analyses are n...
Because of the aperiodicity of many tracheoesophageal voices, acoustic analysis of the tracheoesophageal voice is less straightforward than that of the normal voice. This study presents the development and testing of an acoustic signal typing system based on visual inspection of a narrow-band spectrogram that can be used by researchers for classifi...
In two Reaction Time (RT) experiments, subjects were asked to respond with minimal responses to prerecorded dialogs and impoverished versions of these dialogs, containing either only intonation and pause information (hummed stimuli) or no periodic component at all (whispered stimuli). For the hummed stimuli, response delays and, especially, vari...
The effect of the position of the last accented word on the projection of TRPs was investigated with two RT experiments. Subjects were asked to respond with minimal responses to prerecorded dialogs and impoverished versions of these dialogs, containing either only intonation and pause information (hummed stimuli) or no periodic component at al...
Total laryngectomy has far-reaching effects on vocal tract anatomy and physiology. The preferred method for restoring postlaryngectomy oral communication is prosthetic tracheoesophageal (TE) speech, which like laryngeal speech is pulmonary driven. TE speech quality is better than esophageal or electrolarynx speech quality, but still very deviant fr...
It is notoriously difficult to perform reliable spectro-temporal analyses on speech of young children (up to two years of age), partly because of the high pitch of their voices. Formants are very poorly defined and thus we used a pitch-synchronous bandfilter analysis, followed by a principal components analysis to represent the acoustic characteri...
The purpose of the current study was to assess the anatomic and functional correlates of voice quality in tracheoesophageal speech, with dynamic imaging studies of the neoglottis. Videofluoroscopy (providing a lateral view), digital high-speed endoscopy (providing a "birds-eye" view), and their relationships with perceptual evaluations of voice qua...
A total laryngectomy changes the anatomy and physiology of the vocal tract, with a most noticeable effect on speech. By applying a voice prosthesis, enabling the patient to use tracheoesophageal (TE) speech, speech is of better quality than with esophageal or electrolarynx speech, but still very deviant from laryngeal speech. Most studies on TE spe...
Traditional hand-edited formant measurements may result in biased assessment of vowel formants in children's speech. Therefore, vowel spaces that are constructed by hand-edited formant measures may be unreliable. The recent development of an automated frequency domain analysis method allows for more reliable measurements. Thus, a valid comparison o...
This paper is an initial report on the systematic analysis of changes within the vowel system of Standard Dutch. The work focuses on the recent lowering of the diphthong /Ei/, known as 'Polder Dutch' (Poldernederlands). The purpose was to find an automatizable method to reliably analyze and compare speakers of a large corpus of Dutch spontaneous...
A Frisian adaptation of a Dutch TTS system based on Festival, NeXTeNS, is presented as a case study in prototyping TTS for resource-poor minority languages. For these languages, demonstrator systems are essential to seed projects in speech and language technology. The conversion of a Dutch TTS system to a new language with minimal speech and langua...
As a result of the cooperation in the Intas 915 project, annotated speech corpora have become available in three different languages for both read and spontaneous speech of some 4-5 male and 4-5 female speakers per language (6-10 minutes per speaker). These data have been used to study the effects of redundancy on acoustic vowel reduction, in terms...
This contribution presents current trends in phonetics research with some bias towards the research of my own Amsterdam group. I also tried to exclude areas that will most probably be covered by other invited speakers. This led me to emphasize, apart from my admiration for Ken Stevens and his work, the importance of dynamic information in speech, n...
The present study was conducted to investigate voice quality in tracheoesophageal speech by means of perceptual evaluations and to develop a clinically useful subset of perceptual scales sufficient for these perceptual evaluations. The perceptual ratings were obtained from both naive and trained raters (speech-language pathologists [SLPs]) after lis...
Speech is considered an efficient communication channel. This implies that the organization of utterances is such that more speaking effort is directed towards important parts than towards redundant parts. Based on a model of incremental word recognition, the importance of a segment is defined as its contribution to word-disambiguation. This import...
An interesting but so far neglected topic in the development of infant sound production is the hypothesized progression toward adult vowel quality. Likely, this process is quite different for normally hearing babies and for deaf babies. A band filtering analysis method is used to measure the spectral envelope in these high-pitched infants' sounds a...
At the Institute of Phonetic Sciences (IFA) we have collected a corpus of spoken Dutch of 4 male and 4 female speakers, containing informal as well as read speech, plus lists of sentences, words, and syllables taken from the transcribed conversation text, and then spoken in isolation. This pertains to about 5.5 hours of speech. All this material is...
Speaking is generally considered efficient in that less effort is spent articulating more redundant items. With efficient speech production, less reduction is expected in the pronunciation of phonemes that are more important (distinctive) for word identification. The importance of a single phoneme in word recognition can be quantified as the inform...
At the Institute of Phonetic Sciences (IFA) we have collected a corpus of spoken Dutch of 4 male and 4 female speakers, containing conversational as well as read speech, plus sentences, words and syllables taken from the transcribed conversation text, and then spoken in isolation. This pertains to about 5.5 hours of speech. All this materi...
In this paper both acoustical as well as textual correlates of prominence are discussed. Prominence, as we use it, is defined at the word level and is based on listener judgments. A selection of useful acoustic input features is tested for classification of prominent words, with the help of Feed Forward Nets. We use spoken sentences from many diffe...
This paper summarizes the results of a series of experiments conducted to investigate various aspects of normal pharyngeal articulation and the nature of pharyngeal coarticulation. Video fiberscopic imaging, electromagnetography and acoustic analysis techniques were used to obtain empirical and quantitative data on the use of the pharynx in speech...
An open source database of hand-segmented Dutch speech was constructed with off-the-shelf software using speech from 8 speakers in a variety of speaking styles. For a total of 50,000 words, speech acquisition and preparation took around 3 person-weeks per speaker. Hand segmentation took 1,000 hours of labeling altogether. The asymptotic segmentatio...
Phoneme recognition can mean two things, conscious phoneme-naming and pre-conscious phone-categorization. Phoneme naming is based on a (learned) label in the mental lexicon. Tasks requiring phoneme awareness will therefore exhibit all the features of retrieving lexical items. Phone categorization is a hypothetical pre-lexical and pre-conscious proc...
Proper early acquisition of speech and language appears to be a necessary process to reach mature speech communication. In modeling the process of natural (and pathological) speech production and speech perception, we frequently concentrate on specific aspects of phonetic knowledge. But also to improve the performance of speech technological system...
Training the acoustic models for automatic speech recognition (ASR) as well as the similarity between the training corpus and the recognition task have a major influence on the performance of a speech recogniser. The more similar the two sets are, the better the performance of the speech recogniser will be. When the recognition task consists of cit...
It is proposed that some of the variation in speech is the result of an effort to communicate efficiently. Speaking is considered efficient if the speech sound contains only the information needed to understand it. This efficiency is tested by means of a corpus of spontaneous and matched read speech, and syllable, word, and N-gram frequencies as me...
In two papers, Nearey (1992, 1997) discusses the fact that theories on phoneme identification generally favor strong cues that are localized in the speech signal. He proposes an alternative view in which cues to phoneme identity are relatively weak and dispersed. In the present listening experiment, Dutch subjects identified speech tokens containin...
In this paper we present several acoustical features, which are used as predictors for prominence. A set of 1244 sentences from 273 different speakers is selected from the Dutch Polyphone Corpus. Via listening experiments the subjective prominence markers are obtained. Several acoustical features concerning F0, energy and duration are derived and...
The acoustic consequences of the articulatory reduction of consonants remain largely unknown. Much more is known about acoustic vowel reduction. Whether the acoustical and perceptual consequences of articulatory consonant reduction are comparable in kind and extent to the consequences of vowel reduction is still an open question. In this study we c...
Computational models of speech pattern processing might be able to benefit a lot from sound and speech perception by humans. Psycho-acoustics has given us insight into the limits and the capabilities of peripheral hearing for, mainly, simple stationary sounds. Threshold phenomena and temporal and spectral resolution for such stimuli are a first ind...
This paper describes a first step towards the automatic classification of prominence (as defined by native listeners). As a result of a listening experiment each word in 500 sentences was marked with a rating scale between '0' (non-prominent) and '10' (very prominent). These prominence labels are compared with the following acoustical features: lou...
In this study the specific problem of robustness in automatic speech recognition under various acoustic conditions is reviewed. The techniques currently most used to solve this problem are discussed. One specific technique (RASTA-PLP) is then implemented and compared with a more conventional approach (MFCC) for the front-end processing for the reco...
In describing human performance in sound perception, in word recognition, in speech understanding, and in dialogue handling, we generally test human limits under controlled conditions and try to understand the underlying mechanisms; however, the human system itself has already been built by nature. In speech and language technology we would like to...
Two perception experiments with different instructions and different presentations were used to locate prominence in 81 read aloud sentences. Results show that, depending on the instruction and presentation (mark prominent words or prominent syllables), the subjects can listen more analytically or more globally. The results indicate that a word per...
Four discrimination experiments were carried out to examine auditory sensitivity to changes in (endpoint) frequency, duration, and rate-of-frequency change of short and rapid speechlike transitions. The transitions were simple (tone sweeps) as well as (harmonically) complex, and isolated as well as preceded or followed by a stationary part. They si...
At the Cocosda meeting in September 1997 in Rhodes, Greece it was decided to set up a TTS server website. The above authors constitute a committee that will describe, design and make operational such a TTS website. The TTS server will permit immediate access to a variety of systems, whereas a text server will allow controlled access to a variety of...
In this paper an improved objective speech quality evaluation measure is proposed that is based on a combination of (averaged) instantaneous and dynamic spectral features. In the current method, the test words are represented by time sequences of LPC cepstrum coefficients and log-energy, as well as by regression coefficients, being the dynamic meas...
In describing human performance in sound perception, in word recognition, in speech understanding and in dialogue handling, we generally test human limits under controlled conditions and try to understand the underlying mechanisms. However, the human system itself has already been built by nature. In speech and language technology, we would like to...
This paper presents the results of an analysis of the prosodic means used by speakers to structure spontaneous discourse. Eight speakers, 4 men and 4 women, read aloud a short story in Dutch, which they then retold in their own words ('retold version'). The aim of the present paper is to study the ...
Using dynamic spectral features, a distance measure for objective quality evaluation of Chinese communication channels was briefly studied. It was based on studies for subjective and objective quality evaluation and LPC cepstral distance measure. The dynamic spectral features were expressed by the cepstral regression coefficients. Three parameters...
acoustic signal which simply represents the sequence of the actually pronounced phones. This might in turn reduce recognition accuracy. One solution to this problem is to use "phonological rules" ((2), for both French and English, and (3)) which account for various pronunciation variations in the language. However, it is hard to obtain reliable set...
Reduction causes changes in the acoustics of consonant realizations that affect their identification. In this study we try to identify some of the acoustic parameters that are correlated with this change in identification. Speaking style is used to manipulate the degree of reduction. Pairs of otherwise identical intervocalic consonants from read an...
acoustic signal which simply represents the sequence of the actually pronounced phones. This might in turn reduce recognition accuracy. One solution to this problem is to use "phonological rules" ([2], for both French and English, and [3]) which account for various pronunciation variations in the language. However, it is hard to obtain reliable set...
Several recent studies have shown that speech production develops in an organized way, already in the first twelve months of life. This development is determined by several factors such as anatomical growth and physiological constraints. Studying the sound production of deaf infants and comparing this with that of normally hearing infants, can give...
The paper presents research on integrating context-dependent durational knowledge into HMM-based speech recognition. The first part of the paper presents work on obtaining relations between the parameters of the context-free HMMs and their durational behaviour, in preparation for the context-dependent durational modelling presented in the second pa...
As indicated by Bourlard et al. (1996), the best and simplest solution so far in standard ASR technology to implement durational knowledge, seems to consist of imposing a (trained) minimum segment duration, simply by duplicating or adding states that cannot be skipped. We want to argue that recognition performance can be further improved by incorpo...
Phoneme-based speech recognition for the Dutch language, using hi-fi recorded sentences as input, was investigated in this Master's thesis. This study was performed in the framework of a project called Talking Heads, in which KPN Research and TNO-TPD are involved. In that project it will be studied whether integration of automatic speech recognition...
The paper presents statistical analyses of context-dependent phone durations using the hand-segmented TIMIT database, for the purpose of improving automatic speech recognition. Two main approaches were used. (1) Duration distributions were found under the influence of individual contextual factors, such as broader classes specified by long or short...
Vowel reduction has been studied for years. It is a universal phenomenon that reduces the distinction of vowels in informal speech and unstressed syllables. How consonants behave in situations where vowels are reduced is much less well known. The authors compare durational and spectral data (for both intervocalic consonants and vowels) segmented fr...
Vowel reduction has been studied for years. It is a universal phenomenon that reduces the distinction of vowels in informal speech and in unstressed syllables. How consonants behave in situations where vowels are reduced is much less well known. In this paper we compare durational and spectral data for both intervocalic consonants and vowels segmen...
Research on the relationship between perceived prominence accent and the measurement of acoustical pitch movements in texts read aloud is described in this paper. By means of a listening experiment it was investigated which words in 7 sentences read by 8 speakers were perceived as prominent by the majority of 7 listeners. The pitch contours of thes...
Objective whole-spectrum and formant analyses have been performed on all 15 Dutch vowels pronounced in /C1VC2/ words by 24 deaf and 24 normal-hearing children, in order to develop a model of pronunciation quality for evaluating (deaf) speech; the results as obtained for adult males by Bakkum et al. [J. Acoust. Soc. Am. 94, 1989-2004 (1993)] have be...
Two discrimination experiments were performed to determine auditory sensitivity for single and complex consonant-vowel (CV)- and vowel-consonant (VC)-like formant transitions. In experiment I difference limens in end-point frequency were determined by means of same/different paired comparison tasks for 20-, 30-, and 50-ms second formant (F2) speechl...
Perceptual processing of single and complex (multi-formant) CV-like and VC-like sounds, as well as interpolated natural speech-based syllables is examined in forced-choice classification and ABX discrimination tasks. The sounds have short and rapid plosive-like vocalic transitions, and are preceded or followed by an /a/-like or /u/-like stationary p...
Synthetic vowels were used to investigate how listeners use vowel duration and formant track shape to determine vowel identity. The synthetic vowels had level or parabolically shaped formant tracks and variable durations. They were presented in isolation as well as in synthetic Consonant-Vowel-Consonant syllables. There was no evidence of perceptua...
Frequency and duration discrimination thresholds of short rising and falling one-formant speechlike transitions without a steady state were determined by means of same/different paired comparison tasks in two experiments. When frequency extent is varied (experiment 1), just noticeable differences decrease with increasing transition duration. Expres...
An objective analysis has been performed on all 15 Dutch vowels pronounced in /hVt/ words by nine native Dutch, nine non-native, and six deaf males. Spectral representations of the vowel segments were created by determining the mean output levels of a bank of 16 filters (90-7200 Hz), with 1/3-oct bandwidths and logarithmic spacing of their center f...
Some 550 vowel segments have been excised from a text read by a Dutch speaker, both at normal rate and at fast rate. The duration of each segment is measured, as well as static and dynamic formant characteristics, such as midpoint formant frequencies, and descriptions of the formant tracks in terms of 16 equidistant points per segment, or Legendre...
In hidden Markov modeling (HMM) of speech signals, the statistics of speech characteristics are represented by HMM parameters after the HMM training. This procedure is purely statistical. This study concerns the incorporation of explicit knowledge into the HMM training. Therefore one specific parameter, i.e., segment duration, was selected. In order...