"In English & Japanese, the vowels /i/, /u/, /e/, /o/, /A/ exhibit language-specific placement along spread-round, high-low, and frontback continua (e.g., Japanese /u/ is relatively more spread than English /u/), but broadly follow the pattern shown in Fig. 1 (cf. Fromkin, 1964, Fukuda & Hiki, 1982; Wada et al., 1969). Figure 1: General placement of vowels along feature continua. "
ABSTRACT: Previous analyses of vowel context effects on the sibilant fricatives of English (/s, ʃ/) and Japanese (/s, ɕ/) have focused on spectral properties computed from a limited number of time points, e.g., frication midpoint or vowel onset. However, the spectra of sibilants vary temporally; thus, it is worth considering how their spectral dynamics vary across vocalic contexts. Vowel context effects were investigated with respect to the trajectories of three psychoacoustic measures computed across the timecourse of native, adult speakers' productions of word-initial, pre-vocalic sibilants. Psychoacoustic spectra were computed from 10 ms windows, spaced evenly across the frication, by passing each window through a bank of gammatone filters, which modeled the auditory system's differential frequency selectivity. From these psychoacoustic spectra, the peak frequency, the excitation-drop (difference between maximum high-band and minimum low-band excitation), and the half-power bandwidth of the peak were computed. In Japanese, the peak frequency and the excitation-drop trajectories showed effects of vowel height: the trajectories for high (vs. mid and low) vowel contexts diverged from 50-75% of the fricative duration onward. In English, the excitation-drop trajectories showed similar effects of vowel height; however, the trajectories for high vowels diverged later than in Japanese. Peak bandwidth exhibited context effects only for Japanese /ɕ/, where it was lower in back vowel contexts across the first 75% of the fricative duration.
Journal of the Acoustical Society of America; 04/2015
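The three measures named in the abstract above can be illustrated with a minimal sketch. It assumes the gammatone filter bank has already produced an excitation pattern for one 10 ms window, given as channel centre frequencies (Hz) and excitation levels (dB). The function names, the low/high band split frequency, the reading of "half-power" as 3 dB below the peak, and the example values are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

# Half power corresponds to a drop of 10*log10(2), roughly 3.01 dB.
HALF_POWER_DB = 10.0 * math.log10(2.0)


def peak_index(levels_db):
    """Index of the channel with the highest excitation level."""
    return max(range(len(levels_db)), key=lambda i: levels_db[i])


def peak_frequency(freqs_hz, levels_db):
    """Centre frequency (Hz) of the channel with maximum excitation."""
    return freqs_hz[peak_index(levels_db)]


def excitation_drop(freqs_hz, levels_db, split_hz):
    """Maximum high-band excitation minus minimum low-band excitation (dB).

    split_hz is a hypothetical band boundary: channels below it count as
    the low band, channels at or above it as the high band.
    """
    low = [lv for f, lv in zip(freqs_hz, levels_db) if f < split_hz]
    high = [lv for f, lv in zip(freqs_hz, levels_db) if f >= split_hz]
    return max(high) - min(low)


def half_power_bandwidth(freqs_hz, levels_db):
    """Width (Hz) of the contiguous channel run around the peak whose
    excitation stays within the half-power threshold of the peak level."""
    i = peak_index(levels_db)
    threshold = levels_db[i] - HALF_POWER_DB
    lo = i
    while lo > 0 and levels_db[lo - 1] >= threshold:
        lo -= 1
    hi = i
    while hi < len(levels_db) - 1 and levels_db[hi + 1] >= threshold:
        hi += 1
    return freqs_hz[hi] - freqs_hz[lo]


# Made-up excitation pattern for a single analysis window.
example_freqs = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
example_levels = [20, 22, 30, 41, 48, 50, 48.5, 40]

print(peak_frequency(example_freqs, example_levels))
print(excitation_drop(example_freqs, example_levels, split_hz=3000))
print(half_power_bandwidth(example_freqs, example_levels))
```

Applying these functions to each window in turn would give one trajectory per measure across the frication, which is the kind of timecourse the abstract compares across vowel contexts.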
"Research has shown that only few parameters are necessary to describe lip shapes (Fromkin, 1964; Benoit, 1990; Cosi and Magno-Caldognetto, 1996). These parameters are capable to describe each cluster of visually confused phonemes (visemes). "
Padova for their helpful collaboration in this project and for providing the data collected with ELITE. I would like to thank Stefano Pasquariello for having implemented the 3D facial model. ABSTRACT: Our goal is to create a natural talking face with, in particular, lip-readable movements. Based on real data extracted from a speaker with an opto-electronic system that applies passive markers on the speaker's face, we have approximated the data using a neural network model. In this chapter we present our work on the construction of a visual-text-to-speech system in which we have included coarticulation rules. In particular, we describe our computational model of lip movements and our 3D facial model compliant with the MPEG-4 standard. Our experiment is based on some phonetic-phonological considerations on the parameters defining the labial orifice, and on identification tests of visual articulatory movements.
MPEG-4 Facial Animation: The Standard, Implementation And Applications, 01/2003: pages 125-140; ISBN: 9780470854624
"The presence or absence of correlation observed in this study for Italian is congruous with previous results reported for English by Fromkin (1964) and Linker (1982), and for French, by Abry and Boë (1986) and Benoit et al. (1992), independently of the instrumentation or reperee points used in defining the parameters. The normalized mean values, pooled over the 6 subjects and the 5 repetitions, for each parameter and each vowel, are reported in Table 2. "
ABSTRACT: This research focuses on the spatio-temporal characteristics of lip and jaw movements and on their relevance for lip-reading, bimodal communication theory and bimodal recognition applications. 3D visible articulatory targets for vowels and consonants are proposed. Relevant modifications of the spatio-temporal consonant targets due to coarticulatory phenomena are exemplified. When visual parameters are added to acoustic ones as inputs to a Recurrent Neural Network system, high recognition results in plosive classification experiments are obtained. Lip-reading and bimodal recognition research is following the same trend that occurred in studies on the transmission of linguistic information by the auditory channel. The first stage was focused on visual intelligibility tests, i.e., on the quantification of the information transmitted by the visual channel. In the second stage, the research proceeds with the identification of the characteristics of the signal which transmit the information. To that purpose, various devices capturing and recording distal and proximal signals have to be designed, built and tuned up, and various techniques for the creation of synthetic stimuli for experimental tests have to be developed. Only by relying on a great amount of experimental data, sufficient to capture the complexity of the whole phenomenon and, possibly, characterized by a cross-linguistic nature in order to separate the fundamental mechanisms from more linguo-specific characteristics, will the elaboration of adequate theories of visual perception of articulatory movements and of bimodal perception of speech be possible. The experimental data presented in the following are intended to contribute to this second stage of the research (Magno Caldognetto et al., 1995).
In fact they constitute the natural development of previous studies executed at CSRF focused on auditory (Magno Caldognetto and Vagges, 1990a, 1990b) and visual (Magno Caldognetto, Vagges and Ferrero, 1980) intelligibility tests which enabled us to quantify and verify the characteristics of the phonological information transmitted separately by both channels. As illustrated in