"Research has shown that only few parameters are necessary to describe lip shapes (Fromkin, 1964; Benoit, 1990; Cosi and Magno-Caldognetto, 1996). These parameters are capable to describe each cluster of visually confused phonemes (visemes). "
ABSTRACT: Padova for their helpful collaboration in this project and for providing the data collected with ELITE. I would like to thank Stefano Pasquariello for implementing the 3D facial model. Our goal is to create a natural talking face with, in particular, lip-readable movements. Based on real data extracted from a speaker's face with an opto-electronic system using passive markers, we approximated the data with a neural network model. In this chapter we present our work on the construction of a visual text-to-speech system in which we have included coarticulation rules. In particular, we describe our computational model of lip movements and our 3D facial model, compliant with the MPEG-4 standard. Our experiment is based on phonetic-phonological considerations on the parameters defining the labial orifice, and on identification tests of visual articulatory movements.
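As an illustrative sketch only: the abstract describes approximating lip-parameter data, measured from facial markers, with a neural network. The paper does not specify the architecture, so the small one-hidden-layer regression network below, along with its input/output dimensions and toy training data, is entirely an assumption, meant only to show the general shape of such a mapping from a phonetic-context vector to lip parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_lip_model(X, Y, hidden=16, lr=0.05, epochs=500):
    """Fit Y ~ f(X) with a tanh hidden layer and linear output (MSE loss).

    X: (n, d_in) phonetic-context vectors (hypothetical encoding).
    Y: (n, d_out) measured lip parameters (e.g. opening, width).
    """
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0, 0.5, (n_in, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, n_out)); b2 = np.zeros(n_out)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # hidden activations
        P = H @ W2 + b2                     # predicted lip parameters
        E = P - Y                           # residual
        # full-batch gradient descent on mean squared error
        gW2 = H.T @ E / len(X); gb2 = E.mean(0)
        dH = (E @ W2.T) * (1 - H ** 2)      # backprop through tanh
        gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda x: np.tanh(x @ W1 + b1) @ W2 + b2

# Toy stand-in data: 3-dim context vector -> 2 lip parameters
X = rng.normal(size=(200, 3))
Y = np.column_stack([np.sin(X[:, 0]), 0.5 * X[:, 1]])
f = train_lip_model(X, Y)
mse = np.mean((f(X) - Y) ** 2)
```

The averaged-gradient update keeps the learning rate independent of the number of training samples; a real system would of course train on the marker trajectories recorded with ELITE rather than synthetic data.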
MPEG-4 Facial Animation: The Standard, Implementation and Applications, 01/2003, pages 125-140; ISBN: 9780470854624
"The presence or absence of correlation observed in this study for Italian is congruous with previous results reported for English by Fromkin (1964) and Linker (1982), and for French, by Abry and Boë (1986) and Benoit et al. (1992), independently of the instrumentation or reperee points used in defining the parameters. The normalized mean values, pooled over the 6 subjects and the 5 repetitions, for each parameter and each vowel, are reported in Table 2. "
ABSTRACT: This research focuses on the spatio-temporal characteristics of lip and jaw movements and on their relevance for lip-reading, bimodal communication theory and bimodal recognition applications. 3D visible articulatory targets for vowels and consonants are proposed. Relevant modifications of the spatio-temporal consonant targets due to coarticulatory phenomena are exemplified. When visual parameters are added to acoustic ones as inputs to a Recurrent Neural Network system, high recognition results are obtained in plosive classification experiments. Lip-reading and bimodal recognition research is following the same trend as earlier studies on the transmission of linguistic information by the auditory channel. The first stage focused on visual intelligibility tests, i.e. on quantifying the information transmitted by the visual channel. In the second stage the research proceeds with identifying the characteristics of the signal which transmit the information. To that purpose, various devices capturing and recording distal and proximal signals have to be designed, built and tuned, and various techniques for creating synthetic stimuli for experimental tests have to be developed. Only by relying on a large amount of experimental data, sufficient to capture the complexity of the whole phenomenon and, ideally, cross-linguistic in nature so as to separate the fundamental mechanisms from more language-specific characteristics, will the elaboration of adequate theories of visual perception of articulatory movements and of bimodal perception of speech be possible. The experimental data presented in the following are intended to contribute to this second stage of the research (Magno Caldognetto et al., 1995).
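The bimodal recognition setup above adds visual parameters to acoustic ones as network input. As a hedged sketch, not the authors' implementation: the snippet below frame-aligns lower-rate visual parameters to the acoustic frame rate, concatenates them, and runs the fused sequence through a minimal Elman-style recurrent layer. The feature dimensions, the nearest-neighbour upsampling, and the random weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def fuse_features(acoustic, visual):
    """Align visual frames (lower rate) to acoustic frames by
    nearest-neighbour upsampling, then concatenate per frame."""
    T = len(acoustic)
    idx = np.minimum(np.arange(T) * len(visual) // T, len(visual) - 1)
    return np.hstack([acoustic, visual[idx]])

def elman_forward(X, Wx, Wh, b):
    """Hidden states of a simple recurrent layer: h_t = tanh(x_t Wx + h_{t-1} Wh + b)."""
    h = np.zeros(Wh.shape[0])
    H = []
    for x in X:
        h = np.tanh(x @ Wx + h @ Wh + b)
        H.append(h)
    return np.array(H)

# Toy sequences: 100 acoustic frames (12 hypothetical cepstral coeffs),
# 25 video frames (4 hypothetical lip parameters, e.g. opening, width)
A = rng.normal(size=(100, 12))
V = rng.normal(size=(25, 4))
X = fuse_features(A, V)                       # fused input, shape (100, 16)
Wx = rng.normal(0, 0.1, (16, 8))
Wh = rng.normal(0, 0.1, (8, 8))
H = elman_forward(X, Wx, Wh, np.zeros(8))     # hidden states, shape (100, 8)
```

A real system would append a classification layer over the final (or pooled) hidden states to label the plosive category; only the input-fusion step is sketched here.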
In fact, they constitute the natural development of previous studies conducted at CSRF, focused on auditory (Magno Caldognetto and Vagges, 1990a, 1990b) and visual (Magno Caldognetto, Vagges and Ferrero, 1980) intelligibility tests, which enabled us to quantify and verify the characteristics of the phonological information transmitted separately by the two channels. As illustrated in
ABSTRACT: We present work on a new anatomically-based 3D parametric lip model for synchronized speech, which also supports the lip motion needed for facial animation. The lip model is represented by a B-spline surface and high-level parameters which define the articulation of the surface. The model parameterization is muscle-based to allow for specification of a wide range of lip motion. The B-spline surface specifies not only the external portion of the lips, but the internal surface as well. This complete geometric representation replaces the, possibly incomplete, lip geometry of any facial model. We render the model using a procedural texturing paradigm to give color, lighting and surface texture for increased realism. We use our lip model in a text-to-audio-visual-speech system to achieve speech-synchronized facial animation.
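The lip geometry above is a B-spline surface driven by high-level parameters. As a minimal sketch of that representation, not the paper's model: the snippet below evaluates one patch of a uniform cubic B-spline surface from a 4x4 grid of control points. The control-point data is arbitrary; in a muscle-based parameterization the control points would be positioned by the articulation parameters.

```python
import numpy as np

def bspline_basis(t):
    """Uniform cubic B-spline basis functions at parameter t in [0, 1].
    The four values form a partition of unity (they sum to 1)."""
    return np.array([(1 - t) ** 3,
                     3 * t ** 3 - 6 * t ** 2 + 4,
                     -3 * t ** 3 + 3 * t ** 2 + 3 * t + 1,
                     t ** 3]) / 6.0

def surface_point(P, u, v):
    """Evaluate S(u, v) = sum_ij B_i(u) B_j(v) P[i, j] on a (4, 4, 3) patch."""
    Bu, Bv = bspline_basis(u), bspline_basis(v)
    return np.einsum('i,j,ijk->k', Bu, Bv, P)

# Illustrative 4x4 grid of 3D control points on a gentle bump
gx, gy = np.meshgrid(np.arange(4.0), np.arange(4.0), indexing='ij')
P = np.stack([gx, gy, 0.2 * np.sin(gx)], axis=-1)   # shape (4, 4, 3)
pt = surface_point(P, 0.5, 0.5)                     # a point near the patch centre
```

Because the basis functions sum to 1 and are non-negative, every evaluated point lies in the convex hull of its 4x4 control patch, which is what makes the surface well-behaved under parameter-driven deformation.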