Figure 2. Speech signal and time evolution of some kinematic parameters associated with the sequence /'aba/ expressing disgust.

Source publication
Article
Full-text available
This paper concerns the bimodal transmission of emotive speech and describes how the expression of joy, surprise, sadness, disgust, anger, and fear leads to visual and acoustic target modifications in some Italian phonemes. Current knowledge on the audio-visual transmission of emotive speech traditionally concerns global prosodic and intonational...

Contexts in source publication

Context 1
... Left and Right Corner horizontal displacements (LCX and RCX), calculated as the distance between the markers placed on the left and the right lip corner and the sagittal Σ plane passing through the tip of the nose and perpendicular to the Ω plane (these parameters are not visualized in Fig. ...
Context 2
... Left and Right Corner vertical displacements (LCY and RCY), calculated as the distance between the markers placed on the left and right lip corner and the transversal plane Ω, containing the line crossing the markers placed on the lobes of the ears and on the nose (these parameters are not visualized in Fig. ...
Context 3
... For ASYMX and ASYMY, values different from zero indicate the presence of an asymmetry. Positive values of ASYMY mean that the right lip corner lies in a higher position along the vertical axis than the left corner. Positive values of ASYMX indicate that the lips are displaced asymmetrically towards the right along the horizontal axis. Fig. 2 shows some of the labial kinematic parameters considered in this study for /'aba/ expressing disgust: LO, LR, ULP, LLP, UL, LL, ASYMX and ASYMY. ...
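Read together, Contexts 1-3 define the corner displacements as distances from two reference planes (the sagittal plane Σ and the transversal plane Ω) and the asymmetry indices as left-right comparisons. The sketch below illustrates how such quantities could be computed from 3D marker coordinates; the plane construction, the marker arguments and the right-minus-left formulas (ASYMX = RCX − LCX, ASYMY = RCY − LCY) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def signed_plane_distance(point, plane_point, plane_normal):
    """Signed distance from a 3D point to the plane through plane_point with the given normal."""
    n = plane_normal / np.linalg.norm(plane_normal)
    return float(np.dot(point - plane_point, n))

def corner_displacements(left_corner, right_corner, nose_tip, ear_left, ear_right):
    """Illustrative computation of LCX/RCX, LCY/RCY, ASYMX and ASYMY.

    All inputs are 3D marker coordinates as NumPy arrays of shape (3,).
    Assumptions (not from the paper): Omega is the plane through the two
    ear-lobe markers and the nose marker; Sigma passes through the nose tip,
    perpendicular to Omega; asymmetries are right-minus-left differences.
    """
    # Omega: plane containing the ear-lobe markers and the nose marker.
    omega_normal = np.cross(ear_right - ear_left, nose_tip - ear_left)
    omega_unit = omega_normal / np.linalg.norm(omega_normal)

    # Sigma: its normal is the ear-to-ear axis projected onto Omega,
    # so Sigma is perpendicular to Omega and passes through the nose tip.
    ear_axis = ear_right - ear_left
    sigma_normal = ear_axis - np.dot(ear_axis, omega_unit) * omega_unit

    lcx = signed_plane_distance(left_corner, nose_tip, sigma_normal)
    rcx = signed_plane_distance(right_corner, nose_tip, sigma_normal)
    lcy = signed_plane_distance(left_corner, nose_tip, omega_normal)
    rcy = signed_plane_distance(right_corner, nose_tip, omega_normal)

    return {"LCX": lcx, "RCX": rcx, "LCY": lcy, "RCY": rcy,
            "ASYMX": rcx - lcx, "ASYMY": rcy - lcy}
```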

Similar publications

Article
Full-text available
This paper concerns the bimodal transmission of emotive speech and describes how the expression of joy, surprise, sadness, disgust, anger, and fear leads to visual and acoustic target modifications in some Italian phonemes. Current knowledge on the audio-visual transmission of emotive speech traditionally concerns global prosodic and intonational c...
Conference Paper
Full-text available
Obtaining aligned spectral pairs in the case of non-parallel data for the stand-alone Voice Conversion (VC) technique is a challenging research problem. An unsupervised alignment algorithm, namely an Iterative combination of a Nearest Neighbor search step and a Conversion step Alignment (INCA), iteratively tries to align the spectral features by minimizing th...
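For context, the loop below sketches the iterate-then-convert idea behind INCA as summarized in the abstract: each pass pairs converted source frames with their nearest target frames, then refits the conversion function on those pairs. The plain least-squares linear mapping and the fixed iteration count are simplifying assumptions; the cited paper's actual formulation may differ.

```python
import numpy as np

def inca_align(src, tgt, n_iter=10):
    """Simplified INCA-style alignment for non-parallel spectral feature sets.

    src, tgt: (n_frames, dim) arrays of spectral features from the source
    and target speakers. Returns the final linear conversion matrix W and
    the nearest-neighbour pairing of source frames to target frames.
    """
    conv = src.copy()              # converted source frames, updated each pass
    W = np.eye(src.shape[1])
    nn = np.arange(len(src)) % len(tgt)
    for _ in range(n_iter):
        # 1) Nearest-neighbour search: pair each converted source frame
        #    with its closest target frame.
        dists = np.linalg.norm(conv[:, None, :] - tgt[None, :, :], axis=2)
        nn = dists.argmin(axis=1)
        # 2) Conversion step: refit the mapping on the current pairs
        #    (least squares stands in for the usual GMM-based conversion).
        W, *_ = np.linalg.lstsq(src, tgt[nn], rcond=None)
        conv = src @ W
    return W, nn
```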

Citations

... Two different configurations have been adopted for articulatory data collection: the first one, specifically designed for the analysis of labial movements, considers a simple scheme with only 8 reflecting markers (bigger markers in Fig. 2), while the second, adapted to the analysis of expressive and emotive speech, utilizes the full and complete set of 28 markers. All the movements of the 8 or 28 markers, depending on the adopted acquisition pattern, are recorded and collected, together with their velocity and acceleration, simultaneously with the co-produced speech, which is usually segmented and analyzed by means of PRAAT [18], which also computes intensity, duration, spectrograms, formants, pitch-synchronous F0, and various voice quality parameters in the case of emotive and expressive speech [19,20]. ...
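As a side note on the kinematic quantities mentioned here (marker positions together with their velocity and acceleration), the snippet below shows one straightforward way to derive velocity and acceleration from a sampled marker trajectory by finite differences. The sampling rate argument and the use of np.gradient are assumptions for illustration, not a description of the ELITE/PRAAT processing chain.

```python
import numpy as np

def marker_kinematics(positions, fs):
    """Velocity and acceleration of one marker trajectory by finite differences.

    positions: (n_frames, 3) array of 3D marker coordinates (e.g. in mm).
    fs: sampling rate of the optotracking system in Hz (assumed known).
    """
    dt = 1.0 / fs
    velocity = np.gradient(positions, dt, axis=0)       # mm/s
    acceleration = np.gradient(velocity, dt, axis=0)    # mm/s^2
    return velocity, acceleration
```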
Conference Paper
Full-text available
LUCIA is an MPEG-4 facial animation system developed at ISTC-CNR. It works on standard Facial Animation Parameters and speaks with the Italian version of the FESTIVAL TTS. To achieve an emotive/expressive talking head, LUCIA was built from real human data physically extracted by the ELITE optic-tracking movement analyzer. LUCIA can copy a real human being by reproducing the movements of passive markers positioned on his face and recorded by the ELITE device, or can be driven by an emotional XML-tagged input text, thus realizing true audio/visual emotive/expressive synthesis. Synchronization between visual and audio data is very important in order to create the correct WAV and FAP files needed for the animation. LUCIA's voice is based on the ISTC Italian version of the FESTIVAL-MBROLA packages, modified by means of an appropriate APML/VSML tagged language. LUCIA is available in two different versions: an open source framework and a "work in progress" WebGL version.
... They concluded that there are significant differences in the patterns presented in neutral and emotional speech [31]. Similar inferences were also reported in [32]. As a direct consequence of these results, emotion-dependent models to synchronize lip movement with speech have been developed for human-like facial animations [20]. ...
... In agreement with previous works [6], [31], [32], the results presented here show that the lower face region conveys important emotional clues. Table I and Figure 4 indicate that the inter-emotional differences in the average displacement coefficient of the markers are significant, with the exception of the pair happiness-anger, in which only 41.38% of the markers were found with significant differences (see Table II). ...
Article
Full-text available
The verbal and nonverbal channels of human communication are internally and intricately connected. As a result, gestures and speech present high levels of correlation and coordination. This relationship is greatly affected by the linguistic and emotional content of the message. The present paper investigates the influence of articulation and emotions on the interrelation between facial gestures and speech. The analyses are based on an audio-visual database recorded from an actress with markers attached to her face, who was asked to read semantically neutral sentences, expressing four emotion states (neutral, sadness, happiness, and anger). A multilinear regression framework is used to estimate facial features from acoustic speech parameters. The levels of coupling between the communication channels are quantified by using Pearson's correlation between the recorded and estimated facial features. The results show that facial and acoustic features are strongly interrelated, showing levels of correlation higher than r = 0.8 when the mapping is computed at sentence level using spectral envelope speech features. The results reveal that the lower face region provides the highest activeness and correlation levels. Furthermore, the correlation levels present significant inter-emotional differences, which suggests that emotional content affects the relationship between facial gestures and speech. Principal component analysis (PCA) shows that the audiovisual mapping parameters are grouped in a smaller subspace, which suggests that there is an emotion-dependent structure that is preserved across sentences. The results suggest that this internal structure seems to be easy to model when prosodic features are used to estimate the audiovisual mapping. The results also reveal that the correlation levels within a sentence vary according to broad phonetic properties presented in the sentence. Consonants, especially unvoiced and fricative sounds, present the lowest correlation levels. Likewise, the results show that facial gestures are linked at different resolutions. While the orofacial area is locally connected with the speech, other facial gestures such as eyebrow motion are linked only at the sentence level. The results presented here have important implications for applications such as facial animation and multimodal emotion recognition.
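The mapping-plus-correlation evaluation described in this abstract can be pictured with the toy sketch below: an affine least-squares map from acoustic to facial features, scored with Pearson's r between recorded and estimated trajectories. The plain linear model and the per-feature correlation are stand-ins chosen for brevity, not the paper's actual multilinear regression framework.

```python
import numpy as np

def fit_av_mapping(acoustic, facial):
    """Affine least-squares map from acoustic features to facial features.

    acoustic: (n_frames, d_a) array, facial: (n_frames, d_f) array.
    A bias column is appended so the mapping is affine.
    """
    X = np.hstack([acoustic, np.ones((len(acoustic), 1))])
    W, *_ = np.linalg.lstsq(X, facial, rcond=None)
    return W

def correlation_per_feature(acoustic, facial, W):
    """Pearson correlation between recorded and estimated facial features."""
    X = np.hstack([acoustic, np.ones((len(acoustic), 1))])
    estimated = X @ W
    return np.array([np.corrcoef(facial[:, j], estimated[:, j])[0, 1]
                     for j in range(facial.shape[1])])
```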
... Articulatory parameters such as the tongue tip, jaw and lip also present more peripheral articulation during emotional speech compared to neutral speech (Lee et al., 2005, 2006). Similar results were reported by Nordstrand et al. (2003) and Caldognetto et al. (2003). In fact, these characteristic patterns during emotional speech have been used in facial animation to generate viseme models for expressive virtual agents (Bevacqua and Pelachaud, 2004). ...
... Also, the activeness in the lower face region for sadness decreases by 20% compared with the activeness in the neutral state. These results reveal that emotional modulation affects the activeness in the face, which agrees with previous work (Lee et al., 2005, 2006; Nordstrand et al., 2003; Caldognetto et al., 2003; Bevacqua and Pelachaud, 2004). Notice that the upper face region presents the highest relative increments for happiness and anger compared to the neutral case (120%). ...
Article
Full-text available
Communicative goals are simultaneously expressed through gestures and speech to convey messages enriched with valuable verbal and non-verbal clues. This paper analyzes and quantifies how linguistic and affective goals are reflected in facial expressions. Using a database recorded from an actress with markers attached to her face, the facial features during emotional speech were compared with the ones expressed during neutral speech. The results show that the facial activeness is mainly driven by articulatory processes. However, clear spatial-temporal patterns are observed during emotional speech, which indicate that emotional goals enhance and modulate facial expressions. The results also show that the upper face region has more degrees of freedom to convey non-verbal information than the lower face region, which is highly constrained by the underlying articulatory processes. These results are important toward understanding how humans communicate and interact.
... (Facial Definition Parameters) define the shape of the model, while ... Two different configurations have been adopted for articulatory data collection: the first one, specifically designed for the analysis of labial movements, considers a simple scheme with only 8 reflecting markers (bigger grey markers in Fig. 2), while the second, adapted to the analysis of expressive and emotive speech, utilizes the full and complete set of 28 markers. All the movements of the 8 or 28 markers, depending on the adopted acquisition pattern, are recorded and collected, together with their velocity and acceleration, simultaneously with the co-produced speech, which is usually segmented and analyzed by means of PRAAT [3], which also computes intensity, duration, spectrograms, formants, pitch-synchronous F0, and various voice quality parameters in the case of emotive and expressive speech [4][5]. ...
Conference Paper
Full-text available
INTERFACE is an integrated software tool implemented in Matlab© and created to speed up the procedure for building an emotive/expressive talking head. Various processing tools, working on dynamic articulatory data physically extracted by an optotracking 3D movement analyzer called ELITE, were implemented to build the animation engine and also to create the correct WAV and FAP files needed for the animation. By the use of INTERFACE, LUCIA, our animated MPEG-4 talking face, can copy a real human by reproducing the movements of passive markers positioned on his face and recorded by an optoelectronic device, or can be directly driven by an emotional XML-tagged input text, thus realizing a true audio/visual emotive/expressive synthesis. LUCIA's voice is based on an Italian version of the FESTIVAL-MBROLA packages, modified for expressive/emotive synthesis by means of an appropriate APML/VSML tagged language.
... Two different configurations have been adopted for articulatory data collection: the first one, specifically designed for the analysis of labial movements, considers a simple scheme with only 8 reflecting markers (bigger grey markers in Figure 1a), while the second, adapted to the analysis of expressive and emotive speech, utilizes the full and complete set of 28 markers. All the movements of the 8 or 28 markers, depending on the adopted acquisition pattern, are recorded and collected, together with their velocity and acceleration, simultaneously with the co-produced speech, which is usually segmented and analyzed by means of PRAAT [3], which also computes intensity, duration, spectrograms, formants, pitch-synchronous F0, and various voice quality parameters in the case of emotive and expressive speech [4,5]. As for the analysis of the labial movements, the most common parameters selected to quantify the labial configuration modifications, as illustrated in Figure 1b for some of them, are introduced in the following table: • Lip Opening (LO), calculated as the distance between markers placed on the central points of the upper and lower lip vermillion borders [d(m2,m3)]; this parameter correlates with the HIGH-LOW phonetic dimension. ...
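The excerpt above defines Lip Opening (LO) simply as the Euclidean distance between the mid-upper and mid-lower lip markers, d(m2, m3). A minimal sketch of that distance-based parameter is given below; the marker names follow the excerpt, and the function is illustrative only.

```python
import numpy as np

def lip_opening(m2, m3):
    """Lip Opening (LO) = d(m2, m3): Euclidean distance between the markers
    on the central points of the upper (m2) and lower (m3) lip borders."""
    return float(np.linalg.norm(np.asarray(m2) - np.asarray(m3)))
```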
Conference Paper
Full-text available
Audio/visual speech, in the form of labial movement and facial expression data, was utilized in order to semi-automatically build a new Italian expressive and emotive talking head capable of believable and emotional behavior. The methodology, the procedures and the specific software tools utilized for this scope will be described, together with some implementation examples.
... First of all, we asked ourselves whether emotions significantly modify the labial configuration of the /'a/, /b/ and /v/ targets. Secondly, we investigated how articulatory parameters are modified by emotions, and finally we tried to quantify the emotive range of lip movements with respect to the linguistic-articulatory targets, see [4,5]. ...
... In order to collect the articulatory and acoustic data, an automatic optotracking movement analyser for 3D kinematic data acquisition (ELITE) was used, which also allows a synchronous recording of the acoustic signal (for previous applications of this data acquisition system to the definition of Italian visemes on an articulatory basis, see [1,2,3,4]). This system tracks the infrared light reflected by small (2 mm diameter) passive markers glued on different points of the external lips contour and of the face, following the scheme in Fig. 1. ...
... To answer this first question, for each acquisition session the articulatory data at the resting position was recorded as well, and the extracted parameters were normalized with respect to these values. With this procedure we obtain data decoupled from the resting lip shape, see [4]. The parameters selected to quantify the labial configuration modifications are the following: • Left and Right Corner horizontal displacements (LCX and RCX), calculated as the distance between the markers placed on the left and the right lip corner and the sagittal Σ plane passing through the nose tip and perpendicular to the Ω plane. Positive values correspond to a greater distance from the plane Σ, negative values to a smaller distance from the plane Σ. • Left and Right Corner vertical displacements (LCY and RCY), calculated as the distance between the markers placed on the left and right lip corner and the transversal plane Ω, containing the line crossing the markers placed on the lobes of the ears and on the nose. ...
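The normalization step mentioned at the start of this excerpt (referring each parameter to its value at the resting position) could be realized as in the sketch below. Treating the normalization as a simple subtraction of the resting value is an assumption made only to make the idea concrete.

```python
import numpy as np

def normalize_to_rest(param_track, rest_value):
    """Express a labial parameter track relative to the resting position.

    param_track: (n_frames,) values of one parameter (e.g. LO, LCX, RCY).
    rest_value: the same parameter measured at the speaker's resting position.
    Subtraction is assumed; the original study may normalize differently.
    """
    return np.asarray(param_track) - rest_value
```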
Conference Paper
Full-text available
The aim of this research is the phonetic-articulatory description of emotive speech, achievable by studying the labial movements, which are the product of compliance with both the phonetic-phonological constraints and the lip configuration required for the visual encoding of emotions. In this research we analyse the interaction between the labial configurations peculiar to six emotions (anger, disgust, joy, fear, surprise and sadness) and the articulatory lip movements defined by phonetic-phonological rules, specific to the vowel /'a/ and the consonants /b/ and /v/.