Emanuela Magno Caldognetto's research while affiliated with Italian National Research Council and other places

Publications (56)

Article
Full-text available
This paper concerns the bimodal transmission of emotive speech and describes how the expression of joy, surprise, sadness, disgust, anger, and fear leads to visual and acoustic target modifications in some Italian phonemes. Current knowledge on the audio-visual transmission of emotive speech traditionally concerns global prosodic and intonational c...
Chapter
Full-text available
In order to be believable, embodied conversational agents (ECAs) must show expression of emotions in a consistent and natural looking way across modalities. The ECA has to be able to display coordinated signs of emotion during realistic emotional behaviour. Such a capability requires one to study and represent emotions and coordination of modalitie...
Article
Full-text available
Irony has been studied by famous scholars across centuries, as well as more recently in cognitive and pragmatic research. The prosodic and visual signals of irony were also studied. Irony is a communicative act in which the Sender’s literal goal is to communicate a meaning x, but through this meaning the Sender has the goal to communicate another m...
Article
We report the case of an Italian speaker (GBC), with classical Wernicke's aphasia following a vascular lesion in the posterior middle temporal region. GBC exhibited a selective deficit in spoken language production affecting vowels more than consonants. In reading from a newspaper, GBC substituted vowels for other vowels from the Italian inventory...
Article
Full-text available
This paper presents an Italian database of acted emotional speech and facial expressions. New data regarding the transition between emotional states has been collected. Although acted expressions have intrinsic limitations related to their naturalness, this method can be convenient for speech and face synthesis and within evaluation frameworks. Us...
Article
This paper describes how the visual characteristics of some Italian phones (/’a/, /b/, /v/) are modified in emotive speech by the expression of the “big six” emotions: joy, surprise, sadness, disgust, anger, and fear. In this research we specifically analyze the interaction between the articulatory lip targets of the Italian vowel /’a/ and consonan...
Conference Paper
Full-text available
The aim of this research is the phonetic-articulatory description of emotive speech, obtained by studying labial movements, which are the product of compliance with both phonetic-phonological constraints and the lip configuration required for the visual encoding of emotions. In this research we analyse the interaction between labial config...
Article
This paper describes how the visual and acoustic characteristics of some Italian phones (/'a/, /b/, /v/) are modified in emotive speech by the expression of joy, surprise, sadness, disgust, anger, and fear. In this research we specifically analyze the interaction between labial configurations, peculiar to each emotion, and the articulatory lip move...
Conference Paper
Full-text available
We aim at creating Embodied Conversational Agents (ECAs) able to communicate multimodally with a user or with other ECAs. In this paper we focus on the Gestural Mind Markers, that is, those gestures that convey information on the Speaker’s Mind; we present the ANVIL-SCORE, a tool to analyze and classify multimodal data that is a semantically augmen...
Conference Paper
Full-text available
We describe the 3D spatial characteristics of the upper and lower lip in the articulatory production of /p/ and /f/ consonantal targets in 'VCV pseudo-words (V: /a i u/). Results will be relevant to the discussion of articulatory production and coarticulation theories, to cross-linguistic comparisons, to visual articulatory perception and to th...
Conference Paper
Full-text available
A modified version of the coarticulation model proposed by Cohen and Massaro (1993) is described. A semi-automatic minimization technique, working on real kinematic data acquired by the ELITE opto-electronic system, was used to train the dynamic characteristics of the model. Finally, the model was successfully applied to GRETA, an Italian talking...
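A minimal sketch of the dominance-function blending that underlies the Cohen and Massaro (1993) model mentioned above: each segment contributes an articulatory target weighted by a negative-exponential dominance function, and the weighted average gives the parameter trajectory. All numeric values, the function names and the /'aba/ example are illustrative assumptions, not the parameters trained on the ELITE data.

```python
import numpy as np

def dominance(t, center, alpha, theta_pre, theta_post, c=1.0):
    """Negative-exponential dominance function of one segment, with separate
    decay rates before and after the segment's temporal center."""
    tau = t - center
    theta = np.where(tau < 0, theta_pre, theta_post)
    return alpha * np.exp(-theta * np.abs(tau) ** c)

def lip_trajectory(t, segments):
    """Blend per-segment targets: F(t) = sum_i D_i(t)*T_i / sum_i D_i(t)."""
    num = np.zeros_like(t)
    den = np.zeros_like(t)
    for seg in segments:
        d = dominance(t, seg["center"], seg["alpha"],
                      seg["theta_pre"], seg["theta_post"])
        num += d * seg["target"]
        den += d
    return num / np.maximum(den, 1e-9)

# Hypothetical /'aba/ sequence: lip-opening targets in mm (illustrative values).
segments = [
    {"center": 0.10, "target": 18.0, "alpha": 1.0, "theta_pre": 20.0, "theta_post": 20.0},  # /a/
    {"center": 0.25, "target": 0.0,  "alpha": 1.0, "theta_pre": 35.0, "theta_post": 35.0},  # /b/
    {"center": 0.40, "target": 18.0, "alpha": 1.0, "theta_pre": 20.0, "theta_post": 20.0},  # /a/
]
t = np.linspace(0.0, 0.5, 251)
opening = lip_trajectory(t, segments)
```

In a training setup such as the one the abstract describes, the alpha and theta values would be fitted to the measured trajectories, e.g. by least-squares minimization.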
Conference Paper
Full-text available
Our goal is to create a natural talking face with, in particular, lip-readable movements. Based on real data extracted from an Italian speaker with the ELITE system, we have approximated the data using radial basis functions. In this paper we present our 3D facial model based on the MPEG-4 standard and our computational model of lip moveme...
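The radial-basis-function approximation mentioned above can be pictured with a small Gaussian-RBF fit of a sampled lip-opening trajectory. The sample data, kernel width, ridge term and function names below are assumptions of this sketch, not the paper's actual model.

```python
import numpy as np

def rbf_fit(x, y, sigma=0.02):
    """Fit Gaussian radial basis functions centred on the sample points."""
    phi = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * sigma ** 2))
    # A small ridge term keeps the linear system well conditioned.
    w = np.linalg.solve(phi + 1e-8 * np.eye(len(x)), y)
    return w

def rbf_eval(x_new, x, w, sigma=0.02):
    """Evaluate the fitted RBF expansion at new time points."""
    phi = np.exp(-((x_new[:, None] - x[None, :]) ** 2) / (2 * sigma ** 2))
    return phi @ w

# Hypothetical lip-opening samples (time in s, opening in mm) at the capture frame rate.
t_frames = np.linspace(0.0, 0.4, 21)
opening = 10 + 8 * np.sin(2 * np.pi * 2.5 * t_frames)
w = rbf_fit(t_frames, opening)
t_dense = np.linspace(0.0, 0.4, 401)
opening_dense = rbf_eval(t_dense, t_frames, w)   # smooth trajectory for animation
```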
Conference Paper
Full-text available
L'intelligibilità dei movimenti articolatori visibili: caratteristiche degli stimoli vs. bias linguistici in corso di stampa in (P. Cosi e E. Magno Caldognetto (a cura di), Atti delle XI Giornate di Studio del G.F.S., Padova.
Conference Paper
Full-text available
The description of the structure of visual stimuli and the identification of the cues critical for viseme identification are essential when developing unimodal and/or bimodal speech perception theories. In order to complete our previous research on 3D spatial static vocalic and consonantal targets, the present paper focuses on the analysis of the spatio-...
Conference Paper
Full-text available
The paper presents the results of the acoustic analyses of single-word utterances produced by three subjects, aimed at identifying the macro-prosodic cues that convey the vocal transmission of the emotions of joy, sadness, disgust, fear, anger and surprise.
Article
Full-text available
In order to identify the Italian consonantal visemes, verify the results of perception tests, and derive rules for bimodal synthesis and recognition, the 3D (lip height, lip width, lower lip protrusion) lip target shapes for all 21 Italian consonants were determined. Moreover, the spatio-temporal characteristics of the closure/opening move...
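Purely to illustrate how consonants sharing similar 3D lip targets collapse into visemes, the following greedy grouping sketch clusters a handful of consonants by the distance between their (lip height, lip width, lower-lip protrusion) vectors. The numeric targets, the threshold and the function name are assumptions of this sketch, not the measurements or the procedure of the paper.

```python
import numpy as np

# Hypothetical 3D lip targets (lip height, lip width, lower-lip protrusion, in mm)
# for a few Italian consonants; the real values are the measured ones.
targets = {
    "p": [0.0, 46.0, 2.0], "b": [0.1, 46.0, 2.0], "m": [0.4, 46.5, 2.1],
    "f": [3.0, 44.0, 6.0], "v": [3.2, 44.1, 6.1],
    "t": [8.0, 48.0, 1.0], "d": [8.1, 48.0, 1.1],
}

def group_visemes(targets, threshold=2.0):
    """Greedy grouping: a consonant joins an existing group if its 3D target lies
    within `threshold` mm of the group's first member; each group is a candidate viseme."""
    groups = []   # list of (seed target, [phones])
    for phone, vec in targets.items():
        vec = np.asarray(vec, dtype=float)
        for seed, phones in groups:
            if np.linalg.norm(vec - seed) < threshold:
                phones.append(phone)
                break
        else:
            groups.append((vec, [phone]))
    return [phones for _, phones in groups]

print(group_visemes(targets))   # [['p', 'b', 'm'], ['f', 'v'], ['t', 'd']]
```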
Chapter
Full-text available
It is well known that changes in the rate of movement of the speech articulators can modify the bell-shaped profile of a typical velocity curve. In this experiment, four stutterers and four nonstutterers produced 10 sequences of /papapapa.../ and 10 of /bababa.../ at comfortable rate, and then at maximal rate. The kinematics of the opening and cl...
Conference Paper
Full-text available
A speaker-independent bimodal phonetic classification experiment regarding Italian plosive consonants is described. The phonetic classification scheme is based on a feedforward recurrent back-propagation neural network working on audio and visual information. The speech signal is processed by an auditory model producing spectral-like parameters, wh...
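As a structural illustration of the kind of network the abstract describes, the following minimal NumPy forward pass fuses per-frame audio spectral parameters with lip/jaw kinematic parameters and feeds them through an Elman-style recurrent layer to plosive-class posteriors. The dimensions, class set, random weights and class name are assumptions of the sketch, not the trained network of the paper.

```python
import numpy as np

AUDIO_DIM, VISUAL_DIM, HIDDEN, CLASSES = 24, 6, 32, 6  # e.g. /p t k b d g/

class ElmanClassifier:
    """Minimal Elman-style recurrent net over fused audio-visual frames."""
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        d_in = AUDIO_DIM + VISUAL_DIM
        self.W_in = rng.normal(0, 0.1, (HIDDEN, d_in))
        self.W_rec = rng.normal(0, 0.1, (HIDDEN, HIDDEN))
        self.W_out = rng.normal(0, 0.1, (CLASSES, HIDDEN))

    def forward(self, audio_frames, visual_frames):
        h = np.zeros(HIDDEN)
        for a, v in zip(audio_frames, visual_frames):
            x = np.concatenate([a, v])            # early fusion of the two channels
            h = np.tanh(self.W_in @ x + self.W_rec @ h)
        logits = self.W_out @ h
        e = np.exp(logits - logits.max())
        return e / e.sum()                        # class posteriors

# Hypothetical input: spectral-like audio params + lip/jaw kinematic params per frame.
T = 40
audio = np.random.default_rng(1).normal(size=(T, AUDIO_DIM))
visual = np.random.default_rng(2).normal(size=(T, VISUAL_DIM))
probs = ElmanClassifier().forward(audio, visual)
```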
Article
A phonetic classification scheme based on a feedforward recurrent back-propagation neural network working on audio and visual information is described. The speech signal is processed by an auditory model producing spectral-like parameters, while the visual signal is processed by specialised hardware, called ELITE, computing lip and jaw kinematic...
Article
Full-text available
This research focuses on the spatio-temporal characteristics of lip and jaw movements and on their relevance for lip-reading, bimodal communication theory and bimodal recognition applications. 3D visible articulatory targets for vowels and consonants are proposed. Relevant modifications of the spatio-temporal consonant targets due to coarticulatory...
Conference Paper
Full-text available
This paper illustrates the experimental setup developed at the CSRF of the CNR in Padova which, by coupling ELITE with a RION electropalatograph, allows the simultaneous recording and analysis of lip, jaw and tongue movements together with the acoustic output. The results thus obtained make it possible to investigate the character...
Article
Full-text available
There are two types of studies on disfluencies which are considered to be traditional. The first consists in the identification and quantification of the cognitive and situational variables that influence stutterers' linguistic performance. The second concerns the localization of the "loci" of the utterance associated with the stuttering oc...
Article
Full-text available
A bimodal automatic speech recognition system, in which the speech signal is synchronously analyzed by an audio channel producing spectral-like parameters every 2 ms and by a visual channel computing lip and jaw kinematic parameters, is described and some results are given for various speaker independent phonetic recognition experiments regarding t...
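As a rough illustration of how the two channels described above can be brought onto a common frame grid before recognition: the audio channel delivers spectral-like parameters every 2 ms (as stated in the abstract), while the visual kinematic track is assumed here to arrive every 10 ms and is linearly interpolated to the audio rate before fusion. The function name and every rate other than the 2 ms audio step are illustrative.

```python
import numpy as np

def synchronise(audio_feats, visual_feats, audio_step=0.002, visual_step=0.01):
    """Resample the slower visual track onto the 2-ms audio frame grid by
    linear interpolation, then concatenate the two channels per frame."""
    t_audio = np.arange(len(audio_feats)) * audio_step
    t_visual = np.arange(len(visual_feats)) * visual_step
    visual_up = np.column_stack([
        np.interp(t_audio, t_visual, visual_feats[:, k])
        for k in range(visual_feats.shape[1])
    ])
    return np.hstack([audio_feats, visual_up])

# Hypothetical data: 1 s of speech, 24 spectral params every 2 ms,
# 6 lip/jaw kinematic params every 10 ms.
audio = np.random.default_rng(0).normal(size=(500, 24))
visual = np.random.default_rng(1).normal(size=(100, 6))
fused = synchronise(audio, visual)   # shape (500, 30), one fused vector per 2-ms frame
```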
Conference Paper
Full-text available
A bimodal automatic speech recognition system, using simultaneously auditory model and articulatory parameters, is described. Results given for various speaker dependent phonetic recognition experiments, regarding the Italian plosive class, show the usefulness of this approach especially in noisy conditions
Article
Hesitation analysis of spontaneous production from three neologistic jargonaphasics is described. The results appear to differ from patient to patient as far as the relative proportion in the number and length of pauses before correct words and mistakes is concerned. Generalization of the conclusion beyond single cases may not therefore be legitima...
Article
Discrimination and identification of emotions in the human voice were studied in normal controls and in 4 groups of brain-damaged subjects, subdivided along the right/left and anterior/posterior dimensions. Results showed a failure of right-brain-damaged patients, the right posterior group being significantly worse than all the other groups. Q...
Article
Full-text available
1. INTRODUCTION In online teaching, the network is used essentially for the delivery of multimedia teaching material, both by the teacher and by the students, and for communication within learning communities. Dialogic interaction can be asynchronous (e-mail, forum, newsletter) or synchronous (chat, audio conferencing, vide...
Article
Full-text available
This work focuses on the description of the environment and the procedures utilized at ISTC-SPFD for the biometric data collection of visual face-related articulatory (spatio-temporal) movements useful for lip-reading, bimodal communication theory, and the development of bimodal talking head and bimodal speech recognition applications.
Article
The role of tutoring systems and the user modelling interface are central issues in e-learning community studies (Bevacqua et al., in press; Berry et al., 2005; De Carolis et al., 2005). In this paper we propose a system based on a text-to-speech 3D synthesised animated face incorporating cognitive, linguistic and phonetic features (and the co...

Citations

... It may also be the case that radial, compared to horizontal, visual motion shares more perceptual attributes with speech-related mouth movements (such as protrusions and rounding; Zmarich and Caldognetto, 2003) and speech time-varying acoustic forms that together contribute to shaping abstract speech representations in the human brain (Eberhardt et al., 2014). In fact, there is initial evidence that the visual cortex, when early deprived of visual inputs, shows even enhanced functional synchronicity to the acoustic temporal dynamics of speech in blind humans (Van Ackeren et al., 2018), suggesting that early visual and auditory regions may both contribute to the formation of abstracted speech representations. ...
... In their application to visual perception, phonological intelligibility tests make it possible to evaluate quantitatively and qualitatively the information extracted from visible articulatory movements. The results of such tests provide, on the basis of an a priori evaluation of the correct recognition scores, the intelligibility hierarchies for individual phonological units or for the phonetic classes of manner, place and voicing, while the overall evaluation of correct responses and confusion errors allows the a posteriori identification of groups of visual stimuli that are confused with one another because of their similarity, i.e. the visemes [1,2]. In this article we evaluate the effects on intelligibility of the quality of the stimuli (i.e. of the opposition between static and dynamic stimuli) and of the variability induced by the vocalic context on the visible characteristics of consonants (both in terms of 3D targets and of lip movements [3,4]). ...
... Finally we need to convert these 4 parameters into parameters that drive the facial model, i.e. into FAPs (see for example Figure 8). For the moment we chose sequences of the type /'aCa/ where C is one of the consonants /p, f, t, s, l, λ, /, i.e. the consonants most preferred in the identification tests of the visible articulatory movements [7, 8]. In fact, it is well known that the distinction, within homorganic consonants (as for instance /p, b, m/), between voiced and unvoiced consonants and between oral and nasal consonants is not visually detectable, because vocal fold and velum movements are not visible. ...
... Finally, the usefulness of research on the perception of visible articulatory movements is made explicit for the elaboration of theories on the perception of speech sounds (for a review cf. [5]) and for technological applications of bimodal speech synthesis, Talking Faces and Virtual Agents ([6]). ...
... The aim of this study is twofold: to explore the feasibility of using the Qualisys Mac Reflex motion tracking system to acquire dialogic speech, and to investigate the potential of analysing how specific head movements are used to signal feedback, showing that it is possible to measure and quantify the extent of these movements in the acquired data. So far, recordings with optoelectronic systems have mainly focused on the acquisition of short prompted utterances for the purpose of studying articulation [Hällgren & Lyberg 1998; Magno Caldognetto & Zmarich 1999; Granström, House & Beskow 2002], but also for the purpose of estimating face motion from the speech acoustics [Yehia, Kuratate & Vatikiotis-Bateson 2002]. The novelty of the method proposed here consists in the acquisition of semi-spontaneous dialogic speech using an opto-electronic motion tracking system for the purpose of studying feedback phenomena. ...
... AR is expressed as the number of syllables per second (syll/sec) of the uttered sequence, relative to the duration of the entire phonic chain (delimited by silent pauses) and excluding hesitations and disfluencies (cf. Zmarich, Magno Caldognetto & Ferrero, 1996). This index is computed for the NT only. ...
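Read literally, the AR index quoted above reduces to a simple ratio; the helper below is a sketch under one possible reading of that definition, in which hesitations and disfluencies are removed from both the syllable count and the timing. The function name and the example figures are illustrative.

```python
def articulation_rate(n_syllables, chain_duration, hesitation_duration=0.0):
    """Syllables per second over a phonic chain delimited by silent pauses,
    with hesitation/disfluency time excluded (one possible reading of the
    AR definition quoted above)."""
    return n_syllables / (chain_duration - hesitation_duration)

# e.g. 14 fluent syllables in a 3.2 s chain containing 0.4 s of hesitation:
ar = articulation_rate(14, 3.2, 0.4)   # = 5.0 syll/sec
```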
... Tab. 2. Conversion table of SSI-3 stuttering severity scores into population percentiles and corresponding qualitative severity levels. The Disfluency Profile is an index based on the classification of disfluencies proposed by Yairi and Ambrose (2005), according to which the disfluencies identifiable in the speech of children who stutter can be qualitatively divided into two categories: SLD and OD (for the relationship between disfluencies and the speech errors and speech repairs produced by stuttering and non-stuttering speakers, see Magno Caldognetto, Zmarich 1995). SLDs are the stuttering disfluencies proper and have already been described in Section 1. ODs, which can also be found in the speech of normally fluent speakers, include interjections, repetitions of polysyllabic words and of phrases, revisions, and sentence interruptions. ...
... Before describing the studies on phonetic encoding in stuttering, it should be noted that there are very many studies that have found differences between stutterers and normally fluent speakers on kinematic or acoustic measures of speech, but that these measures cannot be attributed directly and unambiguously to a specific production stage of Levelt's model. Of this kind are the acoustic studies that have found stutterers' speech to be slower (longer segmental and sub-segmental durations, such as VOT), and the kinematic studies showing that stutterers' articulatory gestures are characterised by longer duration, smaller amplitude and lower peak velocity than those of normally fluent speakers (Zmarich, Magno Caldognetto & Vagges, 1994; for further references cf. van Lieshout, Hulstijn & Peters, 2004; Max 2004). ...
... At this point we need to clarify that the objective of this work is to examine and classify the different descriptions of movement and the languages we use to convey them, and not the symbolic meanings that the movement might convey. This differentiates our work from multilayered models that have been presented to annotate gestural and bodily communication [1] [25]. Of course expressing emotions and feelings in dance are very important aspects and can imply a specific segmentation of movement, but this aspect falls outside the scope of our current work. ...
... p < .001 McNeill 1992; Poggi 2007), as also witnessed by literature on aphasics' gestures (Feyereisen 1983; Hadar et al. 1998; Magno Caldognetto and Poggi 1995). The same could be the case for iconics, metaphorics and deictics. ...