
Piero Cosi, Italian National Research Council | CNR · Section of Padova
E.E. Degree
About
167 Publications
39,750 Reads
1,763 Citations
Introduction
Additional affiliations
October 1997 - December 1997
OGI School of Science and Engineering
Position: Visiting Researcher (Auditory modelling and Automatic Speech Recognition research)
Publications
Publications (167)
In this study, we present an innovative technique for speaker adaptation in order to improve the accuracy of segmentation with application to unit-selection Text-To-Speech (TTS) systems. Unlike conventional techniques for speaker adaptation, which attempt to improve the accuracy of the segmentation using acoustic models that are more robust in the...
The integrity of phonetic perception abilities is necessary for normal future speech development. Since the ability to discriminate linguistic sounds is typically associated with the correct acquisition and production of the same sounds, an alteration of this ability could contribute to the onset of speech and language disorders. Suppor...
With the recent availability of industry-grade, high-performing engines for video games production, researchers in different fields have been exploiting the advanced technologies offered by these artefacts to improve the quality of the interactive experiences they design. While these engines provide excellent and easy-to-use tools to design interfa...
In this paper, we propose a new set of experiments to further evaluate the performance of a previously presented system based on an adaptive strategy for stimuli selection masked behind a gamified activity. This involves two virtual agents creating a social setting designed to support a narrative to engage young children. With respect to previously...
The main disadvantages of the existing methods for studying speech articulators (such as electromagnetic and optoelectronic systems) are the high cost and the discomfort to participants or patients. The aim of this work is to introduce a completely markerless low-cost 3D tracking technique in the context of speech articulation, and then compare it...
Social robots have the potential to provide support in a number of practical domains, such as learning and behaviour change. This potential is particularly relevant for children, who have proven receptive to interactions with social robots. To reach learning and therapeutic goals, a number of issues need to be investigated, notably the design of an...
Artificial companion agents are becoming increasingly important in the field of health care, particularly when children are involved, with the aim of providing novel educational tools, supporting communication between young patients and hospital personnel and taking on the role of entertainment robots. The principal application of the European FP7...
In this paper we present a web interface for studying Italian through access to read Italian literature. The system allows users to browse the content, search for specific words, and listen to the correct pronunciation produced by native speakers in a given context. This work aims at providing people who are interested in learning Italian with a new way o...
In this DEMO we present the first worldwide WebGL implementation of a talking head (LuciaWebGL), and also the first WebGL talking head running on iOS mobile devices (Apple iPhone and iPad).
In this Forced Alignment on Children Speech (FACS) task, systems are required to align audio recordings of sentences read aloud by children to the corresponding transcriptions; the task is to be considered speaker independent.
Artificial companion agents have the potential to combine novel means for effective health communication with young patients, support and entertainment. However, the theory and practice of long-term child-robot interaction is currently an under-developed area of research. This paper introduces an approach that integrates multiple functional aspect...
The work reported in this paper focuses on giving humanoid robots the capacity to express emotions with their body. Previous results show that adults are able to interpret different key poses displayed by a humanoid robot and also that changing the head position affects the expressiveness of the key poses in a consistent way. Moving the head down l...
This paper concerns the bimodal transmission of emotive speech and describes how the expression of joy, surprise, sadness, disgust, anger, and fear, leads to visual and acoustic target modifications in some Italian phonemes. Current knowledge on the audio-visual transmission of emotive speech traditionally concerns global prosodic and intonational c...
The Evalita 2011 contest proposed two forced alignment tasks, word and phone segmentation, and two modalities, “open” and “closed”. A system for each combination of task and modality has been proposed and submitted for evaluation. Direct use of Silence/Activity detection in forced alignment has been tested. Positive effects were shown in the acoust...
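The abstract above mentions feeding a silence/activity decision into forced alignment. As a rough sketch of what a frame-level energy-based activity detector does (not the submitted Evalita system; frame sizes and threshold are arbitrary choices):

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    """Label each frame as speech (True) or silence (False) by comparing
    its log energy to a fixed threshold below the loudest frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    log_e = np.array([10 * np.log10(np.mean(f ** 2) + 1e-12) for f in frames])
    return log_e > (log_e.max() + threshold_db)

# Toy example: 1 s of silence followed by a 1 s tone, at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 220 * t)])
mask = energy_vad(sig)
```

In a forced-alignment pipeline such a mask can constrain silence models to frames actually detected as non-speech, which is the kind of effect on acoustic boundaries the abstract refers to.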
For robots to interact effectively with human users they must be capable of coordinated, timely behavior in response to social context. The Adaptive Strategies for Sustainable Long-Term Social Interaction (ALIZ-E) project focuses on the design of long-term, adaptive social interaction between robots and child users in real-world settings. In this p...
In this paper we present the implementation of a WebGL Talking Head for iOS mobile devices (Apple iPhone and iPad). It works on standard MPEG-4 Facial Animation Parameters (FAPs) and speaks with the Italian version of FESTIVAL TTS. It is totally based on true real human data. The 3D kinematics information are used to create lips articulatory model...
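A talking head driven by MPEG-4 Facial Animation Parameters displaces facial feature points by FAP values scaled with face-specific units (FAPUs), so the same animation stream can drive differently proportioned faces. The fragment below is a hypothetical minimal illustration of that mechanism, not code from LuciaWebGL; vertex positions, axis, and the FAPU value are invented:

```python
import numpy as np

# Hypothetical illustration of MPEG-4 FAP-driven animation: each FAP moves
# one feature point along a fixed axis, scaled by a face-specific unit (FAPU).

def apply_fap(vertex, axis, fap_value, fapu):
    """Displace one feature point; the FAP value is expressed in FAPUs."""
    return vertex + axis * fap_value * fapu

lower_lip = np.array([0.0, -1.0, 0.5])   # neutral position (arbitrary)
open_axis = np.array([0.0, -1.0, 0.0])   # lip-opening direction (arbitrary)
MNS = 0.01                               # assumed mouth-nose-separation FAPU

moved = apply_fap(lower_lip, open_axis, fap_value=120, fapu=MNS)
```

A full player evaluates dozens of FAPs per frame and deforms the surrounding mesh around each feature point, but the per-point computation reduces to this scaled displacement.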
Luciaweb is a 3D Italian talking avatar based on the new WebGL technology. WebGL is the standard programming library to develop 3D computer graphics inside the web browsers. In the last year we developed a facial animation system based on this library to interact with the user in a bimodal way. The overall system is a client-server application usin...
LUCIA is an MPEG-4 facial animation system developed at ISTC-CNR. It works on standard Facial Animation Parameters and speaks with the Italian version of FESTIVAL TTS. To achieve an emotive/expressive talking head, LUCIA was built from real human data physically extracted by the ELITE optical-tracking movement analyzer. LUCIA can copy a real human being b...
Previous results show that adults are able to interpret different key poses displayed by the robot and also that changing the head position affects the expressiveness of the key poses in a consistent way. Moving the head down leads to decreased arousal (the level of energy), valence (positive or negative) and stance (approaching or avoiding) wherea...
Conversational systems play an important role in scenarios without a keyboard, e.g., talking to a robot. Communication in human-robot interaction (HRI) ultimately involves a combination of verbal and non-verbal inputs and outputs. HRI systems must process verbal and non-verbal observations and execute verbal and non-verbal actions in parallel, to i...
In this paper, we describe the application of two vocoder techniques for an experiment of spectral envelope transformation. We processed speech data in a neutral standard reading style in order to reproduce the spectral shapes of two emotional speaking styles: happy and sad. This was achieved by means of conversion functions which operate in the...
Voice quality is recognized to play an important role in the rendering of emotions in verbal communication. In this paper we explore the effectiveness of a processing framework for voice transformations finalized to the analysis and synthesis of emotive speech. We use a GMM-based model to compute the differences between an MBROLA voice and an ang...
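The GMM-based conversion mentioned above typically follows the classic formulation in which the converted spectral vector is a posterior-weighted sum of per-component linear regressions from source to target space. A one-dimensional toy sketch of that idea (all parameter values invented for illustration):

```python
import numpy as np

# Sketch of GMM-based spectral conversion: the converted value is a
# posterior-weighted sum of per-component regressions from source to target.

def gmm_convert(x, weights, mu_x, var_x, mu_y, cov_yx):
    # Posterior probability of each Gaussian component given the source frame x.
    like = weights * np.exp(-0.5 * (x - mu_x) ** 2 / var_x) / np.sqrt(var_x)
    post = like / like.sum()
    # Per-component regression: target mean plus scaled source deviation.
    y_i = mu_y + cov_yx / var_x * (x - mu_x)
    return float(np.dot(post, y_i))

# Invented two-component toy model.
weights = np.array([0.5, 0.5])
mu_x = np.array([0.0, 4.0]); var_x = np.array([1.0, 1.0])
mu_y = np.array([1.0, 9.0]); cov_yx = np.array([0.8, 0.8])

y = gmm_convert(0.0, weights, mu_x, var_x, mu_y, cov_yx)
```

A source value near the first component's mean is mapped close to that component's target mean, as the posterior of the other component is negligible there.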
This paper presents an Italian database of acted emotional speech and facial expressions. New data regarding the transition between emotional states has been collected. Although acted expressions have intrinsic limitations related to their naturalness, this method can be convenient for speech and face synthesis and within evaluation frameworks. Us...
INTERFACE is an integrated software package implemented in Matlab© and created to speed up the procedure for building an emotive/expressive talking head. Various processing tools, working on dynamic articulatory data physically extracted by an optotracking 3D movement analyzer called ELITE, were implemented to build the animation engine and also to creat...
In order to speed up the procedure for building an emotive/expressive talking head such as LUCIA, an integrated software package called INTERFACE was designed and implemented in Matlab©. INTERFACE simplifies and automates many of the operations needed for that purpose. A set of processing tools, focusing mainly on dynamic articulatory data physically extra...
The topic of this work is an extension of our previous research on the development of a general data-driven procedure for creating a neutral "narrative-style" prosodic module for the Italian FESTIVAL Text-To-Speech (TTS) synthesizer, and it is focused on investigating and implementing new strategies for building a new emotional FESTIVAL TTS. The ne...
This work was conducted with the specific goals of developing improved recognition of children's speech in Italian and the integration of the children's speech recognition models into the Italian version of the Colorado Literacy Tutor platform. Specifically, children's speech recognition research for Italian was conducted using the ITC-irst Childre...
This paper describes how the visual characteristics of some Italian phones (/’a/, /b/, /v/) are modified in emotive speech by the expression of the “big six” emotions: joy, surprise, sadness, disgust, anger, and fear. In this research we specifically analyze the interaction between the articulatory lip targets of the Italian vowel /’a/ and consonan...
A general data-driven procedure for creating new prosodic modules for the Italian FESTIVAL Text-To-Speech (TTS) [1] synthesizer is described. These modules are based on the "Classification and Regression Trees" (CART) theory. The prosodic factors taken into consideration are: duration, pitch and loudness. Loudness control has been implemented as an...
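As an illustration of the CART idea behind such prosodic modules, the sketch below implements the elementary step of a regression tree: choosing the binary split that best separates, e.g., phone durations. The feature and the data are invented; real modules use many linguistic features and fully recursive trees:

```python
import numpy as np

# Core CART step: pick the split of one feature that minimises the summed
# squared error of the two leaf means. Toy feature: stressed? (0/1);
# toy target: phone duration in milliseconds (invented numbers).

def best_stump(x, y):
    best = None
    for thr in np.unique(x)[:-1]:
        left, right = y[x <= thr], y[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    return best[1:]  # threshold, left-leaf mean, right-leaf mean

stressed = np.array([0, 0, 0, 1, 1, 1])
dur_ms = np.array([60.0, 65.0, 70.0, 95.0, 100.0, 105.0])
thr, unstressed_ms, stressed_ms = best_stump(stressed, dur_ms)
```

At synthesis time the tree is traversed with the features of each phone, and the leaf mean becomes the predicted duration (or pitch, or loudness) target.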
The Italian Literacy Tutor (ILT) is a project designed to implement a program of individualized, computer-aided reading instruction with the potential to dramatically improve reading achievement and learning from text in Italian, especially considering L2 teaching/learning frameworks or working with children with reading disabilities. The Italian L...
The aim of the research is the phonetic articulatory description of emotive speech achievable studying the labial movements, which are the product of the compliance with both the phonetic-phonological constraints and the lip configuration required for the visual encoding of emotions. In this research we analyse the interaction between labial config...
This paper reports the results of a preliminary cross-evaluation experiment run in the framework of the European research project PF-Star, with the double aim of evaluating the possibility of exchanging FAP data between the involved sites and assessing the adequacy of the emotional facial gestures performed by talking heads. The results provide...
Audio/visual speech, in the form of labial movement and facial expression data, was utilized in order to semi-automatically build a new Italian expressive and emotive talking head capable of believable and emotional behavior. The methodology, the procedures and the specific software tools utilized for this scope will be described together wit...
Despite the growing attention towards the communication adequacy of embodied conversational agents (ECAs), standards for their assessment are still missing. This paper reports about a methodology for the evaluation of the adequacy of facial displays in the expression of some basic emotional states, based on a recognition task. We consider recogniti...
In this work, the development of Baldini, an Italian version of Baldi, a computer-animated conversational agent, is presented.
This paper describes how the visual and acoustic characteristics of some Italian phones (/'a/, /b/, /v/) are modified in emotive speech by the expression of joy, surprise, sadness, disgust, anger, and fear. In this research we specifically analyze the interaction between labial configurations, peculiar to each emotion, and the articulatory lip move...
LUCIA, a new Italian talking head based on a modified version of the Cohen-Massaro labial coarticulation model, is described. A semi-automatic minimization technique, working on real kinematic data acquired by the ELITE optoelectronic system, was used to train the dynamic characteristics of the model. LUCIA is an MPEG-4 standard facial animatio...
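For readers unfamiliar with the Cohen-Massaro coarticulation scheme that LUCIA's lip model modifies, the core idea can be sketched as follows: each segment contributes an articulatory target and an exponentially decaying dominance function, and the articulator trajectory is the dominance-weighted mean of all targets. All parameter values below are illustrative, not LUCIA's trained ones:

```python
import numpy as np

# Cohen-Massaro-style coarticulation sketch: dominance functions decay away
# from each segment centre; the trajectory blends targets by dominance.

def dominance(t, center, alpha, theta, c=1.0):
    return alpha * np.exp(-theta * np.abs(t - center) ** c)

def trajectory(t, centers, targets, alphas, thetas):
    doms = np.array([dominance(t, c0, a, th)
                     for c0, a, th in zip(centers, alphas, thetas)])
    # Weighted average of targets at every time point.
    return (doms * np.array(targets)[:, None]).sum(0) / doms.sum(0)

t = np.linspace(0.0, 0.3, 4)
# Two segments: lip-opening targets 0.2 and 0.8 centred at 0.1 s and 0.2 s.
traj = trajectory(t, centers=[0.1, 0.2], targets=[0.2, 0.8],
                  alphas=[1.0, 1.0], thetas=[20.0, 20.0])
```

Because targets are blended rather than reached exactly, neighbouring segments pull the trajectory toward their own targets, which is precisely the coarticulation effect the model captures.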
SUMMARY: This article describes the work carried out to define and acquire a database of read Italian speech, annotated at the morpho-syntactic level, at the syntactic constituent level, and at the prosodic level according to the ToBI formalism. In particular, we present here the corpus and the prosodic annotation (conventions, me...
This paper concerns the bimodal transmission of emotive speech and describes how the expression of joy, surprise, sadness, disgust, anger, and fear, leads to visual and acoustic target modifications in some Italian phonemes. Current knowledge on the audio-visual transmission of emotive speech traditionally concerns global prosodic and intonational...
Voice quality is recognized to play an important role in the rendering of emotions in verbal communication. In this paper we explore the effectiveness of a sinusoidal modeling processing framework for voice transformations finalized to the analysis and synthesis of emotive speech. A set of acoustic cues is selected to compare the voice quality...
In this work, a slightly modified version of the original PaIntE model, based on an F0 parametrization with an especially designed approximation function, is considered. The model's parameters have been automatically optimized using a small set of Italian ToBI labeled sentences. This method drives our ongoing data-based approach to intonation model...
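PaIntE approximates the F0 contour of a pitch accent with a peak amplitude and two sigmoidal flanks. The sketch below is a generic sum-of-sigmoids shape in that spirit, not the exact PaIntE approximation function; all parameter values are invented:

```python
import numpy as np

# PaIntE-style accent shape: peak amplitude d minus a rising flank of depth
# c1 (before the peak position b) and a falling flank of depth c2 (after it).
# Generic sketch only; not the exact PaIntE parametrization.

def accent_shape(x, d, c1, c2, a1, a2, b, g=1.0):
    rise = c1 / (1.0 + np.exp(-a1 * (b - x) + g))   # tends to c1 far left of b
    fall = c2 / (1.0 + np.exp(-a2 * (x - b) + g))   # tends to c2 far right of b
    return d - rise - fall

x = np.linspace(-2.0, 3.0, 501)                      # normalized time axis
f0 = accent_shape(x, d=200.0, c1=60.0, c2=40.0, a1=8.0, a2=8.0, b=0.5)
```

Fitting such a function to measured F0 contours yields a small set of parameters per accent, which is what makes this kind of parametrization convenient for data-driven intonation modelling.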
A modified version of the coarticulation model proposed by Cohen and Massaro (1993) is described. A semi-automatic minimization technique, working on real kinematic data acquired by the ELITE opto-electronic system, was used to train the dynamic characteristics of the model. Finally, the model was applied with success to GRETA, an Italian talking...
Our goal is to create a natural talking face with, in particular, lip-readable movements. Based on real data extracted from an Italian speaker with the ELITE system, we have approximated the data using radial basis functions. In this paper we present our 3D facial model based on the MPEG-4 standard and our computational model of lip moveme...
The development of a high-performance telephone-bandwidth speaker independent connected digit recognizer for Italian is described. The CSLU Speech Toolkit was used to develop and implement the hybrid ANN/HMM system, which is trained on context-dependent categories to account for coarticulatory variation. Various front-end processing and system arch...
The development of a speaker independent connected "digits" recognizer for Italian is described. The CSLU Speech Toolkit was used to develop and implement the system, which is based on a hybrid ANN/HMM architecture. The recognizer is trained on context-dependent categories to account for coarticulatory variation. Various front-end processing was c...
The development of a speaker independent "general purpose" phonetic recognizer for Italian is described. The CSLU Toolkit was used to develop and implement the system. The recognizer, based on a frame-based hybrid HMM/ANN architecture trained on context-dependent categories to account for coarticulatory variation, recognizes 38 different phonemes (...
In this paper we describe the design of a phoneme classifier that is based on AIDA, a speech database that has recently been proposed as a standard for Italian at the phonetic level. We present experimental results using LVQ and show that the proper selection of Kohonen's learning parameter α, based on some intriguing links with Backpropagat...
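LVQ with Kohonen's learning rate works by nudging the nearest prototype toward each training vector when their classes agree and away when they disagree. A minimal LVQ1 update step on toy data (all values invented):

```python
import numpy as np

# One LVQ1 training step: find the nearest prototype to the input vector and
# move it toward the input if the class labels match, away otherwise.
# The step size is Kohonen's learning rate (alpha).

def lvq1_step(protos, proto_labels, x, label, alpha=0.1):
    d = np.linalg.norm(protos - x, axis=1)     # distance to every prototype
    w = int(d.argmin())                         # index of the winner
    sign = 1.0 if proto_labels[w] == label else -1.0
    protos[w] += sign * alpha * (x - protos[w])
    return w

protos = np.array([[0.0, 0.0], [1.0, 1.0]])     # two toy prototypes
labels = ["a", "b"]
winner = lvq1_step(protos, labels, np.array([0.2, 0.0]), "a", alpha=0.5)
```

Repeating this step over a labeled training set, while shrinking alpha, pulls the prototypes toward class-typical regions of the feature space; the paper's point is that how alpha is scheduled materially affects the resulting classifier.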
This paper deals with the most recent results obtained by applying the CSLU Toolkit frame-based hybrid HMM/ANN architecture to the connected digit recognition task for the Italian language. The hybrid architecture for speaker independent recognition is described, and the latest results are presented in detail.
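The hybrid HMM/ANN idea recurring in these digit-recognition papers rests on converting the network's frame-level state posteriors into scaled likelihoods for the HMM decoder, since p(x|q) is proportional to P(q|x)/P(q). A minimal sketch, with invented numbers:

```python
import numpy as np

# Hybrid ANN/HMM core: the ANN outputs state posteriors P(q|x); dividing by
# the state priors P(q) gives likelihoods up to a constant factor p(x),
# which the Viterbi decoder can use in place of Gaussian emission densities.
# Posterior and prior values below are invented for illustration.

def scaled_log_likelihoods(posteriors, priors):
    return np.log(posteriors) - np.log(priors)

posteriors = np.array([[0.7, 0.2, 0.1],   # one row per frame
                       [0.1, 0.8, 0.1]])
priors = np.array([0.5, 0.3, 0.2])        # relative state frequencies
loglik = scaled_log_likelihoods(posteriors, priors)
```

The prior division matters: frequent states would otherwise dominate decoding simply because the network sees them more often in training.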