Piero Cosi

Italian National Research Council | CNR · Section of Padova

E.E. Degree

About

166
Publications
36,116
Reads
1,678
Citations
Citations since 2016
5 Research Items
550 Citations
Additional affiliations
September 2008 - October 2008
The University of Sydney
Position
  • Speech Signal Processing
Description
  • Visiting researcher CNR - Short-Term Mobility Program 2008
June 2003 - July 2003
University of Colorado Boulder
Position
  • CSLR International Workshop
Description
  • Visiting researcher at “CSLR International Workshop” June 30, 2003 - July 25, 2003.
October 1997 - December 1997
OGI School of Science and Engineering
Position
  • Auditory modelling and Automatic Speech Recognition research
Description
  • Visiting Researcher (Auditory modelling and Automatic Speech Recognition research)

Publications (166)
Preprint
Full-text available
In this study, we present an innovative technique for speaker adaptation in order to improve the accuracy of segmentation with application to unit-selection Text-To-Speech (TTS) systems. Unlike conventional techniques for speaker adaptation, which attempt to improve the accuracy of the segmentation using acoustic models that are more robust in the...
Chapter
Full-text available
The integrity of phonetic perception abilities is necessary for normal future speech development. Since the ability to discriminate linguistic sounds is typically associated with the correct acquisition and production of the same sounds, an alteration of this ability could contribute to the onset of speech and language disorders. Suppor...
Article
Full-text available
With the recent availability of industry-grade, high-performing engines for video games production, researchers in different fields have been exploiting the advanced technologies offered by these artefacts to improve the quality of the interactive experiences they design. While these engines provide excellent and easy-to-use tools to design interfa...
Conference Paper
Full-text available
In this paper, we propose a new set of experiments to further evaluate the performance of a previously presented system based on an adaptive strategy for stimuli selection masked behind a gamified activity. This involves two virtual agents creating a social setting designed to support a narrative to engage young children. With respect to previously...
Conference Paper
Full-text available
The main disadvantages of the existing methods for studying speech articulators (such as electromagnetic and optoelectronic systems) are the high cost and the discomfort to participants or patients. The aim of this work is to introduce a completely markerless low-cost 3D tracking technique in the context of speech articulation, and then compare it...
Article
Full-text available
The main disadvantages of the existing methods for studying speech articulators (such as electromagnetic and optoelectronic systems) are the high cost and the discomfort to participants or patients. The aim of this work is to introduce a completely markerless low-cost 3D tracking technique in the context of speech articulation, and then compare it...
Article
Full-text available
Social robots have the potential to provide support in a number of practical domains, such as learning and behaviour change. This potential is particularly relevant for children, who have proven receptive to interactions with social robots. To reach learning and therapeutic goals, a number of issues need to be investigated, notably the design of an...
Conference Paper
Full-text available
Chapter
Artificial companion agents are becoming increasingly important in the field of health care, particularly when children are involved, with the aim of providing novel educational tools, supporting communication between young patients and hospital personnel and taking on the role of entertainment robots. The principal application of the European FP7...
Conference Paper
Full-text available
In this paper we present a web interface for studying Italian through access to read Italian literature. The system allows users to browse the content, search for specific words, and listen to the correct pronunciation produced by native speakers in a given context. This work aims at providing people who are interested in learning Italian with a new way o...
Article
Full-text available
In this demo we present the world's first WebGL implementation of a talking head (LuciaWebGL), and also the first WebGL talking head running on iOS mobile devices (Apple iPhone and iPad).
Conference Paper
Full-text available
In this Forced Alignment on Children Speech (FACS) task, systems are required to align audio sequences of sentences read aloud by children to the corresponding transcriptions provided, and the task is to be considered speaker independent.
Article
Full-text available
Artificial companion agents have the potential to combine novel means for effective health communication with young patients support and entertainment. However, the theory and practice of long-term child-robot interaction is currently an under-developed area of research. This paper introduces an approach that integrates multiple functional aspect...
Article
Full-text available
The work reported in this paper focuses on giving humanoid robots the capacity to express emotions with their body. Previous results show that adults are able to interpret different key poses displayed by a humanoid robot and also that changing the head position affects the expressiveness of the key poses in a consistent way. Moving the head down l...
Article
Full-text available
This paper concerns the bimodal transmission of emotive speech and describes how the expression of joy, surprise, sadness, disgust, anger, and fear, leads to visual and acoustic target modifications in some Italian phonemes. Current knowledge on the audio-visual transmission of emotive speech traditionally concerns global prosodic and intonational c...
Chapter
Full-text available
The Evalita 2011 contest proposed two forced alignment tasks, word and phone segmentation, and two modalities, “open” and “closed”. A system for each combination of task and modality has been proposed and submitted for evaluation. Direct use of Silence/Activity detection in forced alignment has been tested. Positive effects were shown in the acoust...
Article
Full-text available
For robots to interact effectively with human users they must be capable of coordinated, timely behavior in response to social context. The Adaptive Strategies for Sustainable Long-Term Social Interaction (ALIZ-E) project focuses on the design of long-term, adaptive social interaction between robots and child users in real-world settings. In this p...
Conference Paper
Full-text available
In this paper we present the implementation of a WebGL Talking Head for iOS mobile devices (Apple iPhone and iPad). It works on standard MPEG-4 Facial Animation Parameters (FAPs) and speaks with the Italian version of FESTIVAL TTS. It is based entirely on real human data. The 3D kinematic information is used to create lips articulatory model...
Conference Paper
Full-text available
Luciaweb is a 3D Italian talking avatar based on the new WebGL technology. WebGL is the standard programming library for developing 3D computer graphics inside web browsers. In the last year we developed a facial animation system based on this library to interact with the user in a bimodal way. The overall system is a client-server application usin...
Conference Paper
Full-text available
LUCIA is an MPEG-4 facial animation system developed at ISTC-CNR. It works on standard Facial Animation Parameters and speaks with the Italian version of FESTIVAL TTS. To achieve an emotive/expressive talking head, LUCIA was built from real human data physically extracted by the ELITE optic-tracking movement analyzer. LUCIA can copy a real human being b...
Conference Paper
Full-text available
Previous results show that adults are able to interpret different key poses displayed by the robot and also that changing the head position affects the expressiveness of the key poses in a consistent way. Moving the head down leads to decreased arousal (the level of energy), valence (positive or negative) and stance (approaching or avoiding) wherea...
Chapter
Full-text available
Conversational systems play an important role in scenarios without a keyboard, e.g., talking to a robot. Communication in human-robot interaction (HRI) ultimately involves a combination of verbal and non-verbal inputs and outputs. HRI systems must process verbal and non-verbal observations and execute verbal and non-verbal actions in parallel, to i...
Article
Full-text available
In this paper, we describe the application of two vocoder techniques for an experiment of spectral envelope transformation. We processed speech data in a neutral standard reading style in order to reproduce the spectral shapes of two emotional speaking styles: happy and sad. This was achieved by means of conversion functions which operate in the...
Conference Paper
Full-text available
Voice quality is recognized to play an important role for the rendering of emotions in verbal communication. In this paper we explore the effectiveness of a processing framework for voice transformations aimed at the analysis and synthesis of emotive speech. We use a GMM-based model to compute the differences between an MBROLA voice and an ang...
Article
Full-text available
This paper presents an Italian database of acted emotional speech and facial expressions. New data regarding the transition between emotional states have been collected. Although acted expressions have intrinsic limitations related to their naturalness, this method can be convenient for speech and face synthesis and within evaluation frameworks. Us...
Conference Paper
Full-text available
INTERFACE is an integrated software tool implemented in Matlab© and created to speed up the procedure for building an emotive/expressive talking head. Various processing tools, working on dynamic articulatory data physically extracted by an optotracking 3D movement analyzer called ELITE, were implemented to build the animation engine and also to creat...
Conference Paper
Full-text available
In order to speed up the procedure for building an emotive/expressive talking head such as LUCIA, an integrated software tool called INTERFACE was designed and implemented in Matlab©. INTERFACE simplifies and automates many of the operations needed for that purpose. A set of processing tools, focusing mainly on dynamic articulatory data physically extra...
Conference Paper
Full-text available
The topic of this work is an extension of our previous research on the development of a general data-driven procedure for creating a neutral "narrative-style" prosodic module for the Italian FESTIVAL Text-To-Speech (TTS) synthesizer, and it is focused on investigating and implementing new strategies for building a new emotional FESTIVAL TTS. The ne...
Conference Paper
Full-text available
This work was conducted with the specific goals of developing improved recognition of children's speech in Italian and the integration of the children's speech recognition models into the Italian version of the Colorado Literacy Tutor platform. Specifically, children's speech recognition research for Italian was conducted using the ITC-irst Childre...
Article
This paper describes how the visual characteristics of some Italian phones (/’a/, /b/, /v/) are modified in emotive speech by the expression of the “big six” emotions: joy, surprise, sadness, disgust, anger, and fear. In this research we specifically analyze the interaction between the articulatory lip targets of the Italian vowel /’a/ and consonan...
Article
Full-text available
A general data-driven procedure for creating new prosodic modules for the Italian FESTIVAL Text-To-Speech (TTS) [1] synthesizer is described. These modules are based on the "Classification and Regression Trees" (CART) theory. The prosodic factors taken into consideration are: duration, pitch and loudness. Loudness control has been implemented as an...
Conference Paper
The Italian Literacy Tutor (ILT) is a project designed to implement a program of individualized, computer-aided reading instruction with the potential to dramatically improve reading achievement and learning from text in Italian, especially considering L2 teaching/learning frameworks or working with children with reading disabilities. The Italian L...
Conference Paper
Full-text available
The aim of the research is the phonetic articulatory description of emotive speech, achievable by studying the labial movements, which are the product of the compliance with both the phonetic-phonological constraints and the lip configuration required for the visual encoding of emotions. In this research we analyse the interaction between labial config...
Conference Paper
Full-text available
This paper reports the results of a preliminary cross-evaluation experiment run in the framework of the European research project PF-Star, with the double aim of evaluating the possibility of exchanging FAP data between the involved sites and assessing the adequacy of the emotional facial gestures performed by talking heads. The results provide...
Conference Paper
Full-text available
Audio/visual speech, in the form of labial movement and facial expression data, was utilized in order to semi-automatically build a new Italian expressive and emotive talking head capable of believable and emotional behavior. The methodology, the procedures and the specific software tools utilized for this purpose will be described together wit...
Conference Paper
Full-text available
Despite the growing attention towards the communication adequacy of embodied conversational agents (ECAs), standards for their assessment are still missing. This paper reports on a methodology for the evaluation of the adequacy of facial displays in the expression of some basic emotional states, based on a recognition task. We consider recogniti...
Article
Full-text available
In this work, the development of Baldini, an Italian version of Baldi, a computer-animated conversational agent, is presented.
Article
This paper describes how the visual and acoustic characteristics of some Italian phones (/'a/, /b/, /v/) are modified in emotive speech by the expression of joy, surprise, sadness, disgust, anger, and fear. In this research we specifically analyze the interaction between labial configurations, peculiar to each emotion, and the articulatory lip move...
Conference Paper
Full-text available
LUCIA, a new Italian talking head based on a modified version of the Cohen-Massaro labial coarticulation model, is described. A semi-automatic minimization technique, working on real kinematic data acquired by the ELITE optoelectronic system, was used to train the dynamic characteristics of the model. LUCIA is an MPEG-4 standard facial animatio...
Article
Full-text available
This paper describes the work carried out to define and acquire a database of read Italian speech, annotated at the morpho-syntactic level, at the syntactic constituent level, and at the prosodic level according to the ToBI formalism. In particular, we present here the corpus, the prosodic annotation (conventions, me...
Article
Full-text available
This paper concerns the bimodal transmission of emotive speech and describes how the expression of joy, surprise, sadness, disgust, anger, and fear, leads to visual and acoustic target modifications in some Italian phonemes. Current knowledge on the audio-visual transmission of emotive speech traditionally concerns global prosodic and intonational...
Article
Full-text available
Voice quality is recognized to play an important role for the rendering of emotions in verbal communication. In this paper we explore the effectiveness of a sinusoidal modeling processing framework for voice transformations aimed at the analysis and synthesis of emotive speech. A set of acoustic cues is selected to compare the voice quality...
Conference Paper
Full-text available
In this work, a slightly modified version of the original PaIntE model, based on an F0 parametrization with a specially designed approximation function, is considered. The model's parameters have been automatically optimized using a small set of Italian ToBI-labeled sentences. This method drives our ongoing data-based approach to intonation model...