Figure 2 - uploaded by Branislav Gerazov
Example Praat annotation of a French utterance: "Son bagou pourrait faciliter la communauté." containing functional contours: declaration (DC), dependency to the left/right (DG/DD), and cliticisation (XX). 

Contexts in source publication

Context 1
... the SFC are only a small part of the PySFC ecosystem, with the majority of code going into the necessary tools for working with the data, and to a lesser extent into the plotting functionalities. Currently, PySFC supports the proprietary SFC fpro file format as well as standard Praat TextGrid annotations. An example Praat annotation is shown in Fig. 2. The levels that are important for the PySFC are the phonetic (PHON) and syllable (SYLL) interval tiers, which are used for determining the vocalic nuclei and RU boundaries, and the linguistic function tiers. The latter are point tiers in which each function's scope is marked by a start and end point (":FF") and the anchor RU is ...
Context 2
... and RU boundaries, and the linguistic function tiers. The latter are point tiers in which each function's scope is marked by a start and end point (":FF") and the anchor RU is marked with a landmark (":" plus the function type, e.g. ":DC"). If the function has no left or right context, the landmark itself delimits the scope, e.g. for DC. In Fig. 2, there is a PHRASE tier that holds the attitude, NIV1 with an overlapped dependency to the left (DG) and a dependency to the right (DD), and NIV2 with three clitic (XX) contours. The PySFC decomposition of the example annotated here is shown later in Fig. ...
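The point-tier convention described above can be sketched in code. This is a hypothetical illustration, not PySFC's actual parser: it assumes each contour on a tier is encoded as an optional ":FF" start point, one landmark (":" plus the function type), and an optional ":FF" end point, with no points shared between consecutive contours.

```python
def parse_function_tier(points):
    """Parse one linguistic function point tier into contour scopes.

    points: ordered (time, label) pairs, where ":FF" delimits a scope
    and ":" + function type (e.g. ":DC") marks the anchor RU.
    Returns a list of dicts with the function type, anchor time, and
    (start, end) scope. A landmark with no surrounding ":FF" points
    delimits the scope by itself, as for DC above.
    """
    contours, i = [], 0
    while i < len(points):
        start = None
        if points[i][1] == ":FF":           # scope opens before the anchor
            start, i = points[i][0], i + 1
        anchor, label = points[i]           # landmark point
        i += 1
        end = anchor                        # default: landmark delimits scope
        if i < len(points) and points[i][1] == ":FF":
            end, i = points[i][0], i + 1    # scope closes after the anchor
        contours.append({"type": label.lstrip(":"),
                         "anchor": anchor,
                         "scope": (anchor if start is None else start, end)})
    return contours
```

For example, a tier holding a DG contour spanning 0.10-1.20 s with its anchor at 0.80 s, followed by a bare DC landmark at 1.90 s, yields two contours with scopes (0.10, 1.20) and (1.90, 1.90).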

Citations

... Modelling speech intonation and associated F0 contours is a challenging task that has been faced in the past decades for a variety of speech applications: text-to-speech, voice identity conversion, and speech emotion conversion, among others. The representation of such F0 variations is a challenging task for at least two main reasons: first, the F0 sequence corresponding to a speech signal is discontinuous by nature: F0 values are only defined over speech segments that are voiced, and undefined otherwise; second, the F0 varies over multiple time scales associated with pre-defined linguistic units (e.g., syllable, phrase) or with latent units [3]. Accordingly, a number of models have been proposed to model F0 variations: 1) Basically, as a linear sequence of F0 values defined at each time step, either from discontinuous raw F0 values or from continuous interpolated F0 values over voiced instants. ...
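The second representation mentioned above, a continuous contour interpolated over voiced instants, can be sketched as follows. This is a minimal illustration, assuming a frame-wise F0 array in which unvoiced frames are marked with 0:

```python
import numpy as np

def interpolate_f0(f0, unvoiced_value=0.0):
    """Make a discontinuous F0 track continuous by linear interpolation
    over unvoiced frames.

    f0: 1-D array of frame-wise F0 values, with `unvoiced_value` marking
    unvoiced frames. Edges are held at the nearest voiced value.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 != unvoiced_value
    if not voiced.any():
        return f0.copy()                    # nothing to interpolate from
    idx = np.arange(len(f0))
    # np.interp fills between voiced frames and clamps at the edges
    return np.interp(idx, idx[voiced], f0[voiced])
```

For instance, `interpolate_f0([0, 100, 0, 0, 130, 0])` fills the unvoiced gap linearly (110, 120) and extends the edge values outward.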
Conference Paper
Full-text available
Voice interfaces are becoming wildly popular and driving demand for more advanced speech synthesis and voice transformation systems. Current text-to-speech methods produce realistic sounding voices, but they lack the emotional expressivity that listeners expect, given the context of the interaction and the phrase being spoken. Emotional voice conversion is a research domain concerned with generating expressive speech from neutral synthesised speech or natural human voice. This research investigated the effectiveness of using a sequence-to-sequence (seq2seq) encoder-decoder based model to transform the intonation of a human voice from neutral to expressive speech, with some preliminary introduction of linguistic conditioning. A subjective experiment conducted on the task of speech emotion recognition by listeners successfully demonstrated the effectiveness of the proposed sequence-to-sequence models to produce convincing voice emotion transformations. In particular, conditioning the model on the position of the syllable in the phrase significantly improved recognition rates.
... The syllable pitch contour submodel differs from conventional pitch decomposition models, such as the superposition of functional contours (SFC) model [63] and its new version, the Python implementation of the SFC (PySFC) [64]. The submodel uses both linguistic features and prosodic tags as its AFs and thus involves underlying forms of both the syntactic structure of text and the prosodic structure of utterance. The automatic labeling of break performed by the modified PLM algorithm provides prosodic structure information, which is used to solve the problem of mismatch between syntactic and prosodic structures. ...
Article
In this paper, a hierarchical prosody model (HPM)-based method for Mandarin spontaneous speech is proposed. First, an HPM is designed for describing relations among acoustic features of utterances, linguistic features of texts, and prosodic tags representing the underlying hierarchical prosodic structures of utterances. Subsequently, a sequential optimization algorithm is employed to train the HPM based on a large conversational speech corpus, the Mandarin Conversational Dialogue Corpus (MCDC), which features orthographic transcriptions and prosodic event annotations. In this unsupervised training method, all utterances of the MCDC are labeled with two types of prosodic tags, namely, break and prosodic states, automatically and simultaneously. After training, the HPM parameters are examined to identify critical prosodic properties of Mandarin spontaneous speech, which are then compared with their counterparts in the read-speech HPM. The prosodic tags on the studied utterances enable mapping of various prosodic events onto the hierarchical prosodic structures of the utterances. Prosodic analyses of some disfluent events are conducted using the prosodic tags affixed to the MCDC. Finally, an application of the HPM to assist in Mandarin spontaneous-speech recognition is discussed. Significant relative error rate reductions of 9.0%, 9.2%, 15.6%, and 7.3% are obtained for base-syllable, character, tone, and word recognition, respectively.
... The example shows the extracted elementary contours for the annotated linguistic functions: declaration (DC), dependency to the left/right (DG/DD), and cliticisation (DV, XX). Decomposition was done using the PySFC system, and the figures are taken from (Gerazov and Bailly, 2018). ...
Conference Paper
Full-text available
Prosody in speech is used to communicate a variety of linguistic, paralinguistic and non-linguistic information via multiparametric contours. The Superposition of Functional Contours (SFC) model is capable of extracting the average shape of these elementary contours through iterative analysis-by-synthesis training of neural network contour generators (CGs). The Weighted SFC (WSFC) model is an extension to the SFC that can capture the prominence of each functional contour in the final prosody. Finally, the recently proposed Variational Prosody Model (VPM) is able, in addition, to capture a part of the functional contours' variance. Its variational CGs (VCGs) use the linguistic context input to map out a prosodic latent space for each contour. Here we propose an extension on the VPM based on variance embedding and recurrent neural network contour generators (VRCGs). This approach decouples the prosodic latent space from the length of the contour's scope, thus it can now be readily explored even for longer contours.
... The built-in TTS system (named COMPOST) was first designed by Alissali et al [AB93] at GIPSA-lab. It controls several processing stages (text preprocessing, morphological analyzer, part-of-speech tagger, letter-to-sound pronunciation, prosody generation and corpus- [GB18]; [GBX18]). ...
Thesis
A socially assistive robot (SAR) is meant to engage people in situated interaction such as monitoring physical exercise, neuropsychological rehabilitation or cognitive training. While the interactive behavioral policies of such systems are mainly hand-scripted, we discuss here key features of the training of multimodal interactive behaviors in the framework of the SOMBRERO project. In our work, we used learning by demonstration in order to provide the robot with adequate skills for performing collaborative tasks in human centered environments. There are three main steps of learning interaction by demonstration: we should (1) collect representative interactive behaviors from human coaches; (2) build comprehensive models of these overt behaviors while taking into account a priori knowledge (task and user model, etc.); and then (3) provide the target robot with appropriate gesture controllers to execute the desired behaviors. Multimodal HRI (Human-Robot Interaction) models are mostly inspired by Human-Human interaction (HHI) behaviors. Transferring HHI behaviors to HRI models faces several issues: (1) adapting the human behaviors to the robot's interactive capabilities with regards to its physical limitations and impoverished perception, action and reasoning capabilities; (2) the drastic changes of human partner behaviors in front of robots or virtual agents; (3) the modeling of joint interactive behaviors; (4) the validation of the robotic behaviors by human partners until they are perceived as adequate and meaningful. In this thesis, we study and make progress over those four challenges. In particular, we solve the two first issues (transfer from HHI to HRI) by adapting the scenario and using immersive teleoperation. In addition, we use Recurrent Neural Networks to model multimodal interactive behaviors (such as speech, gaze, arm movements, head motion, backchannels) that surpass traditional methods (Hidden Markov Model, Dynamic Bayesian Network, etc.)
in both accuracy and coordination between the modalities. We also build and evaluate a proof-of-concept autonomous robot to perform the tasks.
... An example SFC decomposition of the intonation of a French utterance is shown in the left plot in Fig. 3, where we can see a declaration contour overlapped with one left dependency between the verbal group and its subject, one right dependency between the verbal phrase and its direct object, and two clitic contours cueing articles. The decomposition was performed using the PySFC 2 prosody analysis system [11]. ...
Conference Paper
Full-text available
The way speech prosody encodes linguistic, paralinguistic and non-linguistic information via multiparametric representations of the speech signals is still an open issue. The Superposition of Functional Contours (SFC) model proposes to decompose prosody into elementary multiparametric functional contours through the iterative training of neural network contour generators using analysis-by-synthesis. Each generator is responsible for computing multiparametric contours that encode one given linguistic, paralinguistic and non-linguistic information on a variable scope of rhythmic units. The contributions of all generators' outputs are then overlapped and added to produce the prosody of the utterance. We propose an extension of the contour generators that allows them to model the prominence of the elementary contours based on contextual information. WSFC jointly learns the patterns of the elementary multiparametric functional contours and their weights dependent on the contours' contexts. The experimental results show that the proposed weighted SFC (WSFC) model can successfully capture contour prominence and thus improve SFC modelling performance. The WSFC is also shown to be effective at modelling the impact of attitudes on the prominence of functional contours cuing syntactic relations in French, and that of emphasis on the prominence of tone contours in Chinese.
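The overlap-and-add step described in the abstract above can be sketched directly. This is a simplified illustration, not the SFC/WSFC implementation: the neural network contour generators are out of scope, so elementary contours are given as plain per-unit value lists, and `weight` stands in for the WSFC prominence weights.

```python
import numpy as np

def superpose(n_units, contributions):
    """Overlap-and-add elementary functional contours into one utterance
    contour.

    n_units: number of rhythmic units in the utterance.
    contributions: list of (start_unit, contour, weight) triples, where
    `contour` holds one value per rhythmic unit in the function's scope.
    """
    total = np.zeros(n_units)
    for start, contour, weight in contributions:
        contour = np.asarray(contour, dtype=float)
        # add this contour's weighted contribution over its scope
        total[start:start + len(contour)] += weight * contour
    return total
```

For example, a declaration contour over all four units overlapped with a half-weighted two-unit contour starting at unit 1 simply sums wherever the scopes overlap.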
... It has been successfully used to model different linguistic levels, including: attitudes [6], grammatical dependencies [7], cliticisation [5], focus [8], as well as tones in Mandarin [9]. The experiments were carried out using PySFC, which is a prosody analysis system based on the SFC model, implemented in the scientific Python ecosystem [10]. The system has been licensed as free software and is available on GitHub 1 . ...
Conference Paper
Full-text available
The Superposition of Functional Contours (SFC) prosody model decomposes the intonation and duration contours into elementary contours that encode specific linguistic functions. It can be used to extract these functional contours at multiple linguistic levels. The PySFC system, which incorporates the SFC, can thus be used to analyse the effect, on the modelling of prosody, of including the neighbouring syllables in the scope of the tone functional contours in spoken Chinese. Our results show that significant improvements in modelling tone functional contours are obtained by including the right syllable in the scope, but not the left one. We thus show that there is a larger carry-over effect for Chinese tones in contrast to an anticipatory one. This finding is in line with the established state-of-the-art.