
Daniel Hirst
French National Centre for Scientific Research (CNRS) & Aix-Marseille University · Laboratoire Parole et Langage
PhD, Dr Hab.
193 Publications
135,613 Reads
3,402 Citations
Education
September 1982 - September 1987
Institut de Phonétique, Université de Provence
Field of study
- Phonetics and Linguistics
September 1970 - September 1974
Institut de Phonétique, Université de Provence
Field of study
- Phonetics and Linguistics
September 1965 - June 1968
St. David's College, University of Wales
Field of study
- English Literature
Publications (193)
In this study we analyse 18 metrics which were extracted fully automatically from the acoustic signal to describe the melodic characteristics of recordings of English read by L2 Chinese speakers from Shanghai. The metrics were compared to those of native English speakers recording the same material and also to comparable Chinese recordings read by...
In this paper I present an overview of ProZed, an aid for developing prosody rules for speech synthesis using the MOMEL and INTSINT algorithms [19] and interfaced with the MBROLA [12], MBROLIGN [23] and Praat [4] programs. It allows the interactive editing of a symbolic representation of an utterance in any of the twenty languages and dialects for whic...
This paper presents a revised version of an implementation of the Momel and INTSINT algorithms for the automatic modelling and symbolic coding of intonation patterns. The algorithms are implemented as external functions which are seamlessly integrated into the Praat speech manipulation software by means of the recently proposed plugin facility for...
My original intention when I first started working on this book was to conclude with a final chapter on prosodic variation across languages. As my work on the book progressed, however, I came to the conclusion that such a chapter would be premature.
There are a number of different ways in which we can account for the way in which speech prosody contributes to the interpretation of an utterance. In this chapter, I concentrate essentially on my own work in this area and do not attempt to give anything like an exhaustive account of work on the interpretation of speech prosody in other frameworks,...
We saw in the last chapter that there appears to be no simple one-to-one correspondence between, on the one hand, the typological categorisation of languages into quantity languages, tone languages and stress languages and on the other hand the objective measurable acoustic correlates of these underlying lexical features.
Knowledge is the concern of science, as the Latin for knowledge (scientia) suggests. Knowledge about language is the concern of linguistics, often described as one of the human sciences.
There is a general consensus today that, although utterances are produced, transmitted and perceived as a linear stream of (respectively) physiological, acoustic and perceptual events, see Fig. (6.1), they are mentally represented as a hierarchical prosodic structure, in which smaller chunks of speech are grouped into larger chunks following a hier...
As we saw in Chapter 1, prosody is not recorded in written language, with the marginal exception of punctuation marks. Before the invention of sound recording in the 19th century, it was not possible to hear exactly the same sound more than once. This simple fact applied equally to music and to speech. It was of course always possible for a speaker...
As a native speaker of English who has been living and working in France for the last 50 years, I often wonder what I could answer, if asked to summarise what I actually know today about the prosodic differences between the French and English languages, two languages which I obviously know rather well. I could certainly give a talk or write an essa...
The search for an appropriate scale for measuring fundamental frequency has been one part of a systematic attempt, in particular by researchers from the Netherlands (’t Hart et al., 1990) (“the Dutch school”), to develop a model of the way in which pitch is perceived. This was done by stylising raw fundamental frequency patterns as a sequence of st...
Speech is conveyed by sound. The sounds of speech, like all sounds, are physical events that can be recorded and analysed. The scientific analysis of sounds is called acoustics.
Prosodic parameters, as we saw in Chapter 1, can contribute to the lexical or morphological identity of words in many languages. Languages differ a great deal in the way in which they use prosody in their lexicon.
This book presents the author's personal overview of Speech Prosody, and in particular the different areas in which he has been especially interested over the last few decades. These include the acoustics of speech prosody, the relationship between lexical and non-lexical prosody, the phonology of prosody, the modelling of rhythm and of melody, and the...
This is a preprint of Chapter 3 - The transcription of prosody - of my forthcoming book Speech Prosody. From Acoustics to Interpretation.
Comments, suggestions and questions are welcome.
Click on the links to view and/or download the publication or software.
This is a preprint of Chapter 5, The Phonology of Speech Prosody, of my forthcoming book Speech Prosody. From Acoustics to Interpretation.
Comments, suggestions and questions are welcome.
This is a preprint of Chapter 4, The Prosody of Words, of my forthcoming book Speech Prosody. From Acoustics to Interpretation.
Comments and questions are welcome.
This is an updated preprint of Chapter 6, Prosodic Structure, from my forthcoming book Speech Prosody. From Acoustics to Interpretation.
Questions, comments and corrections are welcome.
This is an updated preprint of Chapter 8 of my forthcoming book Speech Prosody. From Acoustics to Interpretation.
Questions, comments and suggestions are very welcome.
This is an updated preprint of Chapter 7 of my forthcoming book Speech Prosody. From Acoustics to Interpretation.
Comments and suggestions are very welcome.
This is a preprint of Chapter 9 of my forthcoming book Speech Prosody: from Acoustics to Interpretation
This is a preprint of Chapter 10: Conclusion of my forthcoming book Speech Prosody. From Acoustics to Interpretation.
It also includes the list of references for the book and a language index, an author index and a general index.
This is a preprint of chapter 2 of my forthcoming book.
Comments and questions are welcome.
1/ Question(s) raised and problematic
The aim of this article is to show how research at the Speech and Language Laboratory (Laboratoire Parole et Langage, hereafter LPL) contributes in a significant way to the renewal of knowledge on the prosody of language and languages. It shows how the work of LPL members attempts to address the following questi...
This is a short description of the Momel algorithm, implemented as a plugin to the Praat software.
ProZed (Prosody Editor) is a tool designed to allow linguists to manipulate the prosody of an utterance via a symbolic representation in order to evaluate linguistic models (Hirst 2012, 2015).
Prosody is manipulated via a Praat TextGrid which allows the user to modify the rhythm and melody.
Rhythm is manipulated by factoring segmental duration into...
This is a preprint of Chapter 1 of my forthcoming book Speech Prosody: from Acoustics to Interpretation.
Comments and questions are welcome.
updated February 10, 2024
An introduction to the range of current theoretical approaches to the prosody of spoken utterances, with practical applications of those theories.
Prosody is an extremely dynamic field, with a rapid pace of theoretical development and a steady expansion of its influence beyond linguistics into such areas as cognitive psychology, neuroscience, c...
In this chapter, we introduce the reader to the concepts of pitch and fundamental frequency from a functional, physiological and physical perspective. Several issues, including the modelling of intonation, pitch detection and measurement and acoustic scales, described below, are addressed to inform the reader about best practice for teaching and le...
It is well known that L2 speakers have particular difficulty with prosody - the rhythm and melody of their speech - and that this is a major factor leading to their speech being difficult to understand for native speakers. This presentation suggests the possibility of providing automatic visual and auditory feedback as an aid to the improvement of...
This tutorial explains how to display the prosody of a recording and how to transfer (clone) the prosody of a source recording to a target recording.
These tutorials are first drafts and feedback is welcome, in particular if anything is not clear or needs more explanation.
Send feedback to <djhirst@me.com>
This tutorial shows how to use the Momel-Intsint plugin for the automatic analysis of the prosody of a single recording.
These tutorials are first drafts and feedback is welcome, in particular if anything is not clear or needs more explanation.
Send feedback to <djhirst@me.com>
This tutorial shows how to use the Momel-INTSINT plugin for the automatic analysis of the prosody of a whole batch of recordings.
It presupposes familiarity with analysing the prosody of a single recording, as described in Tutorial 1.
These tutorials are first drafts and feedback is welcome, in particular if anything is not clear or needs more expl...
This is a preprint of Chapter 8, Modelling Speech Melody, of my forthcoming book Speech Prosody. From Acoustics to Interpretation.
Questions, comments and suggestions are welcome.
Minor updates - replaces the term "target points" by "anchor points"
Includes updated Readme file
This presentation reports work in progress on an improved and simplified algorithm for coding the output of the Momel algorithm using the INTSINT alphabet, building on recent work which proposed the Octave-Median scale (ome = log2(Hz/Median)) as a natural scale for the representation of pitch. Preliminary results comparing the output of the new alg...
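The Octave-Median formula quoted in this abstract, ome = log2(Hz/Median), can be illustrated with a minimal Python sketch. The function name hz_to_ome, the choice to take the median over the voiced frames of the input, and the convention that unvoiced frames are coded as 0 are all assumptions made for illustration; they are not taken from the Momel-INTSINT plugin or from the presentation itself.

```python
import math
from statistics import median


def hz_to_ome(f0_values):
    """Convert f0 values in Hz to the Octave-Median (OME) scale.

    Assumption: unvoiced frames are coded as 0 (or a negative value)
    and are returned as None; the median is taken over voiced frames only.
    """
    voiced = [f for f in f0_values if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in input")
    med = median(voiced)  # reference level for the scale
    # ome = log2(Hz / Median): 0 at the median, +1 one octave above it
    return [math.log2(f / med) if f > 0 else None for f in f0_values]


if __name__ == "__main__":
    # Median of the three values is 200 Hz, so 100 Hz lies one octave below
    # and 400 Hz one octave above.
    print(hz_to_ome([100.0, 200.0, 400.0]))  # [-1.0, 0.0, 1.0]
```

On this scale a value of 0 corresponds to the median itself, so recordings from speakers with different pitch ranges can be compared without choosing an arbitrary reference level.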
Our ideas about prosodic representation are heavily influenced by our knowledge of written language. All writing systems represent utterances as a linear sequence of elements drawn from a finite set of characters. In many languages special characters such as spaces or punctuation marks are used as boundary symbols. There is a general consensus toda...
Modelling pitch patterns from acoustic data needs to take into account the fact that raw f0 curves are the product of an underlying global pitch pattern and a more local (micromelodic) influence of the individual speech sounds. This suggests the hypothesis that pitch could be modelled using only the f0 detected on sonorant rimes (vowels and sonoran...
OMProDat is an open multilingual prosodic database, which aims to collect, archive and distribute recordings and annotations of directly comparable data from different languages representing different prosodic typological characteristics. OMProDat contains recordings of 40 five-sentence passages read by 5 male and 5 female speakers of each language...
Fundamental frequency, the primary acoustic correlate of speech melody, is generally analysed and displayed using a linear scale (Hertz) or a logarithmic one, generally in semitones and usually offset to an arbitrary reference level such as 100 Hz. In this paper we argue that a more natural scale for analysing speech is the OME (Octave-MEdian) scal...
Based on the Momel algorithm, a set of acoustic parameters was analyzed automatically on Chinese emotional speech. Global prosodic features were calculated on the sentence level, which showed a concordance with the usual pattern reported in the literature. Local constraints were also considered on the syllable layer. An ANOVA showed that there were...
During Speech Prosody 2012, we presented SPPAS, SPeech Phonetization Alignment and Syllabification, a tool to automatically produce annotations which include utterance, word, syllabic and phonemic segmentations from a recorded speech sound and its transcription. SPPAS is open source software issued under the GNU Public License. SPPAS is multi-pla...
Current research on speech prosody generally makes use of large quantities of recorded data. In order to provide an open multi-lingual basis for the comparative study of speech prosody, the Laboratoire Parole et Langage has begun the creation of an open database OMProDat containing recordings of 40 five-sentence passages, originally taken from the...
It is more and more standard practice, in speech research, to make publicly available the data used in the research, in particular the speech recordings. This can potentially raise the problem of how to respect the anonymity of the speakers, particularly if the recordings consist of unmonitored conversations, which may contain references to people by...
In recent years there have been a number of proposals for objective paradigms for establishing prosodic typologies among languages. This paper compares the results of melody metrics calculated on just over two hours of read speech for each of three languages. Pitch movements in Chinese, a lexical tone language, were found to be significantly more a...
The contributions to this volume focus on the interrelation between prosody and iconicity and shed new light on the topic by enlarging the number of parameters traditionally considered, and by confronting various theoretical backgrounds. The parameters taken into account include socio-linguistic criteria (age, sex, socio-economic category, region);...
In standard Chinese, a low tone (Tone 3) is usually changed into a rising tone (Tone 2) when it is immediately followed by another third tone, which is known as the third tone sandhi. The 3rd tone sandhi has been widely discussed in Chinese phonology. This paper, however, employs a prosodic corpus we are developing to study the acoustic realization...
Wiktor Jassem's short article on rhythm (‘Indication of speech rhythm in the transcription of educated Southern English’) was published in 1949 in Le Maître Phonétique, the ancestor of today's Journal of the International Phonetic Association. The author was a young man aged 27 at the time, who was working on a longer treatment of the intonation...
SPPAS, SPeech Phonetization Alignment and Syllabification, is a tool to automatically produce annotations which include utterance, word, syllable and phoneme segmentations from a recorded speech sound and its transcription. SPPAS is currently implemented for French, English, Italian and Chinese and there is a very simple procedure to add other la...
This paper describes a tool designed to allow linguists to manipulate the prosody of an utterance via a symbolic representation in order to evaluate linguistic models. Prosody is manipulated via a Praat TextGrid which allows the user to modify the rhythm and melody. Rhythm is manipulated by factoring segmental duration into three components: (i)...
This paper presents a multilingual learners corpus (AixOx) collected in the framework of an ALLIANCE project. Speakers reading forty 1-minute passages in French and in English were recorded. The passages are taken from the EUROM 1 corpus (Chan et al. 1995).
The corpus consists of the recordings of these passages read by native speakers and L2 lear...
This collection of studies on phonetics and phonology is cordially dedicated to Professor Wiktor Jassem by his colleagues and friends on the occasion of his 90th birthday, 11th June 2012, in appreciation of his influential and pioneering contributions to the field.
This paper describes the application of the analysis by synthesis paradigm to the melody of speech. A complete chain of processes is described from the acoustic analysis of fundamental frequency (f0), via the phonetic modelling of f0 using the Momel algorithm, to the surface phonological representation of the curves using the INTSINT alphabet. Ea...
The term dialect is used here as a taxonomic level of linguistic classification, subordinate to a language. In this sense everyone speaks a dialect, including those who speak the prestigious dialect of a language. A comparison with biological taxonomy brings to light a parallel between the notions of language and of species where, in both cases, t...
The following two texts were published by the predecessor of our journal over half a century ago and illustrate the characteristic nature of its publications at that time.
We propose in this paper a broad-coverage approach for multimodal annotation of conversational data. Large annotation projects addressing the question of multimodal annotation bring together many different kinds of information from different domains, with different levels of granularity. We present in this paper the first results of the OTIM p...
Fundamental frequency, the primary acoustic correlate of speech melody, is generally analysed and displayed using a linear scale (in Hertz) or a logarithmic one (usually in semitones), generally offset to an arbitrary reference level. In this paper we argue that a more natural scale for analysing speech is the OME (Octave-MEdian) scale, using the o...
This study investigates rhythmic parameters in the production of French learners in a dual perspective: (i) to analyse the influence of rhythm of the native language (L1=French) on the target language (L2=English) and, (ii) to provide prosodic evaluative criteria for French speakers' productions. The method used is a comparative analysis of French...
While current tools for the automatic analysis and modeling of intonation are satisfactory for laboratory or isolated sentences, they appear insufficient for the study of longer stretches of authentic speech, which are in general marked by systematic changes of register. This study shows that implementing automatically detected register changes sig...
Most existing algorithms to identify the primary stressed syllable of accented words for the recognition and synthesis of Arabic prosody are based on the fundamental frequency. In this study, we used both formant values and the acoustic parameter of energy by means of a classification by a discriminant analysis to detect the primary stressed sylla...
This paper presents results from the analysis of the rhythmic characteristics of a corpus of five and a half hours of authentic speech of British English. It is shown (as suggested by Wiktor Jassem over 50 years ago) that the most appropriate unit to describe the relative lengthening of phones is the Narrow Rhythm Unit, beginning with the stressed...
This paper proposes an approach for a Classification by Discriminant Analysis of stressed syllables in Standard Arabic. In this study, we exploited the acoustic parameters of fundamental frequency and energy by means of a classification by a discriminant analysis to detect stressed syllables of Standard Arabic words with the structure [CVCVCV] read...
A promising strategy for the multilingual annotation of speech prosody is to use manual annotation of a small corpus of speech to bootstrap a fully automatic annotation system. We make a systematic distinction between functional annotation and formal annotation. The use of functional prosodic labelling for prosody control in a Finnish speech synthe...
This database may be freely distributed and used without any restriction except that it should always be accompanied by this notice. Our only request is that the providers of the database (us) should be informed of any enrichments you or others may make to it and that these enrichments should be made freely available for future distributions.
Problem statement: In the early days of speech synthesis research the obvious focus of attention was intelligibility. But many researchers agree that the major remaining obstacle to fully acceptable synthetic speech is that it continues to be insufficiently natural. Approach: In this study, we exploited microvariations of fundamental frequency (F0)...
Problem Statement: Current algorithms for the recognition and synthesis of Arabic prosody concentrate on identifying the primary stressed syllable of accented words on the basis of fundamental frequency. Generally, the three acoustic parameters used in prosody are: Fundamental frequency, duration and energy. Approach: In this study, we exploited th...