Table 1 - uploaded by Kairi Tamuri
Source publication
Fundamental frequency (F0, perceived as pitch) is an important prosodic cue of emotion. The aim of the present study was to find out if sentence emotion has any influence detectable in the F0 height and range of Estonian read-out speech. Thus the F0 of each vowel found in Estonian read-out sentences was measured, and its median for three emotions...
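For readers who want to reproduce this kind of measurement, the following is a minimal sketch assuming the praat-parselmouth library; the file path and the vowel interval annotations (start/end times in seconds, e.g. taken from a TextGrid) are hypothetical placeholders, not the study's published pipeline.

# Minimal sketch: median F0 per annotated vowel, assuming praat-parselmouth.
# wav_path and vowel_intervals are hypothetical placeholders.
import statistics
import parselmouth

def vowel_f0_medians(wav_path, vowel_intervals):
    """Median F0 (Hz) of each annotated vowel interval in one sentence."""
    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch()                      # default autocorrelation pitch track
    times = pitch.xs()
    f0 = pitch.selected_array['frequency']      # 0.0 marks unvoiced frames
    medians = []
    for start, end in vowel_intervals:
        voiced = [f for t, f in zip(times, f0) if start <= t <= end and f > 0]
        if voiced:
            medians.append(statistics.median(voiced))
    return medians

# Pooling the vowel medians over all sentences of one emotion and taking
# statistics.median() of that pool then gives a per-emotion F0 median.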
Similar publications
This paper deals with the influence of verbal content on listeners who have to identify or evaluate speech emotions, and with whether or not the emotional aspect of verbal content should be eliminated. We compare the acoustic parameters of sentences expressing joy, anger, sadness and neutrality across two groups: (1) where the verbal content ai...
This article presents a novel method for emotion recognition from Polish speech. We compared two different databases: spontaneous and acted-out speech. For the purpose of this research we gathered a set of audio samples with emotional information, which serves as the input database. Multiple Classifier Systems were used for classification, with common...
Citations
... In that particular study, a reduced fundamental frequency contrast between happiness and sadness was found, which might have been caused by the wider frequency range (≈160–400 Hz) of happiness expressed by children with versus without hearing loss. This frequency range encompasses the entire frequency ranges of sadness and anger, as also previously published (Tamuri 2014). This makes it easier to mistake happiness for other emotions in subjective evaluations. ...
Objective:
The emotional prosodic expression potential of children with cochlear implants is poorer than that of normal-hearing peers, but little is known about children with hearing aids.
Design:
This study was set up to generate a better understanding of the prosodic identifiability of hearing aid users compared to that of cochlear implant users and peers without hearing loss.
Study sample:
Emotional utterances of 75 Dutch-speaking children (7–12 yr) were gathered: 26 with hearing aids (CHA), 23 with cochlear implants (CCI), and 26 with normal hearing (CNH). The utterances were evaluated blindly for resemblance to three emotions (happiness, sadness, anger) by normal-hearing Dutch listeners: 22 children and 9 adults (17–24 yr).
Results:
Emotions were more accurately recognised by adults than by children. Both children and adults correctly judged happiness significantly less often in CCI than in CNH. Also, adult listeners confused happiness with sadness more often in both CHA and CCI than in CNH.
Conclusions:
Children and adults are able to accurately evaluate the emotions expressed through speech by children with varying degrees of hearing loss, ranging from mild to profound, nearly as well as they can with typically hearing children. The favourable outcomes emphasise the resilience of children with hearing loss in developing effective emotional communication skills.
... We have worked with speech material that was read in a neutral speech style with a relatively monotonous F0; therefore, the results do not represent the entire vocal range of the speakers. In other speech styles, e.g. spontaneous or emotional speech, higher F0 values would be expected for joy and lower for anger, and the F0 range would be larger for anger and smaller for sadness, as has been found in adult speech (Tamuri 2015). ...
The paper introduces the Estonian Adolescent Speech Corpus and explores developmental changes in speech production based on the acoustic characteristics of fundamental frequency (F0), formant frequencies, and speech tempo as a function of age and gender. Age- and gender-related anatomic changes in adolescence have implications for speech acoustics: a sudden drop of F0 at puberty in boys, and a gradual decrease of the acoustic vowel space. In parallel with the anatomic changes, the development of the speech motor system is manifested as an increase in speaking rate. The analysis of fundamental frequency (F0) shows that in both male and female speakers F0 decreases gradually from 9 to 12 years of age; in males F0 then drops by ca 100 Hz at the age of 12–15 due to the puberty voice change and becomes stable at 15–18, whereas in female speakers the gradual decrease of F0 continues until the age of 18. The formant frequencies of vowels decrease gradually from 10 to 15 years in both genders, vowel quality stabilizes at the age of 15–18 years, and gender-specific differences emerge at the age of 12–13. Speech rate increases from 4 syllables per second at 9–10 years to 5.1 syllables per second at 14 years and becomes stable between the ages of 15 and 18; gender differences are not significant. The results of the current study can be considered reference data typical for Estonian-speaking individuals aged 9–18 years with normal language development.
... Overall, higher-arousal emotions such as happiness or joy are associated with higher mean intensity, higher mean F0 and shorter pauses, whereas lower-arousal emotions such as sadness are associated with lower mean intensity, lower mean F0 and longer pauses compared to neutral speech (Juslin & Laukka, 2003; Tisljár-Szabó & Pléh, 2014). However, subtle differences between languages, and individual differences due to voice type (i.e., lax vs. tense voice), have been documented (Tamuri, 2014). In the context of emotional prosody studies, there is also evidence that vocal cues are reliable indicators of psychiatric disorders. ...
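The three cues summarized above can be extracted directly from a recording. The following is a minimal sketch assuming the praat-parselmouth library, with an illustrative 50 dB silence threshold for a crude pause estimate; it is not the exact procedure of the studies cited.

# Sketch of the three arousal cues discussed above (assumes praat-parselmouth).
# The 50 dB silence threshold is an illustrative choice, not taken from the studies.
import numpy as np
import parselmouth

def arousal_cues(wav_path, silence_db=50.0):
    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch()
    f0 = pitch.selected_array['frequency']
    mean_f0 = float(np.mean(f0[f0 > 0]))            # voiced frames only

    intensity = snd.to_intensity()
    db = intensity.values[0]
    mean_db = float(np.mean(db))

    # Crude pause estimate: total duration of frames below the silence threshold.
    frame_step = intensity.xs()[1] - intensity.xs()[0]
    pause_time = float(np.sum(db < silence_db) * frame_step)
    return {"mean_f0_hz": mean_f0,
            "mean_intensity_db": mean_db,
            "pause_time_s": pause_time}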
Emotion, as part of the overall sensorimotor, introspective, and affective system, is an essential part of language comprehension within the framework of embodied semantics. As emotional state influences semantic and syntactic processing, emotional language processing has been shown to modulate mood as well. The reciprocal relationship between language and emotion has also been informative in bilingualism. Here we take a relatively under-researched type of bilingual processing, simultaneous interpreting, as a case of extreme bilingualism and investigate the effect of emotional language rendering in the L1 on subjective affect and prosodic markers of L2 output. Eighteen trainee interpreters were asked to simultaneously interpret into English three speeches in Turkish that varied in emotionality, valence (negative, neutral, and positive), and difficulty. Responses to emotional language processing were analysed based on participants' self-reported positive and negative affect using the Positive and Negative Affect Schedule (PANAS) and three prosodic parameters (intensity, pitch, and fluency). Results showed that interpreting emotionally negative speech increased negative affect, whereas interpreting emotionally positive speech did not modify positive affect. Intensity generally reflected cognitive load. Pitch and fluency, in particular, were more sensitive to changes in the valence of the source speech.
... According to the results of an acoustic analysis of related studies [3][4][5], the emotions manifested in English read-out speech can be described by the following characteristics (see Table 1): ...
In this study, we investigate how listeners recognize emotional speech and whether they recognize some emotions better than others. Chinese listeners with relatively basic and relatively more advanced English skills were asked to recognize three kinds of emotional speech (expressing anger, joy, and sadness), as well as neutral speech, produced by native English and Chinese English speakers. The Chinese listeners with a more advanced English level showed significantly better speaker-identification skills, while sadness and joy were better recognized than anger. This research has implications for cross-cultural communication and speaker identification in general.
... For women, F0 is nearly constant up to the age of menopause, when it starts to decrease [40]. Pitch also highly depends on the speaker's emotional state, e.g., the pitch is highest for joy and lowest for anger [20]. Pitch variation during a talk is more important than the frequency of the pitch itself [19]. ...
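Pitch variation of this kind can be quantified in several ways; one common, speaker-independent choice is the standard deviation of F0 in semitones around the speaker's median. The sketch below assumes the praat-parselmouth library and is illustrative rather than the exact metric used in the cited work.

# Illustrative pitch-variation measure: F0 standard deviation in semitones
# around the speaker's median (assumes praat-parselmouth).
import numpy as np
import parselmouth

def pitch_variation_semitones(wav_path):
    pitch = parselmouth.Sound(wav_path).to_pitch()
    f0 = pitch.selected_array['frequency']
    f0 = f0[f0 > 0]                              # keep voiced frames only
    semitones = 12 * np.log2(f0 / np.median(f0)) # distance from the median in semitones
    return float(np.std(semitones))              # larger value = livelier pitch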
Great public speakers are made, not born. Practicing a presentation in front of colleagues is common practice and results in a set of subjective judgements about what could be improved. In this paper we describe the design and implementation of a mobile app which estimates the quality of the speaker's delivery in real time in a fair, repeatable and privacy-preserving way. Quantle estimates the speaker's pace in terms of the number of syllables, words and clauses, and computes pitch and the duration of pauses. These basic parameters are then used to estimate the talk's complexity based on readability scores from the literature, to help the speaker adjust the delivery to the target audience. In contrast to speech-to-text-based methods used to implement a digital presentation coach, Quantle does its processing locally in real time and works in flight mode. This design has three implications: (1) Quantle does not interfere with the surrounding hardware, (2) it is power-aware, since 95.2% of the energy used by the app on an iPhone 6 is spent operating the built-in microphone and the screen, and (3) audio data and processing results are not shared with a third party, thereby preserving the speaker's privacy.
We evaluate Quantle on artificial, online and live data. We artificially modify an audio sample by changing the volume, speed, tempo, pitch and noise level to test the robustness of Quantle and its performance limits. We then test Quantle on 1017 TED talks held in English and compare the computed features to those extracted from the available transcripts processed by online text evaluation services. Quantle's estimates of syllable and word counts are 85.4% and 82.8% accurate, and pitch is over 90% accurate. We use the outcome of this study to extract typical ranges for each vocal characteristic. We then use Quantle on live data at a social event, and as a tool for speakers to track their delivery when rehearsing a talk. Our results confirm that Quantle is robust to different noise levels, varying distances from the sound source and phone orientations, and achieves performance comparable to speech-to-text methods.
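The paper refers to readability scores from the literature without naming a specific one here. As an illustration, the widely used Flesch Reading Ease score can be computed from exactly the kinds of counts Quantle produces (syllables, words, and clause/sentence units); the snippet below is a generic sketch, not Quantle's actual implementation.

# Illustrative only: Flesch Reading Ease from counts like those Quantle estimates.
# This is one common readability formula from the literature, not Quantle's code.
def flesch_reading_ease(n_syllables: int, n_words: int, n_sentences: int) -> float:
    """Higher scores indicate easier material."""
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)

# Hypothetical example: a talk segment with 280 words, 410 syllables, 18 clauses.
print(flesch_reading_ease(410, 280, 18))   # ~67, i.e. "plain English" range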
... Kairi Tamuri (2015) has measured the height and range of the fundamental frequency in sentences read aloud as carrying three emotions (anger, joy, sadness) and as neutral speech. Her work shows that in Estonian speech the fundamental frequency is highest for joy and lowest for anger. ...
Raamat "Eesti keele hääldus" kirjeldab eesti keele hääldussüsteemi ning selle varieerumist. Siit saab ülevaate eesti keele vokaalidest ja konsonantidest, sõnarõhust, vältest ja intonatsioonist. Raamat esitab samuti sissevaate eesti keele häälduse uurimislukku nii foneetika kui ka fonoloogia vaatenurgast.
http://dspace.ut.ee/bitstream/handle/10062/57960/EKV2_haaldus.pdf?sequence=1&isAllowed=y
... The sentences were arranged into a web-based listening test. According to the results of a statistical acoustic analysis of the sentences drawn from the Estonian Emotional Speech Corpus, the emotions manifested in Estonian read-out speech can be described by the characteristics in Table 1 (Tamuri 2012, 2015; Tamuri and Mihkla 2012); for example, joy has a high pitch, average intensity and average speech rate. ...
We investigated the influence of culture and language on the understanding of speech emotions. Listeners from different cultures and language families had to recognize moderately expressed vocal emotions (joy, anger, sadness) and neutrality of each sentence in foreign speech without seeing the speaker. The web-based listening test consisted of 35 context-free sentences drawn from the Estonian Emotional Speech Corpus. Eight adult groups participated, comprising: 30 Estonians; 17 Latvians; 16 North-Italians; 18 Finns; 16 Swedes; 16 Danes; 16 Norwegians; 16 Russians. All participants lived in their home countries and, except the Estonians, had no knowledge of Estonian. Results showed that most of the test groups differed significantly from Estonians in the recognition of most emotions. Only Estonian sadness was recognized well by all test groups. Results indicated that genealogical relation of languages and similarity of cultural values are not advantages in recognizing vocal emotions expressed in a different culture and language.
... One of the authors of this paper, Kairi Tamuri, has analysed the acoustics of Estonian emotional speech read aloud (see section 2), using the speech data in the Estonian Emotional Speech Corpus, and determined the acoustic features and values that characterize the three basic emotions in Estonian speech and distinguish them from one another and from neutral speech. She has analysed pauses (Tamuri 2010), formants and precision of articulation (Tamuri 2012a), speech rate, intensity of speech (Tamuri 2012b) and fundamental frequency (Tamuri 2015) in emotional speech. It is important to underline that the object of study was not spontaneous or acted speech, but read-aloud speech. ...
Abstract. The goal of this study was to conduct modelling experiments, the purpose of which was the expression of three basic emotions (joy, sadness and anger) in Estonian parametric text-to-speech synthesis on the basis of both a male and a female voice. For each emotion, three different test models were constructed and presented for evaluation to subjects in perception tests. The test models were based on the basic emotions' characteristic parameter values that had been determined on the basis of human speech. In synthetic speech, the test subjects most accurately recognized the emotion of sadness, and least accurately the emotion of joy. The results of the test showed that, in the case of the synthesized male voice, the model with enhanced parameter values performed best for all three emotions, whereas in the case of the synthetic female voice, different emotions called for different models: the model with decreased values was the most suitable one for the expression of joy, and the model with enhanced values was the most suitable for the expression of sadness and anger. Logistic regression was applied to the results of the perception tests in order to determine the significance and contribution of each acoustic parameter in the emotion models, and the possible need to adjust the values of the parameters.
Keywords: Estonian, emotions, speech synthesis, acoustic model, speech rate, intensity, fundamental frequency
DOI: http://dx.doi.org/10.12697/jeful.2015.6.3.06
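The logistic-regression step mentioned in the abstract above can be sketched as follows. The data file and column names (recognized, speech_rate, intensity, f0) are hypothetical placeholders, and the study's actual model specification may differ; statsmodels is used here so that the coefficient estimates come with significance tests.

# Hedged sketch of the logistic-regression analysis described above.
# Column names and the CSV file are hypothetical; the study's model may differ.
import pandas as pd
import statsmodels.formula.api as smf

# One row per listener judgement: recognized = 1 if the intended emotion was identified.
df = pd.read_csv("perception_test_results.csv")   # hypothetical file

model = smf.logit("recognized ~ speech_rate + intensity + f0", data=df).fit()
print(model.summary())   # coefficients and p-values -> contribution of each acoustic parameter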
The research aimed to assess an AI-based cognitive function decline screening tool, "ONSEIⓇ", in detecting impaired cognitive abilities due to psychological and physical factors. A total of 153 workers were recruited; they performed the ONSEIⓇ assessment, which flagged possible cognitive impairment in the case of a positive result, and answered questionnaires using their smartphones. Participants received daily reminder emails for 60 days, prompting them to perform the ONSEIⓇ assessment and answer a questionnaire containing items pertaining to their everyday life. On the last day, we conducted a survey on changes in health awareness after participating in the study. The daily surveys showed that subjective mood decline or anxiety was significantly associated with positive ONSEIⓇ results, whereas subjective physical discomfort on the day of assessment, and sleep disturbance or excessive alcohol consumption on the previous day, were not. The post-study questionnaire revealed changes in the perception of mental health and efforts toward appropriate sleep and hydration. The ONSEIⓇ lacked sufficient sensitivity for detecting impaired cognitive abilities due to psychological and physical factors. However, repeating the ONSEIⓇ assessment and asking about everyday life might promote awareness of, and behavioral changes in, psychological and physical health.