Anton Batliner

Anton Batliner
Technical University of Munich | TUM · School of Medicine and Health

Dr.

About

475
Publications
104,641
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
13,732
Citations
Additional affiliations
February 2012 - October 2012
Technical University of Munich
Position
  • Researcher
January 1996 - present
January 1997 - December 2012
Friedrich-Alexander-University Erlangen-Nürnberg

Publications

Publications (475)
Preprint
Full-text available
Chronic obstructive pulmonary disease (COPD) is a serious inflammatory lung disease affecting millions of people around the world. Due to an obstructed airflow from the lungs, it also becomes manifest in patients' vocal behaviour. Of particular importance is the detection of an exacerbation episode, which marks an acute phase and often requires hos...
Preprint
Full-text available
We revisit the INTERSPEECH 2009 Emotion Challenge -- the first ever speech emotion recognition (SER) challenge -- and evaluate a series of deep learning models that are representative of the major advances in SER research in the time since then. We start by training each model using a fixed set of hyperparameters, and further fine-tune the best-per...
Article
Full-text available
The relationship between music and emotion has been addressed within several disciplines, from more historico-philosophical and anthropological ones, such as musicology and ethnomusicology, to others that are traditionally more empirical and technological, such as psychology and computer science. Yet, understanding the link between music and emotio...
Conference Paper
Full-text available
Emotion is an important component of music investigated in music psychology. In recent years, the use of computational methods to assess the link between music and emotions has been promoted by advances in music emotion recognition. However, one of the main limitations of applying data-driven approaches to understand such a link is the scarce knowl...
Article
Full-text available
Charisma is considered as one's ability to attract and potentially influence others. Clearly, there can be considerable interest from an artificial intelligence's (AI) perspective to provide it with such skill. Beyond, a plethora of use cases opens up for computational measurement of human charisma, such as for tutoring humans in the acquisition of...
Article
Full-text available
Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems to their modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies; first and foremost in the fields o...
Preprint
Full-text available
The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two different problems for the first time in a research competition under well-defined conditions: In the Emotion Share Sub-Challenge, a regression on speech has to be made; and in the Requests Sub-Challenges, requests and complaints need to be detected. We describe the Sub-C...
Article
Full-text available
The COVID-19 pandemic has caused massive humanitarian and economic damage. Teams of scientists from a broad range of disciplines have searched for methods to help governments and communities combat the disease. One avenue from the machine learning field which has been explored is the prospect of a digital mass test which can detect COVID-19 from in...
Article
Full-text available
This article contributes to a more adequate modelling of emotions encoded in speech, by addressing four fallacies prevalent in traditional affective computing: First, studies concentrate on few emotions and disregard all other ones (‘closed world’). Second, studies use clean (lab) data or real-life ones but do not compare clean and noisy data in a...
Preprint
Full-text available
Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems to their modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies; first and foremost in the fields o...
Article
Full-text available
Purpose The aim of this study was to investigate the speech prosody of postlingually deaf cochlear implant (CI) users compared with control speakers without hearing or speech impairment. Method Speech recordings of 74 CI users (37 males and 37 females) and 72 age-balanced control speakers (36 males and 36 females) are considered. All participants...
Article
Full-text available
Since the end of the last century, the automatic processing of paralinguistics has been investigated widely and put into practice in many applications, on wearables, smartphones, and computers. In this contribution, we address ethical awareness for paralinguistic applications, by establishing taxonomies for data representations, system designs for...
Preprint
This is the Proceedings of the ACII Affective Vocal Bursts Workshop and Competition (A-VB). A-VB was a workshop-based challenge that introduces the problem of understanding emotional expression in vocal bursts -- a wide range of non-verbal vocalizations that includes laughs, grunts, gasps, and much more. With affective states informing both mental...
Preprint
Full-text available
Chronic obstructive pulmonary disease (COPD) causes lung inflammation and airflow blockage leading to a variety of respiratory symptoms; it is also a leading cause of death and affects millions of individuals around the world. Patients often require treatment and hospitalisation, while no cure is currently available. As COPD predominantly affects t...
Preprint
Full-text available
The ACII Affective Vocal Bursts Workshop & Competition is focused on understanding multiple affective dimensions of vocal bursts: laughs, gasps, cries, screams, and many other non-linguistic vocalizations central to the expression of emotion and to human communication more generally. This year's competition comprises four tracks using a large-scale...
Preprint
Full-text available
The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the Vocalisations and Stuttering Sub-Challenges, a classification on human non-verbal vocalisations and speech has to be made; the Activity Sub-Challenge aims at beyond-audi...
Article
Full-text available
The article ‘The perception of emotional cues by children in artifcial background noise’, written by Emilia Parada-Cabaleiro, Anton Batliner, Alice Baird and Björn Schuller, was originally published Online First without Open Access. After publication in volume 23, issue 1, page 169–182 it has been decided to make the article an Open Access publicat...
Preprint
Full-text available
The COVID-19 pandemic has caused massive humanitarian and economic damage. Teams of scientists from a broad range of disciplines have searched for methods to help governments and communities combat the disease. One avenue from the machine learning field which has been explored is the prospect of a digital mass test which can detect COVID-19 from in...
Article
Full-text available
Musical listening is broadly used as an inexpensive and safe method to reduce self-perceived anxiety. This strategy is based on the emotivist assumption claiming that emotions are not only recognised in music but induced by it. Yet, the acoustic properties of musical work capable of reducing anxiety are still under-researched. To fill this gap, we...
Preprint
Full-text available
As one of the most prevalent neurodegenerative disorders, Parkinson's disease (PD) has a significant impact on the fine motor skills of patients. The complex interplay of different articulators during speech production and realization of required muscle tension become increasingly difficult, thus leading to a dysarthric speech. Characteristic patte...
Article
As one of the most prevalent neurodegenerative disorders, Parkinson’s disease (PD) has a significant impact on the fine motor skills of patients. The complex interplay of different articulators during speech production and realization of required muscle tension become increasingly difficult, thus leading to a dysarthric speech. Characteristic patte...
Conference Paper
Full-text available
Renaissance music constitutes a resource of immense richness for Western culture, as shown by its central role in digital humanities. Yet, despite the advance of computational musicology in analysing other Western repertoires, the use of computer-based methods to automatically retrieve relevant information from Renaissance music, e. g., identifying...
Article
The sudden outbreak of COVID-19 has resulted in tough challenges for the field of biometrics due to its spread via physical contact, and the regulations of wearing face masks. Given these constraints, voice biometrics can offer a suitable contact-less biometric solution; they can benefit from models that classify whether a speaker is wearing a mask...
Article
The Coronavirus (COVID-19) pandemic impelled several research efforts, from collecting COVID-19 patients’ data to screening them for virus detection. Some COVID-19 symptoms are related to the functioning of the respiratory system that influences speech production; this suggests research on identifying markers of COVID-19 in speech and other human g...
Article
Full-text available
COVID-19 is a global health crisis that has been affecting our daily lives throughout the past year. The symptomatology of COVID-19 is heterogeneous with a severity continuum. Many symptoms are related to pathological changes in the vocal system, leading to the assumption that COVID-19 may also affect voice production. For the first time, the prese...
Preprint
Full-text available
The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech; in the Escalation SubCh...
Preprint
Full-text available
COVID-19 is a global health crisis that has been affecting many aspects of our daily lives throughout the past year. The symptomatology of COVID-19 is heterogeneous with a severity continuum. A considerable proportion of symptoms are related to pathological changes in the vocal system, leading to the assumption that COVID-19 may also affect voice p...
Article
Full-text available
Extensive research has been published on the effects of music in reducing anxiety. Yet, for most of the existing works, a common methodology regarding musical genres and measurement techniques is missing, which limits considerably the comparison between them. In this study,we assess, for the first time, markedly different musical genres with both p...
Conference Paper
Full-text available
Modelling of the breath signal is of high interest to both healthcare professionals and computer scientists, as a source of diagnosis-related information, or a means for curating higher quality datasets in speech analysis research. The formation of a breath signal gold standard is, however, not a straightforward task, as it requires specialised equ...
Conference Paper
Full-text available
The INTERSPEECH 2020 Computational Paralinguistics Challenge addresses three different problems for the first time in a research competition under well-defined conditions: In the Elderly Emotion Sub-Challenge, arousal and valence in the speech of elderly individuals have to be modelled as a 3-class problem; in the Breathing Sub-Challenge, breathing...
Article
Full-text available
With the advent of ‘heavy Artificial Intelligence’ - big data, deep learning, and ubiquitous use of the internet, ethical considerations are widely dealt with in public discussions and governmental bodies. Within Computational Paralinguistics with its manifold topics and possible applications (modelling of long-term, medium-term, and short-term tra...
Article
Full-text available
Most typically developed individuals have the ability to perceive emotions encoded in speech; yet, factors such as age or environmental conditions can restrict this inherent skill. Noise pollution and multimedia over-stimulation are common components of contemporary society, and have shown to particularly impair a child’s interpersonal skills. Asse...
Conference Paper
Full-text available
Early musical sources in white mensural notation-the most common notation in European printed music during the Renaissance-are nowadays preserved by libraries worldwide trough digitalisation. Still, the application of music information retrieval to this repertoire is restricted by the use of digitalisation techniques which produce an uncodified out...
Preprint
Full-text available
In this article, we study laughter found in child-robot interaction where it had not been prompted intentionally. Different types of laughter and speech-laugh are annotated and processed. In a descriptive part, we report on the position of laughter and speech-laugh in syntax and dialogue structure, and on communicative functions. In a second part,...
Article
Full-text available
We present DEMoS (Database of Elicited Mood in Speech), a new, large database with Italian emotional speech: 68 speakers, some 9 k speech samples. As Italian is under-represented in speech emotion research, for a comparison with the state-of-the-art, we model the ‘big 6 emotions’ and guilt. Besides making available this database for research, our c...
Conference Paper
Full-text available
The expression of emotion is an inherent aspect in singing, especially in operatic voice. Yet, adverse acoustic conditions , as, e. g., a performance in open-air, or a noisy analog recording, may affect its perception. State-of-the art methods for emotional speech evaluation have been applied to operatic voice, such as perception experiments, acous...
Conference Paper
Full-text available
The Italian madrigal, a polyphonic secular a cappella composition of the 16 th century, is characterised by a strong musical-linguistic relationship, which has made it an icon of the 'Renaissance humanism'. In madrigals, lyrical meaning is mimicked by the music, through the utilisa-tion of a composition technique known as madrigalism. The synergy b...
Conference Paper
Full-text available
The INTERSPEECH 2018 Computational Paralinguistics Challenge addresses four different problems for the first time ina research competition under well-defined conditions: In the Atypical Affect Sub-Challenge, four basic emotions annotatedin the speech of handicapped subjects have to be classified; in the Self-Assessed Affect Sub-Challenge, valence s...
Article
In this article, we review the INTERSPEECH 2013 Computational Paralinguistics ChallengE (ComParE) – the first of its kind – in light of the recent developments in affective and behavioural computing. The impact of the first ComParE instalment is manifold: first, it featured various new recognition tasks including social signals such as laughter and...
Conference Paper
Full-text available
With the increased usage of internet based services and the mass of digital content now available online, the organisation of such content has become a major topic of interest both commercially and within academic research. The addition of emotional understanding for the content is a relevant parameter not only for music classification within digit...
Conference Paper
Full-text available
The outputs of the higher layers of deep pre-trained convolutional neural networks (CNNs) have consistently been shown to provide a rich representation of an image for use in recognition tasks. This study explores the suitability of such an approach for speech-based emotion recognition tasks. First, we detail a new acoustic feature representation,...
Conference Paper
Full-text available
The automatic analysis of notated Renaissance music is restricted by a shortfall in codified repertoire. Thousands of scores have been digitised by music libraries across the world, but the absence of symbolically codified information makes these inaccessible for computational evaluation. Optical Music Recognition (OMR) made great progress in addre...
Conference Paper
In this work, we present an in-depth analysis of the interdependency between the non-native prosody and the native language (L1) of English L2 speakers, as separately investigated in the Degree of Nativeness Task and the Native Language Task of the INTERSPEECH 2015 and 2016 Computational Paralinguistics ChallengE (ComParE). To this end, we propose...
Article
Full-text available
We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i. e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6 k utterances of 30 subjects (mean age: 26.1 years, standard devi...