Heysem Kaya

PhD

About

68
Publications
26,649
Reads
1,355
Citations
Introduction
Heysem does research in Human-computer Interaction, Affective Computing, Computational Paralinguistics and Machine Learning.
Additional affiliations
January 2015 - March 2015
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences
Position
  • Visiting Researcher
Description
  • Conducted research on speech signal processing for paralinguistic analysis.
August 2013 - October 2015
Technische Universität München
Position
  • Visiting Researcher
Description
  • Conducted research on computational paralinguistics and machine learning, particularly feature selection for acoustic depression prediction.
September 2010 - August 2015
Bogazici University
Position
  • Research Assistant

Publications (68)
Article
Multimodal recognition of affective states is a difficult problem, unless the recording conditions are carefully controlled. For recognition “in the wild”, large variances in face pose and illumination, cluttered backgrounds, occlusions, audio and video noise, and subtle cues of expression are some of the issues to target. In thi...
Article
An important research direction in speech technology is robust cross-corpus and cross-language emotion recognition. In this paper, we propose computationally efficient and performance effective feature normalization strategies for the challenging task of cross-corpus acoustic emotion recognition. We particularly deploy a cascaded normalization appr...
Article
In this article, we present the first child emotional speech corpus in Russian, called “EmoChildRu”, collected from 3-7 year old children. The base corpus includes over 20K recordings (approx. 30 hours), collected from 120 children. Audio recordings were carried out in three controlled settings by creating different emotional states for children: p...
Conference Paper
Full-text available
We describe an end-to-end system for explainable automatic job candidate screening from video CVs. In this application, audio, face and scene features are first computed from an input video CV, using rich feature sets. These multiple modalities are fed into modality-specific regressors to predict apparent personality traits and a variable that pred...
Article
Full-text available
Computational paralinguistics deals with the underlying meaning of verbal messages, which is of interest in manifold applications ranging from intelligent tutoring systems to affect-sensitive robots. The state-of-the-art pipeline of paralinguistic speech analysis utilizes brute-force feature extraction, and the features need to be tailored accordin...
Preprint
Full-text available
Inpatient violence is a common and severe problem within psychiatry. Knowing who might become violent can influence staffing levels and mitigate severity. Predictive machine learning models can assess each patient's likelihood of becoming violent based on clinical notes. Yet, while machine learning models benefit from having more data, data availab...
Article
Full-text available
Inpatient violence is a common and severe problem within psychiatry. Knowing who might become violent can influence staffing levels and mitigate severity. Predictive machine learning models can assess each patient’s likelihood of becoming violent based on clinical notes. Yet, while machine learning models benefit from having more data, data availab...
Preprint
Full-text available
Bipolar disorder is a mental disorder that causes periods of manic and depressive episodes. In this work, we classify recordings from the Bipolar Disorder corpus, which contain 7 different tasks, into hypomania, mania, and remission classes using only speech features. We perform our experiments on split tasks from the interviews. Best results achieved...
Article
Full-text available
As emotions play a central role in human communication, automatic emotion recognition has attracted increasing attention in the last two decades. While multimodal systems enjoy high performances on lab-controlled data, they are still far from providing ecological validity on non-lab-controlled, namely “in-the-wild” data. This work investigates audi...
Article
Many people experience a traumatic event during their lifetime. In some extraordinary situations, such as natural disasters, war, massacres, terrorism, or mass migration, the traumatic event is shared by a community and the effects go beyond those directly affected. Today, thanks to recorded interviews and testimonials, many archives and collection...
Preprint
Full-text available
The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech; in the Escalation SubCh...
Article
Full-text available
Board games are fertile grounds for the display of social signals, and they provide insights into psychological indicators in multi-person interactions. In this work, we introduce a new dataset collected from four-player board game sessions, recorded via multiple cameras, and containing over 46 hours of visual material. The new MUMBAI dataset is ex...
Preprint
Full-text available
Automated classification of animal vocalisations is a potentially powerful wildlife monitoring tool. Training robust classifiers requires sizable annotated datasets, which are not easily recorded in the wild. To circumvent this problem, we recorded four primate species under semi-natural conditions in a wildlife sanctuary in Cameroon with the objec...
Preprint
In this paper, we present our contribution to the ABAW facial expression challenge. We report the proposed system and the official challenge results adhering to the challenge protocol. Using end-to-end deep learning and benefiting from transfer learning approaches, we reached a validation set challenge performance measure of 56.56%.
Preprint
Full-text available
Acoustic and linguistic analysis for elderly emotion recognition is an under-studied and challenging research direction, but essential for the creation of digital assistants for the elderly, as well as unobtrusive telemonitoring of elderly in their residences for mental healthcare purposes. This paper presents our contribution to the INTERSPEECH 20...
Preprint
Full-text available
Cross-language, cross-cultural emotion recognition and accurate prediction of affective disorders are two of the major challenges in affective computing today. In this work, we compare several systems for Detecting Depression with AI Sub-challenge (DDS) and Cross-cultural Emotion Sub-challenge (CES) that are published as part of the AudioVisual Emo...
Conference Paper
Full-text available
Many people experience a traumatic event during their lifetime. In some extraordinary situations, such as natural disasters, war, massacres, terrorism or mass migration, the traumatic event is shared by a community and the effects go beyond those directly affected. Today, thanks to recorded interviews and testimonials, many archives and collections...
Article
Full-text available
Recently, Speech Emotion Recognition (SER) has become an important research topic of affective computing. It is a difficult problem, where some of the greatest challenges lie in the feature selection and representation tasks. A good feature representation should be able to reflect global trends as well as temporal structure of the signal, since emo...
Chapter
Full-text available
In the wild emotion recognition requires dealing with large variances in input signals, multiple sources of noise that will distract the learners, as well as difficult annotation and ground truth acquisition conditions. In this chapter, we briefly survey the latest developments in multimodal approaches for video-based emotion recognition in the wil...
Conference Paper
Full-text available
The Audio/Visual Emotion Challenge and Workshop (AVEC 2018) "Bipolar disorder, and cross-cultural affect recognition" is the eighth competition event aimed at the comparison of multimedia processing and machine learning methods for automatic audiovisual health and emotion analysis, with all participants competing strictly under the same conditions...
Conference Paper
Full-text available
Acoustic emotion recognition is a popular and central research direction in paralinguistic analysis, due to its relation to a wide range of affective states/traits and manifold applications. Developing highly generalizable models remains a challenge for researchers and engineers, because of a multitude of nuisance factors. To assert generalizat...
Conference Paper
Full-text available
Perceptual understanding of media content has many applications, including content-based retrieval, marketing, content optimization, psychological assessment, and affect-based learning. In this paper, we model audio visual features extracted from videos via machine learning approaches to estimate the affective responses of the viewers. We use the L...
Article
Full-text available
Explainability and interpretability are two critical aspects of decision support systems. Within computer vision, they are critical in certain tasks related to human behavior analysis such as in health care applications. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an intr...
Chapter
Full-text available
Automatic analysis of job interview screening decisions is useful for establishing the nature of biases that may play a role in such decisions. In particular, assessment of apparent personality gives insights into the first impressions evoked by a candidate. Such analysis tools can be used for training purposes, if they can be configured to provide...
Conference Paper
Full-text available
The field of paralinguistics is growing rapidly with a wide range of applications that go beyond recognition of emotions, laughter and personality. The research flourishes in multiple directions such as signal representation and classification, addressing the issues of the domain. Apart from the noise robustness, an important issue with real life d...
Chapter
Full-text available
First impressions influence the behavior of people towards a newly encountered person or a human-like agent. Apart from the physical characteristics of the encountered face, the emotional expressions displayed on it, as well as ambient information affect these impressions. In this work, we propose an approach to predict the first impressions people...
Article
Full-text available
We present an analytical survey of current state-of-the-art tasks in the area of computational paralinguistics, as well as the recent achievements of automatic systems for paralinguistic analysis of conversational speech. Paralinguistics studies non-verbal aspects of human communication and speech such as natural emotions, accents, psycho-physiologica...
Conference Paper
One of the challenges in speech emotion recognition is robust and speaker-independent emotion recognition. In this paper, we take a cascaded normalization approach, combining linear speaker level, non-linear value level and feature vector level normalization to minimize speaker-related effects and to maximize class separability with linear kernel c...
Conference Paper
Full-text available
Computer-aided medical systems have become widespread. These systems assist scientists in the medical field with diagnosis and treatment. In the same vein, this study automatically detects the medial meniscus in MR images of the knee. The knee MR images used in this study were obtained from the Osteoarthritis Initiative. 75%...
Conference Paper
Full-text available
This paper presents our contribution to the ACM ICMI 2015 Emotion Recognition in the Wild Challenge (EmotiW 2015). We participate in both static facial expression (SFEW) and audiovisual emotion recognition challenges. In both challenges, we use a set of visual descriptors and their early and late fusion schemes. For AFEW, we also exploit a set of popu...
Article
Full-text available
Psychoanalysis can be thought of as a scene that is created by the analyst, the patient and the “analytic work.” The “work” comprises the interactional patterns of the dyad which evolve over time to create new possibilities for functioning of the patient. Taking this framework as a starting point, this study presents an empirically based investigat...
Conference Paper
Full-text available
We present the first child emotional speech corpus in Russian, called "EmoChildRu", which contains audio materials of 3-7 year old kids. The database includes over 20K recordings (approx. 30 hours), collected from 100 children. Recordings were carried out in three controlled settings by creating different emotional states for children: playing wi...
Conference Paper
Full-text available
Computational Paralinguistics has several unresolved issues, one of which is coping with large variability due to speakers, spoken content and corpora. In this paper, we address the variability compensation issue by proposing a novel method composed of i) Fisher vector encoding of low level descriptors extracted from the signal, ii) speaker z-norma...
Article
Full-text available
A mixture of factor analyzers is a semi-parametric density estimator that generalizes the well-known mixtures of Gaussians model by allowing each Gaussian in the mixture to be represented in a different lower-dimensional manifold. This paper presents a robust and parsimonious model selection algorithm for training a mixture of factor analyzers, car...
Article
Full-text available
This paper proposes extreme learning machines (ELM) for modeling audio and video features for emotion recognition under uncontrolled conditions. The ELM paradigm is a fast and accurate learning alternative for single-layer feedforward networks. We experiment on the acted facial expressions in the wild corpus, which features seven discrete emotions,...
Conference Paper
Full-text available
This paper presents our contribution to the ACM ICMI 2014 Mapping Personality Traits Challenge and Workshop. The proposed system utilizes Extreme Learning Machines (ELM) and Canonical Correlation Analysis (CCA) for modeling acoustic features. The ELM paradigm is proposed as a fast and accurate alternative to train Single Layer Feed-forward Networks (...
Conference Paper
Full-text available
The analysis of spoken emotions is of increasing interest in human-computer interaction, as a way to make machine communication more humane. It has manifold applications ranging from intelligent tutoring systems to affect-sensitive robots, from smart call centers to patient telemonitoring. In general the study of computational paral...
Conference Paper
Full-text available
This paper presents our contribution to ACM ICMI 2014 Emotion Recognition in the Wild Challenge and Workshop. The proposed system utilizes Extreme Learning Machines (ELM) for modeling modality-specific features and combines the scores for final prediction. The state-of-the-art results in acoustic and visual emotion recognition are obtained either u...
Conference Paper
Full-text available
This paper presents our work on ACM MM Audio Visual Emotion Corpus 2014 (AVEC 2014) using the baseline features in accordance with the challenge protocol. For prediction, we use Canonical Correlation Analysis (CCA) in the affect sub-challenge (ASC) and the Moore-Penrose generalized inverse (MPGI) in the depression sub-challenge (DSC). The video baseline pr...
Conference Paper
Full-text available
This paper presents our work on the ACM MM Audio Visual Emotion Corpus 2013 (AVEC 2013) depression recognition sub-challenge using the baseline features in accordance with the challenge protocol. We use Canonical Correlation Analysis for audio-visual fusion as well as covariate extraction for the target task. The video baseline provides histograms of...
Conference Paper
Full-text available
In this study we present our system for INTERSPEECH 2014 Computational Paralinguistics Challenge (ComParE 2014), Physical Load Sub-challenge (PLS). Our contribution is twofold. First, we propose using Low Level Descriptor (LLD) information as hints, so as to partition the feature space into meaningful subsets called views. We also show the virtue o...
Conference Paper
Full-text available
In this study we make use of Canonical Correlation Analysis (CCA) based feature selection for continuous depression recognition from speech. Besides its common use in multi-modal/multi-view feature extraction, CCA can be easily employed as a feature selector. We introduce several novel ways of CCA based filter (ranking) methods, showing their rel...
Conference Paper
Full-text available
This study aims at presenting an emotional corpus collected at Boğaziçi University / Electrical and Electronics Department, on which no previous signal processing and machine learning study was done for classification purposes. It also aims at providing the protocol for further experiments on this corpus. The emotional corpus consists of 484 speech...
Conference Paper
Full-text available
In this study, we investigate several methods on the Interspeech 2013 Paralinguistic Challenge - Social Signals Sub-Challenge dataset. The task of this sub-challenge is to detect laughter and fillers per frame. We apply Random Forests with a varying number of trees and randomly selected features. We then proceed with minimum Redundancy Maximum Relevan...
Conference Paper
Full-text available
This study proposes a probabilistic approach to evaluate prenatal risk of Down syndrome. In this study, we address the decision-making problem in diagnosing Down syndrome from the machine learning perspective aiming to decrease invasive tests. We employ Naive Bayes and Bayesian Networks classification algorithms as probabilistic methods. This proba...
Chapter
Full-text available
Centroid based clustering methods, such as K-Means, form Voronoi cells whose radii are inversely proportional to the number of clusters, K, and the expectation of the posterior probability distribution in the closest cluster is related to that of a k-Nearest Neighbor Classifier (k-NN) due to the Law of Large Numbers. The aim of this study is to examine the...
Conference Paper
Full-text available
In pursuit of finding accurate and efficient ways of predicting hourly electrical energy output, this study utilizes a dataset collected over 6 years (2006-2011) whose data points correspond to average hourly sensor measurements when the plant is set to work with full load. The input features are ambient temperature, relative humidity and ambient p...