Chi-Chun Lee

National Tsing Hua University | NTHU · Department of Electrical Engineering

PhD EE 2012, University of Southern California

About

216 Publications
35,170 Reads
7,211 Citations
Introduction
Jeremy is a Professor in the Department of Electrical Engineering at National Tsing Hua University (NTHU), Taiwan. He received his B.S. and Ph.D. degrees, both in Electrical Engineering, from the University of Southern California (USC), where his Ph.D. was supervised by Prof. Shri Narayanan. Please see his personal webpage for more detail: https://biic.ee.nthu.edu.tw/cclee.php
Additional affiliations
August 2018 - present
National Tsing Hua University
Position
  • Professor (Associate)
February 2014 - July 2018
National Tsing Hua University
Position
  • Professor (Assistant)
August 2007 - December 2012
University of Southern California
Position
  • Research Assistant
Education
August 2007 - December 2012
University of Southern California
Field of study
  • Electrical Engineering
August 2003 - May 2007
University of Southern California
Field of study
  • Electrical Engineering

Publications (216)
Conference Paper
Full-text available
Advanced wearable tracking shows potential for identifying psychological and emotional stress relevant to the mental health of high-intensity emergency responders. Heart rate variability (HRV) captured by wearable devices can indicate the correlation between intra-subject daily variations and stress. HRV also varies due to various demographic attri...
Conference Paper
Full-text available
Heart Rate Variability (HRV) features are recognized as powerful indicators of various diseases, including heart failure, diabetes, and mental health disorders. In addition, HRV features are robust against noise, making them ideal for wearable devices. Despite their potential, HRV feature sets are limited by sample quantity. Direct augmentation often d...
Conference Paper
Full-text available
Mental stress has become a growing concern in contemporary society; fortunately, recent developments in wearable technology now offer a promising solution. However, a common issue in longitudinal tracking with wearable sensors is missing data, which can introduce biases during model training, affecting predictions and leading to unfair outcomes fo...
Article
Full-text available
Background and introduction: In comparison to other physical assessment methods, the inconsistency in respiratory evaluations continues to pose a major issue and challenge. Objectives: This study aims to evaluate the difference in the identification ability of different breath sounds. Methods/description: In this prospective study, breath sound...
Preprint
BACKGROUND: Deep learning techniques have shown promising results in the automatic classification of respiratory sounds. However, accurately distinguishing these sounds in real-world noisy conditions poses challenges for clinical deployment. Additionally, predicting signals with only background noise could undermine user trust in the system. OBJECTI...
Conference Paper
Full-text available
Training speech emotion recognition (SER) models requires human-annotated labels and speech data. However, emotion perception is complex. Predefined emotion categories are not enough for annotators to describe their emotion perception. Devoted annotators will use natural language rather than traditional emotion labels when annotating data, resulting...
Research
Full-text available
Supplementary Material for Empower Typed Descriptions by Large Language Models for Speech Emotion Recognition
Preprint
Full-text available
Speech Emotion Recognition (SER) systems rely on speech input and emotional labels annotated by humans. However, various emotion databases collect perceptual evaluations in different ways. For instance, the IEMOCAP dataset uses video clips with sounds for annotators to provide their emotional perceptions. However, the most significant English emo...
Preprint
Emotional Voice Conversion (EVC) modifies speech emotion to enhance communication by amplifying positive cues and reducing negative ones. This complex task involves entangled factors like voice quality, speaker traits, and content. Traditional deep learning models like GANs and autoencoders have achieved some success in EVC by learning mappings or...
Conference Paper
Full-text available
Cross-lingual speech emotion recognition (SER) is important for a wide range of everyday applications. While recent SER research relies heavily on large pretrained models for emotion training, existing studies often concentrate solely on the final transformer layer of these models. However, given the task-specific nature and hierarchical architectu...
Conference Paper
In recent years, there have been significant advancements in one-shot voice conversion (VC), enabling the alteration of speaker traits with just a single sentence. However, as this technology matures and generates increasingly realistic utterances, it raises growing privacy concerns. In this paper, we propose RW-VoiceShield to shield voice f...
Conference Paper
Speech recordings frequently encounter a variety of distortions, making the task of eliminating them essential yet challenging. In this study, leveraging the current success of score-based generative modeling (SGM), we propose a novel noise-robust bandwidth expansion (BWE) framework based on an innovative parameterized stochastic diffusion process,...
Conference Paper
Full-text available
Speech emotion recognition (SER) has been extensively integrated into voice-centric applications. A unique fairness issue of SER stems from the naturally biased labels given by raters as ground truth. While existing efforts primarily aim to advance SER fairness through a group (i.e., gender) fairness standpoint, our analysis reveals that label bias...
Conference Paper
Full-text available
Speech emotion recognition (SER) helps to achieve better human-to-machine interactions in voice technologies. Recent studies have pointed out critical fairness issues in SER. While there are efforts in building fair SER, most of the works focus on fairness between demographic groups and rely on these broad categorical attributes to build a fair...
Conference Paper
Full-text available
Speech emotion recognition (SER) is an essential technology for human-computer interaction systems. However, a previous study revealed that 80.77% of SER papers yield results that cannot be reproduced on the well-known IEMOCAP dataset. The main reason for these reproducibility challenges is that the database does not provide standard data splits (e.g., t...
Research
Full-text available
Supplementary Material for Embracing Ambiguity And Subjectivity Using The All-inclusive Aggregation Rule For Evaluating Multi-label Speech Emotion Recognition Systems
Conference Paper
Full-text available
Speech Emotion Recognition (SER) faces a distinct challenge compared to other speech-related tasks because the annotations reflect the subjective emotional perceptions of different annotators. Previous SER studies often view the subjectivity of emotion perception as noise by using the majority rule or plurality rule to obtain the consensus labels...
Research
Full-text available
Supplementary Material for Open-Emotion: A Reproducible EMO-SUPERB for Speech Emotion Recognition System
Thesis
Full-text available
Over the past twenty years, there has been a growing focus on speech emotion recognition (SER). To develop SER systems capable of identifying emotions in speech, researchers need to gather emotional databases for training purposes. This process involves training crowdsourced raters or in-house annotators to express their emotional responses after e...
Preprint
Full-text available
The neural codec model reduces speech data transmission delay and serves as the foundational tokenizer for speech language models (speech LMs). Preserving emotional information in codecs is crucial for effective communication and context understanding. However, there is a lack of studies on emotion loss in existing codecs. This paper evaluates neur...
Preprint
Deep learning techniques have shown promising results in the automatic classification of respiratory sounds. However, accurately distinguishing these sounds in real-world noisy conditions poses challenges for clinical deployment. Additionally, predicting signals with only background noise could undermine user trust in the system. In this study, we...
Preprint
Full-text available
Cross-lingual speech emotion recognition (SER) is important for a wide range of everyday applications. While recent SER research relies heavily on large pretrained models for emotion training, existing studies often concentrate solely on the final transformer layer of these models. However, given the task-specific nature and hierarchical architectu...
Article
Full-text available
Automatic sensing of emotional information in speech is important for numerous everyday applications. Conventional Speech Emotion Recognition (SER) models rely on averaging or consensus of human annotations for training, but emotions and raters' interpretations are subjective in nature, leading to diverse variations in perceptions. To address this,...
Preprint
Full-text available
The rapid growth of Speech Emotion Recognition (SER) has diverse global applications, from improving human-computer interactions to aiding mental health diagnostics. However, SER models might contain social bias toward gender, leading to unfair outcomes. This study analyzes gender bias in SER models trained with Self-Supervised Learning (SSL) at sc...
Conference Paper
Full-text available
Speech emotion recognition (SER) adds to the humane aspects of voice technologies to enhance user experiences. The ground truth emotion annotations provided by human raters and attributes related to the speakers themselves give rise to a compounded fairness issue in SER. While there exist works in fair SER, our work presents one of the first studies in ad...
Conference Paper
Full-text available
Continuously identifying day-to-day mental stress can be realized by using wearable devices to measure physiological indicators. However, the nature of bodily signals raises issues of privacy and data heterogeneity. Recent federated learning schemes provide a promising direction to alleviate the privacy concern, but the large inter-client diffe...
Article
Flow cytometry (FC) is routinely used for hematological disease diagnosis and monitoring. Advancement in this technology allows us to measure an increasing number of markers simultaneously, generating complex high-dimensional datasets. However, current analytic software and methods rely on experienced analysts to perform labor-intensive manual insp...
Article
Full-text available
When selecting test data for subjective tasks, most studies define ground truth labels using aggregation methods such as the majority or plurality rules. These methods discard data points without consensus, making the test set easier than practical tasks where a prediction is needed for each sample. However, the discarded data points often express...
Article
Full-text available
The advancement of Speech Emotion Recognition (SER) is significantly dependent on the quality of emotional speech corpora used for model training. Researchers in the field of SER have developed various corpora by adjusting design parameters to enhance the reliability of the training source. For this study, we focus on exploring communication mode...
Article
Electronic claims records (ECRs) are large-scale, longitudinal collections of individuals' medical service-seeking actions. Compared to in-hospital electronic medical records (EMRs), ECRs are more standardized and span multiple sites. Recently, there have been studies showing promising results on modeling claims data for a wide range of medical applications. Howeve...
Article
The substantial growth of Internet-of-Things technology and the ubiquity of smartphone devices have increased the public and industry focus on speech emotion recognition (SER) technologies. Yet, conceptual, technical, and societal challenges restrict the wide adoption of these technologies in various domains, including healthcare and education. Th...
Conference Paper
Full-text available
The field of speech emotion recognition (SER) aims to create scientifically rigorous systems that can reliably characterize emotional behaviors expressed in speech. A key aspect for building SER systems is to obtain emotional data that is both reliable and reproducible for practitioners. However, academic researchers encounter difficulties in acces...
Conference Paper
Full-text available
In the field of affective computing, emotional annotations are highly important for both the recognition and synthesis of human emotions. Researchers must ensure that these emotional labels are adequate for modeling general human perception. An unavoidable part of obtaining such labels is that human annotators are exposed to known and unknown stimu...
Conference Paper
Full-text available
Speech emotion recognition (SER) is a key technological module to be integrated into many voice-based solutions. One of the unique fairness issues in SER is caused by the inherently biased emotion perception given by the raters as ground truth labels. Mitigating rater biases is at the core of moving SER toward optimizing both recognition and fairnes...
Conference Paper
Full-text available
Modeling cross-lingual speech emotion recognition (SER) has become more prevalent because of its diverse applications. Existing studies have mostly focused on technical approaches that adapt the feature, domain, or label across languages, without considering in detail the similarities between the languages. This study focuses on domain adaptation i...
Conference Paper
Full-text available
The uncertainty in modeling emotions makes speech emotion recognition (SER) systems less reliable. An intuitive way to increase trust in SER is to reject predictions with low confidence. This approach assumes that an SER system is well calibrated, where high-confidence predictions are often right and low-confidence predictions are often wrong. Henc...
Article
Autism Spectrum Disorder (ASD) is a prevalent and heterogeneous neurodevelopmental disorder. Autistic traits describe a wide heterogeneity of behavioral symptoms of ASD, and these traits reflect core deficits in neurodevelopmental function. Researchers have predominantly taken a clinical angle to understand autistic traits. They have be...
Preprint
Full-text available
The ability to accurately detect the onset of dementia is important in the treatment of the disease. Clinically, the diagnosis of Alzheimer Disease (AD) and Mild Cognitive Impairment (MCI) patients is based on an integrated assessment of psychological tests and brain imaging such as positron emission tomography (PET) and anatomical magnetic resonance...
Conference Paper
Full-text available
Advancing speech emotion recognition (SER) depends highly on the source used to train the model, i.e., the emotional speech corpora. By permuting different design parameters, researchers have released versions of corpora that attempt to provide a "better-quality" source for training SER. In this work, we focus on studying communication modes of col...
Conference Paper
Full-text available
Previous studies on speech emotion recognition (SER) with categorical emotions have often formulated the task as a single-label classification problem, where the emotions are considered orthogonal to each other. However, previous studies have indicated that emotions can co-occur, especially for more ambiguous emotional sentences (e.g., a mixture of...
Article
A small group is a fundamental interaction unit for achieving a shared goal. Group performance can be automatically predicted using computational methods to analyze members’ verbal behavior in task-oriented interactions, as has been proven in several recent works. Most of the prior works focus on lower-level verbal behaviors, such as acoustics and...
Conference Paper
Identifying gene mutations is essential to prognosis and therapeutic decisions for acute myeloid leukemia (AML), but current gene analysis is inefficient and non-scalable. Pathological images are readily accessible and can be effectively modeled using deep learning. This work aims at predicting gene mutation directly by modeling bone marrow smear...
Conference Paper
Full-text available
Every sound event that we receive and produce every day carries certain emotional cues. Recently, developing computational methods to recognize induced emotion in movies using content-based modeling has been gaining more attention. Most of the existing works treat this as a task of multimodal audio-visual modeling; while these approaches are promising, thi...
Article
Differentiating types of hematologic malignancies is vital to determine therapeutic strategies for newly diagnosed patients. Flow cytometry (FC) can be used as a diagnostic indicator by measuring the multi-parameter fluorescent markers on thousands of antibody-bound cells, but the manual interpretation of large-scale flow cytometry data has long...
Conference Paper
Full-text available
The decision of ground truth for speech emotion recognition (SER) is still a critical issue in affective computing tasks. Previous studies on emotion recognition often rely on consensus labels after aggregating the classes selected by multiple annotators. It is common for a perceptual evaluation conducted to annotate emotional corpora to include th...
Article
Full-text available
Speech emotion recognition (SER) plays a crucial role in understanding user feelings when developing artificial intelligence services. However, the data mismatch and label distortion between the training (source) set and the testing (target) set significantly degrade the performances when developing the SER systems. Additionally, most emotion-relat...
Article
Most speech emotion recognition studies often focus on recognizing pre-set emotion classes. However, the task definition may change due to a shift in focus to a previously unseen class in real-world applications. This cross-task modeling has not been addressed previously. Lengthy data re-collection, model retraining, and the traditional adaptation...
Article
Culture is the social norm that often dictates a person’s thoughts, decision-making, and social behaviors during interaction at an individual level. In this study, we present a computational framework that automatically assesses an individual culture attribute of power distance (PDI), i.e., the measure to describe one’s acceptance of social status,...
Article
Achieving robust cross-context speech emotion recognition (SER) has become a critical next direction of research for wide adoption of SER technology. The core challenge lies in the large variability of affective speech, which is highly contextualized. Prior works have treated this as a transfer learning problem that mostly focuses on developing doma...
Article
Speech emotion recognition (SER) is an important research area, with direct impacts in applications of our daily lives, spanning education, health care, security and defense, entertainment, and human–computer interaction. The advances in many other speech signal modeling tasks, such as automatic speech recognition, text-to-speech synthesis, and spe...
Conference Paper
Data-driven deep learning has been considered a promising method for building powerful models for medical data, which often requires a large amount of diverse data to be sufficiently effective. However, the high cost of data collection and privacy constraints mean that existing medical datasets are small-scale and distributed. Feder...
Conference Paper
Full-text available
Physiological synchrony is a particular phenomenon of physiological responses during a face-to-face conversation. However, while many previous studies have proposed various physiological synchrony measures between interlocutors in dyadic conversations, there are very few works on computing physiological synchrony in small groups (three or more people)....
Conference Paper
Full-text available
Individual (personalized) self-assessed emotion recognition has received increasing attention recently in areas such as Human-Centered Artificial Intelligence (AI). In most previous studies, researchers utilized the physiological changes and reactions in the body evoked by multimedia stimuli, e.g., video or music, to build a model for recognizing indiv...
Article
Objectives: Flow cytometry (FC) is critical for the diagnosis and monitoring of hematologic malignancies. Machine learning (ML) methods rapidly classify multidimensional data and should dramatically improve the efficiency of FC data analysis. We aimed to build a model to classify acute leukemias, including acute promyelocytic leukemia (APL), and d...
Article
Mismatch between databases poses a challenge in performing emotion recognition on an unlabeled database collected under practical conditions using labeled source data. The alignment between the source and target is crucial for conventional neural networks; therefore, many studies have mapped the two domains into a common feature space. However, the effect of distortion in...
Conference Paper
Full-text available
It is well known that humans are not good at deception detection because of a natural inclination toward truth bias. However, during a conversation, when an interlocutor (interrogator) is being asked explicitly to assess whether his/her interacting partner (deceiver) is lying, this perceptual judgment depends highly on how the interrogator interprets the...
Article
Full-text available
Physiological automatic personality recognition has been largely developed to model an individual's personality traits from a variety of signals. However, few studies have tackled the problem of integrating multiple observations into a single personality prediction. In this study, we focus on finding a novel learning architecture to...
Preprint
Full-text available
Advancement in speech technology has brought convenience to our lives. However, concern is on the rise because speech signals contain multiple personal attributes, which can lead to either sensitive information leakage or biased decisions. In this work, we propose an attribute-aligned learning strategy to derive speech representation that can f...
Article
Full-text available
During the COVID-19 pandemic, healthcare professionals and academic facilities are called to provide leadership in disseminating accurate and timely information through approaches that meet the needs of the public. Graduate students from a university in Taiwan collaborated with experts to provide interactive live broadcasting sessions on the COVID-...
Article
Full-text available
While deceptive behaviors are a natural part of human life, it is well known that humans are generally bad at detecting deception. In this study, we present an automatic deception detection framework by comprehensively integrating prior domain knowledge in deceptive behavior understanding. Specifically, we compute acoustics, textual information, impl...