Jiawei Du

  • Master of Science
  • Master's Student at National Taiwan University

About

Publications: 10
Reads: 896
Citations: 11
Current institution: National Taiwan University
Current position: Master's Student

Publications (10)
Preprint
With the rapid advancement of codec-based speech generation (CoSG) systems, creating fake speech that mimics an individual's identity and spreads misinformation has become remarkably easy. Addressing the risks posed by such deepfake speech has attracted significant attention. However, most existing studies focus on detecting fake data generated by...
Conference Paper
Training speech emotion recognition (SER) requires human-annotated labels and speech data. However, emotion perception is complex. The pre-defined emotion categories are not enough for annotators to describe their emotion perception. Devoted annotators will use natural language rather than traditional emotion labels when annotating data, resulting...
Research
Supplementary Material for Empower Typed Descriptions by Large Language Models for Speech Emotion Recognition
Preprint
Neural audio codec models are becoming increasingly important as they serve as tokenizers for audio, enabling efficient transmission or facilitating speech language modeling. The ideal neural audio codec should maintain content, paralinguistics, speaker characteristics, and audio information even at low bitrates. Recently, numerous advanced neural...
Preprint
Mainstream zero-shot TTS production systems like Voicebox and Seed-TTS achieve human parity speech by leveraging Flow-matching and Diffusion models, respectively. Unfortunately, human-level audio synthesis leads to identity misuse and information security issues. Currently, many antispoofing models have been developed against deepfake audio. Howeve...
Conference Paper
Speech emotion recognition (SER) is an essential technology for human-computer interaction systems. However, the previous study reveals that 80.77% of SER papers yield results that cannot be reproduced on the well-known IEMOCAP dataset. The main reason for reproducibility challenges is that the database did not provide standard data splits (e.g., t...
Preprint
Automatic Speaker Verification (ASV), increasingly used in security-critical applications, faces vulnerabilities from rising adversarial attacks, with few effective defenses available. In this paper, we propose a neural codec-based adversarial sample detection method for ASV. The approach leverages the codec's ability to discard redundant perturbat...
