Yang Gao

Yang Gao
  • Carnegie Mellon University

About

12
Publications
2,644
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
229
Citations
Current institution
Carnegie Mellon University

Publications

Publications (12)
Preprint
Full-text available
Lately, there has been a global effort by multiple research groups to detect COVID-19 from voice. Different researchers use different kinds of information from the voice signal to achieve this. Various types of phonated sounds and the sound of cough and breath have all been used with varying degrees of success in automated voice-based COVID-19 dete...
Preprint
Full-text available
State-of-the-art methods for audio generation suffer from fingerprint artifacts and repeated inconsistencies across temporal and spectral domains. Such artifacts could be well captured by the frequency domain analysis over the spectrogram. Thus, we propose a novel use of long-range spectro-temporal modulation feature -- 2D DCT over log-Mel spectrog...
Preprint
Full-text available
Automatic speaker verification (ASV) systems utilize the biometric information in human speech to verify the speaker's identity. The techniques used for performing speaker verification are often vulnerable to malicious attacks that attempt to induce the ASV system to return wrong results, allowing an impostor to bypass the system and gain access. A...
Preprint
Full-text available
With increasing interests in interactive speech systems, speech emotion recognition and multi-style text-to-speech (TTS) synthesis are becoming increasingly important research areas. In this paper, we combine both. We present a method to extract speech style embed-dings from input speech queries and apply this embedding as conditional input to a TT...
Preprint
Full-text available
In regression tasks the distribution of the data is often too complex to be fitted by a single model. In contrast, partition-based models are developed where data is divided and fitted by local models. These models partition the input space and do not leverage the input-output dependency of multimodal-distributed data, and strong local models are n...
Article
Full-text available
Voice impersonation is not the same as voice transformation, although the latter is an essential element of it. In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the ta...
Article
Full-text available
This paper examines the speaker identification potential of breath sounds in continuous speech. Speech is largely produced during exhalation. In order to replenish air in the lungs, speakers must periodically inhale. When inhalation occurs in the midst of continuous speech, it is generally through the mouth. Intra-speech breathing behavior has been...

Network

Cited By