
Yang Gao - Carnegie Mellon University
12
Publications
2,644
Reads
229
Citations
Publications (12)
Lately, there has been a global effort by multiple research groups to detect COVID-19 from voice. Different researchers use different kinds of information from the voice signal to achieve this. Various types of phonated sounds and the sound of cough and breath have all been used with varying degrees of success in automated voice-based COVID-19 dete...
State-of-the-art methods for audio generation suffer from fingerprint artifacts and repeated inconsistencies across temporal and spectral domains. Such artifacts could be well captured by the frequency domain analysis over the spectrogram. Thus, we propose a novel use of long-range spectro-temporal modulation feature -- 2D DCT over log-Mel spectrog...
Automatic speaker verification (ASV) systems utilize the biometric information in human speech to verify the speaker's identity. The techniques used for performing speaker verification are often vulnerable to malicious attacks that attempt to induce the ASV system to return wrong results, allowing an impostor to bypass the system and gain access. A...
With increasing interest in interactive speech systems, speech emotion recognition and multi-style text-to-speech (TTS) synthesis are becoming increasingly important research areas. In this paper, we combine both. We present a method to extract speech style embeddings from input speech queries and apply this embedding as conditional input to a TT...
In regression tasks the distribution of the data is often too complex to be fitted by a single model. In contrast, partition-based models are developed where data is divided and fitted by local models. These models partition the input space and do not leverage the input-output dependency of multimodal-distributed data, and strong local models are n...
Voice impersonation is not the same as voice transformation, although the latter is an essential element of it. In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the ta...
This paper examines the speaker identification potential of breath sounds in continuous speech. Speech is largely produced during exhalation. In order to replenish air in the lungs, speakers must periodically inhale. When inhalation occurs in the midst of continuous speech, it is generally through the mouth. Intra-speech breathing behavior has been...